Article
Online Reinforcement Learning-Based Control of
an Active Suspension System Using the
Actor Critic Approach
Ahmad Fares 1, *,† and Ahmad Bani Younes 2,3,†
1 Mechatronics Engineering, Jordan University of Science and Technology, Irbid 22110-3030, Jordan
2 Aerospace Engineering, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-1308, USA;
[email protected]
3 Mechanical Engineering, Al Hussein Technical University, Amman 11831-139, Jordan
* Correspondence: [email protected]
† These authors contributed equally to this work.
Received: 17 October 2020; Accepted: 10 November 2020; Published: 13 November 2020
Abstract: In this paper, a controller learns to adaptively control an active suspension system using
reinforcement learning without prior knowledge of the environment. The Temporal Difference
(TD) advantage actor critic algorithm is used with the appropriate reward function. The actor
produces the actions, and the critic criticizes the actions taken based on the new state of the
system. During the training process, a simple and uniform road profile is used while maintaining
constant system parameters. The controller is tested using two road profiles: the first one is
similar to the one used during the training, while the other one is bumpy with an extended range.
The performance of the controller is compared with the Linear Quadratic Regulator (LQR) and
optimum Proportional-Integral-Derivative (PID), and the adaptiveness is tested by estimating some
of the system’s parameters using the Recursive Least Squares method (RLS). The results show
that the controller outperforms the LQR in terms of overshoot and the PID in terms of acceleration reduction.
Keywords: active suspension; reinforcement learning; online learning; temporal difference; actor
critic; adaptive control
1. Introduction
Ride quality and passenger comfort are some of the major concerns of any vehicle designer and
have been well investigated and researched over the past few decades. The vehicle suspension system
plays an important role in vehicle safety, handling, and comfort. It keeps the tires in contact with
the road, provides good handling and stable steering, and minimizes the vibrations and oscillations
due to road irregularities, which ensures passenger comfort. In general, suspensions are
divided into three types: passive, semi-active, and active systems. Most of today's vehicles
use the passive system, which consists of a spring and a damper whose properties are fixed at the
design stage and which provides acceptable performance only over a limited frequency range. Changing the
suspension properties, such as the damping coefficient, shifts this good performance to a different frequency
range; this is why sports cars handle better than standard cars, whereas the latter offer better
ride quality. In a semi-active suspension system, no energy is added to the system; instead,
the viscous damping coefficient of the shock absorber is varied. The need for a
system that attains the objectives over the entire working frequency range therefore motivates
research on active suspension.
In the active suspension system, an actuator is added to the spring and damper, which adds
energy to the system by exerting an adaptive counter force, resulting in improved dynamic
behavior. Several controllers have been applied to the active suspension problem: a
discrete indirect pole assignment scheme with a fuzzy logic gain scheduler was developed by [1]. A non-linear
model of the active suspension was investigated by [2] using a sliding mode controller that utilizes
the sky-hook damper system without the need for road profile input. There have been more recent
attempts to improve the system, such as the study conducted by [3], which applied a two-loop PID
to the non-linear half-car model. Compared with a passive system having the same
parameters, they showed excellent improvement; however, the controller could not completely
reject the disturbance. Another example of the PID controller applied to the half-car model
was developed by [4], who studied three scenarios and tried three tuning methods, with the
iterative learning algorithm performing better than the other two. Reference [5] showed
that fuzzy logic performed significantly better than the PID in the quarter model based on two types
of road conditions. Another comparison based on the quarter model, but between a robust PI and
Linear Quadratic Regulator (LQR) controller, was conducted by [6], where they showed that the LQR
outperformed the PI controller in the presence of parameter uncertainty. Another optimal controller is
H∞ , which was examined by [7] and showed a 93% reduction in the car’s acceleration, wheel travel,
and suspension travel.
Recently, machine learning has been widely investigated in control problems and applied
to vehicle suspension. In many cases, neural networks are used in combination with traditional
controllers, including the sliding mode controller [8], PID [9], and LQR [10], to enhance controller
performance or to perform other tasks such as determining road roughness [11], but they have rarely been used
as the controller itself. A good example that motivates the use of machine learning methods as the
controller is [12], where a neural network was trained by the optimal PID and surpassed it under
parameter uncertainties. Another machine learning method that has recently gained momentum
and shown great success in a broad range of applications including games and economics, but rarely
applied to control problems, is reinforcement learning.
Reinforcement learning has rarely been applied to the active suspension problem; to the best of
our knowledge, some of the earliest attempts were conducted by [13–15]. Although there is a
significant difference between their learning algorithms and those used today, the core
idea of learning by interaction and experience is the same. The three attempts shared the same idea
of maximizing a cost function and learning to select the controller parameters to achieve the best
performance. The learning algorithm implemented by [13] was able to achieve near optimal results
compared with the Linear Quadratic Gaussian (LQG) under idealized conditions. Reference [15] followed
the same approach but introduced a new learning scheme, which allowed the controller, after
learning, to operate under conditions where the traditional LQG controller resulted in an unstable
system. They also tested the learning method in a real vehicle, but with a semi-active suspension
system installed, and they showed promising experimental results. A more recent successful attempt
was the study done by [16] where they applied a stochastic real-valued reinforcement learning control
to a non-linear quarter model. A similar approach to the one considered in this research was conducted
by [17]; the actor critic networks were trained by the policy gradient method, and the controller was
tested to some extent with the same road profile considered in this study. They compared their work
with the passive suspension system and showed 62% improvement.
In this paper, an online deep reinforcement learning controller is developed using the TD
advantage actor critic algorithm. The controller is applied to a quarter model active suspension
system to investigate the performance of this learning-based algorithm against the Linear Quadratic
Regulator (LQR) and the optimum Proportional-Integral-Derivative (PID). In order to handle possible
system uncertainties, the algorithm is integrated with a Recursive Least Squares method (RLS) for
online estimation of system damping coefficients.
2. Methods
ẋ = Ax + Bu + d (1)
y = Cx + Du (2)
\[
A =
\begin{bmatrix}
0 & 1 & 0 & 0 \\
-\frac{k_s}{m_s} & -\frac{b_s}{m_s} & \frac{k_s}{m_s} & \frac{b_s}{m_s} \\
0 & 0 & 0 & 1 \\
\frac{k_s}{m_{us}} & \frac{b_s}{m_{us}} & -\frac{k_s + k_{us}}{m_{us}} & -\frac{b_s + b_{us}}{m_{us}}
\end{bmatrix},
\quad
B =
\begin{bmatrix}
0 \\ \frac{1}{m_s} \\ 0 \\ -\frac{1}{m_{us}}
\end{bmatrix},
\quad
d =
\begin{bmatrix}
0 \\ 0 \\ 0 \\ \frac{b_{us}}{m_{us}}\dot{z}_r + \frac{k_{us}}{m_{us}} z_r
\end{bmatrix},
\quad
x =
\begin{bmatrix}
x_s \\ \dot{x}_s \\ x_{us} \\ \dot{x}_{us}
\end{bmatrix}
\]

\[
y =
\begin{bmatrix}
x_s - x_{us} \\ \ddot{x}_s
\end{bmatrix},
\quad
u = F_c,
\quad
C =
\begin{bmatrix}
1 & 0 & -1 & 0 \\
-\frac{k_s}{m_s} & -\frac{b_s}{m_s} & \frac{k_s}{m_s} & \frac{b_s}{m_s}
\end{bmatrix},
\quad
D =
\begin{bmatrix}
0 \\ \frac{1}{m_s}
\end{bmatrix}
\]
The states are defined as: the displacement of the sprung mass xs , the vehicle body velocity ẋs ,
the displacement of the unsprung mass xus , and the vertical velocity of the wheel ẋus . Other states can
be obtained, such as the suspension travel xs − xus and the wheel deflection relative to the
road profile xus − zr. Here, zr and żr denote the road profile height and its rate of change.
The actuator is positioned between the sprung and unsprung mass and generates a counter force
to improve the system performance by reducing the acceleration and the suspension travel. The open
loop response of the system can be obtained by setting Fc = 0, which corresponds to the performance
of the passive suspension system. We assume the working range of the actuator, later referred to as the
action space, to be between −60 N and 60 N.
Parameter                                Value
Sprung mass ms                           2.45 kg
Damping of the car body damper bs        7.5 N·s/m
Stiffness of the car body ks             900 N/m
Unsprung mass mus                        1 kg
Damping of the tire bus                  5 N·s/m
Stiffness of the tire kus                2500 N/m
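For concreteness, the state-space model above can be assembled directly from these parameter values. The following NumPy sketch builds the A, B, C, and D matrices and steps the open-loop (passive, Fc = 0) system over a 2 cm road step; the forward-Euler integration, the time step, and the road input are illustrative assumptions, not part of the original study.

```python
import numpy as np

# Quarter-car parameters from the table above
ms, mus = 2.45, 1.0        # sprung / unsprung mass (kg)
ks, kus = 900.0, 2500.0    # suspension / tire stiffness (N/m)
bs, bus = 7.5, 5.0         # suspension / tire damping (N*s/m)

# State vector x = [x_s, x_s_dot, x_us, x_us_dot]
A = np.array([
    [0.0,     1.0,     0.0,              0.0],
    [-ks/ms, -bs/ms,   ks/ms,            bs/ms],
    [0.0,     0.0,     0.0,              1.0],
    [ks/mus,  bs/mus, -(ks + kus)/mus,  -(bs + bus)/mus],
])
B = np.array([[0.0], [1.0/ms], [0.0], [-1.0/mus]])

def d_vec(zr, zr_dot):
    """Road disturbance vector d for road height zr and its rate of change zr_dot."""
    return np.array([[0.0], [0.0], [0.0], [bus/mus*zr_dot + kus/mus*zr]])

# Output y = [suspension travel, body acceleration]
C = np.array([[1.0, 0.0, -1.0, 0.0],
              [-ks/ms, -bs/ms, ks/ms, bs/ms]])
D = np.array([[0.0], [1.0/ms]])

def step(x, Fc, zr, zr_dot, dt=1e-3):
    """Forward-Euler integration of x_dot = A x + B u + d (dt is an assumed step)."""
    u = np.clip(Fc, -60.0, 60.0)            # assumed actuator range (action space)
    x_dot = A @ x + B * u + d_vec(zr, zr_dot)
    y = C @ x + D * u
    return x + dt * x_dot, y

# Passive (open-loop) response: Fc = 0 over a 2 cm step in road height
x = np.zeros((4, 1))
for _ in range(2000):
    x, y = step(x, Fc=0.0, zr=0.02, zr_dot=0.0)
print("suspension travel (m), body acceleration (m/s^2):", y.ravel())
```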
Figure 2. RL process.
Here, θ denotes the actor network parameters (weights and biases) and U denotes the critic network parameters (weights and biases).
The actor network takes the current state st as an input and generates an action at given the current
policy π θ , and the agent executes the action in the environment, which therefore produces a scalar
reward signal rt and steps to the next state st+1 . The critic network takes the current state as an
input and produces a value for that state, VπU(st); the next state is then fed to the critic network to
produce its value, VπU(st+1), before the parameters are updated. The critic learns a value function,
which is then used to update the actor’s weights in the direction of improving the performance at
every time step.
In this algorithm, the output of the actor is not the action itself, but rather the mean µ and the
standard deviation σ, which define the probability distribution for choosing action at given the state st.
Therefore, the policy is not deterministic but stochastic. A step-by-step implementation is
shown in Algorithm 1.
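The following is a minimal sketch of one TD advantage actor critic update consistent with the description above, written with TensorFlow 2/Keras. The network sizes, learning rates, and discount factor are placeholders (the architectures actually used are summarized in Table 3), and the Gaussian log-probability is computed explicitly rather than through a probability library.

```python
import numpy as np
import tensorflow as tf

GAMMA = 0.99                                  # assumed discount factor
actor_opt = tf.keras.optimizers.Adam(1e-4)    # assumed learning rates (Adam optimizer [19])
critic_opt = tf.keras.optimizers.Adam(1e-3)

# Placeholder networks: the actor outputs the mean mu and the log standard deviation of a
# Gaussian policy; the critic outputs the state value V(s).
# State s = [x_s, x_s_dot, x_us, x_us_dot].
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2)])                # [mu, log_sigma]
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1)])                # V(s)

def gaussian_log_prob(a, mu, log_sigma):
    """Log-probability of action a under N(mu, exp(log_sigma)^2)."""
    sigma = tf.exp(log_sigma)
    return -0.5 * ((a - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2.0 * np.pi)

def act(s):
    """Sample a control force from the stochastic policy and clip it to the action space."""
    mu, log_sigma = tf.unstack(actor(s[None, :].astype(np.float32))[0])
    a = np.random.normal(mu.numpy(), np.exp(log_sigma.numpy()))
    return float(np.clip(a, -60.0, 60.0))

def update(s, a, r, s_next):
    """One TD advantage actor-critic step on a single transition (s, a, r, s')."""
    s = s[None, :].astype(np.float32)
    s_next = s_next[None, :].astype(np.float32)
    with tf.GradientTape(persistent=True) as tape:
        v = critic(s)[0, 0]
        v_next = tf.stop_gradient(critic(s_next)[0, 0])    # bootstrap target
        advantage = r + GAMMA * v_next - v                  # TD error used as the advantage
        mu, log_sigma = tf.unstack(actor(s)[0])
        actor_loss = -gaussian_log_prob(a, mu, log_sigma) * tf.stop_gradient(advantage)
        critic_loss = tf.square(advantage)
    actor_opt.apply_gradients(
        zip(tape.gradient(actor_loss, actor.trainable_variables), actor.trainable_variables))
    critic_opt.apply_gradients(
        zip(tape.gradient(critic_loss, critic.trainable_variables), critic.trainable_variables))
    del tape
```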
rt = −k (| xs − xus |) (3)
During the learning process, the third reward function performed the best. The neural networks
struggled to converge with the first reward function due to its low and closely spaced numerical values. The second
reward function performed better; however, the actor network was not able to eliminate the steady-state
error of the sprung mass xs. The reason is that the system can reach zero ẋs, and thus obtain
the maximum reward, without reaching the desired xs. This problem was solved in the third
reward function by adding a small penalty on the force, which will encourage the actor to produce
zero forces when ẋs is zero.
In the third reward function, the vehicle body velocity is used as the indicator of the quality of
the action taken; the value is squared to give gradual feedback and to benefit from the optimization
properties of a convex function, allowing the controller to recognize that it is improving. The signal is amplified
because the range of the suspension travel studied is within a few centimeters, and the negative
sign is applied so that the reward increases as ẋs decreases, which motivates the controller
to reach the desired state as fast as possible. k1 and k2 are chosen to be 1000 and 0.1, respectively:
k1 is given a high value to motivate the controller to reach zero velocity, while k2 is given
a much smaller value, since higher values impose limitations on the working range of the controller
force and would therefore negatively affect the response. Other values might improve or
worsen the learning process.
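The exact expression of the third reward function is not reproduced here; the sketch below shows one plausible reading of the description above, with a large convex penalty on the squared body velocity and a small penalty on the force. The quadratic form of the force penalty is our assumption.

```python
K1, K2 = 1000.0, 0.1   # gains from the text; K1 amplifies the velocity term

def reward(xs_dot, Fc):
    """One plausible form of the third reward function described above:
    a large, convex penalty on the body velocity plus a small penalty on the
    control force, so that zero force becomes optimal once xs_dot reaches zero."""
    return -(K1 * xs_dot ** 2 + K2 * Fc ** 2)
```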
Unfortunately, there is no standard method of choosing the number of neurons, the number of
hidden layers, the types, and the order of activation functions. Therefore, we built and ran various
models of the actor and critic multiple times until satisfactory performance was achieved. The best
performing neural networks in this study are summarized in Table 3. The algorithm was implemented
in Python 3.7 and trained using TensorFlow libraries.
Table 3. Best performing actor and critic networks.

Number | Actor (activations) | Critic (activations) | Accumulated Rewards
       | elu, sigmoid, ReLU, tanh, linear | ReLU, elu, tanh, elu, tanh, elu, linear |
       | ReLU, sigmoid, ReLU, tanh, linear | ReLU, elu, tanh, elu, tanh, elu, linear |
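Reading the first row as a five-layer actor and a seven-layer critic (one possible reading of the table), a Keras construction might look like the sketch below; the layer widths are placeholders, since the neuron counts are not listed here, and these models could stand in for the placeholder networks in the earlier sketch.

```python
import tensorflow as tf

H = 32  # placeholder layer width; the actual neuron counts are not listed above

# One possible reading of the first Table 3 row: five actor layers, seven critic layers
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(H, activation="elu", input_shape=(4,)),
    tf.keras.layers.Dense(H, activation="sigmoid"),
    tf.keras.layers.Dense(H, activation="relu"),
    tf.keras.layers.Dense(H, activation="tanh"),
    tf.keras.layers.Dense(2, activation="linear"),   # outputs interpreted as [mu, sigma term]
])

critic = tf.keras.Sequential([
    tf.keras.layers.Dense(H, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(H, activation="elu"),
    tf.keras.layers.Dense(H, activation="tanh"),
    tf.keras.layers.Dense(H, activation="elu"),
    tf.keras.layers.Dense(H, activation="tanh"),
    tf.keras.layers.Dense(H, activation="elu"),
    tf.keras.layers.Dense(1, activation="linear"),   # state value V(s)
])
```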
I is the n × n identity matrix, where n is the number of parameters to be estimated, and λ is the
forgetting factor, chosen to be 0.90. For fast convergence and to avoid singularities, the matrix P(t) is
initialized as the identity matrix multiplied by 1000. We use the unsprung mass acceleration ẍus as
the output y(t) in the online estimation process, since it includes all the parameters to be estimated.
From (1), we can obtain the following:
where the term y(t ) is the measured output and x(t ) contains the measured parameters, whereas the
term ϕ(t ) is the estimated output and θ̂(t ) contains the estimated parameters.
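A generic recursive least squares update with the forgetting factor λ = 0.90 and P(0) = 1000·I stated above is sketched below. The regressor construction follows from rearranging the unsprung-mass row of Equation (1) so that only the bs and bus terms remain on the right-hand side; this rearrangement is our illustration of the approach, not necessarily the paper's exact formulation.

```python
import numpy as np

LAMBDA = 0.90                 # forgetting factor, as stated above
n = 2                         # estimating theta = [b_s, b_us]
P = 1000.0 * np.eye(n)        # P(0) = 1000 * I for fast convergence and to avoid singularities
theta_hat = np.zeros(n)       # initial parameter estimates

def rls_update(y, phi):
    """One recursive least squares step with forgetting factor (standard form)."""
    global P, theta_hat
    P_phi = P @ phi
    k = P_phi / (LAMBDA + phi @ P_phi)                 # gain vector
    theta_hat = theta_hat + k * (y - phi @ theta_hat)  # correct estimate by prediction error
    P = (P - np.outer(k, phi @ P)) / LAMBDA            # covariance update with forgetting
    return theta_hat

def measurement(x, Fc, zr, zr_dot, xus_ddot, mus=1.0, ks=900.0, kus=2500.0):
    """Build (y, phi) from the measured unsprung-mass acceleration, collecting the
    b_s and b_us terms of Equation (1) on the right-hand side (illustrative sketch)."""
    xs, xs_dot, xus, xus_dot = x
    known = ks/mus*xs - (ks + kus)/mus*xus - Fc/mus + kus/mus*zr
    y = xus_ddot - known                               # part explained by b_s and b_us
    phi = np.array([(xs_dot - xus_dot)/mus, (zr_dot - xus_dot)/mus])
    return y, phi
```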
(a) The changes in road profile zr (b) The rate of change in road profile żr
3. Results

Two scenarios were studied to build confidence in and test the trained controller. In the first scenario,
the controller was compared with the optimum PID and LQR on the same road profile used during
the training process under ideal conditions. In the second scenario, a new bumpy road profile was
used, and parameter estimation was added to the simulation.
3.1. Scenario 1
Figure 7a–c shows the performance of the three controllers. LQR had the worst response with
an overshoot of about 20% and the highest acceleration of 0.4247 m/s2 . The actor network and
the PID performed significantly better with almost matching results. The average acceleration for
the actor network and the PID was 0.2827 m/s2 and 0.2919 m/s2 , respectively. This encouraged us
to further test the actor network in new conditions and environments that were different from the
training environment.
3.2. Scenario 2
The range of the road profile was extended as shown in Figure 8, and online parameter estimation
was included. The new road profile is shown in Figure 8a. bs and bus were assumed to change
frequently at a rate of 0.5 Hz: bs varied between 4 and 9 N·s/m, and bus varied between 3 and 7 N·s/m.
Noise was added to the parameters to simulate noisy measurements. Figure 9a,b shows that the system
successfully estimated the parameters in less than 100 ms. Despite the changing parameters and the
bumpy road profile, the actor network maintained excellent performance and provided 6.14%
lower overall acceleration compared to the optimum PID. The average acceleration values obtained
were 0.6861 m/s2 for the actor network and 0.7310 m/s2 for the optimum PID.
Figure 8. Scenario 2.
4. Conclusions
In this paper, online reinforcement learning with the TD advantage actor critic was used to train
an active suspension system controller. The structure of the neural networks was obtained by the trial
and error method. Three different reward functions were studied, and the implemented one used
the vehicle body velocity and the produced force as indicators of the quality of the action taken.
The results showed that reinforcement learning can obtain near-optimal results under parameter
uncertainty while estimating the uncertain parameters online using the RLS with forgetting factor method.
The results encourage further studies by testing other algorithms like the Deep Deterministic
Policy Gradient (DDPG) and Asynchronous Advantage Actor Critic (A3C). In addition, a full model
suspension system will provide a better understanding of the controller capabilities. Moreover,
the adaptiveness and the ability to continuously learn the complex dynamics under disturbances and
uncertainties motivate the use of deep reinforcement learning in non-linear models.
Author Contributions: Conceptualization, A.B.Y.; methodology, A.F. and A.B.Y.; software, A.F. and A.B.Y.;
validation, A.F.; investigation, A.F.; writing—original draft preparation, A.F.; supervision, A.B.Y. All authors have
read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Acknowledgments: The authors would like to thank Yahya Zweiri from the Department of Mechanical
Engineering in Kingston University London for his generous technical support throughout this work.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ramsbottom, M.; Crolla, D.; Plummer, A.R. Robust adaptive control of an active vehicle suspension system.
J. Automob. Eng. 1999, 213, 1–17. [CrossRef]
2. Kim, C.; Ro, P. A sliding mode controller for vehicle active suspension systems with non-linearities.
J. Automob. Eng. 1998, 212, 79–92. [CrossRef]
3. Ekoru, J.E.; Dahunsi, O.A.; Pedro, J.O. PID control of a nonlinear half-car active suspension system via
force feedback. In Proceedings of the IEEE Africon’11, Livingstone, Zambia, 13–15 September 2011; pp. 1–6.
4. Talib, M.H.A.; Darus, I.Z.M. Self-tuning PID controller for active suspension system with hydraulic actuator.
In Proceedings of the 2013 IEEE Symposium on Computers & Informatics (ISCI), Langkawi, Malaysia,
7–9 April 2013; pp. 86–91.
5. Salem, M.; Aly, A.A. Fuzzy control of a quarter-car suspension system. World Acad. Sci. Eng. Technol.
2009, 53, 258–263.
6. Mittal, R. Robust PI and LQR controller design for active suspension system with parametric uncertainty.
In Proceedings of the 2015 International Conference on Signal Processing, Computing and Control (ISPCC),
Waknaghat, India, 24–26 September 2015; pp. 108–113.
7. Ghazaly, N.M.; Ahmed, A.E.N.S.; Ali, A.S.; El-Jaber, G. H∞ control of active suspension system for a
quarter car model. Int. J. Veh. Struct. Syst. 2016, 8, 35–40. [CrossRef]
8. Huang, S.; Lin, W. A neural network based sliding mode controller for active vehicle suspension.
J. Automob. Eng. 2007, 221, 1381–1397. [CrossRef]
9. Heidari, M.; Homaei, H. Design a PID controller for suspension system by back propagation neural network.
J. Eng. 2013, 2013, 421543. [CrossRef]
10. Zhao, F.; Dong, M.; Qin, Y.; Gu, L.; Guan, J. Adaptive neural networks control for camera stabilization with
active suspension system. Adv. Mech. Eng. 2015, 7. [CrossRef]
11. Qin, Y.; Xiang, C.; Wang, Z.; Dong, M. Road excitation classification for semi-active suspension system based
on system response. J. Vib. Control 2018, 24, 2732–2748. [CrossRef]
12. Konoiko, A.; Kadhem, A.; Saiful, I.; Ghorbanian, N.; Zweiri, Y.; Sahinkaya, M.N. Deep learning framework
for controlling an active suspension system. J. Vib. Control 2019, 25, 2316–2329. [CrossRef]
13. Gordon, T.; Marsh, C.; Wu, Q. Stochastic optimal control of active vehicle suspensions using
learning automata. J. Syst. Control Eng. 1993, 207, 143–152. [CrossRef]
14. Marsh, C.; Gordon, T.; Wu, Q. Application of learning automata to controller design in slow-active
automobile suspensions. Veh. Syst. Dyn. 1995, 24, 597–616. [CrossRef]
15. Frost, G.; Gordon, T.; Howell, M.; Wu, Q. Moderated reinforcement learning of active and semi-active vehicle
suspension control laws. J. Syst. Control Eng. 1996, 210, 249–257. [CrossRef]
16. Bucak, İ.Ö.; Öz, H.R. Vibration control of a nonlinear quarter-car active suspension system by
reinforcement learning. Int. J. Syst. Sci. 2012, 43, 1177–1190. [CrossRef]
17. Chen, H.C.; Lin, Y.C.; Chang, Y.H. An actor-critic reinforcement learning control approach for
discrete-time linear system with uncertainty. In Proceedings of the 2018 International Automatic Control
Conference (CACS), Taoyuan, Taiwan, 4–7 November 2018; pp. 1–5.
18. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari
with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
19. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
20. Åström, K.J.; Wittenmark, B. Adaptive Control; Courier Corporation: North Chelmsford, MA, USA, 2013.
21. Apkarian, J.; Abdossalami, A. Active Suspension Experiment for Matlab®/Simulink® Users—Laboratory Guide;
Quanser Inc.: Markham, ON, Canada, 2013.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).