Model Predictive Path Integral Control Using Covariance Variable Important Sampling

Uploaded by

huy.nguyenngoc2137

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views

Model Predictive Path Integral Control Using Covariance Variable Important Sampling

Uploaded by

huy.nguyenngoc2137

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Model Predictive Path Integral Control using Covariance Variable

Importance Sampling
Grady Williams1 , Andrew Aldrich1 , and Evangelos A. Theodorou1

Abstract— In this paper we develop a Model Predictive Path drastically simplify the system under consideration by using
Integral (MPPI) control algorithm based on a generalized a hierarchical scheme [4], and use path integral control to
importance sampling scheme and perform parallel optimization generate trajectories for a point mass which is then followed
via sampling using a Graphics Processing Unit (GPU). The
proposed generalized importance sampling scheme allows for by a low level controller. Even though this approach may be
changes in the drift and diffusion terms of stochastic diffusion successfull for certain applications, it is limited in the kinds
arXiv:1509.01149v3 [cs.SY] 28 Oct 2015

processes and plays a significant role in the performance of the of behaviors that it can generate since it does not consider the
model predictive control algorithm. We compare the proposed full non-linearity of dynamics. A more efficient approach is
algorithm in simulation with a model predictive control version to take advantage of the parallel nature of sampling and use
of differential dynamic programming.
a graphics processing unit (GPU) [19] to sample thousands
I. INTRODUCTION of trajectories from the nonlinear dynamics.
The path integral optimal control framework [7], [15], A major issue in the path integral control framework is
[16] provides a mathematically sound methodology for de- that the expectation is taken with respect to the uncontrolled
veloping optimal control algorithms based on stochastic dynamics of the system. This is problematic since the proba-
sampling of trajectories. The key idea in this framework is bility of sampling a low cost trajectory using the uncontrolled
that the value function for the optimal control problem is dynamics is typically very low. This problem becomes more
transformed using the Feynman-Kac lemma [2], [8] into an drastic when the underlying dynamics are nonlinear and
expectation over all possible trajectories, which is known sampled trajectories can become trapped in undesirable parts
as a path integral. This transformation allows stochastic of the state space. It has previously been demonstrated
optimal control problems to be solved with a Monte-Carlo how to change the mean of the sampling distribution using
approximation using forward sampling of stochastic diffusion Girsanov’s theorem [15], [16], this can then be used to
processes. develop an iterative algorithm. However, the variance of
There have been a variety of algorithms developed in the the sampling distribution has always remained unchanged.
path integral control setting. The most straight-forward appli- Although in some simple simulated scenarios changing the
cation of path integral control is when the iterative feedback variance is not necessary, in many cases the natural variance
control law suggested in [15] is implemented in its open of a system will be too low to produce useful deviations from
loop formulation. This requires that sampling takes place the current trajectory. Previous methods have either dealt
only from the initial state of the optimal control problem. with this problem by artificially adding noise into the system
A more effective approach is to use the path integral control and then optimizing the noisy system [10], [14]. Or they
framework to find the parameters of a feedback control have simply ignored the problem entirely and sampled from
policy. This can be done by sampling in policy parameter whatever distribution worked best [12], [19]. Although these
space, these methods are known as Policy Improvement approaches can be successful, both are problematic in that
with Path Integrals [14]. Another approach to finding the the optimization either takes place with respect to the wrong
parameters of a policy is to attempt to directly sample from system or the resulting algorithm ignores the theoretical basis
the optimal distribution defined by the value function [3]. of path integral control.
Other methods along similar threads of research include [10], The approach we take here generalizes these approaches in
[17]. that it enables for both the mean and variance of the sampling
Another way that the path integral control framework distribution to be changed by the control designer, without
can be applied is in a model predictive control setting. violating the underlying assumptions made in the path inte-
In this setting an open-loop control sequence is constantly gral derivation. This enables the algorithm to converge fast
optimized in the background while the machine is simulta- enough that it can be applied in a model predictive control
neously executing the “best guess” that the controller has. setting. After deriving the model predictive path integral
An issue with this approach is that many trajectories must control (MPPI) algorithm, we compare it with an existing
be sampled in real-time, which is difficult when the system model predictive control formulation based on differential
has complex dynamics. One way around this problem is to dynamic programming (DDP) [6], [13], [18]. DDP is one of
the most powerful techniques for trajectory optimization, it
This research has been supported by NSF Grant No. NRI-1426945. relies on a first or second order approximation of the dynam-
The 1 authors are with the Autonomous Control and Decision Systems
Laboratory at the Georgia Institute of Technology, Atlanta, GA, USA. Email: ics and a quadratic approximation of the cost along a nominal
[email protected] trajectory, it then computes a second order approximation of
the value function which it uses to generate the control. this transformation we apply an exponential transformation
of the value function
II. PATH INTEGRAL CONTROL
V (x, t) = −λ log(Ψ(x, t)) (6)
In this section we review the path integral optimal control
framework [7]. Let xt ∈ RN denote the state of a dynamical Here λ is a positive constant. We also have to assume a
system at time t, u(xt , t) ∈ Rm denotes a control input for relationship between the cost and noise in the system (as
the system, τ : [t0 , T ] → Rn represents a trajectory of the well as λ) through the equation:
system, and dw ∈ Rp is a brownian disturbance. In the path
integral control framework we suppose that the dynamics Bc (xt , t)Bc (x, t)T = λGc (xt , t)R(xt , t)−1 Gc (xt , t)T (7)
take the form:
The main restriction implied by this assumption is that
dx = f (xt , t)dt + G(xt , t)u(xt , t)dt + B(xt , t)dw (1) B(xt , t) has the same rank as R(xt , t). This limits the
noise in the system to only effect state variables that are
In other words, the dynamics are affine in control and subject directly actuated (i.e. the noise is control dependent). There
to an affine brownian disturbance. We also assume that G are a wide variety of systems which naturally fall into this
and B are partitioned as: description, so the assumption is not too restrictive. However,
there are interesting systems for which this description does
0 0
G(xt , t) = ; B(xt , t) = (2) not hold (i.e. if there are known strong disturbances on
Gc (xt , t) Bc (xt , t)
indirectly actuated state variables or if the dynamics are only
Expectations taken with respect to (1) are denoted as EQ [·], partially known).
we will also be interested in taking expectations with respect By making this assumption and performing the exponen-
to the uncontrolled dynamics of the system (i.e (1) with tial transformation of the value function the stochastic HJB
u ≡ 0). These will be denoted EP [·]. We suppose that equation is transformed into the linear partial differential
the cost function for the optimal control problem has a equation:
quadratic control cost and an arbitrary state-dependent cost. Ψ(xt , t) 1
Let φ(xT ) denote a final the terminal cost, q(xt , t) a state ∂t Ψ = q(xt , t) − f (xt , t)T Ψx − tr(Σ(xt , t)Ψxx )
λ 2
dependent running cost, and define R(xt , t) as a positive (8)
definite matrix. The value function V (xt , t) for this optimal Here we’ve denoted the covariance matrix
control problem is then defined as:
" Z T # Bc (xt , t)Bc (xt , t)T
1 T
min EQ φ(xT ) + q(xt , t) + u R(xt , t)u dt as Σ(xt , t). This equation is known as the backward
u t 2
Chapman-Kolmogorov PDE. We can then apply the
(3)
Feynman-Kac lemma, which relates backward PDEs of this
The Stochastic Hamilton-Jacobi-Bellman equation [1], [11]
type to path integrals through the equation:
for the type of system in (1) and for the cost function in (3)
is given as:
" ! #
1 T
Z
Ψ(xt0 , t0 ) = EP exp − q(x, t) dt Ψ(xT , T )
λ t0
−∂t V = q(xt , t) + f (xt , t)T Vx
(9)
1
− VxT G(xt , t)R(xt , t)−1 G(xt , t)T Vx (4) Note that the expectation (which is the path integral) is
2 taken with respect to P which is the uncontrolled dynamics
1
+ tr(B(xt , t)B(xt , t)T Vxx ) of the system. By recognizing that the term Ψ(xT ) is the
2 1
transformed terminal cost: e− λ φ(xT ) we can re-write this
where the optimal control is expressed as: expression as:

u∗ = −R(xt , t)−1 G(xt , t)T Vx (5) 1
Ψ(xt0 , t0 ) ≈ EP exp − S(τ ) (10)
λ
The solution to this backwards PDE yields the value function RT
for the stochastic optimal control problem, which is then where S(τ ) = φ(xT ) + t0 q(xt , t)dt is the cost-to-go of
used to generate the optimal control. Unfortunately, classical the state dependent cost of a trajectory. Lastly we have to
methods for solving partial differential equations of this compute the gradient of Ψ with respect to the initial state xt0 .
nature suffer from the curse of dimensionality and are This can be done analytically and is a straightforward, albeit
intractable for systems with more than a few state variables. lengthy, computation so we omit it and refer the interested
The approach we take in the path integral control frame- reader to [14]. After taking the gradient we obtain:
work is to transform the backwards PDE into a path integral, 1

which is an expectation over all possible trajectories of the ∗ −1 EP exp − λ S(τ ) B(xt0 , t0 )dw
u dt = G(xt0 , t0 )
EP exp − λ1 S(τ )

system. This expectation can then be approximated by for-
ward sampling of the stochastic dynamics. In order to effect (11)
Where the matrix G(xt , t) is defined as: In order to efficiently approximate the controls, we require
−1 the ability to sample from a distribution which is likely to
R(xt , t)−1 Gc (xt , t)T Gc (xt , t)R(xt , t)−1 Gc (xt , t)T produce low cost trajectories. In previous applications of
(12) path integral control [15], [16] the mean of the sampling
Note that if Gc (xt , t) is square (which is the case if the distribution has been changed which allows for an iterative
system is not over actuated) this reduces to Gc (xt , t)−1 . update law. However, the variance of the sampling distri-
Equation (11) is the path integral form of the optimal bution has always remained unchanged. In well engineered
control. The fundamental difference between this form of systems, where the natural variance of the system is very
the optimal control and classical optimal control theory is low, changing the mean is insufficient since the state space
that instead of relying on a backwards in time process, is never aggressively explored. In the following derivation
this formula requires the evaluation of an expectation which we provide a method for changing both the initial control
can be approximated using forward sampling of stochastic input and the variance of the sampling distribution.
differential equations.
A. Likelihood Ratio
A. Discrete Approximation We suppose that we have a sampling distribution with non-
Equation (11) provides an expression for the optimal zero control input and a changed variance, which we denote
control in terms of a path integral. However, these equations as q, and we would like to approximate (16) using samples
are for continuous time and in order to sample trajectories from q as opposed to p. Now if we write the expectation
on a computer we need discrete time approximations. term (16) in integral form we get:
dxt
We first discretize the dynamics of the system. We have exp − λ1 S(τ )
R
∆t
0
− f (xt , t) p(τ )dτ
that xt+1 = xt + dxt where dxt is defined as: (17)
exp − λ1 S(τ ) p(τ )dτ
R
√
dxt = (f (xt , t) + G(xt , t)u(xt , t)) ∆t + B(xt , t) ∆t Where we are abusing notation and using τ to represent the
(13) discrete trajectory (xt0 , xt1 , . . . xtN ). Next we multiply both
The term is a vector of standard normal Gaussian random q(τ )
integrals by 1 = q(τ ) to get:
variables. For the uncontrolled dynamics of the system we
dxt
q(τ )
have: exp − λ1 S(τ )
R
√ ∆t
0
− f (x t , t) q(τ ) p(τ )dτ
dxt = f (xt , t)∆t + B(xt , t) ∆t (14) q(τ ) (18)
exp − λ1 S(τ ) q(τ
R
) p(τ )dτ
Another way we can express B(xt , t)dw which will be And we can then write this as an expectation with respect
useful is as: to q:
h dxt0 i
B(xt , t)dw ≈ dxt − f (xt , t)∆t (15) Eq exp − λ1 S(τ ) p(τ )
∆t − f (xt , t) q(τ )
PN (19)
Lastly we say: S(τ ) ≈ φ(xT )+ i=0 q(xt , t)∆t where N =
h )i
Eq exp − λ1 S(τ ) p(τ
q(τ )
(T − t)/∆t Then by defining p as the probability induced by
the discrete time uncontrolled dynamics we can approximate We now have the expectation in terms of a sampling distri-
(11) as: bution q for which we can choose:
h dxt0 i i) The initial control sequence from which to sample
Ep exp − λ1 S(τ ) ∆t − f (x t0 , t0 ) around.
u∗ = G(xt0 , t0 )−1 1
ii) The variance of the exploration noise which determines
Ep exp − λ S(τ )
(16) how aggressively the state space is explored.
p(τ )
Note that we have moved the ∆t term multiplying u over However, we now have an extra term to compute q(τ ) . This is
to the right-hand side of the equation and inserted it into the known as the likelihood ratio (or Radon-Nikodym derivative)
expectation. between the distributions p and q. In order to derive an
expression for this term we first have to derive equations
III. GENERALIZED IMPORTANCE SAMPLING for the probability density functions of p(τ ) and q(τ ) indi-
Equation (16) provides an implementable method for vidually. We can do this by deriving the probability density
approximating the optimal control via random sampling of function for the general discrete time diffusion processes
trajectories. By drawing many samples from p the expecta- P (τ ), corresponding to the dynamics:
√
tion can be evaluated using a Monte-Carlo approximation. In dxt = (f (xt , t) + G(xt , t)u(xt , t)) ∆t + B(xt , t) ∆t
practice, this approach is unlikely to succeed. The problem is (20)
that p is typically an inefficient distribution to sample from The goal is to find P (τ ) = P (xt0 , xt1 , . . . xtN ). By condi-
(i.e the cost-to-go will be high for most trajectores sampled tioning and using the Markov property of the state space this
from p). Intuitively sampling from the uncontrolled dynamics probability becomes:
corresponds to turning a machine on and waiting for the N
natural noise in the system dynamics to produce interesting
Y
P (xt0 , xt1 , . . . xtN ) = P (xti |xti−1 ) (21)
behavior. i=1
Now recall that a portion of the state space has deterministic Then under the condition that each Ati is invertible and each
dynamics and that we’ve partitioned the diffusion matrix as: Γi is invertible, the likelihood ratio for the two distributions
is:
0 N
! N
!
B(xt , t) = (22) Y ∆t X
Bc (xt , t) |Ati | exp − Qi (31)
i=1
2 i=1
We can partition the state variables x into the deterministic
(a) (c)
and non-deterministic variables xt and xt respectively.
(a) (a) (a) Proof: In discrete time the probability of a trajectory is
The next step is to conditionon xt+1 = F (xt , t) = xt +
(a) (a) formulated according to the (26). We thus have p(τ ) equal
f (xt , t) + G (xt , t)ut dt since if this does not hold
to:
P (τ ) is zero. We thus need to compute: PN
exp − ∆t 2 i=1 z i Σ i z i
N p(τ ) = (32)
Zp (τ )

(a)
Y
P xti |xti−1 , xti = F (a) (xti−1 , ti−1 (23)
i=1 and q(τ ) equal to:
And from the dynamics equations we know that each of these PN −1

T T
one-step transitions is Gaussian with mean: f (c) (xt , t) + exp − ∆t 2 i=1 (zi −µi ) ( At Σ A
i ti
i
) (zi −µi )
G(c) (xti , ti )u(xti , ti ) and variance: Zq (τ )
(33)
Σi = Bc (xti , ti )Bc (xti , ti )T ∆t. (24) p(τ )
Then dividing these two equations we have q(τ )
as:
(c)
dxt (c)
We then define zi = i
− f (xti , ti ), and µi = N
! N
!
∆t (2π)n/2 |AT
t Σi A t i |
1/2
) ∆t X
G(c) (xti , ti )u(xti , ti ). Applying the definition of the Gaus-
Y
i
exp − ζi (34)
sian distribution with these terms yields: i=1
(2π)n/2 |Σi |1/2 ) 2 i=1

N exp − ∆t (z − µ )T Σ−1 (z − µ )
i i i i
Where ζi is:
Y 2 i
P (τ ) = (25) −1
i=1
(2π)n/2 |Σi |1/2 ζ i = zT −1
i Σi zi − (zi − µi )
T
AT
t i Σi A t i (zi − µi ) (35)
And then using basic rules of exponents this probability Using basic rules of determinants it is easy to see that the
becomes: term outside the exponent reduces to
N
!
−1 ∆t X T −1 N N
Z(τ ) exp − (zi − µi ) Σi (zi − µi ) (26) Y (2π)n/2 |AT
j Σ j Aj |
1/2
) Y
2 i=1 = |Aj | (36)
j=1
(2π)n/2 |Σj |1/2 ) j=1
QN n/2
Where Z(τ ) = i=1 (2π) |Σi |1/2 . With this equation
in hand we’re now ready to compute the likelihood ratio So we need only show that ζi reduces to Qi . Observe that at
between two diffusion processes. every timestep we have the difference between two quadratic
functions of zi , so we can complete the square to combine
Theorem 1: Let p(τ ) be the probability density function this into a single quadratic function. If we recall the definition
for trajectories under the uncontrolled discrete time dynam- of Γi from above, and define Λi = AT ti Σi Ati then completing
ics: √ the square yields:
dxt = f (xt , t)∆t + B(xt , t) ∆t (27)
T
ζi = zi + Γi Λ−1 Γ−1 zi + Γi Λ−1

And let q(τ ) be the probability density function for trajecto- i µi i i µi
T −1 (37)
ries under the controlled dynamics with an adjusted variance: T −1 −1
− µi Λi µi − Γi Λi µi Γi Γt Λi µi −1

dxt = (f (xt , t) + G(xt , t)u(xt , t)) ∆t+ Now we expand out the first quadratic term to get:
√
BE (xt , t) ∆t (28)
−1 T −1 T −1 −1
ζ i = zT
i Γi zi + 2µi Λi zi + µi Λi Γi Λi µi
Where the adjusted variance has the form: (38)
−1 −1 T −1 −1
− µT
i Λi µi − (Γi Λi µi ) Γi (Γi Λi µi )
0
BE (xt , t) =
At Bc (xt , t) Notice that the two underlined terms are the same, except
for the sign, so they cancel out and we’re left with:
And define zi , µi , and Σi as before. Let Qi be defined as:
−1 T −1 T −1
ζ i = zT
i Γi zi + 2µi Λi zi − µi Λi µi (39)
T
Qi = (zi − µi ) Γ−1
i (zi − µi ) (29)
T −1 −1 Now define z̃i = zi − µi , and then re-write this equation in
+ 2 (µi ) Σi (zi − µi ) + µT
i Σi µi
terms of z˜i :
Where Γi is:
−1 ζi = (z̃i + µi )T Γ−1 T −1 T −1
i (z̃i + µi ) + 2µi Λi (z̃i + µi ) − µi Λi µi
Γ−1
i = Σ−1
i − AT
t i Σi A t i (30) (40)
which expands out to: Then by re-defining the running cost q(xt , t) as:
−1 T −1 T −1
ζi = z̃T
i Γi z̃i + 2µi Γi z̃i + µi Γi µi
1 T
−1 T −1 T −1
(41) q̃(x, u, dx) = q(xt , t) + (z − µ) Γ̃−1 (z − µ)
+ 2µT i Λi z̃i + 2µi Λi µi − µi Λi µi
2 (48)
T −1 1 T −1
Which then simplifies to: + µ H (z − µ) + µ H µ
2
−1 T −1 T −1
PN
ζi = z̃T
i Γi z̃i + 2µi Γi z̃i + µi Γi µi (42)
and S̃(τ ) = φ(xT ) + j=1 q̃(x, u, dx), we have:
−1 T −1
+ 2µT i Λi z̃i + µi Λi µi
h i
Eq exp − λ1 S̃(τ ) dx ∆t
t
− f (x t , t)
Now recall that Γi = (Σ−1
i − Λi )
−1 −1
, so we can split the u∗t = G(xt , t)−1 h i
1
quadratic terms in Γi into the Σ−1
−1
i and Λ−1
i components. Eq exp − λ S̃(τ )
Doing this yields: (49)
Also note that dxt is now equal to:
−1 T −1 T −1 T −1
ζi = z̃T
i Γi z̃i + 2µi Σi z̃i − 2µi Λi z̃i + µi Σi µi √
(43) (f (xt , t) + G(xt , t)u(xt , t)) ∆t + B(xt , t) ∆t (50)
−1 T −1 T −1
− µT
i Λi µi + 2µi Λi z̃i + µi Λi µi
dxt
So we can re-write ∆t − f (xt , t) as:
and by noting that the underlined terms cancel out we see
that we’re left with:
G(xt , t)u(xt , t) + B(xt , t) √ (51)
−1 −1 −1 ∆t
ζi = z̃T
i Γi z̃i + 2µT
i Σi z̃i + µT
i Σi µi (44)
And then since G(xt , t) does not depend on the expectation
which is the same as: we can pull it out and get the iterative update law:
T
(zi − µi ) Γ−1 T −1 T −1
i (zi − µi ) + 2µi Σi (zi − µi ) + µi Σi µi
(45) u∗t = G(xt , t)−1 G(xt , t)u(xt , t)
h i
And so ζi = Qi which completes the proof. Eq exp − λ1 S̃(τ ) B(xt , t) √∆t (52)
+ G(xt , t)−1 h i
The key difference between this proof and earlier path inte- Eq exp − λ1 S̃(τ )
gral works which use an application of Girsanov’s theorem
to sample from a non-zero control input is that this theorem C. Special Case
allows for a change in the variance as well. The update law (52) is applicable for a very general class
In the expression for the likelihood ratio derived here of systems. In this section we examine a special case which
−1 T −1
the last two terms (2µTi Σi (zi − µi ) + µi Σi µi ) are ex- we use for all of our experiments. We consider dynamics of
actly the terms from Girsanov’s theorem. The first term the form:
T
((zi − µi ) Γ−1
i (zi − µi )), which can be interpreted as pe-

1 √

nalizing over-aggressive exploration, is the only additional dxt = f (xt , t)∆t + G(xt , t) u(xt , t)∆t + √ ∆t
ρ
term.
√(53)
B. Likelihood Ratio as Additional Running Cost And for the sampling distribution we set A equal to νI.
We also assume that Gc (xt , t) is a square invertible matrix.
The form of the likelihood ratio just derived is easily
This reduces H(xt , t) to Gc (xt , t)−1 . Next the dynamics
incorporated into the path integral control framework by
can be re-written as:
folding it into the cost-to-go as an extra running cost. Note
that the likelihood ratio appears in both the numerator and 1
dxt = f (xt , t)∆t + G(xt , t) u(xt , t) + √ √ ∆t
denominator of (16). Therefore, any terms which do not ρ ∆t
depend on the state can be factored out of the expectation (54)
and canceled. This Then we can interpret √1ρ √∆t as a random change in the
QNremoves the numerically troublesome
normalizing term j=1 |Atj |. So only the summation of Qi control input, to emphasize this we will denote this term as
remains. Recall that Σ = λG(xt , t)R(xt , t)−1 G(xt , t). This δu = √1ρ √∆t . We then have B(xt , t) √∆t = G(xt , t)δu.
implies that: This yields the iterative update law as:
h i
Eq exp − λ1 S̃(τ ) δu
−1
Γ = λ G(xt , t)R(xt , t)−1 G(xt , t) u(xt , t)∗ = u(xt , t) + (55)
(46) h i
−1 Eq exp − λ1 S̃(τ )

− AT G(xt , t)R(xt , t)−1 G(xt , t)T A
which can be approximated as:
Now define H = G(xt , t)R(xt , t)−1 G(xt , t)T and Γ̃ = λ1 Γ.
PK
We then have: exp − 1
S̃(τ i,k ) δui,k
k=1 λ
1 T
u(xti , ti )∗ ≈ u(xti , ti ) +
(z−µ) Γ̃−1 (z−µ)+2µT H−1 (z−µ)+µT H−1 µ

Q= PK
exp − 1
S̃(τ )
λ k=1 λ i,k
(47) (56)
Where K is the number of random samples (termed rollouts) Algorithm 1: Model Predictive Path Integral Control
and S(τi,k ) is the cost-to-go of the kth rollout from time ti Given: K: Number of samples;
onward. This expression is simply a reward-weighted average N : Number of timesteps;
of random variations in the control input. Next we investigate (u0 , u1 , ...uN −1 ): Initial control sequence;
what the likelihood ratio addition to the running cost is. For ∆t, xt0 , f , G, B, ν: System/sampling dynamics;
these dynamics we have the following simplifications: φ, q, R, λ: Cost parameters;
i) z − µ = G(xt , t)δu uinit : Value to initialize new controls to;
ii) Γ̃−1 = (1 − ν −1 )G(xt , t)−1 R(xt , t)G(xt , t)
while task not completed do
iii) H−1 = G(xt , t)−1 R(xt , t)G(xt , t)−1
for k ← 0 to K − 1 do
Given these simplifications q̃ reduces to: x = xt0 ;
for i ← 1 to N − 1 do
(1 − ν −1 ) T xi+1 = xi + (f + G (ui + δui,k )) ∆t;
q̃(x, u, dx) = q(xt , t) + δu Rδu
2 (57) S̃(τi+1,k ) = S̃(τi,k ) + q̃;
1
+ uT Rδu + uT Ru
2 for i ← 0 to N− 1 do
PK exp(− λ1
S̃( τi,k ))δui,k
This means that the introduction of the likelihood ratio ui ← ui + k=1 PK ;
k=1 exp(− λ S̃( τi,k ))
1
simply introduces the original control cost from the optimal
control formulation into the sampling cost, which originally send to actuators(u0 );
only included state-dependent terms. for i ← 0 to N − 2 do
ui = ui+1 ;
IV. MODEL PREDICTIVE CONTROL ALGORITHM uN −1 = uinit
Update the current state after receiving feedback;
We apply the iterative path integral control update law, check for task completion;
with the generalized importance sampling term, in a model
predictive control setting. In this setting optimization and
execution occur simultaneously: the trajectory is optimized
and then a single control is executed, then the trajectory is A. Cart-Pole
re-optimized using the un-executed portion of the previous
trajectory to warm-start the optimization. This scheme has For the cart-pole swing-up task we used the state cost:
two key requirements: q(x) = p2 + 500(1 + cos(θ))2 + θ̇2 + ṗ2 , where p is the
i) Rapid convergence to a good control input. position of cart, ṗ is the velocity and θ, θ̇ are the angle and
ii) The ability to sample a large number of trajectories in angular velocity of the pole. The control input is desired
real-time. velocity, which maps to velocity through the equation: p̈ =
10(u − ṗ). The disturbance parameter √1ρ was set equal .01
The first requirement is essential because the algorithm
and the control cost was R = 1. We ran the MPPI controller
does not have the luxury of waiting until the trajectory has
for 10 seconds with a 1 second optimization horizon. The
converged before executing. The new importance sampling
controller has to swing-up the pole and keep it balanced for
term enables tuning of the exploration variance which allows
the rest of the 10 second horizon. The exploration variance
for rapid convergence, this is demonstrated in Fig. 1.
The second requirement, sampling a large number of
trajectories in real-time, is satisfied by implementing the
random sampling of trajectories on a GPU. The algorithm 300 ν = 75
ν = 500
is given in Algorithm 1, in the parallel GPU implementation ν = 1000
Average Running Cost

250
the sampling for loop (for k to K-1) is run completely in ν = 1500
parallel. 200

V. EXPERIMENTS 150

We tested the model predictive path integral control algo- 100

rithm (MPPI) on three simulated platforms (1) A cart-pole,

(2) A miniature race car, and (3) A quadrotor attempting 50

to navigate an obstacle filled environment. For the race car

0
and quadrotor we used a model predictive control version 1.0 1.5 2.0 2.5 3.0 3.5 4.0
Number of Rollouts (Log Scale)
of the differential dynamic programming (DDP) algorithm
as a baseline comparision. In all of these experiments the Fig. 1. Average running cost for the cart-pole swing-up task as a function of
controller operates at 50 Hz, this means that the open loop the exploration variance ν and the number of rollouts. Using only the natural
control sequence is re-optimized every 20 milliseconds. system variance the MPC algorithm does not converge in this scenario.
10 10
parameter, ν, was varied between 1 and 1500. The MPPI MPC-DDP MPPI
5 5
controller is able to swing-up the pole faster with increasing
0 0
exploration variance. Fig. 1 illustrates the performance of the
MPPI controller as the exploration variance and the number −5 −5
of rollouts are changed. Using only the natural variance of −10 −10
−15 −10 −5 0 5 10 15 −15 −10 −5 0 5 10 15
the system for exploration is insufficient in this task, in that
case (not shown in the figure) the controller is never able to Fig. 3. Comparison of DDP (left) and MPPI (right) performing a cornering
maneuver along an ellipsoid track. MPPI is able to make a much tigther
swing-up the pole which results in a cost around 2000. turn while carrying more speed in and out of the corner than DDP. The
direction of travel is counterclockwise.
B. Race Car
In the race car task the goal was to minimize the objective 9

function: q(x) = 100d2 + (vx − 7.0)2 . Where d is defined DDP |vx | MPPI |vx |
8
x 2 2 DDP |vy | MPPI |vy |
as: d = | 13 + y6 − 1|, and vx is the forward (in
7
body frame) velocity of the car. This cost ensures that
the car to stays on an elliptical track while maintaining a 6

Velocity (m/s)
forward speed of 7 meters/sec. We use a non-linear dynamics 5
model [5] which takes into account the (highly non-linear)
4
interactions between tires and the ground. The exploration
variance was set to a constant ν times the natural variance 3

of the system. The MPPI controller is able to enter turns at 2

30 1
DDP Solution
ν = 50 0
0 10 20 30 40 50
ν = 100
25
Time (s)
Average Running Cost

ν = 150
Fig. 4. Comparison of DDP (left) and MPPI (right) performing a cornering
ν = 300
maneuver along an ellipsoid track. MPPI is able to make a much tigther
20
turn while carrying more speed in and out of the corner than DDP.

MPPI and DDP which guide the quadrotor through the forest
10
as quickly as possible. The cost function for MPPI was
100
MPC-DDP MPPI
5
1.0 1.5 2.0 2.5 3.0 3.5 4.0
80
Number of Rollouts (Log Scale)

Fig. 2. Performance comparison in terms of average cost between MPPI 60

and MPC-DDP as the exploration variance ν changes from 50 to 300 and

the number of rollouts changes from 10 to 1000. Only with a very large 40
increase in the exploration variance is MPPI able to outperform MPC-DDP.
Note that the cost is capped at 25.0 20

close to the desired speed of 7 m/s and then slide through 0

the turn. The DDP solution does not attempt to slide and 0 20 40 60 80 100 0 20 40 60 80 100

significantly reduces its forward velocity before entering the

Fig. 5. Left: sample DDP trajectory through 4m obstacle field, Right:
turn, this results in a higher average cost compared to the Sample MPPI trajectory through the same field. Since the MPPI controller
MPPI controller. Fig. 2 shows the cost comparison between can directly reason about the shape of the obstacles it is able to safely pass
MPPI and MPC-DDP, and Figures 3 and 4 show samples of through the field taking a much more direct route.
the trajectories taken by the two algorithms as well as the
velocity profiles. of the form: q(x) = 2.5(px − pdes 2 des 2
x ) + 2.5(py − py ) +
d
150(pz − pdes 2 2 2
z ) + 50ψ + kvk +350 exp(− 12 ) + 1000C
C. Quadrotor where (px , py , pz ) denotes the position of the vehicle. ψ
The quadrotor task was to fly through a field filled denotes the yaw angle in radians, v is velocity, and d is
with cylindrical obstacles as fast as possible. We used the the distance to the closest obstacle. C is a variable which
quadrotor dynamics model from [9]. This is a non-linear indicates whether the vehicle has crashed into the ground or
model which includes position, velocity, euler angles, angular an obstacle. Additionally if C = 1 (which indicates a crash),
acceleration, and the rotor dynamics. We randomly generated the rollout stops simulating the dynamics and the vehicle
three forests, one where obstacles are on average 3 meters remains where it is for the rest of the time horizon. We found
apart, the second one 4 meters apart, and the third 5 meters that the crash indicator term is not useful for the MPC-DDP
apart. We then separately created cost functions for both based controller, this is not surprising since the discontinuity
it creates is difficult to approximate with a quadratic function. ii) The use of a GPU to sample thousands of trajectories
The term in the cost for avoiding obstacles in the MPC- in real-time.
DDP P controller consists purely of a large exponential term: The derivation of the likelihood ratio enables the designer
N
2000 i=1 exp(− 21 d2i ), note that this sum is over all the of the algorithm to tune the exploration variance in the path
obstacles in the proximity of the vehicle whereas the MPPI integral control framework, whereas previous methods have
controller only has to consider the closest obstacle. only allowed for the mean of the distribution to be changed.
Tuning the exploration variance is critical in achieving a high
18
MPPI
level of performance since the natural variance of the system
17 MPC-DDP is typically too low to achieve good performance.
The experiments considered in this work only consider
Time to Completion (s)

16
changing the variance by a constant multiple times the
15 natural variance of the system. In this special case the
14
introduction of the likelihood ratio corresponds to adding in
a control cost when evaluating the cost-to-go of a trajectory.
13
A direction for future research is to investigate how to
12
automatically adjust the variance online. Doing so could
enable the algorithm to switch from aggressively exploring
11
the state space when performing aggressive maneuvers to
10 exploring more conservatively for performing very precise
3m 4m 5m
maneuvers.
Density Setting of Forest
R EFERENCES
Fig. 6. Time to navigate forest. Comparison between MMPI and DDP.
[1] W. H. Fleming and H. M. Soner. Controlled Markov processes and
viscosity solutions. Applications of mathematics. Springer, New York,
Since the MPPI controller can explicitly reason about 2nd edition, 2006.
crashing (as opposed to just staying away from obstacles), [2] A. Friedman. Stochastic Differential Equations And Applications.
it is able to travel both faster and closer to obstacles than Academic Press, 1975.
[3] Vicenç Gómez, Hilbert J Kappen, Jan Peters, and Gerhard Neumann.
the MPC-DDP controller. Fig. 7 shows the difference in time Policy search for path integral control. In Machine Learning and
between the two algorithms and Fig. 6 the trajectories taken Knowledge Discovery in Databases, pages 482–497. Springer, 2014.
by MPC-DDP and one of the MPPI runs on the forest with [4] Vicenç Gómez, Sep Thijssen, Hilbert J Kappen, Stephen Hailes, and
Andrew Symington. Real-time stochastic optimal control for multi-
obstacles placed on average 4 meters away. agent quadrotor swarms. arXiv preprint arXiv:1502.04548, 2015.
[5] R.Y Hindiyeh. Dynamics and Control of Drifting in Automobiles. PhD
thesis, Stanford University, March 2013.
[6] D. H. Jacobson and D. Q. Mayne. Differential dynamic programming.
American Elsevier Pub. Co., New York, 1970.
[7] H. J. Kappen. Linear theory for control of nonlinear stochastic
systems. Phys Rev Lett, 95:200201, 2005. Journal Article United
States.
[8] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic
Calculus (Graduate Texts in Mathematics). Springer, 2nd edition,
August 1991.
Fig. 7. Simulated forest environment used in the quadrotor navigation task. [9] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Ku-
mar. The grasp multiple micro-uav testbed. Robotics & Automation
Magazine, IEEE, 17(3):56–65, 2010.
[10] E. Rombokas, M. Malhotra, E.A. Theodorou, E. Todorov, and Y. Mat-
VI. CONCLUSION suoka. Reinforcement learning and synergistic control of the act hand.
In this paper we have developed a model predictive path IEEE/ASME Transactions on Mechatronics, 18(2):569–577, 2013.
[11] R. F. Stengel. Optimal control and estimation. Dover books on
integral control algorithm which is able to outperform a advanced mathematics. Dover Publications, New York, 1994.
state-of-the-art DDP method on two difficult control tasks. [12] F. Stulp, J. Buchli, E. Theodorou, and S. Schaal. Reinforcement learn-
The algorithm is based on stochastic sampling of system ing of full-body humanoid motor skills. In Proceedings of 10th IEEE-
RAS International Conference on Humanoid Robots (Humanoids),
trajectories and requires no derivatives of either the dynamics pages 405–410, Dec 2010.
or costs of the system. This enables the algorithm to naturally [13] E. Theodorou, Y. Tassa, and E. Todorov. Stochastic differential
take into account non-linear dynamics, such as a non-linear dynamic programming. In American Control Conference, 2010, pages
1125–1132, 2010.
tire model [5]. It is also able to handle cost functions which [14] E. A. Theodorou, J. Buchli, and S. Schaal. A generalized path integral
are intuitively appealing, such as an impulse cost for hitting approach to reinforcement learning. Journal of Machine Learning
an obstacle, but are difficult for traditional approaches that Research, (11):3137–3181, 2010.
[15] E.A. Theodorou and E. Todorov. Relative entropy and free energy du-
rely on a smooth gradient signal to perform optimization. alities: Connections to path integral and kl control. In the Proceedings
The two keys to achieving this level of performance with a of IEEE Conference on Decision and Control, pages 1466–1473, Dec
sampling based method are: 2012.
[16] Evangelos A. Theodorou. Nonlinear stochastic control and information
i) The derivation of the generalized likelihood ratio be- theoretic dualities: Connections, interdependencies and thermody-
tween discrete time diffusion processes. namic interpretations. Entropy, 17(5):3352–3375, 2015.
[17] Sep Thijssen and HJ Kappen. Path integral control and state-dependent
feedback. Physical Review E, 91(3):032104, 2015.
[18] E. Todorov and W. Li. A generalized iterative lqg method for locally-
optimal feedback control of constrained nonlinear stochastic systems.
pages 300–306, 2005.
[19] G. Williams, E. Rombokas, and T. Daniel. Gpu based path integral
control with learned dynamics. In Neural Information Processing
Systems - ALR Workshop, 2014.

Medallion User Guide
67% (3)
Medallion User Guide
149 pages
Most Asked Interview Questions in SAP BW HANA
No ratings yet
Most Asked Interview Questions in SAP BW HANA
7 pages
Model Predictive Control
No ratings yet
Model Predictive Control
12 pages
Measurement: Ankush Chakrabarty, Suvadeep Banerjee, Sayan Maity, Amitava Chatterjee
No ratings yet
Measurement: Ankush Chakrabarty, Suvadeep Banerjee, Sayan Maity, Amitava Chatterjee
14 pages
Gaussian Process-Based Predictive Contro
No ratings yet
Gaussian Process-Based Predictive Contro
12 pages
Model-Based Adaptive Critic Designs: Editor's Summary
No ratings yet
Model-Based Adaptive Critic Designs: Editor's Summary
31 pages
Observer-Based_Model-Free_Adaptive_Sliding_Mode_Predictive_Control
No ratings yet
Observer-Based_Model-Free_Adaptive_Sliding_Mode_Predictive_Control
11 pages
2017 Model-Based Control Using Koopman Operators
No ratings yet
2017 Model-Based Control Using Koopman Operators
8 pages
Optimal Tracking Control of Motion Systems
No ratings yet
Optimal Tracking Control of Motion Systems
11 pages
Amc 15410
No ratings yet
Amc 15410
14 pages
PCACCSMDec 2021
No ratings yet
PCACCSMDec 2021
33 pages
54.03_13
No ratings yet
54.03_13
8 pages
Active Suspension Control of Full Car Systems Without Function Approximation PDF
No ratings yet
Active Suspension Control of Full Car Systems Without Function Approximation PDF
12 pages
Robust Multiple Model Adaptive Control With Fuzzy Posterior Probability Combination
No ratings yet
Robust Multiple Model Adaptive Control With Fuzzy Posterior Probability Combination
10 pages
Prognosis Based On Handling Drifts in Dynamical Environments Application To A Wind Turbine Benchmark
No ratings yet
Prognosis Based On Handling Drifts in Dynamical Environments Application To A Wind Turbine Benchmark
6 pages
Iterative Feedback Tuning, Theory and Applications
No ratings yet
Iterative Feedback Tuning, Theory and Applications
16 pages
Actuators 13 00252
No ratings yet
Actuators 13 00252
16 pages
2210.08780
No ratings yet
2210.08780
6 pages
Zheng 2020
No ratings yet
Zheng 2020
12 pages
Adaptive Learning Feedback Linearization
No ratings yet
Adaptive Learning Feedback Linearization
9 pages
Mathematical Modelling of Engineering Problems: Received: 26 June 2022 Accepted: 22 October 2022
No ratings yet
Mathematical Modelling of Engineering Problems: Received: 26 June 2022 Accepted: 22 October 2022
7 pages
Development of A New Adaptive Backstepping Control Design For A Non-Strict and Under-Actuated System Based On A PSOTuner
No ratings yet
Development of A New Adaptive Backstepping Control Design For A Non-Strict and Under-Actuated System Based On A PSOTuner
17 pages
LQR Control Scheme For Active Vehicle Suspension Systems Based On Modal Decomposition
No ratings yet
LQR Control Scheme For Active Vehicle Suspension Systems Based On Modal Decomposition
6 pages
Actuators 13 00167
No ratings yet
Actuators 13 00167
20 pages
Adaptive Model Predictive Control of Multivariable Time-Varying Systems
No ratings yet
Adaptive Model Predictive Control of Multivariable Time-Varying Systems
13 pages
Optimal Exploration Strategy For Regret Minimization in Unconstrained Scalar Optimization Problems
No ratings yet
Optimal Exploration Strategy For Regret Minimization in Unconstrained Scalar Optimization Problems
8 pages
A Self-Organizing Fuzzy Controller With A Fixed Maximum Number of Rules and An Adaptive Similarity Factor
No ratings yet
A Self-Organizing Fuzzy Controller With A Fixed Maximum Number of Rules and An Adaptive Similarity Factor
44 pages
MPC Based Trajectory Tracking Control
No ratings yet
MPC Based Trajectory Tracking Control
7 pages
Energies 14 07438 v2
No ratings yet
Energies 14 07438 v2
14 pages
Control of A Multivariable System Using Optimal Control Pairs: A Quadruple-Tank Process
No ratings yet
Control of A Multivariable System Using Optimal Control Pairs: A Quadruple-Tank Process
100 pages
1904.11387
No ratings yet
1904.11387
23 pages
FINAL Article
No ratings yet
FINAL Article
16 pages
Droop Control Optimization Strategy For Parallel Inverters in A Microgrid Based On An Improved Population Division Fruit Fly Algorithm
No ratings yet
Droop Control Optimization Strategy For Parallel Inverters in A Microgrid Based On An Improved Population Division Fruit Fly Algorithm
18 pages
MPC-Based Control of A Large-Scale Power System Subject To Consecutive Pulse Load Variations
No ratings yet
MPC-Based Control of A Large-Scale Power System Subject To Consecutive Pulse Load Variations
10 pages
Solvant
No ratings yet
Solvant
6 pages
Mathematics 11 01094
No ratings yet
Mathematics 11 01094
21 pages
10.1007@s00521-020-04977-6
No ratings yet
10.1007@s00521-020-04977-6
18 pages
Function Approximation Technique Based Immersion and Invariance Control For Unknown Nonlinear Systems
No ratings yet
Function Approximation Technique Based Immersion and Invariance Control For Unknown Nonlinear Systems
6 pages
Semi-Active Suspension Control Based On Deep Reinforcement Learning
No ratings yet
Semi-Active Suspension Control Based On Deep Reinforcement Learning
9 pages
2107.00966v1
No ratings yet
2107.00966v1
11 pages
Model Predictive Control Using Fuzzy Decision Functions
No ratings yet
Model Predictive Control Using Fuzzy Decision Functions
12 pages
Nonlinear_Tire_Model_Approximation_Using_Machine_L
No ratings yet
Nonlinear_Tire_Model_Approximation_Using_Machine_L
15 pages
Introduction To Adaptive Control
No ratings yet
Introduction To Adaptive Control
3 pages
1 PB
No ratings yet
1 PB
12 pages
Obstecal Avoidance
No ratings yet
Obstecal Avoidance
7 pages
Multi Variable MPC System Performance Assessment, Monitoring, and Diagnosis
No ratings yet
Multi Variable MPC System Performance Assessment, Monitoring, and Diagnosis
17 pages
Composite Learning Control With Application To Inverted Pendulums
No ratings yet
Composite Learning Control With Application To Inverted Pendulums
5 pages
Damour Et Al-2014-Fuel Cells PDF
No ratings yet
Damour Et Al-2014-Fuel Cells PDF
8 pages
European Journal of Control: P.S. Lal Priya, Bijnan Bandyopadhyay
No ratings yet
European Journal of Control: P.S. Lal Priya, Bijnan Bandyopadhyay
8 pages
Applsci 13 08204
No ratings yet
Applsci 13 08204
14 pages
Robust Loop Shaping Controller Design For Spectral Models by Quadratic Programming
No ratings yet
Robust Loop Shaping Controller Design For Spectral Models by Quadratic Programming
6 pages
Adaptive Control of Teleoperation Systems With Uncertainties A Survey
No ratings yet
Adaptive Control of Teleoperation Systems With Uncertainties A Survey
5 pages
Advancements in Sliding Mode Control For Improved Performance
No ratings yet
Advancements in Sliding Mode Control For Improved Performance
10 pages
2302.11719v1
No ratings yet
2302.11719v1
8 pages
constrainedLQR Vehicle Dynamics Applications of Optimal Control Theory
No ratings yet
constrainedLQR Vehicle Dynamics Applications of Optimal Control Theory
40 pages
Robust Optimal Control of Quadrotor Uavs: Received February 13, 2013, Accepted April 16, 2013, Published May 10, 2013
No ratings yet
Robust Optimal Control of Quadrotor Uavs: Received February 13, 2013, Accepted April 16, 2013, Published May 10, 2013
15 pages
Research On Deep Reinforcement Learning Control Al
No ratings yet
Research On Deep Reinforcement Learning Control Al
20 pages
q3
No ratings yet
q3
12 pages
Au TS Ijphm 150527
No ratings yet
Au TS Ijphm 150527
15 pages
R. Hayat, M. Buss
No ratings yet
R. Hayat, M. Buss
8 pages
Motorcycle Steering Controlby
No ratings yet
Motorcycle Steering Controlby
9 pages
Intelligent Technologies for Automated Electronic Systems
From Everand
Intelligent Technologies for Automated Electronic Systems
S. Kannadhasan
No ratings yet
Fan 2019
No ratings yet
Fan 2019
7 pages
Tips - Network Games Theory Models and Dynamics Synthesis
No ratings yet
Tips - Network Games Theory Models and Dynamics Synthesis
160 pages
Imotion-Llm: Motion Prediction Instruction Tuning: This Work Was Done Outside of Meta With Personal Capacity
No ratings yet
Imotion-Llm: Motion Prediction Instruction Tuning: This Work Was Done Outside of Meta With Personal Capacity
16 pages
Machine Learning Meets Advanced Robotic Manipulation: A, B C, C C D e
No ratings yet
Machine Learning Meets Advanced Robotic Manipulation: A, B C, C C D e
69 pages
JDEtips Article Intro To UDOs 257
No ratings yet
JDEtips Article Intro To UDOs 257
14 pages
questions and answers
No ratings yet
questions and answers
7 pages
A76 Graphical Solutions With Functions
No ratings yet
A76 Graphical Solutions With Functions
11 pages
Launcher Log
No ratings yet
Launcher Log
237 pages
Chapter 3 - Objects Drawing
No ratings yet
Chapter 3 - Objects Drawing
7 pages
Stringart2018 Preprint Low
No ratings yet
Stringart2018 Preprint Low
12 pages
AIX Basics Student Guide-1
No ratings yet
AIX Basics Student Guide-1
4 pages
TP Manual 2019R1
No ratings yet
TP Manual 2019R1
881 pages
Malware Analysis - Wikipedia
No ratings yet
Malware Analysis - Wikipedia
4 pages
Airbus
No ratings yet
Airbus
2 pages
1 Typically
0% (1)
1 Typically
4 pages
Generative AI and Disinformation: Recent Advances, Challenges, and Opportunities
No ratings yet
Generative AI and Disinformation: Recent Advances, Challenges, and Opportunities
38 pages
a2
No ratings yet
a2
7 pages
Gs33j05d20-01en 007
No ratings yet
Gs33j05d20-01en 007
9 pages
Termux Commands List PDF
No ratings yet
Termux Commands List PDF
2 pages
IITD DSML B8 - Brochure
No ratings yet
IITD DSML B8 - Brochure
15 pages
Unit 2 - Introduction To Computers
No ratings yet
Unit 2 - Introduction To Computers
53 pages
Clave 16 Instruction Manual v1.4
No ratings yet
Clave 16 Instruction Manual v1.4
23 pages
DPSD LAB OBSERVATION
No ratings yet
DPSD LAB OBSERVATION
76 pages
Session:11: (Access Modifiers Types)
No ratings yet
Session:11: (Access Modifiers Types)
11 pages
DSA Notes Well Organised
No ratings yet
DSA Notes Well Organised
166 pages
SOTG GOAmbulance
No ratings yet
SOTG GOAmbulance
34 pages
Case Study 7412589630
No ratings yet
Case Study 7412589630
3 pages
Estimator Properties
No ratings yet
Estimator Properties
17 pages
CALCULUS 2 - MODULE 2 - Lessons 12 16
No ratings yet
CALCULUS 2 - MODULE 2 - Lessons 12 16
48 pages
LibreOffice LibreOffice - Free Office Suite - Based On OpenOffice - Compatible With Microsoft
No ratings yet
LibreOffice LibreOffice - Free Office Suite - Based On OpenOffice - Compatible With Microsoft
1 page
Profile Creation Sites
No ratings yet
Profile Creation Sites
4 pages
Mock - Test Maths Ioqm
No ratings yet
Mock - Test Maths Ioqm
4 pages

Model Predictive Path Integral Control Using Covariance Variable Important Sampling

Uploaded by

Model Predictive Path Integral Control Using Covariance Variable Important Sampling

Uploaded by

Model Predictive Path Integral Control using Covariance Variable

We tested the model predictive path integral control algo- 100

rithm (MPPI) on three simulated platforms (1) A cart-pole,

to navigate an obstacle filled environment. For the race car

of the system. The MPPI controller is able to enter turns at 2

Fig. 2. Performance comparison in terms of average cost between MPPI 60

and MPC-DDP as the exploration variance ν changes from 50 to 300 and

close to the desired speed of 7 m/s and then slide through 0

significantly reduces its forward velocity before entering the

You might also like