Optimal control
Optimal control theory is a branch of control theory that deals with finding a control for a
dynamical system over a period of time such that an objective function is optimized.[1] It has
numerous applications in science, engineering and operations research. For example, the
dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the
objective might be to reach the Moon with minimum fuel expenditure.[2] Or the dynamical system
could be a nation's economy, with the objective to minimize unemployment; the controls in this
case could be fiscal and monetary policy.[3] A dynamical system may also be introduced to
embed operations research problems within the framework of optimal control theory.[4][5]
General method
Optimal control deals with the problem of finding a control law for a given system such that a
certain optimality criterion is achieved. A control problem includes a cost functional that is a
function of state and control variables. An optimal control is a set of differential equations
describing the paths of the control variables that minimize the cost functional. The optimal control
can be derived using Pontryagin's maximum principle (a necessary condition also known as
Pontryagin's minimum principle or simply Pontryagin's principle),[8] or by solving the Hamilton–
Jacobi–Bellman equation (a sufficient condition).
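In outline (using the notation of the abstract formulation given later in this article, and under standard smoothness assumptions), both conditions are expressed through a Hamiltonian built from the running cost $F$ and the dynamics $a$:

$$H(x, u, \lambda, t) = F(x, u, t) + \lambda^{\mathsf T} a(x, u, t).$$

Pontryagin's principle requires that the optimal control minimize $H$ pointwise along the optimal trajectory, with the costate obeying $\dot{\lambda} = -\partial H / \partial x$, while the Hamilton–Jacobi–Bellman equation characterizes the optimal cost-to-go $V(x, t)$ via

$$-\frac{\partial V}{\partial t} = \min_{u} \left[ F(x, u, t) + \left( \frac{\partial V}{\partial x} \right)^{\!\mathsf T} a(x, u, t) \right].$$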
We begin with a simple example. Consider a car traveling in a straight line on a hilly road. The
question is, how should the driver press the accelerator pedal in order to minimize the total
traveling time? In this example, the term control law refers specifically to the way in which the
driver presses the accelerator and shifts the gears. The system consists of both the car and the
road, and the optimality criterion is the minimization of the total traveling time. Control problems
usually include ancillary constraints. For example, the amount of available fuel might be limited,
the accelerator pedal cannot be pushed through the floor of the car, speed limits, etc.
A proper cost function will be a mathematical expression giving the traveling time as a function
of the speed, geometrical considerations, and initial conditions of the system. Constraints are
often interchangeable with the cost function.
Another related optimal control problem may be to find the way to drive the car so as to minimize
its fuel consumption, given that it must complete a given course in a time not exceeding some
amount. Yet another related control problem may be to minimize the total monetary cost of
completing the trip, given assumed monetary prices for time and fuel.
A more abstract framework goes as follows.[1] Minimize the continuous-time cost functional

$$J = E\,[\,x(t_0), t_0, x(t_f), t_f\,] + \int_{t_0}^{t_f} F\,[\,x(t), u(t), t\,]\, \mathrm{d}t$$

subject to the first-order dynamic constraints (the state equation)

$$\dot{x}(t) = a\,[\,x(t), u(t), t\,],$$

the algebraic path constraints

$$b\,[\,x(t), u(t), t\,] \leq 0,$$

and the endpoint conditions

$$e\,[\,x(t_0), t_0, x(t_f), t_f\,] = 0,$$

where $x(t)$ is the state, $u(t)$ is the control, $t$ is the independent variable (generally speaking, time), $t_0$ is the initial time, and $t_f$ is the terminal time. The terms $E$ and $F$ are called the endpoint cost and the running cost respectively. In the calculus of variations, $E$ and $F$ are referred to as the Mayer term and the Lagrangian, respectively. Furthermore, it is noted that the path constraints $b \leq 0$ are in general inequality constraints and thus may not be active (i.e., equal to zero) at the optimal solution. It is also noted that the optimal control problem as stated above may have multiple solutions (i.e., the solution may not be unique). Thus, it is most often the case that any solution to the optimal control problem is locally minimizing.
Linear quadratic control

A special case of the general nonlinear optimal control problem given in the previous section is
the linear quadratic (LQ) optimal control problem. The LQ problem is stated as follows. Minimize the quadratic continuous-time cost functional

$$J = \tfrac{1}{2}\, x^{\mathsf T}(t_f)\, S_f\, x(t_f) + \tfrac{1}{2} \int_{t_0}^{t_f} \left[\, x^{\mathsf T}(t)\, Q(t)\, x(t) + u^{\mathsf T}(t)\, R(t)\, u(t) \,\right] \mathrm{d}t$$

subject to the linear first-order dynamic constraints

$$\dot{x}(t) = A(t)\, x(t) + B(t)\, u(t)$$

and the initial condition $x(t_0) = x_0$.

A particular form of the LQ problem that arises in many control system problems is that of the linear quadratic regulator (LQR) where all of the matrices (i.e., $A$, $B$, $Q$, and $R$) are constant, the initial time is arbitrarily set to zero, and the terminal time is taken in the limit $t_f \to \infty$ (this last assumption is what is known as infinite horizon). The LQR problem is stated as follows. Minimize the infinite horizon quadratic continuous-time cost functional

$$J = \tfrac{1}{2} \int_{0}^{\infty} \left[\, x^{\mathsf T}(t)\, Q\, x(t) + u^{\mathsf T}(t)\, R\, u(t) \,\right] \mathrm{d}t$$

subject to the linear time-invariant first-order dynamic constraints

$$\dot{x}(t) = A\, x(t) + B\, u(t)$$

and the initial condition $x(0) = x_0$.
In the finite-horizon case the matrices are restricted in that $Q$ and $R$ are positive semi-definite and positive definite, respectively. In the infinite-horizon case, however, the matrices $Q$ and $R$ are not only positive semi-definite and positive definite, respectively, but are also constant. These additional restrictions on $Q$ and $R$ in the infinite-horizon case are enforced to ensure that the cost functional remains positive. Furthermore, in order to ensure that the cost functional is bounded, the additional restriction is imposed that the pair $(A, B)$ is controllable. Note that the LQ or LQR cost functional can be thought of physically as attempting to minimize the control energy (measured as a quadratic form).
The infinite horizon problem (i.e., LQR) may seem overly restrictive and essentially useless because it assumes that the operator is driving the system to zero-state and hence driving the output of the system to zero. This is indeed correct. However the problem of driving the output to a desired nonzero level can be solved after the zero output one is. In fact, it can be proved that this secondary LQR problem can be solved in a very straightforward manner. It has been shown in classical optimal control theory that the LQ (or LQR) optimal control has the feedback form

$$u(t) = -K(t)\, x(t),$$

where $K(t) = R^{-1}(t)\, B^{\mathsf T}(t)\, S(t)$ and $S(t)$ is the solution of the differential Riccati equation. The differential Riccati equation is given as

$$\dot{S}(t) = -S(t)\, A(t) - A^{\mathsf T}(t)\, S(t) + S(t)\, B(t)\, R^{-1}(t)\, B^{\mathsf T}(t)\, S(t) - Q(t).$$

For the finite horizon LQ problem, the Riccati equation is integrated backward in time using the terminal boundary condition

$$S(t_f) = S_f.$$
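As an illustration of this backward integration (a minimal sketch, not taken from the article; the plant and weights $A$, $B$, $Q$, $R$, $S_f$ below are arbitrary choices for a double integrator), the differential Riccati equation above can be integrated with a standard ODE solver:

```python
# Minimal sketch: integrate the differential Riccati equation backward in time
# for an illustrative double-integrator plant; A, B, Q, R, S_f are assumptions.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # state weight (positive semi-definite)
R = np.array([[1.0]])                    # control weight (positive definite)
S_f = np.zeros((2, 2))                   # terminal condition S(t_f)
t0, tf = 0.0, 5.0

def riccati_rhs(t, s_flat):
    S = s_flat.reshape(2, 2)
    # dS/dt = -S A - A^T S + S B R^{-1} B^T S - Q
    dS = -(S @ A + A.T @ S - S @ B @ np.linalg.inv(R) @ B.T @ S + Q)
    return dS.ravel()

# solve_ivp integrates backward when the time span is decreasing (t_f -> t_0).
sol = solve_ivp(riccati_rhs, (tf, t0), S_f.ravel(), dense_output=True)

def gain(t):
    S = sol.sol(t).reshape(2, 2)
    return np.linalg.inv(R) @ B.T @ S    # time-varying feedback gain K(t)

print(gain(0.0))                         # u(t) = -K(t) x(t)
```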
For the infinite horizon LQR problem, the differential Riccati equation is replaced with the algebraic Riccati equation (ARE) given as

$$0 = -S A - A^{\mathsf T} S + S B R^{-1} B^{\mathsf T} S - Q.$$

Understanding that the ARE arises from the infinite horizon problem, the matrices $A$, $B$, $Q$, and $R$ are all constant. It is noted that there are in general multiple solutions to the algebraic Riccati equation and the positive definite (or positive semi-definite) solution is the one that is used to compute the feedback gain. The LQ (LQR) problem was elegantly solved by Rudolf E. Kálmán.[9]
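For the infinite-horizon case, readily available numerical routines return the stabilizing solution of the ARE directly. The following minimal sketch (not from the article; the matrices are the same illustrative double-integrator choices as above) computes the constant LQR gain with SciPy:

```python
# Minimal sketch: infinite-horizon LQR gain via the algebraic Riccati equation;
# A, B, Q, R are illustrative assumptions, not data from the article.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

S = solve_continuous_are(A, B, Q, R)     # stabilizing (positive definite) ARE solution
K = np.linalg.inv(R) @ B.T @ S           # constant state-feedback gain, u = -K x

# The closed-loop matrix A - B K should have eigenvalues with negative real parts.
print(np.linalg.eigvals(A - B @ K))
```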
Numerical methods for optimal control

Optimal control problems are generally nonlinear and therefore, generally do not have analytic
solutions (e.g., like the linear-quadratic optimal control problem). As a result, it is necessary to
employ numerical methods to solve optimal control problems. In the early years of optimal
control (c. 1950s to 1980s) the favored approach for solving optimal control problems was that
of indirect methods. In an indirect method, the calculus of variations is employed to obtain the
first-order optimality conditions. These conditions result in a two-point (or, in the case of a
complex problem, a multi-point) boundary-value problem. This boundary-value problem actually
has a special structure because it arises from taking the derivative of a Hamiltonian. Thus, the
resulting dynamical system is a Hamiltonian system of the form[1]

$$\dot{x} = \frac{\partial H}{\partial \lambda}, \qquad \dot{\lambda} = -\frac{\partial H}{\partial x},$$

where

$$H = F + \lambda^{\mathsf T} a - \mu^{\mathsf T} b$$

is the augmented Hamiltonian and in an indirect method, the boundary-value problem is solved (using the appropriate boundary or transversality conditions). The beauty of using an indirect method is that the state and adjoint (i.e., $\lambda$) are solved for and the resulting solution is readily
verified to be an extremal trajectory. The disadvantage of indirect methods is that the boundary-
value problem is often extremely difficult to solve (particularly for problems that span large time
intervals or problems with interior point constraints). A well-known software program that
implements indirect methods is BNDSCO.[10]
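To make the indirect approach concrete, the following minimal sketch (not tied to BNDSCO or any software named in this article) solves the two-point boundary-value problem that Pontryagin's conditions produce for a minimum-energy double integrator: minimize $\tfrac{1}{2}\int_0^1 u^2\,\mathrm{d}t$ subject to $\dot{x}_1 = x_2$, $\dot{x}_2 = u$, $x(0) = (0,0)$, $x(1) = (1,0)$. The stationarity condition $\partial H/\partial u = 0$ gives $u = -\lambda_2$, and the costates satisfy $\dot{\lambda}_1 = 0$, $\dot{\lambda}_2 = -\lambda_1$. The problem data are illustrative assumptions.

```python
# Minimal sketch of an indirect method: the Pontryagin conditions for a
# minimum-energy double integrator form a two-point BVP, solved here with
# SciPy's collocation-based BVP solver.  Plant and boundary data are assumed.
import numpy as np
from scipy.integrate import solve_bvp

def odes(t, y):
    # y = [x1, x2, lam1, lam2]; the optimal control is u = -lam2
    x1, x2, lam1, lam2 = y
    return np.vstack([x2, -lam2, np.zeros_like(lam1), -lam1])

def bc(ya, yb):
    # x(0) = (0, 0) and x(1) = (1, 0); the costates are free at both ends
    return np.array([ya[0], ya[1], yb[0] - 1.0, yb[1]])

t = np.linspace(0.0, 1.0, 50)
y_guess = np.zeros((4, t.size))            # crude initial guess
sol = solve_bvp(odes, bc, t, y_guess)
u = -sol.sol(t)[3]                         # recovered optimal control u(t)
print(sol.status, u[:5])                   # analytically, u(t) = 6 - 12 t
```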
The approach that has risen to prominence in numerical optimal control since the 1980s is that
of so-called direct methods. In a direct method, the state or the control, or both, are approximated
using an appropriate function approximation (e.g., polynomial approximation or piecewise
constant parameterization). Simultaneously, the cost functional is approximated as a cost
function. Then, the coefficients of the function approximations are treated as optimization
variables and the problem is "transcribed" to a nonlinear optimization problem of the form:

Minimize

$$F(\mathbf{z})$$

subject to the algebraic constraints

$$\mathbf{g}(\mathbf{z}) = \mathbf{0}, \qquad \mathbf{h}(\mathbf{z}) \leq \mathbf{0},$$

where $\mathbf{z}$ is the vector of optimization variables.
Depending upon the type of direct method employed, the size of the nonlinear optimization
problem can be quite small (e.g., as in a direct shooting or quasilinearization method), moderate
(e.g. pseudospectral optimal control[11]) or may be quite large (e.g., a direct collocation
method[12]). In the latter case (i.e., a collocation method), the nonlinear optimization problem
may be literally thousands to tens of thousands of variables and constraints. Given the size of
many NLPs arising from a direct method, it may appear somewhat counter-intuitive that solving
the nonlinear optimization problem is easier than solving the boundary-value problem. It is,
however, the fact that the NLP is easier to solve than the boundary-value problem. The reason for
the relative ease of computation, particularly of a direct collocation method, is that the NLP is
sparse and many well-known software programs exist (e.g., SNOPT[13]) to solve large sparse
NLPs. As a result, the range of problems that can be solved via direct methods (particularly
direct collocation methods which are very popular these days) is significantly larger than the
range of problems that can be solved via indirect methods. In fact, direct methods have become
so popular these days that many people have written elaborate software programs that employ
these methods. In particular, such programs include DIRCOL,[14] SOCS,[15] OTIS,[16]
GESOP/ASTOS,[17] DITAN,[18] and PyGMO/PyKEP.[19] In recent years, due to the advent of the
MATLAB programming language, optimal control software in MATLAB has become more
common. Examples of academically developed MATLAB software tools implementing direct
methods include RIOTS,[20] DIDO,[21] DIRECT,[22] FALCON.m,[23] and GPOPS,[24] while an example of
an industry developed MATLAB tool is PROPT.[25] These software tools have increased
significantly the opportunity for people to explore complex optimal control problems both for
academic research and industrial problems.[26] Finally, it is noted that general-purpose MATLAB
optimization environments such as TOMLAB have made coding complex optimal control
problems significantly easier than was previously possible in languages such as C and
FORTRAN.
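To illustrate the transcription idea itself (independently of any of the packages above), the following minimal sketch applies trapezoidal direct collocation to the same illustrative minimum-energy double-integrator problem used in the indirect sketch earlier, and hands the resulting sparse NLP to a general-purpose solver; the discretization size and solver choice are assumptions.

```python
# Minimal sketch of a direct method: trapezoidal collocation transcribes a
# minimum-energy double-integrator problem into a nonlinear program.
import numpy as np
from scipy.optimize import minimize

N = 20                                     # number of collocation intervals
h = 1.0 / N                                # uniform step on the horizon [0, 1]

def unpack(z):
    x1, x2, u = np.split(z, 3)             # each variable sampled at N + 1 nodes
    return x1, x2, u

def objective(z):
    _, _, u = unpack(z)
    # trapezoidal quadrature of the running cost (1/2) u^2
    return 0.5 * h * np.sum((u[:-1] ** 2 + u[1:] ** 2) / 2.0)

def defects(z):
    x1, x2, u = unpack(z)
    d1 = x1[1:] - x1[:-1] - h * (x2[:-1] + x2[1:]) / 2.0   # x1' = x2
    d2 = x2[1:] - x2[:-1] - h * (u[:-1] + u[1:]) / 2.0     # x2' = u
    bcs = [x1[0], x2[0], x1[-1] - 1.0, x2[-1]]             # boundary conditions
    return np.concatenate([d1, d2, bcs])

z0 = np.zeros(3 * (N + 1))                 # trivial initial guess
res = minimize(objective, z0, constraints={"type": "eq", "fun": defects},
               method="SLSQP")
x1, x2, u = unpack(res.x)
print(res.success, u[:5])                  # should approximate u(t) = 6 - 12 t
```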
The examples thus far have shown continuous time systems and control solutions. In fact, as
optimal control solutions are now often implemented digitally, contemporary control theory is
now primarily concerned with discrete time systems and solutions. The Theory of Consistent
Approximations[27][28] provides conditions under which solutions to a series of increasingly
accurate discretized optimal control problems converge to the solution of the original, continuous-
time problem. Not all discretization methods have this property, even seemingly obvious ones.[29]
For instance, using a variable step-size routine to integrate the problem's dynamic equations may
generate a gradient which does not converge to zero (or point in the right direction) as the
solution is approached. The direct method RIOTS (http://www.schwartz-home.com/RIOTS) is
based on the Theory of Consistent Approximations.
Examples
A common solution strategy in many optimal control problems is to solve for the costate (sometimes called the shadow price) $\lambda(t)$. The costate summarizes in one number the marginal value of expanding or contracting the state variable next turn. The marginal value comprises not only the gains accruing to it next turn but also those associated with the remaining duration of the program. It is nice when $\lambda(t)$ can be solved analytically, but usually, the most one can do is describe it sufficiently well that the intuition can grasp the character of the solution and an equation solver can solve numerically for the values.

Having obtained $\lambda(t)$, the turn-$t$ optimal value for the control can usually be solved as a differential equation conditional on knowledge of $\lambda(t)$. Again it is infrequent, especially in continuous-time problems, that one obtains the value of the control or the state explicitly. Usually, the strategy is to solve for thresholds and regions that characterize the optimal control and use a numerical solver to isolate the actual choice values in time.
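Schematically, for a scalar problem in which a payoff $F(x,u,t)$ is maximized subject to $\dot{x} = a(x,u,t)$, with the terminal state free and carrying no value, the steps just described amount to (a sketch under standard interior-solution assumptions):

$$H = F(x,u,t) + \lambda\, a(x,u,t), \qquad \frac{\partial H}{\partial u} = 0, \qquad \dot{\lambda} = -\frac{\partial H}{\partial x}, \qquad \lambda(T) = 0.$$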
Finite time
Consider the problem of a mine owner who must decide at what rate to extract ore from their mine. They own rights to the ore from date $0$ to date $T$. At date $0$ there is $x_0$ ore in the ground, and the time-dependent amount of ore $x(t)$ left in the ground declines at the rate $u(t)$ that the mine owner extracts it. The mine owner extracts ore at cost $u(t)^2 / x(t)$ (the cost of extraction increasing with the square of the extraction speed and the inverse of the amount of ore left) and sells ore at a constant price $p$. Any ore left in the ground at time $T$ cannot be sold and has no value (there is no "scrap value"). The owner chooses the rate of extraction $u(t)$ varying with time to maximize profits over the period of ownership with no time discounting.
1. Discrete-time version

The owner maximizes profit

$$\Pi = \sum_{t=0}^{T-1} \left[\, p\, u_t - \frac{u_t^2}{x_t} \,\right]$$

subject to the law of motion for the state variable $x_t$:

$$x_{t+1} - x_t = -u_t.$$

Form the Hamiltonian and differentiate:

$$H = p\, u_t - \frac{u_t^2}{x_t} - \lambda_{t+1}\, u_t,$$

$$\frac{\partial H}{\partial u_t} = p - \frac{2 u_t}{x_t} - \lambda_{t+1} = 0,$$

$$\lambda_{t+1} - \lambda_t = -\frac{\partial H}{\partial x_t} = -\frac{u_t^2}{x_t^2}.$$

As the mine owner does not value the ore remaining at time $T$,

$$\lambda_T = 0.$$

Using the above equations, it is easy to solve for the $x_t$ and $\lambda_t$ series

$$u_t = \frac{x_t\,(p - \lambda_{t+1})}{2}, \qquad x_{t+1} = x_t - u_t, \qquad \lambda_t = \lambda_{t+1} + \frac{u_t^2}{x_t^2},$$

and using the initial and turn-$T$ conditions, the series can be solved explicitly, giving the optimal extraction rates $u_t$.
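The discrete-time schedule can also be found numerically. The following minimal sketch (not part of the original example; the values $x_0 = 100$, $T = 10$, $p = 2$ are illustrative assumptions) maximizes the discrete-time profit directly over the extraction schedule rather than solving the costate recursion:

```python
# Minimal sketch: optimize the discrete-time extraction schedule u_0..u_{T-1}
# with a general-purpose NLP solver; x0, T and p are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

x0, T, p = 100.0, 10, 2.0                  # initial ore, horizon, ore price

def profit(u):
    x, total = x0, 0.0
    for ut in u:
        total += p * ut - ut ** 2 / max(x, 1e-9)   # revenue minus extraction cost
        x -= ut                                     # remaining ore: x_{t+1} = x_t - u_t
    return total

res = minimize(lambda u: -profit(u), np.full(T, x0 / (2 * T)),
               bounds=[(0.0, None)] * T, method="SLSQP")
print(np.round(res.x, 2))                  # extraction rates decline over time
```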
2. Continuous-time version

The owner maximizes profit

$$\Pi = \int_{0}^{T} \left[\, p\, u(t) - \frac{u(t)^2}{x(t)} \,\right] \mathrm{d}t$$

subject to the law of motion for the state variable

$$\dot{x}(t) = -u(t).$$

Form the Hamiltonian and differentiate:

$$H = p\, u(t) - \frac{u(t)^2}{x(t)} - \lambda(t)\, u(t),$$

$$\frac{\partial H}{\partial u} = p - \frac{2 u(t)}{x(t)} - \lambda(t) = 0,$$

$$\dot{\lambda}(t) = -\frac{\partial H}{\partial x} = -\frac{u(t)^2}{x(t)^2}.$$

As the mine owner does not value the ore remaining at time $T$,

$$\lambda(T) = 0.$$

Using the above equations, it is easy to solve for the differential equations governing $u(t)$ and $\lambda(t)$

$$\dot{\lambda}(t) = -\frac{\bigl(p - \lambda(t)\bigr)^2}{4}, \qquad u(t) = \frac{x(t)\,\bigl(p - \lambda(t)\bigr)}{2},$$

and using the initial and turn-$T$ conditions, the functions can be solved to yield

$$x(t) = x_0 \left( \frac{4 + p\,(T - t)}{4 + p\,T} \right)^{\!2}, \qquad u(t) = \frac{2\, p\, x(t)}{4 + p\,(T - t)}.$$
See also
Active inference
Bellman equation
Brachistochrone
DIDO
DNSS point
Dynamic programming
Generalized filtering
GPOPS-II
CasADi
Kalman filter
Linear-quadratic regulator
Overtaking criterion
PID controller
Pursuit-evasion games
Stochastic control
Trajectory optimization
References
1. Ross, Isaac (2015). A primer on Pontryagin's principle in optimal control. San Francisco: Collegiate
Publishers. ISBN 978-0-9843571-0-9. OCLC 625106088 (https://www.worldcat.org/oclc/62510608
8) .
3. Kamien, Morton I. (2013). Dynamic Optimization: the Calculus of Variations and Optimal Control in
Economics and Management (http://worldcat.org/oclc/869522905) . Dover Publications. ISBN 978-1-
306-39299-0. OCLC 869522905 (https://www.worldcat.org/oclc/869522905) .
4. Ross, I. M.; Proulx, R. J.; Karpenko, M. (6 May 2020). "An Optimal Control Theory for the Traveling
Salesman Problem and Its Variants". arXiv:2005.03186 (https://arxiv.org/abs/2005.03186) [math.OC
(https://arxiv.org/archive/math.OC) ].
5. Ross, Isaac M.; Karpenko, Mark; Proulx, Ronald J. (1 January 2016). "A Nonsmooth Calculus for
Solving Some Graph-Theoretic Control Problems**This research was sponsored by the U.S. Navy" (http
s://doi.org/10.1016%2Fj.ifacol.2016.10.208) . IFAC-PapersOnLine. 10th IFAC Symposium on
Nonlinear Control Systems NOLCOS 2016. 49 (18): 462–467. doi:10.1016/j.ifacol.2016.10.208 (http
s://doi.org/10.1016%2Fj.ifacol.2016.10.208) . ISSN 2405-8963 (https://www.worldcat.org/issn/2405-
8963) .
7. Bryson, A. E. (1996). "Optimal Control—1950 to 1985". IEEE Control Systems Magazine. 16 (3): 26–33.
doi:10.1109/37.506395 (https://doi.org/10.1109%2F37.506395) .
9. Kalman, Rudolf. A new approach to linear filtering and prediction problems. Transactions of the ASME,
Journal of Basic Engineering, 82:34–45, 1960
10. Oberle, H. J. and Grimm, W., "BNDSCO-A Program for the Numerical Solution of Optimal Control
Problems," Institute for Flight Systems Dynamics, DLR, Oberpfaffenhofen, 1989
11. Ross, I. M.; Karpenko, M. (2012). "A Review of Pseudospectral Optimal Control: From Theory to Flight".
Annual Reviews in Control. 36 (2): 182–197. doi:10.1016/j.arcontrol.2012.09.002 (https://doi.org/10.10
16%2Fj.arcontrol.2012.09.002) .
12. Betts, J. T. (2010). Practical Methods for Optimal Control Using Nonlinear Programming (2nd ed.).
Philadelphia, Pennsylvania: SIAM Press. ISBN 978-0-89871-688-7.
13. Gill, P. E., Murray, W. M., and Saunders, M. A., User's Manual for SNOPT Version 7: Software for Large-
Scale Nonlinear Programming, University of California, San Diego Report, 24 April 2007
14. von Stryk, O., User's Guide for DIRCOL (version 2.1): A Direct Collocation Method for the Numerical
Solution of Optimal Control Problems, Fachgebiet Simulation und Systemoptimierung (SIM), Technische
Universität Darmstadt (2000, Version of November 1999).
15. Betts, J.T. and Huffman, W. P., Sparse Optimal Control Software, SOCS, Boeing Information and Support
Services, Seattle, Washington, July 1997
16. Hargraves, C. R.; Paris, S. W. (1987). "Direct Trajectory Optimization Using Nonlinear Programming and
Collocation". Journal of Guidance, Control, and Dynamics. 10 (4): 338–342.
Bibcode:1987JGCD...10..338H (https://ui.adsabs.harvard.edu/abs/1987JGCD...10..338H) .
doi:10.2514/3.20223 (https://doi.org/10.2514%2F3.20223) .
17. Gath, P.F., Well, K.H., "Trajectory Optimization Using a Combination of Direct Multiple Shooting and
Collocation", AIAA 2001–4047, AIAA Guidance, Navigation, and Control Conference, Montréal, Québec,
Canada, 6–9 August 2001
18. Vasile M., Bernelli-Zazzera F., Fornasari N., Masarati P., "Design of Interplanetary and Lunar Missions
Combining Low-Thrust and Gravity Assists", Final Report of the ESA/ESOC Study Contract No.
14126/00/D/CS, September 2002
19. Izzo, Dario. "PyGMO and PyKEP: open source tools for massively parallel optimization in
astrodynamics (the case of interplanetary trajectory optimization)." Proceed. Fifth International Conf.
Astrodynam. Tools and Techniques, ICATT. 2012.
21. Ross, I. M., Enhancements to the DIDO Optimal Control Toolbox, arXiv 2020.
https://arxiv.org/abs/2004.13112
22. Williams, P., User's Guide to DIRECT, Version 2.00, Melbourne, Australia, 2008
23. FALCON.m (http://www.falcon-m.com/) , described in Rieck, M., Bittner, M., Grüter, B., Diepolder, J.,
and Piprek, P., FALCON.m - User Guide, Institute of Flight System Dynamics, Technical University of
Munich, October 2019
24. GPOPS (http://gpops.sourceforge.net) Archived (https://web.archive.org/web/20110724074641/htt
p://gpops.sourceforge.net/) 24 July 2011 at the Wayback Machine, described in Rao, A. V., Benson,
D. A., Huntington, G. T., Francolin, C., Darby, C. L., and Patterson, M. A., User's Manual for GPOPS: A
MATLAB Package for Dynamic Optimization Using the Gauss Pseudospectral Method, University of
Florida Report, August 2008.
25. Rutquist, P. and Edvall, M. M., PROPT – MATLAB Optimal Control Software, 1260 S.E. Bishop Blvd Ste E,
Pullman, WA 99163, USA: Tomlab Optimization, Inc.
27. E. Polak, On the use of consistent approximations in the solution of semi-infinite optimization and
optimal control problems Math. Prog. 62 pp. 385–415 (1993).
28. Ross, I M. (1 December 2005). "A Roadmap for Optimal Control: The Right Way to Commute" (https://d
x.doi.org/10.1196/annals.1370.015) . Annals of the New York Academy of Sciences. 1065 (1): 210–
231. Bibcode:2005NYASA1065..210R (https://ui.adsabs.harvard.edu/abs/2005NYASA1065..210R) .
doi:10.1196/annals.1370.015 (https://doi.org/10.1196%2Fannals.1370.015) . ISSN 0077-8923 (http
s://www.worldcat.org/issn/0077-8923) . PMID 16510411 (https://pubmed.ncbi.nlm.nih.gov/1651041
1) . S2CID 7625851 (https://api.semanticscholar.org/CorpusID:7625851) .
29. Fahroo, Fariba; Ross, I. Michael (September 2008). "Convergence of the Costates Does Not Imply
Convergence of the Control" (https://dx.doi.org/10.2514/1.37331) . Journal of Guidance, Control, and
Dynamics. 31 (5): 1492–1497. Bibcode:2008JGCD...31.1492F (https://ui.adsabs.harvard.edu/abs/200
8JGCD...31.1492F) . doi:10.2514/1.37331 (https://doi.org/10.2514%2F1.37331) . ISSN 0731-5090
(https://www.worldcat.org/issn/0731-5090) . S2CID 756939 (https://api.semanticscholar.org/Corpus
ID:756939) .
Further reading
Bertsekas, D. P. (1995). Dynamic Programming and Optimal Control. Belmont: Athena. ISBN 1-
886529-11-6.
Bryson, A. E.; Ho, Y.-C. (1975). Applied Optimal Control: Optimization, Estimation and Control (htt
ps://books.google.com/books?id=P4TKxn7qW5kC) (Revised ed.). New York: John Wiley and
Sons. ISBN 0-470-11481-9.
Fleming, W. H.; Rishel, R. W. (1975). Deterministic and Stochastic Optimal Control (https://book
s.google.com/books?id=qJDbBwAAQBAJ) . New York: Springer. ISBN 0-387-90155-8.
Kamien, M. I.; Schwartz, N. L. (1991). Dynamic Optimization: The Calculus of Variations and
Optimal Control in Economics and Management (https://books.google.com/books?id=0IoGUn8
wjDQC) (Second ed.). New York: Elsevier. ISBN 0-444-01609-0.
Kirk, D. E. (1970). Optimal Control Theory: An Introduction (https://books.google.com/books?id
=onuH0PnZwV4C) . Englewood Cliffs: Prentice-Hall. ISBN 0-13-638098-0.
External links
CasADi – Free and open source symbolic framework for optimal control (https://web.casadi.or
g/)
Lecture Recordings and Script by Prof. Moritz Diehl, University of Freiburg on Numerical
Optimal Control (https://www.syscop.de/teaching/ss2020/numerical-optimal-control-online)