
University of New Mexico

UNM Digital Repository


Mathematics & Statistics ETDs
Electronic Theses and Dissertations

7-1-2011

Optimal control problems, curves of pursuit


Svetlana Moiseeva

Follow this and additional works at: https://digitalrepository.unm.edu/math_etds

Recommended Citation
Moiseeva, Svetlana. "Optimal control problems, curves of pursuit." (2011). https://digitalrepository.unm.edu/math_etds/31

This Thesis is brought to you for free and open access by the Electronic Theses and Dissertations at UNM Digital Repository. It has been accepted for
inclusion in Mathematics & Statistics ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact
[email protected].
Optimal Control Problems
Curves of Pursuit

by

Svetlana Moiseeva

B.S. Mathematics,
Peoples’ Friendship University of Russia, 2008

THESIS

Submitted in Partial Fulfillment of the


Requirements for the Degree of

Master of Science
Mathematics

The University of New Mexico

Albuquerque, New Mexico

May, 2011
© 2011, Svetlana Moiseeva

Dedication

To my parents, Valentina and Nikolay Moiseev,

for their support and encouragement.

Acknowledgments

First, I would like to thank the Institute of International Education and the
Fulbright program who provided me the opportunity to study in the United States.
I must also recognize that my graduate experience and this thesis would not have
been possible without the financial assistance of the Department of Mathematics
and Statistics at the University of New Mexico in the form of a generous Teaching
Assistantship.
Second, I wish to thank Dr. Embid, my thesis advisor, for motivating me, and for
continuing to encourage me through the many months of writing and rewriting
this thesis. I appreciate Dr. Embid’s vast knowledge and skill in mathematics, and
his patience in helping me during the completion of my thesis. His expertise and
understanding, guidance and professional style will remain with me as I continue my
career.
Third, I would like to thank my committee members, Dr. Lau and Dr. Nakamaye,
for taking the time from their busy schedules to review my thesis and for their
valuable recommendations pertaining to this study and assistance in my professional
development. Your effort is greatly appreciated.
Finally, I would like to thank my family, for the love and support they provided
during graduate school, and always. I would also like to thank all my friends for
their constant support and always being there for me.

Optimal Control Problems
Curves of Pursuit

by

Svetlana Moiseeva

ABSTRACT OF THESIS

Submitted in Partial Fulfillment of the


Requirements for the Degree of

Master of Science
Mathematics

The University of New Mexico

Albuquerque, New Mexico

May, 2011
Optimal Control Problems
Curves of Pursuit

by

Svetlana Moiseeva

B.S. Mathematics,
Peoples’ Friendship University of Russia, 2008

M.S., Mathematics, University of New Mexico, 2011

Abstract

We study a class of problems known as pursuit-evasion problems (PE). These problems can be understood as special cases of optimal control problems. After describing the two main principles to study optimal control problems, namely Pontryagin's maximum principle and Bellman's method of dynamic programming, this thesis focuses on specific examples of PE problems within the classes of pursuit problems, evasion problems, and pursuit-evasion problems.

Contents

List of Figures

1 Introduction
1.1 Overview

2 Optimal Control Processes
2.1 Formulation of the Optimal Control Problem
2.2 Necessary Conditions for Optimality: Pontryagin's Maximum Principle
2.3 Bellman's Method of Dynamic Programming
2.4 The relation between Pontryagin's Maximum Principle and Bellman's Method of Dynamic Programming

3 The Pursuit Problem
3.1 Statement of the Problem
3.2 Pierre Bouguer's Pursuit Problem
3.3 Wind-Blown Plane Problem
3.4 The Tractrix
3.5 Apollonius Pursuit Problem

4 The Evasion Problem
4.1 Statement of the Problem
4.2 Isaacs's Problem
4.3 Lady in the Lake Problem

5 Pursuit-Evasion Problem as an Optimal Control Problem
5.1 Basic Concepts
5.2 Simple Pursuit in the Plane
5.3 One-dimensional Rocket Chase
5.4 Pursuit on a Sphere (Kelley's game)

6 Conclusions

References
List of Figures

2.1 Reformulation of the problem given by equation (2.1), where line l is passing through the point (0, x1 ) and is parallel to the x0 axis, i.e., this line is made up of all the points (ξ, x1 ) where the number ξ is arbitrary

2.2 The route of the ship sailing from a to e

2.3 Two one-stage problems in the first subproblem

2.4 Three two-stage problems

2.5 Two three-stage problems

2.6 Four-stage problem

2.7 Bang-bang time-optimal control: trajectories for u = 1 of parabolas given by equation (2.40)

2.8 Bang-bang time-optimal control: trajectories for u = −1 of parabolas given by equation (2.41)

2.9 Bang-bang time-optimal control: u(t) is initially equal to +1, and then to −1; the phase trajectory consists of two adjoining parabolic segments given by equations (2.40) and (2.41), respectively

2.10 Bang-bang time-optimal control: u(t) is initially equal to −1, and then to +1; the phase trajectory consists of two adjoining parabolic segments given by equations (2.41) and (2.40), respectively

2.11 Bang-bang time-optimal control: the switching curve and the family of phase trajectories we obtained (AO is the arc of the parabola x1 = (1/2)(x2 )^2 in the lower half-plane, BO is the arc of the parabola x1 = −(1/2)(x2 )^2 in the upper half-plane)

3.1 The geometry of Bouguer's pursuit problem about a pirate ship moving directly toward the merchant vessel at constant speed Vp along a curved path and pursuing a merchant vessel travelling at constant speed Vm along the vertical line x = x0

3.2 The path of the pirate ship as given by equation (3.15) for n = 3/4

3.3 The geometry of the tail chase as given by equation (3.17)

3.4 The geometry of the wind-blown plane problem, where the plane's nose is always pointed toward a city C, the plane's speed is v mi/h, and a wind is blowing from the south at the rate of w mi/h

3.5 Plots of the wind-blown plane's paths given by equations (3.25) for several values of n < 1 (n = 0.1, 0.2, 0.4, 0.8, 0.95, 0.99, 0.999)

3.6 The geometry of the tractrix problem, where a watch-on-a-chain with the chain of length a is initially on the y-axis, and the end of the chain is pulled along the x-axis from the initial position at the origin

3.7 A depiction of the tractrix given by equation (3.31) for a = 1

3.8 Schematic of the pursuit by interception problem with pursuer T (Torpedo) and evader E (Enemy ship) moving with constant speeds VT and VE , respectively

3.9 The Apollonius circle centered on (2/3, 0) with radius 2/3, given by equation (3.34) for m = 1, p = 2, and k = 2, so that the torpedo is located at T(2, 0) and the enemy ship is at E(1, 0)

3.10 The Apollonius circle centered on (7/3, 0) with radius 2/3, given by equation (3.34) for m = 1, p = 2, and k = 1/2, so that the torpedo is located at T(2, 0) and the enemy ship is at E(1, 0)

3.11 The general geometry for a slow torpedo (T) interception of a fast enemy surface ship (E) (heading with an angle θ), where the Apollonius circle for the points (m, 0) and (p, 0), p > m, is given by equation (3.34) for k < 1

4.1 P and E defending and attacking, respectively, the target area C

4.2 The geometry of Isaacs's problem for P defending a point target C, and E attacking the same target; l1 is the perpendicular bisector of P E, l2 is the perpendicular line segment from C to l1

4.3 Plot of F (x0 /xc , y0 /xc ) given in equation (4.8) as a function of x0 /xc for a given fixed value of y0 /xc (y0 /xc = 1.1, 2, 3, 4, 5, 6, 7). Each curve gives the minimum value of R/xc

4.4 The first stage of the lady's escape

4.5 The instant when the lady reaches her go-for-broke circle

4.6 The radius of the lake R(φ) (φ in radians) given by equation (4.12)

5.1 Simple motion in the plane. Point x moves anywhere within Ax (3) at time t = 3; if its position y at t = 1 or z at t = 2 is known, the possibilities reduce to Ay (2) or Az (1), respectively

5.2 Phase portrait of motion in ẍ = u in the x-y plane, where x(t) is given by equation (5.15) for x(0) = 0, ẋ(0) = 2; attainability sets at t = 2/3, 4/3, 2, 8/3 for the same initial values x(0) and ẋ(0). The vertex loci are parabolas ẋ = y = ±√(2(x + 2))

5.3 Trajectories of ẋ = y − v, ẏ = u in the x-y plane with u = v = ±1 outside target |x| ≤ ε

5.4 Trajectories of ẋ = y − v, ẏ = u in the x-y plane. From point a the evader mistakenly chooses v = −1, but reverses his choice at b; capture occurs at c (later than it would have occurred at d)
Chapter 1

Introduction

1.1 Overview

A pursuit-evasion (PE) problem refers to a family of mathematical problems in which one group (Pursuers) attempts to track down members of another group (Evaders) in an environment. There are different formulations of these problems, and each formulation uses some specific features of the pursuit-evasion situation. Our objective in this thesis is to study a variety of problems that encompass the main pursuit-evasion problems from the point of view of the motion and strategy of the pursuer (Chapter 3), the evader (Chapter 4), and both (Chapter 5).

Pursuit-evasion (PE) problems can be approached stochastically or deterministically. With the stochastic approach (for a broader discussion see [16], [22], [24] and [25]), it is more realistic to assume knowledge of the probability characteristics of target detection, whereas the deterministic approach works with trajectories and control parameters that can be chosen so as to minimize or maximize a given quantity. In this thesis we restrict the discussion to the deterministic approach, and state PE problems as optimal control problems, where we speak about optimality in the sense of rapidity of action, i.e., about achieving the target in the shortest time (Chapter 3), or avoiding the chaser as long as possible (Chapter 4), or both (Chapter 5).

Because the pursuit-evasion (PE) problem can be understood as a special case of the more general class of problems known as optimal control processes, we devote Chapter 2 to the formulation of the general optimal control problem and a discussion of the two main approaches to solve this problem, namely Pontryagin's maximum principle (Theorems 2.2.1, 2.2.2) and Bellman's equation (2.29). Pontryagin's maximum principle was discovered in the late 1950s ([24]) by the Russian mathematician Lev Semenovich Pontryagin.¹ The maximum principle is an effective tool in solving a broad range of control problems; we state it for the important time-optimal case (see sections 2.1, 2.2). Shortly before the appearance of Pontryagin's maximum principle in the late 1950s ([2], [3], [9], [16]), the American mathematician Richard Bellman published his Dynamic Programming [2], [3], [15], [16]. He constructed a partial differential equation for the functional that gives us the minimum time when we transfer the controlled object from the initial state to some other given point (see equation (2.6)). This equation of Bellman's gives rise to another approach to the solution of optimal control problems (see section 2.3). It must be noted, though, that the assumption on the continuous differentiability of the functional (2.6) does not hold even in the simplest cases. Thus, Bellman's consideration yields a good heuristic method, rather than a mathematical solution of the problem. The maximum principle, in addition to its complete mathematical validity, also has the advantage that it results in a system of ordinary differential equations, whereas Bellman's approach requires the solution of a partial differential equation. Both approaches will be discussed and compared in Chapter 2.

¹Pontryagin's maximum principle gave birth to optimal control theory, which at present is a vital area in applied mathematics. Pontryagin was led to the formulation of the general time-optimal control problem by an attempt to solve a concrete fifth-order system of ordinary differential equations with three control parameters related to optimal maneuvers of an aircraft, which was proposed to him by the Russian Air Force in the early spring of 1955 [10]. Right after the formulation of the time-optimal control problem, during three days, or better to say, during three sleepless nights (Pontryagin suffered from severe insomnia and very often used to do math in bed all night long), the first and most important step toward the final solution (Pontryagin's maximum principle) was made by Pontryagin [10]. He derived the first version of the necessary conditions.

Even though we are not going to apply Pontryagin's or Bellman's general approach literally to the specific examples discussed in this thesis, we introduce them because they provide the appropriate framework for the formulation of the pursuit-evasion problem.

We start Chapter 2 by discussing optimal control processes. A process is called controlled if it can be described by a vector differential equation with a control parameter and a phase point. The problem then is to choose the control, as a function of time, so that the corresponding trajectory of the given differential equation is shifted from a given initial point to some other given point in minimum time. In this case the control and its corresponding trajectory are called optimal. Another important class of problems, the time-optimal control problem, is also defined in Chapter 2. In section 2.2 (Chapter 2) we introduce a Hamiltonian function in order to state Pontryagin's maximum principle (Theorems 2.2.1, 2.2.2), which includes the important maximum condition.

The rest of this thesis is organized as follows: in Chapter 3 we present a definition of the pursuit problem, and provide examples of the pursuit problem (Bouguer's problem (section 3.2), the wind-blown plane problem (section 3.3), the tractrix (section 3.4), and Apollonius pursuit (section 3.5)). In Chapter 4 we state a definition of the evasion problem, and give examples of the evasion problem (Isaacs's problem (section 4.2), and the lady in the lake problem (section 4.3)). In Chapter 5 we present a definition of the pursuit-evasion problem, and examples of these problems (simple pursuit in the plane (section 5.2), the one-dimensional rocket chase (section 5.3), and Kelley's game (section 5.4)). Moreover, we state a possible method of solving pursuit-evasion problems using Pontryagin's maximum principle. Although this theorem gives the necessary conditions for optimality of pursuit-evasion problems (and it can be generalized to multiple pursuers and multiple evaders, as in [7]), the fact is that the PE problems studied in the current thesis can be analyzed directly by more elementary methods. Finally, in Chapter 6 we summarize the results of the thesis.

Chapter 2

Optimal Control Processes

Since the pursuit-evasion (PE) problem can be understood as a special case of the
more general class of problems known as optimal control processes, we are going to
devote this chapter to the formulation of the general optimal control problem and
a discussion of the two main approaches to solve this problem, namely Pontryagin’s
maximum principle (Theorems 2.2.1, 2.2.2) and Bellman’s equation (2.29).

2.1 Formulation of the Optimal Control Problem

A desirable property of most technological processes is controllability, which roughly speaking means that a particular process can be realized by a proper adjustment of certain control parameters. Most important is the search, among all the controllable processes, for the control that optimizes a related function of this process. This problem is known as the optimal control problem. For example, one can speak about optimality in the sense of spending the least possible time or using the minimum energy in order to reach the target. These problems can be formulated mathematically, and their solution is given by a general method known as Pontryagin's maximum principle (Theorems 2.2.1, 2.2.2) ([10], [21], [22], [23]).

To start, we consider control processes which can be described by a system of ordinary differential equations

dxi /dt = f i (x1 , ..., xn , u1 , ..., ur ) = f i (xk , uj ), i, k = 1, ..., n, j = 1, ..., r, (2.1)
or in vector form,
dx/dt = f (x, u). (2.2)
The variables x1 , ..., xn characterize the process, and they are known as the phase
coordinates of the controlled object which define its state at each instant of time t.
Giving a point u = (u1 , ..., ur ) ∈ U ⊂ Rr is equivalent to giving a numerical system of
parameters u1 , ..., ur , and they are known as the control parameters which determine
the course of the process. The functions f i are defined for x ∈ X ⊂ Rn and u ∈
U ⊂ Rr . They are assumed to be continuous in the variables x1 , ..., xn , u1 , ..., ur , and
continuously differentiable with respect to x1 , ..., xn . In other words, the functions
f i (x1 , ..., xn , u) and ∂f i (x1 , ..., xn , u)/∂xj , i, j = 1, ..., n,
are defined and continuous everywhere on the direct product X × U.

In order to find a solution of equation (2.1) and determine the course of the control process (2.1) in a certain time interval t0 ≤ t ≤ t1 , it is sufficient to specify the control parameters u1 , ..., ur as functions of time on this interval:

uj = uj (t), j = 1, ..., r. (2.3)

Then, for the given initial values

xi (t0 ) = xi0 , i = 1, ..., n, (2.4)

the solution is uniquely determined, at least locally in time. Hence, we say that a
control

U = (uj (t), t0 , t1 , xi0 ), j = 1, ..., r, i = 1, ..., n (2.5)


of equation (2.1) is given, if a function uj (t), its range of definition t0 ≤ t ≤ t1 , and the initial values (2.4) of the solution xi (t) are given. Throughout, we deal only with piecewise continuous control functions uj (t), which admit discontinuities of the first kind, and with continuous solutions of equation (2.2).
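To make these definitions concrete, here is a minimal Python sketch (an added illustration, not part of the original text; the dynamics, switching time, and step size are arbitrary choices) that integrates equation (2.2) for a given piecewise-continuous control u(t):

import numpy as np

def simulate(f, u, x0, t0, t1, dt=1e-3):
    # Fixed-step RK4 integration of dx/dt = f(x, u(t)) for a given
    # piecewise-continuous control u(t) and initial state x(t0) = x0.
    x = np.asarray(x0, dtype=float)
    t, path = t0, [(t0, x.copy())]
    while t < t1:
        h = min(dt, t1 - t)
        k1 = f(x, u(t))
        k2 = f(x + 0.5 * h * k1, u(t + 0.5 * h))
        k3 = f(x + 0.5 * h * k2, u(t + 0.5 * h))
        k4 = f(x + h * k3, u(t + h))
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        path.append((t, x.copy()))
    return path

# Example: dx1/dt = x2, dx2/dt = u, under a control with a single
# discontinuity of the first kind (a switch from u = +1 to u = -1 at t = 1).
f = lambda x, u: np.array([x[1], u])
u = lambda t: 1.0 if t < 1.0 else -1.0
print(simulate(f, u, x0=[0.0, 0.0], t0=0.0, t1=2.0)[-1])

Given the control (2.3) and the initial values (2.4), the trajectory is determined uniquely, which is exactly what the integration loop computes step by step.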

The control problem to be solved, which is related to the control process (2.1),
consists of the following. We consider the integral function

Zt1
¡ ¢
L(U ) = f 0 x1 , ..., xn , u1 , ..., ur dt, (2.6)
t0

where f 0 (x1 , ..., xn , u1 , ..., ur ) is a given function, continuous, together with its partial
derivatives
∂f 0 /∂xj , j = 1, ..., n,

everywhere on the space X × U . For each control (2.5), given on a certain interval
t0 ≤ t ≤ t1 , the course of the control processes is uniquely determined, at least locally
in time, and the integral (2.6) takes on a definite value. Let us assume that there
exists a control (2.5) which transfers the controlled object from a given initial phase
state xi0 (2.4) to a prescribed terminal phase state

xi (t1 ) = xi1 , i = 1, ..., n. (2.7)

It is required to find a control u(t) which transfers the controlled object from state
xi0 to state xi1 in such a way that the functional L(U ) has a minimum value. Thus,
L is a function of the control U .

Let us summarize the above discussion and state the definition of the optimal
control problem (equations (2.2), (2.4), (2.5), (2.6), (2.7)).


Definition (Optimal Control Problem) An optimal control problem is a problem given by the equations

dx/dt = f (x, u),

x(t0 ) = x0 ,

x(t1 ) = x1 ,

U = (u(t), t0 , t1 , x0 ),

L(U ) = ∫_{t0}^{t1} f 0 (x, u) dt,

where x = (x1 , ..., xn ) ∈ X ⊂ Rn , u = (u1 , ..., ur ) ∈ U ⊂ Rr is some piecewise continuous function, f = (f 1 , ..., f n ) are continuous, together with their partial derivatives, everywhere on the space X × U , and f 0 (x, u) is a given function (also continuous, together with its partial derivatives, everywhere on the space X × U).

Definition A control U = (u(t), t0 , t1 , xi0 ) is called optimal, if, for any control

U ∗ = (u∗ (t), t0 , t1 , xi0 )

which transfers the point xi0 to the point xi1 , the inequality

L(U ) ≤ L(U ∗ )

is valid. The corresponding trajectory x(t) is called an optimal trajectory.

Thus, an optimal control problem consists of finding the optimal controls and the
corresponding optimal trajectories.

Remark 1. The times t0 and t1 are not fixed; we only require that the object should be in state (2.4) at the initial time, and in state (2.7) at the final time, and that the functional (2.6) should achieve a minimum. (The discussion of the case where the times t0 and t1 are fixed can be found in [24], §8.)


Remark 2. If (2.5) is an optimal control, x(t) is the trajectory of equation (2.2) corresponding to this control, and t2 , t3 (t2 < t3 ) are two points in the interval t0 ≤ t ≤ t1 , then

U ′ = (u(t), t2 , t3 , xi (t2 ))

is also an optimal control.

Remark 3. If (2.5) is an optimal control of equation (2.2) that transfers the point xi0 to the point xi1 , and τ is an arbitrary number, then

U ′′ = (u(t − τ ), t0 + τ, t1 + τ, xi0 )

is also an optimal control which transfers the point xi0 to xi1 .

Definition (Time-Optimal Control Problem) When the function f 0 (xi , uj ) is defined by the equation

f 0 (x, u) ≡ 1, (2.8)

the functional of the control (2.5) in this case is

L(U ) = t1 − t0 ,

and the optimality of the control u(t) signifies minimality of the transition time from x0 to x1 . The problem of finding optimal controls (and trajectories) in this case is called the time-optimal control problem.

We should point out that up to now we have spoken about an optimal control which brought the object to a given point. However, the optimal control problem may consist of “optimally getting to” a moving point in phase space. Let us assume that there exists a moving point

xi = θi (t), i = 1, ..., n, (2.9)


in phase space. Then, there arises the problem of optimally bringing the object into coincidence with the moving point. This problem is easily reduced to the one considered above. It is sufficient to introduce new variables by setting

y i = xi − θi (t), i = 1, ..., n.

As a result of this transformation, the control system

dxi /dt = f i (xi , uj ), i = 1, ..., n, j = 1, ..., r,

becomes the new system dy i /dt = f i (y k + θk (t), uj ) − dθi (t)/dt, whose right-hand sides now depend explicitly on time. The goal of the control process becomes that of bringing the new object (y 1 , ..., y n ) to the stationary point (0, ..., 0) in phase space.

Of great importance is the case where U ⊂ Rr is a compact domain. This is


clearly the case in most practical applications, where the control parameters can
only take values with predetermined upper and lower bounds. For example, U may
be a cube defined by the inequalities

|uj | ≤ 1, j = 1, ..., r.

In many instances it turns out that the optimal control (2.5) is realized by a piecewise
constant control (u1 (t), ..., ur (t)) with values switching between various vertices of U.

It follows that the class of admissible controls (2.5) must include piecewise continuous functions. For the same reason, the phase coordinates x1 , ..., xn are assumed to be continuous and piecewise differentiable functions of time. Under these assumptions the necessary conditions for optimality are formulated in the form of Pontryagin's maximum principle (Theorems 2.2.1, 2.2.2) ([22], [23], [24]), which we
will present in the next section.


2.2 Necessary Conditions for Optimality: Pontryagin's Maximum Principle

In order to formulate the necessary optimality condition it will be convenient to reformulate our optimal control problem (for a broader discussion see [24]). Namely, let us adjoin a new coordinate x0 to the phase coordinates x1 , ..., xn , which vary according to (2.1). Let x0 vary according to the law

dx0 /dt = f 0 (x1 , ..., xn , u1 , ..., ur ),

where f 0 is the function which appears in the definition of the functional L(U ) (see
(2.6)). In other words, we shall consider the system of differential equations

dxi /dt = f i (x1 , ..., xn , u1 , ..., ur ) = f i (x, u), i = 0, 1, ..., n, (2.10)

whose right-hand sides do not depend on x0 . Introducing the vector

x = (x0 , x1 , ..., xn ) = (x0 , x)

in the (n + 1)-dimensional vector space X = R × X ⊆ Rn+1 , we may rewrite system (2.10) in vector form

dx/dt = f(x, u), (2.11)

where f(x, u) is the vector in X with coordinates f 0 (x, u), ..., f n (x, u). Note that f(x, u) does not depend on the coordinate x0 of the vector x = (x0 , x); its value is determined by x and u alone.

Now let u(t) be an admissible control (2.5) (i.e., piecewise continuous) transferring
x0 to x1 , and let x = x(t) be the corresponding solution of equation (2.2) with initial
condition x(t0 ) = x0 . Let us denote the point (0, x0 ) by x0 , i.e., x0 is the point of
X whose coordinates are 0, x10 , ..., xn0 , where x10 , ..., xn0 are the coordinates of x0 in X .


Figure 2.1: Reformulation of the problem given by equation (2.1), where line l is
passing through the point (0, x1 ) and is parallel to the x0 axis, i.e., this line is made
up of all the points (ξ, x1 ) where the number ξ is arbitrary

Then, it is clear that the solution of equation (2.11) with initial condition x(t0 ) = x0 , corresponding to the control u(t), is defined on the entire interval t0 ≤ t ≤ t1 , and has the form

x0 = ∫_{t0}^{t} f 0 (x(t′), u(t′)) dt′, x = x(t).

In particular, when t = t1 ,

x0 = ∫_{t0}^{t1} f 0 (x(t), u(t)) dt = L(U ), x = x1 ,

i.e., the solution x(t) of equation (2.11) with initial condition x(t0 ) = x0 passes
through the point x = (L(U ), x1 ) at t = t1 . In other words, if we let l be the line
in X passing through the point x = (0, x1 ) and parallel to the x0 axis (this line is
made up of all the points (ξ, x1 ) where the number ξ is arbitrary, see Figure 2.1), we
can say that x(t) passes through a point on line l, with coordinate x0 = L(U ), at
the time t = t1 . Conversely, suppose that u(t) is an admissible control (i.e., at least


piecewise continuous) such that the corresponding solution x(t) of equation (2.11)
with initial condition x(t0 ) = x0 = (0, x0 ), at some time t1 passes through a point
x1 ∈ l, with coordinate x0 = L(U ). Then, the control u(t) transfers (in X ) the phase
point from x0 to x1 , and the functional (2.6) takes on the value L(U ).

Thus, we may formulate the above optimal problem (from 2.1) in the following
equivalent form.

In the (n + 1)-dimensional phase space X the point x0 = (0, x0 ) and the line l
are given. The line l is assumed to be parallel to the x0 axis, and to pass through the
point (0, x1 ). Among all the admissible controls u = u(t), having the property that
the corresponding solution x(t) of (2.11) with initial condition x(t0 ) = x0 intersects
l, find one whose point of intersection with l has the smallest coordinate x0 (see [24]).

Let us now proceed to the formulation of the theorem which yields the necessary conditions of the problem. (The proof of this theorem can be found in [24], Chapter II.) To formulate the theorem, we shall consider, in addition to the fundamental system of equations (2.10), another system of equations in the auxiliary (supplementary) variables ψ0 , ψ1 , ..., ψn :

dψi /dt = − Σ_{α=0}^{n} (∂f α (x, u)/∂xi ) ψα , i = 0, 1, ..., n. (2.12)

If we choose an admissible control u(t), t0 ≤ t ≤ t1 , and have the corresponding phase trajectory x(t) of system (2.10) with initial condition x(t0 ) = x0 , system (2.12) takes the form

dψi /dt = − Σ_{α=0}^{n} (∂f α (x(t), u(t))/∂xi ) ψα , i = 0, 1, ..., n. (2.13)

This system is linear and homogeneous. Therefore, for any initial condition, it admits
the unique solution
ψ = (ψ0 , ψ1 , ..., ψn )


for the ψi (which is defined on the entire interval t0 ≤ t ≤ t1 on which u(t) and x(t)
are defined). Similarly to the solution x(t) of system (2.11), the solution of system
(2.13) consists of continuous functions ψi (t) which have everywhere, except at a
finite number of points (namely, at the points of discontinuity of u(t)), continuous
derivatives with respect to t. Each solution of system (2.13) for any initial conditions
will be called the solution of system (2.12) corresponding to the chosen control u(t)
and phase trajectory x(t).

Now we will combine systems (2.10) and (2.12) into one entity. We consider the following function H of the variables x0 , x1 , ..., xn , ψ0 , ψ1 , ..., ψn , u1 , ..., ur :

H(ψ, x, u) = (ψ, f(x, u)) = Σ_{α=0}^{n} ψα f α (x, u).

The above systems (2.10) and (2.12) can be rewritten with the aid of the function H in the form of the following Hamiltonian system:

dxi /dt = ∂H/∂ψi , i = 0, ..., n, (2.14)

dψi /dt = −∂H/∂xi , i = 0, ..., n. (2.15)

For fixed (constant) values of ψ and x, the function H becomes a function of the
parameter u ∈ U. Let us now denote the least upper bound of the values of this
function by M(ψ, x):

M(ψ, x) = sup_{u∈U} H(ψ, x, u).

If the continuous function H achieves its upper bound on U, then M(ψ, x) is the
maximum of the values of H, for fixed ψ and x. Therefore, Theorem 2.2.1 below
(a necessary condition for optimality) will be called the maximum principle (the
principal content of the principle is in equation (2.16)) [10], [24].
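When the control region U is a box (or a finite set) and H is linear in the control, the supremum M(ψ, x) is attained at a vertex of U and can be computed directly. The following small Python sketch (an added illustration, not taken from the thesis; the numerical values are arbitrary) does this for the time-optimal double integrator treated in section 2.4, where f = (f 0 , f 1 , f 2 ) = (1, x2 , u) and |u| ≤ 1:

# H(psi, x, u) = psi0*f^0 + psi1*f^1 + psi2*f^2 for f = (1, x2, u),
# i.e. the time-optimal problem (2.8) applied to the double integrator.
def H(psi, x, u):
    return psi[0] * 1.0 + psi[1] * x[1] + psi[2] * u

def M(psi, x):
    # H is linear in u, so its supremum over |u| <= 1 sits at u = +1 or u = -1.
    return max(H(psi, x, u) for u in (-1.0, 1.0))

psi, x = (-1.0, 0.5, -2.0), (1.0, 3.0)   # arbitrary test values
u_star = 1.0 if psi[2] > 0 else -1.0     # maximizer: u = sign(psi2)
assert abs(H(psi, x, u_star) - M(psi, x)) < 1e-12
print(M(psi, x))                         # 2.5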


Theorem 2.2.1 (Pontryagin’s Maximum Principle) Let u(t), t0 ≤ t ≤ t1 , be


an admissible control such that the corresponding trajectory x(t) [see (2.14)] which
begins at the point x0 at the time t0 passes, at some time t1 , through a point on
the line l. In order that u(t) and x(t) be optimal it is necessary that there exists a
nonzero continuous vector function ψ(t) = (ψ0 (t), ψ1 (t), ..., ψn (t)) corresponding to
u(t) and x(t) [see (2.15)], such that:

(a) for every t, t0 ≤ t ≤ t1 , the function H(ψ(t), x(t), u) of the variable u ∈ U attains its maximum at the point u = u(t):

H(ψ(t), x(t), u(t)) = M(ψ(t), x(t)), (2.16)

(b) at the terminal time t1 the relations

ψ0 (t1 ) ≤ 0, M(ψ(t1 ), x(t1 )) = 0 (2.17)

are satisfied. Furthermore, it turns out that if ψ(t), x(t), and u(t) satisfy system
(2.14), (2.15), and condition (a), the time functions ψ0 (t) and M(ψ(t), x(t)) are
constant. Thus, (2.17) may be verified at any time t, t0 ≤ t ≤ t1 , and not just at t1 .

The proof of the Theorem 2.2.1 can be found in [24], Chapter II.

To formulate the necessary condition for the time-optimal problem (equation (2.8)), where

f 0 (x, u) ≡ 1,

let us form the Hamiltonian function

H = ψ0 + Σ_{ν=1}^{n} ψν f ν (x, u).

Introducing the n-dimensional vector ψ = (ψ1 , ..., ψn ) and the function

H(ψ, x, u) = Σ_{ν=1}^{n} ψν f ν (x, u),


we can rewrite equations (2.1) and (2.12) (with the exception of equation (2.12) for i = 0, which is now superfluous) in the form of the Hamiltonian system

dxi /dt = ∂H/∂ψi , i = 1, ..., n, (2.18)

dψi /dt = −∂H/∂xi , i = 1, ..., n. (2.19)

For fixed values of ψ and x, H is a function of u. We denote the upper bound of the values of this function by M (ψ, x):

M (ψ, x) = sup_{u∈U} H(ψ, x, u).

Since this function differs from the full Hamiltonian formed above only by the term ψ0 ,

H(ψ, x, u) = H(ψ, x, u) − ψ0 ,

we get

M (ψ, x) = M(ψ, x) − ψ0 ,

and therefore (2.16) and (2.17) become

H(ψ(t), x(t), u(t)) = M (ψ(t), x(t)) = −ψ0 ≥ 0.

Hence, we obtain the following theorem.

Theorem 2.2.2 (Pontryagin’s Maximum Principle for the time-optimal


control problem (2.8)) Let u(t), t0 ≤ t ≤ t1 be an admissible control which
transfers the phase point from x0 to x1 , and let x(t) be the corresponding trajec-
tory (see (2.18)), so that x(t0 ) = x0 , x(t1 ) = x1 . In order that u(t) and x(t) be
time-optimal it is necessary that there exist a nonzero, continuous vector function
ψ(t) = (ψ1 (t), ..., ψn (t)) corresponding to u(t) and x(t) (see (2.19)) such that:

(a) for all t, t0 ≤ t ≤ t1 , the function H(ψ(t), x(t), u) of the variable u ∈ U


attains its maximum at the point u = u(t):

H(ψ(t), x(t), u(t)) = M (ψ(t), x(t)), (2.20)


(b) at the terminal time t1 the relation

M (ψ(t1 ), x(t1 )) ≥ 0 (2.21)

is satisfied. Furthermore, it turns out that if ψ(t), x(t), and u(t) satisfy system
(2.18), (2.19), and condition (a), the time function M (ψ(t), x(t)) is constant. Thus,
(2.21) may be verified at any time t, t0 ≤ t ≤ t1 , and not just at t1 .


Figure 2.2: The route of the ship sailing from a to e

2.3 Bellman’s Method of Dynamic Programming

Shortly before the appearance of Pontryagin's maximum principle in the late 1950s, R. Bellman published his Dynamic Programming [2], [3], [15], [16], which presents a related but different approach to the optimum design of control systems, one that is more efficient in some situations. The following simple example will illustrate some of the main ideas behind this dynamic programming approach.

Example Suppose a ship sailing from a and ending at e calls at three ports (at either of the two b's, at one of the three c's, and at one of the two d's) along the way (as shown in Figure 2.2), and picks up and delivers the amounts of cargo (in hundreds of tons) indicated in the figure. The objective is to deliver as much cargo as possible on the entire
trip. Since there are only 12 different routes, it is a simple matter to list them all
and choose the route that yields the maximum tonnage. However, we shall solve
the problem differently and use the following reasoning. Suppose that, somehow we
were to know the maximum tonnage values of the two shorter problems, one from b1 to e and the other from b2 to e; then it would be very easy to decide on the entire
route. There are only two possible decisions left to be made at a: go to b1 or go
to b2 . To reach such a decision, simply add 4 to that maximum tonnage from b1
to e that we somehow learned, add 2 to that maximum tonnage from b2 to e, and
choose the route that gives the larger value. In other words, we will have solved the
original four-stage problem by first solving two three-stage problems. Similarly, each
of these two three-stage problems (from b1 to e or from b2 to e) would be relatively
easy to solve if we were to first solve three two-stage problems, namely, find the value
given by the maximum tonnage path from each ci , i = 1, 2, 3, to e. We continue this
reasoning and reduce the process to two one-stage problems, from d1 to e or from d2 to e, at which stage the answer is obvious: go from d1 , because 7 is larger than 4.

Let us do the problem formally. We will break it into several n-stage problems, n = 1, 2, 3, 4. Notice that there are four stages: from a to b, b to c, c to d, and d
to e. There are two possible terminal ports, or states as we will call them, namely
b1 and b2 in stage one, three states c1 , c2 , and c3 in stage two, two states d1 and
d2 in stage three, and one state e in the last stage. Each of these states may also
be thought of as the initial state for the following stages. For instance, b1 may be
considered the initial state of a three-stage problem, c1 the initial state of a two-stage
problem, etc. Let the variable x stand for the initial state for any n-stage problem,
n = 1, 2, 3, 4. For instance, for a two-stage problem, x may be either c1 , or c2 , or c3 .
Associated with each problem is also a decision or control variable un , n = 1, 2, 3,
4, which chooses the immediate destination when there are n stages left to go. Thus,
u4 chooses b1 or b2 , u3 chooses c1 or c2 or c3 , u2 chooses d1 or d2 , and u1 = e. Let
fn (x, un ) be the total number of tons delivered during the last n stages, given that the boat is in state x and the decision is un . If ūn is the decision which maximizes fn (x, un ) for fixed n and x, let f̄n (x) be that maximum value of fn . Since f̄n is the maximum value with respect to the decision variable un , it is now a function of the initial state variable x alone; hence the notation f̄n (x).


Figure 2.3: Two one-stage problems in the first subproblem

Figure 2.4: Three two-stage problems

In the first subproblem, there is only one stage left to go, and ū1 = u1 = e. The
initial states are d1 and d2 , as shown in Figure 2.3.

We move now to the three subproblems in each of which there are two stages to
go, but we utilize the knowledge gained from the one-stage problem. If the boat is
at c1 (x = c1 ), it can proceed to either d1 (u2 = d1 ) or d2 (u2 = d2 ). If u2 = d1 ,
f2 (c1 , d1 ) = 7 + 7 = 14. If u2 = d2 , f2 (c1 , d2 ) = 3 + 4 = 7. Since 14 > 7, ū2 should be
ū2 = d1 . Similarly, if the boat is at c2 (x = c2 ) and u2 = d1 , f2 (c2 , d1 ) = 8 + 7 = 15,
while f2 (c2 , d2 ) = 4 + 4 = 8 if u2 = d2 . Let sun be the number of tons of cargo delivered as a result of decision un . Then f2 (x, u2 ) = su2 + f̄1 (u2 ). Figure 2.4 shows the values for the different states and decisions.

Next, we move to the two subproblems in each of which there are three stages to go, and again we utilize the knowledge gained from the previous two-stage problems. If the boat is at b1 (x = b1 ) and it is decided to go to c1 (u3 = c1 ), the total number of tons delivered would be 1, that between b1 and c1 , plus 14, the maximum number of tons to be delivered between c1 and e. That is, f3 (b1 , c1 ) = sc1 + f̄2 (c1 ), or f3 (x, u3 ) = su3 + f̄2 (u3 ). See Figure 2.5.

Figure 2.5: Two three-stage problems

Figure 2.6: Four-stage problem

The final, or four-stage, problem should now be clear. So what is the optimal
policy for the overall problem? Retrace the steps backwards starting with Figure
2.6. Starting at a, the optimal decision ū4 is to go to b2 . At b2 , ū3 tells us to go to
c2 . At c2 , ū2 tells us to go to d1 . At d1 , ū1 says to go to e. Thus, the optimal route
is a → b2 → c2 → d1 → e, with a maximum tonnage of 25.

There are only four stages in this example, and each stage has very few states, so that the computational advantages of the dynamic programming approach over the direct, brute-force approach of listing all twelve possible routes may not be apparent. If a problem has many stages with many states, thus involving many decision processes, direct enumeration may require a phenomenal amount of work, and the computational savings of the dynamic programming approach are considerable. It has been shown that for a 20-stage problem with only 2 states in each stage, direct enumeration generates more than 1,000,000 additions, while dynamic programming requires only 220 additions.

The above example is a discrete multistage decision process problem, in which one
chooses a decision from a finite set of decisions at each of a finite number of stages or
times. Initially, the problem consisted of n stages, but we reduced it to a sequence of
n single stage decision processes, for each of which there is an optimal policy. These
problems are joined together by a functional equation. For this particular example,
the functional equation is

fn (x, un ) = sun + f̄n−1 (un ), (2.22)

where

f̄n (x) = max_{un} fn (x, un ), n = 1, 2, 3, 4.

Hence, we use two basic ideas, Bellman’s principle of optimality and the principle of
imbedding [16].
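As an added illustration (not part of the thesis), the backward recursion (2.22) can be written out in a few lines of Python. Only some of the leg tonnages are recoverable from the text above; the values marked "assumed" below, and the omission of port c3 (whose legs do not change the optimum), are hypothetical choices made so that the result reproduces the answer obtained in this example.

# Backward dynamic programming on the ship-routing example, stage by
# stage, using the functional equation f_n(x, u_n) = s_{u_n} + f̄_{n-1}(u_n).
stages = [
    {('a', 'b1'): 4, ('a', 'b2'): 2},
    {('b1', 'c1'): 1, ('b1', 'c2'): 2,     # (b1, c2) assumed
     ('b2', 'c1'): 3, ('b2', 'c2'): 8},    # both legs from b2 assumed
    {('c1', 'd1'): 7, ('c1', 'd2'): 3,
     ('c2', 'd1'): 8, ('c2', 'd2'): 4},
    {('d1', 'e'): 7, ('d2', 'e'): 4},
]

def solve(stages):
    best = {'e': 0}                 # f̄_0(e) = 0: nothing left past e
    policy = {}
    for legs in reversed(stages):   # n = 1, 2, 3, 4 stages left to go
        f_bar = {}
        for (x, u), s in legs.items():        # f_n(x, u) = s + f̄_{n-1}(u)
            if s + best[u] > f_bar.get(x, float('-inf')):
                f_bar[x], policy[x] = s + best[u], u
        best = f_bar
    return best, policy

best, policy = solve(stages)
route, x = ['a'], 'a'
while x != 'e':
    x = policy[x]
    route.append(x)
print(best['a'], route)   # 25 ['a', 'b2', 'c2', 'd1', 'e'], as in the text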

Summarizing the method discussed in this example yields Bellman's Principle of optimality ([16]): in control systems with a multistage decision process, given any current state, the remaining sequence of decisions forms an optimal policy with this given state regarded as the initial state. Thus, whatever the initial states and decisions that led to this current state, all future decisions are optimal.

In our example, if we found ourselves at, say, state c1 (regardless of what decision
led us there), the policy c1 → d1 → e is optimal with c1 considered as the initial
state. Similarly, if we found ourselves at, say, state b1 , the policy b1 → c2 → d1 → e
would be optimal with b1 considered as the initial state. By applying this principle
of optimality backwards step by step repeatedly, we obtain a policy which is optimal
for the overall problem. In our example, in the one-stage problems, either decision d1 → e or d2 → e is optimal (actually, the only possible decision), depending on whether d1 or d2 is the initial state.
state, the decision ū2 : c1 → d1 is optimal, and the pair ū2 , ū1 : c1 → d1 → e
constitutes an optimal policy with c1 as the initial state. If c2 is the initial state,
the pair of decisions ū2 , ū1 : c2 → d1 → e is optimal. Similarly, for the three stage
problems, if b1 is the initial state, the optimal decision ū3 : b1 → c2 , coupled with the
optimal strategy from the two-stage problem ū2 , ū1 : c2 → d1 → e, form the optimal
strategy ū3 , ū2 , ū1 : b1 → c2 → d1 → e, etc.

The other principle we used in the above example is the principle of imbedding ([16]). The principle works as follows: instead of attempting to solve a difficult problem directly, one imbeds the problem in a family of simpler, easier-to-solve problems and obtains the solution to the original difficult problem from the solutions to the problems in the family.
optimality, each n-stage problem with n > 1 is converted into a one-stage problem
with its own initial state and optimal policy. This is done through the use of some
functional equation such as the relation given by (2.22), which, for each problem in
the family with its initial state, assigns an optimum value to that problem and links
that value with all immediately preceding states.

These two basic ideas, imbedding and the principle of optimality, are also to be found in the dynamic programming approach to continuous cases.

Next, let us write Bellman’s equation for a continuous time variable.

Proposition 2.3.1 Suppose we have a time-optimal control problem (2.8). Let us fix some point x1 of the space X , and let u(t), t0 ≤ t ≤ t1 , be an optimal control which transfers (through the law of motion xi = xi (t), i = 1, ..., n) the phase point from some position x0 ∈ X to the position x1 , and let x(t) be the corresponding optimal trajectory. The optimal transition time from the point x0 to the point x1 , t1 − t0 , will be denoted by T (x0 ). (The point x1 does not enter into the notation for the transition time, since it does not vary.) Thus, the function T (x0 ) is defined on the open set Ω of all points of X from which an optimal transition to x1 is possible. We set T (x) = −ω(x), and assume that T (x) has continuous partial derivatives with respect to the coordinates of the point x. We then derive that the function ω(x) satisfies the following nonclassical partial differential equation (which we shall call Bellman's equation) in the region Ω:

sup_{u∈U} Σ_{α=1}^{n} (∂ω(x)/∂xα ) f α (x, u) = 1. (2.23)

Furthermore, the upper bound is attained at some point u ∈ U (namely, at the value of the optimal control at the time of departure from the point x), and the function ω(x) is nonpositive and vanishes only at the point x1 .

Proof It is given that

ω(x) = −T (x). (2.24)

Since x(t), t0 ≤ t ≤ t1 , is an optimal trajectory, and since each portion of an optimal trajectory is also an optimal trajectory,

ω(x(t)) = −T (x0 ) + t − t0 (2.25)

for every t, t0 ≤ t ≤ t1 . Consequently, differentiating (2.25) with respect to t,

Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) f α (x(t), u(t)) = Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) (dxα /dt) = dω(x(t))/dt = 1. (2.26)

Now let v be an arbitrary point of the control region U. We shall consider the motion of the phase point from the position x(t) under the influence of a constant control which is equal to v. Here the problem can be imbedded into a family of problems, following the principle of imbedding discussed before. Namely, we divide the whole process into two control processes. Thus, after an infinitesimal time interval dt > 0, the phase point will be in the position x(t) + dx, where the vector dx = (dx1 , ..., dxn ) is defined by

dxi = f i (x(t), v)dt, i = 1, ..., n. (2.27)

If we now move in an optimal manner from the point x(t) + dx to the point x1 ,
the time spent in so doing will equal T (x(t) + dx). Hence, the total time spent in a
movement of this kind, while transferring from x(t) to x1 , is equal to T (x(t)+dx)+dt.
This time cannot be shorter than the optimal transition time T (x(t)), i.e.,

T (x(t) + dx) + dt ≥ T (x(t)),

or equivalently,
ω(x(t) + dx) − ω(x(t)) ≤ dt.

Expanding the left side to first order in dx, we have

ω(x(t) + dx) − ω(x(t)) = Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) dxα ,

and then, because of (2.27), the last inequality may be rewritten in the form

Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) f α (x(t), v) dt ≤ dt,

or

Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) f α (x(t), v) ≤ 1, v ∈ U. (2.28)

Relations (2.26) and (2.28) show that

sup_{v∈U} Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) f α (x(t), v) = 1,

and the upper bound is achieved at v = u(t).


Since an optimal trajectory leading to x1 passes through each point x of Ω, we arrive at the conclusion that the function ω(x) satisfies the following nonclassical partial differential equation (Bellman's equation) in the region Ω:

sup_{u∈U} Σ_{α=1}^{n} (∂ω(x)/∂xα ) f α (x, u) = 1. (2.29)

Furthermore, the upper bound is attained at some point u ∈ U (namely, at the value
of the optimal control at the time of departure from the point x), and the function
ω(x) is nonpositive and vanishes only at the point x1 .

This is the principle of dynamic programming as applied to the optimal control problem (for simplicity we considered the time-optimal control problem (2.8)).
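As a quick sanity check of (2.29) (an added illustration, not part of the thesis), consider the scalar system dx/dt = u with |u| ≤ 1 and target point x1 = 0. The optimal transition time is T (x) = |x|, so ω(x) = −|x|, and Bellman's equation can be verified numerically at any point where ω is differentiable:

# Check Bellman's equation (2.29) for dx/dt = u, |u| <= 1, target x1 = 0.
# Here T(x) = |x| is the optimal transition time, so omega(x) = -|x|.
omega = lambda x: -abs(x)

def bellman_lhs(x, h=1e-6):
    # The expression under the sup is omega'(x) * u, linear in u, so the
    # supremum over u in [-1, 1] is attained at u = +1 or u = -1.
    d_omega = (omega(x + h) - omega(x - h)) / (2 * h)   # central difference
    return max(d_omega * u for u in (-1.0, 1.0))

for x in (-2.0, -0.5, 0.3, 1.7):
    print(x, bellman_lhs(x))    # prints 1.0 (up to rounding) at each point

Note that ω(x) = −|x| is not differentiable at x = 0; the failure of such smoothness assumptions is precisely the difficulty taken up in the next section.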


2.4 The relation between Pontryagin’s Maximum


Principle and Bellman’s Method of Dynamic
Programming

The main difference between the calculus of variations methods and dynamic programming lies in emphasis (see [2], [3], [9], [13], [16], [17], [24], [28]). The former considers variations of the candidate extremizing curve, whereas in dynamic programming the candidate curve varies over a small initial interval and the remainder
of the curve is supposed to be optimal for the other part of the problem. In other
words, the concept of variation is to be found in both approaches. Which of the two
techniques is more desirable depends entirely on the needs and point of view of the
user. The calculus of variations yields results whose analytical forms are useful to
theorists, and its main appeal perhaps lies in solving deterministic control problems
with time treated as continuous, although there are attempts to discretize time [30].
On the other hand, others claim that dynamic programming is the more promising
and powerful tool with wider applications in a variety of subjects [2]. It is certainly
much more efficient than the calculus of variations in dealing with stochastic control
problems involving multistage decision processes [9], [17].

In what follows we shall consider the relation existing between the maximum principle and R. Bellman's method of dynamic programming (see in section 2.3 the derivation of Bellman's equation for the time-optimal control problem (2.8)). For a fuller discussion, see Dreyfus [9].

The method of dynamic programming was developed for the needs of optimal
control processes which are of a much more general character than those which are
describable by systems of differential equations. Therefore, the method of dynamic
programming carries a more universal character than the maximum principle ([3], [24]). However, in contrast to the latter, this method does not have a rigorous logical basis in all those cases where it may nevertheless be successfully used as a valuable heuristic tool.

The basis of the method of dynamic programming given by Bellman rests on the assumption that to the natural conditions of the problem (see our Theorems 2.2.1 and 2.2.2) another essential requirement has been added: the requirement that the function ω(x) defined in Proposition 2.3.1 be differentiable (for a broader discussion see [24]). This assumption does not follow from the statement of the problem, and is a restriction which, as we shall see below, is not satisfied even in the simplest examples.

However, after this assumption has been made, the method of dynamic programming leads to a certain partial differential equation, which we call Bellman's equation.
This equation (under certain additional conditions) is equivalent to the Hamiltonian
system (2.14), (2.15), and to the maximum condition (2.16), (2.17).

In section 2.3 we showed the relation of Bellman's method of dynamic programming to Pontryagin's maximum principle (for a broader discussion see [16], [24]). For the sake of simplicity we only considered the time-optimal problem (2.8).

Proposition 2.4.1 Let us assume that the function ω(x) is twice continuously dif-
ferentiable. Then Pontryagin’s maximum principle can be derived from Bellman’s
principle of dynamic programming.

Proof Since ω(x) is twice continuously differentiable, the function

g(x, u) = Σ_{α=1}^{n} (∂ω(x)/∂xα ) f α (x, u), (2.30)

which stands under the supremum in (2.29), has continuous first derivatives with respect to x1 , ..., xn . It follows from Bellman's principle of dynamic programming (see (2.26) and (2.29)) that if u(t) is an optimal control which transfers the phase point from the position x0 to the position x1 , and x(t) is the corresponding optimal trajectory, then for a fixed t, t0 ≤ t ≤ t1 , the function g(x, u(t)) of the variable x ∈ X attains its maximum value (unity) at the point x = x(t). From this it follows that

∂g(x(t), u(t))/∂xi = 0, i = 1, ..., n, t0 ≤ t ≤ t1 . (2.31)
Taking the form of the function g(x, u) (see (2.30)) into account, we obtain the relations

Σ_{α=1}^{n} (∂²ω(x(t))/∂xα ∂xi ) f α (x(t), u(t)) + Σ_{α=1}^{n} (∂ω(x(t))/∂xα ) · (∂f α (x(t), u(t))/∂xi ) = 0, i = 1, ..., n, (2.32)

which are satisfied along the optimal trajectory. Furthermore, we have

Σ_{α=1}^{n} (∂²ω(x(t))/∂xα ∂xi ) f α (x(t), u(t)) = Σ_{α=1}^{n} (∂/∂xα )(∂ω(x(t))/∂xi ) (dxα (t)/dt) = (d/dt)(∂ω(x(t))/∂xi ),

so that relations (2.32) may be rewritten in the form

(d/dt)(∂ω(x(t))/∂xi ) = − Σ_{α=1}^{n} (∂f α (x(t), u(t))/∂xi ) · (∂ω(x(t))/∂xα ), i = 1, ..., n.

Thus, along each optimal trajectory, the variables

ψi (t) = ∂ω(x(t))/∂xi , i = 1, ..., n, (2.33)

satisfy the linear system of differential equations

dψi (t)/dt = − Σ_{α=1}^{n} (∂f α (x(t), u(t))/∂xi ) ψα (t), i = 1, ..., n. (2.34)

In addition, because of relation (2.26), Bellman's equation (2.29) can be written in the form

Σ_{α=1}^{n} ψα (t) f α (x(t), u(t)) = sup_{u∈U} Σ_{α=1}^{n} ψα (t) f α (x(t), u) = 1. (2.35)


Relations (2.34) and (2.35) coincide with Pontryagin’s maximum principle, and
relation (2.33) points out the relation between ψi (t) and the function ω(x) in an
explicit form. We also note, as follows from (2.35), that the optimal motions can
always be realized in such a way that

H(ψ(t), x(t), u(t)) ≡ 1 (2.36)

along optimal trajectories. We recall that all of these results can be obtained provided that the function ω(x) is twice differentiable. Without this additional assumption the proof of relation (2.36) loses its validity.

Let us give a simple example (see more examples in [16], [19], [24]) showing that the function ω(x) does not have first derivatives at the points which lie on the switching curves (this may be ascertained by direct calculation). Since every optimal trajectory passes along the switching curve during some time interval in this example, the assumption on the differentiability of ω(x) holds on none of the trajectories. Thus, even in the simplest examples, the assumptions which must be made in order to derive Bellman's equation do not hold.

Example where Pontryagin’s principle applies, but Bellman’s fails because the cont-
rol is discontinuous (Bang-Bang Problem)

Consider the equation

d²x/dt² = u,
where u is a real control parameter constrained by the condition |u| ≤ 1. The given
equation can be rewritten using the phase coordinates x1 = x and x2 = dx/dt.
Hence, we get the following system:
dx1 /dt = x2 , dx2 /dt = u. (2.37)
Let us consider (for a phase point moving in accordance with (2.37)) the problem of
getting to the origin (0, 0) from a given initial state x0 in the shortest time. In other


words, we shall consider the time-optimal problem for the case where the origin (0, 0)
is the terminal position x1 .
The Hamiltonian function H(ψ, x, u) = Σ_{ν=1}^{n} ψν f ν (x, u) in this case has the form

H = ψ1 x2 + ψ2 u. (2.38)

Thus, since we know that

dψi /dt = −∂H/∂xi , i = 1, ..., n,

(see equation (2.19)), we obtain the system of equations

dψ1 /dt = −∂H/∂x1 = 0, dψ2 /dt = −∂H/∂x2 = −ψ1 ,

for the auxiliary variables ψ1 and ψ2 . Hence, ψ1 = c1 and ψ2 = c2 − c1 t (c1 and c2 are arbitrary constants). Relation (2.20) yields (taking (2.38) and the condition −1 ≤ u ≤ 1 into account)

u(t) = sign ψ2 (t) = sign(c2 − c1 t). (2.39)

It follows from (2.39) that every optimal control u(t), t0 ≤ t ≤ t1 , is a piecewise


constant function which takes on the values ±1, and has at most two intervals on
which it is constant (since the linear function c2 − c1 t changes sign at most once on
the interval t0 ≤ t ≤ t1 ). Also, any such function u(t) can be obtained from relation
(2.39) for some values of c1 and c2 .

From the system (2.37)

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = u,$$

for the time interval on which u ≡ 1 we have

$$x^2 = t + s^2, \qquad x^1 = \frac{t^2}{2} + s^2 t + s^1 = \frac{1}{2}\left(t + s^2\right)^2 + s^1 - \frac{(s^2)^2}{2}$$


Figure 2.7: Bang-bang time-optimal control: trajectories for u = 1, the parabolas given by equation (2.40)

($s^1$ and $s^2$ are constants of integration), from which we obtain

$$x^1 = \frac{1}{2}\left(x^2\right)^2 + s, \tag{2.40}$$

where $s = s^1 - \frac{1}{2}(s^2)^2$ is a constant. Thus, the portion of the phase trajectory for which u ≡ 1 is an arc of the parabola (2.40). The family of parabolas (2.40) is shown in Figure 2.7.

Analogously, for the time interval on which u ≡ −1, we have

$$x^2 = -t + s'^2, \qquad x^1 = -\frac{t^2}{2} + s'^2 t + s'^1 = -\frac{1}{2}\left(-t + s'^2\right)^2 + s'^1 + \frac{1}{2}\left(s'^2\right)^2,$$


Figure 2.8: Bang-bang time-optimal control: trajectories for u = −1, the parabolas given by equation (2.41)

from which we obtain

$$x^1 = -\frac{1}{2}\left(x^2\right)^2 + s'. \tag{2.41}$$

The family of parabolas (2.41) is shown in Figure 2.8. The phase points move upwards along the parabolas (2.40) (since $dx^2/dt = u = +1$), and downwards along the parabolas (2.41) ($dx^2/dt = u = -1$).

As we said before, every optimal control u(t) is a piecewise constant function,


taking on the values ±1, and having at most two intervals on which it is constant.
If u(t) is initially equal to +1, and then to −1, the phase trajectory consists of two
adjoining parabolic segments (Figure 2.9). The second of these segments lies on
that parabola defined by (2.41) which passes through the origin (since the desired
trajectory must lead to the origin). On the other hand, if u = −1 first and u = +1


Figure 2.9: Bang-bang time-optimal control: u(t) is initially equal to +1, and then
to −1, the phase trajectory consists of two adjoining parabolic segments given by
equations (2.40) and (2.41), respectively

afterwards, the phase curve is replaced by one which is symmetric with respect to the origin (Figure 2.10). In Figures 2.9, 2.10 the corresponding values of the control parameter u are written next to the parabolic arcs. Figure 2.11 shows the entire family of phase trajectories we obtained (AO is the arc of the parabola $x^1 = \frac{1}{2}(x^2)^2$ in the lower half-plane, BO is the arc of the parabola $x^1 = -\frac{1}{2}(x^2)^2$ in the upper half-plane). The phase point moves along an arc of the parabola (2.41) which passes through the initial point $x_0$, if $x_0$ is above the curve AOB; and along an arc of a parabola (2.40) if $x_0$ is below this curve. In other words, if the initial position $x_0$ is above the curve AOB, the phase point must move under the influence of the control


Figure 2.10: Bang-bang time-optimal control: u(t) is initially equal to −1, and then
to +1, the phase trajectory consists of two adjoining parabolic segments given by
equations (2.41) and (2.40), respectively

u = −1 until it reaches the arc AO. At the instant it arrives, the value of u switches
to +1 and remains at this value until the phase point reaches the origin. However, if
the initial position x0 is below AOB, u must equal +1 until the time it reaches the
arc BO, and at that time the value of u changes to −1.

Definition A piecewise constant optimal control u(t) that takes only two values on
the boundary of the control space U is called a bang-bang control [19].

According to Theorem 2.2.2, only the above described trajectories can be optimal.
Furthermore, it can be seen from the above investigation that from each point in


Figure 2.11: Bang-bang time-optimal control: the switching curve and the family of phase trajectories we obtained (AO is the arc of the parabola $x^1 = \frac{1}{2}(x^2)^2$ in the lower half-plane, BO is the arc of the parabola $x^1 = -\frac{1}{2}(x^2)^2$ in the upper half-plane)

the phase plane there is only one trajectory leading to the origin which can be optimal (i.e., once the initial point $x_0$ is given, the corresponding trajectory is uniquely determined). If we could be sure that the optimal trajectory always (i.e., for any initial point $x_0$) exists, we could confidently say that all the trajectories we have found are optimal (see [24], Chapter III for the formulation of the existence theorem for linear time-optimal systems). In particular, it follows from this theorem that in the present example there exists an optimal trajectory (see page 127 in [24]) for each initial point $x_0$. Thus, the trajectories we have found (Figure 2.11) are optimal, and there are no other optimal trajectories which lead to the origin.


Therefore, the solution of the optimal problem obtained in the above example can be interpreted as follows. Let $v(x^1, x^2) = v(x)$ be the function given in the $x^1x^2$ plane as follows:

$$v(x) = \begin{cases} +1 & \text{below the curve } AOB \text{, and on the arc } AO, \\ -1 & \text{above the curve } AOB \text{, and on the arc } BO. \end{cases}$$

Also, on each optimal trajectory the value u(t) of the control parameter at an arbitrary time t equals the value of the function v at the point at which the phase point, moving along the optimal trajectory, is located at the time t:

$$u(t) = v(x(t)).$$

This means that if we replace the variable u by the function v(x) in the original system (2.37), we obtain the system

$$\frac{dx^1}{dt} = x^2, \qquad \frac{dx^2}{dt} = v(x^1, x^2). \tag{2.42}$$

We can find the optimal phase trajectory which leads to the origin from the solution of this system (2.42) (for an arbitrary initial state $x_0$). Therefore, system (2.42) is the system of differential equations (with discontinuous right-hand side) for the determination of the optimal trajectories which lead to the origin.
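As an illustrative aside, the feedback synthesis above can be checked numerically. The following Python sketch (our own; the function names, sign test s, Euler step size, and tolerances are our choices, not part of the original analysis) integrates system (2.42): the quantity $s = x^1 + \frac{1}{2}x^2|x^2|$ vanishes on the switching curve AOB, is positive above it, and negative below it.

```python
import math

def v(x1, x2):
    # Feedback synthesized from the switching curve AOB:
    # s = 0 on AOB, s > 0 above the curve (use u = -1), s < 0 below (use u = +1).
    s = x1 + 0.5 * x2 * abs(x2)
    if abs(s) < 1e-9:
        return -1.0 if x2 > 0 else 1.0   # on the curve: steer along the arc to the origin
    return -1.0 if s > 0 else 1.0

def time_to_origin(x1, x2, dt=1e-4, t_max=20.0):
    """Euler integration of dx1/dt = x2, dx2/dt = v(x1, x2)."""
    t = 0.0
    while t < t_max:
        if math.hypot(x1, x2) < 1e-3:
            return t                     # reached (a small neighborhood of) the origin
        x1, x2, t = x1 + x2 * dt, x2 + v(x1, x2) * dt, t + dt
    return None

print(time_to_origin(1.0, 0.0))
```

For the initial point (1, 0) the optimal trajectory runs with u = −1 to the switching point (1/2, −1) and then with u = +1 to the origin, a total time of 2; the simulation reproduces this up to discretization error (with some chattering along the switching curve, the numerical counterpart of the discontinuous right-hand side of (2.42)).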

Chapter 3

The Pursuit Problem

3.1 Statement of the Problem

Let us assume that two points, one of which we shall call “pursuing” (P) and the
other “evading” (E), are moving in X ⊂ Rn :

$$x' = f(x, u, t), \qquad y' = g(y, t), \tag{3.1}$$

where u, U, and x (t) are the control parameter, the control region, and the trajectory
of the motion of the pursuing point P, respectively, and y(t) is the trajectory of the
motion of the evading point E.

Let u(t) be a certain admissible control (i.e., piecewise continuous), and let x(t) and y(t) be the corresponding trajectories with initial conditions

$$x(0) = x_0, \qquad y(0) = y_0. \tag{3.2}$$

If $x(t_1) = y(t_1)$ for some $t_1 > 0$, we shall call $t_1$ an encounter time, and the very occurrence that $x(t_1) = y(t_1)$ will be referred to as an encounter. If the control u(t) is chosen arbitrarily, an encounter may not occur for any t > 0. If an encounter


does occur, we shall call the (admissible) control u(t) a pursuing control. Even then, for the given $x_0$, $y_0$, and the chosen control u(t), more than one encounter may take place. We shall call the smallest positive number $t_1$ which is an encounter time the pursuit time $T_u$ corresponding to the control u(t). The optimal pursuit time is then

$$T = \min_{u \in U} T_u. \tag{3.3}$$

In what follows, the initial conditions (3.2) will be assumed to be fixed (in this
connection, x0 and y0 do not enter into the notation for the pursuit time). Therefore,
we get a statement of the pursuit problem.

Definition The problem is called a pursuit problem if it is defined by equations (3.1) - (3.3),

$$x' = f(x, u, t), \qquad y' = g(y, t),$$
$$x(0) = x_0, \qquad y(0) = y_0,$$
$$T = \min_{u \in U} T_u,$$

where x and y belong to $X \subset \mathbb{R}^n$, and $u \in U \subset \mathbb{R}^r$ is admissible (piecewise continuous).


Figure 3.1: The geometry of Bouguer’s pursuit problem about a pirate ship moving
directly toward the merchant vessel at constant speed Vp along a curved path and
pursuing a merchant vessel travelling at constant speed Vm along the vertical line
x = x0

3.2 Pierre Bouguer’s Pursuit Problem

Modern mathematical pursuit analysis is generally assumed to begin with a problem posed and solved by the French mathematician and hydrographer Pierre Bouguer (1698-1758) in 1732 (see [5]). This general assumption is not quite correct, but Bouguer's problem is today nevertheless taken as the starting point of pursuit analysis in all modern textbooks. In his paper, Bouguer treated the case of a pirate ship pursuing a fleeing merchant vessel, as illustrated in Figure 3.1. The pirate ship and the merchant vessel are taken to be at (0, 0) and $(x_0, 0)$ at time t = 0, respectively,


the instant the pursuit begins, with the merchant vessel travelling at constant speed $V_m$ along the vertical line $x = x_0$. The pirate ship travels at constant speed $V_p$ along a curved path such that it is always moving directly toward the merchant; that is, the velocity vector of the pirate ship points directly at the merchant vessel at every instant of time. Bouguer's problem was to determine the equation y = y(x) of the curved path, which he called the line of pursuit. The pursuit curve is also associated with the path taken by a dog following its master, and with a falcon flying in its attack directly at the instantaneous location of its prey. This is the definition of what is now called pure pursuit.

To find the curve of pursuit for Bouguer's problem, start by calling the location of the pirate ship, at arbitrary time t ≥ 0, the point (x, y). At time t the merchant vessel has sailed to the point $(x_0, V_m t)$ and so, as shown in Figure 3.1, the slope of the tangent line to the pursuit curve (the value of dy/dx at (x, y)) is given by

$$\frac{dy}{dx} = \frac{V_m t - y}{x_0 - x} = \frac{y - V_m t}{x - x_0}. \tag{3.4}$$

We also know that, whatever the shape of the pursuit curve, the pirate ship has sailed along it at time t by a distance of $V_p t$. From calculus we know that this arc-length is also given by the integral in (3.5), and so

$$V_p t = \int_0^x \sqrt{1 + \left(\frac{dy}{dz}\right)^2}\,dz, \tag{3.5}$$

where z is simply a dummy variable of integration. Solving (3.4) and (3.5) each for t, we can write

$$\frac{1}{V_p}\int_0^x \sqrt{1 + \left(\frac{dy}{dz}\right)^2}\,dz = \frac{y}{V_m} - \frac{x - x_0}{V_m} \cdot \frac{dy}{dx},$$

which, if we let dy/dx = p(x), becomes

$$\frac{1}{V_p}\int_0^x \sqrt{1 + p^2(z)}\,dz = \frac{y}{V_m} - \frac{x - x_0}{V_m} \cdot p(x). \tag{3.6}$$


Differentiating (3.6) with respect to x (using Leibniz's formula to differentiate an integral), we arrive at

$$\frac{1}{V_p}\sqrt{1 + p^2(x)} = \frac{1}{V_m} \cdot \frac{dy}{dx} - \frac{x - x_0}{V_m} \cdot \frac{dp}{dx} - \frac{1}{V_m}\,p(x)$$

or, simplifying,

$$(x - x_0)\frac{dp}{dx} = -\frac{V_m}{V_p}\sqrt{1 + p^2(x)} = -n\sqrt{1 + p^2(x)}, \tag{3.7}$$

where $n = V_m/V_p$. (Ordinarily we'll have n < 1, the pirate ship sailing faster than the merchant. For n > 1 the problem is without interest, as then the pirate ship is slower than the merchant and the concept of "pursuit" is meaningless. The n = 1 case, however, does offer us a curious mathematical problem with special interest that we'll go into later.) Separating variables,

$$\frac{dp}{\sqrt{1 + p^2}} = -\frac{n\,dx}{x - x_0} = \frac{n\,dx}{x_0 - x} \tag{3.8}$$

and, integrating (3.8) indefinitely, we have (with C as the constant of indefinite integration)

$$\ln\left(p + \sqrt{1 + p^2}\right) + C = -n\ln(x_0 - x). \tag{3.9}$$

From Figure 3.1 we see at t = 0 that p = dy/dx = 0 when x = 0, because at that instant both ships are on the x-axis (the fact that $dy/dx|_{t=0} = 0$ also follows mathematically from (3.4) since y(t = 0) = 0). Inserting these initial conditions into equation (3.9), it follows that $C = -n\ln(x_0)$ and so (3.9) becomes

$$\ln\left(p + \sqrt{1 + p^2}\right) - n\ln(x_0) = -n\ln(x_0 - x),$$

which, after a few steps of algebra, reduces to

$$\ln\left[\left(p + \sqrt{1 + p^2}\right)\left(1 - \frac{x}{x_0}\right)^n\right] = 0,$$

which tells us that

$$\left(p + \sqrt{1 + p^2}\right)\left(1 - \frac{x}{x_0}\right)^n = 1. \tag{3.10}$$
x0


Thus,

$$p + \sqrt{1 + p^2} = \frac{1}{\left(1 - \frac{x}{x_0}\right)^n} = q, \tag{3.11}$$

where q has been introduced to keep the next few algebraic steps easy to follow. Solving (3.11) for p, we have

$$\sqrt{1 + p^2} = q - p,$$
$$1 + p^2 = (q - p)^2 = q^2 - 2qp + p^2,$$
$$p = \frac{q^2 - 1}{2q} = \frac{1}{2}\left[q - \frac{1}{q}\right].$$

Thus, replacing q with its equivalent (from (3.11)) gives

$$p(x) = \frac{dy}{dx} = \frac{1}{2}\left[\left(1 - \frac{x}{x_0}\right)^{-n} - \left(1 - \frac{x}{x_0}\right)^{n}\right], \qquad n = \frac{V_m}{V_p}. \tag{3.12}$$

We can solve (3.12) for y(x) by simple integration, writing C once more as the constant of integration,

$$y(x) + C = \frac{1}{2}\int \frac{dx}{\left(1 - \frac{x}{x_0}\right)^n} - \frac{1}{2}\int \left(1 - \frac{x}{x_0}\right)^n dx.$$

In both integrals change variable to $u = 1 - x/x_0$ (so $dx = -x_0\,du$) to get

$$y(x) + C = \frac{1}{2}\int \frac{-x_0\,du}{u^n} - \frac{1}{2}\int (-x_0)\,u^n\,du, \tag{3.13}$$

which immediately integrates to

$$y(x) + C = -\frac{1}{2}x_0\frac{u^{-n+1}}{-n+1} + \frac{1}{2}x_0\frac{u^{n+1}}{n+1} = \frac{1}{2}x_0\left[\frac{u \cdot u^n}{1+n} - \frac{u \cdot u^{-n}}{1-n}\right].$$

That is,

$$y(x) + C = \frac{1}{2}x_0\left(1 - \frac{x}{x_0}\right)\left[\frac{\left(1 - \frac{x}{x_0}\right)^n}{1+n} - \frac{\left(1 - \frac{x}{x_0}\right)^{-n}}{1-n}\right],$$


or

$$y(x) + C = \frac{1}{2}(x_0 - x)\left[\frac{\left(1 - \frac{x}{x_0}\right)^n}{1+n} - \frac{\left(1 - \frac{x}{x_0}\right)^{-n}}{1-n}\right]. \tag{3.14}$$

Since y(x = 0) = 0, then

$$C = \frac{1}{2}x_0\left[\frac{1}{1+n} - \frac{1}{1-n}\right] = -\frac{n}{1-n^2}\,x_0$$

and so inserting this result into (3.14) gives us our answer, the pursuit curve equation y = y(x):

$$y(x) = \frac{n}{1-n^2}\,x_0 + \frac{1}{2}(x_0 - x)\left[\frac{\left(1 - \frac{x}{x_0}\right)^n}{1+n} - \frac{\left(1 - \frac{x}{x_0}\right)^{-n}}{1-n}\right], \qquad n = \frac{V_m}{V_p}. \tag{3.15}$$

"Capture" occurs when $x = x_0$ (the pirate ship pursuit curve intersects the merchant's course), which says capture occurs at the point $\left(x_0, \frac{n}{1-n^2}x_0\right)$. (This makes physical sense only if n < 1, of course, the case of the pirate ship being faster than the merchant.) For example, if the pirate ship sails twice as fast as the merchant, then $n = \frac{1}{2}$ and capture occurs at the point $\left(x_0, \frac{2}{3}x_0\right)$, while if the pirate ship sails only one-third faster than the merchant (i.e., $V_p = \frac{4}{3}V_m$), then $n = \frac{3}{4}$ and capture occurs at the point $\left(x_0, \frac{12}{7}x_0\right)$. As n approaches one, that is, as the sailing speeds of the pirate ship and the merchant vessel become equal, it is clear that the capture point moves ever farther up the $x = x_0$ line and, in the limit n = 1, the capture point is at infinity (which is the physically obvious statement that capture does not occur). Figure 3.2 shows the pursuit curve up to the capture point for the case of $x_0 = 1$ and $n = \frac{3}{4}$. The analytical expression of (3.15) fails to make sense for the case of n = 1 ($V_m = V_p$), of course, because then we have a division by zero problem. To see what the correct analytical form of the pursuit curve is for n = 1,



Figure 3.2: The path of the pirate ship as given by equation (3.15) for n = 3/4

return to (3.12), to just before we integrated dy/dx. Then

$$\frac{dy}{dx} = \frac{1}{2}\left[\left(1 - \frac{x}{x_0}\right)^{-1} - \left(1 - \frac{x}{x_0}\right)\right] = \frac{1}{2}\left[\frac{1}{1 - \frac{x}{x_0}} - \left(1 - \frac{x}{x_0}\right)\right] \tag{3.16}$$

and so

$$y(x) + C = \frac{1}{2}\left[\int \frac{dx}{1 - \frac{x}{x_0}} - \int \left(1 - \frac{x}{x_0}\right)dx\right].$$

As before, change variables in both integrals to u = 1 − x/x0 (and so dx = −x0 du)


to get

$$y(x) + C = \frac{1}{2}\int \frac{-x_0}{u}\,du - \frac{1}{2}\int u\,(-x_0)\,du = -\frac{1}{2}x_0\ln u + \frac{1}{2}x_0 \cdot \frac{1}{2}u^2 = \frac{1}{2}x_0\left[\frac{1}{2}\left(1 - \frac{x}{x_0}\right)^2 - \ln\left(1 - \frac{x}{x_0}\right)\right].$$

Since y(x = 0) = 0, then $C = \frac{1}{4}x_0$, and so for n = 1 ($V_p = V_m$) the equation of the pursuit curve is

$$y(x) = \frac{1}{2}x_0\left[\frac{1}{2}\left(1 - \frac{x}{x_0}\right)^2 - \ln\left(1 - \frac{x}{x_0}\right)\right] - \frac{1}{4}x_0. \tag{3.17}$$

When Bouguer's problem was included in the 1859 book Treatise on Differential Equations [4] by the famous British mathematician George Boole (1815 - 1864), the pursuit curve for the n = 1 case (pursuer and evader moving with equal speeds) was declared to be a parabola, which is clearly wrong: as observed in Burton and Eliezer [1], whatever the pursuit curve is (for any value of n), it certainly must be asymptotic to the line $x = x_0$.

Now, for the n < 1 case let us calculate the total distance travelled by the pirate
ship until its capture of the merchant vessel. As we discussed earlier, capture does
not occur in the n = 1 case, and after a “long” time, the pirate ship will have sailed
into a position directly behind the merchant and will simply chase, endlessly, after
the merchant while remaining a constant distance behind it. It is an interesting
mathematical problem to calculate the value of this so-called tail chase lag distance.

To calculate the distance sailed by the pirate ship until it captures the merchant
vessel (n < 1), recall from (3.15) that capture occurs at (x0 , n/(1 − n2 )x0 ), i.e., the
merchant vessel has travelled a distance of n/(1 − n2 )x0 . Since the pirate ship travels


Figure 3.3: The geometry of the tail chase as given by equation (3.17)

1/n times faster than does the merchant, the pirate travels 1/n times as far, that is,
the pirate ship travels a total distance of 1/(1 − n2 )x0 .

To answer the second question, i.e., to determine the distance the pirate ship lags behind the merchant vessel after a long time has passed (for n = 1), refer to Figure 3.3. There we see the pirate ship at point (x, y), while the merchant vessel is at $(x_0, y_m)$. Note that this is for an arbitrary time t. The distance separating the pirate ship and the merchant vessel is D, where

$$D^2 = (y_m - y)^2 + (x_0 - x)^2 = (x_0 - x)^2\left[1 + \left(\frac{y_m - y}{x_0 - x}\right)^2\right].$$

Now, here is an important fact: the line joining (x, y) to (x0 , ym ) is the tangent to


the pirate's pursuit curve, because the chase is a pure pursuit, meaning that the pirate ship is moving directly at the instantaneous location of the merchant vessel (according to the statement of the problem), i.e., the velocity vector of the pirate ship points directly at the merchant vessel at every instant of time. Thus,

$$\frac{dy}{dx} = \frac{y_m - y}{x_0 - x}$$

and so

$$D^2 = (x_0 - x)^2\left[1 + \left(\frac{dy}{dx}\right)^2\right].$$

Substituting (3.12) for dy/dx for the n = 1 case, that is, writing

$$\frac{dy}{dx} = \frac{1}{2}\left[\frac{1}{1 - \frac{x}{x_0}} - \left(1 - \frac{x}{x_0}\right)\right],$$

we have

$$D^2 = (x_0 - x)^2\left[1 + \frac{1}{4}\left(\frac{1}{1 - \frac{x}{x_0}} - \left(1 - \frac{x}{x_0}\right)\right)^2\right]$$
$$= x_0^2\left(1 - \frac{x}{x_0}\right)^2\left[1 + \frac{1}{4}\left(\frac{1}{\left(1 - \frac{x}{x_0}\right)^2} - 2 + \left(1 - \frac{x}{x_0}\right)^2\right)\right]$$
$$= x_0^2\left[\left(1 - \frac{x}{x_0}\right)^2 + \frac{1}{4} - \frac{1}{2}\left(1 - \frac{x}{x_0}\right)^2 + \frac{1}{4}\left(1 - \frac{x}{x_0}\right)^4\right].$$

As t → ∞ we physically see the pirate ship pull in behind the merchant vessel and the pursuit becomes a vertically upward tail chase; thus, $x \to x_0$, and so

$$\lim_{t \to \infty} D^2 = \lim_{x \to x_0} D^2 = \frac{1}{4}x_0^2$$

or, at last,

$$\lim_{t \to \infty} D = \frac{1}{2}x_0.$$


Application of Bouguer's Pursuit Problem:

A merchant vessel, moving horizontally in a straight line, is b feet directly below one pirate ship, "Black Pearl", and d feet directly above another pirate ship, "Dead Men". Both pirate ships move directly toward the merchant vessel, reaching it simultaneously. We know that "Black Pearl" is slower than "Dead Men", and that "Dead Men" moves twice as fast as the merchant vessel. At what rate does the "Black Pearl" move?

We can see right away that the statements that "Black Pearl" is above the merchant vessel, and that "Dead Men" is below, have nothing to do with the mathematics of the problem. Then, with no loss in the spirit of the problem, we can take the initial location of the "Black Pearl" as (0, b) and of the "Dead Men" as (0, d). In our solution to Bouguer's problem, the initial separation between pursuer and pursued was $x_0$, and so b and d each play the role of $x_0$. We know from our earlier analysis that capture will occur after the evader has travelled a distance of $\frac{n}{1-n^2}x_0$, where n equals the speed of the evader over the speed of the pursuer. For "Dead Men" we have n = 1/2, and for "Black Pearl" let's say it moves k times as fast as the merchant vessel (and so n = 1/k for "Black Pearl"). Now, since both pursuers "capture" the vessel at the same instant (at the same point) we have

$$\frac{1/2}{1 - (1/2)^2}\,d = \frac{1/k}{1 - (1/k)^2}\,b.$$

Hence,

$$\frac{1/2}{3/4}\,d = \frac{k}{k^2 - 1}\,b,$$

or

$$\frac{2}{3}\,d = \frac{k}{k^2 - 1}\,b,$$

where d, b, and k are some constants. Simplifying, we get

$$\frac{2}{3}\,d\,k^2 - b\,k - \frac{2}{3}\,d = 0,$$


$$k = \frac{b \pm \sqrt{b^2 + \frac{16}{9}d^2}}{\frac{4}{3}d}.$$

We realize that since "Black Pearl" starts closer to the vessel than does "Dead Men", k must be between one and two (the pirate ship "Black Pearl" must move faster than the merchant vessel to capture it, but slower than "Dead Men", according to the given condition). Hence, since k > 0, we use the plus sign. Therefore, "Black Pearl" moves

$$\frac{b + \sqrt{b^2 + \frac{16}{9}d^2}}{\frac{4}{3}d}$$

times as fast as the merchant vessel. Now, if we let, for example, b = 50 and d = 100, then we find that "Black Pearl" moves 1.443 times as fast as the vessel.
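As a quick check of this arithmetic, a few lines of Python (an illustrative sketch of ours; the function name is our own) solve the quadratic for k directly:

```python
import math

def black_pearl_k(b, d):
    # positive root of (2/3) d k^2 - b k - (2/3) d = 0
    return (b + math.sqrt(b**2 + (16.0 / 9.0) * d**2)) / ((4.0 / 3.0) * d)

print(black_pearl_k(50.0, 100.0))   # ~1.443
```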

Remark A different generalized form of Bouguer's problem was solved in Colman [8], in which the merchant vessel's straight sailing path is inclined from the vertical by angle α, i.e., the line $x = x_0$ is replaced by the straight line $y = (x - x_0)\cot\alpha$ for $-\frac{\pi}{2} \le \alpha \le \frac{\pi}{2}$. Colman [8] does not give an explicit formula for the flight path of the pursuer, but finds coordinates for the point of capture in the case when the ratio of pursuer's and evader's speeds is n < 1. The solution presented in this section is for α = 0, while α = π/2 radians would represent the merchant sailing directly away from the pirate ship (and α = −π/2 radians would represent the merchant sailing directly toward the pirate ship). In both of these extreme cases the pursuit curve is, by inspection, simply y = 0 (the x-axis), but for α ≠ ±π/2 or 0 the pursuit curve is quite complicated, and its derivation is an exercise in nontrivial manipulation.


Figure 3.4: The geometry of the wind-blown plane problem, where the plane’s nose
is always pointed toward a city C, the plane’s speed is v mi/h, and a wind is blowing
from the south at the rate of w mi/h

3.3 Wind-Blown Plane Problem

Let us now present another important example (following [20]), where we use the analysis of Bouguer's pursuit problem. It is similar to the problem solved in 1931 by E. Zermelo (see [31]).

A pilot always keeps the nose of his plane pointed toward a city C due west of his starting point at (a, 0). Find the equation of the plane's path if the plane's speed is v mi/h, and a wind is blowing from the south at the rate of w mi/h.


The pilot isn’t really pursuing anything, of course, unless we consider this problem
of “pursuit (with wind interference) of a stationary target”, but the spirit of this
problem is pure Bouguer.

In the notation of Figure 3.4, at an arbitrary time t ≥ 0, the plane's location is the point (x(t), y(t)). Writing $\mathbf{u}_x$ and $\mathbf{u}_y$ as the unit vectors in the x and y directions (which are not functions of time), respectively, we can write the position vector of the plane as

$$\mathbf{p}(t) = x(t)\,\mathbf{u}_x + y(t)\,\mathbf{u}_y,$$

and so the plane's velocity vector is

$$\frac{d}{dt}\mathbf{p}(t) = \frac{dx}{dt}\,\mathbf{u}_x + \frac{dy}{dt}\,\mathbf{u}_y.$$

Also, the plane's body axis (nose-to-tail) is always along the direction of $\mathbf{p}(t)$, at angle θ, toward C, where

$$\tan\theta = \frac{y}{x}.$$

The wind, blowing only along the y-axis, contributes nothing to the $\mathbf{u}_x$ component of the plane's velocity vector; that is, dx/dt is due only to the x-component of v:

$$\frac{dx}{dt} = -v\cos\theta = -\frac{vx}{\sqrt{x^2 + y^2}}, \tag{3.18}$$

where the minus sign is explicitly included, since as the plane flies toward C the value of x decreases with increasing t. The $\mathbf{u}_y$ component of the plane's velocity vector, on the other hand, is influenced by the wind, of course, as well as by the y-component of v:

$$\frac{dy}{dt} = w - v\sin\theta = w - \frac{vy}{\sqrt{x^2 + y^2}} = \frac{w\sqrt{x^2 + y^2} - vy}{\sqrt{x^2 + y^2}}. \tag{3.19}$$

Dividing (3.19) by (3.18), we eliminate explicit time and arrive at

$$\frac{dy}{dx} = \frac{vy - w\sqrt{x^2 + y^2}}{vx}. \tag{3.20}$$


Let us introduce a new variable z such that y = zx. Then (3.20) becomes

$$\frac{dy}{dx} = z + x\frac{dz}{dx} = \frac{vzx - w\sqrt{x^2 + z^2x^2}}{vx} = z - \frac{w}{v}\sqrt{1 + z^2}$$

or, defining the constant n = w/v,

$$x\frac{dz}{dx} = -n\sqrt{1 + z^2}, \tag{3.21}$$

from where we get

$$\frac{dz}{\sqrt{1 + z^2}} = -n\frac{dx}{x}. \tag{3.22}$$

(Notice the similarity of (3.22) and (3.8).) Integrating indefinitely, with C as the constant of integration,

$$\ln\left(z + \sqrt{1 + z^2}\right) + C = -n\ln(x).$$

Since y = 0 when x = a, which means z = y/x = 0 when x = a, we have $C = -n\ln(a)$, and so

$$\ln\left(z + \sqrt{1 + z^2}\right) = n\ln(a) - n\ln(x) = n\ln\left(\frac{a}{x}\right) = \ln\left(\frac{a}{x}\right)^n,$$

or,

$$z + \sqrt{1 + z^2} = \left(\frac{a}{x}\right)^n. \tag{3.23}$$

Defining $q = (a/x)^n$, (3.23) becomes (similar to how we went from (3.11) to (3.12))

$$\sqrt{1 + z^2} = q - z,$$
$$1 + z^2 = q^2 - 2qz + z^2,$$
$$z = \frac{q^2 - 1}{2q} = \frac{1}{2}\left[q - \frac{1}{q}\right].$$

Thus, replacing q with its definition,

$$z = \frac{1}{2}\left[\left(\frac{a}{x}\right)^n - \left(\frac{a}{x}\right)^{-n}\right] = \frac{1}{2}\left[\left(\frac{x}{a}\right)^{-n} - \left(\frac{x}{a}\right)^{n}\right]. \tag{3.24}$$



Figure 3.5: Plots of the wind-blown plane’s paths given by equations (3.25) for several
values of n < 1 (n = 0.1, 0.2, 0.4, 0.8, 0.95, 0.99, 0.999)

Since y = zx, then

$$y = \frac{1}{2}\left[\frac{x^{-n+1}}{a^{-n}} - \frac{x^{n+1}}{a^{n}}\right]$$

or, at last, we have the equation of the wind-blown plane's path:

$$y(x) = \frac{a}{2}\left[\left(\frac{x}{a}\right)^{-n+1} - \left(\frac{x}{a}\right)^{n+1}\right], \qquad n = \frac{w}{v}. \tag{3.25}$$
When n = 0 - that is, when there is no wind - (3.25) collapses to the physically obvious y(x) = 0, which simply says that the plane moves directly to city C while always remaining on the x-axis. And when n = 1 (when the wind speed equals the plane's speed in still air), the plane's path is the parabola

$$y(x) = \frac{a}{2}\left[1 - \left(\frac{x}{a}\right)^2\right].$$

In this case when x = 0 we see that y(0) = a/2; that is, the plane does not reach city C. This probably makes intuitive sense, too, but it is interesting to see that the miss distance is so large. What happens, physically, in the n = 1 case, is that the plane arrives at the y-axis with a zero velocity component in the x-direction (notice that the plane's body axis has rotated through an angle of θ = 90°, and then recall


(3.18)), and so there the plane remains, motionless at the point (0, a/2), as it flies
directly into the wind with the two equal magnitude but oppositely directed velocity
vectors precisely cancelling each other. Figure 3.5 shows the plane’s path for a = 1
for several different values of n, and it is clear that for n < 1 (≥ 1) the plane reaches
(does not reach) city C.
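Equation (3.25) is straightforward to evaluate. The sketch below (our own; the function name and the sample point just short of the y-axis are our choices) traces the behavior just described: the miss distance at the y-axis is zero for n < 1 (though it shrinks slowly as n → 1) and a/2 for n = 1.

```python
def plane_y(x, a=1.0, n=0.5):
    """Wind-blown plane path, equation (3.25); n = w/v."""
    return (a / 2.0) * ((x / a)**(1.0 - n) - (x / a)**(1.0 + n))

# height of the path just before the y-axis, for a = 1
for n in (0.5, 0.95, 0.999, 1.0):
    print(n, plane_y(1e-12, n=n))   # ~0, ~0.13, ~0.49, 0.5 (= a/2)
```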

Now, let us calculate the total flight time of the wind-blown plane for n < 1, and
the total distance flown for the case of n “just less” than one.

For the total flight time T of the wind-blown plane, recall (3.18) and (3.25), where we showed that

$$\frac{dx}{dt} = -\frac{vx}{\sqrt{x^2 + y^2}}$$

and

$$y = \frac{a}{2}\left[\left(\frac{x}{a}\right)^{-n+1} - \left(\frac{x}{a}\right)^{n+1}\right], \qquad n = \frac{w}{v}.$$

So,

$$\int_0^T dt = -\int_a^0 \frac{\sqrt{x^2 + y^2}}{vx}\,dx,$$

or

$$T = \frac{1}{v}\int_0^a \sqrt{1 + \frac{y^2}{x^2}}\,dx.$$

Also,

$$y^2 = \frac{a^2}{4}\left[\left(\frac{x}{a}\right)^{-2n+2} - 2\left(\frac{x}{a}\right)^2 + \left(\frac{x}{a}\right)^{2n+2}\right] = \frac{a^2}{4}\left[\left(\frac{x}{a}\right)^{-2n}\frac{x^2}{a^2} - 2\frac{x^2}{a^2} + \left(\frac{x}{a}\right)^{2n}\frac{x^2}{a^2}\right].$$

Thus,

$$\frac{y^2}{x^2} = \frac{1}{4}\left[\left(\frac{x}{a}\right)^{-2n} - 2 + \left(\frac{x}{a}\right)^{2n}\right],$$


and so

$$1 + \frac{y^2}{x^2} = 1 + \frac{1}{4}\left[\left(\frac{x}{a}\right)^{-2n} - 2 + \left(\frac{x}{a}\right)^{2n}\right] = \frac{(x/a)^{-2n} + 2 + (x/a)^{2n}}{4} = \left\{\frac{(x/a)^n + (x/a)^{-n}}{2}\right\}^2.$$
We can then write T as

$$T = \frac{1}{2v}\int_0^a \left[\left(\frac{x}{a}\right)^n + \left(\frac{x}{a}\right)^{-n}\right]dx = \frac{1}{2v}\left[\int_0^a \left(\frac{x}{a}\right)^n dx + \int_0^a \left(\frac{x}{a}\right)^{-n} dx\right].$$

Letting u = x/a (dx = a du), we then have

$$T = \frac{a}{2v}\left[\int_0^1 u^n\,du + \int_0^1 u^{-n}\,du\right] = \frac{a}{2v}\left[\frac{u^{n+1}}{n+1} + \frac{u^{-n+1}}{-n+1}\right]_0^1 = \frac{a}{2v}\left(\frac{1}{1+n} + \frac{1}{1-n}\right) = \frac{a/v}{1-n^2}, \qquad n = \frac{w}{v}.$$
This makes sense for 0 ≤ n < 1. Notice that if n = 0 (no wind) then T = a/v, which
is simply the time the plane requires to fly straight along the x-axis from (a, 0) to
(0, 0) at a speed v. As n approaches one from below, of course, we see T → ∞ as
expected.
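The closed form is easy to check by numerical quadrature. In this small sketch (ours, using scipy; the chosen values of a, v, n are arbitrary) the integrand is the one we derived above, and the two numbers agree:

```python
from scipy.integrate import quad

a, v, n = 1.0, 1.0, 0.5
T_numeric, _ = quad(lambda x: ((x / a)**n + (x / a)**(-n)) / (2.0 * v), 0.0, a)
print(T_numeric, (a / v) / (1.0 - n**2))   # both ~1.3333
```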

For the total distance flown by the plane when n is "just less" than one, that is, for the case where the plane "just manages" to reach city C, recall that at n = 1 the plane's path is the parabola

$$y = \frac{a}{2}\left[1 - \left(\frac{x}{a}\right)^2\right].$$


As n approaches one, then, the upward curved part of the flight path of the plane approaches this parabola, as illustrated in Figure 3.5. Let us look at the three plots for n = 0.95, n = 0.99, and n = 0.999, all for a = 1. From these curves it should be clear that the length of the longest flight path that just manages to reach city C is bounded from above by

$$\frac{a}{2} + \int_0^a \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx,$$

where the second term is the length of the parabolic arc. The first term, of course, is the length of the final leg of the journey back down along (almost along) the vertical axis to city C at the origin. On the parabolic arc we have

$$\frac{dy}{dx} = -\frac{a}{2} \cdot \frac{2x}{a^2} = -\frac{x}{a},$$

and so our answer is

$$\frac{a}{2} + \int_0^a \sqrt{1 + \left(\frac{x}{a}\right)^2}\,dx.$$
0

If we change variables to u = x/a (dx = a du), our answer becomes

$$\frac{a}{2} + \int_0^1 \sqrt{1 + u^2}\;a\,du = a\left[\frac{1}{2} + \int_0^1 \sqrt{1 + u^2}\,du\right]$$

or,

$$a\left[\frac{1}{2} + \left\{\frac{u\sqrt{u^2 + 1}}{2} + \frac{1}{2}\ln\left(u + \sqrt{u^2 + 1}\right)\right\}_0^1\right] = a\left[\frac{1 + \sqrt{2} + \ln(1 + \sqrt{2})}{2}\right] = 1.6478a.$$

This is the total distance flown by the plane when n = 1 − ε, where ε > 0 is arbitrarily small.
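Again the arc-length arithmetic can be verified numerically (a small check of ours, not in [20]):

```python
import math
from scipy.integrate import quad

a = 1.0
arc, _ = quad(lambda x: math.sqrt(1.0 + (x / a)**2), 0.0, a)   # parabolic arc length
print(a / 2.0 + arc)                                           # ~1.6478
print(a * (1.0 + math.sqrt(2.0) + math.log(1.0 + math.sqrt(2.0))) / 2.0)
```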


Figure 3.6: The geometry of the tractrix problem, where a watch-on-a-chain with the
chain of length a is initially on the y-axis, the end of the chain is pulled along the
x-axis from the initial position on the origin

3.4 The Tractrix

In the late seventeenth century a different kind of pursuit curve was also studied (as you will see, it is better to call this curve the following curve or the tailing curve) [20]. An example of such a problem (with the tailing curve) is illustrated in Figure 3.6, where a watch-on-a-chain has been laid out on a table-top with the chain (of length a) pulled out. In our thesis we follow the statement given in [20]. The watch is initially on the y-axis, and the other end of the chain is on the origin. If the end on the origin is then pulled along the x-axis, the watch will obviously be dragged along. We


are interested in the equation of the watch’s path, known as the tractrix. It was first
introduced by Claude Perrault in 1670, and later studied by Sir Isaac Newton (1676)
and Christian Huygens (1692) [18].

If (x, y) is the location of the watch at some arbitrary time t ≥ 0, then it is clear that the taut chain is tangent to the tractrix at (x, y). This crucial observation allows us to calculate where the pulling end of the taut chain is (always on the x-axis), as follows. The slope of the tangent line is dy/dx and so, from analytic geometry, we have the equation of the tangent line as

$$y = x\frac{dy}{dx} + b, \tag{3.26}$$

where b is some constant. Let $x_i$ be the value of x where the pulling end of the chain is located; by definition y = 0 there. So,

$$b = -x_i\frac{dy}{dx},$$

and therefore, the equation of the tangent line that intersects the x-axis at $x = x_i$ is

$$y = x\frac{dy}{dx} - x_i\frac{dy}{dx} = (x - x_i)\frac{dy}{dx}. \tag{3.27}$$
From the Pythagorean theorem we then have

$$(x - x_i)^2 + y^2 = a^2,$$

or, using (3.27) to solve for $(x - x_i)$, we have

$$\frac{y^2}{(dy/dx)^2} + y^2 = a^2,$$

or

$$\left[\frac{y}{dy/dx}\right]^2 = a^2 - y^2. \tag{3.28}$$

Taking the positive square root of both sides of (3.28), and noting that dy/dx is negative (look at Figure 3.6 again), we arrive at

$$-\frac{y}{dy/dx} = \sqrt{a^2 - y^2}, \tag{3.29}$$



Figure 3.7: A depiction of the tractrix given by equation (3.31) for a = 1

a differential equation in which we can separate the variables. That is,

$$dx + \frac{\sqrt{a^2 - y^2}}{y}\,dy = 0. \tag{3.30}$$

Integrating indefinitely (with C as the arbitrary constant), we have

$$x + \sqrt{a^2 - y^2} - a\ln\left(\frac{a + \sqrt{a^2 - y^2}}{y}\right) = C.$$

Since y(x = 0) = a, we have C = 0 and so the equation of the watch's path as it is being dragged is

$$x = a\ln\left(\frac{a + \sqrt{a^2 - y^2}}{y}\right) - \sqrt{a^2 - y^2}. \tag{3.31}$$

Figure 3.7 shows the tractrix of (3.31) for the case of a = 1.
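Since (3.31) gives x as an explicit function of y, tabulating or plotting the tractrix is immediate. The sketch below (ours; the function name is our own) evaluates it for a = 1:

```python
import math

def tractrix_x(y, a=1.0):
    """The tractrix, equation (3.31): x as a function of y, for 0 < y <= a."""
    s = math.sqrt(a**2 - y**2)
    return a * math.log((a + s) / y) - s

print(tractrix_x(1.0))   # 0: the watch starts at (0, a)
print(tractrix_x(0.1))   # x grows without bound as y -> 0 (the x-axis is an asymptote)
```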

Finally, it is interesting to contrast the tractrix with Bouguer's pure pursuit curve for the special case of equal speeds for the pirate ship and the merchant vessel. The two curves seemingly have a common property, as the dragged watch is a constant distance from the pulled end of the chain, and the pirate ship ends up a constant distance behind the fleeing merchant vessel. Yet the expressions (3.17) and (3.31) are quite different. The reason is that for the tractrix the constant lag of the watch is always the case, while the constant lag of the pirate ship is an asymptotic property that develops with the passage of time.


Figure 3.8: Schematic of the pursuit by interception problem with pursuer T (Tor-
pedo) and evader E (Enemy ship) moving with constant speeds VT and VE , respec-
tively

3.5 Apollonius Pursuit Problem

In this section we talk about a question that you may have already thought about -
since the merchant vessel being pursued by Bouguer’s pirate ship always sails along
a straight line, why does the pirate use pure pursuit (meaning that the pirate ship
is moving directly at the instantaneous location of the merchant vessel) to run down
his victim? Why doesn’t the pirate ship simply sail along the straight line path that
will intercept the merchant? Bouguer himself was not oblivious to that possibility.
As Puckette [26] puts it, “[Bouguer] makes it quite clear that the pursuing ship could


catch its quarry much more quickly by ‘heading it off’ than by merely following it
(assuming the line of flight remains a straight line)”.

There are at least two answers to that question (for a broader discussion see [20]).
First, of course, the pure pursuit problem is simply interesting from a mathematical
point of view. And second, if the merchant vessel deviates from its straight path
and starts executing an active evasion plan, then the pirate ship is going to have to
recalculate its intercept course continually anyway. A pure pursuit strategy is just
one way to specify how to do repetitive new course calculations. And, in any case,
even for the merchant vessel sticking to a straight line escape path, determining
the intercept course for the pirate ship is a nontrivial calculation. In the days of
submarine warfare in World War II [20], for example, this was a most practical
problem - submarines fired their torpedoes on intercept courses at unsuspecting,
that is, nonmaneuvering, enemy surface ships. Today, it isn’t such an important
problem because, unlike the torpedoes from yesteryear, modern torpedoes use what
is called “active tracking”, that is, they have onboard sensors and computers that
continually locate the target no matter how that target moves. Still, the mathematics
of interception remains elegant.

Let us suppose that the torpedo T is to intercept an enemy surface ship E (as shown in Figure 3.8), with E moving on a straight path and T moving on a straight path to intercept E at point I. If we assume that E and T move with constant speeds $V_E$ and $V_T$, respectively, then at the intercept point I the ratio of the two distances travelled from the instant of the torpedo firing must equal the ratio of the two speeds,

$$\frac{\overline{IT}}{\overline{IE}} = \frac{V_T}{V_E} = k, \tag{3.32}$$

where k is a constant (k > 1 is the usual case, but the k < 1 case will be of interest to us, too, before we are done).

Equation (3.32) is the mathematical statement of the physically obvious fact


Figure 3.9: The Apollonius circle centered on (2/3, 0) with radius 2/3, given by
equation (3.34) for m = 1, p = 2, and k = 2, so that the torpedo is located at T(2, 0)
and the enemy ship is at E(1, 0)

that, for an interception to occur, the torpedo and the ship must reach point I simultaneously. It is not enough for E and T to pass through I individually - they must be at I at the same time. To find where I is, given the locations of E and T at time t = 0, the two speeds $V_E$ and $V_T$, and the direction of E's motion (the "heading" of E), what we must do first is find the set S of all the points in the plane such that (3.32) is satisfied. The point I can be any one of the points (there can be more than one) in S that also lie on the path of E.

Now we need to identify what S is. With no loss in generality we can draw a rectangular coordinate system such that E and T are both, at t = 0, on the positive horizontal axis with T to the right of E (see Figure 3.9). If we denote the coordinates of E and T by (m, 0) and (p, 0), respectively, with p > m (we use m and p to retain


a link with our original discussion of Bouguer's merchant vessel and pirate ship) and if (x, y) is any point in S, then (3.32) becomes

$$\frac{\sqrt{(x - p)^2 + y^2}}{\sqrt{(x - m)^2 + y^2}} = k. \tag{3.33}$$

If you now go through a few algebraic manipulations, then you should be able to confirm that (3.33) can be written as

$$\left[x - \frac{k^2m - p}{k^2 - 1}\right]^2 + y^2 = \left[\frac{k(p - m)}{1 - k^2}\right]^2. \tag{3.34}$$

But this is the equation of a circle, with its center on the horizontal axis at $\left(\frac{k^2m - p}{k^2 - 1}, 0\right)$ and a radius of $\frac{k(p - m)}{|1 - k^2|}$. The set S is a circle, called the Apollonius circle of the two points E and T (in their t = 0 locations on the horizontal axis), which is named after the third-century B.C. Greek mathematician Apollonius of Perga [20]. Apollonius realized (in his lost work Plane Loci) that (3.32) is a way to define a circle in a manner different from the usual Euclidean geometry definition (the path traced by a moving point that remains a fixed distance from a given point). The definition in (3.32) predates Apollonius, however, being known a century earlier to Aristotle. If m = 1, p = 2, and k = 2, for example, the Apollonius circle is centered on $\left(\frac{2}{3}, 0\right)$ with a radius of $\frac{2}{3}$; see Figure 3.9, where the center of the Apollonius circle is marked with an X and labeled small circles indicate the initial locations of the torpedo and the enemy ship. For the submarine to determine where to aim its torpedo (that is, to locate the point I), all that remains to do is to see where E's path intersects the Apollonius circle. The intersection point is I. For example, you can see from Figure 3.9 that I is, approximately, at (1, 0.58) if E has a heading angle of 90°.
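The construction just described is easy to automate. The following sketch (our own illustration; the function names, and the convention of measuring the heading from the positive x-axis, are ours) builds the Apollonius circle from (3.34) and finds the first crossing of E's straight path with it, i.e., the aim point I:

```python
import math

def apollonius_circle(m, p, k):
    """Center abscissa and radius of the Apollonius circle (3.34)
    for E at (m, 0) and T at (p, 0)."""
    c = (k**2 * m - p) / (k**2 - 1.0)
    r = k * (p - m) / abs(1.0 - k**2)
    return c, r

def intercept_point(m, p, k, heading_deg):
    """First crossing of E's straight path with the Apollonius circle, or None."""
    c, r = apollonius_circle(m, p, k)
    dx, dy = math.cos(math.radians(heading_deg)), math.sin(math.radians(heading_deg))
    # point on E's path: (m + t dx, t dy); set its distance to (c, 0) equal to r
    b = 2.0 * (m - c) * dx
    c0 = (m - c)**2 - r**2
    disc = b**2 - 4.0 * c0
    if disc < 0:
        return None                      # the path never meets the circle
    roots = [t for t in ((-b - math.sqrt(disc)) / 2.0,
                         (-b + math.sqrt(disc)) / 2.0) if t > 0]
    if not roots:
        return None                      # the circle lies behind E
    t = min(roots)                       # the earlier of the two crossings
    return (m + t * dx, t * dy)

print(intercept_point(1.0, 2.0, 2.0, 90.0))   # ~(1.0, 0.577), matching Figure 3.9
```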

Now, what if k < 1, meaning, what if the torpedo is slower than the surface ship?
To be specific, let us now take k = 1/2, which reduces (3.34) (with m = 1 and p = 2)


Figure 3.10: The Apollonius circle centered on (7/3, 0) with radius 2/3, given by
equation (3.34) for m = 1, p = 2, and k = 1/2, so that the torpedo is located at
T(2, 0) and the enemy ship is at E(1, 0)

to

$$\left(x - \frac{7}{3}\right)^2 + y^2 = \left(\frac{2}{3}\right)^2.$$
That is, the Apollonius circle is still of radius 2/3, but now is centered on (7/3, 0),
which means the center of the Apollonius circle is now to the right of the initial
location of T, as shown in Figure 3.10. You can see that now the torpedo may or
may not be able to intercept the enemy ship - it is all a function of the heading angle
of the ship. If the heading angle is sufficiently small that the ship’s path crosses
the Apollonius circle, then an interception by a slow torpedo of a fast enemy ship
is possible (in fact, there will generally be two possible interception points), a result
that often surprises.

Instead of considering specific values of k, m, and p, it is not at all difficult to


Figure 3.11: The general geometry for a slow torpedo (T) interception of a fast enemy
surface ship (E) (heading with an angle θ), where the Apollonius circle for the points
(m, 0) and (p, 0), p > m, is given by equation (3.34) for k < 1

be much more general and to derive an astonishingly simple condition that will tell
us, for any k < 1, if a slow torpedo interception is, first, even possible and, if it is,
where on the Apollonius circle the submarine should aim its slow torpedo. Equation
(3.34) tells us that, for k < 1, the Apollonius circle for the points (m, 0) and (p, 0),
p > m, is centered on the point C at ((p − k 2 m)/(1 − k 2 ), 0) and has a radius
of k(p − m)/(1 − k 2 ), as illustrated in Figure 3.11. Now, imagine that the enemy
ship’s heading angle is θ, so that the ship just touches the Apollonius circle at A.
If the absolute value of the heading angle is greater than θ then no interception is
possible, and if the absolute value of the heading angle is less than θ then the enemy
ship’s path will cross the Apollonius circle twice and so there will be two possible
interception points I. We can find a formula for θ, as follows.

The line AC, a radius of the Apollonius circle, is perpendicular to the tangent
line EA, and so the triangle ECA is a right triangle. Thus,

$$\sin\theta = \frac{\overline{AC}}{\overline{EC}}. \tag{3.35}$$


The radius of the circle, as stated before, is

$$\overline{AC} = \frac{k(p - m)}{1 - k^2},$$

while

$$\overline{EC} = \overline{ET} + \overline{TC} = (p - m) + \left(\frac{p - k^2m}{1 - k^2} - p\right) = \frac{p - m}{1 - k^2}.$$

Inserting these expressions for $\overline{AC}$ and $\overline{EC}$ into (3.35) we arrive at

$$\sin\theta = k = \frac{V_T}{V_E},$$

that is,

$$\theta = \sin^{-1}\left(\frac{V_T}{V_E}\right). \tag{3.36}$$

If α is the heading angle of the enemy surface ship, then an interception using a slow
torpedo is possible if −θ ≤ α ≤ θ, and impossible otherwise.
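In code, the feasibility test of (3.36) is essentially a one-liner; this fragment (ours, with hypothetical function and argument names) checks a couple of headings for k = 1/2, where θ = 30°:

```python
import math

def slow_intercept_possible(vt, ve, alpha_deg):
    """Per (3.36): interception is possible iff |alpha| <= arcsin(VT/VE), for VT < VE."""
    return abs(alpha_deg) <= math.degrees(math.asin(vt / ve))

print(slow_intercept_possible(0.5, 1.0, 20.0))   # True  (20 deg < 30 deg)
print(slow_intercept_possible(0.5, 1.0, 40.0))   # False (40 deg > 30 deg)
```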

Therefore, we can conclude that Apollonius circles can be used in PE problems to analyze how to find a better strategy to escape, or to prolong the capture time whenever a successful escape is not possible.

Chapter 4

The Evasion Problem

4.1 Statement of the Problem

In this chapter we present PE problems with the emphasis on evasion. Let us assume
that two points, one of which we shall call “pursuing” (P) and the other “evading”
(E), are moving in X ⊂ Rn :

$$x' = f(x, t), \qquad y' = g(y, v, t), \tag{4.1}$$

where v, V, and y(t) are the control parameter, the control region, and the trajectory
of the motion of the evading point E, respectively, and x (t) is the trajectory of the
motion of the pursuing point P.

Let v(t) be a certain admissible control (i.e., piecewise continuous), and let x(t) and y(t) be the corresponding trajectories with initial conditions

$$x(0) = x_0, \qquad y(0) = y_0. \tag{4.2}$$

If $x(t_1) = y(t_1)$ for some $t_1 > 0$, we shall call $t_1$ an encounter time, and the very occurrence that $x(t_1) = y(t_1)$ will be referred to as an encounter. If the control v(t)


is chosen arbitrarily, an encounter may not occur for any t > 0. If an encounter does occur, we shall call the (admissible) control v(t) an evading control. Even then, for the given $x_0$, $y_0$, and the chosen control v(t), more than one encounter may take place. We shall call the largest positive number $t_1$ which is an encounter time the evading time $T_v$ corresponding to the control v(t). The optimal evading time is then

$$T = \max_{v \in V} T_v. \tag{4.3}$$

In what follows, the initial conditions (4.2) will be assumed to be fixed (in this
connection, x0 and y0 do not enter into the notation for the evading time). Therefore,
we get a statement of the evasion problem.

Definition The problem is called an evasion problem if it is defined by equations (4.1) - (4.3),

$$x' = f(x, t), \qquad y' = g(y, v, t),$$
$$x(0) = x_0, \qquad y(0) = y_0,$$
$$T = \max_{v \in V} T_v,$$

where x and y belong to $X \subset \mathbb{R}^n$, and $v \in V \subset \mathbb{R}^r$ is admissible (piecewise continuous).


Figure 4.1: P and E defending and attacking, respectively, the target area C

4.2 Isaacs’s Problem

One of the classic general evasion problems is Isaacs’s guarding the target problem
by the American mathematician Rufus Isaacs (1914 - 1981) [6], [14],[20], [27]. Isaacs
states his problem, along with giving its general solution, as follows.

“Both P and E (pursuer and evader) travel with the same speed. The motive of
P is to guard a target C, which we take as an area in the plane, from attack by E.
The optimal strategies for both P and E are: draw the perpendicular bisector of P E
(where P and E denote starting positions). Any point in the half-plane above this
line can be reached by E prior to P , and this property fails in the lower half-plane.


Figure 4.2: The geometry of Isaacs’s problem for P defending a point target C, and E
attacking same target; l1 is the perpendicular bisector of P E, l2 is the perpendicular
line segment from C to l1

Clearly, E should head for the best of his accessible points. Let D be the point of the
bisector nearest C. The optimal strategies for both P and E decree that they travel
toward D. When does the capture occur?”

The military conception of this problem by Isaacs is that E must reach at least the boundary of C to be successful in his attack. It may seem easy for E to reach an interior point of C (and thus hit C), but that is not necessarily true. Here P is successful in defeating E if the capture point D (see Figure 4.1) is anywhere outside C. However, we need to know some additional information about the shape and dimensions of C to say more about the strategies of P and E, and the solution of


the given problem.

Let us assume that C is, for example, the location of a specific enemy commander,
or an enemy radio transmitter. Let us also consider that E is carrying an explosive
device, which, when detonated, has a circular radius of destruction R > 0. For P let
us say that it can stop E only by direct impact, i.e., P must intercept E. Therefore,
for E to be successful, it must come within a distance less than or equal to R before
P reaches E. Now the problem can be formulated in the following manner: “we
have a crude model for an attacking missile versus a missile defense system that is
supposed to protect an area, for example, a city, against a ballistic missile attack”
[20].

Without any loss of generality we can place P at time t = 0 at the origin of an


x-y coordinate system, and the point target C on the x-axis at x = xc > 0. That is,
the target is to the right of P . The case where the target is initially to the left of P
is a mirror-image of our assumed case.

Let E be at $(x_0, y_0)$ at time t = 0. We need to remark that we consider the case where C is not one of E's accessible points, i.e., we assume that $y_0$ is sufficiently large. If C is one of E's accessible points, E can destroy C by actually reaching C before P can reach E.

We can easily find the equation of the bisector line $l_1$,

$$y = -\frac{x_0}{y_0}x + \frac{x_0^2 + y_0^2}{2y_0}, \tag{4.4}$$

which has the required slope and passes through the point midway between P and E at time t = 0, i.e., the point $(x_0/2, y_0/2)$. Then, $l_2$ is the perpendicular line segment from the point target C to $l_1$, and the length of $l_2$ is the closest approach distance of E to C. The slope of $l_2$ is $y_0/x_0$ and, since it passes through the point $(x_c, 0)$, we get the equation of $l_2$:

$$y = \frac{y_0}{x_0}x - \frac{y_0}{x_0}x_c. \tag{4.5}$$


Thus, the point of closest approach of E to C is the intersection of $l_1$ and $l_2$, which is the point (X, Y). Hence, we can find the values of X and Y from the equations (4.4) and (4.5):

$$\frac{y_0}{x_0}X - \frac{y_0}{x_0}x_c = -\frac{x_0}{y_0}X + \frac{x_0^2 + y_0^2}{2y_0},$$

which gives us

$$X = \frac{y_0^2}{x_0^2 + y_0^2}\,x_c + \frac{x_0}{2}, \tag{4.6}$$

and from either of the two equations (4.4), (4.5) we get

$$Y = \frac{y_0}{x_0}\left(\frac{y_0^2}{x_0^2 + y_0^2}\,x_c + \frac{x_0}{2}\right) - \frac{y_0}{x_0}x_c. \tag{4.7}$$
Now we can find the length (squared) of $l_2$, which is $(X - x_c)^2 + Y^2$, or

$$\left[\frac{y_0^2}{x_0^2 + y_0^2}\,x_c + \frac{x_0}{2} - x_c\right]^2 + \left[\frac{y_0}{x_0}\left(\frac{y_0^2}{x_0^2 + y_0^2}\,x_c + \frac{x_0}{2} - x_c\right)\right]^2,$$

which after simplifying will be equal to

$$\frac{\left[x_0(x_0^2 + y_0^2) - 2x_c x_0^2\right]^2}{4x_0^2(x_0^2 + y_0^2)}.$$
Then, for E to achieve its mission goal of destroying C, the circular radius of destruction R (squared) of E's weapon must exceed the length (squared) of $l_2$, i.e.,

$$R^2 > \frac{\left[x_0(x_0^2 + y_0^2) - 2x_c x_0^2\right]^2}{4x_0^2(x_0^2 + y_0^2)},$$

or

$$R > \frac{x_0(x_0^2 + y_0^2) - 2x_c x_0^2}{2x_0\sqrt{x_0^2 + y_0^2}}.$$

Therefore, we get

$$\frac{R}{x_c} > \frac{(x_0/x_c)^2 + (y_0/x_c)^2 - 2(x_0/x_c)}{2\sqrt{(x_0/x_c)^2 + (y_0/x_c)^2}} = F\left(\frac{x_0}{x_c}, \frac{y_0}{x_c}\right). \tag{4.8}$$

Figure 4.3 shows several of the curves that represent the right-hand side of (4.8).
Each curve gives the minimum value of R/xc , as a function of x0 /xc , for a given



Figure 4.3: Plot of F (x0 /xc , y0 /xc ) given in equation (4.8) as a function of x0 /xc
for a given fixed value of y0 /xc (y0 /xc = 1.1, 2, 3, 4, 5, 6, 7). Each curve gives the
minimum value of R/xc

fixed value of y0 /xc (the label-value next to each curve). From these curves E can
determine the minimum value of R (the amount of explosive) required for success in
destroying C as a function of both E’s starting point and the location of the target.
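The right-hand side of (4.8) is simple to compute. The sketch below (ours; the function name is our own) evaluates the minimum required $R/x_c$ for a given starting geometry, which is exactly what the curves in Figure 4.3 display:

```python
import math

def min_R_over_xc(x0_over_xc, y0_over_xc):
    """F(x0/xc, y0/xc) from (4.8): the minimum R/xc that E needs to destroy C."""
    rho2 = x0_over_xc**2 + y0_over_xc**2
    return (rho2 - 2.0 * x0_over_xc) / (2.0 * math.sqrt(rho2))

print(min_R_over_xc(1.0, 2.0))   # ~0.671
print(min_R_over_xc(2.0, 1.1))   # compare with the y0/xc = 1.1 curve in Figure 4.3
```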


4.3 Lady in the Lake Problem

The lady in the lake problem became famous decades ago, when it appeared in
Martin Gardner’s “Mathematical Games” column in Scientific American [11] in 1975.
Gardner presented the problem as follows:

A young lady was vacationing on Circle Lake, a large artificial body of water
named for its precisely circular shape. To escape from a man who was pursuing her,
she got into a rowboat and rowed to the center of the lake, where a raft was anchored.
The man decided to wait it out on shore. He knew she would have to come ashore
eventually. Since he could run four times faster than she could row, he assumed
that it would be a simple matter to catch her as soon as her boat touched the lake’s
edge. But the girl - a mathematics major at Radcliffe - gave some thought to her
predicament. She knew that on foot she could outrun the man (which does raise the
question of why such a smart lady got herself into this situation in the first place by
rowing out into a lake!). It was only necessary to devise a rowing strategy that would
get her to a point on shore before he could get there. She soon hit on a simple plan,
and her applied mathematics applied successfully. What was the girl’s strategy?

The lady’s escape strategy consists of two stages. She first hops into her boat
and rows away from the raft in such a way that she, the raft, and the man are always
collinear. This first part of the lady’s rowing path will clearly have to change direction
constantly to continually maintain collinearity because the man will instantly begin
running around the lake’s edge in his attempt to intercept her at the shore. This
is illustrated in Figure 4.4, where we assumed that the man runs counterclockwise
around the lake. We will show later that the lady can maintain collinearity at least
for a while. Let us assume that the man runs at speed v and that the lady rows with
speed αv. Thus, in the original statement of the problem α = 0.25. We see from
Figure 4.4 that the man opens up the angle θ at the rate of dθ/dt = v/R, where θ is


Figure 4.4: The first stage of the lady’s escape

measured with respect to the line initially joining the man and the lady on the raft. Without any loss of generality we can assume that this initial line is the vertical axis of our coordinate system, as shown in Figure 4.4. Since the lady's angular speed component must be

$$v_\theta = r\frac{d\theta}{dt}$$

for her to maintain the raft between herself and the man, we can write her angular speed as

$$v_\theta = v\frac{r}{R}. \tag{4.9}$$

The farther she gets from the raft, then, (4.9) tells us, the greater must be her angular speed if she is to maintain collinearity.

Next, since the lady’s total speed through the water is αv, her radial speed


component ($v_r$) must be such that

$$v_r^2 + v_\theta^2 = (\alpha v)^2,$$

because her total speed is geometrically represented by the hypotenuse of a right triangle, with perpendicular sides $v_r$ and $v_\theta$. Thus,

$$v_r = \sqrt{\alpha^2v^2 - v_\theta^2} = \sqrt{\alpha^2v^2 - v^2\frac{r^2}{R^2}},$$

or

$$v_r = \frac{dr}{dt} = v\sqrt{\alpha^2 - \frac{r^2}{R^2}}. \tag{4.10}$$

The lady has a positive vr (that is, she moves ever closer to shore, all the while
keeping half the lake between herself and the man) as long as α2 − r2 /R2 > 0, that
is, until r = αR. At the instant her vr drops to zero she switches to the second stage
of her escape strategy, which we will describe below.

First, let us calculate how long it takes her to arrive at the condition $v_r = 0$. Since $dt = dr/v_r$, then if we call t = T the time at which $v_r = 0$, we have

$$T = \int_0^T dt = \int_0^{\alpha R} \frac{dr}{v_r} = \int_0^{\alpha R} \frac{dr}{v\sqrt{\alpha^2 - r^2/R^2}} = \frac{R}{v}\int_0^{\alpha R} \frac{dr}{\sqrt{(\alpha R)^2 - r^2}} = \frac{R}{v}\sin^{-1}\left(\frac{r}{\alpha R}\right)\Big|_0^{\alpha R} = \frac{R}{v}\sin^{-1}(1),$$

or

$$T = \frac{\pi R}{2v}. \tag{4.11}$$

When the lady arrives at the circle with radius αR centered on the raft, at time
t = T , she has arrived at what we call the “go-for-broke” circle, because now that
she is no longer moving ever closer to shore with the first part of her escape strategy,
she forgets about maintaining collinearity and simply rows straight for shore at her


full water speed of αv. She has distance R − αR to row (at speed αv) and the man has distance πR (half the circumference of the lake) to run at speed v. She gets to shore before he gets to her if

$$\frac{R - \alpha R}{\alpha v} < \frac{\pi R}{v},$$

or $R(1 - \alpha) < \pi\alpha R$, or $1 - \alpha < \pi\alpha$, or $1 < \alpha(1 + \pi)$, or, at last, if

$$\alpha > \frac{1}{1 + \pi} = 0.241453\ldots \tag{4.12}$$

Since α = 0.25 in the Scientific American version of the problem, we see that this
two-stage escape strategy works and that the lady’s virtue is preserved.
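The inequality chain above is captured by the following tiny check (ours; we take R = v = 1 without loss of generality):

```python
import math

def two_stage_escape(alpha):
    # lady's time (R - alpha R)/(alpha v) vs. the man's time pi R / v, with R = v = 1
    return (1.0 - alpha) / alpha < math.pi

print(two_stage_escape(0.25))    # True: the Scientific American lady escapes
print(two_stage_escape(0.24))    # False: just below the threshold
print(1.0 / (1.0 + math.pi))     # the critical ratio 0.241453...
```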

Of course, if α is sufficiently large there is no need for a two-stage escape strategy. It is easy to see that if α is "big enough" then all the lady needs to do is immediately row directly to shore, to the point directly opposite the man's location. She gets to shore before he gets to her if

$$\frac{R}{\alpha v} < \frac{\pi R}{v},$$

that is, if $\alpha > 1/\pi = 0.3183099\ldots$. Still, while not essential for her, the two-stage strategy will give the lady a little extra head start on the man, and it is interesting to calculate how much this head start is for α = 1/π. As before, in the two-stage strategy the man, the raft, and the lady remain collinear until the lady reaches the go-for-broke circle, with radius αR = R/π. Then she rows straight for shore, now

distance R − R/π = R(1 − 1/π) away. Since her rowing speed is αv = v/π, this requires a time (during her second stage) of

$$\frac{R(1 - 1/\pi)}{v/\pi} = \frac{R}{v}(\pi - 1).$$

The man reaches her landing point on the shore after running halfway around the lake, which requires a time (starting at the instant the lady "goes for broke") of

$$\frac{\pi R}{v}.$$

So, she arrives at her landing point on the shore before he does by a time interval of

$$\frac{R}{v}\pi - \frac{R}{v}(\pi - 1) = \frac{R}{v}.$$
To put this head start (in time) in perspective, it is the time it takes the man to run
distance R, the radius of the lake.

Let us suppose now that the lady does not have a big α. Suppose, in fact, that
it is smaller than (1 + π)−1 . Is it then impossible for her to escape from the man?
Actually, if we make a plausible assumption about the man’s reasoning (meaning, he
is rational), then it is still possible for a slow-rowing lady to escape. Since the lady
is a Radcliffe math major, and the man surely knows some math, too, therefore, let
us assume that, as soon as the lady leaves the raft and begins to execute the first
stage of her escape strategy, the man deduces what she is up to. That is, he observes
that as he moves, she moves to keep the raft between him and her even as she moves
ever closer to the shore. He then further deduces that as soon as she reaches her
go-for-broke circle she will head straight for the shore. So, here is our assumption
- as soon as he sees her go to the second stage of her escape strategy, that is, at
the instant she makes straight for shore, he stops watching her carefully and simply
runs around the lake to the point on the shore where he now knows she is heading.
The only thing that will cause him to reevaluate matters is if the lady stops her
go-for-broke rowing and, for whatever reason, begins to move back toward the raft.


Figure 4.5: The instant when the lady reaches her go-for-broke circle

However, being a clever math major, and knowing her α is less than (1 + π)−1 ,
she has one last trick up her sleeve. She will, indeed, row a straight-line path to
shore as soon as she reaches her go-for-broke circle, but it will not be the shortest
distance straight-line path that the man thinks she will row. To see what she has in
mind instead, look at Figure 4.5, where, with no loss in generality, we put the lady’s
position at the instant she reaches her go-for-broke circle at (αR, 0). The man’s
position at that instant is (−R, 0). In the notation of the Figure 4.5, φ is the angle
the straight line joining the raft to the lady’s landing point on the shore (S) makes
with the horizontal axis. The man is assuming that φ = 0, but he is wrong, as you
will see soon.

Let us talk about the lady's new escape strategy. First, we will simplify our calculations by noticing that the ratio of the radius of the go-for-broke circle to the radius of the lake is αR/R = α. If we next denote the radius of the go-for-broke circle as our unit distance, then αR = 1, and so

$$\alpha = 1/R. \tag{4.13}$$

What this means is that if we wish to find the smallest value for α for which the lady
can still escape, then an equivalent problem is that of finding the largest R for which
the lady can still escape. And finally, since the lady rows at speed αv, we can write
her rowing speed as (1/R)v = v/R. We can now set the problem up mathematically
as follows. When the lady reaches her go-for-broke circle (point L in the figure), she
is distance αR = 1 from the raft, and the law of cosines tells us that the distance
LS she has left to row to the shore to reach point S is
\[
LS = \sqrt{1 + R^2 - 2R\cos\phi}.
\]

This takes her a time interval of


\[
\frac{LS}{v/R} = \frac{\sqrt{1 + R^2 - 2R\cos\phi}}{v/R} = \frac{R}{v}\sqrt{1 + R^2 - 2R\cos\phi} \qquad (4.14)
\]
to row.

The man is running clockwise around the lake to S (see Figure 4.5). We will
quote Schuurman and Lodder [29] about what both the lady and the man conclude
once she reaches her go-for-broke circle: “... she performs an infinitesimal radial
feint (toward the shore that leads the man to start running clockwise). From the
moment on, (the man’s) best policy is to continue running clockwise if (the lady)
goes to shore along a straight line not crossing the (go-for-broke) circle. If (the man)
would return, a new diametrical mutual position, advantageous to (the lady) would
be established.” This last sentence is important to understand. It points out that
if the man should at any time reverse his running direction around the lake, then the
lady could, at the least, start rowing directly away from him at the instant of his
reversal and head straight for shore. That would have her starting the second stage
of her original escape strategy from a point beyond the go-for-broke circle, and yet


still leave the man with half the lake’s circumference to travel. Even better (from
the lady’s point of view), would be for her to simply flip the sign of φ, and then
the situation is just as it was before he switched. So, once the man has committed
to a running direction, we see that he gains nothing by reversing his decision - he
therefore will run through the angle π + φ to reach S. The time required for the man
to run distance (π + φ)R around the lake to S is
\[
\frac{R}{v}(\pi + \phi). \qquad (4.15)
\]
Thus, the lady will just escape the man if the two times given by (4.14) and (4.15)
are equal, that is, if
\[
\pi + \phi = \sqrt{1 + R^2 - 2R\cos\phi}.
\]

Squaring both sides and solving for R gives


\[
R = \cos\phi \pm \sqrt{\cos^2\phi + (\pi + \phi)^2 - 1},
\]

and since R > 0 we must use the plus sign,


\[
R = \cos\phi + \sqrt{\cos^2\phi + (\pi + \phi)^2 - 1}. \qquad (4.16)
\]

Figure 4.6 shows the behavior of R(φ), and it is obviously a nondecreasing function
of φ. To find the smallest α for which the lady escapes we must use the largest
possible value for R (see equation (4.13)). That is, we want to find the value of φ
that maximizes R(φ). Now, even though R continually gets bigger with increasing
φ, there is a limit on how big φ can be. If φ exceeds the value it has such that the
line LS (in Figure 4.5) is tangent to the go-for-broke circle, then the lady’s rowing
path will take her back inside the go-for-broke circle, that is, she will have a radial
speed component pointing back toward the raft (which is not a feature we expect
in an escape strategy). That is, the lady should pick φ such that the line LS is
perpendicular to the x - axis. From Figure 4.5 we see that this value of φ (= φt )
satisfies the condition
\[
\cos(\phi_t) = \alpha R / R = 1/R.
\]


Figure 4.6: The radius of the lake R(φ) given by equation (4.16), plotted as a function of φ (in radians)

If we substitute this condition in (4.16) we get

\[
\frac{1}{\cos(\phi_t)} = \cos(\phi_t) + \sqrt{\cos^2\phi_t + (\pi + \phi_t)^2 - 1},
\]

which reduces to the equation

\[
\tan(\phi_t) = \pi + \phi_t. \qquad (4.17)
\]

It is clear, simply by sketching the curves for each side of (4.17), that there is a
solution to (4.17) somewhere in the interval (0, π/2). In [20] it was found by numerical
means that φt = 1.3518168... radians, so that
\[
\cos(\phi_t) = 1/R_{\max} = \alpha_{\min} = 0.2172336....
\]

In other words, the lady can escape even if the man runs 1/α_min = 4.6033388... times
as fast as she can row, which is significantly greater than the factor of four given in
the Scientific American version of the problem.
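The numerical solution of (4.17) is easy to reproduce. Here is a minimal bisection sketch (Python; the starting bracket [1.0, 1.5] is an assumption justified by checking the signs of tan(φ) − (π + φ) at its endpoints):

    # Minimal sketch: solve tan(phi) = pi + phi on (0, pi/2) by bisection, then
    # recover alpha_min = cos(phi_t) as in equation (4.13).
    import math

    def f(phi):
        return math.tan(phi) - (math.pi + phi)

    lo, hi = 1.0, 1.5              # f(1.0) < 0 < f(1.5), so a root lies in between
    for _ in range(60):            # each bisection step halves the bracket
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid

    phi_t = 0.5 * (lo + hi)
    alpha_min = math.cos(phi_t)
    print(phi_t, alpha_min, 1.0 / alpha_min)
    # phi_t ~ 1.3518168, alpha_min ~ 0.2172336, 1/alpha_min ~ 4.6033388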

Chapter 5

Pursuit-Evasion Problem as an
Optimal Control Problem

5.1 Basic Concepts

Let us assume that two points, one of which we shall call “pursuing” and the other
“pursued” or “evading”, are moving in X ⊂ Rn . The motion of each of these points is
subject to its own particular system of differential equations with its own particular
control parameter. We shall denote the control parameter, the control region, and the
trajectory of the motion of the pursuing point by u, U ⊂ R^r, and x(t), respectively.
We shall denote these quantities for the pursued point by the symbols v, V ⊂ R^s,
and y(t).

Let u(t) and v(t) be certain admissible controls (i.e., piecewise continuous), and
let x(t) and y(t) be the corresponding trajectories with initial conditions

\[
x(0) = x_0, \qquad y(0) = y_0. \qquad (5.1)
\]

If x(t1) = y(t1) for some t1 > 0, we shall call t1 an encounter time, and the very


occurrence that x(t1) = y(t1) will be referred to as an encounter. Generally speaking,


if u(t) and v(t) are chosen arbitrarily, an encounter may not occur for any t > 0.
If an encounter does occur, we shall say that u(t) is a pursuing control (for a given
control v(t), and for given initial conditions x0 and y0). Even then, for the given x0,
y0, v(t), and the chosen control u(t), more than one encounter may take place. We
shall call the smallest positive number t1 which is an encounter time the pursuit
time corresponding to the controls u(t) and v(t). We shall denote the pursuit time
by T_{u,v}. In what follows, the initial conditions (5.1) will be assumed to be fixed (in
this connection, x0 and y0 do not enter into the notation for the pursuit time).

Henceforth, we shall assume that the pursuing point has the following property:
for every given control v(t) there exists (for given initial conditions (5.1)) a pursuing
control u(t).

If the control v(t) of the evading point has been chosen, we can pose the problem
of finding a pursuing control u(t) such that the corresponding pursuit time T_{u,v} takes
on a minimal value. We shall assume that there exists, for every admissible control
v(t), an admissible control u(t) which brings about the minimum of the pursuit
times. We shall denote the minimum by T_v:

\[
T_v = \min_u T_{u,v}.
\]

Furthermore, we shall assume that there exists an admissible control v(t) which
brings about the maximum of the values of T_v. We shall denote this maximum by
T:

\[
T = \max_v T_v = \max_v \bigl(\min_u T_{u,v}\bigr). \qquad (5.2)
\]

Similarly,
\[
T = \min_u T_u = \min_u \bigl(\max_v T_{v,u}\bigr).
\]

Moreover,
\[
\min_u \bigl(\max_v T_{v,u}\bigr) = \max_v \bigl(\min_u T_{u,v}\bigr).
\]
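In general, only the inequality max-min ≤ min-max holds; the equality just stated is a special property assumed of the pursuit times here. The following toy computation (hypothetical numbers, chosen only to illustrate the two iterated optimizations over finite control sets) makes the distinction concrete:

    # Toy illustration with hypothetical pursuit times: rows index the pursuer's
    # controls u, columns the evader's controls v.
    T = [
        [2.0, 5.0, 3.0],
        [4.0, 1.0, 6.0],
        [3.0, 2.0, 2.0],
    ]

    max_min = max(min(row[j] for row in T) for j in range(3))   # max_v (min_u T_{u,v})
    min_max = min(max(row) for row in T)                        # min_u (max_v T_{u,v})
    print(max_min, min_max)   # 2.0 and 3.0: max-min <= min-max, here strictly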


The problem consists of finding a pair of admissible controls u(t) and v(t) such
that T_{u,v} = T_{v,u} = T. Such a pair u(t) and v(t) will be called an optimal pair
of controls; the corresponding pair of trajectories x(t) and y(t) (with initial values
(5.1)) will be called an optimal pair of trajectories. Thus, the control u (for a given
control v(t)) is to be chosen in such a way that the encounter of the pursuing and
pursued points will take place as soon as possible. The choice of the control v, on
the other hand, is aimed at putting off the encounter as long as possible.

Remark Let us consider the case expressed by equation (5.2). Note that in choosing
the control u(t) (which defines the motion of the pursuing point), we shall always
assume that the control v (t) for the evading point is known beforehand.

In accordance with this fact, in order to determine T, first the minimum with respect
to all possible controls u(t) is taken for a certain fixed control v(t), and then the maximum
with respect to all possible controls v(t) is taken.

To solve the given problem, we shall assume that the motion of the pursuing
point in X is described by the linear equation (in vector form)

\[
\frac{dx}{dt} = f(x, u) \equiv Ax + Bu + c, \qquad (5.3)
\]

for which the corresponding control region U is a closed, convex, bounded polyhedron
in the space R^r of the variable u = (u^1, ..., u^r). Let the motion of the evading point be described
by the equation (in vector form)

\[
\frac{dy}{dt} = g(y, v, t) \qquad (5.4)
\]

and let the corresponding control region V be a set in the s-dimensional space R^s of
the variable v = (v^1, ..., v^s). We shall assume that the set of all piecewise continuous
controls is the class of admissible controls (both for u and for v). We shall impose
the usual conditions (continuity in y, v, and t, and continuous differentiability with


respect to the coordinates y^1, ..., y^n of y) on the coordinates of the vector function


g(y, v, t).

To solve the given problem we can use Pontryagin’s maximum principle. We shall
introduce two auxiliary vectors

ψ = (ψ_1, ..., ψ_n), χ = (χ_1, ..., χ_n),

and two Hamiltonian functions


\[
H_1(\psi, x, u) = \sum_{\alpha=1}^{n} \psi_\alpha f^\alpha(x, u) = (\psi, f(x, u)),
\]
\[
H_2(\chi, y, v) = \sum_{\alpha=1}^{n} \chi_\alpha g^\alpha(y, v, t) = (\chi, g(y, v, t)),
\]
corresponding to the pursuing and pursued objects. We can write the following two
systems of equations for the auxiliary unknowns ψ_i and χ_i with the aid of H_1 and
H_2:
\[
\frac{d\psi_i}{dt} = -\frac{\partial H_1}{\partial x^i}, \qquad i = 1, 2, ..., n, \qquad (5.5)
\]
\[
\frac{d\chi_i}{dt} = -\frac{\partial H_2}{\partial y^i}, \qquad i = 1, 2, ..., n. \qquad (5.6)
\]

Suppose that u(t), x(t), v(t) and y(t) are given. Then, if we substitute these
functions in the right-hand sides of systems (5.5) and (5.6), we obtain linear systems
in the unknowns ψi and χi . Every solution ψ(t), χ(t) of these systems will be said to
correspond to the chosen functions u(t), x(t), v(t), and y(t). The following theorem
gives a necessary condition for optimality in the problem under consideration.

Theorem 5.1.1 Let u(t) and v(t) be an optimal pair of controls, let x(t) and y(t) be
the corresponding optimal pair of trajectories, and let T be the pursuit time. Then,


there exist nontrivial solutions ψ(t) and χ(t) of systems (5.5) and (5.6) which cor-
respond to u(t), x(t), v(t), and y(t) such that:

1. The maximum conditions
\[
\max_{u \in U} H_1(\psi(t), x(t), u) = H_1(\psi(t), x(t), u(t)), \qquad (5.7)
\]
\[
\max_{v \in V} H_2(\chi(t), y(t), v) = H_2(\chi(t), y(t), v(t)) \qquad (5.8)
\]
hold for all t, 0 ≤ t ≤ T;

2. At the time t = T, the conditions
\[
H_1(\psi(T), x(T), u(T)) \ge H_2(\chi(T), y(T), v(T)), \qquad (5.9)
\]
\[
\psi(T) = \chi(T) \qquad (5.10)
\]
hold.

The details of the proof of Theorem 5.1.1 are long and involved; the reader can
find them in [24]. Although this theorem gives the necessary conditions of optimality
for PE problems (and it can be generalized to multiple pursuers and multiple evaders
as in [7]), the fact is that the PE problems studied below can be analyzed directly
by more elementary methods.


5.2 Simple Pursuit in the Plane

In the simple pursuit problem (we follow the presentation given in [12]) two players
move in the Euclidean plane R^2 with simple motion: each has a bound on his speed,
but there are no further restrictions (e.g., abrupt directional changes are allowed).
One player, the pursuer, wishes to capture the other, the evader, that is, to attain
perfect coincidence of their terminal positions. If α > β holds, where α is the pursuer's
speed bound and β the evader's, then termination is assured in finite time, whatever
the initial positions and the actions of the evader. On the other hand, in the case α ≤ β the
evader can avoid capture forever from any initial positions not already in contact.

First, let us discuss briefly some aspects of simple motion for a single player. If
the player's position at time t ∈ R^1 is denoted by x(t) ∈ R^2, then the velocity vector
is ẋ(t), and the speed is |ẋ(t)|. Thus the dynamical constraint is |ẋ| ≤ α, and the
following holds for the control u:
\[
\dot x = u; \qquad u : R^1 \to R^2, \quad |u(t)| \le \alpha. \qquad (5.11)
\]

Hence,
\[
x(t) = x(0) + \int_0^t u(s)\,ds
\]
for some function u(·) as given above. For simplicity let us place the origin at x(0),
so that x(0) = 0.

We want to know where the player can get to at time t. The constraint on u(·)
yields
\[
|x(t)| \le \int_0^t |u(s)|\,ds \le \alpha t.
\]
Any point y with |y| ≤ αt can be “attained” by the control
\[
u(s) =
\begin{cases}
\alpha\,\dfrac{y}{|y|}, & \text{for } 0 \le s \le |y|/\alpha; \\
0, & \text{for } |y|/\alpha < s \le t.
\end{cases}
\]
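As a sanity check, integrating this control does steer the player from the origin to y. Here is an illustrative Python sketch (the values y = (3, 4), α = 2, t = 3 are hypothetical; note |y| = 5 ≤ αt = 6):

    # Sketch: integrate x' = u(s) with the piecewise control above and verify
    # that the endpoint is y (up to discretization error).
    import math

    def reach(y, alpha, t, n=100_000):
        norm_y = math.hypot(y[0], y[1])
        switch = norm_y / alpha            # time at which y is reached; u = 0 after
        x, ds = [0.0, 0.0], t / n
        for k in range(n):
            if k * ds <= switch:
                x[0] += alpha * y[0] / norm_y * ds
                x[1] += alpha * y[1] / norm_y * ds
        return x

    print(reach([3.0, 4.0], alpha=2.0, t=3.0))   # ~ [3.0, 4.0]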


Figure 5.1: Simple motion in the plane. Point x moves anywhere within A_x(3) at
time t = 3; if its position y at t = 1 or z at t = 2 is known, the possibilities reduce
to A_y(2) or A_z(1).

Thus, the attainability set (“reachable set”) at time t can be defined as A_0(t) = {y : |y| ≤ αt}.
Figure 5.1 shows simple motion in the plane, according to this attainability-set
rule.

Let us return to the game, in the case α > β. If the pursuer's motion is
x : R^1 → R^2 and the evader's is y : R^1 → R^2, the equations of motion are
\[
\dot x = u, \qquad \dot y = v \qquad (5.12)
\]
for suitable controls u, v : R^1 → R^2 with all values |u(t)| ≤ α, |v(t)| ≤ β. At any


time denote the players' distance by r = |x − y|. Then
\[
r\dot r = \frac{d}{dt}\,\frac{1}{2}|x - y|^2 = (x - y)'(\dot x - \dot y) = (x - y)'u - (x - y)'v. \qquad (5.13)
\]


The natural strategy for the pursuer is to take

\[
u = \dot x = \alpha\,\frac{y - x}{|y - x|} = \frac{\alpha}{r}(y - x).
\]

Then, in (5.13),

\[
r\dot r = -\alpha\,\frac{r^2}{r} - (x - y)'v \le -\alpha r + r\beta = -(\alpha - \beta)r
\]

by Cauchy’s inequality on the last term. Therefore, as long as r > 0, we have


\[
\dot r \le -(\alpha - \beta), \qquad r(t) \le r(0) - (\alpha - \beta)t = |x_0 - y_0| - (\alpha - \beta)t.
\]

This shows that capture (r = 0) for the case α > β must occur at some time T with

\[
T \le \frac{|x_0 - y_0|}{\alpha - \beta}. \qquad (5.14)
\]
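A short simulation sketch illustrates the strategy and the bound (5.14). The parameters α = 1, β = 0.6 and the initial positions are hypothetical choices made here; the evader flees radially, which makes the bound tight:

    # Sketch: pure pursuit u = alpha*(y - x)/|y - x| against a radially fleeing
    # evader; capture time should not exceed |x0 - y0|/(alpha - beta) = 12.5.
    import math

    alpha, beta, dt = 1.0, 0.6, 1e-4
    x, y = [0.0, 0.0], [3.0, 4.0]              # |x0 - y0| = 5
    t = 0.0
    r = math.hypot(y[0] - x[0], y[1] - x[1])

    while r > 1e-3:
        ex, ey = (y[0] - x[0]) / r, (y[1] - x[1]) / r     # unit vector from x to y
        y[0] += beta * ex * dt; y[1] += beta * ey * dt    # evader runs directly away
        x[0] += alpha * ex * dt; x[1] += alpha * ey * dt  # pursuer heads at evader
        t += dt
        r = math.hypot(y[0] - x[0], y[1] - x[1])

    print(t, 5.0 / (alpha - beta))   # ~12.5 vs. the bound 12.5 of (5.14)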


5.3 One-dimensional Rocket Chase

Next, we consider a general problem of the rocket chase and we base the discussion
on [12].

Two players move on a straight line, the pursuer having a bound on his acceleration,
the evader a bound on his speed. The game ends when the pursuer attains a
previously given distance from the evader.

There is an obvious solution: the pursuer uses all his capabilities to move toward
the evader, who is then captured within a bounded time interval. (The precise time
bound will depend on the parameters of the game, and on the initial positions.)

If x : R1 → R1 describes the pursuer’s motion, and y : R1 → R1 describes the


evader’s motion, then the equations of motion are

ẍ = u, ẏ = v

for admissible u, v : R^1 → [−1, 1]. We take 1 as the bound for both controls, and ε
as the prescribed capture distance, 0 ≤ ε < +∞. Thus, the evader moves on R^1 with simple
motion, in the sense of the previous example. The pursuer's motion
is described by

\[
x(t) = x(0) + \dot x(0)\,t + \int_0^t\!\!\int_0^s u(r)\,dr\,ds = x(0) + \dot x(0)\,t + \int_0^t (t - s)\,u(s)\,ds \qquad (5.15)
\]

and suggested by the attainability sets in Figure 5.2, where

\[
x_0 + \dot x_0 t - \frac{t^2}{2} \le x(t) \le x_0 + \dot x_0 t + \frac{t^2}{2},
\]
\[
y_0 - t \le y(t) \le y_0 + t,
\]
since |u| ≤ 1 and |v| ≤ 1.


Figure 5.2: Phase portrait of the motion ẍ = u in the x-y plane, where x(t) is given
by equation (5.15) for x(0) = 0, ẋ(0) = 2; attainability sets at t = 2/3, 4/3, 2, 8/3
for the same initial values x(0) and ẋ(0). The vertex loci are the parabolas
ẋ = y = ±√(2(x + 2))

The first-order version of the motion equation is the dynamical equation for the
two-player system:

ẋ1 = x2 , ẋ2 = u, ẋ3 = v.

Subsequently the matrix form of this will be treated,
\[
\begin{pmatrix} \dot x_1 \\ \dot x_2 \\ \dot x_3 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
+
\begin{pmatrix} 0 \\ u \\ 0 \end{pmatrix}
+
\begin{pmatrix} 0 \\ 0 \\ v \end{pmatrix}
\]


Figure 5.3: Trajectories of ẋ = y − v, ẏ = u in the x-y plane with u = v = ±1
outside the target |x| ≤ ε

the termination condition |x − y| ≤ ε translating to
\[
(1, 0, -1)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in [-\varepsilon, \varepsilon].
\]

Thus, the natural phase space is R^3. This can be reduced to R^2 by introducing
new variables x = x_1 − x_3, y = x_2. The resulting equations and termination condition


Figure 5.4: Trajectories of ẋ = y − v, ẏ = u in the x-y plane. From point a the
evader mistakenly chooses v = −1, but reverses his choice at b; capture occurs at c
(later than it would have occurred at d)

are

ẋ = y − v, ẏ = u; |x| ≤ ε. (5.16)

Let us assume (for a preliminary orientation) that both players' controls are
constant on some time interval. The differential equation for the trajectories, with
\[
\frac{dy}{dx} = \frac{\dot y}{\dot x},
\]


is

\[
(y - v)\,\frac{dy}{dx} = u, \qquad (y - v)^2 = 2ux + \text{const}. \qquad (5.17)
\]

The point then moves along a parabola, upward for u > 0 and downward for u < 0
(see Figure 5.3, with u = v = 1 in the left half plane and u = v = −1 in the right).

Now let us suppose that, at some point to the left of the target, the evader chooses
a control other than v = 1, while the pursuer sticks with u = 1. The motion then proceeds
along another parabola (with axis y = v; see Figure 5.4 for v = −1). Therefore
capture, even with ε = 0, can be ensured from all initial positions, for example by
taking u = 1, quite indifferent to the evader's action.
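A small simulation of the reduced system (5.16) illustrates this. The initial data, the capture distance ε, and the evader's switching time are hypothetical choices made here for the example; the pursuer simply holds u = 1:

    # Sketch: x' = y - v, y' = u with u = 1 fixed; the evader's reversal at t = 3
    # only delays the inevitable entry into the target set |x| <= eps.
    eps = 0.1
    x, y = -6.0, 0.0        # start to the left of the target
    dt, t = 1e-4, 0.0

    while abs(x) > eps:
        v = 1.0 if t < 3.0 else -1.0   # evader switches his control midway
        u = 1.0                        # pursuer ignores the evader's action
        x += (y - v) * dt
        y += u * dt
        t += dt

    print(t)   # ~3.98: capture in finite time despite the switch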


5.4 Pursuit on a Sphere (Kelley’s game)

Another example is given by pursuit on a sphere, originally formulated by Kelley. We


follow the presentation of [12].

Two players move on the two-sphere S 2 in R3 , each with a fixed bound on his
speed. The game ends at the coincidence of positions.

Here the idea is that “in a dogfight, the planes tend to move in a circular fashion”.
The simplification does away with one significant aspect of actual combat: that the
roles of the pursuer and the evader are not fixed, but may well switch back and forth.
The outcome is similar to the simple motion problem (section 5.2). Let us denote
the pursuer’s speed bound by α, the evader’s by β. If α > β, the pursuer can force
termination from any initial position, within a bounded time interval. In the case
α < β the evader can avoid capture at all times t > 0 (and the stand-off situation
α = β is rather too sensitive to details in the specification of the players' strategies).

Let us talk about these different games. In the case α > β, first assume that the
players are not at diametrically opposite points initially. Then there is the unique
shortest arc γ of a great circle joining their positions. By a parallel shift along γ,
move a neighborhood of the evader’s position to the pursuer’s (this “action at a
distance” serves to identify the control of the evader). The pursuer then uses the
control u = v + w, where the first component neutralizes the evader’s action, the
second, w with magnitude |w| = α − β in the direction of γ, serves to decrease the
players' distance (at a rate α − β, see section 5.2) until capture occurs. If their initial
positions are opposite, then any constant control u with |u| = α > β, applied over a
short interval, will achieve non-opposing positions. By a like reasoning, in the case
α < β the evader can maintain forever an initial distance from the pursuer.

The idea is probably clear enough, and will apply equally well to simple pursuit on


an n-dimensional Riemannian manifold (thus, the “diametrically opposite points”
would be replaced by conjugate points). Consider the motion of a single player over
the unit sphere S^{n−1} of R^n. If its motion is described by x : R^1 → R^n, x(0) ∈ S^{n−1},
then x(t) will remain on S^{n−1} iff |x(t)|^2 is constant, i.e., x is perpendicular to ẋ = dx/dt.
Further, the motion will be “simple” if the only further dynamical restriction is a
magnitude bound on ẋ. We wish to express this as a relation between ẋ and suitable
arbitrary controls u.

Lemma 5.4.1 (for a broader discussion see [12]) For any points a ≠ b on S^{n−1} there
is a mapping x, y ↦ E(x, y), defined for x, y near a, b and with (n, n − 1) matrices
as values, analytic in the coordinates of x, y, and such that
\[
x'E(x, y) = 0, \qquad E'(x, y)E(x, y) = I_{n-1}, \qquad (5.18)
\]
\[
y'E(x, y) = |\sin\varphi|\,(1, 0, ..., 0), \qquad (5.19)
\]
where ϕ is the angle between x and y.

Proof By assumption, the vectors a, b are independent, so that there is a basis for
R^n of the form a, b, c_3, ..., c_n. Then
\[
x, y, c_3, ..., c_n \qquad (5.20)
\]
remain independent if x, y are close enough to a, b. Apply the Gram-Schmidt
orthogonalization process to the sequence (5.20), obtaining orthonormal vectors e_1, e_2, ..., e_n.
Since |x| = 1, we have e_1 = x, and, in the second step,
\[
e_2 = \frac{y - (x'y)x}{|\sin\varphi|}, \qquad (5.21)
\]
since
\[
|y - (x'y)x|^2 = 1 - (x'y)^2 = 1 - \cos^2\varphi.
\]


Collect the column vectors e_2, ..., e_n into the (n, n − 1) matrix E(x, y). Equation
(5.18) holds since (x, E(x, y)) is orthonormal. The first coordinate of y'E(x, y) is
y'e_2 = |sin ϕ| from (5.21). The remaining coordinates are 0, since e_k is perpendicular
to both e_1 = x and e_2, and hence to y also. This completes the proof.
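The construction in the proof is straightforward to carry out numerically. The sketch below (an illustration assuming n = 3, NumPy, and a fixed auxiliary vector c that stays independent of the chosen x and y) builds E(x, y) by Gram-Schmidt and checks (5.18) and (5.19):

    # Sketch of the proof's construction for n = 3: Gram-Schmidt on (x, y, c)
    # yields E(x, y) = (e2, e3) with x'E = 0, E'E = I, y'E = (|sin phi|, 0).
    import numpy as np

    def E(x, y, c=np.array([0.0, 0.0, 1.0])):   # c plays the role of c3; it must
        e1 = x                                  # remain independent of x and y
        e2 = y - (x @ y) * x
        e2 = e2 / np.linalg.norm(e2)            # = (y - (x'y)x)/|sin phi|
        e3 = c - (c @ e1) * e1 - (c @ e2) * e2
        e3 = e3 / np.linalg.norm(e3)
        return np.column_stack([e2, e3])

    x = np.array([1.0, 0.0, 0.0])
    y = np.array([0.6, 0.8, 0.0])               # cos phi = 0.6, |sin phi| = 0.8
    M = E(x, y)
    print(x @ M)        # ~ (0, 0)            : equation (5.18), first part
    print(M.T @ M)      # ~ 2x2 identity      : equation (5.18), second part
    print(y @ M)        # ~ (0.8, 0)          : equation (5.19)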

Corollary 5.4.2 On a neighborhood of any point on S^{n−1} there is an analytic
mapping x ↦ E(x), whose values are (n, n − 1) matrices, and
\[
x'E(x) = 0, \qquad E'(x)E(x) = I_{n-1}, \qquad E(tx) = E(x) \ \text{for } t > 0. \qquad (5.22)
\]
Proof The corollary follows on taking y = b ≠ ±a, and defining E(x) = E(x, b).
Positive homogeneity is ensured by extending E(·) in the obvious manner, namely
E(tx) = E(x) for t > 0.

Returning to Kelley's game (actually, for dimensions n ≥ 2), we may choose the
state space description
\[
\dot x = E(x, y)u, \qquad \dot y = E(y, x)v.
\]
The control values in R^{n−1} are constrained by |u(t)| ≤ α, |v(t)| ≤ β. The initial
positions are on the unit sphere. If ϕ is the angle between x, y, and r = |sin ϕ|, then
\[
r\dot r = \frac{1}{2}\frac{d}{dt}\sin^2\varphi = \frac{1}{2}\frac{d}{dt}\bigl(1 - (x'y)^2\bigr)
= -(x'\dot y + y'\dot x) = -x'E(y, x)v - y'E(x, y)u. \qquad (5.23)
\]

Write u = −v + w with w ∈ R^{n−1} to be chosen subsequently, subject to |w| ≤ α − β.
Then, using (5.19),
\[
r\dot r = \bigl(-x'E(y, x) + y'E(x, y)\bigr)v - y'E(x, y)w = 0 - |\sin\varphi|\,(1, 0, ..., 0)\,w \le -r(\alpha - \beta) \qquad (5.24)
\]
on taking w' = (α − β)(1, 0, ..., 0). Thus ṙ ≤ −(α − β) < 0, and sin ϕ = 0 can be
attained in finite time, i.e., capture occurs in finite time.
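Continuing the previous numerical sketch (same E(·,·) and assumptions; for definiteness the evader simply flees along the arc γ, and the step size and tolerance are arbitrary choices), one can simulate the strategy u = −v + w on the two-sphere and watch r shrink:

    # Sketch: the strategy u = -v + w on the two-sphere, using E from the
    # previous snippet. With alpha = 1, beta = 0.5, capture occurs at t ~ 3.1.
    import numpy as np

    alpha, beta, dt = 1.0, 0.5, 1e-3
    x = np.array([1.0, 0.0, 0.0])               # pursuer
    y = np.array([0.0, 1.0, 0.0])               # evader, a quarter circle away
    t, r = 0.0, 1.0

    while t < 10.0:
        r = np.sqrt(max(1.0 - float(x @ y) ** 2, 0.0))   # r = |sin phi|
        if r < 1e-2:                                     # capture, up to tolerance
            break
        v = np.array([-beta, 0.0])                # evader flees along gamma
        u = -v + np.array([alpha - beta, 0.0])    # neutralize v, then close in
        x = x + E(x, y) @ u * dt
        y = y + E(y, x) @ v * dt
        x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)  # stay on the sphere
        t += dt

    print(t, r)   # capture in finite time, consistent with (5.24)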

Chapter 6

Conclusions

In this thesis we studied a family of mathematical problems known as pursuit-evasion
(PE) problems. We presented PE problems within the classes of pursuit problems,
evasion problems, and pursuit-evasion problems. We restricted the discussion to the
deterministic approach to PE problems, and stated PE problems as optimal control
problems. To that end, we formulated the general optimal control problem
and discussed the two main approaches to solving it, namely, Pontryagin's
maximum principle (Theorems 2.2.1, 2.2.2), where we introduced a Hamiltonian
function, and Bellman's method of dynamic programming (equation (2.29)), where we
gave a simple example from dynamic programming to explain the main ideas of the
method. To compare the two approaches we provided an example where Pontryagin's
principle applied but Bellman's method failed because the control was discontinuous
(the bang-bang problem). Thus, we showed that the assumption of continuous
differentiability of the functional (2.6), which minimizes the transition time from the
initial point to some other given terminal point, fails to hold even in the simplest cases.
Therefore, in general Bellman's approach yields a good heuristic
method, rather than a rigorous mathematical solution of the problem.


Since there are different formulations of PE problems, we stated the definition of
the PE problem from the point of view of the pursuer, of the evader, and of both. For the
pursuit problems, we presented some main examples, namely, Pierre Bouguer's
pursuit problem, the wind-blown plane problem, the tractrix, and the Apollonius
pursuit problem. In Bouguer's pursuit problem, which we treated as a
pure pursuit problem, we considered the case of a pirate ship pursuing a merchant vessel,
and determined the equation of the trajectory of the pirate ship (the pursuer),
called the line of pursuit. We identified when the capture occurred, and discussed
the case where the pirate ship was slower than the merchant vessel (no capture),
and the case where the pirate ship was faster than the vessel (there was a capture, and we
calculated the total distance travelled by the pirate ship up to its capture of the
merchant vessel for that case). It was also interesting to find that in the
case where the pirate ship and the merchant vessel had equal speeds, the pursuit
became a vertically upward tail chase, with the pirate ship pulling in behind the
merchant vessel.

In the other important example of the pursuit problem, we talked about the
pursuit of a stationary target, namely, the problem where the plane flies to the city
C due west of its starting point while the wind blows from the south. Here we identified
that in the no-wind case (when the ratio of the wind's speed to the plane's speed
is zero) the plane moved directly to the city C while always remaining on the x-axis.
For the case where the wind and the plane had equal speeds we presented
the plane's path and showed the different situations that arose depending on the
initial position of the plane. Also, when the ratio of the speeds was less than one,
the plane did reach the city, and we calculated the time of the trip.

The other pursuit curve we talked about was the tractrix, which, as we showed, got
its name because its path is that of a following, or trailing, object. We
compared the tractrix with Bouguer's pure pursuit curve for the special case of equal


speeds for the pirate ship and the merchant vessel. We found that the results
were quite different: for the tractrix the constant lag held from the outset, while
the constant lag of the pirate ship was an asymptotic property that developed with
the passage of time.

Then, we presented the Apollonius pursuit problem, where we discussed a method of
interception, which is of great interest for PE problems. We provided an example
of a torpedo (T) trying to pursue an enemy ship (E). Here we identified the three
main cases of the relative location of T and E. We defined the set S of all the points in the
plane at which interception can be achieved. This set S is known as the Apollonius
circle, and it is broadly used in PE problems for analyzing how to find a better
strategy to escape, or to prolong the capture time whenever a successful escape is not
possible. Again, we talked about the cases of different speeds: for the
fast torpedo we determined that the interception would occur, while for the slow torpedo
the interception might or might not occur.

Next, we defined the evasion problem and provided one of the general examples
of those problems, called Isaacs' guarding-the-target problem, where we had P
guarding the target area C from attack by E. We formulated the military conception
of this problem, and identified the equation that gave us the evasion curves. Moreover,
we showed how to determine the minimum amount of explosive
(that E carries) required for success in destroying the target area C, as a function of
both E's starting point and the location of the target.

After that we considered the problem of the lady in the lake and the man who
was trying to track her down. We discussed the strategies the lady needed
in order to escape from the man, and explained each of the possible cases. Here we
introduced another interesting notion, the go-for-broke circle, which arose in
formulating the lady's strategies.


Finally, we formulated the PE problem in which both objects of the
PE game, the pursuer P and the evader E, are present. We stated the necessary conditions of
optimality for those PE problems, which were similar to Pontryagin's maximum
principle presented before. We provided the example of simple pursuit in the plane,
which we used to illustrate the simple motion of P and E in the plane. We
explained what we meant by the reachable set in this case, and compared the
dynamical constraints of P and E. In the one-dimensional rocket chase problem
we presented a solution that could be used in other PE problems as an example
of a problem where the game ends when the pursuer attains a previously given
distance from the evader. Here we discussed the different strategies the two objects
had for pursuing their goals. Kelley's game is an example of pursuit on a sphere, where
the idea is that in a dogfight the planes tend to move in a circular fashion. We
expressed the relation between the objects' speeds and their controls. The outcome
was important, and could serve as a model of simple motion for PE problems on a sphere.

References

[1] Barton, J.C. and Eliezer, C.J., On Pursuit Curves, Journal of Australian Math-
ematical Society, Ser. B 41, pp. 358-371 (2000)

[2] Bellman, R.E. and Dreyfus, S.E., Applied Dynamic Programming, Princeton
Univeristy Press, Princeton, NJ (1962)

[3] Bellman, R.E., Dynamic Programming, Princeton University Press, Princeton, NJ (1957)

[4] Boole, G., Treatise on Differential Equations, Macmillan, Cambridge (1859)

[5] Bouguer, P., Sur les Courbes de Poursuites Histoire de l’Académie Royale des
Sciences, Année M.DCCXXXII, Avec les Mémoires de Mathématique et de
Physique, France (1735)

[6] Breitner, M.H. (communicated by Berkovitz, L.D.), Historical Paper, The Genesis
of Differential Games in Light of Isaacs’ Contributions, Journal of Optimization
Theory and Applications, Vol. 124, No. 3, pp. 523-559 (March 2005)

[7] Chikrii, A.A., Prokopovich, P.V., Pursuit and evasion problem for interacting
groups of moving objects, Cybernetics and Systems Analysis, Volume 25, Number 5, pp. 634-640 (1990)

[8] Colman, W.J.A., A Curve of Pursuit, Bulletin of the Institute of Mathematics
and Its Application 27 (3), pp. 45-47 (March 1991)

[9] Dreyfus, S.E., Dynamic Programming and the Calculus of Variations, Academic Press, NY (1965)

[10] Gamkrelidze, R.V., Discovery of The Maximum Principle in Optimal Control,
Mathematics and War (Eds.) VIII, Springer, NY, pp. 160-173 (2003)


[11] Gardner, M., Mathematical Games, Scientific American Columns, Mathematical
Carnival, NY (1975)
[12] Hájek, O., Pursuit Games, Academic Press, NY (1975)
[13] Ioffe, A.D., Tihomirov, V.M., Theory of Extremal Processes, North Holland Publishing Company, NY (1979)
[14] Isaacs, R., Differential Games, Dover Publications, NY (1999)
[15] Kamien, Morton I., Schwartz, Nancy L., Dynamic Optimization: The Calculus of
Variations and Optimal Control in Economics and Management, Series Volume
4, North-Holland Publishing Company, NY (1981)
[16] Koo, D., Elements of Optimization With Applications in Economics and Busi-
ness, Springer, NY (1977)
[17] Krasovskii, N.N., Subbotin, A.I., Game-Theoretical Control Problems, Springer, NY (1988)
[18] Lawrence, J. Dennis, A Catalog of Special Plane Curves, Dover Publications, NY (1972)
[19] Liberzon D., Switching in Systems and Control, Birkhäuser, Boston (2003)
[20] Nahin, Paul J., The Mathematics of Pursuit and Evasion, Princeton University
Press, Princeton, New Jersey (2007)
[21] Pierre, Donald A., Optimization Theory with Applications, Dover Publications,
NY (1986)
[22] Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., Mishchenko, E.F., The
Mathematical Theory of Optimal Processes, The Macmillan Company, NY (1964)
[23] Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., and Mishchenko, E.F.,
L.S. Pontryagin, Selected Works, Volume 1, Selected Research Papers, Gordon
and Breach Science Publishers, NY (1986)
[24] Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., and Mishchenko, E.F.,
L.S. Pontryagin, Selected Works, Volume 4, The Mathematical Theory of Optimal
Processes, Gordon and Breach Science Publishers, NY (1986)
[25] Pshenichnyi, B.N., Shishkina, N.B., Pursuit problems with two moving objects,
Cybernetics and Systems Analysis, Volume 25, Number 4, Springer, NY, pp.
464-471 (2005)


[26] Puckette, C.C., The Curve of Pursuit, The Mathematical Gazette, Volume 37,
pp. 256-260 (1953)

[27] Ruckle, W.H., Geometric Games of Search and Ambush, Mathematics Maga-
zine, Volume 52, Number 4, pp. 195-206 (September 1979)

[28] Saaty, Thomas L., Alexander, Joyce M., Thinking with Models, Pergamon
Press, NY (1981)

[29] Schuurman, W., Lodder, J., The Beauty, the Beast, and the Pond, Mathematics
Magazine, Volume 47, Number 2, pp. 91-93 (March-April 1974)

[30] Young, L.C., Lectures on the Calculus of Variations and Optimal Control The-
ory, W.B. Saunders Company, PA (1969)

[31] Zermelo, E., Über das Navigationsproblem bei ruhender oder veränderlicher
Windverteilung, Zeitschr. f. angew. Math. u. Mech. 11, pp. 114-124 (1931)

[32] Zhukovskiy, V.I., Salukvadze, M.E., The Vector-Valued Maximum, Academic
Press, San Diego (1993)

