Introduction To The Mathematics of Variation: January 2021
1 author:
Taha Sochi
Independent author
All content following this page was uploaded by Taha Sochi on 06 July 2024.
Contents
Preface
Table of Contents
Nomenclature
1 Preliminaries
1.1 Introductory Remarks
1.2 The Calculus of Variations and the Variational Principle
1.3 Functionals in the Calculus of Variations
1.4 The Euler-Lagrange Equation
1.5 Variational Problems with Higher Derivatives
1.6 Variational Problems with Multiple Independent Variables
1.7 Variational Problems with Multiple Dependent Variables
1.8 Variational Problems with Constraints
1.9 Variational Problems with Variable Boundaries
1.10 Variational Problems of Mixed Nature
1.11 Summary
2 Optimal Curves
2.1 Geodesic Curves
2.2 Fastest Descent Curves
2.3 The Catenary
2.4 Isoperimetric Problems
9 Numerical Methods
References
Index
Nomenclature
In the following list, we define the common symbols, notations and abbreviations which are used in the
book as a quick reference for the reader. The list may exclude what is used locally and casually.
$v$    speed
$v_w$    speed of wave
$\mathbf{v}$    velocity
$V$    volume
$x, y, z$    coordinates of 3D Euclidean space (usually orthonormal Cartesian)
$y_i$    the $i$th 1D Rayleigh-Ritz approximation
$y_1, \cdots, y_n$    $y$ values of function points in 1D finite difference method[1]
$y^{(i)}$    the $i$th derivative of $y$ (i.e. $y^{(i)} = d^i y/dx^i$)
$y_{ji}$    partial derivative of $y_j$ with respect to $x_i$ (i.e. $y_{ji} = \partial y_j/\partial x_i$)
$y_{x_1}$    partial derivative of $y$ with respect to $x_1$ (i.e. $y_{x_1} = \partial y/\partial x_1$)
$z_{mn}$    the $mn$th 2D Rayleigh-Ritz approximation
$z_{11}, \cdots, z_{mn}$    $z$ values of function points in 2D finite difference method
$\Gamma$    curve
$\lambda, \lambda_1, \cdots, \lambda_n$    Lagrange multipliers, eigenvalues
$\lambda_m, \lambda_M$    minimum, maximum eigenvalue of Sturm-Liouville problem
$\mu$    linear mass density
$\rho, \phi$    polar coordinates of plane
$\rho, \phi, z$    cylindrical coordinates of 3D Euclidean space
$\sigma$    area
$\tau$    tension force
$\phi_0, \cdots, \phi_n$    basis functions in 1D Rayleigh-Ritz method[2]
$\Phi$    gravitational potential
$\omega$    angular speed
$\Omega$    set of functions, domain of multi-variate function
[1] In fact, these $y$ values (as well as the upcoming $z$ values $z_{11}, \cdots, z_{mn}$) are supposed to be
approximations of the extremizing function at these points.
[2] In fact, we should also have $\phi_0, \phi_{11}, \cdots, \phi_{mn}$ basis functions for the 2D Rayleigh-Ritz method
but we did not use this notation in this book.
Chapter 1
Preliminaries
In this chapter we provide an outline of the calculus of variations and its basic principles. In the
subsequent chapters we will investigate various categories of variational problems using different methods and
approaches which are mostly based on what is outlined in this chapter. However, before we start we make
a few introductory remarks of general nature.
1.2 The Calculus of Variations and the Variational Principle 7
different Problems for these secondary items for the purpose of distinction and to avoid potential confusion.
We also deliberately varied some secondary symbols and notations in some cases to train the reader to
deal with different symbols and notations, so that the reader acquires the ability to recognize the
essential mathematical forms and patterns regardless of the specific symbols and notations used to
express these forms and patterns in the individual mathematical problems.
In any case, the reader can always consult the Nomenclature for a consistent (although potentially
generic, tentative and non-comprehensive) explanation of the symbols used if no such explanation is
provided locally.
• The Problems in the chapters and sections represent just a small sample of the categories represented
by these chapters and sections. We did our best to make the selected sample representative of the entire
category so that it reflects the main aspects and the most important features of that category (within the
restrictions on the book size and scope).
• For optimal categorization of the Problems and to ensure better structures, we may refer in some cases
to Problems or items (e.g. graphs) in later parts of the book. However, we ensured that there is no
ambiguity or difficulty in understanding and appreciating the indicated content.
• In this book we are mainly interested in optimization problems and techniques regardless of the nature
of the optimum (i.e. whether it is a maximum or a minimum, and whether it is local or global).[3] Therefore,
we generally do not investigate the nature of the optimum technically (e.g. by investigating the signs of
the derivatives), although we usually provide short remarks based on intuitive and simple (but not rigorous
or technical) arguments about the nature of the obtained optimum. In fact, in some cases we even ignored
the possibility of having non-optimal stationary solutions (e.g. inflection or saddle points).[4]
• The axes of most plots in this book are not scaled equally. Also, the purpose of the figures in this book
is to demonstrate and outline the main features and settings of interest in the concerned problems, and
hence these figures may not be realistic in their shapes and dimensions.
functional (see § 1.3). We should also note that although “extremum” suggests maximum or minimum, it is used in some
texts of variational calculus to include even inflection (or stationary) values.
1.3 Functionals in the Calculus of Variations 8
at an extremum the function should change its trend (i.e. from increasing to decreasing or from decreasing
to increasing) and hence at the extremum the function should cease to vary abruptly. In other words, a
positive/negative variation trend followed by a negative/positive variation trend should be separated by
zero variation.
As we will see, optimization problems generally require the methods and techniques of the calculus
of variations for their solution. However, some simple optimization problems can be solved by ordinary
calculus with no need for the calculus of variations.[6] But even these simple problems usually require
a variational argument (which is based ultimately on the variational principle) for their solution (see for
example the Problems of § 2.3 and § 5). In fact, some of these simple problems can also be solved using
the formal techniques of the calculus of variations although this usually incurs an extra cost and hence it
should be avoided (unless it is needed for legitimate purposes).
Problems
1. What is the main objective of the calculus of variations?
Answer: The main objective is to find and develop mathematical methods and techniques for
optimization of functionals (i.e. finding their maximums and minimums).
2. In the text we described the calculus of variations as a branch of mathematics whose purpose is to find
methods and techniques for optimizing functionals. Comment on optimizing.
Answer: To be more general and inclusive we should replace “optimizing” (or extremizing) with
“finding stationary values” (or what we may call “stationarizing”). However, optimization is the main
objective in the investigations of this subject and hence almost all the problems of the calculus of
variations are related to optimization. In fact, most problems in the calculus of variations are related
to minimization.
Note: in this book we commonly use terms like extremizing or optimizing to mean stationarizing
and we rely on this understanding (noting that extremizing and optimizing are more common in use
and that these cases occur more frequently in real life and hence they are the main focus of
investigation).
where $I$ is a functional of the function $y \equiv y(x)$, $y' = dy/dx$, $y(x_1) = C_1$ and $y(x_2) = C_2$ (with $C_1$ and $C_2$
being given numbers). The notation $[y]$ indicates that the variation and extremization of $I$ is essentially
determined by the form of $y$ while the notation $(x, y, y')$ indicates the dependencies of $F$ (which is the
integrand). We should also note that the choice of $I$ to symbolize the functional is to indicate the fact
that the functional is an integral (and hence we usually call $I$ in this book the functional integral). To
summarize the essence of Eq. 1 we can say: the functional $I$ whose variation/optimization depends on
the nature of $y$ (which is a function of $x$) is the integral of $F$ (which is a function of $x, y, y'$) over the real
interval $[x_1, x_2]$.
It should be noted[7] that in the formulations and applications of the Euler-Lagrange equation (which is
the essence of the calculus of variations and is based on the functional of Eq. 1 as will be investigated next)
the symbols $x, y, y'$ are treated as if they are representing variables that are independent of each other
(although in reality $y$ is usually dependent on $x$ and $y'$ is usually dependent on $x$ and possibly $y$) and this
is reflected in taking partial derivatives with respect to these variables (as will be seen in the upcoming
examples and applications). It is also important to note that the symbols $x, y, y'$ should be seen as generic
symbols and hence they are not necessarily representing variables in Cartesian systems. For example,
these symbols can represent $\phi, \rho, \rho'$ in 2D polar coordinates or $\phi, z, z'$ in 3D cylindrical coordinates (where
the prime represents $d/d\phi$) or $t, x, \dot{x}$ in mechanics (where $t$ is time, $x$ is spatial coordinate and $\dot{x} \equiv dx/dt$).
[6] In fact, some may be solved even by simple arithmetic and algebraic techniques based on simple intuitive arguments.
[7] In fact, this note applies to the form of Eq. 1 (i.e. single independent and dependent variables). For other forms (e.g. the
form of Eq. 14) more details are required (as will be seen in the upcoming chapters, sections, Problems and applications).
Problems
1. Give a brief symbolic definition for “functional” as a mapping.
Answer: A functional $\phi$ is a function defined by the mapping $\phi : \Omega \to \mathbb{R}$ where $\Omega$ is a set of functions
and $\mathbb{R}$ is the set of real numbers.
2. Make a simple comparison between functions and functionals as mapping devices.
Answer: Functions map numbers to numbers while functionals map functions to numbers.
3. Make a simple comparison between optimization of functions and optimization of functionals
considering the number of variables involved in the optimization process.
Answer: The first is an optimization of functions of a finite number of variables while the second can
be seen as an optimization of functions of an infinite number of variables.
4. Compare the functional notation I[y] to the function notation f (x) and outline the significance of
each notation. Also, outline the difference between functional variation (in functional relations) and
function variation (in function relations) at stationary values.
Answer: The use of square brackets in I[y] is to distinguish functional dependency (i.e. the functional
I depends on the function y) from function dependency in f (x), i.e. the function f depends on the
independent variable x. So, in I[y] the value of I depends on the form of y on the entire interval
$[x_1, x_2]$ while in f (x) the value of f at a certain point depends on the value of x at that point.
In functional relations the variation of y anywhere in the interval $[x_1, x_2]$ will disturb the stationary
value of the functional (and hence the stationary value of the functional is obtained from a single form
of y over the entire interval), while in function relations the variation of x away from the immediate
neighborhood of the stationary point does not disturb the stationary value of the function (i.e. the
function and its stationary point have only local dependency).
5. Briefly explain the objective in the problems investigated by the calculus of variations and how they
are dealt with.
Answer: The objective is to find the form of a function $y = f(x)$ such that a definite integral $I$ of a
given expression $F$ that involves $y$ and its derivative $y' \equiv dy/dx$ is extremum.
To deal with these problems we start by forming the integral:
\[ I[y] = \int_{x_1}^{x_2} F(x, y, y')\,dx \]
where F is formulated according to the description and statement of the problem. We then use the
techniques of the calculus of variations to find the solution y. Any unknown parameters in the solution
may then be determined by using the given boundary conditions and constraints. The main step in
the solution technique of the calculus of variations is the formation of the Euler-Lagrange equation
from the obtained F where this equation (in its various shapes and forms depending on the nature of
the problem and the form of F ) is solved subject to the given boundary conditions and constraints to
obtain the final solution.
6. What is the difference between the variational problems in ordinary calculus and the variational
problems in the calculus of variations?
Answer: In ordinary calculus the variational problems are about finding points $(x_i, y_i)$ that extremize
(or stationarize) a given function $y = f(x)$, while in the calculus of variations the variational problems
are about finding functions $y(x)$ (represented by curves) that extremize (or stationarize) a given
functional $I[y]$. So, in ordinary calculus we are looking for points while in the calculus of variations we are
looking for curves (represented by functions).[8] This is demonstrated schematically in Figure 1.
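The idea that a functional maps whole functions to numbers can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the arc-length integrand, the three trial curves joining A(0, 0) and B(1, 1), the midpoint rule and all function names are chosen here for illustration and are not taken from the text.

```python
import math

def functional(F, y, dy, x1, x2, n=2000):
    """Approximate I[y] = integral of F(x, y, y') over [x1, x2] (midpoint rule)."""
    h = (x2 - x1) / n
    total = 0.0
    for k in range(n):
        x = x1 + (k + 0.5) * h
        total += F(x, y(x), dy(x))
    return total * h

# Arc-length integrand, used only as a familiar example of F(x, y, y').
def arc_length_integrand(x, y, yp):
    return math.sqrt(1.0 + yp * yp)

# Trial curves joining A(0, 0) and B(1, 1), each paired with its derivative.
trial_curves = {
    "line": (lambda x: x, lambda x: 1.0),
    "parabola": (lambda x: x * x, lambda x: 2.0 * x),
    "cubic": (lambda x: x ** 3, lambda x: 3.0 * x * x),
}

# Each curve (a whole function) is mapped to a single number I[y].
values = {name: functional(arc_length_integrand, y, dy, 0.0, 1.0)
          for name, (y, dy) in trial_curves.items()}

for name, value in values.items():
    print(f"I[{name}] = {value:.6f}")
```

In this sketch the input to `functional` is an entire curve rather than a point, and among the three trial curves the straight line gives the smallest value, which is the kind of comparison between neighboring curves that a variational problem asks for.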
[8] In other words, in ordinary calculus (the form of) $y$ is known (e.g. $y = x^2 + 1$) but its extremum (or stationary) points are
unknown and hence we are required to find these points, while in the calculus of variations (the form of) $y$ that extremizes
(or stationarizes) $I$ is unknown (e.g. whether $y = x^2 + 1$ or $y = e^x$ or $y = \sin x$, etc.) and hence we are required to find
(the form of) $y$ that extremizes $I$ (so that a function of any other form or a function that is obtained by a slight variation
of $y$ in its neighborhood will make $I$ deviate from its extremum value).
1.4 The Euler-Lagrange Equation 10
Figure 1: A schematic illustration of the difference in the nature of the variational problems in ordinary
calculus and in the calculus of variations where on the left (representing ordinary calculus) the objective
is to find the extremal points (M and m) on the curve, while on the right (representing the calculus
of variations) the objective is to find the extremal curve (solid) among all other curves (dashed) in its
immediate neighborhood that connect two given points (A and B). See Problem 6 of § 1.3.
where $F$ (which is a function of $x, y, y'$) is the integrand of a functional integral $I[y]$ whose
variation/optimization depends on $y$ (which is a function of $x$), and $y'$ is the derivative of $y$ with respect
to $x$ (i.e. $y' = dy/dx$). It can be shown (see Problem 3) that the functional given by Eq. 1 (see § 1.3)
possesses stationary values obtained by solving the above Euler-Lagrange equation (i.e. Eq. 2) where
“solving” means obtaining the form of $y(x)$. So in brief, the objective of the variational problems in
the calculus of variations (as formulated by the Euler-Lagrange equation) is to find the function $y$ that
extremizes (i.e. minimizes or maximizes) the functional $I$.[10]
The Euler-Lagrange equation is simplified in the following cases:
I. If $F$ does not depend explicitly on $x$ i.e. $F \equiv F(y, y')$ the Euler-Lagrange equation (Eq. 2) will
reduce to the following form:[11]
\[ F - y' \frac{\partial F}{\partial y'} = C \tag{3} \]
where $C$ is a constant. This equation is called the Beltrami identity (see Problem 5).
II. If $F$ does not depend explicitly on $y$ i.e. $F \equiv F(x, y')$ then $\frac{\partial F}{\partial y} = 0$ and hence Eq. 2 takes the
following form:
\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \qquad \text{and hence} \qquad \frac{\partial F}{\partial y'} = C \tag{4} \]
where $C$ is a constant. This equation also applies when $F$ depends explicitly on $y'$ only.[12]
III. If $F$ does not depend explicitly on $y'$ i.e. $F \equiv F(x, y)$ then $\frac{\partial F}{\partial y'} = 0$ and hence Eq. 2 reduces to the
following form:
\[ \frac{\partial F}{\partial y} = 0 \tag{5} \]
This equation also applies when $F$ depends explicitly on $y$ only.[13]
[9] In fact, a more general objective of the equation is to search for stationary points (although they are usually maximums
or minimums).
[10] To be more general, we may need to replace “extremize” with “stationarize” (i.e. find stationary values). However, we do
a first integral of the Euler-Lagrange equation because only one integration is needed to obtain the solution.
It is important to note that using these simplified forms is not mandatory although they usually (but not
necessarily) make the solution easier. Accordingly, in the future we will use these simplified forms if they
are convenient. We should also note that the full form and the simplified forms usually lead to different
forms of equations even after simplification although in principle this should not affect the final solution.
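The claim that the full form and a simplified form look different but agree on the solution can be checked on a concrete case. The following is a minimal sketch, assuming the illustrative integrand $F = y y'^2$ (no explicit $x$) and the candidate curve $y = (Ax + B)^{2/3}$ with hypothetical constants `A` and `B`; the derivatives are coded by hand and the function names are chosen only for this example.

```python
# Illustrative integrand F(y, y') = y * y'^2 (no explicit x dependence).
A, B = 2.0, 1.0   # hypothetical constants of integration

def y(x):
    return (A * x + B) ** (2.0 / 3.0)

def yp(x):   # first derivative of y
    return (2.0 * A / 3.0) * (A * x + B) ** (-1.0 / 3.0)

def ypp(x):  # second derivative of y
    return -(2.0 * A * A / 9.0) * (A * x + B) ** (-4.0 / 3.0)

def beltrami_lhs(x):
    """F - y' dF/dy' = y y'^2 - y' (2 y y'), the left side of Eq. 3."""
    return y(x) * yp(x) ** 2 - yp(x) * (2.0 * y(x) * yp(x))

def euler_lagrange_lhs(x):
    """dF/dy - d/dx(dF/dy') = y'^2 - (2 y'^2 + 2 y y''), the left side of Eq. 2."""
    return yp(x) ** 2 - (2.0 * yp(x) ** 2 + 2.0 * y(x) * ypp(x))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
beltrami_values = [beltrami_lhs(x) for x in xs]    # same constant at every x
residuals = [euler_lagrange_lhs(x) for x in xs]    # zero up to rounding

print(beltrami_values)
print(residuals)
```

In this sketch the Beltrami expression is a first order relation ($-y y'^2 = C$) while the full form is a second order equation ($y'^2 + 2 y y'' = 0$), yet the same curve satisfies both, with the Beltrami constant equal to $-4A^2/9$ for the chosen curve.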
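We should finally draw attention to the following remarks:
1. The Euler-Lagrange equation may also be given in the following forms:
\[ \frac{\partial F}{\partial x} - \frac{d}{dx}\left(F - y' \frac{\partial F}{\partial y'}\right) = 0 \tag{6} \]
\[ \frac{\partial F}{\partial y} - \frac{\partial^2 F}{\partial x \partial y'} - y' \frac{\partial^2 F}{\partial y \partial y'} - y'' \frac{\partial^2 F}{\partial y'^2} = 0 \tag{7} \]
These forms will be investigated in Problems 6 and 7.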
2. The Euler-Lagrange equation is a necessary, but not sufficient, condition for extremizing the functional
I and hence further investigation to determine the nature of the solution is generally required. However,
in most practical variational problems (especially in physics) the required extremizing solution is usually
found by just solving the Euler-Lagrange equation with no need for further investigation. In fact, in
many cases the nature of the solution (as being minimizing or maximizing or even being inflection)
can be easily inferred from the nature of the problem using non-formal arguments and simple intuitive
considerations (such as geometric or physical considerations).
Regarding the practical side of this issue, formal identification (by using technically rigorous tools and
methods) of the nature of the obtained solution is generally difficult and requires obtaining and testing
derivatives of various orders. Therefore, in this book we generally avoid going through these messy
details contenting ourselves with just obtaining the solution (and possibly relying on the context and
other intuitive considerations to identify the nature of the solution).
3. A solution of the Euler-Lagrange equation may be called an extremal (or extremal function or extremal
curve) which is in line with the nature of most solutions of the variational problems in the calculus of
variations (although the previous remark should be taken into consideration).
4. If the integrand $F$ is a total derivative (with respect to $x$) of a given function of $x$ and $y$, say $\phi(x, y)$,
then the Euler-Lagrange equation is satisfied identically, i.e. any function (of any form) that satisfies
the given boundary conditions will satisfy the Euler-Lagrange equation (see part m of Problem 9).
The reason is that in this case the value of the functional integral is solely determined by the value
of $\phi$ at the boundaries (and hence any function $y$ that satisfies the given boundary conditions should
“optimize” the functional integral regardless of the form of $y$), that is:
\[ I[y] = \int_{x_1}^{x_2} F(x, y, y')\,dx = \int_{x_1}^{x_2} \frac{d\phi}{dx}\,dx = [\phi]_{x_1}^{x_2} = \phi\big|_{x_2} - \phi\big|_{x_1} \]
An implication of this is that adding a term (or terms) that is a total derivative of some function to the
integrand $F$ will not change the Euler-Lagrange equation and this could lead to simplification in some
cases where the additional term(s) can be ignored in the derivation of the Euler-Lagrange equation of
the given variational problem (see for example part h of Problem 10 or part j of Problem 11).
[12] In fact, in this case $\frac{\partial F}{\partial y'}$ is independent of $x$ and $y$ and this should lead to a simpler equation (i.e. $y' = $ constant) that
can be used instead of Eq. 4.
[13] In fact, in this case $\frac{\partial F}{\partial y}$ is independent of $x$ and $y'$ and this should lead to a simpler equation (i.e. $y = $ constant) that
can be used instead of Eq. 5.
5. As indicated earlier (see § 1.3), the symbols $x, y, y'$ in the Euler-Lagrange equation are treated as
if they are representing variables that are independent of each other. This means that the partial
derivatives in the Euler-Lagrange equation operate on the explicit (but not implicit) dependencies of
these variables on each other. For example, ∂y/∂x = 0 according to this type of partial differentiation
even though y is a function of x. However, there are some exceptions to this rule where in some cases
partial derivatives operate on both the explicit and implicit dependencies (see § 1.6). In brief, when we
deal with the Euler-Lagrange equation (in its different forms and flavors) we have two types of partial
differentiation where in one type (which is the common type) only explicit dependencies are considered
while in another type (which is the exceptional type) both the explicit and implicit dependencies are
considered. So, we should be careful about the interpretation and treatment of partial derivatives in
the applications of variational calculus.
6. As indicated earlier (see § 1.3), the symbols $x, y, y'$ in the Euler-Lagrange equation should be seen as
generic symbols and hence they are not necessarily representing variables in Cartesian systems.
7. The term “Euler-Lagrange equation” may refer to the Euler-Lagrange equation in its general form
(i.e. as given by Eq. 2 and its equivalent and reduced forms as well as its upcoming generalized and
extended forms). It may also refer to the equation obtained by applying this general form to a
particular problem, in which case it is an instantiation of the general form (i.e. it is the
Euler-Lagrange equation for that particular problem). The meaning should be obvious from the context.
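Remark 4 (an integrand that is a total derivative) can be illustrated numerically. The following is a minimal sketch, assuming the integrand $F = x^2 y' + 2xy$, which is the total derivative of $\phi(x, y) = x^2 y$; the boundary data $y(1) = 1$, $y(2) = 3$, the three trial functions and the midpoint rule are hypothetical choices made only for this example.

```python
import math

def integrate(f, x1, x2, n=4000):
    """Composite midpoint rule for the integral of f over [x1, x2]."""
    h = (x2 - x1) / n
    return h * sum(f(x1 + (k + 0.5) * h) for k in range(n))

# Integrand F = x^2 y' + 2 x y, which equals d/dx of phi(x, y) = x^2 y.
def F(x, y, yp):
    return x * x * yp + 2.0 * x * y

# Hypothetical boundary data: y(1) = 1 and y(2) = 3.
x1, x2, y1, y2 = 1.0, 2.0, 1.0, 3.0

# Three different trial functions (with derivatives) sharing the same end values.
trials = [
    (lambda x: 2.0 * x - 1.0,
     lambda x: 2.0),
    (lambda x: 2.0 * x - 1.0 + 5.0 * (x - 1.0) * (x - 2.0),
     lambda x: 2.0 + 5.0 * (2.0 * x - 3.0)),
    (lambda x: 2.0 * x - 1.0 + math.sin(math.pi * (x - 1.0)),
     lambda x: 2.0 + math.pi * math.cos(math.pi * (x - 1.0))),
]

# Value predicted by the boundary terms alone: phi at x2 minus phi at x1.
boundary_value = x2 * x2 * y2 - x1 * x1 * y1

# Numerical value of I[y] for each trial function.
integrals = [integrate(lambda x, y=y, dy=dy: F(x, y(x), dy(x)), x1, x2)
             for y, dy in trials]

print(boundary_value, integrals)
```

Under these assumptions all three trial functions give the same value of the integral (up to the quadrature error), equal to the boundary difference of $\phi$, so no particular form of $y$ is singled out by the functional.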
Problems
1. Describe in plain terms a simple typical variational problem and how it is formulated and solved.
Answer: Suppose we have a functional $I$ which usually depends on $x$ and $y(x)$ as well as the derivative
$y' = dy/dx$, that is:
\[ I = \phi(x, y, y') \tag{8} \]
This functional is in the form of an integral and its optimization (or stationarization to be more
general) depends on the form of the function $y$ assuming that $y$ is fixed at its two end points $A(x_1, y_1)$
and $B(x_2, y_2)$, i.e. $y(x_1) = y_1 = C_1$ and $y(x_2) = y_2 = C_2$. Moreover, we are interested in optimizing
this functional by finding the form of $y$ that achieves this optimization. Accordingly, we modify the
notation of Eq. 8 (to make it more explicit and suggestive) to the following:
\[ I[y] = \int_{x_1}^{x_2} F(x, y, y')\,dx \tag{9} \]
where the notation $I[y]$ is to suggest that the required optimization of $I$ depends on the form of $y$
(which we are looking for) and where $F$ is the integrand (which depends on $x, y, y'$). In brief, Eq. 9
suggests that if we use a certain function $y$ (which we are looking for) in the integrand $F$ then the
integral $I$ will be optimal. So, our objective now is to find this special (or optimizing) function $y(x)$.
To find this optimizing $y$ we form the following Euler-Lagrange equation:
\[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \]
using $F$ that we formulated in Eq. 9. We then solve this equation (which in general is a second
order differential equation) to find the optimizing solution $y$. Finally, the solution of a second order
differential equation contains two unknown constants (as a result of two integrations) and hence we
need the given boundary conditions (i.e. the fixed values of $y$ at the two end points) to find the specific
solution to our variational problem.
Note: as indicated in the question, the above description belongs to a simple typical variational
problem and hence it can be subject to extensions, generalizations and restrictions in different and
more complex situations (as will be investigated later on).
2. State some general properties of the Euler-Lagrange equation.
Answer: For example:
• This equation in its various shapes and forms (as investigated earlier and will be investigated further
later on) is the main pillar of the mathematics of variation (and specifically the calculus of variations).
• The existence and uniqueness of solution of this equation is not guaranteed in general and hence
there may not be a solution, or there is only one solution, or there are multiple (finite or infinite)
solutions.
• The Euler-Lagrange equation is a necessary, but not sufficient, condition for the existence of optimal
solution(s) even if solution(s) to this equation do exist. Accordingly, any obtained solution to this
equation should be inspected and assessed to determine its nature and if it satisfies the requirements
and meets the objectives (e.g. searching for a minimum). In this regard, mathematical and non-
mathematical (e.g. physical) considerations should be taken into account.
3. Derive the Euler-Lagrange equation using the variational principle.
Answer: In the following we outline a rather simple derivation method of the Euler-Lagrange equation.
We start from the functional of Eq. 1, that is:
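\[ I[y] = \int_a^b F(x, y, y')\,dx \]
where the values of $y$ at the two end points are fixed (noting that $x_1 = a$ and $x_2 = b$ are given
constants). Now, if $y$ is perturbed slightly to $y + \delta y$ (where $\delta y$ is a tiny change or variation in $y$) then
$I$ should also be perturbed to $I + \delta I$ where $\delta I$ is given by:[14]
\[
\begin{aligned}
\delta I &= I[y + \delta y] - I[y] \\
&= \int_a^b F(x, y + \delta y, y' + \delta y')\,dx - \int_a^b F(x, y, y')\,dx \\
&= \int_a^b \left[ F(x, y, y') + \frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y' \right] dx - \int_a^b F(x, y, y')\,dx \\
&= \int_a^b \left[ F(x, y, y') + \frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y' - F(x, y, y') \right] dx \\
&= \int_a^b \left[ \frac{\partial F}{\partial y}\delta y + \frac{\partial F}{\partial y'}\delta y' \right] dx \\
&= \int_a^b \frac{\partial F}{\partial y}\delta y\,dx + \int_a^b \frac{\partial F}{\partial y'}\delta y'\,dx \\
&= \int_a^b \frac{\partial F}{\partial y}\delta y\,dx + \left[ \frac{\partial F}{\partial y'}\delta y \right]_a^b - \int_a^b \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\delta y\,dx \\
&= \left[ \frac{\partial F}{\partial y'}\delta y \right]_a^b + \int_a^b \left[ \frac{\partial F}{\partial y}\delta y - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\delta y \right] dx
\end{aligned}
\]
where in line 3 we use first order Taylor expansion, and in line 7 we integrate by parts the second term
of line 6. Now, since the two end points are fixed (and hence $\delta y = 0$ at $x = a$ and $x = b$) the first term
in the last line is zero and hence:
\[ \delta I = \int_a^b \left[ \frac{\partial F}{\partial y}\delta y - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)\delta y \right] dx = \int_a^b \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) \right] \delta y\,dx \]
By the variational principle, $I$ should be stationary at its extremum and hence $\delta I = 0$ for all possible
tiny variations $\delta y$ in $y$ (i.e. $\delta y$ is arbitrary). This is true iff
\[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \]
which is the Euler-Lagrange equation. It should be understood that the last equation applies over the
entire interval (except possibly at a few isolated points).
[14] The use of $\delta$ here to symbolize variation is common (and rather conventional although not compulsory) and should
indicate the difference between functional variation and function variation where in the latter $d$ or $\partial$ are usually used to
symbolize variation (see Problem 4 of § 1.3).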
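The statement that $\delta I$ vanishes to first order at a solution of the Euler-Lagrange equation can be checked numerically. The following is a minimal sketch under stated assumptions: the integrand $F = y'^2$ on $[0, 1]$ (whose Euler-Lagrange equation is $y'' = 0$), the end points (0, 0) and (1, 1), the perturbation $\delta y = \epsilon \sin(\pi x)$ (which vanishes at both ends) and all function names are hypothetical choices made only for this example.

```python
import math

def functional(dy, n=4000):
    """Midpoint-rule value of I[y] = integral of y'^2 over [0, 1], given y'."""
    h = 1.0 / n
    return h * sum(dy((k + 0.5) * h) ** 2 for k in range(n))

def d_eta(x):
    """Derivative of eta(x) = sin(pi x), a variation with eta(0) = eta(1) = 0."""
    return math.pi * math.cos(math.pi * x)

def delta_I(dy, eps):
    """I[y + eps * eta] - I[y] for a curve given through its derivative dy."""
    return functional(lambda x: dy(x) + eps * d_eta(x)) - functional(dy)

def line_dy(x):
    # y = x satisfies y'' = 0, i.e. it solves the Euler-Lagrange equation.
    return 1.0

def parabola_dy(x):
    # y = x^2 has the same end points but does not satisfy y'' = 0.
    return 2.0 * x

eps = 1e-3
ratio_line = delta_I(line_dy, eps) / eps          # small: of order eps
ratio_parabola = delta_I(parabola_dy, eps) / eps  # not small as eps shrinks

print(ratio_line, ratio_parabola)
```

For the straight line the ratio $\delta I/\epsilon$ is of order $\epsilon$ (only the second order term survives), whereas for the parabola it stays close to the non-zero value $-8/\pi$, which is what the first order integral in the derivation gives for that curve and that variation.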
4. Referring to the simplified forms of the Euler-Lagrange equation (as given by Eqs. 3-5), it is common
in the literature to say $F$ in these cases is independent of (or does not depend on) $x$ or $y$ or $y'$. Clarify
this issue.
Answer: This saying means that these variables do not appear explicitly in the expression of F
although F is generally dependent on these variables implicitly.
5. Prove the Beltrami identity (Eq. 3).
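Answer: By the chain rule we have:
\[
\begin{aligned}
\frac{dF}{dx} &= \frac{\partial F}{\partial x}\frac{dx}{dx} + \frac{\partial F}{\partial y}\frac{dy}{dx} + \frac{\partial F}{\partial y'}\frac{dy'}{dx} \\
&= \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} y' + \frac{\partial F}{\partial y'}\frac{dy'}{dx} \\
&= \frac{\partial F}{\partial y} y' + \frac{\partial F}{\partial y'}\frac{dy'}{dx}
\end{aligned}
\]
where the last line is justified by the fact that $\partial F/\partial x = 0$ because $F$ is supposed to be independent of
$x$ (i.e. $F$ does not depend explicitly on $x$). Now, by the Euler-Lagrange equation (i.e. Eq. 2) we have
$\frac{\partial F}{\partial y} = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right)$ and hence on substituting from this into the last line we get:
\[
\begin{aligned}
\frac{dF}{dx} &= \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) y' + \frac{\partial F}{\partial y'}\frac{dy'}{dx} \\
\frac{dF}{dx} &= \frac{d}{dx}\left(\frac{\partial F}{\partial y'} y'\right) \qquad \text{(product rule)} \\
\frac{dF}{dx} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'} y'\right) &= 0 \\
\frac{d}{dx}\left(F - \frac{\partial F}{\partial y'} y'\right) &= 0 \\
F - y' \frac{\partial F}{\partial y'} &= C
\end{aligned}
\]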
Note: the Beltrami identity (Eq. 3) can be obtained more simply from the other form of the Euler-
Lagrange equation (i.e. Eq. 6) by setting ∂F/∂x = 0. However, we followed the above method for
more practice.
6. Derive Eq. 6 from Eq. 2.
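Answer: We have:
\[
\begin{aligned}
\frac{dF}{dx} &= \frac{\partial F}{\partial x}\frac{dx}{dx} + \frac{\partial F}{\partial y}\frac{dy}{dx} + \frac{\partial F}{\partial y'}\frac{dy'}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} y' + \frac{\partial F}{\partial y'} y'' \qquad \text{(chain rule)} \\
\frac{d}{dx}\left(y' \frac{\partial F}{\partial y'}\right) &= \frac{dy'}{dx}\frac{\partial F}{\partial y'} + y' \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = y'' \frac{\partial F}{\partial y'} + y' \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) \qquad \text{(product rule)}
\end{aligned}
\]
Now, if we subtract the first line from the second line we get:
\[
\begin{aligned}
\frac{d}{dx}\left(y' \frac{\partial F}{\partial y'}\right) - \frac{dF}{dx} &= y' \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial x} - \frac{\partial F}{\partial y} y' \\
\frac{\partial F}{\partial x} + \frac{d}{dx}\left(y' \frac{\partial F}{\partial y'}\right) - \frac{dF}{dx} &= y' \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) - \frac{\partial F}{\partial y} y' \\
\frac{\partial F}{\partial x} - \frac{d}{dx}\left(F - y' \frac{\partial F}{\partial y'}\right) &= -y' \left[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) \right] \\
\frac{\partial F}{\partial x} - \frac{d}{dx}\left(F - y' \frac{\partial F}{\partial y'}\right) &= 0
\end{aligned}
\]
where in the last line we used Eq. 2.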
Note: from the above derivation we can see that Eq. 6 is based on Eq. 2 (since we used Eq. 2 in
the derivation of Eq. 6). However, strictly speaking Eq. 6 is not another form of the Euler-Lagrange
equation (as we described it in the text) although it should be equivalent to it.
7. Show that the Euler-Lagrange equation may be given by the following form (see Eq. 7):
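\[
\begin{aligned}
\frac{\partial}{\partial y}\left[x^3 y + y'^2\right] - \frac{\partial^2}{\partial x \partial y'}\left[x^3 y + y'^2\right] - y' \frac{\partial^2}{\partial y \partial y'}\left[x^3 y + y'^2\right] - y'' \frac{\partial^2}{\partial y'^2}\left[x^3 y + y'^2\right] &= 0 \\
x^3 - \frac{\partial}{\partial x}\left[2y'\right] - y' \frac{\partial}{\partial y}\left[2y'\right] - y'' \frac{\partial}{\partial y'}\left[2y'\right] &= 0 \\
x^3 - 0 - 0 - 2y'' &= 0 \\
x^3 - 2y'' &= 0
\end{aligned}
\]
So, the three equations are equivalent (at least in this case).
Note: the three equations will lead to the same result even if $y' = 0$ because in this case they will all
lead to $x^3 = 0$ (at least as a possibility).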
9. Obtain the Euler-Lagrange equations for the following variational integrands:
(a) $F(x, y, y') = x y^4$.[15]
(b) $F(x, y, y') = x y'^3$.
(c) $F(x, y, y') = y y'^2$.
(d) $F(x, y, y') = x y y'$.
(e) $F(x, y, y') = e^x y^3 / y'$.
(f) $F(t, x, \dot{x}) = x \sqrt{1 + \dot{x}}$.[16]
(g) $F(t, \theta, \dot{\theta}) = t \sin\theta$.
(h) $F(x, y, y') = y^2 e^y$.
(i) $F(x, y, y') = \sqrt{1 + y'^2}$.
(j) $F(x, y, y') = y'^2 - y^2$.
(k) $F(x, y, y') = \left(y^2 - 1\right) / y'^2$.
(l) $F(x, y, y') = y'^2 / \sqrt{y^2 + y'^2}$.
(m) $F(x, y, y') = x^2 y' + 2xy$.
Answer:
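(a) $y'$ does not appear in $F$ and hence we can use Eq. 5, that is:
\[
\begin{aligned}
\frac{\partial F}{\partial y} &= 0 \\
\frac{\partial}{\partial y}\left(x y^4\right) &= 0 \\
4 x y^3 &= 0 \\
x y^3 &= 0
\end{aligned}
\]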
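(b) $y$ does not appear in $F$ and hence we can use Eq. 4, that is:
\[
\begin{aligned}
\frac{\partial F}{\partial y'} &= C \\
\frac{\partial}{\partial y'}\left(x y'^3\right) &= C \\
3 x y'^2 &= C \\
x y'^2 &= D \qquad (D = C/3)
\end{aligned}
\]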
(c) $x$ does not appear in $F$ and hence we can use Eq. 3, that is:
\[
\begin{aligned}
F - y' \frac{\partial F}{\partial y'} &= C \\
y y'^2 - y' \frac{\partial}{\partial y'}\left(y y'^2\right) &= C \\
y y'^2 - y' \left(2 y y'\right) &= C \\
y y'^2 - 2 y y'^2 &= C \\
y y'^2 &= D \qquad (D = -C)
\end{aligned}
\]
[15] Despite the absence of explicit dependency on $y'$ in this expression, we use $F(x, y, y')$ to highlight the overall dependencies
(whether explicit or implicit). This is also followed in similar examples and similar missing dependencies (unless stated
otherwise to highlight the explicit dependencies only).
[16] We note that the overdot (in this expression and similar expressions) means derivative with respect to $t$ (i.e. $d/dt$).
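(d) We use Eq. 2, that is:
\[
\begin{aligned}
\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) &= 0 \\
\frac{\partial}{\partial y}\left[x y y'\right] - \frac{d}{dx}\left(\frac{\partial}{\partial y'}\left[x y y'\right]\right) &= 0 \\
x y' - \frac{d}{dx}\left(x y\right) &= 0 \\
x y' - y - x y' &= 0 \\
y &= 0
\end{aligned}
\]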
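(e) We use Eq. 2, that is:
$$\begin{aligned}
\frac{\partial F}{\partial y} - \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) &= 0\\
\frac{\partial}{\partial y}\left(\frac{e^x y^3}{y'}\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(\frac{e^x y^3}{y'}\right)\right] &= 0\\
\frac{3e^x y^2}{y'} - \frac{d}{dx}\left(-\frac{e^x y^3}{y'^2}\right) &= 0\\
\frac{3e^x y^2}{y'} + \frac{e^x y^3}{y'^2} + \frac{3e^x y^2 y'}{y'^2} - \frac{2e^x y^3 y''}{y'^3} &= 0 \qquad \text{(product and chain rules)}\\
\frac{3e^x y^2}{y'} + \frac{e^x y^3}{y'^2} + \frac{3e^x y^2}{y'} - \frac{2e^x y^3 y''}{y'^3} &= 0\\
\frac{6e^x y^2}{y'} + \frac{e^x y^3}{y'^2} - \frac{2e^x y^3 y''}{y'^3} &= 0\\
\frac{6y^2}{y'} + \frac{y^3}{y'^2} - \frac{2y^3 y''}{y'^3} &= 0 \qquad (e^x \neq 0)
\end{aligned}$$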
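(f) $t$ does not appear in $F$ and hence we can use Eq. 3 (noting that $t, x, \dot{x}$ replace $x, y, y'$), that is:
$$\begin{aligned}
F - \dot{x}\frac{\partial F}{\partial \dot{x}} &= C\\
x\sqrt{1 + \dot{x}} - \dot{x}\frac{\partial}{\partial \dot{x}}\left(x\sqrt{1 + \dot{x}}\right) &= C\\
x\sqrt{1 + \dot{x}} - \dot{x}\,\frac{x}{2\sqrt{1 + \dot{x}}} &= C\\
x\left(\sqrt{1 + \dot{x}} - \frac{\dot{x}}{2\sqrt{1 + \dot{x}}}\right) &= C\\
\frac{x\left(2 + \dot{x}\right)}{2\sqrt{1 + \dot{x}}} &= C\\
\frac{x\left(2 + \dot{x}\right)}{\sqrt{1 + \dot{x}}} &= D \qquad (D = 2C)
\end{aligned}$$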
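(g) $\dot{\theta}$ does not appear in $F$ and hence we can use Eq. 5 (noting that $t, \theta, \dot{\theta}$ replace $x, y, y'$), that is:
$$\begin{aligned}
\frac{\partial F}{\partial \theta} &= 0\\
\frac{\partial}{\partial \theta}\left(t\sin\theta\right) &= 0\\
t\cos\theta &= 0
\end{aligned}$$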
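(h) Only $y$ appears in $F$ and hence we can use Eq. 5, that is:
$$\begin{aligned}
\frac{\partial F}{\partial y} &= 0\\
\frac{\partial}{\partial y}\left(y^2 e^y\right) &= 0\\
2ye^y + y^2 e^y &= 0 \qquad \text{(product rule)}\\
y^2 + 2y &= 0 \qquad (e^y \neq 0)
\end{aligned}$$
Hence, $y = 0$ or $y = -2$ (which is consistent with footnote [13] since $y$ = constant).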
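(i) Only $y'$ appears in $F$ and hence we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial F}{\partial y'} &= C\\
\frac{\partial}{\partial y'}\sqrt{1 + y'^2} &= C\\
\frac{2y'}{2\sqrt{1 + y'^2}} &= C\\
\frac{y'}{\sqrt{1 + y'^2}} &= C\\
y'^2 &= C^2 + C^2 y'^2\\
y' &= \pm\sqrt{\frac{C^2}{1 - C^2}}
\end{aligned}$$
which is consistent with footnote [12] since $y'$ = constant.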
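(j) $x$ does not appear in $F$ and hence we can use Eq. 3, that is:
$$\begin{aligned}
F - y'\frac{\partial F}{\partial y'} &= C\\
\left(y'^2 - y^2\right) - y'\frac{\partial}{\partial y'}\left(y'^2 - y^2\right) &= C\\
y'^2 - y^2 - y'\left(2y'\right) &= C\\
-y^2 - y'^2 &= C\\
y'^2 + y^2 &= D \qquad (D = -C)
\end{aligned}$$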
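(k) $x$ does not appear in $F$ and hence we can use Eq. 3, that is:
$$\begin{aligned}
F - y'\frac{\partial F}{\partial y'} &= C\\
\frac{y^2 - 1}{y'^2} - y'\frac{\partial}{\partial y'}\left(\frac{y^2 - 1}{y'^2}\right) &= C\\
\frac{y^2 - 1}{y'^2} - y'\left(-2\,\frac{y^2 - 1}{y'^3}\right) &= C\\
\frac{y^2 - 1}{y'^2} + 2\,\frac{y^2 - 1}{y'^2} &= C\\
\frac{y^2 - 1}{y'^2} &= D \qquad (D = C/3)\\
Dy'^2 - y^2 + 1 &= 0
\end{aligned}$$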
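(l) $x$ does not appear in $F$ and hence we can use Eq. 3, that is:
$$\begin{aligned}
F - y'\frac{\partial F}{\partial y'} &= C\\
\frac{y'^2}{\sqrt{y^2 + y'^2}} - y'\frac{\partial}{\partial y'}\left(\frac{y'^2}{\sqrt{y^2 + y'^2}}\right) &= C\\
\frac{y'^2}{\sqrt{y^2 + y'^2}} - y'\left(\frac{2y'}{\sqrt{y^2 + y'^2}} - \frac{y'^2\left(2y'\right)}{2\left(y^2 + y'^2\right)^{3/2}}\right) &= C\\
\frac{y'^2}{\sqrt{y^2 + y'^2}} - \frac{2y'^2}{\sqrt{y^2 + y'^2}} + \frac{y'^4}{\left(y^2 + y'^2\right)^{3/2}} &= C\\
-\frac{y'^2}{\sqrt{y^2 + y'^2}} + \frac{y'^4}{\left(y^2 + y'^2\right)^{3/2}} &= C\\
-\frac{y'^2\left(y^2 + y'^2\right)}{\left(y^2 + y'^2\right)^{3/2}} + \frac{y'^4}{\left(y^2 + y'^2\right)^{3/2}} &= C\\
-\frac{y^2 y'^2}{\left(y^2 + y'^2\right)^{3/2}} &= C\\
y^4 y'^4 &= C^2\left(y^2 + y'^2\right)^3
\end{aligned}$$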
(m) We use Eq. 2, that is:
$$\frac{\partial}{\partial y}\left(x^2 y' + 2xy\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(x^2 y' + 2xy\right)\right] = 2x - \frac{d}{dx}\left(x^2\right) = 2x - 2x = 0$$
This trivial (but true) result is justified by remark 4 in the text because $F$ is a total derivative of $x^2 y$ and hence the Euler-Lagrange equation is satisfied identically. In other words, we do not have a specific function that can be seen as a solution to the Euler-Lagrange equation since any function that satisfies the boundary conditions will satisfy the Euler-Lagrange equation. Hence, the Euler-Lagrange equation is $0 = 0$ and its solution is $y$ = any function (satisfying the boundary conditions).
10. Obtain the Euler-Lagrange equations for the following functional integrals:
(a) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - 4\sqrt{xy}\right)dx$.
(b) $I[x] = \int_{t_1}^{t_2}\left(tx^2 + \dot{x}^2\right)dt$.
(c) $I[\phi] = \int_{t_1}^{t_2}\left(t\cos\phi + t^2\dot{\phi}\right)dt$.
(d) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - xy'^2\right)dx$.
(e) $I[x] = \int_{t_1}^{t_2}\left(t^2\dot{x} - x^2\right)dt$.
(f) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - ye^{ax} + y^2\right)dx$ (where $a$ is a constant).
(g) $I[y] = \int_{x_1}^{x_2} y'^3\,dx$.
(h) $I[y] = \int_{x_1}^{x_2}\left(y'^2 + y'y - y^2\right)dx$.
(i) $I[x] = \int_{t_1}^{t_2}\left(t\dot{x}^2 - 2x - x\dot{x}\right)dt$.
(j) $I[y] = \int_{x_1}^{x_2}\left(xy + yy' + y^2 + 2y^2 y'\right)dx$.
(k) $I[y] = \int_{x_1}^{x_2}\sqrt{1 - y'^2}\,dx$.
(l) $I[y] = \int_{x_1}^{x_2}\frac{yy'^2}{1 + yy'}\,dx$.
Answer:
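(a) Comparing this functional to the functional of Eq. 1, we see that $F(x, y, y') = y'^2 - 4\sqrt{xy}$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - 4\sqrt{xy}\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - 4\sqrt{xy}\right)\right] &= 0\\
-\frac{4x}{2\sqrt{xy}} - \frac{d}{dx}\left(2y'\right) &= 0\\
-\frac{2x}{\sqrt{xy}} - 2y'' &= 0\\
y'' + \frac{x}{\sqrt{xy}} &= 0
\end{aligned}$$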
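(b) Comparing this functional to the functional of Eq. 1 (noting that $x, y, y'$ correspond to $t, x, \dot{x}$), we see that $F(t, x, \dot{x}) = tx^2 + \dot{x}^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial x}\left(tx^2 + \dot{x}^2\right) - \frac{d}{dt}\left[\frac{\partial}{\partial \dot{x}}\left(tx^2 + \dot{x}^2\right)\right] &= 0\\
2tx - \frac{d}{dt}\left(2\dot{x}\right) &= 0\\
2tx - 2\ddot{x} &= 0\\
\ddot{x} - tx &= 0
\end{aligned}$$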
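(c) Comparing this functional to the functional of Eq. 1 (noting that $x, y, y'$ correspond to $t, \phi, \dot{\phi}$), we see that $F(t, \phi, \dot{\phi}) = t\cos\phi + t^2\dot{\phi}$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial \phi}\left[t\cos\phi + t^2\dot{\phi}\right] - \frac{d}{dt}\left(\frac{\partial}{\partial \dot{\phi}}\left[t\cos\phi + t^2\dot{\phi}\right]\right) &= 0\\
-t\sin\phi - \frac{d}{dt}\left(t^2\right) &= 0\\
t\sin\phi + 2t &= 0
\end{aligned}$$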
(d) We have $F(x, y, y') = y'^2 - xy'^2$ and since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^2 - xy'^2\right) &= C\\
2y' - 2xy' &= C\\
y'\left(1 - x\right) &= D \qquad (D = C/2)
\end{aligned}$$
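(e) We have $F(t, x, \dot{x}) = t^2\dot{x} - x^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2 noting that $x, y, y'$ correspond to $t, x, \dot{x}$) is:
$$\begin{aligned}
\frac{\partial}{\partial x}\left(t^2\dot{x} - x^2\right) - \frac{d}{dt}\left[\frac{\partial}{\partial \dot{x}}\left(t^2\dot{x} - x^2\right)\right] &= 0\\
-2x - \frac{d}{dt}\left(t^2\right) &= 0\\
-2x - 2t &= 0\\
x + t &= 0
\end{aligned}$$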
(f) We have $F(x, y, y') = y'^2 - ye^{ax} + y^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - ye^{ax} + y^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - ye^{ax} + y^2\right)\right] &= 0\\
-e^{ax} + 2y - \frac{d}{dx}\left(2y'\right) &= 0\\
-e^{ax} + 2y - 2y'' &= 0\\
y'' - y + \frac{e^{ax}}{2} &= 0
\end{aligned}$$
(g) We have $F(x, y, y') = y'^3$ and hence we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial y'^3}{\partial y'} &= C\\
3y'^2 &= C\\
y'^2 &= D \qquad (D = C/3)
\end{aligned}$$
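(i) We have $F(t, x, \dot{x}) = t\dot{x}^2 - 2x - x\dot{x}$ and hence the Euler-Lagrange equation (i.e. Eq. 2 noting that $x, y, y'$ correspond to $t, x, \dot{x}$) is:
$$\begin{aligned}
\frac{\partial}{\partial x}\left(t\dot{x}^2 - 2x - x\dot{x}\right) - \frac{d}{dt}\left[\frac{\partial}{\partial \dot{x}}\left(t\dot{x}^2 - 2x - x\dot{x}\right)\right] &= 0\\
-2 - \dot{x} - \frac{d}{dt}\left(2t\dot{x} - x\right) &= 0\\
-2 - \dot{x} - \left(2\dot{x} + 2t\ddot{x} - \dot{x}\right) &= 0\\
-2 - \dot{x} - 2\dot{x} - 2t\ddot{x} + \dot{x} &= 0\\
-2 - 2\dot{x} - 2t\ddot{x} &= 0\\
t\ddot{x} + \dot{x} + 1 &= 0
\end{aligned}$$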
Note: referring to remark 4 in the text, we note that $x\dot{x} = \frac{d(x^2/2)}{dt}$ and hence it is a total derivative of $x^2/2$. We could therefore use the suggested simplification in that remark and obtain the Euler-Lagrange equation from the simpler integrand $F = t\dot{x}^2 - 2x$.
(j) We have $F(x, y, y') = xy + yy' + y^2 + 2y^2 y'$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(xy + yy' + y^2 + 2y^2 y'\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(xy + yy' + y^2 + 2y^2 y'\right)\right] &= 0\\
x + y' + 2y + 4yy' - \frac{d}{dx}\left(y + 2y^2\right) &= 0\\
x + y' + 2y + 4yy' - y' - 4yy' &= 0\\
x + 2y &= 0\\
y &= -\frac{x}{2}
\end{aligned}$$
Note: referring to remark 4 in the text, we note that $yy'$ is a total derivative of $y^2/2$ and $2y^2 y'$ is a total derivative of $\frac{2}{3}y^3$. We could therefore use the suggested simplification in that remark and obtain the Euler-Lagrange equation using $F = xy + y^2$, that is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(xy + y^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(xy + y^2\right)\right] &= 0\\
x + 2y - \frac{d}{dx}\left(0\right) &= 0\\
x + 2y &= 0\\
y &= -\frac{x}{2}
\end{aligned}$$
(k) We have $F(x, y, y') = \sqrt{1 - y'^2}$ and hence the Euler-Lagrange equation (i.e. Eq. 4 because $y$ does not appear in $F$) is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left[\sqrt{1 - y'^2}\right] &= C\\
\frac{-y'}{\sqrt{1 - y'^2}} &= C\\
y'^2 &= \frac{C^2}{1 + C^2}
\end{aligned}$$
which is consistent with footnote [12] since $y'$ = constant.
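(l) We have $F(x, y, y') = \frac{yy'^2}{1 + yy'}$ and since $F$ is independent of $x$ we can use Eq. 3, that is:
$$\begin{aligned}
\frac{yy'^2}{1 + yy'} - y'\frac{\partial}{\partial y'}\left(\frac{yy'^2}{1 + yy'}\right) &= C\\
\frac{yy'^2}{1 + yy'} - y'\left(\frac{2yy'}{1 + yy'} - \frac{yy'^2\left(y\right)}{\left(1 + yy'\right)^2}\right) &= C\\
\frac{yy'^2}{1 + yy'} - \frac{2yy'^2}{1 + yy'} + \frac{y^2 y'^3}{\left(1 + yy'\right)^2} &= C\\
-\frac{yy'^2}{1 + yy'} + \frac{y^2 y'^3}{\left(1 + yy'\right)^2} &= C\\
-\frac{yy'^2\left(1 + yy'\right)}{\left(1 + yy'\right)^2} + \frac{y^2 y'^3}{\left(1 + yy'\right)^2} &= C\\
-\frac{yy'^2}{\left(1 + yy'\right)^2} &= C\\
yy'^2 &= D\left(1 + yy'\right)^2 \qquad (D = -C)
\end{aligned}$$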
11. Find the extremizing (or stationarizing) functions of the following functional integrals and verify the results:
(a) $I[y] = \int_{x_1}^{x_2} x^3 y'^2\,dx$.
(b) $I[x] = \int_{t_1}^{t_2}\left(x^2 + \dot{x}^2\right)dt$.
(c) $I[y] = \int_{x_1}^{x_2} x\sqrt{1 - y'^2}\,dx$.
(d) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - xy\right)dx$.
(e) $I[r] = \int_{\theta_1}^{\theta_2}\left(\frac{1}{2}r'^2 - \cos\theta\right)d\theta$ (where the prime means $d/d\theta$).
(f) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - axy'\right)dx$ (where $a$ is a constant).
(g) $I[y] = \int_{x_1}^{x_2}\left(y'^{1/2} - x^{1/2}\right)dx$.
(h) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - ky\right)dx$ (where $k$ is a constant).
(i) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - 2y + 5x\right)dx$.
(j) $I[y] = \int_{x_1}^{x_2}\left(y'^2 + 2yy' - y^2\right)dx$.
(k) $I[y] = \int_{x_1}^{x_2}\left(y'^2 + 2yy' + 4y^2\right)dx$.
(l) $I[y] = \int_{x_1}^{x_2}\sqrt{x\left(1 + y'^2\right)}\,dx$.
Answer:
(a) Comparing this functional to the functional of Eq. 1, we see that $F = x^3 y'^2$. Now, since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(x^3 y'^2\right) &= C\\
2x^3 y' &= C
\end{aligned}$$
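(b) Comparing this functional to the functional of Eq. 1 (noting that $x, y, y'$ correspond to $t, x, \dot{x}$), we see that $F = x^2 + \dot{x}^2$. Now, since $F$ is independent of $t$ we can use Eq. 3, that is:
$$\begin{aligned}
\left(x^2 + \dot{x}^2\right) - \dot{x}\frac{\partial}{\partial \dot{x}}\left(x^2 + \dot{x}^2\right) &= C\\
x^2 + \dot{x}^2 - \dot{x}\left(2\dot{x}\right) &= C\\
x^2 + \dot{x}^2 - 2\dot{x}^2 &= C\\
x^2 - \dot{x}^2 &= C\\
\dot{x}^2 &= x^2 - C
\end{aligned}$$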
So, the extremizing function x(t) is given implicitly by the last equation.
To verify the result we show that the last equation is equivalent to the Euler-Lagrange equation, that
is:
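$$\begin{aligned}
\sqrt{x^2 - C} + x &= Ee^{\pm t}\\
\ln\left(\sqrt{x^2 - C} + x\right) &= \ln E \pm t && \text{(taking natural logarithm of both sides)}\\
\frac{\frac{x}{\sqrt{x^2 - C}} + 1}{\sqrt{x^2 - C} + x}\,\frac{dx}{dt} &= \pm 1 && \text{(taking derivative of both sides with respect to } t\text{)}\\
\frac{\frac{x + \sqrt{x^2 - C}}{\sqrt{x^2 - C}}}{\sqrt{x^2 - C} + x}\,\frac{dx}{dt} &= \pm 1\\
\frac{1}{\sqrt{x^2 - C}}\,\dot{x} &= \pm 1\\
\frac{1}{x^2 - C}\,\dot{x}^2 &= 1\\
\dot{x}^2 &= x^2 - C
\end{aligned}$$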
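(c) We have $F = x\sqrt{1 - y'^2}$ and since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(x\sqrt{1 - y'^2}\right) &= C\\
\frac{xy'}{\sqrt{1 - y'^2}} &= -C\\
x^2 y'^2 &= C^2 - C^2 y'^2\\
y'^2 &= \frac{C^2}{x^2 + C^2}\\
y' &= \frac{\pm 1}{\sqrt{\left(x/C\right)^2 + 1}}\\
y &= \pm C\,\mathrm{arcsinh}\left(\frac{x}{C}\right) + D
\end{aligned}$$
which is the extremizing function.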
To verify the result we substitute from the last equation into the Euler-Lagrange equation, that is:
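$$\begin{aligned}
\frac{x\left[\pm C\,\mathrm{arcsinh}\left(\frac{x}{C}\right) + D\right]'}{\sqrt{1 - \left(\left[\pm C\,\mathrm{arcsinh}\left(\frac{x}{C}\right) + D\right]'\right)^2}} &\overset{?}{=} -C\\
\frac{x\,\frac{\pm 1}{\sqrt{\left(x/C\right)^2 + 1}}}{\sqrt{1 - \left(\frac{\pm 1}{\sqrt{\left(x/C\right)^2 + 1}}\right)^2}} &\overset{?}{=} -C\\
\frac{x^2\left[\frac{1}{\left(x/C\right)^2 + 1}\right]}{1 - \left(\frac{\pm 1}{\sqrt{\left(x/C\right)^2 + 1}}\right)^2} &\overset{?}{=} C^2\\
\frac{x^2\left[\frac{1}{\left(x/C\right)^2 + 1}\right]}{1 - \left[\frac{1}{\left(x/C\right)^2 + 1}\right]} &\overset{?}{=} C^2\\
\frac{x^2\left[\frac{1}{\left(x/C\right)^2 + 1}\right]}{\frac{\left(x/C\right)^2 + 1 - 1}{\left(x/C\right)^2 + 1}} &\overset{?}{=} C^2\\
\frac{x^2}{\left(x/C\right)^2} &\overset{?}{=} C^2\\
C^2 &= C^2
\end{aligned}$$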
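(d) We have $F(x, y, y') = y'^2 - xy$ and hence the Euler-Lagrange equation (Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - xy\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - xy\right)\right] &= 0\\
-x - \frac{d}{dx}\left(2y'\right) &= 0\\
-x - 2y'' &= 0\\
2y'' + x &= 0
\end{aligned}$$
This is the Euler-Lagrange equation whose solution (obtained by integrating twice) is $y = -\frac{1}{12}x^3 + Cx + D$, which is the extremizing function.
To verify the result we substitute from the last equation into the Euler-Lagrange equation, that is:
$$\begin{aligned}
2\left[-\frac{1}{12}x^3 + Cx + D\right]'' + x &\overset{?}{=} 0\\
2\left[-\frac{1}{4}x^2 + C + 0\right]' + x &\overset{?}{=} 0\\
2\left[-\frac{1}{2}x + 0\right] + x &\overset{?}{=} 0\\
-x + x &= 0
\end{aligned}$$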
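(e) We have $F(\theta, r, r') = \frac{1}{2}r'^2 - \cos\theta$ and since $F$ is independent of $r$ we use Eq. 4 (noting that $x, y, y'$ correspond to $\theta, r, r'$ and the prime means $d/d\theta$), that is:
$$\begin{aligned}
\frac{\partial}{\partial r'}\left(\frac{1}{2}r'^2 - \cos\theta\right) &= C\\
r' &= C
\end{aligned}$$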
(f) We have $F(x, y, y') = y'^2 - axy'$ and since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^2 - axy'\right) &= C\\
2y' - ax &= C
\end{aligned}$$
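(g) We have $F(x, y, y') = y'^{1/2} - x^{1/2}$ and since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^{1/2} - x^{1/2}\right) &= C\\
\frac{1}{2}y'^{-1/2} &= C
\end{aligned}$$
This is the Euler-Lagrange equation which we solve as follows:
$$\begin{aligned}
y'^{-1/2} &= 2C\\
y'^{-1} &= 4C^2\\
y' &= D \qquad (D = 1/[4C^2])\\
y &= Dx + E
\end{aligned}$$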
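(h) We have $F(x, y, y') = y'^2 - ky$ and since $F$ is independent of $x$ we can use Eq. 3, that is:
$$\begin{aligned}
\left(y'^2 - ky\right) - y'\frac{\partial}{\partial y'}\left(y'^2 - ky\right) &= C\\
y'^2 - ky - y'\left(2y'\right) &= C\\
-ky - y'^2 &= C\\
ky + y'^2 &= D \qquad (D = -C)
\end{aligned}$$
For $k \neq 0$ this equation is satisfied by $y = \frac{D}{k} - \frac{k}{4}\left(x + E\right)^2$ (with $E$ being a constant), as can be checked by substitution:
$$\begin{aligned}
D - \frac{k^2}{4}\left(x + E\right)^2 + \left[-\frac{k}{2}\left(x + E\right)\right]^2 &\overset{?}{=} D\\
D - \frac{k^2}{4}\left(x + E\right)^2 + \frac{k^2}{4}\left(x + E\right)^2 &\overset{?}{=} D\\
D &= D
\end{aligned}$$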
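(i) We have $F(x, y, y') = y'^2 - 2y + 5x$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - 2y + 5x\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - 2y + 5x\right)\right] &= 0\\
-2 - \frac{d}{dx}\left(2y'\right) &= 0\\
y'' + 1 &= 0\\
y'' &= -1\\
y' &= -x + C\\
y &= -\frac{x^2}{2} + Cx + D
\end{aligned}$$
which is the extremizing function.
To verify the result we substitute from the last equation into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[-\frac{x^2}{2} + Cx + D\right]'' + 1 &\overset{?}{=} 0\\
\left(-x + C + 0\right)' + 1 &\overset{?}{=} 0\\
\left(-1 + 0\right) + 1 &\overset{?}{=} 0\\
-1 + 1 &= 0
\end{aligned}$$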
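(j) We have $F(x, y, y') = y'^2 + 2yy' - y^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:[18]
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 + 2yy' - y^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 + 2yy' - y^2\right)\right] &= 0\\
2y' - 2y - \frac{d}{dx}\left(2y' + 2y\right) &= 0\\
2y' - 2y - 2y'' - 2y' &= 0\\
y'' + y &= 0
\end{aligned}$$
This is the Euler-Lagrange equation whose solution is $y = a\sin x + b\cos x$ (with $a$ and $b$ being constants), as the final step of the substitution check shows:
$$\begin{aligned}
\left(-a\sin x - b\cos x\right) + \left(a\sin x + b\cos x\right) &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$
[18] Referring to remark 4 in the text and noting that $2yy'$ is a total derivative of $y^2$ we can obtain the Euler-Lagrange equation using $F = y'^2 - y^2$.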
(k) We have $F(x, y, y') = y'^2 + 2yy' + 4y^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2) is:[19]
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 + 2yy' + 4y^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 + 2yy' + 4y^2\right)\right] &= 0\\
2y' + 8y - \frac{d}{dx}\left(2y' + 2y\right) &= 0\\
2y' + 8y - 2y'' - 2y' &= 0\\
y'' - 4y &= 0
\end{aligned}$$
This is the Euler-Lagrange equation whose solution is:
$$y = a\sinh(2x) + b\cosh(2x) \qquad (a \text{ and } b \text{ are constants})$$
which is the extremizing function.
To verify the result we substitute from the last equation into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[a\sinh(2x) + b\cosh(2x)\right]'' - 4\left[a\sinh(2x) + b\cosh(2x)\right] &\overset{?}{=} 0\\
\left[2a\cosh(2x) + 2b\sinh(2x)\right]' - 4\left[a\sinh(2x) + b\cosh(2x)\right] &\overset{?}{=} 0\\
\left[4a\sinh(2x) + 4b\cosh(2x)\right] - 4\left[a\sinh(2x) + b\cosh(2x)\right] &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$
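(l) We have $F(x, y, y') = \sqrt{x\left(1 + y'^2\right)}$ and hence the Euler-Lagrange equation (i.e. Eq. 4 since $F$ has no explicit dependency on $y$) is:
$$\begin{aligned}
\frac{\partial\sqrt{x\left(1 + y'^2\right)}}{\partial y'} &= C\\
\frac{y'\sqrt{x}}{\sqrt{1 + y'^2}} &= C\\
y'^2 x &= C^2\left(1 + y'^2\right)\\
y'^2 &= \frac{C^2}{x - C^2}\\
y' &= \pm\sqrt{\frac{C^2}{x - C^2}}
\end{aligned} \tag{10}$$
This is the Euler-Lagrange equation whose solution is:
$$y = \pm 2\sqrt{C^2\left(x - C^2\right)} + D$$
which is the extremizing function.
To verify the result we substitute from the last equation into the Euler-Lagrange equation, that is:[20]
$$\begin{aligned}
\left[\pm 2\sqrt{C^2\left(x - C^2\right)} + D\right]' &\overset{?}{=} \pm\sqrt{\frac{C^2}{x - C^2}}\\
\pm\frac{2\sqrt{C^2}}{2\sqrt{x - C^2}} + 0 &\overset{?}{=} \pm\sqrt{\frac{C^2}{x - C^2}}\\
\pm\sqrt{\frac{C^2}{x - C^2}} &= \pm\sqrt{\frac{C^2}{x - C^2}}
\end{aligned}$$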
[19] Referring to remark 4 in the text and noting that $2yy'$ is a total derivative of $y^2$ we can obtain the Euler-Lagrange equation using $F = y'^2 + 4y^2$.
[20] In fact, the result can be verified more easily by just differentiating $y$ with respect to $x$ to obtain Eq. 10.
12. Find the extremizing (or stationarizing) functions of the following functional integrals as well as the specific solutions for the given boundary conditions:
(a) $I[y] = \int_{x_1}^{x_2}\left(y'^2 + ky^2\right)dx$ with $y(x_1 = 0) = 0$ and $y(x_2 = 1) = 1$ ($k > 0$ is a constant).
(b) $I[y] = \int_{x_1}^{x_2}\left(xy - y'^2\right)dx$ with $y(x_1 = 0) = 0$ and $y(x_2 = 1) = 12$.
(c) $I[y] = \int_{x_1}^{x_2}\frac{y'^2}{x^3}\,dx$ with $y(x_1 = 2) = 1$ and $y(x_2 = 4) = 31$.
(d) $I[y] = \int_{x_1}^{x_2}\frac{\sqrt{1 + y'^2}}{x}\,dx$ with $y(x_1 = 0) = \sqrt{2}$ and $y(x_2 = 1) = 1$.
(e) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - y^2 + y\cosh x\right)dx$ with $y(x_1 = 0) = 0$ and $y(x_2 = \pi/2) = 1$.
(f) $I[y] = \int_{x_1}^{x_2}\left(y'^2 + y^2 - 4y\cos x\right)dx$ with $y(x_1 = 0) = 1$ and $y(x_2 = \pi) = 1$.
(g) $I[y] = \int_{x_1}^{x_2}\left(y'^2 - y^2 - 2xy\right)dx$ with $y(x_1 = 0) = 1$ and $y(x_2 = 1) = 2$.
(h) $I[y] = \int_{x_1}^{x_2}\left(y^2 - x^2 y\right)dx$ with $y(x_1 = 0) = 0$ and $y(x_2 = 1) = 3$.
Answer:
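(a) Comparing this functional to the functional of Eq. 1, we see that $F = y'^2 + ky^2$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 + ky^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 + ky^2\right)\right] &= 0\\
2ky - \frac{d}{dx}\left(2y'\right) &= 0\\
2ky - 2y'' &= 0\\
y'' - ky &= 0
\end{aligned}$$
So, the solution is $y = C\cosh(\sqrt{k}\,x) + D\sinh(\sqrt{k}\,x)$ which can be checked by substitution into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[C\cosh\sqrt{k}x + D\sinh\sqrt{k}x\right]'' - k\left[C\cosh\sqrt{k}x + D\sinh\sqrt{k}x\right] &\overset{?}{=} 0\\
\left[\sqrt{k}\,C\sinh\sqrt{k}x + \sqrt{k}\,D\cosh\sqrt{k}x\right]' - k\left[C\cosh\sqrt{k}x + D\sinh\sqrt{k}x\right] &\overset{?}{=} 0\\
\left[kC\cosh\sqrt{k}x + kD\sinh\sqrt{k}x\right] - k\left[C\cosh\sqrt{k}x + D\sinh\sqrt{k}x\right] &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$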
Note: this solution (with k = 1) is plotted later in the book (see Figures 69, 77 and 78).
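The specific solution for the boundary conditions of part (a) is not written out above. As a short worked sketch (an addition here, assuming only the general solution $y = C\cosh(\sqrt{k}\,x) + D\sinh(\sqrt{k}\,x)$ and $k > 0$), the two conditions fix the constants as follows:

```latex
\begin{aligned}
y(0) = 0 \;&\Rightarrow\; C\cosh 0 + D\sinh 0 = C = 0\\
y(1) = 1 \;&\Rightarrow\; D\sinh\sqrt{k} = 1 \;\Rightarrow\; D = \frac{1}{\sinh\sqrt{k}}\\
y &= \frac{\sinh\left(\sqrt{k}\,x\right)}{\sinh\sqrt{k}}
\end{aligned}
```

Since $k > 0$ we have $\sinh\sqrt{k} \neq 0$, so the division is legitimate.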
(b) We have $F = xy - y'^2$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(xy - y'^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(xy - y'^2\right)\right] &= 0\\
x - \frac{d}{dx}\left(-2y'\right) &= 0\\
x + 2y'' &= 0\\
\frac{d^2 y}{dx^2} &= -\frac{x}{2}\\
\frac{dy}{dx} &= -\frac{x^2}{4} + C\\
y &= -\frac{x^3}{12} + Cx + D \qquad (C \text{ and } D \text{ are constants})
\end{aligned}$$
Now, from the condition $y(x_1 = 0) = 0$ we get $D = 0$ while from the condition $y(x_2 = 1) = 12$ we get $C = 145/12$ and hence the specific solution is:
$$y = \frac{-x^3 + 145x}{12}$$
(c) We have $F = y'^2/x^3$. Now, since $F$ is independent of $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(\frac{y'^2}{x^3}\right) &= C\\
\frac{2y'}{x^3} &= C\\
\frac{dy}{dx} &= \frac{C}{2}x^3\\
y &= \frac{C}{8}x^4 + D
\end{aligned}$$
Now, from the condition $y(x_1 = 2) = 1$ we get $2C + D = 1$ while from the condition $y(x_2 = 4) = 31$ we get $32C + D = 31$ and hence $C = 1$ and $D = -1$. So, the specific solution is:
$$y = \frac{x^4}{8} - 1$$
Note: this solution is plotted later in the book (see Figures 71 and 76).
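As an illustrative numerical check of part (c) (a sketch added here, not part of the original answer), the following Python snippet evaluates the specific solution at the two boundary points and confirms that the first integral $2y'/x^3$ stays equal to $C = 1$. The function names and the finite-difference step are assumptions made for this sketch.

```python
def y(x):
    """Specific solution from part (c): y = x**4/8 - 1."""
    return x**4 / 8 - 1


def dy(x, h=1e-5):
    """Central-difference estimate of y'(x); the step h is an arbitrary choice."""
    return (y(x + h) - y(x - h)) / (2 * h)


def first_integral(x):
    """Left-hand side of the first integral 2*y'/x**3 = C."""
    return 2 * dy(x) / x**3


bc_left = y(2.0)    # boundary condition at x1 = 2
bc_right = y(4.0)   # boundary condition at x2 = 4
samples = [first_integral(x) for x in (2.0, 2.5, 3.0, 3.5, 4.0)]
print(bc_left, bc_right, samples)
```

The sampled values of the first integral should all agree with each other up to discretization and rounding error, which is what a constant of integration requires.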
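(d) We have $F = \frac{\sqrt{1 + y'^2}}{x}$ and because it does not contain $y$ we can use Eq. 4, that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left[\frac{\sqrt{1 + y'^2}}{x}\right] &= C\\
\frac{y'}{x\sqrt{1 + y'^2}} &= C\\
y'^2 &= C^2 x^2\left(1 + y'^2\right)\\
y'^2 &= \frac{C^2 x^2}{1 - C^2 x^2}\\
y' &= \pm\frac{Cx}{\sqrt{1 - C^2 x^2}}\\
y &= \mp\frac{1}{C}\sqrt{1 - C^2 x^2} + D\\
x^2 + \left(y - D\right)^2 &= \frac{1}{C^2}
\end{aligned}$$
Now, from the condition $y(x_1 = 0) = \sqrt{2}$ we get $\left(\sqrt{2} - D\right)^2 = \frac{1}{C^2}$ while from the condition $y(x_2 = 1) = 1$ we get $1 + \left(1 - D\right)^2 = \frac{1}{C^2}$ and hence $C = 1/\sqrt{2}$ and $D = 0$. So, the specific solution is:
$$x^2 + y^2 = 2$$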
(e) We have $F = y'^2 - y^2 + y\cosh x$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - y^2 + y\cosh x\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - y^2 + y\cosh x\right)\right] &= 0\\
-2y + \cosh x - \frac{d}{dx}\left(2y'\right) &= 0\\
-2y + \cosh x - 2y'' &= 0\\
y'' + y - \frac{1}{2}\cosh x &= 0
\end{aligned}$$
So, the solution is obviously a combination of sinusoidal and hyperbolic functions (i.e. $y = C\cos x + D\sin x + \frac{1}{4}\cosh x$) which can be checked by substitution into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[C\cos x + D\sin x + \tfrac{1}{4}\cosh x\right]'' + \left[C\cos x + D\sin x + \tfrac{1}{4}\cosh x\right] - \tfrac{1}{2}\cosh x &\overset{?}{=} 0\\
\left[-C\sin x + D\cos x + \tfrac{1}{4}\sinh x\right]' + \left[C\cos x + D\sin x + \tfrac{1}{4}\cosh x\right] - \tfrac{1}{2}\cosh x &\overset{?}{=} 0\\
\left[-C\cos x - D\sin x + \tfrac{1}{4}\cosh x\right] + \left[C\cos x + D\sin x + \tfrac{1}{4}\cosh x\right] - \tfrac{1}{2}\cosh x &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$
Now, from the condition $y(x_1 = 0) = 0$ we get $C + \frac{1}{4} = 0$ while from the condition $y(x_2 = \pi/2) = 1$ we get $D + \frac{1}{4}\cosh\frac{\pi}{2} = 1$ and hence $C = -\frac{1}{4}$ and $D = 1 - \frac{1}{4}\cosh\frac{\pi}{2}$. So, the specific solution is:
$$y = -\frac{1}{4}\cos x + \left(1 - \frac{1}{4}\cosh\frac{\pi}{2}\right)\sin x + \frac{1}{4}\cosh x$$
Note: this solution is plotted later in the book (see Figure 72).
(f) We have $F = y'^2 + y^2 - 4y\cos x$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 + y^2 - 4y\cos x\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 + y^2 - 4y\cos x\right)\right] &= 0\\
2y - 4\cos x - \frac{d}{dx}\left(2y'\right) &= 0\\
2y - 4\cos x - 2y'' &= 0\\
y'' - y + 2\cos x &= 0
\end{aligned}$$
So, the solution is obviously a combination of hyperbolic and sinusoidal functions (i.e. $y = C\cosh x + D\sinh x + \cos x$) which can be checked by substitution into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[C\cosh x + D\sinh x + \cos x\right]'' - \left[C\cosh x + D\sinh x + \cos x\right] + 2\cos x &\overset{?}{=} 0\\
\left[C\sinh x + D\cosh x - \sin x\right]' - \left[C\cosh x + D\sinh x + \cos x\right] + 2\cos x &\overset{?}{=} 0\\
\left[C\cosh x + D\sinh x - \cos x\right] - \left[C\cosh x + D\sinh x + \cos x\right] + 2\cos x &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$
Now, from the condition $y(x_1 = 0) = 1$ we get $C + 1 = 1$ while from the condition $y(x_2 = \pi) = 1$ we get $C\cosh\pi + D\sinh\pi - 1 = 1$ and hence $C = 0$ and $D = 2/\sinh\pi$. So, the specific solution is:
$$y = \frac{2}{\sinh\pi}\sinh x + \cos x$$
Note: this solution is plotted later in the book (see Figure 80).
(g) We have $F = y'^2 - y^2 - 2xy$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - y^2 - 2xy\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y'^2 - y^2 - 2xy\right)\right] &= 0\\
-2y - 2x - \frac{d}{dx}\left(2y'\right) &= 0\\
-2y - 2x - 2y'' &= 0\\
y'' + y + x &= 0
\end{aligned}$$
So, the solution is obviously a combination of sinusoidal and polynomial functions (i.e. $y = C\cos x + D\sin x - x$) which can be checked by substitution, that is:
$$\begin{aligned}
\left[C\cos x + D\sin x - x\right]'' + \left[C\cos x + D\sin x - x\right] + x &\overset{?}{=} 0\\
\left[-C\sin x + D\cos x - 1\right]' + \left[C\cos x + D\sin x - x\right] + x &\overset{?}{=} 0\\
\left[-C\cos x - D\sin x - 0\right] + \left[C\cos x + D\sin x - x\right] + x &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$
Now, from the condition $y(x_1 = 0) = 1$ we get $C + D \times 0 - 0 = 1$ while from the condition $y(x_2 = 1) = 2$ we get $C\cos 1 + D\sin 1 - 1 = 2$ and hence $C = 1$ and $D = (3 - \cos 1)/\sin 1$. So, the specific solution is:
$$y = \cos x + \frac{3 - \cos 1}{\sin 1}\sin x - x$$
Note: this solution is plotted later in the book (see Figures 70 and 79).
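The specific solution of part (g) can also be checked numerically. The sketch below (an illustration added here; the helper names, sample points and step size are assumptions) evaluates the boundary values and estimates the residual $y'' + y + x$ with a central second difference.

```python
import math

# Constants of the specific solution found in part (g).
C = 1.0
D = (3.0 - math.cos(1.0)) / math.sin(1.0)


def y(x):
    """Specific solution y = cos x + D sin x - x."""
    return C * math.cos(x) + D * math.sin(x) - x


def residual(x, h=1e-4):
    """Estimate y'' + y + x, with y'' from a central second difference."""
    ypp = (y(x + h) - 2.0 * y(x) + y(x - h)) / h**2
    return ypp + y(x) + x


# Largest residual magnitude over a few interior points of [0, 1].
worst = max(abs(residual(0.1 * i)) for i in range(1, 10))
print(y(0.0), y(1.0), worst)
```

The residual is not exactly zero because of the finite-difference approximation and floating-point rounding, so it should be read as "small", not as a proof.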
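(h) We have $F = y^2 - x^2 y$ and hence from the Euler-Lagrange equation (Eq. 2) we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y^2 - x^2 y\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y^2 - x^2 y\right)\right] &= 0\\
2y - x^2 - \frac{d}{dx}\left(0\right) &= 0\\
2y - x^2 &= 0\\
y &= \frac{x^2}{2}
\end{aligned}$$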
Now, the first boundary condition y (x1 = 0) = 0 is satisfied by this solution but the second boundary
condition y (x2 = 1) = 3 is not. Hence, this problem has no solution (i.e. specific solution for the given
boundary conditions).
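13. Find the Euler-Lagrange equation associated with the functional $I[\theta] = \int_{t_1}^{t_2}\left(\dot{\theta}^2 - \beta\theta^2\right)dt$ (with $\beta$ being a constant) and investigate its solution.
Answer: We have $F(t, \theta, \dot{\theta}) = \dot{\theta}^2 - \beta\theta^2$ and hence the Euler-Lagrange equation (i.e. Eq. 2 noting that $x, y, y'$ correspond to $t, \theta, \dot{\theta}$) is:
$$\begin{aligned}
\frac{\partial}{\partial \theta}\left[\dot{\theta}^2 - \beta\theta^2\right] - \frac{d}{dt}\left(\frac{\partial}{\partial \dot{\theta}}\left[\dot{\theta}^2 - \beta\theta^2\right]\right) &= 0\\
-2\beta\theta - \frac{d}{dt}\left(2\dot{\theta}\right) &= 0\\
-2\beta\theta - 2\ddot{\theta} &= 0\\
\ddot{\theta} + \beta\theta &= 0
\end{aligned}$$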
Regarding its solution, we have three cases:
(a) If $\beta < 0$ then we have $\ddot{\theta} - |\beta|\theta = 0$ whose solution is obviously hyperbolic of the form $\theta = a\cosh\left(\sqrt{|\beta|}\,t\right) + b\sinh\left(\sqrt{|\beta|}\,t\right)$ (with $a$ and $b$ being constants).
(b) If $\beta = 0$ then we have $\ddot{\theta} = 0$ whose solution is obviously linear polynomial of the form $\theta = at + b$ (with $a$ and $b$ being constants).
(c) If $\beta > 0$ then we have $\ddot{\theta} + |\beta|\theta = 0$ whose solution is obviously sinusoidal of the form $\theta = a\cos\left(\sqrt{|\beta|}\,t\right) + b\sin\left(\sqrt{|\beta|}\,t\right)$ (with $a$ and $b$ being constants).
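The three cases can be collected into one function and checked numerically. The following Python sketch is an illustration only: the function names, the sample constants $a = 1.5$, $b = -0.5$, the sample time and the finite-difference step are all assumptions made here.

```python
import math


def theta(beta, a, b, t):
    """General solution of theta'' + beta*theta = 0, selected by the sign of beta."""
    if beta < 0:
        w = math.sqrt(-beta)
        return a * math.cosh(w * t) + b * math.sinh(w * t)
    if beta == 0:
        return a * t + b
    w = math.sqrt(beta)
    return a * math.cos(w * t) + b * math.sin(w * t)


def residual(beta, a, b, t, h=1e-4):
    """Estimate theta'' + beta*theta using a central second difference."""
    second = (theta(beta, a, b, t + h)
              - 2.0 * theta(beta, a, b, t)
              + theta(beta, a, b, t - h)) / h**2
    return second + beta * theta(beta, a, b, t)


# One representative beta for each case: negative, zero, positive.
residuals = {beta: residual(beta, 1.5, -0.5, 0.7) for beta in (-4.0, 0.0, 9.0)}
print(residuals)
```

Each residual should be close to zero (up to discretization and rounding error), which is consistent with the hyperbolic, linear and sinusoidal forms listed above.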
1.5 Variational Problems with Higher Derivatives 34
This can be generalized if $I$ depends on derivatives higher than the second (up to the $n$th derivative) as well, that is:
$$I[y] = \int_{x_1}^{x_2} F\left(x, y, y^{(1)}, \cdots, y^{(n)}\right)dx \qquad \text{and} \qquad \frac{\partial F}{\partial y} + \sum_{i=1}^{n}\left(-1\right)^i\frac{d^i}{dx^i}\left(\frac{\partial F}{\partial y^{(i)}}\right) = 0 \tag{13}$$
where $y^{(1)}, y^{(n)}, y^{(i)}$ are the 1st, $n$th, $i$th derivatives of $y$, i.e. $y^{(1)} = dy/dx$, $y^{(n)} = d^n y/dx^n$ and $y^{(i)} = d^i y/dx^i$.
Problems
1. What is the Euler-Lagrange equation when:
(a) $F(x, y, y', y'') = yy'' + xy'^2$.
(b) $F(t, \theta, \dot{\theta}, \ddot{\theta}) = 2\sin\theta + \dot{\theta}^2 + 3\ddot{\theta}$.
(c) $F(x, y, y', y'') = ay'' - y'^2 + cxy$ (with $a$ and $c$ being constants).
(d) $F(s, \phi, \phi', \phi'') = \phi''^2 + a\phi' - bs^2\phi + cs^3$ (with $a, b, c$ being constants and the prime means $d/ds$).
Answer:
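(a) Using Eq. 12 we have:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(yy'' + xy'^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(yy'' + xy'^2\right)\right] + \frac{d^2}{dx^2}\left[\frac{\partial}{\partial y''}\left(yy'' + xy'^2\right)\right] &= 0\\
y'' - \frac{d}{dx}\left(2xy'\right) + \frac{d^2}{dx^2}\left(y\right) &= 0\\
y'' - \left(2y' + 2xy''\right) + y'' &= 0\\
2y'' - 2y' - 2xy'' &= 0\\
y''\left(1 - x\right) - y' &= 0
\end{aligned}$$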
2. Obtain the Euler-Lagrange equations for the following functional integrals and solve them:
(a) $I[y] = \int_{x_1}^{x_2} y\left(2 - y''\right)dx$.
(b) $I[y] = \int_{x_1}^{x_2}\left(xy + y'^2 - x^2 y''\right)dx$.
(c) $I[\theta] = \int_{t_1}^{t_2}\left(2\ddot{\theta} + \dot{\theta}^2 - \omega^2\theta^2\right)dt$ (with $\omega$ being constant).
(d) $I[y] = \int_{x_1}^{x_2}\left(y'' + \alpha y'^2 - \beta\frac{y}{x}\right)dx$ (with $\alpha$ and $\beta$ being constants).
Answer:
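(a) In this case we have $F(x, y, y', y'') = y\left(2 - y''\right)$ and hence the Euler-Lagrange equation (i.e. Eq. 12) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left[y\left(2 - y''\right)\right] - \frac{d}{dx}\left(\frac{\partial}{\partial y'}\left[y\left(2 - y''\right)\right]\right) + \frac{d^2}{dx^2}\left(\frac{\partial}{\partial y''}\left[y\left(2 - y''\right)\right]\right) &= 0\\
\left(2 - y''\right) - \frac{d}{dx}\left(0\right) + \frac{d^2}{dx^2}\left(-y\right) &= 0\\
2 - y'' - y'' &= 0\\
1 - y'' &= 0\\
\frac{d^2 y}{dx^2} &= 1
\end{aligned}$$
On integrating the last equation twice we obtain the solution, that is:
$$\begin{aligned}
\frac{dy}{dx} &= x + C\\
y &= \frac{1}{2}x^2 + Cx + D
\end{aligned}$$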
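(b) In this case we have $F(x, y, y', y'') = xy + y'^2 - x^2 y''$ and hence the Euler-Lagrange equation (i.e. Eq. 12) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(xy + y'^2 - x^2 y''\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(xy + y'^2 - x^2 y''\right)\right] + \frac{d^2}{dx^2}\left[\frac{\partial}{\partial y''}\left(xy + y'^2 - x^2 y''\right)\right] &= 0\\
x - \frac{d}{dx}\left(2y'\right) + \frac{d^2}{dx^2}\left(-x^2\right) &= 0\\
x - 2y'' - 2 &= 0\\
\frac{d^2 y}{dx^2} &= \frac{1}{2}x - 1
\end{aligned}$$
On integrating the last equation twice we obtain the solution, that is:
$$\begin{aligned}
\frac{dy}{dx} &= \frac{1}{4}x^2 - x + C\\
y &= \frac{1}{12}x^3 - \frac{1}{2}x^2 + Cx + D
\end{aligned}$$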
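(c) In this case we have $F(t, \theta, \dot{\theta}, \ddot{\theta}) = 2\ddot{\theta} + \dot{\theta}^2 - \omega^2\theta^2$ and hence the Euler-Lagrange equation (i.e. Eq. 12 noting that $x, y, y', y''$ correspond to $t, \theta, \dot{\theta}, \ddot{\theta}$) is:
$$\begin{aligned}
\frac{\partial}{\partial \theta}\left[2\ddot{\theta} + \dot{\theta}^2 - \omega^2\theta^2\right] - \frac{d}{dt}\left(\frac{\partial}{\partial \dot{\theta}}\left[2\ddot{\theta} + \dot{\theta}^2 - \omega^2\theta^2\right]\right) + \frac{d^2}{dt^2}\left(\frac{\partial}{\partial \ddot{\theta}}\left[2\ddot{\theta} + \dot{\theta}^2 - \omega^2\theta^2\right]\right) &= 0\\
-2\omega^2\theta - \frac{d}{dt}\left(2\dot{\theta}\right) + \frac{d^2}{dt^2}\left(2\right) &= 0\\
-2\omega^2\theta - 2\ddot{\theta} + 0 &= 0\\
\ddot{\theta} + \omega^2\theta &= 0
\end{aligned}$$
So, the solution is $\theta = C\cos(\omega t) + D\sin(\omega t)$ (with $C$ and $D$ being constants) as can be checked by substitution into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\frac{d^2}{dt^2}\left[C\cos(\omega t) + D\sin(\omega t)\right] + \omega^2\left[C\cos(\omega t) + D\sin(\omega t)\right] &\overset{?}{=} 0\\
\frac{d}{dt}\left[-\omega C\sin(\omega t) + \omega D\cos(\omega t)\right] + \omega^2\left[C\cos(\omega t) + D\sin(\omega t)\right] &\overset{?}{=} 0\\
-\omega^2 C\cos(\omega t) - \omega^2 D\sin(\omega t) + \omega^2\left[C\cos(\omega t) + D\sin(\omega t)\right] &\overset{?}{=} 0\\
0 &= 0
\end{aligned}$$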
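(d) In this case we have $F(x, y, y', y'') = y'' + \alpha y'^2 - \beta\frac{y}{x}$ and hence the Euler-Lagrange equation (i.e. Eq. 12) is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left[y'' + \alpha y'^2 - \beta\frac{y}{x}\right] - \frac{d}{dx}\left(\frac{\partial}{\partial y'}\left[y'' + \alpha y'^2 - \beta\frac{y}{x}\right]\right) + \frac{d^2}{dx^2}\left(\frac{\partial}{\partial y''}\left[y'' + \alpha y'^2 - \beta\frac{y}{x}\right]\right) &= 0\\
-\frac{\beta}{x} - \frac{d}{dx}\left(2\alpha y'\right) + \frac{d^2}{dx^2}\left(1\right) &= 0\\
-\frac{\beta}{x} - 2\alpha y'' + 0 &= 0\\
y'' + \frac{\beta}{2\alpha x} &= 0
\end{aligned}$$
So, the solution is $y = -\frac{\beta}{2\alpha}\left(x\ln x - x\right) + Cx + D$ (with $C$ and $D$ being constants) as can be checked by substitution into the Euler-Lagrange equation, that is:
$$\begin{aligned}
\left[-\frac{\beta}{2\alpha}\left(x\ln x - x\right) + Cx + D\right]'' + \frac{\beta}{2\alpha x} &\overset{?}{=} 0\\
\left[-\frac{\beta}{2\alpha}\left(\ln x + \frac{x}{x} - 1\right) + C + 0\right]' + \frac{\beta}{2\alpha x} &\overset{?}{=} 0\\
\left[-\frac{\beta}{2\alpha}\left(\frac{1}{x} + 0 - 0\right) + 0\right] + \frac{\beta}{2\alpha x} &\overset{?}{=} 0\\
-\frac{\beta}{2\alpha x} + \frac{\beta}{2\alpha x} &= 0
\end{aligned}$$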
3. Find the extremizing (or stationarizing) functions of the following functional integrals as well as the specific solutions for the given boundary conditions:
(a) $I[y] = \int_{\pi}^{3\pi/2}\left(x^2 y'' + y'^2 - y^2\right)dx$ with $y(\pi) = 2$ and $y\left(\frac{3\pi}{2}\right) = 5$.
(b) $I[y] = \int_{0}^{1}\left(y''^2 + 720x^2 y\right)dx$ with $y(0) = 0$, $y(1) = 0$, $y'(0) = 1$ and $y'(1) = 1$.
Answer:
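(a) We have $F = x^2 y'' + y'^2 - y^2$ and hence from the Euler-Lagrange equation (Eq. 12) we get:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(x^2 y'' + y'^2 - y^2\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(x^2 y'' + y'^2 - y^2\right)\right] + \frac{d^2}{dx^2}\left[\frac{\partial}{\partial y''}\left(x^2 y'' + y'^2 - y^2\right)\right] &= 0\\
-2y - \frac{d}{dx}\left(2y'\right) + \frac{d^2}{dx^2}\left(x^2\right) &= 0\\
-2y - 2y'' + 2 &= 0\\
y'' + y - 1 &= 0
\end{aligned}$$
Hence, the extremizing function (which can be verified by substitution in the last equation) is:
$$y = C\cos x + D\sin x + 1$$
From the given boundary conditions (respectively), we get:
$$\begin{aligned}
C\cos\pi + D\sin\pi + 1 = 2 \quad&\rightarrow\quad C = -1\\
C\cos\frac{3\pi}{2} + D\sin\frac{3\pi}{2} + 1 = 5 \quad&\rightarrow\quad D = -4
\end{aligned}$$
Therefore, the specific solution is:
$$y = -\cos x - 4\sin x + 1$$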
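(b) We have $F = y''^2 + 720x^2 y$ and hence from the Euler-Lagrange equation (Eq. 12) we get:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y''^2 + 720x^2 y\right) - \frac{d}{dx}\left[\frac{\partial}{\partial y'}\left(y''^2 + 720x^2 y\right)\right] + \frac{d^2}{dx^2}\left[\frac{\partial}{\partial y''}\left(y''^2 + 720x^2 y\right)\right] &= 0\\
720x^2 - \frac{d}{dx}\left(0\right) + \frac{d^2}{dx^2}\left(2y''\right) &= 0\\
720x^2 + 2y^{(4)} &= 0\\
y^{(4)} &= -360x^2
\end{aligned}$$
On integrating 4 times we get the extremizing function:
$$y = -x^6 + C_3 x^3 + C_2 x^2 + C_1 x + C_0$$
From the given boundary conditions (respectively), we get:
$$\begin{aligned}
-0 + 0 + 0 + 0 + C_0 = 0 \quad&\rightarrow\quad C_0 = 0\\
-1 + C_3 + C_2 + C_1 + 0 = 0 \quad&\rightarrow\quad C_3 + C_2 + C_1 = 1\\
-0 + 0 + 0 + C_1 = 1 \quad&\rightarrow\quad C_1 = 1\\
-6 + 3C_3 + 2C_2 + 1 = 1 \quad&\rightarrow\quad 3C_3 + 2C_2 = 6
\end{aligned}$$
Accordingly: $C_0 = 0$, $C_1 = 1$, $C_2 = -6$ and $C_3 = 6$. Therefore, the specific solution is:
$$y = -x^6 + 6x^3 - 6x^2 + x$$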
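Because the specific solution of part (b) is a polynomial with integer coefficients, it can be checked exactly. The sketch below (an illustration added here; the coefficient-list representation and the helper names are assumptions) differentiates the polynomial four times and evaluates the four boundary conditions.

```python
def derivative(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...] (ascending powers)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# y = -x^6 + 6x^3 - 6x^2 + x in ascending powers of x.
y = [0, 1, -6, 6, 0, 0, -1]
y1 = derivative(y)                               # y'
y4 = derivative(derivative(derivative(y1)))      # fourth derivative of y

# y(0), y(1), y'(0), y'(1) in the order the conditions are given.
boundary_values = (evaluate(y, 0), evaluate(y, 1), evaluate(y1, 0), evaluate(y1, 1))
print(y4, boundary_values)
```

The fourth derivative comes out as the coefficient list of $-360x^2$, and the four boundary values match the prescribed $0, 0, 1, 1$.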
1.6 Variational Problems with Multiple Independent Variables 38
$$\frac{\partial F}{\partial y} - \sum_{i=1}^{n}\frac{\partial}{\partial x_i}\left(\frac{\partial F}{\partial y_{x_i}}\right) = 0$$
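(b) Using Eq. 15 with $F(t, x, y, y_t, y_x) = y_t^2 + y_x^2 + Cy$ (noting that $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eq. 15 correspond to $t, x, y, y_t, y_x$ here) we get:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y_t^2 + y_x^2 + Cy\right) - \frac{\partial}{\partial t}\left[\frac{\partial}{\partial y_t}\left(y_t^2 + y_x^2 + Cy\right)\right] - \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y_x}\left(y_t^2 + y_x^2 + Cy\right)\right] &= 0\\
C - \frac{\partial}{\partial t}\left(2y_t\right) - \frac{\partial}{\partial x}\left(2y_x\right) &= 0\\
\frac{\partial}{\partial t}\left(y_t\right) + \frac{\partial}{\partial x}\left(y_x\right) &= \frac{C}{2}\\
\frac{\partial}{\partial t}\left(\frac{\partial y}{\partial t}\right) + \frac{\partial}{\partial x}\left(\frac{\partial y}{\partial x}\right) &= \frac{C}{2}\\
\frac{\partial^2 y}{\partial t^2} + \frac{\partial^2 y}{\partial x^2} &= \frac{C}{2}
\end{aligned}$$
which is a 2D Poisson equation.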
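(c) Using Eq. 15 with $F(x, z, y, y_x, y_z) = x^2 y_x^2 - az^2 y_z^2$ (noting that $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eq. 15 correspond to $x, z, y, y_x, y_z$ here) we get:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(x^2 y_x^2 - az^2 y_z^2\right) - \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y_x}\left(x^2 y_x^2 - az^2 y_z^2\right)\right] - \frac{\partial}{\partial z}\left[\frac{\partial}{\partial y_z}\left(x^2 y_x^2 - az^2 y_z^2\right)\right] &= 0\\
0 - \frac{\partial}{\partial x}\left(2x^2 y_x\right) - \frac{\partial}{\partial z}\left(-2az^2 y_z\right) &= 0\\
-2\left(2xy_x + x^2 y_{xx}\right) + 2a\left(2zy_z + z^2 y_{zz}\right) &= 0\\
-2xy_x - x^2 y_{xx} + 2azy_z + az^2 y_{zz} &= 0
\end{aligned}$$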
(d) Using Eq. 15 with F (x, y, ξ, ξx , ξy ) = ξx2 + ξy2 + αξ (noting that x1 , x2 , y, yx1 , yx2 in Eq. 15
$$\frac{\partial^2}{\partial x^2}\left(xz\right) + \frac{\partial^2}{\partial z^2}\left(xz\right) = \frac{\partial}{\partial x}\left(z\right) + \frac{\partial}{\partial z}\left(x\right) = 0 + 0 = 0$$
Note: it should be obvious that the independent variables x and z are independent of each other.
4. The area of a simple surface in 3D Euclidean space with a given domain Ω and a given closed boundary
curve is to be optimized.[24] Find the Euler-Lagrange equation for this problem.
Answer: Let the surface be given as z = z (x, y) over the domain Ω (in the xy plane) with a given
closed boundary space curve Γ (where the projection of Γ on the xy plane is a simple plane curve
∂Ω). So, the problem is a variational problem with two independent variables (i.e. x and y) and one
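dependent variable (i.e. $z$). Now, from elementary calculus we know that the area of such a surface is given by:
$$\sigma = \iint_{\Omega}\sqrt{1 + z_x^2 + z_y^2}\;dx\,dy$$
where $z_x = \partial z/\partial x$ and $z_y = \partial z/\partial y$. So, our functional integral is $I[z] \equiv \sigma$ and hence $F(x, y, z, z_x, z_y) = \sqrt{1 + z_x^2 + z_y^2}$. Accordingly, the Euler-Lagrange equation for this Problem is (see Eq. 15 noting that $x, y, z, z_x, z_y$ here correspond to $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eq. 15):
$$\begin{aligned}
\frac{\partial}{\partial z}\sqrt{1 + z_x^2 + z_y^2} - \frac{\partial}{\partial x}\left(\frac{\partial}{\partial z_x}\sqrt{1 + z_x^2 + z_y^2}\right) - \frac{\partial}{\partial y}\left(\frac{\partial}{\partial z_y}\sqrt{1 + z_x^2 + z_y^2}\right) &= 0\\
0 - \frac{\partial}{\partial x}\left(\frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}}\right) - \frac{\partial}{\partial y}\left(\frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}}\right) &= 0
\end{aligned}$$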
[24] “Optimized” here should mean minimized since the area of such a surface can diverge.
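$$\begin{aligned}
\frac{z_{xx}}{\sqrt{1 + z_x^2 + z_y^2}} - \frac{z_x\left(2z_x z_{xx} + 2z_y z_{yx}\right)}{2\left(1 + z_x^2 + z_y^2\right)^{3/2}} + \frac{z_{yy}}{\sqrt{1 + z_x^2 + z_y^2}} - \frac{z_y\left(2z_x z_{xy} + 2z_y z_{yy}\right)}{2\left(1 + z_x^2 + z_y^2\right)^{3/2}} &= 0\\
\left(\frac{z_{xx}\left(1 + z_x^2 + z_y^2\right)}{\left(1 + z_x^2 + z_y^2\right)^{3/2}} - \frac{z_x\left(z_x z_{xx} + z_y z_{yx}\right)}{\left(1 + z_x^2 + z_y^2\right)^{3/2}}\right) + \left(\frac{z_{yy}\left(1 + z_x^2 + z_y^2\right)}{\left(1 + z_x^2 + z_y^2\right)^{3/2}} - \frac{z_y\left(z_x z_{xy} + z_y z_{yy}\right)}{\left(1 + z_x^2 + z_y^2\right)^{3/2}}\right) &= 0\\
\left(\frac{z_{xx} + z_{xx}z_x^2 + z_{xx}z_y^2 - z_x^2 z_{xx} - z_x z_y z_{yx}}{\left(1 + z_x^2 + z_y^2\right)^{3/2}}\right) + \left(\frac{z_{yy} + z_{yy}z_x^2 + z_{yy}z_y^2 - z_x z_y z_{xy} - z_y^2 z_{yy}}{\left(1 + z_x^2 + z_y^2\right)^{3/2}}\right) &= 0\\
\frac{z_{xx} + z_{xx}z_y^2 - z_x z_y z_{yx}}{\left(1 + z_x^2 + z_y^2\right)^{3/2}} + \frac{z_{yy} + z_{yy}z_x^2 - z_x z_y z_{xy}}{\left(1 + z_x^2 + z_y^2\right)^{3/2}} &= 0\\
\frac{z_{xx} + z_{xx}z_y^2 - 2z_x z_y z_{xy} + z_{yy} + z_{yy}z_x^2}{\left(1 + z_x^2 + z_y^2\right)^{3/2}} &= 0
\end{aligned}$$
$$z_{xx} + z_{xx}z_y^2 - 2z_x z_y z_{xy} + z_{yy} + z_{yy}z_x^2 = 0 \tag{17}$$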
We remark that in line 3 the partial differentiation with respect to $x$ and $y$ includes implicit as well as explicit dependencies on these variables (as explained in the text), while in line 7 we used $z_{yx} = z_{xy}$.
Note: Eq. 17 defines the provision for minimal surfaces[25] with the above given conditions (noting
that some minimal surfaces may require a slight modification to the above conditions with regard to
the boundary).
5. Show that planes are minimal surfaces.
Answer: A plane in 3D Euclidean space (defined over a given domain in the xy plane with a given
boundary) is defined by the equation z = ax + by + c (with a, b, c being constants).[26] Accordingly,
$z_{xx} = z_{xy} = z_{yy} = 0$ and hence Eq. 17 is satisfied identically. So, the plane is a solution to Eq. 17 and
hence its area is minimum (according to Problem 4), i.e. it is a minimal surface, as required.
6. Find the extremizing (or stationarizing) function of the following functional integral and suggest a specific solution that satisfies the given boundary conditions (as well as the given constraint):
$$I[z] = \int_0^1\!\!\int_0^1\left(z_x^2 - z_y^2\right)dx\,dy \qquad z(0, y) = z(x, 0) = z(1, y) = z(x, 1) = 0 \qquad z(0.5, 0.5) = 1$$
Answer:[27] Using Eq. 15 with $F(x, y, z, z_x, z_y) = z_x^2 - z_y^2$ (noting that $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eq. 15 correspond to $x, y, z, z_x, z_y$ here) we get:
$$\begin{aligned}
\frac{\partial}{\partial z}\left(z_x^2 - z_y^2\right) - \frac{\partial}{\partial x}\left[\frac{\partial}{\partial z_x}\left(z_x^2 - z_y^2\right)\right] - \frac{\partial}{\partial y}\left[\frac{\partial}{\partial z_y}\left(z_x^2 - z_y^2\right)\right] &= 0\\
0 - \frac{\partial}{\partial x}\left(2z_x\right) - \frac{\partial}{\partial y}\left(-2z_y\right) &= 0\\
z_{xx} - z_{yy} &= 0
\end{aligned}$$
This is the Euler-Lagrange equation of this Problem. We can suggest the following solution (which satisfies all the given boundary conditions and constraint as well as the Euler-Lagrange equation):
$$z = \sin(\pi x)\sin(\pi y)$$
Note: this solution is plotted later in the book (see the lower frame of Figure 73).
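The suggested solution of Problem 6 can be spot-checked numerically. The following Python sketch is an illustration only (the grid of sample points, the step size and the helper names are assumptions): it estimates $z_{xx} - z_{yy}$ by central differences and evaluates the boundary conditions and the constraint at the centre.

```python
import math


def z(x, y):
    """Suggested solution z = sin(pi x) sin(pi y)."""
    return math.sin(math.pi * x) * math.sin(math.pi * y)


def residual(x, y, h=1e-4):
    """Estimate z_xx - z_yy using central second differences."""
    zxx = (z(x + h, y) - 2.0 * z(x, y) + z(x - h, y)) / h**2
    zyy = (z(x, y + h) - 2.0 * z(x, y) + z(x, y - h)) / h**2
    return zxx - zyy


# Interior sample points of the unit square.
interior = [(0.1 * i, 0.1 * j) for i in range(1, 10) for j in range(1, 10)]
worst = max(abs(residual(x, y)) for x, y in interior)

# One sample on each of the four edges, plus the constrained centre value.
edges = [z(0.0, 0.3), z(1.0, 0.3), z(0.3, 0.0), z(0.3, 1.0)]
center = z(0.5, 0.5)
print(worst, edges, center)
```

The edge values on $x = 1$ and $y = 1$ are zero only up to floating-point rounding, since `math.sin(math.pi)` is not exactly zero.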
[25] Minimal surface is a surface whose area is minimum compared to the area of any other surface that shares the same
boundary curve. Common examples of minimal surface are planes, catenoids, helicoids and ennepers.
[26] The fact that some planes may be defined differently (e.g. y = 6) does not affect the generality of our assertion because
with a simple transformation (which does not affect the geometrical properties) of the plane or the coordinate system
(e.g. rotation) the plane can be defined by an equation of the above form.
[27] In this answer (as well as in the answers of the following Problems in this section), we omit many details and possibilities
[Figure: 3D surface plot of z against x and y, with x and y each ranging over 0 to 1.]
7. Find the extremizing (or stationarizing) function of the following functional integral and suggest a specific solution that satisfies the given boundary conditions and plot the solution:
$$I[z] = \int_0^1\!\!\int_0^1\left(z_x^2 + z_y^2\right)dx\,dy \qquad z(0, y) = z(x, 0) = z(1, y) = 0 \qquad z(x, 1) = \sin(\pi x)\sinh(\pi)$$
Answer: Using Eq. 15 with $F(x, y, z, z_x, z_y) = z_x^2 + z_y^2$ (noting that $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eq. 15 correspond to $x, y, z, z_x, z_y$ here) we get:
$$\begin{aligned}
\frac{\partial}{\partial z}\left(z_x^2 + z_y^2\right) - \frac{\partial}{\partial x}\left[\frac{\partial}{\partial z_x}\left(z_x^2 + z_y^2\right)\right] - \frac{\partial}{\partial y}\left[\frac{\partial}{\partial z_y}\left(z_x^2 + z_y^2\right)\right] &= 0\\
0 - \frac{\partial}{\partial x}\left(2z_x\right) - \frac{\partial}{\partial y}\left(2z_y\right) &= 0\\
z_{xx} + z_{yy} &= 0
\end{aligned}$$
This is the Euler-Lagrange equation of this Problem. We can suggest the following solution (which satisfies all the given boundary conditions as well as the Euler-Lagrange equation):
$$z = \sin(\pi x)\sinh(\pi y)$$
This solution is plotted in Figure 2.
8. Find the extremizing (or stationarizing) function of the following functional integral and suggest a
specific solution that satisfies the given boundary conditions:
ˆ 1ˆ 1
2z 2
I [z] = zx2 + zy2 − π 2 z 2 + 2 dx dy z (0, y) = z (x, 0) = z (1, y) = 0 z (x, 1) = sin (πx)
0 0 y
Answer: Using Eq. 15 with $F(x, y, z, z_x, z_y) = z_x^2 + z_y^2 - \pi^2 z^2 + \frac{2z^2}{y^2}$ (noting that $x_1, x_2, y, y_{x_1}, y_{x_2}$ in
Eq. 15 correspond to $x, y, z, z_x, z_y$ here) we get:
$$\begin{aligned}
\frac{\partial F}{\partial z} - \frac{\partial}{\partial x}\frac{\partial F}{\partial z_x} - \frac{\partial}{\partial y}\frac{\partial F}{\partial z_y} &= 0 \qquad \left(F = z_x^2 + z_y^2 - \pi^2 z^2 + \frac{2z^2}{y^2}\right) \\
-2\pi^2 z + \frac{4z}{y^2} - \frac{\partial}{\partial x}\left(2z_x\right) - \frac{\partial}{\partial y}\left(2z_y\right) &= 0 \\
-2\pi^2 z + \frac{4z}{y^2} - 2z_{xx} - 2z_{yy} &= 0 \\
z_{xx} + z_{yy} + \pi^2 z - \frac{2z}{y^2} &= 0
\end{aligned}$$
This is the Euler-Lagrange equation of this Problem. We can suggest the following solution (which
satisfies all the given boundary conditions as well as the Euler-Lagrange equation):
$$ z = y^2 \sin(\pi x) $$
Note: this solution is plotted later in the book (see the lower frame of Figure 74).
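As a quick check by direct substitution (a sketch that uses only the suggested solution $z = y^2 \sin(\pi x)$ and the Euler-Lagrange equation $z_{xx} + z_{yy} + \pi^2 z - 2z/y^2 = 0$ of this Problem), the derivatives can be inserted term by term:

```latex
\begin{align*}
z_{xx} &= -\pi^2 y^2 \sin(\pi x), \qquad z_{yy} = 2\sin(\pi x), \\
z_{xx} + z_{yy} + \pi^2 z - \frac{2z}{y^2}
  &= -\pi^2 y^2 \sin(\pi x) + 2\sin(\pi x) + \pi^2 y^2 \sin(\pi x) - 2\sin(\pi x) = 0 .
\end{align*}
```

The boundary values follow in the same way: $z$ vanishes at $x = 0$, $x = 1$ and $y = 0$, while $z(x, 1) = \sin(\pi x)$.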
9. Find the extremizing (or stationarizing) function of the following functional integral and suggest a
specific solution that satisfies the given boundary conditions and plot the solution:
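$$ I[z] = \int_0^1 \int_0^1 \left( z_x^2 + z_y^2 - 4z \right) dx\,dy $$
Answer: Using Eq. 15 with $F(x, y, z, z_x, z_y) = z_x^2 + z_y^2 - 4z$ we get:
$$\begin{aligned}
-4 - \frac{\partial}{\partial x}\left(2z_x\right) - \frac{\partial}{\partial y}\left(2z_y\right) &= 0 \\
z_{xx} + z_{yy} + 2 &= 0
\end{aligned}$$
This is the Euler-Lagrange equation of this Problem. We can suggest the following solution (which
satisfies all the given boundary conditions as well as the Euler-Lagrange equation):
$$ z = xy - x^2 $$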
Answer: The functional integral of this Problem is the same as the functional integral of Problem 9
and hence the Euler-Lagrange equation for this Problem is the same as for Problem 9. We can suggest
the following series solution:
$$ z = \frac{8}{\pi^4} \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \frac{\left(1 - [-1]^m\right)\left(1 - [-1]^n\right)}{mn\left(m^2 + n^2\right)} \sin(m\pi x)\,\sin(n\pi y) $$
Note: this solution is plotted (up to and including m = n = 7) later in the book (see the lower frame
of Figure 75).
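To get a feel for the series, the following minimal sketch evaluates its partial sum up to m = n = 7 (the truncation mentioned in the note above). The function name `z_series` and the sample points are illustrative assumptions. Since every term contains $\sin(m\pi x)\sin(n\pi y)$, the partial sum vanishes on all four edges of the unit square and is symmetric in x and y.

```python
import math

def z_series(x, y, terms=7):
    """Partial sum of the double sine series, with m and n running from 1 to `terms`."""
    total = 0.0
    for m in range(1, terms + 1):
        for n in range(1, terms + 1):
            # The factors (1 - (-1)^m) and (1 - (-1)^n) vanish for even m or n.
            coeff = (1 - (-1) ** m) * (1 - (-1) ** n) / (m * n * (m * m + n * n))
            total += coeff * math.sin(m * math.pi * x) * math.sin(n * math.pi * y)
    return 8.0 / math.pi ** 4 * total

print(z_series(0.5, 0.5))   # value at the centre of the unit square
print(z_series(0.0, 0.3))   # on the edge x = 0
```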
1.7 Variational Problems with Multiple Dependent Variables 44
[Figure: surface plot with axes x, y and z.]
where I is the functional integral whose optimization depends on the extremizing functions y1 and y2 ,
the prime stands for d/dx, and the functional is constrained by four boundary conditions: y1 (x1 ) = C1 ,
y1 (x2 ) = C2 , y2 (x1 ) = D1 , y2 (x2 ) = D2 (with C1 , C2 , D1 , D2 being constants). In this case there will be
one Euler-Lagrange equation (see Eq. 2) for each extremizing function, that is:
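$$ \frac{\partial F}{\partial y_1} - \frac{d}{dx}\frac{\partial F}{\partial y_1'} = 0 \qquad (19) $$
$$ \frac{\partial F}{\partial y_2} - \frac{d}{dx}\frac{\partial F}{\partial y_2'} = 0 \qquad (20) $$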
where these equations should be solved simultaneously[28] to obtain the solution of the variational prob-
lem. The above formulation can be easily generalized when the functional depends on more than two
extremizing functions (see Problem 1; also see § 6).
Problems
1. Write down the functional integral I and the Euler-Lagrange equations when I depends on n dependent
variables y1 (x), · · · , yn (x).
Answer:
$$ I[y_1, \cdots, y_n] = \int_{x_1}^{x_2} F\left(x, y_1, \cdots, y_n, y_1', \cdots, y_n'\right) dx $$
[28] Simultaneous here does not necessarily mean they are linked as a system.
$$ \frac{\partial F}{\partial y_i} - \frac{d}{dx}\frac{\partial F}{\partial y_i'} = 0 \qquad (i = 1, \cdots, n) $$
2. Obtain the Euler-Lagrange equations for the following functional integrals (of multiple dependent
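variables):
(a) $I[y, z] = \int_{x_1}^{x_2} \left( y'^3 + az'^2 + 3bxy - cz \right) dx$ (with a, b, c being constants).
(b) $I[r, \phi] = \int_{s_1}^{s_2} \left( r'^2 + Cr^{-1} + r^2 \phi'^2 \right) ds$.
(c) $I[x_1, x_2] = \int_{t_1}^{t_2} \left( \dot{x}_1^2 + \dot{x}_2^2 \right) dt$.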
Answer:
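(a) We have $F(x, y, z, y', z') = y'^3 + az'^2 + 3bxy - cz$ and hence the Euler-Lagrange equation for y is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^3 + az'^2 + 3bxy - cz\right) - \frac{d}{dx}\frac{\partial}{\partial y'}\left(y'^3 + az'^2 + 3bxy - cz\right) &= 0 \\
3bx - \frac{d}{dx}\left(3y'^2\right) &= 0 \\
3bx - 6y'y'' &= 0 \\
6y'y'' - 3bx &= 0
\end{aligned}$$
Similarly, the Euler-Lagrange equation for z is $-c - \frac{d}{dx}\left(2az'\right) = 0$, that is $2az'' + c = 0$.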
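(b) We have $F(s, r, \phi, r', \phi') = r'^2 + Cr^{-1} + r^2\phi'^2$ and hence the Euler-Lagrange equation for r is:
$$\begin{aligned}
\frac{\partial}{\partial r}\left(r'^2 + Cr^{-1} + r^2\phi'^2\right) - \frac{d}{ds}\frac{\partial}{\partial r'}\left(r'^2 + Cr^{-1} + r^2\phi'^2\right) &= 0 \\
-Cr^{-2} + 2r\phi'^2 - \frac{d}{ds}\left(2r'\right) &= 0 \\
-Cr^{-2} + 2r\phi'^2 - 2r'' &= 0 \\
2r'' + Cr^{-2} - 2r\phi'^2 &= 0
\end{aligned}$$
Similarly, since F does not depend explicitly on φ, the Euler-Lagrange equation for φ is $\frac{d}{ds}\left(2r^2\phi'\right) = 0$.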
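(c) We have $F(t, x_1, x_2, \dot{x}_1, \dot{x}_2) = \dot{x}_1^2 + \dot{x}_2^2$ and hence the Euler-Lagrange equation for $x_1$ is:
$$\begin{aligned}
\frac{\partial}{\partial x_1}\left(\dot{x}_1^2 + \dot{x}_2^2\right) - \frac{d}{dt}\frac{\partial}{\partial \dot{x}_1}\left(\dot{x}_1^2 + \dot{x}_2^2\right) &= 0 \\
0 - \frac{d}{dt}\left(2\dot{x}_1\right) &= 0 \\
\ddot{x}_1 &= 0
\end{aligned}$$
Similarly, the Euler-Lagrange equation for $x_2$ is $\ddot{x}_2 = 0$. So, the Euler-Lagrange equations are:
$$ \ddot{x}_i = 0 \qquad (i = 1, 2) $$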
3. Find the extremizing (or stationarizing) functions of the following functional integrals as well as the
specific solutions for the given boundary conditions:
(a) $I[y_1, y_2] = \int_0^1 \left( y_1'^2 + y_1^2 + y_2'^2 + 4y_2 \right) dx$ with $y_1(0) = 1$, $y_1(1) = 0$, $y_2(0) = 1$, $y_2(1) = 0$.
(b) $I[y_1, y_2] = \int_0^{\pi/2} \left( y_1'^2 - y_1^2 + y_2'^2 + y_2^2 \right) dx$ with $y_1(0) = 1$, $y_1(\pi/2) = 1$, $y_2(0) = 1$, $y_2(\pi/2) = 1$.
Answer:
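(a) We have $F = y_1'^2 + y_1^2 + y_2'^2 + 4y_2$ and hence from the Euler-Lagrange equations (Eqs. 19 and 20)
we get:
$$\begin{aligned}
\frac{\partial}{\partial y_1}\left(y_1'^2 + y_1^2 + y_2'^2 + 4y_2\right) - \frac{d}{dx}\frac{\partial}{\partial y_1'}\left(y_1'^2 + y_1^2 + y_2'^2 + 4y_2\right) &= 0 \\
2y_1 - \frac{d}{dx}\left(2y_1'\right) &= 0 \\
2y_1 - 2y_1'' &= 0 \\
y_1'' - y_1 &= 0
\end{aligned}$$
$$\begin{aligned}
\frac{\partial}{\partial y_2}\left(y_1'^2 + y_1^2 + y_2'^2 + 4y_2\right) - \frac{d}{dx}\frac{\partial}{\partial y_2'}\left(y_1'^2 + y_1^2 + y_2'^2 + 4y_2\right) &= 0 \\
4 - \frac{d}{dx}\left(2y_2'\right) &= 0 \\
4 - 2y_2'' &= 0 \\
y_2'' &= 2
\end{aligned}$$
So, the general solutions are $y_1 = C_1 \cosh x + D_1 \sinh x$ and $y_2 = x^2 + C_2 x + D_2$, and from the boundary conditions we get:
$$\begin{aligned}
C_1 \cosh 0 + D_1 \sinh 0 = 1 \quad &\rightarrow \quad C_1 = 1 \\
C_1 \cosh 1 + D_1 \sinh 1 = 0 \quad &\rightarrow \quad D_1 = -\coth 1 \\
0 + 0 + D_2 = 1 \quad &\rightarrow \quad D_2 = 1 \\
1 + C_2 + D_2 = 0 \quad &\rightarrow \quad C_2 = -2
\end{aligned}$$
Therefore, the specific solution is:
$$ y_1 = \cosh x - \coth(1) \sinh x \qquad\qquad y_2 = x^2 - 2x + 1 $$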
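The result of part (a) can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the general solutions to be $y_1 = C_1 \cosh x + D_1 \sinh x$ and $y_2 = x^2 + C_2 x + D_2$ (the forms consistent with the boundary equations above), inserts the constants $C_1 = 1$, $D_1 = -\coth 1$, $C_2 = -2$, $D_2 = 1$, and evaluates the residuals of Eqs. 19 and 20 by finite differences. The helper names `partial` and `el_residual` are hypothetical.

```python
import math

def F(x, y1, y2, p1, p2):
    """Integrand of part (a); p1 and p2 stand for y1' and y2'."""
    return p1**2 + y1**2 + p2**2 + 4.0 * y2

COTH1 = math.cosh(1.0) / math.sinh(1.0)

def y1(x):  return math.cosh(x) - COTH1 * math.sinh(x)
def dy1(x): return math.sinh(x) - COTH1 * math.cosh(x)
def y2(x):  return x**2 - 2.0 * x + 1.0
def dy2(x): return 2.0 * x - 2.0

def args_at(x):
    """Arguments (x, y1, y2, y1', y2') of F along the candidate curves."""
    return [x, y1(x), y2(x), dy1(x), dy2(x)]

def partial(x, k, h=1e-5):
    """Central-difference estimate of dF/d(argument k) along the curves."""
    up, down = args_at(x), args_at(x)
    up[k] += h
    down[k] -= h
    return (F(*up) - F(*down)) / (2.0 * h)

def el_residual(x, i, h=1e-4):
    """Residual of dF/dy_i - d/dx (dF/dy_i') for i = 1 or 2."""
    d_dx = (partial(x + h, i + 2) - partial(x - h, i + 2)) / (2.0 * h)
    return partial(x, i) - d_dx

print(el_residual(0.3, 1), el_residual(0.3, 2))
print(y1(0.0), y1(1.0), y2(0.0), y2(1.0))
```

The residuals should be close to zero (up to finite-difference noise), and the four printed boundary values should reproduce the prescribed conditions.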
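(b) We have $F = y_1'^2 - y_1^2 + y_2'^2 + y_2^2$ and hence from the Euler-Lagrange equations (Eqs. 19 and 20)
we get:
$$\begin{aligned}
\frac{\partial}{\partial y_1}\left(y_1'^2 - y_1^2 + y_2'^2 + y_2^2\right) - \frac{d}{dx}\frac{\partial}{\partial y_1'}\left(y_1'^2 - y_1^2 + y_2'^2 + y_2^2\right) &= 0 \\
-2y_1 - \frac{d}{dx}\left(2y_1'\right) &= 0 \\
-2y_1 - 2y_1'' &= 0 \\
y_1'' + y_1 &= 0
\end{aligned}$$
$$\begin{aligned}
\frac{\partial}{\partial y_2}\left(y_1'^2 - y_1^2 + y_2'^2 + y_2^2\right) - \frac{d}{dx}\frac{\partial}{\partial y_2'}\left(y_1'^2 - y_1^2 + y_2'^2 + y_2^2\right) &= 0 \\
2y_2 - \frac{d}{dx}\left(2y_2'\right) &= 0 \\
2y_2 - 2y_2'' &= 0 \\
y_2'' - y_2 &= 0
\end{aligned}$$
So, the general solutions are $y_1 = C_1 \cos x + D_1 \sin x$ and $y_2 = C_2 \cosh x + D_2 \sinh x$, and from the boundary conditions we get:
$$\begin{aligned}
C_1 \cos 0 + D_1 \sin 0 = 1 \quad &\rightarrow \quad C_1 = 1 \\
C_1 \cos\frac{\pi}{2} + D_1 \sin\frac{\pi}{2} = 1 \quad &\rightarrow \quad D_1 = 1 \\
C_2 \cosh 0 + D_2 \sinh 0 = 1 \quad &\rightarrow \quad C_2 = 1 \\
C_2 \cosh\frac{\pi}{2} + D_2 \sinh\frac{\pi}{2} = 1 \quad &\rightarrow \quad D_2 = \operatorname{csch}\frac{\pi}{2} - \coth\frac{\pi}{2}
\end{aligned}$$
Therefore, the specific solution is:
$$ y_1 = \cos x + \sin x \qquad\qquad y_2 = \cosh x + \left(\operatorname{csch}\frac{\pi}{2} - \coth\frac{\pi}{2}\right) \sinh x $$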
[29] In fact, “extremum” in the above statement represents the issue of interest in the calculus (or mathematics) of variations
(otherwise “stationary” is more inclusive). Also, whether we should use if or iff in the above statement depends in our
opinion on how we view the problem (i.e. whether it is a variational problem or a constrained variational problem).
1.8 Variational Problems with Constraints 48
$$ = \int_{x_1}^{x_2} H\left(x, y, y'\right) dx \qquad (21) $$
Again, more than one constraint can be embedded using more than one Lagrange multiplier. Therefore,
if we have n constraints then the variational formulation will take the form F + λ1 G1 + · · · + λn Gn , and
hence we use the integrand H ≡ F + λ1 G1 + · · · + λn Gn (instead of F ) in our functional integral with the
employment of Eq. 22 (instead of Eq. 2).
Regarding the solution of this type of problem, we note that the Euler-Lagrange equation is a second
order differential equation and hence the solution y usually contains two constants of integration. Moreover,
it contains n Lagrange multipliers λ’s (where n is the number of the given constraints). Hence, the
obtained solution usually contains n + 2 unknowns. Accordingly, to obtain a specific solution we should
use the two boundary conditions at the end points plus the n constraining conditions $\int_{x_1}^{x_2} G_i\,dx = C_i$
(i = 1, · · · , n). This will be applied and clarified in some of the upcoming Problems.[30]
We should finally note that although the Lagrange multiplier(s) λ enter in the formulation of variational
problems with constraints, in many cases the determination of λ is irrelevant to the required solution (even
though the determination of λ may be needed provisionally in some of these cases).[31] Accordingly, λ
(or λ’s) is just a tool to obtain the solution and hence the reader should not worry about λ and its
value or nature if the sought solution is obtained. That said, there are some types of problems in which λ
(or λ’s) has certain mathematical or physical significance and hence the determination of λ may be
desirable or even indispensable. These issues will be seen (and investigated in practical terms) in the
upcoming Problems which are solved by this technique (or they are linked to this technique). We should
also note that the Lagrange multipliers technique does not necessarily lead to extremization (since it is
essentially a stationarization technique which is more general) although we were generally talking above
about extremization (which is the main purpose of the variational techniques). So, further investigation
to determine the nature of the obtained solution (i.e. minimum or maximum or inflection or saddle) may
be required.
Problems
1. Justify the logic of the Lagrange multipliers technique using a simple argument.
Answer: Starting from Eq. 21 we have:
$$ I = \int_{x_1}^{x_2} H\,dx = \int_{x_1}^{x_2} \left(F + \lambda G\right) dx = \int_{x_1}^{x_2} F\,dx + \lambda \int_{x_1}^{x_2} G\,dx = I_1 + \lambda I_2 $$
So, if I1 is extremum (or stationary) and I2 is constant (according to the constraint) then I should
also be extremum (or stationary).
2. Outline the procedure of the Lagrange multipliers technique in solving constrained variational prob-
lems.
Answer:[32] If f (x) is a function to be optimized (or stationarized) subject to a constraint g(x) = c
(with c being a constant), then according to the procedure of the Lagrange multipliers technique we
do the following:
• We define a new function h = f + λg where λ is an extremizing (or stationarizing) parameter. The
[30] As indicated in the phrasing of this paragraph, we are talking about a typical problem of this type; otherwise some
problems may not strictly follow the above description and procedure.
[31] In fact, this is behind the “Lagrange undetermined multiplier” that is used to label λ in some texts.
[32] This answer is rather generic and lacks rigorous technicalities. The purpose of it is to give a general idea about the
Lagrange multipliers technique rather than a rigorous treatment and formulation. However, the technique will be more
clarified by the many upcoming constrained variational Problems in this book which are formulated and solved by this
technique.
function h is commonly known as the Lagrangian while the parameter λ is commonly known as the
Lagrange multiplier (or Lagrange undetermined multiplier).
• We optimize h by taking its derivative with respect to x and equating the derivative to zero. The
solution of this equation will yield the optimal point(s).
• If we have more than one constraint (say n constraints g1 = c1 , · · · , gn = cn ) then h is defined as
h = f + λ1 g1 + · · · + λn gn and the above procedure (of taking derivative and equating it to zero) is
repeated.
• If f and g (or g’s) are multi-variable functions[33] (say they are functions of $x_1, \cdots, x_m$) then we
take the partial derivatives of h with respect to these variables and equate the derivatives to zero
(i.e. $\frac{\partial h}{\partial x_1} = 0, \cdots, \frac{\partial h}{\partial x_m} = 0$). The solution of this set of simultaneous equations will yield the optimal
point(s).
• In the case of optimizing a functional (rather than a function) which is usually dealt with by the
techniques of the calculus of variations, the above procedure is amended to cope with this altered
situation. So, in the calculus of variations (where the functional is an integral) the function H (which
is the integrand of the functional integral where H = F + λG or H = F + λ1 G1 + · · · + λn Gn ) is
used as an input to the Euler-Lagrange equation (see Eq. 22) whose solution will yield the optimal
function(s). Although this procedure looks rather different from the above-described procedure of the
Lagrange multipliers technique, it essentially rests on the same logic and rationale.
3. We have a 3 meter rope which we want to shape into a rectangle such that the enclosed area is optimal
(i.e. maximum). Find the dimensions of the required rectangle.
Answer: This is a constrained variational problem where we want to optimize a function (which is
the area σ of the rectangle) subject to the constraint that the perimeter p of the rectangle is equal
to 3. Now, if the lengths of the two sides of the rectangle are x and y then f (x, y) ≡ σ = xy and
$g(x, y) \equiv p = 2(x + y) = 3$ and hence $h = xy + 2\lambda(x + y)$. Accordingly:
$$ \frac{\partial h}{\partial x} = y + 2\lambda = 0 $$
$$ \frac{\partial h}{\partial y} = x + 2\lambda = 0 $$
On subtracting the second equation from the first we get y − x = 0 and hence y = x which means that
our rectangle is a square of side length x = 3/4.
Note: we may use a single variable approach for solving this Problem where f (x) ≡ σ = x(1.5 − x) =
1.5x − x2 and g (x) ≡ p = C = 3 (where C is a constant) and hence h = f + λg = 1.5x − x2 + λC.
Accordingly:
$$ \frac{dh}{dx} = 1.5 - 2x = 0 $$
which leads to the same solution, i.e. x = 3/4. However, this in reality is a non-constrained approach,
i.e. the constraint is actually embedded in the formulation of f rather than being imposed as an
additional condition in the formulation.
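The rope Problem can be reproduced with a few lines of code. The sketch below is illustrative only: it solves the stationarity conditions $y + 2\lambda = 0$, $x + 2\lambda = 0$ together with the constraint $2(x + y) = 3$ in closed form, and compares the result with a brute-force scan of the area along the constraint. The names `PERIMETER`, `area_on_constraint` and `grid` are assumptions made for the example.

```python
PERIMETER = 3.0

# Stationarity of h = x*y + 2*lam*(x + y) gives y = -2*lam and x = -2*lam;
# substituting into the constraint 2*(x + y) = PERIMETER fixes lam.
lam = -PERIMETER / 8.0
x_star = -2.0 * lam
y_star = -2.0 * lam

def area_on_constraint(x):
    """Area x*y with y eliminated through 2*(x + y) = PERIMETER."""
    return x * (PERIMETER / 2.0 - x)

# Brute-force scan over admissible side lengths 0 <= x <= PERIMETER/2.
grid = [k * (PERIMETER / 2.0) / 3000 for k in range(3001)]
x_best = max(grid, key=area_on_constraint)

print(x_star, y_star, x_best)
```

The multiplier value itself (here λ = −3/8) is only a by-product; the side lengths are what the Problem asks for.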
4. What is the Lagrange multipliers formulation for a functional $\int F$ that is to be optimized subject to n
constraints $\int G_1, \cdots, \int G_n$?
Answer: The formulation is given by Eqs. 21 and 22 with H being given by:
$$ H = F + \sum_{i=1}^{n} \lambda_i G_i $$
5. Find the Euler-Lagrange equations for the following constrained variational problems:
(a) F (x, y, y 0 ) = xy 02 with G = x2 y 2 .
(b) F (x, y, y 0 ) = yy 03 with G = xy.
[33] We note that the Lagrange multipliers technique is generally associated with multi-variable functions (rather than
single-variable functions). In fact, even when f is originally a single-variable function the Lagrange multipliers technique usually
leads to a multi-variable formulation.
$$ 1 - \frac{d}{dx}\left(\frac{\lambda y'}{\sqrt{1 + y'^2}}\right) = 0 $$
$$ \frac{d}{dx}\left(\frac{\lambda y'}{\sqrt{1 + y'^2}}\right) = 1 $$
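(b) $H \equiv F + \lambda G = y'^{1/2} + \lambda x^2$ which is independent of y and hence we can use Eq. 4 (with H replacing
F), that is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^{1/2} + \lambda x^2\right) &= C \\
\frac{1}{2y'^{1/2}} &= C
\end{aligned}$$
This is the Euler-Lagrange equation which we solve as follows:
$$\begin{aligned}
y' &= \frac{1}{4C^2} \\
y &= \frac{x}{4C^2} + D
\end{aligned}$$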
7. Find and solve the Euler-Lagrange equations for the following constrained variational problems (subject
to the given boundary conditions and constraints):
(a) $F(x, y, y') = y'^2 - y^2$ and $G = y$ with $y(x = 0) = 0$, $y(x = \pi) = 1$ and $\int_0^{\pi} G\,dx = 1$.
(b) $F(x, y, y') = y'^2$ and $G = -y^2$ with $y(x = 0) = 0$ and $y'(x = 1) = 0$.
Answer:
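(a) We use Eq. 22 with $H \equiv F + \lambda G = y'^2 - y^2 + \lambda y$, that is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - y^2 + \lambda y\right) - \frac{d}{dx}\frac{\partial}{\partial y'}\left(y'^2 - y^2 + \lambda y\right) &= 0 \\
-2y + \lambda - \frac{d}{dx}\left(2y'\right) &= 0 \\
-2y + \lambda - 2y'' &= 0 \\
y'' + y - \frac{\lambda}{2} &= 0
\end{aligned}$$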
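So, the solution is $y = C \cos x + D \sin x + \frac{\lambda}{2}$ which can be checked by substitution, that is:
$$\begin{aligned}
\left(C \cos x + D \sin x + \frac{\lambda}{2}\right)'' + \left(C \cos x + D \sin x + \frac{\lambda}{2}\right) - \frac{\lambda}{2} &\overset{?}{=} 0 \\
\left(-C \sin x + D \cos x + 0\right)' + \left(C \cos x + D \sin x + \frac{\lambda}{2}\right) - \frac{\lambda}{2} &\overset{?}{=} 0 \\
-C \cos x - D \sin x + C \cos x + D \sin x + \frac{\lambda}{2} - \frac{\lambda}{2} &\overset{?}{=} 0 \\
0 &= 0
\end{aligned}$$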
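Now, from the condition $y(x = 0) = 0$ we get $C + \frac{\lambda}{2} = 0$ and hence $C = -\frac{\lambda}{2}$ while from the condition
$y(x = \pi) = 1$ we get $-\frac{\lambda}{2}(-1) + 0 + \frac{\lambda}{2} = 1$ and hence $\lambda = 1$. Thus, the solution becomes
$y = -\frac{1}{2} \cos x + D \sin x + \frac{1}{2}$. Also, from the constraint we get:
$$\begin{aligned}
\int_0^{\pi} \left(-\frac{1}{2} \cos x + D \sin x + \frac{1}{2}\right) dx &= 1 \\
\int_0^{\pi} \left(-\cos x + 2D \sin x + 1\right) dx &= 2 \\
\left[-\sin x - 2D \cos x + x\right]_0^{\pi} &= 2 \\
\left[-0 - 2D(-1) + \pi\right] - \left[-0 - 2D + 0\right] &= 2 \\
2D + \pi + 2D &= 2 \\
4D + \pi &= 2 \\
D &= \frac{2 - \pi}{4}
\end{aligned}$$
Therefore, the solution is:
$$ y = -\frac{1}{2} \cos x + \frac{2 - \pi}{4} \sin x + \frac{1}{2} $$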
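The three requirements on this solution (the differential equation $y'' + y - \lambda/2 = 0$ with λ = 1, the two boundary conditions, and the integral constraint) can be checked numerically. The following is a minimal sketch under those assumptions; `simpson` and `ode_residual` are illustrative helper names, and the integral is evaluated with a composite Simpson rule rather than analytically.

```python
import math

LAM = 1.0                       # multiplier found from the boundary conditions
D = (2.0 - math.pi) / 4.0       # constant found from the integral constraint

def y(x):
    return -0.5 * math.cos(x) + D * math.sin(x) + 0.5

def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def ode_residual(x, h=1e-4):
    """Central-difference estimate of y'' + y - lambda/2."""
    ypp = (y(x + h) - 2.0 * y(x) + y(x - h)) / h**2
    return ypp + y(x) - LAM / 2.0

constraint_value = simpson(y, 0.0, math.pi)
print(y(0.0), y(math.pi), constraint_value, ode_residual(1.0))
```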
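(b) We use Eq. 22 with $H \equiv F + \lambda G = y'^2 - \lambda y^2$, that is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 - \lambda y^2\right) - \frac{d}{dx}\frac{\partial}{\partial y'}\left(y'^2 - \lambda y^2\right) &= 0 \\
-2\lambda y - \frac{d}{dx}\left(2y'\right) &= 0 \\
y'' + \lambda y &= 0
\end{aligned}$$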
Now:
If λ < 0 then $y = a \cosh\left(\sqrt{|\lambda|}\,x\right) + b \sinh\left(\sqrt{|\lambda|}\,x\right)$ with a, b being constants and from the first
boundary condition we have $0 = a \cosh(0) + b \sinh(0)$ and hence a = 0 while from the second
boundary condition we have $0 = b \sqrt{|\lambda|} \cosh\sqrt{|\lambda|}$ and hence b = 0 because $\cosh\sqrt{|\lambda|} = 0$ has
no real solution. So, the solution is y = 0.
If λ = 0 then y = ax + b (with a, b being constants) and from the first boundary condition we have
$0 = a \cdot 0 + b$ (and hence b = 0) while from the second boundary condition we have 0 = a. So, the solution
is y = 0.[34]
If λ > 0 then $y = a \cos\left(\sqrt{\lambda}\,x\right) + b \sin\left(\sqrt{\lambda}\,x\right)$ with a, b being constants and from the first boundary
condition we have $0 = a \cos(0) + b \sin(0)$ and hence a = 0 while from the second boundary condition
we have $0 = b \sqrt{\lambda} \cos\sqrt{\lambda}$ and hence if we assume a non-trivial solution, i.e. $b \neq 0$, then $\sqrt{\lambda} = \pi/2$.[35]
So, the solution is $y = b \sin\frac{\pi x}{2}$.
[34] We consider the case λ = 0 for the sake of comprehensiveness (considering more general situations); otherwise the
multiplier λ is not supposed to be zero.
[35] In fact, this is the principal solution; otherwise $\sqrt{\lambda}$ can be an odd multiple of π/2.
1.9 Variational Problems with Variable Boundaries 53
8. Obtain a specific solution for part (b) of Problem 7 assuming y > 0 plus the following constraint
condition: $\int_0^1 G\,dx = -\pi$.
Answer: From the given constraint we get:
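$$\begin{aligned}
\int_0^1 -b^2 \sin^2\frac{\pi x}{2}\,dx &= -\pi \\
-\frac{b^2}{2}\left[x - \frac{1}{\pi} \sin(\pi x)\right]_0^1 &= -\pi \\
-\frac{b^2}{2}\left(1 - 0 - 0 + 0\right) &= -\pi \\
b &= \pm\sqrt{2\pi}
\end{aligned}$$
Therefore, the specific solution (taking the positive root only since y > 0) is:
$$ y = \sqrt{2\pi}\,\sin\frac{\pi x}{2} \qquad (0 \le x \le 1) $$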
9. Find the Euler-Lagrange equations for the following constrained variational problems:
(a) $F(x, y, y') = x^2 y'^2$ with $G_1 = xy$ and $G_2 = y\sqrt{y'}$.
(b) $F(x, y, y') = y'^{1/2}$ with $G_1 = y^2$ and $G_2 = ax^3$ (a is constant).
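Answer:
(a) We use Eq. 22 with $H \equiv F + \lambda_1 G_1 + \lambda_2 G_2 = x^2 y'^2 + \lambda_1 xy + \lambda_2 y\sqrt{y'}$ and hence the Euler-Lagrange
equation is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left[x^2 y'^2 + \lambda_1 xy + \lambda_2 y\sqrt{y'}\right] - \frac{d}{dx}\frac{\partial}{\partial y'}\left[x^2 y'^2 + \lambda_1 xy + \lambda_2 y\sqrt{y'}\right] &= 0 \\
\lambda_1 x + \lambda_2\sqrt{y'} - \frac{d}{dx}\left(2x^2 y' + \lambda_2 \frac{y}{2\sqrt{y'}}\right) &= 0 \\
\lambda_1 x + \lambda_2\sqrt{y'} - 4xy' - 2x^2 y'' - \lambda_2 \frac{y'}{2\sqrt{y'}} + \lambda_2 \frac{yy''}{4y'^{3/2}} &= 0 \\
\lambda_1 x + \lambda_2\sqrt{y'} - 4xy' - 2x^2 y'' - \frac{\lambda_2}{2}\sqrt{y'} + \frac{\lambda_2}{4}\frac{yy''}{y'^{3/2}} &= 0 \\
\lambda_1 x + \frac{\lambda_2}{2}\sqrt{y'} - 4xy' - 2x^2 y'' + \frac{\lambda_2}{4}\frac{yy''}{y'^{3/2}} &= 0
\end{aligned}$$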
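(b) We use Eq. 22 with $H \equiv F + \lambda_1 G_1 + \lambda_2 G_2 = y'^{1/2} + \lambda_1 y^2 + \lambda_2 ax^3$ and hence the Euler-Lagrange
equation is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left[y'^{1/2} + \lambda_1 y^2 + \lambda_2 ax^3\right] - \frac{d}{dx}\frac{\partial}{\partial y'}\left[y'^{1/2} + \lambda_1 y^2 + \lambda_2 ax^3\right] &= 0 \\
2\lambda_1 y - \frac{d}{dx}\left(\frac{1}{2y'^{1/2}}\right) &= 0 \\
2\lambda_1 y - \left(-\frac{y''}{4y'^{3/2}}\right) &= 0 \\
2\lambda_1 y + \frac{y''}{4y'^{3/2}} &= 0 \\
y'' + 8\lambda_1 yy'^{3/2} &= 0
\end{aligned}$$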
So, in this section we briefly investigate only some simple cases of variational problems with variable
boundaries.[36]
The simplest of these cases is when the variational problems (of single variable) have only one fixed end
point and hence the other end point is variable. In this case the Euler-Lagrange equation (i.e. Eq. 2) will
be used as before but with a modification to the boundary conditions where a transversality condition
will be imposed on the variable end point (while the fixed end point keeps its fixed boundary condition as
before). To be more specific, let us have a curve Γ, represented by a function $y_\Gamma = y(x)$, that connects a fixed
point A to a curve Γ1, where the end point B of Γ is restricted to move freely on Γ1 (as depicted in Figure 4).
In this case, the transversality condition states: if the value of the functional $I[y_\Gamma] = \int_{x_1}^{x_2} F(x, y, y')\,dx$ is
optimal (with respect to neighboring curves connecting point A to curve Γ1) then at point B the direction
dX : dY of Γ1 and the element of Γ should satisfy the relation:
$$ F\,dX + \left(dY - y'\,dX\right) F_{y'} = 0 \qquad (23) $$
Figure 4: A schematic illustration of the setting of variational problems with variable boundaries in the
simplest case when a curve Γ (that is to be optimized) connects a fixed point A to a given boundary curve
Γ1 where the end point B of Γ is restricted to move freely on Γ1 . See § 1.9.
where $y'$ belongs to Γ at B while dX and dY belong to Γ1 at B, and $F_{y'}$ is the partial derivative of F
with respect to $y'$.
Another case is when a curve Γ represented by a function yΓ = y(x) is connecting two other curves (Γ1
and Γ2 ) where the end points (B1 and B2 ) of Γ are restricted to move freely on Γ1 and Γ2 (as depicted in
Figure 5). In this case, the transversality condition states: if the value of the functional I [yΓ ] is optimal
(with respect to neighboring curves connecting Γ1 and Γ2 ) then at the points B1 and B2 the directions
dX1 : dY1 of Γ1 and dX2 : dY2 of Γ2 and the corresponding elements of Γ should satisfy the following two
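relations:
$$ F\,dX_1 + \left(dY_1 - y'\,dX_1\right) F_{y'} = 0 \qquad (24) $$
$$ F\,dX_2 + \left(dY_2 - y'\,dX_2\right) F_{y'} = 0 \qquad (25) $$
where again $y'$ belongs to Γ at B1 and B2 while dX1 and dY1 belong to Γ1 at B1 and dX2 and dY2 belong
to Γ2 at B2.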
[36] Variable boundaries may also be called free boundaries because free movement within certain restrictions is allowed.
However, in our view “variable” is more appropriate than “free” because it is not really free.
Figure 5: A schematic illustration of the setting of variational problems with variable boundaries in the
case when a curve Γ (that is to be optimized) connects two given boundary curves (Γ1 and Γ2 ) where the
end points (B1 and B2 ) of Γ are restricted to move freely on Γ1 and Γ2 . See § 1.9.
which is identical to Eq. 23 but with different ordering and grouping of terms. The transversality equation
may also be “divided” by dX and hence Eq. 23 (as well as Eqs. 24 and 25) becomes $F + \left(Y' - y'\right) F_{y'} = 0$
where $Y' = dY/dX$ (and similarly Eq. 26 becomes $F - y' F_{y'} + F_{y'} Y' = 0$).
Problems
1. Solve the following variational problems (in which we have one fixed boundary and one variable bound-
ary):
(a) $F(x, y, y') = y'^2 + 2yy' - y^2$ with fixed boundary y(0) = 1 and variable boundary x = π/4.
(b) $F(x, y, y') = y'^2 - xy'$ with fixed boundary y(0) = 5 and variable boundary x = 3.
(c) $F(x, y, y') = \sqrt{1 + y'^2}/y$ with fixed boundary y(0) = 0 and variable boundary y = x + 2.
Answer:
(a) In this case we have one point of the curve fixed by the condition y(0) = 1 while the other point
of the curve is free to move on the vertical line x = π/4. In part (j) of Problem 11 of § 1.4 we solved
this Problem (without boundary conditions) and obtained the solution y = a sin x + b cos x. Now, from
the fixed boundary we get b = 1 and hence y = a sin x + cos x. Regarding the variable boundary, we
impose the transversality condition:
$$ F\,dX + \left(dY - y'\,dX\right) F_{y'} = 0 $$
Now, since the boundary curve x = π/4 is a vertical line then dX = 0 (noting that in the transversality
condition $y'$ belongs to the extremal curve while dX and dY belong to the boundary curve) and hence
the transversality condition becomes $dY\,F_{y'} = 0$. If we now note that on a vertical line $dY \neq 0$ then
$$\begin{aligned}
F_{y'} &= 0 \\
2y' + 2y &= 0 \\
y' + y &= 0 \\
\left(a \cos x - \sin x\right) + \left(a \sin x + \cos x\right) &= 0 \\
\left(a + 1\right) \cos x + \left(a - 1\right) \sin x &= 0 \\
\frac{a + 1}{1 - a} &= \tan x \\
\frac{a + 1}{1 - a} &= 1 \qquad (x = \pi/4) \\
a &= 0
\end{aligned}$$
So, the solution is y = cos x (which is plotted in Figure 6). The two curves meet at point $\left(\frac{\pi}{4}, \frac{1}{\sqrt{2}}\right)$.
Figure 6: Plot of the extremal curve y = cos x (solid) and the boundary line x = π/4 (dashed). See part
(a) of Problem 1 of § 1.9.
(b) We have $F = y'^2 - xy'$ and hence the Euler-Lagrange equation (noting that y is missing in F and
hence we use Eq. 4) is:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^2 - xy'\right) &= C \\
2y' - x &= C \\
y' &= \frac{x}{2} + \frac{C}{2} \\
y &= \frac{x^2}{4} + \frac{C}{2}x + D
\end{aligned}$$
Now, from the fixed boundary y(0) = 5 we get D = 5 and hence $y = \frac{x^2}{4} + \frac{C}{2}x + 5$. Regarding the
variable boundary, we follow similar procedures and arguments to those of part (a) of this Problem
and hence we get the following transversality condition:
$$\begin{aligned}
F_{y'} &= 0 \\
2y' - x &= 0 \\
y' &= \frac{x}{2} \\
\frac{x}{2} + \frac{C}{2} &= \frac{x}{2} \\
C &= 0
\end{aligned}$$
Accordingly, the solution is $y = \frac{x^2}{4} + 5$ (which is plotted in Figure 7). The two curves meet at point
(3, 7.25).
Figure 7: Plot of the extremal curve $y = \frac{x^2}{4} + 5$ (solid) and the boundary line x = 3 (dashed). See part
(b) of Problem 1 of § 1.9.
(c) We have $F = \sqrt{1 + y'^2}/y$ and hence the Euler-Lagrange equation (noting that x is missing in F
and hence we use Eq. 3) is:
$$\begin{aligned}
\frac{\sqrt{1 + y'^2}}{y} - y' \frac{\partial}{\partial y'}\left[\frac{\sqrt{1 + y'^2}}{y}\right] &= C \\
\frac{\sqrt{1 + y'^2}}{y} - y' \left[\frac{y'}{y\sqrt{1 + y'^2}}\right] &= C \\
\frac{\sqrt{1 + y'^2}}{y} - \frac{y'^2}{y\sqrt{1 + y'^2}} &= C \\
\frac{1 + y'^2}{y\sqrt{1 + y'^2}} - \frac{y'^2}{y\sqrt{1 + y'^2}} &= C \\
\frac{1}{y\sqrt{1 + y'^2}} &= C \\
y^2\left(1 + y'^2\right) &= \frac{1}{C^2}
\end{aligned}$$
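This is the Euler-Lagrange equation which we solve as follows:
$$\begin{aligned}
y'^2 &= \frac{1 - C^2 y^2}{C^2 y^2} \\
y' &= \pm\frac{\sqrt{1 - C^2 y^2}}{Cy} \\
\frac{Cy}{\sqrt{1 - C^2 y^2}}\,dy &= \pm dx \\
-\frac{\sqrt{1 - C^2 y^2}}{C} &= \pm x + C_1
\end{aligned}$$
Now, from the fixed condition y(0) = 0 we get $C_1 = -1/C$ and hence:
$$\begin{aligned}
-\frac{\sqrt{1 - C^2 y^2}}{C} &= \pm x - \frac{1}{C} \\
\sqrt{1 - C^2 y^2} &= \mp Cx + 1 \\
1 - C^2 y^2 &= C^2 x^2 \mp 2Cx + 1 \\
-C^2 y^2 &= C^2 x^2 \mp 2Cx \\
y^2 &= -x^2 \pm \frac{2}{C}x \\
y^2 &= -x^2 + bx \qquad (b = \pm 2/C) \qquad (27)
\end{aligned}$$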
Referring to Eq. 27, the extremal curve is given by $y^2 = -x^2 + bx$. On differentiating this implicitly
we get $2yy' = -2x + b$. Now, on the point of intersection (where the extremal curve and the boundary
curve meet) we should also have y = x + 2 (since this point is on the boundary curve) as well as
$y' = -1$ (which follows from the transversality condition $F + \left(Y' - y'\right) F_{y'} = 0$ with $Y' = 1$) and hence
on substituting from y = x + 2 and $y' = -1$ into $2yy' = -2x + b$ we get:
$$\begin{aligned}
2\left(x + 2\right)\left(-1\right) &= -2x + b \\
-2x - 4 &= -2x + b \\
b &= -4
\end{aligned}$$
Accordingly, the solution (taking the positive root) is $y = \sqrt{-x^2 - 4x}$ (which is plotted in Figure 8).
The two curves meet at point $\left(\sqrt{2} - 2, \sqrt{2}\right)$.
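A short numerical check of part (c) is sketched below (an illustration, not part of the original answer). It uses the observation that $y^2 = -x^2 - 4x$ can be rewritten as $(x + 2)^2 + y^2 = 4$, i.e. a circle centred at (−2, 0) through the origin, and confirms that the curve passes through the stated meeting point with slope −1, so it crosses the boundary line y = x + 2 (slope 1) at a right angle. The function names are assumptions made for the example.

```python
import math

def extremal(x):
    """Upper branch of y^2 = -x^2 - 4x, i.e. the circle (x + 2)^2 + y^2 = 4."""
    return math.sqrt(-x * x - 4.0 * x)

def extremal_slope(x):
    """Slope from implicit differentiation of y^2 = -x^2 - 4x: 2 y y' = -2x - 4."""
    return (-2.0 * x - 4.0) / (2.0 * extremal(x))

def boundary(x):
    """Boundary line y = x + 2."""
    return x + 2.0

x_b = math.sqrt(2.0) - 2.0      # abscissa of the meeting point
y_b = extremal(x_b)

print(x_b, y_b, boundary(x_b), extremal_slope(x_b))
```

The last printed value is the slope of the extremal at the meeting point; its product with the slope of the boundary line is −1.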
Figure 8: Plot of the extremal curve $y = \sqrt{-x^2 - 4x}$ (solid) and the boundary line y = x + 2 (dashed).
See part (c) of Problem 1 of § 1.9.
2. Solve the following variational problems (in which we have two variable boundaries):
(a) $F(x, y, y') = y'^2 + y' + yy' + 2y$ with x = 0 and x = 2.
(b) $F(x, y, y') = \sqrt{1 + y'^2}$ with $y = x^4$ and y = x − 1.
Answer:
(a) In this case the left point of the extremal curve is free to move on the vertical line x = 0 while
the right point of the extremal curve is free to move on the vertical line x = 2. The Euler-Lagrange
equation is:
$$\begin{aligned}
\frac{\partial}{\partial y}\left(y'^2 + y' + yy' + 2y\right) - \frac{d}{dx}\frac{\partial}{\partial y'}\left(y'^2 + y' + yy' + 2y\right) &= 0 \\
y' + 2 - \frac{d}{dx}\left(2y' + 1 + y\right) &= 0 \\
y' + 2 - 2y'' - y' &= 0 \\
y'' - 1 &= 0
\end{aligned}$$
So, the solution is $y = \frac{1}{2}x^2 + Cx + D$ (with C and D being constants). This can be easily verified by
substitution into the last equation.
Following similar procedures and arguments to those of part (a) of Problem 1 we get the following
transversality conditions:
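$$ F_{y'} = 0 \ \text{ at } x = 0 \qquad \text{and} \qquad F_{y'} = 0 \ \text{ at } x = 2 $$
Accordingly, at x = 0 we have:
$$\begin{aligned}
\frac{\partial}{\partial y'}\left(y'^2 + y' + yy' + 2y\right) &= 0 \\
2y' + 1 + y &= 0 \\
2\left(x + C + 0\right) + 1 + \left(\frac{1}{2}x^2 + Cx + D\right) &= 0 \\
2\left(0 + C + 0\right) + 1 + \left(0 + 0 + D\right) &= 0 \qquad (x = 0) \\
2C + D + 1 &= 0
\end{aligned}$$
Similarly, at x = 2 we have:
$$\begin{aligned}
2\left(2 + C + 0\right) + 1 + \left(2 + 2C + D\right) &= 0 \qquad (x = 2) \\
4C + D + 7 &= 0
\end{aligned}$$
Hence, C = −3 and D = 5. So, the solution is $y = \frac{1}{2}x^2 - 3x + 5$ (which is plotted in Figure 9). The
extremal curve meets with the boundary line x = 0 at point (0, 5) and with the boundary line x = 2
at point (2, 1).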
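The two transversality conditions form a 2×2 linear system in C and D, namely 2C + D = −1 and 4C + D = −7. A minimal sketch of solving it with Cramer's rule and checking $F_{y'} = 2y' + 1 + y$ at both boundaries is given below; `solve_2x2` and `natural_bc` are hypothetical helper names.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*u + a12*v = b1, a21*u + a22*v = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# 2C + D = -1 (from x = 0) and 4C + D = -7 (from x = 2).
C, D = solve_2x2(2.0, 1.0, -1.0, 4.0, 1.0, -7.0)

def y(x):
    return 0.5 * x**2 + C * x + D

def dy(x):
    return x + C

def natural_bc(x):
    """F_{y'} = 2y' + 1 + y evaluated on the extremal curve."""
    return 2.0 * dy(x) + 1.0 + y(x)

print(C, D, natural_bc(0.0), natural_bc(2.0), y(0.0), y(2.0))
```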
(b) In this case one boundary point of the extremal curve is free to move on the curve $y = x^4$ while the
other boundary point of the extremal curve is free to move on the line y = x − 1. Referring to part (i)
of Problem 9 of § 1.4 (also see footnote [12]), the Euler-Lagrange equation for this Problem is $y' = a$
and hence y = ax + b (with a and b being constants). Following similar procedures and arguments to
those of the previous Problems we get the following transversality conditions:
$$\begin{aligned}
F + \left(\frac{dY}{dX} - y'\right) F_{y'} &= 0 \qquad (\text{for the boundary } Y = x^4) \\
\sqrt{1 + y'^2} + \left(4x^3 - y'\right) \frac{y'}{\sqrt{1 + y'^2}} &= 0 \\
1 + y'^2 + \left(4x^3 - y'\right) y' &= 0 \\
1 + 4x^3 y' &= 0 \\
1 + 4x^3 a &= 0 \qquad (28)
\end{aligned}$$
AND
$$\begin{aligned}
F + \left(\frac{dY}{dX} - y'\right) F_{y'} &= 0 \qquad (\text{for the boundary } Y = x - 1) \\
\sqrt{1 + y'^2} + \left(1 - y'\right) \frac{y'}{\sqrt{1 + y'^2}} &= 0 \\
1 + y'^2 + \left(1 - y'\right) y' &= 0 \\
1 + y' &= 0 \\
1 + a &= 0 \\
a &= -1 \qquad (29)
\end{aligned}$$
Figure 9: Plot of the extremal curve $y = \frac{1}{2}x^2 - 3x + 5$ (solid) and the boundary lines x = 0 and x = 2
(dashed). See part (a) of Problem 2 of § 1.9.
Therefore, the equation of the extremal curve becomes y = −x + b. So, what is left is to find b. On
substituting from Eq. 29 into Eq. 28 we get $1 - 4x^3 = 0$ and hence $x = 4^{-1/3}$. Now, the point with
coordinate $x = 4^{-1/3}$ is where the extremal curve y = −x + b and the boundary curve $y = x^4$ meet
and hence we should have:
$$\begin{aligned}
-x + b &= x^4 \\
-4^{-1/3} + b &= 4^{-4/3} \\
b &= 4^{-4/3} + 4^{-1/3} \\
b &\simeq 0.787451
\end{aligned}$$
Accordingly, the solution is y = −x + 0.787451 (which is plotted in Figure 10). The extremal curve
meets with the boundary curve $y = x^4$ at point $\left(4^{-1/3}, 4^{-4/3}\right)$ and with the boundary line y = x − 1
at point (0.893725, −0.106275).
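The numbers quoted in part (b) can be reproduced with the short sketch below (an illustration under the stated results a = −1 and $1 + 4x^3 a = 0$; the variable names are assumptions). It also confirms that the straight extremal is perpendicular to the curve $y = x^4$ at the meeting point, where the slope of the curve is $4x^3 = 1$.

```python
a = -1.0                                   # slope of the extremal, from Eq. 29

# Meeting point with y = x^4, from 1 + 4*x^3*a = 0 (Eq. 28).
x1 = (-1.0 / (4.0 * a)) ** (1.0 / 3.0)
y1 = x1 ** 4

# Intercept b from a*x1 + b = x1^4.
b = y1 - a * x1

# Meeting point with the line y = x - 1, from a*x + b = x - 1.
x2 = (1.0 + b) / (1.0 - a)
y2 = x2 - 1.0

# Product of the extremal slope and the slope of y = x^4 at x1.
slope_product = a * 4.0 * x1 ** 3

print(b, (x1, y1), (x2, y2), slope_product)
```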
Figure 10: Plot of the extremal curve y = −x + 0.787451 (solid) and the boundary curves $y = x^4$ and
y = x − 1 (dashed). See part (b) of Problem 2 of § 1.9.
3. Find the solution of the variational problem $I[y] = \int_{x_1}^{x_2} \sqrt{1 + y'^2}\,dx$ when we have:
(a) Fixed boundary y(0) = 0 and variable boundary x = 1.
(b) Fixed boundary y(1) = 1 and variable boundary x = 0.
(c) Variable boundaries x = 0 and x = 1.
Answer: Referring to part (i) of Problem 9 of § 1.4 (also see footnote [12]), the Euler-Lagrange
equation for this Problem is $y' = a$ and hence y = ax + b (with a and b being constants). Following
similar procedures and arguments to those of the previous Problems we get the following transversality
condition:
$$ F_{y'} = \frac{y'}{\sqrt{1 + y'^2}} = 0 \qquad \text{and hence} \qquad y' = 0 $$
So, the solution in all three cases (i.e. a, b and c) is y = b (i.e. a horizontal line). Accordingly, all we
need is to determine the value of b for each one of these cases, that is:
(a) From the fixed boundary condition y(0) = 0 we get b = 0 and hence the solution is y = 0.
(b) From the fixed boundary condition y(1) = 1 we get b = 1 and hence the solution is y = 1.
(c) There is no restriction on b (apart from being a constant) and hence the solution is y = b (i.e. any
horizontal line can be a solution).
4. Use the transversality condition in the following variational problems with variable boundaries to
determine $y'$ at the boundaries:[37]
(a) $F = y'^2 + y + C$ with boundary curves x = 1 and x = 2.
(b) $F = y'^2 + xy'$ with boundary curves x = α and x = β (where α and β are constants).
(c) $F = y'^2 - 2yy'$ with boundary curves x = a and x = b (where a and b are constants).
Answer: The boundary curves in all these cases are vertical lines. So, referring to the previous
Problems (see for instance part a of Problem 1) the transversality condition in all these cases should
be given by $F_{y'} = 0$. Accordingly:
(a) In this case $F_{y'} = 2y' = 0$ and hence we should have $y' = 0$ at both boundaries, i.e. the slope of
the extremal curve should vanish at the boundaries x = 1 and x = 2. So, the extremal curve at the
boundary lines satisfies the following conditions: $y'(1) = y'(2) = 0$.
(b) In this case $F_{y'} = 2y' + x = 0$ and hence at the boundaries we should have $y' = -x/2$. So, the
extremal curve at the boundary lines satisfies the following conditions: $y'(\alpha) = -\alpha/2$ and
$y'(\beta) = -\beta/2$.
(c) In this case $F_{y'} = 2y' - 2y = 0$ and hence at the boundaries we should have $y' = y$. So, the
extremal curve at the boundary lines satisfies the following conditions: $y'(a) = y(a)$ and $y'(b) = y(b)$.
[37] This sort of problem is usually studied within the context of investigating natural boundary conditions in the calculus
of variations. However, because we have no appetite for going through these details we posed this Problem in this rather
primitive way.
1.10 Variational Problems of Mixed Nature 63
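where $y_\alpha = \partial y/\partial\alpha$ (and similarly for $z_\alpha$, $y_\beta$, $z_\beta$). Hence, we have two Euler-Lagrange equations:
$$ \frac{\partial F}{\partial y} - \frac{\partial}{\partial\alpha}\frac{\partial F}{\partial y_\alpha} - \frac{\partial}{\partial\beta}\frac{\partial F}{\partial y_\beta} = 0 $$
$$ \frac{\partial F}{\partial z} - \frac{\partial}{\partial\alpha}\frac{\partial F}{\partial z_\alpha} - \frac{\partial}{\partial\beta}\frac{\partial F}{\partial z_\beta} = 0 $$
To generalize this formulation, let us denote the independent variables with $x_i$ (i = 1, · · · , m) and the
dependent variables with $y_j$ (j = 1, · · · , n). On combining the formulations of § 1.6 and § 1.7 (using
this notation), the functional integral is:
$$ I[y_1, \cdots, y_n] = \int \cdots \int_\Omega F\left(x_1, \cdots, x_m, y_1, \cdots, y_n, y_{11}, \cdots, y_{1m}, \cdots, y_{n1}, \cdots, y_{nm}\right) dx_1 \cdots dx_m $$
where $y_{nm} = \partial y_n/\partial x_m$ (and similarly for similar notations). Hence, we have n Euler-Lagrange equations:
$$ \frac{\partial F}{\partial y_j} - \sum_{i=1}^{m} \frac{\partial}{\partial x_i}\frac{\partial F}{\partial y_{ji}} = 0 \qquad (j = 1, \cdots, n) $$
where $y_{ji} = \partial y_j/\partial x_i$.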
2. State the variational formulation for a variational problem with two independent variables (t and x)
and one dependent variable y where the functional I depends also on the second order derivatives of
the dependent variable y (i.e. $\partial^2 y/\partial t^2$, $\partial^2 y/\partial x \partial t$ and $\partial^2 y/\partial x^2$).
Answer: Combining the formulations of § 1.5 and § 1.6, the functional integral is:
$$ I[y] = \iint_\Omega F\left(t, x, y, y_t, y_x, y_{tt}, y_{tx}, y_{xx}\right) dt\,dx $$
[38] In fact, this formulation has some restrictions which are not given here (because the example is for demonstration only).
For instance, in some problems the multiplier λ (or multipliers λ’s) may enter as a dependent variable.
Note: the highest order derivative for the dependent variables $y_i$'s may differ (as the notation $n_i$ indicates) and hence the highest order derivative of $y_1$ which $I$ depends on may be the third while the highest order derivative of $y_2$ which $I$ depends on may be the second. We should also note that some of the derivatives between the first order and the highest order may be missing[40] (e.g. $I$ may depend on the first and third order derivatives of $y_1$ but not on the second order derivative) and hence the above formulation could be amended accordingly.
4. Extend the generalized formulation of Problem 3 for the case in which the variational problem also includes $k$ constraints $\int G_1, \cdots, \int G_k$.
Answer: We have:
\[ I[y_1, \cdots, y_m] = \int_{x_1}^{x_2} H\left(x, y_1, y_1', \cdots, y_1^{(n_1)}, \cdots, y_m, y_m', \cdots, y_m^{(n_m)}\right) dx \]
where:
\[ H = F + \sum_{i=1}^{k} \lambda_i G_i \]
[39] The reader should be careful in interpreting the partial derivatives with respect to the independent variables (as explained
in § 1.6).
[40] In fact, in some cases even the first order derivative (and possibly the derivatives of all orders) of some dependent variables
can be missing.
1.11 Summary
We can summarize the results of the previous sections in Table 1.
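$F(y, y')$:  $F - y' \frac{\partial F}{\partial y'} = C$
$F(x, y')$:  $\frac{\partial F}{\partial y'} = C$
$F(x, y)$:  $\frac{\partial F}{\partial y} = 0$
$F\left(x, y, y^{(1)}, \cdots, y^{(n)}\right)$:  $\frac{\partial F}{\partial y} + \sum_{i=1}^{n} (-1)^i \frac{d^i}{dx^i} \frac{\partial F}{\partial y^{(i)}} = 0$
$F(x_1, \cdots, x_n, y, y_{x_1}, \cdots, y_{x_n})$:  $\frac{\partial F}{\partial y} - \sum_{i=1}^{n} \frac{\partial}{\partial x_i} \frac{\partial F}{\partial y_{x_i}} = 0$
$F(x, y_1, \cdots, y_n, y_1', \cdots, y_n')$:  $\frac{\partial F}{\partial y_i} - \frac{d}{dx} \frac{\partial F}{\partial y_i'} = 0$  ($i = 1, \cdots, n$)
$F(x, y, y')$ with $n$ constraints $\int G_1, \cdots, \int G_n$:  $\frac{\partial H}{\partial y} - \frac{d}{dx} \frac{\partial H}{\partial y'} = 0$ where $H = F + \sum_{i=1}^{n} \lambda_i G_i$
$F(x, y, y')$ with variable boundary(s):  $\frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} = 0$ with transversality condition(s) $F \, dX + (dY - y' dX) F_{y'} = 0$
Mixed nature:  Mixed formulation (as above)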
[41] In some cases, it may be required to treat $\lambda$'s as variational dependent variables and hence other equation(s) may be added (as required). In fact, extra conditions and restrictions are required to make the formulation more rigorous (so the above formulation is mainly for the purpose of demonstrating the idea of mixed techniques in treating the variational problems of mixed nature).
Chapter 2
Optimal Curves
In this chapter we present and solve problems about topics and applications of the mathematics of variation related to optimal curves, i.e. in these problems we look for certain curves (or 1D objects) that optimize something (such as length). In fact, the given problems represent just a sample of the variational problems in this category, intended to outline the methods of tackling this sort of problem. There are many other problems of this type that can be considered and solved similarly. This equally applies to the problems of the other forthcoming chapters.
2.1 Geodesic Curves
Figure 11: A simple sketch depicting the setting of the Problem of shortest distance connecting two points (A and B) on a plane with $ds$ representing infinitesimal arc length (or line element). See Problem 1 of § 2.1.
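\begin{align*}
y'^2 &= D^2 \left(1 + y'^2\right) \\
y'^2 \left(1 - D^2\right) &= D^2 \\
y' &= a && \left(a = \pm\sqrt{D^2/(1 - D^2)} \text{ is constant}\right) \\
y &= ax + b && (b \text{ is constant})
\end{align*}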
which is an equation of a straight line. So, the shortest distance between two points on a plane is
the length of the straight line segment that connects these two points, (i.e. on a Euclidean plane the
geodesic is a straight line).
Note: it is obvious that the optimal length in this Problem is a minimum (not a maximum) because
the length of a curve connecting two points can diverge.
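The minimality of the straight segment can be illustrated numerically. The following is a minimal sketch, assuming hypothetical end points A = (0, 0) and B = (3, 4) and a family of sine-shaped perturbations that vanish at both end points (none of these choices come from the text); it approximates the length functional by a polyline and compares the perturbed curves with the straight line.

```python
import math

def curve_length(f, x_a, x_b, n=20000):
    """Approximate the arc length of y = f(x) on [x_a, x_b] by a polyline."""
    h = (x_b - x_a) / n
    total = 0.0
    for i in range(n):
        x0 = x_a + i * h
        x1 = x0 + h
        total += math.hypot(x1 - x0, f(x1) - f(x0))
    return total

# Hypothetical end points A = (0, 0) and B = (3, 4); the straight segment has length 5.
x_a, y_a, x_b, y_b = 0.0, 0.0, 3.0, 4.0
slope = (y_b - y_a) / (x_b - x_a)

def straight(x):
    return y_a + slope * (x - x_a)

def perturbed(eps):
    # The sine bump vanishes at both end points, so the curve still joins A and B.
    return lambda x: straight(x) + eps * math.sin(math.pi * (x - x_a) / (x_b - x_a))

lengths = {eps: curve_length(perturbed(eps), x_a, x_b) for eps in (0.0, 0.1, 0.5, 1.0)}
for eps, s in lengths.items():
    print(f"eps = {eps:3.1f}  length = {s:.6f}")
```

In this sketch every perturbed curve comes out longer than the unperturbed one, which is consistent with the straight line being a minimum rather than a maximum.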
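2. Re-solve Problem 1 using this time plane polar coordinates $(\rho, \phi)$.
Answer: In polar coordinates $ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2}$ and hence:
\[ s = \int_\Gamma \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2} = \int_{\rho_A}^{\rho_B} \sqrt{1 + \rho^2 (d\phi/d\rho)^2} \, d\rho = \int_{\rho_A}^{\rho_B} \sqrt{1 + \rho^2 \phi'^2} \, d\rho \equiv I[\phi] \]
where the prime means $d/d\rho$. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $\rho, \phi, \phi'$ here) we can see that $F(\rho, \phi, \phi') \equiv \sqrt{1 + \rho^2 \phi'^2}$. If we now apply the Euler-Lagrange equation (i.e. Eq. 4 noting that $F$ is independent of $\phi$ which corresponds to $y$) we get:
\begin{align*}
\frac{\partial \sqrt{1 + \rho^2 \phi'^2}}{\partial \phi'} &= C \\
\frac{\rho^2 \phi'}{\sqrt{1 + \rho^2 \phi'^2}} &= C \\
\rho^4 \phi'^2 &= C^2 \left(1 + \rho^2 \phi'^2\right) \\
\phi'^2 &= \frac{C^2}{\rho^4 - C^2 \rho^2} \\
\phi' &= \pm \frac{C}{\sqrt{\rho^4 - C^2 \rho^2}} \\
\phi + \phi_0 &= \pm \arctan\left(\frac{\sqrt{\rho^2 - C^2}}{C}\right) \\
\pm \tan(\phi + \phi_0) &= \frac{\sqrt{\rho^2 - C^2}}{C} \\
C^2 \tan^2(\phi + \phi_0) &= \rho^2 - C^2 \\
C^2 \left[\tan^2(\phi + \phi_0) + 1\right] &= \rho^2 \\
C^2 \sec^2(\phi + \phi_0) &= \rho^2 \\
\rho^2 \cos^2(\phi + \phi_0) &= C^2 \\
\rho \cos(\phi + \phi_0) &= \pm C
\end{align*}
This is an equation of a straight line as can be shown as follows:
\begin{align*}
\rho \cos(\phi + \phi_0) &= \pm C \\
\rho \left(\cos\phi \cos\phi_0 - \sin\phi \sin\phi_0\right) &= \pm C && \text{(trigonometric identity)} \\
\rho \cos\phi \cos\phi_0 - \rho \sin\phi \sin\phi_0 &= \pm C \\
x \cos\phi_0 - y \sin\phi_0 &= \pm C && \text{(transforming to Cartesian)}
\end{align*}
As we see, this is an equation of a straight line (noting that $\phi_0$ and $C$ are constants).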
Note: when we take the square root in Problems like this we usually write ± for the sake of formality, although $C$ (and similar constants in similar Problems) is usually arbitrary and hence can represent both signs.
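As a quick numerical illustration of the last step of Problem 2, the following sketch generates points from the polar relation $\rho \cos(\phi + \phi_0) = C$ and checks that they satisfy the Cartesian line $x \cos\phi_0 - y \sin\phi_0 = C$. The values of `C` and `phi0` are hypothetical and chosen only for demonstration.

```python
import math

C = 1.5      # hypothetical constant
phi0 = 0.4   # hypothetical constant angle (radians)

points = []
for k in range(-4, 5):
    phi = -phi0 + 0.3 * k                 # keeps cos(phi + phi0) > 0
    rho = C / math.cos(phi + phi0)        # polar relation rho cos(phi + phi0) = C
    points.append((rho * math.cos(phi), rho * math.sin(phi)))

# Residual of the Cartesian line x cos(phi0) - y sin(phi0) = C at every point
residuals = [x * math.cos(phi0) - y * math.sin(phi0) - C for x, y in points]
max_residual = max(abs(r) for r in residuals)
print("largest residual:", max_residual)
```

The residual should be at the level of floating point round-off, which is what we expect if the polar relation and the Cartesian line describe the same curve.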
3. Find the equation of geodesic in 3D Euclidean space.
Answer: This Problem is similar to Problem 1. We use Cartesian coordinates $x, y, z$ and hence $ds = \sqrt{(dx)^2 + (dy)^2 + (dz)^2} = \sqrt{1 + y'^2 + z'^2} \, dx$ (where the prime means $d/dx$). Hence, the distance (which represents the functional $I$ that we are supposed to optimize) is given by:
\[ s = \int_\Gamma ds = \int_{x_A}^{x_B} \sqrt{1 + y'^2 + z'^2} \, dx \equiv I[y, z] \]
So, $F(x, y, z, y', z') \equiv \sqrt{1 + y'^2 + z'^2}$ (noting that this is a Problem with multiple dependent variables; see § 1.7). Accordingly, the Euler-Lagrange equations for the dependent variables $y$ and $z$ are given by Eqs. 19 and 20 (noting that $y_1, y_2$ there correspond to $y, z$ here).
The $y$ equation is:
\begin{align*}
\frac{\partial \sqrt{1 + y'^2 + z'^2}}{\partial y} - \frac{d}{dx}\left(\frac{\partial \sqrt{1 + y'^2 + z'^2}}{\partial y'}\right) &= 0 \\
0 - \frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2 + z'^2}}\right) &= 0 \\
\frac{y'}{\sqrt{1 + y'^2 + z'^2}} &= C \\
y'^2 &= C^2 \left(1 + y'^2 + z'^2\right) \\
y'^2 \left(1 - C^2\right) &= C^2 \left(1 + z'^2\right) \tag{30}
\end{align*}
The $z$ equation is:
\begin{align*}
\frac{\partial \sqrt{1 + y'^2 + z'^2}}{\partial z} - \frac{d}{dx}\left(\frac{\partial \sqrt{1 + y'^2 + z'^2}}{\partial z'}\right) &= 0 \\
0 - \frac{d}{dx}\left(\frac{z'}{\sqrt{1 + y'^2 + z'^2}}\right) &= 0 \\
\frac{z'}{\sqrt{1 + y'^2 + z'^2}} &= D \\
z'^2 &= D^2 \left(1 + y'^2 + z'^2\right) \\
z'^2 \left(1 - D^2\right) &= D^2 \left(1 + y'^2\right) \tag{31}
\end{align*}
On substituting from Eq. 31 into Eq. 30 (see the upcoming note 1) and simplifying we get $y' =$ constant and hence $y = ax + b$ (with $a$ and $b$ being constants) which is an equation of a plane. Similarly, on substituting from Eq. 30 into Eq. 31 (see the upcoming note 1) and simplifying we get $z' =$ constant and hence $z = cx + d$ (with $c$ and $d$ being constants) which is also an equation of a plane. So, the solution of the system of the $y$ and $z$ equations is the intersection of these planes which is a straight line. Accordingly, the geodesic (or the curve of shortest length) in this case is a straight line.
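Note 1: from Eq. 31 we get:
\[ z'^2 = \frac{D^2 \left(1 + y'^2\right)}{1 - D^2} \]
On substituting from this equation into Eq. 30 we get:
\begin{align*}
y'^2 \left(1 - C^2\right) &= C^2 \left(1 + \frac{D^2 \left(1 + y'^2\right)}{1 - D^2}\right) \\
y'^2 \left(1 - C^2\right) &= C^2 + \frac{C^2 D^2}{1 - D^2} + \frac{C^2 D^2 y'^2}{1 - D^2} \\
y'^2 \left(1 - C^2 - \frac{C^2 D^2}{1 - D^2}\right) &= C^2 + \frac{C^2 D^2}{1 - D^2} \\
y'^2 &= \left(C^2 + \frac{C^2 D^2}{1 - D^2}\right) \left(1 - C^2 - \frac{C^2 D^2}{1 - D^2}\right)^{-1} \\
y' &= \text{constant}
\end{align*}
We similarly obtain $z' =$ constant (with the exchange of $y'$ and $z'$ and $C$ and $D$).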
Note 2: it is obvious that the optimal length in this Problem is a minimum (not a maximum) because
the length of a curve connecting two points can diverge.
4. Find the equation of geodesic on a right circular cylinder.
Answer: We use cylindrical coordinates $\rho, \phi, z$ to represent the cylinder where the axis of the cylinder coincides with the $z$ axis of the coordinate system. Now, the line element $ds$ in cylindrical coordinates is given by:
\[ ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2 + (dz)^2} \]
For right circular cylinder $\rho$ is constant and hence $d\rho = 0$ and $\rho = R$ (with $R$ being the constant radius of the cylinder). Therefore, the line element of the cylinder is:
\[ ds = \sqrt{R^2 (d\phi)^2 + (dz)^2} = \sqrt{R^2 + (dz/d\phi)^2} \, d\phi = \sqrt{R^2 + z'^2} \, d\phi \]
where $z' = dz/d\phi$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = \int_{\phi_1}^{\phi_2} \sqrt{R^2 + z'^2} \, d\phi \equiv I[z] \]
where $\phi_1, \phi_2$ represent the azimuthal coordinates of the end points of the curve. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $\phi, z, z'$ here) we can see that $F \equiv \sqrt{R^2 + z'^2}$. Considering that $F$ has no explicit dependency on $\phi$ (which corresponds to $x$ in our case), we can use the Beltrami identity (i.e. Eq. 3 with the replacement of $y'$ with $z'$), that is:[44]
\begin{align*}
\sqrt{R^2 + z'^2} - z' \frac{\partial \sqrt{R^2 + z'^2}}{\partial z'} &= C \\
\sqrt{R^2 + z'^2} - z' \frac{2z'}{2\sqrt{R^2 + z'^2}} &= C \\
\sqrt{R^2 + z'^2} - \frac{z'^2}{\sqrt{R^2 + z'^2}} &= C \\
R^2 + z'^2 - z'^2 &= C \sqrt{R^2 + z'^2} \\
R^4 &= C^2 \left(R^2 + z'^2\right) \\
z'^2 &= \frac{R^4}{C^2} - R^2 \\
z' &= \pm \sqrt{\frac{R^4}{C^2} - R^2} \\
\frac{dz}{d\phi} &= \pm D && \left(D = \sqrt{\frac{R^4}{C^2} - R^2}\right) \\
z &= \pm D\phi + E && (E \text{ is constant}) \tag{32}
\end{align*}
which is an equation of a circular helix.
Note: the helix can represent a circular arc as a special case when D = 0 (in the case where the two
points connected by the geodesic are on a circle). It can also represent a generator line parallel to the
z axis when D → ∞ (in the case where the two points connected by the geodesic are on a generator).
[44] Noting that only $z'$ appears in $F$ we can use footnote [12] (or Eq. 4) to conclude $z' =$ constant immediately. However, we use here the Beltrami identity for diversity and practice (as well as showing that these equations lead to the same result).
5. Re-solve Problem 4 using this time a multiple dependent variables approach (see § 1.7).
Answer: We use $t$-parameterized cylindrical coordinates $\rho(t), \phi(t), z(t)$ to represent the cylinder where the axis of the cylinder coincides with the $z$ axis of the coordinate system. Now, the line element $ds$ in cylindrical coordinates is given by:
\[ ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2 + (dz)^2} \]
For right circular cylinder $\rho$ is constant and hence $d\rho = 0$ and $\rho = R$ (with $R$ being the constant radius of the cylinder). Therefore, the line element of the cylinder is:
\[ ds = \sqrt{R^2 (d\phi)^2 + (dz)^2} = \sqrt{R^2 (d\phi/dt)^2 + (dz/dt)^2} \, dt = \sqrt{R^2 \dot{\phi}^2 + \dot{z}^2} \, dt \]
where the overdot represents $d/dt$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = \int_{t_1}^{t_2} \sqrt{R^2 \dot{\phi}^2 + \dot{z}^2} \, dt \equiv I[\phi, z] \]
where $t_1, t_2$ represent the values of the parameter $t$ at the end points of the curve. On comparing this equation to Eq. 18 (noting that $x, y_1, y_2, y_1', y_2'$ in Eq. 18 correspond to $t, \phi, z, \dot{\phi}, \dot{z}$ here) we can see that $F \equiv \sqrt{R^2 \dot{\phi}^2 + \dot{z}^2}$. Accordingly, the Euler-Lagrange equations for the dependent variables $\phi$ and $z$ are given by Eqs. 19 and 20 (with the replacement of $x, y_1, y_2, y_1', y_2'$ with $t, \phi, z, \dot{\phi}, \dot{z}$).
The $\phi$ equation is:
\begin{align*}
\frac{\partial \sqrt{R^2 \dot{\phi}^2 + \dot{z}^2}}{\partial \phi} - \frac{d}{dt}\left(\frac{\partial \sqrt{R^2 \dot{\phi}^2 + \dot{z}^2}}{\partial \dot{\phi}}\right) &= 0 \\
0 - \frac{d}{dt}\left(\frac{2R^2 \dot{\phi}}{2\sqrt{R^2 \dot{\phi}^2 + \dot{z}^2}}\right) &= 0 \\
\frac{R^2 \dot{\phi}}{\sqrt{R^2 \dot{\phi}^2 + \dot{z}^2}} &= C_1 \\
\dot{\phi}^2 &= \frac{C_1^2}{R^4} \left(R^2 \dot{\phi}^2 + \dot{z}^2\right) \\
\dot{z}^2 &= \left(\frac{R^4}{C_1^2} - R^2\right) \dot{\phi}^2 \\
\dot{z} &= \pm D_1 \dot{\phi} && \left(D_1 = \sqrt{\frac{R^4}{C_1^2} - R^2}\right) \\
z &= \pm D_1 \phi + E_1 && (E_1 \text{ is constant})
\end{align*}
specific handedness for the case |∆φ| = π. We should finally note that part of the confusion about
these issues may originate from the meaning of φ and if it belongs to the coordinate system and hence
to the cylinder (where 0 ≤ φ < 2π) or it belongs to the helix that connects the two end points (and
hence φ represents the “spin” of the helix regardless of the aforementioned restriction on φ). In other
words, whether the helix (as a function of φ) is represented as one-to-many or as one-to-one. Also see
Problem 6.
6. A right circular cylinder has a radius $R = 6$ with its axis being aligned along the $z$ axis of a cylindrical coordinate system. Find the equation of the geodesic on this cylinder that passes through the point $(\phi_A, z_A) = (\frac{\pi}{2}, 1)$ and the point $(\phi_B, z_B) = (\pi, 6)$.
Answer: Inserting the coordinates of these points into Eq. 32 (noting that the sign is rather arbitrary), we get:
\begin{align*}
1 &= D \frac{\pi}{2} + E \\
6 &= D\pi + E
\end{align*}
On solving these equations we get $D = \frac{10}{\pi}$ and $E = -4$. Accordingly, the geodesic on this cylinder that passes through those points is a helix (or rather helical arc) described by the equations:
\[ z = \frac{10}{\pi} \phi - 4 \qquad \text{and} \qquad \rho = 6 \]
Note: if we used the minus sign in Eq. 32 we get $D = -\frac{10}{\pi}$ and $E = -4$ and hence $z = \frac{10}{\pi} \phi - 4$ which is the same.
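The two linear equations of Problem 6 can be solved by a few lines of code. The sketch below uses the two-point form of the line $z = D\phi + E$ in the $(\phi, z)$ plane. As an extra that is not part of the Problem, it also evaluates the length of the helical arc from the functional of Problem 4 (with constant $z' = D$) and compares it with the length of the same segment on the unrolled cylinder.

```python
import math

R = 6.0
phi_a, z_a = math.pi / 2, 1.0
phi_b, z_b = math.pi, 6.0

# Two-point form of the line z = D*phi + E in the (phi, z) plane (plus sign in Eq. 32).
D = (z_b - z_a) / (phi_b - phi_a)
E = z_a - D * phi_a

# Length of the helical arc: integral of sqrt(R^2 + z'^2) dphi with constant z' = D.
arc_length = math.sqrt(R**2 + D**2) * (phi_b - phi_a)

# The same length seen on the unrolled (flattened) cylinder, as a cross-check.
unrolled_length = math.hypot(R * (phi_b - phi_a), z_b - z_a)

print("D =", D, " E =", E, " arc length =", arc_length)
```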
7. Find the equation of geodesic on a sphere.
Answer: We use a spherical coordinate system $r, \theta, \phi$ centered on the center of the sphere. Now, the line element $ds$ in spherical coordinates is given by:
\[ ds = \sqrt{(dr)^2 + r^2 (d\theta)^2 + r^2 \sin^2\theta \, (d\phi)^2} \]
For sphere, $r$ is constant and hence $dr = 0$ and $r = R$ (with $R$ being the constant radius of the sphere). Therefore, the line element on a sphere is:
\[ ds = \sqrt{R^2 (d\theta)^2 + R^2 \sin^2\theta \, (d\phi)^2} = R \sqrt{(d\theta)^2 + \sin^2\theta \, (d\phi)^2} = R \sqrt{1 + \phi'^2 \sin^2\theta} \, d\theta \]
where $\phi' = d\phi/d\theta$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = R \int_{\theta_1}^{\theta_2} \sqrt{1 + \phi'^2 \sin^2\theta} \, d\theta \equiv I[\phi] \]
where $\theta_1, \theta_2$ represent the $\theta$ coordinates of the end points of the curve. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $\theta, \phi, \phi'$ here) we can see that $F \equiv \sqrt{1 + \phi'^2 \sin^2\theta}$ (noting that $R$ is a constant and hence it is no more than a scaling factor). Considering that $F$ has no explicit dependency on $\phi$ (which corresponds to $y$ in our case), we can use Eq. 4 (with the replacement of $y'$ with $\phi'$), that is:
\begin{align*}
\frac{\partial}{\partial \phi'} \sqrt{1 + \phi'^2 \sin^2\theta} &= C \\
\frac{2\phi' \sin^2\theta}{2\sqrt{1 + \phi'^2 \sin^2\theta}} &= C \\
\phi'^2 \sin^4\theta &= C^2 + C^2 \phi'^2 \sin^2\theta \\
\phi'^2 &= \frac{C^2}{\sin^4\theta - C^2 \sin^2\theta} \\
\phi'^2 &= \frac{C^2}{\sin^4\theta \left(1 - C^2 \sin^{-2}\theta\right)} \\
\phi'^2 &= \frac{C^2 \csc^4\theta}{1 - C^2 \csc^2\theta} \\
\phi' &= \frac{C \csc^2\theta}{\sqrt{1 - C^2 - C^2 \cot^2\theta}} && \left(\csc^2\theta = 1 + \cot^2\theta\right) \\
\phi' &= \frac{\csc^2\theta}{\sqrt{\frac{1 - C^2}{C^2} - \cot^2\theta}} \\
d\phi &= \frac{\csc^2\theta \, d\theta}{\sqrt{\frac{1 - C^2}{C^2} - \cot^2\theta}}
\end{align*}
Now, let $D^2 = C^2/(1 - C^2)$ and $w = \cot\theta$ (and hence $dw = -\csc^2\theta \, d\theta$). On substituting these in the last equation and integrating we get:
\[ \int d\phi = -\int \frac{dw}{\sqrt{\frac{1}{D^2} - w^2}} \]
where in the last line we transformed to Cartesian coordinates. As we see, the last equation is an
equation of a plane passing through the origin (which is the center of the sphere) and hence the
geodesic curve on a sphere is an arc of a great circle of the sphere (where the great circle is the
intersection of the plane and the sphere).
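To illustrate the great circle result, the following sketch compares two paths between two points of equal polar angle on a unit sphere: the arc of the great circle and the arc of the circle of constant $\theta$ (a parallel). The radius and the coordinates of the points are hypothetical and chosen only for demonstration.

```python
import math

def to_cartesian(R, theta, phi):
    """Spherical (r = R, polar angle theta, azimuth phi) to Cartesian coordinates."""
    return (R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(theta))

def great_circle_distance(R, p, q):
    """Arc length of the great circle joining p and q (both on the sphere of radius R)."""
    cos_angle = sum(a * b for a, b in zip(p, q)) / R**2
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against round-off
    return R * math.acos(cos_angle)

R = 1.0                       # hypothetical radius
theta = math.pi / 3           # common polar angle of the two points
phi_1, phi_2 = 0.0, math.pi / 2

P = to_cartesian(R, theta, phi_1)
Q = to_cartesian(R, theta, phi_2)

d_great = great_circle_distance(R, P, Q)
d_parallel = R * math.sin(theta) * abs(phi_2 - phi_1)   # arc along the circle theta = constant

print("great circle arc :", d_great)
print("arc along parallel:", d_parallel)
```

For these points the great circle arc is the shorter of the two paths, as the result of Problem 7 suggests.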
8. Find the equation of geodesic on a right circular cone.
Answer: We use spherical coordinates $r, \theta, \phi$ to represent the cone where the apex of the cone is at the origin of coordinates while the axis of the cone coincides with the $\theta = 0$ axis of the coordinate system. Now, the line element $ds$ in spherical coordinates is given by:
\[ ds = \sqrt{(dr)^2 + r^2 (d\theta)^2 + r^2 \sin^2\theta \, (d\phi)^2} \]
For right circular cone (according to the above setting), $\theta$ is constant and hence $d\theta = 0$ and $\theta = \alpha$ (with $\alpha$ being a constant angle). Therefore, the line element on a cone is:
\[ ds = \sqrt{(dr)^2 + r^2 \sin^2\alpha \, (d\phi)^2} = \sqrt{(dr/d\phi)^2 + r^2 \sin^2\alpha} \, d\phi = \sqrt{r'^2 + r^2 \sin^2\alpha} \, d\phi \]
where $r' = dr/d\phi$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = \int_{\phi_1}^{\phi_2} \sqrt{r'^2 + r^2 \sin^2\alpha} \, d\phi \equiv I[r] \]
where $\phi_1, \phi_2$ represent the azimuthal coordinates of the end points of the curve. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $\phi, r, r'$ here) we can see that $F \equiv \sqrt{r'^2 + r^2 \sin^2\alpha}$. Considering that $F$ has no explicit dependency on $\phi$ (which corresponds to $x$ in our case), we can use the Beltrami identity (i.e. Eq. 3 noting that $y'$ in Eq. 3 corresponds to $r'$ here), that is:
\begin{align*}
\sqrt{r'^2 + r^2 \sin^2\alpha} - r' \frac{\partial}{\partial r'} \sqrt{r'^2 + r^2 \sin^2\alpha} &= C \\
\sqrt{r'^2 + r^2 \sin^2\alpha} - r' \left(\frac{2r'}{2\sqrt{r'^2 + r^2 \sin^2\alpha}}\right) &= C \\
\sqrt{r'^2 + r^2 \sin^2\alpha} - \frac{r'^2}{\sqrt{r'^2 + r^2 \sin^2\alpha}} &= C \\
r'^2 + r^2 \sin^2\alpha - r'^2 &= C \sqrt{r'^2 + r^2 \sin^2\alpha} \\
r^2 \sin^2\alpha &= C \sqrt{r'^2 + r^2 \sin^2\alpha} \\
r^4 \sin^4\alpha &= C^2 \left(r'^2 + r^2 \sin^2\alpha\right) \\
r'^2 &= \frac{r^4 \sin^4\alpha}{C^2} - r^2 \sin^2\alpha \\
r'^2 &= a r^4 - b r^2 && (a \text{ and } b \text{ are constants}) \\
r' &= \sqrt{a r^4 - b r^2} \\
\frac{dr}{d\phi} &= \sqrt{a r^4 - b r^2} \\
d\phi &= \frac{dr}{\sqrt{a r^4 - b r^2}} \\
\phi &= \frac{1}{\sqrt{b}} \arctan\left(\sqrt{\frac{a r^2 - b}{b}}\right) + D \tag{33}
\end{align*}
where the result of the last line can be easily checked by differentiation (to obtain the results of the earlier lines).
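The check by differentiation mentioned above can also be done numerically. The following sketch differentiates Eq. 33 by central finite differences and compares the result with $1/\sqrt{a r^4 - b r^2}$. The values of `a`, `b` and `D` are hypothetical (they only need to satisfy $a r^2 > b$ at the sampled values of $r$).

```python
import math

a, b, D = 2.0, 0.5, 0.0   # hypothetical constants with a*r**2 > b on the sampled range

def phi(r):
    """Right hand side of Eq. 33."""
    return math.atan(math.sqrt((a * r * r - b) / b)) / math.sqrt(b) + D

def dphi_dr_expected(r):
    """dphi/dr as given by the line preceding Eq. 33."""
    return 1.0 / math.sqrt(a * r**4 - b * r**2)

h = 1e-6
max_error = 0.0
for r in (0.8, 1.0, 1.5, 2.0, 3.0):
    numeric = (phi(r + h) - phi(r - h)) / (2 * h)   # central difference
    max_error = max(max_error, abs(numeric - dphi_dr_expected(r)))

print("largest discrepancy:", max_error)
```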
9. Re-solve Problem 8 using this time cylindrical coordinates $(\rho, \phi, z)$.
Answer: Let the axis of the cone coincide with the (positive) $z$ axis of the coordinate system while the apex of the cone be at the origin of coordinates. Now, the line element $ds$ in cylindrical coordinates is given by:
\[ ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2 + (dz)^2} \]
For right circular cone $z$ is proportional to $\rho$ and hence $z = k\rho$ where $k$ is a constant. Therefore, the line element of the cone becomes:
\[ ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2 + (k \, d\rho)^2} = \sqrt{1 + k^2 + \rho^2 \phi'^2} \, d\rho \]
where $\phi' = d\phi/d\rho$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = \int_{\rho_1}^{\rho_2} \sqrt{1 + k^2 + \rho^2 \phi'^2} \, d\rho \equiv I[\phi] \]
where $\rho_1, \rho_2$ represent the $\rho$ coordinates of the end points of the curve. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $\rho, \phi, \phi'$ here) we can see that $F \equiv \sqrt{1 + k^2 + \rho^2 \phi'^2}$. Considering that $F$ has no explicit dependency on $\phi$ (which corresponds to $y$ in our case), we can use Eq. 4 (with the replacement of $y'$ with $\phi'$), that is:
\begin{align*}
\frac{\partial F}{\partial \phi'} &= C \\
\frac{\rho^2 \phi'}{\sqrt{1 + k^2 + \rho^2 \phi'^2}} &= C \\
\rho^4 \phi'^2 &= C^2 + C^2 k^2 + C^2 \rho^2 \phi'^2 \\
\phi'^2 &= \frac{C^2 + C^2 k^2}{\rho^4 - C^2 \rho^2} \\
\frac{d\phi}{d\rho} &= \sqrt{\frac{C^2 + C^2 k^2}{\rho^4 - C^2 \rho^2}} \\
\phi &= \sqrt{1 + k^2} \arctan\left(\frac{\sqrt{\rho^2 - C^2}}{C}\right) + D \tag{34}
\end{align*}
Note: the similarity in form between Eq. 33 and Eq. 34 is because on a right circular cone (coordinated
as described in Problem 8 and in the present Problem) ρ of the cylindrical coordinates is proportional
to r of the spherical coordinates (since ρ = r sin α) noting that φ is the same in both coordinate
systems (assuming that the systems are in a standard configuration with respect to a corresponding
orthonormal Cartesian system as we do).
10. Find the equation of geodesic on a surface of revolution.
Answer: Let us use cylindrical coordinates $\rho, \phi, z$ where the $z$ axis of the coordinate system is the axis of the surface of revolution and hence the profile curve (i.e. meridian) of the surface of revolution is given as $\rho = f(z)$. Now, the line element $ds$ in cylindrical coordinates is given by:
\[ ds = \sqrt{(d\rho)^2 + \rho^2 (d\phi)^2 + (dz)^2} = \sqrt{\rho'^2 + \rho^2 \phi'^2 + 1} \, dz = \sqrt{f'^2 + f^2 \phi'^2 + 1} \, dz \]
where the prime means $d/dz$. Hence, the length of the curve $\Gamma$ (which is the functional $I$ that we intend to optimize) is:
\[ s = \int_\Gamma ds = \int_{z_1}^{z_2} \sqrt{f'^2 + f^2 \phi'^2 + 1} \, dz \equiv I[\phi] \]
where $z_1, z_2$ represent the $z$ coordinates of the end points of the curve. On comparing this to Eq. 1 (noting that $x, y, y'$ in Eq. 1 correspond to $z, \phi, \phi'$ here) we can see that $F \equiv \sqrt{f'^2 + f^2 \phi'^2 + 1}$. Considering that $F$ has no explicit dependency on $\phi$ (which corresponds to $y$ in our case), we can use Eq. 4 (with the replacement of $y'$ with $\phi'$), that is:
\begin{align*}
\frac{\partial \sqrt{f'^2 + f^2 \phi'^2 + 1}}{\partial \phi'} &= C \\
\frac{f^2 \phi'}{\sqrt{f'^2 + f^2 \phi'^2 + 1}} &= C \tag{35} \\
f^4 \phi'^2 &= C^2 \left(f'^2 + f^2 \phi'^2 + 1\right) \\
\phi'^2 &= \frac{C^2 f'^2 + C^2}{f^4 - C^2 f^2} \\
\phi' &= \pm C \sqrt{\frac{f'^2 + 1}{f^4 - C^2 f^2}} \tag{36} \\
\phi &= \pm C \int \sqrt{\frac{f'^2 + 1}{f^4 - C^2 f^2}} \, dz \tag{37}
\end{align*}
\begin{align*}
\pm\phi &= C \sqrt{\frac{1}{R^4 - C^2 R^2}} \, z + C_1 \\
z &= \pm D\phi + E
\end{align*}
where the constants D and E are defined accordingly. The last equation is the same as Eq. 32 of
Problem 4.
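12. Apply the result of Problem 10 on the right circular cone and hence confirm the result of Problem 9.
Answer: Using the setting of Problem 9 we have $\rho \equiv f(z) = \kappa z$ where $\kappa = 1/k$. So, from Eq. 37 (with $f = \kappa z$ and $f' = \kappa$) we get:
\begin{align*}
\phi &= C \int \sqrt{\frac{\kappa^2 + 1}{\kappa^4 z^4 - C^2 \kappa^2 z^2}} \, dz \\
&= \frac{C \sqrt{\kappa^2 + 1}}{\kappa^2} \int \frac{1}{\sqrt{z^4 - (C/\kappa)^2 z^2}} \, dz \\
&= \frac{C \sqrt{\kappa^2 + 1}}{\kappa^2 (C/\kappa)} \arctan\left(\frac{\sqrt{z^2 - (C/\kappa)^2}}{C/\kappa}\right) + D \\
&= \frac{\sqrt{\kappa^2 + 1}}{\kappa} \arctan\left(\frac{\sqrt{(\rho/\kappa)^2 - (C/\kappa)^2}}{C/\kappa}\right) + D \\
&= \sqrt{1 + (1/\kappa^2)} \arctan\left(\frac{\sqrt{\rho^2 - C^2}}{C}\right) + D \\
&= \sqrt{1 + k^2} \arctan\left(\frac{\sqrt{\rho^2 - C^2}}{C}\right) + D
\end{align*}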
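The agreement between the integral of Problem 10 and the closed form of Problem 9 can be checked numerically. The sketch below evaluates the integral of Eq. 37 for the cone profile $f = \kappa z$ by Simpson's rule and compares it with the difference of the closed form (Eq. 34) between the same limits. The values of `k`, `C` and the limits of integration are hypothetical (they only need to satisfy $\kappa z > C$).

```python
import math

k = 2.0              # hypothetical cone constant in z = k*rho
kappa = 1.0 / k
C = 0.3              # hypothetical constant of integration

def integrand(z):
    """Integrand of Eq. 37 (plus sign) for the cone profile f(z) = kappa*z."""
    f = kappa * z
    fp = kappa
    return C * math.sqrt((fp**2 + 1.0) / (f**4 - C**2 * f**2))

def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def phi_closed(rho):
    """Closed form of Eq. 34 with D = 0."""
    return math.sqrt(1.0 + k**2) * math.atan(math.sqrt(rho**2 - C**2) / C)

z_1, z_2 = 1.0, 2.0     # limits chosen so that kappa*z > C
numeric = simpson(integrand, z_1, z_2)
closed = phi_closed(kappa * z_2) - phi_closed(kappa * z_1)

print("Simpson estimate:", numeric)
print("closed form     :", closed)
```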
\begin{align*}
\sqrt{1 + y'^2} + (Y' - y') \frac{y'}{\sqrt{1 + y'^2}} &= 0 && \left(F = \sqrt{1 + y'^2}\text{; see Problem 1}\right) \\
\sqrt{1 + a^2} + \left(3x^2 - a\right) \frac{a}{\sqrt{1 + a^2}} &= 0 && \left(y = ax + b \text{ and } Y = x^3\right) \\
1 + a^2 + 3ax^2 - a^2 &= 0 && \left(\times \sqrt{1 + a^2}\right) \\
1 + 3a x_B^2 &= 0 \tag{38}
\end{align*}
where in the last equation we used $x_B$ to indicate that this equation applies to point B. Now, the geodesic $y = ax + b$ should pass through point $(0, 2)$ and hence $2 = 0 + b$, i.e. $b = 2$. Also, point B is on both the geodesic $y = ax + 2$ and the boundary curve $y = x^3$ and hence:
\[ a x_B + 2 = x_B^3 \quad \rightarrow \quad a = \frac{x_B^3 - 2}{x_B} \qquad (x_B \neq 0) \tag{39} \]
On substituting from the last equation into Eq. 38 and simplifying we get $3x_B^4 - 6x_B + 1 = 0$. On solving this quartic equation we get:
$x_B \simeq 0.16705608755$ and hence $a \simeq -11.94411932$ and $y_B \simeq 0.00466216$
OR
$x_B \simeq 1.19858497157$ and hence $a \simeq -0.23202837$ and $y_B \simeq 1.72189428$
Now, if we compute (using Pythagoras) the distance $d_1$ between point $(0, 2)$ and point $(x_B, y_B) = (0.16705, 0.00466)$ and the distance $d_2$ between point $(0, 2)$ and point $(x_B, y_B) = (1.19858, 1.72189)$ we get $d_1 \simeq 2.00231887$ and $d_2 \simeq 1.23042624$. So, the geodesic (i.e. curve of shortest length) is $y = -0.23202837x + 2$ and the shortest distance is $d_2 \simeq 1.23042624$. For clarity, we plot the result in Figure 12.
Note: in Eq. 39 we excluded $x_B = 0$ so we need to test the possibility that the geodesic is the line segment connecting point $(0, 2)$ to point $(0, 0)$ on the curve $y = x^3$. However, this line segment cannot be the required geodesic because its length is 2 which is longer than $d_2$.
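The numerical values quoted in this Problem can be reproduced by a short script. The sketch below finds the two real roots of the quartic $3x_B^4 - 6x_B + 1 = 0$ by bisection (the brackets $[0, 1]$ and $[1, 2]$ are our own choice, based on the sign changes of the quartic), then computes $a$ from Eq. 39, $y_B$ from the boundary curve, and the distance to the point $(0, 2)$.

```python
import math

def quartic(x):
    return 3.0 * x**4 - 6.0 * x + 1.0

def bisect(g, lo, hi, tol=1e-13):
    """Plain bisection; assumes g changes sign on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

candidates = []
for lo, hi in ((0.0, 1.0), (1.0, 2.0)):      # brackets chosen from the sign changes
    x_b = bisect(quartic, lo, hi)
    a = (x_b**3 - 2.0) / x_b                 # Eq. 39
    y_b = x_b**3                             # B lies on y = x^3
    d = math.hypot(x_b - 0.0, y_b - 2.0)     # distance from (0, 2)
    candidates.append((x_b, a, y_b, d))
    print(f"x_B = {x_b:.11f}  a = {a:.8f}  y_B = {y_b:.8f}  d = {d:.8f}")

best = min(candidates, key=lambda c: c[3])   # candidate with the shortest distance
```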
Figure 12: The result of Problem 15 of § 2.1. The point (0.16705, 0.00466) is also plotted for clarity. The plot seems to suggest that $d_1$ is a local maximum.
16. Find the shortest distance between the parabola $y = x^2$ and the straight line $y = x - 5$.
Answer: This is obviously a geodesic problem (since it is about shortest distance) on a Euclidean plane with two variable boundaries. So, the solution is obviously a straight line $y = ax + b$ whose length between the two end points (i.e. $B_1$ on $y = x^2$ and $B_2$ on $y = x - 5$) is the shortest distance (see Problem 1). Moreover, from § 1.9 we should also have two transversality conditions (see Eqs. 24 and 25 noting that in the following we do not use the subscripted version of these equations for simplicity) at the two end points, that is:
\begin{align*}
F + (Y' - y') F_{y'} &= 0 && (\text{at point } B_1) \\
\sqrt{1 + y'^2} + (Y' - y') \frac{y'}{\sqrt{1 + y'^2}} &= 0 && \left(F = \sqrt{1 + y'^2}\text{; see Problem 1}\right) \\
\sqrt{1 + a^2} + (2x - a) \frac{a}{\sqrt{1 + a^2}} &= 0 && \left(y = ax + b \text{ and } Y = x^2\right) \\
1 + a^2 + (2x - a) a &= 0 && \left(\times \sqrt{1 + a^2}\right) \\
1 + a^2 + 2ax - a^2 &= 0 \\
1 + 2a x_1 &= 0 \tag{40}
\end{align*}
AND
\begin{align*}
F + (Y' - y') F_{y'} &= 0 && (\text{at point } B_2) \\
\sqrt{1 + y'^2} + (Y' - y') \frac{y'}{\sqrt{1 + y'^2}} &= 0 && \left(F = \sqrt{1 + y'^2}\text{; see Problem 1}\right) \\
\sqrt{1 + a^2} + (1 - a) \frac{a}{\sqrt{1 + a^2}} &= 0 && \left(y = ax + b \text{ and } Y = x - 5\right) \\
1 + a^2 + (1 - a) a &= 0 && \left(\times \sqrt{1 + a^2}\right) \\
1 + a^2 + a - a^2 &= 0 \\
a &= -1 \tag{41}
\end{align*}
So, from Eq. 41 the equation of the geodesic becomes $y = -x + b$. Also, on substituting from Eq. 41 into Eq. 40 we get $x_1 = 1/2$.
Now, point $B_1$ is on both the geodesic $y = -x + b$ and the boundary curve $y = x^2$ and hence:
\[ -x_1 + b = x_1^2 \quad \rightarrow \quad -\frac{1}{2} + b = \frac{1}{4} \]
i.e. $b = 3/4$. So, the equation of the geodesic becomes $y = -x + \frac{3}{4}$.
Similarly, $B_2$ is on both the geodesic $y = -x + \frac{3}{4}$ and the boundary curve $y = x - 5$. Hence:
\[ -x_2 + \frac{3}{4} = x_2 - 5 \quad \rightarrow \quad 2x_2 = 5 + \frac{3}{4} \]
i.e. $x_2 = 23/8$. So in brief, the geodesic is $y = \frac{3}{4} - x$ and the shortest distance between point $B_1$ with coordinates $(\frac{1}{2}, \frac{1}{4})$ and point $B_2$ with coordinates $(\frac{23}{8}, \frac{-17}{8})$ is (using Pythagoras):
\[ \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = \sqrt{\left(\frac{1}{2} - \frac{23}{8}\right)^2 + \left(\frac{1}{4} + \frac{17}{8}\right)^2} = \sqrt{\frac{361}{32}} \simeq 3.35875721 \]
[Figure: plot of the parabola $y = x^2$, the straight line $y = x - 5$ and the geodesic $y = 0.75 - x$ connecting the points (0.5, 0.25) and (2.875, −2.125).]
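The result of Problem 16 can be cross-checked without the transversality conditions. The sketch below is a brute-force alternative (not the variational method used above): it minimizes, by a ternary search, the perpendicular distance from a point $(x_1, x_1^2)$ on the parabola to the line $x - y - 5 = 0$, and then computes the foot of the perpendicular on the line. The search bracket $[-5, 5]$ is an assumption of ours.

```python
import math

def distance_to_line(x1):
    """Distance from the point (x1, x1**2) on the parabola to the line x - y - 5 = 0."""
    return abs(x1 - x1**2 - 5.0) / math.sqrt(2.0)

# Ternary search on an assumed bracket; the distance is a convex function of x1 here.
lo, hi = -5.0, 5.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3.0
    m2 = hi - (hi - lo) / 3.0
    if distance_to_line(m1) < distance_to_line(m2):
        hi = m2
    else:
        lo = m1

x_1 = 0.5 * (lo + hi)
y_1 = x_1**2
shortest = distance_to_line(x_1)

# Foot of the perpendicular from B1 = (x_1, y_1) on the line y = x - 5.
x_2 = (x_1 + y_1 + 5.0) / 2.0
y_2 = x_2 - 5.0

print("B1 =", (x_1, y_1), " B2 =", (x_2, y_2), " distance =", shortest)
```

The search should land close to $x_1 = 1/2$ and $x_2 = 23/8$, in agreement with the values obtained from Eqs. 40 and 41.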
17. Find the shortest distance between the parabola $y = x^2 + 2$ and the natural logarithm curve $y = \ln x$.
Answer: This is obviously a geodesic problem (since it is about shortest distance) on a Euclidean plane with two variable boundaries. So, the solution is obviously a straight line $y = ax + b$ whose length between the two end points (i.e. $B_1$ on $y = x^2 + 2$ and $B_2$ on $y = \ln x$) is the shortest distance (see Problem 1). Moreover, from § 1.9 we should also have two transversality conditions (see Eqs. 24 and 25 noting that in the following we do not use the subscripted version of these equations) at the two end points, that is:
\begin{align*}
F + (Y' - y') F_{y'} &= 0 && (\text{at point } B_1) \\
\sqrt{1 + y'^2} + (Y' - y') \frac{y'}{\sqrt{1 + y'^2}} &= 0 && \left(F = \sqrt{1 + y'^2}\text{; see Problem 1}\right) \\
\sqrt{1 + a^2} + (2x - a) \frac{a}{\sqrt{1 + a^2}} &= 0 && \left(y = ax + b \text{ and } Y = x^2 + 2\right) \\
1 + a^2 + (2x - a) a &= 0 && \left(\times \sqrt{1 + a^2}\right) \\
1 + a^2 + 2ax - a^2 &= 0 \\
1 + 2a x_1 &= 0 \tag{42}
\end{align*}
AND
\begin{align*}
F + (Y' - y') F_{y'} &= 0 && (\text{at point } B_2) \\
\sqrt{1 + y'^2} + (Y' - y') \frac{y'}{\sqrt{1 + y'^2}} &= 0 && \left(F = \sqrt{1 + y'^2}\text{; see Problem 1}\right) \\
\sqrt{1 + a^2} + \left(x^{-1} - a\right) \frac{a}{\sqrt{1 + a^2}} &= 0 && \left(y = ax + b \text{ and } Y = \ln x\right) \\
1 + a^2 + \left(x^{-1} - a\right) a &= 0 && \left(\times \sqrt{1 + a^2}\right) \\
1 + a^2 + a x^{-1} - a^2 &= 0 \\
a &= -x_2 \tag{43}
\end{align*}
Now, point $B_1$ is on both the geodesic $y = ax + b$ and the boundary curve $y = x^2 + 2$ and hence:
On solving this equation numerically we get $x_1 \simeq 0.335942278$. Hence, $x_2 \simeq 1.488350923$ (from Eq. 44), $a \simeq -1.488350923$ (from Eq. 43), and $b \simeq 2.612857214$ (from Eq. 45 or Eq. 46).
So, the geodesic is $y = -1.488350923x + 2.612857214$ and the shortest distance between point $B_1$ with coordinates $(0.335942278, 2.112857214)$ and point $B_2$ with coordinates $(1.488350923, 0.397668744)$ is:
\begin{align*}
\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} &\simeq \sqrt{(0.335942278 - 1.488350923)^2 + (2.112857214 - 0.397668744)^2} \\
&\simeq 2.066377790
\end{align*}
[Figure: plot of the parabola $y = x^2 + 2$, the curve $y = \ln(x)$ and the geodesic $y = -1.48835x + 2.61286$ connecting the points (0.33594, 2.11285) and (1.48835, 0.39766).]
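The numerical solution of Problem 17 can be reproduced as follows. The sketch below eliminates $a$, $x_1$ and $b$ using Eq. 43, Eq. 42 and the condition that $B_1$ lies on both the parabola and the geodesic, which leaves a single equation in $x_2$ (the condition that $B_2$ lies on both $y = \ln x$ and the geodesic). This elimination is our own arrangement and may differ in form from the intermediate equations of the text; the bracket $[1, 2]$ for the bisection is also an assumption based on the sign change of the residual.

```python
import math

def residual(x_2):
    a = -x_2                              # Eq. 43
    x_1 = -1.0 / (2.0 * a)                # Eq. 42
    b = x_1**2 + 2.0 - a * x_1            # B1 is on y = x^2 + 2 and on y = a x + b
    return a * x_2 + b - math.log(x_2)    # B2 is on y = ln x and on y = a x + b

# Bisection on an assumed bracket where the residual changes sign.
lo, hi = 1.0, 2.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if residual(lo) * residual(mid) <= 0.0:
        hi = mid
    else:
        lo = mid

x_2 = 0.5 * (lo + hi)
a = -x_2
x_1 = -1.0 / (2.0 * a)
b = x_1**2 + 2.0 - a * x_1
y_1 = x_1**2 + 2.0
y_2 = math.log(x_2)
shortest = math.hypot(x_1 - x_2, y_1 - y_2)

print(f"x_1 = {x_1:.9f}  x_2 = {x_2:.9f}  a = {a:.9f}  b = {b:.9f}")
print(f"shortest distance = {shortest:.9f}")
```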
As we see, the last equation means that the product of the slopes of Γ and Γ1 is −1 and hence Γ and Γ1
are perpendicular.[46] So, we conclude that Γ (whose segment between A and B is of shortest length)
is straight and perpendicular to Γ1 , as required.
Figure 15: A simple sketch representing a wire $\Gamma$ that connects two points (A and B) with a bead D sliding frictionlessly along the wire under the effect of gravity alone. The bead is released from rest at point A which is at the origin of an inverted Cartesian coordinate system and hence point A has coordinates $(0, 0)$ while point B has coordinates $(x_B, y_B)$. See § 2.2.
to φ = π as will be clarified in Problem 1) on the curve is independent of the position of the starting point
(i.e. the point of release A) on the curve,[48] and hence the brachistochrone curve is also a tautochrone
(i.e. same time in Greek) or isochrone (i.e. equal time in Greek). We should also note that although the
fastest descent (or brachistochrone) problem is the most famous of its kind (and hence it is investigated
systematically in almost all variational calculus texts) there are many other similar time optimization
problems. Some of these problems (such as river crossing) can be found in the textbooks and research
papers related to the calculus of variations.
Problems
1. Solve the brachistochrone problem (as described above in the text).
Answer: Let us solve this problem using the setting of Figure 15 where point A is at the origin of an inverted Cartesian coordinate system (i.e. its $y$ axis is pointing downward) in the vertical plane. The bead is initially at rest and hence its kinetic energy is zero (therefore any energy of the bead should be potential). Now, as the bead is released and starts sliding it loses potential energy by descending into the potential well of the Earth, and hence the lost potential energy will be converted to kinetic energy according to the conservation of energy principle (or the work-energy principle). The magnitude of the lost potential energy is given by $mgy$ (where $m$ is the mass of the bead, $g$ is the magnitude of the gravitational field and $y$ is the vertical coordinate of the bead noting that point A is at the origin of coordinates), and hence by the work-energy principle we have (with $v$ being the speed of the bead):
\begin{align*}
\frac{1}{2} m v^2 &= mgy \\
v &= \sqrt{2gy} \\
\frac{ds}{dt} &= \sqrt{2gy} && (v = ds/dt) \\
dt &= \frac{ds}{\sqrt{2gy}} \\
dt &= \frac{\sqrt{1 + y'^2} \, dx}{\sqrt{2gy}} && \left(ds = \sqrt{(dx)^2 + (dy)^2} = \sqrt{1 + y'^2} \, dx\right) \tag{47}
\end{align*}
[48] It should be obvious that the starting point is where the bead starts descending (i.e. where it is initially at rest).
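\begin{align*}
\int_0^{t_B} dt &= \int_0^{x_B} \frac{\sqrt{1 + y'^2}}{\sqrt{2gy}} \, dx \\
t_B &= \frac{1}{\sqrt{2g}} \int_0^{x_B} \frac{\sqrt{1 + y'^2}}{\sqrt{y}} \, dx && (g \text{ is constant}) \tag{48}
\end{align*}
Now, since we are supposed to minimize the time $t_B$ then the integral of the last equation represents our functional integral $I[y]$. So, on comparing the last equation to Eq. 1 we can see that in this case $F \equiv \frac{\sqrt{1 + y'^2}}{\sqrt{y}}$ (noting that $1/\sqrt{2g}$ is a constant and hence it is no more than a scaling factor). On applying the Euler-Lagrange equation (noting that $F$ has no explicit dependency on $x$ and hence we can use Eq. 3) we get:
\begin{align*}
\frac{\sqrt{1 + y'^2}}{\sqrt{y}} - y' \frac{\partial}{\partial y'}\left(\frac{\sqrt{1 + y'^2}}{\sqrt{y}}\right) &= C \\
\frac{\sqrt{1 + y'^2}}{\sqrt{y}} - \frac{y'}{\sqrt{y}} \frac{2y'}{2\sqrt{1 + y'^2}} &= C \\
\frac{\sqrt{1 + y'^2}}{\sqrt{y}} - \frac{y'^2}{\sqrt{y} \sqrt{1 + y'^2}} &= C \\
\frac{1}{\sqrt{y} \sqrt{1 + y'^2}} &= C \\
\sqrt{y} \sqrt{1 + y'^2} &= D && (D = 1/C) \\
y \left(1 + y'^2\right) &= D^2 \\
y'^2 &= \frac{D^2}{y} - 1 \\
y' &= \sqrt{\frac{D^2 - y}{y}} \\
\frac{dy}{dx} &= \sqrt{\frac{D^2 - y}{y}} \\
dx &= \sqrt{\frac{y}{D^2 - y}} \, dy \\
\int_0^{x_B} dx &= \int_0^{y_B} \sqrt{\frac{y}{D^2 - y}} \, dy \\
x_B &= \int_0^{y_B} \sqrt{\frac{y}{D^2 - y}} \, dy \\
x &= \int_0^{y} \sqrt{\frac{y}{D^2 - y}} \, dy \tag{49}
\end{align*}
where the last equation is justified by the fact that the curve passes through the origin of coordinates and hence $x_B = x$ and $y_B = y$.[49]
To integrate Eq. 49 we use the substitution $y = D^2 \sin^2\theta$ and hence $dy = 2D^2 \sin\theta \cos\theta \, d\theta$. Therefore, Eq. 49 becomes:
\[ x = 2D^2 \int_0^\theta \sqrt{\frac{D^2 \sin^2\theta}{D^2 - D^2 \sin^2\theta}} \, \sin\theta \cos\theta \, d\theta \]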
[49] The use of $y$ as a variable and as a limit of integration should be noticed. This abuse of notation should be tolerated here to avoid unwanted complications.
Figure 16: A simple sketch representing a cycloid $\Gamma$ traced by a point P on the rim of a rolling wheel W. See Problem 1 of § 2.2.
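\begin{align*}
x &= 2D^2 \int_0^\theta \sqrt{\frac{\sin^2\theta}{1 - \sin^2\theta}} \, \sin\theta \cos\theta \, d\theta \\
x &= 2D^2 \int_0^\theta \sqrt{\frac{\sin^2\theta}{\cos^2\theta}} \, \sin\theta \cos\theta \, d\theta \\
x &= 2D^2 \int_0^\theta \frac{\sin\theta}{\cos\theta} \sin\theta \cos\theta \, d\theta \\
x &= D^2 \int_0^\theta 2\sin^2\theta \, d\theta \\
x &= D^2 \int_0^\theta (1 - \cos 2\theta) \, d\theta && \left(\text{using the identity } \sin^2\theta = (1 - \cos 2\theta)/2\right) \\
x &= D^2 \left(\theta - \frac{\sin 2\theta}{2}\right) \\
x &= \frac{D^2}{2} (2\theta - \sin 2\theta)
\end{align*}
Now, if we use $a = D^2/2$ and $\phi = 2\theta$ we get:
\[ x = a(\phi - \sin\phi) \qquad \text{and} \qquad y = D^2 \sin^2\theta = \left(D^2/2\right) 2\sin^2\theta = a(1 - \cos\phi) \]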
where we again use the identity sin2 θ = (1 − cos 2θ) /2. The last equation (which is a parametric
equation of x and y in terms of a constant a and an angular parameter φ) represents a cycloid. So,
the required brachistochrone curve is a segment of a cycloid.
Note: the cycloid is a curve traced by a point on the rim of a circular wheel (restricted to a plane) as
the wheel rolls on a straight line without slipping (see Figure 16; also see Problem 8). We should also
note that the optimal time in the brachistochrone problem is a minimum (rather than a maximum)
because the descent time along other curves (some of which will be investigated later) is longer.
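The claim that the cycloid gives a shorter descent time than other curves can be illustrated numerically for one competitor. The following sketch compares the descent time along the cycloid (integrating $dt = ds/v$ with the midpoint rule over the parameter $\phi$) with the descent time along the straight chord joining the same end points (obtained by evaluating Eq. 48 for $y = (y_B/x_B)x$). The values of `g` and `a` and the choice of the end point at $\phi = \pi$ are assumptions made only for demonstration.

```python
import math

g = 9.81      # assumed magnitude of the gravitational field (m/s^2)
a = 1.0       # assumed cycloid parameter a = D^2/2

def cycloid_time(phi_end, n=100000):
    """Midpoint-rule estimate of the descent time along
    x = a(phi - sin phi), y = a(1 - cos phi), starting from rest at phi = 0."""
    h = phi_end / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        ds = a * math.sqrt((1.0 - math.cos(phi))**2 + math.sin(phi)**2) * h
        v = math.sqrt(2.0 * g * a * (1.0 - math.cos(phi)))   # v = sqrt(2 g y)
        total += ds / v
    return total

def straight_line_time(x_b, y_b):
    """Descent time from rest along the chord from (0, 0) to (x_b, y_b),
    i.e. Eq. 48 evaluated for y = (y_b/x_b) x."""
    slope = y_b / x_b
    return math.sqrt(1.0 + slope**2) * 2.0 * math.sqrt(x_b / slope) / math.sqrt(2.0 * g)

phi_end = math.pi
x_b = a * (phi_end - math.sin(phi_end))
y_b = a * (1.0 - math.cos(phi_end))

t_cycloid = cycloid_time(phi_end)
t_line = straight_line_time(x_b, y_b)

print("cycloid      :", t_cycloid)
print("straight line:", t_line)
```

On the cycloid the integrand $ds/v$ reduces to the constant $\sqrt{a/g}$, so the estimate should agree with $\phi \sqrt{a/g}$ at the end point, and it should be smaller than the time along the chord.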
2. Re-solve the brachistochrone problem (i.e. Problem 1) but this time assume that the bead is not initially at rest.
Answer: Because the bead in this Problem is not initially at rest, it should have an initial speed $v_i$. On repeating the analysis of Problem 1, we should again be led to using the work-energy principle (with $v$ being the actual speed of the bead) but with an added term, that is:
\begin{align*}
\frac{1}{2} m v^2 - \frac{1}{2} m v_i^2 &= mgy \\
v &= \sqrt{2gy + v_i^2} \\
\frac{ds}{dt} &= \sqrt{2gy + v_i^2} \\
dt &= \frac{ds}{\sqrt{2gy + v_i^2}} \\
dt &= \frac{\sqrt{1 + y'^2} \, dx}{\sqrt{2gy + v_i^2}} \\
\int_0^{t_B} dt &= \int_0^{x_B} \frac{\sqrt{1 + y'^2}}{\sqrt{2gy + v_i^2}} \, dx \\
t_B &= \frac{1}{\sqrt{2g}} \int_0^{x_B} \frac{\sqrt{1 + y'^2}}{\sqrt{y + (v_i^2/2g)}} \, dx \\
t_B &= \frac{1}{\sqrt{2g}} \int_0^{x_B} \frac{\sqrt{1 + Y'^2}}{\sqrt{Y}} \, dx
\end{align*}
where $Y$ is defined by the coordinate transformation $Y = y + \frac{v_i^2}{2g}$ (and hence $y' = Y'$). As we see, the last equation is identical in form to Eq. 48 in Problem 1 and hence the solution of this Problem is also a cycloid.
Figure 17: A simple sketch representing a wire $\Gamma$ connecting a fixed point A to a given curve $\Gamma_1$ with a bead D sliding frictionlessly along the wire under the effect of gravity alone. The bead is released from rest at point A which is at the origin of an inverted Cartesian coordinate system. See Problem 3 of § 2.2.
3. Re-solve the problem of fastest descent (i.e. the brachistochrone problem) but assume this time that the bead is sliding from a fixed point A towards a given curve $\Gamma_1$ (see Figure 17).
Answer: In this Problem we are required to find the curve $\Gamma$ for which the bead (which is released from rest at a fixed point A) reaches the curve $\Gamma_1$ in the least possible time. So, this problem is of the type that we investigated in § 1.9 because the curve $\Gamma_1$ represents a variable boundary (since the end point which is free to move along this curve is not fixed).
We use in our solution the same setting as in Problem 1 (see Figure 17). If we follow similar analysis and formulation to the analysis and formulation of Problem 1, then it should be clear that $F$ again is given by $F \equiv \frac{\sqrt{1 + y'^2}}{\sqrt{y}}$ and hence the solution here should also be a cycloid connecting the fixed point A to a yet-unknown point B on the given curve $\Gamma_1$. However, we should have an additional transversality condition which is given by Eq. 23, that is:
\begin{align*}
\left[\frac{\sqrt{1 + y'^2}}{\sqrt{y}}\right] dX + (dY - y' dX) \frac{\partial}{\partial y'}\left[\frac{\sqrt{1 + y'^2}}{\sqrt{y}}\right] &= 0 \\
\frac{\sqrt{1 + y'^2}}{\sqrt{y}} \, dX + (dY - y' dX) \frac{y'}{\sqrt{y \left(1 + y'^2\right)}} &= 0 \\
\frac{1 + y'^2}{\sqrt{y \left(1 + y'^2\right)}} \, dX + (dY - y' dX) \frac{y'}{\sqrt{y \left(1 + y'^2\right)}} &= 0 \\
\frac{dX + y'^2 dX + y' dY - y'^2 dX}{\sqrt{y \left(1 + y'^2\right)}} &= 0 \\
\frac{dX + y' dY}{\sqrt{y \left(1 + y'^2\right)}} &= 0 \\
dX + y' dY &= 0 \tag{50} \\
y' &= -\frac{dX}{dY} \\
\frac{dy}{dx} &= -\frac{dX}{dY} \\
\frac{dy}{dx} \times \frac{dY}{dX} &= -1
\end{align*}
The last equation means (noting that in the transversality condition y′ belongs to Γ at B and dX
and dY belong to Γ1 at B) that the two curves Γ and Γ1 are perpendicular at B.[50] So in brief, the
solution is a cycloid curve Γ connecting the fixed point A to the curve Γ1 at a point B on Γ1 such that
the curves Γ and Γ1 are perpendicular to each other at point B.[51]
Note 1: if Γ1 is a vertical line then the perpendicularity condition requires the slope of Γ at the point
of contact to be zero (i.e. the tangent to Γ at point B is horizontal). This can be concluded from
Eq. 50 (without going through the subsequent steps) because if Γ1 is a vertical line then dX = 0 (and
dY ≠ 0) and hence Eq. 50 becomes y′ dY = 0 which leads to y′ = 0 (since dY ≠ 0).
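Note 1 can be checked numerically. The following is a minimal sketch, assuming a vertical target line x = b with b = 1 and g = 9.8 (both values, the function name and the scan grid are illustrative assumptions, not part of the text). It uses Eq. 51 for the time along a cycloid released from rest at the origin, scans the arrival parameter φ, and looks at the slope of the fastest cycloid where it meets the line.

```python
import math

def descent_time_to_vertical_line(phi_b, b=1.0, g=9.8):
    """Time to reach the vertical line x = b along a cycloid from the origin.

    phi_b is the cycloid parameter at arrival. The cycloid size a follows from
    b = a (phi_b - sin phi_b), and the time is sqrt(a/g) * phi_b (Eq. 51).
    """
    a = b / (phi_b - math.sin(phi_b))
    return math.sqrt(a / g) * phi_b

# Hypothetical scan grid: phi_b from 0.2 to about 6.2 in steps of 0.001.
grid = [0.2 + k * 0.001 for k in range(6000)]
best_phi = min(grid, key=descent_time_to_vertical_line)

# Slope of the cycloid at arrival: y' = sin(phi) / (1 - cos(phi)).
best_slope = math.sin(best_phi) / (1.0 - math.cos(best_phi))
print(best_phi, best_slope)
```

Within the grid resolution the fastest cycloid arrives at φ close to π, where the slope is close to zero, in line with the horizontal tangent stated in Note 1.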
Note 2: if Γ1 is a horizontal line (assuming this case is allowed in the formulation of the brachistochrone
problem or it is treated as a separate case) then there are many details and situations to be considered
(e.g. whether the point of contact between Γ and Γ1 is allowed to be vertically beneath the point of
release or not). In fact, some of these details should be subject to deliberation and research. However,
the simplest situation is when the point of contact is allowed to be beneath the point of release in
which case the obvious solution is a vertical straight line (i.e. free fall) where this solution can be seen
as a limiting case to the cycloid solution (noting that even this solution meets the perpendicularity
condition). Also, see Problem 10.
4. Re-solve the problem of fastest descent (i.e. the brachistochrone problem) but assume this time that
the bead is sliding from a given curve Γ1 towards a fixed point B (see the upper frame of Figure 18).
Answer: For technical reasons (which are fully explained in the literature; see for instance Weinstock
in the References), this Problem cannot be solved like Problem 3 by simple application of the above
method (with the imposition of the additional transversality condition). However, instead of going
through detailed technical analysis and formulation we can obtain the result (which is obtained tech-
nically in the literature) by a simple argument, as explained in the following paragraph.
Let the force of gravity become repulsive (instead of being attractive) and hence we are looking for
[50] It should be known from elementary calculus that when two curves are perpendicular then the product of their slopes at
the intersection point is −1.
[51] This should be understood to set the optimization condition for a generic form of Γ1. Whether this condition can or
cannot be met by a certain form of Γ1 (defined by certain conditions and restrictions according to the setting of a specific
problem) is a different story.
Figure 18: The three frames that demonstrate the setting and reasoning of Problem 4 of § 2.2. The dashed
lines in the lower frame are the tangent to Γ1 at the upper point A and the tangent to Γ at the lower
point B.
Figure 19: A simple sketch representing the setting and reasoning of Problem 5 of § 2.2.
the curve of fastest ascent (which we also label with Γ for simplicity). Accordingly, this Problem
reduces to the setting and formulation of Problem 3 where the bead is sliding from a fixed (lower) point
towards (a higher point on) a given curve Γ1 (see the middle frame of Figure 18) along a curve Γ, where
this curve (i.e. Γ) is also a cycloid but is now concave downward. So, according to the result of
Problem 3 the curve Γ (in the middle frame of Figure 18) should meet the curve Γ1 at a point where
their tangents are perpendicular. Hence, we have obtained the shape of the brachistochrone curve (and in
fact the solution) but for the setting of repulsive gravity. Now, let us return to attractive gravity.
In fact, all we need to do now is to rotate the curve Γ through π (i.e. 180°) to match the setting and
effect of attractive gravity. Accordingly, the solution of the present Problem is a cycloid (concave
upward) whose tangent at the lower point is perpendicular to the tangent of Γ1 at the upper point
(see the lower frame of Figure 18).
5. Re-solve the problem of fastest descent (i.e. the brachistochrone problem) but assume this time that
the bead is sliding from a given curve Γ1 towards a given curve Γ2 (see Figure 19).
Answer: If we follow a similar argument to that used in Problem 4 then we can say that the solution
of the present Problem can be obtained by combining the results of Problem 3 and Problem 4. In
brief, the curve Γ is a (segment of a) cycloid that connects a point A on Γ1 to a point B on Γ2. So,
from the point of view of connecting point A to curve Γ2 the perpendicularity condition of Problem 3
applies, while from the point of view of connecting curve Γ1 to point B the perpendicularity condition
of Problem 4 applies. Hence, the solution should be a cycloid that connects points A and B of equal
slope (i.e. the slope of Γ1 at point A is equal to the slope of Γ2 at point B).
Note: in fact, there are many details and considerations that should be taken into account in tackling
this Problem and establishing its rationale and formulation. So, the above argument is primitive and
rough and should be treated as a pedagogical exercise.
6. Referring to the setting of Problem 1, a bead on a cycloid (i.e. cycloid-shaped wire) given by x =
a (φ − sin φ) and y = a (1 − cos φ) started descending from point (0, 0) at a given time and reached the
lowest point on the cycloid at a later time. Find the time required for the bead to descend from point
(x1 , y1 ) to point (x2 , y2 ) where 0 ≤ x1 < x2 ≤ aπ.
Answer: We have:
$$dx = a(1-\cos\phi)\,d\phi \qquad dy = a\sin\phi\,d\phi \qquad y' \equiv \frac{dy}{dx} = \frac{a\sin\phi\,d\phi}{a(1-\cos\phi)\,d\phi} = \frac{\sin\phi}{1-\cos\phi}$$
Now, let t1 and t2 correspond to points (x1 , y1 ) and (x2 , y2 ) respectively (and similarly for φ1 and φ2 )
and the time required for the bead to descend from point (x1 , y1 ) to point (x2 , y2 ) be T = t2 − t1 .
Starting from Eq. 47, we have:
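$$
\begin{aligned}
dt &= \frac{\sqrt{1+y'^2}}{\sqrt{2gy}}\,dx \\
\int_{t_1}^{t_2} dt &= \int_{x_1}^{x_2} \sqrt{\frac{1+y'^2}{2gy}}\,dx \\
[t]_{t_1}^{t_2} &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{1+\frac{\sin^2\phi}{(1-\cos\phi)^2}}{2ga(1-\cos\phi)}}\;a(1-\cos\phi)\,d\phi \\
t_2 - t_1 &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{(1-\cos\phi)^2+\sin^2\phi}{2ga(1-\cos\phi)^3}}\;a(1-\cos\phi)\,d\phi \\
T &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{1-2\cos\phi+\cos^2\phi+\sin^2\phi}{2ga(1-\cos\phi)^3}}\;a(1-\cos\phi)\,d\phi \\
T &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{2-2\cos\phi}{2ga(1-\cos\phi)^3}}\;a(1-\cos\phi)\,d\phi \\
T &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{2a^2(1-\cos\phi)^3}{2ga(1-\cos\phi)^3}}\,d\phi \\
T &= \int_{\phi_1}^{\phi_2} \sqrt{\frac{a}{g}}\,d\phi \\
T &= \sqrt{\frac{a}{g}}\,(\phi_2-\phi_1) \qquad (51)
\end{aligned}
$$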
Note: as we see, the last equation means that the time of descent between two given points on this
curve (assuming the motion started at the origin) is proportional to the angular displacement (φ2 −φ1 ).
For example, the time of descent from the point corresponding to φ1 = 0 to the point corresponding
to φ2 = π/2 is the same as the time of descent from the point corresponding to φ1 = π/2 to the point
corresponding to φ2 = π. Now, if we note that the angular displacement is proportional to the x
coordinate of the point of contact of the “wheel” that generates the cycloid (since xw = aφ where xw
is the x coordinate of the point of contact of the “wheel”) we can imagine the movement of the bead
as if it is the result of being attached to the rim of a wheel that moves uniformly (i.e. with constant
speed) on the x axis.
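As a quick numerical cross-check of Eq. 51, the sketch below integrates the unsimplified integrand of Eq. 47 over φ with a midpoint rule and compares the result with the closed form. The function names, the values a = 1 and g = 9.8, and the number of subintervals are assumptions made for illustration only.

```python
import math

def descent_time_numeric(phi1, phi2, a=1.0, g=9.8, n=20000):
    """Midpoint-rule estimate of the integral of sqrt((1 + y'^2) / (2 g y)) dx."""
    h = (phi2 - phi1) / n
    total = 0.0
    for k in range(n):
        phi = phi1 + (k + 0.5) * h                      # midpoints avoid phi = 0 where y = 0
        y = a * (1.0 - math.cos(phi))                   # cycloid ordinate
        dx_dphi = a * (1.0 - math.cos(phi))             # dx = a (1 - cos phi) dphi
        slope = math.sin(phi) / (1.0 - math.cos(phi))   # y' on the cycloid
        total += math.sqrt((1.0 + slope**2) / (2.0 * g * y)) * dx_dphi
    return total * h

def descent_time_eq51(phi1, phi2, a=1.0, g=9.8):
    """Closed form of Eq. 51."""
    return math.sqrt(a / g) * (phi2 - phi1)

first_half = descent_time_numeric(0.0, math.pi / 2)
second_half = descent_time_numeric(math.pi / 2, math.pi)
print(first_half, second_half, descent_time_eq51(0.0, math.pi / 2))
```

The two quarter-turn intervals give the same time, which is the equal-angle property described in the Note.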
7. Show that the brachistochrone curve is also a tautochrone (or isochrone) curve.
Answer: In this answer we again use the setting of Problem 1. Our objective here is to show that the
time taken by a bead to descend to the lowest point (xl, yl) = (aπ, 2a) on the curve is independent of
the position of the point of release (from rest) on the curve. Let us note first that according to Eq. 51
the time of descent of the bead to the lowest point (i.e. the point corresponding to φ2 = π) on the
brachistochrone, assuming the bead is released (from rest) at the origin of coordinates (i.e. the point
corresponding to φ1 = 0), is $\pi\sqrt{a/g}$. Now, let us assume that instead of releasing the bead (from rest)
at the origin of coordinates the bead is released (from rest) at a lower point on the cycloid, say point
(xs, ys) where 0 < xs < xl and 0 < ys < yl, and hence Eq. 47 becomes:
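$$
\begin{aligned}
dt &= \frac{\sqrt{1+y'^2}}{\sqrt{2g(y-y_s)}}\,dx \\
\int_{t_s}^{t_l} dt &= \int_{x_s}^{x_l} \sqrt{\frac{1+y'^2}{2g(y-y_s)}}\,dx \\
[t]_{t_s}^{t_l} &= \int_{\phi_s}^{\phi_l} \sqrt{\frac{1+\frac{\sin^2\phi}{(1-\cos\phi)^2}}{2ga\left([1-\cos\phi]-[1-\cos\phi_s]\right)}}\;a(1-\cos\phi)\,d\phi \qquad \text{(see Problem 6)} \\
t_l-t_s &= \int_{\phi_s}^{\pi} \sqrt{\frac{(1-\cos\phi)^2+\sin^2\phi}{2ga(\cos\phi_s-\cos\phi)(1-\cos\phi)^2}}\;a(1-\cos\phi)\,d\phi \\
t_l-t_s &= \int_{\phi_s}^{\pi} \sqrt{\frac{2-2\cos\phi}{2ga(\cos\phi_s-\cos\phi)(1-\cos\phi)^2}}\;a(1-\cos\phi)\,d\phi \\
t_l-t_s &= \int_{\phi_s}^{\pi} \sqrt{\frac{2a^2(1-\cos\phi)^3}{2ga(\cos\phi_s-\cos\phi)(1-\cos\phi)^2}}\,d\phi \\
t_l-t_s &= \int_{\phi_s}^{\pi} \sqrt{\frac{a(1-\cos\phi)}{g(\cos\phi_s-\cos\phi)}}\,d\phi \\
t_l-t_s &= \sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \sqrt{\frac{1-\cos\phi}{\cos\phi_s-\cos\phi}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{1}{2}\sqrt{\frac{1-\cos\phi}{\cos\phi_s-\cos\phi}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{\sqrt{\frac{1-\cos\phi}{2}}}{2\sqrt{\frac{\cos\phi_s-\cos\phi}{2}}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{\sqrt{\frac{1-\cos\phi}{2}}}{2\sqrt{\frac{1+\cos\phi_s}{2}-\frac{1+\cos\phi}{2}}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{\sin(\phi/2)}{2\sqrt{\cos^2(\phi_s/2)-\cos^2(\phi/2)}}\,d\phi \qquad \text{(trigonometric identities)} \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{\sin(\phi/2)}{2\cos(\phi_s/2)\sqrt{\frac{\cos^2(\phi_s/2)-\cos^2(\phi/2)}{\cos^2(\phi_s/2)}}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \int_{\phi_s}^{\pi} \frac{\sin(\phi/2)}{2\cos(\phi_s/2)\sqrt{1-\frac{\cos^2(\phi/2)}{\cos^2(\phi_s/2)}}}\,d\phi \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \left[-\arcsin\frac{\cos(\phi/2)}{\cos(\phi_s/2)}\right]_{\phi_s}^{\pi} \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \left[-\arcsin\frac{\cos(\pi/2)}{\cos(\phi_s/2)}+\arcsin\frac{\cos(\phi_s/2)}{\cos(\phi_s/2)}\right] \\
t_l-t_s &= 2\sqrt{\frac{a}{g}} \left[-\arcsin 0+\arcsin 1\right]
\end{aligned}
$$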
Figure 20: A simple sketch representing a cycloid Γ generated by a point P on the rim of a rolling wheel
W of radius a. As can be seen, the x and y coordinates of the point P during the rolling of the wheel are
xP = aφ − a sin φ and yP = a − a cos φ. See Problem 8 of § 2.2.
$$
\begin{aligned}
t_l-t_s &= 2\sqrt{\frac{a}{g}}\left[-0+\frac{\pi}{2}\right] \\
t_l-t_s &= \pi\sqrt{\frac{a}{g}}
\end{aligned}
$$
So, according to this equation the time of descent (i.e. tl − ts ) to the lowest point is independent of the
position of the release point. In fact, this time is the same as the time of descent if the bead is released
from the origin of coordinates (as we noted earlier) which is logical since the release from the origin of
coordinates is just a special case of a (general) release point. So, we conclude that the brachistochrone
curve is also a tautochrone (or isochrone) curve, as required.
8. Show that the cycloid is a curve traced by a point on the rim of a circular wheel (restricted to a plane)
as the wheel rolls on a straight line without slipping.
Answer: From the construction of Figure 20 (where the wheel W of radius r = a generates the cycloid
Γ) we can easily see that the x and y coordinates of point P (i.e. the point that generates the cycloid)
are:
xP = aφ − a sin φ = a (φ − sin φ)
yP = a − a cos φ = a (1 − cos φ)
We note that the signs of the trigonometric functions over the entire range of φ (which is of fundamental
cycle 0 ≤ φ ≤ 2π) take care of all the possibilities.
9. Using the setting of Problem 1, find the equation of the curve of fastest descent between the points
(0, 0) and (1, 1) and plot it.
Answer: From the result of Problem 1, the curve of fastest descent is a cycloid given by the parametric
equations x = a (φ − sin φ) and y = a (1 − cos φ). Now, the point (1, 1) should satisfy these equations
and hence we have:
a (φ − sin φ) = a (1 − cos φ)
φ − sin φ = 1 − cos φ
On solving this equation (numerically or analytically) we get φ ≃ 2.412011144. On inserting this value
of φ into one of the equations of Eq. 52 we get a ≃ 0.572917037. So, the curve of fastest descent that
connects the points (0, 0) and (1, 1) is given by the parametric equations:
x = 0.572917037 (φ − sin φ) and y = 0.572917037 (1 − cos φ)
where 0 ≤ φ ≤ 2π for a complete cycle. This curve is plotted (for the range 0 ≤ φ ≤ 2π) in Figure 21.
Figure 21: Plot of the curve of fastest descent that connects the points (0, 0) and (1, 1) for the range
0 ≤ φ ≤ 2π where the point (1, 1) is marked. See Problem 9 of § 2.2.
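The numerical step of Problem 9 can be sketched as follows. This is one possible approach (plain bisection), not necessarily the solver used by the author; the bracket [2, 3] and the helper names are assumptions chosen by inspection.

```python
import math

def residual(phi):
    # Residual of phi - sin(phi) = 1 - cos(phi).
    return phi - math.sin(phi) - (1.0 - math.cos(phi))

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)

phi_end = bisect(residual, 2.0, 3.0)
a = 1.0 / (phi_end - math.sin(phi_end))   # from x = a (phi - sin phi) = 1
print(phi_end, a)
```

The printed values can be compared with φ ≃ 2.412011144 and a ≃ 0.572917037 quoted in the answer.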
10. Find the time of descent from point (0, 0) to the lowest point on the curve of Problem 9 (assuming
SI units). Also, compare the time of descent of this Problem to the time of free fall and the time of
descent on the straight line that connects the two points, i.e. point (0, 0) and the lowest point.
Answer: Regarding the time of descent from point (0, 0) to the lowest point on the curve of Problem
9, we use Eq. 51 where the point (0, 0) corresponds to φ1 = 0 while the lowest point corresponds to
φ2 = π, that is:
$$T = \sqrt{\frac{a}{g}}\,(\phi_2-\phi_1) = \sqrt{\frac{0.572917037}{9.8}}\,(\pi-0) \simeq 0.7596\ \text{s}$$
The time of free fall (i.e. vertically from ys = 0 to yl = 2a) is:
$$T = \sqrt{\frac{2}{g}(y_l-y_s)} = \sqrt{\frac{2}{9.8}(2a-0)} = \sqrt{\frac{2}{9.8}(2\times 0.572917037)} \simeq 0.4836\ \text{s}$$
Regarding the time of descent on the straight line that connects the two points, we use Eq. 47 (noting
that for a straight line the slope y′ is constant, which in our case is $y' = \frac{\Delta y}{\Delta x} = \frac{2a}{a\pi} = \frac{2}{\pi}$, and $y = y'x = \frac{2}{\pi}x$)
and hence we have:
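$$
\begin{aligned}
dt &= \frac{\sqrt{1+y'^2}}{\sqrt{2gy}}\,dx \\
dt &= \sqrt{\frac{1+\frac{4}{\pi^2}}{2g\,\frac{2}{\pi}\,x}}\,dx \\
T &= \int_0^{a\pi} \sqrt{\frac{\pi\left(1+\frac{4}{\pi^2}\right)}{4gx}}\,dx \\
T &= \sqrt{\frac{\pi\left(1+\frac{4}{\pi^2}\right)}{4g}} \int_0^{a\pi} x^{-1/2}\,dx
\end{aligned}
$$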
2.3 The Catenary 93
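$$
\begin{aligned}
T &= \sqrt{\frac{\pi\left(1+\frac{4}{\pi^2}\right)}{4g}}\,\left[2\sqrt{x}\right]_0^{a\pi} \\
T &= 2\sqrt{a\pi}\,\sqrt{\frac{\pi\left(1+\frac{4}{\pi^2}\right)}{4g}} \\
T &= \sqrt{\frac{a\pi^2\left(1+\frac{4}{\pi^2}\right)}{g}} \\
T &= \sqrt{\frac{0.572917037\times\pi^2\left(1+\frac{4}{\pi^2}\right)}{9.8}} \\
T &\simeq 0.90046\ \text{s}
\end{aligned}
$$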
As we see, the shortest is the free fall time, followed by the time of descent on the brachistochrone
curve, followed by the time of descent on the straight line.
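The three times can be reproduced with a few lines of arithmetic. In the sketch below the straight-line time is computed independently from inclined-plane kinematics (constant acceleration g sin α along the chord) rather than from Eq. 47; that alternative route and the variable names are assumptions made for illustration.

```python
import math

a, g = 0.572917037, 9.8                      # cycloid parameter from Problem 9, SI units

t_cycloid = math.pi * math.sqrt(a / g)       # Eq. 51 with phi running from 0 to pi
t_free_fall = math.sqrt(2.0 * (2.0 * a) / g) # vertical drop through the height 2a

# Straight chord from (0, 0) to (a*pi, 2a): chord = (1/2) g sin(alpha) t^2,
# with sin(alpha) = 2a / chord.
chord = math.hypot(a * math.pi, 2.0 * a)
t_line = math.sqrt(2.0 * chord**2 / (g * 2.0 * a))

print(t_free_fall, t_cycloid, t_line)
```

The ordering of the printed values matches the conclusion of the answer: free fall, then the brachistochrone, then the straight line.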
11. Summarize the main properties of the brachistochrone curve that we obtained in this section.
Answer: We note the following:
• It is the curve of fastest descent between two points (within the stated conditions and considering
variant cases). This property is a “consequence” of its characteristic as a brachistochrone (see the text
and Problem 1).
• The time of descent on this curve is proportional to the angular displacement (see Problem 6).
• The brachistochrone curve is also a tautochrone (i.e. same time) or isochrone (i.e. equal time) curve
(see Problem 7).
[52] We note that many (if not most) problems in physics and mathematics (including the calculus of variations) can be solved
by various methods and techniques and hence the problem of catenary is not an exception in this regard. However, the
catenary problem is used here as a case study (due to its simplicity and wide use) for demonstration and practice. Some
(few) other Problems in this book are also solved by different methods and techniques for the same reasons.
Figure 22: A schematic illustration of the setting of the problem of catenary where a chain Γ of uniform
linear density is hanging freely from its two fixed ends (A and B) in a uniform gravitational field g. See
Problem 1 of § 2.3.
Figure 23: Illustration of the setting of the catenary problem (for the solution by ordinary calculus). See
Problem 3 of § 2.3.
|Fτ | sin φ = Fv
On dividing the second equation by the first we get:
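$$
\begin{aligned}
\tan\phi &= \frac{F_v}{F_h} \\
\frac{dy}{dx} &= \frac{g\mu s}{F_h} \\
y' &= \frac{g\mu}{F_h}\,s \\
\frac{dy'}{dx} &= \frac{g\mu}{F_h}\,\frac{ds}{dx} \\
\frac{dy'}{dx} &= \frac{g\mu}{F_h}\,\sqrt{1+y'^2} \\
\frac{dy'}{\sqrt{1+y'^2}} &= \frac{g\mu}{F_h}\,dx \\
\operatorname{arcsinh}(y') &= \frac{g\mu}{F_h}\,x + E \qquad (E \text{ is constant}) \\
y' &= \sinh\left(\frac{g\mu}{F_h}\,x + E\right) \\
y' &= \sinh\left(\frac{x}{\frac{F_h}{g\mu}} + E\right) \\
y' &= \sinh\left(\frac{x}{\frac{F_h}{g\mu}} + \frac{\frac{F_h}{g\mu}E}{\frac{F_h}{g\mu}}\right) \\
y' &= \sinh\left(\frac{x + \frac{F_h}{g\mu}E}{\frac{F_h}{g\mu}}\right) \\
y &= \frac{F_h}{g\mu}\cosh\left(\frac{x + \frac{F_h}{g\mu}E}{\frac{F_h}{g\mu}}\right) \\
y &= C\cosh\frac{x-D}{C}
\end{aligned}
$$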
where $C = \frac{F_h}{g\mu}$ and $D = -\frac{F_h}{g\mu}E$. This is the same as Eq. 55 and hence the solution from ordinary
calculus is the same as the solution from the calculus of variations (i.e. hyperbolic cosine or catenary).
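A simple way to sanity-check the result is to substitute y = C cosh((x − D)/C) back into the differential equation dy′/dx = (gμ/Fh) sqrt(1 + y′²), written with C = Fh/(gμ). The sketch below does this with central finite differences; the sample values C = 1.5 and D = 0.3, the step h and the function names are arbitrary assumptions.

```python
import math

def catenary(x, C=1.5, D=0.3):
    """Catenary y = C cosh((x - D) / C)."""
    return C * math.cosh((x - D) / C)

def ode_residual(x, C=1.5, D=0.3, h=1e-4):
    """Residual of y'' - (1/C) sqrt(1 + y'^2), using central differences."""
    y_plus = catenary(x + h, C, D)
    y_mid = catenary(x, C, D)
    y_minus = catenary(x - h, C, D)
    y1 = (y_plus - y_minus) / (2.0 * h)             # first derivative
    y2 = (y_plus - 2.0 * y_mid + y_minus) / h**2    # second derivative
    return y2 - math.sqrt(1.0 + y1**2) / C

residuals = [ode_residual(x) for x in (-1.0, 0.0, 1.0, 2.0)]
print(residuals)
```

The residuals are at the level of the finite-difference error, which is consistent with the catenary satisfying the equation.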
4. Given that the catenary of Problem 1 passes through the boundary points (1, 1) and (2, 2), find the
equation of this catenary and plot it.
Answer: We solve Eq. 55 (or Eq. 54) for D, that is:
$$D = x - C\,\operatorname{arccosh}\frac{y}{C}$$
On substituting the coordinates of the two boundary points into this equation we get:
$$D = 1 - C\,\operatorname{arccosh}\frac{1}{C} \qquad (57)$$
$$D = 2 - C\,\operatorname{arccosh}\frac{2}{C} \qquad (58)$$
On subtracting Eq. 57 from Eq. 58 we get:
$$1 - C\,\operatorname{arccosh}\frac{2}{C} + C\,\operatorname{arccosh}\frac{1}{C} = 0$$
On solving this equation for C (using a numerical solver) we get C ≃ 0.949988827 and hence D ≃
0.693083213 (where we use Eq. 57 or Eq. 58 to find D). Accordingly, the equation of this catenary is:
$$y \simeq 0.949988827\cosh\frac{x-0.693083213}{0.949988827} \simeq 0.949988827\cosh\left(1.052643959\,x-0.729569857\right)$$
The catenary is plotted in Figure 24.
Figure 24: Plot of the (segment of the) catenary that passes through the boundary points (1, 1) and (2, 2).
See Problem 4 of § 2.3.
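The numerical solution for C and D in Problem 4 can be sketched with bisection on the equation obtained by subtracting Eq. 57 from Eq. 58. The bracket [0.5, 1] is an assumption chosen by inspection (the residual changes sign on it), and bisection is only one possible choice of solver.

```python
import math

def residual(C):
    # Eq. 58 minus Eq. 57: 1 - C arccosh(2/C) + C arccosh(1/C) = 0.
    return 1.0 - C * math.acosh(2.0 / C) + C * math.acosh(1.0 / C)

lo, hi = 0.5, 1.0                       # hypothetical bracket, C <= 1 keeps arccosh(1/C) real
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if residual(lo) * residual(mid) <= 0.0:
        hi = mid                        # sign change in the lower half
    else:
        lo = mid                        # sign change in the upper half
C = 0.5 * (lo + hi)
D = 1.0 - C * math.acosh(1.0 / C)       # Eq. 57

def catenary(x):
    return C * math.cosh((x - D) / C)

print(C, D, catenary(1.0), catenary(2.0))
```

The resulting curve should pass through (1, 1) and (2, 2), which gives a direct check on the computed constants.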
2.4 Isoperimetric Problems 98
[54] In fact, most of the Problems in the present section are about curves of given length that enclose or border optimal areas
although in some Problems we investigated the opposite (i.e. given areas enclosed by curves of optimal length) which
may be more appropriate for the classification and inclusion of this section in the “Optimal Curves” chapter.
[55] In this Problem the two end points of the curve are not fixed, i.e. the Problem is based on assuming a loose curve whose
end points are free to move on a straight line such that the enclosed area is maximum. In fact, fixing one point and
leaving the other free should be sufficient.
Figure 25: A simple sketch demonstrating a planar curve Γ of a given length l that encloses maximum
area σ between it and the x axis. See Problem 1 of § 2.4.
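$$
\begin{aligned}
\frac{dy}{ds} &= \pm\sqrt{1-\left(\frac{y}{C}\right)^2} \\
\frac{\pm dy}{\sqrt{1-\left(\frac{y}{C}\right)^2}} &= ds \\
\pm C\arcsin\frac{y}{C} &= s + D \\
y &= \pm C\sin\frac{s+D}{C}
\end{aligned}
$$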
Now, at point A we have s = 0 and y = 0 and at point B we have s = l and y = 0, that is:
$$0 = \pm C\sin\frac{D}{C} \qquad\text{and}\qquad 0 = \pm C\sin\frac{l+D}{C}$$
that is:
$$\frac{D}{C} = m\pi \qquad\text{and}\qquad \frac{l+D}{C} = \frac{l}{C}+m\pi = n\pi \qquad (m \text{ and } n \text{ are integers})$$
So, $C = \frac{l}{(n-m)\pi}$ and $D = Cm\pi = \frac{lm\pi}{(n-m)\pi} = \frac{lm}{n-m}$. Now, to simplify the solution we make n − m = 1
(because m and n are arbitrary) and make D = 0 (because it is just a shift in s and hence it is also
arbitrary).[56] We also note that the curve is above the x axis (and hence y is positive). Accordingly,
the solution becomes:
$$y = \frac{l}{\pi}\sin\frac{\pi s}{l} \qquad (0 \le s \le l)$$
Now, we have:
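$$
\begin{aligned}
\frac{dy}{ds} &= \cos\frac{\pi s}{l} \\
\left(\frac{dy}{ds}\right)^2 &= \cos^2\frac{\pi s}{l} \\
1-\left(\frac{dx}{ds}\right)^2 &= \cos^2\frac{\pi s}{l} \qquad \left[(ds)^2 = (dx)^2+(dy)^2\right]
\end{aligned}
$$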
[56] In fact, this may be seen as choosing a specific solution more than a simplification although this should not affect the
generality of the final result.
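$$
\begin{aligned}
\left(\frac{dx}{ds}\right)^2 &= 1-\cos^2\frac{\pi s}{l} \\
\left(\frac{dx}{ds}\right)^2 &= \sin^2\frac{\pi s}{l} \\
\frac{dx}{ds} &= \pm\sin\frac{\pi s}{l} \\
x &= \mp\frac{l}{\pi}\cos\frac{\pi s}{l} + E \qquad (E \text{ is constant}) \\
x-E &= \mp\frac{l}{\pi}\cos\frac{\pi s}{l}
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
(x-E)^2+y^2 &= \left(\frac{l}{\pi}\right)^2\cos^2\frac{\pi s}{l}+\left(\frac{l}{\pi}\right)^2\sin^2\frac{\pi s}{l} \\
(x-E)^2+y^2 &= \left(\frac{l}{\pi}\right)^2\left[\cos^2\frac{\pi s}{l}+\sin^2\frac{\pi s}{l}\right] \\
(x-E)^2+y^2 &= \left(\frac{l}{\pi}\right)^2 \qquad (59)
\end{aligned}
$$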
which is an equation of a circle with center (E, 0) and radius l/π. Accordingly, the curve has the shape
of a circular arc. In fact, the curve is a semi-circle of length l and center (E, 0).
Note: the optimal area in this Problem is a maximum (not a minimum) because the enclosed area
can approach zero when the curve Γ approaches the straight line (and hence Γ becomes effectively
straight). The enclosed area can also approach zero in another extreme case when the two end points
of the curve approach each other and the two sides of the curve also approach each other. This result
will also be confirmed in Problem 2 (assuming the curve to be a circular arc which is the result that
we obtained in the present Problem). We should also note that if the curve is shaped as a complete
circle (i.e. with A and B being the same point) then its area (which is $\sigma = \frac{l^2}{4\pi}$) is half the area of the
semi-circle (which is $\sigma = \frac{l^2}{2\pi}$).
2. Assuming that the curve in Problem 1 is a circular arc, show formally that it is a semi-circle.[57]
Answer: The area of a segment of a circle is given by:
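$$\sigma = \frac{1}{2}(\theta-\sin\theta)\,r^2 = \frac{1}{2}(\theta-\sin\theta)\,\frac{l^2}{\theta^2} = \frac{l^2}{2}\left(\frac{1}{\theta}-\frac{\sin\theta}{\theta^2}\right)$$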
where in the second step we used r = l/θ (since l = rθ with r being the radius of the circle, θ being the
central angle subtending the segment, and l being the length of the circular arc subtending θ). Now,
to find the optimal value of σ we take its derivative with respect to θ (since it is the variable noting
that l is fixed) and set it to zero to obtain the optimization condition, that is:
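$$
\begin{aligned}
\frac{2}{l^2}\,\frac{d\sigma}{d\theta} &= 0 \\
-\frac{1}{\theta^2}-\frac{\cos\theta}{\theta^2}+2\,\frac{\sin\theta}{\theta^3} &= 0 \\
\theta+\theta\cos\theta-2\sin\theta &= 0
\end{aligned}
$$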
The solution of this equation in the interval (0, 2π] is θ = π, i.e. the segment of optimal area subtends
an angle θ = π which means that the circular arc is a semi-circle and the segment is a semi-disc. This
result can also be confirmed graphically by plotting σ as a function of θ (see Figure 26 where σ is given
in units of l2 ).
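The stationarity condition can be solved numerically as a cross-check. The sketch below applies bisection to θ + θ cos θ − 2 sin θ = 0 on the bracket [2, 4] (an assumption chosen by inspection) and evaluates σ/l² at the root; the function names are illustrative.

```python
import math

def sigma_over_l2(theta):
    """Segment area divided by l^2 for an arc of length l and central angle theta."""
    return 0.5 * (1.0 / theta - math.sin(theta) / theta**2)

def stationarity(theta):
    return theta + theta * math.cos(theta) - 2.0 * math.sin(theta)

lo, hi = 2.0, 4.0                       # stationarity is negative at 2 and positive at 4
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if stationarity(mid) < 0.0:
        lo = mid
    else:
        hi = mid
theta_opt = 0.5 * (lo + hi)

print(theta_opt, sigma_over_l2(theta_opt))
```

The root agrees with θ = π, and the corresponding value of σ/l² agrees with 1/(2π), the area of the semi-disc in units of l².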
[57] Noting the setting of Problem 1, Eq. 59 should be sufficient for this demonstration. So, the present Problem is about
another way for the required demonstration.
Figure 26: Plot of σ (in units of l²) as a function of θ where the peak of the curve at $\left(\pi, \frac{1}{2\pi}\right)$ is marked.
See Problem 2 of § 2.4.
We should finally note that we can show formally that the optimal area in this Problem is a maximum
(not a minimum) by employing the second derivative test, that is:
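$$
\begin{aligned}
\left.\frac{d^2\sigma}{d\theta^2}\right|_{\theta=\pi} &= \frac{l^2}{2}\left[\frac{2}{\theta^3}+\frac{2\cos\theta}{\theta^3}+\frac{\sin\theta}{\theta^2}-\frac{6\sin\theta}{\theta^4}+\frac{2\cos\theta}{\theta^3}\right]_{\theta=\pi} \\
&= \frac{l^2}{2}\left[\frac{2}{\pi^3}+\frac{2\cos\pi}{\pi^3}+\frac{\sin\pi}{\pi^2}-\frac{6\sin\pi}{\pi^4}+\frac{2\cos\pi}{\pi^3}\right] \\
&= \frac{l^2}{2}\left[\frac{2}{\pi^3}-\frac{2}{\pi^3}+0-0-\frac{2}{\pi^3}\right] \\
&= -\frac{l^2}{\pi^3} < 0
\end{aligned}
$$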
Hence, the optimal area is a maximum.
3. Given that the length of the curve in Problem 1 is l = 5, find the maximum enclosed area.
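Answer: As seen in Problems 1 and 2, the curve is a semi-circle of radius $r = \frac{l}{\pi}$ and hence the
maximum enclosed area is:
$$\sigma = \frac{1}{2}\pi r^2 = \frac{1}{2}\pi\,\frac{l^2}{\pi^2} = \frac{l^2}{2\pi} = \frac{5^2}{2\pi} = \frac{25}{2\pi} \simeq 3.97887$$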
2 2 π 2π 2π 2π
4. Find the shape of a planar curve Γ of a given length l that has maximum area beneath it (and above
the x axis) with two fixed end points A and B (see Figure 27).
Answer: We are supposed to maximize the area beneath the curve subject to the constraint that
the length of the curve is equal to l and hence we use the Lagrange multipliers technique (see § 1.8).
Now, the area is given by the integral $\int_{x_1}^{x_2} y\,dx$ while the length of the curve is given by the integral
Figure 27: A simple sketch demonstrating a planar curve Γ of a given length l (connecting two fixed points
A and B) that has maximum area σ beneath it. See Problem 4 of § 2.4.
$\int_\Gamma \sqrt{(dx)^2+(dy)^2} = \int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx$. So, $F = y$ and $G = \sqrt{1+y'^2}$ and hence $H \equiv F+\lambda G = y+\lambda\sqrt{1+y'^2}$. Using Eq. 22, we have:[58]
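$$
\begin{aligned}
\frac{\partial H}{\partial y}-\frac{d}{dx}\frac{\partial H}{\partial y'} &= 0 \\
\frac{\partial}{\partial y}\left[y+\lambda\sqrt{1+y'^2}\right]-\frac{d}{dx}\frac{\partial}{\partial y'}\left[y+\lambda\sqrt{1+y'^2}\right] &= 0 \\
1-\frac{d}{dx}\left(\frac{\lambda y'}{\sqrt{1+y'^2}}\right) &= 0 \\
\frac{d}{dx}\left(\frac{\lambda y'}{\sqrt{1+y'^2}}\right) &= 1 \\
\frac{\lambda y'}{\sqrt{1+y'^2}} &= x+C_1 \\
\lambda^2 y'^2 &= (x+C_1)^2\left(1+y'^2\right) \\
y'^2\left[\lambda^2-(x+C_1)^2\right] &= (x+C_1)^2 \\
y' &= \frac{x+C_1}{\sqrt{\lambda^2-(x+C_1)^2}} \qquad (60) \\
y &= -\sqrt{\lambda^2-(x+C_1)^2}+C_2 \\
(x+C_1)^2+(y-C_2)^2 &= \lambda^2 \qquad (61)
\end{aligned}
$$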
As we see, the last line is the equation of a circular arc with center (−C1 , C2 ) and radius |λ|.
Note 1: the three parameters C1 , C2 , λ in the above solution can be determined from the two boundary
conditions (i.e. the coordinates of the end points A and B) plus the constraint on the length l (see
[58] In fact, we can also use the Beltrami identity (Eq. 3) with H replacing F.
Problem 5).
Note 2: assuming the curve is concave downward, the optimal area is a maximum (not a minimum)
because the area can become a minimum when the curve is concave upward. In fact, the technicalities
of the problem (supported by the associated conditions) should take care of these issues.
Note 3: we are assuming certain restrictions on the length of the curve (i.e. limits on its minimum
and maximum) for this Problem and its solution to apply. The details can be easily worked out in
each particular problem.
5. Given that the curve in Problem 4 passes through the points (2, 4) and (5, 1) and the length of the arc
is $\frac{3\pi}{2}$, determine the unknown constants in the solution and hence obtain the specific solution. Also,
plot the curve and find the area beneath the arc.
Answer: On inserting the coordinates of the points (2, 4) and (5, 1) into Eq. 61 we get:
$$(2+C_1)^2+(4-C_2)^2 = \lambda^2 \qquad (62)$$
$$(5+C_1)^2+(1-C_2)^2 = \lambda^2 \qquad (63)$$
$$\frac{3\pi}{2} = \sqrt{29+14C_1+2C_1^2}\left[\arcsin\left(\frac{5+C_1}{\sqrt{29+14C_1+2C_1^2}}\right)-\arcsin\left(\frac{2+C_1}{\sqrt{29+14C_1+2C_1^2}}\right)\right]$$
where in the last equation we substituted for λ from Eq. 65. On using a numerical solver we find
C1 = −2 and hence C2 = 1 (from Eq. 64) and λ = 3 (from Eq. 65). Therefore, the specific solution
is $(x-2)^2+(y-1)^2 = 9$. The circle is plotted in Figure 28. Regarding the area beneath the arc, it
is obvious that the central angle that subtends the arc, i.e. the angle between the points (2, 4), (2, 1)
and (5, 1), is a right angle and hence the area of this sector is one quarter of the area of the circle, i.e.
it is $\frac{9\pi}{4}$. We should also add the area of the rectangle whose vertices are the points (2, 0), (5, 0), (5, 1)
and (2, 1) which is equal to 1 × 3 = 3. Therefore, the area beneath the arc is $\sigma = \frac{9\pi}{4}+3 \simeq 10.06858$.
Figure 28: Plot of the circle $(x-2)^2+(y-1)^2 = 9$ where the area σ under the circular arc is shown in
gray. See Problem 5 of § 2.4.
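The quoted constants of Problem 5 can be verified directly. The sketch below does not re-solve the system; it only checks that C1 = −2, C2 = 1 and λ = 3 satisfy Eqs. 62 and 63, reproduce the arc length 3π/2, and give the stated area. The midpoint-rule integration and all variable names are assumptions for illustration.

```python
import math

C1, C2, lam = -2.0, 1.0, 3.0             # constants quoted in the answer
A, B = (2.0, 4.0), (5.0, 1.0)            # the two end points
centre = (-C1, C2)                       # centre of the circle of Eq. 61

def on_circle(p):
    """True if p satisfies (x + C1)^2 + (y - C2)^2 = lambda^2."""
    return math.isclose((p[0] + C1)**2 + (p[1] - C2)**2, lam**2)

# Central angle between the radii to A and B; arc length = radius * angle.
ang_A = math.atan2(A[1] - centre[1], A[0] - centre[0])
ang_B = math.atan2(B[1] - centre[1], B[0] - centre[0])
arc_length = lam * abs(ang_A - ang_B)

# Midpoint-rule area under the upper arc y = C2 + sqrt(lambda^2 - (x + C1)^2).
n = 200000
h = (B[0] - A[0]) / n
area = h * sum(C2 + math.sqrt(lam**2 - (A[0] + (k + 0.5) * h + C1)**2) for k in range(n))

print(on_circle(A), on_circle(B), arc_length, area)
```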
6. Find the shape of the closed plane curve of a given length that encloses maximum area.
Answer: We use a 2D Cartesian system (in the plane of the curve) whose origin O is inside the curve
(see Figure 29). Now, an infinitesimal sector $O\overset{\frown}{AB}$ can be represented by a triangle OAB and hence
its area is given by:
$$d\sigma = \frac{1}{2}\left|\mathbf{r}\times d\mathbf{r}\right| = \frac{1}{2}\left|(x,y,0)\times(dx,dy,0)\right| = \frac{1}{2}(x\,dy-y\,dx)$$
Therefore, the area inside the curve is:
$$\sigma = \oint d\sigma = \frac{1}{2}\oint (x\,dy-y\,dx) = \frac{1}{2}\oint (xy'-y)\,dx$$
Figure 29: A simple sketch representing a closed planar curve Γ enclosing an area σ. See Problem 6 of §
2.4.
Similarly, the length of the curve is given by the integral $\oint\sqrt{(dx)^2+(dy)^2} = \oint\sqrt{1+y'^2}\,dx$.
Now, we are required to maximize the area subject to the constraint that the length is constant. So,
we use the Lagrange multipliers technique (see § 1.8) where $H \equiv F+\lambda G = \frac{1}{2}(xy'-y)+\lambda\sqrt{1+y'^2}$.
Using the Euler-Lagrange equation (Eq. 22), we have:
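$$
\begin{aligned}
\frac{\partial}{\partial y}\left[\frac{1}{2}(xy'-y)+\lambda\sqrt{1+y'^2}\right]-\frac{d}{dx}\frac{\partial}{\partial y'}\left[\frac{1}{2}(xy'-y)+\lambda\sqrt{1+y'^2}\right] &= 0 \\
-\frac{1}{2}-\frac{d}{dx}\left(\frac{x}{2}+\frac{\lambda y'}{\sqrt{1+y'^2}}\right) &= 0 \\
\frac{d}{dx}\left(\frac{x}{2}+\frac{\lambda y'}{\sqrt{1+y'^2}}\right) &= -\frac{1}{2} \\
\frac{x}{2}+\frac{\lambda y'}{\sqrt{1+y'^2}} &= -\frac{x}{2}+C \\
\frac{\lambda y'}{\sqrt{1+y'^2}} &= C-x \\
\lambda^2 y'^2 &= (C-x)^2\left(1+y'^2\right) \\
\lambda^2 y'^2-(C-x)^2 y'^2 &= (C-x)^2 \\
y'^2 &= \frac{(C-x)^2}{\lambda^2-(C-x)^2} \\
y' &= \pm\frac{C-x}{\sqrt{\lambda^2-(C-x)^2}} \\
y &= \pm\sqrt{\lambda^2-(C-x)^2}+D \\
(C-x)^2+(y-D)^2 &= \lambda^2 \\
(x-C)^2+(y-D)^2 &= \lambda^2
\end{aligned}
$$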
This is an equation of a circle with center (C, D) and radius |λ|. So, the shape of the curve that
maximizes the enclosed area is a circle (noting that the solution cannot be a minimum because the
area enclosed by a squeezed shape can approach zero). In fact, this solution is also intuitive and hence
it was reached even by the ancient scholars using very simple arguments and reasoning.
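The result can be illustrated (though not proved) by comparing the circle with regular polygons of the same perimeter. The sketch below uses the standard formula l²/(4n tan(π/n)) for the area of a regular n-gon of perimeter l; the function names and the chosen perimeter are assumptions for illustration.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular polygon with n_sides sides and the given perimeter."""
    return perimeter**2 / (4.0 * n_sides * math.tan(math.pi / n_sides))

def circle_area(perimeter):
    """Area of a circle whose circumference equals the given perimeter."""
    return perimeter**2 / (4.0 * math.pi)

l = 1.0                                          # hypothetical fixed length
polygon_areas = [regular_polygon_area(n, l) for n in (3, 4, 6, 12, 100)]
print(polygon_areas, circle_area(l))
```

The polygon areas increase with the number of sides and stay below the circle area, which they approach as the polygon approaches a circle.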
Figure 30: A simple sketch demonstrating the setting of Problem 8 of § 2.4 where a planar curve Γ of
shortest length (connecting two fixed points A and B) encloses fixed area σ between it and the x axis.
7. Find the equation of the closed plane curve of length l ≃ 4.18879 that encloses maximum area σ and
passes through the points (−0.5, −0.83333) and (0, −1.05904). Also find the enclosed area.
Answer: From the answer of Problem 6 we know that this curve is a circle of radius λ and circumference
l. Accordingly, l = 4.18879 = 2πλ and hence λ = 0.66667 = 2/3. Therefore, the equation of the curve
becomes $(x-C)^2+(y-D)^2 = \frac{4}{9}$. On substituting the coordinates of the points (−0.5, −0.83333) and
(0, −1.05904) into this equation we get:
$$(-0.5-C)^2+(-0.83333-D)^2 = \frac{4}{9}$$
$$C^2+(-1.05904-D)^2 = \frac{4}{9}$$
On solving these equations simultaneously (e.g. by substitution from the second into the first) we get
C = −0.5 and D = −1.5.[59] So, the equation of the curve is $(x+0.5)^2+(y+1.5)^2 = \frac{4}{9}$. The enclosed
area is $\sigma = \pi\lambda^2 = \frac{4\pi}{9} \simeq 1.396263$.
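The two simultaneous equations can also be solved geometrically: the centre of a circle through two points lies on the perpendicular bisector of the chord joining them. The sketch below uses this construction (an alternative to the substitution mentioned in the answer; the function name is hypothetical) and returns both possible centres.

```python
import math

def circle_centres(p, q, radius):
    """Both centres of the circles of the given radius passing through p and q."""
    mx, my = 0.5 * (p[0] + q[0]), 0.5 * (p[1] + q[1])   # midpoint of the chord
    dx, dy = q[0] - p[0], q[1] - p[1]
    chord = math.hypot(dx, dy)
    h = math.sqrt(radius**2 - (0.5 * chord)**2)         # distance from midpoint to centre
    ux, uy = -dy / chord, dx / chord                    # unit vector normal to the chord
    return (mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)

centres = circle_centres((-0.5, -0.83333), (0.0, -1.05904), 2.0 / 3.0)
print(centres)
```

One centre corresponds to the solution C = −0.5, D = −1.5 used in the answer, and the other to the alternative solution mentioned in footnote [59].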
8. Find the shape of the planar curve of shortest length that connects two given points say point (x1 , y1 )
and point (x2 , y2 ) such that the area beneath it (and above the x axis) is a given constant (see Figure
30).
Answer: We are required to minimize the arc length s of the curve Γ which is represented by the
function y = y(x) subject to the area constraint σ = a (where a is a given constant). Now, the
length is given by $s = \int_\Gamma ds = \int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx \equiv I_1$ while the area is given by $\sigma = \int_{x_1}^{x_2} y\,dx \equiv I_2$.
So, on using the Lagrange multipliers technique (see § 1.8) with $F \equiv \sqrt{1+y'^2}$ and $G \equiv y$ we have
$H = F+\lambda G = \sqrt{1+y'^2}+\lambda y$. Hence, the Euler-Lagrange equation (noting that H is independent of
x and hence we can use Eq. 3 with H replacing F) is:
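$$
\begin{aligned}
H-y'\frac{\partial H}{\partial y'} &= C \\
\left[\sqrt{1+y'^2}+\lambda y\right]-y'\frac{\partial}{\partial y'}\left[\sqrt{1+y'^2}+\lambda y\right] &= C \\
\sqrt{1+y'^2}+\lambda y-y'\frac{y'}{\sqrt{1+y'^2}} &= C
\end{aligned}
$$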
[59] In fact, there is another solution (i.e. C = 0 and D ≃ −0.392373) which we do not consider here.
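$$
\begin{aligned}
1+y'^2+\lambda y\sqrt{1+y'^2}-y'^2 &= C\sqrt{1+y'^2} \\
1+\lambda y\sqrt{1+y'^2} &= C\sqrt{1+y'^2} \\
\sqrt{1+y'^2} &= \frac{-1}{\lambda y-C} \\
1+y'^2 &= \frac{1}{(\lambda y-C)^2} \\
y' &= \pm\sqrt{\frac{1}{(\lambda y-C)^2}-1} \\
\pm\frac{dy}{\sqrt{\frac{1}{(\lambda y-C)^2}-1}} &= dx \\
\mp\frac{\sqrt{1-(\lambda y-C)^2}}{\lambda} &= x+D \\
\frac{1-(\lambda y-C)^2}{\lambda^2} &= (x+D)^2 \\
\frac{1}{\lambda^2}-\left(y-\frac{C}{\lambda}\right)^2 &= (x+D)^2 \\
(x+D)^2+\left(y-\frac{C}{\lambda}\right)^2 &= \frac{1}{\lambda^2}
\end{aligned}
$$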
As we see, this is an arc of a circle with center (−D, C/λ) and radius |1/λ|.
Note: the constants C, D, λ can be determined from the information about the two boundary points
and the constraint $\sigma = \int_{x_1}^{x_2} y\,dx = a$ (see Problem 9).
9. Given that the curve in Problem 8 passes through the points $(-\sqrt{2}, 0)$ and $(\sqrt{2}, 0)$ and the area beneath
it and above the x axis is 1.1416, determine the unknown constants in the solution and hence obtain
the specific solution. Also, plot the curve and find the length of the arc.
Answer: As we found in the answer of Problem 8, the curve is (an arc of) a circle and hence its
equation is $(x-c_1)^2+(y-c_2)^2 = R^2$ with $(c_1, c_2)$ being the center and R being the radius. So,
instead of determining C, D, λ we determine c1, c2, R and hence obtain the specific solution. Now,
from the symmetry of the circular segment as can be seen from the points $(-\sqrt{2}, 0)$ and $(\sqrt{2}, 0)$ we
know that the center of the circle is on the y axis and hence c1 = 0. Thus, the equation of the circle
becomes $x^2+(y-c_2)^2 = R^2$. We also know from elementary geometry that the area σ of a segment
of a circle with radius R is given by:
$$\sigma = \frac{1}{2}\left[2R^2\arcsin\left(\frac{l}{2R}\right)-l\sqrt{R^2-\left(\frac{l}{2}\right)^2}\right]$$
where l is the length of the chord (i.e. the straight base of the segment) which in our case is equal
to $2\sqrt{2}$. On using a numerical solver (with σ = 1.1416 and $l = 2\sqrt{2}$) we find R = 2 and hence the
equation of the circle becomes $x^2+(y-c_2)^2 = 4$. We finally use one point, say $(\sqrt{2}, 0)$, to find c2,
that is:
$$\left(\sqrt{2}\right)^2+(0-c_2)^2 = 4 \qquad\text{and hence}\qquad c_2 = -\sqrt{2}$$
where we take the negative root for obvious geometric considerations. Therefore, the specific solution
is $x^2+\left(y+\sqrt{2}\right)^2 = 4$. The circle is plotted in Figure 31. Regarding the length of the arc, it is obvious
that the central angle that subtends the arc, i.e. the angle between the points $(-\sqrt{2}, 0)$, $(0, -\sqrt{2})$ and
$(\sqrt{2}, 0)$, is a right angle and hence the arc is one quarter of the perimeter, i.e. it is π.
Figure 31: Plot of the circle $x^2+\left(y+\sqrt{2}\right)^2 = 4$ where the segment of area σ under the circular arc is
shown in gray. See Problem 9 of § 2.4.
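The numerical step for R in Problem 9 can be sketched with bisection on the segment-area formula. The upper bracket R = 10, the clamping of rounding errors and the variable names are assumptions; the sketch relies on the area of the (minor) segment decreasing as R grows for a fixed chord.

```python
import math

def segment_area(R, chord):
    """Area of the circular segment cut off by a chord, for a circle of radius R."""
    ratio = min(1.0, chord / (2.0 * R))              # guard against rounding above 1
    inner = max(0.0, R**2 - (0.5 * chord)**2)        # guard against rounding below 0
    return 0.5 * (2.0 * R**2 * math.asin(ratio) - chord * math.sqrt(inner))

chord, target = 2.0 * math.sqrt(2.0), 1.1416
lo, hi = 0.5 * chord, 10.0       # from the semi-disc (largest area) to a nearly flat arc
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if segment_area(mid, chord) > target:
        lo = mid                 # area still too large, so the radius must grow
    else:
        hi = mid
R = 0.5 * (lo + hi)

c2 = -math.sqrt(R**2 - (0.5 * chord)**2)             # centre below the x axis
arc_length = 2.0 * R * math.asin(chord / (2.0 * R))  # radius times central angle
print(R, c2, arc_length)
```

Since 1.1416 is a rounded value of π − 2, the computed R, c2 and arc length agree with 2, −√2 and π only to within that rounding.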
10. Given that the curve in Problem 8 passes through the points (−1, 5) and (4, 0) and the area beneath
it and above the x axis is $\frac{25\pi}{4}$, determine the unknown constants in the solution and hence obtain the
specific solution. Also, plot the curve and find the length of the arc.
Answer: As in Problem 9, the curve is a circle and its equation can be written as:
$$(x - c_1)^2 + (y - c_2)^2 = R^2 \qquad (66)$$
where $(c_1, c_2)$ is the center and $R$ is the radius. Now, from the points (−1, 5) and (4, 0) we get:
$$(-1 - c_1)^2 + (5 - c_2)^2 = R^2 \qquad (67)$$
$$(4 - c_1)^2 + (0 - c_2)^2 = R^2 \qquad (68)$$
where in the last equation we substituted for $c_2$ and $R$ from Eqs. 69 and 70. On using a numerical solver we find $c_1 = -1$ and hence $c_2 = 0$ (from Eq. 69) and $R = 5$ (from Eq. 70). Therefore, the specific solution is $(x + 1)^2 + y^2 = 25$. The circle is plotted in Figure 32. Regarding the length of the arc, it is obvious that the central angle that subtends the arc, i.e. the angle between the points (−1, 5), (−1, 0) and (4, 0), is a right angle and hence the arc is one quarter of the perimeter, i.e. it is $5\pi/2$.
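As a quick consistency check of the values quoted above, the following sketch confirms that the circle with c1 = −1, c2 = 0, R = 5 passes through both points and encloses the stated area above the x axis. The helper name `area_under_arc` and the midpoint rule are assumptions of this sketch, not part of the original solution.

```python
import math

def area_under_arc(c1, c2, R, x_start, x_end, n=200000):
    """Midpoint-rule area between the upper arc y = c2 + sqrt(R^2 - (x - c1)^2) and the x axis."""
    dx = (x_end - x_start) / n
    total = 0.0
    for i in range(n):
        x = x_start + (i + 0.5) * dx
        total += (c2 + math.sqrt(max(0.0, R * R - (x - c1) ** 2))) * dx
    return total

c1, c2, R = -1.0, 0.0, 5.0
# Both given points should satisfy the circle equation.
on_circle = [abs((x - c1) ** 2 + (y - c2) ** 2 - R * R) < 1e-12 for x, y in [(-1, 5), (4, 0)]]
area = area_under_arc(c1, c2, R, -1.0, 4.0)
arc_length = 2.0 * math.pi * R / 4.0  # quarter of the circumference
print(on_circle, area, arc_length)
```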
11. Find the shape of the closed plane curve of shortest length that encloses a given area.
Answer: If we repeat the analysis of Problem 6 then we have $H = \sqrt{1 + y'^2} + \lambda \frac{1}{2}(xy' - y)$ where we shift the roles of F and G in that Problem (because we are trying to optimize the length while imposing the area constraint). Accordingly, the Euler-Lagrange equation (Eq. 22) for this Problem is:
$$\frac{\partial}{\partial y}\left[\sqrt{1 + y'^2} + \lambda\frac{1}{2}(xy' - y)\right] - \frac{d}{dx}\frac{\partial}{\partial y'}\left[\sqrt{1 + y'^2} + \lambda\frac{1}{2}(xy' - y)\right] = 0$$
Figure 32: Plot of the circle $(x + 1)^2 + y^2 = 25$ where the part of area σ under the circular arc is shown in gray. See Problem 10 of § 2.4.
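$$
\begin{aligned}
-\frac{\lambda}{2} - \frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2}} + \frac{\lambda x}{2}\right) &= 0 \\
\frac{d}{dx}\left(\frac{y'}{\sqrt{1 + y'^2}} + \frac{\lambda x}{2}\right) &= -\frac{\lambda}{2} \\
\frac{y'}{\sqrt{1 + y'^2}} + \frac{\lambda x}{2} &= -\frac{\lambda x}{2} + C \\
\frac{y'}{\sqrt{1 + y'^2}} &= -\lambda x + C \\
y'^2 &= (\lambda x - C)^2\left(1 + y'^2\right) \\
y'^2 &= \frac{(\lambda x - C)^2}{1 - (\lambda x - C)^2} \\
y' &= \pm\frac{(\lambda x - C)}{\sqrt{1 - (\lambda x - C)^2}} \\
y &= \mp\frac{1}{\lambda}\sqrt{1 - (\lambda x - C)^2} + D \\
(y - D)^2 &= \frac{1}{\lambda^2}\left[1 - (\lambda x - C)^2\right] \\
\left(x - \frac{C}{\lambda}\right)^2 + (y - D)^2 &= \frac{1}{\lambda^2}
\end{aligned}
$$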
This is an equation of a circle with center (C/λ, D) and radius |1/λ|. So, the shape of the closed plane curve of shortest length that encloses a given area is a circle (noting that the solution cannot be a maximum because the length of the curve of a squeezed shape that encloses that area can diverge, e.g. a rectangle with one of its sides approaching zero).
Note: the result of this Problem is intuitive and can be concluded from Problem 6 as a corollary using
the proof by contradiction method[60] because if the closed plane curve of a given enclosed area (say
σ0 ) and shortest length (say p0 where p stands for perimeter) was not a circle then the circle of area σ0
should have a longer perimeter than p0 and hence the circle of perimeter p0 should have smaller area
than σ0 and this contradicts the result of Problem 6 because the circle of perimeter p0 should enclose
maximum area.[61]
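The result of Problem 11 can be illustrated numerically by comparing the perimeter of a circle with the perimeters of regular polygons of the same area. This is only a sanity check on a restricted family of curves, not a proof; the function names and the choice of area 9π are assumptions of this sketch.

```python
import math

def circle_perimeter(area):
    """Perimeter of the circle with the given area (sigma = pi r^2, p = 2 pi r)."""
    return 2.0 * math.sqrt(math.pi * area)

def regular_polygon_perimeter(area, n):
    """Perimeter of a regular n-gon with the given area: A = (n s^2 / 4) cot(pi/n)."""
    return 2.0 * math.sqrt(n * area * math.tan(math.pi / n))

area = 9.0 * math.pi  # same area as a circle of radius 3
p_circle = circle_perimeter(area)
polygons = {n: regular_polygon_perimeter(area, n) for n in (3, 4, 6, 12, 100)}
for n, p in polygons.items():
    print(n, p, p > p_circle)
```

Every polygon perimeter should exceed the circle perimeter, with the gap shrinking as the number of sides grows.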
12. Find the equation of the closed plane curve of shortest length l that encloses an area $\sigma \simeq 28.2743$ and passes through the points (6, −1) and (3, 2). Also, plot the result and find the length of the curve.
Answer: From the answer of Problem 11 we know that this curve is a circle, so we can write the equation of the curve as $(x - c_x)^2 + (y - c_y)^2 = R^2$ where $(c_x, c_y)$ is the center of the circle and $R$ is its radius. Now, $\sigma = 28.2743 = \pi R^2$ and hence $R = 3$. Therefore, the equation of the curve becomes $(x - c_x)^2 + (y - c_y)^2 = 9$. On substituting the points (6, −1) and (3, 2) into this equation we get:
$$(6 - c_x)^2 + (-1 - c_y)^2 = 9 \qquad (71)$$
$$(3 - c_x)^2 + (2 - c_y)^2 = 9 \qquad (72)$$
[60] The proof by contradiction method should be linked in many cases to the “Principle of Reciprocity” which states (in one
of its forms) that if y optimizes I1 [y] subject to the condition that I2 [y] is constant then y optimizes (usually in opposite
sense) I2 [y] subject to the condition that I1 [y] is constant. However, this principle is not rigorous and hence it may be
violated in some cases. We note that the use of I1 [y] and I2 [y] is a reference to the Lagrange multipliers formulation and
symbolism (see § 1.8).
[61] For a circle, $\sigma = \pi r^2$ and $p = 2\pi r$ (where σ is its area, p is its perimeter, and r is its radius) which can be combined to obtain $\sigma = \frac{p^2}{4\pi}$ and hence the area increases/decreases as the perimeter increases/decreases (and vice versa).
Figure 33: Plot of the circles $(x - 3)^2 + (y + 1)^2 = 9$ and $(x - 6)^2 + (y - 2)^2 = 9$ of Problem 12 of § 2.4.
Chapter 3
Optimal Surfaces
In this chapter we present and solve problems about topics and applications of the mathematics of variation related to optimal surfaces, i.e. in these problems we look for certain surfaces (or 2D objects) that optimize something (such as area).
[62] “Shape” in this context refers to a distinctive feature in the generic shape (which is rectangle) and hence this question
can be posed as: what is the length to width ratio of the rectangle of fixed area and optimal perimeter?
[63] As indicated earlier, “shape” in questions like this means a distinctive feature in the generic shape of the object (such as
3.1 2D Planar Shapes of Optimal Perimeter 114
Figure 34: A schematic illustration of the setting of Problem 3 of § 3.1 where a triangle of fixed area σ is to be optimized in its perimeter p = a + b + c.
$$p = a + b + c = (x + y) + \sqrt{x^2 + h^2} + \sqrt{y^2 + h^2}$$
$$\sigma = \frac{1}{2}ah = \frac{h}{2}(x + y)$$
where p is the perimeter, a, b, c are the lengths of the triangle sides, σ is its area, h is its height, and
x + y = a. Now, we are supposed to optimize p subject to the restriction on σ (since it is fixed), and
hence we use the Lagrange multipliers method (see § 1.8) with f + λg = p + λσ, that is:
$$p + \lambda\sigma = (x + y) + \sqrt{x^2 + h^2} + \sqrt{y^2 + h^2} + \lambda\frac{h}{2}(x + y)$$
To optimize p + λσ we need to take the partial derivatives of p + λσ with respect to the variables x, y, h
and set the derivatives to zero to obtain the optimization conditions, that is:
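$$
\begin{aligned}
\frac{\partial}{\partial x}(p + \lambda\sigma) &= 1 + \frac{x}{\sqrt{x^2 + h^2}} + \frac{\lambda h}{2} = 0 \qquad (74) \\
\frac{\partial}{\partial y}(p + \lambda\sigma) &= 1 + \frac{y}{\sqrt{y^2 + h^2}} + \frac{\lambda h}{2} = 0 \qquad (75) \\
\frac{\partial}{\partial h}(p + \lambda\sigma) &= \frac{h}{\sqrt{x^2 + h^2}} + \frac{h}{\sqrt{y^2 + h^2}} + \frac{\lambda}{2}(x + y) = 0 \qquad (76)
\end{aligned}
$$
On comparing Eqs. 74 and 75 we get:
$$
\begin{aligned}
\frac{x}{\sqrt{x^2 + h^2}} &= \frac{y}{\sqrt{y^2 + h^2}} \\
\frac{x^2}{x^2 + h^2} &= \frac{y^2}{y^2 + h^2} \\
x^2\left(y^2 + h^2\right) &= y^2\left(x^2 + h^2\right) \\
x^2y^2 + x^2h^2 - x^2y^2 - y^2h^2 &= 0 \\
h^2\left(x^2 - y^2\right) &= 0 \\
x^2 - y^2 &= 0 \qquad (h \neq 0) \\
(x - y)(x + y) &= 0 \\
x - y &= 0 \qquad (x + y = a \neq 0) \\
x &= y
\end{aligned}
$$
With $x = y$ the area becomes $\sigma = hx$ (77). On eliminating λ between Eqs. 74 and 76 (with $y = x$) we get:
$$
\begin{aligned}
x + \frac{x^2}{\sqrt{x^2 + h^2}} - \frac{h^2}{\sqrt{x^2 + h^2}} &= 0 \\
x &= \frac{h^2 - x^2}{\sqrt{x^2 + h^2}} \\
x^2 &= \frac{\left(h^2 - x^2\right)^2}{x^2 + h^2} \\
x^4 + x^2h^2 &= h^4 - 2x^2h^2 + x^4 \\
x^2h^2 &= h^4 - 2x^2h^2 \\
\sigma^2 &= h^4 - 2\sigma^2 \qquad \text{(substituting from Eq. 77)} \\
3\sigma^2 &= h^4 \\
\sigma &= \frac{h^2}{\sqrt{3}} \\
x &= \frac{h}{\sqrt{3}} \qquad \text{(using Eq. 77)}
\end{aligned}
$$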
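Accordingly, the lengths of the three sides are:
$$
\begin{aligned}
a &= x + y = 2x = \frac{2h}{\sqrt{3}} \\
b &= \sqrt{x^2 + h^2} = \sqrt{\frac{h^2}{3} + h^2} = \sqrt{\frac{4h^2}{3}} = \frac{2h}{\sqrt{3}} \qquad \text{(using Pythagoras)} \\
c &= \sqrt{y^2 + h^2} = \sqrt{x^2 + h^2} = \sqrt{\frac{h^2}{3} + h^2} = \sqrt{\frac{4h^2}{3}} = \frac{2h}{\sqrt{3}} \qquad \text{(using Pythagoras)}
\end{aligned}
$$
So, $a = b = c = \frac{2h}{\sqrt{3}}$ which means that our triangle is equilateral.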
Note 1: it should be obvious that the optimal perimeter in this Problem is a minimum (not a
maximum) because for a triangle with a fixed area if the height (or the base) approaches zero the
perimeter diverges.
Note 2: although the above solution is based on the configuration of Figure 34 (which seems rather
restricted) it can be easily adapted to include all possible configurations.
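The result of Problem 3 can be checked with a brute-force search over the configuration of Figure 34. The sketch below is an illustration under stated assumptions: the area is fixed at σ = √3 (the area of an equilateral triangle of side 2), the height is eliminated through h = 2σ/(x + y), and the grid bounds and the name `perimeter` are arbitrary choices of this sketch.

```python
import math

def perimeter(x, y, sigma):
    """Perimeter p = (x + y) + sqrt(x^2 + h^2) + sqrt(y^2 + h^2) with h fixed by the area."""
    h = 2.0 * sigma / (x + y)  # from sigma = h (x + y) / 2
    return (x + y) + math.hypot(x, h) + math.hypot(y, h)

sigma = math.sqrt(3.0)
grid = [0.01 * i for i in range(10, 301)]  # x and y between 0.1 and 3.0
best_p, best_x, best_y = min((perimeter(x, y, sigma), x, y) for x in grid for y in grid)
best_h = 2.0 * sigma / (best_x + best_y)
print(best_p, best_x, best_y, best_h)
```

The search should settle on x = y with h = √3 x, i.e. on the equilateral triangle, in agreement with the analytical result.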
4. What is the shape of the ellipse with fixed area and optimal perimeter (or circumference)?
Answer: The perimeter of an ellipse is:
$$p = 4a\int_0^{\pi/2}\sqrt{1 - e^2\sin^2\theta}\,d\theta$$
3.2 2D Planar Shapes of Optimal Area 116
where a is its semi-major axis while e is its eccentricity. Now, we are supposed to optimize p subject
to the restriction on σ (since it is fixed), and hence we can use the Lagrange multipliers method with
f + λg = p + λσ, that is:
$$p + \lambda\sigma = 4a\int_0^{\pi/2}\sqrt{1 - e^2\sin^2\theta}\,d\theta + \lambda\sigma$$
So, we need to optimize p + λσ by taking its derivative with respect to e (since p + λσ is a function of
e) and setting the derivative to zero to obtain the optimization condition, that is:
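$$
\begin{aligned}
\frac{d}{de}(p + \lambda\sigma) &= 0 \\
\int_0^{\pi/2}\frac{\partial}{\partial e}\left[4a\sqrt{1 - e^2\sin^2\theta}\right]d\theta + 0 &= 0 \\
\int_0^{\pi/2}\frac{-4ae\sin^2\theta}{\sqrt{1 - e^2\sin^2\theta}}\,d\theta &= 0
\end{aligned}
$$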
As we see, this integral can be zero only if e = 0 which means that the ellipse is a circle. So, the
ellipse of fixed area and optimal perimeter is a circle (which is a special case of ellipse corresponding
to e = 0).
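Note 1: in differentiating the integral we use the fact that if ξ is a function of two variables (say t and x) and
$$\phi(t) = \int_\alpha^\beta \xi(t, x)\,dx$$
with constant limits α and β, then $\frac{d\phi}{dt} = \int_\alpha^\beta \frac{\partial\xi}{\partial t}\,dx$ (provided ξ is sufficiently smooth).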
Note 2: the optimal perimeter in this Problem is obviously a minimum (not a maximum) because for
an ellipse of fixed area the perimeter diverges when the minor axis approaches zero (i.e. e approaches
one).
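The conclusion of Problem 4 can be checked numerically. The sketch below is a hedged illustration rather than the method of the answer: it assumes the standard relations σ = πab and b = a√(1 − e²), so that the semi-major axis a is recomputed for each eccentricity in order to keep the area fixed, and it uses a simple midpoint rule for the integral. The function name and the value σ = π are choices of this sketch.

```python
import math

def ellipse_perimeter_fixed_area(e, sigma, n=2000):
    """Perimeter of the ellipse with eccentricity e and area sigma (midpoint quadrature)."""
    b_over_a = math.sqrt(1.0 - e * e)
    a = math.sqrt(sigma / (math.pi * b_over_a))  # from sigma = pi a b
    dtheta = (math.pi / 2.0) / n
    integral = sum(
        math.sqrt(1.0 - (e * math.sin((i + 0.5) * dtheta)) ** 2) for i in range(n)
    ) * dtheta
    return 4.0 * a * integral

sigma = math.pi  # the circle of this area has radius 1 and perimeter 2 pi
perimeters = {e: ellipse_perimeter_fixed_area(e, sigma) for e in (0.0, 0.3, 0.6, 0.9)}
for e, p in perimeters.items():
    print(e, p)
```

The perimeter should be smallest at e = 0 and grow with the eccentricity, which is consistent with the circle being the ellipse of minimum perimeter for a given area.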
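$$\frac{d\sigma}{dL} = \frac{p}{2} - 2L = 0 \qquad \text{and hence} \qquad L = \frac{p}{4} \qquad \text{and} \qquad W = \frac{p}{2} - L = \frac{p}{2} - \frac{p}{4} = \frac{p}{4}$$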
Therefore, L = W = p/4 which means that the rectangle should be a square to optimize its area.
Note: it is obvious that the optimal area in this Problem is a maximum (not a minimum) because the
area can approach zero when the width of the rectangle approaches zero (noting that the perimeter is
fixed).
2. Re-solve Problem 1 but this time use the Lagrange multipliers technique (see § 1.8).
Answer: If we use the setting and labeling of Problem 2 of § 3.1 and follow a similar reasoning then
we can say: in the present Problem we want to optimize f = LW (which is the area) subject to the
constraint that the perimeter 2 (L + W ) is constant (and hence g = L + W is constant). So, we should
optimize h = f + λg = LW + λ(L + W ) by taking the partial derivative of h with respect to the
variables L and W and setting the derivatives to zero to obtain the optimization conditions, that is:
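$$\frac{\partial h}{\partial L} = W + \lambda = 0 \qquad \text{and hence} \qquad W = -\lambda$$
$$\frac{\partial h}{\partial W} = L + \lambda = 0 \qquad \text{and hence} \qquad L = -\lambda$$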
Accordingly, L = W which again means that the rectangle should be a square to optimize its area.
Note: again, the optimal area in this Problem is a maximum (for the same reason).
3. Re-solve Problem 1 but this time use the proof by contradiction method.
Answer: The result of Problem 1 (or Problem 2) can be obtained as a corollary from the result of
Problem 1 of § 3.1 (or Problem 2 of § 3.1) because if the rectangle of the given perimeter (say p0 ) and
maximum area (say σ0 ) was not a square then the square of perimeter p0 should have a smaller area
than σ0 and hence the square of area σ0 should have longer perimeter than p0 and this contradicts the
result of Problem 1 of § 3.1 (or Problem 2 of § 3.1) because the square of area σ0 should have minimum
perimeter.[64]
Note: the course of action in the present Problem can be reversed by using the result of Problem 1 of
the present section (or Problem 2 of the present section) in conjunction with the proof by contradiction
method to establish the result of Problem 1 of § 3.1 (or Problem 2 of § 3.1). In this case we would
say: the result of Problem 1 of § 3.1 (or Problem 2 of § 3.1) can be obtained as a corollary from the
result of Problem 1 of the present section (or Problem 2 of the present section) because if the rectangle
of a given area (say σ0 ) and minimum perimeter (say p0 ) was not a square then the square of area
σ0 should have a longer perimeter than p0 and hence the square of perimeter p0 should have smaller
area than σ0 and this contradicts the result of Problem 1 of the present section (or Problem 2 of the
present section) because the square of perimeter p0 should have maximum area.[65]
4. What is the shape of the triangle with fixed perimeter and optimal area?
Answer: It is equilateral. This is a corollary of the result of Problem 3 of § 3.1 because if the
equilateral triangle with fixed perimeter (say p0 ) is not optimal (i.e. maximum; see the upcoming
note) in area then there should be a non-equilateral triangle with perimeter p0 and with maximum
area (say σ0 ) and hence an equilateral triangle of area σ0 will have a longer perimeter than p0 which
contradicts the result of Problem 3 of § 3.1 because the equilateral triangle of area σ0 has minimum
perimeter.[66]
Note: it should be obvious that the optimal area in this Problem is a maximum (not a minimum)
because for a triangle with a fixed perimeter if the height (or the base) approaches zero the area
converges to zero.
5. What is the shape of the triangle of optimal area whose perimeter and one of its sides are fixed (i.e.
only two of its sides can vary while keeping their sum fixed).[67]
Answer: If we follow the method of Problem 3 of § 3.1 (assuming the fixed side to be a) then this
[64] For a square, $p = 4x$ and $\sigma = x^2$ (where p is its perimeter, σ is its area, and x is the length of its sides) which can be combined to obtain $p = 4\sqrt{\sigma}$ and hence the perimeter increases/decreases as the area increases/decreases (and vice versa).
[65] For a square, $\sigma = x^2$ and $p = 4x$ (where σ is its area, p is its perimeter, and x is the length of its sides) which can be combined to obtain $\sigma = p^2/16$ and hence the area increases/decreases as the perimeter increases/decreases (and vice versa).
[66] For an equilateral triangle, $p = 3a$ and $\sigma = \frac{\sqrt{3}}{4}a^2$ (where p is its perimeter, σ is its area, and a is the length of its sides) which can be combined to obtain $p = 6\sqrt{\frac{\sigma}{\sqrt{3}}}$ and hence the perimeter increases/decreases as the area increases/decreases (and vice versa).
[67] We can also characterize this triangle as: the length of one of its sides and the sum of the lengths of its two other sides
are fixed.
3.3 2D Planar Shapes Inside Other 2D Planar Shapes 118
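time we have $f + \lambda g = \sigma + \lambda p = \frac{h}{2}(x + y) + \lambda(x + y) + \lambda\sqrt{x^2 + h^2} + \lambda\sqrt{y^2 + h^2}$ (because here we want to optimize the area subject to a perimeter constraint) and hence:
$$\frac{\partial}{\partial x}(\sigma + \lambda p) = \frac{h}{2} + \lambda\left(1 + \frac{x}{\sqrt{x^2 + h^2}}\right) = 0$$
$$\frac{\partial}{\partial y}(\sigma + \lambda p) = \frac{h}{2} + \lambda\left(1 + \frac{y}{\sqrt{y^2 + h^2}}\right) = 0$$
and hence x = y (as shown in Problem 3 of § 3.1) which means that the triangle is isosceles.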
Note 1: it should be obvious that the optimal area in this Problem is a maximum (not a minimum)
because for a triangle with a fixed perimeter (regardless of having a fixed side or not) if the height
approaches zero the area converges to zero.
Note 2: if the length of the fixed side is half the sum of the other two sides then the optimal triangle
of this Problem becomes equilateral and this Problem becomes an instance of Problem 4.
6. What is the shape of the ellipse with fixed perimeter and optimal area?
Answer: It should be a circle. This is a corollary of the result of Problem 4 of § 3.1 because if the
circle with the fixed perimeter (say p0 ) is not optimal (i.e. maximum; see the upcoming note) in area
then there should be an ellipse with perimeter p0 and with maximum area (say σ0 ) and hence a circle
of area σ0 will have a longer perimeter than p0 which contradicts the result of Problem 4 of § 3.1
because the circle of area σ0 has minimum perimeter.
Note: it should be obvious that the optimal area in this Problem is a maximum (not a minimum)
because for an ellipse with a fixed perimeter if the minor axis approaches zero (i.e. e approaches one)
the area converges to zero.
$$
\begin{aligned}
p &= a + c + b \\
&= \sqrt{(1 - \cos\theta)^2 + \sin^2\theta} + \sqrt{(1 - \cos\phi)^2 + \sin^2\phi} + \sqrt{(\cos\theta - \cos\phi)^2 + (\sin\theta - \sin\phi)^2} \\
&= \sqrt{1 - 2\cos\theta + \cos^2\theta + \sin^2\theta} + \sqrt{1 - 2\cos\phi + \cos^2\phi + \sin^2\phi} \\
&\quad + \sqrt{\cos^2\theta - 2\cos\theta\cos\phi + \cos^2\phi + \sin^2\theta - 2\sin\theta\sin\phi + \sin^2\phi} \\
&= \sqrt{2 - 2\cos\theta} + \sqrt{2 - 2\cos\phi} + \sqrt{2 - 2\cos\theta\cos\phi - 2\sin\theta\sin\phi} \\
&= \sqrt{2}\left[\sqrt{1 - \cos\theta} + \sqrt{1 - \cos\phi} + \sqrt{1 - \cos\theta\cos\phi - \sin\theta\sin\phi}\right] \\
&= \sqrt{2}\left[\sqrt{1 - \cos\theta} + \sqrt{1 - \cos\phi} + \sqrt{1 - \cos(\theta - \phi)}\right]
\end{aligned}
$$
Figure 35: The setting of Problem 1 of § 3.3 where a triangle with vertices A(1, 0), B(cos θ, sin θ) and C(cos φ, sin φ) is inscribed inside a unit circle centered on the origin of coordinates with a, b, c being the lengths of the sides of the triangle.
To optimize p we take its partial derivatives with respect to the variables θ and φ and set the derivatives
to zero to obtain the optimization conditions, that is:
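$$
\begin{aligned}
\frac{1}{\sqrt{2}}\frac{\partial p}{\partial\theta} &= 0 \\
\frac{\sin\theta}{2\sqrt{1 - \cos\theta}} + \frac{\sin(\theta - \phi)}{2\sqrt{1 - \cos(\theta - \phi)}} &= 0 \\
\frac{\sin\theta}{\sqrt{1 - \cos\theta}} &= -\frac{\sin(\theta - \phi)}{\sqrt{1 - \cos(\theta - \phi)}} \\
\frac{\sin^2\theta}{1 - \cos\theta} &= \frac{\sin^2(\theta - \phi)}{1 - \cos(\theta - \phi)} \\
\frac{1 - \cos^2\theta}{1 - \cos\theta} &= \frac{1 - \cos^2(\theta - \phi)}{1 - \cos(\theta - \phi)} \\
1 + \cos\theta &= 1 + \cos(\theta - \phi) \\
\cos\theta &= \cos(\theta - \phi) \qquad (80)
\end{aligned}
$$
AND
$$
\begin{aligned}
\frac{1}{\sqrt{2}}\frac{\partial p}{\partial\phi} &= 0 \\
\frac{\sin\phi}{2\sqrt{1 - \cos\phi}} - \frac{\sin(\theta - \phi)}{2\sqrt{1 - \cos(\theta - \phi)}} &= 0 \\
\frac{\sin\phi}{\sqrt{1 - \cos\phi}} &= \frac{\sin(\theta - \phi)}{\sqrt{1 - \cos(\theta - \phi)}} \\
\frac{\sin^2\phi}{1 - \cos\phi} &= \frac{\sin^2(\theta - \phi)}{1 - \cos(\theta - \phi)} \\
\frac{1 - \cos^2\phi}{1 - \cos\phi} &= \frac{1 - \cos^2(\theta - \phi)}{1 - \cos(\theta - \phi)} \\
1 + \cos\phi &= 1 + \cos(\theta - \phi) \\
\cos\phi &= \cos(\theta - \phi) \qquad (81)
\end{aligned}
$$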
On combining Eqs. 80 and 81 we get cos θ = cos φ = cos (θ − φ). Now, if we note that cos φ =
cos (2π − φ) and cos (θ − φ) = cos (φ − θ) then we have cos θ = cos (2π − φ) = cos (φ − θ). Noting
that θ is the positive angle AOB, (2π − φ) is the positive angle COA, and (φ − θ) is the positive angle
BOC, we conclude (within the given constraints) that the angles AOB, BOC and COA are equal which
means that the triangle is equilateral.
Note: it should be obvious that the optimal perimeter in this Problem is a maximum (not a minimum)
because the perimeter can converge to zero when the vertices of the triangle become too close to each
other.[68]
2. Calculate the length of the longest perimeter of a triangle inscribed in a circle of radius R.
Answer: From the result of Problem 1, the inscribed triangle of longest perimeter is equilateral and
hence the length is:
$$p = 6R\cos\frac{\pi}{6} = 6R \times \frac{\sqrt{3}}{2} = 3\sqrt{3}R$$
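The results of Problems 1 and 2 can be checked by a direct search over the two angles in the unit circle (R = 1), using the final expression for p derived in Problem 1. The one-degree grid and the function name `perimeter` are assumptions of this sketch.

```python
import math

def perimeter(theta, phi):
    """Perimeter of the triangle A(1,0), B(cos theta, sin theta), C(cos phi, sin phi)."""
    terms = (1.0 - math.cos(theta), 1.0 - math.cos(phi), 1.0 - math.cos(theta - phi))
    return math.sqrt(2.0) * sum(math.sqrt(max(0.0, t)) for t in terms)

# Search theta and phi (in whole degrees) strictly between 0 and 360.
best_p, best_theta, best_phi = max(
    (perimeter(math.radians(t), math.radians(f)), t, f)
    for t in range(1, 360)
    for f in range(1, 360)
)
print(best_p, best_theta, best_phi, 3.0 * math.sqrt(3.0))
```

The maximum should occur when the two angles are 120 and 240 degrees (in either order), i.e. for the equilateral triangle, with a perimeter equal to 3√3 for the unit circle.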
3. What is the shape of the triangle of optimal area inscribed inside a circle?
Answer: We refer to Figure 36 where the setting of the Problem is illustrated. Because the area σ of
an inscribed triangle is equal to the area of the circle minus the sum S of the areas of the segments
(shaded gray in Figure 36) we will optimize (i.e. minimize) S rather than optimize (i.e. maximize) σ. Now, the area $\sigma_s$ of a segment of a circle of radius r is $\sigma_s = \frac{r^2}{2}(\alpha - \sin\alpha)$ where α is the subtended angle. With no loss of generality we can use a circle of radius $r = \sqrt{2}$ (because the radius r is just a scaling factor that does not affect the basic shape of the inscribed triangle) and hence the formula of $\sigma_s$ becomes $\sigma_s = \alpha - \sin\alpha$. So, our task is to optimize S which is the sum of $\sigma_1, \sigma_2, \sigma_3$ (see Figure 36), that is:
$$
\begin{aligned}
S &= \sigma_1 + \sigma_2 + \sigma_3 \\
&= \theta - \sin\theta + \phi - \sin\phi + (2\pi - \theta - \phi) - \sin(2\pi - \theta - \phi)
\end{aligned}
$$
Now, to optimize S we take its partial derivatives with respect to θ and φ and set the derivatives to zero to obtain the optimization conditions, that is:
$$\frac{\partial S}{\partial\theta} = 1 - \cos\theta - 1 + \cos(2\pi - \theta - \phi) = -\cos\theta + \cos(2\pi - \theta - \phi) = 0 \qquad (82)$$
$$\frac{\partial S}{\partial\phi} = 1 - \cos\phi - 1 + \cos(2\pi - \theta - \phi) = -\cos\phi + \cos(2\pi - \theta - \phi) = 0 \qquad (83)$$
On subtracting Eq. 82 from Eq. 83 we get:
$$\cos\theta - \cos\phi = 0$$
[68] In fact, the perimeter of the inscribed equilateral triangle (which is $p = 3\sqrt{3}R$ with R being the radius of the circle) is greater than the perimeter in another limiting case when two vertices approach each other while the other vertex is on the other side of the circle (and hence the perimeter approaches p = 4R).
Figure 36: The setting of Problem 3 of § 3.3 where a triangle with vertices A, B and C is inscribed inside a circle of radius $r = \sqrt{2}$.
So, from Eqs. 84 and 85 we can see that (within the given restrictions) θ = φ = 2π − θ − φ which
means that the three angles that determine the shape of the triangle are equal and hence the triangle
is equilateral.
Note 1: it is obvious that in this Problem the optimal area of the triangle is a maximum (not a
minimum) because the area can converge to zero when the vertices of the triangle (or two of them)
become too close to each other.
Note 2: in our answer we optimized the sum S of the circle segments σ1 , σ2 , σ3 instead of optimizing
the area of the triangle directly (by optimizing the sum of the three inner triangles seen in Figure
36) to avoid necessary (and rather complicated) justifications according to some possible settings and
configurations of the problem.
4. Calculate the area of the triangle of maximum area inscribed in a circle of radius R.
Answer: From the result of Problem 3, the inscribed triangle of maximum area is equilateral and
hence the area is:
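$$\sigma = \frac{1}{2} \times \sqrt{3}R \times \sqrt{3}R\,\sin\frac{\pi}{3} = \frac{1}{2} \times \sqrt{3}R \times \sqrt{3}R \times \frac{\sqrt{3}}{2} = \frac{3\sqrt{3}}{4}R^2$$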
5. What is the shape of the rectangle of optimal perimeter inscribed inside a circle?
Answer: We use a Cartesian coordinate system centered on the center of the circle with the sides of
the rectangle being parallel to the axes (see Figure 37). Because the size of the radius of the circle does
not affect the basic shape of the inscribed rectangle (since the radius is just a scale factor) we use (with no loss of generality) a unit circle. Due to the symmetry, the perimeter p is given by p = 4(α + β) where α and β are the coordinates of the vertex in the first quadrant. Hence, we are required to optimize α + β (because 4 is no more than a scale factor). Now, the vertex (α, β) is on the circle and hence it should satisfy the equation of the circle $x^2 + y^2 = 1$, that is:
$$\alpha^2 + \beta^2 = 1$$
$$\beta = \sqrt{1 - \alpha^2}$$
Figure 37: The setting of Problem 5 of § 3.3 where a rectangle (with its sides being parallel to the coordinate axes) is inscribed inside a unit circle centered on the origin of coordinates with α and β being the x and y coordinates of the vertex in the first quadrant.
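Accordingly, we are required to optimize $\alpha + \sqrt{1 - \alpha^2}$ by taking its derivative with respect to α and setting it to zero to obtain the optimization condition, that is:
$$
\begin{aligned}
\frac{d}{d\alpha}\left[\alpha + \sqrt{1 - \alpha^2}\right] &= 0 \\
1 + \frac{-\alpha}{\sqrt{1 - \alpha^2}} &= 0 \\
\frac{\alpha^2}{1 - \alpha^2} &= 1 \\
2\alpha^2 &= 1 \\
\alpha &= \frac{1}{\sqrt{2}}
\end{aligned}
$$
Therefore, $\beta = \sqrt{1 - \alpha^2} = \frac{1}{\sqrt{2}}$ and hence $\alpha = \beta = \frac{1}{\sqrt{2}}$ which means that the rectangle is a square (noting that the square is a special case of rectangle in which the length and width are equal).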
Note 1: according to the result of this Problem, the length of the longest perimeter of a rectangle inscribed in a circle of radius R is $4\sqrt{2}R$ (where this rectangle is a square).
Note 2: it is obvious that the optimal perimeter in this Problem is a maximum (not a minimum)
because the perimeter can converge to 4R (i.e. 4 times the circle radius) when the width of the
rectangle approaches zero.
6. What is the shape of the rectangle of optimal area inscribed inside a circle?
Answer: If we follow the setting and reasoning of Problem 5 then the area σ of the rectangle is:
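$$\sigma = 4\alpha\beta = 4\alpha\sqrt{1 - \alpha^2} = 4\sqrt{\alpha^2 - \alpha^4}$$
Accordingly, we are required to optimize $\sqrt{\alpha^2 - \alpha^4}$ by taking its derivative with respect to α and setting it to zero to obtain the optimization condition, that is:
$$
\begin{aligned}
\frac{d}{d\alpha}\sqrt{\alpha^2 - \alpha^4} &= 0 \\
\frac{2\alpha - 4\alpha^3}{2\sqrt{\alpha^2 - \alpha^4}} &= 0 \\
\alpha - 2\alpha^3 &= 0 \\
\alpha\left(1 - 2\alpha^2\right) &= 0 \\
1 - 2\alpha^2 &= 0 \qquad (\alpha \neq 0) \\
\alpha &= \frac{1}{\sqrt{2}}
\end{aligned}
$$
Therefore, $\beta = \sqrt{1 - \alpha^2} = \frac{1}{\sqrt{2}}$ and hence $\alpha = \beta = \frac{1}{\sqrt{2}}$ which means that the rectangle is a square.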
Note 1: according to the result of this Problem, the maximum area of a rectangle inscribed in a circle
of radius R is 2R2 (where this rectangle is a square).
Note 2: it is obvious that the optimal area in this Problem is a maximum (not a minimum) because
the area can converge to zero when the width of the rectangle approaches zero.
7. What is the shape of the rectangle of optimal perimeter inscribed inside an ellipse? Also, find the
length of the optimal perimeter.
Answer: Let us use a Cartesian coordinate system centered on the center of the ellipse with the major axis of the ellipse being on the x axis and hence the sides of the rectangle being parallel to the coordinate axes (see Figure 38). Also, let the ellipse have semi-major axis a and semi-minor axis b. Due to the symmetry, the perimeter p of the rectangle is given by p = 4(α + β) where α and β are the coordinates of the vertex in the first quadrant. Hence, we are required to optimize α + β (because 4 is no more than a scale factor). Now, the vertex (α, β) is on the ellipse and hence it should satisfy the equation of the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, that is:
$$\frac{\alpha^2}{a^2} + \frac{\beta^2}{b^2} = 1$$
$$\beta = b\sqrt{1 - \frac{\alpha^2}{a^2}}$$
Accordingly, we are required to optimize $\alpha + b\sqrt{1 - \frac{\alpha^2}{a^2}}$ by taking its derivative with respect to α and setting it to zero to obtain the optimization condition, that is:
$$
\begin{aligned}
\frac{d}{d\alpha}\left[\alpha + b\sqrt{1 - \frac{\alpha^2}{a^2}}\right] &= 0 \\
1 + \frac{b\left(-\frac{2\alpha}{a^2}\right)}{2\sqrt{1 - \frac{\alpha^2}{a^2}}} &= 0 \\
1 - \frac{b\alpha}{a^2\sqrt{1 - \frac{\alpha^2}{a^2}}} &= 0 \\
1 - \frac{b\alpha}{\sqrt{a^4 - a^2\alpha^2}} &= 0 \\
\frac{b^2\alpha^2}{a^4 - a^2\alpha^2} &= 1 \\
b^2\alpha^2 + a^2\alpha^2 &= a^4 \\
\alpha^2 &= \frac{a^4}{a^2 + b^2} \\
\alpha &= \frac{a^2}{\sqrt{a^2 + b^2}}
\end{aligned}
$$
Figure 38: The setting of Problem 7 of § 3.3 where a rectangle is inscribed inside an ellipse (with semi-major axis a and semi-minor axis b) centered on the origin of coordinates. The major axis of the ellipse is on the x axis (and hence the sides of the rectangle are parallel to the coordinate axes), and α and β are the x and y coordinates of the rectangle vertex in the first quadrant.
Therefore:
$$\beta = b\sqrt{1 - \frac{\alpha^2}{a^2}} = b\sqrt{1 - \frac{1}{a^2}\,\frac{a^4}{a^2 + b^2}} = b\sqrt{1 - \frac{a^2}{a^2 + b^2}} = \frac{b^2}{\sqrt{a^2 + b^2}}$$
Hence, the rectangle of optimal perimeter has length $L = 2\alpha = \frac{2a^2}{\sqrt{a^2 + b^2}}$ and width $W = 2\beta = \frac{2b^2}{\sqrt{a^2 + b^2}}$. So, its perimeter is:
$$p = 2L + 2W = \frac{4a^2}{\sqrt{a^2 + b^2}} + \frac{4b^2}{\sqrt{a^2 + b^2}} = \frac{4\left(a^2 + b^2\right)}{\sqrt{a^2 + b^2}} = 4\sqrt{a^2 + b^2}$$
Note 1: it is obvious that the optimal perimeter in this Problem is a maximum (not a minimum) because the perimeter can converge to 4b when α approaches zero and it can converge to 4a when β approaches zero (noting that $4b < 4\sqrt{a^2 + b^2}$ and $4a < 4\sqrt{a^2 + b^2}$).
Note 2: if we set a = b = R (i.e. the ellipse becomes a circle of radius R) then we obtain $\alpha = \beta = \frac{R}{\sqrt{2}}$, i.e. we retrieve (as a special case) the result of the circle that we obtained in Problem 5 (noting that in Problem 5 we used unit circle).
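The closed-form results of Problem 7 can be compared against a direct scan over α. In this sketch the semi-axes a = 2 and b = 1, the grid resolution and the function names are illustrative assumptions.

```python
import math

def rectangle_perimeter(alpha, a, b):
    """Perimeter 4(alpha + beta) with the vertex (alpha, beta) on the ellipse."""
    beta = b * math.sqrt(max(0.0, 1.0 - (alpha / a) ** 2))
    return 4.0 * (alpha + beta)

def scan_best_alpha(a, b, steps=200000):
    """Return the grid value of alpha in [0, a] that gives the largest perimeter."""
    candidates = (a * i / steps for i in range(steps + 1))
    return max(candidates, key=lambda alpha: rectangle_perimeter(alpha, a, b))

a, b = 2.0, 1.0
alpha_scan = scan_best_alpha(a, b)
alpha_formula = a * a / math.hypot(a, b)  # a^2 / sqrt(a^2 + b^2)
p_scan = rectangle_perimeter(alpha_scan, a, b)
p_formula = 4.0 * math.hypot(a, b)        # 4 sqrt(a^2 + b^2)
print(alpha_scan, alpha_formula, p_scan, p_formula)
```

The scanned optimum should agree with the formulas for α and p to within the grid spacing, and both end values 4a and 4b should be smaller than the optimal perimeter.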
8. What is the shape of the rectangle of optimal area inscribed inside an ellipse? Also, find the optimal
area.
Answer: If we follow the setting and reasoning of Problem 7 then the area σ of the rectangle is:
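$$\sigma = 4\alpha\beta = 4\alpha b\sqrt{1 - \frac{\alpha^2}{a^2}} = \frac{4b}{a}\sqrt{a^2\alpha^2 - \alpha^4}$$
Accordingly, we are required to optimize $\sqrt{a^2\alpha^2 - \alpha^4}$ by taking its derivative with respect to α and setting it to zero to obtain the optimization condition, that is:
$$
\begin{aligned}
\frac{d}{d\alpha}\sqrt{a^2\alpha^2 - \alpha^4} &= 0 \\
\frac{2a^2\alpha - 4\alpha^3}{2\sqrt{a^2\alpha^2 - \alpha^4}} &= 0 \\
a^2\alpha - 2\alpha^3 &= 0 \\
\alpha\left(a^2 - 2\alpha^2\right) &= 0 \\
a^2 - 2\alpha^2 &= 0 \qquad (\alpha \neq 0) \\
\alpha &= \frac{a}{\sqrt{2}}
\end{aligned}
$$
Therefore:
$$\beta = b\sqrt{1 - \frac{\alpha^2}{a^2}} = b\sqrt{1 - \frac{1}{a^2}\,\frac{a^2}{2}} = b\sqrt{1 - \frac{1}{2}} = \frac{b}{\sqrt{2}}$$
Hence, the rectangle of optimal area has length $L = 2\alpha = \sqrt{2}a$ and width $W = 2\beta = \sqrt{2}b$. So, its area is:
$$\sigma = LW = \sqrt{2}a\,\sqrt{2}b = 2ab$$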
Note 1: it is obvious that the area in this Problem is a maximum (not a minimum) because the area
can converge to zero when α or β approaches zero.
Note 2: if we set a = b = R (i.e. the ellipse becomes a circle of radius R) then we obtain $\alpha = \beta = \frac{R}{\sqrt{2}}$, i.e. we retrieve (as a special case) the result of the circle that we obtained in Problem 6 (noting that in Problem 6 we used unit circle).
Note 3: comparing the result of this Problem to the result of Problem 7 we see that the optimal rectangles in these Problems are different. This is unlike the corresponding Problems of rectangles inscribed in circles (see Problems 5 and 6) where the two “rectangles” are the same because they are actually squares due to the circular symmetry.
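The observation of Note 3 can be illustrated numerically: for the same ellipse, the α that maximizes the area differs from the α that maximizes the perimeter. The semi-axes a = 2 and b = 1, the grid resolution and the names used are assumptions of this sketch.

```python
import math

def rectangle_area(alpha, a, b):
    """Area 4 alpha beta with the vertex (alpha, beta) on the ellipse."""
    beta = b * math.sqrt(max(0.0, 1.0 - (alpha / a) ** 2))
    return 4.0 * alpha * beta

a, b = 2.0, 1.0
steps = 200000
alpha_area = max(
    (a * i / steps for i in range(steps + 1)),
    key=lambda alpha: rectangle_area(alpha, a, b),
)
alpha_perimeter = a * a / math.hypot(a, b)  # optimal alpha from Problem 7
best_area = rectangle_area(alpha_area, a, b)
print(alpha_area, a / math.sqrt(2.0), best_area, 2.0 * a * b, alpha_perimeter)
```

The scan should reproduce α = a/√2 and σ = 2ab, while the optimal α of Problem 7 takes a visibly different value unless a = b.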
9. What is the shape of the triangle of optimal area whose two vertices are on the foci of an ellipse while
the other vertex is on the perimeter of the ellipse?
Answer: According to the definition of ellipse,[69] the perimeter of this triangle is fixed and hence
this Problem is an instance of Problem 5 of § 3.2 since the perimeter and one of the sides (i.e. the
side that connects the two foci) of the triangle are fixed. Therefore, the triangle is isosceles and its optimal area is a maximum (according to note 1 of Problem 5 of § 3.2).
Note 1: the optimal area in this Problem is $\sigma = bc = b\sqrt{a^2 - b^2}$ where a is the semi-major axis of the ellipse, b is the semi-minor axis and c is the distance between the center and a focus.
Note 2: the result of the present Problem can be easily obtained from the fact that the area of a triangle is half its base times its height plus the fact that in our case the base is fixed while the height takes its optimal (maximum) value when the two sides are equal (i.e. when the moving vertex of the triangle is at the co-vertex of the ellipse).[70]
[69] As it should be known from elementary geometry, an ellipse is a plane curve with the property that the sum of the distances between any point on the curve and two fixed points (i.e. its foci) in the plane is constant (which is equal to 2a with a being the semi-major axis of the ellipse).
10. What is the shape of the triangle of optimal perimeter whose two vertices are on the vertices of an
ellipse[71] while the other vertex is on the perimeter of the ellipse?
Answer: Referring to Figure 39, the side of the triangle on the major axis is fixed and hence we need to optimize only the sum S of the other two sides whose lengths are $\sqrt{h^2 + (a + X)^2}$ and $\sqrt{h^2 + (a - X)^2}$ (where 0 < h ≤ b and 0 ≤ X < a), that is:
$$S = \sqrt{h^2 + (a + X)^2} + \sqrt{h^2 + (a - X)^2}$$
Figure 39: The setting of Problem 10 of § 3.3 where a triangle is inscribed inside an ellipse (with semi-major axis a and semi-minor axis b) centered on the origin of coordinates. The major axis of the ellipse (which is on the x axis) forms one side of the triangle while the opposite vertex of the triangle is on the ellipse perimeter.
Now, to optimize S we take its partial derivatives with respect to h and X and set them to zero to
obtain the optimization conditions, that is:
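$$
\begin{aligned}
\frac{\partial S}{\partial h} &= 0 \\
\frac{h}{\sqrt{h^2 + (a + X)^2}} + \frac{h}{\sqrt{h^2 + (a - X)^2}} &= 0 \\
h\left[\frac{1}{\sqrt{h^2 + (a + X)^2}} + \frac{1}{\sqrt{h^2 + (a - X)^2}}\right] &= 0 \\
h &= 0 \qquad (86)
\end{aligned}
$$
AND
$$\frac{\partial S}{\partial X} = 0$$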
[70] We mean by “co-vertex of the ellipse” the end point of the minor axis.
[71] We mean by “vertices of an ellipse” the end points of the major axis.
3.4 Surface of Revolution of Optimal Area 127
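$$
\begin{aligned}
\frac{(a + X)}{\sqrt{h^2 + (a + X)^2}} + \frac{-(a - X)}{\sqrt{h^2 + (a - X)^2}} &= 0 \\
\frac{(a + X)^2}{h^2 + (a + X)^2} &= \frac{(a - X)^2}{h^2 + (a - X)^2} \\
(a + X)^2\left[h^2 + (a - X)^2\right] &= (a - X)^2\left[h^2 + (a + X)^2\right] \\
h^2(a + X)^2 + (a + X)^2(a - X)^2 &= h^2(a - X)^2 + (a + X)^2(a - X)^2 \\
h^2(a + X)^2 &= h^2(a - X)^2 \\
(a + X)^2 &= (a - X)^2 \qquad (h \neq 0) \\
a^2 + 2aX + X^2 &= a^2 - 2aX + X^2 \\
2aX &= -2aX \\
X &= -X \qquad (a \neq 0) \\
X &= 0 \qquad (87)
\end{aligned}
$$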
As we see, Eq. 86 is unacceptable because 0 < h ≤ b (in fact h = 0 corresponds to the lower limit of the perimeter). However, Eq. 87 is acceptable (since 0 ≤ X < a) and hence the triangle is isosceles (with h = b). Accordingly, the triangle in this Problem is isosceles with an optimal perimeter of $2a + 2\sqrt{a^2 + b^2}$.
Note 1: if the fixed side in this Problem is on the minor axis then the result is similar, i.e. the triangle is isosceles with an optimal perimeter of $2b + 2\sqrt{a^2 + b^2}$.
Note 2: it is obvious that the optimal perimeter in this Problem is a maximum (not a minimum) because the perimeter can converge to 4a (or 4b) when the height h of the triangle approaches zero.
Note 3: the circle is a special case of ellipse and hence the result of this Problem also applies to circles, i.e. the triangle of optimal perimeter whose one side is a diameter of a circle while the opposite vertex is on the circumference of the circle is isosceles with an optimal (i.e. maximum) perimeter of $2R + 2\sqrt{2}R$ (with R being the radius of the circle).
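The result of Problem 10 can be checked by moving the free vertex along the ellipse. The sketch below assumes the parametrization (X, h) = (a cos t, b sin t), which keeps the vertex on the ellipse by construction; this parametrization, the semi-axes a = 2 and b = 1, and the function name are choices of this sketch rather than part of the answer above.

```python
import math

def triangle_perimeter(t, a, b):
    """Perimeter of the triangle with base on the major axis and apex (a cos t, b sin t)."""
    X, h = a * math.cos(t), b * math.sin(t)
    return 2.0 * a + math.hypot(a + X, h) + math.hypot(a - X, h)

a, b = 2.0, 1.0
steps = 10000
angles = [math.pi * i / steps for i in range(1, steps)]  # apex on the upper half, ends excluded
best_t = max(angles, key=lambda t: triangle_perimeter(t, a, b))
best_p = triangle_perimeter(best_t, a, b)
p_formula = 2.0 * a + 2.0 * math.hypot(a, b)
print(best_t, best_p, p_formula)
```

The largest perimeter should occur at t = π/2 (apex at the co-vertex, X = 0 and h = b) with the value 2a + 2√(a² + b²), while the perimeter approaches 4a as t approaches 0 or π.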
11. What is the shape of the triangle of optimal area whose two vertices are on the vertices of an ellipse
while the other vertex is on the perimeter of the ellipse?
Answer: Referring to the setting of Problem 10 and Figure 39, the area σ of the triangle is half the base (which is 2a) times the height (which is h) and hence σ = ah. Now, a is constant and hence the optimum of σ occurs when h is maximum (noting that σ ≠ 0 and h ≠ 0). Hence, from the condition 0 < h ≤ b we conclude that the optimal area is when h = b. Accordingly, the triangle in this Problem is isosceles with an optimal (i.e. maximum) area of ab.
Note 1: if the fixed side in this Problem is on the minor axis then the result is similar, i.e. the triangle is isosceles with an optimal (i.e. maximum) area of ab.
Note 2: the circle is a special case of ellipse and hence the result of this Problem also applies to circles,
i.e. the triangle of optimal area whose one side is a diameter of a circle while the opposite vertex is on
the circumference of the circle is an isosceles of optimal (i.e. maximum) area of R2 (with R being the
radius of the circle).
Figure 40: A simple sketch depicting the setting of Problem 1 of § 3.4 where a surface of revolution is
generated by revolving a planar curve Γ around the x axis. The curve Γ is represented by a function
y(x) and it connects two fixed points (A and B) with ds representing the length of an infinitesimal arc
generating an infinitesimal ring of radius y and width ds.
Problems
1. Find the shape of the surface of revolution of optimal area (as described above in the text).
Answer: The area σ of a surface of revolution is the sum of the areas of the infinitesimal rings of
radius y and width ds (see Figure 40). Noting that the area of each one of these rings (which are
infinitesimal cylinders) is given by 2πyds (i.e. perimeter 2πy times width ds), the area σ should be
given by the following integral (which is the functional that we intend to minimize in this Problem):
$$\sigma = \int_{\Gamma} 2\pi y \, ds = 2\pi \int_{x_A}^{x_B} y \sqrt{1 + y'^2} \, dx \equiv I[y]$$
Accordingly, $F(x, y, y') = y\sqrt{1 + y'^2}$. As we see, F in this Problem is identical to F in the problem of
catenary (see Problem 1 of § 2.3). Therefore, the shape of the curve Γ (which represents the profile of
the required surface of revolution) should also be a catenary (i.e. hyperbolic cosine), that is:
$$y = C \cosh\left(\frac{x - D}{C}\right) \qquad (88)$$
A surface generated by the revolution of a catenary (around its horizontal axis) is called a catenoid, and
hence the surface of revolution of optimal area is a catenoid.
Note 1: the constants C and D in Eq. 88 can be determined from the two boundary conditions at
the end points of the curve (see Problem 2).
Note 2: the optimal solution of this Problem should be a minimum (not a maximum) because it is
obvious that the surface area generated by the revolution of some curves can diverge and hence the
optimal solution cannot be a maximum.[74] Also see Problem 4.
Note 3: the existence and uniqueness of the solution of this Problem is not guaranteed. In more
accurate terms, depending on the positions of the fixed end points (A and B) there may be one
solution of this type (which is usually the required solution), or more than one solution (with only one
possibly representing the required minimal surface), or no solution at all.[75]
[74] Although this sort of argument essentially rules out the possibility of a global (or absolute) maximum, it should work in
general even for a local maximum (noting also our lax approach in pursuing issues like determining the nature of the optimum
and considering the upcoming Problem 4 as well although it is also based ultimately on a similar argument).
3.4 Surface of Revolution of Optimal Area 129
2. Given that the profile curve of the surface of Problem 1 passes through the boundary points (0, 1) and
(0.5, 2), find the equation of this profile curve and plot it.
Answer: We start by solving Eq. 88 for D, that is:
$$D = x - C \, \mathrm{arccosh}\left(\frac{y}{C}\right)$$
On substituting the coordinates of the two boundary points into this equation we get:
$$D = 0 - C \, \mathrm{arccosh}\left(\frac{1}{C}\right) \qquad (89)$$
$$D = 0.5 - C \, \mathrm{arccosh}\left(\frac{2}{C}\right) \qquad (90)$$
On equating the right hand sides of Eqs. 89 and 90 and solving the resulting equation for C (using a
numerical solver) we get C ≃ 0.6348247523 and hence D ≃ −0.6518922526 (where we use Eq. 89 or
Eq. 90 to find D). Accordingly, the equation of the profile curve is:
$$y \simeq 0.6348247523 \cosh\left(\frac{x + 0.6518922526}{0.6348247523}\right) \simeq 0.6348247523 \cosh\left(1.575237885\,x + 1.026885373\right)$$
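The numerical step above can be sketched as follows. This is only an illustrative cross-check, assuming a plain bisection on the difference between Eqs. 89 and 90; the function name `catenary_constants` and the bracket 0.1 ≤ C ≤ 1 are assumptions of this sketch (the bracket relies on y/C ≥ 1 at both end points), not the solver used in the text.

```python
import math

def catenary_constants(x1, y1, x2, y2, lo=0.1, hi=1.0, iters=200):
    """Find C and D of y = C*cosh((x - D)/C) through (x1, y1) and (x2, y2).

    Assumes the branch D = x - C*arccosh(y/C) used in the text and that the
    bracket [lo, hi] contains exactly one root for C.
    """
    def residual(C):
        # Eq. 89 minus Eq. 90: both expressions for D must agree
        return (x1 - C * math.acosh(y1 / C)) - (x2 - C * math.acosh(y2 / C))

    f_lo = residual(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    C = 0.5 * (lo + hi)
    D = x1 - C * math.acosh(y1 / C)
    return C, D

C, D = catenary_constants(0.0, 1.0, 0.5, 2.0)
print(C, D)
```

For other boundary points the bracket would have to be chosen again, and (as Note 3 of Problem 1 warns) a root may not exist at all.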
(rather than the nature of the solution and its physical significance as we have no primary interest in these details).
Figure 41: Plot of the profile curve (y versus x for 0 ≤ x ≤ 0.5) of the minimum surface of revolution
that passes through the boundary points (0, 1) and (0.5, 2). See Problem 2 of § 3.4.
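$$dz = \frac{d\rho}{\sqrt{\frac{\rho^2}{C^2} - 1}}$$
$$z = C \, \mathrm{arccosh}\left(\frac{\rho}{C}\right) + D \qquad (91)$$
$$\rho = C \cosh\left(\frac{z - D}{C}\right)$$
Now, if we note the above-indicated correspondence between z, ρ, ρ′ and x, y, y′ then the last equation
is the same as Eq. 88, as required.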
4. Verify the fact that catenoid is a minimal surface using the result of Problem 4 of § 1.6.
Answer: We use the settings of Problem 3 which satisfy the domain and boundary conditions of
Problem 4 of § 1.6.[77] Hence, according to Eq. 91 (which is an equation of a catenoid in 3D as well
as a catenary in 2D) we have:
$$z = C \, \mathrm{arccosh}\left(\frac{\sqrt{x^2 + y^2}}{C}\right) + D \qquad \left(\rho = \sqrt{x^2 + y^2}\right)$$
$$z_x = \frac{x}{\sqrt{x^2 + y^2}\,\sqrt{\frac{x^2 + y^2}{C^2} - 1}}$$
$$z_{xx} = \frac{-x^4 + y^4 - C^2 y^2}{C^2 \left(x^2 + y^2\right)^{3/2} \left(\frac{x^2 + y^2}{C^2} - 1\right)^{3/2}}$$
$$z_y = \frac{y}{\sqrt{x^2 + y^2}\,\sqrt{\frac{x^2 + y^2}{C^2} - 1}}$$
$$z_{yy} = \frac{-y^4 + x^4 - C^2 x^2}{C^2 \left(x^2 + y^2\right)^{3/2} \left(\frac{x^2 + y^2}{C^2} - 1\right)^{3/2}}$$
$$z_{xy} = \frac{-xy \left(2x^2 + 2y^2 - C^2\right)}{C^2 \left(x^2 + y^2\right)^{3/2} \left(\frac{x^2 + y^2}{C^2} - 1\right)^{3/2}}$$
On substituting from these equations into Eq. 17 we get 0 = 0, i.e. Eq. 17 is satisfied identically by
the equation of the catenoid. So, the catenoid is a solution to Eq. 17 and hence its area is minimum
(according to Problem 4 of § 1.6), i.e. it is a minimal surface, as required.
[77] In fact, there are some issues about the domain and boundary conditions that require discussion and clarification.
However, we ignore these issues because they are not relevant or necessary for our objectives.
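The substitution can be spot-checked numerically. The sketch below assumes that Eq. 17 (which is not reproduced in this section) is the standard minimal surface equation (1 + z_y²) z_xx − 2 z_x z_y z_xy + (1 + z_x²) z_yy = 0; the function names and the sample points are illustrative choices only.

```python
import math

def catenoid_derivatives(x, y, C):
    """Closed-form partial derivatives of z = C*arccosh(sqrt(x^2+y^2)/C) + D."""
    rho2 = x * x + y * y
    w = rho2 / C**2 - 1.0                 # positive only where rho > C
    zx = x / math.sqrt(rho2 * w)
    zy = y / math.sqrt(rho2 * w)
    denom = C**2 * rho2**1.5 * w**1.5
    zxx = (-x**4 + y**4 - C**2 * y**2) / denom
    zyy = (-y**4 + x**4 - C**2 * x**2) / denom
    zxy = -x * y * (2 * x**2 + 2 * y**2 - C**2) / denom
    return zx, zy, zxx, zyy, zxy

def minimal_surface_residual(x, y, C):
    """Left hand side of the (assumed) minimal surface equation."""
    zx, zy, zxx, zyy, zxy = catenoid_derivatives(x, y, C)
    return (1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy

# Two arbitrary sample points with sqrt(x^2 + y^2) > C
print(minimal_surface_residual(1.3, 0.7, 1.0))
print(minimal_surface_residual(2.0, -1.5, 0.8))
```

The residuals should vanish up to rounding error; note that the constant D drops out of all the derivatives.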
Chapter 4
Optimal Solids
In this chapter we present and solve problems about topics and applications of the mathematics of variation
related to solids, i.e. in these problems we look for certain 3D objects that optimize some quantity (such
as volume).
[78] More technically and specifically, the 3D shapes of interest in this section have straight sides and hence they are
polyhedrons. So, in these problems we are optimizing the sum of the lengths of the sides of polyhedrons.
4.1 3D Shapes of Optimal Sides Lengths 133
parallelepiped and V is the volume which is a constant) and hence $z = \frac{V}{xy}$. So, the sum of sides lengths
is $S = 4(x + y + z) = 4\left(x + y + \frac{V}{xy}\right)$. Now, we should optimize S by taking its partial derivatives with
respect to x and y and setting them to zero to obtain the optimization conditions, that is:
$$\frac{1}{4}\frac{\partial S}{\partial x} = 1 - \frac{V}{x^2 y} = 0$$
$$\frac{1}{4}\frac{\partial S}{\partial y} = 1 - \frac{V}{x y^2} = 0$$
On subtracting the first equation from the second we get:
$$\begin{aligned}
\frac{V}{x^2 y} - \frac{V}{x y^2} &= 0 \\
x^2 y &= x y^2 \\
x &= y \qquad (\text{dividing by } xy \neq 0)
\end{aligned}$$
Now, if we repeat the above process but this time with $y = \frac{V}{xz}$ (instead of $z = \frac{V}{xy}$) then we get x = z.
Hence, x = y = z which means that our parallelepiped is a cube.
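Note 1: if we use the Lagrange multipliers technique then f + λg = 4(x + y + z) + λxyz and hence:
$$\frac{\partial}{\partial x}(f + \lambda g) = 4 + \lambda yz = 0$$
$$\frac{\partial}{\partial y}(f + \lambda g) = 4 + \lambda xz = 0$$
$$\frac{\partial}{\partial z}(f + \lambda g) = 4 + \lambda xy = 0$$
which by comparison lead to x = y = z.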
Note 2: it should be obvious that the optimal sum of sides lengths in this Problem is a minimum (not
a maximum) because this sum will diverge if two of the dimensions (say y and z) of the parallelepiped
approach zero (noting that the volume is fixed).[79]
3. What is the shape of the regular pyramid of square base and fixed volume that optimizes the sum of
the lengths of all its sides?[80]
Answer: Let a be the length of the 4 sides of the square base, A the length of the 4 slant sides, V
the fixed volume of the pyramid, H its height, and h the height of the slant faces (see Figure 42).
Accordingly, we have:
$$V = \frac{1}{3} a^2 H \qquad \text{and hence} \qquad H = \frac{3V}{a^2}$$
Now, the sum S of the lengths of all sides is given by:
$$\begin{aligned}
S &= 4a + 4A \\
  &= 4a + 4\sqrt{\frac{a^2}{2} + H^2} \\
  &= 4a + 4\sqrt{\frac{a^2}{2} + \left(\frac{3V}{a^2}\right)^2} \\
  &= 4\left(a + \sqrt{\frac{a^2}{2} + \frac{9V^2}{a^4}}\right)
\end{aligned}$$
Figure 42: A schematic illustration of the setting of Problem 3 of § 4.1 showing a regular pyramid of
square base and fixed volume. The symbols are explained in the text.
To optimize S we take its derivative with respect to a and set it to zero to obtain the optimization
condition, that is:
$$\begin{aligned}
\frac{1}{4}\frac{dS}{da} &= 0 \\
1 + \frac{a - 4\,\frac{9V^2}{a^5}}{2\sqrt{\frac{a^2}{2} + \frac{9V^2}{a^4}}} &= 0 \\
1 + \frac{a^6 - 36V^2}{2a^5\sqrt{\frac{a^2}{2} + \frac{9V^2}{a^4}}} &= 0 \\
\left(a^6 - 36V^2\right)^2 &= 4a^{10}\left(\frac{a^2}{2} + \frac{9V^2}{a^4}\right) \\
a^{12} - 72V^2 a^6 + 36^2 V^4 &= 2a^{12} + 36V^2 a^6 \\
a^{12} + 108V^2 a^6 - 36^2 V^4 &= 0
\end{aligned}$$
On solving the last equation for the variable $a^6$ (using the quadratic formula), we get:
$$a^6 = \frac{-108V^2 \pm \sqrt{108^2 V^4 + 4 \times 36^2 V^4}}{2} = \left(-54 \pm 18\sqrt{13}\right) V^2$$
The only physically sensible[81] root is the positive one and hence $a^6 = \left(-54 + 18\sqrt{13}\right) V^2$ and thus
$a = \left(-54 + 18\sqrt{13}\right)^{1/6} V^{1/3}$.
[81] In fact, even mathematically considering our restriction to real quantities.
4.2 3D Shapes of Optimal Surface Area 135
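$$\begin{aligned}
2a + \frac{a^6 - 18V^2}{a^3\sqrt{\frac{a^4}{4} + \frac{9V^2}{a^2}}} &= 0 \\
\frac{\left(a^6 - 18V^2\right)^2}{a^6\left(\frac{a^4}{4} + \frac{9V^2}{a^2}\right)} &= 4a^2 \\
a^{12} - 36V^2 a^6 + 18^2 V^4 &= a^{12} + 36V^2 a^6 \\
72V^2 a^6 &= 18^2 V^4 \\
a^6 &= \frac{9V^2}{2} \\
a &= \left(\frac{9}{2}\right)^{1/6} V^{1/3} \qquad (95)
\end{aligned}$$
So, our pyramid has a square base of side length a and height $H = \frac{3V}{a^2}$ with an optimal surface area
$\sigma = a^2 + 2\sqrt{\frac{a^4}{4} + \frac{9V^2}{a^2}}$ (where a is given by Eq. 95).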
Note: it should be obvious that the optimal surface area in this Problem is a minimum (not a
maximum) because the surface area will diverge if the height H of the pyramid approaches zero
(noting that the volume is fixed).
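A hedged numerical cross-check of Eq. 95: the sketch below minimizes the surface area expression quoted above with a golden-section search and compares the minimizer with the closed form. The search routine, its bracket [0.1, 5] and the sample value V = 1 are assumptions of this sketch; the search presumes that σ has a single minimum inside the bracket.

```python
import math

def pyramid_area(a, V):
    """Surface area of a regular square pyramid with base side a and volume V."""
    return a**2 + 2.0 * math.sqrt(a**4 / 4.0 + 9.0 * V**2 / a**2)

def golden_section_min(f, lo, hi, tol=1e-10):
    """Locate the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d          # minimum lies in [lo, d]
        else:
            lo = c          # minimum lies in [c, hi]
    return 0.5 * (lo + hi)

V = 1.0                                                   # sample volume
a_numeric = golden_section_min(lambda a: pyramid_area(a, V), 0.1, 5.0)
a_closed = (9.0 / 2.0) ** (1.0 / 6.0) * V ** (1.0 / 3.0)  # Eq. 95
print(a_numeric, a_closed)
```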
2. What is the shape of the right circular cone of fixed volume and optimal surface area?
Answer: This is an area optimization problem with a volume constraint, so we solve it by the Lagrange
multipliers technique (see § 1.8). If the radius of the cone base is R and its height is H then its surface
area is $\sigma = \pi R^2 + \pi R\sqrt{H^2 + R^2}$ while its volume is $V = \frac{\pi}{3} R^2 H$. So, what we should optimize
(according to the Lagrange multipliers technique) is:
$$f + \lambda g \equiv \sigma + \lambda V = \pi R^2 + \pi R\sqrt{H^2 + R^2} + \lambda \frac{\pi}{3} R^2 H$$
Accordingly, we take the partial derivatives of σ + λV with respect to R and H (which are the
variables in this optimization since they control the area and volume) and set them to zero to obtain
the optimization conditions, that is:
$$\frac{1}{\pi}\frac{\partial}{\partial R}(\sigma + \lambda V) = 2R + \sqrt{H^2 + R^2} + \frac{R^2}{\sqrt{H^2 + R^2}} + \frac{2\lambda}{3} RH = 0$$
$$\frac{1}{\pi}\frac{\partial}{\partial H}(\sigma + \lambda V) = \frac{RH}{\sqrt{H^2 + R^2}} + \frac{\lambda}{3} R^2 = 0$$
Now, if we multiply the second equation by $\frac{2H}{R}$ and subtract the result from the first equation we get:
$$\begin{aligned}
2R + \sqrt{H^2 + R^2} + \frac{R^2}{\sqrt{H^2 + R^2}} - \frac{2H^2}{\sqrt{H^2 + R^2}} &= 0 \\
2R + \frac{H^2 + R^2}{\sqrt{H^2 + R^2}} + \frac{R^2}{\sqrt{H^2 + R^2}} - \frac{2H^2}{\sqrt{H^2 + R^2}} &= 0 \\
2R + \frac{2R^2 - H^2}{\sqrt{H^2 + R^2}} &= 0 \\
\frac{\left(2R^2 - H^2\right)^2}{H^2 + R^2} &= 4R^2 \\
4R^4 - 4R^2 H^2 + H^4 &= 4R^2 H^2 + 4R^4 \\
H^4 - 8R^2 H^2 &= 0 \\
H^2 - 8R^2 &= 0 \qquad (H \neq 0) \\
H &= \sqrt{8}\,R
\end{aligned}$$
So, the optimal cone has a base radius $R = \left(\frac{3V}{\pi\sqrt{8}}\right)^{1/3}$ and height $H = \sqrt{8}\,R$ with an optimal surface
area $\sigma = 4\pi R^2$.
Note: the optimal surface area in this Problem is a minimum because σ can diverge when H approaches
zero (noting that the volume is fixed).
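The two optimization conditions can be checked directly at the stated solution. In the sketch below the multiplier λ is obtained from the H condition and substituted into the R condition; the function name `cone_conditions` and the sample volume V = 2 are illustrative assumptions.

```python
import math

def cone_conditions(R, H):
    """Return (lambda, residual of the dR condition) for sigma + lambda*V."""
    s = math.hypot(H, R)                  # slant height sqrt(H^2 + R^2)
    lam = -3.0 * H / (R * s)              # solves R*H/s + (lambda/3)*R^2 = 0
    residual = 2.0 * R + s + R**2 / s + (2.0 * lam / 3.0) * R * H
    return lam, residual

V = 2.0                                   # sample volume
R = (3.0 * V / (math.pi * math.sqrt(8.0))) ** (1.0 / 3.0)
H = math.sqrt(8.0) * R
lam, residual = cone_conditions(R, H)
area = math.pi * R**2 + math.pi * R * math.hypot(H, R)
volume = math.pi * R**2 * H / 3.0
print(residual, area - 4.0 * math.pi * R**2, volume - V)
```

All three printed quantities should vanish up to rounding error, which also confirms that the sign lost in the squaring step is the correct one here (2R² − H² is negative at H = √8 R).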
3. What is the shape of the rectangular parallelepiped of fixed volume and optimal surface area?
Answer: This is an area optimization problem with a volume constraint and hence we can solve it
using the Lagrange multipliers technique (see § 1.8). Now, if the three dimensions of the parallelepiped
are x, y, z then its surface area is σ = 2(xy + xz + yz) and its volume is V = xyz. So, our Lagrange
multipliers formulation is f + λg ≡ σ + λV = 2(xy + xz + yz) + λxyz and hence we should optimize
σ + λV by taking its partial derivatives with respect to x, y, z and setting these partial derivatives to
zero to obtain the optimization conditions, that is:
$$\frac{\partial}{\partial x}(\sigma + \lambda V) = 2(y + z) + \lambda yz = 0 \qquad (96)$$
$$\frac{\partial}{\partial y}(\sigma + \lambda V) = 2(x + z) + \lambda xz = 0 \qquad (97)$$
$$\frac{\partial}{\partial z}(\sigma + \lambda V) = 2(x + y) + \lambda xy = 0 \qquad (98)$$
Now, if we multiply Eq. 96 with x and Eq. 97 with y and subtract the second from the first we get:
$$\begin{aligned}
2x(y + z) - 2y(x + z) &= 0 \\
2xz - 2yz &= 0 \\
x &= y \qquad (z \neq 0)
\end{aligned}$$
Similarly, if we multiply Eq. 97 with y and Eq. 98 with z and subtract the second from the first we
get y = z.
On combining the results x = y and y = z we get x = y = z which means that our parallelepiped is a
cube.
Note: the optimal surface area in this Problem is a minimum because σ can diverge when one
dimension (say z) approaches zero (noting that the volume is fixed).
4. What is the shape of the rectangular parallelepiped of fixed sum of sides lengths and optimal surface
area?
Answer: This Problem can be solved by a similar method to that used in Problem 1 of § 4.1. However,
it is easier to use the proof by contradiction to show that the shape is a cube because if the cube with
fixed sum of sides lengths (say S0 ) is not optimal (i.e. maximum; see the upcoming note) in surface
area then there should be a non-cube parallelepiped with sum S0 and with maximum surface area (say
σ0 ) and hence a cube of surface area σ0 will have a larger sum than S0 which contradicts the result of
Problem 1 of § 4.1 because the cube of surface area σ0 has minimum sum of sides lengths.[82]
Note: the optimal surface area in this Problem is a maximum because the surface area will converge
to zero if two dimensions (say y and z) of the parallelepiped approach zero (noting that the sum of
sides lengths is fixed).
5. What is the shape of the right circular cylinder of fixed volume and optimal surface area?
Answer: The surface area of a right circular cylinder is $\sigma = 2\pi R^2 + 2\pi RH$ and its volume is $V = \pi R^2 H$
where R and H are its radius and height. We are asked to optimize σ subject to a constraint on V and
hence according to the Lagrange multipliers technique we have $f + \lambda g = \sigma + \lambda V = 2\pi R^2 + 2\pi RH + \lambda \pi R^2 H$.
So, we take the partial derivatives of σ + λV with respect to R and H and set them to zero
to obtain the optimization conditions, that is:
$$\frac{1}{\pi}\frac{\partial}{\partial R}(\sigma + \lambda V) = 4R + 2H + 2\lambda RH = 0$$
$$\frac{1}{\pi}\frac{\partial}{\partial H}(\sigma + \lambda V) = 2R + \lambda R^2 = 0$$
Now, if we multiply the second equation with $\frac{2H}{R}$ and subtract it from the first equation we get:
$$\begin{aligned}
4R + 2H - 4H &= 0 \\
H &= 2R
\end{aligned}$$
So, the optimal cylinder has a height equal to its diameter with an optimal surface area
$\sigma = 2\pi R^2 + 2\pi R(2R) = 6\pi R^2$ where $R = \left(\frac{V}{2\pi}\right)^{1/3}$.
Note: the optimal surface area in this Problem is a minimum because σ will diverge when H approaches
zero (noting that the volume is fixed).
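As an informal check of H = 2R, the volume constraint can be used to eliminate H and the resulting one-variable area can be differentiated numerically. The central-difference helper, its step size and the sample volume V = 3 are assumptions of this sketch.

```python
import math

def cylinder_area(R, V):
    """Surface area of a closed cylinder of radius R whose volume is fixed at V."""
    H = V / (math.pi * R**2)              # height implied by the constraint
    return 2.0 * math.pi * R**2 + 2.0 * math.pi * R * H

def central_difference(f, x, step=1e-6):
    """Second-order finite-difference estimate of f'(x)."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

V = 3.0                                   # sample volume
R_opt = (V / (2.0 * math.pi)) ** (1.0 / 3.0)
H_opt = V / (math.pi * R_opt**2)
slope = central_difference(lambda R: cylinder_area(R, V), R_opt)
print(H_opt / R_opt, slope)
```

The ratio H/R should come out as 2 and the slope should be close to zero (limited by the finite-difference error).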
[82] For a cube, $S = 12x$ and $\sigma = 6x^2$ (where S is the sum of sides lengths, σ is its surface area, and x is the length of its sides)
which can be combined to obtain $S = 12\sqrt{\sigma/6}$ and hence the sum of sides lengths increases/decreases as the surface
area increases/decreases (and vice versa).
4.3 3D Shapes of Optimal Volume 138
[83] For a cube, $\sigma = 6x^2$ and $V = x^3$ (where σ is its surface area, V is its volume, and x is the length of its sides) which can
be combined to obtain $\sigma = 6V^{2/3}$ and hence the surface area increases/decreases as the volume increases/decreases (and
vice versa).
[84] For a cube, $V = x^3$ and $\sigma = 6x^2$ (where V is its volume, σ is its surface area, and x is the length of its sides) which can
be combined to obtain $V = (\sigma/6)^{3/2}$ and hence the volume increases/decreases as the surface area increases/decreases
(and vice versa).
[85] For a cube, $S = 12x$ and $V = x^3$ (where S is the sum of sides lengths, V is its volume, and x is the length of its sides)
which can be combined to obtain $S = 12V^{1/3}$ and hence the sum of sides lengths increases/decreases as the volume
increases/decreases (and vice versa).
[86] In fact, it will converge to zero even if only one dimension approaches zero.
6. What is the shape of the ellipsoid of fixed sum of semi-axes lengths and optimal volume?
Answer: The sum of the lengths of semi-axes a, b, c is S = a + b + c while the volume is $V = \frac{4}{3}\pi abc$. It
is required to optimize V subject to a constraint on S and hence we can use the Lagrange multipliers
technique with $f + \lambda g \equiv V + \lambda S = \frac{4}{3}\pi abc + \lambda(a + b + c)$. On taking the partial derivatives of V + λS
with respect to a, b, c and setting them to zero we can obtain the optimization conditions, that is:
$$\frac{\partial}{\partial a}(V + \lambda S) = \frac{4}{3}\pi bc + \lambda = 0$$
$$\frac{\partial}{\partial b}(V + \lambda S) = \frac{4}{3}\pi ac + \lambda = 0$$
$$\frac{\partial}{\partial c}(V + \lambda S) = \frac{4}{3}\pi ab + \lambda = 0$$
By subtracting the second equation from the first equation we get a = b while by subtracting the third
equation from the first equation we get a = c. Hence, a = b = c which means that the optimal ellipsoid
is a sphere.
Note: the optimal volume in this Problem is a maximum because V can approach zero when one (or
two) of the semi-axes approach zero (noting that the sum of semi-axes lengths is fixed).
7. What is the shape of the right circular cone of fixed slant height (or generator) and optimal volume?
Answer: Referring to Figure 43, if the fixed slant height is H while the height and base radius are h
and r then $r = \sqrt{H^2 - h^2}$ and hence the volume of the cone is:
$$V = \frac{1}{3}\pi r^2 h = \frac{1}{3}\pi \left(H^2 - h^2\right) h = \frac{1}{3}\pi \left(H^2 h - h^3\right)$$
Figure 43: A cross section of a right circular cone of base radius r and height h with fixed slant height H.
The cross section is through the apex A of the cone and the diameter of its base. See Problem 7 of § 4.3.
To optimize V we take its derivative with respect to h (which is the variable since H is fixed) and set
the derivative to zero to obtain the optimization condition, that is:
$$\frac{3}{\pi}\frac{dV}{dh} = 0$$
$$H^2 - 3h^2 = 0$$
4.4 3D Shapes Inside Other 3D Shapes 141
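$$h = \frac{H}{\sqrt{3}} \qquad (h > 0)$$
Hence, the optimal cone has $h = \frac{H}{\sqrt{3}}$ and $r = \sqrt{H^2 - \frac{H^2}{3}} = \sqrt{\frac{2}{3}}\,H$ with an optimal volume
$V = \frac{1}{3}\pi \left(\frac{2}{3} H^2\right) \frac{H}{\sqrt{3}} = \frac{2\pi}{9\sqrt{3}} H^3$.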
Note: the optimal volume in this Problem is a maximum because V can converge to zero when r
approaches zero or when h approaches zero (noting that the slant height is fixed).
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \qquad (99)$$
where a, b, c are the lengths of the semi-axes (which are positive constants). Now, by symmetry[87] the
volume of the parallelepiped in each octant is equal to the volume in any other octant and hence if we
consider the part of the parallelepiped that is in the first octant (i.e. the octant with x, y, z > 0) then
the volume of this part is xyz (with x, y, z being the coordinates of the parallelepiped vertex in that
octant) and hence the volume of the parallelepiped is 8xyz (since in 3D we have 8 octants).[88] So,
our objective is to optimize f = xyz (which is equivalent to optimizing 8xyz). However, because the
parallelepiped is inscribed inside the ellipsoid then its vertices should be on the ellipsoid and hence the
x, y, z of the vertex in the first octant should satisfy the equation of the ellipsoid (i.e. Eq. 99). So in
brief, we have to optimize xyz subject to the constraint of Eq. 99 and hence we can use the Lagrange
multipliers technique by optimizing $h = f + \lambda g = xyz + \lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right)$. Accordingly, we take the
partial derivatives of h with respect to x, y, z (which are the variables of optimization since they control
the volume of the parallelepiped) and set them to zero to obtain the optimization conditions, that is:
$$\frac{\partial h}{\partial x} = yz + \frac{2\lambda x}{a^2} = 0 \qquad (100)$$
$$\frac{\partial h}{\partial y} = xz + \frac{2\lambda y}{b^2} = 0 \qquad (101)$$
$$\frac{\partial h}{\partial z} = xy + \frac{2\lambda z}{c^2} = 0 \qquad (102)$$
[87] The symmetry of the ellipsoid (as a result of being in standard form) should imply the symmetry of the parallelepiped,
i.e. it is in standard form with its center being on the origin of coordinates and its sides being parallel to the axes of the
coordinate system.
[88] The reader should note that we are actually using x, y, z in two different meanings, i.e. as general coordinates (as in Eq.
99) and as coordinates of the parallelepiped vertex in the first octant. The reason is to simplify the notation and avoid
unnecessary complications; otherwise we can (for instance) use X, Y, Z for the coordinates of the vertex to distinguish
between the two meanings.
On multiplying Eq. 100 with x and Eq. 101 with y and subtracting the results we get:
$$\frac{2\lambda x^2}{a^2} - \frac{2\lambda y^2}{b^2} = 0$$
$$\frac{x^2}{a^2} = \frac{y^2}{b^2} \qquad (103)$$
Also, on multiplying Eq. 100 with x and Eq. 102 with z and subtracting the results we get:
$$\frac{2\lambda x^2}{a^2} - \frac{2\lambda z^2}{c^2} = 0$$
$$\frac{x^2}{a^2} = \frac{z^2}{c^2} \qquad (104)$$
On comparing Eqs. 103 and 104 we get:
$$\frac{x^2}{a^2} = \frac{y^2}{b^2} = \frac{z^2}{c^2}$$
On substituting from the last equation into Eq. 99 (once for $\frac{y^2}{b^2}$ and $\frac{z^2}{c^2}$, once for $\frac{x^2}{a^2}$ and $\frac{z^2}{c^2}$, and once
for $\frac{x^2}{a^2}$ and $\frac{y^2}{b^2}$) we get:
$$3\frac{x^2}{a^2} = 1 \qquad 3\frac{y^2}{b^2} = 1 \qquad 3\frac{z^2}{c^2} = 1$$
that is:
$$x = \frac{a}{\sqrt{3}} \qquad y = \frac{b}{\sqrt{3}} \qquad z = \frac{c}{\sqrt{3}}$$
Again, from the symmetry it is obvious that x, y, z are just half the sides of the parallelepiped and
hence the lengths of the sides of the parallelepiped are $\frac{2a}{\sqrt{3}}$ (along the x axis), $\frac{2b}{\sqrt{3}}$ (along the y axis), and
$\frac{2c}{\sqrt{3}}$ (along the z axis). Accordingly, the optimal volume of the parallelepiped is
$\frac{2a}{\sqrt{3}} \times \frac{2b}{\sqrt{3}} \times \frac{2c}{\sqrt{3}} = \frac{8abc}{3\sqrt{3}}$
(which can also be obtained from 8xyz which we used above).
Note: the optimal volume in this Problem is a maximum because the volume approaches zero when
one or two of the sides of the parallelepiped approach zero.
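The maximum 8abc/(3√3) can be confirmed by brute force. The sketch below places the first-octant vertex on the ellipsoid through an angular parametrization (so that Eq. 99 holds automatically) and scans a grid; the parametrization, the grid size n and the sample semi-axes are assumptions of this sketch and are not used in the solution above.

```python
import math

def best_inscribed_box(a, b, c, n=400):
    """Largest volume 8*x*y*z over a grid of first-octant points of the ellipsoid."""
    best = 0.0
    for i in range(1, n):
        u = 0.5 * math.pi * i / n                 # polar angle in (0, pi/2)
        for j in range(1, n):
            v = 0.5 * math.pi * j / n             # azimuthal angle in (0, pi/2)
            x = a * math.sin(u) * math.cos(v)
            y = b * math.sin(u) * math.sin(v)
            z = c * math.cos(u)                   # (x, y, z) satisfies Eq. 99
            best = max(best, 8.0 * x * y * z)
    return best

a, b, c = 3.0, 2.0, 1.0                           # sample semi-axes
grid_max = best_inscribed_box(a, b, c)
closed_form = 8.0 * a * b * c / (3.0 * math.sqrt(3.0))
print(grid_max, closed_form)
```

The grid value approaches the closed form from below as n grows, consistent with the solution being a maximum.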
2. What is the shape of the rectangular parallelepiped of optimal surface area that is inscribed inside a
spheroid (i.e. ellipsoid of revolution) of a fixed shape?
Answer: The equation of a spheroid in standard form can be written as:
$$\frac{x^2}{a^2} + \frac{y^2 + z^2}{b^2} = 1 \qquad (105)$$
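$$\frac{\partial h}{\partial x} = y + z + \frac{2\lambda x}{a^2} = 0$$
$$\frac{\partial h}{\partial y} = x + z + \frac{2\lambda y}{b^2} = 0$$
$$\frac{\partial h}{\partial z} = x + y + \frac{2\lambda z}{b^2} = 0$$
Now, from the symmetry we should have z = y and hence the last three equations will reduce to only
two, that is:
$$2y + \frac{2\lambda x}{a^2} = 0 \qquad (106)$$
$$x + y + \frac{2\lambda y}{b^2} = 0 \qquad (107)$$
On solving Eq. 106 for λ we get $\lambda = -\frac{a^2 y}{x}$ and on substituting this in Eq. 107 we get:
$$x + y - \frac{2a^2 y^2}{b^2 x} = 0$$
$$x^2 + yx - \frac{2a^2 y^2}{b^2} = 0$$
On solving the last equation for x (using the quadratic formula) we get:
$$x = \frac{-y + \sqrt{y^2 + \frac{8a^2 y^2}{b^2}}}{2} = \left(\frac{-1 + \sqrt{1 + \frac{8a^2}{b^2}}}{2}\right) y$$
where we ignored the non-sensible negative root. Now, on substituting from the last equation (plus
z = y) into Eq. 105 we get:
$$\begin{aligned}
\frac{1}{a^2}\left(\frac{-1 + \sqrt{1 + \frac{8a^2}{b^2}}}{2}\right)^2 y^2 + \frac{y^2 + y^2}{b^2} &= 1 \\
\left[\frac{1}{a^2}\left(\frac{-1 + \sqrt{1 + \frac{8a^2}{b^2}}}{2}\right)^2 + \frac{2}{b^2}\right] y^2 &= 1 \\
y = z &= \left[\frac{1}{a^2}\left(\frac{-1 + \sqrt{1 + \frac{8a^2}{b^2}}}{2}\right)^2 + \frac{2}{b^2}\right]^{-1/2}
\end{aligned}$$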
and its optimal surface area is σ = 8 (xy + xz + yz) where x, y, z are as given above.
Note: the optimal surface area in this Problem is a maximum because the surface area can approach
zero when two sides of the parallelepiped (i.e. those corresponding to the identical semi-axes) approach
zero.
3. A spheroid is given by the equation $x^2 + 3y^2 + 3z^2 = 1$. Find the dimensions and the surface area of
the inscribed parallelepiped of optimal surface area.
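Answer: On comparing this equation with Eq. 105 we get $a^2 = 1$ and $b^2 = 1/3$. Therefore, the
dimensions and the surface area of the optimal parallelepiped are:
$$2x = 2\left(\frac{-1 + \sqrt{1 + \frac{8}{1/3}}}{2}\right)\left[\frac{1}{1}\left(\frac{-1 + \sqrt{1 + \frac{8}{1/3}}}{2}\right)^2 + \frac{2}{1/3}\right]^{-1/2} = \frac{4}{\sqrt{10}}$$
$$2y = 2z = 2\left[\frac{1}{1}\left(\frac{-1 + \sqrt{1 + \frac{8}{1/3}}}{2}\right)^2 + \frac{2}{1/3}\right]^{-1/2} = \frac{2}{\sqrt{10}}$$
$$\sigma = 8\left(\frac{2}{\sqrt{10}}\frac{1}{\sqrt{10}} + \frac{2}{\sqrt{10}}\frac{1}{\sqrt{10}} + \frac{1}{\sqrt{10}}\frac{1}{\sqrt{10}}\right) = 4$$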
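The arithmetic of this Problem can be reproduced with a few lines of code that evaluate the closed-form expressions of Problem 2. The function name `optimal_box_in_spheroid` and its argument names are hypothetical labels chosen for this sketch.

```python
import math

def optimal_box_in_spheroid(a2, b2):
    """Half-sides (x, y, z) and surface area from the closed form of Problem 2.

    a2 and b2 stand for a^2 and b^2 in Eq. 105.
    """
    k = (-1.0 + math.sqrt(1.0 + 8.0 * a2 / b2)) / 2.0   # ratio x / y
    y = (k**2 / a2 + 2.0 / b2) ** -0.5
    x = k * y
    z = y
    area = 8.0 * (x * y + x * z + y * z)
    return x, y, z, area

x, y, z, area = optimal_box_in_spheroid(1.0, 1.0 / 3.0)
print(2 * x, 2 * y, 2 * z, area)
```

The printed side lengths can be compared with 4/√10 and 2/√10, and the vertex (x, y, z) can be substituted into x² + 3y² + 3z² = 1 as a consistency check.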
4. What is the shape of the right circular cylinder of optimal surface area that is inscribed inside a sphere
of a fixed radius?
Answer: Let the radius of the sphere be R while the radius and height of the cylinder be r and h (see
Figure 44). From the Pythagoras theorem we have $h = 2\sqrt{R^2 - r^2}$ and hence the surface area of the
cylinder is:
$$\sigma = 2\pi r^2 + 2\pi r h = 2\pi r^2 + 4\pi r\sqrt{R^2 - r^2} = 2\pi\left(r^2 + 2\sqrt{R^2 r^2 - r^4}\right)$$
Figure 44: A cross section of a sphere of radius R in which a right circular cylinder of radius r and height
h is inscribed. The cross section is through the center of the sphere (and the center of the cylinder) and
along two generators of the cylinder. See Problem 4 of § 4.4.
To optimize σ we take its derivative with respect to r (which is the optimization variable since R is
fixed) and set it to zero to obtain the optimization condition, that is:
$$\begin{aligned}
\frac{1}{2\pi}\frac{d\sigma}{dr} &= 0 \\
2r + 2\,\frac{2R^2 r - 4r^3}{2\sqrt{R^2 r^2 - r^4}} &= 0 \\
r + \frac{R^2 r - 2r^3}{\sqrt{R^2 r^2 - r^4}} &= 0 \\
\frac{\left(R^2 r - 2r^3\right)^2}{R^2 r^2 - r^4} &= r^2 \\
R^4 r^2 - 4R^2 r^4 + 4r^6 &= R^2 r^4 - r^6 \\
5r^6 - 5R^2 r^4 + R^4 r^2 &= 0 \\
5r^4 - 5R^2 r^2 + R^4 &= 0 \qquad (r \neq 0)
\end{aligned}$$
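On solving this equation for $r^2$ (using the quadratic formula) we get:
$$r^2 = \frac{5R^2 \pm \sqrt{25R^4 - 20R^4}}{10} = \left(\frac{5 \pm \sqrt{5}}{10}\right) R^2 \qquad \text{and hence} \qquad r = \left(\frac{5 \pm \sqrt{5}}{10}\right)^{1/2} R$$
Therefore:
$$h = 2\sqrt{R^2 - r^2} = 2\sqrt{R^2 - \left(\frac{5 \pm \sqrt{5}}{10}\right) R^2} = 2R\left(\frac{5 \mp \sqrt{5}}{10}\right)^{1/2}$$
On inserting the obtained expressions for r and h in the equation of the area (i.e. $\sigma = 2\pi r^2 + 2\pi r h$)
we get:
$$\begin{aligned}
\sigma &= 2\pi\left(\frac{5 \pm \sqrt{5}}{10}\right) R^2 + 2\pi\left(\frac{5 \pm \sqrt{5}}{10}\right)^{1/2} R \; 2R\left(\frac{5 \mp \sqrt{5}}{10}\right)^{1/2} \\
&= 2\pi R^2\left(\frac{5 \pm \sqrt{5}}{10}\right) + 4\pi R^2\left(\frac{5 \pm \sqrt{5}}{10}\right)^{1/2}\left(\frac{5 \mp \sqrt{5}}{10}\right)^{1/2} \\
&= 2\pi R^2\left[\left(\frac{5 \pm \sqrt{5}}{10}\right) + 2\left(\frac{5^2 - 5}{100}\right)^{1/2}\right] \\
&= 2\pi R^2\left[\left(\frac{5 \pm \sqrt{5}}{10}\right) + \frac{\sqrt{20}}{5}\right]
\end{aligned}$$
Now, since the optimal surface area is a maximum (see the upcoming notes), we take $r = \left(\frac{5 + \sqrt{5}}{10}\right)^{1/2} R$
and $h = 2R\left(\frac{5 - \sqrt{5}}{10}\right)^{1/2}$ (the root with the lower sign does not satisfy the optimization condition
before squaring, which requires $r^2 > R^2/2$, and hence it is introduced by the squaring step) and hence the
optimal surface area is:
$$\sigma = 2\pi R^2\left[\left(\frac{5 + \sqrt{5}}{10}\right) + \frac{\sqrt{20}}{5}\right] = \pi\left(1 + \sqrt{5}\right) R^2 \simeq 10.1664 R^2$$
This is less than the surface area of the sphere which is $4\pi R^2 \simeq 12.5664 R^2$ (as it should be).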
Note 1: the optimal surface area in this Problem is a maximum because σ can approach zero when r
approaches zero (and h approaches 2R).[89]
Note 2: for more clarity, we plotted in Figure 45 the surface area σ of a cylinder inscribed inside a
unit sphere as a function of the cylinder radius r.[90]
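The two candidate radii can be tested against the derivative of σ before squaring. The sketch below works with a unit sphere (R = 1, as in Figure 45); the function names are illustrative, and the derivative formula is obtained by differentiating σ = 2πr² + 4πr√(R² − r²) directly.

```python
import math

def area(r, R=1.0):
    """Surface area of the cylinder of radius r inscribed in a sphere of radius R."""
    return 2.0 * math.pi * r**2 + 4.0 * math.pi * r * math.sqrt(R**2 - r**2)

def darea_dr(r, R=1.0):
    """Derivative of area() with respect to r (before any squaring)."""
    s = math.sqrt(R**2 - r**2)
    return 4.0 * math.pi * r + 4.0 * math.pi * (s - r**2 / s)

r_plus = math.sqrt((5.0 + math.sqrt(5.0)) / 10.0)    # upper-sign root
r_minus = math.sqrt((5.0 - math.sqrt(5.0)) / 10.0)   # lower-sign root
print(darea_dr(r_plus), darea_dr(r_minus), area(r_plus))
```

Only the upper-sign root makes the derivative vanish; at the lower-sign root the derivative is clearly positive, so that root arises from squaring rather than from the optimization condition.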
5. What is the shape of the right circular cylinder of optimal volume that is inscribed inside a sphere of
a fixed radius?
Answer: If we follow in our footsteps in Problem 4 then the volume V of the cylinder is given by:
$$V = \pi r^2 h = \pi h\left(R^2 - \frac{h^2}{4}\right) = \pi\left(R^2 h - \frac{h^3}{4}\right)$$
To optimize V we take its derivative with respect to h (which is the optimization variable since R is
fixed) and set it to zero to obtain the optimization condition, that is:
$$\frac{1}{\pi}\frac{dV}{dh} = 0$$
[89] We should also note that the above optimal area, i.e. $\sigma = \pi(1 + \sqrt{5})R^2$, is also larger than the area in the other limiting
case when h approaches zero and hence the surface area approaches $2\pi R^2$.
[90] The use of unit sphere does not affect the generality because this is equivalent to using length units scaled by 1/R and
Figure 45: Plot of the surface area σ (of a cylinder inscribed inside a unit sphere) as a function of the
cylinder radius r where the peak of the curve at $\left(\sqrt{\frac{5 + \sqrt{5}}{10}},\; \pi + \pi\sqrt{5}\right) \simeq (0.8507, 10.1664)$ is marked. See
Problem 4 of § 4.4.
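$$R^2 - \frac{3h^2}{4} = 0$$
$$h = \frac{2R}{\sqrt{3}}$$
Hence, the optimal cylinder has a height $h = \frac{2R}{\sqrt{3}}$ and a radius $r = \sqrt{R^2 - \frac{h^2}{4}} = \sqrt{R^2 - \frac{R^2}{3}} = R\sqrt{\frac{2}{3}}$
with an optimal volume $V = \frac{4\pi}{3\sqrt{3}} R^3$.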
Note: the optimal volume in this Problem is a maximum because V can approach zero when r
approaches zero (and h approaches 2R) or when h approaches zero (and r approaches R).
6. What is the shape of the right circular cylinder of optimal volume that is inscribed inside a right
circular cone of a fixed shape?
Answer: Let r and h be the radius and height of the cylinder and R and H the base radius and height
of the cone (see Figure 46). Now, the triangles ABC and ADE are similar and hence:
$$\frac{H - h}{H} = \frac{r}{R} \qquad \rightarrow \qquad h = H\left(1 - \frac{r}{R}\right)$$
Thus, the volume of the cylinder is:
$$V = \pi r^2 h = \pi r^2 H\left(1 - \frac{r}{R}\right) = \pi H\left(r^2 - \frac{r^3}{R}\right)$$
Figure 46: A cross section of a right circular cylinder of radius r and height h inscribed inside a right
circular cone of base radius R and height H. The cross section is through the apex A of the cone and the
diameter of its base (and hence through the center of the cylinder and along two of its generators). See
Problem 6 of § 4.4.
To optimize V we take its derivative with respect to r (which is the optimization variable since R and
H are fixed) and set it to zero to obtain the optimization condition, that is:
$$\begin{aligned}
\frac{1}{\pi}\frac{dV}{dr} &= 0 \\
H\left(2r - 3\frac{r^2}{R}\right) &= 0 \\
2 - 3\frac{r}{R} &= 0 \qquad (H \neq 0,\; r \neq 0) \\
r &= \frac{2}{3} R
\end{aligned}$$
Thus, the cylinder of optimal volume has $r = \frac{2}{3}R$ and $h = H\left(1 - \frac{r}{R}\right) = H\left(1 - \frac{1}{R}\,\frac{2}{3}R\right) = \frac{H}{3}$ with an
optimal volume $V = \pi\left(\frac{2}{3}R\right)^2 \frac{H}{3} = \frac{4\pi}{27} R^2 H$.
Note: the optimal volume in this Problem is a maximum because V can approach zero when h
approaches H (or when r approaches R).
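Because the volume depends on r only through the ratio t = r/R, the maximum can be located exactly on a rational grid. The sketch below is a small illustration using exact fractions; the scaled quantity V/(πR²H) = t²(1 − t) follows from the volume expression above, while the grid spacing 1/300 is an arbitrary choice that happens to contain t = 2/3.

```python
from fractions import Fraction

def scaled_volume(t):
    """V / (pi R^2 H) for a cylinder of radius r = t*R inscribed in the cone."""
    return t**2 * (1 - t)

# Exact rational grid on 0 <= t <= 1
grid = [Fraction(k, 300) for k in range(301)]
t_best = max(grid, key=scaled_volume)
print(t_best, scaled_volume(t_best))
```

The grid maximum falls at t = 2/3 with scaled volume 4/27, in agreement with r = 2R/3 and V = 4πR²H/27.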
7. What is the shape of the right circular cone of optimal volume that is inscribed inside a sphere of a
fixed radius?
Answer: Let R be the radius of the sphere and r and h be the base radius and height of the cone (see
Figure 47). Now, r = R cos θ and h = R + R sin θ and hence the volume of the cone is:
$$V = \frac{1}{3}\pi r^2 h = \frac{1}{3}\pi\left(R^2\cos^2\theta\right)\left(R + R\sin\theta\right) = \frac{\pi R^3}{3}\left(\cos^2\theta + \cos^2\theta\,\sin\theta\right)$$
To optimize V we take its derivative with respect to θ (which is the optimization variable since R is
fixed) and set it to zero to obtain the optimization condition, that is:
$$\frac{3}{\pi R^3}\frac{dV}{d\theta} = 0$$
Figure 47: A cross section of a sphere of radius R inside which a right circular cone of base radius r and
height h is inscribed. The cross section is through the apex A of the cone and the diameter of its base
(and hence through the center C of the sphere). See Problem 7 of § 4.4.
Figure 48: A cross section of a sphere of radius r inscribed inside a right circular cone of base radius R
and height H. The cross section is through the apex A of the cone and the diameter of its base (and
hence through the center C of the circle). See Problem 8 of § 4.4.
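$$\begin{aligned}
r^2 H^2 &= R^2 H^2 - 2rR^2 H \\
\left(R^2 - r^2\right) H^2 &= 2rR^2 H \\
\left(R^2 - r^2\right) H &= 2rR^2 \qquad (H \neq 0) \\
H &= \frac{2rR^2}{R^2 - r^2}
\end{aligned}$$
Now, the volume of the cone is:
$$V = \frac{1}{3}\pi R^2 H = \frac{1}{3}\pi R^2\,\frac{2rR^2}{R^2 - r^2} = \frac{2\pi r R^4}{3\left(R^2 - r^2\right)}$$
To optimize V we take its derivative with respect to R (which is the optimization variable since r is
fixed) and set it to zero to obtain the optimization condition, that is:
$$\begin{aligned}
\frac{3}{2\pi}\frac{dV}{dR} &= 0 \\
\frac{4rR^3}{R^2 - r^2} - \frac{rR^4\,(2R)}{\left(R^2 - r^2\right)^2} &= 0 \\
\frac{4rR^3\left(R^2 - r^2\right)}{\left(R^2 - r^2\right)^2} - \frac{2rR^5}{\left(R^2 - r^2\right)^2} &= 0 \\
4rR^5 - 4r^3 R^3 - 2rR^5 &= 0 \\
2rR^5 - 4r^3 R^3 &= 0 \\
R^2 - 2r^2 &= 0 \qquad (r \neq 0,\; R \neq 0) \\
R &= \sqrt{2}\,r
\end{aligned}$$
So, the optimal cone has $R = \sqrt{2}\,r$ and $H = \frac{2r\left(\sqrt{2}\,r\right)^2}{\left(\sqrt{2}\,r\right)^2 - r^2} = 4r$ with an optimal volume
$V = \frac{1}{3}\pi\left(\sqrt{2}\,r\right)^2 (4r) = \frac{8}{3}\pi r^3$.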
4.5 Solids of Revolution of Optimal Volume 150
Note 1: the optimal volume in this Problem is a minimum because V can diverge when H approaches
2r (or when R approaches r).
Note 2: for more clarity, we plotted in Figure 49 the volume V of a cone (in which a unit sphere is
inscribed) as a function of the cone radius R.[91]
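Both the constraint H = 2rR²/(R² − r²) and the minimum can be checked numerically. The sketch below assumes a cross-section with the apex at (0, H), a base corner at (R, 0) and the sphere centre at (0, r), so that the slant side is the line Hx + Ry − RH = 0; this coordinate choice, the function names and the sample radii are assumptions of this sketch.

```python
import math

def cone_height(R, r):
    """Height of the cone of base radius R that circumscribes a sphere of radius r."""
    return 2.0 * r * R**2 / (R**2 - r**2)

def tangency_gap(R, r):
    """Distance from the sphere centre (0, r) to the slant side, minus r."""
    H = cone_height(R, r)
    return R * (H - r) / math.hypot(H, R) - r

def cone_volume(R, r):
    return math.pi * R**2 * cone_height(R, r) / 3.0

r = 1.0                                           # unit sphere, as in Figure 49
R_opt = math.sqrt(2.0) * r
samples = [1.05 + 0.05 * k for k in range(40)]    # base radii from 1.05 to 3.0
print(max(abs(tangency_gap(R, r)) for R in samples))
print(cone_volume(R_opt, r), min(cone_volume(R, r) for R in samples))
```

A vanishing gap means the sphere touches the slant side for every sampled R, and none of the sampled volumes falls below the value at R = √2 r.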
Figure 49: Plot of the volume V (of a right circular cone in which a unit sphere is inscribed) as a function
of the cone base radius R where the minimum of the curve at $\left(\sqrt{2},\; \frac{8\pi}{3}\right) \simeq (1.4142, 8.3776)$ is marked. See
Problem 8 of § 4.4.
So, if the volume is optimum then it should be stationary with respect to variations in the parameter
a of the parabola (which controls the shape of the parabola and hence the volume), that is:
$$\frac{1}{\pi}\frac{dV}{da} = 0$$
$$\frac{d}{da}\int_0^1 \left[ax^2 + (1 - a)x\right]^2 dx = 0 \qquad (108)$$
$$\int_0^1 \frac{\partial}{\partial a}\left[ax^2 + (1 - a)x\right]^2 dx = 0 \qquad (109)$$
$$\begin{aligned}
\int_0^1 2\left[ax^2 + (1 - a)x\right]\left(x^2 - x\right) dx &= 0 \\
\int_0^1 2\left[ax^4 + (1 - a)x^3 - ax^3 - (1 - a)x^2\right] dx &= 0 \\
\int_0^1 \left[ax^4 + (1 - 2a)x^3 - (1 - a)x^2\right] dx &= 0 \\
\left[\frac{a}{5}x^5 + \frac{(1 - 2a)}{4}x^4 - \frac{(1 - a)}{3}x^3\right]_0^1 &= 0 \\
\frac{a}{5} + \frac{(1 - 2a)}{4} - \frac{(1 - a)}{3} - \left[0 + 0 - 0\right] &= 0 \\
\frac{12a + 15 - 30a - 20 + 20a}{60} &= 0 \\
\frac{2a - 5}{60} &= 0 \\
a &= \frac{5}{2}
\end{aligned}$$
Therefore, the volume-optimizing parabola is given by:
$$y = \frac{5}{2}x^2 + \left(1 - \frac{5}{2}\right)x = \frac{5}{2}x^2 - \frac{3}{2}x$$
This parabola is plotted in Figure 50.
Note 1: regarding the differentiation of the integral (see Eqs. 108 and 109) we are using the fact that
if f is a function of two variables (t and x) and
$$\phi(t) = \int_\alpha^\beta f(t, x)\, dx$$
Figure 50: Plot of the parabola Γ that passes through the points (0, 0) and (1, 1) (and crosses the x axis
at (0.6, 0)) and optimizes the volume generated by its revolution around the x axis. See Problem 1 of § 4.5.
which leads to a = 5/2 (as before). However, we followed this method (which requires differentiating
the integral) for diversity and practice.[92]
Note 2: although the immediate objective here is to find a curve, the ultimate objective is to find the
shape of a 3D solid object (of optimal volume) and hence the Problem is classified as an optimal solid
issue. The sufficiency of finding a curve is due to the fact that we are dealing with a solid of revolution
which is completely defined by its profile curve (plus the x axis).
2. Show that the solution that we obtained in Problem 1 is a minimum.
Answer: It should be intuitively obvious that the solution in Problem 1 is a minimum because for
some parabolas (passing through the designated boundary points) the volume can diverge. However,
we can show this formally by calculating the volume (as a function of a) and plotting it, that is:
ˆ 1
2 2
V = π ax + (1 − a) x dx
0
ˆ 1
= π a2 x4 + 2ax3 − 2a2 x3 + x2 − 2ax2 + a2 x2 dx
0
ˆ 1 2 4
= π a x + 2a − 2a2 x3 + 1 − 2a + a2 x2 dx
0
1
a2 5 2a − 2a2 4 1 − 2a + a2
= π x + x + x3
5 4 3 0
2
a 2a − 2a2 1 − 2a + a2
= π + +
5 4 3
2
6a 15a − 15a2 10 − 20a + 10a2
= π + +
30 30 30
[92] In fact, this method may also reduce the required algebra and can help in simplifying the integration (which can be
essential in some cases).
4.5 Solids of Revolution of Optimal Volume 153
\[
= \pi\,\frac{a^2 - 5a + 10}{30}
\]
On plotting the volume as a function of a (see Figure 51) we can see clearly that the volume has a
minimum (V = π/8) at a = 5/2 and hence the solution is a minimum (rather than a maximum or
inflection).
[Figure 51 plot: V (vertical axis, about 0.3926 to 0.3938) against a (horizontal axis, 2.4 to 2.6).]
Figure 51: Plot of the volume V of the parabola of Problem 2 of § 4.5 as a function of a where the lowest
point (5/2, π/8) is marked.
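The result of Problems 1 and 2 can also be checked with exact rational arithmetic. The following is a minimal sketch (the function name `volume_over_pi` is ours, not from the text) that integrates the expanded integrand term by term and probes both sides of a = 5/2:

```python
from fractions import Fraction

def volume_over_pi(a):
    """Exact value of V/pi = integral over [0, 1] of (a x^2 + (1 - a) x)^2."""
    a = Fraction(a)
    # Coefficients of x^4, x^3 and x^2 in the expanded integrand.
    c4, c3, c2 = a * a, 2 * a * (1 - a), (1 - a) ** 2
    # Integrate term by term over [0, 1].
    return c4 / 5 + c3 / 4 + c2 / 3

a_opt = Fraction(5, 2)
v_opt = volume_over_pi(a_opt)
print(v_opt)  # 1/8, i.e. V = pi/8

# V/pi is quadratic in a, so probing both sides of a_opt is enough to see the minimum.
step = Fraction(1, 100)
assert volume_over_pi(a_opt - step) > v_opt
assert volume_over_pi(a_opt + step) > v_opt
```

Using `Fraction` avoids rounding, so the value 1/8 is reproduced exactly rather than approximately.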
3. Find the cubic curve y(x) that passes through the points (0, 0) and (1, 1) and optimizes the volume
generated by its revolution around the x axis. Also, plot the curve between x = 0 and x = 1.
Answer: The equation of a cubic curve is $y = ax^3 + bx^2 + cx + d$ where $a, b, c, d$ are constants and $a \neq 0$.
Now, since the curve passes through (0, 0) then $d = 0$, and since it passes through (1, 1) then $a + b + c = 1$
and hence $c = 1 - a - b$. Therefore, the curve should be given by $y = ax^3 + bx^2 + (1 - a - b)x$. Now,
the volume generated by the revolution of this cubic curve around the x axis between (0, 0) and (1, 1)
is given by:
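\[
\begin{aligned}
V &= \int_0^1 \pi y^2\, dx \\
&= \pi\int_0^1 \left[ax^3 + bx^2 + (1 - a - b)x\right]^2 dx \\
&= \pi\int_0^1 \Big[a^2x^6 + 2abx^5 + \left(-2a^2 - 2ab + b^2 + 2a\right)x^4 \\
&\qquad\qquad + \left(-2ab - 2b^2 + 2b\right)x^3 + \left(a^2 + 2ab + b^2 - 2a - 2b + 1\right)x^2\Big]\, dx \\
&= \pi\Big[\frac{a^2}{7}x^7 + \frac{2ab}{6}x^6 + \frac{-2a^2 - 2ab + b^2 + 2a}{5}x^5 \\
&\qquad\qquad + \frac{-2ab - 2b^2 + 2b}{4}x^4 + \frac{a^2 + 2ab + b^2 - 2a - 2b + 1}{3}x^3\Big]_0^1 \\
&= \pi\Big[\frac{a^2}{7} + \frac{2ab}{6} + \frac{-2a^2 - 2ab + b^2 + 2a}{5} + \frac{-2ab - 2b^2 + 2b}{4} \\
&\qquad\qquad + \frac{a^2 + 2ab + b^2 - 2a - 2b + 1}{3} - 0\Big] \\
&= \pi\left[\frac{8a^2}{105} + \frac{ab}{10} + \frac{b^2}{30} - \frac{4a}{15} - \frac{b}{6} + \frac{1}{3}\right] \qquad (110)
\end{aligned}
\]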
Now, if the volume is optimum then it should be stationary with respect to variations in the parameters
a and b of the curve (since these parameters control the shape of the curve and hence the volume),
that is:
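\[
\begin{aligned}
\frac{1}{\pi}\frac{\partial V}{\partial a} &= \frac{16}{105}a + \frac{1}{10}b - \frac{4}{15} = 0 \\
\frac{1}{\pi}\frac{\partial V}{\partial b} &= \frac{1}{10}a + \frac{1}{15}b - \frac{1}{6} = 0
\end{aligned}
\]
On solving this system of simultaneous equations we obtain a = 7 and b = −8 and hence c = 1 − 7 + 8 =
2. Accordingly, the cubic curve is represented by the function $y = 7x^3 - 8x^2 + 2x$ and it is plotted in
Figure 52.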
[Figure 52 plot: y (vertical axis, −0.2 to 0.8) against x (horizontal axis, 0 to 1).]
Figure 52: Plot of the cubic curve $y = 7x^3 - 8x^2 + 2x$ between x = 0 and x = 1. See Problem 3 of § 4.5.
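The two stationarity conditions of Problem 3 form a 2×2 linear system, so the values a = 7 and b = −8 can be reproduced exactly. The sketch below (variable and function names are ours) uses Cramer's rule with rational arithmetic and evaluates Eq. 110 at the solution. Since V is quadratic in (a, b), the coefficient matrix of the system is also the Hessian of V/π, and checking that it is positive definite is one way to see that the stationary point is a minimum.

```python
from fractions import Fraction as Fr

def volume_over_pi(a, b):
    """V/pi as given by Eq. 110."""
    a, b = Fr(a), Fr(b)
    return (Fr(8, 105) * a * a + Fr(1, 10) * a * b + Fr(1, 30) * b * b
            - Fr(4, 15) * a - Fr(1, 6) * b + Fr(1, 3))

# Stationarity conditions written as M (a, b)^T = r.
m11, m12, r1 = Fr(16, 105), Fr(1, 10), Fr(4, 15)
m21, m22, r2 = Fr(1, 10), Fr(1, 15), Fr(1, 6)

det = m11 * m22 - m12 * m21           # Cramer's rule
a = (r1 * m22 - m12 * r2) / det
b = (m11 * r2 - m21 * r1) / det
c = 1 - a - b
print(a, b, c)                        # 7 -8 2
print(volume_over_pi(a, b))           # 1/15, i.e. V = pi/15

# M is the Hessian of V/pi; m11 > 0 and det > 0 mean it is positive definite.
assert m11 > 0 and det > 0
```

The value V = π/15 ≈ 0.2094395 agrees with the central entry of Table 2.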
Table 2: A table for the values of V as a function of a and b where a minimum (for V ) can be seen at
(a, b) = (7, −8). Refer to Problem 4 of § 4.5 for details.

a \ b    -8.03      -8.02      -8.01      -8.00      -7.99      -7.98      -7.97
6.96   0.2102937  0.2101157  0.2099586  0.2098225  0.2097073  0.2096130  0.2095397
6.97   0.2100319  0.2098853  0.2097597  0.2096549  0.2095712  0.2095083  0.2094664
6.98   0.2098180  0.2097028  0.2096086  0.2095353  0.2094829  0.2094515  0.2094410
6.99   0.2096519  0.2095682  0.2095053  0.2094634  0.2094425  0.2094425  0.2094634
7.00   0.2095338  0.2094814  0.2094500  0.2094395  0.2094500  0.2094814  0.2095338
7.01   0.2094634  0.2094425  0.2094425  0.2094634  0.2095053  0.2095682  0.2096519
7.02   0.2094410  0.2094515  0.2094829  0.2095353  0.2096086  0.2097028  0.2098180
7.03   0.2094664  0.2095083  0.2095712  0.2096549  0.2097597  0.2098853  0.2100319
7.04   0.2095397  0.2096130  0.2097073  0.2098225  0.2099586  0.2101157  0.2102937
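\[
\begin{aligned}
H - y'\frac{\partial H}{\partial y'} &= C \\
\pi y^2 + \lambda 2\pi y\sqrt{1 + y'^2} - y'\frac{\partial}{\partial y'}\left[\pi y^2 + \lambda 2\pi y\sqrt{1 + y'^2}\right] &= C \\
\pi y^2 + \lambda 2\pi y\sqrt{1 + y'^2} - y'\left(\lambda 2\pi y\,\frac{2y'}{2\sqrt{1 + y'^2}}\right) &= C \\
\pi y^2 + \lambda 2\pi y\sqrt{1 + y'^2} - \frac{\lambda 2\pi y y'^2}{\sqrt{1 + y'^2}} &= C \\
\pi y^2 + \frac{\lambda 2\pi y\left(1 + y'^2\right) - \lambda 2\pi y y'^2}{\sqrt{1 + y'^2}} &= C \\
\pi y^2 + \frac{\lambda 2\pi y}{\sqrt{1 + y'^2}} &= C
\end{aligned}
\]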
[93] “Closed” here means that the end points of the revolving curve that generates the surface are fixed in space and hence
they do not rotate. In other words, the end points are on the axis of revolution (i.e. the line which the curve revolves
around).
[Figure 53 sketch: axes x and y, the curve Γ, and the labels O, A and B along the x axis.]
Figure 53: A simple sketch depicting the setting of Problem 5 of § 4.5 where the curve Γ revolves around
the x axis to generate a closed solid of revolution.
Now, because the curve passes through the origin (where y = 0), C is zero. Hence:
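\[
\begin{aligned}
\pi y^2 + \frac{\lambda 2\pi y}{\sqrt{1 + y'^2}} &= 0 \\
y + \frac{2\lambda}{\sqrt{1 + y'^2}} &= 0 \\
\sqrt{1 + y'^2} &= -\frac{2\lambda}{y} \\
y'^2 &= \frac{4\lambda^2}{y^2} - 1 \\
\frac{dy}{dx} &= \frac{\sqrt{4\lambda^2 - y^2}}{y} \\
\int \frac{y\, dy}{\sqrt{4\lambda^2 - y^2}} &= \int dx \\
-\sqrt{4\lambda^2 - y^2} &= x + C_1
\end{aligned}
\]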
Now, because the curve passes through the origin (where x = 0 and y = 0), C1 = −2λ and hence:
\[
\begin{aligned}
-\sqrt{4\lambda^2 - y^2} &= x - 2\lambda \\
4\lambda^2 - y^2 &= (x - 2\lambda)^2 \\
(x - 2\lambda)^2 + y^2 &= 4\lambda^2
\end{aligned}
\]
This is an equation of a circle with center (2λ, 0) and radius 2λ. Therefore, the optimal solid of
revolution is a sphere with center (2λ, 0, z0 ) and radius 2λ (with z0 being a given number representing
the z coordinate of the center).
Note 1: the optimal volume in this Problem is a maximum because V will converge to zero when the
revolving curve (and hence the surface) approaches the axis of rotation (or when the two sides of the
curve approach each other).
Note 2: because y(x) is presumed to be a function over its domain, we exclude one-to-many curves.
However, this will not affect the generality of the result noting that the optimal volume is a maximum
(not a minimum).
Note 3: in any specific problem of this type, λ can be determined from the given area σ of the surface.
For example, if the given area is 16π then we have:
σ = 16π
4.6 Solids of Revolution of Optimal Surface Area 157
\[
\begin{aligned}
4\pi(2\lambda)^2 &= 16\pi \\
16\pi\lambda^2 &= 16\pi \\
\lambda &= 1
\end{aligned}
\]
where in line 2 we used the formula for the area of sphere (i.e. $\sigma = 4\pi R^2$). Accordingly, the equation
of the circle is $(x - 2)^2 + y^2 = 4$ and the equation of the sphere is $(x - 2)^2 + y^2 + (z - z_0)^2 = 4$.
[94] For sphere, $V = \frac{4}{3}\pi R^3$ and $\sigma = 4\pi R^2$ (where V is its volume, σ is its surface area, and R is its radius) which can be
combined to obtain $V = \frac{4}{3}\pi\left(\frac{\sigma}{4\pi}\right)^{3/2}$ and hence the volume increases/decreases as the surface area increases/decreases
(and vice versa).
[95] “Solid” here means 3D continuous object with no cavity.
4.8 Solids of Optimal Volume 158
discs. Accordingly, the surface area of the sphere of the same volume should be less than (or equal to)
the surface area of the original solid, i.e. the surface area of the sphere is the minimum of all surface
areas of all solids of the same volume. So, the shape of the solid of fixed volume and optimal (i.e.
minimum) surface area is a sphere.
Note: the optimal surface area in this Problem is a minimum because the surface area of some 3D
shapes will diverge when one of the dimensions approaches zero (noting that the volume is fixed).
This can also be concluded from the results of the Problems that we used in our answer of the present
Problem.
2. Find the shape of the solid of minimum surface area with a volume V = 36π. Also, find the surface
area of this solid.
Answer: From the answer of Problem 1 the solid should be a sphere. From the formula of the volume
of a sphere of radius R we have $V = \frac{4}{3}\pi R^3 = 36\pi$ and hence R = 3. Therefore, the solid is a sphere of
radius R = 3. Also, from the formula of the surface area σ of a sphere of radius R we have $\sigma = 4\pi R^2 =
4\pi \times 3^2 = 36\pi$. So, our solid is a sphere of surface area 36π (area units) and volume 36π (volume
units).
[96] Again, “solid” here means 3D continuous object with no cavity.
[97] For sphere, $\sigma = 4\pi R^2$ and $V = \frac{4}{3}\pi R^3$ (where σ is its surface area, V is its volume, and R is its radius) which can be
combined to obtain $\sigma = 4\pi\left(\frac{3V}{4\pi}\right)^{2/3}$ and hence the surface area increases/decreases as the volume increases/decreases
(and vice versa).
4.9 Solids of Revolution of Optimal Resistance to Fluid Flow 159
[Figure 54 sketch: axes x and y, a profile curve from (0, 0) to (L, R), and the flow direction indicated.]
Figure 54: A simple sketch depicting the setting of Problem 1 of § 4.9 where a solid of revolution offers
minimum resistance to fluid flow.
the flow is minimum. Use the assumption that the resistance R experienced by such a solid is given
by:
\[
R \propto \int_0^L y y'^3\, dx
\]
where the profile of the object is described by the function y = y(x) with the boundary conditions
y(0) = 0 and y(L) = R and where the flow is in the positive x direction (see Figure 54).
Answer: From the given information we can write:
\[
R = C\int_0^L y y'^3\, dx \equiv I[y]
\]
where C is a constant, R (which is the resistance to fluid flow) represents the functional integral I
that should be minimized, and y (which is the profile of the solid of revolution and hence it completely
determines its shape subject to the boundary conditions) represents the minimizing function that we
want to find. Accordingly, $F(x, y, y') = y y'^3$ which is independent of x and hence we can use the
Beltrami identity (i.e. Eq. 3), that is:
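\[
\begin{aligned}
y y'^3 - y'\frac{\partial}{\partial y'}\left(y y'^3\right) &= C \\
y y'^3 - y'\left(3 y y'^2\right) &= C \\
y y'^3 - 3 y y'^3 &= C \\
y y'^3 &= C_1 \qquad (C_1 = -C/2) \\
y^{1/3} y' &= C_2 \qquad (C_2 = C_1^{1/3}) \\
y^{1/3}\, dy &= C_2\, dx \\
\frac{3}{4} y^{4/3} &= C_2 x + C_3 \\
y^{4/3} &= C_4 x + C_5 \\
y &= \left(C_4 x + C_5\right)^{3/4}
\end{aligned}
\]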
Now, from the boundary condition y(0) = 0 we have $C_5 = 0$ and from the boundary condition y(L) = R
we have $C_4 = R^{4/3}/L$. Hence, the shape of the resistance-minimizing solid of revolution is given by its
profile curve:
\[
y = \left(\frac{R^{4/3}}{L}\,x\right)^{3/4} = R\left(\frac{x}{L}\right)^{3/4}
\]
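As a rough numerical illustration of this result, the sketch below compares the resistance integral of the profile $y = R(x/L)^{3/4}$ with that of a straight-line (cone) profile joining the same end points. The values L = 2 and R = 1, the helper name `resistance`, and the use of a simple midpoint rule are our own choices for the illustration, not part of the text.

```python
def resistance(profile, slope, length, steps=20_000):
    """Midpoint-rule estimate of the integral of y * y'^3 over [0, length]."""
    h = length / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h             # midpoints avoid evaluating the slope at x = 0
        total += profile(x) * slope(x) ** 3
    return total * h

L, R = 2.0, 1.0                       # illustrative values, not from the text

# Profile obtained above: y = R (x/L)^(3/4), with y' = (3R/4L) (x/L)^(-1/4).
opt = resistance(lambda x: R * (x / L) ** 0.75,
                 lambda x: 0.75 * (R / L) * (x / L) ** -0.25, L)

# Straight-line (cone) profile y = R x / L with the same end points.
cone = resistance(lambda x: R * x / L, lambda x: R / L, L)

print(opt, 27 * R**4 / (64 * L**2))   # the integrand is the constant 27 R^4 / (64 L^3)
print(cone, R**4 / (2 * L**2))
```

Along the optimal profile the product $y y'^3$ is constant (as required by $y y'^3 = C_1$), which gives the closed form $27R^4/(64L^2)$ for the integral; this is smaller than the value $R^4/(2L^2)$ obtained for the cone.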
2. Re-solve Problem 1 but assume this time that the resistance R experienced by the solid object is given
by:
\[
R \propto \int_0^L y^2 y'^2\, dx
\]
Also, plot the profile of the object assuming that the boundary conditions are y(0) = 0 and y(2) = 1.
Answer: Following in our footsteps in Problem 1 we have $F(x, y, y') = y^2 y'^2$ and hence the Euler-Lagrange
equation in this case is (see Eq. 3):
\[
\begin{aligned}
y^2 y'^2 - y'\frac{\partial}{\partial y'}\left(y^2 y'^2\right) &= C \\
y^2 y'^2 - y'\left(2 y^2 y'\right) &= C \\
y^2 y'^2 - 2 y^2 y'^2 &= C \\
y^2 y'^2 &= C_1 \\
y y' &= C_2 \\
y\, dy &= C_2\, dx \\
\frac{1}{2} y^2 &= C_2 x + C_3 \\
y &= \pm\sqrt{C_4 x + C_5}
\end{aligned}
\]
Now, from the boundary condition y(0) = 0 we have $C_5 = 0$ and from the boundary condition y(2) = 1
we have $C_4 = 1/2$. Hence, the profile of the object is given by $y = \pm\sqrt{x/2}$ and it is plotted in Figure
55.
[Figure 55 plot: y (vertical axis, −1 to 1) against x (horizontal axis, 0 to 2).]
Figure 55: A simple sketch depicting the profile $y = \pm\sqrt{x/2}$ of the solid of revolution of minimum
resistance to fluid flow according to Problem 2 of § 4.9.
Chapter 5
Geometrical Optics
One of the main foundations of the classical geometrical optics is Fermat’s principle which states that light
travels along paths that take least time, and hence this principle is essentially a variational principle. For
example, in a vacuum or in a uniform medium the light should propagate along straight paths because
least time (according to Fermat’s principle) implies least distance (noting that the shortest paths in
Euclidean spaces are straight lines and that the speed of light is fixed). Although this principle is not
strictly correct[98] and does normally apply only in geometrical optics (where the path of light is much
longer than its wavelength), it still has many useful applications.
In this fairly short chapter we investigate a number of variational Problems related to common (and
simple) applications of geometrical optics. These Problems are essentially based on Fermat’s principle
of least time. It is noteworthy that some of these Problems are so simple that they can (and will) be
solved by ordinary calculus with no need for the variational formulation of the calculus of variations (as
represented by the Euler-Lagrange equation in its various forms and shapes).
Problems
1. Use Fermat’s principle to conclude that light in vacuum and homogeneous media follows a straight path.
Answer: This rather trivial result can be easily concluded from Fermat’s principle because in Euclidean
space (which is the space of classical geometrical optics) the shortest path connecting two
points (directly) is a straight line (see § 2.1). Now, since in vacuum and homogeneous media the speed
of light is constant then least time (according to Fermat’s principle) means shortest distance[99] which
is the (length of the) straight line according to the aforementioned Euclidean geometrical fact.
2. Generalize the result of Problem 1 to include non-Euclidean spaces.
Answer: The equivalent to “straight line” in Euclidean spaces is “geodesic” in non-Euclidean spaces
(or indeed more general). So, by the logic of Problem 1 Fermat’s principle should lead to the conclusion
that light in vacuum and homogeneous media follows geodesic paths in general (i.e. both in Euclidean
spaces and in non-Euclidean spaces).[100]
Note: it should be noted that in this Problem (as well as in Problem 1) we are assuming direct
propagation of light without being affected by subsidiary phenomena (like reflection) that divert the
light from its original geodesic trajectory.
3. Derive the law of light reflection (i.e. the angle of incidence is equal to the angle of reflection) using
Fermat’s principle.
Answer: In this Problem the light is presumed to propagate in a single uniform medium (or in
vacuum) and hence it has a constant speed throughout its journey from the point of departure A to
the point of destination C passing through the point of reflection B (see Figure 56).[101] Accordingly,
least time (according to Fermat’s principle) is equivalent to least distance of travel. So, what we need
to do to solve this Problem is to find the path ABC of minimum length by considering the variations
[98] In fact, this principle may be rectified by changing it from being a minimization principle (as implied by least time) to
be a principle for obtaining stationary points (where these stationary points can be minimum or maximum or inflection
or saddle). The details (which are essentially of physical significance and hence they are of little interest to us as
mathematicians of variation) should be sought in the literature.
[99] This is due to the direct relation between distance and time, i.e. d = ct where d is distance, c is speed and t is time.
[100] In fact, we need to provide a rigorous definition for “homogeneous” in non-Euclidean spaces (at least over certain patches
of the space). We should also consider possible position dependency of the speed and its local and global significance.
[101] The presumption that the path between A and B and between B and C is straight is based on the result of Problem
1. So, in reality we are applying Fermat’s principle twice: once on the sub-paths AB and BC, and once on the entire
path ABC. In other words: in this Problem we are using Fermat’s principle in both propagation (along AB and BC)
and reflection (along ABC).
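[Figure 56 sketch: points A and C at heights y1 and y2 above a mirror, the reflection point B on the mirror with the angles θ1 and θ2 marked, and the horizontal distances x and L − x (total L).]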
Figure 56: A simple sketch depicting the setting of the derivation of the law of light reflection (see Problem
3 of § 5).
of the angle of incidence θ1 and the angle of reflection θ2 (noting that these angles vary as the point of
reflection B varies and hence they are functions of the position of B and are correlated to the length
of the path ABC). In fact, this Problem can (and will) be solved using simple geometry, algebra and
ordinary calculus (supported by the variational principle) with no need for the calculus of variations
(as normally represented by the Euler-Lagrange equation).
Now, the length s of the path ABC is the sum of the length of AB and the length of BC where each
one of these lengths can be obtained from the Pythagoras theorem, that is:
\[
s = \sqrt{x^2 + y_1^2} + \sqrt{(L - x)^2 + y_2^2}
\]
So, by the variational principle s should be stationary at its extremum and hence:
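\[
\begin{aligned}
\frac{ds}{dx} &= 0 \\
\frac{2x}{2\sqrt{x^2 + y_1^2}} + \frac{-2(L - x)}{2\sqrt{(L - x)^2 + y_2^2}} &= 0 \\
\frac{x}{\sqrt{x^2 + y_1^2}} - \frac{(L - x)}{\sqrt{(L - x)^2 + y_2^2}} &= 0 \\
\sin\theta_1 - \sin\theta_2 &= 0 \qquad \text{(see Figure 56)} \\
\sin\theta_1 &= \sin\theta_2 \\
\theta_1 &= \theta_2 \qquad \left(\text{considering the restrictions } 0 < \theta_1, \theta_2 < \frac{\pi}{2}\right)
\end{aligned}
\]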
which is the law of light reflection.
Note 1: it should be obvious that the optimal path in this Problem is a minimum because the path
of light (and hence the time of travel) can diverge when point B moves far away.[102] However, in the
[102] As we noted earlier in a similar context, although this sort of argument may not rule out the possibility of a local
maximum it is still useful in supporting our intuitive conclusion (noting the secondary importance of issues like this for
our variational objectives).
following we establish this technically by testing the sign of the second derivative of s, that is:
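\[
\begin{aligned}
\frac{d^2 s}{dx^2} &= \frac{1}{\sqrt{x^2 + y_1^2}} - \frac{x^2}{\left(x^2 + y_1^2\right)^{3/2}} + \frac{1}{\sqrt{(L - x)^2 + y_2^2}} - \frac{(L - x)^2}{\left[(L - x)^2 + y_2^2\right]^{3/2}} \\
&= \frac{x^2 + y_1^2 - x^2}{\left(x^2 + y_1^2\right)^{3/2}} + \frac{(L - x)^2 + y_2^2 - (L - x)^2}{\left[(L - x)^2 + y_2^2\right]^{3/2}} \\
&= \frac{y_1^2}{\left(x^2 + y_1^2\right)^{3/2}} + \frac{y_2^2}{\left[(L - x)^2 + y_2^2\right]^{3/2}} > 0
\end{aligned}
\]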
So, the optimal s is a minimum and hence the obtained result is consistent with Fermat’s principle
(even in its restricted form as a principle of least time).
Note 2: in the above formulation and arguments we assumed implicitly that the path of light is in a
plane that is perpendicular to the plane of the mirror (where the latter plane is the tangent plane if
the mirror is not flat). This may also be justified by Fermat’s principle (noting that the path in other
planes is not optimal relative to the path in the perpendicular plane even if the condition θ1 = θ2 is
satisfied).
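The derivation of Problem 3 can be illustrated numerically by minimizing the path length s(x) directly and then comparing the two angles. The sketch below is only an illustration under our own assumptions: the values y1 = 2, y2 = 3 and L = 10 and the helper names are not from the text, and a ternary search is used because s is convex in x (its second derivative is positive, as shown above).

```python
import math

def path_length(x, y1, y2, L):
    """Length of the broken path ABC when B is at horizontal offset x from A."""
    return math.hypot(x, y1) + math.hypot(L - x, y2)

def minimise(f, lo, hi, iterations=200):
    """Ternary search for the minimiser of a convex function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

y1, y2, L = 2.0, 3.0, 10.0            # illustrative values, not from the text
x = minimise(lambda u: path_length(u, y1, y2, L), 0.0, L)

theta1 = math.atan2(x, y1)            # tan(theta1) = x / y1
theta2 = math.atan2(L - x, y2)        # tan(theta2) = (L - x) / y2
print(x, theta1, theta2)
```

For these values the condition θ1 = θ2 gives x/y1 = (L − x)/y2, i.e. x = L y1/(y1 + y2) = 4, which the search should reproduce to within the accuracy of floating-point comparison.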
4. A light ray is emitted from point (3, 12, 5) in a 3D Euclidean space coordinated by a Cartesian system
toward a flat mirror in the xy plane. If the reflection of this ray is required to pass through the point
(9, 13, 45), what should the initial direction of the ray be?
Answer: If the mirror does not exist then it is required (due to the symmetry and the law of reflection)
that the ray should pass through the point (9, 13, −45). Hence, the initial direction should be along
the line passing through the initial point (3, 12, 5) and the final point (9, 13, −45), i.e. along the vector
(9 − 3, 13 − 12, −45 − 5) = (6, 1, −50).
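The mirror-image argument of Problem 4 can be checked with a few lines of code. The sketch below (names are ours) builds the image of the target point in the plane z = 0, finds where the ray meets the mirror, and compares the angles that the incoming and outgoing segments make with the mirror normal.

```python
import math

source = (3.0, 12.0, 5.0)
target = (9.0, 13.0, 45.0)

# Mirror image of the target in the plane z = 0.
image = (target[0], target[1], -target[2])
direction = tuple(i - s for i, s in zip(image, source))
print(direction)                      # (6.0, 1.0, -50.0)

# Point where the ray meets the mirror (z = 0).
t = -source[2] / direction[2]
hit = tuple(s + t * d for s, d in zip(source, direction))

def angle_to_normal(p, q):
    """Angle between the segment pq and the mirror normal (the z axis)."""
    v = [b - a for a, b in zip(p, q)]
    return math.acos(abs(v[2]) / math.sqrt(sum(c * c for c in v)))

theta_in = angle_to_normal(source, hit)
theta_out = angle_to_normal(hit, target)
print(hit, theta_in, theta_out)
```

The ray meets the mirror near (3.6, 12.1, 0), and the two angles agree, which is the law of reflection derived in Problem 3.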
5. Derive Snell’s law of light refraction using Fermat’s principle.
Answer: Snell’s law of refraction states that the path of a light ray crossing the boundary between
two propagation media of different refractive indices[103] satisfies the following relation (see Figure 57):
\[
\frac{\sin\theta_1}{\sin\theta_2} = \frac{c_1}{c_2} = \frac{n_2}{n_1}
\]
where $\theta_1$ and $\theta_2$ are the angles of incidence and refraction, $c_1$ and $c_2$ are the speeds of light in medium
1 and in medium 2, and $n_1$ and $n_2$ are the refractive indices of medium 1 (where the source of light is)
and medium 2 (where the receiver of light is). So, let us see how this law can be derived from Fermat’s
principle.
The length s of the path ABC is the sum of the length s1 of AB and the length s2 of BC where each
one of these lengths can be obtained from the Pythagoras theorem, that is:[104]
\[
s = s_1 + s_2 = \sqrt{x^2 + y_1^2} + \sqrt{(L - x)^2 + y_2^2}
\]
The total time of travel t (which we should minimize according to Fermat’s principle) is the sum of
the time t1 along AB and the time t2 along BC, that is:
\[
t = t_1 + t_2 = \frac{s_1}{c_1} + \frac{s_2}{c_2} = \frac{\sqrt{x^2 + y_1^2}}{c_1} + \frac{\sqrt{(L - x)^2 + y_2^2}}{c_2}
\]
[103] The refractive index n of a medium (for light propagation) is the ratio of the speed of light in vacuum c to the speed
of light in that medium cm , that is n = c/cm . We should note that both media are presumed optically homogeneous;
moreover one of the media (and possibly both as a special case) could be vacuum (whose refractive index is 1).
[104] Again, the presumption that the path between A and B and between B and C is straight is based on the result of
Problem 1. So, in this Problem we are actually using Fermat’s principle in both propagation and refraction.
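[Figure 57 sketch: point A in medium 1 at height y1 above the boundary, point C in medium 2 at depth y2 below it, the crossing point B on the boundary with the angles θ1 and θ2 marked, the path lengths s1 and s2, and the horizontal distances x and L − x (total L).]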
Figure 57: A simple sketch depicting the setting of Problem 5 of § 5 regarding the derivation of Snell’s
law of light refraction.
Now, by the variational principle t should be stationary at its extremum and hence:
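\[
\begin{aligned}
\frac{dt}{dx} &= 0 \\
\frac{x}{c_1\sqrt{x^2 + y_1^2}} - \frac{(L - x)}{c_2\sqrt{(L - x)^2 + y_2^2}} &= 0 \\
\frac{\sin\theta_1}{c_1} - \frac{\sin\theta_2}{c_2} &= 0 \qquad \text{(see Figure 57)} \\
\frac{\sin\theta_1}{\sin\theta_2} &= \frac{c_1}{c_2} \\
\frac{\sin\theta_1}{\sin\theta_2} &= \frac{c/n_1}{c/n_2} \\
\frac{\sin\theta_1}{\sin\theta_2} &= \frac{n_2}{n_1} \qquad (111)
\end{aligned}
\]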
which is Snell’s law of light refraction.
Note: again, the optimal path in this Problem is obviously a minimum because the path of light
(and hence the time of travel) can diverge when point B moves far away. This can also be established
technically by testing the sign of the second derivative of t (as done for s in Problem 3).
6. A light ray is emitted from point (5, 3) in a plane coordinated by a Cartesian system and it is required
to reach point (10, −23). If the refractive index for y > 0 is n1 = 1.2 and the refractive index for y < 0
is n2 = 1.35, what are the coordinates of the point on the boundary through which the ray passes?
Also, what are the angle of incidence θ1 and the angle of refraction θ2 ?
Answer: Referring to Problem 5 (and Figure 57), we have L = 10 − 5 = 5, $\sin\theta_1 = \frac{x}{\sqrt{x^2 + 3^2}}$ and
$\sin\theta_2 = \frac{5 - x}{\sqrt{(5 - x)^2 + (-23)^2}}$. Hence, from Eq. 111 we get:
\[
\begin{aligned}
\frac{x}{\sqrt{x^2 + 3^2}} \times \frac{\sqrt{(5 - x)^2 + (-23)^2}}{5 - x} &= 1.125 \\
x &\simeq 0.6424877425
\end{aligned}
\]
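The equation above has no convenient closed-form solution, so x has to be found numerically. The following sketch shows one possible way (bisection, which is our choice and not necessarily the method used to obtain the quoted value) and then evaluates the crossing point and the two angles from the geometry of Figure 57. Note that x is measured horizontally from the source, so the crossing point on the boundary y = 0 has abscissa 5 + x.

```python
import math

n1, n2 = 1.2, 1.35
y1, y2, L = 3.0, 23.0, 5.0    # distances of the two points from the boundary; L = 10 - 5

def snell_residual(x):
    """sin(theta1)/sin(theta2) - n2/n1 for a crossing point at horizontal offset x."""
    sin1 = x / math.hypot(x, y1)
    sin2 = (L - x) / math.hypot(L - x, y2)
    return sin1 / sin2 - n2 / n1

lo, hi = 1e-9, L - 1e-9       # the residual is negative at lo and positive at hi
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if snell_residual(mid) < 0:
        lo = mid
    else:
        hi = mid
x = 0.5 * (lo + hi)

theta1 = math.degrees(math.atan2(x, y1))       # angle of incidence, in degrees
theta2 = math.degrees(math.atan2(L - x, y2))   # angle of refraction, in degrees
print(x)                       # close to the value 0.6424877425 quoted above
print((5 + x, 0.0), theta1, theta2)
```

With these inputs the crossing point comes out near (5.64, 0), with an angle of incidence of roughly 12° and an angle of refraction of roughly 11°.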
7. Find the shape of the path traced by a ray of light moving in a material medium whose refractive
index is proportional to y (y > 0; see Figure 58).[105]
[Figure 58 sketch: axes x and y with origin O, and the points A and B joined by the path Γ.]
Figure 58: A schematic illustration of the setting of Problem 7 of § 5 where a light ray moves from point
$A(x_A, y_A)$ to point $B(x_B, y_B)$ along the path Γ (which is confined to the xy plane) in a material medium
whose refractive index is proportional to y (i.e. n = ay with a being a positive constant).
Answer: According to Fermat’s principle the time t for the propagation of the light ray between point
A (source) and point B (destination) should be a minimum. So, our minimized functional I[y] should
be an integral representing t, that is:
\[
t = \int_\Gamma dt = \int_\Gamma \frac{ds}{v} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}\, dx}{c_m} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}\, dx}{c/(ay)} = \frac{a}{c}\int_{x_A}^{x_B} y\sqrt{1 + y'^2}\, dx \equiv I[y]
\]
where in step 2 we used v = ds/dt (and hence dt = ds/v), in step 3 we used $ds = \sqrt{1 + y'^2}\, dx$ and
$v \equiv c_m$ (with $c_m$ being the speed of light in the material medium), and in step 4 we used the fact
that the speed of light $c_m$ in a material medium with refractive index n is given by c/n noting that
in this case n = ay (with a being a positive constant) because the refractive index is proportional to
[105] In this Problem (and other similar Problems which will follow) we are assuming that the speed of light (and hence
the refractive index) is a function of position but not of direction, i.e. the medium is optically isotropic although it
is not homogeneous. We also consider the path of the ray to be confined to the xy plane. It should also be noted
that the condition y > 0 does not impose any real restriction on the formulation of this Problem (neither physical
nor mathematical) because we can always choose our coordinate system so that the entire path is in the region y > 0
(although y < 0 or using a negative proportionality constant may be dealt with as a different problem).
y. As we see, F in this Problem is identical to F in the problem of catenary (see Problem 1 of § 2.3).
Accordingly, the shape of the light path should also be a hyperbolic cosine, that is:
\[
y = C\cosh\left(\frac{x - D}{C}\right) \qquad (112)
\]
8. Find the shape of the path traced by a ray of light moving in a material medium whose refractive
index is proportional to 1/y (y > 0; see Figure 59).
[Figure 59 sketch: axes x and y with origin O, and the points A and B joined by the path Γ.]
Figure 59: A schematic illustration of the setting of Problem 8 of § 5 where a light ray moves from point
$A(x_A, y_A)$ to point $B(x_B, y_B)$ along the path Γ (which is confined to the xy plane) in a material medium
whose refractive index is proportional to 1/y (i.e. n = b/y with b being a positive constant).
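\[
\begin{aligned}
y^2\left(1 + y'^2\right) &= \frac{1}{C^2} \\
y'^2 &= \frac{1 - C^2 y^2}{C^2 y^2} \\
y' &= \pm\sqrt{\frac{1 - C^2 y^2}{C^2 y^2}} \\
\pm\sqrt{\frac{C^2 y^2}{1 - C^2 y^2}}\, dy &= dx \\
\mp\frac{1}{C}\sqrt{1 - C^2 y^2} &= x + D \\
\frac{1}{C^2}\left(1 - C^2 y^2\right) &= (x + D)^2 \\
(x + D)^2 + y^2 &= \frac{1}{C^2} \qquad (113)
\end{aligned}
\]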
So, the shape of the light path is a circular arc centered on the x axis.
9. Find the shape of the path traced by a ray of light moving in a material medium of refractive index
$n = e^y$ (y > 0).
Answer: Following in our footsteps in the previous Problems we get:
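\[
t = \int_\Gamma dt = \int_\Gamma \frac{ds}{v} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}\, dx}{c_m} = \int_{x_A}^{x_B} \frac{\sqrt{1 + y'^2}\, dx}{c/e^y} = \frac{1}{c}\int_{x_A}^{x_B} e^y\sqrt{1 + y'^2}\, dx \equiv I[y]
\]
As we see, $F = e^y\sqrt{1 + y'^2}$ and hence the Euler-Lagrange equation (see Eq. 3) is:
\[
\begin{aligned}
e^y\sqrt{1 + y'^2} - y'\frac{\partial}{\partial y'}\left(e^y\sqrt{1 + y'^2}\right) &= C \\
e^y\sqrt{1 + y'^2} - \frac{e^y y'^2}{\sqrt{1 + y'^2}} &= C \\
\frac{e^y\left(1 + y'^2\right)}{\sqrt{1 + y'^2}} - \frac{e^y y'^2}{\sqrt{1 + y'^2}} &= C \\
\frac{e^y}{\sqrt{1 + y'^2}} &= C \\
e^{2y} &= C^2\left(1 + y'^2\right) \\
y'^2 &= \frac{e^{2y} - C^2}{C^2} \\
y' &= \frac{\sqrt{e^{2y} - C^2}}{C} \\
\frac{C\, dy}{\sqrt{e^{2y} - C^2}} &= dx \\
\arctan\left(\frac{\sqrt{e^{2y} - C^2}}{C}\right) &= x + D \qquad (114)
\end{aligned}
\]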
10. Find the shape of the path traced by a ray of light moving in a material medium of refractive index
n = a/ρ where a is a positive constant and ρ is a polar coordinate (assuming the path being in a plane
coordinated by polar coordinates ρ, φ).
Answer: Following in our footsteps in the previous Problems (but using polar coordinates) we get:
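\[
\begin{aligned}
t &= \int_\Gamma dt = \int_\Gamma \frac{ds}{v} = \int_\Gamma \frac{\sqrt{(d\rho)^2 + \rho^2 (d\phi)^2}}{c_m} = \int_{\rho_A}^{\rho_B} \frac{\sqrt{1 + \rho^2 (d\phi/d\rho)^2}\, d\rho}{c/(a/\rho)} \\
&= \frac{a}{c}\int_{\rho_A}^{\rho_B} \frac{\sqrt{1 + \rho^2\phi'^2}}{\rho}\, d\rho \equiv I[\phi]
\end{aligned}
\]
where in step 3 we used the expression for the line element ds in polar coordinates and $v \equiv c_m$. As
we see, $F(\rho, \phi, \phi') = \rho^{-1}\sqrt{1 + \rho^2\phi'^2}$ and hence the Euler-Lagrange equation (see Eq. 4 noting the
correspondence between $x, y, y'$ and $\rho, \phi, \phi'$) is:
\[
\begin{aligned}
\frac{\partial}{\partial\phi'}\left(\rho^{-1}\sqrt{1 + \rho^2\phi'^2}\right) &= C \\
\rho^{-1}\frac{\rho^2\phi'}{\sqrt{1 + \rho^2\phi'^2}} &= C \\
\frac{\rho\phi'}{\sqrt{1 + \rho^2\phi'^2}} &= C \\
\rho^2\phi'^2 &= C^2\left(1 + \rho^2\phi'^2\right) \\
\phi'^2 &= \frac{C^2}{\rho^2 - C^2\rho^2} \\
\phi' &= \frac{D}{\rho} \qquad \left(D = \pm\sqrt{C^2/[1 - C^2]}\right) \\
\phi &= D\ln\rho + E \qquad (115)
\end{aligned}
\]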
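11. Find the shape of the path traced by a ray of light moving in a material medium of refractive index
$n = \frac{a\rho^4\phi'^3}{\sqrt{1 + \rho^2\phi'^2}}$ where a is a positive constant, ρ is a polar coordinate (assuming the path being in a
plane coordinated by polar coordinates ρ, φ) and $\phi' = d\phi/d\rho$.
Answer: As in the previous Problem, we have:
\[
t = \int_\Gamma dt = \int_\Gamma \frac{ds}{v} = \int_\Gamma \frac{\sqrt{(d\rho)^2 + \rho^2 (d\phi)^2}}{c_m} = \int_{\rho_A}^{\rho_B} \frac{\sqrt{1 + \rho^2\phi'^2}\, d\rho}{c\,\dfrac{\sqrt{1 + \rho^2\phi'^2}}{a\rho^4\phi'^3}} = \frac{a}{c}\int_{\rho_A}^{\rho_B} \rho^4\phi'^3\, d\rho \equiv I[\phi]
\]
As we see, $F(\rho, \phi, \phi') = \rho^4\phi'^3$ and hence the Euler-Lagrange equation (see Eq. 4 noting the
correspondence between $x, y, y'$ and $\rho, \phi, \phi'$) is:
\[
\begin{aligned}
\frac{\partial}{\partial\phi'}\left(\rho^4\phi'^3\right) &= C \qquad (C > 0) \\
3\rho^4\phi'^2 &= C \\
\phi'^2 &= \frac{C}{3\rho^4} \\
\phi' &= \frac{\pm\sqrt{C/3}}{\rho^2} \\
\phi &= \frac{D}{\rho} + E \qquad \left(D = \mp\sqrt{C/3}\right) \qquad (116)
\end{aligned}
\]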
ρ
12. A light ray is emitted from point (7, 11) in a plane coordinated by a Cartesian system and it is
required to reach point (8, 37). If the refractive index of the medium of propagation is proportional to
y, what should the initial direction of the ray be? Also, plot the trajectory between the two points.
Answer: According to the result of Problem 7, the path is given by Eq. 112 and hence the points
(7, 11) and (8, 37) should satisfy this equation. On solving Eq. 112 for x and substituting the two
points in the resulting equation we get:
\[
\begin{aligned}
7 &= C\,\mathrm{arccosh}\left(\frac{11}{C}\right) + D \\
8 &= C\,\mathrm{arccosh}\left(\frac{37}{C}\right) + D
\end{aligned}
\]
On subtracting the first of these equations from the second we get $1 = C\,\mathrm{arccosh}\left(\frac{37}{C}\right) - C\,\mathrm{arccosh}\left(\frac{11}{C}\right)$
which can be solved numerically to obtain $C \simeq 0.823517739$. On inserting this value of C into one of
the above equations we get $D \simeq 4.295725460$. So, the trajectory of the light ray is given by:
\[
y = 0.823517739\cosh\left(\frac{x - 4.295725460}{0.823517739}\right)
\]
Accordingly, the initial direction can be determined from the slope of the trajectory at point (7, 11),
that is:
\[
\left.\frac{dy}{dx}\right|_{x=7} = \left.\sinh\left(\frac{x - 4.295725460}{0.823517739}\right)\right|_{x=7} \simeq 13.319846963
\]
The trajectory is plotted in Figure 60.
[Figure 60 plot: y (vertical axis, 10 to 40) against x (horizontal axis, 7 to 8).]
Figure 60: Plot of the trajectory $y = 0.823517739\cosh\left(\frac{x - 4.295725460}{0.823517739}\right)$ of the light ray of Problem 12 of §
5.
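The numerical step in Problem 12 (solving for C) can be sketched as follows. Bisection is our own choice of method here, and the variable names are ours; the bracket works because the left-hand side minus 1 is negative for very small C and positive at C = 11.

```python
import math

x1, y1 = 7.0, 11.0
x2, y2 = 8.0, 37.0

def gap(C):
    """C*arccosh(y2/C) - C*arccosh(y1/C) - (x2 - x1); zero at the sought C."""
    return C * (math.acosh(y2 / C) - math.acosh(y1 / C)) - (x2 - x1)

lo, hi = 1e-6, y1             # gap(lo) < 0 < gap(hi) for these two points
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid
C = 0.5 * (lo + hi)

D = x1 - C * math.acosh(y1 / C)           # from the equation for the first point
slope = math.sinh((x1 - D) / C)           # dy/dx at the point (7, 11)
print(C, D, slope)
```

The printed values should agree with C ≃ 0.823517739, D ≃ 4.295725460 and the slope ≃ 13.319846963 quoted above, so the initial direction is along the vector (1, 13.319846963) approximately.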
13. A light ray is emitted from point (7, 11) in a plane coordinated by a Cartesian system and it is
required to reach point (8, 37). If the refractive index of the medium of propagation is proportional to
1/y, find the equation of the trajectory of the ray.
Answer: According to the result of Problem 8, the trajectory is given by Eq. 113 and hence the
points (7, 11) and (8, 37) should satisfy this equation, that is:
\[
\begin{aligned}
(7 + D)^2 + 11^2 &= \frac{1}{C^2} \\
(8 + D)^2 + 37^2 &= \frac{1}{C^2}
\end{aligned}
\]
On subtracting the first of these equations from the second we get $(8 + D)^2 + 37^2 - (7 + D)^2 - 11^2 = 0$
which can be solved to obtain D = −1263/2. On inserting this value of D into one of the above
equations we get $C = \sqrt{4/1560485}$. So, the equation of the trajectory is:
\[
\left(x - \frac{1263}{2}\right)^2 + y^2 = \frac{1560485}{4}
\]
which is a circular arc (as concluded in Problem 8).
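Because the two conditions differ only by a linear equation in D, the constants of Problem 13 can be reproduced exactly. A minimal sketch with rational arithmetic (names are ours):

```python
from fractions import Fraction

p, q = (7, 11), (8, 37)

# Subtracting the two circle equations leaves a linear equation in D:
# 2 (q_x - p_x) D + (q_x^2 + q_y^2) - (p_x^2 + p_y^2) = 0.
D = Fraction((p[0] ** 2 + p[1] ** 2) - (q[0] ** 2 + q[1] ** 2), 2 * (q[0] - p[0]))
radius_sq = (p[0] + D) ** 2 + p[1] ** 2      # this is 1/C^2
print(D, radius_sq)                          # -1263/2 1560485/4

# The second point must lie on the same circle.
assert (q[0] + D) ** 2 + q[1] ** 2 == radius_sq
```

The radius is therefore about 624.6, which is much larger than the distance between the two points, so the arc joining them is only slightly curved.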
14. A light ray is emitted from point (0, 1) in a plane coordinated by a Cartesian system in the direction
(1, 2). If the refractive index of the medium of propagation is $n = e^y$, determine if the ray will pass
through the point (0.3, 2). Also, plot the trajectory for the range y = 1 to y = 5.
Answer: According to the result of Problem 9, the slope of the trajectory is $y' = \frac{\sqrt{e^{2y} - C^2}}{C}$. So, at
point (0, 1) the slope is:
\[
\begin{aligned}
y'\big|_{y=1} &= \frac{\sqrt{e^2 - C^2}}{C} \\
\frac{2}{1} &= \frac{\sqrt{e^2 - C^2}}{C} \\
4C^2 &= e^2 - C^2 \\
C &= \frac{e}{\sqrt{5}}
\end{aligned}
\]
So, from Eq. 114 the trajectory is given by:
\[
\begin{aligned}
\arctan\left(\frac{\sqrt{e^{2y} - \left(e/\sqrt{5}\right)^2}}{e/\sqrt{5}}\right) &= x + D \\
\arctan\sqrt{5e^{2(y-1)} - 1} &= x + D
\end{aligned}
\]
Now, since the ray is emitted from point (0, 1) this point should satisfy this equation, that is:
\[
\begin{aligned}
\arctan\sqrt{5e^{2(1-1)} - 1} &= 0 + D \\
D &= \arctan(2)
\end{aligned}
\]
Accordingly, the trajectory is given (implicitly) by:
\[
\begin{aligned}
\arctan\sqrt{5e^{2(y-1)} - 1} &= x + \arctan(2) \\
x &= \arctan\sqrt{5e^{2(y-1)} - 1} - \arctan(2)
\end{aligned}
\]
The point (0.3, 2) does not satisfy this equation and hence it is not on the trajectory (but it is very
close). The trajectory for the range y = 1 to y = 5 is plotted in Figure 61.
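How close the trajectory comes to the point (0.3, 2) can be seen by evaluating x at y = 2. A minimal sketch (the function name is ours):

```python
import math

def x_on_trajectory(y):
    """x as a function of y along the ray emitted from (0, 1) in the direction (1, 2)."""
    return math.atan(math.sqrt(5.0 * math.exp(2.0 * (y - 1.0)) - 1.0)) - math.atan(2.0)

x_at_2 = x_on_trajectory(2.0)
print(x_at_2)                 # a little below 0.3
print(abs(x_at_2 - 0.3))      # small but not zero
```

The ray reaches the height y = 2 slightly before x = 0.3, which is why the point (0.3, 2) is close to the trajectory without lying on it.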
15. Make a polar plot for the trajectory of the light ray in Problem 11 for the range φ = π/3 to φ = π
assuming the ray passes through the points with polar coordinates (3, π/3) and (1, π).
Answer: Inserting the coordinates of the given points in Eq. 116 we get:
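\[
\begin{aligned}
\frac{\pi}{3} &= \frac{D}{3} + E \\
\pi &= D + E
\end{aligned}
\]
On subtracting the first equation from the second we get $\frac{2\pi}{3} = \frac{2D}{3}$ and hence D = π. On inserting this
value of D into one of the above equations we get E = 0. Hence, the equation of the trajectory is:
\[
\rho = \frac{\pi}{\phi} \qquad \left(\frac{\pi}{3} \le \phi \le \pi\right)
\]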
The plot is shown in Figure 62.
[Figure 61 plot: y (vertical axis) against x (horizontal axis, 0 to 0.5), with the point (0.3, 2) marked.]
Figure 61: Plot of the trajectory $x = \arctan\sqrt{5e^{2(y-1)} - 1} - \arctan(2)$ of the light ray of Problem 14
of § 5. The point (0.3, 2) which is not on the trajectory (but it is very close) is marked.
[Figure 62 polar plot: angular ticks in degrees, circles at ρ = 2 and ρ = 4, with the point (1, π) marked.]
Figure 62: The polar plot of the trajectory $\rho = \pi/\phi$ of the light ray of Problem 15 of § 5 for the range
φ = π/3 to φ = π. The numbers on the perimeter are the polar angle φ in degrees while the numbers 2
and 4 are the ρ = 2 and ρ = 4 circles.
Chapter 6
Hamiltonian Mechanics
One of the biggest fields of application (as well as development) of the mathematics of variation (and the
calculus of variations in particular) is mechanics where this subject in its variational form was historically
developed by William Hamilton and hence it is commonly known as the Hamiltonian mechanics.[106] In
the Hamiltonian mechanics distinctive terminology is used. For example, the functional integral I in
this mechanics is called the “action” and hence the principle of minimizing this integral (which essentially
reflects the spirit of the variational principle) is called “Hamilton’s principle of least action”.[107] Distinctive
notation is also used in the formulation of the variational problems. For example, in the variational
problems with n extremizing functions (see § 1.7) the independent variable is usually denoted with t
(which normally stands for time) and the extremizing functions are denoted with q1 (t), · · · , qn (t) while
overdot is used to symbolize the derivatives of these variables with respect to t.[108] Also, the integrand
F in the Hamiltonian mechanics is usually denoted with L and is called the Lagrangian (or Lagrangian
function). Hence, the action integral is given as:
\[
I[q_1, \cdots, q_n] = \int_{t_1}^{t_2} L\left(t, q_1, \cdots, q_n, \dot{q}_1, \cdots, \dot{q}_n\right) dt \qquad (117)
\]
Accordingly, a set of n Euler-Lagrange equations is required where these equations are given compactly
by:
\[
\frac{\partial L}{\partial q_i} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_i} = 0 \qquad (i = 1, \cdots, n) \qquad (118)
\]
This set of n second order differential equations (which are commonly known as Lagrange’s equations or
Lagrangian equations) should be solved to obtain the solution of the variational problem (i.e. q1 , · · · , qn )
where the initial conditions can be used to determine the 2n constants of integration that are involved
in the solution (noting that in some cases the Hamiltonian formulation can lead to first integrals). It is
worth noting that Eq. 117 may be written in a compact form (using vector notation) as:
\[
I[\mathbf{q}] = \int_{t_1}^{t_2} L\left(t, \mathbf{q}, \dot{\mathbf{q}}\right) dt \qquad (119)
\]
[106] It may also be called the Lagrangian mechanics. In fact, Hamilton is not the only or the first mathematician to work
on this topic but we attributed the development to him because in this book we are interested in his version of the
variational formulation of this field.
[107] Some authors suggest that Hamilton’s principle is closely related to the principle of least action but it is not the same.
More specifically, Hamilton’s principle is seen as more general than the principle of least action. Also, being minimizing
(as suggested by “least”) is not guaranteed by this principle and hence it may be more appropriate to call it the principle
of stationary action (as will be indicated later). In fact, there are many details about these issues and their alike but
they are beyond the scope of this book.
[108] We should note that the extremizing functions $q_i$ in the Hamiltonian mechanics usually represent the spatial coordinates
(and hence their time derivatives $\dot{q}_i$ represent the velocity components) of a mechanical system with n degrees of freedom
where the configuration of the system is described by these n functions and their derivatives. So, the purpose of the
variational formulation is to describe the motion of the system and determine its configuration (as a function of time) as
a result of the external and internal forces which the system is subject to. In fact, the configuration of the system can be
seen as a t-parameterized “curve” in an n dimensional space. Classical systems are deterministic and hence by knowing
the configuration at an initial time (by knowing the positions and velocities) it can be determined at any time in the
future (in fact this may be generalized by saying: by knowing the configuration at a given time it can be determined
at any other time). So, the variational formulation of a given mechanical problem plus the initial conditions should be
able to solve the problem completely.
6 HAMILTONIAN MECHANICS 173
where $\mathbf{q} = (q_1, \cdots, q_n)$ and $\dot{\mathbf{q}} = (\dot{q}_1, \cdots, \dot{q}_n)$. This notation is particularly useful when the above discrete
system formulation (with a finite number of degrees of freedom n) is generalized to the continuous system
formulation (with an infinite n).
The Hamiltonian mechanics is based on Newton’s laws (or on the Newtonian mechanics to be more
comprehensive) in its physical framework while it employs the calculus of variations as its main mathematical
technique. The fundamental principle of this mechanics is the aforementioned “Hamilton’s principle of
least action” which states: in a mechanical system subject to conservative forces only, the behavior of the
system (according to the Newtonian mechanics) is described by an extremal[109] of the action integral
$I = \int L\, dt$ where the Lagrangian is given by $L = T - U$ (with T and U being the kinetic and potential
energy of the system respectively).[110]
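As a rough numerical illustration of this principle, the following sketch approximates the action integral for a particle moving vertically in uniform gravity, i.e. $L = \frac{1}{2} m \dot{x}^2 - m g x$, and compares the path that satisfies Lagrange's equation with a perturbed path that has the same end points. The function names, the midpoint discretization, the time interval and the perturbation are all illustrative assumptions of ours and are not taken from the text.

```python
import math

def action(path, t1, t2, m=1.0, g=9.81, n=2000):
    """Approximate I = integral of L dt with L = m*v**2/2 - m*g*x (midpoint rule)."""
    dt = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        ta, tb = t1 + k * dt, t1 + (k + 1) * dt
        xa, xb = path(ta), path(tb)
        v = (xb - xa) / dt          # finite-difference velocity on this step
        x_mid = 0.5 * (xa + xb)     # position near the middle of this step
        total += (0.5 * m * v * v - m * g * x_mid) * dt
    return total

T_END, G = 1.0, 9.81   # assumed time interval and gravitational acceleration

def true_path(t):
    # Satisfies Lagrange's equation x'' = -g with x(0) = x(T_END) = 0.
    return 0.5 * G * t * (T_END - t)

def perturbed_path(t, eps=0.1):
    # Same end points as true_path, but displaced in between.
    return true_path(t) + eps * math.sin(math.pi * t / T_END)

i_true = action(true_path, 0.0, T_END)
i_pert = action(perturbed_path, 0.0, T_END)
print(i_true, i_pert)
```

For this particular Lagrangian any perturbation that vanishes at the end points raises the action, so the stationary path here happens to be a minimum; as noted in footnote [109], this is not guaranteed in general.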
Finally, we draw attention to the following important remarks:
• There are two important generalizations to the Hamiltonian formulation. First, qi ’s are not necessarily
required to be representing conventional coordinates (i.e. they can be used more generally to represent
the variables and physical conditions of the mechanical system and hence they may be called generalized
coordinates). Second, as indicated earlier the Hamiltonian formulation is not necessarily required to be
for discrete systems (e.g. systems of separate particles) and therefore the Hamiltonian mechanics can
be used to formulate even problems of continuous systems such as fluid dynamics and other continuum
mechanics systems (see Problem 19).
• Since the Hamiltonian mechanics is a variational subject that is based on the Euler-Lagrange equation
(as seen above) it is subject to the variations of the Euler-Lagrange equation (as investigated in § 1.4-
1.10) and hence the above formulation of the Hamiltonian mechanics (as represented by Eqs. 117 and 118)
represents the common cases. For example, the Hamiltonian formulation may have only one dependent
variable (see for instance part a of Problem 4; also see Problems 12 and 19) or it may have multiple
independent variables (see Problem 19). In brief, the Hamiltonian mechanics is just an application and
instantiation of the calculus of variations as represented by the Euler-Lagrange equation in its various
forms and flavors (where the principle of least action plays the major role in establishing and justifying
the physics).[111]
• Our objective in the Problems of this chapter is to clarify the variational aspects of the Hamiltonian
mechanics (as an application of the calculus of variations), and hence any other issue (such as the physics
behind the individual Problems) is not of major concern or interest to us. Accordingly, the presentation
and investigation of those other issues may be superficial or non-rigorous.
Problems
1. Discuss the following statement: “The Hamiltonian mechanics is based on Newton’s laws in its physical
framework”.
Answer: In general terms, the Newton’s laws formulation of mechanics and the Hamiltonian formula-
tion of mechanics are independent but equivalent formulations and hence they may be seen as equally
fundamental (noting that each of these formulations can be derived from the other). Admittedly, Newton’s
laws have historical precedence and hence the Hamiltonian mechanics can be seen from this perspective
as originating from the Newtonian mechanics although the Hamiltonian mechanics may also be seen as
more fundamental from other perspectives (as discussed in the literature and will be touched on later).
In fact, there are some differences in opinion about which is more fundamental (assuming a precedence
in some sense is presumed or established). Anyway, our opinion is that Newton’s laws are more funda-
mental from a theoretical and conceptual perspective due to the fact that the Newtonian philosophy
and paradigms (which Newton’s laws and Newtonian mechanics are based on) are at the foundation
of the Hamiltonian rationale and formulation. On the other hand, the Hamiltonian mechanics can
[109] We note that in most cases it is minimal but this is not necessarily the case. In fact, to be more accurate and general we
should use “stationary” instead of “extremal” (but we followed what is common in the literature in stating this principle).
[110] We should note that the given statement expresses the essence of this principle only (ignoring some details and conditions
which are out of scope) and hence it is not sufficiently rigorous. We should also note that from this statement we can
see that the action has the physical dimensions of energy times time.
[111] In other words, the Euler-Lagrange equation provides the mathematics of this mechanics while Hamilton’s principle of
be seen as more fundamental (at least in some cases and situations) from a practical and procedural
perspective. It may also be regarded as being more fundamental from its variational perspective since
the principle of variation or optimization is one of the most fundamental physical principles due to
the fact that many applications and branches of science are (or can be) established from the logic and
rationale of optimization and variation. Also, see Problem 2.
2. Make a brief comparison between Lagrange’s equations in mechanics (as embedded in the above-
described formulation of the Hamiltonian mechanics) and Newton’s laws of motion.
Answer: We note the following:
• Lagrange’s equations are physically and logically equivalent to Newton’s laws and hence they can be
seen as another form of Newton’s laws.
• Lagrange’s equations are scalar equations while Newton’s laws are essentially vector equations. This
may be seen as an advantage to Lagrange’s equations since dealing with scalar formulations is generally
easier than dealing with vector formulations.
• The difference in the previous point between the two formulations may lead to another difference
(and advantage to Lagrange’s equations over Newton’s laws) related to the invariance of the two formu-
lations where it is claimed that Newton’s laws in component form are not manifestly invariant across
coordinate systems while Lagrange’s equations are invariant.
• Being based on the paradigms of energy and variation (which are general paradigms that occur across
various disciplines of physics), the formulation of Lagrange’s equations can be easily and naturally ex-
tended to fields of physics other than mechanics (and classical mechanics in particular which is the
birthplace of this formulation). This may be seen as another advantage to this formulation in compar-
ison to Newton’s laws which are less general in application and usefulness. In fact, this advantage can
lead to other advantages such as generalization and unification.
• Being essentially a variational formulation based on the notion of action, the formulation of La-
grange’s equations is more fit and natural for investigating conservation principles and symmetries
in physical systems and the relations between the two. This should be seen as another important
advantage to this formulation over the Newtonian formulation.
• Noting that the principle of least (or stationary) action is restricted to conservative systems, Newton’s
laws of motion may be seen as more general from this perspective. In fact, there are more limitations
and restrictions on the Hamiltonian mechanics (and hence on Lagrange’s equations of mechanics).[112]
• Newton’s laws of motion are associated with (and based on) a certain philosophical and epistemo-
logical framework (unlike the formulation of Lagrange’s equations which is more like a physical theory
or a mathematical method of purely technical nature) and hence Newton’s laws have more significance
and far-reaching consequences from this theoretical perspective (although this may be questioned).
3. State (briefly and non-rigorously) the principle of least action of the Hamiltonian mechanics.
Answer: In conservative mechanical systems, the particles (possibly within a continuum) follow
trajectories that optimize the action integral $I = \int L\, dt$ where L is the Lagrangian defined as the difference
between the kinetic and potential energies, i.e. $L = T - U$.
4. Find the Lagrangian of the following mechanical systems:
(a) A yo-yo hanging from a ceiling and unwinding vertically downward (see Figure 63).
(b) A system made of a particle of mass m1 connected to another particle of mass m2 by a flexible
inextensible string of negligible mass where m1 is moving on a horizontal table while m2 is dangling
vertically with the string being passed through a tiny hole in the table (see Figure 64). Assume that
gravity is the only acting force with no friction involved (neither between m1 and the table nor between
the string and the table or the hole).
Answer:
(a) Assume that the mass of the yo-yo string is negligible and the mass of the yo-yo (i.e. its rotating
part) is m and its moment of inertia is I. Let the plane of the ceiling represent the zero potential
energy reference (see Figure 63). Now, if l is the length of the unwound part of the yo-yo string
then the potential energy of the system is U = −mgl. Regarding the kinetic energy, it consists of a
[112] In fact, generalizations and extensions to the principle of least action and its application should lift some of these
limitations and restrictions.
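translational part:
\[ T_t = \frac{1}{2} m v^2 = \frac{1}{2} m \left( \frac{dl}{dt} \right)^2 = \frac{1}{2} m \dot{l}^2 \]
and a rotational part:
\[ T_r = \frac{1}{2} I \omega^2 = \frac{1}{2} I \left( \frac{v}{R} \right)^2 = \frac{1}{2} I \frac{\dot{l}^2}{R^2} \]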
where v is the translational speed of the yo-yo (i.e. its descending part), ω is its angular speed, and
R is the radius of its axle (assuming the thickness of the wound part of the string is negligible).[113]
Accordingly, the Lagrangian of this mechanical system is:
\[ L(t, l, \dot{l}) = T - U = T_t + T_r - U = \frac{1}{2} m \dot{l}^2 + \frac{1}{2} I \frac{\dot{l}^2}{R^2} + m g l = \frac{1}{2} \left( m + \frac{I}{R^2} \right) \dot{l}^2 + m g l \]
Note: unlike the Lagrangian of most Hamiltonian mechanics problems, the Lagrangian of this problem
has only one dependent variable (i.e. l) thanks to the correlation between l and ω which we exploited
to simplify the formulation (as well as being essentially a 1D problem).
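Although the Problem only asks for the Lagrangian, applying Eq. 118 to it gives $\left( m + I/R^2 \right) \ddot{l} = m g$, i.e. a constant downward acceleration smaller than g. The short sketch below evaluates this acceleration; the function name and the numerical values are our own illustrative assumptions (the text does not carry out this step).

```python
def yoyo_acceleration(m, inertia, axle_radius, g=9.81):
    """Return l'' from (m + I/R**2) * l'' = m * g (Eq. 118 applied to the yo-yo Lagrangian)."""
    return m * g / (m + inertia / axle_radius**2)

# Hypothetical yo-yo: a uniform disc of radius 3 cm (I = m * r_disc**2 / 2)
# unwinding from an axle of radius 5 mm.
mass, r_disc, r_axle = 0.05, 0.03, 0.005
inertia = 0.5 * mass * r_disc**2
print(yoyo_acceleration(mass, inertia, r_axle))
```

When the moment of inertia is negligible the result reduces to free fall, while a large $I/R^2$ compared with m makes the descent much slower, which is the familiar behavior of a yo-yo.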
Figure 63: A schematic illustration of the yo-yo mechanical system. See part (a) of Problem 4 of § 6.
(b) Let us use a cylindrical coordinate system whose origin is at the hole, and whose z axis is pointing
vertically downward (see Figure 64). Also, let the length of the string be l and the zero potential energy
reference be the table level. Accordingly, the coordinates of m1 are (ρ, φ, 0) and the coordinates of m2
are (0, 0, z). However, because the string is inextensible l is constant and hence ρ + z = l which leads
to z = l − ρ. Now, the potential energy of the system is the sum of the potential energies of its parts.
However, because m1 remains at the zero potential energy level its potential energy is zero and hence
the potential energy of the system is the potential energy of m2 which is U = −m2 gz = −m2 g (l − ρ).
Regarding the kinetic energy of the system, it is the sum of the kinetic energies of its parts, that is:
\[ T = T_1 + T_2 = \frac{1}{2} m_1 v_1^2 + \frac{1}{2} m_2 v_2^2 \]
[113] If the thickness is not negligible then it should be included in R and this leads to more complicated formulation.
Figure 64: A schematic illustration of the setting of part (b) of Problem 4 of § 6.
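\begin{align*}
T &= \frac{1}{2} m_1 \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right) + \frac{1}{2} m_2 \left( \frac{dz}{dt} \right)^2 && \text{($v_1$ has radial and azimuthal components)} \\
&= \frac{1}{2} m_1 \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right) + \frac{1}{2} m_2 \dot{\rho}^2 && \text{($z = l - \rho$)}
\end{align*}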
Accordingly, the Lagrangian of this mechanical system is:
\[ L(t, \rho, \phi, \dot{\rho}, \dot{\phi}) = T - U = \frac{1}{2} m_1 \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right) + \frac{1}{2} m_2 \dot{\rho}^2 + m_2 g (l - \rho) \]
5. Given that the kinetic energy T and the potential energy U of a mechanical system are given by:
\[ T = \frac{1}{2} \sum_{i=1}^{n} m_i \dot{q}_i^2 \qquad \text{and} \qquad U = U(q_1, \cdots, q_n) \]
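\begin{align*}
-(T + U) &= -T - U \\
&= T - U - 2T \\
&= L - 2T \\
&= L - \sum_{i=1}^{n} m_i \dot{q}_i^2 \\
&= L - \sum_{i=1}^{n} \dot{q}_i \left[ m_i \dot{q}_i \right] \\
&= L - \sum_{i=1}^{n} \dot{q}_i \frac{\partial}{\partial \dot{q}_i} \left[ \frac{1}{2} \sum_{j=1}^{n} m_j \dot{q}_j^2 - U \right] \\
&= L - \sum_{i=1}^{n} \dot{q}_i \frac{\partial L}{\partial \dot{q}_i} \\
&= C
\end{align*}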
where in the last line we used the reduced form of the Euler-Lagrange equation, i.e. the equivalent to
Eq. 3 (or rather a summed version of it) noting that the Lagrangian L does not contain t. Accordingly,
$T + U = E$ where E is a constant ($= -C$). We note that in line 6 we use the fact that $\dot{q}_j^2$ is independent
of $\dot{q}_i$ unless $i = j$ plus the fact that U is independent of $\dot{q}_i$.
Now, T + U is the total energy (i.e. kinetic plus potential) and hence the obtained result means that
the energy of such a system is conserved.[114]
6. Show that for a simple system made of a single particle of mass m in a conservative force field, Lagrange
equations are equivalent to Newton’s second law.
Answer: We have:
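\[ L(t, x_1, x_2, x_3, \dot{x}_1, \dot{x}_2, \dot{x}_3) = L(t, \mathbf{r}, \dot{\mathbf{r}}) = T - U = \frac{1}{2} m \left( \dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2 \right) - U = \frac{1}{2} m |\dot{\mathbf{r}}|^2 - U \]
where $\mathbf{r} = (x_1, x_2, x_3)$ is the position vector of the particle, $\dot{\mathbf{r}} = (\dot{x}_1, \dot{x}_2, \dot{x}_3)$ is its velocity
vector and $U = U(x_1, x_2, x_3) = U(\mathbf{r})$. Accordingly, the three Lagrangian equations are
$\frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} = 0$ ($i = 1, 2, 3$), that is:
\begin{align*}
-\frac{\partial U}{\partial x_i} - \frac{d}{dt} \left( m \dot{x}_i \right) &= 0 \qquad (i = 1, 2, 3) \\
-\frac{\partial U}{\partial x_i} - m \ddot{x}_i &= 0 \\
-\frac{\partial U}{\partial x_i} &= m \ddot{x}_i \\
-\nabla U &= m \ddot{\mathbf{r}} \\
\mathbf{F} &= m \ddot{\mathbf{r}}
\end{align*}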
where in the last line we used the fact that the conservative force is the negative gradient of the
potential energy. As we see, the last line is Newton’s second law (i.e. force F equals mass m times
acceleration r̈).
7. Find the Hamiltonian formulation[115] of a simple mechanical system consisting of a free particle of
mass m moving in a plane. Also, interpret the result.
Answer: We use polar coordinates (ρ, φ) which are sufficient to describe the motion of this particle
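whose movement is restricted to a plane. Now, the particle is free and hence its potential energy U is
zero.[116] Regarding its kinetic energy, it is $T = \frac{1}{2} m \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right)$ due to the fact that the velocity has
in general radial and azimuthal components. Hence, the Lagrangian of this system is:
\[ L(t, \rho, \phi, \dot{\rho}, \dot{\phi}) = T - U = \frac{1}{2} m \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right) \]
Accordingly, we have two Euler-Lagrange equations (one equation for each coordinate):
\begin{align*}
\frac{\partial L}{\partial \rho} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\rho}} &= 0 \\
\frac{\partial}{\partial \rho} \left( \frac{1}{2} m \dot{\rho}^2 + \frac{1}{2} m \rho^2 \dot{\phi}^2 \right) - \frac{d}{dt} \frac{\partial}{\partial \dot{\rho}} \left( \frac{1}{2} m \dot{\rho}^2 + \frac{1}{2} m \rho^2 \dot{\phi}^2 \right) &= 0 \\
m \rho \dot{\phi}^2 - \frac{d}{dt} \left( m \dot{\rho} \right) &= 0 \\
m \rho \dot{\phi}^2 - m \ddot{\rho} &= 0 \tag{120}
\end{align*}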
AND
[114] More technically, the total energy of a mechanical system in a conservative force field is constant along its trajectory
in space-time. Alternatively, in a mechanical system whose Lagrangian is independent (explicitly) of time the particles
follow trajectories along which the total energy (i.e. kinetic plus potential) of the system is conserved.
[115] We mean by “Hamiltonian formulation” in this sort of Problems the Lagrangian L and the Euler-Lagrange equation(s).
[116] In fact, it is constant that can be set to zero due to the arbitrariness of the reference level. Anyway, this may affect the
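\begin{align*}
\frac{\partial L}{\partial \phi} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\phi}} &= 0 \\
\frac{\partial}{\partial \phi} \left( \frac{1}{2} m \dot{\rho}^2 + \frac{1}{2} m \rho^2 \dot{\phi}^2 \right) - \frac{d}{dt} \frac{\partial}{\partial \dot{\phi}} \left( \frac{1}{2} m \dot{\rho}^2 + \frac{1}{2} m \rho^2 \dot{\phi}^2 \right) &= 0 \\
0 - \frac{d}{dt} \left( m \rho^2 \dot{\phi} \right) &= 0 \\
\frac{d}{dt} \left( m \rho^2 \dot{\phi} \right) &= 0 \tag{121}
\end{align*}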
Regarding the interpretation of the result, Eq. 120 represents Newton’s second law (i.e. force in the
radial direction equals mass times centripetal/centrifugal acceleration)[117] while Eq. 121 represents
the conservation of angular momentum (noting that Eq. 121 leads to $m \rho^2 \dot{\phi}$ = constant).
Note: the expression of the kinetic energy T that we used above can be easily obtained (using the
equivalent Cartesian system) as follows:
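\begin{align*}
T &= \frac{1}{2} m v^2 \\
&= \frac{1}{2} m \left( \dot{x}^2 + \dot{y}^2 \right) \\
&= \frac{1}{2} m \left\{ \left[ \frac{d}{dt} (\rho \cos\phi) \right]^2 + \left[ \frac{d}{dt} (\rho \sin\phi) \right]^2 \right\} \\
&= \frac{1}{2} m \left\{ \left[ \dot{\rho} \cos\phi - \rho \dot{\phi} \sin\phi \right]^2 + \left[ \dot{\rho} \sin\phi + \rho \dot{\phi} \cos\phi \right]^2 \right\} \\
&= \frac{1}{2} m \left[ \dot{\rho}^2 \cos^2\phi - 2 \rho \dot{\rho} \dot{\phi} \cos\phi \sin\phi + \rho^2 \dot{\phi}^2 \sin^2\phi + \dot{\rho}^2 \sin^2\phi + 2 \rho \dot{\rho} \dot{\phi} \cos\phi \sin\phi + \rho^2 \dot{\phi}^2 \cos^2\phi \right] \\
&= \frac{1}{2} m \left( \dot{\rho}^2 + \rho^2 \dot{\phi}^2 \right)
\end{align*}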
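The interpretation of Eqs. 120 and 121 can be checked numerically. The following sketch (our own illustration, with an assumed initial state and a standard fourth-order Runge-Kutta step) integrates the two equations in polar coordinates and then converts the result to Cartesian coordinates. A free particle should move along a straight line at constant velocity while $\rho^2 \dot{\phi}$ stays constant.

```python
import math

def derivs(state):
    """Eqs. 120 and 121 written as a first-order system (the mass m cancels)."""
    rho, phi, rho_dot, phi_dot = state
    rho_ddot = rho * phi_dot**2                  # Eq. 120 divided by m
    phi_ddot = -2.0 * rho_dot * phi_dot / rho    # Eq. 121 expanded, divided by m*rho**2
    return (rho_dot, phi_dot, rho_ddot, phi_ddot)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = derivs(state)
    k2 = derivs(shifted(state, k1, dt / 2))
    k3 = derivs(shifted(state, k2, dt / 2))
    k4 = derivs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Assumed start: Cartesian position (1, 0) with Cartesian velocity (0, 1).
state = (1.0, 0.0, 0.0, 1.0)
dt, steps = 1e-3, 2000          # integrate up to t = 2
for _ in range(steps):
    state = rk4_step(state, dt)

rho, phi, rho_dot, phi_dot = state
x, y = rho * math.cos(phi), rho * math.sin(phi)
ang_mom_per_mass = rho**2 * phi_dot
print(x, y, ang_mom_per_mass)
```

With this start the straight-line motion is x = 1 and y = t, so the "centripetal" term in Eq. 120 is only a consequence of describing uniform rectilinear motion in polar coordinates, in agreement with footnote [117].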
8. Find the Hamiltonian formulation of a simple mechanical system consisting of a single particle of mass
m in the gravitational field of the Earth.
Answer: If q1 , q2 , q3 stand for the Cartesian coordinates x, y, z of the particle and Φ(x, y, z) represents
the gravitational potential of the Earth (noting that this potential depends on the coordinates only as
indicated by the notation) then we have:
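\[ L(t, x, y, z, \dot{x}, \dot{y}, \dot{z}) = T - U = \frac{1}{2} m v^2 - m \Phi = \frac{1}{2} m \left( \dot{x}^2 + \dot{y}^2 + \dot{z}^2 \right) - m \Phi \]
Accordingly, we have three Euler-Lagrange equations (one equation for each coordinate):
\[ \frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} = 0 \qquad \frac{\partial L}{\partial y} - \frac{d}{dt} \frac{\partial L}{\partial \dot{y}} = 0 \qquad \frac{\partial L}{\partial z} - \frac{d}{dt} \frac{\partial L}{\partial \dot{z}} = 0 \]
that is:
\[ -m \frac{\partial \Phi}{\partial x} - \frac{d}{dt} (m \dot{x}) = 0 \qquad -m \frac{\partial \Phi}{\partial y} - \frac{d}{dt} (m \dot{y}) = 0 \qquad -m \frac{\partial \Phi}{\partial z} - \frac{d}{dt} (m \dot{z}) = 0 \]
These equations can be simplified to:
\[ \ddot{x} = -\frac{\partial \Phi}{\partial x} \qquad \ddot{y} = -\frac{\partial \Phi}{\partial y} \qquad \ddot{z} = -\frac{\partial \Phi}{\partial z} \]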
[117] This should not contradict the fact that the particle is free.
9. Find the Hamiltonian formulation of a mechanical system consisting of a satellite of mass m orbiting
the Earth. Use a normalized spherical coordinate system centered on the center of the Earth.
Answer: In this Problem q1 , q2 , q3 stand for the spherical coordinates r, θ, φ of the satellite and
Φ(r, θ, φ) represents the gravitational potential of the Earth (which solely depends on the coordinates).
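Now, in spherical coordinates the velocity is $\mathbf{v} = (\dot{r}, r \dot{\theta}, r \dot{\phi} \sin\theta)$ and hence $v^2 = \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \dot{\phi}^2 \sin^2\theta$
while the gravitational potential is $\Phi = -C/r$ (with C being a positive constant).[118] So, the
Lagrangian of this system is:
\[ L(t, r, \theta, \phi, \dot{r}, \dot{\theta}, \dot{\phi}) = T - U = \frac{1}{2} m v^2 - m \Phi = \frac{1}{2} m \left( \dot{r}^2 + r^2 \dot{\theta}^2 + r^2 \dot{\phi}^2 \sin^2\theta \right) + m \frac{C}{r} \]
Accordingly, we have three Euler-Lagrange equations (one equation for each coordinate):
\begin{align*}
\frac{\partial L}{\partial r} - \frac{d}{dt} \frac{\partial L}{\partial \dot{r}} = 0 \quad &\Rightarrow \quad r \dot{\theta}^2 + r \dot{\phi}^2 \sin^2\theta - \frac{C}{r^2} - \frac{d\dot{r}}{dt} = 0 \tag{122} \\
\frac{\partial L}{\partial \theta} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}} = 0 \quad &\Rightarrow \quad r^2 \dot{\phi}^2 \sin\theta \cos\theta - \frac{d}{dt} \left( r^2 \dot{\theta} \right) = 0 \tag{123} \\
\frac{\partial L}{\partial \phi} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\phi}} = 0 \quad &\Rightarrow \quad \frac{d}{dt} \left( r^2 \dot{\phi} \sin^2\theta \right) = 0 \tag{124}
\end{align*}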
10. Simplify the Hamiltonian formulation of Problem 9 by assuming that the orbit is in the equatorial
plane of the coordinate system (i.e. the plane θ = π/2 that passes through the center of the Earth).
Answer: In the equatorial plane θ = π/2 and hence cos θ = θ̇ = 0 and sin θ = 1. Therefore, Eqs.
122-124 reduce to:
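\begin{align*}
r \dot{\phi}^2 - \frac{C}{r^2} - \ddot{r} &= 0 \\
0 &= 0 \\
\frac{d}{dt} \left( r^2 \dot{\phi} \right) &= 0
\end{align*}
On discarding the second equation (which is trivial) we get two Euler-Lagrange equations:
\begin{align*}
r \dot{\phi}^2 - \frac{C}{r^2} - \ddot{r} &= 0 \tag{125} \\
\frac{d}{dt} \left( r^2 \dot{\phi} \right) &= 0 \tag{126}
\end{align*}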
11. Analyze the results of Problem 10.
Answer: If we multiply Eq. 125 with m (which we discarded earlier for simplicity) and rearrange the
terms we get:
\[ \frac{mC}{r^2} = m \left( r \dot{\phi}^2 - \ddot{r} \right) \tag{127} \]
where $\frac{mC}{r^2}$ is the magnitude of the gravitational force while $(r \dot{\phi}^2 - \ddot{r})$ is the magnitude of the radial
acceleration and hence Eq. 125 is just Newton’s second law for the gravitational field (which is a
central force field).
Regarding Eq. 126, it can be easily integrated to obtain $r^2 \dot{\phi} = C_1$ (with $C_1$ being a constant). Now,
$r^2 \dot{\phi}$ is the magnitude of the angular momentum per unit mass and hence Eq. 126 represents the
conservation of angular momentum. In fact, $r^2 \dot{\phi}$ also represents twice the areal speed and hence the
equation $r^2 \dot{\phi} = C_1$ represents Kepler’s second law.
Note: we may put Eq. 127 in the following form:
\[ \frac{mC}{r^2} + m \ddot{r} = m r \dot{\phi}^2 \]
[118] In fact, C = GM where G is the gravitational constant and M is the mass of the Earth.
where the left hand side represents the total central force (i.e. gravitational plus inertial) while the right
hand side represents mass times centripetal acceleration (and hence the equation represents Newton’s
second law for central gravitational fields).
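As a simple special case (our own addition, not worked out in the text), a circular orbit has $\ddot{r} = 0$ and hence Eq. 125 gives $\dot{\phi}^2 = C / r^3$. The sketch below evaluates the angular speed, the period and the (constant) areal speed of such an orbit. The value used for $C = GM$ and the orbital radius are assumed standard figures for the Earth and a geostationary satellite, and the function name is hypothetical.

```python
import math

GM_EARTH = 3.986004418e14   # m**3/s**2, assumed standard value of C = G*M

def circular_orbit(r, c=GM_EARTH):
    """Angular speed, period and areal speed of a circular orbit of radius r."""
    phi_dot = math.sqrt(c / r**3)          # Eq. 125 with r'' = 0
    period = 2.0 * math.pi / phi_dot
    areal_speed = 0.5 * r**2 * phi_dot     # half of the constant r**2 * phi_dot (Eq. 126)
    return phi_dot, period, areal_speed

radius = 4.2164e7                          # m, assumed geostationary orbital radius
phi_dot, period, areal_speed = circular_orbit(radius)
print(period / 3600.0)                     # period in hours
```

For the assumed radius the period comes out close to one sidereal day, as expected for a geostationary orbit.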
12. Obtain the Hamiltonian formulation of a mechanical system consisting of a single particle of mass m
executing a 1D simple harmonic motion. What type of solution does this system have?
Answer: For simple harmonic motion we have:
\[ L(t, x, \dot{x}) = T - U = \frac{1}{2} m \dot{x}^2 - \frac{1}{2} k x^2 \]
where t is time, x is the displacement of the particle from its equilibrium position, ẋ is its speed
(i.e. dx/dt) and k is a positive constant (i.e. the “spring” constant). Accordingly, we have a single
Euler-Lagrange equation, that is:
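\begin{align*}
\frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} &= 0 \\
\frac{\partial}{\partial x} \left( \frac{1}{2} m \dot{x}^2 - \frac{1}{2} k x^2 \right) - \frac{d}{dt} \frac{\partial}{\partial \dot{x}} \left( \frac{1}{2} m \dot{x}^2 - \frac{1}{2} k x^2 \right) &= 0 \\
-k x - \frac{d}{dt} (m \dot{x}) &= 0 \\
m \ddot{x} + k x &= 0
\end{align*}
The solution is obviously a sinusoidal function of time (with frequency $\frac{1}{2\pi} \sqrt{\frac{k}{m}}$ and appropriate
magnitude and phase shift).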
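The claim about the solution can be verified with a few lines of code. The sketch below (with arbitrary illustrative values for k, m, the amplitude and the phase, and hypothetical function names) substitutes a sinusoid of the stated frequency into $m \ddot{x} + k x$, where the second derivative is approximated by a central difference, and reports the largest residual.

```python
import math

def sho_frequency(k, m):
    """Frequency (cycles per unit time) of the solution of m*x'' + k*x = 0."""
    return math.sqrt(k / m) / (2.0 * math.pi)

def residual(x, t, k, m, h=1e-4):
    """Left-hand side m*x'' + k*x with x'' taken from a central difference."""
    x_ddot = (x(t + h) - 2.0 * x(t) + x(t - h)) / h**2
    return m * x_ddot + k * x(t)

k, m = 4.0, 1.0                     # illustrative spring constant and mass
omega = 2.0 * math.pi * sho_frequency(k, m)
amplitude, phase = 0.3, 0.7         # arbitrary magnitude and phase shift

def x(t):
    return amplitude * math.cos(omega * t + phase)

worst = max(abs(residual(x, 0.1 * i, k, m)) for i in range(50))
print(sho_frequency(k, m), worst)
```

The residual is zero up to the finite-difference error for any choice of magnitude and phase shift, which are fixed in practice by the two initial conditions.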
13. Obtain the Hamiltonian formulation of a mechanical system consisting of three particles connected in
series by three springs in-between (and the entire system is connected to a fixed support S). Assume
that the springs are linear (Hookean) and massless and the only forces acting on the system are the
spring forces (so the effect of other forces like gravity is negligible).
Figure 65: A schematic illustration of a system consisting of three particles (of masses $m_1, m_2, m_3$)
connected in series to three springs $S_1, S_2, S_3$ and secured to a fixed support S. See Problem 13 of § 6.
Answer: Let $m_1, m_2, m_3$ be the masses of the particles, $S_1, S_2, S_3$ the springs, $k_1, k_2, k_3$ their spring
constants, and $x_1, x_2, x_3$ the displacements of the particles from their equilibrium positions $O_1, O_2, O_3$
(see Figure 65). Now, $m_1$ is displaced from its equilibrium position $O_1$ by $x_1$ and hence its potential
energy (which is due to the stretch/compression in $S_1$) is $\frac{1}{2} k_1 x_1^2$. Regarding $m_2$, it is displaced from
its equilibrium position $O_2$ by $x_2$ but part of this displacement is due to the displacement $x_1$ and
hence its potential energy (which is due to the stretch/compression in $S_2$) is $\frac{1}{2} k_2 (x_2 - x_1)^2$. Similarly,
$m_3$ is displaced from its equilibrium position $O_3$ by $x_3$ but part of this displacement is due to the
displacement $x_2$ and hence its potential energy (which is due to the stretch/compression in $S_3$) is
$\frac{1}{2} k_3 (x_3 - x_2)^2$. So, the potential energy of the system is $U = \frac{1}{2} k_1 x_1^2 + \frac{1}{2} k_2 (x_2 - x_1)^2 + \frac{1}{2} k_3 (x_3 - x_2)^2$.
Regarding the kinetic energy, each particle $m_i$ ($i = 1, 2, 3$) has a speed $\dot{x}_i \equiv \frac{dx_i}{dt}$ which represents
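the temporal rate of change of its position relative to its equilibrium position (noting that all the
equilibrium positions are at rest). Therefore, the kinetic energy of each particle is $\frac{1}{2} m_i \dot{x}_i^2$ and hence
the kinetic energy of the system is $T = \frac{1}{2} m_1 \dot{x}_1^2 + \frac{1}{2} m_2 \dot{x}_2^2 + \frac{1}{2} m_3 \dot{x}_3^2$. So, the Lagrangian of the system
is:
\[ L(t, x_1, x_2, x_3, \dot{x}_1, \dot{x}_2, \dot{x}_3) = T - U = \frac{1}{2} \left[ m_1 \dot{x}_1^2 + m_2 \dot{x}_2^2 + m_3 \dot{x}_3^2 - k_1 x_1^2 - k_2 (x_2 - x_1)^2 - k_3 (x_3 - x_2)^2 \right] \]
Accordingly, we have three Euler-Lagrange equations (corresponding to $x_1, x_2, x_3$):
\begin{align*}
\frac{\partial L}{\partial x_1} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_1} = 0 \quad &\Rightarrow \quad -k_1 x_1 + k_2 (x_2 - x_1) - m_1 \ddot{x}_1 = 0 \\
\frac{\partial L}{\partial x_2} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_2} = 0 \quad &\Rightarrow \quad -k_2 (x_2 - x_1) + k_3 (x_3 - x_2) - m_2 \ddot{x}_2 = 0 \\
\frac{\partial L}{\partial x_3} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_3} = 0 \quad &\Rightarrow \quad -k_3 (x_3 - x_2) - m_3 \ddot{x}_3 = 0
\end{align*}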
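The three Euler-Lagrange equations of this Problem form a system of coupled linear differential equations that is easy to integrate numerically. The following sketch uses a velocity-Verlet scheme (our choice; the text does not prescribe any numerical method) with illustrative masses, spring constants and initial displacements, and checks that the total energy T + U remains practically constant, as expected from the result of Problem 5.

```python
def accelerations(x, m, k):
    """Accelerations obtained by solving the three Euler-Lagrange equations."""
    x1, x2, x3 = x
    a1 = (-k[0] * x1 + k[1] * (x2 - x1)) / m[0]
    a2 = (-k[1] * (x2 - x1) + k[2] * (x3 - x2)) / m[1]
    a3 = (-k[2] * (x3 - x2)) / m[2]
    return [a1, a2, a3]

def total_energy(x, v, m, k):
    """T + U of the chain, with U built from the stretch of each spring."""
    kinetic = 0.5 * sum(mi * vi**2 for mi, vi in zip(m, v))
    stretches = [x[0], x[1] - x[0], x[2] - x[1]]
    potential = 0.5 * sum(ki * si**2 for ki, si in zip(k, stretches))
    return kinetic + potential

def simulate(x, v, m, k, dt=1e-3, steps=10000):
    """Velocity-Verlet integration; returns the final positions and velocities."""
    x, v = list(x), list(v)
    a = accelerations(x, m, k)
    for _ in range(steps):
        v_half = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
        x = [xi + dt * vh for xi, vh in zip(x, v_half)]
        a = accelerations(x, m, k)
        v = [vh + 0.5 * dt * ai for vh, ai in zip(v_half, a)]
    return x, v

masses = (1.0, 2.0, 1.5)            # illustrative values, not from the text
springs = (3.0, 2.0, 1.0)
x0, v0 = (0.1, 0.0, -0.05), (0.0, 0.0, 0.0)

e_start = total_energy(x0, v0, masses, springs)
x_end, v_end = simulate(x0, v0, masses, springs)
e_end = total_energy(x_end, v_end, masses, springs)
print(e_start, e_end)
```

The small residual difference between the two printed energies is due to the finite time step and shrinks as dt is reduced.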
14. Obtain the Hamiltonian formulation of a simple pendulum consisting of a particle of mass m hanging
at the end of an inextensible and weightless string of length l and swinging in a vertical plane where
(uniform) gravity is the only active force (see Figure 66). Suggest a simple solution.
Figure 66: A schematic illustration of the simple pendulum system. See Problem 14 of § 6.
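\begin{align*}
\frac{\partial}{\partial \theta} \left[ \frac{1}{2} m l^2 \dot{\theta}^2 - m g l (1 - \cos\theta) \right] - \frac{d}{dt} \frac{\partial}{\partial \dot{\theta}} \left[ \frac{1}{2} m l^2 \dot{\theta}^2 - m g l (1 - \cos\theta) \right] &= 0 \\
-m g l \sin\theta - \frac{d}{dt} \left( m l^2 \dot{\theta} \right) &= 0 \\
-m g l \sin\theta - m l^2 \ddot{\theta} &= 0 \\
l \ddot{\theta} + g \sin\theta &= 0
\end{align*}
For small θ we can use the approximation $\sin\theta \simeq \theta$ and hence the pendulum equation becomes
$l \ddot{\theta} + g \theta = 0$. So, the solution (according to this simplification) is a sinusoidal function of time (with
frequency $\frac{1}{2\pi} \sqrt{\frac{g}{l}}$).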
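To get a feeling for the range of validity of the small-angle simplification, the sketch below integrates the full equation $l \ddot{\theta} + g \sin\theta = 0$ over one period of the linearized equation, starting from rest at a small and at a large initial angle. The integration scheme (velocity Verlet), the function names and all numerical values are our own illustrative assumptions.

```python
import math

def small_angle_period(length, g=9.81):
    """Period 2*pi*sqrt(l/g) of the linearized equation l*theta'' + g*theta = 0."""
    return 2.0 * math.pi * math.sqrt(length / g)

def swing(theta0, length, duration, g=9.81, steps=20000):
    """Integrate l*theta'' + g*sin(theta) = 0 from rest with velocity Verlet."""
    dt = duration / steps
    theta, omega = theta0, 0.0
    acc = -(g / length) * math.sin(theta)
    for _ in range(steps):
        omega_half = omega + 0.5 * dt * acc
        theta += dt * omega_half
        acc = -(g / length) * math.sin(theta)
        omega = omega_half + 0.5 * dt * acc
    return theta, omega

period = small_angle_period(1.0)
theta_small, _ = swing(0.05, 1.0, period)   # small amplitude: back near the start
theta_large, _ = swing(1.5, 1.0, period)    # large amplitude: swing not yet completed
print(period, theta_small, theta_large)
```

For the small initial angle the pendulum returns almost exactly to its starting angle after one linearized period, whereas for the large initial angle it has not yet completed its swing, i.e. the true period grows with the amplitude.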
15. Obtain the Hamiltonian formulation of a compound pendulum consisting of a particle of mass m1
(hanging at the end of an inextensible and weightless string of length l1 ) to which a second particle
of mass m2 (hanging at the end of an inextensible and weightless string of length l2 ) is attached (see
Figure 67). Again, the swing of the compound pendulum is restricted to a vertical plane and (uniform)
gravity is the only active force.
Answer: Let us solve this Problem using the setting of Figure 67 where we use an inverted Cartesian
coordinate system (i.e. its y axis is pointing downward) in the vertical plane. The origin of the
coordinate system is taken at the hanging point of the string l1 while the zero potential energy reference
is taken at y = 0. In this setting θ1 and θ2 are the angles of displacement of m1 and m2 from the
vertical line. Now, the positions of m1 and m2 are:
Figure 67: A schematic illustration of the compound pendulum system. See Problem 15 of § 6.
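\[ (x_1, y_1) = (l_1 \sin\theta_1, l_1 \cos\theta_1) \qquad \text{and} \qquad (x_2, y_2) = (l_1 \sin\theta_1 + l_2 \sin\theta_2, l_1 \cos\theta_1 + l_2 \cos\theta_2) \]
\[ U = \cdots = -\left[ g l_1 (m_1 + m_2) \cos\theta_1 + m_2 g l_2 \cos\theta_2 \right] \]
\begin{align*}
T &= \cdots \\
&= \frac{1}{2} m_1 l_1^2 \dot{\theta}_1^2 + \frac{1}{2} m_2 \left[ l_1^2 \dot{\theta}_1^2 + l_2^2 \dot{\theta}_2^2 + 2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \left( \cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 \right) \right] \\
&= \frac{1}{2} m_1 l_1^2 \dot{\theta}_1^2 + \frac{1}{2} m_2 \left[ l_1^2 \dot{\theta}_1^2 + l_2^2 \dot{\theta}_2^2 + 2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 - \theta_2) \right] \\
&= \frac{1}{2} (m_1 + m_2) l_1^2 \dot{\theta}_1^2 + \frac{1}{2} m_2 l_2^2 \dot{\theta}_2^2 + m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 - \theta_2)
\end{align*}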
So, the Lagrangian of the system is:
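\begin{align*}
L(t, \theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2) &= T - U \\
&= \frac{1}{2} (m_1 + m_2) l_1^2 \dot{\theta}_1^2 + \frac{1}{2} m_2 l_2^2 \dot{\theta}_2^2 + m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \cos(\theta_1 - \theta_2) \\
&\quad + g l_1 (m_1 + m_2) \cos\theta_1 + m_2 g l_2 \cos\theta_2
\end{align*}
Accordingly, we have two Euler-Lagrange equations (one equation for each θ):
\begin{align*}
\frac{\partial L}{\partial \theta_1} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}_1} &= 0 \\
-m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) - g l_1 (m_1 + m_2) \sin\theta_1 \qquad & \\
- \frac{d}{dt} \left[ (m_1 + m_2) l_1^2 \dot{\theta}_1 + m_2 l_1 l_2 \dot{\theta}_2 \cos(\theta_1 - \theta_2) \right] &= 0 \\
-m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) - g l_1 (m_1 + m_2) \sin\theta_1 \qquad & \\
- (m_1 + m_2) l_1^2 \ddot{\theta}_1 - m_2 l_1 l_2 \ddot{\theta}_2 \cos(\theta_1 - \theta_2) + m_2 l_1 l_2 \dot{\theta}_2 \left( \dot{\theta}_1 - \dot{\theta}_2 \right) \sin(\theta_1 - \theta_2) &= 0
\end{align*}
AND
\begin{align*}
\frac{\partial L}{\partial \theta_2} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}_2} &= 0 \\
m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) - m_2 g l_2 \sin\theta_2 \qquad & \\
- \frac{d}{dt} \left[ m_2 l_2^2 \dot{\theta}_2 + m_2 l_1 l_2 \dot{\theta}_1 \cos(\theta_1 - \theta_2) \right] &= 0 \\
m_2 l_1 l_2 \dot{\theta}_1 \dot{\theta}_2 \sin(\theta_1 - \theta_2) - m_2 g l_2 \sin\theta_2 \qquad & \\
- m_2 l_2^2 \ddot{\theta}_2 - m_2 l_1 l_2 \ddot{\theta}_1 \cos(\theta_1 - \theta_2) + m_2 l_1 l_2 \dot{\theta}_1 \left( \dot{\theta}_1 - \dot{\theta}_2 \right) \sin(\theta_1 - \theta_2) &= 0
\end{align*}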
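The two Euler-Lagrange equations of the compound pendulum are linear in $\ddot{\theta}_1$ and $\ddot{\theta}_2$, so they can be solved for the accelerations as a 2×2 linear system and then integrated numerically. The sketch below does this with a classical Runge-Kutta step and checks that T + U stays constant (which should be the case because the Lagrangian does not contain t explicitly). The parameter values, the initial angles and the function names are illustrative assumptions of ours.

```python
import math

M1, M2, L1, L2, G = 1.0, 1.0, 1.0, 1.0, 9.81   # illustrative values only

def derivs(state):
    """Time derivative of (theta1, theta2, omega1, omega2)."""
    th1, th2, w1, w2 = state
    d = th1 - th2
    # Coefficients of the accelerations in the two equations (symmetric 2x2 system).
    a11 = (M1 + M2) * L1**2
    a12 = M2 * L1 * L2 * math.cos(d)
    a22 = M2 * L2**2
    # Remaining terms, collected on the right-hand side.
    b1 = -M2 * L1 * L2 * w2**2 * math.sin(d) - G * L1 * (M1 + M2) * math.sin(th1)
    b2 = M2 * L1 * L2 * w1**2 * math.sin(d) - M2 * G * L2 * math.sin(th2)
    det = a11 * a22 - a12**2
    return (w1, w2, (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def energy(state):
    """Total energy T + U with the expressions for T and U used in the Lagrangian."""
    th1, th2, w1, w2 = state
    kinetic = (0.5 * (M1 + M2) * L1**2 * w1**2 + 0.5 * M2 * L2**2 * w2**2
               + M2 * L1 * L2 * w1 * w2 * math.cos(th1 - th2))
    potential = -G * L1 * (M1 + M2) * math.cos(th1) - M2 * G * L2 * math.cos(th2)
    return kinetic + potential

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = derivs(state)
    k2 = derivs(shifted(state, k1, dt / 2))
    k3 = derivs(shifted(state, k2, dt / 2))
    k4 = derivs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

state = (1.0, 0.5, 0.0, 0.0)        # released from rest at assumed angles (radians)
e_start = energy(state)
for _ in range(5000):               # 5 time units with dt = 1e-3
    state = rk4_step(state, 1e-3)
e_end = energy(state)
print(e_start, e_end)
```

Collecting the velocity terms shows that the right-hand sides reduce to $-m_2 l_1 l_2 \dot{\theta}_2^2 \sin(\theta_1 - \theta_2) - g l_1 (m_1 + m_2) \sin\theta_1$ and $m_2 l_1 l_2 \dot{\theta}_1^2 \sin(\theta_1 - \theta_2) - m_2 g l_2 \sin\theta_2$, which is what the code uses.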
16. Obtain the Hamiltonian formulation of a spherical pendulum consisting of a particle of mass m hanging
at the end of an inextensible and weightless string of length l and swinging around (in all directions
but restricted by the length l) where the (uniform) gravity of the Earth is the only active force.
Answer: We use a normalized spherical coordinate system centered on the fixed end of the string
with the θ = 0 axis (corresponding to the z axis) pointing toward the Earth center. In this Problem
the dependent variables are the spherical coordinates r, θ, φ of the particle m and we take the zero
potential energy reference to be the plane passing through the origin of coordinates and parallel to
the surface of the Earth. Now, in spherical coordinates the velocity is $\mathbf{v} = (\dot{r}, r \dot{\theta}, r \dot{\phi} \sin\theta)$ but since m
is always at distance l from the origin then $r = l$ and $\dot{r} = 0$ and hence the velocity of the particle
is $\mathbf{v} = (0, l \dot{\theta}, l \dot{\phi} \sin\theta)$. Accordingly, the kinetic energy of the particle is $T = \frac{1}{2} m \left( l^2 \dot{\theta}^2 + l^2 \dot{\phi}^2 \sin^2\theta \right)$ and
its potential energy is $U = -m g l \cos\theta$. Therefore, the Lagrangian of this mechanical system is:
\[ L(t, r, \theta, \phi, \dot{r}, \dot{\theta}, \dot{\phi}) = T - U = \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \]
Accordingly, we have three Euler-Lagrange equations (one equation for each coordinate):
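\begin{align*}
\frac{\partial L}{\partial r} - \frac{d}{dt} \frac{\partial L}{\partial \dot{r}} &= 0 \\
\frac{\partial}{\partial r} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] - \frac{d}{dt} \frac{\partial}{\partial \dot{r}} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] &= 0 \\
0 - 0 &= 0 \\
0 &= 0
\end{align*}
AND
\begin{align*}
\frac{\partial L}{\partial \theta} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\theta}} &= 0 \\
\frac{\partial}{\partial \theta} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] - \frac{d}{dt} \frac{\partial}{\partial \dot{\theta}} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] &= 0 \\
\frac{1}{2} m l^2 \left( 2 \dot{\phi}^2 \sin\theta \cos\theta \right) - m g l \sin\theta - \frac{d}{dt} \left[ \frac{1}{2} m l^2 \left( 2 \dot{\theta} \right) \right] &= 0 \\
m l^2 \dot{\phi}^2 \sin\theta \cos\theta - m g l \sin\theta - m l^2 \ddot{\theta} &= 0 \\
l \dot{\phi}^2 \sin\theta \cos\theta - g \sin\theta - l \ddot{\theta} &= 0
\end{align*}
AND
\begin{align*}
\frac{\partial L}{\partial \phi} - \frac{d}{dt} \frac{\partial L}{\partial \dot{\phi}} &= 0 \\
\frac{\partial}{\partial \phi} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] - \frac{d}{dt} \frac{\partial}{\partial \dot{\phi}} \left[ \frac{1}{2} m l^2 \left( \dot{\theta}^2 + \dot{\phi}^2 \sin^2\theta \right) + m g l \cos\theta \right] &= 0 \\
0 - \frac{d}{dt} \left[ \frac{1}{2} m l^2 \left( 2 \dot{\phi} \sin^2\theta \right) \right] &= 0 \\
m l^2 \left( \ddot{\phi} \sin^2\theta + 2 \dot{\phi} \dot{\theta} \sin\theta \cos\theta \right) &= 0 \\
\ddot{\phi} \sin^2\theta + 2 \dot{\phi} \dot{\theta} \sin\theta \cos\theta &= 0
\end{align*}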
So, we have only two useful (non-trivial) equations.
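A simple special solution of these two equations (our own illustration, not discussed in the text) is the conical pendulum where θ is kept constant. With $\dot{\theta} = \ddot{\theta} = 0$ the φ equation gives $\ddot{\phi} = 0$, while the θ equation reduces to $l \dot{\phi}^2 \cos\theta = g$ (for $\sin\theta \neq 0$). The sketch below computes the required azimuthal rate and substitutes it back into the θ equation; the function names and numerical values are hypothetical.

```python
import math

def conical_rate(theta, length, g=9.81):
    """Azimuthal rate phi_dot that keeps theta constant (0 < theta < pi/2)."""
    return math.sqrt(g / (length * math.cos(theta)))

def theta_equation(theta, phi_dot, theta_ddot, length, g=9.81):
    """Left-hand side of l*phi_dot**2*sin(theta)*cos(theta) - g*sin(theta) - l*theta_ddot = 0."""
    return (length * phi_dot**2 * math.sin(theta) * math.cos(theta)
            - g * math.sin(theta) - length * theta_ddot)

theta, length = math.radians(30.0), 0.8    # illustrative angle and string length
rate = conical_rate(theta, length)
print(rate, theta_equation(theta, rate, 0.0, length))
```

The second printed value is zero up to rounding, confirming that the chosen rate is consistent with the θ equation when θ is constant.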
17. Use a Hamiltonian approach to analyze a free system made of two massive particles interacting by a
conservative force that solely depends on their separation.
Answer: If the masses of the particles are m1 and m2 and their position vectors are r1 and r2 , then
the Lagrangian of the system is:
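\[ L(t, \mathbf{r}_1, \mathbf{r}_2, \dot{\mathbf{r}}_1, \dot{\mathbf{r}}_2) = \frac{1}{2} m_1 |\dot{\mathbf{r}}_1|^2 + \frac{1}{2} m_2 |\dot{\mathbf{r}}_2|^2 - U(r) \]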
Now, from the perspective of the center of mass (where the Lagrangian is a function of the coordinates
X1 , X2 , X3 and their derivatives as well as time), the Euler-Lagrange equations are:
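\begin{align*}
\frac{\partial L}{\partial X_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{X}_i} &= 0 \qquad (i = 1, 2, 3) \\
0 - \frac{d}{dt} \left( M \dot{X}_i \right) &= 0 \\
M \ddot{X}_i &= 0 \\
\ddot{X}_i &= 0 \\
\ddot{\mathbf{R}} &= 0
\end{align*}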
This result is logical because the system as a whole (and hence its center of mass) is free of any force
and therefore its acceleration should vanish according to Newton’s first law.
From the perspective of the interacting particles (where the Lagrangian is a function of the coordinates
x1 , x2 , x3 and their derivatives as well as time), the Euler-Lagrange equations are:
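\begin{align*}
\frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} &= 0 \qquad (i = 1, 2, 3) \\
-\frac{\partial U}{\partial x_i} - \frac{d}{dt} \left( \frac{m_1 m_2}{M} \dot{x}_i \right) &= 0 \\
-\frac{\partial U}{\partial x_i} - \frac{m_1 m_2}{M} \ddot{x}_i &= 0 \\
\frac{m_1 m_2}{M} \ddot{x}_i &= -\frac{\partial U}{\partial x_i} \\
m_r \ddot{\mathbf{r}} &= -\nabla U
\end{align*}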
where mr is the reduced mass and ∇ is the gradient operator. This result is also logical because it
represents Newton’s second law, i.e. mass times acceleration equals force (noting that the conservative
force is the negative gradient of the potential energy).
18. Show that a particle whose trajectory in a 3D Euclidean space is restricted to an equipotential surface
follows a geodesic path.
Answer: We describe the particle trajectory (which connects its start and end points) by Cartesian
coordinates x, y, z. Now, since the trajectory is restricted to an equipotential surface then the potential
energy of the particle is constant, that is U = C. Hence the Lagrangian is:
\[ L(t, x, y, z, \dot{x}, \dot{y}, \dot{z}) = T - U = \frac{1}{2} m \left( \dot{x}^2 + \dot{y}^2 + \dot{z}^2 \right) - C \]
where m is the mass of the particle. Accordingly, we have three Euler-Lagrange equations (one equation
for each coordinate):
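\begin{align*}
\frac{\partial L}{\partial x} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} = 0 \quad &\Rightarrow \quad \ddot{x} = 0 \\
\frac{\partial L}{\partial y} - \frac{d}{dt} \frac{\partial L}{\partial \dot{y}} = 0 \quad &\Rightarrow \quad \ddot{y} = 0 \\
\frac{\partial L}{\partial z} - \frac{d}{dt} \frac{\partial L}{\partial \dot{z}} = 0 \quad &\Rightarrow \quad \ddot{z} = 0
\end{align*}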
Now, let us parameterize the Cartesian coordinates of the geodesic that connects the two end points of
the trajectory with t and hence x = x(t), y = y(t) and z = z(t). Accordingly, the geodesic is the path
of optimal length s as given by:
\[ s = \int_{\Gamma} ds = \int_{\Gamma} \sqrt{(dx)^2 + (dy)^2 + (dz)^2} = \int_{t_A}^{t_B} \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\, dt \equiv I[x, y, z] \]
and therefore $F = \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}$. Now, referring to footnote [12], $F$ is independent of $t, x, y, z$ and hence the Euler-Lagrange equations are $\dot{x}$ = constant, $\dot{y}$ = constant and $\dot{z}$ = constant, which lead to $\ddot{x} = 0$, $\ddot{y} = 0$ and $\ddot{z} = 0$. This means that the mathematical formulation of the two problems (i.e. motion over an equipotential surface and motion along a geodesic curve) is the same and hence the trajectory as found from the Hamiltonian formulation is identical to the trajectory as found from the geodesic formulation, i.e. the particle follows a geodesic path.
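Note: those who may not feel comfortable with the use of footnote [12] may use Eq. 3, that is:
\[
\begin{aligned}
\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} - \dot{x}\,\frac{\partial\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{\partial\dot{x}} &= C_1 \quad\Rightarrow\quad \frac{\dot{y}^2 + \dot{z}^2}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_1 \\
\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} - \dot{y}\,\frac{\partial\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{\partial\dot{y}} &= C_2 \quad\Rightarrow\quad \frac{\dot{x}^2 + \dot{z}^2}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_2 \\
\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2} - \dot{z}\,\frac{\partial\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}}{\partial\dot{z}} &= C_3 \quad\Rightarrow\quad \frac{\dot{x}^2 + \dot{y}^2}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_3
\end{aligned}
\]
which lead (by adding and simplifying) to $\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}$ = constant. Hence, from Eq. 4 we get:
\[
\frac{\partial F}{\partial\dot{x}} = \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_4 \qquad
\frac{\partial F}{\partial\dot{y}} = \frac{\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_5 \qquad
\frac{\partial F}{\partial\dot{z}} = \frac{\dot{z}}{\sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}} = C_6
\]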
to conservative systems.
[123] We note that the (continuum) expression $\frac{\mu}{2}y_t^2\,dx$ is no more than the usual (discrete) expression of the kinetic energy $\frac{1}{2}mv^2$ where $\mu\,dx$ represents $m$ while $y_t^2$ represents $v^2$. So, $T = \int_{x_1}^{x_2}\frac{\mu}{2}y_t^2\,dx$ is the continuous version of the discrete version $T = \sum_{i=1}^{n}\frac{1}{2}m_i v_i^2$. For a more technical approach, the reader is referred to the literature (see for instance Weinstock in the References).
where $x_1$ and $x_2$ are the $x$ coordinates of the end (fixed) points of the string and $y_t = \partial y/\partial t$. Regarding the potential energy, we simply apply Hooke’s law (in its continuous version) and hence:[124]
\[
U = \int_{x_1}^{x_2}\frac{\tau}{2}\left(\frac{\partial y}{\partial x}\right)^2 dx = \int_{x_1}^{x_2}\frac{\tau}{2}\,y_x^2\,dx
\]
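Now, if we apply Eq. 117 to our Lagrangian (noting that we have a single dependent variable $y$) we get:
\[
I[y] = \int_{t_1}^{t_2} L\,dt = \int_{t_1}^{t_2}\int_{x_1}^{x_2}\left(\frac{1}{2}\mu y_t^2 - \frac{1}{2}\tau y_x^2\right)dx\,dt = \frac{1}{2}\int_{t_1}^{t_2}\int_{x_1}^{x_2}\left(\mu y_t^2 - \tau y_x^2\right)dx\,dt
\]
As we see, this is a functional integral (with $F = \mu y_t^2 - \tau y_x^2$) of a variational problem with multiple independent variables (see Eq. 14 in § 1.6) and hence we use Eq. 15 (noting that $t, x, y, y_t, y_x$ in our case correspond to $x_1, x_2, y, y_{x_1}, y_{x_2}$ in Eqs. 14 and 15), that is:
\[
\begin{aligned}
\frac{\partial F}{\partial y} - \frac{\partial}{\partial t}\frac{\partial F}{\partial y_t} - \frac{\partial}{\partial x}\frac{\partial F}{\partial y_x} &= 0 \\
\frac{\partial}{\partial y}\left(\mu y_t^2 - \tau y_x^2\right) - \frac{\partial}{\partial t}\frac{\partial}{\partial y_t}\left(\mu y_t^2 - \tau y_x^2\right) - \frac{\partial}{\partial x}\frac{\partial}{\partial y_x}\left(\mu y_t^2 - \tau y_x^2\right) &= 0 \\
0 - \frac{\partial}{\partial t}\left(2\mu y_t\right) - \frac{\partial}{\partial x}\left(-2\tau y_x\right) &= 0 \\
\frac{\partial}{\partial t}\left(\mu y_t\right) - \frac{\partial}{\partial x}\left(\tau y_x\right) &= 0
\end{aligned}
\]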
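Now, if $\mu$ and $\tau$ are independent of $t$ and $x$ then the last equation becomes (noting the meaning of the partial derivatives with respect to the independent variables as explained in § 1.6):
\[
\begin{aligned}
\mu\frac{\partial}{\partial t}\left(y_t\right) - \tau\frac{\partial}{\partial x}\left(y_x\right) &= 0 \\
\mu\frac{\partial}{\partial t}\left(\frac{\partial y}{\partial t}\right) - \tau\frac{\partial}{\partial x}\left(\frac{\partial y}{\partial x}\right) &= 0 \\
\mu\frac{\partial^2 y}{\partial t^2} - \tau\frac{\partial^2 y}{\partial x^2} &= 0
\end{aligned}
\]
which is the 1D wave equation for small transverse oscillations on a taut string of uniform linear density and constant tension (with wave speed $v_w = \sqrt{\tau/\mu}$).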
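As a quick numerical sanity check of the wave equation just obtained, the following minimal sketch evaluates $\mu y_{tt} - \tau y_{xx}$ by central differences for a travelling sinusoid moving with speed $v_w = \sqrt{\tau/\mu}$. The test profile, the parameter values and the function name `wave_residual` are illustrative assumptions and are not taken from the text.

```python
import math

def wave_residual(mu, tau, k, x, t, h=1e-3):
    """Finite-difference estimate of mu*y_tt - tau*y_xx for a travelling wave."""
    v_w = math.sqrt(tau / mu)  # wave speed sqrt(tau/mu), as in the text

    def y(x_, t_):
        # Assumed test profile: a sinusoid moving to the right with speed v_w.
        return math.sin(k * (x_ - v_w * t_))

    # Second-order central differences for the two second derivatives.
    y_tt = (y(x, t + h) - 2.0 * y(x, t) + y(x, t - h)) / h**2
    y_xx = (y(x + h, t) - 2.0 * y(x, t) + y(x - h, t)) / h**2
    return mu * y_tt - tau * y_xx

# Arbitrary (hypothetical) density, tension, wave number and sample point.
residual = wave_residual(mu=0.5, tau=2.0, k=3.0, x=0.3, t=0.7)
print(f"mu*y_tt - tau*y_xx = {residual:.2e}")
```

The printed residual should be small compared with the individual terms (which are of order one here), the remainder being finite-difference truncation error.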
[124] Instead of going through detailed analysis (which is time consuming and distracting from our variational objective) we can justify the above integral as follows (using slack notations and deliberations):
\[
U = \int_{x_1}^{x_2}\frac{\tau}{2}\left(\frac{\partial y}{\partial x}\right)^2 dx = \int_{x_1}^{x_2}\frac{\tau}{2}\,\frac{\partial y}{\partial x}\frac{\partial y}{\partial x}\,dx = \int_{x_1}^{x_2}\frac{1}{2}\,\frac{\tau}{\partial x}\,\frac{dx}{\partial x}\,\partial y\,\partial y = \int_{x_1}^{x_2}\frac{1}{2}\,\frac{\tau}{\partial x}\,(\partial y)^2
\]
As we see, $\tau/\partial x$ corresponds to $k$ and $(\partial y)^2$ corresponds to $x^2$ in the expression of the potential energy of the common form of Hooke’s law (i.e. $U = \frac{1}{2}kx^2$). So, $U = \int_{x_1}^{x_2}\frac{\tau}{2}y_x^2\,dx$ is the continuous version of the discrete version $U = \sum_{i=1}^{n}\frac{1}{2}k_i x_i^2$. For a more technical approach, the reader is referred to the literature (see for instance Weinstock in the References).
Chapter 7
Sturm-Liouville Problems
The Sturm-Liouville problems are defined by the following differential equation (on a given interval $a \le x \le b$):[125]
\[
-\frac{d}{dx}\left(py'\right) + qy = \lambda wy \tag{128}
\]
where $p, q, w, y$ are functions of $x$ (with $p$ and $w$ not vanishing on the interval), $\lambda$ is an eigenvalue of the eigenfunction $y$, and $y' = dy/dx$. It can be easily shown that Sturm-Liouville problems can be formulated and solved as variational problems. In fact, the Sturm-Liouville differential equation is no more than the Euler-Lagrange equation for a certain type of variational problems with constraint (see Problem 3). This establishes a relationship between the eigenvalue problems and the calculus of variations (see Problem 7).
Problems
1. Write the Sturm-Liouville equations for the following sets of $p, q, w$:
(a) $p = 1$, $q = x$, $w = 1$.
(b) $p = a$, $q = e^x$, $w = x^b$ (where $a$ and $b$ are constants).
(c) $p = -x$, $q = x$, $w = x^3$.
Answer:
(a) Inserting $p, q, w$ into Eq. 128 we get:
\[
-\frac{d}{dx}\left(1\,y'\right) + xy = \lambda\,1\,y \qquad\text{that is}\qquad -y'' + xy = \lambda y
\]
(b) Inserting $p, q, w$ into Eq. 128 we get:
\[
-\frac{d}{dx}\left(ay'\right) + e^x y = \lambda x^b y \qquad\text{that is}\qquad -ay'' + e^x y = \lambda x^b y
\]
(c) Inserting $p, q, w$ into Eq. 128 we get:
\[
-\frac{d}{dx}\left(-xy'\right) + xy = \lambda x^3 y \qquad\text{that is}\qquad xy'' + y' + xy = \lambda x^3 y
\]
\[
-\frac{d}{dx}\left(-x^2 y'\right) + \left(\ln|x|\right)y = \lambda\left(-\alpha\right)y
\]
By comparing this to Eq. 128 we see: $p = -x^2$, $q = \ln|x|$, and $w = -\alpha$.
(c) Putting the equation in the standard form of Sturm-Liouville equation (Eq. 128) we get:
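\[
\begin{aligned}
-e^{2x}y'' - 2e^{2x}y' + \frac{y}{x} &= \frac{\lambda 5y}{x^3} \\
-\left(e^{2x}y'' + 2e^{2x}y'\right) + \frac{1}{x}\,y &= \lambda\,\frac{5}{x^3}\,y \\
-\frac{d}{dx}\left(e^{2x}y'\right) + \frac{1}{x}\,y &= \lambda\,\frac{5}{x^3}\,y
\end{aligned}
\]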
\[
\begin{aligned}
y'' - xy &= -\lambda x^2 y \\
-\frac{d}{dx}\left(-y'\right) + \left(-x\right)y &= \lambda\left(-x^2\right)y
\end{aligned}
\]
By comparing this to Eq. 128 we see: $p = -1$, $q = -x$, and $w = -x^2$.
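3. Show that the Sturm-Liouville problem (as given by Eq. 128) is equivalent to the constrained variational problem: $I[y] = I_1[y] - \lambda I_2[y]$ where:[126]
\[
I_1[y] = \int_a^b\left(py'^2 + qy^2\right)dx \qquad\text{and}\qquad I_2[y] = \int_a^b wy^2\,dx \tag{129}
\]
Answer: According to the formulation of constrained variational problems (see § 1.8) we have:
\[
I[y] = I_1[y] - \lambda I_2[y] = \left\{\int_a^b\left(py'^2 + qy^2\right)dx\right\} - \lambda\left\{\int_a^b wy^2\,dx\right\} = \int_a^b\left(py'^2 + qy^2 - \lambda wy^2\right)dx \tag{130}
\]
Hence, $H \equiv F - \lambda G = py'^2 + qy^2 - \lambda wy^2$ and the Euler-Lagrange equation (see Eq. 22) is:
\[
\begin{aligned}
\frac{\partial}{\partial y}\left(py'^2 + qy^2 - \lambda wy^2\right) - \frac{d}{dx}\frac{\partial}{\partial y'}\left(py'^2 + qy^2 - \lambda wy^2\right) &= 0 \\
2qy - 2\lambda wy - \frac{d}{dx}\left(2py'\right) &= 0 \\
-2\frac{d}{dx}\left(py'\right) + 2qy &= 2\lambda wy \\
-\frac{d}{dx}\left(py'\right) + qy &= \lambda wy
\end{aligned}
\]
which is the Sturm-Liouville equation (Eq. 128).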
Note: a consequence of the result of this Problem is that Sturm-Liouville problems can be treated and
formulated as constrained variational problems (see Problem 4), and some (but not all)[127] constrained
variational problems can be treated and formulated as Sturm-Liouville problems (see Problem 5).
[126] The reader should note that the eigenvalue of the Sturm-Liouville problem is represented by the Lagrange multiplier λ
of the variational problem. We should also note that the minus sign in the constrained formulation is allowed because
the sign of λ is rather arbitrary. The use of the minus (instead of plus) sign is to keep the given form of Sturm-Liouville
equation.
[127] This restriction is due to the fact that $I_1$ and $I_2$ in Eq. 129 are restricted to certain forms.
5. Find the Euler-Lagrange equation for the constrained variational problem of part (a) of Problem 5 of
§ 1.8 and the constrained variational problem of part (b) of Problem 7 of § 1.8 without applying Eq.
22.
Answer: As we saw in Problem 3, $H$ of a constrained variational problem (of a certain form whose $I_1$ and $I_2$ are given by Eq. 129) is given by $H = py'^2 + qy^2 - \lambda wy^2$ where $p, q, w$ are the parameters of the Sturm-Liouville equation that corresponds to the constrained variational problem (and hence the Sturm-Liouville equation represents the Euler-Lagrange equation of the given constrained variational problem).
Now, for part (a) of Problem 5 of § 1.8 we have $H \equiv py'^2 + qy^2 - \lambda wy^2 = xy'^2 + \lambda x^2y^2$ and hence the Euler-Lagrange equation for this constrained variational problem is the Sturm-Liouville equation with $p = x$, $q = 0$ and $w = -x^2$, that is:
\[
\begin{aligned}
-\frac{d}{dx}\left(xy'\right) + 0 &= -\lambda x^2 y \\
-xy'' - y' &= -\lambda x^2 y \\
xy'' + y' - \lambda x^2 y &= 0
\end{aligned}
\]
which is what we found in part (a) of Problem 5 of § 1.8 by applying Eq. 22.
Similarly, for part (b) of Problem 7 of § 1.8 we have $H \equiv py'^2 + qy^2 - \lambda wy^2 = y'^2 - \lambda y^2$ and hence the Euler-Lagrange equation for this constrained variational problem is the Sturm-Liouville equation with $p = 1$, $q = 0$ and $w = 1$, that is:
\[
\begin{aligned}
-\frac{d}{dx}\left(y'\right) + 0 &= \lambda y \\
y'' + \lambda y &= 0
\end{aligned}
\]
which is what we found in part (b) of Problem 7 of § 1.8 by applying Eq. 22.
6. Using the Sturm-Liouville equation (Eq. 128), confirm that λ = I1 /I2 .
Answer: Multiplying Eq. 128 by $y$, we get:
\[
\begin{aligned}
-y\frac{d}{dx}\left(py'\right) + qy^2 &= \lambda wy^2 \\
-\int_a^b y\frac{d}{dx}\left(py'\right)dx + \int_a^b qy^2\,dx &= \int_a^b \lambda wy^2\,dx && \text{(integrating)} \\
-\left[ypy'\right]_a^b + \int_a^b py'^2\,dx + \int_a^b qy^2\,dx &= \int_a^b \lambda wy^2\,dx && \text{(integration by parts)} \\
\int_a^b py'^2\,dx + \int_a^b qy^2\,dx &= \int_a^b \lambda wy^2\,dx && \text{(using suitable boundary conditions)} \\
\int_a^b\left(py'^2 + qy^2\right)dx &= \lambda\int_a^b wy^2\,dx \\
I_1 &= \lambda I_2 && \text{(see Eq. 129)} \\
\lambda &= \frac{I_1}{I_2}
\end{aligned}
\]
Note: in the integration by parts formula $\int u\,dv = uv - \int v\,du$ we use $u = y$ and $v = py'$.
7. Referring to the formulation of Problem 3 and assuming normalization such that I2 [y] = 1,[128] show
that the stationary values of I1 [y] of the variational problem produce the eigenvalues of the Sturm-
Liouville problem.
Answer: From the result of Problem 6 and the assumption $I_2[y] = 1$ we have:[129]
\[
\lambda = \frac{I_1}{I_2} = I_1
\]
Accordingly, the stationary values of I1 of the variational problem produce the eigenvalues (λ’s) of the
Sturm-Liouville problem, as required.
8. Referring to the formulation of Problem 3, show that obtaining the stationary (or extreme) values of
I is equivalent to obtaining the stationary (or extreme) values of I1 /I2 .[130]
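Answer: Noting that variation follows the pattern of differential (and hence the rules of differentiation apply in its manipulation), we have:
\[
\begin{aligned}
\delta\left(\frac{I_1}{I_2}\right) &= \frac{\delta I_1}{I_2} - \frac{I_1}{I_2^2}\,\delta I_2 \\
&= \frac{\delta I_1}{I_2} - \frac{\left(I_1/I_2\right)}{I_2}\,\delta I_2 \\
&= \frac{\delta I_1 - \left(I_1/I_2\right)\delta I_2}{I_2} \\
&= \frac{\delta I_1 - \lambda\,\delta I_2}{I_2} \\
&= \frac{\delta I}{I_2}
\end{aligned}
\]
where in line 4 we use $I_1/I_2 = \lambda$ (see Problem 6). Now, since $I_2$ is constant (see § 1.8) the last equation means that stationarizing (or extremizing) $I$ is equivalent to stationarizing (or extremizing) $I_1/I_2$.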
9. Discuss and analyze the result of Problem 8.
Answer: Since stationarizing I is equivalent to stationarizing I1 /I2 (= λ) then any function y that
stationarizes I should stationarize I1 /I2 (and vice versa). Now, because the functions that stationarize
I are solutions of the Sturm-Liouville equation (see Problem 3) then this means that obtaining the
functions that stationarize I1 /I2 is equivalent to obtaining the functions that are solutions to the
[128] We remind the reader that according to the formulation of constrained variational problems (see § 1.8) I2 is constant.
[129] The reader should be careful in reading and interpreting this equation and its alike. This is due mainly to the (rather
loose) use of I1 where sometimes it stands for the value and sometimes for the stationary value.
[130] Although this seems trivial (considering that $I_2[y]$ is constant), the purpose is to show this technically (with the assumption $I_2[y]$ = constant being used only in the final stage).
Sturm-Liouville equation. In fact, this should facilitate the solution of (certain types of) differential
equations[131] by the methods of variational calculus (as well as benefiting from the techniques of solving
differential equations in variational calculus). This will also have implications and consequences on the
estimation and evaluation of the eigenvalues and eigenfunctions of the Sturm-Liouville equations (as
will be demonstrated in the upcoming Problems).
10. Using the results of the previous Problems, try to propose a method for estimating the eigenvalues
and eigenfunctions (and obtaining bounds and approximations) of the Sturm-Liouville problems.
Answer:[132] Based on the results of the previous Problems, the stationary values of I1 /I2 of a
(restricted) variational problem (that complies with the above formulations and conditions) are the
eigenvalues of the corresponding Sturm-Liouville problem. This means that the values of I1 /I2 for a
given problem should lie between the minimum eigenvalue λm and the maximum eigenvalue λM of the
Sturm-Liouville equation, that is:[133]
\[
\lambda_m \le \frac{I_1}{I_2} \le \lambda_M
\]
Accordingly, an estimation of I1 /I2 for a given problem will provide an upper bound on λm and a
lower bound on λM without going through any variational process or optimization procedure.[134]
Moreover, a (guessed) trial function yt that is used in this estimation could provide an approximation
to the stationarizing (i.e. true) eigenfunction y that corresponds to these eigenvalues (i.e. λm and
λM ) where the best of the trial functions usually corresponds to the best of the estimated eigenvalues.
These issues will be clarified in Problems 11 and 12.
11. Find an upper bound on the lowest eigenvalue of the following Sturm-Liouville problem:
\[
y'' + \lambda y = 0 \qquad\text{with boundary conditions } y(0) = 0,\ y(1) = 1 \text{ and } y'(1) = 0
\]
using the following trial functions (which all satisfy the given boundary conditions as they should be):[135]
(a) $y_t = x^4 + x^3 - 6x^2 + 5x$
(b) $y_t = x^3 - 3x^2 + 3x$
(c) $y_t = -x^3 + x^2 + x$
(d) $y_t = 2x - x^2$
(e) $y_t = x^4 - \frac{5}{2}x^3 + x^2 + \frac{3}{2}x$.
Answer: The equation $y'' + \lambda y = 0$ can be put in the following form: $-\frac{d}{dx}\left(y'\right) = \lambda y$. Comparing this form to the standard form of Sturm-Liouville equation (as given by Eq. 128) we have:
\[
p = 1 \qquad q = 0 \qquad w = 1
\]
Accordingly (see Eq. 129):
\[
I_1 = \int_0^1\left(py'^2 + qy^2\right)dx = \int_0^1 y'^2\,dx \tag{131}
\]
\[
I_2 = \int_0^1 wy^2\,dx = \int_0^1 y^2\,dx \tag{132}
\]
answer more rigorous. The details should be sought in more specialized texts on Sturm-Liouville problems and their
relation to variational problems and variational calculus.
[133] Again, the reader should be careful in reading and interpreting this equation and its alike. This is due mainly to
the (rather loose) use of I1 /I2 where sometimes it stands for the stationary value (corresponding to the stationar-
izing/extremizing function) and sometimes for the value (corresponding to an arbitrary function which is usually an
approximation to the stationarizing/extremizing function). We should also note that there are certain restrictions on
λm and λM for the validity of this equation (the details should be sought in more specialized texts).
[134] In fact, with certain conditions and insights the values of $I_1/I_2$ may also provide approximations to the corresponding eigenvalues (as will be clarified later).
[135] The subscript $t$ in $y_t$ in this Problem stands for “trial” and does not symbolize partial derivative with respect to $t$.
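(a) For the trial function $y_t = x^4 + x^3 - 6x^2 + 5x$ we have:
\[
\begin{aligned}
I_1 &= \int_0^1\left(4x^3 + 3x^2 - 12x + 5\right)^2 dx = \int_0^1\left(16x^6 + 24x^5 - 87x^4 - 32x^3 + 174x^2 - 120x + 25\right)dx \\
&= \left[\frac{16}{7}x^7 + 4x^6 - \frac{87}{5}x^5 - 8x^4 + 58x^3 - 60x^2 + 25x\right]_0^1 = \frac{136}{35} \\
I_2 &= \int_0^1\left(x^4 + x^3 - 6x^2 + 5x\right)^2 dx = \int_0^1\left(x^8 + 2x^7 - 11x^6 - 2x^5 + 46x^4 - 60x^3 + 25x^2\right)dx \\
&= \left[\frac{1}{9}x^9 + \frac{1}{4}x^8 - \frac{11}{7}x^7 - \frac{1}{3}x^6 + \frac{46}{5}x^5 - 15x^4 + \frac{25}{3}x^3\right]_0^1 = \frac{1247}{1260}
\end{aligned}
\]
Hence, an upper bound on the lowest eigenvalue is $I_1/I_2 = 4896/1247 \simeq 3.9262$.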
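(b) For the trial function $y_t = x^3 - 3x^2 + 3x$ we have:
\[
\begin{aligned}
I_1 &= \int_0^1\left(3x^2 - 6x + 3\right)^2 dx = \int_0^1\left(9x^4 - 36x^3 + 54x^2 - 36x + 9\right)dx \\
&= \left[\frac{9}{5}x^5 - 9x^4 + 18x^3 - 18x^2 + 9x\right]_0^1 = \frac{9}{5} \\
I_2 &= \int_0^1\left(x^3 - 3x^2 + 3x\right)^2 dx = \int_0^1\left(x^6 - 6x^5 + 15x^4 - 18x^3 + 9x^2\right)dx \\
&= \left[\frac{1}{7}x^7 - x^6 + 3x^5 - \frac{9}{2}x^4 + 3x^3\right]_0^1 = \frac{9}{14}
\end{aligned}
\]
Hence, an upper bound on the lowest eigenvalue is $I_1/I_2 = 14/5 = 2.8$.
(c) For the trial function $y_t = -x^3 + x^2 + x$ we have:
\[
I_1 = \int_0^1\left(-3x^2 + 2x + 1\right)^2 dx = \frac{17}{15} \qquad\qquad I_2 = \int_0^1\left(-x^3 + x^2 + x\right)^2 dx = \frac{31}{70}
\]
Hence, an upper bound on the lowest eigenvalue is $I_1/I_2 = 238/93 \simeq 2.5591$.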
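(d) For the trial function $y_t = 2x - x^2$ we have:
\[
\begin{aligned}
I_1 &= \int_0^1\left(2 - 2x\right)^2 dx = \int_0^1\left(4 - 8x + 4x^2\right)dx = \left[4x - 4x^2 + \frac{4}{3}x^3\right]_0^1 = \frac{4}{3} \\
I_2 &= \int_0^1\left(2x - x^2\right)^2 dx = \int_0^1\left(4x^2 - 4x^3 + x^4\right)dx = \left[\frac{4}{3}x^3 - x^4 + \frac{x^5}{5}\right]_0^1 = \frac{8}{15}
\end{aligned}
\]
Hence, an upper bound on the lowest eigenvalue is $I_1/I_2 = 5/2 = 2.5$.
(e) For the trial function $y_t = x^4 - \frac{5}{2}x^3 + x^2 + \frac{3}{2}x$ we have:
\[
\begin{aligned}
I_1 &= \int_0^1\left(4x^3 - \frac{15}{2}x^2 + 2x + \frac{3}{2}\right)^2 dx = \frac{277}{210} \\
I_2 &= \int_0^1\left(x^4 - \frac{5}{2}x^3 + x^2 + \frac{3}{2}x\right)^2 dx \\
&= \left[\frac{1}{9}x^9 - \frac{5}{8}x^8 + \frac{33}{28}x^7 - \frac{1}{3}x^6 - \frac{13}{10}x^5 + \frac{3}{4}x^4 + \frac{3}{4}x^3\right]_0^1 = \frac{1339}{2520}
\end{aligned}
\]
Hence, an upper bound on the lowest eigenvalue is $I_1/I_2 = 3324/1339 \simeq 2.4825$.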
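The ratios $I_1/I_2$ of this Problem can be cross-checked with exact rational arithmetic. The following is a minimal sketch (using only the Python standard library) that represents each trial polynomial by its coefficient list in ascending powers of $x$ and evaluates Eqs. 131 and 132 on $[0, 1]$. The helper names (`poly_mul`, `poly_diff`, `integrate01`, `rayleigh_quotient`) are illustrative choices and do not come from the text.

```python
from fractions import Fraction as Fr

def poly_mul(a, b):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """Derivative of a polynomial (ascending coefficients)."""
    return [i * a[i] for i in range(1, len(a))]

def integrate01(a):
    """Definite integral of a polynomial over [0, 1]."""
    return sum(c / (i + 1) for i, c in enumerate(a))

def rayleigh_quotient(y):
    """I1/I2 with I1 = int y'^2 dx and I2 = int y^2 dx (p = 1, q = 0, w = 1)."""
    y = [Fr(c) for c in y]
    dy = poly_diff(y)
    return integrate01(poly_mul(dy, dy)) / integrate01(poly_mul(y, y))

# Trial functions (a)-(e) of Problem 11, coefficients of x^0, x^1, x^2, ...
trials = {
    "a": [0, 5, -6, 1, 1],
    "b": [0, 3, -3, 1],
    "c": [0, 1, 1, -1],
    "d": [0, 2, -1],
    "e": [0, Fr(3, 2), 1, Fr(-5, 2), 1],
}
bounds = {label: rayleigh_quotient(c) for label, c in trials.items()}
for label, value in bounds.items():
    print(label, value, float(value))
```

Each printed ratio is an upper bound on the lowest eigenvalue, so all of them should exceed $\pi^2/4 \simeq 2.4674$ (the exact value quoted in Problem 12).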
12. Discuss the results of Problem 11.
Answer: Let us first obtain the exact solution of the given Sturm-Liouville problem. In fact, this problem was solved in part (b) of Problem 7 of § 1.8 as a variational problem with constraint but without the boundary condition $y(1) = 1$. Now, if $\lambda \le 0$ then (according to Problem 7 of § 1.8) the solution is $y = 0$ which does not satisfy the boundary condition $y(1) = 1$ (which we imposed in Problem 11). Therefore, we should have $\lambda > 0$ and hence the solution is $y = b\sin\frac{\pi x}{2}$ which when combined with the boundary condition $y(1) = 1$ yields $b = 1$. So, the exact solution (eigenfunction) of the given Sturm-Liouville problem is $y = \sin\frac{\pi x}{2}$ with an eigenvalue $\lambda = \pi^2/4 \simeq 2.4674$ (as can be checked by substitution in the equation $y'' + \lambda y = 0$ and verifying the given boundary conditions).
Now, if we plot this solution (see Figure 68) beside the trial functions (which can be seen as approximate solutions) of Problem 11 we can see that as the trial functions become closer and closer to the exact solution (as we move from a to e) the approximations to the eigenvalue become closer and closer to the exact eigenvalue and hence the best estimation of the eigenvalue is obtained from the best approximation of the eigenfunction.
Figure 68: Plot (of $y$ against $x$ for $0 \le x \le 1$) of the exact solution $y = \sin\frac{\pi x}{2}$ of Problem 12 of § 7 alongside the trial functions of Problem 11 of § 7. The curve of the exact solution is solid thick while the curves of the trial functions are labeled with a, b, c, d, e (according to their labels in Problem 11 of § 7) with the curve e being dashed for clear distinction.
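13. Verify the result of Problem 6 by showing that using the exact solution (i.e. eigenfunction) of Problem 12 as a trial function will produce the (exact) eigenvalue.
Answer: The exact solution is $y = \sin\frac{\pi x}{2}$ and its derivative is $y' = \frac{\pi}{2}\cos\frac{\pi x}{2}$. Hence (see Eqs. 131 and 132):
\[
\begin{aligned}
I_1 &= \int_0^1\left[\frac{\pi}{2}\cos\frac{\pi x}{2}\right]^2 dx = \frac{\pi^2}{4}\int_0^1\cos^2\frac{\pi x}{2}\,dx = \frac{\pi^2}{4}\cdot\frac{1}{2} = \frac{\pi^2}{8} \\
I_2 &= \int_0^1\sin^2\frac{\pi x}{2}\,dx = \frac{1}{2}
\end{aligned}
\]
Hence, $I_1/I_2 = \left(\pi^2/8\right)/\left(1/2\right) = \pi^2/4$, which is the exact eigenvalue found in Problem 12.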
[136] In fact, this shows that the stationary values of $I_1/I_2$ of a (restricted) variational problem are the eigenvalues of the
corresponding Sturm-Liouville problem (see Problem 10). It should be obvious that the “stationary value” means the
value obtained from using the stationarizing (or extremizing) function.
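The integrals of Problem 13 can also be checked numerically. The sketch below is a minimal illustration in which the composite Simpson rule, the number of subintervals and the helper name `simpson` are choices made here rather than taken from the text. It evaluates Eqs. 131 and 132 for the exact eigenfunction $y = \sin\frac{\pi x}{2}$ and forms the ratio $I_1/I_2$.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def y(x):
    return math.sin(math.pi * x / 2)                     # exact eigenfunction

def dy(x):
    return (math.pi / 2) * math.cos(math.pi * x / 2)     # its derivative

I1 = simpson(lambda x: dy(x) ** 2, 0.0, 1.0)             # Eq. 131
I2 = simpson(lambda x: y(x) ** 2, 0.0, 1.0)              # Eq. 132
lam = I1 / I2
print(I1, I2, lam, math.pi ** 2 / 4)
```

The printed ratio should agree with $\pi^2/4$ to within the quadrature error.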
Chapter 8
Rayleigh-Ritz Method
This is an approximation technique for finding the extremizing function in variational problems. The
method can be instigated and applied numerically as well as analytically. Thanks to its flexibility, relative
simplicity and natural adaptability to numerical implementations it is widely used in science and engineer-
ing. The method is based on starting from a guess representing a generic form of the extremizing function
where this guess can be regarded as an approximation to the real (or exact) extremizing function which
is the sought solution to the variational problem. The generic form is then developed and determined by
a variational procedure (possibly in a gradual process as will be clarified later in the following remarks
and in some of the upcoming Problems).
To put it in more practical terms, suppose that we are looking for an extremizing function $y = y(x)$ of an integral $I[y]$ representing the functional of a variational problem. Also assume that the function $y$ can be approximated by a linear combination of linearly independent basis functions $\phi_i = \phi_i(x)$ $(i = 0, \cdots, n)$ of a certain type (e.g. polynomial or sinusoidal) and hence we can write:
\[
y \simeq \phi_0 + \sum_{i=1}^{n} c_i\phi_i \tag{133}
\]
where $c_i$ are constants to be determined during the variational procedure.[137] So, all we need to determine $y$ (or rather its approximation) is to determine the constants $c_i$ since the basis functions are known. This means that we simply converted (or rather reduced) our task from determining $y$ itself to determining a set of constants assuming that the general form of $y$ is known (as determined by the chosen type of the basis functions). In fact, our functional integral $I[y]$ can now be written (rather more appropriately although we will not do that) as $I[c_1, \cdots, c_n]$ because this functional is now dependent (in its variation and optimization) on the constants $c_i$. In other words, this functional is extremized (or stationarized) by the set of $c_i$ and hence we can tackle the variational problem by seeking the solution of the system of equations:
\[
\frac{\partial I}{\partial c_i} = 0 \qquad (i = 1, \cdots, n) \tag{134}
\]
In fact, the best way to understand the rationale and procedure of the Rayleigh-Ritz method is to use
it in some practical problems to understand and appreciate how and why it works. However, before that
it is important to note the following points about this method:
(a) Although the guessed form (as determined by the choice of the type of the basis functions) is rather
arbitrary, this form should be chosen to be as close as possible to the real form (assuming that we have
an idea about the real form) so that we can get the best approximation to the real solution. In fact, if the
form of the real solution is known then the guessed form should be chosen to match the real form so that
the obtained approximation will be very close (and potentially identical) to the real solution. For example,
if we know that y is a sinusoidal function then we should choose our basis functions to be sinusoidal so
that we obtain better results.
(b) The Rayleigh-Ritz method as described above can be generalized and extended to include other
variations and flavors such as having multiple variables. So, in this regard it is like the variational
treatment in its analytical form (as represented by the Euler-Lagrange equation in its various variations
and flavors which we investigated in the sections of chapter 1).
[137] The choice of $c_0$ to be 1 does not affect the generality since we can always divide by $c_0 \neq 0$ (or absorb a non-unity factor into $\phi_0$) to reduce the form of $y$ to the above form. In fact, this is related to the implementation of the boundary conditions as will be clarified later.
(c) The Rayleigh-Ritz method can be used for estimating the eigenvalues in Sturm-Liouville problems.
(d) Noting that the Rayleigh-Ritz method is used in boundary value problems, the zeroth basis function
φ0 is usually chosen to satisfy the given boundary conditions while all the other basis functions φi (i =
1, · · · , n) are chosen to vanish at the boundaries.[138]
(e) As indicated above, the determination of the form of the extremizing function may be done in a gradual
process by starting from a certain order of approximation and moving to higher orders of approximation (if
necessary) where this process stops when we get the required accuracy from the most recent approximation
(according to certain criteria). For example, we may start from the first order approximation $y \simeq y_1 = \phi_0 + c_1\phi_1$ where we need only to determine $c_1$. If we are happy with $y_1$ then we stop the process; otherwise we go to the second order approximation $y \simeq y_2 = \phi_0 + c_1\phi_1 + c_2\phi_2$ where $c_2$ is estimated with re-estimation of $c_1$. This may be followed by other approximations where in each approximation (say the $i$th approximation) the value of the constant $c_i$ is estimated while the values of $c_1, \cdots, c_{i-1}$
are re-estimated. The main issue (and the fundamental presumption) in this gradual process is that each
approximation is better than (or at least not inferior to) the previous approximation so that we are always
heading toward better approximations to the real solution hoping that in the end (i.e. if we continue with
this process) we get very close to the solution y (or we may even get to the solution itself if we are lucky
and made good choices). In fact, if this procedure develops as described above (i.e. the approximations
improve persistently) then we should expect that we can make our approximation as close as we wish
to the real solution by increasing the order of approximation (as represented by n) and hence we should
expect to converge to the real solution when $n \to \infty$, that is:[139]
\[
y = \phi_0 + \sum_{i=1}^{\infty} c_i\phi_i \tag{135}
\]
(f ) The above-described Rayleigh-Ritz method is one dimensional. The method can be easily generalized
to multi-dimensions (although the required algebra and mathematical manipulation become very lengthy
and messy). However, instead of going through the description of this simple generalization we will
demonstrate this generalization by some examples of the Rayleigh-Ritz method in 2D (see Problems 6-9).
Problems
1. Describe the Rayleigh-Ritz method in a few words.
Answer: It is a variational method that employs basis functions to find approximate solutions for
boundary-value variational problems.
2. Re-solve part (a) of Problem 12 of § 1.4 (with $k = 1$) using this time the Rayleigh-Ritz method in a gradual process. Plot the obtained approximation at the end of each stage of approximation (starting from $y_1$) and hence stop this gradual process when the obtained solution is sufficiently close (visually) to the analytical (exact) solution that you obtained in Problem 12 of § 1.4.
Answer: Noting that $k = 1$, we have $I[y] = \int_{x_1}^{x_2}\left(y'^2 + y^2\right)dx$ with $y(x_1 = 0) = 0$ and $y(x_2 = 1) = 1$.
We assume that we have no idea about the general form of the real solution and hence we use polynomial
basis functions. Referring to point (d) in the text, we select the zeroth basis function φ0 to satisfy
the given boundary conditions and select all the other basis functions φi (i = 1, · · · , n) to vanish at
the boundaries. So, if we choose φ0 = x (which is the straight line passing through the two boundary
points) then the boundary conditions are satisfied by this function because φ0 (0) = 0 and φ0 (1) = 1.
Also, if we choose the other basis functions so that they all contain the factor x(x − 1) then they will
all vanish at the boundaries because the x factor will ensure the vanishing at x = 0 while the (x − 1)
factor will ensure the vanishing at $x = 1$. Accordingly, we can write the $n$th approximation $y_n$ (i.e. the approximation obtained in the $n$th stage of the gradual process) as:
\[
y_n = x + x\left(x - 1\right)\left(c_1 + c_2 x + \cdots + c_n x^{n-1}\right)
\]
[138] It may be more appropriate to say: while the other basis functions are chosen so that all the terms involving these
functions vanish at the boundaries. This depends on the meaning of “basis functions”. Anyway, this is a trivial matter.
[139] In fact, there are other conditions and restrictions and hence the above description is not sufficiently rigorous.
As we see, this approximation obviously satisfies the two boundary conditions at any stage of this
process.
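Now, the first approximation is:
\[
y_1 = x + c_1 x\left(x - 1\right) = c_1 x^2 + \left(1 - c_1\right)x \tag{136}
\]
On substituting this into the integrand of the functional integral we get:
\[
\begin{aligned}
y'^2 + y^2 &\simeq \left[2c_1 x + \left(1 - c_1\right)\right]^2 + \left[c_1 x^2 + \left(1 - c_1\right)x\right]^2 \\
&= 4c_1^2 x^2 + 4c_1\left(1 - c_1\right)x + \left(1 - c_1\right)^2 + c_1^2 x^4 + 2c_1\left(1 - c_1\right)x^3 + \left(1 - c_1\right)^2 x^2 \\
&= 4c_1^2 x^2 + 4c_1 x - 4c_1^2 x + 1 - 2c_1 + c_1^2 + c_1^2 x^4 + 2c_1 x^3 - 2c_1^2 x^3 + x^2 - 2c_1 x^2 + c_1^2 x^2 \\
&= c_1^2 x^4 + \left(2c_1 - 2c_1^2\right)x^3 + \left(5c_1^2 - 2c_1 + 1\right)x^2 + \left(4c_1 - 4c_1^2\right)x + \left(c_1^2 - 2c_1 + 1\right)
\end{aligned}
\]
On substituting this into the functional integral and integrating we get:
\[
\begin{aligned}
I &\simeq \left[\frac{c_1^2}{5}x^5 + \frac{2c_1 - 2c_1^2}{4}x^4 + \frac{5c_1^2 - 2c_1 + 1}{3}x^3 + \frac{4c_1 - 4c_1^2}{2}x^2 + \left(c_1^2 - 2c_1 + 1\right)x\right]_0^1 \\
&= \left[\frac{c_1^2}{5} + \frac{2c_1 - 2c_1^2}{4} + \frac{5c_1^2 - 2c_1 + 1}{3} + \frac{4c_1 - 4c_1^2}{2} + c_1^2 - 2c_1 + 1\right] - 0 \\
&= \frac{11}{30}c_1^2 - \frac{1}{6}c_1 + \frac{4}{3}
\end{aligned}
\]
On differentiating $I$ with respect to $c_1$ and setting the result to zero we get:
\[
\frac{dI}{dc_1} = \frac{11}{15}c_1 - \frac{1}{6} = 0 \qquad\text{and hence}\qquad c_1 = \frac{5}{22}
\]
So, the first approximation is (see Eq. 136):
\[
y_1 = \frac{5}{22}x^2 + \left(1 - \frac{5}{22}\right)x = \frac{5}{22}x^2 + \frac{17}{22}x
\]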
On plotting this solution (plotted as circles in Figure 69) alongside the analytical solution (plotted as
solid curve in Figure 69) we see that it is sufficiently close. So, we stop this gradual process.
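The hand calculation of this Problem can be reproduced (and pushed to higher orders) mechanically. The following minimal sketch assumes the same setup as above, i.e. $I[y] = \int_0^1\left(y'^2 + y^2\right)dx$, $\phi_0 = x$ and $\phi_i = x\left(x - 1\right)x^{i-1}$. Since $I$ is quadratic in the constants $c_i$, the conditions $\partial I/\partial c_i = 0$ (Eq. 134) reduce to a linear system, which is solved here exactly with rational arithmetic. All function names are illustrative choices and do not come from the text.

```python
from fractions import Fraction as Fr

def poly_mul(a, b):
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    return [i * a[i] for i in range(1, len(a))]

def integrate01(a):
    return sum(c / (i + 1) for i, c in enumerate(a))

def energy_inner(u, v):
    """Bilinear form int_0^1 (u'v' + uv) dx associated with I[y]."""
    return (integrate01(poly_mul(poly_diff(u), poly_diff(v)))
            + integrate01(poly_mul(u, v)))

def solve(A, b):
    """Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def rayleigh_ritz(n):
    """Constants c_1..c_n of y_n = x + x(x-1)(c_1 + ... + c_n x^(n-1))."""
    phi0 = [Fr(0), Fr(1)]                        # phi_0 = x
    basis = []
    for i in range(1, n + 1):
        p = [Fr(0)] * (i + 2)
        p[i], p[i + 1] = Fr(-1), Fr(1)           # phi_i = x^(i+1) - x^i
        basis.append(p)
    # dI/dc_i = 0  <=>  sum_j <phi_i, phi_j> c_j = -<phi_0, phi_i>
    A = [[energy_inner(u, v) for v in basis] for u in basis]
    b = [-energy_inner(phi0, u) for u in basis]
    return solve(A, b)

print(rayleigh_ritz(1))
print(rayleigh_ritz(2))
```

For $n = 1$ this reproduces the value of $c_1$ obtained above, while larger $n$ gives the constants of the higher order approximations in case the first approximation is judged insufficient.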
3. Re-solve part (g) of Problem 12 of § 1.4 using this time the Rayleigh-Ritz method in a gradual process. Plot the obtained approximation at the end of each stage of approximation (starting from $y_1$) and hence stop this gradual process when the obtained solution is practically indistinguishable (by vision) from the analytical solution that you obtained in Problem 12 of § 1.4.
Answer: We have $I[y] = \int_{x_1}^{x_2}\left(y'^2 - y^2 - 2xy\right)dx$ with $y(x_1 = 0) = 1$ and $y(x_2 = 1) = 2$. We assume that we have no idea about the general form of the real solution and hence we use polynomial basis functions. Referring to point (d) in the text, we select the zeroth basis function $\phi_0$ to satisfy the given boundary conditions and select all the other basis functions $\phi_i$ $(i = 1, \cdots, n)$ to vanish at the boundaries. So, if we choose $\phi_0 = 1 + x$ (which is the straight line passing through the two boundary points) then the boundary conditions are satisfied by this function because $\phi_0(0) = 1 + 0 = 1$ and $\phi_0(1) = 1 + 1 = 2$. Also, if we choose the other basis functions so that they all contain the factor $x(x - 1)$ then they will all vanish at the boundaries because the $x$ factor will ensure the vanishing at $x = 0$ while the $(x - 1)$ factor will ensure the vanishing at $x = 1$. Accordingly, we can write the $n$th approximation $y_n$ (i.e. the approximation obtained in the $n$th stage of the gradual process) as:
\[
y_n = 1 + x + x\left(x - 1\right)\left(c_1 + c_2 x + \cdots + c_n x^{n-1}\right)
\]
As we see, this approximation obviously satisfies the two boundary conditions at any stage of this process.
Now, the first approximation is:
\[
y_1 = 1 + x + c_1 x\left(x - 1\right) = c_1 x^2 + \left(1 - c_1\right)x + 1 \tag{137}
\]
[Figure 69: plot (of $y$ against $x$ for $0 \le x \le 1$) of the analytical solution and the first Rayleigh-Ritz approximation $y_1$.]
On plotting this solution (plotted as dashed curve in Figure 70) alongside the analytical solution
(plotted as solid curve in Figure 70) we see that it is close but distinguishable. So, we go to the next
approximation.
Figure 70: Plot (of $y$ against $x$ for $0 \le x \le 1$) of the analytical solution $y = \cos x + \frac{3 - \cos 1}{\sin 1}\sin x - x$ of Problem 3 of § 8 (as obtained in part g of Problem 12 of § 1.4) alongside the first Rayleigh-Ritz approximation $y_1 = -\frac{10}{9}x^2 + \frac{19}{9}x + 1$ and the second Rayleigh-Ritz approximation $y_2 = -\frac{14}{41}x^3 - \frac{221}{369}x^2 + \frac{716}{369}x + 1$.
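\[
y_2 = 1 + x + c_1 x\left(x - 1\right) + c_2 x^2\left(x - 1\right) = c_2 x^3 + \left(c_1 - c_2\right)x^2 + \left(1 - c_1\right)x + 1 \tag{138}
\]
\[
\begin{aligned}
&\left. + \frac{4c_1 c_2 - 4c_1^2 + 6c_1 - 4c_2 - 4}{2}x^2 + \left(c_1^2 - 2c_1\right)x\right]_0^1 \\
&= \left[-\frac{c_2^2}{7} + \frac{2c_2^2 - 2c_1 c_2}{6} + \frac{8c_2^2 - c_1^2 + 4c_1 c_2 - 4c_2}{5} + \frac{2c_1^2 + 10c_1 c_2 - 12c_2^2 - 4c_1 + 2c_2}{4}\right. \\
&\qquad \left. + \frac{4c_2^2 - 14c_1 c_2 + 2c_1 + 8c_2 - 3 + 3c_1^2}{3} + \frac{4c_1 c_2 - 4c_1^2 + 6c_1 - 4c_2 - 4}{2} + c_1^2 - 2c_1\right] - 0 \\
&= \frac{3}{10}c_1^2 + \frac{3}{10}c_1 c_2 + \frac{13}{105}c_2^2 + \frac{2}{3}c_1 + \frac{11}{30}c_2 - 3
\end{aligned}
\]
On differentiating $I$ with respect to $c_1$ and $c_2$ and setting the results to zero we get:
\[
\begin{aligned}
\frac{\partial I}{\partial c_1} &= \frac{3}{5}c_1 + \frac{3}{10}c_2 + \frac{2}{3} = 0 \\
\frac{\partial I}{\partial c_2} &= \frac{3}{10}c_1 + \frac{26}{105}c_2 + \frac{11}{30} = 0
\end{aligned}
\]
On solving this system of simultaneous equations we get $c_1 = -\frac{347}{369}$ and $c_2 = -\frac{14}{41}$. So, the second approximation is (see Eq. 138):
\[
y_2 = -\frac{14}{41}x^3 + \left(-\frac{347}{369} + \frac{14}{41}\right)x^2 + \left(1 + \frac{347}{369}\right)x + 1 = -\frac{14}{41}x^3 - \frac{221}{369}x^2 + \frac{716}{369}x + 1
\]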
On plotting this solution (plotted as circles in Figure 70) alongside the analytical solution (plotted as
solid curve in Figure 70) we see that it is indistinguishable from the analytical solution. So, we stop
this gradual process.
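The two simultaneous equations of this Problem are easy to check with exact fractions. The following minimal sketch solves the $2 \times 2$ system by Cramer's rule and then compares the resulting $y_2$ with the analytical solution quoted in the caption of Figure 70 on a grid of 101 points. The grid and the variable names are illustrative choices and do not come from the text.

```python
from fractions import Fraction as Fr
import math

# dI/dc1 = 0:  (3/5)  c1 + (3/10)   c2 = -2/3
# dI/dc2 = 0:  (3/10) c1 + (26/105) c2 = -11/30
a11, a12, r1 = Fr(3, 5), Fr(3, 10), Fr(-2, 3)
a21, a22, r2 = Fr(3, 10), Fr(26, 105), Fr(-11, 30)

det = a11 * a22 - a12 * a21
c1 = (r1 * a22 - a12 * r2) / det      # Cramer's rule
c2 = (a11 * r2 - a21 * r1) / det
print(c1, c2)

def y2(x):
    """Second approximation, Eq. 138, with the constants found above."""
    return float(c2) * x**3 + float(c1 - c2) * x**2 + float(1 - c1) * x + 1.0

def y_exact(x):
    """Analytical solution as quoted in the caption of Figure 70."""
    return math.cos(x) + (3.0 - math.cos(1.0)) / math.sin(1.0) * math.sin(x) - x

max_gap = max(abs(y2(i / 100) - y_exact(i / 100)) for i in range(101))
print(max_gap)
```

A small value of `max_gap` relative to the size of $y$ (which lies between 1 and 2 on this interval) is consistent with the two curves being visually indistinguishable.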
4. Re-solve part (c) of Problem 12 of § 1.4 using this time the Rayleigh-Ritz method in a gradual process. Plot the obtained approximation at the end of each stage of approximation (starting from $y_1$) and hence stop this gradual process when the obtained solution is practically indistinguishable (by vision) from the analytical solution that you obtained in Problem 12 of § 1.4.
Answer: We have $I[y] = \int_{x_1}^{x_2}\frac{y'^2}{x^3}\,dx$ with $y(x_1 = 2) = 1$ and $y(x_2 = 4) = 31$. We again use polynomial basis functions. Referring to point (d) in the text, we select the zeroth basis function $\phi_0$ to satisfy the given boundary conditions and select all the other basis functions $\phi_i$ $(i = 1, \cdots, n)$ to vanish at the boundaries. So, if we choose $\phi_0 = 15x - 29$ (which is the straight line passing through the two boundary points) then the boundary conditions are satisfied by this function because $\phi_0(2) = 30 - 29 = 1$ and $\phi_0(4) = 60 - 29 = 31$. Also, if we choose the other basis functions so that they all contain the factor $(x - 2)(x - 4)$ then they will all vanish at the boundaries because the $(x - 2)$ factor will ensure the vanishing at $x = 2$ while the $(x - 4)$ factor will ensure the vanishing at $x = 4$. Accordingly, we can write the $n$th approximation $y_n$ as:
\[
\begin{aligned}
y_n &= 15x - 29 + \left(x - 2\right)\left(x - 4\right)\left(c_1 + c_2 x + \cdots + c_n x^{n-1}\right) \\
&= 15x - 29 + \left(x^2 - 6x + 8\right)\left(c_1 + c_2 x + \cdots + c_n x^{n-1}\right)
\end{aligned}
\]
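For the first approximation $y_1 = 15x - 29 + c_1\left(x^2 - 6x + 8\right)$ we have $y_1' = 2c_1 x - 6c_1 + 15$, so the integrand is:
$$\begin{aligned}
\frac{(y_1')^2}{x^3} &= \frac{\left[2c_1 x - 6c_1 + 15\right]^2}{x^3} \\
&= \frac{4c_1^2 x^2 + 60c_1 x - 24c_1^2 x + 36c_1^2 - 180c_1 + 225}{x^3} \\
&= 4c_1^2 x^{-1} + \left(60c_1 - 24c_1^2\right)x^{-2} + \left(36c_1^2 - 180c_1 + 225\right)x^{-3}
\end{aligned}$$
Integrating from $x = 2$ to $x = 4$ gives $I = \left(4\ln 2 - \frac{21}{8}\right)c_1^2 - \frac{15}{8}c_1 + \frac{675}{32}$, and setting $dI/dc_1 = 0$ gives $c_1 = 15/(64\ln 2 - 42) \simeq 6.352111366$. So, the first approximation is $y_1 = 15x - 29 + 6.352111366\left(x^2 - 6x + 8\right)$.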
On plotting this solution (plotted as dashed curve in Figure 71) alongside the analytical solution
(plotted as solid curve in Figure 71) we see that it is close but distinguishable. So, we go to the next
approximation.
Figure 71: Plot of the analytical solution $y = \frac{x^4}{8} - 1$ of Problem 4 of § 8 (as obtained in part c of Problem
12 of § 1.4) alongside the first Rayleigh-Ritz approximation $y_1 = 15x - 29 + 6.352111366\left(x^2 - 6x + 8\right)$ and
the second Rayleigh-Ritz approximation $y_2 = 15x - 29 + \left(x^2 - 6x + 8\right)(2.526375488 + 1.455927429x)$.
Now, the second approximation is:
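$$y_2 = 15x - 29 + \left(x^2 - 6x + 8\right)\left(c_1 + c_2 x\right)$$
On substituting this into the functional integral and integrating we get:
$$I = \frac{1}{32}\Big[(128\ln 2 - 84)c_1^2 + (1888 - 2688\ln 2)c_1c_2 + (6144\ln 2 - 4224)c_2^2 - 60c_1 + (2880\ln 2 - 2160)c_2 + 675\Big]$$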
On differentiating I with respect to c1 and c2 and setting the results to zero we get:
$$\frac{\partial I}{\partial c_1} = 2(128\ln 2 - 84)c_1 + (1888 - 2688\ln 2)c_2 - 60 = 0$$
$$\frac{\partial I}{\partial c_2} = (1888 - 2688\ln 2)c_1 + 2(6144\ln 2 - 4224)c_2 + (2880\ln 2 - 2160) = 0$$
On solving this system of simultaneous equations we get $c_1 \simeq 2.526375488$ and $c_2 \simeq 1.455927429$. So,
the second approximation is (see Eq. 140):
$$y_2 = 15x - 29 + \left(x^2 - 6x + 8\right)(2.526375488 + 1.455927429x)$$
On plotting this solution (plotted as circles in Figure 71) alongside the analytical solution (plotted as
solid curve in Figure 71) we see that it is indistinguishable from the analytical solution. So, we stop
this gradual process.
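The two stages of this Problem can be reproduced numerically. The following is a minimal sketch, assuming the trial form $y_n = 15x - 29 + (x^2 - 6x + 8)(c_1 + \cdots + c_n x^{n-1})$ and the functional $\int_2^4 y'^2/x^3\,dx$ used above. The helper names (`moment`, `inner`, `solve`, `ritz`) are hypothetical, and the integrals are evaluated in closed form from the moments of $x^p$ rather than by any method stated in the text.

```python
import math

LO, HI = 2.0, 4.0   # integration limits x1 and x2

def moment(p):
    """Integral of x**p over [LO, HI] for integer p."""
    if p == -1:
        return math.log(HI / LO)
    return (HI ** (p + 1) - LO ** (p + 1)) / (p + 1)

def inner(p, q):
    """Integral over [LO, HI] of p(x) q(x) / x**3, with the polynomials
    given as ascending coefficient lists."""
    return sum(a * b * moment(i + j - 3)
               for i, a in enumerate(p) for j, b in enumerate(q))

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense system)."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                fac = M[r][col] / M[col][col]
                M[r] = [u - fac * v for u, v in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def ritz(g0, gs):
    """Stationarity of I = <g0 + sum c_j g_j, g0 + sum c_j g_j> gives
    sum_j <g_i, g_j> c_j = -<g_i, g0>."""
    A = [[inner(gi, gj) for gj in gs] for gi in gs]
    b = [-inner(gi, g0) for gi in gs]
    return solve(A, b)

g0 = [15.0]                # derivative of 15x - 29
g1 = [-6.0, 2.0]           # derivative of x^2 - 6x + 8
g2 = [8.0, -12.0, 3.0]     # derivative of x^3 - 6x^2 + 8x

first = ritz(g0, [g1])
second = ritz(g0, [g1, g2])
print(first, second)
```

Adding a further derivative list (for example that of $x^2(x^2 - 6x + 8)$) to the second argument of `ritz` would give the next stage of the gradual process without redoing the algebra by hand.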
5. Re-solve part (e) of Problem 12 of § 1.4 using the y5 Rayleigh-Ritz approximation. Plot the obtained
y5 approximation alongside the analytical solution that you obtained in Problem 12 of § 1.4.
Answer: Following a similar method to that used in the previous Problems, we have:
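$$\begin{aligned}
y_5 &= \frac{2}{\pi}x + x\left(x - \frac{\pi}{2}\right)\left(c_1 + c_2x + c_3x^2 + c_4x^3 + c_5x^4\right) \\
&= c_5x^6 + \left(c_4 - \frac{\pi}{2}c_5\right)x^5 + \left(c_3 - \frac{\pi}{2}c_4\right)x^4 + \left(c_2 - \frac{\pi}{2}c_3\right)x^3 + \left(c_1 - \frac{\pi}{2}c_2\right)x^2 + \left(\frac{2}{\pi} - \frac{\pi}{2}c_1\right)x
\end{aligned} \tag{141}$$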
On plotting this solution (plotted as circles in Figure 72) alongside the analytical solution (plotted as
solid curve in Figure 72) we see that the y5 Rayleigh-Ritz approximation is virtually identical to the
analytical solution.
Figure 72: Plot of the analytical solution $y = -\frac{1}{4}\cos x + \left(1 - \frac{1}{4}\cosh\frac{\pi}{2}\right)\sin x + \frac{1}{4}\cosh x$ of Problem 5 of §
8 (as obtained in part e of Problem 12 of § 1.4) alongside the y5 Rayleigh-Ritz approximation.
6. Re-solve Problem 6 of § 1.6 using this time the Rayleigh-Ritz method. Plot the obtained approximation
of the Rayleigh-Ritz method and compare it to the analytical solution that you obtained in Problem
6 of § 1.6.
Answer: This is a 2D problem and hence in the following we will extend (rather briefly) the
above-described 1D Rayleigh-Ritz method to 2D. We have $I[z] = \int_0^1\int_0^1 \left(z_x^2 - z_y^2\right) dx\,dy$ with $z(0, y) = z(x, 0) = z(1, y) = z(x, 1) = 0$
and $z(0.5, 0.5) = 1$. We again use polynomial basis functions.
Referring to point (d) in the text (and noting the extension to 2D), we select the zeroth basis function
φ0 to satisfy the given boundary conditions and select all the other basis functions φij to vanish at
the boundaries. So, if we choose φ0 = 0 (which is the plane z = 0 that passes through the four
boundary lines) then the above boundary conditions are obviously satisfied by this function. Also,
if we choose the other basis functions so that they all contain the factor xy(x − 1)(y − 1) then they
will all vanish at the boundaries because the factors x, y, (x − 1), (y − 1) will ensure the vanishing at
(0, y) , (x, 0) , (1, y) , (x, 1) respectively. Accordingly, we can write the mnth approximation zmn as:
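$$\begin{aligned}
z_{mn} &= 0 + xy(x-1)(y-1)\left(c_{11} + c_{21}x + c_{12}y + c_{22}xy + c_{31}x^2 + c_{13}y^2 + \cdots + c_{mn}x^{m-1}y^{n-1}\right) \\
&= \left(x^2y^2 - x^2y - xy^2 + xy\right)\left(c_{11} + c_{21}x + c_{12}y + c_{22}xy + c_{31}x^2 + c_{13}y^2 + \cdots + c_{mn}x^{m-1}y^{n-1}\right)
\end{aligned}$$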
So, let us try the z22 approximation, that is:
$$\begin{aligned}
z_{22} &= \left(x^2y^2 - x^2y - xy^2 + xy\right)\left(c_{11} + c_{21}x + c_{12}y + c_{22}xy\right) \\
&= fx^3y^3 + (b-f)x^3y^2 + (c-f)x^2y^3 - bx^3y - cxy^3 + (a-b-c+f)x^2y^2 + (b-a)x^2y + (c-a)xy^2 + axy
\end{aligned} \tag{142}$$
where for the sake of simplicity and clarity we use in the second line (and subsequently) a, b, c, f to
represent c11 , c21 , c12 , c22 .[140] On substituting this into the functional integral and integrating twice
we get:
$$\begin{aligned}
I &= \int_0^1\int_0^1 \left(z_x^2 - z_y^2\right) dx\,dy \\
&\simeq \int_0^1 dy \int_0^1 dx \Big\{ \big[3fx^2y^3 + 3(b-f)x^2y^2 + 2(c-f)xy^3 - 3bx^2y - cy^3 + 2(a-b-c+f)xy^2 \\
&\qquad + 2(b-a)xy + (c-a)y^2 + ay\big]^2 - \big[3fx^3y^2 + 2(b-f)x^3y + 3(c-f)x^2y^2 \\
&\qquad - bx^3 - 3cxy^2 + 2(a-b-c+f)x^2y + (b-a)x^2 + 2(c-a)xy + ax\big]^2 \Big\} \\
&= \frac{1}{210}\int_0^1 dy \Big[ \left(28f^2 + 70cf + 70c^2\right)y^6 \\
&\qquad + \left(-56f^2 + (56b - 140c + 70a)f - 140c^2 + (70b + 140a)c\right)y^5 \\
&\qquad + \left(10f^2 + (7c - 112b - 140a)f + 7c^2 - (140b + 280a)c + 28b^2 + 70ab + 70a^2\right)y^4 \\
&\qquad + \left(24f^2 + (84c + 32b + 28a)f + 84c^2 + (28b + 56a)c - 56b^2 - 140ab - 140a^2\right)y^3 \\
&\qquad + \left(-8f^2 + (28b - 28c + 49a)f - 28c^2 + (49b + 98a)c + 20b^2 + 42ab + 42a^2\right)y^2 \\
&\qquad + \left(-(8b + 14a)f - (14b + 28a)c + 8b^2 + 28ab + 28a^2\right)y - \left(2b^2 + 7ab + 7a^2\right) \Big] \\
&= \frac{2\left(b^2 - c^2 + bf - cf\right)}{1575}
\end{aligned}$$
As we see, I is independent of a and hence a (which stands for c11 ) is an arbitrary constant that can
be set to zero, i.e. a = 0.[141] On differentiating I with respect to b, c, f and setting the results to zero
we get:
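$$\frac{\partial I}{\partial b} = 2b + f = 0$$
$$\frac{\partial I}{\partial c} = -2c - f = 0$$
$$\frac{\partial I}{\partial f} = b - c = 0$$
On solving this system of simultaneous equations we get $b = c = -\frac{f}{2}$. So, the z22 approximation is
(see Eq. 142 noting that a = 0):
$$z_{22} = f\left(x^3y^3 - \frac{3}{2}x^3y^2 - \frac{3}{2}x^2y^3 + \frac{1}{2}x^3y + \frac{1}{2}xy^3 + 2x^2y^2 - \frac{1}{2}x^2y - \frac{1}{2}xy^2\right)$$
The bracketed polynomial equals $-\frac{1}{64}$ at $x = y = 0.5$, so the condition $z(0.5, 0.5) = 1$ gives $f = -64$. Hence:
$$z_{22} = -64\left(x^3y^3 - \frac{3}{2}x^3y^2 - \frac{3}{2}x^2y^3 + \frac{1}{2}x^3y + \frac{1}{2}xy^3 + 2x^2y^2 - \frac{1}{2}x^2y - \frac{1}{2}xy^2\right)$$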
[140] We use f instead of d or e to avoid potential confusion with dx and the number e ≃ 2.71828.
[141] In fact, this should have been anticipated earlier from the fact that φ0 = 0 and all the boundary conditions are zero
and hence c11 (which represents the constant term) should be zero. However, we preferred to do it the long way to
demonstrate this fact.
On plotting this solution (see the upper frame of Figure 73) alongside the analytical solution (see
the lower frame of Figure 73) which we obtained in Problem 6 of § 1.6 we see that although the z22
approximation is not sufficiently accurate, it is still useful. In fact, we may expect a better result from
higher order approximations although the algebra becomes increasingly messy and difficult. However,
the algebraic difficulties can be overcome with automation where computer codes can take care of the
required hard work.
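The scaling constant and the boundary behaviour of the z22 approximation can be checked with a few lines of exact arithmetic. This is a minimal sketch, assuming the polynomial quoted above for a = 0 and b = c = −f/2; the function name `z22` and the sample points are illustrative.

```python
from fractions import Fraction as Fr

def z22(x, y, f=Fr(1)):
    """z22 for a = 0 and b = c = -f/2, i.e. f times the bracketed polynomial."""
    return f * (x**3 * y**3 - Fr(3, 2) * x**3 * y**2 - Fr(3, 2) * x**2 * y**3
                + Fr(1, 2) * x**3 * y + Fr(1, 2) * x * y**3 + 2 * x**2 * y**2
                - Fr(1, 2) * x**2 * y - Fr(1, 2) * x * y**2)

half = Fr(1, 2)
f = 1 / z22(half, half)          # impose z22(1/2, 1/2) = 1

quarter = Fr(1, 4)
print(f, z22(quarter, quarter, f))
```

At $x = y = 1/4$ the approximation gives 27/64 ≈ 0.42 whereas the analytical solution $\sin(\pi x)\sin(\pi y)$ gives 0.5, which is consistent with the remark that z22 is useful but not very accurate.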
7. Re-solve Problem 7 of § 1.6 using this time the Rayleigh-Ritz method. Plot the obtained approximation
of the Rayleigh-Ritz method and compare it to the analytical solution that you obtained in Problem
7 of § 1.6.
Answer: We have $I[z] = \int_0^1\int_0^1 \left(z_x^2 + z_y^2\right) dx\,dy$ with $z(0, y) = z(x, 0) = z(1, y) = 0$ and
$z(x, 1) = \sin(\pi x)\sinh(\pi)$. We again use polynomial basis functions (apart from the zeroth basis function which
should satisfy the boundary conditions and hence it can take any necessary form, as will be clarified
next). Referring to point (d) in the text (and noting the extension to 2D), we select the zeroth basis
function φ0 to satisfy the given boundary conditions and select all the other basis functions φij to
vanish at the boundaries. So, if we choose φ0 = sin (πx) sinh (πy) then the above boundary conditions
are obviously satisfied by this function. Also, if we choose the other basis functions so that they all
contain the factor xy(x − 1)(y − 1) then they will all vanish at the boundaries because the factors
x, y, (x − 1), (y − 1) will ensure the vanishing at (0, y) , (x, 0) , (1, y) , (x, 1) respectively. Accordingly,
we can write the mnth approximation zmn as:
$$\begin{aligned}
z_{mn} &= \sin(\pi x)\sinh(\pi y) + xy(x-1)(y-1)\big[c_{11} + c_{21}x + c_{12}y + c_{22}xy + c_{31}x^2 + c_{13}y^2 + \cdots + c_{mn}x^{m-1}y^{n-1}\big] \\
&= \sin(\pi x)\sinh(\pi y) + \left(x^2y^2 - x^2y - xy^2 + xy\right)\big[c_{11} + c_{21}x + c_{12}y + c_{22}xy + c_{31}x^2 + c_{13}y^2 + \cdots + c_{mn}x^{m-1}y^{n-1}\big]
\end{aligned}$$
where for the sake of simplicity and clarity we use in the second line (and subsequently) a, b, c, f to
represent c11 , c21 , c12 , c22 . On substituting this into the functional integral and integrating twice we
get:
$$\begin{aligned}
I &= \int_0^1\int_0^1 \left(z_x^2 + z_y^2\right) dx\,dy \\
&\simeq \int_0^1 dy \int_0^1 dx \Big\{ \big[\pi\cos(\pi x)\sinh(\pi y) + 3fx^2y^3 + 3(b-f)x^2y^2 + 2(c-f)xy^3 \\
&\qquad - 3bx^2y - cy^3 + 2(a-b-c+f)xy^2 + 2(b-a)xy + (c-a)y^2 + ay\big]^2 \\
&\quad + \big[\pi\sin(\pi x)\cosh(\pi y) + 3fx^3y^2 + 2(b-f)x^3y + 3(c-f)x^2y^2 \\
&\qquad - bx^3 - 3cxy^2 + 2(a-b-c+f)x^2y + (b-a)x^2 + 2(c-a)xy + ax\big]^2 \Big\} \\
&= \frac{1}{12600}\Big[32f^2 + 4\left\{24(c+b) + 35a\right\}f + 96c^2 + 140(b+2a)c + 96b^2 \\
&\qquad + 35\left\{8ab + 8a^2 + 45\pi\left(e^{2\pi} - e^{-2\pi}\right)\right\}\Big]
\end{aligned}$$
Figure 73: Plot of the z22 approximation of the Rayleigh-Ritz method of Problem 6 of § 8 (upper frame)
alongside the analytical solution $z = \sin(\pi x)\sin(\pi y)$ of Problem 6 of § 1.6 (lower frame). For fair
comparison, the same xy mesh is used in both plots.
where for the sake of simplicity and clarity we use in the second line (and subsequently) a, b, c, f to
represent c11 , c21 , c12 , c22 . On substituting this into the functional integral and integrating twice we
get:
$$\begin{aligned}
I &= \int_0^1\int_0^1 \left(z_x^2 + z_y^2 - \pi^2z^2 + \frac{2z^2}{y^2}\right) dx\,dy \\
&\simeq \int_0^1 dy \int_0^1 dx \Big\{ \big[\pi y\cos(\pi x) + 3fx^2y^3 + 3(b-f)x^2y^2 + 2(c-f)xy^3 - 3bx^2y \\
&\qquad - cy^3 + 2(a-b-c+f)xy^2 + 2(b-a)xy + (c-a)y^2 + ay\big]^2 \\
&\quad + \big[\sin(\pi x) + 3fx^3y^2 + 2(b-f)x^3y + 3(c-f)x^2y^2 - bx^3 - 3cxy^2 \\
&\qquad + 2(a-b-c+f)x^2y + (b-a)x^2 + 2(c-a)xy + ax\big]^2 \\
&\quad + \left(\frac{2}{y^2} - \pi^2\right)\big[y\sin(\pi x) + fx^3y^3 + (b-f)x^3y^2 + (c-f)x^2y^3 - bx^3y \\
&\qquad - cxy^3 + (a-b-c+f)x^2y^2 + (b-a)x^2y + (c-a)xy^2 + axy\big]^2 \Big\} \\
&= -\frac{1}{88200\pi^3}\Big[\left(8\pi^5 - 280\pi^3\right)f^2 + \left(28\pi^5 - 868\pi^3\right)cf + \left(28\pi^5 - 952\pi^3\right)bf \\
&\qquad + \left(49\pi^5 - 1470\pi^3\right)af - 117600f + \left(28\pi^5 - 868\pi^3\right)c^2 + \left(49\pi^5 - 1470\pi^3\right)bc \\
&\qquad + \left(98\pi^5 - 2940\pi^3\right)ac - 235200c + \left(28\pi^5 - 1232\pi^3\right)b^2 + \left(98\pi^5 - 3920\pi^3\right)ab \\
&\qquad - 352800b + \left(98\pi^5 - 3920\pi^3\right)a^2 - 705600a - 132300\pi^3\Big]
\end{aligned}$$
On plotting this solution (see the upper frame of Figure 74) alongside the analytical solution (see the
lower frame of Figure 74) which we obtained in Problem 8 of § 1.6 we see that the z22 approximation is
almost identical to the analytical solution. In fact, we should expect a better result from higher order
approximations (e.g. z33 ) although the algebra becomes increasingly messy and difficult.
Figure 74: Plot of the z22 approximation of the Rayleigh-Ritz method of Problem 8 of § 8 (upper frame)
alongside the analytical solution $z = y^2\sin(\pi x)$ of Problem 8 of § 1.6 (lower frame). For fair comparison,
the same xy mesh is used in both plots.
9. Re-solve Problem 10 of § 1.6 using this time the Rayleigh-Ritz method. Plot the obtained approximation
of the Rayleigh-Ritz method and compare it to the analytical (series) solution that you obtained
in Problem 10 of § 1.6.
Answer: We have $I[z] = \int_0^1\int_0^1 \left(z_x^2 + z_y^2 - 4z\right) dx\,dy$ with $z(0, y) = z(x, 0) = z(1, y) = z(x, 1) = 0$.
In fact, the domain and boundary conditions for this Problem are the same as those of Problem 6 and
hence the basic formulation (as represented by zmn ) is the same. So, let us try the z33 approximation,
that is:
$$\begin{aligned}
z_{33} &= \left(x^2y^2 - x^2y - xy^2 + xy\right)\big(c_{11} + c_{12}y + c_{13}y^2 + c_{21}x + c_{22}xy + c_{23}xy^2 \\
&\qquad + c_{31}x^2 + c_{32}x^2y + c_{33}x^2y^2\big) \\
&= kx^4y^4 + (h-k)x^3y^4 + (j-k)x^4y^3 + (c-h)x^2y^4 + (i-j)x^4y^2 + (g-h-j+k)x^3y^3 \\
&\quad + (b-c-g+h)x^2y^3 - cxy^4 + (f-g-i+j)x^3y^2 - ix^4y + (a-b-f+g)x^2y^2 \\
&\quad + (c-b)xy^3 + (i-f)x^3y + (f-a)x^2y + (b-a)xy^2 + axy
\end{aligned} \tag{145}$$
where for the sake of simplicity we use in the second equality (and subsequently) a, b, c, f, g, h, i, j, k
to represent c11 , c12 , c13 , c21 , c22 , c23 , c31 , c32 , c33 . On substituting this into the functional integral and
integrating twice we get:
$$\begin{aligned}
I &= \int_0^1\int_0^1 \left(z_x^2 + z_y^2 - 4z\right) dx\,dy \\
&\simeq \int_0^1 dy \int_0^1 dx \Big\{ \big[4kx^3y^4 + 3(h-k)x^2y^4 + 4(j-k)x^3y^3 + 2(c-h)xy^4 + 4(i-j)x^3y^2 \\
&\qquad + 3(g-h-j+k)x^2y^3 + 2(b-c-g+h)xy^3 - cy^4 + 3(f-g-i+j)x^2y^2 - 4ix^3y \\
&\qquad + 2(a-b-f+g)xy^2 + (c-b)y^3 + 3(i-f)x^2y + 2(f-a)xy + (b-a)y^2 + ay\big]^2 \\
&\quad + \big[4kx^4y^3 + 4(h-k)x^3y^3 + 3(j-k)x^4y^2 + 4(c-h)x^2y^3 + 2(i-j)x^4y \\
&\qquad + 3(g-h-j+k)x^3y^2 + 3(b-c-g+h)x^2y^2 - 4cxy^3 + 2(f-g-i+j)x^3y - ix^4 \\
&\qquad + 2(a-b-f+g)x^2y + 3(c-b)xy^2 + (i-f)x^3 + (f-a)x^2 + 2(b-a)xy + ax\big]^2 \\
&\quad - 4\big[kx^4y^4 + (h-k)x^3y^4 + (j-k)x^4y^3 + (c-h)x^2y^4 + (i-j)x^4y^2 \\
&\qquad + (g-h-j+k)x^3y^3 + (b-c-g+h)x^2y^3 - cxy^4 + (f-g-i+j)x^3y^2 - ix^4y \\
&\qquad + (a-b-f+g)x^2y^2 + (c-b)xy^3 + (i-f)x^3y + (f-a)x^2y + (b-a)xy^2 + axy\big] \Big\} \\
&= \frac{1}{264600}\Big[5880a^2 + 5880ab + 3444ac + 5880af + 2940ag + 1722ah + 3444ai + 1722aj \\
&\qquad + 1008ak + 2016b^2 + 2814bc + 2940bf + 2016bg + 1407bh + 1722bi + 1176bj + 819bk \\
&\qquad + 1106c^2 + 1722cf + 1407cg + 1106ch + 1008ci + 819cj + 642ck + 2016f^2 + 2016fg \\
&\qquad + 1176fh + 2814fi + 1407fj + 819fk + 672g^2 + 924gh + 1407gi + 924gj + 630gk \\
&\qquad + 356h^2 + 819hi + 630hj + 480hk + 1106i^2 + 1106ij + 642ik + 356j^2 + 480jk \\
&\qquad + 180k^2 - 29400a - 14700b - 8820c - 14700f - 7350g - 4410h - 8820i - 4410j - 2646k\Big]
\end{aligned}$$
On differentiating I with respect to a, b, c, f, g, h, i, j, k and setting the results to zero we get:
$$\frac{\partial I}{\partial a} = 11760a + 5880b + 3444c + 5880f + 2940g + 1722h + 3444i + 1722j + 1008k - 29400 = 0$$
$$\frac{\partial I}{\partial b} = 5880a + 4032b + 2814c + 2940f + 2016g + 1407h + 1722i + 1176j + 819k - 14700 = 0$$
$$\frac{\partial I}{\partial c} = 3444a + 2814b + 2212c + 1722f + 1407g + 1106h + 1008i + 819j + 642k - 8820 = 0$$
$$\frac{\partial I}{\partial f} = 5880a + 2940b + 1722c + 4032f + 2016g + 1176h + 2814i + 1407j + 819k - 14700 = 0$$
$$\frac{\partial I}{\partial g} = 2940a + 2016b + 1407c + 2016f + 1344g + 924h + 1407i + 924j + 630k - 7350 = 0$$
$$\frac{\partial I}{\partial h} = 1722a + 1407b + 1106c + 1176f + 924g + 712h + 819i + 630j + 480k - 4410 = 0$$
$$\frac{\partial I}{\partial i} = 3444a + 1722b + 1008c + 2814f + 1407g + 819h + 2212i + 1106j + 642k - 8820 = 0$$
$$\frac{\partial I}{\partial j} = 1722a + 1176b + 819c + 1407f + 924g + 630h + 1106i + 712j + 480k - 4410 = 0$$
$$\frac{\partial I}{\partial k} = 1008a + 819b + 642c + 819f + 630g + 480h + 642i + 480j + 360k - 2646 = 0$$
On solving this system of simultaneous equations we get a = 4, b = −21/4, c = 21/4, f = −21/4,
g = 63/4, h = −63/4, i = 21/4, j = −63/4 and k = 63/4. So, the z33 approximation is (see Eq. 145):
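$$\begin{aligned}
z_{33} &= \frac{63}{4}x^4y^4 - \frac{63}{2}x^3y^4 - \frac{63}{2}x^4y^3 + 21x^2y^4 + 21x^4y^2 + 63x^3y^3 - 42x^2y^3 - 42x^3y^2 - \frac{21}{4}xy^4 \\
&\quad - \frac{21}{4}x^4y + \frac{121}{4}x^2y^2 + \frac{21}{2}xy^3 + \frac{21}{2}x^3y - \frac{37}{4}x^2y - \frac{37}{4}xy^2 + 4xy
\end{aligned}$$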
On plotting this solution (see the upper frame of Figure 75) alongside the analytical (series) solution
(see the lower frame of Figure 75 where we used in this plot all the series terms up to and including
m = n = 7) which we obtained in Problem 10 of § 1.6 we see that the z33 Rayleigh-Ritz approximation
is almost identical to the analytical (series) solution.
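The remark about automation can be illustrated on this Problem. The sketch below is an assumption-laden check rather than the author's code: the rows of `A` are the partial derivatives of the bracketed quadratic form in the expression of I with respect to a, b, c, f, g, h, i, j, k (in this order), `rhs` collects the constant terms, and `solve_exact` is a hypothetical helper doing Gauss-Jordan elimination in rational arithmetic.

```python
from fractions import Fraction as Fr

names = "abcfghijk"
A = [
    [11760, 5880, 3444, 5880, 2940, 1722, 3444, 1722, 1008],  # d/da
    [5880, 4032, 2814, 2940, 2016, 1407, 1722, 1176, 819],    # d/db
    [3444, 2814, 2212, 1722, 1407, 1106, 1008, 819, 642],     # d/dc
    [5880, 2940, 1722, 4032, 2016, 1176, 2814, 1407, 819],    # d/df
    [2940, 2016, 1407, 2016, 1344, 924, 1407, 924, 630],      # d/dg
    [1722, 1407, 1106, 1176, 924, 712, 819, 630, 480],        # d/dh
    [3444, 1722, 1008, 2814, 1407, 819, 2212, 1106, 642],     # d/di
    [1722, 1176, 819, 1407, 924, 630, 1106, 712, 480],        # d/dj
    [1008, 819, 642, 819, 630, 480, 642, 480, 360],           # d/dk
]
rhs = [29400, 14700, 8820, 14700, 7350, 4410, 8820, 4410, 2646]

def solve_exact(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    M = [[Fr(v) for v in row] + [Fr(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                fac = M[r][col] / M[col][col]
                M[r] = [u - fac * v for u, v in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

sol = dict(zip(names, solve_exact(A, rhs)))

# Value of z33 at the centre of the square, using the factored form
# z33 = xy(x - 1)(y - 1) * (a + b y + c y^2 + f x + g xy + h xy^2 + i x^2 + j x^2 y + k x^2 y^2).
x = y = Fr(1, 2)
P = (sol["a"] + sol["b"] * y + sol["c"] * y**2 + sol["f"] * x + sol["g"] * x * y
     + sol["h"] * x * y**2 + sol["i"] * x**2 + sol["j"] * x**2 * y + sol["k"] * x**2 * y**2)
centre = x * y * (x - 1) * (y - 1) * P
print(sol, centre)
```

The centre value 151/1024 ≈ 0.1475 gives a single number that can be compared against the peak of the series solution in the plot.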
Figure 75: Plot of the z33 approximation of the Rayleigh-Ritz method of Problem 9 of § 8 (upper frame)
alongside the analytical (series) solution $z = \frac{8}{\pi^4}\sum_{n=1}^{7}\sum_{m=1}^{7}\frac{\left(1-[-1]^m\right)\left(1-[-1]^n\right)}{mn\left(m^2+n^2\right)}\sin(m\pi x)\sin(n\pi y)$ of
Problem 10 of § 1.6 (lower frame). For fair comparison, the same xy mesh is used in both plots.
Chapter 9
Numerical Methods
There are many numerical methods for solving variational problems. These methods are generally based
on discretizing the continuous (analytical) formulation of the calculus of variations problems and
implementing the discretized formulation manually or computationally (using computers). In this book,
we investigate only briefly one of the simplest (and possibly the simplest) numerical variational methods,
which is based on the finite difference technique. The idea of this method is very simple although the
technique is algebraically messy in most cases (especially if it is implemented manually). This method is
outlined in the following points:
1. The analytical variational formulation is fundamentally based on a functional integral (as given by
Eq. 1). Now, an integral is the continuous counterpart of a discrete sum. So, we simply
convert the functional integral to a sum, that is:
$$S = \sum_i F\left(x_i, y_i, y_i'\right)\Delta x \tag{146}$$
where we replaced I (for Integral) with S (for Sum) and replaced the infinitesimal dx with the finite
∆x.
2. We discretize the x interval (as determined by the boundaries which represent the limits of the integral)
into n + 1 equal divisions by inserting n points evenly between the two boundaries. Accordingly, we have
$\Delta x = \frac{x_{n+1} - x_0}{n+1}$ where x0 and xn+1 are the x coordinates of the boundary points. So, we have n + 2
points (i.e. 2 boundary points and n inserted points) and n + 1 divisions. We can therefore rewrite
the sum of Eq. 146 as:
$$S = \sum_{i=0}^{n} F\left(x_i, y_i, y_i'\right)\Delta x \tag{147}$$
where $y_i' = \frac{y_{i+1} - y_i}{\Delta x}$.
3. We can now form a table summarizing our n + 2 points. This table shows what is known and what is
unknown (which is what we should look for in our solution), that is:

x0    x1    x2    ···    xn−1    xn    xn+1
✓     ✓     ✓     ···    ✓       ✓     ✓
✓     ?     ?     ···    ?       ?     ✓
y0    y1    y2    ···    yn−1    yn    yn+1

where the check mark ✓ means known while the question mark ? means unknown. Accordingly, S
in Eq. 147 becomes a function (rather than functional) of n unknowns which are y1 , · · · , yn . We can
therefore rewrite the sum of Eq. 147 as:
$$S\left[y_1, \cdots, y_n\right] = \sum_{i=0}^{n} F\left(x_i, y_i, y_i'\right)\Delta x \tag{148}$$
4. To extremize S, we take its partial derivatives with respect to the unknowns y1 , · · · , yn and set
the results to zero, that is:
$$\begin{aligned}
\frac{\partial S}{\partial y_1} &= 0 \\
&\;\;\vdots \\
\frac{\partial S}{\partial y_n} &= 0
\end{aligned}$$
We then solve this system of n simultaneous equations and obtain the unknowns y1 , · · · , yn .
5. So, we now have n + 2 known points, i.e. (x0 , y0 ), (x1 , y1 ), · · · , (xn+1 , yn+1 ).
We can use these points as an approximation to the extremizing function either directly (and hence the
extremizing function is a polygonal curve that connects these points consecutively by straight line
segments) or as an input to an interpolation scheme (such as polynomial and spline interpolations).
We should finally remark that the approximation generally improves by increasing n (although at a
cost that may not be affordable or desirable at a certain point). We should also note that the above-
described finite difference method is one dimensional. The method can be easily generalized to multi-
dimensions (although the required algebra and mathematical manipulation become very lengthy and
messy). However, instead of going through the description of this simple generalization we will demonstrate
this generalization by some examples of the finite difference method in 2D (see Problems 6-8).
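The five points above can be sketched in code as follows. This is a minimal illustration rather than a general-purpose solver: it assumes that S is quadratic in the unknowns (as in the Problems of this chapter), so that the equations ∂S/∂yi = 0 form a linear system whose matrix and right-hand side can be read off exactly from central differences of S. The names `finite_difference_extremal` and `solve_linear` are hypothetical.

```python
from fractions import Fraction as Fr

def solve_linear(A, b):
    """Gauss-Jordan elimination for a small dense system A y = b."""
    n = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                fac = M[r][col] / M[col][col]
                M[r] = [u - fac * v for u, v in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def finite_difference_extremal(F, x0, x_end, y0, y_end, n):
    """Return the n + 2 points (xs, ys) that make the sum S of Eq. 148 stationary,
    assuming S is quadratic in the n unknown ordinates."""
    dx = (x_end - x0) / (n + 1)
    xs = [x0 + i * dx for i in range(n + 2)]

    def S(inner):
        ys = [y0] + list(inner) + [y_end]
        return sum(F(xs[i], ys[i], (ys[i + 1] - ys[i]) / dx) * dx
                   for i in range(n + 1))

    def grad(inner):
        # Central differences are exact for a quadratic S, whatever the step h.
        h = Fr(1, 2)
        g = []
        for k in range(n):
            up, down = list(inner), list(inner)
            up[k] += h
            down[k] -= h
            g.append((S(up) - S(down)) / (2 * h))
        return g

    g0 = grad([Fr(0)] * n)                 # gradient at y = 0
    H = [[None] * n for _ in range(n)]     # constant matrix of second derivatives
    for k in range(n):
        unit = [Fr(0)] * n
        unit[k] = Fr(1)
        gk = grad(unit)
        for r in range(n):
            H[r][k] = gk[r] - g0[r]
    inner = solve_linear(H, [-v for v in g0])   # grad S = g0 + H y = 0
    return xs, [y0] + inner + [y_end]

# Example: F = y'^2 / x^3 on [2, 4] with y(2) = 1, y(4) = 31 and n = 3.
xs, ys = finite_difference_extremal(lambda x, y, yp: yp**2 / x**3,
                                    Fr(2), Fr(4), Fr(1), Fr(31), 3)
print(ys)
```

Passing `Fraction` inputs keeps every step exact, which makes the output directly comparable with hand-derived fractions; for a functional that is not quadratic in y and y′ this single linear solve would only be one step of an iterative scheme.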
Problems
1. Re-solve part (c) of Problem 12 of § 1.4 using this time the finite difference method with n = 3
discretization scheme (i.e. by inserting 3 evenly-spaced points between the boundaries and hence
making 4 divisions). Plot the obtained approximation points beside the analytical solution that you
obtained in Problem 12 of § 1.4.
Answer: From the required discretization scheme n = 3 and the boundary conditions y0 (x0 = 2) = 1
and y4 (x4 = 4) = 31 we get ∆x and form the table, that is:
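$$\Delta x = \frac{x_{n+1} - x_0}{n+1} = \frac{x_{3+1} - x_0}{3+1} = \frac{x_4 - x_0}{4} = \frac{4 - 2}{4} = \frac{1}{2}$$

x0    x1     x2    x3     x4
2     2.5    3     3.5    4
1     ?      ?     ?      31
y0    y1     y2    y3     y4

On using Eq. 148 with n = 3 and $F = y'^2/x^3$ we get:
$$\begin{aligned}
S\left[y_1, y_2, y_3\right] &= \sum_{i=0}^{3}\frac{\left(y_{i+1} - y_i\right)^2}{x_i^3\,\Delta x} \\
&= \frac{\left(y_1 - y_0\right)^2}{x_0^3\,\Delta x} + \frac{\left(y_2 - y_1\right)^2}{x_1^3\,\Delta x} + \frac{\left(y_3 - y_2\right)^2}{x_2^3\,\Delta x} + \frac{\left(y_4 - y_3\right)^2}{x_3^3\,\Delta x} \\
&= 2\left[\frac{\left(y_1 - 1\right)^2}{8} + \frac{\left(y_2 - y_1\right)^2}{125/8} + \frac{\left(y_3 - y_2\right)^2}{27} + \frac{\left(31 - y_3\right)^2}{343/8}\right] \\
&= \frac{y_1^2 - 2y_1 + 1}{4} + \frac{16\left(y_2^2 - 2y_1y_2 + y_1^2\right)}{125} + \frac{2\left(y_3^2 - 2y_2y_3 + y_2^2\right)}{27} + \frac{16\left(961 - 62y_3 + y_3^2\right)}{343} \\
&= \frac{189}{500}y_1^2 - \frac{32}{125}y_1y_2 + \frac{682}{3375}y_2^2 - \frac{4}{27}y_2y_3 + \frac{1118}{9261}y_3^2 - \frac{1}{2}y_1 - \frac{992}{343}y_3 + \frac{61847}{1372}
\end{aligned}$$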
On taking the partial derivatives of S with respect to y1 , y2 , y3 and setting the results to zero we get:
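$$\frac{\partial S}{\partial y_1} = \frac{189}{250}y_1 - \frac{32}{125}y_2 - \frac{1}{2} = 0$$
$$\frac{\partial S}{\partial y_2} = -\frac{32}{125}y_1 + \frac{1364}{3375}y_2 - \frac{4}{27}y_3 = 0$$
$$\frac{\partial S}{\partial y_3} = -\frac{4}{27}y_2 + \frac{2236}{9261}y_3 - \frac{992}{343} = 0$$
On solving this system of simultaneous equations we get:
$$y_1 = \frac{667}{187} \qquad y_2 = \frac{3209}{374} \qquad y_3 = \frac{6449}{374}$$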
On plotting the 5 points (i.e. 2 boundaries and 3 inserted) beside the analytical solution (which we
obtained in Problem 12 of § 1.4) we get Figure 76.
Figure 76: Plot of the analytical solution $y = \frac{x^4}{8} - 1$ of Problem 1 of § 9 (as obtained in part c of Problem
12 of § 1.4) alongside the points y1 , y2 , y3 obtained by finite difference (as well as the boundary points).
2. Re-solve part (a) of Problem 12 of § 1.4 (with k = 1) using this time the finite difference method
with n = 3 discretization scheme (i.e. by inserting 3 evenly-spaced points between the boundaries and
hence making 4 divisions). Plot the obtained approximation points beside the analytical solution that
you obtained in Problem 12 of § 1.4.
Answer: From the required discretization scheme n = 3 and the boundary conditions y0 (x0 = 0) = 0
and y4 (x4 = 1) = 1 we get ∆x and form the table, that is:
$$\Delta x = \frac{x_{n+1} - x_0}{n+1} = \frac{x_{3+1} - x_0}{3+1} = \frac{x_4 - x_0}{4} = \frac{1 - 0}{4} = \frac{1}{4}$$

x0    x1      x2     x3      x4
0     0.25    0.5    0.75    1
0     ?       ?      ?       1
y0    y1      y2     y3      y4
[Figure: plot of y against x showing the analytical solution and the finite difference points yi.]
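On taking the partial derivatives of S with respect to y1 , y2 , y3 , y4 and setting the results to zero we
get:
$$\frac{\partial S}{\partial y_1} = \frac{102}{5}y_1 - 10y_2 = 0$$
$$\frac{\partial S}{\partial y_2} = \frac{102}{5}y_2 - 10y_1 - 10y_3 = 0$$
$$\frac{\partial S}{\partial y_3} = \frac{102}{5}y_3 - 10y_2 - 10y_4 = 0$$
$$\frac{\partial S}{\partial y_4} = \frac{102}{5}y_4 - 10y_3 - 10 = 0$$
On solving this system of simultaneous equations we get:
$$y_1 = \frac{390625}{2278951} \qquad y_2 = \frac{796875}{2278951} \qquad y_3 = \frac{1235000}{2278951} \qquad y_4 = \frac{1722525}{2278951}$$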
On plotting the 6 points (i.e. 2 boundaries and 4 inserted) beside the analytical solution (which we
obtained in Problem 12 of § 1.4) we get Figure 78.
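Because the four equations above are linear with a three-term (tridiagonal) structure, they can also be solved by a simple forward sweep followed by a rescaling. The sketch below is one possible way of doing this and is not the method used in the text; it assumes the boundary values y0 = 0 and y5 = 1, which are inferred from the first equation having no y0 term and the last one containing the constant −10. The variable names are illustrative.

```python
from fractions import Fraction as Fr

diag, off = Fr(102, 5), Fr(10)

# Forward sweep with a trial value y1 = 1 and y0 = 0: each equation
# diag*y_i - off*y_(i-1) - off*y_(i+1) = 0 gives y_(i+1) from the two previous values.
trial = [Fr(0), Fr(1)]
for _ in range(4):
    trial.append((diag * trial[-1] - off * trial[-2]) / off)

# The system is linear, so rescaling the trial sequence to satisfy y5 = 1
# yields the solution of the original system.
ys = [v / trial[5] for v in trial]
print(ys[1:5])
```

The same sweep works for any n as long as the interior equations keep this homogeneous three-term form.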
[Figure 78: plot of y against x showing the analytical solution and the finite difference points yi.]
4. Re-solve part (g) of Problem 12 of § 1.4 using this time the finite difference method with n = 4
discretization scheme (i.e. by inserting 4 evenly-spaced points between the boundaries and hence
making 5 divisions). Plot the obtained approximation points beside the analytical solution that you
obtained in Problem 12 of § 1.4.
Answer: From the required discretization scheme n = 4 and the boundary conditions y0 (x0 = 0) = 1
and y5 (x5 = 1) = 2 we get ∆x and form the table, that is:
$$\Delta x = \frac{x_{n+1} - x_0}{n+1} = \frac{x_{4+1} - x_0}{4+1} = \frac{x_5 - x_0}{5} = \frac{1 - 0}{5} = \frac{1}{5}$$

x0    x1     x2     x3     x4     x5
0     0.2    0.4    0.6    0.8    1
1     ?      ?      ?      ?      2
y0    y1     y2     y3     y4     y5

Figure 79: Plot of the analytical solution $y = \cos x + \frac{3 - \cos 1}{\sin 1}\sin x - x$ of Problem 4 of § 9 (as obtained
in part g of Problem 12 of § 1.4) alongside the points y1 , y2 , y3 , y4 obtained by finite difference (as well as
the boundary points).
5. Re-solve part (f) of Problem 12 of § 1.4 using this time the finite difference method with n = 5
discretization scheme (i.e. by inserting 5 evenly-spaced points between the boundaries and hence
making 6 divisions). Plot the obtained approximation points beside the analytical solution that you
obtained in Problem 12 of § 1.4.
Answer: From the required discretization scheme n = 5 and the boundary conditions y0 (x0 = 0) = 1
and y6 (x6 = π) = 1 we get ∆x and form the table, that is:
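$$\Delta x = \frac{x_{n+1} - x_0}{n+1} = \frac{x_{5+1} - x_0}{5+1} = \frac{x_6 - x_0}{6} = \frac{\pi - 0}{6} = \frac{\pi}{6}$$

x0    x1     x2      x3      x4      x5      x6
0     π/6    2π/6    3π/6    4π/6    5π/6    6π/6
1     ?      ?       ?       ?       ?       1
y0    y1     y2      y3      y4      y5      y6

On using Eq. 148 with n = 5 and $F = y'^2 + y^2 - 4y\cos x$ we get:
$$\begin{aligned}
S\left[y_1, \cdots, y_5\right] &= \sum_{i=0}^{5}\left[\frac{y_{i+1}^2 - 2y_{i+1}y_i + y_i^2 + y_i^2(\Delta x)^2 - 4y_i\cos x_i\,(\Delta x)^2}{\Delta x}\right] \\
&= \frac{1}{\Delta x}\sum_{i=0}^{5}\left[y_{i+1}^2 - 2y_{i+1}y_i + y_i^2 + y_i^2(\Delta x)^2 - 4y_i\cos x_i\,(\Delta x)^2\right] \\
&= \frac{6}{\pi}\sum_{i=0}^{5}\left[y_{i+1}^2 - 2y_{i+1}y_i + y_i^2 + \frac{\pi^2}{36}y_i^2 - \frac{\pi^2}{9}y_i\cos x_i\right] \\
&= \frac{6}{\pi}\Big[y_1^2 - 2y_1y_0 + y_0^2 + \frac{\pi^2}{36}y_0^2 - \frac{\pi^2}{9}y_0\cos x_0 + y_2^2 - 2y_2y_1 + y_1^2 + \frac{\pi^2}{36}y_1^2 - \frac{\pi^2}{9}y_1\cos x_1 \\
&\qquad + y_3^2 - 2y_3y_2 + y_2^2 + \frac{\pi^2}{36}y_2^2 - \frac{\pi^2}{9}y_2\cos x_2 + y_4^2 - 2y_4y_3 + y_3^2 + \frac{\pi^2}{36}y_3^2 - \frac{\pi^2}{9}y_3\cos x_3 \\
&\qquad + y_5^2 - 2y_5y_4 + y_4^2 + \frac{\pi^2}{36}y_4^2 - \frac{\pi^2}{9}y_4\cos x_4 + y_6^2 - 2y_6y_5 + y_5^2 + \frac{\pi^2}{36}y_5^2 - \frac{\pi^2}{9}y_5\cos x_5\Big] \\
&= \frac{6}{\pi}\Big[y_1^2 - 2y_1 + 1 + \frac{\pi^2}{36} - \frac{\pi^2}{9} + y_2^2 - 2y_2y_1 + y_1^2 + \frac{\pi^2}{36}y_1^2 - \frac{\pi^2\sqrt{3}}{18}y_1 \\
&\qquad + y_3^2 - 2y_3y_2 + y_2^2 + \frac{\pi^2}{36}y_2^2 - \frac{\pi^2}{18}y_2 + y_4^2 - 2y_4y_3 + y_3^2 + \frac{\pi^2}{36}y_3^2 \\
&\qquad + y_5^2 - 2y_5y_4 + y_4^2 + \frac{\pi^2}{36}y_4^2 + \frac{\pi^2}{18}y_4 + 1 - 2y_5 + y_5^2 + \frac{\pi^2}{36}y_5^2 + \frac{\pi^2\sqrt{3}}{18}y_5\Big] \\
&= \frac{6}{\pi}\Bigg[\left(2 + \frac{\pi^2}{36}\right)\left(y_1^2 + y_2^2 + y_3^2 + y_4^2 + y_5^2\right) - 2\left(y_1y_2 + y_2y_3 + y_3y_4 + y_4y_5\right) \\
&\qquad - \left(2 + \frac{\pi^2\sqrt{3}}{18}\right)y_1 - \frac{\pi^2}{18}y_2 + \frac{\pi^2}{18}y_4 - \left(2 - \frac{\pi^2\sqrt{3}}{18}\right)y_5 + 2 - \frac{\pi^2}{12}\Bigg]
\end{aligned}$$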
On taking the partial derivatives of S with respect to y1 , y2 , y3 , y4 , y5 and setting the results to zero
we get:
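$$\frac{\partial S}{\partial y_1} = \left(4 + \frac{\pi^2}{18}\right)y_1 - 2y_2 - \left(2 + \frac{\pi^2\sqrt{3}}{18}\right) = 0$$
$$\frac{\partial S}{\partial y_2} = \left(4 + \frac{\pi^2}{18}\right)y_2 - 2y_1 - 2y_3 - \frac{\pi^2}{18} = 0$$
$$\frac{\partial S}{\partial y_3} = \left(4 + \frac{\pi^2}{18}\right)y_3 - 2y_2 - 2y_4 = 0$$
$$\frac{\partial S}{\partial y_4} = \left(4 + \frac{\pi^2}{18}\right)y_4 - 2y_3 - 2y_5 + \frac{\pi^2}{18} = 0$$
$$\frac{\partial S}{\partial y_5} = \left(4 + \frac{\pi^2}{18}\right)y_5 - 2y_4 - \left(2 - \frac{\pi^2\sqrt{3}}{18}\right) = 0$$
On solving this system of simultaneous equations we get:
$$y_1 \simeq 0.96676626 \qquad y_2 \simeq 0.72372541 \qquad y_3 \simeq 0.40494232 \qquad y_4 \simeq 0.19717646 \qquad y_5 \simeq 0.31762333$$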
On plotting the 7 points (i.e. 2 boundaries and 5 inserted) beside the analytical solution (which we
obtained in Problem 12 of § 1.4) we get Figure 80.
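The five stationarity equations of this Problem form a tridiagonal system, so they can be solved numerically with the standard Thomas algorithm. The sketch below is illustrative only: the function `thomas` and the variable names are hypothetical, and the analytical solution used for comparison is the one quoted in the caption of Figure 80.

```python
import math

n = 5
dx = math.pi / (n + 1)
xs = [i * dx for i in range(n + 2)]
y_left, y_right = 1.0, 1.0

# Each equation reads (4 + pi^2/18) y_i - 2 y_(i-1) - 2 y_(i+1) = (pi^2/9) cos(x_i);
# the known boundary values are moved to the right-hand side.
diag = [4 + math.pi**2 / 18] * n
lower = [-2.0] * n      # lower[0] is unused
upper = [-2.0] * n      # upper[n-1] is unused
rhs = [(math.pi**2 / 9) * math.cos(xs[i]) for i in range(1, n + 1)]
rhs[0] += 2 * y_left
rhs[-1] += 2 * y_right

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution."""
    m = len(rhs)
    c, d = [0.0] * m, [0.0] * m
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    sol = [0.0] * m
    sol[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol

ys = thomas(lower, diag, upper, rhs)
exact = [2 / math.sinh(math.pi) * math.sinh(x) + math.cos(x) for x in xs[1:-1]]
for x, y_fd, y_an in zip(xs[1:-1], ys, exact):
    print(f"x = {x:.4f}  finite difference = {y_fd:.8f}  analytical = {y_an:.8f}")
```

Printing both columns side by side gives a numerical counterpart of the visual comparison in the figure.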
Figure 80: Plot of the analytical solution $y = \frac{2}{\sinh\pi}\sinh x + \cos x$ of Problem 5 of § 9 (as obtained in
part f of Problem 12 of § 1.4) alongside the points y1 , y2 , y3 , y4 , y5 obtained by finite difference (as well as
the boundary points).
6. Re-solve Problem 7 of § 1.6 using this time the finite difference method with m = n = 3 discretization
scheme (i.e. by inserting 3 evenly-spaced points between the x and y boundaries and hence making
16 square divisions). Plot the obtained approximation points beside the analytical solution that you
obtained in Problem 7 of § 1.6.
Answer: This is a 2D problem and hence in the following we will extend (rather briefly) the above-
described 1D finite difference method to 2D. From the required discretization scheme m = n = 3 and
the boundary conditions z (0, y) = z (x, 0) = z (1, y) = 0 and z (x, 1) = sin (πx) sinh (π) we get ∆x and
∆y and form the table, that is:
$$\Delta x = \frac{x_{m+1} - x_0}{m+1} = \frac{1 - 0}{3+1} = \frac{1}{4} \quad\text{and}\quad \Delta y = \frac{y_{n+1} - y_0}{n+1} = \frac{1 - 0}{3+1} = \frac{1}{4}$$

            x0 = 0     x1 = 1/4                  x2 = 2/4          x3 = 3/4                   x4 = 4/4
y0 = 0      z00 = 0    z10 = 0                   z20 = 0           z30 = 0                    z40 = 0
y1 = 1/4    z01 = 0    z11 = ?                   z21 = ?           z31 = ?                    z41 = 0
y2 = 2/4    z02 = 0    z12 = ?                   z22 = ?           z32 = ?                    z42 = 0
y3 = 3/4    z03 = 0    z13 = ?                   z23 = ?           z33 = ?                    z43 = 0
y4 = 4/4    z04 = 0    z14 = sin(π/4) sinh(π)    z24 = sinh(π)     z34 = sin(3π/4) sinh(π)    z44 = 0
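On generalizing Eq. 148 to 2D we get:
$$S\left[z_{11}, \cdots, z_{mn}\right] = \sum_{j=0}^{n}\sum_{i=0}^{m} F\left(x_i, y_j, z_{ij}, z_{ij,x}, z_{ij,y}\right)\Delta x\,\Delta y \tag{150}$$
where
$$z_{ij,x} = \frac{z_{(i+1)j} - z_{ij}}{\Delta x} \quad\text{and}\quad z_{ij,y} = \frac{z_{i(j+1)} - z_{ij}}{\Delta y} \tag{151}$$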
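On using this equation with m = n = 3 and $F = z_x^2 + z_y^2$, we get:
$$\begin{aligned}
S\left[z_{11}, \cdots, z_{33}\right] &= \sum_{j=0}^{3}\sum_{i=0}^{3}\left(z_{ij,x}^2 + z_{ij,y}^2\right)\Delta x\,\Delta y \\
&= \sum_{j=0}^{3}\sum_{i=0}^{3}\left[\left(\frac{z_{(i+1)j} - z_{ij}}{\Delta x}\right)^2 + \left(\frac{z_{i(j+1)} - z_{ij}}{\Delta y}\right)^2\right]\Delta x\,\Delta y \\
&= \sum_{j=0}^{3}\sum_{i=0}^{3}\left[\frac{z_{(i+1)j}^2 - 2z_{(i+1)j}z_{ij} + z_{ij}^2}{\Delta x\,\Delta x} + \frac{z_{i(j+1)}^2 - 2z_{i(j+1)}z_{ij} + z_{ij}^2}{\Delta y\,\Delta y}\right]\Delta x\,\Delta y
\end{aligned}$$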
Now, if we have ∆x = ∆y (as in our case) then this formula will simplify to the following:
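$$S = \sum_{j=0}^{3}\sum_{i=0}^{3}\left[z_{(i+1)j}^2 - 2z_{(i+1)j}z_{ij} + z_{i(j+1)}^2 - 2z_{i(j+1)}z_{ij} + 2z_{ij}^2\right]$$
that is:
$$\begin{aligned}
S = {} & z_{10}^2 - 2z_{10}z_{00} + z_{01}^2 - 2z_{01}z_{00} + 2z_{00}^2 + {} && (i = 0,\ j = 0) \\
& z_{20}^2 - 2z_{20}z_{10} + z_{11}^2 - 2z_{11}z_{10} + 2z_{10}^2 + {} && (i = 1,\ j = 0) \\
& z_{30}^2 - 2z_{30}z_{20} + z_{21}^2 - 2z_{21}z_{20} + 2z_{20}^2 + {} && (i = 2,\ j = 0) \\
& z_{40}^2 - 2z_{40}z_{30} + z_{31}^2 - 2z_{31}z_{30} + 2z_{30}^2 + {} && (i = 3,\ j = 0) \\
& z_{11}^2 - 2z_{11}z_{01} + z_{02}^2 - 2z_{02}z_{01} + 2z_{01}^2 + {} && (i = 0,\ j = 1) \\
& z_{21}^2 - 2z_{21}z_{11} + z_{12}^2 - 2z_{12}z_{11} + 2z_{11}^2 + {} && (i = 1,\ j = 1) \\
& z_{31}^2 - 2z_{31}z_{21} + z_{22}^2 - 2z_{22}z_{21} + 2z_{21}^2 + {} && (i = 2,\ j = 1) \\
& z_{41}^2 - 2z_{41}z_{31} + z_{32}^2 - 2z_{32}z_{31} + 2z_{31}^2 + {} && (i = 3,\ j = 1) \\
& z_{12}^2 - 2z_{12}z_{02} + z_{03}^2 - 2z_{03}z_{02} + 2z_{02}^2 + {} && (i = 0,\ j = 2) \\
& z_{22}^2 - 2z_{22}z_{12} + z_{13}^2 - 2z_{13}z_{12} + 2z_{12}^2 + {} && (i = 1,\ j = 2) \\
& z_{32}^2 - 2z_{32}z_{22} + z_{23}^2 - 2z_{23}z_{22} + 2z_{22}^2 + {} && (i = 2,\ j = 2) \\
& z_{42}^2 - 2z_{42}z_{32} + z_{33}^2 - 2z_{33}z_{32} + 2z_{32}^2 + {} && (i = 3,\ j = 2) \\
& z_{13}^2 - 2z_{13}z_{03} + z_{04}^2 - 2z_{04}z_{03} + 2z_{03}^2 + {} && (i = 0,\ j = 3) \\
& z_{23}^2 - 2z_{23}z_{13} + z_{14}^2 - 2z_{14}z_{13} + 2z_{13}^2 + {} && (i = 1,\ j = 3) \\
& z_{33}^2 - 2z_{33}z_{23} + z_{24}^2 - 2z_{24}z_{23} + 2z_{23}^2 + {} && (i = 2,\ j = 3) \\
& z_{43}^2 - 2z_{43}z_{33} + z_{34}^2 - 2z_{34}z_{33} + 2z_{33}^2 && (i = 3,\ j = 3)
\end{aligned}$$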
On combining similar terms and applying the first three boundary conditions i.e. z (0, y) = z (x, 0) =
z (1, y) = 0 by eliminating all the terms containing z0j or zi0 or z4j , this expression of S will simplify
to the following:
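$$\begin{aligned}
S = {} & 4z_{11}^2 + 4z_{21}^2 + 4z_{31}^2 + 4z_{12}^2 + 4z_{22}^2 + 4z_{32}^2 + 4z_{13}^2 + 4z_{23}^2 + 4z_{33}^2 \\
& + z_{14}^2 + z_{24}^2 + z_{34}^2 - 2z_{21}z_{11} - 2z_{12}z_{11} - 2z_{31}z_{21} - 2z_{22}z_{21} - 2z_{32}z_{31} - 2z_{22}z_{12} - 2z_{13}z_{12} \\
& - 2z_{32}z_{22} - 2z_{23}z_{22} - 2z_{33}z_{32} - 2z_{23}z_{13} - 2z_{14}z_{13} - 2z_{33}z_{23} - 2z_{24}z_{23} - 2z_{34}z_{33}
\end{aligned}$$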
On taking the partial derivatives of S with respect to z11 , · · · , z33 and setting the results to zero (with
the application of the fourth boundary condition which concerns z14 , z24 , z34 ) we get:
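$$\frac{\partial S}{\partial z_{11}} = 8z_{11} - 2z_{21} - 2z_{12} = 0$$
$$\frac{\partial S}{\partial z_{12}} = 8z_{12} - 2z_{11} - 2z_{22} - 2z_{13} = 0$$
$$\frac{\partial S}{\partial z_{13}} = 8z_{13} - 2z_{12} - 2z_{23} - 2\sin\left(\frac{\pi}{4}\right)\sinh(\pi) = 0$$
$$\frac{\partial S}{\partial z_{21}} = 8z_{21} - 2z_{11} - 2z_{31} - 2z_{22} = 0$$
$$\frac{\partial S}{\partial z_{22}} = 8z_{22} - 2z_{21} - 2z_{12} - 2z_{32} - 2z_{23} = 0$$
$$\frac{\partial S}{\partial z_{23}} = 8z_{23} - 2z_{22} - 2z_{13} - 2z_{33} - 2\sinh(\pi) = 0$$
$$\frac{\partial S}{\partial z_{31}} = 8z_{31} - 2z_{21} - 2z_{32} = 0$$
$$\frac{\partial S}{\partial z_{32}} = 8z_{32} - 2z_{31} - 2z_{22} - 2z_{33} = 0$$
$$\frac{\partial S}{\partial z_{33}} = 8z_{33} - 2z_{32} - 2z_{23} - 2\sin\left(\frac{3\pi}{4}\right)\sinh(\pi) = 0$$
On solving this system of simultaneous equations we get:
$$\begin{aligned}
z_{11} &\simeq 0.673903372 & z_{21} &\simeq 0.953043288 & z_{31} &\simeq 0.673903372 \\
z_{12} &\simeq 1.742570199 & z_{22} &\simeq 2.464366408 & z_{32} &\simeq 1.742570199 \\
z_{13} &\simeq 3.832011015 & z_{23} &\simeq 5.419281947 & z_{33} &\simeq 3.832011015
\end{aligned}$$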
On plotting these points with the boundary points (see the upper frame of Figure 81) alongside the
analytical solution (see the lower frame of Figure 81) which we obtained in Problem 7 of § 1.6 we see
that the two plots are almost identical.
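Each of the nine equations above states that an interior value equals the average of its four neighbours, so the system can also be solved iteratively. The following minimal sketch uses Gauss-Seidel sweeps, which is a swapped-in solution technique and not the one used in the text; the array layout `z[i][j]` (with i the x index and j the y index) and the fixed number of sweeps are assumptions made for illustration.

```python
import math

m = n = 3            # inserted points in each direction
N = m + 1            # number of divisions in each direction

# z[i][j]: i is the x index and j is the y index; boundaries are filled first.
z = [[0.0] * (N + 1) for _ in range(N + 1)]
for i in range(N + 1):
    z[i][N] = math.sin(math.pi * i / N) * math.sinh(math.pi)   # z(x, 1)

# Each stationarity equation reads 8 z_ij = 2 * (sum of the four neighbours).
for _ in range(200):
    for i in range(1, N):
        for j in range(1, N):
            z[i][j] = (z[i - 1][j] + z[i + 1][j] + z[i][j - 1] + z[i][j + 1]) / 4

for j in range(1, N):
    print([round(z[i][j], 9) for i in range(1, N)])
```

For a mesh this small a direct solve is equally easy, but the sweep generalizes to finer meshes (larger m and n) without having to write the equations out by hand.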
Figure 81: The upper frame is a plot of the points z11 , · · · , z33 which we obtained in Problem 6 of § 9 by
finite difference (as well as the boundary points) while the lower frame is a plot of the analytical solution
$z = \sin(\pi x)\sinh(\pi y)$ which we obtained in Problem 7 of § 1.6. For fair comparison, we use the same xy
mesh in both plots.
7. Re-solve Problem 8 of § 1.6 using this time the finite difference method with m = n = 4 discretization
scheme (i.e. by inserting 4 evenly-spaced points between the boundaries and hence making 25 square
divisions). Plot the obtained approximation points beside the analytical solution that you obtained in
Problem 8 of § 1.6.
Answer: From the required discretization scheme m = n = 4 and the boundary conditions z (0, y) =
z (x, 0) = z (1, y) = 0 and z (x, 1) = sin (πx) we get ∆x and ∆y and form the table, that is:
$$\Delta x = \frac{x_{m+1} - x_0}{m+1} = \frac{1 - 0}{4+1} = \frac{1}{5} \quad\text{and}\quad \Delta y = \frac{y_{n+1} - y_0}{n+1} = \frac{1 - 0}{4+1} = \frac{1}{5}$$

            x0 = 0     x1 = 1/5          x2 = 2/5           x3 = 3/5           x4 = 4/5           x5 = 5/5
y0 = 0      z00 = 0    z10 = 0           z20 = 0            z30 = 0            z40 = 0            z50 = 0
y1 = 1/5    z01 = 0    z11 = ?           z21 = ?            z31 = ?            z41 = ?            z51 = 0
y2 = 2/5    z02 = 0    z12 = ?           z22 = ?            z32 = ?            z42 = ?            z52 = 0
y3 = 3/5    z03 = 0    z13 = ?           z23 = ?            z33 = ?            z43 = ?            z53 = 0
y4 = 4/5    z04 = 0    z14 = ?           z24 = ?            z34 = ?            z44 = ?            z54 = 0
y5 = 5/5    z05 = 0    z15 = sin(π/5)    z25 = sin(2π/5)    z35 = sin(3π/5)    z45 = sin(4π/5)    z55 = 0

On generalizing Eq. 148 to 2D (as we did in Problem 6; see Eqs. 150 and 151) and noting that
m = n = 4 and $F = z_x^2 + z_y^2 + \left(2y^{-2} - \pi^2\right)z^2$ we get S [z11 , · · · , z44 ], that is:
$$\begin{aligned}
S &= \sum_{j=0}^{4}\sum_{i=0}^{4}\left[z_{ij,x}^2 + z_{ij,y}^2 + \left(2y_j^{-2} - \pi^2\right)z_{ij}^2\right]\Delta x\,\Delta y \\
&= \sum_{j=0}^{4}\sum_{i=0}^{4}\left[\left(\frac{z_{(i+1)j} - z_{ij}}{\Delta x}\right)^2 + \left(\frac{z_{i(j+1)} - z_{ij}}{\Delta y}\right)^2 + \left(2y_j^{-2} - \pi^2\right)z_{ij}^2\right]\Delta x\,\Delta y \\
&= \sum_{j=0}^{4}\sum_{i=0}^{4}\left[\frac{z_{(i+1)j}^2 - 2z_{(i+1)j}z_{ij} + z_{ij}^2}{\Delta x\,\Delta x} + \frac{z_{i(j+1)}^2 - 2z_{i(j+1)}z_{ij} + z_{ij}^2}{\Delta y\,\Delta y} + \left(2y_j^{-2} - \pi^2\right)z_{ij}^2\right]\Delta x\,\Delta y
\end{aligned}$$
Now, if we have ∆x = ∆y = 0.2 (as in our case) then this formula will simplify to the following:
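$$\begin{aligned}
S &= \sum_{j=0}^{4}\sum_{i=0}^{4}\left[z_{(i+1)j}^2 - 2z_{(i+1)j}z_{ij} + z_{i(j+1)}^2 - 2z_{i(j+1)}z_{ij} + 2z_{ij}^2 + 0.04\left(2y_j^{-2} - \pi^2\right)z_{ij}^2\right] \\
&= \sum_{j=0}^{4}\sum_{i=0}^{4}\left[z_{(i+1)j}^2 - 2z_{(i+1)j}z_{ij} + z_{i(j+1)}^2 - 2z_{i(j+1)}z_{ij} + \left(0.08y_j^{-2} + 2 - 0.04\pi^2\right)z_{ij}^2\right]
\end{aligned}$$
that is:
$$\begin{aligned}
S = {} & z_{10}^2 - 2z_{10}z_{00} + z_{01}^2 - 2z_{01}z_{00} + \left(0.08y_0^{-2} + 2 - 0.04\pi^2\right)z_{00}^2 + {} && (i = 0,\ j = 0) \\
& z_{20}^2 - 2z_{20}z_{10} + z_{11}^2 - 2z_{11}z_{10} + \left(0.08y_0^{-2} + 2 - 0.04\pi^2\right)z_{10}^2 + {} && (i = 1,\ j = 0) \\
& z_{30}^2 - 2z_{30}z_{20} + z_{21}^2 - 2z_{21}z_{20} + \left(0.08y_0^{-2} + 2 - 0.04\pi^2\right)z_{20}^2 + {} && (i = 2,\ j = 0) \\
& z_{40}^2 - 2z_{40}z_{30} + z_{31}^2 - 2z_{31}z_{30} + \left(0.08y_0^{-2} + 2 - 0.04\pi^2\right)z_{30}^2 + {} && (i = 3,\ j = 0) \\
& z_{50}^2 - 2z_{50}z_{40} + z_{41}^2 - 2z_{41}z_{40} + \left(0.08y_0^{-2} + 2 - 0.04\pi^2\right)z_{40}^2 + {} && (i = 4,\ j = 0) \\
& z_{11}^2 - 2z_{11}z_{01} + z_{02}^2 - 2z_{02}z_{01} + \left(0.08y_1^{-2} + 2 - 0.04\pi^2\right)z_{01}^2 + {} && (i = 0,\ j = 1) \\
& z_{21}^2 - 2z_{21}z_{11} + z_{12}^2 - 2z_{12}z_{11} + \left(0.08y_1^{-2} + 2 - 0.04\pi^2\right)z_{11}^2 + {} && (i = 1,\ j = 1) \\
& z_{31}^2 - 2z_{31}z_{21} + z_{22}^2 - 2z_{22}z_{21} + \left(0.08y_1^{-2} + 2 - 0.04\pi^2\right)z_{21}^2 + {} && (i = 2,\ j = 1) \\
& z_{41}^2 - 2z_{41}z_{31} + z_{32}^2 - 2z_{32}z_{31} + \left(0.08y_1^{-2} + 2 - 0.04\pi^2\right)z_{31}^2 + {} && (i = 3,\ j = 1) \\
& z_{51}^2 - 2z_{51}z_{41} + z_{42}^2 - 2z_{42}z_{41} + \left(0.08y_1^{-2} + 2 - 0.04\pi^2\right)z_{41}^2 + {} && (i = 4,\ j = 1) \\
& z_{12}^2 - 2z_{12}z_{02} + z_{03}^2 - 2z_{03}z_{02} + \left(0.08y_2^{-2} + 2 - 0.04\pi^2\right)z_{02}^2 + {} && (i = 0,\ j = 2) \\
& z_{22}^2 - 2z_{22}z_{12} + z_{13}^2 - 2z_{13}z_{12} + \left(0.08y_2^{-2} + 2 - 0.04\pi^2\right)z_{12}^2 + {} && (i = 1,\ j = 2) \\
& z_{32}^2 - 2z_{32}z_{22} + z_{23}^2 - 2z_{23}z_{22} + \left(0.08y_2^{-2} + 2 - 0.04\pi^2\right)z_{22}^2 + {} && (i = 2,\ j = 2) \\
& z_{42}^2 - 2z_{42}z_{32} + z_{33}^2 - 2z_{33}z_{32} + \left(0.08y_2^{-2} + 2 - 0.04\pi^2\right)z_{32}^2 + {} && (i = 3,\ j = 2) \\
& z_{52}^2 - 2z_{52}z_{42} + z_{43}^2 - 2z_{43}z_{42} + \left(0.08y_2^{-2} + 2 - 0.04\pi^2\right)z_{42}^2 + {} && (i = 4,\ j = 2) \\
& z_{13}^2 - 2z_{13}z_{03} + z_{04}^2 - 2z_{04}z_{03} + \left(0.08y_3^{-2} + 2 - 0.04\pi^2\right)z_{03}^2 + {} && (i = 0,\ j = 3) \\
& z_{23}^2 - 2z_{23}z_{13} + z_{14}^2 - 2z_{14}z_{13} + \left(0.08y_3^{-2} + 2 - 0.04\pi^2\right)z_{13}^2 + {} && (i = 1,\ j = 3) \\
& z_{33}^2 - 2z_{33}z_{23} + z_{24}^2 - 2z_{24}z_{23} + \left(0.08y_3^{-2} + 2 - 0.04\pi^2\right)z_{23}^2 + {} && (i = 2,\ j = 3) \\
& z_{43}^2 - 2z_{43}z_{33} + z_{34}^2 - 2z_{34}z_{33} + \left(0.08y_3^{-2} + 2 - 0.04\pi^2\right)z_{33}^2 + {} && (i = 3,\ j = 3) \\
& z_{53}^2 - 2z_{53}z_{43} + z_{44}^2 - 2z_{44}z_{43} + \left(0.08y_3^{-2} + 2 - 0.04\pi^2\right)z_{43}^2 + {} && (i = 4,\ j = 3) \\
& z_{14}^2 - 2z_{14}z_{04} + z_{05}^2 - 2z_{05}z_{04} + \left(0.08y_4^{-2} + 2 - 0.04\pi^2\right)z_{04}^2 + {} && (i = 0,\ j = 4) \\
& z_{24}^2 - 2z_{24}z_{14} + z_{15}^2 - 2z_{15}z_{14} + \left(0.08y_4^{-2} + 2 - 0.04\pi^2\right)z_{14}^2 + {} && (i = 1,\ j = 4) \\
& z_{34}^2 - 2z_{34}z_{24} + z_{25}^2 - 2z_{25}z_{24} + \left(0.08y_4^{-2} + 2 - 0.04\pi^2\right)z_{24}^2 + {} && (i = 2,\ j = 4) \\
& z_{44}^2 - 2z_{44}z_{34} + z_{35}^2 - 2z_{35}z_{34} + \left(0.08y_4^{-2} + 2 - 0.04\pi^2\right)z_{34}^2 + {} && (i = 3,\ j = 4) \\
& z_{54}^2 - 2z_{54}z_{44} + z_{45}^2 - 2z_{45}z_{44} + \left(0.08y_4^{-2} + 2 - 0.04\pi^2\right)z_{44}^2 && (i = 4,\ j = 4)
\end{aligned}$$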
On combining similar terms and applying the first three boundary conditions i.e. z (0, y) = z (x, 0) =
z (1, y) = 0 by eliminating all the terms containing z0j or zi0 or z5j as well as inserting the numeric
values, this expression of S will simplify to the following:
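\begin{align*}
S ={} & -2z_{21}z_{11} - 2z_{12}z_{11} + \left(6 - 0.04\pi^2\right) z_{11}^2 \\
& -2z_{31}z_{21} - 2z_{22}z_{21} + \left(6 - 0.04\pi^2\right) z_{21}^2 \\
& -2z_{41}z_{31} - 2z_{32}z_{31} + \left(6 - 0.04\pi^2\right) z_{31}^2 \\
& -2z_{42}z_{41} + \left(6 - 0.04\pi^2\right) z_{41}^2 \\
& -2z_{22}z_{12} - 2z_{13}z_{12} + \left(4.5 - 0.04\pi^2\right) z_{12}^2 \\
& -2z_{32}z_{22} - 2z_{23}z_{22} + \left(4.5 - 0.04\pi^2\right) z_{22}^2 \\
& -2z_{42}z_{32} - 2z_{33}z_{32} + \left(4.5 - 0.04\pi^2\right) z_{32}^2 \\
& -2z_{43}z_{42} + \left(4.5 - 0.04\pi^2\right) z_{42}^2 \\
& -2z_{23}z_{13} - 2z_{14}z_{13} + \left(4.222222 - 0.04\pi^2\right) z_{13}^2 \\
& -2z_{33}z_{23} - 2z_{24}z_{23} + \left(4.222222 - 0.04\pi^2\right) z_{23}^2 \\
& -2z_{43}z_{33} - 2z_{34}z_{33} + \left(4.222222 - 0.04\pi^2\right) z_{33}^2 \\
& -2z_{44}z_{43} + \left(4.222222 - 0.04\pi^2\right) z_{43}^2 \\
& -2z_{24}z_{14} + \sin^2\frac{\pi}{5} - 2\sin\frac{\pi}{5}\, z_{14} + \left(4.125 - 0.04\pi^2\right) z_{14}^2 \\
& -2z_{34}z_{24} + \sin^2\frac{2\pi}{5} - 2\sin\frac{2\pi}{5}\, z_{24} + \left(4.125 - 0.04\pi^2\right) z_{24}^2 \\
& -2z_{44}z_{34} + \sin^2\frac{3\pi}{5} - 2\sin\frac{3\pi}{5}\, z_{34} + \left(4.125 - 0.04\pi^2\right) z_{34}^2 \\
& + \sin^2\frac{4\pi}{5} - 2\sin\frac{4\pi}{5}\, z_{44} + \left(4.125 - 0.04\pi^2\right) z_{44}^2
\end{align*}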
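As a check on the numeric coefficients in this expression (a worked example only; the shorthand $c_j$ is introduced here for illustration and is not part of the original notation), each interior $z_{ij}^2$ collects one unit from the $(i-1, j)$ term, one unit from the $(i, j-1)$ term and $\left(0.08y_j^{-2} + 2 - 0.04\pi^2\right)$ from the $(i, j)$ term of the expansion:

```latex
\begin{align*}
1 + 1 + \left(0.08\,y_j^{-2} + 2 - 0.04\pi^2\right) &= c_j - 0.04\pi^2,
  \qquad c_j = 4 + 0.08\,y_j^{-2} \\
c_1 &= 4 + 0.08/0.2^2 = 6 \\
c_2 &= 4 + 0.08/0.4^2 = 4.5 \\
c_3 &= 4 + 0.08/0.6^2 \simeq 4.222222 \\
c_4 &= 4 + 0.08/0.8^2 = 4.125
\end{align*}
```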
On taking the partial derivatives of S with respect to z11 , · · · , z44 and setting the results to zero we
get:
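\begin{align*}
\frac{\partial S}{\partial z_{11}} &= -2z_{21} - 2z_{12} + 2\left(6 - 0.04\pi^2\right) z_{11} = 0 \\
\frac{\partial S}{\partial z_{12}} &= -2z_{11} - 2z_{22} - 2z_{13} + 2\left(4.5 - 0.04\pi^2\right) z_{12} = 0 \\
\frac{\partial S}{\partial z_{13}} &= -2z_{12} - 2z_{23} - 2z_{14} + 2\left(4.222222 - 0.04\pi^2\right) z_{13} = 0 \\
\frac{\partial S}{\partial z_{14}} &= -2z_{13} - 2z_{24} - 2\sin\frac{\pi}{5} + 2\left(4.125 - 0.04\pi^2\right) z_{14} = 0 \\
\frac{\partial S}{\partial z_{21}} &= -2z_{11} - 2z_{31} - 2z_{22} + 2\left(6 - 0.04\pi^2\right) z_{21} = 0 \\
\frac{\partial S}{\partial z_{22}} &= -2z_{21} - 2z_{12} - 2z_{32} - 2z_{23} + 2\left(4.5 - 0.04\pi^2\right) z_{22} = 0 \\
\frac{\partial S}{\partial z_{23}} &= -2z_{22} - 2z_{13} - 2z_{33} - 2z_{24} + 2\left(4.222222 - 0.04\pi^2\right) z_{23} = 0 \\
\frac{\partial S}{\partial z_{24}} &= -2z_{23} - 2z_{14} - 2z_{34} - 2\sin\frac{2\pi}{5} + 2\left(4.125 - 0.04\pi^2\right) z_{24} = 0 \\
\frac{\partial S}{\partial z_{31}} &= -2z_{21} - 2z_{41} - 2z_{32} + 2\left(6 - 0.04\pi^2\right) z_{31} = 0 \\
\frac{\partial S}{\partial z_{32}} &= -2z_{31} - 2z_{22} - 2z_{42} - 2z_{33} + 2\left(4.5 - 0.04\pi^2\right) z_{32} = 0 \\
\frac{\partial S}{\partial z_{33}} &= -2z_{32} - 2z_{23} - 2z_{43} - 2z_{34} + 2\left(4.222222 - 0.04\pi^2\right) z_{33} = 0 \\
\frac{\partial S}{\partial z_{34}} &= -2z_{33} - 2z_{24} - 2z_{44} - 2\sin\frac{3\pi}{5} + 2\left(4.125 - 0.04\pi^2\right) z_{34} = 0 \\
\frac{\partial S}{\partial z_{41}} &= -2z_{31} - 2z_{42} + 2\left(6 - 0.04\pi^2\right) z_{41} = 0 \\
\frac{\partial S}{\partial z_{42}} &= -2z_{41} - 2z_{32} - 2z_{43} + 2\left(4.5 - 0.04\pi^2\right) z_{42} = 0 \\
\frac{\partial S}{\partial z_{43}} &= -2z_{42} - 2z_{33} - 2z_{44} + 2\left(4.222222 - 0.04\pi^2\right) z_{43} = 0 \\
\frac{\partial S}{\partial z_{44}} &= -2z_{43} - 2z_{34} - 2\sin\frac{4\pi}{5} + 2\left(4.125 - 0.04\pi^2\right) z_{44} = 0
\end{align*}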
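These sixteen equations appear to follow a single five-point pattern, which can be summarized compactly as follows (a sketch in the notation of this problem with $\Delta x = \Delta y = 0.2$; the shorthand $c_j$ is an assumption introduced here for illustration):

```latex
\[
\frac{\partial S}{\partial z_{ij}}
  = 2\left(c_j - 0.04\pi^2\right) z_{ij}
  - 2\left( z_{(i-1)j} + z_{(i+1)j} + z_{i(j-1)} + z_{i(j+1)} \right) = 0,
\qquad c_j = 4 + 0.08\,y_j^{-2},
\qquad i, j = 1, \cdots, 4
\]
```

Here any neighbour that lies on the boundary takes its boundary value, i.e. zero on the three sides $x = 0$, $x = 1$ and $y = 0$, and $\sin(\pi x_i)$ on the side $y = 1$ (which is where the terms $-2\sin(i\pi/5)$ in the equations for $z_{i4}$ come from).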
On solving this system of simultaneous equations we get:
z11 ≃ 0.0242152   z21 ≃ 0.0391810   z31 ≃ 0.0391810   z41 ≃ 0.0242152
z12 ≃ 0.0965504   z22 ≃ 0.1562218   z32 ≃ 0.1562218   z42 ≃ 0.0965504
z13 ≃ 0.2159231   z23 ≃ 0.3493709   z33 ≃ 0.3493709   z43 ≃ 0.2159231
z14 ≃ 0.3805110   z24 ≃ 0.6156797   z34 ≃ 0.6156797   z44 ≃ 0.3805110
On plotting these points with the boundary points (see the upper frame of Figure 82) alongside the
analytical solution (see the lower frame of Figure 82) which we obtained in Problem 8 of § 1.6 we see
that the two plots are almost identical.
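The linear system of this problem can also be assembled and solved by a short program. The following is a minimal sketch (not the code used for the results above), assuming NumPy is available; the function name `solve_problem7` and the flattening helper `idx` are hypothetical names chosen for illustration. Each equation is divided by 2, so the diagonal entry is $4 + \Delta x^2 \left(2y_j^{-2} - \pi^2\right)$ and each interior neighbour contributes $-1$.

```python
import numpy as np

def solve_problem7(m=4):
    """Solve the m*m linear system dS/dz_ij = 0 for the interior mesh values."""
    h = 1.0 / (m + 1)                       # Delta x = Delta y
    x = np.linspace(0.0, 1.0, m + 2)        # x_0, ..., x_{m+1}
    y = np.linspace(0.0, 1.0, m + 2)        # y_0, ..., y_{m+1}

    def idx(i, j):
        # flatten the interior node (i, j), 1 <= i, j <= m, to a row number
        return (j - 1) * m + (i - 1)

    A = np.zeros((m * m, m * m))
    b = np.zeros(m * m)
    for j in range(1, m + 1):
        for i in range(1, m + 1):
            r = idx(i, j)
            # diagonal: 4 + h^2 (2 y_j^{-2} - pi^2), e.g. 6 - 0.04 pi^2 for j = 1, h = 0.2
            A[r, r] = 4.0 + h**2 * (2.0 / y[j]**2 - np.pi**2)
            for ii, jj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 1 <= ii <= m and 1 <= jj <= m:
                    A[r, idx(ii, jj)] = -1.0           # interior neighbour
                elif jj == m + 1:
                    b[r] += np.sin(np.pi * x[ii])      # boundary z(x, 1) = sin(pi x)
                # the other three boundaries carry z = 0 and contribute nothing
    z = np.linalg.solve(A, b).reshape(m, m)            # z[j-1, i-1] holds z_ij
    return x, y, z

x, y, z = solve_problem7(4)
exact = np.outer(y[1:-1]**2, np.sin(np.pi * x[1:-1]))  # analytical z = y^2 sin(pi x)
print(np.round(z, 7))
print("largest deviation from the analytical solution:", np.abs(z - exact).max())
```

Row `j-1` of the printed array corresponds to the row of values $z_{1j}, \cdots, z_{4j}$ in the list above.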
8. Re-solve Problem 9 of § 1.6 using this time the finite difference method with m = n = 5 discretization
scheme (i.e. by inserting 5 evenly-spaced points between the boundaries and hence making 36 square
divisions).
Answer: From the required discretization scheme $m = n = 5$ and the boundary conditions $z(0, y) = 0$,
$z(x, 0) = -x^2$, $z(1, y) = y - 1$ and $z(x, 1) = x - x^2$ we get $\Delta x$ and $\Delta y$ and form the table, that is:
\[
\Delta x = \frac{x_{m+1} - x_0}{m + 1} = \frac{1 - 0}{5 + 1} = \frac{1}{6}
\qquad \text{and} \qquad
\Delta y = \frac{y_{n+1} - y_0}{n + 1} = \frac{1 - 0}{5 + 1} = \frac{1}{6}
\]
           x0 = 0    x1 = 1/6      x2 = 2/6      x3 = 3/6      x4 = 4/6       x5 = 5/6       x6 = 6/6
y0 = 0     z00 = 0   z10 = −1/36   z20 = −4/36   z30 = −9/36   z40 = −16/36   z50 = −25/36   z60 = −1
y1 = 1/6   z01 = 0   z11 = ?       z21 = ?       z31 = ?       z41 = ?        z51 = ?        z61 = −5/6
y2 = 2/6   z02 = 0   z12 = ?       z22 = ?       z32 = ?       z42 = ?        z52 = ?        z62 = −4/6
y3 = 3/6   z03 = 0   z13 = ?       z23 = ?       z33 = ?       z43 = ?        z53 = ?        z63 = −3/6
y4 = 4/6   z04 = 0   z14 = ?       z24 = ?       z34 = ?       z44 = ?        z54 = ?        z64 = −2/6
y5 = 5/6   z05 = 0   z15 = ?       z25 = ?       z35 = ?       z45 = ?        z55 = ?        z65 = −1/6
y6 = 6/6   z06 = 0   z16 = 5/36    z26 = 8/36    z36 = 9/36    z46 = 8/36     z56 = 5/36     z66 = 0
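On generalizing Eq. 148 to 2D (as we did in Problem 6; see Eqs. 150 and 151) and noting that
$m = n = 5$ and $F = z_x^2 + z_y^2 - 4z$ we get $S[z_{11}, \cdots, z_{55}]$, that is:
\begin{align*}
S &= \sum_{j=0}^{5} \sum_{i=0}^{5} \left[ z_{ij,x}^2 + z_{ij,y}^2 - 4z_{ij} \right] \Delta x \, \Delta y \\
  &= \sum_{j=0}^{5} \sum_{i=0}^{5} \left[ \left( \frac{z_{(i+1)j} - z_{ij}}{\Delta x} \right)^2 + \left( \frac{z_{i(j+1)} - z_{ij}}{\Delta y} \right)^2 - 4z_{ij} \right] \Delta x \, \Delta y \\
  &= \sum_{j=0}^{5} \sum_{i=0}^{5} \left[ \frac{z_{(i+1)j}^2 - 2z_{(i+1)j} z_{ij} + z_{ij}^2}{\Delta x \, \Delta x} + \frac{z_{i(j+1)}^2 - 2z_{i(j+1)} z_{ij} + z_{ij}^2}{\Delta y \, \Delta y} - 4z_{ij} \right] \Delta x \, \Delta y
\end{align*}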
Now, if we have ∆x = ∆y = 1/6 (as in our case) then this formula will simplify to the following:
\[
S = \sum_{j=0}^{5} \sum_{i=0}^{5} \left[ z_{(i+1)j}^2 - 2z_{(i+1)j} z_{ij} + z_{i(j+1)}^2 - 2z_{i(j+1)} z_{ij} + 2z_{ij}^2 - (z_{ij}/9) \right]
\]
[Plots omitted: two surface plots of z against x and y over the unit square (upper and lower frames).]
Figure 82: The upper frame is a plot of the points $z_{11}, \cdots, z_{44}$ which we obtained in Problem 7 of § 9 by
finite difference (as well as the boundary points) while the lower frame is a plot of the analytical solution
$z = y^2 \sin(\pi x)$ which we obtained in Problem 8 of § 1.6. For fair comparison, we use the same xy mesh
in both plots.
that is:
\begin{align*}
S ={} & z_{10}^2 - 2z_{10}z_{00} + z_{01}^2 - 2z_{01}z_{00} + 2z_{00}^2 - (z_{00}/9) \; + && (i = 0,\ j = 0) \\
& z_{20}^2 - 2z_{20}z_{10} + z_{11}^2 - 2z_{11}z_{10} + 2z_{10}^2 - (z_{10}/9) \; + && (i = 1,\ j = 0) \\
& z_{30}^2 - 2z_{30}z_{20} + z_{21}^2 - 2z_{21}z_{20} + 2z_{20}^2 - (z_{20}/9) \; + && (i = 2,\ j = 0) \\
& z_{40}^2 - 2z_{40}z_{30} + z_{31}^2 - 2z_{31}z_{30} + 2z_{30}^2 - (z_{30}/9) \; + && (i = 3,\ j = 0) \\
& z_{50}^2 - 2z_{50}z_{40} + z_{41}^2 - 2z_{41}z_{40} + 2z_{40}^2 - (z_{40}/9) \; + && (i = 4,\ j = 0) \\
& z_{60}^2 - 2z_{60}z_{50} + z_{51}^2 - 2z_{51}z_{50} + 2z_{50}^2 - (z_{50}/9) \; + && (i = 5,\ j = 0) \\
& z_{11}^2 - 2z_{11}z_{01} + z_{02}^2 - 2z_{02}z_{01} + 2z_{01}^2 - (z_{01}/9) \; + && (i = 0,\ j = 1) \\
& z_{21}^2 - 2z_{21}z_{11} + z_{12}^2 - 2z_{12}z_{11} + 2z_{11}^2 - (z_{11}/9) \; + && (i = 1,\ j = 1) \\
& z_{31}^2 - 2z_{31}z_{21} + z_{22}^2 - 2z_{22}z_{21} + 2z_{21}^2 - (z_{21}/9) \; + && (i = 2,\ j = 1) \\
& z_{41}^2 - 2z_{41}z_{31} + z_{32}^2 - 2z_{32}z_{31} + 2z_{31}^2 - (z_{31}/9) \; + && (i = 3,\ j = 1) \\
& z_{51}^2 - 2z_{51}z_{41} + z_{42}^2 - 2z_{42}z_{41} + 2z_{41}^2 - (z_{41}/9) \; + && (i = 4,\ j = 1) \\
& z_{61}^2 - 2z_{61}z_{51} + z_{52}^2 - 2z_{52}z_{51} + 2z_{51}^2 - (z_{51}/9) \; + && (i = 5,\ j = 1) \\
& z_{12}^2 - 2z_{12}z_{02} + z_{03}^2 - 2z_{03}z_{02} + 2z_{02}^2 - (z_{02}/9) \; + && (i = 0,\ j = 2) \\
& z_{22}^2 - 2z_{22}z_{12} + z_{13}^2 - 2z_{13}z_{12} + 2z_{12}^2 - (z_{12}/9) \; + && (i = 1,\ j = 2) \\
& z_{32}^2 - 2z_{32}z_{22} + z_{23}^2 - 2z_{23}z_{22} + 2z_{22}^2 - (z_{22}/9) \; + && (i = 2,\ j = 2) \\
& z_{42}^2 - 2z_{42}z_{32} + z_{33}^2 - 2z_{33}z_{32} + 2z_{32}^2 - (z_{32}/9) \; + && (i = 3,\ j = 2) \\
& z_{52}^2 - 2z_{52}z_{42} + z_{43}^2 - 2z_{43}z_{42} + 2z_{42}^2 - (z_{42}/9) \; + && (i = 4,\ j = 2) \\
& z_{62}^2 - 2z_{62}z_{52} + z_{53}^2 - 2z_{53}z_{52} + 2z_{52}^2 - (z_{52}/9) \; + && (i = 5,\ j = 2) \\
& z_{13}^2 - 2z_{13}z_{03} + z_{04}^2 - 2z_{04}z_{03} + 2z_{03}^2 - (z_{03}/9) \; + && (i = 0,\ j = 3) \\
& z_{23}^2 - 2z_{23}z_{13} + z_{14}^2 - 2z_{14}z_{13} + 2z_{13}^2 - (z_{13}/9) \; + && (i = 1,\ j = 3) \\
& z_{33}^2 - 2z_{33}z_{23} + z_{24}^2 - 2z_{24}z_{23} + 2z_{23}^2 - (z_{23}/9) \; + && (i = 2,\ j = 3) \\
& z_{43}^2 - 2z_{43}z_{33} + z_{34}^2 - 2z_{34}z_{33} + 2z_{33}^2 - (z_{33}/9) \; + && (i = 3,\ j = 3) \\
& z_{53}^2 - 2z_{53}z_{43} + z_{44}^2 - 2z_{44}z_{43} + 2z_{43}^2 - (z_{43}/9) \; + && (i = 4,\ j = 3) \\
& z_{63}^2 - 2z_{63}z_{53} + z_{54}^2 - 2z_{54}z_{53} + 2z_{53}^2 - (z_{53}/9) \; + && (i = 5,\ j = 3) \\
& z_{14}^2 - 2z_{14}z_{04} + z_{05}^2 - 2z_{05}z_{04} + 2z_{04}^2 - (z_{04}/9) \; + && (i = 0,\ j = 4) \\
& z_{24}^2 - 2z_{24}z_{14} + z_{15}^2 - 2z_{15}z_{14} + 2z_{14}^2 - (z_{14}/9) \; + && (i = 1,\ j = 4) \\
& z_{34}^2 - 2z_{34}z_{24} + z_{25}^2 - 2z_{25}z_{24} + 2z_{24}^2 - (z_{24}/9) \; + && (i = 2,\ j = 4) \\
& z_{44}^2 - 2z_{44}z_{34} + z_{35}^2 - 2z_{35}z_{34} + 2z_{34}^2 - (z_{34}/9) \; + && (i = 3,\ j = 4) \\
& z_{54}^2 - 2z_{54}z_{44} + z_{45}^2 - 2z_{45}z_{44} + 2z_{44}^2 - (z_{44}/9) \; + && (i = 4,\ j = 4) \\
& z_{64}^2 - 2z_{64}z_{54} + z_{55}^2 - 2z_{55}z_{54} + 2z_{54}^2 - (z_{54}/9) \; + && (i = 5,\ j = 4) \\
& z_{15}^2 - 2z_{15}z_{05} + z_{06}^2 - 2z_{06}z_{05} + 2z_{05}^2 - (z_{05}/9) \; + && (i = 0,\ j = 5) \\
& z_{25}^2 - 2z_{25}z_{15} + z_{16}^2 - 2z_{16}z_{15} + 2z_{15}^2 - (z_{15}/9) \; + && (i = 1,\ j = 5) \\
& z_{35}^2 - 2z_{35}z_{25} + z_{26}^2 - 2z_{26}z_{25} + 2z_{25}^2 - (z_{25}/9) \; + && (i = 2,\ j = 5) \\
& z_{45}^2 - 2z_{45}z_{35} + z_{36}^2 - 2z_{36}z_{35} + 2z_{35}^2 - (z_{35}/9) \; + && (i = 3,\ j = 5) \\
& z_{55}^2 - 2z_{55}z_{45} + z_{46}^2 - 2z_{46}z_{45} + 2z_{45}^2 - (z_{45}/9) \; + && (i = 4,\ j = 5) \\
& z_{65}^2 - 2z_{65}z_{55} + z_{56}^2 - 2z_{56}z_{55} + 2z_{55}^2 - (z_{55}/9) && (i = 5,\ j = 5)
\end{align*}
On eliminating the vanishing terms (from the zero boundary conditions) and simplifying we get:
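\begin{align*}
S ={} & 3z_{10}^2 + 4z_{11}^2 + 4z_{12}^2 + 4z_{13}^2 + 4z_{14}^2 + 4z_{15}^2 + z_{16}^2 \; + \\
& 3z_{20}^2 + 4z_{21}^2 + 4z_{22}^2 + 4z_{23}^2 + 4z_{24}^2 + 4z_{25}^2 + z_{26}^2 \; + \\
& 3z_{30}^2 + 4z_{31}^2 + 4z_{32}^2 + 4z_{33}^2 + 4z_{34}^2 + 4z_{35}^2 + z_{36}^2 \; + \\
& 3z_{40}^2 + 4z_{41}^2 + 4z_{42}^2 + 4z_{43}^2 + 4z_{44}^2 + 4z_{45}^2 + z_{46}^2 \; + \\
& 3z_{50}^2 + 4z_{51}^2 + 4z_{52}^2 + 4z_{53}^2 + 4z_{54}^2 + 4z_{55}^2 + z_{56}^2 \; + \\
& z_{60}^2 + z_{61}^2 + z_{62}^2 + z_{63}^2 + z_{64}^2 + z_{65}^2 \; + \cdots
\end{align*}
where the dots stand for the remaining (non-vanishing) cross terms $-2z_{(i+1)j} z_{ij} - 2z_{i(j+1)} z_{ij}$ and linear
terms $-(z_{ij}/9)$ of the expansion (with $i = 1, \cdots, 5$ and $j = 0, \cdots, 5$).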
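On taking the partial derivatives of $S$ with respect to $z_{11}, \cdots, z_{55}$ and setting the results to zero we
get 25 equations of the same form; those for $z_{51}, \cdots, z_{55}$ are:
\begin{align*}
\frac{\partial S}{\partial z_{51}} &= 8z_{51} - 2z_{50} - 2z_{41} - 2z_{61} - 2z_{52} - (1/9) = 0 \\
\frac{\partial S}{\partial z_{52}} &= 8z_{52} - 2z_{51} - 2z_{42} - 2z_{62} - 2z_{53} - (1/9) = 0 \\
\frac{\partial S}{\partial z_{53}} &= 8z_{53} - 2z_{52} - 2z_{43} - 2z_{63} - 2z_{54} - (1/9) = 0 \\
\frac{\partial S}{\partial z_{54}} &= 8z_{54} - 2z_{53} - 2z_{44} - 2z_{64} - 2z_{55} - (1/9) = 0 \\
\frac{\partial S}{\partial z_{55}} &= 8z_{55} - 2z_{54} - 2z_{45} - 2z_{65} - 2z_{56} - (1/9) = 0
\end{align*}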
On solving this system of simultaneous equations (after inserting the numeric values of the boundary
points, i.e. $z_{10} = -1/36$, $z_{20} = -4/36$, etc.) we get:
z11 = 0      z21 = −1/18   z31 = −1/6    z41 = −1/3   z51 = −5/9
z12 = 1/36   z22 = 0       z32 = −1/12   z42 = −2/9   z52 = −5/12
z13 = 1/18   z23 = 1/18    z33 = 0       z43 = −1/9   z53 = −5/18
z14 = 1/12   z24 = 1/9     z34 = 1/12    z44 = 0      z54 = −5/36
z15 = 1/9    z25 = 1/6     z35 = 1/6     z45 = 1/9    z55 = 0
In fact, these values are identical to the values of the analytical solution (i.e. $z = xy - x^2$) at these
points and hence we do not need to compare by plotting.
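A short check of why the agreement here is exact rather than approximate may be helpful (a sketch, assuming that every stationarity equation has the five-point form shown for $z_{51}, \cdots, z_{55}$, and writing $h = \Delta x = \Delta y = 1/6$, where $h$ is a shorthand introduced only for this remark). Rearranging such an equation and dividing by $2h^2$ gives a discrete form of $z_{xx} + z_{yy} = -2$, and the second differences of the quadratic $z = xy - x^2$ contain no truncation error:

```latex
\begin{align*}
\frac{z_{(i+1)j} + z_{(i-1)j} + z_{i(j+1)} + z_{i(j-1)} - 4z_{ij}}{h^2}
  &= -\frac{1/9}{2h^2} = -\frac{1/9}{2/36} = -2 \\
z_{(i+1)j} + z_{(i-1)j} - 2z_{ij}
  &= -\left[ (x_i + h)^2 + (x_i - h)^2 - 2x_i^2 \right] = -2h^2 \\
z_{i(j+1)} + z_{i(j-1)} - 2z_{ij}
  &= x_i \left[ (y_j + h) + (y_j - h) - 2y_j \right] = 0
\end{align*}
```

The last two lines (evaluated for $z = xy - x^2$) add up to $-2h^2$, so the analytical solution satisfies the first line exactly at every interior mesh point.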
9. Make a brief comparison between the Rayleigh-Ritz method (of chapter 8) and the finite difference
method (of chapter 9).
Answer: We may note the following:
• The cost of the finite difference method scales up as the mesh is refined (i.e. as the number of mesh
points, and hence of unknowns and simultaneous equations, increases) while the cost of the Rayleigh-Ritz
method is independent of the mesh size since it is based on obtaining a closed form. However, the
Rayleigh-Ritz method also “scales up” as higher approximations are pursued (although this scaling up is
not related to the mesh size).
• The finite difference method is essentially based on algebra while the Rayleigh-Ritz method is
essentially based on calculus (integration). The advantage of algebra over calculus is its simplicity (and
hence the general availability of a solution) while its disadvantage is that it is normally very lengthy and
messy.
• Each method is flexible with certain types of boundary conditions and inflexible with others. So, each
method has its advantages and disadvantages in this regard.
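The first point can be illustrated with a small experiment. The following is a minimal sketch (an illustration only, not code from this book), assuming NumPy is available and reusing the functional of Problem 7 of this chapter with its analytical solution $z = y^2 \sin(\pi x)$; the function name `fd_max_error` is hypothetical. It builds the same five-point system with Kronecker products for an arbitrary $m = n$ and reports the number of unknowns alongside the largest deviation from the analytical solution.

```python
import numpy as np

def fd_max_error(m):
    """Return (number of unknowns, max error) for an m-by-m interior mesh."""
    h = 1.0 / (m + 1)
    t = np.arange(1, m + 1) * h                  # interior coordinates (same in x and y)
    T = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)   # tridiag(-1, 2, -1)
    I = np.eye(m)
    q = 2.0 / t**2 - np.pi**2                    # coefficient (2 y^-2 - pi^2) at each y_j
    # unknowns ordered row by row in y: x-coupling, y-coupling, then the h^2 q_j diagonal part
    A = np.kron(I, T) + np.kron(T, I) + np.kron(np.diag(h**2 * q), I)
    b = np.zeros(m * m)
    b[-m:] = np.sin(np.pi * t)                   # boundary z(x, 1) = sin(pi x) feeds the last row
    z = np.linalg.solve(A, b).reshape(m, m)
    exact = np.outer(t**2, np.sin(np.pi * t))    # analytical z = y^2 sin(pi x)
    return m * m, np.abs(z - exact).max()

for m in (4, 9, 19):
    unknowns, err = fd_max_error(m)
    print(f"m = n = {m:2d}: {unknowns:3d} unknowns, max error = {err:.2e}")
```

The number of unknowns grows as $m^2$, which is the cost that the first point refers to, while the error is expected to shrink as the mesh is refined.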
Chapter 10
Hybrid Methods
Hybrid methods are variational (and optimization) methods that are based on combining and mixing
other (simpler) methods. The title of this chapter is deliberately generic and hence it can include many
methods which combine some of the previously-investigated methods (and possibly other methods). For
example, we can combine the Rayleigh-Ritz method with a deterministic or stochastic computational
technique to obtain a Rayleigh-Ritz numerical method (which is a hybrid method since it can be classified
under chapter 8 and under chapter 9). In this case, the objective of the computational (numeric) method
is to optimize the parameters of the Rayleigh-Ritz method (i.e. the $c_i$'s in the case of 1D, the $c_{ij}$'s in
the case of 2D, and similarly for higher dimensions) in one go using an optimization algorithm based for
example on the conjugate gradient, Nelder-Mead, quasi-Newton or simulated annealing methods.[142]
Hybrid methods are generally very useful, flexible and practical and hence they represent an effective and
powerful tool in the investigations and applications of the mathematics of variation in general (and the
calculus of variations in particular). Moreover, they are usually associated with computer codes and
packages and hence they are generally easy to use and easy to adapt and transform. Also, their running
cost is usually small because once the computer code or package is created or acquired it can be reused
any number of times with minimal time and effort (the computer does most of the hard work).
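The Rayleigh-Ritz numerical method described above can be sketched as follows. This is a minimal illustration only, assuming NumPy and SciPy are available. The functional $I[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx$ with $y(0) = y(1) = 0$ and the trial function $y = x(1 - x)(c_1 + c_2 x)$ are stand-ins chosen for this sketch (they are not taken from the problems of this book), and `functional` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.optimize import minimize

def functional(c):
    """I(c1, c2) for the trial function y = x (1 - x) (c1 + c2 x) on [0, 1]."""
    c1, c2 = c
    y = Polynomial([0.0, c1, c2 - c1, -c2])     # c1 x + (c2 - c1) x^2 - c2 x^3
    x = Polynomial([0.0, 1.0])
    integrand = y.deriv()**2 - y**2 - 2.0 * x * y
    antiderivative = integrand.integ()          # exact integration of a polynomial
    return antiderivative(1.0) - antiderivative(0.0)

# Pass I to general-purpose optimizers instead of differentiating it by hand.
results = {}
for method in ("Nelder-Mead", "BFGS", "CG"):
    res = minimize(functional, x0=[0.0, 0.0], method=method)
    results[method] = res.x
    print(f"{method:12s} c1 = {res.x[0]:.6f}, c2 = {res.x[1]:.6f}")

# For reference: solving dI/dc1 = dI/dc2 = 0 by hand gives c1 = 71/369 and c2 = 7/41.
print("analytical   c1 = %.6f, c2 = %.6f" % (71 / 369, 7 / 41))
```

Here the same expression of $I$ is handed to three different optimizers (direct search, quasi-Newton and conjugate gradient), so switching the optimization technique only requires changing the `method` argument.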
Problems
1. It is suggested in the text that the Rayleigh-Ritz method can be used in conjunction with a computer
algorithm to obtain the parameters numerically. Suggest an alternative (hybrid) method in which the
Rayleigh-Ritz method is used.
Answer: For example, the analytical approach (which we explained and demonstrated in chapter
8) can be automated (stage by stage) to obtain an automated analytical (or symbolic) Rayleigh-Ritz
method (rather than the numeric Rayleigh-Ritz method which we suggested above). However, whether
this counts as a hybrid method may be disputed unless the automation is based on a different approach
for obtaining the parameters.
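An automated symbolic Rayleigh-Ritz procedure of this kind might look like the following minimal sketch, assuming the SymPy library is available. The functional $I[y] = \int_0^1 (y'^2 - y^2 - 2xy)\,dx$ with $y(0) = y(1) = 0$ and the two-parameter trial function are stand-ins chosen for illustration (they are not taken from chapter 8), and all variable names are hypothetical.

```python
import sympy as sp

x, c1, c2 = sp.symbols("x c1 c2")

# Stage 1: trial function that satisfies the boundary conditions y(0) = y(1) = 0
y = x * (1 - x) * (c1 + c2 * x)

# Stage 2: insert it into the functional and integrate symbolically
integrand = sp.diff(y, x)**2 - y**2 - 2 * x * y
I_c = sp.integrate(sp.expand(integrand), (x, 0, 1))

# Stage 3: differentiate with respect to the parameters and solve exactly
equations = [sp.diff(I_c, c) for c in (c1, c2)]
solution = sp.solve(equations, (c1, c2), dict=True)[0]

print(solution)                      # exact rational values of c1 and c2
print(sp.factor(y.subs(solution)))   # the resulting approximation in closed form
```

Each stage mirrors a step that would otherwise be carried out by hand, and the parameters come out as exact rational numbers rather than floating point values.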
2. Re-solve Problem 5 of § 8 using the $y_5$ Rayleigh-Ritz approximation and employing a numerical
optimizer to obtain the parameters $c_1, \cdots, c_5$ (instead of differentiating $I$ and solving the resulting
equations analytically).
Answer: We solved this Problem following a similar method to that used in Problem 5 of § 8 but
instead of differentiating $I$ and solving the resulting equations analytically we passed the expression
of $I$ to a numerical optimizer (to optimize $I$ for the parameters $c_1, \cdots, c_5$) and we obtained:
$c_1 \simeq 0.1680197269$, $c_2 \simeq -0.0522179496$, $c_3 \simeq 0.0062854969$, $c_4 \simeq 0.0041982436$ and
$c_5 \simeq 0.0003880377$. These values are very close to the values obtained in Problem 5 of § 8. In fact,
these values produce identical results (up to the sixth decimal place of the values of $y_5$) and an
identical plot.
3. Re-solve Problem 9 of § 8 using the $z_{33}$ Rayleigh-Ritz approximation and employing a numerical
optimizer to obtain the parameters $c_{11}, \cdots, c_{33}$ (instead of differentiating $I$ and solving the resulting
equations analytically).
Answer: We solved this Problem following a similar method to that used in Problem 9 of § 8 but
instead of differentiating $I$ and solving the resulting equations analytically we passed the expression of
$I$ to a numerical optimizer (to optimize $I$ for the parameters $c_{11}, \cdots, c_{33}$) and we obtained: $c_{11} = 4.0$,
$c_{12} = -5.25$, $c_{13} = 5.25$, $c_{21} = -5.25$, $c_{22} = 15.75$, $c_{23} = -15.75$, $c_{31} = 5.25$, $c_{32} = -15.75$ and
$c_{33} = 15.75$. These values are identical to the values obtained in Problem 9 of § 8 (noting the
correspondence between $a, b, c, f, g, h, i, j, k$ and $c_{11}, c_{12}, c_{13}, c_{21}, c_{22}, c_{23}, c_{31}, c_{32}, c_{33}$).
[142] In fact, I used some of these (or similar) methods in my past research in fluid mechanics. However, these methods are
associated with computer codes and hence they are beyond the scope of this book.
Author Notes
• All copyrights of this book are held by the author.
• This book, like any other academic document, is protected by the terms and conditions of the universally
recognized intellectual property rights. Hence, any quotation or use of any part of the book should be
acknowledged and cited according to the scholarly approved traditions.
• This book is totally made and prepared by the author including all the graphic illustrations, indexing,
typesetting, book cover, and overall design.