
Applied Mathematics for Scientists and Engineers

Mathematicians, physicists, engineers, biologists, and other scientists who study related
fields frequently use differential equations, linear algebra, calculus of variations, and integral
equations. The purpose of Applied Mathematics for Scientists and Engineers is to provide a con-
cise and well-organized study of the theoretical foundations for the development of mathemat-
ics and problem-solving methods. A wide range of solution strategies is shown for real-world
challenges. The author’s main objective is to provide as many examples as possible to help make
the theory reflected in the theorems more understandable. The book’s five chapters can be used
to create a one-semester course as well as for self-study. The only prerequisites are a basic un-
derstanding of calculus and differential equations.

The five main topics include:

• Ordinary differential equations


• Partial differential equations
• Matrices and systems of linear equations
• Calculus of variations
• Integral equations
The author strikes a balance between rigor and accessibility, presenting very challenging content in a simple format by adopting approachable notation and using numerous examples to clarify
complex themes. Exercises are included at the end of each section. They range from simple
computations to more challenging problems.
Textbooks in Mathematics
Series editors:
Al Boggess, Kenneth H. Rosen

Abstract Algebra
A First Course, Second Edition
Stephen Lovett

Multiplicative Differential Calculus
Svetlin Georgiev, Khaled Zennir

Applied Differential Equations
The Primary Course
Vladimir A. Dobrushkin

Introduction to Computational Mathematics: An Outline
William C. Bauldry

Mathematical Modeling the Life Sciences
Numerical Recipes in Python and MATLAB™
N. G. Cogan

Classical Analysis
An Approach through Problems
Hongwei Chen

Classical Vector Algebra
Vladimir Lepetic

Introduction to Number Theory
Mark Hunacek

Probability and Statistics for Engineering and the Sciences with Modeling using R
William P. Fox and Rodney X. Sturdivant

Computational Optimization: Success in Practice
Vladislav Bukshtynov

Computational Linear Algebra: with Applications and MATLAB Computations
Robert E. White

Linear Algebra With Machine Learning and Data
Crista Arangala

Discrete Mathematics with Coding
Hugo D. Junghenn

Applied Mathematics for Scientists and Engineers
Youssef N. Raffoul

https://www.routledge.com/Textbooks-in-Mathematics/book-series/CANDHTEXBOOMTH
Applied Mathematics for Scientists and Engineers

Youssef N. Raffoul
First edition published 2024
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press


4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Youssef N. Raffoul

Reasonable efforts have been made to publish reliable data and information, but the author and pub-
lisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so
we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-58257-3 (hbk)


ISBN: 978-1-032-58394-5 (pbk)
ISBN: 978-1-003-44988-1 (ebk)

DOI: 10.1201/9781003449881

Typeset in Nimbus Roman font by KnowledgeWorks Global Ltd.

Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
Dedication
To my beautiful and adorable granddaughter
Aurora Jane Palmore
Contents

Preface xi

Author xv

1 Ordinary Differential Equations 1


1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Separable Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Exact Differential Equations . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 Integrating factor . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Linear Differential Equations . . . . . . . . . . . . . . . . . . . . . 14
1.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5 Homogeneous Differential Equations . . . . . . . . . . . . . . . . . 16
1.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Bernoulli Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7 Higher-Order Differential Equations . . . . . . . . . . . . . . . . . 21
1.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8 Equations with Constant Coefficients . . . . . . . . . . . . . . . . . 25
1.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.9 Nonhomogeneous Equations . . . . . . . . . . . . . . . . . . . . . 30
1.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.10 Wronskian Method . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.10.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.11 Cauchy-Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . 36
1.11.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

2 Partial Differential Equations 40


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.2.1 Linear equations with constant coefficients . . . . . . . . . 45
2.2.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2.3 Equations with variable coefficients . . . . . . . . . . . . . 50
2.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 53


2.3 Quasi-Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . 53


2.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.4 Burgers' Equation . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.4.1 Shock path . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.5 Second-Order PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.6 Wave Equation and D’Alembert’s Solution . . . . . . . . . . . . . . 79
2.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.6.2 Vibrating string with fixed ends . . . . . . . . . . . . . . . 93
2.6.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.7 Heat Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.7.1 Solution of the heat equation . . . . . . . . . . . . . . . . . 104
2.7.2 Heat equation on semi-infinite domain: Dirichlet condition . 111
2.7.3 Heat equation on semi-infinite domain: Neumann condition 115
2.7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
2.8 Wave Equation on Semi-Infinite Domain . . . . . . . . . . . . . . . 120
2.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

3 Matrices and Systems of Linear Equations 126


3.1 Systems of Equations and Gaussian Elimination . . . . . . . . . . . 126
3.2 Homogeneous Systems . . . . . . . . . . . . . . . . . . . . . . . . 130
3.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.4 Determinants and Inverse of Matrices . . . . . . . . . . . . . . . . 136
3.4.1 Application to least square fitting . . . . . . . . . . . . . . 147
3.4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
3.5 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
3.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
3.6 Eigenvalues-Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . 160
3.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.7 Inner Product Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
3.8 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
3.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
3.9 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
3.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3.10 Functions of Symmetric Matrices . . . . . . . . . . . . . . . . . . . 195
3.10.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

4 Calculus of Variations 203


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
4.2 Euler-Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . 206
4.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

4.3 Impact of y′ on Euler-Lagrange Equation . . . . . . . . . . . . . . . 220


4.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
4.4 Necessary and Sufficient Conditions . . . . . . . . . . . . . . . . . 223
4.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
4.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
4.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
4.6 Generalization of Euler-Lagrange Equation . . . . . . . . . . . . . 246
4.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
4.7 Natural Boundary Conditions . . . . . . . . . . . . . . . . . . . . . 252
4.8 Impact of y′′ on Euler-Lagrange Equation . . . . . . . . . . . . . . 261
4.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
4.9 Discontinuity in Euler-Lagrange Equation . . . . . . . . . . . . . . 263
4.9.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
4.10 Transversality Condition . . . . . . . . . . . . . . . . . . . . . . . 267
4.10.1 Problem of Bolza . . . . . . . . . . . . . . . . . . . . . . . 271
4.10.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
4.11 Corners and Broken Extremal . . . . . . . . . . . . . . . . . . . . 275
4.11.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
4.12 Variational Problems with Constraints . . . . . . . . . . . . . . . . 282
4.12.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
4.13 Isoperimetric Problems . . . . . . . . . . . . . . . . . . . . . . . . 287
4.13.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
4.14 Sturm-Liouville Problem . . . . . . . . . . . . . . . . . . . . . . . 298
4.14.1 The First Eigenvalue . . . . . . . . . . . . . . . . . . . . . 302
4.14.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
4.15 Rayleigh Ritz Method . . . . . . . . . . . . . . . . . . . . . . . . . 310
4.15.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
4.16 Multiple Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
4.16.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

5 Integral Equations 321


5.1 Introduction and Classifications . . . . . . . . . . . . . . . . . . . 321
5.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
5.2 Connection between Ordinary Differential Equations and Integral
Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
5.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
5.3 The Green’s Function . . . . . . . . . . . . . . . . . . . . . . . . . 334
5.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
5.4 Fredholm Integral Equations and Green’s Function . . . . . . . . . 341
5.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
5.4.2 Beam problem . . . . . . . . . . . . . . . . . . . . . . . . 343
5.4.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
5.5 Fredholm Integral Equations with Separable Kernels . . . . . . . . 348
5.5.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
5.6 Symmetric Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

5.6.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 363


5.7 Iterative Methods and Neumann Series . . . . . . . . . . . . . . . . 365
5.7.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
5.8 Approximating Non-Degenerate Kernels . . . . . . . . . . . . . . . 378
5.8.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
5.9 Laplace Transform and Integral Equations . . . . . . . . . . . . . . 381
5.9.1 Frequently used Laplace transforms . . . . . . . . . . . . . 387
5.9.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
5.10 Odd Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
5.10.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

Appendices 397

A Fourier Series 399


A.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
A.2 Finding the Fourier Coefficients . . . . . . . . . . . . . . . . . . . 401
A.3 Even and Odd Extensions . . . . . . . . . . . . . . . . . . . . . . . 404
A.4 Applications of Fourier Series . . . . . . . . . . . . . . . . . . . . 408
A.5 Laplacian in Polar, Cylindrical and Spherical Coordinates . . . . . . 410

Bibliography 417

Index 421
Preface

The author is very excited to share his book with you, and hopes you will find it
beneficial in broadening your education and advancing your career. The main objec-
tive of this book is to give the reader a thorough understanding of the basic ideas
and techniques of applied mathematics as they are employed in various engineering
fields. Topics such as differential equations, linear algebra, the calculus of variations,
and integral equations are fundamental to scientists, physicists, and engineers. The
book emphasizes both the theory and its applications. It incorporates engineering
applications throughout, and in line with that idea, derivations of the mathemat-
ical models of numerous physical systems are presented to familiarize the reader
with the foundational ideas of applied mathematics and its applications to real-world
problems.
For the last twenty-four years, the author has been teaching a graduate course in
applied mathematics for graduate students majoring in mathematics, physics, and
engineering at the University of Dayton. The course covered various topics in differ-
ential equations, linear algebra, calculus of variations, and integral equations. As a
result, the author’s lecture notes eventually became the basis for this book.
The book is self-contained, and no knowledge beyond an undergraduate course on
ordinary differential equations is required. A couple of sections of Chapters 4 and 5 require knowledge of Fourier series; to fill this gap, an appendix on Fourier series is included. The book should serve as a one-semester graduate textbook exploring the the-
ory and applications of topics in applied mathematics. Educators have the flexibility
to design their own three-chapter, one-semester course from Chapters 2–5. The first
chapter is intended as a review on the subject of ordinary differential equations, and
we refer to particular sections of it in later chapters when dealing with calculus of
variations and integral equations. While writing the book, the author made every effort to balance rigor with a presentation of the most difficult subjects in elementary language, in order to make it accessible to a wide variety of readers.
The author’s main objective was to provide as many examples as possible to help
make the theory reflected in the theorems more understandable. The purpose of the
book is to provide a concise and well-organized study of the theoretical foundations
for the development of mathematics and problem-solving methods. This book’s text
is organized in a way that is both very readable and mathematically sound. A wide range of solution strategies is shown for a number of real-world challenges.
The author’s presentational manner and style have a big impact on how this book
develops mathematically and pedagogically. Some of the concepts from the extensive


and well-established literature on many applied mathematics topics found their way
into this book. Whenever possible, the author tried to deal with concepts in a more
conversational way, copiously illustrated by 165 completely worked-out examples.
Where appropriate, concepts and theories are depicted in 83 figures.
Exercises are a crucial component of the course’s learning tool and are included
at the end of each section. They range from simple computations to more challenging problems. Before starting the exercises, students must read the
mathematics in the pertinent section. The book is divided into five chapters and an
appendix.
Chapter 1 is a review of ordinary differential equations and is not intended to be for-
mally covered by the instructor. It is recommended that students become acquainted
with it before proceeding to the following chapters. The main reason for including
Chapter 1 is that by the time students take a graduate course in applied mathematics,
they have already forgotten most techniques for solving ordinary differential equa-
tions. In addition, it will save class time by not formally reviewing such topics but
rather asking the students to read them beforehand. The chapter covers first-order and
higher-order differential equations. It also includes a section on the Cauchy-Euler
equation, which plays a significant role in Chapters 4 and 5.
The second chapter is devoted to the study of partial differential equations, with
the majority of the content aimed toward graduate students pursuing engineering
degrees. The chapter begins with linear equations with constant and variable coef-
ficients and then moves on to quasi-linear equations. Burgers' equation plays an important role in the chapter, as do second-order partial differential equations and
homogeneous and nonhomogeneous wave equations.
The third chapter discusses matrices and systems of linear equations. Gauss elimina-
tion, matrix algebra, vector spaces, and eigenvalues and eigenvectors are all covered.
The chapter concludes with an examination of inner product spaces, diagonalization,
quadratic forms, and functions of symmetric matrices.
Chapter 4 delves deeply into fundamental themes in the calculus of variations in a
functional analytic environment. The calculus of variations is concerned with the op-
timization of functionals over a set of competing objects. We begin by deriving the
Euler-Lagrange necessary condition and generalizing the concept to functionals with
higher derivatives or with multiple variables. We provide a nice discussion on the the-
ory behind sufficient conditions. Some of the topics are generalized to isoperimetric
problems and functionals with constraints. Toward the end of the chapter, we closely
examine the connection between the Sturm-Liouville problem and the calculus of
variations. We end the chapter with the Rayleigh-Ritz method and the development
of Euler-Lagrange to allow variational computation of multiple integrals.
Chapter 5 is solely devoted to the study of Fredholm and Volterra integral equations.
The chapter begins by introducing integral equations and the connections between
them and ordinary differential equations. The development of Green's function plays an important role in the chapter. It is used to classify kernels, which in turn leads

us to the appropriate approach for finding solutions. This includes integral equations
with symmetric kernels or degenerate kernels. Toward the end of the chapter, we
develop iterative methods and the Neumann series. We briefly discuss ways of ap-
proximating non-degenerate kernels and the use of the Laplace transform in solving
integral equations of convolution types. Since not all integral equations can be re-
duced to differential equations, one should expect odd behavior from solutions. For
such reasons, we devote the last section of the chapter to the qualitative analysis of
solutions using fixed point theory and the Liapunov direct method.
Appendix A covers the basic topics of Fourier series. We briefly discuss Fourier se-
ries expansion, including sine and cosine, and the corresponding relations to periodic
odd extension and periodic even extension. We provide applications to the heat prob-
lem in a finite slab by utilizing the concept of separation of variables. We transform
the Laplacian equation in different dimensions to polar, cylindrical, and spherical
coordinates. We end this appendix by studying the Laplacian equation in circular do-
mains, such as the annulus. Materials in this section will be useful in several places
in the book, especially Chapters 2, 4, and 5.
The author owes a debt of gratitude to Drs. Sam Brensinger and George Todd for
reading the first and third chapters, respectively, and for their insightful remarks and
recommendations. The author would also like to express his gratitude to the hundreds of graduate students at the University of Dayton who, over the course of the last 22 years, helped polish and refine the lecture notes so that a significant portion of them made it into this book.
This book would not exist without the encouragement and support of my wife, my
children Hannah, Paul, Joseph, and Daniel, and my brother Melhem.
Youssef N. Raffoul
University of Dayton
Dayton, Ohio
July, 2023
Author

Youssef N. Raffoul is Professor and Graduate Program Director at the University of


Dayton. After receiving his PhD in mathematics from Southern Illinois University, he
joined the faculty at Tougaloo College in Mississippi, serving as Department Chair.
Prof. Raffoul has published 160 articles in prestigious journals in the areas of functional differential equations, difference equations, and dynamical systems on time scales. He
is twice recipient of the University of Dayton College of Arts and Sciences’ Award
for Outstanding Scholarship as well as recipient of the University of Dayton Alumni
Award in Scholarship. He was honored by the Lebanese government with the Career
in Science Award. The Archbishop of Lebanon awarded him the Lifetime Achieve-
ment Award. Most notably, he is the recipient of the Order of Merit, Silver Medal
with Distinction, presented to him by President General Michel Aoun.

1
Ordinary Differential Equations

In this chapter, we briefly go over elementary topics from ordinary differential equa-
tions that we will need in later chapters. The chapter provides the foundations to
assist students in learning not only how to read and understand differential equa-
tions but also how to read technical material in more advanced-setting texts as they
progress through their studies. We discuss basic topics in first-order differential equa-
tions, including separable and exact equations and the variation of parameters for-
mula. We provide an application to the spread of infections. At the end of the chapter, we study
higher-order differential equations and some of their theoretical aspects. The chapter
is not intended to be taught as a part of a graduate course but rather as a reference for
the students for later chapters.

1.1 Preliminaries
Let I be an interval of the real numbers R and consider the function f : I → R. For
x0 ∈ I, the derivative of f at x0 is

f ′(x0) = lim_{h→0} [f(x0 + h) − f(x0)]/h (1.1)
provided the limit exists. When the limit exists, we say that f is differentiable at
x0 . The term f ′ (x0 ) is the instantaneous rate of change of the function f at x0 . If x0
is one of the endpoints of the interval I, then the above definition of the derivative
becomes a one-sided derivative. If f ′ (x0 ) exists at every point x0 ∈ I, then we say
f is differentiable on I and write f ′ (x). The derivative of a function f is again a
function f ′ ; its domain, which is a subset of the domain of f , is the set of all points
x0 for which f is differentiable. Other notations for the derivative are Dx f , d f /dx,
and dy/dx, where y = f (x). The function f ′ may in turn have a derivative, denoted
by f ′′ , which is defined at all points where f ′ is differentiable. f ′′ is called the second
derivative of f . For higher-order derivatives, we use the notations
f ′′′(x), f^(4)(x), . . . , f^(n)(x), or dⁿf/dxⁿ for n = 1, 2, 3, . . . .


Example 1.1 For x ∈ R, we set f(x) = x|x|. Then we have that f(x) = x², for x > 0, and f(x) = −x², for x < 0. Next, we compute f ′(x0). For x0 > 0, we may choose |h| small enough so that x0 + h > 0. Then by (1.1), we see that

f ′(x0) = lim_{h→0} [f(x0 + h) − f(x0)]/h = lim_{h→0} [(x0 + h)² − x0²]/h = lim_{h→0} (2x0 h + h²)/h = 2x0.

On the other hand, if x0 < 0, we may choose |h| small enough so that x0 + h < 0, and so from (1.1), we obtain

f ′(x0) = lim_{h→0} [−(x0 + h)² + x0²]/h = lim_{h→0} (−2x0 h − h²)/h = −2x0.

Finally,

f ′(0) = lim_{h→0} [f(0 + h) − f(0)]/h = lim_{h→0} h|h|/h = 0.

In conclusion,

f ′(x) = 2|x| for all x ∈ R.

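The piecewise computation in Example 1.1 can be reproduced with a computer algebra system. The following sketch (an illustration added here, not part of the original text) uses Python's sympy library, assuming it is installed, and differentiates the piecewise form of f(x) = x|x| derived above.

import sympy as sp

x = sp.symbols('x', real=True)
# Piecewise form of f(x) = x|x| from Example 1.1
f = sp.Piecewise((x**2, x >= 0), (-x**2, True))
df = sp.diff(f, x)
print(df)                             # Piecewise((2*x, x >= 0), (-2*x, True)), i.e., 2|x|
print(df.subs(x, 3), df.subs(x, -3))  # 6 6
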
A differential equation is an equation involving a function and derivatives of this
function. Differential equations are divided into two classes: ordinary and partial.
Ordinary differential equations contain only functions of a single variable, called the
independent variable, and derivatives with respect to that variable. Partial differential
equations contain a function of two or more variables and some partial derivatives of
this function.
The order of a differential equation is defined by the highest derivative present in the
equation. An nth-order ordinary differential equation is a functional relation of the
form
F(x, y, dy/dx, d²y/dx², d³y/dx³, . . . , dⁿy/dxⁿ) = 0,  x ∈ R, (1.2)

between the independent variable x and the dependent variable y, and its derivatives dy/dx, d²y/dx², d³y/dx³, . . . , dⁿy/dxⁿ. We shall always assume that (1.2) can be solved for y^(n) and put in the form

y^(n) = f(x, y, y′, . . . , y^(n−1)). (1.3)

Loosely speaking, by a solution of (1.2) on an interval I, we mean a function y(x) = ϕ(x) such that

F(x, ϕ(x), ϕ′(x), . . . , ϕ^(n)(x))

is defined for all x ∈ I and

F(x, ϕ(x), ϕ′(x), . . . , ϕ^(n)(x)) = 0

for all x ∈ I. If we require, for some initial time x0 ∈ R, a solution y(x) to satisfy the initial conditions

y(x0) = a0, y′(x0) = a1, . . . , y^(n−1)(x0) = a_{n−1}, (1.4)

for constants ai , i = 0, 1, 2, ..., n − 1, then (1.3) along with (1.4) is called an initial
value problem (IVP).
Before we state the next theorem, we define partial derivatives.
Given a function of several variables f (x, y), the partial derivative of f with respect
to x is the rate of change of f as x varies, keeping y constant, and it is given by

∂f/∂x = lim_{h→0} [f(x + h, y) − f(x, y)]/h.

Similarly, the partial derivative of f with respect to y is the rate of change of f as y varies, keeping x constant, and it is given by

∂f/∂y = lim_{h→0} [f(x, y + h) − f(x, y)]/h.

More often we write fx and fy to denote ∂f/∂x and ∂f/∂y, respectively.
For the (IVP) (1.3) and (1.4), the following existence and uniqueness result is true.
For more discussion on the topic and on the proof of the next theorem, we refer to
[3], [4], or [5].
Theorem 1.1 Consider the (IVP) defined by (1.3) and (1.4), where f is continuous
on the (n + 1)-dimensional rectangle D of the form

D = {(x, y0 , y1 , . . . , yn−1 ) : bk−1 < yk−1 < dk−1 and bn < x < dn , k = 1, 2, . . . , n}.

If the initial conditions are chosen so that the point (x0 , a0 , a1 , . . . , an−1 ) is in D, then
the (IVP) has at least one solution satisfying the initial conditions. If, in addition, f
has continuous partial derivatives

∂f/∂y, ∂f/∂y′, . . . , ∂f/∂y^(n−1)
in D, then the solution is unique.
Suppose f satisfies the hypothesis of Theorem 1.1 in D. Then a general solution of
the (IVP) in D is given by the formula

y = ϕ(x, c1 , c2 , . . . , cn )

if y solves (1.3) and if for any initial condition (x0 , a0 , a1 , . . . , an−1 ) in D one can
choose values c1 , c2 , . . . , cn so that the solution y satisfies these initial conditions.
We have the following corollary concerning existence and uniqueness of solutions
of first-order initial value problems, which is an immediate consequence of Theorem
1.1.
Corollary 1 Let D ⊂ R × R, and denote the set of all real continuous functions on
D by C(D, R). Let f ∈ C(D, R) and suppose ∂f/∂y is continuous on D. Then for any (x0, y0) ∈ D, the (IVP)

y′ = f(x, y), y(x0) = y0,
has a unique solution on an interval containing x0 in its domain.
Example 1.2 As an example, consider

y′(x) = x y^{1/2};

then

f(x, y) = x y^{1/2} and ∂f/∂y = x/(2y^{1/2})

are continuous in the upper half-plane defined by y > 0. We conclude from Corollary 1 that for any point (x0, y0), y0 > 0, there is some interval around x0 on which the given differential equation has a unique solution. □
Example 1.3 Consider

y′(x) = y², y(0) = 1.

Here, we have that

f(x, y) = y² and ∂f/∂y = 2y

are continuous everywhere in the plane and in particular on the rectangle

D = {(x, y) : −2 < x < 2, 0 < y < 2}.

Since the initial point (0, 1) lies inside the rectangle, Corollary 1 guarantees a unique solution of the (IVP). □
In the next example, we illustrate the existence of more than one solution.
Example 1.4 Consider the differential equation

y′(x) = (3/2) y^{1/3}, y(0) = 0, x ∈ R.

Here, we have

f(x, y) = (3/2) y^{1/3} and ∂f/∂y = (1/2) y^{−2/3}.

[FIGURE 1.1 (plot of y(x) versus x): This example displays three solutions, with x1 = 0, 1, 2.]

Clearly f(x, y) is continuous everywhere, but the partial derivative is discontinuous when y = 0, and hence at the point (0, 0). This explains the existence of many solutions, as we display below. It is clear that y(x) = 0 is a solution. Hence, we may consider a solution y1(x) = 0 and let

y2(x) = 0 for x ≤ 0, and y2(x) = x^{3/2} for x > 0,

which is also a solution that is continuous and differentiable. Likewise, for x1 > 0 we have

y3(x) = 0 for x ≤ x1, and y3(x) = (x − x1)^{3/2} for x > x1.

Continuing in this way, we see that the differential equation has infinitely many solutions. Similarly, if y is a solution then −y is also a solution (see Fig. 1.1). □

1.2 Separable Equations


Consider the first-order differential equation
y′ = f (x, y) (1.5)
where f and ∂f/∂y are continuous for all (x, y) in a rectangle D of the form

D = {(x, y) : b1 < x < b2 and d1 < y < d2 }.



In some cases, we may want to solve (1.5) and, at different times, we may want to
solve it subject to the initial condition
y(x0 ) = y0 , (1.6)

where the point (x0 , y0 ) is specified and in D. In some instances, the first-order dif-
ferential equation (1.5) can be rearranged and put in the form,
g(y) dy/dx + h(x) = 0 (1.7)
where h, g : R → R and are continuous on some subsets of R. Here, the functions
h(x) and g(y) are only functions of x and y, respectively. When (1.5) can be put in
the form (1.7), we say it separates or that the differential equation is separable. Now
we discuss the method of solution when (1.5) is separable. In this case, we consider
(1.7) and assume that H(x) and G(y) are antiderivatives of h(x) and g(y), respectively, so that H′ = h and G′ = g. Then (1.7) can be written in the form
H ′ (x)dx = −G′ (y)dy.
By integrating the left-side with respect to x and the right-side with respect to y, we
get the general solution
H(x) + G(y) = c, (1.8)
where c is an arbitrary constant. If (1.8) permits us to solve for y in terms x and c,
then we obtain a general solution of (1.7) of the form y = ϕ(x, c). If we cannot solve
for y in terms of x and c, then (1.8) represents an implicit solution of (1.7).
The technique used to obtain (1.8) is simple and easy to understand and apply. How-
ever, if we try to be precise and justify the procedure, some care is required. To see this, we take the initial point (x0, y0) in D and suppose that
h(x0 ) ̸= 0 and g(y0 ) ̸= 0.
Let y = ϕ(x) be a solution of (1.7) on an interval I = {x : |x −x0 | < d}, which satisfies
the initial condition ϕ(x0 ) = y0 . Then for all x in I we see that
h(x) = −g(ϕ(x)) dϕ(x)/dx.

Integrating both sides from x0 to x, we arrive at

∫_{x0}^{x} h(s) ds = −∫_{x0}^{x} g(ϕ(s)) ϕ′(s) ds.

Since ϕ′(x0) = −h(x0)/g(ϕ(x0)) = −h(x0)/g(y0) ≠ 0, the solution ϕ(x) is either increasing or decreasing in a neighborhood of x0. In either case, we are able to use the change of variable u = ϕ(x) in the right-hand integral and obtain

∫_{x0}^{x} h(s) ds = −∫_{ϕ(x0)}^{ϕ(x)} g(u) du = −∫_{y0}^{y} g(u) du, (1.9)

which is equivalent to (1.8).



Example 1.5 Solve y′ = 5/2 − y/2, y(0) = 2. The differential equation can be written as

[1/(5 − y)] dy/dx = 1/2.

Using (1.9), we obtain

∫_0^x (1/2) ds = ∫_2^y 1/(5 − u) du.

For finite x, we must have u < 5; otherwise, the integral on the right diverges. Therefore, 5 − u > 0 and integration yields

x/2 = −ln(5 − y) + ln(3),

or

x/2 = ln(3/(5 − y)).

Taking the exponential of both sides, we arrive at the solution

y(x) = 5 − 3e^{−x/2}.

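As an optional check (a sketch using sympy, assumed to be installed; not part of the original text), the initial value problem of Example 1.5 can be handed to a symbolic solver and compared with the answer obtained above.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
# Solve y' = 5/2 - y/2 with y(0) = 2, as in Example 1.5
sol = sp.dsolve(sp.Eq(y(x).diff(x), sp.Rational(5, 2) - y(x)/2), y(x), ics={y(0): 2})
print(sol)   # Eq(y(x), 5 - 3*exp(-x/2)), matching the solution above
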

Example 1.6 (Logistic Equation) The logistic equation is a simple model of population dynamics. Suppose we have a population of size y with initial population size y0 > 0 that has a birth rate αy and a death rate βy. With this model, we obtain

dy/dt = (α − β)y, which has the solution

y = y0 e^{(α−β)t}.

Our population increases or decreases exponentially depending on whether the birth


rate exceeds death rate or vice versa. However, in reality, there is fighting for limited
resources. The probability of some piece of food (resource) being found is propor-
tional to y. The probability of the same piece of food being found by two individuals
is proportional to y2 . If food is scarce, they fight (to the death), so the death rate due
to fighting (competing) is γy² for some γ. So

dy/dt = (α − β)y − γy²,

or

dy/dt = ry(1 − y/d),

where r = α − β and d = r/γ. This is the logistic differential equation. Note that it
is separable and can be solved explicitly. □

Example 1.7 The law of mass action is a useful concept that describes the behavior
of a system that consists of many interacting parts, such as molecules, that react with
each other, or viruses that are passed along from a population of infected individuals
to non-immune ones. The law of mass action was derived first for chemical systems
but subsequently found wide use in epidemiology and ecology. To describe the law
of mass action, we assume m substances s1 , s2 , . . . , sm together form a product with
concentration p. Then the law of mass action states that dp/dt is proportional to the product of the m concentrations si, i = 1, . . . , m. That is,

dp/dt = k s1 s2 · · · sm.
Suppose we have a homogeneous population of fixed size, divided into two groups.
Those who have the disease are called infective, and those who do not have the dis-
ease are called susceptible. Let S = S(t) be the susceptible portion of the population
and I = I(t) be the infective portion. Then by assumption, we may normalize the
population and have S + I = 1. We further assume that the dynamics of this epidemic
satisfy the law of mass action. Hence, for some positive constant λ we have the
nonlinear differential equation
I ′ (t) = λ SI (1.10)
Let I(0) = I0 , 0 < I(0) < 1 be a given initial condition. It follows that by substituting
S = 1 − I into (1.10),
I ′ (t) = λ I(1 − I), I(0) = I0 . (1.11)
If we can solve (1.11) for I(t), then S(t) can be found from the relation I + S = 1. We
separate the variables in (1.11) and obtain
dI / [I(1 − I)] = λ dt.
Using partial fractions on the left side of the equation and then integrating both sides
yields
ln(|I|) − ln(|1 − I|) = λt + c,
or for some positive constant c1 we have
I(t) = c1 e^{λt} / (1 + c1 e^{λt}).
Applying I(0) = I0 gives the solution
I(t) = I0 e^{λt} / (1 − I0 + I0 e^{λt}). (1.12)
Now for 0 < I(0) < 1, the solution given by (1.12) is increasing with time as ex-
pected. Moreover, using L’Hospital’s rule, we have
lim_{t→∞} I(t) = lim_{t→∞} I0 e^{λt} / (1 − I0 + I0 e^{λt}) = 1.

Hence, the infection will grow, and everyone in the population will get infected even-
tually. □
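
Both the closed form (1.12) and the limiting behavior can be verified symbolically. The sketch below (an added illustration, not from the text) uses sympy to check that (1.12) satisfies (1.11) and that I(t) → 1 as t → ∞.

import sympy as sp

t, lam, I0 = sp.symbols('t lambda I_0', positive=True)
I = I0*sp.exp(lam*t) / (1 - I0 + I0*sp.exp(lam*t))  # candidate solution (1.12)
print(sp.simplify(I.diff(t) - lam*I*(1 - I)))       # 0, so (1.11) is satisfied
print(sp.limit(I, t, sp.oo))                        # 1, everyone eventually gets infected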

1.2.1 Exercises
In Exercise 1.1, verify that each given expression is a solution of the corresponding equation. Where appropriate, c1, c2 denote constants.
Exercise 1.1 (a) y′′ + y′ − 2y = 0; y = c1 eˣ + c2 e^{−2x}.
(b) y′ = 25 + y²; y = 5 tan(5x).
(c) y′ = √(y/x); y = (√x + c1)², x > 0, c1 > 0.
(d) 3x²y dx + (x³ + 2y) dy = 0; x³y + y² = c1.
(e) dy/dx = y(2 − 3y); y = 2c1 e^{2x} / (1 + 3c1 e^{2x}).
(f) x²y′′ − xy′ + 2y = 0; y = x cos(ln x), x > 0.
In each of Exercises 1.2–1.7, decide whether the existence and uniqueness theorem or the corollary of this section guarantees the existence of a solution of the initial value problem. In case a solution exists, determine whether uniqueness is guaranteed, and find the region of existence and uniqueness.
Exercise 1.2
y′ = 3x²y, y(0) = 3.
Exercise 1.3
y′ = y², y(1) = 5.
Exercise 1.4
y′ = √(x − y), y(2) = 1.
Exercise 1.5
y′ = ln(1 + y²), y(2) = 2.
Exercise 1.6
y′ = x ln(y), y(1) = 0.
Exercise 1.7
y′ = √(1 − y²), y(0) = 0.
Exercise 1.8 Show that the (IVP)
y′ = −√(4 − y²), y(0) = 2

has the two solutions y1 (x) = 2 and y2 (x) = 2 cos(x) on the interval [0, 2π]. Why
doesn’t this contradict Corollary 1?
In Exercises 1.9–1.16, solve the given differential equation by separation of variables.
Exercise 1.9
y′ + xy = y, y(1) = 3.

Exercise 1.10
(y² − 1) dy/dx = x eˣ, y(0) = 5.
Exercise 1.11
dy/dx − 1 = √(x − y).
Exercise 1.12
x dy/dx = y² − y, y(0) = 1.
Exercise 1.13
y′ tan(x) = y, y(π/2) = π/2.
Exercise 1.14
dy/dx + 2 = sin(2x + y + 1).
Exercise 1.15
y′ = 2x³y² + 3x²y², y(0) = 2.
Exercise 1.16
y′ = (x³y − y) / (y⁴ − y³ + 1), y(0) = 1.

1.3 Exact Differential Equations


In this section, we look at a special type of differential equations called exact differ-
ential equations. Given a function f (x, y), the total derivative of f is given by

d f = fx dx + fy dy.
Definition 1.1 (Exact equation) Q(x, y) dy/dx + P(x, y) = 0 is an exact equation if and
only if the differential form Q(x, y) dy + P(x, y) dx is exact, i.e. there exists a function
f (x, y) for which

d f = Q(x, y) dy + P(x, y) dx.

If P(x, y) dx + Q(x, y) dy is an exact differential of f, then df = P(x, y) dx + Q(x, y) dy. But by the chain rule, df = (∂f/∂x) dx + (∂f/∂y) dy, and this equality holds for any displacements dx, dy. So

∂f/∂x = P,  ∂f/∂y = Q.

From this, we have

∂²f/∂y∂x = ∂P/∂y,  ∂²f/∂x∂y = ∂Q/∂x.

We know that the two mixed second derivatives are equal. So

∂P/∂y = ∂Q/∂x.
The converse is not necessarily true. Even if this equation holds, the differential need
not be exact. However, it is true if the domain is simply-connected.
Definition 1.2 (Simply-connected domain) A domain D is simply-connected if it is
connected and any closed curve in D can be shrunk to a point in D without leaving
D.
Example 1.8 A disc in 2D is simply-connected. A disc with a “hole” in the middle is
not simply-connected because a loop around the hole cannot be shrunk into a point.
Similarly, a sphere in 3D is simply-connected but a torus is not. □
Theorem 1.2 If ∂P/∂y = ∂Q/∂x throughout a simply-connected domain D, then
P(x, y) dx + Q(x, y) dy = 0 (1.13)
is an exact differential of a single-valued function in D.
If the equation is exact, then the solution is simply f = constant, and we can find f
by integrating ∂f/∂x = P and ∂f/∂y = Q.
Example 1.9 Consider

(y cos(xy) + 1) dx + (x cos(xy) + e^y) dy = 0.

We have

P = y cos(xy) + 1,  Q = x cos(xy) + e^y.

Then ∂P/∂y = ∂Q/∂x = cos(xy) − xy sin(xy). Hence, we are dealing with an exact equation. To find the function f we integrate either

∂f/∂x = y cos(xy) + 1, or ∂f/∂y = x cos(xy) + e^y.

Integrating the first equation gives

f(x, y) = sin(xy) + x + g(y).

Note that since it was a partial derivative with respect to x holding y constant, the "constant" term can be any function of y. Differentiating the derived f with respect to y, we have

∂f/∂y = x cos(xy) + g′(y) = x cos(xy) + e^y.

Thus g′(y) = e^y and g(y) = e^y. The constant of integration need not be included in the preceding line since the solution is f(x, y) = c. This is due to the fact that df(x, y) = 0 implies that f(x, y) = constant. Thus, the final solution is

sin(xy) + x + e^y = c.

If we are given an initial condition, say y(π/2) = 1, then c = 1 + π/2 + e. □
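
The exactness test and the recovery of f in Example 1.9 can also be carried out mechanically. The following sketch (an added illustration using sympy, not part of the original text) mirrors the steps above.

import sympy as sp

x, y = sp.symbols('x y')
P = y*sp.cos(x*y) + 1
Q = x*sp.cos(x*y) + sp.exp(y)
print(sp.simplify(sp.diff(P, y) - sp.diff(Q, x)))  # 0, so the equation is exact
f = sp.integrate(P, x)                   # sin(x*y) + x, from integrating P in x
g = sp.integrate(Q - sp.diff(f, y), y)   # exp(y), the y-dependent "constant" g(y)
print(f + g)                             # sin(x*y) + x + exp(y); the solution is f + g = c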

1.3.1 Integrating factor


In some cases, it is possible to convert a first-order differential equation of the form
(1.13) that is not exact into an exact one. In such a scenario, it is possible to multiply
(1.13) by a function µ(x, y), called the integrating factor. Thus we make the following
definition.
Definition 1.3 An integrating factor µ(x, y) of (1.13) is a function that makes
µ(x, y)P(x, y) dx + µ(x, y)Q(x, y) dy = 0 (1.14)
an exact differential equation. That is,

∂[µ(x, y)P]/∂y = ∂[µ(x, y)Q]/∂x.

We have the following theorem.


Theorem 1.3 Equation (1.13) has

(a) an integrating factor of x alone, µ(x) = e^{∫ f(x) dx}, if and only if

(1/Q)(Py − Qx) = f(x), (1.15)

(b) an integrating factor of y alone, µ(y) = e^{∫ g(y) dy}, if and only if

(1/P)(Qx − Py) = g(y). (1.16)

Proof Suppose (1.15) holds with µ(x) = e^{∫ f(x) dx}. Then

∂(µQ)/∂x = (∂µ/∂x) Q + µ (∂Q/∂x)
         = (µ/Q)(Py − Qx) Q + µ (∂Q/∂x)    (since µ′(x) = µ(x) f(x))
         = µ (∂P/∂y)
         = ∂(µP)/∂y.
The rest of the proof will be in the Exercises.
Example 1.10 Consider the differential equation

6xy dx + (4y + 9x²) dy = 0,  y > 0,

which is not exact. Now

(1/Q)(Py − Qx) = (6x − 18x)/(4y + 9x²),

which is not a function of x alone. However,

(1/P)(Qx − Py) = (18x − 6x)/(6xy) = 2/y,

and so by (b) of Theorem 1.3, the equation has the integrating factor

µ(y) = e^{∫(2/y) dy} = y².

Hence the new differential equation

y²[ 6xy dx + (4y + 9x²) dy ] = 0

is exact and has the solution

3x²y³ + y⁴ = c.
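
The search for an integrating factor in Example 1.10 follows Theorem 1.3 step by step and can be reproduced symbolically. A sketch with sympy (assumed installed; an added illustration, not part of the original text):

import sympy as sp

x, y = sp.symbols('x y', positive=True)
P = 6*x*y
Q = 4*y + 9*x**2
g = sp.simplify((sp.diff(Q, x) - sp.diff(P, y))/P)  # (Qx - Py)/P = 2/y, a function of y alone
mu = sp.exp(sp.integrate(g, y))                     # integrating factor mu(y) = y**2
print(g, mu)
# After multiplying by mu, the equation becomes exact:
print(sp.simplify(sp.diff(mu*P, y) - sp.diff(mu*Q, x)))  # 0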

1.3.2 Exercises
In Exercises 1.17–1.22, show that the differential equation is exact and then find its solution.
Exercise 1.17

(3x²y + y) dx + (x³ + x + 1 + 2y) dy = 0, y(0) = 2.

Exercise 1.18
(eˣ sin(y) + 3y) dx + (3x + eˣ cos(y)) dy = 0.
Exercise 1.19

(6xy − y³) dx + (4y + 3x² − 3xy²) dy = 0, y(1) = −1.

Exercise 1.20
(x³ + y/x) dx + (y² + ln(x)) dy = 0.
Exercise 1.21
(x + arctan(y)) dx + [(x + y)/(1 + y²)] dy = 0.
Exercise 1.22

(y² cos(x) − 3x²y − 4x) dx + (2y sin(x) − x³ + ln(y) + 1) dy = 0, y(0) = e.

Exercise 1.23 Suppose µ(x) is a function of x alone and is an integrating factor of P(x, y) dx + Q(x, y) dy = 0. Show that (1/Q)(Py − Qx) is a function of x alone.
Exercise 1.24 Prove part (b) of Theorem 1.3.

Exercise 1.25 Solve the given differential equation by finding an appropriate inte-
grating factor.
(a) (xy + y² + y) dx + (x + 2y) dy = 0.
(b) (2y² + 3x) dx + 2xy dy = 0.
(c) (y ln(y) + y eˣ) dx + (x + y cos(y)) dy = 0.
(d) (4xy² + y) dx + (6y³ − x) dy = 0.
Exercise 1.26 For appropriate integers r and q, show that µ(x, y) = x^r y^q is an integrating factor for the differential equation below, and then solve it:

(3y² + 10xy) dx + (5xy + 12x²) dy = 0.

1.4 Linear Differential Equations


Consider the differential equation

y′ (x) + a(x)y(x) = g(x, y(x)), y(x0 ) = y0 , x ≥ x0 (1.17)

where g ∈ C(R × R, R) and a ∈ C(R, R). Note that (1.17) can be nonlinear or linear,
which depends on the function g. If g(x, y) = g(x), a function of x alone, then (1.17) is said to be linear (linear in y). To obtain a formula for the solution, we multiply both sides of (1.17) by the integrating factor e^{∫_{x0}^{x} a(u) du}. Observing that

d/dx [ y(x) e^{∫_{x0}^{x} a(u) du} ] = y′(x) e^{∫_{x0}^{x} a(u) du} + a(x) y(x) e^{∫_{x0}^{x} a(u) du},

we arrive at

d/dx [ y(x) e^{∫_{x0}^{x} a(u) du} ] = g(x, y(x)) e^{∫_{x0}^{x} a(u) du}.

An integration of the above expression from x0 to x, using y(x0) = y0, yields

y(x) e^{∫_{x0}^{x} a(u) du} = y0 + ∫_{x0}^{x} g(s, y(s)) e^{∫_{x0}^{s} a(u) du} ds,

from which we get

y(x) = y0 e^{−∫_{x0}^{x} a(u) du} + ∫_{x0}^{x} g(s, y(s)) e^{−∫_{s}^{x} a(u) du} ds,  x ≥ x0. (1.18)

It can be easily shown that if y(x) satisfies (1.18), then it satisfies (1.17). Expression
(1.18) is known as the variation of parameters formula. We note that (1.18) is a
functional equation in y since the integrand is a function of y. If we replace the

function g with a function h(x), where h ∈ C(R, R), then (1.18) takes the special form

y(x) = y0 e^{−∫_{x0}^{x} a(u) du} + ∫_{x0}^{x} h(s) e^{−∫_{s}^{x} a(u) du} ds,  x ≥ x0. (1.19)

Another special form of (1.18) arises if the function a(x) is a constant a for all x ≥ x0 and g is replaced with h(x) as before; then we have from (1.19) that

y(x) = y0 e^{−a(x−x0)} + ∫_{x0}^{x} e^{−a(x−s)} h(s) ds,  x ≥ x0. (1.20)

It is easy to compute, using (1.20), that the differential equation

y′(x) − 3y = e^{2x},  y(0) = 3

has the solution

y(x) = 4e^{3x} − e^{2x}.

Remark 1 If no initial conditions are assigned, then (1.18) takes the form

y(x) = e^{−∫ a(x) dx} [ C + ∫ g(x, y(x)) e^{∫ a(x) dx} dx ].
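
To illustrate how formula (1.20) is used in practice, the sketch below (an added illustration with sympy, not from the text) evaluates (1.20) for the example above, with a = −3, h(s) = e^{2s}, x0 = 0, and y0 = 3.

import sympy as sp

x, s = sp.symbols('x s')
a, y0 = -3, 3                 # y' + a*y = h(x) with a = -3, i.e., y' - 3y = e^(2x)
h = sp.exp(2*s)
# Formula (1.20): y(x) = y0*e^(-a(x - x0)) + integral of e^(-a(x - s)) h(s) ds from 0 to x
y = y0*sp.exp(-a*x) + sp.integrate(sp.exp(-a*(x - s))*h, (s, 0, x))
print(sp.simplify(y))         # 4*exp(3*x) - exp(2*x), as claimed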

1.4.1 Exercises
Exercise 1.27 Solve each of the given differential equations.
(a) x dy/dx + 2y = 5xy, y(1) = 0.
(b) dy/dx + 2y = eˣ, y(1) = 0.
(c) (x + 1) dy/dx + (x + 2)y = 2x e^{−x}.
(d) dy/dx + eˣ/(1 + x²) = y, y(1) = 0.
(e) x dy/dx = y ln(x), y(1) = 2.
Exercise 1.28 Find a continuous solution satisfying

y′ (x) + y(x) = f (x), y(0) = 0



where f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 for x > 1.
Is the solution differentiable at x = 1?

Exercise 1.29 Find a continuous solution satisfying

y′ (x) + 2xy(x) = f (x), y(0) = 2



where f(x) = x for 0 ≤ x < 1, and f(x) = 0 for x ≥ 1.

Is the solution differentiable at x = 1?

1.5 Homogeneous Differential Equations


In this section we look at first-order differential equations that can be written in the
form
P(x, y) dx + Q(x, y) dy = 0. (1.21)
We begin with the following definition.
Definition 1.4 A function g(x, y) of two variables is called homogeneous of degree
α if for all x, y where g is defined and for all positive constants λ we have

g(λ x, λ y) = λ α g(x, y). (1.22)

For example, the function h(x, y) = xy + y² is homogeneous of degree α = 2, since

h(λx, λy) = λ²(xy + y²) = λ²h(x, y).

On the other hand, h(x, y) = xy + y² + 1 is not homogeneous of any degree α, since

h(λx, λy) = λ²(xy + y²) + 1 ≠ λ^α h(x, y)

for any α.
Definition 1.5 The differential equation (1.21) is called a homogeneous differential
equation if both P and Q satisfy condition (1.22) for the same α.
Method of finding solutions
Suppose (1.21) is a homogeneous differential equation. Then you may use either
transformation
(a) y = ux or
(b) x = vy

where u is a continuous function of x and v is a continuous function of y. The choice of either
(a) or (b) depends on the number of terms that multiply P and Q. For example, if P
is multiplied by fewer terms, then go with (b). Similarly, if Q is multiplied by fewer
terms, then go with (a). You may use either (a) or (b) if they are multiplied by the
same number of terms.
Say we go with y = ux. Then compute dy/dx = x du/dx + u. Multiplying both sides by dx, we get

dy = x du + u dx. (1.23)
Next substitute y = ux and (1.23) into (1.21) and the resulting differential equation is
separable in x and u and can be easily solved. If we go with (b), then use
dx = y dv + v dy (1.24)
and then substitute back into (1.21) to obtain a separable equation in terms of v and
y.
Example 1.11 Consider

x dy/dx + x − 3y = 0. (1.25)

Then equation (1.25) takes the form

x dy + (x − 3y) dx = 0,

which is homogeneous of degree 1. Since dy is multiplied by one term only, we use y = ux. Using the above procedure, the differential equation reduces to

x(u dx + x du) + (x − 3xu) dx = 0.

Simplifying by x and then regrouping, we arrive at the separable equation

dx/x = du/(2u − 1).

Integrating both sides and then substituting u = y/x, we arrive at the solution of the original problem

ln |x| = (1/2) ln |2y/x − 1| + c,

for some constant c. □
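A quick cross-check (a sketch with sympy, not part of the original text): solving (1.25) with a symbolic solver and exponentiating the implicit solution above lead to the same family of curves.

import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')
sol = sp.dsolve(sp.Eq(x*y(x).diff(x) + x - 3*y(x), 0), y(x))
print(sol)   # Eq(y(x), C1*x**3 + x/2), up to the form of the constant

Indeed, exponentiating ln|x| = (1/2) ln|2y/x − 1| + c gives 2y/x − 1 = Kx², that is, y = x/2 + (K/2)x³, in agreement with the explicit solution.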
Notice that (1.25) can be written as

dy/dx = −1 + 3(y/x) = f(1, y/x),

and this hints at another way of defining homogeneous differential equations. Thus we make another, alternate definition.
Definition 1.6 A differential equation
dy
= g(x, y) (1.26)
dx
is called a homogeneous differential equation if g is a homogeneous function of de-
gree 0, that is,

g(λx, λy) = λ⁰ g(x, y) = g(x, y).
If we let λ = 1/x, then

g(λx, λy) = λ⁰ g(x, y) = g(x, y) = g(1, y/x).

Hence, if we let F(y/x) = g(1, y/x), then (1.26) can be put in the form

dy/dx = F(y/x). (1.27)

This suggests making the substitution u = y/x. In this case the differential equation given by (1.27) is reduced to

du/dx = (F(u) − u)/x,
which is separable.
Another type of first-order differential equation that requires transformations in both the dependent and independent variables is of the form

dy/dx = (ax + by + c)/(dx + ey + g), (1.28)

with ae ≠ bd. To solve (1.28) we propose the substitutions

x = t − p, y = w − k,

where the constants p and k must be carefully chosen so that the resulting equation
is homogeneous. Using the chain rule, we see that
dy/dx = d(w − k)/dx = dw/dx = (dw/dt)(dt/dx) = (dw/dt)(1) = dw/dt.
Moreover,

ax + by + c = at + bw + (c − ap − bk).

Similarly,

dx + ey + g = dt + ew + (g − d p − ek).

In order for the resulting equation to be homogeneous we require

c − ap − bk = 0, g − d p − ek = 0,

which has a unique solution p and k since ae ̸= bd. This results in the new differential
equation
dw/dt = (at + bw)/(dt + ew), (1.29)
which is homogeneous in w and t.

Example 1.12 Consider

dy/dx = (x + 2y − 5)/(−2x − y + 4).

It is clear that ae ≠ bd. We use the substitutions

x = t − p,  y = w − k

and solve for the unique solution of

−5 − p − 2k = 0,  4 + 2p + k = 0

to obtain p = −1, k = −2. Using (1.29) we arrive at the homogeneous differential equation

(2t + w) dw + (t + 2w) dt = 0.

This equation can be easily solved by letting w = ut, obtaining the separable differential equation

dt/t = −(2 + u)/(u² + 4u + 1) du.

An integration of both sides gives

ln |t| = −(1/2) ln |u² + 4u + 1| + C.

Or,

ln |x − 1| = −(1/2) ln | (y − 2)²/(x − 1)² + 4(y − 2)/(x − 1) + 1 | + C.

1.5.1 Exercises
In Exercises 1.30–1.34, show that the given differential equations are homogeneous and solve them.
Exercise 1.30
xdy + (2y + 5x)dx = 0, y(1) = 2.
Exercise 1.31
x dy/dx − 3y = y²/x, y(1) = 5.
Exercise 1.32
x² dy/dx − xy = x² + y², y(1) = 2.
Exercise 1.33
2xy dy/dx = 4x² + 3y².
Exercise 1.34
x dy/dx = y + √(x² − y²), y(1) = 0.

Exercise 1.35 Solve


dy/dx = (4x − 3y + 13)/(x − y + 3), y(0) = 1.
Exercise 1.36 Solve
dy/dx = −(4x + 3y + 11)/(2x + y + 5).
Exercise 1.37 Find a substitution that reduces the differential equation
dy/dx = f(ax + by + c)
to a separable equation, where a, b and c are constants, with b ̸= 0.
Exercise 1.38 Use Exercise 1.37 to solve
dy/dx = −1 + (x + y + 1)/ln(x + y + 1), x + y + 1 > 0.

1.6 Bernoulli Equation


In the previous section we used substitutions to solve first-order differential equa-
tions. The same intuition can be applied to the Bernoulli differential equation

y′ + P(x)y = Q(x)yⁿ. (1.30)

Note that if n = 0, then (1.30) is a linear differential equation and it can be solved by
the method of Section 1.4. Similarly, if n = 1, then (1.30) is a separable differential
equation that can be solved by the method of Section 1.2. Thus, we consider (1.30)
only for
n ≠ 0, 1.
We make the substitution

W = y^{1−n},

so that

y′ = [1/(1 − n)] W^{n/(1−n)} W′.
Substituting into (1.30) we arrive at the linear differential equation in W and x

W ′ + (1 − n)P(x)W = (1 − n)Q(x) (1.31)

that can now be solved by the method of Section 1.4.


Example 1.13 Consider

x dy/dx − y = eˣ y³,  x > 0.

Then the differential equation is Bernoulli with n = 3, P(x) = −1/x, and Q(x) = eˣ/x. Thus, by (1.31) we have the new linear differential equation

W′ + (2/x) W = −2eˣ/x,

which has the solution

W = c x^{−2} − 2eˣ/x + 2eˣ/x².

Since y = W^{−1/2}, the solution of the Bernoulli equation is given by

y = ( c x^{−2} − 2eˣ/x + 2eˣ/x² )^{−1/2}.
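
The solution of Example 1.13 can be double-checked by substituting W back into the linear equation it came from. A sketch with sympy (an added illustration, not part of the original text):

import sympy as sp

x, c = sp.symbols('x c', positive=True)
W = c/x**2 - 2*sp.exp(x)/x + 2*sp.exp(x)/x**2   # claimed solution of W' + (2/x)W = -2*exp(x)/x
print(sp.simplify(W.diff(x) + (2/x)*W + 2*sp.exp(x)/x))   # 0, so W is correct
# Since W = y**(1 - n) with n = 3, the Bernoulli solution is y = W**(-1/2), as above.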

1.6.1 Exercises
Exercise 1.39 Find a general solution of the Bernoulli equation.
(a) 3(1 + x²) dy/dx = 2xy(y³ − 1).
(b) y³ dy/dx + y⁴/x = cos(x)/x⁴.
(c) (x² + 1) dy/dx + 3x³y = 6x e^{−3/x²}.
(d) x dy/dx + 2y + sin(x)√y = 0.
(e) dy/dx + 2y = x³y² sin(x).

1.7 Higher-Order Differential Equations


Now is the time to consider higher-order linear differential equations. In particular,
we consider the general nth order linear differential equation

an(x) dⁿy/dxⁿ + a_{n−1}(x) d^{n−1}y/dx^{n−1} + · · · + a1(x) dy/dx + a0(x)y = F(x). (1.32)
Unless otherwise noted, we always assume that the coefficients ai (x), i = 1, 2, . . . , n
and the function F(x) are continuous on some open interval I. The interval I may be
unbounded. If the function F(x) vanishes for all x ∈ I, then we call (1.32) a homo-
geneous linear equation; otherwise, it is nonhomogeneous. Thus the homogeneous

linear equation associated with (1.32) is

an(x) dⁿy/dxⁿ + a_{n−1}(x) d^{n−1}y/dx^{n−1} + · · · + a1(x) dy/dx + a0(x)y = 0. (1.33)

Definition 1.7 Let f1, f2, . . . , fn be a set of functions defined on an interval I. We say the set {f1, f2, . . . , fn} is linearly dependent on I if there exist constants c1, c2, . . . , cn, not all zero, such that

c1 f1 + c2 f2 + . . . + cn fn = 0

for every x ∈ I. If the set of functions is not linearly dependent on the interval I, it is said to be linearly independent.
Definition 1.8 (Fundamental set of solutions). A set of n solutions of the linear dif-
ferential system (1.33) all defined on the same open interval I, is called a fundamental
set of solutions on I if the solutions are linearly independent functions on I.
We have the following corollary.
Corollary 2 Let the coefficients ai (x), i = 1, 2, . . . , n be continuous on an interval
I. If {φ1 (x), φ2 (x), . . . , φn (x)} form a fundamental set of solutions on I, then the
general solution of (1.33) is given by

y(x) = c1 φ1 (x) + c2 φ2 (x) + . . . + cn φn (x),

for constants ci , i = 1, 2, . . . , n.
Example 1.14 The second-order differential equation

y′′(x) + y′(x) − 2y(x) = 0 (1.34)

has the two solutions ϕ1(x) = eˣ and ϕ2(x) = e^{−2x}, and they are linearly independent on I = (−∞, ∞); hence the general solution is

y(x) = c1 ϕ1(x) + c2 ϕ2(x) = c1 eˣ + c2 e^{−2x},

for constants c1 and c2. □


To completely describe the solution of either (1.32) or (1.33), we impose the initial
conditions
y(x0) = d0, y′(x0) = d1, . . . , y^(n−1)(x0) = d_{n−1}, (1.35)

for an initial point x0 and constants di, i = 0, 1, 2, . . . , n − 1.
Theorem 1.4 [Existence and Uniqueness] Consider the (IVP) defined by (1.32) and
(1.35), where F and ai (x), i = 1, 2, . . . , n are continuous on some open interval I. As-
sume an (x) ̸= 0 for all x ∈ I. For x0 ∈ I and given the constants di , i = 0, 1, 2, ..., n−1,
the differential equation (1.32) has a unique solution on the entire interval I satisfy-
ing the initial condition (1.35).
Higher-Order Differential Equations 23

Another way of determining whether a set of functions is linearly independent or not


is to look at the Wronskian.
Definition 1.9 (Wronskian) Given two functions f and g, the Wronskian of f and g
is the determinant
f g
W = ′ = f g′ − f ′ g.
f g′

We write W ( f , g) to emphasize the functions. Consider the two functions given in


Example 1.14. Then

e−2x
x
e
W (ϕ1 , ϕ2 ) = x = −2e−x − e−x = −3e−x ̸= 0, for all x ∈ (−∞, ∞).
−2e−2x

e

This is an example of a linearly independent pair of functions. Note that the Wron-
skian is everywhere nonzero. On the other hand, if the functions f and g are linearly
dependent, with g = k f for a nonzero constant k, then

f k f
W ( f , g) = ′ = k f f ′ − k f f ′ = 0.
f k f ′

Thus the Wronskian of two linearly dependent functions is zero. This will be made
formal in Theorem 1.5. The above Wronskian discussion can be easily extended to
the set of functions f1 , f2 , . . . , fn , where

···

f1 f2 fn

f2′ ··· fn′

f1
W ( f1 , f2 , . . . , fn ) = .. .. ..

..
. . . .
(n−1) (n−1) (n−1)

1
f
2 f ··· f n

For better illustration, we consider the second-order differential equation

y′′ + b1 (x)y′ + b0 (x)y = 0, (1.36)

where the functions b0 and b1 are continuous on some fixed interval I.


Theorem 1.5 Suppose y1 and y2 are solutions of (1.36) on the interval I. Then y1
and y2 are linearly independent if and only if

W (y1 (x), y2 (x)) ̸= 0, for all x ∈ I.

Proof Let y1 and y2 be linearly independent. Then,

c1 y1 (x) + c2 y2 (x) = 0, for all x ∈ I,

is true only when c1 = c2 = 0. Then a simple differentiation yields

c1 y1 (x) + c2 y2 (x) = 0 and c1 y′1 (x) + c2 y′2 (x) = 0 (1.37)


24 Ordinary Differential Equations

for all x ∈ I. Now (1.37) can only be true for some c1 , and c2 not both zero if and
only if W (y1 (x), y2 (x)) = 0, for all x ∈ I. If (1.37) holds for some point x0 ∈ I, then
the function y = c1 y1 + c2 y2 is a solution (1.36) and satisfies the initial conditions,
y(x0 ) = y′ (x0 ) = 0. On the other hand the zero function; y = 0 is also a solution and
satisfies the initial conditions. This violates the uniqueness of the solution unless y
and the zero solution, y = 0 are the same. Now y = 0 implies (1.37) is true for all x ∈ I.
This shows that W (y1 (x), y2 (x)) = 0, for all x ∈ I if and only if W (y1 (x), y2 (x)) =
0, for at least one x0 ∈ I. This completes the proof.

Next, we define the general solution of the nonhomogeneous differential equation


(1.32). The proof of the next theorem will be left as an exercise.
Theorem 1.6 Let the coefficients ai (x), i = 1, 2, . . . , n and the function F(x) be con-
tinuous on an interval I. Suppose {φ1 (x), φ2 (x), . . . , φn (x)} form a fundamental set
of solutions on I of the homogeneous differential equation (1.33), Denote such solu-
tion with
yh (x) = c1 φ1 (x) + c2 φ2 (x) + . . . + cn φn (x),
for constants ci , i = 1, 2, . . . , n. Let y p (x) be a particular solution of the nonhomoge-
neous differential equation (1.32). Then the general solution of (1.32) on I is given
by
y(x) = yh (x) + y p (x).
Example 1.15 Consider the second-order differential equation

y′′ + 3y′ + 2y = 6, y(0) = 1, y′ (0) = −3. (1.38)

Clearly, each of the functions ϕ1 (x) = e−x , and ϕ2 (x) = e−2x is a solution of the
homogeneous equation y′′ + 3y′ + 2y = 0. Also, they are linearly independent since
−x
e−2x

e
W (ϕ1 , ϕ2 ) = −x = −3e−3x ̸= 0, for all x ∈ (−∞, ∞).
−2e−2x

−e

Thus, the homogeneous solution of y′′ + 3y′ + 2y = 0 is given by

yh (x) = c1 e−x + c2 e−2x ,

for constants c1 and c2 . Moreover, y p (x) = 3 is a particular solution of (1.38), since

y′′p + 3y′p + 2y p = 6.

Thus, the general solution of (1.38) is

y(x) = yh (x) + y p (x) = c1 e−x + c2 e−2x + 3.

Applying the initial conditions we arrive at c1 = −7, and c2 = 5.


Later on in the chapter we will look at different techniques for finding the particular
solution y p (x). □
Equations with Constant Coefficients 25

1.7.1 Exercises
Exercise 1.40 Use Definition 1.7 to show that for any nonzero constant r the set

{erx , xerx , x2 erx , . . . , xn erx }

is linearly independent.
Exercise 1.41 Decide whether or not the solutions given determine a fundamental
set of solutions for the equation.
(a) y′′′ − 3y′′ − y′ + 3y = 0, y1 = e3x + ex , y2 = ex − e−x , y3 = e3x + e−x .
(b) y′′′ − 2y′′ − y′ + 2y = 0, y1 = ex , y2 = e−x , y3 = e2x .
(c) x2 y′′′ + xy′′ − y′ = 0, y1 = 1 + x2 , y2 = 2 + x2 , y3 = ln(x).
(d) x3 y′′′ + 2x√
2 ′′
y + 3xy′ − 3y = 0, √y1 = x,
y2 = cos( 3 ln(x)), y3 = sin( 3 ln(x)), x > 0.

1.8 Equations with Constant Coefficients


We consider the nth order differential equation with constant coefficients

an y(n) (x) + an−1 y(n−1) (x) + . . . + a2 y′′ (x) + a1 y′ (x) + a0 y(x) = 0 (1.39)

and try to find its solution. In the previous section we noticed that solutions to some
of the differential equations that were considered, were exponential functions. Thus,
we search for solutions of (1.39) of the form

y = erx ,

where r is a parameter to be determined. We begin with the observation that

d k rx 
e = rk erx for k = 0, 1, 2, . . . , (1.40)
dxk
A substitution of (1.40) into (1.39) leads to
 
erx an rn + an−1 rn−1 + . . . + a2 r2 + a1 r + a0 = 0.

Since, erx ̸= 0 for any finite r or x, we must have that

an rn + an−1 rn−1 + . . . + a2 r2 + a1 r + a0 = 0. (1.41)

Equation (1.41) is referred to as the characteristic equation or the auxiliary equa-


tion.
26 Ordinary Differential Equations

Distinct Roots
Now suppose the roots of (1.41) can be found. Then we can always write the funda-
mental solution or general solution of (1.39). The easiest case is when all the roots
ri , i = 1, 2, . . . , n are real and distinct. That is, no two roots are the same, or
ri ̸= r j , i, j = 1, 2, . . . , n.
We have the following theorem.
Theorem 1.7 (Distinct Real Roots) Suppose all the roots of (1.41) ri , i = 1, 2, . . . , n
are real and distinct. Then the general solution of (1.39) is given by
n
y= ∑ c k e rk x , (1.42)
k=1

for constants ck , k = 1, 2, . . . , n.

Proof Since the roots are real and distinct, the set {erk x , k = 1, 2, . . . , n} is linearly
independent. Moreover, each function in the set is a solution of (1.39) and hence they
form a fundamental set of solutions. Then, by Theorem 1.6, the solution is given by
(1.42).
Example 1.16 Consider the third order differential equation
y′′′ + 2y′′ − y′ − 2y = 0.
Its characteristic equation is found to be r3 + 2r2 − r − 2 = 0, which factors into
(r2 − 1)(r + 2) = 0. Thus the three roots are −2, −1, and 1 and they are real and
distinct and hence by Theorem 1.7 the general solution is
y = c1 e−2x + c2 e−x + c3 ex .

Repeated Roots
Now we turn our attention to the case when the characteristic equation (1.41) has
some of its roots repeated. In such cases, we are not able to produce n linearly in-
dependent solutions using Theorem 1.7. For example, if the characteristic equation
of a given differential equation has the roots −1, 1, 2, and 2, then we can only pro-
duce the three linearly independent functions e−x , ex , and e2x . The problem is then
to find a way to obtain the linearly independent solutions. To that end we introduce
the symbol L to represent a linear operator in the sense that for functions y1 and y2
in an appropriate space
L (c1 y1 + c2 y2 ) = c1 L y1 + c2 L y2 , for constants c1 , c2 .
Thus, in terms of the operator L , equation (1.39) can take the form L y = 0,
where
dn d n−1 d
L = an n + an−1 n−1 + . . . + a1 + a0 . (1.43)
dx dx dx
Equations with Constant Coefficients 27
d
In addition, we introduce the term D = and hence the notations,
dx

Dy = y′ , D2 y = y′′ , . . . , Dn y = y(n) .

For example, if y is a function of x and d is a constant,

(D − d)y = Dy − dy = y′ − dy.

Thus, in terms of the operator L , and D we may rewrite (1.43)

L = an Dn + an−1 Dn−1 + . . . + a1 D + a0 . (1.44)

Now suppose that (1.41) has a simple root r0 and another root r1 with multiplicity k,
where k is an integer such that k > 1. Then by Exercise 1.42, equation (1.44) reduces
to
L = (D − r1 )k (D − r0 ) = (D − r0 )(D − r1 )k . (1.45)
Then setting (1.45) equal to zero, which corresponds to the differential L y = 0,
we arrive at the two solutions y0 = er0 x , and y1 = er1 x . Remember, we need to find
k +1 linearly independent solutions for the construction of the general solution. Thus,
there are k − 1 missing linearly independent solutions. Applying y to the operator in
(1.45), yields
k 
L y = (D − r0 )(D − r1 )k y = (D − r0 ) D − r1 y .


By setting L y = 0 we arrive at
k
D − r1 y = 0. (1.46)

Every solution of the kth order differential equation in (1.46) will also be a solution
of the original differential equation L y = 0. Since er1 x is already a known solution,
we search for other solutions of the form

y(x) = u(x)er1 x , (1.47)

where the function u is to be determined. Using the product rule we obtain


 
(D − r1 ) u(x)er1 x = (Du(x))er1 x + r1 u(x)er1 x − r1 u(x)er1 x = (Du(x))er1 x .

Or,  
(D − r1 ) u(x)er1 x = (Du(x))er1 x .

Applying u(x)er1 x to (1.46) and by an induction argument on k (see Exercise 1.43) it


can be shown that  
(D − r1 )k u(x)er1 x = (Dk u(x))er1 x .
28 Ordinary Differential Equations

Thus y = u(x)er1 x is a solution of (1.46) if and only if (Dk u(x))er1 x = 0. But this
holds if and only if (Dk u(x)) = 0. Since (Dk u(x)) = uk (x) = 0, the solution is

u(x) = c0 + c1 x + c2 x2 + . . . + ck xk−1 ,

which is a polynomial of degree at most k − 1. We arrived at the following theo-


rem.
Theorem 1.8 (Repeated Roots) Suppose a root r of the characteristic equation
(1.41) has multiplicity k > 1. Then the root r contributes to the general solution
of (1.39) the term  
c0 + c1 x + c2 x2 + . . . + ck xk−1 erx ,
for constants ci , i = 0, 1, 2, . . . , k.
To see the set
{erx , xerx , x2 erx , . . . , xn erx }
is linearly independent, we refer to Exercise 1.40.
Example 1.17 Consider the fourth order differential equation

y(4) − 7y(3) + 18y′′ − 20y′ + 8y = 0.

Its characteristic equation is found to be r4 − 7r3 + 18r2 − 20r + 8 = 0, which factors


into (r − 2)3 (r − 1) = 0. Thus, we have a simple root 1 and another
 root 2 of multi-

plicity 3. Now by Theorem 1.8 the root 2 contributes the term c0 + c1 x + c2 x2 e2x .
Consequently, the general solution is
 
y = c0 + c1 x + c2 x2 e2x + c3 ex .


Complex Roots
We discuss the situation when one of the roots is complex. That is, if (1.41) has a
√ it appears in complex conjugate pairs α ± iβ , where α,
simple complex root then,
and β are real and i = −1. Recall

tn t2 t3 tn
et = ∑ n! = 1 + t + 2! + 3! + n! + . . . .
n=0

If we let t = ix, then the above series becomes

(ix)n

x2 ix3 x4 ix5
eix = ∑ = 1 + ix − − + + −...
n=0 n! 2! 3! 4! 5!
 x2 x4   x3 x5 
= 1− + −... +i x− + +...
2! 4! 3! 5!
= cos(x) + i sin(x).
Equations with Constant Coefficients 29

From Euler’s formula we know that


 
e(α±iβ )x = eαx cos(β x) ± i sin(β x) ,

and moreover, by Exercise 1.44 that for any complex number r,

Derx = rerx .

For emphasis, erx will be a solution of the differential equation given by (1.39) if and
only if r is a root of its characteristic equation given by (1.41). Thus, if the conjugate
compelx pair of roots r1 = α + iβ , and r2 = α − iβ , are simple (nonrepeated), then
the corresponding part of the general solution is
 
K1 e(α+iβ )x + K2 e(α−iβ )x = K1 eαx cos(β x) + i sin(β x)
 
+ K2 eαx cos(β x) − i sin(β x)
 
= eαx c1 cos(β x) + c2 sin(β x) ,

where c1 = K1 + K2 , and c2 = (K1 − K2 )i. It is easy to verify that eαx cos(β x),
and eαx sin(β x), are linearly independent. As a consequence we have the following
theorem.
Theorem 1.9 (Complex Simple Roots) Suppose the characteristic equation (1.41)
has a nonrepeated pair of complex conjugate roots α ± iβ . Then the corresponding
part of the general solution of (1.39) is
 
eαx c1 cos(β x) + c2 sin(β x)

for constants c1 , and c2 .


The next example summarizes all three cases.
Example 1.18 Suppose the characteristic equation of a given differential equation
is found to be
(r2 − 2r + 5)(r2 − 9)(r + 2)2 = 0.
We are interested in finding its general solution. The term r2 − 2r + 5 = 0, has the
pairs of complex conjugate roots 1 −  2i, and 1 + 2i, and therefore
 its contribution
to the general solution is given by ex c1 cos(2x) + c2 sin(2x) . Similarly, r2 − 9 = 0
makes the contribution c3 e3x + c4 e−3x . Finally, c5 e−2x + c6 xe−2x is the contribution
corresponding to (r + 2)2 = 0. Hence, the general solution is
 
y = ex c1 cos(2x) + c2 sin(2x) + c3 e3x + c4 e−3x
+ c5 e−2x + c6 xe−2x .


30 Ordinary Differential Equations

1.8.1 Exercises
Exercise 1.42 Show that for constants a, and b and a function y(x) that is differen-
tiable,
(D − a)(D − b)y = (D − b)(D − a)y.
Exercise 1.43 Use an induction argument to show that
 
(D − r1 )k u(x)er1 x = (Dk u(x))er1 x .

Exercise 1.44 Show that for any complex number r,

Derx = rerx .

In Exercises 1.45- 1.46 the characteristic equation of a certain differential equation


is given. Find the corresponding general solution.
Exercise 1.45
(r2 + 4)(r2 − 9)2 (r + 2)2 = 0.
Exercise 1.46
(r2 + 4)2 (r2 + 3r + 2)3 (r + 5)2 = 0.

In Exercises 1.47- 1.51 solve the given differential equation.


Exercise 1.47
y′′ + y′ − 2y = 0, y(0) = 1, y′ (0) = 4.
Exercise 1.48
y′′′ − y′′ − 4y′ + 4y = 0,
Exercise 1.49
y(4) − y′′′ + y′′ − 3y′ + 5y = 0.
Exercise 1.50
y(4) + 2y′′′ + 3y′′ + 2y′ + y = 0.
Exercise 1.51

y′′′ + 10y′′ + 25y′ = 0, y(0) = 3, y′ (0) = 4, y′′ (0) = 5.

1.9 Nonhomogeneous Equations


In this section we consider the nonhomogeneous nth order differential equation with
constant coefficients

an y(n) (x) + an−1 y(n−1) (x) + . . . + a2 y′′ (x) + a1 y′ (x) + a0 y(x) = f (x), (1.48)
Nonhomogeneous Equations 31

and the associated homogeneous equation

an y(n) (x) + an−1 y(n−1) (x) + . . . + a2 y′′ (x) + a1 y′ (x) + a0 y(x) = 0, (1.49)

where f (x) is continuous on some interval I. In terms of the operator L Equations


(1.48) and (1.49) take the form

L y = f, L y = 0,

respectively. Let y p be a given particular solution of (1.48). Then L y p = f . In addi-


tion, assume z is any other solution of (1.48). Then L z = f , too. Due to the linearity
of L we have that

L (y p − z) = L y p − L z = f − f = 0.

Thus, yh = z − y p is a solution of the associated homogeneous equation given by


(1.49). It follows from Theorem 1.6 that,

yh (x) = c1 φ1 (x) + c2 φ2 (x) + . . . + cn φn (x),

for constants ci , i = 1, 2, . . . , n, where the functions ϕi (x), i = 1, 2, . . . , n are linearly


independent solutions of (1.49). Finding the particular solution y p depends on two
things:
(a) The type of function f (x) in (1.48), and
(b) the nature of the homogeneous solution yh of (1.49).

The method of this section only applies to functions f (x) that are polynomial in
x, combinations of sine or cosine, exponentials in x or combinations of the after-
mentioned forms of f (x). We illustrate the idea by displaying a few examples.
Example 1.19 The differential equation

y′′ − 3y′ + 2y = 4

has the homogenous solution yh (x) = c1 ex + c2 e2x . Since f (x) = 4, is a constant


we consider a particular solution of the form y p = A, where A is to be determined.
Substituting y p into the differential equation and solving for A gives, A = 2. Thus the
general solution is
y(x) = yh (x) + y p = c1 ex + c2 e2x + 2.

Example 1.20 For
y′′ − 3y′ + 2y = 2e3x
we have yh (x) = c1 ex + c2 e2x . Since f (x) = 2e3x , we consider a particular solution of
the form y p = Ae3x , where A is to be determined. Substituting y p into the differential
32 Ordinary Differential Equations

equation we arrive at the relation 2Ae3x = 2e3x . This gives A = 1, and hence the
general solution is

y(x) = yh (x) + y p = c1 ex + c2 e2x + e3x .


Example 1.21 The equation

y′′ − 3y′ + 2y = 2ex

has yh (x) = c1 ex + c2 e2x . Since f (x) = ex , we consider a particular solution of the


form y p = Aex , where A is to be determined. Substituting y p into the differential
equation we arrive at the relation (A − 3A + 2A)ex = 2ex . Or, 0ex = 2ex , which can
only imply that
0 = 2.
So what went wrong? Well, we said in the beginning that f (x) depends on the form
of yh too. Let’s start over. Now f (x) = ex , and so we try a particular solution of the
form y p = Aex . But the term ex is already present in yh and so we try to multiply Aex
by x. Thus, we end up with the particular solution of the form y p = Axex . Substituting
y p into the differential equation we arrive at A = −2, and hence the general solution
is
y(x) = yh (x) + y p = c1 ex + c2 e2x − 2xex .

The table below provides guidance on how to construct y p in the case that none of
the terms present in yh are parts of the forcing function f (x).

1.9.1 Exercises
In Exercises 1.52–1.53 the characteristic equation and the forcing function of a cer-
tain differential equation are given. Write down the particular solution without solv-
ing for the coefficients.
Exercise 1.52

(r2 + 4)(r2 − 9)2 (r + 2)2 = 0; f (x) = sin(2x) + e−5x + 10.

Exercise 1.53

r(r + 4)(r2 + 3r + 2)3 (r + 5)2 = 0; f (x) = cos(2x) + xe−5x + 1.

solve the differential equations.


Exercise 1.54 Solve each of the given differential equations.
(a) y′′ + y′ + y = ex cos(2x) − 2ex sin(2x).
(b) y′′ − 5y′ + 6y = xex cos(2x).
Wronskian Method 33
TABLE 1.1
Shows how to find y p .

f (x) yp
Constant C A
eax Aeax
n
Cx , n = 0, 1, 2, . . . A0 + A1 x + A2 x2 + . . . + An xn
cos(bx), or sin(bx) A1 cos(bx) + A2 sin(bx)
(A0 + A1 x + A2 x2 + . . . + An xn ) cos(bx)
xn cos(bx), or xn sin(bx)
+(B0 + B1 x + B2 x2 + . . . + Bn xn ) sin(bx)
xn eax (A0 + A1 x + A2 x2 + . . . + An xn )eax

(c) y(4) + 5y′′ + 4y = sin(x) + cos(2x).


(d) y(4) − 2y′′ + y = x2 cos(x).
(e) y′′′ − y′′ − 12y′ = x − 2xe−3x .

1.10 Wronskian Method


Suppose we are to solve the differential equation
y′′ + 4y = sec(2x). (1.50)
Then the method of Section 1.8 is not of much help here in constructing the particular
solution y p . This is due to the fact that f (x) = sec(2x) does not fit any of the forms
given in Table 1.1. To find y p we use the Method of variation of constants that we
call here the Wronskian method. It is a general method and applies to any linear
equation whether the coefficients are constants or not. Thus, we begin by considering
the general linear second-order differential equation
y′′ + P(x)y′ + Q(x)y = f (x), (1.51)
where the functions P(x), Q(x), and f (x) are all continuous on some interval I. As-
sume y1 (x), and y2 (x) are known solutions on the interval I of the corresponding
homogeneous equation
y′′ + P(x)y′ + Q(x)y = 0. (1.52)
Then the homogeneous solution of (1.52) is
yh (x) = c1 y1 + c2 y2 ,
for constants c1 , and c2 . Assume a particular solution y p (x) of (1.51) of the
form
y p (x) = u1 (x)y1 (x) + u2 (x)y2 (x) (1.53)
34 Ordinary Differential Equations

where the functions u1 , and u2 are to be found and continuous on the interval I. For
the rest of this section we suppress the independent variable x in (1.53). Differentiat-
ing (1.53) with respect to x we obtain

y′p = u′1 y1 + u1 y′1 + u′2 y2 + u2 y′2 .

We assume u1 and u2 satisfy the natural condition

u′1 y1 + u′2 y2 = 0. (1.54)

Substituting y p and y′p into (1.51) and making use of the fact that y1 (x), and y2 (x) are
known solutions of the corresponding homogeneous equation (1.52), we arrive at the
relation
u′1 y′1 + u′2 y′2 = f (x). (1.55)
Solving (1.54) and (1.55) by using the process of elimination we get

f (x)y2 f (x)y1
u′1 = and u′2 = . (1.56)
y2 y′1 − y1 y′2 y1 y′2 − y2 y′1

Using Wronskian notations, and hence the name of this section, we arrive at the easy
formulae to remember
W1 W2
u′1 = and u′2 = , (1.57)
W W
where

y y2 0 y2 y1 0
W = 1′ , W 1 = , and W2 = .
y′2 y′2 y′

y1 f (x) f (x)
1

Before we go for an example, we briefly discuss how the method can be extended to
nonhomogenous nth order differential equations of the form

y(n) (x) + Pn−1 (x)y(n−1) (x) + . . . + P2 (x)y′′ (x) + P1 (x)y′ (x) + P0 (x)y(x) = f (x).
(1.58)
If its corresponding homogeneous equation has the homogeneous solution yh (x) =
c1 y1 + c2 y2 + . . . + cn yn , then the particular solution y p is of the form

yh (x) = u1 y1 + u2 y2 + . . . + un yn ,

where
Wi
u′i = , i = 1, 2, . . . , n.
W
Here,
···

y1 y2 yn

y′2 ··· y′n

y1
W (y1 , y2 , . . . , yn ) = .. .. ..

..
. . . .
(n−1) (n−1) (n−1)

y
1 y2 ··· yn
Wronskian Method 35

and Wi is the determinant obtained by replacing the ith column of the Wronskian
by  
0
 0 
 ..  .
 
 . 
f (x)
We provide the following example.
Example 1.22 We consider (1.50). First, the homogeneous solution is

yh = c1 sin(2x) + c2 cos(2x),

and so we may set y1 = sin(2x), and y2 = cos(2x). Moreover,



sin(2x) cos(2x) 0 cos(2x)
W = = −2, W 1 = = −1
2 cos(2x) −2 sin(2x) sec(2x) −2 sin(2x)

and
sin(2x) 0
W2 = = tan(2x).
2 cos(2x) sec(2x)
Thus,
1 1
u′1 = , u′2 = − tan(2x).
2 2
An integration gives
x 1 
u1 = , u2 = ln cos(2x) .
2 4
Hence
x 1 
yp = sin(2x) + ln cos(2x) cos(2x).
2 4
Finally, the general solution is y = yh + y p . □

1.10.1 Exercises
In Exercises 1.55- 1.60 solve the given differential equation.
Exercise 1.55
y′′ + 9y = csc(3x).
Exercise 1.56
ln(x)
y′′ + 2y′ + y = , x > 0.
ex
Exercise 1.57
y′′ + y′ + 2y = e−x sin(2x).
Exercise 1.58
2ex
y′′ − y = .
ex + e−x
36 Ordinary Differential Equations

Exercise 1.59
1
y′′ − 6y′ + 9y = , x > 0.
x
Exercise 1.60
1
e4x y′′ + 8y′ + 16y = 2 , x > 0.

x

1.11 Cauchy-Euler Equation


We end this chapter by looking at Cauchy-Euler equations. An nth order homoge-
neous Cauchy-Euler equation is of the form

an xn y(n) (x) + an−1 xn−1 y(n−1) (x) + . . . + a2 x2 y′′ (x) + a1 xy′ (x) + a0 y(x) = 0, x>0

where a j , j = 1, 2, . . . , n, are constants. Euler equations are important since they pop
up in many applications and partial differential equations. In addition, they make
their presence in Chapter 4. We concentrate on the second-order Cauchy-Euler ho-
mogeneous equation
ax2 y′′ + dxy′ + ky = 0,
and write it in the form
b c
y′′ + y′ + 2 y = 0, x > 0. (1.59)
x x
We note that the coefficients of (1.59) are continuous everywhere except at x = 0.
However, we shall consider the equation over the interval (0, ∞). We solve the
Cauchy-Euler equation (1.59) by making the substitution x = et , or equivalently t =
ln(x). Once we rewrite (1.59) in terms of the new variables y, and t, then it is possible
to use the method of Section 1.8 to find its general solution. Let

x = et , or equivalently t = ln(x).

Then,
dy dy dt dy 1 dy
= = = e−t .
dx dt dx dt x dt
Moreover,

d2y d  dy  d  −t dy 
y′′ = = = e−t e
dx2 dx dx dt dt
dy d 2y 
= e−t − e−t + e−t 2

dt dt
dy 2
d y
= −e−2t + e−2t 2 .
dt dt
Cauchy-Euler Equation 37

Substituting into (1.59) and noting that x2 = e2t , we arrive at the second-order differ-
ential equation
d2y dy
+ (b − 1) + cy = 0. (1.60)
dt 2 dt
We remark that on the interval (−∞, 0) we make the substitution |x| = et , or equiva-
lently, x = −et , which will again reduce (1.59) to (1.60). Thus, once (1.60) is solved,
we use the inverse substitution t = ln |x| to obtain solutions of the original equation
(1.59). Recall the three cases that we discussed in Section 1.8.
(a) (Distinct roots) Let r1 and r2 be the two distinct roots of the auxiliary equation
of (1.60). Then, the general solution is given by

y(t) = c1 er1 t + c2 er2 t = c1 (et )r1 + c2 (et )r2 .

Letting t = ln |x|, we obtain the general equation of our original equation to be

y(x) = c1 |x|r1 + c2 |x|r2 .

(b) (Repeated roots) Let r be a repeated root of the auxiliary equation of (1.60).
Then, the general solution is given by

y(t) = c1 ert + c2tert = c1 (et )r + c2t(et )r .

Letting t = ln |x|, we obtain the general solution of our original equation to be

y(x) = c1 |x|r + c2 |x|r ln |x|

for constants c1 , and c2 .


(c) (Complex roots) Let r be a complex root of the auxiliary equation of (1.60) that
has a pair of complex conjugate roots α ± iβ , then the solution is

y(t) = eαt c1 cos(βt) + c2 sin(βt)

for constants c1 , and c2 . Letting t = ln |x|, we obtain the general solution of our
original equation to be

y(x) = |x|α c1 cos(β ln |x|) + c2 sin(β ln |x|) .

Example 1.23 Consider

x2 y′′ − xy′ + y = ln(x), x > 0.

We need to put the equation in the standard form, by dividing with x2 and arrive at
1 1 ln(x)
y′′ − y′ + 2 y = 2 , x > 0.
x x x
Next we find yh of
1 1
y′′ − y′ + 2 y = 0, x > 0.
x x
38 Ordinary Differential Equations

Here b = −1 and c = 1. The auxiliary equation is

r2 − 2r + 1 = 0,

with the repeated root r = 1. Thus, the homogeneous solution is

yh = c1 x + c2 x ln(x).

To find y p we use the method of Section 1.50. Let y1 = x, and y2 = x ln(x). Set
f (x) = ln(x)
x2
. Then

ln2 (x) ln(x)


W = x, W1 = − , and W2 = .
x x
As a consequence, we have

W1 ln2 (x) W2 ln(x)


u′1 = =− 2 and u′2 = = 2 .
W x W x
We carry out the integrations since they require techniques that students may benefit
from. Let u = ln(x). Then
  
ln2 (x) ln2 (x) 1
u1 = − dx = − dx = − u2 e−u du.
x2 x x
After an integration by parts twice in a row we arrive at

ln2 (x) 2 ln(x) 2


u1 = u2 e−u + 2ue−u + 2e−u = + + .
x x x
Similarly, if we let u = ln(x), then
  
ln(x) ln(x) 1
u2 = dx = dx = ue−u du.
x2 x x
An integration yields,

ln(x) 1
u2 = −ue−u − e−u = − − .
x x
Then, after some calculations we arrive at

y p = u1 y1 + u2 y2 = 2 + ln(x).

Finally,
y = yh + y p = c1 x + c2 x ln(x) + 2 + ln(x),
is the general solution. □
Cauchy-Euler Equation 39

1.11.1 Exercises
Exercise 1.61 Solve each of the given differential equation.
(a) x2 y′′ − 4xy′ + 6y = 0, y(−2) = 8, y′ (−2) = 0.
(b) x2 y′′ − xy′ + y = 4x ln(x), x > 0.
(c) x2 y′′ + xy′ + y = sec(ln(x)), x > 0.

(d) x2 y′′ + 3xy′ + y = x, x > 0.
(e) x2 y′′ − 2xy′ + 2y = 2x3 , x > 0.
(f) x2 y′′ + 2xy′ + y = ln(x), y(1) = y′ (1) = 0.
Exercise 1.62 Let α be a constant. Use the substitution x − α = et to reduce

(x − α)2 y′′ + b(x − α)y′ + cy = 0

to the equation
d2y dy
+ (b − 1) + cy = 0.
dt 2 dt
In Exercises 1.63-1.64 use the results of Exercise 1.62 to solve each of the given
differential equation.
Exercise 1.63

(x + 1)2 y′′ − (x + 1)y′ + y = 0, y(0) = 1, y′ (0) = 0.

Exercise 1.64
(x − 2)2 y′′ + y = 0, y(1) = 3, y′ (1) = 1.
2
Partial Differential Equations

This chapter is intended to serve as an introduction to the topic of partial differ-


ential equations. We will discuss basic and fundamental topics that are suitable for
this course. We mainly discuss first-order partial differential equations and Burger’s
equation using the method of characteristics. We move on to the study of second-
order partial differential equations and their classifications. We consider the wave
and heat equations on bounded and unbounded domains.

2.1 Introduction
A partial differential equation, short PDE, is an equation that contains the unknown
function u and its partial derivatives. Recall from Chapter 1, Section 1.1 that given a
function of two variables, f (x, y), the partial derivative of f with respect to x is the
rate of change of f as x varies, keeping y constant and it is given by
∂f f (x + h, y) − f (x, y)
= lim .
∂ x h→0 h
Similarly, the partial derivative of f with respect to y is the rate of change of f as y
varies, keeping x constant and it is given by
∂f f (x, y + h) − f (x, y)
= lim .
∂ y h→0 h
More often we write fx , fy to denote ∂∂ xf and ∂∂ yf , respectively. Similar notations will
be used to denote higher partial derivatives and mixed partial derivatives. Let D be a
subset of R2 and u = u(x, y) such that u : D → R. Then we may denote the general
first oder PDE in u(x, y) by
F(x, y, u(x, y), ux (x, y), uy (x, y)) = 0,
or
F(x, y, u, ux , uy ) = 0, (2.1)
for some function F. In this Chapter, we limit our discussion to PDEs in two inde-
pendent variables. Below we list some important PDEs.
ut + cux = 0, c ∈ R (Transport equation).

DOI: 10.1201/9781003449881-2 40
Introduction 41

uxx + uyy = 0 (Laplace’s equation).


ut + uux = 0 (Burger’s equation).
2
utt − c uxx = 0 (Wave equation).
ut − kuxx = 0 (Heat equation).

Loosely speaking, by a solution of (2.1), we mean a function u(x, y) = ϕ(x, y) such


that 
F x, y, ϕ, ϕx , ϕy = 0
for all x, y ∈ D ⊆ R2 .
Example 2.1 For an arbitrary function f , ϕ(x,t) = f (x − ct) is a general solution
of the transport equation ut + cux = 0 since

ϕt + cϕx = −cϕ ′ (x − ct) + cϕ ′ (x − ct) = 0.


Order and linearity are two of the main properties of PDEs.
Definition 2.1 The order of a partial differential equation is the highest order
derivative in the given PDE.
Definition 2.2 A given partial differential equation is said to be linear if the un-
known function and all of its derivatives enter linearly.
For example, all the equations listed above are linear except Burger’s equation. Now
to better understand linearity we utilize the operator concept L on an appropriate
space, where L is a differential operator. Recall from Chapter 1, that an operator
is really just a function that takes a function as an argument instead of numbers as
we are used to dealing with in functions. For example, L u assigns u a new function
L u. Another example if we take

∂2 ∂2
L = − ,
∂t 2 ∂ x2
then
L u = utt − uxx .
The next definition gives a precise and convenient way to test for linearity.
Definition 2.3 An operatorL is said to be linear if is satisfies
(a)
L (u1 + u2 ) = L u1 + L u2 ,

(b)
L (cu1 ) = cL u1 ,
42 Partial Differential Equations

for any functions u1 , u2 and constant c. Moreover, the equation L u = 0 is said to be


linear if the operator L is linear.
Example 2.2 Consider utt − uxx = 0. To show it is linear we let

L u = utt − uxx .

Then for any functions u, v, we have

L (u + v) = (u + v)tt − (u + v)xx = [utt − uxx ] + [vtt − vxx ] = L u + L v,

L (cu) = (cu)tt − (cu)xx = c(utt ) − c(uxx ) = c[utt − uxx ] = cL u.


This shows the PDE in question is linear. □
Example 2.3 The following PDE uut − ux = 0 is not a linear. To see this, we let

L u = uut − ux .

Then for any functions u, v and a constant c, we have

L (u+v) = (u+v)(u+v)t −(u+v)x = [uut −ux ]+[vvt −vx ]+uvt +vut ̸= L u+L v.

This shows the PDE in question is not linear. □


Let L u = 0 and assume L is a linear operator. If u is a solution, then so is cu, for
some constant c since L (cu) = cL u = 0. Similarly, if v is another solution then so is
c1 u + c2 v, since for any constants c1 and c2 we have L (c1 u + c2 v) = c1 L u + c2 L v.
Thereupon, we have the following theorem.
Theorem 2.1 [Principle of Superposition] Assume L is a linear operator. If
u1 (x, y), u2 (x, y), . . . , un (x, y) are solutions of L u = 0, then so is
n
c1 u1 (x, y) + c2 u2 (x, y) + . . . + cn un (x, y) = ∑ ci ui (x, y).
i=1

Example 2.4 In this example, we show that each of


2 π 2t
un (x,t) = e−n sin(nπx), n = 1, 2, . . . , N

is a solution of the heat equation uxx − ut = 0, t > 0, for the temperature u = u(x,t)
in a rod, considered as a function of the distance x measured along the rod and of the
time t. In addition, we show for constants ci , i = 1, 2, . . . , N that
N
u(x,t) = ∑ cn un (x,t)
n=1

is also a solution.
Introduction 43
∂2
To do so, we let L be the differential operator given by L = ∂ x2

− ∂t . It is clear that
L is linear. Now for any fixed n, n = 1, 2, . . . , N we have

∂ 2  −n2 π 2 t  ∂  2 2 
L un = 2
e sin(nπx) − e−n π t sin(nπx)
∂x ∂t
∂  2 2
 2 2
= nπe−n π t cos(nπx) + n2 π 2 e−n π t sin(nπx)
∂x
2 π 2t 2 π 2t
= −n2 π 2 e−n sin(nπx) + n2 π 2 e−n sin(nπx)
= 0.

This shows L un (x,t) = 0, n = 1, 2, . . . , N. As for the rest of the work, we apply u to


L.
N
Lu = L

∑ cn un (x,t)
n=1
= L (c1 u1 (x,t)) + L (c2 u2 (x,t)) + L (cN uN (x,t))
= c1 L u1 (x,t) + c2 L u2 (x,t) + cN L uN (x,t)
= 0.


Let L be a differential operator. Then the partial differential equation

Lu= f (2.2)

for continuous function f in x and y is called nonhomogeneous partial differential


equation. Its corresponding homogeneous partial differential equation is

L u = 0. (2.3)

If we assume L is linear, then the construction of the general solution of the non-
homogeneous PDE given by (2.2) is similar to its counterpart in ordinary differen-
tial equations. Let u p be a particular solution of (2.2) and uh be the homogeneous
solution of (2.3). Then, due to the linearity of the differential operator L we see
that
L (uh + u p ) = L uh + L u p = 0 + f = f .
Thus, it suffices to find u p of (2.2) and add it to the homogeneous solution uh of (2.3)
to get the general solution
u = uh + u p .
When the PDE is linear and involves only simple derivatives of only one variable, it
is more likely that it can be solved along the lines of an ordinary differential equation,
as the next example shows.
Example 2.5 In this example we display various forms of the solution u = u(x, y)
for the following PDEs
44 Partial Differential Equations

(a) uy = 0,
(b) uyy + u = 0,
(c) uyyx = 0.

We begin with (a). The given PDE, uy = 0 has no partial derivatives with respect
to x, which indicates that the solution u is a function of x only. Thus, the solution is
u(x, y) = g(x), for some function g. Suppose we impose the initial condition u(x, a) =
e2x , then
e2x = u(x, a) = g(x),
which uniquely determines the function g.

(b) Now the PDE, uyy + u = 0 can be thought of as the second-order ODE
z′′ +z = 0, which has the solution z = c1 cos(y)+c2 sin(y). Or, u(x, y) = g(x) cos(y)+
h(x) sin(y), since the constants c1 and c2 may depend on the other variable x, where
the functions g and h are differentiable.

(c) Let f be twice differentiable function in y. Then, we integrate with respect to


x and get
uyy = fyy (y),
and from which we have uy = fy (y) + g(x). Integrating one more time we arrive at
the general solution
u(x, y) = f (y) + yg(x) + h(x).

Here is another example.
Example 2.6 Consider the PDE uyy + u = 3 + y.
Since it is nonhomogeneous we need to find uh and u p . From Example 2.5, we have

uh (x, y) = g(x) cos(y) + h(x) sin(y).

Since f = 3 + y, we look for a particular solution in the form u p = A + By, where A


and B are to be determined by substituting, (u p )yy , and u p into the original PDE. This
implies that A + By = 3 + y, and therefore, A = 3, and B = 1. As a result, we obtain
the general solution

u(x, y) = g(x) cos(y) + h(x) sin(y) + 3 + y.

2.1.1 Exercises
Exercise 2.1 Determine the order and use the operator L to decide linearity, non-
linearity of the given equations.
Linear Equations 45

(a) uxx + 3uy + 4u = 0.


(b) uxy + 3ux = x.
(c) ux + 3uyyyy + u1/2 = 0.
(b) ux = 1.
Exercise 2.2 Show that u(x,t) = et−x is a solution of utt − uxx = 0.
Exercise 2.3 Show that u(x, y) = f (bx − ay) for a differentiable function f is a solu-
tion of aux + buy = 0.
1
Exercise 2.4 Show that u(x,t) = t 3 + (x − t 2 /2)t + (x − t 2 /2)2 solves ut + tux =
6
x, −∞ < x < ∞, t > 0, and satisfies the initial data u(x, 0) = x2 .
Exercise 2.5 Solve 2ux − 3uy = 0, subject to the condition u(0, y) = ey .
Exercise 2.6 Find the general solution of each of the PDEs.
(a) uxx + 4u = 0.
(b) uxy = 0.
(c) uxy + uy = 0.
(d) uxx + 3ux + 2u = e5x .

(e) (x − y)uxy − ux + uy = 0. Hint: let v = (x − y)ux + u.
(f) uxxy = 0.
(g) uxy + uy = e2x+3y .

2.2 Linear Equations


Now is the time to learn how to solve first-order linear partial differential equations
with constant or variable coefficients. We begin with linear equations with constant
coefficients.

2.2.1 Linear equations with constant coefficients


We consider the first-order nonhomogeneous general partial differential equa-
tion
Aux (x, y) + Buy (x, y) +Cu(x, y) = G(x, y), (2.4)
where A, B, and C are constants such that A2 + B2 ̸= 0, and G is a given continuously
differentiable function in x and y. We begin by examining the homogeneous solution
46 Partial Differential Equations

of the corresponding homogenous equation

Aux + Buy +Cu = 0, (2.5)

where we have suppressed the independent variables x and y. Note that the require-
ment A2 + B2 ̸= 0, implies that A and B are not both zero at the same time; otherwise,
we would not have a differential equation to solve. Our aim is to use a linear transfor-
mation and transform (2.5) into an ordinary differential equation in a single indepen-
dent variable, say, x, and the dependent variable u. We begin by letting ξ = ξ (x, y)
and η(x, y) = η, where
ξ = c11 x + c12 y (2.6)
and
η = c21 x + c22 y (2.7)
where the constants c11 , c12 , c21 , and c12 are to be appropriately chosen in order to
reduce the PDE into an ODE. Using the chain rule, we have

ux = uξ ξx + uη ηx = c11 uξ + c21 uη , (2.8)

and
uy = uξ ξy + uη ηy = c12 uξ + c22 uη . (2.9)
Now substituting (2.8) and (2.9) into (2.5) and rearranging the terms we arrive
at  
Ac11 + Bc12 uξ + Ac21 + Bc22 uη +Cu = 0. (2.10)
Assume A ̸= 0 and choose c11 = 1, c12 = 0, c21 = B, and c22 = −A. Then

ξ = x and η = Bx − Ay.

Hence (2.10) becomes


Auξ +Cu = 0,
which is a separable ODE. Separating the variables while fixing η we arrive at
uξ C
=− ,
u A
C
from which we get ln |u| = − ξ + g(η), for some function g. Taking exponential on
A
both sides we obtain the homogeneous solution
C
u(ξ , η) = e− A ξ f (η),

where f is an arbitrary continuously differentiable function. Thus, the general solu-


tion of the homogenous equation (2.5) is
C
uh (x, y) = e− A x f (Bx − Ay). (2.11)
Linear Equations 47

If u p is a particular solution of (2.4), then the general solution of (2.4) is

u(x, y) = uh (x, y) + u p (x, y),

where uh (x, y) is given by (2.11). Now if B ̸= 0 then uh of (2.4) can be found to


be C
uh (x, y) = e− B y f (Bx − Ay). (2.12)

Remark 2 In the case that both A and B are not zero, you may use either (2.11) or
(2.12).
Example 2.7 Solve
ux + uy + u = x + y, (2.13)
subject to the initial condition
u(0, y) = y2 .
First we find uh and since neither A nor B is zero we have the luxury to use either
(2.11) or (2.12). Using (2.12) with A = B = C = 1, we obtain

uh (x, y) = e−y f (x − y).

Using the concept of Chapter 1 of obtaining the particular solution, we try

u p = a + bx + cy.

Then substituting into (2.13) we get

a + b + c + bx + cy = x + y.

It follows from the above expression that a = −2, b = c = 1. Then the general solu-
tion is
u(x, y) = e−y f (x − y) + x + y − 2.
Our next task is to use the initial data to uniquely determine the function f . The
initial data implies that when x = 0 and y = y we have u = y2 . Thus, it follows from
the general solution that
y2 = e−y f (−y) + y − 2.
This gives f (−y) = ey (y2 − y + 2). Let k = −y. Then f (k) = e−k (k2 + k + 2), from
which we get
f (x − y) = ey−x [(x − y)2 + (x − y) + 2].
It follows from
u(x, y) = e−y f (x − y) + x + y − 2
that
u(x, y) = e−x [(x − y)2 + (x − y) + 2] + x + y − 2.

48 Partial Differential Equations

Remark 3 Note that if C = 0 in (2.5), then the homogeneous solution is reduced to

uh (x, y) = f (Bx − Ay),

for an arbitrary continuously differentiable function f .


Example 2.8 Solve the transport equation

ut + cux = 0, −∞ < x < ∞, t > 0, (2.14)

subject to
u(x, 0) = f (x). (2.15)
Using the above remark, we immediately obtain the solution to be

u(x,t) = f (x − ct).

Now suppose we are given



 1, x≤0
f (x) = 1 − x, 0<x≤1
0, x > 1.

The easiest way is to replace x in f with x − ct and rearrange the domains. Thus,

 1, x − ct ≤ 0
u(x,t) = f (x − ct) = 1 − x + ct, 0 < x − ct ≤ 1
0, x − ct > 1.

It follows that 
 1, x ≤ ct
u(x,t) = 1 − x + ct, ct < x ≤ 1 + ct
0, x > 1 + ct.


Next, we give a geometrical interpretation of the characteristic lines and solutions.
Consider the simpler form of (2.5) with C = 0. That is,

Aux + Buy = 0, (2.16)

where A2 + B2 ̸= 0. That is A and B can not both be zero. Let · denote the inner
product in R2 . Then (2.16) is equivalent to

< A, B > · < ux , uy >= 0.

If we let →
−v =< A, B >, then the left side of the equation is the directional derivative
of u in the direction of the vector →

v . That is, the solution u of (2.16) must be constant
in the direction of the vector v = Ai + B j. The lines parallel to the vector →

− −v have
the equation for an arbitrary constant constant K

Bx − Ay = K, (2.17)
Linear Equations 49
y y
η

>
B
A,
<
v= (x, y) ξ

x x
η =< x, y > · < B, −A >
ξ =< x, y > · < A, B >

FIGURE 2.1
Characteristic lines; change of coordinates.

since the vector < B, −A > is orthogonal to → −v , and as such is a normal vector to all


lines that are parallel to v . Thus, (2.17) provides a family of lines and each one of
them is uniquely determined by the specific value of K. The family of lines given
by (2.17) are called characteristic lines for the equation (2.16). We conclude that the
solution u(x, y) is constant in the direction of →

v , which is true also along the family
of lines given by (2.17). Such any line is determined by K = Bx − Ay and hence u
will depend only on Bx − Ay. It follows that

u(x, y) = f (Bx − Ay),

as we have obtained in Remark 3, for an arbitrary function f . We refer to Fig.


2.1.

2.2.2 Exercises
Exercise 2.7 Give all the details in arriving at Equation (2.12).
Exercise 2.8 Redo Example 2.7 using (2.11).
Exercise 2.9 For constants A and B use the transformation

ξ = Ax + By, η = Bx − Ay

and obtain the general solution



− C ξ C ξ
 
u(ξ , η) = e A2 +B2 f (η) + g(ξ , η)e A2 +B2 dξ ,

of the nonhomogeneous equation

Aux (x, y) + Buy (x, y) +Cu(x, y) = g(x, y).


50 Partial Differential Equations

Exercise 2.10 Use the results of Exercise 2.9 to solve

−2ux + 4uy + 5u = ex+3y ,

subject to
u(0, y) = 2 + y.
Exercise 2.11 Solve each of the given PDEs.
(a) ux + uy + u = sin(x).
(b) ux + u = x, u(0, y) = y2 .
(c) 2ux + uy = x + ey , u(0, y) = y2 .
Exercise 2.12 Solve
(a) ux − 3uy = sin(x) + cos(y), u(x, 0) = x.
(b) 2ux + uy = x + ey , u(0, y) = y2 .
Exercise 2.13 Solve
1
(a) ux − 5uy = 0, u(x, 0) = .
1 + x2
(b) ux + ut = −3, u(x, 0) = e3x .
Exercise 2.14 Solve for u(x,t) and sketch u(x,t) at t = 1, 2.

ut − ux = 0, −∞ < x < ∞, t > 0,

subject to
u(x, 0) = f (x), (2.18)
where 
 1, x≤0
f (x) = 1 − x, 0<x≤1
0, x > 1.

2.2.3 Equations with variable coefficients


We turn our attention to the general first-order linear partial differential equation with
variable coefficients

A(x, y)ux (x, y) + B(x, y)uy (x, y) +C(x, y)u(x, y) = G(x, y), (2.19)

where A and B are continuously differentiable functions in the variables in the open
set D ⊂ R2 and that A and B do not simultaneously identically vanish in D. In addi-
tion, the functions C and G are continuous in D. We begin by examining the homo-
geneous equation
Aux + Buy +Cu = 0, (2.20)
Linear Equations 51

where we have suppressed the independent variables x and y. Let


ξ = ξ (x, y), η = ξ (x, y)
be a transformation on D with

ξ ηx
J = x = ξx ηy − ξy ηx ̸= 0. (2.21)
ξy ηy
Note that condition (2.21) is necessary so that the transformation is invertible. Then
by (2.8) and (2.9)
ux = uξ ξx + uη ηx ,
and
uy = uξ ξy + uη ηy .
Substituting into (2.20) and rearranging the terms we obtain
 
Aξx + Bξy uξ + Aηx + Bηy uη +Cu = 0, (2.22)
where A, B, and C are functions of ξ and η. As before, our ultimate goal is to
reduce (2.22) into an ODE and so we ask that
Aηx + Bηy = 0. (2.23)
With this particular choice, the PDE given by (2.22) is reduced into an ODE in the
B
variables ξ , and u. Let’s assume A ̸= 0. Let y(x) be a curve with slope , that forms
A
the surface that is a solution to the PDE. Then,
d dy B
η(x, y(x)) = ηx + ηy = ηx + ηy .
dx dx A
This implies that the characteristic lines given by η(x, y) = k, for constant k is a
dy B
solution to the characteristic equation = , with ηy ̸= 0, otherwise ηx = 0 as well,
dx A
and this will not be a solution. Thus, η(x, y) = k determines η. We may choose the
other variable ξ (x, y) = x. We observe that for this change of variables

ξx ηx
J= = ηy ̸= 0,
ξy ηy
and hence the transformation is invertible. Finally, our PDE reduces to
A(ξ , η)uξ (ξ , η) +C(ξ , η)u(ξ , η) = 0.
The above equation is a separable ODE and has the general solution
 C(ξ ,η)
− dξ
uh (x, y) = f (η)e A(ξ ,η) = f (η)Ψ(ξ , η), (2.24)
where f is an arbitrary function and Ψ is known. If u p is a particular solution of
(2.19), then the general solution of (2.19) is
u(x, y) = uh (x, y) + u p (x, y),
where uh is given by (2.24).
52 Partial Differential Equations

Remark 4 In some cases, it is difficult to compute the particular solution u p of


(2.19) and then add it to the homogeneous solution uh to obtain the general solu-
tion. However, it might be beneficial to transform the whole equation of (2.19) into
a PDE in terms of ξ and η and then use methods of ODEs to solve it. In doing so,
the transformation turns (2.19) into the new PDE

a(ξ , η)uξ + b(ξ , η)uη + c(ξ , η)u = g(ξ , η). (2.25)

This can be achieved by making use of

ux = uξ ξx + uη ηx ,

and
uy = uξ ξy + uη ηy .
Example 2.9 Consider

x2 ux − xyuy + yu = 0, x ̸= 0, (2.26)

subject to
u(1, y) = e2y .
Based on the above discussion, we have A = x2 , B = −xy, and C = y. The character-
istic equation
dy −xy
= 2 ,
dx x
has the characteristic lines as its solution given by xy = k. Thus, we define η(x, y) =
xy. Let ξ (x, y) = x. Then the Jacobian

ξx ηx
J= = x ̸= 0.
ξy ηy

Hence the transformation is invertible. Moreover,

ux = uξ + yuη , uy = xuη .

Substituting into (2.26), we obtain

x2 uξ + yu = 0.

In terms of ξ and η the equation is reduced to


−η
uξ = u.
ξ3

An integration with respect to ξ yields the solution


 −η
− dξ
u(ξ , η) = f (η)e ξ3

− η2
= f (η)e 2ξ .
Quasi-Linear Equations 53

In terms of x and y we have


y
u(x, y) = f (xy)e− 2x .
Using the initial data we obtain
y
e2y = f (y)e− 2 ,
5y 5xy
from which we have f (y) = e 2 . Thus, f (xy) = e 2 , and the solution is
5xy y
u(x, y) = e 2 e− 2x .

2.2.4 Exercises
Exercise 2.15 Solve
2
(a) yux + xuy = 0, u(0, y) = e−y .
(b) xux − uy = 0, u(x, 0) = x.
Exercise 2.16 Solve
1
(a) 3ux − 2uy + u = x, u(x, 0) = .
1 + x2
(b) ux + ut = −3, u(x, 0) = e3x .
Exercise 2.17 Use Remark 4 to solve
(a) xux + uy = 1, u(x, 0) = cos(3x).
(b) xux − yuy + y2 u = y2 , x, y ̸= 0.

2.3 Quasi-Linear Equations


We now investigate the general first-order quasi-linear partial differential equa-
tion
A(x, y, u)ux (x, y) + B(x, y, u)uy (x, y) = C(x, y, u), (2.27)

where A, B and C are continuously differentiable functions of the variables x, y and


u in some open set D ⊂ R3 . We represent the function u(x, y) by a surface z = u(x, y),
in the xyz-space, see Fig. 2.2. Surfaces corresponding to solutions of a PDE are called
integral surfaces of the PDE. Suppressing the argument, and using inner product, the
left side of (2.27) can be written as
< A, B, C > · < ux , uy , −1 > = 0.
54 Partial Differential Equations

< ux , uy , −1 >

< A, B,C >

z = u(x, y)

FIGURE 2.2
Surface z = u(x, y).

Hence, the normal vector < ux , uy , −1 > to the surface at a given point is orthog-
onal to the vector < A, B, C > at that point. It follows that the vector < A, B, C >
must be tangent to the surface u(x, y)−z = 0 and therefore integral surfaces must be
formed from the integral curves of the vector field < A, B, C > . Thus, the integral
curves are given as solutions to the system of ODEs
dx dy dz
= A(x, y, z), = B(x, y, z), = C(x, y, z). (2.28)
dt dt dt

Note that the choice of parameter t in (2.28) is artificial and it can be suppressed to
write (2.28) in the form
dx dy dz
= = . (2.29)
A(x, y, z) B(x, y, z) C(x, y, z)

Either of the systems (2.28) or (2.29) is called the characteristic system associ-
ated with (2.27). Characteristics curves are solutions of either system (2.28) or
(2.29).
If a surface S : z = u(x, y) is a union of characteristic curves, then S is an integral
surface. We have the following theorem.
Theorem 2.2 Assume the point P = (x0 , y0 , z0 ) is a point on the integral surface
S : z = u(x, y). Let γ be the characteristic curve through P. Then γ lies entirely on
S.
Quasi-Linear Equations 55

Proof Assume γ = (x(t), y(t), z(t)) is a solution of (2.29) and satisfies γ =


(x(t0 ), y(t0 ), z(t0 )) = (x0 , y0 , z0 ) at some initial time t0 . Let

W (t) = z(t) − u(x(t), y(t)).

Then W (t0 ) = z(t0 ) − u(x0 , y0 ) = 0, since the point P lies on the surface S. It follows
from the chain rule and (2.28) that
dW dz dx dy
= − ux (x, y) − uy (x, y)
dt dt dt dt
= C(x, y, z) − ux (x, y)A(x, y, z) − uy (x, y)B(x, y, z),

which can be written in the form


dW
= C(x, y,W + u(x, y)) − ux (x, y)A(x, y,W + u(x, y))
dt
− uy (x, y)B(x, y,W + u(x, y)). (2.30)

Since u(x, y) satisfies (2.27) we see that W = 0 is a particular solution of (2.30). By


uniqueness of solutions, (uniqueness theorem from ODEs) we know that W = 0 only
hold at t = t0 . Thus, the function W (t) = z(t)−u(x(t), y(t)) vanishes identically. This
implies that the curve gamma lies entirely on S. This completes the proof.

Next, we discuss the Cauchy Problem for quasi-linear equation (2.27), which says,
find the integral surface z = u(x, y) of (2.27) containing an initial curve Γ.
Method of solution:
Let Γ be non-characteristic contained in the surface u(x, y) − z = 0 ; (that is the
tangent to Γ is nowhere tangent to the characteristic vector < A, B, C > along Γ), be
given initial curve. Parametrize Γ by

x = f (s), y = g(s), z = h(s), at initial time t = 0.

To construct an integral surface containing Γ, one can proceed as follow;


1) Solve either (2.28) or (2.29) and then use the data given by the initial curve
Γ to obtain the constants of integrations. This should produce two independent
integral curves,
   
u1 f (s), h(s), h(s) = c1 , u2 f (s), h(s), h(s) = c2 .

2) Eliminate s from the two equations in 1) and obtain the functional relation
F(c1 , c2 ) = 0, between c1 and c2 .
3) Then the solution to the Cauchy problem is
 
F u1 (x, y, z), u2 (x, y, z) = 0.
56 Partial Differential Equations

4) If we are given an initial curve, then utilize it to compute the arbitrary function
in 3).
Remark 5 (1) You may obtain different looking solutions, but this depends on
whether you use (2.28) or (2.29). However, once you apply the initial data given
by the initial curve Γ, then solutions should match.
(2) The method can be easily extended to PDEs of multiple variables by adapting the
relations (2.28) and (2.29). We shall explain this in one of the examples below.
Example 2.10 Find u(x, y) satisfying

xux + yuy = u + 1,

and
u(x, 1) = 3x.
Let z = u(x, y). First we parametrize the initial curve Γ given by the data u(x, 1) = 3x.
Let x = s at t = 0. Then

Γ : x = s, y = 1, z = 3s, for t = 0.

Using system (2.28) we have

dx dy dz
= x, = y, = z + 1,
dt dt dt
with corresponding solutions

x = c1 et , y = c2 et , z = c3 et − 1.

Applying the initial data given by the parametrized curve Γ, we obtain c1 = s, c2 = 1,


and c3 = 3s + 1. Thus, we have the solutions

x = set , y = et , z = (3s + 1)et − 1.


x
Eliminating s in the two equations x = set , y = et gives s = et = xy . Finally, substi-
tuting s = xy in z = (3s + 1)et − 1 yields,

x
z = (3 + 1)y − 1 = 3x + y − 1.
y
So the solution is
u(x, y) = 3x + y − 1.
x
Fig. 2.3 shows the characteristic lines given by y = intersecting the initial curve
s
y = 1 at exactly one point (they are not in the same direction) and hence, as the curves
cover all the plane, the solution is defined everywhere. □
In the next example, we revisit the transport equation with incompatible data.
Quasi-Linear Equations 57
y

s = −1 s = −0.5 s = 0.5 s=1


s = −2 s=2
s=3 y=1

FIGURE 2.3
x
Characteristic lines y = s intersecting initial curve y = 1.

Example 2.11 Consider the transport equation

ux + cuy = 0, −∞ < x < ∞, y > 0, (2.31)

subject to
u(x, cx) = f (x). (2.32)
To parametrize the initial curve Γ we let x = s at t = 0. Then

Γ : x = s, y = cs, z = f (s), for t = 0.

Using system (2.28), we have

dx dy dz
= 1, = c, = 0,
dt dt dt
and corresponding solutions along Γ are

x = t + c1 , y = ct + c2 , z = c3 .

Applying the initial data given by the parametrized curve Γ, we obtain c1 = s, c2 =


cs, and c3 = f (s). Thus, we have the solutions

x = s + t, y = ct + cs, z = f (s).

We see that it is not feasible to eliminate s and write the solution u in x and y. Notice
that the characteristic line given by y = c(s + t) = cx, for a fixed value of c is the
same as the equation of the initial curve y = cx. In other words, the direction of the
characteristic lines are in the same direction as the initial curve. □
58 Partial Differential Equations

Example 2.12 Find u(x, y) satisfying

ux + uy + u − 1 = 0,

subject to
u(x, x + x2 ) = sin(x), x > 0.
Let z = u(x, y). By parametrizing the initial curve Γ we have

Γ : x = s, y = s + s2 , z = sin(s), s>0 for t = 0.

Using system (2.28), we have


dx dy dz
= 1, = 1, = 1 − z,
dt dt dt
and corresponding solutions evaluated at the initial curve

y = s + s2 + t, z = 1 − 1 − sin(s) e−t .

x = s + t,

The√two equations x = s + t, y = s + s2 + t, give y = x + s2 . Then solving for s gives


s = y − x. Substituting s and t = x − s into the equation z = 1 − 1 − sin(s) e−t and
using z = u(x, y) we obtain the solution
 √  √
u(x, y) = 1 − (1 − sin( y − x) e(−x+ y−x) ,

which is defined for y > x. In Fig. 2.4, we plotted the traces of the characteristic
curves given by y = x + c, where c = s2 against the data from the initial curve y =
x + x2 and the two intersect in the region y > x, where the solution exists. □
Example 2.13 Consider the transport equation

ux + xuy = 0, −∞ < x, y < ∞, (2.33)

subject to
u(x, 0) = f (x). (2.34)
To parametrize the initial curve Γ we let x = s at t = 0. Then

Γ : x = s, y = 0, z = f (s), for t = 0.

Using system (2.28), we get


dx dy
= ,
1 x
gives
x 2 s2
y= + .
2 2
We also get z = f (s) from the relation dz
dt = 0. Solving for s we arrive at s =
p
2
± x − 2y. Thus, in order for a solution to exist we must require that f be an even
Quasi-Linear Equations 59

y>x

y = x + x2 y = x+1

y y=x

x
(0, 0)

FIGURE 2.4
The solution exists in the shaded region y > x.

2 2
function. Since the solution is constant on the characteristic curves y = x2 + s2 , and
the curves intersect the line y = 0 (the initial curve), the solution exists in the re-
2
gion where the traces of the characteristic lines given by y = x2 , intersects the y-axis.
Moreover, we must make sure that s2 = 2y − x2 > 0. In conclusion, solutions exist in
the region
x2
y≤
2
as depicted by Fig. 2.5.

The next example will show that based on the nature of the given equation, you will
need to utilize either (2.28) or (2.29). But first, we make the following remark.
Remark 6 A useful technique for integrating a system of first-order equations is that
of multipliers. Recall from algebra that is ab = dc , then
a c λ a + µc
= =
b d λ b + µd
for arbitrary values of multipliers λ , µ. This can be generalized and one would have
a c e λ a + µc + νe
= = = (2.35)
b d f λ b + µd + ν f
60 Partial Differential Equations
y

x2 x2
y≤ 2 y≤ 2

x
(0, 0)
x2
y≤ 2

x2
y≤ 2

FIGURE 2.5
Feasible region for the solution.

for arbitrary multipliers λ , µ, and ν. Hopefully, with the right choices of the param-
eters λ , µ, and ν, expression (2.35) leads to simpler systems of ODEs that can be
easily solved. In particular, if λ , µ, and ν, are chosen such that
λ A + µB + νC = 0,
then λ dx + µdy + νdz = 0. Now if there is a function u such that
du = λ dx + µdy + νdz = 0,
then u(x, y, z) = c1 is an integral curve.
Example 2.14 Solve
xux + (x + u)uy = y − x, (2.36)
containing the curve
u(x, 2) = 1 + 2x.
A parametrization of the initial curve Γ gives
Γ : x = s, y = 2, z = 1 + 2s, for t = 0.
Here you can not use system (2.28) since the obtained ODEs will not be separable.
Hence we resort to using (2.28), in combination with (2.35). From (2.28), we see that
dx dy dz
= = .
x x+z y−x
Quasi-Linear Equations 61

Then, it follows from (2.35)


λ dx + µdy + νdz dx dy dz
= = = . (2.37)
λ x + µ(x + z) + ν(y − x) x x+z y−x
In (2.37), set λ = 0, µ = 1 and ν = 1, and obtain
dy + dz dx
= .
y+z x
y+z 
A direct integration yields, ln(y + z) = ln(x) + k, or ln x = k. Taking exponential
on both sides we arrive at the first integral curve,
y+z
c1 = , (2.38)
x
where the constant k is replaced with the c1 = ek . Applying the initial data given by
the initial curve we obtain c1 = 3+2s
s , or

3
s= . (2.39)
c1 − 2
dy − dx dz
Similarly, if we set λ = −1, µ = 1, and ν = 0 in (2.37) we obtain = ,
z y−x
or
d(y − x)(y − x) = zdz.
An integration gives the second integral curve
(y − x)2 − z2 = c2 .
Applying the initial data we get the relation
(2 − s)2 − (1 + 2s)2 = c2 . (2.40)
Substituting the value of s given by (2.39) into (2.40) produces the expression
3 2 3 2
(2 − ) − (1 + 2 ) = c2 . (2.41)
c1 − 2 c1 − 2
Finally, to obtain a functional relation of the solution substitute c1 and c2 where c1 is
given by (2.38) and c2 = (y − x)2 − z2 into (2.41). Don’t forget to replace z by u for
the final answer.

2.3.1 Exercises
Exercise 2.18 Find u(x, y) satisfying
ux + uy = 1 − u,
subject to
u(x, x + x2 ) = ex , x > 0.
62 Partial Differential Equations

Exercise 2.19 Find u(x, y) satisfying

xuux − uy = 0,

subject to
u(x, 0) = x.
Exercise 2.20 Find u(x, y) satisfying

xux + yuuy = −xy, x > 0

subject to
u(x, 1/x) = 5.
Exercise 2.21 Find u(x, y) satisfying

uux + uy = 3,

subject to
u(2x², 2x) = 0.
Exercise 2.22 Find u(x, y) satisfying

xuux + y²uy = u², x > 0, y > 0,

and
u = s, when x = 1/s, y = 2s.
Exercise 2.23 Find the general solution u(x, y) of

(x + u)ux + (y + u)uy = 0.

Exercise 2.24 Find u(x, y) satisfying

xux + yuy = x e^(−u),

subject to
u(x, x2 ) = 0.
Exercise 2.25 Find u(x, y) satisfying

xux + yuy = 1 + y²,

subject to
u(x, 1) = x + 1.
Answer: u(x, y) = ln(y) + y²/2 + 1/2 + x/y.

Exercise 2.26 Find u(x, y) satisfying


2xyux + (x² + y²)uy = 0, x ≠ y, x > 0, y > 0,
subject to
u(x, 1 − x) = sin(x/(x − y)).
Exercise 2.27 Find u(x, y) satisfying
x²ux + uuy = 1,
subject to
u(x, 1 − x) = 0, x > 0.
Answer: x/(1 + xu) = u²/2 + 1 − y.
Exercise 2.28 Find u(x, y) satisfying
3ux − 2uy = x − u,
subject to
(a) u(x, x) = 2x.
(b) u(x, (1 − 2x)/3) = 0.
Exercise 2.29 Use the idea of Remark 6 to solve
(a) (x + y)ux + yuy = x − y; u(x, 1) = 1 − 3x.
(b) (x + y)ux + (x + yu)uy = u² − 1; u(x, 1) = x².
(c) (y − u)ux + (u − x)uy = x − y; u(x, 2x) = 0.
(d) (y − x)ux + (y + x)uy = (x² + y²)/u; u(x, 0) = x.
Exercise 2.30 Find u(x, y, z) satisfying
(a) xux + yuy + uz = u; u(x, y, 0) = h(x, y), for a suitable function h.
(b) ux + 3uy − 2uz = u; u(0, y, z) = h(y, z), for a suitable function h.
(c) yux + xuy + uz = 0; u(x, y, 0) = h(x, y), for a suitable function h.

2.4 Burger’s Equation


This section is devoted to the study of Burger’s equation
uy + a(u)ux = 0, −∞ < x < ∞, y > 0 (2.42)
FIGURE 2.6
Characteristic lines y = (1/c)x − s/c do not run into each other.

subject to the initial data


u(x, 0) = h(x), (2.43)
for a suitable function h. We seek a solution u = u(x, y) that satisfies (2.42) and (2.43).
Burger’s equation (2.42) can be thought of as nonlinear one-wave equation, where
a(u) is the wave speed. We begin by considering the wave speed a(u) as a constant
c. In particular, we are analyzing

uy + cux = 0, −∞ < x < ∞, (2.44)

along the parametrized initial curve

Γ : x = s, y = 0, z = h(s), at t = 0.

From Example 2.11, we have the characteristic lines (Fig. 2.6)


s = x − cy,  or  y = (1/c)x − s/c,
and the solution u is constant along those lines. Thus, the solution is given by

u(x, y) = h(x − cy).

If we take the initial function h(x) = e^(−x²), then the solution becomes

u(x, y) = e^(−(x−cy)²),

which is graphed in Fig. 2.8 for wave speed c = 1 and different values of y.
Now we turn our attention to equation (2.42) subject to the initial data given by
(2.43). One of the characteristic equations is

dx/dy = a(u).
FIGURE 2.7
Wave propagation of the solution u(x, y) = h(x − cy), considering y as the time variable.

FIGURE 2.8
Wave propagation of the solutions u(x, y) = e^(−(x−cy)²) for wave speed c = 1.

If x(y) is the corresponding characteristic curve, then

du(x, y)/dy = uy + (dx/dy) ux = uy + a(u)ux = 0.

This shows that the solution u is constant along the characteristics. Moreover, the
characteristics are straight lines, as is obvious from the calculation below:

d²x/dy² = d/dy (dx/dy) = a′(u) du(x, y)/dy = 0.
Let
Γ : x = s, y = 0, z = h(s), at t = 0.
Using system (2.28), we have

dx/dt = a(z),  dy/dt = 1,  dz/dt = 0,
FIGURE 2.9
Characteristic lines running into each other in the nonlinear case, unlike Fig. 2.6.

and the corresponding solutions along Γ are

y = t,  x = s + a(h(s))t,  z = h(s).

Replacing t with y we obtain the family of characteristic lines

x = s + a(h(s))y,     (2.45)

where s is the x-intercept of the characteristic curves. The characteristics are straight
lines whose slopes are not constant but vary with s, as Fig. 2.9 shows. Eliminating
s and using z = u we obtain the solution

u(x, y) = h(x − a(u)y).     (2.46)

We want a continuous solution with continuous ux and uy. Using (2.46) we
have

ux = h′(x − a(u)y)(1 − a′(u)ux y),

or

ux = h′(s)/(1 + a′(u)h′(s)y).     (2.47)

Similarly, taking the partial derivative with respect to y we get

uy = h′(x − a(u)y)(−a′(u)uy y − a(u)).

Solving for uy gives

uy = −h′(s)a(u)/(1 + a′(u)h′(s)y).     (2.48)

Thus, along the characteristic x = s + a(h(s))y, we have ux and uy given by (2.47) and
(2.48), respectively. Moreover, ux and uy become infinite at the positive time

y = −1/(a′(u)h′(s)),  provided that a′(u)h′(s) < 0.     (2.49)

If a′(u) > 0, then in order for solutions to exist, expression (2.49) implies that
h′(s) > 0. In other words, h(s) is an increasing function. Otherwise, the solutions
will experience a "blow-up." For example, if a(u) = u, then condition (2.49) takes
the form

y = min{−1/h′(s)},

and solutions will experience blow-up at and beyond the time y = −1/h′(s₀), where
s₀ is the point at which −1/h′(s) attains its minimum and h′(s₀) < 0; that is, h is
decreasing near s₀.
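To see (2.46) in action for a(u) = u, the following sketch (illustrative only; the helper names are ours, and SciPy's brentq is assumed available) solves the implicit equation u = h(x − uy) at times before the blow-up, where the root in u is unique:

import numpy as np
from scipy.optimize import brentq

def h(s):
    return 1.0 / (1.0 + s ** 2)               # initial profile of Example 2.16 below

def solve_u(x, y):
    # u is the unique root of u - h(x - u*y) = 0 before blow-up;
    # h takes values in (0, 1], so the root is bracketed by [0, 1].
    return brentq(lambda u: u - h(x - u * y), 0.0, 1.0 + 1e-9)

for y in [0.0, 0.5, 1.0]:                      # times before blow-up (y* = 8*sqrt(3)/9)
    print([round(solve_u(x, y), 4) for x in np.linspace(-2.0, 2.0, 5)])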
Example 2.15 Solve

uy + uux = 0, −∞ < x < ∞, y > 0 (2.50)

subject to the initial data



u(x, 0) = h(x) =  1,      x ≤ 0
                  1 − x,  0 < x ≤ 1     (2.51)
                  0,      x > 1.

Parametrizing the initial curve Γ, we get



Γ : x = s,  y = 0,  z =  1,      s ≤ 0
                         1 − s,  0 < s ≤ 1
                         0,      s > 1.

Using a(u) = u, it follows from the above discussion that



y = t,  x = zt + s,  z = h(s) =  1,      s ≤ 0
                                 1 − s,  0 < s ≤ 1
                                 0,      s > 1.

Eliminating t from x and y, the characteristics satisfy

x = zy + s or s = x − zy. (2.52)

We try to piece the solution together since our initial data is given piecewise.
1. For s ≤ 0, u = z = 1. Moreover, we have from (2.52) that s = x − y. Since, s ≤ 0,
it follows that x − y ≤ 0. We conclude that

u = 1, for x ≤ y.

2. For s > 1, u = z = 0. Moreover, we have from (2.52) that s = x. We conclude


that
u = 0, for x > 1.
FIGURE 2.10
Initial wave profile.

3. For 0 < s ≤ 1, u = z = 1 − s. Moreover, we have from (2.52) that s = x − zy =
x − (1 − s)y. Solving for s yields s = (x − y)/(1 − y). Now substituting this value of s into
u = 1 − s gives

u = 1 − (x − y)/(1 − y) = (1 − x)/(1 − y).

As for the domain, we have 0 < s ≤ 1, which implies that 0 < (x − y)/(1 − y) ≤ 1. Rearranging
the terms we arrive at y < x ≤ 1. We conclude that

u = (1 − x)/(1 − y),  for y < x ≤ 1.

Finally, the solution is

u(x, y) =  1,                x ≤ y
           (1 − x)/(1 − y),  y < x ≤ 1     (2.53)
           0,                x > 1.


The obtained solution in (2.53) is valid for 0 ≤ y < 1, and discontinuous at y = 1.
The characteristics run into each others in the wedged region where y > 1. (See Fig.
2.11). Next, our goal is to extend the solution beyond y ≥ 1. To do so, we introduce a
curve starting at the discontinuity point (1, 1) and try to construct such curve (shock
path) as shown in the next section.
Example 2.16 Find the blow-up time for

uy + uux = 0, −∞ < x < ∞, y > 0

subject to the initial data


u(x, 0) = 1/(1 + x²).

FIGURE 2.11
Characteristic lines running into each other in the nonlinear case, unlike Fig. 2.6.

According to (2.49), the blow-up time is y = min{−1/h′(ξ)}, with h(ξ) = 1/(1 + ξ²).
Therefore, we have h′(ξ) = −2ξ/(1 + ξ²)². We need to find the minimum of the function
g, where

g(ξ) = −1/h′(ξ) = (1 + ξ²)²/(2ξ).

After some calculations, we find that

g′(ξ) = (3ξ⁴ + 2ξ² − 1)/(2ξ²).

Setting g′(ξ) = 0, it follows that the only feasible solution (positive and real time) is
ξ = 1/√3, which minimizes the function g. Thus, the blow-up time is

y = g(1/√3) = 8√3/9.
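A quick numerical cross-check of this computation (illustrative only, assuming SciPy's minimize_scalar is available):

import numpy as np
from scipy.optimize import minimize_scalar

g = lambda xi: (1.0 + xi ** 2) ** 2 / (2.0 * xi)
res = minimize_scalar(g, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / np.sqrt(3.0))               # minimizer is 1/sqrt(3)
print(res.fun, 8.0 * np.sqrt(3.0) / 9.0)       # blow-up time is 8*sqrt(3)/9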

2.4.1 Shock path


Consider the first-order PDE

ut + (F(u))x = 0,  −∞ < x < ∞, t > 0     (2.54)

where F is a continuously differentiable function. An equation of this form is called a
conservation law for the following reasons. Integrate (2.54) from x = a to x = b and
get

d/dt ∫_a^b u(x,t) dx + ∫_a^b (F(u))x dx = 0.

FIGURE 2.12
Shock path.

But,

∫_a^b (F(u))x dx = F(u(b,t)) − F(u(a,t)),

and so we have

d/dt ∫_a^b u(x,t) dx = F(u(a,t)) − F(u(b,t)).     (2.55)

If u is the amount of a quantity per unit length, then the left side of (2.55) is the time rate
of change of the total amount of the quantity inside the interval [a, b]. If F(u(x,t)) is
the flux through x, that is, the amount of the quantity per unit time positively flowing
across x, then (2.55) implies that the rate of change of the quantity in [a, b] equals the flux in at
x = a minus the flux out through x = b.
As depicted in Fig. 2.12, let x = γ(t) be a smooth curve across which u is discontin-
uous. Assume u is smooth on each side of the curve γ. Let u0 and u1 denote the right
and left limits of u at γ(t), respectively. That is,

lim_{x→γ⁺(t)} u(x,t) = u₀  and  lim_{x→γ⁻(t)} u(x,t) = u₁.

From (2.55), we have

F(u(a,t)) − F(u(b,t)) = d/dt ∫_a^{γ⁻(t)} u(x,t) dx + d/dt ∫_{γ⁺(t)}^b u(x,t) dx

= ∫_a^{γ⁻(t)} ut(x,t) dx + u(γ⁻(t),t) dγ⁻/dt + ∫_{γ⁺(t)}^b ut(x,t) dx − u(γ⁺(t),t) dγ⁺/dt

= ∫_a^{γ⁻(t)} ut(x,t) dx + u₁ dγ⁻/dt + ∫_{γ⁺(t)}^b ut(x,t) dx − u₀ dγ⁺/dt.     (2.56)

As

a → γ⁻(t)  and  b → γ⁺(t),

we have

∫_a^{γ⁻(t)} ut(x,t) dx → 0  and  ∫_{γ⁺(t)}^b ut(x,t) dx → 0.

As a consequence, expression (2.56) reduces to

F(u₁) − F(u₀) = (u₁ − u₀) dγ/dt.     (2.57)

Adopting the notation

[F(u)] = F(u₁) − F(u₀)  and  [u] = u₁ − u₀,

equation (2.57) takes the form

dγ/dt = [F(u)]/[u].     (2.58)

Using (2.58), the weak solution that evolves will be a piecewise smooth function
with a discontinuity, or shock wave, that propagates with the shock speed.
Example 2.17 Consider the problem of Example 2.15. By setting t = y, then ut +
uux = 0 is equivalent to

ut + (u²/2)x = 0,  x ∈ R, t > 0.

It follows that

F(u) = u²/2.

We know from Fig. 2.13 that the shock occurs at and beyond (1, 1). Also from Fig.
2.13, to the left of the shock we have u₁ = 1, and to the right of the shock, u₀ = 0.
So, [u] = u₁ − u₀ = 1 and

[F(u)] = F(u₁) − F(u₀) = u₁²/2 − u₀²/2 = 1/2.

Thus the path of the shock, γ, has the slope

dγ/dt = 1/2.

Therefore, γ is of the form 2x = t + c. Since this path passes through (1, 1), we see
that c = 1. It follows that the shock line is given by

x(t) = t/2 + 1/2.

Therefore, for t ≥ 1, the solution is given by

u(x, y) =  1, x < t/2 + 1/2
           0, x > t/2 + 1/2.     (2.59)

FIGURE 2.13
Characteristic lines intersecting the shock line t = 2x − 1.
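The jump condition (2.58) is a one-line computation; the sketch below (illustrative, with helper names of our choosing) reproduces the shock speed of Example 2.17 and the resulting weak solution (2.59):

# Shock speed (2.58): [F(u)]/[u] for left and right states u1 and u0.
def shock_speed(F, u_left, u_right):
    return (F(u_left) - F(u_right)) / (u_left - u_right)

F = lambda u: 0.5 * u ** 2                     # Burger's flux, Example 2.17
print(shock_speed(F, 1.0, 0.0))                # 0.5, so the shock is x = t/2 + 1/2

def u(x, t):                                   # weak solution (2.59), valid for t >= 1
    return 1.0 if x < t / 2.0 + 0.5 else 0.0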

2.4.2 Exercises
Exercise 2.31 Find the breaking time for
ut + uux = 0,  u(x, 0) = e^(−2x²),  x ∈ R, t ≥ 0.
Exercise 2.32 Consider the PDE
ut + u²ux = 0,  u(x, 0) = 1/(x⁴ + 1),  x ∈ R, t ≥ 0.
(a) Find and graph the characteristics.
(b) Determine the breaking time and find the shock line.
(c) Find the solution before the breaking time.
Exercise 2.33 Find the breaking time for
ut + u³ux = 0,  u(x, 0) = x^(1/3),  x ∈ R, t ≥ 0
and then find the solution.
Exercise 2.34 Solve and find the shock line of the traffic flow problem and explain
the physical meaning of the solution
ut + (1 − 2u)ux = 0,
subject to

u(x, 0) =  1/2, x < 0
           1,   x > 0.
Exercise 2.35 Solve and find the shock line

ut + uux = 0,  u(x, 0) =  0, x ≤ 0
                          x, 0 ≤ x ≤ 2
                          2, x > 2.


Exercise 2.36 Solve and find the shock line



ut + u²ux = 0,  u(x, 0) =  0, x ≤ 0
                           2, 0 ≤ x ≤ 1
                           0, x ≥ 1.

Exercise 2.37 Solve



ut + uux = 0,  u(x, 0) =  0, x ≤ −1
                          2, −1 ≤ x ≤ 1
                          1, x > 1.

Exercise 2.38 Solve



ut + uux = 0,  u(x, 0) =  1,        x ≤ 0
                          1 − x/a,  0 ≤ x < a
                          1,        x ≥ 1.

Exercise 2.39 (a) Solve



ut + uux = 0,  u(x, 0) =  1, x ≤ 0
                          2, x ≥ 0.

(b) Draw the characteristics.

2.5 Second-Order PDEs


In this section we consider general second-order partial differential equations. We
begin by looking into their classifications and reduction of orders. Consider the general
second-order linear PDE

A(x, y)uxx + 2B(x, y)uxy + C(x, y)uyy + D(x, y)ux + E(x, y)uy + F(x, y)u = G(x, y),     (2.60)

where the function u and the coefficients are twice continuously differentiable in
some domain Ω ⊂ R2 . We shall consider (2.60) along with the Cauchy conditions
imposed on some curve Γ that is defined by y = f (x). Imposing Cauchy conditions
implies that ux and uy are known on Γ. That is,

ux (x, y(x)) = f (x), uy (x, y(x)) = g(x),

where f and g are known functions. A differentiation of these relations with respect
to x gives
uxx(x, y(x)) + uxy(x, y(x)) dy/dx = fx(x)     (2.61)

uxy(x, y(x)) + uyy(x, y(x)) dy/dx = gx(x).     (2.62)
In addition, along the curve Γ, equation (2.60) takes the form

A uxx(x, y(x)) + 2B uxy(x, y(x)) + C uyy(x, y(x)) = H,     (2.63)

where H is a known function of x. Equations (2.61)–(2.63) determine uxx, uyy, and
uxy uniquely unless

      | A   2B     C     |
△ =  | 0   1      dy/dx |  = −A(dy/dx)² + 2B(dy/dx) − C = 0.
      | 1   dy/dx  0     |

Or,

A(dy/dx)² − 2B(dy/dx) + C = 0.     (2.64)

The above equation is quadratic in dy/dx, with solutions

dy/dx = (B ± √(B² − AC))/A.     (2.65)
When B² − AC > 0, there exist two families of curves such that no solution can
be found when Cauchy conditions are imposed on them. These families of curves are
known as the characteristics. On the other hand, there are no characteristics when
B² − AC < 0, and one family of characteristics exists when B² − AC = 0. We call the
initial curve Γ characteristic with respect to (2.60) and the Cauchy conditions if △ = 0
along Γ, and noncharacteristic if △ ≠ 0 along Γ. When Γ is noncharacteristic, the Cauchy
data uniquely determine the solution. However, in the case of a characteristic initial
curve Γ, equations (2.61)–(2.63) are inconsistent unless more data are offered. Thus, when
the Cauchy data are prescribed along a characteristic initial curve Γ, the PDE (2.60) has no solution.
Definition 2.4 The PDE in (2.60) has the following classifications: it is hyperbolic
if B2 − AC > 0, parabolic if B2 − AC = 0, and elliptic if B2 − AC < 0.
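The classification in Definition 2.4 amounts to checking the sign of B² − AC; a minimal sketch (the helper name classify is ours):

def classify(A, B, C):
    d = B * B - A * C
    return "hyperbolic" if d > 0 else ("parabolic" if d == 0 else "elliptic")

print(classify(1, -1, -3))   # uxx - 2uxy - 3uyy = 0 (Example 2.18): hyperbolic
print(classify(1, 0, 1))     # uxx + uyy = 0 (Laplace equation):     elliptic
print(classify(1, -2, 4))    # uxx - 4uxy + 4uyy = 0 (Exercise 2.44): parabolic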
We conclude from Definition 2.4, that the classification of the PDE (2.60) depends
on the highest order terms. Our next task is to use transformations that will reduce
the complicated PDE (2.60) to a simpler one that we can easily solve using the
knowledge of the previous sections of this chapter. We introduce the transforma-
tions
ξ = ξ (x, y), η = η(x, y), (2.66)
where ξ and η are twice continuously differentiable and the Jacobian

J = | ξx  ξy |
    | ηx  ηy |  ≠ 0     (2.67)
in the region of interest. Then, x and y are uniquely determined from the system
(2.66). With this in mind, using the chain rule, we obtain
ux = uξ ξx + uη ηx

uy = uξ ξy + uη ηy

uxx = (uξ)x ξx + uξ ξxx + (uη)x ηx + uη ηxx
    = uξξ ξx² + 2uξη ξx ηx + uηη ηx² + uξ ξxx + uη ηxx.

In a similar fashion, we obtain

uxy = uξξ ξx ξy + uξη (ξx ηy + ξy ηx) + uηη ηx ηy + uξ ξxy + uη ηxy,

uyy = uξξ ξy² + 2uξη ξy ηy + uηη ηy² + uξ ξyy + uη ηyy.

Substituting into (2.60) we obtain the new and reduced PDE

Â uξξ + 2B̂ uξη + Ĉ uηη + M(ξ, η, u, uξ, uη) = 0,     (2.68)

where the new coefficients are known and we list the highest-order terms. That
is,

Â = Aξx² + 2Bξxξy + Cξy²,

B̂ = Aξxηx + B(ξxηy + ξyηx) + Cξyηy,

and

Ĉ = Aηx² + 2Bηxηy + Cηy².
Equation (2.68) is called the canonical form of (2.60). Thus, it can be easily shown
that

B̂² − ÂĈ = J²(B² − AC),

which shows that the type of the PDE (2.60) is preserved under the transformation
(2.66). In the next discussion we explain how to find the transformations ξ and η.
Suppose none of A, B, C is zero. Assume that under the transformations (2.66), Â and
Ĉ vanish. Let's consider Â = 0. Then it follows that

Aξx² + 2Bξxξy + Cξy² = 0.

Divide by ξy² to obtain

A(ξx/ξy)² + 2B(ξx/ξy) + C = 0,     (2.69)

which is quadratic in ξx/ξy. Now along the curve ξ = constant, we have

dξ = ξx dx + ξy dy = 0,

or

dy/dx = −ξx/ξy.     (2.70)

Similarly, if we set Ĉ = 0 then we obtain equations parallel to (2.69) and (2.70) in
terms of η. A comparison of (2.65), (2.69), and (2.70) yields that ξ and η are the
solutions of the ordinary differential equations given by (2.65),

dy/dx = (B ± √(B² − AC))/A,

along which ξ = constant and η = constant.
Remark 7 1. B² − AC > 0. (Hyperbolic equations)
In this case ξ and η are uniquely obtained from (2.65) and satisfy (2.67).
2. B² − AC = 0. (Parabolic equations)
In this case we can only determine either ξ or η from (2.65). Say ξ is a solution
to dy/dx = B/A. Then ξ = (B/A)x − y. We may set η = x, and under this transformation of
ξ and η, (2.67) is satisfied. As a matter of fact, J = 1 ≠ 0.
3. B² − AC < 0. (Elliptic equations)
In this case we determine both ξ and η from (2.65) and arrive at the transformation

ξ = (B/A)x − y + i(√(AC − B²)/A)x  and  η = (B/A)x − y − i(√(AC − B²)/A)x.

One may show (see Exercise 2.43) that J ≠ 0, and Â ≠ 0, Ĉ ≠ 0, and B̂ = 0.
Example 2.18 Solve

uxx − 2uxy − 3uyy = 0,

subject to the Cauchy conditions

u(x, 0) = x²,  uy(x, 0) = 1/3.

From (2.64) we have

(dy/dx)² + 2(dy/dx) − 3 = 0,

with

dy/dx = (−1 ± √4)/1 = 1, −3.

The two differential equations have the solutions

y = x + c₁,  y = −3x + c₂.

Solving for c₁ and c₂, we get c₁ = y − x, and c₂ = y + 3x. As a consequence, we may
set

ξ = y + 3x,  η = y − x.
We note that the Jacobian J = 4 ̸= 0. It follows that

ux = 3uξ − uη

uy = uξ + uη

uxx = 9uξξ − 6uξη + uηη

uxy = 3uξξ + 2uξη − uηη

uyy = uξξ + 2uξη + uηη.

Thus,

uxx − 2uxy − 3uyy = 9uξξ − 6uξη + uηη − 2(3uξξ + 2uξη − uηη) − 3(uξξ + 2uξη + uηη)
                  = −16uξη = 0.

Thus, under the transformation ξ = y + 3x, η = y − x, the original PDE is transformed
to the canonical form

uξη = 0,

which has the solution

u(ξ, η) = F(ξ) + G(η),

for some functions F and G. In terms of x and y the general solution is

u(x, y) = F(y + 3x) + G(y − x).

Applying the Cauchy condition x² = u(x, 0), we arrive at

x² = F(3x) + G(−x).     (2.71)

To apply the second Cauchy condition, we notice that

uy(x, y) = F′(y + 3x) + G′(y − x).

Thus,

1/3 = F′(3x) + G′(−x).

Integrate both sides and then multiply the resulting equation by 3, to get

x = F(3x) − 3G(−x).     (2.72)

Solving the system of equations given by (2.71) and (2.72) we obtain

F(3x) = (3x² + x)/4,  G(−x) = (x² − x)/4.

Let z = 3x; then F(z) = z²/12 + z/12, and we conclude that

F(y + 3x) = (y + 3x)²/12 + (y + 3x)/12.
Similarly, if we set w = −x, then G(w) = w²/4 + w/4 and, as a consequence,

G(y − x) = (y − x)²/4 + (y − x)/4.

Finally, the solution is

u(x, y) = F(y + 3x) + G(y − x)
        = (y + 3x)²/12 + (y + 3x)/12 + (y − x)²/4 + (y − x)/4.

2.5.1 Exercises
Exercise 2.40 Find the characteristics and reduce to canonical form and then solve.

uxx − 2 sin(x)uxy − cos²(x)uyy − cos(x)uy = 0.

Exercise 2.41 Find the characteristics and reduce to canonical form and then solve

4uxx + 5uxy + uyy + ux + uy = 0.

Exercise 2.42 Show that in the hyperbolic case, when B² − AC > 0, we have Â = Ĉ = 0 and B̂ ≠ 0.
Exercise 2.43 Show that in the elliptic case, when B² − AC < 0, we have Â ≠ 0, Ĉ ≠ 0, and B̂ = 0.
Exercise 2.44 Solve
uxx − 4uxy + 4uyy = 0,
subject to the Cauchy conditions

u(x, 0) = e^(2x),  uy(x, 0) = 5.

Exercise 2.45 Solve


3uxx + 10uxy + 3uyy = 0,
subject to the Cauchy conditions

u(x, 0) = e^(−x),  uy(x, 0) = x.

Exercise 2.46 Find the characteristics and reduce to canonical form and then find
the general solution
4uxx + 5uxy + uyy + ux + uy = 3.
Exercise 2.47 Find the characteristics and reduce to canonical form and then find
the general solution

x²uxx + 2xy uxy + y²uyy + xy ux + y²uy = 0.



Exercise 2.48 Find the characteristics and reduce to canonical form and then solve

uxx + 2uxy + 2uyy = 0.

Exercise 2.49 For constants a and b use the transformation

v = u e^(−(aξ+bη))

to transform the PDE


uxx − uyy + 3ux − 2uy + u = 0,
into
vξ η = cv, c = constant.
Exercise 2.50 Solve
x²uxx − y²uyy = xy,
subject to the Cauchy conditions

u(x, 1) = x, uy (x, 1) = 6.

Exercise 2.51 Solve


(x − y)uxy − ux + uy = 0.
Hint: set v = (x − y)ux + u.
Exercise 2.52 Solve
xy uxx + x²uxy − y ux = 0,
subject to the Cauchy conditions

u(x, 0) = e^x,  uy(x, 0) = 5.

2.6 Wave Equation and D’Alembert’s Solution


This section is about the wave equation. We will discuss the derivation of the wave
equation, the D’Alembert solution, the domain of dependence of solutions, and so-
lutions to the nonhomogeneous wave equation.
We begin this long section with the derivation of the wave equation. Let l > 0 be the
length of a thin string that is stretched between two points on the x-axis. A string
vibrates only if it is tightly stretched. Assume the string undergoes relatively small
transverse vibrations (think of the string of a musical instrument, say a violin string).
Let ρ be the linear density of the string, measured in units of mass per unit of length.
We will assume that the string is made of homogeneous material, and its density is
constant along the entire length of the string. The displacement of the string from its
equilibrium state at time t and position x will be denoted by u(t, x). We assume the

string is positioned in such a way that its left endpoint coincides with the origin of
the xu coordinate system. Consider the motion of a small portion of the string sitting
atop the interval [a, b]. Then the corresponding mass is ρ(b − a), and the acceleration is
utt. Using Newton's second law of motion, we have

ρ(b − a)utt = Total force.     (2.73)
Since the mass of the string is negligible, we may discard the effect of gravity on the
string. In addition, we may as well ignore air resistance, and other external forces.
Thus, the only force that is acting on the string is the tension force T(x,t). Assuming
that the string is perfectly flexible, the tension force will have the direction of the
tangent vector along the string. At a fixed time t the position of the string is given by
the parametric equations x = x, u = u(x,t), where x is a parameter. Then the tangent
vector is <1, ux>, with corresponding unit vector

< 1/√(1 + ux²), ux/√(1 + ux²) >.

Under this setup the tension force takes the form

T(x,t) = < T(x,t)/√(1 + ux²), T(x,t)ux/√(1 + ux²) >     (2.74)

where T(x,t) is the magnitude of the tension force. Due to the assumption of a small
vibration, it is safe to assume that ux is small, and thus, via Taylor's expansion, we
have

√(1 + ux²) = 1 + (1/2)ux² + o(ux⁴) ≈ 1.
2
Substituting this approximation into (2.74), we arrive at an equivalent form of the
tension force
T(x,t) =< T (x,t), T (x,t)ux > .
Since there is no longitudinal displacement, we arrive at the following identities for
the balances of forces (2.73) in the x, respectively u directions
0 = T (b,t) − T (a,t)
ρ(b − a)utt = T (b,t)ux (b,t) − T (a,t)ux (a,t).
Simply stated, the first equation shows that the tensions from the two edges of the
little portion of the string balance each other out in the x direction (no longitudinal
motion). From this, we can also infer that the position of the string has no impact on
the tension force. Hence, the second equation might be rewritten as
ρutt = T (ux(b,t) − ux(a,t))/(b − a).

Taking the limit as b → a in the above equation we arrive at the wave equation

ρutt = lim_{b→a} T (ux(b,t) − ux(a,t))/(b − a) = T uxx,

or

utt − c²uxx = 0,

with c² = T/ρ.

The above derivation can be generalized to incorporate the effects of other forces, as
displayed below. The wave equation

utt − c²uxx + r ut = 0,  r > 0,

reflects air resistance as a force proportional to the speed ut. On the other hand, the
wave equation

utt − c²uxx + ku = 0,  k > 0,

incorporates a transverse elastic force that is proportional to the displacement u. Finally,
the wave equation

utt − c²uxx = f(x,t)

incorporates externally applied forces. Such an equation is referred to as an inhomogeneous
wave equation, whose solution we will discuss at the end of this section.
A good and important application of a hyperbolic PDE with Cauchy conditions is the
one-dimensional wave equation

utt − c²uxx = 0     (2.75)

u(x, 0) = f(x)     (2.76)

ut(x, 0) = g(x)     (2.77)

where the function f is twice continuously differentiable and g is continuously
differentiable. The function f is the initial displacement and g is the initial velocity.
Using the method of the previous section, with A = 1, B = 0, and C = −c², we have
the characteristics, which are solutions of the differential equations

dt/dx = ±c/c² = ±1/c.
Thus, the corresponding characteristic lines (solutions) are

c1 = x − ct, c2 = x + ct.

Let
ξ = x + ct, η = x − ct.
Then,
uxx = uξξ + 2uξη + uηη,
and
utt = c²(uξξ − 2uξη + uηη).
Substituting into (2.75) we arrive at the canonical form −4c²uξη = 0. Since c ≠ 0,
we must have
uξη = 0,

which has the general solution

u(ξ , η) = F(ξ ) + G(η),

where the functions F and G are arbitrary and required to be twice differentiable. In
terms of t and x, the solution takes the form

u(x,t) = F(x + ct) + G(x − ct). (2.78)

To determine the arbitrary functions F and G we apply the initial conditions or


Cauchy conditions (2.76) and (2.77) and obtain

f (x) = F(x) + G(x), (2.79)

and
g(x) = cF ′ (x) − cG′ (x). (2.80)
Integrating (2.80) from x₀ to x we arrive at

F(x) − G(x) = (1/c) ∫_{x₀}^{x} g(s) ds + K,     (2.81)

where x₀ ∈ R and K are constants. Solving for F and G from (2.79) and (2.81)
yields

F(x) = (1/2)[ f(x) + (1/c) ∫_{x₀}^{x} g(s) ds + K ],

and

G(x) = (1/2)[ f(x) − (1/c) ∫_{x₀}^{x} g(s) ds − K ].
Then by (2.78) the general solution takes the form

u(x,t) = (1/2)[ f(x + ct) + (1/c) ∫_{x₀}^{x+ct} g(s) ds + K ]
       + (1/2)[ f(x − ct) − (1/c) ∫_{x₀}^{x−ct} g(s) ds − K ]
       = (1/2)[ f(x + ct) + f(x − ct) ]
       + (1/2c)[ ∫_{x₀}^{x+ct} g(s) ds − ∫_{x₀}^{x−ct} g(s) ds ].

Combining the two integrals we arrive at the D'Alembert solution

u(x,t) = (1/2)[ f(x + ct) + f(x − ct) ] + (1/2c) ∫_{x−ct}^{x+ct} g(s) ds.     (2.82)

It is simple to verify u(x,t) given by (2.82) is a solution of the wave equation (2.75).
Moreover, by a direct substitution into the solution (2.82), it is evident that the initial

FIGURE 2.14
Nonhomogeneous wave equation.

conditions uniquely determine (2.82). According to (2.82), the value u(x₀,t₀) depends
on the initial data f and g in the interval [x₀ − ct₀, x₀ + ct₀], which is cut out of
the initial line by the two characteristic lines with slopes ±1/c passing through the
point (x₀,t₀). The interval [x₀ − ct₀, x₀ + ct₀] on the line t = 0 is called the domain of
dependence, as indicated in Fig. 2.14.
The next theorem is about stability; it says that a small change in the initial data
only produces a small change in the solution.
Theorem 2.3 Let u*(x,t) be another solution of (2.75)–(2.77) with initial data f*
and g*. Define |h| = max_{−∞<x<∞} |h(x)| for continuous h : R → R. Similarly, for
u = u(x,t), we define

|u|_T = max_{−∞<x<∞; |t|≤T} |u(x,t)|.

Assume there is a small change in the initial data over a finite time T. That is, for
small and positive ε we have

|f − f*| < ε/2,  |g − g*| < ε/(2T).

Then,

|u(x,t) − u*(x,t)| < ε.

Proof Using (2.82), we have

|u(x,t) − u*(x,t)| = | (1/2)[ f(x + ct) − f*(x + ct) + f(x − ct) − f*(x − ct) ]
                      + (1/2c) ∫_{x−ct}^{x+ct} (g(s) − g*(s)) ds |

≤ (1/2)| f(x + ct) − f*(x + ct) | + (1/2)| f(x − ct) − f*(x − ct) |
  + (1/2c) ∫_{x−ct}^{x+ct} | g(s) − g*(s) | ds

≤ (1/2)(ε/2 + ε/2) + (1/2c) ∫_{x−ct}^{x+ct} ε/(2T) ds

≤ ε/2 + (1/2c)(2cT) ε/(2T) = ε.

Thus, for |t| ≤ T, we have shown that

|u − u*|_T < ε.

This completes the proof.

FIGURE 2.15
Domain of influence of the point (x₀, 0).


The D'Alembert solution given by (2.82) indicates that if an initial velocity or initial
displacement is in the neighborhood of (x₀,t₀), it can only influence the area t > t₀
bounded by the characteristic lines with slopes ±1/c passing through the point (x₀,t₀),
as shown in Fig. 2.15 with initial time t₀ = 0.
Example 2.19 Consider

4utt − 9uxx = 0

subject to u(x, 0) = x², ut(x, 0) = sin(x). Then u(x,t) is given by the D'Alembert
solution (2.82) with f = x², g = sin(x), and c² = 9/4. Thus,

u(x,t) = (1/2)[ (x + (3/2)t)² + (x − (3/2)t)² ] + (1/3) ∫_{x−(3/2)t}^{x+(3/2)t} sin(s) ds

       = x² + (9/4)t² + (2/3) sin(x) sin((3/2)t).
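Formula (2.82) is easy to evaluate numerically; the sketch below (the helper dalembert is ours, and SciPy quadrature is assumed available) reproduces the closed form of Example 2.19 at a sample point:

import numpy as np
from scipy.integrate import quad

def dalembert(f, g, c, x, t):
    integral, _ = quad(g, x - c * t, x + c * t)
    return 0.5 * (f(x + c * t) + f(x - c * t)) + integral / (2.0 * c)

c, x, t = 1.5, 0.7, 1.3                        # c^2 = 9/4, sample point
numeric = dalembert(lambda s: s ** 2, np.sin, c, x, t)
closed = x**2 + 2.25 * t**2 + (2.0 / 3.0) * np.sin(x) * np.sin(1.5 * t)
print(numeric, closed)                         # the two values agree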



Now that we have displayed an example, let us take a closer look at the geometrical
interpretation of (2.78). The term on the right-hand side of (2.78) is called the
progressive wave. If we let x* = ct, then the transformation ξ = x + ct = x + x* is a
translation of the coordinate system to the left by x*. Thus, F(x + ct) is a wave that
moves in the negative x-direction with speed c without change in its shape. For
example, u(x,t) = cos(x + ct) represents a cosine wave which moves in the negative
x-direction with speed c without changing its shape. Similarly, G(x − ct) is a wave
which moves in the positive x-direction with speed c without change in its shape.
Consequently, the solution

u(x,t) = F(x + ct) + G(x − ct)

is the sum of two waves traveling in opposite directions, and the shape of u(x,t) will
change with time.
Example 2.20 Consider the wave problem with zero initial velocity

utt − c²uxx = 0,

subject to

ut(x, 0) = 0,

u(x, 0) =  h, |x| ≤ a
           0, |x| > a.
This initial data corresponds to an initial disturbance of the string centered at x = 0
of height h. The solution is given by (2.82) with g(x) = 0. In other words,
1
u(x,t) = [ f (x + ct) + f (x − ct)].
2
We need to piece together the solution. Notice that

f(x + ct) =  h, |x + ct| ≤ a
             0, |x + ct| > a

and

f(x − ct) =  h, |x − ct| ≤ a
             0, |x − ct| > a.
As a consequence the solution is defined piecewise over four different regions. We
will only consider all regions for t ≥ 0. It is clear from the definitions of f (x + ct)
and f (x − ct) that the four regions are:
I = {|x + ct| ≤ a, |x − ct| ≤ a},
II = {|x + ct| ≤ a, |x − ct| > a},
III = {|x + ct| > a, |x − ct| ≤ a},
IV = {|x + ct| > a, |x − ct| > a},
with

uI(x,t) = h,  uII(x,t) = h/2,  uIII(x,t) = h/2,  uIV(x,t) = 0.
The notation uI (x,t) stands for the value of u in region I, and so on. See Fig. 2.16. □
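Since g = 0 here, the solution is just the average of two shifted copies of f; a minimal sketch (helper names are ours) reproduces the region values above:

def make_solution(h, a, c):
    f = lambda s: h if abs(s) <= a else 0.0
    return lambda x, t: 0.5 * (f(x + c * t) + f(x - c * t))

u = make_solution(h=1.0, a=1.0, c=1.0)
print(u(0.0, 0.0))   # region I:   h   = 1.0
print(u(0.0, 1.5))   # region IV:  0.0
print(u(1.5, 1.0))   # region III: h/2 = 0.5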

FIGURE 2.16
Different values of u.

Example 2.21 Consider the wave problem with zero initial displacement

utt − c²uxx = 0,

subject to

u(x, 0) = 0,

ut(x, 0) =  g₀, |x| ≤ a
            0,  |x| > a.
This is similar to the previous example, but we will have to adjust the intervals of
integration. However, here we have six different regions, which we list:

I = {x − ct < x + ct < −a < a},

II = {x − ct < −a < x + ct < a},


III = {x − ct < −a < a < x + ct},
IV = {−a < x − ct < x + ct < a},
V = {−a < x − ct < a < x + ct},
V I = {−a < a < x − ct < x + ct},
FIGURE 2.17
Nonhomogeneous wave equation.

with

u(x,t) =
  0,                                                   in I
  (1/2c) ∫_{−a}^{x+ct} g₀ dx = (g₀/2c)(x + ct + a),     in II
  (1/2c) ∫_{−a}^{a} g₀ dx = g₀a/c,                      in III
  (1/2c) ∫_{x−ct}^{x+ct} g₀ dx = g₀t,                   in IV
  (1/2c) ∫_{x−ct}^{a} g₀ dx = (g₀/2c)(−x + ct + a),     in V
  0,                                                   in VI.

Next we consider the nonhomogeneous wave equation

utt − c²uxx = h(x,t)     (2.83)

subject to

u(x, 0) = f(x),  ut(x, 0) = g(x),     (2.84)

where the function h is assumed to be continuous with respect to both arguments. We
show that the solution of (2.83) along with the initial data (2.84) is given by

u(x,t) = (1/2)[ f(x + ct) + f(x − ct) ] + (1/2c) ∫_{x−ct}^{x+ct} g(s) ds
       + (1/2c) ∬_{△(x,t)} h(τ, s) dτ ds,     (2.85)

where △(x,t) is shown in Fig. 2.17. We will do this by piecing together the solution
of the homogeneous problem (2.75)–(2.77), which has the solution given by (2.82),
and the solution u^p of

u^p_tt − c²u^p_xx = h(x,t)     (2.86)

subject to

u^p(x, 0) = 0,  u^p_t(x, 0) = 0.     (2.87)

We already have the transformation

ξ = x + ct,  η = x − ct.

Solving for x and t we get

x = (ξ + η)/2,  t = (ξ − η)/(2c).     (2.88)

Under the same transformation, we saw that the left side of (2.86) becomes −4c²uξη.
Therefore, (2.86) takes the form

u^p_ξη = −(1/4c²) h(ξ, η).     (2.89)

Setting t = 0 in (2.88), we immediately have ξ = η. Thus, the first initial condition
of (2.87) reduces to

u^p(ξ, ξ) = 0.

Using u^p_x = u^p_ξ ξx + u^p_η ηx = u^p_ξ + u^p_η, we have that u^p_x(x, 0) = 0 implies

u^p_ξ(ξ, ξ) + u^p_η(ξ, ξ) = 0.

Similarly, u^p_t = u^p_ξ ξt + u^p_η ηt = c u^p_ξ − c u^p_η. Thus, the second initial condition of
(2.87) reduces to

c u^p_ξ(ξ, ξ) − c u^p_η(ξ, ξ) = 0.

From the last two equations above, it is immediate that u^p_ξ(ξ, ξ) = u^p_η(ξ, ξ) = 0. Fix a
point (x₀,t₀). Then the corresponding point in the characteristic variables is (ξ₀, η₀).
In order to find the value of the solution at this point we begin by integrating (2.89)
in terms of η from ξ to η₀ and obtain

∫_{ξ}^{η₀} u^p_ξη dη = −(1/4c²) ∫_{ξ}^{η₀} h(ξ, η) dη.

However,

∫_{ξ}^{η₀} u^p_ξη dη = u^p_ξ(ξ, η₀) − u^p_ξ(ξ, ξ) = u^p_ξ(ξ, η₀).

As a result, we have

u^p_ξ(ξ, η₀) = −(1/4c²) ∫_{ξ}^{η₀} h(ξ, η) dη = (1/4c²) ∫_{η₀}^{ξ} h(ξ, η) dη.     (2.90)

Similar to the above, the integral

∫_{η₀}^{ξ₀} u^p_ξ(ξ, η₀) dξ = u^p(ξ₀, η₀) − u^p(η₀, η₀) = u^p(ξ₀, η₀).     (2.91)

Integrating (2.90) with respect to ξ from η₀ to ξ₀ and then using (2.91), we arrive
at

u^p(ξ₀, η₀) = (1/4c²) ∫_{η₀}^{ξ₀} ∫_{η₀}^{ξ} h(ξ, η) dη dξ
            = (1/4c²) ∬_{△} h(ξ, η) dξ dη,     (2.92)

where the double integral is taken over the triangle of dependence of the point (x₀,t₀),
as shown in Fig. 2.14. It is left to transform the double integral in (2.92) to a double
integral in terms of the variables (x,t). For ξ = x + ct, η = x − ct, we have

J = | ξx  ξt |   =   | 1   c  |   = −2c ≠ 0.
    | ηx  ηt |       | 1  −c |

Thus,

u^p(ξ, η) = (1/4c²) ∬_{△(x,t)} h(τ, s)|J| dτ ds = (1/2c) ∬_{△(x,t)} h(τ, s) dτ ds,     (2.93)

where △(x,t) is shown in Fig. 2.17. Finally, adding (2.93) to (2.82), we obtain (2.85).
For illustration purposes, we provide the following two examples.
Example 2.22 Consider

4utt − 9uxx = 4xt

subject to u(x, 0) = x², ut(x, 0) = sin(x). Here u(x,t) is given by the D'Alembert
solution (2.85) with f = x², g = sin(x), h(x,t) = xt and c² = 9/4. With the aid of
Example 2.19 we have

u(x,t) = x² + (9/4)t² + (2/3) sin(x) sin((3/2)t) + (1/3) ∫₀ᵗ ∫_{x−(3/2)(t−s)}^{x+(3/2)(t−s)} τ s dτ ds

       = x² + (9/4)t² + (2/3) sin(x) sin((3/2)t) + (1/3) ∫₀ᵗ 3x s(t − s) ds

       = x² + (9/4)t² + (2/3) sin(x) sin((3/2)t) + x t³/6.
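The triangle integral in (2.85) can be checked by numerical quadrature; the following sketch (using SciPy's dblquad; the sample values of x and t are arbitrary) confirms the forcing term x t³/6 obtained above:

from scipy.integrate import dblquad

c, x, t = 1.5, 0.8, 1.1                        # arbitrary sample point
val, _ = dblquad(lambda tau, s: tau * s,       # h(tau, s) = tau*s
                 0.0, t,                       # s runs from 0 to t
                 lambda s: x - c * (t - s),    # lower tau limit of the triangle
                 lambda s: x + c * (t - s))    # upper tau limit of the triangle
print(val / (2.0 * c), x * t ** 3 / 6.0)       # the two values agree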

Example 2.23 Consider

utt − 9uxx = 12/(t² + 1)

subject to u(x, 0) = x, ut(x, 0) = e^(−x).
FIGURE 2.18
△(x,t).

(a) Find the solution.

(b) Determine the region where the solution is uniquely defined when 0 ≤ x ≤ 4.

For (a) the solution is given by

u(x,t) = (1/2)[(x + 3t) + (x − 3t)] + (1/6) ∫_{x−3t}^{x+3t} e^(−s) ds
       + (1/6) ∫₀ᵗ ∫_{x−3(t−s)}^{x+3(t−s)} 12/(s² + 1) dτ ds

       = x + (1/6)(e^(−(x−3t)) − e^(−(x+3t))) + 12 ∫₀ᵗ (t − s)/(s² + 1) ds

       = x + (1/6)(e^(−(x−3t)) − e^(−(x+3t))) + 12t tan⁻¹(t) − 6 ln(t² + 1).

On the other hand, for (b) the initial data is prescribed at t = 0 and 0 ≤ x ≤ 4, and
so we need to work with the two points (0, 0) and (4, 0). The characteristic lines are

x − 3t = c₁  and  x + 3t = c₂.

At (0, 0), we have c₁ = c₂ = 0, and at (4, 0) we have c₁ = c₂ = 4. Thus, the region
of existence and uniqueness of the solution is the region bounded by the four lines

x − 3t = 0,  x + 3t = 0,  x − 3t = 4,  and  x + 3t = 4,

as shown in Fig. 2.19. □
2.6.1 Exercises
Exercise 2.53 Solve
utt − 4uxx = 0, u(x, 0) = sin(x), ut (x, 0) = cos(x).

FIGURE 2.19
Region for existence and uniqueness.

Exercise 2.54 At points in space where no sources are present the spherical wave
equation satisfies
utt = c²( urr + (2/r)ur ).     (2.94)
Equation (2.94) is obtained by writing the homogeneous wave equation in spher-
ical coordinates r, θ , φ and neglecting the angular dependence. Assume the initial
functions
u(r, 0) = f (r), ut (r, 0) = g(r).
Make the change of variables v = ru to transform (2.94) into the equation in v:
vtt = c2 vrr .
Solve for v and then find the general solution of (2.94) subject to the initial data.
Exercise 2.55 Solve

utt − 4uxx = cos(x) sin(t), u(x, 0) = x, ut (x, 0) = sin(x).

Exercise 2.56 Consider


2utt − 18uxx = 36/(t² + 1),  u(x, 0) = x²,  ut(x, 0) = e^(−2x).
(a) Find the solution.
(b) Determine and sketch the region where the solution is uniquely defined when
0 ≤ x ≤ 6.
Exercise 2.57 [Semi-infinite string with a fixed end] Consider a semi-infinite vibrating
string with a fixed end, that is

utt − c2 uxx = 0, x > 0 t > 0,



subject to
u(x, 0) = f (x), ut (x, 0) = g(x), x ≥ 0,
u(0,t) = 0, t ≥ 0.
Show that its solution is given by

u(x,t) = (1/2)[ f(x + ct) + f(x − ct) ] + (1/2c) ∫_{x−ct}^{x+ct} g(s) ds,  for x > ct,

u(x,t) = (1/2)[ f(x + ct) − f(ct − x) ] + (1/2c) ∫_{ct−x}^{x+ct} g(s) ds,  for x < ct.
Exercise 2.58 Find the solution for

utt − 4uxx = 0, x > 0, t > 0,

subject to
u(x, 0) = | sin(x)|, x>0
ut (x, 0) = 0, x ≥ 0,
u(0,t) = 0, t ≥ 0.
Exercise 2.59 Construct the solution for the wave problem with zero initial velocity

utt − uxx = 0, ut (x, 0) = 0,



u(x, 0) =  1, |x| ≤ 1
           0, |x| > 1
for
(a) 0 < t < 1.
(b) t > 1.
Exercise 2.60 Construct the solution for the wave problem with zero initial displace-
ment.
utt − uxx = 0, u(x, 0) = 0,
ut(x, 0) =  1 − x², |x| < 1
            0,      |x| ≥ 1.
Exercise 2.61 Construct the solution for the wave problem with zero initial velocity.

utt − uxx = 0, ut (x, 0) = 0,



u(x, 0) =  1 − |x|, |x| < 1
           0,       |x| ≥ 1.
Exercise 2.62 Solve

utt − uxx = 1, u(x, 0) = sin(x), ut (x, 0) = x.



Exercise 2.63 Assume that u(x,t) satisfies

utt − uxx = 0,  0 < x < l,  t ≥ 0,

with Robin boundary conditions

u(0,t) − ux(0,t) = 0,  and  u(l,t) + ux(l,t) = 0.

Show that the energy function

E(t) = (1/2) ∫₀ˡ ( ut²(x,t) + ux²(x,t) ) dx + (1/2)u²(0,t) + (1/2)u²(l,t)

is conserved. That is, dE(t)/dt = 0.

2.6.2 Vibrating string with fixed ends


Obtaining the solution of a vibrating string with fixed ends is more complicated than
solving a wave equation or an infinite string due to the repeated reflection of waves
from boundaries. We want to find the solution of


utt − c²uxx = 0,      0 < x < l, t > 0
u(x, 0) = f(x),       0 ≤ x ≤ l
ut(x, 0) = g(x),      0 ≤ x ≤ l     (2.95)
u(0,t) = 0 = u(l,t),  t ≥ 0.

As before, the characteristic lines are given by

ξ = x + ct, η = x − ct.

From (2.78), we have

u(x,t) = F(x + ct) + G(x − ct), (2.96)

where the functions F and G are arbitrary and differentiable. Equation (2.96) is valid
over the domain

0 ≤ x + ct ≤ l, and 0 ≤ x − ct ≤ l.

Moreover, the solution is uniquely determined by the initial data in the region

t ≤ x/c,  t ≤ (l − x)/c,  t ≥ 0.
For the fixed end u(0,t) = 0, t ≥ 0, we have

0 = u(0,t) = F(ct) + G(−ct). (2.97)

Letting ζ = −ct, we obtain from (2.97) that

G(ζ ) = −F(−ζ ), ζ ≤ 0. (2.98)



Equation (2.98) extends the range of G to negative values and can then be used to
do the same for F. If we apply the initial data to (2.96), then it was obtained from
Section 2.6 that
  
F(ξ) = (1/2)[ f(ξ) + (1/c) ∫₀^ξ g(s) ds + K ],  0 ≤ ξ = x + ct ≤ l     (2.99)

G(η) = (1/2)[ f(η) − (1/c) ∫₀^η g(s) ds − K ],  0 ≤ η = x − ct ≤ l     (2.100)

Using (2.98) in combination with (2.99) and (2.100), setting G(ζ) = −F(−ζ) we arrive
at

(1/2)[ f(ζ) − (1/c) ∫₀^ζ g(s) ds ] = −(1/2)[ f(−ζ) + (1/c) ∫₀^{−ζ} g(s) ds ]
                                   = −(1/2)[ f(−ζ) − (1/c) ∫₀^ζ g(−s) ds ].
By comparing both sides of the above expression, we immediately see that (2.98) is
satisfied when
f (ζ ) = − f (−ζ ), and g(ζ ) = −g(−ζ ).
In other words, we must extend the functions f and g to be odd functions with respect
to x = 0.
Now we turn our attention to the boundary condition 0 = u(l,t). As before, we
have

0 = u(l,t) = F(l + ct) + G(l − ct).

Letting ζ = l + ct in the above equation we get

F(ζ) = −G(2l − ζ),  ζ ≥ l.     (2.101)

This equation extends the range of F to values l ≤ ζ ≤ 2l. As before, setting
F(ζ) = −G(2l − ζ), we arrive at

(1/2)[ f(ζ) + (1/c) ∫₀^ζ g(s) ds ] = −(1/2)[ f(2l − ζ) − (1/c) ∫₀^{2l−ζ} g(s) ds ]
                                   = −(1/2)[ f(2l − ζ) + (1/c) ∫_{2l}^{ζ} g(2l − τ) dτ ].

By comparing both sides of the above expression, we immediately arrive at

f(2l − ζ) = −f(ζ),  and  ∫₀^ζ g(s) ds = −∫_{2l}^{ζ} g(2l − τ) dτ.

Differentiating the second expression with respect to ζ, we obtain

g(2l − ζ) = −g(ζ).

These conditions on f and g mean that we can extend these functions to l ≤ ζ ≤ 2l
by performing an odd extension about x = l. In summary, the conditions on f and g
are

f(x) = −f(−x),      −l ≤ x ≤ 0
f(2l − x) = −f(x),  l ≤ x ≤ 2l,     (2.102)

and

g(x) = −g(−x),      −l ≤ x ≤ 0
g(2l − x) = −g(x),  l ≤ x ≤ 2l.     (2.103)
To make some sense out of (2.102) and (2.103), we notice that

f (x + 2l) = − f (2l − (x + 2l)) = − f (−x) = f (x).

A similar situation occurs for the function g. Thus, f and g need to be periodic odd
extensions of the original functions with period 2l. Now we try to piece the solution
together.
Let f_p and g_p denote the 2l-periodic odd extensions of f and g, respectively.
Then,

f_p(x) =  f(x),    0 < x < l,        g_p(x) =  g(x),    0 < x < l,
          −f(−x),  −l < x < 0,                 −g(−x),  −l < x < 0.

Consider the wave problem on the whole real line with the extended initial
data

vtt − c²vxx = 0,     −∞ < x < ∞, t > 0
v(x, 0) = f_p(x),    −∞ < x < ∞
vt(x, 0) = g_p(x),   −∞ < x < ∞.

With this setup we automatically have v(0,t) = v(l,t) = 0, and the restriction

u(x,t) = v(x,t),  0 ≤ x ≤ l,

will solve (2.95).


Finally, using the D'Alembert solution, we see that

u(x,t) = (1/2)[ f_p(x + ct) + f_p(x − ct) ] + (1/2c) ∫_{x−ct}^{x+ct} g_p(s) ds,  0 < x < l.     (2.104)
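A small sketch of (2.104) (the helper periodic_odd_extension is ours, the sample profile is an assumption, and we take g = 0 so the integral term drops out) shows how the periodic odd extension enforces the fixed ends:

import math

def periodic_odd_extension(f, l):
    def fp(x):
        x = ((x + l) % (2.0 * l)) - l          # reduce to the window [-l, l)
        return f(x) if x >= 0 else -f(-x)
    return fp

l, c = math.pi, 1.0
fp = periodic_odd_extension(lambda x: x * (math.pi - x), l)

def u(x, t):                                   # formula (2.104) with g = 0
    return 0.5 * (fp(x + c * t) + fp(x - c * t))

print(u(0.0, 0.7), u(l, 0.7))                  # both 0: the ends stay fixed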

Next, we discuss the periodic odd extension; for more on the subject we refer to
Appendix A. Suppose we have a function f that is piecewise continuous on the interval
(0, l). We define the Fourier sine series of f by

f(x) = ∑_{n=1}^{∞} bₙ sin(nπx/l),  0 < x < l,     (2.105)

where the coefficients bₙ, n = 1, 2, . . . , are constants. To determine the coefficients
bₙ we make use of the orthogonality property

∫₀ˡ sin(mπx/l) sin(nπx/l) dx =  0,    m ≠ n,
                                l/2,  m = n.

With this in mind, by multiplying both sides of (2.105) by sin(mπx/l) and integrating
from x = 0 to x = l, we arrive at

bₙ = (2/l) ∫₀ˡ f(x) sin(nπx/l) dx,     (2.106)

where we have assumed that term-by-term integration is valid. It can be shown that
if

f(0) = 0,  and  f(l) = 0,

then the series (2.105) converges uniformly to f(x) for all 0 < x < l. It is clear that the
series converges to zero when x = 0 and when x = l. Moreover, the series in (2.105)
converges to the odd periodic extension, with period 2l, of f for all values of x. We
have the following example.
Example 2.24 Let
f (x) = x, 0 < x < π.
Then,
 
bₙ = (2/π) ∫₀^π x sin(nx) dx = (2/π)[ −x cos(nx)/n |₀^π + (1/n) ∫₀^π cos(nx) dx ]
   = 2(−1)^(n+1)/n,  n = 1, 2, . . .

Thus,

f_p = 2 ∑_{n=1}^{∞} ((−1)^(n+1)/n) sin(nx),  0 < x < π.     (2.107)
It can be shown that this series converges to f (x) = x when 0 < x < π. □
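The coefficients (2.106) are straightforward to verify numerically; a minimal sketch (illustrative only, using SciPy quadrature):

import math
from scipy.integrate import quad

def b(n, f=lambda x: x, l=math.pi):            # coefficients (2.106)
    val, _ = quad(lambda x: f(x) * math.sin(n * math.pi * x / l), 0.0, l)
    return 2.0 * val / l

for n in range(1, 5):
    print(n, b(n), 2.0 * (-1) ** (n + 1) / n)  # matches 2*(-1)^(n+1)/n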
Example 2.25 Consider

utt = uxx,            0 < x < π, t > 0
u(x, 0) = x,          0 ≤ x ≤ π
ut(x, 0) = 0,         0 ≤ x ≤ π
u(0,t) = 0 = u(π,t),  t ≥ 0.

Using (2.104) and (2.107), we arrive at

u(x,t) = (1/2)[ f_p(x + t) + f_p(x − t) ]
       = ∑_{n=1}^{∞} ((−1)^(n+1)/n) [ sin(n(x + t)) + sin(n(x − t)) ].

Clearly, the above solution satisfies u(0,t) = 0, u(π,t) = 0. □



Example 2.26 Consider

utt = uxx,            0 < x < π, t > 0
u(x, 0) = 0,          0 ≤ x ≤ π
ut(x, 0) = x,         0 ≤ x ≤ π
u(0,t) = 0 = u(π,t),  t ≥ 0.

Using (2.104) and (2.107), we arrive at

u(x,t) = (1/2) ∫_{x−t}^{x+t} g_p(s) ds
       = ∑_{n=1}^{∞} ((−1)^(n+1)/n) ∫_{x−t}^{x+t} sin(ns) ds
       = ∑_{n=1}^{∞} ((−1)^(n+1)/n) [ −cos(ns)/n ]_{x−t}^{x+t}
       = ∑_{n=1}^{∞} ((−1)^(n+1)/n²) [ −cos(n(x + t)) + cos(n(x − t)) ].

Clearly, the above solution satisfies u(0,t) = 0, u(π,t) = 0. □

2.6.3 Exercises
Exercise 2.64 Consider

utt = uxx,            0 < x < π, t > 0
u(x, 0) = x³,         0 ≤ x ≤ π
ut(x, 0) = 0,         0 ≤ x ≤ π
u(0,t) = 0 = u(π,t),  t ≥ 0.

(a) Show that

f_p(x) = 2 ∑_{n=1}^{∞} (−1)^(n+1) ((nπ)² − 6)/n³ sin(nx),  0 < x < π.
(b) Find the solution u(x,t).


Exercise 2.65 Consider

utt = uxx,            0 < x < π, t > 0
u(x, 0) = 1,          0 ≤ x ≤ π
ut(x, 0) = 0,         0 ≤ x ≤ π
u(0,t) = 0 = u(π,t),  t ≥ 0.


(a) Show that

f_p(x) = (4/π) ∑_{n=1}^{∞} sin((2n − 1)x)/(2n − 1),  0 < x < π.

(b) Find the solution u(x,t).


Exercise 2.66 Consider

utt = uxx,            0 < x < π, t > 0
u(x, 0) = π − x,      0 ≤ x ≤ π
ut(x, 0) = 0,         0 ≤ x ≤ π
u(0,t) = 0 = u(π,t),  t ≥ 0.

(a) Show that

f_p(x) = 2 ∑_{n=1}^{∞} sin(nx)/n,  0 < x < π.

(b) Find the solution u(x,t).


Exercise 2.67 Consider

utt = uxx,            0 < x < 1, t > 0
u(x, 0) = x,          0 ≤ x ≤ 1
ut(x, 0) = 0,         0 ≤ x ≤ 1
u(0,t) = 0 = u(1,t),  t ≥ 0.

(a) Show that

f_p(x) = (2/π) ∑_{n=1}^{∞} ((−1)^(n+1)/n) sin(nπx),  0 < x < 1.

(b) Find the solution u(x,t).


Exercise 2.68 Consider

utt = uxx,              0 < x < 1, t > 0
u(x, 0) = x(1 − x²),    0 ≤ x ≤ 1
ut(x, 0) = 0,           0 ≤ x ≤ 1
u(0,t) = 0 = u(1,t),    t ≥ 0.

(a) Show that

f_p(x) = (12/π³) ∑_{n=1}^{∞} ((−1)^(n+1)/n³) sin(nπx),  0 < x < 1.

(b) Find the solution u(x,t).


Exercise 2.69 Solve

utt = uxx,              0 < x < 1, t > 0
u(x, 0) = 0,            0 ≤ x ≤ 1
ut(x, 0) = x(1 − x²),   0 ≤ x ≤ 1
u(0,t) = 0 = u(1,t),    t ≥ 0.


Exercise 2.70 Solve

utt = 4uxx,             0 < x < 1, t > 0
u(x, 0) = 0,            0 ≤ x ≤ 1
ut(x, 0) = x(1 − x),    0 ≤ x ≤ 1
u(0,t) = 0 = u(1,t),    t ≥ 0.

2.7 Heat Equation


We consider the heat conduction problem of a thin rod and look at the solution. Part
of our presentation is inspired and influenced by Strauss [11]. Let u(x,t) represent
the temperature at position x in a thin insulated rod at time t. The vertical axis will
measure the temperature, and the rod will be positioned along the x-axis in the
xu-coordinate system. As in the wave equation, we consider a small portion of the rod
over an interval [a, b]. The heat, or thermal energy, of the rod situated at the interval
[a, b] is given by

D(x,t) = ∫_a^b cρu dx,

where c denotes the specific heat capacity of the material of the rod and ρ is the mass
density of the rod. The instantaneous change of the heat with respect to time will be
the time derivative of the above equation,

d/dt D(x,t) = d/dt ∫_a^b cρu dx = ∫_a^b cρut dx.     (2.108)

The above expression is true since a and b are constants and u is continuous. Recall
that the thin rod is insulated. The change in heat must be balanced by the heat flux
across the cross-sections of the cylindrical piece around the interval [a, b], as heat
cannot be gained or lost in the absence of an external heat source. Fourier's law
states that the heat flux across the boundary is proportional, with a negative constant of
proportionality, to the temperature derivative in the direction of the boundary's outward
normal, in this instance the x-derivative. The second way to compute the time rate of
change of D is to notice that, in the absence of heat sources within the rod, the quantity
of heat in the segment can change only through the flow of heat across its boundaries at
x = a and x = b. The rate at which heat flows through a section of the rod is called the
heat flux through the section.
Let κ denote the thermal conductivity of the rod. Recall that the thermal conductivity
of a material is a measure of its ability to conduct heat. Then the heat flux into the
segment at x = a and x = b is

−κux(a,t),  and  κux(b,t),

respectively. Thus, the total time rate of change of D is the sum of the rates at the two
ends. Using the fundamental theorem of calculus, we may write

d/dt D(x,t) = κ[ ux(b,t) − ux(a,t) ] = κ ∫_a^b uxx dx.     (2.109)

A quick comparison of equations (2.108) and (2.109) yields

∫_a^b cρut dx = κ ∫_a^b uxx dx,

or

∫_a^b ( cρut − κuxx ) dx = 0.

Since the above integral must hold for all x ∈ [a, b] with a < b, we must have

cρut − κuxx = 0

throughout the material. Dividing by cρ and setting k = κ/(cρ), we arrive at the
one-dimensional heat equation

ut − kuxx = 0,     (2.110)

where k is called the thermal diffusivity of the material.
If we consider the heat equation in (2.110) on an interval I ⊂ R, then we have the
heat problem, with initial and boundary conditions,

ut = kuxx,          x ∈ I, t > 0
u(x, 0) = f(x),     x ∈ I     (2.111)
u satisfies certain BCs.

Here, the initial condition u(x, 0) = f(x) means the lateral surface of the rod is
insulated and parallel to the x-axis, and its initial temperature is f(x) for x ∈ I.
In practice, the most common boundary conditions are the following:

• u(0,t) = 0 = u(l,t) : I = (0, l), Dirichlet . It is the case when both faces of the
rod are kept at temperature zero.

• ux (0,t) = 0 = ux (l,t) : I = (0, l), Neumann . It is the case when both faces of
the rod are insulated.

• ux (0,t) − a0 u(0,t) = 0 and ux (l,t) + al u(l,t) = 0 : I = (0, l), Robin .

• u(−l,t) = u(l,t) = 0 and ux (−l,t) = ux (l,t) = 0 : I = (−l, l), Periodic .
Before we attempt to find the solution to the heat equation, we prove the uniqueness
of the solution of the nonhomogeneous heat equation. We do so by defining an energy
function V and showing that, along the solutions of the heat equation, the energy function is
nonnegative and its derivative is less than or equal to zero. We begin by considering the
nonhomogeneous heat equation with initial and boundary conditions

ut − kuxx = f(x,t),            0 ≤ x ≤ l, t > 0
u(x, 0) = φ(x),                0 ≤ x ≤ l,     (2.112)
u(0,t) = g(t),  u(l,t) = h(t),

for given functions f , φ , g, h.


Theorem 2.4 The heat equation given by (2.112) has at most one solution.

Proof Assume (2.112) has two solutions u and v. Set w = u − v. Then

w(x, 0) = u(x, 0) − v(x, 0) = φ (x) − φ (x) = 0,

w(0,t) = u(0,t) − v(0,t) = g(t) − g(t) = 0,


w(l,t) = u(l,t) − v(l,t) = h(t) − h(t) = 0.
Moreover,

wt − kwxx = (u − v)t − k(u − v)xx



= ut − kuxx − vt − kvxx
= f (x,t) − f (x,t)
= 0.

Thus, we arrive at the homogeneous heat equation in w,



 wt − kwxx = 0, 0 ≤ x ≤ l, t > 0
w(x, 0) = 0, 0 ≤ x ≤ l, (2.113)
w(0,t) = 0, w(l,t) = 0, t > 0.

Define the energy function


 l
1
V [w](t) = [w(x,t)]2 dx. (2.114)
2 0

Then it is clear that

V[w](0) = (1/2) ∫₀ˡ [w(x, 0)]² dx = (1/2) ∫₀ˡ [0]² dx = 0,

and V[w](t) is nonnegative for t > 0. To obtain any meaningful information from the
energy function, we must show it is decreasing in time along the solutions of (2.113).
Thus, using the first equation in (2.113) we arrive at

d/dt V[w](t) = ∫₀ˡ w(x,t)wt(x,t) dx = k ∫₀ˡ w(x,t)wxx(x,t) dx.

An integration by parts yields

d/dt V[w](t) = k w(x,t)wx(x,t) |₀ˡ − k ∫₀ˡ ( wx(x,t) )² dx

             = k[ w(l,t)wx(l,t) − w(0,t)wx(0,t) ] − k ∫₀ˡ ( wx(x,t) )² dx

             = −k ∫₀ˡ ( wx(x,t) )² dx ≤ 0.

Since the energy function V is decreasing, we get

0 ≤ V [w](t) ≤ V [w](0) = 0.

Hence,

V[w](t) = (1/2) ∫₀ˡ [w(x,t)]² dx = 0,  for all t ≥ 0,

which implies w ≡ 0 for all x ∈ [0, l], t > 0. Wherefore, u − v = 0 for all x ∈ [0, l], t > 0.
This shows

u = v  for all x ∈ [0, l], t > 0.
This completes the proof.
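The energy argument above can also be observed numerically; the following sketch (an explicit finite-difference scheme with an assumed initial profile, illustrative only) shows the discrete energy decreasing in time:

import numpy as np

k, l, nx = 1.0, 1.0, 51
x = np.linspace(0.0, l, nx)
dx = x[1] - x[0]
dt = 0.4 * dx ** 2 / k                         # stable explicit time step
w = np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x)   # zero at both ends

def energy(w):                                 # discrete version of (2.114)
    return 0.5 * float(np.sum(w ** 2)) * dx

for step in range(301):
    if step % 100 == 0:
        print(step * dt, energy(w))            # energy decreases monotonically
    w[1:-1] += k * dt / dx ** 2 * (w[2:] - 2.0 * w[1:-1] + w[:-2])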

Next we extend Theorem 2.4 to show uniqueness of the solution for the heat equation on
R. Consider the heat equation

ut − kuxx = f(x,t),   −∞ < x < ∞, t > 0
u(x, 0) = φ(x),       −∞ < x < ∞,     (2.115)
lim_{x→±∞} u = 0,  lim_{x→±∞} ux = 0,  t > 0,

for a given function φ.


Theorem 2.5 The heat equation given by (2.115) has at most one solution.

Proof Assume (2.115) has two solutions u and v. Set w = u − v. Then, by similar
arguments as in the proof of Theorem 2.4, w is a solution to the homogeneous heat
equation

wt − kwxx = 0,        −∞ < x < ∞, t > 0
w(x, 0) = 0,          −∞ < x < ∞,     (2.116)
lim_{x→±∞} w = 0,  lim_{x→±∞} wx = 0,  t > 0.

Define the energy function for (2.116) by

V[w](t) = (1/2) ∫_{−∞}^{∞} [w(x,t)]² dx.     (2.117)

Then it is clear that

V[w](0) = (1/2) ∫_{−∞}^{∞} [w(x, 0)]² dx = 0,

and V[w](t) ≥ 0 for t ≥ 0. Moreover, using (2.117) we get

d/dt V[w](t) = ∫_{−∞}^{∞} w(x,t)wt(x,t) dx = k ∫_{−∞}^{∞} w(x,t)wxx(x,t) dx.

An integration by parts yields

d/dt V[w](t) = k w(x,t)wx(x,t) |_{x=−∞}^{x=∞} − k ∫_{−∞}^{∞} ( wx(x,t) )² dx

             = −k ∫_{−∞}^{∞} ( wx(x,t) )² dx ≤ 0.

Since the energy function V is decreasing, we get

0 ≤ V [w](t) ≤ V [w](0) = 0.

Hence,

V[w](t) = (1/2) ∫_{−∞}^{∞} [w(x,t)]² dx = 0,  for all t ≥ 0,

which implies w ≡ 0 for all x ∈ (−∞, ∞), t > 0. Consequently, u − v = 0 for all
x ∈ (−∞, ∞), t > 0. This shows

u = v  for all x ∈ (−∞, ∞), t > 0.

This completes the proof.

The next theorem is about stability; it says that a small change in the initial data
only produces a small change in the solution.
Theorem 2.6 Let u*(x,t) be another solution of (2.115) with initial data g*. Define
the L² norm of a function h as

||h||₂ = ( ∫_{−∞}^{∞} h²(x) dx )^(1/2).

Assume there is a small change in the initial data. That is, for small and positive ε
we have that ||g − g*||₂ < ε. Then,

||u(x,t) − u*(x,t)||₂ < ε.

Proof The proof depends on the energy function. For simpler notation, we let

w(x,t) = u(x,t) − u*(x,t);

then w is a solution to the homogeneous heat problem

wt − kwxx = 0,             −∞ < x < ∞, t > 0
w(x, 0) = g(x) − g*(x),    −∞ < x < ∞,     (2.118)
lim_{x→±∞} w = 0,  lim_{x→±∞} wx = 0.

Define the energy function V for (2.118) by (2.117). Then, by Theorem 2.5, V is
decreasing along the solutions of (2.118) with V[w](t) ≥ 0 for t ≥ 0. Thus

V[w](t) ≤ V[w](0) = (1/2) ∫_{−∞}^{∞} [w(x, 0)]² dx.

Accordingly, we have

(1/2)||w||₂² = V[w] ≤ (1/2)||w(x, 0)||₂²,

or

(1/2)||u − u*||₂² ≤ (1/2)||g − g*||₂²,

which implies that

||u − u*||₂ < ε,  t ≥ 0.

We have established stability for all t ≥ 0 in terms of the square error. This completes
the proof.

2.7.1 Solution of the heat equation


Finding the solution of the heat equation using the method of characteristics, as we
did for the wave equation, will not be of much success. To see this, we consider the
heat equation over R,

ut − kuxx = 0.     (2.119)

Then we have A = k, B = C = 0. Using (2.65) we obtain

dt/dx = (B ± √(B² − AC))/A = 0.

Wherefore, we only have one characteristic line, given by

t = c,

for some constant c. Remember, we should be able to trace any point in the xt-plane
along the characteristic lines, which is not the case here since they are parallel to the
x-axis.
Our aim then is to find another approach to establishing a bounded solution of the
heat equation on an unbounded domain. We consider the heat problem with initial
condition

ut − kuxx = 0,    −∞ < x < ∞, t > 0     (2.120)
u(x, 0) = φ(x),   −∞ < x < ∞.

To arrive at the solution of (2.120), we begin by considering a simple form of the initial
condition. In particular, we first derive the solution of the heat problem with initial
condition of the form

Vt − kVxx = 0,   −∞ < x < ∞, t > 0     (2.121)
V(x, 0) = H(x),

where H(x) is the Heaviside step function defined by

H(x) =  1, x > 0
        0, x < 0.     (2.122)

The next lemma is needed in our future work.
Lemma 1 For x ∈ R,

∫₀^∞ e^(−x²) dx = √π/2.

Proof Let

I = ∫₀^∞ e^(−x²) dx.

Then, for y ∈ R,

I² = I · I = ( ∫₀^∞ e^(−x²) dx )( ∫₀^∞ e^(−y²) dy ) = ∫₀^∞ ∫₀^∞ e^(−(x²+y²)) dx dy.

We switch to polar coordinates by letting

x = r cos(θ),  y = r sin(θ).

Then,

I² = ∫₀^{π/2} ∫₀^∞ e^(−r²) r dr dθ
   = ∫₀^{π/2} [ −(1/2) e^(−r²) ]₀^∞ dθ
   = ∫₀^{π/2} (1/2) dθ = π/4.

Taking the square root on both sides we arrive at

I = ∫₀^∞ e^(−x²) dx = √π/2.

This completes the proof.

In the next lemma we explore the invariance properties of the heat equation.

Lemma 2 [Invariance properties of the heat equation] The heat equation (2.119) is
invariant under the following transformations.
(a) If u(x,t) is a solution of (2.119), then so is u(x − z,t) for any fixed z. (Spatial
translation)
(b) If u(x,t) is a solution of (2.119), then so are ux, ut, uxx, and so on. (Differentiation)
(c) If u₁, u₂, . . . , uₙ are solutions of (2.119), then so is ∑_{i=1}^n cᵢuᵢ for any constants
c₁, c₂, . . . , cₙ. (Linear combinations)
(d) If S(x,t) solves (2.119), then so is

∫_{−∞}^{∞} S(x − y,t)g(y) dy

for any function g, as long as the integral converges.
(e) If u(x,t) is a solution of (2.119), then so is u(√a x, at) for any constant a > 0.
(Dilation, or scaling)

Proof The proofs of parts (a)–(c) are straightforward and we refer to Exercise 2.71.
To prove (d), we assume a finite interval [−b, b] partitioned by points {yᵢ}_{i=1}^n such
that −b = y₁ < y₂ < · · · < yₙ = b with equal length ∆y. Then, using (c) combined with
representing the integral by a Riemann sum, we may write

∫_{−∞}^{∞} S(x − y,t)g(y) dy = lim_{b→∞} ∫_{−b}^{b} S(x − y,t)g(y) dy
                            = lim_{b→∞} lim_{n→∞} ∑_{i=1}^n S(x − yᵢ,t)g(yᵢ)∆y.

As for the proof of (e), we make use of the chain rule. Let v(x,t) = u(√a x, at). Then,
vt = a ut(√a x, at), vx = √a ux(√a x, at), and vxx = a uxx(√a x, at). Substituting into
(2.119) we arrive at

a ut(√a x, at) − k a uxx(√a x, at) = 0,

or

ut(√a x, at) − k uxx(√a x, at) = 0.

This completes the proof.

Now we are in a good position to solve (2.120). As we have mentioned before, we
will find a particular solution of (2.121) and then make use of it to obtain the solution
of the heat problem given in (2.120). First we note that H(x) is invariant under the
dilation, since

H(√a x) =  1, √a x > 0     =   1, x > 0    = H(x),
           0, √a x < 0         0, x < 0

since a > 0. Due to (e) and the fact that H is invariant under the dilation, we know
that V(√a x, at) solves (2.121). Due to the uniqueness of solutions, we must have

V(√a x, at) = V(x,t) for all x ∈ R and t > 0. Thus, V is invariant under the dila-
tion (x,t) → (√a x, at). Since our goal is to transform (2.121) into an ODE, we want to
eliminate one of the variables in V, which can easily be achieved by letting a = 1/t.
Then

    V(x,t) = V(x/√t, (1/t)·t) = V(x/√t, 1).
Let q be a function such that

    V(x,t) = q(x/√t).

Thus, V is completely determined by the function of one variable q. Then,

    Vt = −(x/(2t^{3/2})) q′(x/√t),    Vxx = (1/t) q″(x/√t).

This reduces the PDE to

    Vt − kVxx = −(x/(2t^{3/2})) q′(x/√t) − (k/t) q″(x/√t) = 0,

which, after multiplying through by t, implies that

    −(x/(2t^{1/2})) q′(x/√t) − k q″(x/√t) = 0.

Setting z = x/√t, the above second-order differential equation takes the form

    q″(z) + (z/(2k)) q′(z) = 0.
2k
Let G(z) = q′(z); then the above equation reduces to the first-order differential equation

    G′(z) + (z/(2k)) G(z) = 0,

which has the solution (see Chapter 1)

    G(z) = q′(z) = C e^{−z²/(4k)}.

Integrating from 0 to z gives

    q(z) = C ∫_0^z e^{−s²/(4k)} ds + D,

for unknown constants C and D. This yields

    V(x,t) = q(x/√t) = C ∫_0^{x/√t} e^{−s²/(4k)} ds + D.        (2.123)

Note that by making a substitution in Lemma 1, one can show that

    ∫_0^∞ e^{−s²/(4k)} ds = √(kπ),

which we will need to compute the constants in (2.123). Since (2.123) is only valid
for t > 0, to check the initial condition in (2.121) we take the limit as t → 0+.
Additionally, we observe that

    lim_{t→0+} x/√t = { ∞,   x > 0
                      { −∞,  x < 0.

Thus, for x > 0, we have that

    1 = lim_{t→0+} V(x,t) = C ∫_0^∞ e^{−s²/(4k)} ds + D = C√(kπ) + D.

On the other hand, for x < 0, we obtain

    0 = lim_{t→0+} V(x,t) = C ∫_0^{−∞} e^{−s²/(4k)} ds + D = −C√(kπ) + D.

By solving the system of equations

    1 = C√(kπ) + D,    0 = −C√(kπ) + D,

we obtain

    C = 1/√(4kπ),    D = 1/2.

Plugging the constants into (2.123), we arrive at

    V(x,t) = (1/√(4kπ)) ∫_0^{x/√t} e^{−s²/(4k)} ds + 1/2.
We now express V in terms of the error function, which we define below.

Definition 2.5 The error function is the improper integral, considered as a
real function erf : R → R, given by

    erf(x) = (2/√π) ∫_0^x e^{−z²} dz,

where the exponential is the real exponential function. In addition, the complementary
error function is

    erfc(x) = 1 − erf(x) = (2/√π) ∫_x^∞ e^{−z²} dz.
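As a quick sanity check (ours, not from the text), Python's standard library exposes
both functions, so the identity erf(x) + erfc(x) = 1 can be verified directly:

    # Illustrative check of Definition 2.5 using the standard library.
    from math import erf, erfc

    for x in (0.0, 0.5, 2.0):
        print(x, erf(x), erfc(x), erf(x) + erfc(x))  # last column is always 1.0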

Let z = s/√(4k). Then

    (1/√(4kπ)) ∫_0^{x/√t} e^{−s²/(4k)} ds = (1/√π) ∫_0^{x/√(4kt)} e^{−z²} dz = (1/2) erf(x/√(4kt)).

Hence the unique particular solution of (2.121) is given by

    V(x,t) = (1/√π) ∫_0^{x/√(4kt)} e^{−z²} dz + 1/2,        (2.124)

and in terms of the error function,

    V(x,t) = 1/2 + (1/2) erf(x/√(4kt)).        (2.125)

Solving (2.120).
Now, we attempt to find the solution of the general heat problem given by (2.120).
Define the function

    S(x,t) = (∂V/∂x)(x,t).        (2.126)
By the invariance property (b), S(x,t) solves the heat equation (2.119). Therefore, by
the invariance properties (a) and (d),

    u(x,t) = ∫_{−∞}^∞ S(x − y, t) φ(y) dy,   for t > 0        (2.127)

solves (2.120). We must show u(x,t) given by (2.127) is the unique solution of
(2.120). This can be accomplished by showing it satisfies the initial condition. Uti-
lizing (2.126), we can write u as follows:

    u(x,t) = ∫_{−∞}^∞ (∂V/∂x)(x − y, t) φ(y) dy = −∫_{−∞}^∞ (∂/∂y)[V(x − y, t)] φ(y) dy.

We note that S(x − y, t) decays exponentially as |x − y| grows large. For now, we
assume

    φ(±∞) = 0,
so we may perform an integration by parts on the above integral. That is,

    u(x,t) = [−V(x − y, t) φ(y)]_{y=−∞}^{y=∞} + ∫_{−∞}^∞ V(x − y, t) φ′(y) dy
           = ∫_{−∞}^∞ V(x − y, t) φ′(y) dy.

Setting t = 0, and noticing that V(x − y, 0) = H(x − y), we obtain

    u(x, 0) = ∫_{−∞}^∞ V(x − y, 0) φ′(y) dy = ∫_{−∞}^x φ′(y) dy = [φ(y)]_{−∞}^x = φ(x).

Thus, u(x,t) does satisfy the initial condition of (2.120). It is left to compute S(x,t),
which can easily be done from (2.126) with V given by (2.124). That is, using (2.124)
we obtain

    S(x,t) = (∂V/∂x)(x,t) = (1/√(4πkt)) e^{−x²/(4kt)}.        (2.128)

Finally, substituting S given by (2.128) into (2.127), we have the explicit form of the
solution of (2.120):

    u(x,t) = (1/√(4πkt)) ∫_{−∞}^∞ e^{−(x−y)²/(4kt)} φ(y) dy,   for t > 0.        (2.129)

The function S(x,t) is known as the heat kernel. Other resources may refer to it as the
fundamental solution, source function, Green's function, or propagator of the heat
equation.
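A minimal numerical sketch of (2.129) (ours, not the author's; the grid sizes, the
truncation to [−L, L], and the sample data are our own choices) approximates the
convolution with the heat kernel by quadrature:

    # Approximate u(x,t) from (2.129) with a plain Riemann sum.
    import numpy as np

    def heat_solution(x, t, phi, k=1.0, L=20.0, n=4001):
        y = np.linspace(-L, L, n)                                  # truncated real line
        S = np.exp(-(x - y)**2 / (4*k*t)) / np.sqrt(4*np.pi*k*t)   # heat kernel S(x-y,t)
        return np.sum(S * phi(y)) * (y[1] - y[0])

    # Initial data of Example 2.27 below (where k = 4).
    phi = lambda y: np.where(np.abs(y) < 1, 1.0 - np.abs(y), 0.0)
    print(heat_solution(0.0, 0.1, phi, k=4.0))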
We present the following example.
Example 2.27 Consider the heat problem

    ut − 4uxx = 0,   −∞ < x < ∞, t > 0
    u(x, 0) = φ(x),  −∞ < x < ∞,

with initial data

    φ(x) = { 1 − |x|,  |x| < 1
           { 0,        |x| ≥ 1.

We are searching for a bounded solution u(x,t). Substituting the initial data into
(2.129), with k = 4, yields

    u(x,t) = (1/√(4πkt)) ∫_{−∞}^∞ e^{−(x−y)²/(4kt)} φ(y) dy
           = (1/√(16πt)) [ ∫_{−1}^0 e^{−(x−y)²/(16t)} (1 + y) dy + ∫_0^1 e^{−(x−y)²/(16t)} (1 − y) dy ].

We make the change of variables

    s = (y − x)/√(16t)

and transform the above integrals to obtain

    u(x,t) = (1/√π) [ ∫_{−(1+x)/√(16t)}^{−x/√(16t)} e^{−s²} (1 + x + 4s√t) ds
                    + ∫_{−x/√(16t)}^{(1−x)/√(16t)} e^{−s²} (1 − x − 4s√t) ds ].

The integrals whose integrands do not have e^{−s²} multiplied by a term containing s
can be written in terms of the error function. The rest can be integrated directly, and
in the end we obtain the solution

    u(x,t) = (1/2)(1 + x) [ erf((1 + x)/√(16t)) − erf(x/√(16t)) ]
           + (1/2)(1 − x) [ erf(x/√(16t)) − erf((x − 1)/√(16t)) ]
           + (2√t/√π) [ e^{−(x+1)²/(16t)} − e^{−x²/(16t)} ]
           + (2√t/√π) [ e^{−(x−1)²/(16t)} − e^{−x²/(16t)} ].        (2.130)


We end this section by looking into the solution of the nonhomogeneous heat problem
with initial condition,

    ut − kuxx = f(x,t),  −∞ < x < ∞, t > 0
    u(x, 0) = φ(x),      −∞ < x < ∞,        (2.131)

for given functions f and φ. The derivation of the solution of (2.131) depends on
Duhamel's principle, and we ask the reader to consult [1]. The next theorem is
stated without proof.

Theorem 2.7 The heat equation given by (2.131) has the solution

    u(x,t) = ∫_{−∞}^∞ S(x − y, t) φ(y) dy + ∫_0^t ∫_{−∞}^∞ S(x − y, t − τ) f(y, τ) dy dτ,   for t > 0,        (2.132)

where S(x,t) is given by (2.128).

2.7.2 Heat equation on semi-infinite domain: Dirichlet condition

In the previous subsection, we considered the heat problem on R. Presently, we want
to make use of our previous results and obtain an explicit solution for the heat equa-
tion on the unbounded semi-infinite domain [0, ∞). Now it makes sense to add a
boundary condition at x = 0, that is, u(0,t) = 0. Thus, we consider the heat problem
with Dirichlet boundary condition at the end point x = 0,

    vt − kvxx = 0,   0 < x < ∞, t > 0
    v(x, 0) = φ(x),  x > 0                      (2.133)
    v(0,t) = 0,      t > 0.

Our aim is to find a solution of (2.133) as we did for the heat problem over the entire
real line. There will be no need to start from scratch, but instead we will reintroduce
the problem over the entire real line by extending the initial data to the whole line.
Whatever method we use to extend to the negative half-line, we should make sure
that the boundary condition is automatically satisfied by the solution of the problem
on the whole line that arises from the extended data. For heat problems with Dirichlet
condition, one would choose the odd extension of the initial data φ(x). If ψ(x) is odd,
then ψ(x) = −ψ(−x); evaluating at x = 0 gives 2ψ(0) = 0, or ψ(0) = 0. This is true for
any odd function. We make it formal in the next lemma.
Lemma 3 Let f : (−∞, ∞) → R be an odd function (f(x) = −f(−x)) that is con-
tinuous at x = 0. Then f(0) = 0.

Proof Since f is continuous at x = 0 and odd, we have

    f(0) = lim_{x→0+} f(x) = lim_{x→0+} (−f(−x)) = −lim_{x→0−} f(x) = −f(0).

This gives 2f(0) = 0, or f(0) = 0. This completes the proof.



The next lemma assures us that if the initial data is odd, then the solution of the heat
equation over the real line is also odd.
Lemma 4 Let u(x,t) be the solution of the heat equation on −∞ < x < ∞. If the
initial data φ (x) = u(x, 0) is odd, then for all t ≥ 0, u(x,t) is an odd function of x.

Proof By (2.129), the solution satisfies

    u(−x,t) = (1/√(4πkt)) ∫_{−∞}^∞ e^{−(−x−y)²/(4kt)} φ(y) dy
            = (1/√(4πkt)) ∫_{−∞}^∞ e^{−(−x+y)²/(4kt)} φ(−y) dy        (y ↦ −y)
            = −(1/√(4πkt)) ∫_{−∞}^∞ e^{−(x−y)²/(4kt)} φ(y) dy         (φ odd, (y − x)² = (x − y)²)
            = −u(x,t).

This completes the proof.

Thus, by Lemmas 3 and 4, we see that if the initial data φ(x) is odd, then u(x,t) is
odd in x for each t, and hence u(0,t) = −u(0,t), so that u(0,t) = 0 for any t > 0,
which is exactly the boundary condition on v in (2.133). In summary, if one extends
the initial data to an odd function on the whole real line, then the solution with the ex-
tended initial data automatically satisfies the Dirichlet boundary condition of (2.133).
We have the following definition.
Definition 2.6 The odd extension of a function f(x), denoted by fo(x), is defined as

    fo(x) = { f(x),    x > 0
            { −f(−x),  x < 0        (2.134)
            { 0,       x = 0.

The odd extension fo is obtained for negative x by reflecting f(x) with respect to the
vertical axis, and then with respect to the horizontal axis. This procedure produces a
function whose graph is symmetric with respect to the origin, and thus it is odd.
For example, if f(x) = x, x > 0, then fo(x) = x for −∞ < x < ∞. In light of the above
discussion, we recast the heat problem in (2.133) with extended data as

    ut − kuxx = 0,   −∞ < x < ∞, t > 0
    u(x, 0) = φo(x).                      (2.135)

We already know from (2.127) that the solution of (2.135) is given by



    u(x,t) = ∫_{−∞}^∞ S(x − y, t) φo(y) dy,   for t > 0.        (2.136)
Since v is a solution for x ≥ 0, we have v(x,t) = u(x,t) for x ≥ 0. Notice that

    v(x, 0) = u(x, 0) = φo(x) = φ(x)   for x > 0,

and v(0,t) = u(0,t) = 0, since u(x,t) is an odd function of x. Thus, v(x,t) satisfies the
boundary condition in (2.133). Substituting φo(x) into the solution given by (2.136),
we obtain, by splitting the integral over two regions,
    u(x,t) = ∫_0^∞ S(x − y, t) φo(y) dy + ∫_{−∞}^0 S(x − y, t) φo(y) dy
           = ∫_0^∞ S(x − y, t) φ(y) dy − ∫_{−∞}^0 S(x − y, t) φ(−y) dy.

Substituting y for −y in the second integral leads to

    u(x,t) = ∫_0^∞ S(x − y, t) φ(y) dy − ∫_0^∞ S(x + y, t) φ(y) dy.

Using (2.128),

    S(x,t) = (1/√(4πkt)) e^{−x²/(4kt)},

and v(x,t) = u(x,t) in the above expression, we may write the solution formula for (2.133)
as follows:

    v(x,t) = (1/√(4πkt)) ∫_0^∞ [ e^{−(x−y)²/(4kt)} − e^{−(x+y)²/(4kt)} ] φ(y) dy,   for t > 0.        (2.137)
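The following sketch (ours; all parameter choices are assumptions) evaluates (2.137)
by quadrature and can be compared against the closed-form answer of the example
that follows:

    # Approximate v(x,t) from (2.137): the image-kernel difference integrated
    # against phi on a truncated half-line [0, L].
    import numpy as np
    from math import erf, sqrt

    def dirichlet_halfline(x, t, phi, k=1.0, L=20.0, n=4001):
        y = np.linspace(0.0, L, n)
        kernel = np.exp(-(x - y)**2/(4*k*t)) - np.exp(-(x + y)**2/(4*k*t))
        return np.sum(kernel * phi(y)) * (y[1] - y[0]) / np.sqrt(4*np.pi*k*t)

    # With phi = u0 = 1 this should be close to erf(x/sqrt(4kt)); see Example 1.
    print(dirichlet_halfline(0.5, 0.2, lambda y: np.ones_like(y)))
    print(erf(0.5 / sqrt(4*1.0*0.2)))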

Example 1 Consider the heat equation in (2.133) with φ(x) = u0 for a constant u0.
Substituting φ(y) = u0 into the solution v(x,t) in (2.137), we obtain

    v(x,t) = (u0/√(4πkt)) ∫_0^∞ [ e^{−(x−y)²/(4kt)} − e^{−(x+y)²/(4kt)} ] dy,   for t > 0.

Making the change of variable s = (x − y)/√(4kt), we arrive at

    (u0/√(4πkt)) ∫_0^∞ e^{−(x−y)²/(4kt)} dy = −(u0/√π) ∫_{x/√(4kt)}^{−∞} e^{−s²} ds
                                            = (u0/√π) ∫_{−∞}^{x/√(4kt)} e^{−s²} ds.

Similarly, by letting s = (x + y)/√(4kt), we arrive at

    (u0/√(4πkt)) ∫_0^∞ e^{−(x+y)²/(4kt)} dy = (u0/√π) ∫_{x/√(4kt)}^∞ e^{−s²} ds.

Thus,

    v(x,t) = (u0/√π) [ ∫_{−∞}^{x/√(4kt)} e^{−s²} ds − ∫_{x/√(4kt)}^∞ e^{−s²} ds ]
           = (u0/√π) [ ∫_{−∞}^0 e^{−s²} ds + ∫_0^{x/√(4kt)} e^{−s²} ds
                      − ∫_0^∞ e^{−s²} ds + ∫_0^{x/√(4kt)} e^{−s²} ds ]
           = u0 (2/√π) ∫_0^{x/√(4kt)} e^{−s²} ds        (since e^{−s²} is even)
           = u0 erf(x/√(4kt)).

In the heat problem (2.133), we considered the Dirichlet boundary condition v(0,t) =
0, and derived the solution given by (2.137). Currently, we are interested in finding
the solution of (2.133), but with a nonzero boundary condition. That is, we consider
the heat problem with a nonhomogeneous boundary condition

    vt − kvxx = 0,   0 < x < ∞, t > 0
    v(x, 0) = 0,     x > 0                      (2.138)
    v(0,t) = p(t),   t > 0,

where the function p is differentiable. Our approach is to make a change of variables
and reduce (2.138) to a problem with a homogeneous boundary condition. To do so,
we let

    u(x,t) = v(x,t) − p(t).
Then (2.138) is transformed into the heat problem

    ut − kuxx = −p′(t),   0 < x < ∞, t > 0
    u(x, 0) = −p(0),      x > 0                      (2.139)
    u(0,t) = 0,           t > 0.

This is a nonhomogeneous heat problem with Dirichlet boundary condition. There-
fore, the problem can be solved by recasting it over the real line via the concept of
odd extensions and then using (2.132) to obtain its solution. To stay compatible with
previous notation, we let

    φ(x) = −p(0),   and   f(x,t) = −p′(t).

Then

    φo(x) = { −p(0),  x > 0              fo(x,t) = { −p′(t),  x > 0
            { p(0),   x < 0     and               { p′(t),   x < 0
            { 0,      x = 0                       { 0,       x = 0.

Then, by (2.132), the solution may be written in terms of the heat kernel:

    u(x,t) = ∫_{−∞}^∞ S(x − y, t) φo(y) dy + ∫_0^t ∫_{−∞}^∞ S(x − y, t − τ) fo(y, τ) dy dτ,   for t > 0.        (2.140)

By substituting φo and fo into (2.140), we obtain the explicit solution

    u(x,t) = −(p(0)/√(4πkt)) ∫_0^∞ [ e^{−(x−y)²/(4kt)} − e^{−(x+y)²/(4kt)} ] dy
             − ∫_0^t (1/√(4πk(t−τ))) ∫_0^∞ [ e^{−(x−y)²/(4k(t−τ))} − e^{−(x+y)²/(4k(t−τ))} ] p′(τ) dy dτ.
                                                                              (2.141)

Finally, the solution of our original heat problem (2.138) is


v(x,t) = u(x,t) + p(t),
where u(x,t) is given by (2.141).

2.7.3 Heat equation on semi-infinite domain: Neumann condition


Now we turn our attention to addressing the one-dimensional heat equation on the half-
line with the Neumann condition, ux(0,t) = 0. We employ a similar procedure as
in the previous subsection by extending the initial data to the negative half-line in
such a fashion that the boundary condition is automatically satisfied. In the case of
the Dirichlet condition, we used the odd extension of the initial data, whereas
here we use the notion of even extension. We begin by considering the heat problem
under the Neumann condition at the end point x = 0,

    vt − kvxx = 0,   0 < x < ∞, t > 0
    v(x, 0) = φ(x),  x > 0                      (2.142)
    vx(0,t) = 0,     t > 0.

Our aim is to find a solution to (2.142) as we did for the heat problem over the entire
real line. There will be no need to start from scratch, but instead we will try to recast
the problem over the entire real line by extending the initial data to the whole line.
For heat problems with Neumann conditions, one would choose the even extension
of the initial data φ (x). Note that if ψ(x) is even, then ψ(x) = ψ(−x), from which
we get ψ ′ (x) = −ψ ′ (−x), and hence 2ψ ′ (0) = 0, or ψ ′ (0) = 0. This is true for any
even function. We make it formal in the next lemma.
Lemma 5 Let f : (−∞, ∞) → R be an even function (f(x) = f(−x)) that is differ-
entiable at x = 0. Then f′(0) = 0.

Proof Recall from Chapter 1 that the derivative of a function g is given by

    g′(x) = lim_{h→0} (g(x + h) − g(x))/h.

Thus, since f is differentiable and f(h) = f(−h), we have

    f′(0) = lim_{h→0} (f(h) − f(0))/h = lim_{h→0} (f(−h) − f(0))/h
          = −lim_{h→0} (f(0) − f(−h))/h = −f′(0).

This implies 2f′(0) = 0, or f′(0) = 0. This completes the proof.

Now we state a similar lemma to Lemma 4 for even initial data.


Lemma 6 Let u(x,t) be the solution of the heat equation on −∞ < x < ∞. If the
initial data φ (x) = u(x, 0) is even, then for all t ≥ 0, u(x,t) is an even function of x.
As before, by Lemma 6, we have that if the initial data φ(x) is even, then u(x,t) is
even in x. Thus, by Lemma 5, ux(0,t) = 0 for any t > 0. Hence u automatically satisfies
the Neumann boundary condition at x = 0. We have the following definition.
Definition 2.7 The even extension of a function f(x), denoted by fe(x), is defined as

    fe(x) = { f(x),   x > 0
            { f(−x),  x < 0        (2.143)
            { 0,      x = 0.
In light of the above discussion, we recast the heat problem in (2.142) with extended
data as

    ut − kuxx = 0,   −∞ < x < ∞, t > 0
    u(x, 0) = φe(x)                      (2.144)
    ux(0,t) = 0,     t > 0.

We already know from (2.127) that the solution of (2.144) is given by

    u(x,t) = ∫_{−∞}^∞ S(x − y, t) φe(y) dy,   for t > 0.        (2.145)

Since v is a solution for x ≥ 0, we have v(x,t) = u(x,t) for x ≥ 0. Notice that

    v(x, 0) = u(x, 0) = φe(x) = φ(x)   for x > 0,

and vx(0,t) = ux(0,t) = 0, since u(x,t) is an even function of x. Thus, v(x,t) satis-
fies the Neumann condition in (2.142). Substituting φe(x) into the solution given by
(2.145), we obtain, by splitting the integral over two regions,

    u(x,t) = ∫_0^∞ S(x − y, t) φe(y) dy + ∫_{−∞}^0 S(x − y, t) φe(y) dy
           = ∫_0^∞ S(x − y, t) φ(y) dy + ∫_{−∞}^0 S(x − y, t) φ(−y) dy.

Making the change of variables y ↦ −y in the second integral leads to

    u(x,t) = ∫_0^∞ S(x − y, t) φ(y) dy + ∫_0^∞ S(x + y, t) φ(y) dy.

Using (2.128),

    S(x,t) = (1/√(4πkt)) e^{−x²/(4kt)},

and v(x,t) = u(x,t) in the above expression, we may write the solution formula for (2.142)
as follows:

    v(x,t) = (1/√(4πkt)) ∫_0^∞ [ e^{−(x−y)²/(4kt)} + e^{−(x+y)²/(4kt)} ] φ(y) dy,   for t > 0.        (2.146)
We have the following example.
Example 2.28 Consider the heat equation given by (2.142) with φ(x) = u0 for a con-
stant u0. Substituting φ(y) = u0 into the solution v(x,t) in (2.146), we obtain

    v(x,t) = (u0/√(4πkt)) ∫_0^∞ [ e^{−(x−y)²/(4kt)} + e^{−(x+y)²/(4kt)} ] dy,   for t > 0.

Making the change of variable s = (x − y)/√(4kt) in the first integral and
s = (x + y)/√(4kt) in the second integral, we arrive at

    v(x,t) = (u0/√π) [ ∫_{−∞}^{x/√(4kt)} e^{−s²} ds + ∫_{x/√(4kt)}^∞ e^{−s²} ds ]
           = (u0/√π) [ ∫_{−∞}^0 e^{−s²} ds + ∫_0^{x/√(4kt)} e^{−s²} ds
                      + ∫_0^∞ e^{−s²} ds − ∫_0^{x/√(4kt)} e^{−s²} ds ]
           = u0 (2/√π) ∫_0^∞ e^{−s²} ds        (since e^{−s²} is even)
           = u0 (2/√π)(√π/2)                   (by Lemma 1)
           = u0.

2.7.4 Exercises
Exercise 2.71 Prove parts (a)–(c) of Lemma 2.
Exercise 2.72 For constant u0 , write the solution in terms of the error function for
the heat problem

ut − kuxx = 0, −∞ < x < ∞, t > 0
u(x, 0) = φ (x), −∞ < x < ∞

where

    φ(x) = { u0,  |x| < l
           { 0,   |x| > l.
Exercise 2.73 Consider the heat problem

    vt − kvxx = 0,   0 < x < ∞, t > 0
    v(x, 0) = φ(x),  x > 0
    v(0,t) = 0,      t > 0

with initial data

    φ(x) = { 0,  0 < x < l
           { 1,  x > l.

Show its solution can be written as

    v(x,t) = (1/2) erf((l + x)/√(4kt)) − (1/2) erf((l − x)/√(4kt)).
Exercise 2.74 Solve the heat problem

ut − kuxx = 0, −∞ < x < ∞, t > 0
u(x, 0) = e−2|x| , −∞ < x < ∞.

Exercise 2.75 Solve



 ut − kuxx = 0, 0 < x < ∞, t > 0
u(x, 0) = φ (x), x>0
u(0,t) = 0, t >0

(a)
    φ(x) = { 1 − x,  0 < x < 1
           { 0,      x ≥ 1;

(b)
    φ(x) = { x e^{−x²},  0 < x < 1
           { 0,          x ≥ 1.
Exercise 2.76 Provide all the details in obtaining (2.141).
Exercise 2.77 Consider the nonlinear heat equation

    ut − kuxx + b(ux)² = 0   for −∞ < x < ∞ and t > 0,

subject to an initial condition u(x, 0) = f(x). This form of PDE makes its presence
known in stochastic optimal control theory. In this question you will derive a representation
formula for the solution u(x,t).

(a) Define the Cole–Hopf transformation w(x,t) = e^{−(b/k) u(x,t)}. Show that w is a solution
of the linear heat equation wt − kwxx = 0.

(b) Use the fundamental solution of the heat equation to solve for w(x,t).
(c) Invert the Cole-Hopf transformation to find a formula for u.
Exercise 2.78 Solve the heat equation ut −uxx = 0 on the entire real line −∞ < x < ∞
with initial condition u(x, 0) = cos(x) without using the fundamental solution of the
heat equation. Use your answer to deduce the value of the integral

    ∫_{−∞}^∞ cos(x) e^{−x²/(4t)} dx.

Hint: Search for a solution of the form u(x,t) = h(t) cos(x).


Exercise 2.79 Solve

 ut − kuxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 0, x>0
u(0,t) = p(t), t >0

(a) 
1, 0<t <1
p(t) =
0, t ≥ 1.

(b)
p(t) = 1, t > 0.
Exercise 2.80 Provide all the details in obtaining (2.130).
Exercise 2.81 For positive constant c, consider the heat equation with convection
term
ut + cux − kuxx = 0, (2.147)
(a) Determine the values of α and β so that the transformation

u(x,t) = v(x,t)eαx+βt

transforms (2.147) to the heat equation vt − kvxx = 0.


(b) Find the solution of (2.147) on −∞ < x < ∞, with initial condition u(x, 0) = φ(x).
(c) Find the solution of (2.147) on x > 0, with initial and boundary conditions
u(x, 0) = φ(x), and u(0,t) = 0, t > 0.
Exercise 2.82 Let f : (−∞, ∞) → R be an even function ( f (z) = f (−z)), that is
differentiable. Use the definition of the derivative and show that f ′ (−x) + f ′ (x) = 0.
Exercise 2.83 Solve

 ut − kuxx = 0, 0 < x < ∞, t > 0
u(x, 0) = φ (x), x>0
ux (0,t) = 0, t >0


(a)
    φ(x) = { 1,  0 < x < 1
           { 0,  x ≥ 1;

(b) φ(x) = x e^{−ax};

(c)
    φ(x) = { 1 − x²,  0 < x < 1
           { 0,       x ≥ 1.
Exercise 2.84 Consider the heat equation over R along with its solution given by
(2.129). Show that if the initial function φ(x) is uniformly bounded on R, then the
solution u(x,t) satisfies

    |u(x,t)| ≤ max_{−∞<x<∞} |φ(x)|.

Hint: Make use of the substitution s = (x − y)/√(4kt).

2.8 Wave Equation on Semi-Infinite Domain

We take advantage of the development of solutions of the heat equation on the semi-
infinite domain with Dirichlet or Neumann conditions, and establish the solution of
the wave equation on a semi-infinite domain with a Dirichlet or a Neumann condition.
We begin by considering the wave equation of a semi-infinite string with a fixed end,
or Dirichlet condition:

    utt − c²uxx = 0,   0 < x < ∞, t > 0
    u(x, 0) = f(x),    x > 0
    ut(x, 0) = g(x),   x > 0                      (2.148)
    u(0,t) = 0,        t > 0.

Using the same concept as we did for the heat equation with Dirichlet condition,
we use the odd extensions fo (x) and go (x) of f (x) and g(x), respectively and solve
the wave equation on the whole real line with initial conditions u(x, 0) = fo (x) and
ut (x, 0) = go (x). In other words, we consider

utt − c2 uxx = 0,

−∞ < x < ∞, t > 0
(2.149)
u(x, 0) = fo (x), ut (x, 0) = go (x), x > 0.

D'Alembert's formula gives

    u(x,t) = (1/2)[fo(x + ct) + fo(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} go(s) ds.        (2.150)

By Exercise 2.89, u(x,t) given in (2.150) is odd and so u(0,t) = 0. Now we try to
make some sense out of the solution in (2.150).
Remember that x > 0. We do this in two cases.

(a) First, suppose that x > ct. Then x + ct ≥ 0 and x − ct ≥ 0, and so

fo (x + ct) = f (x + ct) and fo (x − ct) = f (x − ct).

In addition, on [x − ct, x + ct] we have go(s) = g(s). Thus, if x > ct, we have from
(2.150) that

    u(x,t) = (1/2)[f(x + ct) + f(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} g(s) ds.

(b) Second, suppose 0 < x < ct. Then x + ct ≥ 0 and x − ct < 0, and so

    fo(x + ct) = f(x + ct)   and   fo(x − ct) = −f(ct − x).

Moreover, on [x − ct, 0] we have go(s) = −g(−s), and on [0, x + ct] we have
go(s) = g(s). Therefore, if 0 < x < ct, we have from (2.150) that

    u(x,t) = (1/2)[f(x + ct) − f(ct − x)] + (1/(2c)) ∫_{x−ct}^0 (−g(−s)) ds + (1/(2c)) ∫_0^{x+ct} g(s) ds
           = (1/2)[f(x + ct) − f(ct − x)] − (1/(2c)) ∫_0^{ct−x} g(s) ds
             + (1/(2c)) ∫_0^{x+ct} g(s) ds        (s ↦ −s)
           = (1/2)[f(x + ct) − f(ct − x)] + (1/(2c)) ∫_{ct−x}^{ct+x} g(s) ds.

In summary, the solution of the half-line wave equation with Dirichlet boundary
condition is

    u(x,t) = { (1/2)[f(x + ct) + f(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} g(s) ds,   x ≥ ct
             { (1/2)[f(x + ct) − f(ct − x)] + (1/(2c)) ∫_{ct−x}^{ct+x} g(s) ds,   0 < x < ct.
                                                                              (2.151)
Fig. 2.20 shows the two regions of the existence of the solution.
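Formula (2.151) is straightforward to implement; the sketch below (ours, not the
author's; the sample data are our own choices, and SciPy is assumed available) evaluates
it piecewise and can be checked against Example 2.29 below:

    # Evaluate (2.151), the half-line d'Alembert solution with a fixed end.
    import numpy as np
    from scipy.integrate import quad

    def dalembert_dirichlet(x, t, f, g, c=1.0):
        if x >= c*t:
            return 0.5*(f(x + c*t) + f(x - c*t)) + quad(g, x - c*t, x + c*t)[0]/(2*c)
        return 0.5*(f(x + c*t) - f(c*t - x)) + quad(g, c*t - x, c*t + x)[0]/(2*c)

    # f = sin, g = 0 should reproduce u(x,t) = sin(x)cos(t) (Example 2.29).
    print(dalembert_dirichlet(0.3, 1.0, np.sin, lambda s: 0.0))
    print(np.sin(0.3) * np.cos(1.0))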
Now we turn our attention to establishing the solution of the wave equation on a semi-
infinite domain with a Neumann condition. We begin by considering the wave
equation of a semi-infinite string with a Neumann condition:

    utt − c²uxx = 0,   0 < x < ∞, t > 0
    u(x, 0) = f(x),    x > 0
    ut(x, 0) = g(x),   x > 0                      (2.152)
    ux(0,t) = 0,       t > 0.

[FIGURE 2.20 Regions of existence of the solution: x ≥ ct and 0 < x < ct, separated by the line x = ct.]

Using the same concept as we did for the wave equation with Dirichlet condition,
we use the even extensions fe(x) and ge(x) of f(x) and g(x), respectively, and solve
the wave equation on the whole real line with initial conditions u(x, 0) = fe(x) and
ut(x, 0) = ge(x). In other words, we consider

    utt − c²uxx = 0,                       −∞ < x < ∞, t > 0
    u(x, 0) = fe(x),  ut(x, 0) = ge(x),    −∞ < x < ∞.        (2.153)
D'Alembert's formula gives

    u(x,t) = (1/2)[fe(x + ct) + fe(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} ge(s) ds.        (2.154)

By Exercise 2.88, u(x,t) given by (2.154) is even in x. Since the derivative of an even
function is odd, ux will be odd in x, and hence ux(0,t) = 0. As before, we can
simplify (2.154). We do this in two cases, recalling that x > 0.
(a) First, suppose that x > ct. Then x + ct ≥ 0 and x − ct ≥ 0, and so

    fe(x + ct) = f(x + ct)   and   fe(x − ct) = f(x − ct).

Moreover, on [x − ct, x + ct] we have ge(s) = g(s). Thus, if x > ct, we have from
(2.154) that

    u(x,t) = (1/2)[f(x + ct) + f(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} g(s) ds.

(b) Second, suppose 0 < x < ct. Then x + ct ≥ 0 and x − ct < 0, and so

    fe(x + ct) = f(x + ct)   and   fe(x − ct) = f(ct − x).

Moreover, on [x − ct, 0] we have ge(s) = g(−s), and on [0, x + ct] we have ge(s) =
g(s). Therefore, if 0 < x < ct, we have from (2.154) that

    u(x,t) = (1/2)[f(x + ct) + f(ct − x)] + (1/(2c)) ∫_{x−ct}^0 g(−s) ds + (1/(2c)) ∫_0^{x+ct} g(s) ds
           = (1/2)[f(x + ct) + f(ct − x)] + (1/(2c)) ∫_0^{ct−x} g(s) ds
             + (1/(2c)) ∫_0^{x+ct} g(s) ds.        (s ↦ −s)

In summary, the solution of the half-line wave equation with Neumann boundary
condition is

    u(x,t) = { (1/2)[f(x + ct) + f(x − ct)] + (1/(2c)) ∫_{x−ct}^{x+ct} g(s) ds,                        x ≥ ct
             { (1/2)[f(x + ct) + f(ct − x)] + (1/(2c)) [ ∫_0^{ct−x} g(s) ds + ∫_0^{x+ct} g(s) ds ],   0 < x < ct.
                                                                              (2.155)
Example 2.29 Consider the wave equation (2.148) with c = 1 and with initial data
f (x) = sin(x), g(x) = 0, and the Dirichlet condition u(0,t) = 0. Then using (2.151),
we have for x > t, that
1
u(x,t) = [sin(x + t) + sin(x − t)] = sin(x) cos(t).
2
On the other hand, for 0 < x < t, we have
1
u(x,t) = [sin(x + t) − sin(t − x)] = sin(x) cos(t).
2
Thus,
u(x,t) = sin(x) cos(t), x > 0.

Example 2.30 Consider the wave equation (2.152) with c = 1, initial data
f(x) = sin(x), g(x) = 0, and the Neumann condition ux(0,t) = 0. Then, using (2.155),
one can easily verify that

    u(x,t) = sin(x) cos(t) for x ≥ t,   and   u(x,t) = sin(t) cos(x) for 0 < x < t.

2.8.1 Exercises
Exercise 2.85 Solve


 utt − c2 uxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 0, x>0


 ut (x, 0) = cos(x), x>0
u(0,t) = 0, t > 0.

Exercise 2.86 Solve




 utt − c2 uxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 0, x>0


 ut (x, 0) = sin(x), x>0
u(0,t) = 0, t > 0.


Exercise 2.87 Solve




 utt − c2 uxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 0, x>0


 ut (x, 0) = 1, x>0
u(0,t) = 0, t > 0.

Exercise 2.88 Let u(x,t) be d'Alembert's solution on −∞ < x < ∞ of the wave
equation given by (2.82). Show that if the initial data f(x) = u(x, 0) and g(x) = ut(x, 0)
are even, then for all t ≥ 0, u(x,t) is an even function of x.
Exercise 2.89 Let u(x,t) be d'Alembert's solution on −∞ < x < ∞ of the wave
equation given by (2.82). Show that if the initial data f(x) = u(x, 0) and g(x) = ut(x, 0)
are odd, then for all t ≥ 0, u(x,t) is an odd function of x.
Exercise 2.90 Solve

    utt − c²uxx = 0,    0 < x < ∞, t > 0
    u(x, 0) = 0,        x > 0
    ut(x, 0) = sin(x),  x > 0
    ux(0,t) = 0,        t > 0.

Exercise 2.91 Solve

    utt − c²uxx = 0,    0 < x < ∞, t > 0
    u(x, 0) = 0,        x > 0
    ut(x, 0) = cos(x),  x > 0
    ux(0,t) = 0,        t > 0.

Exercise 2.92 Solve




 utt − c2 uxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 1, x>0


 ut (x, 0) = 0, x>0
ux (0,t) = 0, t > 0.

Exercise 2.93 Solve




 utt − c2 uxx = 0, 0 < x < ∞, t > 0
u(x, 0) = 0, x>0


 ut (x, 0) = 1, x>0
ux (0,t) = 0, t > 0.

Exercise 2.94 Consider the wave problem with nonhomogeneous Dirichlet boundary
condition

    vtt − c²vxx = 0,   0 < x < ∞, t > 0
    v(x, 0) = f(x),    x > 0
    vt(x, 0) = g(x),   x > 0
    v(0,t) = p(t),     t > 0,

where the function p is twice differentiable. Make the change of variables

    u(x,t) = v(x,t) − p(t)

and reduce the wave problem to a problem with homogeneous boundary condition;
then use the d'Alembert solution given by (2.85) to find the solution u, and then
find the solution v of the original wave problem.
3
Matrices and Systems of Linear Equations

In this chapter, we look at systems of equations, matrix algebra, and applications
of linear algebra to solving linear systems of differential equations. In addition, we
will study quadratic forms and their applications, and functions of symmetric matri-
ces.

3.1 Systems of Equations and Gaussian Elimination


We are interested in solving a nonhomogeneous system of m linear equations with n
unknown variables x1 , x2 , . . . , xn of the form

a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = b1


a21 x1 + a22 x2 + a23 x3 + . . . + a2n xn = b2
a31 x1 + a32 x2 + a33 x3 + . . . + a3n xn = b3 (3.1)
..
.
am1 x1 + am2 x2 + am3 x3 + . . . + amn xn = bm

where the ai j , bi are constants for 1 ≤ i ≤ m, 1 ≤ j ≤ n. By a solution of the system


(3.1), we mean an n-tuple of real numbers of the form (p1 , p2 , . . . , pn ) that when
plugged into (3.1) produces a true statement. For example, (3, −1, 0, 2) is a solution
of the system

x1 + x2 − x3 + 4x4 = 10
−x1 − x2 + 2x3 + x4 = 0
10x1 + 3x2 + x4 = 29

The set of all solutions is called the solution set. Under the assumption that the system
(3.1) has a solution, we use the Gaussian elimination method to find the solution set.
We now describe the method in a few steps.




Step 1. In this step, we try to eliminate x1 from the second, third, . . . , mth equations.
Suppose that a11 ≠ 0. If not, renumber the equations or variables so that
this is the case. We may achieve the elimination of x1 by multiplying the
first equation by a21/a11 and then subtracting the resulting equation from the
second equation, and by multiplying the first equation by a31/a11 and then
subtracting the resulting equation from the third equation, and so forth. This
results in the new system of equations

    a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = b1
             l22 x2 + l23 x3 + . . . + l2n xn = b′2        (3.2)
             ...
             lm2 x2 + lm3 x3 + . . . + lmn xn = b′m.

Since any solution of (3.1) is a solution of (3.2) and conversely, because the steps
are reversible, we may obtain (3.1) from (3.2).
Step 2. In this step we try to eliminate x2 from the third, . . . , mth equations in (3.2).
Suppose that l22 ≠ 0. (Otherwise, renumber the equations or variables so
that this is so.) We do this by multiplying the second equation by l32/l22 and
then subtracting the resulting equation from the third equation. Similarly,
we multiply the second equation by l42/l22 and then subtract the resulting
equation from the fourth equation, and so forth. The further steps are now
obvious. For example, in the third step, we eliminate x3, and in the fourth
step, we eliminate x4, etc. The process will only stop when no equations are
left or when the coefficients of all the unknowns in the remaining equations
are all zero. This leads to the system of equations

    a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = b1
             l22 x2 + l23 x3 + . . . + l2n xn = b′2        (3.3)
             ...
                      drr xr + . . . + drn xn = b″r,

where either r = m or r < m. If r < m, the remaining equations have the
form

    0 = b″r+1,  0 = b″r+2,  . . . ,  0 = b″m,

and the system has no solution unless

    b″r+1 = 0, . . . , b″m = 0.

If the system has a solution, we may obtain it by assigning arbitrary values to
the unknowns xr+1, . . . , xn, solving the last equation in (3.3) for xr, the next
to last for xr−1, and so on up the line. When m = n = r, the system
(3.3) has triangular form and there is one, and only one, solution. We will
illustrate the method in a series of examples.
Example 3.1 Consider the system

x1 + 2x2 − 2x3 + 4x4 = 11


2x1 + x2 − x3 + 3x4 = 9
x1 − x2 + x3 − x4 = −2.

In the first step, we eliminate x1 from the last two equations. This is done by mul-
tiplying the first equation by 2 and then subtracting the resulting equation from the
second equation. Similarly, we subtract the first equation from the third equation.
This leads us to the new system of equations

x1 + 2x2 − 2x3 + 4x4 = 11


−3x2 + 3x3 − 5x4 = −13
−3x2 + 3x3 − 5x4 = −13.

Note that the last two equations are identical. In the second step we eliminate x2 from
the third equation by subtracting the third equation from the second equation. This
results into the new system of equations

x1 + 2x2 − 2x3 + 4x4 = 11


−3x2 + 3x3 − 5x4 = −13
0x2 + 0x3 + 0x4 = 0.

The third equation is satisfied for any values of x2, x3, and x4. Thus the third equation
puts no constraint on the solution. However, the first and second equations represent
four unknowns with two constraints, and hence there are two arbitrary unknowns
(4 − 2 = 2). We may choose x3 and x4 arbitrarily. Thus, we let x3 = s and x4 = t,
where s and t are arbitrary. Then the second equation gives

    x2 = x3 − (5/3)x4 + 13/3 = s − (5/3)t + 13/3.

Using the first equation, we solve for x1 and obtain

    x1 = −2x2 + 2x3 − 4x4 + 11 = −(2/3)t + 7/3.

Example 3.2 Consider the system

x1 + x2 − 3x3 = 4
2x1 + x2 − x3 = 2
3x1 + 2x2 − 4x3 = 7.

We start by using the first equation. We perform two steps simultaneously. Multiply
the first equation by −2 and add it to the second equation. Then multiply the first
equation by −3 and add it to the third equation to obtain

x1 + x2 − 3x3 = 4
−x2 + 5x3 = −6
−x2 + 5x3 = −5.

Next, adding −1 times the second equation to the third equation produces

x1 + x2 − 3x3 = 4
−x2 + 5x3 = −6
0x1 + 0x2 + 0x3 = 1.

The third equation cannot be satisfied for any values of x1, x2, and x3, and hence the
process stops and the system has no solution. □
Example 3.3 Consider the system

2x1 − 2x2 = −6
x1 − x2 + x3 = 1
3x2 − 2x3 = −5.

We start by multiplying the first equation by 1/2

x1 − x2 = −3
x1 − x2 + x3 = 1
3x2 − 2x3 = −5.

Now, adding −1 times the first equation to the second equation yields

x1 − x2 = −3
x3 = 4
3x2 − 2x3 = −5.

From the second equation we immediately have x3 = 4. Substituting this value into
the third equation gives 3x2 − 2(4) = −5, and hence x2 = 1. Similarly, substituting
into the first equation gives x1 = −2. Thus the system has the solution (x1 , x2 , x3 ) =
(−2, 1, 4). □
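As a cross-check (ours, not part of the text), the same system can be handed to a
linear-algebra library; NumPy's solver performs essentially the elimination just
described:

    # Solve the system of Example 3.3 numerically.
    import numpy as np

    A = np.array([[2.0, -2.0,  0.0],
                  [1.0, -1.0,  1.0],
                  [0.0,  3.0, -2.0]])
    b = np.array([-6.0, 1.0, -5.0])
    print(np.linalg.solve(A, b))  # [-2.  1.  4.]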
Example 3.4 Consider the system

x1 + x2 − 3x3 = 4
2x1 + x2 − x3 = 2
3x1 + 2x2 − 4x3 = 6.

We will perform two steps at once. First, multiply the first equation by −2 and add
it to the second equation; second, multiply the first equation by −3 and add it to the
third equation. This yields the following equivalent system

x1 + x2 − 3x3 = 4
−x2 + 5x3 = −6
−x2 + 5x3 = −6.

Now, adding −1 times the second equation to the third equation yields

x1 + x2 − 3x3 = 4
−x2 + 5x3 = −6
0x1 + 0x2 + 0x3 = 0.

The third equation is satisfied for any values of x1, x2, and x3. Thus the third equation
puts no constraint on the solution. However, the first and second equations represent
three unknowns with two constraints, and hence there is one arbitrary unknown
(3 − 2 = 1). To simplify the calculation, it is more convenient to let x3 = s, where s is
arbitrary. Then the second equation gives x2 = 5s + 6. Similarly, the first equation yields
x1 = −2 − 2s. Since s is arbitrary, the system has infinitely many solutions. □

3.2 Homogeneous Systems


We begin by considering the linear homogeneous system of equations

    a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = 0
    a21 x1 + a22 x2 + a23 x3 + . . . + a2n xn = 0
    ...                                                (3.4)
    am1 x1 + am2 x2 + am3 x3 + . . . + amn xn = 0,

where the ai j, 1 ≤ i ≤ m, 1 ≤ j ≤ n, are constants. Note that the above homogeneous
system (3.4) always has the trivial solution x1 = x2 = · · · = xn = 0. Any
other solution is nontrivial. In the case m < n, that is, when there are more unknowns
than equations, the homogeneous system (3.4) has a nontrivial solution,
as seen in the next theorem.
Theorem 3.1 In the homogeneous system (3.4), if m < n, then the system has a non-
trivial solution.

Proof The proof is based on the Gaussian elimination method. We may assume the
coefficient a11 of x1 is not zero. This is a fair assumption, since if all the coefficients
of x1 are zero, that is, a11 = a21 = . . . = am1 = 0, then x1 = 1, x2 = x3 = . . . = xn = 0
is a nontrivial solution. Thus, we may assume a11 ≠ 0. Divide the first equation in
(3.4) by a11 to obtain the equation

    x1 + b12 x2 + b13 x3 + . . . + b1n xn = 0.

Multiply this equation successively by a21, a31, . . . , am1, and subtract the respective
resultant equations from the second, third, . . . , mth equations of (3.4), to reduce (3.4)
to the form

    x1 + b12 x2 + b13 x3 + . . . + b1n xn = 0
         b22 x2 + b23 x3 + . . . + b2n xn = 0
         ...
         bm2 x2 + bm3 x3 + . . . + bmn xn = 0.
Now we repeat the same process, but now we assume the coefficient b22 of x2 is
not zero. Applying the Gaussian procedure again produces the third system

    x1          + c13 x3 + . . . + c1n xn = 0
          x2    + c23 x3 + . . . + c2n xn = 0
                  c33 x3 + . . . + c3n xn = 0
                  ...
                  cm3 x3 + . . . + cmn xn = 0.

By continuing this process, in particular at the rth stage, and by using the fact that
the number of equations is less than the number of variables, we ultimately arrive at
a system of m equations of the form

    x1 + d1r xr + . . . + d1n xn = 0
    x2 + d2r xr + . . . + d2n xn = 0
    ...
    xr−1 + dr−1,r xr + . . . + dr−1,n xn = 0
    0 = 0.

If we let xr = 1, xr+1 = · · · = xn = 0, and x1 = −d1r, x2 = −d2r, . . . , xr−1 = −dr−1,r,
we obtain a nontrivial solution. The proof is done since the systems are equiva-
lent.
Remark 8 In fact, the homogeneous system (3.4) has infinitely many solutions since
the choice of xr is arbitrary.

3.2.1 Exercises
Exercise 3.1 Solve the given system

x1 − 2x2 − x3 + 3x4 = 1
2x1 − 4x2 + x3 = 5
x1 − 2x2 + 2x3 − 3x4 = 4.

Exercise 3.2 Solve the given system


3x1 + 7x2 − x3 = −1
x1 + 3x2 + x3 = 1
−x1 − 2x2 + x3 = 1.
Exercise 3.3 Determine all solutions of the system
x1 − 3x2 + 4x3 = 1
−2x1 + x2 + 2x4 = −1
−x1 − 2x2 + 4x3 + 2x4 = 0
4x1 + 3x2 − 8x3 − 6x4 = 1.
Exercise 3.4 Solve the given system
−2x1 + x2 + x3 = −3
x1 + 3x2 − x3 = 5
3x1 + 2x2 − 2x3 = 8.
Exercise 3.5 Solve the given system
−2x1 + x2 + x3 = 0
x1 + 3x2 − x3 = 0
3x1 + 2x2 − 2x3 = 0.
Exercise 3.6 Find necessary and sufficient conditions on a, b, and c so that the system
has a solution
x1 + 7x2 + 4x3 = a
2x1 + 8x2 + 5x3 = b
3x1 + 9x2 + 6x3 = c.
Exercise 3.7 Find q so that the following system has a nontrivial solution
qx1 + x2 − (4 − q)x3 = 0
2qx1 + (2 − q)x2 − x3 = 0
3x1 + (1 + q)x2 − (5 − q)x3 = 0.
Exercise 3.8 Solve
x1 + x2 + x3 = 6
x1 + 2x2 − 3x3 = −4
−x1 − 4x2 + 9x3 = 18.
Exercise 3.9 Solve
x1 + 2x2 − 3x3 = −1
3x1 − x2 + 2x3 = 7
5x1 + 3x2 − 4x3 = 2.

3.3 Matrices
In this section we look at matrix algebra and related issues. We begin with the defi-
nition of a matrix. A matrix A is a rectangular array of real or complex numbers of
the form

    A = [ a11  a12  · · ·  a1n ]
        [ a21  a22  · · ·  a2n ]
        [ ...                  ]        (3.5)
        [ am1  am2  · · ·  amn ].
The element in the ith row and jth column of the matrix A is denoted by ai j . Some-
times we use the more compact notation
A = (ai j ), i = 1, 2, . . . , m, j = 1, 2, . . . n.
The matrix A has m rows and n columns and we say it is an m × n matrix and we may
write it as Am×n . When m = n, then A is said to be a square matrix. Two matrices are
said to be equal when and only when they have the same size (that is same numbers
of rows and columns) and have the same entry in each position. In other words, if
B p×q is another matrix with B = (bi j ), i = 1, 2, . . . , p, j = 1, 2, . . . q, then A = B if
and only if m = p and n = q, and ai j = bi j for all i and j. As for addition, if A and B
are two matrices with the same size, then

    A + B = (ai j + bi j)m×n.
The product of two matrices Am×n and Bn×p is another matrix Cm×p, where the matrix

    C = ( ∑_{j=1}^n ai j b jk )m×p.

To be more explicit, the product of the two matrices A and B has the general for-
mula

    AB = (ai j)m×n (bi j)n×p

       = [ a11  a12  · · ·  a1n ] [ b11  b12  · · ·  b1p ]
         [ a21  a22  · · ·  a2n ] [ b21  b22  · · ·  b2p ]
         [ ...                  ] [ ...                  ]
         [ am1  am2  · · ·  amn ] [ bn1  bn2  · · ·  bnp ]

       = [ ∑_{k=1}^n a1k bk1   · · ·   ∑_{k=1}^n a1k bkp ]
         [ ...                 ...     ...               ]
         [ ∑_{k=1}^n amk bk1   · · ·   ∑_{k=1}^n amk bkp ].

Notice that the resulting matrix from the product AB has the same number of rows
as A and the same number of columns as B. Thus, even if AB is defined, BA may not
be defined. Moreover, the multiplication of two matrices is, in general, not commutative.
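A tiny illustration of non-commutativity (ours; the matrices are arbitrary choices):

    # AB and BA are generally different, even when both products are defined.
    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])
    print(A @ B)  # [[2 1] [4 3]]
    print(B @ A)  # [[3 4] [1 2]]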
Associative law If A is an m × n matrix, B is an n × p matrix, and C is a p × q matrix,
then
(AB)C = A(BC). (3.6)
Moreover, if α is a constant, then clearly
αA = (αai j )m×n .

As for an application, the system (3.1) in matrix notation may be written as

    [ a11  a12  · · ·  a1n ] [ x1 ]   [ b1 ]
    [ a21  a22  · · ·  a2n ] [ x2 ] = [ b2 ]
    [ ...                  ] [ .. ]   [ .. ]
    [ am1  am2  · · ·  amn ] [ xn ]   [ bm ]

or Ax = b, where x = (x1, x2, . . . , xn)^T, b = (b1, b2, . . . , bm)^T, and A is given by (3.5).
Definition 3.1 The transpose of the m × n matrix (3.5) is the n × m matrix A^T defined
by

    A^T = [ a11  a21  · · ·  am1 ]
          [ a12  a22  · · ·  am2 ]
          [ ...                  ]
          [ a1n  a2n  · · ·  amn ].

Let the matrix An×n be a square matrix, that is,

    A = [ a11  a12  · · ·  a1n ]
        [ a21  a22  · · ·  a2n ]
        [ ...                  ]        (3.7)
        [ an1  an2  · · ·  ann ].

Then the diagonal containing the entries

    a11, a22, . . . , ann

is called the principal diagonal. The square identity matrix, denoted by I, has 1s for
its principal diagonal entries and 0s elsewhere. In other words,

    I = [ 1  0  · · ·  0 ]
        [ 0  1  · · ·  0 ]
        [ ...            ]
        [ 0  0  · · ·  1 ].

Thus, if A is given by (3.7), then

AI = IA = A.

Also, it is readily verified that if X is an n × 1 matrix, then

IX = X.

The Kronecker delta is defined as follows:

    δi j = { 1,  i = j
           { 0,  i ≠ j.

Using the Kronecker delta, we may write the identity matrix as

    I = (δi j)n×n.

Throughout this chapter, we denote the identity n × n matrix by I.


Definition 3.2 a) A real square matrix A = (ai j ) is said to be symmetric if it is equal
to its transpose, that is,
A = AT .

b) A real square matrix A = (ai j ) is said to be skew-symmetric if

AT = −A.

As a consequence of the above definition, we know that any square matrix A may
be written as the sum of a symmetric matrix R and a skew-symmetric matrix S,
where

    R = (1/2)(A + A^T)   and   S = (1/2)(A − A^T).        (3.8)
Definition 3.3 Let A be an n × n matrix.
a) If all the elements above the principal diagonal (or all the elements below the
principal diagonal) are zero, then the matrix A is called a triangular matrix.
b) If all the elements above and below the principal diagonal are zero, then the
matrix A is called a diagonal matrix.
c) A matrix whose entries are all zero is called a zero matrix or null matrix.

3.3.1 Exercises
Exercise 3.10 Prove (3.6).
Exercise 3.11 a) Prove Associative law for matrix addition:

(A + B) +C = A + (B +C).

b) Prove the distributive law for matrix multiplication:

(B +C)A = BA +CA.

Exercise 3.12 Suppose A is an m × n matrix such that the equations Ax = b and Ax = c,
where b and c are m × 1 matrices and x is an n × 1 matrix, each have a solution. Show
that Ax = b + c has a solution.
Exercise 3.13 Suppose A and B are two matrices such that AB and BA are defined
and AB = BA. Then show that A and B are square matrices with the same numbers
of rows and columns.
Exercise 3.14 Let A and B be 2 × 2 matrices such that each of them commutes with
the matrix

    [ 0   1 ]
    [ −1  0 ].

Show that
−1 0

AB = BA.

Exercise 3.15 Give an example of two matrices A and B such that AB = 0, but nei-
ther A = 0 nor B = 0.
Exercise 3.16 Show that (AB)T = BT AT .
Exercise 3.17 Let A be a square matrix given by A = (ai j), i, j = 1, 2, . . . , n. Suppose
A is a skew-symmetric matrix. Show that all the entries on its principal diagonal are
zero.
Exercise 3.18 Give an example of a 3 × 3 matrix that is skew-symmetric.
 
Exercise 3.19 Write the matrix

    A = [ 2   3 ]
        [ 5  −1 ]

as the sum of R and S as given in (3.8).
Exercise 3.20 Show that the transpose of a triangular matrix is triangular.
 
Exercise 3.21 Write

    A = [ 1  2  6 ]
        [ 3  4  7 ]
        [ 5  8  9 ]

as a sum of two triangular matrices. Is this sum unique?

3.4 Determinants and Inverse of Matrices


Determinants of square matrices naturally arise when solving linear equations. For
example, consider the 2 equations with 2 unknowns

ax + by = k1

cx + dy = k2 .

Using the Gaussian elimination process, we find the solution to be

    x = (k1 d − k2 b)/(ad − bc),    y = (k2 a − k1 c)/(ad − bc),

provided ad − bc ≠ 0. In matrix form, the above system is written AX = k, where

    A = [ a  b ],   X = [ x ],   and   k = [ k1 ]
        [ c  d ]        [ y ]              [ k2 ].

Define the determinant of the square matrix A2×2 by

    det(A) = |A| = | a  b | = ad − bc.
                   | c  d |

Then we may write the solution of the above system using determinant notation:

    x = (k1 d − k2 b)/|A|,   and   y = (k2 a − k1 c)/|A|.
The determinant of order n of the square matrix An×n, denoted by |A|, is symbolically
defined by

    det(A) = |A| = | a11  a12  · · ·  a1n |
                   | a21  a22  · · ·  a2n |
                   | ...                  |        (3.9)
                   | an1  an2  · · ·  ann |,

and we now explain how to compute its value. Let Mik denote the resulting de-
terminant after the ith row and kth column of |A| are deleted. We are left with
an (n − 1)th order determinant. The cofactor of aik is denoted by Cik and defined
by

    Cik = (−1)^{i+k} Mik.        (3.10)

Example 3.5 Consider the third-order determinant

    |D| = | a11  a12  a13 |
          | a21  a22  a23 |
          | a31  a32  a33 |.

Then we have

    C11 = M11 = | a22  a23 |,      C12 = −M12 = −| a21  a23 |,
                | a32  a33 |                     | a31  a33 |

    C32 = −M32 = −| a11  a13 |,    etc.
                  | a21  a23 |


We are ready to state the following definition.


Definition 3.4 The determinant of the n × n matrix A in (3.9) is the sum of the prod-
ucts of the elements of any row or column and their respective cofactors. That is

|A| = ai1Ci1 + ai2Ci2 + . . . + ainCin , i = 1, 2, . . . , n,

or
|A| = a1kC1k + a2kC2k + . . . + ankCnk , k = 1, 2, . . . , n.

If all the entries of the matrix A are real constants, then the value of the determinant
is a real constant.
Remark 9 The determinant of an n × n matrix is the same regardless of which row
or column is chosen.
Example 3.6 Find |A| for

    A = | 1  2  −1 |
        | 3  6   0 |
        | 0  4   2 |.

We make use of the first row:

    |A| = 1 | 6  0 | − 2 | 3  0 | − 1 | 3  6 | = 1(12) − 2(6) − 1(12) = −12.
            | 4  2 |     | 0  2 |     | 0  4 |
4 2 0 2 0 4


Below we state a theorem that contains certain facts concerning determinants. We
leave the proofs to you.
Theorem 3.2 1) If all elements of one row or one column of an n × n matrix are
multiplied by a constant k, then the determinant is k times the determinant of the
original matrix.
2) If all entries of a row or a column of an n × n matrix are zero, then the determi-
nant is zero.
3) If any two rows or columns of an n × n matrix are interchanged, then the deter-
minant is −1 times the original determinant.
4) If any two rows (or two columns) of an n × n matrix are constant multiples of
each other, then the determinant is zero.
5) If the entries of any row (or column) of an n × n matrix are altered by adding to
them any constant multiple of the corresponding elements in any other row (or
column) then the determinant does not change.
Theorem 3.3 Let A and B be two n × n matrices. Then,

det(AB) = det(A) det(B).



Theorem 3.4 Let A be an n × n matrix. Then

det(A) = det(AT ).

Now we transition to the concept on the inverse of a matrix. We have the following
definition.
Definition 3.5 Let A be an n × n matrix. If there exists an n × n matrix B such that

AB = BA = I,

then A is said to be invertible and B is said to be the inverse of A. We denote the


inverse matrix of A by A−1 . Matrices that do not have inverses are said to be nonin-
vertible or singular.
Theorem 3.5 A square matrix A can not have more than one inverse.

Proof Suppose the matrix A has two inverses B and C. That is, AB = BA = I, and
AC = CA = I. Then,

B = BI = B(AC) = (BA)C = IC = C.

In the next example we show how to find the inverse of a 2 × 2 matrix by solving
systems of equations.

Example 2 Find the inverse of

    A = [ 1  2 ]
        [ 3  4 ].

Suppose the matrix

    B = [ a  b ]
        [ c  d ]

is the inverse matrix of A. Then it must satisfy AB = BA = I. In other words:

    [ 1  2 ] [ a  b ] = [ 1  0 ]
    [ 3  4 ] [ c  d ]   [ 0  1 ].

We obtain the following system of linear equations:

    a + 2c = 1,   b + 2d = 0,   3a + 4c = 0,   and   3b + 4d = 1.

Solving for a in the third equation and substituting into the first equation gives a = −2
and c = 3/2. Similarly, solving for b in the second equation and substituting it into
the fourth equation gives b = 1 and d = −1/2. Hence

    B = [ −2     1   ]
        [ 3/2  −1/2 ].

We can easily check that AB = BA = I. We conclude the matrix B is the inverse matrix of A.

Theorem 3.6 Let A be an n × n matrix. Then A is invertible if and only if

det(A) ̸= 0.

Proof Suppose A is invertible. Then AA^{−1} = I, and it follows that det(AA^{−1}) =
det(I) = 1. By Theorem 3.3, we see that

    det(A) det(A^{−1}) = 1,

from which it follows that det(A) ≠ 0. The second part of the proof is left as an
exercise. This completes the proof.

Now we are ready to give a formula for the inverse of a square matrix, but first we
make the following definition.
Definition 3.6 Let A be an n × n matrix. The adjoint of A, denoted Adj A, is

Adj A = (Cik )T ,

where Cik is given by (3.10).


The next definition provides an alternative way of finding the inverse of a matrix in
terms of its adjoint.
Definition 3.7 Let A be an n × n matrix with det(A) ̸= 0. Then

    A^{−1} = (1/det(A)) Adj A.        (3.11)
 
Let

    A = [ a  b ]
        [ c  d ].

Then from (3.11) we have that

    A^{−1} = (1/det(A)) [ d   −b ]
                        [ −c   a ].        (3.12)

In the case of an n × n diagonal matrix

    A = [ a11  0    · · ·  0   ]
        [ 0    a22  · · ·  0   ]
        [ ...                  ]
        [ 0    0    · · ·  ann ],

we have that

    A^{−1} = [ 1/a11  0      · · ·  0     ]
             [ 0      1/a22  · · ·  0     ]
             [ ...                        ]
             [ 0      0      · · ·  1/ann ].

Let A be an n × n invertible matrix. Then

    AA^{−1} = I.

Replacing A by any invertible matrix C gives

    CC^{−1} = I.

If we now take for C the inverse A^{−1}, this becomes

    A^{−1}(A^{−1})^{−1} = I.

By multiplying both sides from the left by A, we obtain

    (A^{−1})^{−1} = A.

This shows that the inverse of the inverse of an invertible matrix A is the matrix
A.
Theorem 3.7 Let A and B be two n × n invertible matrices. Then

(AB)−1 = B−1 A−1 .

Proof We begin with AA−1 = I. Next we replace A by AB and obtain

AB(AB)−1 = I.

Multiplying both sides of the preceding expression from the left by A^{−1}, we arrive
at

    B(AB)^{−1} = A^{−1}.

By multiplying both sides from the left by B^{−1}, the result follows. This completes the
proof.

Of course, Theorem 3.7 can easily be generalized to products of more than two
matrices. Hence, by induction, one has

    (ABC . . . PQ)^{−1} = Q^{−1} P^{−1} . . . B^{−1} A^{−1}.        (3.13)

Example 3.7 Consider the third-order matrix

    A = [ 1   2  0 ]
        [ −1  1  1 ]
        [ 1   2  3 ].

Then det(A) = 9. We are interested in finding A^{−1}. We begin by computing the
cofactors using (3.10). We have

    C11 = M11 = | 1  1 | = 1,        C12 = −M12 = −| −1  1 | = 4,       C13 = M13 = | −1  1 | = −3,
                | 2  3 |                           |  1  3 |                        |  1  2 |

    C21 = −M21 = −| 2  0 | = −6,     C22 = M22 = | 1  0 | = 3,          C23 = −M23 = −| 1  2 | = 0,
                  | 2  3 |                       | 1  3 |                             | 1  2 |

    C31 = M31 = | 2  0 | = 2,        C32 = −M32 = −| 1   0 | = −1,      C33 = M33 = | 1   2 | = 3.
                | 1  1 |                           | −1  1 |                        | −1  1 |

Thus,

    C = (Ci j) = [ 1    4  −3 ]
                 [ −6   3   0 ]
                 [ 2   −1   3 ].

Moreover,

    Adj A = C^T = [ 1   −6   2 ]
                  [ 4    3  −1 ]
                  [ −3   0   3 ].

Finally, using (3.11) we obtain

    A^{−1} = (1/9) [ 1   −6   2 ]   [ 1/9   −2/3   2/9 ]
                   [ 4    3  −1 ] = [ 4/9    1/3  −1/3 ]
                   [ −3   0   3 ]   [ −1/3   0     1/3 ].
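The adjoint computation above can be verified numerically (a sketch of ours, assuming
NumPy):

    # Check the inverse found in Example 3.7.
    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [-1.0, 1.0, 1.0],
                  [1.0, 2.0, 3.0]])
    Ainv = np.linalg.inv(A)
    print(Ainv)       # matches (1/9) Adj A computed above
    print(A @ Ainv)   # the identity matrix, up to rounding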


As an application, we consider the nonhomogeneous system of n equations in n un-
knowns given by

    a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + a23 x3 + . . . + a2n xn = b2
    a31 x1 + a32 x2 + a33 x3 + . . . + a3n xn = b3        (3.14)
    ...
    an1 x1 + an2 x2 + an3 x3 + . . . + ann xn = bn,

where the ai j, bi are constants for 1 ≤ i ≤ n, 1 ≤ j ≤ n. In matrix form the system
may be written as

    AX = b,        (3.15)

where

    A = [ a11  a12  · · ·  a1n ]      X = [ x1 ]      b = [ b1 ]
        [ a21  a22  · · ·  a2n ]          [ x2 ]          [ b2 ]
        [ ...                  ],         [ .. ],         [ .. ]
        [ an1  an2  · · ·  ann ]          [ xn ]          [ bn ].

If det(A) ≠ 0, then by multiplying (3.15) from the left by A^{−1} we get

    X = A^{−1} b.        (3.16)

Clearly, (3.16) is a solution of (3.15). To see this, substitute X into (3.15) and
get

    A(A^{−1} b) = (AA^{−1}) b = I b = b.

As for uniqueness, suppose there is another solution Y such that

    AY = b.

Then, multiplying from the left by A^{−1}, we arrive at

    X = A^{−1} b,    Y = A^{−1} b,

from which we conclude that X = Y. Now that we have established that the system
has a unique solution, we give an explicit formula for this solution. Using (3.11)
along with (3.16) and Definition 3.6, we have

    X = (1/det(A)) (Adj A) b.

Or,

    [ x1 ]                [ C11  C21  · · ·  Cn1 ] [ b1 ]
    [ x2 ] = (1/det(A))   [ C12  C22  · · ·  Cn2 ] [ b2 ]
    [ .. ]                [ ...                  ] [ .. ]
    [ xn ]                [ C1n  C2n  · · ·  Cnn ] [ bn ]

                          [ b1 C11 + b2 C21 + · · · + bn Cn1 ]
           = (1/det(A))   [ b1 C12 + b2 C22 + · · · + bn Cn2 ]
                          [ ...                              ]
                          [ b1 C1n + b2 C2n + · · · + bn Cnn ].

It follows from the above calculation that the components of the solution are given
by

    xi = (b1 C1i + b2 C2i + · · · + bn Cni) / det(A),    i = 1, 2, . . . , n.        (3.17)
We summarize the results in the following theorem.
Theorem 3.8 Consider the nonhomogeneous system (3.14) of n linear equations
with n unknowns. If its coefficient matrix A has det(A) ̸= 0, then the system has a
unique solution x1 , x2 , . . . , xn given by (3.17).
As a direct consequence of (3.17) we have the following corollary.
Corollary 3 A homogeneous system of n linear equations with n unknowns and a
coefficients matrix A with det(A) ̸= 0 has just the trivial solution.

Proof Formula (3.17) is valid since det(A) ≠ 0. The result follows since each bi =
0, i = 1, 2, . . . , n.

Example 3.8 Solve the system

x1 + 2x2 = 5
−x1 + x2 + x3 = 4
x1 + 2x2 + 3x3 = 14.

In matrix notation, we have

    A = [ 1   2  0 ]      X = [ x1 ]      b = [ 5  ]
        [ −1  1  1 ],         [ x2 ],         [ 4  ]
        [ 1   2  3 ]          [ x3 ]          [ 14 ].

Notice that the matrix A is the same as the one in Example 3.7. Thus, the Ci j are
readily available, and using (3.17) we obtain

    x1 = (b1 C11 + b2 C21 + b3 C31)/det(A) = (5(1) + 4(−6) + 14(2))/9 = 1,

    x2 = (b1 C12 + b2 C22 + b3 C32)/det(A) = (5(4) + 4(3) + 14(−1))/9 = 2,

and

    x3 = (b1 C13 + b2 C23 + b3 C33)/det(A) = (5(−3) + 4(0) + 14(3))/9 = 3.
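Again as a cross-check (ours), a direct numerical solve reproduces the components
obtained from (3.17):

    # Solve the system of Example 3.8 directly.
    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [-1.0, 1.0, 1.0],
                  [1.0, 2.0, 3.0]])
    b = np.array([5.0, 4.0, 14.0])
    print(np.linalg.solve(A, b))  # [1. 2. 3.]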

We have the following theorem regarding the homogeneous system (3.4).

Theorem 3.9 If

    X^(1) = (x1, x2, . . . , xn)^T,    X^(2) = (y1, y2, . . . , yn)^T

are solutions of the homogeneous system (3.4), then

    X = c1 X^(1) + c2 X^(2),

where c1, c2 are constants, is also a solution of the homogeneous system (3.4).

Proof The homogeneous system can be written as AX = 0. By assumption, AX^(1) =
AX^(2) = 0. Hence,

    AX = A(c1 X^(1) + c2 X^(2)) = c1 AX^(1) + c2 AX^(2) = 0.

We note that the theorem does not hold for nonhomogeneous systems.

Definition 3.8 Any matrix obtained by omitting some rows or columns from a given
m × n matrix A is said to be a submatrix of A. We note that the submatrices of A
include the matrix A itself.
Example 3.9 The matrix

    A = [ a11  a12  a13 ]
        [ a21  a22  a23 ]

contains the following submatrices: A itself; the three 2 × 2 submatrices

    [ a11  a12 ]    [ a11  a13 ]    [ a12  a13 ]
    [ a21  a22 ],   [ a21  a23 ],   [ a22  a23 ];

the two 1 × 3 submatrices

    ( a11  a12  a13 ),    ( a21  a22  a23 );

the three 2 × 1 submatrices

    [ a11 ]    [ a12 ]    [ a13 ]
    [ a21 ],   [ a22 ],   [ a23 ];

the six 1 × 2 submatrices

    ( a11  a12 ),  ( a11  a13 ),  ( a12  a13 ),
    ( a21  a22 ),  ( a21  a23 ),  ( a22  a23 );

and the six 1 × 1 submatrices

    (a11), (a12), (a13), (a21), (a22), (a23).


Definition 3.9 The rank of a matrix A is the order of the largest square submatrix
with a nonzero determinant.
To expand on the above definition, a matrix A is said to be of rank r if it contains at
least one r-rowed square submatrix with nonvanishing determinant, while the deter-
minant of any square submatrix having r + 1 or more rows, possibly contained in A,
is zero.
Example 3.10 Let

    A = [ −3   3   0 ]
        [ 1   −2  −1 ]
        [ 2    2   4 ].

Then det(A) = 0. Now the 2 × 2 submatrix

    [ 1  −2 ]
    [ 2   2 ]

has nonzero determinant, and we conclude the rank of A is 2. □

Example 3.11 Let

    A = [ 6  3  4  7 ]
        [ 4  2  1  3 ]
        [ 2  1  0  1 ].

Then all of its 3 × 3 submatrices

    [ 6  3  4 ]   [ 3  4  7 ]   [ 6  4  7 ]   [ 6  3  7 ]
    [ 4  2  1 ],  [ 2  1  3 ],  [ 4  1  3 ],  [ 4  2  3 ]
    [ 2  1  0 ]   [ 1  0  1 ]   [ 2  0  1 ]   [ 2  1  1 ]

have determinant zero. However, the 2 × 2 submatrix

    [ 6  4 ]
    [ 4  1 ]

has nonzero determinant, and the matrix A has rank 2. □
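Numerically, the rank in both examples can be confirmed with NumPy's SVD-based
rank routine (a check of ours):

    # Ranks of the matrices in Examples 3.10 and 3.11.
    import numpy as np

    A1 = np.array([[-3, 3, 0], [1, -2, -1], [2, 2, 4]])
    A2 = np.array([[6, 3, 4, 7], [4, 2, 1, 3], [2, 1, 0, 1]])
    print(np.linalg.matrix_rank(A1), np.linalg.matrix_rank(A2))  # 2 2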

Theorem 3.10 Suppose the rank of the m × n matrix A is r. Then the rank of its trans-
pose A^T is also r.

Proof Let R be an r-rowed square submatrix of A with det(R) ≠ 0. It is obvious that R^T
is a submatrix of A^T. Then, by Theorem 3.4, det(R^T) = det(R) ≠ 0. This implies that
the rank of A^T is at least r. On the other hand, if A contains an (r + 1)-rowed square
submatrix M, then, by the definition of rank, det(M) = 0. Since M corresponds to M^T
in A^T, and det(M^T) = 0, it follows that A^T cannot contain an (r + 1)-rowed square
submatrix with nonzero determinant. Thus, the rank of A^T is r. This completes the proof.
Theorem 3.11 If a matrix A is of rank r, and a set of r rows (or columns) containing
a non-singular submatrix of order r is selected, then any other row (column) in the
matrix A is a linear combination of these r rows (or columns).

Proof To simplify notation, we suppose that the submatrix R of order r in the upper left corner of the matrix A has a nonvanishing determinant, and consider the submatrix of A
\[
M = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1r} & a_{1s} \\
a_{21} & a_{22} & \cdots & a_{2r} & a_{2s} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rr} & a_{rs} \\
a_{q1} & a_{q2} & \cdots & a_{qr} & a_{qs}
\end{pmatrix},
\]

where s > r and q > r. Since A is of rank r, |M| = 0 for all such q and s. Now the system
\[
\alpha_1 a_{11} + \alpha_2 a_{21} + \ldots + \alpha_r a_{r1} = a_{q1},
\]
\[
\alpha_1 a_{12} + \alpha_2 a_{22} + \ldots + \alpha_r a_{r2} = a_{q2},
\]
\[
\vdots
\]
\[
\alpha_1 a_{1r} + \alpha_2 a_{2r} + \ldots + \alpha_r a_{rr} = a_{qr},
\]
or
\[
R^T \alpha = b, \quad \text{where } \alpha = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_r \end{pmatrix}, \quad b = \begin{pmatrix} a_{q1} \\ \vdots \\ a_{qr} \end{pmatrix},
\]
has a unique solution α = (R^T)^{-1} b, since |R^T| = |R| ≠ 0. Rewrite this system in the following form:
\[
\alpha_1 b_1 + \alpha_2 b_2 + \ldots + \alpha_r b_r,
\]
where b_j is the jth row of R. We see that we can determine a row of elements which is a linear combination of the first r rows of M, and whose first r elements are identical with the first r elements of the last row of M. Let the last element of that combination be a′_{qs}. In evaluating the determinant |M|, we may subtract this linear combination of the first r rows from the last row without changing the value of the determinant. Thus,
\[
|M| = \begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1r} & a_{1s} \\
a_{21} & a_{22} & \cdots & a_{2r} & a_{2s} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rr} & a_{rs} \\
0 & 0 & \cdots & 0 & a_{qs} - a'_{qs}
\end{vmatrix},
\]

where
\[
a'_{qs} = \alpha_1 a_{1s} + \alpha_2 a_{2s} + \ldots + \alpha_r a_{rs},
\]
and
\[
|M| = \pm\big(a_{qs} - a'_{qs}\big)|R| = 0.
\]
Since |R| ≠ 0, it follows that a_{qs} = a′_{qs}. Hence the last row of M is a linear combination of the first r rows. Since this is true for any q and s, the result follows.

As a consequence of Theorem 3.11, we have the following two corollaries.

Corollary 4 A homogeneous system of n linear equations with n unknowns has a nontrivial solution if and only if its coefficient matrix A satisfies det(A) = 0.
Corollary 5 A square matrix is singular if and only if one of its rows (or columns)
is a linear combination of the others.

3.4.1 Application to least square fitting


Least square approximation is a mathematical process that determines the curve
that best fits a set of points by minimizing the sum of the squares of the offsets
(also known as “the residuals”) of the points from the curve. Finding the best-fitting

straight line through a group of points is a problem that can be solved using the linear least squares fitting technique, which is the most straightforward and widely used form of linear regression.
We begin with the simple problem of finding the straight line y = ax + b that best fits N given observations (x_n, y_n), n = 1, 2, ..., N. From the linear equation, it is intuitive to define the error by
\[
E(a, b) = \sum_{n=1}^{N} \big( y_n - (a x_n + b) \big)^2. \tag{3.18}
\]

This is simply N times the variance of the data collection

{y1 − (ax1 + b), . . . , yN − (axN + b)}.

For best fitting, we must minimize the error given by (3.18). That is, we must find
the values of (a, b) such that

\[
\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0. \tag{3.19}
\]

Using (3.19), we obtain
\[
\frac{\partial E}{\partial a} = 2\sum_{n=1}^{N} \big( y_n - (a x_n + b) \big)(-x_n),
\]
\[
\frac{\partial E}{\partial b} = 2\sum_{n=1}^{N} \big( y_n - (a x_n + b) \big)(-1). \tag{3.20}
\]
Setting ∂E/∂a = ∂E/∂b = 0, we arrive at the system of two equations in the two unknowns a and b:

\[
\Big( \sum_{n=1}^{N} x_n^2 \Big) a + \Big( \sum_{n=1}^{N} x_n \Big) b = \sum_{n=1}^{N} x_n y_n,
\]
\[
\Big( \sum_{n=1}^{N} x_n \Big) a + N b = \sum_{n=1}^{N} y_n. \tag{3.21}
\]

In matrix form,
\[
\begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & N \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}.
\]
This implies that
\[
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & N \end{pmatrix}^{-1}
\begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}.
\]

Consequently,
\[
\begin{pmatrix} a \\ b \end{pmatrix}
= \frac{1}{N\sum_{n=1}^{N} x_n^2 - \big(\sum_{n=1}^{N} x_n\big)^2}
\begin{pmatrix} N & -\sum_{n=1}^{N} x_n \\ -\sum_{n=1}^{N} x_n & \sum_{n=1}^{N} x_n^2 \end{pmatrix}
\begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}.
\]
The above concept can be easily generalized to functions that are not straight lines.
Given functions f1 , . . . , fk , find the values of coefficients a1 , . . . , ak , such that the
linear combination
y = a1 f1 (x) + . . . + ak fk (x)
is the best approximation to the data. Staying with the same setup, we define the error by
\[
E(a_1, \ldots, a_k) = \sum_{n=1}^{N} \big( y_n - (a_1 f_1(x_n) + \ldots + a_k f_k(x_n)) \big)^2. \tag{3.22}
\]
To find the values of (a_1, ..., a_k) we set
\[
\frac{\partial E}{\partial a_1} = 0, \; \ldots, \; \frac{\partial E}{\partial a_k} = 0. \tag{3.23}
\]
To be more specific, we consider fitting the parabola y = a + bx + cx². Then
\[
E(a, b, c) = \sum_{n=1}^{N} \big( y_n - (a + b x_n + c x_n^2) \big)^2.
\]
Then
\[
\frac{\partial E}{\partial a} = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial c} = 0
\]
implies that
\[
2\sum_{n=1}^{N} \big( y_n - (a + b x_n + c x_n^2) \big)(-1) = 0,
\]
\[
2\sum_{n=1}^{N} \big( y_n - (a + b x_n + c x_n^2) \big)(-x_n) = 0,
\]
and
\[
2\sum_{n=1}^{N} \big( y_n - (a + b x_n + c x_n^2) \big)(-x_n^2) = 0. \tag{3.24}
\]
The system (3.24) reduces to
\[
\sum_{n=1}^{N} y_n = N a + b \sum_{n=1}^{N} x_n + c \sum_{n=1}^{N} x_n^2,
\]
\[
\sum_{n=1}^{N} x_n y_n = a \sum_{n=1}^{N} x_n + b \sum_{n=1}^{N} x_n^2 + c \sum_{n=1}^{N} x_n^3,
\]
and
\[
\sum_{n=1}^{N} x_n^2 y_n = a \sum_{n=1}^{N} x_n^2 + b \sum_{n=1}^{N} x_n^3 + c \sum_{n=1}^{N} x_n^4. \tag{3.25}
\]

Example 3.12 Suppose we want to find the values a, b, and c so that y = a + bx + cx2
is the best approximation to the data

(0, 1), (1, 1.8), (2, 1.3), (3, 2.5), (4, 6.3).

Then, we have N = 5. Using the given data, one can easily compute
\[
\sum_{n=1}^{5} x_n = 10, \quad \sum_{n=1}^{5} y_n = 12.9, \quad \sum_{n=1}^{5} x_n^2 = 30, \quad \sum_{n=1}^{5} x_n^3 = 100, \quad \sum_{n=1}^{5} x_n^4 = 354,
\]
\[
\sum_{n=1}^{5} x_n y_n = 37.1, \quad \sum_{n=1}^{5} x_n^2 y_n = 130.3.
\]

Thus, system (3.25) reduces to

5a + 10b + 30c = 12.9


10a + 30b + 100c = 37.1
30a + 100b + 354c = 130.3.

Using matrix notation, we arrive at the solution
\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix} =
\begin{pmatrix} 5 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \end{pmatrix}^{-1}
\begin{pmatrix} 12.9 \\ 37.1 \\ 130.3 \end{pmatrix}.
\]
This gives a = 1.42, b = −1.07, c = 0.55, and the desired parabola is
\[
y = 1.42 - 1.07x + 0.55x^2.
\]
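As a sanity check, the fit of Example 3.12 can be reproduced in a few lines of Python. The sketch below (using NumPy; not part of the original text) solves the same normal equations as system (3.25) in least-squares form:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 1.3, 2.5, 6.3])

# Design matrix for y = a + b*x + c*x^2; least squares on M
# is equivalent to solving the normal equations (3.25).
M = np.column_stack([np.ones_like(x), x, x**2])
a, b, c = np.linalg.lstsq(M, y, rcond=None)[0]
print(a, b, c)  # approximately 1.42, -1.07, 0.55
```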

3.4.2 Exercises
Exercise 3.22 Use the method of Example 2 to find the inverse matrix of each of the following matrices.
\[
\text{(a) } A = \begin{pmatrix} 3 & 5 \\ 1 & 2 \end{pmatrix}, \quad
\text{(b) } A = \begin{pmatrix} 5 & 2 \\ -7 & -3 \end{pmatrix}, \quad
\text{(c) } A = \begin{pmatrix} 1 & 3 & 3 \\ 1 & 3 & 4 \\ 1 & 5 & 3 \end{pmatrix}.
\]
Exercise 3.23 Prove Theorem 3.3.
Exercise 3.24 Prove Theorem 3.4.
Exercise 3.25 Let A be an n × n matrix with det(A) ≠ 0. Show that
\[
\det(A^{-1}) = \frac{1}{\det(A)}.
\]

Exercise 3.26 Find the inverse of each of the following matrices.
\[
\text{(a) } A = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}, \quad
\text{(b) } A = \begin{pmatrix} -3 & 6 & -11 \\ 3 & -4 & 6 \\ 4 & -8 & 13 \end{pmatrix},
\]
\[
\text{(c) } A = \begin{pmatrix} 3 & 5 & 7 \\ 1 & 2 & 3 \\ 2 & 3 & 5 \end{pmatrix}, \quad
\text{(d) } A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & 1 & 2 & 3 \end{pmatrix}.
\]
Exercise 3.27 Show that if AB = AC, then B = C, provided that |A| ̸= 0.
Exercise 3.28 Suppose A and B are symmetric and invertible. Show that if AB = BA,
then AB, A−1 B, AB−1 , and A−1 B−1 are symmetric.
Exercise 3.29 Find 2 × 2 matrices A and B that satisfy
\[
\begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix} A + \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix} B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},
\]
\[
\begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix} A - \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix} B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
Exercise 3.30 Solve

2x1 + 2x2 − x3 = 4
3x1 + x2 + 4x3 = −9
x1 + 2x2 + x3 = 1.

Exercise 3.31 Solve

x1 − x2 − x3 + x4 = −2
2x1 + x2 + x3 + x4 = 3
−x1 + x2 + x3 = 1
2x1 + x3 − 3x4 = 7.

Exercise 3.32 Findthe rank of the following


 matrices.
  0 5 8  
2 5 3 0 4
a) , b)  3 1 4 , c) ,
3 8 1 7 −1
−3 4 4
 
8 1 3 6
d) −8 −1 −3 4 .
0 3 2 2
Exercise 3.33 Let a, b, c be real numbers and consider the matrix
\[
A = \begin{pmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{pmatrix}.
\]
Show that
\[
\det(A) = (b - a)(c - a)(c - b).
\]

Exercise 3.34 Let A be an n × n time-dependent matrix. That is, the entries of A


depend on the independent variable t. Suppose all its entries are continuously differ-
entiable on some interval I.
Show that
\[
\frac{d}{dt}\det(A) = \mathrm{tr}\Big( \mathrm{adj}(A)\,\frac{dA}{dt} \Big).
\]
Exercise 3.35 Let A, and B be two n × n nonzero matrices such that AB = 0. Then
show that both A and B are singular.
Exercise 3.36 Find the line y = a + bx to best fit the data of Example 3.12.
Exercise 3.37 Find the line y = a + bx to best fit the data

(1, 2), (2, 5), (3, 3), (4, 8), (5, 7).

Exercise 3.38 Generalize the method of least squares to find the function

y = am xm + am−1 xm−1 + . . . + a0 ,

that best fits a given data.

3.5 Vector Spaces


Vector spaces play a fundamental role in the development of linear algebra. We will
state some definitions and go over a few concepts, such as bases, dimensions, and
span.
Definition 3.10 A triple (V, +, ·) is said to be a linear (or vector) space over a field
F if V is a set and the following are true.
1. Properties of +
(a) + is a function from V ×V to V . Outputs are denoted x + y.
(b) for all x, y ∈ V , x + y = y + x. (+ is commutative)
(c) for all x, y, w ∈ V , x + (y + w) = (x + y) + w. (+ is associative)
(d) there is a unique element of V which we denote 0 such that for all x ∈ V ,
0 + x = x + 0 = x. (additive identity)
(e) for each x ∈ V there is a unique element of V which we denote −x such that
x + (−x) = −x + x = 0. (additive inverse)
2. Properties of ·
(a) · is a function from F ×V to V . Outputs are denoted α · x, or αx.
(b) for all α, β ∈ F and x ∈ V , α(β x) = (αβ )x.

(c) for all x ∈ V , 1 · x = x.


(d) for all α, β ∈ F and x ∈ V , (α + β )x = αx + β x.
(e) for all α ∈ F and x, y ∈ V , α(x + y) = αx + αy.

Commonly, the real numbers or complex numbers are the field in the above defini-
tion.
Example 3.13 The set of real numbers R is a vector space over the field F = R under
the usual addition and multiplication. □
Example 3.14 The set Rⁿ = {x = (x1, x2, ..., xn)ᵀ} is a vector space over the field F = R under the usual addition and scalar multiplication. That is, for x, y ∈ Rⁿ and α ∈ F, addition and scalar multiplication are defined by
\[
x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)^T
\]
and
\[
\alpha x = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)^T. \quad \square
\]
Example 3.15 Let an ̸= 0, and define the set V = {p(x) = an xn + an−1 xn−1 + . . . +
a1 x + a0 : ai ∈ R, i = 0, 1, . . . , n}. Then V is a vector space over the field F = R.
We define the addition of two polynomials and scalar multiplication as follows: if p, q ∈ V, with q(x) = b_n x^n + b_{n-1} x^{n-1} + ... + b_1 x + b_0, where b_i ∈ R, i = 0, 1, ..., n, then
(p+q)(x) = (an +bn )xn +(an−1 +bn−1 )xn−1 +. . .+(a1 +b1 )x+a0 +b0 = p(x)+q(x),
and
(α p)(x) = αan xn + αan−1 xn−1 + . . . + αa1 x + αa0 = α p(x),
for α ∈ R, we have V is a vector space. For example, if p(x) = 3x2 + 4x + 6, q(x) =
x3 − 2x2 + 2x + 5, then (p + q)(x) = x3 + (3 − 2)x2 + (2 + 4)x + 5 + 6, and (3p)(x) =
9x2 + 12x + 18. The additive inverse of p(x) is −p(x) = −an xn − an−1 xn−1 − . . . −
a1 x − a0 . □
Example 3.16 Let the set C(D) be the set of all continuous functions f : D → R. For
f , g ∈ C(D), we define addition and multiplication pointwise as follows:
( f + g)(x) = f (x) + g(x), for all x ∈ D,
and
(c f )(x) = c f (x), c ∈ R
then C(D) is a vector space. □
Definition 3.11 The vectors v1, v2, ..., vn in V are said to be linearly dependent if there exist constants c1, c2, ..., cn, not all zero, such that
\[
c_1 v_1 + c_2 v_2 + \ldots + c_n v_n = 0.
\]
If the vectors are not linearly dependent, then they are called linearly independent.

Example 3.17 The vectors

e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 1)

are linearly independent since if
\[
(0, \ldots, 0) = c_1(1, 0, \ldots, 0) + c_2(0, 1, 0, \ldots, 0) + \ldots + c_n(0, 0, \ldots, 1),
\]
then
\[
(0, \ldots, 0) = (c_1, c_2, \ldots, c_n),
\]
which has the only solution c1 = c2 = ... = cn = 0. □
Definition 3.12 Let V be a vector space over R. Let v1 , v2 , . . . , vn in V. A vector v ∈ V
is a linear combination of {v1, v2, ..., vn} if there exist scalars b1, b2, ..., bn ∈ R such
that
v = b1 v1 + b2 v2 + . . . + bn vn .
Definition 3.13 (Span) The span of {v1 , v2 , . . . , vn } is defined as

span(v1 , v2 , . . . , vn ) := {b1 v1 + b2 v2 + . . . + bn vn b1 , b2 , . . . , bn ∈ R}.

Example 3.18 Consider the vectors

v1 = (3, 2, 1), v2 = (2, −3, 2), and v3 = (−12, 5, −8).

Then,
a1 v1 + a2 v2 + a3 v3 = (0, 0, 0)
implies that

(3a1 + 2a2 − 12a3 , 2a1 − 3a2 + 5a3 , a1 + 2a2 − 8a3 ) = (0, 0, 0).

So we have the three equations with three unknowns

3a1 + 2a2 − 12a3 = 0


2a1 − 3a2 + 5a3 = 0
a1 + 2a2 − 8a3 = 0.

Using the Gauss elimination method, we see the system has infinitely many solutions; namely,
\[
a_1 = 2a_3, \quad a_2 = 3a_3.
\]
Setting a3 = 1, we obtain a solution (2, 3, 1). In general, the set of all solutions (the solution space) is given by
\[
S = \{(a_1, a_2, a_3) \mid a_1 = 2a_3, \; a_2 = 3a_3\} = \mathrm{span}\big((2, 3, 1)\big).
\]
In particular, the vectors v1, v2, v3 are linearly dependent. □
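This computation can be cross-checked symbolically: the solution space above is exactly the null space of the coefficient matrix. A minimal SymPy sketch (not part of the original text):

```python
import sympy as sp

A = sp.Matrix([[3, 2, -12],
               [2, -3, 5],
               [1, 2, -8]])

# A one-dimensional null space means the homogeneous system has
# infinitely many solutions, so v1, v2, v3 are linearly dependent.
print(A.nullspace())  # [Matrix([[2], [3], [1]])]
```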



We have the following important Lemma concerning linear independence.


Lemma 7 (Linear Independence Lemma) The set of vectors {v1 , v2 , . . . , v p } is lin-
early independent if and only if every vector v ∈ span{v1 , v2 , . . . , v p } can be uniquely
written as a linear combination of {v1 , v2 , . . . , v p }.

Proof Suppose v1, v2, ..., v_p are linearly independent, and suppose that some vector v ∈ span{v1, v2, ..., v_p} can be written in two different ways; that is, there are constants b_i, c_i, i = 1, 2, ..., p, such that

v = b1 v1 + b2 v2 + . . . + b p v p

and
v = c1 v1 + c2 v2 + . . . + c p v p .
By subtracting the two equations we arrive at

0 = (b1 − c1 )v1 + (b2 − c2 )v2 + . . . + (b p − c p )v p . (3.26)

Since the set of vectors {v1 , v2 , . . . , v p } is linearly independent, the only solution to
equation (3.26) is

b1 − c1 = 0, b2 − c2 = 0, ..., b p − c p = 0.

Thus,
b1 = c1 , b2 = c2 , . . . , b p = c p .

This proves the necessity part of the lemma. For the proof of the sufficiency, suppose that every v ∈ span{v1, v2, ..., v_p} has unique b_i, i = 1, 2, ..., p such that v = b_1v_1 + b_2v_2 + ... + b_pv_p. In particular, the zero vector v = 0 can be written as a linear combination of v1, v2, ..., v_p only when

b1 = b2 = . . . = b p = 0.

This shows the set of vectors {v1 , v2 , . . . , v p } is linearly independent. This completes
the proof.
Definition 3.14 (subspace) Let V be a vector space over F. Then U is a subspace of
V if and only if the following properties are satisfied:
1. 0 ∈ U; additive identity
2. If u1 , u2 ∈ U, then u1 + u2 ∈ U; (closure under addition)
3. For scalar a ∈ F, u ∈ U, then au ∈ U; (closure under scalar multiplication).
Example 3.19 The set
U = {(a, 0) | a ∈ R}
is a subspace of R2 . □

Example 3.20 The set

U = {(a, b, c) ∈ R3 | b + 4c = 0}

is a subspace of R³. To see this, we make sure the requirements of Definition 3.14 are met. As for 1., we easily see that (0, 0, 0) ∈ U, since b + 4c = 0 is satisfied. To verify 2., we let u = (u1, u2, u3) and v = (v1, v2, v3) be in U. Then we have

u2 + 4u3 = 0, and v2 + 4v3 = 0

by adding the two equations we easily arrive at

(u2 + v2 ) + 4(u3 + v3 ) = 0.

Hence K := u + v = (u1 + v1, u2 + v2, u3 + v3) satisfies (u2 + v2) + 4(u3 + v3) = 0, which shows that u + v ∈ U. It remains to show that 3. holds. Let α ∈ R and u = (u1, u2, u3) ∈ U. Then αu = (αu1, αu2, αu3) satisfies the equation αu2 + 4αu3 = α(u2 + 4u3) = 0, and so αu ∈ U. □
Lemma 8 Let v1 , v2 , . . . , vn be vectors in the vector space V. Then
1. v j ∈ span(v1 , v2 , . . . , vn ),
2. span(v1 , v2 , . . . , vn ) is a subspace of V.
3. If v1 , v2 , . . . , vn are vectors in the vector space V and U ⊂ V is a subspace such
that v1 , v2 , . . . , vn ∈ U, then span(v1 , v2 , . . . , vn ) ⊂ U.

Proof As for 1. let v j be any vector of v1 , v2 , . . . , vn ∈ V. Since V is a vector space,


the result follows from parts b) of addition and e) of multiplication. To prove 2.
we observe that 0 ∈ span(v1 , v2 , . . . , vn ) and that span(v1 , v2 , . . . , vn ) is closed under
addition and scalar multiplication.
Remark 10 Lemma 8 implies that span(v1 , v2 , . . . , vn ) is the smallest subspace of V
containing v1 , v2 , . . . , vn .
Example 3.21 The vectors v1 = (2, 2, 0), v2 = (2, −2, 0) span a subspace of R³. In fact, span(v1, v2) is a copy of R² inside R³, namely the plane x3 = 0. To see this, we write v = (x1, x2, x3) as a combination of v1 and v2:

(x1 , x2 , x3 ) = a1 (2, 2, 0) + a2 (2, −2, 0),

implies that
(x1 , x2 , x3 ) = (2a1 + 2a2 , 2a1 − 2a2 , 0).
Clearly, a1 = (x1 + x2)/4 and a2 = (x1 − x2)/4 form a solution for any x1, x2 ∈ R and x3 = 0. □
Definition 3.15 If span(v1, v2, ..., vn) = V, then we say that (v1, v2, ..., vn) spans V. In this case the vector space V is finite-dimensional. A vector space that is not finite-dimensional is called infinite-dimensional.

Example 3.22 The vectors

e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), ..., en = (0, 0, . . . , 1)

span Rn since any vector u = (u1 , u2 , . . . , un ) ∈ Rn can be written as a combination


of e1 , e2 , . . . , en . That is

u = u1 e1 + u2 e2 + . . . + un en .

Thus, the vector space Rn is finite-dimensional with dimension n. □


Example 3.23 Consider the polynomial p(x) = an xn + an−1 xn−1 + . . . + a1 x + a0 ∈
V [x] with coefficients in R such that an ̸= 0. Thus, the polynomial has degree n and
we write deg(p(x)) = n. By convention the degree of the zero polynomial p(x) = 0,
is −∞. Let Vn [x] = {p(x) ∈ V [x] | deg(p(x)) ≤ n}. Then by Example 3.15, Vn [x] is a
vector space. Then Vn [x] is a subspace of V [x]. That is

Vn [x] ⊂ V [x].

This is the case since the zero polynomial is in Vn [x]. Moreover, Vn [x] is closed under
vector addition and scalar multiplication. Since

Vn [x] = span(1, x, x2 , . . . , xn ),

the subspace Vn [x] is of finite dimension. On the other hand, we assert that V [x] is
infinite-dimensional. Assume the contrary; that is

V [x] = span(p1 (x), p2 (x), . . . , pk (x))

for finite index k. Let
\[
n = \max\big( \deg(p_1(x)), \deg(p_2(x)), \ldots, \deg(p_k(x)) \big).
\]

Then x^{n+1} ∈ V[x], but
\[
x^{n+1} \notin \mathrm{span}(p_1(x), p_2(x), \ldots, p_k(x)).
\]

Hence, V [x] is infinite-dimensional. □


Definition 3.16 (Bases) The set of vectors {v1, v2, ..., vn} is a basis for the finite-dimensional vector space V if {v1, v2, ..., vn} is linearly independent and V = span(v1, v2, ..., vn).
Example 3.24 The vectors ei , i = 1, 2, . . . , n of Example 3.22 form a basis for Rn . □
Example 3.25 Along the same lines of thinking, the vectors (1, 2) and (1, 1) form a basis for R². □
Example 3.26 The set {1, x, x2 , . . . , xn } forms a basis for Vn [x], where Vn [x] is defined
in Example 3.23. □

Observe that the set {1, x, x²} is a basis for the vector space of polynomials in x with real coefficients having degree at most 2. Note that V₂[x] contains infinitely many polynomials of degree at most 2, yet we managed to describe all of them using the set {1, x, x²}.
Recall that a vector space V is called finite-dimensional if V has a basis consisting of a finite number of vectors; otherwise, V is infinite-dimensional.
Remark 11 The dimension of a vector space is the number of vectors in a basis.
It can be shown that in an n-dimensional vector space, any set of n + 1 vectors is
linearly dependent. Thus, the dimension of a vector space could be defined as the
number of vectors in a maximal linearly independent set.
Remark 12 By Lemma 7, If {v1 , v2 , . . . , vn } forms a basis of V, then every vector
v ∈ V can be uniquely written as a linear combination of v1 , v2 , . . . , vn .
To see the difference between basis and span, we consider the vectors

v1 = (1, 0), v2 = (0, 1), and v3 = (2, 0).

Clearly, span(v1, v2) = R². Moreover, span(v1, v2, v3) = R². (We ask you to verify this.) However, of these two spanning sets only {v1, v2} is a basis of R², because the vector v3 makes the set {v1, v2, v3} linearly dependent.

3.5.1 Exercises
Exercise 3.39 Show the set

U = {(a, b, c) ∈ R3 | a + b + 4c = 0}

is a vector space under the usual operations of vector addition and scalar multiplication on R³.
Exercise 3.40 Show the set

U = {(a, b, c) ∈ R3 | a + 2b = 0}

is a subspace under the usual operations of vector addition and scalar multiplication on R³.
Exercise 3.41 Show the set

U = {(a, 0) ∈ R2 | a ∈ R}

is a subspace under the usual operations addition and multiplication on R2 .


Exercise 3.42 Show that the vectors v1 = (1, 1, 1), v2 = (2, 1, 3), and v3 =
(−1, 2, 1) are linearly independent in R3 . Write v = (2, 4, 3) as a linear combina-
tion of the vectors v1 , v2 , and v3 .

Exercise 3.43 Redo Example 3.18 for the following set of vectors.
(a) v1 = (1, 1, 1), v2 = (1, 2, 0), and v3 = (0, −1, 1).
(b) v1 = (1, 1, 1), v2 = (1, 2, 0), and v3 = (0, −1, 2).
Exercise 3.44 Explain why the set of vectors given by

v1 = (3, 4, 5), v2 = (−3, 0, 5), v3 = (4, 4, 4) and v4 = (3, 4, 0)

is linearly dependent.
Exercise 3.45 Prove 3. of Lemma 8.
Exercise 3.46 Either show the set is a vector space or explain why it is not. All
functions are assumed to be continuous.
(a) U = {(a, 2) ∈ R2 |a ∈ R} under the usual operations of addition and multiplica-
tion on R2 .
(b) U = {(a, b) ∈ R2 | a, b ≥ 0} under the usual operations of addition and multipli-
cation on R2 .
d
(c) U = { f : R → R | dx f exists} under the usual operations of addition and multi-
plication on functions.
(d) U = { f : R → R | f (x) ̸= 0 for any x ∈ R} under the usual operations of addition
and multiplication on functions.
(e) The solution set of a linear nonhomogeneous system of equations.
(f) U = {A2×2 | det(A) = 0} under the usual operations of addition and multiplica-
tion for matrices.
(g) U = { f : [−1, 1] → [−1, ∞)} under the usual operations of addition and multi-
plication on functions.
(h) U = { f : R → R | f (0) = 0} under the usual operations of addition and multi-
plication on functions.
(i) U = { f : R → R | f (x) ≤ 0, for all x ∈ R} under the usual operations of addition
and multiplication on functions.
Exercise 3.47 Show that any set of vectors {v1 , v2 , . . . , vn }, which spans a vector
space V contains a linearly independent subset which also spans V.
Exercise 3.48 Show the vectors v1 = (1, 1), v2 = (1, 2), and v3 = (1, 0) span
R2 .
Exercise 3.49 Give a basis of
\[
M_{2\times 2} = \Big\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\Big|\; a, b, c, d \in \mathbb{R} \Big\}.
\]

There are many possible answers.



Exercise 3.50 Let v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (0, 0, 1), and v4 = (1, 1, 1).
(a) Show that {v1 , v2 , v3 , v4 } spans R3 .
(b) Does the set of vectors in part a) form a basis for R3 ? Explain.

3.6 Eigenvalues-Eigenvectors
For motivational purposes, we begin with the following example.
Example 3.27 (Lotka–Volterra Predator–Prey Model) We consider the Lotka–
Volterra predator–prey model. Let x = x(t) and y = y(t) be the numbers of prey and predators at time t, respectively. To keep the model simple, we will make the following assumptions:
• the predator species is dependent on a single prey species as its only food supply,
• the prey species has an unlimited food supply, and
• there is no threat to the prey other than the specific predator.
We observe that, in the absence of predation, the prey population would grow at a natural rate
\[
\frac{dx}{dt} = ax, \quad a > 0.
\]
On the other hand, in the absence of prey, the predator population would decline at a natural rate
\[
\frac{dy}{dt} = -cy, \quad c > 0.
\]
The effect of predators eating prey is an interaction rate of decline (−bxy, b > 0) in the prey population x, and an interaction rate of growth (dxy, d > 0) of the predator population y. Hence, one obtains the predator–prey model
\[
\frac{dx}{dt} = ax - bxy,
\]
\[
\frac{dy}{dt} = -cy + dxy. \tag{3.27}
\]
The Lotka-Volterra model consists of a system of linked differential equations that cannot be separated from each other and cannot be solved in closed form. Since (0, 0) is a solution of the system, we linearize around it and rewrite the system as
\[
X' = AX + g(x, y), \tag{3.28}
\]
where
\[
X = \begin{pmatrix} x \\ y \end{pmatrix}, \quad A = \begin{pmatrix} a & 0 \\ 0 & -c \end{pmatrix}, \quad g = \begin{pmatrix} -bxy \\ dxy \end{pmatrix}.
\]

Since the function g is continuously differentiable in both variables near the origin, the stability of the nonlinear system (3.28) is heavily influenced by the stability of the linear system
\[
X' = AX. \tag{3.29}
\]
We search for solutions to (3.29) of the form
\[
X = z e^{\lambda t},
\]
where \(z = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\), for a parameter λ. Substituting into (3.29), we arrive at the relation
\[
Az = \lambda z. \tag{3.30}
\]
It is evident that the zero vector \(z = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\) is a solution of (3.30) for any value of λ. We are interested in the values of λ for which (3.30) has a nonzero solution. Such values are called eigenvalues, and the corresponding nonzero vector solutions z are called eigenvectors. We make this precise in the definition below. □
Definition 3.17 Let A be an n × n constant matrix, in short “matrix.” A number λ is
said to be an eigenvalue of A if there exists a nonzero vector v such that

Av = λ v. (3.31)

The vector v is said to be an eigenvector corresponding to the eigenvalue λ . We may


refer to λ and v as an eigenpair.
Theorem 3.12 If λ0 , v0 is an eigenpair of A, then

X(t) = eλ0 t v0 = v0 eλ0 t

is a solution of
X ′ = AX. (3.32)

Proof Let A be an n × n matrix. Let X(t) = eλ0 t v0 . Then

X ′ (t) = λ0 eλ0 t v0 = λ0 v0 eλ0 t = Av0 eλ0 t = Aeλ0 t v0 = AX,

as desired. This completes the proof.

Consider the n × n matrix A = (a_{ij}) such that
\[
Ax = \lambda x, \tag{3.33}
\]
where
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
\]

For the purpose of finding the eigenvalues and corresponding eigenvectors, we


rewrite (3.33) as

a11 x1 + a12 x2 + a13 x3 + . . . + a1n xn = λ x1


a21 x1 + a22 x2 + a23 x3 + . . . + a2n xn = λ x2
a31 x1 + a32 x2 + a33 x3 + . . . + a3n xn = λ x3
..
.
an1 x1 + an2 x2 + an3 x3 + . . . + ann xn = λ xn

By transferring the terms on the right-hand side to the left-hand side, we arrive
at

(a11 − λ )x1 + a12 x2 + a13 x3 + . . . + a1n xn = 0


a21 x1 + (a22 − λ )x2 + a23 x3 + . . . + a2n xn = 0
a31 x1 + a32 x2 + (a33 − λ )x3 + . . . + a3n xn = 0
..
.
an1 x1 + an2 x2 + an3 x3 + . . . + (ann − λ )xn = 0.

By Corollary 4, this homogeneous system has a nontrivial solution if and only if the corresponding determinant of the coefficients is zero. That is,
\[
D(\lambda) = \det(A - \lambda I) = \begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix} = 0. \tag{3.34}
\]
Equation (3.34) is called the characteristic equation corresponding to the matrix A.
By expanding D(λ ) we obtain a polynomial of nth degree in λ . This is called the
characteristic polynomial corresponding to the matrix A. Thus, we have proved the
following theorem.
Theorem 3.13 The eigenvalues of an n × n matrix A are the roots of its correspond-
ing characteristic equation (3.34).
In general, if D(λ) is an nth degree polynomial, then it can be factored into linear terms over ℂ in the form
\[
D(\lambda) = (\lambda - \lambda_1)^{k_1}(\lambda - \lambda_2)^{k_2} \cdots (\lambda - \lambda_p)^{k_p},
\]
where k1 + k2 + ... + kp = n, and ki is called the multiplicity of the eigenvalue λi. For example, if A is a 5 × 5 matrix with characteristic polynomial
\[
D(\lambda) = \lambda^5 - 5\lambda^4 + 8\lambda^3 - 4\lambda^2,
\]
then
\[
D(\lambda) = \lambda^2(\lambda - 1)(\lambda - 2)^2.
\]

So, λ1 = 0 has multiplicity k1 = 2, λ2 = 1 has multiplicity k2 = 1, and λ3 = 2 has


multiplicity k3 = 2. Once the eigenvalues are obtained, we use (3.33) to find the cor-
responding eigenvectors. We will apply the concept of eigenvalues and eigenvectors
to solving linear systems of differential equations, As a result, we need to undertake
some initial work.
Theorem 3.14 (Independent eigenvectors) Let v1 , v2 , . . . , v p be the correspond-
ing eigenvectors to the distinct eigenvalues λ1 , λ2 , . . . , λ p of a matrix A. Then
v1 , v2 , . . . , v p are linearly independent.

Proof Suppose v1 , v2 , . . . , v j are linearly independent for positive integer j, where j


is maximal. If j < p, then v j+1 can be written as a linear combination of the vectors,
v1 , v2 , . . . , v j . That is there are constants c1 , c2 , . . . , c j such that
v j+1 = c1 v1 + c2 v2 + . . . + c j v j .
Multiply from the left by the matrix A and apply the fact that Avi = λi vi for i =
1, 2, . . . j to arrive at
Av j+1 = λ j+1 v j+1

= λ j+1 c1 v1 + c2 v2 + . . . + c j v j
= c1 λ j+1 v1 + c2 λ j+1 v2 + . . . + c j λ j+1 v j .
On the other hand

Av j+1 = A c1 v1 + c2 v2 + . . . + c j v j
= c1 Av1 + c2 Av2 + . . . + c j Av j
= c1 λ1 v1 + c2 λ2 v2 + . . . + c j λ j v j .
Subtracting the two equations gives
c1 (λ j+1 − λ1 )v1 + c2 (λ j+1 − λ2 )v2 + . . . + c j (λ j+1 − λ j )v j = 0.
Since v1, v2, ..., v_j are linearly independent, we must have
\[
c_1(\lambda_{j+1} - \lambda_1) = 0, \quad c_2(\lambda_{j+1} - \lambda_2) = 0, \; \ldots, \; c_j(\lambda_{j+1} - \lambda_j) = 0.
\]
Since the eigenvalues are distinct, λ_{j+1} − λ_i ≠ 0 for all i = 1, 2, ..., j, which can only hold when
\[
c_1 = c_2 = \ldots = c_j = 0.
\]
This implies that v_{j+1} = 0 (the zero vector), which is a contradiction, since v_{j+1} is an eigenvector corresponding to λ_{j+1} and is therefore nonzero. Hence j = p, and this completes the proof.

The solution of a given linear system of differential equations is the focus of the
following theorem.

Theorem 3.15 (Distinct eigenvalues) Let λ1 , λ2 , . . . , λn be n distinct real eigenval-


ues of the matrix A of (3.32) and let K1 , K2 , . . . , Kn be the corresponding eigenvectors.
Then the general solution of (3.32) on the interval I = (−∞, ∞) is given by

X(t) = c1 K1 eλ1 t + c2 K2 eλ2 t , . . . , cn Kn eλn t

for constants ci , i = 1, 2, . . . , n.
Example 3.28 Consider the linear homogeneous system of differential equations

x1′ = 5x1 + 2x2 + 3x3


x2′ = 8x2 + 3x3
x3′ = 4x3 .

In matrix form we have
\[
x' = Ax,
\]
where
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad A = \begin{pmatrix} 5 & 2 & 3 \\ 0 & 8 & 3 \\ 0 & 0 & 4 \end{pmatrix}.
\]
Then
\[
D(\lambda) = \det(A - \lambda I) = \begin{vmatrix} 5 - \lambda & 2 & 3 \\ 0 & 8 - \lambda & 3 \\ 0 & 0 & 4 - \lambda \end{vmatrix},
\]
and the system has a nontrivial solution if and only if D(λ) = 0. Expanding the determinant along the first row we obtain the characteristic equation
\[
(5 - \lambda)(8 - \lambda)(4 - \lambda) = 0,
\]
which has the three distinct eigenvalues λ1 = 5, λ2 = 8, and λ3 = 4.
 
To compute the corresponding eigenvectors, we let \(K_1 = \begin{pmatrix} k_1 \\ k_2 \\ k_3 \end{pmatrix}\). Then using (3.31) we have (A − λI)K₁ = 0, or
\[
(5 - \lambda)k_1 + 2k_2 + 3k_3 = 0,
\]
\[
(8 - \lambda)k_2 + 3k_3 = 0, \tag{3.35}
\]
\[
(4 - \lambda)k_3 = 0.
\]
From the third and second equations it is obvious that, with λ = 5, k3 = k2 = 0. The first equation implies that 0k1 + 0 + 0 = 0, from which we conclude that k1 is arbitrary. So, if we set k1 = 1, then the corresponding eigenvector is given by
 
\[
K_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
\]
Similarly, if we substitute λ = 8 in (3.35), we arrive at the corresponding eigenvector \(K_2 = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}\). Finally, the third eigenvector, corresponding to λ = 4, is \(K_3 = \begin{pmatrix} 6 \\ 3 \\ -4 \end{pmatrix}\). Using Theorem 3.15, we arrive at the solution
\[
x(t) = c_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} e^{5t} + c_2 \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} e^{8t} + c_3 \begin{pmatrix} 6 \\ 3 \\ -4 \end{pmatrix} e^{4t}. \quad \square
\]
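The eigenpairs found above are easy to confirm numerically. A minimal NumPy sketch (not part of the original text):

```python
import numpy as np

A = np.array([[5.0, 2.0, 3.0],
              [0.0, 8.0, 3.0],
              [0.0, 0.0, 4.0]])

vals, vecs = np.linalg.eig(A)
print(vals)  # 5., 8., 4. (order may vary)

# Each column of vecs is a unit eigenvector, proportional to the
# corresponding K_i found by hand, e.g. (2, 3, 0)/||(2, 3, 0)||.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)
```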


In some cases a repeated eigenvalue gives only one independent eigenvector, and the remaining solutions must be found by another method, as the next example demonstrates. We consider the system
\[
x' = \begin{pmatrix} 3 & -18 \\ 2 & -9 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{3.36}
\]
Then the coefficient matrix has the repeated eigenvalue λ1 = λ2 = −3. If \(K_1 = \begin{pmatrix} k_1 \\ k_2 \end{pmatrix}\) is the corresponding eigenvector, then we have the two equations 6k1 − 18k2 = 0 and 2k1 − 6k2 = 0, which are both equivalent to k1 = 3k2. By setting k2 = 1, we obtain the single eigenvector \(K_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}\), and it follows that the corresponding solution is given by
\[
\varphi_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix} e^{-3t}.
\]
But since we are interested in finding the general solution, we need to examine the
question of finding another solution.
In general, if m is a positive integer and (λ − λ1)^m is a factor of the characteristic equation det(A − λI) = 0, while (λ − λ1)^{m+1} is not, then λ1 is said to be an eigenvalue of multiplicity m. Below, we discuss two such scenarios:
(a) For some n × n matrices A it may be possible to find m linearly independent eigenvectors K1, K2, ..., Km corresponding to an eigenvalue λ1 of multiplicity m ≤ n. In this case the general solution of the system contains the linear combination
\[
c_1 K_1 e^{\lambda_1 t} + c_2 K_2 e^{\lambda_1 t} + \ldots + c_m K_m e^{\lambda_1 t}.
\]

(b) If there is only one eigenvector corresponding to an eigenvalue λ1 of multiplicity m, then one can always find m linearly independent solutions of the form
\[
\varphi_1 = K_{11} e^{\lambda_1 t},
\]
\[
\varphi_2 = K_{21} t e^{\lambda_1 t} + K_{22} e^{\lambda_1 t},
\]
\[
\vdots
\]
\[
\varphi_m = K_{m1} \frac{t^{m-1}}{(m-1)!} e^{\lambda_1 t} + K_{m2} \frac{t^{m-2}}{(m-2)!} e^{\lambda_1 t} + \ldots + K_{mm} e^{\lambda_1 t},
\]
where the K_{ij} are column vectors known as generalized eigenvectors. For an illustration of case (b), we suppose λ1 is an eigenvalue
eralized eigenvectors. For an illustration of case (b), we suppose λ1 is an eigenvalue
of multiplicity two with only one corresponding eigenvector K1 . To find the second
eigenvector, we assume a second solution of

X ′ = AX

of the form
φ2 (t) = K1teλ1 t + Peλ1 t , (3.37)
where
\[
P = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} \quad \text{and} \quad K = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{pmatrix}
\]
are to be found.
are to be found. Differentiate φ2 (t) and substitute back into x′ = Ax to get

(AK1 − λ1 K1 )teλ1 t + (AP − λ1 P − K1 )eλ1 t = 0.

Since the above equation must hold for all t, it follows that

(A − λ1 I)K1 = 0, (3.38)

and
(A − λ1 I)P = K1 . (3.39)
Equation (3.38) reaffirms that K₁ is the eigenvector of A associated with the eigenvalue λ1. Thus, we obtained one solution φ₁(t) = K₁e^{λ₁t}. To find the second solution given by (3.37), we must solve for the vector P in (3.39). To find a second solution for (3.36), we let \(P = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}\). Then from equation (3.39) we have (A + 3I)P = K₁, which implies that 6p1 − 18p2 = 3, or 2p1 − 6p2 = 1. Since these two equations are equivalent, we may choose p1 = 1 and find p2 = 1/6. However, for simplicity, we shall choose p1 = 1/2 so that p2 = 0. Using (3.37), we find that
\[
\varphi_2(t) = \begin{pmatrix} 3 \\ 1 \end{pmatrix} t e^{-3t} + \begin{pmatrix} 1/2 \\ 0 \end{pmatrix} e^{-3t}.
\]
Finally, the general solution is

X = c1 φ1 (t) + c2 φ2 (t).
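The generalized-eigenvector computation above can also be verified in a few lines. A minimal NumPy sketch (not part of the original text), using the matrix of system (3.36):

```python
import numpy as np

A = np.array([[3.0, -18.0],
              [2.0, -9.0]])

lam = -3.0                        # repeated eigenvalue
B = A - lam * np.eye(2)           # = A + 3I

K1 = np.array([3.0, 1.0])         # eigenvector found above
P = np.array([0.5, 0.0])          # the choice p1 = 1/2, p2 = 0

assert np.allclose(B @ K1, 0.0)   # K1 satisfies (A + 3I)K1 = 0
assert np.allclose(B @ P, K1)     # P satisfies (A + 3I)P = K1, eq. (3.39)
```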

3.6.1 Exercises
Exercise 3.51 Find the eigenvalues and the corresponding eigenvectors.
\[
\text{(a) } \begin{pmatrix} 2 & 5 \\ 3 & 8 \end{pmatrix}, \quad
\text{(b) } \begin{pmatrix} 3 & -1 & 0 \\ 4 & 0 & 0 \\ 2 & 5 & -3 \end{pmatrix}, \quad
\text{(c) } \begin{pmatrix} 13 & -3 & 5 \\ 0 & 4 & 0 \\ -15 & 9 & -7 \end{pmatrix}.
\]
Exercise 3.52 Show that if A is an n × n matrix with det(A) ̸= 0, then all of its eigen-
values are different from zero.
Exercise 3.53 Show that if A is an n × n matrix with eigenvalues λi , i = 1, 2, . . . , n,
then the eigenvalues of A2 are λi2 , i = 1, 2, . . . , n.
Exercise 3.54 Solve the following systems of differential equations.
\[
\text{(a) } x' = \begin{pmatrix} 5 & -1 & 0 \\ 0 & -5 & 9 \\ 5 & -1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
\text{(b) } x' = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\]
\[
\text{(c) } x' = \begin{pmatrix} 3 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
\text{(d) } x' = \begin{pmatrix} -4 & 2 \\ -\frac{5}{2} & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
\]
Exercise 3.55 Let A and B be two square matrices with AB = BA. Let λ be an eigen-
value of A with corresponding eigenvector k. If Bk ̸= 0, show that Bk is an eigenvector
of A, with eigenvalue λ .
Exercise 3.56 Let A be an n × n matrix. Show that if the sum of all entries of each
column is r, then r is an eigenvalue of A.
Exercise 3.57 Let A be a non-zero n × n matrix. Show that if AT = λ A, then λ = ±1.
Exercise 3.58 Solve
\[
\text{(a) } x' = \begin{pmatrix} 1 & -2 & 2 \\ -2 & 1 & -2 \\ 2 & -2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
\text{(b) } x' = \begin{pmatrix} 3 & -18 \\ 2 & -9 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.
\]

3.7 Inner Product Spaces


In this section we introduce inner product spaces, normed vector spaces and orthog-
onality of vectors. We begin with the following definition.

Definition 3.18 (Inner product) If for any vectors u, v, and w in a vector space V and
a scalar a ∈ R we can define an inner (or scalar) product (u, v) such that
1. (u, v) = (v, u),
2. (u, v + w) = (u, v) + (u, w),
3. (au, v) = a(u, v),
4. (u, u) ≥ 0, and (u, u) = 0 if and only if u = 0,

then V is called an inner product space.


Example 3.29 Let V be a finite dimensional vector space. For
u = (u1, u2, ..., un) ∈ V, v = (v1, v2, ..., vn) ∈ V,
we define
\[
(u, v) = u \cdot v = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n = \sum_{i=1}^{n} u_i v_i. \tag{3.40}
\]
Clearly, (3.40) satisfies 1.–4. For the purpose of illustration, we quickly go over the verifications. Now
\[
(u, v) = \sum_{i=1}^{n} u_i v_i = \sum_{i=1}^{n} v_i u_i = (v, u).
\]
This verifies 1. As for 2. we let w = (w1 , w2 , . . . , wn ) ∈ V. Then
(u, v + w) = (u1 , u2 , . . . , un ) · (v1 + w1 , v2 + w2 , . . . , vn + wn )
= u1 (v1 + w1 ) + u2 (v2 + w2 ) + . . . + un (vn + wn )
= u1 v1 + u2 v2 + . . . un vn + u1 w1 + u2 w2 + . . . + un wn
= (u, v) + (u, w).
On the other hand,
(au, v) = (au1 , au2 , . . . , aun ) · (v1 , v2 , . . . , vn )
= au1 v1 + au2 v2 + . . . + aun vn
= a(u1 v1 + u2 v2 + . . . + un vn )
= a(u, v).
This verifies 3. For verifying 4., we see that
\[
(u, u) = u_1^2 + u_2^2 + \ldots + u_n^2 = 0
\]
if and only if u1 = u2 = ... = un = 0. Thus, (u, u) > 0 if and only if u ≠ 0. We conclude that (3.40) defines an inner product. We note that if u and v are two column vectors in Rⁿ, then
\[
(u, v) = u^T v = v^T u = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n. \quad \square
\]


For the next example we define the space C0 [a, b] to be the set of all continuous
functions f : [a, b] → R.
Example 3.30 Consider the vector space C⁰[a, b]. Let f, g ∈ C⁰[a, b]. If we define
\[
(f, g) = \int_a^b f(x) g(x)\, dx, \tag{3.41}
\]
then C⁰[a, b] is an inner product space. □


Definition 3.19 (Norm) A linear space is said to have a norm on it if there is a rule that uniquely determines the size of a given element in the space. In particular, if V is a linear space, then a norm on V is a mapping that associates to each y ∈ V a nonnegative real number denoted by
\[
\|y\| = \sqrt{(y, y)},
\]
called the norm of y, and that satisfies the following conditions:
1. ∥y∥ ≥ 0, and ∥y∥ = 0 if and only if y = 0.
2. ∥αy∥ = |α| ∥y∥ for all y ∈ V, α ∈ R.
3. ∥y + z∥ ≤ ∥y∥ + ∥z∥ for y, z ∈ V (triangle inequality).
Definition 3.20 ( Normed space) A normed linear space is a linear space V on
which there is defined a norm || · ||.
Remark 13 The number ||y|| is interpreted as the magnitude, or size, of y. Thus a norm puts a geometric structure on V.
Lemma 9 (Schwartz inequality) Let V be a normed linear space. Then for y, z ∈ V we have
\[
|(y, z)| \le \|y\|\,\|z\|.
\]
Proof Let λ ∈ R and consider
\[
0 \le \|y + \lambda z\|^2 = (y + \lambda z, y + \lambda z) = (y, y) + 2\lambda(y, z) + \lambda^2(z, z) = \|y\|^2 + 2(y, z)\lambda + \|z\|^2\lambda^2,
\]
which is quadratic in λ and nonnegative for every λ. Hence its discriminant must satisfy
\[
4(y, z)^2 - 4\|y\|^2\|z\|^2 \le 0,
\]
which gives
\[
(y, z)^2 \le \|y\|^2\|z\|^2.
\]
Taking the square root implies the result.



Example 3.31 Let V = Rⁿ. For v = (v1, v2, ..., vn) ∈ V, we claim that
\[
\|v\| = \sqrt{(v, v)} = \sqrt{\sum_{i=1}^{n} v_i^2} \tag{3.42}
\]
defines a norm on Rⁿ. The verifications of 1. and 2. are similar to those in Example 3.29. We verify 3. Let y, z ∈ V. Then by (3.42) we have ||y||² = (y, y). Thus,
\[
0 \le \|y + z\|^2 = (y + z, y + z) = (y, y) + (y, z) + (z, y) + (z, z)
= \|y\|^2 + 2(y, z) + \|z\|^2
\le \|y\|^2 + 2\|y\|\,\|z\| + \|z\|^2 = \big(\|y\| + \|z\|\big)^2,
\]
using Lemma 9 for the last inequality. Taking the square root on both sides, we arrive at
\[
\|y + z\| \le \|y\| + \|z\|.
\]
Thus, Rⁿ is a normed space. □
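Both the Schwartz inequality and the triangle inequality can be tested numerically on random vectors. A minimal sketch (not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.standard_normal(5)
    z = rng.standard_normal(5)
    # Schwartz inequality and triangle inequality on R^5
    assert abs(y @ z) <= np.linalg.norm(y) * np.linalg.norm(z) + 1e-12
    assert np.linalg.norm(y + z) <= np.linalg.norm(y) + np.linalg.norm(z) + 1e-12
```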


Definition 3.21 i) A set of vectors {v_i}_{i=1}^{n} is called orthogonal if
\[
(v_i, v_j) = 0 \quad \text{for } i \neq j.
\]
ii) A matrix A is called orthogonal if
\[
A^T A = I, \quad \text{or} \quad A^{-1} = A^T.
\]
iii) A set of vectors S = {v_i}_{i=1}^{n} is called orthonormal if every vector in S has magnitude 1 and the vectors are mutually orthogonal; that is,
\[
(v_i, v_j) = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}
\]
For example, the matrix
\[
A = \frac{1}{3}\begin{pmatrix} 2 & -2 & 1 \\ 1 & 2 & 2 \\ 2 & 1 & -2 \end{pmatrix}
\]
is orthogonal since AAᵀ = I, or A⁻¹ = Aᵀ. Now we present an elementary review of complex numbers.
Definition 3.22 A complex number z is any expression of the form
\[
z = a + ib = (a, b), \quad \text{where } i^2 = -1.
\]


The real numbers a and b are called the real and imaginary parts of z, respectively.
The set of all complex numbers is denoted by C. The number z̄ = a − ib is called the
conjugate of z.
By placing a on the x-axis and b on the y-axis, we can interpret z = a + ib as a vector from the origin terminating at (a, b). The length of this vector is called the modulus or magnitude of z and is denoted by
\[
\|z\| = \sqrt{a^2 + b^2}.
\]
Notice that
\[
\|z\|^2 = a^2 + b^2 = z\bar{z}.
\]
Next, we state two of the most important characteristics of symmetric matri-
ces.
Theorem 3.16 Suppose A is an n × n real symmetric matrix.
a) If λ1 and λ2 are two distinct eigenvalues of A, then their corresponding eigen-
vectors y1 and y2 are orthogonal. That is

(y1 , y2 ) = 0.

b) All the eigenvalues of A are real.

Proof a) Let λ1 and λ2 be two distinct eigenvalues of A, with corresponding eigen-


vectors y1 and y2 . This implies that

Ay1 = λ1 y1 , Ay2 = λ2 y2 . (3.43)

Multiplying the transpose of the first equation in (3.43) from the right by y2 we
get
(Ay1 )T y2 = λ1 yT1 y2 ,
or
yT1 AT y2 = λ1 yT1 y2 .
Multiplying the second equation in (3.43) from the left by yT1 , we obtain

yT1 Ay2 = λ2 yT1 y2 .

Subtracting the last two expressions yields

yT1 Ay2 − yT1 AT y2 = λ2 yT1 y2 − λ1 yT1 y2 .

This results into,


0 = (λ2 − λ1 )yT1 y2 .
Since λ1 ̸= λ2 , we must have

yT1 y2 = (y1 , y2 ) = 0.

This proves the first part. As for the second part, suppose λ is a complex eigenvalue of the symmetric matrix A with a possibly complex eigenvector v such that Av = λv. We take the complex conjugate on both sides of the preceding equation and obtain \(\overline{Av} = \overline{\lambda v}\). Since A is real, this implies that \(A\bar{v} = \bar{\lambda}\bar{v}\). Using Aᵀ = A, we have the following manipulation:
\[
\bar{v}^T A v = \bar{v}^T(Av) = \bar{v}^T(\lambda v) = \lambda(\bar{v}, v).
\]
Similarly,
\[
\bar{v}^T A v = (A\bar{v})^T v = (\bar{\lambda}\bar{v})^T v = \bar{\lambda}(\bar{v}, v).
\]
Subtracting the above two expressions, we obtain

0 = (λ̄ − λ )(v̄, v).

Since (v̄, v) > 0, we must have (λ̄ − λ ) = 0, or λ̄ = λ . This completes the


proof.

Next, we describe the Gram-Schmidt process, a procedure that converts a set of linearly independent vectors into a set of orthonormal vectors that spans the same space as the original set.
Theorem 3.17 (Gram-Schmidt Process) Suppose the vectors u1, u2, ..., un form a basis for a vector space V. Then, from the vectors ui, i = 1, 2, ..., n, we can form an orthonormal basis x1, x2, ..., xn for V.

Proof We first let v1 = u1 and take
\[
x_1 = \frac{v_1}{\|v_1\|} \Big( = \frac{u_1}{\|u_1\|} \Big).
\]
Then
\[
(x_1, x_1) = \Big( \frac{u_1}{\|u_1\|}, \frac{u_1}{\|u_1\|} \Big) = \frac{1}{\|u_1\|^2}(u_1, u_1) = \frac{\|u_1\|^2}{\|u_1\|^2} = 1.
\]
Next we seek x2 such that ∥x2∥ = 1 and (x1, x2) = 0. For a constant c to be determined, we set
\[
v_2 = u_2 - c x_1.
\]
Then
\[
(x_1, v_2) = (x_1, u_2 - c x_1) = (x_1, u_2) - c(x_1, x_1) = (x_1, u_2) - c,
\]
and we want this to equal 0. This implies c = (x1, u2), which is the scalar component of u2 in the direction of x1. Thus,
\[
v_2 = u_2 - (x_1, u_2)x_1.
\]
So we take
\[
x_2 = \frac{v_2}{\|v_2\|}.
\]

In a similar fashion, we set
\[
v_3 = u_3 - c_1 x_1 - c_2 x_2.
\]
Then we want
\[
(x_1, v_3) = (x_1, u_3) - c_1(x_1, x_1) - c_2(x_1, x_2) = 0.
\]
Since (x1, x2) = 0, the above expression implies that c1 = (x1, u3). Also, requiring
\[
(x_2, v_3) = (x_2, u_3) - c_1(x_2, x_1) - c_2(x_2, x_2) = 0
\]
implies c2 = (x2, u3). Thus,
\[
v_3 = u_3 - (x_1, u_3)x_1 - (x_2, u_3)x_2.
\]
So we take
\[
x_3 = \frac{v_3}{\|v_3\|}.
\]
Continuing in this process, we obtain a general formula for all of the vectors, given by
\[
x_j = \frac{v_j}{\|v_j\|}, \quad \text{where } v_j = u_j - \sum_{i=1}^{j-1} (x_i, u_j)\, x_i. \tag{3.44}
\]
It is left to show that v_j ≠ 0 for all j = 1, 2, ..., n. If v_j = 0 for some j, then u_j is a linear combination of u1, ..., u_{j−1}. This is impossible since u1, u2, ..., un are linearly independent. This completes the proof.
Example 3.32 Find an orthonormal basis for the subspace of R⁴ spanned by the vectors
\[
u_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad
u_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \quad
u_3 = \begin{pmatrix} 0 \\ -1 \\ 2 \\ 1 \end{pmatrix}.
\]
According to (3.44), we have
\[
x_1 = \frac{u_1}{\|u_1\|} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\]
and x2 = v2/∥v2∥, where
\[
v_2 = u_2 - (x_1, u_2)x_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix} - \frac{1}{4}(0)\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}.
\]
Thus,
\[
x_2 = \frac{v_2}{\|v_2\|} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}.
\]
Similarly, x3 = v3/∥v3∥, where
\[
v_3 = u_3 - (x_1, u_3)x_1 - (x_2, u_3)x_2.
\]
From this we get
\[
v_3 = \begin{pmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{pmatrix}, \quad \text{and since } \|v_3\| = 1, \quad
x_3 = \frac{v_3}{\|v_3\|} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{pmatrix}. \quad \square
\]
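Formula (3.44) translates directly into code. Below is a minimal NumPy sketch (not part of the original text) of the Gram-Schmidt process, applied to the three vectors of Example 3.32:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors using (3.44):
    v_j = u_j - sum_{i<j} (x_i, u_j) x_i, then x_j = v_j/||v_j||."""
    basis = []
    for u in vectors:
        v = u - sum((x @ u) * x for x in basis)
        basis.append(v / np.linalg.norm(v))
    return basis

u1 = np.array([1.0, 1.0, 1.0, 1.0])
u2 = np.array([1.0, 1.0, -1.0, -1.0])
u3 = np.array([0.0, -1.0, 2.0, 1.0])

for x in gram_schmidt([u1, u2, u3]):
    print(x)  # (1,1,1,1)/2, (1,1,-1,-1)/2, (1,-1,1,-1)/2
```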
In the next section, we define the diagonalization of matrices. To do so, we must first
define similarity, a crucial concept in linear algebra. We would like to begin classi-
fying matrices at this point in our work. How can we determine whether matrices A
and B are similar in type, or, to put it another way, whether they are comparable? We
begin with the following definition.
Definition 3.23 Let A and B be two n × n matrices. We say A is similar to B if there
exists an invertible matrix P such that A = P−1 BP.
Note that if A is similar to B then B is similar to A (see Exercise 3.72). Thus, we may
say A and B are similar matrices, or simply, similar. We have the following theorem
regarding similar matrices.
Theorem 3.18 Suppose A and B are similar. Then the following is true.
(a) det(A) = det(B).
(b) rank(A) = rank(B).
(c) A and B have the same eigenvalues.

Proof We only prove part (c). For parts (a) and (b), see Exercise 3.74. Since A and
B are similar, there exists a nonsingular matrix P such that A = PBP−1 . Using I =
PP−1 , we have

\[
\det(A - \lambda I) = \det(A - \lambda PP^{-1}) = \det(PBP^{-1} - \lambda PP^{-1}) = \det\big( P(B - \lambda I)P^{-1} \big)
\]
\[
= \det(P)\det(B - \lambda I)\det(P^{-1}) = \det(P)\det(B - \lambda I)\frac{1}{\det(P)} = \det(B - \lambda I),
\]
so that A and B share the same characteristic polynomial, and hence the same eigenvalues. This concludes the proof.

3.7.1 Exercises
Exercise 3.59 Verify C0 [a, b] of Example 3.30 is an inner product space.
Exercise 3.60 Let V = Rⁿ and for y ∈ V show that
\[
\|y\| = \max_{1 \le i \le n} |y_i|
\]
defines a norm.
Exercise 3.61 For f ∈ C⁰([a, b]), we define
\[
\|f\|_M = \max_{a \le x \le b} |f(x)| \quad \text{(maximum norm)}
\]
and
\[
\|f\|_1 = \int_a^b |f(x)|\, dx.
\]
Show that ∥f∥_M and ∥f∥₁ define norms on C⁰([a, b]).
Exercise 3.62 Show that every finite-dimensional inner product space has an or-
thonormal basis.
Exercise 3.63 Show that every orthonormal list of vectors in V can be extended to an orthonormal basis of V.
Exercise 3.64 Use the inner product defined by (3.41) to find all values of a so that
the two functions
f (x) = ax, g(x) = x2 − ax + 2
are orthogonal on [0, 1].
Exercise 3.65 Show the two vectors
\[
u_1 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}
\]
are orthogonal but not orthonormal. Use u1, u2 to form two vectors v1 and v2 that are orthonormal and span the same space.
Exercise 3.66 Suppose the vectors u1 , u2 , . . . , un are orthogonal to the vector y. Then
show that any vector in the span(u1 , u2 , . . . , un ) is orthogonal to y.

Exercise 3.67 For any two vectors u and v in a vector space V, show that
\[
\frac{1}{2}\big( \|u + v\|^2 + \|u - v\|^2 \big) = \|u\|^2 + \|v\|^2.
\]
Exercise 3.68 Let w : R → (0, ∞) be continuous. Show that for any two polynomials f and g of degree n,
\[
(f, g) = \int_a^b f(x)g(x)w(x)\, dx
\]
defines an inner product. This is called a "weighted inner product."
Exercise 3.69 Consider the vector space ℂ over R. Show that if z, w ∈ ℂ, then
\[
(z, w) = \frac{1}{2}(z\bar{w} + w\bar{z})
\]
is an inner product.
Exercise 3.70 Use the inner product defined by (3.41) to show that the set of functions
\[
\{f_n(x)\}_{n=1}^{\infty} = \Big\{ \frac{\sin(n\pi\ln(x))}{\sqrt{x}} \Big\}, \quad x \in [1, e],
\]
is orthogonal.
Exercise 3.71 Apply the Gram-Schmidt process to the following vectors.
\[
\text{(a) } u_1 = \begin{pmatrix} 5 \\ -2 \\ 4 \end{pmatrix}, \; u_2 = \begin{pmatrix} 3 \\ -1 \\ 7 \end{pmatrix}, \; u_3 = \begin{pmatrix} 3 \\ -3 \\ 6 \end{pmatrix};
\]
\[
\text{(b) } u_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 1 \end{pmatrix}, \; u_2 = \begin{pmatrix} 1 \\ -2 \\ 0 \\ 0 \end{pmatrix}, \; u_3 = \begin{pmatrix} 1 \\ 0 \\ -1 \\ 2 \end{pmatrix};
\]
\[
\text{(c) } u_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \; u_2 = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}.
\]
Exercise 3.72 Consider the three n × n matrices A, B, and C. Show that
(a) A is similar to A. (Reflexive)
(b) If A is similar to B, then B is similar to A. (Symmetric)
(c) If A is similar to B, and B is similar to C, then A is similar to C. (Transitive)

Exercise 3.73 Find the matrix A that is similar to the matrix B given that
\[
B = \begin{pmatrix} -13 & -8 & -4 \\ 12 & 7 & 4 \\ 24 & 16 & 7 \end{pmatrix}, \quad
P = \begin{pmatrix} 1 & 1 & 2 \\ -2 & -1 & -3 \\ 1 & -2 & 0 \end{pmatrix}.
\]

Exercise 3.74 Prove parts (a) and (b) of Theorem 3.18.

3.8 Diagonalization
In this section, we look at the concept of matrix diagonalization, which is the process of transforming a matrix so as to recover a similar matrix that is diagonal. Once a matrix is diagonalized, it becomes very easy to raise it to integer powers. We begin with the following definition.
Definition 3.24 Let A be an n × n matrix. We say that A is diagonalizable if there
exists an invertible matrix P such that

D = P−1 AP

where D is a diagonal matrix.


Theorem 3.19 Let A be an n × n matrix. The following are equivalent.
(a) The matrix A is diagonalizable.
(b) The matrix A has n linearly independent eigenvectors.

Proof Assume (a) and consider the invertible matrix
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{pmatrix}.
\]
Then the relation D = P⁻¹AP implies that AP = PD, where
\[
D = \begin{pmatrix}
k_1 & 0 & \cdots & 0 \\
0 & k_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & k_n
\end{pmatrix}.
\]

Thus, if we denote the columns of the matrix P by p1, p2, ..., pn, then
\[
AP = PD = \begin{pmatrix}
k_1 p_{11} & k_2 p_{12} & \cdots & k_n p_{1n} \\
k_1 p_{21} & k_2 p_{22} & \cdots & k_n p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
k_1 p_{n1} & k_2 p_{n2} & \cdots & k_n p_{nn}
\end{pmatrix}
\]
yields
\[
Ap_1 = k_1 p_1, \quad Ap_2 = k_2 p_2, \; \ldots, \; Ap_n = k_n p_n,
\]
where Ap_i = k_i p_i, i = 1, 2, ..., n, are the successive columns of AP. Since P is invertible, each of its column vectors is nonzero. Thus the above relation implies that k1, k2, ..., kn are eigenvalues of A with corresponding eigenvectors p1, p2, ..., pn. Since P is invertible, it follows from Corollary 5 that p1, p2, ..., pn are linearly independent eigenvectors.
As for (b) implying (a), we assume p1, p2, ..., pn are linearly independent eigenvectors with corresponding eigenvalues k1, k2, ..., kn. Let the matrix P be given as in the proof of part (a). Then the product AP has columns Ap_i, i = 1, 2, ..., n. But Ap_i = k_i p_i, i = 1, 2, ..., n, and this translates into AP = PD, where D is the diagonal matrix having as its diagonal entries the eigenvalues k1, k2, ..., kn. The matrix P is invertible, since its column vectors are linearly independent. Thus, the relation AP = PD implies that D = P⁻¹AP. This completes the proof.

Note that not every matrix is diagonalizable. To see this, consider the matrix
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]
Then 0 is its only eigenvalue, so if A were diagonalizable we would have D = 0 and hence A = PDP⁻¹ = 0; but A is not the zero matrix.
In summary, to diagonalize a matrix, one should perform the following steps:
(1) Compute the eigenvalues of A and the corresponding n linearly independent
eigenvectors.
(2) Form the matrix P by taking its columns to be the eigenvectors found in step (1).
(3) The diagonalization is done and given by D = P−1 AP.

Example 3.33 Consider the matrix
\[
A = \begin{pmatrix} 5 & 2 & 3 \\ 0 & 8 & 3 \\ 0 & 0 & 4 \end{pmatrix}.
\]
Expanding the determinant of A − λI along the first row we obtain the third degree equation
\[
(5 - \lambda)(8 - \lambda)(4 - \lambda) = 0,
\]
which has the three distinct eigenvalues λ1 = 5, λ2 = 8, and λ3 = 4. The corresponding eigenvectors are
\[
p_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad p_2 = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}, \quad p_3 = \begin{pmatrix} 6 \\ 3 \\ -4 \end{pmatrix}.
\]
Thus,
\[
P = \begin{pmatrix} 1 & 2 & 6 \\ 0 & 3 & 3 \\ 0 & 0 & -4 \end{pmatrix}.
\]
One can easily check that
\[
P^{-1}AP = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 4 \end{pmatrix}.
\]

Note that the diagonalization of a matrix is not unique, since one may reorder the eigenvalues and correspondingly permute the columns of the matrix P.
As we have said before, one of the most important applications of diagonalization is the computation of matrix powers. Suppose the matrix A is diagonalizable. Then there exists a matrix P such that D = P⁻¹AP, or PDP⁻¹ = A, where
\[
D = \begin{pmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & d_{22} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & d_{nn}
\end{pmatrix}.
\]
Then for a positive integer k we have
\[
A^k = \underbrace{PDP^{-1} \cdot PDP^{-1} \cdots PDP^{-1}}_{k\text{-times}} = PD^kP^{-1},
\]
where
\[
D^k = \begin{pmatrix}
d_{11}^k & 0 & \cdots & 0 \\
0 & d_{22}^k & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & d_{nn}^k
\end{pmatrix}.
\]

Another advantage is that once a matrix is diagonalized, it is easy to find its inverse if it has one. To see this, let PDP⁻¹ = A. Then A⁻¹ = (PDP⁻¹)⁻¹ = PD⁻¹P⁻¹, where
\[
D^{-1} = \begin{pmatrix}
\frac{1}{d_{11}} & 0 & \cdots & 0 \\
0 & \frac{1}{d_{22}} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \frac{1}{d_{nn}}
\end{pmatrix}.
\]

Example 3.34 Consider the 2 × 2 matrix
\[
A = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}.
\]
Then the eigenpairs are
\[
\lambda_1 = -1, \; p_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}; \qquad \lambda_2 = 5, \; p_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
Hence,
\[
P = \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix}, \quad \text{and} \quad P^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}.
\]
Finally, for a positive integer k, we have
\[
A^k = PD^kP^{-1} = \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} (-1)^k & 0 \\ 0 & 5^k \end{pmatrix}\frac{1}{3}\begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 2(-1)^k + 5^k & -(-1)^k + 5^k \\ -2(-1)^k + 2\cdot 5^k & (-1)^k + 2\cdot 5^k \end{pmatrix}.
\]
In particular, for k = 100, we have
\[
A^{100} = \frac{1}{3}\begin{pmatrix} 2 + 5^{100} & -1 + 5^{100} \\ -2 + 2\cdot 5^{100} & 1 + 2\cdot 5^{100} \end{pmatrix}.
\]
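The closed-form expression for Aᵏ can be validated numerically for small k. A minimal NumPy sketch (not part of the original text), using the matrix of this example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 3.0]])

vals, P = np.linalg.eig(A)            # columns of P are eigenvectors
D = np.diag(vals)                     # D = P^{-1} A P
assert np.allclose(P @ D @ np.linalg.inv(P), A)

k = 7
Ak = P @ np.diag(vals**k) @ np.linalg.inv(P)
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```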


Definition 3.25 Let A be an n × n matrix. We say that A is orthogonally diagonaliz-
able if there exists an orthogonal matrix P such that

D = P−1 AP

where D is a diagonal matrix. The matrix P is said to orthogonally diagonalize A.


Theorem 3.20 Let A be an n × n matrix. The following are equivalent.
(a) The matrix A is orthogonally diagonalizable.
(b) The matrix A has set of n orthonormal eigenvectors.
(c) The matrix A is symmetric.

The proof follows along the lines of Theorem 3.19. The only change is to apply the Gram-Schmidt process to obtain an orthonormal basis for each eigenspace, and then form P whose columns are these orthonormal basis vectors. This matrix orthogonally diagonalizes A. Note that the proof that (c) implies (a) is a bit demanding, and we refer to [8]. As for the proof that (a) implies (c), we have D = P⁻¹AP, or A = PDP⁻¹. Since P is orthogonal, we have A = PDPᵀ. Therefore,
\[
A^T = (PDP^T)^T = PD^TP^T = PDP^T = A.
\]

Example 3.35 Consider the matrix
\[
A = \begin{pmatrix} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{pmatrix}.
\]
Expanding the determinant of A − λI along the first row we obtain the cubic equation
\[
\lambda(\lambda - 9)^2 = 0,
\]
which has the eigenvalue λ1 = 0 of multiplicity one and the eigenvalue λ2 = 9 of multiplicity two. The corresponding eigenpairs are
\[
\lambda_1 = 0, \; p_1 = \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}; \qquad
\lambda_2 = 9, \; p_2 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}, \; p_3 = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}.
\]
By applying the Gram-Schmidt process we obtain the orthonormal basis
\[
x_1 = \frac{1}{3}\begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix}, \quad
x_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}, \quad
x_3 = \frac{1}{3\sqrt{5}}\begin{pmatrix} 2 \\ 4 \\ 5 \end{pmatrix}.
\]
Thus,
\[
P = \begin{pmatrix}
\frac{1}{3} & -\frac{2}{\sqrt{5}} & \frac{2}{3\sqrt{5}} \\
\frac{2}{3} & \frac{1}{\sqrt{5}} & \frac{4}{3\sqrt{5}} \\
-\frac{2}{3} & 0 & \frac{5}{3\sqrt{5}}
\end{pmatrix}.
\]

3.8.1 Exercises
Exercise 3.75 Diagonalize each of the following matrices and find \(A^{100}\).
\[
\text{(a) } A = \begin{pmatrix} 1 & 4 \\ 4 & 3 \end{pmatrix}, \quad
\text{(b) } A = \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix}, \quad
\text{(c) } A = \begin{pmatrix} 5 & -3 \\ -6 & 2 \end{pmatrix}.
\]
Exercise 3.76 Diagonalize each of the following matrices.
\[
\text{(a) } A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 4 \end{pmatrix}, \quad
\text{(b) } A = \begin{pmatrix} 5 & 0 & 0 \\ 2 & 6 & 0 \\ 3 & 2 & 1 \end{pmatrix}.
\]
 
Exercise 3.77 Explain why the matrix
\[
A = \begin{pmatrix} 2 & 4 & 6 \\ 0 & 2 & 2 \\ 0 & 0 & 4 \end{pmatrix}
\]
is not diagonalizable.
Exercise 3.78 Show that if B is diagonalizable and invertible, then so is B−1 .
Exercise 3.79 Let
\[
P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}.
\]
Show that:
(a) P is diagonalizable if (p11 − p22 )2 + 4p12 p21 > 0.
(b) P is not diagonalizable if (p11 − p22 )2 + 4p12 p21 < 0.
Exercise 3.80 Show that if A and B are orthogonal matrices, then AB is also orthog-
onal.
Exercise 3.81 Find the matrix P that orthogonally diagonalizes each of the following matrices.
\[
\text{(a) } A = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1 & 1 \\ 1 & -1 & 1 \end{pmatrix}, \quad
\text{(b) } A = \begin{pmatrix} 3 & 2 & 6 \\ -6 & 3 & 2 \\ 2 & 6 & -3 \end{pmatrix}.
\]
Exercise 3.82 Show that if P is orthogonal, then aP is orthogonal if and only if a = 1
or a = −1.

3.9 Quadratic Forms


In this section, we examine the concepts of quadratic forms and their applications to
diagonalizing matrices. We begin with the following definition.
Definition 3.26 A general quadratic form with n variables is of the form
\[
Q(x_1, x_2, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j, \tag{3.45}
\]
where the a_{ij} are constants.


It will be more convenient at times to write (3.45) in matrix forms as we do next. Let
x be an n × 1 column vector and A be a symmetric matrix, then (3.45) is equivalent
to
Q = xT Ax, (3.46)
Quadratic Forms 183

where    
x1 a11 a12 ··· a1n
x2  a21 a22 ··· a2n 
x =  .  and A =  . ..  .
   
. .. ..
.  .. . . . 
xn an1 an2 ··· ann

Example 3.36 Consider the quadratic form
\[
Q(x_1, x_2) = 3x_1^2 - 8x_1x_2 + x_2^2.
\]
Then Q can be represented by any of the following 2 × 2 matrices:
\[
\begin{pmatrix} 3 & -2 \\ -6 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 3 & -1 \\ -7 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 3 & -4 \\ -4 & 1 \end{pmatrix}.
\]
However, only the third matrix among them is symmetric. □


Recall (3.8) says, generally, one can find symmetrization R of a square matrix A
by
1
R = (A + AT ).
2
We state this fact as a theorem.
Theorem 3.21 Any quadratic form can be represented by a symmetric matrix.
Proof Let A = (a_{ij}). If a_{ij} ≠ a_{ji}, then replace those entries with r_{ij} = (a_{ij} + a_{ji})/2. Then it is evident that r_{ij} = r_{ji}, and this does not change the corresponding quadratic form. This completes the proof.
Example 3.37 Find the symmetric matrix that corresponds to the quadratic form
\[
Q(x_1, x_2, x_3) = -x_1^2 + 3x_2^2 + x_3^2 - 2x_1x_2 + 4x_2x_3 + 7x_1x_3.
\]
The matrix is
\[
A = \begin{pmatrix} -1 & -1 & 7/2 \\ -1 & 3 & 2 \\ 7/2 & 2 & 1 \end{pmatrix}.
\]
It is clear that Aᵀ = A and xᵀAx = Q. □
Definition 3.27 Let x ∈ Rn and suppose A is an n × n constant symmetric matrix.
Then the quadratic form
Q(x) = xT Ax
is
(a) positive definite if Q(x) > 0, for all x ̸= 0,
(b) negative definite if Q(x) < 0, for all x ̸= 0,

(c) positive semidefinite if Q(x) ≥ 0, for all x ≠ 0,


(d) negative semidefinite if Q(x) ≤ 0, for all x ̸= 0,
(e) indefinite if Q(x) changes sign.
(f) If Q is positive definite, then the symmetric matrix A is called positive definite
matrix.
Example 3.38 Let
\[
A = \begin{pmatrix}
k_1 & 0 & \cdots & 0 \\
0 & k_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & k_n
\end{pmatrix}
\quad \text{and} \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
\]
Then
\[
x^T A x = k_1 x_1^2 + k_2 x_2^2 + \ldots + k_n x_n^2.
\]
Thus, for an n × n diagonal matrix A, the quadratic form Q(x) is positive semidefinite provided that k_i ≥ 0 for all i = 1, 2, ..., n, and positive definite provided that k_i > 0 for all i = 1, 2, ..., n. □
 
Theorem 3.22 Let \(A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}\) and consider the quadratic form
\[
Q(x, y) = (x, y)\,A\begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + cy^2.
\]
(a) Q is positive definite if and only if a > 0 and ac − b² > 0.
(b) Q is negative definite if and only if a < 0 and ac − b² > 0.
(c) Q is indefinite if a > 0 and ac − b² < 0.
(d) Q is indefinite if a < 0 and ac − b² < 0.
The proof follows immediately from the fact that, for a ≠ 0,
\[
Q(x, y) = a\Big( x + \frac{b}{a}y \Big)^2 + \frac{ac - b^2}{a}\, y^2.
\]
Next we turn our attention to the characterization of eigenvalues of matrices that are
symmetric.

Theorem 3.23 Let A be an n × n symmetric matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$. Let
$$S^{n-1} = \{x \in \mathbb{R}^n : \|x\| = \sqrt{(x,x)} = 1\}.$$
Then,
(a) $\lambda_n \le x^T A x \le \lambda_1$ for all $x \in S^{n-1}$.
(b) Let $y_1, y_2 \in S^{n-1}$. If $y_1$ is an eigenvector corresponding to $\lambda_1$, then $y_1^T A y_1 = \lambda_1$. Similarly, if $y_2$ is an eigenvector corresponding to $\lambda_n$, then $y_2^T A y_2 = \lambda_n$.
The proof of the theorem is based on the fact that
$$x^T A x = (Ax, x) = (x, Ax).$$
Note that $S^{n-1}$ denotes the unit (n − 1)-dimensional sphere in $\mathbb{R}^n$. Moreover, since the set $S^{n-1}$ is closed and bounded, continuous functions on $S^{n-1}$ attain their maximum and minimum values. Thus, if $x \in S^{n-1}$, then the maximum and minimum of the quadratic form $Q = x^T A x$ can be easily computed using Theorem 3.23, as the next example shows.
Example 3.39 Consider the quadratic form
$$Q(x_1, x_2) = (x_1\ x_2)\begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1x_2.$$
The eigenvalues of the matrix $A = \begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}$ are $\lambda_1 = \frac{1}{2}$ and $\lambda_2 = -\frac{1}{2}$. The eigenpairs are
$$\lambda_1 = 1/2,\ v_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}; \qquad \lambda_2 = -1/2,\ v_2 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}.$$
Thus, the maximum $\lambda_1 = 1/2$ of Q occurs at $\pm v_1$, and the minimum $\lambda_2 = -1/2$ of Q occurs at $\pm v_2$. In fact, one may use Lagrange multipliers to extremize the function $f(x_1, x_2) = x_1x_2$ subject to the constraint $g(x_1, x_2) = x_1^2 + x_2^2 - 1$. □
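The bounds of Theorem 3.23 are easy to confirm numerically for this example. The following sketch (Python/NumPy, an illustrative addition to the text) computes the eigenvalues of A and checks that $x^T A x$ stays between them on random unit vectors:

```python
import numpy as np

A = np.array([[0.0, 0.5],
              [0.5, 0.0]])
lams, vecs = np.linalg.eigh(A)   # eigh handles symmetric matrices; ascending order
print(lams)                      # [-0.5, 0.5]

# Sample random unit vectors and check  lam_min <= x^T A x <= lam_max
rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 2))
x /= np.linalg.norm(x, axis=1, keepdims=True)
q = np.einsum('ni,ij,nj->n', x, A, x)   # Q(x) = x1*x2 for each sample
assert lams[0] - 1e-12 <= q.min() and q.max() <= lams[-1] + 1e-12
```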
Theorem 3.24 If A is a real symmetric matrix, then there exists an orthogonal matrix T such that the transformation $x = T\bar{x}$ will reduce the quadratic form (3.46) to the canonical or diagonal form
$$Q = \lambda_1\bar{x}_1^2 + \lambda_2\bar{x}_2^2 + \ldots + \lambda_n\bar{x}_n^2, \qquad (3.47)$$
where $\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)^T$, and $\lambda_i$, i = 1, 2, . . . , n, are the eigenvalues of A.

Proof The proof is a direct consequence of Theorems 3.19 and 3.20. Let T be the orthogonal matrix P in Theorem 3.20 and assume $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the symmetric matrix A. Let the columns of T be the obtained orthonormal vectors $\frac{y_i}{\|y_i\|}$, i = 1, 2, . . . , n. Then we have
$$T = \Big(\frac{y_1}{\|y_1\|}\ \ \frac{y_2}{\|y_2\|}\ \cdots\ \frac{y_n}{\|y_n\|}\Big).$$
As a consequence,
$$AT = \Big(A\frac{y_1}{\|y_1\|}\ \ A\frac{y_2}{\|y_2\|}\ \cdots\ A\frac{y_n}{\|y_n\|}\Big) = \Big(\lambda_1\frac{y_1}{\|y_1\|}\ \ \lambda_2\frac{y_2}{\|y_2\|}\ \cdots\ \lambda_n\frac{y_n}{\|y_n\|}\Big).$$
This yields
$$T^T A T = \Big(\frac{y_1}{\|y_1\|}\ \cdots\ \frac{y_n}{\|y_n\|}\Big)^T\Big(\lambda_1\frac{y_1}{\|y_1\|}\ \cdots\ \lambda_n\frac{y_n}{\|y_n\|}\Big) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix} = D.$$
Clearly T is orthogonal, and hence $T^T = T^{-1}$. Let
$$x = T\bar{x}.$$
Then,
$$Q = x^T A x = (T\bar{x})^T A T\bar{x} = \bar{x}^T T^T A T\bar{x} = \bar{x}^T D\bar{x} = \lambda_1\bar{x}_1^2 + \lambda_2\bar{x}_2^2 + \ldots + \lambda_n\bar{x}_n^2.$$
This completes the proof.


Example 3.40 Consider the quadratic form that we wish to put in canonical form,
$$Q(x_1, x_2, x_3) = 3x_1^2 + 2x_2^2 + 3x_3^2 + 2x_1x_3.$$
Now Q is equivalent to
$$Q = x^T A x,$$
where
$$A = \begin{pmatrix} 3 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
It is clear that A is symmetric. The eigenvalues of A satisfy
$$(2-\lambda)^2(\lambda-4) = 0.$$

Thus the eigenvalues are
$$\lambda_1 = \lambda_2 = 2 \quad \text{and} \quad \lambda_3 = 4.$$
Let $K = (k_1, k_2, k_3)^T$. Then using (3.31) we have
$$\begin{aligned} (3-\lambda)k_1 + k_3 &= 0 \\ (2-\lambda)k_2 &= 0 \\ k_1 + (3-\lambda)k_3 &= 0. \end{aligned} \qquad (3.48)$$
Substituting $\lambda_3 = 4$ for λ we get $k_2 = 0$ and $k_1 = k_3$. Letting $k_1 = 1$, we get the corresponding eigenvector
$$y_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$
Next we substitute $\lambda_1 = 2$ for λ and obtain $k_1 = -k_3$, with $k_2$ free. Setting $k_3 = b$ and $k_2 = a$ we arrive at
$$y = \begin{pmatrix} -b \\ a \\ b \end{pmatrix} = a\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + b\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$
By choosing a = 1, b = 0 and then a = 0, b = 1, we arrive at the other two corresponding eigenvectors
$$y_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad y_3 = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$
After normalizing the eigenvectors we form the matrix
$$T = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix},$$
which is orthogonal. Let $\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)^T$. Then
$$x = T\bar{x} = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \bar{x}_3 \end{pmatrix},$$
from which we arrive at
$$x_1 = \frac{1}{\sqrt{2}}\bar{x}_1 - \frac{1}{\sqrt{2}}\bar{x}_3, \qquad x_2 = \bar{x}_2, \qquad x_3 = \frac{1}{\sqrt{2}}\bar{x}_1 + \frac{1}{\sqrt{2}}\bar{x}_3.$$
Substituting $x_1$, $x_2$, and $x_3$ back into $Q(x_1, x_2, x_3)$ confirms that
$$Q(\bar{x}_1, \bar{x}_2, \bar{x}_3) = 4\bar{x}_1^2 + 2\bar{x}_2^2 + 2\bar{x}_3^2,$$
a sum of squares weighted by the eigenvalues $\lambda_3 = 4$ and $\lambda_1 = \lambda_2 = 2$ of A.
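A short numerical sketch (Python/NumPy, not part of the original text) confirms the diagonalization just carried out: with the orthogonal T constructed above, $T^T A T$ should equal diag(4, 2, 2).

```python
import numpy as np

A = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]])

s = 1 / np.sqrt(2)
T = np.array([[  s, 0.0,  -s],   # columns: normalized eigenvectors for 4, 2, 2
              [0.0, 1.0, 0.0],
              [  s, 0.0,   s]])

D = T.T @ A @ T                  # should be diag(4, 2, 2)
print(np.round(D, 12))
```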


The above results extend to cover quadratic forms shifted by a constant, that is, expressions of the form $Q = x^T A x + c$ where c is a constant. Here is an example.
Example 3.41 Consider the quadratic form
$$Q(x_1, x_2, x_3) = 4x_1^2 + 4x_2^2 + 4x_3^2 + 4x_1x_2 + 4x_1x_3 + 4x_2x_3 - 3.$$
Now Q is equivalent to
$$Q = x^T A x - 3,$$
where
$$A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The eigenvalues are
$$\lambda_1 = \lambda_2 = 2 \quad \text{and} \quad \lambda_3 = 8.$$
The corresponding normalized eigenvectors are
$$y_1 = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \qquad y_2 = \begin{pmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{pmatrix}, \qquad y_3 = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix},$$
and
$$T = \begin{pmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix},$$
which is orthogonal. Let $\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)^T$. Then
$$x = T\bar{x}$$
reduces Q to the form
$$Q(\bar{x}_1, \bar{x}_2, \bar{x}_3) = 2\bar{x}_1^2 + 2\bar{x}_2^2 + 8\bar{x}_3^2 - 3.$$


Theorem 3.25 A quadratic form $Q = x^T A x$ is positive definite if and only if all the eigenvalues of A are positive.

Proof Suppose Q is positive definite and let λ be an eigenvalue of the matrix A. If λ = 0, then there is an eigenvector $x \neq 0$ so that $Ax = 0$. But then $Q = x^T A x = 0$, which implies Q is not positive definite. Similarly, if λ < 0, then there is an eigenvector $x \neq 0$ so that $Ax = \lambda x$. But then $Q = x^T A x = \lambda\|x\|^2 < 0$, since λ < 0. This implies Q is not positive definite. On the other hand, if all the eigenvalues $\lambda_i$, i = 1, 2, . . . , n, are positive, then there is a transformation $x = T\bar{x}$ that will reduce the quadratic form Q to the canonical form
$$Q = \lambda_1\bar{x}_1^2 + \lambda_2\bar{x}_2^2 + \ldots + \lambda_n\bar{x}_n^2 > 0$$
unless $\bar{x} = 0$; but $\bar{x} = 0$ forces $x = T0 = 0$. This completes the proof.

We have the following example.

Example 3.42 The quadratic form
$$Q(x_1, x_2, x_3) = 4x_1^2 + 4x_2^2 + 4x_3^2 + 4x_1x_2 + 4x_1x_3 + 4x_2x_3$$
is positive definite since the eigenvalues of
$$A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}$$
are
$$\lambda_1 = \lambda_2 = 2 \quad \text{and} \quad \lambda_3 = 8.$$

Another characterization of the positive definiteness of a quadratic form is with respect to the determinants of all principal submatrices. We begin with the following definition.

Definition 3.28 Let $A = (a_{ij})$ be an n × n matrix. For $1 \le k \le n$, the kth principal submatrix of A is
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{pmatrix}.$$

Now we have the following theorem.


Theorem 3.26 A quadratic form Q = xT Ax is positive definite if and only if the de-
terminant of every principal submatrix is positive.
We furnish the following simple example.

Example 3.43 Consider the matrix A in Example 3.42. Then the principal submatrices of A are
$$B = (4), \qquad C = \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix},$$
and A itself. Since
$$\det(B) = 4, \qquad \det(C) = 12, \qquad \det(A) = 32$$
are all positive, the quadratic form $Q = x^T A x$ is positive definite. □
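The principal-minor test of Theorem 3.26 is mechanical to carry out on a computer. Below is a minimal sketch (Python/NumPy, an editorial illustration rather than the book's code) computing the determinants of the leading principal submatrices of A:

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

# Leading principal minors det(A[:k, :k]) for k = 1, ..., n
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(minors)                      # [4.0, 12.0, 32.0], all positive
assert all(m > 0 for m in minors)  # so Q = x^T A x is positive definite
```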
The next theorem is about reducing two quadratic forms simultaneously to canonical
forms when one of them is positive definite.
Theorem 3.27 If at least one of the quadratic forms
$$Q_1 = x^T A x, \qquad Q_2 = x^T B x \qquad (3.49)$$
is positive definite, it is always possible to reduce the two forms simultaneously to linear combinations of only squares of new variables, that is, to canonical forms, by a nonsingular real transformation.

Proof Suppose $Q_2$ is positive definite. Then by Theorem 3.24, there exists T such that
$$x = Ty \qquad (3.50)$$
reduces $Q_2$ to the form
$$Q_2 = \mu_1y_1^2 + \mu_2y_2^2 + \ldots + \mu_ny_n^2, \qquad (3.51)$$
where $\mu_i$, i = 1, 2, . . . , n, are the eigenvalues of the symmetric matrix B. Since B is positive definite, all of its eigenvalues are positive. Hence we may set
$$\eta_i = \sqrt{\mu_i}\,y_i, \quad i = 1, 2, \ldots, n, \qquad (3.52)$$
which reduces $Q_2$ to the form
$$Q_2 = \eta_1^2 + \eta_2^2 + \ldots + \eta_n^2 = \eta^T\eta, \qquad (3.53)$$
where $\eta = (\eta_1, \ldots, \eta_n)^T$. At the same time, (3.50) reduces $Q_1$ to the form
$$Q_1 = x^T A x = (Ty)^T A Ty = y^T(T^T A T)y. \qquad (3.54)$$
Now (3.52) reduces (3.54) to
$$Q_1 = \eta^T(T'^T A T')\eta, \qquad (3.55)$$

where T′ is the matrix obtained from T by dividing each element of the ith column by $\sqrt{\mu_i}$. Hence we may write $Q_1$ as
$$Q_1 = \eta^T G\eta, \quad \text{where } G = T'^T A T'. \qquad (3.56)$$
The matrix G is symmetric, since
$$G^T = (T'^T A T')^T = T'^T A^T T' = T'^T A T' = G.$$
Thus (3.56) may be reduced to canonical form by setting
$$\eta = S\alpha, \quad \text{where } \alpha = (\alpha_1, \ldots, \alpha_n)^T,$$
and S is made up of the normalized eigenvectors of G. Thus
$$Q_1 = \lambda_1\alpha_1^2 + \lambda_2\alpha_2^2 + \ldots + \lambda_n\alpha_n^2, \qquad (3.57)$$
where $\lambda_i$, i = 1, 2, . . . , n, are the eigenvalues of G. At the same time, using $\eta = S\alpha$ in (3.53) gives
$$Q_2 = \eta^T\eta = (S\alpha)^T(S\alpha) = \alpha^TS^TS\alpha = \alpha^T\alpha = \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_n^2 \quad (\text{since } S \text{ is orthogonal}). \qquad (3.58)$$
Thus, the change of variables
$$x = Ty = T'\eta = T'S\alpha \qquad (3.59)$$
will simultaneously reduce $Q_1$ and $Q_2$ to diagonal forms, that is, to the canonical forms (3.57) and (3.58), respectively. This completes the proof.

We provide the following example.

Example 3.44 Find the real transformation that will simultaneously reduce the quadratic forms
$$Q_1 = 3x_1^2 + 3x_2^2 - 2x_1x_2 \qquad \text{and} \qquad Q_2 = 2x_1^2 + 2x_2^2$$
to canonical forms. It is clear that $Q_2$ is positive definite. Moreover, in matrix notation, we have that
$$Q_2 = x^TBx,$$
where $B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$ with eigenvalues $\mu_1 = \mu_2 = 2$. Let $K = \begin{pmatrix} k_1 \\ k_2 \end{pmatrix}$. Then
$$(2-\mu_1)k_1 + 0k_2 = 0$$
$$0k_1 + (2-\mu_1)k_2 = 0.$$

Substituting $\mu_1 = 2$, we get $0k_1 + 0k_2 = 0$. Letting $k_1 = a$, $k_2 = b$ we arrive at
$$k = \begin{pmatrix} a \\ b \end{pmatrix} = a\begin{pmatrix} 1 \\ 0 \end{pmatrix} + b\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Let a = 1, b = 0 and get the eigenvector $K_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Similarly, if we set a = 0, b = 1 we arrive at the second eigenvector $K_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. So $T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and hence the transformation $x = Ty$, $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$, reduces $Q_2$ to
$$Q_2 = \mu_1y_1^2 + \mu_2y_2^2 = 2y_1^2 + 2y_2^2.$$
Or,
$$Q_2 = \eta_1^2 + \eta_2^2, \quad \text{where } \eta_i = \sqrt{\mu_i}\,y_i = \sqrt{2}\,y_i,\ i = 1, 2.$$
Thus,
$$T' = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} \end{pmatrix},$$
and $Q_1 = \eta^T(T'^T A T')\eta$, where $A = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}$. In particular,
$$Q_1 = \eta^T\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} \end{pmatrix}^T\begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} \end{pmatrix}\eta = \eta^T\begin{pmatrix} 3/2 & -1/2 \\ -1/2 & 3/2 \end{pmatrix}\eta := \eta^TG\eta.$$
The matrix G has the normalized eigenpairs
$$\lambda_1 = 1,\ v_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}; \qquad \lambda_2 = 2,\ v_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}.$$
Thus,
$$S = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}.$$
Setting
$$\eta = S\alpha, \quad \text{where } \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix},$$
gives
$$\eta_1 = \frac{1}{\sqrt{2}}(\alpha_1 + \alpha_2), \qquad \eta_2 = \frac{1}{\sqrt{2}}(\alpha_1 - \alpha_2).$$

This implies that
$$Q_1 = \alpha^TS^TGS\alpha = \alpha^T\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}^T\begin{pmatrix} 3/2 & -1/2 \\ -1/2 & 3/2 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}\alpha = \alpha_1^2 + 2\alpha_2^2,$$
as expected. Thus the transformation that will simultaneously transform $Q_1$ and $Q_2$ into canonical forms is
$$x = T'S\alpha = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2}(\alpha_1+\alpha_2) \\ \frac{1}{2}(\alpha_1-\alpha_2) \end{pmatrix}.$$
Componentwise, the transformation is
$$x_1 = \frac{1}{2}(\alpha_1 + \alpha_2), \qquad x_2 = \frac{1}{2}(\alpha_1 - \alpha_2).$$
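One may verify the simultaneous reduction numerically. The sketch below (Python/NumPy, an illustrative addition) forms the combined transformation $x = T'S\alpha$ from the matrices computed in Example 3.44 and checks that it diagonalizes both forms at once:

```python
import numpy as np

A = np.array([[ 3.0, -1.0],
              [-1.0,  3.0]])     # Q1 = x^T A x
B = np.array([[ 2.0,  0.0],
              [ 0.0,  2.0]])     # Q2 = x^T B x

s = 1 / np.sqrt(2)
Tp = np.array([[  s, 0.0],
               [0.0,   s]])      # T' from the example
S  = np.array([[s,  s],
               [s, -s]])         # normalized eigenvectors of G = T'^T A T'

M = Tp @ S                       # combined transformation x = M alpha
print(np.round(M.T @ A @ M, 12)) # diag(1, 2):  Q1 = a1^2 + 2 a2^2
print(np.round(M.T @ B @ M, 12)) # identity:    Q2 = a1^2 + a2^2
```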

3.9.1 Exercises

Exercise 3.83 Write the quadratic forms in matrix form with symmetric matrices.
(a) $Q(x_1, x_2) = 3x_1^2 + 3x_2^2 - x_1x_2$.
(b) $Q(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 8x_1x_2 + 4x_2x_3 + 10x_1x_3$.

Exercise 3.84 For each of the given matrices, write down the corresponding quadratic form and then find a symmetric matrix which determines the same quadratic form.
(a) $A = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}$, (b) $B = \begin{pmatrix} 5 & -1 & 2 \\ 3 & 4 & 1 \\ 1 & 6 & 2 \end{pmatrix}$, (c) $C = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 5 \\ 0 & 7 & 6 \end{pmatrix}$.
Exercise 3.85 Let A be an n × n matrix. We say $A = (a_{ij})$ is positive definite if $x^TAx > 0$ for every nonzero n × 1 vector x. Show that if A is positive definite, then $a_{ii} > 0$, i = 1, 2, . . . , n.
Exercise 3.86 Give an example of a quadratic form in 2 variables Q(x1 , x2 ), which
is
(a) positive definite,
(b) negative definite,

(c) positive semidefinite,


(d) negative semidefinite,
(e) indefinite.
Exercise 3.87 Find an orthogonal transformation which will reduce each of the quadratic forms given below to canonical form.
(a) $Q(x_1, x_2, x_3) = 3x_1^2 + 3x_2^2 + 3x_3^2 - x_1x_2 - x_2x_3$.
(b) $Q(x_1, x_2, x_3) = 3x_1^2 + 4x_2^2 + 3x_3^2 + 4x_1x_2 - 4x_2x_3$.
(c) $Q(x_1, x_2) = 5x_1^2 + 8x_2^2 - 4x_1x_2 - 36$.
(d) $Q(x_1, x_2, x_3, x_4) = 5x_1x_4 + 5x_2x_3$.
(e) $Q(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - 2x_1x_2$.
(f) $Q(x_1, x_2, x_3) = 2x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 - 2x_1x_3 - 4x_2x_3$.
Exercise 3.88 Let $S^{n-1}$ be defined as in Theorem 3.23. Find the maximum and minimum of each of the multivariable functions on $S^{n-1}$.
(a) $f(x_1, x_2, x_3) = 3x_1^2 + 2x_2^2 + 3x_3^2 + 2x_1x_3$.
(b) $f(x_1, x_2, x_3) = 4x_1^2 + 4x_2^2 + 4x_3^2 + 4x_1x_2 + 4x_1x_3 + 4x_2x_3$.

Exercise 3.89 Use Theorem 3.26 to show the quadratic forms in Exercise 3.88 are positive definite.
Exercise 3.90 Show the matrix
$$A = \begin{pmatrix} 1 & -1 & 2 & 0 \\ -1 & 4 & -1 & 1 \\ 2 & -1 & 6 & -2 \\ 0 & 1 & -2 & 4 \end{pmatrix}$$
is positive definite.
Exercise 3.91 Find all values of x so that the matrix
$$A = \begin{pmatrix} 2 & -1 & x \\ -1 & 2 & -1 \\ x & -1 & 2 \end{pmatrix}$$

is
(a) positive semidefinite,
(b) positive definite.

Exercise 3.92 Let A and B be symmetric matrices and consider the two quadratic forms
$$Q_1 = x^TAx \qquad \text{and} \qquad Q_2 = x^TBx.$$
Show that if there is a matrix P that simultaneously diagonalizes $Q_1$ and $Q_2$, then $A^{-1}B$ is diagonalizable.

Exercise 3.93 Use Exercise 3.92 to show the two quadratic forms
$$Q_1 = x_1^2 + x_1x_2 - x_2^2 \qquad \text{and} \qquad Q_2 = x_1^2 - 2x_1x_2$$
cannot be simultaneously diagonalized.

Exercise 3.94 Find the real transformation that will simultaneously reduce the quadratic forms
$$Q_1 = x_1x_2 \qquad \text{and} \qquad Q_2 = 3x_1^2 - 2x_1x_2 + 2x_2^2$$
to canonical forms.

Exercise 3.95 Find the real transformation that will simultaneously reduce the quadratic forms
$$Q_1 = 4x_1^2 + 4x_2^2 + 4x_3^2 + 4x_1x_2 + 4x_1x_3 + 4x_2x_3$$
and
$$Q_2 = 3x_1^2 + 3x_3^2 + 4x_1x_2 + 8x_1x_3 + 4x_2x_3$$
to canonical forms.

3.10 Functions of Symmetric Matrices

In this section, we restrict our study to symmetric matrices. We have already seen that if A and B are square symmetric matrices, then AB is symmetric only if AB = BA; the sum A + B, however, is always symmetric. Also, if A is any square matrix, then
$$A^2 = AA, \quad A^3 = AA^2, \quad \ldots, \quad A^{n+1} = AA^n,$$
and consequently, for any positive integers r and s we have
$$A^rA^s = A^sA^r = A^{r+s}. \qquad (3.60)$$
On the other hand,
$$A^{-n} = (A^{-1})^n,$$
provided that A is nonsingular, in which case $A^{-1}$ is unique. If we adopt the notation
$$A^0 = I,$$
then equation (3.60) holds for any integers r and s. In addition, if A is a square symmetric matrix, then $A^r$ is also symmetric for any positive integer r.

Theorem 3.28 Suppose A is an n × n real symmetric matrix. If $\lambda_i \neq 0$ is an eigenvalue of A with corresponding eigenvector $u_i$, then for any integer r, the value $\lambda_i^r$ is an eigenvalue of $A^r$ with the same eigenvector $u_i$.

Proof Since $Au_i = \lambda_iu_i$, we have
$$A^2u_i = A(Au_i) = A\lambda_iu_i = \lambda_iAu_i = \lambda_i^2u_i.$$
By repeating this process, we deduce the relation
$$A^ru_i = \lambda_i^ru_i$$
for any positive integer r. Multiplying $Au_i = \lambda_iu_i$ from the left by $A^{-1}$ gives $u_i = A^{-1}\lambda_iu_i = \lambda_iA^{-1}u_i$, and multiplication by $\lambda_i^{-1}$ yields
$$A^{-1}u_i = \lambda_i^{-1}u_i.$$
A similar argument leads to $A^ru_i = \lambda_i^ru_i$ for negative integers r. This completes the proof.

We note that if A is symmetric, then $A^r$ cannot possess eigenvalues other than those obtained from A, nor can it possess eigenvectors that are linearly independent of those of A. However, $A^r$ may have eigenvectors not possessed by A. The next example shows how Theorem 3.28 is used in computations.
Example 3.45 Suppose A is a 2 × 2 symmetric matrix with eigenvalues 2 and −1 and corresponding eigenvectors $u_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $u_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$. Let $w = \begin{pmatrix} 4 \\ 2 \end{pmatrix}$. Compute $A^3w$. First we write w as a combination of $u_1$ and $u_2$. That is, we need to find constants $c_1$ and $c_2$ such that $w = c_1u_1 + c_2u_2$. Or,
$$4 = c_1 + 3c_2, \qquad 2 = 2c_1 + c_2.$$
Solving the system, we arrive at $c_1 = \frac{2}{5}$ and $c_2 = \frac{6}{5}$. Now by Theorem 3.28, we see that
$$A^3u_1 = 2^3u_1 \qquad \text{and} \qquad A^3u_2 = (-1)^3u_2.$$
Hence,
$$A^3w = A^3\Big(\frac{2}{5}u_1 + \frac{6}{5}u_2\Big) = \frac{2}{5}A^3u_1 + \frac{6}{5}A^3u_2 = \frac{2}{5}(2^3u_1) + \frac{6}{5}(-1)^3u_2 = \frac{16}{5}u_1 - \frac{6}{5}u_2 = \begin{pmatrix} -\frac{2}{5} \\ \frac{26}{5} \end{pmatrix}.$$
□
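The eigenbasis computation of Example 3.45 can be mirrored in a few lines (a Python/NumPy sketch added for illustration; A itself is never formed, only its eigenpairs are used):

```python
import numpy as np

# Eigenpairs of the (unspecified) symmetric A from Example 3.45
u1, lam1 = np.array([1.0, 2.0]),  2.0
u2, lam2 = np.array([3.0, 1.0]), -1.0
w = np.array([4.0, 2.0])

# Solve w = c1*u1 + c2*u2 for the coordinates of w in the eigenbasis
c = np.linalg.solve(np.column_stack([u1, u2]), w)
print(c)                                     # [0.4, 1.2] = [2/5, 6/5]

A3w = c[0] * lam1**3 * u1 + c[1] * lam2**3 * u2
print(A3w)                                   # [-0.4, 5.2] = [-2/5, 26/5]
```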


The next theorem plays an important role in the proof of the Cayley-Hamilton Theorem.

Theorem 3.29 Let A be an n × n symmetric matrix. For constants $\alpha_i$, i = 0, 1, . . . , n, let
$$P(A) = \alpha_nA^n + \alpha_{n-1}A^{n-1} + \ldots + \alpha_1A + \alpha_0I$$
be a polynomial in A. Then all eigenvectors of A are eigenvectors of P(A), and if the eigenvalues of A are $\lambda_1, \ldots, \lambda_n$, then those of P(A) are
$$P(\lambda_1),\ P(\lambda_2),\ \ldots,\ P(\lambda_n).$$

Proof Let $\lambda_i$ be an eigenvalue of A with corresponding eigenvector $u_i$, i = 1, 2, . . . , n. Then from Theorem 3.28 we have
$$P(A)u_i = \alpha_nA^nu_i + \alpha_{n-1}A^{n-1}u_i + \ldots + \alpha_1Au_i + \alpha_0u_i = \alpha_n\lambda_i^nu_i + \alpha_{n-1}\lambda_i^{n-1}u_i + \ldots + \alpha_1\lambda_iu_i + \alpha_0u_i = P(\lambda_i)u_i.$$
Thus,
$$\big(P(A) - P(\lambda_i)I\big)u_i = 0,$$
which implies $P(\lambda_i)$ is an eigenvalue of P(A) with corresponding eigenvector $u_i$. This completes the proof.

The next theorem, known as the Cayley-Hamilton Theorem, sheds light on an interesting relationship between a matrix and its characteristic polynomial.

Theorem 3.30 (Cayley-Hamilton Theorem) Let A be an n × n symmetric matrix with characteristic polynomial
$$P(\lambda) = |A - \lambda I|.$$
Then A satisfies P(A) = 0 (the zero matrix).

Proof We know
$$P(\lambda) = |A - \lambda I| = (-1)^n\big[\lambda^n + \beta_{n-1}\lambda^{n-1} + \ldots + \beta_1\lambda + (-1)^n\beta_0\big].$$
By definition, we have that $P(\lambda_i) = 0$, i = 1, 2, . . . , n. By Theorem 3.29, we have
$$P(A)u_i = P(\lambda_i)u_i = 0, \quad i = 1, 2, \ldots, n,$$
where $P(\lambda_i)$ is an eigenvalue of P(A) and $u_i$ is its corresponding eigenvector. Let B = P(A). Then Bx = 0 possesses the n linearly independent solutions $x = u_1, u_2, \ldots, u_n$. But since B is a square matrix of order n, B must be of rank n − n = 0. Hence B = P(A) must be the zero matrix; that is, P(A) = 0. This completes the proof.

We note that if A is an n × n symmetric matrix, then Theorem 3.30 enables us to express any polynomial in A as a linear combination of $I, A, A^2, \ldots, A^{n-1}$, as the next example shows.

Example 3.46 Let
$$A = \begin{pmatrix} 1 & -2 & 4 \\ 0 & -1 & 2 \\ 2 & 0 & 3 \end{pmatrix}.$$
Then its characteristic polynomial is given by
$$P(\lambda) = \lambda^3 - 3\lambda^2 - 9\lambda + 3.$$
Now, by the Cayley-Hamilton Theorem we have
$$A^3 - 3A^2 - 9A + 3I = 0. \qquad (3.61)$$
Thus,
$$A^3 = 3A^2 + 9A - 3I.$$
Multiplying (3.61) by A we arrive at
$$A^4 = 3A^3 + 9A^2 - 3A = 3(3A^2 + 9A - 3I) + 9A^2 - 3A = 18A^2 + 24A - 9I.$$
On the other hand, if we multiply (3.61) by $A^{-1}$ we obtain
$$A^{-1} = \frac{1}{3}\big(-A^2 + 3A + 9I\big).$$
Similarly, multiplying the preceding equation by $A^{-1}$ again yields
$$A^{-2} = \frac{1}{3}\big(-A + 3I + 9A^{-1}\big) = \frac{1}{3}\big[-A + 3I + 3(-A^2 + 3A + 9I)\big],$$
or
$$A^{-2} = \frac{1}{3}\big[-3A^2 + 8A + 30I\big].$$
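A numerical verification of the Cayley-Hamilton identity (3.61), and of the inversion formula it produced, takes only a few lines (Python/NumPy sketch, an editorial illustration):

```python
import numpy as np

A = np.array([[1.0, -2.0, 4.0],
              [0.0, -1.0, 2.0],
              [2.0,  0.0, 3.0]])
I = np.eye(3)

# P(A) = A^3 - 3A^2 - 9A + 3I should be the zero matrix
P_A = A @ A @ A - 3 * (A @ A) - 9 * A + 3 * I
print(np.round(P_A, 12))        # zero matrix, as Cayley-Hamilton predicts

# A^{-1} = (1/3)(-A^2 + 3A + 9I), with no inversion routine called
A_inv = (-(A @ A) + 3 * A + 9 * I) / 3
print(np.round(A @ A_inv, 12))  # identity matrix
```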

Another application of the Cayley-Hamilton Theorem is Sylvester's formula, which we discuss next. Assume all the eigenvalues of the n × n symmetric matrix A are distinct. We attempt to write any polynomial in A of degree n − 1 as
$$P(A) = \alpha_1A^{n-1} + \alpha_2A^{n-2} + \ldots + \alpha_{n-1}A + \alpha_nI.$$
Equivalently, we seek $C_1, C_2, \ldots, C_n$ such that
$$P(A) = C_1\big[(A-\lambda_2I)(A-\lambda_3I)\cdots(A-\lambda_nI)\big] + C_2\big[(A-\lambda_1I)(A-\lambda_3I)\cdots(A-\lambda_nI)\big] + \ldots + C_n\big[(A-\lambda_1I)(A-\lambda_2I)\cdots(A-\lambda_{n-1}I)\big]. \qquad (3.62)$$
Note that the right-hand side of (3.62) is of degree n − 1 in A. To determine $C_i$, i = 1, 2, . . . , n, we multiply (3.62) by $u_k$ and use $Au_k = \lambda_ku_k$ to observe that the coefficient

of every $C_i$ except $C_k$ contains the factor $\lambda_k - \lambda_k$, and hence vanishes. Thus, it follows after some calculations that
$$P(A)u_k = C_k\big[(\lambda_k-\lambda_1)\cdots(\lambda_k-\lambda_{k-1})(\lambda_k-\lambda_{k+1})\cdots(\lambda_k-\lambda_n)\big]u_k, \qquad (3.63)$$
where k = 1, 2, . . . , n. By Theorem 3.29, we have $P(A)u_k = P(\lambda_k)u_k$, and as a consequence, (3.63) becomes
$$P(\lambda_k)u_k = C_k\big[(\lambda_k-\lambda_1)\cdots(\lambda_k-\lambda_{k-1})(\lambda_k-\lambda_{k+1})\cdots(\lambda_k-\lambda_n)\big]u_k.$$
This yields the relation
$$C_k = \frac{P(\lambda_k)}{\prod_{r\neq k}(\lambda_k - \lambda_r)}, \qquad k = 1, 2, \ldots, n, \qquad (3.64)$$
where the notation $\prod_{r\neq k}$ denotes the product over r = 1, . . . , n excluding r = k. Substituting (3.63) and (3.64) into (3.62) we obtain
$$P(A) = \sum_{k=1}^{n}P(\lambda_k)Z_k(A), \qquad (3.65)$$
where
$$Z_k(A) = \frac{\prod_{r\neq k}(A - \lambda_rI)}{\prod_{r\neq k}(\lambda_k - \lambda_r)}. \qquad (3.66)$$
We furnish the following example.


Example 3.47 Compute $A^m$ for positive integers m, where
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$
The eigenvalues of A are $\lambda_1 = 3$, $\lambda_2 = 1$. We are interested in calculating $P(A) = A^m$, m > 1. From (3.66) we have
$$Z_1(A) = \frac{A - \lambda_2I}{\lambda_1 - \lambda_2} = \frac{1}{2}(A - I).$$
Similarly,
$$Z_2(A) = \frac{A - \lambda_1I}{\lambda_2 - \lambda_1} = -\frac{1}{2}(A - 3I).$$
Thus,
$$P(A) = A^m = \sum_{k=1}^{2}P(\lambda_k)Z_k(A) = P(3)Z_1(A) + P(1)Z_2(A) = 3^m\cdot\frac{1}{2}(A-I) + 1^m\cdot\Big(-\frac{1}{2}\Big)(A-3I) = \frac{3^m}{2}(A-I) - \frac{1}{2}(A-3I).$$
Hence, $A^{100} = \frac{3^{100}}{2}(A-I) - \frac{1}{2}(A-3I)$. □
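Sylvester's formula from this example is easily checked against direct matrix powers (Python/NumPy sketch, illustrative; a moderate exponent is used to avoid overflow in the comparison):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
I = np.eye(2)

def A_pow(m):
    # Sylvester's formula with lam1 = 3, lam2 = 1, as in Example 3.47
    return 3.0**m * (A - I) / 2 - (A - 3 * I) / 2

print(A_pow(5))                                   # [[122, 121], [121, 122]]
print(np.linalg.matrix_power(A.astype(int), 5))   # agrees with direct powering
```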
Next, we extend the application of Sylvester's formula to linear systems of ordinary differential equations. Recall that $e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}$ converges for all x. If A is a matrix of order n, then by the Cayley-Hamilton Theorem the sum $e^A = \sum_{n=0}^{\infty}\frac{A^n}{n!}$ can be written as a polynomial of degree n − 1 in A. So if A has distinct eigenvalues, then we can use Sylvester's formula to calculate $e^A$. For simplicity, suppose A is of order two with distinct eigenvalues $\lambda_1$ and $\lambda_2$. Then,
$$Z_1(A) = \frac{A - \lambda_2I}{\lambda_1 - \lambda_2}, \qquad Z_2(A) = \frac{A - \lambda_1I}{\lambda_2 - \lambda_1}.$$
Setting $P(A) = e^A$, we obtain
$$e^A = P(A) = \sum_{k=1}^{2}P(\lambda_k)Z_k(A) = e^{\lambda_1}\frac{A-\lambda_2I}{\lambda_1-\lambda_2} + e^{\lambda_2}\frac{A-\lambda_1I}{\lambda_2-\lambda_1} = \frac{1}{\lambda_1-\lambda_2}\Big[(e^{\lambda_1} - e^{\lambda_2})A - (\lambda_2e^{\lambda_1} - \lambda_1e^{\lambda_2})I\Big].$$
Note that if we replace A with At, then we have
$$e^{At} = \frac{1}{\lambda_1-\lambda_2}\Big[(e^{\lambda_1t} - e^{\lambda_2t})A - (\lambda_2e^{\lambda_1t} - \lambda_1e^{\lambda_2t})I\Big]. \qquad (3.67)$$
We have the following definition.

Definition 3.29 Let A be an n × n constant matrix. The exponential matrix function $e^{A(t-t_0)}$ is defined as the unique matrix solution of $X' = AX$, $X(t_0) = I$ (the identity matrix). More precisely,
$$x(t) = e^{A(t-t_0)}x_0 \qquad (3.68)$$
is the unique solution of
$$x'(t) = Ax(t), \qquad x(t_0) = x_0,$$
for all $t \in \mathbb{R}$.
Example 3.48 Solve for t ≥ 0,
$$x' = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}x, \qquad x(0) = \begin{pmatrix} 2 \\ -3 \end{pmatrix}.$$
The matrix $A = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$ has the eigenvalues $\lambda_1 = 5$, $\lambda_2 = -1$. The solution of the system is $x(t) = e^{At}x_0$. By (3.67) we have
$$e^{At} = \frac{1}{6}\Big[(e^{5t} - e^{-t})A - (-e^{5t} - 5e^{-t})I\Big] = \frac{1}{6}\begin{pmatrix} 3e^{5t} + 3e^{-t} & 3e^{5t} - 3e^{-t} \\ 3e^{5t} - 3e^{-t} & 3e^{5t} + 3e^{-t} \end{pmatrix}.$$
Finally, the solution is given by
$$x(t) = e^{At}x_0 = \frac{1}{6}\begin{pmatrix} 3e^{5t} + 3e^{-t} & 3e^{5t} - 3e^{-t} \\ 3e^{5t} - 3e^{-t} & 3e^{5t} + 3e^{-t} \end{pmatrix}\begin{pmatrix} 2 \\ -3 \end{pmatrix}.$$
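The closed form (3.67) can be compared against a general-purpose matrix exponential. The sketch below is illustrative and assumes SciPy is available; it evaluates the solution of the initial value problem at a sample time t by both routes:

```python
import numpy as np
from scipy.linalg import expm

A  = np.array([[2.0, 3.0],
               [3.0, 2.0]])
x0 = np.array([2.0, -3.0])
l1, l2 = 5.0, -1.0            # eigenvalues of A

def eAt(t):
    # two-eigenvalue Sylvester formula (3.67)
    return ((np.exp(l1*t) - np.exp(l2*t)) * A
            - (l2*np.exp(l1*t) - l1*np.exp(l2*t)) * np.eye(2)) / (l1 - l2)

t = 0.7
print(eAt(t) @ x0)            # x(t) from the closed form
print(expm(A * t) @ x0)       # agrees with SciPy's matrix exponential
```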

3.10.1 Exercises

Exercise 3.96 Find the eigenvalues of A and $A^5$ where
$$A = \begin{pmatrix} 3 & -12 & 4 \\ -1 & 0 & -2 \\ -1 & 5 & -1 \end{pmatrix}.$$

Exercise 3.97 Suppose A is a 3 × 3 symmetric matrix with eigenvalues 2 and −1 and corresponding eigenvectors $u_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$ and $u_2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$. Compute $A^5w$, where $w = \begin{pmatrix} 7 \\ 2 \\ -3 \end{pmatrix}$.

Exercise 3.98 Compute $A^5$ and $A^{-4}$ in Example 3.46.

Exercise 3.99 Verify the statement of the Cayley-Hamilton Theorem for the matrix $A = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}$, and then compute $A^4$ and $A^{-3}$.

Exercise 3.100 Verify the statement of the Cayley-Hamilton Theorem for the matrix $A = \begin{pmatrix} 1 & -2 & 4 \\ 0 & -1 & 2 \\ 2 & 0 & 3 \end{pmatrix}$, and then compute $A^3$ and $A^{-3}$.

Exercise 3.101 Use Sylvester's formula to compute $A^{100}$ where
$$A = \begin{pmatrix} 1 & 4 & 16 \\ 18 & 20 & 4 \\ -12 & -14 & -7 \end{pmatrix}.$$

Exercise 3.102 For t ≥ 0, solve
(a) $x' = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}x, \quad x(0) = \begin{pmatrix} -2 \\ -3 \end{pmatrix}$.
(b) $x' = \begin{pmatrix} 0 & 2 & -2 \\ 0 & 1 & 0 \\ 1 & -1 & 3 \end{pmatrix}x, \quad x(0) = \begin{pmatrix} -2 \\ 2 \\ -3 \end{pmatrix}$.

Exercise 3.103 Write the second-order differential equation
$$y'' + 4y' + 3y = 0, \qquad y(0) = -1, \quad y'(0) = 3,$$
as a system and find its solution.


4
Calculus of Variations

This chapter is devoted to the study of the calculus of variations. The calculus of variations is a wide field of mathematics devoted to minimizing or maximizing functionals. It has widespread applications in physics, engineering, and applied mathematics, and it naturally makes its presence felt in the field of partial differential equations. In this chapter, we will consider many applications, such as the distance between two points, the brachistochrone problem, surfaces of revolution, navigation, the catenary, and others. The chapter covers a wide range of classical topics on the subject of the calculus of variations. Our aim is to cover the topics in a way that strikes a balance between the development of theory and applications. The chapter is suitable for advanced undergraduate and graduate students. In most sections, we limit ourselves to smooth solutions of the Euler-Lagrange equations and finding explicit solutions to classical problems. We will generalize the concept to systems and to functionals that contain higher derivatives of the unknown functions. The chapter contains a long but interesting section on the sufficient conditions for the existence of an extremal.

4.1 Introduction
Let $f : \mathbb{R} \to \mathbb{R}$ be a real-valued function that is continuous. Then we know from calculus that if f has a local minimum or maximum value at an interior point c, and if $f'(c)$ exists, then
$$f'(c) = 0. \qquad (4.1)$$
Condition (4.1) is a necessary condition for maximizing or minimizing the function f. Let $f(x) = x^3$. Then $f'(0) = 0$. However, the function has neither a maximum nor a minimum at c = 0, as the graph in Fig. 4.1 shows. This shows that condition (4.1) is not sufficient.
Before we commence with formal definitions, we must be precise when talking about maximum or minimum in the sense of distances. This brings us to the notion of a norm.
Definition 4.1 (Normed spaces) Let V denote a linear space over the field $\mathbb{R}$. A functional $\|x\|$, which is defined on V, is called the norm of $x \in V$ if it has the following

FIGURE 4.1
f′(0) = 0, but f has neither a maximum nor a minimum.

properties:
1. $\|x\| > 0$ for all $x \neq 0$, $x \in V$.
2. $\|x\| = 0$ if $x = 0$.
3. $\|\alpha x\| = |\alpha|\,\|x\|$ for all $x \in V$, $\alpha \in \mathbb{R}$.
4. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).

Example 4.1 The space $(\mathbb{R}^n, +, \cdot)$ over the field $\mathbb{R}$ is a vector space (with the usual vector addition + and scalar multiplication ·), and there are many suitable norms for it. For example, if $x = (x_1, x_2, \ldots, x_n)$ then
1. $\|x\| = \max_{1\le i\le n}|x_i|$,
2. $\|x\| = \sqrt{\sum_{i=1}^{n}x_i^2}$, or
3. $\|x\| = \sum_{i=1}^{n}|x_i|$,
4. $\|x\|_p = \Big(\sum_{i=1}^{n}|x_i|^p\Big)^{1/p}$, $p \ge 1$,
are all suitable norms. Norm 2 is the Euclidean norm: the norm of a vector is its Euclidean distance to the zero vector, and the metric defined from this norm is the usual Euclidean metric. Norm 3 generates the "taxi-cab" metric on $\mathbb{R}^2$, and Norm 4 is the $l_p$ norm. □
Let $D \subset \mathbb{R}^n$ and define a function $f : D \to \mathbb{R}$. Let c be a point in the interior of D. We define a neighborhood of c by
$$N(\delta, c) = \{x : \|x - c\| < \delta\} \quad \text{for some } \delta > 0.$$
Thus, a point $c \in D$ is said to be a relative, or local, minimum point of the function f over D if for all $x \in N(\delta, c)$ we have $f(c) \le f(x)$. On the other hand, if $f(c) < f(x)$ for all $x \in N(\delta, c)$ with $x \neq c$, then c is said to be a strict relative minimum point of f over D.
FIGURE 4.2
Shortest path between two points.

Assume the function is scalar, that is, $f : \mathbb{R} \to \mathbb{R}$. Then the Taylor series expansion of f at c is
$$f(x) = f(c) + (x-c)f'(c) + \frac{1}{2}(x-c)^2f''(c) + O((x-c)^3).$$
By making the change of variables $x = c + \varepsilon$, the above expression takes the form
$$f(c+\varepsilon) = f(c) + \varepsilon f'(c) + \frac{1}{2}\varepsilon^2f''(c) + O(\varepsilon^3). \qquad (4.2)$$
The proofs of the next two theorems are based on (4.2) and we urge the interested
readers to consult any calculus textbook.
Theorem 4.1 A necessary condition for a function f to have a relative minimum at
a point c in its domain is (i) f ′ (c) = 0 and (ii) f ′′ (c) ≥ 0.
Theorem 4.2 A sufficient condition for a function f to have a strict relative minimum
at a point c in its domain is (i) f ′ (c) = 0 and (ii) f ′′ (c) > 0.
Our main purpose is to extend the above discussion to the calculus of variations. Suppose we have two points P(a, A) and Q(b, B) in the xy-plane and we are interested in finding the shortest path between them; see Fig. 4.2. Let f(x) be a candidate for being the shortest path between the two points. We know from calculus that if f′ is continuous on [a, b], then the length of the curve y = f(x), a ≤ x ≤ b, is given by
$$L = \int_a^b\sqrt{1 + (f'(x))^2}\,dx = \int_a^b\sqrt{1 + (y')^2}\,dx. \qquad (4.3)$$
Note that the integral in (4.3) is a functional, since the integrand depends on the unknown function y. Since the right-hand side of (4.3) depends on the unknown function y, we write
$$L(y) = \int_a^b\sqrt{1 + (f'(x))^2}\,dx = \int_a^b\sqrt{1 + (y')^2}\,dx, \qquad (4.4)$$

to emphasize that it is a functional relation. Our work now is to develop a necessary


condition parallel to condition (4.1) that will enable us to compute the minimum
function y so that L(y) is minimized, which in turns will yield the shortest path, or
distance between the two pints P and Q. Of course, the lucky function will have to
satisfy the boundary conditions y(a) = A and y(b) = B.

4.2 Euler-Lagrange Equation

In this section we develop the Euler-Lagrange equation, which is a necessary condition for minimizing or maximizing functionals. We begin by considering the functional, or variational,
$$L(y) = \int_a^b F(x, y, y')\,dx, \qquad (4.5)$$
where $y'(x) = \frac{dy}{dx}$. We are interested in finding a particular function y(x) that maximizes or minimizes (4.5) subject to the boundary conditions y(a) = A and y(b) = B. Such a function will be called an extremal of L(y).
Definition 4.2 Let S be a vector space (space that has algebraic structures under
multiplication and addition). Our main problem in calculus of variations is to find
y = y0 (x) ∈ S[a, b] for which the functional L(y) takes an extremal value (maximum
or minimum) with respect to all y(x) ∈ S[a, b].
The set Ck [a, b] denotes the set of functions that are continuous on [a, b] with their k-
th derivatives also being continuous on [a, b]. The vector space S[a, b] can be thought
of as the space of competing functions. To be precise, let Σ be the set of all competing
functions for the variational problem (4.5), then

Σ = {y : y ∈ C2 ([a, b]), y(a) = A, y(b) = B}.

Note that this space is not linear because if y, w ∈ Σ, then y(a)+w(a) = 2A ̸= A unless
A = 0. The same is true for the boundary condition at b. Next we define relative
minimum and relative maximum for a functional.
Definition 4.3 A competing function y0 ∈ Σ is said to yield relative minimum (maxi-
mum) for L(y) in Σ if
L(y) − L(y0 ) ≥ 0 (≤ 0)
for all
y ∈ N(y0 , ε) := {y ∈ Σ : ||y − y0 || < ε}, for some ε > 0,
where N(y0 , ε) is neighborhood of y0 .
Below, we build upon the notion of competing functions to define the so-called space
of admissible functions.

FIGURE 4.3
The function η(x) with a, b > 0.

Definition 4.4 The space of admissible functions C is defined as

C = {ζ : ζ ∈ C2 ([a, b]), ζ (a) = ζ (b) = 0}.

This way, if y0 ∈ Σ, then y0 + η ∈ Σ, for η ∈ C . Before we can obtain the Euler-


Lagrange equation, we state and prove one of the most important results, which is
called the fundamental lemma of calculus of variations.
Lemma 10 [Fundamental lemma of calculus of variations] Assume f(x) is continuous in [a, b] and
$$\int_a^b f(x)\eta(x)\,dx = 0 \qquad (4.6)$$
for every continuous function $\eta \in \mathscr{C}$. Then f(x) = 0 for all $x \in [a, b]$.

Proof Suppose the contrary; that is, f(x) is not zero over its entire domain [a, b]. Then, without loss of generality (w.l.o.g.), let us assume it is positive on some interval $[x_1, x_2]$ contained in [a, b]. Define
$$\eta(x) = \begin{cases} (x-x_1)^3(x_2-x)^3, & x_1 < x < x_2 \\ 0, & \text{otherwise}; \end{cases}$$
see Fig. 4.3. Then the term $(x-x_1)^3(x_2-x)^3 > 0$ for $x \in (x_1, x_2)$. We must make sure that $\eta \in C^2([a,b])$. We have
$$\lim_{x\to x_1^+}\frac{\eta(x)-\eta(x_1)}{x-x_1} = \lim_{x\to x_1^+}\frac{(x-x_1)^3(x_2-x)^3 - 0}{x-x_1} = \lim_{x\to x_1^+}(x-x_1)^2(x_2-x)^3 = 0.$$
Moreover,
$$\lim_{x\to x_1^-}\frac{\eta(x)-\eta(x_1)}{x-x_1} = \lim_{x\to x_1^-}\frac{0-0}{x-x_1} = 0.$$

It follows that η ′ (x1 ) = 0, and hence η is continuously differentiable at x1 . The prove


of η is continuously differentiable at x2 follows along the same lines. Next we show
the second derivative η exists at x1 .

η ′ (x) − η ′ (x1 ) 3(x − x1 )2 (x2 − x)2 (x2 + x1 − 2x) − 0


lim = lim
x→x1+ x − x1 x→x1+ x − x1
= lim 3(x − x1 )(x2 − x)2 (x2 + x1 − 2x) = 0.
x→x1+

In addition,
η ′ (x) − η ′ (x1 ) 0−0
lim = lim = 0.

x→x1 x − x1 x→x1 x − x1

Hence, η ′′ (x1 ) = 0. It follows along the lines of the previous work that η ′′ (x2 ) = 0.
Thus, the second derivative of η exists and is given by

 (x − x1 )(x2 − x){(x − x1 )2 + (x2 − x)2 }
′′
η (x) = −3(x − x1 )(x2 − x), x1 < x < x2
0, otherwise.

It is evident that
lim η ′′ (x) = η ′′ (x1 ) = 0,
x→x1

and
lim η ′′ (x) = η ′′ (x2 ) = 0.
x→x2

This shows that $\eta \in C^2([a,b])$. To get a contradiction, we integrate f(x)η(x) from x = a to x = b:
$$\int_a^b f(x)\eta(x)\,dx = \int_a^{x_1}f(x)\eta(x)\,dx + \int_{x_1}^{x_2}f(x)\eta(x)\,dx + \int_{x_2}^{b}f(x)\eta(x)\,dx = 0 + \int_{x_1}^{x_2}f(x)(x-x_1)^3(x_2-x)^3\,dx + 0 > 0,$$
which contradicts (4.6). Thus, f(x) cannot be nonzero anywhere in its domain [a, b]; we conclude that f(x) is zero on its entire domain [a, b]. The proof for the case f < 0 is similar, so we omit it. This completes the proof.

Our aim is to find the path y(x) that minimizes or maximizes the functional. We will
consider all possible functions by adding a function η(x) ∈ C .
Theorem 4.3 [Euler-Lagrange equation] Assume F in (4.5) is twice differentiable
with respect to its arguments. Let y ∈ C2 [a, b] such that y(a) = A, and y(b) = B. That
FIGURE 4.4
Possible extremal.

is, $y \in \Sigma$. Then, if y = y(x) is an extremal function for the functional (4.5), it satisfies the Euler-Lagrange second-order differential equation
$$\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big) - \frac{\partial F}{\partial y} = 0. \qquad (4.7)$$

Proof Let η(x) be defined as in Lemma 10 and $y \in \Sigma$. For ε > 0 set
$$y(x) + \varepsilon\eta(x) \in \Sigma,$$
where y is an extremal function for the functional L(y) given by (4.5); see Fig. 4.4. In the functional L(y) we replace y by y + εη and obtain
$$L(\varepsilon) = \int_a^b F(x, y+\varepsilon\eta, y'+\varepsilon\eta')\,dx. \qquad (4.8)$$
Once y and η are assigned, L(ε) has an extremum at ε = 0. But this is possible only when
$$\frac{dL(\varepsilon)}{d\varepsilon}\Big|_{\varepsilon=0} = 0.$$
Suppressing the arguments in F, we compute $\frac{dL(\varepsilon)}{d\varepsilon}$:
$$\frac{dL(\varepsilon)}{d\varepsilon} = \int_a^b\frac{d}{d\varepsilon}F(x, y+\varepsilon\eta, y'+\varepsilon\eta')\,dx = \int_a^b\Big[\frac{\partial F}{\partial y}\,\frac{d(y+\varepsilon\eta)}{d\varepsilon} + \frac{\partial F}{\partial y'}\,\frac{d(y'+\varepsilon\eta')}{d\varepsilon}\Big]dx = \int_a^b\Big[\frac{\partial F}{\partial y}\eta + \frac{\partial F}{\partial y'}\eta'\Big]dx,$$
dx
since = 0. Setting

dL(ε)
=0
dε ε=0
we arrive at
 bh
∂F ∂F i
(x, y + εη, y′ + εη ′ )η + ′ (x, y + εη, y′ + εη ′ )η ′ dx = 0.

a ∂y ∂y ε=0

Thereupon, we obtain the necessary condition


 bh
∂F ∂F i
(x, y, y′ )η(x) + ′ (x, y, y′ )η ′ (x) dx = 0. (4.9)
a ∂y ∂y

We perform an integration by parts on the second term in the integrand of (4.9). Let $dv = \eta'(x)dx$ and $u = \frac{\partial F}{\partial y'}$. Then
$$v = \eta(x), \qquad du = \frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)dx.$$
It follows that
$$\int_a^b\frac{\partial F}{\partial y'}\eta'\,dx = \frac{\partial F}{\partial y'}\eta(x)\Big|_a^b - \int_a^b\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)\eta(x)\,dx = -\int_a^b\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)\eta(x)\,dx,$$
since η(a) = η(b) = 0. Substituting back into (4.9) we arrive at
$$\int_a^b\Big[\frac{\partial F}{\partial y} - \frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)\Big]\eta(x)\,dx = 0$$
for all functions η(x). It follows from Lemma 10 that
$$\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big) - \frac{\partial F}{\partial y} = 0. \qquad (4.10)$$
Equation (4.10) is referred to as the Euler-Lagrange equation. This completes the proof.
Remark 14
1. Equation (4.10) is a second-order ordinary differential equation.
2. Satisfying the Euler-Lagrange equation is a necessary, but not sufficient, condition for L(y) to be an extremum. In other words, a function y(x) may satisfy the Euler-Lagrange equation even when L(y) is not an extremum.

For simpler notation, we may write $F_y$ and $F_{y'}$ to denote $\frac{\partial F}{\partial y}$ and $\frac{\partial F}{\partial y'}$, respectively. We have the following simple examples.
Example 4.2 Find the extremal function for
$$L(y) = \int_0^1\big[(y')^2 + xy + y^2\big]dx, \qquad y(0) = 1, \quad y(1) = 2.$$
Here,
$$F(x,y,y') = (y')^2 + xy + y^2, \quad \text{with} \quad F_{y'} = 2y', \quad F_y = x + 2y, \quad \frac{d}{dx}F_{y'} = 2y''.$$
It follows that
$$\frac{d}{dx}F_{y'} - F_y = 2y'' - x - 2y = 0,$$
which is the second-order ODE
$$2y'' - 2y = x,$$
and can be solved using the method of Section 1.9. Thus the solution is
$$y(x) = c_1e^x + c_2e^{-x} - \frac{1}{2}x.$$
Using the given boundary conditions we end up with the system
$$c_1 + c_2 = 1, \qquad c_1e + c_2e^{-1} = \frac{5}{2},$$
with
$$c_1 = \frac{2e^{-1} - 5}{2(e^{-1} - e)}, \qquad c_2 = \frac{5 - 2e}{2(e^{-1} - e)}.$$
Finally, the extremal function is given by
$$y(x) = \frac{2e^{-1} - 5}{2(e^{-1}-e)}\,e^x + \frac{5-2e}{2(e^{-1}-e)}\,e^{-x} - \frac{1}{2}x.$$
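A quick check that the computed extremal satisfies the Euler-Lagrange equation $2y'' - 2y = x$ together with the boundary conditions (a Python/NumPy sketch added for illustration, not part of the text):

```python
import numpy as np

e = np.e
c1 = (2/e - 5) / (2 * (1/e - e))
c2 = (5 - 2*e) / (2 * (1/e - e))

x = np.linspace(0.0, 1.0, 201)
y   = c1 * np.exp(x) + c2 * np.exp(-x) - x / 2
ypp = c1 * np.exp(x) + c2 * np.exp(-x)       # second derivative of y

print(np.max(np.abs(2*ypp - 2*y - x)))       # ~1e-16: the ODE holds on [0, 1]
print(y[0], y[-1])                           # boundary values 1.0 and 2.0
```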


Example 4.3 Find the extremal function for
 π/4
(y′ )2 /2 − 2y2 dx,

L(y) = y(0) = 1, y(π/4) = 2.
0

Here,
F(x, y, y′ ) = (y′ )2 /2 − 2y2 , with
d
Fy′ = y′ , Fy = −4y, and F ′ = y′′ .
dx y

It follows that
$$\frac{d}{dx}F_{y'} - F_y = y'' + 4y = 0,$$
which can be solved using the method of Section 1.8. It follows that the solution is
$$y(x) = c_1\cos(2x) + c_2\sin(2x).$$
Using the given boundary conditions, we arrive at $c_1 = 1$ and $c_2 = 2$. Hence, the extremal function is given by
$$y(x) = \cos(2x) + 2\sin(2x).$$
For fun, we evaluate the functional L at the extremal function. After some calculations we arrive at
$$(y')^2/2 - 2y^2 = 6\cos(4x) - 8\sin(4x).$$
As a result, we see that
$$L(\cos(2x) + 2\sin(2x)) = \int_0^{\pi/4}\big[6\cos(4x) - 8\sin(4x)\big]dx = -4.$$


Example 4.4 Find the extremal function for
$$L(y) = \int_1^2\Big[\frac{x^2(y')^2}{2} + y^2\Big]dx, \qquad y(1) = 1, \quad y(2) = -1.$$
It follows that
$$\frac{d}{dx}F_{y'} - F_y = x^2y'' + 2xy' - 2y = 0,$$
which is a Cauchy-Euler equation. Using Section 1.11, we arrive at the solution
$$y(x) = c_1x + c_2\frac{1}{x^2}.$$
Applying the given boundary conditions, we obtain $c_1 = -\frac{5}{7}$ and $c_2 = \frac{12}{7}$. Finally, the extremal function is given by
$$y(x) = -\frac{5}{7}x + \frac{12}{7}\cdot\frac{1}{x^2}.$$

Notice that the Euler-Lagrange equation depends on the nature of the function F(x, y, y′). Different, or alternate, forms of (4.7) can be obtained based on the dependence of F on the variables x, y, or y′. We have the following corollaries.

Corollary 6 If F = F(x, y′), that is, F does not explicitly depend on the variable y, then the Euler-Lagrange equation (4.7) becomes
$$F_{y'} = C, \qquad (4.11)$$
where C is a constant.

Proof From (4.7), the term $F_y$ is zero since F is independent of y. Hence we are left with
$$\frac{d}{dx}F_{y'} = 0.$$
An integration with respect to x gives the result.
Corollary 7 If F = F(y, y′), that is, F does not explicitly depend on the variable x, then the Euler-Lagrange equation (4.7) reduces to
$$F - y'F_{y'} = C, \qquad (4.12)$$
where C is a constant.

Proof It suffices to show that
$$\frac{d}{dx}\big(F - y'F_{y'}\big) = 0.$$
Thus,
$$\begin{aligned} \frac{d}{dx}\big(F - y'F_{y'}\big) &= F_x + F_yy' + F_{y'}y'' - y''F_{y'} - y'\frac{d}{dx}F_{y'} \\ &= F_x + y'\Big(F_y - \frac{d}{dx}F_{y'}\Big) \\ &= F_x \quad (\text{since } y \text{ satisfies the Euler-Lagrange equation}) \\ &= 0 \quad (\text{since } F \text{ does not depend on } x). \end{aligned}$$
An integration with respect to x gives the result. This completes the proof.

Now we are in a good place to find the shortest distance, or path, between two points in the plane.

Example 4.5 Consider the functional
$$L(y) = \int_a^b\sqrt{1 + (y')^2}\,dx,$$
given by (4.4). Then
$$F = \sqrt{1 + (y')^2},$$

which is independent of x, and y. Using

Fy′ = C

that is given in Corollary 4.12, it follows that

y′
p = C.
1 + (y′ )2

Solving for y′ we end up with

y′ = constant = K,
y′
or by noticing that the left-hand of √ = C can be constant only if y′ = K,
1+(y′ )2
where K is some function of C (another constant). Hence,

y(x) = Kx + D.

Using the boundary conditions A = y(a) and B = y(b) we obtain

B−A Ab − Ba
K= , D= .
b−a b−a
Of course the shortest path is a straight line, as we have expected. □
We make the following definition regarding smoothness of a function.

Definition 4.5 Let $\Omega \subset \mathbb{R}^n$. The function $f : \Omega \to \mathbb{R}^n$ is said to be smooth on Ω if $f(x) \in C^n(\Omega)$, in the sense that f(x) has n derivatives in the entire domain Ω and the nth derivative of f(x) is continuous.

For example, the function
$$f(x) = \begin{cases} 0, & x \le 0 \\ x^2, & x > 0 \end{cases}$$
is in $C^1(\mathbb{R})$ but not in $C^2(\mathbb{R})$. Recall that Theorem 4.3 asks for $y \in C^2([a,b])$, which may not be the case in some situations. To better illustrate the requirement, we look at the next example.
Example 4.6 Consider the variational
$$L(y) = \int_1^3(y-2)^2(x-y')^2\,dx, \qquad y(1) = 2, \quad y(3) = \frac{9}{2}.$$
The integrand is nonnegative, and hence the variational is minimized when its value is zero at the extremal. This is achieved for
$$y(x) = \begin{cases} 2, & 1 \le x \le 2 \\ \dfrac{x^2}{2}, & 2 < x \le 3. \end{cases}$$

Note that y″(2) does not exist, and hence $y \notin C^2([1,3])$; that is, y is not smooth. Nevertheless, y = y(x) satisfies the corresponding Euler-Lagrange equation
$$\frac{d}{dx}\big[-2(y-2)^2(x-y')\big] - 2(y-2)(x-y')^2 = 0.$$
Thus, we have found an example in which y satisfies the corresponding Euler-Lagrange equation and yet its second derivative does not exist at every point in [1, 3].

The next theorem guarantees when an extremal of a variational is indeed in $C^2([a,b])$.

Theorem 4.4 Assume F in (4.5) is twice differentiable with respect to its arguments. Let $y \in C^1[a,b]$ satisfy the Euler-Lagrange second-order differential equation
$$\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big) - \frac{\partial F}{\partial y} = 0.$$
Then y(x) has a continuous second derivative at all points (x, y) where
$$F_{y'y'}(x, y(x), y'(x)) \neq 0.$$
We will further discuss $F_{y'y'}$ in the next two sections. Now we try to connect the concept of extremum of functionals with that of functions, discussed in Section 4.1. Consider the variational problem (4.8). A Maclaurin expansion in ε gives
$$L(\varepsilon) = L(y) + \varepsilon\int_a^b\big[F_y\eta + F_{y'}\eta'\big]dx + \frac{\varepsilon^2}{2!}\int_a^b\big[F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big]dx + O(\varepsilon^3) := L(y) + \varepsilon\,\delta L(y) + \frac{\varepsilon^2}{2!}\delta^2L(y) + O(\varepsilon^3).$$
Let
$$L(0) = L(y), \qquad L'(0) = \delta L(y), \qquad L''(0) = \delta^2L(y).$$
Then we may write L(ε) in the form
$$L(\varepsilon) = L(0) + \varepsilon L'(0) + \frac{\varepsilon^2}{2!}L''(0) + O(\varepsilon^3).$$
The terms δL(y) and δ²L(y) are called the first variation and the second variation, respectively; they will be discussed in detail in Section 4.4.
Example 4.7 Consider the variational in Example 4.5 with y(0) = 0 and y(1) = 3. Then
$$y(x) = 3x.$$
Moreover,
$$F_y = 0, \qquad F_{y'} = \frac{y'}{\sqrt{1+(y')^2}}, \qquad F_{yy'} = F_{y'y} = 0, \qquad F_{y'y'} = (1+(y')^2)^{-3/2}.$$
Thus,
$$\delta L(y) = \int_0^1\big[F_y\eta + F_{y'}\eta'\big]dx = \int_0^1\frac{3}{\sqrt{10}}\eta'(x)\,dx = \frac{3}{\sqrt{10}}\big(\eta(1) - \eta(0)\big) = 0,$$
and
$$\delta^2L(y) = \int_0^1\big[F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big]dx = \int_0^1(10)^{-3/2}(\eta'(x))^2\,dx \ge 0.$$
4.2.1 Exercises

Exercise 4.1 Assume f(x) is continuously differentiable in [a, b] and
$$\int_a^b f(x)\eta'(x)\,dx = 0$$
for every function $\eta \in C^2([a,b])$ such that η(a) = η(b) = 0. Show that then f(x) = constant for all $x \in [a,b]$.

Exercise 4.2 Assume f(x) is $C^2([a,b])$ and
$$\int_a^b f(x)\eta''(x)\,dx = 0$$
for every function $\eta \in C^3([a,b])$ such that η(a) = η(b) = 0. Show that then $f(x) = c_1 + c_2x$ for all $x \in [a,b]$, where $c_1$ and $c_2$ are constants.

Exercise 4.3 Assume f(x) and g(x) are continuous in [a, b] and
$$\int_a^b\big[f(x)\eta(x) + g(x)\eta'(x)\big]dx = 0$$
for every function $\eta \in C^1([a,b])$ such that η(a) = η(b) = 0. Show that then g(x) is differentiable and $g'(x) - f(x) = 0$ for all $x \in [a,b]$.

Exercise 4.4 Let
$$\eta(x) = \begin{cases} (x-\alpha)(\beta-x), & \alpha < x < \beta \\ 0, & \text{otherwise}. \end{cases}$$
Show that $\eta(x) \in C(\mathbb{R})$.

Exercise 4.5 In Example 4.6, find all points such that


Fy′ y′ (x, y(x), y′ (x)) ̸= 0
fails to hold.
Exercise 4.6 Find the extremal function y0 = y0 (x) for
 2
L(y) = x2 (y′ )2 dx, y(1) = 1, y(2) = 3.
1

Exercise 4.7 Find the extremal function for


 1
L(y) = (y′ )2 − 2yy′ + y2 )dx, y(0) = 0, y(1) = 2.
0

Exercise 4.8 Find the extremal function for


 1
L(y) = (y′ )2 /2 + 4y)dx, y(0) = 0, y(1) = 2.
0

Exercise 4.9 Find the extremal function for


 π/4
L(y) = (y′ )2 /2 − 3y′ y − 2y2 )dx, y(0) = 1, y(π/4) = −1.
0

Exercise 4.10 Find the extremal function for


 π/4
L(y) = (y′ )2 /2 − 3y′ y − 2y2 − xy)dx, y(0) = 1, y(π/4) = −1.
0

Exercise 4.11 Find the extremal function for


 2
L(y) = 2x2 (y′ )2 − y2 /2)dx, y(1) = 1, y(2) = −1.
1

Exercise 4.12 Find the extremal function for


 π
(y′ )2 − y2 + 4y cos(x) dx,

L(y) = y(0) = 0, y(π) = 0.
0

Exercise 4.13 Find the extremal function for


 2
3
L(y) = x2 (y′ )2 + 4xy + y2 )dx, y(1) = 1, y(2) = 2.
1 4
Exercise 4.14 (a) Show that
y′′ g
gy − y′ gx − =0
1 + (y′ )2
is the Euler-Lagrange equation for the functional
 b q
L(y) = g(x, y) 1 + (y′ )2 dx.
a

(b) Find the extremal of
$$L(y) = \int_a^b x\sqrt{1+(y')^2}\,dx, \qquad y(a) = A, \quad y(b) = B.$$
You don't need to find the constants of integration.

Exercise 4.15 Show that the extremal for
$$L(y) = \int_a^b\frac{1}{y}\sqrt{1+(y')^2}\,dx, \qquad y > 0,$$
is
$$(x-B)^2 + y^2 = R^2,$$
for appropriate constants B and R.
Exercise 4.16 Find the extremal for
$$L(y) = \int_1^2\frac{1}{x}\sqrt{1+(y')^2}\,dx, \qquad y(1) = 0, \quad y(2) = 1.$$

Exercise 4.17 (a) Show that the functional
$$L(y) = \int_1^2\big[x(y')^2 - xy + y\big]dx, \qquad y(1) = 0, \quad y(2) = 1,$$
has the Euler-Lagrange equation
$$2xy'' + 2y' + x - 1 = 0.$$
(b) Use the substitution y′ = u to find the extremal y(x).

Exercise 4.18 (a) Find the extremal function $y_0(x)$ for
$$L(y) = \int_0^1\big[(y')^2 + 1\big]dx, \qquad y(0) = 0, \quad y(1) = 1.$$
(b) Let
$$\phi(\varepsilon) = \int_0^1 F(x, y_0+\varepsilon\eta, y_0'+\varepsilon\eta')\,dx.$$
Use the $y_0$ from part (a) and η(x) = x(1 − x) to show
$$\phi'(0) = \frac{d\phi(\varepsilon)}{d\varepsilon}\Big|_{\varepsilon=0} = 0.$$

Exercise 4.19 Let p and q be known constants. Find the extremal y = y(x) that minimizes or maximizes the functional
$$L(y) = \int_a^b\big(y^2 + pyy' + q(y')^2\big)dx, \qquad y(a) = A, \quad y(b) = B,$$

by considering the three cases:
(1) q = 0,
(2) q > 0,
(3) q < 0.
In each case, explain the effect of p on the solution.

Exercise 4.20 Find the general form of the extremal y = y(x) and show it is a relative minimum for
$$L(y) = \int_a^b\big(x(y')^2 - yy' + y\big)dx, \qquad a > b > 0.$$

Exercise 4.21 Consider the variational
$$L(y) = \int_0^1 xyy'\,dx.$$
Does it have an extremal if:
(a) y(0) = 0, y(1) = 1;
(b) y(0) = 0, y(1) = 0?

Exercise 4.22 Consider the variational
$$L(y) = \int_0^1 yy'\,dx.$$
Does it have an extremal if:
(a) y(0) = 0, y(1) = 1;
(b) y(0) = 0, y(1) = 0?

Exercise 4.23 Find the extremal for the functional
$$L(y) = \int_0^1(y')^2f(x)\,dx, \qquad y(0) = 0, \quad y(1) = 1,$$
where
$$f(x) = \begin{cases} -1, & 0 \le x < \frac{1}{2} \\ 1, & \frac{1}{2} < x \le 1. \end{cases}$$

Exercise 4.24 Display a function y(x) that minimizes the functional
$$L(y) = \int_{-1}^1 y^2(2x - y')^2\,dx, \qquad y(-1) = 0, \quad y(1) = 1,$$
and yet fails to have a second derivative at x = 0.

Exercise 4.25 Consider the variational
$$L(y) = \int_0^1\big[(1-(y')^2)^2 + y^2\big]dx, \qquad y(0) = 0, \quad y(1) = 0.$$

(a) Find the corresponding Euler-Lagrange equation. (It is hard to solve, so don't bother.)
(b) Set $y_0(x) = 0$. Clearly it satisfies both boundary conditions. Find another function $y_1(x)$ that satisfies both boundary conditions with $y_1(x) \neq 0$ for all $x \in (0,1)$.
(c) Compare $L(y_0)$ and $L(y_1)$, and decide which one is likely to be a "better" candidate to minimize L, and explain why.

Exercise 4.26 Compute
$$\delta L(y_0) \quad \text{and} \quad \delta^2L(y_0)$$
for the variational of Exercise 4.17.

4.3 Impact of y′ on Euler-Lagrange Equation

Consider the functional
$$L(y) = \int_a^b F(x,y,y')\,dx, \qquad y(a) = A, \quad y(b) = B.$$
So far we have encountered functionals F = F(x, y, y′) in which y′ enters nonlinearly. This resulted in second-order differential equations with two linearly independent solutions, where the two constants are found using the provided boundary conditions. To be precise, if y′ enters nonlinearly in F, then $F_{y'}$ is a function, possibly of x, y, and y′. In this case we denote it by φ = φ(x, y, y′). Then
$$\frac{d}{dx}\varphi = \varphi_x + \varphi_y\frac{dy}{dx} + \varphi_{y'}\frac{dy'}{dx},$$
and the corresponding Euler-Lagrange equation becomes
$$\frac{d}{dx}F_{y'} - F_y = \varphi_x + \varphi_y\frac{dy}{dx} + \varphi_{y'}\frac{dy'}{dx} - F_y = 0,$$
which is a second-order differential equation. On the other hand, if y′ enters linearly in F, then $F_{y'}$ is a function of x and y only. Then φ = φ(x, y) and $\frac{d}{dx}\varphi = \varphi_x + \varphi_y\frac{dy}{dx}$, which implies that
$$\frac{d}{dx}F_{y'} - F_y = \varphi_x + \varphi_y\frac{dy}{dx} - F_y = 0.$$
The last expression is a first-order differential equation, whose solution carries only one constant to be determined from two boundary conditions. In most cases, such a solution will not exist. To enforce this notion, we consider
$$L(y) = \int_1^2 x^2yy'\,dx, \qquad y(1) = 1, \quad y(2) = -1. \qquad (4.13)$$

Then the corresponding Euler-Lagrange equation gives 2xy = 0. Since x ≠ 0, we must have y(x) = 0, which cannot satisfy either of the given boundary conditions. Boundary value problems, in general, are extremely sensitive to boundary conditions. This leads us to the next point, where y′ enters linearly and the accompanying Euler-Lagrange equation does not result in a well-posed differential equation: there are no constants to compute from the specified boundary conditions, so the solution cannot, in general, satisfy them.
Let's examine variationals of the form
$$L(y) = \int_a^b\big[N(x,y)y' + M(x,y)\big]dx, \qquad y(a) = A, \quad y(b) = B. \qquad (4.14)$$
Then,
$$\frac{d}{dx}F_{y'} - F_y = \frac{d}{dx}N(x,y) - \frac{\partial}{\partial y}\big[N(x,y)y' + M(x,y)\big] = N_x + y'N_y - y'N_y - M_y = N_x - M_y.$$
Thus,
$$\frac{d}{dx}F_{y'} - F_y = 0 \quad \text{implies} \quad N_x - M_y = 0. \qquad (4.15)$$
Relation (4.15) is not even a differential equation, but rather a relation that, in most cases, cannot satisfy both boundary conditions. For example, if $F(x, y, y') = 2xy' + y^2$, then the corresponding Euler-Lagrange equation is
$$\frac{d}{dx}F_{y'} - F_y = 2 - 2y = 0,$$
which holds only when y(x) = 1, and this may not satisfy either of the given boundary conditions. However, there is a useful result in the case $N_x = M_y$ for all x and y, which we state and prove in the following theorem.
Theorem 4.5 [Path independence] Let $y(x) \in C^1([a,b])$ be an extremal function for the functional (4.14). If $N_x = M_y$ holds for all x and y, then the value of L is path independent. That is, there is a function f(x, y) such that
$$L(y) = f(b, y(b)) - f(a, y(a)).$$

Proof Suppose $N_x = M_y$ holds for all x and y. Then there is a function f(x, y) such that $f_y = N$ and $f_x = M$. As a consequence, we have
$$F = N(x,y)y' + M(x,y) = f_y\frac{dy}{dx} + f_x.$$
This says that $F = \frac{df}{dx}$. Thus
$$F\,dx = df.$$

More precisely, we have

N(x, y)y′ + M(x, y) dx = d f .




Integrating both from x = a to x = b we get


 b 
L(y) = N(x, y)y′ + M(x, y) dx
a
 b x=b
= d f = f (x, y(x))

a x=a
= f (b, y(b)) − f (a, y(a)).

This shows the value of L is independent of the extremal y = y(x), and so L is path
independent. This completes the proof.
Example 4.8 Consider
$$L(y) = \int_1^2\big[(x^3+y^2)y' + 3x^2y\big]dx, \qquad y(1) = 1, \quad y(2) = -1.$$
Here $N(x,y) = x^3 + y^2$ and $M(x,y) = 3x^2y$. Moreover,
$$N_x = 3x^2 = M_y,$$
so condition (4.15) is satisfied for all x and y; the corresponding Euler-Lagrange equation reduces to the identity 0 = 0 and imposes no condition on y, as was expected for a path-independent functional. So there exists a function f(x, y) such that $N = x^3 + y^2 = f_y$ and $M = 3x^2y = f_x$. This gives $f(x,y) = \int 3x^2y\,dx = x^3y + g(y)$ for some function g. In addition, $f_y = x^3 + g'(y) = N = x^3 + y^2$, which implies that $g'(y) = y^2$. An integration yields $g(y) = \frac{y^3}{3} + c$. Hence, $f(x,y) = x^3y + \frac{y^3}{3} + c$. Finally, since the constant c cancels, Theorem 4.5 gives
$$L(y) = f(2, y(2)) - f(1, y(1)) = f(2,-1) - f(1,1) = -\frac{29}{3}.$$
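Path independence can be confirmed numerically by integrating F along two different curves joining (1, 1) and (2, −1); both values should approximate −29/3 ≈ −9.667. (A Python/NumPy sketch added for illustration; the trapezoid rule is coded inline to keep it self-contained.)

```python
import numpy as np

def L(y, yp, a=1.0, b=2.0, n=200001):
    # trapezoid-rule integral of F = (x^3 + y^2) y' + 3 x^2 y along the path y(x)
    x = np.linspace(a, b, n)
    F = (x**3 + y(x)**2) * yp(x) + 3 * x**2 * y(x)
    return ((F[:-1] + F[1:]) / 2 * np.diff(x)).sum()

# Two different curves joining (1, 1) and (2, -1)
print(L(lambda x: 3 - 2*x,       lambda x: -2 + 0*x))   # straight line
print(L(lambda x: -x**2 + x + 1, lambda x: -2*x + 1))   # parabola
# both approximate f(2, -1) - f(1, 1) = -29/3
```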

4.3.1 Exercises

Exercise 4.27 Show each of the functionals is path independent and evaluate L.
(a) $L(y) = \int_0^2\big[(3y+7)y' + (2x-1)\big]dx, \quad y(0) = 1, \; y(2) = 0$,
(b) $L(y) = \int_1^2\big[(2yx^2+7)y' + (2y^2x-3)\big]dx, \quad y(1) = 1, \; y(2) = -1$,
(c) $L(y) = \int_1^2\big[3xy^2y' + (x^3+y^3)\big]dx, \quad y(1) = 1, \; y(2) = 0$,
(d) $L(y) = \int_0^{\pi/2}\big[(x\cos(xy) + e^y)y' + (y\cos(xy) + 1)\big]dx, \quad y(0) = 0, \; y(\pi/2) = 1$.
Exercise 4.28 Determine M(x, y) so that the functional
$$L(y) = \int_a^b\Big[\Big(xe^{xy} + 2xy + \frac{1}{x}\Big)y' + M(x,y)\Big]dx, \qquad a > b > 0,$$
with fixed end points, is path independent.

Exercise 4.29 Develop a parallel theory for the variational with fixed end points
$$L(y) = \int_a^b\big[N(x,y) + y'M(x,y)\big]dx.$$

4.4 Necessary and Sufficient Conditions

In this section we are interested in obtaining sufficient conditions under which the variational (4.16) has a relative minimum or a relative maximum. For $\eta(x) \in \mathscr{C}$ and $y(x) \in \Sigma$, consider the functional, or variational,
$$L(y) = \int_a^b F(x,y,y')\,dx, \qquad y(a) = A, \quad y(b) = B. \qquad (4.16)$$
Let F = F(x, y, y′) and replace y by y + εη(x) and y′ by y′ + εη′(x). Define
$$\triangle F = F(x, y+\varepsilon\eta, y'+\varepsilon\eta') - F(x,y,y'),$$
which is the change in F. A Maclaurin expansion of the first term on the right-hand side in ε gives
$$F(x, y+\varepsilon\eta, y'+\varepsilon\eta') = F(x,y,y') + \big(F_y\eta + F_{y'}\eta'\big)\varepsilon + \big(F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big)\frac{\varepsilon^2}{2!} + O(\varepsilon^3).$$
Or,
$$\triangle F = \varepsilon\big(F_y\eta + F_{y'}\eta'\big) + \frac{\varepsilon^2}{2!}\big(F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big) + O(\varepsilon^3). \qquad (4.17)$$
We make the following definition.
Definition 4.6 Let $y \in \Sigma$ and $\eta \in \mathscr{C}$. If F = F(x, y, y′), then
a) the first variation of F is
$$\delta F = F_y\eta + F_{y'}\eta', \qquad (4.18)$$
b) the second variation of F is
$$\delta^2F = F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2. \qquad (4.19)$$

In a similar way, we may obtain the first and second variations of the functional L. Let
$$\triangle L = L(y+\varepsilon\eta) - L(y),$$
which is the change in L. A Maclaurin expansion of the first term on the right-hand side in ε gives
$$L(y+\varepsilon\eta) = L(y) + \varepsilon\int_a^b\big[F_y\eta + F_{y'}\eta'\big]dx + \frac{\varepsilon^2}{2!}\int_a^b\big[F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big]dx + O(\varepsilon^3).$$
So,
$$\triangle L(y) = \varepsilon\,\delta L(y) + \frac{\varepsilon^2}{2!}\delta^2L(y) + O(\varepsilon^3), \qquad (4.20)$$
where $O(\varepsilon^3)$ can be written as
$$\int_a^b\big(\varepsilon_1\eta^2 + \varepsilon_2\eta\eta' + \varepsilon_3(\eta')^2\big)dx. \qquad (4.21)$$
Due to the continuity of $F_{yy}$, $F_{yy'}$, and $F_{y'y'}$, it follows that $\varepsilon_1, \varepsilon_2, \varepsilon_3 \to 0$ as $\|\eta\|_1 \to 0$, where
$$\|\eta\|_1 = \max_{a\le x\le b}|\eta(x)| + \max_{a\le x\le b}|\eta'(x)|.$$
So we have another definition.


Definition 4.7 Let $y \in \Sigma$ and $\eta \in \mathscr{C}$. If F = F(x, y, y′), then
a) the first variation of L at y = y(x) along the direction of η(x) is
$$\delta L(y) = \int_a^b\big[F_y\eta + F_{y'}\eta'\big]dx, \qquad (4.22)$$
b) the second variation of L at y = y(x) along the direction of η(x) is
$$\delta^2L(y) = \int_a^b\big[F_{yy}\eta^2 + 2F_{yy'}\eta\eta' + F_{y'y'}(\eta')^2\big]dx. \qquad (4.23)$$

Recall from calculus that if a function f has a local minimum or maximum value at an interior point $x_0$ of its domain and $f'$ is defined at $x_0$, then $f'(x_0) = 0$. In the next lemma we give an analogous result for functionals.

Lemma 11 If $y \in C^1([a,b])$ is an extremal for the functional (4.16), then
$$\delta L(y) = 0.$$

Proof We have from (4.22) that
$$\delta L(y) = \int_a^b\big[F_y\eta + F_{y'}\eta'\big]dx.$$
Perform an integration by parts on the second term in the integrand. Let $dv = \eta'(x)dx$ and $u = \frac{\partial F}{\partial y'}$. Then
$$v = \eta(x), \qquad du = \frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)dx.$$
Since η(b) = η(a) = 0, we arrive at
$$\int_a^b F_{y'}\eta'(x)\,dx = -\int_a^b\frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)\eta(x)\,dx.$$
It follows that
$$\delta L(y) = \int_a^b\Big[F_y - \frac{d}{dx}\Big(\frac{\partial F}{\partial y'}\Big)\Big]\eta(x)\,dx = 0,$$
by the Euler-Lagrange equation. This completes the proof.

Note that the variation of L is
$$L(y+\varepsilon\eta) - L(y) = \varepsilon\,\delta L(y) + \frac{\varepsilon^2}{2!}\delta^2L(y) + O(\varepsilon^3),$$
and since y = y(x) is an extremal of L, we have δL(y) = 0 by Lemma 11. Hence,
$$L(y+\varepsilon\eta) - L(y) = \frac{\varepsilon^2}{2!}\delta^2L(y) + O(\varepsilon^3).$$
As a direct consequence we state the following.
Theorem 4.6 [Legendre necessary condition] Let δ L(y) and δ 2 L(y) be given by
(4.22) and (4.23), respectively, for the functional defined in (4.16).
1. If the extremal y = y(x) of L is a local minimum, then δ 2 L(y) ≥ 0,
2. Similarly, a necessary condition for the extremal y = y(x) of L to be a local
maximum is that δ 2 L(y) ≤ 0.
3. If δ 2 L(y) changes signs, then L can not have minima or maxima.

For the next example we need the following inequality.



Lemma 12 [Poincaré inequality] If f(x) is continuously differentiable on [a, b] with f(a) = f(b) = 0, then
$$\int_a^b|f(x)|^2\,dx \le \frac{(b-a)^2}{\pi^2}\int_a^b|f'(x)|^2\,dx.$$
It takes some ingenuity to apply Theorem 4.6, as the next example shows.
Example 4.9 For a fixed b > 0 consider the functional
$$L(y) = \int_0^b\big[(y')^2 - y^2\big]dx, \qquad y(0) = y(b) = 0.$$
Then $F_{yy} = -2$, $F_{y'y} = 0$, and $F_{y'y'} = 2$. It follows that
$$\frac{1}{2}\delta^2L(y) = \int_0^b\big[(\eta'(x))^2 - \eta^2(x)\big]dx.$$
Since η(0) = η(b) = 0, we have from Lemma 12 that
$$\int_0^b\eta^2(x)\,dx \le \frac{b^2}{\pi^2}\int_0^b(\eta'(x))^2\,dx.$$
As a consequence we obtain
$$\frac{1}{2}\delta^2L(y) \ge \Big(1 - \frac{b^2}{\pi^2}\Big)\int_0^b(\eta'(x))^2\,dx.$$
It is evident from the above inequality that $\delta^2L(y) \ge 0$ for all such functions η and extremal y if b ≤ π. This implies that y is a candidate for minimizing L.
As for the case b > π, we carefully choose η by
$$\eta_k(x) = \sin\Big(\frac{k\pi x}{b}\Big), \qquad k = 1, 2, \ldots.$$
It is evident that $\eta_k(0) = \eta_k(b) = 0$. A direct substitution of $\eta_k$ and $\eta_k'$ into $\delta^2L(y)$ yields
$$\frac{1}{2}\delta^2L(y) = \int_0^b\Big[\frac{k^2\pi^2}{b^2}\cos^2\Big(\frac{k\pi x}{b}\Big) - \sin^2\Big(\frac{k\pi x}{b}\Big)\Big]dx.$$
Using trigonometric identities, one can compute the definite integral and find
$$\frac{1}{2}\delta^2L(y) = \frac{k^2\pi^2 - b^2}{2b}.$$
Since b > π, we have $\delta^2L(y) < 0$ for k = 1, and $\delta^2L(y) > 0$ for $k^2 > \frac{b^2}{\pi^2}$. This shows that $\delta^2L(y)$ changes sign, and therefore in this case (b > π) the considered functional L can have neither a relative minimum nor a relative maximum. □
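The sign change of δ²L(y) for b > π is easy to reproduce numerically (a Python/NumPy sketch added for illustration): with b = 4 the second variation is negative for k = 1 and positive for k = 2.

```python
import numpy as np

def half_d2L(b, k, n=200001):
    # (1/2) * delta^2 L for eta_k(x) = sin(k*pi*x/b) on [0, b]
    x = np.linspace(0, b, n)
    eta  = np.sin(k * np.pi * x / b)
    etap = (k * np.pi / b) * np.cos(k * np.pi * x / b)
    F = etap**2 - eta**2
    return ((F[:-1] + F[1:]) / 2 * np.diff(x)).sum()   # trapezoid rule

b = 4.0                      # b > pi
print(half_d2L(b, 1))        # (pi^2 - b^2)/(2b), negative
print(half_d2L(b, 2))        # (4*pi^2 - b^2)/(2b), positive: sign change
```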

In practice, if the functional of interest contains all of x, y, and y′, then it is difficult to study the sign of δ²L(y) directly, as was evident from Example 4.9. So it is in our interest to explore the relation between the second variation and $F_{y'y'}$. Integrating by parts the second term in the integrand of δ²L(y) defined by (4.23) gives
$$\int_a^b 2F_{yy'}\eta(x)\eta'(x)\,dx = -\int_a^b\eta^2\frac{d}{dx}F_{yy'}\,dx.$$
Substituting into (4.23) gives the alternate form
$$\delta^2L(y) = \int_a^b\Big[\eta^2\Big(F_{yy} - \frac{d}{dx}F_{yy'}\Big) + (\eta')^2F_{y'y'}\Big]dx. \qquad (4.24)$$
Now one can choose η so that the sign of $(\eta')^2F_{y'y'}$ dominates the sign of the integrand of δ²L(y) given by (4.24). In particular, in order to have δ²L(y) ≥ 0 for all η, it is necessary that $F_{y'y'} \ge 0$. As a consequence we have the following theorem.
Theorem 4.7 [Legendre necessary condition]
1. If y = y(x) is a local minimum of L in Σ, then

Fy′ y′ ≥ 0 for all x ∈ [a, b].

2. If y = y(x) is a local maximum of L in Σ, then

Fy′ y′ ≤ 0 for all x ∈ [a, b].

3. If Fy′ y′ changes signs, then L cannot have minima or maxima.

Proof We will only prove 1. since the proof of 2. follows along the lines. In addition,
our argument here is inspired by the one given in [12] or [21]. The idea of the proof
is to display a function η with η(a) = η(b) = 0, so that |η| is uniformly bounded and
at the same time |η ′ | can be made as large as we want it. One of the logical choice of
such η is in term of sine functions. We accomplish our proof by contradiction. That
is, assume there is a point x1 ∈ (a, b) such that

Fy′ y′ (x1 ) := Fy′ y′ (x1 , y(x1 ), y′ (x1 )) < 0.

By the continuity of Fy′ y′ , we can find a number ζ > 0 such that [x1 −ζ , x1 +ζ ] ⊂ [a, b]
Fy′ y′ (x1 )
with Fy′ y′ < 2 for all x ∈ (x1 − ζ , x1 + ζ ).
The idea is to chose η so that the term η ′2 Fy′ y′ dominates the other terms in the inte-
grand of δ 2 L(y). In other words, it is imperative that Fy′ y′ ≥ 0 in order for δ 2 L(y) ≥ 0.
Let k > 2 be an integer and set
(
1)
sin2k ( π(x−x ), x ∈ [x1 − ζ , x1 + ζ ]
η(x) = ζ
0, x∈ / [x1 − ζ , x1 + ζ ].
228 Calculus of Variations

Then
(
1) 1)

2kπ
sin2k−1 ( π(x−x ) cos( π(x−x ), x ∈ [x1 − ζ , x1 + ζ ]
η (x) = ζ ζ ζ
0, x∈
/ [x1 − ζ , x1 + ζ ].

Observe that η(x1 − ζ ) = η(x1 + ζ ) = 0, and η ′ (x1 − ζ ) = η ′ (x1 + ζ ) = 0, and


η ′′ (x1 − ζ ) = η ′′ (x1 + ζ ) = 0. Since η is zero outside [x1 − ζ , x1 + ζ ] and Fy′ y′ <
Fy′ y′ (x1 )
2 for all x ∈ (x1 − ζ , x1 + ζ ), we have that
 b  x1 +ζ
′2 ′
η Fy′ y′ dx = η 2 Fy′ y′ dx
a x1 −ζ
 x1 +ζ
Fy′ y′ (x1 ) 4k2 π 2 π(x − x1 ) π(x − x1 )
≤ sin4k−2 ( ) cos2 ( )dx
2 ζ2 x1 −ζ ζ ζ

Fy′ y′ (x1 ) 4k2 π 2 ζ π
= sin4k−2 (z) cos2 (z)dz
2 ζ 2 π −π
π(x − x1 ) 
by letting z =
ζ
F y′ y′ (x1 )
=2k2 p0 π ,
ζ
π 4k−2
for a fixed p0 = −π sin (z) cos2 (z)dz > 0. Since the term

2 d 
η Fyy − Fyy′

dx
is bounded independent of η, we take ζ small enough so that
 b Fy′ y′ (x1 )
η ′2 Fy′ y′ dx < 2k2 p0 π
a ζ
can be made as negative as we want, which in turns will make
 bh  d  i
δ 2 L(y) = η 2 Fyy − Fyy′ + (η ′ )2 Fy′ y′ dx < 0.
a dx

This is a contradiction to the fact that δ 2 L(y) ≥ 0. This completes the proof.

Warning:
Be aware that we only know when δ 2 L(y) ≥ 0 for all functions η we see that Fy′ y′ ≥ 0
(the reverse is not true). Also, so far we only know that if y = y(x) is a relative
minimum, then δ 2 L(y) ≥ 0 (Necessary condition).
Our ultimate goal is to have results that assure our solution is indeed the relative
minimum. That is, δ 2 L(y) ≥ 0 for all functions η implies that y = y(x) is a relative
minimum of L. This will be established after the next examples.
Necessary and Sufficient Conditions 229

Example 4.10 Consider the functional


 1 q
L(y) = x (y′ )2 /2 + 1 dx, y(−1) = y(1) = 1.
−1

It follows that
x
Fy′ y′ = ,
2((y′ )2 /2 + 1)3/2
which changes signs for x ∈ [−1, 1]. So by Legendre’s Theorem, this functional has
neither a local minimum nor a local maximum. One can easily verify that y(x) = 1 is
the only extremal. □
Example 3 Consider the functional
 1
(y′ )2 /2 + y dx,

L(y) = y(0) = 0, y(1) = 1.
0

Here,
F(x, y, y′ ) = (y′ )2 /2 + y.
Since Fyy = Fy′ y = 0, and Fy′ y′ = 1, it follows that
 1
2
δ L(y) = (η ′ )2 (x) > 0.
0

Thus, the necessary condition for a relative minimum is met. In particular, L can not
have a local maximum which requires δ 2 L(y) ≤ 0. The equation y = x2 /2 + x/2 can
be computed to identify a potential minimizer, and we’ll show later that it does, in
fact, minimize the functional. □
Sufficient Conditions
Our next task is to obtain conditions that are sufficient for a function y to be a relative
minimum or a relative maximum for the functional L. Let y(x) be an extremal of the
functional (4.16). We have established that if δ 2 L(y) ≥ 0 for all functions η then
Fy′ y′ ≥ 0. We will be in a great shape if we can show that

δ 2 L(y) ≥ 0

for all functions η if and only if

Fy′ y′ ≥ 0 for x ∈ [a, b].

The next lemma plays a crucial role in proving our results regarding sufficient con-
ditions.
Lemma 13 If α(x) > 0, and the ordinary differential equation

z2
z′ + β (x) − = 0 for x ∈ [a, b], (4.25)
α(x)
230 Calculus of Variations

has a solution z = z(x), then δ 2 L(y) > 0, where

α(x) = Fy′ y′ (x, y(x), y′ (x)), (4.26)

and
d
β (x) = Fyy (x, y(x), y′ (x)) − F ′ (x, y(x), y′ (x)). (4.27)
dx y y

Proof Let y ∈ Σ and η ∈ C . Remember our functional is given by (4.16). Using the
terms α and β , δ 2 L(y) can be put in the simplified form
 b 
2
δ L(y) = α(x)η ′2 + β (x)η 2 dx. (4.28)
a

Jacobi brilliantly recognized that for any continuous function z = z(x), one
has  b
zη 2 )′ dx = 0, for all functions η ∈ C .
a
He also observed that
(zη 2 )′ = 2zηη ′ + z′ η 2 .
With these two observations in mind, δ 2 L(y) given in (4.28) takes the form
 b 
δ 2 L(y) = αη ′2 + 2zηη ′ + (z′ + β )η 2 dx. (4.29)
a

We already know that if α(x) ≥ 0, x ∈ [a, b] then y = y(x) is a relative minimum. So


it is safe to assume α(x) > 0 for x ∈ [a, b]. Consider the integrand in (4.29). After
some manipulations we arrive at

z z2 
αη ′2 + 2zηη ′ + (z′ + β )η 2 = α η ′2 + 2 ηη ′ + 2 η 2
α α
z 2
+ z′ + β − η2
α
z 2 z2  2
= α η ′ + η + z′ + β − η .
α α
Thus, if
z2
z′ + β (x) − = 0,
α(x)
has a solution z, then (4.29) reduces to
 b
z 2
δ 2 L(y) = α(x) η ′ + η dx ≥ 0,
a α

for any η ∈ C . Furthermore,


δ 2 L(y) = 0
Necessary and Sufficient Conditions 231

if and only if the initial value problem


z
η′ + η = 0, η(a) = 0,
α
if and only if
η(x) = 0,
due to the uniqueness of the solution. This would violates the fact that η(x) can not be
zero on the whole interval [a, b]. This tells us that (η ′ + αz η)2 > 0, and hence
 b
z 2
2
δ L(y) = α(x) η ′ + η dx > 0.
a α
This completes the proof.

The million-dollar question is, when does the differential equation given by (4.25)
have a solution? We adopt the following terminology:
Definition 4.8 The second variation δ 2 L(y) of the functional L(y) is said to be pos-
itive definite if
δ 2 L(y) > 0 for all η ∈ C and η ̸= 0.

The results of Lemma 13 depend on the existence of a solution for the Ricatti non-
linear first-order differential equation given by (4.25). We introduce a new function
h = h(x) and use the transformation

α(x)h′ (x)
z(x) = − . (4.30)
h(x)
Then (4.25) is transformed to the Jacobi differential equation
 ′
α(x)h′ − β (x)h = 0 for x ∈ [a, b]. (4.31)

We already know from Chapter 1 that (4.31) has a solution defined on the whole
interval [a, b] as long as α(x) > 0 and β (x) is continuous. However, our next headache
stems from the fact of inverting the transformation to go back from z(x) to h(x). In
other words, we can not have the solution h(x) of (4.31) to vanish or have zeros in
[a, b]. The next definition regarding conjugacy plays an important role in deciding
whether or not the Jacobi equation (4.31) vanishes in [a, b] or not.
Definition 4.9 Two points x = ξ1 and x = ξ2 , ξ1 ̸= ξ2 , are said to be conjugate points
for the Jacobi differential equation (4.31) if it has solution h such that h ̸= 0 between
ξ1 and ξ2 , and h(ξ1 ) = h(ξ2 ) = 0.
Notice that (4.31) has the general solution of the form

h(x) = c1 h1 (x) + c2 h2 (x)

where h1 and h2 are two linearly independent solutions on [a, b].


232 Calculus of Variations

Remark 15 The following statements are equivalent:


i. There is no conjugate points to a in (a, b].
ii. The solution h = h(x) of the initial value problem
d
[α(x)h′ (x)] − β (x)h(x) = 0, h(a) = 0 and h′ (a) = 1 (4.32)
dx
has no zero in (a, b].

Thus, if the interval [a, b] contains no conjugate points, then the Jacobi equation
(4.31) admits a solution h that does not vanish at any points in [a, b]. We have the
following theorem.
Theorem 4.8 The Jacobi equation (4.31) has a nonzero solution for all x ∈ [a, b] if
α(x) > 0 and there are no conjugate points to a in (a, b].
The implication of Lemma 13 and Theorem 4.8 is that (4.31) will have a nonzero
solution, which is a necessary condition for δ 2 L(y) to be positive definite. Thus we
have the next theorem.
Theorem 4.9 Let y ∈ C1 ([a, b]) be an extremal for the functional (4.16). Suppose
that α(x) > 0 for all x ∈ [a, b]. If there are no conjugate points to a in (a, b], then the
second variation δ 2 L(y) is positive definite.
Example 4.11 Consider the functional
 1
(y′ )2 + y2 − yy′ dx, y(0) = 0, y(1) = 1.

L(y) =
0

We compute the Jacobi equation given by (4.31). It turns out that


d
α = Fy′ y′ = 2 and β = Fyy − F ′ = 2.
dx yy
 ′
Then, the Jacobi differential equation is 2h′ (x) −2h = 0, which reduces to h′′ (x)−
h(x) = 0, and has the general solution h(x) = c1 ex + c2 e−x . Applying h(0) = 0, and
h′ (0) = 1 we arrive at
1
h(x) = (ex − e−x ),
2
which has no zeros in (0, 1]. Thus, by Theorem 4.8 it follows that the second variation
δ 2 L(y) is positive definite on [0, 1]. □
Finally we have the following result that would yield Jacobi Necessary condition
along with Theorem 4.7.
Theorem 4.10 Let y ∈ C1 ([a, b]) be an extremal for the functional (4.16). Suppose
that α(x) > 0 for all x ∈ [a, b].
(1). If δ 2 L(y) is positive definite, then there is no conjugate points to a in (a, b].
Necessary and Sufficient Conditions 233

(2). If δ 2 L(y) ≥ 0 for all η ∈ C , then there is no conjugate points to a in (a, b).

Note that the statement δ 2 L(y) ≥ 0 for all η ∈ C , permits the possibility that
δ 2 L(y) = 0 for some η ̸= 0 ∈ C .

Proof We begin by proving (1). by first showing x = b can not be a conjugate point
to a. We do this by contradiction. Assume b is a conjugate point to a. Then there is
a function h∗ depending on x such that h∗ (a) = h∗ (b) = 0 and satisfying the Jacobbi
equation (4.31). That is   ′
α(x)h′∗ − β (x)h∗ = 0. (4.33)

Just a reminder that


α(x) = Fy′ y′ (x, y(x), y′ (x)),
and
d
β (x) = Fyy (x, y(x), y′ (x)) −
F ′ (x, y(x), y′ (x)).
dx y y
Multiplying (4.33) with h∗ (x) followed by an integration by parts on the first term in
the integrand yields
 b  b
′ b
h∗ (x) α(x)h′∗ (x) dx = h∗ (x)α(x)h′∗ (x) x=a − α(x)h′2
∗ (x)dx.
a a

Then the full integral becomes


 b ′ b
α(x)h′∗ (x) − β (x)h∗ h∗ dx = h∗ (x)α(x)h′∗ (x) x=a

a
 b  b
− α(x)h′2
∗ (x)dx − β (x)h2∗ (x)dx
a a
 b
α(x)h′2 2

= − ∗ (x) + β (x)h∗ (x) dx = 0,
a
 ′
as a consequence of α(x)h′∗ − β (x)h∗ = 0. Thus,
 b
α(x)h′2 2

∗ (x) + β (x)h∗ (x) dx = 0.
a

Our aim is to show the above integral is δ 2 L(y). Notice that


 b  b
[α(x)h′∗ (x)]′ dx = [Fy′ y′ h′∗ (x)]′ dx.
a a
234 Calculus of Variations

Performing an integration by parts yields


 b  b
b
[Fy′ y′ h′∗ (x)]′ h∗ (x)dx = h∗ (x)h′∗ (x)Fy′ y′ x=a − Fy′ y′ h′2
∗ (x)dx
a a
 b
= − Fy′ y′ h′2
∗ (x)dx. (4.34)
a

Since h∗ is a solution of (4.33) we have


 b h ′ i
α(x)h′∗ − β (x)h∗ h∗ (x)dx = 0. (4.35)
a

Substituting (4.34) into (4.35) gives


 b 
δ 2 L(y) = Fy′ y′ h′2 2
∗ (x) + β (x)h∗ (x) dx = 0.
a

This implies there is a nontrivial η ∈ C such that the second variation vanishes,
contradicting the fact that δ 2 L(y) is positive definite. Hence b can not be conjugate
to a. Left to show that there is no conjugate points to a in (a, b). We follow the proof
given by Gelfand and Fomin ([12], p. 109). The plan is to build a family of positive
definite functionals K(µ), which depend on the parameter µ ∈ [0, 1], such that K(1)
is the second variation and K(0) is unconstrained by conjugate points to a. This
means that any solution to the Jacobi equation for K will be a continuous function
of µ. This continuity is then used by to demonstrate that the absence of a conjugate
point for K(0) implies that for K(µ), and in particular K(1). Let K represent the
functional as defined by
 b
K(µ) = µδ 2 L(y) + (1 − µ) η ′2 (x)dx.
a
 b
It can be easily shown that η ′2 (x)dx has no conjugate points in (a, b]. Moreover,
a
K(µ) is positive definite for all µ ∈ [0, 1]. The Jacobi Equation associated to K(µ)
is h  i′
(J)µ := µα(x) + (1 − µ) u′ − µβ (x)u = 0. (4.36)

Every solution u(x; µ) to (4.36), however, is continuous with regard to µ ∈ [0, 1]. As
a result, we may state that u(x, µ), has a continuous derivative with respect to µ
for all µ in an open interval including [0, 1] because µα(x) + (1 − µ) > 0 for all
µ ∈ [0, 1]. Therefore, the solution u(x; µ) with u(a; µ) = 0 and u′ (a; µ) = 1 depends
on µ continuously for x ∈ (a, b]. Let’s begin by the value µ = 0. Then (J)0 of (4.36)
gives u′′ = 0, with the solution

u(x; 0) = x − a

which has no conjugate points in (a, b). Next we deal with µ = 1, and assume the
contrary. That is there is a conjugate point c∗ ∈ (a, b], that is, u(c∗ ; 1) = 0. Then
Necessary and Sufficient Conditions 235

there is µ0 ∈ (0, 1) so that corresponding solution u(x; µ0 ) satisfies u(b; µ0 ) = 0.


This implies (J)µ0 has a nonzero solution u(x; µ0 ) with

u(a; µ0 ) = u(b; µ0 ) = 0.

We will try to get a contradiction by concluding δ 2 L(y) = 0. Multiply (J)µ with


u = u(x; µ0 ) and then integrate the resulting equation from x = a to x = b. After some
calculations we arrive at
 b h  i
µ0 α(x) + (1 − µ0 ) u′2 + µ0 β (x)u2 dx = 0,
a

which is equivalent to
 b
2
µ0 δ L(y) + (1 − µ0 ) η ′2 (x)dx = 0
a

with η(x) = u(x; µ0 ) ̸= 0 and η ∈ C . This is a contradiction to the fact that δ 2 L(y) >
b
0 and a η ′2 (x)dx > 0 for all η ̸= 0 ∈ C .
The proof of (2). follows along the same lines beginning with the statement “Left
to show that there is no conjugate points to a in (a, b).” This completes the
proof.

The next result is known as Jacobi Necessary Condition, which is consequential of


Theorems 4.7 and 4.10.
Theorem 4.11 [Jacobi necessary condition] Let y ∈ C1 ([a, b]) be an extremal for
the functional (4.16) with α(x) > 0 for x ∈ [a, b]. If y = y(x) is a local minimum, then
there is no conjugate point to a in (a, b).
Remark 16 As we shall see next that, if α(x) > 0 for x ∈ [a, b] and under the strong
condition that there is no conjugate point to a in (a, b], then y = y(x) is a local min-
imum. In other words, the existence of no conjugate point to a in (a, b] is equivalent
to δ 2 L(y) > 0.
In the next theorem we provide sufficient conditions for relative minimum and rela-
tive maximum.
Theorem 4.12 [Legendre sufficient condition] Let y0 (x) ∈ C1 ([a, b]) (smooth) be an
extremal function for the functional
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) = B.
a

Assume there is no conjugate point to a in (a, b].


1. If Fy′ y′ (x, y0 (x), y′0 (x)) > 0, then y0 (x) is relative minimum for L(y).
2. If Fy′ y′ (x, y0 (x), y′0 (x)) < 0, then y0 (x) is relative maximum for L(y).
236 Calculus of Variations

Proof We follow the proof of Sagan, [21]. Let α be given by (4.26). Assume α(x) > 0
and that the interval [a, b] does not contain any conjugate points to a Then, due to
the continuity of the Jacobi’s equation (4.31), a bigger interval [a, b + ε] exists that
still has no conjugate points to a and is such that α(x) > 0 in [a, b + ε]. For nonzero
constant ζ , consider the variational
 b   b
α(x)η ′2 + β (x)η 2 dx − ζ 2 η ′2 dx. (4.37)
a a

Then the corresponding Euler-Lagrange equation of (4.37) is


d
β (x)η − [(α − ζ 2 )]η ′ ] = 0. (4.38)
dx
Given that α(x) is positive in [a, b + ε] and so has a positive greatest lower bound,
and that the solution to the equation (4.38) satisfying η(a) = 0, η ′ (a) = 1 depends
continuously on ζ for all sufficiently small ζ , we have:
1. α(x) − ζ 2 > 0, a ≤ x ≤ b;
2. The solution to the equation (4.38) satisfies η(a) = 0 and η ′ (a) = 1 does not
vanish for a ≤ x ≤ b.

Thus, by Theorem 4.8, these two conditions imply that the quadratic functional (4.37)
is positive definite for all sufficiently small ζ . That is, there exists a positive constant
d such that  b  b

α(x)η ′2 + β (x)η 2 dx > d η ′2 dx. (4.39)
a a
As a consequence of (4.39), the functional or variational L(y) has a minimum. In
other words, if y = y(x) is the extremal and y = y(x) + η(x) is a sufficiently close
neighboring curve, then from the notation of Definition 4.3, and equations (4.20)
and (4.21) we have that
 b   b
L(y + η) − L(y) = α(x)η ′2 + β (x)η 2 dx + (ε1 η 2 + ε2 η ′2 )dx, (4.40)
a a

where ε1 (x), ε2 (x) → 0 as ||η||1 → 0, uniformly for a ≤ x ≤ b. On the other hand,


from Lemma 12 we see that
 b  b
(b − a)2
η 2 (x)dx ≤ (η ′ (x))2 dx.
a 2 a

This yields
 b 
 (b − a)2  b ′
(ε1 η 2 + ε2 η ′2 )dx ≤ ε 1 + (η (x))2 dx, (4.41)
a 2 a
Necessary and Sufficient Conditions 237

when |ε1 (x)| ≤ ε, |ε2 (x)| ≤ ε. Since we can chose ε > 0 arbitrarily small, it follows
from (4.39) and (4.41) that
 b   b
L(y + η) − L(y) = α(x)η ′2 + β (x)η 2 dx + (ε1 η 2 + ε2 η ′2 )dx > 0,
a a

for sufficiently small ||η||1 . Therefore, we conclude that the extremal y = y(x) is a
relative minimum of the functional (4.16). This completes the proof of 1. The proof of
2. is not trivial, and it follows along the lines of the proof of 1.
Example 4.12 Consider the functional
 π/2
L(y) = ((y′ )2 − y2 )dx, y(0) = 1, y(π/2) = 0. (4.42)
0

Then y0 (x) = cos(x) is an extremal for (4.42). Now,

Fyy′ = 0, Fy′ y′ = 2, and Fyy = −2.

Then, the Jacobi differential equation is

h′′ (x) + h(x) = 0,

has the nontrivial solution h(x) = sin(x). Clearly, h(0) = 0, and there are no other
points a∗ ∈ (0, π/2] such that h(a∗ ) = 0. Therefore, the interval [0, π/2] admits no
conjugate points. More over, the Legendre condition

Fy′ y′ (x, y0 (x), y′0 (x)) = 2 > 0,

and by 1. of Theorem 4.12, y0 (x) = cos(x) minimizes or a relative minimum of the


functional (4.42). Note that if we make use of the equivalent condition for conjugacy
ii. of Remark 15, we see that h(x) = sin(x) solves the Jacobi differential equation
and it satisfies h(0) = 0, h′ (0) = 1 and h(x) does not vanish at any other points in the
interval (0, π/2]. Therefore there are no conjugate points. □
Example 4.13 Show the extremal y(x) = 6(1 − 1x ) is a relative minimum for the
functional  2
L(y) = x2 (y′ )2 dx, y(1) = 0, y(2) = 3.
1

We have Fy′ = 2x2 y′ , d


dx Fy′ = 4xy′ + 2x2 y′′ , and Fy = 0. This yields the Euler-
Lagrange equation
x2 y′′ + 2xy′ = 0,
which has the solution
1
y(x) = 6(1 − ),
x
by the method of Section 1.11. Next we compute the Jacobi equation. It turns out
that
d
α = Fy′ y′ = 2x2 , and β = Fyy − Fyy′ = 0.
dx
238 Calculus of Variations

Then, by (4.32) the Jacobi differential equation is


 ′
2x2 h′ (x) = 0.
c1
A direct integration leads to h(x) = − + c2 . Applying h(1) = 0 and h′ (1) = 1, we
2x
arrive at
1
h(x) = − + 1,
x
which has no zeros in (1, 2]. Finally
Fy′ y′ = 2x2 > 0 for all x ∈ [1, 2].
We conclude by Theorem 4.12, the extremal y(x) = 6(1 − 1x ) is a relative minimum
for the functional. □
Remark 17 In Theorem 4.12, if y0 (x) is a solution of the Euler-Lagrange equation
and
Fy′ y′ (x, y0 (x), y′0 (x)) = 0,
then y0 (x) is neither a maximum nor a minimum and it is said to be a saddle path.
It is unclear if extremals with conjugate points may be categorized in a way that is
similar to how saddle points are classified in finite dimensions. Although many phys-
ical applications of such a classification may be of little interest, it turns out that it
is unquestionably a lucrative area of research in topology and differential geometry.
The Calculus of Variations in the Large, a large area of study invented by M. Morse
[16], is founded on the classification of extremals with conjugate points. Morse the-
ory is outside the scope of this book, and we urge the interested reader to consult the
reference [16].

4.4.1 Exercises
Exercise 4.30 Find the extremal function for
 1
(y′ )2 + y2 + 2yex dx,

L(y) = y(0) = 0, y(1) = 1
0
and show it minimizes the functional L.
Exercise 4.31 Find the extremal function for
 π/4
(y′ )2 /2 − 4y dx,

L(y) = y(0) = 0, y(π/4) = 1
0
and show it minimizes the functional L.
Exercise 4.32 Find the extremal function for
 2
L(y) = x2 (y′ )2 + y′ )dx, y(1) = 1, y(2) = 3
1
and show it minimizes the functional L.
Applications 239

Exercise 4.33 Find the extremal function for


 1q
L(y) = 1 + (y′ )2 dx, y(0) = 0, y(1) = 1
0

and show it minimizes the functional L.


Exercise 4.34 Find the extremal y = y(x) and show it minimizes
 2
L(y) = x3 (y′ )2 dx, y(1) = 0, y(2) = 1.
1

Exercise 4.35 Let g(x) be continuous and positive on the interval [a, b] with a > b >
0. Show that if y = y(x) is an extremal for the functional
 b
L(y) = g(x)(y′ )2 dx,
a

with fixed end points, then it minimizes the functional.


Exercise 4.36 Show that if y = y(x) is an extremal for the functional
 π
y sin(x) − (y′ )2 + 2yy′ + 1 dx,

L(y) =
0

with fixed end points, then it is a relative maximum.

4.5 Applications
This section is devoted to the application of calculus of variations. We will look into
familiar problems in physics such as minimal surface, geodesics on sphere, and the
histochrone problem.
Minimal surface area
Suppose we have a curve y given by y = f (x) that is continuous on [a, b]. For sim-
plicity, we assume f (x) > 0 on [a, b]. The goal is to find the curve passing thorough
the points P(a, A) and Q(b, B) which when rotated about the x-axis gives a minimum
surface area. This is depicted in Fig. 4.5.
Let ds be the arc length of PQ. Then at any point on the curve, ds rotates through a
ds
distance 2πy around the x-axis. Hence the sectional area is 2πyds = 2πy dx. There-
 b  b dx
ds p
fore, the total surface area is 2πy dx = 2πy 1 + y′2 dx. We must minimize
a dx a
the functional
 b p
L(y) = 2πy 1 + y′2 dx, y(a) = A, y(b) = B.
a
240 Calculus of Variations
y

y = f (x)
B
A
x
a b

FIGURE 4.5
Surface of revolution; minimal surface area.


Since F = y 1 + y2 is independent of x, we use

F − y Fy = C.

After simple calculations, it follows that


 yy
y 1 + y2 − y  = C,
1 + y2
which simplifies to
y
 = C.
1 + y2

Solving for y we arrive at y = C1 y2 −C2 , and as a consequence, we are to
solve
dy
C = dx.
y −C2
2

dy
Let y = C cosh(t). Using the identity cosh2 (u) − sinh2 (u) = 1, and dt = C sinh(t) we
have
dy C sinh(t)
C  =C dt = Ct.
y2 −C2 C sinh(t)

But Cy = cosh(t), which implies that t = cosh−1 ( Cy ). Thus after integrating both sides
we end up with
y
C cosh−1 ( ) = x + K,
C
or,
y x
cosh−1 ( ) = + K.
C C
Taking cosine hyperbolic inverse on both sides leads to
x
y = C cosh( + K),
C
Applications 241

A(x1 , y1 )

B(x2 , y2 )

v
mg

FIGURE 4.6
Brachistochrone curve.

where the constants C and K can be found using the boundary conditions. The graph
of the solution represents catenary.
In engineering, catenaries are frequently used in designing bridges, roofs and
arches.
Brachistochrone curve
A brachistochrone curve, also known as a curve of fastest descent in physics and
mathematics, is the curve on a plane between a point A and a lower point B, where B
is not directly below A, on which a bead slides frictionlessly under the influence of a
uniform gravitational field to a given end point in the shortest amount of time. Johann
Bernoulli posed the issue in 1696, asking: “Given two points A and B in a vertical
plane, what is the curve sketched out by a point acting only under the influence of
gravity, which starts at A and reaches B in the shortest time?” For the mathematical
set up, we assume a mass m with initial velocity zero slides with no friction under the
force of gravity g from a point A(x1 , y1 ) to a point B(x2 , y2 ) along a wire defined by
a curve y = f (x) in the xy-plane (x1 < x2 , y1 > y2 ). Which curve leads to the fastest
time of descent? See Fig. 4.6.
A variational problem can be formulated by computing the time of descent t for a
fixed curve connecting the points A and B. Let s denotes the distance traveled and
ds ds
v = v(t) represents the velocity. Then v = , which implies that dt = . The arc
p dt v
length ds of AB is ds = 1 + y′2 . To obtain an expression for v we use the fact that
energy is conserved through the motion; that is
(kinetic energy at t > 0) + (potential energy at t > 0) = (kinetic energy at t = 0) +
(potential energy at t = 0). This translate into
1 2
mv + mgy = 0 + mgy1 . (4.43)
2
Solving for v we get p
v= 2g(y1 − y(x)).
242 Calculus of Variations
ds
Using the obtained values of ds and v in dt = gives
v
p
1 + y′2
dt = p dx.
2g(y1 − y(x))

Integrating both sides from x = x1 to x = x2 we obtain the total time of descent


 x2 p
1 + y′2
t= p dx.
x1 2g(y1 − y(x))

Thus our problem is to minimize the functional or variational


 x2 p
1 + y′2
L(y) = p dx, y(x1 ) = y1 , y(x2 ) = y2 . (4.44)
x1 2g(y1 − y(x))

Notice that F is independent of x, and so we make use of the necessary Euler-


equation
F − y′ Fy′ = C.
After some calculations we arrive at
p
1 + y′2 (1 + y′2 )−1/2
√ − y′2 √ = C,
y1 − y y1 − y

which reduces to
dy 2 1 −C2 (y1 − y)
= .
dx C2 (y1 − y)
dy
Solving for and separating the variables, it follows that
dx
p
y1 − y)
dx = − p dy, C1 = C−2 .
C1 − (y1 − y)

dy
The negative sign is due to the fact that < 0. Integrating both sides and using the
dx  
transformation y1 − y = C1 sin2 (ϕ/2), we obtain x = C1 /2 ϕ − sin(ϕ) + C2 . The
solution is then
 
y1 − y = C1 sin2 (ϕ/2), x = C1 /2 ϕ − sin(ϕ) +C2 ,

which is the parametrization of a cycloid.


Great circle: Geodesic
In this problem, we are interested in finding the shortest path between two points on a
sphere with radius a > 0. It turned out that the shortest surface path between them is
an arc of a great circle between the two points. This problem is analogous to finding
Applications 243

the shortest distance between two points in a plane. Next, we formulate the problem
into a variational equation and find its solution. Let a > 0 and consider the sphere
centered at the origin with radius a,

x2 + y2 + z2 = a2 .

We will use spherical coordinates

x = r sin(θ ) cos(φ ), y = r sin(θ ) sin(φ ), z = r cos(θ ),

where θ is the angle from the positive z- axis, φ is the angle from the positive x-axis
and r = a is constant. By the chain rules we have

∂x ∂x
dx = dθ + dφ ,
∂θ ∂φ

∂y ∂y
dy = dθ + dφ ,
∂θ ∂φ
and
∂z ∂z
dz = dθ + dφ .
∂θ ∂φ
As a consequence, we arrive at

dx = r cos(θ ) cos(φ )dθ − r sin(θ ) sin(φ )dφ ,

dy = r cos(θ ) sin(φ )dθ + r sin(θ ) cos(φ )dφ ,


and
dz = −r sin(θ )dθ .
Using the identity cos2 (u) + sin2 (u) = 1, and after some calculations, we arrive
at
 
(dx)2 + (dy)2 + (dz)2 = a2 (dθ )2 + sin2 (θ )(dφ )2
 dφ 
= a2 1 + sin2 (θ )( )2 (dθ )2 .

Let P(a, θ1 , φ1 ) and Q(a, θ2 , φ2 ) be any two points on the sphere. Then the arc length
ds between the two points is given by
r
dφ 
q
2 2
ds = (dx) + (dy) + (dz) = a 2 1 + sin2 (θ )( )2 dθ .

Q
Knowing that s = P ds, we arrive at
 θ2
r
dφ 2
s=a 1 + sin2 (θ )( ) dθ .
θ1 dθ
244 Calculus of Variations
dy dφ
Setting x = θ and y = φ . Then, dx = dθ and y′ = = . Thus, the problem
dx dθ
reduces to minimizing the functional
 x2 q
L(y) = 1 + sin2 (x)(y′ )2 dx, y(x1 ) = y1 , y(x2 ) = y2 .
x1
q
Since F = 1 + sin2 (x)(y′ )2 is independent of y we use alternate form of Euler-
Lagrange equation (Fy′ )x = 0, which implies Fy′ = c, for constant c. It can be obtained
that
y′ sin2 (x)
Fy′ = q = c.
1 + sin2 (x)(y′ )2
Solving for y′ we see that

c csc2 (x)
y′ = p .
1 − c2 (1 + cot2 (x))
Separating the variables and then integrating both sides yiels

c csc2 (x)
y= p + constant.
1 − c2 (1 + cot2 (x))

Let u = c cot(x). Then du = −c csc2 (x)dx and the above integral reduces to

du  u 
y=− √ = cos−1 √ + d,
1 − c2 − u2 1 − c2
for some constant d. This implies that
u
√ = cos(y − d),
1 − c2
or p
c cot(x) = 1 − c2 cos(y − d).
Finally, replacing x by θ and y by φ , leads to the solution
p
c cot(θ ) = 1 − c2 cos(φ − d).

Next we try to make some sense out of this solution. Multiply both sides by
a sin(θ ), where a is the radius of the sphere and at the same time use cos(u − v) =
cos(u) cos(v) + sin(u) sin(v) to get
p 
ca cos(θ ) = 1 − c2 a cos(d) sin(θ ) cos(φ ) + a sin(d) sin(θ ) sin(φ ) .

Recall that, we are in spherical coordinates, and so the above equation takes the form
in rectangular coordinates
p
cz = 1 − c2 cos(d)x + sin(d)y , c2 ∈ (0, 1)

Applications 245

which represents an equation of the plane that intersects the sphere. Since the plane
passes through the centre of the sphere, which is the origin, the section of the sphere
by the plane is the great circle, or geodesic. All sections of other planes are small
circles. This great circle has two arcs between P and Q; the major arc, and the minor
has the minimum length. This is the geodesic on the surface of a sphere. Recall, a
geodesic on a given surface is a curve lying on that surface along which distance
between two points is as small as possible.

4.5.1 Exercises
Exercise 4.37 Show that the shortest path between two points on a circular cylinder
is along the circular helix joining them. Assume the two points are not on a generator.
Hint: use cylindrical coordinates to parametrize the circular cylinder x2 + y2 = a2 .
Let P(a, θ1 , z1 ) and Q(a, θ2 , z2 ). Compute ds and then integrate to obtain the varia-
tional that needs to be minimized.
p
Hint: Let x = a cos(θ ), y = a sin(θ ), z = z(θ ). Show ds = a2 + [z′ (θ )]2 dθ .
Exercise 4.38 Find the geodesics on a right circular cone. Use spherical coordi-
nates
x = u sin(α) cos(v), y = u sin(α) sin(v), z = u cos(v),
q q
to show ds = 1 + u2 sin2 (α)(v′ )2 du and minimize 1 + u2 sin2 (α)(v′ )2 du,
where α is the apex angle. If you replace u with x and v with y, then you are to
minimize  q
L(y) = 1 + x2 sin2 (α)(y′ )2 dx.

Exercise 4.39 [Hanging chain] Let y = y(x) be the curve configuration of a uniform
inextensible heavy chain hanging from two fixed points P(a, A) and Q(b, B) at rest in
a constant gravitational field. For mathematical convenience assume the rope density
and gravity are both one. Show that the shape of the curve y is a catenary.
Answer: y(x) = c cosh x−d

c , for constants c and d.
Exercise 4.40 [Minimal surface] Consider the solution of the Minimal surface prob-
lem
x
y = C cosh( + K).
C
Show that under the boundary conditions
L L
y(− ) = y( ) = 1,
2 2
the constant K = 0.
Hint: Make use of the identities cosh(x + y) and cosh(x − y).
246 Calculus of Variations

4.6 Generalization of Euler-Lagrange Equation


In this section, we extend the development of Euler-Lagrange equations to variational
with higher-order derivatives and variational involving several variables.
Generalizations to variational with higher-order derivatives
Let y = y(x) ∈ C4 [a, b] and consider the variational with second-order derivative and
given boundary conditions
 b
L(y) = F(x, y, y′ , y′′ )dx (4.45)
a

y(a) = A1 , y′ (a) = A2 , y(b) = B1 , y′ (b) = B2 .


Let η = η(x) ∈ C4 [a, b] satisfying

η(a) = η ′ (a) = η(b) = η ′ (b) = 0.

We follow the same development as in Section 4.2. For ε > 0, set

y(x) + εη(x),

where y is an extremal function for the functional L(y) given by (4.45). In the func-
tional L(y) replace y by y + εη to arrive at
 b
L(ε) = F(x, y + εη, y′ + εη ′ , y′′ + εη ′′ )dx.
a

Once y and η are assigned, then L(ε) has extremum when ε = 0. But this possible
only when
dL(ε)
= 0 when ε = 0.

dL(ε) dx
Suppress the arguments in F and compute dε and notice that since dε =0
 b
dL(ε) ∂
= F(x, y + εη, y′ + εη ′ , y′′ + εη ′′ )dx

dε ∂ε a ε=0
 bh i
= Fy η + Fy′ η ′ + Fy′′ η ′′ dx.
a

We perform an integration by parts on the second and third terms in the integrand.
Let dv = η ′ (x)dx, and u = ∂∂ yF′ . Then

d ∂F 
v = η(x) and du = dx.
dx ∂ y′
Generalization of Euler-Lagrange Equation 247

It follows that
 b  b
′ d
Fy′ η dx = − F ′ η(x)dx,
a a dx y

since η(a) = η(b) = 0. Performing integration by parts twice on the third term
gives
 b  b 2
d
Fy′′ η ′′ dx = Fy′′ ηdx.
a a dx
Consequently, we have
 bh
d d2 i
Fy − Fy′ + 2 Fy′′ η(x)dx = 0.
a dx dx

It follows from Lemma 10 that


d d2
Fy − Fy′ + 2 Fy′′ = 0. (4.46)
dx dx
for all functions η(x). Equation (4.46) is referred to as Euler-Lagrange equa-
tion.
Remark 18 1. Equation (4.46) is a fourth order ordinary differential equation.
2. The function y satisfying the Euler-Lagrange equation is a necessary, but not
sufficient, condition for L(y) to be an extremum. In other words, a function y(x)
may satisfy the Euler-Lagrange equation even when L(y) is not an extremum.
We have the following theorem.
Theorem 4.13 [Euler-Lagrange equation] If a function y = y(x) ∈ C4 ([a, b]) is an
extremal to the variational problem in (4.45), then y(x) must satisfy the Euler-
Lagrange equation
d d2
Fy − Fy′ + 2 Fy′′ = 0.
dx dx
The prove of the results in Remark 19 are left as an exercise.
Remark 19 Let y = y(x) ∈ C4 [a, b] be an extremal of (4.45).
(a) If F does not contain y, then the respective necessary Euler-Lagrange equation
reduces to
d
F ′′ − Fy′ = constant. (4.47)
dx y
(b) If F does not explicitly contain x, then the corresponding required Euler-
Lagrange equation becomes
d 
y′′ Fy′′ − y′ Fy′′ − Fy′ − F = constant. (4.48)
dx
248 Calculus of Variations

The aforementioned findings are easily generalized to functionals with nth order
derivatives. Let y = y(x) ∈ Cn [a, b] and consider the variational with nth order deriva-
tives  b
L(y) = F(x, y, y′ , y′′ , y′′′ , . . . , y(n−1) , y(n) )dx
a
and boundary conditions

y(a) = A1 , y′ (a) = A2 , ... , y(n−1) (a) = An ,

y(b) = B1 , y′ (b) = B2 , ... , y(n−1) (b) = Bn .


Then it can be easily shown that y(x) satisfies the necessary Euler-Lagrange equa-
tion
d d2 dn
Fy − Fy′ + 2 Fy′′ + . . . + (−1)n n Fy(n) = 0.
dx dx dx

Example 4.14 Find the extremal y = y(x) for the functional


 1
x + y′2 + (y′′ )2 dx

L(y) =
0

subject to
y(0) = 0, y′ (0) = 1, y(1) = −1, y′ (1) = 2.
The corresponding necessary Euler-Legandre condition is

y(4) − y′′ = 0.

Using the method of Section 1.8, we obtain the general solution

y(x) = c1 + c2 x + c3 ex + c4 e−x .

Applying the given boundary conditions yields

1 + e(2 − e) − 2e−1 e(e − 1) + e−1 − 1


c1 = , c2 = ,
e − e−1 e − e−1

3 − 2e−1 e + 2(e−1 − 1)
c3 = , c4 = .
e − e−1 1 − e−2

Generalizations to variational involving several variables.
Let y, z ∈ C2 [a, b], and consider the variational with two variables y and z
 b
L(y, z) = F(x, y, y′ , z, z′ )dx, (4.49)
a
Generalization of Euler-Lagrange Equation 249

with boundary conditions

y(a) = A1 , y(b) = B1 , z(a) = A2 , z(b) = B2 .

Let η1 = η1 (x) ∈ C([a, b]) and η2 = η2 (x) ∈ C([a, b]), such that

η1 (a) = η1 (b) = η2 (a) = η2 (b) = 0.

By imitating the derivation of previous work we arrive at


 b
dL(ε) ∂
= F(x, y + εη1 , y′ + εη ′ , z + εη2 , z′ + εη2′ )dx

dε ∂ε a ε=0
 bh i
= Fy η1 + Fy′ η1′ + Fz η2 + Fz′ η2′ dx.
a

dL(ε)
dx
since dε = 0. Setting dε |ε=0 and integrating by parts the terms that involves η1′

and η2 we arrive at
 bh
d  d  i
Fy − Fy′ η1 (x)dx + Fz − Fz′ η2 (x)dx = 0,
a dx dx

that must hold for all η1 (x), η2 (x). So without loss of generality, we assume it holds
for η2 (x) = 0. Then, we have
 b
d 
Fy − F ′ η1 (x)dx = 0
a dx y

and by Lemma 10, we arrive at

d
Fy − F ′ = 0.
dx y
Substituting this back into the above integral gives
 b
d 
Fz − F ′ η2 (x)dx = 0,
a dx z
d
and by Lemma 10, we see that Fz − dx Fz′ = 0. As a consequence, we state the fol-
lowing theorem.
Theorem 4.14 [Euler-Lagrange equation] If the functions y = y(x), z = z(x) are ex-
tremal of the variational problem in (4.49), then y(x), z(x) must satisfy the the pair
of Euler-Lagrange equations
d d
Fy − F ′ = 0, Fz − F ′ = 0.
dx y dx z
250 Calculus of Variations

Again, the above discussion can be generalized to a variational with n variable func-
tions. To see this, we assume each of yi = yi (x) ∈ C([a, b]), i = 1, 2, . . . n is an extremal
for the variational
 b
L(y1 , y2 , . . . , yn ) = F(x, y1 , y2 , . . . , yn , y′1 , y′2 , . . . , y′n )dx,
a
with
yi (a) = Ai , yi (b) = Bi , i = 1, 2, . . . , n.
Then each of yi = yi (x), i = 1, 2, . . . n must satisfy the necessary Euler-Lagrange
equation
d
Fyi − Fy′i = 0, i = 1, 2, . . . n.
dx

Example 4.15 Consider the functional


 π/2
L(y, z) = (x + y2 − z2 + y′2 + z′2 )dx
0
with boundary conditions
y(0) = 1, y(π/2) = 2, z(0) = −1, z(π/2) = 4.
The corresponding pairs of Euler-Lagrange equations given in Theorem 4.14 are
y′′ − y = 0, z′′ + z = 0,
with the general solutions
y(x) = c1 ex + c2 e−x , z(x) = c3 sin(x) + c4 cos(x),
where
2 − e−π/2 2 − e−π/2
c1 = , c2 = 1 − , c3 = 4, c4 = −1.
eπ/2 − e−π/2 eπ/2 − e−π/2

4.6.1 Exercises
Exercise 4.41 Find the extremal y(x) for the variational
 1
L(y) = (1 + y′′2 )dx, y(0) = 0, y′ (0) = 1, y(1) = 1, y′ (1) = 1.
0

Exercise 4.42 Find the extremals y = y(x), z = z(x) for the variational
 π/4
L(y, z) = (4y2 + z2 − y′2 − z′2 )dx
0
subject to
y(0) = 1, y(π/4) = 0, z(0) = 0, z(π/4) = 1.
Generalization of Euler-Lagrange Equation 251

Exercise 4.43 Find the extremals y = y(x), z = z(x) for the variational with boundary
conditions  π/4
L(y, z) = (4y2 + z2 + y′ z′ )dx,
0
y(0) = 1, y(π/4) = 0, z(0) = 0, z(π/4) = 1.
Hint: Solving for the constants will be messy.
Exercise 4.44 Prove parts (a) and (b) of Remark 19.
Exercise 4.45 Use Exercise 4.44 to show that for constants c1 and c2 the Euler-
Lagrange equation of the variational
 b
(1 + y′2 )2
L(y) = dx
a y′′
is
c1 y′ + c2
y′′ =1
(1 + y′2 )2
and solve the differential equation.
Hint: Use the transformation y′ = tan(u) to solve the differential equation.
Exercise 4.46 Find the extremals y = y(x), z = z(x) (no need to solve for the con-
stants) for the variational
 1
z′2 + (y′2 − 1)2 + z2 + yz dx

L(y, z) =
0

Exercise 4.47 An elastic beam has vertical displacement y(x), x ∈ [0, l]. (The x-axis
is horizontal and the y-axis is vertical and directed upwards.) Let ρ be the load per
unit length on the beam. The ends of the beam are supported, that is, y(0) = y(l) = 0.
Then the displacement y minimizes the energy functional
 l
1 2
D y′′ (x) + ρgy(x) dx,

L(y) =
0 2
where D, ρ and g are positive constants. Write down the differential equation and the
rest of the boundary conditions that y(x) must satisfy and then show that the solution
is
ρg
y(x) = − x(l − x)[l 2 + x(l − x)].
24D
Exercise 4.48 Find the extremal y = y(x), z = z(x), for the variational with bound-
ary conditions
 π/2
L(y, z) = (y′2 + z′2 + 2yz)dx
0
y(0) = 1, y(π/2) = 1, z(0) = 0, z(π/2) = −1.
Answer: y(x) = sin(x), z(x) = − sin(x).
252 Calculus of Variations

Exercise 4.49 Find the extremal y = y(x), z = z(x) for fixed end points of the varia-
tional  b
L(y, z) = (2yz − 2y2 − (y′ )2 + (z′ )2 )dx.
a
Exercise 4.50 Find the extremal y = y(x), z = z(x) for fixed end points of the varia-
tional  b
L(y, z) = (y′ z′ + y2 + z2 )dx.
a
Exercise 4.51 Find the extremal y = y(x), z = z(x) for the variational with boundary
conditions  1
L(y, z) = (2y + (y′ )2 + (z′ )2 )dx,
0
3
y(0) = 1, y(1) = , z(0) = 1, z(1) = 1.
2
Answer: y(x) = 1 + x2 /2, z(x) = 1.

4.7 Natural Boundary Conditions


In the preceding sections, we only considered variational with fixed endpoints. In
this section, our aim is to redevelop the theory that either one or both endpoints
are free to move. We begin by letting y = y(x) ∈ C2 ([a, b]) be an extremal of the
variational
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) is unspecified. (4.50)
a

Such a problem is called free endpoints problem. Note that y(b) takes values at the
vertical line x = b, as illustrated in Fig. 4.7. It seems that if y is an extremal, additional
condition(s) must be imposed at the second boundary point x = b. Most of the next
derivations are similar to those in Theorem 4.3.
Let η = η(x) ∈ C2 ([a, b]) with η(a) = 0. In the functional L(y) replace y by y + εη.
Setting
dL(ε)
dε ε=0
we arrive at
 bh
∂F ∂F i
(x, y + εη, y′ + εη ′ )η + ′ (x, y + εη, y′ + εη ′ )η ′ dx .

a ∂y ∂y ε=0

Therefore, we obtain the necessary condition


 bh
∂F ∂F i
(x, y, y′ )η(x) + ′ (x, y, y′ )η ′ (x) dx = 0.
a ∂y ∂y
Natural Boundary Conditions 253
y

y(x)
• y(b)

• y(b)

A • η(b)

(0, 0) x
a x=b

FIGURE 4.7
Free boundary condition at x = b.

We perform an integration by parts on the second term in the integrand of the above
integral.
 b  b
∂F ′ ∂F b d ∂F 
η dx = η(x) a − η(x)dx
a ∂ y′ ∂y′
a dx ∂ y′

∂ F(b, y(b), y′ (b)) ∂F b
d ∂F 
= ′
η(b) − ′ η(a) − η(x)dx
∂y ∂y a dx ∂ y′

∂ F(b, y(b), y′ (b)) b
d ∂F 
= η(b) − η(x)dx,
∂ y′ a dx ∂ y′

since η(a) = 0. Substituting back into the integral we arrive at


 bh
∂F d ∂ F i
− η(x)dx + Fy′ (b, y(b), y′ (b))η(b) = 0. (4.51)
a ∂ y dx ∂ y′

Since (4.51) holds for all values of η, it must hold for η also satisfying the condition
η(b) = 0. Hence
 bh
∂F d ∂ F i
− η(x)dx = 0,
a ∂ y dx ∂ y′
and by Lemma 10 , it follows that

d ∂F  ∂F
− = 0, (4.52)
dx ∂ y′ ∂y

for all functions η(x). A substitution of (4.52) into (4.51) gives

Fy′ (b, y(b), y′ (b)) := Fy′ x=b = 0.



(4.53)
254 Calculus of Variations

Similar results can be easily obtained for cases when y(a) is unspecified or both y(a)
and y(b) are unspecified. We summarize the results in the next theorem but first we
state
Fy′ (a, y(a), y′ (a)) := Fy′ x=a = 0.

(4.54)

Theorem 4.15 Let y = y(x) ∈ C2 [a, b] be an extremal for the variational


 b
L(y) = F(x, y, y′ )dx (4.55)
a

with boundary conditions specified or unspecified at x = a and x = b.


1) If both boundary conditions are specified, (y(a) = A, y(b) = B) then a necessary
condition for y(x) to be an extremal of (4.55) is the Euler-Lagrange equation
given by (4.52).
2) If y(a) is not specified and y(b) is specified (y(b) = B), then the necessary condi-
tions for y(x) to be an extremal of (4.55) are the Euler-Lagrange equation given
by (4.52) and (4.54).
3) If y(a) is specified (y(a) = A) and y(b) is unspecified then the necessary condi-
tions for y(x) to be an extremal of (4.55) are the Euler-Lagrange equation given
by (4.52) and (4.53).
4) If neither y(a) nor y(b) is specified then the necessary conditions for y(x) to be an
extremal of (4.55) are the Euler-Lagrange equation given by (4.52), plus (4.53)
and (4.54) .
Example 4.16 Let y = y(x) be an extrema of the variational
 1
L(y) = (y′2 + y2 )dx, y(0) = 1, y(1) is unspecified.
0

Then fy′ = 2y′ , and hence Fy′ x=1 = 2y′ (1) = 0. Moreover, the corresponding Euler-

Lagrange equation is y′′ − y = 0. Thus, we are left with solving the second-order
differential equation

y′′ − y = 0, y(0) = 1, y′ (1) = 0,

which has the solution


1 e−2 x
y(x) = −2
e−x + e.
1+e 1 + e−2

Example 4.17 [River crossing] A boat wants to cross a river with two parallel banks
at a distance b apart. One of the banks coincides with the y-axis. The other bank is
line x = b as depicted in Fig. 4.8. The water is assumed to be moving parallel to
Natural Boundary Conditions 255

v(x)
y

y(x)
• y(b)

(0, 0) x
x=b

FIGURE 4.8
Boat route.

the banks with speed v(x). The boat’s constant speed is c such that c2 > v2 . Assume
(0, 0) is the departure point. We are interested in finding the route that the boat should
take to reach the opposite bank in the shortest possible time.
To do so, we assume the boat moves along a path y = y(x). Let α be the angle at
which the boat is steered. Then the velocity of the boat in the river is
dy dy/dt v + c sin(α) v
y′ = = = = sec(α) + tan(α).
dx dx/dt c cos(α) c
On the other hand, the time T required to cross the river is
 b  b  b  b
′ dt 1 1
T= t (x)dx = dx = dx
dx = sec(α)dx.
0 0 dx 0 dt 0 c

From the preceding equation of y′ we have

cy′ = v sec(α) + c tan(α).

Or
(cy′ − v sec(α))2 = c2 tan2 (α) = c2 (sec2 (α) − 1).
After rearranging the terms we arrive at the quadratic equation in sec(α),

(c2 − v2 ) sec2 (α) + 2cvy′ sec(α) − c2 (1 + y′2 ) = 0,

that we need to solve. Since sec(α) > 0 in the first quadrant we have that

−cvy′ + c2 v2 y′2 + c2 (c2 − v2 )(1 + y′2 )


p
sec(α) =
c2 − v2
−cvy′ + c c2 (1 + y′2 ) − v2
p
= .
c2 − v2
256 Calculus of Variations

O(0, 0)

y(b))
v
mg

FIGURE 4.9
Brachistochrone free end point.

A substitution of sec(α) in the integrand of T yields the variational


 bp 2
c (1 + y′2 (x)) − v2 (x) − v(x)y′ (x)
L(y) = , y(0) = 0, y(b) is unspecified.
0 c2 − v2 (x)
(4.56)
Minimizing expression (4.56) yields finding the shortest trajectory or path y = y(x)
that the boat follows in order to cross to the other bank of the river, which is equiva-
lent to finding the shortest possible time. □
Example 4.18 [Brachistochrone problem revisited] We revisit the Brachistochrone
problem that was considered in Section 4.5. For simplicity, we assume the bead’s
starting point is the origin. That is, (x1 , y1 ) = (0, 0). As before, we let x2 = b so that
we have compatible notation as in Section 4.5. In this problem, we are seeking the
shape of the wire y(x) that enables the bead to get from the origin to a point on the
line x = b > 0 in the shortest time. In other words, y(b) is unspecified; see Fig. 4.9.
Since we are starting at the origin, the conservation of energy equation (4.43) takes
the form
1 2
mv = mgy.
2
Solving for v and setting y1 = 0, the variational in (4.44) is reduced to
 bp
1 + y′2 (x)
L(y) = p dx, y(0 = 0, y(b) is free. (4.57)
0 2gy(x)

Natural boundary conditions of higher-orders.
We follow the same set up as in Section 4.6. Let y ∈ C4 ([a, b]) and consider the
variational  b
L(y) = F(x, y, y′ , y′′ )dx, (4.58)
a
with boundary conditions
y(a) = A1 , y′ (a) = A2 , y(b) = B1 , y′ (b) = B2 .
Natural Boundary Conditions 257
dL(ε)
Setting = 0,
dε ε=0
 b
dL(ε) d
= F(x, y + εη, y′ + εη ′ , y′′ + εη ′′ )dx

dε dε a ε=0
 bh i
= Fy η + Fy′ η ′ + Fy′′ η ′′ dx. (4.59)
a

dx
since = 0. We perform an integration by parts on the second and third terms in

the integrand. After some work we end up with
 bh i x=b x=b d x=b
Fy η + Fy′ η ′ + Fy′′ η ′′ dx = Fy′ η + Fy′′ η ′ − Fy′′ η

a x=a x=a dx x=a
 b 2 
d d
+ F ′′ − Fy′ + Fy ηdx
2 y
a dx dx
 d   x=b
= Fy′′ η ′ − Fy′′ − Fy′ η
dx x=a
 b 2 
d d
+ F ′′ − Fy′ + Fy ηdx.
a dx2 y dx

Thus the natural boundary conditions depend on the relation


h d  i x=b
Fy′′ η ′ − Fy′′ − Fy′ η = 0. (4.60)
dx x=a

Then a combination of the following natural boundary conditions are needed when
one or more boundary condition is unprescribed or unspecified. To be specific, we
may deduce from (4.60) the following:

Fy′′ x=a = 0, if y′ (a) is unspecified,



(4.61)


Fy′′ x=b = 0, if y (b) is unspecified,
(4.62)
d 
F ′′ − Fy′ = 0, if y(a) is unspecified, (4.63)

dx y x=a
and
d 
F ′′ − Fy′ = 0, if y(b) is unspecified. (4.64)
dx y x=b

Recall that in order for y(x) to be an extremal of (4.58) it must satisfy the Euler-
Lagrange equation given by

d2 d
2
Fy′′ − Fy′ + Fy = 0,
dx dx
that readily follows from (4.59). We have the following example.
258 Calculus of Variations

Example 4.19 Find the extremal y = y(x) for the functional


 π/2
− y2 + (y′′ )2 dx

L(y) =
0
subject to
y(0) = 1, y′ (0) = 2, y(π/2), and y′ (π/2) are unspecified.
The corresponding necessary Euler-Legrange equation is
y(4) − y = 0.
Using the method of Section 1.8 we obtain the general solution
y(x) = c1 e−x + c2 ex + c3 sin(x) + c4 cos(x).
The two natural boundary conditions that we need are (4.62) and (4.64). Condition
(4.62) yields
y′′ (π/2) = 0.
Similarly, from condition (4.64) we get
y′′′ (π/2) = 0.
Hence, by applying all four boundary conditions, we arrive at the system of equations
c1 + c2 + c4 = 1,
−c1 + c2 + c3 = 2,
c1 e−π/2 + c2 eπ/2 − c3 = 0,
−c1 e−π/2 + c2 eπ/2 + c4 = 0,

with solution
c1 = 3.35786, c2 = 0.80197, c3 = 4.55589, c4 = −3.15983.

Next, we provide an application for reducing a cantilever beam’s potential energy. A
more general case of the study of beam will be considered in Chapter 5. As a result of
an underlying force that pulls a body toward its source, a system has a propensity to
reduce potential energy. Or shoving a body away if the force is repellent. As a result,
the distance is reduced, which reduces potential energy. Hence, potential energy is a
measure of potential movement; potential energy is a measure of potential motion.
Clearly, if the two attractive bodies are already together, there is no movement and
no potential energy. Now, this justification holds true for both elastic and electric
potential energy. There is an underlying force that moves the material in each of these
instances. While these forces can produce movement, if their nature is attracting, the
corresponding potential energy increases with distance. In conclusion, the support
situation, profile (form of the cross-section), geometry, equilibrium situation, and
material of a beam are its defining characteristics.
Natural Boundary Conditions 259

q(x)
L y(L) = 0
y(0) = 0 • • x
y(x)
y′ (0) = 0 y′ (L) = 0

FIGURE 4.10
Clamped Beam at both end points.

Example 4.20 Suppose we have a beam of length L with small transverse displace-
ment y(x) under transverse load q(x). The beam is subject to infinitesimal deflections
only. According to the force and moment balance approach, the displacement is gov-
erned by the fourth-order differential equation

d4y
eI = q(x), (4.65)
dx4
where e is the modulus of elasticity of the beam’s material and I(x) is the moment of
inertia of the beam’s cross-sectional area about a point x. We are interested in min-
imizing the potential energy. It is thought that applying the minimal total potential
energy approach will make future extensions of the beam equation into large deflec-
tions, nonlinear materials, and accurate modeling of shear forces between the cable
elements simpler than using force- and moment balances. The potential energy is a
combination of the strain energy,

1  d 2 y 2
eI ,
2 dx2
or the deformed energy stored in the elastic plus the work potential. The work po-
tential is the negative work done by external forces, which is −qy. Thus, the total
potential energy is given by the variational
 Lh
1  d 2 y 2 i
L(y) = eI − q(x)y(x) dx, (4.66)
0 2 dx2

where e, q, and I are known quantities. Note that then Euler-Lagrange equation of
(4.66) is (4.65). In what to follows, we will consider different cases of conditions
corresponding to support systems for the beam, and we assume that e and I are con-
stants.
(I). The beam is clamped at each end, as Fig. 4.10 shows. In this case, we have the
four boundary conditions

y(0) = y′ (0) = 0, y(L) = y′ (L) = 0,

and hence no natural boundary conditions are in play.


260 Calculus of Variations

q(x)
L
y(0) = 0 • x
′ y(x)
y (0) = 0

FIGURE 4.11
Clamped Beam at x = 0.

(II). The beam is only clamped at x = 0 as Fig. 4.11 shows. A beam that is fixed at
one end and free at the other end is known as a cantilever beam. A cantilever
beam is one that is free-hanging at one end and fixed at the other. This type of
beam is capable of carrying loads with both bending moment and sheer stress
and is typically used when building bridge trusses or similar structures. The end
that is fixed is typically attached to a column or wall. The tension zone of a
cantilever beam, is found at the top of the beam with the compression zone at the
bottom of the beam. In such a case, we are considering a cantilever beam, which
is a rigid structure supported at one end and free at the other. We are assuming
small deflection of the beam since the end point x = L is unclamped. In this case
we need the natural boundary conditions (4.62) and (4.64). Conditions (4.62)
and(4.64) yields
eIy′′′ (L) = 0, and eIy′′ (L) = 0.
The condition y′′′ (L) = 0 means that the reaction force at x = L is zero. Similarly,
the condition y′′ (L) = 0 means that the reaction moment force at x = L is zero.
(III). We assume the beam is simply supported at the end points as depicted in Fig.
4.12. Simply supported beams are those that have supports at both ends of the
beam. These are most frequently utilized in general construction and are very
versatile in terms of the types of structures that they can be used with. A simply
supported beam has no moment resistance at the support area and is placed in a
way that allows for free rotation at the ends on columns or walls. In other words,
the beam is pinned at both ends, and no restrictions are imposed on y′ at x = 0
and x = L. The relevant natural boundary conditions in this instance are (4.61)
and (4.62) and as a consequence, we obtain y′′ (0) = 0 and y′′ (L) = 0.
(IV). Double overhanging: This is a simple beam with both ends extending beyond its
supports on both ends. Then all four natural boundary conditions (4.61)–(4.64)
are in play. Consequently, they yield

y′′ (0) = y′′ (L) = 0, and y′′′ (0) = y′′′ (L) = 0.

Physically, this means that the reaction force and moment at each end of the
beam must be zero under these circumstances.


Impact of y′′ on Euler-Lagrange Equation 261

q(x)
L
y(0) = 0 x y(L) = 0
y(x)

FIGURE 4.12
Simply supported beam.

4.8 Impact of y′′ on Euler-Lagrange Equation


In this brief section, we examine variational in which y′′ enters linearly. Thus, we are
interested in variational of the form
 b 
L(y) = N(x, y)y′′ + M(x, y) dx, (4.67)
a

with boundary conditions

y(a) = A1 , y′ (a) = A2 , y(b) = B1 , y′ (b) = B2 .

Assume N and M are continuous with continuous partial derivatives on some subset
of R2 . Let
F(x, y) = N(x, y)y′′ + M(x, y).
Then, Fy′′ = N, and therefore

d d
F ′′ = N = Nx + Ny y′ .
dx y dx
Moreover,
d2
F ′′ = Nxx + Nxy y′ + Ny y′′ + Nyx y′ + Nyy y′2 .
dx2 y
In addition, Fy′ = 0, and Fy = Ny y′′ + My . Thus, the Euler-Lagrange equation

d2 d
2
Fy′′ − Fy′ + Fy = 2Ny y′′ + Nyy y′2 + Nxy y′ + My = 0,
dx dx
which is a second-order differential equation, and hence not all four boundary con-
ditions can be satisfied in most cases.

4.8.1 Exercises
Exercise 4.52 Let y = y(x) be an extremal of the variational
 1
L(y) = (y′2 + y2 )dx.
0
262 Calculus of Variations

(a) Find y(x) when y(0) is unspecified and y(1) = 1.


(b) Find y(x) when both end points are unspecified, and argue that it minimizes L.
Exercise 4.53 Find the extremal y = y(x) of the variational
 π/4
L(y) = (−y′2 + y2 )dx,
0

when y(0) = 1 and y(π/4) is unspecified.


Exercise 4.54 Find the extremal y = y(x) of the variational
 e
x2 ′2 y2
L(y) = ( y − )dx,
1 2 8

when y(1) = 1 and y(e) is unspecified.


Exercise 4.55 Find the extremal y = y(x) of the variational
 1
1 ′2
L(y) = [y + yy′ + y]dx,
0 2

when y(0) and y(1) are unspecified.



Exercise 4.56 [River crossing] Compute Fy′ x=b for the variational (4.56).
v(b)
Answer: y′ (b) = .
c
Exercise 4.57 Suppose p(x) and q(x) are continuous and positive functions on [0, 1].
Find the Euler-Lagrange equation and the natural boundary condition for the varia-
tional  1
L(y) = [p(x)y′2 − q(x)y2 ]dx,
0
y(0) = 0, y(1) free.
Exercise 4.58 Solve the variational problem (4.57) that describes the shortest path
for Brachistochrone with free end point.
Exercise 4.59 Assume g(x, y) ̸= 0 for all (x, y). Find the natural boundary condition
for the variational
 b q
L(y) = g(x, y) 1 + (y′ )2 dx,
a
y(0) is free and y(1) = 0.
Exercise 4.60 Show that the extremal for
 b
1
q
L(y) = 1 + (y′ )2 dx, y > 0
a y
Discontinuity in Euler-Lagrange Equation 263

is
(x − B)2 + y2 = R2 ,
for appropriate constants B and R.
Exercise 4.61 Find the extremals y = y(x), z = z(x) for the variational
 π
L(y, z) = (4y2 + z2 − y′2 − z′2 )dx;
0

y(0) = 1 = z(0), y(π) and z(π) are unspecified.


Exercise 4.62 Find y = y(x) the extremal of the variational
 1
L(y) = (1 + (y′′ )2 )dx,
0

when y(0) = 0, y′ (0) = 1 and y(1), y′ (1) are unspecified.


Exercise 4.63 Find the extremal y = y(x) for the functional
 π/2
− y2 + 2yx3 + (y′′ )2 dx;

L(y) =
0

y(0), y′ (0) are unspecified and y(π/2) = 1, y′ (π/2) = 2.


Exercise 4.64 Compute
d2 d
F ′′ − Fy′ + Fy
dx2 y dx
for the functional
 b 
L(y) = N(x, y)y′′ + P(x, y)y′ + M(x, y) dx
a

4.9 Discontinuity in Euler-Lagrange Equation


Consider the variational
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) = B (4.68)
a
d
and suppose one or both of the terms Fy and dx Fy′ are discontinuous at one or
more points in (a, b). For illustrational purpose, we assume there is one point of
discontinuity, c ∈ (a, b). Divide the interval [a, b] into two subintervals such that
[a, b] = [a, c− ) ∪ (c+ , b]. We are searching for a continuous extremal y(x) of (4.68)
and as a consequence the following condition must hold.

lim y(x) = lim y(x). (4.69)


x→c− x→c+
264 Calculus of Variations

Our η function is assumed to be continuous in the sense that

η(c+ ) = lim η(x) = η(c− ) = lim η(x) = η(c).


x→c+ x→c−

By a similar arguments as in Section 4.7, one has


 bh
∂F ∂F  i
(x, y + εη, y′ + εη ′ )η + ′ x, y + εη, y′ + εη ′ η ′ dx
a ∂y ∂y ε=0
 c− h
∂F ∂F i
x, y + εη, y′ + εη ′ η + ′ (x, y + εη, y′ + εη ′ )η ′ dx

=

a ∂y ∂y ε=0
 bh
∂F ∂ F i
x, y + εη, y′ + εη ′ η + ′ (x, y + εη, y′ + εη ′ )η ′ dx

+

c+ ∂ y ∂y ε=0

dL(ε)
Set dε |ε=0 = 0 and integrate by parts to obtain,
 c− h  bh
∂F ∂F i ∂F ∂F i
η + ′ η ′ dx + η + ′ η ′ dx
a ∂y ∂y c+ ∂y ∂y
 c− b 
d  d 
= Fy − Fy′ ηdx + Fy − Fy′ ηdx
a dx + dx
 c−
+ Fy′ c , y(c ), y (c ) η(c ) − Fy′ a, y(a), y′ (a) η(a)
− − ′ −


+ Fy′ b, y(b), y′ (b) η(b) − Fy′ c+ , y(c+ ), y′ (c+ ) η(c− ) = 0.


 
(4.70)

As a consequence of (4.70) we obtain the following conditions.


d d
Fy − F ′ = 0, a < x < c− ; Fy −
F ′ = 0, c+ < x < b, (4.71)
dx y dx y
Fy′ a, y(a), y′ (a) = 0

(4.72)
Fy′ b, y(b), y′ (b) = 0,

(4.73)
and
lim Fy′ = lim Fy′ . (4.74)
x→c− x→c+

Theorem 4.16 Let y = y(x) ∈ C2 [a, b] be an extremal for the variational


 b
L(y) = F(x, y, y′ )dx (4.75)
a

with boundary conditions specified or unspecified at x = a and x = b. Assume there


is a discontinuity at at point c ∈ (a, b).
1) If both boundary conditions are specified, (y(a) = A, y(b) = B) then conditions
(4.69), (4.71), and (4.74) are needed.
2) If y(a) is not specified and y(b) is specified (y(b) = B), then the necessary con-
ditions (4.69), (4.71), (4.72), and (4.74) are needed.
Discontinuity in Euler-Lagrange Equation 265

3) If y(a) is specified (y(a) = A) and y(b) is unspecified then the necessary condi-
tions (4.69), (4.71), (4.73), and (4.74) are needed.
4) If neither y(a) nor y(b) is specified then the necessary conditions (4.69), (4.71)–
(4.74) are needed.
Example 4 Consider the functional
 1
L(y) = ( f (x)y′2 + y)dx, y(−1) = y(1) = 0
−1

with 
1, −1 ≤ x < 0
f (x) =
2, 0 < x ≤ 1.
Obviously, we have discontinuity at c = 0. Regardless of the discontinuity, the Euler-
Lagrange equation is
d
(2 f (x)y′ ) − 1 = 0. (4.76)
dx
For −1 ≤ x < 0, we have 2y′′ − 1 = 0, with the general solution

x2
y(x) = + c1 x + c2 . (4.77)
4
Similarly, for 0 < x ≤ 1, we have 4y′′ − 1 = 0, with the general solution

x2
y(x) = + d1 x + d2 . (4.78)
8
An application of 0 = y(−1) to (4.77) gives
1
c2 − c1 = − .
4
Next apply 0 = y(1) to (4.77) and get
1
d1 + d2 = − .
8
An application of (4.69)
lim y(x) = lim y(x),
x→0− x→0+
yields
c2 = d2 .
Finally, condition (4.74) yields
c1 = 2d1 .
Next we substitute d1 = c1 /2, and d2 = c2 into d1 +d2 = − 18 to obtain c1 +2c2 = − 14 .
Finally, solving
1 1
c2 − c1 = − ; c1 + 2c2 = − ,
4 4
266 Calculus of Variations

one obtains
1 1
c1 = , c2 = − .
12 6
Also, It follows that
1 1
, d2 = − .
d1 =
24 6
In conclusion, the solution over the whole interval is given piecewise
 2
1
 x4 + 12
 x − 16 , −1 ≤ x < 0
y(x) =
 x2 + 1 x − 1 , 0 < x ≤ 1.

8 24 6

which is continuous at x = 0. □

4.9.1 Exercises
Exercise 4.65 Find the extrema y = y(x) for the functional
 1
f (x)y′2 + 8y2 dx,

L(y) = y(−1) = 0, y(1) = 1,
−1

with
1

2, −1 ≤ x < 0
f (x) =
2, 0 < x ≤ 1.
Exercise 4.66 Find the extrema y = y(x) for the functional
 π/2
L(y) = ( f (x)y′2 − 2y2 )dx, y(0) = 0, y(π/2) = 1,
0

with 
2, 0 ≤ x < π/4
f (x) = 1
2, π/4 < x ≤ π/2.

−1, 0 ≤ x < 1/4
Exercise 4.67 Let f (x) = and consider the functional
1, 1/4 < x ≤ 1
 1
L(y) = (y′2 f (x))dx.
0

Find the extremal of L(y) when


(a) y(0) = 0, y(1) = 1,
(b) y(0) = 0 and y(1) is unassigned,
(c) y(0) is unassigned and y(1) = 1,
(d) both y(0) and y(1) are unassigned.
Transversality Condition 267
y

ϕ(b, y(b))

ϕ(x, y(x))
y(a)
(0, 0) x
a b b + εξ

FIGURE 4.13
Transversality condition.

4.10 Transversality Condition


Let y = y(x) ∈ C2 ([a, b]) and consider the variational,
 b
L(y) = F(x, y, y′ )dx, y(a) = A. (4.79)
a

So far, we have investigated specified boundaries and unspecified boundaries that


take values along vertical lines. In this section, we are interested in exploring the
scenario when the free point moves along a specified curve. Without loss of gen-
erality, we assume y(a) is fixed and y(b) slides or lies on a curve defined by the
equation ϕ(x, y) = 0. Assume that ϕx and ϕy don’t vanish simultaneously on the do-
main of interest. As before, we assume a function η ∈ C2 ([a, b]) with η(a) = 0 so
that the function y(x) + η(x) is in the admissible space of functions. See Fig. 4.13.
Let (b, ϕ(b)) be the terminal point of the extremal of y on ϕ(x, y). From Fig. 4.13,
the terminal point of the varied path y(x) + εη(x) is

y(b + εξ ) + εη(b + εξ ) = y(b) + ε ξ y′ (b) + η(b) + O(ε 2 ),




for ξ > 0.
Since the same point lies on the curve ϕ(x, y) = 0 we have that
 
ϕ b + εξ , y(b) + ε(ξ y′ (b) + η(b)) = 0. (4.80)

Expanding expression (4.80) to the first-order ε, yields

ϕ b, y(b) + ξ ϕx + ξ y′ (b) + η(b) ϕy = 0.


 
268 Calculus of Variations

Use the fact that ϕ b, y(b) = 0 and rearrange the terms to get

ξ ϕx + ϕy y′ (b) + η(b)ϕy = 0.

(4.81)
dL(ε)
Now, we are ready to compute dε . Let
 b+εξ
L(y + εη) = F(x, y + εη, y′ + εη ′ )dx.
a

Then by Leibniz rule, which says,


 f (x)  f (x)
d ∂
B(x,t)dt = B(x, f (x)) f ′ (x) − B(x, g(x))g′ (x) + B(x,t)dt,
dx g(x) g(x) ∂x

we see that
 b+εξ

dL(ε) d ′ ′
= F(x, y + εη, y + εη )dx
dε dε a ε=0

′ ′

= F b + εξ , y(b + εξ ) + εη(b + εξ ), y (b + εξ ) + εη (b + εξ ) ξ
ε=0
 bh i
∂F ∂F
+ (x, y + εη, y′ + εη ′ )η + ′ (x, y + εη, y′ + εη ′ )η ′ dx .
a ∂y ∂y ε=0

dL(ε)
Setting ε = 0, dε = 0 and integrating by parts, the above expression yields,
 b
d
F b, y(b), y′ (b) ξ + Fy′ b, y(b), y′ (b) η(b) +
  
Fy − Fy′ η(x)dx = 0. (4.82)
a dx
Solving for ξ in (4.81) yields
η(b)ϕy
ξ =− .
ϕx + ϕy y′ (b)

Substituting into (4.82) and factoring η(b) give


 b
 Fϕy  d 
− + Fy′ x=b η(b) + Fy − F ′ η(x)dx = 0. (4.83)
ϕx + ϕy y′ (b) a dx y

The above relation (4.83) holds for all η(x), a ≤ x ≤ b and in particular it must hold
when η(b) = 0. Thus (4.83) implies
 b
d 
Fy − Fy′ η(x)dx = 0,
a dx
and so by Lemma 10 we arrive at
d
Fy − F ′ = 0. (4.84)
dx y
Transversality Condition 269

Substituting (4.84) into (4.83) yields


 Fϕy 
− ′
+ Fy′ x=b = 0,
ϕx + ϕy y (b)
or
h   i
Fy′ ϕx + y′ (b)ϕy − ϕy F = 0. (4.85)

x=b
Condition (4.85) is called the transversality condition. A similar work can be per-
formed to obtain h   i
Fy′ ψx + y′ (a)ψy − ψy F =0 (4.86)

x=a
when y(a) varies along the curve ψ(x, y) = 0 and y(b) is fixed.
Let us take a closer look at the transversality condition given by (4.85). Suppose we
can solve for y in terms of x in ϕ(x, y) = 0. If so, then we set y = g(x). Now
d
ϕ(x, y) = ϕx + ϕy y′ = 0.
dx
This implies that
ϕx
y′ = − = g′ (x).
ϕy
We may solve for ϕx and obtain ϕx = −g′ (x)ϕy . Substituting ϕx into (4.85)
yields h  i
F + g′ (x) − y′ (b) Fy′ = 0. (4.87)
x=b

Reminder:

F = F(b, y(b), y′ (b)), and Fy′ = Fy′ (b, y(b), y′ (b)).

x=b x=b

Along the lines of the preceding discussion, if y(b) is fixed and the left end point
y(a) varies along a curve y = h(x), then the corresponding transversality condition
is h  i
F + h′ (x) − y′ (a) Fy′ = 0. (4.88)
x=a
Thus, we proved the following theorem.
Theorem 4.17 Let y = y(x) ∈ C2 [a, b] be an extremal for the variational (4.79) with
boundary conditions specified or unspecified at x = a and x = b.
1) If y(a) moves along the curve y = h(x) and y(b) is specified (y(b) = B), then the
necessary conditions for y(x) to be an extremal of (4.79) are the Euler-Lagrange
equation given by (4.84) and (4.88).
2) If y(a) is specified (y(a) = A) and y(b) moves along the curve y = g(x), then the
necessary conditions for y(x) to be an extremal of (4.79) are the Euler-Lagrange
equation given by (4.84) and (4.87).
270 Calculus of Variations

3) If both endpoints are allowed to move freely along the curves h and g, then the
necessary conditions for y(x) to be an extremal of (4.79) are the Euler-Lagrange
equation given by (4.84), plus (4.87) and (4.88).

Natural boundary conditions can be easily derived from this discussion. For example,
if y(a) is fixed and y(b) varies along the line x = b, then ϕ(x, y) = x − b. This im-
plies that ϕx = 1, and ϕy = 0. Substituting into (4.85), we obtain Fy′ (b, y(b), y′ (b)) =
0.
Example 4.21 Find the shortest distance from the point (0, 0) to the nearest point on
the curve xy = 1, x, y > 0. Basically, by Example 4.5 we are to minimize
 bq
L(y) = 1 + (y′ )2 dx, y(0) = 0
0

and y(b) lies on the curve g(x) = 1x . Then,


q
F = 1 + (y′ )2 ,
which is independent of x, and y, and hence we make use of Fy′ = C that is given in
Corollary 4.12. It follows that
y′
p = C.
1 + (y′ )2
Solving for y′ we end up with
y′ = constant = K,
where K is some function of C (another constant). Hence, y(x) = Kx + D. Applying
0 = y(0) we get D = 0. We are in need of another boundary condition to solve for K.
We make use of the transversality condition (4.87), which requires that
h  i
F + g′ (x) − y′ (b) Fy′ = 0.
x=b
Or
y′ (b) 1
q
− 2 − y′ (b) = 0.

1 + y′2 (b) + p
1 + y′2 (b) b
Since y′ = k, the above expression reduces to
p K 1
1 + K2 + √ (− 2 − K) = 0.
1+K 2 b
√ 2
Multiply by 1 + K 2 to arrive at K = b . Consequently, the shortest distance from
the point (0, 0) to the nearest point (b, y(b)) on the curve xy = 1 is
y(x) = b2 x.
For example if b = 1, then y(x) = x is a straight line with the shortest distance be-
tween the origin and the point (1, 1) that lies on the parabola y = 1/x as depicted in
Fig. 4.14. □
Transversality Condition 271
y

(b, y(b))

• (1, 1) 1
y= x

(0, 0) x

FIGURE 4.14
Shortest distance to a parabola.

4.10.1 Problem of Bolza


Now we extend the results of Section 4.10 to the Bolza Problem. For y = y(x) ∈
C2 ([a, b]) we are interested in finding the extremal of the functional
 b
L(y) = h(b, y(b)) + F(x, y, y′ )dx, y(a) = A. (4.89)
a

Without loss of generality, we assume y(a) is fixed and y(b) slides or lies on a curve
defined by the equation ϕ(x, y) = 0. The set up is very identical to the one in Section
4.10. Thus, following the same derivations, we have, with slight modification due to
the presence of the function h that

L(y + εη) = h b + εξ , y(b + εξ + εη(b + εξ ))
 b+εξ
+ F(x, y + εη, y′ + εη ′ )dx.
a

Then by Leibniz rule, we have that

dL(ε) d 
= h b + εξ , y(b + εξ ) + εη(b + εξ )
dε dε
 b+εξ
d
+ F(x, y + εη, y′ + εη ′ )dx

dε a ε=0
 
  ′
= hx b, y(b) ξ + hy b, y(b) y (b)ξ + η(b)
 bh i
+ F b, y(b), y′ (b) ξ + Fy (x, y, y′ )η + Fy′ (x, y, y′ )η ′ dx.

a
272 Calculus of Variations

After integrating by parts and rearranging the terms, the above expression simplifies
to
   b
  ′ ′ d 
hx b, y(b) + hy b, y(b) y (b) + F(b, y(b), y (b)) ξ + Fy − Fy′ η(x)dx
a dx
 
+ hy b, y(b)) + Fy′ (b, y(b), y′ (b) η(b).

(4.90)

The value of ξ is not affected by the presence of the function h and hence, using the
results of the previous section we see that

η(b)ϕy
ξ =− .
ϕx + ϕy y′ (b)

Suppose we can solve for y in terms of x in ϕ(x, y) = 0. If so, then we set y = g(x).
Now
d
ϕ(x, y) = ϕx + ϕy y′ = 0.
dx
This implies that
ϕx
y′ = − = g′ (x).
ϕy
We may solve for ϕx and obtain ϕx = −g′ (x)ϕy . As a consequence, we will
have
η(b)ϕy η(b)
ξ =− = .
ϕx + ϕy y′ (b) g′ (b) − y′ (b)
Substituting into (4.90) and factoring η(b) give
h h + h y′ + F  b
x y
i d 
+ hy + Fy′ η + Fy − Fy′ η(x)dx = 0. (4.91)

g′ − y′ x=b a dx

Arguing as before one obtains from (4.91) that

d
Fy − F′ =0 (4.92)
dx y
and the transversality condition

hx + hy y′ + F
+ hy + Fy′ ,
g′ − y′
which simplifies to
h i
hx + F + g′ hy + (g′ − y′ )Fy′ = 0. (4.93)

x=b
Note that the term y in (4.93) is the solution of the Euler-Lagrange equation given
by (4.92). Along the lines of the preceding discussion, if y(b) is fixed and the left
Transversality Condition 273

end point y(a) varies along a curve y = l(x), then the corresponding transversality
condition is
h i
hx + l ′ hy − F − (l ′ − y′ )Fy′ =0 (4.94)

x=a

Example 4.22 Find the extremal of the functional


 π/2
2
(y′ )2 − y2 dx,

L(y) = (π/2) + y(0) = 0,
0

and y(π/2) varies along the curve y + 2 − x2 = 0.


Here we have
F = (y′ )2 − y2 , h(x) = x2 .
Thus, (4.92) yields
y′′ (x) + y(x) = 0,
which has the general solution
y(x) = c1 cos(x) + c2 sin(x).
Applying the first boundary condition, we arrive at c1 = 0. To obtain c2 we make use
of (4.93). Let y(x) = c2 sin(x). By computing all necessary terms, condition (4.93)
yields,  
′ 2 ′
2
  ′
2b + (y ) (b) − y (b) + − 2b − y (b) (2y (b) π = 0.

b= 2

Since y(π/2) = c2 and y′ (π/2) = 0, the above expression yields


2(π/2) + (0 − c22 ) + (−2(π/2) − 0)(0) = 0.

Solving for c2 we obtain c2 = ± π, and so the extremal is

y(x) = ± π sin(x).

4.10.2 Exercises
Exercise 4.68 Find the shortest distance from the point (a, A) to the nearest point
(b, y(b)) on the line with slope m, y = mx + c.
Exercise 4.69 Find the extremal y = y(x) for the functional
 bp
1 + y′2
J(y) = dx, y(0) = 0
0 y
and y(b) varies along the circle
(x − 9)2 + y2 = 9.
274 Calculus of Variations

Exercise 4.70 Find the extremal y = y(x) for the functional


 bp
1 + y′2
J(y) = dx, y(0) = 0
0 y

and y(b) varies along the line y = x − 5.


Exercise 4.71 Consider the variational
 b p
L(y) = xy 1 + y′2 dx, y(a) = A
a

and y(b) varies along the curve y = g(x). Show that at the point x = b,

g′ (b)y′ (b) = −1.

Of course same results hold if we interchange the boundary conditions.


Exercise 4.72 Find the extremal y = y(x) for the functional
 b
J(y) = x3 y′2 dx, y(1) = 0
1

and y(b) varies along the curve x2 (y + 2) − 2 = 0.


Exercise 4.73 Find the extremal y = y(x) for the functional
 b
J(y) = y′2 dx
0

(a) y(0) = 1 and y(b) varies along the curve y − 2x + 3 = 0.


(b) y(0) = 2 and y(b) varies along the curve y − sin(x) = 0.
Exercise 4.74 Derive (4.94) for the functional
 b
L(y) = h(a, y(a)) + F(x, y, y′ )dx, y(b) = B
a

and y at a varies along the curve l(x).


Exercise 4.75 Find the extremal of the functional
 π/2
L(y) = (π/2)2 + y2 (π/2) + ((y′ )2 − y2 )dx, y(0) = 0,
0

and y(π/2) varies along the curve y + 1 − x2 = 0.


Corners and Broken Extremal 275

4.11 Corners and Broken Extremal


In Example 4.6 of Section 4.2 we touched on broken extremal. In this section we
want to make the concept formal and more precise. So far, we have looked at ex-
tremal y(x) ∈ C2 ([a, b]), which is not always the case. Let’s begin with the following
example.
Example 4.23 Consider the variational
 2
L(y) = y2 (2 − y′ )2 dx, y(−2) = 0, y(2) = 2.
−2

Then the second-order differential equation corresponding to the Euler-Lagrange


equation
F − y′ Fy′ = c1
is
y2 (2 − y′ )(2 + y′ ) = c1 ,
or
y2 (4 − y′2 ) = c1 .
If c1 = 0, then we obtain the two solutions

y = 0, or y = ±2x + B.

Easy to see that neither solution satisfy both boundary conditions. Thus, we suspect
̸ 0. So we assume c1 ̸= 0 and obtain
at least for now that c1 =

y2 − c1
y′2 = .
y2
After separating the variables we arrive at
y
dx = ± p dy.
y2 − c1

An integration of both sides yields


p
x = ± y2 − c1 + c2 .

Rearrange the terms to obtain the solution

(x − c2 )2 = y2 − c1 ,

which is hyperbola. Next, we make use of both boundary conditions to evaluate c1


and c2 . With that being said, the following two equations are obtained.

(−2 − c2 )2 = −c1
276 Calculus of Variations
y

• (2, 2)

(−2, 0)
• x

FIGURE 4.15
There is no smooth path that connects boundary conditions.

(2 − c2 )2 = 4 − c1 .
Solving for c1 in the first equation and substituting it into the second equation yields

(2 − c2 )2 = 4 + (2 + c2 )2 .

After expanding the terms we obtain c2 = − 21 . Consequently, using c1 = −(2 + c2 )2


we see that c1 = − 94 . Finally, the solution is
9
y2 = (x + 1/2)2 − ,
4
which is a hyperbola as depicted in Fig. 4.15.
It is clear from Fig. 4.15 that the endpoints are on opposite branches of the hyperbola
and hence there is no smooth extremal curve that connects (−2, 0) and (2, 2). There-
fore, we must seek a broken curve or curve with corners to connect the endpoints. A
broken extremal is a continuous extremal whose derivative has jump discontinuities
at a finite number of points. We will revisit this example once we develop the needed
conditions to obtain a piecewise continuous extremal. □
In what to follow, we assume there is one corner point and obtain necessary condi-
tions for the continuity of the broken extremal. Assume we have a corner point at
x∗ ∈ (a, b) and let y be an extremal of
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) = B (4.95)
a
Corners and Broken Extremal 277
y


y1 y=
y= y2

(a, A) • (b, B)
x
x∗

FIGURE 4.16
Broken path with one corner point.

• (x̃∗ , ỹ∗ )


y = y(x)


(a, A) • (b, B)
x
x∗ x̃∗

FIGURE 4.17
Perturbing corner point x∗ with x̃∗ .

where 
y1 (x), a ≤ x ≤ x∗
y(x) =
y2 (x), x∗ ≤ x ≤ b.
See Fig. 4.16
Now we perturb the corner point along with the broken extremal. See Fig. 4.17. Let
ξ1 , ξ2 be functions of x and positive. Then the perturbed point (x̃∗ , ỹ∗ ) must satisfy,
for the purpose of compatibility, the relations

x̃∗ = x∗ + εξ1 ,

ỹ∗ = y∗ + εξ2 . (4.96)


As before, notice that

ỹ∗ = y(x∗ + εξ1 ) + εη(x∗ + εξ1 ).


278 Calculus of Variations

We follow the same procedure as in Section 4.10. We write our variational as the sum
of two variations in the sense that

L(y) = L1 (y1 ) + L2 (y2 )


 x∗  b
= F(x, y1 , y′1 )dx + F(x, y2 , y′2 )dx.
a x∗

First, we consider L(y1 ).


 x∗ +εξ1
dL1 (ε) d
0= = F(x, y1 + εη, y′1 + εη ′ )dx

dε dε a ε=0
F x∗ , y1 (x∗ ), y′1 (x∗ ) ξ1

=
 x∗ h
∂F ∂F i
+ (x, y1 + εη, y′1 + εη ′ )η + ′ (x, y1 + εη, y′1 + εη ′ )η ′ dx .

a ∂ y1 ∂ y1 ε=0

After integrating by parts, the above expression yields,


 x∗
d
F(x∗ , y1 (x∗ ), y′1 (x∗ ))ξ1 +Fy′ (x∗ , y1 (x∗ ), y′ (x∗ ))η(x∗ )+

Fy′ η(x)dx = 0. Fy1 −
1
a dx 1
(4.97)
Next we compute η(x∗ ). By Taylor’s theorem and for small ε, the first term from the
right of third equation in (4.96) yields

y(x∗ + εξ1 ) = y(x∗ ) + εξ1 y′ (x∗ ) + O(ε 2 )


= y∗ + εξ1 y′ (x∗ ) + O(ε 2 ).

Similarly,
εη(x∗ + εξ1 ) = εη(x∗ ) + O(ε 2 ).
Substituting the two expressions into the right-side of the third equation of (4.96) and
then using the second equation of (4.96) yield

y∗ + εξ2 = y(x∗ + εξ1 ) + εη(x∗ + εξ1 )


= y∗ + εξ1 y′ (x∗ ) + εη(x∗ ) + O(ε 2 ).

This provides us with

εξ2 = εξ1 y′ (x∗ ) + εη(x∗ + O(ε 2 ).

Solving for η(x∗ ) gives

η(x∗ ) = ξ2 − ξ1 y′ (x∗ ) + O(ε). (4.98)

Substituting (4.98) into (4.97) and using simplified notations we arrive at


 x∗
d
− y′1 Fy′ ]

ξ1 [F + ξ2 Fy′ + Fy1 − F ′ η(x)dx = 0.

1 x=x∗ 1 x=x∗ a dx y1
Corners and Broken Extremal 279

From the above expression we obtain the familiar Euler-Lagrange equation


d
Fy1 − F ′ = 0, (4.99)
dx y1
plus the additional condition
n o
ξ1 [F − y′1 Fy′ ] + ξ2 Fy′ = 0. (4.100)

1 1 x=x∗

By doing similar work we obtain from L2 (y2 ), equation (4.98) and the additional
condition n o
− ξ1 [F − y′2 Fy′ ] − ξ2 Fy′ = 0. (4.101)

2 2 x=x∗
Combining conditions (4.100) and (4.101) we arrive at
n h   i  o
ξ1 F x, y1 , y′1 − y′1 Fy′ − F x, y2 , y′2 − y′2 Fy′ + ξ2 Fy′ − Fy′
 
= 0.

1 2 1 2 x=x∗

In light of the fact that the point of discontinuity is free to change, we can indepen-
dently change both ξ1 and ξ2 or set them both to zero. We can therefore divide the
condition into two conditions.
n o
F(x, y1 , y′1 ) − y′1 Fy′ − F(x, y2 , y′2 ) − y′2 Fy′
 
= 0,
1 2 x=x∗
 
Fy′ − Fy′ = 0.
1 2 x=x∗
The above corner conditions can be expressed in terms of limits from the left and
right rather than dividing y into y1 and y2 . That is

lim F(x, y, y′ ) − y′ Fy′ = lim F(x, y, y′ ) − y′ Fy′ ,


   
(4.102)
x→x∗− x→x∗+

lim Fy′ = lim Fy′ (4.103)


x→x∗− x→x∗+

must hold at very corner point. The corners conditions given by (4.102) and (4.103)
are called Weirstrass-Erdmann corner conditions. We proved the following theo-
rem.
Theorem 4.18 For the functional (4.95) with one corner point x∗ ∈ (a, b) conditions
(4.102) and (4.103) must hold.
We note that (4.102) and (4.103) hold everywhere in (a, b) since if we are not at a
corner point y′ (x) is continuous as is Fy′ .
Back to Example 4.23. We saw that for c1 ̸= 0, then there is no smooth extremal that
connect both endpoints. Thus, we must look for an extremal that is piecewise defined
or has a corner. We are left with the choice of c1 = 0. In this case the Euler-Lagrange
equation has the two solutions

y = 0, or y = 2x + B.
280 Calculus of Variations

The branch of the solution y = 0, satisfies the first boundary condition y(−2) = 0. In
addition, the second part of the solution y = 2x + B satisfies y(2) = 2, for B = −2.
The corner conditions (4.102) and (4.103) are satisfied independently of the location
of the corner point in (−2, 2) since

F − y′ Fy′ = 0 and Fy′ = 0.

Thus, to have a continuous extremal, we may take x∗ = 1, (corner point at 1) and


then the solution is defined by

0, −2 ≤ x ≤ 1
y(x) =
2x − 2, 1 ≤ x ≤ 2.

Corollary 8 If Fy′ y′ ̸= 0, then an extremal for the functional (4.95) must be smooth.
That is it can not have corners.

Proof Let y0 be an extremal of (4.95) with a corner point at x∗ ∈ (a, b). Then from
the corner condition (4.103), we must have the continuity condition

lim Fy′ = lim Fy′ .


x→x∗− x→x∗+

That is
Fy′ x∗− , y(x∗− ), y′ (x∗− ) − Fy′ x∗+ , y(x∗+ ), y′ (x∗+ ) = 0.
 
(4.104)
Let p = y′ (x∗− ) and q = y′ (x∗+ ). Then by the Mean value theorem, there exists an
α ∈ (0, 1) such that
  
Fy′ x∗ , y(x∗ ), p) − Fy′ x∗ , y(x∗ ), q) = (p − q)Fy′ y′ x∗ , y(x∗ ), q + α(p − q) .

But then from (4.104), this implies that



(p − q)Fy′ y′ x∗ , y(x∗ ), q + α(p − q) = 0.

Or, 
Fy′ y′ x∗ , y(x∗ ), q + α(p − q) = 0,
which is a contradiction to the fact that Fy′ y′ ̸= 0. This completes the proof.
Example 4.24 According to Corollary 8 the extremal of the variational
 b
αy′2 + ϕ(y) + φ (x) dx, y(a) = A, y(b) = B

L(y) =
a

where ϕ and φ are continuous functions of y and x, respectively, has no corner points
when α ̸= 0, since Fy′ y′ = 2α. □
The next example shows that Fy′ y′ ̸= 0, is only a necessary condition.
Corners and Broken Extremal 281

Example 4.25 The variational


 2
(x3 + y2 )y′ + 3x2 y dx,

L(y) = y(1) = 1, y(2) = −1
1

was considered in Example 4.8, and it was shown that the functional was path inde-
pendent. In addition, the corresponding Euler-Lagrange equation is −2yy′ = 0, from
which we obtain either y(x) = 0 or y(x) = constant. Hence, neither one satisfies both
boundary conditions. Notice that Fy′ y′ = 0. Clearly, the path y0 (x) = 2x − 1 connects
both endpoints and it can be easily computed and verified that L(y0 (x)) = − 29 3 . (See
Example 4.8). Next, we construct another path with a corner point that will piecewise
connect both endpoints. Note that since the functional is path independent we may
assume a corner point anywhere in (1, 2). Thus we may take the corner point to be
at (3/2, 2), and we wish to construct a piecewise continuous and linear path in the
form of 
A1 x + B1 , 1 ≤ x ≤ 3/2
y(x) =
A2 x + B2 , 3/2 ≤ x ≤ 2.
Applying y(1) = 1, y(2) = −1, we obtain B1 = 1−A1 and B2 = −1−2A2 . Therefore,

A1 x + 1 − A1 , 1 ≤ x ≤ 3/2
y(x) = (4.105)
A2 x − 1 − 2A2 , 3/2 ≤ x ≤ 2.
Applying the corner condition
lim Fy′ = lim Fy′
x→x∗− x→x∗+

at x∗ = 3/2 we arrive at
lim y(x) = lim y(x),
x→(3/2)− x→(3/2)+

or
3 3
A1 + 1 − A1 = A2 − 1 − 2A2 .
2 2
This results into
A1 + A2 = −4. (4.106)
Making use of the other corner condition
lim F − y′ Fy′ = F − y′ Fy′ ,
   
lim
x→(3/2)− x→(3/2)+

yields to
lim (3x2 y) = lim (3x2 y).
x→(3/2)− x→(3/2)+

Simplifying 3x2 from both sides, we arrive at the same expression (4.106). Due to
the continuity requirement at 3/2, we must have the solution match at 3/2. That is,
y(3/2) = 2. Applying this to the first branch of the solution we obtain
3
A1 + 1 − A1 = 2,
2
282 Calculus of Variations

or A1 = 2. Using (4.106), we arrive at A2 = −6. Substituting A1 and A2 with their


values into (4.105), leads to the continuous and linear path

2x − 1, 1 ≤ x ≤ 3/2
y∗ (x) =
−6x + 11, 3/2 ≤ x ≤ 2.

One may check that L(y∗ (x)) = − 29


3 , also, since the variational is path independent.

4.11.1 Exercises
Exercise 4.76 In the spirit of Example 4.23 discuss the variational
 1
L(y) = y2 (1 − y′ )2 dx, y(−1) = 0, y(1) = 1.
−1

Exercise 4.77 Provide all details for obtaining (4.101).


Exercise 4.78 Show the solution of the variational
 3
L(y) = (y′ )3 dx, y(0) = 0, y(3) = 1,
0

has no corner point and find its extremal.


Hint: Check the corner conditions.
Exercise 4.79 In the spirit of Example 4.23 discuss the variational
 1
L(y) = (1 − y′2 )2 dx, y(0) = 0, y(1) = 1/4.
0

Exercise 4.80 Find the broken extremal of the variational


 4
L(y) = (y′2 − 1)2 (y′ + 1)2 dx, y(0) = 0, y(4) = 2.
0

Exercise 4.81 Redo Example 4.24, with corner point at (4/3, 5).

4.12 Variational Problems with Constraints


So far, we have dealt with functionals where the boundary points are fixed or allowed
to freely move along a well-defined curve. In this section, we generalize that result to
situations where equality constraints are imposed on the admissible curves. It might
be helpful to review the finite-dimensional problem with constraints. Recall from
calculus that such problems can be optimized using the concept of the Lagrange
Variational Problems with Constraints 283
y

g(x, y) = k f (x, y) = 7

• f (x, y) = 6
f (x, y) = 5
f (x, y) = 4
(0, 0) x

FIGURE 4.18
Level curves and Lagrange multiplier.

multiplier. The same concept will be used to deal with variational problems with
constraints. First, we begin with a short review of Lagrange multipliers for finite-
dimensional optimization problems.
Suppose we want to find the extreme value of the function f (x, y) subject to the
constraint g(x, y) = k, for a fixed constant k. In other words, if f (x.y) has an extrema
at (x∗ , y∗ ), then (x∗ , y∗ ) must lie on the level g(x, y) = k. To maximize f (x, y) subject
to g(x, y), = k is to find the largest value of c such that the level curve f (x, y) = c
intersects g(x, y) = k. In Fig 4.18, c = 4, 5, 6, 7. Also, it appears from Fig. 4.18 that
this happens when the curves touch each other; that is, when they have a common
tangent line. (Otherwise, the values of c could be increased further.) This can only
mean that the normal lines at (x0 , y0 ) where they touch are identical. This implies
that the gradient vectors are parallel. In other words,
∇ f (x0 , y0 ) = λ ∇g(x0 , y0 ),
for some scalar λ . The number λ is called a Lagrange multiplier. The next theorem
can be found in any advanced calculus textbook.
Theorem 4.19 (Lagrange Multiplier Rule) Let f and g be differentiable functions
with gx (x0 , y0 ) and gy (x0 , y0 ) not both zero. If (x0 , y0 ) provides an extreme value to
f (x, y) = 0 subject to the constraint g(x, y) = k, then there exists a constant λ such
that
fx∗ (x0 , y0 ) = 0, fy∗ (x0 , y0 ) = 0,
and g∗ (x0 , y0 ) = k, where f ∗ = f + λ g.
The above theorem is valid for functions in Rn . Recall that fort x ∈ Rn and f : Rn → R
is a smooth function, then the gradient of f , denoted by ∇ f is the vector
∂f ∂f ∂f
∇ f =< , ,..., >.
∂ x1 ∂ x2 ∂ xn
284 Calculus of Variations

Theorem 4.20 (Lagrange Multiplier Rule) Let ω ⊂ Rn and let f , g : ω → R, be


smooth functions. Suppose f has a local extremum at x∗ ∈ Rn subject to the constraint
g(x) = 0. If ∇ f (x∗ ) =
̸ 0, then there is a number λ such that

∇ f (x∗ ) = λ ∇g(x∗ ).

Let y, z ∈ C2 [a, b], and consider the variational with two variables y and z
 b
L(y, z) = F(x, y, y′ , z, z′ )dx, (4.107)
a

with boundary conditions

y(a) = A1 , y(b) = B1 z(a) = A2 , z(b) = B2 ,

and subject to the constraint


ϕ(x, y, z) = 0. (4.108)
Let η1 = η1 (x) ∈ C([a, b]) and η2 = η2 (x) ∈ C([a, b]), such that

η1 (a) = η1 (b) = η2 (a) = η2 (b) = 0.

From Section 4.6 we have that


 bh
d  d  i
Fy − Fy′ η1 (x)dx + Fz − Fz′ η2 (x) dx = 0. (4.109)
a dx dx

For the same η1 (x), η2 (x) we see that


δ ϕ(x, y, z) = (y + εη1 , z + εη2 ) ε=0

 
= ϕy (y + εη1 , z + εη1 )η1 + ϕz (y + εη2 , z + εη2 )η2
ε=0
= ϕy (y, z)η1 + ϕz (y, z)η2 .

Setting δ ϕ(x, y, z) = 0, we obtain

ϕy (y, z)η1 + ϕz (y, z)η2 = 0.

Multiply the above expression with Lagrange multiplier λ and then integrate the
resulting equation from a to b to obtain
 b  
λ ϕy (y, z)η1 + ϕz (y, z)η2 dx = 0. (4.110)
a

Subtracting (4.110) from (4.109) yields the following expression,


 bh
d  d  i
Fy − Fy′ − λ ϕy η1 (x)dx + Fz − Fz′ − λ ϕz η2 (x) dx = 0. (4.111)
a dx dx
Variational Problems with Constraints 285

Applying Lemma 10 one obtains from (4.111) the Euler-Lagrange equations,

d
Fy − F ′ − λ ϕy = 0 (4.112)
dx y
and
d
Fz − F ′ − λ ϕz = 0. (4.113)
dx z
Theorem 4.21 Let y, z ∈ C2 [a, b] be extremals for the variational (4.107) with bound-
ary conditions specified at x = a and x = b, subject to the constraint function (4.108).
Then y(x) and z(x) must satisfy the Euler-Lagrange equations given by (4.112) and
(4.113).
The next theorem easily generalizes Theorem 4.21 to n constraints functions and its
proof is Exercise 4.82.
Theorem 4.22 Let y, z ∈ C2 [a, b] be extremals for the variational (4.107) with bound-
ary conditions specified at x = a and x = b, subject to the n constraints

ϕi (y, z) = 0, i = 1, 2, . . . n.

Then y(x) and z(x) must satisfy the Euler-Lagrange equations


n n
d ∂ ϕi d ∂ ϕi
Fy − Fy′ − ∑ λi = 0, Fz − Fz′ − ∑ λi = 0.
dx i=1 ∂y dx i=1 ∂z

Example 4.26 Find the extremals y and z that minimize the functional
 π/2
1 + y′2 + z′2 dx, y(0) = z(0) = y(π/2) = z(π/2) = 0,

L(y, z) =
0

subject to the constraint y2 + z2 = 5. Here ϕ(y, z) = y2 + z2 − 5. Thus (4.112) and


(4.113) generate the two second-order differential equations

y′′ + λ y = 0, z′′ + λ z = 0.

Remember λ ∈ R and so, special care must be applied. We will do this in three
separate cases.
case 1 λ = 0. In this case the general solution for the first differential equation is

y(x) = c1 x + c2 .

Applying the boundary conditions, we get c1 = c2 = 0. This results in the trivial


solution y(x) = 0, which has to be rejected since it does not satisfy the constraint.
case 2. λ < 0. Say λ = −α 2 , where α > 0. Then the general solution is

y(x) = c1 eαx + c2 e−αx


286 Calculus of Variations

Applying the boundary condition, we get c1 = c2 = 0. Again, this results in the trivial
solution y(x) = 0, which has to be rejected since it does not satisfy the constraint.
case 3. λ > 0. Say λ = α 2 , where α > 0. Then the general solution is

y(x) = c1 cos(αx) + c2 sin(αx)

Applying the boundary conditions y(0) = 0, yields c1 = 0. Similarly, 0 = y(π/2)


implies that
π
c2 sin(α ).
2
So we have either c2 = 0, which results in the trivial solution again, or we set
π
sin(α ) = 0,
2
which holds when α π2 = nπ, or when α = 2n, n = 1, 2, . . . A similar argument can
be applied to the differential equation in z and obtain

y(x) = c sin(2nx), z(x) = d sin(2nx), n = 1, 2, . . .

Next we evaluate L at the obtained y and z to see if they minimize L since the inte-
grand of L is positive for all functions y and z.
 π/2  
1 + 4n2 (c2 + d 2 ) cos2 (2nx) dx

L c sin(2nx), d sin(2nx) =
0
π/2
π
= + 4n2 (c2 + d 2 ) cos2 (2nx)dx
2 0
π
= + πn2 (c2 + d 2 ). (4.114)
2
Note that expression (4.114) is increasing in n and therefore its minimum is achieved
when n = 1. That is y and z minimize L for n = 1. Therefore, the extremals are

y(x) = c sin(2x), z(x) = d sin(2x),

where
c2 sin2 (2x) + d 2 sin2 (2x) = 5, 0 < x < π/2.

4.12.1 Exercises
Exercise 4.82 Prove Theorem 4.22.
Exercise 4.83 Show that if y, z, w ∈ C2 [a, b] are extremals for the variational
 b
L(y, z) = F(x, y, y′ , z, z′ )dx,
a
Isoperimetric Problems 287

with boundary conditions specified at x = a and x = b, subject to the constraint


function
ϕ(y, z, w) = 0,
then y(x), z(x) and w(x) must satisfy the Euler-Lagrange equations

d
Fy − F ′ − λ ϕy = 0,
dx y
d
Fz −
F ′ − λ ϕz = 0,
dx z
d
Fw − Fw′ − λ ϕw = 0.
dx
Exercise 4.84 Use Exercise 4.83 to find the extremals y, z and w that minimizes the
functional
 b
1 ′2 ′2
y + z + w′2 dx,

L(y, z, w) =
0 2
with boundary conditions

y(0) = z(0) = 0, w(0) = 0, y(b) = z(b) = 0, w(b) = 0,

subject to the constraint y2 + z2 + w2 = 1.

4.13 Isoperimetric Problems


Let y = y(x) ∈ C2 ([a, b]) and consider the isoperimetric problem
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) = B, (4.115)
a

subject to the integral constraint


 b
W (y) = G(x, y, y′ )dx = d, (4.116)
a

where d is a fixed constant. The fixed functions F and G are assumed to be twice
continuously differentiable. The subsidiary condition (4.116) is called isoperimetric
constraint. Before, we assumed a local extremal y(x) in a family of admissible func-
tions with respect to which we carry out the extremization. A one parameter family
y(x) + εη(x) is not , however, a suitable choice since those curves may not maintain
the consistency of W. Therefore, we introduce a two parameters family

z = y(x) + ε1 η1 (x) + ε2 η2 (x),


288 Calculus of Variations

where η1 , η2 ∈ C2 ([a, b]) such that η1 (a) = η1 (b) = η2 (a) = η2 (b) = 0, and ε1 and
ε2 are real parameters ranging over the intervals containing the origin. We make the
assumption that y is not an extremal of W. Therefore, for any choice of η1 and η2
there will be values ε1 and ε2 in the neighborhood of (0, 0), for which W (z) = d.
Let  b
S1 (ε1 , ε2 ) = F(x, z, z′ )dx,
a
and  b
S2 (ε1 , ε2 ) = G(x, z, z′ )dx = C.
a
Since y is a local extremal of (4.115), subject to the constraint (4.116), the point
(ε1 , ε2 ) = (0, 0) must be a local extremal for S1 (ε1 , ε2 ) subject to the constraint
S2 (ε1 , ε2 ) = C. This is just a differential calculus problem and so the Lagrange mul-
tiplier rule might be applied. That is, there must be a constant λ such that
∂ S∗ ∂ S∗
= = 0, at (ε1 , ε2 ) = (0, 0), (4.117)
∂ ε1 ∂ ε2
where  b

S = S1 + λ S2 = F ∗ (x, z, z′ )dx,
a
with
F ∗ = F + λ G.
Substituting z = y(x) + ε1 η1 (x) + ε2 η2 (x) into S∗ and then calculating partial deriva-
tives with respect to ε1 and ε2 we arrive at

∂ S∗ b
Fy∗ (x, y, y′ )ηi (x) + Fy∗′ (x, y, y′ )ηi′ (x) dx,

(ε1 , ε2 ) = i = 1, 2.
εi a
Setting
∂ S∗
(ε1 , ε2 ) = 0,

εi (ε1 ,ε2 )=(0,0)
followed by an integration by parts on the term that involves η ′ and then applying
Lemma 10, we arrive at the Euler-Lagrange equation

d ∗
Fy∗ (x, y, y′ ) −
F ′ (x, y, y′ ) = 0, (4.118)
dx y
which is a necessary condition for an extremal. We proved the following theo-
rem.
Theorem 4.23 Let y ∈ C2 [a, b]. If is y not an extremal of (4.116) but an extremal
for the variational (4.115) with boundary conditions specified at x = a and x = b,
subject to the isoperimetric constraint (4.116), then y(x) satisfies the Euler-Lagrange
equation (4.118), or
∂   dh ∂  i
F +λG − ′
F + λ G = 0.
∂y dx ∂ y

We furnish an example.
Isoperimetric Problems 289

Example 4.27 In this example, we show that the sphere is the solid figure of revo-
lution that, for a given surface area l, has the maximum volume. Consider a curve
y(x) ≥ 0 with y(0) = 0, and y(a) = 0, a > 0. Revolve y(x) along the x-axis. Then,
any short circular strip with a radius y and a height ds has a surface area of 2πy ds.
Consequently, the total surface area of revolution is
 a  a p
l= 2πy ds = 2πy 1 + y′2 dx.
0 0
a
On the other hand, the volume of the solid of revolution is 0 πy2 dx. Thus, the
problem can be formulated as a variational problem with constraints. In other words,
we want to maximize  a
L(y) = πy2 dx,
0
subject to the constraint
 a p
2πy 1 + y′2 dx = l (constant).
0

Set p
F ∗ = πy2 + λ 2πy 1 + y′2 .
Since x does not enter in F ∗ , we will use the Euler-Lagrange equation

F ∗ − y′ Fy∗′ = c.

Or,
p 2πλ yy′
πy2 + 2πλ y 1 + y′2 − y′ p = c,
1 + y′2
which simplifies to
2πλ y
πy2 + p = c.
1 + y′2
Now y = 0, at x = 0 and at x = a, which can be true if c = 0, and wherefore we have


y = −p . (4.119)
1 + y′2

By squaring both sides and then solving for y′ we arrive at


p
′ 4λ 2 − y2
y = .
y
dy
Setting y′ = dx , separating the variables followed by an integration give
 
y
p dy = dx.
4λ 2 − y2
290 Calculus of Variations

This yields the expression


p
− 4λ 2 − y2 = x + k, for constant of integration k. (4.120)

Using y(0) = 0, in (4.120) we get k = ±2λ . Substituting k into (4.120) gives


p
4λ 2 − y2 = x ± 2λ .

By squaring both sides and rearranging the terms we obtain the solution

(x ± 2λ )2 + y2 = 4λ 2 .

Hence the obtained curve is a circle centered at (±2λ , 0) and radius 2λ . This shows
p the solid of revolution is a sphere. To find λ , we make use of (4.119) and obtain
that
y 1 + y′2 = −2λ . Substituting this into the integral constraint we arrive at
 a p  a
l= ′2
2πy 1 + y dx = 2π(−2λ ) dx = −4πλ a.
0 0

−l
This gives λ = 4πa .

The next theorem easily generalizes Theorem 4.23 and its proof is left as an exer-
cise.
Theorem 4.24 Let y, z ∈ C2 [a, b] be extremals for the variational
 b
L(y, z) = F(x, y, z, y′ , z′ )dx,
a

with fixed end points, and subject to the isoperimetric constraint


 b
W (y, z) = G(x, y, z, y′ , z′ )dx = d, (4.121)
a

where d is a fixed constant. If y and z are not extremals to (4.121), then they must
satisfy the Euler-Lagrange equations

∂   dh ∂  i
F +λG − F + λ G = 0, (4.122)
∂y dx ∂ y′
and  dh ∂ 
∂  i
F +λG − ′
F + λ G = 0. (4.123)
∂z dx ∂ z

We provide the following example.


Isoperimetric Problems 291
y

(x(t), y(t))

FIGURE 4.19
Dido’s area.

Example 4.28 [Dido’s problem] Most traditions identify Dido as the Phoenician
city-state of Tyre’s queen, who fled oppression to create her own city in northwest
Africa. Tyre is now in Lebanon. The legendary Dido requested a plot of land to farm
when she landed in Carthage (Tunisia) in 814 BC. Her request was accepted with
the provision that an n oxhide should encircle the area. She divided the oxhide into
incredibly tiny pieces and arranged them to completely enclose the available land.
View Figure 4.19. The problem comes down to finding the closed curve with a fixed
perimeter that encloses the maximum area. Let’s describe the curve by the parametric
equations (x(t), y(t)) with velocity (x′ (t), y′ (t)) and x(0) = x(1) and y(0) = y(1).
Then the length of the curve, or its perimeter is
q
x′2 (t) + y′2 (t)dt = d,

where d is the allowed perimeter. To find a formula for the enclosed area, we make
use of Green’s Theorem, which states that over a region D in the plane with boundary
∂ D we have 
∂g ∂ f 
f dx + gdy = − dxdy.
∂D D ∂x ∂y
If we set f = − 2y and g = 2x , we get
 
1
xdy − ydx = dxdy.
2 D D

Thus the enclosed area is given by



1
(xy′ − yx′ )dt.
2
So the problem comes down to maximizing

1
L(x, y) = (xy′ − yx′ )dt,
2
292 Calculus of Variations

subject to the constraint


q
W (x, y) = x′2 (t) + y′2 (t)dt = d,

Using equations (4.122) and (4.123) we arrive at


d λ x′ d λ y′
y′ = p , and x′ = − p .
dt x′2 + y′2 dt x′2 + y′2
An integration of both equations yield
λ x′ λ y′
y= p + c1 , and x = p + c2 .
x′2 + y′2 x′2 + y′2
Rearrange the terms and then square both sides of the two equations and get
λ 2 x′2 λ 2 y′2
(y − c1 )2 = , and (x − c 2 )2
= .
x′2 + y′2 x′2 + y′2
Adding both equations gives
(x − c2 )2 + (y − c1 )2 = λ 2
which is a circle centered at (c2 , c1 ) of radius λ . To find the radius λ , we substitute
the solutions into the isoperimetric constraint. Note that the solutions may be written
as
x(t) = −λ cos(2πt) + c2 , y(t) = λ sin(2πt) + c1 .
W.l.o.g, assume the circle is centered at (0, 0). Then
 1q  1
x′2 (t) + y′2 (t)dt = 2πλ dt = d,
0 0
d
implies that λ = 2π . Another way to find λ is to set the perimeter of a circle with
radius λ equal the given length of the circle. That is 2πd = λ . □
The next theorem addresses variationals with higher-order derivatives subject to con-
straints.
Theorem 4.25 [Euler-Lagrange equation] If a function y = y(x) ∈ C4 ([a, b]) is an
extremal to the variational problem in (4.45), subject to the constraint
 b
W (y) = G(x, y, y′ , y′′ )dx = d,
a

then y(x) must satisfy the Euler-Lagrange equation


d ∗ d2
Fy∗ − F′+ F ′′ = 0,
dx y dx2 y
where
F ∗ = F + λ G.
Isoperimetric Problems 293

y = x−1

Suspended chain

x
x=b

FIGURE 4.20
Catenary; transversality and natural conditions.

The next theorem addresses natural boundary conditions and transversality condi-
tions of variational that are subject to given constraints.
Theorem 4.26 Let y = y(x) ∈ C2 [a, b] be an extremal for the variational (4.79) with
boundary conditions specified or unspecified at x = a and x = b, and subject to the
constraint  b
W (y) = G(x, y, y′ )dx = d.
a
For λ ∈ R, set
F ∗ = F(x, y, y′ ) + λ G(x, y, y′ ). (4.124)
Then 1)-3) of Theorem 4.17 hold when F is replaced with F ∗.
Example 4.29 (Catenary revisited) In Exercise 4.39 we presented the problem of
the hanging of heavy chain from two fixed points. Now, we are considering the length
of the chain to be l > 1. In addition, unlike the situation in Exercise 4.39 we let the
chain slides freely along the vertical line x = b. The left end of the cable, that is at
x = a the chain slides along a tilted pole or skewed line. Again, for mathematical
convenience we assume the chain density and gravity are both one. See Fig. 4.20.
Let y − x + 1 = 0 be the tilted pole. Then, we must have 1 ≤ a < b. Then the problem
is to minimize to potential energy
 b p
L(y) = y 1 + y′2 dx,
a
294 Calculus of Variations

with y(b) being unspecified and y(a) moves along the curve y = h(x) = x − 1, subject
to the constraint  bp
W (y) = 1 + y′2 dx = l.
a
Equivalently, we are to find the path y(x) that minimizes L, where
p p
F ∗ = y 1 + y′2 + λ 1 + y′2 .

Since F ∗ does not explicitly depend on the variable x, by Corollary 7 the Euler-
Lagrange equation that y must satisfy is

y′ Fy∗′ − F ∗ = D.

Or,
y′2 (y + λ ) p
p − (y + λ ) 1 + y′2 = D,
1+y ′2

which simplifies to
y+λ
p = D.
1 + y′2
Solving for y′ we arrive at
1
q
y′ = (y + λ )2 − D2 .
D
By letting
y + λ = D cosh(t),
and then imitating the work of Section 4.5 on minimal surface we arrive at the solu-
tion
x + c2
y + λ = c1 cosh( ), (4.125)
c1
where the constants c1 and c2 are to be found. We have a natural boundary condition
at x = b which implies that

y′ (y + λ )
Fy∗′ x=b = p

.
1 + y′2 x=b

This yields that y′ (b) = 0, or y(b) = −λ . Now, if y(b) = −λ , then (4.125) implies
that cosh( b+c2
c1 ) = 0, which can not be. Therefore, we must take

y′ (b) = 0.

From (4.125) we get y′ (x) = sinh( x+c2 ′


c1 ). Apply y (b) = 0 to get
b+c2
c1 = 0, or c2 = −b.
Thus,
x−b
y(x) = −λ + c1 cosh( ).
c1
Isoperimetric Problems 295

The transversality condition



F ∗ + h′ (x) − y′ (a) Fy∗′ = 0,
x=a

yields
(y + λ )
p (1 + y′ ) = 0,
1 + y′2
or y′ (a) = −1. Combining this with y′ (x) = sinh( x−b
c1 ) we arrive at

b−a
sinh( ) = 1.
c1
Using the isoperimetric constraint we get
 br
x−b b−a 
1 + sinh2 ( )dx = c1 0 + sinh( ) = l.
a c1 c1

Combining
b−a b−a
sinh( ) = 1 and c1 sinh( ) = l,
c1 c1
−1
gives c1 = l. Thus, sinh( b−a
l ) = 1, from which we obtain b − a = l sinh (1). Using
(4.125) we obtain the solution
x−b
y(x) = −λ + l cosh( ).
l
Left to determine λ . Since y(a) lies on the line h(x) = x − 1, we have y(a) = a − 1.
In addition,
b−a
y(a) = −λ + l cosh( )
l
1 
= −λ + l cosh sinh−1 (1)l
√ l
= −λ + l 2.

Setting y(a) = y(a), yields λ = 1 − a + l 2. Thus, the solution is
√ x−b
y(x) = a − 1 − l 2 + l cosh( ).
l

4.13.1 Exercises
Exercise 4.85 Prove Theorem 4.24.
296 Calculus of Variations

Exercise 4.86 Find the extremal that minimizes


 π
L(y) = y′2 dx, y(0) = y(π) = 0,
0

subject to the constraint W (y) = 0 y2 dx = 1.
Exercise 4.87 Find the extremals for
 π
L(y) = (y′2 + x2 )dx, y(0) = y(π) = 0,
0

subject to the constraint W (y) = 0 y2 dx = 8.
Exercise 4.88 Maximize the surface area
 π p
S(y) = 2π y 1 + y′2 dx, y(0) = 0, y(π) = 0,
0

subject to the constraint V (y) = π 0 y2 dx = l which is the volume of the given sur-
face, for positive constant l.

Exercise 4.89 Find the extremal that minimizes


 1 p
L(y) = 1 + y′2 dx, y(−1) = y(1) = 0,
−1
1 2 dx
subject to the constraint W (y) = −1 y = l, l > 2.
Exercise 4.90 Find the extremal for
 2
L(y) = ydx, y(−2) = y(2) = 0,
−2
2 p
subject to the constraint W (y) = −2 1 + y′2 dx = 2π.
Exercise 4.91 Find the extremal for
 2
L(y) = y′2 dx, y(0) = 0, y(2) = 1,
0
2
subject to the constraint W (y) = 0 ydx = l, l > 0.
Exercise 4.92 [Dido’s problem in polar coordinates] Suppose in Dido’s problem we
require the enclosed land to be along a straight river, say the x-axis. We want to find
the curve that encloses the maximum area. Assume the curve is bounded by the river
and under the graph y = f (x), −1 ≤ x ≤ 1, with given perimeter L > 2. Set up of the
problem in polar coordinates to maximize
 π
L(r) = r2 dr, r(0) = 0, r(π) = 0,
0
Isoperimetric Problems 297

where r = r(θ ) subject to the constraint


 πp
W (y) = r2 + r′2 dθ = L.
0

Exercise 4.93 Find the curve for which the functional


 bp
1 + y′2
J(y) = dx, y(0) = 0
0 y
and y(b) varies along the circle

(x − 9)2 + y2 = 9.

Exercise 4.94 Minimize the functional


 π/2
J(y, z) = (1 + y′2 + z′2 ) dx, y(0) = z(0) = y(π/2) = z(π/2) = 0.
0

subject to y2 + 2z = 2.
1
Exercise 4.95 Find an extremal corresponding to J(y) = −1 y dx when subject to
1 2 ′2
y(−1) = y(1) = 0 and −1 (y + y )dx = 1.
e
Exercise 4.96 Find an extremal
e
corresponding to J(y) = 1 x2 y′2 dx when subject
to y(1) = y(e) = 0 and 1 y2 dx = 1.
Exercise 4.97 Find an extremal corresponding to
 π
J(y) = y′2 dx, y′ (0) = y′ (π) = 0,
0

when subject to 0 y2 dx = 1.
Exercise 4.98 Find an extremal corresponding to
 1
L(y) = (y′′2 + x2 )dx, y(0) = y(1) = y′ (0) = y′ (1) = 0
0

and  1
W (y) = (y2 + 1)dx = 2.
0
Exercise 4.99 Find the curve of fixed length πa joining the two points (−a, 0) and
(a, 0) and situated above the x-axis such that the area below it and above the x-axis
is maximum.
Exercise 4.100 Consider Example 4.29, but this time the chain in freely sliding on
the line x = 0 (y-axis). Also, the right end of the chain is left free to slide on a tilted
pole, given by the equation cx + dy = cd, where c, d > 0. Find the equation of the
chain that minimizes the potential energy.
298 Calculus of Variations

4.14 Sturm-Liouville Problem


Consider the second-order differential equation with parameter λ ∈ R,

y′′ (x) + P(x)y′ (x) − Q(x)y(x) − λ R(x)y(x) = 0, a ≤ x ≤ b,

where P, Q, R are continuous, and R is positive on [a, b]. Multiply both sides of the
above equation with 
r(x) = e P(x)dx ,
and then by observing that
 ′
r(x)y′ = r′ (x)y′ + r(x)y′′ ,

the above equation may take the form


 ′
r(x)y′ − q(x)y − λ p(x)y = 0, (4.126)

where q(x) = r(x)Q(x), and p(x) = r(x)R(x) > 0 for all x ∈ [a, b]. For constants
α1 , α2 , β1 , β2 , we impose the boundary conditions

α1 y(a) + β1 y′ (a) = 0, α2 y(b) + β2 y′ (b) = 0, (4.127)

with
α12 + β12 ̸= 0; α22 + β22 ̸= 0.

The differential equation given by (4.126) along with (4.127) is called Sturm-
Liouville problem (SLP). There is a habitual relation between variational with
isoperimetric constraint and Sturm-Liouville problem. To see this, let y = y(x) ∈
C2 ([a, b]) be an extremal for the variational
 b 
L(y) = r(x)y′2 + q(x)y2 dx,
a

subject to
 b
W (y) = p(x)y2 dx.
a
Then
F ∗ = r(x)y′2 + q(x)y2 + λ p(x)y2 ,
and
d ∗
Fy∗ − F ′ = 0,
dx y
implies that
r(x)y′′ + r′ (x)y′ − q(x)y − λ p(x)y = 0,
Sturm-Liouville Problem 299

which is equivalent to
 ′
r(x)y′ − q(x)y − λ p(x)y = 0.

Notice that the nontrivial y does not satisfy the Euler-Lagrange equation for the con-
straint W (y), since
−2p(x)y(x) = 0,
is not possible due to the fact that p(x) > 0 for all x ∈ [a, b]. Thus, we have
shown that the (SLP) can be recasted as variational problem with isoperimetric con-
straint.
The (SLP) has a wide range of applications. The boundary conditions make it con-
veniently suitable for standing wave. In addition, (SLP) models the one dimensional
time dependent Schrödinger equation
′ 2m
− ψ ′ (x) + 2 V (x)Ψ(x) − λ Ψ(x) = 0.
h

We make the following definitions.


Definition 4.10 If for a certain value of λ , the (SLP) given by (4.126) and (4.127)
has a nontrivial solution y(x), then λ is called an eigenvalue and y(x) the corre-
sponding eigenfunction.
Definition 4.11 If for any two functions f and g we have
 l
f (x)g(x)p(x)dx = 0, (4.128)
0

then we say f and g are orthogonal on [0, l] with respect to the weight function
p(x) > 0.
The integral on the left side of (4.128) is called the inner product of f and g and is
denoted by ( f , g). Thus,
 l
( f , g) = f (x)g(x)p(x)dx.
0

The number || f || defined by


 l 1
2
|| f || = f 2 (x)p(x)dx
0

is called the norm of f . Clearly,


 1
2
|| f || = f , f .

Now we are able to make the following definition.


300 Calculus of Variations

Definition 4.12 A set of functions f1 (x), f2 (x), . . . , fn (x) is orthonormal if



0, n ̸= m
( fn , fm ) = δnm =
1, n = m

It is then clear that an orthogonal set of functions can be made into an orthonormal set
by dividing each function in the set by its norm. The next theorem characterizes the
eigenvalues and eigenfunctions solutions of (SLP). Its proof can be found in various
places such as [1] of Chapter 2.
Theorem 4.27 For the Sturm-Liouville propblem given by (4.126) and (4.127) the
following statements hold.
i) The eigenvalues are real and to each eigenvalue there corresponds a single
eigenfunction up to a constant multiple.
ii) The eigenvalues form an infinite sequence −λ1 , −λ2 , . . . , −λn , . . . , and can be
ordered in a manner that 0 < −λ1 < −λ2 < −λ3 < . . . with

lim (−λn ) = ∞.
n→∞

iii) If −λm and −λn are two distinct eigenvalues, then the corresponding eigenfunc-
tions ym (x) and yn (x) are orthogonal on the interval [0, l].
Example 4.30 Consider the variational
 e
L(y) = x2 y′2 dx, y(1) = y(e) = 0,
1

subject to  e
W (y) = y2 dx = 1.
1
Then the corresponding Euler-Lagrange equation is
′
x2 y′ − λ y = 0,

or
x2 y′′ + 2xy′ − λ y = 0, (4.129)
which is a (SLP) with r(x) = x2 , q(x) = 0, and p(x) = 1. Using the method of Section
1.11, we arrive at
y′′ + y′ − λ y = 0,
where the independent variable x has been changed to the independent variable t
under the transformation x = et . If we assume solutions of the form y = emt , then we
have √
1 1 + 4λ
m=− ± .
2 2
Sturm-Liouville Problem 301

For λ = 0, we get the solution y(t) = c1 + c2 e−t , or y(x) = c1 + cx2 . Applying the
given boundary conditions we obtain c1 = c2 = 0, which corresponds to the trivial
solution (y = 0).
If 1 + 4λ > 0, then the solution is given by

y(x) = c1 xm1 + c2 xm2 ,


√ √
where m1 = − 12 + 1+4λ
2 , and m2 = − 12 − 1+4λ
2 . Again, applying the given bound-
ary conditions we obtain c1 = c2 = 0, which corresponds to the trivial solution
(y = 0).
If 1 + 4λ < 0, then we have complex roots and the solution is (see Section 1.11)
p p
− 12
 −(1 + 4λ ) −(1 + 4λ ) 
y(x) = x c1 cos( ln(x)) + c2 sin( ln(x)) .
2 2
Using 0 = y(1), automatically gives c1 = 0 and we are left with the solution
p
− 12 −(1 + 4λ )
y(x) = x c2 sin( ln(x)).
2
If we apply y(e) = 0, we get
p
− 12 −(1 + 4λ )
0=e c2 sin( ).
2
We have the trivial solution for c2 = 0, since c1 = 0. Thus, to avoid the trivial solution,
we set p
−(1 + 4λ )
sin( ) = 0,
2
which holds for p
−(1 + 4λ )
= nπ, n = 1, 2, . . . .
2
Or,
1
λ = −n2 π 2 − , n = 1, 2, . . . .
4
Thus, the eigenvalues and the corresponding eigenfunctions are given by
1 1
− λn = n2 π 2 + , yn (x) = c2 x− 2 sin(nπ ln(x)), n = 1, 2, . . . . (4.130)
4
To find the constant c2 , we make use of the isoperimetric constraint W (y).
 
e
2 sin2 (nπ ln(x))
e
y dx = c22 dx (4.131)
1 1 x
 
c22 e 1 − cos 2nπ ln(x)
= dx
2 1 x
302 Calculus of Variations

e cos 2nπ ln(x)

c22  e 
= ln(x) − dx .

2 1 1 x
du dx
To evaluate the integral we make the substitution u = 2nπ ln(x). Then 2nπ = x and
  
cos 2nπ ln(x) cos(u) sin(2nπ ln(x))
dx = du = .
x 2nπ 2nπ
So, (4.131) yields,
 e
c22  sin(2nπ ln(x)  e
W (y) = y2 dx = ln(x) −
2 2nπ

1 1
c22
= [1 − 0] = 1.
2
Or, √
c2 = ± 2.
Then, by (4.130) the eigenfunctions are
√ 1
yn (x) = ± 2x− 2 sin(nπ ln(x)), n = 1, 2, . . . .

4.14.1 The First Eigenvalue


In Example 4.30, we saw that the corresponding variational problem with isoperi-
metric constraint has infinitely many eigenvalues, and to each eigenvalue there cor-
responds an eigenfunction. Let yn (x) be given by (4.130) with c2 = 1. Due to the
Principle of superposition; p. 51, and the linearity of the Euler-Lagrange equation,
any function of the form

z(x) = ∑ bn yn (x),
n=1

is an extremal to the variational provided that z satisfies the isoperimetric constraint.


We have naturally presumed that the series converges and that each of its terms is
twice differentiable. We analyze the constraint at z to further clarify the concept. In
the next argument, we make use of the orthogonality of the eigenfunctions found in
Example 4.30 (see Exercise 4.103)
 e  e ∞
sin(nπ ln(x)) 2
W (z) = z2 dx = ∑ bn √ dx
1 1 n=1 x
 e ∞ ∞
sin(nπ ln(x)) sin(mπ ln(x))
= ∑ ∑ bm bn √ √ dx
1 n=1 m=1 x x
∞ ∞  e
sin(nπ ln(x)) sin(mπ ln(x))
= ∑∑ bm bn √ √ dx
n=1 m=1 1 x x
Sturm-Liouville Problem 303

∞ e
sin2 (nπ ln(x))
= ∑ b2n 1 x
dx due to orthogonality
n=1

1
= ∑ b2n = 1.
2 n=1
(4.132)

Thus,

∑ b2n = 2. (4.133)
n=1
Let yn (x) be given by (4.130) with c2 = 1. We have shown z(x) = ∑∞ n=1 bn yn (x),
with bn satisfying (4.133) is an extremal of the variational problem given in Example
4.30. However, if we want that same extremal to minimize the variational problem,
then we need to look deeper into the sequence of the eigenvalues λn , n = 1, 2, . . . .
Let’s evaluate the variational L at z. In the coming calculations we make use of the
following:
Let
sin(nπ ln(x)) cos(nπ ln(x))
fn (x) = √ , gn (x) = √ , x ∈ [1, e].
x x
Then,
( fn (x), fm (x)) = (gn (x), gm (x)) = 0, for all n ̸= m; n, m = 1, 2, . . . , (4.134)
and
( fn (x), gm (x)) = 0, for all n, m = 1, 2, . . . , (4.135)
The next argument is similar to the preceding one, so we skip some of the de-
tails.
 e
L(z) = x2 z′2 dx
1
∞  e
sin2 (nπ ln(x))
 e
cos2 (nπ ln(x)) 
∑ b2n

= + dx by (4.134) and (4.135)
n=1 1 4x 1 x

1 n2 π 2 
= ∑ b2n + . (4.136)
n=1 8 2

Since, the variational L is positive for non trivial solution, and the right side of (4.136)
is increasing in n, then it is likely that the minimum of L is achieved at n = 1, or at the
first eigenvalue −λ1 = π 2 + 14 . Recall, −λn = n2 π 2 + 14 , n = 1, 2, . . . . The extremal
eigenfunction that corresponds to the first eigenvalue is
√ 1
y1 (x) = ± 2x− 2 sin(π ln(x)).
Of course at this eigenfunction, W (y) = 1. Note that the number 1 in W (y) = 1, is
“symbolic”. As a matter of fact, the above discussion should hold for any number
l > 0, such that W (y) = l. Additionaly, (4.136) implies for n = 1 that
1
L(y1 (x)) = + π 2 = −λ1 ,
4
304 Calculus of Variations

where we have used b21 = 2 that was obtained from (4.133). Next, we make it clear
that y1 (x) minimizes L. Suppose there is another function f that minimizes L such
that f is different from y1 . Due to the completeness property of Fourier series, (see
Appendix A) f must be of the form ∑∞ n=1 bn yn (x). Since y1 and f differ, there is an
integer k ≥ 2 such that bK = ̸ 0. Thus, from (4.136) we get

1 n2 π 2 
L( f ) = ∑ b2n 8
+
2
n=1
1 ∞
π2 ∞ 2 2
= ∑ b2n + ∑ n bn
8 n=1 2 n=1
1 ∞ 2 π 2  K−1 2 2 ∞ 
= ∑ bn + ∑ n bn + K 2 b2K + ∑ n2 b2n
8 n=1 2 n=1 n=K+1
∞ 2 K−1 ∞
1 π  
≥ ∑ b2n + K 2 b2K + ∑ b2n + ∑ n2 b2n
8 n=1 2 n=1 n=K+1
1 ∞
π 2  ∞ 
> ∑ b2n + K 2 b2K − b2K + ∑ b2n
8 n=1 2 n=1
π 2  1 π 2  ∞
= (K 2 − 1)b2K + + ∑ b2
2 8 2 n=1 n
 1 π2  ∞
>
8
+
2 n=1∑ b2n
1 1 ∞
= + π2 ∑ b2n
4 2 n=1
1
= + π 2 , (by(4.132)).
4
This shows
1
+ π 2 = L(y1 ).
L( f ) >
4
This proves that y1 minimizes L subject to the constraint W.
The next theorem asserts that, in general, the corresponding eigenfunction to the
first eigenvalue of (SLP), does indeed minimize the variational subject to its isoperi-
metric constraint. For a quick reference we restate the (SLP). Consider the varia-
tional  b 
L(y) = r(x)y′2 + q(x)y2 dx, y(a) = y(b) = 0, (4.137)
a
and subject to
 b
W (y) = p(x)y2 dx = 1. (4.138)
a
Note that, (4.138) holds when the corresponding eigenfunctions are normalized with
respect to the weight function p.
Sturm-Liouville Problem 305

Theorem 4.28 Suppose −λ1 is the first eigenvalue of (4.137) and (4.138) with cor-
responding normalized eigenfunction y1 (x) ∈ C2 ([a, b]). Then among all admissible
normalized eigenfunctions y ∈ C2 ([a, b]), the function y = y1 (x) minimizes L, subject
to (4.138). Moreover, L(y1 ) = −λ1 .

Proof We mention that the presence of the number −λ1 and not λ1 , depends solely
on the way we decided to consider (4.126). We begin by multiplying (4.126) with y
and then integrating by parts the first term in the resulting equation from x = a, to
x = b.  bh
′ i
y r(x)y′ − q(x)y2 − λ p(x)y2 dx = 0.
a
′
Letting u = y, dv = r(x)y′ (x) dx and making use of y(a) = y(b) = 0, we arrive
at
 b  b  b
′ b
y(x) r(x)y (x) dx = r(x)y(x)y′ (x) x=a −
′ ′2
r(x)y (x)dx = − r(x)y′2 (x)dx.
a a a

Substituting into the previous equations and rearranging terms we arrive at


 b   b
′2 2
r(x)y + q(x)y dx = −λ p(x)y2 dx. (4.139)
a a

As W (y) = 1, (4.139) implies that


 b 
r(x)y′2 + q(x)y2 dx = −λ .
a

Since y is nontrivial, the number −λ is an eigenvalue. By ii) of Theorem 5.4, the first
eigenvalue is −λ1 and hence it has the corresponding normalized eigenfunction y1 .
This shows L is minimized at −λ1 ; that is

L(y1 ) = −λ1 > 0.

This completes the proof.

It is crucial that we examine the ratio of L(y) and W (y) in more detail. Expression
(4.139) gives
L(y)
L(y) = −λ1W (y), or = −λ1 .
W (y)
We define the Rayleigh quotient
L(y)
R(y) = . (4.140)
W (y)
It is evident from (4.140) that for any nontrivial solution φn (x), that corresponds to
eigenvalues −λn , we have that

R(φn ) = −λn , n = 1, 2, . . . . (4.141)


306 Calculus of Variations

It is important to remark that (4.141) holds for all nontrivial eigenfunctions y whether
they are normalized or not, since the same excess factor will appear in the numerator
and denominator, and hence it cancels out.
Also, (4.141) is handy when the eigenvalues can be computed, which is not the case
in some situations. The Rayleigh quotient can be easily generalized to the (SLP) with
general boundary conditions. Let y = y(x) ∈ C2 ([a, b]) be an extremal for
 b 
L(y) = r(x)y′2 + q(x)y2 dx,
a

with boundary conditions (4.127) and subject to the constraint


 b
W (y) = p(x)y2 dx.
a

Then b b
−r(x)y(x)y′ (x) x=a + a r(x)y′2 + q(x)y2 dx

−λ = b . (4.142)
a p(x)y2 dx
The verification of (4.142) comes from integrating by parts the first term in
 b 
′
y r(x)y′ − q(x)y2 − λ p(x)y2 dx = 0,
a

and then solving for −λ . Under the special boundary conditions

y(a) = y(b) = 0, (Dirichlet boundary conditions)

y′ (a) = y′ (b) = 0, (Neumann boundary conditions),


the first term in (4.142)
b
r(x)y(x)y′ (x) x=a = 0.
In such cases, we have from (4.140) and (4.142) that
L(y)
= −λ1 = R(y).
W (y)

The next theorem says that the Rayleigh quotient yields an upper bound to the true
value of the lowest eigenvalue −λ1 .
Theorem 4.29 Suppose −λ1 is the first eigenvalue of (4.137) and (4.138). Let

σ = {u : u ∈ C2 ([a, b]), u(a) = 0, u(b) = 0}.

Let the Rayleigh quotient be given by (4.140). Then

min R(u) = −λ1 . (4.143)


u∈σ
Sturm-Liouville Problem 307

Note that Theorem 4.29 does not requires the function u to be an extremal of (4.137)
subject to the constraint (4.138). You may think of the set σ as the set of “trial
functions”.

Proof
For y ∈ σ , we let ŷ = y + εη, with η(a) = η(b) = 0. Set

min R(y) = M.
y∈σ

By Taylor’s expansion about ε we have


 b 
L(ŷ) = r(x)(y′ + εη ′ )2 + q(x)(y + εη)2 dx
a
 b 
= r(x)y′2 + q(x)y2 dx
a
 b 
+ 2ε r(x)y′ η ′ (x) + q(x)yη(x) dx + O(ε 2 ).
a

An integration by parts yields,


 b  b
b ′
r(x)y′ η ′ (x)dx = r(x)y′ (x)η(x) x=a − η(x) r(x)y′ (x) dx
a a
 b ′
= − η(x) r(x)y′ (x) dx.
a

Thus,  b  
L(ŷ) = L(y) + 2ε η − (r(x)y′ )′ + q(x)y dx + O(ε 2 ).
a
Similarly, but without the integration by parts,
 b
W (ŷ) = W (y) + 2ε η p(x)ydx + O(ε 2 ).
a

Then by (4.140) we see that


 b  
L(ŷ) = MW (y) + 2ε η − (r(x)y′ )′ + q(x)y dx + O(ε 2 ).
a

Rearranging the terms we get


 b  
L(ŷ) − MW (ŷ) = 2ε η − (r(x)y′ )′ + q(x)y − M py dx + O(ε 2 ).
a

Since W (y) ̸= 0 for nontrivial y, it follows that


L(ŷ) L(y)
R(ŷ) − R(y) = −
W (ŷ) W (y)
308 Calculus of Variations
W (y)L(ŷ) −W (ŷ)L(y)
=
W (y)W (ŷ)
L(ŷ) − MW (ŷ) L(y)
= (using the fact that M = )
W (ŷ) W (y)
b  
2ε a η − (r(x)y′ )′ + q(x)y − M py dx + O(ε 2 )
= .
W (ŷ)
We claim that
−(r(x)y′ )′ + q(x)y − M py = 0.
Otherwise, we may choose η so that, for small ε the integral is negative, which
violates the fact that R(ŷ) − R(y) > 0. We conclude that (r(x)y′ )′ − q(x)y − M py = 0,
which is the (SLP), and therefore M must be an eigenvalue. From Theorem 4.28, we
have
R(φn ) = −λn ≥ min R(y) = M = −λn , for all n.
y∈σ

Therefore,
min R(y) = −λ1 .
y∈σ

This completes the proof.

4.14.2 Exercises
Exercise 4.101 Put the second-order differential equation in the form of (4.126),

y′′ + 4y′ − 3y − λ y = 0, y(0) = 0, y(1) = 0.

Exercise 4.102 Put the second-order differential in the form of (4.126),


1
x2 y′′ + y′ − λ y = 0, y(1) = 0, y(2) = 0.
x
Exercise 4.103 Show that the eignefunctions
√ 1
yn (x) = ± 2x− 2 sin(nπ ln(x))

that were found in Example 4.30 are orthogonal and normalize the eigenfunctions.
Exercise 4.104 Prove (4.134) and (4.135).
Exercise 4.105 Redo Example 4.30 for the variational problem
 π
L(y) = y′2 dx, y(0) = y(π) = 0,
0

subject to W (y) = 0 y2 dx = 3. Then evaluate L at the eigenfunctions yn (x) =
2
∑∞ ∞
n=1 an sin(nx), to find a formula for ∑n=1 an . Finally, argue or refer to other state-
ments, that the eigenfunction corresponding to the first eigenvalue minimizes L.
Sturm-Liouville Problem 309

Exercise 4.106 Put the (SLP)

λ
(xy′ )′ − = 0, y(1) = y(b) = 0
x
in the form of (4.137) and (4.138). Find the eigenvalues and normalized eigenfunc-
tions and show that the normalized eigenfunction corresponding to the first eigen-
value minimizes L.
Exercise 4.107 Consider the variational problem
 l
L(y) = y′2 dx, y(0) = y(l) = 0,
0
l
subject to W (y) = 0 y2 dx. Find the eigenvalues −λn , and corresponding eigen-
functions yn (x), n = 1, 2, . . .(No need to normalize them). We already know that
R(y1 ) = −λ1 .
For the next parts, take l = 1 and use(4.140)
(a) Compute R(yT ) at the trial function

x, 0 ≤ x ≤ 1/2
yT =
1 − x, 1/2 ≤ x ≤ 1

(b) Redo part (a) for


yT = x(1 − x).

(c) Which function is a better estimate of −λ1 ?


(d) Use the trial function
1
yT = x(x − )(x − 1)
2
to estimate −λ2 . How close is the estimate of yT to −λ2 ?
Exercise 4.108 Consider the boundary value problem
1
y′′ − λ (1 + x)y = 0, y(0) = y(π) = 0.
10
Compute R(y) for the following trial functions:
(a) 
x, 0 ≤ x ≤ π/2
yT =
π − x, 1/2 ≤ x ≤ π

(b)
yT = x(π − x).
310 Calculus of Variations

4.15 Rayleigh Ritz Method


The Rayleigh-Ritz method is a numerical procedure to obtain approximate solutions
to problems that can be recast as variational problems. The method takes us from
an infinite-dimensional problem to a finite-dimensional problem. We seek a function
y = y(x) that is an extremal to the variational problem.
 b
L(y) = F(x, y, y′ )dx, y(a) = A, y(b) = B, (4.144)
a

in the form
N
y(x) ≈ φ0 (x) + c1 φ1 (x) + c2 φ1 (x) + . . . + cN φN (x) = φ0 (x) + ∑ cn φn (x), (4.145)
n=1

where the constants ci , i = 1, 2, . . . N are to be found. The mystery is to find or select


the functions φi , i = 0, 1, . . . N, that will efficiently do the job. Below we list couple
observations:
1) If the boundary conditions are given, then chose φ0 (x) so that it satisfies all
the problem’s boundary conditions and the others φi i = 1, 2, . . . N, vanish at the
boundary conditions.
2) The functions φi i = 1, 2, . . . N, can be chosen in problems when one is aware of
the form of the answer in order for the equation (4.145) to take that form.

For example if we have a boundary value problem in which the solutions are of the
form y = c + dx, and boundary conditions y(0) = 0, y(1) = 1, then we may take
φ0 (x) = x, and φ1 (x) = x(x − 1), and hence

y(x) ≈ x + c1 x(1 − x).

Note that φ0 (0) = 0, φ0 (1) = 1, and φ1 (0) = φ1 (1) = 0. Here we only decided to
select φ0 and φ1 . If we were to write down all of them, then we would set

φ2 (x) = x2 (1 − x), φ3 (x) = x3 (1 − x), . . . , φN (x) = xN (1 − x),

which corresponds to an approximate of the form

y(x) ≈ x + x(x − 1) c1 + c2 x + . . . + cN xN−1 .




Next we substitute (4.145) into the variational in (4.144) and suppose we want
to
 b  N N 
′ ′
Minimize L(y) = F x, φ0 (x) + ∑ cn φn (x), φ0 (x) + ∑ cn φn (x) dx.
a n=1 n=1
Rayleigh Ritz Method 311

The independent variable x will integrate out and we are left with a function of the
unknown constants say, L(c1 , c2 , . . . , cN ). The problem reduces to
min L(y) = min L(c1 , c2 , . . . , cN ).
c1 ,c2 ,...,cN

Using our knowledge of calculus, we require


∂L
= 0, i = 1, 2, . . . , N.
∂ ci
We are left with solving N linear equations in N unknown variables c1 , c2 , . . . , cN .
This procedure generates the best estimate when we begin with the initial estimate
y(x) ≈ φ0 (x) + c1 φ1 (x) and c1 are determined by the substitution of y into the varia-
tional. Then, set y(x) ≈ φ0 (x) + c1 φ1 (x) + c2 φ2 (x), and redetermine c1 and determine
c2 by substituting y into the variational. Repeat the same process, and at each stage,
the following is true: At the Nth stage, the terms c1 , c2 , . . . , cN−1 that have been pre-
viously determined are predetermined. This guarantees a better estimate at the Nth
stage
y(x) ≈ φ0 (x) + c1 φ1 (x) + . . . cN φN (x)
than the approximation at the N − 1st stage
y(x) ≈ φ0 (x) + c1 φ1 (x) + . . . cN−1 φN−1 (x).
The above process should lead to a convergence of the approximations to the real
extremizer. That is
 N 
lim φ0 (x) + ∑ cn φn (x) = y0 (x),
N→∞
n=1
where y0 (x) is the extremizing function. We furnish the following example.
Example 4.31 In Example 4.2 we found the true solution for the extremal of
 1
(y′ )2 − xy − y2 dx,

L(y) = y(0) = 1, y(1) = 2
0

to be
2e−1 − 5 x 5 − 2e −x 1
y(x) = e + e − x.
2(e−1 − e) 2(e−1 − e) 2
Next we apply the Rayleigh-Ritz Method. Set φ0 (x) = 1 + x. then φ0 (0) = 1, and
φ0 (1) = 2. Thus φ0 (x) satisfies the boundary conditions as required by the method.
Choose, φ1 (x) = x(1 − x). Clearly, φ1 (x) vanishes at the boundaries and it has no
zeroes in (0, 1). Set
y1 (x) = φ0 (x) + c1 φ1 (x) = 1 + x + c1 x(1 − x).
Next we substitute y1 into the variational and obtain
 1 
L(y1 ) = (y′1 )2 − xy1 − y21 dx
0
312 Calculus of Variations
 1 n 2 2 o
1 + c1 (1 − 2x) − x − x2 − c1 x2 (1 − x) − 1 + x + c1 x(1 − x)

= dx
0
 1n o
= (−3x − 2x2 ) + c1 (2 − 6x − x2 + 3x3 ) + c21 (1 − 4x + 3x2 + 2x3 − x4 ) dx
0
13 7 3
=− − c1 + c21 .
6 12 10
∂L 35
Solving = 0 yields c1 = . Thus
∂ c1 4
35
y1 (x) = 1 + x + x(1 − x),
4
d2 L 35
as the first approximate solution. We remark that, since dc21
> 0 at c1 = 4 , y1 (x) is
a minimizer candidate. The relation

y(x) ≈ 1 + x + x(1 − x)[c1 + c2 x + . . . + cN xN−1 ]

offers higher-order approximations. For example, when N = 1, we get y1 (x), and


when N = 2, we get y2 (x) = 1 + x + x(1 − x)[c1 + c2 x]. If we substitute y2 (x) into L,
then we would set
∂L ∂L
= 0, = 0.
∂ c1 ∂ c2

Remark 20 According to our guidelines on how to choose φi (x), i = 0, 1, . . . , N if
the boundary conditions are y(0) = y(1) = 0, then you could set φ0 (x) = 0. In this
case y1 may take the form y1 (x) = c1 φ1 (x) = c1 x(1 − x).

4.15.1 Exercises
Exercise 4.109 Compute y2 (x) = 1 + x + x(1 − x)[c1 + c2 x] in Example 4.31.
Exercise 4.110 Compute the second-order approximation y2 (x) for the variational
 1
(y′ )2 − 2xy − y2 dx,

L(y) = y(0) = 1, y(1) = 2,
0

and compute the true extremal. Graph both functions; that is, the true solution and
y2 (x) on the same graph.
Exercise 4.111 Redo Exercise 4.110 when the boundary conditions are

y(0) = 0, y(1) = 0.

Exercise 4.112 Put the boundary value problem

xy′′ + y′ + y − x = 0, y(0) = 0, y(1) = 1,


Multiple Integrals 313

into a variational form and use Rayleigh Ritz method to obtain an approximation in
the form
y2 (x) ≈ x + x(1 − x)[c1 + c2 x].
Exercise 4.113 Compute the second-order approximation y2 (x) for the variational
 1
(y′ )2 − 2xy − 2y dx,

L(y) = y(0) = 2, y(1) = 1,
0

and compute the true extremal. Graph both functions; that is, the true solution and
y2 (x) on the same graph.
Exercise 4.114 Compute the second-order approximation y2 (x) for the variational
 2
1 2 ′ 2 
L(y) = (x y ) + 6xy dx, y(1) = y(2) = 0,
1 2
and compute the true extremal. Graph both functions; that is, the true solution and
y2 (x) on the same graph.

4.16 Multiple Integrals


Let R be a closed region in the xy-plane. By C2 (R) we mean the set of all continuous
functions u = u(x, y) defined on R having continuous second partial derivatives on
the interior of R. Geometrically, u = u(x, y) represents a smooth surface over the
region R. For the set of admissible functions A we take the set of functions in C2 (R)
whose values are fixed on the curve C , which bounds the region R in the xy-plane.
Hence u ∈ A if u ∈ C2 (R) and

u(x, y) = f (x, y), (x, y) ∈ C ,

where f (x, y) is a given function defined over C and whose values trace out a fixed
curve Γ, which forms the boundary of the surface u. See Fig. 4.21.
The variational problem is to minimize
 

J(u) = F x, y, u(x, y), ux (x, y), uy (x, y) dx dy, (4.146)
R

where u ∈ A. We seek a necessary condition for a minimum. Let u(x, y) provide


a local minimum for the functional J and consider the family of admissible func-
tions
u(x, y) + εη(x, y)
where η ∈ C2 (R) and η(x, y) = 0 for (x, y) on C . Then
d
δ J(u, η) = J(u + εη) ε=0

314 Calculus of Variations
u
Γ

u = u(x, y)

R u = f (x, y)

x C

FIGURE 4.21
Surface u = u(x, y) minimizing J(u).

 
d 
= F x, y, u + εη, ux + εηx , uy + εηy dx dy
dε R
 
 
= Fu η + Fux ηx + Fuy ηy dx dy
 R
 ∂ ∂ 
= Fu − Fux − Fuy ηdx dy
∂x ∂y
 R
∂ ∂ 
+ (ηFux ) + (ηFuy ) dx dy.
R ∂x ∂y

The second integral can be transformed to a line integral over C using Green’s theo-
rem, which states that if P and Q are functions in C1 (R), then
  
∂Q ∂P
− dx dy = Pdx + Qdy.
R ∂x ∂y C
Consequently,
 
 ∂ ∂ 
δ J(u, η) = Fu − Fux − Fuy ηdx dy
R ∂x ∂y

+ ηFux dy − ηFuy dx.
C

Since η(x, y) = 0 for (x, y) on C , the line integral vanishes and we have
 
 ∂ ∂ 
δ J(u, η) = Fu − Fux − Fuy ηdx dy. (4.147)
R ∂x ∂y
Multiple Integrals 315

Since u is a local minimum if follows that δ J(u, η) = 0 for every η ∈ C2 (R) with
η(x, y) = 0 on C . Next, we extend the Fundamental Lemma of Calculus of Variations
to two variables.
Lemma 14 Suppose g(x, y) is continuous over the region Ω ⊂ R2 . If
 
g(x, y)η(x, y)dxdy = 0,

for every continuous function η(x, y) defined on Ω and satisfying η = 0 on ∂ Ω, then


g(x, y) = 0 for all (x, y) in Ω.
Back to (4.147). Since u is a local minimum it follows that δ J(u, η) = 0 on C . Thus
with the aid of Lemma 14, we have
∂ ∂
Fu − Fu − Fu = 0, (4.148)
∂x x ∂y y
which is the Euler-Lagrange equation for the problem (4.146). Equation (4.148) is
a second-order partial differential equation for u = u(x, y). It is straight forward to
generalize (4.146) to m-integrals of the form
 
∂u ∂u 
J(u) = ... F x1 , x2 , . . . , u, ,..., dx1 . . . dxm ,
Rm ∂ x1 ∂ xm
where u = u(x1 , x2 , . . . , xm ), R is a closed region in m-dimensional Euclidean space.
The Euler-Lagrange equation in this case is
∂ ∂F ∂ ∂F ∂ ∂F
Fu − − −...− = 0, (4.149)
∂ x1 ∂ ux1 ∂ x2 ∂ ux2 ∂ xm ∂ uxm
∂u
where uxi = .
∂ uxi
Example 4.32 (Plateau’s problem) Given a fixed curve Γ in space, find the surface
u = u(x, y) with boundary whose surface area is least. That is we need to minimize
  q
J(u) = 1 + u2x + u2y dx dy,
R

where R is the region enclosed by the curve C which is the projection of Γ onto the
xy-plane. The function u should satisfy u(x, y) = h(x, y), (x, y) ∈ C , where h is the
1
function defining Γ. We have F = (1 + u2x + u2y ) 2 , with
ux uy
Fu = 0; Fux = q ; Fuy = q ,
1 + u2x + u2y 1 + u2x + u2y

and the corresponding Euler equation is


∂ ux ∂ uy
− q − q = 0.
∂ x 1 + u2 + u2 ∂ y 1 + u2 + u2
x y x y
316 Calculus of Variations
y
u = f (x)
b

u=0
∇2 u = 0
u=0

(0, 0) x
u=0 a

FIGURE 4.22
Dirichlet problem in rectangular coordinates.

After simplifications, the Euler equation reduces to

(1 + u2y )uxx − 2ux uy uxy + (1 + u2x )uyy = 0,

which is the equation for the minimized surface. This is nonlinear partial differential
equation and almost impossible to solve in its current form for a given boundary
curve Γ. □
In the next example we consider the steady temperature in rectangular coordinates,
called the Dirichlet problem. For more on this, we refer to Appendix A.
Example 4.33 Consider the variational problem
 
u2x + u2y dx dy,

J(u) =
R

where R = {(x, y) : 0 < x < a, 0 < y < b}. Suppose on the boundary of R, we have

u(0, y) = 0, u(a, y) = 0 (0 < y < b),

u(x, 0) = 0, u(x, b) = f (x) (0 < x < a),


as depicted in Fig. 4.22. Then the corresponding Euler-Lagrange equation is

∇2 u = uxx (x, y) + uyy (x, y) = 0. (4.150)

Equation (4.150) along with the boundary conditions represent the steady tempera-
tures u(x, y) in a plates whose faces are insulated. The function u(x, y) represents the
electrostatic potential in a space formed by the planes x = 0, x = a, y = 0, and y = b
when the space is free of charges and planar surfaces are kept at potentials given by
the boundary conditions. We are seeking non trivial solution and hence if we assume
the solution u(x, y) is the product of two functions one in x and the other in y, such
that
u(x, y) = X(x)Y (y),
Multiple Integrals 317

we obtain
X ′′ (x)Y (y) + X(x)Y ′′ (y) = 0.
Since X(x) ̸= 0, and Y (y) ̸= 0, we may divide by the term X(x)Y (y), to separate the
variables. That is,
X ′′ (x) Y ′′ (y)
=− .
X(x) Y (y)
Since the left-hand side is a function of x alone, it does not vary with y. However, it
is equal to a function of y alone, and so it can not vary with x. Hence the two sides
must have some constant value −λ in common. That is,
X ′′ (x) Y ′′ (y)
=− = −λ .
X(x) Y (y)
This gives the Sturm-Liouville problems

X ′′ (x) + λ X(x) = 0, X(0) = 0, X(a) = 0, (4.151)

and
Y ′′ (y) − λY (y) = 0, Y (0) = 0. (4.152)
Using arguments of Section 4.14, Equations (4.151) and (4.152) have the respective
eigenfunctions
nπx nπy
Xn (x) = sin( ), Yn (y) = sinh( ), n = 1, 2, . . .
a a
where the eigenvalues are given by λn = ( nπ 2
a ) . Thus, the general solution of the
Dirichlet problem

nπy nπx
u(x, y) = ∑ bn sinh( ) sin( ).
n=1 a a
For detail on computing the coefficients bn we refer to Appendix A. Thus, bn are
given by  a
2 nπx
bn = nπb
f (x) sin( )dx, n = 1, 2, . . . .
a sinh( a ) 0 a

Next we look at a parametrized three dimensional surface. Suppose we have a surface
S specified or parametrized with

r = r(u, v) = x(u, v), y(u, v), z(u, v) . (4.153)

The shortest curve lying on a surface S connecting given points on S is called


geodesic. Remember, a curve lying on the surface S can be specified by u = u(t), v =
v(t). The arclength between the points on S corresponding to t = t0 and t = t1
is  t1 p
J(u, v) = Eu′2 + 2Ku′ v′ + Gv′2 dt, (4.154)
t0
318 Calculus of Variations

where E, K, and G are called the coefficient of the first fundamental form

E = ru · ru , K = ru · rv , G = rv · rv ,

where · means the dot product. Equation (4.154) is a variational with several func-
tions and by (4.149) we have the relevant Euler-Lagrange equations
d d
Fu − Fu′ = 0, and Fv − Fv′ = 0.
dt dt
Be aware that the coefficients of the first fundamental form E, G and K depend on u
and v. After some calculations the corresponding Euler-Lagrange equations are given
by
Eu u′2 + 2Ku u′ v′ + Gu v′2 d Eu′ + Kv′
√ − √ = 0, (4.155)
2 Eu′2 + 2Ku′ v′ + Gv′2 dt Eu′2 + 2Ku′ v′ + Gv′2
and
Ev u′2 + 2Kv u′ v′ + Gv v′2 d Ku′ + Gv′
√ − √ = 0. (4.156)
′2
2 Eu + 2Ku v + Gv′ ′ ′2 dt Eu + 2Ku′ v′ + Gv′2
′2

In the next example we use equations (4.155) and (4.156) to find the geodesics on a
circular cylinder.
Example 4.34 In this example we want to find the geodesics on a circular cylinder.
Note that the circular cylinder has the parametrization

r = (a cos(u), a sin(u), v)

where a is the radius. Then,

E = ru · ru = (−a cos(u), a cos(u), 0) · (−a cos(u), a cos(u), 0) = a2 ,

K = ru · rv = (−a cos(u), a cos(u), 0) · (0, 0, 1) = 0,


and
G = rv · rv = (0, 0, 1) · (0, 0, 1) = 1.
Then the corresponding equations to (4.155) and (4.156) are

d a2 u′ d v′
− √ = 0, − √ = 0.
dt a2 u′2 + v′2 dt a2 u′2 + v′2
Moreover, the corresponding solutions are given by

a2 u′ a2 v′
√ = c1 , √ = c2 ,
a2 u′2 + v′2 a2 u′2 + v′2
for constants c1 and c2 . Taking the ratio we obtain

√ v′
a2 u′2 +v′2 c2
= .
√ a2 u′ c1
a2 u′2 +v′2
Multiple Integrals 319

FIGURE 4.23
Geodesics on a right cylinder.

It follows that
v′ c2
2 ′
= = k,
a u c1
for another constant k. This implies that

v′
= a2 k,
u′
and by rewriting the derivatives we see that
dv
dt
du
= a2 k.
dt

Separating the variables yields the first-order ODE dv = a2 kdu, which has the solu-
tion
v(t) = a2 ku(t) + c3 , for constant c3 ,
which is a two parameter family of helical lines on the cylinder, where the constants
can be determined from the location of the two points A and B as depicted in Fig.
4.23. □

4.16.1 Exercises
Exercise 4.115 Determine the natural boundary condition for
 

J(u) = F x, y, u(x, y), ux (x, y), uy (x, y) dx dy,
R

u ∈ C2 (R) and u unspecified on the boundary of ∂ R of R.


Exercise 4.116 Determine the Euler-Lagrange equation for
 
x2 u2x + y2 u2y dx dy.

J(u) =
R
320 Calculus of Variations

Exercise 4.117 Let a and b be nonzero constants. Determine the Euler-Lagrange


equation for  
a2 u2x + b2 u2y dx dy,

J(u) =
R
x2 y2
where R = {(x, y) : a2
+ b2 < 1}. Suppose on the boundary of R, we have u(x, y) =
2x2
a2
− 1 for all
x2 y2
(x, y) ∈ ∂ R = {(x, y) : + = 1}.
a2 b2
Refer to Section 2.5 to find the solution.
Exercise 4.118 Redo Example 4.33 with boundary conditions

u(0, y) = 0, u(a, y) = 0 (0 < y < b),

u(x, 0) = g(x), u(x, b) = 0 (0 < x < a).


Exercise 4.119 Redo Example 4.33 with boundary conditions

u(0, y) = 0, u(a, y) = 0 (0 < y < b),

u(x, 0) = f (x), u(x, b) = g(x) (0 < x < a).


Exercise 4.120 Give all details on how to obtain equations (4.155) and (4.156).
Exercise 4.121 Find the geodesics on the right circular cone

x2 y2
z2 = + , 0 ≤ z ≤ 3.
4 9
Hint: Use the following parametrization for the cone

r = (2v cos(u), 3v sin(u), v), 0 ≤ u ≤ 2π, 0 ≤ v ≤ 3.

Exercise 4.122 Determine (4.155) and (4.156) for the surface parametrized by

r = (5 sin(u) cos(v), sin(u) sin(v), 2 cos(u)), 0 ≤ u ≤ π, 0 ≤ v ≤ 2π.


5
Integral Equations

Integral equations are used in a wide variety of contexts, including science and engi-
neering. Integral equations such as those derived from Volterra or Fredholm can be
utilized to find solutions to a wide variety of initial and boundary value problems.
Integral equations can take on a number of different forms, but in most cases they
are used to model scientific procedures in which the current value of a quantity (or
set of values) or its rate of change is dependent on its historical performance. This is
in contrast to differential equations, which assume that the value of a quantity at any
given time is the only factor that may affect the rate at which it changes. In the same
way that differential equations need to be “solved,” integral equations also need to be
“solved” in order to describe and predict how a physical quantity will behave over a
period of time. One strong argument in favor of using integral equations rather than
differential equations is the fact that all of the conditions defining the initial value
problems or boundary value problems for a differential equation can frequently be
condensed into a single integral equation. This is one of the many reasons why in-
tegral equations are preferred over differential equations. The study of a variety of
integral equations, including Fredholm first- and second-kind integral equations as
well as Volterra integral equations, symmetric and separable kernels, iterative meth-
ods, the approximation of non-degenerate kernels, and the application of the Laplace
transform to the solution of convoluted integral equations, will be the focus of our
work. The chapter comes to a close with a discussion on integral equations that ex-
hibit strange behavior.

5.1 Introduction and Classifications


This section focuses on integral equations when the integration is with respect to
a single variable. Higher-order generalization is straightforward and unproblematic.
Differential equations and integral equations differ significantly in that the former
are about the local behavior of a system, while the latter are about global behavior.
Local behavior is often easier to explain and grasp intuitively.

DOI: 10.1201/9781003449881-5 321


322 Integral Equations

Definition 5.1 An integral equation in the unknown function y(x) is a relation of the
form 
y(x) = f (x) + K(x, ξ )y(ξ )dξ (5.1)

in which y(x) appears in the integrand, where K(x, ξ ) is a function of two variables
x and ξ and referred to as the kernel of the integral equation.
Note that we purposefully omitted the limits of integration from the formulation
above because, in most circumstances, they determine the sort of integral equation
we have. In (5.1) the functions f and K are given and satisfy continuity conditions
and perhaps others. The following are examples of integral equations.
 x
y(x) = sin(2x) + (x3 + ξ x + 1)y(ξ )dξ ,
0

and  1
ex y(x) = sin(xξ )y(ξ )dξ .
0
In this chapter we discuss the Fredholm equation of the first kind
 b
α(x)y(x) = K(x, ξ )y(ξ )dξ , (5.2)
a

and the Fredholm equation of the second kind


 b
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ . (5.3)
a

Fredholm integral equations given by (5.2) and (5.3) have the unique property of
having finite limits of integration ξ = a, and ξ = b. In addition to discussing Fredhom
equations, we will discuss Volterra equations of first kind and second kind given by
 x
y(x) = K(x, ξ )y(ξ )dξ , (5.4)
a
and  x
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , (5.5)
a
respectively. In later sections we will develop particular methods to solve integral
equations with specific characteristics. Without worrying about technicality, we try
to define a sequence of functions {yn } successively for (5.5), with λ = 1 by set-
ting

y0 (x) = f (x)
 x
y1 (x) = f (x) + K(x, ξ )y0 (ξ )dξ
0 x
y2 (x) = f (x) + K(x, ξ )y1 (ξ )dξ (5.6)
0
Introduction and Classifications 323
..
.  x
yn (x) = f (x) + K(x, ξ )yn−1 (ξ )dξ , n = 1, 2, . . . (5.7)
0

This method is referred to as the successive approximation method. To illustrate the


above procedure we provide the following example.
Example 5.1 Consider the Volterra integral equation
 x
y(x) = 1 − (x − ξ )y(ξ )dξ .
0

Setting y0 (x) = 1, we have the recurrent formula


 x
yn (x) = 1 − (x − ξ )yn−1 (ξ )dξ .
0

For n = 1 we have
 x
x2
y1 (x) = 1 − (x − ξ )(1)dξ = 1 − .
0 2
ξ2
For n = 2, with y1 (ξ ) = 1 − 2 ,
 x
ξ2 x2 x4
y2 (x) = 1 − (x − ξ )(1 − )dξ = 1 − + .
0 2 2 24
2 4
Similarly, for n = 3 with y2 (ξ ) = 1 − ξ2 + ξ24 , we have
 x
ξ2 ξ4 x2 x4 x6
y3 (x) = 1 − (x − ξ )( + )dξ = 1 − + − .
0 2 24 2 24 720
A continuation of this method leads to the sequence of functions
n
x2 x4 x6 (−1)k x2k
yn (x) = 1 − + − +··· = ∑ .
2 24 720 k=0 (2k)!

Note that
n
(−1)k x2k
lim yn (x) = lim ∑ = cos(x).
k=0 (2k)!
n→∞ n→∞

We leave it to the students to verify, using either, the Laplace transform or by direct
substitution that y(x) = cos(x) is indeed a solution of the integral equation. □

5.1.1 Exercises
3
Exercise 5.1 By direct substitution, show that y(x) = (1 + x2 )− 2 is a solution of the
Volterra integral equation
 x
1 ξ
y(x) = − y(ξ )dξ .
1 + x2 0 1 + x2
324 Integral Equations

Exercise 5.2 By direct substitution, show that y(x) = cos(x) is a solution of the
Volterra integral equation
 x
y(x) = 1 − (x − ξ )y(ξ )dξ .
0

Exercise 5.3 By direct substitution, show that y(x) = (x + 1)2 is a solution of the
Volterra integral equation
 x
y(x) = e−x + 2x + eξ −x y(ξ )dξ .
0

Exercise 5.4 Use the method of successive approximation and show the solution of
the Volterra integral equation
 x
y(x) = 1 + (x − ξ )y(ξ )dξ
0

is y(x) = cosh(x).
Exercise 5.5 Use the method of successive approximation and show the solution of
the Fredholm integral equation
 1
1
y(x) = x + (ξ − x)y(ξ )dξ
2 −1

is y(x) = 34 x + 14 .

5.2 Connection between Ordinary Differential Equations and In-


tegral Equations
We begin this section by addressing existence and uniqueness of initial value problem
and its relationship between integral equations. Thus, we consider the (IVP)
x′ = f (t, x), x(t0 ) = x0 (5.8)
where we assume throughout this section that f : D → R is continuous and D is a
subset of R × R. In the case the differential equation given by (5.8) is linear, then a
solution can be found. However, in general this approach is not feasible when the
differential equation is not linear and hence another approach must be indirectly
adopted that establishes the existence of a solution of (5.8). For the development
of the existence theory we need a broader definition of the Lipchitz condition that we
state next.
Definition 5.2 The function f : D → R is said to satisfy a global Lipschitz condition
in x if there exists a Lipschitz constant k > 0 such that
| f (t, x) − f (t, y)| ≤ k|x − y|, for (t, x), (t, y) ∈ D. (5.9)
Connection between Ordinary Differential Equations and Integral Equations 325

Definition 5.3 The function f : D → R is said to satisfy a local Lipschitz condition


in x if for any (t1 , x1 ) ∈ D there exists a domain (t1 , x1 ) ∈ D1 ⊂ D and that f (t, x)
satisfies a Lipschitz condition in x on D1 . That is, there exists a positive constant K1
such that
| f (t, x) − f (t, y)| ≤ K1 |x − y|, for (t, x), (t, y) ∈ D1 . (5.10)

Definition (5.4) can be easily extended to functions f : D → Rn , where D ⊂ R × Rn


under a proper norm. Let R be any rectangle in D such that R = {(t, x) : |t| ≤ a, |x| ≤
b}. If we assume that f and ∂∂ xf are continuous on R, which is the case in this chapter,
then f and ∂∂ xf are bounded on R. Therefore, there exists positive constants M and
K such that ∂ f
| f (t, x)| ≤ M and ≤ K (5.11)

∂x
for all points (t, x) in R. Now for any two points (t, x1 ), (t, x2 ) in R, by the mean value
theorem there exists a constant c ∈ (x1 , x2 ) such that

∂f
f (t, x1 ) − f (t, x2 ) = (t, c)(x1 − x2 ),
∂x
form which it follows that
∂ f
| f (t, x1 ) − f (t, x2 )| ≤ (t, c) |x1 − x2 |

∂x
≤ K |x1 − x2 |. (5.12)
∂f
We have shown that if f and ∂y are continuous on R, then f satisfies a global Lips-
chitz condition on R.
We state the following definition regarding solutions of (IVP) and integral equa-
tions.
Definition 5.4 We say x is a solution of (5.8) on an interval I, provided that x : I → R
is differentiable, (t, x(t)) ∈ D, for t ∈ I, x′ (t) = f (t, x(t)) for t ∈ I, and x(t0 ) = x0 for
(t0 , x0 ) ∈ D.
In preparation for the next theorem, we observe that the (IVP) (5.8) is related
to  t
x(t) = x0 + f (s, x(s))ds. (5.13)
t0

Relation (5.13) is an integral equation since it contains an integral of the unknown


function x. This integral is not a formula for the solution, but rather it provides an-
other relation which is satisfied by solution of (5.8).
Definition 5.5 We say x : I → R is a solution of the integral equation given by (5.13)
on an interval I, provided that t0 ∈ I, x is continuous on I, (t, x(t)) ∈ D, for t ∈ I, and
(5.13) is satisfied for t ∈ I.
The next theorem is fundamental for the proof of the existence theorems.
326 Integral Equations

Theorem 5.1 Let D be an open subset of R2 and (t0 , x0 ) ∈ D. Then x is a solution


of (5.8) on an interval I if and only if x satisfies the integral equation given by (5.13)
on I.

Proof Let x(t) be a solution of (5.8) on an interval I. Then t0 ∈ I, x is differentiable


on I, and hence x is continuous on I. Moreover, (t, x(t)) ∈ D, for t ∈ I, with x(t0 ) = x0 ,
and x′ (t) = f (t, x(t)) for t ∈ I. Now an integration of x′ (t) = f (t, x(t)) from t0 to t
gives (5.13) for t ∈ I. For the converse, if x satisfies (5.13) for t ∈ I, then t0 ∈ I and x
is continuous on I. Moreover, (t, x(t)) ∈ D, for t ∈ I, and (5.13) is satisfied for t ∈ I.
Thus x(t) is differentiable on I. By differentiating t(5.13) with respect to t we arrive at
x′ (t) = f (t, x(t)) for all t ∈ I and x(t0 ) = x0 + t00 f (s, x(s))ds = x0 . This completes
the proof.

For a reference, we mention Picard’s local existence and uniqueness theorem,using


successive approximations.
Theorem 5.2 (Picard’s Local Existence and Uniqueness) Let D ⊂ R × R be defined
by
D = {(t, x) : |t − t0 | ≤ a, |x − x0 | ≤ b},
where a and b are positive constants. Assume f ∈ C(D, R) and f satisfies the Lips-
chitz condition (5.9). Let
M = max | f (t, x)| (5.14)
(t,x)∈D

and set
b
}.
h = min{a, (5.15)
M
Then the (IVP) (5.8) has a unique solution denoted by x(t,t0 , x0 ) on the interval
|t − t0 | ≤ h and passing through (t0 , x0 ). Furthermore,
|x(t) − x0 | ≤ b, for |t − t0 | ≤ h.

The next lemma is convenient when converting a second-order differential equation


into an integral equation of Volterra type.
Lemma 15 Suppose F(x) is continuous function on [a, ∞). Then
 x ξ  x
F(t)dtdξ = (x − t)F(t)dt. (5.16)
a a a

Proof The proof involves changing the limits of integration. From Fig. 5.1, we define
the shaded region by
D = {(t, ξ ) : t ≤ ξ ≤ x, a ≤ t ≤ x}.
Then
 x ξ  
F(t)dtdξ = F(t)dξ dt
a a D
Connection between Ordinary Differential Equations and Integral Equations 327

ξ
t
=
ξ
x

a x t

FIGURE 5.1
Shaded region of integrations.

 x x
= F(t)dξ dt
a x t
 x
= F(t) dξ dt
a x t
 x 
= F(t) dξ dt
a x t

= (x − t)F(t)dt.
a

This completes the proof.

We provide the following example.


Example 5.2 In this example, we show that the second-order differential equation

y′′ + Ay′ + By = 0, y(0) = y(1) = 0, (5.17)


1
where A and B are constants, leads to the integral equation y(x) = 0 K(x, ξ )y(ξ ) dξ ,
where (
Bx(1 − x) + Ax − A for ξ < x,
K(x, ξ ) = (5.18)
Bx(1 − ξ ) + Ax for ξ > x.
We begin by integrating all the terms in (5.17)
 x  x  x
y′′ (ξ ) dξ + A y′ (ξ ) dξ + B y(ξ ) dξ = 0,
0 0 0

which implies that


x x  x


y (ξ ) + Ay(ξ ) + B
y(ξ ) dξ = 0.
0 0 0
328 Integral Equations

Or,  x
y′ (x) − y′ (0) + Ay(x) − Ay(0) + B y(ξ ) dξ = 0.
0
We are given y(0) = 0 and so
 x
′ ′
y (x) − y (0) + Ay(x) + B y(ξ ) dξ = 0. (5.19)
0

By integrating (5.19), we get


 x  x  x  x s
y′ (ξ ) dξ − y′ (0) dξ + A y(ξ ) dξ + B y(ξ ) dξ ds = 0.
0 0 0 0 0

By Lemma 15, we see that


 x s  x
y(ξ ) dξ ds = (x − ξ )y(ξ ) dξ ,
0 0 0

and so,
 x  x

y(x) − y(0) − xy (0) + A y(ξ ) dξ + B (x − ξ )y(ξ ) dξ = 0.
0 0

As y(0) = 0, then
 x  x
y(x) − xy′ (0) + A y(ξ ) dξ + B (x − ξ )y(ξ ) dξ = 0. (5.20)
0 0

Now, by letting x = 1 and making use of y(1) = 0 in the above equation we obtain
 1  1
y′ (0) = A y(ξ ) dξ + B (1 − ξ )y(ξ ) dξ ,
0 0

or  1

y (0) = [A + B − Bξ ]y(ξ ) dξ . (5.21)
0
Finally, substitute (5.21) into (5.20) to get
 1   x  x
y(x) − x [A + B − Bξ ]y(ξ ) dξ + A y(ξ ) dξ + B (x − ξ )y(ξ ) dξ = 0,
0 0 0
(5.22)
or  
1 x
y(x) = [Ax + Bx − Bxξ ]y(ξ ) dξ − [A + Bx − Bξ ]y(ξ ) dξ .
0 0
Since, 0 ≤ x ≤ 1, we may use
 1  x  1
= +
0 0 x
Connection between Ordinary Differential Equations and Integral Equations 329

and rewrite
 x  1
y(x) = [Ax + Bx − Bxξ ]y(ξ ) dξ + [Ax + Bx − Bxξ ]y(ξ ) dξ
0 x x

− [A + Bx − Bξ ]y(ξ ) dξ
0
 x  1
= [Ax − A − Bxξ + Bξ ]y(ξ ) dξ + [Ax + Bx − Bxξ ]y(ξ ) dξ
0 x x

= [Bξ (1 − x) + Ax − A]y(ξ ) dξ
0
 1
+ [Ax + Bx(1 − ξ )]y(ξ ) dξ . (5.23)
x

By observing that the first term and second term in (5.23) are valid over 0 < ξ < x
and x < ξ < 1, respectively, we conclude that (5.23) can be written as
 1
y(x) = K(x, ξ )y(ξ ) dξ ,
0

which is a Fredholm equation of first kind where


(
Bξ (1 − x) + Ax − A for ξ < x,
K(x, ξ ) =
Bx(1 − ξ ) + Ax for ξ > x.


The next lemma is essential when differentiating an integral equation. It is referred
to as Leibnitz formula
Lemma 16 Suppose α(x), β (x) are continuous such that ∂∂αx and ∂∂βx exist. If F is
continuous in both variables and its first partial derivatives exist, then
 β (x)  β (x)
d ∂F
F(x, ξ )dξ = (x, ξ )dξ
dx α(x) α(x) ∂x
+ F(x, β (x))β ′ (x) − F(x, α(x))α ′ (x) (5.24)

Proof Let  β (x)


φ (α, β , x) = F(x, ξ )dξ ,
α(x)

with
∂f
= F(x, ξ ).
∂y
Then
φ (α, β , x) = f (x, β (x)) − f (x, α(x)).
330 Integral Equations

Using the chain rule, we arrive at

dφ ∂φ ∂φ ∂β ∂φ ∂α
= + + . (5.25)
dx ∂x ∂β ∂x ∂α ∂x
Moreover,  
β (x) β (x)
∂φ ∂ ∂F
= F(x, ξ )dξ = (x, ξ )dξ ,
∂x ∂x α(x) α(x) ∂x

since computing ∂∂φx means that α and β are kept constants. On the other hand, using
φ (α, β , x) = f (x, β (x)) − f (x, α(x)), we get

∂φ ∂ f (x, β ) ∂ f (x, α)
= − = 0 − F(x, α).
∂α ∂α ∂α
Similarly,
∂φ ∂ f (x, β ) ∂ f (x, α)
= − = F(x, β ) − 0.
∂β ∂β ∂β
Substituting the last three expressions into (5.25), we arrive at
 β (x)
dφ ∂β ∂α
= F(x, ξ )dξ + F(x, β ) − F(x, α) ,
dx α(x) ∂x ∂x

which is (5.25). This completes the proof.


Example 5.3 Consider the integral equation
 x  1
u(x) = λ (1 − x)ξ u(ξ )dξ + λ x(1 − ξ )u(ξ )dξ . (5.26)
0 x

Then (5.26) can be written as


 1
u(x) = λ K(x, ξ )u(ξ )dξ
0

where K(x, ξ ) is defined by the relations


(
ξ (1 − x) when ξ ≤ x ≤ 1,
K(x, ξ ) =
x(1 − ξ ) when 0 ≤ x ≤ ξ .

It is clear that the kernel K is symmetric and from (5.26) we have that u(0) = u(1) =
0. Moreover, using Lemma 16, we have
 x
u′ (x) = λ (1 − x)xu(x) − λ ξ u(ξ )dξ
0
 1
− λ (1 − x)xu(x) + λ (1 − ξ )u(ξ )dξ
x
Connection between Ordinary Differential Equations and Integral Equations 331
 x  1
= −λ ξ u(ξ )dξ + λ (1 − ξ )u(ξ )dξ .
0 x

Differentiating one more time gives

u′′ (x) = −λ xu(x) − λ (1 − x)u(x)


= −λ u(x).

Thus, the integral equation satisfies the second-order boundary value problem

u′′ (x) + λ u(x) = 0, 0 < x < 1,

u(0) = 0, u(1) = 0.
The boundary value problem is a Sturm-Liouville problem and we refer you to Chap-
ter 4, Section 4.14. Note that for λ ≤ 0, the problem only has the trivial solution. For
λ > 0, we let λ = α 2 , α ̸= 0. Then the problem has the solution

u(x) = c1 cos(αx) + c2 sin(αx).

Applying the first boundary condition we immediately have c1 = 0. Now, applying


u(1) = 0, we have c2 sin(α) = 0. To avoid a trivial solution, we set sin(α) = 0,
and from which we get α = nπ, n = 1, 2, . . . Thus, the problem has the eigenvalues
λn = n2 π 2 , with corresponding eigenfunctions

un (x) = sin(nπx), n = 1, 2, . . .

5.2.1 Exercises
Exercise 5.6 Show that f (t, x) = x2/3 does not satisfy Lipschitz condition in the rect-
angle R = {(t, x) : |t| ≤ 1, |x| ≤ 1}.
Exercise 5.7 Use integration by parts to prove (5.16) of Lemma 15.
Exercise 5.8 (a) If y′′ (x) = F(x), and y satisfies the initial condition y(0) = y0 and
y′ (0) = y′0 , show that
 x
y(x) = (x − ξ )F(ξ ) dξ + y′0 x + y0 .
0

(b) Verify that this expression satisfies the prescribed differential equation and initial
conditions.
Exercise 5.9 (a) If y′′ (x) = F(x), and y satisfies the end conditions y(0) = 0 and
y(1) = 0, show that
 1
y(x) = K(x, ξ )F(ξ ) dξ
0
332 Integral Equations

where K(x, ξ ) is defined by the relations


(
ξ (x − 1) when ξ < x,
K(x, ξ ) =
x(ξ − 1) when ξ > x.

(b) Verify directly that the expression obtained satisfies the prescribed differential
equation and end conditions.
Exercise 5.10 Verify the integral equation
 t
8
y(t) = 1 + t − (ξ − t)3 y(ξ )dξ ,
3 0

is a solution of the fourth order differential equation

y(4) (t) − 16y(t) = 0, y(0) = y′ (0) = 1, y′′ (0) = y′′′ (0) = 0.

Exercise 5.11 Reduce the integral equation


 ∞
y(x) = λ e|x−ξ | y(ξ )dξ ,
0

to a differential equation.
Exercise 5.12 Write the second-order nonhomogenous differential equation y′′ (x) =
λ y(x) + g(x), x > 0 that satisfies the initial conditions y(0) = y′ (0) = 0 into an inte-
gral equation.
Exercise 5.13 Show that the second-order boundary value problem

y′′ (x) = λ y(x), y(a) = y(b) = 0, a < x < b

can be written in the form


 b
y(x) = λ K(x, ξ )y(ξ ) dξ
a

where K(x, ξ ) is defined by the relations


(
(x−b)(ξ −a)
b−a when ξ ≤ x ≤ b,
K(x, ξ ) = (x−a)(ξ −b)
b−a when a ≤ x ≤ ξ .

Exercise 5.14 Show that the second-order differential equation

y′′ + A(x)y′ + B(x)y = g(x), y(a) = a1 , y(b) = b1 ,

where A and B are differentiable functions on (a, b) and g is continuous, leads to the
integral equation  a
y(x) = f (x) + K(x, ξ )y(ξ ) dξ ,
b
Connection between Ordinary Differential Equations and Integral Equations 333

where
 x  b
x−a 
f (x) = a1 + (x − ξ )g(ξ )dξ + b1 − a1 − (b − ξ )g(ξ )dξ ,
a b−a a

and  
A(ξ )−(a−ξ )(A′ (ξ )−B(ξ )
 (x−b)
b−a when ξ ≤ x ≤ b,
K(x, ξ ) = 
 (x−a) A(ξ )−(b−ξ )(A′ (ξ )−B(ξ )
b−a when a ≤ x ≤ ξ .
Exercise 5.15 Find the solution of the Volterra integral equation
 x
y(x) = 1 − x − 4 sin(x) + [3 − 2(x − ξ )]y(ξ )dξ ,
0

by transforming it into a nonhomogeneous second-order differential equation. In or-


der to solve for the solution you need to compute y(0) and y′ (0).
Exercise 5.16 [Only if you covered Chapter 4.]
Find the necessary Euler-Lagrange condition for a function y to be a local minimum
of the functional
 b b  b  b
L(y) = C(t, s)y(s)y(t)dsdt + y2 (t)dt − 2 y(t) f (t)dt,
a a a a

where y(a) and y(b) are fixed.


b
Answer: a [C(s,t) + C(t, s)]y(s)ds + 2y(t) = 2 f (t), which is a Fredholm integral
equation.
Exercise 5.17 For constants b and c suppose the continuous function h(x) is a solu-
tion to the differential equation

h′′ (x) + bh′ (x) + ch(x) = 0, h(0) = 0, h′ (0) = 1.

Show that if  x
y(x) = h(x − ξ )g(ξ ) dξ ,
0
then y(x) solves the nonhomogeneous second-order differential equation

y′′ (x) + by′ (x) + cy(x) = g(x), y(0) = y′ (0) = 0.

Exercise 5.18 Find all continuous functions y = y(x) that satisfy the relation
 x  x
t y(t)dt = (t + 1) ty(t)dt.
0 0

Hint: Differentiate twice to obtain a first-order differential equation that can be


solved by separation of variables.
334 Integral Equations

5.3 The Green’s Function


The Poisson’s equation, ∇2 u = f , for the electric potential u defined inside a con-
fined volume with certain boundary conditions on the volume’s surface first appeared
in George Green’s work in 1828, which is when the Green’s function first appeared.
He introduced a function that is now known as the Green’s function, as later defined
by Riemann. We shall develop methods on finding Green’s functions for initial value
problems as well as boundary value problems. In the previous section, we reduced
initial value problems and boundary value problems into Fredholm integral equations
where the kernels K(x, ξ ) are actually the Green’s functions. The Green’s function
play a major role in setting up nonlinear boundary value problems as integral equa-
tions and then some known methods are used to deduce qualitative results regarding
solutions. Before we embark on finding the Green’s function we prove an important
result concerning self-adjoint second-order differential operators.
Consider the differential operator
d dz 
Lλ z := p(x) + [q(x) + λ ρ(x)]z, (5.27)
dx dx
which is associated with the well-knowm Sturm-Liouville problem.
Definition 5.6 The differential operator L given by (5.27) is self-adjoint if there
exists a continuously differentiable function g such that

wLλ z − zLλ w dx = dg, (5.28)
where z and w satisfy Lλ z = Lλ w = 0. In other words, wLλ z − zLλ w dx must be


exact.
We have the following lemma.
Lemma 17 The differential operator L given by (5.27) is self-adjoint.

Proof Let z and w satisfy L z = L w = 0. Then


d dz 
wLλ z − zLλ w = w p(x) + w[q(x) + λ ρ(x)]z
dx dx
d dw 
− z p(x) − z[q(x) + λ ρ(x)]w
dx dx
d dz  d dw 
= w p(x) −z p(x)
dx dx dx dx
= wpz′′ + wp′ z′ − zpw′′ − zp′ w′
= p[wz′′ − zw′′ ] + p′ [wz′ − zw′ ]. (5.29)
On the other-hand, after simple calculation we find that
d
{p(x)[wz′ − zw′ ]} = p[wz′′ − zw′′ ] + p′ [wz′ − zw′ ].
dx
The Green’s Function 335

This implies
d
p′ [wz′ − zw′ ] = {p(x)[wz′ − zw′ ]} − p[wz′′ − zw′′ ].
dx
By substituting the above term into (5.29) we obtain
d
wLλ − zLλ = {p(x)[wz′ − zw′ ]}.
dx
Or,
wLλ z − zLλ w dx = d{p(x)[wz′ − zw′ ]} := dg.

(5.30)
This completes the proof.

We continue by presenting methods for the construction of the Green’s func-


tion.
Consider the second-order differential equation

L y + Φ(x) = 0, (5.31)

where L is the differential operator

d d d2 dp d
L := p(x) + q(x) = p 2 + + q, (5.32)
dx dx dx dx dx
together with homogeneous boundary conditions of the form
dy dy
αy(a) + β (a) = 0, αy(b) + β (b) = 0, (5.33)
dx dx
for some constants α and β . It is assumed that the function p(x) is continuous and
that p(x) ̸= 0 for all x ∈ (a, b). Also p′ (x) and q(x) are continuous on (a, b). The
function Φ(x) may depend on x and y(x); that is

Φ(x) = ϕ(x, y(x)).

Note that the differential operator defined by (5.32) is the same as the one defined by
(5.27) when λ = 0. We attempt to find a Green function, denoted with G(x, ξ ) and
given by (
G1 (x, ξ ) when x < ξ
G(x, ξ ) = (5.34)
G2 (x, ξ ) when x > ξ ,
and satisfies the following four properties:
(i) The functions G1 and G2 satisfy the equation L G = 0; that is L G1 = 0 when
x < ξ and L G2 = 0 when x > ξ .
(ii) The function G satisfies the homogeneous conditions prescribed at the end points
x = a, and x = b; that is G1 satisfies the condition prescribed at x = a, and G2
satisfies the condition prescribed at x = b.
336 Integral Equations

(iii) The function G is continuous at x = ξ ; that is G1 (ξ ) = G2 (ξ ).


1
(iv) The derivative of G has a discontinuity of magnitude − p(ξ )
at the point x = ξ ;
that is
1
G′2 (ξ ) − G′1 (ξ ) = − . (5.35)
p(ξ )
Once we determine the Green’s function of (5.31) and (5.33), then the problem can
be transformed to the relation
 b
y(x) = G(x, ξ )Φ(ξ )dξ . (5.36)
a

Note that if Φ is constant or a function of x but not y(x) then (5.36) can be solved to
obtain the solution. However, if Φ has y(x) then (5.36) is an integral equation of the
form  b
y(x) = G(x, ξ )ϕ(ξ , y(ξ ))dξ ,
a
where y needs to be determined. We begin by determining the Green’s function G.
Let y = u(x) be a nontrivial solution of the homogeneous equation L y = 0 along with
dy
αy(a) + β dx (a) = 0. Similarly, we let y = v(x) be a nontrivial solution of the homo-
dy
geneous equation L y = 0 and αy(b) + β dx (b) = 0. Then (i) and (ii) are satisfied if
we write (
c1 u(x) when x < ξ
G(x, ξ ) = (5.37)
c2 v(x) when x > ξ ,
where c1 and c2 are constants. Condition (iii) yields

c1 u(ξ ) − c2 v(ξ ) = 0. (5.38)

Whereas, (iv) implies


1
c2 v′ (ξ ) − c1 u′ (ξ ) = − . (5.39)
p(ξ )
Equations (5.38) and (5.39) have a unique solution if the determinant

u(ξ ) v(ξ )
W [u(x), v(x)] = ′ = u(ξ )v′ (ξ ) − u′ (ξ )v′ (ξ ) =
̸ 0. (5.40)
u (ξ ) v′ (ξ )

Again, as in Chapter 1, W is referred to as the Wronskian of the solutions u(x) and


v(x) of the equation L y = 0. Since u and v are solutions to L y = 0, we have

(pu′ )′ + qu = 0 and (pv′ )′ + qv = 0.

By multiplying the second equation by u and the first equation by v, and subtracting
the results, there follows
u(pv′ )′ − v(pu′ )′ = 0.
The Green’s Function 337

But by Lemma 17, we have

u(pv′ )′ − v(pu′ )′ = [p(uv′ − vu′ )]′ ,

from which it follows that


[p(uv′ − vu′ )]′ = 0.

This results into


A
uv′ − vu′ =
p
for some constant A. In other words,

A
u(ξ )v′ (ξ ) − v(ξ )u′ (ξ ) = . (5.41)
p(ξ )

Multiplying (5.39) by −A and comparing the resulting expression with (5.41) we


obtain
v(ξ ) u(ξ )
c1 = − and c2 = − .
A A
Therefore, (5.37) takes the form
(
)
− v(ξ
A u(x) when x < ξ
G(x, ξ ) = ) (5.42)
− u(ξ
A v(x) when x > ξ ,

where the constant A is independent of x and ξ , and is uniquely determined by (5.41).


Note that in (5.42) the Green’s function G is symmetric. That is

G(x, ξ ) = G(ξ , x).

It turns out that the Green’s function of a self-adjoint operator is symmetric. Finally
substituting (5.42) into (5.36) the solution can be explicitly found to be
 b
y(x) = G(x, ξ )Φ(ξ )dξ
a
 x b 
u(ξ ) v(ξ )
= − v(x)Φ(ξ )dξ + − u(x)Φ(ξ )dξ
a A x A
 x  b 
1
= − u(ξ )v(x)Φ(ξ )dξ + v(ξ )u(x)Φ(ξ )dξ . (5.43)
A a x

Remark 21 The Green’s function for (5.31) and (5.33) is independent of the function
Φ(x). For example if (5.31) is replaced with

L y = f (x), (5.44)
338 Integral Equations

then the solution for (5.44) along with (5.33) is given by


 b
y(x) = G(x, ξ )(− f (ξ ))dξ , (5.45)
a

where G(x, ξ ) is given by (5.37) or (5.42).


Next we provide two examples for the purpose of better explaining the
method.
Example 5.4 Solve the second-order boundary value problem
L y + g(x) = 0, y(0) = y(l) = 0,
where L y = y′′ , using the method of Green’s function.
Let u(x) and v(x) be solutions of y′′ = 0, y(0) = 0, and y′′ = 0, y(l) = 0, respectively.
Then y′′ = 0 has the solution y(x) = ax + b. Using y(0) = 0, we obtain b = 0. We
may take u(x) = x by setting a = 1. Similarly, applying y(l) = 0 gives 0 = al + b.
This implies that b = −al. Substituting b into y(x) = ax + b gives, y(x) = a(x − l).
Thus, we may take v(x) = x − l. Note that W [u(x), v(x)] = l ̸= 0. Set
(
c1 x when x < ξ
G(x, ξ ) =
c2 (x − l) when x > ξ .

Using (5.41) with p = 1, we arrive at A = l. Hence,


l −ξ ξ
c1 = and c2 = − .
l l
Thus, (
x
G(x, ξ ) = l (l − ξ ) when x < ξ
ξ
l (l − x) when x > ξ ,
and the solution to the problem is
 l
y(x) = G(x, ξ )g(ξ )dξ .
0

If we take g(x) = x, then the solution is


 l
y(x) = G(x, ξ )ξ dξ
0
 x  l
ξ x
= (l − x)ξ dξ + (l − ξ )ξ dξ
0 l x l
1
xl 3 − lx3 .

=
6l

Here is another boundary value problem with homogeneous boundary conditions.
The Green’s Function 339

Example 5.5 In this example, we attempt to find the Green’s function for the second-
order boundary value problem with homogeneous boundary conditions

y′′ (x) = 0, 0 < x < 1, y(0) = 0, y(1) − 3y′ (1) = 0.

Let u(x) = A∗ + Bx and v(x) = C + Dx be solutions of y′′ = 0, y(0) = 0, and y′′ =


0, y(1) − 3y′ (1) = 0, respectively. Using y(0) = 0, we obtain A∗ = 0 and so we
may take u(x) = x by setting B = 1. Similarly, applying y(1) − 3y′ (1) = 0 gives
0 = C + D − 3D. This implies that C = 2D. Substituting C into y(x) = C + Dx gives,
y(x) = D(2 + x). Thus, we may take v(x) = 2 + x. Set
(
c1 x when x < ξ
G(x, ξ ) =
c2 (2 + x) when x > ξ .

Using (5.41) with p = 1, we arrive at A = −2. Hence,

2+ξ ξ
c1 = and c2 = .
2 2
Thus, (
1
G(x, ξ ) = 2 (2 + ξ )x when x < ξ
1
2 ξ (2 + x) when x > ξ ,
Note that G(x, ξ ) = G(ξ , x). □

5.3.1 Exercises
Exercise 5.19 Consider the second-order boundary value problem

y′′ (x) = 0, a < x < b, y(a) = y(b) = 0. (5.46)

(a) Show that (5.46) has the Green’s function


(
(x−a)(ξ −b)
b−a when a ≤ x ≤ ξ ≤ b
G(x, ξ ) = (x−b)(ξ −a)
b−a when a ≤ ξ ≤ x ≤ b.

(b) Show that for all x, ξ ∈ [a, b],

a−b
≤ G(x, ξ ) ≤ 0.
4

(c) Show that for all x, ξ ∈ [a, b],


 b
(b − a)2
|G(x, ξ )|dξ ≤ .
a 8
340 Integral Equations

(d) Show that for all x, ξ ∈ [a, b],


 b
b−a
|G′ (x, ξ )|dξ ≤ ,
a 2

where G′ = d
dx G.
Exercise 5.20 Use the Green’s function to solve the second-order boundary value
problem
y′′ (x) + x2 = 0, 0 < x < 1, y(0) = y(1) = 0.
Exercise 5.21 Use the Green’s function to solve the second-order boundary value
problem

e2x y′′ (x) + 2e2x y′ (x) = e3x , 0 < x < ln(2), y(0) = y(ln(2)) = 0.

Exercise 5.22 Use the Green’s function to solve the second-order boundary value
problem
y′′ (x) + ex = 0, a < x < b, y(a) = 0, y′ (b) = 0.
Exercise 5.23 (a) Show the Green’s function for the second-order boundary value
problem

y′′ (x) + α 2 y(x) = 0, 0 < x < 1, α ̸= 0, y(0) = y(1) = 0

is given by
( sin[α(1−ξ )] sin(αx)
α sin(α) when 0 ≤ x ≤ ξ ≤ 1
G(x, ξ ) = sin[α(1−x)] sin(αξ )
α sin(α) when a ≤ ξ ≤ x ≤ 1.

(b) Use part (a) to solve

y′′ (x) + y(x) = x, y(0) = y(π/2) = 0.

Hint: After simplifying and integrating by parts, the solution is found to be


π
y(x) = x − sin(x).
2
Exercise 5.24 Use the method of the Green’s function to solve

y′′ (x) + x = 0, 0 < x < 1, y(0) = 0, 2y(1) − y′ (1) = 0.

Exercise 5.25 Use the method of the Green’s function to solve

y′′ (x) + x = 0, 0 < x < 1, y(0) − 2y′ (0) = 0, 2y(1) − y′ (1) = 0.

Exercise 5.26 Make use of (5.43) to show that y given by (5.36) satisfies the bound-
ary value problem (5.31) and the boundary conditions given by (5.33).
Fredholm Integral Equations and Green’s Function 341

5.4 Fredholm Integral Equations and Green’s Function


In Section 5.3 we discussed the Green’s function of second-order differential equa-
tions of the form
L y + Φ(x) = 0 (5.47)
where L is be defined by (5.27). Our aim is to use the concept of Section 5.3 to
extend the notion of Green’s function to second-order differential equations of the
form
L y + λ ρ(x)y(x) = f (x), (5.48)
where L is given by (5.27). If we set

Φ(x) = λ ρ(x)y(x) − f (x)

then by the result of Section 5.3, the solution to the boundary value problem (5.47)
subject to the boundary conditions given by (5.33) is given by
 b  b
y(x) = λ G(x, ξ )ρ(ξ )y(ξ )dξ − G(x, ξ ) f (ξ )dξ , (5.49)
a a

where G is the Green’s function of L y = 0. If we let


 b
F(x) = − G(x, ξ ) f (ξ )dξ , (5.50)
a

then (5.49) becomes


 b
y(x) = F(x) + λ G(x, ξ )ρ(ξ )y(ξ )dξ , (5.51)
a

which is Fredholm integral equation with kernel

K(x, ξ ) = G(x, ξ )ρ(ξ ).

Next we try to put the Fredholm integral equation given by (5.51) in symmetric form
provided
p ρ(x) > 0 for x ∈ (a, b). Assume so and multiply both sides of (5.51) by
ρ(x) and arrive at

p p  bp p
ρ(x)y(x) = ρ(x)F(x) + λ ρ(x)ρ(ξ )G(x, ξ ) ρ(ξ )y(ξ )dξ .
a
p p
Letting z(x) = ρ(x)y(x) and g(x) = ρ(x)F(x), the preceding integral equation
reduces to the symmetric Fredhom integral equation
342 Integral Equations
 bp
z(x) = g(x) + λ ρ(x)ρ(ξ )G(x, ξ )z(ξ )dξ
a
 b
= g(x) + λ K(x, ξ )z(ξ )dξ . (5.52)
a

Note that p
K(x, ξ ) = ρ(x)ρ(ξ )G(x, ξ ) = K(ξ , x)
since G is symmetric.
Example 5.6 Let Ly be given by Example 5.4 and we want to reduce the boundary
value problem
L y + λ y = x, y(0) = y(l) = 0,
to a Fredholm integral equation. From the above discussion we have ρ(x) = 1, and
hence  l
g(x) = − G(x, ξ )ξ dξ ,
0
where from Example 5.4 we have
(
x
G(x, ξ ) = l (l − ξ ) when x < ξ
ξ
l (l − x) when x > ξ .

Thus,
 l
g(x) = − G(x, ξ )ξ dξ
0
h ξ x  l
x i
= − (l − x)ξ dξ + (l − ξ )ξ dξ
0 l x l
x  2 2
= x −l .
6
Hence, by (5.52) the solution is given by the Fredholm integral equation
 l
x  2 2
y(x) = x −l +λ G(x, ξ )y(ξ )dξ ,
6 0

since ρ(x) = 1. □

5.4.1 Exercises
Exercise 5.27 Reduce the boundary value problem

y′′ (x) − λ y = x2 , y(0) = y(l) = 0,

to a Fredholm integral equation.


Fredholm Integral Equations and Green’s Function 343

Exercise 5.28 Transform the boundary value problem

y′′ (x) + λ xy = 1, y(0) = y(l) = 0,

to the integral equation


 l
x x
y(x) = − (l − x) + λ G(x, ξ )ξ y(ξ )dξ ,
l 2 0

where G is given in Example 5.6.


Hint: take Φ(x) = λ xy(x) − 1.
Exercise 5.29 Reduce the boundary value problem

y′′ (x) + λ y(x) = x, 0 < x < 1, y(0) − 2y′ (0) = 0, 2y(1) − y′ (1) = 0

to a Fredholm integral equation.


Exercise 5.30 Let
d dy  1
Ly= x − y
dx dx x
(a) Show the Green’s function for

L y = 0, y(0) = 0, y(1) = 0

is (
x

(1 − ξ 2 ) when x < ξ
G(x, ξ ) = ξ 2
2x (1 − x ) when x > ξ .

(b) Find the integral solution of

L y + λ xy = 0, y(0) = 0, y(1) = 0.

5.4.2 Beam problem


In this section, we briefly discuss the oscillation of thin cantilever beams. Remem-
ber, a cantilever beam is a rigid structural element supported at one end and free at
the other. Cantilever construction allows overhanging structures without additional
supports and bracing. This structural element is widely used in the construction of
bridges, towers, and buildings and can add a unique beauty to the structure. Assume
we have a long balcony that is supported by an underneath beam that extends outward
from the wall. Let the origin be the point at which the beam is supported by the wall.
Let the x-axis be parallel to the undeflected beam with the z-axis upward. Assume
L is the length of the beam having uniform cross section under a load. Let w(x,t)
represent the distributed load or force acting on the beam in the negative z-direction.
Let u(x,t) designate the deflection of the beam from its equilibrium position (see
Fig. 5.2). Assume e is the modulus of elasticity of the beam’s material and I(x) is the
344 Integral Equations

w(x,t)
L
u(0,t) = 0 x
ux (0,t) = 0 u(x,t)

FIGURE 5.2
A thin beam undergoing a small deflections u(x,t) from equilibrium.

moment of inertia of the beam’s cross-sectional area about a point x. If M(x,t) is the
total bending moment produced by all the forces acting on the beam at point x, then
the differential equation of the elastic curve of a beam is found to be

∂ 2 u(x,t)
eI(x) = M(x,t). (5.53)
∂ x2
The bending moment is related to the applied load by the second-order partial differ-
ential equation
∂ 2 M(x,t)
= −w(x,t). (5.54)
∂ x2
Let’s decompose the applied load w(x,t) into an external applied component F(x,t)
2
and an internal inertia component ρ ∂ ∂u(x,t)
x2
, where ρ = ρ(x) is the linear mass density
of the beam at the point x. Thus, if we let

∂ 2 u(x,t)
w(x,t) = ρ + F(x,t),
∂ x2
then (5.53) and (5.54) yield

∂2 ∂ 2 u(x,t) ∂ 2 u(x,t)
 
2
eI(x) 2
+ ρ(x) = −F(x,t). (5.55)
∂x ∂x ∂ x2

Since the beam is fixed at the end point x = 0, while at the end at x = L is free, we
have the appropriate initial and boundary conditions

∂ 2 u(L,t)
u(0,t) = 0, = 0,
∂ x2
∂ u(0,t) ∂  ∂ 2 u 
= 0, eI 2 x=L = 0, (5.56)
∂x ∂x ∂x
∂ u(x, 0)
u(x, 0) = g(x), = h(x).
∂t
Fredholm Integral Equations and Green’s Function 345

If we assume harmonic oscillations in the sense that

F(x,t) = f (x) sin(ωt + θ ) and u(x,t) = y(x) sin(ωt + θ ),

then (5.55) can be easily reduced to the fourth order ordinary differential equa-
tion ′′
eIy′′ − ρω 2 y = − f . (5.57)
Consequently, the first four initial and boundary conditions given by (5.56) are re-
duced to

y(0) = 0, y′′ (L) = 0,


′
y′ (0) = 0, eIy′′ = 0.
x=L
(5.58)

If G is the Green’s function of ′′


eIy′′ = 0, (5.59)
then the solution of (5.57) and (5.58) is given by the integral relation
 L
G(x, ξ ) − ω 2 ρ(ξ )y(ξ ) + f (ξ ) dξ .
 
y(x) =
0

Next, we briefly discuss how to compute the Green’s function for the fourth-order
ordinary differential operator. Consider the fourth-order differential equation

L y + Φ(x) = 0, (5.60)

where L is the differential operator

d2 d2y 
(L y)(x) := p(x) + q(x)y(x) = 0, x ∈ [a, b] (5.61)
dx2 dx2
together with homogeneous boundary conditions of the form

R1 y := α1 y(a) + α2 y′ (a) + α3 p(a)y′′ (a) + α4 (py′′ )′ (a) = 0


R2 y := β1 y(a) + β2 y′ (a) + β3 p(a)y′′ (a) + β4 (py′′ )′ (a) = 0
R3 y := γ1 y(b) + γ2 y′ (b) + γ3 p(b)y′′ (b) + γ4 (py′′ )′ (b) = 0 (5.62)
R4 y := η1 y(b) + η2 y′ (b) + η3 p(b)y′′ (b) + η4 (py′′ )′ (b) = 0

for some constants αi , βi , γi , and ηi , i = 1, 2, 3, 4 are real constants. It is assumed that


the function p ∈ C2 [a, b] with p(x) > 0 on [a, b] and q and is continuous on [a, b]. We
proceed as before in finding the Green’s function for (5.61) and (5.62).
We attempt to find a Green function, denoted with G(x, ξ ) and given by
(
G1 (x, ξ ) when x < ξ
G(x, ξ ) = (5.63)
G2 (x, ξ ) when x > ξ ,

and satisfies the following four properties:


346 Integral Equations

(i) The functions G1 and G2 satisfy the equation L G = 0; that is L G1 = 0 when


x < ξ and L G2 = 0 when x > ξ .
(ii) The function G satisfies the homogeneous conditions prescribed at the end points
x = a, and x = b; that is G1 satisfies R1 y, R2 y and G2 satisfies satisfies R3 y, R4 y.
(iii) The function G is continuous at x = ξ ; that is

G1 (ξ ) = G2 (ξ ),
d d
G1 (ξ ) = G2 (ξ ),
dx dx
d2 d2
G1 (ξ ) = G2 (ξ ).
dx2 dx2
1
(iv) The third derivative of G has a discontinuity of magnitude − p(ξ )
at the point
x = ξ ; that is

d3 d3 1
3
G2 (ξ ) − 3
G1 (ξ ) = − . (5.64)
dx dx p(ξ )
Once we determine the Green’s function of (5.61) and (5.62), then the problem can
be transformed to the relation
 b
y(x) = G(x, ξ )Φ(ξ )dξ . (5.65)
a

Our interest is to use Green’s function to solve the beam problem when the inertia
I(x) is constant subject to the initial and boundary conditions given by (5.58). Thus,
we consider the boundary value problem given by (5.59) and the initial and boundary
conditions given by (5.58). Using (5.63) we obtain from
′′
eIy′′ = 0

that
(
1 A1 + A2 x + A3 x2 + A4 x3 when x < ξ
G(x, ξ ) = (5.66)
eI B1 + B2 (x − L) + B3 (x − L)2 + B4 (x − L)3 when x > ξ .

Applying y(0) = 0, y′ (0) = 0 we readily have A1 = A2 = 0. Similarly, applying the


′
boundary conditions y′′ (L) = 0 and eIy′′ |x=L = 0, leads to B3 = B4 = 0. Thus, so
far we have (
1 A3 x2 + A4 x3 when x < ξ
G(x, ξ ) = (5.67)
eI B1 + B2 (x − L) when x > ξ .

The jump condition given by (iv) yields G′′′ ′′′ 1


2 − G1 = − eI or

1 1
0− 6A4 = − ,
eI eI
Fredholm Integral Equations and Green’s Function 347

which implies that A4 = 16 . Next we apply the continuity condition given by (iii) and
obtain

A3 ξ 2 + A4 ξ 3 = B1 + B2 (ξ − L),
2A3 ξ + 3A4 ξ 2 = B2 ,
2A3 + 3A4 ξ = 0.

From the third equation we obtain A3 = − 12 ξ . On the other hand, the second equation
yields B2 = − 12 ξ 2 . Thereupon, from the first equation we arrive at

ξ2
B1 = (ξ − L).
2
The second part of the Green’s function B1 + B2 (x − L) reduces to

ξ2 ξ ξ2 ξ2 ξ
B1 + B2 (x − L) = ( − L) − (x − L) = ( − x).
2 3 2 2 3
Finally, the Green’s function takes the form
( 2
1 x2 3x − ξ

when x < ξ
G(x, ξ ) = 2 (5.68)
eI ξ ξ − x

2 3 when x > ξ .

Furthermore, the solution of the beam problem for constant inertia I takes the
form
 L
G(x, ξ ) − ω 2 ρ(ξ )y(ξ ) + f (ξ ) dξ
 
y(x) =
0
 x
1 ξ2 ξ
− x − ω 2 ρ(ξ )y(ξ ) + f (ξ ) dξ
 
=
eI 0 2 3
 L 3
1 x x2 ξ 
− ω 2 ρ(ξ )y(ξ ) + f (ξ ) dξ .

+ −
eI x 6 2

5.4.3 Exercises
Exercise 5.31 Find the Green’s function for
′′
eIy′′ = 0,

subject to the following initial and boundary conditions:


(a)

y(0) = 0, y(L) = 0,
y′ (0) = 0, y′ (L) = 0.
348 Integral Equations

(b)

y(0) = 0, y′ (L) = 0,
y′ (0) = 0, ′′′
y (L) = 0.

(c)

y(0) = 0, y(L) = 0,
y′′ (0) = 0, y′′ (L) = 0.

5.5 Fredholm Integral Equations with Separable Kernels


The ability to solve integral equations, is in most cases, depend on the type of the
kernels. In this section we investigate the Fredholm integral equation of the second
kind  b
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , (5.69)
a
where all functions are continuous on their respective domains. We begin with the
following definition.
Definition 5.7 The kernel K is said to be separable or degenerate if it can be written
in the form
n
K(x, ξ ) = ∑ αi (x)βi (ξ ). (5.70)
i=1

Throughout this section we assume K is separable. For example, the kernel K(x, ξ ) =
3 + 2xξ is separable since it can be written in the form K(x, ξ ) = ∑2i=1 αi (x)βi (ξ ),
where α1 (x) = 3, β1 (ξ ) = 1, α2 (x) = 2x, and β2 (ξ ) = ξ . Note that αi (x) and βi (ξ ) are
not unique. If we substitute (5.70) into (5.69) we arrive at the new expression
n  b
y(x) = f (x) + λ ∑ {βi (ξ )y(ξ )dξ }αi (x). (5.71)
i=1 a

Letting
 b
ci = βi (ξ )y(ξ )dξ , (5.72)
a
equation (5.71) simplifies to
n
y(x) = f (x) + λ ∑ ci αi (x). (5.73)
i=1
Fredholm Integral Equations with Separable Kernels 349

Note that the ci given by (5.72) are unknown constants. Once they are determined the
solution is given by (5.73). Multiplying (5.73) by β j (x) and integrating the resulting
expression with respect to x from a to b gives
 b  b n  b
β j (x)y(x)dx = β j (x) f (x)dx + λ ∑ ci β j (x)αi (x)dx, j = 1, 2, . . . n.
a a i=1 a
(5.74)
Interchanging i with j expression (5.74) can be written as
n
ci = fi + λ ∑ c j ai j , i = 1, 2, . . . n (5.75)
j=1

where ci is given by (5.72) and


 b  b
fi = βi (x) f (x)dx and ai j = βi (x)α j (x)dx. (5.76)
a a

In matrix form, equation (5.75) takes the form

(I − λ A)c = f (5.77)

where I is the identity matrix,

A = (ai j ), c = (c1 , c2 , . . . , cn )T , f = ( f1 , f2 , . . . , fn )T ,

where T denotes the transpose. Thus, (5.77) represents a system of n linear algebraic
equations for c. Before we embark on few examples, we recall some basic facts from
Chapter 3 about linear systems. Consider the linear system

Bx = b (5.78)

where B is an n × n matrix, b is a given n vector, and x is the unknown vector.


Lemma 18 Suppose b = 0. Then
̸ 0, implies (5.78) has only the trivial solution x = 0.
(a) det(B) =
(b) det(B) = 0, implies (5.78) has infinitely many solutions. Suppose b ̸= 0. Then
̸ 0, implies (5.78) has a unique solution.
(c) det(B) =
(d) det(B) = 0, implies (5.78) has no solution or infinitely many solutions.

Based on Lemma 18 we have the following Fredholm theorem.


Theorem 5.3 (Fredholm Theorem) Consider the Fredholm integral equation
(5.69) with a separable kernel K.
b
(i) Assume a βi (x) f (x)dx, j = 1, . . . n are not all zero.
(a) If det(I − λ A) =
̸ 0, then there exists a unique solution to (5.69) given by
(5.73), where c = (c1 , c2 , . . . , cn )T is the unique solution of (5.77).
350 Integral Equations

(b) If det(I −λ A) = 0, then either no solution exists or infinitely many solutions


exist.
b
(ii) Assume a βi (x) f (x)dx = 0, j = 1, . . . n.
(a) If det(I − λ A) ̸= 0, then (5.69) has the solution y = f (x).
(b) If det(I − λ A) = 0, then (5.69) has infinitely many solutions.

We provide two examples to illustrate the method of Fredholm equations with sepa-
rable kernels.
Example 5.7 Consider the homogeneous Fredholm equation
 1
y(x) = λ (4xξ − 5x2 ξ 2 )y(ξ )dξ . (5.79)
0

So we have
2
K(x, ξ ) = 4xξ − 5x2 ξ 2 = ∑ αi (x)β j (ξ ).
i=1

We may choose

α1 (x) = x, α2 (x) = x2 , β1 (ξ ) = 4ξ , β2 (ξ ) = −5ξ 2 .

Next we use  1
ai j = βi (x)α j (x)dx, i, j = 1, 2
0
to compute the matrix A = (ai j ).
 1  1
4
a11 = β1 (x)α1 (x)dx = 4x2 dx = ,
0 0 3
 1  1
a12 = β1 (x)α2 (x)dx = 4x3 dx = 1,
0 0
 1  1
5
a21 = β2 (x)α1 (x)dx = − 5x3 dx = − ,
0 0 4
 1  1
a22 = β2 (x)α2 (x)dx = − 5x4 dx = −1.
0 0
Hence we have matrix
4
 
3 1
A=
− 54 −1
and
1 − 4 λ

−λ 4 5
det(I − λ A) = 5 3 = (1 − λ )(1 + λ ) + λ 2 .
4λ 1+λ 3 4
Fredholm Integral Equations with Separable Kernels 351

If det(I − λ A) = 0, then we have the simplified quadratic equation

λ 2 + 4λ − 12 = 0

that has the two roots


λ = −6, 2.
• If det(I − λ A) ̸= 0; that is λ ̸= −6, 2, then by (b) of (ii) of Theorem 5.3 equation
(5.79) has only the trivial solution y(x) = 0, since f (x) = 0.
• Now we consider the case det(I − λ A) = 0; that is λ = −6, 2. Then by (ii) of
Theorem 5.3 equation (5.79) has infinitely many solutions which depend on the
value of λ . Now we determine the forms of solutions by computing the vector c
from the equation
(I − λ A)c = 0,
 
c
where c = 1 . The corresponding system of equations is
c2

4
(1 − λ )c1 − λ c2 = 0
3
5
λ c1 + (1 + λ )c2 = 0. (5.80)
4
Setting λ = −6 in (5.80) we arrive at 3c1 + 2c2 = 0, from either equation. Setting
c1 = a, implies that c2 = − 32 a, for nonzero constant a. Thus, using (5.73) with f = 0
we arrive at the infinitely many solutions
2 
y(x) = 0 + λ ∑ ci αi (x) = −6 c1 α1 (x) + c2 α2 (x)
i=1
3  3
= −6 ax − ax2 = −6a(x − x2 ).
2 2
In a similar manner if we substitute λ = 2 into (5.80) we arrive at 5c1 + 6c2 = 0,
from either equation. Setting c1 = b, implies that c2 = − 56 b, for nonzero constant b.
Thus, using (5.73) with f = 0 we arrive at the infinitely many solutions
2 
y(x) = 0 + λ ∑ ci αi (x) = 2 c1 α1 (x) + c2 α2 (x)
i=1
5  5
= 2 bx − bx2 = 2b(x − x2 ).
6 6

The next example illustrates the techniques for dealing with nonhomogeneous Fred-
holm integral equations.
352 Integral Equations

Example 5.8 Consider the nonhomogeneous Fredholm equation


 1
y(x) = f (x) + λ (4xξ − 5x2 ξ 2 )y(ξ )dξ . (5.81)
0

Notice that the kernel and hence the matrix A and the values of λ are the same as in
Example 5.7. We begin by addressing (i) of Theorem 5.3.
1
• If fi = 0 βi (x) f (x)dx, i = 1, 2 are not all zero; that is
 1  1
f1 = 4x f (x)dx ̸= 0, or f2 = − 5x2 f (x)dx ̸= 0,
0 0

and λ ̸= −6, 2, then (5.81) has a unique solution


2
y(x) = f (x) + λ ∑ ci αi (x) = f (x) + λ (c1 α1 (x) + c2 α2 (x)),
i=1

where c1 and c2 is the unique solution of the system


 1
4
(1 − λ )c1 − λ c2 = 4x f (x)dx
3 0
 1
5
λ c1 + (1 + λ )c2 = − 5x2 f (x)dx.
4 0

• If  
1 1
f1 = 4x f (x)dx ̸= 0, or f2 = − 5x2 f (x)dx ̸= 0, (5.82)
0 0
and λ = −6, then using (5.77) we arrive at the system
 1
9c1 + 6c2 = 4x f (x)dx
0
 1
15
c1 − 5c2 = − 5x2 f (x)dx.
2 0

Multiplying the second equation by 2 and then simplifying the resulting system we
arrive at the new system of equations
 1
1
3c1 + 2c2 = 4x f (x)dx
3 0
 1
−3c1 − 2c2 = −2 x2 f (x)dx.
0

Adding both equations leads to


 1  1
1
0= 4x f (x)dx − 2 x2 f (x)dx.
3 0 0
Fredholm Integral Equations with Separable Kernels 353

Thus, (5.81) has no solution if


 1  1
1
4x f (x)dx ̸= 2 x2 f (x)dx.
3 0 0

On the other hand, (5.81) has infinitely many solutions when


 1  1
1
4x f (x)dx = 2 x2 f (x)dx.
3 0 0

In such case to determine the solutions we let 3c1 = a and obtain 2c2 = −a +
1
2 0 x2 f (x)dx. This gives the solutions

2
y(x) = f (x) − 6 ∑ ci αi (x)
i=1
  1 
a −a
x2 f (x)dx x2

= f (x) − 6 x+ +
3 2 0

As for λ = 2 and assuming (5.82), we have the system


 1
5
− c1 − 2c2 = 4x f (x)dx
3 0
 1
5
c1 + 3c2 = − 5x2 f (x)dx.
2 0

By simplifying the above system we get


 1
−5c1 − 6c2 = 3 4x f (x)dx
0
 1
5c1 + 6c2 = −10 x2 f (x)dx.
0

Adding both equations leads to


 1  1
0 = 12 x f (x)dx − 10 x2 f (x)dx.
0 0

Thus, (5.81) has no solution if


 1  1
12 x f (x)dx ̸= 10 x2 f (x)dx.
0 0

On the other hand,(5.81) has infinitely many solutions when


 1  1
12 4x f (x)dx = 10 x2 f (x)dx.
0 0
354 Integral Equations

In such case to determine the solutions we let 5c1 = a and obtain 6c2 = −a −
1
10 0 x2 f (x)dx. This gives the solutions

2
y(x) = f (x) + 2 ∑ ci αi (x)
i=1

−a 5 1 2
 
a  2
= f (x) + 2 x + − x f (x)dx x .
5 6 3 0

Now we consider (ii) of Fredholm Theorem.


• Suppose
 1  1
4x f (x)dx = − 5x2 f (x)dx = 0. (5.83)
0 0
and λ ̸= −6, 2, then (5.81) has the unique solution

y(x) = f (x).

• Assume (5.83) with λ = −6. Setting λ = −6 in (5.80) we arrive at 3c1 + 2c2 = 0,


from either equation. Setting c1 = a, implies that c2 = − 32 a, for nonzero constant
a. Thus, using (5.73) we arrive at the infinitely many solutions
2 
y(x) = f (x) + λ ∑ ci αi (x) = −6 c1 α1 (x) + c2 α2 (x)
i=1
3  3 
= f (x) − 6 ax − ax2 = f (x) − 6a x − x2 .
2 2
In a similar manner if we substitute λ = 2 into (5.80) we arrive at 5c1 + 6c2 = 0,
from either equation. Setting c1 = b, implies that c2 = − 56 b, for nonzero constant
b. Thus, using (5.73) we arrive at the infinitely many solutions
2 
y(x) = f (x) + λ ∑ ci αi (x) = 2 c1 α1 (x) + c2 α2 (x)
i=1
5  5 
= f (x) + 2 bx − bx2 = f (x) + 2b x − x2 .
6 6

5.5.1 Exercises
Exercise 5.32 Solve the Fredholm equation
 1
y(x) = x2 + λ xξ y(ξ )dξ , for λ = −1.
0
Fredholm Integral Equations with Separable Kernels 355

Exercise 5.33 Consider the homogeneous Fredholm equation


 1
y(x) = λ (xξ 2 + x2 ξ )y(ξ )dξ .
0

(a) Find the matrix A and show the roots of the equation
det(I − λ A) = 0
are √ √
4 15 4 15
λ=√ , √ .
15 − 4 15 + 4
(b) Use (a) and discuss the solutions of the Fredholm equation.
(c) Find the solution for nonhomogeneous Fredholm equation
 1
y(x) = x + λ (xξ 2 + x2 ξ )y(ξ )dξ .
0

Exercise 5.34 Find all values of λ so that the nonhomogeneous Fredholm equation
 1
x
y(x) = e + λ xξ y(ξ )dξ
0

has a solution and find it.


Exercise 5.35 In light of Example 5.7 discuss the solutions of the homogeneous
Fredholm equation
 1
y(x) = λ (x + ξ )y(ξ )dξ .
−1
Exercise 5.36 (a) Show that the characteristic values of λ for the equation
 2π
y(x) = λ sin(x + ξ ) y(ξ ) dξ
0

are λ1 = 1/π and λ2 = −1/π, with corresponding characteristic functions of the


form y1 (x) = sin(x) + cos(x) and y2 (x) = sin(x) − cos(x).
(b) Obtain the most general solution of the equation
 2π
y(x) = λ sin(x + ξ ) y(ξ ) dξ + F(x)
0

when F(x) = x and when F(x) = 1, under the assumption that λ ̸= ±1/π.
(c) Prove that the equation
 2π
1
y(x) = sin(x + ξ ) y(ξ ) dξ + F(x)
π 0

possesses no solution when F(x) = x, but that it possesses infinitely many solu-
tions when F(x) = 1. Determine all such solutions.
356 Integral Equations

(d) Determine the most general form of the prescribed F(x), for which the integral
equation
 2π
sin(x + ξ ) y(ξ ) dξ = F(x),
0
of the first kind, possesses a solution.
Exercise 5.37 In light of Example 5.8, discuss the solutions of the nonhomogeneous
Fredholm equation
 1
y(x) = F(x) + λ (1 − 3xξ )y(ξ )dξ .
0

Exercise 5.38 In light of Example 5.8, discuss the solutions of the nonhomogeneous
Fredholm equation
 1
y(x) = F(x) + λ (x + ξ )y(ξ )dξ .
−1
In addition, find an example of F(x) that satisfies all the relevant condition(s) that
you obtain in studying the solutions.
Exercise 5.39 Solve
 1
y(x) = 1 + λ (x + 3x2 ξ )y(ξ )dξ .
0

Exercise 5.40 Solve


 1
y(x) = 1 + λ (18x + 4x2 ξ )y(ξ )dξ .
0

Exercise 5.41 Solve


 1
y(x) = 1 + (1 + ξ + 3xξ )y(ξ )dξ .
−1

5.6 Symmetric Kernel


In Section 5.5 we looked at Fredholm integral equations with degenerate or separable
kernels. Now, we consider the Fredholm integral equation
 b
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , (5.84)
a

where all functions are continuous on their respective domains. We assume the kernel
K in (5.84) is symmetric. That is

K(x, ξ ) = K(ξ , x).


Symmetric Kernel 357

Throughout this section it is assumed that the kernel K is symmetric. As in the case
of nonhomogeneous differential equations, first we learn how to find the solution of
the homogeneous integral equation
 b
y(x) = λ K(x, ξ )y(ξ )dξ , (5.85)
a
and then utilize it to find the general solution of (5.84).
Recall that, If λ and y(x) satisfy (5.85) we say λ is an eigenvalue and y(x) is the
corresponding eigenfunction. It should cause no confusion between λn being all the
eigenvalues of (5.85) and the value of λ for (5.84). In most cases, we will require
λ ̸= λn . We have the following theorem regarding eigenvalues and corresponding
eigenfunctions of (5.85).
Theorem 5.4 Assume the kernel of the homogeneous integral equation (5.85) is sym-
metric. Then the following statements hold.
(i) If λm and λn are two distinct eigenvalues, then the corresponding eigenfunctions
ym (x) and yn (x) are orthogonal on the interval [a, b]. That is
 b
ym (x)yn (x)dx = 0, for m ̸= n.
a

(ii) The eigenvalues are real.

Proof Let λm and λn be two distinct eigenvalues with corresponding eigenfunctions


ym (x) and yn (x), respectively. Then we have
 b
ym (x) = λm K(x, ξ )ym (ξ )dξ ,
a
and  b
yn (x) = λn K(x, ξ )yn (ξ )dξ .
a
b
Multiplying yn (x) = λn a K(x, ξ )yn (ξ )dξ by ym (x) and integrating the resulting
equation from a to b we get
 b  b  b
ym (x)yn (x)dx = λn ym (x) K(x, ξ )yn (ξ )dξ dx
a a a
 b  b 
= λn ym (x)K(x, ξ )dx yn (ξ )dξ
a a
 b  b 
= λn K(ξ , x)ym (x)dx yn (ξ )dξ (since K(x, ξ ) = K(ξ , x))
a a
 b
1 
= λn ym (ξ ) yn (ξ )dξ
a λm
 b
λn
= ym (ξ )yn (ξ )dξ .
λm a
358 Integral Equations

This gives the relation


 b
λn 
1− ym (x)yn (x)dx = 0. (5.86)
λm a

Since λn ̸= λm , then we must have from expression (5.86) that


 b
ym (x)yn (x)dx = 0.
a

This completes the proof of (i).


Next we prove (ii). Assume one of the eigenvalues, say λ is complex. That is λ =
α + iβ , where α and β are real numbers. Let y(x) be the corresponding eigenfunc-
tion that might be complex. The conjugate of λ is λ̄ = α − iβ . If y(x) = u(x) + iv(x),
then ȳ(x) = u(x) − iv(x) is the corresponding eigenfunction of λ̄ . Using a similar ar-
gument as in the proof of (i) and interchanging λm and λn with λ̄ and λ , respectively
we arrive at from (5.86) that
 b
λ̄ 
1− y(x)ȳ(x)dx = 0.
λ a

Since
y(x)ȳ(x) = u2 (x) + v2 (x) > 0,
the above expression takes the form
 b
λ̄ 
1− (u2 (x) + v2 (x))dx > 0,
λ a

which is not identically zero. Thus, our assumption that β ̸= 0, has led to a contra-
diction. Therefore, we must conclude that β = 0, and hence λ is real. This completes
the proof.
Example 5.9 See Example 5.3. □
Next, we develop the solution for the nonhomogeneous integral equation
(5.84).
We begin with Hilbert-Schmidt theorem.
Theorem 5.5 (Hilbert-Schmidt Theorem) Assume that there is a continuous func-
tion g for which
 b
F(x) = K(x, ξ )g(ξ )dξ .
a
Then F(x) can be expressed as

F(x) = ∑ cn yn (x),
n=1
Symmetric Kernel 359

where yn (x) are the normalized eigenfunctions of (5.85) and


 b
cn = F(x)yn (x)dx. (5.87)
a

As a result of Theorem 5.5, we may say the function F is generated by the continuous
function g.
Theorem 5.6 Let y(x) be a solution to (5.84) where λ is not an eigenvalue of (5.85).
Then

fn
y(x) = f (x) + λ ∑ yn (x), (5.88)
n=1 λn − λ

where  b
fn = f (x)yn (x)dx, (5.89)
a
and the λn and yn are the eigenvalues and normalized eigenfunctions of (5.85).

Proof From (5.84), we have


 b
y(x) − f (x) = K(x, ξ )(λ y(ξ ))dξ
a

and hence y − f is generated by the continuous function λ y. Thus by Theorem 5.5,


the function y − f can be expressed by

y− f = ∑ cn yn (x), (5.90)
n=1

where yn (x) are the normalized eigenfunctions of (5.85) and


 b  b
cn = (y(x) − f (x))yn (x)dx = y(x)yn (x)dx − fn , (5.91)
a a

with  b
fn = f (x)yn (x)dx.
a
Next we multiply (5.84) by yn (x) and integrate from a to b
 b  b  b
y(x)yn (x)dx = fn + λ ( K(x, ξ )y(ξ )dξ )yn (x)dx
a a a
 b  b
= fn + λ ( K(ξ , x)yn (x)dx)y(ξ )dξ (since K(x, ξ ) = K(ξ , x))
a a
 b
λ
= fn + yn (ξ )y(ξ )dξ .
λn a
360 Integral Equations
b
After replacing ξ with x in the right side and solving for a y(x)yn (x)dx we arrive
at  b
fn λn f n
y(x)yn (x)dx = = .
n −λ
λ
a 1 − λn
λ
Utilizing (5.91) we obtan
λn f n λ fn
cn = − fn = .
λn − λ λn − λ
Finally, using (5.90) we arrive at the solution

fn
y(x) = f (x) + λ ∑ λn − λ yn (x).
n=1

This completes the proof.


Example 5.10 Consider the Fredholm integral equation
 1
y(x) = x + λ K(x, ξ )y(ξ )dξ , (5.92)
0

where K(x, ξ ) is defined by the relations


(
ξ (1 − x) when ξ ≤ x ≤ 1,
K(x, ξ ) =
x(1 − ξ ) when 0 ≤ x ≤ ξ .

The kernel K is symmetric and moreover, using Lemma 16 on the homogeneous


integral equation
 1
y(x) = λ K(x, ξ )y(ξ )dξ ,
0
we arrive at the second-order boundary value problem

y′′ (x) + λ y(x) = 0, 0 < x < 1,

y(0) = 0, y(1) = 0.
See Example 5.3. This boundary value problem is a Sturm-Liouville problem. From
Example 5.3 we have the eigenvalues λn = n2 π 2 , n = 1, 2, . . . with corresponding
eigenfunctions
yn (x) = sin(nπx), n = 1, 2, . . .
Then the normalized eigenfunctions are

yn (x) = 2 sin(nπx), n = 1, 2, . . . .

Moreover,
  √
1 1 √ (−1)n+1 2
fn = f (x)yn (x)dx = x 2 sin(nπx)dx = , n = 1, 2, . . . .
0 0 nπ
Symmetric Kernel 361

Using (5.88), we have the solution to the nonhomogeneous integral equation

2λ ∞
(−1)n+1 sin(nπx)
y(x) = x + ∑ , λ ̸= n2 π 2 , n = 1, 2, . . . .
π n=1 n(n2 π 2 − λ )


Example 5.11 Consider the Fredholm integral equation
 1
y(x) = (x + 1)2 + λ (xξ + x2 ξ 2 )y(ξ )dξ . (5.93)
−1

It is clear that the kernel K(x, ξ ) = xξ + x2 ξ 2 = K(ξ , x). To apply Theorem 5.6, we
first need to find the eigenvalues and corresponding normalized eigenfunctions of
 1
y(x) = λ (xξ + x2 ξ 2 )y(ξ )dξ . (5.94)
−1

We write (5.94) in the following manner,


 1  1
y(x) = λ x ξ y(ξ )dξ + λ x2 ξ 2 y(ξ )dξ
−1 −1
= λ xC1 + λ x2C2 , (5.95)

where  
1 1
C1 = ξ y(ξ )dξ , C2 = ξ 2 y(ξ )dξ .
−1 −1
From (5.95), we see that
y(ξ ) = λ ξC1 + λ ξ 2C2 . (5.96)
Substituting y(ξ ) given by (5.96) into C1 and C2 gives
 1
2
ξ λ ξC1 + λ ξ 2C2 dξ = λC1 + 0C2 ,

C1 =
−1 3
and  1
2
ξ 2 λ ξC1 + λ ξ 2C2 dξ = 0C1 + λC2 .

C2 =
−1 5
Thus, we have the system of equations
2
(1 − λ )C1 + 0C2 = 0
3
2
0C1 + (1 − λ )C2 = 0.
5
For nontrivial values of C1 and C2 we must have
1 − 2 λ

3 0
= 0.
0 1 − 25 λ
362 Integral Equations

This gives the eigenvalues λ1 = 32 , λ2 = 52 . If λ1 = 32 , then the above system is


reduced to
0C1 + 0C2 = 0
2
0C1 + C2 = 0.
5
Hence, C2 = 0, and C1 is arbitrary. Thus, our first eigenfunction is y1 (x) = λ xC1 +
λ x2C2 = 32 C1 x = x, by choosing 32 C1 = 1. Similarly, when λ2 = 52 , the above system
reduces to
2
− C1 + 0C2 = 0
3
0C1 + 0C2 = 0
and we get C1 = 0 and C2 is arbitrary. The corresponding eigenfunction is y2 (x) = x2 ,
by choosing 52 C2 = 1. Next with normalize the eigenfunctions. Let

y1 (x) y2 (x)
φ1 (x) = q , φ2 (x) = q .
1 2 1 2
y
−1 1 (x)dx y
−1 2 (x)dx

Then φ1 and φ2 are normalized and given by



x x 6
φ1 (x) = q = ,
1 2 2
−1 x dx

and √
x2 x2 10
φ2 (x) = q = .
1 4 2
−1 x dx

Moreover, f1 and f2 , are found to be


 1  1
√ √
x 6 2 6
f1 = f (x)φ1 (x)dx = (x + 1)2 dx = ,
−1 −1 2 3

and   √ √
1 1
x2 10 8 10
f2 = f (x)φ2 (x)dx = (x + 1)2 dx = .
−1 −1 2 15
3 5
Thus, for λ ̸= λ1 = 2 and λ ̸= λ2 = 2, the solution is

2
fn
y(x) = (x + 1)2 + λ ∑ λn − λ φn (x)
n=1
 
2 f1 f2
= (x + 1) + λ φ1 (x) + φ2 (x)
λ1 − λ λ2 − λ

 2 6 √ √
8 10 2 √ 
2 3 x 6 15 x 10
= (x + 1) + λ 3 +5 .
2 −λ
2 2 −λ
2
Symmetric Kernel 363

Notice for λ = 1, then the above solution reduces to


25 2
y(x) = x + 6x + 1,
9
which is the solution for
 1
y(x) = (x + 1)2 + xξ + x2 ξ 2 )y(ξ dξ .

−1

5.6.1 Exercises
Exercise 5.42 (a) Find the eigenvalues and corresponding normalized eigenfunc-
tions for the homogeneous integral equation
 1
y(x) = λ K(x, ξ )y(ξ )dξ
0

where K(x, ξ ) is defined by the relations


(
x(1 − ξ ) when 0 ≤ x ≤ ξ ≤ 1,
K(x, ξ ) =
ξ (1 − x) when 0 ≤ ξ ≤ x ≤ 1.

(b) Use part (a) to solve the Fredholm integral equation


 1
y(x) = x + λ K(x, ξ )y(ξ )dξ .
0

(c) Does the Fredholm integral equation


 1
y(x) = x + 4π 2 K(x, ξ )y(ξ )dξ
0

have a solution?
Exercise 5.43 (a) Determine the eigenvalues and the corresponding normalized
eigenfunctions for
 π
y(x) = λ cos(x + ξ ) y(ξ ) dξ .
0

(b) Solve  π
y(x) = F(x) + λ cos(x + ξ ) y(ξ ) dξ
0
when λ is not characteristic and F(x) = 1.
364 Integral Equations

(c) Obtain the general solution (when it exists) if F(x) = sin(x), considering all
possible cases.
Exercise 5.44 (a) Determine the eigenvalues and the corresponding normalized
eigenfunctions for

 1
y(x) = λ K(x, ξ )y(ξ ) dξ
0
where K is the Green’s function that was obtained in Example 5.5 for the bound-
ary value problem

y′′ (x) = 0, 0 < x < 1, y(0) = 0, y(1) − 3y′ (1) = 0.

(b) Find the solution of


 1
y(x) = 1 + λ K(x, ξ )y(ξ )dξ .
0

Exercise 5.45 (a) Determine the eigenvalues and the corresponding normalized
eigenfunctions for

 1
y(x) = λ (xξ + 1)y(ξ )dξ .
−1

(b) Solve  1
y(x) = x + λ (xξ + 1)y(ξ )dξ .
−1

when λ is not characteristic.


Exercise 5.46 (a) Determine the eigenvalues and the corresponding normalized
eigenfunctions for

 1
y(x) = λ (x3 ξ 3 + x2 ξ 2 )y(ξ )dξ .
−1

(b) Solve  1
y(x) = x + λ (x3 ξ 3 + x2 ξ 2 )y(ξ )dξ .
−1

when λ is not characteristic.


Iterative Methods and Neumann Series 365

5.7 Iterative Methods and Neumann Series


At the beginning of this chapter, we briefly touched on an iteration method for solving
Volterra integral equations of the second kind. In this section, we consider similar
integral equations where the independent variable x is bounded. To be specific, we
consider the Volterra integral equation
 x
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , a ≤ x ≤ b (5.97)
a

where f and K are continuous on their respective domains. We emphasize that the
kernel K in (5.97) does not need to be symmetric or separable. We define a successive
approximation by

y0 (x) = f (x)
 x
y1 (x) = f (x) + λ K(x, ξ )y0 (ξ )dξ
a x
y2 (x) = f (x) + λ K(x, ξ )y1 (ξ )dξ
a
..
.  x
yn (x) = f (x) + λ K(x, ξ )yn−1 (ξ )dξ , n = 1, 2, . . . (5.98)
a

Our aim here is to give a concise method on how to define a sequence of functions
{yn } successively for (5.97) and obtain an infinite series that represents the solu-
tion. Let y0 be an initial approximation. Then replacing y in the integrand by y0
gives  x
y1 (x) = f (x) + λ K(x, ξ )y0 (ξ )dξ .
a
Substituting this approximation for y in the integrand gives the approximation
 x  ξ
 
y2 (x) = f (x) + λ K(x, ξ ) f (ξ ) + λ K(ξ , ξ1 )y0 (ξ1 )dξ1 dξ
a x a

= f (x) + λ K(x, ξ ) f (ξ )dξ


a
 x  ξ
+ λ2 K(x, ξ ) K(ξ , ξ1 )y0 (ξ1 )dξ1 dξ .
a a

In a similar fashion, substituting y2 into y3 of (5.97) gives


 x
y3 (x) = f (x) + λ K(x, ξ ) f (ξ )dξ
a
366 Integral Equations
 x  ξ
+ λ2 K(x, ξ ) K(ξ , ξ1 ) f (ξ1 )dξ1 dξ
a a
 x  ξ  ξ1
+ λ3 K(x, ξ ) K(ξ , ξ1 ) K(ξ1 , ξ2 )y0 (ξ2 )dξ2 dξ1 dξ .
a a a

If we define the operator L by


 x
Ly(x) = K(x, ξ )y(ξ )dξ , (5.99)
a

then we may write (5.97) in the form

y = f + λ Ly. (5.100)
In addition, y1 , y2 , and y3 may also be rewritten so that

y1 = f + λ Ly0 , y2 = f + λ L f + λ 2 L2 y0

and
y3 = f + λ L f + λ 2 L2 f + λ 3 L3 y0 .
Continuing in this fashion we obtain the successive approximation
n−1
yn (x) = f (x) + ∑ λ i Li f (x) + λ n Ln y0 (x), (5.101)
i=1

where Li = L(L(· · · L)), with L0 f = f . If we can show that Ln y0 (x) → 0 as n → ∞,


| {z }
i times
then from (5.101), the unique solution of (5.97) will be

y(x) = f (x) + ∑ λ i Li f (x). (5.102)
i=1

Lemma 19 Let Ln y0 (x) be define by (5.101). Then Ln y0 (x) → 0 and n → ∞.

Proof Let
M = max |K(x, ξ )| and C = max |y0 (x)|.
a≤x,ξ ≤b a≤x≤b

Then
 x

Ly0 (x) = K(x, ξ )y0 (ξ )dξ
a
 x
≤ K(x, ξ ) y0 (ξ ) dξ
a
≤ (x − a)MC, a ≤ x ≤ b.

Similarly,
 x
2
L y0 (x) = K(x, ξ )Ly0 (ξ )dξ
a
Iterative Methods and Neumann Series 367
 x
≤ K(x, ξ ) Ly0 (ξ ) dξ
a x
≤ M(ξ − a)MCdξ
a
(x − a)2 2
≤ M C, a ≤ x ≤ b.
2
Continuing this way, we arrive at
n n
L y0 (x) ≤ (x − a) M nC ≤ (b − a) M nC.
n
(5.103)
n! n!
To complete the induction argument, we assume (5.103) holds for n and show it holds
for n + 1. Using (5.103), we arrive at
 x
n+1
K(x, ξ )Ln y0 (ξ )dξ

L y0 (x) =
a
 x
K(x, ξ ) Ln y0 (ξ ) dξ


a
 x
(ξ − a)n n
≤ M
M C dξ
a n!
(x − a)n+1 n+1
≤ M C
(n + 1)!
(b − a)n+1 n+1
≤ M C.
(n + 1)!
This completes the induction argument. Now, it is clear from (5.103) that

(b − a)n n
lim |λ n ||Ln y0 (x) = lim |λ n |

M C=0
n→∞ n→∞ n!
uniformly for all a ≤ x ≤ b and for all values of λ . This shows the infinite series
(5.101) converges for any finite λ .
Lemma 20 The solution of (5.97) is given by (5.102).

Proof By Lemma 19 we have the sequence {yn (x)} converges uniformly on [a, b],
say to a function y(x). Consider the successive iterations
 x
yn (x) = f (x) + λ K(x, ξ )yn−1 (ξ )dξ , n = 1, 2, . . .
a

Then,

y(x) = lim yn (x)


n→∞
 x
= f (x) + λ lim K(x, ξ )yn−1 (ξ )dξ
n→∞ a
368 Integral Equations
 x
= f (x) + λ K(x, ξ ) lim yn−1 (ξ )dξ
n→∞
a x
= f (x) + λ K(x, ξ )y(ξ )dξ . (5.104)
a

In the next lemma we show that if y satisfies (5.104), then it is unique. In addition,
the author assume the reader is familiar with Banach spaces. For more on Banach
spaces we refer to [19] of Chapter 4.
Lemma 21 If y satisfies (5.104), then it is unique provided that
1
λ< , (5.105)
M(b − a)

where M = maxa≤x,ξ ≤b |K(x, ξ )|.



Proof Let B = {g : g ∈ C [a, b], R }. Then the space B endowed with the maximum
norm ∥ · ∥ is a Banach space. Suppose y satisfies (5.104). By (5.105) there exists an
α ∈ (0, 1) such that λ M(b − a) ≤ α. For y ∈ B, define the operator P : B → B
by  x
P(y)(x) = f (x) + λ K(x, ξ )y(ξ )dξ .
a
Clearly, P(y)(x) is a continuous map. Let w, z ∈ B. Then,

∥P(w) − P(z)∥ ≤ max P(w)(x) − P(z)(x)



x∈[a, b]
 x
≤ λM |w(u) − z(u)|du
a
 b
≤ λM |w(u) − z(u)|du
a
≤ λ M(b − a)∥w − z∥
≤ α∥w − z∥.

Thus, the operator P is a contraction and according to Banach fixed point theorem,
it has a unique fixed point.

Finally, we use the above lemmas to state the following theorem.


Theorem 5.7 Let f and K be continuous. Then the Volterra equation
 x
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , a ≤ x ≤ b (5.106)
a

has the solution



y(x) = f (x) + ∑ λ i Li f (x). (5.107)
i=1
Iterative Methods and Neumann Series 369

Moreover, if
1
|λ | < , (5.108)
M(b − a)
then the solution of (5.106) is unique and it is given by (5.107). The representation
(5.107) is called the Neumann series.

Proof The proof follows from Lemmas 19-21.

The proof of the next corollary is a direct consequence of (5.107), since f = 0.


Corollary 9 For any value of λ the Volterra equation
 x
y(x) = λ K(x, ξ )y(ξ )dξ , a ≤ x ≤ b, (5.109)
a

has only the trivial solution and hence it has no eigenvalues.


Remark 22 In our discussion we considered Volterra integral equation of the sec-
ond kind. Similar results are easily derived for the Fredholm integral equation of the
second kind. To see this, we consider Fredholm integral equation
 b
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ . (5.110)
a

As before, we define the operator L by


 b
Ly(x) = K(x, ξ )y(ξ )dξ , (5.111)
a

then we may write (5.110) in the form

y = f + λ Ly

Continuing in this fashion we obtain the successive approximation


n−1
yn (x) = f (x) + ∑ λ i Li f (x) + λ n Ln y0 (x), (5.112)
i=1

where n
L y0 (x) ≤ (b − a)n M nC.

(5.113)
Then,
|λ n | Ln y0 (x) ≤ |λ |n (b − a)n M nC.

Furthermore,

lim |λ |n |Ln y0 (x) ≤ lim |λ |n (b − a)n M nC = 0,



n→∞ n→∞

provided that
1
|λ | < . (5.114)
M(b − a)
370 Integral Equations

As a consequence, the infinite series (5.112) converges to the solution



y(x) = f (x) + ∑ λ i Li f (x). (5.115)
i=1

Notice that condition (5.114) is a sufficient condition and hence without it the infinite
series may or may not converge.
Observe that the convergence for the infinite series in the case of Volterra integral
equation of the second kind was irrespective of the values of λ .
Example 5.12 Use both methods, iterations and Neumann series to solve the
Volterra integral equation
 x
y(x) = x + λ (x − ξ )y(ξ )dξ .
0

We begin with the successive approximation or iterations


 x
yn (x) = x + λ (x − ξ )yn−1 (ξ )dξ , n = 1, 2, . . .
0

with y0 (x) = x. For n = 1 we have


 x
x3
y1 (x) = x + λ (x − ξ )ξ dξ = x + λ .
0 3!
ξ3
For n = 2, with y1 (ξ ) = ξ + λ 3! , we have that
 x
ξ3  x3 x5
dξ = x + λ + λ 2 .

y2 (x) = x + λ x−ξ ξ +λ
0 3! 3! 5!
3 5
Similarly, for n = 3 with y2 (ξ ) = ξ + λ ξ3! + λ 2 ξ5! , we have
 x
ξ3 ξ5  x3 x5 x7
+λ2 dξ = x + λ + λ 2 + .

y3 (x) = x + λ x−ξ ξ +λ
0 3! 5! 3! 5! 7!
A continuation of this process leads to the sequence of functions

x3 x5 x7 ∞
x2n+1
yn (x) = x + λ +λ2 +λ3 +··· = x+ ∑ λn ,
3! 5! 7! n=1 (2n + 1)!

which converges for all values of λ and x. Next we compute the Neumann series. Let
 x
Ly(x) = K(x, ξ )y(ξ )dξ ,
0

then  x
1 1 x3
L f (x) = L x = (x − ξ )ξ dξ = .
0 3!
Iterative Methods and Neumann Series 371

Using the value of L1 x, we obtain


 x
ξ3 x5
L2 x = (x − ξ ) dξ = .
0 3! 5!

In a similar approach, we arrive at


 x
ξ5 x7
L3 x = (x − ξ ) dξ = ,
0 5! 7!

and so on. The Neumann series takes the form



y(x) = x + ∑ λ i Li x
i=1
x3 x5 x7
= x+λ +λ2 +λ3 +...
3! 5! 7!
∞ 2n+1
n x
= x+ ∑ λ .
n=1 (2n + 1)!


In what to follow, we twist the Neumann series and define the resolvent for the
Volterra integral equation given by (5.97). As before, define the operator
 x
(L f )(x) = K(x, ξ ) f (ξ )dξ .
a

Then,

(L2 f )(x) = L(L f )(x)


 x  ξ
= K(x, ξ ) K(ξ , ξ1 ) f (ξ1 )dξ1 dξ
a a
 x  x 
= K(x, ξ )K(ξ , ξ1 )dξ f (ξ1 )dξ1 ,
a ξ1

where we changed the order of integrations. If we let


 x
K2 (x, ξ1 ) = K(x, ξ )K(ξ , ξ1 )dξ ,
ξ1

then we have  x
(L2 f )(x) = K2 (x, ξ1 ) f (ξ1 )dξ1 .
a
Following in the same steps we arrive at
 x
3
(L f )(x) = K3 (x, ξ1 ) f (ξ1 )dξ1 ,
a
372 Integral Equations

where  x
K3 (x, ξ1 ) = K(x, ξ )K2 (ξ , ξ1 )dξ ,
ξ1
and in general,  x
(Ln f )(x) = Kn (x, ξ1 ) f (ξ1 )dξ1 ,
a
where  x
Kn (x, ξ1 ) = K(x, ξ )Kn−1 (ξ , ξ1 )dξ .
ξ1
The kernels K1 = K, K2 , K3 , . . . are called the iterated kernels. Consequently, the
Neumann series (5.107) can be written as
∞  x
i−1
y(x) = f (x) + λ ∑ λ Ki (x, ξ ) f (ξ )dξ
i=1 a
 x ∞ 
i−1
= f (x) + λ ∑ λ Ki (x, ξ ) f (ξ )dξ
a i=1
 x
= f (x) + λ Γ(x, ξ ; λ ) f (ξ )dξ , (5.116)
a

where

Γ(x, ξ ; λ ) = ∑ λ i−1 Ki (x, ξ )dξ , (5.117)
i=1
is the resolvent kernel. We arrived at the following theorem.
Theorem 5.8 Let f and K be continuous. Then the Volterra equation
 x
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ , a≤x≤b (5.118)
a

has the solution  x


y(x) = f (x) + λ Γ(x, ξ ; λ ) f (ξ )dξ , (5.119)
a
where the resolvent kernel Γ(x, ξ ; λ ) is given by (5.117).
Example 5.13 Find the resolvent kernel and the solution for the Volterra integral
equation  x
2 2 −ξ 2
y(x) = ex + ex y(ξ )dξ . (5.120)
0
2 2 −ξ 2
Here we have f (x) = ex , K(x, ξ ) = ex , and λ = 1. Set
2 −ξ 2
K1 (x, ξ ) = K(x, ξ ) = ex .

Then
 x
K2 (x, ξ1 ) = K(x, ξ )K(ξ , ξ1 )dξ
ξ1
Iterative Methods and Neumann Series 373
 x
2 −ξ 2 2 −ξ 2
= ex eξ 1 dξ
ξ1
 x
2 −ξ 2 2 −ξ 2
= ex 1 dξ = ex 1 (x − ξ1 ).
ξ1

Similarly
 x
K3 (x, ξ1 ) = K(x, ξ )K2 (ξ , ξ1 )dξ
ξ
 1x
2 −ξ 2 2 −ξ 2
= ex eξ 1 (ξ − ξ1 )dξ
ξ1
 x
2 −ξ 2 2 −ξ 2 (x − ξ1 )2
= ex 1 (ξ − ξ1 )dξ = ex 1 .
ξ1 2

Additionally
 x
K4 (x, ξ1 ) = K(x, ξ )K3 (ξ , ξ1 )dξ
ξ1
 x
2 −ξ 2 2 −ξ 2 (ξ − ξ1 )2
= ex eξ 1 dξ
ξ1 2
2 −ξ 2 (x − ξ1 )3
= ex 1 .
3!
Inductively, we arrive at the formula

2 −ξ 2 (x − ξ1 )n−1
Kn (x, ξ1 ) = ex 1 , n = 1, 2, . . . .
(n − 1)!
Thus, the resolvent kernel is

Γ(x, ξ ; λ ) = ∑ λ n−1 Kn (x, ξ )dξ
n=1

2 2 (x − ξ1 )n−1
= ∑ ex −ξ1 (n − 1)!
n=1
2 −ξ 2

(x − ξ1 )n−1
= ex 1

n=1 (n − 1)!
2 −ξ 2
= ex 1 ex−ξ1 .

Using (5.119), we arrive at the solution


 x
2 2 −ξ 2 2
y(x) = ex + ex 1 ex−ξ1 eξ1 dξ1
0
 x
x2 x2 +x
= e +e e−ξ1 dξ1
0
374 Integral Equations
2 2 +x
= ex + ex (1 − e−x )
2 +x
= ex .


The next example displays another approach of finding the resolvent kernel.
Example 5.14 Consider the Volterra equation
 x
ϕ(x) = f (x) + λ ξ ϕ(ξ )dξ ,
a

where f is continuously differentiable. Using Lemma 16, we arrive at the first-order


differential equation
ϕ ′ (x) − λ xϕ(x) = f ′ (x).
x2
Multiplying the equation by the integrating factor e−λ 2 we arrive at

d −λ x2 x2
e 2 ϕ(x) = e−λ 2 f ′ (x).

dx
Integrating both sides from a to x leads to
 x
2 2 ξ2
−λ x2 −λ a2
e ϕ(x) − e ϕ(a) = e−λ 2 f ′ (ξ )dξ .
a

The term on the right hand side can be integrated by parts to obtain
 x
x2 a2 a2 x2 ξ2
e−λ 2 ϕ(x) − e−λ 2 ϕ(a) = −e−λ 2 f (a) + f (x)e−λ 2 +λ ξ e−λ 2 f ξ )dξ .
a

Since f (a) = ϕ(a), the above expression reduces to


 x
2 2 ξ2
−λ x2 −λ x2
e ϕ(x) = f (x)e +λ ξ e−λ 2 f ξ )dξ .
a

x2
Multiplying both sides with eλ 2 yields
 x
λ 2 −ξ 2 )
ϕ(x) = f (x) + λ ξ e 2 (x f ξ )dξ . (5.121)
a

An comparison between (5.119) and (5.121) indicates that


λ 2 −ξ 2 )
Γ(x, ξ ; λ ) = ξ e 2 (x .

If f (x) is given then we can use (5.121) to compute the solution. □


We end this section by considering nonlinear Volterra integral equations. Recall from
Definition 5.4 that an expression of the solution of the nonlinear ordinary differential
Iterative Methods and Neumann Series 375

equation x′ (t) = f (t, x(t)) for t ∈ I, and x(t0 ) = x0 is given by the nonlinear integral
equation  t

x(t) = x0 + f ξ , x(ξ ) dξ .
t0
To be consistent with our notations, we consider the nonlinear Volterra integral equa-
tion  x

y(x) = h(x) + f ξ , y(ξ ) dξ .
t0
where the functions h and f are continuous on their respective domains. As we have
done before, we define a successive approximation or Picard’s iteration by
y0 (x) = h(x)
 x
y1 (x) = h(x) + f (ξ , y0 (ξ ))dξ
0 x
y2 (x) = h(x) + f (ξ , y1 (ξ ))dξ
0
..
.  x
yn (x) = h(x) + f (ξ , yn−1 (ξ ))dξ , n = 1, 2, . . . (5.122)
0

Example 5.15 Consider the nonlinear integral equation


 x
y(x) = (ξ + y2 (ξ ))dξ .
0

We use the successive approximation or iterations


 x
ξ + y2n−1 (ξ ) dξ ,

yn (x) = n = 1, 2, . . .
0

with y0 (x) = 0. For n = 1 we have


 x
x2
y1 (x) = ξ dξ = .
0 2
ξ2
For n = 2, with y1 (ξ ) = 2 , we have that
 x
ξ4  x2 x5
y2 (x) = ξ+ dξ = + .
0 4 2 20
2 5
Similarly, for n = 3 with y2 (ξ ) = ξ2 + ξ20 , we have
 x
ξ 2 ξ 5 2 x5 x8 x11

y3 (x) = ξ+ + dξ = x + + + .
0 2 20 20 16 4400
A continuation of this process leads to higher approximation. □
376 Integral Equations

5.7.1 Exercises
Exercise 5.47 Provide all the detail for (5.113).
Exercise 5.48 Find the Neumann series for the Volterra integral equation
 x
y(x) = 1 − (x − ξ )y(ξ )dξ ,
0

and then find its solution.


Answer: y(x) = cos(x).
Exercise 5.49 Find the solution of the Fredholm integral equation
 1
x 1
y(x) = e + y(ξ )dξ ,
e 0

using:
(a) iterations,
(b) Neumann series. Answer: y(x) = ex + 1.
Exercise 5.50 Consider the Fredholm integral equation
 1
y(x) = 1 + λ (x − ξ )y(ξ )dξ .
0

Find:
(a) y1 (x), y2 (x) and y3 (x),
(b) the first three terms of the Neumann series.
Exercise 5.51 Find the Neumann series for the following integral equations.
(a)  x
y(x) = 1 − 2 ξ y(ξ )dξ .
0

(b)  1
1
y(x) = x + (x + ξ )y(ξ )dξ .
2 −1
Exercise 5.52 Solve the Volterra integral equation using the Neumann series
(a)  x
y(x) = 1 + x2 − 2 (x − ξ )y(ξ )dξ .
0
Answer: y(x) = 1.
Iterative Methods and Neumann Series 377

(b)  x
y(x) = x cos(x) + ξ y(ξ )dξ .
0
Very hard to simplify.
Answer: y(x) = sin(x).
Exercise 5.53 Consider the integral equation
 1
23 1
y(x) = x+ xξ y(ξ )dξ . (5.123)
6 8 0


(a) Show that the iterative kernel Kn (x, ξ ) = 3n−1
.
1 24
(b) Show that the resolvent kernel simplifies to Γ(x, ξ ; λ ) = xξ 1−1/24 = 23 xξ .

(c) Find the solution of (5.123).


Exercise 5.54 Consider the integral equation
 x
y(x) = x + λ y(ξ )dξ . (5.124)
0

(x−ξ )n−1
(a) Show that the iterative kernel Kn (x, ξ ) = (n−1)! .

(b) Show that the resolvent kernel simplifies to Γ(x, ξ ; λ ) = eλ (x−ξ ) .


(c) Find the solution of (5.124).
Exercise 5.55 Use the idea of Example 5.14 to compute the resolvent kernel for the
Volterra equation  x
y(x) = f (x) + λ y(ξ )dξ .
a
Find the solution when
(a) f (x) = 1,
(b) f (x) = x.
Exercise 5.56 Use the idea of Example 5.14 and redo Example 5.13.
Exercise 5.57 Use any method that you wish to solve the Volterra equation
 x
y(x) = cos(x) − x − 2 + (ξ − x)y(ξ )dξ .
0

Exercise 5.58 Consider the integral equation


 1
2
y(x) = x + y(ξ )dξ . (5.125)
0
378 Integral Equations
1
(a) Show that the iterative kernel Kn (x, ξ ) = 2n−1
.
(b) Show that the resolvent kernel simplifies to Γ(x, ξ ; λ ) = 2.
(c) Find the solution of (5.125).
Exercise 5.59 Apply Picard iteration to the (IVPs)
1. x′ (t) = tx, x(0) = 1
2. x′ (t) = 2t(1 + x), x(0) = 2

and show the obtained {xn (t)} of each of the of iterate converges to the true solution
of each of the (IVP) (True solution is the solution found by solving the (IVP)).
Exercise 5.60 Consider the coupled system of differential equations
dy dz 1
= z(x), = x3 (y(x) + z(x)); y(0) = 1 and z(0) = .
dx dx 2
Convert the system to integral equations and find the iterates
{y1 (x), y2 (x), y3 (x), z1 (x), z2 (x), z3 (x)}.

5.8 Approximating Non-Degenerate Kernels


In some cases, it is useful to approximate a non-degenerate kernel by a finite terms
using Maclaurin expansion so that it is degenerate. This is accomplished by approxi-
mating a given kernel K(x, ξ ), through a sum of a finite number of products of func-
tions of x alone by functions of ξ alone. To better explain the concept, we consider
the Fredholm integral equation of second kind
 b
y(x) = f (x) + λ K(x, ξ )y(ξ )dξ . (5.126)
a

Let D(x, ξ ) be the approximate and degenerta kernel of K. Then, the approximate
Fredholm integral equation of the second kind of (5.126) may take the form
 b
e(x) = f (x) + λ D(x, ξ )e(ξ )dξ , (5.127)
a

where the kernel D is degenerate. We may use Section 5.5 to obtain the solution e(x)
of (5.127), which is the approximate solution of (5.126). Such approximation will
involve an error which we denote by
ε = |y(x) − e(x)|
for small and positive ε. For illustrational purpose we propose the following exam-
ple.
Approximating Non-Degenerate Kernels 379

Example 5.16 Consider the Fredholm equation


 1
y(x) = cos(x) + x sin(xξ )y(ξ )dξ . (5.128)
0

Then, K(x, ξ ) = x sin(xξ ) is non-degenerate. However, a finite terms of its Maclaurin


series
(xξ )3 (xξ )5 
x sin(xξ ) = x xξ − + +···
3! 5!
is degenerate. We only consider the first two terms of its Maclaurin series and set

x4 ξ 3
D(x, ξ ) = x2 ξ − ,
3!
which is degenerate. The approximate Fredholm integral equation is then
 1
x4 ξ 3 
e(x) = cos(x) + λ x2 ξ − e(ξ )dξ . (5.129)
0 3!

Our task is to find the solution e(x) of (5.129). We begin by rewriting


2
x4 ξ 3
D(x, ξ ) = x2 ξ − = ∑ αi (x)βi (ξ ),
3! i=1

with
ξ3
α1 (x) = x2 , α2 (x) = −x4 , β1 (ξ ) = ξ , β2 (ξ ) = .
3!
Next we use  1
ai j = βi (x)α j (x)dx, i, j = 1, 2
0
to compute the matrix A = (ai j ).
 1  1
1
a11 = β1 (x)α1 (x)dx = x3 dx = ,
0 0 4
 1  1
1
a12 = β1 (x)α2 (x)dx = − x5 dx = − ,
0 0 6
 1  1 5
x 1
a21 = β2 (x)α1 (x)dx = dx = ,
0 0 6 36
 1  1 7
x 1
a22 = β2 (x)α2 (x)dx = − dx = − .
0 0 6 48
So we have
1
− 16
 
A= 4 .
1 1
36 − 48
380 Integral Equations

Since λ = 1, we have that


3 1


det(I − λ A) = det(I − A) = 41 6 ̸= 0.
− 36 − 49
48

1
Thus, λ = 1 is not an eigenvalue. We make use of fi = 0 βi (x) f (x)dx, i = 1, 2 to
compute f1 and f2 .
 1
f1 = x cos(x)dx ≈ 0.38177,
0
and

1 1 3
f2 = x cos(x)dx
6 0
1h 3 i 1
= x sin(x) + 3x2 cos(x) − 6x sin(x) − 6 cos(x)

6 0
≈ 0.02862.
Left to find c1 , c2 as the the unique solution of the system
3 1
c1 + c2 = 0.38177
4 6
1 49
− c1 + c2 = 0.02862.
36 48
After some calculations, we obtained
c1 ≈ 0.49977, c2 ≈ 0.041635.
Thus, the approximate solution e(x) of (5.129) is given by
2
e(x) = cos(x) + ∑ ci αi (x) = cos(x) + 0.49977x2 − 0.041635x4 .
i=1

The values of e(x) are compared to the values of the actual solution of (5.128) which
can be easily proved to be y(x) = 1, at various values of x ∈ [0, 1], in the table below.

x 0 0.25 0.5 0.75 1
y(x) 1 1 1 1 1
e(x) 1 0.99998 0.99926 0.99963 0.998437

5.8.1 Exercises
Exercise 5.61 Verify that y(x) = 1 is a solution of the Fredholm integral equation
given by (5.128).
Exercise 5.62 Redo Example 5.16 by taking
x4 ξ 3 x6 ξ 5
D(x, ξ ) = x2 ξ − + .
3! 5!
Laplace Transform and Integral Equations 381

Exercise 5.63 Consider the Fredholm equation


 1
y(x) = 1 − cos(x) + (1 − x sin(xξ ))y(ξ )dξ . (5.130)
0

(a) Verify y(x) = 1 is a solution of (5.130).


(b) Use the first two terms of the Maclaurin series of the kernel and compute the
approximate solution e(x).
(c) Compare both solutions at x = 0, 0.25, 0.5, 0.75, 1.
Exercise 5.64 Repeat Exercise 5.63 for the Fredholm integral equation
 1
y(x) = ex − x − x(exξ − 1)y(ξ )dξ
0

by considering the first three terms of the Maclaurin series of the kernel.

5.9 Laplace Transform and Integral Equations


The Laplace transform is a powerful tool for solving differential equations and in-
tegral equations of convolution types. In this introductory section, we first define
the Laplace transform and develop some of its basic properties. We begin with the
following definition:
Definition 5.8 Let f (t) be define on 0 ≤ t < ∞. The Laplace transform of f is de-
noted by F(s)  ∞
Ł[ f (t)] = F(s) = e−st f (t) dt, for s > 0. (5.131)
0
The Laplace transform of f is said to exist if the improper integral (5.131) converges.
Note that the right side of (5.131) is a function of s and hence the notation F(s). We
shall use small letters of functions of t such as f , g or h and shall denote their Laplace
transforms by Ł[ f (t)] = F(s), Ł[g(t)] = G(s), Ł[h(t)] = H(s). Laplace transform is
linear. That is for functions f and g and constants a and b it follows that

Ł[a f + bg] = aŁ[ f ] + bŁ[g].

Example 5.17 In this example we develop the Laplace transform of basic functions.
We do so by considering different values of f (t).
(a) For f (t) = 1, then

1 −st ∞ 1
∞  
−st
Ł[1] = e dt = − e = , s > 0.
0 s 0 s
382 Integral Equations

(b) For f (t) = t, then


 ∞  ∞  ∞
1 1 −st 1
Ł[t] = e−st t dt = − e−st t + e = 2 , s > 0.
0 s 0 0 s s

(c) For f (t) = dy


dt , then by performing an integration by parts we get
   ∞  ∞
dy dy
e−st dt = e−st y 0 + se−st ydt = −y(0) + sY (s).
 ∞
Ł =
dt 0 dt 0

(d) For f (t) = eat , a constant, then


 
1 −(s−a)t ∞
∞ ∞  
at −st at −(s−a)t 1
Ł[e ] = e e dt = e dt = − e = , s > a.
0 0 s−a 0 s−a

(e)
Ł[cos at + i sin at] = Ł[eiat ] by DeMoivre.
1 s ia
= 2 Ł[eiat ] =
2
+ 2 .
s − ia s + a s + a2
Hence, equating real and imaginary parts and using linearity
s
Ł[cos at] =
s2 + a2
a
Ł[sin at] = .
s2 + a2
We can apply the convolution property from the table to find
 
f (s)
Ł−1 .
s

1
Ł−1 [ f (s)] = f (t), and Ł−1 [ ] = 1 = g(t),
s
so    t
f (s)
Ł−1 = f (θ ) dθ .
s 0

(f) For f (t) = t n , n = 0, 1, 2, . . . , then

n!
Ł[t n ] = , n = 0, 1, 2, . . . .
sn+1


Laplace Transform and Integral Equations 383

You can access the Laplace Transforms of all the functions you are likely to meet
online thanks to computer algebra tools like Mathematica, Matlab, and Maple. The
packages also provide an inversion technique to find a function f from a given F(s).
For example
1  π 1/2
Ł[t 1/2 ] = ,
2 s3
and  π 1/2
Ł[t −1/2 ] = .
s

Theorem 5.9 (Shift Theorem) If F(s) = Ł[ f (t)], then

Ł[eat f (t)] = F(s − a).

Proof Using Definition 5.8 we have


 ∞  ∞
Ł[eat f (t)] = e−st eat f (t)dt = e−(s−a)t dt = F(s − a).
0 0

This completes the proof.

Next we define the Laplace inverse.


Definition 5.9 Let F(s) be the Laplace transform of given a function f (t). We denote
the Laplace inverse of f by Ł−1 [F(s)] such that

Ł−1 [F(s)] = f .

Example 5.18 We use Laplace transform to solve the initial value problem
dy
2 − y = sint, y(0) = 1.
dt
We begin by taking the Laplace transform on both sides and obtain
1
2(sY (s) − 1) −Y (s) = .
s2 + 1
Solving for Y (s) gives
2s2 + 3
Y (s) = .
(2s − 1)(s2 + 1)
Taking the Laplace inverse we arrive at
2s2 + 3
y(t) = Ł−1 [ ].
(2s − 1)(s2 + 1)
Next we use partial fractions. That is
2s2 + 3 A Bs +C
= + ,
(2s − 1)(s2 + 1) 2s − 1 s2 + 1
384 Integral Equations

and after some calculations we obtain A = 75 , B = −2


5 , C=
−1
5 . Thus, the solution
y(t) is
7 2 1
y(t) = et/2 − cost − sint.
5 5 5

Next we address the convolution between two functions.
Definition 5.10 Let f (t) and g(t) be define on 0 ≤ t < ∞. The function
 t
h(t) = f (t − τ)g(τ) dτ, (5.132)
0

is called the convolution of f and g and is written

h = f ∗ g.

Theorem 5.10
f ∗g = g∗ f.

Proof Let u = t − τ in (5.132). Then


 0  t
h(t) = f (u)g(t − u) (−du) = f (u)g(t − u) du
t 0
 t
= g(t − u) f (u) du = (g ∗ f )(t).
0

This completes the proof.

Let F(s) and G(s) be the Laplace transform of the functions f , and g, respec-
tively. We are interested in computing Ł−1 [F(s)G(s)]. We have the following the-
orem
Theorem 5.11 (Convolution Theorem) Let F(s) and G(s) be the Laplace trans-
form of the functions f , and g, respectively. Then
 t
Ł[ f ∗ g] = Ł[ f (t − τ)g(τ) dτ] = F(s)G(s).
0

Proof For h = Ł−1 [H(s)], with H(s) = F(s)G(s), let


 ∞
H(s) = e−st h(t)dt = F(s)G(s). (5.133)
0

Then
 ∞  ∞
−st
H(s) = ( e f (t)dt)( e−sτ g(τ)dτ)
0 0
Laplace Transform and Integral Equations 385
 ∞  ∞
e−s(t+τ) f (t)dt g(τ)dτ.

=
0 0

Make the change of variables u = t + τ for the inside integral. Then


 ∞  ∞
e−su f (u − τ)du g(τ)dτ.

H(s) =
0 τ

By changing the order of integrations we obtain


 ∞  u
f (u − τ)g(τ) dτ e−su du.
 
H(s) =
0 0

Replacing the dummy variable of integration u with t and then compare the result
with (5.133), we clearly see that
 ∞  t
e−st h(t)dt = f (t − τ)g(τ) dτ
0 0
 ∞  t
f (t − τ)g(τ) dτ e−st dt.
 
=
0 0

This completes the proof.


Example 5.19 Express h in the form f ∗ g, when
1
H(s) = .
(s2 + 4)2
1
Let F(s) = G(s) = s2 +4
. Then H(s) = F(s)G(s). Moreover,

1 1
Ł−1 [F(s)] = Ł−1 [G(s)] = Ł−1 [ ]= sin(2t).
s2 + 4 2
Thus,  t
1 1 1
h(t) = sin(2t) ∗ sin(2t) = sin 2(t − τ) sin(2τ)dτ.
2 2 4 0

Before we consider the next example, we define the error function.
Definition 5.11 The error function is the following improper integral considered as
a real function er f : R → R, such that
 x
2 2
er f (x) = √ e−z dz,
π 0

where exponential is the real exponential function. In addition the complementary


error function,  ∞
2 2
er f c(x) = 1 − er f (x) = √ e−z dz.
π x
386 Integral Equations

We can easily verify that

er f c(−∞) = 2, er f c(0) = 1, er f c(∞) = 0.

Next we state the gamma function, which is needed in future work. We denote the
Gamma function by Γ and it is defined by
 ∞
Γ(x) = ux−1 e−u du, x > 0.
0

Note that, it can be easily shown

Γ(x + 1) = xΓ(x),

which can be used to show that, for every positive integer n

Γ(n) = (n − 1)!.

We will also need the following formula. For positive integer n, we have

1 (n − 2)!! π
Γ( n) = , (5.134)
2 2(n−1)/2
where n!! is a double factorial. For example,
√ √ √
Γ(1/2) = π, Γ(3/2) = π/2, Γ(5/2) = (3 π)/4, etc,
1 (2n − 1)!! √
Γ( + n) = π,
2 2n
and
1 (−1)n 2n √
Γ( − n) = π.
2 (2n − 1)!!

The next example deals with integral equations of convolution type.


Example 5.20 We are interested in finding the solution of the integral equation of
convolution type  t
r(t) = 1 − (t − τ)−1/2 r(τ) dτ. (5.135)
0
In convolution form, equation (5.135) becomes

r(t) = 1 − t −1/2 ∗ r(t). (5.136)

Let Ł[r(t)] = R(s) and take Laplace transform on both sides of (5.136).

Ł[r(t)] = Ł[1] − Ł[t 1/2 ]Ł[r(t)].

This gives √
1 Γ(1 − 1/2) 1 π
R(s) = − R(s) = − R(s).
s s1−1/2 s s1/2
Laplace Transform and Integral Equations 387

Solving for R(s) in the above equation gives

1
R(s) = √ .
s1/2 (s1/2 + π)

Using partial fractions, we write


A B
R(s) = + √ . (5.137)
s1/2 s1/2 + π
Next we compute the coefficient A and B. Taking a common denominator and equat-
ing both sides give √
1 = A(s1/2 + π) + Bs1/2 .
√ √
For s = 0, we have A = 1/ π, and for s1/2 = 1, we obtain B = −1/ π. Hence,
(5.137) becomes
1 1
R(s) = √ 1/2 − √ 1/2 √ .
π(s ) π(s + π)
Then, by taking the inverse Laplace of R(s), we get
   
−1 −1 1 −1 1
r(t) = Ł [R(s)] = Ł √ 1/2 − Ł √ 1/2 √ .
π(s ) π(s + π)

Or,    
1 −1 1 1 −1 1
r(t) = √ Ł −√ Ł √ . (5.138)
π s1/2 π s1/2 + π
h i 2 √
By our provided table, we see that Ł−1 √s+a1
= √π1√t − a ea t erf(a t), then

√ πt √ √
 
1 1 1 1
r(t) = √ √ √ − √ √ √ − π e erf( π t) .
π π t π π t
This simplifies to
1 1 √ √
r(t) = √ − √ + eπt erf( π t).
π t π t
Finally, √ √
r(t) = eπt erf( π t).

5.9.1 Frequently used Laplace transforms


∞
Function f (t) Transform F(s) = 0 e−st f (t) dt
1 1/s
t n , f or n = 0, 1, 2, . . . n!/sn+1
1
t 1/2 2 (π/s )
3 1/2

t −1/2 ( πs )1/2
388 Integral Equations

eat 1/(s − a)
sin ωt ω/(s2 + ω 2 )
cosωt s/(s2 + ω 2 )
t sin ωt 2ωs/(s2 + ω 2 )2
t cos ωt (s2 − ω 2 )/(s2 + ω 2 )2
eat t n n!/(s − a)n+1
eat sin ωt ω/ (s − a)2 + ω 2


eat cos ωt (s − a)/ (s − a)2 + ω 2




sinh ωt ω/(s2 − ω 2 )
cosh ωt s/(s2 − ω 2 )
Shift of g: eat g(t) t
G(s − a)
Convolution: f (t) ∗ g(t) = 0 f (t − τ)g(τ) dτ G(s)F(s)
t 1
Integration: 1 ∗ g(t) = 0 g(τ) dτ s G(s)
Derivative: y′ sY (s) − y(0)
y′′ √ s2Y (s) − √sy(0) − y′ (0)
(1 + 2at)/
√ πt (s +
√a)/s s
e−at / πt √ √ s + a√
1/
(ebt − e−at )/2t √ πt √s − a − √s − b
(e−bt√− e−at√)/2t πt s +√a + s + b
er f ( at)/√ a√ 1/(s√ s√+ a)
eat er f ( at)/ a 1/( s s − a)
2 √ √
√1 − beb t er f (b t)] 1/( s + b)
πt
f (ct) 1/(cF(1/c)), c > 0
f (n) (t) sn F(s) − sn−1 f (0) − . . . − f (n−1) (0)
(−t)n f (t) F (n) (s)
u(t − a) f (t − a) e−as F(s)
u(t − a) e−as /s
Γ(v+1)
t v , (v > −1) sv+1
.

5.9.2 Exercises
Exercise 5.65 Solve the initial value problem using Laplace transform
(a) y′′ + 9y = u(t − 3), y(0) = 1, y′ (0) = 2.
(b) 2 dy
dt − y = sin(t), y(0) = 1.
Exercise 5.66 Express h in the form f ∗ g, when
1
(a) H(s) = s3 −3s
.
1
(b) H(s) = (s2 +4)(s2 +9)
.
1
(c) H(s) = 3 .
s 2 (s2 +4)
Laplace Transform and Integral Equations 389

Exercise 5.67 Use Laplace transform and write down the solution of the integral
equation  t
y(t) = f (t) + λ e(t−τ) y(τ) dτ.
0
t
Answer: y(t) = f (t) + λ 0 e(λ +1)(t−τ) f (τ) dτ.
Exercise 5.68 Use Exercise 5.67 to solve the integral equation
 t
y(t) = cos(t) − e(t−τ) y(τ) dτ.
0

Answer: y(t) = cos(t) − sin(t).


Exercise 5.69 Solve the the system of ODEs using Laplace transform
dy dx
− + y + 2x = et ,
dt dt
dy dx
+ − x = e2t ,
dt dt
x(0), y(0) = 1.
Exercise 5.70 Show that
1
(Γ( )2 = π.
2
Exercise 5.71 Use Laplace transform to solve the following integral equations.
t
(a) f (t) = t + 0 (t − τ) dτ,
t
(b) f (t) + 2 0 f (τ) cos(t − τ) dτ = 4e−t + sin(t),
t
(c) y′ (t) = 1 − sin(t) − 0 y(τ) dτ, y(0) = 0.
Exercise 5.72 Show that if
 t
r(t) = −a(t) + a(t − s)r(s)ds
0

and  t
x(t) = f (t) + a(t − s)x(s)ds,
0
then  t
x(t) = f (t) − r(t − s) f (s)ds.
0
Exercise 5.73 Solve the Abel equation
 t
1
√ y(τ)dτ = f (t),
0 t −τ

where f (t) is a given function with f (0) = 0 and f ′ admits a Laplace transform.
390 Integral Equations

Find the solution when


(a) f (t) = 1 + t + t 2 ,
(b) f (t) = t 3 .
Exercise 5.74 Use Laplace transform to solve
 t
(t − τ)1/3 y(τ)dτ = t 3/2 .
0

Answer: y(t) = (3 π t 1/6 )/(4Γ(4/3)Γ(7/6)).
Exercise 5.75 Use Laplace transform to solve
 t
e−2(t−τ) (t − τ)−1/2 y(τ)dτ = 1.
0

Answer: y(t) = (t −1/2 /π)e−2t +
p
2/π er f ( 2t).

5.10 Odd Behavior


In this brief section, we skim over some integral equations that display odd behavior,
either in the sense of solutions existing only over finite time or the existence of more
than one solution. In Chapter 1, Section 1.1, Example 1.4, we considered a first-order
initial value problem and showed that it had more than one solution. Since initial
value problems and integral equations have a direct relationship, one might expect
such strange behavior to apply to integral equations as well. Thus, strange behavior
requires qualitative analysis of solutions using different means. For this particular
section, the author assume the reader is familiar with complete metric spaces and
Banach spaces and we refer to [19] of Chapter 4.
In the next example we show an integral equation has its solution exists over a finite
time.
Example 5.21 Consider the integral equation
 x
y(x) = y0 + y2 (ξ )dξ , y0 > 0
0
which is equivalent to the initial value problem
y′ (x) = y2 (x), y(0) = y0 > 0.
Separating the variables, the initial value problem has the solution
y0
y(x) = .
1 − y0 x
This solution exists only on the interval x ∈ [0, y10 ). □
Odd Behavior 391

The next example is concerned with the existence of multiple solutions on an integral
equation.
Example 5.22 Consider the integral equation
 x
y(ξ )
y(x) = p dξ .
0 x2 − ξ 2
It is clear that y(x) = 0, is a solution. Additionally, y(x) = x is another solution since
 x  x
y(ξ ) ξ
p dξ = p dξ
0 x2 − ξ 2 0 x2 − ξ 2
 x2
1 1
= √ dξ = x,
2 0 u

where we have used the transformation u = x2 − ξ 2 . Note that the kernel K(x, ξ ) =
√ 1 , is well behaved under integration. That is for any T > 0 we see that
x2 −ξ 2
 T  x
|K(x, ξ )|dξ dx < ∞,
0 0

and moreover, the function g(x) = x is certainly Lipschitz continuous. However, the
kernel is singular, in the sense that

K(x, ξ ) → ∞, as x → ξ .


The next theorem provide necessary conditions for the existence of unique solutions
of integral equations of the form
 t
x(t) = f (t) + g(t, s, x(s))ds (5.139)
0

in which x is an n vector, f : [0, ∞) → Rn , and g : π × Rn → Rn is continuous in all


of its arguments where π = (t, s) ∈ [0, ∞) × [0, ∞) : 0 ≤ s ≤ t < ∞ . We will use the
contraction mapping principle and show the existence of solutions of (5.139) over a
short interval, say [0, T ].
Theorem 5.12 Suppose there are positive constants a, b, and α ∈ (0, 1). Suppose
(a) f is continuous on [0, a],
(b) g is continuous on

U = (t, s, x) : (t, s) ∈ [0, ∞) × [0, ∞) : 0 ≤ s ≤ t < ∞ and |x − f (t)| ≤ b ,

(c) g satisfies a Lipschitz condition with respect to x on U



g(t, s, x) − g(t, s, y) ≤ L|x − y|
392 Integral Equations

for (t, s, x), (t, s, y) ∈ U.



If M = maxU g(t, s, x) , then there is a unique solution of (5.139) on [0, T ], where
c = α/L for fixed α and T = min{a, b/M, c}.

Proof Let X denote the space of continuous functions φ : [0, T ] → Rn , such


that
||φ − f || = max {|φ (t) − f (t)|} ≤ b,
t∈[0,T ]

where for Ψ ∈ X the norm || · || is taken to be ∥Ψ∥ = maxt∈[0,T ] {|Ψi (t)|}. Let φ ∈ X
and define an operator D : X → X, by
 t
D(φ )(t) = f (t) + g(t, s, φ (s))ds.
0

Since φ is continuous we have that D(φ ) is continuous, and


 t
||D(φ ) − f || = max g(t, s, φ (s))ds

t∈[0,T ] 0
≤ MT ≤ b.
This shows that D maps X into itself. For the contraction part, we let φ , ψ ∈ X.
Then
 t  t
||D(φ ) − D(ψ)|| = max g(t, s, φ (s))ds − g(t, s, ψ(s))ds

t∈[0,T ] 0 0
 t
≤ max g(t, s, φ (s) − g(t, s, ψ(s) ds

t∈[0,T ] 0
 t
≤ max L φ (s) − ψ(s) ds

t∈[0,T ] 0

≤ T max L φ (s) − ψ(s)

t∈[0,T ]
= T L||φ − ψ|| ≤ cL||φ − ψ||
= α||φ − ψ||.
Thus, by the contraction mapping principle, there is a unique function x ∈ X
with  t
D(x)(t) = x(t) = f (t) + g(t, s, x(s))ds.
0

Next we state and prove Gronwall’s inequality, which plays an important role in the
next results.
Theorem 5.13 (Gronwall’s inequality) Let C be a nonnegative constant and let u, v
be nonnegative continuous functions on [a, b] such that
 t
v(t) ≤ C + v(s)u(s)ds, a ≤ t ≤ b, (5.140)
a
Odd Behavior 393

then t
v(t) ≤ Ce a u(s)ds , a ≤ t ≤ b. (5.141)
In particular, if C = 0, then v = 0.
 t
Proof Assume C > 0 and let h(t) = C + v(s)u(s)ds. Then
a

h′ (t) = v(t)u(t) ≤ h(t)u(t).

So we have the differential equation

h′ (t) − h(t)u(t) ≤ 0.
t
Multiply both sides of the above expression by the integrating factor e− a u(s)ds , to
get  t ′
h(t)e− a u(s)ds ≤ 0.

Integrating both sides from a to t gives


t t
h(t)e− a u(s)ds − h(a) ≤ 0, or h(t) ≤ h(a)e a u(s)ds .

Finally, t
v(t) ≤ h(t) ≤ Ce a u(s)ds , C = h(a).
If C = 0 then form (5.140) it follows that
 t  t
1
v(t) ≤ v(s)u(s)ds ≤ + v(s)u(s)ds a ≤ t ≤ b,
a m a

for any m ≥ 1. Then from what we have just proved we arrive at


1  t u(s)ds
v(t) ≤ ea , a ≤ t ≤ b.
m
Thus for any fixed t ∈ [a, b], we can let m → ∞ to conclude that v(t) ≤ 0 and it follows
that v(t) = 0, for all t ∈ [a, b]. This completes the proof.
Theorem 5.14 Consider the integral equation given by (5.139). Suppose there are positive constants $\alpha$, $\beta$, $\lambda$, and $A$ such that
$$|f(t)| \le Ae^{-\alpha t} \quad \text{and} \quad |g(t, s, x)| \le \lambda e^{-\alpha(t-s)}|x|.$$
If $\alpha - \lambda = \beta > 0$ and if $x(t)$ is any solution of (5.139), then
$$|x(t)| \le Ae^{-\beta t}.$$

Proof From (5.139) we see that
$$|x(t)| = \Big|f(t) + \int_0^t g(t, s, x(s))\, ds\Big| \le Ae^{-\alpha t} + \int_0^t \lambda e^{-\alpha(t-s)}|x(s)|\, ds.$$
Multiplying both sides of the above expression by $e^{\alpha t}$, we arrive at
$$e^{\alpha t}|x(t)| \le A + \int_0^t \lambda e^{\alpha s}|x(s)|\, ds.$$
Applying Gronwall's inequality, we obtain the estimate
$$e^{\alpha t}|x(t)| \le Ae^{\int_0^t \lambda\, ds}.$$
Or,
$$|x(t)| \le Ae^{(\lambda - \alpha)t} = Ae^{-\beta t}.$$
This completes the proof.
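The decay estimate can be observed numerically. In the sketch below, $f(t) = Ae^{-\alpha t}$ and $g(t, s, x) = \lambda e^{-\alpha(t-s)}\sin x$ are illustrative choices satisfying the hypotheses (since $|\sin x| \le |x|$); the computed solution should stay below $Ae^{-\beta t}$, up to the $O(h)$ error of the crude quadrature.

```python
import numpy as np

# Illustration of Theorem 5.14 with f(t) = A e^{-alpha t} and
# g(t,s,x) = lam e^{-alpha(t-s)} sin(x); note |g| <= lam e^{-alpha(t-s)}|x|.
A, alpha, lam = 2.0, 1.0, 0.4
beta = alpha - lam                               # beta > 0

f = lambda t: A * np.exp(-alpha * t)
g = lambda t, s, x: lam * np.exp(-alpha * (t - s)) * np.sin(x)

t = np.linspace(0.0, 10.0, 4001)
h = t[1] - t[0]

x = np.empty_like(t)
x[0] = f(0.0)
for i in range(1, len(t)):
    # explicit left-endpoint quadrature of the memory integral, O(h) error
    x[i] = f(t[i]) + h * np.sum(g(t[i], t[:i], x[:i]))

print(np.max(np.abs(x) - A * np.exp(-beta * t)))  # <= 0 up to O(h) error
```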

The next theorem shows that if the signs of the function $g$ are right, then the growth of $g$ has nothing to do with continuation of solutions. Before we embark on the details, the following is needed. Let $x : \mathbb{R} \to \mathbb{R}$ be continuous and nonzero. Observing that
$$|x| = \sqrt{x^2} = (x^2)^{\frac{1}{2}},$$
and by using the chain rule, we arrive at
$$\frac{d}{dt}|x(t)| = \frac{1}{2}\big(x^2(t)\big)^{-\frac{1}{2}}\big(2x(t)x'(t)\big) = \frac{x(t)}{\big(x^2(t)\big)^{\frac{1}{2}}}\, x'(t) = \frac{x(t)}{|x(t)|}\, x'(t).$$
Theorem 5.15 Consider the scalar nonlinear integral equation
$$y(t) = f(t) + \int_0^t K(t, s)g(y(s))\, ds, \quad t \ge 0. \tag{5.142}$$
Suppose $f$ and $f'$ are continuous. In addition, we assume $K(t, s)$ and $\frac{\partial K(t, s)}{\partial t}$ are continuous for $0 \le s \le t < \infty$. If $yg(y) > 0$ for $y \ne 0$, and for each $T > 0$ we have
$$K(t, t) + \int_t^T \Big|\frac{\partial K(u, t)}{\partial u}\Big|\, du \le 0,$$
then each solution $y(t)$ of (5.142) can be continued for all future times.
Proof Let $\eta > 0$ and set $K_u(u, t) = \frac{\partial K(u, t)}{\partial u}$. Then $|f'(t)| \le M$ on $[0, \eta)$ for some positive constant $M$. It suffices to show that if a solution $y(t)$ of (5.142) is defined on $[0, \eta)$, then it is bounded. Let
$$H(t, y(\cdot)) = e^{-Mt}\Big[1 + |y(t)| + \int_0^t \int_t^\eta |K_u(u, s)|\, du\, |g(y(s))|\, ds\Big].$$
Then along the solutions of (5.142) we have
$$\begin{aligned}
H'(t, y(\cdot)) &= -Me^{-Mt}\Big[1 + |y(t)| + \int_0^t \int_t^\eta |K_u(u, s)|\, du\, |g(y(s))|\, ds\Big] \\
&\quad + e^{-Mt}\Big[\frac{y(t)}{|y(t)|}y'(t) + \int_t^\eta |K_u(u, t)|\, du\, |g(y(t))| - \int_0^t |K_t(t, s)||g(y(s))|\, ds\Big] \\
&\le e^{-Mt}\Big[-M - M|y(t)| + \frac{y(t)}{|y(t)|}y'(t) + \int_t^\eta |K_u(u, t)|\, du\, |g(y(t))| \\
&\quad - \int_0^t |K_t(t, s)||g(y(s))|\, ds\Big]. \tag{5.143}
\end{aligned}$$
Notice that the condition $yg(y) > 0$ implies
$$\frac{y(t)g(y(t))}{|y(t)|} = \frac{|y(t)||g(y(t))|}{|y(t)|} = |g(y(t))|.$$
Hence, by differentiating (5.142) we have that
$$\begin{aligned}
\frac{y(t)}{|y(t)|}y'(t) &= \frac{y(t)}{|y(t)|}\Big[f'(t) + K(t, t)g(y(t)) + \int_0^t K_t(t, s)g(y(s))\, ds\Big] \\
&\le |f'(t)| + K(t, t)\frac{|y(t)||g(y(t))|}{|y(t)|} + \int_0^t |K_t(t, s)||g(y(s))|\, ds \\
&= |f'(t)| + K(t, t)|g(y(t))| + \int_0^t |K_t(t, s)||g(y(s))|\, ds.
\end{aligned}$$
Substituting into (5.143), we obtain
$$H'(t, y(\cdot)) \le e^{-Mt}\Big[-M|y(t)| + \Big(K(t, t) + \int_t^\eta |K_u(u, t)|\, du\Big)|g(y(t))|\Big] \le 0.$$
Since $H > 0$ and $H$ is decreasing along the solutions, we see that $H$ is bounded by some constant, and hence $|y(t)|$ is bounded on $[0, \eta)$. As a matter of fact, we have from the definition of $H$ that
$$e^{-Mt}|y(t)| \le H(t, y(\cdot)) \le D, \quad \text{for some } D.$$
This yields
$$|y(t)| \le De^{Mt} \le De^{M\eta}.$$
This completes the proof.

We furnish the following example.
Example 5.23 Consider the integral equation
$$y(t) = e^t - \int_0^t \frac{y^5(s)}{\sqrt{t - s + 1}}\, ds.$$
Then, for any $\eta > 0$ we have $|f'(t)| = e^t \le e^\eta := M$ on $[0, \eta)$. It readily follows that $yg(y) = y^6 > 0$ when $y \ne 0$. Let $K(t, s) = -\big(t - s + 1\big)^{-1/2}$. Then $K_u(u, t) = \frac{1}{2}\big(u - t + 1\big)^{-3/2}$. Moreover, for any $T > 0$ we have
$$K(t, t) + \int_t^T |K_u(u, t)|\, du = -1 + \int_t^T \frac{1}{2}\big(u - t + 1\big)^{-3/2}\, du = -1 - \big(T - t + 1\big)^{-1/2} + 1 \le 0.$$
Hence, by Theorem 5.15, every solution can be continued for all future times. □
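A simple numerical experiment is consistent with the conclusion: stepping the equation with a left-endpoint quadrature (grid and horizon chosen arbitrarily) produces a solution that settles into a bounded profile instead of blowing up.

```python
import numpy as np

# Explicit stepping of y(t) = e^t - int_0^t y^5(s)/sqrt(t-s+1) ds.
# Theorem 5.15 guarantees continuation; the computed y stays finite.
T, N = 2.0, 4000
t = np.linspace(0.0, T, N + 1)
h = t[1] - t[0]

y = np.empty_like(t)
y[0] = 1.0                                   # y(0) = e^0 = 1
for i in range(1, N + 1):
    integrand = y[:i] ** 5 / np.sqrt(t[i] - t[:i] + 1.0)
    y[i] = np.exp(t[i]) - h * np.sum(integrand)

print(float(y.min()), float(y.max()))        # bounded on [0, T]
```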

5.10.1 Exercises
Exercise 5.76 Construct an example that satisfies the hypothesis of Theorem 5.14.
Exercise 5.77 Use Theorem 5.15 to show that solutions of the integral equation
$$y(t) = e^t - \int_0^t \frac{y^3(s)}{(t - s + 1)^2}\, ds$$
can be continued for all future times.


Exercise 5.78 Consider the scalar nonlinear integral equation
$$y(t) = f(t) + \int_0^t g(t, s, y(s))\, ds, \quad t \ge 0.$$
Suppose $f$ is continuous. In addition, assume $g$ is continuous for $0 \le s \le t < \infty$, and for each $T > 0$ there is a continuous function $M(s, T)$ with
$$|g(t, s, y)| \le M(s, T)\big(1 + |y|\big), \quad \text{for } 0 \le s \le t \le T.$$
Show that if $y(t)$ is a solution of the above integral equation on some interval $[0, \alpha)$, then it is bounded, and, hence, it can be continued for all future times.
Hint: Convince yourself of the fact that $|f(t)| + \int_0^\alpha M(s, \alpha)\, ds \le Q$ for some positive constant $Q$, and then apply Gronwall's inequality.
Appendices
A
Fourier Series

This appendix covers the main topics of Fourier series. We briefly discuss Fourier series expansions, including sine and cosine series. We provide an application to the heat problem in a finite slab by utilizing the concept of separation of variables. We end the appendix by studying the Laplacian in circular domains.

A.1 Preliminaries
We start with some basic definitions.
Definition A.1 A function $f(x)$ is said to be periodic with period $p$ if $f(x + p) = f(x)$ for all $x$ in the domain of $f$. This means that the function will repeat itself every $p$ units. The main period is the smallest positive period of a function.
For example, the trig functions $\sin x$ and $\cos x$ are periodic with period $2\pi$, as well as with period $4\pi$, $6\pi$, $8\pi$, etc. The function $\sin nx$ is periodic with main period $\frac{2\pi}{n}$, though it also has period $2\pi$. If two functions are periodic with the same period, then any linear combination of those functions is periodic with the same period. This is an important fact, since the infinite sum
$$\frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx) \tag{A.1}$$
has period $2\pi$. Expression (A.1) is known as the Fourier series, where $a_n, b_n$ are called Fourier coefficients. Given a function $f(x)$ that is periodic with period $2\pi$, we write
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}(a_n \cos nx + b_n \sin nx), \tag{A.2}$$
where the Fourier coefficients of $f(x)$ are given by the Euler formulas
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\, dx, \tag{A.3}$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\, dx, \quad n = 1, 2, \ldots \tag{A.4}$$
and
$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\, dx, \quad n = 1, 2, \ldots \tag{A.5}$$
This is an alternative way of expressing a function as an infinite series in terms of sines and cosines. The expansion extends easily to functions that are periodic with period $2L$. In that case the series
$$\frac{a_0}{2} + \sum_{n=1}^{\infty}\Big(a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L}\Big) \tag{A.6}$$
has period $2L$. Given a function $f(x)$ that is periodic with period $2L$, we write
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\Big(a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L}\Big), \tag{A.7}$$
where the Fourier coefficients of $f(x)$ are given by the Euler formulas
$$a_0 = \frac{1}{L}\int_{-L}^{L} f(x)\, dx, \tag{A.8}$$
$$a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\Big(\frac{n\pi x}{L}\Big)\, dx, \quad n = 1, 2, \ldots \tag{A.9}$$
and
$$b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\Big(\frac{n\pi x}{L}\Big)\, dx, \quad n = 1, 2, \ldots \tag{A.10}$$
We have the following definition.
Definition A.2 Let $x_0$ be a point in the domain of a function $f$. Then,
(a) the right-hand limit of $f$ at $x_0$, denoted by $f(x_0^+)$, is defined by
$$f(x_0^+) = \lim_{x \to x_0,\, x > x_0} f(x),$$
(b) the left-hand limit of $f$ at $x_0$, denoted by $f(x_0^-)$, is defined by
$$f(x_0^-) = \lim_{x \to x_0,\, x < x_0} f(x),$$
(c) the right-hand derivative of $f$ at $x_0$, denoted by $f'(x_0^+)$, is defined by
$$f'(x_0^+) = \lim_{x \to x_0,\, x > x_0} \frac{f(x) - f(x_0^+)}{x - x_0},$$
(d) the left-hand derivative of $f$ at $x_0$, denoted by $f'(x_0^-)$, is defined by
$$f'(x_0^-) = \lim_{x \to x_0,\, x < x_0} \frac{f(x) - f(x_0^-)}{x - x_0}.$$

Remark 23 If (a) and (b) of Definition A.2 are satisfied for every $x \in (a^*, b^*)$, then we say $f$ is piecewise continuous on $(a^*, b^*)$, and we write $f \in C_p(a^*, b^*)$. If, in addition to (a) and (b), (c) and (d) of Definition A.2 are satisfied for every $x \in (a^*, b^*)$, then we say $f$ is piecewise smooth on $(a^*, b^*)$, and we write $f \in C_p'(a^*, b^*)$.
We furnish the following example.
Example A.1 Consider
$$f(x) = \begin{cases} -x, & x < 0 \\ x + 1, & x > 0. \end{cases}$$
Then
$$f(0^+) = \lim_{x \to 0,\, x > 0} f(x) = 1, \quad \text{and} \quad f(0^-) = \lim_{x \to 0,\, x < 0} f(x) = 0.$$
Moreover,
$$f'(0^+) = \lim_{x \to 0,\, x > 0} \frac{f(x) - f(0^+)}{x} = \lim_{x \to 0,\, x > 0} \frac{(x + 1) - 1}{x} = 1,$$
and
$$f'(0^-) = \lim_{x \to 0,\, x < 0} \frac{f(x) - f(0^-)}{x} = \lim_{x \to 0,\, x < 0} \frac{-x}{x} = -1.$$
We see that $f \in C_p'(-\infty, \infty)$. □


The next theorem is known as the Fourier convergence theorem.
Theorem A.1 [Fourier convergence theorem] Suppose $f \in C_p'(-L, L)$. Then the Fourier series given by (A.6), with Fourier coefficients given by (A.8), (A.9), and (A.10), converges to the function $f(x)$ at each point $x$ at which $f$ is continuous, and it converges to
$$\frac{f(x_0^+) + f(x_0^-)}{2}$$
at each point $x_0$ at which $f$ is discontinuous.

A.2 Finding the Fourier Coefficients


Finding the Fourier coefficients depends on the concept of orthogonality of the sine and cosine functions. The concept of orthogonality was discussed and used in Chapters 3, 4, and 5. Recall the trigonometric identities
$$\cos(nx)\cos(mx) = \tfrac{1}{2}\big[\cos((n + m)x) + \cos((n - m)x)\big],$$
$$\sin(nx)\sin(mx) = \tfrac{1}{2}\big[\cos((n - m)x) - \cos((n + m)x)\big],$$
$$\sin(nx)\cos(mx) = \tfrac{1}{2}\big[\sin((n + m)x) + \sin((n - m)x)\big].$$
Utilizing those trigonometric identities, one may easily show that for any integers $m \ne n$ we have
$$\int_{-\pi}^{\pi} \cos(nx)\cos(mx)\, dx = 0, \quad \int_{-\pi}^{\pi} \sin(nx)\sin(mx)\, dx = 0, \quad \int_{-\pi}^{\pi} \sin(nx)\cos(mx)\, dx = 0.$$
In addition, if $m = n$, then $\int_{-\pi}^{\pi} \sin(nx)\cos(nx)\, dx = 0$. Because these integrals are zero, we say that $\sin nx$, $\cos mx$ form an orthogonal system of functions. This is proved using the fact that $n \pm m \ne 0$ is an integer, and so $\int_{-\pi}^{\pi} \cos((n \pm m)x)\, dx = 0$ and $\int_{-\pi}^{\pi} \sin((n \pm m)x)\, dx = 0$. If $n = m$, then
$$\cos(nx)\cos(nx) = \tfrac{1}{2}(\cos(2nx) + 1) \quad \text{and} \quad \sin(nx)\sin(nx) = \tfrac{1}{2}(1 - \cos(2nx)),$$
so we can compute
$$\int_{-\pi}^{\pi} \cos(nx)\cos(nx)\, dx = \frac{1}{2}\int_{-\pi}^{\pi} (\cos(2nx) + 1)\, dx = \frac{1}{2}\Big(\frac{\sin(2nx)}{2n} + x\Big)\Big|_{-\pi}^{\pi} = \pi,$$
$$\int_{-\pi}^{\pi} \sin(nx)\sin(nx)\, dx = \frac{1}{2}\int_{-\pi}^{\pi} (1 - \cos(2nx))\, dx = \frac{1}{2}\Big(x - \frac{\sin(2nx)}{2n}\Big)\Big|_{-\pi}^{\pi} = \pi.$$
Now, if we multiply both sides of
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\big(a_n \cos(nx) + b_n \sin(nx)\big)$$
by $\cos(mx)$ and then integrate term by term, we have, by using the orthogonality just established, that
$$\int_{-\pi}^{\pi} f(x)\cos(mx)\, dx = \frac{a_0}{2}\int_{-\pi}^{\pi} \cos(mx)\, dx + \sum_{n=1}^{\infty}\Big(a_n \int_{-\pi}^{\pi} \cos(nx)\cos(mx)\, dx + b_n \int_{-\pi}^{\pi} \sin(nx)\cos(mx)\, dx\Big) = 0 + a_m\pi.$$
Hence
$$a_m = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(mx)\, dx.$$
The other coefficients are derived similarly. Now we work out some examples.
Example A.2 Let
$$f(x) = \begin{cases} 2, & 0 < x < \pi \\ -1, & -\pi < x < 0. \end{cases}$$
The function $f(x)$ has period $2\pi$. First we compute
$$a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\, dx = \frac{1}{\pi}\Big(\int_{-\pi}^{0} -1\, dx + \int_0^{\pi} 2\, dx\Big) = \frac{1}{\pi}(-\pi + 2\pi) = 1,$$
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\, dx = \frac{1}{\pi}\Big(-\int_{-\pi}^{0} \cos(nx)\, dx + 2\int_0^{\pi} \cos(nx)\, dx\Big) = \frac{1}{\pi}\Big(-\frac{\sin nx}{n}\Big|_{-\pi}^{0} + 2\,\frac{\sin nx}{n}\Big|_0^{\pi}\Big) = 0, \quad n = 1, 2, \ldots.$$
Finally,
$$\begin{aligned}
b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\, dx = \frac{1}{\pi}\Big(-\int_{-\pi}^{0} \sin(nx)\, dx + 2\int_0^{\pi} \sin(nx)\, dx\Big) \\
&= \frac{1}{\pi}\Big(\frac{\cos nx}{n}\Big|_{-\pi}^{0} - 2\,\frac{\cos nx}{n}\Big|_0^{\pi}\Big) = \frac{1}{\pi}\Big(\frac{1}{n} - \frac{\cos n\pi}{n} - \frac{2\cos n\pi}{n} + \frac{2}{n}\Big) = \frac{3}{n\pi}(1 - \cos n\pi).
\end{aligned}$$
Note that $\cos n\pi = 1$ when $n$ is even and $\cos n\pi = -1$ when $n$ is odd. Hence $b_n = \frac{6}{n\pi}$ if $n$ is odd and $b_n = 0$ if $n$ is even. This means that we can replace $n$ by $2n - 1$ in the sum and obtain
$$f(x) = \begin{cases} 2, & 0 < x < \pi \\ -1, & -\pi < x < 0 \end{cases} = \frac{1}{2} + \sum_{n=1}^{\infty} \frac{6}{(2n - 1)\pi}\sin((2n - 1)x).$$
According to Theorem A.1, the infinite sum given by the above expression converges to the function
$$g(x) = \begin{cases} \frac{1}{2} & \text{at } x = 0, \pm\pi \\ f(x) & \text{otherwise.} \end{cases}$$
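The computation can be checked numerically; the sketch below approximates the Euler formulas by trapezoidal quadrature and sums the series at a point of continuity.

```python
import numpy as np

# Numerical check of Example A.2: coefficients by trapezoidal quadrature,
# then a partial sum of the series at x0 = pi/2, where f is continuous.
f = lambda x: np.where(x > 0, 2.0, -1.0)
x = np.linspace(-np.pi, np.pi, 200001)
h = x[1] - x[0]
trap = lambda y: h * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoid rule

a0 = trap(f(x)) / np.pi
b = [trap(f(x) * np.sin(n * x)) / np.pi for n in range(1, 7)]
print(a0, b)      # a0 ~ 1; b_n ~ 6/(n*pi) for odd n, ~ 0 for even n

x0, S = np.pi / 2, a0 / 2
for n in range(1, 5000):
    S += 6.0 / ((2 * n - 1) * np.pi) * np.sin((2 * n - 1) * x0)
print(S)          # -> 2, the value of f at x0
```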


We now consider a function with period 8.
Example A.3 Consider the function
$$f(x) = \begin{cases} 0, & -4 < x < -2 \\ 1, & -2 < x < 2 \\ 0, & 2 < x < 4. \end{cases}$$
This function is $1$ for $-2 < x < 2$, $6 < x < 10$, etc. It is a regular pulse which is on for four units of time, and then off for four units of time. Since the period is not $2\pi$, but instead $2L = 8$, we have $L = 4$. The Fourier coefficients are
$$a_0 = \frac{1}{4}\int_{-4}^{4} f(x)\, dx = \frac{1}{4}\int_{-2}^{2} 1\, dx = 1,$$
$$a_n = \frac{1}{4}\int_{-4}^{4} f(x)\cos\frac{n\pi x}{4}\, dx = \frac{1}{4}\int_{-2}^{2} \cos\frac{n\pi x}{4}\, dx = \frac{1}{n\pi}\sin\frac{n\pi x}{4}\Big|_{-2}^{2} = \frac{1}{n\pi}\Big(\sin\frac{n\pi}{2} - \sin\frac{-n\pi}{2}\Big) = \frac{2}{n\pi}\sin\frac{n\pi}{2},$$
$$b_n = \frac{1}{4}\int_{-4}^{4} f(x)\sin\frac{n\pi x}{4}\, dx = \frac{1}{4}\int_{-2}^{2} \sin\frac{n\pi x}{4}\, dx = -\frac{1}{n\pi}\cos\frac{n\pi x}{4}\Big|_{-2}^{2} = -\frac{1}{n\pi}\Big(\cos\frac{n\pi}{2} - \cos\frac{-n\pi}{2}\Big) = 0.$$
If $n$ is even, then $a_n = 0$, since sine vanishes at integer multiples of $\pi$. Thus the $a_n$ contribute nonzero values only for odd $n$, and so we may replace $n$ by $2n - 1$. With this in mind, the Fourier series can be written as
$$f(x) = \frac{1}{2} + \frac{2}{\pi}\sum_{n=1}^{\infty} \frac{\sin\big(\frac{(2n - 1)\pi}{2}\big)}{2n - 1}\cos\Big(\frac{(2n - 1)\pi x}{4}\Big).$$

A.3 Even and Odd Extensions


Any function $f(x)$ which satisfies $f(-x) = f(x)$ is called an even function. Similarly, any function that satisfies $f(-x) = -f(x)$ is called an odd function. Even functions are symmetric about the $y$-axis and odd functions are symmetric about the origin. For example, $f(x) = \cos(x)$ is an even function since $\cos(-x) = \cos(x)$, and $f(x) = \sin(x)$ is an odd function since $\sin(-x) = -\sin(x)$. Using the concept of odd and even functions, one can easily show, using (A.9) and (A.10), that the Fourier coefficients of an even function are simply
$$a_n = \frac{2}{L}\int_0^{L} f(x)\cos\frac{n\pi x}{L}\, dx, \quad n = 0, 1, \ldots \qquad \text{and} \qquad b_n = 0, \quad n = 1, 2, \ldots,$$
and the corresponding Fourier series is called a Fourier cosine series. Similarly, for an odd function the coefficients are
$$a_n = 0, \quad n = 0, 1, \ldots \qquad \text{and} \qquad b_n = \frac{2}{L}\int_0^{L} f(x)\sin\frac{n\pi x}{L}\, dx, \quad n = 1, 2, \ldots,$$
and the corresponding Fourier series is called a Fourier sine series. This comes about because the product of two even functions is even, the product of two odd functions is even, and the product of an even and an odd function is odd. In addition, the integral from $-L$ to $L$ of an odd function is zero, while the integral from $-L$ to $L$ of an even function is twice the integral from $0$ to $L$.
Consider the sawtooth wave, which is given by the function $f(x) = x + \pi$ for $-\pi < x < \pi$, and $f(x + 2\pi) = f(x)$. It can be written as the sum of an even function $f_1(x) = \pi$ and an odd function $f_2(x) = x$. The corresponding Fourier cosine and sine series are $f_1 = \pi$ and $f_2 = 2\big(\sin x - \frac{1}{2}\sin 2x + \frac{1}{3}\sin 3x - \frac{1}{4}\sin 4x + \cdots\big)$. Addition of the series gives
$$f(x) = \pi + 2\Big(\sin x - \frac{1}{2}\sin 2x + \frac{1}{3}\sin 3x - \frac{1}{4}\sin 4x + \cdots\Big).$$
(The coefficients $b_n$ are obtained using integration by parts, and $b_n = -\frac{2}{n}\cos n\pi$.)
If a function is defined on the interval $[0, L]$, then it is possible to expand the function periodically onto the interval $[-L, 0]$ by either using an even expansion (reflection about the $y$-axis) or an odd expansion (reflection about the origin). Both expansions are called half-range expansions. The Fourier series of an even half-range expansion is the Fourier cosine series, and the Fourier series of an odd half-range expansion is the Fourier sine series. We have the following. Let $f_e$ and $f_o$ denote the even and odd extensions of a $2L$-periodic function $f$. Then,
$$f_e(x) = \begin{cases} f(x), & 0 < x < L, \\ f(-x), & -L < x < 0, \end{cases} \qquad f_o(x) = \begin{cases} f(x), & 0 < x < L, \\ -f(-x), & -L < x < 0. \end{cases}$$
Thus, the periodic odd extension of a function $f$ that is piecewise continuous on the interval $(0, L)$ is the Fourier sine series of $f$ given by
$$f(x) = \sum_{n=1}^{\infty} b_n \sin\Big(\frac{n\pi x}{L}\Big), \quad 0 < x < L, \tag{A.11}$$
FIGURE A.1 Periodic even extension.

where
$$b_n = \frac{2}{L}\int_0^{L} f(x)\sin\Big(\frac{n\pi x}{L}\Big)\, dx, \quad n = 1, 2, \ldots. \tag{A.12}$$
Similarly, the periodic even extension of a function $f$ that is piecewise continuous on the interval $(0, L)$ is the Fourier cosine series of $f$ given by
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos\Big(\frac{n\pi x}{L}\Big), \quad 0 < x < L, \tag{A.13}$$
where
$$a_0 = \frac{2}{L}\int_0^{L} f(x)\, dx, \quad \text{and} \quad a_n = \frac{2}{L}\int_0^{L} f(x)\cos\Big(\frac{n\pi x}{L}\Big)\, dx, \quad n = 1, 2, \ldots.$$
In Fig. A.1, we display the periodic even extension generated by the Fourier cosine series of
$$f(x) = \frac{x}{L} + 1, \quad 0 < x < L.$$
Every term in the Fourier cosine series is $2L$-periodic. Note that the periodic even extension does not introduce new jumps. Similarly, if we consider the Fourier sine series of
$$f(x) = \frac{x}{L} + 1, \quad 0 < x < L,$$
every term in the Fourier sine series is $2L$-periodic. The periodic odd extension does not introduce new jumps if and only if $f(0) = f(L) = 0$. The two figures below illustrate both cases.
We provide the following examples.
FIGURE A.2 Discontinuous periodic odd extension.
FIGURE A.3 Continuous periodic odd extension when we require f(0) = f(L) = 0.

Example A.4 Let us find the Fourier cosine series of $f(x) = x$, $0 < x < \pi$. It is easy to see that
$$a_0 = \frac{2}{\pi}\int_0^{\pi} x\, dx = \pi.$$
Using integration by parts, we find that
$$a_n = \frac{2}{\pi}\int_0^{\pi} x\cos(nx)\, dx = \frac{2}{\pi}\,\frac{(-1)^n - 1}{n^2}, \quad n = 1, 2, \ldots.$$
By noticing that $(-1)^n - 1 = 0$ when $n$ is even and $(-1)^n - 1 = -2$ when $n$ is odd, we may replace $(-1)^n - 1$ with $-2$ and use $2n - 1$ for $n$ in the summation. Thus,
$$x = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos(2n - 1)x}{(2n - 1)^2}, \quad 0 < x < \pi,$$
which is the periodic even extension of $f(x) = x$. Note that
$$f_e(x) = |x|, \quad -\pi \le x \le \pi.$$
In conclusion, we may safely write
$$|x| = \frac{\pi}{2} - \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\cos(2n - 1)x}{(2n - 1)^2}, \quad -\pi \le x \le \pi.$$

A.4 Applications of Fourier Series


Fourier series are particularly useful in the telecommunications and graphics industries, in areas such as cell phones, internet, land lines, and radio communication. Radio transmitters send out radio waves, which are essentially periodic vibrations of space. Antennae detect these vibrations as they are transmitted in all directions. In order to determine the received signal's coefficients, your radio receiver computes Fourier integrals. Fourier series also play a significant part in heat transfer modeling. Engineers use the Fourier series to simulate the heat transfer in spacecraft, automobiles, jet engines, and any other system that can malfunction due to overheating. In this section we will use Fourier series to solve the heat equation with Neumann conditions on a bounded domain. In Section 2.7.3 we studied the heat equation on a semi-infinite domain with a Neumann condition. In our analysis, we will utilize the concept of separation of variables after we recast the PDE into ODEs. Thus, we consider the heat problem on a finite slab
$$\begin{cases} u_t - ku_{xx} = 0, & 0 < x < c,\ t > 0 \\ u(x, 0) = f(x), & 0 < x < c \\ u_x(0, t) = 0, \quad u_x(c, t) = 0, & t > 0. \end{cases} \tag{A.14}$$

We are interested in finding a nontrivial solution $u(x, t)$ of (A.14) that satisfies the boundary and the initial conditions. We seek separated solutions of the form
$$u(x, t) = X(x)T(t), \tag{A.15}$$
where $X$ is a function of $x$ alone and $T$ is a function of $t$ alone. Note, too, that $X$ and $T$ must be nontrivial. That is, $X \ne 0$ and $T \ne 0$. By differentiating (A.15) with respect to $t$ and $x$ and substituting into (A.14), we obtain the relation
$$X(x)T'(t) = kX''(x)T(t).$$
Since $X(x) \ne 0$ and $T(t) \ne 0$, we may divide by the term $kX(x)T(t)$ to separate the variables. That is,
$$\frac{X''(x)}{X(x)} = \frac{T'(t)}{kT(t)}.$$
Since the left-hand side is a function of $x$ alone, and the right-hand side is a function of $t$ alone, the two sides must have a common constant value $-\lambda$. That is,
$$\frac{X''(x)}{X(x)} = \frac{T'(t)}{kT(t)} = -\lambda.$$
Now we check the Neumann boundary conditions. $0 = u_x(0, t) = X'(0)T(t)$ implies that $X'(0) = 0$. Similarly, $0 = u_x(c, t) = X'(c)T(t)$ implies that $X'(c) = 0$. Thus, we arrive at the Sturm-Liouville problem
$$X''(x) + \lambda X(x) = 0, \quad X'(0) = 0, \quad X'(c) = 0, \tag{A.16}$$
and at the first-order ordinary differential equation
$$T'(t) + \lambda k T(t) = 0, \quad t > 0. \tag{A.17}$$

One can easily argue as in Section 4.14 and determine that (A.16) has only the trivial solution for $\lambda < 0$. For $\lambda = 0$, we have from (A.16) that $X''(x) = 0$, which has the solution $X(x) = Ax + B$. Applying the boundary conditions, we get $A = 0$ while $B$ is arbitrary, and so we set it equal to one. Thus, for $\lambda_0 = 0$, the corresponding eigenfunction is $X_0(x) = 1$. Now for $\lambda > 0$, we assume $\lambda = \alpha^2$ for positive $\alpha$. Then the general solution of (A.16) is
$$X(x) = A\cos(\alpha x) + B\sin(\alpha x),$$
and hence
$$X'(x) = -A\alpha\sin(\alpha x) + B\alpha\cos(\alpha x).$$
Applying $X'(0) = 0$, we automatically get $B = 0$. Applying $X'(c) = 0$, with $B = 0$ already, we arrive at
$$-A\alpha\sin(\alpha c) = 0.$$
To obtain a nontrivial solution we set $\sin(\alpha c) = 0$. This gives $\alpha c = n\pi$, $n = 1, 2, \ldots$, and so $\alpha = \frac{n\pi}{c}$. Thus, for $\lambda_n = \big(\frac{n\pi}{c}\big)^2$, the corresponding eigenfunctions are given by
$$X_n(x) = \cos\Big(\frac{n\pi x}{c}\Big), \quad n = 1, 2, \ldots,$$
where we set $A = 1$. Turning to (A.17), we need to solve it for the already determined eigenvalues $\lambda_0$ and $\lambda_n$, $n = 1, 2, \ldots$. For $\lambda_0 = 0$, equation (A.17) has as solution any constant multiple of $T_0(t) = 1$. Similarly, for $\lambda_n$ we have the corresponding solutions
$$T_n(t) = e^{-\frac{n^2\pi^2 k}{c^2}t}, \quad n = 1, 2, \ldots.$$
Thus, for each $n$, the product
$$u_n(x, t) = e^{-\frac{n^2\pi^2 k}{c^2}t}\cos\Big(\frac{n\pi x}{c}\Big)$$
satisfies the heat equation, along with $u_0(x, t) = 1$.
Note that each $u_n$ satisfies both Neumann conditions. Now, by the superposition principle, the general solution of (A.14) may be written as
$$u(x, t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n e^{-\frac{n^2\pi^2 k}{c^2}t}\cos\Big(\frac{n\pi x}{c}\Big). \tag{A.18}$$

By applying the initial condition $u(x, 0) = f(x)$ to (A.18), we obtain the Fourier cosine series
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos\Big(\frac{n\pi x}{c}\Big),$$
where
$$a_0 = \frac{2}{c}\int_0^{c} f(x)\, dx, \quad \text{and} \quad a_n = \frac{2}{c}\int_0^{c} f(x)\cos\Big(\frac{n\pi x}{c}\Big)\, dx, \quad n = 1, 2, \ldots.$$
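The series (A.18) is straightforward to evaluate numerically. The sketch below uses the illustrative initial temperature $f(x) = x$ on $(0, c)$ (any piecewise smooth choice works), computes the cosine coefficients by quadrature, and shows the temperature flattening toward the mean value $a_0/2$, as the insulated boundaries suggest.

```python
import numpy as np

# Series solution (A.18) of the Neumann heat problem (A.14) for the
# illustrative initial temperature f(x) = x on (0, c).
c, k, Nterms = 1.0, 0.1, 60
xq = np.linspace(0.0, c, 4001)
hq = xq[1] - xq[0]
trap = lambda y: hq * (y.sum() - 0.5 * (y[0] + y[-1]))  # trapezoid rule

a0 = (2.0 / c) * trap(xq)                               # f(x) = x
a = [(2.0 / c) * trap(xq * np.cos(n * np.pi * xq / c))
     for n in range(1, Nterms + 1)]

def u(x, t):
    s = (a0 / 2) * np.ones_like(x)
    for n in range(1, Nterms + 1):
        s += (a[n - 1] * np.exp(-(n * np.pi / c) ** 2 * k * t)
              * np.cos(n * np.pi * x / c))
    return s

x = np.linspace(0.0, c, 6)
print(u(x, 0.0))   # ~ f(x): the even extension of x is |x|, continuous
print(u(x, 5.0))   # -> a0/2 = 1/2, the mean of f (insulated boundaries)
```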

A.5 Laplacian in Polar, Cylindrical and Spherical Coordinates
In this section, we will discuss the Laplacian $\nabla^2 u$ in polar, cylindrical, and spherical coordinates. When solving boundary value problems in more than one dimension, it is often necessary to use coordinate systems other than the Cartesian. It is then important to be able to express the Laplacian operator in these coordinate systems. In two dimensions the Laplacian can be written
$$\nabla^2 u = \frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2},$$
while in three dimensions we write
$$\nabla^2 u = \frac{\partial^2 u(x, y, z)}{\partial x^2} + \frac{\partial^2 u(x, y, z)}{\partial y^2} + \frac{\partial^2 u(x, y, z)}{\partial z^2}.$$
In either case Laplace's equation reads
$$\nabla^2 u = 0,$$
and we will have to learn how to express the Laplacian
$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$
in different ways, such as in polar, cylindrical, or spherical coordinates.
We begin by considering the Laplacian in polar coordinates. Thus, we make use of the transformation
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
Laplace's equation in this coordinate system can be shown to be
$$\nabla^2 u(r, \theta) = \frac{\partial^2 u(r, \theta)}{\partial r^2} + \frac{1}{r}\frac{\partial u(r, \theta)}{\partial r} + \frac{1}{r^2}\frac{\partial^2 u(r, \theta)}{\partial \theta^2}. \tag{A.19}$$
In cylindrical coordinates, we set
$$x = \rho\cos\varphi, \qquad y = \rho\sin\varphi, \qquad z = z,$$
and it can also be shown that Laplace's equation in cylindrical coordinates takes the form
$$\nabla^2 u(\rho, \varphi, z) = \frac{\partial^2 u(\rho, \varphi, z)}{\partial \rho^2} + \frac{1}{\rho}\frac{\partial u(\rho, \varphi, z)}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 u(\rho, \varphi, z)}{\partial \varphi^2} + \frac{\partial^2 u(\rho, \varphi, z)}{\partial z^2}. \tag{A.20}$$
Note that expression (A.19) is a special case of (A.20), obtained by simply holding $z$ constant. Finally, in spherical coordinates
$$x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta,$$
we have
$$\nabla^2 u(r, \theta, \varphi) = \frac{1}{r^2}\Big[\frac{\partial}{\partial r}\Big(r^2\frac{\partial u(r, \theta, \varphi)}{\partial r}\Big) + \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\Big(\sin\theta\frac{\partial u(r, \theta, \varphi)}{\partial \theta}\Big) + \frac{1}{\sin^2\theta}\frac{\partial^2 u(r, \theta, \varphi)}{\partial \varphi^2}\Big]. \tag{A.21}$$
Next, we give a brief derivation of (A.20). We already know that
$$\rho = \sqrt{x^2 + y^2}, \quad \text{and} \quad \varphi = \arctan\frac{y}{x}.$$
Using the chain rule we have
$$\frac{\partial u}{\partial x} = \frac{\partial u}{\partial \rho}\frac{\partial \rho}{\partial x} + \frac{\partial u}{\partial \varphi}\frac{\partial \varphi}{\partial x} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial x}.$$
However,
$$\frac{\partial \rho}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}} = \frac{x}{\rho} = \frac{\rho\cos\varphi}{\rho} = \cos\varphi.$$
In similar fashion, we arrive at
$$\frac{\partial \rho}{\partial y} = \sin\varphi, \qquad \frac{\partial \varphi}{\partial x} = -\frac{\sin\varphi}{\rho}, \qquad \frac{\partial \varphi}{\partial y} = \frac{\cos\varphi}{\rho}.$$
Substituting into $\frac{\partial u}{\partial x}$, we have
$$\frac{\partial u}{\partial x} = \cos\varphi\frac{\partial u}{\partial \rho} - \frac{\sin\varphi}{\rho}\frac{\partial u}{\partial \varphi}, \tag{A.22}$$
since $\frac{\partial z}{\partial x} = 0$. To obtain $\frac{\partial^2 u}{\partial x^2}$, we replace the function $u$ in (A.22) by $\frac{\partial u}{\partial x}$. That is,
$$\begin{aligned}
\frac{\partial^2 u}{\partial x^2} &= \cos\varphi\frac{\partial}{\partial \rho}\Big(\frac{\partial u}{\partial x}\Big) - \frac{\sin\varphi}{\rho}\frac{\partial}{\partial \varphi}\Big(\frac{\partial u}{\partial x}\Big) \\
&= \cos\varphi\frac{\partial}{\partial \rho}\Big(\cos\varphi\frac{\partial u}{\partial \rho} - \frac{\sin\varphi}{\rho}\frac{\partial u}{\partial \varphi}\Big) - \frac{\sin\varphi}{\rho}\frac{\partial}{\partial \varphi}\Big(\cos\varphi\frac{\partial u}{\partial \rho} - \frac{\sin\varphi}{\rho}\frac{\partial u}{\partial \varphi}\Big) \\
&= \cos\varphi\Big(\cos\varphi\frac{\partial^2 u}{\partial \rho^2} + \frac{\sin\varphi}{\rho^2}\frac{\partial u}{\partial \varphi} - \frac{\sin\varphi}{\rho}\frac{\partial^2 u}{\partial \rho\,\partial \varphi}\Big) \\
&\quad - \frac{\sin\varphi}{\rho}\Big(-\sin\varphi\frac{\partial u}{\partial \rho} + \cos\varphi\frac{\partial^2 u}{\partial \varphi\,\partial \rho} - \frac{\cos\varphi}{\rho}\frac{\partial u}{\partial \varphi} - \frac{\sin\varphi}{\rho}\frac{\partial^2 u}{\partial \varphi^2}\Big).
\end{aligned}$$
Using the fact that
$$\frac{\partial^2 u}{\partial \rho\,\partial \varphi} = \frac{\partial^2 u}{\partial \varphi\,\partial \rho},$$
the above expression simplifies to
$$\frac{\partial^2 u}{\partial x^2} = \cos^2\varphi\frac{\partial^2 u}{\partial \rho^2} - \frac{2\sin\varphi\cos\varphi}{\rho}\frac{\partial^2 u}{\partial \varphi\,\partial \rho} + \frac{\sin^2\varphi}{\rho^2}\frac{\partial^2 u}{\partial \varphi^2} + \frac{\sin^2\varphi}{\rho}\frac{\partial u}{\partial \rho} + \frac{2\sin\varphi\cos\varphi}{\rho^2}\frac{\partial u}{\partial \varphi}. \tag{A.23}$$
By similar steps, we see that
$$\frac{\partial u}{\partial y} = \frac{\partial u}{\partial \rho}\frac{\partial \rho}{\partial y} + \frac{\partial u}{\partial \varphi}\frac{\partial \varphi}{\partial y} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial y} = \sin\varphi\frac{\partial u}{\partial \rho} + \frac{\cos\varphi}{\rho}\frac{\partial u}{\partial \varphi}.$$
Moreover,
$$\frac{\partial^2 u}{\partial y^2} = \sin^2\varphi\frac{\partial^2 u}{\partial \rho^2} + \frac{2\sin\varphi\cos\varphi}{\rho}\frac{\partial^2 u}{\partial \varphi\,\partial \rho} + \frac{\cos^2\varphi}{\rho^2}\frac{\partial^2 u}{\partial \varphi^2} + \frac{\cos^2\varphi}{\rho}\frac{\partial u}{\partial \rho} - \frac{2\sin\varphi\cos\varphi}{\rho^2}\frac{\partial u}{\partial \varphi}. \tag{A.24}$$
A substitution of (A.23) and (A.24) into
$$\nabla^2 u = \frac{\partial^2 u(x, y, z)}{\partial x^2} + \frac{\partial^2 u(x, y, z)}{\partial y^2} + \frac{\partial^2 u(x, y, z)}{\partial z^2} = 0$$
leads to (A.20).
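The identity can also be confirmed symbolically, assuming the sympy library is available. The sketch below compares the Cartesian Laplacian with the right-hand side of (A.19) (the $z$-independent section of (A.20)) for one arbitrary smooth test function; any other choice works equally well.

```python
import sympy as sp

rho, phi = sp.symbols('rho phi', positive=True)
x, y = sp.symbols('x y')

# an arbitrary smooth test function of (x, y)
u_xy = x**3 * y - sp.exp(x) * sp.cos(y)

# Cartesian Laplacian, rewritten in polar coordinates afterwards
lap_cart = (sp.diff(u_xy, x, 2) + sp.diff(u_xy, y, 2)).subs(
    {x: rho * sp.cos(phi), y: rho * sp.sin(phi)})

# the same function expressed in polar coordinates, then (A.19)
u_p = u_xy.subs({x: rho * sp.cos(phi), y: rho * sp.sin(phi)})
lap_polar = (sp.diff(u_p, rho, 2) + sp.diff(u_p, rho) / rho
             + sp.diff(u_p, phi, 2) / rho**2)

print(sp.simplify(lap_cart - lap_polar))   # prints 0
```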
For an application, we consider the Laplacian given by equation (A.19) on an annulus. Using simplified notation, we write
$$u_{\rho\rho} + \frac{1}{\rho}u_{\rho} + \frac{1}{\rho^2}u_{\varphi\varphi} = 0, \quad 1 < \rho < 2, \quad 0 < \varphi < \pi, \tag{A.25}$$
along with boundary conditions
$$u(\rho, 0) = 0, \qquad u(\rho, \pi) = u_0, \tag{A.26}$$
$$u(1, \varphi) = 0, \qquad u(2, \varphi) = 0, \tag{A.27}$$


as depicted in Fig. A.4. As in Section A.4, we seek a nontrivial solution of the form
$$u(\rho, \varphi) = R(\rho)\Theta(\varphi).$$
Differentiate and then substitute into (A.25), then divide the resulting equation by $R(\rho)\Theta(\varphi)$. Finally, separating the variables and setting both sides equal to $-\lambda$, we arrive at the two second-order differential equations
$$\Theta'' - \lambda\Theta = 0, \qquad \Theta(0) = 0,$$
and
$$R''(\rho) + \frac{1}{\rho}R'(\rho) + \frac{\lambda}{\rho^2}R(\rho) = 0, \quad \rho > 0, \qquad R(1) = R(2) = 0.$$
Using the method of Section 1.11, we obtain
$$R''(t) + \lambda R(t) = 0, \quad \text{where } \rho = e^t.$$

It is easy to verify that $R(t) = 0$ for $\lambda \le 0$. For $\lambda > 0$, we assume $\lambda = \alpha^2$, $\alpha > 0$. Then
$$R''(t) + \alpha^2 R(t) = 0$$
has the general solution
$$R(t) = A\cos(\alpha t) + B\sin(\alpha t),$$
and in terms of $\rho$, the solution takes the form
$$R(\rho) = A\cos(\alpha\ln(\rho)) + B\sin(\alpha\ln(\rho)).$$
Applying $0 = R(1)$, we obtain $A = 0$. Applying the second boundary condition and keeping in mind that $A = 0$, we arrive at $B\sin(\alpha\ln(2)) = 0$, from which we obtain $\alpha\ln(2) = n\pi$, $n = 1, 2, \ldots$. This yields
$$\alpha_n = \frac{n\pi}{\ln(2)}, \quad n = 1, 2, \ldots.$$
Thus, the corresponding eigenfunctions are given by
$$R_n(\rho) = \sin(\alpha_n\ln(\rho)), \quad n = 1, 2, \ldots.$$

Next we normalize the eigenfunctions with respect to the weight function $p = \frac{1}{\rho}$ by setting
$$\zeta_n = \frac{R_n(\rho)}{\|R_n\|}, \quad n = 1, 2, \ldots, \qquad \|R_n\|^2 = \int_1^2 \frac{\sin^2(\alpha_n\ln(\rho))}{\rho}\, d\rho.$$
For learning purposes, we show the necessary steps for evaluating the integral.
$$\int_1^2 \frac{\sin^2(\alpha_n\ln(\rho))}{\rho}\, d\rho = \int_1^2 \frac{1 - \cos(2\alpha_n\ln(\rho))}{2\rho}\, d\rho = \frac{1}{2}\ln(\rho)\Big|_1^2 - \int_1^2 \frac{\cos(2\alpha_n\ln(\rho))}{2\rho}\, d\rho = \frac{\ln(2)}{2} - \int_1^2 \frac{\cos(2\alpha_n\ln(\rho))}{2\rho}\, d\rho.$$
To evaluate the remaining integral, we make the substitution $u = 2\alpha_n\ln(\rho)$. Then $\frac{du}{2\alpha_n} = \frac{d\rho}{\rho}$, and
$$\int_1^2 \frac{\cos(2\alpha_n\ln(\rho))}{2\rho}\, d\rho = \int_0^{2\alpha_n\ln(2)} \frac{\cos(u)}{4\alpha_n}\, du = \frac{\sin(2\alpha_n\ln(2))}{4\alpha_n} - 0 = \frac{\sin(2n\pi)}{4\alpha_n} = 0.$$
Thus,
$$\|R_n\| = \sqrt{\frac{\ln(2)}{2}}.$$
Hence, the normalized eigenfunctions are given by
$$\zeta_n(\rho) = \sqrt{\frac{2}{\ln(2)}}\,\sin\big(\alpha_n\ln(\rho)\big), \quad n = 1, 2, \ldots.$$
FIGURE A.4 Laplacian on an annulus: $\nabla^2 u = 0$ for $1 < \rho < 2$, $0 < \varphi < \pi$, with $u = 0$ on the arcs $\rho = 1$ and $\rho = 2$ and on the segment $\varphi = 0$, and $u = u_0$ on the segment $\varphi = \pi$.

Similarly, for the same eigenvalues $\lambda_n = \alpha_n^2$, the system
$$\Theta'' - \lambda\Theta = 0, \qquad \Theta(0) = 0,$$
has the general solution
$$\Theta_n(\varphi) = c_1 e^{\alpha_n\varphi} + c_2 e^{-\alpha_n\varphi}.$$
An application of the boundary condition leads to $c_2 = -c_1$. By setting $c_1 = \frac{1}{2}$, we then have $c_2 = -\frac{1}{2}$, and the solution takes the form
$$\Theta_n(\varphi) = \sinh\big(\alpha_n\varphi\big), \quad n = 1, 2, \ldots.$$
So, the solution of (A.25) can be written as
$$u(\rho, \varphi) = \sum_{n=1}^{\infty} d_n\sinh(\alpha_n\varphi)\zeta_n(\rho).$$

Using $u(\rho, \pi) = u_0$, we get
$$u_0 = \sum_{n=1}^{\infty} d_n\sinh(\alpha_n\pi)\zeta_n(\rho). \tag{A.28}$$
Recall that the set of functions given by $\zeta_n$ is normalized with respect to the weight function $p = \frac{1}{\rho}$, and therefore
$$\int_1^2 \zeta_n(\rho)\zeta_m(\rho)\frac{1}{\rho}\, d\rho = \begin{cases} 1, & m = n \\ 0, & m \ne n. \end{cases}$$
Hence, by multiplying both sides of (A.28) by $\zeta_m(\rho)\frac{1}{\rho}$ and then integrating with respect to $\rho$ from $1$ to $2$, we arrive at
$$u_0\int_1^2 \zeta_n(\rho)\frac{1}{\rho}\, d\rho = d_n\sinh(\alpha_n\pi),$$
or
$$u_0\sqrt{\frac{2}{\ln(2)}}\int_1^2 \frac{1}{\rho}\sin(\alpha_n\ln(\rho))\, d\rho = d_n\sinh(\alpha_n\pi).$$
From this we obtain, after integrating,
$$d_n = \frac{1}{\sinh(\alpha_n\pi)}\,\frac{u_0\sqrt{2\ln(2)}}{\pi}\,\frac{1 - (-1)^n}{n}.$$
Accordingly, the solution takes the form
$$u(\rho, \varphi) = \frac{2u_0}{\pi}\sum_{n=1}^{\infty} \frac{1 - (-1)^n}{n}\,\frac{\sinh(\alpha_n\varphi)}{\sinh(\alpha_n\pi)}\,\sin(\alpha_n\ln(\rho)),$$
where
$$\alpha_n = \frac{n\pi}{\ln(2)}, \quad n = 1, 2, \ldots.$$
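As a final sanity check, the partial sums of this series can be evaluated directly. The sketch below (truncation level chosen arbitrarily) recovers the boundary value $u_0$ at $\varphi = \pi$ and values near zero on the remaining three boundary pieces.

```python
import numpy as np

# Partial-sum evaluation of the annulus solution with u0 = 1.
u0, ln2 = 1.0, np.log(2.0)

def u(rho, phi, N=500):
    n = np.arange(1, N + 1)
    alpha = n * np.pi / ln2
    coef = (2 * u0 / np.pi) * (1 - (-1.0) ** n) / n
    # sinh(alpha*phi)/sinh(alpha*pi) written with exponentials so that
    # large alpha does not overflow
    ratio = (np.exp(alpha * (phi - np.pi)) * (1 - np.exp(-2 * alpha * phi))
             / (1 - np.exp(-2 * alpha * np.pi)))
    return float(np.sum(coef * ratio * np.sin(alpha * np.log(rho))))

print(u(1.5, np.pi))          # ~ u0 = 1 (slowly, as a Fourier sine series)
print(u(1.5, np.pi / 2))      # interior value, strictly between 0 and u0
print(u(1.0001, np.pi / 2), u(1.9999, np.pi / 2), u(1.5, 0.0))
# ~ 0 near rho = 1, rho = 2, and on phi = 0
```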