Mathematical Techniques

An Introduction for the Engineering, Physical, and Mathematical Sciences

FOURTH EDITION

D. W. Jordan and P. Smith


Department of Mathematics
Keele University

Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© D. W. Jordan and P. Smith, 2008
The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First edition 1994
Second edition 1997
Third edition 2002
Fourth edition 2008
Reprinted 2010
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Typeset by Graphicraft Limited, Hong Kong
Printed in Italy
on acid-free paper by
L.E.G.O S.p.A. – Lavis TN

ISBN 978–0–19–928201–2

3 5 7 9 10 8 6 4 2
Preface to the fourth edition

This revised and updated edition of Mathematical Techniques, published in previous
editions in 1994, 1997, and 2002, is a student text covering the techniques
used in the early stages of science and engineering degrees and also providing the
groundwork of methods needed by first and second year mathematics specialists.
The requirements of such students determined its content and presentation,
helped in many ways by the authors’ long and continuous experience of teaching
the material to a great diversity of joint and specialist degree students at
Keele University and elsewhere, including many who started with only a minimal
background in mathematics.
The book has been completely re-set, with an improved layout of the text and
diagrams, and incorporating a discreet use of colour. The textual differences
from earlier editions consist of the inclusion of two new sections (on nonlinear
differential equations in Chapter 23 and on Stokes’s theorem in Chapter 34);
introductions to every chapter; self-test exercises at the end of most sections;
thorough revision of all the problems; and numerous refinements of the text in
the interests of clarity. There is also a list of references under Further Reading,
and a new appendix on physical dimensions and units. However, the sequence of
material has not been appreciably altered for this edition and users of the earlier
editions should experience minimum disruption.

Prerequisites and style


From Chapter 2 onwards, so far as is possible, all the fundamental supporting
topics start from scratch. For example, it is not assumed that the reader is at all
familiar with calculus. Most science and engineering students are likely to have
had some previous exposure, and if so, the text can be used for revision as well as
to extend their knowledge of the subject. The same is true for vector algebra,
matrix algebra, and probability. Very simple numerical methods for equation-
solving, integration and the solution of differential equations are introduced at
the points where they can effectively illustrate the main text, rather than being
collected together in a separate chapter.
Students of science should benefit by cultivating geometrical reasoning, and
the reader is encouraged to look out for some kind of graphical or numerical
reality to attach to symbolic statements. Where possible we make use of geometrical
intuition rather than analytic proof, since rigorous proofs cannot be
understood without substantial supporting courses in formal analysis.
There are over 500 fully worked examples in the text to illustrate the use and
applications of the techniques explained in the individual sections.
Attempts to generate interest by using specialised examples from physics,
chemistry, engineering, etc. are liable to misfire for many students, who have
enough to do at first in grasping the underlying mathematical processes, and are
confused by layers of scientific vocabulary and unfamiliar notations. We have
therefore placed certain technical applications such as phasors, wave motion and
circuit analysis in separate chapters or sections, so that they can be avoided if they
do not suit a particular class. Most of the applications in the main text are drawn
from common knowledge, or can easily be understood.


Confidence is half the battle in learning mathematics, and it is very encouraging
for students to learn how to do something so as to get it nearly right most
times. Continual practice, even to the point of repetitive drill, is one way to
achieve this. The beneficial effects of practice can be obtained without using
very complicated and difficult exercises, so on the whole we have avoided such
problems.

Navigating the book


The book is arranged so as to enable students to use it with the minimum of
guidance, and as a source of reference. The same features will enable teachers
to accommodate limitations on difficulty or relevance applying to a particular
class, and to select a coherent course from the available material. We have tried to
organize the text accordingly.
The book is divided into eight parts, roughly corresponding to subjects that
commonly go together at a particular level; they give a snapshot of the techniques
covered:
I Elementary methods, differentiation, complex numbers
II Matrix and vector algebra
III Integration and differential equations
IV Transforms and Fourier series
V Multivariable calculus
VI Discrete mathematics
VII Probability and statistics
VIII Projects
Each part contains a group of chapters. Most chapters are short – the average
length of all chapters is about 20 pages – and each of the several sections within
a chapter includes not more than one or two new ideas. There is extensive cross-
referencing, and a very detailed index. The principal results are displayed in
detailed summary form in shaded boxes, and for revision, or in desperate cases,
progress might be made by attending only to the boxes.

Supplementary material
The book has an associated Resource Centre at Oxford University Press which is
open access at
www.oxfordtextbooks.co.uk/orc/jordan_smith4e
It includes a Solutions Manual with model solutions of over 3000 end-of-chapter
problems in Mathematical Techniques, and a Computer Program Companion
which lists Mathematica™ programs for use with Chapter 42. It also features
figures from the book in electronic format, for lecturers to use for teaching
purposes.
The supplementary Chapter 42 comprises a list of over 120 projects, following
the text chapter by chapter, which can be used as possible questions to be solved
using symbolic computation. Symbolic computation is a useful interactive facility
for many routine but time-consuming processes such as factorization, integration,
and so on, of algebraic expressions, matrix algebra, and three-dimensional
sketching. The projects in Chapter 42 were prepared with reference to
Mathematica™, but may be used as practice with any similar system. However,
the main text is software-free and does not require access to software in order to
understand it.

Acknowledgements
We continue to acknowledge with thanks the help received from the individuals
mentioned in previous editions. The development, writing, and organization of
textbooks, together with colour printing, web-based resource centres, and
associated software, has become an increasingly complex process. We wish to
express our appreciation of the helpfulness of the staff at Oxford University Press
during the production of this new edition.

Keele DWJ
March 2008 PS
Brief Contents

Part 1 Elementary methods, differentiation, complex numbers
1 Standard functions and techniques 3
2 Differentiation 61
3 Further techniques for differentiation 82
4 Applications of differentiation 100
5 Taylor series and approximations 124
6 Complex numbers 140

Part 2 Matrix and vector algebra
7 Matrix algebra 161
8 Determinants 179
9 Elementary operations with vectors 193
10 The scalar product 219
11 Vector product 244
12 Linear algebraic equations 259
13 Eigenvalues and eigenvectors 279

Part 3 Integration and differential equations
14 Antidifferentiation and area 307
15 The definite and indefinite integral 320
16 Applications involving the integral as a sum 341
17 Systematic techniques for integration 356
18 Unforced linear differential equations with constant coefficients 379
19 Forced linear differential equations 395
20 Harmonic functions and the harmonic oscillator 413
21 Steady forced oscillations: phasors, impedance, transfer functions 442
22 Graphical, numerical, and other aspects of first-order equations 460
23 Nonlinear differential equations and the phase plane 480

Part 4 Transforms and Fourier Series
24 The Laplace transform 505
25 Laplace and z transforms: applications 527
26 Fourier series 562
27 Fourier transforms 586

Part 5 Multivariable calculus
28 Differentiation of functions of two variables 623
29 Functions of two variables: geometry and formulae 645
30 Chain rules, restricted maxima, coordinate systems 664
31 Functions of any number of variables 683
32 Double integration 708
33 Line integrals 735
34 Vector fields: divergence and curl 762

Part 6 Discrete mathematics
35 Sets 789
36 Boolean algebra: logic gates and switching functions 801
37 Graph theory and its applications 814
38 Difference equations 842

Part 7 Probability and statistics
39 Probability 865
40 Random variables and probability distributions 884
41 Descriptive statistics 903

Part 8 Projects
42 Applications projects using symbolic computing 919

Self-tests: Selected answers 931
Answers to selected problems 937
Appendices 948
Further reading 961
Index 962
Detailed Contents

Part 1 Elementary methods, differentiation, complex numbers

1 Standard functions and techniques 3

1.1 Real numbers, powers, inequalities 3


1.2 Coordinates in the plane 6
1.3 Graphs 7
1.4 Functions 12
1.5 Radian measure of angles 16
1.6 Trigonometric functions; properties 17
1.7 Inverse functions 23
1.8 Inverse trigonometric functions 25
1.9 Polar coordinates 28
1.10 Exponential functions; the number e 30
1.11 The logarithmic function 33
1.12 Exponential growth and decay 35
1.13 Hyperbolic functions 36
1.14 Partial fractions 39
1.15 Summation sign: geometric series 43
1.16 Infinite geometric series 45
1.17 Permutations and combinations 46
1.18 The binomial theorem 51
Problems 55

2 Differentiation 61

2.1 The slope of a graph 62


2.2 The derivative: notation and definition 65
2.3 Rates of change 67
2.4 Derivative of $x^n$ ($n = 0, 1, 2, 3, \ldots$) 69
2.5 Derivatives of sums: multiplication by constants 70
2.6 Three important limits 72
2.7 Derivatives of $e^x$, $\sin x$, $\cos x$, $\ln x$ 74
2.8 A basic table of derivatives 76
2.9 Higher-order derivatives 77
2.10 An interpretation of the second derivative 79
Problems 80

3 Further techniques for differentiation 82

3.1 The product rule 83


3.2 Quotients and reciprocals 85
3.3 The chain rule 86
3.4 Derivative of $x^n$ for any value of $n$ 89
3.5 Functions of ax + b 90
3.6 An extension of the chain rule 91
3.7 Logarithmic differentiation 92
3.8 Implicit differentiation 93
3.9 Derivatives of inverse functions 94

3.10 Derivative as a function of a parameter 95


Problems 98

4 Applications of differentiation 100


4.1 Function notation for derivatives 100
4.2 Maxima and minima 102
4.3 Exceptional cases of maxima and minima 106
4.4 Sketching graphs of functions 108
4.5 Estimating small changes 114
4.6 Numerical solution of equations: Newton’s method 116
4.7 The binomial theorem: an alternative proof 120
Problems 121

5 Taylor series and approximations 124


5.1 The index notation for derivatives of any order 125
5.2 Taylor polynomials 125
5.3 A note on infinite series 128
5.4 Infinite Taylor expansions 130
5.5 Manipulation of Taylor series 132
5.6 Approximations for large values of x 134
5.7 Taylor series about other points 134
5.8 Indeterminate values; l’Hôpital’s rule 136
Problems 138

6 Complex numbers 140


6.1 Definitions and rules 141
6.2 The Argand diagram, modulus, conjugate 144
6.3 Complex numbers in polar coordinates 146
6.4 Complex numbers in exponential form 148
6.5 The general exponential form 151
6.6 Hyperbolic functions 153
6.7 Miscellaneous applications 154
Problems 156

Part 2 Matrix and vector algebra

7 Matrix algebra 161


7.1 Matrix definition and notation 161
7.2 Rules of matrix algebra 162
7.3 Special matrices 168
7.4 The inverse matrix 172
Problems 177

8 Determinants 179
8.1 The determinant of a square matrix 179
8.2 Properties of determinants 182
8.3 The adjoint and inverse matrices 189
Problems 190

9 Elementary operations with vectors 193
9.1 Displacement along an axis 193
9.2 Displacement vectors in two dimensions 195
9.3 Axes in three dimensions 198
9.4 Vectors in two and three dimensions 198
9.5 Relative velocity 204
9.6 Position vectors and vector equations 206
9.7 Unit vectors and basis vectors 210
9.8 Tangent vector, velocity, and acceleration 212
9.9 Motion in polar coordinates 214
Problems 216

10 The scalar product 219


10.1 The scalar product of two vectors 219
10.2 The angle between two vectors 220
10.3 Perpendicular vectors 222
10.4 Rotation of axes in two dimensions 223
10.5 Direction cosines 225
10.6 Rotation of axes in three dimensions 226
10.7 Direction ratios and coordinate geometry 229
10.8 Properties of a plane 230
10.9 General equation of a straight line 234
10.10 Forces acting at a point 235
10.11 Tangent vector and curvature in two dimensions 238
Problems 240

11 Vector product 244


11.1 Vector product 244
11.2 Nature of the vector p = a × b 246
11.3 The scalar triple product 249
11.4 Moment of a force 251
11.5 Vector triple product 255
Problems 256

12 Linear algebraic equations 259


12.1 Cramer’s rule 260
12.2 Elementary row operations 262
12.3 The inverse matrix by Gaussian elimination 265
12.4 Compatible and incompatible sets of equations 267
12.5 Homogeneous sets of equations 271
12.6 Gauss–Seidel iterative method of solution 273
Problems 275

13 Eigenvalues and eigenvectors 279


13.1 Eigenvalues of a matrix 279
13.2 Eigenvectors 281
13.3 Linear dependence 285
13.4 Diagonalization of a matrix 286

13.5 Powers of matrices 289


13.6 Quadratic forms 292
13.7 Positive-definite matrices 295
13.8 An application to a vibrating system 298
Problems 301

Part 3 Integration and differential equations

14 Antidifferentiation and area 307


14.1 Reversing differentiation 307
14.2 Constructing a table of antiderivatives 311
14.3 Signed area generated by a graph 314
14.4 Case where the antiderivative is composite 317
Problems 318

15 The definite and indefinite integral 320


15.1 Signed area as the sum of strips 320
15.2 Numerical illustration of the sum formula 321
15.3 The definite integral and area 323
15.4 The indefinite-integral notation 324
15.5 Integrals unrelated to area 326
15.6 Improper integrals 328
15.7 Integration of complex functions: a new type of integral 331
15.8 The area analogy for a definite integral 333
15.9 Symmetric integrals 333
15.10 Definite integrals having variable limits 336
Problems 338

16 Applications involving the integral as a sum 341

16.1 Examples of integrals arising from a sum 341


16.2 Geometrical area in polar coordinates 344
16.3 The trapezium rule 346
16.4 Centre of mass, moment of inertia 348
Problems 353

17 Systematic techniques for integration 356

17.1 Substitution method for $\int f(ax + b)\,dx$ 356


17.2 Substitution method for $\int f(ax^2 + b)\,x\,dx$ 359
17.3 Substitution method for $\int \cos^m ax \sin^n ax\,dx$ ($m$ or $n$ odd) 360
17.4 Definite integrals and change of variable 362
17.5 Occasional substitutions 364
17.6 Partial fractions for integration 366
17.7 Integration by parts 368
17.8 Integration by parts: definite integrals 371
17.9 Differentiating with respect to a parameter 373
Problems 375

18 Unforced linear differential equations with constant coefficients 379

18.1 Differential equations and their solutions 380
18.2 Solving first-order linear unforced equations 382
18.3 Solving second-order linear unforced equations 384
18.4 Complex solutions of the characteristic equation 388
18.5 Initial conditions for second-order equations 391
Problems 393

19 Forced linear differential equations 395

19.1 Particular solutions for standard forcing terms 395


19.2 Harmonic forcing term, by using complex solutions 399
19.3 Particular solutions: exceptional cases 403
19.4 The general solution of forced equations 404
19.5 First-order linear equations with a variable coefficient 407
Problems 411

20 Harmonic functions and the harmonic oscillator 413

20.1 Harmonic oscillations 413


20.2 Phase difference: lead and lag 415
20.3 Physical models of a differential equation 417
20.4 Free oscillations of a linear oscillator 419
20.5 Forced oscillations and transients 420
20.6 Resonance 423
20.7 Nearly linear systems 425
20.8 Stationary and travelling waves 427
20.9 Compound oscillations; beats 431
20.10 Travelling waves; beats 434
20.11 Dispersion; group velocity 436
20.12 The Doppler effect 437
Problems 439

21 Steady forced oscillations: phasors, impedance, transfer functions 442

21.1 Phasors 442


21.2 Algebra of phasors 444
21.3 Phasor diagrams 445
21.4 Phasors and complex impedance 446
21.5 Transfer functions in the frequency domain 451
21.6 Phasors and waves; complex amplitude 453
Problems 458

22 Graphical, numerical, and other aspects of first-order equations 460

22.1 Graphical features of first-order equations 460


22.2 The Euler method for numerical solution 463
22.3 Nonlinear equations of separable type 466
22.4 Differentials and the solution of first-order equations 469
22.5 Change of variable in a differential equation 473
Problems 476

23 Nonlinear differential equations and the phase plane 480



23.1 Autonomous second-order equations 481


23.2 Constructing a phase diagram for $(x, \dot{x})$ 482
23.3 $(x, \dot{x})$ phase diagrams for other linear equations; stability 486
23.4 The pendulum equation 489
23.5 The general phase plane 491
23.6 Approximate linearization 494
23.7 Classification of linear equilibrium points 496
23.8 Limit cycles 497
23.9 A numerical method for phase paths 499
Problems 501

Part 4 Transforms and Fourier Series

24 The Laplace transform 505


24.1 The Laplace transform 505
24.2 Laplace transforms of $t^n$, $e^{\pm t}$, $\sin t$, $\cos t$ 506
24.3 Scale rule; shift rule; factors $t^n$ and $e^{kt}$ 508
24.4 Inverting a Laplace transform 512
24.5 Laplace transforms of derivatives 515
24.6 Application to differential equations 516
24.7 The unit function and the delay rule 519
24.8 The division rule for f(t)/t 524
Problems 525

25 Laplace and z transforms: applications 527


25.1 Division by s and integration 527
25.2 The impulse function 530
25.3 Impedance in the s domain 533
25.4 Transfer functions in the s domain 535
25.5 The convolution theorem 541
25.6 General response of a system from its impulsive response 543
25.7 Convolution integral in terms of memory 544
25.8 Discrete systems 545
25.9 The z transform 548
25.10 Behaviour of z transforms in the complex plane 552
25.11 z transforms and difference equations 556
Problems 558

26 Fourier series 562


26.1 Fourier series for a periodic function 563
26.2 Integrals of periodic functions 564
26.3 Calculating the Fourier coefficients 566
26.4 Examples of Fourier series 569
26.5 Use of symmetry: sine and cosine series 572
26.6 Functions defined on a finite range: half-range series 574
26.7 Spectrum of a periodic function 577
26.8 Obtaining one Fourier series from another 578
26.9 The two-sided Fourier series 579
Problems 582

27 Fourier transforms 586

27.1 Sine and cosine transforms 587
27.2 The exponential Fourier transform 590
27.3 Short notations: alternative expressions 592
27.4 Fourier transforms of some basic functions 593
27.5 Rules for manipulating transforms 596
27.6 The delta function and periodic functions 599
27.7 Convolution theorem for Fourier transforms 601
27.8 The shah function 605
27.9 Energy in a signal: Rayleigh’s theorem 607
27.10 Diffraction from a uniformly radiating strip 608
27.11 General source distribution and the inverse transform 612
27.12 Transforms in radiation problems 613
Problems 618

Part 5 Multivariable calculus

28 Differentiation of functions of two variables 623


28.1 Depiction of functions of two variables 624
28.2 Partial derivatives 627
28.3 Higher derivatives 629
28.4 Tangent plane and normal to a surface 632
28.5 Maxima, minima, and other stationary points 635
28.6 The method of least squares 638
28.7 Differentiating an integral with respect to a parameter 640
Problems 642

29 Functions of two variables: geometry and formulae 645


29.1 The incremental approximation 645
29.2 Small changes and errors 648
29.3 The derivative in any direction 651
29.4 Implicit differentiation 654
29.5 Normal to a curve 657
29.6 Gradient vector in two dimensions 659
Problems 662

30 Chain rules, restricted maxima, coordinate systems 664


30.1 Chain rule for a single parameter 664
30.2 Restricted maxima and minima: the Lagrange multiplier 667
30.3 Curvilinear coordinates in two dimensions 672
30.4 Orthogonal coordinates 675
30.5 The chain rule for two parameters 676
30.6 The use of differentials 679
Problems 681

31 Functions of any number of variables 683


31.1 The incremental approximation; errors 683
31.2 Implicit differentiation 686
31.3 Chain rules 688
31.4 The gradient vector in three dimensions 688

31.5 Normal to a surface 690


31.6 Equation of the tangent plane 691
31.7 Directional derivative in terms of gradient 692
31.8 Stationary points 696
31.9 The envelope of a family of curves 702
Problems 704

32 Double integration 708


32.1 Repeated integrals with constant limits 709
32.2 Examples leading to repeated integrals with constant limits 710
32.3 Repeated integrals over non-rectangular regions 713
32.4 Changing the order of integration for non-rectangular regions 715
32.5 Double integrals 717
32.6 Polar coordinates 721
32.7 Separable integrals 724
32.8 General change of variable; the Jacobian determinant 727
Problems 732

33 Line integrals 735


33.1 Evaluation of line integrals 736
33.2 General line integrals in two and three dimensions 739
33.3 Paths parallel to the axes 743
33.4 Path independence and perfect differentials 744
33.5 Closed paths 746
33.6 Green’s theorem 748
33.7 Line integrals and work 750
33.8 Conservative fields 752
33.9 Potential for a conservative field 754
33.10 Single-valuedness of potentials 756
Problems 759

34 Vector fields: divergence and curl 762


34.1 Vector fields and field lines 762
34.2 Divergence of a vector field 764
34.3 Surface and volume integrals 765
34.4 The divergence theorem; flux of a vector field 770
34.5 Curl of a vector field 773
34.6 Cylindrical polar coordinates 777
34.7 General curvilinear coordinates 779
34.8 Stokes’s theorem 781
Problems 785

Part 6 Discrete mathematics

35 Sets 789
35.1 Notation 789
35.2 Equality, union, and intersection 790
35.3 Venn diagrams 792
Problems 799

36 Boolean algebra: logic gates and switching functions 801
36.1 Laws of Boolean algebra 801
36.2 Logic gates and truth tables 803
36.3 Logic networks 805
36.4 The inverse truth-table problem 808
36.5 Switching circuits 809
Problems 812

37 Graph theory and its applications 814


37.1 Examples of graphs 815
37.2 Definitions and properties of graphs 817
37.3 How many simple graphs are there? 818
37.4 Paths and cycles 820
37.5 Trees 821
37.6 Electrical circuits: the cutset method 823
37.7 Signal-flow graphs 827
37.8 Planar graphs 831
37.9 Further applications 834
Problems 837

38 Difference equations 842


38.1 Discrete variables 842
38.2 Difference equations: general properties 845
38.3 First-order difference equations and the cobweb 847
38.4 Constant-coefficient linear difference equations 849
38.5 The logistic difference equation 854
Problems 859

Part 7 Probability and statistics

39 Probability 865
39.1 Sample spaces, events, and probability 866
39.2 Sets and probability 868
39.3 Frequencies and combinations 872
39.4 Conditional probability 875
39.5 Independent events 877
39.6 Total probability 879
39.7 Bayes’ theorem 880
Problems 881

40 Random variables and probability distributions 884


40.1 Probability distributions 885
40.2 The binomial distribution 887
40.3 Expected value and variance 889
40.4 Geometric distribution 891
40.5 Poisson distribution 892
40.6 Other discrete distributions 894

40.7 Continuous random variables and distributions 895


40.8 Mean and variance of continuous random variables 897
40.9 The normal distribution 898
Problems 901

41 Descriptive statistics 903


41.1 Representing data 903
41.2 Random samples and sampling distributions 908
41.3 Sample mean and variance, and their estimation 910
41.4 Central limit theorem 911
41.5 Regression 913
Problems 915

Part 8 Projects

42 Applications projects using symbolic computing 919


42.1 Symbolic computation 919
42.2 Projects 920

Self-tests: Selected answers 931

Answers to selected problems 937

Appendices 948
A Some algebraical rules 948
B Trigonometric formulae 949
C Areas and volumes 951
D A table of derivatives 952
E Table of indefinite and definite integrals 953
F Laplace transforms, inverses, and rules 955
G Exponential Fourier transforms and rules 956
H Probability distributions and tables 957
I Dimensions and units 959

Further reading 961

Index 962
Part 1
Elementary methods, differentiation, complex numbers

1 Standard functions and techniques


This is a long chapter covering a variety of subjects, some of which you will have
met before. It is not necessary to work through every section in detail; to a large
extent the chapter can be used for reference as required later on. However, you
should read it carefully in order to find what is in it, and to pick up terms and
notations used regularly in the rest of the book. If you find that a familiar subject
is treated in an unfamiliar way, try to understand the fresh approach since the
ideas behind it are liable to reappear in later chapters.

1.1 Real numbers, powers, inequalities


The real numbers are the ordinary numbers used in arithmetic and measurement.
(We call them ‘real’ to distinguish them from the so-called ‘complex numbers’,
to be introduced in Chapter 6.) The following terms are used to classify special
types of real number:
1. An integer is a ‘whole number’, positive, negative, or zero; integers are the
numbers … , −3, −2, −1, 0, 1, 2, … .

2. A rational number is any number that can be expressed as a fraction having the
form p/q, where p and q are integers and q ≠ 0. They consist of all numbers expressible
as finite or recurring decimals. Examples of rational numbers with recurring
decimals are 1/3, which has the decimal representation 0.3333… written as
$0.\dot{3}$, and 1/7, which has the recurring decimal form $0.\dot{1}4285\dot{7}$ (the dots mark out
the decimal repetition pattern). Notice that the integers are rational numbers
in this definition.
3. The rest are irrational numbers. These are the numbers that cannot be
expressed as fractions made up of integers; they are represented by infinite,
non-recurring decimals. Although there is an infinite number of rational numbers,
there is a sense in which there are infinitely more irrational numbers (the
rational numbers can be listed one after another, but the irrational numbers
cannot), so they appear everywhere. For example, the hypotenuse of a right-angled
triangle whose two shorter sides have unit length has length √2, and this is known
to be an irrational number. The number π is irrational, and so is the number e which
we will meet in Section 1.10. Irrational numbers can be approximated as closely as
we wish by rational numbers: retain the appropriate number of decimal places, and
the resulting approximation is a rational number. For example, π = 3.142 to
3 decimal places, which is the rational number 3142/1000.
4. The symbol ∞, standing for the word infinity, is frequently useful, but it
cannot be used in algebra as an ordinary number. Claims such as ∞/∞ = 1, or
∞ − ∞ = 0, are fallacious. For example, if we take away the infinity of odd integers
from the infinity of all integers, an infinity of even integers is left behind.
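Why every fraction p/q gives a finite or a recurring decimal (item 2) can be seen from long division: each step leaves a remainder between 0 and q − 1, so within at most q steps either the remainder becomes 0 (the decimal terminates) or some remainder repeats, and from then on the digits repeat too. The short Python sketch below follows that argument. It assumes p ≥ 0 and q > 0, and it marks the recurring block with parentheses instead of the dots used in the text; the function name `decimal_expansion` and the bracket notation are illustrative choices, not the book's.

```python
def decimal_expansion(p: int, q: int) -> str:
    """Return p/q as a decimal string, putting the recurring block in parentheses.

    Sketch assumption: p >= 0 and q > 0.
    """
    if p < 0 or q <= 0:
        raise ValueError("this sketch assumes p >= 0 and q > 0")
    whole, remainder = divmod(p, q)
    digits = []
    seen = {}  # remainder -> index of the digit it produced first
    # Long division: stop when the division is exact or a remainder recurs.
    while remainder != 0 and remainder not in seen:
        seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, q)
        digits.append(str(digit))
    if not digits:                  # p/q is an integer
        return str(whole)
    if remainder == 0:              # finite decimal
        return f"{whole}." + "".join(digits)
    start = seen[remainder]         # digits from here onwards repeat forever
    return f"{whole}." + "".join(digits[:start]) + "(" + "".join(digits[start:]) + ")"


print(decimal_expansion(1, 3))       # 0.(3)
print(decimal_expansion(1, 7))       # 0.(142857)
print(decimal_expansion(1, 6))       # 0.1(6)
print(decimal_expansion(3142, 1000)) # 3.142, a finite decimal
```

Keying the dictionary on remainders rather than digits is what makes the detection correct: the same digit can appear twice without the pattern repeating, but a repeated remainder forces every later digit to repeat.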
We have continually to manipulate powers of numbers, which take the form $a^x$.
The power $x$ is called the exponent or index in the expression. It is assumed that
you know how to use the rules when $x$ is a positive or negative integer, and can
interpret fractional powers by their connection with square roots, cube roots, and
so on. For example, $2^{1/2} = \sqrt{2}$, $2^{-1/2} = 1/\sqrt{2}$, and $2^{3/2} = (\sqrt{2})^3 = \sqrt{2^3}$. The rules applying to
general exponents work in the same way, as follows:

Rules for exponents

$a$ and $b$ are any positive real numbers; $x$ and $y$ are any real numbers, positive,
negative, or zero. Then
(a) $a^x a^y = a^{x+y}$.
(b) $a^0 = 1$.
(c) $a^{-x} = 1/a^x$.
(d) $(a^x)^y = a^{xy}$.
(e) $a^x b^x = (ab)^x$.    (1.1)
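The rules (1.1) can be spot-checked numerically. The Python sketch below draws random positive bases a, b and random real exponents x, y, and compares both sides of each rule using floating-point arithmetic with a relative tolerance. The function name, sampling ranges, and tolerance are assumptions chosen for illustration; agreement at sample points is evidence that the rules behave as stated, not a proof of them.

```python
import math
import random


def check_exponent_rules(trials: int = 1000, seed: int = 1) -> bool:
    """Compare both sides of rules (a)-(e) at random sample points."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)    # positive bases, as (1.1) requires
        x, y = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)  # any real exponents
        rules = [
            (a**x * a**y, a**(x + y)),   # (a)
            (a**0, 1.0),                 # (b)
            (a**(-x), 1 / a**x),         # (c)
            ((a**x)**y, a**(x * y)),     # (d)
            (a**x * b**x, (a * b)**x),   # (e)
        ]
        if not all(math.isclose(lhs, rhs, rel_tol=1e-9) for lhs, rhs in rules):
            return False
    return True


print(check_exponent_rules())
```

The bases are kept strictly positive on purpose; the next paragraph explains why the rules cannot be applied freely when the base is negative.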

The notations $a^{1/2}$ and $\sqrt{a}$ always stand for the positive square root of $a$. If we
want the negative square root we must attach a minus sign. Thus the solutions of
the equation $x^2 = 2$ are written separately as $\sqrt{2}$ and $-\sqrt{2}$, or as $\pm\sqrt{2}$.
The condition $a > 0$ is necessary if the rules are to apply to all exponents; for
example, $(-2)^{1/2}$ has no meaning in real-number terms since the square of any real
number is never negative. If $a$ is negative, then sometimes $a^x$ is a real number,
but only if $x = p/q$ in its lowest terms, where $p$ and $q$ are integers and $q$ is an odd
integer. For example, $(-8)^{1/3} = -2$, because $(-2)^3 = -8$.
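The rule for a negative base can be turned into a small function. In plain Python, `(-8) ** (1/3)` returns a complex number (the principal complex root) rather than −2, and `math.pow(-8, 1/3)` raises `ValueError`, partly because 1/3 cannot be stored exactly as a float. The sketch below therefore takes the exponent as an exact `fractions.Fraction`, which reduces p/q to lowest terms automatically, and applies the odd-denominator rule from the text. The name `real_power` is hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
import math


def real_power(a: float, x: Fraction) -> float:
    """Real value of a**x for rational x, following the text's rule.

    a > 0: defined for every rational x.
    a < 0: real only when x = p/q in lowest terms has q odd; then
           a**(p/q) = (-1)**p * |a|**(p/q).
    """
    x = Fraction(x)                  # Fraction stores p/q in lowest terms
    p, q = x.numerator, x.denominator
    if a > 0:
        return a ** (p / q)
    if a == 0:
        if x > 0:
            return 0.0
        raise ValueError("0 to a zero or negative power is not defined here")
    if q % 2 == 0:
        raise ValueError(f"({a})**({x}) is not a real number")
    magnitude = abs(a) ** (p / q)
    return -magnitude if p % 2 else magnitude   # sign is (-1)**p


print(real_power(-8, Fraction(1, 3)))   # the real cube root of -8
print(real_power(-8, Fraction(2, 6)))   # 2/6 reduces to 1/3, so the same value
print(math.isclose(real_power(-8, Fraction(2, 3)), 4.0))
```

Reducing to lowest terms matters: 2/6 has an even denominator as written, but it is the same exponent as 1/3, so $(-8)^{2/6}$ is real.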
1
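Programming languages need care here. In Python 3, `(-8) ** (1/3)` does not return −2: for a negative base and a non-integer exponent, `**` returns a complex number (the principal value). The hypothetical helper `real_power` below follows the rule just stated instead. It assumes the exponent is supplied as integers p, q with q > 0 and p/q in lowest terms. (Python 3.11 and later also provide `math.cbrt` for the real cube root.)

```python
import math

def real_power(a: float, p: int, q: int) -> float:
    """Real value of a**(p/q), where q > 0 and p/q is in lowest terms.

    For a < 0 a real value exists only when q is odd.
    """
    if q <= 0 or math.gcd(p, q) != 1:
        raise ValueError("need q > 0 and p/q in lowest terms")
    if a >= 0:
        return a ** (p / q)
    if q % 2 == 0:
        raise ValueError("no real value: negative base, even denominator")
    # For odd q, the real q-th root of a negative number is -(|a| ** (1/q)).
    root = -(abs(a) ** (1 / q))
    return root ** p

print(real_power(-8, 1, 3))   # -2.0
print(real_power(-8, 2, 3))   # 4.0
```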

The concept of an identity needs to be distinguished from that of a mere equation. The statement $x^2 + 2x + 1 = 0$ is an equation: it is only true conditionally; that is, for particular values of x. On the other hand statements such as
$x^2 + 2xy + y^2 = (x + y)^2$ and $\sin^2 A + \cos^2 A = 1$
are called identities, because they are automatically true for all values of x, y, and A, and to stress the difference we may use the sign ≡ instead of =. Anything involving <, >, etc., is an inequality. An algebraic ‘phrase’ standing on its own, such as $x^2 + 2x + 1$ or $\sin^2 A + \cos^2 A$, is an expression. However, virtually anything with an =, ≡, or inequality sign in it is commonly referred to as an equation.
Draw a straight line containing a point O, called the origin, and indicate a
scale starting at O, as in Fig. 1.1, with positive scale markings to the right of O
and negative markings to the left of O. Imagine the line to be infinitely long in
both directions. This is called a number line, and every real number, positive or
negative, has a place on it. We shall use x to denote a general number.

Fig. 1.1 The number line on the x axis.

The symbols for inequalities <, ≤, >, ≥ have the following meanings:

<  ‘is less than’    ≤  ‘is less than or equal to’
>  ‘is greater than’    ≥  ‘is greater than or equal to’.
If we are given two numbers, then the one which is further to the right on the number line is the greater one. Therefore −2 > −3, −3 < −2, 3 > 0, −3 < 0, and so on. Obviously 2 < 3. But it is also true that 2 ≤ 3, because 2 is certainly either less than 3 or equal to 3. For similar reasons, all these are true: 1 = 1, 1 ≤ 1, 1 ≥ 1.
A single piece, or a segment, of the number line is called an interval. The piece of the line between x = 2 and x = 3 which includes both end-points x = 2 and 3 can be specified by the expression
the interval 2 ≤ x ≤ 3
which means ‘all the values of x between, and including, 2 and 3’. The interval 2 < x < 3 means all values between 2 and 3, but excluding the end values. Infinite intervals can be expressed in two ways: for example, the interval
x ≥ 2 or 2 ≤ x < ∞
contains all the numbers x which are greater than or equal to 2.
The ‘size’ of a number is denoted by the symbol
$$|x| = \begin{cases} x & \text{if } x \ge 0,\\ -x & \text{if } x < 0, \end{cases} \qquad (1.2)$$
which is called the modulus or absolute value of x. Thus |3| = 3, |−4| = 4. We can use the modulus notation to define intervals. The inequality |x| < 2 defines the same interval as −2 < x < 2; |x − 1| ≤ 3 is the same as −3 ≤ x − 1 ≤ 3 or −2 ≤ x ≤ 4.
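The modulus and its link with intervals translate directly into code, because Python allows chained comparisons such as `-2 <= x <= 4`. The sketch below implements the piecewise definition (1.2). It then checks, on an arbitrary grid of sample points, the equivalence of |x − 1| ≤ 3 with −2 ≤ x ≤ 4. The grid is an illustrative choice; sampling demonstrates the equivalence rather than proving it.

```python
def abs_value(x: float) -> float:
    """The modulus |x|, following the piecewise definition (1.2)."""
    return x if x >= 0 else -x

# |x - 1| <= 3 should pick out exactly the interval -2 <= x <= 4.
samples = [k / 4 for k in range(-40, 41)]   # x from -10 to 10 in steps of 0.25
for x in samples:
    assert (abs_value(x - 1) <= 3) == (-2 <= x <= 4)
print("|x - 1| <= 3 agrees with -2 <= x <= 4 at every sample")
```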

Self-test 1.1
The number x satisfies the inequalities 2  x  4 and | x |  3. Expressed as
a single expression, what values can x take?

1.2 Coordinates in the plane


The location of a point in a plane can be specified in terms of right-handed cartesian axes, as illustrated in Fig. 1.2. These are effectively two number lines, typically labelled x and y, at right angles, meeting at the common origin O. Axes are right handed if, when we walk along the x axis in the direction of increasing x, the positive y axis is on our left. If you look at Fig. 1.2 in a mirror you will see left-handed axes.

Fig. 1.2 Axes and coordinates.

The position of a point is determined by two coordinates (x, y). They repres-
ent, in order, the signed ‘distances’ of the point from the y and x axes respect-
ively, as read off from the numbers on the axis scales. In Fig. 1.2 the point A has
coordinates x = 1.5, y = 1. We shall use the notation A : (1.5, 1) for such a point,
so as to display the name of the point together with its coordinates. On Fig. 1.2 we
also show the point B : (2.5, −2), and a general point P : (x, y). For a point
P : (x, y), x is called the abscissa and y the ordinate of P.
In Fig. 1.3a, the x and y scales are supposed to be equal, so that the distance of P : (x, y) from the origin is OP. By Pythagoras’s theorem,
$OP = \sqrt{OU^2 + UP^2}$,
where U is the base of the perpendicular from P on to the x axis (Fig. 1.3a). If we put OP = r, then
$r = \sqrt{x^2 + y^2}$. (1.3a)

Fig. 1.3

Note that distances, such as OP and r, are always counted as positive numbers. Similarly, from Fig. 1.3b, for any two points P1 : (x1, y1) and P2 : (x2, y2) in the plane, the distance P1P2 between them is given by
$P_1P_2 = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. (1.3b)
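Formula (1.3b) is a one-line function. The sketch below is in Python (3.9 or later for the `tuple[...]` type hints) and uses sample points chosen only for illustration. It also shows that the standard library's `math.hypot` computes the same quantity from the coordinate differences.

```python
import math

def distance(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Distance P1P2 between two points, from (1.3b)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

print(distance((0, 0), (3, 4)))     # 5.0, i.e. r in (1.3a) for P : (3, 4)
print(math.hypot(3 - 0, 4 - 0))     # 5.0, the standard library equivalent
```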

Self-test 1.2
Find the distances between the points A : (1, 2), B : (2, −3) and C : (7, −2).
Confirm that $AC^2 = AB^2 + BC^2$. What can you deduce about the angle ABC?

1.3 Graphs
If x and y are connected by an equation, then this relation can be represented by
a curve or curves in the (x, y) plane which is known as the graph of the equation
(often known as a cartesian equation in this context).

Example 1.1 Sketch the graph of y = x3.


We decide over what interval of values of (say) x we wish to sketch the graph, and the
scales to be used. Let −3 ≤ x ≤ 3. Construct a table of (x, y) values as shown below:

x −3 −2 −1 0 1 2 3
y −27 −8 −1 0 1 8 27

We then plot the points corresponding to this set of coordinates and draw a smooth
curve through them as shown in Fig. 1.4. The greater the number of values of x in the
interval, the greater is the reliability of the graph. It is assumed that the curve has a
smooth or regular behaviour between consecutive plotted points. In Fig. 1.4 the scales
are not the same on the two axes. Since y has a much greater spread of values (54)
compared with x (6), the vertical scale has been compressed to give a convenient picture.
Generally, unequal scales distort lengths and angles.
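The table of Example 1.1 is easily generated by machine. The Python sketch below reproduces it, and also builds a finer table with step 0.5 (an arbitrary choice) to illustrate the remark that more plotted points make the graph more reliable.

```python
# Table of (x, y) values for y = x**3 on the integers -3 <= x <= 3.
xs = list(range(-3, 4))
ys = [x ** 3 for x in xs]
print("x", *xs)   # x -3 -2 -1 0 1 2 3
print("y", *ys)   # y -27 -8 -1 0 1 8 27

# A finer table (step 0.5) gives more points to draw the curve through.
fine = [(k / 2, (k / 2) ** 3) for k in range(-6, 7)]
print(fine[:3])   # [(-3.0, -27.0), (-2.5, -15.625), (-2.0, -8.0)]
```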
Fig. 1.4 Graph of $y = x^3$: note that the x and y scales are unequal.
Fig. 1.5

Example 1.2 Find the equation of the straight line through the points
A : (−1, −1) and B : (1, 2).
The line is shown in Fig. 1.5. Let P : (x, y) be any point on the line. PAQ and BAC are similar triangles, so that
$$\frac{QP}{AQ} = \frac{y + 1}{x + 1} = \frac{CB}{AC} = \frac{3}{2}.$$
Therefore
$2(y + 1) = 3(x + 1)$ or $y = \tfrac{3}{2}x + \tfrac{1}{2}$.
This represents the equation of the straight line through the points (1, 2) and (−1, −1).

Any equation of the form


y = mx + c
has a straight line graph, and any straight line can be expressed in this form
unless it is parallel to the y axis, when its equation is
x = d.
A more general form which includes both of the above cases is
ax + by = c
where a, b, and c are any constants.
The method of Example 1.2 can be used to find the equation of a line passing through any two given points A : (x1, y1) and B : (x2, y2) (see Fig. 1.6) provided x1 ≠ x2. Let P : (x, y) be any point on the line. Then
$$\frac{QP}{AQ} = \frac{RB}{AR} \quad\text{or}\quad \frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}. \qquad (1.4)$$

Fig. 1.6
Fig. 1.7

This equation can be rearranged in the form y = mx + c, where
$$m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}. \qquad (1.5)$$
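Formulas (1.5) can be coded directly. The sketch below uses Python's `fractions` module so that the slope and intercept come out exactly. It assumes integer (or `Fraction`) coordinates, since `Fraction(a, b)` does not accept floats; `line_through` is a hypothetical helper name. The usage reproduces Example 1.2.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Slope m and intercept c of the line through p1 and p2, from (1.5).

    Requires x1 != x2: a line parallel to the y axis has no form y = mx + c.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("line parallel to the y axis: its equation is x = x1")
    m = Fraction(y2 - y1, x2 - x1)
    c = Fraction(x2 * y1 - x1 * y2, x2 - x1)
    return m, c

m, c = line_through((-1, -1), (1, 2))   # the points of Example 1.2
print(m, c)                              # 3/2 1/2
```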
In the right-angled triangle ABR of Fig. 1.6, the angle α (Greek ‘alpha’) is given by
$$\tan\alpha = \frac{y_2 - y_1}{x_2 - x_1} \qquad (1.6)$$
(provided that the x and y scales are the same; if not, the angle α will not be cor-
rect as measured in the figure). The number m or tan α gives the standard measure
of the slope or gradient of the straight line. The line slopes upwards or down-
wards from left to right according as tan α is positive or negative respectively; the
slope is zero when α or tan α is zero; and the larger the size (or modulus – see
(1.2)) of tan α, the steeper is the line. Also, according to (1.5) and (1.6), if the
equation is in the form y = mx + c, then
slope = tan α = m. (1.7)

From (1.7), if we require the line through A : (x1, y1) with given slope m, its
equation is
y − y1 = m(x − x1). (1.8)

The following result is often needed:

Perpendicular straight lines


Two straight lines are given by y = m1x + c and y = m2x + d.
(a) If m1m2 = −1, then they are perpendicular;
(b) conversely, if they are perpendicular, then m1m2 = −1.
(1.9)

Consider the lines y = m1x and y = m2x through the origin (these are parallel to the original lines). Extend them into the upper half-plane y > 0. If the two extensions both lie in the same quadrant (the first, x > 0, or the second, x < 0), then the angle between them is clearly less than 90°, and also m1m2 > 0. Therefore we need only investigate the remaining case, of branches lying in different quadrants as shown in Fig. 1.7.
In y > 0, A and B are any points on y = m1x and y = m2x respectively. Construct AP and BQ perpendicular to the x axis. The acute angles α = ∠AOP and β = ∠BOQ are both represented by positive numbers, α > 0, β > 0. Then considering the triangles APO and OQB respectively we obtain
m1 = tan α = AP/OP and m2 = −tan β = −BQ/OQ.
(a) Assume firstly that m1m2 = −1.
Then
(AP/OP)(BQ/OQ) = −m1m2 = +1, or AP/OP = OQ/BQ.
The triangles ∆APO and ∆OQB are therefore similar triangles (that is, one is just a scaled-up version of the other), since the corresponding sides forming the right angles are proportional. In particular, ∠OAP = ∠BOQ = β, and also (since ∆APO is right-angled) ∠OAP = 90° − α. Therefore α + β = 90°, so the angle AOB between the lines is 180° − (α + β) = 90°, as required.
(b) Conversely, assume that the lines are perpendicular, so that α + β = 90°.
Then ∠OAP = 90° − α (angles in a right-angled triangle), and from the condition α + β = 90° we obtain ∠OAP = β. Likewise, in ∆OQB, ∠OBQ = 90° − β = α. Therefore ∆APO and ∆OQB are similar triangles (because their angles are equal), so, comparing corresponding sides, AP/OP = OQ/QB, equivalent to (AP/OP)(QB/OQ) = +1. Therefore m1m2 = −1, as required.
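The criterion (1.9) is easy to apply by machine. The sketch below uses exact rational arithmetic (Python's `fractions` module) so that the test m1m2 = −1 is not spoiled by rounding; with floats one would compare within a tolerance instead. As a cross-check it also computes the dot product 1 + m1m2 of the direction vectors (1, m1) and (1, m2). That viewpoint anticipates vector methods and is an alternative route, not the geometric argument given above. The slope pairs are illustrative.

```python
from fractions import Fraction

def perpendicular(m1, m2) -> bool:
    """Criterion (1.9): slopes m1, m2 give perpendicular lines iff m1*m2 == -1."""
    return m1 * m2 == -1

def dot_of_directions(m1, m2):
    """Dot product of the direction vectors (1, m1) and (1, m2) of the two lines."""
    return 1 * 1 + m1 * m2

pairs = [(Fraction(3, 2), Fraction(-2, 3)), (2, 3), (Fraction(-1, 4), 4)]
for m1, m2 in pairs:
    # The dot product is zero exactly when m1*m2 = -1, so the two tests agree.
    print(m1, m2, perpendicular(m1, m2), dot_of_directions(m1, m2) == 0)
```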

Equation of a circle, centre (a, b) and radius r


$(x - a)^2 + (y - b)^2 = r^2$. (1.10)

A circle consists of all points which are a constant distance from a given point.
In Fig. 1.8, the circle has radius r, and its centre is at (a, b). The point P : (x, y)
represents any point on the circle. Equation (1.3b) for the distance between
two points gives
$\sqrt{(x - a)^2 + (y - b)^2} = r$.
Square this expression to get rid of the square root, and we have the equation of a
circle in its standard form in eqn (1.10).

Fig. 1.8 Circle, centre at (a, b), radius r.

Example 1.3 Find the centre and radius of the circle
$4x^2 + 4y^2 - 4x + 8y - 11 = 0$. (i)
To convert (i) to the form (1.10), rewrite it in the form
$x^2 - x + y^2 + 2y = \tfrac{11}{4}$. (ii)
Take the terms involving x and reorganize them:
$x^2 - x = (x - \tfrac{1}{2})^2 - \tfrac{1}{4}$
(this process is used in many different contexts, and is called completing the square). Treat the terms in y similarly:
$y^2 + 2y = (y + 1)^2 - 1$.
Replace the terms in (ii) by the new forms; we get
$(x - \tfrac{1}{2})^2 - \tfrac{1}{4} + (y + 1)^2 - 1 = \tfrac{11}{4}$ or $(x - \tfrac{1}{2})^2 + (y + 1)^2 = 4$.
Therefore the centre is at $(\tfrac{1}{2}, -1)$, and the radius is 2.

Notice that (1.10) implies that, if we are given an equation


$Ax^2 + By^2 + Cx + Dy + E = 0$,
it can only represent a circle if A = B. (The equation might not represent anything, as with $x^2 + y^2 + 1 = 0$, but if it does, it will be a circle.)
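Completing the square can be done once and for all for the case A = B ≠ 0. Dividing by A and completing the square in x and y gives centre (a, b) = (−C/2A, −D/2A) and $r^2 = a^2 + b^2 - E/A$, which is a short calculation in the style of Example 1.3. The hypothetical helper below uses exact fractions and returns `None` when $r^2 < 0$, the "represents nothing" case just mentioned. It reproduces Example 1.3.

```python
import math
from fractions import Fraction

def circle_from_general(A, C, D, E):
    """Centre and radius of A x^2 + A y^2 + C x + D y + E = 0, with A != 0.

    Completes the square as in Example 1.3. Returns None when the equation
    has no real points (for example x^2 + y^2 + 1 = 0).
    """
    A, C, D, E = map(Fraction, (A, C, D, E))
    a, b = -C / (2 * A), -D / (2 * A)            # centre (a, b)
    r_squared = a * a + b * b - E / A            # (x - a)^2 + (y - b)^2 = r^2
    if r_squared < 0:
        return None
    return (a, b), math.sqrt(r_squared)

print(circle_from_general(4, -4, 8, -11))   # centre (1/2, -1), radius 2.0
print(circle_from_general(1, 0, 0, 1))      # None
```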
There are other familiar second-degree curves. We shall briefly mention those which are centred at the origin and symmetrical about the x axis. Any equation of the form
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad a > 0,\ b > 0,$$
represents an ellipse (see Fig. 1.9a). It is a ‘flattened’ circle with so-called semi-axes a and b.
An equation of the form
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \qquad a > 0,\ b > 0,$$
represents a hyperbola (Fig. 1.9b). It has two branches which approach the asymptotes y = ±bx/a as |x| becomes large.

Fig. 1.9 (a) An ellipse $x^2/a^2 + y^2/b^2 = 1$. (b) A hyperbola $x^2/a^2 - y^2/b^2 = 1$. (c) A parabola $y = x^2$.
The list of second-degree curves is completed by the parabola, which has the standard form $y = x^2$ (Fig. 1.9c).
These curves, the ellipse, hyperbola, and parabola, are known as conic sections since they can be constructed as plane sections of a cone.

Self-test 1.3
Find the radii and centres of the circles
$x^2 + y^2 - 2x = 1, \qquad x^2 + y^2 - 4x - 2y = -1.$
Find the coordinates of their points of intersection.

1.4 Functions
The area A of a circle depends on its radius r, and the dependence is expressed in the formula $A = \pi r^2$. In general, suppose that the values of a certain independent variable x, say, determine the values of a dependent variable y in such a way that if a numerical value of x is given, a single value of y is determined. Then we say that y is a function of x, and write, for example,

y = f(x), y = g(x),

and so on, where the letters f, g, etc., can be used to distinguish different forms of
dependence which can be thought of pictorially in terms of different graphs. The
letters f, g, and so on, standing alone, need not be associated with a formula
in the usual sense. They can stand for any rule, program, or calculation process
which produces a definite single value for y when we offer a number x to it.
A function can be thought of as an input–output device as in Fig. 1.10. In y = f(x), x is called the independent variable and y the dependent variable.
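The "processor" view fits naturally with a function in a programming language, which need not be a formula at all. In the Python sketch below, f is given by a calculation process: Newton's iteration for the positive square root. That process is chosen purely for illustration, and f is specified only for x ≥ 0. Whatever happens inside, a definite single value comes out for each admissible input.

```python
import math

def f(x: float, tolerance: float = 1e-12) -> float:
    """A 'processor' for y = f(x) given by a calculation, not a formula.

    The process here is Newton's iteration for the positive square root,
    chosen only as an illustration; f is specified for x >= 0 only.
    """
    if x < 0:
        raise ValueError("f is only specified for x >= 0")
    if x == 0:
        return 0.0
    y = max(x, 1.0)                  # a starting guess at least as big as sqrt(x)
    while True:
        y_next = 0.5 * (y + x / y)   # Newton step for the equation y**2 = x
        if abs(y_next - y) <= tolerance * y_next:
            return y_next
        y = y_next

print(f(2.0), math.sqrt(2.0))        # the two values agree to many decimal places
```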

Fig. 1.10 The function f represented by a processor: input x, output y = f(x).

Now suppose that the input is not simply x, but another function of x; say 2x.
For example we might be plotting the graph of y = f(2x) where f is the sine func-
tion, sin, and x is the independent variable. We then speak of 2x as being the
argument of f. We shall see many instances of this usage.
Functions can be defined implicitly by means of formulae. For example,
$x^2 + y^2 = 1$
represents a circle, centre the origin and radius 1. But if we solve the equation for y, we obtain $y = \pm\sqrt{1 - x^2}$, which is not a single function, but two separate, single-valued, functions
$y = \sqrt{1 - x^2}$ and $y = -\sqrt{1 - x^2}$,
representing the upper and lower semicircles which together make up the circle.
The following result is frequently required. Suppose that c is a positive constant, and we are given a function f, with graph y = f(x). The graph y = f(x − c) is exactly the same as that of f(x), except that it is moved, or translated, a distance c to the right along the x axis. There is a similar result for f(x + c), the movement being to the left. Therefore

Translation of a function along the x axis


Let c be any positive number, and f any function. Then
y = f(x − c) and y = f(x + c)
represent translations of y = f(x) a distance c along the x axis to the right and left
respectively. (1.11)

Thus $y = x^2$ and $y = (x + 2)^2$ have the same shape, but the second is a distance 2 to
the left of the first.
Sometimes it is helpful to adopt a more formal way of presenting a function. For example, instead of putting simply $f(x) = \sqrt{1 - x^2}$ we may say
the function f defined, for −1 ≤ x ≤ 1, by $f(x) = \sqrt{1 - x^2}$,
or, by changing the independent variable from x to t,
the function f defined, for −1 ≤ t ≤ 1, by $f(t) = \sqrt{1 - t^2}$,
which has exactly the same meaning. Any letter may be used as the independent variable to specify the formula or rule that f symbolizes; it is sometimes called a dummy variable for this reason. When we call on the function f in the course of a particular problem, we then revert to the symbols that are natural to the problem: we might want f(r) or $f(x^2)$ or f(x − y) or just a single value f(2π). In these examples, the symbols r, $x^2$, x − y, and 2π are the arguments of the function f. For example, if a function g is defined by
$g(t) = (1 - t)^2$ for all values of t,
then, with the new argument 1 − x,
$g(1 - x) = [1 - (1 - x)]^2 = x^2$,
or, with argument $t^2$,
$g(t^2) = (1 - t^2)^2$.
It is useful to have terms in which symmetry of a graph can be described. For example, the graph of the parabola $y = x^2$ shown in Fig. 1.9c is symmetrical about the y axis; the two halves for x > 0 and x < 0 are reflections of each other in the y axis. Functions with such graphs are called even functions. On the other hand $y = x^3$ (Fig. 1.4) is its own reflection in the origin: the function $f(x) = x^3$ is an example of an odd function. The corresponding algebraic properties are defined by

Even and odd functions

(a) f(x) is even if f(−x) = f(x)
(b) f(x) is odd if f(−x) = −f(x)
for all x for which f is specified. (1.12)

For example, in plotting $y = f(x) = x^3$ in Example 1.1, we did not really have to calculate $x^3$ for negative values of x. All that was necessary was to notice that $x^3$ is an odd function since $(-x)^3 = -(x^3)$, and this gives the table for negative x by changing the sign of the entries for x positive.
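Definitions (1.12) suggest a quick numerical test. The hypothetical helper `parity` below compares f(−x) with f(x) and with −f(x) at a handful of sample points; the sample points and tolerance are arbitrary choices. A finite sample can only suggest the answer, so this is a sketch for checking work, not a proof of evenness or oddness.

```python
import math

def parity(f, samples=(0.1, 0.5, 1.0, 1.7, 2.3, 3.0)) -> str:
    """Classify f as 'even', 'odd' or 'neither' by testing (1.12) at samples."""
    even = all(math.isclose(f(-x), f(x), abs_tol=1e-12) for x in samples)
    odd = all(math.isclose(f(-x), -f(x), abs_tol=1e-12) for x in samples)
    if even and odd:
        return "both"          # happens only if f vanishes at every sample
    return "even" if even else "odd" if odd else "neither"

print(parity(lambda x: x**2))      # even
print(parity(lambda x: x**3))      # odd
print(parity(lambda x: x**3 + 1))  # neither
```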
Some functions of practical significance have graphs that are not entirely
smooth. For example, we may wish to model a device that is turned on at a given
time, being quiescent before that time but active afterwards. A sudden change
in the state of the device can be represented by a function which has a jump or
discontinuity in its graph at the critical moment. The basic building block for
functions with a jump is the unit step function H(t) (also known as the Heaviside
function after its inventor, and sometimes denoted by U(t)) which we shall define,
using t to represent time, by
$$H(t) = \begin{cases} 0 & \text{when } t < 0,\\ 1 & \text{when } t \ge 0 \end{cases} \qquad (1.13)$$
(see Fig. 1.11a).
If switch-on is required at $t = t_0$ then we can use the translation
$$H(t - t_0) = \begin{cases} 0 & \text{when } t < t_0,\\ 1 & \text{when } t \ge t_0, \end{cases}$$
shown in Fig. 1.11b: it is the same graph translated to the right a distance $t_0$ by (1.11).

Fig. 1.11 (a) Graph of y = H(t). (b) Graph of $y = H(t - t_0)$.

Example 1.4 Sketch the graph of f(t) = H(2 − t) + H(t + 1) − 1.

The function f(t) is a combination of unit step functions, each of which has a discontinuity where its argument is zero. Thus f(t) has discontinuities at t = −1 (from H(t + 1)) and at t = 2 (from H(2 − t)). Note that
$$H(t + 1) = \begin{cases} 0 & \text{when } t < -1,\\ 1 & \text{when } t \ge -1, \end{cases} \qquad H(2 - t) = \begin{cases} 0 & \text{when } t > 2,\\ 1 & \text{when } t \le 2. \end{cases}$$
Hence
for t < −1, f(t) = 1 + 0 − 1 = 0;
for −1 ≤ t ≤ 2, f(t) = 1 + 1 − 1 = 1;
for t > 2, f(t) = 0 + 1 − 1 = 0.
The graph is shown in Fig. 1.12.

Fig. 1.12

This function would switch a device on at t = −1 and switch it off at t = 2.
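Step-function combinations are easy to tabulate by machine. The Python sketch below assumes the convention H(0) = 1; some texts take H(0) = ½ or leave it undefined, which changes only the values at the jumps. It evaluates the function of Example 1.4 at a few sample times.

```python
def H(t: float) -> float:
    """Unit step function (1.13), with the convention H(0) = 1."""
    return 1.0 if t >= 0 else 0.0

def f(t: float) -> float:
    """The function of Example 1.4: on for -1 <= t <= 2, off otherwise."""
    return H(2 - t) + H(t + 1) - 1

for t in [-2, -1.5, -1, 0, 1, 2, 2.5, 3]:
    print(t, f(t))
```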

The odd function denoted by sgn and defined by
$$\operatorname{sgn} t = H(t) - H(-t) = \begin{cases} -1 & \text{when } t < 0,\\ 0 & \text{when } t = 0,\\ 1 & \text{when } t > 0, \end{cases}$$
is called the signum function (from the Latin signum meaning ‘sign’, used to avoid verbal confusion with the trigonometric sine). Its graph is shown in Fig. 1.13a.
H(t) and sgn t can be used along with other functions to produce a variety of functions having discontinuities in either value or direction at assigned points. Figures 1.13b,c show the even functions y = t sgn(t) and $y = \operatorname{sgn}(1 - t^2)$. Note that $\operatorname{sgn}(1 - t^2)$ has discontinuities where $1 - t^2 = 0$; that is, where t = ±1.
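The definition sgn t = H(t) − H(−t) can be coded literally. The Python sketch below again assumes H(0) = 1, although the value of sgn 0 does not depend on that choice. It checks the three cases, shows that t sgn t is simply |t|, and tabulates sgn(1 − t²) around its jumps at t = ±1.

```python
def H(t: float) -> float:
    """Unit step function (1.13), with the convention H(0) = 1."""
    return 1.0 if t >= 0 else 0.0

def sgn(t: float) -> float:
    """Signum function, defined as H(t) - H(-t)."""
    return H(t) - H(-t)

print([sgn(t) for t in (-2, 0, 3)])                    # [-1.0, 0.0, 1.0]
print([t * sgn(t) for t in (-2, 0, 3)])                # [2.0, 0.0, 3.0], i.e. |t|
print([sgn(1 - t * t) for t in (-2, -1, 0, 1, 2)])     # [-1.0, 0.0, 1.0, 0.0, -1.0]
```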

Fig. 1.13 (a) y = sgn(t). (b) y = t sgn(t). (c) $y = \operatorname{sgn}(1 - t^2)$.

Self-test 1.4
Sketch the graph of
y = [H(t + 1) + H(1 − t) − 1] t sgn t
1.5 Radian measure of angles

For everyday purposes angles are measured in degrees, so we are still following
the Babylonian practice of dividing the circle into 360 sectors each of which
subtends a degree (1°). For mathematical purposes, a less arbitrary measure is
desirable. The absolute unit is the radian, which represents about 57°. The special
property which makes the unit valuable is its connection with length.
Figure 1.14 shows a circle of radius R with a sector AOB containing an angle θ. The length ℓ of the arc is proportional to R, and it is proportional to θ whatever the angular units, so it is proportional to the product Rθ. One radian is the unit of angle such that ℓ is numerically equal to Rθ.

Fig. 1.14

Then if θ is measured in radians the arc-length
ℓ = Rθ.
But when θ covers the whole circle, the arc-length is 2πR. Since the whole circle measures 360 in degrees, we obtain
θ = 360° = 2π radians, or 1 radian = 180/π degrees ≈ 57.295 78°.
The following summarizes some useful information:

Radians and degrees

(a) α degrees = $\tfrac{\pi}{180}\alpha$ radians, β radians = $\tfrac{180}{\pi}\beta$ degrees.
360° = 2π rad, 180° = π rad, 90° = $\tfrac{1}{2}\pi$ rad,
45° = $\tfrac{1}{4}\pi$ rad, 60° = $\tfrac{1}{3}\pi$ rad, 30° = $\tfrac{1}{6}\pi$ rad.
(b) On a circle of radius R, the arc-length subtended by θ radians is Rθ. (1.14)
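The conversions (1.14) are one-line functions. The Python sketch below writes them out to mirror (1.14a); Python's standard library already provides the same conversions as `math.radians` and `math.degrees`. The usage computes the arc subtended by 30° on a circle of radius 2, an illustrative choice.

```python
import math

def degrees_to_radians(alpha: float) -> float:
    """alpha degrees = (pi/180) * alpha radians, as in (1.14a)."""
    return alpha * math.pi / 180

def radians_to_degrees(beta: float) -> float:
    """beta radians = (180/pi) * beta degrees, as in (1.14a)."""
    return beta * 180 / math.pi

def arc_length(R: float, theta: float) -> float:
    """Arc-length on a circle of radius R subtended by theta radians, (1.14b)."""
    return R * theta

print(radians_to_degrees(1))                      # about 57.29578
print(arc_length(2.0, degrees_to_radians(30)))    # pi/3, about 1.0472
```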

The radian measure is not just a matter of convention, like measuring lengths in metres rather than feet. An angle θ of 30° is equal to an angle of $\tfrac{1}{6}\pi$ radians, so θ subtends an arc of length $R\theta = \tfrac{1}{6}\pi R$ on the circumference of a circle of radius R; not 30 × R, which would be ridiculous. This observation has consequences in other places. For example, if you have already learned some calculus you might know that (d/dθ) sin θ = cos θ, and that if θ is small enough, sin θ is approximately equal to θ. But neither result is true unless θ is measured in radians.
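Both claims can be observed numerically, since Python's `math.sin` expects radians. In the sketch below, the small angle 0.01 and the step h = 1e-6 in the central-difference estimate of the derivative are arbitrary illustrative choices. Reading the same number as degrees gives a ratio near π/180 rather than 1.

```python
import math

theta = 0.01                                   # a small angle, in radians
print(math.sin(theta) / theta)                 # very close to 1
# Treating the same number as 0.01 degrees instead:
print(math.sin(math.radians(theta)) / theta)   # close to pi/180, not 1

def d_sin_numeric(t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative of sin at t (radians)."""
    return (math.sin(t + h) - math.sin(t - h)) / (2 * h)

print(d_sin_numeric(0.7), math.cos(0.7))       # the two values agree closely
```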
1.6 Trigonometric functions; properties

We assume that you know the meanings of sine, cosine, and tangent for positive acute angles, as in ordinary trigonometry. We shall extend their meaning to angles greater than 90°, and to negative angles. Unless indicated otherwise, angles are always given in radians. In Fig. 1.15, X is any point on the positive x axis. If we rotate the segment OX about O in the anticlockwise direction to arrive at a final position OX′, the total angle through which it has turned is counted as a positive number. If the rotation is clockwise, then the angle is given a negative sign. These angles are unlimited in magnitude. We refer to an angle measured from the positive x axis in this way as a polar angle. There is an infinite number of polar angles leading to the same direction OX′, differing by multiples of a complete revolution 2π.

Fig. 1.15 Polar angles θ (in radians): (a) $\tfrac{3}{4}\pi$ (θ > 0). (b) $-\tfrac{5}{4}\pi$ (θ < 0). (c) $\tfrac{11}{4}\pi$ (θ > 2π). (These all have the same direction OX′.)

The trigonometric functions sine, cosine, and tangent for all angles are defined
by the construction in Fig. 1.16, in which P : (x, y) is any point, and θ is treated as
a polar angle. The length OP is given by
$OP = r = \sqrt{x^2 + y^2} > 0.$

Fig. 1.16 Diagram for cos θ, sin θ, tan θ.

Then the definitions of the trigonometric functions for arbitrary angles θ are as
follows:
Trigonometric functions; definitions

θ is arbitrary, $r = \sqrt{x^2 + y^2} > 0$ (see Fig. 1.16)
$$\cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}, \qquad \tan\theta = \frac{y}{x}.$$
(sec θ = 1/cos θ, cosec θ (or csc θ) = 1/sin θ, cot θ = 1/tan θ). (1.15)

These definitions are extensions, to all four quadrants of the (x, y) plane, of the
familiar geometrical meanings in the first quadrant. The length r is positive, but
the coordinates x and y are signed quantities which determine the signs of the
trigonometric functions sin θ, cos θ, tan θ in the four quadrants. The following
lists the ones which are positive:
1st quadrant, x > 0, y > 0: all are > 0.
2nd quadrant, x < 0, y > 0: sin θ > 0.
3rd quadrant, x < 0, y < 0: tan θ > 0.
4th quadrant, x > 0, y < 0: cos θ > 0.
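Definitions (1.15) compute the trigonometric functions from the coordinates of a point, with no angle needed. The hypothetical helper below does exactly that (Python 3.9 or later for the type hints). It then compares the results with the library functions evaluated at the polar angle given by `math.atan2`, which returns a polar angle in the range −π to π. The sample point (−1, 1) is illustrative.

```python
import math

def trig_from_point(x: float, y: float) -> tuple[float, float, float]:
    """cos, sin, tan of the polar angle of P : (x, y), using definitions (1.15).

    Assumes P is not the origin; tan is undefined (ZeroDivisionError) when x == 0.
    """
    r = math.hypot(x, y)            # r = sqrt(x^2 + y^2) > 0
    if r == 0:
        raise ValueError("P must not be the origin")
    return x / r, y / r, y / x

# A point in the 2nd quadrant: only sin is positive.
cos_t, sin_t, tan_t = trig_from_point(-1.0, 1.0)
print(cos_t > 0, sin_t > 0, tan_t > 0)   # False True False

# Agreement with the library functions at the polar angle atan2(y, x):
theta = math.atan2(1.0, -1.0)            # 3*pi/4
print(math.isclose(cos_t, math.cos(theta)), math.isclose(sin_t, math.sin(theta)))
```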

Example 1.5 Obtain (a) $\sin\tfrac{1}{3}\pi$, (b) $\tan\tfrac{1}{6}\pi$, (c) $\cos\tfrac{1}{4}\pi$. (The angles are in radians.)
Equation (1.14a) gives the angles in degrees. Use the triangles in Fig. 1.17.
(a) $\sin\tfrac{1}{3}\pi = \sin 60^\circ = \sqrt{3}/2$.
(b) $\tan\tfrac{1}{6}\pi = \tan 30^\circ = 1/\sqrt{3}$.
(c) $\cos\tfrac{1}{4}\pi = \cos 45^\circ = 1/\sqrt{2}$.

Fig. 1.17 (a) sin 45° = cos 45° = 1/√2. (b) sin 60° = cos 30° = √3/2; cos 60° = sin 30° = 1/2.

Example 1.6 Obtain (a) cos 2π, (b) $\sin\tfrac{3}{2}\pi$, (c) $\sin(-\tfrac{3}{4}\pi)$.
The points P on Fig. 1.18 have OP = r = 1, and polar angles equal to the angles given in the question. The x, y coordinates of P are easy to find.
(a) P is at (x, y) = (1, 0), so that cos 2π = x/r = 1.
(b) P is at (x, y) = (0, −1), so that $\sin\tfrac{3}{2}\pi = y/r = -1$.
(c) P is at $(x, y) = (-1/\sqrt{2}, -1/\sqrt{2})$, so that $\sin(-\tfrac{3}{4}\pi) = y/r = -1/\sqrt{2}$.

Fig. 1.18

Fig. 1.19 Graphs of cos θ and sin θ.

The graphs of cos θ and sin θ are shown in Fig. 1.19. Observe the following:
1. The curves for cos θ and sin θ have identical shape, but are displaced a distance $\tfrac{1}{2}\pi$ (radians) from each other. They are related by
$\sin\theta = \cos(\theta - \tfrac{1}{2}\pi), \qquad \cos\theta = \sin(\theta + \tfrac{1}{2}\pi)$. (1.16)
2. The functions cos θ and sin θ are said to be periodic, with period (or wavelength) equal to 2π; that is, the curves repeat themselves at intervals of length 2π. This is evident from the definition of a polar angle, because in terms of the polar angle of a point P, an increase or decrease of 2π radians is equivalent to a complete revolution.
3. cos θ is an even function (see eqn (1.12)), so that cos(−θ) = cos θ; sin θ is an odd function, so that sin(−θ) = −sin θ.
4. The values taken by cos θ and sin θ oscillate between −1 and +1.
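The four observations can be spot-checked numerically. The Python sketch below tests (1.16), the period 2π, the parity of cos and sin, and the bounds ±1 at an arbitrary spread of sample angles. Agreement at samples illustrates the relations but does not prove them.

```python
import math

HALF_PI = math.pi / 2

for k in range(-20, 21):
    theta = 0.37 * k                       # sample angles in radians
    # 1. The shift relations (1.16)
    assert math.isclose(math.sin(theta), math.cos(theta - HALF_PI), abs_tol=1e-12)
    assert math.isclose(math.cos(theta), math.sin(theta + HALF_PI), abs_tol=1e-12)
    # 2. Period 2*pi
    assert math.isclose(math.cos(theta + 2 * math.pi), math.cos(theta), abs_tol=1e-12)
    # 3. cos is even, sin is odd
    assert math.isclose(math.cos(-theta), math.cos(theta), abs_tol=1e-12)
    assert math.isclose(math.sin(-theta), -math.sin(theta), abs_tol=1e-12)
    # 4. Values lie between -1 and +1
    assert -1 <= math.sin(theta) <= 1 and -1 <= math.cos(theta) <= 1
print("(1.16), period 2*pi, parity and bounds hold at all samples")
```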
The graphs of tan θ, cot θ, sec θ, and cosec θ are shown in Fig. 1.20.
There are many trigonometric identities in common use. The following are
some of the more important (a more extensive list is given in Appendix B):
Fig. 1.20 (a) y = tan θ. (b) y = cot θ. (c) y = sec θ. (d) y = cosec θ.

Trigonometric identities
For all angles A and B:
(a) Sums and differences of angles
$\sin(A \pm B) = \sin A \cos B \pm \cos A \sin B$,
$\cos(A \pm B) = \cos A \cos B \mp \sin A \sin B$,
$\tan(A \pm B) = (\tan A \pm \tan B)/(1 \mp \tan A \tan B)$.
(b) Products as sums and differences
$\cos A \cos B = \tfrac{1}{2}[\cos(A + B) + \cos(A - B)]$,
$\cos A \sin B = \tfrac{1}{2}[\sin(A + B) - \sin(A - B)]$,
$\sin A \sin B = \tfrac{1}{2}[-\cos(A + B) + \cos(A - B)]$.
(c) Double angles
$\cos^2 A + \sin^2 A = 1$,
$\sin 2A = 2 \sin A \cos A$,
$\cos^2 A = \tfrac{1}{2}(1 + \cos 2A)$,
$\sin^2 A = \tfrac{1}{2}(1 - \cos 2A)$.
(d) Cosine rule. In a triangle with side lengths a, b, c and opposite angles A, B, C,
$c^2 = a^2 + b^2 - 2ab\cos C$.
(e) Sine rule. In a triangle with side lengths a, b, c and opposite angles A, B, C,
$$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c}.$$
(Since the identities are the same as for positive acute angles we do not prove them here.) (1.17)
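Since the identities (1.17) are not proved here, a numerical spot-check is a useful safeguard against misprints and sign slips. The Python sketch below tests several of them at random angles (the range and the tolerance are arbitrary choices). It also applies the cosine rule to the 3, 4, 5 right-angled triangle as a sample case.

```python
import math
import random

def check_identities(A: float, B: float, tol: float = 1e-9) -> bool:
    """Spot-check several identities of (1.17) at the angles A, B (radians)."""
    s, c = math.sin, math.cos
    checks = [
        (s(A + B), s(A) * c(B) + c(A) * s(B)),           # (a), upper sign
        (c(A - B), c(A) * c(B) + s(A) * s(B)),           # (a), lower sign
        (c(A) * c(B), 0.5 * (c(A + B) + c(A - B))),      # (b)
        (c(A) * s(B), 0.5 * (s(A + B) - s(A - B))),      # (b)
        (s(A) * s(B), 0.5 * (-c(A + B) + c(A - B))),     # (b)
        (s(2 * A), 2 * s(A) * c(A)),                     # (c)
        (c(A) ** 2, 0.5 * (1 + c(2 * A))),               # (c)
        (s(A) ** 2, 0.5 * (1 - c(2 * A))),               # (c)
    ]
    return all(math.isclose(lhs, rhs, abs_tol=tol) for lhs, rhs in checks)

random.seed(0)
assert all(check_identities(random.uniform(-10, 10), random.uniform(-10, 10))
           for _ in range(1000))

# Cosine rule (d): sides a = 3, b = 4 with included angle C = 90 degrees.
a, b, C = 3.0, 4.0, math.radians(90)
print(math.sqrt(a * a + b * b - 2 * a * b * math.cos(C)))   # 5, up to rounding
```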
We have so far encountered cos θ, sin A, etc., in which θ and A are understood to represent certain angles arising in a geometrical context, but trigonometric functions are used in many applications which have nothing directly to do with angles. For example, expressions such as cos ωt will occur, in which t stands for time and ω (Greek ‘omega’) is a constant. The identities continue to hold, whatever the context.
With this in mind we now obtain an important identity which shows that any
function having the form a cos u + b sin u, where u is the independent variable
and a and b are constants, can be put into the form c cos(u + φ), in which c and
φ (Greek ‘phi’) are constants obtainable in terms of a and b.
Put
$c = \sqrt{a^2 + b^2} > 0$ (assuming a and b are not both zero),
and rewrite a cos u + b sin u in the form
a cos u + b sin u = c[(a/c)cos u + (b/c)sin u].
Let φ be an angle such that
cos φ = a/c and sin φ = −b/c.
The angle φ can be found by locating the point Q : (a, −b) on cartesian axes, as
shown in Fig. 1.21. Any one of the polar angles of Q satisfies the condition above; usually the smallest in absolute magnitude is chosen. (Since $c = \sqrt{a^2 + b^2} > 0$,
the radial and angular polar coordinates of Q are c and φ respectively. If you
do not know about polar coordinates, look forward to the first paragraphs of
Section 1.9.)

Fig. 1.21 To find φ satisfying a/c = cos φ, b/c = −sin φ (here a and b are positive).

We then obtain
a cos u + b sin u = c(cos φ cos u − sin φ sin u) = c cos(u + φ)
from (1.17a) with u and φ in place of A and B. We have obtained the identity:

Harmonic functions
a cos u + b sin u = c cos(u + φ),
where c and φ are polar coordinates of the point (a, −b) in cartesian axes. (1.18)
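Identity (1.18) reduces to two library calls. `math.hypot(a, b)` gives c, and `math.atan2(-b, a)` gives a polar angle of Q : (a, −b) in the range −π to π, which is the choice "smallest in absolute magnitude" mentioned above. The hypothetical helper below follows this approach. The usage takes a = b = 1 and then confirms the identity at a few sample values of u.

```python
import math

def harmonic_form(a: float, b: float) -> tuple[float, float]:
    """Return (c, phi) with a*cos(u) + b*sin(u) = c*cos(u + phi), as in (1.18).

    c and phi are polar coordinates of Q : (a, -b); assumes a, b not both zero.
    """
    c = math.hypot(a, b)
    phi = math.atan2(-b, a)
    return c, phi

c, phi = harmonic_form(1.0, 1.0)
print(c, phi)        # sqrt(2) and -pi/4
for u in [0.0, 0.5, 1.0, 2.0, -3.0]:
    assert math.isclose(math.cos(u) + math.sin(u), c * math.cos(u + phi), abs_tol=1e-12)
```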
Fig. 1.22 Graph of y = c cos(ωt + φ).

A function having the form A cos(ku + α), where A, k, and α are any constants, is called a harmonic function or a sinusoid in the variable u. Sine functions are included by virtue of (1.16), since we can change a sine into a cosine by subtracting $\tfrac{1}{2}\pi$ from the argument of sine.

Consider the harmonic function given by


y = c cos(ωt + φ) (c > 0),
where the independent variable t represents time. The function is used to repres-
ent quantities y which have a regular wave-like time variation (or space variation
if the variable is distance x instead of time t). Figure 1.22 displays a function of
this type. It is like a cosine function c cos t, translated a distance − φ /ω along the
t axis, and stretched or compressed horizontally to an extent depending on ω. It
is periodic with period 2π /ω because, if we take any value of t and increase it by
2π/ω, we obtain
y = c cos[ω (t + 2π /ω) + φ ] = c cos(ω t + 2π + φ ) = c cos(ω t + φ ).
Therefore the value of y is repeated across every interval of length 2π/ω. The
number c > 0 is called the amplitude; y is said to oscillate between ±c. The
frequency, in cycles per unit time, is ω /2π, and ω is called the circular or angular
frequency. The constant φ is the phase angle.
A general function f(x) is said to be periodic with period p if f(x + p) = f(x) for
every value of x. If p is a period, so obviously are 2p, 3p, and so on. The period
usually meant when we say that a function has period p is the smallest positive
period. From Fig. 1.20, tan θ and cot θ have period π in θ.
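The quantities just defined are quick to compute and check. The Python sketch below uses illustrative parameter values c = 2, ω = 3, φ = 0.4. It forms the period 2π/ω and the frequency ω/2π, confirms at sample times that y repeats after one period and stays between −c and c, and checks that tan has period π.

```python
import math

def harmonic(c: float, omega: float, phi: float):
    """Return the function y(t) = c*cos(omega*t + phi)."""
    return lambda t: c * math.cos(omega * t + phi)

c, omega, phi = 2.0, 3.0, 0.4
y = harmonic(c, omega, phi)
period = 2 * math.pi / omega
frequency = omega / (2 * math.pi)       # cycles per unit time
print(period, frequency, period * frequency)   # the product is 1, up to rounding

for k in range(50):
    t = 0.13 * k
    assert math.isclose(y(t + period), y(t), abs_tol=1e-12)   # repeats every period
    assert -c <= y(t) <= c                                    # amplitude c

# tan has period pi, as read from Fig. 1.20.
assert math.isclose(math.tan(0.3 + math.pi), math.tan(0.3))
```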

Self-test 1.5
Find numerical formulas for $\cos\tfrac{1}{12}\pi$ and $\sin\tfrac{1}{12}\pi$.
1.7 Inverse functions

Let y = f(x), where f is a given function. It is often necessary to find a value of x corresponding to a given value of y, which amounts to solving a certain equation. For example, if f is defined by
$y = f(x) = x^3$ and y = 8,
then the resulting equation, $8 = f(x) = x^3$, is solved uniquely by $x = 8^{1/3} = 2$. In this case there is a unique value of x corresponding to every value of y, positive or negative. These x values each depend on y, so we say that x is a function of y. Denoting this function by F, then we can write
$x = F(y) = y^{1/3}$.
The function F is called the inverse function of f.


The following are the fundamental reciprocal relationships between F and f:
$F\{f(x)\} = F(y) = (x^3)^{1/3} = x$, for every value of x,
and
$f\{F(y)\} = f(x) = (y^{1/3})^3 = y$, for every value of y.
Initially, the values of x and y were connected through the relation y = f(x), but in the final form of these equations,
F{f(x)} = x, for every value of x,
f{F(y)} = y, for every value of y,
x and y are no longer connected, and we may use any letters to indicate the variables in place of x and y. In fact, these two equations are identities (see Section 1.1). For example, if we were concerned with an application involving an angle θ it might be convenient to write the first one in the form
F{f(θ)} = θ, for every value of θ,
or we could substitute x for y in the second equation, without changing their meaning.
Returning to the original problem, if we know the inverse function F we can
solve the equation
c = f(x)
by using the first reciprocal relation F{ f(x)} = x. Taking the inverse F of both
sides of the equation, we obtain
F(c) = F{f(x)} = x,
so that the required value of x is F(c).
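When no formula for F is available, F(c) can still be computed numerically by solving c = f(x). The sketch below uses bisection, one simple method among many and not one introduced in the text. It assumes f is continuous and increasing on a bracketing interval [lo, hi] supplied by the user; `inverse_by_bisection` is a hypothetical name. The usage recovers cube roots and checks the reciprocal relation F{f(x)} = x at one point.

```python
def inverse_by_bisection(f, c, lo, hi, tol=1e-12, max_steps=200):
    """Solve c = f(x) for x in [lo, hi], i.e. evaluate F(c) numerically.

    Assumes f is continuous and increasing on [lo, hi].
    """
    if not f(lo) <= c <= f(hi):
        raise ValueError("c must lie between f(lo) and f(hi)")
    for _ in range(max_steps):          # the cap guards against float stagnation
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if f(mid) < c:
            lo = mid                    # the solution lies to the right of mid
        else:
            hi = mid                    # the solution lies at or left of mid
    return 0.5 * (lo + hi)

def cube(x):
    return x ** 3

print(inverse_by_bisection(cube, 8.0, 0.0, 10.0))        # approximately 2
print(inverse_by_bisection(cube, -27.0, -10.0, 0.0))     # approximately -3
# The reciprocal relation F{f(x)} = x, checked at x = 1.7:
print(inverse_by_bisection(cube, cube(1.7), 0.0, 10.0))  # approximately 1.7
```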
We shall now give a geometrical description of the operations we have just gone through. Figure 1.23a is the graph of $y = f(x) = x^3$. Choose any number a. To find $a^3$, locate x = a at A and follow the track ABC. The point C on the y axis represents $y = f(a) = a^3$. Now choose a number b with the aim of obtaining $b^{1/3}$. Read the graph backwards: locate y = b at U and follow the track UVW. Then W represents $b^{1/3}$. Therefore the same curve generates values of $x^3$ and also of its inverse function $F(x) = x^{1/3}$. The two identities given above amount to the obvious fact that if we follow the tracks ABCBA and CBABC respectively in Fig. 1.23a, then we arrive back at the starting point in each case.

Fig. 1.23 (a) The function $f(x) = x^3$. (b) The function $f(x) = x^3$ and its inverse $F(x) = x^{1/3}$: since the scales are equal, f(x) and F(x) reflect each other in the 45° line (see e.g. P : (a, b) and Q : (b, a)).
In order to obtain cube roots, we might prefer a graph from which the cube
root can be read off in the ordinary way: from a horizontal x axis to a vertical
y axis. Suppose that we plot the curves $y = f(x) = x^3$ and the inverse in the form
$y = F(x) = x^{1/3}$ on the same sheet of paper (see Fig. 1.23b). We also arrange that the
x and y scales are equal. Let P : (a, b), where $b = a^3$, be any point on $y = f(x) = x^3$.
The corresponding point on the graph of the inverse function $y = F(x) = x^{1/3}$ is
Q : (b, a). Since the x and y scales are equal, Q is the reflection of P in the straight
line through the origin inclined at 45° to the x axis. Therefore the graphs of
y = f(x) and of its inverse y = F(x) are reflections of each other in the 45° radial
line. This is shown for $f(x) = x^3$ and its inverse $F(x) = x^{1/3}$ in Fig. 1.23b.
The arguments are basically the same for any function and its inverse. Whatever
f may be, we obtain the graph of its inverse function F by plotting y = f(x)
with equal scales and reflecting it in the 45° line. Also, the reciprocal identities
$$F\{f(x)\} \equiv x \quad\text{and}\quad f\{F(x)\} \equiv x \tag{1.19}$$
apply in the general case, though this is usually subject to restrictions on the range
of the function F, in order to ensure that a single, unique value is assigned to F(x).
The example $f(x) = x^3$ is particularly straightforward, since the functions $y = x^{1/3}$
and $y = x^3$ are unique inverses of each other for every value of x and y. The problem
of single-valuedness of F arises if the graph of f(x) ‘turns over’ at some point,
or points. A simple example is $y = f(x) = x^2$ (see Fig. 1.9c), which turns over at
x = 0, so that the graph falls into two parts, on the left and right sides of the
y axis. The inverse function $F_1$ corresponding to the right-hand branch of $y = x^2$ is
$y = F_1(x) = x^{1/2}$, valid only for x ≥ 0 and y ≥ 0. A second inverse function, defined
by $y = F_2(x) = -x^{1/2}$ for x ≥ 0 and y ≤ 0, arises from the left-hand branch. The two
inverse functions taken together provide, for example, the two expected solutions
of the equation $f(x) = x^2 = 2$, namely
$$x = F_1(2) = +2^{1/2} \quad\text{and}\quad x = F_2(2) = -2^{1/2}.$$
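The two branches can be sketched in Python as follows. `F1` and `F2` are illustrative names matching the text. `math.sqrt` supplies the positive square root; it raises `ValueError` for negative arguments, which mirrors the restriction x ≥ 0.

```python
import math

def F1(x: float) -> float:
    """Inverse of the right-hand branch of y = x**2: y = +sqrt(x), x >= 0, y >= 0."""
    return math.sqrt(x)

def F2(x: float) -> float:
    """Inverse of the left-hand branch of y = x**2: y = -sqrt(x), x >= 0, y <= 0."""
    return -math.sqrt(x)

# Taken together, the two branches give both solutions of x**2 = 2.
roots = (F1(2.0), F2(2.0))
print(roots)
```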
The problem arises particularly in Section 1.8 in connection with inverse trigonometric functions.
Figure 1.24 illustrates the general character of positive integer powers $x^n$ and their
inverses $x^{1/n}$, the picture being confined to the range 0 ≤ x ≤ 1 for clarity. Notice
the symmetry of the inverse pairs $x^n$ and $x^{1/n}$ about the 45° line $y = x^1 = x$. Graphs
corresponding to other powers of x lie between those shown, in a regular way.

Fig. 1.24 Some positive integer powers $x^n$ and their inverses $x^{1/n}$ (n = 1, 2, 3, 4, for 0 ≤ x ≤ 1),
showing reflection across the 45° line.

1.8 Inverse trigonometric functions


Consider the inverses of the trigonometric functions given by x = sin θ, x = cos θ,
and x = tan θ, in which θ can be thought of as an angle measured in radians.
There are two commonly used notations for the inverses. In this book they are
denoted by
θ = arcsin x, θ = arccos x, θ = arctan x.
The alternative notation for inverse trigonometric functions is
$$\theta = \sin^{-1}x, \quad \theta = \cos^{-1}x, \quad \theta = \tan^{-1}x.$$
(In this notation the index (−1) does not signify a negative power: for example,
$\sin^{-1}x$ does not stand for $1/\sin x$. Correspondingly, $1/\sin x$ must be written as
$(\sin x)^{-1}$ in index form to distinguish its meaning. It is to avoid this possible source
of confusion that we have adopted the more modern notation arcsin etc., which
is also consistent with computer notation.)
Firstly, consider the problem of finding the inverse sine function defined by
θ = arcsin x, corresponding to the given function x = sin θ. The inverse function should
answer the question ‘what angles θ have their sine equal to x?’ Evidently if x < −1
or x > 1 there exists no such angle. However, if x does lie in the permitted range,
there is an infinite number of such angles. This is illustrated in Fig. 1.25 for the
special case where $x = \tfrac{1}{2}$. The graph x = sin θ intersects the line $x = \tfrac{1}{2}$ at an infinite
number of points,
$$\theta = \ldots,\ -\tfrac{7}{6}\pi,\ \tfrac{1}{6}\pi,\ \tfrac{5}{6}\pi,\ \ldots.$$
Fig. 1.25 The intersections of the graphs $x = \tfrac{1}{2}$ and $x = \sin\theta$ give the solutions of the equation
$\sin\theta = \tfrac{1}{2}$.

A computer program or a hand calculator can deliver only a single, definite
value of a function such as arcsin x, not an infinite shower of alternatives. However, if
we are given any single one of these values, we can easily use it to construct all
of them. Suppose θ = α is any one solution of the equation sin θ = c, where c is
a constant with −1 ≤ c ≤ 1. Then all the values of θ satisfying the equation
sin θ = c are obtainable from α by means of the formula
$$\theta = n\pi + (-1)^n\alpha,$$
where n is any integer (see Appendix B(g)).
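As a sketch of how one value generates all the others, the Python function below applies $\theta = n\pi + (-1)^n\alpha$ over a finite range of integers n. `sin_solutions` is a hypothetical helper, not a library routine. It takes α from `math.asin`, which returns a value in the standard range $-\tfrac12\pi \le \theta \le \tfrac12\pi$. For c = ½, the solutions expressed as multiples of π should come out as −11/6, −7/6, 1/6, 5/6 and 13/6.

```python
import math

def sin_solutions(c: float, ns: range) -> list[float]:
    """Solutions theta = n*pi + (-1)**n * alpha of sin(theta) = c, for n in ns.

    alpha is one particular solution; here math.asin(c) supplies it.
    """
    if not -1.0 <= c <= 1.0:
        raise ValueError("sin(theta) = c has no real solution unless -1 <= c <= 1")
    alpha = math.asin(c)
    return [n * math.pi + (-1) ** n * alpha for n in ns]

thetas = sin_solutions(0.5, range(-2, 3))
print([round(t / math.pi, 4) for t in thetas])   # the solutions as multiples of pi
```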


In order to obtain a single value for θ = arcsin x and to exclude the rest, we need
to identify an interval on the graph of x = sin θ along which every value of x
between −1 and 1 occurs once and only once. The standard interval used for this
purpose is
$$-\tfrac{1}{2}\pi \le \theta \le \tfrac{1}{2}\pi,$$
shown shaded in Fig. 1.25. In this way the standard inverse function
θ = arcsin x is restricted to lie in the range $-\tfrac{1}{2}\pi \le \theta \le \tfrac{1}{2}\pi$. Figure 1.26a shows its
graph. Given a value of x with −1 ≤ x ≤ 1, and the equation sin θ = x, it returns
the value of θ which has the smallest absolute value. It is sometimes
called the principal value of arcsin x, and is denoted by a capital letter, that is,
by Arcsin x (this is the value that the software Mathematica returns for the
command ArcSin[x]). If we want other solutions we must derive them from the
formula above.
The inverse cosine and tangent functions, arccos and arctan, are approached
in a similar way and are displayed in Figs 1.26b,c. They have different standard
ranges, and Appendix B(g) contains the formulae for obtaining all other solutions
of x = cos θ and x = tan θ. The various inverse functions are connected, as in the
following example.

Example 1.7 Simplify (a) cos(arctan x), (b) arcsin(cos x).


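(a) In the right-angled triangle of Fig. 1.27a, x = tan θ, so that θ = arctan x. Therefore
$$\cos(\arctan x) = \cos\theta = 1/\sqrt{1 + x^2}.$$
(b) In the right-angled triangle of Fig. 1.27b,
$\cos x = \sin(\tfrac{1}{2}\pi - x)$, so that $\arcsin(\cos x) = \tfrac{1}{2}\pi - x$. (This holds for 0 ≤ x ≤ π, where $\tfrac{1}{2}\pi - x$ lies in the standard range of arcsin; outside this interval the principal value is different.)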
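The two results of Example 1.7 can be spot-checked numerically with Python's `math` module. Result (b) is tested only on 0 ≤ x ≤ π, which is the interval on which $\tfrac12\pi - x$ lies in the standard range of arcsin. The last line shows what happens outside that interval: at x = −1 the formula would give $\tfrac12\pi + 1$, while `math.asin` returns $\tfrac12\pi - 1$.

```python
import math

# (a) cos(arctan x) = 1/sqrt(1 + x**2), for every real x.
for x in (-5.0, -0.3, 0.0, 2.0, 40.0):
    assert math.isclose(math.cos(math.atan(x)), 1 / math.sqrt(1 + x * x))

# (b) arcsin(cos x) = pi/2 - x, checked only on 0 <= x <= pi.
for x in (0.0, 0.4, 1.5, 3.0, math.pi):
    assert math.isclose(math.asin(math.cos(x)), math.pi / 2 - x, abs_tol=1e-12)

# Outside that interval the simple formula fails, e.g. at x = -1:
print(math.asin(math.cos(-1.0)), math.pi / 2 - (-1.0))
```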
Fig. 1.26 The inverse trigonometric functions over their standard ranges. In (a) and (b), the scales
of θ and x are the same. (a) θ = arcsin x, $-\tfrac{1}{2}\pi \le \theta \le \tfrac{1}{2}\pi$, −1 ≤ x ≤ 1. (b) θ = arccos x, 0 ≤ θ ≤ π,
−1 ≤ x ≤ 1. (c) θ = arctan x, $-\tfrac{1}{2}\pi < \theta < \tfrac{1}{2}\pi$, all x.

Fig. 1.27 See Example 1.7.

Self-test 1.6
Find a formula for sin(2 arctan x).

1.9 Polar coordinates



In Fig. 1.28, P is a general point with coordinates x, y. The length of OP (always
counted positive) is given by
$$r = OP = \sqrt{x^2 + y^2} \ge 0.$$

Fig. 1.28 Polar coordinates: x = r cos θ, y = r sin θ.
The angle θ is any polar angle (see Section 1.7) that locates P; we choose the
simplest one for the illustration, but our statements are true for any of the valid
polar angles (which are given by θ + 2πn for every integer n).
Definite values for r and θ locate a unique point P just as well as the cartesian
coordinates x, y, so in suitable cases we can use r, θ as coordinates in place of x, y.
They are called polar coordinates. From the definitions of sine and cosine in
eqn (1.15), the cartesian and polar coordinates of P are related in the following way:

Cartesian and polar coordinates


(a) x = r cos θ, y = r sin θ.
(b) $r = \sqrt{x^2 + y^2} \ge 0$, cos θ = x/r, and sin θ = y/r. (1.20)
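The conversions (1.20) can be sketched in Python as below. `polar_to_cartesian` is an illustrative name. `math.hypot(x, y)` computes the non-negative root $\sqrt{x^2 + y^2}$. Recovering θ itself from x and y needs more care, and that is discussed later in this section.

```python
import math

def polar_to_cartesian(r: float, theta: float) -> tuple[float, float]:
    """Eqn (1.20a): x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = 2.0, 2.5                  # an arbitrary illustrative point
x, y = polar_to_cartesian(r, theta)

# Eqn (1.20b): r is the non-negative root of x^2 + y^2; cos and sin follow as x/r and y/r.
r_back = math.hypot(x, y)
print(f"x = {x:.4f}, y = {y:.4f}, r = {r_back:.4f}")
print(math.isclose(x / r_back, math.cos(theta)), math.isclose(y / r_back, math.sin(theta)))
```

Adding 2π to θ leaves (x, y) unchanged, which is why any of the polar angles θ + 2πn locates the same point.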

Polar coordinates are often easier to use than x, y coordinates, especially for
curves that surround the origin. The simplest example is a circle, centre the origin
and radius c, whose polar equation is
r = c.
A spiral, such as a track on a compact disk with inner radius a, outer radius b, and
track width h, is described by
$$r = b - \frac{h}{2\pi}\,\theta,$$
in which θ runs from zero to 2πN, where N = (b − a)/h is the number of revolutions,
so that r decreases from b to b − Nh = a. Note the enormous size attained by the
polar angle as it follows the rotation through many revolutions.
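A short sketch of the spiral formula follows. The disk dimensions are illustrative assumptions only (rough compact-disk-like values in millimetres), not figures from the text. The point is to confirm that r falls from b to a as θ runs from 0 to 2πN, and to show how large the final polar angle becomes.

```python
import math

def spiral_radius(theta: float, b: float, h: float) -> float:
    """Track radius r = b - (h / (2*pi)) * theta, starting from the outer radius b."""
    return b - h * theta / (2 * math.pi)

# Illustrative values only (millimetres); they are not taken from the text.
a, b, h = 25.0, 58.0, 0.0016
N = (b - a) / h               # number of revolutions
theta_end = 2 * math.pi * N   # final polar angle, in radians

print(f"N = {N:.0f} revolutions, final polar angle = {theta_end:.0f} rad")
print(spiral_radius(0.0, b, h), spiral_radius(theta_end, b, h))
```

Each full turn (θ increased by 2π) reduces r by exactly one track width h.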
An extensive catalogue of graphs of curves in polar and cartesian coordinates
can be found in Seggern (1990).
Example 1.8 (a) Obtain the polar equation of the central ellipse
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$
(see Fig. 1.9a). (b) Obtain the polar equation for the same ellipse tilted through an
angle α.
(a) Referring to eqn (1.20), the cartesian equation becomes
$$\frac{r^2\cos^2\theta}{a^2} + \frac{r^2\sin^2\theta}{b^2} = 1. \qquad \text{(i)}$$
From the identity (1.17c) we can express cos²θ and sin²θ in terms of cos 2θ, so that (i)
can be written as
$$r^2\left[\frac{1 + \cos 2\theta}{2a^2} + \frac{1 - \cos 2\theta}{2b^2}\right] = 1,$$
which simplifies to
$$r = \frac{\sqrt{2}\,ab}{\sqrt{(a^2 + b^2) - (a^2 - b^2)\cos 2\theta}}.$$
When θ runs from, say, zero to 2π, the complete ellipse is traced out once.
(b) To tilt the ellipse through an angle α, simply replace θ by θ − α (by analogy with
eqn (1.11) for a change of origin of the x axis). Then the equation of the tilted ellipse is
$$r = \frac{\sqrt{2}\,ab}{\sqrt{(a^2 + b^2) - (a^2 - b^2)\cos 2(\theta - \alpha)}}. \qquad \text{(ii)}$$
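Equation (ii) can be checked numerically: rotating each computed point back through −α should give a point on the untilted ellipse $x^2/a^2 + y^2/b^2 = 1$. The values of a, b and α below are arbitrary illustrative choices, and `ellipse_r` is our own name for the formula.

```python
import math

def ellipse_r(theta: float, a: float, b: float, alpha: float = 0.0) -> float:
    """Polar equation (ii): the ellipse x^2/a^2 + y^2/b^2 = 1 tilted through alpha."""
    denom = (a**2 + b**2) - (a**2 - b**2) * math.cos(2 * (theta - alpha))
    return math.sqrt(2) * a * b / math.sqrt(denom)

a, b, alpha = 3.0, 1.5, 0.7   # illustrative values
for k in range(12):
    theta = 2 * math.pi * k / 12
    r = ellipse_r(theta, a, b, alpha)
    # Undo the tilt: the same r at polar angle theta - alpha lies on the untilted ellipse.
    u, v = r * math.cos(theta - alpha), r * math.sin(theta - alpha)
    assert math.isclose(u**2 / a**2 + v**2 / b**2, 1.0)
print("all sampled points lie on the ellipse")
```

As a further check, r equals a along the tilted major axis (θ = α) and b along the minor axis (θ = α + ½π).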

Equation (1.20b) gives the formulae for converting from x, y coordinates to
polar coordinates, and you may wonder why the two simultaneous equations for
θ were not written more simply as
$$\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{y}{x},$$
apparently leading to the explicit solution θ = arctan(y/x). To explain why, consider
the point P which has the coordinates
x = −1, y = 1.
The point P is in the second quadrant, and the simplest choice for θ (there is an
infinite number of valid choices, all differing by multiples of 2π) is
$$\theta = \tfrac{3}{4}\pi.$$
However, if we look up the value of arctan(y/x) = arctan(−1) on a calculator, the
value it gives us is $\theta = -\tfrac{1}{4}\pi$, predicting, wrongly, that P is in the fourth quadrant.
The reason for the problem is that the standard definition of arctan(y/x)
(its principal value) requires that it lies in the range $-\tfrac{1}{2}\pi < \theta < \tfrac{1}{2}\pi$. So if
arctan(y/x) = θ is interpreted geometrically as a polar angle, its associated point P
will lie in quadrant 1 or 4 (see Section 1.8 and Fig. 1.26c). A different solution of
the equation tan θ = y/x is therefore needed if P lies in quadrant 2 or 3 (which is
the region of negative x). This is provided by the rule:
Cartesian–polar rule
Let P be the point (x, y) with polar coordinates r, θ. Then

(i) if x = r cos θ is positive, a value of the polar angle of P : (x, y) is arctan(y/x);
(ii) if x = r cos θ is negative, a value for the polar angle is arctan(y/x) ± π.
The result (ii) follows from the fact that tan θ has period π, so that
$$\tan[\arctan(y/x) \pm \pi] = \tan[\arctan(y/x)] = y/x,$$
as required. As always, we may add any multiple 2πn to these polar angles.
Many symbolic computer systems accept x and y arguments to avoid this
problem. For example, in Mathematica the polar angle is given correctly by the
command ArcTan[x, y].
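The cartesian–polar rule can be written as a small Python function. This is a sketch: it does not handle x = 0, which the rule as stated does not cover. Python's `math.atan2` plays the role of Mathematica's `ArcTan[x, y]`, with two differences to note. It takes its arguments in the order `(y, x)`, and it returns a value between −π and π, so its angle may differ from the rule's by a multiple of 2π.

```python
import math

def polar_angle(x: float, y: float) -> float:
    """A polar angle of P:(x, y) from the cartesian-polar rule (requires x != 0)."""
    if x > 0:
        return math.atan(y / x)             # rule (i)
    if x < 0:
        return math.atan(y / x) + math.pi   # rule (ii), taking the + sign
    raise ValueError("the rule as stated does not cover x = 0")

for x, y in [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]:
    t_rule = polar_angle(x, y)
    t_lib = math.atan2(y, x)                # note the (y, x) argument order
    # The angles may differ by a multiple of 2*pi, so compare the directions they define.
    assert math.isclose(math.cos(t_rule), math.cos(t_lib))
    assert math.isclose(math.sin(t_rule), math.sin(t_lib))
    print(f"P = ({x:+.0f}, {y:+.0f}): rule {t_rule / math.pi:+.3f} pi, atan2 {t_lib / math.pi:+.3f} pi")
```

For the point x = −1, y = 1 discussed above, both approaches give ¾π, whereas the plain arctan(y/x) gives −¼π.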

Self-test 1.7
Find the equivalent polar equation in (r, θ) for the cartesian equation
$(x^2 + y^2 - x)^2 - (x^2 + y^2) = 0$, assuming r > 0. Sketch the curve.

1.10 Exponential functions; the number e


Consider the function
$$y = a^x,$$
in which a is any positive constant and x can take any value (see Section 1.1).
Graphs of y against x for several cases where a ≥ 1 are shown in Fig. 1.29. All
graphs pass through the point x = 0, y = 1.
The number a is called the base for the exponential function $a^x$. Exponential
functions having different bases are all closely related. For example, $2^x = 4^{x/2}$ and
$9^x = 3^{2x}$, and so on. We shall show later (Example 1.9) that all the functions $y = a^x$
can be displayed on the same curve provided that the x-scales are contracted
or extended by appropriate factors, so we really only need one, standard, base to
describe them all.

Fig. 1.29 Graphs of $y = a^x$ for a = 1, 1.25, 1.5, 2, and 3, drawn with equal x and y scales.
A standard base may be chosen according to what is most convenient for later
requirements. For example, at one time the base a = 10 was adopted in order
to simplify the arithmetic involved in large calculations, but nowadays we have
better methods. The base a = 2 is used in the theory of binary processes such as
occur in information theory. The base now in most general use is denoted by the
letter e, and the corresponding standard exponential function is $e^x$. It is written
alternatively as
exp x or exp(x)
in programming (Mathematica uses Exp[x]), on hand calculators, and in text.
The numerical value of e is chosen so as to simplify the algebra associated with
exponential functions. We shall identify e by the following requirement:
e is the number such that the graph of y = ex cuts the y axis at 45° (provided
that the x and y scales are equal).
Reference to Fig. 1.29, in which the scales are equal, indicates that the value of e
is between 2 and 3.
To obtain a close estimate for the numerical value of e, see Fig. 1.30. The x and
y scales on the graph are equal, and P : (0, 1) is the point where the graph of $y = e^x$
meets the y axis. Q is any nearby point on the graph having coordinates x = h,
$y = e^h$. PN and NQ are parallel to the axes and meet at N. PT is the tangent line
to the curve at P, meeting QN at T. We have said that e must take a value such that
the graph cuts the y axis at 45°; by this we mean strictly that the tangent line PT
must cut the y axis at 45°. This is equivalent to saying that the slope of the tangent
line PT (see eqn (1.6)) should equal 1, that is
$$\frac{NT}{PN} = \tan 45^\circ = 1.$$

Fig. 1.30 Graph of $y = e^x$, with equal x, y scales. At P : (0, 1) the slope of the tangent line at P is equal to 1.
First suppose that h is quite small, so that Q is close to P. Then NT ≈ NQ, and
$$\frac{NQ}{PN} \approx 1,$$
the approximation improving as h becomes smaller. We have $NQ = e^h - 1$ and
PN = h, so that
$$\frac{e^h - 1}{h} \approx 1, \tag{1.21}$$
and, in fact, the approximation can be made as close as we wish by taking h small
enough. To estimate e, multiply through by h to give $e^h \approx 1 + h$. Finally, raise both
sides to the power 1/h to isolate the number e:
$$e \approx (1 + h)^{1/h},$$
to any degree of accuracy, provided h is made small enough. The following table,
constructed using a calculator or computer, illustrates how the desired value of e
gradually emerges as h approaches zero:
$$\begin{array}{l|ccccc}
h & 0.1 & 0.01 & 0.001 & 0.0001 & 0.00001 \\ \hline
(1 + h)^{1/h}\ (\approx e) & 2.5937\ldots & 2.7048\ldots & 2.7169\ldots & 2.7181\ldots & 2.7182\ldots
\end{array}$$
(To 7 decimal places, the value of e is given by e = 2.718 281 8….)
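The table can be reproduced with a few lines of Python; `e_estimate` is our own name. One practical caveat, not raised in the text: in double-precision arithmetic, taking h extremely small eventually makes things worse. For h = 10⁻¹⁶, the sum 1 + h rounds to exactly 1, and the estimate collapses to 1.

```python
import math

def e_estimate(h: float) -> float:
    """The approximation e ~ (1 + h)**(1/h), which improves as h -> 0."""
    return (1 + h) ** (1 / h)

for h in (0.1, 0.01, 0.001, 0.0001, 0.00001):
    print(f"h = {h:<8g} (1 + h)^(1/h) = {e_estimate(h):.6f}")
print(f"math.e = {math.e:.7f}")
```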


The approximation (1.21) can be written briefly by using the notation
$$\frac{e^h - 1}{h} \to 1 \quad\text{as } h \to 0, \tag{1.22}$$
the sign ‘→’ standing for ‘approaches’. A non-geometrical way of describing this
requirement is to say that the rate of increase or the growth rate of the function $e^x$
at x = 0 is equal to 1. By using this expression we avoid having to refer continually
to graphs having equal x, y scales.
The growth rate of $e^x$ at a general value of x can be obtained similarly, by
imagining a fixed point P on the graph with coordinates $(x, e^x)$, and a nearby
point $Q : (x + h, e^{x+h})$. Then when h → 0,
$$\frac{e^{x+h} - e^x}{h} \to \text{the growth rate of } e^x \text{ at P}.$$
Notice that we cannot shortcut the process of letting h → 0 by putting h = 0 directly
into the expression: we would obtain 0/0, which is meaningless. To obtain
an explicit expression, write $e^{x+h} = e^x e^h$. Then
$$\frac{e^{x+h} - e^x}{h} = \frac{e^x(e^h - 1)}{h} \to e^x \quad\text{as } h \to 0, \text{ from eqn (1.22)}.$$
Therefore, for any value of x,
the rate of growth of $e^x$ is $e^x$.
This is the property of $e^x$ that makes e the preferred base for the exponential functions,
and e is the only base that delivers this property.
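The limit can be illustrated numerically by evaluating the difference quotient with a small but finite h; h = 10⁻⁶ below is an arbitrary choice. For comparison, the same quotient for $2^x$ approaches $(\ln 2)\,2^x$ rather than $2^x$ itself, as follows from writing $2^x = e^{x\ln 2}$.

```python
import math
from collections.abc import Callable

def growth_rate(func: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Difference quotient (func(x + h) - func(x)) / h for a small, finite h."""
    return (func(x + h) - func(x)) / h

for x in (-1.0, 0.0, 2.0):
    print(f"x = {x:+.1f}: quotient = {growth_rate(math.exp, x):.6f}, e^x = {math.exp(x):.6f}")

def two_to_the(t: float) -> float:
    return 2.0 ** t

x = 1.5
print(growth_rate(two_to_the, x), math.log(2) * two_to_the(x), two_to_the(x))
```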
The function $e^x$ increases rapidly, as the following table illustrates (to 2 significant
figures):
$$\begin{array}{c|ccccccc}
x & -9 & -6 & -3 & 0 & 3 & 6 & 9 \\ \hline
e^x & 1.2\times10^{-4} & 2.5\times10^{-3} & 5.0\times10^{-2} & 1.0 & 2.0\times10^{1} & 4.0\times10^{2} & 8.1\times10^{3}
\end{array}$$

Whenever x is increased by 3, $e^x$ is multiplied by a factor of about 20 (because
$e^3 \approx 20$), and whenever x is reduced by 3, $e^x$ is divided by about 20 ($e^{-3} \approx 1/20$). This
observation indicates that $e^x$ approaches infinity as x approaches +∞, and that $e^x$
approaches zero as x approaches −∞: $e^x \to \infty$ as $x \to \infty$; $e^x \to 0$ as $x \to -\infty$.

1.11 The logarithmic function


The inverse function (see Section 1.7) corresponding to the exponential function
$y = e^x$ or exp(x) is called the logarithm of x (historically, the natural logarithm).
It is written
y = ln x,
or sometimes as $y = \log_e x$, read as ‘log with base e of x’, or simply as log x (in
Mathematica the notation is Log[x]). It fills in the question mark in $e^{?} = x$, or
solves the equation $e^y = x$ for the unknown y. For example, the equation
$$e^y = 3$$
has the unique solution
y = ln 3.
A scientific calculator gives ln 3 = 1.098 61 to 5 decimal places, and it can be
confirmed that $e^{1.098\,61} = 3.0000$ to 5 significant figures.
Since $e^x$ and ln x are inverses, their graphs are reflections of each other in the
45° radial line (see Section 1.7) provided that the x and y scales are equal. This
relationship is shown in Fig. 1.31. Note that if x is negative or zero, ln x does not have a
real value.
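In Python, `math.log` with a single argument is the natural logarithm ln, and `math.exp` is $e^x$. The sketch below reproduces the ln 3 example. It also shows that, consistent with the remark that ln x has no real value for negative x, `math.log` raises `ValueError` for negative arguments (and for zero) rather than returning a number.

```python
import math

y = math.log(3)                      # natural logarithm: the solution of e**y = 3
print(f"ln 3 = {y:.5f}")
print(f"e**1.09861 = {math.exp(1.09861):.6f}")   # equals 3 to 5 significant figures

# ln x has no real value for x <= 0; math.log signals this with ValueError.
for bad in (-1.0, 0.0):
    try:
        math.log(bad)
    except ValueError as err:
        print(f"math.log({bad}) raised ValueError: {err}")
```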

Fig. 1.31 The graph of ln x obtained from that of $e^x$.
The logarithm has the following properties, which are proved below:

Properties of the logarithm ln x

a and b are any positive numbers.
(a) Definition of ln x (x > 0) as the inverse of $e^x$:
$e^{\ln x} = x$ (x > 0), and $\ln e^x = x$ (any x)
(or exp(ln x) = x (x > 0), and ln(exp x) = x (any x)).
(b) ln 1 = 0, ln e = 1.
(c) Product rule:
ln ab = ln a + ln b.
(d) Quotient rules:
ln(a/b) = ln a − ln b, ln(1/b) = −ln b.
(e) Power or exponent rule:
$\ln a^x = x \ln a$ for any x.
(f) ln x → ∞ as x → ∞; ln x → −∞ as x → 0 (x > 0). (1.23)

Proof. (a) This is the fundamental property (1.19) of inverse functions applied
to this case.
(b) $e^0 = 1$ (see (1.2b)), so 0 = ln 1 (from (1.23a)). Also, $e^1 = e$, so 1 = ln e (from
(1.23a)).
(c) From the definition (1.23a), applied three times:
$$e^{\ln ab} = ab = e^{\ln a}e^{\ln b} = e^{\ln a + \ln b} \quad\text{(from (1.2a))}.$$
By equating powers of e on the two sides, we have
ln ab = ln a + ln b.
(d) Put a = (a/b)b, and take the logarithm of both sides:
ln a = ln[(a/b)b] = ln(a/b) + ln b
from the product rule (1.23c). The first result in (1.23d) follows immediately.
Put a = 1 so that ln 1 = 0 to obtain the second result.
(e) From (1.23a),
$$a = e^{\ln a}, \quad\text{so that}\quad a^x = (e^{\ln a})^x = e^{x\ln a}.$$
Since also $a^x = e^{\ln a^x}$ by (1.23a), equating powers of e gives $\ln a^x = x\ln a$.
(f) These follow from (a): since $e^{\ln x} = x$, and $e^y \to \infty$ as $y \to \infty$ while $e^y \to 0$ as $y \to -\infty$,
ln x must tend to ∞ as x → ∞ and to −∞ as x → 0 (x > 0).
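The properties (1.23b) to (1.23e) can be spot-checked with floating-point values. The sample values of a, b and x are arbitrary. A few sample points illustrate the rules; they do not prove them.

```python
import math

a, b, x = 2.5, 0.3, -1.7   # arbitrary sample values: a, b > 0, any real x

checks = {
    "(b) ln 1 = 0 and ln e = 1": math.log(1.0) == 0.0 and math.isclose(math.log(math.e), 1.0),
    "(c) ln ab = ln a + ln b": math.isclose(math.log(a * b), math.log(a) + math.log(b)),
    "(d) ln(a/b) = ln a - ln b": math.isclose(math.log(a / b), math.log(a) - math.log(b)),
    "(d) ln(1/b) = -ln b": math.isclose(math.log(1 / b), -math.log(b)),
    "(e) ln a^x = x ln a": math.isclose(math.log(a ** x), x * math.log(a)),
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")
```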

Example 1.9 Prove that all the graphs $y = a^x$, for a > 1, become identical to that
of $e^x$ if, for each case, the x axis is scaled by the appropriate factor.
To fix ideas, think of a as being a given constant such as 2. As in the proof of (1.23e)
above, we can write $y = a^x$ in the form
$$y = a^x = e^{x\ln a} = e^{kx},$$
say, where k = ln a. The required scale factor is k > 0. To make the graph of $a^x$ lie
along that of $e^x$, the x axis must be stretched if k > 1 (i.e. if a > e), and compressed if
0 < k < 1 (i.e. if 1 < a < e).
If k is negative, corresponding to the range 0 < a < 1, the direction of the x axis has
to be reversed as well as being rescaled.
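A quick numerical illustration of Example 1.9 follows. With k = ln a, `a ** x` and `math.exp(k * x)` agree. The sign and size of k then indicate whether the x axis is stretched (k > 1), compressed (0 < k < 1) or reversed (k < 0). The bases chosen are arbitrary.

```python
import math

for a in (2.0, 10.0, 0.5):            # arbitrary sample bases
    k = math.log(a)                   # scale factor k = ln a
    # a**x and e**(k x) should agree for every x.
    assert all(math.isclose(a ** x, math.exp(k * x)) for x in (-2.0, 0.3, 4.0))
    if k > 1:
        action = "stretch the x axis"
    elif k == 1:
        action = "no rescaling (a = e)"
    elif k > 0:
        action = "compress the x axis"
    else:
        action = "reverse and rescale the x axis"
    print(f"a = {a:>4}: k = {k:+.4f} -> {action}")
```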

Example 1.10 Obtain y in terms of x when ln(y − 1) = 3 ln x + 2.


Equate the exponential functions of both sides of the equation:
$$e^{\ln(y-1)} = e^{3\ln x + 2}. \qquad \text{(i)}$$
From the inverse function property (1.23a),
$$e^{\ln(y-1)} = y - 1. \qquad \text{(ii)}$$
Also, by using the ordinary rules for exponents (1.2),
$$e^{3\ln x + 2} = e^{3\ln x}e^2 = (e^{\ln x})^3 e^2 = x^3 e^2 \qquad \text{(iii)}$$
from (1.23a). Substitute (ii) and (iii) into (i); we obtain
$$y = 1 + x^3 e^2.$$
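The answer can be checked by substituting it back into the original equation at a few sample points. The points must satisfy x > 0, since ln x requires a positive argument. `y_of_x` is an illustrative helper name.

```python
import math

def y_of_x(x: float) -> float:
    """Proposed solution y = 1 + e**2 * x**3 of ln(y - 1) = 3 ln x + 2, for x > 0."""
    return 1 + math.exp(2) * x ** 3

for x in (0.2, 1.0, 3.5):
    y = y_of_x(x)
    assert math.isclose(math.log(y - 1), 3 * math.log(x) + 2)
print("y = 1 + e^2 x^3 satisfies the original equation at the sampled points")
```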

Self-test 1.8
Obtain y in terms of x when ln(y − 1) = 2 ln x + ln y − x for x > 0.

1.12 Exponential growth and decay


Here we shall use t (for time) in place of x, and consider the function
$$y = A e^{ct}, \tag{1.24}$$
where A and c are constants. This class includes functions such as $2^t$, since, by
Example 1.9, they can all be expressed in the form (1.24); in this case $2^t = e^{t\ln 2}$.
If c > 0, then y is said to have exponential growth. To get an idea of what this
implies, we shall consider the doubling period of y. Choose any moment of
time, t. At some later time t + T, y will have doubled its value, so that
$$A e^{c(t+T)} = 2A e^{ct} \quad\text{or}\quad A e^{ct} e^{cT} = 2A e^{ct}.$$
After cancelling the factor $A e^{ct}$ we have an equation for T:
$$e^{cT} = 2,$$
so that cT = ln 2, from which we obtain the unknown T:
T = (1/c) ln 2.
This result is independent of t and A, so we have

Exponential doubling principle


$y = A e^{ct}$ doubles its value in every interval of length T = (1/c) ln 2. (1.25)
The doubling time T is often quoted as a measure of the rate of growth of
populations, investments, etc., which, over a period, behave exponentially as in
eqn (1.24). More generally, the value of $A e^{ct}$ is multiplied by a factor N over every
interval of length (1/c) ln N. The successive values form a geometric progression
(see Section 1.15) with common ratio N.
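A sketch of the multiplication-interval idea follows, using arbitrary illustrative constants A and c > 0. Sampling $A e^{ct}$ at steps of one doubling time from any starting time $t_0$ gives successive ratios of 2. Stepping by (1/c) ln N instead multiplies the value by N.

```python
import math

A, c = 5.0, 0.3   # illustrative constants, c > 0 for growth

def y(t: float) -> float:
    return A * math.exp(c * t)

def interval_for_factor(N: float) -> float:
    """Length of time over which A*exp(c*t) is multiplied by N: (1/c) ln N."""
    return math.log(N) / c

T = interval_for_factor(2.0)            # doubling time, eqn (1.25)
t0 = 1.3                                # any starting time
values = [y(t0 + k * T) for k in range(5)]
ratios = [values[k + 1] / values[k] for k in range(4)]
print(f"T = {T:.4f}; successive ratios: {ratios}")
```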

Example 1.11 The number N of scientists and engineers in the USA doubled
every 10 years between 1900 and 1935, and in 1935 they numbered about
$1.5 \times 10^5$. This suggests exponential growth $N = A e^{ct}$. Find c, and predict
the number N for 1990 on the assumption that the trend continued.
Suppose that we count 1900 as t = 0. The doubling period is 10 years, so $N = A e^{ct}$,
where, by (1.25),
$$c = \tfrac{1}{10}\ln 2 = 0.0693.$$
Thus
$$N = A e^{0.0693t}.$$
In 1935, where t = 35 (years), $N = 1.5 \times 10^5$, so that
$$1.5 \times 10^5 = A e^{0.0693 \times 35},$$
or A = 13 265.
Therefore
$$N = 13\,265\, e^{0.0693t}.$$
In 1990 t = 90, from which it follows that
$$N = 6.8 \times 10^6.$$
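Example 1.11 can be reproduced in Python as below. Because this sketch keeps c = (ln 2)/10 unrounded, A comes out at about 13 258 rather than the 13 265 obtained above with c rounded to 0.0693. The prediction for 1990 still rounds to 6.8 × 10⁶.

```python
import math

c = math.log(2) / 10                 # doubling every 10 years, eqn (1.25)
A = 1.5e5 / math.exp(c * 35)         # fitted so that N = 1.5e5 at t = 35 (1935)

def N(t: float) -> float:
    """Predicted number t years after 1900, assuming the exponential trend continued."""
    return A * math.exp(c * t)

print(f"c = {c:.4f}, A = {A:.0f}, N(90) = {N(90):.1e}")
```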

Exponential growth occurs when a quantity increases at a rate proportional
to the amount already accumulated. In the short term, animal populations,
epidemics, and some investments have this characteristic.
Exponential decay may also occur. If c is a positive number and
$$y = A e^{-ct} = A/e^{ct},$$
then y halves its value in every interval of length (1/c) ln 2. This occurs in radioactive
decay, the period being called the half-life period of a radioactive substance. The
half-life period provides a convenient, memorable measure of the time it would
take for the substance to become less harmful.

Self-test 1.9
In a radioactive element, the number of radioactive nuclei, x, present at time t
is given by $x = x_0 e^{-kt}$, where k is a constant, and $x_0$ is the number of nuclei
present at time t = 0. Find the time taken for 90% of the nuclei to decay.

1.13 Hyperbolic functions


It is often convenient to represent certain combinations of exponential functions by
separate functions. The hyperbolic cosine and hyperbolic sine functions, denoted
by cosh and sinh respectively, are defined by the following formulae.
Hyperbolic functions

$$\cosh x = \tfrac{1}{2}(e^x + e^{-x}), \qquad \sinh x = \tfrac{1}{2}(e^x - e^{-x}). \tag{1.26}$$

Since
$$\cosh(-x) = \tfrac{1}{2}(e^{-x} + e^{-(-x)}) = \tfrac{1}{2}(e^{-x} + e^x) = \cosh x,$$
it follows that the graph of cosh x is symmetrical about the y axis, that is, cosh x
is an even function. By a similar argument, it can be shown that sinh x is an odd
function of x. Graphs of the two functions are shown in Fig. 1.32a.
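Definitions (1.26) translate directly into code. The sketch below uses our own function names, `cosh_def` and `sinh_def`. It compares them with Python's built-in `math.cosh` and `math.sinh` and checks the even/odd symmetry at a few points.

```python
import math

def cosh_def(x: float) -> float:
    """Hyperbolic cosine from definition (1.26)."""
    return 0.5 * (math.exp(x) + math.exp(-x))

def sinh_def(x: float) -> float:
    """Hyperbolic sine from definition (1.26)."""
    return 0.5 * (math.exp(x) - math.exp(-x))

for x in (0.0, 0.5, 2.0):
    # Agreement with the library versions ...
    assert math.isclose(cosh_def(x), math.cosh(x))
    assert math.isclose(sinh_def(x), math.sinh(x), abs_tol=1e-15)
    # ... and the even/odd symmetry.
    assert cosh_def(-x) == cosh_def(x) and sinh_def(-x) == -sinh_def(x)
print("definitions (1.26) agree with math.cosh and math.sinh at the sampled points")
```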

Fig. 1.32 Graphs of the hyperbolic functions: (a) y = cosh x and y = sinh x; (b), (c) y = tanh x, coth x, sech x, and cosech x.

From the definitions (1.26),
$$\cosh x + \sinh x = e^x, \qquad \cosh x - \sinh x = e^{-x}.$$
The remaining hyperbolic functions are defined in a similar manner to their
trigonometric counterparts. Thus

$$\tanh x = \frac{\sinh x}{\cosh x}, \qquad \coth x = \frac{\cosh x}{\sinh x},$$
$$\operatorname{sech} x = \frac{1}{\cosh x}, \qquad \operatorname{cosech} x = \frac{1}{\sinh x}. \tag{1.27}$$

Graphs of tanh x, coth x, sech x, and cosech x are shown in Fig. 1.32b, c.
From the definitions, a number of identities follow which parallel those for trigonometric
functions but with important sign differences. Some are derived below.
$$\text{(a)}\ \cosh^2 x + \sinh^2 x = \cosh 2x, \qquad \text{(b)}\ \cosh^2 x - \sinh^2 x = 1. \tag{1.28}$$
For (a):
$$\cosh^2 x + \sinh^2 x = \tfrac{1}{4}(e^{2x} + 2 + e^{-2x}) + \tfrac{1}{4}(e^{2x} - 2 + e^{-2x}) = \tfrac{1}{2}(e^{2x} + e^{-2x}) = \cosh 2x.$$
For (b):
$$\cosh^2 x - \sinh^2 x = \tfrac{1}{4}(e^{2x} + 2 + e^{-2x}) - \tfrac{1}{4}(e^{2x} - 2 + e^{-2x}) = 1.$$


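To obtain the identity
$$\sinh(x_1 + x_2) = \sinh x_1 \cosh x_2 + \cosh x_1 \sinh x_2,$$
start with the right-hand side:
$$\begin{aligned}
\sinh x_1 \cosh x_2 + \cosh x_1 \sinh x_2
&= \tfrac{1}{2}(e^{x_1} - e^{-x_1})\,\tfrac{1}{2}(e^{x_2} + e^{-x_2}) + \tfrac{1}{2}(e^{x_1} + e^{-x_1})\,\tfrac{1}{2}(e^{x_2} - e^{-x_2}) \\
&= \tfrac{1}{4}(e^{x_1+x_2} - e^{-x_1+x_2} + e^{x_1-x_2} - e^{-x_1-x_2}) + \tfrac{1}{4}(e^{x_1+x_2} + e^{-x_1+x_2} - e^{x_1-x_2} - e^{-x_1-x_2}) \\
&= \tfrac{1}{2}(e^{x_1+x_2} - e^{-x_1-x_2}) = \sinh(x_1 + x_2).
\end{aligned}$$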
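This identity, together with similar ones derived in the same way, gives the following set:
$$\begin{aligned}
\sinh(x_1 \pm x_2) &= \sinh x_1 \cosh x_2 \pm \cosh x_1 \sinh x_2, \\
\cosh(x_1 \pm x_2) &= \cosh x_1 \cosh x_2 \pm \sinh x_1 \sinh x_2, \\
\tanh(x_1 \pm x_2) &= \frac{\tanh x_1 \pm \tanh x_2}{1 \pm \tanh x_1 \tanh x_2}.
\end{aligned} \tag{1.29}$$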
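The identities (1.28) and (1.29) can be spot-checked numerically with `math.cosh`, `math.sinh` and `math.tanh` at arbitrary sample arguments. Agreement at a few points is a sanity check, not a proof.

```python
import math

x1, x2 = 0.8, -1.3   # arbitrary sample arguments
ch, sh, th = math.cosh, math.sinh, math.tanh

identities = {
    "(1.28a)": (ch(x1) ** 2 + sh(x1) ** 2, ch(2 * x1)),
    "(1.28b)": (ch(x1) ** 2 - sh(x1) ** 2, 1.0),
    "sinh(x1 + x2)": (sh(x1 + x2), sh(x1) * ch(x2) + ch(x1) * sh(x2)),
    "sinh(x1 - x2)": (sh(x1 - x2), sh(x1) * ch(x2) - ch(x1) * sh(x2)),
    "cosh(x1 + x2)": (ch(x1 + x2), ch(x1) * ch(x2) + sh(x1) * sh(x2)),
    "cosh(x1 - x2)": (ch(x1 - x2), ch(x1) * ch(x2) - sh(x1) * sh(x2)),
    "tanh(x1 + x2)": (th(x1 + x2), (th(x1) + th(x2)) / (1 + th(x1) * th(x2))),
    "tanh(x1 - x2)": (th(x1 - x2), (th(x1) - th(x2)) / (1 - th(x1) * th(x2))),
}
for name, (lhs, rhs) in identities.items():
    print(f"{name:14s} lhs = {lhs:+.12f}  rhs = {rhs:+.12f}")
```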

The inverse hyperbolic functions corresponding to sinh, cosh, and tanh are
indicated respectively by the notations
$$\sinh^{-1}, \quad \cosh^{-1}, \quad\text{and}\quad \tanh^{-1}.$$
The index (−1) is traditional: do not mistake it as standing for a negative
power; $\sinh^{-1}x$ does not mean $1/\sinh x$. (Note that the commands ArcSinh[x],
ArcCosh[x], and ArcTanh[x] are used for inverse hyperbolic functions in symbolic
computation in Mathematica.) The intervals for which they are defined are
as follows:

Inverse hyperbolic functions

$y = \sinh^{-1}x$ for all x and y.
$y = \cosh^{-1}x$ for x ≥ 1 and y ≥ 0.
$y = \tanh^{-1}x$ for −1 < x < 1 and all y. (1.30)

These functions are expressible as logarithms. For example, consider $\sinh^{-1}x$.
Put
$$y = \sinh^{-1}x,$$
so that
$$x = \sinh y = \tfrac{1}{2}(e^y - e^{-y}).$$
Multiply through by $2e^y$ and rearrange to give
$$e^{2y} - 2x e^y - 1 = 0 \quad\text{or}\quad (e^y)^2 - 2x(e^y) - 1 = 0.$$
This is a quadratic equation for $e^y$ whose solutions are given by
$$e^y = x \pm \sqrt{x^2 + 1},$$
where the sign √ means the positive square root. The negative sign in the right-hand side
corresponds to a solution which cannot represent $e^y$, since $e^y$ is always
positive but $x - \sqrt{x^2 + 1}$ is always negative. Therefore we select the positive sign.
By taking the logarithm of both sides and using (1.23a) we obtain
$$y = \sinh^{-1}x = \ln[x + \sqrt{x^2 + 1}],$$
valid for all x. The inverses of the other hyperbolic functions are obtained similarly,
and are shown in the following table:

Inverse hyperbolic functions as logarithms


$y = \sinh^{-1}x = \ln[x + \sqrt{x^2 + 1}]$ for all x, y.
$y = \cosh^{-1}x = \ln[x + \sqrt{x^2 - 1}]$ for x ≥ 1, y ≥ 0.
$y = \tanh^{-1}x = \tfrac{1}{2}\ln[(1 + x)/(1 - x)]$ for −1 < x < 1, all y. (1.31)
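The logarithmic forms (1.31) can be compared with Python's `math.asinh`, `math.acosh` and `math.atanh`. The functions below are direct transcriptions intended only for illustration. For large negative x, the form $\ln[x + \sqrt{x^2 + 1}]$ suffers cancellation, and near x = −10⁸ the argument of the logarithm rounds to zero. In that region, it is better to use the fact that $\sinh^{-1}$ is odd: $\sinh^{-1}x = -\sinh^{-1}(-x)$.

```python
import math

def asinh_log(x: float) -> float:
    """sinh^-1 x = ln[x + sqrt(x^2 + 1)], all x (inaccurate for large negative x)."""
    return math.log(x + math.sqrt(x * x + 1))

def acosh_log(x: float) -> float:
    """cosh^-1 x = ln[x + sqrt(x^2 - 1)], for x >= 1, giving a result >= 0."""
    return math.log(x + math.sqrt(x * x - 1))

def atanh_log(x: float) -> float:
    """tanh^-1 x = (1/2) ln[(1 + x)/(1 - x)], for -1 < x < 1."""
    return 0.5 * math.log((1 + x) / (1 - x))

for x in (-3.0, 0.0, 0.7, 3.0):
    assert math.isclose(asinh_log(x), math.asinh(x), abs_tol=1e-12)
for x in (1.0, 1.5, 10.0):
    assert math.isclose(acosh_log(x), math.acosh(x), abs_tol=1e-12)
for x in (-0.9, 0.0, 0.5):
    assert math.isclose(atanh_log(x), math.atanh(x), abs_tol=1e-12)
print("log forms (1.31) agree with math.asinh, math.acosh, math.atanh")
```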

Self-test 1.10
Prove that $\tanh^{-1}x = \tfrac{1}{2}\ln[(1 + x)/(1 - x)]$.

1.14 Partial fractions


We shall first reiterate the distinction between an equation and an identity (see
Section 1.4). The word ‘equation’ has many uses, but for the present we shall think
of an equation as something like
$$x^2 + 2 = -3x,$$
which is true only for certain particular values of x, namely −1 and −2. On the
other hand,
$$x^2 + 3x + 2 \equiv (x + 1)(x + 2)$$
is an identity, meaning that it is true automatically, or for all values of x. We shall
write ≡ instead of = when we want to draw attention to an identity.
It is easy to test the truth of the following identities by adding up the fractions
on the right.
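$$\text{(i)}\quad \frac{1}{x^2 - 1} \equiv \frac{1}{2}\,\frac{1}{x - 1} - \frac{1}{2}\,\frac{1}{x + 1},$$
$$\text{(ii)}\quad \frac{x}{4x^2 - 1} \equiv \frac{1}{4}\,\frac{1}{2x - 1} + \frac{1}{4}\,\frac{1}{2x + 1},$$
$$\text{(iii)}\quad \frac{3x + 2}{x^2(x + 1)} \equiv \frac{1}{x} + \frac{2}{x^2} - \frac{1}{x + 1}.$$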
The terms on the right are individually simpler than the functions on the left.
This break-up into simpler constituents is useful for many purposes. In this
section, we show how to break up a complicated function into simpler terms of
the type above.
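The three identities (i) to (iii) at the start of this section can be spot-checked exactly using Python's `fractions.Fraction`. Because `Fraction` keeps rational arithmetic exact, the two sides can be compared with `==`. The sample points are arbitrary, chosen only to avoid zeros of the denominators, and `lhs_rhs` is our own helper name.

```python
from fractions import Fraction

def lhs_rhs(x: Fraction) -> list[tuple[Fraction, Fraction]]:
    """Left- and right-hand sides of identities (i)-(iii), evaluated exactly at x."""
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    return [
        (1 / (x**2 - 1), half / (x - 1) - half / (x + 1)),                      # (i)
        (x / (4 * x**2 - 1), quarter / (2 * x - 1) + quarter / (2 * x + 1)),    # (ii)
        ((3 * x + 2) / (x**2 * (x + 1)), 1 / x + 2 / x**2 - 1 / (x + 1)),       # (iii)
    ]

# Sample points avoid every zero of a denominator (x = 0, 1, -1, 1/2, -1/2).
for x in (Fraction(3), Fraction(-7, 3), Fraction(5, 11)):
    for k, (lhs, rhs) in enumerate(lhs_rhs(x), start=1):
        assert lhs == rhs, (k, x)
print("identities (i)-(iii) hold exactly at the sampled points")
```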

A polynomial in a variable x is an expression such as
$$-2x^2, \qquad 3x^3 - x + 16,$$
which is the sum of one or more terms of the form $ax^n$, where n is a positive
integer or zero, and a is any number. A rational function is a function which takes
the form
f(x) = P(x)/Q(x),
where P(x) and Q(x) are polynomials. For example, $1/[x^2(3x - 2)]$ and $(2x^3 + 1)/(x - 1)^2$
are rational functions, but $x^{1/2}/(x + 1)$ and (cos x)/(x + 1) are not rational
functions of x. We shall be concerned only with rational functions, and initially
we shall suppose that
degree of P(x) < degree of Q(x),
where the degree is the highest power of x occurring in the polynomial. Such
functions can be broken up into partial fractions, like the examples at the beginning.
No proofs will be given here, but the reader should learn the techniques.
It is the denominator Q(x) which determines the form that the constituent
partial fractions will take. Suppose that the denominator is broken up into factors
as far as possible. For example,
$$2x^4 + x^3 - 4x^2 + x - 6 = (2x - 3)(x + 2)(x^2 + 1),$$
and it cannot be factorized any further. We shall consider only the cases where the
factors are of the type:
ax + b (a simple factor), $(cx + d)^n$ (a repeated factor of order n), and
$px^2 + qx + r$ with $q^2 < 4pr$ (an irreducible quadratic).
The rules affecting these are as follows:

Partial fractions for rational functions P(x)/Q(x)


(degree of P(x)) < (degree of Q(x))
Each factor of Q(x) gives rise to a partial fraction (or partial fractions) as below.
Capitals denote constants: their values are unique.
(a) Simple factors. To each factor ax + b of Q(x), a term K/(ax + b).
(b) Repeated simple factors. To each factor $(cx + d)^n$ of Q(x), there are n
terms:
$$\frac{L_1}{cx + d} + \frac{L_2}{(cx + d)^2} + \cdots + \frac{L_n}{(cx + d)^n}.$$
(c) Irreducible quadratic. To each factor $px^2 + qx + r$ of Q(x), a term
$(Mx + N)/(px^2 + qx + r)$. (1.32)

P(x) is involved in these rules only to the extent that it will affect the values of the
coefficients K, L1, etc. The following examples show how to determine the values
of the coefficients.

Example 1.12 Express x/[(x − 1)(x + 2)] in partial fractions.

We can use any convenient letters for the unknown coefficients in the terms. The
denominator has two simple factors, x − 1 and x + 2, so (1.32a) says that the partial
fractions must have the form
$$\frac{x}{(x - 1)(x + 2)} \equiv \frac{A}{x - 1} + \frac{B}{x + 2}. \qquad \text{(i)}$$
Multiply through by (x − 1)(x + 2):
$$x = A(x + 2) + B(x - 1). \qquad \text{(ii)}$$
The constants must be chosen so that this becomes an identity. An identity has to be
true for any x, so that if we put any value of x into (ii), the result must be correct. Any
two substitutions of numbers for x form two simultaneous equations for the two
unknown constants A and B. For example, if we put x = −10 and x = 100 we obtain
$$-10 = -8A - 11B, \qquad 100 = 102A + 99B.$$
The numbers we chose are inconvenient, but according to (1.32) we get the same A and
B whatever values of x we use. Therefore, choose values of x that make the equations as
simple as possible:
x = −2 gives −2 = 0 − 3B, so $B = \tfrac{2}{3}$,
x = 1 gives 1 = 3A + 0, so $A = \tfrac{1}{3}$.
Therefore, from (i),
$$\frac{x}{(x - 1)(x + 2)} \equiv \frac{\tfrac{1}{3}}{x - 1} + \frac{\tfrac{2}{3}}{x + 2}.$$
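The substitution method of Example 1.12 is easy to mirror with exact fractions: each convenient value of x eliminates one unknown. The sketch also confirms that the 'inconvenient' substitutions x = −10 and x = 100 are satisfied by the same A and B. The helper names `lhs` and `rhs` are our own.

```python
from fractions import Fraction

# Identity (ii) of Example 1.12: x = A(x + 2) + B(x - 1) for every x.
# Each convenient substitution kills one term.
A = Fraction(1) / (1 + 2)       # x = 1:   1 = A(3) + B(0)
B = Fraction(-2) / (-2 - 1)     # x = -2: -2 = A(0) + B(-3)

# The "inconvenient" substitutions x = -10 and x = 100 are satisfied too.
assert -8 * A - 11 * B == -10 and 102 * A + 99 * B == 100

def lhs(x: Fraction) -> Fraction:
    return x / ((x - 1) * (x + 2))

def rhs(x: Fraction) -> Fraction:
    return A / (x - 1) + B / (x + 2)

# Spot-check the partial-fraction identity (i) away from x = 1 and x = -2.
assert all(lhs(Fraction(x)) == rhs(Fraction(x)) for x in (0, 3, -5, 7))
print(A, B)
```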

Example 1.13 Express (3x − 1) /[(2x + 1)(x − 1)2] in partial fractions.


According to (1.32a,b),
3x − 1 A B C
≡ + + . (i)
(2x + 1)(x − 1)2 2x + 1 x − 1 (x − 1)2
Multiply by (2x + 1)(x − 1)2 to give
3x − 1 = A(x − 1)2 + B(2x + 1)(x − 1) + C(2x + 1). (ii)

We need to choose three values of x to obtain three simple equations for A, B, C.


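Obvious choices are $x = 1$ and $x = -\tfrac{1}{2}$. For the third, choose, say, $x = 0$. From (ii):
$x = 1$ gives $2 = 0 + 0 + 3C$, so $C = \tfrac{2}{3}$,
$x = -\tfrac{1}{2}$ gives $-\tfrac{5}{2} = \tfrac{9}{4}A + 0 + 0$, so $A = -\tfrac{10}{9}$,
$x = 0$ gives $-1 = A - B + C$, so $B = 1 + A + C = \tfrac{5}{9}$.
Finally,
$$\frac{3x - 1}{(2x + 1)(x - 1)^2} \equiv -\frac{10}{9}\,\frac{1}{2x + 1} + \frac{5}{9}\,\frac{1}{x - 1} + \frac{2}{3}\,\frac{1}{(x - 1)^2}.$$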
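A quick exact spot check of this result: the minimal Python sketch below evaluates both sides at many rational points away from the poles $x = -\tfrac12$ and $x = 1$. Agreement at sample points is evidence rather than proof; the proof is that (ii) holds as an identity. The names `lhs`, `rhs` and `points` are illustrative.

```python
from fractions import Fraction

# Coefficients found above
A, B, C = Fraction(-10, 9), Fraction(5, 9), Fraction(2, 3)

def lhs(x):
    """The original rational function (3x - 1)/[(2x + 1)(x - 1)^2]."""
    return (3 * x - 1) / ((2 * x + 1) * (x - 1) ** 2)

def rhs(x):
    """The partial-fraction form with the computed A, B, C."""
    return A / (2 * x + 1) + B / (x - 1) + C / (x - 1) ** 2

# Rational sample points n/7, skipping the poles x = -1/2 and x = 1
points = [Fraction(n, 7) for n in range(-20, 21)]
points = [x for x in points if 2 * x + 1 != 0 and x != 1]
print(all(lhs(x) == rhs(x) for x in points))   # True
```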

Example 1.14 Express $1/[x(x^2 + 1)]$ in partial fractions.


Here, $x$ is a simple factor and $x^2 + 1$ is an irreducible quadratic; so, by (1.32c),
$$\frac{1}{x(x^2 + 1)} \equiv \frac{A}{x} + \frac{Bx + C}{x^2 + 1}.$$
Multiply by $x(x^2 + 1)$:
$$1 = A(x^2 + 1) + (Bx + C)x. \qquad \text{(i)}$$

Then x = 0 gives 1 = A + 0, so
A = 1.
There are no other very easy values of $x$ to choose. Put the value of $A$ just found into (i) and rearrange: we get $-x^2 = (Bx + C)x$, and, dividing by $x$ (for $x \ne 0$),
$$-x = Bx + C. \qquad \text{(ii)}$$

It is easiest just to notice that (ii) is satisfied for all x only if


B = −1 and C = 0.
Therefore
$$\frac{1}{x(x^2 + 1)} \equiv \frac{1}{x} - \frac{x}{x^2 + 1}.$$

If the degree of the numerator is greater than or equal to the degree of the denominator, the case is not covered by (1.32), but we can treat it as follows.

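Example 1.15 Put $(x^3 + 1)/[x(x - 1)]$ into the form of a polynomial plus partial fractions.
Carry out polynomial division by $x^2 - x$, until the remainder is of lower degree than the divisor:
$$\begin{aligned}
x^3 + 1 - x(x^2 - x) &= x^2 + 1 &&\text{(first quotient term } x\text{)},\\
x^2 + 1 - 1\cdot(x^2 - x) &= x + 1 &&\text{(second quotient term } 1\text{; remainder } x + 1\text{)}.
\end{aligned}$$
Therefore
$$\frac{x^3 + 1}{x(x - 1)} \equiv x + 1 + \frac{x + 1}{x(x - 1)}.$$
The last term is of the right type for partial fractions, and finally
$$\frac{x^3 + 1}{x(x - 1)} \equiv x + 1 - \frac{1}{x} + \frac{2}{x - 1}.$$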
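The long division above can be sketched in Python on coefficient lists (highest power first), stopping as soon as the remainder has lower degree than the divisor. This is a minimal illustration, assuming the divisor's leading coefficient is nonzero; the function name `poly_divmod` is hypothetical, not a library routine.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder); the loop stops once the remainder has
    lower degree than the divisor, as in the long division above.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]          # next quotient coefficient
        quotient.append(factor)
        for i, d in enumerate(den):       # subtract factor * divisor, aligned on the leading term
            num[i] -= factor * d
        num.pop(0)                        # the leading term is now zero
    return quotient, num

# x^3 + 1 divided by x^2 - x, i.e. by x(x - 1)
q, r = poly_divmod([1, 0, 0, 1], [1, -1, 0])
print(q, r)   # quotient x + 1 and remainder x + 1, as coefficient lists
```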

Self-test 1.11
Express f(x) = x/[(x − a)(x − b)(x − c)] in partial fractions if (a) a, b, c are all
different; (b) b = c ≠ a; (c) a = b = c.

1.15 Summation sign: geometric series

The sign $\sum$ (sigma) is a large Greek capital S, standing for ‘the sum of …’. It is used in the following way. Suppose, for example, we are provided with a string of six quantities indexed in order, say
$$u_1, u_2, u_3, \ldots, u_6.$$
This is called a sequence consisting of six terms. We can denote the general term by (say) $u_n$, where $n$ takes values from 1 to 6. Suppose we want to add them all up. Then
$$u_1 + u_2 + u_3 + u_4 + u_5 + u_6 \quad\text{is denoted by}\quad \sum_{n=1}^{6} u_n,$$
which is read ‘the sum of all the $u_n$ from $n = 1$ through $n = 6$’. Similarly
$$u_2 + u_3 + u_4 + u_5 \quad\text{is written}\quad \sum_{n=2}^{5} u_n.$$
Any letter can be used as the counting index instead of $n$, provided that there is no conflict; so we could also write, for instance,
$$u_3 + u_4 + u_5 + u_6 = \sum_{i=3}^{6} u_i.$$
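In Python the $\sum$ notation corresponds closely to `sum` over a `range`. Note that `range(a, b)` stops before `b`, so an upper limit of 6 is written `range(..., 7)`. The six sequence values below are made up purely for illustration.

```python
# A hypothetical six-term sequence u_1, ..., u_6 (list index 0 is unused)
u = [None, 3, 1, 4, 1, 5, 9]

total_1_to_6 = sum(u[n] for n in range(1, 7))   # sum_{n=1}^{6} u_n
total_2_to_5 = sum(u[n] for n in range(2, 6))   # sum_{n=2}^{5} u_n
total_3_to_6 = sum(u[i] for i in range(3, 7))   # sum_{i=3}^{6} u_i: the index letter is a dummy
print(total_1_to_6, total_2_to_5, total_3_to_6)  # 23 11 19
```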

We index a sequence according to convenience. The first index does not have to be 1. For example, consider the important sequence
$$1, x, x^2, x^3, \ldots,$$
which is the same as
$$x^0, x^1, x^2, x^3, \ldots.$$
This is called, for historical reasons, a geometric sequence or geometric progression. Each term in turn is obtained from its predecessor by multiplying by the common ratio $x$. The natural way to index such a sequence is to start with $n = 0$ instead of $n = 1$. Suppose then we want the sum of the first six terms. It can be expressed as
$$1 + x + x^2 + x^3 + x^4 + x^5 = \sum_{n=0}^{5} x^n,$$

though we could express the sum as
$$\sum_{n=1}^{6} x^{n-1}$$
instead. Such a sum, whether or not it starts with the $x^0$, or constant, term, is called a geometric series.
We will obtain an expression for the sum $S$ of a geometric series having any value of the common ratio $x$ (except $x = 1$) and which runs from the term in $x^0$ to the term in $x^N$. Thus $S = \sum_{n=0}^{N} x^n$. Note that it contains $N + 1$ terms (i.e. not $N$ terms). Written at length:
$$S = 1 + x + x^2 + \cdots + x^{N-1} + x^N.$$
Then
$$xS = x + x^2 + x^3 + \cdots + x^N + x^{N+1}.$$
Subtract the second line from the first. All the terms cancel except for two; we obtain
$$S(1 - x) = 1 - x^{N+1}, \quad\text{so}\quad S = \frac{1 - x^{N+1}}{1 - x}.$$

Sum of a geometric series
$$\sum_{n=0}^{N} x^n = 1 + x + x^2 + \cdots + x^N = \frac{1 - x^{N+1}}{1 - x}, \quad (x \ne 1). \qquad (1.33)$$
If $x = 1$, then $S = \sum_{n=0}^{N} 1 = N + 1$.
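Formula (1.33) can be checked against direct term-by-term addition. The sketch below uses exact fractions so that the comparison is not blurred by rounding; the function names are illustrative.

```python
from fractions import Fraction

def geometric_sum(x, N):
    """Sum of x**n for n = 0..N using (1.33); when x = 1 the sum is N + 1."""
    x = Fraction(x)
    if x == 1:
        return Fraction(N + 1)
    return (1 - x ** (N + 1)) / (1 - x)

def geometric_sum_direct(x, N):
    """The same N + 1 terms added one by one, for comparison."""
    x = Fraction(x)
    return sum(x ** n for n in range(N + 1))

for x in [Fraction(1, 10), Fraction(1, 2), -1, 2, 1]:
    for N in range(8):
        assert geometric_sum(x, N) == geometric_sum_direct(x, N)
print(geometric_sum(Fraction(1, 2), 6))   # 127/64
```

The special case $x = 1$ is handled separately because the closed form would divide by zero there.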

Example 1.16 Find the following sums. (a) $\sum_{n=0}^{4} (0.1)^n$, (b) $\sum_{n=0}^{6} \dfrac{1}{2^n}$, (c) $\sum_{n=0}^{N} e^{nx}$, (d) $\sum_{n=0}^{N} (-1)^n$, (e) $\sum_{n=0}^{5} 2^n$.

(a) $x = 0.1$ and $N = 4$, so
$$S = \frac{1 - (0.1)^5}{1 - 0.1} = \frac{0.99999}{0.9} = 1.1111$$
(as becomes obvious if you write out the terms individually).
(b) $1/2^n = (\tfrac12)^n$, so $x = \tfrac12$, and $N = 6$. By (1.33),
$$S = \frac{1 - (\tfrac12)^7}{1 - \tfrac12} = 2\left(1 - \tfrac{1}{128}\right) = \tfrac{127}{64}.$$
(c) $e^{nx} = (e^x)^n$, so the common ratio is $e^x$, in place of $x$ in (1.33):
$$S = \frac{1 - (e^x)^{N+1}}{1 - e^x} = \frac{1 - e^{(N+1)x}}{1 - e^x} \quad (x \ne 0).$$
(d) Here $x = -1$, so
$$S = \frac{1 - (-1)^{N+1}}{1 - (-1)} = \tfrac12\left[1 - (-1)^{N+1}\right].$$
The sums of the first 1, 2, 3, 4, … terms of the sequence are successively 1, 0, 1, 0, … .
(e) $x = 2$ and $N = 5$. Therefore
$$1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = \frac{1 - 2^6}{1 - 2} = 63.$$

Example 1.17 Find an expression for the sum $S$ of
$$ar^2 + ar^5 + ar^8 + ar^{11} + ar^{14}.$$
This can be written as
$$ar^2(1 + r^3 + r^6 + r^9 + r^{12}).$$
The brackets contain a series of type (1.33), with common ratio $r^3$. The number of terms $N + 1$ is equal to 5, so that $N = 4$. Then
$$S = ar^2\,\frac{1 - (r^3)^5}{1 - r^3} = ar^2\,\frac{1 - r^{15}}{1 - r^3}.$$
(In terms of the $\sum$ notation, we have $S = \sum_{n=0}^{4} ar^{2+3n}$. It is perhaps easier to see what to do when the series is written out fully, as above.)
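A small Python check of this closed form, comparing it with the five terms $ar^{2+3n}$ added directly for a few sample values of $a$ and $r$ (chosen arbitrarily, with $r^3 \ne 1$ so that (1.33) applies). Exact fractions are used throughout; the function names are illustrative.

```python
from fractions import Fraction

def stepped_series(a, r, terms=5):
    """a r^2 + a r^5 + a r^8 + ...: the terms a r^(2 + 3n) for n = 0, ..., terms - 1."""
    a, r = Fraction(a), Fraction(r)
    return sum(a * r ** (2 + 3 * n) for n in range(terms))

def closed_form(a, r):
    """S = a r^2 (1 - r^15) / (1 - r^3); the common ratio r^3 must not equal 1."""
    a, r = Fraction(a), Fraction(r)
    return a * r ** 2 * (1 - r ** 15) / (1 - r ** 3)

for a, r in [(1, Fraction(1, 2)), (3, Fraction(-2, 3)), (Fraction(5, 4), 2)]:
    assert stepped_series(a, r) == closed_form(a, r)
print(closed_form(1, 2))   # 18724 = 4 + 32 + 256 + 2048 + 16384
```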

1.16 Infinite geometric series


From (1.33), take the series to $N$ (not $N + 1$) terms, giving
$$S_N = 1 + x + x^2 + \cdots + x^{N-1} = \frac{1}{1 - x} - \frac{x^N}{1 - x}. \qquad (1.34)$$
Suppose firstly that the absolute value of $x$, or $|x|$, is less than 1, meaning $-1 < x < 1$. We shall see what happens when we continue the series to take in more and more terms to obtain (in imagination) an infinite number of terms.
The first term in (1.34), $1/(1 - x)$, is the same for all $N$. But for $-1 < x < 1$, the second term approaches zero as $N$ increases to infinity:
$$\frac{x^N}{1 - x} \to 0 \quad\text{as}\quad N \to \infty, \qquad (1.35)$$
because $x^N \to 0$ as $N \to \infty$. This can be illustrated numerically by taking any specific value for $x$ in the range $-1 < x < 1$, say $x = 0.1$. The behaviour of $x^N$ as $N$ increases is shown in the following table:

N        1      2      3       4        …
$x^N$    0.1    0.01   0.001   0.0001   …

The sequence of terms $1, x, x^2, x^3, x^4, x^5, \ldots$ is formed by multiplying each term in turn by $x$ to get the succeeding term, so if $|x| < 1$, the terms become steadily smaller in magnitude, and in fact (though we do not prove it here) can be made as close as we wish to zero if we take a large enough value of $N$.
Therefore, referring back to (1.34),
$$\text{if } |x| < 1, \text{ then } S_N \to \frac{1}{1 - x} \text{ as } N \to \infty,$$
where the sign ‘→’ signifies ‘approaches’. In this way the idea of an infinitely extended geometric series can be given a meaning. Its sum to infinity, $S_\infty$, is expressed by
$$S_\infty = \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \qquad (1.36)$$

On the other hand, if $|x| > 1$, the magnitude of the term $x^N/(1 - x)$ in (1.34) will increase to infinity as $N$ increases, so the infinite series cannot be said to have a sum at all. If $x = 1$, the series becomes
$$1 + 1 + 1 + \cdots,$$
which simply continues to grow to infinity as the number of terms increases. If $x = -1$, the partial sums oscillate between 1 and 0 (see Example 1.16(d)), so again there is no sum to infinity. To summarize:

Geometric series: sum to infinity
The geometric series $1 + x + x^2 + \cdots$ has a sum to infinity $S_\infty$ if, and only if, $-1 < x < 1$; then
$$S_\infty = \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \qquad (1.37)$$

The second term on the right in (1.34) is called the remainder or error: it
represents the error incurred by using only the first N terms to approximate to the
infinite sum. For the infinite series to be useful, this quantity must approach zero
as N approaches infinity.
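The remainder can be watched shrinking numerically. This sketch, assuming the sample ratio $x = 0.1$ used in the table above, computes $S_N$ exactly and confirms that the gap $S_\infty - S_N$ equals $x^N/(1 - x)$ from (1.34); `partial_sum` is an illustrative name.

```python
from fractions import Fraction

def partial_sum(x, N):
    """S_N = 1 + x + ... + x^(N-1): the first N terms, as in (1.34)."""
    return sum(Fraction(x) ** n for n in range(N))

x = Fraction(1, 10)
limit = 1 / (1 - x)                        # S_infinity = 10/9 by (1.36)
for N in range(1, 6):
    gap = limit - partial_sum(x, N)        # magnitude of the remainder term in (1.34)
    assert gap == x ** N / (1 - x)
    print(N, float(partial_sum(x, N)), float(gap))
```

Each extra term divides the gap by 10, which is exactly the behaviour of $x^N$ shown in the table.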

Example 1.18 Express the recurring decimals (a) 0.4444…, (b) 0.96 96 96…, in the form of the sum of fractions.


(a) The decimal can be written as an infinite geometric series:
$$\frac{4}{10} + \frac{4}{100} + \frac{4}{1000} + \cdots = \frac{4}{10}\left(1 + \frac{1}{10} + \frac{1}{100} + \cdots\right) = \frac{4}{10}\,\frac{1}{1 - (1/10)} = \frac{4}{9}$$
from (1.36).
(b) In a similar way
$$\frac{96}{10^2} + \frac{96}{10^4} + \frac{96}{10^6} + \cdots = \frac{96}{10^2}\left(1 + \frac{1}{10^2} + \frac{1}{10^4} + \cdots\right).$$
The series in the brackets has the common ratio $x = 1/10^2$ and by using (1.36) the sum to infinity is $1/[1 - (1/10^2)] = 100/99$. Therefore the decimal is equivalent to $(96/100)(100/99) = 96/99$.
(Such results can be verified by ‘long division’.)
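The same reasoning works for any repeating block of digits, which a short Python sketch makes concrete: a block of length $k$ gives first term $\text{block}/10^k$ and common ratio $1/10^k$, summed with (1.36). The function name `recurring_to_fraction` is hypothetical, and it handles only decimals that repeat from the first digit after the point.

```python
from fractions import Fraction

def recurring_to_fraction(block: str) -> Fraction:
    """Value of 0.(block)(block)(block)... as a sum to infinity, using (1.36)."""
    k = len(block)
    first = Fraction(int(block), 10 ** k)   # first term of the geometric series
    ratio = Fraction(1, 10 ** k)            # common ratio
    return first / (1 - ratio)

print(recurring_to_fraction("4"))    # 4/9
print(recurring_to_fraction("96"))   # 32/33, which is 96/99 in lowest terms
```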

Self-test 1.12
Sum the series
$$1 + 4x + 7x^2 + 10x^3 + \cdots + (3N + 1)x^N.$$
What is the sum to infinity of the series if $|x| < 1$?

1.17 Permutations and combinations


Suppose that we have four different letters A, B, C, and D, from which we may form
strings of several letters that we shall call words (we shall not require them to be
real words) such as BCA, AA, BBCDA. The order is important: AB is a different
word from BA.
Consider the number of two-letter words that we can form from A, B, C, D if we
may repeat any letter. There are 16 possible words; listed systematically they are:
AA AB AC AD; BA BB BC BD; CA CB CC CD; DA DB DC DD.

We can obtain the number of words without writing them all down by using the following argument. Imagine that we are writing them down; then we have four alternatives for the first letter, and following each of those choices we have four alternatives for the second letter. Altogether, then,
$$\text{number of two-letter words} = 4 \times 4 = 4^2 = 16.$$
To find the number of five-letter words which can be made up from A, B, C, D, with any number of repetitions allowed within a word, we follow the same reasoning as before, pursuing it through five letters, and obtain
$$4 \times 4 \times 4 \times 4 \times 4 = 4^5 = 1024 \text{ different words.}$$
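The counting argument can be confirmed by brute force: `itertools.product` with `repeat=k` generates every k-letter word with repetition allowed, in the same systematic order as the list of 16 two-letter words above.

```python
from itertools import product

letters = "ABCD"
two_letter = ["".join(w) for w in product(letters, repeat=2)]
five_letter_count = sum(1 for _ in product(letters, repeat=5))
print(len(two_letter), two_letter[:4])   # 16 ['AA', 'AB', 'AC', 'AD']
print(five_letter_count)                 # 1024
```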
Suppose now that repetition of letters within a word is not allowed. More generally, we may imagine $n$ different ‘objects’, such that we cannot pull duplicates out of the air, as we did with the letters in the discussion above. The objects for selection might be wooden letters or numbers, or a pile of books with different titles, or a group of soldiers, etc. If we select $r \le n$ of the $n$ objects and place them in a certain order, we are said in this case to have a permutation of $r$ distinct objects, taken from among $n$ distinct objects. The number of possible permutations is denoted by ${}^{n}P_{r}$.

Example 1.19 How many four-letter words can be made out of the six letters
A, B, C, D, E, F, with no repetitions within a word?
Put another way, how many permutations are there of n = 6 distinct objects, taken r = 4
at a time, or what is ${}^{6}P_{4}$? There are six choices for the first letter. With each such choice
there are only five letters available for the second (no repetition), so there are 6 × 5
possible choices for the first two letters. There are four letters left to supply the third
letter, so there are 6 × 5 × 4 possibilities for the first three letters, and inclusion of the
fourth letter gives finally
$${}^{6}P_{4} = 6 \times 5 \times 4 \times 3 = 360.$$

Example 1.20 There are six different books and we must choose one book for
each of four children as a present. How many different distributions of books
to the children are possible?
A decision is required as to whether to distinguish by letter the children, or the books, or
both. We choose the children, distinguished by W, X, Y, Z, say.
Imagine that we are listing all the possibilities. Child W may receive any one of six
books. Whichever one W receives, X will have one of the remaining five; Y will have
one of the remaining four; and Z one of the remaining three. The number of entries in
our list is therefore
$${}^{6}P_{4} = 6 \times 5 \times 4 \times 3 = 360.$$

It might seem at first sight that this treatment favours child W and that child Z is
shabbily treated. However, the process describes a systematic way to arrange a list
containing all possible assignments, only instead of writing them down, we count them.
No choice of any gift is involved.
Guided by this discussion we can now obtain a formula for ${}^{n}P_{r}$. We first need the factorial notation $n!$. If $n$ is a positive integer, then the meaning of $n!$ is
$$n! = n(n - 1)(n - 2)\cdots 2 \cdot 1 \qquad (1.38)$$
(or alternatively $n! = 1 \cdot 2 \cdot 3 \cdots n$). We shall need the identity
$$n(n - 1)(n - 2)\cdots(n - r + 1) \equiv \frac{n!}{(n - r)!} \qquad (1.39)$$
for $1 \le r \le n$, in which the left-hand side contains $r$ factors. If $r = n$ in (1.39), the expression $n!/(n - r)!$ on the right becomes $n!/0!$, and $0!$ has no natural meaning. However, its value is defined to equal unity:
$$0! = 1, \qquad (1.40)$$
and then we can use formulae such as (1.39) without making an exceptional case for $r = n$.

Permutations
The number of possible permutations of $r$ objects, $1 \le r \le n$, taken without repetition from among $n$ distinct objects is given by
$${}^{n}P_{r} = n(n - 1)(n - 2)\cdots(n - r + 1) = \frac{n!}{(n - r)!}. \qquad (1.41)$$

Proof. There are n possibilities for the first place in a permutation. With each of
these, the second can contain any of the remaining (n − 1) objects, so that there
are n(n − 1) possibilities for the first two places. The third place can contain any
of the remaining (n − 2) objects, so there are n(n − 1)(n − 2) possibilities for the
first three places. This continues until we have completed r factors corresponding
to the r entries in each permutation; these form the product n(n − 1)(n − 2) …
(n − r + 1). Then use (1.39).
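A Python sketch can compare the three descriptions of ${}^{n}P_{r}$ for small $n$: the $r$-factor product, the factorial form (1.41), and an explicit count of ordered selections from `itertools.permutations`. It assumes Python 3.8 or later for `math.perm`; `falling_product` is an illustrative name.

```python
import math
from itertools import permutations

def falling_product(n, r):
    """n(n-1)...(n-r+1): the r-factor product on the left of (1.39)."""
    result = 1
    for k in range(r):
        result *= n - k
    return result

for n in range(1, 8):
    for r in range(1, n + 1):
        by_formula = math.factorial(n) // math.factorial(n - r)   # (1.41)
        assert falling_product(n, r) == by_formula == math.perm(n, r)
        # brute force: list every ordered selection of r of the n objects
        assert sum(1 for _ in permutations(range(n), r)) == by_formula

print(falling_product(6, 4))   # 360, as in Examples 1.19 and 1.20
```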
The following example shows how it is sometimes possible to relate a problem
involving repetitions to one in which all the elements are distinct.

Example 1.21 How many distinct five-letter permutations can be formed by


using one A, one B, and three Cs?
Method (i) (by enumerating the possibilities). Consider firstly the permutations in which
A precedes B; the Cs are distributed in all possible ways among the positions marked *
in *A*B*. Take the possible distributions of the Cs among the positions case by case:
• The Cs may be adjacent as CCC; there are three positions for the group.
• Two adjacent Cs and one separate C; there are three positions for CC, and with each
there are two positions for the C, giving 3 × 2 = 6 possibilities.
• Three separated Cs; there is only one possibility.
Therefore there are 3 + 6 + 1 = 10 permutations in which A precedes B. Similarly there
are a further 10 permutations in which B precedes A, so the total number of
permutations is 10 + 10 = 20.
Method (ii). Make the three Cs distinct in the problem by calling them $C_1$, $C_2$, $C_3$. There are ${}^{5}P_{5} = 5!$ distinct permutations of $ABC_1C_2C_3$. Imagine that they are all listed. If, in this list, we restore C by putting $C_1 = C_2 = C_3 = C$, we shall see many replications of the same ‘word’. For example, consider the entry $C_3C_2BC_1A$. This reduces to CCBCA, but so does $C_1C_2BC_3A$, and several others. In fact the number of replications corresponding to any distinct word is ${}^{3}P_{3} = 3! = 6$: this is the number of permutations of the symbols $C_1$, $C_2$, $C_3$, all of which are equivalent. Therefore there are $3!$ times as many entries in the list of permutations of $ABC_1C_2C_3$ as there are in the list of $N$ (say) distinct words made up from ABCCC. Therefore, $3!\,N = 5!$, and
$$N = 5!/3! = 5 \times 4 = 20.$$
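Method (ii) has a direct computational counterpart: `itertools.permutations` treats the five positions as distinct (like $C_1$, $C_2$, $C_3$), and collecting the joined strings in a set then merges the replications.

```python
import math
from itertools import permutations

# All 5! orderings of positionally distinct symbols, then identify equal words
words = {"".join(p) for p in permutations("ABCCC")}
print(len(words))                                 # 20
print(math.factorial(5) // math.factorial(3))     # 20, i.e. 5!/3!
```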
The following example shows Method (ii) in use for a more complicated case.

Example 1.22 How many distinct permutations exist which use all the
14 letters in the word ASSASSINATIONS?
It does not matter which string, or anagram, we treat as a source of letters, so start
instead from one which displays the repetitions clearly:
SSSSSAAAIINNOT. (i)
We shall enforce a distinction between the repeated letters of each type by indexing them:
$$S_1S_2S_3S_4S_5A_1A_2A_3I_1I_2N_1N_2OT. \qquad \text{(ii)}$$
There are 5! × 3! × 2! × 2! permutations within the indexed groups in (ii), taken
separately, which all correspond to the same word (i). Similarly, there are 5! × 3! × 2! × 2!
rearrangements of indexed letters corresponding to any distinct permutation of the
ordinary letters. There are altogether ${}^{14}P_{14} = 14!$ permutations of the indexed letters.
Therefore, if the number of distinct permutations of the letters in ASSASSINATIONS is
denoted by n, then:
number of permutations of the indexed symbols in (ii) = 14! = (5! × 3! × 2! × 2!)n.
Finally
n = 14!/(5! × 3! × 2! × 2!) = 30 270 240.
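The argument of Example 1.22 generalizes to any word: divide $n!$ by $k!$ for each letter that occurs $k$ times. A minimal Python sketch, using `collections.Counter` to tally the letters; `distinct_arrangements` is an illustrative name.

```python
import math
from collections import Counter

def distinct_arrangements(word: str) -> int:
    """Distinct permutations of all the letters of `word`:
    n! divided by k! for each letter repeated k times."""
    count = math.factorial(len(word))
    for k in Counter(word).values():
        count //= math.factorial(k)     # each step stays an exact integer
    return count

print(sorted(Counter("ASSASSINATIONS").values(), reverse=True))   # [5, 3, 2, 2, 1, 1]
print(distinct_arrangements("ASSASSINATIONS"))                      # 30270240
```

The same function reproduces Example 1.21: `distinct_arrangements("ABCCC")` gives 20.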

Example 1.23 (Circular permutations) Five people sit round a circular table. In
how many distinct orders may they sit?
The meaning of ‘distinct’ here is that two arrangements are regarded as being the same
if each person has the same person on his or her right (or left would do as well). If the
people are named A, B, C, D, E, then rotation of a particular grouping, say BADEC,
bodily around the table does not count as a new circular permutation: go clockwise
(say) from any fixed position noting the order of seating; then BADEC, ADECB,
DECBA, ECBAD, CBADE are to be treated as the same permutation.
The number of ordinary 5-letter permutations is $5!$. Let the number of circular permutations be $N_C$. To each of the circular permutations there are five ordinary permutations, so that $5N_C = 5!$, and finally
$$N_C = 5!/5 = 4! = 24.$$
In general, if there are n persons, the number of circular permutations is (n − 1)!.
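A brute-force check of the circular count: rotate every ordinary seating so that a chosen person (here A, an arbitrary choice) comes first, so that all rotations of one circular arrangement become the same tuple, and count what remains. `canonical_rotation` is an illustrative name.

```python
import math
from itertools import permutations

def canonical_rotation(seating):
    """Rotate a seating so that 'A' comes first; rotations of the same
    circular arrangement then give identical tuples."""
    i = seating.index("A")
    return seating[i:] + seating[:i]

people = "ABCDE"
circular = {canonical_rotation(p) for p in permutations(people)}
print(len(circular), math.factorial(len(people) - 1))   # 24 24
```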

Permutations are sequences: if the order of the elements is changed the permutation is counted as a different one. We shall now consider problems in which
rearrangements of the same group, collection, or set of objects are regarded as
equivalent: what defines the set is simply which items it contains, without regard
to order. Such a set is called a combination. For example, an apple (A), a banana
(B), and a carrot (C) in a plastic bag can be regarded as a mere combination of
purchases, but the decision to consume them in a certain time order involves
consideration of the possible permutations ABC, BAC, and so on.
Suppose there are six distinct objects, A, B, C, D, E, F, and we want to count how many different combinations consisting of three elements can be selected. We denote this number by ${}^{6}C_{3}$. A typical combination is the group consisting of A, B, and C. If we were laying out all the permutations our list would include $3! = 6$ entries all arising from the particular combination A, B, and C, namely
ABC, ACB, BAC, BCA, CBA, CAB.
We obtain the same factor $3!$ from every combination of three elements, so in this case there are $3!$ times as many permutations as there are combinations. Therefore (using (1.41)) the number of different combinations of three objects drawn from six objects is equal to
$${}^{6}C_{3} = \frac{{}^{6}P_{3}}{3!} = \frac{6!}{3!\,(6 - 3)!} = 20.$$
If we have a set of $n$ distinct objects from which we may select groups of size $r \le n$, the number of combinations is denoted by ${}^{n}C_{r}$. The same argument applies in the general case, and we obtain

Combinations
The number of combinations of $r$ distinct objects ($1 \le r \le n$) selected from $n$ distinct objects is given by
$${}^{n}C_{r} = \frac{{}^{n}P_{r}}{r!} = \frac{n!}{r!\,(n - r)!}. \qquad (1.42)$$
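Formula (1.42) can be checked against an explicit listing with `itertools.combinations`, which yields each unordered selection once. The sketch assumes Python 3.8 or later for `math.comb` and `math.perm`.

```python
import math
from itertools import combinations

# Six objects taken three at a time: each combination accounts for 3! permutations
objects = "ABCDEF"
combos = list(combinations(objects, 3))
print(len(combos), math.perm(6, 3) // math.factorial(3), math.comb(6, 3))   # 20 20 20

for n in range(1, 9):
    for r in range(1, n + 1):
        formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))  # (1.42)
        assert formula == math.comb(n, r) == sum(1 for _ in combinations(range(n), r))
```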

Example 1.24 Find the number of possible combinations made up by selecting


some or all of n different objects without repetition of any object.
To form all combinations of arbitrary size up to n, look at each object in turn, and
either take it or reject it, and do this in every possible way. There are two options with
regard to the first object; with each of these there are two options for the second object;
and so on until we arrive at the $n$th object. There are $2^n$ such options in all, but these include rejection of all the objects. The total number of combinations is therefore $2^n - 1$.
Alternatively, we may add the number of combinations consisting of 1 object, 2 objects, …, $n$ objects, which is
$${}^{n}C_{1} + {}^{n}C_{2} + {}^{n}C_{3} + \cdots + {}^{n}C_{n}.$$
It will be confirmed in Example 1.28 that this sum is in fact equal to $2^n - 1$.

Frequently, the permutations/combinations occurring in problems involving


repetition of objects may, with advantage, be broken up into types, which can be
considered separately as in Example 1.25.

Example 1.25 A lucky-dip jar contains seven sweets; there are one each of flavours A, B, C, and D, and three of flavour E. A child reaches into the jar and pulls out four sweets. How many distinct combinations of flavours might the child obtain?
The combinations may contain three Es, two Es, one E, or no Es:
three Es: one other choice out of four; possible combinations 4.
two Es: two other choices out of four; possible combinations ${}^{4}C_{2} = 4!/(2!\,2!) = 6$.
one E: three other choices out of four; possible combinations ${}^{4}C_{3} = 4!/(3!\,1!) = 4$.
no Es: four choices out of four; possible combinations 1.
The total number of distinct combinations is $4 + 6 + 4 + 1 = 15$.
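A brute-force check of Example 1.25: treat the seven sweets as positions in a string, draw every set of four positions with `itertools.combinations`, and sort each draw so that only the flavours matter. The string `"ABCDEEE"` is simply an encoding of the jar described above.

```python
from itertools import combinations

jar = "ABCDEEE"            # one each of A, B, C, D and three of E
# Draw 4 sweets; sorting each draw keeps the flavours but forgets the order
draws = {"".join(sorted(c)) for c in combinations(jar, 4)}
print(len(draws))          # 15
```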

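Example 1.26 Prove that (a) ${}^{n}C_{r} = {}^{n}C_{n-r}$, (b) ${}^{n}C_{r} = {}^{n-1}C_{r} + {}^{n-1}C_{r-1}$.
(a) From (1.42), ${}^{n}C_{r} = n!/[r!\,(n - r)!]$ and ${}^{n}C_{n-r} = n!/[(n - r)!\,r!]$, which are equal.
(b) Starting with the right-hand side,
$$\begin{aligned}
{}^{n-1}C_{r} + {}^{n-1}C_{r-1} &= \frac{(n - 1)!}{r!\,(n - r - 1)!} + \frac{(n - 1)!}{(r - 1)!\,(n - r)!}\\
&= \frac{(n - 1)!}{(r - 1)!\,(n - r - 1)!}\left[\frac{1}{r} + \frac{1}{n - r}\right]\\
&= \frac{(n - 1)!}{(r - 1)!\,(n - r - 1)!}\cdot\frac{n}{r(n - r)}\\
&= \frac{n!}{r!\,(n - r)!} = {}^{n}C_{r}.
\end{aligned}$$
(This result is used in Section 1.18 in the binomial theorem.)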

Self-test 1.13
A pin number for cash machines is a 4-digit number chosen from 0, 1, 2, 3, … , 9.
How many pins are there if: (a) all the digits are different? (b) repetitions are
permitted except that pins with 4 digits the same are not allowed?

1.18 The binomial theorem


Consider expressions of the form
$$(1 + x)^n,$$
where $n$ is a positive integer. For small integers $n$ we can expand $(1 + x)^n$ as a polynomial by multiplying it out; for the first few powers we obtain
$$\begin{aligned}
(1 + x)^2 &= 1 + 2x + x^2,\\
(1 + x)^3 &= 1 + 3x + 3x^2 + x^3,\\
(1 + x)^4 &= 1 + 4x + 6x^2 + 4x^3 + x^4,
\end{aligned}$$
and so on.
The coefficients occurring in the polynomials above (including the first and last coefficients equal to 1) are called the binomial coefficients of the successive powers of $x$, namely of $x^0$ (which equals 1), $x^1$, $x^2$, $x^3$, and so on, in order. For example, in the expansion of $(1 + x)^4$, the coefficients can be listed as follows:

power of x    0   1   2   3   4
coefficient   1   4   6   4   1

Notice that there are $n + 1$ terms in the expansion of $(1 + x)^n$, and that the coefficients have a symmetrical pattern, coefficients equidistant from the ends being equal. We shall show later that these properties hold for all positive integer powers $n$.
The process of repeated multiplication soon becomes arduous. We firstly describe a more efficient method for powers given numerically, and secondly obtain an explicit formula for the coefficients.
Consider, for example, how we obtain the coefficients of $(1 + x)^5$ from those of $(1 + x)^4$. We have $(1 + x)^5 = (1 + x)^4(1 + x)$. Lay the calculation out like a long multiplication sum:
$$\begin{array}{rl}
(1 + x)^4: & 1 + 4x + 6x^2 + 4x^3 + x^4\\
\times\,(1 + x): & 1 + x\\
 & 1 + 4x + 6x^2 + 4x^3 + x^4\\
 & {}+ x(1 + 4x + 6x^2 + 4x^3 + x^4)\\
= (1 + x)^5: & 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5.
\end{array}$$
Notice how the coefficient of $x^2$ in $(1 + x)^5$ is arrived at: it is the sum of the coefficients of $x^2$ and $x$ in $(1 + x)^4$, namely $6 + 4 = 10$. Similarly, the coefficient of $x^3$ in $(1 + x)^5$ is equal to the sum of the coefficients of $x^3$ and $x^2$ in $(1 + x)^4$, and so on, the only exceptions to the rule being the first and last coefficients, which are both equal to 1. The same rule applies whenever we calculate $(1 + x)^{n+1}$ from $(1 + x)^n$:

Coefficient rule in the binomial theorem
For $1 \le r \le n - 1$, the coefficient of $x^r$ in $(1 + x)^n$ = the sum of the coefficients of $x^r$ and $x^{r-1}$ in $(1 + x)^{n-1}$. (The first and last coefficients are equal to 1.)

By using this rule we can develop an efficient and rapid method, or algorithm, for obtaining the coefficients, called Pascal’s triangle. It is a triangular array, as shown in (1.43), whose rows consist of the coefficients of $x^0, x^1, x^2, \ldots$ in the expansion of $(1 + x)^n$ for $n = 1, 2, \ldots$. The rows are constructed successively. Each row is obtained from the preceding row by the rules just given: place a ‘1’ at the beginning and end of each new row; then every intermediate entry in that row is equal to the sum of two entries from the previous row, one directly above and one to the left. For instance, the entry 6 in the row $n = 4$ is $3 + 3$, and the first 10 in the row $n = 5$ is $6 + 4$.

Pascal’s triangle algorithm
For the coefficients in the expansion of $(1 + x)^n$, $n$ a positive integer:

power of x   0   1   2   3   4   5   …
n = 1        1   1
n = 2        1   2   1
n = 3        1   3   3   1
n = 4        1   4   6   4   1
n = 5        1   5  10  10   5   1
…            …   …   …   …   …   …
(1.43)
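The row-by-row construction of (1.43) translates almost line for line into Python: put 1 at each end and form each interior entry from the two entries above it. `pascal_rows` is an illustrative name; starting from the row for $n = 0$ is a convenience of this sketch.

```python
def pascal_rows(n_max):
    """Rows n = 1, ..., n_max of Pascal's triangle (1.43), each built from the row above."""
    row = [1]                          # row n = 0: (1 + x)^0 = 1
    rows = []
    for _ in range(n_max):
        # 1 at each end; each interior entry = entry above + entry above-left
        row = [1] + [row[r] + row[r - 1] for r in range(1, len(row))] + [1]
        rows.append(row)
    return rows

for n, row in enumerate(pascal_rows(5), start=1):
    print(f"n = {n}:", *row)
```

Because each new row is built symmetrically from a symmetrical row, every row printed is a palindrome, matching the symmetry argument that follows.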


Pascal’s triangle works like positional notation in arithmetic. We do not write the number ‘three hundred and sixty-five’ in the form $5 + (6 \times 10) + (3 \times 10^2)$: the powers of 10 are implicit in the positions of the digits in the sequence 365. Similarly, in Pascal’s triangle we temporarily hide the powers $x^r$, and only have to manipulate coefficients.
The procedure for deriving the nth row from the (n − 1)th row implies that if
the (n − 1)th row is symmetrical, then the new nth row is also symmetrical. But
the row for n = 2 is symmetrical; this symmetry is inherited by the row n = 3,
and so for all subsequent rows.
If we know the coefficients for $(1 + x)^n$ then we can obtain the expansion of $(a + b)^n$ by writing it in the form
$$(a + b)^n = a^n[1 + (b/a)]^n \quad (a \ne 0).$$
Taking the case $n = 3$ as an illustration, put $b/a = x$, and use the expansion of $(1 + x)^3$ given above, with coefficients 1, 3, 3, 1:
$$\begin{aligned}
(a + b)^3 &= a^3[1 + 3(b/a) + 3(b/a)^2 + (b/a)^3]\\
&= a^3 + 3a^2b + 3ab^2 + b^3. \qquad (1.44)
\end{aligned}$$
We shall now prove the binomial theorem, which provides an explicit formula (rather than an algorithm) for the coefficients in $(a + b)^n$ where $n$ is any positive integer. We shall start with the standard case $(1 + x)^n$.
All the essentials of the general result can be illustrated by the special case $n = 3$. Consider the following expansion, obtained by multiplication:

$$(1 + x_1)(1 + x_2)(1 + x_3) = 1 + (x_1 + x_2 + x_3) + (x_1x_2 + x_2x_3 + x_1x_3) + (x_1x_2x_3). \qquad (1.45)$$
Each term on the right is really the product of three elements (either 1s or $x$s), one from each of the three brackets on the left. Thus the first term is really $1 \times 1 \times 1$, and $x_1x_3$ arises from the product $x_1 \times 1 \times x_3$. The terms are then sorted into groups according to the number of $x$ factors. A formula for the number of terms in each group can be obtained by using the result (1.42) of Section 1.17. For example, the number of terms having two $x$ factors is equal to the number of ways of choosing two out of the three available $x$ factors. This number is given by
$${}^{3}C_{2} = \frac{3!}{2!\,1!} = 3.$$
Similarly for the other groups: for $r = 1, 2, 3$ the group containing the products of $r$ $x$-factors has ${}^{3}C_{r}$ members. Finally, suppose that all the $x$ elements are made equal by putting
$$x_1 = x_2 = x_3 = x.$$
The bracketed expression in (1.45) containing the products of $r$ of the $x$’s, with $r = 1$, 2, or 3, collapses into ${}^{3}C_{r}\,x^r$, so we obtain:
$$(1 + x)^3 = 1 + {}^{3}C_{1}x + {}^{3}C_{2}x^2 + {}^{3}C_{3}x^3.$$
We now prove the general result:

Binomial theorem
If $n$ is a positive integer and $a$, $b$ are any numbers:
(a) $(1 + x)^n = 1 + {}^{n}C_{1}x + {}^{n}C_{2}x^2 + \cdots + {}^{n}C_{n-1}x^{n-1} + {}^{n}C_{n}x^n$.
(b) $(a + b)^n = a^n + {}^{n}C_{1}a^{n-1}b + {}^{n}C_{2}a^{n-2}b^2 + \cdots + {}^{n}C_{n-1}ab^{n-1} + {}^{n}C_{n}b^n$,
where
$${}^{n}C_{r} = \frac{n!}{r!\,(n - r)!} = \frac{n(n - 1)\cdots(n - r + 1)}{r!}.$$
(The notation $\dbinom{n}{r} = {}^{n}C_{r}$ is also used for binomial coefficients.) (1.46)

Proof. (a) When expanded
$$(1 + x_1)(1 + x_2)(1 + x_3)\cdots(1 + x_n) = 1 + \sum_{r=1}^{n} \left(\text{all the different } r\text{-fold products of the } x\text{’s}\right). \qquad (1.47)$$
(The meaning of ‘an $r$-fold product’ is one containing exactly $r$ different $x$’s: thus $x_px_q$, $p \ne q$, is a two-fold product.) For each fixed value of $r$, the number of $r$-fold products is equal to the number, ${}^{n}C_{r}$, of combinations of $r$ objects selected without repetition from the $n$ different objects $x_1$ to $x_n$.
Now put
$$x_1 = x_2 = x_3 = \cdots = x_n = x$$
into (1.47). For each value of $r$, the sum of the $r$-fold products collapses into the form
$$(x^r + x^r + x^r + \cdots \text{ to } {}^{n}C_{r} \text{ terms}) = {}^{n}C_{r}\,x^r.$$
This gives the result (1.46a).
(b) To obtain the expansion of $(a + b)^n$, we follow the process that led to eqn (1.44), namely
$$(a + b)^n = a^n[1 + (b/a)]^n,$$
and use (1.46a) with $x = b/a$; this becomes
$$(a + b)^n = a^n\left[1 + {}^{n}C_{1}\left(\frac{b}{a}\right) + {}^{n}C_{2}\left(\frac{b}{a}\right)^2 + \cdots + {}^{n}C_{n-1}\left(\frac{b}{a}\right)^{n-1} + \left(\frac{b}{a}\right)^n\right].$$
After removing the brackets the sum is as given in (b). An alternative proof of this theorem, using a calculus method, is given at the end of Chapter 4.
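As an exact check of (1.46b), this Python sketch evaluates the right-hand side with `math.comb` (Python 3.8 or later) and compares it with $(a + b)^n$ computed directly, for a few arbitrarily chosen integer pairs. Integer arithmetic keeps the comparison exact; `binomial_expansion` is an illustrative name.

```python
import math

def binomial_expansion(a, b, n):
    """Right-hand side of (1.46b): the sum over r of nCr a^(n-r) b^r."""
    return sum(math.comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))

for n in range(1, 10):
    for a, b in [(1, 1), (2, 3), (-1, 4), (5, -2)]:
        assert binomial_expansion(a, b, n) == (a + b) ** n
print(binomial_expansion(1, 1, 6))   # 64, i.e. 2^6
```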
Example 1.27 Expand and simplify the expression $(x + x^{-1})^6$ $(x \ne 0)$.
It is simplest to add a line to Pascal’s triangle (eqn (1.43)) to obtain the coefficients, rather than to use the general form of the binomial theorem (1.46):

power of x   0   1   2   3   4   5   6   …
n = 5        1   5  10  10   5   1
n = 6        1   6  15  20  15   6   1

Then
$$\begin{aligned}
(x + x^{-1})^6 &= x^6(1 + x^{-2})^6\\
&= x^6[1 + 6(x^{-2}) + 15(x^{-2})^2 + 20(x^{-2})^3 + 15(x^{-2})^4 + 6(x^{-2})^5 + (x^{-2})^6]\\
&= x^6[1 + 6x^{-2} + 15x^{-4} + 20x^{-6} + 15x^{-8} + 6x^{-10} + x^{-12}]\\
&= x^6 + 6x^4 + 15x^2 + 20 + 15x^{-2} + 6x^{-4} + x^{-6}.
\end{aligned}$$

Example 1.28 Prove that ${}^{n}C_{1} + {}^{n}C_{2} + {}^{n}C_{3} + \cdots + {}^{n}C_{n} = 2^n - 1$.
Put $x = 1$ into the expansion of $(1 + x)^n$ given in (1.46a). It becomes
$$(1 + 1)^n = 2^n = 1 + {}^{n}C_{1} + {}^{n}C_{2} + {}^{n}C_{3} + \cdots + {}^{n}C_{n},$$
from which the result follows immediately.

Self-test 1.14
Using the binomial theorem, find a series expansion for $(1 + x)^{2n} + (1 - x)^{2n}$.

Problems

1.1 (Section 1.3). Sketch graphs of the following equations over the intervals stated:
(a) $y = x^4$, $-1.5 \le x \le 1$;
(b) $y = x(1 - x)$, $-1 \le x \le 2$;
(c) $y = 1 + x + x^2$, $|x - 1| \le 2$;
(d) $y = |x - 1|$, $-3 \le x \le 3$;
(e) $y = |x| + |x - 3| + |x + 2|$, $-3 \le x \le 4$;
(f) $y = ||x| - 1|$, $-2 \le x \le 2$;
(g) $y = \sqrt{x^2 + 1}$, $|x| \le 2$.

1.2 (Straight lines, Section 1.3). Find the straight lines through the following pairs of points:
(a) (1, 1), (−1, 5);
(b) (0, 1), (2, 1);
(c) (2, 1), (−1, −1).
Sketch the triangle formed by these lines. Find the lengths of each side of the triangle.

1.3 (Straight line, Section 1.3). What are the slopes of the following straight lines, and where do they cut the coordinate axes?
(a) $y = x - 1$; (b) $3y = x - 2$; (c) $2x + 5y = 4$.

1.4 (Straight line, Section 1.3). Find the equations of the following straight lines:
(a) passing through (1, 2) inclined at 45° to the x axis;
(b) passing through (−1, −2) with slope −2;
(c) with slope 0.5 and x axis intercept $x = 1$;
(d) through (1, 2) parallel to the line $y = 3x - 4$;
(e) through (−1, 3) perpendicular to the line $y = 4x - 1$.

1.5 Show that the following pairs of lines are mutually perpendicular:
(a) $2x + 3y - 2 = 0$ and $-3x + 2y - 3 = 0$;
(b) $y = 2x + 1$ and $y = \tfrac12(3 - x)$;
(c) $y = 2x$ and $y = -\tfrac12 x$; (d) $y = \pm x$.

1.6 (Straight line, Section 1.3). Show that the equation
$$(x + y + 1) + \alpha(2x - 3y - 2) = 0,$$
where $\alpha$ is any constant, represents a straight line through the intersection of $x + y + 1 = 0$ and $2x - 3y - 2 = 0$. Find the line joining this point to the point (1, 1).

1.7 (Circles, Section 1.3). Find the centre and radius for each of the following circles:
(a) $x^2 + y^2 = 9$;
(b) $(x - 1)^2 + y^2 = 4$;
(c) $x^2 + y^2 - 2x - 2y - 21 = 0$;
(d) $4x^2 - 4x + 4y^2 + 4y = 9$.

1.8 (Circles, Section 1.3). Find the equation of the circle centred at (1, −2) with radius 3.

1.9 Find the points of intersection of the following circles and lines:
(a) $x^2 + y^2 = 8$ and $x = 2$;
(b) $x^2 + y^2 - 2x + 2y - 4 = 0$ and $y = 2x + 1$;
(c) $x^2 + y^2 = 1$ and $x + y = \sqrt{2}$.

1.10 The following table contains experimental data:

x    1.06   0.84   0.72   0.44   0.23
y    0      0.53   0.71   0.78   1.1

The hypothesis is that the points should lie on a circle centred at the origin in the (x, y) plane. Find the distance of each point from the origin. Calculate the average of these values, and write down the equation of the approximating circle.

1.11 (Functions, Section 1.4). Draw sketches of the following functions in the (x, t) plane over the intervals indicated:
(a) $x = H(t + 1) - H(t - 1)$ for $-2 \le t \le 2$;
(b) $x = \operatorname{sgn}(1 + t) + \operatorname{sgn}(1 - t)$ for $-2 \le t \le 2$;
(c) $x = tH(t - 1)$ for $0 \le t \le 2$;
(d) $x = (t^2 - 1)[\operatorname{sgn}(t + 1) + \operatorname{sgn}(1 - t)]$ for $-2 \le t \le 2$.

1.12 (Functions, Section 1.4). Using Heaviside and signum functions, construct a single formula in each case for $f(t)$ where
(a) $$f(t) = \begin{cases} 0 & (t < -1),\\ 1 & (-1 < t < 2),\\ 0 & (t > 2); \end{cases}$$
(b) $$f(t) = \begin{cases} 0 & (t < 0),\\ 2t & (t > 0); \end{cases}$$
(c) $$f(t) = \begin{cases} 0 & (t < 0),\\ t & (0 < t < 1),\\ 1 & (1 < t < 2),\\ -t + 3 & (2 < t < 3),\\ 0 & (t > 3). \end{cases}$$

1.13 (Section 1.5). What are the radian measures of the following angles: (a) 30°, (b) 120°?

1.14 (Trigonometric functions, Section 1.6). Using the methods of Examples 1.5 and 1.6, obtain
(a) $\sin\tfrac14\pi$; (b) $\sin\tfrac12\pi$; (c) $\sin\pi$; (d) $\sin(-\tfrac34\pi)$; (e) $\cos\tfrac16\pi$; (f) $\cos\tfrac56\pi$; (g) $\sin(-\tfrac13\pi)$; (h) $\cos(-\tfrac23\pi)$.

1.15 (Trigonometric functions, Section 1.6). Use (1.18c) to show that
(a) $\cos^4 A = \tfrac18(3 + 4\cos 2A + \cos 4A)$;
(b) $\sin^4 A = \tfrac18(3 - 4\cos 2A + \cos 4A)$.

1.16 (Trigonometric functions, Section 1.6). Use (1.17) to express the following in terms of $\sin x$ and $\cos x$:
(a) $\cos(x + \tfrac12\pi)$; (b) $\sin(x + \tfrac12\pi)$; (c) $\sin(x - \tfrac12\pi)$; (d) $\cos(x \pm \pi)$; (e) $\sin(x \pm \pi)$.

1.17 (Trigonometric functions, Section 1.6). Use (1.17) to express the following in terms of the cos and sin of $\tfrac12(x + y)$ and $\tfrac12(x - y)$:
(a) $\cos x + \cos y$; (b) $\sin x - \sin y$; (c) $\cos x - \cos y$.

1.18 State where the graphs of the following functions cross the x axis:
(a) $\sin x$; (b) $\cos x$; (c) $\sin\tfrac12 x$; (d) $\cos 3\pi x$; (e) $\cos(2x - \tfrac12\pi)$; (f) $e^{-x}\sin\tfrac12\pi x$.

1.19 State the amplitude, angular frequency, period, and phase of the following harmonic outputs:
(a) $2\cos(0.2t + 3.2)$;
(b) $1.5\sin[0.2(t - 2.4)]$;
(c) $2\cos(0.2t + 0.12) + 2\cos(0.2t - 0.39)$;
(d) $-\cos t$.

1.20 State the inverse $F(x)$ for each of the following functions $f(x)$ for the values of $x$ stated:
(a) $4x^2$, $x \ge 0$;
(b) $2x + 3$, $-\infty < x < \infty$;
(c) $\sin 2x$, $0 \le x \le \tfrac14\pi$;
(d) $2\sin x$, $0 \le x \le \tfrac12\pi$;
(e) $\cos x^2$, $0 \le x \le \sqrt{\pi}$;
(f) $\sin(\tfrac12\pi\cos x)$, $0 \le x \le \tfrac12\pi$;
(g) $x^{-1/4}$, $x > 0$;
(h) $x^2 + x$, $x \ge -\tfrac12$.
57
1.21 Sketch a graph of the function inverse to the function $x^3 - x + 1$. (There is no need to try to solve the equation $x^3 - x + 1 = y$ for x.)

1.22 (Sections 1.10, 1.11). Solve the following equations for x:
(a) $e^{2x} = 13$; (b) $\ln 3x = 2$;
(c) $\ln x^{-1/3} = 1$; (d) $3e^{3x} = 1$;
(e) $e^x + e^{-x} = 2$ (hint: multiply through by $e^x$ first);
(f) $e^{\ln 2x} = 4$; (g) $\ln e^{2x} + 3\ln e^{5x} = 2$;
(h) $\ln(x + 1) + \ln(x - 1) = 0$;
(i) $\ln(x + 1) + \ln(x - 1) = e$;
(j) $2^x = 3$; (k) $3^{2x} = \frac12$;
(l) $\sinh 2x = 4$; (m) $2\sinh x = 2\cosh x + 3$.

1.23 Express $2^x$ as a power of e.

1.24 (Section 1.12). Prove that $10^x$ doubles its value in any interval of length equal to $\ln 2/\ln 10$.

1.25 Sketch regions in the (x, y) plane defined by the following inequalities:
(a) $(x - 1)^2 + y^2 \le 9$;
(b) $x \ge 0$, $y \ge 0$, and $x + y \le 1$;
(c) $\dfrac{x^2}{4} + \dfrac{y^2}{9} \le 1$;
(d) $x^2 + y^2 \le 1$ and $x \ge 0$;
(e) $|x| + |y| \le 1$.

1.26 Prove that $\tanh^{-1}x = \frac12\ln[(1 + x)/(1 - x)]$ for $-1 < x < 1$.

1.27 Figure 1.33 shows a cross-section of a simple model of a piston and crankshaft. The crankshaft rotates at 4000 rpm (revolutions per minute). If AB = 2.5 and BC = 5 (in cm), show that the displacement AC is given by
$$AC = 2.5\left[\sin\omega t + \sqrt{4 - \cos^2\omega t}\right]$$
(in cm), where t is measured from θ = 0, and state ω in radians per second.

Fig. 1.33 Cross-section of the piston and crankshaft model (points A, B, C; crank angle θ; angular velocity ω).

1.28 An oscillation takes the form
$$x = 3\cos\omega t + 4\sin\omega t.$$
By finding numbers c and φ such that
$$c\cos\phi = 3, \qquad c\sin\phi = 4,$$
express x as a single cosine term. What are the amplitude and phase of the oscillation?

1.29 The exponential function $f(t) = Ce^{-\alpha t}$ satisfies the conditions f(0) = 2 and f(1) = 0.5. Find the constants C and α. What is the value f(2)?

1.30 A yacht, which has a draught of 2 metres, is anchored in a tidal estuary, in which the depth of water around the yacht is
$$5 + 4.5\sin 0.5t$$
(in metres), where the time t is measured in hours. What is the tidal period in hours? Over how many hours in one period can the yacht float free of the estuary?

1.31 Draw sketches of the graphs of the following curves given in polar coordinates, by constructing a table of values of r for equally spaced angles (say 15° intervals):
(a) the cardioid $r = 0.5(1 + \cos\theta)$;
(b) the folium $r = (4\sin^2\theta - 1)\cos\theta$;
(c) the four-leaved rose $r = \sin 2\theta$;
(d) the Archimedean spiral $r = 0.04\theta$ (extend the interval in θ to [0, 6π]);
(e) the equiangular spiral $r = 0.1e^{0.1\theta}$ (extend the interval in θ to [0, 6π]).

1.32 Sketch the graphs of the following functions:
(a) $\operatorname{sgn}\sin x$; (b) $\operatorname{sgn}\cos 2x$; (c) $H(x)\sin x$; (d) $\sin^2 x$; (e) $|\sin x|$; (f) $\sin|x|$; (g) $H(x - \pi)\sin x$.

1.33 The coordinates of three vertices of a rectangle are given by (−7, 3), (1, −3), (4, 1). Find the coordinates of the fourth vertex. Determine also the area of the rectangle.
1.34 State which of the following functions are periodic, and, if so, find the (minimum) period:
(a) $\sin 4x$; (b) $\cos(\pi + t)$; (c) $\sin t + \cos 2t$; (d) $\sin(x^2)$; (e) $e^{-\sin x}$; (f) $\cos^2 x$; (g) $x\sin x$; (h) $|\sin x|$; (i) $1/(4 + \sin^2 t)$; (j) $\sin\frac14 t$; (k) $\sin 3t + \cos 9t$; (l) $\sin(\sqrt{2}\,t) + \sin t$ (note: √2 is irrational).

1.35 Decide which of the following functions are even, odd, or neither even nor odd:
(a) $x^2 + x^3$; (b) $x^2 + 2x^4$; (c) $x + \sin x$; (d) $\sin x\cos x$; (e) $e^{-x^2}$; (f) $\ln(1 + x^2)$; (g) $e^{\sin x}$.

1.36 (Partial fractions, Section 1.14). Express the following in partial fractions:
(a) $\dfrac{1}{(x - 2)(x + 3)}$; (b) $\dfrac{x}{(x + 1)(x + 2)}$; (c) $\dfrac{2x - 1}{x(x - 1)}$; (d) $\dfrac{1}{x(x + 1)(x + 2)}$; (e) $\dfrac{1}{x(x^2 - 1)}$; (f) $\dfrac{1}{x(x + 2)^2}$; (g) $\dfrac{x^2}{(x + 1)(x + 2)^2}$; (h) $\dfrac{x - 1}{x^2 - 2x - 3}$; (i) $\dfrac{1}{x(x - 1)^2}$; (j) $\dfrac{1}{x^3 + x^2}$.

1.37 (Partial fractions, Section 1.14). In the following problems with irreducible factors, express the functions in partial fractions:
(a) $\dfrac{1}{x(x^2 + x + 1)}$; (b) $\dfrac{x}{(x - 1)(x^2 + 1)}$; (c) $\dfrac{x}{(x + 1)(x^2 + 2x + 6)}$.

1.38 Express the following in partial fractions:
(a) $\dfrac{1}{x^2(x^2 + 1)}$; (b) $\dfrac{x^3}{(x + 1)(x + 2)}$; (c) $\dfrac{x^3}{(x^2 - 9)(x + 1)}$.

1.39 Write down in full all the terms in each of the series:
(a) $\sum_{j=2}^{5} 2^j$; (b) $\sum_{n=0}^{4} \dfrac{1}{1 + n^2}$; (c) $\sum_{n=1}^{4} nx^n$.

1.40 (Geometric series, Section 1.15). Using the geometric series formula find the sums of the following series:
(a) $\sum_{n=1}^{7} (0.5)^n$; (b) $\sum_{n=2}^{6} (\frac13)^n$; (c) $\sum_{n=0}^{5} e^{-2n}$; (d) $\sum_{n=1}^{6} n2^n$; (e) $\sum_{n=1}^{10} (-\frac12)^n$; (f) $\sum_{n=0}^{6} [2(0.5)^n + 3(0.6)^n]$.

1.41 Find a formula for the sum of
$$x + x^5 + x^9 + \cdots + x^{1+4n} + \cdots + x^{41}.$$

1.42 ABC is a triangle with sides a, b, c opposite to the corresponding angles. Prove the cosine rule (see eqn (1.17d)) that
$$c^2 = a^2 + b^2 - 2ab\cos C.$$
(Hint: drop a perpendicular from B on to b. Look for a cos C, and then for an opportunity to use Pythagoras’s theorem.)

1.43 Let A, c, $t_0$, and T be any constants, and put $f(t) = Ae^{ct}$. Show that the sequence
$$f(t_0),\ f(t_0 + T),\ f(t_0 + 2T),\ \dots$$
is a geometric progression.

1.44 Express the following recurring decimals as fractions:
(a) 1.111…; (b) 0.999…; (c) 0.010 101 0…; (d) 0.090 909 0…; (e) 0.666…; (f) 2.72….

1.45 Obtain the sum of the following infinite geometric series:
(a) $\sum_{r=0}^{\infty} (\frac12)^r$; (b) $\sum_{r=0}^{\infty} (\frac{1}{10})^r$; (c) $\sum_{r=0}^{\infty} e^{-r}$; (d) $\sum_{r=0}^{\infty} (-1)^r(\frac12)^r$; (e) $1 - \frac23 + \frac49 - \frac{8}{27} + \cdots$.

1.46 Calculate the numerical values of:
(a) 4!, 6!, and 7!;
(b) 12!/11!;
(c) 10!/7!;
(d) 12!/(9!3!);
(e) n!/[r!(n − r)!] when n = 10 and r = 3;
(f) 3!/[r!(3 − r)!] for each of the cases r = 0, 1, 2, 3.
1.47 (a) Simplify (i) n!/(n − 2)!, (ii) (n + 1)!/(n − 1)!, where n is a positive integer.
(b) Express the following in terms of factorials: (i) 2 × 4 × 6 × ··· × (2m), (ii) 1 × 3 × 5 × ··· × (2m + 1), where m is a positive integer.

1.48 (a) Calculate the numbers represented by (i) ${}^5P_4$; (ii) ${}^9P_3$; (iii) ${}^6P_3$; (iv) ${}^7C_3$; (v) ${}^7C_4$; (vi) ${}^{10}C_5$; (vii) ${}^{100}C_{98}$; (viii) $\binom{10}{7}$.
(b) Show that ${}^nP_n = {}^nP_{n-1}$, and explain why by using an example.

1.49 Given four letters A, B, C, D, obtain:
(a) the number of possible permutations of the four letters, without repetitions of letters within a permutation;
(b) the number of three-letter combinations of the letters, taken three at a time without repetitions;
(c) the number of distinct four-letter permutations, in which all possible repetitions within a permutation are allowed;
(d) the number of distinct three-letter combinations in which a letter may be repeated up to three times;
(e) the total number of permutations containing from one to four letters without repetition;
(f) the total number of distinct combinations containing from one to four letters, when up to three occurrences of a single letter is allowed.

1.50 Find:
(a) the number of distinct three-letter ‘words’ obtainable from the letters A, B, C, D, E, in which E may occur 0, 1, or 2 times, but the rest may occur only once;
(b) the possible number of distinct six-letter words in which E occurs exactly twice and the other letters only once.

1.51 (a) How many distinct four-digit numbers may be made up by using the digits 1, 2, 3, 4, 5, no digit being used more than once?
(b) How many of the numbers in (a) are divisible by 5?
(c) How many of the numbers in (a) are divisible by 2?
(d) How many distinct positive numbers are obtainable by using not more than four of the digits taken without repetition from the digits 0, 1, 2, 3, 4?

1.52 There are four women and three men eligible to fill four posts.
(a) What is the total number of distinct combinations of personnel that can be selected?
(b) Split up the combinations according to the number of men/women among them, and obtain the numbers in each such grouping. Check the total against the number in (a).

1.53 Suppose there is a collection of N objects of different types A, B, etc., all the separate types being distinct from the others in some way, but objects of a particular type are identical. There are $N_A$ identical objects of type A, $N_B$ identical objects of type B, and so on. Show that:
(a) (A generalization of Example 1.22.) the possible number of distinct permutations of the N objects is
$$\frac{N!}{N_A!\,N_B!\,N_C!\cdots};$$
(b) the combined number of distinct combinations that may be formed out of 1, 2, … , and N of the objects is equal to
$$[(N_A + 1)(N_B + 1)\cdots] - 1.$$

1.54 (a) Five representatives from each of the countries France, Germany, Italy, and the UK are to be seated along one side of a long table. Each national group should sit together. In how many orders may the individuals be seated?
(b) Suppose that the table is circular with the representatives seated all round it. How many distinct orders are possible then? (Distinct permutations are to be understood as circular permutations in the sense of Example 1.23.)

1.55 Three prizes are to be distributed among 10 candidates. How many possible distinct distributions are there in the following cases?
(a) The prizes are all equal with at most one for any person.
(b) The prizes are all unequal, and only one may go to any person.
(c) All the prizes are equal, and any person may receive up to three prizes.
(d) As in (c), but the prizes are all different.

1.56 (a) A committee of 4 is to be formed from a pool of 2 accountants, 2 lawyers and 3 doctors. How many committees contain (i) exactly 1 accountant, (ii) exactly 1 doctor? (Hint: enumerate the possibilities.)
(b) The argument which follows is fallacious, and the result it produces is false:
“Given N different objects, attempt to obtain a new expression for the number of combinations ${}^NC_n$ of n objects by means of a two-stage process. Choose any r < n; there are ${}^NC_r$ combinations of size r. To obtain combinations of size n, supplement each of these by all combinations ${}^{N-r}C_{n-r}$ of size n − r taken from the remaining N − r objects. This gives a total of ${}^{N-r}C_{n-r}\,{}^NC_r$ combinations. Therefore ${}^{N-r}C_{n-r}\,{}^NC_r = {}^NC_n$.”
Show that this result is false in general, and locate the source of the fallacy, by considering the simple case where N = 7, n = 4 and r = 3, using letters for the objects.

1.57 (a) Write from memory the binomial expansions of the expressions $(1 + x)^n$ and $(a + b)^n$, where n is a positive integer.
(b) Expand the expression $(1 - x)^6$.
(c) Expand and simplify $(x + x^{-1})^5$ and $(x - x^{-1})^5$ where x ≠ 0.

1.58 Use the binomial theorem to show that $(1.01)^{10} \approx 1.105$. In a similar way, make an approximation to $(0.99)^8$.

1.59 By giving special values to the constants in the binomial theorem prove that, when n is a positive integer:
(a) $1 + 2\,{}^nC_1 + 2^2\,{}^nC_2 + \cdots + 2^n\,{}^nC_n = 3^n$, and $1 - {}^nC_1 + {}^nC_2 - \cdots + (-1)^n\,{}^nC_n = 0$.
(b) $1 + {}^nC_2 + {}^nC_4 + \cdots = 2^{n-1} = {}^nC_1 + {}^nC_3 + {}^nC_5 + \cdots$.

1.60 Let
$$F(n, k) = {}^nC_0 + {}^{n+1}C_1 + {}^{n+2}C_2 + \cdots + {}^{n+k}C_k,$$
where n and k are positive integers. Show that $F(n, k) + {}^{n+k+1}C_{k+1} = F(n, k + 1)$. Check that
$$F(n, 0) = {}^{n+1}C_0, \qquad F(n, 1) = {}^{n+2}C_1, \qquad F(n, k + 1) = F(n, k) + {}^{n+k+1}C_{k+1}$$
for all n and k. By starting with k = 0 and k = 1, deduce that
$$F(n, k) = {}^{n+k+1}C_k$$
for all values of k.

1.61 Expand $1/(x^2 + 3x + 2)$ in powers of x using partial fractions.

1.62 (a) The value of a single investment of amount A grows by a constant fraction R in each completed year (the annual growth rate) from the time it was purchased. Show that its value $V_N$ at the end of the Nth completed year is $V_N = A(1 + R)^N$. Calculate $V_N$ over 5, 10, and 15 years when A = £1000 and R = 0.03 (usually expressed as 3%).
(b) To value an investment when the time t from purchase is not a whole number of complete years the analogous formula $V_t = A(1 + R)^t$ is to be used, where t is measured in years. Show that over every period of T years the investment grows by a factor $(1 + R)^T$ (the proposed extension therefore has exactly the property we should hope for).
(c) Obtain the doubling period of the investment when R = 3%, 6% and 9%. Obtain the 10-times period when R = 6%.

1.63 Income from an investment is at a rate expressed as R per annum, but it is paid out monthly to the investor at a rate r per month on the current balance. Express r in terms of R. Why is R > 12r?

1.64 Money is borrowed from a finance company at an interest rate of $r_M$ per month. What is the equivalent compounded rate per annum? Calculate the annual rate when the monthly rate is 1% and 3%.

1.65 (a) (Geometric series: a model savings scheme.) At the start of every year an amount A is put into a savings scheme. The interest on the current balance at the end of each complete year is reinvested, the (constant) annual rate being R. Show that the value $V_N$ of the fund at the end of year N is given by $V_N = A(1 + R)[(1 + R)^N - 1]/R$.
(b) Calculate $V_N$ after 10 years at 5% interest, the annual subscriptions being £100, and find the percentage gain on the total sum invested.
(c) Find the expression for the fund value if the saver contributes an amount 2A every 2 years, over a period of 2M years, where M is an integer. Obtain the value of the fund using the data in (b).
2 Differentiation

CONTENTS
2.1 The slope of a graph
2.2 The derivative: notation and definition
2.3 Rates of change
2.4 Derivative of $x^n$ (n = 0, 1, 2, 3, …)
2.5 Derivatives of sums: multiplication by constants
2.6 Three important limits
2.7 Derivatives of $e^x$, sin x, cos x, ln x
2.8 A basic table of derivatives
2.9 Higher-order derivatives
2.10 An interpretation of the second derivative
Problems

Much of the material in this book requires knowledge of a branch of mathematics called calculus, and its fundamental ideas and techniques are given in Chapters 2–4 and 15–17. The present chapter introduces the derivative function, which is one cornerstone of calculus.
Suppose we have a formula describing a curve in x, y axes, say $y = x^3$. Then we can draw its graph. To obtain the slope or gradient at any particular point we may draw a tangent to the curve at that point, and measure the inclination with a protractor. But if we need a formula giving the steepness of the curve at an arbitrary point (in this case the formula is ‘slope = $3x^2$’), then we need fresh ideas.
A given function (in the above case $x^3$) gives rise to a so-called derivative function (representing the slope, in this case $3x^2$). We first explain how the derivative function is arrived at in general. Try to understand Sections 2.1–2.3 thoroughly, especially the use of the notation dy/dx and the concept of limit.
Next, we obtain explicitly the derivatives of the elementary functions $x^n$, $e^x$, ln x, sin x and cos x, and linear combinations of these. The proofs are quite complicated, but the results are simple. Table (2.19) should be firmly memorized, as if you were learning a short but vital vocabulary in a foreign language.

2.1 The slope of a graph

Figure 2.1 shows the graph of a straight line. The x and y coordinates are assumed to have the same scale. Choose any two points $P : (x_1, y_1)$ and $Q : (x_2, y_2)$ which lie on the line. If we measure the angle α from the positive x direction then
$$\tan\alpha = \frac{y_2 - y_1}{x_2 - x_1} \tag{2.1}$$
(see (1.6)). The value of tan α remains the same whether Q is to the right or left of P, since the value of the fraction on the right is unchanged. The angle α itself will differ in the two cases by an amount equal to π (or 180°), but this does not affect the value of tan α. (If we refer to α itself to indicate the steepness of a line, we choose the value that lies between ±90°, but normally we only need tan α.) Notice that if the x and y scales differed, the angle α as depicted would not satisfy (2.1); it would be too great or too small.

Fig. 2.1 The straight line through $P : (x_1, y_1)$ and $Q : (x_2, y_2)$, with $PN = x_2 - x_1$, $NQ = y_2 - y_1$, and angle of inclination α.

The slope or gradient of a straight line is defined to be the quantity tan α. If the line is horizontal, tan α is zero. It is positive or negative according as the line slopes upwards or downwards respectively as we go from left to right. Its magnitude increases or decreases as the inclination increases or decreases respectively, growing without bound as α approaches ±90°.
Consider now the slope or gradient of a curve at a point. Figure 2.2a shows a
typical curve. By the slope of the curve at the point P we mean the slope of the
tangent line to the curve at P. We can think of the tangent line as the line joining
two points on the curve which are ‘infinitely close together’, but it is no use
making P and Q coincide, since we simply get tan α = 0/0, which has no definite
meaning. It is necessary to carry out an indirect process.
Let P be the fixed point (see Fig. 2.2b). Take any other point Q on the curve
and join PQ by a straight line, called the chord PQ. If Q is some distance from P,
then the slope of PQ will not be close to that of PT, but if we take a succession of
points Q closer and closer to P, then the slope of the chord PQ can be made as
close as we wish to that of PT. The succession of points Q that we consider is
said to approach P. The corresponding value of the slope of PQ then approaches
a limit or a limiting value, and this is equal to the slope of the curve at P. As in
Section 1.10 the sign → signifies the word ‘approaches’, so we can write:

$$\text{as } Q \to P, \quad \text{slope of } PQ \to \text{slope of the curve at } P. \tag{2.2}$$

Fig. 2.2 (a), (b) A curve through P and Q, showing the tangent PT at P and the chord PQ (panel (a) also labels the tangent line at Q).

We shall be able to obtain the exact value of the slope of PT, which is the same
as the slope of the curve at P, by carrying out the approach of Q to P in algebraic
terms. To do this, we introduce a new symbol
δx
(pronounced ‘delta-x’). This is a single symbol: the Greek letter δ stands for the
words ‘the increment in’ or ‘the change in’ something: in this case, the increment
in the value of x from P to Q. We represent the point Q by the abscissa x + δx.
Note that δx can be positive or negative depending on the position of Q in
relation to P. The corresponding incremental change in y from P to Q is δy.
The coordinates of Q are (x + δx, y + δy) as shown in Fig. 2.3. The separation
between P and Q is indicated by the differences δx and δy in the triangle PNQ.
Then, by (2.1),
$$\text{slope of } PQ = \frac{\delta y}{\delta x}. \tag{2.3}$$
Now let δx → 0 so that Q → P: the ratio δy/δx approaches a number which is equal to the value of the slope at P. We first show what happens numerically in a particular case.

Fig. 2.3 Points P : (x, y) and Q : (x + δx, y + δy) on a curve, with PN = δx and NQ = δy.

Example 2.1 Find the slope of the curve $y = x^2$ at the point $P : (\tfrac13, \tfrac19)$ on the curve (Fig. 2.4).

Fig. 2.4 The curve $y = x^2$ near $P : (\tfrac13, \tfrac19)$, with $Q : (\tfrac13 + \delta x, \tfrac19 + \delta y)$, PN = δx and NQ = δy.

At P, we have $x = \tfrac13$ and $y = \tfrac19$. We shall make a table of values of δy/δx for diminishing values of δx; that is, for points Q which are approaching P (from either side). We first need to express δy in terms of δx:
$$\delta y = (y \text{ at } Q) - (y \text{ at } P) = \left(\tfrac13 + \delta x\right)^2 - \left(\tfrac13\right)^2 = \left(\tfrac19 + \tfrac23\,\delta x + (\delta x)^2\right) - \tfrac19 = \tfrac23\,\delta x + (\delta x)^2. \tag{2.4}$$
For Q to the left of P:
δx        −0.1      −0.01        −0.001          …
δy        −0.056̇    −0.006 56̇    −0.000 665 6̇    …
δy/δx     0.56̇      0.656̇        0.6656̇          …
For Q to the right of P (as shown in Fig. 2.4):
δx        0.1       0.01         0.001           …
δy        0.076̇     0.006 76̇     0.000 667 6̇     …
δy/δx     0.76̇      0.676̇        0.6676̇          …
(A dot over a digit means that the digit recurs: e.g. 0.6̇ = 0.666 66…; see Section 1.1.)
First notice that if we put δx = 0 we obtain δy/δx = 0/0, which gives no information at all, so we have to look at the sequence of values for δy/δx as δx → 0. Inspection suggests that each term is formed from its predecessor in a regular way, so we can predict that:
$$\text{as } \delta x \to 0, \quad \text{slope of } PQ \to 0.666\,66\ldots = \tfrac23.$$
The exact value of the slope at P therefore appears to be $\tfrac23$.
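The table can be reproduced with a short script. This is a minimal sketch and not part of the original text. It uses Python's `fractions.Fraction` so that δy and δy/δx are computed exactly rather than with rounding. The helper name `chord_slope` is our own.

```python
from fractions import Fraction


def chord_slope(x0, dx):
    """Slope of the chord PQ on y = x**2, from P at x0 to Q at x0 + dx."""
    dy = (x0 + dx) ** 2 - x0 ** 2
    return dy / dx


x0 = Fraction(1, 3)  # the point P : (1/3, 1/9) of Example 2.1
steps = [Fraction(-1, 10), Fraction(-1, 100), Fraction(-1, 1000),
         Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]
for dx in steps:
    slope = chord_slope(x0, dx)
    # By (2.4), the exact chord slope is 2/3 + dx.
    print(f"dx = {float(dx):+.3f}   dy/dx = {float(slope):.6f}")
```

Because the arithmetic is exact, each printed slope is $\tfrac23 + \delta x$. Ordinary floating-point numbers would give almost the same values, with small rounding errors.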

We could have worked out the slope at P in this example without constructing the numerical table. From (2.4),
$$\frac{\delta y}{\delta x} = \frac{\tfrac23\,\delta x + (\delta x)^2}{\delta x} = \frac23 + \delta x \tag{2.5}$$
provided that δx ≠ 0. Therefore, when δx → 0, δy/δx → $\tfrac23$, as before.

Finally, in exactly the same way, we can find a general formula giving the slope at any point P : (x, y) on the graph of $y = x^2$. The value of δy corresponding to a value of δx is given by
$$\delta y = (x + \delta x)^2 - x^2 = x^2 + 2x\,\delta x + (\delta x)^2 - x^2 = 2x\,\delta x + (\delta x)^2.$$
Therefore
$$\frac{\delta y}{\delta x} = \frac{2x\,\delta x + (\delta x)^2}{\delta x} = 2x + \delta x.$$
Now let δx → 0; we obtain:
$$\text{the slope of } y = x^2 \text{ at } (x, y) \text{ is } 2x. \tag{2.6}$$

This process is a model for treating other functions: for example, you could now show in the same way that the slope of the graph of $y = x^3$ at any point is equal to $3x^2$.
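Here is a sketch of that exercise. Expand the cube using the binomial theorem, then follow the same steps as for $y = x^2$:
$$\delta y = (x + \delta x)^3 - x^3 = 3x^2\,\delta x + 3x\,(\delta x)^2 + (\delta x)^3,$$
$$\frac{\delta y}{\delta x} = 3x^2 + 3x\,\delta x + (\delta x)^2 \qquad (\delta x \neq 0).$$
Letting δx → 0 leaves only the first term, so the slope of $y = x^3$ at (x, y) is $3x^2$.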

2.2 The derivative: notation and definition

We shall need to find the value approached by δy/δx as δx → 0 in many different situations. There is a special notation used to signify the process:

Limit notation
Let y = f(x), so that δy = f(x + δx) − f(x). Then the value approached by δy/δx when δx → 0 is denoted by
$$\lim_{\delta x \to 0} \frac{\delta y}{\delta x}. \tag{2.7}$$

Read this as ‘The limit, or the limiting value, of δy/δx as δx → 0’. (The lim sign is used in many other contexts too.)
The result of the process $\lim_{\delta x \to 0} \delta y/\delta x$, where δy = f(x + δx) − f(x), is called the derivative of y with respect to x, or the derivative of f(x). The process is called differentiation. We worked out earlier that, if $y = f(x) = x^2$, then the derivative is equal to 2x. The following notations are standard short ways of indicating a derivative:
$$\frac{dy}{dx}, \quad \frac{df(x)}{dx}, \quad \text{or} \quad \frac{d}{dx}f(x) \quad \text{all signify} \quad \lim_{\delta x \to 0}\frac{\delta y}{\delta x}.$$
The symbol dy/dx is usually pronounced ‘dee-y by dee-x’. Notice that the letter used is an ordinary d, not δ.

Derivative, slope, and tangent

(a) Let y = f(x) and δy = f(x + δx) − f(x). Then the derivative of y with respect to x, signified by
$$\frac{dy}{dx}, \quad \frac{df(x)}{dx}, \quad \frac{d}{dx}f(x), \quad \text{or} \quad \frac{df}{dx},$$
all stand for the limit of δy/δx as δx → 0:
$$\frac{dy}{dx} = \lim_{\delta x \to 0}\frac{\delta y}{\delta x}.$$
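You can try the definition out numerically by evaluating the chord slope δy/δx for a sequence of shrinking δx. The sketch below is illustrative only. The function name `difference_quotient` and the choice of $y = x^3$ at x = 1 are our own, and floating-point rounding limits how small δx can usefully be made.

```python
def difference_quotient(f, x, dx):
    """The ratio delta y / delta x = (f(x + dx) - f(x)) / dx, for dx != 0."""
    return (f(x + dx) - f(x)) / dx


def cube(x):
    return x ** 3


# For y = x**3 at x = 1 the ratio is 3 + 3*dx + dx**2 (apart from rounding),
# so it should approach dy/dx = 3 as dx -> 0.
for dx in [0.1, 0.01, 0.001, 0.0001]:
    print(dx, difference_quotient(cube, 1.0, dx))
```

If δx is made much smaller than about $10^{-8}$, the subtraction f(x + δx) − f(x) loses most of its significant digits, and the printed values stop improving. The limit itself is an exact mathematical idea, not a computation with a tiny δx.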
Thus our earlier result for $y = x^2$ can be written in several ways:
$$\frac{dy}{dx} \quad \text{or} \quad \frac{d(x^2)}{dx} \quad \text{or} \quad \frac{d}{dx}x^2 = 2x.$$
Strictly speaking, dy/dx should be regarded as a single shorthand symbol representing the longer expression $\lim_{\delta x \to 0} \delta y/\delta x$, and not as a ratio which can be taken to pieces. However, its great usefulness is that it often behaves just like an ordinary ratio of nonzero quantities, and we shall later see cases where this property guides us to true results and makes them easy to remember.

(b) The slope m of a curve at any point $(x_0, y_0)$, where $y_0 = f(x_0)$, is given by
$$m = \left(\frac{dy}{dx}\right)_{x = x_0},$$
where the derivative is evaluated at $x = x_0$. Therefore the equation of the tangent line at the point is
$$\frac{y - y_0}{x - x_0} = \left(\frac{dy}{dx}\right)_{x = x_0}. \tag{2.8}$$

It is sometimes useful to think of the symbol
$$\frac{d}{dx}$$
standing alone as meaning: ‘differentiate what follows’. It is also called an operator, meaning that we operate on one function ($x^2$ say) to produce another (i.e. 2x). Sometimes the symbol D is used to stand for the operator d/dx. We would then write
$$Dx^2 = 2x.$$
The normal to a curve at a point $x = x_0$ is the straight line which is perpendicular to the tangent at the point. If m is the slope of the tangent then, by (1.9), the slope of the normal is
$$-\frac{1}{m} = -1 \Big/ \left(\frac{dy}{dx}\right)_{x = x_0}.$$

Hence the equation of the normal at $(x_0, y_0)$ is
$$y - y_0 = -\frac{1}{m}(x - x_0).$$
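The two line equations above can be packaged as a small helper that returns each line in slope-intercept form, y = (slope)x + (intercept). This is a sketch under stated assumptions:
- The helper name `tangent_and_normal` is hypothetical.
- The caller must supply the derivative.
- The normal is assumed not to be vertical, that is, dy/dx ≠ 0 at $x_0$.

The example uses $y = x^2$ at (1, 1), where dy/dx = 2. It deliberately avoids the curve of Self-test 2.1.

```python
def tangent_and_normal(f, dfdx, x0):
    """Return ((m, c_t), (m_n, c_n)) so that the tangent is y = m x + c_t and
    the normal is y = m_n x + c_n at the point (x0, f(x0)).

    Assumes dfdx(x0) != 0; otherwise the normal is the vertical line x = x0
    and the division below raises ZeroDivisionError.
    """
    y0 = f(x0)
    m = dfdx(x0)                     # slope of the tangent, as in (2.8)
    tangent = (m, y0 - m * x0)       # from y - y0 = m (x - x0)
    m_normal = -1 / m                # perpendicular slope, by (1.9)
    normal = (m_normal, y0 - m_normal * x0)
    return tangent, normal


tangent, normal = tangent_and_normal(lambda x: x ** 2, lambda x: 2 * x, 1.0)
print(tangent)  # (2.0, -1.0): tangent y = 2x - 1
print(normal)   # (-0.5, 1.5): normal y = -x/2 + 3/2
```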

Self-test 2.1
Find the equations of the tangent to the curve $y = x - x^3$ at (1, 0), and the normal to the same curve at $(\tfrac12, \tfrac38)$. Find where the tangent and normal intersect.

2.3 Rates of change

The quantity $\lim_{\delta x \to 0} \delta y/\delta x$ is usually needed to solve problems which have no immediate connection with the slope of graphs: this idea was only introduced to give the reader a picture to hold on to. Moreover, it is not always appropriate to call the variables x and y if other letters arise more naturally.
For example, suppose that a car is moving along a straight road, represented by
an x axis, and that at time t its displacement (which may be positive or negative)
from the origin is given by
x = f(t).
We can deduce the velocity from moment to moment from this information.
Choose any moment t, and suppose that, between times t and t + δt, the car
moves from x to x + δx. Then δx is given by
δx = f(t + δt) − f(t).
The quantities δt and δx could be imagined as being recorded with a stopwatch and distance meter, and the average velocity over the interval δt is
$$\frac{\text{change of displacement}}{\text{time taken}} = \frac{f(t + \delta t) - f(t)}{\delta t} = \frac{\delta x}{\delta t}.$$
The smaller δt is, the more nearly will this ratio approximate to the instantaneous velocity v at time t. Therefore, let δt → 0; using the notation (2.7), we obtain
$$v = \lim_{\delta t \to 0}\frac{\delta x}{\delta t},$$
or alternatively, by (2.8),
$$v = \frac{dx}{dt}.$$

We can borrow the result (2.6) to complete the calculation in one case. Suppose that
$$x = t^2.$$
Equation (2.6) says in effect that
$$\text{if } y = x^2 \text{ then } \frac{dy}{dx} = 2x,$$
and by changing the letters x and y to t and x respectively we obtain:
$$\text{if } x = t^2 \text{ then } \frac{dx}{dt} = 2t.$$
Therefore the velocity is
$$v = 2t.$$
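The limiting process can be imitated numerically for $x = t^2$ by computing average velocities over shorter and shorter intervals. This is a minimal sketch. The names `average_velocity` and `displacement`, and the time t = 3, are illustrative choices, not from the text.

```python
def average_velocity(x_of_t, t, dt):
    """Average velocity (x(t + dt) - x(t)) / dt over the interval from t to t + dt."""
    return (x_of_t(t + dt) - x_of_t(t)) / dt


def displacement(t):
    return t ** 2  # x = t**2, as in the text


t = 3.0
for dt in [1.0, 0.1, 0.01, 0.001]:
    # In exact arithmetic this equals 2t + dt, so it approaches v = 2t = 6.
    print(dt, average_velocity(displacement, t, dt))
```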
Another way of expressing the meaning of velocity is that velocity is the rate of change of displacement with time. Similarly, acceleration a is the rate of change of velocity with time:
$$a = \frac{dv}{dt}.$$
For the case when $x = t^2$ we have, with v = 2t,
$$\delta v = 2(t + \delta t) - 2t = 2\,\delta t,$$
so
$$a = \frac{dv}{dt} = \lim_{\delta t \to 0}\frac{\delta v}{\delta t} = 2.$$
The expression ‘rate of change’ means the same as the term ‘growth rate’ that we
used in Section 1.10. As seen in the next example, the idea of rate of change is
quite general and need not involve time.

Example 2.2 Find the rate of change of the area of a circle with respect to its radius.
Call the radius r and the area A. The rate of change of A with respect to r is
$$\frac{dA}{dr} \quad \text{or} \quad \lim_{\delta r \to 0}\frac{\delta A}{\delta r}.$$
Since $A = \pi r^2$, we have
$$\delta A = \pi(r + \delta r)^2 - \pi r^2 = \pi\left(2r\,\delta r + (\delta r)^2\right).$$
Therefore
$$\frac{\delta A}{\delta r} = \pi(2r + \delta r).$$
Now let δr → 0; we obtain
$$\frac{dA}{dr} = \lim_{\delta r \to 0}\frac{\delta A}{\delta r} = 2\pi r.$$
This result could have been obtained by using our previous result
$$\frac{d(t^2)}{dt} = 2t,$$
with r in place of t, and multiplying it by π. (Notice also that 2πr is the circumference: the result can be interpreted as meaning that if we increase r by a small amount δr, then the area increase is nearly equal to that of a narrow strip of length 2πr and breadth δr.)
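Dividing through by π keeps the arithmetic exact, so the identity δA/δr = π(2r + δr) can be checked without rounding. This is a minimal sketch. The function name and the radius r = 5 are illustrative assumptions.

```python
from fractions import Fraction


def chord_ratio_over_pi(r, dr):
    """(delta A / delta r) / pi for A = pi * r**2, computed exactly with fractions."""
    return ((r + dr) ** 2 - r ** 2) / dr


r = Fraction(5)
for dr in [Fraction(1, 10), Fraction(1, 1000)]:
    ratio = chord_ratio_over_pi(r, dr)
    # Equals 2r + dr exactly, which tends to 2r as dr -> 0 (so dA/dr = 2*pi*r).
    print(dr, ratio, ratio == 2 * r + dr)
```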

Self-test 2.2
The volume V of a sphere of radius r is given by $V = \tfrac43\pi r^3$. Find the rate of change of volume with radius.

2.4 Derivative of $x^n$ (n = 0, 1, 2, 3, …)

The following is our first general result:

Derivative of $x^n$ (n = 0, 1, 2, 3, …)
(a) If y = c, where c is a constant, then
$$\frac{dy}{dx} = 0.$$
(b) If $y = x^n$, where n = 1, 2, 3, …, then
$$\frac{dy}{dx} = nx^{n-1}. \tag{2.9}$$

To prove (a): the graph of y = c is a horizontal straight line; therefore its slope is zero, so dy/dx = (d/dx)c = 0.
To prove (b) in the most elementary way, we shall use an identity: if n is a positive integer and a, b are any numbers,
$$a^n - b^n \equiv (a - b)(a^{n-1} + a^{n-2}b + a^{n-3}b^2 + \cdots + b^{n-1}).$$
This can be verified by multiplying out the two brackets on the right; everything cancels except for the two terms on the left.
Follow (2.8a), with $f(x) = x^n$, so that $\delta y = (x + \delta x)^n - x^n$:
$$\frac{dy}{dx} = \lim_{\delta x \to 0}\frac{\delta y}{\delta x} = \lim_{\delta x \to 0}\frac{1}{\delta x}\left[(x + \delta x)^n - x^n\right].$$
Put a = x + δx and b = x into the identity, noticing that
$$a - b = (x + \delta x) - x = \delta x.$$
Then
$$\frac{\delta y}{\delta x} = \frac{1}{\delta x}(\delta x)\left[(x + \delta x)^{n-1} + (x + \delta x)^{n-2}x + \cdots + x^{n-1}\right] = (x + \delta x)^{n-1} + (x + \delta x)^{n-2}x + \cdots + x^{n-1}$$
when δx ≠ 0. Now let δx → 0; we obtain
$$\frac{dy}{dx} = \lim_{\delta x \to 0}\frac{\delta y}{\delta x} = x^{n-1} + x^{n-1} + \cdots + x^{n-1}.$$
There are n terms on the right, each equal to $x^{n-1}$, so finally,
$$\frac{dy}{dx} = nx^{n-1}.$$
(In Section 3.4 we show that (2.9b) is in fact true for all values of n.)
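The algebra in the proof can be spot-checked with exact integer arithmetic. This sketch verifies the factorization for a few values. It then shows that, once δx is set to zero, the n terms of the bracket all equal $x^{n-1}$. The helper name `bracket` is ours. A finite check is not a proof; the proof is the multiplication described in the text.

```python
def bracket(a, b, n):
    """Second factor in the identity: a**(n-1) + a**(n-2)*b + ... + b**(n-1)."""
    return sum(a ** (n - 1 - k) * b ** k for k in range(n))


# Spot-check a**n - b**n == (a - b) * bracket(a, b, n) with exact integers.
for n in range(1, 8):
    for a, b in [(7, 3), (10, 3), (-2, 5)]:
        assert a ** n - b ** n == (a - b) * bracket(a, b, n)

# With a = x + dx and b = x, delta y / delta x equals bracket(x + dx, x, n).
# Setting dx = 0 makes each of the n terms equal to x**(n-1):
x, n = 2, 5
print(bracket(x, x, n), n * x ** (n - 1))  # 80 80
```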
Example 2.3 Obtain (a) the general expression for dy/dx when $y = x^3$; (b) the slope of the curve $y = x^3$ at P : (2, 8); (c) the angle of inclination of the tangent line to $y = x^3$ at the point P; (d) the equation of the tangent line through P; (e) the velocity v and acceleration a of a point with coordinate x, when $x = t^3$.
(a) From (2.9) with n = 3, we have
$$\frac{dy}{dx} = 3x^{3-1} = 3x^2.$$
(b) The slope of the curve at P is equal to the value of dy/dx at x = 2, which is 12.
(c) The slope is equal to tan α, where α is the angle of inclination, so tan α = 12 and α = 85.2°.
(d) Let (x, y) now represent any point on the tangent line at (2, 8). The slope of the tangent is equal to 12, so from (2.1),
$$\frac{y - 8}{x - 2} = 12.$$
Therefore the equation of the tangent line is y = 12x − 16.
(e) From Section 2.3, $v = dx/dt = (d/dt)t^3 = 3t^2$. Also
$$a = \frac{dv}{dt} = \frac{d}{dt}3t^2 = 3\frac{d}{dt}t^2 = 6t.$$
A little thought about the process of finding $\lim_{\delta t \to 0} \delta v/\delta t$ will persuade the reader that it is right to take the constant 3 from under the differentiation sign in the last line; see also the next section.

2.5 Derivatives of sums: multiplication by constants

The following are general rules which become obvious when the definition (2.8) is applied to them.

Linear combinations of functions
(a) If C is a constant, then
$$\frac{d}{dx}[Cf(x)] = C\frac{d}{dx}f(x).$$
(b)
$$\frac{d}{dx}[f(x) + g(x)] = \frac{d}{dx}f(x) + \frac{d}{dx}g(x).$$
(c) If A, B, C, … are constants, then
$$\frac{d}{dx}[Af(x) + Bg(x) + Ch(x) + \cdots] = A\frac{d}{dx}f(x) + B\frac{d}{dx}g(x) + C\frac{d}{dx}h(x) + \cdots. \tag{2.10}$$

The result (2.10c) follows easily by repeated use of (a) and (b). We can use
this rule together with (2.9) to obtain the derivatives of polynomials, as in the
following example.
Example 2.4 Obtain dy/dx when (a) $y = 3x^3 - \tfrac12x^2 + 5$; (b) $y = 3x(x^2 - 2)$.
(a) From (2.10c),
$$\frac{dy}{dx} = \frac{d}{dx}\left(3x^3 - \tfrac12x^2 + 5\right) = 3\frac{d(x^3)}{dx} - \frac12\frac{d(x^2)}{dx} + \frac{d(5)}{dx} = 3(3x^2) - \tfrac12(2x) + 0 = 9x^2 - x.$$
(b) It is necessary in this case to express y without brackets; that is, as a polynomial:
$$\frac{dy}{dx} = \frac{d}{dx}[3x(x^2 - 2)] = \frac{d}{dx}(3x^3 - 6x) = 3\frac{d(x^3)}{dx} - 6\frac{dx}{dx} \quad \text{(from (2.10c))} = 9x^2 - 6.$$
These derivatives, of course, represent the slopes of the corresponding graphs.
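Rules (2.9) and (2.10c) together say that a polynomial is differentiated term by term: each term $a_kx^k$ becomes $ka_kx^{k-1}$. Suppose we store a polynomial as a list of coefficients, lowest power first. This representation is our assumption, not the text's. The rule then becomes a one-line sketch:

```python
from fractions import Fraction


def differentiate_polynomial(coeffs):
    """Differentiate sum(coeffs[k] * x**k) term by term, using (2.9) and (2.10c).

    coeffs[k] is the coefficient of x**k; the result uses the same convention.
    A constant polynomial gives an empty list, i.e. the zero derivative.
    """
    return [k * coeffs[k] for k in range(1, len(coeffs))]


# Example 2.4(a): y = 3x^3 - (1/2)x^2 + 5, coefficients of 1, x, x^2, x^3.
print(differentiate_polynomial([5, 0, Fraction(-1, 2), 3]))  # represents 9x^2 - x
# Example 2.4(b): y = 3x^3 - 6x after removing the brackets.
print(differentiate_polynomial([0, -6, 0, 3]))  # [-6, 0, 9], i.e. 9x^2 - 6
```

Using `Fraction` for the coefficient −½ keeps the result exact. Plain floats would also work for this example.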

Example 2.5 A car travels along a straight road with varying velocity v for one hour. At time t hours, its displacement from the starting point O is given by $x = 60t^2(3 - 2t)$ (0 ≤ t ≤ 1) kilometres. Find expressions for (a) the velocity v; (b) the acceleration a.
(a) The velocity is the rate of change of displacement with time:
$$v = \frac{dx}{dt} = \frac{d}{dt}[60t^2(3 - 2t)] = 60\frac{d}{dt}(3t^2 - 2t^3) = 60\left(3\frac{d(t^2)}{dt} - 2\frac{d(t^3)}{dt}\right) = 60[3(2t) - 2(3t^2)] = 360(t - t^2) \text{ in km h}^{-1}.$$
The car stops after 1 hour at t = 1.
(b) Acceleration is the rate of change of velocity with time:
$$a = \frac{dv}{dt}.$$
Therefore
$$a = 360(1 - 2t) \quad (\text{in km h}^{-2}).$$

Example 2.6 The potential energy V of a pendulum of length l with a bob of mass m is given approximately by $V = mgl(\theta - \tfrac16\theta^3)$ when its angle of inclination θ (radians) is small. Find the rate of change of V with respect to θ. (This quantity is associated with the moment exerted by gravity.)
Using the letters suggested by the question, we require
$$\frac{dV}{d\theta} = \frac{d}{d\theta}\left[mgl\left(\theta - \tfrac16\theta^3\right)\right] = mgl\left(\frac{d\theta}{d\theta} - \frac16\frac{d(\theta^3)}{d\theta}\right) = mgl\left(1 - \tfrac12\theta^2\right).$$

Self-test 2.3
Find the derivative of y = 10x7 + 7x10.
2.6 Three important limits

In order to increase the repertory of functions that we can differentiate, three important limits are needed. You might not need to learn the proofs, but the results (2.11), (2.13), and (2.14) are essential, and you should try to acquire a feeling for what is happening by examining the numerical tables given; or better, by working out tables of your own.
Instead of δx, the letter ε (Greek epsilon) will be used to represent the quantity that tends to zero. (The limit is not affected by the letter we use.)

First consider
$$\lim_{\varepsilon \to 0} \frac{e^{\varepsilon} - 1}{\varepsilon},$$
where e = 2.718 28… is the number defined in Section 1.10. If we put ε = 0 we get
0/0, which is meaningless, but the approach to a limit can be seen in the following
table:

ε                 0.1      0.01     0.001    …
(e^ε − 1)/ε       1.0517   1.0050   1.0005   …

(An approach to zero through negative ε values is similar.) It looks as if the limit
is equal to 1.
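As the text suggests, such tables are easy to work out yourself. Here is a minimal Python sketch (our own illustration, using only the standard math module); math.expm1(ε) computes e^ε − 1 directly, which avoids losing digits when 1 is subtracted from a number very close to 1:

```python
import math

def exp_quotient(eps):
    """(e^eps - 1)/eps; math.expm1 avoids cancellation when eps is tiny."""
    return math.expm1(eps) / eps

for eps in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
    print(f"{eps:7.3f}   {exp_quotient(eps):.4f}")
```

The first three rows should reproduce the table above to four decimals; for negative ε the values approach 1 from below.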
To prove this, recall that in Section 1.10 it was shown that the graph y = e^x
intersects the y axis at 45°; that is to say, its slope there is equal to 1. (This is the
characteristic property of the base e = 2.718 28….) The same thing is true if we plot
y = e^ε against ε, as in Fig. 2.5. Referring to this figure:

$$\frac{e^{\varepsilon} - 1}{\varepsilon} = \frac{RQ - OP}{PN} = \frac{NQ}{PN},$$
which represents the slope of the chord PQ. When ε → 0, the slope of the chord
PQ approaches the slope of the tangent PT, which is equal to 1. Therefore we have
proved that
$$\lim_{\varepsilon \to 0} \frac{e^{\varepsilon} - 1}{\varepsilon} = 1. \tag{2.11}$$

[Fig. 2.5 Graph of y = e^ε (equal scales): points O, P, Q, R, N, the chord PQ, and the tangent PT at 45° at P.]

The second limit to be considered is
$$\lim_{\varepsilon \to 0} \frac{\sin\varepsilon}{\varepsilon},$$

ε being measured in radians. The approach to the limit is shown in the following
table, which includes negative values of ε (since sin(−ε)/(−ε) = sin ε/ε, the
sign of ε makes no difference to the value):

ε            ±0.1       ±0.08      ±0.06      ±0.04      ±0.02
sin ε / ε    0.998 33   0.998 93   0.999 40   0.999 73   0.999 93

The limit looks as if it might equal 1.
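A similar sketch (again our own illustration) tabulates sin ε/ε. Python's math.sin expects its argument in radians, which is exactly the convention this limit requires:

```python
import math

def sin_quotient(eps):
    """sin(eps)/eps with eps in radians (math.sin expects radians)."""
    return math.sin(eps) / eps

for eps in (0.1, 0.08, 0.06, 0.04, 0.02):
    # The quotient is even in eps, so the two columns agree.
    print(f"±{eps:.2f}   {sin_quotient(eps):.5f}   {sin_quotient(-eps):.5f}")
```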


To prove this, consider Fig. 2.6a. PN is any line segment perpendicular to the
base line AB, with Q any point to the left of N, and we allow ε to represent the
angle PQN (in radians). The arc PR is a circular arc with centre Q and radius
PQ. Then
$$\frac{PN}{PQ} = \sin\varepsilon, \quad \text{so} \quad PN = PQ\sin\varepsilon.$$
Also (radian property, Section 1.5)
$$\text{arc } PR = PQ \times \varepsilon = PQ\,\varepsilon.$$
Therefore
$$\frac{\sin\varepsilon}{\varepsilon} = \frac{PN}{\text{arc } PR}. \tag{2.12}$$

[Fig. 2.6 Panels (a) and (b): P above the base line AB, with Q, N, R on AB and the angle ε at Q; in (b) Q lies further to the left.]

Now let Q recede some distance towards the left, as illustrated in Fig. 2.6b. The
angle ε decreases, and ε → 0 as Q recedes to infinity. At the same time the arc PR
approaches the straight line PN, tending ultimately to coincide with it. Therefore,
when ε → 0, the length of the arc PR approaches the length of PN; so, from (2.12),
$$\lim_{\varepsilon \to 0} \frac{\sin\varepsilon}{\varepsilon} = 1. \tag{2.13}$$

Finally we consider
$$\lim_{\varepsilon \to 0} \frac{\ln(1 + \varepsilon)}{\varepsilon}.$$
Figure 2.7 shows the graphs of y = ln ε and y = ln(1 + ε). The graph y = ln ε (see
Fig. 1.31) passes through the point (1, 0) at 45° to the ε axis. The graph y = ln(1 + ε)
is the same graph moved over to the left by a distance 1, so it passes through the
origin O at 45°: that is to say, it has slope equal to 1 at the origin. Therefore
$$\lim_{\varepsilon \to 0} \frac{\ln(1 + \varepsilon)}{\varepsilon} = 1. \tag{2.14}$$

[Fig. 2.7 Graphs of y = ln ε and y = ln(1 + ε), crossing the ε axis at 45° at ε = 1 and ε = 0 respectively.]
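The third limit can be tabulated in the same way. In this sketch (our own illustration) math.log1p(ε) computes ln(1 + ε) accurately even when ε is very small:

```python
import math

def log_quotient(eps):
    """ln(1 + eps)/eps; math.log1p computes ln(1 + eps) accurately for small eps."""
    return math.log1p(eps) / eps

for eps in (0.1, 0.01, 0.001, -0.001, -0.01, -0.1):
    print(f"{eps:7.3f}   {log_quotient(eps):.4f}")
```

For positive ε the values lie just below 1 and for negative ε just above, consistent with the limit (2.14).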

You may be glad to know that there are no more complicated limits to be
evaluated.

Self-test 2.4
What are the following limits?
(a) lim_{ε→0} (e^{3ε} − e^ε)/ε; (b) lim_{ε→0} (sin 2ε)/ε; (c) lim_{ε→0} ln(1 + 3ε)/ε.

2.7 Derivatives of e^x, sin x, cos x, ln x


These follow from the definition (2.8) and the limits obtained in the previous
section.
First let
$$y = e^x.$$
Then according to the definition (2.8),
$$\frac{dy}{dx} = \lim_{\delta x \to 0} \frac{e^{x+\delta x} - e^x}{\delta x} = \lim_{\delta x \to 0} \frac{e^x e^{\delta x} - e^x}{\delta x} = \lim_{\delta x \to 0} e^x\,\frac{e^{\delta x} - 1}{\delta x}.$$
Now put
$$\delta x = \varepsilon.$$
The previous expression becomes
$$e^x \lim_{\varepsilon \to 0} \frac{e^{\varepsilon} - 1}{\varepsilon} = e^x,$$
by (2.11). Therefore
$$\frac{d}{dx}e^x = e^x. \tag{2.15}$$
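The key step above is the factorization of the difference quotient as e^x (e^{δx} − 1)/δx. A short numerical sketch (our own, not from the text) makes this visible: dividing the difference quotient of e^x by e^x leaves (e^{δx} − 1)/δx, which is the same at every x and tends to 1 as δx shrinks:

```python
import math

def forward_quotient(f, x, h):
    """Difference quotient [f(x + h) - f(x)] / h, as in definition (2.8)."""
    return (f(x + h) - f(x)) / h

for h in (1e-1, 1e-3, 1e-5):
    # Ratio of the difference quotient of e^x to e^x, at three different points x.
    ratios = [forward_quotient(math.exp, x, h) / math.exp(x) for x in (-1.0, 0.0, 2.0)]
    print(f"h = {h:g}:  ratios {[round(r, 5) for r in ratios]},"
          f"  (e^h - 1)/h = {math.expm1(h) / h:.5f}")
```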

The rate of increase of e^x is therefore numerically equal to e^x itself. The simplicity
of (2.15) is the reason why the number e is a desirable base for exponential
functions. The result is not so simple for any other base.
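Next, consider
$$y = \sin x.$$
(If x is an angle, then x and δx are in radians.) Then
$$\frac{dy}{dx} = \lim_{\delta x \to 0} \frac{\sin(x + \delta x) - \sin x}{\delta x}.$$
From the formula in Appendix B(d),
$$\sin C - \sin D = 2\sin\tfrac{1}{2}(C - D)\cos\tfrac{1}{2}(C + D)$$
for any C and D. Put C = x + δx and D = x into the identity. Then we have
$$\frac{dy}{dx} = \lim_{\delta x \to 0} \frac{2\sin\frac{1}{2}\delta x\,\cos(x + \frac{1}{2}\delta x)}{\delta x} = \lim_{\delta x \to 0} \frac{\sin\frac{1}{2}\delta x}{\frac{1}{2}\delta x}\cos(x + \tfrac{1}{2}\delta x).$$
Putting ½δx = ε, we have from (2.13),
$$\frac{dy}{dx} = \lim_{\varepsilon \to 0} \frac{\sin\varepsilon}{\varepsilon}\cos(x + \varepsilon) = \lim_{\varepsilon \to 0} \frac{\sin\varepsilon}{\varepsilon}\,\lim_{\varepsilon \to 0}\cos(x + \varepsilon) = \cos x.$$
Therefore
$$\frac{d}{dx}\sin x = \cos x \quad (\text{where } x \text{ is in radians}). \tag{2.16}$$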
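The proof relies on x being measured in radians. This sketch (our own illustration; sin_deg is a hypothetical helper) checks (2.16) numerically, and also shows what changes if x is measured in degrees: since sin(x°) = sin(πx/180), the slope per degree carries an extra factor π/180 ≈ 0.01745:

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# x in radians: the numerical slope of sin x should agree with cos x.
for x in (0.0, 0.5, 1.0, 2.0):
    print(f"x = {x}:  slope ~ {central_diff(math.sin, x):.8f},  cos x = {math.cos(x):.8f}")

def sin_deg(x_deg):
    """Hypothetical helper: the sine of an angle given in degrees."""
    return math.sin(math.radians(x_deg))

# With x in degrees the slope is (pi/180) cos x rather than cos x.
print(central_diff(sin_deg, 30.0), math.pi / 180 * math.cos(math.radians(30.0)))
```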
By a closely similar argument, it can be shown that
$$\frac{d}{dx}\cos x = -\sin x \quad (x \text{ in radians}). \tag{2.17}$$
(Notice the minus sign that occurs here.)


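Finally, suppose that
$$y = \ln x.$$
Then from (2.8) and (1.23d),
$$\frac{dy}{dx} = \lim_{\delta x \to 0} \frac{\ln(x + \delta x) - \ln x}{\delta x} = \lim_{\delta x \to 0} \frac{1}{\delta x}\ln\frac{x + \delta x}{x} = \lim_{\delta x \to 0} \frac{1}{\delta x}\ln\left(1 + \frac{\delta x}{x}\right).$$
By putting
$$\frac{\delta x}{x} = \varepsilon, \quad \text{or} \quad \delta x = x\varepsilon,$$
the previous equation becomes
$$\frac{dy}{dx} = \lim_{\varepsilon \to 0} \frac{1}{x}\,\frac{\ln(1 + \varepsilon)}{\varepsilon}.$$
By eqn (2.14) the limit of the part containing ε is 1, so we have
$$\frac{d}{dx}\ln x = \frac{1}{x}. \tag{2.18}$$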
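The substitution δx = xε can also be seen numerically. In this sketch (our own illustration, with a fixed ε = 10^−4 chosen for the purpose), the difference quotient of ln x agrees with the rewritten form (1/x) ln(1 + ε)/ε, and both are close to 1/x:

```python
import math

def ln_quotient(x, dx):
    """Difference quotient [ln(x + dx) - ln x] / dx from definition (2.8)."""
    return (math.log(x + dx) - math.log(x)) / dx

eps = 1e-4
for x in (0.5, 1.0, 4.0):
    dx = x * eps                                  # the substitution delta x = x * eps
    rewritten = (1 / x) * math.log1p(eps) / eps   # (1/x) * ln(1 + eps)/eps
    print(f"x = {x}:  {ln_quotient(x, dx):.6f}  {rewritten:.6f}  1/x = {1 / x:.6f}")
```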

(Remember that x must be positive for ln x to have a meaning.)

2.8 A basic table of derivatives


We assemble the results (2.15) to (2.18) from Section 2.7, and (2.9) for powers of
x, in a short table of derivatives.

Derivatives of the elementary functions                        (2.19)

Function y = f(x)       Derivative dy/dx or df(x)/dx
c (c = constant)        zero
x^n (n = 1, 2, …)       nx^{n−1}
e^x                     e^x
sin x                   cos x
cos x                   −sin x
ln x (x positive)       1/x (or x^{−1})

The derivatives of more complicated functions can be obtained from these by
using the rules described in the next chapter. A more extensive table is given
in Appendix D. Remember rule (2.10) for the addition of functions and
multiplication by constants.
Example 2.7 Obtain the equation of the tangent line at the point (½π, π) on the
graph of y = 2x − 3 cos x.
At a general point on the curve,
$$\frac{dy}{dx} = \frac{d}{dx}(2x - 3\cos x) = 2\frac{dx}{dx} - 3\frac{d}{dx}\cos x \quad \text{(by (2.10))}$$
$$= 2 - 3(-\sin x) = 2 + 3\sin x$$
(from the table). At (½π, π) this becomes equal to
$$2 + 3\sin\tfrac{1}{2}\pi = 5,$$
and this is the slope of the tangent line at the point. The equation of the tangent line is
therefore
$$\frac{y - \pi}{x - \frac{1}{2}\pi} = 5, \quad \text{or} \quad y = 5x - \tfrac{3}{2}\pi.$$

Self-test 2.5
Find the derivatives of cosh x and sinh x.

2.9 Higher-order derivatives


We may differentiate a function, and then differentiate the result. For example,
if y = x^4, then
$$\frac{dy}{dx} = 4x^3,$$
which we shall sometimes call the first derivative of x^4. By differentiating again, we
obtain
$$\frac{d}{dx}\left(\frac{dy}{dx}\right) = 12x^2,$$
which we call the second derivative of x^4, and so on.
In general, if y = f(x), we use the notation
$$\frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d^2y}{dx^2} \quad \text{or} \quad \frac{d^2 f(x)}{dx^2}.$$
(Notice where the indices ‘2’ are placed: the locations are different above and
below.) If we differentiate again, we get
$$\frac{d}{dx}\left[\frac{d}{dx}\left(\frac{dy}{dx}\right)\right] = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d^3y}{dx^3} \quad \text{or} \quad \frac{d^3 f(x)}{dx^3},$$
and so on.

Example 2.8 Show that d^4y/dx^4 = 0 when y = 2x^3 + 3x^2 − 1.
Differentiating four times, we have
$$\frac{dy}{dx} = 6x^2 + 6x, \quad \frac{d^2y}{dx^2} = 12x + 6, \quad \frac{d^3y}{dx^3} = 12, \quad \frac{d^4y}{dx^4} = 0.$$
For any polynomial of degree n the (n + 1)th derivative will be zero.
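Because (2.9) and the linear combination rule act term by term, a polynomial can be differentiated purely on its list of coefficients. The following sketch (our own representation, not from the text: [a0, a1, …, an] stands for a0 + a1x + … + anx^n) repeats Example 2.8 and shows each differentiation shortening the list by one, so that after n + 1 steps nothing is left:

```python
def differentiate(coeffs):
    """Differentiate a0 + a1 x + ... + an x^n, stored as [a0, a1, ..., an],
    term by term using (2.9). The empty list stands for the zero polynomial."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

p = [-1, 0, 3, 2]            # y = 2x^3 + 3x^2 - 1 from Example 2.8
derivatives = [p]
for _ in range(4):
    derivatives.append(differentiate(derivatives[-1]))

for order, c in enumerate(derivatives):
    print(order, c)          # e.g. order 1 gives [0, 6, 6], i.e. 6x + 6x^2
```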

Example 2.9 Write down the sequence y, dy/dx, d^2y/dx^2, …, d^7y/dx^7 when
y = sin x.
The sequence is
sin x, cos x, −sin x, −cos x, sin x, cos x, −sin x, −cos x
(and it continues in this regular way).

The following example involves the factorial n!. As we saw in Section 1.17, n! is
defined as
n! = n(n − 1)(n − 2) … 2· 1.
Remember that 0! is defined to be 1.

Example 2.10 Let n and r be any integers with n ≥ r ≥ 0. Prove that
$$\text{(a)}\ \frac{d^r}{dx^r}(x^n) = \frac{n!}{(n - r)!}x^{n-r}; \qquad \text{(b)}\ \frac{d^n}{dx^n}(x^n) = n!.$$
(a) From eqn (2.9),
$$\frac{d}{dx}(x^n) = nx^{n-1},$$
and successively
$$\frac{d^2}{dx^2}(x^n) = n(n - 1)x^{n-2}, \quad \frac{d^3}{dx^3}(x^n) = n(n - 1)(n - 2)x^{n-3},$$
and so on. For n ≥ r ≥ 0, the rth derivative is x^{n−r} with a coefficient of the form
n(n − 1)(n − 2)… in which there are r factors. The factors therefore run down from
n to [n − (r − 1)], so that we have
$$\frac{d^r}{dx^r}(x^n) = n(n - 1)(n - 2)\cdots(n - r + 1)x^{n-r}, \tag{i}$$
or
$$\frac{d^r}{dx^r}(x^n) = \frac{n!}{(n - r)!}x^{n-r}. \tag{ii}$$
The step from (i) to (ii) results from the cancellation of factors between numerator and
denominator of (ii).
(b) When r = n, we obtain from (i)
$$\frac{d^n}{dx^n}(x^n) = n(n - 1)(n - 2)\cdots(n - n + 1)x^0 = n(n - 1)(n - 2)\cdots 1 = n!,$$
after putting x^0 = 1. (If we try putting r = n directly into (ii) we get n!/0!. The
conventional value 0! = 1 as in (1.38c) gives us n!.)
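The cancellation step from (i) to (ii) can be confirmed exhaustively for small n. This sketch (our own check) compares the falling product in (i) with n!/(n − r)! and with the standard-library function math.perm, which computes the same quantity (available from Python 3.8):

```python
import math

def falling_product(n, r):
    """n(n - 1)...(n - r + 1), the coefficient in (i); the empty product (r = 0) is 1."""
    result = 1
    for k in range(r):
        result *= n - k
    return result

# Compare (i) with (ii), n!/(n - r)!, for all 0 <= r <= n <= 10.
for n in range(11):
    for r in range(n + 1):
        assert falling_product(n, r) == math.factorial(n) // math.factorial(n - r)
        assert falling_product(n, r) == math.perm(n, r)
print("(i) and (ii) agree for 0 <= r <= n <= 10; with r = n both give n!")
```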

Self-test 2.6
Find the higher derivative d^r(x^{2r})/dx^r.

2.10 An interpretation of the second derivative


The second derivative has a simple interpretation. Suppose that y = f(x). From
Section 2.9,
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right).$$
dy/dx represents the slope of the graph, so d^2y/dx^2 gives the rate of change of the
slope with respect to x as we move from left to right on the graph. Where d^2y/dx^2
is positive, the slope is increasing; where it is negative, the slope is decreasing.
If d^2y/dx^2 is consistently positive, then the slope dy/dx steadily increases;
it might even increase from negative values (downward slope) through a zero
value (tangent horizontal) to positive values (upward slope). If d^2y/dx^2 is
consistently negative, then the slope steadily decreases. Figure 2.8 shows two curves
upon which the second derivative is positive or negative respectively all the
way along.

[Fig. 2.8 Change of sign of dy/dx across a point where dy/dx = 0 in two cases: (a) d^2y/dx^2 > 0, dy/dx passing from negative to positive; (b) d^2y/dx^2 < 0, dy/dx passing from positive to negative.]

Example 2.11 Sketch one period 2π of the graph of sin x and indicate the signs
of dy/dx and d^2y/dx^2.
We have
$$y = \sin x, \quad \frac{dy}{dx} = \cos x, \quad \frac{d^2y}{dx^2} = -\sin x.$$
The signs of the derivatives are shown in Fig. 2.9a; dy/dx is zero at the points marked Z.
In Fig. 2.9b, dy/dx is sketched to show explicitly how it varies.

[Fig. 2.9 (a) y = sin x for 0 ≤ x ≤ 2π, with the signs of dy/dx and d^2y/dx^2 marked and the points Z (x = ½π and x = 3/2π) where dy/dx = 0; (b) the graph of dy/dx over the same interval.]

Problems

2.1 (Computational). A point P is given on each of the following curves. Choose a
sequence of points Q which lie closer and closer to P on the curve, and make a table
giving the slopes of the chords PQ. From this table, estimate the slope of the curve at P.
(Consider points on both sides of P.)
(a) y = x^3 at P : (1, 1);
(b) y = x^{1/2} at P : (1, 1);
(c) y = cos x at P : (¼π, 2^{−1/2});
(d) y = e^x at P : (0, 1);
(e) y = e^{2x} at P : (0, 1);
(f) y = x^3 + x^{1/2} at P : (1, 2) (compare (a) and (b));
(g) y = ln x at P : (1, 0).

2.2 (Sections 2.1, 2.2). Obtain dy/dx in each of the following cases at the given
point P. Do this from first principles; that is, find δy in terms of δx, simplify δy/δx,
and let δx → 0 to obtain lim_{δx→0} δy/δx, or dy/dx.
(a) y = 3x at P : (2, 6); (b) y = 3 − 2x at P : (1, 1);
(c) y = 3x^2 at P : (1, 3); (d) y = x^3 at P : (1, 1);
(e) y = 1/x at P : (2, ½); (f) y = 3x + 2x^2 at P : (1, 5);
(g) y = (1 + 2x)^2 at P : (−1, 1).

2.3 (Sections 2.1, 2.2). Obtain dy/dx from first principles (see Problem 2.2) at a
general point P : (x, y) on the given curves.
(a) y = 3x^2; (b) y = x^3; (c) y = 1/x; (d) y = x^2 + ½; (e) y = x + 1/x; (f) y = 2x^2 − 3.

2.4 (Section 2.3). Let x be the displacement of a point moving on a straight line, and
let t represent the time elapsed. Form a table by taking the given value of t and
calculating the average velocity between t and t + δt for diminishing values of δt.
Use the table to estimate the velocity at time t.
(a) x = 3t at t = 1; (b) x = 5t^2 at t = 3; (c) x = 2t − 5t^2 at t = 1; (d) x = 2t − 5t^2 at t = 0.2.

2.5 Use the formula (2.9) to find dy/dx at the given points in the following cases.
(a) y = x at any point; (b) y = x^3 at x = 3; (c) y = x^4 at x = 2 and at x = −2.

2.6 From (2.9), write down the derivatives, dy/dx or (d/dx)f(x), for the given functions
f(x). Use this information to sketch rough graphs of f(x) (notice the sign and the
magnitude of the slope of y = f(x)).
(a) y = x; (b) y = x^2; (c) y = x^3; (d) y = x^4; (e) y = x^5.

2.7 Sketch a velocity–time graph and an acceleration–time graph for a point moving
on a straight line with displacement x = t^3. Use these to sketch a graph of acceleration
against distance. (See Example 2.5.)

2.8 In the following, different letters for the variables are used in place of the usual
x and y. Write down the derivatives in the appropriate form. (For example, if w = r^3,
then dw/dr = 3r^2.)
(a) V = (4/3)πr^3; (b) S = πd^2; (c) E = kT^4 (k is a constant);
(d) I = V/R (R is a constant); (e) H = RI^2 (R is a constant);
(f) V = RT/P (R and P are constant).

2.9 Differentiate the following functions by using (2.10):
(a) 3x^2 − 2x + 1; (b) x^7 − 3x^6 + x + 1; (c) x + C (where C is a constant);
(d) x(x − 1); (e) x^2(x^2 + 1) − 1; (f) ax^2 + bx + c (where a, b, c are constants);
(g) (x − 1)^2.

2.10 Prove that the following pairs of curves intersect in a right angle at the points
given. (Hint: find dy/dx at the point for each curve.)
(a) y = 1 + x − x^2 and y = 1 − x + x^2 at (1, 1);
(b) y = ½(1 − x^2) and y = x − 1 at (1, 0);
(c) y = 1 − ⅓x^3 and y = 1/6 + ½x^2 at (1, 2/3).

2.11 Find the angle between the following curves at their points of intersection.
(Hint: the angle of intersection is the angle between the tangents to the curves at the
point; then consider (1.7) and the tangent formula of (1.17a) for the difference of
angles.)
(a) y = x^2 and y = 1 − x^2; (b) y = ⅓x^3 and y = x^2 − 2x + 4/3.

2.12 (See Section 2.6.) Find the limits of the following functions when ε → 0.
(Remember: 0/0 has no definite meaning.)
(a) ε/ε; (b) ε/(2ε); (c) ε^2/ε; (d) (e^{2ε} − 1)/(2ε); (e) (e^{2ε} − 1)/ε; (f) (sin 2ε)/(2ε);
(g) (sin 2ε)/ε; (h) ln(1 + ε^2)/ε^2; (i) (sin ε)/ε when ε is an angle measured in degrees;
(j) (tan ε)/ε; (k) (sinh ε)/ε; (l) (e^{−ε} − 1)/ε.

2.13 (See Section 2.7.) Obtain d(cos x)/dx in the same way that (2.16), for sin x, was
obtained.

2.14 (See Section 2.7.) (a) Differentiate e^{2x} by following the method leading to (2.15).
(b) Differentiate sin 2x by following the method leading to (2.16).
(c) Prove that (d/dx)e^{−x} = −e^{−x} by following part-way the method leading to (2.15).
(Hint: lim_{ε→0}[(e^{−ε} − 1)/(−ε)] = 1.) Use this result to differentiate sinh x and cosh x
(see (1.26) for the definitions).

2.15 Differentiate the following functions.
(a) 2 sin x − 3 cos x; (b) ln 3x (see Section 1.11 for the properties of the logarithm);
(c) ln x^3 (see Section 1.11); (d) sin x − x; (e) e^x − 1 − x − ½x^2.

2.16 Find the equations of the tangent lines in the following cases.
(a) y = x^3 at (1, 1); (b) y = x^4 − 2x^2 + 1 at (2, 9); (c) y = cos x at (½π, 0);
(d) y = ln x at (e, 1); (e) y = (1/√2) sin x + (1/√2) cos x at (¼π, 1);
(f) y = 3e^x − 4x at (0, 3).

2.17 Obtain dy/dx, d^2y/dx^2, d^3y/dx^3 in the following cases.
(a) y = x^6; (b) y = 3x^2 − 2x + 2; (c) y = x^6 − x^2; (d) y = 2 sin x − 3 cos x;
(e) y = e^x − 1 − x − ½x^2.

2.18 Show that, if N is a positive whole number, then (d^N/dx^N)x^N = N!.

2.19 For the curve y = x^2(x^2 − 3), find the ranges in x for which (a) dy/dx is positive
(so that y is increasing); (b) dy/dx is negative (so that y is decreasing); (c) d^2y/dx^2
is positive (so that the slope is increasing); (d) d^2y/dx^2 is negative (so that the slope
is decreasing). Deduce the general shape of the curve from these facts. (Hint: if dy/dx
changes sign at some point, then dy/dx must be zero at the point. But dy/dx does not
necessarily change sign where dy/dx = 0.)

2.20 Find the equation of the normal to the parabola y = ax^2 at any point x = x_0.
3 Further techniques for differentiation

CONTENTS

3.1 The product rule
3.2 Quotients and reciprocals
3.3 The chain rule
3.4 Derivative of x^n for any value of n
3.5 Functions of ax + b
3.6 An extension of the chain rule
3.7 Logarithmic differentiation
3.8 Implicit differentiation
3.9 Derivatives of inverse functions
3.10 Derivative as a function of a parameter
Problems

In the course of Chapter 2, the derivatives were found of the elementary functions
x^n (where n is a positive integer), e^x, sin x, and so on (see (2.19)). However, we
shall need a lot more, even for quite ordinary applications. For example, we
should surely like to be able to find dy/dx if y = x^a when a is not a positive integer
(for example y = x^{1/2}, or x^{−1} (= 1/x)); if y = e^{ax}, where a is any constant; if y = sin ax;
and so on. Much more complicated cases frequently arise, like y = sin^{1/2}(ln x).
Fortunately we do not have to work out each one separately by means of a
lengthy argument. The linear combination rule (eqn (2.10c)), the product rule
(eqn (3.1)), and the quotient/reciprocal rule (eqn (3.2)), together with the results
in the Table (2.19), enable us to differentiate x^n when n is a negative integer, and
also rational fractions involving terms listed in (2.19), such as tan x (which equals
sin x/cos x). Combined with the chain rule, they also handle mixed expressions such
as x^{−3} cos[x/(1 + e^x)].
The derivatives of x^a, sin(Ax + B) and so on, where a may take any value, and
also the inverse functions such as arcsin x, are obtained by applying the chain
rule or ‘function-of-a-function’ rule (Section 3.3) to results that we already have
(for example those in the Tables (3.4) and (3.5)). The chain rule has many general
applications – it is very important to recognize when it can be used.
By using combinations of these rules as appropriate, any finite expression built
out of the basic functions, however complicated, can be differentiated.

3.1 The product rule

The derivative of a product of several functions can be obtained when the
derivatives of its individual factors are known. Examples of such products are
$$x^2 e^x, \quad e^x\sin x, \quad x e^x\cos x.$$
Suppose firstly that y takes the form of a product of two functions u(x) and
v(x):
y(x) = u(x)v(x),
where y(x) is written to display the dependence of y on x. We require dy/dx in
terms of u and v. Fix a value for x, and change it by an amount δx so that
x becomes x + δx.
Then u, v, and y all change:
u becomes u + δu, v becomes v + δv, and y becomes y + δy,
where u, v, y represent the values of the functions u(x), v(x) and y(x) with argu-
ment x. Since
δy = (u + δu)(v + δv) − uv,
we obtain
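$$\frac{\delta y}{\delta x} = \frac{(u + \delta u)(v + \delta v) - uv}{\delta x} = \frac{uv + u\,\delta v + v\,\delta u + \delta u\,\delta v - uv}{\delta x}$$
$$= u\frac{\delta v}{\delta x} + v\frac{\delta u}{\delta x} + \delta u\frac{\delta v}{\delta x}.$$
Now let δx → 0, so that δy/δx, δu/δx, δv/δx become dy/dx, du/dx, dv/dx
respectively. Also, since δu → 0 when δx → 0, the final term becomes zero, and
we obtain the product rule: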

Product rule
If y(x) = u(x)v(x), then
$$\frac{dy}{dx} = \frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx}. \tag{3.1}$$

Example 3.1 Find dy/dx when y = x^2 e^x.
Put u = x^2, v = e^x, y = x^2 e^x = uv. Then
$$\frac{du}{dx} = 2x, \quad \frac{dv}{dx} = e^x.$$
Therefore, by (3.1),
$$\frac{dy}{dx} = u\frac{dv}{dx} + v\frac{du}{dx} = x^2(e^x) + e^x(2x) = (x^2 + 2x)e^x.$$
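A quick numerical cross-check of Example 3.1 (our own sketch; central_diff is a helper we introduce for the purpose) compares a symmetric difference quotient of x^2 e^x with the product-rule result (x^2 + 2x)e^x:

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def y(x):
    return x**2 * math.exp(x)

def dy_product_rule(x):
    # u dv/dx + v du/dx with u = x^2, v = e^x
    return (x**2 + 2 * x) * math.exp(x)

for x in (-2.0, 0.5, 1.0):
    print(f"x = {x}:  numeric {central_diff(y, x):.6f},  product rule {dy_product_rule(x):.6f}")
```

At x = −2 both values are essentially zero, since x^2 + 2x vanishes there.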
Example 3.2 Find dx/dt when x = e^t cos t.
We have to interpret (3.1) in terms of the new symbols. Put
$$u = e^t, \quad v = \cos t, \quad x = e^t\cos t = uv.$$
Then (refer if necessary to the table (2.19) with the appropriate changes of letters)
$$\frac{du}{dt} = e^t, \quad \frac{dv}{dt} = -\sin t.$$
Changing the symbols in (3.1) we have
$$\frac{dx}{dt} = u\frac{dv}{dt} + v\frac{du}{dt} = e^t(-\sin t) + (\cos t)e^t = e^t(\cos t - \sin t).$$

Example 3.3 Find dy/dx when y = x e^x sin x.
This product has three factors, but we can carry out the differentiation in two stages. Write
$$y = (x e^x)\sin x$$
and put u = x e^x and v = sin x. By (3.1),
$$\frac{dy}{dx} = x e^x\frac{d}{dx}\sin x + \sin x\,\frac{d}{dx}(x e^x) = x e^x\cos x + \sin x\,\frac{d}{dx}(x e^x). \tag{i}$$
To evaluate (d/dx)(x e^x), use the product rule again, putting u = x and v = e^x. Then
$$\frac{d}{dx}(x e^x) = x\frac{d}{dx}e^x + e^x\frac{d}{dx}x = x e^x + e^x. \tag{ii}$$
Substitute (ii) into (i):
$$\frac{dy}{dx} = x e^x\cos x + (\sin x)(x e^x + e^x) = e^x(x\cos x + x\sin x + \sin x).$$

More generally, if y = uvw, the product rule, applied twice as in Example 3.3,
becomes
$$\frac{dy}{dx} = \frac{du}{dx}vw + \frac{dv}{dx}wu + \frac{dw}{dx}uv,$$
which generalizes in an obvious way to products of four or more functions.
A general method of dealing with the product of several terms, which is usually
more convenient, is given in Section 3.7. You are strongly recommended to write
out all the steps completely at first, otherwise mistakes are likely to occur.

Self-test 3.1
Find dy/dx where y = e^x sin x.

3.2 Quotients and reciprocals

Suppose that
$$y(x) = \frac{u(x)}{v(x)}.$$
Proceed as for the product rule: let x change to x + δx, so that u becomes u + δu,
v becomes v + δv, and y becomes y + δy. Then

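$$\frac{\delta y}{\delta x} = \left(\frac{u + \delta u}{v + \delta v} - \frac{u}{v}\right)\frac{1}{\delta x} = \frac{uv + v\,\delta u - uv - u\,\delta v}{v(v + \delta v)\,\delta x}$$
$$= \frac{v\,\delta u - u\,\delta v}{v(v + \delta v)\,\delta x} = \frac{1}{v(v + \delta v)}\left(v\frac{\delta u}{\delta x} - u\frac{\delta v}{\delta x}\right).$$
Let δx → 0; then δy/δx, δu/δx, δv/δx become dy/dx, du/dx, dv/dx, and δv → 0.
Therefore
$$\frac{dy}{dx} = \frac{d}{dx}\left(\frac{u}{v}\right) = \frac{1}{v^2}\left(v\frac{du}{dx} - u\frac{dv}{dx}\right).$$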

It is worth noting the special case of the reciprocal of a function. In that case,
u(x) = 1, so du/dx = 0. Finally we have

Quotient and reciprocal rules
(a) If y(x) = u(x)/v(x), then
$$\frac{dy}{dx} = \frac{d}{dx}\left(\frac{u}{v}\right) = \frac{1}{v^2}\left(v\frac{du}{dx} - u\frac{dv}{dx}\right).$$
(b) If y(x) = 1/v(x) (i.e. if u(x) = 1), then
$$\frac{dy}{dx} = \frac{d}{dx}\left(\frac{1}{v}\right) = -\frac{1}{v^2}\frac{dv}{dx}. \tag{3.2}$$

Example 3.4 Obtain dy/dx when y = tan x.
Express y in the form
$$y = \tan x = \frac{\sin x}{\cos x}.$$
Put u = sin x, v = cos x, y = u/v. Then
$$\frac{du}{dx} = \cos x, \quad \frac{dv}{dx} = -\sin x.$$
From (3.2),
$$\frac{dy}{dx} = \frac{1}{v^2}\left(v\frac{du}{dx} - u\frac{dv}{dx}\right) = \frac{1}{\cos^2 x}[\cos x\cos x - \sin x(-\sin x)]$$
$$= \frac{1}{\cos^2 x}(\cos^2 x + \sin^2 x) = \frac{1}{\cos^2 x} = \sec^2 x.$$
(Remember that cos^2 A + sin^2 A = 1.)
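The result of Example 3.4 can be checked at a few points inside (−½π, ½π), where tan x is defined. A minimal sketch (our own; central_diff is our helper):

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sec_squared(x):
    return 1 / math.cos(x) ** 2

for x in (0.0, 0.5, 1.0, 1.5):      # all inside (-pi/2, pi/2)
    print(f"x = {x}:  slope of tan ~ {central_diff(math.tan, x):.6f},  sec^2 x = {sec_squared(x):.6f}")
```

The slope grows rapidly as x approaches ½π, in line with cos x → 0 in the denominator.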

Example 3.5 Find dy/dx when y = (x + 3)/(2x^3 + 1).
Put u = x + 3, v = 2x^3 + 1, y = u/v. Then
$$\frac{du}{dx} = 1, \quad \frac{dv}{dx} = 6x^2.$$
By (3.2),
$$\frac{dy}{dx} = \frac{1}{(2x^3 + 1)^2}\left[(2x^3 + 1)(1) - (x + 3)(6x^2)\right] = \frac{1 - 18x^2 - 4x^3}{(2x^3 + 1)^2}.$$

Example 3.6 Obtain dy/dx when (a) y = 1/x; (b) y = 1/x^2.
(a) Put v = x into the reciprocal rule (3.2b) (or u = 1 and v = x into (3.2a)):
$$\frac{dy}{dx} = -\frac{1}{v^2}\frac{dv}{dx} = -\frac{1}{x^2}.$$
(b) Put v = x^2 into (3.2b):
$$\frac{dy}{dx} = -\frac{1}{x^4}\,2x = -\frac{2}{x^3}.$$

If we had put n = −1 and −2 respectively into the formula (2.9),
$$\frac{d}{dx}(x^n) = nx^{n-1},$$
which was proved only for positive integer n, we would have obtained the correct
results of Example 3.6. The formula (2.9) is in fact correct for all values of n, as will
be shown in Section 3.4.

Self-test 3.2
Find dy/dx where y = (ln x)/(1 + x^2).

3.3 The chain rule


The chain rule will be used continually in future chapters. It is also called the
function-of-a-function rule. Suppose y can be expressed as a function of a
variable u, where u is a function of x. We shall express this by the notation
$$y = y(u), \quad \text{where } u = u(x).$$
An example of this is
$$y = \cos(x^3),$$
which we can rewrite in the form
$$y = y(u) = \cos u, \quad \text{where } u = u(x) = x^3.$$
Another example is when y = cos^3 x. Write it in the form y = (cos x)^3, so that
$$y = u^3, \quad \text{where } u = \cos x.$$
The rule for such cases is the following:

The chain rule
If y = y(u) where u = u(x), then
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}. \tag{3.3}$$

The form of this result is easy to remember if you first write
$$\frac{dy}{dx} = \frac{dy}{\bullet}\,\frac{\bullet}{dx},$$
then put du in place of the dots. Sometimes it is inconvenient to use u; any letter
not already in use can be used in place of u.
To prove (3.3), fix on any value of x. Consider a nearby value x + δx, and denote
the corresponding small changes in u and y by δu and δy. When x becomes x + δx,
then u becomes u + δu and y becomes y + δy. Provided that δu ≠ 0 (which is true for
all sufficiently small δx whenever du/dx ≠ 0),
$$\frac{\delta y}{\delta x} = \frac{\delta y}{\delta u}\frac{\delta u}{\delta x},$$
since the terms δu cancel. Now let δx → 0. Then δu → 0, and consequently δy/δx,
δy/δu, δu/δx approach dy/dx, dy/du, du/dx respectively. Thus we obtain
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}.$$
The following examples show how to recognize when it is appropriate to use
the chain rule. You should lay out every application in the systematic way shown
until you are used to it.

Example 3.7 (a) You are given that (d/dx)e^x = e^x (see the table (2.19)). Deduce
that (d/dx)e^{ax} = a e^{ax}, where a is any constant. (b) Find the derivative of e^{−x}.
(c) Use this result to obtain the derivatives of sinh x and cosh x (see (1.26) for
the definitions of these functions).
(a) Rewrite y = e^{ax} in the form
$$y = e^u, \quad \text{where } u = ax.$$
To use the chain rule (3.3), we need dy/du and du/dx:
$$\frac{dy}{du} = e^u \quad \text{and} \quad \frac{du}{dx} = a.$$
The chain rule gives
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = e^u a = a e^{ax},$$
after restoring the variable x.
(b) For e^{−x}, the constant a is −1, so
$$\frac{d}{dx}e^{-x} = -e^{-x}.$$
(c) sinh x = ½(e^x − e^{−x}) and cosh x = ½(e^x + e^{−x}), so the result (b) gives
$$\frac{d}{dx}(\sinh x) = \frac{d}{dx}\left[\tfrac{1}{2}(e^x - e^{-x})\right] = \tfrac{1}{2}(e^x + e^{-x}) = \cosh x,$$
$$\frac{d}{dx}(\cosh x) = \frac{d}{dx}\left[\tfrac{1}{2}(e^x + e^{-x})\right] = \tfrac{1}{2}(e^x - e^{-x}) = \sinh x.$$

Example 3.8 Find dy/dx when y = (x^2 + 1)^{10}.
We could expand (x^2 + 1)^{10} as a polynomial by means of the binomial theorem, but the
chain rule is far simpler. Put
$$y = u^{10}, \quad \text{where } u = x^2 + 1.$$
Then
$$\frac{dy}{du} = 10u^9 \quad \text{and} \quad \frac{du}{dx} = 2x.$$
By the chain rule,
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = 10u^9\,2x = 20x(x^2 + 1)^9.$$

Example 3.9 Find dy/dx when (a) y = sin(x^3); (b) y = sin^3 x.
(a) Put y = sin u, where u = x^3. Then
$$\frac{dy}{du} = \cos u \quad \text{and} \quad \frac{du}{dx} = 3x^2.$$
By the chain rule,
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = (\cos u)3x^2 = 3x^2\cos(x^3).$$
(b) Put y = u^3, where u = sin x. Then
$$\frac{dy}{du} = 3u^2 \quad \text{and} \quad \frac{du}{dx} = \cos x.$$
By the chain rule,
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = 3u^2\cos x = 3\sin^2 x\cos x.$$
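The two parts of Example 3.9 look alike but have quite different derivatives, depending on which function is "outside". This sketch (our own check; the function names are ours) compares each chain-rule result with a numerical slope:

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def y_a(x):
    return math.sin(x**3)                      # (a): outer sin u, inner u = x^3

def dy_a(x):
    return 3 * x**2 * math.cos(x**3)           # chain rule result for (a)

def y_b(x):
    return math.sin(x) ** 3                    # (b): outer u^3, inner u = sin x

def dy_b(x):
    return 3 * math.sin(x) ** 2 * math.cos(x)  # chain rule result for (b)

for x in (0.3, 0.9, 1.4):
    print(f"x = {x}:  (a) {central_diff(y_a, x):.6f} vs {dy_a(x):.6f};"
          f"  (b) {central_diff(y_b, x):.6f} vs {dy_b(x):.6f}")
```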
Example 3.10 Find dy/dx when y = 1/(x^2 + 1).
Put y = 1/u, where u = x^2 + 1. Then
$$\frac{dy}{du} = -\frac{1}{u^2} \quad \text{and} \quad \frac{du}{dx} = 2x$$
(where the reciprocal rule (3.2b) was used for differentiating 1/u). By the chain rule,
$$\frac{dy}{dx} = -\frac{1}{u^2}\,2x = -\frac{2x}{(x^2 + 1)^2}.$$

Example 3.11 Find du/dt when u = a cos k(x − ct) where a, k, c, and x are
constant, t and u being the only variables.
We should not use u for the intermediate variable in the chain rule (3.3), because it is
already in use (as the name of the dependent variable). Instead of u, use an uncommitted
letter such as w as the intermediate variable, putting
u = a cos w, where w = kx − kct.
The chain rule takes the form
$$\frac{du}{dt} = \frac{du}{dw}\frac{dw}{dt},$$
in which
$$\frac{du}{dw} = -a\sin w \quad \text{and} \quad \frac{dw}{dt} = -kc.$$
Therefore
$$\frac{du}{dt} = (-a\sin w)(-kc) = akc\sin k(x - ct).$$

Self-test 3.3
Find dy/dx where y = (1 + 12e12x)12.

3.4 Derivative of x^n for any value of n


Consider the derivative of y = x^n, where n may have any value, an integer or not,
positive or negative. The rule turns out to take the same form as (2.9), in which n
was limited to positive integers:

Derivative of x^n
If y = x^n, where n may take any value whatever, then
$$\frac{dy}{dx} = nx^{n-1}. \tag{3.4}$$
To prove (3.4), we use the chain rule (3.3). Note that, for all x > 0, x = e^{ln x} (see (1.21)),
so that
$$y = x^n = (e^{\ln x})^n = e^{n\ln x}.$$
To use the chain rule, we put this in the form
$$y = e^u, \quad \text{where } u = n\ln x,$$
so that
$$\frac{dy}{du} = e^u \quad \text{and} \quad \frac{du}{dx} = \frac{n}{x}.$$
Then
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = e^u\frac{n}{x} = x^n\frac{n}{x} = nx^{n-1}$$
(where we used e^u = y = x^n again).
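Formula (3.4) can be tested for non-integer and negative exponents. The sketch below (our own illustration; power_rule_error is a hypothetical helper) works at x > 0, as the proof assumes, and also confirms that x^n and e^{n ln x} agree numerically:

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def power_rule_error(n, x):
    """|numerical slope of t**n at x  -  n x^(n-1)|, for x > 0 as in the proof."""
    numeric = central_diff(lambda t: t**n, x)
    return abs(numeric - n * x ** (n - 1))

x = 2.0
for n in (0.5, -1.5, math.pi, -2.0):
    # The proof rewrites x^n as e^(n ln x); both forms should agree.
    print(f"n = {n:.4f}:  x^n = {x**n:.6f},  e^(n ln x) = {math.exp(n * math.log(x)):.6f},"
          f"  error in (3.4) ~ {power_rule_error(n, x):.1e}")
```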

Example 3.12 Find dy/dx when (a) y = x^{3/2}; (b) y = 1/x^{3/2}; (c) y = 1/√x;
(d) y = 1/(2x^{1/3} + x).
(a) Here n = 3/2 in (3.4), so
$$\frac{d}{dx}\left(x^{3/2}\right) = \tfrac{3}{2}x^{1/2}.$$
(b) This may be written y = x^{−3/2}, so n = −3/2 in (3.4), and
$$\frac{d}{dx}\left(x^{-3/2}\right) = -\tfrac{3}{2}x^{-5/2}.$$
(c) y = x^{−1/2}, so
$$\frac{dy}{dx} = -\tfrac{1}{2}x^{-3/2}.$$
(d) Write y = (2x^{1/3} + x)^{−1}. We can use the chain rule: put y = u^{−1}, where
u = 2x^{1/3} + x. Then
$$\frac{dy}{du} = -u^{-2} \ \text{(by (3.4))}, \qquad \frac{du}{dx} = \tfrac{2}{3}x^{-2/3} + 1 \ \text{(by (3.4))}.$$
Therefore, by the chain rule (3.3),
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = (-u^{-2})\left(\tfrac{2}{3}x^{-2/3} + 1\right) = -\frac{\frac{2}{3}x^{-2/3} + 1}{(2x^{1/3} + x)^2}.$$

3.5 Functions of ax + b
A frequently occurring application of the chain rule (3.3) is in connection with
functions like e^{ax+b}, sin(ax + b), (ax + b)^n, and in general f(ax + b). The spirit of the
chain rule is to say: ‘If the functions were e^x, sin x, x^n, f(x), then they would be
easy. Therefore, try the chain rule with u = ax + b.’
Suppose that, in general, we want to differentiate y when
$$y = f(ax + b),$$
and that we know how to differentiate f(x). Write
$$u = ax + b, \quad y = f(u).$$
Then the chain rule gives
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx} = a\frac{df(u)}{du},$$
in which the derivative occurring on the right is already known.
The following special cases, in which b = 0, should be memorized; they
constitute an extension of the table (2.19).

Function          Derivative
e^{ax}            a e^{ax}
sin ax            a cos ax
cos ax            −a sin ax
a^x (a > 0)       a^x ln a

(For a^x see Problem 3.18.)                                   (3.5)

Self-test 3.4
Find dy/dx if y = a^{kx}, where a and k are constants and a > 0.

3.6 An extension of the chain rule


In Section 3.3, the chain rule (3.3) was looked upon as a way of differentiating a
function of a function, say y(u(x)). Sometimes we need to consider ‘a function of
a function of a function of …’. These can always be worked through by repeated
applications of (3.3), but it may be less complicated to proceed as in the following
example.

Example 3.13 Obtain dy/dx when y = e^{sin(x^2+1)}.
Instead of using one intermediate variable u, introduce two variables, u and v. Put
$$y = e^v, \quad v = \sin u, \quad u = x^2 + 1.$$
Then by exactly the same type of argument as led to (3.3),
$$\frac{dy}{dx} = \frac{dy}{dv}\frac{dv}{du}\frac{du}{dx}.$$
We have
$$\frac{du}{dx} = 2x, \quad \frac{dv}{du} = \cos u, \quad \frac{dy}{dv} = e^v,$$
so
$$\frac{dy}{dx} = (e^v\cos u)\,2x = 2x\,e^{\sin(x^2+1)}\cos(x^2 + 1).$$
The result can be extended in an obvious way to any number of intermediate
variables, but it is seldom that more than two would be needed:

Extended chain rule
Suppose that
$$y = y(v), \quad v = v(u), \quad \text{and} \quad u = u(x).$$
Then
$$\frac{dy}{dx} = \frac{dy}{dv}\frac{dv}{du}\frac{du}{dx}. \tag{3.6}$$

Self-test 3.5
Find dy/dx where y = sin(e^{x^2}).

3.7 Logarithmic differentiation


To differentiate a product
$$y = u(x)v(x)w(x)$$
consisting of three factors, the product rule (3.1) can be applied twice, as in
Example 3.3. An alternative procedure, which is often simpler, is the following.
Since y = uvw,
$$\ln y = \ln(uvw) = \ln u + \ln v + \ln w.$$
By the chain rule (3.3), if u(x) is any function of x,
$$\frac{d}{dx}\ln u = \frac{1}{u}\frac{du}{dx},$$
and we may have y, v, or w in place of u. Therefore
$$\frac{1}{y}\frac{dy}{dx} = \frac{1}{u}\frac{du}{dx} + \frac{1}{v}\frac{dv}{dx} + \frac{1}{w}\frac{dw}{dx}.$$
By multiplying through by y = uvw, we obtain

Logarithmic differentiation
If y = uvw, then
$$\frac{dy}{dx} = uvw\left(\frac{1}{u}\frac{du}{dx} + \frac{1}{v}\frac{dv}{dx} + \frac{1}{w}\frac{dw}{dx}\right) \tag{3.7}$$
(and so on for any number of factors in the product defining y).
Example 3.14 Find dy/dx when y = (x^{1/2} sin^2 x)/(x^2 + 1).
Put y = uvw, where
$$u = x^{1/2}, \quad v = \sin^2 x, \quad w = (x^2 + 1)^{-1}.$$
Then
$$\ln y = \ln\left(x^{1/2}\right) + \ln(\sin^2 x) + \ln(x^2 + 1)^{-1} = \tfrac{1}{2}\ln x + 2\ln(\sin x) - \ln(x^2 + 1).$$
Notice that we did not just copy the formula (3.7) rigidly: the logarithm is useful for
getting rid of awkward powers and we might have missed this. Differentiate this
expression:
$$\frac{1}{y}\frac{dy}{dx} = \frac{1}{2x} + \frac{2}{\sin x}\frac{d}{dx}(\sin x) - \frac{1}{x^2 + 1}\frac{d}{dx}(x^2 + 1) = \frac{1}{2x} + \frac{2\cos x}{\sin x} - \frac{2x}{x^2 + 1}.$$
Multiply through by y = (x^{1/2} sin^2 x)/(x^2 + 1) to give dy/dx:
$$\frac{dy}{dx} = \frac{x^{1/2}\sin^2 x}{x^2 + 1}\left(\frac{1}{2x} + \frac{2\cos x}{\sin x} - \frac{2x}{x^2 + 1}\right).$$
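Logarithmic differentiation involves several rearrangements, so a numerical check of the final formula is reassuring. This sketch (our own; it is valid only where x > 0 and sin x ≠ 0, since those conditions are needed for the logarithms and the division) compares the result of Example 3.14 with a difference quotient:

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient [f(x + h) - f(x - h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def y(x):
    return math.sqrt(x) * math.sin(x) ** 2 / (x**2 + 1)

def dy_logarithmic(x):
    """Result of Example 3.14; valid for x > 0 with sin x != 0."""
    return y(x) * (1 / (2 * x) + 2 * math.cos(x) / math.sin(x) - 2 * x / (x**2 + 1))

for x in (0.5, 1.0, 2.0, 3.0):
    print(f"x = {x}:  numeric {central_diff(y, x):.8f},  formula {dy_logarithmic(x):.8f}")
```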

Self-test 3.6
Using logarithmic differentiation, find dy/dx where y = x^{1/2} cos^2 x ln x.

3.8 Implicit differentiation


An equation of the form
$$f(x, y) = c \quad (\text{a constant})$$
represents a curve or curves; for example, x^2 + y^2 = 1 represents a circle, otherwise
expressed by y = y(x) = ±(1 − x^2)^{1/2}. This latter relation is implicit in the first form,
which is called an implicit equation.
Suppose that the implicit equation for y is f(x, y) = c and its explicit equation is
y = y(x); that is to say, both equations specify the same curve. Then f(x, y(x)) = c
for all values of x: it is an identity. Since the value of f(x, y(x)) remains constant
for all x, its derivative is zero:
$$\frac{d}{dx}f(x, y(x)) = 0,$$
for every relevant value of x.
Notice further that if y is a function of x, then the chain rule (3.3), using y as
the intermediate variable instead of u, gives results such as
$$\frac{d}{dx}y^2 = 2y\frac{dy}{dx} \quad \text{and} \quad \frac{d}{dx}\cos y = -\sin y\frac{dy}{dx}.$$
This fact can be used in the following way to obtain an expression for dy/dx, even
in cases when we cannot solve the implicit equation to obtain y as a function of
x explicitly.

Example 3.15 Find a general expression for dy/dx at any point on the curve
given by f(x, y) = x + y + sin x + cos y = 1.
So long as we stay on the curve, f(x, y) does not change when x changes, so
df(x, y)/dx = 0. Therefore
$$1 + \frac{dy}{dx} + \cos x - \sin y\frac{dy}{dx} = 0,$$
so finally
$$\frac{dy}{dx} = \frac{\cos x + 1}{\sin y - 1}.$$
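The implicit formula can be checked even though we never solve for y explicitly. In the sketch below (our own illustration), Newton's method, a technique not used in the text and chosen here only for convenience, finds y on the curve near the point (0, 0), which satisfies 0 + 0 + sin 0 + cos 0 = 1. The slope of the computed points is then compared with the formula, which gives (1 + 1)/(0 − 1) = −2 at (0, 0):

```python
import math

def F(x, y):
    """The curve of Example 3.15 is F(x, y) = 0."""
    return x + y + math.sin(x) + math.cos(y) - 1

def solve_y(x, y0=0.0, tol=1e-14):
    """Newton's method in y for F(x, y) = 0, starting from y0 (an assumption: y0 near the branch)."""
    y = y0
    for _ in range(50):
        step = F(x, y) / (1 - math.sin(y))   # dF/dy = 1 - sin y
        y -= step
        if abs(step) < tol:
            break
    return y

def dydx_formula(x, y):
    return (math.cos(x) + 1) / (math.sin(y) - 1)

h = 1e-5
slope_numeric = (solve_y(h) - solve_y(-h)) / (2 * h)
print(slope_numeric, dydx_formula(0.0, 0.0))   # both close to -2
```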

Such a result is not quite so simple as its neatness suggests, because we would
still find it hard to say what values of y are to be associated with a particular value
of x in the new formula: this would in effect involve solving the original equation
for y in terms of x.

Self-test 3.7
Find dy/dx as a function of x and y for 2y = x − y + 2y^3 − 3 ln x. What is the
value of dy/dx at (x, y) = (1, 1)?

3.9 Derivatives of inverse functions


The derivatives of functions such as ln x, arctan x, arccos x, and arcsin x, which are the respective inverses of $e^x$, tan x, cos x, and sin x, can be obtained by a standard procedure. We need the general result
$$ \frac{dy}{dx}\,\frac{dx}{dy} = 1. \tag{3.8} $$
To illustrate what (3.8) means, take the case y = ln x as an example. As described in Section 1.11, the two statements
$$ y = \ln x \ \text{for}\ x > 0 \quad\text{and}\quad x = e^y \ \text{for all}\ y $$
are different ways of saying the same thing; the two graphs depicting the relation between x and y are the same graph. For small corresponding increments δx and δy on this graph,
$$ \frac{\delta y}{\delta x}\,\frac{\delta x}{\delta y} = 1. $$
Since δx and δy approach zero together, we obtain dy/dx = 1/(dx/dy), which is (3.8).
To find dy/dx when y = ln x, write equivalently
$$ x = e^y. $$
Then
$$ \frac{dx}{dy} = e^y; $$
so, by (3.8),
$$ \frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}. $$
This, of course, agrees with (2.18), where a more direct method was used.

Example 3.16 Find dy/dx when y = arctan x.


If y = arctan x, then x = tan y. From Example 3.4, interchanging x and y,
$$ \frac{dx}{dy} = \frac{1}{\cos^2 y}; $$
so, by (3.8),
$$ \frac{dy}{dx} = \cos^2 y. $$
To express $\cos^2 y$ in terms of tan y (i.e. in terms of x), draw the triangle in Fig. 3.1, in which y is represented by the angle A. This is a right-angled triangle because its sides satisfy Pythagoras's theorem. Evidently $\cos^2 y = 1/(1 + \tan^2 y)$, as can be checked by putting $\tan^2 y = \sin^2 y/\cos^2 y$. Therefore
$$ \frac{dy}{dx} = \frac{1}{1 + \tan^2 y} = \frac{1}{1 + x^2}. $$

[Fig. 3.1: right-angled triangle ABC with the angle y at A, AB = 1, BC = tan y, and hypotenuse AC = √(1 + tan² y).]
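Rule (3.8) can also be checked numerically. Estimate dx/dy for the forward function by a central difference, then take its reciprocal. This is only a sketch. The helper name `inverse_derivative` and the step size are our own choices.

```python
import math

def inverse_derivative(forward, y, h=1e-6):
    """Rule (3.8): dy/dx = 1/(dx/dy), where x = forward(y).

    dx/dy is estimated by a central difference of the forward function."""
    dxdy = (forward(y + h) - forward(y - h)) / (2 * h)
    return 1 / dxdy

# y = arctan x, so x = tan y; expect 1/(1 + x^2)
for x in (-2.0, 0.5, 3.0):
    print(x, inverse_derivative(math.tan, math.atan(x)), 1 / (1 + x ** 2))

# y = ln x, so x = e^y; expect 1/x
for x in (0.5, 2.0, 10.0):
    print(x, inverse_derivative(math.exp, math.log(x)), 1 / x)
```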

Self-test 3.8
Find dy/dx if $y = \tanh^{-1}(2x)$.

3.10 Derivative as a function of a parameter


If x and y are functions of a common parameter, or supplementary variable, t,
so that
x = x(t), y = y(t),
then the point (x(t), y(t)) follows a curve as t varies. Suppose that t changes from t to t + δt; then x changes to x + δx and y to y + δy. Obviously
$$ \frac{\delta y}{\delta x} = \frac{\delta y/\delta t}{\delta x/\delta t}, $$
since δt cancels on the right-hand side. Let δt → 0; then δx → 0, and we have

Differentiation in terms of a parameter
If x = x(t) and y = y(t), then
$$ \frac{dy}{dx} = \frac{dy/dt}{dx/dt}. \tag{3.9} $$

Example 3.17 A curve is given in polar coordinates by r = sin θ (0 ≤ θ ≤ π). Find dy/dx at the point where $\theta = \tfrac18\pi$.
We can use θ as the parameter in the following way. The universal relation between polar and cartesian coordinates is
$$ x = r\cos\theta, \qquad y = r\sin\theta. $$
On the special curve described by r = sin θ, these equations become
$$ x = \sin\theta\cos\theta, \qquad y = \sin^2\theta. $$
Then, from Appendix B(c),
$$ \frac{dx}{d\theta} = -\sin^2\theta + \cos^2\theta = \cos 2\theta $$
and
$$ \frac{dy}{d\theta} = 2\sin\theta\cos\theta = \sin 2\theta. $$
Therefore
$$ \frac{dy}{dx} = \frac{dy/d\theta}{dx/d\theta} = \tan 2\theta. $$
At the point where $\theta = \tfrac18\pi$,
$$ \frac{dy}{dx} = \tan\tfrac14\pi = 1. $$
The curve, which is a circle, together with its tangent, is shown in Fig. 3.2.
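The parametric formula (3.9) can be checked numerically for this example. The following sketch uses our own helper names. It differences the coordinates of r = sin θ with respect to θ and compares the ratio with tan 2θ. It also prints the coordinates of the point P where θ = ⅛π.

```python
import math

def point(theta):
    # cartesian coordinates of the point on r = sin(theta)
    r = math.sin(theta)
    return r * math.cos(theta), r * math.sin(theta)

def slope_parametric(theta, h=1e-6):
    # (3.9): dy/dx = (dy/dtheta)/(dx/dtheta); the common factor 2h cancels
    x1, y1 = point(theta - h)
    x2, y2 = point(theta + h)
    return (y2 - y1) / (x2 - x1)

theta = math.pi / 8
print(slope_parametric(theta), math.tan(2 * theta))   # both close to 1
print(point(theta))   # compare (1/(2*sqrt 2), (sqrt 2 - 1)/(2*sqrt 2))
```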

Example 3.18 The map coordinates of a moving vehicle are given by $x = -t^2$, $y = \tfrac13 t^3$, where t is time and t ≥ 0. Find the direction the vehicle is facing when t = 2.
From (3.9),
$$ \frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{t^2}{-2t} = -\tfrac12 t. $$
This equals −1 when t = 2. The slope of the curve is negative at this point, so the
tangent to the path slopes downwards from left to right as shown in Fig. 3.3. The
actual direction in which the vehicle is moving is, however, from right to left. It is
facing north west as shown.
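[Fig. 3.2: the circle r = sin θ with its tangent at P. At P, θ = ⅛π and the coordinates of P are (1/(2√2), (√2 − 1)/(2√2)).]
[Fig. 3.3: the path of the vehicle of Example 3.18, showing the tangent at the point t = 2 and the compass directions N, S, E, W.]
[Fig. 3.4: a point moves a short distance δs along a curve from P to Q; the horizontal and vertical components of the chord PQ are δx and δy.]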

From information such as that given in the previous example, the speed of a
moving point can be calculated. Suppose that a point moves so that
x = x(t), y = y(t),
where t represents time. Figure 3.4 shows the effect of changing t to t + δt, where
δt is small: the point moves from P to Q, a short distance δs say along the curve
(δs is called an element of arc-length). Then the average speed over this short time
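is given by
$$ \frac{\text{arc-length } PQ}{\delta t} = \frac{\delta s}{\delta t} \approx \frac{\text{straight distance } PQ}{\delta t} = \frac{(\delta x^2 + \delta y^2)^{1/2}}{\delta t} = \left[\left(\frac{\delta x}{\delta t}\right)^2 + \left(\frac{\delta y}{\delta t}\right)^2\right]^{1/2}. $$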
Now let δt → 0. Then δx/δt and δy/δt become dx/dt and dy/dt, and finally we
have the result:
Speed of a moving point
Let x = x(t) and y = y(t), where t is time. The speed of the point is given by
$$ \text{speed} = \frac{ds}{dt} = \left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right]^{1/2} \ge 0, \tag{3.10} $$
where ds stands for an element of arc-length.

(Speed is always counted as a non-negative number: when we want to make a distinction as to direction, the word ‘velocity’ is used.)

Example 3.19 Find the speed of the vehicle in Example 3.18 when t = 2.
In general
$$ \frac{ds}{dt} = \left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right]^{1/2} = (4t^2 + t^4)^{1/2}. $$
The speed is therefore 4√2 when t = 2.
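The direction of Example 3.18 and the speed of Example 3.19 can both be read off the velocity components (dx/dt, dy/dt). The following is a small sketch. It assumes, as in Fig. 3.3, that the x axis points east and the y axis north. The helper names are ours.

```python
import math

def velocity(t):
    # x = -t^2, y = t^3/3, so dx/dt = -2t and dy/dt = t^2
    return -2 * t, t ** 2

def speed(t):
    # (3.10): ds/dt = sqrt((dx/dt)^2 + (dy/dt)^2)
    return math.hypot(*velocity(t))

vx, vy = velocity(2)
heading = math.degrees(math.atan2(vy, vx))   # angle anticlockwise from east (the x axis)
print(vx, vy)                                # -4 4
print(heading)                               # about 135 degrees, i.e. north-west
print(speed(2), 4 * math.sqrt(2))
```

The slope dy/dx = −1 fixes only the line of the tangent. The signs of dx/dt < 0 and dy/dt > 0 show that the vehicle is moving to the left and upwards along it.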

Self-test 3.9
An ellipse is given parametrically by x = a cos t and y = b sin t. Find dy/dx as
a function of t. Find the points on the ellipse where the slope of the tangent
to the ellipse is (−1).

Problems

3.1 (Product rule, (3.1)). Obtain df(x)/dx for the following f(x):
(a) $x e^x$; (b) x sin x; (c) x cos x; (d) $e^x \sin x$; (e) x ln x; (f) $x^2 \ln x$; (g) $e^x \ln x$; (h) $x^2 e^x$; (i) sin x cos x; (j) $x^2 x^3$ (this is the same as $x^5$: show that the result is the same for both forms).

3.2 (Quotient and reciprocal rule, (3.2)). Obtain df(x)/dx for the following f(x):
(a) cot x; (b) x/(x + 1); (c) (sin x)/x; (d) $e^x/x$; (e) $(x^2 - 1)/(x^2 + 1)$; (f) $(\tan x)/x^2$; (g) (sin x + cos x)/(sin x − cos x); (h) sec x (= 1/cos x); (i) cosec x (= 1/sin x); (j) $x/(3x^2 - 2)$; (k) $1/[x(x^3 + 1)]$; (l) 1/ln x; (m) $x^n$ where n is a negative integer ($x^n = 1/x^{-n}$); (n) 1/(x + 1); (o) $e^{-x}$ (= $1/e^x$); (p) 1/tan x; (q) $x^{-2}\ln x$.

3.3 Find the first, second, and third derivatives of
(a) 1/(1 − x); (b) x sin x; (c) x/(x − 1); (d) f(x)g(x), where f and g are any functions.

3.4 (Chain rule, (3.3)). Obtain df(x)/dx for the following f(x). (Set out the calculation systematically, as in the examples in Section 3.3.)
(a) $\sin^2 x$; (b) $\cos^2 x$; (c) $\sin x^2$; (d) $\cos x^2$; (e) $\tan^2 x$; (f) $\tan x^2$; (g) cos(1/x); (h) $e^{-x}$ (compare Problem 3.2(o)); (i) $(x + 1)^5$; (j) $(x^3 + 1)^4$; (k) sin 3x; (l) $\cos\tfrac12 x$; (m) $\tan\tfrac12 x$; (n) $e^{-3x}$; (o) sin(2x + 1); (p) cos(3x − 2); (q) tan(1 − 2x); (r) $e^{1/x}$.

3.5 (General powers of x, Section 3.4). Differentiate the following.
(a) $x^{-2}$; (b) $x^{-1}$; (c) $x^{1/3}$; (d) $x^{-1/3}$; (e) $x^{3/2}$; (f) √x; (g) $\sqrt{x^3}$; (h) 1/x; (i) 1/√x.

3.6 Differentiate the following (the independent variable is not always x, and more than one rule is needed).
(a) $x^{1/2}\sin x$; (b) $\sin^{1/3} x$; (c) $(x^2 + 1)^{-1/2}$; (d) $\sin^2(3t + 1)$; (e) $e^{-t}\cos t$; (f) $e^{-t}\sin t$; (g) $e^{-2t}\cos 3t$; (h) $e^{-3t}\cos 2t$; (i) $\sin x\cos^2 x$; (j) $\sin^2 x\cos x$; (k) $(\sin x/x)^2$; (l) $x\sin^3 x$; (m) $x\cos^3 x$.

3.7 Differentiate $\cos^2 x$ and $\sin^2 x$, (a) by using the identities $\cos^2 A = \tfrac12(1 + \cos 2A)$ and $\sin^2 A = \tfrac12(1 - \cos 2A)$, (b) by using the product rule, (c) by using the chain rule.

3.8 Confirm the correctness of the following statements. The letters A, B, C, D, and n stand for any constants.
(a) If x = A cos 2t + B sin 2t, then $d^2x/dt^2 + 4x = 0$.
(b) If x = A cos nt + B sin nt, then $d^2x/dt^2 + n^2 x = 0$.
(c) If $x = Ae^{3t} + Be^{-3t}$, then $d^2x/dt^2 - 9x = 0$.
(d) If $x = Ae^{nt} + Be^{-nt}$, then $d^2x/dt^2 - n^2 x = 0$.
(e) If $x = Ae^{-t}\cos t + Be^{-t}\sin t$, then $d^2x/dt^2 + 2\,dx/dt + 2x = 0$.
(f) If $y = Ae^x + Be^{-x} + C\cos x + D\sin x$, then $d^4y/dx^4 - y = 0$.

3.9 (Chain rule (3.3); or, more easily, the extension (3.6)). Differentiate the following functions.
(a) $e^{\cos^2 x}$; (b) $e^{-\cos x^2}$; (c) $\ln(\cos x^2)$; (d) $(e^{x^2} - 1)^4$.

3.10 (Logarithmic differentiation, Section 3.7, is easiest.) Differentiate the following.
(a) $x e^x \sin x$; (b) $t e^t \cos t$; (c) $x^{1/2} e^{2x} \sin^{1/2} 3x$.

3.11 (Implicit differentiation, Section 3.8). Proceed as in Example 3.15 to obtain expressions for dy/dx in the following.
(a) Show that if $x^2 + y^2 = 4$, then dy/dx = −x/y. Check the correctness of the expression by testing it with $y = \pm(4 - x^2)^{1/2}$. Interpret the result geometrically by sketching the circle $x^2 + y^2 = 4$ and considering the meaning of dy/dx in terms of slope.
(b) $x^{1/2} + y^{1/2} = 1$; (c) $x^3 + xy - y^3 = 0$; (d) x sin y − y sin x = 1.

3.12 The same expression for dy/dx in Problem 3.11(a) is obtained when the radius is changed; for example, if $x^2 + y^2 = 9$, we still get dy/dx = −x/y. Is this paradoxical? (Notice that even in the general case of f(x, y) = c, a constant, the expression for dy/dx will not depend on c: think of the difference between the form of the expression and the values it takes.)

3.13 Find expressions for dy/dx and then $d^2y/dx^2$ if $xy^2 - x^2 y = 1$.

3.14 Differentiate the following inverse functions, using the method of Section 3.9. The results are quite important, and are included in the table of derivatives, Appendix D.
(a) arcsin x; (b) arccos x; (c) arctan x; (d) $\sinh^{-1} x$; (e) $\cosh^{-1} x$; (f) $\tanh^{-1} x$.

3.15 (Parametric differentiation, Section 3.10). The curves in the following are in polar coordinates. Find dy/dx at the point specified.
(a) $r = \sin\tfrac12\theta$ at $\theta = \tfrac12\pi$; (b) $r = 1 + \sin 2\theta$ at $\theta = \tfrac14\pi$.

3.16 Obtain dy/dx in terms of t, then re-express it in terms of x, when the path of a point is given parametrically by the following.
(a) $x = t^3$, $y = t^2$; (b) x = 2 cos t, y = 2 sin t.

3.17 The path of a point is given parametrically by x = a cos t, y = b sin t. Show that the point travels around the ellipse
$$ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. $$
Express dy/dx in terms of t. Suppose that t represents time. Express the speed as a function of t.

3.18 Show that $(d/dx)(a^x) = a^x \ln a$ when a > 0. (Hint: write $a^x$ in exponential form with base e.)
4 Applications of differentiation

CONTENTS

4.1 Function notation for derivatives
4.2 Maxima and minima
4.3 Exceptional cases of maxima and minima
4.4 Sketching graphs of functions
4.5 Estimating small changes
4.6 Numerical solution of equations: Newton’s method
4.7 The binomial theorem: an alternative proof
Problems

The concept of differentiation opens up a surprising range of mathematics, as is illustrated in Chapters 4 and 5 (and throughout the book). Some areas of application are listed in the section headings. Notice that Sections 4.4, 4.5 and 4.6 describe its use for obtaining numerical approximations, as accurately as we please, to the solutions of equations that are otherwise intractable, such as the equation $e^x = x$.
Reminder. A basic table of derivatives is given in Appendix D at the end of
the book.

4.1 Function notation for derivatives


So far, we have used the dy/dx or (d/dx)f(x) notation for derivatives. The usefulness of the dy/dx notation is illustrated by the chain rule (3.3) and by (3.7)–(3.9): it strongly suggests the truth of certain results and makes them easy to remember. However, it is sometimes desirable to use another notation, f′(x), which means exactly the same thing:
$$ f'(x) \ \text{means the same as}\ \frac{d}{dx}f(x). $$
By itself the symbol f ′ stands for the derivative function, because it is ‘derived’
from the original function f. Think of f ′ in the following way. Choose a ‘neutral’
letter for the independent variable, u say, which is not being used for anything else
at the moment, and specify f in terms of u. For example, suppose that
$$ f(u) = u^2 - 3u. $$
Then f′ stands for the function specified by
$$ f'(u) = \frac{d}{du}f(u) = 2u - 3. $$
Now that we know the form of the function f′ (i.e. its formula), we can put anything we like in place of u, so that, as in Section 1.4 for f,
$$ f'(x) = 2x - 3,\quad f'(t) = 2t - 3,\quad f'(5) = 2\cdot 5 - 3 = 7, $$
$$ f'(x^3) = 2x^3 - 3,\quad f'(x - ct) = 2(x - ct) - 3, $$
f′(g(x)) = 2g(x) − 3 (where g is any function), and so on.
The following examples show how this notation can be used.

Example 4.1 The function f is defined by f(u) = sin u. Obtain
(a) $f(x^2)$; (b) $\dfrac{d}{dx}f(x^2)$; (c) $f'(x^2)$.
(a) $f(x^2) = \sin x^2$.
(b)
$$ \frac{d}{dx}f(x^2) = \frac{d}{dx}\sin(x^2) = 2x\cos(x^2) $$
(by using the chain rule (3.3) with $u = x^2$).
(c) The first thing to do is to obtain the function f′:
$$ f'(u) = \frac{d}{du}f(u) = \frac{d}{du}(\sin u) = \cos u. $$
Now put $u = x^2$; then
$$ f'(x^2) = \cos(x^2). $$

Notice that the result (c) is different from the result (b): $f'(x^2)$ is not the same as $(d/dx)f(x^2)$. In (b) we first find $f(x^2)$ and then differentiate with respect to x; in (c) we first find f′(u) and then put $u = x^2$.
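The difference between (b) and (c) shows up clearly in a numerical example. In the sketch below, f = sin and f′ = cos are taken from Example 4.1. The sample point x = 1.3 is an arbitrary choice of ours. The derivative in (b) is estimated by a central difference.

```python
import math

f = math.sin        # f(u) = sin u
f_dash = math.cos   # f'(u) = cos u: the derivative function, found first in terms of u

def d_dx_f_of_x_squared(x, h=1e-6):
    # (d/dx) f(x^2) by a central difference: substitute x^2 first, then differentiate
    return (f((x + h) ** 2) - f((x - h) ** 2)) / (2 * h)

x = 1.3
print(f_dash(x ** 2))                                      # (c): cos(x^2)
print(d_dx_f_of_x_squared(x), 2 * x * math.cos(x ** 2))   # (b): 2x cos(x^2)
```

The first printed value differs from the other two by the chain-rule factor 2x.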

Example 4.2 Express (a) the product rule (3.1); (b) the quotient rule (3.2a);
(c) the chain rule (3.3); in terms of the ‘dash’ notation.
(a) Product rule
$$ \frac{d}{dx}[u(x)v(x)] = u(x)v'(x) + v(x)u'(x), $$
or simply
$$ (uv)' = uv' + vu'. $$
(b) Quotient rule
$$ \left(\frac{u}{v}\right)' = \frac{1}{v^2}(vu' - uv'). $$
(c) Chain rule
$$ \frac{d}{dx}f(u(x)) = f'(u(x))\,u'(x). $$

Example 4.3 (a) Suppose that f is any function. Express (d/dx)f(5x − 3) in any terms available. (b) Verify the correctness of (a) in the special case when f(5x − 3) = sin(5x − 3).
(a) Since the particular function f is not specified, the only thing to be done is to express (d/dx)f(5x − 3) in terms of f′, which is also unspecified. Then, from the chain rule (c) in Example 4.2, with u = 5x − 3,
$$ \frac{d}{dx}f(5x - 3) = 5f'(5x - 3). $$
It is awkward to express the right-hand side without using the dash notation. One alternative is to write it as
$$ 5\left[\frac{d}{du}f(u)\right]_{u = 5x - 3}. $$
(b) In this case f(u) = sin u, so
$$ f'(u) = \cos u. $$
The result in (a) predicts that
$$ \frac{d}{dx}f(5x - 3) = 5\cos(5x - 3). $$
This is the same as the result obtained by working out (d/dx) sin(5x − 3) directly, using the chain rule with u = 5x − 3.

The dash notation extends to higher derivatives: we put

The dash notation
$$ f'(x) = \frac{d}{dx}f(x),\quad f''(x) = \frac{d^2}{dx^2}f(x),\quad f'''(x) = \frac{d^3}{dx^3}f(x),\ \ldots. $$
If y = f(x), then the notation y′, y″, y‴, … is also used. (4.1)

Self-test 4.1
If $f(u) = e^u\cos(u^2)$, obtain (a) f′(x); (b) $f'(x^2)$; (c) $df(x^2)/dx$.

4.2 Maxima and minima


A prominent feature in the graph of any function
$$ y = f(x) $$
is any point at which the graph ‘turns over’. For example, in Fig. 2.9, the graph of y = sin x turns over at $x = \tfrac12\pi$ and $x = \tfrac32\pi$. These are points where the slope changes sign from positive to negative or negative to positive. The derivative of f(x) is zero (the tangent is horizontal) at such turnover points: for example, it is easy to verify that, for y = sin x,
$$ f'(\tfrac12\pi) = 0 \quad\text{and}\quad f'(\tfrac32\pi) = 0. $$
Therefore
$$ f'(x) = 0 $$
can be looked on as an equation whose solutions include all the possible points at which the graph turns over.
However, graphs do not necessarily turn over at points where f′(x) = 0. For example, if $y = x^3$, then $f'(x) = 3x^2$. This is zero at x = 0, but the graph does not turn over at x = 0 (see Fig. 1.4): it flattens instantaneously and then continues upward.
Figure 4.1 sketches two typical cases in which the graph does turn over, at
A (x = a) in Fig. 4.1a and at B (x = b) in Fig. 4.1b. Then f ′(a) = 0 and f ′(b) = 0.

[Fig. 4.1: (a) a local maximum at A (x = a), where f′(a) = 0 and f″(a) < 0; (b) a local minimum at B (x = b), where f′(b) = 0 and f″(b) > 0.]

If f′(x) = 0 at a point x = c, that is if
$$ f'(c) = 0, $$
then x = c is called a stationary point of f(x). A stationary point such as A in
Fig. 4.1a is called a maximum of the function f(x). More precisely, the function is
said to have a local maximum at x = c, because the value of f(x) at x = c is greater
than its value at any point in the immediate neighbourhood. (There may be local
maxima elsewhere that are either greater or smaller than this one.) Similarly a
point such as B in Fig. 4.1b is called a local minimum of f(x).
To distinguish between types of stationary point algebraically, consider also the second derivative f″(x) at x = c. Suppose that
$$ f'(c) = 0 \quad\text{and}\quad f''(c) < 0. $$
The second derivative is
$$ f''(x) = \frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right), $$
and this is negative at x = c. Therefore the slope, dy/dx, or f′(x), is decreasing across x = c, and since f′(x) = 0 at x = c, f′(x) must be positive on the left of c and negative on the right. Thus the graph is of the type shown in Fig. 4.1a, and the point is a local maximum.
If x = c is a point where
$$ f'(c) = 0 \quad\text{and}\quad f''(c) > 0, $$
then f′(x) is increasing, and therefore goes from negative to positive, across x = c. The point is therefore a minimum, like the point B in Fig. 4.1b.
In the special case when
$$ f'(c) = 0 \quad\text{but}\quad f''(c) = 0 $$
there might occur a maximum (as with $y = -x^4$ at x = 0), or a minimum (as with $y = x^4$ at x = 0), or another feature called a stationary point of inflection (as with $y = \pm x^3$ at x = 0). These cases are illustrated in Fig. 4.2. One way to classify such a point is to examine directly the sign of dy/dx on both sides of the point.

[Fig. 4.2: Cases for which f′(0) = 0 and f″(0) = 0. (a) y = −x⁴ (maximum). (b) y = x⁴ (minimum). (c) y = x³ (point of inflection). (d) y = −x³ (point of inflection).]

To summarize:

Stationary points of f(x)
Let f′(c) = 0; that is, x = c is a stationary point of f(x). Stationary points can be classified either by examining the sign of f′(x) on both sides of x = c or by looking at the sign of f″(c).
(a) If f″(c) < 0, f(x) has a local maximum at x = c.
(b) If f″(c) > 0, f(x) has a local minimum at x = c.
(c) If f″(c) = 0, the stationary point might be a maximum, a minimum, or a point of inflection. Examine the sign of f′(x) on both sides of x = c.
(4.2)

Example 4.4 Classify the stationary points of $f(x) = x^3 - 3x$.
The stationary points are where f′(x) = 0; that is, where
$$ 3x^2 - 3 = 0, \quad\text{or}\quad x = \pm 1. $$
We need the signs of f″(±1), where
$$ f''(x) = 6x. $$
Then f″(1) = 6, which is positive, so there is a minimum at x = 1. Also f″(−1) = −6, which is negative, so there is a maximum at x = −1.
The values of f(x) at these points are
$$ f(1) = -2, \qquad f(-1) = 2; $$
so the graph has the shape shown in Fig. 4.3. Alternatively, we could simply have checked the signs of $f'(x) = 3x^2 - 3$ on both sides of the stationary points directly, instead of using the test (4.2).
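The test (4.2) is mechanical enough to write down directly. The following is a minimal sketch for the f(x) of Example 4.4. The function names are ours. The derivatives are typed in by hand rather than computed.

```python
def f(x):
    return x ** 3 - 3 * x

def f_dash(x):
    return 3 * x ** 2 - 3

def f_ddash(x):
    return 6 * x

def classify(c):
    """Second-derivative test (4.2) at a stationary point c, i.e. where f'(c) = 0."""
    s = f_ddash(c)
    if s < 0:
        return "local maximum"
    if s > 0:
        return "local minimum"
    return "inconclusive: examine the sign of f'(x) on both sides"

for c in (-1, 1):                  # roots of f'(x) = 3x^2 - 3 = 0
    assert f_dash(c) == 0
    print(c, classify(c), f(c))
# -1 local maximum 2
# 1 local minimum -2
```

The third branch corresponds to case (c) of (4.2), where the second derivative gives no verdict.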

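[Fig. 4.3: graph of y = x³ − 3x for −2 ≤ x ≤ 2, with a maximum at (−1, 2) and a minimum at (1, −2).]
[Fig. 4.4: circuit in which a constant voltage V drives a current through a fixed resistance R and a variable resistance x.]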

Example 4.5 In the circuit shown in Fig. 4.4, V is a constant voltage and R and x represent two resistances: R is fixed and x is variable. The rate of heat generation y in resistance x is equal to $I^2 x$, where I is the current. Show that y is a maximum when x = R.
Current equals voltage divided by total resistance, so
$$ I = \frac{V}{R + x}. $$
Therefore the rate of heat generation is
$$ y = \frac{V^2 x}{(R + x)^2} = f(x), $$
say. If there is a maximum, it will occur when f′(x) = 0. From the quotient rule (3.2),
$$ f'(x) = \frac{V^2}{(R + x)^4}\left[(R + x)^2 - x\cdot 2(R + x)\right] = V^2\,\frac{R - x}{(R + x)^3}. \qquad \text{(i)} $$
This is zero when x = R.
To show that f(x) has a maximum when x = R, we may work out the sign of f″(R). From (i),
$$ f''(x) = \frac{V^2}{(R + x)^6}\left[(R + x)^3(-1) - (R - x)\cdot 3(R + x)^2\right] = \frac{V^2}{(R + x)^4}(-4R + 2x). $$
Therefore
$$ f''(R) = -\frac{V^2}{8R^3}, $$
which is negative, so x = R corresponds to a maximum of y.
However, it is easier to look instead at the expression (i) for f′(x). When x < R, we have f′(x) > 0, so f(x) is increasing. When x > R, we have f′(x) < 0, so f(x) is decreasing. This ensures that a maximum has been obtained without the need to differentiate again.
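A brute-force scan gives an independent check on this conclusion. The following is a sketch only. The values V = 1 and R = 2 and the grid of trial resistances are illustrative assumptions of ours and are not part of the example.

```python
V, R = 1.0, 2.0     # illustrative values only (not from the example)

def heat_rate(x):
    # y = V^2 x / (R + x)^2, the rate of heat generation in the resistance x
    return V ** 2 * x / (R + x) ** 2

xs = [k * 0.01 for k in range(1, 1001)]    # trial resistances 0.01, 0.02, ..., 10.0
best = max(xs, key=heat_rate)
print(best, heat_rate(best), V ** 2 / (4 * R))   # peak near x = R, with value V^2/(4R)
```

The maximum value $V^2/(4R)$ follows from putting x = R in y. The scan can only locate the peak to within the grid spacing.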

Example 4.6 x and y are two numbers subject to the restriction that x + y = 1. Find the maximum possible value of xy.
There are two variables, x and y, but we can reduce the problem to one involving only x by using the fact that x + y = 1, so that
$$ y = 1 - x. \qquad \text{(i)} $$
In that case,
$$ xy = x(1 - x) = x - x^2 = f(x), $$
say. Now f(x) has a stationary point (a maximum, minimum, or point of inflection) where f′(x) = 0, that is to say, where
$$ 1 - 2x = 0, \quad\text{or}\quad x = \tfrac12. $$
By (4.2), this value of x delivers a maximum, because f″(x) = −2 (for any value of x), which is negative. From (i), $y = \tfrac12$ when $x = \tfrac12$, so the maximum value of xy is $\tfrac14$.

Self-test 4.2
Find and classify the stationary points of $f(x) = 2x^3 - 9x^2 + 12x + 1$.

4.3 Exceptional cases of maxima and minima


The method of finding local maxima and local minima by solving f ′(x) = 0 reveals
only points where the slope of the graph of y = f(x) is horizontal. Sometimes there
is a maximum or minimum at an end-point of an interval, even if the graph is not
horizontal there.
Example 4.7 Suppose that the values of x to be considered are restricted to lie between 0 and 1 inclusive: that is, 0 ≤ x ≤ 1. Find the points on this interval at which $x - x^2$ takes maximum and minimum values.
The graph of $y = f(x) = x - x^2$ between x = 0 and x = 1 is shown in Fig. 4.5. The maximum of $f(x) = x - x^2$ which we found in Example 4.6 at $x = \tfrac12$ can be seen. But, understood in a commonsense way, there are minimum values at x = 0 and x = 1, the end-points of the restricted interval. These cannot be detected by the method of differentiation. Whether we are interested in them would depend on the demands of any practical problem from which the question originated.

In problems of the type illustrated in Example 4.6 this situation can arise
naturally, as in the following example.

[Fig. 4.5: y = x − x², 0 ≤ x ≤ 1, with maximum value ¼ at x = ½.]
[Fig. 4.6: f(x) = 2x² − 1, −1 ≤ x ≤ 1.]

Example 4.8 Find the maximum and minimum values of $x^2 - y^2$ on the circle $x^2 + y^2 = 1$.
It is evident that the point (x, y) can only be on the circle if x and y both have values between −1 and 1 inclusive, that is if
$$ -1 \le x \le 1 \quad\text{and}\quad -1 \le y \le 1. \qquad \text{(i)} $$
A restricted interval therefore arises naturally in the problem. On the circle $x^2 + y^2 = 1$, we have
$$ y^2 = 1 - x^2, \qquad \text{(ii)} $$
so
$$ x^2 - y^2 = 2x^2 - 1 = f(x), \qquad \text{(iii)} $$
say. To find the stationary points of f(x), note that f′(x) = 4x, which is zero when x = 0. Also f″(0) = 4 > 0, so x = 0 is a local minimum of f(x), whose value is f(0) = −1.
However, we have overlooked something. In Fig. 4.6, we show the graph of $f(x) = 2x^2 - 1$ within the permitted interval −1 ≤ x ≤ 1. The local minimum at x = 0 can be seen, but there are also maxima at the end-points x = −1 and x = 1, where f(x) takes the value 1.
Alternatively, the maxima at x = ±1 can be found by substituting for x instead of y at the first stage. Put $x^2 = 1 - y^2$, so that
$$ x^2 - y^2 = 1 - 2y^2 = g(y), $$
say, and solve g′(y) = 0: we then find a local maximum at y = 0, where x = ±1. However, we also lose sight of the minima we found before. The subject is discussed again in Section 28.2.
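The lesson of Example 4.8 is that, on a restricted interval, the candidates for extreme values are the stationary points and the end-points. The following is a minimal sketch that evaluates f at these candidates. It then adds a brute-force scan of the interval as a cross-check. The grid spacing is our own choice.

```python
def f(x):
    # x^2 - y^2 on the circle x^2 + y^2 = 1, after substituting y^2 = 1 - x^2
    return 2 * x ** 2 - 1

# the stationary point x = 0 (from f'(x) = 4x = 0) and the end-points x = -1, 1
values = {x: f(x) for x in (-1.0, 0.0, 1.0)}
print(values)                     # {-1.0: 1.0, 0.0: -1.0, 1.0: 1.0}

# a brute-force scan of the permitted interval -1 <= x <= 1 agrees
grid = [-1 + k / 1000 for k in range(2001)]
print(max(f(x) for x in grid), min(f(x) for x in grid))
```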

[Fig. 4.7: a graph with points A, B, and C at which there is no tangent.]

Another possibility is that there may be points at which the graph of y = f(x)
does not have a definite tangent. Then f ′(x) or dy/dx has no meaning at such
points. For example, in Fig. 4.7, there is no tangent at the points A, B, and C. The
points A and B could qualify as local maxima, and C as a local minimum, but at
A and C the graph suddenly changes direction, and at B there is a jump in the
value of f(x). These points cannot be located by solving f ′(x) = 0, because f ′(x)
does not exist at A, B, and C.
For example, the derivative of f(x) = | x | is not defined at x = 0, but the function
clearly has a minimum value at x = 0.

Self-test 4.3
Obtain and classify the stationary points of $f(x) = 3x^4 - 8x^3 + 6x^2$. Sketch the graph of y = f(x).

4.4 Sketching graphs of functions


To sketch a graph is to indicate its general shape so as to draw attention to its
most important features without being concerned with accurate plotting. To do
this it is necessary for you to have a clear idea of the shape of the graphs of the
basic functions
$x^a$, $e^{ax}$, sin ax, cos ax, ln x.

Example 4.9 Sketch the graph $y = 1 - 1/(1 + x)^2$.
This can be done in stages, as shown in Fig. 4.8. Figure 4.8c is obtained from 4.8b by using the rule (1.11) with c = 1; it simply involves sliding the graph $y = -1/x^2$ one unit to the left. To get from 4.8c to 4.8d, we add 1, which moves the graph up the y axis by one unit.
[Fig. 4.8: (a) y = 1/x²; (b) y = −1/x²; (c) y = −1/(1 + x)²; (d) y = 1 − 1/(1 + x)².]

In Fig. 4.8d of Example 4.9 we see that, as x increases, becoming large and positive, the value of
$$ y = 1 - \frac{1}{(1 + x)^2} $$
gets closer and closer to 1. The same is true when x becomes large and negative. This is obviously an important feature of the graph. It can be seen to be true by thinking what happens to y when we put a large value of x into the formula for y (think of a very large number: x = 1 000 000, rather than x = 10). Then obviously $1/(1 + x)^2$ is very small, so y gets very close to 1, and the larger x becomes, the nearer y is to 1.
We say that, as x increases, the graph approaches the line y = 1, which in gen-
eral terms is called an asymptote of the graph. When x approaches −1, the graph
approaches the vertical line x = −1; this is also called an asymptote. The two
continuous halves of the graph to the left and right of x = −1 are called branches.
Suppose that y = f(x) is to be sketched. A general question to be asked is ‘What
happens to y when x increases towards infinity (or decreases towards minus
infinity)?’ We normally say ‘as x approaches ±∞’, and as usual indicate the
approach by ‘→’:
x → ±∞.
For example, 1/x → 0 when x → −∞. Also
$$ \frac{x - 1}{3x + 2} \to \frac{1}{3} \quad\text{when}\ x \to \infty\ (\text{or}\ x \to -\infty). $$
To see this, think of the effect of giving x an immense value. Only the terms x and 3x are significant; they are said to dominate the expression, so
$$ \frac{x - 1}{3x + 2} \to \frac{x}{3x} = \frac{1}{3}. $$
The limit notation can be used in this context (see Section 2.2). We can write, for example,
$$ \lim_{x\to\infty}\frac{1 - 2x}{1 + x} = -2. $$
The reasoning is the same as in the earlier case: think of a very large value of x. Very often the function has no definite limit as x → ∞. For example, $\lim_{x\to\infty}\sin x$ does not exist; no definite single number is approached, since sin x simply goes up and down between ±1 for ever. However, it is quite usual to write, say,
$$ \lim_{x\to\infty} x^2 = \infty, $$
even though ∞ is not a number.


Notice the following result:
$$ \lim_{x\to\infty} a x^n e^{-cx} = 0, $$
where a and n are any constants, and c is a positive constant. (4.3)

We shall not prove (4.3); but, to convey the feel of it, a table of values (to two decimal places) is given for the special case of $x^3 e^{-x}$:

x              0     1     2     3     4     8     10
x³e^(−x)       0     0.37  1.08  1.34  1.17  0.17  0.05

Fairly large values are needed before the function settles down to approach zero, because $x^3$ is increasing and therefore competes with $e^{-x}$ in the early stage. However, $e^{-x}$ will eventually beat any power of x down to zero. In the following example, we sketch the graph of the function in the table without using the calculated values above.
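The table can be regenerated in a few lines. The following is a sketch. Rounding to two decimal places is our choice of presentation. The last line evaluates the function further out to show the exponential factor winning.

```python
import math

xs = (0, 1, 2, 3, 4, 8, 10)
table = [(x, round(x ** 3 * math.exp(-x), 2)) for x in xs]
for x, value in table:
    print(x, value)

# further out, the factor e^(-x) dominates x^3 completely
print(20 ** 3 * math.exp(-20))
```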

Example 4.10 Sketch the graph $y = x^3 e^{-x}$.
Do it in stages, using any easily obtained facts you can think of.
1. Are there any points where it is easy to obtain values? At x = 0, y = 0.
2. Are there any definite points where $x^3 e^{-x}$ is infinite? There are no such points.
3. Are there any points where the graph crosses the x axis? Only the point found in (1).
4. Are there any maxima/minima?
$$ \frac{dy}{dx} = 3x^2 e^{-x} - x^3 e^{-x} = x^2(3 - x)e^{-x}. $$
This is zero when x = 0 and x = 3, so these are stationary points. dy/dx is positive when x < 3 (x ≠ 0), and negative when x > 3, so x = 3 is a maximum. Since $e^3 \approx 20$, at this point $y \approx 27/20 \approx 1\tfrac13$.
Near the other stationary point x = 0, dy/dx is positive on both sides, so x = 0 is a point of inflection.
5. Behaviour as x → ∞. According to (4.3), y → 0 as x → ∞.
6. Behaviour as x → −∞. As x → −∞, we have $x^3 \to -\infty$ and $e^{-x} \to \infty$ (think, for example, of x = −1000). Therefore, $x^3 e^{-x} \to -\infty$ (very rapidly).
The sketch is shown in Fig. 4.9.

[Fig. 4.9: sketch of y = x³e^(−x), with a point of inflection at the origin and a maximum at x = 3.]

Example 4.11 Sketch the graph of $e^{-x/3}\sin 2x$ for 0 ≤ x ≤ 2π.
(x is assumed to be in radians.) Split the expression into its two factors, $e^{-x/3}$ and sin 2x. These are shown in Fig. 4.10a,b. The value of $e^{-x/3}$ drops to about ⅛ at x = 2π. Also, sin 2x is zero when
$$ 2x = 0, \pi, 2\pi, 3\pi, 4\pi, \ldots, $$
that is, where
$$ x = 0, \tfrac12\pi, \pi, \tfrac32\pi, 2\pi, \ldots. $$
The product of the two is shown in Fig. 4.10c. The graph crosses the x axis (i.e. y = 0) where sin 2x = 0, and nowhere else. The heights of the peaks and troughs of $e^{-x/3}\sin 2x$ are estimated by the size of the factor $e^{-x/3}$, shown as a broken line, which multiplies the maxima and minima of sin 2x. The new maxima and minima do not occur at exactly the same points: it is left to the reader to show that the new maxima and minima occur at values of x which satisfy the equation tan 2x = 6.
[Fig. 4.10: (a) y = e^(−x/3); (b) y = sin 2x; (c) y = e^(−x/3) sin 2x, lying between the broken curves y = ±e^(−x/3).]
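The claim that the turning points satisfy tan 2x = 6 can be checked numerically. The following sketch writes the derivative out by the product rule. It lists the solutions 2x = arctan 6 + kπ that lie in 0 ≤ x ≤ 2π and evaluates the derivative there. The function names are ours.

```python
import math

def g(x):
    return math.exp(-x / 3) * math.sin(2 * x)

def g_dash(x):
    # product rule: e^(-x/3) (2 cos 2x - (1/3) sin 2x), zero exactly when tan 2x = 6
    return math.exp(-x / 3) * (2 * math.cos(2 * x) - math.sin(2 * x) / 3)

# solutions of tan 2x = 6 are 2x = arctan 6 + k*pi; k = 0..3 give x in [0, 2*pi]
roots = [(math.atan(6) + k * math.pi) / 2 for k in range(4)]
for x in roots:
    print(round(x, 4), g_dash(x), round(g(x), 4))
```

The first turning point is at about x = 0.703, slightly to the left of the peak of sin 2x at ¼π ≈ 0.785. The decaying factor pulls each peak earlier.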

It is useful to be able to distinguish between the behaviour of functions such as those shown in Fig. 4.11a,b. The function $1/x^2$ of Fig. 4.11a is infinite at x = 0 and positive on both sides of it. The function 1/x of Fig. 4.11b is also infinite at x = 0, but its sign changes across x = 0. The terminology used to describe y = 1/x near x = 0 is
y → −∞ as x → 0 from the left;
y → ∞ as x → 0 from the right.
[Fig. 4.11: (a) y = 1/x²; (b) y = 1/x.]

Example 4.12 Sketch the graph of y = 1/[(x − 2)(x + 1)].
Look out for the obvious things first. At x = 0, we have $y = -\tfrac12$. The function is infinite at x = 2 and x = −1. It does not cross the x axis anywhere.
Now consider the sign of 1/[(x − 2)(x + 1)]. It is positive when x < −1 (try e.g. x = −3). It is positive when x > 2 (try e.g. x = 3). It is negative when −1 < x < 2. This is consistent with the fact that the graph does not cross the x axis, and with our earlier finding that y is negative when x = 0.
We know now that
y → ∞ as x → −1 from the left;
y → −∞ as x → −1 from the right;
y → −∞ as x → 2 from the left;
y → ∞ as x → 2 from the right;
so Fig. 4.12 is emerging.

[Fig. 4.12: y = 1/[(x − 2)(x + 1)], with vertical asymptotes x = −1 and x = 2, and y = −½ at x = 0.]

We now locate precisely the obvious maximum between x = −1 and 2, and make sure there are no other stationary points. By the reciprocal rule (3.2b), we have
$$ \frac{dy}{dx} = -\frac{1}{(x - 2)^2(x + 1)^2}\,\frac{d}{dx}[(x - 2)(x + 1)] = -\frac{2x - 1}{(x - 2)^2(x + 1)^2}. $$
This is zero at $x = \tfrac12$ and nowhere else. There is no need to use the test (4.2); the point can only be a maximum, since there are no other stationary points. The value of y there is $-\tfrac49$.

We now return to asymptotes and show that there can be asymptotes that slope. Consider the function
$$ y = \frac{x^2 - 1}{2x + 1}, $$
when x is large, positive, or negative. The term $x^2 - 1$ is dominated by $x^2$, meaning that the part −1 is negligible compared with $x^2$ when x is large. Likewise the dominant term in 2x + 1 is 2x. It is therefore obvious that
$$ y \to \pm\infty \quad\text{when}\quad x \to \pm\infty. $$
However, we can do much better than this, because it can be seen by polynomial division that
$$ y = \frac{x^2 - 1}{2x + 1} = \tfrac12 x - \tfrac14 - \frac{\tfrac34}{2x + 1}. $$
Therefore the graph will approach the straight line
$$ y = \tfrac12 x - \tfrac14 $$
when x is large. As in the earlier instances we have seen, the line $y = \tfrac12 x - \tfrac14$ is said to be an asymptote of the original graph. The notation
$$ y \sim \tfrac12 x - \tfrac14 \quad\text{when}\quad x \to \pm\infty $$
is sometimes used, meaning that the curve approaches the line $y = \tfrac12 x - \tfrac14$ when x is large. The curve is sketched in Example 4.13.
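The polynomial division can be carried out exactly with rational arithmetic. The following is a minimal sketch of ordinary long division. The helper name `poly_divide` and the coefficient-list representation (highest power first) are our own conventions.

```python
from fractions import Fraction

def poly_divide(num, den):
    """Long division of polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) with exact Fraction coefficients."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        # subtract factor * den, aligned with the leading term; the leading entry becomes 0
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)
    return quotient, num

q, r = poly_divide([1, 0, -1], [2, 1])   # (x^2 - 1) / (2x + 1)
print(q, r)   # [Fraction(1, 2), Fraction(-1, 4)] [Fraction(-3, 4)]
```

The quotient ½x − ¼ is the sloping asymptote. The remainder −¾, divided by 2x + 1, is the part that dies away as x → ±∞.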
In the same way, a function may be asymptotic to a curve as x → ±∞. For example, if
$$ y = \frac{1}{x} - \frac{1}{x^3}\sin x, $$
then
$$ y \sim \frac{1}{x} \quad\text{when}\quad x \to \pm\infty. $$

Example 4.13 Sketch the graph of $y = (x^2 - 1)/(2x + 1)$.
The curve cuts the x axis (y = 0) at x = ±1. Also y = −1 when x = 0. The function is infinite at $x = -\tfrac12$, and the line $x = -\tfrac12$ is an asymptote.
Also, as shown above, the straight line $y = \tfrac12 x - \tfrac14$ is an asymptote for large values of x. This is shown as the broken line in Fig. 4.13.
[Fig. 4.13: the curve y = (x² − 1)/(2x + 1), with the asymptotes x = −½ and y = ½x − ¼ (broken line).]
From the quotient rule (3.2a),
$$ \frac{dy}{dx} = \frac{(2x + 1)(2x) - (x^2 - 1)\cdot 2}{(2x + 1)^2} = \frac{2(x^2 + x + 1)}{(2x + 1)^2}. $$
This is never zero, because the equation $x^2 + x + 1 = 0$ has no real solutions. Therefore, there are no stationary points.

4.5 Estimating small changes


Let
y = f(x).
Suppose that the value of x changes by a small amount δx. Then y will change by a small amount δy. There is a simple approximate relation between δy and δx which is important in both practice and theory.
Fix a particular value of x, say x = a; the small deviation δx will be made from this value. The derivative at x = a is f′(a). According to (2.8), to obtain f′(a) we take a nearby point x = a + δx and form the ratio

$$\frac{\delta y}{\delta x} = \frac{f(a + \delta x) - f(a)}{\delta x},$$

and

$$\frac{\delta y}{\delta x} \to f'(a) \quad\text{as}\quad \delta x \to 0.$$

If δx is small enough, δy/δx will be close in value to f′(a):

$$\frac{\delta y}{\delta x} \approx f'(a), \quad\text{so that}\quad \delta y \approx f'(a)\,\delta x.$$

This is how to obtain an approximation to the change δy in y due to a small change from x = a to x = a + δx. It is easier to remember the result in the form

$$\delta y \approx \frac{dy}{dx}\,\delta x,$$

near a general point x (which again shows the usefulness of the dy/dx notation in suggesting true results). We call this the incremental approximation for functions of a single variable.

Incremental approximation
For a small increment δx from x = a:
(a) δy ≈ f ′(a) δx.
(b) (Mnemonic form)
$$\delta y \approx \frac{dy}{dx}\,\delta x. \qquad (4.4)$$

Example 4.14 Let y = x + 1/x. Estimate the change δy in y when x changes from
x = 2 to x = 1.8. Compare the estimate with the exact value of δy.
Put
$$y = x + \frac{1}{x} = f(x).$$
Then
$$\frac{dy}{dx} \;\text{or}\; f'(x) = 1 - \frac{1}{x^2},$$
so that f′(2) = 0.75. Here δx = 1.8 − 2.0 = −0.2; so, by (4.4a),
δy ≈ 0.75 × (−0.2) = −0.15.
The exact value is given by δy = (1.8 + 1/1.8) − (2.0 + 1/2.0) = −0.1444… .
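The incremental approximation can be compared with the exact change by machine. The Python sketch below is illustrative only. The helper name `incremental_estimate` is hypothetical, and the derivative must be supplied by hand; here it is the one from Example 4.14.

```python
def incremental_estimate(f, df, a, dx):
    """Return (estimate, exact) for the change in f(x) as x moves from a to a + dx.

    estimate uses the incremental approximation (4.4a), delta_y ~ f'(a) * dx;
    exact is f(a + dx) - f(a).  The derivative df must be supplied by hand.
    """
    estimate = df(a) * dx
    exact = f(a + dx) - f(a)
    return estimate, exact


def f(x):
    return x + 1 / x          # Example 4.14


def df(x):
    return 1 - 1 / x**2       # f'(x) = 1 - 1/x^2


est, ex = incremental_estimate(f, df, 2.0, -0.2)
print(f"estimate = {est:.4f}, exact = {ex:.4f}")   # estimate = -0.1500, exact = -0.1444
```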

Example 4.15 The volume V of a sphere of radius r is given by $V = \tfrac43\pi r^3$. Estimate the percentage change in volume if the radius increases from 2.0 to 2.1 metres.
We shall use the letters that the question offers, considering δV and δr. Put
$$V = \tfrac{4}{3}\pi r^3 = f(r).$$
Then
$$f'(r) = \frac{dV}{dr} = 4\pi r^2,$$
so, by (4.4b),
$$\delta V \approx 4\pi r^2\,\delta r. \qquad \text{(i)}$$
(Notice that 4πr² is the formula for the surface area of a sphere: the change in volume is nearly equal to the surface area times the thickness δr.) Now put r = 2.0:
f′(2.0) = 16π and δr = 2.1 − 2.0 = 0.1.
Then, by (i),
δV ≈ 16π × 0.1 = 5.03 (cubic metres).
Since the original volume is $V = \tfrac43\pi(2.0)^3 = 33.5$ cubic metres, this is an estimated increase of about 15% (equivalently, δV/V ≈ 3δr/r = 0.15). The exact value is $\delta V = \tfrac43\pi(2.1)^3 - \tfrac43\pi(2.0)^3 = 5.282\ldots$ (cubic metres), so our estimate is in error by about 5%.

The number δV ≈ 5.03 in Example 4.15 might not seem to qualify as a small change. Furthermore, if we express the identical problem in different units, say in centimetres rather than metres, the numbers are even larger: δr becomes 10 (cm) and δV is about 5 × 10⁶ (cm³). On the other hand, if the units had been kilometres then δr and δV would have looked very small indeed. But nothing at all is changed except the units of measurement, and we still get only a 5% error in the estimate. The reason for this is that the ratio
Estimated δV / Exact δV = [f′(r) δr]/[f(r + δr) − f(r)]
is dimensionless; that is to say, it is unaffected by the choice of units (see Appendix I). There is no easy way to predict when the method will work well: geometrically speaking, we are content to guess that the graph sticks sufficiently closely to its tangent line at a within the interval a ± δx.
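The unit-independence of this ratio can be checked directly. In the illustrative Python sketch below, the helper names `volume` and `estimate_over_exact` are our own. The same change in radius from Example 4.15 is expressed in metres, centimetres and kilometres. The printed ratio of estimated to exact δV should be the same in each case, namely 1.2/1.261 ≈ 0.9516.

```python
import math


def volume(r):
    """Volume of a sphere of radius r: V = (4/3) pi r^3."""
    return 4.0 / 3.0 * math.pi * r**3


def estimate_over_exact(r, dr):
    """Ratio of the incremental estimate 4 pi r^2 dr to the exact change in V."""
    estimate = 4.0 * math.pi * r**2 * dr
    exact = volume(r + dr) - volume(r)
    return estimate / exact


# The same physical change (radius 2.0 m -> 2.1 m) in three different units.
for unit, scale in [("m", 1.0), ("cm", 100.0), ("km", 0.001)]:
    ratio = estimate_over_exact(2.0 * scale, 0.1 * scale)
    print(f"{unit:>2}: estimate/exact = {ratio:.6f}")
```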

Example 4.16 The cosine rule for a triangle ABC is c² = a² + b² − 2ab cos C. In a triangle for which a = 3 and b = 4, estimate the change in c when C increases from 60° to 65°.
Put in the fixed numbers, a = 3 and b = 4; then
c² = 25 − 24 cos C
or
$$c = (25 - 24\cos C)^{1/2} = f(C),$$
say. By the chain rule (3.3) with u = 25 − 24 cos C,
$$f'(C) = \frac{dc}{dC} = (25 - 24\cos C)^{-1/2}\,(12\sin C).$$
The quantity δC must be measured in radians, because radian measure was assumed in obtaining the derivatives of the sin and cos functions. So we put
$$C = 60^\circ = \tfrac{1}{3}\pi \text{ radians}, \qquad \delta C = \tfrac{5}{180}\pi = 0.087 \text{ radians}.$$
We know that cos C = ½ and sin C = ½√3, so
$$f'(\tfrac{1}{3}\pi) = 6\sqrt{3}/\sqrt{13}.$$
Therefore, by the incremental approximation (4.4),
δc ≈ (6√3/√13) × 0.087 = 0.25.
The exact change is δc = 0.2489… .
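The same computation can be sketched in Python. In this illustration the function names `side_c` and `dc_dC` are our own. The derivative is written in the general form ab sin C / c, which reduces to the expression above when a = 3 and b = 4. The code keeps δC = 5π/180 at full precision rather than rounding it to 0.087. Its estimate (about 0.2515) therefore differs slightly in the third decimal place from the 0.25 obtained above.

```python
import math


def side_c(C, a=3.0, b=4.0):
    """Side c from the cosine rule c^2 = a^2 + b^2 - 2ab cos C (C in radians)."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(C))


def dc_dC(C, a=3.0, b=4.0):
    """dc/dC = ab sin C / c; for a = 3, b = 4 this is 12 sin C / sqrt(25 - 24 cos C)."""
    return a * b * math.sin(C) / side_c(C, a, b)


C = math.radians(60.0)
dC = math.radians(5.0)          # the increment must be in radians
estimate = dc_dC(C) * dC
exact = side_c(C + dC) - side_c(C)
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")   # estimate = 0.2515, exact = 0.2489
```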

Self-test 4.4
The surface area A of a sphere of radius r is given by A = 4πr². Estimate the
change in area if the radius increases from 2.0 to 2.1. Compare this value
with the exact change.

4.6 Numerical solution of equations: Newton’s method


It is often necessary to solve equations for which there is no straightforward method of solution. (In fact, this is true for practically all equations.) Simple examples are the equations
$$x^4 + x^3 - 1 = 0 \quad\text{and}\quad e^{-x} - x = 0.$$
For such cases, there are many methods for obtaining numerical solutions, which are applicable no matter how complicated the equation is. We describe one of them here.
To apply the method, it is necessary first of all to obtain at least a rough idea of the location of the solution we are seeking. There are various ways of doing this: for example, we can plot a rough graph. Taking the first example above, the graph
y = x⁴ + x³ − 1
is sketched in Fig. 4.14 using only five values of x: namely, −1.5, −1, 0, 0.5, and 1. The solutions occur where it crosses the x axis; there seems to be one not far from −1.3 and one not far from 0.8.

[Fig. 4.14: y = x⁴ + x³ − 1, plotted for −1.5 ≤ x ≤ 1.0.]
[Fig. 4.15: Newton's construction, with points A0, A1, A2 at x = x0, x1, x2 on the x axis, points B0, B1, B2 on the curve, and the solution at C.]

Suppose now that we have a general equation to solve,
f(x) = 0,
and that, by drawing its graph or by some other method, we have established that one of its solutions is not far from the value x = x0, say. We show how to locate this solution accurately.
Figure 4.15 shows one possibility for the shape of the graph of y = f(x) close to its (unknown) solution x = c, say, corresponding to the point C. (If the graph is different from this, the discussion is much the same.) The initial estimate x = x0 corresponds to A0. The point A0 could be on the left of the solution C as shown, or on the right; we are not likely to know which, but again the argument is much the same (see Problem 4.14).
Perform, in imagination, the following steps.
1. Start at the point A0, where x = x0. Draw the line A0B0 perpendicular to the x axis, meeting the curve at B0. Construct the tangent at B0, and continue it to intersect the x axis at A1, where x = x1. Then A1 is nearer to the solution C than A0.
2. Repeat the process, starting with the improved estimate A1. We arrive at A2,
where x = x2, which is a still better approximation.
3. Using the new estimates as starting values as they arise, keep repeating the process to produce a sequence of approximations
A0 (x = x0), A1 (x = x1), A2 (x = x2), A3 (x = x3), …
and stop when the accuracy attained is satisfactory.

The steps 1–3 can be carried out algebraically:
1. Starting with A0 (x = x0), the equation of the tangent line at B0 is given by
$$\frac{y - f(x_0)}{x - x_0} = f'(x_0).$$
At A1 (x = x1), we have y = 0, so
$$\frac{-f(x_0)}{x_1 - x_0} = f'(x_0).$$
Therefore, the new approximation A1 (x = x1) is given by
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
2. x1 takes the place of x0, and x2 takes the place of x1, so
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$
3. Once the nth approximation xn is available, the (n + 1)th value xn+1 is given by
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
This process, in which essentially we do exactly the same thing over and over
again, using for each step the information obtained from the previous step,
is called a step-by-step process or an iterative process. It is summarized in the
following algorithm (or recipe), known as Newton’s method.

Newton’s method for the numerical solution of f(x) = 0


Find a value x = x0 sufficiently close to the solution required. Then carry out the
following step-by-step process until the desired accuracy is obtained:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)},$$
for n = 0, 1, 2, 3, … , successively. (4.5)
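Newton's method (4.5) translates almost line for line into a program. The following Python function is a minimal sketch, not a robust general-purpose solver. The name `newton`, the stopping tolerance `tol` and the iteration limit `max_steps` are our own illustrative choices. The caller must supply f′(x) explicitly. The function is run here on the equation of Example 4.17.

```python
def newton(f, df, x0, tol=1e-10, max_steps=50):
    """Newton's method (4.5): iterate x_{n+1} = x_n - f(x_n)/f'(x_n).

    Returns the list of iterates [x0, x1, x2, ...], stopping when two
    successive iterates differ by less than tol.  Raises an error if
    f'(x_n) == 0 (the tangent is parallel to the x axis) or if max_steps
    iterations pass without convergence.
    """
    xs = [x0]
    for _ in range(max_steps):
        x = xs[-1]
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}; the tangent never meets the x axis")
        x_next = x - f(x) / slope
        xs.append(x_next)
        if abs(x_next - x) < tol:
            return xs
    raise RuntimeError("Newton's method did not converge; try a better starting value x0")


# Example 4.17: x**4 + x**3 - 1 = 0, starting from x0 = 0.8
iterates = newton(lambda x: x**4 + x**3 - 1, lambda x: 4 * x**3 + 3 * x**2, 0.8)
for n, x in enumerate(iterates):
    print(n, f"{x:.5f}")
```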

The following examples work through the equations with which we opened
the section.
Example 4.17 The equation x⁴ + x³ − 1 = 0 has a solution near x = 0.8. Find it to five-decimal accuracy.
We have
x0 = 0.8, f(x) = x⁴ + x³ − 1, f′(x) = 4x³ + 3x².
Then, in (4.5),
$$x_{n+1} = x_n - \frac{x_n^4 + x_n^3 - 1}{4x_n^3 + 3x_n^2}.$$
Starting with x0 = 0.8, we obtain the following table:

n        0           1          2          3
xn       0.800 00    0.819 76   0.819 17   0.819 17
f(xn)    −0.078 40   0.002 47   0.000 00   0.000 00

Evidently we do not have to pursue the sequence any further.

Example 4.18 The equation $e^{-x} = x$ has a solution near to x = 0.5. Find the solution accurately to five decimal places.
We have
$$x_0 = 0.5, \qquad f(x) = e^{-x} - x, \qquad f'(x) = -e^{-x} - 1.$$
From (4.5),
$$x_{n+1} = x_n - \frac{e^{-x_n} - x_n}{-e^{-x_n} - 1} = \frac{x_n + 1}{e^{x_n} + 1}$$
(the last step simplifies the calculation). We obtain the following sequence:

x0         x1         x2         x3
0.500 00   0.566 31   0.567 14   0.567 14
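The simplified iteration can be run directly. This Python sketch uses the name `next_estimate`, which is our own. It reproduces the sequence x0, …, x3 above and then checks that the final value satisfies e^(−x) = x to within rounding error.

```python
import math


def next_estimate(x):
    """Simplified Newton step for e^(-x) - x = 0 (Example 4.18):
    x_{n+1} = (x_n + 1) / (e^(x_n) + 1)."""
    return (x + 1) / (math.exp(x) + 1)


x = 0.5
sequence = [x]
for _ in range(3):
    x = next_estimate(x)
    sequence.append(x)

print("  ".join(f"{v:.5f}" for v in sequence))   # 0.50000  0.56631  0.56714  0.56714
print(f"e^(-x) - x = {math.exp(-x) - x:.1e}")     # residual at x3, close to zero
```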

The repetitive process described here is easy to program on a computer for individual cases as they arise, and then the complexity of the equation is of no importance. The same program can be adapted to scan a range of x in order to get a provisional idea of where the solutions are to be found. A simple program, combined for safety's sake with inspection of the whole sequence of output values, would satisfy most requirements.
However, writing a program which will automatically, without intervention, find all the solutions for any function f(x) that might be presented to it is a very different matter. For example, we would have to guard against the possibility that one of the tangents carries us by chance an irrecoverable distance away from the solution we are seeking (see e.g. Problem 4.15), by designing a way of automatically recognizing and rectifying the situation if it occurs.
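The scanning idea mentioned above can be sketched as follows. This is a hypothetical illustration, not a method given in the text. The function `sign_change_brackets` splits an interval into equal steps and keeps the subintervals across which f(x) changes sign. A solution where the graph merely touches the axis, without a change of sign, would be missed. The function `refine` then applies Newton steps from the midpoint of each bracket. Both names, and the step count of 40, are our own choices. The sketch is applied to x⁴ + x³ − 1 = 0, the first equation of this section.

```python
def sign_change_brackets(f, a, b, steps):
    """Split [a, b] into `steps` equal parts and return the subintervals
    (lo, hi) across which f changes sign, or at whose left end f is zero."""
    h = (b - a) / steps
    brackets = []
    for i in range(steps):
        lo, hi = a + i * h, a + (i + 1) * h
        if f(lo) == 0 or f(lo) * f(hi) < 0:
            brackets.append((lo, hi))
    return brackets


def refine(f, df, x0, tol=1e-12, max_steps=50):
    """Polish a starting value x0 with Newton steps x -> x - f(x)/f'(x)."""
    x = x0
    for _ in range(max_steps):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError(f"Newton's method did not converge from x0 = {x0}")


def f(x):
    return x**4 + x**3 - 1


def df(x):
    return 4 * x**3 + 3 * x**2


roots = []
for lo, hi in sign_change_brackets(f, -2.0, 2.0, 40):
    root = refine(f, df, 0.5 * (lo + hi))   # midpoint of the bracket as x0
    roots.append(root)
    print(f"sign change in [{lo:.1f}, {hi:.1f}]  ->  x = {root:.5f}")
```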

Self-test 4.5
The equation eˣ − 3 cos x = 0 has a solution near x = 1. Using Newton's
method starting with x0 = 1, find the solution to four significant figures. How
many steps are required?

4.7 The binomial theorem: an alternative proof



A proof of the binomial theorem has been given in Section 1.18 (eqn (1.46)),
obtained by counting combinations of terms. There follows a simpler proof,
obtained by repeated differentiation.

The binomial theorem


If n is a positive integer, then
(a) $(a + b)^n = a^n + c_1a^{n-1}b + c_2a^{n-2}b^2 + \cdots + c_nb^n,$
in which
$$c_r = \frac{n!}{r!\,(n-r)!} = \frac{1}{r!}\,n(n-1)(n-2)\cdots(n-r+1) = {}^nC_r,$$
and $c_r = c_{n-r}$.
(b) Special case
$(1 + x)^n = 1 + c_1x + c_2x^2 + \cdots + c_nx^n.$
(4.6)

Proof. Consider firstly the case (1 + x)ⁿ, where n is any positive integer. It is clear that this can be expressed as a polynomial of degree n. Suppose the coefficients are c0, c1, c2, … , cn; putting x = 0 shows that c0 = 1. Then
$$(1 + x)^n \equiv 1 + c_1x + c_2x^2 + \cdots + c_nx^n. \qquad (4.7)$$
We have written '≡' in place of '=' to stress that this is an identity: that is, it is true for all values of x (see Section 1.1). Differentiate both sides once, twice, … , n times:
$$n(1 + x)^{n-1} \equiv c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \cdots + nc_nx^{n-1},$$
$$n(n-1)(1 + x)^{n-2} \equiv 2c_2 + 3\cdot 2c_3x + 4\cdot 3c_4x^2 + \cdots + n(n-1)c_nx^{n-2},$$
$$n(n-1)(n-2)(1 + x)^{n-3} \equiv 3\cdot 2c_3 + 4\cdot 3\cdot 2c_4x + \cdots + n(n-1)(n-2)c_nx^{n-3},$$
up to the nth derivative, which is
$$n(n-1)(n-2)\cdots 1 \equiv n(n-1)(n-2)\cdots 1\cdot c_n.$$
Since these are all identities, they are true when x = 0, and for this value they become
n = c1, n(n − 1) = 2c2, n(n − 1)(n − 2) = 3·2c3, …,
and in general, for the rth derivative, with 1 ≤ r ≤ n,
$$n(n-1)(n-2)\cdots(n-r+1) = r!\,c_r, \qquad (4.8)$$
where the term on the left contains r factors. This immediately gives the result (4.6b).
Equation (4.6a) is derived from (4.6b) by writing (for a ≠ 0)
(a + b)ⁿ = aⁿ[1 + (b/a)]ⁿ.
We now know the coefficients cr, so we may put x = b/a into (4.7):
$$(a + b)^n = a^n\left[1 + c_1(b/a) + c_2(b/a)^2 + \cdots + c_n(b/a)^n\right] = a^n + c_1a^{n-1}b + c_2a^{n-2}b^2 + \cdots + c_nb^n,$$
as required.
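Formula (4.8) for the coefficients can be checked numerically. The Python sketch below uses function names of our own. It computes cr as the falling product n(n − 1)…(n − r + 1) divided by r!. It then confirms the symmetry cr = c_{n−r} for one value of n, and uses the coefficients to evaluate (a + b)ⁿ by (4.6a).

```python
import math


def coefficient(n, r):
    """c_r from (4.8): the falling product n(n-1)...(n-r+1) (r factors) divided by r!."""
    falling = 1
    for k in range(r):
        falling *= n - k
    return falling // math.factorial(r)   # exact: r! always divides the product


def binomial_expand(a, b, n):
    """Evaluate (a + b)^n term by term using (4.6a)."""
    return sum(coefficient(n, r) * a**(n - r) * b**r for r in range(n + 1))


n = 7
coeffs = [coefficient(n, r) for r in range(n + 1)]
print(coeffs)                                  # [1, 7, 21, 35, 35, 21, 7, 1]
print(coeffs == coeffs[::-1])                  # True: c_r = c_(n-r)
print(binomial_expand(3, 5, n) == (3 + 5)**n)  # True
```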
Problems

4.1 (See Section 4.1 on the 'dash' notation.) The function f is defined by f(u) = u². Obtain the following.
(a) f′(t); (b) f′(t²); (c) $\frac{d}{dt}f(t^2)$; (d) $f'(t^{1/2})$; (e) $\frac{d}{dt}f(t^{1/2})$; (f) $f''(t^{1/2})$.

4.2 (See Section 4.2.) Find the stationary points of the following functions and classify them as maxima, minima, or points of inflection.
(a) x² − x; (b) x² − 2x − 3; (c) x ln x (x > 0); (d) $xe^{-x}$; (e) 1/(x² + 1); (f) x² − 3x + 2; (g) $e^x + e^{-x}$; (h) x² + 4x + 2; (i) x − x³; (j) x²(x − 1); (k) sin x − cos x (in 0 ≤ x ≤ 2π); (l) sin x cos x (−π ≤ x ≤ π); (m) $e^{-x}\sin x$; (n) $e^{-x/3}\sin 2x$ (see Example 4.11); (o) x − cos x; (p) $2e^x - \tfrac12 e^{2x}$; (q) $x^2e^{-x}$; (r) (ln x)/x (x > 0); (s) (1 − x)³; (t) sin³x; (u) $e^{-x^2}$; (v) $e^{x^2 - x}$; (w) $x + x^{-1}$; (x) $x^3e^{-x}$.

4.3 Let y = f(u(x)). Use two successive applications of the chain rule in the form of Example 4.2c to show that
$$\frac{d^2y}{dx^2} = f''(u(x))\,[u'(x)]^2 + f'(u(x))\,u''(x).$$
Show that if f′(u) is always greater than zero, or always less than zero, then f(u(x)) and u(x) have the same stationary points. Consider, for example, Problem 4.2v in this connection, with $f(u) = e^u$ and u(x) = x² − x: it becomes rather obvious.

4.4 A rectangular piece of ground is to be marked out, which must have a given area A. Find the dimensions of the plot which requires the minimum length of perimeter fence. (This is a 'restricted' problem, like Example 4.6. Call the sides x and y.)

4.5 A tunnel cross-section is to have the shape of a rectangle surmounted by a semicircular roof. The total cross-sectional area must be A, but the perimeter minimized to save building costs. Find its dimensions.

4.6 A circular-cylindrical oil drum is required to have a given surface area (including its lid and base). Find the proportions of the design which contain the greatest volume.

4.7 Solve Problem 4.6 for the case when the lid is not included in the restriction.

4.8 Sketch the graphs of the following functions.
(a) 1/(x² + 1) (this is an even function: see (1.12)).
(b) $e^{-x^2}$.
(c) x/(x − 1).
(d) $xe^{-x}$.
(e) $x^2e^{-x}$.
(f) $x^3e^{-x}$.
(g) $e^{2x} - 4e^x$.
(h) (ln x)/x for x > 0 ((ln x)/x → 0 when x → ∞; this can be proved by putting $x = e^u$ and letting u → ∞).
(i) [ln(−x)]/x for x < 0 (compare (h)).
(j) x ln x − x for x > 0 (x ln x → 0 when x → 0; this can be seen by writing $x = e^{-u}$ and letting u → ∞).
(k) sin(1/x) (Start by finding where it crosses the axis, using the fact that sin u = 0 when u = 0, ±π, ±2π, … .)
(l) (x² − 1)² (This is an even function: see (1.12).)
(m) x(x² − 1)² (This is an odd function: see (1.12).)
(n) (sin x)/x (You will not be able to find the exact positions of the maxima and minima; be content to indicate the trend. It is an even function: see (1.12). For the value approached at x = 0, see (2.13).)

4.9 Sketch the graphs of the following functions.
(a) 1/(x² − 1) (Hint: write x² − 1 = (x + 1)(x − 1), and then follow Example 4.12; alternatively, sketch y = x² − 1 and imagine taking its reciprocal.)
(b) x/(x² − 1).
(c) 1/[x(x − 2)].
(d) x³/(1 − x) (Hint: see the note on curved asymptotes following Example 4.12.)
(e) (x + 2)/(x − 1) (See the hint in (d).)
(f) 1/(x + 1) + 1/(x + 2).

4.10 (See Section 4.5.) Find the approximate value of the change δy in y due to a small change δx in x using the incremental approximation (4.4) in the following cases. Compare the approximate and exact values of δy.
(a) y = x³ when x = 2 and δx = 0.1;
(b) y = x sin x when x = ½π and δx = −0.2;
(c) y = cos x when x = ¼π and δx = 0.1;
(d) y = (1 + x)/(1 − x) when x = 2 and δx = −0.2;
(e) y = tan x when x = ¼π and δx = 0.1;
(f) y = 1/(1 − x²) when x = 0.5 and δx = ±0.1.

4.11 (a) If the focal length of a lens is f, and a viewed object is at distance u, then the image is at distance v where v = uf/(u − f). Let f = 0.75 (m). Find approximately the change in v if u changes from 1.25 to 1.30 (m).
(b) In a Wheatstone bridge circuit, the out-of-balance voltage v is given by
$$v = \frac{E(R_1R_4 - R_2R_3)}{(R_1 + R_2)(R_3 + R_4)},$$
where E is the applied voltage and R₁, R₂, R₃, R₄ represent the resistances in the branches. Suppose that E = 5, R₁ = 4, R₂ = 2, R₃ = 6, and R₄ = 3, so that the circuit is initially balanced. Obtain an approximate expression for δv in terms of a small change δR₁ in R₁.
(c) In a triangle ABC with corresponding sides a, b, c, the formula a = b sin A/sin B applies. Show that δa ≈ −a cot B δB.
(d) In a triangle ABC with corresponding sides a, b, c, the area A is given by
$$A = [s(s-a)(s-b)(s-c)]^{1/2},$$
where $s = \tfrac12(a + b + c)$. Find an approximate expression for δA in terms of δc. (Hint: use logarithmic differentiation to shorten the working.) Estimate δA when a = 2, b = 4, c = 5, and δc = 0.1, with a and b remaining constant.

4.12 (Computational). Growth on a deposit by compound interest is given by the formula $C = P(1 + r)^n$, where P is the amount deposited, r is the compound-interest annual growth rate, n is the time of deposit in years or fractions of a year, and C is the accumulated balance. Obtain approximating expressions for δC when (a) r changes by a small amount δr; (b) n changes by a small amount δn. (c) Consider plausible values of P, r, and n, and experiment with the accuracy of the formulae for various values of δr and δn.

4.13 (Newton's method, Section 4.6). It is not too difficult to make these calculations on a hand-held calculator. Find the solutions of the following equations within the broad ranges indicated, which contain exactly one solution.
(a) x⁴ + 2x² − x − 1 = 0 (range 0.5 ≤ x ≤ 1);
(b) $x^4 + x^{1/3} - 1 = 0$ (0.5 ≤ x ≤ 0.75);
(c) x ln x = −0.3 (0.1 ≤ x ≤ 0.2);
(d) $e^x = 4x^3$ (0 ≤ x ≤ 1);
(e) tan x = 2x (0 < x < ½π);
(f) $(e^x\sin x)/(1 + x) = 2$ (1.5 ≤ x ≤ 1.9).

4.14 The equation $f(x) = xe^{-x} + 1 = 0$ is known to have exactly one solution (not far from x = −0.6). Demonstrate numerically that it is of no use to start off Newton's method for this equation with a value of x greater than 1. Sketch the graph of the function, and make the construction based on Fig. 4.15 to explain why this is so.

4.15 (a) Supposing y = f(x) to have a continuous graph, illustrate graphically that the following principle is true:
If f(a) and f(b) have opposite signs, then there is at least one solution of the equation f(x) = 0 in the range a < x < b.
(b) (Computational). The equation $e^x - 3x = 0$ has exactly two solutions, and they are in the range 0 ≤ x ≤ 2.5. Use the principle in (a) to narrow the ranges in which they are known to lie, so as to produce starting values x0 for Newton's method. One systematic technique is to start with the given end-points x = 0 and 2.5, then to halve the interval repeatedly, considering the signs at the ends of the subdivisions.

4.16 (Computational). (a) Suppose that an equation f(x) = 0 is known to have exactly one solution in a particular finite interval a ≤ x ≤ b. Write a program, using the principle described in Problem 4.15, to obtain a closer starting value x0 for Newton's method. (Since there is only one solution, any subdivision you find across which the sign of f(x) does not change can be ignored. Arrange for the process to stop when the solution is located within a small preset interval of length E.)
(b) Try this with, say, the equation $x(e^x - 1) = 1$, whose single solution lies between 0 and 1.
(c) By choosing E to be very small, the process can by itself locate the solution to any degree of accuracy if E is small enough. (This is called the bisection method for solving equations.) Obtain the number of iterations required to locate the single solution of f(x) = 0 in 0 ≤ x ≤ 1 to two-, four-, and six-decimal accuracy. (The number of iterations is the same for any such equation.)
(d) Solve the equation in (b) by Newton's method to two-, four-, and six-decimal accuracy, starting with x = 0.5, and compare the number of iterations required with the number required by the bisection method.

4.17 Consider the neighbouring points [x0, f(x0)] and [x0 + δx0, f(x0 + δx0)] on the curve y = f(x). Find the normals (see Section 2.2) to the curve at both these points, and the coordinates of their point of intersection. Let δx0 → 0, and show that this point (known as the centre of curvature at [x0, f(x0)]) has the coordinates
$$\left(x_0 - \frac{f'(x_0)\,[1 + f'(x_0)^2]}{f''(x_0)},\;\; f(x_0) + \frac{1 + f'(x_0)^2}{f''(x_0)}\right)$$
provided f″(x0) ≠ 0. Show that the distance of the centre of curvature from [x0, f(x0)] is
$$R = \frac{[1 + f'(x_0)^2]^{3/2}}{|f''(x_0)|}.$$
R is known as the radius of curvature of the curve (see also Section 10.1).
Find the radius of curvature of the parabola y = x² at every point on the curve.

4.18 Prove Leibniz's formula for the nth derivative of a product of two functions:
$$(fg)^{(n)} = f^{(n)}g + {}^nC_1\,f^{(n-1)}g^{(1)} + {}^nC_2\,f^{(n-2)}g^{(2)} + \cdots + {}^nC_n\,fg^{(n)},$$
where ⁿCr is the rth binomial coefficient, given by n!/[r!(n − r)!]. (Hint: try writing out the first three derivatives at full length: notice how the repetitions of terms in $f^{(n-r)}g^{(r)}$ combine to produce the coefficients.)
5 Taylor series and approximations

CONTENTS

5.1 The index notation for derivatives of any order
5.2 Taylor polynomials
5.3 A note on infinite series
5.4 Infinite Taylor expansions
5.5 Manipulation of Taylor series
5.6 Approximations for large values of x
5.7 Taylor series about other points
5.8 Indeterminate values; l'Hôpital's rule
Problems

We extend the idea of infinite series introduced in Section 1.16 for the special case of infinite geometric series. The so-called Taylor series expansion is a type of infinite series. In the simplest case, a given function f may be expressible in the form
$$f(x) = c_0 + c_1x + c_2x^2 + \cdots,$$
whose terms involve successive positive integer powers of x. In general such series are called power series. A power series holds good (that is, truly represents the function) on some interval of validity −r < x < r, where r > 0 is called the radius of convergence (which may be infinite). Within this range two or three terms may be adequate to provide a good approximation to the function if |x| is small enough. For example, if |x| < 0.5 radians, only the first two non-zero terms, $x - \tfrac16 x^3$, of the series for sin x (eqn (5.4c)) are needed for an accuracy of about 0.05%.
We firstly obtain the formula for the coefficients of the Taylor series of a general function, and work out the series for the standard elementary functions eˣ, sin x, and so on (see Table (5.4)). For composite functions such as tan x, arcsin x, or $e^{-x}\sin x$, substitution in the general formula can become complicated, so we also show how the results in the table can be used directly to obtain any fixed number of terms in such cases.
5.1 The index notation for derivatives of any order

We shall use yet another standard notation for derivatives in this chapter. Since we shall have to keep track of derivatives of high orders, we modify the 'dash' notation of (4.1) as follows to provide a brief form:

Index notation for derivatives

For the first, second, third, … derivatives respectively of f(x), write
$$f'(x) = f^{(1)}(x), \quad f''(x) = f^{(2)}(x), \quad f'''(x) = f^{(3)}(x), \quad \ldots .$$
If y = f(x), the notation $y^{(1)}, y^{(2)}, y^{(3)}, \ldots$ is also used. (5.1)

Thus if f(x) = x³, then $f^{(1)}(x) = 3x^2$, $f^{(2)}(x) = 6x$, and $f^{(3)}(x) = 6$. As with the dash notation, we encounter such forms as $f^{(2)}(u) = 6u$, $f^{(2)}(0) = 0$, and $f^{(2)}(x - c) = 6(x - c)$.

5.2 Taylor polynomials


Firstly we shall show how to obtain approximations to a given f(x) for use when x is a small number. Suppose, for example, that
$$f(x) = \frac{1}{1-x}.$$
Since f(0) = 1, we can be sure that
$$\frac{1}{1-x} \approx 1$$
so long as x is small enough. This is shown in Fig. 5.1a. It is, of course, a poor approximation, acceptable only very close to x = 0.
A better approximation near x = 0 is given by the equation of the tangent line at x = 0 (Fig. 5.1b). Since $f^{(1)}(x) = 1/(1-x)^2$, the slope at x = 0 is $f^{(1)}(0) = 1$. The equation of the tangent line at x = 0 is therefore y = 1 + x, so
$$\frac{1}{1-x} \approx 1 + x,$$
when x is small enough.

[Fig. 5.1: y = 1/(1 − x) for −1 ≤ x ≤ 2, compared with the approximations (a) y = 1, (b) y = 1 + x, (c) y = 1 + x + x².]
We need a way to continue improving the approximation, P(x) say, to a further stage and beyond. At present we have reached the tangent approximation P(x) = 1 + x, which was chosen so that P(0) = f(0) and $P^{(1)}(0) = f^{(1)}(0)$. To obtain the next approximation, choose a P(x) which also matches the second derivative at x = 0:
$$P(0) = f(0), \quad P^{(1)}(0) = f^{(1)}(0), \quad P^{(2)}(0) = f^{(2)}(0).$$
This involves adding a term in x², and we can choose its coefficient so that the extra condition is satisfied without disturbing the two terms we have already found. Continuing with the example,
$$f^{(2)}(x) = \frac{2}{(1-x)^3}, \quad\text{so}\quad f^{(2)}(0) = 2.$$
It is easy to check that P(x) = 1 + x + x² satisfies the three conditions. Therefore
$$\frac{1}{1-x} \approx 1 + x + x^2$$
is an improved approximation. This represents the parabolic curve shown in Fig. 5.1c. Notice that the series which is emerging is the geometric series which was shown in (1.37) to have the sum 1/(1 − x).


We can carry out this process for almost any function f(x), and take it to
any level of approximation we wish. Successive approximations will consist of
polynomials of increasing degree. However, we must not expect too much of it:
we cannot go too far from the origin and still expect a good approximation.
To deal with the general case, we need the following simple result.

Derivatives of a polynomial in x at x = 0
Suppose that
$$P(x) = a_0 + a_1x + a_2x^2 + \cdots + a_Nx^N$$
is a polynomial of degree N. Then
$$P(0) = a_0, \quad P^{(1)}(0) = a_1, \quad P^{(2)}(0) = 2!\,a_2,$$
and in general
$$P^{(n)}(0) = n!\,a_n$$
for n = 1, 2, 3, … , N. (5.2)

It is easy to verify (5.2) by working out the first few derivatives.


Now suppose that we wish to approximate to a general function f(x) (near x = 0) by means of a polynomial
$$P(x) = a_0 + a_1x + a_2x^2 + \cdots + a_Nx^N.$$
We require that
$$P(0) = f(0), \quad P^{(1)}(0) = f^{(1)}(0), \quad P^{(2)}(0) = f^{(2)}(0), \quad \ldots .$$
According to (5.2), the coefficients are given by
$$a_0 = P(0) = f(0),$$
$$a_1 = \frac{1}{1!}P^{(1)}(0) = \frac{1}{1!}f^{(1)}(0),$$
$$a_2 = \frac{1}{2!}P^{(2)}(0) = \frac{1}{2!}f^{(2)}(0),$$
$$a_3 = \frac{1}{3!}P^{(3)}(0) = \frac{1}{3!}f^{(3)}(0),$$
and so on. By writing the coefficients $a_n$ in terms of the known values $f^{(n)}(0)$ we obtain the Taylor polynomial approximation:

Taylor polynomial P(x) of degree N near x = 0


Let P(x) be the Nth-degree polynomial
$$P(x) = f(0) + \frac{1}{1!}f^{(1)}(0)\,x + \frac{1}{2!}f^{(2)}(0)\,x^2 + \cdots + \frac{1}{N!}f^{(N)}(0)\,x^N.$$
Then for x sufficiently close to zero,
f(x) ≈ P(x). (5.3)

Example 5.1 Obtain a fifth-degree polynomial which approximates to eˣ for values of x that are not too large.
Use (5.3), putting f(x) = eˣ. This case is simple:
$$f(x) = f^{(1)}(x) = f^{(2)}(x) = \cdots = f^{(5)}(x) = e^x,$$
so $f(0) = f^{(1)}(0) = f^{(2)}(0) = \cdots = f^{(5)}(0) = 1$. Therefore
$$e^x \approx 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \frac{1}{4!}x^4 + \frac{1}{5!}x^5 = P(x).$$
(If we take higher-degree approximations, the terms continue according to the same rule.) We show eˣ and its approximation P(x) in the following table for a few values of x.

x       eˣ         P(x)
−4      0.0183     −3.533
−3      0.0498     −0.6500
−2      0.1353     0.0667
−1      0.3679     0.3666
−0.5    0.6065     0.6065
0       1          1
0.5     1.6487     1.6487
1       2.7183     2.7167
2       7.3891     7.2667
3       20.086     18.400
4       54.598     42.867

The approximating polynomial P(x) clings to the true values for a considerable range around the origin (see Fig. 5.2).

[Fig. 5.2: the solid curve shows y = eˣ, and the dashed curve the fifth-degree polynomial approximation y = P(x), for −4 ≤ x ≤ 4.]
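The values of eˣ and P(x) in the table above can be regenerated with a few lines of Python. In this sketch `taylor_exp` is our own name for the degree-N Taylor polynomial (5.3) of eˣ. With its default degree of 5, it reproduces the P(x) values to the printed accuracy.

```python
import math


def taylor_exp(x, degree=5):
    """Taylor polynomial (5.3) of e^x about x = 0: the sum of x^n / n! for n = 0..degree."""
    return sum(x**n / math.factorial(n) for n in range(degree + 1))


for x in [-4, -3, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 4]:
    print(f"{x:5}   e^x = {math.exp(x):9.4f}   P(x) = {taylor_exp(x):9.4f}")
```

Raising `degree` shows the approximation improving at every fixed x, which anticipates the infinite series of Section 5.3.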
Example 5.2 (a) Obtain the Taylor polynomial approximation of any degree N for the function 1/(1 − x) near x = 0. (b) Obtain an expression for the error in the approximation.
(a) Putting 1/(1 − x) = f(x), the sequence of derivatives of f(x) is
$$f^{(1)}(x) = \frac{1}{(1-x)^2}, \quad f^{(2)}(x) = \frac{2\cdot 1}{(1-x)^3}, \quad f^{(3)}(x) = \frac{3\cdot 2\cdot 1}{(1-x)^4},$$
and in general
$$f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}.$$
Since $f^{(n)}(0) = n!$, referring to (5.3), the Taylor polynomial of degree N is
1 + x + x² + x³ + ··· + x^N.
(b) The error in an estimation using this approximation is equal to
$$P(x) - f(x) = 1 + x + \cdots + x^N - \frac{1}{1-x} = \frac{(1-x)(1 + x + x^2 + \cdots + x^N) - 1}{1-x} = \frac{-x^{N+1}}{1-x}.$$
You should experiment with this expression using various values of N and x. (i) If x is very small, the error involved is very small even if N is only 2 or 3. (ii) If we take any fixed value of x in the range −1 < x < 1, the error will approach zero when we take approximations of higher and higher degree (because, when −1 < x < 1, $x^{N+1}$ approaches zero as N increases: try this numerically with, say, x = 0.9). (iii) The approximation fails altogether if x > 1 or x < −1. The error will be large, and increasing N will make it still larger because $|x^{N+1}|$ increases when N increases.
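Here is one way to carry out the suggested experiment. In this Python sketch the function names are our own. The error is computed both directly, as P(x) − 1/(1 − x), and from the closed form −x^(N+1)/(1 − x) derived above. The two should agree. The printout shows the error shrinking with N for x = 0.1 and x = 0.9, but growing for x = 1.1.

```python
def taylor_geometric(x, N):
    """Degree-N Taylor polynomial of 1/(1 - x) about 0: 1 + x + ... + x^N."""
    return sum(x**n for n in range(N + 1))


def error_formula(x, N):
    """Closed form of P(x) - f(x) from Example 5.2(b): -x^(N+1) / (1 - x)."""
    return -x**(N + 1) / (1 - x)


for x in [0.1, 0.9, 1.1]:
    for N in [2, 5, 10, 20]:
        direct = taylor_geometric(x, N) - 1 / (1 - x)
        print(f"x = {x}, N = {N:2d}: error = {direct: .3e}  (formula gives {error_formula(x, N): .3e})")
```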

Self-test 5.1
Obtain a fifth-degree polynomial P(x) which approximates $e^{2x}$ for small |x|.
Find the difference $[e^{2x} - P(x)]$ at x = 1.

5.3 A note on infinite series


In the previous section we did not put any limit on the degree of the approximating polynomial, and there seems to be no reason why we should not let the terms run on for ever: in fact, let the degree N approach infinity. If we extend the polynomial approximation of Example 5.1 for f(x) = eˣ, we obtain an example of a so-called infinite series:
$$1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots \quad\text{or}\quad \sum_{n=0}^{\infty}\frac{1}{n!}x^n.$$
It might be that by extending the approximating polynomials in this way, approximation will become equality, so that the sum of the series will be equal to the original function instead of being just an approximation to it, but this is only true with reservations.
There are many types of infinite series (see e.g. Chapter 26 on Fourier series). Consider first what is meant by the sum of an infinite series. When x is given any particular value, the terms to be added become simply numbers. We cannot in practice add an infinite number of numbers: no matter how many operations we carry out we never reach the end. However, this does not mean that the infinite series does not add up to a definite number, only that we cannot reach it exactly by simply piling on more and more terms.
Consider the simpler infinite series that we get from putting x = 0.1 into the Taylor polynomial for 1/(1 − x) (see Example 5.2), and letting the degree increase to infinity. It is the geometric series
1 + 0.1 + 0.1² + 0.1³ + 0.1⁴ + ···
(see Section 1.16). This is the same as
1 + 0.1 + 0.01 + 0.001 + 0.0001 + ··· .
If we record the sum of 1, 2, 3, 4, … terms successively, we obtain what is called a sequence of partial sums ('partial' because we only take a finite number of terms into account). The sequence is
1, 1.1, 1.11, 1.111, 1.1111, … .
The number that is being approached is obviously 1.111 11…, which is equal to 10/9. This number is equal to the value of 1/(1 − x) when x = 0.1, so in this case the infinite series has delivered the value required. Similarly, if we put $x = \tfrac12$, the infinite series is
$$1 + \tfrac12 + (\tfrac12)^2 + (\tfrac12)^3 + (\tfrac12)^4 + \cdots = 1 + \tfrac12 + \tfrac14 + \tfrac18 + \tfrac1{16} + \cdots.$$
For the sum of 1, 2, 3, 4, … terms, we obtain the sequence of partial sums
$$1,\; 1\tfrac12,\; 1\tfrac34,\; 1\tfrac78,\; 1\tfrac{15}{16},\; \ldots,$$
which is obviously approaching the value 2, and this is the value of 1/(1 − x) when $x = \tfrac12$. Infinite series whose partial sums approach a definite value as we take more and more terms are said to converge to this value, which is called the sum of the infinite series.
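The partial sums above can be generated exactly with Python's standard `fractions` module, which avoids decimal rounding. In this sketch the helper `partial_sums` is our own. Its output for x = 1/10 and x = 1/2 matches the sequences 1, 1.1, 1.11, … and 1, 1½, 1¾, … quoted above.

```python
from fractions import Fraction


def partial_sums(x, count):
    """The first `count` partial sums of the geometric series 1 + x + x^2 + ..."""
    sums, total, term = [], 0, 1
    for _ in range(count):
        total += term
        sums.append(total)
        term *= x
    return sums


print([str(s) for s in partial_sums(Fraction(1, 10), 5)])  # ['1', '11/10', '111/100', '1111/1000', '11111/10000']
print([str(s) for s in partial_sums(Fraction(1, 2), 5)])   # ['1', '3/2', '7/4', '15/8', '31/16']
```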
However, not all infinite series converge. For example, if we form successively the sum of 1, 2, 3, … terms of the infinite series
1 + 1 + 1 + ··· ,
then we obtain
1, 2, 3, 4, … ,
which is obviously going to infinity. The infinite series
1 − 1 + 1 − 1 + ···
has the successive partial sums
1, 0, 1, 0, 1, … ,
which is not going anywhere. Such series are said to diverge. You might be surprised to learn that the infinite series
$$1 + \tfrac12 + \tfrac13 + \tfrac14 + \cdots$$
diverges: the partial sums go to infinity. (It is worth experimenting with this series: even using a computer you might take a while to convince yourself that it really does diverge.)
On the other hand the series
$$1 + \frac{1}{2^p} + \frac{1}{3^p} + \frac{1}{4^p} + \cdots$$
converges for any fixed p > 1.
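The contrast between these two series can be seen numerically. The following Python sketch prints partial sums for p = 1 and p = 2; the helper name `partial_sum_p` is ours. Two standard results, not proved in this section, help in reading the output. The p = 1 partial sums grow roughly like ln n + 0.5772, so they increase without bound but very slowly (the sum first exceeds 14 only after several hundred thousand terms). The p = 2 sums approach π²/6 ≈ 1.644934.

```python
import math


def partial_sum_p(p, n):
    """Partial sum 1 + 1/2^p + ... + 1/n^p of the p-series."""
    return sum(1.0 / k**p for k in range(1, n + 1))


for n in [10, 1000, 100_000, 1_000_000]:
    print(f"n = {n:>9}: p = 1 gives {partial_sum_p(1, n):.4f},  p = 2 gives {partial_sum_p(2, n):.6f}")

# For comparison: the p = 2 sums approach pi^2/6, while the p = 1 sums track ln n + 0.5772.
print(f"pi^2/6 = {math.pi**2 / 6:.6f}")
```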

5.4 Infinite Taylor expansions


We now return to the general Taylor polynomials of type (5.3) and extend them to an infinite number of terms, so that we have an infinite series instead of a polynomial expression. This is called a Taylor series or an infinite Taylor expansion about the origin x = 0 for the function f(x).
The mathematical theory of infinite series, and in particular of Taylor series, cannot be discussed in this book. The previous section indicated that pitfalls might arise when the polynomials are extended into infinite series. Moreover, it seems obvious, for example, that the values of a function and its derivatives at the origin alone cannot possibly predict values elsewhere if we allow functions to be completely arbitrary at other points.
For example, the graph of f(x), where $f(x) = e^{-1/x^2}$ for x ≠ 0 and f(0) = 0, is perfectly smooth everywhere. But f(x) and all its derivatives $f^{(1)}(x), f^{(2)}(x), \ldots$ are zero when x = 0. Therefore the coefficients in the approximating polynomials (5.3) are all zero, and the Taylor series (every term of which is zero) gives the correct value of f(x) only at the origin itself.
However, ordinary functions do follow the simple pattern illustrated by the
case of f(x) = 1 /(1 − x) in Example 5.2. Each function has an individual range of
values of x, called its interval of validity, in which the Taylor series converges to
the exact value of f(x). Elsewhere, the series must not be used for approximation.
The following table (5.4) displays the infinite Taylor series about the origin for several important functions, together with their ranges of validity. You should confirm the coefficients, as in Examples 5.1 and 5.2.

(i) Geometric series (valid for −1 < x < 1)
$$\frac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots. \tag{5.4a}$$
(ii) Exponential series (valid for all x)
$$e^x = 1 + \frac{1}{1!}x + \frac{1}{2!}x^2 + \cdots. \tag{5.4b}$$
(In particular, $e = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots$.)
(iii) Trigonometric series (valid for all x in radians)
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots. \tag{5.4c}$$
$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots. \tag{5.4d}$$
(iv) Logarithmic series (valid for −1 < x ≤ 1)
$$\ln(1+x) = x - \frac12 x^2 + \frac13 x^3 - \cdots. \tag{5.4e}$$
(v) Binomial series (valid for −1 < x < 1 and any α)
$$(1+x)^\alpha = 1 + \alpha x + \frac{\alpha(\alpha-1)}{2!}x^2 + \frac{\alpha(\alpha-1)(\alpha-2)}{3!}x^3 + \cdots. \tag{5.4f}$$
(By putting α = −1, the geometric series (5.4a) is recovered. If α = N, a positive integer, the series terminates at the term in $x^N$, and so the binomial theorem of Sections 1.18 and 4.7 is obtained.)
(vi) Inverse tangent (valid for −1 ≤ x ≤ 1)
$$\arctan x = x - \frac13 x^3 + \frac15 x^5 - \frac17 x^7 + \cdots. \tag{5.4g}$$
(vii) Hyperbolic functions (valid for −∞ < x < ∞)
$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots, \tag{5.4h}$$
$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots. \tag{5.4i}$$

When a series is used to provide approximations by taking only a finite number of terms, it is necessary to estimate how many terms to take so as to obtain a desired degree of accuracy. It is usually sufficient to observe the size of the terms involved, as in the following example.

Example 5.3 Find how many terms of the Taylor series for sin x are needed to obtain three-decimal accuracy over the range −1 ≤ x ≤ 1 (in radians).
The intuitive requirement is that we should stop at the point where we can see that taking further terms is not likely to affect the third decimal place. The magnitude (modulus) of the terms in (5.4c) increases when the magnitude of x increases, so it should be sufficient to provide an approximation good for the largest value, x = 1. The magnitudes of successive terms when x = 1 are
$$1,\ 0.1\dot{6},\ 0.008\dot{3},\ 0.0002,\ 2.8\times 10^{-6},\ \ldots,$$
using the recurring decimal notation (Section 1.1). Since the fourth term, 1/7! ≈ 0.0002, is already too small to affect the third decimal place, it is enough to retain three terms of the series; that is to say, we should retain powers of x up to $x^5$. To three decimals, then,
$$\sin x \approx x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 \quad\text{for } -1 \le x \le 1.$$
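As a numerical cross-check of Example 5.3, the following Python sketch samples [−1, 1] on a fine grid (the grid spacing is an arbitrary choice of ours) and finds the largest error of the three-term polynomial; it should stay below 0.0005, the usual threshold for three-decimal accuracy. The two-term polynomial is included for contrast.

```python
import math


def sin_poly(x, n_terms):
    """First n_terms of (5.4c): x - x**3/3! + x**5/5! - ..."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))


grid = [-1 + k / 1000 for k in range(2001)]          # points from -1 to 1
max_err_2 = max(abs(math.sin(x) - sin_poly(x, 2)) for x in grid)
max_err_3 = max(abs(math.sin(x) - sin_poly(x, 3)) for x in grid)
print(f"two terms:   max error {max_err_2:.6f}")
print(f"three terms: max error {max_err_3:.6f}")
```

The largest error occurs at the ends x = ±1, as the argument in the example anticipates.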

Self-test 5.2
How many terms of the Taylor series for cos x are needed to obtain three-decimal accuracy over the range −1 ≤ x ≤ 1?

5.5 Manipulation of Taylor series



We can obtain new Taylor series from the standard ones in (5.4).

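Example 5.4 Find the Taylor expansion about x = 0 for the function $(2 - x)^{1/2}$, and state its range of validity.
Write
$$(2 - x)^{1/2} = 2^{1/2}\left(1 - \tfrac12 x\right)^{1/2} = 2^{1/2}\left[1 + \left(-\tfrac12 x\right)\right]^{1/2}.$$
We can use the binomial expansion (5.4f), with $\alpha = \tfrac12$, and with $-\tfrac12 x$ in place of x. The expansion will be valid provided that $-1 < -\tfrac12 x < 1$, that is when −2 < x < 2. Therefore
$$\begin{aligned}(2 - x)^{1/2} &= 2^{1/2}\left[1 + \left(-\tfrac12 x\right)\right]^{1/2} \\ &= 2^{1/2}\left(1 + \tfrac12\left(-\tfrac12 x\right) + \frac{\frac12(\frac12 - 1)}{2!}\left(-\tfrac12 x\right)^2 + \frac{\frac12(\frac12 - 1)(\frac12 - 2)}{3!}\left(-\tfrac12 x\right)^3 + \cdots\right) \\ &= 2^{1/2}\left(1 - \tfrac14 x - \tfrac1{32}x^2 - \tfrac1{128}x^3 + \cdots\right)\end{aligned}$$
when −2 < x < 2.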

To find the first few terms in the Taylor series for a composite function f(x) such as
$$f(x) = \frac{e^{-x}}{(1 + x)^{1/2}},$$
it is usually best not to start from first principles by calculating $f(0), f^{(1)}(0), f^{(2)}(0)$, and so on, which can lead to great complication, but to manipulate standard expansions as in the following examples.

Example 5.5 Approximate to $(\sin x/x)^2$ by a polynomial of degree 4, and compare the approximate and exact values when x = 0, ¼, ½, 1, 2.
From (5.4c),
$$\left(\frac{\sin x}{x}\right)^2 = \left(\frac{x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots}{x}\right)^2 = \left(1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots\right)^2$$
$$= \left(1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots\right)\left(1 - \frac{1}{3!}x^2 + \frac{1}{5!}x^4 - \cdots\right) = 1 - \frac{2}{3!}x^2 + \left(\frac{2}{5!} + \frac{1}{(3!)^2}\right)x^4 + \cdots,$$
where only terms up to $x^4$ are retained in the multiplication. Write the approximating polynomial P(x) as
$$P(x) = 1 - 0.3333x^2 + 0.0444x^4$$
to obtain the table

x              0      0.25     0.5      1.0      2.0
(sin x/x)²     1      0.9793   0.9194   0.7081   0.2067
P(x)           1      0.9793   0.9194   0.7111   0.3772
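The table of Example 5.5 can be regenerated with a few lines of Python (a sketch; `exact` and `P` are our own names). The value of $(\sin x/x)^2$ at x = 0 is taken as its limiting value 1, anticipating Section 5.8. Where a value of P(x) falls exactly on a rounding boundary, the last printed digit may differ by one from the table.

```python
import math


def exact(x):
    """(sin x / x)**2, using the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else (math.sin(x) / x) ** 2


def P(x):
    """Degree-4 approximation with the rounded coefficients of Example 5.5."""
    return 1 - 0.3333 * x**2 + 0.0444 * x**4


print(f"{'x':>5} {'(sin x/x)^2':>12} {'P(x)':>8}")
for x in (0, 0.25, 0.5, 1.0, 2.0):
    print(f"{x:>5} {exact(x):>12.4f} {P(x):>8.4f}")
```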

Example 5.6 Approximate to $e^{-x}/(1 + x)^{1/2}$ near x = 0 by a polynomial of degree 2.
Write
$$f(x) = e^{-x}(1 + x)^{-1/2}.$$
Use (5.4b) with −x in place of x, and carry it to degree 2:
$$e^{-x} \approx 1 - x + \frac{1}{2!}x^2.$$
Also, by (5.4f) (the binomial theorem) with $\alpha = -\tfrac12$:
$$(1 + x)^{-1/2} \approx 1 + \left(-\tfrac12\right)x + \frac{(-\frac12)(-\frac12 - 1)}{2!}x^2.$$
Then by multiplying the two polynomials we obtain
$$f(x) \approx 1 - \tfrac32 x + \tfrac{11}{8}x^2$$
when x is small. (Reject powers higher than 2 in the final product: they would not be correct, since we neglected such terms in the original approximations.)
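The manipulation in Example 5.6 (multiply two truncated series and discard powers above the chosen degree) can be mechanised with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the helper names `mul_trunc` and `binomial_coeffs` are our own. `binomial_coeffs` builds the coefficients of (5.4f) from the ratio of successive terms, (α − k + 1)/k.

```python
from fractions import Fraction
from math import factorial


def mul_trunc(a, b, n):
    """Product of coefficient lists a and b, keeping powers x**0 .. x**n only."""
    c = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        for j, bj in enumerate(b[: n + 1 - i]):
            c[i + j] += ai * bj
    return c


def binomial_coeffs(alpha, n):
    """Coefficients of (1 + x)**alpha up to x**n, as in (5.4f)."""
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        # ratio of successive terms: (alpha - k + 1) / k
        coeffs.append(coeffs[-1] * (alpha - k + 1) / k)
    return coeffs


N = 2
exp_minus_x = [Fraction((-1) ** k, factorial(k)) for k in range(N + 1)]  # e**(-x)
inv_sqrt = binomial_coeffs(Fraction(-1, 2), N)                           # (1 + x)**(-1/2)
print(mul_trunc(exp_minus_x, inv_sqrt, N))
```

Changing `N` extends the product to higher degree; the discarded terms never influence the retained ones, which is why the truncation is safe.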

Example 5.7 Obtain the first three nonzero terms of the Taylor expansion for sec x = 1/cos x.
There are several ways of doing this problem.
(a) Working from (5.3). You might try this, but it is rather arduous.
(b) Using the power series for cos x given by (5.4d). Write
$$\frac{1}{\cos x} = \frac{1}{1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots}.$$
The problem is to find the first three terms in the reciprocal of the infinite series; we then have a Taylor polynomial. Anticipate that only the even powers of x will occur, as in the expansion of the even function cos x. Then we expect
$$\frac{1}{\cos x} = \frac{1}{1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots} = b_0 + b_2x^2 + b_4x^4 + \cdots.$$
We have to find $b_0, b_2, b_4$. To do this, cross-multiply:
$$\begin{aligned}1 &= \left(1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots\right)(b_0 + b_2x^2 + b_4x^4 + \cdots) \\ &= b_0 + \left(b_2 - \tfrac12 b_0\right)x^2 + \left(b_4 - \tfrac12 b_2 + \tfrac1{24}b_0\right)x^4 + \cdots\end{aligned}$$
(retaining only powers up to $x^4$). Match the coefficients of powers of x on both sides, starting with the constant term; we obtain
$$b_0 = 1,$$
and, since the coefficients of $x^2$ and $x^4$ on the left are zero,
$$b_2 - \tfrac12 b_0 = 0 \quad\text{and}\quad b_4 - \tfrac12 b_2 + \tfrac1{24}b_0 = 0.$$
The last two equations can be solved successively to give
$$b_2 = \tfrac12 \quad\text{and}\quad b_4 = \tfrac{5}{24}.$$
Finally
$$\frac{1}{\cos x} \approx 1 + \tfrac12 x^2 + \tfrac{5}{24}x^4.$$
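(c) Polynomial division. We can evaluate $1/(1 - \tfrac12 x^2 + \tfrac1{24}x^4 - \cdots)$ by long division, setting it out like this and ignoring powers higher than $x^4$:
$$\begin{array}{r|l}
 & 1 + \tfrac12 x^2 + \tfrac{5}{24}x^4 \\ \hline
1 - \tfrac12 x^2 + \tfrac1{24}x^4 & 1 \\
\text{subtract:} & 1 - \tfrac12 x^2 + \tfrac1{24}x^4 \\ \hline
 & \tfrac12 x^2 - \tfrac1{24}x^4 \\
\text{subtract:} & \tfrac12 x^2 - \tfrac14 x^4 \\ \hline
 & \tfrac{5}{24}x^4 \\
\text{subtract:} & \tfrac{5}{24}x^4 \\ \hline
 & 0
\end{array}$$
The quotient on the top line agrees with the result of (b).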
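Method (b) generalises: if $a_0 + a_1x + a_2x^2 + \cdots$ has $a_0 \ne 0$, matching coefficients in $(a_0 + a_1x + \cdots)(b_0 + b_1x + \cdots) = 1$ gives $b_0 = 1/a_0$ and, for n ≥ 1, $b_n = -(a_1b_{n-1} + a_2b_{n-2} + \cdots + a_nb_0)/a_0$. A Python sketch with exact fractions (`reciprocal_series` is our own name) reproduces the coefficients of sec x:

```python
from fractions import Fraction
from math import factorial


def reciprocal_series(a, n):
    """Coefficients b_0..b_n of 1/(a_0 + a_1 x + ...), assuming a[0] != 0."""
    b = [1 / Fraction(a[0])]
    for m in range(1, n + 1):
        # the coefficient of x**m in the product must vanish for m >= 1
        s = sum(a[k] * b[m - k] for k in range(1, m + 1) if k < len(a))
        b.append(-s / a[0])
    return b


N = 4
# cos x = 1 - x**2/2! + x**4/4! - ...  (odd coefficients are zero)
cos_coeffs = [Fraction(0) if k % 2 else Fraction((-1) ** (k // 2), factorial(k))
              for k in range(N + 1)]
print(reciprocal_series(cos_coeffs, N))   # coefficients of sec x up to x**4
```

The zero odd coefficients come out automatically, confirming the anticipation in (b) that only even powers occur.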

Self-test 5.3
Find the Taylor series for $(1 + x)^{1/2}e^{-x}$ as far as $x^3$.

5.6 Approximations for large values of x


When x is large, 1/x is small. This fact can sometimes be used to obtain approxima-
tions valid when x is large, as in the following example.

Example 5.8 Obtain a three-term approximation to $(1 + 1/x)^{1/2}$ valid when x is large enough.
Translate the binomial series (5.4f), with $\alpha = \tfrac12$, in terms of a neutral variable, say u:
$$(1 + u)^{1/2} = 1 + \tfrac12 u + \frac{\frac12(\frac12 - 1)}{2!}u^2 + \cdots \quad\text{when } -1 < u < 1,$$
so $(1 + u)^{1/2} \approx 1 + \tfrac12 u - \tfrac18 u^2$ when u is small enough, the approximation improving as u gets smaller. Now put u = 1/x; we obtain
$$\left(1 + \frac1x\right)^{1/2} \approx 1 + \frac{1}{2x} - \frac{1}{8x^2}$$
when x is large enough (positively or negatively), the approximation improving as |x| gets larger.
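A quick numerical check of Example 5.8, sketched in Python: the three-term approximation is compared with the exact square root for several large positive and negative x. The error should shrink roughly like $1/(16|x|^3)$, the size of the first neglected ($u^3$) term of the binomial series.

```python
import math


def approx(x):
    """Three-term large-|x| approximation to (1 + 1/x)**0.5 from Example 5.8."""
    return 1 + 1 / (2 * x) - 1 / (8 * x**2)


for x in (2, 10, 100, -10, -100):
    exact = math.sqrt(1 + 1 / x)
    print(f"x = {x:>5}: exact {exact:.8f}, approx {approx(x):.8f}, "
          f"error {abs(exact - approx(x)):.2e}")
```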

5.7 Taylor series about other points


The Taylor series about x = 0
$$f(x) = \frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots$$
does not work when x = 2: we get $1 + 2 + 2^2 + \cdots$, which is infinite. However, we can obtain a different Taylor-type series which represents 1/(1 − x) near x = 2 by a process that amounts to changing the origin, as in the following example.


Example 5.9 Find a Taylor-type series which represents 1/(1 − x) for values of x near x = 2.
Look for a series of this type:
$$\frac{1}{1-x} = b_0 + b_1(x - 2) + b_2(x - 2)^2 + \cdots,$$
because we want a series that works when x is close to 2, which is to say when x − 2 is small, rather than when x is small as before. Therefore we need a series consisting of powers of x − 2. We can bring the element x − 2 into view by writing
$$\frac{1}{1-x} = \frac{1}{1 - (x - 2 + 2)} = -\frac{1}{1 + (x - 2)}.$$
Now expand the final term by using (5.4f) (the binomial theorem) with α = −1, and x − 2 in place of x, obtaining
$$\frac{1}{1-x} = -1 + (x - 2) - (x - 2)^2 + (x - 2)^3 - \cdots,$$
valid if −1 < x − 2 < 1, that is if
$$1 < x < 3.$$

Example 5.10 Obtain a Taylor series about the point x = π for the function cos x.
There exists already the series (5.4d), which is valid at x = π. However, if we are interested in approximating to cos x near x = π, an expansion in powers of x − π should be more economical and expressive than one consisting of powers of x. We show two ways of finding the series.
(a) On the lines of Example 5.9. Write
$$\cos x = \cos[\pi + (x - \pi)] = \cos\pi\cos(x - \pi) - \sin\pi\sin(x - \pi) = -\cos(x - \pi).$$
We can use (5.4d) to expand this, by putting x − π in place of x. We obtain
$$\cos x = -\cos(x - \pi) = -1 + \frac{1}{2!}(x - \pi)^2 - \frac{1}{4!}(x - \pi)^4 + \cdots.$$
This is valid for all values of x. A two-term approximation shows that cos x has a parabolic shape near x − π = 0, or x = π, where cos x has a local minimum.
(b) Matching the value and the derivatives at x = π. The derivatives of f(x) = cos x at x = π are given by
$$f(\pi) = \cos\pi = -1, \qquad f^{(1)}(\pi) = -\sin\pi = 0, \qquad f^{(2)}(\pi) = -\cos\pi = 1,$$
and so on. The same relations hold good between the coefficients of a polynomial in powers of x − π and the values of its derivatives at x = π, as was stated in (5.3) for polynomials in x at x = 0. We simply put x − π in place of x in (5.3). The required Taylor series is
$$f(x) = f(\pi) + \frac{1}{1!}f^{(1)}(\pi)(x - \pi) + \frac{1}{2!}f^{(2)}(\pi)(x - \pi)^2 + \cdots,$$
which is the same as the result obtained in (a).
The general result is the following:

Taylor series about a point x = c
$$f(x) = f(c) + \frac{1}{1!}f^{(1)}(c)(x - c) + \frac{1}{2!}f^{(2)}(c)(x - c)^2 + \cdots. \tag{5.5}$$
(The range of validity depends upon f(x).)
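Formula (5.5) can be evaluated directly once the derivatives at x = c are known. The Python sketch below (`taylor_about` is our own name) truncates (5.5) after the supplied derivatives and applies it to cos x about c = π, using the derivative values from Example 5.10(b).

```python
import math


def taylor_about(derivs, c, x):
    """Truncation of (5.5): sum of derivs[k] * (x - c)**k / k!."""
    return sum(d * (x - c) ** k / math.factorial(k) for k, d in enumerate(derivs))


# cos, -sin, -cos, sin, cos evaluated at x = pi: -1, 0, 1, 0, -1 (up to rounding)
derivs_at_pi = [math.cos(math.pi), -math.sin(math.pi), -math.cos(math.pi),
                math.sin(math.pi), math.cos(math.pi)]

for x in (3.0, 3.5, 2.0):
    print(f"x = {x}: series {taylor_about(derivs_at_pi, math.pi, x):.8f}, "
          f"cos x {math.cos(x):.8f}")
```

Near x = π the five-term series is very accurate; at x = 2.0, about 1.14 away from π, the agreement is noticeably poorer, which is why an expansion centred near the point of interest is preferred.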

Self-test 5.4
Find the Taylor series of sin x about $x = \tfrac12\pi$, including the general term.

5.8 Indeterminate values; l’Hôpital’s rule


The relation
$$y = \frac{\sin x}{x}$$
specifies a value for y for all values of x except x = 0. At this point the formula gives y = 0/0, which is a meaningless or indeterminate expression. The graph of y against x therefore contains a gap at x = 0. However, as we approach x = 0 from either side, y may approach a single, finite value, y(0) say, that plugs the gap. If such a value exists, it is given by the limiting operation
$$y(0) = \lim_{x\to0}\frac{\sin x}{x},$$
which can be evaluated in the following way.
Use the Taylor series to write, for x ≠ 0,
$$\frac{\sin x}{x} = \frac{x - \frac{1}{3!}x^3 + \cdots}{x} = 1 - \frac{1}{3!}x^2 + \cdots,$$
after cancelling the factor x. This new expression has no peculiarities. Therefore we put x = 0 in the new series, obtaining
$$\lim_{x\to0}\frac{\sin x}{x} = 1,$$
and this is the missing value y(0).

Example 5.11 Obtain
$$\lim_{x\to0}\frac{\sin^2 3x}{1 - \cos x}.$$
For x = 0, we obtain 0/0, which is indeterminate. From (5.4c)
$$\sin(3x) = 3x + \text{higher powers},$$
so that
$$\sin^2(3x) = (3x)^2 + \text{higher powers}. \tag{i}$$
Also
$$1 - \cos x = 1 - \left(1 - \frac{1}{2!}x^2 + \text{higher powers}\right) = \frac{1}{2!}x^2 + \text{higher powers}. \tag{ii}$$
Therefore, from (i) and (ii), for x ≠ 0
$$\frac{\sin^2(3x)}{1 - \cos x} = \frac{9x^2 + \text{higher powers}}{\frac12 x^2 + \text{higher powers}} = \frac{9 + \text{positive powers of } x}{\frac12 + \text{positive powers of } x}, \tag{iii}$$
after cancelling the common factor $x^2$. This expression is not problematic; we may put x = 0 into it, obtaining
$$\lim_{x\to0}\frac{\sin^2(3x)}{1 - \cos x} = \frac{9}{\frac12} = 18.$$
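A numerical sanity check of Example 5.11, sketched in Python. Evaluating 1 − cos x directly loses accuracy for very small x because two nearly equal numbers are subtracted, so the sketch uses the identity 1 − cos x = 2 sin²(x/2) instead; this is a design choice of ours, not part of the example.

```python
import math


def ratio(x):
    """sin^2(3x) / (1 - cos x), with 1 - cos x rewritten as 2 sin^2(x/2)."""
    return math.sin(3 * x) ** 2 / (2 * math.sin(x / 2) ** 2)


for x in (0.1, 0.01, 0.001, 1e-6):
    print(f"x = {x:g}: ratio = {ratio(x):.10f}")
```

The printed values approach 18, and the deviation shrinks roughly in proportion to $x^2$, consistent with the "positive powers of x" in (iii).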

Observe that only the leading terms, or the dominant terms, of the Taylor series
are needed explicitly in order to obtain the limiting value.

Example 5.12 The function
$$\frac{\ln x}{\sin(\pi x)}$$
is indeterminate at x = 1. Obtain the limiting value as x → 1.
Here we require the dominant terms of the Taylor series centred on x = 1. Write
$$\ln x = \ln[1 + (x - 1)] = (x - 1) + \cdots,$$
from eqn (5.4e) in which the variable x is replaced by (x − 1). Also
$$\begin{aligned}\sin\pi x = \sin\pi[1 + (x - 1)] &= \sin\pi\cos\pi(x - 1) + \cos\pi\sin\pi(x - 1) \\ &= -\sin\pi(x - 1) = -\pi(x - 1) + \cdots.\end{aligned}$$
Therefore, for x ≠ 1,
$$\frac{\ln x}{\sin(\pi x)} = \frac{(x - 1) + \cdots}{-\pi(x - 1) + \cdots} = \frac{1 + \text{positive powers of }(x - 1)}{-\pi + \text{positive powers of }(x - 1)}.$$
The limit is obtained by putting x = 1 in the right-hand side, giving
$$\lim_{x\to1}\frac{\ln x}{\sin(\pi x)} = -\frac{1}{\pi}.$$

Suppose that we require the limit of the ratio f(x)/g(x) as x → a, but f(a) and g(a) are both zero. The dominant terms in the Taylor series can be expressed in terms of derivatives of f(x) and g(x) as in (5.5): since f(a) = g(a) = 0, we have $f(x) = f'(a)(x - a) + \cdots$ and $g(x) = g'(a)(x - a) + \cdots$, and the common factor x − a cancels. This leads to the following formal statement, known as l’Hôpital’s rule:

L’Hôpital’s rule
Let f(x), g(x) be represented by Taylor series at x = a. If f(a) = g(a) = 0, and g′(a) ≠ 0, then
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}. \tag{5.6}$$
If, in addition, f′(a) = g′(a) = 0 and g″(a) ≠ 0, then l’Hôpital’s rule generalizes to
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \frac{f''(a)}{g''(a)},$$
with obvious extensions.
For example, if f(x) = x and g(x) = sin x, then, since f(0) = g(0) = 0,
$$\lim_{x\to0}\frac{f(x)}{g(x)} = \lim_{x\to0}\frac{f'(x)}{g'(x)} = \lim_{x\to0}\frac{1}{\cos x} = 1.$$

Self-test 5.5
Using l’Hôpital’s rule find
$$\lim_{x\to0}\frac{e^x\sin x}{(1 - x)^{1/2} - 1}.$$

Problems

5.1 Obtain a four-term Taylor polynomial approximation valid near x = 0 for each of the following. Estimate the ranges of x over which three-term polynomials will give two-decimal accuracy (you cannot usually tell until you have seen the next-higher term).
(a) $e^{\frac12 x}$; (b) $(1 + x)^{1/2}$; (c) $(1 + x)^{-1/3}$; (d) sin 2x; (e) $\cos\tfrac12 x$; (f) ln(1 + x); (g) $\sin x^{1/2}$ (consider the series for sin u; then put $u = x^{1/2}$). This is not strictly a Taylor series (it is spoken of as ‘a Taylor series in $x^{1/2}$’), but it is still useful for approximations. (h) $\cos x^{1/2}$.

5.2 Verify the coefficients of each of the infinite Taylor series shown in (5.4) (taking the ranges of validity for granted); namely
(a) $e^x$; (b) sin x; (c) cos x; (d) $(1 + x)^\alpha$; (e) ln(1 + x).

5.3 For each of the following series give the Taylor polynomial having the lowest degree which you think will safely give four-decimal accuracy over the ranges given.
(a) $e^x$ over −2 ≤ x ≤ 2; (b) sin x over −2 ≤ x ≤ 2; (c) cos x over −2 ≤ x ≤ 2; (d) $(1 + x)^{1/2}$ over −0.5 ≤ x ≤ 0.5; (e) ln(1 + x) over −0.5 ≤ x ≤ 0.5.

5.4 Obtain the first two nonzero terms in the Taylor series at the origin for the following.
(a) arcsin x; (b) arccos x; (c) arctan x; (d) $e^{-x}\sin x$; (e) $e^{-x}\cos x$.

5.5 (See Section 5.5.) Find three nonzero terms in the Taylor series at x = 0 for the following functions and state the ranges of validity.
(a) 1/(1 + 3x); (b) 1/(2 − x); (c) $(3 - x)^{1/3}$; (d) $(x - 3)^{1/3}$; (e) ln(9 − x); (f) $\cos\tfrac12 x$; (g) $(1 + x^2)^{1/2}$ (hint: consider $(1 + u)^{1/2}$; then put $u = x^2$); (h) ln(1 + 3x) (see the hint in (g)).

5.6 (See Section 5.5.) Find the first three nonzero terms in the Taylor series at x = 0 for the following.
(a) $e^{-x}/(1 + x)$; (b) $(1 - x)^{1/2}e^x$; (c) $[\ln(1 - x)]^2/x^2$.

5.7 (See Example 5.7.) Find the first three nonzero terms in the Taylor expansions at the origin for the following.
(a) 1/[1 + ln(1 + x)]; (b) tan x; (c) $1/(1 + e^x)$; (d) tanh x, or $(e^x - e^{-x})/(e^x + e^{-x})$ (it is less complicated if you first reduce this to a more manageable form); (e) x/sin x.

5.8 (See Section 5.6.) Find a three-term approximation, valid for large enough values of x, in each of the following cases.
(a) $(1 - 1/x)^{1/2}$; (b) $(1 + 1/x)^{1/2}$; (c) $x^{1/2}/(1 + x)^{1/2}$; (d) $\ln(1 + x + x^2)$; (e) $1/\sin(x^{-1})$.

5.9 (a) Show that $1/\sin x \approx (1/x) + \tfrac16 x$ when x is nonzero but small enough.
(b) Show that, when x is large enough, $(1 + x)^{1/2} \approx x^{1/2} + 1/(2x^{1/2})$.
(c) Show that, when x is large enough, $(2 + x)^{1/2} - (1 + x)^{1/2} \approx 1/(2x^{1/2})$.
(d) Show that, when x is positive but small enough, $1/(1 - \cos x)^{1/2} \approx (\sqrt2/x) + (x\sqrt2/24)$.

5.10 (a) (See Section 5.7.) Write ln x = ln[1 + (x − 1)], and so obtain the Taylor series for ln x about x = 1. State the range of validity.
(b) Obtain the Taylor series about $x = \tfrac12\pi$ for cos x, and state the range of validity.
(c) Obtain the Taylor series about x = 1 for the function $(1 + x)^{1/2}$, and state the range of validity.

5.11 Suppose that f(x) has a stationary point at x = c. Write down the form of its Taylor series about x = c, taking this into account.
(a) By considering the first three terms, rediscover the conditions on f″(c) which determine the type of stationary point (see (4.2)).
(b) By considering further terms of the Taylor series, extend the criteria to obtain a general rule which covers the case f″(c) = 0.

5.12 The following expressions are indeterminate at x = 0. Where possible, obtain the appropriate values there which make up continuous functions.
(a) $(e^x - 1)/x$; (b) $(1 - \cos x)/x^2$; (c) $[\ln(1 + x) - x]/\sin x$; (d) sin x/(1 − cos x).

5.13 Obtain
(a) $\lim_{x\to0}\dfrac{(1 - x)^{12} - 1}{(1 - x)^{10} - 1}$; (b) $\lim_{x\to0}\dfrac{\sin x - x}{\sin x - x\cos x}$; (c) $\lim_{x\to\pi}\dfrac{\cos x + 1}{x - \pi}$; (d) $\lim_{x\to\frac12\pi}\dfrac{\sin x - 1}{\cos 5x}$.

5.14 Show that
$$\lim_{x\to0}\frac{e^x - 1}{e^x - 1 - x}$$
does not exist, but that the function values approach −∞ as x approaches zero from the left, and +∞ as x approaches from the right.

5.15 (a) Obtain $\lim_{x\to0}\sin^3(3x)/(1 - \cos x)$ by using the results of Examples 5.11 and 5.12.
(b) Obtain $\lim_{x\to0}[(e^x - 1)/x]^{1/2}$ by using the result of Problem 5.12a.
(c) Find
$$\lim_{x\to0}\frac{\sin x\,(2 + \tan x)}{x(3 - \tan^2 x)},$$
assuming that $\lim_{x\to0}(\sin x)/x = 1$.

5.16 Using l’Hôpital’s rule find $\lim_{x\to0}[(3x - \sin x)/x]$.

5.17 Using (5.4) identify the functions which have the following Taylor series:
(a) $\sum_{n=0}^{\infty} x^n\left[\dfrac{1}{n!} - (-1)^n\right]$; (b) $\sum_{n=1}^{\infty}\dfrac{x^{n+2}}{n}$; (c) $\sum_{n=0}^{\infty}\dfrac{x^{2n}}{(2n)!}$;
and indicate the values of x for which the series is valid.

5.18 By identifying the Taylor series find the sums of the following series.
(a) $2 - \dfrac{2^3}{3!} + \dfrac{2^5}{5!} - \dfrac{2^7}{7!} + \cdots$;
(b) $1 + \left(\tfrac12\right) + \dfrac{1}{2!}\left(\tfrac12\right)^2 + \dfrac{1}{3!}\left(\tfrac12\right)^3 + \cdots$;
(c) $1 - \tfrac14 + \tfrac1{16} - \tfrac1{64} + \tfrac1{256} - \cdots$.
6 Complex numbers

CONTENTS
6.1 Definitions and rules
6.2 The Argand diagram, modulus, conjugate
6.3 Complex numbers in polar coordinates
6.4 Complex numbers in exponential form
6.5 The general exponential form
6.6 Hyperbolic functions
6.7 Miscellaneous applications
Problems

The quadratic equation $x^2 - 2x + 2 = 0$ can be written in the form $[(x - 1)^2 - 1] + 2 = 0$ (using the method known as ‘completing the square’), which becomes $(x - 1)^2 = -2 + 1 = -1$. If we try to continue formally by taking the square root, we obtain x − 1 = ±√(−1), or x = 1 ± √(−1). But √(−1) is not a real number (see Section 1.1: there is no real number whose square is negative), so we conclude that the equation has no real solutions: put another way, the graph of $y = x^2 - 2x + 2$ never meets the axis y = 0.
This chapter explores the implications of introducing a single new, ‘imaginary’, unit into our number system, denoted conventionally by the letter i (in many engineering applications it is denoted by j). The most obvious property distinguishing it from a ‘real’ number is that $i^2 = -1$. Apart from this, we may handle i as we would any symbol for a real number in processes such as addition, multiplication, use of brackets, integer powers, differentiation with respect to a parameter and so on. Thus $i^4 = (i^2)(i^2) = (-1)(-1) = 1$, and $i(1 + i)^2 = i(1 + 2i + i^2) = i + 2i^2 + i\,i^2 = i + 2(-1) + i(-1) = -2$. Such expressions involving i are called complex numbers (or complex expressions). As mentioned above, the quadratic equation $x^2 - 2x + 2 = 0$ has no real solutions, but there are two complex solutions, x = 1 ± i. You should check this by substituting these two complex numbers into the left-hand side of the equation.
Complex numbers are a routine mathematical tool, and we shall use them later
in the book as occasions arise. In this chapter we show how to manipulate them,
and there is a small technical vocabulary whose terms you should memorize as
you go along.

6.1 Definitions and rules

A general complex number z can be written in the standard form z = x + iy, where x and y are real numbers. In this expression x is known as the real part of z, denoted by Re(z), and y (not iy) is called the imaginary part of z, denoted by Im(z). If y = 0, then z is a real number, and if x = 0 then z is called a pure imaginary number. Complex numbers that are not already in standard form, such as 1/(1 + i) or $e^{2i}$, can, in principle, be reduced to standard form.
We need to put together rules for manipulating complex numbers: the rules are natural consequences of the operations of addition, multiplication, etc. on numbers containing i. The only addition to these familiar processes is that, wherever $i^2$ appears, we substitute −1.

Example 6.1 Express $i^2$, $i^3$, $i^4$, $i^5$, and $i^6$ in standard form.
The standard forms are
$$i^2 = -1, \qquad i^3 = i^2 i = -1 \times i = -i.$$
Since −i can be written 0 + (−1)i, it follows that Re $i^3$ = 0 and Im $i^3$ = −1.
$$i^4 = i^2 i^2 = (-1)(-1) = 1, \qquad i^5 = i\,i^4 = i, \qquad i^6 = i\,i^5 = -1.$$

Example 6.2 Express the following in standard form, and state the real and
imaginary parts in each case:
(a) (2 + i) − (3 + 3i); (b) i(i + 2); (c) (1 − i)(1 + 2i); (d) (2 − 3i)(2 + 3i).
(a) (2 + i) − (3 + 3i) = 2 + i − 3 − 3i = −1 − 2i.
Real part = −1; imaginary part = −2.
(b) $i(i + 2) = i^2 + 2i = -1 + 2i$.
Real part = −1; imaginary part = 2.
(c) $(1 - i)(1 + 2i) = 1 + 2i - i - 2i^2 = 1 + 2i - i + 2 = 3 + i$.
Real part = 3; imaginary part = 1.
(d) $(2 - 3i)(2 + 3i) = 2^2 - (3i)^2 = 4 - 9i^2 = 4 + 9 = 13$.
Real part = 13; imaginary part = 0.
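Python has complex numbers built in and, following the engineering convention mentioned above, writes the imaginary unit as `j` (so 2 + 3i is `2 + 3j`). A minimal sketch repeating Example 6.2, where `.real` and `.imag` give the real and imaginary parts:

```python
# Example 6.2 with Python's built-in complex type (imaginary unit written j)
results = {
    "(a)": (2 + 1j) - (3 + 3j),
    "(b)": 1j * (1j + 2),
    "(c)": (1 - 1j) * (1 + 2j),
    "(d)": (2 - 3j) * (2 + 3j),
}
for label, z in results.items():
    print(label, z, "real part =", z.real, "imaginary part =", z.imag)
```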

Let z1 = x1 + iy1 and z2 = x2 + iy2. In formal terms, the principal rules are as
follows.

1. Equality Two complex numbers z1 and z2 are said to be equal if and only if
x1 = x2 and y1 = y2: we write z1 = z2. (This rule may seem too obvious to be worth
recording. However, it is called for very frequently in applications and you should
attend to it – especially the ‘only if’.)

2. Sum The sum of two complex numbers $z_1$ and $z_2$ is given by
$$z_1 + z_2 = (x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2), \tag{6.1}$$
its real part being the sum of the real parts, and its imaginary part the sum of the imaginary parts of $z_1$ and $z_2$.
3. Difference Similarly the difference $z_1 - z_2$ is
$$z_1 - z_2 = (x_1 + iy_1) - (x_2 + iy_2) = (x_1 - x_2) + i(y_1 - y_2).$$

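4. Product The product of $z_1$ and $z_2$ is
$$\begin{aligned} z_1z_2 &= (x_1 + iy_1)(x_2 + iy_2) = x_1x_2 + iy_1x_2 + x_1iy_2 + iy_1iy_2 \\ &= x_1x_2 + iy_1x_2 + ix_1y_2 + i^2y_1y_2 \\ &= x_1x_2 + iy_1x_2 + ix_1y_2 - y_1y_2 \qquad (\text{since } i^2 = -1) \\ &= (x_1x_2 - y_1y_2) + i(y_1x_2 + x_1y_2). \end{aligned}$$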

5. Conjugate In order to carry out division, a special result is needed. Suppose that z = x + iy, where x and y are real. Then the number x − iy, where we have changed i to −i, is called the complex conjugate of z, and will be written $\bar z$. The product $z\bar z$ is given by
$$z\bar z = (x + iy)(x - iy) = x^2 - (iy)^2 = x^2 + y^2,$$
which is a real number, positive unless z = 0. (This was illustrated in Example 6.2d.)

Property of the conjugate
Let z = x + iy, with x, y real. Then
$$\bar z = x - iy \quad\text{and}\quad z\bar z = x^2 + y^2. \tag{6.2}$$

6. Reciprocal The reciprocal of a complex number must be written in standard form. Let z = x + iy ≠ 0, and consider
$$\frac1z = \frac{1}{x + iy}.$$
This is not in standard form a + ib: to reduce it to standard form, multiply it by the factor
$$\frac{x - iy}{x - iy},$$
x − iy being the conjugate of x + iy. This factor is equal to 1, so it does not affect the value of 1/z. Hence
$$\frac1z = \frac{1}{x + iy}\,\frac{x - iy}{x - iy} = \frac{x - iy}{x^2 + y^2} = \frac{x}{x^2 + y^2} - i\frac{y}{x^2 + y^2} \qquad \text{(from (6.2))},$$
which is in standard form. This process makes the denominator real, and so enables us to reduce quotients to standard form.

7. Quotient The quotient rule is
$$\begin{aligned}\frac{z_1}{z_2} = \frac{x_1 + iy_1}{x_2 + iy_2} &= \frac{(x_1 + iy_1)(x_2 - iy_2)}{(x_2 + iy_2)(x_2 - iy_2)} = \frac{(x_1x_2 + y_1y_2) + i(x_2y_1 - x_1y_2)}{x_2^2 + iy_2x_2 - ix_2y_2 + y_2^2} \\ &= \frac{(x_1x_2 + y_1y_2) + i(x_2y_1 - x_1y_2)}{x_2^2 + y_2^2}.\end{aligned}$$

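Example 6.3 Reduce to standard form
(a) $\dfrac{1}{2 + 3i}$; (b) $\dfrac{1}{2 - 3i}$; (c) $\dfrac{1 - i}{1 + i}$; (d) $\dfrac{1}{i}$.
The standard forms are:
$$\text{(a)}\quad \frac{1}{2 + 3i} = \frac{1}{2 + 3i}\,\frac{2 - 3i}{2 - 3i} = \frac{2 - 3i}{2^2 + 3^2} = \frac{2}{13} - \frac{3}{13}i;$$
$$\text{(b)}\quad \frac{1}{2 - 3i} = \frac{1}{2 - 3i}\,\frac{2 + 3i}{2 + 3i} = \frac{2 + 3i}{2^2 + 3^2} = \frac{2}{13} + \frac{3}{13}i;$$
$$\text{(c)}\quad \frac{1 - i}{1 + i} = \frac{1 - i}{1 + i}\,\frac{1 - i}{1 - i} = \frac{1 - 2i + i^2}{1^2 + 1^2} = -i;$$
$$\text{(d)}\quad \frac{1}{i} = \frac{1}{i}\left(\frac{-i}{-i}\right) = -i.$$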

Example 6.4 Find the standard form of the complex numbers (a) $z_1 + z_2$, (b) $2z_1 - 3z_2$, (c) $z_1z_2$, (d) $z_1^2/z_2$, where $z_1 = -1 + 2i$ and $z_2 = 2 - 3i$.
(a) $z_1 + z_2 = (-1 + 2i) + (2 - 3i) = 1 - i$.
(b) $2z_1 - 3z_2 = 2(-1 + 2i) - 3(2 - 3i) = -2 + 4i - 6 + 9i = -8 + 13i$.
(c) $z_1z_2 = (-1 + 2i)(2 - 3i) = -2 + 4i + 3i - 6i^2 = -2 + 4i + 3i + 6 = 4 + 7i$.
(d) First
$$z_1^2 = (-1 + 2i)(-1 + 2i) = 1 - 4i + 4i^2 = -3 - 4i.$$
Then
$$\frac{z_1^2}{z_2} = \frac{-3 - 4i}{2 - 3i} = \frac{(-3 - 4i)(2 + 3i)}{(2 - 3i)(2 + 3i)} = \frac{-6 - 8i - 9i - 12i^2}{4 - 6i + 6i - 9i^2} = \frac{6 - 17i}{4 + 9} = \frac{6}{13} - \frac{17}{13}i.$$
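The quotient rule (rule 7) uses only real arithmetic, so it can be coded directly. The sketch below (our own function `quotient`, returning the real and imaginary parts as a pair) reproduces Example 6.4(d) from $z_1^2 = -3 - 4i$ and $z_2 = 2 - 3i$, and compares the answer with Python's built-in complex division.

```python
def quotient(x1, y1, x2, y2):
    """Real and imaginary parts of (x1 + i y1)/(x2 + i y2) by rule 7."""
    d = x2 * x2 + y2 * y2              # (x2 + i y2)(x2 - i y2), a real number
    if d == 0:
        raise ZeroDivisionError("cannot divide by 0 + 0i")
    return (x1 * x2 + y1 * y2) / d, (x2 * y1 - x1 * y2) / d


real_part, imag_part = quotient(-3, -4, 2, -3)     # z1**2 / z2 in Example 6.4(d)
print(real_part, imag_part, 6 / 13, -17 / 13)
print(complex(-3, -4) / complex(2, -3))            # built-in division for comparison
```

The same function gives Example 6.3(a) as `quotient(1, 0, 2, 3)`, that is 2/13 and −3/13.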

The conjugate $\bar z$ of z has the following further properties, which are simply applications of the rule:
to obtain the conjugate, change i to −i wherever it occurs (explicitly or implicitly).

Properties of the conjugate
(a) $\overline{z_1 + z_2} = \bar z_1 + \bar z_2$.
(b) $\overline{z_1z_2} = \bar z_1\bar z_2$.
(c) $\overline{\left(\dfrac{z_1}{z_2}\right)} = \dfrac{\bar z_1}{\bar z_2}$.
(d) $$x = \operatorname{Re} z = \tfrac12(z + \bar z), \qquad y = \operatorname{Im} z = \frac{1}{2i}(z - \bar z). \tag{6.3}$$
Property 6.3(c) was illustrated by Example 6.3a,b. Similarly, for example, the conjugate of
$$\frac{(2 + 3i)(4 + i)(3 - 2i)}{1 - 4i} \quad\text{is}\quad \frac{(2 - 3i)(4 - i)(3 + 2i)}{1 + 4i};$$
we do not have to work the whole thing out first and then find the conjugate. These rules become important later.

Self-test 6.1
Find the standard form of (a) $z_1z_2$; (b) $z_1/z_2$; (c) $(z_1^2/z_2) - (\bar z_1^2/\bar z_2)$, where $z_1 = 1 + 2i$ and $z_2 = 2 - i$.

6.2 The Argand diagram, modulus, conjugate


A complex number z = x + iy can be regarded as a pair of real numbers (x, y), known as an ordered pair. The pair of numbers can be interpreted as the cartesian coordinates of a point in the plane in the usual way. The complex number z = x + iy has an abscissa x and an ordinate y. In Fig. 6.1, the x axis is known as the real axis and the y axis is the imaginary axis. The number z = x + iy is represented by the point P : (x, y). A figure showing complex numbers is known as the Argand diagram of the complex numbers.

[Fig. 6.1 The Argand diagram: real axis (x) and imaginary axis (y), with the point P : (x, y) representing z = x + iy, OP = r = √(x² + y²), and θ the angle between OP and the real axis.]

The length $OP = r = \sqrt{x^2 + y^2} \ge 0$ is called the modulus of z (or simply ‘mod z’) and written |z|.
The polar angle θ (see Section 1.6) is called an argument of z, and is written arg z. As in Section 1.6, polar angles differing by a multiple of 2π are equivalent; that is to say, all such arguments define the same complex number, though some may be more convenient than others.

Example 6.5 Obtain |z| where (a) z = 2 + 3i; (b) z = 2 − 3i.
(a) $|z| = |2 + 3i| = (2^2 + 3^2)^{1/2} = \sqrt{13}$.
(b) $|z| = |2 + (-3)i| = [2^2 + (-3)^2]^{1/2} = \sqrt{13}$.
Thus |2 + 3i| = |2 − 3i|. (The modulus of a number and the modulus of its conjugate are always equal.)

Example 6.6 Let $z_1 = 1 - 3i$ and $z_2 = 3 - 2i$. Find
(a) $|z_1 + z_2|$; (b) $|z_1| + |z_2|$; (c) $|z_1z_2|$; (d) $|z_1||z_2|$; (e) $\left|\dfrac{z_1}{z_2}\right|$; (f) $\dfrac{|z_1|}{|z_2|}$.
(a) $z_1 + z_2 = (1 - 3i) + (3 - 2i) = 4 - 5i$ (this must be worked out in standard form first). Hence
$$|z_1 + z_2| = |4 - 5i| = [4^2 + (-5)^2]^{1/2} = \sqrt{41}.$$
(b) $|z_1| + |z_2| = [1^2 + (-3)^2]^{1/2} + [3^2 + (-2)^2]^{1/2} = \sqrt{10} + \sqrt{13}$.
(c) $|z_1z_2| = |(1 - 3i)(3 - 2i)| = |-3 - 11i| = [(-3)^2 + (-11)^2]^{1/2} = \sqrt{130}$.
(d) $|z_1||z_2| = |1 - 3i||3 - 2i| = \sqrt{10}\sqrt{13} = \sqrt{130}$ (i.e. the same as (c)).
(e)
$$\frac{z_1}{z_2} = \frac{1 - 3i}{3 - 2i}\,\frac{3 + 2i}{3 + 2i} = \frac{1}{13}(9 - 7i).$$
Therefore
$$\left|\frac{z_1}{z_2}\right| = \frac{1}{13}|9 - 7i| = \frac{1}{13}\sqrt{130} = \frac{\sqrt{10}}{\sqrt{13}}.$$
(f) $\dfrac{|z_1|}{|z_2|} = \dfrac{|1 - 3i|}{|3 - 2i|} = \dfrac{\sqrt{10}}{\sqrt{13}}$ (i.e. the same as (e)).

The following results hold good for the modulus; they are illustrated in Example 6.6 above.

Properties of the modulus
If z = x + iy (x and y real), then $|z| = (x^2 + y^2)^{1/2}$, and
(a) $|\bar z| = |z|$,
(b) $z\bar z = |z|^2$,
(c) $|z_1z_2| = |z_1||z_2|$,
(d) $|z_1/z_2| = |z_1|/|z_2|$,
(e) the distance between two points $z_1$ and $z_2$ is $|z_1 - z_2| = |z_2 - z_1|$. (6.4)
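The properties (6.4) can be checked numerically for the numbers of Example 6.6. In Python, `abs` gives the modulus of a complex number and the method `conjugate()` gives the conjugate; the sketch below prints each pair of quantities that (6.4) says should agree, together with the sum, which in general does not split.

```python
import math

z1, z2 = 1 - 3j, 3 - 2j                       # Example 6.6

print(abs(z1 + z2), abs(z1) + abs(z2))        # sqrt(41) versus sqrt(10) + sqrt(13)
print(abs(z1 * z2), abs(z1) * abs(z2))        # (6.4c): both sqrt(130)
print(abs(z1 / z2), abs(z1) / abs(z2))        # (6.4d): both sqrt(10)/sqrt(13)
print(z1 * z1.conjugate(), abs(z1) ** 2)      # (6.4b): both 10, up to rounding
print(abs(z1 - z2), abs(z2 - z1))             # (6.4e): distance between z1 and z2
```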

The identity (6.4b) follows directly from (6.2). We shall defer the proofs of (c) and (d), but their truth is illustrated in Example 6.6c,d,e,f. The modulus of a sum or difference cannot be split in this way: contrast the results of Examples 6.6a and 6.6b.
The sum of two complex numbers can be interpreted by the parallelogram law of addition in the Argand diagram, as in Fig. 6.2. Construct a parallelogram on OP1 and OP2, where P1 and P2 correspond to the complex numbers $z_1$ and $z_2$. The fourth corner P of the parallelogram represents the sum $z_1 + z_2$. This follows from the addition rule for complex numbers. If you know anything about vectors, you will recognize that complex numbers add like vectors.
The conjugate $\bar z = x - iy$ is the reflection of z in the x axis, shown in Fig. 6.3.
[Fig. 6.2 Parallelogram law of addition: P1 : z1 and P2 : z2 are adjacent corners, and P : z1 + z2 the opposite corner, of the parallelogram on OP1 and OP2.]
[Fig. 6.3 Argand diagram showing z : (x, y) and its conjugate $\bar z$ : (x, −y).]

Self-test 6.2
If z = 1 + i, plot the points on the Argand diagram of z, $\bar z$, $z^2$, $\bar z^2$, 2z, $z\bar z$.

6.3 Complex numbers in polar coordinates


In Fig. 6.1, r and θ obviously serve as polar coordinates, as defined in (1.20). As defined in Section 6.2, θ is called the argument of z and denoted by arg z. The same point P can be described by the angles θ + 2πn, where n = ±1, ±2, … . Usually we use the principal value of the argument, denoted by θ = Arg z (capital A), which satisfies −π < θ ≤ π, that is
$$-\pi < \operatorname{Arg} z \le \pi.$$
The pair of equations
$$\cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}$$
has exactly one solution for θ within this range (for z ≠ 0).
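In Python, `cmath.phase(z)` returns the angle computed as `math.atan2(z.imag, z.real)`, which lies between −π and π, and `cmath.polar(z)` returns the pair (|z|, phase). This matches the principal value Arg z for the numbers used here; one caveat (an artefact of floating point rather than of the mathematics) is that a negative real number stored with imaginary part −0.0 yields −π instead of π. The sketch uses the numbers of Example 6.7.

```python
import cmath
import math

examples = {"z1": 2j, "z2": -1 - 1j, "z3": complex(-2, 0),
            "z4": complex(0.5, 0.5 * math.sqrt(3))}

for name, z in examples.items():
    r, theta = cmath.polar(z)                  # modulus and principal argument
    print(f"{name}: |z| = {r:.6f}, Arg z = {theta / math.pi:+.6f} * pi")
```

The printed multiples of π should be 0.5, −0.75, 1 and about 0.333333, in agreement with the values found in Example 6.7.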

Example 6.7 Find the moduli and principal values of the arguments of the following complex numbers: (a) $z_1 = 2i$; (b) $z_2 = -1 - i$; (c) $z_3 = -2$; (d) $z_4 = \tfrac12 + \tfrac12 i\sqrt3$.
The moduli are given by:
(a) $|z_1| = |2i| = 2$;
(b) $|z_2| = |-1 - i| = \sqrt{1 + 1} = \sqrt2$;
(c) $|z_3| = |-2| = 2$;
(d) $|z_4| = \left|\tfrac12 + \tfrac12 i\sqrt3\right| = \sqrt{\tfrac14 + \tfrac34} = 1$.
[Fig. 6.4 Argand diagrams (a) to (d) locating 2i, −1 − i, −2 and ½ + ½i√3, with their arguments θ1, θ2, θ3, θ4.]

A sketch of the Argand diagram for the complex numbers helps to decide their arguments. Figure 6.4 shows their locations. Thus
(a) Arg $z_1 = \theta_1 = \tfrac12\pi$; (b) Arg $z_2 = \theta_2 = -\tfrac34\pi$; (c) Arg $z_3 = \theta_3 = \pi$; (d) Arg $z_4 = \theta_4 = \tfrac13\pi$.

In Fig. 6.1, the coordinates (x, y) and the polar coordinates (r, θ) are related by
$$x = r\cos\theta, \qquad y = r\sin\theta.$$
Hence the complex number z = x + iy can be written
$$z = r\cos\theta + ir\sin\theta = r(\cos\theta + i\sin\theta), \tag{6.5}$$
which is the polar form of the complex number z. Note that r ≥ 0.

Example 6.8 Express −1 + √3i in polar form.
Here
$$r = \sqrt{(-1)^2 + (\sqrt3)^2} = \sqrt{1 + 3} = 2$$
and θ is given by (see Fig. 6.5)
$$\cos\theta = -\tfrac12, \qquad \sin\theta = \tfrac12\sqrt3.$$
[Fig. 6.5 Argand diagram locating −1 + √3i, with its argument θ.]
Hence from Fig. 6.5, $\theta = \tfrac23\pi$, and
$$-1 + \sqrt3 i = 2\left(\cos\tfrac23\pi + i\sin\tfrac23\pi\right).$$

Example 6.9 Obtain (a) $|\cos\theta + i\sin\theta|$; (b) $|1/(\cos\theta + i\sin\theta)|$.
(a) $|\cos\theta + i\sin\theta| = (\cos^2\theta + \sin^2\theta)^{\frac{1}{2}} = 1$.
(b) $|1/(\cos\theta + i\sin\theta)| = 1/|\cos\theta + i\sin\theta|$ (by (6.4d))
$= 1$ (from (a)).

Self-test 6.3
If $z = 1 - \sqrt{3}i$, find the polar forms of $z$, $\bar{z}$, $2z$ and $z^2$.

6.4 Complex numbers in exponential form


Consider the function
$$f(\theta) = \cos\theta + i\sin\theta,$$
where $\theta$ can take any real value. Its derivative with respect to $\theta$ is
$$\frac{df(\theta)}{d\theta} = -\sin\theta + i\cos\theta = i(\cos\theta + i\sin\theta) = if(\theta).$$
Hence the derivative of $f(\theta)$ is 'proportional' to $f(\theta)$ itself, even though the constant of proportionality is the 'imaginary' number $i$. As we saw in Table (3.5), a function with this property is the exponential function $k\,e^{i\theta}$, where $k$ is a constant. We conclude that
$$\cos\theta + i\sin\theta = k\,e^{i\theta}$$
for some value of $k$. In particular, this must be true for $\theta = 0$, from which $k = 1$. Hence we obtain the important result ('Euler's formula')
$$e^{i\theta} = \cos\theta + i\sin\theta. \tag{6.6}$$
The conjugate formula, obtained by applying the rule of replacing $i$ by $-i$, is
$$e^{-i\theta} = \cos\theta - i\sin\theta. \tag{6.7}$$
From (6.5), any complex number can be written in the exponential form
$$z = r(\cos\theta + i\sin\theta) = r\,e^{i\theta},$$
with its conjugate
$$\bar{z} = r(\cos\theta - i\sin\theta) = r\,e^{-i\theta}.$$
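As an alternative justification of (6.6), we may use the Taylor series for the exponential and trigonometric functions in (5.4b,c,d). Formally, by putting $x = i\theta$ into (5.4b), we obtain
$$\begin{aligned}
e^{i\theta} &= 1 + \frac{1}{1!}(i\theta) + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \frac{1}{4!}(i\theta)^4 + \cdots \\
&= 1 + \frac{1}{1!}i\theta - \frac{1}{2!}\theta^2 - \frac{1}{3!}i\theta^3 + \frac{1}{4!}\theta^4 + \cdots \\
&= \left(1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots\right) + i\left(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots\right) \\
&= \cos\theta + i\sin\theta,
\end{aligned}$$
as required.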
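The series argument can be illustrated numerically. The Python sketch below sums the first 30 terms of the exponential series at $w = i\theta$ and compares the result with $\cos\theta + i\sin\theta$ and with the library's `cmath.exp`. The choices of 30 terms and $\theta = 0.7$ are arbitrary; this is a check of (6.6) at sample points, not a proof.

```python
import cmath
import math

def exp_taylor(w, terms=30):
    """Partial sum 1 + w + w^2/2! + ... of the exponential series, with `terms` terms."""
    total = 0j
    term = 1 + 0j              # current term w^k / k!, starting with k = 0
    for k in range(terms):
        total += term
        term *= w / (k + 1)    # w^(k+1)/(k+1)! obtained from w^k/k!
    return total

theta = 0.7                                        # arbitrary sample angle
series_value = exp_taylor(1j * theta)
euler_value = complex(math.cos(theta), math.sin(theta))
print(abs(series_value - euler_value))             # tiny: rounding error only
print(abs(series_value - cmath.exp(1j * theta)))   # likewise
```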

Properties of $e^{i\theta}$ for real $\theta$
(a) $e^{i\theta} = \cos\theta + i\sin\theta$,
(b) $|e^{i\theta}| = 1$,
(c) the conjugate of $e^{i\theta}$ is $e^{-i\theta}$. (6.8)

Exponential form for a complex number
$z = r\,e^{i\theta} = r\cos\theta + ir\sin\theta$,
where $r = |z|$ and $\theta$ is any value of $\arg z$. (6.9)

Example 6.10 Express the following in standard form: (a) $2e^{\frac{1}{2}\pi i}$; (b) $3e^{-\pi i}$; (c) $e^{3\pi i}$; (d) $2e^{-\frac{1}{2}\pi i}$; (e) $3e^{\frac{1}{4}\pi i}$.
Remember that, in $r\,e^{i\theta}$, the numbers $r$ and $\theta$ are polar coordinates. For these simple cases, we can therefore put the points straight on an Argand diagram and read off the coordinates, without needing to work out $\cos\theta$ and $\sin\theta$.
(a) $r = 2$, $\theta = \tfrac{1}{2}\pi$ (90°). Hence $2e^{\frac{1}{2}\pi i} = 2i$.
(b) $r = 3$, $\theta = -\pi$ (−180°). Hence $3e^{-\pi i} = -3$.
(c) $r = 1$, $\theta = 3\pi$. Hence $e^{3\pi i} = -1$.
(d) $r = 2$, $\theta = -\tfrac{1}{2}\pi$ (−90°). Hence $2e^{-\frac{1}{2}\pi i} = -2i$.
(e) $r = 3$, $\theta = \tfrac{1}{4}\pi$ (45°). Hence $3e^{\frac{1}{4}\pi i} = \tfrac{3}{2}\sqrt{2} + i\tfrac{3}{2}\sqrt{2}$.
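Reading off points as in Example 6.10 amounts to converting polar coordinates $(r, \theta)$ to standard form. In Python's standard library this conversion is `cmath.rect(r, theta)`, which returns $r\cos\theta + ir\sin\theta$; the sketch below applies it to the five cases. Expect tiny floating-point residues (of order $10^{-16}$) where the exact answer has a zero real or imaginary part.

```python
import cmath
import math

# (r, theta) for parts (a) to (e) of Example 6.10
cases = {
    "a": (2, math.pi / 2),
    "b": (3, -math.pi),
    "c": (1, 3 * math.pi),
    "d": (2, -math.pi / 2),
    "e": (3, math.pi / 4),
}

# cmath.rect(r, theta) returns r*cos(theta) + i*r*sin(theta)
results = {label: cmath.rect(r, theta) for label, (r, theta) in cases.items()}
for label, z in results.items():
    print(f"({label}) {z.real:.4f} {z.imag:+.4f}i")
```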


By treating (6.6) and (6.7) as two simultaneous equations for $\cos\theta$ and $\sin\theta$, it follows that
$$\cos\theta = \tfrac{1}{2}(e^{i\theta} + e^{-i\theta}), \qquad \sin\theta = \frac{1}{2i}(e^{i\theta} - e^{-i\theta}). \tag{6.10}$$

Equation (6.6) remains true if the angle $\theta$ is replaced by $n\theta$. When $n$ is an integer, $e^{in\theta} = (e^{i\theta})^n$, and hence we obtain De Moivre's theorem:
$$\cos n\theta + i\sin n\theta = e^{in\theta} = (e^{i\theta})^n = (\cos\theta + i\sin\theta)^n.$$

De Moivre's theorem
If $n$ is any integer and $\theta$ is real, then
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta. \tag{6.11}$$

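Complex numbers with the same modulus and with arguments $\theta$ and $\theta + 2n\pi$ are equal for all integer values of $n$, since $2\pi$ is a complete revolution on the Argand diagram. Thus
$$e^{i(\theta + 2n\pi)} = e^{i\theta}e^{2n\pi i} = e^{i\theta} \cdot 1 = e^{i\theta}.$$
If $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$, then the product is $z_1z_2 = r_1r_2e^{i(\theta_1 + \theta_2)}$: its modulus is the product of the moduli of $z_1$ and $z_2$, and its argument is the sum of their arguments.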
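Both De Moivre's theorem and the product rule can be checked numerically. In the Python sketch below, the angle $\theta = 0.4$, the exponents, and the moduli and arguments of $z_1$ and $z_2$ are arbitrary sample values. Note that `cmath.polar` returns the principal argument, so the sum of the arguments appears directly only when $\theta_1 + \theta_2$ already lies in $(-\pi, \pi]$, as it does here; otherwise the two values differ by a multiple of $2\pi$.

```python
import cmath
import math

theta = 0.4                                        # sample angle
unit = complex(math.cos(theta), math.sin(theta))   # cos(theta) + i sin(theta)

# De Moivre's theorem (6.11) for several integers n, including negative ones
for n in (-3, -1, 0, 2, 5):
    lhs = unit ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    assert abs(lhs - rhs) < 1e-12

# Product rule: moduli multiply and arguments add
z1 = cmath.rect(2.0, 0.3)   # r1 = 2,   theta1 = 0.3
z2 = cmath.rect(1.5, 1.1)   # r2 = 1.5, theta2 = 1.1
r, phi = cmath.polar(z1 * z2)
print(r, phi)               # approximately 3.0 and 1.4
```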

Example 6.11 Express the following complex numbers in exponential form with principal values of the arguments ($-\pi < \theta \le \pi$): (a) $i$; (b) $-5i$; (c) $-3$; (d) $4 - 4i$; (e) $3 - 4i$.
In each example, we put $r\cos\theta$ equal to the real part and $r\sin\theta$ equal to the imaginary part of the given complex number. In each case we find $\theta$ by plotting the point on an Argand diagram.
(a) $r\cos\theta = 0$, $r\sin\theta = 1$. Hence $r = 1$ and, in the interval $-\pi < \theta \le \pi$, we obtain $\theta = \tfrac{1}{2}\pi$. The exponential form is
$$i = e^{\frac{1}{2}i\pi}.$$
(b) $r\cos\theta = 0$, $r\sin\theta = -5$. Hence $r = 5$ and $\theta = -\tfrac{1}{2}\pi$. The exponential form is
$$-5i = 5e^{-\frac{1}{2}i\pi}.$$
(c) $r\cos\theta = -3$, $r\sin\theta = 0$. Hence $r = 3$ and $\theta = \pi$. The exponential form is
$$-3 = 3e^{i\pi}.$$
(d) $r\cos\theta = 4$, $r\sin\theta = -4$. Hence $r = \sqrt{16 + 16} = 4\sqrt{2}$, while $\theta = -\tfrac{1}{4}\pi$. Hence
$$4 - 4i = 4\sqrt{2}\,e^{-\frac{1}{4}i\pi}.$$
(e) $r\cos\theta = 3$, $r\sin\theta = -4$. Hence $r = \sqrt{9 + 16} = 5$, while the angle $\alpha$ is the principal value of the argument such that $\cos\alpha = \tfrac{3}{5}$, $\sin\alpha = -\tfrac{4}{5}$. The exponential form is
$$3 - 4i = 5e^{i\alpha},$$
where $\alpha = -53.1^\circ$, or $-0.927$ radians.
Example 6.12 By expressing $-1 + i$ in the form $r\,e^{i\theta}$, find $(-1 + i)^{-8}$ as a complex number in standard form.
First, $r = |-1 + i| = \sqrt{2}$. From its position on an Argand diagram, $\theta = 3 \times 45^\circ$, or $\tfrac{3}{4}\pi$ in radians. Therefore
$$-1 + i = \sqrt{2}\,e^{\frac{3}{4}\pi i}.$$
Then
$$(-1 + i)^{-8} = \left(\sqrt{2}\,e^{\frac{3}{4}\pi i}\right)^{-8} = (\sqrt{2})^{-8}e^{-6\pi i} = \tfrac{1}{16}e^{-6\pi i}.$$
On an Argand diagram the polar coordinates are $r = \tfrac{1}{16}$ and $\theta = -6\pi = -3(2\pi)$. This value of $\theta$, equivalent to three complete revolutions, puts us on the positive real axis again, so that
$$(-1 + i)^{-8} = \tfrac{1}{16}.$$
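The same calculation can be reproduced in Python: `cmath.polar` gives $r$ and $\theta$, the power is formed as $r^{-8}e^{-8i\theta}$ with `cmath.rect`, and the result is compared with Python's built-in complex power. This is only an illustration; both values equal $\tfrac{1}{16}$ up to rounding.

```python
import cmath
import math

z = complex(-1, 1)
r, theta = cmath.polar(z)                     # r = sqrt(2), theta = 3*pi/4

via_polar = cmath.rect(r ** -8, -8 * theta)   # r^(-8) e^{-8 i theta}
direct = z ** -8                              # built-in complex power, for comparison
print(via_polar, direct)                      # both 1/16 = 0.0625, up to rounding
```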

Self-test 6.4
If $z = -1 - i$, express $z^{10}$ in standard form.

6.5 The general exponential form


The advantage of the exponential form of a complex number is that it is particularly easy to differentiate, to integrate, and to combine with other exponentials, including ordinary (real) ones. We are not tied to the Argand diagram when we manipulate the exponentials, so we shall often use letters other than $r$ and $\theta$.

Example 6.13 Prove that
$$\cos(A + B) = \cos A\cos B - \sin A\sin B,$$
$$\sin(A + B) = \sin A\cos B + \cos A\sin B.$$
As with an ordinary exponential,
$$e^{i(A+B)} = e^{iA}e^{iB}.$$
In terms of the definition (6.8), this becomes
$$\begin{aligned}
\cos(A + B) + i\sin(A + B) &= (\cos A + i\sin A)(\cos B + i\sin B) \\
&= (\cos A\cos B - \sin A\sin B) + i(\sin A\cos B + \cos A\sin B).
\end{aligned}$$
The real and imaginary parts of the two sides of the equation must be respectively equal (Rule 1 of Section 6.1), and so we have the result immediately.

Example 6.14 Express $\cos 5\theta$ in terms of powers of $\cos\theta$.
Since
$$e^{5i\theta} = (e^{i\theta})^5,$$
it follows that
$$\cos 5\theta + i\sin 5\theta = (\cos\theta + i\sin\theta)^5.$$
Expand the right-hand side by the binomial theorem. Thus
$$\begin{aligned}
\cos 5\theta + i\sin 5\theta &= \cos^5\theta + 5\cos^4\theta\,(i\sin\theta) + 10\cos^3\theta\,(i\sin\theta)^2 + 10\cos^2\theta\,(i\sin\theta)^3 \\
&\quad + 5\cos\theta\,(i\sin\theta)^4 + (i\sin\theta)^5 \\
&= \cos^5\theta + 5i\cos^4\theta\sin\theta - 10\cos^3\theta\sin^2\theta - 10i\cos^2\theta\sin^3\theta \\
&\quad + 5\cos\theta\sin^4\theta + i\sin^5\theta.
\end{aligned}$$
Equate real parts on both sides of the equation:
$$\cos 5\theta = \cos^5\theta - 10\cos^3\theta\sin^2\theta + 5\cos\theta\sin^4\theta.$$
Finally, replace $\sin^2\theta$ by $1 - \cos^2\theta$ and simplify:
$$\begin{aligned}
\cos 5\theta &= \cos^5\theta - 10\cos^3\theta(1 - \cos^2\theta) + 5\cos\theta(1 - \cos^2\theta)^2 \\
&= 16\cos^5\theta - 20\cos^3\theta + 5\cos\theta.
\end{aligned}$$
(Incidentally, $\sin 5\theta$ can also be found in terms of powers of $\sin\theta$ by equating the imaginary parts.)
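The binomial expansion in Example 6.14 can be mirrored term by term. The Python sketch below (standard library; `math.comb` needs Python 3.8 or later) forms the real part of $(\cos\theta + i\sin\theta)^5$ from the binomial terms, evaluates $16\cos^5\theta - 20\cos^3\theta + 5\cos\theta$, and checks both against $\cos 5\theta$ at a few arbitrary angles. Agreement at sample points is a sanity check, not a proof.

```python
import math

def cos5_from_binomial(theta):
    """Real part of (cos t + i sin t)^5, summed term by term via the binomial theorem."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0j
    for k in range(6):
        # term C(5, k) * cos^(5-k) * (i sin)^k
        total += math.comb(5, k) * c ** (5 - k) * (1j * s) ** k
    return total.real

def cos5_polynomial(theta):
    """16 cos^5 - 20 cos^3 + 5 cos, the result of Example 6.14."""
    c = math.cos(theta)
    return 16 * c**5 - 20 * c**3 + 5 * c

for theta in (0.0, 0.3, 1.2, 2.5):
    target = math.cos(5 * theta)
    assert math.isclose(cos5_from_binomial(theta), target, abs_tol=1e-12)
    assert math.isclose(cos5_polynomial(theta), target, abs_tol=1e-12)
```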

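Example 6.15 Prove the result in (6.4d), that
$$\left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|}.$$
Put $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$, with $r_2 \neq 0$. Then
$$\frac{z_1}{z_2} = \frac{r_1e^{i\theta_1}}{r_2e^{i\theta_2}} = \frac{r_1}{r_2}e^{i(\theta_1 - \theta_2)}.$$
Since $|e^{i(\theta_1 - \theta_2)}| = 1$ by (6.8b), it follows that
$$\left|\frac{z_1}{z_2}\right| = \frac{r_1}{r_2} = \frac{|z_1|}{|z_2|},$$
as required.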

Consider the number $z$ where
$$z = c\,e^{p+iq},$$
and $p$, $q$ and $c > 0$ are real numbers. We have
$$z = c\,e^{p+iq} = c\,e^{p}e^{iq} = c\,e^{p}(\cos q + i\sin q)$$
($q$ is assumed to be in radians if it represents an angle). Therefore we have

The form $z = c\,e^{p+iq}$ ($p$, $q$, $c$ real)
(a) $|z| = |c|\,e^{p}$,
(b) $\arg z = q + 2n\pi$ ($n$ is any integer),
(c) $\operatorname{Re} z = c\,e^{p}\cos q$, $\operatorname{Im} z = c\,e^{p}\sin q$. (6.12)

Note that if $c < 0$, then (b) is different: since $c = |c|\,e^{i\pi}$, we have $\arg z = q + \pi + 2n\pi$.

In science and engineering, complex exponentials of the type in (6.12) are often used to describe oscillations of various kinds. Instead of $c\,e^{p+iq}$, the symbols that occur may look like
$$A\,e^{(-k+i\omega)t} \quad\text{or}\quad c\,e^{(\alpha+i\beta)t},$$
in which $t$ represents time. Recast $c\,e^{(\alpha+i\beta)t}$ by writing it as $c\,e^{\alpha t + i\beta t}$, and we have

The form $c\,e^{(\alpha+i\beta)t}$ ($\alpha$, $\beta$, $c$, $t$ real)
(a) $c\,e^{\alpha t}\cos\beta t = \operatorname{Re}\,c\,e^{(\alpha+i\beta)t}$,
(b) $c\,e^{\alpha t}\sin\beta t = \operatorname{Im}\,c\,e^{(\alpha+i\beta)t}$. (6.13)

Example 6.16 The damped vibration of a piece of machinery is described by $x = 0.01\,e^{-0.02t}\cos 15t$. Write this in the form $x = \operatorname{Re}(c\,e^{(\alpha+i\beta)t})$.
We have
$$x = 0.01\,e^{-0.02t}\cos 15t = 0.01\,e^{-0.02t}\operatorname{Re}e^{15it} = \operatorname{Re}(0.01\,e^{-0.02t}e^{15it}) = \operatorname{Re}\left(0.01\,e^{(-0.02+15i)t}\right).$$
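A quick numerical check of Example 6.16: the Python sketch below evaluates the original real expression and $\operatorname{Re}(0.01\,e^{(-0.02+15i)t})$ at a few sample times (chosen arbitrarily) and asserts that they agree.

```python
import cmath
import math

def x_original(t):
    return 0.01 * math.exp(-0.02 * t) * math.cos(15 * t)

def x_complex_form(t):
    # Re(c e^{(alpha + i beta) t}) with c = 0.01, alpha = -0.02, beta = 15
    return (0.01 * cmath.exp(complex(-0.02, 15) * t)).real

for t in (0.0, 0.1, 1.0, 7.3):
    assert math.isclose(x_original(t), x_complex_form(t), abs_tol=1e-14)
```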

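Example 6.17 The current $i(t)$ in a branch of a circuit is given by
$$i(t) = c\,e^{-kt}\sin(\omega t + \phi).$$
Write this in the form of (a) the imaginary part of a complex function; (b) the real part of a complex function.
(a) $i(t) = \operatorname{Im}(c\,e^{-kt}e^{i(\omega t + \phi)}) = \operatorname{Im}(c\,e^{(-k+i\omega)t + i\phi})$.
(b) Note that, if $z = x + iy$, then
$$y = \operatorname{Im} z = \operatorname{Re}(-iz).$$
Therefore, since $-i = e^{-\frac{1}{2}\pi i}$,
$$i(t) = \operatorname{Re}(-ic\,e^{(-k+i\omega)t + i\phi}) = \operatorname{Re}(c\,e^{-\frac{1}{2}\pi i}e^{(-k+i\omega)t + i\phi}) = \operatorname{Re}(c\,e^{(-k+i\omega)t + i(\phi - \frac{1}{2}\pi)}).$$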
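Both forms in Example 6.17 can be checked in the same way. The parameter values $c = 2$, $k = 0.3$, $\omega = 4$, $\phi = 0.8$ in the Python sketch below are hypothetical, chosen only for the check; the point is that the imaginary-part form (a) and the phase-shifted real-part form (b) both reproduce $c\,e^{-kt}\sin(\omega t + \phi)$.

```python
import cmath
import math

# Hypothetical parameter values, used only for this check
c, k, omega, phi = 2.0, 0.3, 4.0, 0.8

def current(t):
    return c * math.exp(-k * t) * math.sin(omega * t + phi)

def as_imag_part(t):
    # form (a): Im(c e^{(-k + i omega) t + i phi})
    return (c * cmath.exp(complex(-k, omega) * t + 1j * phi)).imag

def as_real_part(t):
    # form (b): Re(c e^{(-k + i omega) t + i (phi - pi/2)})
    return (c * cmath.exp(complex(-k, omega) * t + 1j * (phi - math.pi / 2))).real

for t in (0.0, 0.5, 2.0):
    assert math.isclose(current(t), as_imag_part(t), abs_tol=1e-12)
    assert math.isclose(current(t), as_real_part(t), abs_tol=1e-12)
```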
1 1

Self-test 6.5
Express cos 4θ in powers of cos θ.

6.6 Hyperbolic functions


The hyperbolic functions cosh and sinh are related to the trigonometric functions cos and sin. The hyperbolic functions were defined in Section 1.13 by
$$\cosh x = \tfrac{1}{2}(e^{x} + e^{-x}), \qquad \sinh x = \tfrac{1}{2}(e^{x} - e^{-x}).$$
It follows that
$$\cosh ix = \tfrac{1}{2}(e^{ix} + e^{-ix}) = \cos x, \qquad \sinh ix = \tfrac{1}{2}(e^{ix} - e^{-ix}) = i\sin x,$$
by (6.10). Similarly
$$\cos ix = \tfrac{1}{2}(e^{i^2x} + e^{-i^2x}) = \tfrac{1}{2}(e^{-x} + e^{x}) = \cosh x,$$
$$\sin ix = \frac{1}{2i}(e^{i^2x} - e^{-i^2x}) = -\tfrac{1}{2}i(e^{-x} - e^{x}) = i\sinh x.$$
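The four relations above can be confirmed numerically with Python's `cmath` module, whose `cosh`, `sinh`, `cos` and `sin` accept complex arguments. The sample values of $x$ are arbitrary.

```python
import cmath
import math

for x in (-1.3, 0.0, 0.5, 2.0):
    z = complex(0, x)                                      # the pure imaginary number ix
    assert abs(cmath.cosh(z) - math.cos(x)) < 1e-12        # cosh ix = cos x
    assert abs(cmath.sinh(z) - 1j * math.sin(x)) < 1e-12   # sinh ix = i sin x
    assert abs(cmath.cos(z) - math.cosh(x)) < 1e-12        # cos ix = cosh x
    assert abs(cmath.sin(z) - 1j * math.sinh(x)) < 1e-12   # sin ix = i sinh x
```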
Example 6.18 Solve the equation $\cosh z = -1$.
Since $\cosh z \ge 1$ for real $z$, the equation has no real roots, and we expect complex roots. In exponential form,
$$\tfrac{1}{2}(e^{z} + e^{-z}) = -1,$$
or
$$e^{2z} + 2e^{z} + 1 = 0,$$
or
$$(e^{z} + 1)^2 = 0.$$
Hence
$$e^{z} = -1 = e^{(2n+1)\pi i} \quad (n = 0, \pm 1, \pm 2, \ldots).$$
Here we have considered all the representations of the number $-1$ in the form $e^{(2n+1)\pi i}$. When finding all the roots of an equation, it is important to include all possible arguments, not just the principal one. By matching the exponents, the solutions are
$$z = (2n+1)\pi i \quad (n = 0, \pm 1, \pm 2, \ldots).$$
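A numerical spot check of Example 6.18, using only Python's standard `cmath` module: evaluate $\cosh z$ at $z = (2n+1)\pi i$ for a handful of integers $n$ (the range $-3 \le n \le 3$ is an arbitrary choice) and confirm that each value is $-1$ up to rounding.

```python
import cmath
import math

# z = (2n + 1) pi i for n = -3, ..., 3
roots = [complex(0, (2 * n + 1) * math.pi) for n in range(-3, 4)]
for z in roots:
    assert abs(cmath.cosh(z) + 1) < 1e-12   # cosh z = -1 up to rounding
print(len(roots), "roots checked")
```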

Self-test 6.6
Find all (complex) solutions of
$$\cosh^2 z - 3\cosh z + 2 = 0.$$

6.7 Miscellaneous applications


The polar form of complex numbers can be used to solve polynomial equations
as in the following example.

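Example 6.19 Find all solutions of the equation $z^5 = 4 - 4i$.
We first express $4 - 4i$ in polar form $\rho\,e^{i\alpha}$. Thus
$$\rho\cos\alpha = 4, \qquad \rho\sin\alpha = -4,$$
from which it follows, using an Argand diagram, that $\rho = \sqrt{32}$ and $\alpha = -\tfrac{1}{4}\pi + 2n\pi$. All the polar representations of $4 - 4i$ are given by
$$4 - 4i = 4\sqrt{2}\,e^{-\frac{1}{4}i\pi + 2n\pi i} = 2^{\frac{5}{2}}e^{i\pi(-\frac{1}{4} + 2n)} \quad (n = 0, \pm 1, \pm 2, \ldots).$$
Let $z = r\,e^{i\theta}$. Then
$$r^5e^{5i\theta} = 2^{\frac{5}{2}}e^{i\pi(-\frac{1}{4} + 2n)},$$
so that
$$r^5 = 2^{\frac{5}{2}} \quad\text{and}\quad 5\theta = \pi(-\tfrac{1}{4} + 2n),$$
with $n = 0, \pm 1, \pm 2, \ldots$. Therefore
$$r = \sqrt{2}, \qquad \theta = \tfrac{1}{20}(-1 + 8n)\pi.$$
Five successive values of $n$ give distinct solutions; other values of $n$ merely duplicate existing solutions. In full, the solutions (the five fifth roots of $4 - 4i$), corresponding to $n = 0, 1, 2, 3, 4$, are
$$\sqrt{2}\,e^{-\frac{1}{20}i\pi}, \quad \sqrt{2}\,e^{\frac{7}{20}i\pi}, \quad \sqrt{2}\,e^{\frac{3}{4}i\pi}, \quad \sqrt{2}\,e^{\frac{23}{20}i\pi}, \quad \sqrt{2}\,e^{\frac{31}{20}i\pi}.$$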
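The five roots of Example 6.19 can be generated directly from the polar form. The Python sketch below takes $\rho$ and $\alpha$ from `cmath.polar`, forms $\rho^{1/5}e^{i(\alpha + 2n\pi)/5}$ for $n = 0, \ldots, 4$, and checks that each satisfies $z^5 = 4 - 4i$. Because `cmath.phase` reports principal arguments, the last two roots are printed with arguments $-\tfrac{17}{20}\pi$ and $-\tfrac{9}{20}\pi$, which are equivalent to $\tfrac{23}{20}\pi$ and $\tfrac{31}{20}\pi$.

```python
import cmath
import math

w = complex(4, -4)
rho, alpha = cmath.polar(w)   # rho = sqrt(32), alpha = -pi/4

# z_n = rho^(1/5) * exp(i (alpha + 2 n pi) / 5) for n = 0, 1, 2, 3, 4
roots = [cmath.rect(rho ** 0.2, (alpha + 2 * n * math.pi) / 5) for n in range(5)]

for z in roots:
    assert abs(z ** 5 - w) < 1e-9   # each really is a fifth root of 4 - 4i
    print(f"|z| = {abs(z):.6f}, arg z / pi = {cmath.phase(z) / math.pi:+.4f}")
```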
The exponential form (6.8a) can be used to express $\cos n\theta$ and $\sin n\theta$ in terms of powers of $\cos\theta$ and $\sin\theta$, as we saw in Example 6.14. Conversely, we can express $\cos^n\theta$ and $\sin^n\theta$ in terms of cosines and sines of multiple angles.
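Example 6.20 Expand $\cos^6\theta$ in terms of multiple angles.
Let $z = \cos\theta + i\sin\theta$. By De Moivre's theorem, with $n$ an integer,
$$z^n = \cos n\theta + i\sin n\theta, \qquad \frac{1}{z^n} = \cos n\theta - i\sin n\theta.$$
By adding these two results, it follows that
$$\cos n\theta = \frac{1}{2}\left(z^n + \frac{1}{z^n}\right). \tag{6.14}$$
In particular, $2\cos\theta = z + 1/z$. Hence
$$\begin{aligned}
(2\cos\theta)^6 &= \left(z + \frac{1}{z}\right)^6 = z^6 + 6z^4 + 15z^2 + 20 + \frac{15}{z^2} + \frac{6}{z^4} + \frac{1}{z^6} \\
&= \left(z^6 + \frac{1}{z^6}\right) + 6\left(z^4 + \frac{1}{z^4}\right) + 15\left(z^2 + \frac{1}{z^2}\right) + 20 \\
&= 2\cos 6\theta + 12\cos 4\theta + 30\cos 2\theta + 20,
\end{aligned}$$
by repeated use of (6.14). Finally, dividing by $2^6 = 64$,
$$\cos^6\theta = \tfrac{1}{32}\cos 6\theta + \tfrac{3}{16}\cos 4\theta + \tfrac{15}{32}\cos 2\theta + \tfrac{5}{16}.$$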
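The result of Example 6.20 can be spot-checked in Python. The sketch also prints the binomial coefficients $\binom{6}{k}$ used in expanding $(z + 1/z)^6$; the test angles are arbitrary.

```python
import math

def cos6_multiple_angles(theta):
    """Right-hand side of the final result of Example 6.20."""
    return (math.cos(6 * theta) / 32 + 3 * math.cos(4 * theta) / 16
            + 15 * math.cos(2 * theta) / 32 + 5 / 16)

print([math.comb(6, k) for k in range(7)])   # [1, 6, 15, 20, 15, 6, 1]

for theta in (0.0, 0.4, 1.1, 2.9):
    assert math.isclose(math.cos(theta) ** 6, cos6_multiple_angles(theta), abs_tol=1e-12)
```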

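We can also use the polar form to sum certain series, as in the following example.

Example 6.21 Find the sum of the series
$$f(\theta) = 1 + \cos\theta + \frac{\cos 2\theta}{2!} + \frac{\cos 3\theta}{3!} + \cdots.$$
In summation notation, the series can be written
$$f(\theta) = \sum_{n=0}^{\infty}\frac{\cos n\theta}{n!}.$$
Since $\cos n\theta = \operatorname{Re}e^{ni\theta}$, consider the series
$$S(\theta) = 1 + e^{i\theta} + \frac{e^{2i\theta}}{2!} + \frac{e^{3i\theta}}{3!} + \cdots,$$
the real part of whose sum is the required sum $f(\theta)$. Thus
$$\begin{aligned}
S(\theta) &= 1 + e^{i\theta} + \frac{(e^{i\theta})^2}{2!} + \frac{(e^{i\theta})^3}{3!} + \cdots \\
&= \exp(e^{i\theta}) = e^{\cos\theta + i\sin\theta} = e^{\cos\theta}[\cos(\sin\theta) + i\sin(\sin\theta)],
\end{aligned}$$
using the formula (5.4b) for the power series of the exponential function. Thus
$$f(\theta) = \operatorname{Re}S(\theta) = e^{\cos\theta}\cos(\sin\theta).$$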
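Example 6.21 can be checked by comparing partial sums with the closed form. In the Python sketch below, 40 terms and the sample angles are arbitrary choices; since the terms are bounded by $1/n!$, 40 terms are far more than enough for double precision.

```python
import math

def f_partial(theta, terms=40):
    """Partial sum of sum_{n >= 0} cos(n theta) / n!."""
    return sum(math.cos(n * theta) / math.factorial(n) for n in range(terms))

def f_closed(theta):
    """Closed form e^{cos theta} cos(sin theta) from Example 6.21."""
    return math.exp(math.cos(theta)) * math.cos(math.sin(theta))

for theta in (0.0, 0.8, 2.0, 3.0):
    assert math.isclose(f_partial(theta), f_closed(theta), abs_tol=1e-12)
```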
Self-test 6.7
Find the sum of the series
$$S(\theta) = 1 - \frac{\cos 2\theta}{2!} + \frac{\cos 4\theta}{4!} - \cdots.$$

Problems

6.1 (Section 6.1). Find the solutions of the following quadratic equations: (a) $x^2 + 2x + 5 = 0$; (b) $x^2 - 6x + 10 = 0$; (c) $x^2 + 2ix + 3 = 0$.

6.2 (Section 6.1). Find all the complex solutions of $x^4 + 3x^2 - 4 = 0$.

6.3 (Section 6.1). Express the following complex numbers in standard form: (a) $(1 - i) + (3 + 4i)$; (b) $2(3 - i) + 3(-1 - i)$; (c) $3(-1 + i) - 4(2 - 3i)$; (d) $3(1 + i)(2 - i)$; (e) $\dfrac{2 + i}{3 - i}$; (f) $\dfrac{(2 + i)(7 + 5i)}{3 - i}$; (g) $(-1 + 2i)^2$; (h) $(-1 + 2i)^2 + \dfrac{1}{(-1 + 2i)^2}$; (i) $(1 + i)^5$.

6.4 (Section 6.1). Find the boundary curve in the $(p, q)$ plane which separates the $(p, q)$ values giving real roots from those giving complex roots of the quadratic equation
$$x^2 + px + q = 0,$$
where $p$ and $q$ are real parameters. Of the real solutions, where in the $(p, q)$ plane do those for which both roots are negative lie?

6.5 (Section 6.1). Let $z_1 = 3 - i$ and $z_2 = 1 + 2i$. Find, in standard form, the complex numbers (a) $z_1 + z_2$; (b) $z_1z_2$; (c) $z_1/z_2$; (d) $z_1/z_2^2$.

6.6 (Section 6.1). Let $z_1 = 2 + 3i$ and $z_2 = -2 + i$. Find the following complex conjugates: (a) $\overline{z_1 + z_2}$; (b) $\overline{z_1z_2}$; (c) $\overline{z_1/z_2}$; (d) $\bar{z}_1/\bar{z}_2$.

6.7 Let $z = 1 + i$. Find the following complex numbers in standard form and plot their corresponding points in the Argand diagram: (a) $\bar{z}$; (b) $z^2$; (c) $\bar{z}^2$; (d) $1/\bar{z}$; (e) $z/\bar{z}$.

6.8 (Section 6.5). Three complex numbers are given by $z_1 = 2e^{1+i}$, $z_2 = 3e^{-i}$ and $z_3 = \tfrac{1}{2}e^{-1+2i}$. Express the following complex numbers in standard form: (a) $z_1 + z_2 + z_3$; (b) $z_1z_2 + \bar{z}_3$; (c) $z_1z_2z_3$; (d) $z_1\bar{z}_2/z_3$; (e) $z_1^2z_2^2 - 2z_3^2$.

6.9 (Section 6.3). Find the modulus and principal argument of each of the following complex numbers: (a) $z_1 = -2 + 2i$; (b) $z_2 = 4 - 4\sqrt{3}i$; (c) $z_3 = -5i$; (d) $z_4 = -3$; (e) $z_5 = 3 + 4i$.

6.10 (Section 6.2). Let $z = x + iy$. Express each of the following equations in the complex variable $z$ in real form in terms of $x$ and $y$. Sketch, and identify in each case, the corresponding curve in the Argand diagram: (a) $z\bar{z} = 1$; (b) $\operatorname{Im} z = 2$; (c) $|z - a| = 1$, where $a$ is a complex number; (d) $(z - \bar{z})^2 = -8(z + \bar{z})$; (e) $|z - 1| + |z + 1| = 4$; (f) $\operatorname{Arg} z = \tfrac{1}{4}\pi$ (see Section 6.3); (g) $|z| = \arg z$.

6.11 (Section 6.4). Express the following complex numbers in exponential form with principal values of the arguments: (a) $-1 + i$; (b) $-2$; (c) $-3i$; (d) $7 - 7i\sqrt{3}$; (e) $(1 - i)(1 + i\sqrt{3})$; (f) $\dfrac{1 - i}{1 + \sqrt{3}}$; (g) $e^{2+i}$; (h) $(1 + i)e^{2i}$; (i) $(1 - i\sqrt{3})^9$; (j) $\dfrac{(1 + i)^4}{2 - 2i}$.

6.12 (Section 6.3). Using Euler's formula (6.8) for $e^{\pm i\theta}$, obtain the trigonometric identities for $\cos(\theta_1 \pm \theta_2)$ and $\sin(\theta_1 \pm \theta_2)$.

6.13 (Section 6.2). Using the parallelogram rule, sketch the locations in the Argand diagram, for general complex numbers $z_1$ and $z_2$, of the following points: $z_1 + z_2$, $\bar{z}_1 + \bar{z}_2$, $z_1 - z_2$, $\bar{z}_1 + z_2$, $z_1 - \bar{z}_2$.

6.14 Let $f(\theta) = \cos\theta + i\sin\theta$. Verify that
$$\frac{d^2f(\theta)}{d\theta^2} = -f(\theta).$$
Show that it is still true if
$$f(\theta) = a\cos\theta + b\sin\theta,$$
where $a$ and $b$ are arbitrary complex numbers.

6.15 Prove that $\tan ia = i\tanh a$, where $a$ is a real number.

6.16 Find all the complex solutions of the following equations: (a) $\cosh z = 1$; (b) $\sinh z = 1$; (c) $e^z = -1$; (d) $\cos z = \sqrt{2}$.

6.17 The logarithm of a complex number $z = r\,e^{i\theta}$ is defined by
$$\log z = \ln r + i\theta + 2n\pi i,$$
where $n$ is any integer, since $e^{\ln r + i\theta + 2n\pi i} = e^{\ln r + i\theta}e^{2n\pi i}$ and $e^{2n\pi i} = 1$. Therefore $\log z$ is a multi-valued function. The principal value of the logarithm is denoted by $\operatorname{Log} z$ (note the capital letter L), and is defined by
$$\operatorname{Log} z = \ln r + i\theta,$$
where $-\pi < \theta \le \pi$. (a) Find the principal value $\operatorname{Log}(1 + i\sqrt{3})$, and indicate its location on the Argand diagram. (b) Find all roots of the equation $\log z = \pi i$. (c) Express $\operatorname{Log}(e^{i})$ in standard form. (d) Show that $e^{\log z} = z$.

6.18 If $z \neq 0$ and $c$ are complex numbers, then $z^c$ is defined by
$$z^c = e^{c\ln z}$$
(see Problem 6.17). (a) Express $2^i$ in standard form. (b) Find the principal value of the argument of $i^i$. (c) Find all complex roots of $z^i = -1$.

6.19 (Section 6.4). Find all complex solutions of $z^5 = -1$, and sketch their locations on the Argand diagram.

6.20 (Section 6.5). Find the modulus, argument, and real and imaginary parts of each of the following complex numbers: (a) $2e^{3+2i}$; (b) $4e^{i}$; (c) $5e^{\cos\frac{1}{4}\pi + i\sin\frac{1}{4}\pi}$; (d) $e^{1+i}$.

6.21 (Section 6.5). An oscillation in a system is given by $x = 0.04\,e^{-0.01t}\sin 12t$. Write this in the form $x = \operatorname{Re}(c\,e^{(\alpha+i\beta)t})$.

6.22 (Section 6.5). The current in a branch of a circuit is given by
$$i(t) = c\,e^{-0.05t}\sin(0.4t + 0.5).$$
Write this in the form of the real part of a complex function.

6.23 A function $f(z)$, where $z = x + iy$, is known as a function of the complex variable $z$. Find the real and imaginary parts of the following functions in terms of $x$ and $y$: (a) $z^2$; (b) $z + 2z^2 + 3z^3$; (c) $\sin z$; (d) $\cos z$; (e) $e^z\cos z$; (f) $e^{z^2}$.

6.24 Let $w = f(z)$, where $z = x + iy$ and $w = u + iv$ are complex variables. If $f(z) = z^2$, find $u$ and $v$ in terms of $x$ and $y$. The relation represents a mapping between two Argand diagrams. What curves do the hyperbolas $x^2 - y^2 = 1$ and $xy = 1$ map into in the $(u, v)$ plane?

6.25 Show that the mapping (see Problem 6.24)
$$w = z + \frac{c}{z},$$
where $z = x + iy$, $w = u + iv$ and $c$ is a real number, maps the circle $|z| = 1$ in the $z$ plane into an ellipse in the $w$ plane, and find its equation.

6.26 (Section 6.7). Show that
$$\cos^6\theta = \tfrac{1}{32}(\cos 6\theta + 6\cos 4\theta + 15\cos 2\theta + 10),$$
and find $\sin^6\theta$.

6.27 The damped oscillation of a vibrating block is given by
$$x = \operatorname{Re} z, \qquad z = e^{(-0.2+0.5i)t},$$
in terms of the time $t$. Find $x$, and determine the values of $t$ where $x$ is zero. Find the velocity of the block (a) as $dx/dt$; (b) as $\operatorname{Re}\,dz/dt$; and confirm that the answers are the same.

6.28 Given that $2 + i$ is a solution of the equation
$$z^4 - 2z^3 - z^2 + 2z + 10 = 0,$$
find the other three solutions.

6.29 Find the sum of the series (a) $1 - \sin\theta + \dfrac{\sin 2\theta}{2!} - \dfrac{\sin 3\theta}{3!} + \cdots$; (b) $1 + 2\cos\theta + \dfrac{2^2\cos 2\theta}{2!} + \dfrac{2^3\cos 3\theta}{3!} + \cdots$.
Part 2
Matrix and vector algebra

7 Matrix algebra

CONTENTS
7.1 Matrix definition and notation
7.2 Rules of matrix algebra
7.3 Special matrices
7.4 The inverse matrix
Problems

In many applications in physics and engineering it is useful to represent and manipulate data in tabular or array form. A rectangular array which obeys certain algebraic rules of operation is known as a matrix. Matrices are important in the representation and solution of linear algebraic equations, as we shall see in later chapters.

7.1 Matrix definition and notation


Capital letters are usually used to denote matrices. Thus
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & -4 \end{bmatrix}$$
is a matrix with two rows and three columns. The individual terms are known as elements: the element in the second row and third column is $-4$. This matrix is said to be of order $2 \times 3$, or a $2 \times 3$ matrix. A general $m \times n$ matrix, one with $m$ rows and $n$ columns, can be represented by the notation
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}] \quad (1 \le i \le m,\ 1 \le j \le n),$$
where $a_{ij}$ is the element in the $i$th row and $j$th column of $A$; or by
$$A = [a_{ij} : i = 1, \ldots, m;\ j = 1, \ldots, n],$$
or simply
$$A = [a_{ij}]$$
for brevity, if it is clear from the context that the matrix is $m \times n$.
A $1 \times 1$ matrix is simply a number: for example, $[-5] = -5$. Matrices which have either one row or one column are known as vectors. Thus
$$\begin{bmatrix} 1.3 & -1.1 & 2.9 & 4.6 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} -1.1 \\ 6.5 \\ -2.0 \end{bmatrix}$$
are respectively row and column vectors.
A matrix in which the number of rows $m$ equals the number of columns $n$ is called a square matrix; if $m \neq n$, the matrix is said to be rectangular.

Self-test 7.1
The elements of a matrix are given by
$$a_{ij} = (-i)^j \quad (i = 1, 2, 3;\ j = 1, 2, 3).$$
Write out the matrix in full.

7.2 Rules of matrix algebra


We need to define consistent algebraic rules for manipulating matrices, covering processes such as addition and multiplication. As we shall see, these rules have their origins in the representation of linear equations and linear transformations, but for the present we simply state them as a list of rules.

1. Equality. Two matrices can only be equated if they are of the same order, that is, if they have the same number of rows and the same number of columns. They are then said to be equal if the corresponding elements are equal. Thus if
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} e & f \\ g & h \end{bmatrix},$$
then $A = B$ if and only if $a = e$, $b = f$, $c = g$ and $d = h$. In general, if $A = [a_{ij}]$ and $B = [b_{ij}]$ are both $m \times n$ matrices, then $A = B$ if and only if $a_{ij} = b_{ij}$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.

Example 7.1 Solve the equation $A = B$ when
$$A = \begin{bmatrix} x & 1 & 2 \\ 0 & x^2 - y & 3 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 3 \end{bmatrix}.$$
Since $A$ and $B$ must have the same elements, it follows that $x = 1$ and $x^2 - y = 2$. Hence $y = x^2 - 2 = -1$. The solution is $x = 1$, $y = -1$.
2. Multiplication by a constant. Let $k$ be a constant or scalar (an alternative name for a number in algebra). By the product $kA$ we mean the matrix in which every element of $A$ is multiplied by $k$. Thus, if
$$A = \begin{bmatrix} 2.0 & 1.5 & 3.1 \\ -1.2 & 3.0 & -4.6 \end{bmatrix}$$
and $k = 10$, then
$$kA = \begin{bmatrix} 10 \times 2.0 & 10 \times 1.5 & 10 \times 3.1 \\ 10 \times (-1.2) & 10 \times 3.0 & 10 \times (-4.6) \end{bmatrix} = \begin{bmatrix} 20 & 15 & 31 \\ -12 & 30 & -46 \end{bmatrix}.$$
Equally, we can 'factorize' a matrix. Thus
$$A = \begin{bmatrix} 5 & 25 & -30 \\ 10 & 15 & -5 \end{bmatrix} = 5\begin{bmatrix} 1 & 5 & -6 \\ 2 & 3 & -1 \end{bmatrix}.$$

3. Zero matrix. Any matrix in which every element is zero is called a zero or null
matrix. If A is a zero matrix, we can simply write A = 0.

4. Matrix sums and differences. The sum of two matrices A and B has meaning
only if A and B are of the same order, in which case A + B is defined as the matrix
C whose elements are the sums of the corresponding elements in A and B. We
write C = A + B. Thus, if A = [aij] and B = [bij] are both m × n matrices, then
C = A + B = [aij + bij].

To summarize:
If $A = [a_{ij}]$ and $B = [b_{ij}]$ ($1 \le i \le m$; $1 \le j \le n$), then
1. Equality: $A = B$ if and only if $a_{ij} = b_{ij}$;
2. Scalar factor $k$: $kA = [ka_{ij}]$;
3. Zero: $A = 0$ if and only if $a_{ij} = 0$ for all $i$, $j$;
4. Sum and difference: $A + B = [a_{ij} + b_{ij}]$, $A - B = [a_{ij} - b_{ij}]$. (7.1)

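Example 7.2 If
$$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -4 & -6 \\ -5 & -5 \\ -6 & -4 \end{bmatrix},$$
find $A + B$, $B + A$ and $A + 2B$.
We have
$$A + B = \begin{bmatrix} 1 - 4 & 3 - 6 \\ 2 - 5 & 2 - 5 \\ 3 - 6 & 1 - 4 \end{bmatrix} = \begin{bmatrix} -3 & -3 \\ -3 & -3 \\ -3 & -3 \end{bmatrix} = -3\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \quad \text{(by Rule 2)}.$$
Also
$$B + A = \begin{bmatrix} -4 + 1 & -6 + 3 \\ -5 + 2 & -5 + 2 \\ -6 + 3 & -4 + 1 \end{bmatrix} = \begin{bmatrix} -3 & -3 \\ -3 & -3 \\ -3 & -3 \end{bmatrix} = A + B.$$
Further
$$\begin{aligned}
A + 2B &= \begin{bmatrix} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{bmatrix} + 2\begin{bmatrix} -4 & -6 \\ -5 & -5 \\ -6 & -4 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 2 & 2 \\ 3 & 1 \end{bmatrix} + \begin{bmatrix} -8 & -12 \\ -10 & -10 \\ -12 & -8 \end{bmatrix} \quad \text{(by Rule 2)} \\
&= \begin{bmatrix} -7 & -9 \\ -8 & -8 \\ -9 & -7 \end{bmatrix}.
\end{aligned}$$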

As the second sum suggests, the commutative property of the real numbers,
namely aij + bij = bij + aij, implies the commutative property of matrix addition,
that is
A + B = B + A.
The difference of two matrices is written as A − B which is interpreted as
A + (−1)B, using Rule 2 for the multiplication of B by the number −1, and then
Rule 4 for the sum of A and (−1)B. In practice, we simply take the difference of
corresponding elements.

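Example 7.3 Find $A - B$ and $2A - 3B$, if
$$A = \begin{bmatrix} 1 & -1 & 2 \\ 0 & 2 & -3 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -2 & -3 \\ 1 & 0 & -1 \end{bmatrix}.$$
We have
$$A - B = \begin{bmatrix} 1 & -1 & 2 \\ 0 & 2 & -3 \end{bmatrix} - \begin{bmatrix} 2 & -2 & -3 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 - 2 & -1 + 2 & 2 + 3 \\ 0 - 1 & 2 - 0 & -3 + 1 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 5 \\ -1 & 2 & -2 \end{bmatrix}.$$
Also
$$\begin{aligned}
2A - 3B &= 2\begin{bmatrix} 1 & -1 & 2 \\ 0 & 2 & -3 \end{bmatrix} - 3\begin{bmatrix} 2 & -2 & -3 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} 2 & -2 & 4 \\ 0 & 4 & -6 \end{bmatrix} - \begin{bmatrix} 6 & -6 & -9 \\ 3 & 0 & -3 \end{bmatrix} \\
&= \begin{bmatrix} -4 & 4 & 13 \\ -3 & 4 & -3 \end{bmatrix}.
\end{aligned}$$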
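Rules 2 and 4 in (7.1) translate directly into code. The Python sketch below represents a matrix as a list of rows (an illustrative choice; the text prescribes no representation), defines scalar multiplication, addition with an order check, and subtraction as $A + (-1)B$, and reproduces Example 7.3. The helper names `scale`, `add` and `subtract` are hypothetical.

```python
def scale(k, A):
    """kA: multiply every element of A by the scalar k (Rule 2)."""
    return [[k * a for a in row] for row in A]

def add(A, B):
    """A + B, defined only for matrices of the same order (Rule 4)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same order")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    """A - B, interpreted as A + (-1)B."""
    return add(A, scale(-1, B))

A = [[1, -1, 2], [0, 2, -3]]
B = [[2, -2, -3], [1, 0, -1]]
print(subtract(A, B))                       # [[-1, 1, 5], [-1, 2, -2]]
print(subtract(scale(2, A), scale(3, B)))   # [[-4, 4, 13], [-3, 4, -3]]
```

Raising an error on mismatched orders mirrors the text's point that a sum "has meaning only if A and B are of the same order".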

The rules of arithmetic, applied to the elements of matrices, lead to the following results for matrices for which addition is defined:
(a) $A + (B + C) = (A + B) + C$ (associative law of addition).
In other words, the order in which the additions are performed is immaterial.
(b) $k(A + B) = kA + kB$, $(k + l)A = kA + lA$.

5. Matrix multiplication. We now need to define the concept of the product of
two matrices. Not all matrices can be multiplied: they must have the right shape,
or be conformable for multiplication to be defined. The product of A and B, in
this order, is written as AB (no product sign is used), but it is only defined if the
number of columns in A equals the number of rows in B. The product BA might
not exist, and if it does, it will not in general be equal to AB.
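Let us look at the case where $A$ is a $1 \times 3$ matrix, which is a row vector, and $B$ is a $3 \times 1$ matrix, which is a column vector, given by
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}.$$
The product $AB$ is defined as the $1 \times 1$ matrix $C$ given by
$$AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \end{bmatrix}\begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} = [a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}] = C. \tag{7.2}$$
Here, the single element of $C$ is the sum of the products of corresponding elements from the row in $A$ and the column in $B$. Thus the product of a $1 \times 3$ matrix and a $3 \times 1$ matrix is a $1 \times 1$ matrix (or simply an ordinary number). This is known as a row-on-column operation.
Suppose now that $A$ is a $2 \times 3$ matrix and $B$ is a $3 \times 2$ matrix, given by
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}.$$
The product $AB$ is now a $2 \times 2$ matrix $C$ defined by
$$\begin{aligned}
AB &= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} \\
&= \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \end{bmatrix} = C.
\end{aligned} \tag{7.3}$$
Note that each row in $A$ 'operates' on each column in $B$, giving the four elements of the $2 \times 2$ matrix $C$.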

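Example 7.4 Find $AB$ if
$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 1 & -3 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 3 \\ 1 & -1 \\ -2 & 4 \end{bmatrix}.$$
We have
$$\begin{aligned}
AB &= \begin{bmatrix} 1 & -1 & 0 \\ 2 & 1 & -3 \end{bmatrix}\begin{bmatrix} 0 & 3 \\ 1 & -1 \\ -2 & 4 \end{bmatrix} \\
&= \begin{bmatrix} 1 \times 0 + (-1) \times 1 + 0 \times (-2) & 1 \times 3 + (-1) \times (-1) + 0 \times 4 \\ 2 \times 0 + 1 \times 1 + (-3) \times (-2) & 2 \times 3 + 1 \times (-1) + (-3) \times 4 \end{bmatrix} = \begin{bmatrix} -1 & 4 \\ 7 & -7 \end{bmatrix}.
\end{aligned}$$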

We can use the summation notation (Section 1.15) to condense the expanded sums of products which occur in matrices. The sum of a string of numbers, say $c_1 + c_2 + c_3$, can be expressed as
$$\sum_{i=1}^{3} c_i,$$
where $i$ runs through all the integers from the lower limit on $i$ below the $\sum$ symbol to the upper limit above it. Thus, for example,
$$\sum_{i=3}^{8} h_i = h_3 + h_4 + h_5 + h_6 + h_7 + h_8.$$
We can also use the summation notation with the double-suffix notation, as in
$$\sum_{i=1}^{4} h_{i6} = h_{16} + h_{26} + h_{36} + h_{46}.$$
The product given by (7.2) can be written
$$AB = [a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}] = \left[\sum_{j=1}^{3} a_{1j}b_{j1}\right] = \sum_{j=1}^{3} a_{1j}b_{j1}.$$
Similarly, the elements of the square matrix (7.3) can be expressed as
$$AB = \left[\sum_{k=1}^{3} a_{ik}b_{kj} : i, j = 1, 2\right].$$
This example gives a clue to the general expression for the product of an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$. Remember that the number of columns in $A$ must always equal the number of rows in $B$ for the product to be defined. Thus, the row-on-column definition of the product is the $m \times p$ matrix
$$AB = \left[\sum_{k=1}^{n} a_{ik}b_{kj} : i = 1, \ldots, m;\ j = 1, \ldots, p\right].$$

Multiplication rule
The element in the $i$th row and $j$th column of the product $AB$ is the row-on-column product of the $i$th row in $A$ and the $j$th column in $B$. (7.4)
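The row-on-column rule (7.4), together with the conformability condition, can be sketched in a few lines of Python using lists of rows. The function name `matmul` is an illustrative choice, and the sketch assumes that $B$ has at least one row and that all rows of a matrix have the same length. It reproduces Example 7.4.

```python
def matmul(A, B):
    """Row-on-column product (7.4) of an m x n matrix A and an n x p matrix B."""
    n = len(B)                       # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])                    # number of columns of B
    # element (i, j) = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, -1, 0], [2, 1, -3]]
B = [[0, 3], [1, -1], [-2, 4]]
print(matmul(A, B))   # [[-1, 4], [7, -7]], as in Example 7.4
```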

Example 7.5 If $A$ is a $5 \times 4$ matrix, $B$ is a $4 \times 5$ matrix, and $C$ is a $6 \times 4$ matrix, which of the following products are defined: $AB$, $BA$, $AC$, $CB$, $(AB)C$, $(CB)A$?
$AB$ is a $5 \times 5$ matrix.
$BA$ is a $4 \times 4$ matrix.
$AC$ is not defined, since $A$ has four columns and $C$ has six rows.
$CB$ is a $6 \times 5$ matrix.
$(AB)C$ is not defined, since $AB$ is a $5 \times 5$ matrix and $C$ is a $6 \times 4$ matrix.
$CB$ is a $6 \times 5$ matrix; hence $(CB)A$ is a $6 \times 4$ matrix.

One conclusion which can be inferred from the previous example is that matrix
multiplication does not, in general, commute; that is, in general, AB ≠ BA. As the
previous example indicates, one or both products may not be defined; when both
are defined, AB and BA may be of different order; and, even when both are defined
and of the same order, AB is generally not equal to BA. So we must be careful about
the order of multiplication. In the product AB, we say that A is multiplied on the
right by B, or that B is multiplied on the left by A. The expressions ‘A postmultiplied
by B’ and ‘B premultiplied by A’ are also used. Statements such as ‘A is multiplied
by B’ can be ambiguous without carefully stating how the product occurs.

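Example 7.6 If
$$A = \begin{bmatrix} 1 & -1 & 0 \\ 3 & -2 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix},$$
calculate $AB$ and $BA$.
$AB$ will be a $2 \times 2$ matrix and $BA$ a $3 \times 3$ matrix. We have
$$AB = \begin{bmatrix} 1 & -1 & 0 \\ 3 & -2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0,$$
and
$$BA = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 \\ 3 & -2 & -1 \end{bmatrix} = \begin{bmatrix} 7 & -5 & -2 \\ 7 & -5 & -2 \\ 7 & -5 & -2 \end{bmatrix}.$$
This example illustrates the point that $AB$ can be a zero matrix without either $A$ or $B$ or $BA$ being zero. Also, as a consequence, $A(B - C) = 0$ does not necessarily imply $B = C$.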
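Example 7.6 is easy to reproduce. The Python sketch below uses a compact version of the row-on-column product (a hypothetical helper, assuming conformable inputs) built on `zip(*B)`, which iterates over the columns of $B$.

```python
def matmul(A, B):
    """Row-on-column product; assumes A has as many columns as B has rows."""
    # zip(*B) yields the columns of B as tuples
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, -1, 0], [3, -2, -1]]
B = [[1, 2], [1, 2], [1, 2]]
print(matmul(A, B))   # [[0, 0], [0, 0]]
print(matmul(B, A))   # [[7, -5, -2], [7, -5, -2], [7, -5, -2]]
```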
We state the following results concerning sums and products, but the proofs are omitted:
(a) $A(B + C) = AB + AC$ (distributive law),
(b) $A(BC) = (AB)C$ (associative law of multiplication),
provided that the products are defined.

Self-test 7.2
If
$$A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & -3 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 1 & -1 \end{bmatrix},$$
calculate $AB$ and $BA$.

7.3 Special matrices


We define and give properties of several special matrices. Some properties apply
to rectangular matrices; others are specific to square matrices.

The transpose of any matrix is the matrix in which the rows and columns are interchanged. Thus the first row becomes the first column, the second row the second column, and so on. We denote the transpose of $A$ by $A^T$. Hence,
$$\text{if}\quad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \quad\text{then}\quad A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}.$$
The $3 \times 2$ matrix $A$ becomes the $2 \times 3$ matrix $A^T$.

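Example 7.7 Find the transposes of $A$, $B$, $A + B^T$ and $AB$, where
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ -1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 3 & -1 & 0 \\ 1 & 2 & -2 \end{bmatrix}.$$
Confirm that $(AB)^T = B^TA^T$.
We see that
$$A^T = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \end{bmatrix}, \qquad B^T = \begin{bmatrix} 3 & 1 \\ -1 & 2 \\ 0 & -2 \end{bmatrix},$$
$$(A + B^T)^T = \left(\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ -1 & 1 \end{bmatrix} + \begin{bmatrix} 3 & 1 \\ -1 & 2 \\ 0 & -2 \end{bmatrix}\right)^T = \begin{bmatrix} 4 & 3 \\ -1 & 3 \\ -1 & -1 \end{bmatrix}^T = \begin{bmatrix} 4 & -1 & -1 \\ 3 & 3 & -1 \end{bmatrix}.$$
Also, note that
$$A^T + (B^T)^T = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 3 & -1 & 0 \\ 1 & 2 & -2 \end{bmatrix} = \begin{bmatrix} 4 & -1 & -1 \\ 3 & 3 & -1 \end{bmatrix} = (A + B^T)^T,$$
$$(AB)^T = \left(\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 3 & -1 & 0 \\ 1 & 2 & -2 \end{bmatrix}\right)^T = \begin{bmatrix} 5 & 3 & -4 \\ 1 & 2 & -2 \\ -2 & 3 & -2 \end{bmatrix}^T = \begin{bmatrix} 5 & 1 & -2 \\ 3 & 2 & 3 \\ -4 & -2 & -2 \end{bmatrix},$$
$$B^TA^T = \begin{bmatrix} 3 & 1 \\ -1 & 2 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 1 & -2 \\ 3 & 2 & 3 \\ -4 & -2 & -2 \end{bmatrix}.$$
Hence $(AB)^T = B^TA^T$.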

1. Properties of the transpose. Provided that the sum $A + B$ and the product $AB$ are defined for two matrices $A$ and $B$, the last example points to the following two results concerning transposes:
(a) $(A + B)^T = A^T + B^T$;
(b) $(AB)^T = B^TA^T$.
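Result (b) can be checked on the matrices of Example 7.7. In the Python sketch below, `transpose` uses `zip(*A)` to turn columns into rows; both helper names are illustrative, and the check covers one example rather than proving the result.

```python
def transpose(A):
    """Interchange rows and columns: row i of the result is column i of A."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Row-on-column product; assumes the orders are conformable."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2], [0, 1], [-1, 1]]
B = [[3, -1, 0], [1, 2, -2]]

lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
print(lhs)          # [[5, 1, -2], [3, 2, 3], [-4, -2, -2]]
print(lhs == rhs)   # True
```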

2. Symmetric matrices. A square matrix $A$ is said to be symmetric if $A = A^T$. Since rows and columns are interchanged in the transpose, this is equivalent to $a_{ij} = a_{ji}$ for all elements, if $A = [a_{ij}]$. Symmetric matrices are easy to recognize, since their elements are mirror images of one another in the leading diagonal, the diagonal string of elements from the top left to the bottom right of the matrix. Thus
$$A = \begin{bmatrix} 1 & 3 & -2 \\ 3 & 2 & 4 \\ -2 & 4 & -1 \end{bmatrix}$$
is a $3 \times 3$ symmetric matrix.
A square matrix $A$ for which $A = -A^T$ is said to be skew-symmetric. Note that, if $A$ is any square matrix, then $A + A^T$ is symmetric and $A - A^T$ is skew-symmetric. The elements along the leading diagonal of a skew-symmetric matrix must all be zero, since $a_{ii} = -a_{ii}$. Thus
$$\begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & -3 \\ -2 & 3 & 0 \end{bmatrix}$$
is skew-symmetric.
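The statement that $A + A^T$ is symmetric and $A - A^T$ is skew-symmetric can be illustrated with a short Python sketch. The $3 \times 3$ matrix used here is an arbitrary example, and the helper functions are hypothetical names chosen for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def is_symmetric(M):
    return M == transpose(M)

def is_skew_symmetric(M):
    return M == [[-m for m in row] for row in transpose(M)]

A = [[1, 4, 0], [2, 5, -1], [6, 3, 2]]   # arbitrary square matrix (illustrative)
At = transpose(A)
P = [[a + b for a, b in zip(r, rt)] for r, rt in zip(A, At)]   # A + A^T
Q = [[a - b for a, b in zip(r, rt)] for r, rt in zip(A, At)]   # A - A^T
print(is_symmetric(P), is_skew_symmetric(Q))   # True True
print([Q[i][i] for i in range(3)])             # [0, 0, 0]
```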

3. Row and column vectors. As we defined them in Section 7.1, a row vector is a matrix with one row, and a column vector is one with one column. For vectors, we usually use bold-faced small letters and write, for example,
$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}.$$
The transpose of a row vector is a column vector and vice versa. If $A$ is an $m \times n$ matrix and $\mathbf{a}$ is a column vector with $n$ elements, then $A\mathbf{a}$ is a column vector with $m$ rows.

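Example 7.8 If
$$A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & -4 \\ -1 & 2 & 1 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad \mathbf{d} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix},$$
find the set of equations for $x$, $y$, $z$ represented by $A\mathbf{x} = \mathbf{d}$.
The matrix equation in full is
$$\begin{bmatrix} 1 & -1 & 2 \\ 3 & 1 & -4 \\ -1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix},$$
or
$$\begin{bmatrix} x - y + 2z \\ 3x + y - 4z \\ -x + 2y + z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}.$$
The set of linear equations for $x$, $y$, $z$ is
$$\begin{aligned} x - y + 2z &= 2, \\ 3x + y - 4z &= 1, \\ -x + 2y + z &= -1. \end{aligned}$$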

We shall say more about the solutions of linear equations in Chapter 12.
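Reading $A\mathbf{x} = \mathbf{d}$ row by row, as in Example 7.8, is a mechanical process. The sketch below automates it. The function names `term` and `equations` are hypothetical helpers written for this illustration. Each row of A gives the coefficients of x, y, z in one equation, and the matching entry of d gives its right-hand side.

```python
def term(coef, name, first):
    """Format coef*name as it would appear in a written equation."""
    body = name if abs(coef) == 1 else f"{abs(coef)}{name}"
    if first:
        return ("-" if coef < 0 else "") + body
    return f" {'-' if coef < 0 else '+'} {body}"

def equations(A, d, names=("x", "y", "z")):
    """Write out the linear equations represented by A x = d, one per row of A."""
    lines = []
    for row, rhs in zip(A, d):
        parts = []
        for coef, name in zip(row, names):
            if coef != 0:                      # zero coefficients are omitted
                parts.append(term(coef, name, first=not parts))
        lines.append(f"{''.join(parts) or '0'} = {rhs}")
    return lines

A = [[1, -1, 2], [3, 1, -4], [-1, 2, 1]]   # Example 7.8
d = [2, 1, -1]
for eq in equations(A, d):
    print(eq)
# x - y + 2z = 2
# 3x + y - 4z = 1
# -x + 2y + z = -1
```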

4. Diagonal matrices. A square matrix all of whose elements off the leading diagonal are zero is called a diagonal matrix. Thus, if $A = [a_{ij}]$ is an n × n matrix, then A is diagonal if $a_{ij} = 0$ for all i ≠ j. Hence

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

is an example of a 3 × 3 diagonal matrix.
A diagonal matrix is obviously symmetric. If A and B are diagonal matrices of the same order, then A + B and AB are also both diagonal.

5. Identity matrix. The diagonal matrix with all diagonal elements 1 is called the identity or unit matrix $I_n$. Hence, the 3 × 3 identity is

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

(If no confusion is likely to arise, $I_n$ or $I_3$ is simply replaced by the universal symbol I.) The reason for the definition becomes clear if we multiply a 3 × 3 matrix by $I_3$. If A is a general 3 × 3 matrix, then

$$AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = A.$$

Similarly $I_3A = A$.
A need not be square: provided that the products are defined, AI = A and IA = A for the appropriate identity matrix in each case. If A is m × n, then $AI_n = A$ and $I_mA = A$.
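For a non-square A, the "appropriate identity" differs on each side of the product. The plain-Python check below illustrates this. The helper names are ours and the 2 × 3 test matrix is arbitrary.

```python
def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Row-on-column product; raises ValueError when AB is not defined."""
    if len(A[0]) != len(B):
        raise ValueError("product not defined")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]           # 2 x 3, so m = 2 and n = 3
assert matmul(A, identity(3)) == A   # A I_3 = A
assert matmul(identity(2), A) == A   # I_2 A = A
```

With the wrong identity the product is not defined at all. For example, `matmul(A, identity(2))` raises `ValueError` because A has three columns but $I_2$ has only two rows.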

6. Powers of matrices. If A is a square matrix of order n × n, then we write AA as $A^2$, $AA^2$ as $A^3$, and so on.
If A is diagonal, as in

$$A = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix},$$

then

$$A^2 = \begin{bmatrix} d_1^2 & 0 & 0 \\ 0 & d_2^2 & 0 \\ 0 & 0 & d_3^2 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} d_1^3 & 0 & 0 \\ 0 & d_2^3 & 0 \\ 0 & 0 & d_3^3 \end{bmatrix}, \quad \text{etc.}$$

In particular $I_n^m = I_n$ for all positive integers m.
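The definition $A^2 = AA$, $A^3 = AA^2$, … translates directly into repeated multiplication. For a diagonal matrix, the result should simply raise each diagonal element to the power. Here is a minimal plain-Python sketch with our own helper names and arbitrarily chosen diagonal entries.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, m):
    """A^m for a square matrix A and a positive integer m, by repeated multiplication."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    result = A
    for _ in range(m - 1):
        result = matmul(A, result)   # A A^k = A^(k+1)
    return result

D = [[2, 0, 0], [0, -3, 0], [0, 0, 5]]
print(matpow(D, 3))          # [[8, 0, 0], [0, -27, 0], [0, 0, 125]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matpow(I3, 7) == I3)   # True
```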

Self-test 7.3
If the symmetric matrices A and B are given by

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & -1 & 1 \\ 3 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 1 & 1 \\ 1 & -1 & 3 \\ 1 & 3 & 0 \end{bmatrix},$$

compute 2A + 3B, $A^2$, AB, BA and AB + BA. Which of these matrices is symmetric?

7.4 The inverse matrix
If A and B are square matrices, each of order n × n, which satisfy the equations

$$AB = BA = I_n, \tag{7.5}$$

then B is called the inverse of A. We say the inverse because, if a matrix B exists with this property, then it is uniquely determined by A (although we shall not prove this here). We write $B = A^{-1}$ (not $B = I_n/A$). Since the definition is ‘symmetric’, it follows that A is the inverse of B, that is $A = B^{-1}$. The inverse matrix defines ‘division’ for matrices, but analogies with numbers must not be taken too far. It is a particularly useful operation since it enables us to manipulate matrix equations. Thus, if AB = C, and the inverse of B exists, then we can solve the equation and find A as $A = CB^{-1}$.
It appears that we need to find a matrix B which satisfies both equations in (7.5). However, it can be proved that if $AB = I_n$, then $BA = I_n$, and conversely (see Whitelaw (1983) for a proof of this). Hence it is sufficient that:

Inverse of a square matrix A, order n
If it can be shown that there exists a matrix B such that either $AB = I_n$ or $BA = I_n$, then B is called the inverse of A, and is written as $B = A^{-1}$.   (7.6)

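How do we find the inverse? Does it always exist? Let us look first at the case in which A is a 2 × 2 matrix, and consider the equation $A\mathbf{x} = \mathbf{d}$, where

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}.$$

Thus

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix},$$

or

$$a_{11}x_1 + a_{12}x_2 = d_1, \tag{7.7}$$

$$a_{21}x_1 + a_{22}x_2 = d_2. \tag{7.8}$$

These are linear equations in the unknowns $x_1$ and $x_2$: we shall say more about their solution in Chapter 12. Eliminate $x_2$ by multiplying (7.7) by $a_{22}$ and (7.8) by $a_{12}$, and subtracting the two equations, so that

$$(a_{11}a_{22} - a_{21}a_{12})x_1 = a_{22}d_1 - a_{12}d_2.$$

Similarly, elimination of $x_1$ leads to

$$-(a_{11}a_{22} - a_{21}a_{12})x_2 = a_{21}d_1 - a_{11}d_2.$$

Provided that $a_{11}a_{22} - a_{21}a_{12} \neq 0$, it follows that

$$x_1 = \frac{a_{22}d_1 - a_{12}d_2}{a_{11}a_{22} - a_{21}a_{12}}, \qquad x_2 = \frac{-a_{21}d_1 + a_{11}d_2}{a_{11}a_{22} - a_{21}a_{12}}.$$

We can now express the solution of the pair of equations (7.7), (7.8) in matrix form:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{a_{11}a_{22} - a_{21}a_{12}}\begin{bmatrix} a_{22}d_1 - a_{12}d_2 \\ -a_{21}d_1 + a_{11}d_2 \end{bmatrix} = C\mathbf{d},$$

where

$$C = \frac{1}{\det A}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}, \quad \text{with } \det A = a_{11}a_{22} - a_{21}a_{12}.$$

If $A\mathbf{x} = \mathbf{d}$ is multiplied on the left by the inverse $A^{-1}$, then

$$A^{-1}A\mathbf{x} = I_2\mathbf{x} = \mathbf{x} = A^{-1}\mathbf{d}.$$

Hence $A^{-1}$ can be identified with the matrix C. In other words,

$$A^{-1} = C = \frac{1}{\det A}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}. \tag{7.9}$$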

It is worth remembering the rule for 2 × 2 matrices by which $A^{-1}$ can be constructed from A.

Rule for 2 × 2 inverse
Suppose that det A ≠ 0. Then to form the inverse, the diagonal elements $a_{11}$ and $a_{22}$ are interchanged, the signs are changed for the other two elements, and the matrix is divided by det A.   (7.10)
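Rule (7.10) is short enough to code directly. The sketch below uses Python's standard `fractions` module so that division by det A stays exact. The function name `inverse_2x2` is our own. It is checked on the matrix of Example 7.9, which follows, by confirming $AA^{-1} = I_2$ with the row-on-column rule.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix by rule (7.10)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a21 * a12
    if det == 0:
        raise ValueError("det A = 0: the matrix is singular")
    f = Fraction(1, det)
    # swap the diagonal elements, change the signs of the other two, divide by det A
    return [[f * a22, -f * a12], [-f * a21, f * a11]]

A = [[1, 3], [-1, 4]]   # Example 7.9, det A = 7
Ainv = inverse_2x2(A)   # equals (1/7) [[4, -3], [1, 1]]

# confirm A A^{-1} = I_2 by the row-on-column rule
product = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product == [[1, 0], [0, 1]])  # True
```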

The number det A = $a_{11}a_{22} - a_{21}a_{12}$ is known as the determinant of A. It may also be written directly in terms of the corresponding matrix as

$$\det A = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$

The following is a common notation for a determinant:

$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$

The determinant is a function of the corresponding matrix, but is a number, not a matrix. If det A = 0, then the matrix has no inverse. It is then said to be singular; if A has an inverse, then A is said to be non-singular. We shall say more about determinants in the next chapter.

Example 7.9 Decide whether A, where

$$A = \begin{bmatrix} 1 & 3 \\ -1 & 4 \end{bmatrix},$$

is singular or not. If it is non-singular, find its inverse.
Here $a_{11} = 1$, $a_{12} = 3$, $a_{21} = -1$, and $a_{22} = 4$. Hence

$$\det A = \begin{vmatrix} 1 & 3 \\ -1 & 4 \end{vmatrix} = 1 \times 4 - (-1) \times 3 = 4 + 3 = 7.$$

Since det A ≠ 0, A is non-singular. Its inverse is, by the rule above,

$$A^{-1} = \frac{1}{7}\begin{bmatrix} 4 & -3 \\ 1 & 1 \end{bmatrix}.$$

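Example 7.10 If

$$A = \begin{bmatrix} 1 & 3 \\ -1 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix},$$

find $A^{-1}$, $B^{-1}$, and $(AB)^{-1}$.
Always check the determinants first. Here
det A = 4 − (−3) = 7,   det B = −1 − 2 = −3.
These are not zero, so A and B have inverses:

$$A^{-1} = \frac{1}{7}\begin{bmatrix} 4 & -3 \\ 1 & 1 \end{bmatrix}, \qquad B^{-1} = -\frac{1}{3}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix}.$$

Also

$$AB = \begin{bmatrix} 1 & 3 \\ -1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 4 & -1 \\ 3 & -6 \end{bmatrix}.$$

Thus, det(AB) = −24 + 3 = −21, and

$$(AB)^{-1} = -\frac{1}{21}\begin{bmatrix} -6 & 1 \\ -3 & 4 \end{bmatrix}.$$

Note that

$$B^{-1}A^{-1} = -\frac{1}{21}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 4 & -3 \\ 1 & 1 \end{bmatrix} = -\frac{1}{21}\begin{bmatrix} -6 & 1 \\ -3 & 4 \end{bmatrix} = (AB)^{-1}.$$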

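This last result suggests the following correct rule for the inverse of the product of two square matrices, namely:

Inverse rule for products

$$(AB)^{-1} = B^{-1}A^{-1}. \tag{7.11}$$

(To confirm this, note that $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I_n$, so by (7.6) $B^{-1}A^{-1}$ is the inverse of AB.)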
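Example 7.10 can be replayed in exact arithmetic to see (7.11) hold. The replay also shows that the "natural" order $A^{-1}B^{-1}$ gives the wrong answer, because it is the inverse of BA, not AB. Here is a sketch in plain Python, with helper names of our own.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(A):
    """2 x 2 inverse by rule (7.10)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a21 * a12
    if det == 0:
        raise ValueError("singular matrix")
    f = Fraction(1, det)
    return [[f * a22, -f * a12], [-f * a21, f * a11]]

A = [[1, 3], [-1, 4]]
B = [[1, 2], [1, -1]]
lhs = inverse_2x2(matmul(A, B))                  # (AB)^{-1}
rhs = matmul(inverse_2x2(B), inverse_2x2(A))     # B^{-1} A^{-1}
wrong = matmul(inverse_2x2(A), inverse_2x2(B))   # A^{-1} B^{-1}, order reversed
print(lhs == rhs)    # True
print(lhs == wrong)  # False
```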
For the inverse of a 3 × 3 matrix we can adopt the same approach as for the 2 × 2 case by eliminating $x_1, x_2, \ldots$ successively between the set of equations $A\mathbf{x} = \mathbf{d}$, or

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= d_1, \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= d_2, \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= d_3. \end{aligned} \tag{7.12}$$

The result is

$$\mathbf{x} = A^{-1}\mathbf{d},$$

where

$$A^{-1} = \frac{1}{\det A}\begin{bmatrix} a_{22}a_{33} - a_{32}a_{23} & -(a_{12}a_{33} - a_{32}a_{13}) & a_{12}a_{23} - a_{22}a_{13} \\ -(a_{21}a_{33} - a_{31}a_{23}) & a_{11}a_{33} - a_{31}a_{13} & -(a_{11}a_{23} - a_{21}a_{13}) \\ a_{21}a_{32} - a_{31}a_{22} & -(a_{11}a_{32} - a_{31}a_{12}) & a_{11}a_{22} - a_{21}a_{12} \end{bmatrix}, \tag{7.13}$$

with

$$\det A = a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}), \tag{7.14}$$

provided that det A ≠ 0. Again det A is known as the determinant of A and is denoted by

$$\det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

Equation (7.13) gives the inverse matrix, as can be verified by calculation of the products $AA^{-1}$ and $A^{-1}A$. Even for 3 × 3 matrices, the formula for the inverse is quite complicated. If det A = 0, then the matrix is called singular, and has no inverse.
Determinants, which have arisen in the context of inverse matrices, have important properties, particularly with regard to their evaluation. They will be discussed in more detail in the next chapter.
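Formulas (7.13) and (7.14) can be transcribed entry by entry. The sketch below uses our own function names and 0-based indexing, so `a[i][j]` holds $a_{i+1,\,j+1}$. It returns exact `Fraction` entries. It is checked against Examples 7.11 and 7.12, which follow.

```python
from fractions import Fraction

def det3(a):
    """det A by (7.14); a[i][j] holds the element a_{i+1, j+1}."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[2][1] * a[1][2])
            - a[0][1] * (a[1][0] * a[2][2] - a[2][0] * a[1][2])
            + a[0][2] * (a[1][0] * a[2][1] - a[2][0] * a[1][1]))

def inverse3(a):
    """A^{-1} by formula (7.13); raises ValueError if A is singular."""
    d = det3(a)
    if d == 0:
        raise ValueError("singular matrix: det A = 0")
    m = [
        [a[1][1] * a[2][2] - a[2][1] * a[1][2],
         -(a[0][1] * a[2][2] - a[2][1] * a[0][2]),
         a[0][1] * a[1][2] - a[1][1] * a[0][2]],
        [-(a[1][0] * a[2][2] - a[2][0] * a[1][2]),
         a[0][0] * a[2][2] - a[2][0] * a[0][2],
         -(a[0][0] * a[1][2] - a[1][0] * a[0][2])],
        [a[1][0] * a[2][1] - a[2][0] * a[1][1],
         -(a[0][0] * a[2][1] - a[2][0] * a[0][1]),
         a[0][0] * a[1][1] - a[1][0] * a[0][1]],
    ]
    return [[Fraction(x, d) for x in row] for row in m]

A = [[2, 1, 0], [1, -1, 5], [-1, -1, 2]]   # Example 7.12
print(det3(A))                              # -1
print(inverse3(A) == [[-3, 2, -5], [7, -4, 10], [2, -1, 3]])  # True
```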

Example 7.11 Verify by direct multiplication that

$$A = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}$$

has the inverse

$$B = \frac{1}{2}\begin{bmatrix} 2 & -2 & 0 \\ 2 & -1 & -1 \\ 0 & 1 & 1 \end{bmatrix}.$$

Check the matrix product BA:

$$BA = \frac{1}{2}\begin{bmatrix} 2 & -2 & 0 \\ 2 & -1 & -1 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 1 \\ -1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3.$$

Hence $B = A^{-1}$.
Note that we need only verify that either $BA = I_3$ or $AB = I_3$, not both. If $BA = I_3$, then $AB = I_3$, and vice versa.

Example 7.12 Using formula (7.13) find $A^{-1}$, where

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 5 \\ -1 & -1 & 2 \end{bmatrix}.$$

We first find det A using (7.14):

$$\begin{aligned} \det A &= 2 \times [(-1) \times 2 - (-1) \times 5] - 1 \times [1 \times 2 - (-1) \times 5] + 0 \times [1 \times (-1) - (-1) \times (-1)] \\ &= 2 \times 3 - 1 \times 7 \\ &= -1. \end{aligned}$$

Thus, from (7.13),

$$A^{-1} = \frac{1}{-1}\begin{bmatrix} (-1) \times 2 - (-1) \times 5 & -[1 \times 2 - (-1) \times 0] & 1 \times 5 - (-1) \times 0 \\ -[1 \times 2 - (-1) \times 5] & 2 \times 2 - (-1) \times 0 & -(2 \times 5 - 1 \times 0) \\ 1 \times (-1) - (-1) \times (-1) & -[2 \times (-1) - (-1) \times 1] & 2 \times (-1) - 1 \times 1 \end{bmatrix}$$

$$= -\begin{bmatrix} 3 & -2 & 5 \\ -7 & 4 & -10 \\ -2 & 1 & -3 \end{bmatrix} = \begin{bmatrix} -3 & 2 & -5 \\ 7 & -4 & 10 \\ 2 & -1 & 3 \end{bmatrix}.$$

This is the ‘formula’ method of finding the inverse of a 3 × 3 matrix, but it is not an efficient procedure numerically. There are better methods using row operations, which will be explained in Chapter 12.

Self-test 7.4
The matrix A is given by

$$A = \begin{bmatrix} 0 & a & 0 & 0 \\ 0 & 0 & 0 & b \\ c & 0 & 0 & 0 \\ 0 & 0 & d & 0 \end{bmatrix},$$

where a, b, c, d are non-zero constants. Compute $A^2$, $A^3$ and $A^4$. Hence find the inverse of A.
Problems

7.1 (Section 7.1). The matrix $A = [a_{ij}]$ is given by

$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & 1 \\ 2 & -2 & 4 \\ 1 & 5 & -3 \end{bmatrix}.$$

Identify the elements $a_{13}$ and $a_{31}$.

7.2 (Section 7.2). Solve the equation A = B, where

$$A = \begin{bmatrix} 1 & -2 \\ 3 & 1 \\ -1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & x \\ y - x & 1 \\ -1 & 2 \end{bmatrix},$$

for x and y.

7.3 (Section 7.2). Given that

$$A = \begin{bmatrix} 1 & 2 & -3 \\ -1 & 0 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -1 & 3 \\ 4 & 1 & 2 \end{bmatrix},$$

find the matrices A + B, A − B, and 2A − 3B.

7.4 (Section 7.2). Given that

$$A = \begin{bmatrix} 1 & 3 & 0 \\ 2 & 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ -1 & -1 \end{bmatrix}, \qquad C = \begin{bmatrix} 2 & 1 \\ -1 & 1 \\ 0 & 1 \end{bmatrix},$$

verify the distributive law A(B + C) = AB + AC for the three matrices.

7.5 (Section 7.2). Let

$$A = \begin{bmatrix} -1 & 2 & -1 \\ 2 & 3 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} -1 & 0 \\ 1 & 2 \\ 3 & -1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}.$$

Verify the associative law A(BC) = (AB)C for these matrices.

7.6 (Section 7.2). Let

$$A = \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} -2 & -1 \\ 4 & 2 \end{bmatrix}.$$

Show that AB = 0, but that BA ≠ 0.

7.7 (Section 7.3). Let

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 2 \\ -2 & 1 & 1 \end{bmatrix}.$$

Find a matrix C such that A + C is the identity matrix $I_3$. Deduce that AC = CA. Find AC, and hence the matrix $A^2 + C^2$.

7.8 (Section 7.3). A general n × n matrix is given by $A = [a_{ij}]$. Show that $A + A^T$ is a symmetric matrix, and that $A - A^T$ is skew-symmetric. Express the matrix

$$A = \begin{bmatrix} 2 & 1 & 3 \\ -2 & 0 & 1 \\ 3 & 1 & 2 \end{bmatrix}$$

as the sum of a symmetric matrix and a skew-symmetric matrix.

7.9 (Section 7.3). Let

$$A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \\ 0 & 1 \end{bmatrix}.$$

Write down $A^T$, and find the products $AA^T$ and $A^TA$.

7.10 (Section 7.3). If

$$A = \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \\ -1 & 2 & -3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix},$$

write down the set of equations defined by $A\mathbf{x} = \mathbf{d}$. Confirm that the same set of equations is given by $\mathbf{x}^TA^T = \mathbf{d}^T$.

7.11 (Section 7.3). Let

$$A = \begin{bmatrix} 1 & 0 & 0 \\ a & -1 & 0 \\ b & c & 1 \end{bmatrix}.$$

Find $A^2$. For what relation between a, b, and c is $A^2 = I_3$? In this case, what is the inverse matrix of A? What is the inverse matrix of $A^{2n-1}$ (n a positive integer)?

7.12 If

$$A = \begin{bmatrix} 2 & 0 & 1 \\ 2 & -2 & 2 \\ 0 & 4 & -4 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & \tfrac12 & \tfrac14 \\ 1 & -1 & -\tfrac14 \\ 1 & -1 & -\tfrac12 \end{bmatrix},$$

find the products AB and BA, and confirm that B is the inverse of A.

7.13 Let

$$A = \begin{bmatrix} 2 & 0 & 1 \\ 2 & -2 & 2 \\ 0 & 4 & 1 \end{bmatrix}.$$

Find the powers $A^2$ and $A^3$, and verify that $A^3 - A^2 - 12A = -12I_3$. Hence find the inverse matrix $A^{-1}$ by multiplying the equation on both sides by $A^{-1}$.

7.14 (Section 7.4). Using the rule for inverses of 2 × 2 matrices, write down the inverses of:

$$\text{(a)} \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}; \quad \text{(b)} \begin{bmatrix} 2 & 3 \\ -7 & 11 \end{bmatrix}; \quad \text{(c)} \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}; \quad \text{(d)} \begin{bmatrix} 10 & -7 \\ 8 & 0 \end{bmatrix}; \quad \text{(e)} \begin{bmatrix} -99 & 100 \\ 97 & 98 \end{bmatrix}.$$

7.15 (Section 7.4). The sparsely filled matrix A is given by

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Thinking about the row-on-column rule for matrix multiplication, can you guess the columns in the inverse matrix $A^{-1}$? How would this rule generalize to the matrix

$$A = \begin{bmatrix} 0 & a & 0 & 0 \\ 0 & 0 & b & 0 \\ c & 0 & 0 & 0 \\ 0 & 0 & 0 & d \end{bmatrix}?$$

7.16 (Section 7.4). Write down the set of equations given by $A\mathbf{x} = \mathbf{d}$, where

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & -2 & 2 \\ 1 & 0 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad \mathbf{d} = \begin{bmatrix} 6 \\ 3 \\ -9 \end{bmatrix}.$$

Find $A^{-1}$ and calculate the product $A^{-1}\mathbf{d}$. What is the solution of the equation?

7.17 (Section 7.4). If A and B are both n × n matrices with A non-singular, show that

$$(A^{-1}BA)^2 = A^{-1}B^2A.$$

Let $A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ -1 & 0 \end{bmatrix}$. Calculate $A^{-1}B^4A$.

7.18 (Section 7.4). For interpolation purposes for given data, it is required that the parabola $y = a + bx + cx^2$ should pass through the three points with coordinates $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ in the (x, y) plane. Show that the matrix equation for the constants a, b, and c can be written as

$$\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}.$$

Verify that the inverse of the 3 × 3 matrix on the left is

$$\begin{bmatrix} \dfrac{x_2x_3}{(x_2 - x_1)(x_3 - x_1)} & \dfrac{x_3x_1}{(x_3 - x_2)(x_1 - x_2)} & \dfrac{x_1x_2}{(x_1 - x_3)(x_2 - x_3)} \\ -\dfrac{x_2 + x_3}{(x_2 - x_1)(x_3 - x_1)} & -\dfrac{x_3 + x_1}{(x_3 - x_2)(x_1 - x_2)} & -\dfrac{x_1 + x_2}{(x_1 - x_3)(x_2 - x_3)} \\ \dfrac{1}{(x_2 - x_1)(x_3 - x_1)} & \dfrac{1}{(x_3 - x_2)(x_1 - x_2)} & \dfrac{1}{(x_1 - x_3)(x_2 - x_3)} \end{bmatrix},$$

provided that certain conditions are met. What are they, and what implications have they for the given points in the plane? Find a, b, and c in terms of the given data. Find the equation of the parabola through the points (−2, 0), (1, −2), (3, 4).

7.19 The elements in a 3 × 3 matrix $A = [a_{ij}]$ are given by the rule

$$a_{ij} = (-j)^i - ij.$$

Write down the matrix A. Calculate det A and the inverse of A.

7.20 If

$$A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 2 \\ 1 & 2 & 1 \end{bmatrix},$$

show that $A^3 - 2A^2 - 9A = 0$, but that $A^2 - 2A - 9I_3 \neq 0$. Does the inverse of A exist?

7.21 An nth-order square matrix A satisfies $A^2 = A$ and $A \neq I_n$. Show that
(a) det A = 0;
(b) $(I_n + A)^{-1} = I_n - \tfrac12 A$;
(c) $(I_n + A)^m = I_n + (2^m - 1)A$ for any positive integer m.

7.22 Let

$$A_1 = \begin{bmatrix} x_1 & y_1 \\ -y_1 & x_1 \end{bmatrix}, \qquad A_2 = \begin{bmatrix} x_2 & y_2 \\ -y_2 & x_2 \end{bmatrix}.$$

Calculate $A_1 + A_2$, $A_1A_2$, $A_2A_1$, $A_1^{-1}$. Compare your results with $z_1 + z_2$, $z_1z_2$, and $1/z_1$, where $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$ are complex numbers (see Chapter 6). Consider the possibility of developing further parallels, such as to |z| and $e^z$.
8 Determinants

CONTENTS
8.1 The determinant of a square matrix
8.2 Properties of determinants
8.3 The adjoint and inverse matrices
Problems

As we saw in Section 7.4, a certain combination of elements from a square matrix appears as the denominator in the construction of the inverse matrix. If this number, called the determinant of the matrix, turns out to be zero, then the matrix is singular and no inverse exists. Here we look at the definition of the determinant of a matrix and its properties. Special emphasis will be placed on 2 × 2 and 3 × 3 determinants, which suggest generalizations to higher-order cases.

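8.1 The determinant of a square matrix

Given the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$

the determinant of A is denoted and defined by

Expansion of 2 × 2 determinant

$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}. \tag{8.1}$$

(The notation |A| is also used extensively for the determinant.) For the 3 × 3 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

its determinant is defined (see (7.14)) as

$$\det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} - a_{12}a_{21}a_{33} + a_{12}a_{31}a_{23} + a_{13}a_{21}a_{32} - a_{13}a_{31}a_{22}. \tag{8.2}$$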

In (8.2) there are six terms, each of which is the product of three elements, each taken from a different row and column. In other words, there are never two elements in any term from the same row or column. It can be seen that there must be just 3 × 2 × 1 = 6 terms of this form, because the element from row 1 can be chosen in three ways, the element from row 2 in two ways (from the two remaining columns), and the element from row 3 in one way.
Each term is prefixed by either +1 or −1. This is decided according to the following rule. Write each term in the form

$$a_{1j_1}a_{2j_2}a_{3j_3},$$

in which the first suffixes are in consecutive increasing order. Examine the permutation $j_1j_2j_3$ of the second suffixes (see Section 1.17). The permutation is said to be even (odd) if it has an even (odd) number of inversions. An inversion occurs whenever a larger integer precedes a smaller one. Thus the permutation 132 is odd, since 3 precedes 2, but 312 is even since there are two inversions: 3 precedes 1 and 3 precedes 2. If the permutation is even, then a + sign is attached; if it is odd, then a − sign is attached. This rule can be extended to a determinant of any order.
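The sign rule can be coded directly. The sketch below is plain Python with function names of our own. Indices start at 0, so the permutation 132 appears as (0, 2, 1). The function sums the n! signed products $a_{1j_1}a_{2j_2}\cdots a_{nj_n}$. Because the number of terms grows as n!, this makes concrete why the text calls the rule impractical for evaluating determinants beyond small orders.

```python
from itertools import permutations
from math import prod

def inversions(p):
    """Number of pairs i < j with p[i] > p[j] (a larger integer preceding a smaller one)."""
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def det_by_permutations(a):
    """Signed sum over all permutations (j1, ..., jn) of a[0][j1] a[1][j2] ... a[n-1][jn].
    The sign is + for an even number of inversions and - for an odd number."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        sign = -1 if inversions(p) % 2 else 1
        total += sign * prod(a[i][p[i]] for i in range(n))
    return total

print(inversions((0, 2, 1)), inversions((2, 0, 1)))   # 1 2   (132 is odd, 312 is even)
print(det_by_permutations([[1, -1, 0], [2, 3, -2], [1, -1, 1]]))  # 5
```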
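While this expansion of the determinant says something about the structure of the determinant, it is not really a practical rule for evaluating determinants. Returning to (8.2), we can rewrite det A as

$$\det A = a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}).$$

The terms in brackets are themselves 2 × 2 determinants. Thus

Expansion of 3 × 3 determinant

$$\det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \tag{8.3}$$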

This expression is called an expansion by the top row. The term associated with $a_{11}$, namely

$$C_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix},$$

is known as the cofactor of $a_{11}$. The cofactor of an element of A is obtained by deleting the row and column through the element and writing down the determinant of the elements of the remaining 2 × 2 submatrix with a + or − sign attached. The cofactors of $a_{12}$ and $a_{13}$ are

$$C_{12} = -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \qquad C_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix},$$

where the signs attached should be noted. In the same way the cofactors of the elements in the second and third rows are defined as follows:

$$C_{21} = -\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, \quad C_{22} = \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, \quad C_{23} = -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix},$$

$$C_{31} = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}, \quad C_{32} = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad C_{33} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.$$

The signs associated with the cofactors alternate as we move across or down, starting with a + in the top left-hand corner, as shown:

$$\begin{matrix} + & - & + \\ - & + & - \\ + & - & + \end{matrix}$$

The sign associated with $C_{ij}$ is + if i + j is even and − if i + j is odd, and can be expressed as $(-1)^{i+j}$.
For example, if det A is expanded by the third column, then

$$\det A = a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33}.$$
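Cofactor expansion is naturally recursive, because a cofactor is a signed determinant of one order lower. Here is a minimal sketch with hypothetical function names. Indices are 0-based, which leaves the sign $(-1)^{i+j}$ unchanged since shifting both indices by 1 does not alter the parity of i + j. It is checked against Example 8.1, which follows: expansion by the top row and by the third column both give 5.

```python
def minor(a, i, j):
    """Delete row i and column j of a (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def cofactor(a, i, j):
    """C_ij = (-1)^(i+j) times the determinant of the minor."""
    return (-1) ** (i + j) * det(minor(a, i, j))

def det(a, row=0):
    """Determinant by cofactor expansion along the given row (default: the top row)."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[row][j] * cofactor(a, row, j) for j in range(len(a)))

def det_by_column(a, col):
    """Expansion down a column instead: sum of a_ij C_ij for fixed j."""
    return sum(a[i][col] * cofactor(a, i, col) for i in range(len(a)))

A = [[1, -1, 0], [2, 3, -2], [1, -1, 1]]        # Example 8.1
print(det(A))                                   # 5  (top row)
print(det_by_column(A, 2))                      # 5  (third column)
print([cofactor(A, i, 2) for i in range(3)])    # [-5, 0, 5]
```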

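Example 8.1 Let

$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & -2 \\ 1 & -1 & 1 \end{bmatrix}.$$

Evaluate det A by expanding by the first row. Find the cofactors $C_{13}, C_{23}, C_{33}$ of the elements in the third column. Calculate

$$a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33},$$

and verify that it also equals det A.
By (8.3),

$$\det A = 1 \times \begin{vmatrix} 3 & -2 \\ -1 & 1 \end{vmatrix} - (-1) \times \begin{vmatrix} 2 & -2 \\ 1 & 1 \end{vmatrix} + 0 \times \begin{vmatrix} 2 & 3 \\ 1 & -1 \end{vmatrix} = (3 - 2) + (2 + 2) = 5.$$

The cofactors are (with due regard to the sign convention)

$$C_{13} = \begin{vmatrix} 2 & 3 \\ 1 & -1 \end{vmatrix} = -5, \quad C_{23} = -\begin{vmatrix} 1 & -1 \\ 1 & -1 \end{vmatrix} = 0, \quad C_{33} = \begin{vmatrix} 1 & -1 \\ 2 & 3 \end{vmatrix} = 5.$$

Hence expansion by the third column gives

$$a_{13}C_{13} + a_{23}C_{23} + a_{33}C_{33} = 0 \times (-5) + (-2) \times 0 + 1 \times 5 = 5,$$

which is the same as det A. (See also Rule 4 below.)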

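Example 8.2 Evaluate the determinant

$$\det A = \begin{vmatrix} 1 & 2 & k \\ 2 & -1 & 3 \\ -1 & 4 & -2 \end{vmatrix}$$

for any k. Find the value of k for which the determinant is zero.
Expanding by the first row gives

$$\begin{aligned} \det A &= 1 \times \begin{vmatrix} -1 & 3 \\ 4 & -2 \end{vmatrix} - 2 \times \begin{vmatrix} 2 & 3 \\ -1 & -2 \end{vmatrix} + k \times \begin{vmatrix} 2 & -1 \\ -1 & 4 \end{vmatrix} \\ &= 1 \times (2 - 12) - 2 \times (-4 + 3) + k \times (8 - 1) \\ &= -10 + 2 + 7k = -8 + 7k. \end{aligned}$$

Hence det A = 0 if k = 8/7.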

The notion of cofactors generalizes to higher-order determinants. The alternating-sign rule applies from the top left-hand corner. For example, a 4 × 4 determinant has 16 cofactors, each of which is a 3 × 3 determinant.

Self-test 8.1
Let

$$A = \begin{bmatrix} 1 & 2 & k \\ k & 2 & -1 \\ 1 & 2 & 1 \end{bmatrix}.$$

Find det A and the value of k for which det A = 0.

8.2 Properties of determinants


We list here some properties of determinants. Many of them are useful in evaluating determinants. We shall not aim for complete generality but illustrate the
rules mainly in the 3 × 3 case. However, the rules have obvious generalizations to
higher orders.

1. Transpose. $\det A^T = \det A$, where $A^T$ is the transpose of A (see Section 7.3).
The determinants of a square matrix and its transpose are equal since

$$\det A^T = \begin{vmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{21}a_{12}a_{33} + a_{21}a_{13}a_{32} + a_{31}a_{12}a_{23} - a_{31}a_{13}a_{22},$$

and all terms in this expansion can be identified with those in (8.2). Hence $\det A^T = \det A$.

Example 8.3 Evaluate

$$\det A = \begin{vmatrix} 1 & 28 & -29 \\ 0 & 1 & -4 \\ 0 & -2 & 5 \end{vmatrix}.$$

Since the determinant has two zeros in the first column, it is advantageous to use Rule 1. The determinant of the transpose of A is given by

$$\det A^T = \begin{vmatrix} 1 & 0 & 0 \\ 28 & 1 & -2 \\ -29 & -4 & 5 \end{vmatrix},$$

which now has two zeros in the first row. Hence the expansion by the top row becomes particularly easy:

$$\det A^T = 1 \times \begin{vmatrix} 1 & -2 \\ -4 & 5 \end{vmatrix} = 5 - 8 = -3.$$

2. Scalar factor. If every element of any single row or column of the matrix A is multiplied by a scalar k, then the determinant of this matrix is k det A. (Note: this rule is different from Rule 2, Section 7.2, for matrices.)
This is a self-evident result, since just one element from every row and column appears in every term. Thus, by (8.2), if every element of the second row in A is multiplied by k, then

$$\begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11}ka_{22}a_{33} - a_{11}ka_{23}a_{32} - a_{12}ka_{21}a_{33} + a_{12}ka_{23}a_{31} + a_{13}ka_{21}a_{32} - a_{13}ka_{22}a_{31} \\ &= k\det A. \end{aligned}$$

By putting k = 0 in this result, note that any determinant must have zero value if all the elements of any row or column are zeros.
Conversely, a common factor can be taken out as a multiplier of the determinant.

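Example 8.4 Evaluate the determinant

$$\Delta = \begin{vmatrix} -1 & 99 & 1 \\ 2 & 33 & -2 \\ 3 & 55 & 1 \end{vmatrix}.$$

Since the second column obviously has a factor of 11, we can remove this factor from the second column before expansion. Thus, by Rule 2,

$$\begin{aligned} \Delta = 11 \times \begin{vmatrix} -1 & 9 & 1 \\ 2 & 3 & -2 \\ 3 & 5 & 1 \end{vmatrix} &= 11 \times \left( (-1) \times \begin{vmatrix} 3 & -2 \\ 5 & 1 \end{vmatrix} - 9 \times \begin{vmatrix} 2 & -2 \\ 3 & 1 \end{vmatrix} + 1 \times \begin{vmatrix} 2 & 3 \\ 3 & 5 \end{vmatrix} \right) \\ &= 11 \times [-(3 + 10) - 9 \times (2 + 6) + (10 - 9)] \\ &= 11 \times (-13 - 72 + 1) = -924. \end{aligned}$$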
3. Row/column exchange. If B is obtained from A by interchanging two rows (or columns), then det B = −det A.
Suppose, for example, that rows 1 and 3 are interchanged, so that

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \qquad B = \begin{bmatrix} a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}.$$

Then, by analogy with (8.2), the expansion of B by its first row is given by

$$\begin{aligned} \det B &= a_{31}\begin{vmatrix} a_{22} & a_{23} \\ a_{12} & a_{13} \end{vmatrix} - a_{32}\begin{vmatrix} a_{21} & a_{23} \\ a_{11} & a_{13} \end{vmatrix} + a_{33}\begin{vmatrix} a_{21} & a_{22} \\ a_{11} & a_{12} \end{vmatrix} \\ &= a_{31}a_{22}a_{13} - a_{31}a_{23}a_{12} - a_{32}a_{21}a_{13} + a_{32}a_{23}a_{11} + a_{33}a_{21}a_{12} - a_{33}a_{22}a_{11}. \end{aligned}$$

These are the same terms as those present in (8.2) except that the sign of every term is changed. Therefore in this case
det B = −det A.
The same is true whichever row or column pairs are exchanged. The rule applies to a determinant of any order.

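Example 8.5 Evaluate the determinant

$$\Delta = \begin{vmatrix} 1 & 2 & 1 & 2 \\ 0 & 2 & 0 & 0 \\ -1 & 3 & 0 & 4 \\ -1 & 2 & 0 & -1 \end{vmatrix}.$$

There are several ways of approaching the evaluation of this determinant since the second row and third column each have three zeros. It is obviously advantageous to have as many zeros as possible in the top row. With this in view, interchange rows 1 and 2 using Rule 3:

$$\Delta = -\begin{vmatrix} 0 & 2 & 0 & 0 \\ 1 & 2 & 1 & 2 \\ -1 & 3 & 0 & 4 \\ -1 & 2 & 0 & -1 \end{vmatrix}.$$

Expanding by row 1, remembering the sign rule for cofactors (the entry 2 lies in position (1, 2), so its cofactor carries a − sign, which cancels the − sign from the row interchange):

$$\Delta = 2 \times \begin{vmatrix} 1 & 1 & 2 \\ -1 & 0 & 4 \\ -1 & 0 & -1 \end{vmatrix}.$$

Now successively use Rule 1 to interchange rows with columns, and then Rule 3 to interchange the new rows 1 and 2:

$$\Delta = 2 \times \begin{vmatrix} 1 & -1 & -1 \\ 1 & 0 & 0 \\ 2 & 4 & -1 \end{vmatrix} = (-2) \times \begin{vmatrix} 1 & 0 & 0 \\ 1 & -1 & -1 \\ 2 & 4 & -1 \end{vmatrix} = (-2) \times \begin{vmatrix} -1 & -1 \\ 4 & -1 \end{vmatrix} = (-2) \times (1 + 4) = -10.$$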
4. Expansion by any row or column. From (8.2), by grouping the terms differently, we can write, for example,

$$\begin{aligned} \det A &= a_{31}(a_{12}a_{23} - a_{13}a_{22}) - a_{32}(a_{11}a_{23} - a_{13}a_{21}) + a_{33}(a_{11}a_{22} - a_{12}a_{21}) \\ &= a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \\ &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \end{aligned}$$

in terms of cofactors. Here the elements $a_{31}, a_{32}, a_{33}$ constitute the third row, and we call this the expansion of det A by the third row.
It can be shown that the expansion can be written down similarly using any row or column. Thus

$$\det A = a_{12}C_{12} + a_{22}C_{22} + a_{32}C_{32}$$

is an expansion by the second column.

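Example 8.6 Evaluate det A, where

$$A = \begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 0 & 0 & 2 \\ -1 & 2 & 2 & 4 \\ 2 & 1 & 0 & -1 \end{bmatrix}.$$

Since column 3 contains three zeros, expand by this column. The cofactor of the element in row 3, column 3, is the 3 × 3 determinant obtained from A by deleting the third row and third column in A. It is associated with a + sign. Hence

$$\begin{aligned} \det A &= 2\begin{vmatrix} 1 & 3 & 1 \\ 1 & 0 & 2 \\ 2 & 1 & -1 \end{vmatrix} && \text{(remember the sign rule)} \\ &= 2\left( -1 \times \begin{vmatrix} 3 & 1 \\ 1 & -1 \end{vmatrix} - 2 \times \begin{vmatrix} 1 & 3 \\ 2 & 1 \end{vmatrix} \right) && \text{(expanding by row 2)} \\ &= 2(4 + 10) = 28. \end{aligned}$$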

For Rules 5 and 6 it is useful to use the expressions linearly dependent or linearly independent to describe important general relations between the rows (or columns) of a matrix or determinant, or between vectors. For illustration, consider the 3 × 3 case. Denote the rows by bold type, $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$, each standing for a row of entries, with $\mathbf{r}_1 = (a_1, b_1, c_1)$, etc. If α is any constant, then $\alpha\mathbf{r}_1$ stands for the row $(\alpha a_1, \alpha b_1, \alpha c_1)$, and so on:

Linear dependence
The rows $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ are said to be linearly dependent if α, β, γ, not all zero, exist such that $\alpha\mathbf{r}_1 + \beta\mathbf{r}_2 + \gamma\mathbf{r}_3 = (0, 0, 0)$. Otherwise the three rows are linearly independent.   (8.4)

The definition can be extended to any number n of dimensions. In three dimensions a simple expression of linear dependence of the three rows is that if any one of the rows equals the sum of multiples of the other two (including the case where one row is equal to a multiple of either one of the others) the three rows are linearly dependent. There may be more than one such relation if the dimension n > 3; in all cases the whole group of n rows is said to be linearly dependent.
Notice that the same terminology is applicable to any collection of n-dimensional vectors (see Chapters 9, 10, 11). For example, if three non-zero three-dimensional vectors are coplanar, they are linearly dependent, and conversely, if they are linearly dependent, they are coplanar.
We return to property 5.

5. Zero determinant. If the rows (or columns) of A are linearly dependent, then det A = 0.
The simplest case is when two rows are identical. By Rule 3, interchange of two rows changes the sign of det A even if the rows are identical, so that the interchange amounts to no change. Therefore det A = −det A. This is possible only if det A = 0.
Suppose one row is a multiple k ≠ 0 of another. Then by Rule 2 we may take k out as a factor, leaving k times a determinant with two identical rows, which is zero by the previous case; hence det A = 0. (There may be more than one such relation.)
We prove the general case of linear dependence, illustrating it for three dimensions, using the notation $A = [a_{ij}]$ for the matrix elements and $C_{ij}$ for the corresponding cofactors as in Section 8.1. Suppose, for example, that the first row consists of a linear combination of the second and third rows, so that $a_{1j} = pa_{2j} + qa_{3j}$, j = 1, 2, 3, where p and q are constants, not both zero.
Expand the determinant det A by the top row:

$$\begin{aligned} \det A &= (pa_{21} + qa_{31})C_{11} + (pa_{22} + qa_{32})C_{12} + (pa_{23} + qa_{33})C_{13} \\ &= p(a_{21}C_{11} + a_{22}C_{12} + a_{23}C_{13}) + q(a_{31}C_{11} + a_{32}C_{12} + a_{33}C_{13}), \end{aligned}$$

in which $C_{11}, C_{12}, C_{13}$ are the cofactors corresponding respectively to $a_{11}, a_{12}, a_{13}$. Each bracketed expression on the right is the expansion by the top row of a determinant whose first row repeats one of its other rows, and is therefore zero, as we showed above. Therefore det A = 0. For dimensions n > 3 there will simply be more coefficients p, q, r, … , and correspondingly more terms in the expression.
Thus, for example,

$$\begin{vmatrix} 99 & 18 & 63 \\ 11 & 2 & 7 \\ -2 & 3 & 4 \end{vmatrix} = 9\begin{vmatrix} 11 & 2 & 7 \\ 11 & 2 & 7 \\ -2 & 3 & 4 \end{vmatrix} \;\text{(by Rule 2)} \; = 0 \;\text{(by Rule 5)}.$$

6. Elementary row/column transformations. If the matrix B is constructed from A by adding k times one row (or column) to another row (column), then det B = det A: in other words, any number of such operations on rows and on columns has no effect on the value of det A.
For our standard matrix A, consider the matrix B which is obtained from A by adding k times the elements in the first row to the elements in the third row. Then

$$\begin{aligned} \det B &= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} + ka_{11} & a_{32} + ka_{12} & a_{33} + ka_{13} \end{vmatrix} \\ &= (a_{31} + ka_{11})C_{31} + (a_{32} + ka_{12})C_{32} + (a_{33} + ka_{13})C_{33} \qquad \text{(expanding by row 3)} \\ &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} + k(a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33}) \\ &= \det A + k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{vmatrix} \\ &= \det A, \end{aligned}$$

since the second determinant vanishes by Rule 5, having two rows with the same elements.
Note that

$$a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} = 0;$$

that is, in its general form, the sum of the products of the elements of one row (or column) and the cofactors of the corresponding elements of another row (column) is zero. This follows since the left-hand side is the expansion of a determinant with two identical rows (columns).
Rule 6 is a particularly useful rule for simplifying the elements in a determinant before expansion and evaluation. We illustrate a number of these points in the next example.

Example 8.7 Evaluate

$$\Delta = \begin{vmatrix} 2 & 99 & -99 \\ 999 & 1000 & 1001 \\ 1000 & 1001 & 998 \end{vmatrix}.$$

Usually we use the rules (particularly Rule 6) either to introduce zeros into the matrix or to reduce the size of elements as far as possible. It is important to list the operations in order to make the sequence of operations intelligible. For this purpose we identify the current rows by $r_1, r_2, \ldots$, and the current columns by $c_1, c_2, \ldots$. Denote the new rows and columns which have been changed by $r_1', r_2', \ldots$ and $c_1', c_2', \ldots$ respectively. There are many ways of approaching the evaluation of Δ. A first step in this example could be to add column 3 ($c_3$) to column 2 ($c_2$) since this produces a zero at the top of column 2. This operation is represented by $c_2' = c_2 + c_3$, and we list the operations on the right-hand side as we proceed. The second operation is to subtract the new row 3 from the new row 2. A decision is taken at each step in the light of the new matrix. By Rule 6, these operations do not affect the value of Δ. Hence

$$\begin{aligned} \Delta &= \begin{vmatrix} 2 & 99 & -99 \\ 999 & 1000 & 1001 \\ 1000 & 1001 & 998 \end{vmatrix} \\ &= \begin{vmatrix} 2 & 0 & -99 \\ 999 & 2001 & 1001 \\ 1000 & 1999 & 998 \end{vmatrix} && (c_2' = c_2 + c_3) \\ &= \begin{vmatrix} 2 & 0 & -99 \\ -1 & 2 & 3 \\ 1000 & 1999 & 998 \end{vmatrix} && (r_2' = r_2 - r_3) \\ &= \begin{vmatrix} 2 & 0 & -99 \\ -2 & 2 & 2 \\ 0.5 & 1999 & -1.5 \end{vmatrix} && (c_1' = c_1 - \tfrac12 c_2,\; c_3' = c_3 - \tfrac12 c_2) \\ &= \begin{vmatrix} 2 & 0 & -93 \\ -2 & 2 & -4 \\ 0.5 & 1999 & 0 \end{vmatrix} && (c_3' = c_3 + 3c_1) \\ &= 2(4 \times 1999) - 93[(-2 \times 1999) - 1] = 387\,899 && \text{(expanding by row 1)}. \end{aligned}$$

Note that while $r_2' = r_2 + kr_3$ does not affect the value of the determinant, $r_2' = kr_2 + r_3$ will change its value by a factor k.
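The bookkeeping in Example 8.7 can be replayed mechanically to confirm that each Rule 6 operation leaves Δ unchanged. Below is a sketch in plain Python with our own helper names. Rows and columns are 0-based, so $c_2$ is index 1, and `Fraction` keeps the halves exact. Both half-column operations use the same $c_2$, which neither of them changes, so applying them one after the other matches the single step in the text.

```python
from fractions import Fraction

def det3(a):
    """3 x 3 determinant by expansion along the top row, as in (8.3)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[2][1] * a[1][2])
            - a[0][1] * (a[1][0] * a[2][2] - a[2][0] * a[1][2])
            + a[0][2] * (a[1][0] * a[2][1] - a[2][0] * a[1][1]))

def add_col(a, dst, src, k):
    """Copy of a with column dst replaced by c_dst + k * c_src (Rule 6)."""
    return [[x + k * row[src] if j == dst else x for j, x in enumerate(row)]
            for row in a]

def add_row(a, dst, src, k):
    """Copy of a with row dst replaced by r_dst + k * r_src (Rule 6)."""
    return [[x + k * y for x, y in zip(row, a[src])] if i == dst else list(row)
            for i, row in enumerate(a)]

half = Fraction(1, 2)
D = [[2, 99, -99], [999, 1000, 1001], [1000, 1001, 998]]
steps = [D]
D = add_col(D, 1, 2, 1)        # c2' = c2 + c3
steps.append(D)
D = add_row(D, 1, 2, -1)       # r2' = r2 - r3
steps.append(D)
D = add_col(D, 0, 1, -half)    # c1' = c1 - (1/2) c2
D = add_col(D, 2, 1, -half)    # c3' = c3 - (1/2) c2
steps.append(D)
D = add_col(D, 2, 0, 3)        # c3' = c3 + 3 c1
steps.append(D)

print(all(det3(s) == 387899 for s in steps))  # True
```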

7. Product of determinants. If A and B are square matrices of the same order, then det(AB) = det A det B. (We shall not prove this here.)
If A is non-singular and we put $B = A^{-1}$, then it follows from this rule that

$$\det A \det(A^{-1}) = \det(AA^{-1}) = \det I_n = 1.$$

Hence $\det A = 1/\det(A^{-1})$.

Summary of rules for determinants

Let $A = [a_{ij}]$. Then
1. $\det A^T = \det A$.
2. If B is obtained from A by multiplying the elements of any one row or column by k, then det B = k det A.
3. If B is obtained from A by interchanging two rows (or two columns), then det B = −det A.
4. If $\mathbf{p}$ is a vector representing any row or column in A, and $\mathbf{C}$ is the corresponding vector of cofactors, then $\det A = \mathbf{p} \cdot \mathbf{C}$.
5. If the rows (or the columns) of A are linearly dependent, then det A = 0.
6. If B is constructed from A by adding k times one row (or column) to another row (or column), then det A = det B.
7. det(AB) = det A det B.   (8.5)
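The summary rules are easy to spot-check numerically. Such a check is not a proof. The sketch below uses plain Python with our own helper names and seeded random integer matrices as test data. It computes determinants from the permutation definition of Section 8.1 and asserts Rules 1 to 7 on each trial. For Rules 4 to 6 it uses the last row for the expansion, replaces the third row by $\mathbf{r}_1 + 2\mathbf{r}_2$ to create dependent rows, and adds k times the first row to the last row.

```python
import random
from itertools import permutations
from math import prod

def det(a):
    """Determinant from the permutation definition (sign from the inversion count)."""
    n = len(a)
    total = 0
    for p in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        total += (-1) ** inv * prod(a[i][p[i]] for i in range(n))
    return total

def cofactor(a, i, j):
    """Signed minor C_ij (0-based indices)."""
    m = [r[:j] + r[j + 1:] for t, r in enumerate(a) if t != i]
    return (-1) ** (i + j) * det(m)

def transpose(a):
    return [list(c) for c in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def replace_row(a, i, new_row):
    """Copy of a with row i replaced by new_row."""
    return [list(new_row) if t == i else list(r) for t, r in enumerate(a)]

def check_rules(trials=20, n=4, k=3, seed=1):
    """Assert Rules 1-7 on random integer n x n matrices (n >= 3); returns trials."""
    rng = random.Random(seed)   # fixed seed so the test data are repeatable
    for _ in range(trials):
        A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        B = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
        dA = det(A)
        assert det(transpose(A)) == dA                                        # Rule 1
        assert det(replace_row(A, 1, [k * x for x in A[1]])) == k * dA        # Rule 2
        assert det(replace_row(replace_row(A, 0, A[1]), 1, A[0])) == -dA      # Rule 3
        assert sum(A[n - 1][j] * cofactor(A, n - 1, j) for j in range(n)) == dA  # Rule 4
        dependent = [x + 2 * y for x, y in zip(A[0], A[1])]                   # r1 + 2 r2
        assert det(replace_row(A, 2, dependent)) == 0                         # Rule 5
        combined = [x + k * y for x, y in zip(A[n - 1], A[0])]                # r_n + k r1
        assert det(replace_row(A, n - 1, combined)) == dA                     # Rule 6
        assert det(matmul(A, B)) == dA * det(B)                               # Rule 7
    return trials

print(check_rules())  # 20
```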

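Self-test 8.2
The determinant

$$D_n = \begin{vmatrix} x & a & a & \cdots & a \\ a & x & a & \cdots & a \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a & a & a & \cdots & x \end{vmatrix}$$

has n rows. Factorize $D_n$ in terms of x.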

8.3 The adjoint and inverse matrices

We can now rewrite the formula for the inverse given in Section 7.4, using cofactors. The transposed matrix of cofactors given by

$$\operatorname{adj} A = \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix} \tag{8.6}$$

is known as the adjoint of A; the term adjugate is also used. Hence the inverse matrix of A given by eqn (7.13) becomes

$$A^{-1} = \frac{\operatorname{adj} A}{\det A},$$

in terms of the adjoint and determinant of A.
We can alternatively confirm by direct matrix multiplication that adj A/det A is the inverse of A. Thus, using (8.6),

$$A\,\frac{\operatorname{adj} A}{\det A} = \frac{1}{\det A}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix}$$

$$= \frac{1}{\det A}\begin{bmatrix} a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} & a_{11}C_{21} + a_{12}C_{22} + a_{13}C_{23} & a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \\ a_{21}C_{11} + a_{22}C_{12} + a_{23}C_{13} & a_{21}C_{21} + a_{22}C_{22} + a_{23}C_{23} & a_{21}C_{31} + a_{22}C_{32} + a_{23}C_{33} \\ a_{31}C_{11} + a_{32}C_{12} + a_{33}C_{13} & a_{31}C_{21} + a_{32}C_{22} + a_{33}C_{23} & a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \end{bmatrix}$$

$$= \frac{1}{\det A}\begin{bmatrix} \det A & 0 & 0 \\ 0 & \det A & 0 \\ 0 & 0 & \det A \end{bmatrix} = I_3.$$

This confirmation uses two results. The sum of the products of the elements of one row and their cofactors is the value of the determinant. The sum of the products of the elements of one row and the cofactors of the corresponding elements of another row is zero.

Example 8.8 Find the inverse of
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & -1 \\ 1 & -1 & -2 \end{bmatrix}.$$
We evaluate det A first. Thus
det A = 1 × (−2 − 1) − 2 × (0 + 1) − 1 × (0 − 1) = −4.
Since det A ≠ 0, the inverse exists. The cofactors are
C11 = −3, C12 = −1, C13 = −1,
C21 = 5, C22 = −1, C23 = 3,
C31 = −1, C32 = 1, C33 = 1.
Hence
$$A^{-1} = -\frac{1}{4}\begin{bmatrix} -3 & 5 & -1 \\ -1 & -1 & 1 \\ -1 & 3 & 1 \end{bmatrix}.$$
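The steps of Example 8.8 can be checked with a short Python sketch of formula (8.7), assuming integer entries. Cofactors are computed from minors, the adjoint is formed as the transposed matrix of cofactors, and `Fraction` keeps the inverse exact. The function names are our own; indices in the code start at 0, so `cofactor(m, 0, 0)` is C11.

```python
from fractions import Fraction


def minor(m, i, j):
    """Return m with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cofactor(m, i, j):
    """C_ij = (-1)^(i+j) times the minor; the sign is unchanged by 0-based indexing."""
    return (-1) ** (i + j) * det(minor(m, i, j))


def adjoint(m):
    """Transposed matrix of cofactors, eqn (8.6): entry (i, j) is C_ji."""
    n = len(m)
    return [[cofactor(m, j, i) for j in range(n)] for i in range(n)]


def inverse(m):
    """A^-1 = adj A / det A, eqn (8.7), in exact rational arithmetic."""
    d = det(m)
    if d == 0:
        raise ValueError("det A = 0, so the inverse does not exist")
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]


A = [[1, 2, -1], [0, 1, -1], [1, -1, -2]]
print(det(A))        # -4
print(adjoint(A))    # [[-3, 5, -1], [-1, -1, 1], [-1, 3, 1]]
```

Multiplying `A` by `inverse(A)` row by column should then give the identity matrix, as in the confirmation above.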
The definition of the adjoint generalizes to matrices of higher order. However, the adjoint of a 4 × 4 matrix contains sixteen 3 × 3 determinants, which is about the limit of hand calculation unless the matrix is only sparsely filled with nonzero elements or can be reduced to such a form. Such computations are a fertile source of errors. Computer packages are available which will quickly perform the arithmetic for determinants of reasonable size.
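Library routines generally avoid cofactor expansion, whose cost grows like n!, and instead reduce the matrix to triangular form by elimination (LU factorization). The sketch below is one such method, chosen by us for illustration rather than taken from the text: it uses rule 6 (adding a multiple of one row to another leaves the determinant unchanged) and rule 3 (each row interchange changes its sign), then multiplies the diagonal elements of the resulting triangular matrix. Its cost grows roughly like n³. The name `det_by_elimination` is hypothetical.

```python
from fractions import Fraction


def det_by_elimination(m):
    """Determinant by reduction to upper triangular form.

    Rule 6: adding a multiple of one row to another leaves det unchanged.
    Rule 3: each interchange of two rows changes the sign of det.
    Fraction arithmetic keeps the result exact for integer input.
    """
    a = [[Fraction(x) for x in row] for row in m]   # work on an exact copy
    n = len(a)
    sign = 1
    for col in range(n):
        # choose a row at or below the diagonal with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)   # no nonzero pivot available, so det = 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]   # det of a triangular matrix = product of its diagonal
    return result


A = [[1, 2, -1], [0, 1, -1], [1, -1, -2]]   # matrix of Example 8.8
print(det_by_elimination(A))                 # -4
```

Exact fractions are used here for clarity; numerical packages work in floating point and choose pivots to limit rounding error.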

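Determinant, adjoint, and inverse for 3 × 3 matrices

(a) Determinant of A
$$\det A = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix};$$
(b) Adjoint or adjugate of A
$$\operatorname{adj} A = \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix};$$
(c) Inverse of A
$$A^{-1} = \frac{\operatorname{adj} A}{\det A}. \qquad (8.7)$$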

Self-test 8.3
Let
$$A = \begin{bmatrix} 1 & k & 2 \\ 2 & -1 & -2 \\ 1 & -1 & 1 \end{bmatrix}.$$
Find the adjoint and inverse of A, and state the value of k for which the
inverse does not exist. For this value of k calculate the product A adj A. Is the
answer predictable?

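Problems

8.1 Evaluate the following determinants.
$$\text{(a)}\ \begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix}; \qquad \text{(b)}\ \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{vmatrix}; \qquad \text{(c)}\ \begin{vmatrix} 1 & -1 & 2 \\ 3 & 1 & -1 \\ 2 & 1 & -1 \end{vmatrix};$$
$$\text{(d)}\ \begin{vmatrix} 2 & 1 & 0 & -1 \\ 0 & 0 & 2 & 0 \\ 3 & -1 & 2 & 1 \\ 0 & 1 & -1 & 1 \end{vmatrix}; \qquad \text{(e)}\ \begin{vmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{vmatrix}; \qquad \text{(f)}\ \begin{vmatrix} 2 & 1 & 0 & 0 & 0 \\ 1 & 2 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{vmatrix}.$$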
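8.2 Without evaluating the following determinants, explain why they are all zero:
$$\text{(a)}\ \begin{vmatrix} 2 & 3 & 4 \\ 4 & 6 & 8 \\ 1 & -1 & 2 \end{vmatrix}; \qquad \text{(b)}\ \begin{vmatrix} -1 & 2 & 3 \\ 3 & 1 & -2 \\ -2 & -3 & -1 \end{vmatrix};$$
$$\text{(c)}\ \begin{vmatrix} a & b & c \\ b & c & a \\ a-b & b-c & c-a \end{vmatrix}; \qquad \text{(d)}\ \begin{vmatrix} 1 & 1 & 1 \\ 3 & 0 & 0 \\ 5 & 0 & 0 \end{vmatrix}.$$

8.3 Given that
$$\Delta = \begin{vmatrix} a & b & c \\ b & c & a \\ c & a & b \end{vmatrix},$$
what is the value of
a3 ab ac 2
ab c ac
ac a bc
in terms of ∆?

8.4 Simplify first and then evaluate the following determinants:
$$\text{(a)}\ \begin{vmatrix} 99 & 100 & 200 \\ 98 & 102 & 199 \\ -1 & 2 & 3 \end{vmatrix}; \qquad \text{(b)}\ \begin{vmatrix} 77 & 84 & 55 \\ 75 & 87 & 57 \\ 1 & -2 & 3 \end{vmatrix};$$
$$\text{(c)}\ \begin{vmatrix} 2 & -1 & 1 \\ 99 & 98 & 55 \\ 200 & 197 & 111 \end{vmatrix}; \qquad \text{(d)}\ \begin{vmatrix} 87 & 84 & 83 & 81 \\ 77 & 76 & 77 & 75 \\ 54 & 53 & 52 & 54 \\ -43 & -44 & -46 & -4 \end{vmatrix}.$$

8.5 Explain why the determinant
$$\Delta = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix}$$
has factors b − c, c − a, and a − b. Express the value of ∆ as the product of factors.

8.6 Factorize the determinant
$$\Delta = \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^3 & b^3 & c^3 \end{vmatrix}.$$

8.7 Explain, using one of the rules for determinants, why the equation
$$\begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0$$
represents the equation of the straight line through the points $(a_1, b_1)$ and $(a_2, b_2)$ in the (x, y) plane. If $X_1$ and $X_2$ are the cofactors of x and y, what is the slope of this line in terms of the cofactors? Using this method, find the equation of the straight line through the points:
(a) (1, −1) and (2, 3); (b) (−1, 0) and (4, −1).

8.8 Find the value of a which makes the determinant
$$\begin{vmatrix} 1 & 1 & -1 \\ 1 & a & 2 \\ -1 & 1 & 2 \end{vmatrix}$$
equal to zero.

8.9 Explain why
$$\begin{vmatrix} x & 2 & -2 \\ 2 & x & 3 \\ x & -1 & x \end{vmatrix} = 0$$
will be at most a cubic equation in x, but that
$$\begin{vmatrix} 1 & 1 & 2 \\ 3 & x & 2 \\ x & 1 & x \end{vmatrix} = 0$$
will be at most a quadratic equation in x. Solve both equations, and find all roots including any complex ones.

8.10 Show that
$$\begin{vmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} b_{11} & b_{12} & b_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

8.11 The determinant
$$\begin{vmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \end{vmatrix}$$
is required as the sum of determinants, each of whose columns contains only as or only bs. How many determinants are there in the sum? If the determinant is n × n, how many determinants would there be in the sum?

8.12 Show that
$$\begin{vmatrix} 1 & a_1-b_1 & a_1+b_1 \\ 1 & a_2-b_2 & a_2+b_2 \\ 1 & a_3-b_3 & a_3+b_3 \end{vmatrix} = 2\begin{vmatrix} 1 & a_1 & b_1 \\ 1 & a_2 & b_2 \\ 1 & a_3 & b_3 \end{vmatrix}.$$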
8.13 Let $D_n$ be the n × n tridiagonal determinant defined by
$$D_n = \begin{vmatrix} 2 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \ddots & \vdots \\ 0 & 1 & 2 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & 1 & 2 \end{vmatrix}.$$
Show that
$$D_n = 2D_{n-1} - D_{n-2}.$$
If $Q_n = D_n - D_{n-1}$, deduce that $Q_n = Q_{n-1} = \cdots = Q_2 = 1$. Show that $D_n = n + 1$.

8.14 Find all values of x for which
$$\begin{vmatrix} x & a & b & c \\ a & x & b & c \\ a & b & x & c \\ a & b & c & x \end{vmatrix}$$
is zero.

8.15 Let A and B be the two 2 × 2 matrices
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.$$
Write down det A and det B.
(a) Find the product AB and its determinant det AB. Confirm that det AB = det A det B. Show also that $\det(A^2) = (\det A)^2$.
(b) Write down $A^{\mathrm{T}}$ and find its determinant $\det A^{\mathrm{T}}$. Confirm that $\det A^{\mathrm{T}} = \det A$.
(c) Find $A^{-1}$ and $\det A^{-1}$, assuming that det A ≠ 0. Confirm that $\det A^{-1} = 1/\det A$.
(d) Show that det adj A = det A.
(These formulae for 2 × 2 matrices suggest generalizations for n × n matrices. Thus, for two n × n matrices A and B,
$$\det AB = \det A \det B, \quad \det A^n = (\det A)^n, \quad \det A^{\mathrm{T}} = \det A, \quad \det A^{-1} = 1/\det A, \quad \det \operatorname{adj} A = (\det A)^{n-1}.$$
We shall not attempt to prove these formulae here.)

8.16 If
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 1 & 3 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 3 & 1 \\ 2 & 1 & 3 \end{bmatrix},$$
calculate det A, det B, det AB, $A^{\mathrm{T}}$, $\det A^{\mathrm{T}}$, adj A, det adj A, $A^{-1}$, and $\det A^{-1}$. Confirm the results conjectured at the end of the previous problem.

8.17 The elements in a 3 × 3 matrix A = [aij] are given by the formula
$$a_{ij} = \alpha^j + (-1)^i 2^j \quad (i, j = 1, 2, 3).$$
Show that det A = 0 for all real α.
9 Elementary operations with vectors

CONTENTS

9.1 Displacement along an axis
9.2 Displacement vectors in two dimensions
9.3 Axes in three dimensions
9.4 Vectors in two and three dimensions
9.5 Relative velocity
9.6 Position vectors and vector equations
9.7 Unit vectors and basis vectors
9.8 Tangent vector, velocity, and acceleration
9.9 Motion in polar coordinates
Problems

A vector quantity in geometry, mechanics, and physics is one that has both magnitude and direction, and obeys certain strict rules of combination (see Section 9.4). A great convenience of vectors is that a multi-dimensional variable can be represented and manipulated by a single symbol (such as v for velocity). Using the simplest types, arising in two- and three-dimensional geometry, we demonstrate the concepts of a directed line segment, components, vector addition, and position vector (see Sections 9.1 to 9.7). We then consider derivatives of vectors that are functions of time or position. This process delivers the less intuitive vector quantities velocity and acceleration, and the tangent and curvature of curves.

9.1 Displacement along an axis


Figure 9.1 shows an x axis with origin at O and a scale indicated. The positive
direction for x is from left to right. Two points are marked, P at x = xP = −2 and Q
at x = xQ = 1.5. The distance between two points is always expressed as a positive
number, so in this case
distance from P to Q, or from Q to P = PQ or QP = 2 + 1.5 = 3.5 units.
If we are told that xP = −2, and that the distance between P and Q is 3.5 units,
this does not tell us where Q is: xQ might be either 1.5 or −5.5. We need a way to
express, as a single piece of information, both the distance PQ and whether Q lies
to the right or left of P.
[Fig. 9.1: the x axis, with P at x = −2, the origin O, and Q at x = 1.5]

This is done by attaching a plus or minus sign to the distance. We use plus if Q
as viewed from P is in the positive direction of the x axis (to the right in this case),
and minus if Q is in the negative direction (to the left in this case). This quantity
is called the displacement of Q relative to P, or the displacement of Q from P, and
is defined in terms of xQ and xP by
displacement of Q from P = xQ − xP .
In this case the displacement of Q from P is equal to 1.5 − (−2) = 3.5. This is
positive, showing that Q is to the right of P. By the same rule,
displacement of P from Q = xP − xQ = (−2) − 1.5 = −3.5.
The minus sign indicates that P is to the left of Q.

Example 9.1 A pedestrian wanders up and down the high street, which extends
east and west. Starting at the bus stop, she strolls 80 m east, 25 m west, 50 m
east, then races 100 m west, at which point the returning bus drives off. Where
was she, relative to the bus stop, at this time?

[Fig. 9.2: the pedestrian's route from the bus stop B through C, D, E to F: 80 m east, 25 m west, 50 m east, 100 m west, ending 5 m from the bus stop]

There is no difficulty about this question: Fig. 9.2 shows that she ends up 5 m east of the bus stop, still 5 m short of it. Notice how natural it is to count one direction as positive and the other as negative. We shall formalize this, because the problem gives useful illustrations of how displacements are handled.
In Fig. 9.3 we have drawn an east-pointing axis x. The origin is at O (it will make no difference where it is) and the bus stop is at B. The direction changes at C, D, and E, and the end-point is F.

[Fig. 9.3: the points B, F, D, C, E on the east-pointing x axis, with origin O; scale in 10 m steps]

We want to find the displacement of F from B. This is defined by xF − xB. Write it in the form
xF − xB = (xF − xE) + (xE − xD) + (xD − xC) + (xC − xB)

which is identically true because xE, xD, and xC cancel out. The quantities in the
brackets are relative displacements: for example, (xD − xC) represents the displacement
of D relative to C.
The data of the problem consists of these displacements; all we have to do is to get
the signs right. For example, since we chose the positive direction to be east, and the
movement from C to D is west, the displacement of D from C is −25. By substituting
all the information we obtain
xF − xB = 80 + (−25) + 50 + (−100) = 5.
Since this is positive, she ends up 5 m east of the bus stop.
We did not need to know the actual coordinates of any of the points B to F; the position
of the origin O makes no difference to relative displacements.

Relative displacement along a line


Definition: Given an axis Ox, and points P and Q,
Displacement of Q from P = xQ − xP = −(displacement of P from Q).
(a) The value of xQ − xP is unaffected by changing the origin of x.
(b) Addition of displacements:
xD − xA = (xD − xC) + (xC − xB) + (xB − xA),
identically (there may be any number of intermediate points).
(c) The order in which the displacements take place does not affect the final
displacement. (9.1)
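The bookkeeping of Example 9.1 and rules (a) to (c) can be mirrored in a few lines of Python. This is a minimal sketch of our own: east is taken as positive, the list `legs` holds the signed displacements from the example, and the bus-stop coordinate `x_B` is an arbitrary value chosen to show that the origin does not matter.

```python
# Signed displacements in Example 9.1, east taken as the positive x direction.
legs = [80, -25, 50, -100]   # metres: B to C, C to D, D to E, E to F


def net_displacement(steps):
    """x_F - x_B as the sum of successive relative displacements, rule (b)."""
    return sum(steps)


d = net_displacement(legs)
print(d)   # 5, so she ends 5 m east of the bus stop

# Rule (c): the order of the displacements does not matter.
assert net_displacement(reversed(legs)) == d

# Rule (a): the origin is irrelevant. Place the bus stop at an arbitrary x_B.
x_B = 12.0
x = x_B
for step in legs:
    x += step          # coordinate after each leg
assert x - x_B == d
```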

9.2 Displacement vectors in two dimensions


We shall extend the idea of displacement into two dimensions. Suppose that
a ferry boat stationed at a port A is instructed to proceed in three stages to a
destination D. The instructions are:
(a) go 50 km east;
(b) continue 20 km north;
(c) continue 20 km north west to D.
The navigator can plot the route as in Fig. 9.4. The axes point east and north
for convenience, and the origin O may be anywhere we please. The three stages
are drawn to scale, with their directions indicated by the arrowheads.
[Fig. 9.4: the ferry's route from A via B and C to D (50 km east, 20 km north, 20 km north west), with axes x and y marked in km × 10]

[Fig. 9.5: the displacement of 1 km south east from P to Q, with components 1/√2 km east and (−1/√2) km north]

Each instruction prescribes a displacement in two dimensions relative to the initial point of each stage. A notation for the displacements is
A_B, B_C, C_D.
The bar emphasizes that they have particular directions (e.g. A to B). The arrows in the figures indicate displacement vectors.

Any displacement vector can be described by a pair of numbers, called its
components. The two components are the displacements in the x and y directions
that take place during the two-dimensional displacement. For example, Fig. 9.5
shows the displacement vector corresponding to the instruction:
Proceed 1 km south east from P.
The same point Q is arrived at by saying:
Proceed to a point 1/√2 km east and (−1/√2) km north of P.
The numbers 1/√2 and (−1/√2) are the components of the displacement vector
P_Q.
In general, if P_Q represents a displacement vector, and P and Q have coordinates
(xP, yP) and (xQ, yQ), then
component of P_Q in the x direction = xQ − xP,
component of P_Q in the y direction = yQ − yP.
We may then write P_Q in component form:
P_Q = (xQ − xP, yQ − yP).
Suppose we have a chain of successive displacements. In the ferry boat problem
(Fig. 9.4) the chain consists of A_B, B_C, C_D. The final displacement, of D relative
to A, is A_D. In component form
A_D = (xD − xA, yD − yA).
The final components can be broken up into the successive stages:
xD − xA = (xD − xC) + (xC − xB) + (xB − xA),
yD − yA = (yD − yC ) + (yC − yB) + (yB − yA).
Since C_D = (xD − xC, yD − yC), and so on, it is reasonable to write
A_D = C_D + B_C + A_B.
The components are ordinary numbers, so we can change the order in which displacements are added, and write instead, in the order of events
A_D = A_B + B_C + C_D,
or reorder them in any other way: the traveller would still arrive at the same point.
This is true for any number of displacements.

Example 9.2 Figure 9.6 shows the track of the ferry boat again (see Fig. 9.4).
(a) Find the displacement vector A_D in component form.
(b) Express A_D in terms of its length AD and the angle θ it makes with the
positive x axis.

[Fig. 9.6: the ferry's track A, B, C, D, showing the x and y displacements of D from A and the angle θ that A_D makes with the x axis]

(a) In component form,


A_B = (50, 0), B_C = (0, 20), C_D = (−20/√2, 20/√2)
(in km units) and suppose that
A_D = (X, Y).
Then, adding the individual x and y components,
X = 50 + 0 − 20/√2 = 50 − 20/√2,
and Y = 0 + 20 + 20/√2 = 20 + 20/√2.
(b) From Fig. 9.6,
length AD = √(X² + Y²) = √[(50 − 20/√2)² + (20 + 20/√2)²] = 49.5 (km),
θ = arctan(Y/X) = arctan 0.952 = 43.6°.
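The component arithmetic of Example 9.2 can be reproduced in Python as a check. In this sketch (our own), `math.atan2(Y, X)` is used instead of arctan(Y/X) because it returns the angle in the correct quadrant even when X is negative; here X > 0, so both agree.

```python
import math

# Stages of the ferry route as (x, y) = (east, north) components, in km.
legs = [
    (50.0, 0.0),                                # A to B: 50 km east
    (0.0, 20.0),                                # B to C: 20 km north
    (-20 / math.sqrt(2), 20 / math.sqrt(2)),    # C to D: 20 km north west
]

X = sum(dx for dx, _ in legs)   # x component of A_D
Y = sum(dy for _, dy in legs)   # y component of A_D

length = math.hypot(X, Y)                  # sqrt(X**2 + Y**2)
theta = math.degrees(math.atan2(Y, X))     # angle with the positive x axis
print(f"AD = {length:.1f} km, theta = {theta:.1f} degrees")
# AD = 49.5 km, theta = 43.6 degrees
```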

Self-test 9.1
Three vectors in the plane are given by
A_B = (10, 20), B_C = (5, 20), C_D = (14, –5).
Find the displacement vector A_D. What are the magnitude and direction
of A_D?
9.3 Axes in three dimensions

From now on we shall consider both two- and three-dimensional situations. To locate points in a plane, two axes are needed. For three-dimensional space, introduce a third axis Oz perpendicular to the other two and drawn through the origin O in the direction shown in Fig. 9.7. These axes are indicated briefly by Oxyz.
The position of any point P is then specified by a triplet of coordinates, (x, y, z),
determined by reference to the three axes, Ox, Oy, Oz. For the point P in Fig. 9.7,
x = 2, y = 3, and z = 1, and we indicate P by writing P : (2, 3, 1).

[Fig. 9.7: the point P : (2, 3, 1) referred to the axes Oxyz]

There was a choice of two possible directions for Oz, as shown in Fig. 9.8.
These two sets of axes cannot be superposed no matter how we turn them about:
they are mirror images of each other, like a right shoe and a left shoe. The axes
shown in Fig. 9.8a are called right-handed axes (left-handed axes, Fig. 9.8b, are
seldom used).

[Fig. 9.8: (a) Right-handed axes. (b) Left-handed axes.]

9.4 Vectors in two and three dimensions


A displacement vector is a case of a physical quantity which has a magnitude and
a direction, and which follows a certain set of rules similar to those in ordinary
algebra. Velocity, acceleration, and force are other examples. Such a quantity is
called a vector quantity, and can be depicted in terms of directed line segments
similar to those we used in Section 9.2 for displacements. The rules which follow
apply to directed line segments. We shall illustrate later how the rules also apply
to other vectors, such as forces.
[Fig. 9.9: A typical vector P_Q, or a.]

1. Components and magnitude. Figure 9.9 shows a vector placed in a set of axes.
Its initial point is P : (xP, yP, zP) and its end-point is Q : (xQ, yQ, zQ). We denote it
either by P_Q, where the bar stresses the direction, P to Q; or (more often) by a
single letter, say
a (in heavy print) or a (underlined when handwritten).
The components of a in the x, y, and z directions respectively are a1, a2, and a3,
where
a1 = xQ − xP, a2 = yQ − yP, a3 = zQ − zP. (9.2)

We write
P_Q, or a = (a1, a2, a3). (9.3)

The length or magnitude of a is denoted by PQ (no bar) or QP, or |a| or |P_Q|, or a. By Pythagoras’s theorem the length PQ is given by
√[(xQ − xP)² + (yQ − yP)² + (zQ − zP)²],
so
PQ or |a| or |P_Q| = √(a1² + a2² + a3²). (9.4)
This is never negative, and it is zero only when a1 = a2 = a3 = 0.

2. Equality of two vectors. We say that a = b if and only if their components are equal: a1 = b1, a2 = b2, a3 = b3.
This is equivalent to saying that a = b if they have the same magnitude and direc-
tion. Instead of saying ‘in the same direction’, we may say ‘parallel and with the
same sense’.
The vectors shown in Fig. 9.10a are all called equal although they are in different
places. Figure 9.10b shows four specific vectors in the form of a parallelogram,
but only two letters, a and b, are needed to label it. Figure 9.10c shows two vec-
tors which are parallel but have opposite senses, or directions.
[Fig. 9.10: (a) and (b) illustrate equality of vectors. (c) The vectors a and −a have opposite senses.]

3. Multiplication by a positive or negative number. If k is a real number, then
ka = (ka1, ka2, ka3). (9.5)
Therefore ka is |k| times as long as a. If k is positive then ka is in the same direction as a, and if k is negative it is in the direction opposite to a.
The vector (−a) means the same as (−1)a:
−a = (−a1, −a2, −a3), (9.6)
which has the same length as a and the opposite direction (Fig. 9.10c).

4. Addition and subtraction.
a + b = (a1, a2, a3) + (b1, b2, b3) = (a1 + b1, a2 + b2, a3 + b3), (9.7)
so the sum of two vectors is obtained by adding the corresponding components (and similarly for any number of vectors).
This is equivalent geometrically to the triangle rule, illustrated in Fig. 9.11a.
Choose any point A as the starting point, then draw a followed by b, as if they
were successive displacements from A. The definition says that
a + b = A_B + B_C = A_C, (9.8)

where AC is the third side of the triangle.

[Fig. 9.11: The triangle rule. (a) a + b; (b) a − b.]

[Fig. 9.12: The parallelogram rule. (a) a + b; (b) a − b.]

Sometimes the parallelogram rule, illustrated in Fig. 9.12a, is more convenient. In this case, draw the vectors a and b out from the same point P, then complete
the parallelogram PACB. The diagonal vector P_C is equal to a + b, because
A_C = P_B = b
and so by the triangle rule applied to PAC
a + b = P_A + A_C = P_C.
For subtraction,
a − b = a + (−b), (9.9)

which is illustrated in Fig. 9.11b using the triangle rule and in Fig. 9.12b using the
parallelogram rule.
Also,
a − a = (a1, a2, a3) + (−a1, −a2, −a3) = (0, 0, 0).
This is the zero vector, denoted by 0 in heavy print (or 0 underlined, if handwritten).

5. Brackets and rearrangement of sums of vectors. Addition involves only the addition of the x, y, and z components separately. Since the components are ordinary numbers, we may change the order in which they are added; for two vectors we have
a + b = b + a. (9.10)

We can also use brackets in the usual way:


a + (b + c) = (a + b) + c. (9.11)

These are like the rules of ordinary algebra. A more complicated example is
(a + b) − (c + d ) = (b − c) − (d − a).

6. Vectors in two dimensions. All of the foregoing definitions and properties apply equally to vectors in two dimensions, often called plane vectors. All that is necessary is to delete the z component. Thus, in two dimensions, if a = (a1, a2) and b = (b1, b2), then
a + b = (a1 + b1, a2 + b2).

Plane vectors are useful in dynamical applications such as for projectiles where
motion takes place in a fixed vertical plane.
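Because every rule above acts component by component, a tuple of numbers is enough to model a vector. The following Python sketch, with hypothetical helper names, implements (9.4) to (9.9) and checks (9.10), (9.11), and the longer rearrangement on arbitrary sample vectors; the same functions work unchanged for plane vectors.

```python
import math


# Vectors are tuples of components; 2-D and 3-D vectors are handled alike.
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))      # eqn (9.7)


def scale(k, a):
    return tuple(k * x for x in a)                 # eqn (9.5)


def neg(a):
    return scale(-1, a)                            # eqn (9.6)


def sub(a, b):
    return add(a, neg(b))                          # eqn (9.9)


def magnitude(a):
    return math.sqrt(sum(x * x for x in a))        # eqn (9.4)


# Arbitrary sample vectors for the checks.
a, b, c, d = (1, 2, 3), (-4, 0, 2), (2, 2, -1), (0, 5, 1)
assert add(a, b) == add(b, a)                                  # (9.10)
assert add(a, add(b, c)) == add(add(a, b), c)                  # (9.11)
assert sub(add(a, b), add(c, d)) == sub(sub(b, c), sub(d, a))
assert sub(a, a) == (0, 0, 0)                                  # zero vector
print(magnitude((2, 3, 6)))   # 7.0
```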

Example 9.3 Find | a − b| when a = (a1, a2) and b = (b1, b2).


a − b = (a1 − b1, a2 − b2);
|a − b| = magnitude of a − b = √[(a1 − b1)² + (a2 − b2)²].

Example 9.4 M is the midpoint of the side AB of the triangle PAB (Fig. 9.13).
Put P_A = a and P_B = b. (a) Express the vector P_M in terms of a and b.
(b) Deduce that the diagonals of a parallelogram bisect each other.
(You can think of this in two dimensions, but it applies equally in three.)

[Fig. 9.13: M is the midpoint of AB.]

[Fig. 9.14: N is the midpoint of PD.]

(a) P_M = P_B + B_M (triangle rule)
= P_B + ½B_A (M is the midpoint)
= b + ½B_A. (i)
Also
B_A = B_P + P_A (triangle rule; note the direction of B_P)
= −P_B + P_A (see (9.6))
= −b + a. (ii)
Substitute for B_A from (ii) in (i):
P_M = b + ½(−b + a) = ½(a + b) (after rearrangement).
(b) In Fig. 9.14 we have added a fourth vertex D to form a parallelogram. The point N is the midpoint of PD. Then
P_N = ½P_D = ½(a + b) (parallelogram rule).
Therefore, from the result in (a),
P_N = P_M,
so the midpoints of PD and BA coincide; that is, the diagonals of the parallelogram bisect each other.
Example 9.5 (In two dimensions.) In the (x, y) plane, a and b are any two vectors which are not parallel, and c is another vector. (a) Prove that c = λa + µb, where λ and µ are constants. (b) Find λ and µ when a = (1, 1), b = (2, 0), and c = (3, 4).

[Fig. 9.15: Vectors in a plane: c = λa + µb.]

(a) Take any point Q. Draw a, b, and c radiating from it, and then complete the
parallelogram QBCA, as in Fig. 9.15. Then
c = Q_C = Q_A + Q_B (parallelogram rule).
But Q_A and Q_B point respectively in the directions of a and b so they are equal to
certain (unique) multiples of a and b:
Q_A = λ a and Q_B = µb,
say. Therefore
c = λ a + µb.
(b) a = (1, 1), b = (2, 0), and c = (3, 4), so from (a)
(3, 4) = λ(1, 1) + µ(2, 0).
The individual components on the two sides must match, so
3 = λ + 2µ, 4 = λ.
The solution is λ = 4, µ = −½, so
c = 4a − ½b.
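Matching components in part (b) gives two linear equations for λ and µ. The Python sketch below solves them by Cramer's rule, a method we have chosen for illustration (the example solves them by inspection), and assumes integer components so that `Fraction` gives exact answers. The denominator a1b2 − a2b1 vanishes exactly when a and b are parallel (or one of them is zero), which is why the result requires non-parallel vectors. The name `decompose` is hypothetical.

```python
from fractions import Fraction


def decompose(a, b, c):
    """Return (lam, mu) with c = lam*a + mu*b for plane vectors a, b, c.

    Solves lam*a1 + mu*b1 = c1, lam*a2 + mu*b2 = c2 by Cramer's rule.
    Assumes integer components.
    """
    a1, a2 = a
    b1, b2 = b
    c1, c2 = c
    D = a1 * b2 - a2 * b1
    if D == 0:
        raise ValueError("a and b are parallel (or zero): no unique solution")
    lam = Fraction(c1 * b2 - c2 * b1, D)
    mu = Fraction(a1 * c2 - a2 * c1, D)
    return lam, mu


lam, mu = decompose((1, 1), (2, 0), (3, 4))
print(lam, mu)   # 4 -1/2
```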

The result in Example 9.5a is important, and it extends to three dimensions as follows:

Relation between three coplanar vectors


a and b are two non-parallel, nonzero vectors with the same initial point Q, and
c is any other vector at Q, in the same plane as a and b. Then
c = λ a + µ b,
where λ and µ are certain (unique) constants. (9.12)

Figure 9.16 shows the three vectors a, b, c in their common plane: the vectors
are then said to be coplanar, or linearly dependent (see Section 8.2). We can use
the same argument as in the previous example. (It is not actually necessary for the
vectors a, b, and c to be in the same plane and emerge from the same point to start
with: it is sufficient for them merely to be parallel to the same plane, so that we can
translate them to the positions in Fig. 9.16.) The argument of Example 9.5 then applies unchanged.
[Fig. 9.16: the coplanar vectors a, b, and c, drawn from a common initial point Q]

Self-test 9.2
ABCD is a quadrilateral with its sides represented by the vectors
A_B = a, B_C = b, A_D = c, D_C = d.

P, Q, R, S are the mid-points respectively of AB, BC, CD, DA. Show that
PQRS is a parallelogram. If M is the mid-point of AC, show that AMRS is
also a parallelogram.

9.5 Relative velocity


In this section we shall assume that all the velocities are constant.
Velocity has magnitude and direction, so we can depict it by a directed line
segment whose length is proportional to the speed (always a positive number), and
which points in the right direction. But to decide whether velocity can be treated
as a vector (i.e. whether it obeys the rules in Section 9.4) we need to say what
addition of velocities is to mean physically.
Typically, addition of velocities is concerned with combining relative velocities.
For example, if an escalator is moving at 0.5 m s−1 relative to the wall, and a
passenger is walking up at 1 m s−1 relative to the escalator, then the actual velocity
of the passenger is 1 + 0.5 = 1.5 m s−1 relative to the wall.
Since relative velocities are relative displacements per unit time, velocity vectors
obey the same rules as displacement vectors. Take a set of axes Ox, Oy, Oz which
are to be regarded as fixed axes. They might be fixed relative to the earth’s surface,
or relative to the directions of distant stars. Let
vP = velocity of a point P relative to the fixed axes,
vQ = velocity of a point Q relative to the fixed axes,
vQP = velocity of Q relative to P.
Then the velocity vQP of a point Q as observed from P, in terms of the velocities vQ
and vP observed from the fixed axes, is given by
velocity of Q relative to P = velocity of Q − velocity of P,
or vQP = vQ − vP. (9.13)
Example 9.6 (Figure 9.17) A river of width 0.2 km flows with uniform speed
3 km h−1 from west to east. A boat sets off from a point S on the south bank,
wishing to land at a point N on the north bank directly opposite S. It can travel
at a speed of 5 km h−1 relative to the water. In what direction should it point in
order to arrive at N by a straight line route? How long does it take?

[Fig. 9.17: the river, of width 0.2 km, with the water velocity, the direction of the boat's bow, and its direction of travel, the last two at angle θ to each other]

The true path of the boat (i.e. as seen from the land, or relative to fixed axes) is not
along the direction it is pointing, because it is also being carried downstream. However,
viewed from axes which travel along with the water, it does go in the direction it is
pointing, at an apparent speed of 5 km h−1. To visualize this, imagine there is a dense
fog, so that the banks cannot be seen and the pilot is not aware of the current.
With B denoting ‘boat’ and W denoting ‘water’, put
vB = velocity of B relative to fixed axes (direction north, magnitude, or speed,
unknown);
vBW = velocity of B relative to the water W (speed 5 km h−1, in the unknown direction
it is pointing);
vW = velocity of the water W relative to fixed axes (direction east, speed 3 km h−1).
We also know from (9.13) that these are connected by
vBW = vB − vW,
or vB = vBW + vW.
This information gives Fig. 9.18.
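(a) In Fig. 9.18 the eastward part of vBW must cancel vW, so 5 sin θ = 3, and the boat must point upstream (west of north) at θ = arcsin(3/5) = 36.9° to its direction of travel.
(b) Pythagoras’s theorem gives the magnitude of vB:
|vB| = √(5² − 3²) = 4 km h−1.
Therefore the time taken is 0.2/4 = 0.05 h = 3 minutes.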
[Fig. 9.18: the velocity triangle vB = vBW + vW, with |vW| = 3 and |vBW| = 5]
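The velocity triangle of Example 9.6 can be checked numerically. In this sketch (ours), θ is measured from north, the boat points west of north, and vB = vBW + vW is formed from (east, north) components to confirm that the resultant is due north with speed 4 km h−1.

```python
import math

width = 0.2     # river width, km
v_water = 3.0   # |vW|: speed of the water relative to the banks, km/h (east)
v_boat = 5.0    # |vBW|: speed of the boat relative to the water, km/h

# vB = vBW + vW must point due north, so the eastward part of vW must be
# cancelled: v_boat * sin(theta) = v_water, with theta measured from north.
theta = math.asin(v_water / v_boat)

# Components (east, north) of the velocities.
v_BW = (-v_boat * math.sin(theta), v_boat * math.cos(theta))   # bow west of north
v_W = (v_water, 0.0)
v_B = (v_BW[0] + v_W[0], v_BW[1] + v_W[1])                    # vB = vBW + vW

speed = math.hypot(*v_B)        # |vB|
minutes = width / speed * 60    # crossing time

print(f"theta = {math.degrees(theta):.1f} deg, |vB| = {speed:.2f} km/h, "
      f"time = {minutes:.2f} min")
```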

Self-test 9.3
Two roads in NS and WE directions cross (by a bridge). A car A is travelling north at 70 km h–1, and a second car B is travelling east on the other road at 50 km h–1. What is the speed of B relative to A, and what is the apparent
direction of B viewed from A?

9.6 Position vectors and vector equations


In Fig. 9.19 P is the point with coordinates (2, 3, 1). The vector O_P, or r, which has
its initial point at the origin of coordinates, O, is called the position vector of P.
The components of r, or O_P, are then equal to the coordinates of P, so
O_P = r = (2, 3, 1)
in this case. Position vectors are often distinguished from ordinary vectors by
using the letter r. Apart from their being attached to the origin, the rules for
position vectors are the same as for ordinary vectors.
This device enables us to specify, for example, the point at which a force acts,
without mixing up vectors with coordinates in the same calculation. It also allows
us to do coordinate geometry in vector terms, by obtaining vector equations
describing curves and surfaces in terms of the position vector r = (x, y, z).

2
1
1 2
1 3
2 O r
P
1 2 y Fig. 9.19 Position vector, r = O_P,
3
x of the point P.
Example 9.7 (Two dimensions.) A circle has radius c, and its centre C at the point (a, b). (a) Obtain a vector equation for the circle. (b) Deduce the ordinary cartesian equation.


(a) The circle is shown in Fig. 9.20. P is any point (x, y) on its circumference, so its
position vector in component form is
r = (x, y).
The centre C has position vector rC, where
rC = (a, b).
Also, C_P = r − rC. The length of C_P must be constant and equal to c, so
|r − rC | = c. (i)
This is the vector equation required.
(b) To turn (i) into x, y form write r and rC in component form:
r − rC = (x, y) − (a, b) = (x − a, y − b).
The length of this vector is given by
|r − rC| = √[(x − a)² + (y − b)²].
Therefore, after squaring both sides in (i), we get
(x − a)² + (y − b)² = c²,
which is the usual form for the equation of a circle. (This is not an efficient way of
obtaining it, of course. We are simply checking that (i) makes sense.)

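[Fig. 9.20: the circle with centre C : (a, b) and radius c, a point P : (x, y) on it, and the position vectors r and rC]

[Fig. 9.21: the points A, B, C with position vectors a, b, c, and a point P of the plane through them with position vector r]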

Example 9.8 Three points, A, B, and C (which do not lie in a straight line), have
position vectors a, b, and c. (a) Obtain a parametric vector equation for the
plane through the points A, B, C. (b) Deduce parametric cartesian (i.e. x, y, z)
equations for the plane in the case where the points are A : (1, 2, 1), B : (2, 2, 0),
C : (2, 1, 2). (c) Deduce the ordinary cartesian equation for this plane by
eliminating the parameters occurring in (b).
(a) Figure 9.21 shows the points A, B, C, and their position vectors. The point P : (x, y, z)
with position vector r is any point in the plane through A, B, and C. By the triangle rule
B_A = a − b, B_C = c − b, B_P = r − b.

By using the result (9.12), which relates any three coplanar vectors, we obtain
B_P = λ B_A + µ B_C,
or r − b = λ(a − b) + µ(c − b), (i)
where λ, µ are two constants which depend on the position of P. We find every point r in
the plane by letting the parameters λ , µ run through all possible values between −∞ and
+∞, so (i) is a parametric vector equation for the plane through A, B, C.
(b) Since r, a, b, c are position vectors, their components are given by the coordinates of
P, A, B, C, so eqn (i) becomes
(x, y, z) − (2, 2, 0) = λ [(1, 2, 1) − (2, 2, 0)] + µ[(2, 1, 2) − (2, 2, 0)]
= λ (−1, 0, 1) + µ(0, −1, 2).
Take the vector (2, 2, 0) over to the right-hand side, and then match the x, y, z
components separately:
x = 2 − λ, y = 2 − µ, z = λ + 2µ (ii)
where λ and µ may take any values. These are cartesian parametric equations for the plane.
(c) We obtain an x, y, z equation by eliminating λ and µ from the equations (ii). From the
first two equations we have
λ = 2 − x and µ = 2 − y.

Substitute these into the third equation of (ii):


z = (2 − x) + 2(2 − y),
which is the same as
x + 2y + z = 6. (iii)
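As a numerical check on Example 9.8, the sketch below generates points from the parametric vector equation (i) for randomly chosen λ and µ and confirms that each satisfies the eliminated equation (iii). It is a spot check under our own choice of sample values, not a proof; `point_on_plane` is a hypothetical helper name.

```python
import random

a, b, c = (1, 2, 1), (2, 2, 0), (2, 1, 2)   # position vectors of A, B, C


def point_on_plane(lam, mu):
    """r = b + lam (a - b) + mu (c - b): eqn (i) of Example 9.8, rearranged."""
    return tuple(bi + lam * (ai - bi) + mu * (ci - bi)
                 for ai, bi, ci in zip(a, b, c))


random.seed(0)   # reproducible sample values
for _ in range(100):
    lam, mu = random.uniform(-10, 10), random.uniform(-10, 10)
    x, y, z = point_on_plane(lam, mu)
    assert abs(x + 2 * y + z - 6) < 1e-9   # equation (iii)

# lam = 1, mu = 0 gives A; lam = 0, mu = 1 gives C; lam = mu = 0 gives B.
print(point_on_plane(1, 0), point_on_plane(0, 1), point_on_plane(0, 0))
# (1, 2, 1) (2, 1, 2) (2, 2, 0)
```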

If A, B, and C do not lie on a straight line, the equation of the plane through them
will always be like Example 9.8(iii):

Equation of a plane
The general equation of a plane is
ax + by + cz = d,
where a, b, c, d are constants and a, b, c are not all zero. (9.14)

Example 9.9 (Three dimensions.) Two points, A and B, have position vectors
a and b. (a) Obtain a parametric vector equation for the straight line joining A
and B. (b) Deduce parametric cartesian (i.e. x, y, z) equations for the case where
the points are A : (2, 2, −1) and B : (0, 1, −2). (c) By eliminating the parameter
between the equations in (b), find cartesian equations for this line.
(a) Figure 9.22 shows the points A and B and their position vectors a and b. The
point P : (x, y, z) with position vector r represents any point on the line joining AB.
Also,
A_B = b − a, and A_P = r − a.
[Fig. 9.22: the points A and B with position vectors a and b, and a point P : (x, y, z) on the line AB with position vector r]

A_P is some multiple, λ say, of A_B:

A_P = λ A_B,
or r − a = λ(b − a).
Therefore
r = (1 − λ)a + λ b. (i)

This is the required parametric vector equation, with λ as the parameter. As λ increases
from −∞ to +∞, P traces out the straight line passing through A and B.
(b) Since r, a, and b are position vectors, their components are the same as the
coordinates of P, A, and B:
r = (x, y, z), a = (2, 2, −1), b = (0, 1, −2).
Substitute these into (i):
(x, y, z) = (1 − λ)(2, 2, −1) + λ(0, 1, −2) = (2 − 2λ, 2 − λ, −1 − λ).
Now match the x, y, z components on both sides:
x = 2 − 2λ, y = 2 − λ, z = −1 − λ. (ii)

These are parametric cartesian equations, in which the parameter ranges from −∞
to +∞.
(c) In order to get rid of the parameter λ in (ii), write them successively in the form
λ = (x − 2)/(−2), λ = (y − 2)/(−1), λ = (z + 1)/(−1).
Since the three fractions are equal (equal to the current value of λ) we obtain the relation between x, y, z which holds on the line:
(x − 2)/(−2) = (y − 2)/(−1) = (z + 1)/(−1),
which simplifies to
−½x + 1 = −y + 2 = −z − 1. (iii)
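The relation (iii) of Example 9.9 can be checked numerically: each of the three expressions in (iii) should equal the parameter λ at the point given by (i). The sketch below uses `Fraction` so that the equalities hold exactly; the sample values of λ and the helper name `point_on_line` are our own.

```python
from fractions import Fraction

a = (2, 2, -1)    # position vector of A
b = (0, 1, -2)    # position vector of B


def point_on_line(lam):
    """r = (1 - lam) a + lam b: eqn (i) of Example 9.9."""
    return tuple((1 - lam) * ai + lam * bi for ai, bi in zip(a, b))


for lam in [Fraction(-3), Fraction(0), Fraction(1, 3), Fraction(1), Fraction(5, 2)]:
    x, y, z = point_on_line(lam)
    # every expression in (iii) equals lam at the point with parameter lam
    assert -x / 2 + 1 == -y + 2 == -z - 1 == lam
```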

The form of the result (iii) of Example 9.9 may strike you as peculiar. It actually consists of two simultaneous equations, representing two planes which intersect along the required line AB; it cannot be reduced to a single equation. The general case will be given in Chapter 10.
Example 9.10 Given the straight line
2x − 2 = y + 1 = −2z, (i)
(a) Find any one point on the line. (b) Find a parametric equation for the line. (c) Find the coordinates of the point where the line crosses the plane
x − y + z = 0. (ii)

(a) Put, for example, x = 1. Then from (i), 2x − 2 = y + 1, so when x = 1, y = −1. Also
from (i), 2x − 2 = −2z, so z = 0. Therefore, the point (1, −1, 0) lies on the line. (Other
values of x lead to other points.)
(b) Proceeding as in (a), put x = λ, where λ may take any value. Then we find that
y = 2λ − 3 and z = −λ + 1.
Therefore a set of parametric equations is
x = λ, y = 2λ − 3, z = −λ + 1. (iii)

(c) From (ii) and (iii), at the point where the line meets the plane the value of λ must be
given by
0 = x − y + z = λ − (2λ − 3) + (−λ + 1) = −2λ + 4.
At this point λ = 2, so from (iii) the line meets the plane at
x = 2, y = 1, z = −1.

(Alternatively, solve the equations (i) and (ii) simultaneously.)
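The method of part (c) works for any line written as r = p0 + λd and any plane n·r = c: substitute and solve one linear equation for λ. The Python sketch below illustrates this under those assumptions (the names are hypothetical). It returns `None` when the line is parallel to the plane, since the equation for λ then has no unique solution, and it uses exact fractions, so the inputs should be integers or `Fraction`s.

```python
from fractions import Fraction

def line_plane_intersection(p0, d, n, c):
    """Intersect the line r = p0 + lam*d with the plane n . r = c.

    Returns (lam, point), or None if the line is parallel to the plane.
    """
    n_dot_d = sum(ni * di for ni, di in zip(n, d))
    if n_dot_d == 0:
        return None  # parallel: no unique intersection
    lam = Fraction(c - sum(ni * pi for ni, pi in zip(n, p0)), n_dot_d)
    point = tuple(pi + lam * di for pi, di in zip(p0, d))
    return lam, point

# Line (iii): x = lam, y = 2*lam - 3, z = -lam + 1, so p0 = (0, -3, 1), d = (1, 2, -1).
# Plane (ii): x - y + z = 0, so n = (1, -1, 1), c = 0.
lam, q = line_plane_intersection((0, -3, 1), (1, 2, -1), (1, -1, 1), 0)
print(lam, [str(v) for v in q])   # 2 ['2', '1', '-1']
```

Example 9.13 below is the same calculation with p0 = a and d = b − a: with p0 = (2, 3, 1), d = (−1, −1, 1), n = (1, 1, 1), and c = 0, the function gives λ = 6 and the point (−4, −3, 7).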

Self-test 9.4
Find the parametric vector and cartesian equations of the plane through the points A, B, C with position vectors respectively a = (1, −1, 2), b = (2, 0, −1), c = (3, −1, −3). Show that the plane passes through the origin.

9.7 Unit vectors and basis vectors


A vector of unit magnitude is called a unit vector. For example,
a = (−2/7, 3/7, 6/7) is a unit vector since
a = |a| = √[(−2/7)² + (3/7)² + (6/7)²] = 1.
The vector (1, 0, 0) is a unit vector; it points in the direction of the x axis since, drawn as a position vector, it joins the origin to the point 1 unit along the x axis. Similarly, (0, 1, 0) and (0, 0, 1) are unit vectors in the y and z directions respectively. These vectors have the special symbols î, ĵ, and k̂, and are called basis vectors for the given coordinates:
î = (1, 0, 0), ĵ = (0, 1, 0), k̂ = (0, 0, 1). (9.15)

(They are sometimes spoken of as ‘i-hat’, and so on.) Figure 9.23 shows them as
position vectors.
[Fig. 9.23 Basis vectors î, ĵ, k̂, drawn as position vectors from O along the x, y, z axes.]

Any vector can be expressed in terms of î, ĵ, and k̂. Suppose that a = (a1, a2, a3) in component form. Then
a = (a1, 0, 0) + (0, a2, 0) + (0, 0, a3)
= a1(1, 0, 0) + a2(0, 1, 0) + a3(0, 0, 1) = a1î + a2ĵ + a3k̂.
The components become the coefficients of î, ĵ, and k̂.

Example 9.11 Let a = 2î + 3ĵ − k̂ and b = î − 3k̂. Express the vector x in the equation 3a + 2x = b in terms of î, ĵ, k̂.
In the usual way, we find that
x = ½(b − 3a) = ½b − (3/2)a = ½(î − 3k̂) − (3/2)(2î + 3ĵ − k̂) = −(5/2)î − (9/2)ĵ.
The components of x are therefore (−5/2, −9/2, 0).
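Because vectors combine component by component, the working of Example 9.11 can be checked with exact fractions. A minimal Python sketch, with each vector stored as a tuple of its î, ĵ, k̂ coefficients (a representation assumed for the sketch, not notation from the text):

```python
from fractions import Fraction

# Vectors stored as (i, j, k) coefficients.
a = (2, 3, -1)   # a = 2i + 3j - k
b = (1, 0, -3)   # b = i - 3k

# 3a + 2x = b  =>  x = (b - 3a)/2, applied to each component.
x = tuple(Fraction(bi - 3 * ai, 2) for ai, bi in zip(a, b))
print([str(c) for c in x])   # ['-5/2', '-9/2', '0']
```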

If a is any nonzero vector, then the vector â (called ‘a-hat’),
â = a/|a|,
obtained by dividing a by its own length (or magnitude), is a unit vector in the direction of a (we can say ‘the direction of a is â’).

Example 9.12 Obtain the unit vector w pointing in the direction of the force F = 2î − 3ĵ − 6k̂.
|F| = √[2² + (−3)² + (−6)²] = √49 = 7.
Therefore, the unit vector pointing in the same direction is
w = F/|F| = (2î − 3ĵ − 6k̂)/7 = (2/7)î − (3/7)ĵ − (6/7)k̂,
or, in component form,
w = (2/7, −3/7, −6/7).
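Dividing a vector by its length is easily sketched in code. The Python function below is a hypothetical helper working in floating point; it refuses the zero vector, which has no direction, and reproduces Example 9.12.

```python
import math

def unit_vector(v):
    """Return v/|v|; the zero vector has no direction, so it is rejected."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("the zero vector has no unit vector")
    return tuple(c / length for c in v)

w = unit_vector((2, -3, -6))   # F = 2i - 3j - 6k, |F| = 7
print(w)                        # approximately (2/7, -3/7, -6/7)
```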

Unit vectors

A unit vector is a vector of unit magnitude. The unit vector in the direction of a is denoted by â (a-hat).
(a) If a is any nonzero vector, then
â = a/|a|.
(b) The vectors î, ĵ, k̂ (basis vectors) are the unit vectors in directions Ox, Oy, Oz. If a = (a1, a2, a3) is any vector, then
a = a1î + a2ĵ + a3k̂.
(For two dimensions, use only î and ĵ.) (9.16)

Example 9.13 Find the point Q where the straight line joining A : (2, 3, 1) and B : (1, 2, 2) intersects the plane x + y + z = 0.
The position vectors of A and B, in terms of î, ĵ, k̂, are a = 2î + 3ĵ + k̂ and b = î + 2ĵ + 2k̂ respectively. Let r = xî + yĵ + zk̂ be the position vector of a general point on the line AB. Then from Example 9.9(a), the parametric equation of AB is
r = (1 − λ)a + λb = (1 − λ)(2î + 3ĵ + k̂) + λ(î + 2ĵ + 2k̂).
After collecting terms in î, ĵ, and k̂ on the right, this becomes
xî + yĵ + zk̂ = (2 − λ)î + (3 − λ)ĵ + (1 + λ)k̂. (i)
Match the coefficients of î, ĵ, k̂ on either side of (i); then
x = 2 − λ, y = 3 − λ, z = 1 + λ. (ii)
The intersection point Q is on the plane x + y + z = 0, so
(2 − λ) + (3 − λ) + (1 + λ) = 0, that is, 6 − λ = 0.
Therefore λ = 6 at Q. Put this value back into (ii):
x = −4, y = −3, z = 7.
The position vector of Q is therefore
−4î − 3ĵ + 7k̂.

Problems can be worked through with the vectors given either in component form or in î, ĵ, k̂ form, whichever is convenient.

Self-test 9.5
Find where the straight line through the points A : (1, 2, −1), B : (p, 1, 0),
(p ≠ 1), intersects the plane x + y + z = 0. Treating p as a parameter, find the
locus of the points of intersection on the plane.

9.8 Tangent vector, velocity, and acceleration


Suppose that the coordinates of a point P depend on a parameter t (which might
stand for time). Then we can write
r(t) = x(t)î + y(t)ĵ + z(t)k̂.
[Fig. 9.24 Tangent vector T: a curve traced from A (t = a) to B (t = b), with nearby points P and Q on it; relative to O their position vectors are r and r + δr, and δr joins P to Q.]

As t runs from the value t = a to t = b, where b > a, P follows a curve from A to B, as in Fig. 9.24.
Consider two points, P and Q, close together on the curve, where the parameter values are t and t + δt respectively. The corresponding position vectors are r(t) and r(t + δt). By the triangle rule,
$\overrightarrow{PQ}$ = r(t + δt) − r(t) = δr.
Now consider the vector T defined by
$$\mathbf{T} = \lim_{\delta t \to 0} \frac{\mathbf{r}(t+\delta t) - \mathbf{r}(t)}{\delta t} = \lim_{\delta t \to 0} \frac{\delta \mathbf{r}}{\delta t}.$$
This is like an ordinary derivative, so we denote this vector by
T = dr(t)/dt. (9.17)

Notice also that this is equivalent to
T = dr/dt = î dx/dt + ĵ dy/dt + k̂ dz/dt,
since î, ĵ, k̂ are constant.
As δt approaches zero, δr, and therefore δr/δt, become more and more nearly
tangential to the curve. Therefore T is a tangent vector to the curve at P. To decide
which way T points, consider the case when δt is positive. Then δr points in the
direction of increasing t, so the tangent vector T must also point in this direction.

Derivative of r(t)
r(t) = îx(t) + ĵy(t) + k̂z(t), where t is a parameter, represents a curve. The vector T given by
T = dr/dt = î dx/dt + ĵ dy/dt + k̂ dz/dt
is a tangent to the curve, in the direction of increasing t. (9.18)
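The limit in (9.17) can be illustrated numerically: for a small δt the difference quotient is already close to dr/dt. The Python sketch below uses a hypothetical helix r(t) = cos t î + sin t ĵ + t k̂, chosen only because its exact tangent is easy to write down; the step `dt` is an assumed value.

```python
import math

def helix(t):
    """Hypothetical curve r(t) = cos t i + sin t j + t k."""
    return (math.cos(t), math.sin(t), t)

def tangent_estimate(curve, t, dt=1e-6):
    """Difference quotient [r(t + dt) - r(t)]/dt, which tends to T = dr/dt as dt -> 0."""
    r0, r1 = curve(t), curve(t + dt)
    return tuple((b - a) / dt for a, b in zip(r0, r1))

t = 0.7
approx = tangent_estimate(helix, t)
exact = (-math.sin(t), math.cos(t), 1.0)   # i dx/dt + j dy/dt + k dz/dt
print(approx)
print(exact)
```

With a positive `dt` the quotient points in the direction of increasing t, which matches the orientation stated in (9.18).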

If the parameter t stands for time, then dr /dt is the definition of the velocity v(t)
of P, and dv/dt represents its vector acceleration:

Velocity and acceleration vectors
If a point P has position vector r(t), where t represents time, then
velocity v(t) = dr/dt,
acceleration a(t) = dv/dt or d²r/dt².
Also the speed = |v(t)|. (9.19)
Notice that velocity and acceleration are not generally parallel.

Example 9.14 (Motion in the (x, y) plane.) The position vector of a point P is given by
r(t) = îc cos ωt + ĵc sin ωt,
where c and ω are positive constants. Find (a) the velocity v(t) and the speed of P; (b) the acceleration a(t) of P.
(a) |r(t)| = c√(cos²ωt + sin²ωt) = c, so P is moving around a circle of radius c in the (x, y) plane.
v = dr/dt = −îcω sin ωt + ĵcω cos ωt.
The direction of v is tangential to the circle, by (9.18). By putting, say, t = 0 we obtain r = îc and v = ĵωc; since c, ω > 0, P is moving in the positive y direction as it crosses the positive x axis, which shows the motion to be anticlockwise.
Also, speed = |v| = cω.
(The speed is constant, but the velocity is not, because its direction is continuously changing.)
(b) a = dv/dt = −îcω² cos ωt − ĵcω² sin ωt
= −ω²(îc cos ωt + ĵc sin ωt) = −ω²r.
The acceleration is therefore directed towards the centre of the circle (perhaps unexpectedly).
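The results of Example 9.14 can be checked without using the formulas for v and a at all, by differencing the position vector numerically. In this Python sketch the values of c and ω are assumptions, and central differences stand in for the derivatives.

```python
import math

# Assumed sample values for the positive constants c and omega.
c, omega = 2.0, 3.0

def r(t):
    """Position vector of Example 9.14 as a pair (x, y)."""
    return (c * math.cos(omega * t), c * math.sin(omega * t))

def d1(f, t, h=1e-5):
    """Central-difference estimate of df/dt, component by component."""
    return tuple((p - m) / (2 * h) for m, p in zip(f(t - h), f(t + h)))

def d2(f, t, h=1e-4):
    """Central-difference estimate of d2f/dt2, component by component."""
    return tuple((p - 2 * z + m) / h**2 for m, z, p in zip(f(t - h), f(t), f(t + h)))

for t in (0.0, 0.4, 1.3):
    speed = math.hypot(*d1(r, t))
    # Largest component of a + omega^2 r; it should be close to zero.
    residual = max(abs(ai + omega**2 * ri) for ai, ri in zip(d2(r, t), r(t)))
    print(f"t={t}: speed={speed:.6f} (c*omega={c * omega}), |a + omega^2 r| <= {residual:.1e}")
```

The estimated speed stays at cω while the residual stays near zero, matching the constant speed and the centripetal acceleration −ω²r found above.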

Self-test 9.6
A particle has the position vector r = cî cos ωt + (1/√2)cĵ sin ωt + (1/√2)ck̂ sin ωt in terms of time t. Show that the particle moves on a sphere of radius c. Find the velocity and acceleration of the particle. Show that both are constant in magnitude, and that the acceleration is directed towards the origin.

9.9 Motion in polar coordinates


Suppose that two-dimensional polar coordinates r, θ are appropriate to the geometry of an application. Figure 9.25 shows a point P : (x, y) and its polar coordinates r, θ. There are also two unit vectors êr and êθ associated with P: êr in the direction of θ constant with r increasing, and êθ in the direction of r constant with θ increasing. The position vector of P is r, given by
$\overrightarrow{OP}$ = r = rêr. (9.20)

The unit vectors êr and êθ vary in direction according to the value of θ, and are therefore functions of θ. They are related to the basis vectors î and ĵ as in Fig. 9.26. By the triangle rule,
êr = î cos θ + ĵ sin θ, êθ = −î sin θ + ĵ cos θ. (9.21)
[Fig. 9.25 Polar unit vectors êr and êθ at the point P, which has polar coordinates r, θ relative to O and the x axis.]
[Fig. 9.26 The components of êr (î cos θ and ĵ sin θ) and of êθ (−î sin θ and ĵ cos θ). The hypotenuse has unit length in both triangles, which determines the lengths of the other vectors.]
We shall need their derivatives with respect to θ:

dêr/dθ = −î sin θ + ĵ cos θ = êθ (9.22)
and dêθ/dθ = −î cos θ − ĵ sin θ = −êr. (9.23)
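Results (9.22) and (9.23) can be spot-checked numerically by differencing (9.21) with respect to θ. A minimal Python sketch (the helper names are illustrative):

```python
import math

def e_r(theta):
    """Radial unit vector (9.21): i cos(theta) + j sin(theta)."""
    return (math.cos(theta), math.sin(theta))

def e_theta(theta):
    """Transverse unit vector (9.21): -i sin(theta) + j cos(theta)."""
    return (-math.sin(theta), math.cos(theta))

def derivative(f, theta, h=1e-6):
    """Central-difference estimate of df/dtheta for a 2-D vector function."""
    return tuple((p - m) / (2 * h) for m, p in zip(f(theta - h), f(theta + h)))

theta = 2.1   # any sample angle
print(derivative(e_r, theta), e_theta(theta))                      # (9.22)
print(derivative(e_theta, theta), tuple(-c for c in e_r(theta)))   # (9.23)
```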

Now suppose that P is moving along a curved path. Then r and θ are functions
of time, t, so we can write r(t), θ(t) for its polar coordinates, and consider their
derivatives with respect to t. There is a useful dot notation for time derivatives
which saves a lot of writing – it works in the same way as the dash notation, (4.1):

Dot notation for time derivatives
If x(t) represents a function of t, then ẋ stands for dx/dt, ẍ stands for d²x/dt², etc. (9.24)

The dot notation is used extensively in dynamics.


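By using the chain rule, and writing θ̇ for dθ/dt, we obtain from (9.22) and (9.23) the time variation of êr and êθ:
$$\frac{d\hat{\mathbf{e}}_r}{dt} = \frac{d\hat{\mathbf{e}}_r}{d\theta}\,\frac{d\theta}{dt} = \hat{\mathbf{e}}_\theta\,\dot\theta \qquad\text{and}\qquad \frac{d\hat{\mathbf{e}}_\theta}{dt} = \frac{d\hat{\mathbf{e}}_\theta}{d\theta}\,\frac{d\theta}{dt} = -\hat{\mathbf{e}}_r\,\dot\theta. \qquad (9.25)$$
This result is used in the following example.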
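Example 9.15 The polar coordinates of a point moving in a plane are r(t), θ(t), where t is time. Find the polar components (a) of its velocity and (b) of its acceleration.
(a) The position vector is r(t) = r(t)êr. The velocity v is dr/dt:
v(t) = dr/dt = d(rêr)/dt.
Both r and êr vary with t (êr through θ), so we use the product rule for differentiation:
v = ṙêr + r dêr/dt = ṙêr + rθ̇êθ (i)
by (9.25). Therefore the radial velocity component is ṙ and the transverse component is rθ̇.
(b) The acceleration is dv/dt, given by
$$\frac{d\mathbf{v}}{dt} = \frac{d}{dt}\left(\dot r\,\hat{\mathbf{e}}_r + r\dot\theta\,\hat{\mathbf{e}}_\theta\right) \qquad \text{(from (i))}$$
$$= \ddot r\,\hat{\mathbf{e}}_r + \dot r\,\frac{d\hat{\mathbf{e}}_r}{dt} + \frac{d(r\dot\theta)}{dt}\,\hat{\mathbf{e}}_\theta + r\dot\theta\,\frac{d\hat{\mathbf{e}}_\theta}{dt}$$
$$= \ddot r\,\hat{\mathbf{e}}_r + \dot r\dot\theta\,\hat{\mathbf{e}}_\theta + (\dot r\dot\theta + r\ddot\theta)\,\hat{\mathbf{e}}_\theta - r\dot\theta^2\,\hat{\mathbf{e}}_r$$
$$= (\ddot r - r\dot\theta^2)\,\hat{\mathbf{e}}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\hat{\mathbf{e}}_\theta.$$
Therefore the radial component of acceleration is r̈ − rθ̇², and the transverse component is rθ̈ + 2ṙθ̇.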
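To check the formulas of Example 9.15, the Python sketch below takes a hypothetical motion r(t) = 1 + t², θ(t) = ½t², computes the acceleration from the cartesian position by a central second difference, and projects it onto êr and êθ using (9.21). For this motion ṙ = 2t, r̈ = 2, θ̇ = t, and θ̈ = 1.

```python
import math

def polar_path(t):
    """Hypothetical motion: r(t) = 1 + t**2, theta(t) = t**2 / 2."""
    return 1 + t**2, 0.5 * t**2

def position(t):
    """Cartesian position (r cos theta, r sin theta)."""
    r, th = polar_path(t)
    return (r * math.cos(th), r * math.sin(th))

def acceleration(t, h=1e-4):
    """Central second difference of the cartesian position."""
    (xm, ym), (x0, y0), (xp, yp) = position(t - h), position(t), position(t + h)
    return ((xp - 2 * x0 + xm) / h**2, (yp - 2 * y0 + ym) / h**2)

t = 0.8
r, th = polar_path(t)
ax, ay = acceleration(t)
radial = ax * math.cos(th) + ay * math.sin(th)        # a . e_r, using (9.21)
transverse = -ax * math.sin(th) + ay * math.cos(th)   # a . e_theta
# Example 9.15 with r' = 2t, r'' = 2, theta' = t, theta'' = 1:
print(radial, 2 - r * t**2)                  # r'' - r theta'^2
print(transverse, r * 1 + 2 * (2 * t) * t)   # r theta'' + 2 r' theta'
```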

Problems

9.1 Sketch the two-dimensional vectors $\overrightarrow{PQ}$ and $\overrightarrow{QP}$, and state their x and y components, when the coordinates of P and Q are as follows.
(a) P : (−2, 3), Q : (3, 0), (b) P : (3, 4), Q : (2, 1), (c) P : (0, 1), Q : (−1, −2), (d) P : (−1, −1), Q : (0, 0).

9.2 (a) to (h) represent two-dimensional displacement vectors expressed in terms of their x, y components. For each one obtain the length and the angle of inclination θ to the positive direction of the x axis in the range −180° to 180°.
(a) (3, 0), (b) (0, 2), (c) (−1, 1), (d) (1, 1), (e) (−1, −1), (f) (−3, 4), (g) (−3, −4), (h) (−2, 1).
(Make sure that these angles are in the right quadrant by means of a rough sketch.)

9.3 Obtain the components of the vectors a in (a) to (d), where L is the magnitude and θ the angle made with the positive direction of the x axis (−180° < θ ≤ 180°): (a) L = 2, θ = 45°, (b) L = 3, θ = 120°, (c) L = 3, θ = 60°, (d) L = 3, θ = −150°.

9.4 Two ships, S1 and S2, set off from the same point Q. Each follows a route given by successive displacement vectors. In axes pointing east and north, S1 follows the path to B via $\overrightarrow{QA}$ = (2, 4) and $\overrightarrow{AB}$ = (4, 1). S2 goes to E via $\overrightarrow{QC}$ = (3, 3), $\overrightarrow{CD}$ = (1, 1), and $\overrightarrow{DE}$ = (2, −3). Find the displacement vector $\overrightarrow{BE}$ in component form, the distance BE, and the final bearing of S2 seen from S1.

9.5 Find the distances between the pairs of points whose coordinates are: (a) (0, 0, 0) and (1, 2, 3), (b) (1, 2, 3) and (3, 2, 1), (c) (1, 0, −1) and (−1, 1, 0).

9.6 State the projections on the three axes of the vector $\overrightarrow{PQ}$ when P is the point (1, 2, 1) and Q is (2, 3, 3).

9.7 Find 2a, 3b, and 2a − 3b when
(a) a = (1, 2, 1), b = (2, 1, 2),
(b) a = (3, 2, 3), b = (1, 1, 2),
(c) a = (6, 3, 1), b = (4, 2, 1).
How do you recognize that 2a − 3b is parallel to the (x, y) plane in (b), and parallel to the z axis in (c)?

9.8 Sketch a diagram to show that if A, B, C are any three points, then $\overrightarrow{AB}$ + $\overrightarrow{BC}$ + $\overrightarrow{CA}$ = 0. Formulate a similar result for any number of points.
9.9 Sketch a diagram to show that if A, B, C, D are any four points, then $\overrightarrow{CD}$ = $\overrightarrow{CB}$ + $\overrightarrow{BA}$ + $\overrightarrow{AD}$. Formulate a similar result for any number of points.

9.10 Oxyz and QXYZ are two sets of axes with origins at O and Q respectively. QX is parallel to Ox and has the same sense (positive direction), and similarly for QY and QZ. The frame QXYZ is said to be a translation (a motion without rotation) of the frame Oxyz. Suppose that $\overrightarrow{OQ}$ = (2, −1, 3). (a) Find the coordinates of the point P in QXYZ if it has coordinates x = 5, y = 2, z = −3 in Oxyz. (b) Find the equation of the sphere x² + y² + z² = 1 in terms of X, Y, and Z.

9.11 ABCD is any quadrilateral in three dimensions. Prove that if P, Q, R, S are the midpoints of AB, BC, CD, DA respectively, then PQRS is a parallelogram.

9.12 ABC is a triangle, and P, Q, R are the midpoints of the respective sides BC, CA, AB. Prove that the medians AP, BQ, CR meet at a single point G (called the centroid of ABC; it is the centre of mass of a uniform triangular plate).

9.13 Show that the vectors $\overrightarrow{OA}$ = (1, 1, 2), $\overrightarrow{OB}$ = (1, 1, 1), and $\overrightarrow{OC}$ = (5, 5, 7) all lie in one plane. Show that the same is true if $\overrightarrow{OA}$ = (a, a, p), $\overrightarrow{OB}$ = (b, b, q), $\overrightarrow{OC}$ = (c, c, r), where a, b, c, p, q, r may stand for any numbers. Explain this result geometrically.

9.14 A glider is moving with a velocity v = (40, 30, 10) relative to the air and is blown by the wind, which has velocity w = (5, −10, 0) relative to the earth. Find the velocity of the glider relative to the earth.

9.15 The captain of a boat at night can tell that it is moving relative to the sea with velocity (5, 4) km h⁻¹, and by observation of lights on shore its true velocity is found to be (4, 1). What is the velocity of the current?

9.16 A cyclist rides north along a straight road at 10 km h⁻¹. The wind appears to come from the west. If she increases her speed to 20 km h⁻¹ then the wind appears to blow from the north west. Determine the speed and direction of the wind.

9.17 A ship travels south with speed u and the apparent wind direction is from the east. Another travels west with speed 2u/√3, and the apparent wind direction is from 30° east of north. Find the true wind velocity.

9.18 r is the position vector (2, 3, 1), and a = (1, 1, 2) is a general vector. R is the position vector defined by R = a + 2r. Find the coordinates of the terminal point of R.

9.19 Find the angle θ, where 0° ≤ θ ≤ 180°, made by the position vector r with the positive directions of the axes Ox, Oy, Oz in the following cases: (a) r = (1, 0, 0), (b) r = (0, 1, 1), (c) r = (0, 0, −1), (d) r = (1, 1, 1), (e) r = (1, 1, −1).

9.20 P : (1, 1, 0), Q : (1, 1, 1), and R : (1, 2, 1) are three of the vertices of a parallelogram with sides PQ and PR. Use vector methods to find the coordinates of (a) the fourth vertex, S, (b) the midpoint of PS, (c) the midpoint of QR. Show that (b) and (c) have the same coordinates (it is where the diagonals intersect). Find the midpoints A, B, C, D of the four sides PR, RS, SQ, QP respectively. Show that ABCD is a parallelogram.

9.21 Show that the points A : (1, 2, −1), B : (3, 3, −2), and C : (−3, 0, 1) are collinear (lie on a straight line), by considering the vectors $\overrightarrow{AB}$ and $\overrightarrow{AC}$ (or any other two combinations of A, B, and C). (a) Find which point is between the other two. (b) Find any other point on the line. (c) Show that the points x = 2λ + 1, y = λ + 2, z = −λ − 1, where λ is a parameter which may take any value, all lie on the line (these are parametric equations for the line).

9.22 Two points A and B have position vectors a and b respectively. In terms of a and b find the position vectors of the following points on the straight line passing through A and B: (a) the midpoint C of AB; (b) a point U between A and B for which AU/UB = 1/3; (c) a point V for which AV/VB = 1/3, but for which V does not lie between A and B.

9.23 Suppose that λ is a number such that 0 < λ < 1. Find two points, U and V, on the line through A and B such that (a) AU/UB = λ and U is between A and B, (b) AV/BV = λ and V is not between A and B. (c) What is the case if λ > 1?

9.24 (a) Obtain a vector parametric equation for the straight line which passes through the point (1, 4, 2) and is parallel to the line joining the points (2, 3, 4) and (1, 2, 3). (b) As in Example 9.9, deduce a pair of simultaneous cartesian equations for the line. (c) Obtain the points where the line intersects the (x, y) plane and the (y, z) plane. (d) By using these two points, obtain another pair of cartesian equations for the line.
9.25 Suppose that P has position vector r, and r = λa + (1 − λ)b, where λ is a parameter, and A, B are points with a, b as position vectors. Show that P describes a straight line. Indicate on a diagram the relative positions of A, B, P when λ < 0, 0 < λ < 1, and λ > 1.

9.26 Find the cartesian equation of the planes passing through the following points: (a) (1, 0, 1), (0, 1, 0), (0, 0, 1), (b) (0, 0, 0), (1, 2, −1), (2, 2, 2).

9.27 Find the shortest distance from the origin of the line given in vector parametric form by r = a + tb, where a = (1, 2, 3), b = (1, 1, 1), and t is the parameter. (Hint: use a calculus method, with t as the independent variable.)

9.28 For each of the following cases find a unit vector which has the same direction as a, and a unit vector which has the opposite direction. (a) a = (3, 4, 3), (b) a = 2î + 3ĵ + 6k̂, (c) a = (−1, −1, 2), (d) a = î − 2ĵ + k̂, (e) a = 3î − 6ĵ + 3k̂.

9.29 Express in terms of î, ĵ, k̂ the vectors whose initial and terminal points are respectively given by the following position vectors: (a) î + ĵ + k̂ and −2î + 3ĵ + 5k̂, (b) î + 2ĵ − k̂ and 3î − ĵ − 2k̂. Find the length of the vector in each case.

9.30 Show that the line joining the points with position vectors î − ĵ + 2k̂ and 2î − 2ĵ − 3k̂ intersects the z axis.

9.31 A set of two-dimensional position vectors is given by r = aî + bĵ, where |a| + |b| ≤ 1. Describe the shape of the region which includes all the points with these position vectors.

9.32 A set of position vectors is given by r = aî + bĵ + ck̂, where |a| + |b| + |c| ≤ 1. Describe the shape of the region which includes all the points with these position vectors.

9.33 Suppose that a weightless framework supports N particles, which have masses mi and are located at points with position vectors ri, where i = 1, 2, 3, …, N. You may assume that the centre of mass is at the point with position vector r̄, where
r̄ = ∑ mi ri / ∑ mi.
Find the centre of mass of three particles of masses 1 kg, 2 kg, and 3 kg at the points î + ĵ + 2k̂, −2î + 3ĵ − 5k̂, and 3ĵ + 2k̂.

9.34 Obtain a parametric vector equation for the line which is parallel to î + 2ĵ − k̂ and which passes through the point with position vector î + ĵ + k̂. Find the point of intersection of this line with the plane x − y + z = −2.

9.35 An aircraft flying with constant speed V is circling horizontally at height H above an airfield which lies in the (x, y) plane. Its motion is in the clockwise direction when viewed from below. The centre of its circular path is at Pî + Hk̂, and at time t = 0 it is at the point (P + R)î + Hk̂. Find the position vector for the aircraft at time t.

9.36 a and b are two position vectors. Find in terms of a and b a position vector which bisects the angle between them.

9.37 An aircraft A is flying along a path given by the position vector 0.41î + 148tĵ + 0.99k̂, where t is the time in hours, and distance is in km. Another aircraft, B, takes off from an airfield at the origin O at time t = 0 and follows the path given by the position vector 100tî + 250tĵ + 250tk̂. (a) Show that A and B are moving along straight lines at constant speeds, and find the speeds. (b) Show that a near miss between A and B will occur, and find the time that this happens.

9.38 Two moving points A and B have position vectors rA(t) = xA(t)î + yA(t)ĵ + zA(t)k̂ and rB(t) = xB(t)î + yB(t)ĵ + zB(t)k̂ respectively, which depend on the time t. (a) Show that the velocity of B relative to A is drB(t)/dt − drA(t)/dt. (b) Suppose that the two points are A : (t, −t², t) and B : (t³, 2t², 1 + 3t). Find the velocity of B relative to A and the velocity of A relative to B. (c) Find the time t at which the relative speed is a minimum.

9.39 A particle describes an elliptical plane path with position vector r = îa cos ωt + ĵb sin ωt, where t is time and ω, a, b are constants. Show that the acceleration is always directed towards the centre.

9.40 The position vector of a particle is given in polar coordinates by r = sec t, θ = t. Sketch the path for 0 ≤ t < ½π. Find the radial and transverse components of acceleration.

9.41 The position vector of a particle P is given by r = îa cos ωt sin vt + ĵa sin ωt sin vt + k̂a cos vt, where a, ω, v are constants and t is time. Show that P moves on a sphere of radius a. Find the velocity of the particle and show that its magnitude is a√(v² + ω² sin² vt). Deduce that the minimum speed occurs at the highest and lowest points of the sphere, and find where the maximum occurs.
10 The scalar product

Contents
10.1 The scalar product of two vectors
10.2 The angle between two vectors
10.3 Perpendicular vectors
10.4 Rotation of axes in two dimensions
10.5 Direction cosines
10.6 Rotation of axes in three dimensions
10.7 Direction ratios and coordinate geometry
10.8 Properties of a plane
10.9 General equation of a straight line
10.10 Forces acting at a point
10.11 Tangent vector and curvature in two dimensions
Problems

Given two vectors a and b, an operation can be carried out that bears some similarity to forming their product. There are two types of ‘product’: the scalar product or dot product, written a·b, which is the subject of this chapter, and the vector product or cross product, written a × b, which is treated in the next chapter; they have different spheres of usefulness. The scalar product of two vectors is not itself a vector, but a scalar quantity related to the angle between the two vectors. This property extends the capacity of vector techniques to handle many geometrical questions.
In mechanics the scalar product is associated with the component of a vector quantity, such as force, in a given direction.

10.1 The scalar product of two vectors


Suppose that in component form
a = (a1, a2, a3) and b = (b1, b2, b3).
The dot product or scalar product of a and b is denoted by a dot and is defined by
a ·b = a1b1 + a2b2 + a3b3.
(It is necessary to write the dot, because there is also another form of product,
called the vector product.) The dot product is not a vector, but an ordinary
number, or a scalar quantity. Some simple properties which are easy to prove are:
Scalar or dot product
Definition: Let a = (a1, a2, a3) and b = (b1, b2, b3). Then
a·b = a1b1 + a2b2 + a3b3.
(a) a·b = b·a (commutative property).
(b) a·(b + c) = a·b + a·c (distributive property).
(c) Connection with the magnitude |a|:
a·a = a1² + a2² + a3² = |a|².
(For two dimensions, omit the third component.) (10.1)

Example 10.1 Find (a − b)·(a + b) when a = (−1, 0, 1) and b = (2, 3, 2).


a − b = (−3, −3, −1) and a + b = (1, 3, 3).
Therefore
(a − b)·(a + b) = (−3 × 1) + (−3 × 3) + (−1 × 3) = −15.

Example 10.2 Prove that (a − b) ·(a + b) = | a|2 − |b|2.


Use the rules in (10.1) to proceed as in ordinary algebra:
(a − b)·(a + b) = (a − b) ·a + (a − b) ·b = a·a − b·a + a·b − b·b
= a·a − b ·b = | a|2 − | b|2 (by (10.1c)).
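Definition (10.1) translates directly into code. The Python sketch below (the function name `dot` is just a choice for illustration) reproduces Example 10.1 and checks the identity of Example 10.2 for the same vectors.

```python
def dot(u, v):
    """Scalar product u . v = u1 v1 + u2 v2 + u3 v3, as in (10.1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(ui * vi for ui, vi in zip(u, v))

a = (-1, 0, 1)
b = (2, 3, 2)
a_minus_b = tuple(ai - bi for ai, bi in zip(a, b))   # (-3, -3, -1)
a_plus_b = tuple(ai + bi for ai, bi in zip(a, b))    # (1, 3, 3)
print(dot(a_minus_b, a_plus_b))   # -15, Example 10.1
print(dot(a, a) - dot(b, b))      # -15, i.e. |a|^2 - |b|^2 as in Example 10.2
```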

Self-test 10.1
Prove that if a = −î − 2ĵ − k̂ and b = 2î − ĵ, then a·b = 0.

10.2 The angle between two vectors


In Fig. 10.1 we show two vectors, a and b, in three dimensions. Their initial points coincide at P. By the angle θ between a and b we mean the angle θ in the plane of a and b as shown: the angle chosen is the one in the range 0° to 180° (i.e. we refer to the internal angle, and do not use negative angles).
By the triangle rule
$\overrightarrow{BA}$ = a − b,
and the lengths of the sides of the triangle ABP are given by
|$\overrightarrow{PA}$| = |a|, |$\overrightarrow{PB}$| = |b|, |$\overrightarrow{BA}$| = |a − b|.
The cosine rule (Appendix B) says that
BA² = PA² + PB² − 2PA·PB cos θ,
or |a − b|² = |a|² + |b|² − 2|a||b| cos θ. (10.2)
But from (10.1c), putting a − b in place of a,
|a − b|² = (a − b)·(a − b) = a·a + b·b − 2a·b,
or |a − b|² = |a|² + |b|² − 2a·b. (10.3)
[Fig. 10.1 Two vectors a = $\overrightarrow{PA}$ and b = $\overrightarrow{PB}$ with common initial point P, the angle θ between them, and a − b joining B to A.]

By comparing (10.2) with (10.3) we obtain
a·b = |a||b| cos θ,
or cos θ = a·b/(|a||b|).

Angle between two vectors
Let a and b be nonzero vectors, and let θ, 0 ≤ θ ≤ π, be the internal angle between their directions. Then
(a) a·b = |a||b| cos θ.
(b) cos θ = a·b/(|a||b|).
Therefore θ = arccos(a·b/(|a||b|)) (a calculator gives this angle uniquely in the range 0° to 180°). (10.4)

If a and b do not start at the same point, we may still refer to θ as the angle between them. The result (10.4b) can also be written in the form cos θ = â·b̂, where â and b̂ are the unit vectors in the directions of a and b.

Example 10.3 Given three points A : (1, 1, 1), B : (3, 2, 3), and C : (0, −1, 1), find the angle θ between $\overrightarrow{CA}$ and $\overrightarrow{CB}$.
Put $\overrightarrow{CA}$ = a, $\overrightarrow{CB}$ = b. Then
a = (1, 1, 1) − (0, −1, 1) = (1, 2, 0), b = (3, 2, 3) − (0, −1, 1) = (3, 3, 2).
|a| = √(1² + 2² + 0²) = √5 and |b| = √(3² + 3² + 2²) = √22.
a·b = (1 × 3) + (2 × 3) + (0 × 2) = 9.
From (10.4),
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{9}{\sqrt{110}} = 0.858.$$
Finally θ = 30.9°.
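Formula (10.4) can be sketched in Python as below. The clamp to [−1, 1] is an added safeguard against rounding error, not part of the text, and the function name is hypothetical. It reproduces Example 10.3.

```python
import math

def angle_between(u, v):
    """Internal angle in degrees (0 to 180) from cos(theta) = u.v / (|u||v|), (10.4)."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    nu = math.sqrt(sum(ui * ui for ui in u))
    nv = math.sqrt(sum(vi * vi for vi in v))
    if nu == 0 or nv == 0:
        raise ValueError("the angle is undefined when either vector is zero")
    cos_theta = max(-1.0, min(1.0, dot / (nu * nv)))   # guard against rounding
    return math.degrees(math.acos(cos_theta))

A, B, C = (1, 1, 1), (3, 2, 3), (0, -1, 1)
CA = tuple(p - q for p, q in zip(A, C))   # (1, 2, 0)
CB = tuple(p - q for p, q in zip(B, C))   # (3, 3, 2)
print(round(angle_between(CA, CB), 1))    # 30.9
```

Because `math.acos` returns a value in [0, π], the result is automatically the internal angle described in (10.4).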

Self-test 10.2

Show that the angle θ between the vectors (3, 2, 1) and (−2, 1, 2) is equal
to 100.2°.

10.3 Perpendicular vectors


Cases when vectors are perpendicular or orthogonal are particularly important.
The condition is that cos θ = 0.

Example 10.4 Show that the vectors a = (1, 2, 3) and b = (−5, 1, 1) are
perpendicular.
We have
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|},$$
and a·b = (1, 2, 3)·(−5, 1, 1) = −5 + 2 + 3 = 0. Therefore θ = 90°, by (10.4).

From (10.4) the condition for two vectors to be perpendicular may be expressed
as follows:

Perpendicular vectors
If a and b are nonzero vectors, they are perpendicular if and only if
a·b = a1b1 + a2b2 + a3b3 = 0. (10.5)

The basis vectors î, ĵ, k̂ are perpendicular, so
î·ĵ = ĵ·k̂ = k̂·î = 0. (10.6)
Also, they have unit magnitude, so by (10.1c),
î·î = ĵ·ĵ = k̂·k̂ = 1. (10.7)

Suppose that a = (a1, a2, a3) in component form. Then
î·a = î·(a1î + a2ĵ + a3k̂) = a1î·î + a2î·ĵ + a3î·k̂ = a1,
from (10.6) and (10.7). The component a1 is therefore picked out by scalar multiplication by î. Similarly,
ĵ·a = a2, k̂·a = a3.
We can therefore write any vector in the form
a = (î·a)î + (ĵ·a)ĵ + (k̂·a)k̂.
(Remember that î·a, ĵ·a, and k̂·a are ordinary numbers.)
Scalar products of î, ĵ, k̂
(a) î·î = ĵ·ĵ = k̂·k̂ = 1;
î·ĵ = ĵ·k̂ = k̂·î = 0.
(b) The components of any vector a are given by
a1 = î·a, a2 = ĵ·a, a3 = k̂·a. (10.8)

Example 10.5 Find the numbers α, β, and γ which make the vectors
a = αî + ĵ + 2k̂, b = î + βĵ − k̂, c = î − ĵ + γk̂
mutually perpendicular.
We require that a·b = b·c = c·a = 0.
a·b = (αî + ĵ + 2k̂)·(î + βĵ − k̂) = α + β − 2 = 0,
b·c = (î + βĵ − k̂)·(î − ĵ + γk̂) = 1 − β − γ = 0,
c·a = (î − ĵ + γk̂)·(αî + ĵ + 2k̂) = α − 1 + 2γ = 0.
Therefore α, β, γ must satisfy
α + β = 2, (i)
−β − γ = −1, (ii)
α + 2γ = 1. (iii)
Substitute α from (i) and γ from (ii) into (iii) to give
(2 − β) + 2(1 − β) = 1,
so that β = 1. From (ii), γ = 1 − β = 0, and from (i), α = 2 − β = 1. Therefore the required vectors are
a = î + ĵ + 2k̂, b = î + ĵ − k̂, c = î − ĵ.
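A check of Example 10.5 amounts to testing condition (10.5) on every pair of vectors. A minimal Python sketch (the helper names are hypothetical):

```python
from itertools import combinations

def dot(u, v):
    """Scalar product of two component tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def mutually_perpendicular(*vectors):
    """True if every pair of the given nonzero vectors has zero scalar product, (10.5)."""
    return all(dot(u, v) == 0 for u, v in combinations(vectors, 2))

alpha, beta, gamma = 1, 1, 0            # values found in Example 10.5
a = (alpha, 1, 2)
b = (1, beta, -1)
c = (1, -1, gamma)
print(mutually_perpendicular(a, b, c))  # True
```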

Self-test 10.3
Show that the vectors 3î − 2ĵ + k̂, î − 3ĵ + 5k̂, and 2î + ĵ − 4k̂ are parallel to the sides of a certain right-angled triangle.

10.4 Rotation of axes in two dimensions


In Fig. 10.2a, P is a point which has coordinates (x, y) in the axes Ox, Oy. OX, OY is another set of axes, rotated relative to the first set by an angle θ. The positive direction for θ is anticlockwise, and θ may lie in the range ±180° so as to cover all possibilities, like a polar angle. The unit basis vectors in the axes OX, OY are Î and Ĵ respectively. The problem is to find the coordinates (X, Y) of P in the new axes.
We can express î and ĵ in terms of Î and Ĵ. From Fig. 10.2b, their components in the X, Y axes are
î = (cos θ, −sin θ), ĵ = (sin θ, cos θ).
[Fig. 10.2 (a) Change of axes in two dimensions: the point P : (x, y) and axes OX, OY obtained by rotating Ox, Oy through θ. (b) The associated unit vectors î, ĵ and Î, Ĵ.]

Therefore
î = Î cos θ − Ĵ sin θ,
and ĵ = Î sin θ + Ĵ cos θ.
The position of P in space does not change when we change axes, so in terms of the new axes
XÎ + YĴ = xî + yĵ
= x(Î cos θ − Ĵ sin θ) + y(Î sin θ + Ĵ cos θ)
= (x cos θ + y sin θ)Î + (−x sin θ + y cos θ)Ĵ.
Finally, by equating the coefficients of Î and Ĵ, we obtain the result (10.9a):

Rotation of right-handed axes in two dimensions


Given axes inclined at θ as in Fig. 10.2a, the coordinates
x, y and X, Y are related by
(a) X = x cos θ + y sin θ,
Y = −x sin θ + y cos θ.
(b) x = X cos θ − Y sin θ,
y = X sin θ + Y cos θ. (10.9)

The inverse relation (10.9b) can be obtained by solving the equations in (10.9a) for
x and y; or by interchanging x, y and X, Y in (a) and putting (−θ ) in place of θ.
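Equations (10.9) can be sketched in Python as a pair of functions, with the inverse written exactly as the text suggests, by putting −θ in place of θ. The function names, the sample point, and the angle are illustrative assumptions.

```python
import math

def to_rotated(x, y, theta_deg):
    """(10.9a): coordinates (X, Y) in axes rotated anticlockwise through theta."""
    t = math.radians(theta_deg)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

def from_rotated(X, Y, theta_deg):
    """(10.9b): the inverse, obtained by putting -theta in place of theta."""
    return to_rotated(X, Y, -theta_deg)

# Illustrative point and angle (not from the text).
X, Y = to_rotated(3.0, 4.0, 30.0)
x, y = from_rotated(X, Y, 30.0)
print((X, Y), (x, y))   # the round trip recovers (3, 4) up to rounding
```

A rotation of axes leaves the distance of P from O unchanged, so √(X² + Y²) should equal √(x² + y²), which gives a simple extra check.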

Self-test 10.4
(a) The coordinates of a fixed object are x = 1, y = √3. What do its coordinates become in axes X, Y obtained by rotating x, y anticlockwise through 60°?
(b) In a change of axes, the X axis is to be parallel to the vector î + ĵ. What is the value to be given to θ in (10.9a)?
10.5 Direction cosines
Figure 10.3 shows a position vector r = $\overrightarrow{OP}$, where P is the point (a, b, c), so
$\overrightarrow{OP}$ = r = (a, b, c).
The angles between r and î, ĵ, k̂ respectively (chosen for definiteness between 0° and 180°, as for θ in Section 10.2) are α, β, γ. These angles specify the direction of r uniquely. It is convenient to use not the angles themselves, but their cosines, which are normally denoted by l, m, n:
l = cos α, m = cos β, n = cos γ.
These are called the direction cosines of r, and also specify the direction of r uniquely.

[Fig. 10.3 Angles α, β, γ made by $\overrightarrow{OP}$ with the x, y, z axes.]

Referring to Fig. 10.3,
|r| = √(a² + b² + c²).
Also, since r·î = a = |r| cos α by (10.4a), and similarly for ĵ and k̂,
l = cos α = a/|r|, m = cos β = b/|r|, n = cos γ = c/|r|.
Therefore
l² + m² + n² = cos²α + cos²β + cos²γ = (a² + b² + c²)/|r|² = 1.
The vector given in component form by
(cos α, cos β, cos γ) = (l, m, n)
is therefore the unit vector which specifies the direction of r.
Now let s be a vector having any magnitude and location, but pointing in the
same direction as r. Then s and r have the same inclinations α, β, γ to the axes,
and cos α, cos β, cos γ , the direction cosines of s, are the same. To summarize:

Direction cosines l, m, n of any vector s



If the angles between s and Ox, Oy, Oz are α, β, γ respectively, in the range 0° to 180°, then
l = cos α, m = cos β, n = cos γ
are the direction cosines of s.
(a) Any vector parallel to s with the same sense has the same direction cosines l, m, n.
(b) l² + m² + n² = cos²α + cos²β + cos²γ = 1.
(c) ŝ = (l, m, n) is a unit vector in the direction of s. (10.10)

Example 10.6 Obtain the direction cosines of the vector s = i + 2j − 2k. Find the angles between s and the coordinate axes.
The components of s are (1, 2, −2), so its length is given by
$$|\mathbf{s}| = \sqrt{1^2 + 2^2 + (-2)^2} = 3.$$
Therefore the unit vector ŝ has components
$$l = \tfrac{1}{3},\quad m = \tfrac{2}{3},\quad n = -\tfrac{2}{3}.$$
The corresponding angles in the range 0° to 180° are
$$\alpha = \arccos\tfrac{1}{3} = 70.5^\circ,\quad \beta = \arccos\tfrac{2}{3} = 48.2^\circ,\quad \gamma = \arccos\left(-\tfrac{2}{3}\right) = 131.8^\circ.$$
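The arithmetic of Example 10.6 is easy to check mechanically. The following is a minimal Python sketch using only the standard library; the function names direction_cosines and axis_angles_deg are our own, not notation from the text. It normalises s as in (10.10)(c) and then takes inverse cosines.

```python
import math

def direction_cosines(s):
    """Return (l, m, n) = s/|s| for a non-zero vector s = (s1, s2, s3), as in (10.10)(c)."""
    length = math.sqrt(sum(c * c for c in s))
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(c / length for c in s)

def axis_angles_deg(s):
    """Angles alpha, beta, gamma (degrees, in the range 0 to 180) between s and Ox, Oy, Oz."""
    # Clamp to [-1, 1] so that rounding error cannot push acos outside its domain.
    return tuple(math.degrees(math.acos(max(-1.0, min(1.0, c))))
                 for c in direction_cosines(s))

# Example 10.6: s = i + 2j - 2k
l, m, n = direction_cosines((1, 2, -2))
print(l, m, n)                                             # about 0.3333, 0.6667, -0.6667
print(abs(l * l + m * m + n * n - 1) < 1e-12)              # True: property (b) of (10.10)
print([round(a, 1) for a in axis_angles_deg((1, 2, -2))])  # [70.5, 48.2, 131.8]
```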

Self-test 10.5
Obtain the direction cosines (a) of vectors parallel to i + j − k, (b) of the line joining the points with coordinates (1, 1/2, 1) and (0, −1/2, 1), and the angles they make with the axes.

10.6 Rotation of axes in three dimensions


Figure 10.4a shows two sets of axes Oxyz and OXYZ, with the same origin O. The basis vectors are respectively i, j, k and I, J, K. We shall show how to change from one set of axes to the other, as we did in Section 10.4 in two dimensions.
The three components of any unit vector are equal to its three direction cosines. I, J, and K are unit vectors, so in the axes Oxyz let them be given in terms of their direction cosines by
$$\mathbf{I} = (l_1, m_1, n_1),\quad \mathbf{J} = (l_2, m_2, n_2),\quad \mathbf{K} = (l_3, m_3, n_3). \qquad (10.11a)$$
By inverting our view of the two sets of axes, we can also specify the components of i, j, k in the axes OXYZ. For example, the component of i in the direction of OX is i·I = l1, the cosine of the angle between i and I; similarly i·J = l2 and i·K = l3. Hence
$$\mathbf{i} = (l_1, l_2, l_3),\quad \mathbf{j} = (m_1, m_2, m_3),\quad \mathbf{k} = (n_1, n_2, n_3) \qquad (10.11b)$$
(this is illustrated for the case of i in Fig. 10.4b).



Fig. 10.4 (a) Change of axes in three dimensions. (b) Angles α1, α2, α3 between i and the X, Y, Z axes, so that i = (l1, l2, l3).

Next, suppose that a fixed point P has position vector
$$\mathbf{r} = (x, y, z) = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$$
in the axes Oxyz. We need to find the components of r in the axes OXYZ. By substituting
$$\mathbf{i} = l_1\mathbf{I} + l_2\mathbf{J} + l_3\mathbf{K}, \quad\text{etc.}$$
from (10.11b) into r, we obtain
$$\mathbf{r} = (l_1x + m_1y + n_1z)\mathbf{I} + (l_2x + m_2y + n_2z)\mathbf{J} + (l_3x + m_3y + n_3z)\mathbf{K}.$$
The OXYZ coordinates of the point P are therefore
$$(l_1x + m_1y + n_1z,\ l_2x + m_2y + n_2z,\ l_3x + m_3y + n_3z). \qquad (10.12a)$$
The inverse relation is obtained in a similar way. Given a fixed point Q, with coordinates (X, Y, Z) in the axes OXYZ and position vector R, then
$$\mathbf{R} = (X, Y, Z) = X\mathbf{I} + Y\mathbf{J} + Z\mathbf{K}.$$
Now use (10.11a) to show that R is given in the axes Oxyz by
$$\mathbf{R} = (l_1X + l_2Y + l_3Z)\mathbf{i} + (m_1X + m_2Y + m_3Z)\mathbf{j} + (n_1X + n_2Y + n_3Z)\mathbf{k}.$$
The coordinates of Q in Oxyz are therefore
$$(l_1X + l_2Y + l_3Z,\ m_1X + m_2Y + m_3Z,\ n_1X + n_2Y + n_3Z). \qquad (10.12b)$$

In matrix form, the coordinates in the two systems are related as follows:

Rotation of axes; three dimensions



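I = (l1, m1, n1), J = (l2, m2, n2), K = (l3, m3, n3) are the basis vectors for axes OXYZ, referred to axes Oxyz (the components being direction cosines). Then
$$\text{(a)}\quad \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} l_1 & m_1 & n_1 \\ l_2 & m_2 & n_2 \\ l_3 & m_3 & n_3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
$$\text{(b)}\quad \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} l_1 & l_2 & l_3 \\ m_1 & m_2 & m_3 \\ n_1 & n_2 & n_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}. \qquad (10.13)$$
The matrix of direction cosines in (b) is the inverse of the matrix in (a); it is also its transpose.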

Example 10.7 In axes Ox, Oy, Oz, I = (1/3, −2/3, 2/3), J = (2/3, −1/3, −2/3), K = (2/3, 2/3, 1/3) are perpendicular unit vectors which are basis vectors for a new set of axes. Find the new coordinates of the point P : (−3, −3, 3).
From (10.13), the new coordinates are
$$X = \tfrac{1}{3}(-3) - \tfrac{2}{3}(-3) + \tfrac{2}{3}(3) = 3,$$
$$Y = \tfrac{2}{3}(-3) - \tfrac{1}{3}(-3) - \tfrac{2}{3}(3) = -3,$$
$$Z = \tfrac{2}{3}(-3) + \tfrac{2}{3}(-3) + \tfrac{1}{3}(3) = -3.$$
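A short Python sketch of (10.13) applied to Example 10.7 follows. It assumes the new basis vectors are supplied as tuples of direction cosines in the axes Oxyz, and it uses exact fractions so that the results can be compared directly with the hand calculation; the helper names to_new_axes and to_old_axes are hypothetical.

```python
from fractions import Fraction as Fr

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def to_new_axes(I, J, K, r):
    # (10.13a): each new coordinate is the dot product of a row (l, m, n) with (x, y, z)
    return tuple(dot(row, r) for row in (I, J, K))

def to_old_axes(I, J, K, R):
    # (10.13b): x = l1 X + l2 Y + l3 Z, etc., i.e. multiplication by the transposed matrix
    X, Y, Z = R
    return tuple(X * a + Y * b + Z * c for a, b, c in zip(I, J, K))

# Basis vectors of Example 10.7, held as exact fractions
I = (Fr(1, 3), Fr(-2, 3), Fr(2, 3))
J = (Fr(2, 3), Fr(-1, 3), Fr(-2, 3))
K = (Fr(2, 3), Fr(2, 3), Fr(1, 3))

new = to_new_axes(I, J, K, (-3, -3, 3))
print([int(c) for c in new])                         # [3, -3, -3]
print([int(c) for c in to_old_axes(I, J, K, new)])   # [-3, -3, 3], back where we started
```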

Example 10.8 (a) Confirm that the matrices
$$\begin{bmatrix} l_1 & m_1 & n_1 \\ l_2 & m_2 & n_2 \\ l_3 & m_3 & n_3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} l_1 & l_2 & l_3 \\ m_1 & m_2 & m_3 \\ n_1 & n_2 & n_3 \end{bmatrix}$$
are inverse matrices, where I = (l1, m1, n1), J = (l2, m2, n2), and K = (l3, m3, n3) are mutually perpendicular unit vectors.
(b) Find the equation of the plane 3x + 3y + 3z = 1 in the new axes, using the basis vectors given in Example 10.7.
(a) Multiply the two matrices in the order shown. The diagonal elements are
$$l_1^2 + m_1^2 + n_1^2,\quad l_2^2 + m_2^2 + n_2^2,\quad l_3^2 + m_3^2 + n_3^2,$$
all of which are equal to unity since I, J, K are unit vectors. The other elements have the typical form
$$l_1l_2 + m_1m_2 + n_1n_2 = (l_1, m_1, n_1)\cdot(l_2, m_2, n_2) = \mathbf{I}\cdot\mathbf{J}.$$
All of these are zero because I, J, K are mutually perpendicular. Therefore the product is the unit matrix.
(b) In this case we need x, y, z in terms of X, Y, Z. The equations corresponding to (10.13b) are
$$x = l_1X + l_2Y + l_3Z = \tfrac{1}{3}X + \tfrac{2}{3}Y + \tfrac{2}{3}Z,$$
$$y = m_1X + m_2Y + m_3Z = -\tfrac{2}{3}X - \tfrac{1}{3}Y + \tfrac{2}{3}Z,$$
$$z = n_1X + n_2Y + n_3Z = \tfrac{2}{3}X - \tfrac{2}{3}Y + \tfrac{1}{3}Z.$$
In the new coordinates,
$$3x + 3y + 3z = (X + 2Y + 2Z) + (-2X - Y + 2Z) + (2X - 2Y + Z) = X - Y + 5Z.$$
Therefore the plane has the new equation
$$X - Y + 5Z = 1.$$
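Both parts of Example 10.8 can be confirmed in a few lines of Python (a sketch using only the standard library; names such as matmul are our own). For part (b) it relies on the observation that if r = AᵀR, where A is the matrix of (10.13a), then p·r = p·(AᵀR) = (Ap)·R, so the coefficients of the plane in the new axes are the components of Ap.

```python
from fractions import Fraction as Fr

# Rows of A are the new basis vectors of Example 10.7 in the axes Oxyz (matrix of (10.13a))
I = (Fr(1, 3), Fr(-2, 3), Fr(2, 3))
J = (Fr(2, 3), Fr(-1, 3), Fr(-2, 3))
K = (Fr(2, 3), Fr(2, 3), Fr(1, 3))
A = [I, J, K]
AT = [list(col) for col in zip(*A)]    # transpose: the matrix of (10.13b)

def matmul(P, Q):
    """Product of two 3 x 3 matrices stored as nested sequences."""
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
print(matmul(A, AT) == identity)       # True: part (a)

# Part (b): the plane p.r = 1 with p = (3, 3, 3) becomes (A p).R = 1
p = (3, 3, 3)
new_p = [sum(row[k] * p[k] for k in range(3)) for row in A]
print([int(c) for c in new_p])         # [1, -1, 5], i.e. X - Y + 5Z = 1
```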

10.7 Direction ratios and coordinate geometry


In ordinary three-dimensional coordinate geometry the inclination of a straight
line is specified without distinguishing between the two possible directions along
the line. The method used (in vector terms) is equivalent to specifying the three
components of any vector s that is parallel to the line. The length of s, and its
direction forwards or backwards along the line, are immaterial. If
$$\mathbf{s} = p\mathbf{i} + q\mathbf{j} + r\mathbf{k} = (p, q, r)$$
is parallel to the line, then the triplet of numbers p, q, r is called a set of direction ratios for the line. Alternatively, if AB is any segment of the straight line, then the projections of AB on to Ox, Oy, Oz are a set of direction ratios for the line.
Any non-zero multiple of p, q, r, say λp, λq, λr with λ ≠ 0, is also a set of direction ratios for the line, because it corresponds to a parallel vector s1 = λpi + λqj + λrk. For example, if s = 2i + 3j + 6k is parallel to a given line, then 2, 3, 6 and 6, 9, 18 are both sets of direction ratios for the line. So are −2, −3, −6, corresponding to the vector (−1)s, although it points in the opposite direction.
Since |s| = √(4 + 9 + 36) = 7, putting λ = ±1/7 gives the direction ratios ±2/7, ±3/7, ±6/7, corresponding to the unit vectors ±ŝ. These are also direction cosines for ±s, from which the angles made with the directions of the axes can be obtained.

Example 10.9 Find the angles made with Ox, Oy, and Oz by a line with direction ratios 2, 3, −6.
Put s = 2i + 3j − 6k: this is parallel to the line. Since |s| = 7, the corresponding unit vector ŝ is given by
$$\hat{\mathbf{s}} = \tfrac{1}{7}\mathbf{s} = \tfrac{2}{7}\mathbf{i} + \tfrac{3}{7}\mathbf{j} - \tfrac{6}{7}\mathbf{k} = \mathbf{i}\cos\alpha + \mathbf{j}\cos\beta + \mathbf{k}\cos\gamma,$$
where cos α, cos β, cos γ are its direction cosines. Therefore the inclination of the line is specified by the angles
$$\alpha = \arccos\tfrac{2}{7} = 73.4^\circ,\quad \beta = \arccos\tfrac{3}{7} = 64.6^\circ,\quad \gamma = \arccos\left(-\tfrac{6}{7}\right) = 149.0^\circ.$$

Example 10.10 (Two dimensions.) Find a set of direction ratios for the straight
line y = 2x + 1.
We are looking for any vector which is parallel to the line. The points
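A : (0, 1) and B : (1, 3) lie on the line, so the vector s = $\overrightarrow{AB}$ given by
$$\mathbf{s} = \overrightarrow{OB} - \overrightarrow{OA} = (\mathbf{i} + 3\mathbf{j}) - \mathbf{j} = \mathbf{i} + 2\mathbf{j}$$
is parallel to the line. Therefore one set of direction ratios is given by the numbers 1, 2.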

Example 10.11 (Two dimensions.) Find parametric and cartesian equations for the straight line through the point A : (a, b), which has direction ratios p, q.

Fig. 10.5 The line through A : (a, b) with direction s = $\overrightarrow{AS}$; P : (x, y) is a general point on the line, with position vector r.

In Fig. 10.5, A is the point with position vector a = ai + bj, and s = pi + qj = $\overrightarrow{AS}$. P is a general point on the line, with position vector r = xi + yj. By the triangle rule,
$$\mathbf{r} = \overrightarrow{OA} + \overrightarrow{AP},$$
and $\overrightarrow{AP}$ is some multiple of s, say
$$\overrightarrow{AP} = \lambda\mathbf{s}.$$
Therefore
$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{s}, \qquad \text{(i)}$$
where λ is a parameter. This is a parametric vector equation for the line.
By equating corresponding components we have
$$x = a + \lambda p,\quad y = b + \lambda q, \qquad \text{(ii)}$$
and these are parametric cartesian equations.
Now eliminate the parameter between the equations (ii), assuming that p and q are both non-zero:
$$\frac{x - a}{p} = \frac{y - b}{q}.$$
This is a cartesian equation, which could be reduced to the standard form y = mx + c.
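Reducing to y = mx + c amounts to m = q/p and c = b − ma. Below is a minimal Python sketch, assuming integer or fractional inputs (the function names are hypothetical), checked against Example 10.10. It also copes with q = 0, a line parallel to Ox, where the ratio form above would need q ≠ 0; only p = 0 (the vertical line x = a) has no slope form.

```python
from fractions import Fraction

def point_on_line(a, b, p, q, lam):
    """Parametric cartesian form (ii): x = a + lam*p, y = b + lam*q."""
    return (a + lam * p, b + lam * q)

def slope_intercept(a, b, p, q):
    """Line through A : (a, b) with direction ratios p, q, rewritten as y = m x + c.
    Requires p != 0; when p == 0 the line is x = a, which has no such form."""
    if p == 0:
        raise ValueError("p = 0: the line is x = a")
    m = Fraction(q, p)      # slope q/p, kept exact
    c = b - m * a           # intercept, from y - b = m (x - a)
    return m, c

m, c = slope_intercept(0, 1, 1, 2)       # Example 10.10: A : (0, 1), ratios 1, 2
print(f"y = {m}x + {c}")                 # y = 2x + 1
print(point_on_line(0, 1, 1, 2, 1))      # (1, 3), the point B of Example 10.10
```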

Direction ratios of a straight line


Definition: if pi + qj + rk is parallel to the line, then p, q, r (or any multiple λp, λq, λr with λ ≠ 0) is a set of direction ratios for the line.
(a) The angles α, β, γ made with Ox, Oy, Oz are obtained from the equations
$$\cos\alpha = p/k,\quad \cos\beta = q/k,\quad \cos\gamma = r/k, \qquad\text{where } k = \sqrt{p^2 + q^2 + r^2}.$$
(For two dimensions, suppress the third component.) (10.14)
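Formula (10.14) translates directly into code. This Python sketch (standard library only; the function name line_angles_deg is our own) reproduces Example 10.9. It also illustrates that reversing the signs of p, q, r, which describes the same line traversed the other way, replaces each angle by its supplement.

```python
import math

def line_angles_deg(p, q, r):
    """Angles (degrees) with Ox, Oy, Oz for direction ratios p, q, r, via (10.14):
    cos(alpha) = p/k, cos(beta) = q/k, cos(gamma) = r/k, k = sqrt(p^2 + q^2 + r^2)."""
    k = math.sqrt(p * p + q * q + r * r)
    # Clamp each cosine to [-1, 1] to guard against rounding before calling acos.
    return tuple(math.degrees(math.acos(max(-1.0, min(1.0, c / k)))) for c in (p, q, r))

print([round(a, 1) for a in line_angles_deg(2, 3, -6)])    # Example 10.9: [73.4, 64.6, 149.0]
print([round(a, 1) for a in line_angles_deg(-2, -3, 6)])   # opposite sense: supplementary angles
```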

10.8 Properties of a plane


Figure 10.6 shows a plane which passes through a given point A : (a1, a2, a3),
and is perpendicular to a line CD having direction ratios p, q, r. We shall obtain
equations for the plane.

Fig. 10.6 The required plane through A, perpendicular to the line CD; n is a normal to the plane, and P is a general point on it, with $\overrightarrow{AP}$ = r − a.

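The position vector of A is a, given by
$$\mathbf{a} = \overrightarrow{OA} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}. \qquad (10.15)$$
From (10.14) the vector n given by
$$\mathbf{n} = p\mathbf{i} + q\mathbf{j} + r\mathbf{k} \qquad (10.16)$$
is parallel to the line CD, so n is also perpendicular to the plane (n is called a normal to the plane). P : (x, y, z) represents an arbitrary point on the plane, with position vector r given by
$$\mathbf{r} = \overrightarrow{OP} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}. \qquad (10.17)$$
By the triangle rule, $\overrightarrow{AP}$ = r − a, and n must be perpendicular to $\overrightarrow{AP}$; therefore
$$\mathbf{n}\cdot(\mathbf{r} - \mathbf{a}) = 0, \quad\text{or}\quad \mathbf{n}\cdot\mathbf{r} = \mathbf{n}\cdot\mathbf{a}. \qquad (10.18)$$
This is a vector equation for the plane.
By substituting for n, r, and a from (10.15), (10.16), and (10.17) we obtain the cartesian equation
$$px + qy + rz = pa_1 + qa_2 + ra_3.$$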
Now suppose we start with an equation in the form
$$ax + by + cz = d, \qquad (10.19)$$
where a, b, c are not all zero, such as 2x + 7y − 5z = 3. We shall show how it can be written in the form (10.18). Put
$$\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}, \quad\text{and}\quad \mathbf{p} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}.$$
Then (10.19) can be written in the form
$$\mathbf{p}\cdot\mathbf{r} = d. \qquad (10.20)$$
Now let A : (a1, a2, a3) be any point whose coordinates satisfy (10.19), and put a = a1i + a2j + a3k. From (10.20), this means that
$$\mathbf{p}\cdot\mathbf{a} = d.$$
Therefore (10.19) can be written
$$\mathbf{p}\cdot\mathbf{r} = \mathbf{p}\cdot\mathbf{a}. \qquad (10.21)$$
This has the same form as (10.18). Therefore (10.19) represents a plane, the plane passes through the point with position vector a, and p is perpendicular to the plane:

Vector equation of a plane


(a) A vector equation for a plane through a given point a and perpendicular to a vector n is n·r = n·a. Also, n·r = constant represents a plane perpendicular to n.
(b) ax + by + cz = d always represents a plane (provided a, b, c are not all zero).
(c) p = ai + bj + ck is perpendicular to the plane ax + by + cz = d. (10.22)

Example 10.12 Show that the plane 3x − 2z = 1 is parallel to the y axis.


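By (10.22c) the vector p = 3i − 2k is perpendicular to the plane. Also
$$\mathbf{j}\cdot\mathbf{p} = \mathbf{j}\cdot(3\mathbf{i} - 2\mathbf{k}) = 3\,\mathbf{j}\cdot\mathbf{i} - 2\,\mathbf{j}\cdot\mathbf{k} = 0,$$
so p is perpendicular to j. Therefore the plane is parallel to j, that is, to the y axis.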

Example 10.13 Find the angle of intersection between the two planes
2x + 3y + 4z = 5 and 2x − 6y − 3z = 0.

Fig. 10.7 The angle between the planes, θ, is equal to the angle between the normals p and q.
From (10.22c), the vector p = 2i + 3j + 4k is a normal to the first plane (i.e. it is perpendicular to it) and q = 2i − 6j − 3k is a normal to the second plane. From Fig. 10.7, one of the angles between the two planes is equal to the standard angle θ (with 0° ≤ θ ≤ 180°) between the two normals, p and q. Here p·q = 4 − 18 − 12 = −26, |p| = √29 and |q| = 7, so by (10.4a)
$$\mathbf{p}\cdot\mathbf{q} = |\mathbf{p}||\mathbf{q}|\cos\theta \quad\text{becomes}\quad -26 = \sqrt{29} \times 7\cos\theta.$$
Therefore cos θ = −26/(7√29), so θ = 133.6°. The other angle between the planes is 180° − θ = 46.4°.
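The method of Examples 10.13 and 10.14, taking the angle between the normals supplied by (10.22c), can be sketched in Python as follows. The function name plane_angle_deg is hypothetical, and the cosine is clamped to [−1, 1] purely as a guard against rounding.

```python
import math

def plane_angle_deg(n1, n2):
    """Angle theta (0 to 180 degrees) between the normals n1 and n2, from (10.4a):
    n1.n2 = |n1||n2| cos(theta). The other angle between the planes is 180 - theta."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    c = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(c))

theta = plane_angle_deg((2, 3, 4), (2, -6, -3))           # Example 10.13
print(round(theta, 1), round(180 - theta, 1))             # 133.6 46.4
print(round(plane_angle_deg((2, 2, -1), (3, -2, 2)), 1))  # Example 10.14: 90.0
```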

Example 10.14 Show that the planes 2x + 2y − z = 10 and 3x − 2y + 2z = 0 are perpendicular.
The planes are perpendicular if their normal vectors are perpendicular. Taking the equations in order, by (10.22c) the vectors
$$\mathbf{p} = 2\mathbf{i} + 2\mathbf{j} - \mathbf{k} \quad\text{and}\quad \mathbf{q} = 3\mathbf{i} - 2\mathbf{j} + 2\mathbf{k}$$
are normal to the planes. Then
$$\mathbf{p}\cdot\mathbf{q} = 6 - 4 - 2 = 0,$$
so the planes are perpendicular.

Fig. 10.8 The distance from the origin O to a plane p·r = d: $\overrightarrow{ON}$ is the perpendicular from O to the plane, and $|\overrightarrow{ON}|$ = D.

To find the distance D of the plane ax + by + cz = d from the origin, consider Fig. 10.8. Drop a perpendicular $\overrightarrow{ON}$ from the origin O to the plane at N. The equation of the plane may be written
$$\mathbf{p}\cdot\mathbf{r} = d,$$
where p = ai + bj + ck. Since N is a point on the plane,
$$\mathbf{p}\cdot\overrightarrow{ON} = d,$$
or, after dividing by |p|,
$$\hat{\mathbf{p}}\cdot\overrightarrow{ON} = d/|\mathbf{p}|,$$
where p̂ = p/|p| is the corresponding unit vector. Also, $\overrightarrow{ON}$ is perpendicular to the plane and so parallel to p̂; hence $\overrightarrow{ON} = \pm D\hat{\mathbf{p}}$, depending on its sense, and
$$\pm D\,\hat{\mathbf{p}}\cdot\hat{\mathbf{p}} = d/|\mathbf{p}|.$$
But p̂·p̂ = 1, and by taking the modulus we find D:
$$D = |d|/|\mathbf{p}| = |d|/\sqrt{a^2 + b^2 + c^2}. \qquad (10.23)$$
Now let Q, with position vector q, be any point, at distance $D_Q$ from the plane. Move the origin to Q, and let R denote the new general position vector measured from Q. Since R = r − q, the new equation of the plane is p·(R + q) = d, or p·R = d − p·q. Therefore d − p·q is to be put in place of d in (10.23).

Distance of a point from a plane ax + by + cz = d



Put ai + bj + ck = p. Then
(a) Distance D of the origin O from the plane:
$$D = |d|/|\mathbf{p}| = |d|/\sqrt{a^2 + b^2 + c^2}.$$
(b) Distance $D_Q$ of a point Q, position vector q, from the plane:
$$D_Q = |\mathbf{p}\cdot\mathbf{q} - d|/|\mathbf{p}|. \qquad (10.24)$$
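Formulas (10.23) and (10.24) can be packaged as a single function, since (10.23) is the case q = 0 of (10.24). A minimal Python sketch (the name distance_to_plane is our own), applied to the plane 2x + 7y − 5z = 3 used as an illustration in Section 10.8:

```python
import math

def distance_to_plane(a, b, c, d, q=(0.0, 0.0, 0.0)):
    """Distance of the point with position vector q from the plane ax + by + cz = d,
    using (10.24): |p.q - d| / |p| with p = (a, b, c). With q = 0 this is (10.23)."""
    p = (a, b, c)
    norm_p = math.sqrt(sum(t * t for t in p))
    if norm_p == 0:
        raise ValueError("a, b, c are all zero: not a plane")
    p_dot_q = sum(s * t for s, t in zip(p, q))
    return abs(p_dot_q - d) / norm_p

print(distance_to_plane(2, 7, -5, 3))              # 3/sqrt(78), about 0.3397
print(distance_to_plane(2, 7, -5, 3, (1, 1, 1)))   # 1/sqrt(78), since p.q = 4
```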

Self-test 10.6
Obtain a vector n1 perpendicular to the plane 2x − y + z = 6, and a vector n2 perpendicular to x + y − 2z = 0. Deduce the angle between the planes along the line of intersection.

Self-test 10.7
(a) Show that the planes x + 2y + 3z = 1 and x + 2y + 3z = −4 are parallel.
(b) State their perpendicular distances from the origin. (c) Decide whether
they lie on the same or opposite sides of the origin, and deduce the distance
between them.

10.9 General equation of a straight line


Figure 10.9 shows the straight line through a point A : (a1, a2, a3) with position
vector a. Its inclination is specified by direction ratios p, q, r. The vector
$$\mathbf{s} = p\mathbf{i} + q\mathbf{j} + r\mathbf{k}$$
is parallel to the line by (10.14), and is shown with its initial point at A, so that it
lies along the line. P : (x, y, z) is any point on the line, and has position vector r.

Fig. 10.9 The line through A (position vector a) parallel to s; P is a general point on the line, with position vector r.
By the triangle rule
$$\mathbf{r} = \overrightarrow{OA} + \overrightarrow{AP} = \mathbf{a} + \overrightarrow{AP}.$$
$\overrightarrow{AP}$ is always some multiple λ of s:
$$\overrightarrow{AP} = \lambda\mathbf{s},$$
where λ is a parameter which may take any value, so finally
$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{s}$$
is a vector parametric equation for the line.
In components this becomes
$$x = a_1 + \lambda p,\quad y = a_2 + \lambda q,\quad z = a_3 + \lambda r,$$
and these are parametric cartesian equations for the line.
Provided that none of p, q, r is zero, the parameter λ can be eliminated by rearranging the equations:
$$\frac{x - a_1}{p} = \frac{y - a_2}{q} = \frac{z - a_3}{r}. \qquad (10.25)$$

This is the general cartesian form of the equation of a straight line, since any line passes through some point A and has some direction p, q, r. The expression is not unique, because (a1, a2, a3) and p, q, r are not unique.
The equation (10.25) really consists of two simultaneous equations: for example,
the pair
(x − a1)/p = (y − a2)/q and (y − a2) /q = (z − a3)/r.
These are the equations of two planes, and the line is their line of intersection.
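A sketch of the parametric form r = a + λs in Python follows, assuming the line is specified by two points A and B, so that s = B − A supplies direction ratios. The membership test cross-multiplies the ratios in (10.25), for example (x − a1)q = (y − a2)p, which avoids dividing by p, q or r and so still works when one of them is zero. The function names and the sample points are illustrative only.

```python
from fractions import Fraction

def line_through(A, B):
    """Vector parametric form r = a + lam*s of the line through A and B:
    a is the position vector of A and s = B - A supplies direction ratios."""
    s = tuple(b - a for a, b in zip(A, B))
    if all(c == 0 for c in s):
        raise ValueError("A and B coincide")
    return tuple(A), s

def point_at(a, s, lam):
    # x = a1 + lam p, y = a2 + lam q, z = a3 + lam r
    return tuple(ai + lam * si for ai, si in zip(a, s))

def on_line(a, s, P):
    """True if P satisfies (10.25) in cross-multiplied form, e.g. (x - a1) q = (y - a2) p."""
    d = [pi - ai for pi, ai in zip(P, a)]
    return all(d[i] * s[j] == d[j] * s[i] for i in range(3) for j in range(i + 1, 3))

a, s = line_through((1, 2, 3), (3, 3, 1))      # illustrative points, s = (2, 1, -2)
print(point_at(a, s, Fraction(1, 2)))           # the midpoint of AB
print(on_line(a, s, point_at(a, s, 7)), on_line(a, s, (0, 0, 0)))   # True False
```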

Self-test 10.8
(a) Obtain a cartesian equation for a straight line connecting the points
A : (1, 2, 1) and B : (3, −1, 2). (b) Obtain the equations of the line of intersec-
tion of 2x + 3y − z = 1 and 3x + 2y + z = 1.

10.10 Forces acting at a point


The magnitude and direction of a force acting at a point in a body can be depicted
by a directed line segment. This pictorial possibility does not automatically mean
that forces behave like vectors: it must be established that the rules for combining
vectors listed in Section 10.3 parallel the experimental facts of mechanics.
The analogy is a little different according to the physical situation. For the
simplest case, Fig. 10.10 shows several forces, F1, F2, …, acting on the same point P.
P might be a single particle, or a single point fixed in a large body. It is ultimately
an experimental fact that the forces have the same physical effect as a single force
F, called the resultant of the forces shown, which acts at the same point and is
obtained by vector addition:
F = F1 + F2 + ··· .

Fig. 10.10 Forces F1, F2, F3, F4 acting at a point P, with resultant F.
Fig. 10.11 The component of F in the direction of the unit vector ŝ, where θ is the angle between F and ŝ.

A zero force has zero effect. This gives the condition for equilibrium of a particle
under the influence of several forces F1, F2, … : that the resultant force F must be
zero, or
F = F1 + F2 + ··· = 0. (10.26)

The magnitude of a force (expressed in units such as newtons) is denoted by |F|, and is proportional to the length of the arrow that represents it. The component of a force in an arbitrary direction is illustrated in Fig. 10.11. Suppose the direction is indicated by the unit vector ŝ. Then
$$\text{component of } \mathbf{F} \text{ in direction } \hat{\mathbf{s}} = \mathbf{F}\cdot\hat{\mathbf{s}} = |\mathbf{F}|\cos\theta, \qquad (10.27)$$
where θ is conveniently given a value between 0° and 180°. Notice that if θ is between 90° and 180°, then the component is negative. This agrees with the definition of vector components in the i, j, k directions that we used before. The process of obtaining a component of F in a certain direction is often spoken of as resolving F in that direction.

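Example 10.15 Find the component of the force F = 3i + j + k in the direction of the vector s = 2i + 3j + 6k.
The unit direction vector ŝ is given by
$$\hat{\mathbf{s}} = \mathbf{s}/|\mathbf{s}| = (2\mathbf{i} + 3\mathbf{j} + 6\mathbf{k})/7 = \tfrac{2}{7}\mathbf{i} + \tfrac{3}{7}\mathbf{j} + \tfrac{6}{7}\mathbf{k}.$$
The component of F in this direction is given by
$$\mathbf{F}\cdot\hat{\mathbf{s}} = (3\mathbf{i} + \mathbf{j} + \mathbf{k})\cdot\left(\tfrac{2}{7}\mathbf{i} + \tfrac{3}{7}\mathbf{j} + \tfrac{6}{7}\mathbf{k}\right) = \tfrac{6 + 3 + 6}{7} = \tfrac{15}{7}.$$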
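Resolving as in (10.27) is a one-line computation once s is normalised. A Python sketch (the function name component is our own) that reproduces Example 10.15 and shows the sign change when the direction of s is reversed:

```python
import math

def component(F, s):
    """Component of F in the direction of s: F . s_hat with s_hat = s/|s|, as in (10.27).
    It is negative when the angle between F and s exceeds 90 degrees."""
    norm_s = math.sqrt(sum(c * c for c in s))
    return sum(f * c for f, c in zip(F, s)) / norm_s

print(component((3, 1, 1), (2, 3, 6)))      # Example 10.15: 15/7, about 2.142857
print(component((3, 1, 1), (-2, -3, -6)))   # opposite sense: -15/7
```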

In two dimensions, if the components of a force F in any two non-parallel directions are zero, then F must be zero (and conversely, of course). For suppose F makes angles θ and φ with the two directions. If
$$|\mathbf{F}|\cos\theta = 0 \quad\text{and}\quad |\mathbf{F}|\cos\varphi = 0,$$
then |F| must be zero, so F = 0. Otherwise cos θ = cos φ = 0, so both directions would be perpendicular to F and therefore parallel to each other, contrary to assumption. (One direction is not sufficient, since F might be perpendicular to that direction.) This principle, of ‘resolving in two directions’, is used frequently to solve problems. The following is a simple example.


Example 10.16 Figure 10.12 represents a particle P at rest on a rough
inclined plane of inclination 30°. The forces acting on P are the force of
gravity downwards of magnitude mg, where g is the acceleration due to gravity,
the normal (perpendicular) reaction of the plane R, and the frictional force F.
Find R and F.

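Fig. 10.12 A particle at rest on a rough plane inclined at 30°, acted on by its weight mg, the normal reaction R and the frictional force F (the angles 30°, 60°, 180° − 30° and 180° − 60° are marked).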

The arrows indicate provisional directions for the vectors R and F. The scalar quantities
R and F attached to the arrows stand for the unknown components of R and F in the
assumed directions, and these might not be positive numbers. This convention provides
a safety net, for suppose we have, say, guessed the direction of F wrongly, and that it
actually acts down the plane rather than up it. The mistake will do no harm, because F
will simply turn out to be a negative number in our answer. This is a conventional way
of lettering diagrams in mechanics.
It is easiest to resolve in the assumed directions of F and R:
in direction of F: 0 = F + mg cos(180° − 60°), which is the same as
$$0 = F - mg\cos 60^\circ = F - \tfrac{1}{2}mg; \qquad \text{(i)}$$
in direction of R: 0 = R + mg cos(180° − 30°), which is the same as
$$0 = R - mg\cos 30^\circ = R - \tfrac{\sqrt{3}}{2}mg. \qquad \text{(ii)}$$
Therefore
$$F = \tfrac{1}{2}mg \quad\text{and}\quad R = \tfrac{\sqrt{3}}{2}mg.$$
(You would usually go straight for the commonsense way of writing the components given by (i) and (ii), avoiding the cosines of large angles.)
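The resolving in Example 10.16 can also be done with explicit unit vectors. The Python sketch below assumes axes with x horizontal and y vertically upwards, with the plane rising in the +x direction; these axes, and the function name particle_on_incline, are our own choices rather than part of the example. Resolving the equilibrium condition (10.26) along the two perpendicular unit vectors gives F = mg sin 30° and R = mg cos 30°, in agreement with (i) and (ii).

```python
import math

def particle_on_incline(m, g, incline_deg):
    """Friction F (up the plane) and normal reaction R for a particle at rest on a
    rough plane inclined at incline_deg degrees, found by resolving as in (10.26).
    Assumed axes: x horizontal, y vertically up, plane rising in the +x direction."""
    a = math.radians(incline_deg)
    up_plane = (math.cos(a), math.sin(a))    # unit vector in the assumed direction of F
    normal = (-math.sin(a), math.cos(a))     # unit vector in the assumed direction of R
    weight = (0.0, -m * g)

    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    # F*up_plane + R*normal + weight = 0; the two unit vectors are perpendicular,
    # so resolving along each one isolates a single unknown.
    F = -dot(weight, up_plane)
    R = -dot(weight, normal)
    return F, R

F, R = particle_on_incline(m=2.0, g=9.81, incline_deg=30)
print(F, 0.5 * 2.0 * 9.81)                 # agree (up to rounding): F = mg/2
print(R, math.sqrt(3) / 2 * 2.0 * 9.81)    # agree (up to rounding): R = (sqrt 3/2) mg
```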

10.11 Tangent vector and curvature in two dimensions



In Fig. 10.13, S is a fixed point on an arc and P is any other point on it, with position vector r = xi + yj + zk. A positive direction along the arc is indicated. P is then determined by specifying a number s, where
$$|s| = \text{arc-length } SP,$$
and s is positive or negative according to whether P is on the positive or negative side of S. The parameter s is a kind of coordinate for P, measured along the arc. Indicate the dependence of r on s by writing r(s) (compare r(t) in Section 9.8). Given a particular vector function r(s), the curve can in principle be reconstructed, although it is usually a complicated matter.

Fig. 10.13 The approach to a tangent at P. Arc-length s increases from S; Q, with position vector r + δr, is a neighbouring point, so that $\overrightarrow{PQ}$ = δr.

Let Q be a point on the curve with position vector r + δr. Figure 10.13 shows the vector $\overrightarrow{PQ}$ = δr, where P has parameter value s and Q has parameter value s + δs. According to (9.18), the vector dr/ds is tangential to the arc at P, and points in the direction of increasing s. Also, in this case, when δs is small, the chord length is approximately equal to the arc length:
$$|\delta\mathbf{r}| \approx |\delta s|,$$
and so
$$\left|\frac{d\mathbf{r}(s)}{ds}\right| = \lim_{\delta s\to 0}\left|\frac{\delta\mathbf{r}}{\delta s}\right| = 1. \qquad (10.28)$$
Therefore, in the case when the parameter used is s, dr/ds is a unit tangent vector, which we can write as t̂, pointing in the direction of increasing s.
Since t̂ is a unit vector, t̂·t̂ = 1. Therefore, by using the product rule to differentiate,
$$\frac{d}{ds}(\hat{\mathbf{t}}\cdot\hat{\mathbf{t}}) = \hat{\mathbf{t}}\cdot\frac{d\hat{\mathbf{t}}}{ds} + \frac{d\hat{\mathbf{t}}}{ds}\cdot\hat{\mathbf{t}} = 0, \quad\text{so that}\quad \hat{\mathbf{t}}\cdot\frac{d\hat{\mathbf{t}}}{ds} = 0.$$
Therefore, dt̂/ds is perpendicular to t̂.
Figure 10.14 shows a curve in the x, y plane. Draw a unit normal $\overrightarrow{PN}$ = n̂ to the curve at P as in the diagrams of Fig. 10.14. As we walk along the curve in the direction of t̂, the direction of n̂ is towards the right. Since t̂ and dt̂/ds are perpendicular, dt̂/ds must be a certain multiple, κ say, of n̂:

Fig. 10.14 (a) κ > 0, curve is concave viewed from the side of n̂. (b) κ < 0, the curve is convex viewed from the side of n̂. (c) κ = 0, a point of inflection. (In each case s increases along the curve, with unit tangent t̂ and unit normal n̂ = $\overrightarrow{PN}$ at P.)

$$\frac{d\hat{\mathbf{t}}}{ds} = \kappa\,\hat{\mathbf{n}}. \qquad (10.29)$$
The three cases in Fig. 10.14 relate to the sign of κ. In Fig. 10.14a, the curve is concave as viewed from the side of n̂, implying that if we make a small increase in s, then δt̂ points in the direction of n̂; therefore κ is positive. In Fig. 10.14b, the curve is convex as viewed from the side of n̂; in the same way it follows that κ is negative. In the case of a point of inflection (Fig. 10.14c), κ is zero.
The number κ is called the curvature of the curve at P. The greater |κ| is, the more sharply the curve turns. The positive quantity ρ given by
$$\rho = 1/|\kappa|$$
is its radius of curvature at P. This is the radius of the circle that best fits the curve at P. We will not prove this, but illustrate it in the following example.

Example 10.17 Obtain expressions for t̂ and dt̂/ds for the case of a circle of radius a with centre at the origin, and confirm that ρ = a (Fig. 10.15).

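Fig. 10.15 A circle of radius a, centre O, with arc-length s measured from S on the x axis; the unit tangent t̂ and the unit normal n̂ are shown at the point at angle θ.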

Measure s from the point S; then
$$s = a\theta.$$
The unit tangent vector t̂ has components (−sin θ, cos θ), so
$$\hat{\mathbf{t}} = -\mathbf{i}\sin\theta + \mathbf{j}\cos\theta.$$
To differentiate with respect to s, use the chain rule with dθ/ds = 1/a (because ds/dθ = a):
$$\frac{d\hat{\mathbf{t}}}{ds} = \frac{1}{a}(-\mathbf{i}\cos\theta - \mathbf{j}\sin\theta)$$
(observe that t̂·dt̂/ds = 0). The unit normal n̂ in the right-hand direction has components (cos θ, sin θ), so
$$\hat{\mathbf{n}} = \mathbf{i}\cos\theta + \mathbf{j}\sin\theta.$$
Evidently
$$\frac{d\hat{\mathbf{t}}}{ds} = -\frac{1}{a}\hat{\mathbf{n}},$$
so κ = −1/a (consistent with a curve that is convex viewed from the side of the normal n̂). Also
$$\rho = 1/|\kappa| = a,$$
the radius of the circle.
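Curvature with the sign convention of (10.29) can be estimated numerically for any plane curve given parametrically by x(u), y(u). The Python sketch below uses the standard signed-curvature formula (x′y″ − y′x″)/(x′² + y′²)^{3/2}, which is positive when the curve turns to the left; because n̂ in this section points to the right of t̂, the sign is reversed. With u = x and y = f(x) this reduces to −f″/(1 + f′²)^{3/2}. Derivatives are approximated by central differences with a step h, an assumption that limits the accuracy; the function name is our own. The usage line checks the circle of Example 10.17.

```python
import math

def signed_curvature(x, y, u, h=1e-4):
    """Curvature kappa of the plane curve (x(u), y(u)) at parameter u, with the
    convention of (10.29): dt/ds = kappa n, where n is the unit normal to the RIGHT
    of the unit tangent t. Derivatives are estimated by central differences."""
    xd = (x(u + h) - x(u - h)) / (2 * h)
    yd = (y(u + h) - y(u - h)) / (2 * h)
    xdd = (x(u + h) - 2 * x(u) + x(u - h)) / h ** 2
    ydd = (y(u + h) - 2 * y(u) + y(u - h)) / h ** 2
    # x'y'' - y'x'' is positive for a left turn, hence the minus sign here.
    return -(xd * ydd - yd * xdd) / (xd ** 2 + yd ** 2) ** 1.5

a = 2.0   # circle of radius a traversed anticlockwise, as in Example 10.17
k = signed_curvature(lambda t: a * math.cos(t), lambda t: a * math.sin(t), 0.7)
print(k, 1 / abs(k))   # close to kappa = -1/a = -0.5 and rho = a = 2
```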

Self-test 10.9
For a plane curve r = xi + y(x)j, where y is a function of x, δs ≈ √[(δx)² + (δy)²], and t̂ = dr/ds. (a) Supposing that the curve is given by y = x², write
$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dx}\,\frac{dx}{ds},$$
and show that t̂ = (i + 2xj)/√(1 + 4x²).
(b) Obtain dt̂/ds, and deduce that $\rho = \tfrac{1}{2}(1 + 4x^2)^{3/2}$.

Problems

10.1 Obtain the scalar products of the pairs of vectors given in component form by: (a) (2, 2, 1) and (3, 1, 2), (b) (2, −3, 2) and (−2, 3, −1), (c) (2, 2, −3) and (−1, 1, −2), (d) (2, 3, 4) and (1, −2, 1), (e) (p − q, p + q, p) and (p + q, q, −p − q).

10.2 (Two dimensions). Obtain the scalar products of the pairs of vectors given in component form by (a) (2, 3) and (3, 4), (b) (1, 0) and (0, 1), (c) (5, 6) and (0, −4), (d) (2, 3) and (3, −2).

10.3 Prove that |a + b|² + |a − b|² = 2(|a|² + |b|²). (Hint: see eqn (10.1c).) Sketch the vectors a, b, a + b, a − b on one diagram in order to obtain a geometrical theorem from this result. (There are two possible theorems, depending on what diagram you draw.)

10.4 Let a = (2, −3, 4) and b = (−1, −2, 3), or in the alternative form a = 2i − 3j + 4k and b = −i − 2j + 3k. Evaluate a·b, (a) using the first form, (b) using the second form with (10.8).

10.5 Given that a = i + 2j − k and b = i + 3j + k, evaluate the following scalar products: (a) a·b, (b) (a − b)·(a + b), (c) (a − b)·(a − b), (d) a·a + 2a·b + b·b, (e) (a·a)a − (b·b)b.
10.6 Find the angles, in the range 0° to 180°, between the pairs of vectors (a) i + j + k and i + j, (b) i − j + k and i + j, (c) 2i − j + 3k and i + 3j + 2k.

10.7 (Two dimensions). Find the angle θ (0° ≤ θ ≤ 180°) between the pairs of vectors: (a) 3i + 4j and 4i − 3j, (b) i − 2j and 2i − j, (c) i − 2j and −6i + 3j.

10.8 Find the angle between one of the edges of a cube and a diagonal line through one end.

10.9 A circular cone has its vertex at the origin and its axis in the direction of the unit vector â. The half-angle at the vertex is α. Show that the position vector r of a general point on its surface satisfies the equation
â·r = |r| cos α.
Obtain the cartesian equation when â = (2/7, −3/7, −6/7) and α = 60°.

10.10 A : (2, 2, −1), B : (0, 1, 1), C : (−1, 2, 0) are three points. Find the angles in the triangle ABC.

10.11 Confirm the fact that a·b = ¼(|a + b|² − |a − b|²). (Hint: it is easier to start with the right-hand side.) Test the result using any two vectors. Deduce a simple geometrical theorem by sketching a, b, a + b, a − b all on the same diagram: there are two theorems to be had, depending on whether you think of the triangle or the parallelogram rule.

10.12 Show that the component of a vector F in the direction of another vector a is given by F·a/|a|. Find the components of F = (8, 15, 9) in the directions of the three vectors a, b, c, where a = (2, 3, 6), b = (0, 3, 4), and c = (2, 2, 1). Express F in the form F = λa + µb + νc, where λ, µ, ν are constants.

10.13 Show that the vectors a = i + 3j + 4k and b = −2i + 6j − 4k are perpendicular. Obtain any vector c = c1i + c2j + c3k which is perpendicular to a and b, and derive from it two unit vectors (their senses will be opposite).

10.14 Let a = i + j − k and b = 2i − j + 2k. Find the angle (in the range 0° to 180°) between a and b, and construct any vector perpendicular to a and b.

10.15 Find the value of λ such that the vectors (λ, 2, −1) and (1, 1, −3λ) are perpendicular.

10.16 Determine numbers α, β, γ which ensure that the vectors a = (α, 2, −3), b = (−1, 2β, 2), and c = (2, 1, −3γ) are mutually perpendicular.

10.17 The points A : (1, 0, 0), B : (0, 1, 0), C : (0, 1, 1), and D : (0, y, z) are the vertices of a tetrahedron. Find y and z such that ABD is an equilateral triangle and ‹ is a right angle.

10.18 (Change of axes in two dimensions). Oxy and OXY are two sets of right-handed axes with the same origin O. OX is reached from Ox by an anticlockwise rotation of 45°. (a) Obtain the X, Y coordinates of a point P whose coordinates in Oxy are (2, 2). (b) Find the values of x and y for the point Q for which X = 1, Y = −1. (c) Find the equation of the circle (x − 1)² + y² = 1 in the axes OXY.

10.19 Find the lengths, and the direction cosines l, m, n, of the following vectors. (a) j, (b) i + j + k, (c) i − 2j − 2k, (d) i − j + k, (e) i − j − k, (f) 2i + 3j + 6k, (g) i − 2j + 2k, (h) 3k, (i) −3k.

10.20 (Change of axes). (a) Show that the vectors with components in Oxyz given by X = (6, 15, 10)/19, Y = (15, −10, 6)/19, Z = (10, 6, −15)/19 are mutually perpendicular unit vectors. (b) A sketch will show that [X, Y, Z] is a right-handed system, so it defines a new set of right-handed axes OXYZ. Write down the change-of-axes matrices in (10.13a) and (10.13b). (c) Find the coordinates of the point x = 1, y = 2, z = 2 in the new axes. (d) Express the equation of the plane x + y + z = 0 in the new coordinates.

10.21 The following are sets of direction ratios p, q, r for a straight line. Obtain two possible sets of direction cosines in each case. (a) 3, 4, 12; (b) 6, −10, 15.

10.22 A swarm of particles expands through all space. The velocity v(t) of the particle with position vector r(t) at time t in a given set of axes is equal to f(t)r. Show that the rule is the same when the velocity is measured relative to any given particle.

10.23 The angles made by a vector a and the positive directions of the axes Ox, Oz are 45° and 60° respectively. Find the angles that a may make with Oy.

10.24 The following are sets of direction ratios p, q, r for a straight line. Obtain two sets of direction cosines, describing unit vectors parallel to the line, for each case. (a) 3, 4, 12; (b) 6, −10, 15.
10.25 (a) Find any constant vector parallel to the line given parametrically by x = 1 − λ, y = 2 + 3λ, z = 1 + λ. (Hint: see eqn (10.25).) (b) Find the equation of the plane which is perpendicular to the line in (a) and which passes through the origin. (Hint: see eqn (10.22b).) (c) Find the equation of the plane such that the line in (a) lies in the plane, and the plane passes through the origin. (Hint: the new plane must be perpendicular to the plane in (b).)

10.26 Find the angle θ, in the range 0° ≤ θ ≤ 90°, between the pairs of planes given as follows: (a) 2x − 3y + z = 2 and x − y = 0, (b) x + y + z = 0 and z = 0. (Hint: consider the normals.)

10.27 The vector equations of two planes are a·r = u and b·r = v, where a and b are constant vectors and u and v are constants. What is the vector relation between a and b for the planes to be perpendicular? (Hint: see 10.22(b).) Obtain any plane perpendicular to x + y + z = 0.

10.28 (a) Show that the planes ax + by + cz = d, where a, b, c are fixed and d may take any value, are all parallel. (b) Show that the straight line through O and perpendicular to the plane 2x + y − z = 2 has the parametric equation r = λ(2, 1, −1), where λ is a parameter. (c) For (b), find the point at which the line intersects the plane, and deduce its length (this is the distance of the plane from the origin). (d) Find the distance between the plane in (b) and the plane 2x + y − z = 1.

10.29 Let p·r = d, where p = (a, b, c) and r = (x, y, z), be a plane, and Q be a point with position vector q. Show that the distance of Q from the plane is equal to |p·q − d|/|p|. Deduce the distance of the point (1, 1, 2) from the plane x + 2y − 4z + 3 = 0.

10.30 A : (0, −1, 3), B : (1, 0, 3), and C : (0, 0, 5) are three points. Let P1 be the plane through A, perpendicular to −j + k, and P2 be the plane through A, B, and C. (a) Find equations for P1 and P2. (b) Obtain the angle between P1 and P2. (c) Determine the perpendicular distance from the origin O to P1. (d) Show that the line of intersection, L, of P1 and P2 meets the line OD, where D is the point (1, 4, −4). (e) Determine the point of intersection of L with OD.

10.31 (Two dimensions). A straight line has a unit normal n̂, and ŝ points along the line. Let Fs and Fn be the component vectors of a vector F in the directions of ŝ and n̂ respectively, so that F = Fs + Fn. Show that F = Fs + (F·n̂)n̂. (Hint: see (10.8b).) Find Fs and Fn when F = i − 3j and the straight line is given by 2x − 3y = 1.

10.32 (Two dimensions). A mirror M stands upright on a table (sketch it as a straight line M through the origin O in the (x, y) plane). ŝ is a unit vector along M pointing away from O, and n̂ is the unit normal vector to M pointing to the left of ŝ. (a) A ray of light in the plane, with direction vector N, falls on the mirror and is reflected in the direction N1. By considering its vector components in the directions of ŝ and n̂ show that
N1 = −N + 2(N·ŝ)ŝ.
(b) Find N1 when N = −(i + j)/√2, and the mirror lies along the line y = 0. (c) Suppose there are two mirrors, M1 and M2, forming a wedge of angle 60° in the sector x ≥ 0, y ≥ 0, M1 being along y = 0 and M2 along y/x = √3. A ray enters the wedge in the direction N = −i cos θ − j sin θ. Use the result in (a) to find the direction of the twice-reflected ray.

10.33 (a) Find any two points on the line of intersection of the planes x + y + z = 2 and 2x + y − 2z = 1 (e.g. one point is obtained by starting with x = 0). (b) Obtain a parametric vector equation for the line of intersection. (c) Deduce cartesian equations of the form (10.25) for the straight line. (Notice that the equations in (a), taken together, already define the line in cartesian form, but the form (10.25) is more informative since it contains the direction ratios.)

10.34 Obtain, in parametric form, the line of intersection of the planes 2x + 3y − z = 1 and x + y + z = 0. Deduce the standard form (10.25).

10.35 Find direction ratios for the line of intersection of the planes 2x + 3y − 2z = 1 and x − 3y + 2z = 2. (Notice that the line cannot be represented in the form (10.25).)

10.36 Three points are given by A : (−1, −2, 1), B : (−1, −2, 0), and C : (−1, 0, 3). Let P1 be the plane through B with normal vector n1 = j + k and P2 the plane through C with normal vector n2 = 2i − j + 3k. Show that the line AC is perpendicular to P1.

10.37 Given two planes, a1x + b1y + c1z = d1 and a2x + b2y + c2z = d2, show that any solution p, q, r of the simultaneous equations
a1p + b1q + c1r = 0, a2p + b2q + c2r = 0
is a set of direction ratios for the line of intersection.
10.38 (Perspective drawing). An observer's eye E is at the point î + ĵ + k̂, and views objects through a plane screen which has the equation r · (1.1î + 1.1ĵ + k̂) = 1. Q is a general point on an object behind the screen, and its position vector is r = xî + yĵ + zk̂. Find the coordinates of the apparent position of Q on the screen. (Hint: find the equation of the line EQ; then find where it cuts the screen.)

10.39 An ellipse is given parametrically by r = îa cos t + ĵb sin t, where a and b are constants and t is the parameter, with −π < t ≤ π (in radians). Show that δs² ≈ δx² + δy², where s represents arc-length. Deduce that
$$\frac{ds}{dt} = (a^2\sin^2 t + b^2\cos^2 t)^{1/2}.$$
Find the unit tangent vector, a unit normal, the curvature, and the radius of curvature at the points where t = 0, ¼π, and ½π.

10.40 A plane curve has the equation y = f(x), and the position vector r of a point on the curve can be represented by
r = xî + f(x)ĵ
using x as the parameter. Show that the unit tangent vector to the curve is
$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dx}\,\frac{dx}{ds} = \frac{\hat{\imath} + f'\hat{\jmath}}{\sqrt{1 + f'^2}}.$$
Show that the curvature κ of the curve at any point is given by
$$\kappa = \frac{-f''}{(1 + f'^2)^{3/2}}.$$
Find the curvature along (a) the parabola y = x²; (b) the cosine curve y = cos x.
11 Vector product

CONTENTS

11.1 Vector product
11.2 Nature of the vector p = a × b
11.3 The scalar triple product
11.4 Moment of a force
11.5 Vector triple product
Problems

A second form of product defines a vector which finds applications in problems about moments, angular velocity, and in other circumstances that involve rotation. It has widespread use in dynamics, fluid mechanics, and electromagnetism.

11.1 Vector product


The vector product, or cross product, is denoted by a bold multiplication sign as
in a × b, or a caret sign as in a ∧ b. Its definition is:

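Vector product a × b
(a) a × b = (a2b3 − a3b2)î − (a1b3 − a3b1)ĵ + (a1b2 − a2b1)k̂,
which can be expressed as an expansion like a determinant (see (8.3)):
(b) $$\mathbf{a}\times\mathbf{b} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3 \end{vmatrix} = \hat{\imath}\begin{vmatrix} a_2 & a_3\\ b_2 & b_3 \end{vmatrix} - \hat{\jmath}\begin{vmatrix} a_1 & a_3\\ b_1 & b_3 \end{vmatrix} + \hat{k}\begin{vmatrix} a_1 & a_2\\ b_1 & b_2 \end{vmatrix}.$$ (11.1)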

Example 11.1 Find the vector products a × b and b × a, where a = 2î − ĵ + 3k̂ and b = −î + 2ĵ + 4k̂.
From (11.1a),
a × b = {[(−1) × 4] − (3 × 2)}î − {(2 × 4) − [3 × (−1)]}ĵ + {(2 × 2) − [(−1) × (−1)]}k̂
= −10î − 11ĵ + 3k̂.
In evaluating b × a, we exchange the a and b components in the expression (11.1a), so the sign of each of the three bracketed terms changes. Therefore
b × a = −a × b = 10î + 11ĵ − 3k̂.
(Alternatively, we interchange the last two rows in the determinant form (11.1b), which changes its sign by Section 8.2, Rule 3.)
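The component formula (11.1a) can be checked numerically. The short Python sketch below is our own illustration (the function name `cross` is not from the text); it reproduces both results of Example 11.1, including the change of sign when the order of the factors is reversed.

```python
def cross(a, b):
    """Vector product a × b of two 3-component vectors, following (11.1a)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            -(a1 * b3 - a3 * b1),   # note the minus sign on the ĵ term
            a1 * b2 - a2 * b1)

a = (2, -1, 3)    # a = 2î − ĵ + 3k̂
b = (-1, 2, 4)    # b = −î + 2ĵ + 4k̂
print(cross(a, b))   # (-10, -11, 3)
print(cross(b, a))   # (10, 11, -3)
```

Working with plain tuples keeps the correspondence with the three bracketed terms of (11.1a) visible line by line.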

Algebraic manipulations are governed by the following rules:

Algebraic properties of a × b
(a) a × b = −b × a (the vector product does not commute).
(b) a × (b + c) = a × b + a × c (distributive law).
(c) a × (λ b) = λ a × b where λ is any number.
(d) a × b = 0 if b and a are parallel: in particular, a × a = 0. (11.2)

These are proved as follows:


(a) If a and b are interchanged, then the three brackets in (11.1a) change sign.
(b) Put b1 + c1, b2 + c2, b3 + c3 in place of b1, b2, b3 in (11.1a), and separate the
groups of terms involving b and c. This is also a property of determinants.
(c) This follows immediately from (11.1a): λ is a factor throughout.
(d) If a and b are parallel, then b = λa, where λ is some number. Therefore
a × b = λ a × a (from (11.2c)). If we now put b1 = a1 etc. into (11.1a), we obtain
a × a = 0, so a × b = 0.
The unit vectors î, ĵ, k̂ are simply related by the vector product:

Vector products of î, ĵ, k̂
(a) î × ĵ = k̂, ĵ × k̂ = î, k̂ × î = ĵ.
(b) ĵ × î = −k̂, k̂ × ĵ = −î, î × k̂ = −ĵ. (11.3)

Notice that for the group in (11.3a), the cyclic order î, ĵ, k̂, î, ĵ, … is maintained, and for the group in (11.3b) there is a different cyclic order ĵ, î, k̂, ĵ, î, … . To prove, for example, that î × ĵ = k̂, put î = (1, 0, 0) and ĵ = (0, 1, 0) into the definition (11.1b) (or into (11.1a) if you are not sure about determinants). Then we obtain
$$\hat{\imath}\times\hat{\jmath} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{vmatrix} = 0\hat{\imath} + 0\hat{\jmath} + 1\hat{k} = \hat{k}.$$
The group (11.3b) follows by the change-of-order rule, (11.2a).
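The relations (11.3) can also be confirmed mechanically by feeding the unit vectors (1, 0, 0), (0, 1, 0), (0, 0, 1) into the component formula. The sketch below is our own illustration, using the same hypothetical helper `cross`, and checks all six products.

```python
def cross(a, b):
    """Vector product from the component formula (11.1a)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

for u, w, result in [(i, j, k), (j, k, i), (k, i, j)]:
    assert cross(u, w) == result                      # (11.3a): cyclic order
    assert cross(w, u) == tuple(-x for x in result)   # (11.3b): order reversed
print("all six products agree with (11.3)")
```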

Self-test 11.1
Evaluate c = a × b, where a = 3î − ĵ + 2k̂ and b = î + ĵ + k̂, and confirm that c is perpendicular to both a and b. Find the magnitude of c.

11.2 Nature of the vector p = a × b



Firstly we show that p = a × b is perpendicular to both a and b. This is equivalent to proving that if we move a and b to emerge from a common point Q (see Fig. 11.1a) then p, or a × b, is perpendicular to the plane containing a and b.

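Fig. 11.1 It will be shown later that the direction of p is that given in (a). [Panels (a) and (b): a and b drawn from a common point Q, with p perpendicular to their plane; p points to opposite sides of the plane in the two panels.]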

Using the definition of a × b,
a · p = a · (a × b)
= (a1î + a2ĵ + a3k̂) · [(a2b3 − a3b2)î − (a1b3 − a3b1)ĵ + (a1b2 − a2b1)k̂]
= a1(a2b3 − a3b2) − a2(a1b3 − a3b1) + a3(a1b2 − a2b1)
= 0.
Therefore, by (10.5), p is perpendicular to a. Similarly, p is perpendicular to b.
However, so far as we can tell from this argument, p might point in either of two directions, as suggested by the diagrams in Fig. 11.1a and b. We want to distinguish between them, and the distinction is similar to the distinction between right- and left-handed axes (compare Fig. 9.8). One way to recognize a right-handed system is as follows:

Test for a right-handed system
(See Fig. 11.2.) Place a = $\overrightarrow{QA}$, b = $\overrightarrow{QB}$, c = $\overrightarrow{QC}$ at a common point Q. View Q from any point V on the opposite side of the triangle ABC from Q. Then
(a) [a, b, c], in that order, is a right-handed system if the direction of the circuit A to B to C is seen from V as anticlockwise. Otherwise, [a, b, c] is left-handed.
(b) If [a, b, c] is right-handed, then (maintaining the cyclic order) [b, c, a] and [c, a, b] are right-handed. The other permutations are left-handed.
(c) A set of axes is right-handed if [î, ĵ, k̂] is right-handed. (11.4)
Fig. 11.2 Test for a right-handed system of vectors [a, b, c]. Viewed through the triangle from the eye at V, the vertices A, B, C follow in anticlockwise order.
Fig. 11.3 [Axes Qx, Qy at Q, with a along Qx (a1 > 0), b making angle θ with a and having b2 > 0, and p = a × b.]

It is essential to place V on the opposite side of the triangle ABC from Q, otherwise the apparent direction of the circuit is reversed. In Fig. 11.1a, the system [a, b, p] is right-handed, and in Fig. 11.1b, [a, b, p] is left-handed.
Returning to the cross product p = a × b, where the vectors all emerge from Q,
set up a special set of right-handed axes Qx, Qy, Qz, as in Fig. 11.3. The axes
satisfy the following conditions:
(i) Qx is in the direction of a.
(ii) Qy is in the plane of a and b, perpendicular to Qx. It is directed so that the
y component of b is positive.
(iii) The direction of Qz makes the axes right-handed.
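The unit vectors are î, ĵ, k̂. From conditions (i) and (ii), a = (a1, 0, 0) and b = (b1, b2, 0) in these axes, so
$$\mathbf{p} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ a_1 & 0 & 0\\ b_1 & b_2 & 0 \end{vmatrix} = a_1 b_2\,\hat{k}.$$ (11.5)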
Since, according to (i) and (ii), a1 and b2 are positive, p is in the direction of k̂, and the test (11.4a) shows that

[a, b, p] is a right-handed system. (11.6)

Therefore, Fig. 11.1a is the correct one, and Fig. 11.1b gives the direction of p
incorrectly.
Moreover (see Fig. 11.3),
b2 = |b | sin θ
(since 0° < θ < 180°, b2 is positive, as required). Also
a1 = |a |.
Therefore, from (11.5),

p = a × b = k̂ |a||b| sin θ, (11.7)

which specifies p in a simple way.

Properties of p = a × b
(a) p is perpendicular to a and b, in the direction making [a, b, p] right-handed.
(b) |p| = |a||b| sin θ, where θ is the angle in the range 0° to 180° between the directions of a and b. (11.8)

The properties of a × b in (11.8) depend only on the magnitude and direction of a and b. We are bound to find the same results whatever axes we use to obtain them: the axes we actually used were chosen only to simplify the algebra. Therefore, we have shown that the cross product is invariant with respect to changes in axes (provided that we confine ourselves to right-handed axes; left-handed axes would produce −p).

Invariance of a × b
a × b is invariant with respect to changes from one right-handed set of axes to
another. (11.9)

Other invariants are the length and direction of a vector a, and therefore the
vector a itself: its components are different in different axes, but the physical vector
we are talking about does not change. The scalar product, a·b = a1b1 + a2b2 + a3b3,
is also invariant; that is to say, it has the same numerical value in any right-handed
axes: this value is equal to |a| | b | cos θ and so does not change.
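The invariance statement (11.9) can be illustrated numerically. In the sketch below, which is our own illustration rather than anything in the text, a change to another right-handed set of axes is imitated by a rotation of the components about the z axis (`rotate_z`), and a change to left-handed axes by reversing the z components (`reflect_z`). The helper names, the angle, and the test vectors are assumptions chosen for the example.

```python
import math

def cross(a, b):
    """Vector product from the component formula (11.1a)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def rotate_z(v, phi):
    """Components of v after a rotation through angle phi about the z axis.

    A rotation keeps right-handed axes right-handed.
    """
    x, y, z = v
    c, s = math.cos(phi), math.sin(phi)
    return (c * x - s * y, s * x + c * y, z)

def reflect_z(v):
    """Reverse the z component: this turns right-handed axes into left-handed ones."""
    x, y, z = v
    return (x, y, -z)

def close(u, w, tol=1e-12):
    """Component-by-component comparison allowing for rounding error."""
    return all(abs(p - q) < tol for p, q in zip(u, w))

a, b = (2.0, -1.0, 3.0), (-1.0, 2.0, 4.0)
phi = 0.7   # an arbitrary angle in radians

# Rotated axes: transforming a and b and then taking the cross product gives
# the same result as transforming a × b, i.e. the same physical vector.
print(close(cross(rotate_z(a, phi), rotate_z(b, phi)), rotate_z(cross(a, b), phi)))  # True

# Reflected (left-handed) axes: the computed product comes out as −p.
print(close(cross(reflect_z(a), reflect_z(b)),
            tuple(-c for c in reflect_z(cross(a, b)))))                            # True
```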

Example 11.2 Let a = $\overrightarrow{QA}$ and b = $\overrightarrow{QB}$ be two vectors from Q, representing two sides of a parallelogram. Show that the area of the parallelogram is equal to |a × b|.
Complete the parallelogram as shown in Fig. 11.4. Construct a perpendicular BN on to QA, so that BN = |b| sin θ, where θ is the angle between a and b. Then
Area QACB = base QA × height BN = |a||b| sin θ = |a × b| (by (11.8)).
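A quick numerical check of Example 11.2, assuming illustrative sides a = (3, 0, 0) and b = (1, 2, 0) (these values are not from the text), compares |a × b| with base × height computed through the angle θ:

```python
import math

def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

a = (3.0, 0.0, 0.0)   # side QA (illustrative values)
b = (1.0, 2.0, 0.0)   # side QB

# Area from the vector product, as in Example 11.2
area_cross = norm(cross(a, b))

# Area as base × height, with θ recovered from cos θ = a·b/(|a||b|)
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
area_base_height = norm(a) * norm(b) * math.sin(theta)

print(area_cross, round(area_base_height, 12))   # 6.0 6.0
```

Here the parallelogram has base 3 and height 2, so both routes give 6.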

Example 11.3 Two planes have normals n1 and n2 respectively, and pass
through a point A with position vector a. Obtain a vector parametric equation
for their line of intersection.
Figure 11.5 shows the two planes and their line of intersection LM, which contains
the point A. The point P : (x, y, z) with position vector r is a general point on the line.
Let p be any vector parallel to LM. Then $\overrightarrow{AP}$ is always a multiple of p, so
r − a = λp (i)
where λ is a parameter.
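Fig. 11.4 The area of a parallelogram. [Parallelogram QACB with sides a = $\overrightarrow{QA}$ and b = $\overrightarrow{QB}$ at angle θ; BN is the perpendicular from B on to QA, of length |b| sin θ.]
Fig. 11.5 [Planes 1 and 2, with normals n1 and n2, intersecting in the line LM through A (position vector a); P is a general point on LM with position vector r.]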

We may choose p to be given by
p = n1 × n2; (ii)
the line LM lies in both planes, so it is perpendicular to both normals, and p is perpendicular to n1 and n2 by (11.8a), so it is parallel to LM. Therefore, from (i) and (ii),
r = a + λ n1 × n2 (iii)
is a parametric vector equation for the line.
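To see (iii) at work, the sketch below takes two hypothetical planes through A : (1, 1, 1), with normals n1 = (1, 1, 1) and n2 = (1, −1, 0) (these numbers are illustrative, not from the text), builds the direction n1 × n2, and checks that several points of the line satisfy both plane equations.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

n1, n2 = (1, 1, 1), (1, -1, 0)   # hypothetical normals of planes 1 and 2
a = (1, 1, 1)                    # hypothetical point A on both planes
d1, d2 = dot(n1, a), dot(n2, a)  # so the planes are r·n1 = d1 and r·n2 = d2

p = cross(n1, n2)                # direction of LM, equation (ii)

def point_on_line(lam):
    """Position vector r = a + λ n1 × n2, equation (iii)."""
    return tuple(ai + lam * pi for ai, pi in zip(a, p))

for lam in (-2, 0, 0.5, 3):
    r = point_on_line(lam)
    # every point of the line satisfies both plane equations
    assert dot(n1, r) == d1 and dot(n2, r) == d2

print(p, d1, d2)   # (1, 1, -2) 3 0
```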

Self-test 11.2
Let a = 3î − 4ĵ and b = î + ĵ + k̂. Confirm that the value of sin θ given by (11.7) is compatible with the value of cos θ derived from the dot product a·b.

11.3 The scalar triple product


The scalar quantity a ·(b × c) is called a scalar triple product. It has the following
properties:

Properties of the triple scalar product
(a) $$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \det\begin{pmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{pmatrix} = \begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}.$$
(b) a·(b × c) = b·(c × a) = c·(a × b).
(c) a·(c × b) = b·(a × c) = c·(b × a) = −a·(b × c).
(d) If any two vectors are equal or parallel, then a·(b × c) = 0; more generally, a·(b × c) = 0 whenever a, b, c are linearly dependent (see Section 8.2).
(e) It is invariant for right-handed axes. (11.10)
The proofs are as follows:

(a) Put a = (a1, a2, a3) and so on. Then
a·(b × c) = (a1, a2, a3) · (b2c3 − b3c2, b3c1 − b1c3, b1c2 − b2c1)
= a1(b2c3 − b3c2) − a2(b1c3 − b3c1) + a3(b1c2 − b2c1)
$$= \begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}.$$
(b) In the three permutations
a·(b × c), b·(c × a), c·(a × b),
the cyclic order a, b, c, a, b, … is maintained. The determinants for b·(c × a) and c·(a × b) are each obtained from a·(b × c) by means of two row interchanges, and this leaves the determinant unaltered (see Section 8.2, Rule 3). Therefore they are all equal.
(c) Compare these three products with those in (b), and recall that b × c =
−c × b etc.
(d) If b is parallel to c, then b × c = 0 from (11.2d). If a is parallel to b or c, then
use the same argument on one of the equivalent permutations in (11.10b).
(e) Its value remains the same in any right-handed axes because the cross and
dot products have this property (see (11.9)).
The brackets in the triple scalar product are not strictly necessary and are often
omitted, because the alternative bracketing (a·b) × c would be meaningless.
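The determinant form (11.10a) and the permutation rules (11.10b, c) are easy to test numerically. The sketch below uses arbitrary illustrative vectors (not from the text); `triple` and `det3` are our own helper names.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def triple(a, b, c):
    """Scalar triple product a·(b × c)."""
    return dot(a, cross(b, c))

def det3(rows):
    """3 × 3 determinant, expanded along the first row as in (11.10a)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    return a1 * (b2 * c3 - b3 * c2) - a2 * (b1 * c3 - b3 * c1) + a3 * (b1 * c2 - b2 * c1)

a, b, c = (2, -1, 3), (-1, 2, 4), (1, 1, 1)   # illustrative vectors
print(triple(a, b, c), det3((a, b, c)))   # -18 -18   (11.10a)
print(triple(b, c, a), triple(c, a, b))   # -18 -18   cyclic order, (11.10b)
print(triple(a, c, b))                    # 18        order reversed, (11.10c)
```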
A parallelepiped is the three-dimensional analogue of a parallelogram and its volume can be expressed as a scalar triple product. Figure 11.6 shows the parallelepiped which has the vectors $\overrightarrow{QA}$ = a, $\overrightarrow{QB}$ = b, $\overrightarrow{QC}$ = c as three adjacent sides.

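Fig. 11.6 [Parallelepiped with adjacent edges a, b, c at Q; AN is the perpendicular from A on to the face QBEC, and θ is the angle between a and b × c.]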

Drop a perpendicular AN on to the plane QBEC. Then
volume = area QBEC × height AN.
But from Example 11.2, since QBEC is a parallelogram,
area QBEC = |b × c|.
Since b × c is perpendicular to the plane of b and c, $\overrightarrow{AN}$ and b × c are parallel, so, with θ the angle between a and b × c,
height AN = QA |cos θ| = |a||cos θ|.
Therefore
volume = |a||b × c||cos θ| = |a·(b × c)|
by the definition of the scalar product.

Volume of a parallelepiped
If the adjacent sides at a vertex Q are $\overrightarrow{QA}$ = a, $\overrightarrow{QB}$ = b, $\overrightarrow{QC}$ = c, then
volume = | a·(b × c)|. (11.11)

Vectors are said to be coplanar if, when drawn from the same point, they lie in
the same plane. The condition for this is:

Coplanar vectors
Three nonzero vectors a, b, c at the same point are coplanar if, and only if,
a·(b × c) = 0, (that is, for a, b, c linearly dependent – see Section 8.2). (11.12)

(If they are not at the same point, then this is the condition that they should be
parallel to a common plane.) The result follows from (11.11): the volume of the
corresponding parallelepiped is zero.

Example 11.4 Show that the points A : (1, 2, 2), B : (3, 4, 5), C : (−1, 0, −1) lie on
a plane through the origin.
Suppose the three points A, B, C have position vectors a, b, c. To show a, b, c are
coplanar, evaluate a ·(b × c):
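a·(b × c) = (1, 2, 2)·[(3, 4, 5) × (−1, 0, −1)]
$$= \begin{vmatrix} 1 & 2 & 2\\ 3 & 4 & 5\\ -1 & 0 & -1 \end{vmatrix} = 1\begin{vmatrix} 4 & 5\\ 0 & -1 \end{vmatrix} - 2\begin{vmatrix} 3 & 5\\ -1 & -1 \end{vmatrix} + 2\begin{vmatrix} 3 & 4\\ -1 & 0 \end{vmatrix}$$
= −4 − 2 × 2 + 2 × 4 = 0.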
Therefore, A, B, C, and O are all in the same plane, so the points A, B, C are on a plane
through the origin.
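Example 11.4 can be repeated in code with the helpers used above (redefined here so the sketch stands alone). As a contrast, the point C is replaced by a hypothetical point (−1, 0, 0), for which the triple product is nonzero and its modulus gives a parallelepiped volume by (11.11).

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def triple(a, b, c):
    """Scalar triple product a·(b × c)."""
    return dot(a, cross(b, c))

a, b, c = (1, 2, 2), (3, 4, 5), (-1, 0, -1)   # points A, B, C of Example 11.4
print(triple(a, b, c))              # 0: O, A, B, C lie in one plane, by (11.12)

c_moved = (-1, 0, 0)                # a hypothetical alternative point C
print(abs(triple(a, b, c_moved)))   # 2: volume of the parallelepiped, by (11.11)
```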

11.4 Moment of a force


Suppose that, in three dimensions, a force F is acting at a point P in a body
(Fig. 11.7) and Q is any point. Then the magnitude M of the moment or torque
about the point Q which is exerted by F is defined to be
M = |F|d,
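Fig. 11.7 [A force F acts at P along its line of action; QN is the perpendicular, of length d, from Q to the line of action; R = $\overrightarrow{QP}$ makes angle θ with F.]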

where d is the length of the perpendicular QN from Q to the line of action of F.
In Fig. 11.7, $\overrightarrow{QP}$ = R, and θ is the angle between F and R, with 0 ≤ θ ≤ 180°. Then
d = |R| sin θ,
so
M = |F||R| sin θ. (11.13)

This equation suggests a connection with the vector R × F. Define a vector M by


M = R × F (11.14)

(note that R comes first in the product). Then by (11.8b),


|M | = | R × F| = | R| | F| sin θ,
which is the same as (11.13). We call M the vector moment about the point Q of
F acting at P. M is perpendicular to the plane of R and F, in the direction making
[R, F, M] right-handed.

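Example 11.5 A force F = î − ĵ + 2k̂ acts at P : (1, 2, 1). Find its vector moment M about the point Q : (2, 1, 1).
In these axes the position vectors of P and Q are
p = î + 2ĵ + k̂, q = 2î + ĵ + k̂,
so
R = $\overrightarrow{QP}$ = p − q = −î + ĵ.
The moment M is given by
$$\mathbf{M} = \mathbf{R}\times\mathbf{F} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ -1 & 1 & 0\\ 1 & -1 & 2 \end{vmatrix} = 2\hat{\imath} + 2\hat{\jmath}.$$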
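The calculation in Example 11.5 can be checked with a few lines of Python; the helper names are our own. The two scalar products confirm property (11.8a): M is perpendicular to both R and F.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

p = (1, 2, 1)     # position vector of P
q = (2, 1, 1)     # position vector of Q
F = (1, -1, 2)

R = tuple(pi - qi for pi, qi in zip(p, q))   # R = QP = p − q
M = cross(R, F)                              # R comes first, as in (11.14)
print(R, M)                   # (-1, 1, 0) (2, 2, 0)
print(dot(M, R), dot(M, F))   # 0 0
```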

Example 11.6 A force F = î − ĵ (force units) acts at P : (1, 2, 0). Find its vector moment about the origin O.
O, P, and F all lie in the (x, y) plane, so the physical problem is two-dimensional. The Oz axis points towards you, out of the page, in Fig. 11.8.
The vector moment M is given by
$$\mathbf{M} = \mathbf{R}\times\mathbf{F} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ 1 & 2 & 0\\ 1 & -1 & 0 \end{vmatrix} = -3\hat{k}.$$
Thus M is parallel to Oz and its z component is −3. Figure 11.8 shows that the negative sign corresponds to F having a clockwise influence on a wheel turning about the point O.

Fig. 11.8 [The (x, y) plane, with F acting at P : (1, 2, 0) and R = $\overrightarrow{OP}$.]
Fig. 11.9 [The (x, y) plane, with the components F1î and F2ĵ of F acting at P : (a, b, 0), and R = $\overrightarrow{OP}$.]

Example 11.7 (Generalizes Example 11.6.) A force F = F1î + F2ĵ acts at P : (a, b, 0). Find its vector moment about the origin.
We have
$$\mathbf{M} = \mathbf{R}\times\mathbf{F} = \begin{vmatrix} \hat{\imath} & \hat{\jmath} & \hat{k}\\ a & b & 0\\ F_1 & F_2 & 0 \end{vmatrix} = (F_2 a - F_1 b)\hat{k}.$$
This situation is also (physically) two-dimensional; the z direction in Fig. 11.9 would only be needed in order to display M. The expression illustrates the separate clockwise and anticlockwise contributions respectively of F1 and F2.
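For forces in the (x, y) plane, Example 11.7 reduces the moment to the single number F2a − F1b. The sketch below (the helper name `moment_2d` is hypothetical) compares this formula with the full vector product for the data of Example 11.6.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def moment_2d(a, b, F1, F2):
    """z component of the moment about O of F = F1 î + F2 ĵ acting at P : (a, b, 0).

    This is the coefficient F2 a − F1 b from Example 11.7.
    """
    return F2 * a - F1 * b

# Example 11.6: F = î − ĵ acting at P : (1, 2, 0)
print(moment_2d(1, 2, 1, -1))         # -3 (negative, so clockwise)
print(cross((1, 2, 0), (1, -1, 0)))   # (0, 0, -3)
```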

The scalar triple product
v·M = v·(R × F) (11.15)
represents the component of M in the direction of a unit vector v. Its physical significance concerns torque or moment about an axis (rather than about a point), as follows.
Figure 11.10 shows an axis of rotation AA′ (in three dimensions) passing through
a point Q and parallel to a unit vector v. A force F acts at P. Q′ is any other point
on AA′, and P′ is any point on the line of action of F.
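Put
$\overrightarrow{QP}$ = R and $\overrightarrow{Q'P'}$ = R′.
Then, since R′ = $\overrightarrow{Q'Q}$ + R + $\overrightarrow{PP'}$,
v·(R′ × F) = v·[($\overrightarrow{Q'Q}$ + R + $\overrightarrow{PP'}$) × F].
But Q′Q is parallel to v, and PP′ is parallel to F, so by (11.10d) these make no contribution to the triple scalar product, and we obtain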
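Fig. 11.10 Moment of F about an axis parallel to v: v·(R′ × F) = v·(R × F). [Axis of rotation AA′ through Q and Q′, parallel to the unit vector v; F acts at P, P′ is on its line of action, R = $\overrightarrow{QP}$ and R′ = $\overrightarrow{Q'P'}$.]
Fig. 11.11 [The same configuration with Q′ and P′ chosen so that R′ is perpendicular to both the axis and the line of action of F; axes x, y, z drawn at Q′.]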

v ·(R′ × F) = v ·(R × F).


Thus any point on the axis and any point on the line of action of F may be put into
the triple scalar product without affecting its value.
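The independence of v·(R × F) from the choice of points can be seen numerically. The sketch below borrows the force and points of Example 11.5 and assumes, purely for illustration, an axis through Q parallel to Ox (so v = î). It then slides Q a distance 3 along the axis and moves P along the line of action of F by 2F.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(u, w):
    """Vector from the point with position vector w to the point with position vector u."""
    return tuple(x - y for x, y in zip(u, w))

def axial_moment(v, Q, P, F):
    """v·(R × F) with R = QP; v is a unit vector along an axis through Q."""
    return dot(v, cross(sub(P, Q), F))

v = (1, 0, 0)                                 # hypothetical axis direction
Q, P, F = (2, 1, 1), (1, 2, 1), (1, -1, 2)   # data borrowed from Example 11.5

# Slide Q along the axis and P along the line of action of F
Q2 = tuple(q + 3 * vi for q, vi in zip(Q, v))
P2 = tuple(p + 2 * f for p, f in zip(P, F))

print(axial_moment(v, Q, P, F), axial_moment(v, Q2, P2, F))   # 2 2
# The vector moments about the two points differ; only their v components agree
print(cross(sub(P, Q), F), cross(sub(P2, Q2), F))             # (2, 2, 0) (2, 8, 3)
```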
The freedom given by this result allows us to choose Q′ and P′ such that $\overrightarrow{Q'P'}$ = R′ is perpendicular both to v and to the line of action of F, as in Fig. 11.11. Put
|R′| = |$\overrightarrow{Q'P'}$| = d;
this is the distance between the two skew lines.
Next, construct a set of coordinate axes at origin Q′. Let Q′x be in the direction of R′, Q′z in the direction of v, and Q′y perpendicular to Q′x and Q′z in the direction necessary for the axes to be right-handed. The unit vectors are î, ĵ, k̂.
Express F in terms of its components in directions î, ĵ, k̂. The î component is zero since F is perpendicular to Q′x, so
F = F2ĵ + F3k̂.
Put v·M = M, say. Then
M = v·(R′ × F) = F·(v × R′) (by (11.10b)).
Also, since v = k̂ and R′ = dî,
v × R′ = (|v||R′| sin 90°)ĵ = |R′|ĵ = dĵ.
Therefore
M = F·dĵ = (F2ĵ + F3k̂)·(dĵ) = F2d. (11.16)

The expression (11.16) corresponds to what we should expect about the turning effect of F about the given axis. There is no contribution from F3 because F3k̂ is parallel to the axis of rotation, and F1 is zero in these axes. What remains is F2ĵ, which is perpendicular to the axis of rotation Q′z, and d is the perpendicular distance of F from it.
For this reason the scalar quantity M = v·(R × F) is called the moment of F about an axis of rotation AA′, as in Fig. 11.10. Dropping the dashed quantities, the unit vector v is the direction AA′, and R = $\overrightarrow{QP}$, where Q is any point on AA′ and P any point on the line of action of F (M being independent of the choice of these points, as shown above).

Self-test 11.3
Let ω be a constant vector, ω = ωî, and r = xî + yĵ + zk̂ the position vector of a point. Show that r × ω = (−k̂y + ĵz)ω, whose directions (projected on to the y, z plane) are tangential to a family of circles, radii √(y² + z²), traversed in the anticlockwise direction.

11.5 Vector triple product


The vector
w = a × (b × c) (11.17)
is called a vector triple product. The vector w is perpendicular to b × c, but b × c is perpendicular to b and c, so b, c, and w are parallel to the same plane. Therefore (see (9.12)), it must be possible to express w in the simple form w = λb + µc. The required relation is:

Vector triple product


a × (b × c) = (a ·c)b − (a·b)c. (11.18)

To prove (11.18), translate a, b, c to a common point Q, and set up axes Qx, Qy, Qz as in Fig. 11.12, such that Qz is in the direction of a. Then
a = a3k̂ (say),
so a·b = a3b3 and a·c = a3c3. (11.19)

Fig. 11.12 [Axes Qx, Qy, Qz set up at Q, with Qz in the direction of a.]

Remember that k̂ × î = ĵ, k̂ × ĵ = −î, and k̂ × k̂ = 0. In these axes,
w = a × (b × c)
= a3k̂ × [(b2c3 − b3c2)î − (b1c3 − b3c1)ĵ + (b1c2 − b2c1)k̂]
= a3(b2c3 − b3c2)ĵ + a3(b1c3 − b3c1)î
= a3c3(b1î + b2ĵ) − a3b3(c1î + c2ĵ).
The third components of b and c (b3k̂ and c3k̂) are missing in the brackets: to make them appear, add to the right-hand side the term
a3b3c3k̂ − a3b3c3k̂ (which = 0).
After bringing them into the brackets we get
w = a3c3(b1î + b2ĵ + b3k̂) − a3b3(c1î + c2ĵ + c3k̂) = (a·c)b − (a·b)c.

Example 11.8 Find a × (b × c) when a = î + ĵ, b = 2î − ĵ, and c = î + ĵ + k̂.
a·b = (î + ĵ)·(2î − ĵ) = 2 − 1 = 1;
a·c = (î + ĵ)·(î + ĵ + k̂) = 1 + 1 = 2.
Therefore
a × (b × c) = (a·c)b − (a·b)c = 2b − c = 2(2î − ĵ) − (î + ĵ + k̂) = 3î − 3ĵ − k̂.
(The product could also be worked out directly.)
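Identity (11.18) and Example 11.8 can be checked by computing both sides. The sketch below is our own; `triple_vec_formula` is a hypothetical helper name for the right-hand side of (11.18).

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(s, v):
    return tuple(s * x for x in v)

def minus(u, w):
    return tuple(x - y for x, y in zip(u, w))

def triple_vec_formula(a, b, c):
    """(a·c)b − (a·b)c, the right-hand side of (11.18)."""
    return minus(scale(dot(a, c), b), scale(dot(a, b), c))

a, b, c = (1, 1, 0), (2, -1, 0), (1, 1, 1)   # Example 11.8
print(cross(a, cross(b, c)))         # (3, -3, -1), computed directly
print(triple_vec_formula(a, b, c))   # (3, -3, -1), from (11.18)
```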

Self-test 11.4
Express (a) a × (b × c) and (b) c × (b × a) in the form (11.18), for a = î + ĵ + k̂, b = î + 2ĵ + k̂, c = 3î + ĵ + k̂.

Problems

11.1 In component form let a = (1, −2, 2), b = (3, −1, −1), and c = (−1, 0, −1). Evaluate the following:
(a) a × b (b) b × a
(c) a × a (d) a·(b × c)
(e) c·(a × b) (f) b·(a × c)
(g) (a × b)·b (h) a × (a × b)
(i) (c × b) × a.

11.2 Given two planes, r·n1 = d1, r·n2 = d2, show that the plane through the origin perpendicular to their line of intersection is given by r·(n1 × n2) = 0.

11.3 (a) Use the vector product to obtain a vector parametric equation for the straight line which is perpendicular to the vectors b and c, and passes through the point with position vector a. (b) Obtain the equation to the line when a = î + 2ĵ + k̂, b = î − ĵ, and c = ĵ + k̂.

11.4 Show that the vector a × u, where a = (a1, a2, a3) and u is any vector, is parallel to the plane a1x + a2y + a3z = d. Obtain two vectors parallel to the plane 2x − 3y − z = 1.

11.5 Under what conditions will a × b = 0?

11.6 Show that the vectors a = 2î + 3ĵ + 6k̂ and b = 6î + 2ĵ − 3k̂ are perpendicular. Find a vector c which is perpendicular to a and b and such that [a, b, c] is a right-handed set.

11.7 (a) The vertices of a triangle are A, B, C, with position vectors a, b, c. Show that the area of the triangle ABC is given by ½|b × c + c × a + a × b|. (Hint: see Example 11.2.) (b) A second triangle has vertices at a + λ(b − c), b, c, where λ is a scalar. Show that the areas of the two triangles are the same. What simple geometrical result does the equality exhibit? (c) Find the area of the triangle whose vertices are at î − 2ĵ − k̂, î − ĵ + 2k̂, î + 2ĵ − k̂.

11.8 A, B, C are three points which do not lie on a straight line, and D is another point. Put $\overrightarrow{AB}$ = b, $\overrightarrow{AC}$ = c, and $\overrightarrow{AD}$ = d. Show that the distance of D from the plane passing through A, B, C is equal to
|d·(b × c)|/|b × c|.

11.9 Show that, if QA, QB, QC are adjacent edges of a rectangular parallelepiped with coordinates
Q : (x0, y0, z0), A : (x1, y1, z1), B : (x2, y2, z2), C : (x3, y3, z3),
then its volume is given by the modulus of the determinant
$$\begin{vmatrix} x_1 - x_0 & x_2 - x_0 & x_3 - x_0\\ y_1 - y_0 & y_2 - y_0 & y_3 - y_0\\ z_1 - z_0 & z_2 - z_0 & z_3 - z_0 \end{vmatrix}.$$

11.10 (Oblique coordinates). (a) Let a, b, c be three non-coplanar vectors, and v be any vector. Show that v can be expressed as
v = Xa + Yb + Zc,
where X, Y, Z are constants given by
X = v·(b × c)/D, Y = v·(c × a)/D, Z = v·(a × b)/D,
where
D = a·(b × c).
(Hint: start by forming, say, v·(a × b). Equation (11.10d) gets rid of two terms.) (b) Check the formulae for the case a = (1, 1, 0), b = (0, 1, 1), c = (1, 0, 1), v = (1, 1, 1), by solving the three equations obtained by splitting the vector equation into components.

11.11 (Cramer's rule). In Problem 11.10, write the vector equation v = Xa + Yb + Zc in the form of three simultaneous equations involving the components of a, b, c. Now write the formulae for X, Y, Z in determinant form. This is known as Cramer's rule (see Section 12.1), for solving any three simultaneous equations provided D ≠ 0.

11.12 (a) Show that if three vectors a, b, c are non-coplanar and v is any vector, then constants X, Y, Z can be found such that v = Xb × c + Yc × a + Za × b. (Hint: start by forming a·v from this expression.) (b) Find X, Y, Z if v = 2î + ĵ − 2k̂, a = î − ĵ, b = î + 2ĵ, c = ĵ − 2k̂.

11.13 The equations r = a + λu and r = b + µv, where λ and µ are parameters, represent two skew lines L1 and L2 (straight lines which do not intersect). (a) Write down a vector w which is perpendicular to both L1 and L2. (b) Show that values of λ, µ, and ν can be found so that
(a + λu) + νw = (b + µv),
and explain why this implies that there actually exists a straight line L3 which joins L1 and L2 and is perpendicular to both. (c) For the case when a = −î, u = k̂, b = î − ĵ, v = î + ĵ + k̂, find the values of λ, µ, ν. Deduce the points where L3 meets L1 and L2. Find an equation for L3, and the perpendicular distance between L1 and L2.

11.14 Find the vector moments M of the given forces F acting at the points P as specified. Make sketches, indicating the direction of M.
(a) F = (2, 0, 0) at P : (0, 3, 0). Find M about the origin O.
(b) F = (2, 0, 0) at P : (0, 3, 0). Find M about Q : (0, 0, 3).
(c) F = (2, 0, 0) at P : (0, −3, 0). Find M about Q : (0, 0, 3).

11.15 A force F of magnitude 4 acts at the point (1, −1, 2) in the direction of î − 2ĵ − 2k̂. Find the vector moment M of F, (a) about the origin, (b) about the point (−2, 1, 2). (c) Find its component about the y axis, taken in the direction of ĵ (i.e. v = ĵ, in the text).

11.16 Find the moment M about the axes specified, where the force is F = (2, 0, 0) acting at P : (0, 3, 0). (Note that the sense of the axis needs to be specified. If the sense is reversed, then the sign of v·(R × F) changes.) (a) The z axis, taken in the positive direction. (b) The z axis, taken in the negative direction. (c) The x axis, in the positive direction. (d) The y axis, in the positive direction. (e) The axis through the origin, direction v = (1/√3, 1/√3, 1/√3).

11.17 Find the magnitude of the moment M of the force F = (1, 1, 2) acting at P : (2, −3, 1), about
the axis $\overrightarrow{AB}$, where A : (2, 3, 2) and B : (1, 1, 1). Verify directly that the component of F in the direction of AB makes no contribution. Show that the component of F along any line joining P to the axis AB makes no contribution.

11.18 A fixed force F acts at a fixed point P with position vector r. An axis passes through the origin, but it can be adjusted so as to take any direction v. Show that the magnitude |M| of the moment M about the axis is a maximum when v is perpendicular to the plane containing r and F. (Hint: remember a·b = |a||b| cos θ in the usual notation.) Under what conditions is |M| a minimum, and what is its value?

11.19 A rigid lamina in the (x, y) plane rotates at ω radians per second about the z axis in axes Oxyz, in the manner of a wheel on an axle. (a) Show that if r and θ are the polar coordinates of any point P, then the velocity of P is given by v = −îωr sin θ + ĵωr cos θ. (b) Show that v may be written v = ω × r, where ω = ωk̂ (ω is called the angular velocity vector in two dimensions). (c) Choose any point Q which travels round with the lamina, and let QXYZ be another set of axes which remain parallel to Oxyz. Show that, viewed relative to QXYZ, any point P has velocity V given by V = ω × R, where R is its position vector in QXYZ.

11.20 A point of a rigid body is fixed at the origin of coordinates O. It rotates about O with angular velocity ω (i.e. at any instant the body is rotating at a rate |ω| rad s⁻¹ about the line in the instantaneous direction of the vector ω). Explain why every point of the body is moving perpendicularly to ω. Show that the velocity v of any point is given by v = ω × r. (Hint: compare Problem 11.19.) Find the matrix S such that v = Sω. Show that |v|² = ωᵀSᵀSω, and that
$$S^{\mathsf{T}}S = \begin{bmatrix} y^2 + z^2 & -xy & -zx\\ -xy & x^2 + z^2 & -yz\\ -zx & -yz & x^2 + y^2 \end{bmatrix}.$$

11.21 Supposing that a and b emerge from the same point, show geometrically that a × (a × b) and b × (a × b) are in the plane of a and b.

11.22 If v = (a × b) × (c × d), then v can be written in either of the forms v = pc + qd or v = ma + nb. Justify this expectation geometrically, then obtain the constants by using eqn (11.18).

11.23 Prove that
a × (b × c) + b × (c × a) + c × (a × b) = 0.

11.24 (a) Find a vector which is perpendicular to n and in the plane of n and b, where n and b are any two vectors. (b) Show that the straight line r = b + µn × [(a − b) × n], where µ is a parameter, passes through the point with position vector b, and meets the straight line given parametrically by r = a + λn in a right angle.

11.25 You are given two planes, r·n1 = d1, r·n2 = d2. Show that the point on their line of intersection that is closest to the origin has the position vector
α(n1 × n2) × n1 + β(n1 × n2) × n2
where α and β are certain constants. Obtain a formula for the constants.

11.26 A particle P of mass m and position vector r(t) moves with velocity v(t) under the action of a single force F. A point Q at q(t) has velocity u(t). The moment of momentum (or angular momentum) H(t) of P about Q is defined by
H = (r − q) × (mv).
Show that dH/dt = [(r − q) × F] − mu × v. Deduce that if u(t) = 0, then dH/dt = M, where M is the moment of F about the point Q.
12 Linear algebraic equations

CONTENTS

12.1 Cramer's rule
12.2 Elementary row operations
12.3 The inverse matrix by Gaussian elimination
12.4 Compatible and incompatible sets of equations
12.5 Homogeneous sets of equations
12.6 Gauss–Seidel iterative method of solution
Problems

You are likely to have encountered a method for solving two simultaneous linear
equations for two unknowns, x and y, as in the case of the equations
2x + 3y = −1, x − 2y = 3.
To solve these equations, use the method of elimination. To eliminate y, multiply
the first equation by 2 and the second by 3, and add the results. The y terms
cancel, and we are left with 7x = 7, so x = 1. By substituting this value back in, say,
the first equation, we have 2 + 3y = −1, so y = −1. You might have been led to expect
that a similar process will always lead to a single, definite or unique solution for
x and y, no matter what pair of equations is presented. But in fact a surprising
variety of eccentricities can occur.
Suppose the equations are
x + y = 2, 2x + 2y = 1.
These two equations are contradictory, or incompatible – there is no solution –
we cannot reconcile the two statements, x + y = 2 and 2(x + y) = 1. But if the two
equations happen to be equivalent, meaning effectively identical, say x + y = 2 and
2x + 2y = 4, they reduce to the single equation x + y = 2 for the two unknowns,
x and y. Therefore there is an infinity of solutions, in this case x = c, y = 2 − c for
every value of c.
Now suppose the numbers on the right-hand sides of a pair of equations are
both zero, 2x + 3y = 0, x − 2y = 0 say (the pair is then said to be homogeneous).
Such a pair is always compatible, but there are still two possibilities. If the
equations are equivalent, as in the pair x + y = 0, 2x + 2y = 0, there is really only
one equation for the two unknowns, and they have an infinity of solutions (in this case given by x = c, y = −c for every value of c). If they are not equivalent, for example in the pair 2x + 3y = 0, x − 2y = 0, they are not incompatible because they do have a single, unique solution, x = 0, y = 0 (sometimes oddly called the trivial solution).
In practice, simultaneous linear equations in three, four, or many unknowns are commonplace. The elimination method for obtaining solutions (see, for example, Example 12.1) soon becomes too arduous to carry out by hand, and the variety of possible eccentricities is also greater. A particular property, say compatibility, cannot be checked at a glance, and when there are more than three variables there is no simple geometrical argument to give guidance. The matrix techniques described in this chapter are essential for understanding larger systems.

12.1 Cramer’s rule



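Consider the general system with two equations and two unknowns:
a11x1 + a12x2 = d1,
a21x1 + a22x2 = d2.
Elimination leads to the solution
$$x_1 = \frac{d_1a_{22} - d_2a_{12}}{a_{11}a_{22} - a_{12}a_{21}}, \qquad x_2 = \frac{d_2a_{11} - d_1a_{21}}{a_{11}a_{22} - a_{12}a_{21}},$$
provided that the denominator a11a22 − a12a21 is not zero. From Section 8.1 in the chapter on determinants, these ratios can be recognized as ratios of determinants:
$$x_1 = \frac{\begin{vmatrix} d_1 & a_{12}\\ d_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_{11} & d_1\\ a_{21} & d_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix}}.$$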
This formula is known as Cramer’s rule. As we shall see later, if the denominator
is zero then the two equations can have no solutions or an infinity of solutions.
(These possibilities occur when the vectors [a11, a12] and [a21, a22] are linearly
dependent – see Section 8.2.)
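The two-equation formulas can be checked with a short Python sketch (our own illustration, not from the text). It uses exact rational arithmetic from the standard `fractions` module; the function name `cramer_2x2` is hypothetical, and returning `None` for a zero denominator is a design choice that flags the "no solutions or infinitely many" case without distinguishing the two.

```python
from fractions import Fraction

def cramer_2x2(a11, a12, a21, a22, d1, d2):
    """Solve a11*x1 + a12*x2 = d1, a21*x1 + a22*x2 = d2 by Cramer's rule.

    Returns (x1, x2) as exact Fractions, or None when the denominator
    a11*a22 - a12*a21 is zero (no solutions or an infinity of solutions).
    """
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None
    x1 = Fraction(d1 * a22 - d2 * a12, det)   # first column replaced by d
    x2 = Fraction(d2 * a11 - d1 * a21, det)   # second column replaced by d
    return x1, x2

print(cramer_2x2(1, 1, 1, -1, 3, 1))   # x + y = 3, x - y = 1 -> (Fraction(2, 1), Fraction(1, 1))
print(cramer_2x2(1, 1, 2, 2, 2, 4))    # x + y = 2, 2x + 2y = 4 -> None
```

The second call is the pair of equivalent equations from the start of the chapter, for which the denominator vanishes.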
Elimination can be applied to equations with more unknowns as the following
example illustrates.

Example 12.1 Solve the equations


x1 − 2x2 + x3 = − 4, (i)

2x1 + x2 − x3 = −1, (ii)

x1 + 3x2 + 2x3 = 7. (iii)

Eliminate x1 between (i) and (ii), and between (ii) and (iii). Thus 2(i) − (ii) gives
−5x2 + 3x3 = −7, (iv)
whilst (ii) − 2(iii) gives
−5x2 − 5x3 = −15. (v)

Now eliminate x2 between (iv) and (v) by subtraction:
8x3 = 8, or x3 = 1.
Rather than eliminate again to find the other unknowns, we can substitute back x3 = 1
in (iv), say, so that
−5x2 + 3 = −7, or x2 = 2.
Finally substitute x2 = 2 and x3 = 1 into (i):
x1 − 4 + 1 = −4, or x1 = −1.
Hence the full solution is x1 = −1, x2 = 2, x3 = 1.

For more unknowns and equations, this approach becomes increasingly laborious
and prone to errors.
A matrix equation
Ax = d (12.1)

defines a set of linear equations (we referred to them previously in connection
with the inverse matrix in Section 7.4). In general, A will be an m × n matrix,
while x and d are column vectors with n and m elements respectively. Usually, but not always, we
are interested in the case where the number of unknowns in the equations equals
the number of equations. In other words, there is neither a surplus of unknowns
nor of restrictive equations. In this case we have an n × n or square matrix, and
this is the normal situation in applications. For example, the set of equations
defined by

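$$A = \begin{bmatrix} 1 & 2 & 1 \\ -2 & 3 & -1 \\ 1 & 4 & -2 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad d = \begin{bmatrix} 1 \\ -7 \\ -7 \end{bmatrix}$$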

is
x1 + 2x2 + x3 = 1, (12.2a)

−2x1 + 3x2 − x3 = −7, (12.2b)

x1 + 4x2 − 2x3 = −7. (12.2c)

Consider now the case in which A is an arbitrary square matrix. If the inverse
of A exists, then multiplication of (12.1) on the left by A−1 leads to the solution
vector
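$$x = A^{-1}d = \frac{\operatorname{adj}A}{\det A}\,d,$$
using the formula for the inverse given in Equation (8.5c): adj A is the adjoint of
A. Let n = 3; then, for our standard matrix,
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$
we have
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \frac{\operatorname{adj}A}{\det A}\,d = \frac{\operatorname{adj}A}{\det A}\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \frac{1}{\det A}\begin{bmatrix} C_{11}d_1 + C_{21}d_2 + C_{31}d_3 \\ C_{12}d_1 + C_{22}d_2 + C_{32}d_3 \\ C_{13}d_1 + C_{23}d_2 + C_{33}d_3 \end{bmatrix},$$
where C11, C12, … are the cofactors of a11, a12, … (see Section 8.1). Thus, com-
parison of elements in the vectors leads to
$$x_1 = \frac{1}{\det A}(C_{11}d_1 + C_{21}d_2 + C_{31}d_3) = \frac{1}{\det A}\begin{vmatrix} d_1 & a_{12} & a_{13} \\ d_2 & a_{22} & a_{23} \\ d_3 & a_{32} & a_{33} \end{vmatrix},$$
$$x_2 = \frac{1}{\det A}\begin{vmatrix} a_{11} & d_1 & a_{13} \\ a_{21} & d_2 & a_{23} \\ a_{31} & d_3 & a_{33} \end{vmatrix}, \qquad x_3 = \frac{1}{\det A}\begin{vmatrix} a_{11} & a_{12} & d_1 \\ a_{21} & a_{22} & d_2 \\ a_{31} & a_{32} & d_3 \end{vmatrix}.$$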

This is Cramer’s rule for 3 equations in 3 unknowns. It is systematic in that, for
x1, the determinant in the numerator has the first column of A replaced by d; for
x2, the second column is replaced by d; and so on. The generalization to n linear
equations in n unknowns is fairly clear from this formula. It is a useful theoretical
result, but it is not generally a recommended method for solving more than four
equations in four unknowns, because evaluating high-order determinants is laborious.
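For n equations, a minimal NumPy sketch of the same column-replacement rule follows. It assumes NumPy is installed, and `cramer` is an illustrative name. Determinants are computed in floating point, so the answers are approximate, and the cost of evaluating n + 1 determinants is one reason the method is not recommended for larger systems.

```python
import numpy as np

def cramer(A, d):
    """Solve the square system A x = d by Cramer's rule.

    x_i = det(A_i) / det(A), where A_i is A with column i replaced by d.
    Floating-point determinants are used, so results carry rounding error.
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det A is (numerically) zero; Cramer's rule fails")
    x = np.empty(len(d))
    for i in range(len(d)):
        A_i = A.copy()
        A_i[:, i] = d                      # replace column i by d
        x[i] = np.linalg.det(A_i) / det_A
    return x

# System (12.2)
print(cramer([[1, 2, 1], [-2, 3, -1], [1, 4, -2]], [1, -7, -7]))   # close to [1, -1, 2]
```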

12.2 Elementary row operations


Short of using computer software, the simplest method of solving equations
involves systematic elimination. Consider the three equations
x1 + 2x2 + x3 = 1, (12.3a)

−2x1 + 3x2 − x3 = −7, (12.3b)

x1 + 4x2 − 2x3 = −7. (12.3c)

We can perform three elementary row operations on linear equations which do
not affect the solution. They are

Elementary row operations for simultaneous equations


(i) any equation can be multiplied by a nonzero constant,
(ii) any two equations can be interchanged,
(iii) any equation can be replaced by the sum of itself and any multiple of
another equation.
These operations do not alter the solutions. (12.4)
Step 1. Eliminate x1 from (12.3b,c) by adding suitable multiples of (12.3a) to (12.3b)
and (12.3c):
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1, \\ 7x_2 + x_3 &= -5 \qquad (r_2' = r_2 + 2r_1), \\ 2x_2 - 3x_3 &= -8 \qquad (r_3' = r_3 - r_1). \end{aligned}$$
The required operations between the equations are listed on the right: r1′, r2′, r3′ are
the new equations obtained from the existing equations r1, r2, r3 by the rule shown
at each step.

Step 2. We now proceed to eliminate x2 from (12.3c) using a multiple 2/7 of the
new row 2. Hence
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1, \\ 7x_2 + x_3 &= -5, \\ -\tfrac{23}{7}x_3 &= -\tfrac{46}{7} \qquad (r_3' = r_3 - \tfrac{2}{7}r_2). \end{aligned}$$

Step 3. Using Rule (i) above, reduce the coefficients of x2 and x3 in the second
and third equations above to 1:
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1, \\ x_2 + \tfrac{1}{7}x_3 &= -\tfrac{5}{7} \qquad (r_2' = \tfrac{1}{7}r_2), \\ x_3 &= 2 \qquad (r_3' = -\tfrac{7}{23}r_3). \end{aligned}$$

Step 4. Starting from the third equation, we can now solve the equations by
back substitution. Since x3 = 2, the second equation gives
$$x_2 = -\tfrac{5}{7} - \tfrac{1}{7}x_3 = -\tfrac{5}{7} - \tfrac{1}{7}\times 2 = -1,$$
and from the first equation,
$$x_1 = 1 - 2x_2 - x_3 = 1 + 2 - 2 = 1.$$
Thus the solution is
x1 = 1, x2 = −1, x3 = 2.
The method is known as Gaussian elimination.
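The four steps can be sketched in Python with exact fractions (an illustration under stated assumptions, not the book's own code). `gauss_solve` is a hypothetical name; it assumes every pivot it meets is nonzero, as happens for (12.3), and it divides by the pivot during back substitution instead of rescaling rows as in Step 3, which gives the same result.

```python
from fractions import Fraction

def gauss_solve(aug):
    """Gaussian elimination with back substitution on an n x (n+1) augmented matrix.

    Assumes every pivot met is nonzero; no row interchanges are attempted
    (a zero pivot would raise ZeroDivisionError).
    """
    m = [[Fraction(v) for v in row] for row in aug]
    n = len(m)
    # Forward elimination: clear the entries below each pivot m[k][k].
    for k in range(n):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            m[i] = [a - factor * b for a, b in zip(m[i], m[k])]
    # Back substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

aug = [[1, 2, 1, 1], [-2, 3, -1, -7], [1, 4, -2, -7]]   # system (12.3)
print(gauss_solve(aug))   # [Fraction(1, 1), Fraction(-1, 1), Fraction(2, 1)]
```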
In fact, we need not write down the equations for x1, x2, x3 at each stage, since
all the information in (12.3) is given by the 3 × 4 matrix

$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ -2 & 3 & -1 & -7 \\ 1 & 4 & -2 & -7 \end{bmatrix},$$
which is known as the augmented matrix for the system of equations: the fourth
column consists of the constants on the right-hand sides of (12.3a,b,c). The elementary
operations referred to previously correspond to elementary row operations
on the matrix. We can reproduce the steps above by the following more compact
procedure:
$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ -2 & 3 & -1 & -7 \\ 1 & 4 & -2 & -7 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 7 & 1 & -5 \\ 0 & 2 & -3 & -8 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 + 2r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 7 & 1 & -5 \\ 0 & 0 & -\frac{23}{7} & -\frac{46}{7} \end{bmatrix} \quad (r_3' = r_3 - \tfrac{2}{7}r_2)$$
$$\to \begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & \frac{1}{7} & -\frac{5}{7} \\ 0 & 0 & 1 & 2 \end{bmatrix} \quad \begin{pmatrix} r_2' = \frac{1}{7}r_2 \\ r_3' = -\frac{7}{23}r_3 \end{pmatrix},$$

where the arrow ‘→’ means ‘is transformed into’. The final matrix is said to be in
echelon form: that is, it has zeros below the diagonal elements starting from the
top left. We can now solve the equations by back substitution as before.
The diagonal elements used in turn to clear the entries in the column below them
(in the reduction above, the entries 1, 7, and −23/7) are known as pivots, and they
must be nonzero. If any pivot turns out to be zero as the method progresses, then
that equation or row is interchanged with the first row below it which has a nonzero
coefficient in that column. If there are no further nonzero coefficients, then the pivot
moves across to the next column.
It is now possible to complete the Gaussian elimination by using further row
operations on the echelon matrix rather than back substitution. Thus, continuing
from the echelon form above

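$$\begin{bmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & \frac{1}{7} & -\frac{5}{7} \\ 0 & 0 & 1 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \quad \begin{pmatrix} r_1' = r_1 - r_3 \\ r_2' = r_2 - \frac{1}{7}r_3 \end{pmatrix}$$
$$\to \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \quad (r_1' = r_1 - 2r_2),$$
where the pivots (now all equal to 1) are used to clear the entries above them. The
final matrix now represents the solution set x1 = 1, x2 = −1, x3 = 2.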
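Completing the reduction by row operations (often called Gauss–Jordan elimination) can be sketched as follows. This is our own illustration: `gauss_jordan` is a hypothetical name. It scales each pivot row to 1 as soon as the pivot is chosen rather than at the end, and it applies the zero-pivot rule described above by swapping in the first lower row with a nonzero entry.

```python
from fractions import Fraction

def gauss_jordan(aug):
    """Reduce an augmented matrix using the elementary row operations (12.4).

    A zero pivot is handled by interchanging with the first row below that has
    a nonzero entry in that column; if none exists, the pivot moves across to
    the next column. Each pivot is scaled to 1 and clears its whole column.
    """
    m = [[Fraction(v) for v in row] for row in aug]
    rows, cols = len(m), len(m[0]) - 1      # last column holds the constants
    r = 0                                   # current pivot row
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]             # rule (ii): interchange rows
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]    # rule (i): make the pivot 1
        for i in range(rows):
            if i != r and m[i][c] != 0:     # rule (iii): clear column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return m

reduced = gauss_jordan([[1, 2, 1, 1], [-2, 3, -1, -7], [1, 4, -2, -7]])
print([row[-1] for row in reduced])   # [Fraction(1, 1), Fraction(-1, 1), Fraction(2, 1)]
```

Applied to the incompatible set x + y − z = 3, 3x − y + 3z = 5, x − y + 2z = 2 of Section 12.4, the same function leaves the final row [0, 0, 0, 1], that is, the impossible equation 0 = 1.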

Example 12.2 Using Gaussian elimination and back substitution, solve the set
of equations
x1 + x2 + 2x3 = 4,
2x1 + 2x2 + x3 − x4 = −1,
x2 + x3 + x4 = 6,
x2 − x3 + 2x4 = 5.
We first perform the pivotal row operations on the augmented matrix as follows:
$$\begin{bmatrix} 1&1&2&0&4\\ 2&2&1&-1&-1\\ 0&1&1&1&6\\ 0&1&-1&2&5 \end{bmatrix} \to \begin{bmatrix} 1&1&2&0&4\\ 0&0&-3&-1&-9\\ 0&1&1&1&6\\ 0&1&-1&2&5 \end{bmatrix} \quad (r_2' = r_2 - 2r_1)$$
$$\to \begin{bmatrix} 1&1&2&0&4\\ 0&1&1&1&6\\ 0&0&-3&-1&-9\\ 0&1&-1&2&5 \end{bmatrix} \quad (r_2 \leftrightarrow r_3)$$
$$\to \begin{bmatrix} 1&1&2&0&4\\ 0&1&1&1&6\\ 0&0&-3&-1&-9\\ 0&0&-2&1&-1 \end{bmatrix} \quad (r_4' = r_4 - r_2)$$
$$\to \begin{bmatrix} 1&1&2&0&4\\ 0&1&1&1&6\\ 0&0&-3&-1&-9\\ 0&0&0&\frac{5}{3}&5 \end{bmatrix} \quad (r_4' = r_4 - \tfrac{2}{3}r_3)$$
$$\to \begin{bmatrix} 1&1&2&0&4\\ 0&1&1&1&6\\ 0&0&1&\frac{1}{3}&3\\ 0&0&0&1&3 \end{bmatrix} \quad \begin{pmatrix} r_3' = -\frac{1}{3}r_3 \\ r_4' = \frac{3}{5}r_4 \end{pmatrix}.$$
(Note the row change r2 ↔ r3 at step 2 because of the zero pivot.) Back substitution now gives
$$x_4 = 3, \quad x_3 = 3 - \tfrac{1}{3}x_4 = 2, \quad x_2 = 6 - x_3 - x_4 = 1, \quad x_1 = 4 - x_2 - 2x_3 = -1.$$

Self-test 12.1
Using elementary row operations, solve the equations
x1 + 2x2 − x3 = 1
2x1 − x2 + 3x3 = −3
−x1 + x2 − x3 = −1.

12.3 The inverse matrix by Gaussian elimination


Matrix multiplication is a row-on-column operation (see Section 7.2), so any ele-
mentary row operation applied simultaneously to both A and I in the identity
AA−1 = I maintains the equality. To obtain A−1 from A, apply to both sides of the
identity a sequence of row operations that transform A into I, so that on the
left AA−1 becomes A−1. The parallel transformation of I gives A−1 explicitly. In
other words:

Matrix inversion
Use elementary row operations to transform A into the identity I, and use the
same operations to transform I into A−1. (12.5)
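Rule (12.5) translates into a short routine that row-reduces the combined array [A | I]: when the left half has become I, the right half is A⁻¹. The sketch below is illustrative (the name `inverse` is ours). It uses exact fractions and, like the worked example that follows, interchanges rows whenever a zero pivot appears.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing [A | I], as in rule (12.5).

    Rows are interchanged when a zero pivot appears; ValueError is raised if a
    column has no nonzero entry on or below the diagonal (A is singular).
    """
    n = len(A)
    # Build the n x 2n array [A | I] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]                  # interchange rows if needed
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]         # scale the pivot to 1
        for i in range(n):
            if i != c and m[i][c] != 0:          # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]                # right-hand half is A^-1

A = [[0, 1, 0, 2], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 2, 0]]
for row in inverse(A):
    print([int(v) for v in row])
# [0, 2, 0, -1]
# [-1, 0, 2, 0]
# [0, -1, 0, 1]
# [1, 0, -1, 0]
```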
Suppose that we require the inverse of
$$A = \begin{bmatrix} 0&1&0&2\\ 1&0&1&0\\ 0&1&0&1\\ 1&0&2&0 \end{bmatrix}.$$
We reduce A to I4 and perform the same row operations on I4. Thus, we can write
down the steps in parallel as follows (the operation used at each step is shown between the two matrices):
$$A = \begin{bmatrix} 0&1&0&2\\ 1&0&1&0\\ 0&1&0&1\\ 1&0&2&0 \end{bmatrix} \qquad I_4 = \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&2\\ 0&1&0&1\\ 1&0&2&0 \end{bmatrix} \quad (r_1 \leftrightarrow r_2) \quad \to \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&2\\ 0&1&0&1\\ 0&0&1&0 \end{bmatrix} \quad (r_4' = r_4 - r_1) \quad \to \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ 0&-1&0&1 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&2\\ 0&0&0&-1\\ 0&0&1&0 \end{bmatrix} \quad (r_3' = r_3 - r_2) \quad \to \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ -1&0&1&0\\ 0&-1&0&1 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&2\\ 0&0&1&0\\ 0&0&0&-1 \end{bmatrix} \quad (r_3 \leftrightarrow r_4) \quad \to \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&-1&0&1\\ -1&0&1&0 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&2\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix} \quad (r_4' = -r_4) \quad \to \begin{bmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&-1&0&1\\ 1&0&-1&0 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&1&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix} \quad (r_2' = r_2 - 2r_4) \quad \to \begin{bmatrix} 0&1&0&0\\ -1&0&2&0\\ 0&-1&0&1\\ 1&0&-1&0 \end{bmatrix}$$
$$\to \begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{bmatrix} = I_4 \quad (r_1' = r_1 - r_3) \quad \to \begin{bmatrix} 0&2&0&-1\\ -1&0&2&0\\ 0&-1&0&1\\ 1&0&-1&0 \end{bmatrix} = A^{-1}.$$
We conclude that
$$A^{-1} = \begin{bmatrix} 0&2&0&-1\\ -1&0&2&0\\ 0&-1&0&1\\ 1&0&-1&0 \end{bmatrix}.$$

Self-test 12.2
Using Gaussian elimination, find the inverse of
$$A = \begin{bmatrix} 1&2&-1&0\\ 0&-1&3&2\\ -1&1&-1&0\\ 1&-4&2&-1 \end{bmatrix}.$$

Check that AA−1 = A−1A = I4 .

12.4 Compatible and incompatible sets of equations


Not all sets of equations have solutions. For example, the simultaneous equations
x + y = 1,
x + y = 2,
clearly have no solutions. We say that the equations are incompatible. On the
other hand the equations
x + 2y = 1,
2x + 4y = 2,
have the infinity of solutions
x = 1 − 2λ, y = λ
for any real number λ. For two equations in two unknowns the occurrences of
unique solutions or otherwise are easy to detect: for higher-order sets of equa-
tions these possibilities are not so obvious.
Consider the set of equations
x + y − z = 3,
3x − y + 3z = 5,
x − y + 2z = 2.
We can sense that there might be a problem by first evaluating the determinant of
the coefficients of x, y, and z. Thus

$$\begin{vmatrix} 1&1&-1\\ 3&-1&3\\ 1&-1&2 \end{vmatrix} = \begin{vmatrix} 1&1&-1\\ 4&0&2\\ 2&0&1 \end{vmatrix} \quad \begin{pmatrix} r_2' = r_2 + r_1 \\ r_3' = r_3 + r_1 \end{pmatrix} \quad = -\begin{vmatrix} 4&2\\ 2&1 \end{vmatrix} = 0.$$
Thus Cramer’s rule will fail, although there still may be solutions. We can determine
whether solutions exist more readily by using Gaussian elimination. In this
case the application of row operations on the augmented matrix leads to
$$\begin{bmatrix} 1&1&-1&3\\ 3&-1&3&5\\ 1&-1&2&2 \end{bmatrix} \to \begin{bmatrix} 1&1&-1&3\\ 0&-4&6&-4\\ 0&-2&3&-1 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - 3r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&-1&3\\ 0&-4&6&-4\\ 0&0&0&1 \end{bmatrix} \quad (r_3' = r_3 - \tfrac{1}{2}r_2),$$
which is the echelon form for this set of equations. However, row 3 represents the
equation 0 = 1, which cannot be satisfied. Hence these equations can have no solutions.
On the other hand, consider the following set:
x + y − z = 1,
3x − y + 3z = 5,
x − y + 2z = 2
(this is the previous set with one change to the first equation). Gaussian elimination
now gives
$$\begin{bmatrix} 1&1&-1&1\\ 3&-1&3&5\\ 1&-1&2&2 \end{bmatrix} \to \begin{bmatrix} 1&1&-1&1\\ 0&-4&6&2\\ 0&-2&3&1 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - 3r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&-1&1\\ 0&-4&6&2\\ 0&0&0&0 \end{bmatrix} \quad (r_3' = r_3 - \tfrac{1}{2}r_2).$$
Row 3 is now consistent, and row 2 is −4y + 6z = 2. Hence
$$y = -\tfrac{1}{4}(2 - 6z)$$
and, from row 1,
$$x = 1 - y + z = \tfrac{3}{2} - \tfrac{1}{2}z.$$
Thus z can take any value, say λ, so the full solution set is
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} \frac{3}{2} - \frac{1}{2}\lambda \\ -\frac{1}{2} + \frac{3}{2}\lambda \\ \lambda \end{bmatrix}$$
for any value of λ. It can be seen in this case that there exists an infinite number of
solutions, a different one for each different value of λ.
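The text reads compatibility directly off the echelon form. An equivalent check that is easy to automate, and is not used in this chapter, compares the rank of the coefficient matrix with the rank of the augmented matrix (the Rouché–Capelli theorem). Below is a minimal NumPy sketch, assuming NumPy is available and using the hypothetical name `classify`. Ranks are computed numerically with a tolerance, so this is suited to modest, well-scaled examples.

```python
import numpy as np

def classify(A, d):
    """Classify A x = d by comparing ranks.

    rank A < rank [A | d]        -> no solutions (incompatible)
    rank A = rank [A | d] = n    -> exactly one solution
    rank A = rank [A | d] < n    -> an infinity of solutions
    """
    A = np.asarray(A, dtype=float)
    aug = np.column_stack([A, np.asarray(d, dtype=float)])
    r_A = np.linalg.matrix_rank(A)
    r_aug = np.linalg.matrix_rank(aug)
    n = A.shape[1]                     # number of unknowns
    if r_A < r_aug:
        return "none"
    return "unique" if r_A == n else "infinite"

A = [[1, 1, -1], [3, -1, 3], [1, -1, 2]]
print(classify(A, [3, 5, 2]))   # none
print(classify(A, [1, 5, 2]))   # infinite
```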
Geometrically, in three dimensions, it can be seen why equations can have a
unique solution, no solution, or an infinite set of solutions. Any equation such as
ax + by + cz = d
represents a plane in ℝ³. Three equations represent three planes, and we need only
visualize how they might intersect or not. The coordinates of any point of intersection
of the planes give a solution of the equations. The diagrams in Fig. 12.1
show how three planes can intersect in a single point, no point, or a line of points.

[Fig. 12.1 (a) Exactly one solution at P. (b) Infinite number of solutions, lying on the straight line
MN. (c) and (d) are examples of cases with no solutions.]

Example 12.3 Determine the complete sets of values for a and b which make
the equations
x − 2y + 3z = 2,
2x − y + 2z = 3,
x + y + az = b
have (i) a unique solution, (ii) no solutions, (iii) an infinite set of solutions.
Reduce the augmented matrix to echelon form using pivots to clear each column successively:
$$\begin{bmatrix} 1&-2&3&2\\ 2&-1&2&3\\ 1&1&a&b \end{bmatrix} \to \begin{bmatrix} 1&-2&3&2\\ 0&3&-4&-1\\ 0&3&a-3&b-2 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - 2r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&-2&3&2\\ 0&3&-4&-1\\ 0&0&a+1&b-1 \end{bmatrix} \quad (r_3' = r_3 - r_2).$$
We can now interpret the echelon matrix.
(i) If a ≠ −1, then z has the unique solution
$$z = \frac{b - 1}{a + 1}.$$
Also y and x can be found by back substitution.
(ii) If a = −1 and b ≠ 1, then row 3 will lead to an inconsistency. Hence there are no
solutions of the equations.
(iii) If a = −1 and b = 1, then row 3 implies z = λ for any number λ. Also x and y can be
found by back substitution.
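For a numerical spot check of Example 12.3, the rank comparison can be applied at particular values of a and b. The sketch assumes NumPy and uses an illustrative function name. It samples one (a, b) pair from each of the three cases, so it is a check rather than a proof.

```python
import numpy as np

def example_12_3(a, b):
    """Classify Example 12.3 at numerical a, b by comparing matrix ranks."""
    A = np.array([[1, -2, 3], [2, -1, 2], [1, 1, a]], dtype=float)
    aug = np.column_stack([A, [2.0, 3.0, float(b)]])
    r_A = np.linalg.matrix_rank(A)
    r_aug = np.linalg.matrix_rank(aug)
    if r_A < r_aug:
        return "no solutions"
    return "unique solution" if r_A == 3 else "infinite set of solutions"

for a, b in [(2, 5), (-1, 0), (-1, 1)]:
    print(a, b, example_12_3(a, b))
# 2 5 unique solution
# -1 0 no solutions
# -1 1 infinite set of solutions
```

For a = 2, b = 5 the unique solution has z = (b − 1)/(a + 1) = 4/3, which `numpy.linalg.solve` reproduces.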
Example 12.3 illustrates the advantage of Gaussian elimination over the formula-based
method of Cramer’s rule. The Gaussian method can still be used if the
number of equations differs from the number of unknowns. Consider the
following example.

Example 12.4 Investigate all solutions of


x + y − z = 1,
3x − y + 3z = 5,
x − y + 2z = 2,
x + z = 3.
There are 3 unknowns and 4 equations. The augmented matrix is
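$$\begin{bmatrix} 1&1&-1&1\\ 3&-1&3&5\\ 1&-1&2&2\\ 1&0&1&3 \end{bmatrix} \to \begin{bmatrix} 1&1&-1&1\\ 0&-4&6&2\\ 0&-2&3&1\\ 0&-1&2&2 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - 3r_1 \\ r_3' = r_3 - r_1 \\ r_4' = r_4 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&-1&1\\ 0&-4&6&2\\ 0&0&0&0\\ 0&0&\frac{1}{2}&\frac{3}{2} \end{bmatrix} \quad \begin{pmatrix} r_3' = r_3 - \frac{1}{2}r_2 \\ r_4' = r_4 - \frac{1}{4}r_2 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&-1&1\\ 0&-4&6&2\\ 0&0&\frac{1}{2}&\frac{3}{2}\\ 0&0&0&0 \end{bmatrix} \quad (r_3 \leftrightarrow r_4).$$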
Row 4 is consistent, while row 3 implies z = 3. Then y and x can be found by back
substitution in rows 2 and 1. Confirm that y = 4 and x = 0.
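The four equations of Example 12.4 can also be checked numerically. The sketch below uses NumPy's least-squares solver, a technique not discussed in this chapter. For a consistent overdetermined system it returns the exact solution (up to rounding), so agreement of Ax with d confirms both the consistency and the values found by back substitution.

```python
import numpy as np

# Example 12.4: four equations in the three unknowns x, y, z.
A = np.array([[1, 1, -1], [3, -1, 3], [1, -1, 2], [1, 0, 1]], dtype=float)
d = np.array([1, 5, 2, 3], dtype=float)

# Least squares minimises |A x - d|; for a consistent system the minimum is 0
# and x is the exact solution (up to rounding).
x, residuals, rank, _ = np.linalg.lstsq(A, d, rcond=None)
print(x, rank)                  # x is close to [0, 4, 3]; rank is 3
print(np.allclose(A @ x, d))    # True: all four equations are satisfied
```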

On the other hand, there may be more variables than equations, as in the
following example. (Inconsistency is still possible.)

Example 12.5 Show that the following equations are inconsistent:


x1 + x2 + x3 + 2x4 = 1,
x1 − 2x2 + 3x3 − x4 = 4,
3x1 − 3x2 + 7x3 = 7.
Proceed as before, and successively reduce the augmented matrix by pivots. Thus
$$\begin{bmatrix} 1&1&1&2&1\\ 1&-2&3&-1&4\\ 3&-3&7&0&7 \end{bmatrix} \to \begin{bmatrix} 1&1&1&2&1\\ 0&-3&2&-3&3\\ 0&-6&4&-6&4 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - r_1 \\ r_3' = r_3 - 3r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&1&2&1\\ 0&-3&2&-3&3\\ 0&0&0&0&-2 \end{bmatrix} \quad (r_3' = r_3 - 2r_2).$$
Row 3 represents the equation 0 = −2, which is impossible, so the equations are incompatible.


Self-test 12.3
Determine the complete set of values for a and b which make the equations
x − 2y + 3z = 2,
x − y + z = 1,
x + ay + 2z = b,
have (i) a unique solution, (ii) no solutions, (iii) an infinite set of solutions.

12.5 Homogeneous sets of equations


Any set of equations Ax = 0 is known as a homogeneous set; it is a set of linear
equations with zero right-hand sides. Clearly, the equations always have the
so-called trivial solution x = 0, but there may exist non-trivial solutions. What are
the conditions for their existence? Consider the following example.

Example 12.6 Find the value of a for which the following equations have
non-trivial solutions:
x + y + z = 0,
x + 2y = 0,
x − 3y + az = 0.
Proceed in the usual way using Gaussian reduction. Thus
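$$\begin{bmatrix} 1&1&1&0\\ 1&2&0&0\\ 1&-3&a&0 \end{bmatrix} \to \begin{bmatrix} 1&1&1&0\\ 0&1&-1&0\\ 0&-4&a-1&0 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}$$
$$\to \begin{bmatrix} 1&1&1&0\\ 0&1&-1&0\\ 0&0&a-5&0 \end{bmatrix} \quad (r_3' = r_3 + 4r_2).$$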

Non-trivial solutions exist if, and only if, a = 5. Put z = c, any number. Then by back
substitution we obtain
y = z = c, x = −y − z = −2c
for any c. There is therefore an infinite number of solutions if a = 5.
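A numerical way to find the non-trivial solution of Example 12.6 (not the method of the text) uses the singular value decomposition: a right singular vector belonging to a zero singular value satisfies Ax = 0. The sketch assumes NumPy, assumes the null space is at most one-dimensional, and uses a relative tolerance of 1e-10. The name `null_vector` is hypothetical.

```python
import numpy as np

def null_vector(A, rel_tol=1e-10):
    """Return a vector spanning the null space of square A, or None if Ax = 0
    has only the trivial solution. Assumes the null space is at most 1-D."""
    _, s, vt = np.linalg.svd(np.asarray(A, dtype=float))
    if s[-1] > rel_tol * s[0]:      # smallest singular value clearly nonzero
        return None
    return vt[-1]                   # right singular vector for s[-1] ~ 0

for a in (4, 5):
    v = null_vector([[1, 1, 1], [1, 2, 0], [1, -3, a]])
    # scale so that z = 1, matching the text's choice z = c with c = 1
    print(a, None if v is None else v / v[2])
# a = 4: None; a = 5: approximately [-2, 1, 1]
```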

If A is a square matrix, then Cramer’s rule (Section 12.1) with d = 0 implies that
x det A = 0,
since every numerator determinant then has a column of zeros. Hence x can be a
nonzero column vector only if det A = 0. Further, it can be proved that if det A = 0
then there is an infinite number of solutions. We therefore have the following test:

Homogeneous equations Ax = 0, where A is square.
If det A = 0, there is an infinite number of non-trivial solutions. If det A ≠ 0,
the only solution is x = 0. (12.6)

Example 12.7 Find all conditions on the constants a, b, and c in order that
x + y + z = 0,
ax + by + cz = 0,
a2x + b2y + c2z = 0
should have non-trivial solutions. Find the solutions in the cases
(a) a = 1, b = 1, c = 2; (b) a = 1, b = 1, c = 1.
This system of equations will have non-trivial solutions for x, y, and z if, and only if,
$$D = \begin{vmatrix} 1&1&1\\ a&b&c\\ a^2&b^2&c^2 \end{vmatrix} = 0.$$
Thus
$$D = \begin{vmatrix} 1&0&0\\ a&b-a&c-a\\ a^2&b^2-a^2&c^2-a^2 \end{vmatrix} \quad \begin{pmatrix} c_2' = c_2 - c_1 \\ c_3' = c_3 - c_1 \end{pmatrix}$$
$$= (b-a)(c-a)\begin{vmatrix} 1&0&0\\ a&1&1\\ a^2&b+a&c+a \end{vmatrix} = (b-a)(c-a)(c+a-b-a) = (b-c)(c-a)(a-b).$$
Hence, non-trivial solutions exist if b = c, c = a, or a = b.
(a) (a = 1, b = 1, c = 2) The equations become
x + y + z = 0,
x + y + 2z = 0,
x + y + 4z = 0.
The augmented matrix is
$$\begin{bmatrix} 1&1&1&0\\ 1&1&2&0\\ 1&1&4&0 \end{bmatrix} \to \begin{bmatrix} 1&1&1&0\\ 0&0&1&0\\ 0&0&3&0 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - r_1 \\ r_3' = r_3 - r_1 \end{pmatrix} \to \begin{bmatrix} 1&1&1&0\\ 0&0&1&0\\ 0&0&0&0 \end{bmatrix} \quad (r_3' = r_3 - 3r_2).$$
Row 2 implies z = 0, while row 1 implies x = −y. Let y = λ, say. Then the solution set is
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} -\lambda\\ \lambda\\ 0 \end{bmatrix} = \begin{bmatrix} -1\\ 1\\ 0 \end{bmatrix}\lambda,$$
for any λ.
(b) (a = 1, b = 1, c = 1) Applying Gaussian elimination, we find that
$$\begin{bmatrix} 1&1&1&0\\ 1&1&1&0\\ 1&1&1&0 \end{bmatrix} \to \begin{bmatrix} 1&1&1&0\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix} \quad \begin{pmatrix} r_2' = r_2 - r_1 \\ r_3' = r_3 - r_1 \end{pmatrix}.$$
Hence, we are left with row 1, which implies
x + y + z = 0.
Let z = λ and y = µ. Then x = −λ − µ. Hence the solution is
$$\begin{bmatrix} x\\ y\\ z \end{bmatrix} = \begin{bmatrix} -\lambda-\mu\\ \mu\\ \lambda \end{bmatrix} = \begin{bmatrix} -1\\ 0\\ 1 \end{bmatrix}\lambda + \begin{bmatrix} -1\\ 1\\ 0 \end{bmatrix}\mu$$
for any λ and any µ. Note that this is a two-parameter solution set.
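The factorization of the determinant D in Example 12.7 can be spot-checked by evaluating both sides for many integer values of a, b, and c. This pure-Python sketch (function names are ours) is a sanity check of the algebra, not a proof.

```python
from itertools import product

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (p, q, r), (s, t, u), (v, w, x) = m
    return p * (t * x - u * w) - q * (s * x - u * v) + r * (s * w - t * v)

def vandermonde_det(a, b, c):
    """The determinant D of Example 12.7."""
    return det3([[1, 1, 1], [a, b, c], [a * a, b * b, c * c]])

# Compare D with (b - c)(c - a)(a - b) for every integer triple in -3..3.
for a, b, c in product(range(-3, 4), repeat=3):
    assert vandermonde_det(a, b, c) == (b - c) * (c - a) * (a - b)

print(vandermonde_det(1, 1, 2), vandermonde_det(1, 2, 3))   # 0 2
```

The first value corresponds to case (a) of the example, where a = b and non-trivial solutions exist.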

12.6 Gauss–Seidel iterative method of solution


The method of Gaussian elimination, described in Section 12.2, is not a practical
approach, by hand, for a large system with perhaps 30 equations in 30 unknowns.
Whatever method is employed, the equations will have to be solved by computer.
But decisions about what scheme may be the best for a given set of equations
are not always easy. There are many direct and iterative methods in addition to the
row-operation method described in Section 12.2. Two iterative methods will be
briefly explained here.
Consider the equations
3x1 + x2 + x3 = −1,
−x1 + 4x2 + x3 = −8,
2x1 + x2 + 5x3 = −14.
Write the equations as
$$x_1 = \tfrac{1}{3}(-x_2 - x_3 - 1), \qquad (12.7)$$
$$x_2 = \tfrac{1}{4}(x_1 - x_3 - 8), \qquad (12.8)$$
$$x_3 = \tfrac{1}{5}(-2x_1 - x_2 - 14), \qquad (12.9)$$
where x1, x2, and x3 are now the subjects of the three equations.
To start the iteration choose initial values for x2 and x3, say $x_2^{(0)} = 0$ and $x_3^{(0)} = 0$,
without thinking about the equations. Calculate $x_1^{(1)}$ from eqn (12.7) as
$$x_1^{(1)} = \tfrac{1}{3}(-x_2^{(0)} - x_3^{(0)} - 1). \qquad (12.10)$$
Use $x_1^{(1)}$ in (12.8) with $x_3^{(0)}$. Thus
$$x_2^{(1)} = \tfrac{1}{4}(x_1^{(1)} - x_3^{(0)} - 8). \qquad (12.11)$$
Finally, use the updates $x_1^{(1)}$ and $x_2^{(1)}$ in (12.9) to find $x_3^{(1)}$:
$$x_3^{(1)} = \tfrac{1}{5}(-2x_1^{(1)} - x_2^{(1)} - 14). \qquad (12.12)$$
Hence we have calculated a new approximate solution given by $x_1^{(1)}, x_2^{(1)}, x_3^{(1)}$. Now
repeat the calculations starting with $x_2^{(1)}$ and $x_3^{(1)}$ as the new initial values to obtain
$x_1^{(2)}, x_2^{(2)}, x_3^{(2)}$. The output from these iterations is shown in the following table.
$$\begin{array}{c|cccccccc} i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline x_1^{(i)} & - & -0.3333 & 1.1110 & 1.0570 & 1.0030 & 0.9984 & 0.9997 & 1.0000 \\ x_2^{(i)} & 0 & -2.0830 & -1.1600 & -0.9825 & -0.9926 & -0.9997 & -1.0000 & -1.0000 \\ x_3^{(i)} & 0 & -2.2500 & -3.0130 & -3.0260 & -3.0030 & -2.9990 & -3.0000 & -3.0000 \end{array}$$

All solutions are quoted to 4 decimal places. It can be seen that the exact solu-
tion, which is x1 = 1, x2 = −1, x3 = −3, can be achieved to this accuracy after seven
steps for this example. This is known as the Gauss–Seidel scheme for numerical
solution of linear equations.
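The Gauss–Seidel steps (12.10)–(12.12) can be reproduced with a few lines of Python; the function name is ours. Printed to four decimal places, some iterates differ in the last digit from the table entries (for example −2.0833 against −2.0830), because the table rounds them differently.

```python
def gauss_seidel_example(steps):
    """Gauss–Seidel iteration for (12.7)–(12.9), starting from x2 = x3 = 0.

    Each newly computed value is used at once in the equations that follow.
    Returns a list of (x1, x2, x3) for steps 1, 2, ..., steps.
    """
    x2, x3 = 0.0, 0.0                    # initial values x2^(0), x3^(0)
    history = []
    for _ in range(steps):
        x1 = (-x2 - x3 - 1) / 3          # (12.7): old x2, x3
        x2 = (x1 - x3 - 8) / 4           # (12.8): new x1, old x3
        x3 = (-2 * x1 - x2 - 14) / 5     # (12.9): new x1 and x2
        history.append((x1, x2, x3))
    return history

for i, (x1, x2, x3) in enumerate(gauss_seidel_example(7), start=1):
    print(f"{i}: {x1:8.4f} {x2:8.4f} {x3:8.4f}")
```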
An alternative method without updating, given by
$$x_1^{(1)} = \tfrac{1}{3}(-x_2^{(0)} - x_3^{(0)} - 1),$$
$$x_2^{(1)} = \tfrac{1}{4}(x_1^{(0)} - x_3^{(0)} - 8),$$
$$x_3^{(1)} = \tfrac{1}{5}(-2x_1^{(0)} - x_2^{(0)} - 14),$$
is known as Jacobi’s method. However, convergence to the exact solution is slower
with this scheme.
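Jacobi's method differs only in that every new value is computed from the previous step's values. In the sketch below (function name ours), this happens because Python evaluates the whole right-hand side of the tuple assignment before any variable is rebound. Running it for 7 steps and comparing with the Gauss–Seidel table is one way to observe the slower convergence mentioned above.

```python
def jacobi_example(steps, start=(0.0, 0.0, 0.0)):
    """Jacobi iteration for (12.7)–(12.9): each step uses only old values.

    start holds (x1^(0), x2^(0), x3^(0)); unlike Gauss–Seidel, x1^(0) is needed.
    """
    x1, x2, x3 = start
    for _ in range(steps):
        # The whole right-hand side is evaluated before any name is rebound,
        # so every formula sees the values from the previous step.
        x1, x2, x3 = ((-x2 - x3 - 1) / 3,
                      (x1 - x3 - 8) / 4,
                      (-2 * x1 - x2 - 14) / 5)
    return x1, x2, x3

print(jacobi_example(1))    # first Jacobi step from (0, 0, 0)
print(jacobi_example(7))    # compare with step 7 of the Gauss–Seidel table
```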
These methods do not always converge to the solution. For example, the Gauss–
Seidel scheme applied to the system (12.2) fails since the iterates continue to
increase in size. It can be shown that the Gauss–Seidel scheme converges if the
magnitude of each leading diagonal element exceeds the sum of the magnitudes of
the remaining elements in the same row of the matrix of coefficients. This is the
case for the system given by (12.7,8,9). Here the matrix of coefficients is

$$\begin{bmatrix} 3&1&1\\ -1&4&1\\ 2&1&5 \end{bmatrix}.$$
Each of the diagonal elements dominates the remaining elements in that row, since
3 > 1 + 1 = 2, 4 > |−1| + 1 = 2, 5 > 2 + 1 = 3.
This property of the system of equations is known as diagonal dominance. If the
matrix is not diagonally dominant, then the scheme may or may not converge.
Usually a few steps will indicate whether this is likely to be the case.
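Row dominance, as defined here, is easy to test by program. Below is a minimal sketch (the name is ours). As the text notes, a `False` result does not by itself mean that the iteration will diverge.

```python
def is_diagonally_dominant(A):
    """True if, in every row i, |a_ii| exceeds the sum of |a_ij| for j != i."""
    return all(
        abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )

print(is_diagonally_dominant([[3, 1, 1], [-1, 4, 1], [2, 1, 5]]))    # True
print(is_diagonally_dominant([[1, 2, 1], [-2, 3, -1], [1, 4, -2]]))  # False: system (12.2)
```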
The schemes for both these methods can be expressed in matrix form as
follows. Consider the system of equations
Ax = d,
where A = [aij] is an n × n matrix, and write
A = AL + D + AU,
where AL, D, and AU are respectively the strictly lower triangular, diagonal, and strictly upper
triangular matrices given by
$$A_L = \begin{bmatrix} 0&0&\cdots&0\\ a_{21}&0&\cdots&0\\ a_{31}&a_{32}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&0 \end{bmatrix}, \quad D = \begin{bmatrix} a_{11}&0&\cdots&0\\ 0&a_{22}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&a_{nn} \end{bmatrix},$$
$$A_U = \begin{bmatrix} 0&a_{12}&a_{13}&\cdots&a_{1n}\\ 0&0&a_{23}&\cdots&a_{2n}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&0 \end{bmatrix}.$$
The matrix equation becomes
ALx + Dx + AUx = d.
It is easy to find x from an equation of the form Dx = ··· , since D is a diagonal
matrix with a simple inverse. Jacobi’s method keeps only Dx on the left and evaluates
every other term at the previous step. Assuming that $x^{(0)} = [x_1^{(0)}, \ldots, x_n^{(0)}]^T$ is the given initial estimate, the approximate
solution at step r will be computed from
$$Dx^{(r)} = -A_U x^{(r-1)} - A_L x^{(r-1)} + d.$$
On the other hand, in the Gauss–Seidel scheme, we take advantage of the observation
that $x_1^{(r)}, x_2^{(r)}, \ldots$ are successively computed from rows 1, 2, … . Hence they can
be used in the rows that follow. Thus the Gauss–Seidel iterations are given by
$$Dx^{(r)} = -A_L x^{(r)} - A_U x^{(r-1)} + d.$$
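The two matrix formulas can be coded directly with NumPy's `tril`, `triu`, and `diag` helpers. This sketch assumes NumPy is installed. `split_iterate` is a hypothetical name, and it solves the triangular system with the general solver `numpy.linalg.solve` for brevity, where forward substitution would be cheaper.

```python
import numpy as np

def split_iterate(A, d, x0, steps, method="gauss-seidel"):
    """Iterate on A x = d using the splitting A = A_L + D + A_U.

    Jacobi:        D x(r)         = -(A_L + A_U) x(r-1) + d
    Gauss-Seidel:  (D + A_L) x(r) = -A_U x(r-1) + d
    (the second line is D x(r) = -A_L x(r) - A_U x(r-1) + d rearranged).
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    D = np.diag(np.diag(A))        # diagonal part
    A_L = np.tril(A, k=-1)         # strictly lower triangular part
    A_U = np.triu(A, k=1)          # strictly upper triangular part
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        if method == "jacobi":
            x = np.linalg.solve(D, -(A_L + A_U) @ x + d)
        else:
            # general solver for brevity; forward substitution is cheaper
            x = np.linalg.solve(D + A_L, -A_U @ x + d)
    return x

A = [[3, 1, 1], [-1, 4, 1], [2, 1, 5]]
d = [-1, -8, -14]
print(split_iterate(A, d, [0, 0, 0], 7))              # near [1, -1, -3]
print(split_iterate(A, d, [0, 0, 0], 50, "jacobi"))   # near [1, -1, -3]
```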

Problems

12.1 (Section 12.1). Solve the following systems of linear equations using Cramer’s rule:
(a) x1 + x3 = 1,
x2 − x3 = 3,
2x1 + x2 = −1;
(b) x1 + 7x2 + x3 = 1,
x2 − x3 = 3,
2x1 + x2 + 10x3 = −1;
(c) x1 + 5x2 − x3 = 1,
−3x1 + x2 − x3 = 1,
3x1 + x2 + x3 = −3;
(d) x1 + x2 + x3 = 1,
ax1 + bx2 + cx3 = d,
a²x1 + b²x2 + c²x3 = d²;
(e) x1 + 5x2 + 2x4 = 1,
−3x2 − x4 = 1,
3x2 + x3 + x4 = 1,
2x2 + x3 + x4 = 2.

12.2 (Section 12.1). The currents i1, i2, i3 (in amps) flow in parts of a circuit which contains a variable resistor of resistance R (in ohms). The equations for the currents are given by
4i1 − i2 − i3 = 12,
−i1 − Ri2 = 24,
i1 + 5i3 = −12,
in terms of the voltages on the right-hand side. For design reasons, the current i3 should be 2 amps. How many ohms should the resistance R be?

12.3 (Section 12.4). Show that the following sets of equations are inconsistent.
(a) x1 + 2x2 + x3 = 3,
x1 − 3x2 + 2x3 = 4,
5x1 + 5x2 + 6x3 = 1;
(b) x1 + x2 + x3 = 2,
x1 + x3 + 2x4 = 3,
x1 + x2 + x4 = 4,
−x2 + 2x3 = 2;
(c) x1 + x2 = 1,
x2 + x3 = 1,
x3 + x4 = 1,
x4 + x5 = 1,
x1 + 3x2 + 5x3 + 7x4 + 4x5 = 1.

12.4 (Section 12.4). Determine the complete set of values for a and b that make the equations
x + y − z = 2,
2x + 3y + z = 3,
5x + 7y + az = b
have (i) a unique solution, (ii) no solutions, (iii) an infinite set of solutions.

12.5 (Section 12.4). Investigate all solutions of the system
x − y + 2z = 1,
x + y + 3z = 2,
x + 2y − z = 3,
x − 2y + 6z = 0.

12.6 (Section 12.4). Show that the following equations are inconsistent.
x1 + x2 + x3 − x4 = 10,
x1 − x2 − x3 = 1,
4x1 − 2x2 − 2x3 − x4 = 5.

12.7 (Section 12.2). Solve the following equations by Gaussian elimination.
x2 + 2x3 − x4 = 11,
x1 + x2 + x3 + x4 = 1,
2x1 + x2 − x3 + 4x4 = 0,
x1 − x2 + x3 − 2x4 = 2.

12.8 (Section 12.4). Find the value of a for which the linear equations
ax − y + 2z = 1,
x + 2y − az = 2,
4x + y − 2z = 2
have no solutions.

12.9 (Section 12.3). Find the inverses of the following matrices.
$$\text{(a)}\ \begin{bmatrix} 6&-3&6\\ 3&6&6\\ -12&-3&6 \end{bmatrix}; \qquad \text{(b)}\ \begin{bmatrix} 1&-1&2\\ 1&2&1\\ -4&-1&2 \end{bmatrix};$$
$$\text{(c)}\ \begin{bmatrix} 2&-1&2&0\\ 1&0&-1&2\\ 0&0&-1&2\\ -1&0&1&0 \end{bmatrix}; \qquad \text{(d)}\ \begin{bmatrix} 1&1&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&1\\ 0&0&0&0&0&1 \end{bmatrix};$$
$$\text{(e)}\ \begin{bmatrix} 1&0&0&0&0\\ 1&1&0&0&0\\ 1&1&1&0&0\\ 1&1&1&1&0\\ 1&1&1&1&1 \end{bmatrix}.$$

12.10 Show that
$$\begin{bmatrix} 1&0&1&0&1\\ 0&1&0&1&0\\ 1&0&1&0&1\\ 0&1&0&1&0\\ 1&0&1&0&1 \end{bmatrix}$$
is a singular matrix.

12.11 (Section 12.1). The four planes
6x − 3y − z = −3,
2x − y + 5z = 15,
y + z = 1,
2x + y − z = 1
are the faces of a tetrahedron. Find the coordinates of all its vertices.

12.12 A light source is situated at the point P : (3, 2, 2). A triangle has the points A : (1, 1, 1), B : (1, 0, 1), C : (2, 1, 1) as vertices. Find the coordinates of the vertices of the shadow of the triangle on the coordinate planes x = 0, y = 0, and z = 0.

12.13 The parabola
y = α + βx + γx²
is required which passes through the three points (x1, y1), (x2, y2), and (x3, y3). When solutions exist, find α, β, and γ, and discuss the cases where there are no solutions.

12.14 Find all values of the constants λ and µ in order that the equations
x + y + z = 4,
x − y + z = 2,
2x + y − λz = µ
may have (a) just one solution, (b) no solutions, (c) an infinite set of solutions.

12.15 (Section 12.3). For each of the sets of equations below, set up the augmented matrix and, using elementary row operations, decide on the consistency of the equations. If they are consistent, obtain all solutions in each case.
(a) x + y + z = 3,
3x + 5y + z = −1,
x + 2y = 0;
(b) y + z = 1,
x + y + 2z = 3,
x − y = 1;
(c) x + 2y + z = 4,
x + y = −1,
3x + 4y − z = 12.

12.16 (Section 12.5). Find all solutions of the determinant equation
$$\begin{vmatrix} 1-k & 2 & -1\\ 2 & 1-k & -1\\ -1 & -1 & 2-k \end{vmatrix} = 0.$$
What are the values of k for which the following set of equations has non-trivial solutions?
(1 − k)x + 2y − z = 0,
2x + (1 − k)y − z = 0,
−x − y + (2 − k)z = 0.

12.17 (Section 12.5). Show that
$$\begin{vmatrix} a^2+t & ab & ca\\ ab & b^2+t & bc\\ ca & bc & c^2+t \end{vmatrix} = t^2(t + a^2 + b^2 + c^2).$$
For what values of t do the equations
(1 + t)x + 2y + 3z = 0,
2x + (4 + t)y + 6z = 0,
3x + 6y + (9 + t)z = 0
have non-trivial solutions? Find all solutions in each case.

12.18 (Section 12.5). For what values of k do the equations
kx1 + 4x2 − x3 + 3x4 = 0,
4x1 + kx2 − x3 + 3x4 = 0,
4x1 − x2 + kx3 + 3x4 = 0,
4x1 − x2 + 3x3 + kx4 = 0
have non-trivial solutions?

12.19 (Section 12.4). Show that the equations
x1 + 2x2 + 3x3 = 4,
2x1 + 3x2 + 8x3 − x4 = 20,
2x1 + 5x2 + 4x3 + x4 = 5
are inconsistent.

12.20 (Section 12.3). Find the inverses of
$$\begin{bmatrix} 1&\lambda&0\\ 0&1&\lambda\\ 0&0&1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1&0&0\\ \mu&1&0\\ 0&\mu&1 \end{bmatrix}.$$
Hence find the inverse of
$$\begin{bmatrix} 1+\lambda\mu & \lambda & 0\\ \mu & 1+\lambda\mu & \lambda\\ 0 & \mu & 1 \end{bmatrix}.$$
Find the inverse of
$$\begin{bmatrix} 13&3&0\\ 4&13&3\\ 0&4&1 \end{bmatrix}.$$

12.21 (Section 12.5). Express the determinant
$$\begin{vmatrix} 1&1&1\\ a^2&b^2&c^2\\ a(b+c) & b(c+a) & c(a+b) \end{vmatrix}$$
as the product of factors. Obtain the values of a, b, and c for which non-trivial solutions of
x + a²y + a(b + c)z = 0,
x + b²y + b(c + a)z = 0,
x + c²y + c(a + b)z = 0
exist. Find the complete solution in the case a + b = −c.

12.22 (Section 12.6). Using the Gauss–Seidel iterative scheme solve the system of equations:
3x1 + x2 + x3 = 5,
6x2 − 2x3 + 3x4 = 6,
x1 + 4x3 − 2x4 = 1,
x2 + 2x3 − 4x4 = 2.
Show that the iterations converge to a solution accurate to 4 significant figures within 11 steps, starting from (0, 1, 0, 0). Confirm also that the matrix of coefficients is diagonally dominant.

12.23 (Section 12.6). Show that the Gauss–Seidel scheme fails for the system
x1 − 2x2 + x3 = 4,
x1 − x2 − x3 = 1,
2x1 + 3x2 − 4x3 = 4
(the matrix in this case is not diagonally dominant).

12.24 (Section 12.6). Show that one row of the matrix of coefficients fails to be dominant in the system
6x1 − x2 + x3 = 2,
3x1 + 2x2 + x3 = 1,
x1 − x2 + 4x3 = 5.
However, confirm that the Gauss–Seidel scheme delivers a solution accurate to 4 significant figures after 10 iterations starting at (0, 0, 1).

12.25 For comparison purposes with the Gauss–Seidel method, solve the equations (12.7)–(12.9), namely
$$x_1 = \tfrac{1}{3}(-x_2 - x_3 - 1), \quad x_2 = \tfrac{1}{4}(x_1 - x_3 - 8), \quad x_3 = \tfrac{1}{5}(-2x_1 - x_2 - 14),$$
using the Jacobi method. How many steps are required to achieve the same accuracy as that in the table in Section 12.6, that is to 5 significant figures?
13 Eigenvalues and eigenvectors

CONTENTS

13.1 Eigenvalues of a matrix 279


13.2 Eigenvectors 281
13.3 Linear dependence 285
13.4 Diagonalization of a matrix 286
13.5 Powers of matrices 289
13.6 Quadratic forms 292
13.7 Positive-definite matrices 295
13.8 An application to a vibrating system 298
Problems 301

The problem of finding a formula for powers of a square matrix A requires the
construction of matrices which transform A into a diagonal matrix. This process
involves the determination of the eigenvalues and eigenvectors of A. Eigenvalues
have many other applications wherever linear equations occur, particularly in
systems of linear differential equations.

13.1 Eigenvalues of a matrix


With any square matrix A, we can associate a set of homogeneous linear
equations Ax = 0. As we saw in Section 12.5 of the previous chapter, such a set of
equations will only have a non-trivial solution set if det A = 0. Consider now the
n × n set of equations
Ax = λ x, or (A − λ In)x = 0,
where λ is a parameter, and In is the unit matrix (see Section 7.3(5)). In order for
these equations to have non-trivial solutions, we must have
det(A − λ In) = 0. (13.1)

This can only be satisfied if λ takes certain values. These are called the eigenvalues
of the matrix A, and the equation they satisfy (eqn (13.1)) is called the character-
istic equation of A. The characteristic equation is a polynomial equation, of
degree n in λ. We usually list the eigenvalues as λ1, λ2, and so on.

Example 13.1 Find the eigenvalues of
$$A = \begin{bmatrix} 1&3\\ 2&2 \end{bmatrix}.$$
The eigenvalues of A are given by the determinant equation
$$\det(A - \lambda I_2) = \begin{vmatrix} 1-\lambda & 3\\ 2 & 2-\lambda \end{vmatrix} = 0,$$
which can be expanded into
(1 − λ)(2 − λ) − 6 = 0, or λ² − 3λ − 4 = 0.
This factorizes into (λ − 4)(λ + 1) = 0: hence the eigenvalues are λ1 = 4, λ2 = −1.
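The characteristic polynomial and its roots can be checked numerically. `numpy.poly` returns the coefficients of det(λI − A), which differs from det(A − λI) by the factor (−1)ⁿ and so coincides with it for this 2 × 2 matrix. The sketch assumes NumPy is installed.

```python
import numpy as np

A = np.array([[1, 3], [2, 2]], dtype=float)
coeffs = np.poly(A)       # det(lambda*I - A), highest power first
print(coeffs)             # approximately [1, -3, -4]: lambda^2 - 3 lambda - 4
print(np.sort(np.roots(coeffs)))        # approximately [-1, 4]
print(np.sort(np.linalg.eigvals(A)))    # the same eigenvalues, computed directly
```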

Example 13.2 Find the eigenvalues of


$$A = \begin{bmatrix} 2&-2\\ 1&4 \end{bmatrix}.$$
In this case
$$\det(A - \lambda I_2) = \begin{vmatrix} 2-\lambda & -2\\ 1 & 4-\lambda \end{vmatrix} = (2-\lambda)(4-\lambda) + 2 = \lambda^2 - 6\lambda + 10 = 0,$$
and the quadratic equation has the complex roots
$$\lambda = \tfrac{1}{2}\left[6 \pm \sqrt{36 - 40}\right] = 3 \pm i.$$
Thus real matrices can have complex eigenvalues, which will occur in pairs of complex
conjugates.

Example 13.3 Find the eigenvalues of


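$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
Here
$$\begin{aligned}
\det(A - \lambda I_3) &= \begin{vmatrix} 1-\lambda & 2 & 1 \\ 2 & 1-\lambda & 1 \\ 1 & 1 & 2-\lambda \end{vmatrix}
= \begin{vmatrix} 4-\lambda & 4-\lambda & 4-\lambda \\ 2 & 1-\lambda & 1 \\ 1 & 1 & 2-\lambda \end{vmatrix} \quad (r_1' = r_1 + r_2 + r_3) \\
&= (4-\lambda)\begin{vmatrix} 1 & 1 & 1 \\ 2 & 1-\lambda & 1 \\ 1 & 1 & 2-\lambda \end{vmatrix} \\
&= (4-\lambda)\begin{vmatrix} 1 & 0 & 0 \\ 2 & -1-\lambda & -1 \\ 1 & 0 & 1-\lambda \end{vmatrix} \quad (c_2' = c_2 - c_1,\ c_3' = c_3 - c_1) \\
&= (4-\lambda)(-1-\lambda)(1-\lambda) = 0,
\end{aligned}$$
if λ = 4 or ±1. Hence the eigenvalues are
$$\lambda_1 = 4, \quad \lambda_2 = 1, \quad \lambda_3 = -1.$$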
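For a 3 × 3 matrix the characteristic polynomial can also be written down from standard identities: $-\det(A - \lambda I_3) = \lambda^3 - (\operatorname{trace} A)\lambda^2 + m\lambda - \det A$, where m is the sum of the three principal 2 × 2 minors (the minors left after deleting row i and column i). The Python sketch below, with helper names of our own, applies this to Example 13.3 and confirms that the determinant vanishes at each eigenvalue.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_3x3(m):
    """Coefficients [1, c2, c1, c0] of lam**3 + c2*lam**2 + c1*lam + c0 = -det(m - lam*I).

    Uses the standard identities c2 = -trace(m), c1 = sum of the three
    principal 2 x 2 minors, c0 = -det(m).
    """
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = ((m[0][0] * m[1][1] - m[0][1] * m[1][0])
              + (m[0][0] * m[2][2] - m[0][2] * m[2][0])
              + (m[1][1] * m[2][2] - m[1][2] * m[2][1]))
    return [1, -trace, minors, -det3(m)]

def shifted(m, lam):
    """Return the matrix m - lam*I."""
    return [[m[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]

A = [[1, 2, 1], [2, 1, 1], [1, 1, 2]]
print(char_poly_3x3(A))                                # [1, -4, -1, 4]
print([det3(shifted(A, lam)) for lam in (4, 1, -1)])   # [0, 0, 0]
```

The coefficients [1, -4, -1, 4] describe $\lambda^3 - 4\lambda^2 - \lambda + 4 = (\lambda - 4)(\lambda - 1)(\lambda + 1)$, in agreement with the factorization above.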

Eigenvalues
The eigenvalues of the n × n square matrix A are the solutions λ of the determinant equation
$$\det(A - \lambda I_n) = 0. \qquad (13.2)$$

13.2 Eigenvectors
Associated with each eigenvalue λ of A there are non-trivial solutions of the equation
$$(A - \lambda I_n)x = 0.$$
These are called the eigenvectors of A corresponding to the eigenvalue λ, and in this text they are generally denoted by s. Thus, if λ is an eigenvalue of A, then there exists a corresponding eigenvector s ≠ 0, which is a non-trivial solution of
$$(A - \lambda I_n)s = 0.$$
The solutions of this set of linear equations can be found by Gaussian elimination.
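The elimination can be automated. The Python sketch below reduces A − λI all the way to reduced row-echelon form (Gauss–Jordan elimination, a slight variant of the Gaussian elimination with back substitution mentioned above) in exact `fractions.Fraction` arithmetic. The function names `null_space` and `shifted` are our own, and λ is assumed to be already known as an eigenvalue; for any other λ the function simply returns an empty list.

```python
from fractions import Fraction

def null_space(m):
    """Basis for the solutions of m x = 0, using Gauss-Jordan elimination.

    Exact Fraction arithmetic avoids rounding errors. Each free variable is set
    to 1 in turn (the other free variables to 0), so each returned vector
    corresponds to one arbitrary parameter such as alpha or beta in the text.
    """
    rows = [[Fraction(v) for v in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot: column c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(n_rows):           # clear column c in every other row
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        x = [Fraction(0)] * n_cols
        x[f] = Fraction(1)
        for row_index, c in enumerate(pivot_cols):
            x[c] = -rows[row_index][f]    # solve for the pivot variables
        basis.append(x)
    return basis

def shifted(m, lam):
    """Return m - lam*I for a square matrix m."""
    n = len(m)
    return [[m[r][c] - (lam if r == c else 0) for c in range(n)] for r in range(n)]

A = [[1, 2, 1], [2, 1, 1], [1, 1, 2]]    # matrix of Examples 13.3 and 13.5
for lam in (4, 1, -1):
    print(lam, [[str(v) for v in s] for s in null_space(shifted(A, lam))])
```

With λ = 1 the function returns (−1/2, −1/2, 1), a multiple of the eigenvector $s_2$ found in Example 13.5. Applied with λ = 1 to the matrices of Examples 13.6 and 13.7, it returns one vector and two vectors respectively, matching the eigenvector counts found there.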

Example 13.4 Find the eigenvectors of


$$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}.$$
From Example 13.1, the eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = -1$. Let the corresponding eigenvectors be
$$s_1 = \begin{bmatrix} a_1 \\ b_1 \end{bmatrix}, \qquad s_2 = \begin{bmatrix} a_2 \\ b_2 \end{bmatrix}.$$
Thus $(A - \lambda_1 I_2)s_1 = 0$ becomes
$$\begin{bmatrix} 1-4 & 3 \\ 2 & 2-4 \end{bmatrix}\begin{bmatrix} a_1 \\ b_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad\text{or}\quad \begin{cases} -3a_1 + 3b_1 = 0, \\ 2a_1 - 2b_1 = 0. \end{cases}$$
The solution is easy in this case: $a_1 = b_1 = \alpha$ for any α. If we put α = 1, then an eigenvector is
$$s_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Any nonzero value of α gives an eigenvector; we usually choose a convenient value of the parameter to give one solution, and the others are multiples of it.
Similarly $(A - \lambda_2 I_2)s_2 = 0$ becomes
$$\begin{bmatrix} 1+1 & 3 \\ 2 & 2+1 \end{bmatrix}\begin{bmatrix} a_2 \\ b_2 \end{bmatrix} = 0, \quad\text{or}\quad \begin{cases} 2a_2 + 3b_2 = 0, \\ 2a_2 + 3b_2 = 0. \end{cases}$$
The eigenvectors for this case are
$$s_2 = \begin{bmatrix} \beta \\ -\tfrac{2}{3}\beta \end{bmatrix}$$
for any nonzero β. As before, we choose a particular value of β which makes the eigenvector specific and simple. In this case we could put β = 3 to give the eigenvector
$$s_2 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}.$$

Example 13.5 Find the eigenvectors of



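$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
The eigenvalues of A are $\lambda_1 = 4$, $\lambda_2 = 1$, $\lambda_3 = -1$ (see Example 13.3). Let the corresponding eigenvectors be
$$s_i = \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \quad (i = 1, 2, 3).$$
In each case, we need to solve $(A - \lambda_i I_3)s_i = 0$. If $\lambda_1 = 4$, then
$$\begin{aligned} -3a_1 + 2b_1 + c_1 &= 0,\\ 2a_1 - 3b_1 + c_1 &= 0,\\ a_1 + b_1 - 2c_1 &= 0. \end{aligned}$$
Gaussian elimination leads to
$$\left[\begin{array}{ccc|c} -3 & 2 & 1 & 0 \\ 2 & -3 & 1 & 0 \\ 1 & 1 & -2 & 0 \end{array}\right] \to \left[\begin{array}{ccc|c} -3 & 2 & 1 & 0 \\ 0 & -\tfrac{5}{3} & \tfrac{5}{3} & 0 \\ 0 & \tfrac{5}{3} & -\tfrac{5}{3} & 0 \end{array}\right] \quad \begin{pmatrix} r_2' = r_2 + \tfrac{2}{3}r_1 \\ r_3' = r_3 + \tfrac{1}{3}r_1 \end{pmatrix}$$
$$\to \left[\begin{array}{ccc|c} -3 & 2 & 1 & 0 \\ 0 & -\tfrac{5}{3} & \tfrac{5}{3} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] \quad (r_3' = r_3 + r_2).$$
By back substitution, if $c_1 = \alpha$, then $b_1 = c_1 = \alpha$ and $a_1 = \tfrac{1}{3}(2b_1 + c_1) = \alpha$. Thus, with α = 1, an eigenvector is
$$s_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
The other eigenvectors corresponding to $\lambda_1$ are simply multiples of $s_1$. The same procedure shows that the eigenvectors corresponding respectively to $\lambda_2$ and $\lambda_3$ can be chosen to be
$$s_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \qquad s_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.$$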

Example 13.6 Find the eigenvalues and eigenvectors of


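$$A = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & -1 \\ 2 & 2 & -1 \end{bmatrix}.$$
In this example,
$$\begin{aligned}
\det(A - \lambda I_3) &= \begin{vmatrix} 1-\lambda & 2 & -1 \\ 1 & 2-\lambda & -1 \\ 2 & 2 & -1-\lambda \end{vmatrix}
= \begin{vmatrix} -\lambda & \lambda & 0 \\ 1 & 2-\lambda & -1 \\ 2 & 2 & -1-\lambda \end{vmatrix} \quad (r_1' = r_1 - r_2) \\
&= \begin{vmatrix} -\lambda & 0 & 0 \\ 1 & 3-\lambda & -1 \\ 2 & 4 & -1-\lambda \end{vmatrix} \quad (c_2' = c_2 + c_1) \\
&= -\lambda[(3-\lambda)(-1-\lambda) + 4] \\
&= -\lambda(\lambda - 1)^2.
\end{aligned}$$
This particular matrix has an eigenvalue 0 and a repeated eigenvalue 1. How does this affect the eigenvectors? Let the eigenvectors be, for $\lambda_1 = 0$ and $\lambda_2 = 1$,
$$s_i = \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \quad (i = 1, 2).$$
For $\lambda_1 = 0$,
$$\begin{aligned} a_1 + 2b_1 - c_1 &= 0,\\ a_1 + 2b_1 - c_1 &= 0,\\ 2a_1 + 2b_1 - c_1 &= 0. \end{aligned}$$
Subtracting the first equation from the third gives $a_1 = 0$, and then $c_1 = 2b_1$. Hence $a_1 = 0$, $b_1 = \alpha$, $c_1 = 2\alpha$, for any α. An eigenvector is
$$s_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}.$$
For $\lambda_2 = 1$,
$$\begin{aligned} 2b_2 - c_2 &= 0,\\ a_2 + b_2 - c_2 &= 0,\\ 2a_2 + 2b_2 - 2c_2 &= 0. \end{aligned}$$
If we let $b_2 = \beta$, then $c_2 = 2\beta$ and $a_2 = c_2 - b_2 = \beta$. Hence, by putting β = 1, we can associate with $\lambda_2 = 1$ the eigenvector
$$s_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.$$
There are only two independent eigenvectors in this example.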

Note that if A has a zero eigenvalue, then A must be a singular matrix, since putting λ = 0 in eqn (13.2) gives det A = 0. Conversely, if A is singular, then $\det(A - 0\cdot I_n) = 0$, so A has at least one zero eigenvalue.
The matrix in Example 13.6 has two distinct eigenvalues (one of them repeated) but only two linearly independent eigenvectors. The meaning of this reduced eigenvector set will be illustrated in the context of coordinate transformations in Section 13.4. As the next example illustrates, a matrix can have a repeated eigenvalue but still retain a full set of independent eigenvectors.

Example 13.7 Find the eigenvalues and eigenvectors of


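$$A = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix}.$$
Thus
$$\begin{aligned}
\det(A - \lambda I_3) &= \begin{vmatrix} 3-\lambda & 0 & -1 \\ 0 & 1-\lambda & 0 \\ 2 & 0 & -\lambda \end{vmatrix} = (3-\lambda)(1-\lambda)(-\lambda) + (-1)(-2)(1-\lambda) \\
&= (1-\lambda)[-3\lambda + \lambda^2 + 2] = -(\lambda - 2)(\lambda - 1)^2.
\end{aligned}$$
Let $\lambda_1 = 2$ and $\lambda_2 = 1$ with corresponding eigenvectors
$$s_i = \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \quad (i = 1, 2).$$
For $\lambda_1 = 2$,
$$\begin{aligned} a_1 - c_1 &= 0,\\ -b_1 &= 0,\\ 2a_1 - 2c_1 &= 0. \end{aligned}$$
We can let $b_1 = 0$, $c_1 = \alpha$, $a_1 = \alpha$. Hence we can choose
$$s_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
For $\lambda_2 = 1$,
$$\begin{aligned} 2a_2 - c_2 &= 0,\\ 0 &= 0,\\ 2a_2 - c_2 &= 0. \end{aligned}$$
If $a_2 = \beta$, then $c_2 = 2\beta$, but $b_2$ can take any value γ, say. Hence the eigenvector set is
$$s_2 = \begin{bmatrix} \beta \\ \gamma \\ 2\beta \end{bmatrix} = \beta\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + \gamma\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},$$
that is, it contains two parameters β and γ. The choices β = 1 with γ = 0, and β = 0 with γ = 1, say, give two independent eigenvectors
$$\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$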

Unlike the previous example, three independent eigenvectors are associated with this matrix even though it has only two distinct eigenvalues. We shall take up this point again in connection with the diagonalization of matrices.

Eigenvectors
The eigenvectors of a square matrix A are the non-trivial solutions $s_r$ of the homogeneous equations
$$(A - \lambda_r I_n)s_r = 0, \quad\text{for each eigenvalue } \lambda_r. \qquad (13.3)$$

Self-test 13.1
Find the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & -1 & 1 \\ 0 & -2 & 1 \end{bmatrix}.$$

13.3 Linear dependence


It is useful in mathematics to gather, in a collection or set, elements which have
common features. For example, we might consider the set of all integers, the set
of all fractions, or the set of all real numbers. In a similar way, we can gather all
m × n matrices. They all obey certain rules, and are said to form a vector space.
We shall not consider the general case here, but restrict ourselves to the set of all
m × 1 column vectors: this set is called an m-dimensional vector space $V_m$. These vectors obey the rules of matrix algebra. Thus if
$$s_1 = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}, \qquad s_2 = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},$$
then $s_1$ and $s_2$ belong to $V_m$, and so does $\alpha s_1 + \beta s_2$ for any constants α and β.
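An important set of vectors in $V_m$ is the set of base vectors
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_m = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$
Any vector in $V_m$ can be expressed as a linear combination of these vectors. Thus
$$s_1 = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} = a_1e_1 + a_2e_2 + \cdots + a_me_m.$$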
The set of vectors $\{e_1, e_2, \ldots, e_m\}$ is said, therefore, to form a basis of $V_m$. None of the vectors $e_1, e_2, \ldots, e_m$ can be expressed as a linear combination of the others, so they are said to be linearly independent. A set of n column vectors $s_1, s_2, \ldots, s_n$ is said to be linearly dependent if there exist constants $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that
$$\alpha_1s_1 + \alpha_2s_2 + \cdots + \alpha_ns_n = 0.$$
If this equation holds only when $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, then the vectors are linearly independent. It can be proved that any set of m linearly independent vectors in $V_m$ forms a basis of $V_m$.

Example 13.8 Show that the column vectors


$$a_1 = (1, 1, 0)^T, \quad a_2 = (1, 0, 1)^T, \quad a_3 = (0, 1, 1)^T$$
form a basis in three dimensions.
We must test whether
$$xa_1 + ya_2 + za_3 = 0$$
has nonzero solutions for x, y, z. The equations in full are
$$\begin{aligned} x + y &= 0,\\ x + z &= 0,\\ y + z &= 0. \end{aligned}$$
The determinant of the coefficients is
$$D = \begin{vmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{vmatrix} = -2 \neq 0.$$
By (12.6) the only solution is x = y = z = 0. The vectors are therefore linearly independent and can form a basis.

By a similar argument it can be shown that
$$b_1 = (1, 1, 0)^T, \quad b_2 = (1, 0, -1)^T, \quad b_3 = (0, 1, 1)^T$$
are linearly dependent and therefore cannot form a basis.
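The determinant test of Example 13.8 is easy to sketch in Python for three vectors in $V_3$: as there, a nonzero determinant means that only the trivial solution exists. The helper names below are hypothetical, and integer entries are assumed so that the comparison with zero is exact.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def independent(u, v, w):
    """True if the three vectors u, v, w in V_3 are linearly independent.

    x u + y v + z w = 0 has only the trivial solution exactly when the
    matrix with columns u, v, w is nonsingular. Its determinant equals that
    of its transpose, whose rows are u, v, w.
    """
    return det3([u, v, w]) != 0

print(independent((1, 1, 0), (1, 0, 1), (0, 1, 1)))    # a1, a2, a3: True
print(independent((1, 1, 0), (1, 0, -1), (0, 1, 1)))   # b1, b2, b3: False
```

In the second case the dependence can be seen directly: $b_1 - b_2 = b_3$.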

Self-test 13.2
For what values of k do the vectors $(1, 2, k)^T$, $(1, 2, -1)^T$, $(k, 2, -1)^T$ form a basis in three dimensions?

13.4 Diagonalization of a matrix


We will take a constructive approach to this problem for a 3 × 3 matrix. Consider the
matrix of Examples 13.3 and 13.5, namely

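$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix},$$
which has the eigenvalues $\lambda_1 = 4$, $\lambda_2 = 1$, $\lambda_3 = -1$ and eigenvectors
$$s_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad s_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad s_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.$$
Construct a matrix C which has these eigenvectors as its columns:
$$C = [s_1\ s_2\ s_3] = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}.$$
The columns are linearly independent (det C = 6 ≠ 0), so C is nonsingular. Then
$$AC = A[s_1\ s_2\ s_3] = [As_1\ As_2\ As_3] = [\lambda_1 s_1\ \lambda_2 s_2\ \lambda_3 s_3],$$
the last equality holding since the eigenvector $s_i$ is defined as a nonzero solution of $As_i = \lambda_i s_i$. Let D be the diagonal matrix of eigenvalues, namely
$$D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}.$$
Then
$$AC = [\lambda_1 s_1\ \lambda_2 s_2\ \lambda_3 s_3] = [s_1\lambda_1\ s_2\lambda_2\ s_3\lambda_3] = \begin{bmatrix} \lambda_1 & -\lambda_2 & \lambda_3 \\ \lambda_1 & -\lambda_2 & -\lambda_3 \\ \lambda_1 & 2\lambda_2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = CD.$$
If we premultiply this equation by $C^{-1}$, then
$$C^{-1}AC = C^{-1}CD = I_3D = D.$$
Therefore, the operation $C^{-1}AC$ has diagonalized the matrix A. In the example above,
$$C^{-1} = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix}.$$
Finally, it can be checked that
$$C^{-1}AC = \begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} = D.$$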
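It might appear at first sight that D is not unique, since the eigenvectors are not uniquely defined. However, a different choice of eigenvectors in C (for example, different nonzero multiples of $s_1$, $s_2$, $s_3$) still leads to the same $C^{-1}AC = D$, provided the eigenvectors are kept in the same order; reordering the columns of C simply reorders the eigenvalues along the diagonal of D.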

Example 13.9 Use the eigenvalues and eigenvectors of


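$$A = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix}$$
obtained in Example 13.7 to construct a transformation which diagonalizes A, and verify that the diagonalized matrix is
$$D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
From Example 13.7, we see that A has only two distinct eigenvalues, $\lambda_1 = 2$ and $\lambda_2 = \lambda_3 = 1$. However, we can associate two linearly independent eigenvectors, $s_2 = (1, 0, 2)^T$ and $s_3 = (0, 1, 0)^T$, with the repeated eigenvalue; together with $s_1 = (1, 0, 1)^T$ they give three independent columns. Thus, we can define C by
$$C = [s_1\ s_2\ s_3] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix}.$$
Its inverse is
$$C^{-1} = \begin{bmatrix} 2 & 0 & -1 \\ -1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$
Finally it can be verified that
$$C^{-1}AC = \begin{bmatrix} 2 & 0 & -1 \\ -1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 3 & 0 & -1 \\ 0 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = D.$$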

Example 13.10 Find a transformation which diagonalizes the matrix


$$A = \begin{bmatrix} 2 & -2 \\ 1 & 4 \end{bmatrix}.$$
From Example 13.2, the eigenvalues are $\lambda_1 = 3 + i$, $\lambda_2 = 3 - i$. Corresponding eigenvectors are
$$s_1 = \begin{bmatrix} -1+i \\ 1 \end{bmatrix}, \qquad s_2 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix}.$$
The eigenvalues and eigenvectors are complex valued, but this does not affect the method. The matrix C becomes
$$C = [s_1\ s_2] = \begin{bmatrix} -1+i & -1-i \\ 1 & 1 \end{bmatrix}.$$
Its inverse is
$$C^{-1} = \frac{1}{\det C}\begin{bmatrix} 1 & 1+i \\ -1 & -1+i \end{bmatrix} = \frac{1}{2i}\begin{bmatrix} 1 & 1+i \\ -1 & -1+i \end{bmatrix}.$$
Finally, check that
$$C^{-1}AC = \frac{1}{2i}\begin{bmatrix} 1 & 1+i \\ -1 & -1+i \end{bmatrix}\begin{bmatrix} 2 & -2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} -1+i & -1-i \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3+i & 0 \\ 0 & 3-i \end{bmatrix}.$$
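Python's built-in complex type makes it easy to repeat the check of Example 13.10 numerically. The helpers `matmul` and `inverse_2x2` below are our own; the inverse uses the same 2 × 2 formula as the text (1/det C times the matrix with the diagonal swapped and the off-diagonal entries negated). Floating-point rounding means the results should be compared with a small tolerance rather than exactly.

```python
def matmul(p, q):
    """Product of two matrices stored as lists of rows (works for complex entries)."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]]: (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, -2], [1, 4]]
C = [[-1 + 1j, -1 - 1j], [1, 1]]      # columns are s1 and s2 of Example 13.10
D = matmul(matmul(inverse_2x2(C), A), C)
for row in D:
    print([complex(round(z.real, 12), round(z.imag, 12)) for z in row])
```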

Diagonalizing a matrix
To diagonalize an n × n matrix A:
(i) find the eigenvalues of A;
(ii) find n linearly independent eigenvectors $s_1, s_2, \ldots, s_n$ of A (if they exist);
(iii) construct the matrix $C = [s_1\ s_2\ \cdots\ s_n]$ of eigenvectors;
(iv) calculate the inverse $C^{-1}$ of C;
(v) compute $C^{-1}AC$, which is the diagonal matrix D of eigenvalues. (13.4)
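Steps (iii) to (v) of this procedure can be carried out in exact rational arithmetic. The sketch below assumes that steps (i) and (ii) have already been done by hand; the helper names `matmul`, `inverse` and `matrix_from_columns` are our own, and the inverse is computed by Gauss–Jordan elimination on [C | I].

```python
from fractions import Fraction

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def inverse(m):
    """Inverse by Gauss-Jordan elimination on [m | I], in exact Fraction arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        p = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        aug[col], aug[p] = aug[p], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matrix_from_columns(*vectors):
    """Step (iii): place the eigenvectors side by side as the columns of C."""
    return [list(row) for row in zip(*vectors)]

A = [[1, 2, 1], [2, 1, 1], [1, 1, 2]]
C = matrix_from_columns((1, 1, 1), (-1, -1, 2), (1, -1, 0))   # s1, s2, s3 for 4, 1, -1
C_inv = inverse(C)                                              # step (iv)
D = matmul(matmul(C_inv, A), C)                                 # step (v)
print([[str(v) for v in row] for row in D])                     # diag(4, 1, -1)
```

The same three helpers reproduce Example 13.9 when the columns (1, 0, 1), (1, 0, 2), (0, 1, 0) and the matrix of Example 13.7 are used instead.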

Not all matrices can be diagonalized in this way. In Example 13.6, where
$$A = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & -1 \\ 2 & 2 & -1 \end{bmatrix},$$
we can associate only two linearly independent eigenvectors with the eigenvalue 0 and the repeated eigenvalue 1 together, so no nonsingular 3 × 3 matrix C of eigenvectors, and hence no diagonalizing matrix, can be constructed.

Self-test 13.3
Find the eigenvalues and eigenvectors of
$$A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 2 & -1 \\ 2 & 2 & -1 \end{bmatrix}.$$
Construct a matrix C such that $C^{-1}AC = D$, where D is a diagonal matrix. What are the elements in D?

13.5 Powers of matrices


The transformation C of the previous section can be used to obtain a formula for
calculating powers of square matrices. This follows since it is a simple matter to
find powers of diagonal matrices. Thus if

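$$D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix},$$
then
$$D^2 = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} = \begin{bmatrix} \lambda_1^2 & 0 & 0 \\ 0 & \lambda_2^2 & 0 \\ 0 & 0 & \lambda_3^2 \end{bmatrix},$$
and, in general,
$$D^n = \begin{bmatrix} \lambda_1^n & 0 & 0 \\ 0 & \lambda_2^n & 0 \\ 0 & 0 & \lambda_3^n \end{bmatrix}.$$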

In the previous section we showed that, if a 3 × 3 matrix A has three linearly independent eigenvectors, then we can find a matrix C such that
$$AC = CD,$$
where D is a diagonal matrix whose elements are the eigenvalues of A. Thus by multiplying on the right by $C^{-1}$ we find that
$$A = CDC^{-1}.$$
Hence
$$A^2 = CDC^{-1}CDC^{-1} = CDI_3DC^{-1} = CD^2C^{-1},$$
since $C^{-1}C = I_3$. Continuing this process, we find that
$$A^3 = A^2A = CD^2C^{-1}CDC^{-1} = CD^3C^{-1},$$
and, in general,
$$A^n = CD^nC^{-1}.$$

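Example 13.11 Find a formula for $A^n$, where
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
(See Examples 13.3 and 13.5 and Section 13.4.)
The eigenvalues of A are $\lambda_1 = 4$, $\lambda_2 = 1$, $\lambda_3 = -1$; and the diagonalizing transformation, with its inverse, is
$$C = \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}, \qquad C^{-1} = \begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix}.$$
Hence
$$\begin{aligned}
A^n = CD^nC^{-1} &= \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}^n\begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 4^n & 0 & 0 \\ 0 & 1^n & 0 \\ 0 & 0 & (-1)^n \end{bmatrix}\begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 4^n & -1 & (-1)^n \\ 4^n & -1 & -(-1)^n \\ 4^n & 2 & 0 \end{bmatrix}\begin{bmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ -\tfrac16 & -\tfrac16 & \tfrac13 \\ \tfrac12 & -\tfrac12 & 0 \end{bmatrix} \\
&= \frac{4^n}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} + \frac16\begin{bmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{bmatrix} + \frac{(-1)^n}{2}\begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\end{aligned}$$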
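A quick way to gain confidence in a closed form such as the one in Example 13.11 is to compare it with repeated multiplication for several values of n. The Python sketch below does this in exact rational arithmetic; the function names are our own.

```python
from fractions import Fraction

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def power_by_multiplication(m, n):
    """m**n by repeated multiplication, starting from the identity (m**0 = I)."""
    size = len(m)
    result = [[int(r == c) for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result

def power_by_formula(n):
    """Closed form for A**n found in Example 13.11."""
    J = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    P = [[1, 1, -2], [1, 1, -2], [-2, -2, 4]]
    Q = [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]
    return [[Fraction(4**n, 3) * J[r][c] + Fraction(1, 6) * P[r][c]
             + Fraction((-1)**n, 2) * Q[r][c] for c in range(3)] for r in range(3)]

A = [[1, 2, 1], [2, 1, 1], [1, 1, 2]]
for n in range(10):
    assert power_by_formula(n) == power_by_multiplication(A, n)
print([[int(v) for v in row] for row in power_by_formula(3)])   # A^3 as integers
```

The case n = 0 is a useful extra check: the three terms must add up to the unit matrix $I_3$.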

Example 13.12 Let


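$$P = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix},$$
where 0 < α, β < 1. Find $P^n$ and $\lim_{n\to\infty} P^n$.
The matrix P is an example of a row-stochastic matrix: that is, all elements are non-negative and the sum of the elements in each row is 1 (these matrices occur in probability applications, which accounts for the term stochastic). The eigenvalues of P are given by
$$\begin{vmatrix} 1-\alpha-\lambda & \alpha \\ \beta & 1-\beta-\lambda \end{vmatrix} = 0.$$
Hence
$$(1-\alpha-\lambda)(1-\beta-\lambda) - \alpha\beta = 0,$$
or
$$\lambda^2 - \lambda(2-\alpha-\beta) + 1 - \alpha - \beta = 0.$$
The roots are $\lambda_1 = 1$ and $\lambda_2 = 1 - \alpha - \beta = p$, say. Choose the corresponding eigenvectors
$$s_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad s_2 = \begin{bmatrix} -\alpha \\ \beta \end{bmatrix}.$$
Let
$$C = [s_1\ s_2] = \begin{bmatrix} 1 & -\alpha \\ 1 & \beta \end{bmatrix}.$$
Its inverse is given by
$$C^{-1} = \frac{1}{\alpha+\beta}\begin{bmatrix} \beta & \alpha \\ -1 & 1 \end{bmatrix}.$$
Thus
$$\begin{aligned}
P^n = CD^nC^{-1} &= \frac{1}{\alpha+\beta}\begin{bmatrix} 1 & -\alpha \\ 1 & \beta \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & p \end{bmatrix}^n\begin{bmatrix} \beta & \alpha \\ -1 & 1 \end{bmatrix} \\
&= \frac{1}{\alpha+\beta}\begin{bmatrix} 1 & -\alpha p^n \\ 1 & \beta p^n \end{bmatrix}\begin{bmatrix} \beta & \alpha \\ -1 & 1 \end{bmatrix} \\
&= \frac{1}{\alpha+\beta}\begin{bmatrix} \beta + \alpha p^n & \alpha - \alpha p^n \\ \beta - \beta p^n & \alpha + \beta p^n \end{bmatrix} \\
&= \frac{1}{\alpha+\beta}\begin{bmatrix} \beta & \alpha \\ \beta & \alpha \end{bmatrix} + \frac{p^n}{\alpha+\beta}\begin{bmatrix} \alpha & -\alpha \\ -\beta & \beta \end{bmatrix}.
\end{aligned}$$
Since 0 < α < 1 and 0 < β < 1, it follows that
$$p = 1 - \alpha - \beta < 1 \quad\text{and}\quad p = 1 - \alpha - \beta > 1 - 1 - 1 = -1,$$
that is, |p| < 1. As n → ∞, $p^n \to 0$ and
$$P^n \to \frac{1}{\alpha+\beta}\begin{bmatrix} \beta & \alpha \\ \beta & \alpha \end{bmatrix}.$$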
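The formula for $P^n$ in Example 13.12 can be checked in the same way. The values α = 3/10 and β = 1/5 below are arbitrary illustrative choices in (0, 1), giving p = 1/2, and the function name `stochastic_power` is hypothetical.

```python
from fractions import Fraction

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def stochastic_power(alpha, beta, n):
    """P**n from the closed form of Example 13.12, with p = 1 - alpha - beta."""
    p = 1 - alpha - beta
    s = alpha + beta
    return [[(beta + alpha * p**n) / s, (alpha - alpha * p**n) / s],
            [(beta - beta * p**n) / s, (alpha + beta * p**n) / s]]

alpha, beta = Fraction(3, 10), Fraction(1, 5)    # illustrative values in (0, 1)
P = [[1 - alpha, alpha], [beta, 1 - beta]]

Pn = [[1, 0], [0, 1]]
for n in range(1, 11):                           # compare with repeated multiplication
    Pn = matmul(Pn, P)
    assert Pn == stochastic_power(alpha, beta, n)

limit_row = [beta / (alpha + beta), alpha / (alpha + beta)]
limit = [limit_row, limit_row]                   # both rows tend to the same vector
print([[float(v) for v in row] for row in stochastic_power(alpha, beta, 40)])
print([[float(v) for v in row] for row in limit])
```

With p = 1/2 the correction term halves at each step, so by n = 40 the two printed matrices agree to about twelve decimal places.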

Powers of a square matrix


To find the power $A^n$ of a diagonalizable matrix A:
(i) find the eigenvalues and eigenvectors of A;
(ii) construct a matrix C of eigenvectors and its inverse $C^{-1}$, so that $A = CDC^{-1}$;
(iii) the required answer is
$$A^n = CD^nC^{-1}. \qquad (13.5)$$

Self-test 13.4
Using the results from Self-test 13.3, find a formula for $A^n$, where
$$A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 2 & -1 \\ 2 & 2 & -1 \end{bmatrix}.$$

13.6 Quadratic forms


Suppose that $x = [x_1, x_2, \ldots, x_n]^T$, an n-dimensional column vector with elements $x_1, x_2, \ldots, x_n$. Any polynomial function of these elements in which every term is of degree 2 in them is known as a quadratic form. Thus, if n = 3, then
$$x_1^2 + 8x_1x_2 + x_2^2 + 6x_2x_3 + x_3^2$$
is an example of a quadratic form. Quadratic forms can always be expressed as a matrix product of the form
$$x^TAx.$$
The example above can be written as
$$[x_1\ x_2\ x_3]\begin{bmatrix} 1 & 4 & 0 \\ 4 & 1 & 3 \\ 0 & 3 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad (13.6)$$
in which A is symmetric. Any quadratic form may be written using a symmetric A, although non-symmetric representations are also possible. For example, in the above we may put
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 8 & 1 & 2 \\ 0 & 4 & 1 \end{bmatrix}.$$
However, the symmetric form is adopted throughout this section.
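The claim that a symmetric and a non-symmetric matrix can represent the same quadratic form is easy to test. The Python sketch below (function names ours) evaluates $x^TAx$ for both matrices at every integer point with coordinates between −3 and 3, and also shows the standard symmetrization (A + Aᵀ)/2, which recovers the symmetric matrix in (13.6) from the non-symmetric one.

```python
from fractions import Fraction
from itertools import product

def quadratic_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def symmetrize(A):
    """(A + A^T)/2: the symmetric matrix that gives the same quadratic form."""
    n = len(A)
    return [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]

def polynomial(x1, x2, x3):
    return x1**2 + 8*x1*x2 + x2**2 + 6*x2*x3 + x3**2

A_sym = [[1, 4, 0], [4, 1, 3], [0, 3, 1]]   # symmetric matrix in (13.6)
A_non = [[1, 0, 0], [8, 1, 2], [0, 4, 1]]   # non-symmetric alternative

for x in product(range(-3, 4), repeat=3):   # 343 integer test points
    assert quadratic_form(A_sym, x) == quadratic_form(A_non, x) == polynomial(*x)
print(symmetrize(A_non) == A_sym)           # True
```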
Let us find the eigenvalues of the symmetric matrix in (13.6) in the usual way
by solving

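$$\begin{vmatrix} 1-\lambda & 4 & 0 \\ 4 & 1-\lambda & 3 \\ 0 & 3 & 1-\lambda \end{vmatrix} = 0.$$
Hence
$$(1-\lambda)[(1-\lambda)^2 - 9] - 4\cdot 4\cdot(1-\lambda) = 0$$
or
$$(1-\lambda)[(1-\lambda)^2 - 25] = 0.$$
It follows that the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = -4$, $\lambda_3 = 6$. It can be shown by the methods previously explained that corresponding eigenvectors are
$$s_1 = \begin{bmatrix} 3 \\ 0 \\ -4 \end{bmatrix}, \quad s_2 = \begin{bmatrix} -4 \\ 5 \\ -3 \end{bmatrix}, \quad s_3 = \begin{bmatrix} 4 \\ 5 \\ 3 \end{bmatrix}.$$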
If a and b are two column vectors, and
$$a^Tb = 0,$$
then a and b are said to be orthogonal. If we examine the eigenvectors $s_1$, $s_2$, and $s_3$ above, then it is easy to see that
$$s_1^Ts_2 = [3\ \ 0\ \ -4]\begin{bmatrix} -4 \\ 5 \\ -3 \end{bmatrix} = -12 + 0 + 12 = 0,$$
and similarly that $s_2^Ts_3 = 0$ and $s_3^Ts_1 = 0$. Thus the three eigenvectors are mutually orthogonal: regarded as ordinary vectors in the sense of Chapter 9, they are mutually perpendicular.
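The orthogonality relations above, together with $As_i = \lambda_i s_i$, can be confirmed with a few lines of integer arithmetic. This Python sketch only checks the stated vectors; it is not a method for finding them, and the helper names are ours.

```python
def matvec(m, v):
    """Matrix-vector product, with the result as a tuple."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m)))

def dot(a, b):
    """a^T b for two column vectors stored as tuples."""
    return sum(x * y for x, y in zip(a, b))

A = [[1, 4, 0], [4, 1, 3], [0, 3, 1]]
eigenpairs = {1: (3, 0, -4), -4: (-4, 5, -3), 6: (4, 5, 3)}

for lam, s in eigenpairs.items():
    assert matvec(A, s) == tuple(lam * c for c in s)    # A s = lambda s

vectors = list(eigenpairs.values())
print([dot(vectors[i], vectors[j]) for i in range(3) for j in range(i + 1, 3)])  # [0, 0, 0]
```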
It will be shown (Theorem 13.2) that this property of the eigenvectors follows from the symmetry of the matrix of the quadratic form. However, we first show that the eigenvalues of a real symmetric matrix must be real.

Theorem 13.1 If A is a symmetric real matrix, then its eigenvalues are real.

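Proof. Suppose that λ = α + iβ is an eigenvalue. Since the left-hand side of the equation $\det(A - \lambda I_n) = 0$ is a polynomial in λ with real coefficients, its complex roots occur in conjugate pairs, so A must also have the eigenvalue $\bar\lambda = \alpha - i\beta$. Let s be an eigenvector corresponding to λ; taking complex conjugates of $As = \lambda s$, and using the fact that A is real, shows that $\bar s$ is an eigenvector corresponding to $\bar\lambda$. Thus
$$As = \lambda s, \qquad A\bar s = \bar\lambda\bar s. \qquad (13.7)$$
Since A is symmetric, it follows that $(A\bar s)^T = \bar s^TA^T = \bar s^TA$, and we can replace (13.7) by
$$As = \lambda s, \qquad \bar s^TA = \bar\lambda\bar s^T. \qquad (13.8)$$
Multiply the first equation in (13.8) on the left by $\bar s^T$, and the second equation on the right by s. Thus
$$\bar s^TAs = \lambda\bar s^Ts, \qquad \bar s^TAs = \bar\lambda\bar s^Ts.$$
Elimination of $\bar s^TAs$ leads to
$$(\lambda - \bar\lambda)\bar s^Ts = 0. \qquad (13.9)$$
To show that $\bar s^Ts \neq 0$, put $s = (a_1, \ldots, a_n)^T$. Then, since $\bar a_k a_k = |a_k|^2$,
$$\bar s^Ts = [\bar a_1\ \bar a_2\ \cdots\ \bar a_n]\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = |a_1|^2 + |a_2|^2 + \cdots + |a_n|^2 > 0,$$
because s ≠ 0. From (13.9), it follows that $\lambda = \bar\lambda$, or α + iβ = α − iβ, from which we conclude that β = 0. Therefore λ is real.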

Theorem 13.2 If A is a real symmetric matrix, then the eigenvectors associated with any two distinct eigenvalues are orthogonal.
Proof. Let $\lambda_1$ and $\lambda_2$ be the distinct eigenvalues, and $s_1$ and $s_2$ their corresponding eigenvectors. Then
$$As_1 = \lambda_1s_1, \qquad As_2 = \lambda_2s_2.$$
Transpose the second equation; since A is symmetric ($A^T = A$), the equations become
$$As_1 = \lambda_1s_1, \qquad s_2^TA = \lambda_2s_2^T.$$
Multiply the first equation by $s_2^T$ on the left, and the second equation by $s_1$ on the right. Hence
$$s_2^TAs_1 = \lambda_1s_2^Ts_1, \qquad s_2^TAs_1 = \lambda_2s_2^Ts_1.$$
Eliminate $s_2^TAs_1$ between these equations, leaving
$$\lambda_1s_2^Ts_1 = \lambda_2s_2^Ts_1, \quad\text{or}\quad (\lambda_1 - \lambda_2)s_2^Ts_1 = 0.$$
Since $\lambda_1 \neq \lambda_2$, it follows that $s_2^Ts_1 = 0$; that is, $s_1$ and $s_2$ are orthogonal.

Self-test 13.5
Find the eigenvalues and eigenvectors of
$$A = \begin{bmatrix} 3 & -3 & 2 \\ -3 & -6 & 3 \\ 2 & 3 & 3 \end{bmatrix}.$$
Confirm that the eigenvectors are mutually orthogonal.

13.7 Positive-definite matrices

A quadratic form $x^TAx$ is said to be positive-definite if $x^TAx > 0$ for all x ≠ 0. If this is true, we simply describe the matrix A as positive-definite.
Remember that any quadratic form can be written as $x^TAx$ where A is symmetric.
Consider the particular case in which A is a 3 × 3 symmetric matrix. Let $\lambda_1$, $\lambda_2$, $\lambda_3$ be its eigenvalues, which we assume to be distinct, with corresponding eigenvectors $s_1$, $s_2$, $s_3$ chosen to be unit vectors, that is $s_1^Ts_1 = s_2^Ts_2 = s_3^Ts_3 = 1$.
As we saw in Section 13.4, we can diagonalize A by using the matrix
$$C = [s_1\ s_2\ s_3],$$
so that
$$C^{-1}AC = D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}.$$
For a symmetric matrix, eigenvectors belonging to distinct eigenvalues are orthogonal (Theorem 13.2). Hence
$$s_1^TC = s_1^T[s_1\ s_2\ s_3] = [s_1^Ts_1\ \ s_1^Ts_2\ \ s_1^Ts_3] = [1\ \ 0\ \ 0],$$
since $s_1$ is a unit vector. In a similar way,
$$s_2^TC = [0\ \ 1\ \ 0], \qquad s_3^TC = [0\ \ 0\ \ 1].$$
Hence, if we construct a matrix with $s_1^T$, $s_2^T$, $s_3^T$ as its rows, then
$$C^TC = \begin{bmatrix} s_1^T \\ s_2^T \\ s_3^T \end{bmatrix}C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3.$$
In other words, the transpose of C,
$$C^T = \begin{bmatrix} s_1^T \\ s_2^T \\ s_3^T \end{bmatrix},$$
is the inverse of C, that is $C^T = C^{-1}$. Square matrices with this property are said to be orthogonal matrices.
Suppose that we now define a transformation by x = CX, where C is an orthogonal matrix. Then, in terms of X, the quadratic form becomes
$$x^TAx = (CX)^TACX = X^TC^TACX = X^TDX = \lambda_1X_1^2 + \lambda_2X_2^2 + \lambda_3X_3^2.$$
Since C is nonsingular, x ≠ 0 exactly when X ≠ 0. It follows from this result, for 3 × 3 matrices, and similarly for higher order, that a quadratic form is positive-definite if and only if all the eigenvalues of its symmetric matrix are positive.

Example 13.13 Find an orthogonal matrix C which transforms the quadratic form $x^TAx$, where
$$A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
into a diagonal quadratic form $X^TDX$.
The eigenvalues of A are given by $\det(A - \lambda I_3) = 0$, where
$$\det(A - \lambda I_3) = \begin{vmatrix} 3-\lambda & -1 & 0 \\ -1 & 3-\lambda & 0 \\ 0 & 0 & 1-\lambda \end{vmatrix} = [(3-\lambda)^2 - 1](1-\lambda) = (\lambda - 2)(\lambda - 4)(1 - \lambda).$$
Hence the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 4$. Since all the eigenvalues are positive, it follows that the quadratic form is positive-definite. The corresponding unit eigenvectors are
$$s_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad s_2 = \begin{bmatrix} 1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{bmatrix}, \quad s_3 = \begin{bmatrix} -1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{bmatrix}.$$
Hence the required orthogonal matrix C is
$$C = [s_1\ s_2\ s_3] = \begin{bmatrix} 0 & 1/\sqrt2 & -1/\sqrt2 \\ 0 & 1/\sqrt2 & 1/\sqrt2 \\ 1 & 0 & 0 \end{bmatrix}.$$
The relation between the coordinates (x, y, z) and (X, Y, Z) of a point fixed in space under the transformation
$$x = CX = [s_1\ s_2\ s_3]X,$$
where the eigenvectors $s_1$, $s_2$, $s_3$ are orthogonal unit vectors, can be seen as follows. Put X = 1, Y = 0, Z = 0, which is a point on the X axis. Since
$$X = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},$$
it follows that the corresponding point in the x frame is $x = s_1$. In other words, the elements $(a_1, b_1, c_1)$ of $s_1$ are the coordinates in the x space of the point $A_1: (1, 0, 0)$ in the X space. Similarly, the elements of $s_2$ and $s_3$ are respectively the coordinates of $A_2: (0, 1, 0)$ and $A_3: (0, 0, 1)$ in the X space (see Fig. 13.1).
We know that the eigenvectors are mutually orthogonal, that is $s_i^Ts_j = 0$ (i ≠ j). We want to show that this implies that the new axes OXYZ are also mutually perpendicular. Consider the triangle $OA_1A_2$: we want to show that the angle $A_1OA_2$ is a right angle, so that the triangle satisfies Pythagoras's theorem:

[Fig. 13.1 Orthogonal mapping between axes. The points A1, A2, A3 lie at unit distance from O on the X, Y, Z axes; their coordinates (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) are measured in the x space.]

$$\begin{aligned}
A_1A_2^2 - OA_1^2 - OA_2^2 &= (a_1 - a_2)^2 + (b_1 - b_2)^2 + (c_1 - c_2)^2 - (a_1^2 + b_1^2 + c_1^2) - (a_2^2 + b_2^2 + c_2^2) \\
&= -2(a_1a_2 + b_1b_2 + c_1c_2) \\
&= -2s_1^Ts_2 = 0,
\end{aligned}$$
since the eigenvectors are orthogonal (and, being unit vectors, $OA_1 = OA_2 = 1$). Hence, by the converse of Pythagoras' theorem, the angle $A_1OA_2$ is a right angle. Similarly, the angles $A_2OA_3$ and $A_3OA_1$ are right angles. Hence the new axes are mutually perpendicular. It can be shown that det C = ±1. If det C = 1, then the X coordinates can be obtained from the x coordinates by a rotation about the origin O. If det C = −1, then a reflection and a rotation are required.

Example 13.14 Show that


$$C = \frac13\begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & -2 \\ 2 & -2 & 1 \end{bmatrix}$$
is an orthogonal matrix. If x = CX, what does the point x = 1, y = 2, z = −1 map into in the (X, Y, Z) coordinates?
In this example,
$$s_1 = \frac13\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \quad s_2 = \frac13\begin{bmatrix} 2 \\ 1 \\ -2 \end{bmatrix}, \quad s_3 = \frac13\begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}.$$
Clearly
$$s_1^Ts_1 = \frac19[1\ \ 2\ \ 2]\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} = 1.$$
Similarly $s_2^Ts_2 = 1$ and $s_3^Ts_3 = 1$. Also
$$s_1^Ts_2 = \frac19[1\ \ 2\ \ 2]\begin{bmatrix} 2 \\ 1 \\ -2 \end{bmatrix} = \frac19[1\times2 + 2\times1 + 2\times(-2)] = 0.$$
Similarly, $s_2^Ts_3 = 0$ and $s_3^Ts_1 = 0$. We need to invert the transformation, so that
$$X = C^{-1}x = C^Tx = \begin{bmatrix} s_1^T \\ s_2^T \\ s_3^T \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} = \frac13\begin{bmatrix} 1\times1 + 2\times2 + 2\times(-1) \\ 2\times1 + 1\times2 + (-2)\times(-1) \\ 2\times1 + (-2)\times2 + 1\times(-1) \end{bmatrix} = [1\ \ 2\ \ -1]^T.$$
Hence X = 1, Y = 2, Z = −1.
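Example 13.14 can be reproduced in exact arithmetic, since the entries of C are multiples of 1/3. The Python sketch below (helper names ours) first confirms $C^TC = I_3$ and then computes $X = C^Tx$.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

third = Fraction(1, 3)
C = [[third * v for v in row] for row in ([1, 2, 2], [2, 1, -2], [2, -2, 1])]

assert matmul(transpose(C), C) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # C is orthogonal

x = [[1], [2], [-1]]                      # the point as a column vector
X = matmul(transpose(C), x)               # X = C^{-1} x = C^T x
print([int(row[0]) for row in X])         # [1, 2, -1]
```

The result coincides with x itself. This is no accident: C is symmetric as well as orthogonal, with det C = −1, so it represents a reflection, and the point (1, 2, −1) lies in its mirror plane (it is orthogonal to (1, −1, −1), the direction that C reverses).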

Self-test 13.6
Let
$$A = \frac12\begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$
Show that A is an orthogonal matrix. Show also that the columns form an orthonormal basis. Is this a general property of orthogonal matrices?

13.8 An application to a vibrating system


Positive-definite matrices occur frequently in applications. For example, consider
the system consisting of two particles of equal mass m and three equal springs
stretched in a straight line between two supports as shown in Fig. 13.2. Suppose
that, in equilibrium, the springs are unstretched, each of length a. The mechanical system vibrates longitudinally so that the displacements of the particles are x and y as shown.

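[Fig. 13.2 Longitudinal oscillations: two particles of mass m between three springs, each of natural length a; spring tensions T1, T2, T3; particle displacements x and y.]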

If a spring is stretched or compressed from equilibrium by a length x, then the potential energy stored in it is $\tfrac12kx^2$, where k is a constant known as the stiffness of the spring, which measures its reaction to being stretched or compressed. The total potential energy of the system is
$$V = \tfrac12kx^2 + \tfrac12k(y - x)^2 + \tfrac12ky^2.$$
Note that the extension of the middle spring is y − x. Thus
$$V = \tfrac12kx^2 + \tfrac12ky^2 - kxy + \tfrac12kx^2 + \tfrac12ky^2 = kx^2 - kxy + ky^2 = \tfrac12\mathbf{x}^TK\mathbf{x},$$
where
$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \qquad K = \begin{bmatrix} 2k & -k \\ -k & 2k \end{bmatrix}.$$

The eigenvalues of K are given by $\det(K - \lambda I_2) = 0$, that is
$$\begin{vmatrix} 2k-\lambda & -k \\ -k & 2k-\lambda \end{vmatrix} = 0, \quad\text{or}\quad (2k - \lambda)^2 - k^2 = 0.$$
Hence, the eigenvalues are $\lambda_1 = k$ and $\lambda_2 = 3k$, which are both positive, implying that the potential energy is a positive-definite quadratic form. This is not surprising, since we might expect the potential energy to take a minimum value in equilibrium. Corresponding eigenvectors, normalized as unit vectors, are
$$s_1 = \frac{1}{\sqrt2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad s_2 = \frac{1}{\sqrt2}\begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
The matrix of eigenvectors, C, is given by
$$C = [s_1\ s_2] = \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$
The transformation $\mathbf{x} = C\mathbf{X}$ introduces the coordinates $\mathbf{X}^T = (X, Y)$, in which
$$V = \tfrac12\mathbf{X}^T\begin{bmatrix} k & 0 \\ 0 & 3k \end{bmatrix}\mathbf{X} = \tfrac12(kX^2 + 3kY^2).$$
(X, Y) are known as the normal coordinates of the system, and are related to x and y by
$$\begin{bmatrix} x \\ y \end{bmatrix} = C\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} (X + Y)/\sqrt2 \\ (X - Y)/\sqrt2 \end{bmatrix}.$$
Normal coordinates are often more convenient coordinates to use.
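The change to normal coordinates can be checked numerically: substituting $x = (X + Y)/\sqrt2$, $y = (X - Y)/\sqrt2$ into $kx^2 - kxy + ky^2$ should reproduce $\tfrac12(kX^2 + 3kY^2)$. The Python sketch below takes k = 1 purely for illustration and tests a few arbitrary points with `math.isclose`.

```python
import math

k = 1.0   # illustrative stiffness; the identity holds for any k

def V_original(x, y):
    """Potential energy k x^2 - k x y + k y^2 in the particle displacements."""
    return k * x * x - k * x * y + k * y * y

def V_normal(X, Y):
    """The same energy in normal coordinates: (k X^2 + 3k Y^2) / 2."""
    return 0.5 * (k * X * X + 3 * k * Y * Y)

for X, Y in [(1.0, 0.0), (0.0, 1.0), (0.3, -1.7), (2.5, 0.4)]:
    x = (X + Y) / math.sqrt(2)
    y = (X - Y) / math.sqrt(2)
    assert math.isclose(V_original(x, y), V_normal(X, Y), rel_tol=1e-12, abs_tol=1e-12)
print("V agrees in both coordinate systems")
```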
For the same problem, we can also derive the equations of motion of each particle. If $T_1$, $T_2$, and $T_3$ are the tensions in each of the springs, then applying Newton's law (force equals mass times acceleration) to each particle gives the differential equations
$$T_2 - T_1 = m\ddot x, \qquad (13.10)$$
$$T_3 - T_2 = m\ddot y, \qquad (13.11)$$
where $\ddot x$ and $\ddot y$ stand for $d^2x/dt^2$ and $d^2y/dt^2$ respectively. The tension in a spring is k times the extension, by Hooke's law, where k is the stiffness of the spring. Thus
$$T_1 = kx, \qquad T_2 = k(y - x), \qquad T_3 = -ky.$$
Substitution into (13.10) and (13.11) yields
$$-2kx + ky = m\ddot x, \qquad (13.12)$$
$$kx - 2ky = m\ddot y. \qquad (13.13)$$
In matrix form, these equations can be combined into the vector equation
$$\ddot{\mathbf{x}} + A\mathbf{x} = \mathbf{0},$$
where
$$\ddot{\mathbf{x}} = \begin{bmatrix} \ddot x \\ \ddot y \end{bmatrix}, \qquad A = \begin{bmatrix} 2k/m & -k/m \\ -k/m & 2k/m \end{bmatrix}.$$
Since A = K/m, the same orthogonal matrix C diagonalizes A. If we use the normal coordinates, then $\mathbf{x} = C\mathbf{X}$ implies
$$C\ddot{\mathbf{X}} + AC\mathbf{X} = \mathbf{0}.$$
Multiply this equation on the left by $C^{-1} = C^T$:
$$\ddot{\mathbf{X}} + C^TAC\mathbf{X} = \mathbf{0},$$
or
$$\ddot{\mathbf{X}} + D\mathbf{X} = \mathbf{0}, \qquad (13.14)$$
where
$$D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} k/m & 0 \\ 0 & 3k/m \end{bmatrix}.$$
Equation (13.14) now separates into the two differential equations
$$\ddot X + (k/m)X = 0, \qquad \ddot Y + 3(k/m)Y = 0,$$
which, unlike (13.12) and (13.13), are no longer simultaneous equations: they are uncoupled, and can be solved separately and independently for X and Y. We say more about the solution of differential equations in Chapter 18.
Problems

13.1 (Sections 13.1, 2). Find the eigenvalues and eigenvectors of the following matrices:
(a) $\begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix}$; (b) $\begin{bmatrix} 6 & 3 \\ 2 & 7 \end{bmatrix}$; (c) $\begin{bmatrix} 2 & 1 \\ 4 & 6 \end{bmatrix}$;
(d) $\begin{bmatrix} 1 & 1 \\ 4 & 5 \end{bmatrix}$; (e) $\begin{bmatrix} 1 & 2 \\ 14 & 5 \end{bmatrix}$; (f) $\begin{bmatrix} 2 & -2 \\ 4 & 6 \end{bmatrix}$.

13.2 (Section 13.1). Show that the eigenvalues of the symmetric matrix
$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix},$$
where a, b, and c are real numbers, are real.

13.3 (Section 13.1). Find the eigenvalues of
$$A = \begin{bmatrix} 6 & 3 \\ 2 & 7 \end{bmatrix}$$
(see Problem 13.1b). Find the inverse of A and its eigenvalues. What relationship, would you guess, exists between the eigenvalues of A and those of $A^{-1}$? Find the eigenvalues of $A^2$. How do they relate to those of A?

13.4 (Sections 13.1, 2). Find the eigenvalues and eigenvectors of
(a) $\begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{bmatrix}$; (b) $\begin{bmatrix} 2 & 1 & 2 \\ 1 & 2 & 2 \\ 2 & 1 & 2 \end{bmatrix}$;
(c) $\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & -1 \end{bmatrix}$; (d) $\begin{bmatrix} 6 & 5 & 5 \\ 5 & 6 & 5 \\ 5 & 5 & 6 \end{bmatrix}$.

13.5 (Sections 13.1, 2). Find the eigenvalues and eigenvectors of
$$\begin{bmatrix} 1 & 2 & 0 & 0 \\ 3 & 2 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$

13.6 (Sections 13.1, 2). Show that
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & 5 \end{bmatrix}$$
has a repeated eigenvalue. Find the corresponding eigenvectors. How many linearly independent eigenvectors are there?

13.7 (Sections 13.1, 2). Show that the matrix
$$A = \begin{bmatrix} -1 & -1 & a+1 \\ a+1 & -a & -1 \\ -a & a+1 & -a \end{bmatrix}$$
has a zero eigenvalue. For design reasons, a second eigenvalue must be 3. For what values of a does this occur? Find the third eigenvalue in each case.

13.8 (Sections 13.1, 2). A matrix is said to be idempotent if $A^2 = A$. Explain why all eigenvalues of A must be either 0 or 1. Show that
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 6 \\ 0 & -1 & -2 \end{bmatrix}$$
is idempotent. Find the eigenvalues and eigenvectors of A and $A^2$ and confirm the above result.

13.9 (Sections 13.1, 2). Let
$$A = \frac12\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}.$$
Show that $A^2 = I_4$. Explain why the eigenvalues of A must be either 1 or −1. Can A be diagonalized?

13.10 Find the eigenvalues $\lambda_1$, $\lambda_2$, $\lambda_3$ of
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
The trace of a square matrix is the sum of the elements in the leading diagonal. Thus if $B = [b_{ij}]$ is an n × n matrix, then
$$\operatorname{trace} B = b_{11} + b_{22} + \cdots + b_{nn}.$$
Confirm for A above that trace A = $\lambda_1 + \lambda_2 + \lambda_3$. Also verify that det A = $\lambda_1\lambda_2\lambda_3$.

13.11 (Section 13.3). Show that the vectors
$$s_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad s_2 = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}, \quad s_3 = \begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}$$
are linearly dependent.

13.12 (Section 13.6). Let
$$A = \begin{bmatrix} -4 & 1 & -2 \\ 2 & -2 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$
Find the eigenvalues of A and a set of corresponding eigenvectors. Hence construct a matrix C which makes $C^{-1}AC$ a diagonal matrix.

13.13 Find a matrix C which diagonalizes the matrix
$$A = \begin{bmatrix} 1 & 8 \\ 2 & 1 \end{bmatrix}.$$

13.14 (Section 13.5). Find a matrix C which diagonalizes
$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & -1 \end{bmatrix}.$$
Verify that $C^{-1}AC = D$, where D is the diagonal matrix of eigenvalues.

13.15 Using the diagonalization result
$$C^{-1}AC = D$$
for a matrix A which has n linearly independent eigenvectors, show that
$$\det A = \lambda_1\lambda_2\cdots\lambda_n,$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A. (Hint: use the result det AB = det A det B for square matrices.)

13.16 (Section 11.5). Find the eigenvalues and eigenvectors of the row-stochastic matrix
$$A = \begin{bmatrix} \tfrac14 & \tfrac12 & \tfrac14 \\ \tfrac12 & \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}.$$
Find a formula for $A^n$. How does $A^n$ behave as n → ∞?

13.17 Show that
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$$
is an orthogonal matrix. Describe the mapping defined by
$$X = Ax.$$
Which set of points remains unaffected by the mapping?

13.18 (Section 13.7). Show that
$$A = \frac12\begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$
is an orthogonal matrix.

13.19 Show that, in the transformation
$$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},$$
the angle between the two sets of axes is α. What do the axes of x and y become in the (X, Y) plane?

13.20 Show that the nonzero eigenvalues of the skew-symmetric matrix
$$A = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix}$$
are imaginary for a, b, c real.

13.21 Let
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
Show that
$$\det(A - \lambda I_3) = -\lambda^3 + 4\lambda^2 + \lambda - 4.$$
Verify that
$$-A^3 + 4A^2 + A - 4I_3 = 0.$$
In other words, the matrix A satisfies its own characteristic equation. This is known as the Cayley–Hamilton theorem, and holds generally for square matrices. Use the result to find the inverse matrix $A^{-1}$.

13.22 Find the eigenvalues and eigenvectors of
$$A = \begin{bmatrix} 5 & -1 & -3 & 3 \\ -1 & 5 & 3 & -3 \\ -3 & 3 & 5 & -1 \\ 3 & -3 & -1 & 5 \end{bmatrix}.$$
Construct a matrix C such that $C^{-1}AC$ is the diagonal matrix of eigenvalues. Write down det A.

13.23 (Section 13.6). Express the following quadratic forms in the form $x^TAx$, where A is a 3 × 3 symmetric matrix:
(a) $x_1^2 + x_2^2 + x_3^2 + 4x_1x_2 - 4x_1x_3 + 4x_2x_3$;
(b) $x_1x_2 - x_1x_3 + x_2x_3$.
Find eigenvalues of $A$ in each case, and find also a matrix $C$ which transforms each into the form
$$\lambda_1 X_1^2 + \lambda_2 X_2^2 + \lambda_3 X_3^2.$$

13.24 (Section 13.6). Which of the following quadratic forms are positive-definite?
(a) $4x_1^2 + x_2^2 - 4x_1x_2$;
(b) $x_1^2 + x_2^2 + 2x_3^2 + 2x_2x_3 + 2x_3x_1 + 4x_1x_2$;
(c) $6x_1^2 + 2x_2^2 - x_3x_1$.

13.25 (Section 13.8). Consider three particles, each of mass $m$, held in a straight line between fixed supports a distance $4a$ apart by four equal springs, each of unstretched length $a$ (as in Fig. 13.2, but with three particles). Consider longitudinal oscillations of the system and let $x$, $y$, $z$ be the extensions of the springs. Assuming Hooke's law with stiffness $k$ for the tension in each spring, show that $x$, $y$, $z$ satisfy the differential equations
$$k(-2x + y) = m\ddot{x}, \qquad k(x - 2y + z) = m\ddot{y}, \qquad k(y - 2z) = m\ddot{z}.$$
Express the equations in the matrix form
$$\ddot{\mathbf{x}} + A\mathbf{x} = \mathbf{0}.$$
Find the eigenvalues and eigenvectors of $A$. Construct a matrix $C$ such that $C^{T}AC$ is diagonal. Obtain differential equations for the normal coordinates $X$, $Y$, $Z$.

13.26 Let
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Calculate $A^2$ and $A^3$. Find a general formula for $A^n$.
Show that $A$ has two complex eigenvalues, and find the corresponding eigenvectors. Construct a matrix $C$ which diagonalizes $A$ and find a formula for $A^n$. Compare this result with the ad hoc method above.

13.27 Let $A$ be a square matrix, and let $S$ represent the sum of the powers of $A$ from $A$ up to $A^n$:
$$S = A + A^2 + A^3 + \cdots + A^n.$$
By multiplying the equation by $A$ and subtracting, show that
$$S = A(I - A^n)(I - A)^{-1},$$
and state any cases for which this method fails. If
$$A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$$
(see Examples 13.1 and 13.4), find a formula for $A^m$ and the sum
$$S = \sum_{m=1}^{n} A^m.$$

13.28 Let
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
Find the eigenvalues of $A$ and confirm that
$$s_1 = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}, \quad s_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad s_3 = \begin{bmatrix} -3 \\ 3 \\ 0 \end{bmatrix}$$
are eigenvectors of $A$. Construct the matrix $C$ and verify that
$$D = C^{-1}AC,$$
where $D$ is the diagonal matrix of eigenvalues. (This is a reworking of the problem at the beginning of Example 13.5, but with different eigenvectors.)

13.29 Given an $m \times n$ matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$
the determinant of any square $r \times r$ submatrix of $A$ obtained by eliminating $(m - r)$ rows and $(n - r)$ columns in $A$ is called an $r$th-order minor of $A$. Since the submatrix must be square, it follows that $1 \le r \le \min(m, n)$ ('min' means the smaller of $m$ or $n$ unless they are equal, in which case $\min(m, n) = m = n$). For example, if
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 10 & 11 & 12 & 5 \\ 9 & 8 & 7 & 6 \end{bmatrix},$$
then
$$B = \begin{vmatrix} 1 & 2 & 4 \\ 10 & 11 & 5 \\ 9 & 8 & 6 \end{vmatrix} = -80 \quad\text{and}\quad C = \begin{vmatrix} 11 & 5 \\ 8 & 6 \end{vmatrix} = 26$$
are examples of third-order and second-order minors of $A$ respectively. Find the remaining third-order minors of $A$.
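The definition of an $r$th-order minor in Problem 13.29 translates directly into a short enumeration: choose $r$ of the rows and $r$ of the columns, keep them in their original order, and take the determinant. The Python sketch below is one way to do this. It assumes small integer matrices, since cofactor expansion becomes far too slow for large ones, and the helper names `det` and `minors` are illustrative rather than taken from the text. It reproduces the two example minors $B = -80$ and $C = 26$ and counts the third-order minors, leaving their values as the exercise.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]   # delete row 0 and column j
        total += (-1) ** j * entry * det(sub)
    return total

def minors(A, r):
    """All r-th order minors of A, keyed by (row indices, column indices), 0-based."""
    m, n = len(A), len(A[0])
    result = {}
    for rows in combinations(range(m), r):
        for cols in combinations(range(n), r):
            result[(rows, cols)] = det([[A[i][j] for j in cols] for i in rows])
    return result

A = [[1, 2, 3, 4],
     [10, 11, 12, 5],
     [9, 8, 7, 6]]

third = minors(A, 3)
second = minors(A, 2)
print(third[((0, 1, 2), (0, 1, 3))])   # B = -80 (columns 1, 2, 4 in the text's numbering)
print(second[((1, 2), (1, 3))])        # C = 26 (rows 2, 3; columns 2, 4)
print(len(third))                      # 4 third-order minors in all
```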
13.30 (See Problem 13.29.) A matrix is said to have rank $r$ if at least one of its $r$th-order minors is not zero whilst every $(r + 1)$th-order minor is zero. Find the ranks of the following matrices:
(a) $\begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{bmatrix}$; (b) $\begin{bmatrix} 3 & 2 & 1 \\ 1 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix}$;
(c) $\begin{bmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 4 & 5 \\ 2 & 3 & 5 & 1 \end{bmatrix}$.

13.31 (See Problem 13.30.) The method for checking the rank of a matrix by calculating minors can be a lengthy procedure. An alternative approach uses the elementary row operations of Section 12.2. Given a matrix $A$, row operations are applied to reduce $A$ to echelon form, from which it is easier to test its rank. This is justified since it can be proved (but not here) that elementary row operations do not change the rank of a matrix. Express the matrix in Problem 13.30c in echelon form and check its rank.

13.32 Consider again Examples 13.6 and 13.7. Let the matrices in these examples be defined, respectively, by
$$A_1 = \begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & -1 \\ 2 & 2 & -1 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 1 & 0 \\ 2 & 0 & 0 \end{bmatrix}.$$
Both these matrices have a repeated eigenvalue. Find the ranks of the matrices $\lambda I_3 - A_1$ and $\lambda I_3 - A_2$ for all the eigenvalues. Confirm that if $\lambda$ is the repeated eigenvalue of $A_2$, then the rank of $\lambda I_3 - A_2$ is 1, and there are two eigenvectors associated with this eigenvalue: the vector space defined by this eigenvalue has dimension 2. (In general, for any eigenvalue $\lambda$, the rank of $\lambda I_n - A$ indicates the dimension of the vector space associated with $\lambda$. If the root is $r$-fold and the rank of $\lambda I_n - A$ is $s$, where $s$ must satisfy $n - 1 \ge s \ge n - r$, then the dimension of the vector space of $\lambda$ is $n - s$. If the eigenvalue is unique, then the vector space has dimension 1; in other words, there is just one eigenvector associated with the eigenvalue. If $r = 2$, there could be one or two eigenvectors depending on the rank, and so on.)
Find the eigenvalues of
$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & -1 & 0 & 2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$
Find the rank of $A$. What are the dimensions of the vector spaces associated with each eigenvalue?
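The row-reduction approach of Problem 13.31 can be sketched in a few lines of Python: reduce the matrix to echelon form and count the pivots that remain. This is a minimal sketch. It assumes exact rational arithmetic through `fractions.Fraction`; floating-point elimination would need a tolerance to decide what counts as zero. The function name is illustrative.

```python
from fractions import Fraction

def rank_by_row_reduction(A):
    """Rank of A = number of pivots after reduction to echelon form
    using elementary row operations, with exact Fraction arithmetic."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a nonzero entry in this column, at or below pivot_row.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                                     # no pivot in this column
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]  # interchange two rows
        for r in range(pivot_row + 1, rows):
            factor = M[r][col] / M[pivot_row][col]
            # Subtract a multiple of the pivot row to clear the entry below the pivot.
            M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return pivot_row

print(rank_by_row_reduction([[1, 2, 3], [2, 4, 6]]))   # 1 (second row is twice the first)
```

Because only the count of pivots is needed, the sketch stops at echelon form and does not go on to the fully reduced form.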
Part 3
Integration and differential equations

14 Antidifferentiation and area

CONTENTS

14.1 Reversing differentiation
14.2 Constructing a table of antiderivatives
14.3 Signed area generated by a graph
14.4 Case where the antiderivative is composite
Problems

Chapters 2–5 concerned differentiation of functions and some immediate applications. Chapters 14–17 describe integration, the second cornerstone of calculus.
A question that suggests itself quite naturally is this: 'given a function, what function (or functions) is it the derivative of?' This is the opposite, or inverse, process to differentiating something, and is therefore called antidifferentiation.
For example, given the function $x^2$, it is easy to verify that $\tfrac13 x^3$ is an antiderivative (and it might strike you that so is $1 + \tfrac13 x^3$, and infinitely many similar functions).
A related question is 'if the speed of a car, dx/dt, at distance x(t) from a starting point x = 0 at t = 0, is equal to $t^2$, what distance does it travel by time t?' (The required distance is $x(t) = \tfrac13 t^3$.)
In this chapter we construct a basic table of antiderivatives (14.2) by inverting
the table of derivatives (3.5). In Section 14.3 it is shown how the process of
antidifferentiation can be interpreted to obtain the signed area under a graph. Pay
particular attention to the idea of a signed area, in preparation for a proper
definition of integration, and much more extensive applications in later chapters.

14.1 Reversing differentiation


Compare the following two problems:
Problem A: $\dfrac{d}{dx}\sin x = f(x)$; what is f(x)?
Problem B: $\dfrac{d}{dx}F(x) = \cos x$; what is F(x)?
For Problem A we know already that
f(x) = cos x.
This provides one answer to Problem B, which is solved by
F(x) = sin x.
Since cos x is the derivative of sin x, we say that sin x is an antiderivative of cos x
(we say an antiderivative because it is not the only one; for example, sin x + 1 is
also an antiderivative).
The antidifferentiation question in Problem B can be expressed in various
ways; for example,
(a) What must be differentiated to get cos x?
(b) What curves have slope equal to cos x at every point?
(c) Find y as a function of x if dy /dx = cos x.
Finding antiderivatives is the opposite or inverse process to that of finding
derivatives.
The following examples show that a function f(x) has an infinite number of antiderivatives: there is an infinite number of functions whose derivatives are f(x). However, they are all very simple variants on a single function.

Example 14.1 Find y as a function of x if dy /dx = 2x.


One solution is $y = x^2$, because its derivative is 2x. But the derivatives of $x^2 + 3$, $x^2 - \tfrac12$, and so on are also equal to 2x. In fact
$$y = x^2 + C$$
is an antiderivative of 2x for any constant C.

[Fig. 14.1: the curves $y = x^2 + C$ for C > 0, C = 0 and C < 0, cut by the vertical line PQR.]

Some of these solutions are shown in Fig. 14.1. Different choices for C just shift the
graph bodily up or down parallel to itself. Therefore, at any particular value of x, such
as is represented by the vertical line PQR, the slopes are all the same, independently of
the value of C.
Evidently the same thing will happen whatever function we start with: if we
find one solution, we can add constants to obtain more.

Example 14.2 Find a collection of antiderivatives of sin 2x.

We want y such that dy/dx = sin 2x. If we differentiate a cosine we get something
involving a sine, so first of all test whether y = cos 2x is close to being an antiderivative
of sin 2x. We find that dy /dx = −2 sin 2x. This contains an unwanted factor (−2). It can
be eliminated by choosing instead
$$y = -\tfrac12\cos 2x,$$
for then we have $dy/dx = -\tfrac12(-2\sin 2x) = \sin 2x$, which is right. Therefore, one antiderivative is $-\tfrac12\cos 2x$, and the rest are of the form
$$y = -\tfrac12\cos 2x + C \quad (C \text{ is any constant}).$$

Example 14.3 Solve the equation $dy/dx = e^{-3x}$ (that is to say, find a collection of antiderivatives of $e^{-3x}$).
Try $y = e^{-3x}$; then $dy/dx = -3e^{-3x}$. To avoid the unwanted factor (−3) we should have taken
$$y = \frac{1}{(-3)}e^{-3x} = -\tfrac13 e^{-3x}.$$
From this we construct an infinite collection of antiderivatives:
$$-\tfrac13 e^{-3x} + C \quad (C \text{ any constant}).$$

It can be proved that the above process, of finding a particular antiderivative of a function and adding constants, generates all possible antiderivatives for that function.

Antiderivatives of f(x)
A function F(x) is called an antiderivative of f(x) if
$$\frac{d}{dx}F(x) = f(x).$$
If F(x) is any particular antiderivative of f(x), then all the antiderivatives are given by
$$F(x) + C,$$
where C can be any constant. (Therefore, any two antiderivatives differ by a constant.) (14.1)
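Definition (14.1) can be checked numerically. If F is an antiderivative of f, the slope of F estimated by a central difference, (F(x + h) − F(x − h))/(2h), should agree with f(x). The Python sketch below applies this to Example 14.2. It tests $-\tfrac12\cos 2x + C$ for several values of C, and also the rejected first guess cos 2x. It is an illustration only: the step h = 1e-5 and the tolerance 1e-6 are assumptions chosen for these smooth functions, agreement at a handful of points is evidence rather than proof, and the helper `is_antiderivative` is hypothetical.

```python
import math

def is_antiderivative(F, f, xs, h=1e-5, tol=1e-6):
    """True if the central-difference slope of F matches f at every x in xs."""
    for x in xs:
        slope = (F(x + h) - F(x - h)) / (2 * h)
        if abs(slope - f(x)) > tol:
            return False
    return True

f = lambda x: math.sin(2 * x)
sample_points = [-2.0, -0.5, 0.0, 0.7, 1.9, 3.0]

# Every member of the family -1/2 cos 2x + C passes the test ...
family_ok = all(
    is_antiderivative(lambda x, C=C: -0.5 * math.cos(2 * x) + C, f, sample_points)
    for C in (-3.0, 0.0, 1.0, 10.0)
)
# ... whereas the first guess cos 2x (slope -2 sin 2x) does not.
first_guess_ok = is_antiderivative(lambda x: math.cos(2 * x), f, sample_points)

print(family_ok, first_guess_ok)   # True False
```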

An antiderivative of a function is more usually called an indefinite integral of the function, and the process of getting it is called integration. If you know the term already, it is perfectly safe to use it. We shall change over to it in Chapter 15.

Example 14.4 Find all the antiderivatives of $x^3$.

We first have to find any y which fits the equation $dy/dx = x^3$. Differentiation reduces a power of x by unity, so try $y = x^4$:
$$dy/dx = 4x^3.$$
The factor 4 is unwanted; we needed $\tfrac14 x^4$ to give $x^3$. Therefore all antiderivatives are given by
$$y = \tfrac14 x^4 + C,$$
where C is any constant.

Sums of terms and constant multipliers are treated in the same way as in
differentiation: the multipliers stay as multipliers and each term is treated
separately, as in the next example.

Example 14.5 Obtain all the antiderivatives of $2e^{-3x} - \tfrac12 x^3 + 2$.

From the previous examples, one antiderivative of $e^{-3x}$ is $-\tfrac13 e^{-3x}$, and one for $x^3$ is $\tfrac14 x^4$. Also, one antiderivative of 2 is obviously 2x. Therefore one antiderivative of the given expression is
$$2\left(-\tfrac13 e^{-3x}\right) - \tfrac12\left(\tfrac14 x^4\right) + 2x,$$
and all its antiderivatives are of the form
$$-\tfrac23 e^{-3x} - \tfrac18 x^4 + 2x + C,$$
where C is any constant.

The following two examples show the importance in practice of including the
constant C.

Example 14.6 A point is at x = 2 on the x axis at time t = 0, then moves with velocity $v = t - t^2$. Find where it is at time t = 3.
Velocity is the rate at which displacement x changes with time: v = dx/dt. In this case
$$v = \frac{dx}{dt} = t - t^2.$$
Therefore x is some antiderivative of $t - t^2$. All of its antiderivatives are included in
$$x = \tfrac12 t^2 - \tfrac13 t^3 + C,$$
where C is any constant.
To find what value C must take in this case, we obviously have to take the starting point into consideration: x = 2 when t = 0. To obtain the value of C, substitute these values into our expression:
$$2 = 0 - 0 + C.$$
Therefore C = 2, so the position at any time is given by
$$x = \tfrac12 t^2 - \tfrac13 t^3 + 2.$$
Finally, when t = 3, we have $x = -\tfrac52$.
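The two steps of Example 14.6 are to take the general antiderivative and then fix C from the starting condition. Both can be mirrored with exact fractions in Python. This is a small sketch that assumes the antiderivative $\tfrac12 t^2 - \tfrac13 t^3 + C$ found above; the function name `x_general` is illustrative.

```python
from fractions import Fraction as Fr

def x_general(t, C):
    """General antiderivative of v = t - t^2, namely x = t^2/2 - t^3/3 + C."""
    return Fr(1, 2) * t**2 - Fr(1, 3) * t**3 + C

# The starting condition x = 2 at t = 0 fixes the constant: 2 = x_general(0, 0) + C.
C = 2 - x_general(Fr(0), Fr(0))

print(C)                     # 2
print(x_general(Fr(3), C))   # -5/2
```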
Example 14.7 Find the equation of the curve which passes through the point (π, −1) and whose slope is given by dy/dx = sin 2x.
Since the required y is an antiderivative of sin 2x, the equation of the curve must take the form
$$y = -\tfrac12\cos 2x + C,$$
where C is some (not 'any') constant. Since we also know that the curve passes through the point x = π, y = −1, we must require
$$-1 = -\tfrac12\cos 2\pi + C = -\tfrac12 + C,$$
so $C = -\tfrac12$. Finally the required curve is
$$y = -\tfrac12\cos 2x - \tfrac12.$$

Example 14.8 Obtain the antiderivatives of $(3x - 2)^3$.

As in the earlier examples, we try to guess the structure of y, given that $dy/dx = (3x - 2)^3$. There is not much to go on, so try an analogy with $x^3$; it would lead us to try something like $y = (3x - 2)^4$. To check this, differentiate using the chain rule with $u = 3x - 2$ and $y = u^4$:
$$\frac{dy}{dx} = 4(3x - 2)^3 \cdot 3 = 12(3x - 2)^3.$$
The factor 12 is unwanted; we really needed $y = \tfrac{1}{12}(3x - 2)^4$. Therefore all the antiderivatives are given by $y = \tfrac{1}{12}(3x - 2)^4 + C$.

The technique used in the previous example can be used for functions like $(ax + b)^n$, $e^{ax+b}$, $\cos(ax + b)$, and $\sin(ax + b)$. However, it would not work in this simple way for a function such as $(2x^2 - 3)^2$ or $\sin(2x^2 - 3)$: the antiderivative of $(2x^2 - 3)^2$ is not equal to $\tfrac13(2x^2 - 3)^3$, because $x^2$ is present rather than x (try it, using the chain rule).
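A quick numerical comparison makes the last point concrete. The Python sketch below checks the antiderivative of Example 14.8 with a central-difference slope. It then shows that the analogous guess $\tfrac13(2x^2 - 3)^3$ for $(2x^2 - 3)^2$ fails, because the chain rule produces an extra factor 4x. It is an illustration with an assumed step size and tolerance, not a proof; the helper name `slope` is our own.

```python
def slope(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Example 14.8: (1/12)(3x - 2)^4 should have derivative (3x - 2)^3.
F_good = lambda x: (3 * x - 2) ** 4 / 12
f_good = lambda x: (3 * x - 2) ** 3

# Naive analogue for (2x^2 - 3)^2. Its true derivative is 4x(2x^2 - 3)^2, not (2x^2 - 3)^2.
F_bad = lambda x: (2 * x**2 - 3) ** 3 / 3
f_bad = lambda x: (2 * x**2 - 3) ** 2

xs = [-1.0, 0.5, 1.2, 2.0]
good = all(abs(slope(F_good, x) - f_good(x)) < 1e-5 for x in xs)
bad = all(abs(slope(F_bad, x) - f_bad(x)) < 1e-5 for x in xs)
print(good, bad)   # True False
```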

Self-test 14.1
A point is at x = 0 on the x axis at time t = 0, and then moves along the x axis with velocity v = 2 sin(3t) + 4. Find the displacement of the point as a function of time.

14.2 Constructing a table of antiderivatives


Since antidifferentiation is the inverse of differentiation, any table of derivatives
can be read backwards in order to provide antiderivatives. Suppose that two
typical entries in a table of derivatives are as follows:
$$\begin{array}{ll}
\text{Given function} & \text{Derivative} \\
F(x) & f(x) = \dfrac{d}{dx}F(x) \\ \hline
\sin ax & a\cos ax \\
e^{ax} & a e^{ax}
\end{array}$$

By interchanging the columns and modifying the headings, we get two entries in
a possible table of antiderivatives:

$$\begin{array}{ll}
\text{Given function} & \text{One antiderivative} \\
f(x) & F(x) \\ \hline
a\cos ax & \sin ax \\
a e^{ax} & e^{ax}
\end{array}$$

However, these entries are not yet in the form we should like them. For example, for the first entry we would prefer to have cos ax in the left column, instead of a cos ax. Therefore divide both entries by the constant a and introduce the arbitrary constant C to register all the antiderivatives. We then have a more convenient table:

$$\begin{array}{ll}
\text{Given function} & \text{Antiderivatives} \\
f(x) & F(x) \\ \hline
\cos ax & \dfrac{1}{a}\sin ax + C \\
e^{ax} & \dfrac{1}{a}e^{ax} + C
\end{array}$$

By such means the short table (14.2) is produced. To verify any entry, differentiate the function in the right-hand column; the result should be the entry on the left. The letter C stands for 'any constant' or 'an arbitrary constant'.
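A short table of antiderivatives
$$\begin{array}{lll}
 & \text{Given function } f(x) & \text{Antiderivatives } F(x) \\ \hline
 & a \ (\text{constant}) & ax + C \\
* & x^m \ (\text{unless } m = -1) & \dfrac{1}{m+1}x^{m+1} + C \\
** & x^{-1} \ \left(\text{i.e. } \dfrac{1}{x}\right) & \begin{cases} \ln x + C & \text{if } x > 0 \\ \ln(-x) + C & \text{if } x < 0 \end{cases} \ \text{or } \ln|x| + C \ (x \neq 0) \\
 & e^{ax} & \dfrac{1}{a}e^{ax} + C \\
 & \cos ax & \dfrac{1}{a}\sin ax + C \\
 & \sin ax & -\dfrac{1}{a}\cos ax + C
\end{array}$$
(14.2)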
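Each row of table (14.2) can be spot-checked in the way the text suggests, by differentiating the right-hand entry. The Python sketch below does this numerically with a central difference. It uses the sample values a = 2.5 and m = 3 (any a ≠ 0 and m ≠ −1 would serve) and takes C = 0. The ** entry is written as ln|x|, so it is tested at negative as well as positive x. It is a sanity check under these assumptions, not a derivation.

```python
import math

a, m = 2.5, 3   # sample constants (assumed); a != 0 and m != -1

# Rows of table (14.2): (label, f, one antiderivative F with C = 0)
table = [
    ("a (constant)", lambda x: a, lambda x: a * x),
    ("x^m", lambda x: x**m, lambda x: x ** (m + 1) / (m + 1)),
    ("1/x", lambda x: 1 / x, lambda x: math.log(abs(x))),
    ("e^(ax)", lambda x: math.exp(a * x), lambda x: math.exp(a * x) / a),
    ("cos ax", lambda x: math.cos(a * x), lambda x: math.sin(a * x) / a),
    ("sin ax", lambda x: math.sin(a * x), lambda x: -math.cos(a * x) / a),
]

def slope(F, x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Test at negative and positive x, but never at x = 0, where 1/x is undefined.
points = [-1.7, -0.4, 0.3, 1.1]
for label, f, F in table:
    worst = max(abs(slope(F, x) - f(x)) for x in points)
    print(f"{label:12s} largest discrepancy {worst:.1e}")
```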

Notice particularly the two starred entries. The formula * covers most cases, but it does not produce antiderivatives of the function $x^{-1}$ (i.e. of 1/x). Here m = −1, so the entry on the right would require division by m + 1 = 0 and is therefore meaningless. Therefore the antiderivatives of $x^{-1}$ must be given by some different formula, and this is shown under **. All we have to do is to verify the formula **, as in the following example.
(The modulus or absolute value notation |x| is explained in Section 1.1.)

Example 14.9 Confirm that the antiderivatives of x−1 (i.e. 1/x) are given by
ln x + C if x is positive and by ln(−x) + C if x is negative, and that ln | x | + C
covers both cases.
(Remember that ln x does not have a meaning if x is negative or zero.) All we have to do
to verify the correctness of the formulae is to differentiate the proposed antiderivatives.
Since (d/dx) ln x = x−1, the result is right when x is positive.
Suppose now that x is negative. Then −x is positive, so ln(−x) has a meaning. Using
the chain rule (3.3) with u = −x,
$$\frac{d}{dx}\ln(-x) = \frac{1}{-x}(-1) = \frac{1}{x},$$
so the second result is confirmed.
But (see Section 1.1) |x| = x if x > 0 and |x| = −x when x < 0, so ln|x| is an antiderivative whether x is positive or negative.

Example 14.10 Find the antiderivatives of (2x − 3)−1.


The power m = −1 is the case ** in (14.2), so try y = ln(2x − 3), supposing initially that 2x − 3 > 0. Then
$$\frac{dy}{dx} = \frac{2}{2x - 3}, \quad\text{or}\quad 2(2x - 3)^{-1}.$$
The unwanted factor 2 will not appear if we try again with $y = \tfrac12\ln(2x - 3)$, 2x − 3 > 0. Also 2x − 3 might be negative, so we introduce a modulus sign (compare Example 14.9). Finally we have
$$y = \tfrac12\ln|2x - 3| + C.$$

Self-test 14.2
Using Table (14.2) construct the antiderivatives of
(a) $e^{3x} + 2\sin 2x$; (b) $3/x^2$; (c) $4x^{-1}$.

Self-test 14.3
Can you guess and justify the antiderivatives of
(a) $xe^{-x^2}$; (b) $e^{\sin x}\cos x$?

14.3 Signed area generated by a graph


Figure 14.2 shows the graph of a function y = f(x) between x = a and x = b, in
which we assume that the x and y scales are the same. Divide the range as shown
into N sections so that in any section y is either positive only, or negative only.
Let A1, A2, … denote the geometrical areas of these segments, and A the sum of
these. Geometrical area is always positive, so A1, A2, … are all positive numbers.
Then
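$$A = A_1 + A_2 + A_3 + \cdots + A_N \tag{14.3}$$

is naturally called 'the geometrical area between the curve and the x axis'.

[Fig. 14.2: graph of y = f(x) from x = a to x = b. The sections have geometrical areas A1, A2, A3, …, AN, with y positive (+) over A1 and A3 and negative (−) over A2 and AN.]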

[Fig. 14.3 Increment of signed area: (a) the graph of y = f(x) from a to b, with the strip PQRS between x and x + δx; (b) the strip magnified, showing $\delta\mathcal{A}$ and the rectangle PQNS of width δx.]

We require a different quantity, $\mathcal{A}$, called the signed area between the curve and the x axis. This is defined, as in Fig. 14.2, by
$$\mathcal{A} = A_1 - A_2 + A_3 - \cdots - A_N. \tag{14.4}$$

In forming $\mathcal{A}$, we use the rule: if y is positive, the contribution takes a positive sign; if y is negative, the contribution takes a negative sign. This quantity has a far more useful range of applications than has geometrical area. For example, suppose that a point is moving on a straight line; then the signed displacement from its starting point is equal to the signed area of its velocity–time graph.
We show how to calculate the signed area $\mathcal{A}$ of the graph of y = f(x) between two given points, x = a and x = b (Fig. 14.3a). Let $\mathcal{A}(x)$ represent the signed area between a and a variable point with coordinate x (Fig. 14.3a). Increase x by a small step δx; the signed area from a to x + δx is $\mathcal{A}(x + \delta x)$. The change in signed area, $\delta\mathcal{A} = \mathcal{A}(x + \delta x) - \mathcal{A}(x)$ (positive or negative), is equal to the signed area of PQRS in Figs 14.3a and b. This is very nearly equal to the signed area of the rectangle PQNS in Fig. 14.3b (in this case the required sign is negative), so
$$\delta\mathcal{A} \approx f(x)\,\delta x,$$
which automatically takes the right sign. Therefore
$$\frac{\delta\mathcal{A}}{\delta x} \approx f(x).$$
Now let δx → 0; '≈' becomes '=', and $\delta\mathcal{A}/\delta x$ becomes $d\mathcal{A}/dx$, so that
$$\frac{d\mathcal{A}}{dx} = f(x). \tag{14.5}$$
From (14.5), $\mathcal{A}(x)$ must be one of the antiderivatives of f(x). To find which one, choose any particular antiderivative and call it F(x). Then $\mathcal{A}(x)$ can differ from F(x) only by a constant, k say, so
$$\mathcal{A}(x) = F(x) + k. \tag{14.6}$$
To determine the value of k, use the fact that $\mathcal{A} = 0$ at x = a, the starting point; that is to say,
$$\mathcal{A}(a) = 0.$$
Therefore, from (14.6),
$$\mathcal{A}(a) = 0 = F(a) + k, \quad\text{or}\quad k = -F(a), \tag{14.7}$$
a known quantity, since we selected the antiderivative F(x) of f(x) ourselves. The required signed area $\mathcal{A}$ between a and b is given by
$$\mathcal{A} = \mathcal{A}(b) = F(b) - F(a),$$
by putting x = b into (14.6), with (14.7) as the value of k.

The signed area $\mathcal{A}$ of f(x) between x = a and b
$$\mathcal{A} = F(b) - F(a),$$
where F(x) is any (continuous) antiderivative of f(x). (14.8)

In practice we naturally use the simplest antiderivative F(x), in which the C in the
table is zero. But any nonzero choice of C will cancel out and disappear, since it
will be present in both F(a) and F(b).

Example 14.11 Find the signed area of y = x2 from x = −1 to x = 2.


(This happens to be the same as the geometrical area, because y is never negative.) Here a = −1 and b = 2. Also, the simplest antiderivative of $x^2$ is
$$F(x) = \tfrac13 x^3.$$
Therefore, from (14.8),
$$\mathcal{A} = F(b) - F(a) = \tfrac13(2)^3 - \tfrac13(-1)^3 = 3.$$
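Formula (14.8) is simple enough to state as code. The Python sketch below uses exact fractions to evaluate Example 14.11. It also confirms the remark above that adding a constant to F makes no difference, since the constant cancels in F(b) − F(a). The helper name `signed_area` is illustrative.

```python
from fractions import Fraction as Fr

def signed_area(F, a, b):
    """Signed area of f between a and b by (14.8): F(b) - F(a), F an antiderivative of f."""
    return F(b) - F(a)

F = lambda x: Fr(1, 3) * x**3                 # simplest antiderivative of x^2
print(signed_area(F, Fr(-1), Fr(2)))          # 3

# A shifted antiderivative gives the same answer: the constant cancels.
F_shifted = lambda x: F(x) + 7
print(signed_area(F_shifted, Fr(-1), Fr(2)))  # 3
```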

There is a special notation, the square-bracket notation, which we shall use generally from now onward.

Square-bracket notation
$[F(x)]_a^b$ stands for F(b) − F(a). (14.9)

Example 14.12 Find (a) the signed area, and (b) the geometrical area, between
y = sin x and the x axis from x = 0 to x = 2π.
(a) f(x) = sin x, so F(x) = −cos x is an antiderivative. From (14.8) and (14.9), with a = 0 and b = 2π, the signed area $\mathcal{A}$ is given by
$$\mathcal{A} = [-\cos x]_0^{2\pi} = -[\cos x]_0^{2\pi} = -(\cos 2\pi - \cos 0) = 0,$$
as is expected from Fig. 14.4: the positive and negative sections cancel.

[Fig. 14.4: y = sin x for 0 ≤ x ≤ 2π, between y = −1 and y = 1; positive (+) from O to π and negative (−) from π to 2π.]

(b) The geometrical area A can be obtained by splitting the range into a positive section from 0 to π and a negative section from π to 2π (see Fig. 14.4). The negatively signed section from π to 2π must have its sign reversed in order to give the geometrical area:
A = [geometrical area of 1st loop] + [geometrical area of 2nd loop]
  = [signed area of 1st loop] − [signed area of 2nd loop].
This is equal to
$$[F(x)]_0^{\pi} - [F(x)]_{\pi}^{2\pi} = [-\cos x]_0^{\pi} - [-\cos x]_{\pi}^{2\pi} = (-\cos\pi + \cos 0) - (-\cos 2\pi + \cos\pi)$$
$$= (1 + 1) - (-1 + (-1)) = 2 + 2 = 4.$$
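The procedure in part (b) can be sketched in Python: split the range wherever the integrand changes sign, then add the absolute values of the signed areas of the sections. Here the breakpoints (0, π, 2π) are assumed to be supplied by hand from a sketch of the graph, as in the example. The function names are illustrative.

```python
import math

def F(x):
    """An antiderivative of sin x."""
    return -math.cos(x)

def signed_area(F, a, b):
    return F(b) - F(a)

def geometric_area(F, breakpoints):
    """Sum of |signed area| over consecutive sections. The breakpoints must include
    the ends of the range and every zero of f, so that f has one sign per section."""
    return sum(abs(signed_area(F, p, q)) for p, q in zip(breakpoints, breakpoints[1:]))

print(round(signed_area(F, 0, 2 * math.pi), 12))                # 0.0
print(round(geometric_area(F, [0, math.pi, 2 * math.pi]), 12))  # 4.0
```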

Self-test 14.4
Find the signed area and the geometrical area between the x axis and $y = e^x - 1$ for −1 ≤ x ≤ 1.

14.4 Case where the antiderivative is composite


The argument leading up to eqn (14.8) implicitly assumes that F(x) should be continuous over the whole range a ≤ x ≤ b (no steps, no infinities, etc.). Cases where f(x) or f′(x) have finite jumps in value at given points can be assimilated correctly into the area formula, eqn (14.8), if arbitrary constants are adjusted to secure the continuity of F(x).

Example 14.13 Define a continuous antiderivative F(x) on 0 ≤ x ≤ 2, given that
$$f(x) = \begin{cases} 1, & 0 \le x \le 1, \\ 2, & 1 < x \le 2 \end{cases}$$
(see Fig. 14.5).
Over 0 ≤ x ≤ 1, let F(x) = x, and over 1 < x ≤ 2, suppose that F(x) = 2x + c, where c is a constant. (These are antiderivatives of f(x) over the respective intervals.) If we choose c = −1, the jump in value of F(x) at x = 1 is removed, so F(x) given by
$$F(x) = \begin{cases} x, & 0 \le x \le 1, \\ 2x - 1, & 1 \le x \le 2, \end{cases}$$
is continuous. In Fig. 14.5, F(x) is shown as a broken line. Equation (14.8) then produces an area equal to F(2) − F(0) = 3. This is obviously correct, being equal to the sum of the rectangular areas in Fig. 14.5.

[Fig. 14.5: f(x) of Example 14.13 on 0 ≤ x ≤ 2, with the continuous antiderivative F(x) shown broken. Fig. 14.6: f(x) of Example 14.14 on 0 ≤ x ≤ 2, with its continuous antiderivative F(x).]

Example 14.14 Define a continuous antiderivative F(x) on 0 ≤ x ≤ 2, given that
$$f(x) = \begin{cases} x, & 0 \le x \le 1, \\ 1, & 1 < x \le 2 \end{cases}$$
(see Fig. 14.6).
Put
$$F(x) = \begin{cases} \tfrac12 x^2, & 0 \le x \le 1, \\ x + c, & 1 \le x \le 2, \end{cases}$$
where c is some constant. The jump at x = 1 is removed if we put $1 + c = \tfrac12$, so $c = -\tfrac12$, giving a continuous antiderivative defined by
$$F(x) = \begin{cases} \tfrac12 x^2, & 0 \le x \le 1, \\ x - \tfrac12, & 1 < x \le 2, \end{cases}$$
as shown in Fig. 14.6. The area is $F(2) - F(0) = (2 - \tfrac12) - 0 = \tfrac32$.
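The step that matters in Examples 14.13 and 14.14 is choosing the constant on the second piece so that F(x) has no jump at the join. A minimal Python sketch of that step for Example 14.14 is below, using exact fractions. It also shows the wrong area that would result if the jump were left in. The names `left`, `right_no_c` and `F` are illustrative.

```python
from fractions import Fraction as Fr

# Pieces of an antiderivative for f(x) = x on [0, 1] and f(x) = 1 on (1, 2].
left = lambda x: Fr(1, 2) * x**2      # antiderivative of x
right_no_c = lambda x: x              # antiderivative of 1, before adding c

# Choose c so that the two pieces agree at the join x = 1.
join = Fr(1)
c = left(join) - right_no_c(join)     # 1/2 - 1 = -1/2

def F(x):
    """Continuous antiderivative on 0 <= x <= 2."""
    return left(x) if x <= 1 else right_no_c(x) + c

print(c, F(Fr(2)) - F(Fr(0)))          # -1/2 3/2
print(right_no_c(Fr(2)) - left(Fr(0)))  # 2, the wrong answer if the jump is left in
```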

Problems
Note: In case you have already met the term 'indefinite integral', the term 'antiderivative' has the same meaning.

14.1 Obtain all the antiderivatives of the following functions, and check their correctness by differentiating your results.
(a) $x^5$; $3x^4$; $2x^3$; $\tfrac13 x^2$; $6x$; $f(x) = 3$; $f(x) = 0$.
(b) $-\tfrac12 x^{-3}$; $2x^{-2}$; $3x^{-1}$ when $x > 0$ (if in doubt, see (14.2)).
(c) $x^{3/2}$; $x^{1/2}$; $x^{-1/2}$; $x^{4/3}$; $x^{-1/3}$.
(d) $1/x^2$ (write as $x^{-2}$); $1/x^4$; $1/x$ when $x < 0$ (see (14.2)).
(e) $\sqrt{x}$ $(= x^{1/2})$; $1/\sqrt{x}$; $1/x^{3/2}$.
(f) $3x$; $\tfrac12 x^2$; $1/(3x^2)$; $3/(4x^{1/4})$.
(g) $e^x$; $e^{-x}$; $5e^{2x}$; $e^{-\frac12 x}$; $3e^{-2x}$.
(h) $\cos x$; $\cos 3x$; $\sin x$; $\sin 3x$.
(i) $1 - 3x$; $1 + 2x - 3x^2$; $3x^4 - 4x^2 + 5$.
(j) $x(x + 1)$ (expand by removing the brackets); $(1 + 2x)(1 - 2x)$; $(x + 1)^2$; $(1 + x)(1 - 1/x)$; $x^2(x + x^2)$.
(k) $(x + 1)/x$ (turn it into the sum of two terms); $(2\sqrt{x} - 1)/\sqrt{x}$ (put $\sqrt{x} = x^{1/2}$ and $1/\sqrt{x} = x^{-1/2}$, then simplify as the sum of two terms); $(x + 1)^2/x^3$.
(l) $e^x + e^{-x}$; $2e^{2x} - 3e^{3x}$; $e^{\frac12 x}(1 + e^{-\frac12 x})$; $1/e^{2x}$ $(= e^{-2x})$; $(e^{2x} - e^{-2x})/e^{2x}$.
(m) $2\cos 2x$; $3\sin\tfrac12 x - 4\cos\tfrac13 x$; $2 + \sin 2x$.

14.2 Find all the antiderivatives of the following by trial and error, as explained in the text. Confirm your answers by differentiation.
(a) $(x + 1)^3$ (start by trying $(x + 1)^4$); $(3x + 1)^3$; $(3x - 8)^3$.
(b) $(1 - x)^4$; $(8 - 3x)^{1/2}$; $(1 - x)^{1/3}$.
(c) $(2x + 1)^{-2}$; $(1 - x)^{-1/2}$; $2/(3x + 1)^3$; $1/[4(1 - x)^{1/4}]$.
(d) $2\cos(3x - 2)$ (try first $\sin(3x - 2)$); $3\sin(1 - x)$; $2\sin(2 - 3x)$.

14.3 (See Example 14.10.) Find the antiderivatives of the following.
(a) $1/(x + 1)$; $1/(x - 1)$; $3/(3x - 2)$; $2/(5x - 4)$.
(b) $1/(1 - x)$; $1/(4 - 5x)$.
(c) $x/(x + 1)$ (it can be written as $1 - 1/(x + 1)$).
(d) $(x + 1)/(x - 1)$ (compare (c)).

14.4 Use the identities $\cos^2 A = \tfrac12(1 + \cos 2A)$, $\sin^2 A = \tfrac12(1 - \cos 2A)$, and $\sin A\cos A = \tfrac12\sin 2A$ to get rid of the squares and products in the following expressions, and in that way obtain the antiderivatives.
(a) $\cos^2 x$; $\sin^2 x$; $\sin x\cos x$.
(b) $3\cos^2 2x$; $\sin^2 3x$; $\sin 2x\cos 2x$.
(c) $\cos^4 x$ (you will have to use the identities twice).

14.5 (a) Show that $(d/dx)(xe^x) = e^x + xe^x$. By rearranging the terms, show that the antiderivatives of $xe^x$ are $e^x(x - 1) + C$ (use the fact that $e^x$ can be written as $(d/dx)e^x$). Confirm the result by differentiation.
(b) Differentiate $x^2e^x$. By rearranging the terms and using the result in (a), find the antiderivatives of $x^2e^x$.

14.6 Use the result (14.8) to obtain the signed areas between the given graphs and the x axis. By roughly sketching the graphs of the functions for which you obtain zero, explain this fact.
(a) y = x, 0 ≤ x ≤ 2;
(b) y = x, −1 ≤ x ≤ 1;
(c) $y = -x^2$, 0 ≤ x ≤ 1;
(d) y = cos x, −π ≤ x ≤ π;
(e) y = cos x − 1, 0 ≤ x ≤ 2π;
(f) $y = x^{-1}$, −2 ≤ x ≤ −1 (note that x is negative in this range);
(g) y = sin 3x, $0 \le x \le \tfrac23\pi$;
(h) y = 1/(1 − x), 2 ≤ x ≤ 3 (note: 1 − x is negative over this range, so make sure you understand Example 14.10; alternatively, write 1/(1 − x) = −1/(x − 1)).

14.7 Obtain the geometric area between the graph and the x axis in each of the following cases. It is necessary to treat each positive or negative section separately.
(a) y = −3, 0 ≤ x ≤ 1 (this is negative all the way);
(b) $y = x^3$, −1 ≤ x ≤ 1;
(c) $y = 4 - x^2$, −1 ≤ x ≤ 3;
(d) y = cos x, 0 ≤ x ≤ 2π.

14.8 Find the most general function which satisfies the following equations. (Note: $\dfrac{d^2x}{dt^2} = \dfrac{d}{dt}\left(\dfrac{dx}{dt}\right)$, $\dfrac{d^3x}{dt^3} = \dfrac{d}{dt}\left(\dfrac{d^2x}{dt^2}\right)$, etc. Work in several steps, finding the next lowest derivative in each step.)
(a) $\dfrac{d^2x}{dt^2} = 0$; (b) $\dfrac{d^2x}{dt^2} = t$;
(c) $\dfrac{d^2x}{dt^2} = \sin t$;
(d) $\dfrac{d^3x}{dt^3} = 0$; (e) $\dfrac{d^3x}{dt^3} = \cos t$;
(f) $\dfrac{d^2x}{dt^2} = g$ (g is a constant);
(g) $\dfrac{d^4y}{dx^4} = w_0$ ($w_0$ is constant; this relates to the displacements y(x) of a bending beam).
15 The definite and indefinite integral

CONTENTS

15.1 Signed area as the sum of strips
15.2 Numerical illustration of the sum formula
15.3 The definite integral and area
15.4 The indefinite-integral notation
15.5 Integrals unrelated to area
15.6 Improper integrals
15.7 Integration of complex functions: a new type of integral
15.8 The area analogy for a definite integral
15.9 Symmetric integrals
15.10 Definite integrals having variable limits
Problems

In Chapter 14 we showed that the signed area under the graph of y = f(x) was related to the antiderivative F(x). Calculating areas is only of limited practical interest in itself, but the existence of a universal connection between antiderivatives and signed areas allows us to adapt this idea in the form of an area analogy that is applicable to a wide variety of problems.
This approach leads into the definition of an integral of a given function f as the limit of a sum of infinitesimal terms $\sum_a^b f(x)\,\delta x$. This is equal to the signed area under the graph of f(x), and we have shown how to evaluate this as F(b) − F(a), where [a, b] is the range and F is any (continuous) antiderivative of f (see eqn (14.8) and Section 14.4). There are innumerable physical and other problems leading to summations of this type, to which the analogy applies. The standard integral notation facilitates manipulations.

15.1 Signed area as the sum of strips


Consider signed area from another point of view. Figure 15.1a represents the
graph of a function f(x) between x = a and x = b. Since we are talking about area
we suppose the x and y scales are equal. We shall divide the area under the curve
into slices, as follows. Split the range of x from a to b into N equal steps, each of
length δx = (b − a)/N. Over each step (typically PQ in Fig. 15.1a) we mount a strip,
an element of signed area δA, shown shaded. The total signed area A under the
curve is equal to the sum of the signed areas of all these strips
[Fig. 15.1: (a) graph of f(x) from x = a to x = b, with a shaded strip of signed area δA mounted on the step PQ; (b) the strip magnified, showing the rectangle PQRS of width δx from x to x + δx.]

$$A = \sum_{x=a}^{x=b} \delta A.$$

The typical area element is shown magnified in Fig. 15.1b. When δx is small, the signed area δA is approximately equal to the signed area of the rectangle PQRS, so that
$$A = \sum_{x=a}^{x=b} \delta A \approx \sum_{x=a}^{x=b} f(x_P)\,\delta x,$$
where $x_P$ is the x coordinate of P. When we take narrower (and correspondingly more numerous) strips, it seems reasonable to expect that the accuracy of this estimate will improve, and indeed that as δx → 0 the error in replacing the true area by strips will shrink to zero. Therefore (dropping the suffix in $x_P$) we obtain the following result:

Signed area as a sum
Signed area A of y = f(x), a ≤ x ≤ b:
$$A = \lim_{\delta x \to 0} \sum_{x=a}^{x=b} f(x)\,\delta x. \tag{15.1}$$

15.2 Numerical illustration of the sum formula


We shall specify the sum in (15.1) in more detail, with the idea of obtaining a
specific algorithm for actually calculating such a sum on a computer. In Fig. 15.2
we show the graph y = f(x). There are N equal subdivisions: we shall call the
length of a subdivision h rather than δx as in Fig. 15.1, since this is conventional
when making numerical calculations, and
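$$h = \frac{b - a}{N}.$$
The N + 1 ordinates, $x_0$ to $x_N$, are:
$$x_0 = a, \quad x_1 = a + h, \quad x_2 = a + 2h, \quad \ldots, \quad x_N = a + Nh = b.$$

[Fig. 15.2: graph of y = f(x) over [a, b], divided at $x_0, x_1, x_2, \ldots, x_n, x_{n+1}, \ldots, x_{N-1}, x_N$ into N strips of width h; the nth strip has height $f(x_n)$.]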

Then the area of the nth approximating rectangle in (15.1) is $f(x_n)h$ for n = 0 to N − 1, and the approximating sum in (15.1) becomes
$$A \approx hf(a) + hf(a + h) + \cdots + hf(a + (N - 1)h) = h\sum_{n=0}^{N-1} f(a + nh).$$

Computation of approximating sums (rectangle rule)
If y = f(x) with range x = a to b, the signed-area approximation with N subdivisions is
$$A \approx h\sum_{n=0}^{N-1} f(x_n),$$
where h = (b − a)/N and $x_n = a + nh$. (15.2)

When we take larger and larger N, and smaller and smaller h correspondingly,
we expect that the approximation will approach the exact value. The following
example illustrates this for the very simple case of the signed area associated with
a straight line. The algorithm (15.2) is very easy to program on a computer for
any function f(x). It is called the rectangle rule.
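As the text says, (15.2) is very easy to program. Here is one possible Python version; the name `rectangle_rule` is our own choice. The usage lines apply it to sin x on [0, π], whose signed area is 2 (the first loop in Example 14.12).

```python
import math

def rectangle_rule(f, a, b, N):
    """Approximate the signed area of y = f(x) from a to b by (15.2):
    h * [f(x_0) + f(x_1) + ... + f(x_{N-1})], with h = (b - a)/N and x_n = a + n*h."""
    h = (b - a) / N
    return h * sum(f(a + n * h) for n in range(N))

# Usage: signed area of sin x over [0, pi]; the exact value is 2.
for N in (10, 100, 1000):
    print(N, rectangle_rule(math.sin, 0.0, math.pi, N))
```

Only the left-hand end of each strip is sampled, exactly as in (15.2); the right-hand value f(b) is never used.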

Example 15.1 Calculate the sum in (15.2) when f(x) = x, a = −1, b = 2, for N = 30, 300, 3000, …, showing how the results approach the exact value 1.5.
The graph y = x in Fig. 15.3 is a straight line, from which it is very easy to see that the signed area is exactly 1.5. The computed results are as follows (b − a = 3, and so h = 3/N):

$$\begin{array}{l|lllll}
N & 30 & 300 & 3000 & 30\,000 & \ldots \\ \hline
h & 0.1 & 0.01 & 0.001 & 0.0001 & \ldots \\
A \approx & 1.35 & 1.485 & 1.4985 & 1.499\,85 & \ldots
\end{array}$$

The approximations are approaching 1.5, though very slowly: for this straight line the sum works out exactly as 1.5 − 4.5/N, so the error is proportional to h. We shall see in Section 16.3 how to improve such calculations.
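The sums in Example 15.1 can be evaluated directly from (15.2). The Python sketch below does so. Beside each result it prints the closed form 1.5 − 4.5/N, which comes from summing the arithmetic progression by hand: $h\sum_{n=0}^{N-1}(a + nh) = hNa + h^2N(N-1)/2$, with a = −1 and h = 3/N. The agreement shows that the error shrinks only in proportion to h. The function name is our own.

```python
def rectangle_rule(f, a, b, N):
    """Rectangle rule (15.2): h * [f(x_0) + ... + f(x_{N-1})], with x_n = a + n*h."""
    h = (b - a) / N
    return h * sum(f(a + n * h) for n in range(N))

# Example 15.1: f(x) = x on [-1, 2]; the exact signed area is 1.5.
for N in (30, 300, 3000, 30000):
    approx = rectangle_rule(lambda x: x, -1.0, 2.0, N)
    closed_form = 1.5 - 4.5 / N    # h*N*a + h**2*N*(N-1)/2 with a = -1, h = 3/N
    print(f"N = {N:6d}  h = {3 / N:<8g} sum = {approx:.6f}  1.5 - 4.5/N = {closed_form:.6f}")
```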
[Fig. 15.3: the line y = x from x = −1 to x = 2; the signed area is negative (−ve) for −1 ≤ x ≤ 0 and positive (+ve) for 0 ≤ x ≤ 2.]

Self-test 15.1
Apply the sum (15.2) to $y = 1 - x^2$ with a = 0, b = 1 and N = 5. Draw a sketch
showing the curve and the approximating strips. Calculate the area, and
compare this with the exact value obtained from the antiderivative.

15.3 The definite integral and area


Until now we have considered only the special case of N equal steps of width
δx = (b − a)/N, and we caused the step lengths to shrink to zero by letting N → ∞,
as in eqn (15.1). Now suppose that all the steps are small, but not necessarily equal. In this way we define a partition, $(\delta x_1, \delta x_2, \delta x_3, \ldots)$ of the interval [a, b], where the nth step is $\delta x_n$. To approximate to f(x) over the nth interval we shall now allow any value of x, $x = X_n$, say, in the nth interval to be chosen as representative of that interval, so that the representative ordinate for the nth interval is $f(X_n)$. We then have a variety of possible stepwise approximations to f(x) on [a, b]. It can be proved that, provided f is continuous on [a, b], as finer and finer partitions of [a, b] are used (so that the largest step tends to zero), the corresponding summations $S = \sum f(X_n)\,\delta x_n$ always approach the same number, equal to the signed area A. Since there is great freedom in the choice of the partitions $\delta x_n$ and of the representative points $X_n$, we may use the simple form given by (15.1), namely

$$\lim_{\delta x\to 0}\sum_{x=a}^{x=b} f(x)\,\delta x$$

to express a general instance of the summation.
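The freedom in choosing the partition and the points $X_n$ can be tried out numerically. The sketch below (hypothetical function `riemann_sum`; the uniformly random cut points and representative points are an illustrative assumption, not part of the text) forms $S = \sum f(X_n)\,\delta x_n$ for $f(x) = x^2$ on [0, 1]; as the number of steps grows, the sums should settle near the signed area 1/3.

```python
import random


def riemann_sum(f, a, b, n_steps, rng):
    """Sum of f(X_n) * dx_n over a random partition of [a, b] with a random point X_n in each step."""
    # n_steps - 1 random cut points give n_steps unequal steps dx_n
    cuts = sorted(rng.uniform(a, b) for _ in range(n_steps - 1))
    points = [a] + cuts + [b]
    total = 0.0
    for left, right in zip(points, points[1:]):
        x_rep = rng.uniform(left, right)     # representative point X_n in this step
        total += f(x_rep) * (right - left)   # f(X_n) * dx_n
    return total


rng = random.Random(1)  # fixed seed so that runs are repeatable
for n_steps in (10, 1000, 100000):
    print(n_steps, riemann_sum(lambda x: x * x, 0.0, 1.0, n_steps, rng))
```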


Historically, a large letter S, for ‘sum’, used to be printed to express summation, instead of ∑. The integral sign ∫ that we now introduce arose as a stretched letter S:

$$\int_a^b f(x)\,dx.$$

Notation: definite integral

(a) When b ≥ a, $\int_a^b f(x)\,dx$ stands for $\lim_{\delta x\to 0}\sum_{x=a}^{x=b} f(x)\,\delta x$.

(b) When b < a, $\int_a^b f(x)\,dx$ stands for $-\int_b^a f(x)\,dx$.   (15.3)
The notation $\int_a^b f(x)\,dx$ is called a definite integral, and is usually read as ‘the integral from a to b of f(x) dx’. The letter x is the variable of integration, and f is the integrand, or the function to be integrated.


We showed in Sections 14.3 and 14.4 that for the cases illustrated, in which b ≥ a, the signed area under the graph of f(x) is equal to $[F(x)]_a^b = F(b) - F(a)$, where F(x) is any continuous antiderivative of f(x). But the signed area is alternatively given by the right-hand side of (15.3a), so for b > a

$$\int_a^b f(x)\,dx = [F(x)]_a^b.$$

By (15.3b), this is still true when b ≤ a. Therefore we have the important general result, sometimes called ‘the basic theorem of integral calculus’.

Integral/antiderivative connection

For all values of a and b,

$$\int_a^b f(x)\,dx = [F(x)]_a^b \quad\text{if } F(x) \text{ is continuous.} \tag{15.4}$$

(We have used a lot of intuition about area to arrive at (15.4); the full justification,
due to Riemann, is far beyond the scope of this book.) This relation will enable
us to evaluate the summations produced in problems that have no original con-
nection with area, provided we can obtain an antiderivative F that is continuous
over the interval [a, b]. Further applications of this area analogy are illustrated in
Section 15.8.
Notice that in a definite integral any letter can be used for the variable of integra-
tion, because the letter itself disappears in the course of evaluation; for example,
$$\int_0^1 x\,dx = \left[\tfrac12 x^2\right]_0^1 = \left[\tfrac12 x^2\right]_{x=0}^{x=1} = \tfrac12; \qquad \int_0^1 t\,dt = \left[\tfrac12 t^2\right]_0^1 = \left[\tfrac12 t^2\right]_{t=0}^{t=1} = \tfrac12;$$

and so on.

Consequently, the letter used is called a dummy variable. We should not choose a
letter already being used for something else.

Self-test 15.2 Evaluate

(a) $\int_a^b x^2\,dx$;   (b) $\int_x^{x^2} e^{xt}\,dt$ (x ≠ 0).

15.4 The indefinite-integral notation


The symbol

$$\int f(x)\,dx,$$

with no limits of integration specified, is called an indefinite integral of f(x), and has exactly the same meaning as the word ‘antiderivative’ that we have used up until now, and which we have denoted by F(x) (see (14.1)).

Indefinite-integral notation
∫ f(x) dx, with no limits specified, stands for any antiderivative of f(x). (15.5)

The expression $\int_a^b f(x)\,dx$ is called a definite integral because it takes a definite value: it represents a specified signed area and there is no arbitrary constant on the right. However, an indefinite integral ∫ f(x) dx does not stand for a number; it represents an antiderivative, which is a function. This function is to be written in terms of the current variable of integration, so the name of the variable is usually significant. Also there will be a disposable, or arbitrary, constant, in the usual way. For example,

$$\int x^2\,dx = \tfrac13 x^3 + C, \qquad \int e^{2t}\,dt = \tfrac12 e^{2t} + C, \qquad \int \cos u\,du = \sin u + C$$

and so on, where C is a constant. In some problems we shall assign or discover a definite value for C; in others we might want to keep C as an arbitrary constant in order to express every possible antiderivative (hence, ‘indefinite’ integral).

Example 15.2 Find the signed area A associated with the graph $y = 3e^{2x}$ from x = 1 to x = 3 using the new notation.
We shall need an antiderivative F(x) (i.e. an indefinite integral) of $3e^{2x}$. Using the notation (15.5), we may write

$$F(x) = \int 3e^{2x}\,dx = \tfrac32 e^{2x}$$

(for this purpose any antiderivative will do, so we have put C = 0). Then, from (15.4),

$$A = \int_1^3 3e^{2x}\,dx = \left[\tfrac32 e^{2x}\right]_1^3 = \tfrac32\left[e^{2x}\right]_1^3 = \tfrac32(e^6 - e^2).$$
1

By the definition (14.1) of antiderivative, f(x) is an antiderivative of df(x)/dx, that is to say,

$$\int \frac{df(x)}{dx}\,dx = f(x) + A,$$

where A is a constant. Next, write ∫ f(u) du = F(u), say, and suppose that the variable of integration u is a function of another variable x. Consider the derivative of F{u(x)} with respect to x:

$$\frac{dF(u)}{dx} = \frac{du}{dx}\,\frac{dF(u)}{du} \quad\text{(by the chain rule (3.3))}$$
$$= f(u)\,\frac{du}{dx} \quad\text{(by the definition of } F(u)\text{)}.$$

Therefore F{u(x)} = ∫ f(u) du is an antiderivative of f{u(x)}{du/dx}; that is,

$$\int f\{u(x)\}\,\frac{du}{dx}\,dx = F\{u(x)\} + B = \int f(u)\,du + B,$$

where B is a constant. We now have

Two identities for indefinite integrals

(a) $\int \frac{df(x)}{dx}\,dx = f(x) + A$, with A a constant.

(b) $\int f\{u(x)\}\,\frac{du}{dx}\,dx = \int f(u)\,du + B$, with B a constant, and any function u(x).   (15.6)
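Identity (15.6b) can be checked on a definite integral. In this sketch we take $u(x) = x^2$ and f(u) = cos u (an illustrative choice), so both sides should equal sin 1 ≈ 0.8415 when x runs from 0 to 1 and u from u(0) = 0 to u(1) = 1. The quadrature is the midpoint rule, a variant of (15.2) that samples each strip at its centre; the helper names are hypothetical.

```python
import math


def midpoint_rule(f, a, b, n_strips):
    """Sum of f at the centre of each of n equal strips, times the strip width."""
    h = (b - a) / n_strips
    return h * sum(f(a + (k + 0.5) * h) for k in range(n_strips))


def u(x):
    return x * x        # illustrative choice u(x) = x^2


def du_dx(x):
    return 2 * x


# Left side of (15.6b): integrate f{u(x)} du/dx with respect to x from 0 to 1
lhs = midpoint_rule(lambda x: math.cos(u(x)) * du_dx(x), 0.0, 1.0, 10_000)
# Right side: integrate f(u) = cos u with respect to u from u(0) to u(1)
rhs = midpoint_rule(math.cos, u(0.0), u(1.0), 10_000)
print(lhs, rhs, math.sin(1.0))
```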

Self-test 15.3

Evaluate the indefinite integral

$$\int \sin x\cos x\,dx.$$

15.5 Integrals unrelated to area


Integrals arise constantly in applications, but only seldom is there any direct
connection with area. The following Example starts by giving information that
seems to have nothing to do with area. However, we show that the problem can be
thought of in terms of an area, and therefore can be solved in terms of a definite
integral.

Example 15.3 A small object P is pushed steadily along the x axis from x = 0 to x = 1, against a resistive force $f(x) = x^2$. Find the work done against the resistance.
Divide the range x = 0 to 1 into a large number of short steps of length δx. In general, if the resistive force is constant the work done over a distance is (force) × (distance moved). Although the force on P is not constant, over a short distance δx the work δW done by the applied force is given approximately by

$$\delta W \approx f(x)\,\delta x = x^2\,\delta x.$$

The total work W is given by

$$W = \sum_{x=0}^{x=1}\delta W \approx \sum_{x=0}^{x=1} x^2\,\delta x.$$

Letting δx → 0, we obtain exactly

$$W = \lim_{\delta x\to 0}\sum_{x=0}^{x=1} x^2\,\delta x. \tag{15.7}$$

But this expression matches eqn (15.3). Consequently we can deduce from (15.4) that

$$W = \int_0^1 x^2\,dx = \left[\tfrac13 x^3\right]_0^1 = \tfrac13, \tag{15.8}$$

and this is equal to the area under the curve $y = x^2$ between x = 0 and x = 1.

Similarly, if there arises, in any context whatever, a sum of the type

$$\lim_{\delta x\to 0}\sum_{x=a}^{x=b} f(x)\,\delta x, \tag{15.9}$$

then such a sum can always be interpreted as representing, by eqn (15.4), the
signed area of y = f(x) between x = a and x = b. We shall call this observation the
area analogy. Its usefulness goes further than the connection of integrals with
antiderivatives, as illustrated in Example 15.6 and Section 15.8.

Example 15.4 An object is driven along a straight line with velocity $v(t) = e^{t/2}$ between times t = 0 and t = 2. There is a resistive force $g(v) = 3v^2$, where v is velocity. Find the total work done against the resistance.
In a short interval between times t and t + δt, the distance travelled, δx, is given approximately by

$$\delta x \approx v(t)\,\delta t = e^{t/2}\,\delta t.$$

The work δW done in this time interval is approximated by

$$\delta W \approx g(v)\,\delta x = 3v^2\,\delta x \approx 3v^2\left(e^{t/2}\,\delta t\right) = 3e^{t}\left(e^{t/2}\,\delta t\right) = 3e^{3t/2}\,\delta t.$$

Therefore the total work W required is given by

$$W = \lim_{\delta t\to 0}\sum_{t=0}^{t=2} 3e^{3t/2}\,\delta t = \int_0^2 3e^{3t/2}\,dt = 2\left[e^{3t/2}\right]_0^2 = 2(e^3 - 1).$$
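The steps of Example 15.4 translate into a short loop: at each time step the contribution g(v)·v·δt is added, using the left-hand end of each interval as in (15.2). This is an illustrative sketch with hypothetical function names; the total should approach 2(e³ − 1) ≈ 38.17 as the number of steps grows.

```python
import math


def velocity(t):
    return math.exp(t / 2)          # v(t) = e^(t/2)


def resistance(v):
    return 3 * v * v                # g(v) = 3 v^2


def work_done(t_end, n_steps):
    """Sum of g(v) * dx, with dx = v(t) * dt, over equal time steps (left ends)."""
    dt = t_end / n_steps
    total = 0.0
    for k in range(n_steps):
        v = velocity(k * dt)
        total += resistance(v) * v * dt
    return total


exact = 2 * (math.exp(3) - 1)
for n_steps in (10, 1000, 100000):
    print(n_steps, work_done(2.0, n_steps), exact)
```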

Example 15.5 During a rainy period extending from t = 0 to t = 10 days, the rainfall rate r from moment to moment in units of centimetres per day is found to be $r(t) = \tfrac35 t - \tfrac{3}{50}t^2$. Find the total depth of rainfall, R, for the period.
Take a short time interval from t to t + δt (expressed as a fraction of a day). During this period, the rainfall δR is given approximately by

$$\delta R \approx r(t)\,\delta t = \left(\tfrac35 t - \tfrac{3}{50}t^2\right)\delta t$$

(‘approximately’ because the rate of fall r varies a little even through a short time). The total rainfall from t = 0 to 10 days is equal to the sum of all the contributions as the steps δt tend to zero (while becoming proportionately more numerous):

$$R = \lim_{\delta t\to 0}\sum_{t=0}^{t=10}\left(\tfrac35 t - \tfrac{3}{50}t^2\right)\delta t = \int_0^{10}\left(\tfrac35 t - \tfrac{3}{50}t^2\right)dt \quad\text{(by (15.3))}$$
$$= \left[\tfrac35\left(\tfrac12 t^2\right) - \tfrac{3}{50}\left(\tfrac13 t^3\right)\right]_0^{10} = \tfrac{3}{10}(10)^2 - \tfrac{1}{50}(10)^3 = 10 \text{ (cm)}.$$

Example 15.6 Suppose that, in Example 15.5, the rainfall rate is given by $r(t) = t^{1/2}e^{-t}$ (cm per day). Obtain the total rainfall R between t = 0 and 10 days.
Proceeding as before, the total rainfall is given by

$$R = \int_0^{10} t^{1/2}e^{-t}\,dt.$$

We cannot find an indefinite integral in terms of elementary functions to enable R to be evaluated. However, we know from (15.9) that R is equal to the area under the (r, t) graph, which can be computed numerically by using the numerical method of eqn (15.2).
For example, divide the range t = 0 to 10 into N strips (so that δt = 10/N); then the approximation corresponding to (15.2) becomes

$$R \approx \frac{10}{N}\sum_{n=0}^{N-1} t_n^{1/2}e^{-t_n}, \quad\text{where } t_n = n\left(\frac{10}{N}\right).$$

The following computed values show how the exact result is approached when we take N larger and larger:

N     5        10       100      1000
δt    2.00     1.00     0.10     0.01
R     0.4701   0.7070   0.8796   0.8859

The exact answer is 0.886 07… .
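A sketch of the computation behind the table, assuming only the rectangle rule (15.2) with $t_n = 10n/N$; the function name `rainfall` is hypothetical. Its printed values can be compared with the table above.

```python
import math


def rainfall(n_strips, t_end=10.0):
    """Rectangle rule (15.2) for R = integral of t^(1/2) e^(-t) from 0 to t_end."""
    dt = t_end / n_strips
    return dt * sum(math.sqrt(n * dt) * math.exp(-n * dt) for n in range(n_strips))


for n_strips in (5, 10, 100, 1000):
    print(f"N = {n_strips:5d}   dt = {10 / n_strips:5.2f}   R = {rainfall(n_strips):.4f}")
```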

We shall show in Chapter 16 that we are not tied to eqn (15.2) for calculating
signed area, but can find far better computing formulae.

Self-test 15.4
The mean value of a periodic function f(t) of period T is given by

$$m(T) = \frac{1}{T}\int_0^T f(t)\,dt.$$

If f(t) = a cos ωt (T = 2π/ω, a > 0), then m(T) = 0. If f(t) is an alternating current, for example, then its mean value tells us nothing about the size of the current. Instead, a measure of its magnitude is the root mean square (rms) defined by

$$\text{rms}[f(t)] = \left[\frac{1}{T}\int_0^T |f(t)|^2\,dt\right]^{1/2}.$$

Find the rms of f(t) = a cos ωt.

15.6 Improper integrals


If a definite integral has an infinite range, or the integrand becomes infinite at
some point in its range, the integral is said to be improper. Usually these present
no particular problem.
Example 15.7 Evaluate $\int_0^\infty e^{-2x}\,dx$.

Putting $\int e^{-2x}\,dx = -\tfrac12 e^{-2x}$, we have

$$\int_0^\infty e^{-2x}\,dx = -\tfrac12\left[e^{-2x}\right]_0^\infty = -\tfrac12(0 - 1) = \tfrac12.$$

Example 15.8 Evaluate $\int_1^\infty \dfrac{dx}{x^2}$.

$$\int_1^\infty x^{-2}\,dx = \left[-x^{-1}\right]_1^\infty = [0 - (-1)] = 1.$$

In Examples 15.7 and 15.8 we have (see Fig. 15.4) two cases of an infinitely
long figure which encloses a finite area. This does not always happen, even if the
integrand goes to zero when x → ∞.

Fig. 15.4 The graphs of $y = e^{-2x}$ (x ≥ 0) and $y = 1/x^2$ (x ≥ 1).

Example 15.9 Consider $\int_1^\infty \dfrac{dx}{x}$.

We have

$$\int_1^\infty x^{-1}\,dx = [\ln x]_1^\infty.$$

The logarithm becomes infinite as x becomes infinite, so the integral is meaningless.

The function $x^{-1}$ does not tend to zero fast enough to keep the area finite as we extend the range to infinity.


The integral $\int_a^\infty f(x)\,dx$ is defined by the limit $\lim_{X\to\infty}\int_a^X f(x)\,dx$ if the limit exists: such integrals are also called infinite integrals.
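The limiting definition can be explored numerically by stopping the range at X and letting X grow. In the sketch below (midpoint rule, hypothetical helper name), the integral of $x^{-2}$ from 1 to X settles towards 1, as in Example 15.8, while that of $x^{-1}$ keeps increasing like ln X, as in Example 15.9. A finite computation can only suggest convergence; it cannot prove it.

```python
import math


def midpoint_rule(f, a, b, n_strips):
    """Sum of f at the centre of each of n equal strips, times the strip width."""
    h = (b - a) / n_strips
    return h * sum(f(a + (k + 0.5) * h) for k in range(n_strips))


# Truncate the infinite range at X and watch what happens as X grows
for X in (10.0, 100.0, 1000.0):
    converging = midpoint_rule(lambda x: x ** -2, 1.0, X, 100_000)  # exact: 1 - 1/X
    diverging = midpoint_rule(lambda x: 1 / x, 1.0, X, 100_000)     # exact: ln X
    print(X, converging, diverging, math.log(X))
```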
The case when the integrand becomes infinite at some point in its range has similar features:

Example 15.10 Consider (a) $\int_0^1 x^{-1/2}\,dx$; (b) $\int_0^1 x^{-1}\,dx$.

Notice that $x^{-1/2}$ and $x^{-1}$ are infinite at x = 0.

(a) $$\int_0^1 x^{-1/2}\,dx = 2\left[x^{1/2}\right]_0^1 = 2[1 - 0] = 2.$$

Therefore the integral gives no problem; it is again a case of an infinitely extended figure (extended in the y direction this time) containing a finite area.
(b) On the other hand,

$$\int_0^1 x^{-1}\,dx = [\ln x]_0^1,$$

and the integral this time is infinite, because ln x → −∞ as x → 0 from above.

There are improper integrals which do not work out for a different reason:

Example 15.11 Consider the integrals (a) $\int_0^X \cos x\,dx$, (b) $\int_0^\infty \cos x\,dx$.

(a) We have

$$\int_0^X \cos x\,dx = [\sin x]_0^X = \sin X.$$

So long as X is finite, there is no problem.

(b) However, for $\int_0^\infty \cos x\,dx$ we would, straightforwardly, have a term sin ∞ to interpret. The only sensible meaning that we could attach to sin ∞ is that it stands for $\lim_{X\to\infty}\sin X$. But sin X has no definite limit as X → ∞; it goes up and down between ±1 for ever.

Improper integrals which give a definite finite result are said to converge. If not,
they are said to diverge.

Self-test 15.5
Using (15.6), evaluate

$$\int_0^\infty x e^{-x^2}\,dx.$$

15.7 Integration of complex functions: a new type of integral

To differentiate or integrate a function containing the ‘imaginary’ element i, simply treat i like an ordinary real constant. Thus, for example,

$$\frac{d}{dx}e^{ix} = ie^{ix} \quad\text{and}\quad \int e^{ix}\,dx = \frac{1}{i}e^{ix} + C = -ie^{ix} + C,$$

where C is an arbitrary constant (which in this context we would allow to be itself a complex number). Suppose that a and b are real numbers, and that c = a + ib. Then

$$\frac{d}{dx}e^{cx} = ce^{cx},$$

which implies (provided c ≠ 0)

$$\int e^{cx}\,dx = \frac{1}{c}e^{cx} + C. \tag{15.10}$$

We use (15.10) with c = a + ib to work out two integrals U and V which frequently occur in practice:

$$U = \int e^{ax}\cos bx\,dx \quad\text{and}\quad V = \int e^{ax}\sin bx\,dx.$$

Omit the arbitrary constant C for the moment. Observe that

$$\begin{aligned}
U + iV &= \int e^{ax}\cos bx\,dx + i\int e^{ax}\sin bx\,dx\\
&= \int e^{ax}(\cos bx + i\sin bx)\,dx = \int e^{ax}e^{ibx}\,dx\\
&= \int e^{(a+ib)x}\,dx = \frac{1}{a+ib}\,e^{(a+ib)x} \qquad\text{(from (15.10))}\\
&= \frac{a-ib}{a^2+b^2}\,e^{ax}(\cos bx + i\sin bx)\\
&= \frac{1}{a^2+b^2}\,e^{ax}\left[(a\cos bx + b\sin bx) + i(-b\cos bx + a\sin bx)\right].
\end{aligned}$$

Equate this last expression to U + iV: the real and imaginary parts must separately be equal; so, after introducing the arbitrary real constant C, we have

(a) $\int e^{ax}\cos bx\,dx = \dfrac{1}{a^2+b^2}\,e^{ax}(a\cos bx + b\sin bx) + C$,

(b) $\int e^{ax}\sin bx\,dx = \dfrac{1}{a^2+b^2}\,e^{ax}(-b\cos bx + a\sin bx) + C$.   (15.11)
The integrals can be expressed more simply in terms of a phase angle. Put

$$\frac{a}{(a^2+b^2)^{1/2}} = \cos\phi \quad\text{and}\quad \frac{b}{(a^2+b^2)^{1/2}} = -\sin\phi$$

into (15.11a), and

$$\frac{-b}{(a^2+b^2)^{1/2}} = \cos\theta \quad\text{and}\quad \frac{a}{(a^2+b^2)^{1/2}} = -\sin\theta$$

into (15.11b). Notice also that φ and θ must therefore be related by

$$\theta = \phi - \tfrac12\pi.$$

Then (15.11) becomes

(a) $\int e^{ax}\cos bx\,dx = \dfrac{1}{(a^2+b^2)^{1/2}}\,e^{ax}\cos(bx + \phi) + C$,

(b) $\int e^{ax}\sin bx\,dx = \dfrac{1}{(a^2+b^2)^{1/2}}\,e^{ax}\cos\left(bx + \phi - \tfrac12\pi\right) + C$,

where $\cos\phi = a/(a^2+b^2)^{1/2}$, $\sin\phi = -b/(a^2+b^2)^{1/2}$.   (15.12)
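Python's built-in complex numbers make it easy to check (15.11) and (15.12) against the complex antiderivative $e^{(a+ib)x}/(a+ib)$ of (15.10). This is a sketch with hypothetical function names; the constant C is taken as zero throughout, and φ is obtained with `math.atan2` so that $\cos\phi = a/(a^2+b^2)^{1/2}$ and $\sin\phi = -b/(a^2+b^2)^{1/2}$. Since the antiderivative tends to zero as x → ∞ when a < 0, −U(0) gives the integral evaluated in Example 15.12.

```python
import cmath
import math


def u_v_complex(a, b, x):
    """(U, V) at x as the real and imaginary parts of e^((a+ib)x) / (a+ib), from (15.10)."""
    z = cmath.exp(complex(a, b) * x) / complex(a, b)
    return z.real, z.imag


def u_v_formula(a, b, x):
    """(U, V) at x from (15.11a) and (15.11b), with C = 0."""
    scale = math.exp(a * x) / (a * a + b * b)
    return (scale * (a * math.cos(b * x) + b * math.sin(b * x)),
            scale * (-b * math.cos(b * x) + a * math.sin(b * x)))


def u_phase_form(a, b, x):
    """U at x from (15.12a): e^(ax) cos(bx + phi) / (a^2 + b^2)^(1/2)."""
    phi = math.atan2(-b, a)  # gives cos(phi) = a/r and sin(phi) = -b/r
    return math.exp(a * x) * math.cos(b * x + phi) / math.hypot(a, b)


a, b = -1.0, 2.0
for x in (0.0, 0.7, 2.5):
    print(x, u_v_complex(a, b, x), u_v_formula(a, b, x), u_phase_form(a, b, x))

# U(x) -> 0 as x -> infinity when a < 0, so the integral from 0 to infinity is -U(0)
print("integral of e^(-x) cos 2x from 0 to infinity:", -u_v_formula(a, b, 0.0)[0])
```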

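Example 15.12 Evaluate $I = \int_0^\infty e^{-x}\cos 2x\,dx$.

Equation (15.11a) or (15.12a) can be used directly with a = −1, b = 2. However, we will go through the working from first principles, but express the argument differently. Remember that

$$e^{\alpha + i\beta} = e^{\alpha}e^{i\beta} = e^{\alpha}(\cos\beta + i\sin\beta);$$

then we have $e^{-x}\cos 2x = \operatorname{Re} e^{(-1+2i)x}$. Therefore

$$I = \int_0^\infty e^{-x}\cos 2x\,dx = \operatorname{Re}\int_0^\infty e^{(-1+2i)x}\,dx = \operatorname{Re}\left[\frac{1}{-1+2i}\,e^{(-1+2i)x}\right]_0^\infty$$
$$= \operatorname{Re}\left(0 - \frac{1}{-1+2i}\right) = \operatorname{Re}\frac{1+2i}{5} = \frac15.$$

(The upper limit contributes nothing because $|e^{(-1+2i)x}| = e^{-x} \to 0$ as x → ∞.)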

Self-test 15.6
Evaluate

$$\int_0^\infty e^{-x}\cos bx\,dx.$$

15.8 The area analogy for a definite integral

A signed area can be represented as a definite integral as in (15.3). Conversely, any definite integral $\int_a^b f(x)\,dx$, whatever it represents, can be interpreted as representing the signed area of the graph y = f(x) between a and b (15.1). The connection with area means that we have a picture of an integral which can often give useful information without the need to find an indefinite integral, which might in any case be impossible. One example of this is the simple numerical method described in Section 15.2. We restate the connection, which we called the area analogy.

The area analogy

The definite integral $\int_a^b f(x)\,dx$ (b ≥ a) represents the signed area of the graph of y = f(x) from x = a to b. (15.13)

The following section illustrates the use of the principle (15.13): it will also be
referred to in later chapters.

15.9 Symmetric integrals


In this section we shall use t instead of x, and x instead of y, so that we consider
functions of the form
x = f(t).
The reason is that time t is commonly the physical variable in contexts where
these techniques are found useful.
Sometimes, by using the area analogy (15.13), the graph of the integrand of $\int_a^b f(t)\,dt$ makes it obvious that the value of a definite integral is zero. Figure 15.5 shows some simple cases.

Fig. 15.5 (a) $x = t^3$; $\int_{-1}^{1} t^3\,dt = 0$. (b) x = sin t; $\int_{-\pi}^{\pi}\sin t\,dt = 0$. (c) x = t + sin 2t; $\int_{-\pi/2}^{\pi/2}(t + \sin 2t)\,dt = 0$. In each panel the positive (+) and negative (−) areas cancel.

The ranges of integration on the two sides of the origin are equal, and because
of the special symmetry the positive and negative contributions cancel out. Such
functions are called odd functions, or functions odd about the origin. They have
the following property (see Section 1.4):
Odd functions f(t)

satisfy the condition f(−t) = −f(t). (15.14)

Some basic odd functions are

t, $t^3$, $t^5$, … , and their reciprocals,
sin at, $\sin^3 at$, … , and tan at, $\tan^3 at$, … , where a is constant.

Symmetrical integrals over functions which are odd about the origin

If f(−t) = −f(t) then

$$\int_{-c}^{c} f(t)\,dt = 0. \tag{15.15}$$

Another useful class consists of the even functions, whose graphs are symmetrical about the vertical axis:

Even functions f(t)

f(t) is even if f(−t) = f(t). (15.16)

Some basic even functions are

$t^2$, $t^4$, $t^6$, … , and their reciprocals,
cos at and $\cos^n at$ (a is a constant),

and even powers of any odd function, such as $\tan^6 at$. It is also useful to realize that

(odd function) × (even function) = (odd function),
(odd function) × (odd function) = (even function).

Some even functions are shown in Fig. 15.6.

Fig. 15.6 Even functions: (a) x = cos t; (b) $x = t^4$; (c) $x = 1/(1 + t^2)$.

Example 15.13 Show that the following integrals are zero:

(a) $\int_{-\pi/2}^{\pi/2} t^4\sin 3t\,dt$;   (b) $\int_{-\pi}^{\pi} t^5\cos 3t\cos\tfrac12 t\,dt$;   (c) $\int_{-1}^{1}\left(e^{2t} - e^{-2t}\right)dt$.

(a) The function $t^4$ is even and sin 3t is odd, so the integrand is odd. Since the range is symmetrical about the origin, the integral is zero.
(b) $t^5$ is odd, cos 3t is even, and cos ½t is even, so the integrand is odd and the integral is zero as in (a).
(c) $e^{2t} - e^{-2t}$ is odd (put −t in place of t in the function: it just changes its sign). Therefore the integral is zero.

If we integrate an even function between ±c, the graph shows that we get equal contributions from both sides of the origin, which gives the following result.

Symmetrical integrals of even functions

If f(t) is even, then

$$\int_{-c}^{c} f(t)\,dt = 2\int_0^c f(t)\,dt. \tag{15.17}$$
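A numerical cross-check of (15.15) and (15.17), applied to the integrands of Example 15.13, is sketched below. The midpoint rule and the helper name are assumptions for illustration; the odd integrands should give values very close to zero, and the two sides of (15.17) for $t^4$ should both be close to 2/5.

```python
import math


def midpoint_rule(f, a, b, n_strips):
    """Sum of f at the centre of each of n equal strips, times the strip width."""
    h = (b - a) / n_strips
    return h * sum(f(a + (k + 0.5) * h) for k in range(n_strips))


# The odd integrands of Example 15.13, each paired with the half-width c of its range
odd_cases = [
    (lambda t: t**4 * math.sin(3 * t), math.pi / 2),
    (lambda t: t**5 * math.cos(3 * t) * math.cos(t / 2), math.pi),
    (lambda t: math.exp(2 * t) - math.exp(-2 * t), 1.0),
]
for f, c in odd_cases:
    print(midpoint_rule(f, -c, c, 10_000))   # (15.15): expect values very close to 0

# (15.17) for the even function t^4 on -1 <= t <= 1
whole = midpoint_rule(lambda t: t**4, -1.0, 1.0, 10_000)
twice_half = 2 * midpoint_rule(lambda t: t**4, 0.0, 1.0, 10_000)
print(whole, twice_half)                     # both close to 2/5
```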

These ideas may also be useful if there is special symmetry about some point
other than the origin.

Example 15.14 Show that $\int_0^{2\pi}\cos^3 t\,dt = 0$.

Fig. 15.7 (a) x = cos t and (b) $x = \cos^3 t$ for 0 ≤ t ≤ 2π, each made up of positive (+) and negative (−) pieces; in (a) the points O, A, B, C, D mark the divisions.

In the graph of x = cos t (Fig. 15.7) the parts OBA and DBC are congruent, and similarly for the other pair of divisions; in fact all four divisions are congruent. For the graph of $x = \cos^3 t$ the shape is changed, but the four pieces remain congruent and retain their original sign. The resulting cancellation gives zero for the integral.

Self-test 15.7
Without evaluating the integrals explain why

$$\int_{-1}^{1}\sinh^3 x\cosh^2 x\,dx \quad\text{and}\quad \int_{-\pi/2}^{3\pi/2}\cos^3 t\,dt$$

are both zero.

15.10 Definite integrals having variable limits


Integrals of the following type occur rather frequently in applications:

$$I(x) = \int_c^x f(t)\,dt, \tag{15.18}$$

where c is a constant. Although I(x) depends on x, this is still a definite integral because it has limits of integration; no arbitrary constant occurs. Notice that we avoid using x as the variable of integration when a limit of integration involves x: the same letter would be serving two totally different purposes. Therefore we have changed the variable of integration to t.
Suppose that F(t) represents any particular continuous indefinite integral, or antiderivative, of f(t). Then, as always,

$$I(x) = [F(t)]_c^x = F(x) - F(c). \tag{15.19}$$

Since F(c) is constant,

$$\frac{dI(x)}{dx} = \frac{d}{dx}\int_c^x f(t)\,dt = \frac{dF(x)}{dx} = f(x).$$

Similarly we can obtain

$$\frac{d}{dx}\int_x^c f(t)\,dt = -f(x).$$

Now consider the more complicated case

$$K(x) = \int_{u(x)}^{v(x)} f(t)\,dt = F(v(x)) - F(u(x)).$$

By using the chain rule, we have

$$\frac{d}{dx}F(v(x)) = \frac{dF(v)}{dv}\,\frac{dv(x)}{dx} = f(v(x))\,\frac{dv(x)}{dx},$$

with a similar result for F(u(x)), and finally we have the results

Differentiation of integrals with respect to limits

(a) $\dfrac{d}{dx}\int_{u(x)}^{v(x)} f(t)\,dt = f(v(x))\,\dfrac{dv(x)}{dx} - f(u(x))\,\dfrac{du(x)}{dx}$.

(b) (Special cases): $\dfrac{d}{dx}\int_c^x f(t)\,dt = f(x)$ and $\dfrac{d}{dx}\int_x^c f(t)\,dt = -f(x)$.   (15.20)

(The results (15.20b) are simply (15.20a) in the respective cases v(x) = x, u(x) = c, or v(x) = c, u(x) = x.) It is worth noticing that (15.20) does not require you to integrate anything!


Example 15.15 Obtain dI/dx when $I(x) = \int_{x^2}^{3x^2} e^t\,dt$.

Here $f(t) = e^t$, $u(x) = x^2$, $v(x) = 3x^2$. Therefore

$$\frac{dI(x)}{dx} = e^{3x^2}\cdot 6x - e^{x^2}\cdot 2x = 2x\left(3e^{3x^2} - e^{x^2}\right).$$
dx

Equation (15.20b) shows that $I(x) = \int_c^x f(t)\,dt$ is an antiderivative of f(x). It might be thought that, by choosing various values for c, we could reproduce all the antiderivatives F(x) corresponding to f(x). However, this expectation is only sometimes correct.

Example 15.16 Let f(x) = cos x, with antiderivatives sin x + C, where C is an arbitrary constant. Demonstrate that for the integral $\int_c^x f(t)\,dt$ it is not possible to find a value of c which will reproduce the antiderivative sin x + 1000.
We have

$$\int_c^x \cos t\,dt = [\sin t]_c^x = \sin x - \sin c.$$

But −1 ≤ sin c ≤ 1 no matter what value of c we take, so we could never make the integral equal to sin x + 1000.

Example 15.17 The function shown in Fig. 15.8a is described by f(x) = 1 when 0 ≤ x < 1, f(x) = −1 when 1 ≤ x < 2, and f(x) = 0 when x ≥ 2. Sketch a graph of the function

$$I(x) = \int_0^x f(t)\,dt.$$

Range 0 ≤ x ≤ 1.

$$I(x) = \int_0^x 1\,dt = x. \quad\text{(i)}$$

Range 1 ≤ x ≤ 2. In this range, f(t) is described by a different expression, −1 instead of 1, so we must split up the integral:

$$I(x) = \int_0^x f(t)\,dt = \int_0^1 f(t)\,dt + \int_1^x f(t)\,dt$$
$$= 1 + \int_1^x (-1)\,dt \quad\text{(after using (i) at } x = 1\text{)}$$
$$= 1 - (x - 1) = 2 - x. \quad\text{(ii)}$$

Fig. 15.8 (a) y = f(x) and (b) y = I(x), both for 0 ≤ x ≤ 3.

Range x ≥ 2. Split the integral again:

$$I(x) = \int_0^2 f(t)\,dt + \int_2^x f(t)\,dt = 0 + \int_2^x 0\,dt = 0, \quad\text{(iii)}$$

where we used the value of (ii) at x = 2. The resulting graph of I(x) is shown in Fig. 15.8b.
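The piecewise result of Example 15.17 can be checked by integrating the step function numerically. In this sketch (hypothetical names; midpoint rule with many strips), f is taken as 1 on 0 ≤ t < 1, −1 on 1 ≤ t < 2 and 0 beyond; the values at the break points do not affect the integral.

```python
def f(t):
    """The step function of Example 15.17."""
    if 0 <= t < 1:
        return 1.0
    if 1 <= t < 2:
        return -1.0
    return 0.0


def integral_numeric(x, n_strips=100_000):
    """Midpoint-rule estimate of I(x) = integral of f(t) from 0 to x, for x >= 0."""
    h = x / n_strips
    return h * sum(f((k + 0.5) * h) for k in range(n_strips))


def integral_formula(x):
    """Results (i), (ii) and (iii) of Example 15.17."""
    if x <= 1:
        return x
    if x <= 2:
        return 2 - x
    return 0.0


for x in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(x, integral_numeric(x), integral_formula(x))
```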

Self-test 15.8
Find

$$I(x) = \frac{d}{dx}\left(\int_x^{x^2} e^t\cos t\,dt\right)$$

(a) by evaluating the integral and differentiating the result, and (b) by using (15.20a).

Problems

15.1 Sketch each of the following curves; then express the signed areas under them firstly as the sums of strips, as in (15.1), and secondly as definite integrals, as in (15.3); and finally evaluate them by (15.4).
(a) $y = x^3$, −1 ≤ x ≤ 2; (b) $y = x^5$, −1 ≤ x ≤ 1;
(c) y = sin x, −π ≤ x ≤ 0; (d) $y = e^{-2x}$, 0 ≤ x ≤ 1.

15.2 (continued) (f) $\int t^{-1/2}\,dt$; (g) $\int \cos 2u\,du$; (h) $\int 3e^{-y/2}\,dy$; (i) $\int (1 + 3t - 2t^2)\,dt$; (j) $\int (1 + 4\cos 4w)\,dw$; (k) $\int (-x)^{1/2}\,dx$ when x is negative (you will have to experiment to find a valid antiderivative).


15.2 Evaluate the following indefinite integrals
(remember the arbitrary constant). 15.3 Evaluate the following definite integrals.

   dx;
1 1 2

  
1
(x + 1) dx;
1 1
2x (a) x 3 dx; (b) x 2 dx; (c)
(a) x dx;
2
(b) 2
(c) e dx;
−1 −1 0

15.2 (d) $\int \sin x\,dx$; (e) $\int (\cos x - 2\sin 2x)\,dx$.
15.3 (d) $\int_0^4 x^{1/2}\,dx$; (e) $\int_{-1}^{1}(1 - 3x + 2x^2)\,dx$;

 (x x
2 2
values of T, and deduce the value of the mean
(f) −3
+ x−2) dx; (g) −2
dx;
over 0  t  ∞; if you put T = ∞ into the

1 1
−1 integral directly, it turns out to be infinite, or
(h)  x −1
dx (take care: the x values are negative); ‘diverges’ (see Section 15.6), so no conclusion
−2 can be drawn from this approach).
−1 (i) f(t) = t −1, 1  t  ∞ (it is necessary to follow
(i)  (−x) dx (see the remark in Problem 15.2k);
−2
–12 the procedure in the previous question, for the
same reason).

15.3 (continued) (j) $\int_0^1 e^{-3x}\,dx$; (k) $\int_0^1 \sin 4x\,dx$; (l) $\int_0^{2\pi}\sin\tfrac12 x\,dx$; (m) $\int_0^{2\pi}\cos\tfrac12 x\,dx$.

15.7 Use the even/odd properties of the integrands (see Section 15.9) to prove the following results.
(a) $\int_{-\pi}^{\pi}\sin^4 t\,dt = 2\int_0^{\pi}\sin^4 t\,dt$;
1


15.4 Evaluate the following integrals, using the t3
notation of (15.4). (b) dt = 0;
−1
(1 + t 4 )

  (x − 1)(x + 1) dx;
1 1

x(x2 + x + 1) dx; π 2π
1

 
(a) (b) t cos t
0 −1 (c) dt = 0; (d) t 2 sin(t 3 ) dt = 0.
−π
1 + t2 − 12 π

(c)  x(x − 1) dx;


2 2

(d) 
2
x+x 2
dx; 3
0 x 1 15.8 (Computational: see Section 15.2 and Examples
15.1 and 15.6.) Write a simple program based on
2 4

 t dt; (f) 
t(t + 1) √u − 1 the algorithm (15.2) to evaluate a definite integral
(e) d u;
1
2 u ∫ ab f(x)dx. Assume that you have a subroutine for
1 1
0 −1
evaluating f(x), and that you input a, b, and N (the
(g)  (h) 
dw x number of subdivisions); also either a permissible
; dx ;
2w + 3
−1
x−1 −2
error E, or a parameter M which determines the
π number of iterations. If you use E, the process
(i)  cos 3t dt (cos A = --(1 + cos 2A)).
2 2 1
2 might be written to print out when two successive
0 iterations are within E of each other. Check the
correctness of the program by using a function
15.5 Evaluate the following infinite integrals.
∞ ∞ ∞
such as x2 as integrand.
(a)  e−3t dt;  (b)  x;
dx
e − 2 v dv; (c)
1

3
Estimate the values of the following integrals.
2 π

 x dx; (b)  sin x dx;


1 0 1 e −x 2
∞ 1 2 (a)
 (2x + 3) ; (e)  s ; (f)  (t − 1) ;
dx ds dt
(d) 2 1 1
1 0
4 2
1

(c)  cos e dx.


0 0 1
∞ −x

(g) $\int_0^\infty e^{-2t}\sin 3t\,dt$ (see Section 15.7);
(h) $\int_0^\infty e^{-t/2}\cos 2t\,dt$ (see Section 15.7).

15.6 The mean or average value of f(t) over an interval 0 ≤ t ≤ T is the quantity $\frac{1}{T}\int_0^T f(t)\,dt$. Find the mean values of the following over the intervals given.
(a) f(t) = t, 0 ≤ t ≤ 1; (b) f(t) = t, −1 ≤ t ≤ 1;
(c) f(t) = sin t, 0 ≤ t ≤ π;
(d) f(t) = sin t, 0 ≤ t ≤ 2π;
(e) $f(t) = t^{-2}$, 1 ≤ t ≤ T;
(f) $f(t) = e^{-t}\cos t$, 0 ≤ t ≤ 2π;
(g) $f(t) = e^{-2t}\sin t$, 0 ≤ t ≤ ∞;
(h) $f(t) = 1 - e^{-t}$, 0 ≤ t ≤ ∞ (work out the mean value over 0 ≤ t ≤ T for several increasing

15.9 (Computational). (a) Convince yourself that $e^{-x^2} \le e^{-x}$ when x ≥ 1. Use the area analogy (15.13) to show that, if b ≥ 1, then $\int_b^\infty e^{-x^2}\,dx \le \int_b^\infty e^{-x}\,dx$. Deduce that, if E is a positive number and E ≤ 1, then $\int_b^\infty e^{-x^2}\,dx \le E$ for b ≥ −ln E.
(b) Use the program written for Problem 15.8 to evaluate the improper integral $\int_0^\infty e^{-x^2}\,dx$ to within 2 decimal places, in the following way. You have to stop the integral somewhere: the program cannot deal with b = ∞. Take a permissible error E = 0.001, say, to leave some leeway. Referring to (a), choose b ≥ −ln E, and compute the integral $\int_0^b e^{-x^2}\,dx$. The part of the original integrand between b and infinity will then be negligible if b is large enough.
15.10 (Section 15.10). Find dI/dx where I(x) is given by the following integrals.
(a) $\int_0^x t^2\,dt$; (b) $\int_0^x \sin^5 t\,dt$; (c) $\int_0^x \dfrac{e^t}{1+t}\,dt$;
(d) $\int_0^{e^x} t\ln t\,dt$; (e) $\int_{\sqrt{x}}^{\sqrt{x+1}}\sin(t^2)\,dt$.

15.11 (See Example 15.17: it is necessary to split up the integrals.) Obtain $\int_0^x f(t)\,dt$ where f(x) is defined by the following.
(a) $f(x) = \begin{cases} 0 & \text{if } x \le -1\\ x & \text{if } -1 \le x \le 1\\ 0 & \text{if } x \ge 1\end{cases}$; consider positive and negative values of x.
(b) $f(x) = \begin{cases} x & \text{if } 0 \le x \le 1\\ 2 - x & \text{if } 1 \le x \le \tfrac32\\ 0 & \text{if } x \ge \tfrac32\end{cases}$; consider positive x only.

15.12 An ‘RL’ circuit has a constant current $I_0$ flowing, produced by a constant applied voltage. A switch cuts off the voltage and closes the circuit again at time $t = t_0 \ge 0$. For $t \ge t_0$, the current is given by $I(t) = I_0 e^{-R(t-t_0)/L}$. Obtain expressions for Q(t) for t ≥ 0, where

$$Q(t) = \int_0^t I(u)\,du.$$

15.13 A function f(x) is defined by

$$f(x) = \begin{cases} x^2, & 0 \le x \le 1,\\ 2 - x, & 1 \le x \le 2.\end{cases}$$

Sketch the graph of y = f(x) for 0 ≤ x ≤ 2. Find the area under the curve between x = 0 and x = 2.

15.14 Evaluate

$$\int_0^2 \frac{dx}{|x - 1|^{2/3}}.$$

Note that the integrand is infinite at x = 1. The result will be the sum of two improper integrals on 0 ≤ x ≤ 1 and 1 ≤ x ≤ 2.
16 Applications involving the integral as a sum

CONTENTS

16.1 Examples of integrals arising from a sum
16.2 Geometrical area in polar coordinates
16.3 The trapezium rule
16.4 Centre of mass, moment of inertia
Problems

We illustrate how solutions in the form of summations of the type described in Chapter 15 occur in various fields of application, and are represented by integrals. Here we limit complication by using examples requiring only the elementary table of antiderivatives (14.2).

16.1 Examples of integrals arising from a sum


The examples which follow show typical cases where integrals arise from sums of
the type (15.3).

Example 16.1 The tension T in an elastic string is given by T = 0.01x (kg m s⁻²), where x is the extension beyond the natural length. Find the work done on the string to stretch it 2 metres beyond its natural length.

Fig. 16.1 A string stretched beyond its natural length by an extension x under tension T, with a further small extension δx.

To stretch it from extension x to x + δx (Fig. 16.1), the work δW required is approximated by

$$\delta W \approx \text{force} \times \text{distance} = T\,\delta x = 0.01x\,\delta x.$$

The total work W is approximated by

$$W = \sum_{x=0}^{x=2}\delta W \approx \sum_{x=0}^{x=2} 0.01x\,\delta x.$$

Now let δx → 0. Then we obtain

$$W = \lim_{\delta x\to 0}\sum_{x=0}^{x=2} 0.01x\,\delta x = \int_0^2 0.01x\,dx \quad\text{(from (15.3))}$$
$$= 0.01\left[\tfrac12 x^2\right]_0^2 = 0.02 \text{ (kg m}^2\text{ s}^{-2}\text{)}.$$

Example 16.2 A car runs from rest to rest in 1 hour, its velocity v being given by v = 200t(1 − t) (in kilometres per hour). The rate of fuel consumption, f (in litres per kilometre), is related to the velocity by $f = 10^{-4}v^2$. Find (a) the distance travelled and (b) the amount of fuel used.
(a) In time δt it travels a distance δx, where

$$\delta x \approx v\,\delta t.$$

The total displacement x (which is equal to the distance travelled since v is always positive) is therefore

$$x = \lim_{\delta t\to 0}\sum_{t=0}^{t=1} v\,\delta t = \int_0^1 v\,dt = \int_0^1 200t(1-t)\,dt = 200\int_0^1 (t - t^2)\,dt$$
$$= 200\left[\tfrac12 t^2 - \tfrac13 t^3\right]_0^1 = 33\tfrac13 \text{ (km)}.$$

(b) In distance δx, it uses an amount of fuel δF approximated by

$$\delta F \approx f\,\delta x \approx fv\,\delta t = 10^{-4}v^2(v\,\delta t) = 800t^3(1-t)^3\,\delta t.$$

The total fuel used, F, is given by

$$F = \lim_{\delta t\to 0}\sum_{t=0}^{t=1} 800t^3(1-t)^3\,\delta t = \int_0^1 800t^3(1-t)^3\,dt$$

(by comparing (15.3) with t in place of x)

$$= 800\int_0^1 (t^3 - 3t^4 + 3t^5 - t^6)\,dt = 800\left[\tfrac14 t^4 - \tfrac35 t^5 + \tfrac12 t^6 - \tfrac17 t^7\right]_0^1 = 5.71 \text{ (litres)}.$$
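Both parts of Example 16.2 are sums of the form (15.3) in the variable t, so they can be approximated directly. The sketch below uses the midpoint rule and hypothetical function names; the results should be close to 33⅓ km and 800/140 ≈ 5.71 litres.

```python
def velocity(t):
    return 200 * t * (1 - t)     # km per hour, for 0 <= t <= 1 (hours)


def fuel_rate(v):
    return 1e-4 * v * v          # litres per km


def midpoint_rule(g, a, b, n_strips):
    """Sum of g at the centre of each of n equal strips, times the strip width."""
    h = (b - a) / n_strips
    return h * sum(g(a + (k + 0.5) * h) for k in range(n_strips))


distance = midpoint_rule(velocity, 0.0, 1.0, 10_000)   # sum of v dt
fuel = midpoint_rule(lambda t: fuel_rate(velocity(t)) * velocity(t), 0.0, 1.0, 10_000)  # sum of f v dt
print(distance, 100 / 3)   # (a): 33 1/3 km
print(fuel, 800 / 140)     # (b): about 5.71 litres
```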

Example 16.3 The straight line y = (r/h)x, between x = 0 and x = h, is rotated around the x axis to sweep out a solid cone of height h and circular base radius r. Obtain an expression for its volume, and its surface area (excluding the base).
Divide the interval OH into a large number of equal small steps δx (Fig. 16.2). Consider the step PQ between x and x + δx. This identifies a thin slice of the cone, like a slice of bread. Its volume δV is nearly that of a cylinder of radius y and thickness δx, so

$$\delta V \approx \pi y^2\,\delta x.$$

The total volume is obtained by adding all the δV and then letting the slices tend to zero thickness (at the same time becoming proportionately more numerous):

Fig. 16.2 (a) Incremental volume for a cone; (b) x, y section of the cone.

$$V = \lim_{\delta x\to 0}\sum_{x=0}^{x=h}\pi y^2\,\delta x = \int_0^h \pi y^2\,dx = \int_0^h \frac{\pi r^2}{h^2}x^2\,dx = \frac{\pi r^2}{h^2}\left[\tfrac13 x^3\right]_0^h = \frac{\pi r^2}{h^2}\cdot\frac{h^3}{3} = \tfrac13\pi r^2 h.$$

The x, y section of the cone is shown in Fig. 16.2b. If δs is arc-length on the line y = (r/h)x, then the surface area of the ‘band’ is approximately

$$2\pi y\,\delta s = 2\pi y\left[(\delta x)^2 + (\delta y)^2\right]^{1/2} = 2\pi f(x)\left[1 + f'(x)^2\right]^{1/2}\delta x,$$

where f(x) = (r/h)x. Hence the surface area is

$$S = \lim_{\delta x\to 0}\sum_{x=0}^{x=h} 2\pi y\left[1 + (dy/dx)^2\right]^{1/2}\delta x = 2\pi\int_0^h \frac{rx}{h}\left[1 + \left(\frac{r}{h}\right)^2\right]^{1/2} dx = \pi r\left[r^2 + h^2\right]^{1/2}.$$
1

The volume and surface area of any solid of revolution between x = a and x = b, formed by rotating a profile y = f(x) around the x axis, can be found in exactly the same way:

Volume and surface area of a solid of revolution around the x axis

For a profile y = f(x), a ≤ x ≤ b,
$$\text{the volume } V = \int_a^b \pi y^2\,dx,$$
$$\text{the surface area } S = 2\pi\int_a^b y\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{1/2} dx. \qquad (16.1)$$
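Formula (16.1) can be checked against the cone of Example 16.3 by summing thin discs and thin bands directly. This Python sketch assumes the caller supplies both the profile f and its derivative df (the names solid_of_revolution, f and df are our own), uses midpoint sums, and takes the illustrative values r = 3, h = 4, for which ⅓πr²h = 12π and πr(r² + h²)^{1/2} = 15π.

```python
import math
from typing import Callable, Tuple

def solid_of_revolution(f: Callable[[float], float], df: Callable[[float], float],
                        a: float, b: float, n: int = 10_000) -> Tuple[float, float]:
    """Midpoint-sum approximations to V = ∫ π y² dx and S = 2π ∫ y sqrt(1 + y'²) dx
    for the profile y = f(x), a <= x <= b (y assumed non-negative)."""
    dx = (b - a) / n
    volume = 0.0
    surface = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx            # midpoint of the k-th slice
        y = f(x)
        volume += math.pi * y * y * dx                               # thin disc
        surface += 2 * math.pi * y * math.sqrt(1 + df(x) ** 2) * dx  # thin band
    return volume, surface

# Cone of Example 16.3 with illustrative values r = 3, h = 4.
r, h = 3.0, 4.0
V, S = solid_of_revolution(lambda x: r * x / h, lambda x: r / h, 0.0, h)
print(V, math.pi * r**2 * h / 3)
print(S, math.pi * r * math.sqrt(r**2 + h**2))
```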

Example 16.4 Find the geometrical area enclosed between the curves y = 2x² − 1 and y = x².

This problem is complicated if we have to think all the time about the difference between signed and geometrical area as in Chapter 15.
Here it will be done in a different way. The curves meet where 2x² − 1 = x², that is, at x = ±1. Divide the interval −1 ≤ x ≤ 1 into short steps of length δx and consider the area elements indicated in Fig. 16.3. They are nearly rectangular, and the geometrical (positive) area δA of each is given by
$$\delta A \approx |x^2 - (2x^2 - 1)|\,\delta x = (-x^2 + 1)\,\delta x$$
(we may drop the modulus signs since −x² + 1 ≥ 0 in the given range).
The total geometrical area A is therefore given by
$$A = \lim_{\delta x\to 0}\sum_{x=-1}^{x=1}(-x^2 + 1)\,\delta x = \int_{-1}^{1}(-x^2 + 1)\,dx = \left[-\tfrac13 x^3 + x\right]_{-1}^{1} = \left(-\tfrac13 + 1\right) - \left(\tfrac13 - 1\right) = \tfrac43.$$
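The point of Example 16.4, that geometrical area is built from positive elements, can be sketched in Python by summing |x² − (2x² − 1)| δx at the midpoints of the strips. The helper geometric_area is hypothetical (our own name); because it takes the absolute value of each strip height, the sketch would still count area correctly if the curves crossed inside the interval.

```python
def geometric_area(upper, lower, a, b, n=100_000):
    """Midpoint sum of |upper(x) - lower(x)| * dx over [a, b].

    The absolute value makes every element count as positive (geometrical)
    area, whichever curve happens to be on top."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += abs(upper(x) - lower(x)) * dx
    return total

area = geometric_area(lambda x: x**2, lambda x: 2 * x**2 - 1, -1.0, 1.0)
print(area, 4 / 3)
```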

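[Fig. 16.3: the curves y = x² and y = 2x² − 1, meeting at x = ±1, with a representative area element δA of width δx between them.]

[Fig. 16.4: an arc AB of the polar curve r = f(θ) between the rays θ = α and θ = β, with a narrow sector OPQ of angle δθ and area δA.]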

Self-test 16.1
A surface of revolution is formed by rotating the curve y = 1 + x² about the x axis between x = 0 and x = 1. Find the volume of the region created.

16.2 Geometrical area in polar coordinates


In Fig. 16.4, the arc AB represents part of a plane curve which is described in polar coordinates by
$$r = f(\theta), \quad \alpha \le \theta \le \beta.$$
Form a new, non-rectangular type of area element δA by dividing the θ range, θ = α to θ = β, into small angular steps δθ, expressed in radians. (Here A denotes geometrical rather than signed area because, in polar coordinates, we always regard r as being positive, and we shall count the area elements as positive.) A typical area element has the shape OPQ.
When δθ is small, OPQ has very nearly the same area as a narrow circular sector of radius r and angle δθ radians. Its area is therefore a fraction δθ/2π of a complete circle of radius r and area πr²:
$$\delta A \approx \frac{\delta\theta}{2\pi}\,\pi r^2 = \tfrac12 r^2\,\delta\theta.$$
The total area is obtained by adding all the elements and letting δθ tend to zero:
$$A = \lim_{\delta\theta\to 0}\sum_{\theta=\alpha}^{\theta=\beta}\tfrac12 r^2\,\delta\theta = \int_\alpha^\beta \tfrac12 r^2\,d\theta,$$
where r = f(θ).

Area of a sector in polar coordinates

For a sector of r = f(θ), with α ≤ θ ≤ β,
$$A = \tfrac12\int_\alpha^\beta r^2\,d\theta. \qquad (16.2)$$
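Formula (16.2) can be mimicked by adding up thin sectors of area ½r² δθ. In this Python sketch (the name polar_area is our own, and r is evaluated at the middle of each angular step), a circle r = 2 should give about 4π, and the loop r = 3 sin 2θ of Example 16.5 about 9π/8.

```python
import math

def polar_area(f, alpha, beta, n=10_000):
    """Approximate A = (1/2) ∫_alpha^beta r² dθ for r = f(θ) by summing
    thin sectors, each of area (1/2) r² dθ with r taken at mid-angle."""
    dtheta = (beta - alpha) / n
    return sum(0.5 * f(alpha + (k + 0.5) * dtheta) ** 2 * dtheta for k in range(n))

circle = polar_area(lambda t: 2.0, 0.0, 2 * math.pi)
loop = polar_area(lambda t: 3 * math.sin(2 * t), 0.0, math.pi / 2)
print(circle, 4 * math.pi)
print(loop, 9 * math.pi / 8)
```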

Example 16.5 Find the area of the loop of the curve r = 3 sin 2θ in the first quadrant.

[Fig. 16.5: the loop of r = 3 sin 2θ in the first quadrant, showing a point P at radius r and polar angle θ.]

For the loop shown in Fig. 16.5, the range of θ is 0 ≤ θ ≤ ½π (it is generally helpful to sketch polar curves before proceeding with the integration). Thus in (16.2)
$$f(\theta) = 3\sin 2\theta, \quad \alpha = 0, \quad \beta = \tfrac12\pi.$$
The area is therefore given by
$$A = \tfrac12\int_0^{\frac12\pi}(3\sin 2\theta)^2\,d\theta = \tfrac92\int_0^{\frac12\pi}\sin^2 2\theta\,d\theta.$$
But, for any angle B, sin²B = ½(1 − cos 2B); so
$$A = \tfrac94\int_0^{\frac12\pi}(1 - \cos 4\theta)\,d\theta = \tfrac94\left[\theta - \tfrac14\sin 4\theta\right]_0^{\frac12\pi} = \tfrac98\pi.$$
Self-test 16.2
Sketch the curve defined by r = cos 3θ (−π/6 ≤ θ ≤ π/6). Find the area enclosed by the loop.

16.3 The trapezium rule


Practical problems often give rise to integrals which the investigator cannot
evaluate or find in a dictionary of integrals. Indeed, sometimes integrals which
are very simple-looking cannot, in principle, be expressed in terms of ordinary
‘formulae’ at all. However, numerical approximations to definite integrals can
usually be obtained to any required degree of accuracy by using numerical methods
in conjunction with a computer. We will mention some very simple methods that
call directly on the area analogy (15.13), which we repeat here:

The area analogy


The definite integral $\int_a^b f(x)\,dx$ is equal to the signed area between y = f(x) and the x axis from x = a to x = b. (16.3)

In Examples 15.1 and 15.4, we illustrated the use of the area analogy (15.13) using
as the area approximation the sum in (15.1), which had been introduced only for
the purpose of establishing the principle. It only gives close approximations if we
use very small step lengths; but, now that the area analogy is established, we can
look for approximation methods that will be more efficient.
An improved area approximation is shown in Fig. 16.6, where the curve y = f(x) is 'fitted' by a polygonal curve. The approximation to the area of each strip individually is obviously better in general than we would get from a rectangle. Divide the interval x = a to x = b into N steps. We shall denote the length of each step by h (instead of δx, because h is conventional in numerical analysis). Then
$$h = \frac{b-a}{N}.$$

[Fig. 16.6: the curve y = f(x) from A (x = a) to B (x = b), fitted by chords joining the points with ordinates y₀, y₁, …, y_N at x₀ (= a), x₁, …, x_N (= b), equally spaced at step h.]

Number the N + 1 points of division 0, 1, 2, …, N: the x values are
$$x_0\ (= a),\ x_1,\ x_2,\ \ldots,\ x_{N-1},\ x_N\ (= b),$$
and the y values are y₀, y₁, y₂, …, y_{N−1}, y_N.
Each of the approximating area elements is a trapezium. The signed area δA_n of the nth trapezium is given by
$$\delta A_n \approx \tfrac12(y_{n-1} + y_n)h = \frac{b-a}{2N}(y_{n-1} + y_n).$$
The total area A is approximated by the sum of these:
$$A \approx \sum_{n=1}^{N}\frac{b-a}{2N}(y_{n-1} + y_n) = \frac{b-a}{2N}\left[(y_0 + y_1) + (y_1 + y_2) + \cdots + (y_{N-1} + y_N)\right]$$
$$= \frac{b-a}{N}\left[\tfrac12 y_0 + (y_1 + y_2 + \cdots + y_{N-1}) + \tfrac12 y_N\right].$$
This is called the trapezium rule.

Trapezium rule

$$\int_a^b f(x)\,dx \approx \frac{b-a}{N}\left[\tfrac12 y_0 + (y_1 + y_2 + \cdots + y_{N-1}) + \tfrac12 y_N\right].$$
The interval is divided into N equal steps:
$x_0\ (= a), x_1, \ldots, x_N\ (= b)$ are the division points; and
$y_n = f(x_n)\ (n = 0, 1, 2, \ldots, N)$. (16.4)
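Rule (16.4) translates almost word for word into code. The Python function below is a minimal sketch (the name trapezium is our own): it forms the N + 1 ordinates y_n = f(x_n) and gives the two end ordinates weight ½. For a straight-line integrand each trapezium fits exactly, so the rule should reproduce, for example, ∫₀³ (2x + 1) dx = 12 apart from rounding.

```python
import math

def trapezium(f, a, b, n):
    """Trapezium rule (16.4): ((b - a)/n) * [y0/2 + (y1 + ... + y_{n-1}) + yn/2]."""
    h = (b - a) / n
    ys = [f(a + k * h) for k in range(n + 1)]   # ordinates y_k = f(x_k), k = 0, ..., n
    return h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])

print(trapezium(lambda x: 2 * x + 1, 0.0, 3.0, 5))        # straight line: exact up to rounding
print(trapezium(lambda x: math.exp(-x), 0.0, 1.0, 10))    # the integral of Example 16.6, N = 10
```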

In the following example, we compare the trapezium rule (16.4) with the rectangle rule (15.2), which we can recast for comparison as
$$\int_a^b f(x)\,dx \approx \frac{b-a}{N}(y_0 + y_1 + \cdots + y_{N-1}).$$

Example 16.6 Compare the efficiency of the trapezium rule (16.4) with the rectangle rule (15.2) for approximating to $\int_0^1 e^{-x}\,dx$.
We set out the results in the following table.

N                    10           100          1000
h = (b − a)/N        0.1          0.01         0.001
Rectangle rule       0.66         0.635        0.6324
Trapezium rule       0.632 647    0.632 125    0.632 120

The exact value is 0.632 120 5… . For three-decimal accuracy, the rectangle rule requires about 1000 divisions and the trapezium rule only about 12. There are many formulae that are far more efficient than even the trapezium rule, one of the best of these, for combining simplicity with accuracy, being Simpson's rule (see Problem 16.21). You should look at books on numerical analysis for others.
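The comparison in Example 16.6 can be reproduced with a few lines of Python. This sketch (the function names rectangle and trapezium are our own) evaluates both rules for ∫₀¹ e⁻ˣ dx at N = 10, 100 and 1000 and prints each result with its error relative to the exact value 1 − e⁻¹; it assumes nothing beyond the two rules as stated.

```python
import math

def rectangle(f, a, b, n):
    """Rectangle rule (15.2) as recast above: ((b - a)/n) * (y0 + y1 + ... + y_{n-1})."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def trapezium(f, a, b, n):
    """Trapezium rule (16.4)."""
    h = (b - a) / n
    inner = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

def f(x):
    return math.exp(-x)

exact = 1 - math.exp(-1)
for n in (10, 100, 1000):
    r, t = rectangle(f, 0.0, 1.0, n), trapezium(f, 0.0, 1.0, n)
    print(f"N={n:5d}  rectangle={r:.6f} (error {abs(r - exact):.1e})  "
          f"trapezium={t:.6f} (error {abs(t - exact):.1e})")
```

The trapezium error falls by a factor of about 100 each time N grows by 10, while the rectangle error falls only by about 10, which is why far fewer divisions are needed.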
[Fig. 16.7: a typical particle at (x_n, y_n), offset by x_n − X and y_n − Y from the centre of mass G : (X, Y); AGB and CGD are axes through G.]
16.4 Centre of mass, moment of inertia


Suppose that there are N particles attached to a weightless plane sheet (Fig. 16.7), the nth particle being at P : (x_n, y_n) and having mass m_n, where n = 1, 2, …, N.
Let G : (X, Y) be the centre of mass. It is the balancing point of the assembly, the point such that the total moment of the particles about any axis through G is zero. Consider in particular the axes CGD and AGB, parallel to the y and x axes and passing through G. Then, by definition, the coordinates X, Y of G are given by the equations
$$\sum_{n=1}^{N} m_n(x_n - X) = 0, \qquad \sum_{n=1}^{N} m_n(y_n - Y) = 0. \qquad (16.5)$$
These can be written
$$\sum_{n=1}^{N} m_n x_n - X\sum_{n=1}^{N} m_n = 0, \qquad \sum_{n=1}^{N} m_n y_n - Y\sum_{n=1}^{N} m_n = 0.$$
Let $\sum_{n=1}^{N} m_n = M$, the total mass; then these equations give
$$X = \frac{1}{M}\sum_{n=1}^{N} m_n x_n, \qquad Y = \frac{1}{M}\sum_{n=1}^{N} m_n y_n.$$
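The two formulas for X and Y translate directly into code. In this Python sketch the helper centre_of_mass and the three particles are made-up illustrations; the last lines check that the moments Σ m_n(x_n − X) and Σ m_n(y_n − Y) of (16.5) vanish.

```python
def centre_of_mass(particles):
    """particles: list of (m_n, x_n, y_n). Returns (X, Y) with
    X = (1/M) Σ m_n x_n and Y = (1/M) Σ m_n y_n, where M = Σ m_n."""
    M = sum(m for m, _, _ in particles)
    X = sum(m * x for m, x, _ in particles) / M
    Y = sum(m * y for m, _, y in particles) / M
    return X, Y

particles = [(1.0, 0.0, 0.0), (2.0, 3.0, 0.0), (3.0, 1.0, 2.0)]   # illustrative data
X, Y = centre_of_mass(particles)
moment_x = sum(m * (x - X) for m, x, _ in particles)   # first equation of (16.5)
moment_y = sum(m * (y - Y) for m, _, y in particles)   # second equation of (16.5)
print(X, Y, moment_x, moment_y)
```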

If instead of a number of particles there is a solid plate, then this too has a
balancing point. Assume that the plate is uniform so that its mass per unit area,
µ (Greek mu), is the same everywhere on it.
We also assume that the shape of the plate is such that no vertical or horizontal
line cuts across the boundary more than twice: once going in and again going out.
If the shape does not have this property, then the process as explained here has to
be modified.
Suppose that the centre of mass G is at (X, Y). Divide the area into narrow vertical strips of width δx (Fig. 16.8a). Let the total length, or height, of a representative strip as shown be V(x). Then its geometrical area δA is nearly equal to V δx, and its mass δm is nearly µV δx. Therefore the moment about a vertical axis AB through G : (X, Y) is approximately given by (x − X) δm ≈ (x − X)µV δx.

[Fig. 16.8: (a) a vertical strip of width δx and length V(x) across a plate lying between x = a and x = b, with AB a vertical axis through G; (b) a horizontal strip of width δy and length H(y) across the plate lying between y = c and y = d, with CD a horizontal axis through G.]

The sum of all the elementary moments must be zero, since G is the mass centre. So, in the limit as δx tends to 0, we have
$$\lim_{\delta x\to 0}\sum_{x=a}^{x=b}(x - X)V(x)\mu\,\delta x = 0,$$
where x = a and x = b represent the extreme left and right limits of the plate. Since µ and X are constants, this is the same as
$$\mu\lim_{\delta x\to 0}\sum_{x=a}^{x=b} xV(x)\,\delta x = \mu X\lim_{\delta x\to 0}\sum_{x=a}^{x=b} V(x)\,\delta x = \mu AX,$$
where A is the area of the plate, equal to $\lim_{\delta x\to 0}\sum_{x=a}^{x=b} V(x)\,\delta x$. Cancelling µ, we obtain
$$X = \frac{1}{A}\lim_{\delta x\to 0}\sum_{x=a}^{x=b} xV(x)\,\delta x = \frac{1}{A}\int_a^b xV(x)\,dx.$$

Similarly, by dividing the y axis into steps δy, and considering the moments of horizontal strips of length H(y) (see Fig. 16.8b) about a horizontal axis CD through G, we obtain
$$Y = \frac{1}{A}\int_c^d yH(y)\,dy,$$
where y = c and y = d are the extreme lower and upper limits of the plate.
In these expressions, all reference to mass has gone (µ is no longer present). Therefore the centre of mass of a uniform plate is also called the centroid of the figure representing the plate, and it depends only on its shape and size.
In fact the moments about every line through G are zero, not simply the moments about AB and CD (parallel to the y and x axes respectively) that we used to find G.

Centre of mass of a uniform convex plate, or centroid of a convex area, G : (X, Y)

$$X = \frac{1}{A}\int_a^b xV(x)\,dx; \qquad Y = \frac{1}{A}\int_c^d yH(y)\,dy,$$
where A is the area. Here, respectively, V(x) and H(y) are the lengths of the vertical and horizontal strips, and x = a, b (resp. y = c, d) are the extreme horizontal (resp. vertical) boundaries of the figure. (16.6)
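Formula (16.6) can be tried numerically by replacing each integral with a midpoint sum over strips. This is a minimal Python sketch in which the caller supplies the strip-length function V(x) (the helper centroid_x is our own name); it is applied to the isosceles triangle of Example 16.7 with the illustrative values h = 3, b = 2, where X should come out close to ⅔h = 2.

```python
def centroid_x(V, a, b, n=10_000):
    """X = (1/A) ∫_a^b x V(x) dx, where A = ∫_a^b V(x) dx, both approximated
    by midpoint sums over vertical strips of width dx and length V(x)."""
    dx = (b - a) / n
    mids = [a + (k + 0.5) * dx for k in range(n)]
    area = sum(V(x) * dx for x in mids)
    moment = sum(x * V(x) * dx for x in mids)
    return moment / area

# Isosceles triangle of Example 16.7 with illustrative h = 3, b = 2: V(x) = (b/h) x.
h, b = 3.0, 2.0
X = centroid_x(lambda x: b * x / h, 0.0, h)
print(X, 2 * h / 3)
```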

Example 16.7 Find the position of the centroid or centre of mass of an isosceles
triangle of height h and base b.

[Fig. 16.9: the triangle with its vertex at O and its axis of symmetry along the x axis; the base is at x = h, joining (h, ½b) and (h, −½b), and the sides are y = ±(b/2h)x.]

Choose axes which make the job as simple as possible. In this case, use the axes shown in Fig. 16.9.
From the symmetry of the isosceles triangle about the x axis, the centroid must lie on this axis, so Y = 0 without any calculations.
The sides have equations
$$y = \pm\frac{b}{2h}x;$$
therefore the length of the strip at x is given by
$$V(x) = \frac{b}{h}x.$$
Also the area A of the triangle is given by
$$A = \tfrac12 bh.$$
Therefore by (16.6),
$$X = \frac{2}{bh}\int_0^h x\left(\frac{b}{h}x\right)dx = \frac{2}{h^2}\int_0^h x^2\,dx = \tfrac23 h.$$

(In these coordinates, X is independent of the base length b.)

The moment of inertia is important for problems in mechanics involving rotation: it plays a part similar to that of mass in non-rotational problems. The moment of inertia of a single particle of mass m about any axis AB is defined to be md², where d is its perpendicular distance from AB (see Fig. 16.10). For the moment of inertia of an assemblage of particles, the individual contributions are added. For a solid plate the contributions of small area elements are likewise added, as if they were particles, and in the limit we obtain a definite integral. It is important to select axes and suitably shaped area elements to make a particular problem manageable.

[Fig. 16.10: a particle of mass m at perpendicular distance d from an axis AB.]

[Fig. 16.11: the rectangle ABCD with the edge AB (length 2) along the y axis, and a vertical strip of width δx at distance x from AB; the plate extends to x = 6.]

Example 16.8 Find the moment of inertia I of a uniform rectangular plate ABCD about the edge AB when AB = 2, BC = 6, and the mass per unit area is 2.
Set up axes parallel to the sides, and area elements which are vertical strips of height 2 and width δx, as shown in Fig. 16.11. The axis of rotation is the y axis. The mass δm of each strip is given by
$$\delta m = (\text{surface density}) \times (\text{area}) = 2\,\delta A = 2 \times 2 \times \delta x = 4\,\delta x.$$
The moment of inertia of the strip at distance x from the y axis is therefore
$$x^2\,\delta m = 4x^2\,\delta x.$$
The total moment of inertia I is given by
$$I = \lim_{\delta x\to 0}\sum_{x=0}^{x=6} 4x^2\,\delta x = 4\int_0^6 x^2\,dx = 4\left[\tfrac13 x^3\right]_0^6 = 288.$$

Example 16.9 Find an expression for the moment of inertia I of an isosceles triangle ABC about its base AB, when AB = b, its height is h, and its mass is M.

[Fig. 16.12: the triangle with its base AB on the y axis, from (0, −½b) to (0, ½b), and its vertex C at (h, 0); BC is y = −(b/2h)x + ½b, AC is y = (b/2h)x − ½b, and a vertical strip of width δx and length V(x) is shown at x.]

The axes and the representative strip at x are shown in Fig. 16.12. The equations of BC and AC are
$$y = \pm\left(-\frac{b}{2h}x + \tfrac12 b\right)$$
respectively, so the length V(x) to be assigned to the strip is
$$V(x) = -\frac{b}{h}x + b,$$
and the area δA is approximated by
$$\delta A \approx \left(-\frac{b}{h}x + b\right)\delta x.$$
Since the plate is uniform, the mass per unit area is (total mass)/(area), or M/(½bh), so the mass element δm is approximated by
$$\delta m \approx \frac{M}{\tfrac12 bh}\left(-\frac{b}{h}x + b\right)\delta x = \frac{2M}{h}\left(1 - \frac{x}{h}\right)\delta x.$$
Therefore the moment of inertia I is given by
$$I = \lim_{\delta x\to 0}\sum_{x=0}^{x=h} x^2\,\delta m = \lim_{\delta x\to 0}\sum_{x=0}^{x=h}\frac{2M}{h}x^2\left(1 - \frac{x}{h}\right)\delta x$$
$$= \frac{2M}{h}\int_0^h x^2\left(1 - \frac{x}{h}\right)dx = \frac{2M}{h}\int_0^h\left(x^2 - \frac{1}{h}x^3\right)dx$$
$$= \frac{2M}{h}\left[\tfrac13 x^3 - \frac{1}{4h}x^4\right]_0^h = \frac{2M}{h}\cdot\frac{h^3}{12} = \tfrac16 Mh^2.$$
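Examples 16.8 and 16.9 both add x² δm over vertical strips. The Python sketch below does the same with a midpoint sum, taking as input the mass per unit length dm/dx of the strip at distance x from the axis; the helper name moment_of_inertia and the values M = 1, h = 2 used for Example 16.9 are our own illustrative choices.

```python
def moment_of_inertia(dm_dx, a, b, n=10_000):
    """Midpoint-sum approximation to I = ∫_a^b x² dm, where dm = dm_dx(x) dx is
    the mass of the strip at distance x from the axis."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += x**2 * dm_dx(x) * dx
    return total

# Example 16.8: strips of height 2 and surface density 2, so dm = 4 dx, 0 <= x <= 6.
I_rect = moment_of_inertia(lambda x: 4.0, 0.0, 6.0)
# Example 16.9 with illustrative M = 1, h = 2: dm = (2M/h)(1 - x/h) dx.
M, h = 1.0, 2.0
I_tri = moment_of_inertia(lambda x: 2 * M / h * (1 - x / h), 0.0, h)
print(I_rect, 288)
print(I_tri, M * h**2 / 6)
```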

Example 16.10 Find the moment of inertia of a circular disc of radius R and
mass M about an axis through its centre and perpendicular to the plane of
the disc.
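[Fig. 16.13: the disc of radius R, with the axis through its centre perpendicular to its plane, and a ring-shaped element δA between radii r and r + δr.]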

The usual (x, y) coordinates are not natural to this problem. In Fig. 16.13, the polar
coordinate r ranges from 0 to R. Break this range into ring-shaped steps as shown, the
representative ring or annulus having inner radius r and thickness δr. These constitute
the area elements δA.
We have δA ≈ 2πr δr, and the mass per unit area is M/πR², so that the mass δm of the ring is approximately
$$\delta m \approx \frac{M}{\pi R^2}\,2\pi r\,\delta r.$$
The moment of inertia of the ring must be equal to that of a suitable distribution of closely spaced particles along its circumference. The contribution of each of these imaginary particles to the moment of inertia of the ring is equal to its mass times r². Since r is constant on the ring, its moment of inertia δI is equal to the total mass of the ring times r²:
$$\delta I \approx r^2\,\delta m \approx \frac{2M}{R^2}r^3\,\delta r.$$
Finally
$$I = \lim_{\delta r\to 0}\sum_{r=0}^{r=R}\frac{2M}{R^2}r^3\,\delta r = \frac{2M}{R^2}\int_0^R r^3\,dr = \frac{2M}{R^2}\cdot\tfrac14 R^4 = \tfrac12 MR^2.$$
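The ring construction of Example 16.10 can be carried out numerically: each thin ring contributes r² δm with δm = (M/πR²)·2πr δr. In this Python sketch the helper name and the values M = 2, R = 0.5 are illustrative assumptions; the sum should approach ½MR² = 0.25.

```python
import math

def disc_inertia_by_rings(M, R, n=10_000):
    """Sum r² dm over thin rings of radius r (taken at mid-ring) and width dr,
    with dm ≈ (M / (π R²)) * 2π r dr, as in Example 16.10."""
    dr = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        dm = M / (math.pi * R**2) * 2 * math.pi * r * dr   # mass of the thin ring
        total += r**2 * dm                                  # all of it at distance r
    return total

M, R = 2.0, 0.5   # illustrative values
I_disc = disc_inertia_by_rings(M, R)
print(I_disc, 0.5 * M * R**2)
```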

Self-test 16.3
A plane area is bounded by the parabola y = 1 − x² and the x axis (y = 0). Express the ordinate Y of the centroid as the ratio of two integrals.

Problems

(Units are kilogram, metre, second (SI units) where they are unstated.)

16.1 The resistance R of a compression spring over part of its range is given by R = 100x + 1000x², where x is the displacement from its natural length. Find the work done in compressing it through a distance of 0.01.

16.2 The velocity v of a point moving along the x axis is v = 20 − 10t, where t is the time. The displacement δx taking place in a short time δt is approximated by δx ≈ v δt. Express the displacement which takes place between t = 2 and t = 4 as a definite integral, and evaluate it. What is its x coordinate at t = 4 if it was at x = 3 when t = 2?

16.3 Each of the following curves is the profile of a solid of revolution which has the x axis as its central axis. Find the volume in each case (see (16.1) – you should briefly go through the whole argument until you understand it, not simply quote the formula):
(a) y = e^{−x}, 0 ≤ x ≤ 1;
(b) y = 1/x, 1 ≤ x ≤ 2;
(c) y = x(1 − x), 0 ≤ x ≤ 1;
(d) y = sin x, 0 ≤ x ≤ π;
(e) y = x³, −1 ≤ x ≤ 1 (the fact that x³ is negative does not have to be taken into account: the volume elements are always positive, unlike area elements);
(f) y = x(1 − x), 0 ≤ x ≤ 2 (see the note in (e));
(g) y = x^{−1}, 1 ≤ x < ∞ (contrast Example 15.9, for area);
(h) y = x^{1/4}, 0 ≤ x ≤ 1.

16.4 Show that the volume of a sphere of radius R is (4/3)πR³. (A sphere is a solid of revolution.)

16.5 (a) Find the volume of the ellipsoid obtained by rotating the elliptical profile x²/a² + y²/b² = 1 about the x axis.
(b) If the x and y scales of the profile ellipse in (a) are contracted or expanded by suitable factors, it becomes a unit circle. Deduce from this fact the formula for the volume of the ellipsoid of revolution.

16.6 The curve y = ½x between y = 1 and y = 2 is rotated about the y axis to profile a vertical spindle, or truncated cone. Find its volume.
16.7 A uniform beam AB of length L has mass m per unit length. It is cemented horizontally into a wall at the end A. Sum the moments about A of elements of length δx, form a definite integral, and so find the moment supporting the beam at A.

16.8 A 'beam' in the shape of a circular spindle made of material of density 500 is fixed to a vertical wall at the end A with its axis of symmetry horizontal. Its cross-sectional area (perpendicular to its axis) is $4 \times 10^{-4}(1 + 0.4x^2)$, where x is measured from A. Its length is 1. Find the moment at A required to support it under gravity.
Suppose that the data are the same, except that the cross-section is square, or possibly irregular in shape. Does this affect the answer? Suppose that the axis is bent, but that x still measures the perpendicular distance from the wall: is the calculation affected?

16.9 A narrow tube of length 10 cm and cross-section 0.1 cm² contains a chemical solution, with concentration $c(x) = 0.04e^{-x/4}$ g cm⁻³, where x is the distance from one end. Find the total mass of solute in the tube.

16.10 The water clock in Fig. 16.14 has depth 0.5 m, and its profile is given by $r(h) = 0.39h^{1/4}$, where r(h) is the radius at height h from the outlet in the bottom. The size of the outlet hole is such as to drain the water at a rate given by
$$\frac{dV}{dt} = -0.003h^{1/2}\ \text{m}^3\,\text{h}^{-1},$$
where V is the volume of water remaining. Show that the water level falls at a uniform rate, and find how long it runs. (Consider the change δh in level which occurs in a short time δt.)

[Fig. 16.14: profile of the water clock, of depth 0.5 m.]

16.11 An alternating current $i = i_0\cos\omega t$ flows through a resistor R. The instantaneous rate of heat generation is Ri² heat units per unit time. Find the heat generated in a complete cycle of the current, that is in a period 2π/ω. Does it make any difference at what instant you regard the period as starting? (To carry out the integration, you will need the identity $\cos^2 A = \tfrac12(1 + \cos 2A)$.)

16.12 Find the geometric area enclosed between the curves y = −x and y = x(x − 1) on the interval 0 ≤ x ≤ 2, by considering vertical strips of width δx between the curves.

16.13 Find the geometric area enclosed between the curves y = −x and y = x³ between x = −1 and x = 1 by considering vertical strips of width δx connecting the curves. (Be careful about signs: these curves cross.)

16.14 For the angular ranges specified, sketch the curves given in polar coordinates below and find the sectorial areas.
(a) r = θ, 0 ≤ θ ≤ 2π (a spiral arc);
(b) r = 2 cos θ, −½π ≤ θ ≤ ½π (a circle);
(c) $r = e^{\theta/2\pi}$, 0 ≤ θ ≤ π (spiral arc);
(d) r = sin 2θ, 0 ≤ θ ≤ ½π.
(Remember the identities $\cos^2 A = \tfrac12(1 + \cos 2A)$, $\sin^2 A = \tfrac12(1 - \cos 2A)$.)

16.15 The end of a water trough is a rectangle of height H and width L. Find the total force and moment on the end when the trough is full. (The pressure, meaning the force per unit area acting perpendicularly on any surface, at depth y is ρgy, where ρ is density and g the gravitational acceleration.)

16.16 Determine the position of the centre of mass of a symmetrical cone of circular cross-section which has height H and base radius R.

16.17 Find the moment of inertia of a rectangle, having sides a and b, about an axis through its centre, parallel to the sides of length b.

16.18 Obtain the moment of inertia of an isosceles triangle of height H and base B about an axis through its vertex which is (a) parallel to the base, and (b) perpendicular to the base.

16.19 Use the trapezium rule (16.4) to evaluate the following integrals to 1% accuracy. (The exact value can be obtained by evaluating the integrals in the usual way.)
(a) $\int_0^1 e^{\frac12 x}\,dx$; (b) $\int_0^{\pi}\sin x\,dx$; (c) $\int_{-\frac12\pi}^{\frac12\pi}\cos x\,dx$.
16.20 The following integrals are either difficult or impossible to evaluate directly. Estimate them by using the trapezium rule (16.4). (Since you cannot know the exact answer in advance, you can proceed by running the program using increasingly fine divisions until you get no change in some predetermined number of decimal places.)
(a) $\int_0^{\frac12\pi}\sin^{1/2}x\,dx$; (b) $\int_0^1 e^{-x^2}\,dx$; (c) $\int_1^2\frac{e^x\,dx}{1 + x^3}$; (d) $\int_1^2\frac{\sin x}{x}\,dx$.

16.21 The following is called Simpson's rule for numerical integration. It results from splitting the points of division into successive groups of three, then exactly fitting the corresponding groups of points on the graph by second-degree polynomials. For this purpose, N must be an even number:
$$\int_a^b y\,dx \approx \frac{b-a}{3N}(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + \cdots + 4y_{N-1} + y_N).$$
Show that $\int_0^1 e^{-x}\,dx$ is given correctly to 4 decimal places by using only four subdivisions. Compare the trapezium rule and the rectangle rule.

16.22 Consider the curve y = f(x) for a ≤ x ≤ b. Show that the arc-length δs associated with a short step δx is given by $\delta s = [(\delta x)^2 + (\delta y)^2]^{1/2}$. Deduce that the total length s of the curve is given by
$$s = \int_a^b\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{1/2} dx.$$
This type of integral is usually impossible to evaluate explicitly, but can be done numerically. Compute the lengths of the following curves. (Try the trapezium rule, Simpson's rule of Problem 16.21, and an integrating routine from a software package if you know how to use it: the interest lies in comparing them.)
(a) y = sin x, 0 ≤ x ≤ 1;
(b) y = x², 0 ≤ x ≤ 2;
(c) y = eˣ, −1 ≤ x ≤ 1;
(d) $y = (1 - x^2)^{1/2}$, −1 ≤ x ≤ 1 (a semicircle, so it can be done directly).

16.23 A curve is given in polar coordinates (r, θ) by r = f(θ). Show that the arc-length δs associated with a small change δθ in θ is given by
$$\delta s = [(\delta x)^2 + (\delta y)^2]^{1/2} = [r^2(\delta\theta)^2 + (\delta r)^2]^{1/2},$$
using the relations x = r cos θ, y = r sin θ. Hence show that the total length of the curve between θ = α and θ = β is
$$s = \int_\alpha^\beta\left\{[f(\theta)]^2 + [f'(\theta)]^2\right\}^{1/2} d\theta.$$
The cardioid (see also Problem 1.31a) is given by r = a(1 + cos θ) in polar form. Find the length of its perimeter.
17 Systematic techniques for integration

CONTENTS

17.1 Substitution method for ∫ f(ax + b) dx
17.2 Substitution method for ∫ f(ax² + b)x dx
17.3 Substitution method for ∫ cosᵐ ax sinⁿ ax dx (m or n odd)
17.4 Definite integrals and change of variable
17.5 Occasional substitutions
17.6 Partial fractions for integration
17.7 Integration by parts
17.8 Integration by parts: definite integrals
17.9 Differentiating with respect to a parameter
Problems

The commonest applications of integration usually involve finding an antiderivative F(x) of an integrand f(x). So far, due to our choice of illustrations, we have been able to use the elementary table of antiderivatives (14.2). But (as was the case for derivatives) we do not have to treat every new integral that we meet from first principles: there are methods such as those described in this chapter that often enable an unfamiliar integral to be reduced to ones that we know how to deal with. The rules are not so straightforward as those applied to derivatives in Chapter 3, and require some foresight in order to see in advance which rule is likely to be helpful.

17.1 Substitution method for ∫ f(ax + b) dx


Consider the indefinite integral
$$\int (3x - 2)^3\,dx.$$

We carried out this integration in Example 14.8 by starting with a guess that the result will resemble (3x − 2)⁴. We now describe a method less dependent on trial and error.
We shall take up a clue suggested by the chain rule procedure (Section 3.3). Put
$$3x - 2 = u. \qquad (17.1)$$
Then the integral becomes
$$\int u^3\,dx.$$
Unfortunately this is not equal to ¼u⁴ + C, because dx, not du, is present: the variable of integration is still x. Thinking in terms of an integral as a sum, δx is not the same size as δu; in fact from (17.1) δu = 3 δx, which suggests what to do with the new integral.
From (17.1), du/dx = 3, which we write as
$$dx = \tfrac13\,du.$$
Put this into the integral, and it works through straightforwardly:
$$\int (3x - 2)^3\,dx = \int u^3\left(\tfrac13\,du\right) = \tfrac13\int u^3\,du = \tfrac{1}{12}u^4 + C.$$
Now use (17.1) to change back to x:
$$\int (3x - 2)^3\,dx = \tfrac{1}{12}(3x - 2)^4 + C,$$

and this is correct. In checking its correctness by differentiation, we use the chain rule with u = 3x − 2, and find we are simply reversing the order of the operations that we just went through.

Example 17.1 Use a substitution to obtain $\int\frac{dx}{2x - 1}$.
Try
$$u = 2x - 1.$$
We shall need to express dx in terms of u. Since du/dx = 2, we have
$$dx = \tfrac12\,du.$$
The integral therefore becomes, in terms of u,
$$\int\frac{dx}{2x - 1} = \int\frac{\tfrac12\,du}{u} = \tfrac12\ln|u| + C = \tfrac12\ln|2x - 1| + C.$$
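As the text says, a result obtained by substitution can be checked by differentiating it. A quick numerical spot check of that idea is sketched below in Python: the hypothetical helper check_antiderivative compares a central-difference estimate of F′(x) with f(x) at a few sample points. It assumes the sample points avoid singularities (such as x = ½ for ln|2x − 1|), and it is a sanity check, not a proof.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def check_antiderivative(F, f, points, tol=1e-6):
    """True if F'(x) agrees with f(x) (relative to max(1, |f(x)|)) at every point."""
    return all(abs(derivative(F, x) - f(x)) <= tol * max(1.0, abs(f(x))) for x in points)

points = [-1.0, 0.0, 0.3, 2.0]   # sample points, chosen to avoid x = 1/2

# Section 17.1: ∫ (3x - 2)^3 dx = (1/12)(3x - 2)^4 + C (take C = 0).
ok_171 = check_antiderivative(lambda x: (3 * x - 2) ** 4 / 12,
                              lambda x: (3 * x - 2) ** 3, points)
# Forgetting the factor 1/3 from dx = (1/3) du gives (1/4)(3x - 2)^4, which fails.
bad_171 = check_antiderivative(lambda x: (3 * x - 2) ** 4 / 4,
                               lambda x: (3 * x - 2) ** 3, points)
# Example 17.1: ∫ dx / (2x - 1) = (1/2) ln|2x - 1| + C.
ok_ex171 = check_antiderivative(lambda x: 0.5 * math.log(abs(2 * x - 1)),
                                lambda x: 1 / (2 * x - 1), points)
print(ok_171, bad_171, ok_ex171)
```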

Example 17.2 Evaluate ∫ sin(3x + 2) dx.

Put
$$u = 3x + 2;$$
then du/dx = 3, so du = 3 dx, or dx = ⅓ du. The integral becomes
$$\int\sin(3x + 2)\,dx = \int\sin u\cdot\left(\tfrac13\,du\right) = -\tfrac13\cos u + C = -\tfrac13\cos(3x + 2) + C.$$
The essence of the matter is that the change of variable or substitution led to a simpler integral than the one we started with. In general, for integrals of this type, we have the following result.

Type ∫ f(ax + b) dx

Put u = ax + b; then du/dx = a, or dx = (1/a) du. The integral transforms to
$$\frac{1}{a}\int f(u)\,du. \qquad (17.2)$$

It is worth while to try this substitution in more general cases, even if it is not
obvious that a simplification will take place.

Example 17.3 Evaluate ∫ x(2x − 1)³ dx.

This is not quite of the form (17.2) because of the presence of the loose x. Nevertheless, put
$$u = 2x - 1,$$
with the object of simplifying at least the most complicated part. Then
$$du = 2\,dx, \quad\text{or}\quad dx = \tfrac12\,du.$$
We also need to express x in terms of u, using u = 2x − 1:
$$x = \tfrac12(u + 1).$$
Now we have
$$\int x(2x - 1)^3\,dx = \int\tfrac12(u + 1)u^3\left(\tfrac12\,du\right) = \tfrac14\int(u^4 + u^3)\,du = \tfrac{1}{20}u^5 + \tfrac{1}{16}u^4 + C$$
$$= \tfrac{1}{20}(2x - 1)^5 + \tfrac{1}{16}(2x - 1)^4 + C.$$

Do not miss the possibility of making a substitution in simple cases. For example:

$\int e^{-3x}\,dx$: put u = −3x, dx = −⅓ du;

$\int\sin 3x\,dx$: put u = 3x, dx = ⅓ du;

$\int\frac{1 + x}{1 - x}\,dx$: put u = 1 − x, dx = −du.


Self-test 17.1


Use a substitution to obtain ∫ sin²(3x + 4) dx.

17.2 Substitution method for ∫ f(ax² + b)x dx


Example 17.4 Evaluate $\int x e^{x^2}\,dx$.

Try putting
$$u = x^2,$$
with the objective of simplifying the unfamiliar-looking term $e^{x^2}$. It is then necessary to deal with x and dx in the integral. We have
$$\frac{du}{dx} = 2x,$$
which we can write as du = 2x dx, or
$$x\,dx = \tfrac12\,du.$$
In this way we have translated the whole group (x dx) into terms of u, instead of having to deal separately with x and dx. Therefore
$$\int x e^{x^2}\,dx = \int e^{x^2}(x\,dx) = \int e^u\left(\tfrac12\,du\right) = \tfrac12\int e^u\,du = \tfrac12 e^u + C = \tfrac12 e^{x^2} + C,$$
where C is an arbitrary constant. The correctness of the result can be checked by differentiating it.

Example 17.5 Evaluate $\int\frac{x\,dx}{3x^2 + 2}$.

Notice that the integral can be written in the form
$$\int\frac{1}{3x^2 + 2}(x\,dx).$$
The integrand contains a function of x² and the combination x dx which appeared in Example 17.4. This suggests putting u = x² to give a simpler integral. However, we can do even better than this.
Put
$$u = 3x^2 + 2.$$
Then du/dx = 6x, so that
$$x\,dx = \tfrac16\,du.$$
Therefore
$$\int\frac{1}{3x^2 + 2}(x\,dx) = \int\frac{1}{u}\left(\tfrac16\,du\right) = \tfrac16\int\frac{du}{u} = \tfrac16\ln|u| + C = \tfrac16\ln(3x^2 + 2) + C,$$
where C is an arbitrary constant. The modulus sign in the logarithm was discarded because 3x² + 2 is always positive.

The general result is as follows:

Integrals of type $I = \int x f(ax^2 + b)\,dx$

Put u = ax² + b; then x dx = (1/2a) du, so
$$I = \frac{1}{2a}\int f(u)\,du. \qquad (17.3)$$

Self-test 17.2


Evaluate ∫ x sin(x²) dx.

17.3 Substitution method for ∫ cosᵐ ax sinⁿ ax dx (m or n odd)

Example 17.6 Evaluate ∫ sin³x cos x dx.

In this case m = 1 and n = 3. Aim to simplify the worst term sin³x by putting
$$u = \sin x.$$
Then sin³x becomes u³, and we must deal with cos x dx. As always, begin with du/dx = cos x. Therefore
$$du = \cos x\,dx,$$
so, by good fortune, cos x dx appears in one piece. Then we have
$$\int\sin^3 x\cdot(\cos x\,dx) = \int u^3\,du = \tfrac14 u^4 + C = \tfrac14\sin^4 x + C,$$
with C arbitrary. (Check by differentiating.)

Example 17.7 Evaluate ∫ tan x dx.

We have
$$\int\tan x\,dx = \int\frac{\sin x}{\cos x}\,dx = \int\frac{1}{\cos x}(\sin x\,dx).$$
This time, put
$$u = \cos x,$$
so that du/dx = −sin x. From this we obtain
$$du = -\sin x\,dx,$$
so, apart from the sign, we have exactly the combination required for the rest of the integrand. Then
$$\int\frac{1}{\cos x}(\sin x\,dx) = \int\frac{1}{u}(-du) = -\ln|u| + C = -\ln|\cos x| + C,$$
where C is arbitrary. This often appears as ln|sec x| + C in tables of integrals, since
$$\ln|\sec x| = \ln(1/|\cos x|) = -\ln|\cos x|.$$

This technique can be used for products cosᵐ ax sinⁿ ax, when either m or n (or both) are odd numbers, either positive or negative, and for certain other cases as well.

Example 17.8 Evaluate ∫ cos³x dx. (This is the case m = 3, n = 0.)

Write ∫ cos³x dx = ∫ cos²x·(cos x dx), and put
$$u = \sin x$$
(not cos x as possibly expected). Then du/dx = cos x, so that
$$du = \cos x\,dx.$$
The remaining part of the integrand is cos²x, and we can transform this by writing
$$\cos^2 x = 1 - \sin^2 x = 1 - u^2.$$
Then we have
$$\int\cos^2 x\cdot(\cos x\,dx) = \int(1 - u^2)\,du = u - \tfrac13 u^3 + C = \sin x - \tfrac13\sin^3 x + C,$$
where C is arbitrary.
You should also try the substitution u = cos x. It leads to an integral in terms of u that is correct but worse than the original.

Example 17.9 Evaluate I = ∫ cos³2x sin³2x dx.

(Here m = 3, n = 3.) The technique requires us to decompose the term whose power is odd. Here both powers are odd, so either will do. We shall split the integrand like this:
$$I = \int\cos^3 2x\,\sin^2 2x\,(\sin 2x\,dx).$$
Put u = cos 2x so that
$$\sin 2x\,dx = -\tfrac12\,du.$$
Since sin²2x = 1 − cos²2x, the integral becomes
$$I = \int u^3(1 - u^2)\left(-\tfrac12\,du\right) = -\tfrac12\int(u^3 - u^5)\,du = -\tfrac18 u^4 + \tfrac{1}{12}u^6 + C$$
$$= -\tfrac18\cos^4 2x + \tfrac{1}{12}\cos^6 2x + C,$$
with C arbitrary.

The general rule is as follows.

Integrals of type $I = \int\cos^m ax\,\sin^n ax\,dx$, m or n an odd positive or negative integer

(a) If n is odd, put $I = \int\cos^m ax\,\sin^{n-1}ax\,(\sin ax\,dx)$; then u = cos ax, sin ax dx = −(1/a) du, and sin²ax = 1 − cos²ax.
(b) If m is odd, write $I = \int\cos^{m-1}ax\,\sin^n ax\,(\cos ax\,dx)$; then u = sin ax, cos ax dx = (1/a) du, and cos²ax = 1 − sin²ax.
(c) If n and m are both odd, use either (a) or (b). (17.4)

Self-test 17.3
Evaluate
$$I_1 = \int\sin x\cos^3 x\,dx \quad\text{and}\quad I_2 = \int\sin(3x + 2)\cos^3(3x + 2)\,dx.$$

17.4 Definite integrals and change of variable


For the previous examples involving indefinite integrals, we changed the variable
to u, carried out the integration, and then expressed the result back in terms
of x. For a definite integral, it is often more convenient to express the limits
of integration in terms of u, as well as the integrand, and in that way work
with u right up to the end. In the following example, both procedures are
illustrated.

Example 17.10 Evaluate $I = \int_0^{\frac12\pi}\cos^3 x\,dx$ in two ways.

(a) (By finding an indefinite integral in terms of x.) As in Example 17.8, put u = sin x, du = cos x dx;
$$\int\cos^3 x\,dx = \int(1 - u^2)\,du = u - \tfrac13 u^3 = \sin x - \tfrac13\sin^3 x,$$
taking the simplest case with C = 0. Then
$$I = \left[\sin x - \tfrac13\sin^3 x\right]_0^{\frac12\pi} = \left(1 - \tfrac13\right) - 0 = \tfrac23.$$
(b) (Working with u throughout.) Put u = sin x into I. In order to express the limits of integration in terms of u, note that u = 0 when x = 0, and u = 1 when x = ½π. Then (writing the limits so as to make them more explicit)
$$\int_{x=0}^{x=\frac12\pi}\cos^3 x\,dx = \int_{u=0}^{u=1}(1 - u^2)\,du = \left[u - \tfrac13 u^3\right]_{u=0}^{u=1} = \left(1 - \tfrac13\right) - 0 = \tfrac23.$$

In Example 17.10b it would have been wrong to write the integral in the form
$$\int_0^{\frac12\pi}(1 - u^2)\,du.$$
This would imply that we were going to put u equal to 0 and ½π after integrating.
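Example 17.10b and the warning after it can be checked numerically. The Python sketch below (the helper name midpoint is our own) approximates the original integral in x, the transformed integral in u with limits 0 to 1, and the mistaken version that keeps the x limits 0 to ½π; only the first two should agree, both near ⅔.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

in_x = midpoint(lambda x: math.cos(x) ** 3, 0.0, math.pi / 2)   # original integral in x
in_u = midpoint(lambda u: 1 - u**2, 0.0, 1.0)                   # u = sin x, limits changed to 0, 1
wrong = midpoint(lambda u: 1 - u**2, 0.0, math.pi / 2)          # x limits kept by mistake
print(in_x, in_u, wrong)
```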

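Example 17.11 Find the centroid (centre of mass) of the uniform semicircular plate shown in Fig. 17.1.

[Fig. 17.1: semicircular plate bounded by $x^2 + y^2 = R^2$ (for $x \ge 0$, between $y = -R$ and $y = R$), with a vertical strip of width $\delta x$ and length $V(x)$ at position x.]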

The symmetry shows that the centroid G lies on the x axis. From (16.6), the x coordinate of G is given by
$$X = \frac{1}{\frac{1}{2}\pi R^2}\int_0^R V(x)\,x\,dx.$$
Since $x^2 + y^2 = R^2$, we have $V(x) = 2(R^2 - x^2)^{1/2}$, so that
$$X = \frac{4}{\pi R^2}\int_0^R (R^2 - x^2)^{1/2}\,x\,dx.$$
This is an integral of the type of (17.3). To simplify it, put $u = R^2 - x^2$, so that $du/dx = -2x$ and $x\,dx = -\frac{1}{2}\,du$. Also, $u = R^2$ when $x = 0$, and $u = 0$ when $x = R$. Therefore
$$X = \frac{4}{\pi R^2}\int_{R^2}^{0} u^{1/2}\left(-\tfrac{1}{2}\,du\right) = -\frac{4}{3\pi R^2}\left[u^{3/2}\right]_{R^2}^{0} = -\frac{4}{3\pi R^2}\left[0 - R^3\right] = \frac{4}{3\pi}R.$$
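The sketch below checks Example 17.11 numerically. It integrates $V(x)\,x$ directly with a hand-written composite Simpson's rule (the helper `simpson`, assumed here for illustration) and compares the result with $4R/(3\pi)$. The radius R = 2 is an arbitrary illustrative choice. The square-root behaviour of $V(x)$ at $x = R$ slows Simpson's convergence, which is why many subintervals are used.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

R = 2.0  # illustrative radius; any R > 0 should give X = 4R/(3*pi)

def V(x):
    """Length of the vertical strip at x: V(x) = 2(R^2 - x^2)^(1/2)."""
    return 2.0 * math.sqrt(max(R * R - x * x, 0.0))

area = 0.5 * math.pi * R ** 2
X = simpson(lambda x: V(x) * x, 0.0, R, n=200_000) / area

print(X, 4 * R / (3 * math.pi))
```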
Self-test 17.4

Evaluate
$$I_1 = \int_0^1 \frac{dx}{1 + x} \quad\text{and}\quad I_2 = \int_1^2 \frac{\ln x}{x}\,dx.$$

17.5 Occasional substitutions


Finding an advantageous new variable u is often a process of trial and error. Frequently the possible usefulness of a substitution is easier to see in the form
$$x = h(u)$$
rather than u as a function of x as in the previous work.

Example 17.12 Find a substitution to evaluate $\displaystyle\int \frac{dx}{(1 - x^2)^{1/2}}$.

Try to simplify $(1 - x^2)^{1/2}$ first, hoping that dx will work out conveniently. To do this try
$$x = \sin u; \qquad (17.5)$$
then $(1 - x^2)^{1/2} = (1 - \sin^2 u)^{1/2} = \cos u$ (taking $-\frac{1}{2}\pi \le u \le \frac{1}{2}\pi$, so that $\cos u \ge 0$). Also $dx/du = \cos u$, so
$$dx = \cos u\,du.$$
Therefore
$$\int \frac{dx}{(1 - x^2)^{1/2}} = \int \frac{\cos u\,du}{\cos u} = u + C = \arcsin x + C.$$
You might try putting $u = 1 - x^2$ instead: the resulting integral is different from, but no better than, the original.


Example 17.13 From Example 17.12, we know that $\displaystyle\int \frac{dx}{(1 - x^2)^{1/2}} = \arcsin x + C$. Use this result to obtain $\displaystyle\int \frac{dx}{(4 - x^2)^{1/2}}$.

Aim to convert $(4 - x^2)^{1/2}$ into something like $(1 - u^2)^{1/2}$, so as to be able to use the given result:
$$(4 - x^2)^{1/2} = 2(1 - \tfrac{1}{4}x^2)^{1/2} = 2\left[1 - (\tfrac{1}{2}x)^2\right]^{1/2},$$
and make the substitution
$$u = \tfrac{1}{2}x.$$
Then $du/dx = \frac{1}{2}$, so that $dx = 2\,du$. Therefore
$$\int \frac{dx}{(4 - x^2)^{1/2}} = \int \frac{2\,du}{2(1 - u^2)^{1/2}} = \int \frac{du}{(1 - u^2)^{1/2}} = \arcsin u + C = \arcsin\tfrac{1}{2}x + C.$$
Example 17.14 From Appendix E, $\displaystyle\int \frac{dx}{1 + x^2} = \arctan x + C$. Use this result to evaluate $\displaystyle\int \frac{dx}{1 + 9x^2}$.

We want to transform $1 + 9x^2$ to a form close to $1 + u^2$, so put
$$u = 3x,$$
so that $dx = \frac{1}{3}\,du$. Then
$$\int \frac{dx}{1 + 9x^2} = \frac{1}{3}\int \frac{du}{1 + u^2} = \frac{1}{3}\arctan u + C = \frac{1}{3}\arctan 3x + C.$$
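The sketch below tests the three results of Examples 17.12 to 17.14 as definite integrals. It uses a composite Simpson's rule (a helper `simpson` written here for illustration, not taken from the text) and compares each value with the difference of the antiderivative between the chosen limits. The limits themselves are arbitrary illustrative choices.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

checks = {
    # Example 17.12 over [0, 1/2]: arcsin(1/2) - arcsin(0) = pi/6
    "17.12": (simpson(lambda x: 1 / math.sqrt(1 - x * x), 0.0, 0.5), math.asin(0.5)),
    # Example 17.13 over [0, 1]: arcsin(1/2) - arcsin(0) = pi/6
    "17.13": (simpson(lambda x: 1 / math.sqrt(4 - x * x), 0.0, 1.0), math.asin(0.5)),
    # Example 17.14 over [0, 1]: (1/3) arctan 3
    "17.14": (simpson(lambda x: 1 / (1 + 9 * x * x), 0.0, 1.0), math.atan(3.0) / 3),
}
for name, (numeric, exact) in checks.items():
    print(name, numeric, exact)
```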
1 + 9x 2 2 3 3

If the required integral does not seem to be similar to one that is already
known, then one has in effect to guess a suitable substitution:

Example 17.15 Evaluate $\displaystyle\int \frac{\ln^3 x}{x}\,dx$.

We can simplify the logarithm (at the risk of extra complexity elsewhere) by putting
$$x = e^u,$$
so that $\ln x = \ln e^u = u$. Since $dx/du = e^u$, we have $dx = e^u\,du$. Therefore
$$\int \frac{\ln^3 x}{x}\,dx = \int \frac{u^3}{e^u}\,e^u\,du = \int u^3\,du = \tfrac{1}{4}u^4 + C = \tfrac{1}{4}(\ln x)^4 + C.$$
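The sketch below gives a numerical sanity check of Example 17.15 over the illustrative range $[1, e]$. On that range the antiderivative $\frac{1}{4}(\ln x)^4$ gives exactly $\frac{1}{4}$. The Simpson helper is an assumption of this sketch, not the book's method.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def antiderivative(x):
    """(1/4)(ln x)^4, the result of Example 17.15 with C = 0."""
    return 0.25 * math.log(x) ** 4

numeric = simpson(lambda x: math.log(x) ** 3 / x, 1.0, math.e)
exact = antiderivative(math.e) - antiderivative(1.0)

print(numeric, exact)  # both close to 0.25
```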
x

The general shape of the integrand often suggests a substitution that is sure to simplify it. Suppose we notice that f(x) can be written in the special form
$$f(x) = c\,g(u)\frac{du}{dx},$$
in which u is some function of x and c a constant. Then by eqn (15.6b) we have
$$I = \int f(x)\,dx = c\int g(u)\frac{du}{dx}\,dx = c\int g(u)\,du,$$
which we might be able to integrate. For example, suppose that
$$I = \int (x^4 + 1)^7 x^3\,dx.$$
We could evaluate this by expanding $(x^4 + 1)^7$ by the binomial theorem, eqn (1.44). However, it is far simpler to notice that $x^3 = \frac{1}{4}\,d(x^4 + 1)/dx$, so put $x^4 + 1 = u(x)$, changing the variable to u:
$$I = \int u^7\cdot\tfrac{1}{4}\,du = \tfrac{1}{4}\cdot\tfrac{1}{8}u^8 + C = \tfrac{1}{32}(x^4 + 1)^8 + C,$$
where C is an arbitrary constant.
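The sketch below confirms the factor $\frac{1}{32}$ numerically. It compares a Simpson's-rule value of $\int_0^1 (x^4 + 1)^7 x^3\,dx$ with $\frac{1}{32}\left[(x^4 + 1)^8\right]_0^1 = \frac{255}{32}$. The limits 0 and 1 and the helper `simpson` are illustrative assumptions.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative (1/32)(x^4 + 1)^8 found with u = x^4 + 1."""
    return (x ** 4 + 1) ** 8 / 32

numeric = simpson(lambda x: (x ** 4 + 1) ** 7 * x ** 3, 0.0, 1.0, n=2000)
exact = F(1.0) - F(0.0)  # (256 - 1)/32 = 255/32

print(numeric, exact)  # both close to 7.96875
```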


(Any f(x) can in principle be written in this form: the question is only whether it is easy to see how it breaks up.) Having observed the form of u(x), the substitution should be made in the usual way.

Example 17.16 Evaluate $\displaystyle\int (x^{1/2} + 1)^{1/3}\,x^{-1/2}\,dx$.

The important thing is to spot that $(d/dx)(x^{1/2} + 1)$ is like the remaining factor, $x^{-1/2}$. This suggests that $u = x^{1/2} + 1$ is the right substitution. Specifically, put $u = x^{1/2} + 1$; then
$$\frac{du}{dx} = \tfrac{1}{2}x^{-1/2} \quad\text{and so}\quad x^{-1/2}\,dx = 2\,du.$$
The integral becomes
$$\int 2u^{1/3}\,du = \tfrac{3}{2}u^{4/3} + C = \tfrac{3}{2}(x^{1/2} + 1)^{4/3} + C.$$
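The sketch below checks Example 17.16 over the illustrative range $[1, 4]$, which avoids the singular factor $x^{-1/2}$ at $x = 0$. It compares a Simpson's-rule value (hypothetical helper `simpson`) with $\frac{3}{2}(x^{1/2} + 1)^{4/3}$ evaluated between the limits.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative (3/2)(x^(1/2) + 1)^(4/3) from Example 17.16."""
    return 1.5 * (math.sqrt(x) + 1) ** (4 / 3)

numeric = simpson(lambda x: (math.sqrt(x) + 1) ** (1 / 3) / math.sqrt(x), 1.0, 4.0)
exact = F(4.0) - F(1.0)

print(numeric, exact)
```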

Some further special substitutions together with illustrative integrals are listed in
Problem 17.23.
Self-test 17.5
Evaluate
(a) $\displaystyle\int_0^4 \frac{dx}{1 + \sqrt{x}}$ (use the substitution $x = u^2$);

(b) $\displaystyle\int_0^{\ldots} \frac{dx}{\sqrt{x - x^2}}$ (use the substitution $x = \sin^2 u$).

17.6 Partial fractions for integration


In Section 1.14, it was shown how a rational function P(x)/Q(x), where P(x) and
Q(x) are polynomials, P(x) is of lower degree than Q(x), and Q(x) factorizes into
real factors, can be expressed as the sum of simpler partial fractions. This
provides a method for integrating rational functions.

Example 17.17 Evaluate $\displaystyle\int \frac{dx}{x^2 - 1}$.

By the methods of Section 1.14, we find that
$$\frac{1}{x^2 - 1} = \frac{1}{(x - 1)(x + 1)} = \frac{\frac{1}{2}}{x - 1} - \frac{\frac{1}{2}}{x + 1}.$$
Therefore
$$\int \frac{dx}{x^2 - 1} = \frac{1}{2}\int \frac{dx}{x - 1} - \frac{1}{2}\int \frac{dx}{x + 1} = \tfrac{1}{2}\ln|x - 1| - \tfrac{1}{2}\ln|x + 1| + C.$$
Other equivalent forms are $\frac{1}{2}\ln\left|\frac{x - 1}{x + 1}\right| + C$, $\ln\left|\frac{x - 1}{x + 1}\right|^{1/2} + C$, and $\ln\left(B\left|\frac{x - 1}{x + 1}\right|^{1/2}\right)$, where C and B are arbitrary (with $B > 0$), on any range that excludes the points $x = \pm 1$.
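The sketch below checks Example 17.17 in two steps. First it tests the partial-fraction identity at a few sample points away from $x = \pm 1$. Then it compares a Simpson's-rule value of $\int_2^3 dx/(x^2 - 1)$ with the logarithmic antiderivative, which gives $\frac{1}{2}\ln\frac{3}{2}$. The sample points, the limits and the helper `simpson` are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# The decomposition 1/(x^2 - 1) = (1/2)/(x - 1) - (1/2)/(x + 1) at sample points
for x in (-3.0, 0.5, 2.0, 10.0):
    assert math.isclose(1 / (x * x - 1), 0.5 / (x - 1) - 0.5 / (x + 1))

def F(x):
    """(1/2) ln|x - 1| - (1/2) ln|x + 1|, valid on ranges excluding x = +-1."""
    return 0.5 * math.log(abs(x - 1)) - 0.5 * math.log(abs(x + 1))

numeric = simpson(lambda x: 1 / (x * x - 1), 2.0, 3.0)
exact = F(3.0) - F(2.0)

print(numeric, exact, 0.5 * math.log(1.5))
```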

As a result of expanding in partial fractions we may encounter integrands of the type
$$\frac{cx + d}{px^2 + qx + r}$$
in which the equation $px^2 + qx + r = 0$ has no real roots (i.e. the denominator has no real factors). The following example shows how to evaluate them by 'completing the square' in the denominator.

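Example 17.18 Evaluate $I = \displaystyle\int \frac{(x + 1)\,dx}{x^2 + 4x + 8}$.

The quadratic form $x^2 + 4x + 8$ has no real factors. 'Completing the square' in the denominator consists of writing $x^2 + 4x + 8$ in the form $(x + a)^2 + b$. The first two terms, $x^2 + 4x$, can be written
$$x^2 + 4x = (x + 2)^2 - 4,$$
so
$$x^2 + 4x + 8 = (x + 2)^2 - 4 + 8 = (x + 2)^2 + 4.$$
The integral becomes
$$I = \int \frac{(x + 1)\,dx}{(x + 2)^2 + 4} = \frac{1}{4}\int \frac{(x + 1)\,dx}{\left[\frac{1}{2}(x + 2)\right]^2 + 1}.$$
Now put $u = \frac{1}{2}(x + 2)$, or $x = 2u - 2$, from which
$$dx = 2\,du.$$
Then
$$I = \frac{1}{2}\int \frac{2u - 1}{u^2 + 1}\,du = \int \frac{u}{u^2 + 1}\,du - \frac{1}{2}\int \frac{du}{u^2 + 1}.$$
To evaluate the first integral, use the substitution $v = u^2 + 1$, as in Section 17.2; the second is a standard integral. We obtain
$$I = \tfrac{1}{2}\ln(x^2 + 4x + 8) - \tfrac{1}{2}\arctan\tfrac{1}{2}(x + 2) + C$$
(the constant $\frac{1}{2}\ln\frac{1}{4}$, which arises from $u^2 + 1 = \frac{1}{4}(x^2 + 4x + 8)$, has been absorbed into C).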
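The sketch below checks the result of Example 17.18 numerically. It compares a Simpson's-rule value of the integral over the illustrative range $[0, 1]$ with the difference of the antiderivative $\frac{1}{2}\ln(x^2 + 4x + 8) - \frac{1}{2}\arctan\frac{1}{2}(x + 2)$ between the limits. The helper `simpson` is an assumption of this sketch.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative from Example 17.18 (constant of integration omitted)."""
    return 0.5 * math.log(x * x + 4 * x + 8) - 0.5 * math.atan(0.5 * (x + 2))

numeric = simpson(lambda x: (x + 1) / (x * x + 4 * x + 8), 0.0, 1.0)
exact = F(1.0) - F(0.0)

print(numeric, exact)
```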
Self-test 17.6

Using partial fractions, evaluate
$$\int \frac{x^2\,dx}{(x + 3)^2(x + 1)}.$$

17.7 Integration by parts


This method is totally unrelated to the techniques we have so far described, and
can be used to integrate special types of product. It is needed very frequently for
obtaining fundamental general results.
Suppose that we are given any u(x) and v(x). Then, by the product rule (Section 3.1),
$$\frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx}.$$
Since both sides are equal, their indefinite integrals can only differ by a constant, so
$$\int \frac{d}{dx}(uv)\,dx = \int u\frac{dv}{dx}\,dx + \int v\frac{du}{dx}\,dx + B, \qquad (17.6)$$
where B is a constant. Look at the integral on the left. It means 'an antiderivative of $(d/dx)[u(x)v(x)]$'. But, from the definition (14.1), $u(x)v(x)$ is an antiderivative. Therefore (17.6) becomes
$$uv = \int u\frac{dv}{dx}\,dx + \int v\frac{du}{dx}\,dx + B.$$
Now rearrange the terms to obtain
$$\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx - B.$$
This is the formula for integration by parts. Replacing $-B$ by C:

Integration by parts

$$\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx + C,$$
where C is some constant. (17.7)

It is not at first obvious how this complicated result could be of any use, but the point of it is that the right-hand integral might be simpler than the one on the left. The process was once called 'partial integration', because the uv part is already integrated out. (For the dangerous effect of missing out C, see Problem 17.19.)

Example 17.19 Evaluate $\int x e^x\,dx$ by integrating by parts.

First observe that the integrand consists of the product of two factors, x and $e^x$, both of which we can integrate and differentiate any number of times. We relate this fact to (17.7) by identifying them with u and dv/dx respectively: put
$$u = x \quad\text{and}\quad \frac{dv}{dx} = e^x. \qquad \text{(i)}$$
Then
$$\frac{du}{dx} = 1 \quad\text{and}\quad v = \int e^x\,dx = e^x, \qquad \text{(ii)}$$
where we have chosen v to be the simplest antiderivative of $e^x$. Nothing would ultimately be changed by introducing an arbitrary constant C into v: any antiderivative will do (see Problem 17.18).
Fill in the right-hand side of (17.7) by picking out u, v, du/dx from (i) and (ii), and introduce the constant C:
$$\int x e^x\,dx = x e^x - \int (e^x)(1)\,dx + C = x e^x - \int e^x\,dx + C = x e^x - e^x + C,$$
where C is arbitrary.

We obtained a simplification because we chose x, rather than $e^x$, to be assigned to u. Since du/dx is simpler than u, it seemed possible that the right side of (17.7) might be simpler than the left. (To see what happens when we put $u = e^x$, $dv/dx = x$, see Example 17.21.)
As in Example 17.19, you should always write out stages (i) and (ii) in full and do the subsequent working in full, or you will make mistakes.

Example 17.20 Evaluate $\int x\cos 2x\,dx$.

Put $u = x$, $dv/dx = \cos 2x$. Then
$$\frac{du}{dx} = 1, \qquad v = \int \cos 2x\,dx = \tfrac{1}{2}\sin 2x.$$
Substituting these functions into the right-hand side of (17.7):
$$\int x\cos 2x\,dx = x(\tfrac{1}{2}\sin 2x) - \int (\tfrac{1}{2}\sin 2x)(1)\,dx + C = \tfrac{1}{2}x\sin 2x - \tfrac{1}{2}(-\tfrac{1}{2}\cos 2x) + C = \tfrac{1}{2}x\sin 2x + \tfrac{1}{4}\cos 2x + C.$$
Example 17.21 For $\int x e^x\,dx$ (see Example 17.19), try the effect of assigning x and $e^x$ to u and dv/dx the 'wrong way round'.
In Example 17.19, we successfully put $u = x$ and $dv/dx = e^x$. Now try instead
$$u = e^x, \qquad \frac{dv}{dx} = x;$$
then
$$\frac{du}{dx} = e^x, \qquad v = \int x\,dx = \tfrac{1}{2}x^2.$$
The integration-by-parts formula becomes
$$\int x e^x\,dx = e^x(\tfrac{1}{2}x^2) - \int (\tfrac{1}{2}x^2)e^x\,dx + C = \tfrac{1}{2}x^2 e^x - \tfrac{1}{2}\int x^2 e^x\,dx + C,$$
which is a true result, but the transformed integral is worse than the original.

Sometimes it is not immediately obvious that the method can be made to work, as in the following.

Example 17.22 Evaluate $\int \ln x\,dx$.

Write $\ln x = (\ln x)(1)$, so that the integral becomes
$$\int (\ln x)(1)\,dx.$$
We can now put $u = \ln x$ and $dv/dx = 1$, so that
$$\frac{du}{dx} = \frac{1}{x}, \qquad v = x.$$
Then
$$\int (\ln x)(1)\,dx = (\ln x)(x) - \int (x)\left(\frac{1}{x}\right)dx + C = x\ln x - \int dx + C = x\ln x - x + C,$$
where C is an arbitrary constant.

The integrals of other inverse functions, such as arcsin x and arctan x, respond to the same technique (see Problem 17.11).

Example 17.23 Evaluate $\int x^2\sin x\,dx$.

It is necessary in this problem to integrate by parts twice. Put
$$u = x^2, \qquad \frac{dv}{dx} = \sin x;$$
then
$$\frac{du}{dx} = 2x, \qquad v = -\cos x.$$
From (17.7),
$$\int x^2\sin x\,dx = -x^2\cos x + 2\int x\cos x\,dx + C.$$
Integrate the integral on the right by parts; put
$$u = x, \qquad \frac{dv}{dx} = \cos x,$$
so that $du/dx = 1$ and $v = \sin x$. From (17.7), we obtain finally
$$\int x^2\sin x\,dx = -x^2\cos x + 2x\sin x + 2\cos x + C.$$

Self-test 17.7
Using integration by parts, evaluate $\int x^n\ln x\,dx$ $(n \ne -1)$. What does the integral equal if $n = -1$?

17.8 Integration by parts: definite integrals


The integration-by-parts formula (17.7) expresses a relation between indefinite integrals, or antiderivatives. Suppose that we have a definite integral of the form
$$\int_a^b u\frac{dv}{dx}\,dx,$$
which we expect to integrate by parts. Then, from (17.7),
$$\int_a^b u\frac{dv}{dx}\,dx = \left[uv - \int v\frac{du}{dx}\,dx\right]_a^b.$$
The operation $[\;\;]_a^b$ applies to the two terms separately, so we have:

Integration by parts (definite integrals)

$$\int_a^b u\frac{dv}{dx}\,dx = [uv]_a^b - \int_a^b v\frac{du}{dx}\,dx. \qquad (17.8)$$
This can sometimes considerably simplify the working, especially if more than one integration by parts is needed.

Example 17.24 Evaluate $\displaystyle\int_0^{\frac{1}{2}\pi} x^2\sin x\,dx$.

As in Example 17.23, put $u = x^2$ and $dv/dx = \sin x$. Then
$$\frac{du}{dx} = 2x, \qquad v = -\cos x.$$
From (17.8),
$$\int_0^{\frac{1}{2}\pi} x^2\sin x\,dx = \left[x^2(-\cos x)\right]_0^{\frac{1}{2}\pi} - \int_0^{\frac{1}{2}\pi} (-\cos x)(2x)\,dx = 2\int_0^{\frac{1}{2}\pi} x\cos x\,dx,$$
because the bracketed term is zero; we did not have to wait to the end of the calculation to see it go. To evaluate the remaining integral, integrate by parts again, putting $u = x$ and $dv/dx = \cos x$; we have
$$\frac{du}{dx} = 1, \qquad v = \sin x.$$
Use (17.8) again:
$$2\int_0^{\frac{1}{2}\pi} x\cos x\,dx = 2\left([x\sin x]_0^{\frac{1}{2}\pi} - \int_0^{\frac{1}{2}\pi}\sin x\,dx\right) = 2\left(\tfrac{1}{2}\pi + [\cos x]_0^{\frac{1}{2}\pi}\right) = 2\left[\tfrac{1}{2}\pi + (0 - 1)\right] = \pi - 2.$$
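The sketch below checks Example 17.24 in two independent ways. One is a Simpson's-rule value (the helper `simpson` is introduced here for illustration). The other evaluates the indefinite result of Example 17.23 between the limits, in the spirit of (17.8). Both should give $\pi - 2$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """-x^2 cos x + 2x sin x + 2 cos x, the antiderivative from Example 17.23."""
    return -x * x * math.cos(x) + 2 * x * math.sin(x) + 2 * math.cos(x)

numeric = simpson(lambda x: x * x * math.sin(x), 0.0, math.pi / 2)
from_antiderivative = F(math.pi / 2) - F(0.0)

print(numeric, from_antiderivative, math.pi - 2)
```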

The following result is important for Chapter 24, and involves the use of (17.8):
$$\int_0^\infty e^{-t}t^N\,dt = N! \quad\text{when } N = 0, 1, 2, 3, \ldots. \quad \text{(0! is defined to be 1.)} \qquad (17.9)$$
Here N! stands for the factorial (see eqn (1.38a)):
$$N! = N(N - 1)(N - 2)\cdots 3\cdot 2\cdot 1.$$
The symbol 0!, which is apparently arbitrarily given the value 1, does not fit this pattern; it should be regarded at this stage as being just a useful convention. The related gamma function $\Gamma(N) = (N - 1)!$ is used in statistics in Section 41.6.
To prove (17.9), let k represent any of the numbers 0, 1, 2, …, and write
$$\int_0^\infty e^{-t}t^k\,dt = F(k),$$
to indicate the integral's dependence on the parameter k; for example, $\int_0^\infty e^{-t}t^3\,dt$ is denoted by F(3). Notice in particular that
$$F(0) = \int_0^\infty e^{-t}\,dt = \left[-e^{-t}\right]_0^\infty = 1. \qquad (17.10)$$
For k = 1, 2, …, integrate by parts. Put $u = t^k$ and $dv/dt = e^{-t}$; then
$$F(k) = \int_0^\infty e^{-t}t^k\,dt = \left[t^k(-e^{-t})\right]_0^\infty - \int_0^\infty (kt^{k-1})(-e^{-t})\,dt = k\int_0^\infty e^{-t}t^{k-1}\,dt$$
(the bracket is zero: at the lower limit because $k \ge 1$, and at the upper limit because $e^{-t}$ tends to zero faster than $t^k$ grows). The integral is $F(k - 1)$, so
$$F(k) = kF(k - 1) \quad\text{for } k = 1, 2, \ldots. \qquad (17.11)$$

By integrating by parts, we have reduced the degree of t by unity. We could evaluate F(N), where N is given, by integrating by parts again and again until we reach F(0), given as 1 by (17.10). But we do not have to integrate by parts any more: eqn (17.11) does it for us. Put k = 0, 1, 2, 3, …, successively: we obtain
F(0) = 1 (by (17.10)),
F(1) = 1F(0) (by (17.11)) = 1· 1 = 1!,
F(2) = 2F(1) (by (17.11)) = 2(1!) = 2!,
F(3) = 3F(2) (by (17.11)) = 3(2!) = 3!,
and so on (each line uses the result of the previous line). So, if we are given N, we
shall reach F(N) after N lines, and find that
F(N) = N!.
The argument above can be expressed in a different way. Using (17.11) repeatedly
we have
F(N) = NF(N − 1) = N(N − 1)F(N − 2) = ··· ,
until we arrive at F(0), which is 1, and we are left with N! on the right.
Equation (17.11) is an example of a reduction formula, by which an integral
can be systematically reduced, one step at a time, to progressively simpler
integrals. (See Problems 17.14, 17.15, 17.16.)
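The reduction formula can be mirrored directly in code. The sketch below starts from F(0) = 1, eqn (17.10), and applies F(k) = kF(k − 1), eqn (17.11), in a loop. It then compares the result with Python's `math.factorial` and with a Simpson's-rule estimate of F(5). The estimate replaces the infinite upper limit by t = 60, an assumption of this sketch. The neglected tail of $e^{-t}t^5$ beyond t = 60 is of order $10^{-17}$, far below the tolerance used.

```python
import math

def F_reduction(N):
    """F(N) built from F(0) = 1 (17.10) and F(k) = k F(k - 1) (17.11)."""
    value = 1  # F(0)
    for k in range(1, N + 1):
        value = k * value
    return value

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

numeric_F5 = simpson(lambda t: math.exp(-t) * t ** 5, 0.0, 60.0, n=6000)

print([F_reduction(N) for N in range(6)])  # [1, 1, 2, 6, 24, 120]
print(numeric_F5)                          # close to 120
```

Each pass of the loop plays the role of one line of the table above.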

17.9 Differentiating with respect to a parameter


(A fuller account of this topic is given in Section 28.8.) Suppose that we wish to integrate a function which, besides the variable of integration, contains a parameter (i.e. a general constant which may take any of a range of values). For example,
$$\int_0^\infty x e^{-\alpha x}\,dx$$
is such an integral, the parameter being α (taken to be positive, so that the integral converges). This may be written
$$I(\alpha) = \int_0^\infty x e^{-\alpha x}\,dx = -\int_0^\infty \frac{d}{d\alpha}\left(e^{-\alpha x}\right)dx,$$
the derivative being with respect to α (not to x, which is treated like a constant for the purpose of the differentiation). It can be shown, as in Section 28.8, that the operator d/dα can be taken outside the integral sign, so that we have
$$I(\alpha) = \int_0^\infty x e^{-\alpha x}\,dx = -\frac{d}{d\alpha}\int_0^\infty e^{-\alpha x}\,dx = -\frac{d}{d\alpha}\left(\frac{1}{\alpha}\right) = \frac{1}{\alpha^2}.$$
In cases when we can foresee that the original integrand can be written in the form
$$\frac{d}{d\alpha}\left(\text{something that we can integrate with respect to } x\right),$$
this procedure enables the original integral to be worked out. The following two examples further illustrate the procedure; the method can also be used for indefinite integrals.

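Example 17.25 Evaluate the indefinite integral
$$\int x^\alpha\ln x\,dx = I(\alpha).$$
We observe that (see Problem 3.18)
$$\frac{d}{d\alpha}(x^\alpha) = \frac{d}{d\alpha}\left(e^{\alpha\ln x}\right) = e^{\alpha\ln x}\ln x = x^\alpha\ln x.$$
Therefore
$$I(\alpha) = \int \frac{d}{d\alpha}(x^\alpha)\,dx = \frac{d}{d\alpha}\int x^\alpha\,dx = \frac{d}{d\alpha}\left(\frac{1}{\alpha + 1}x^{\alpha + 1}\right) = -\frac{1}{(\alpha + 1)^2}x^{\alpha + 1} + \frac{1}{\alpha + 1}x^{\alpha + 1}\ln x,$$
apart from a constant of integration.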
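The sketch below illustrates Example 17.25 numerically for the illustrative value α = 2 over the range [1, 2]. It compares three numbers. The first is a Simpson's-rule value of $\int_1^2 x^\alpha\ln x\,dx$, using the helper `simpson` assumed here. The second is the formula of Example 17.25 evaluated between the limits. The third is a central-difference derivative with respect to α of $H(\alpha) = \int_1^2 x^\alpha\,dx = (2^{\alpha+1} - 1)/(\alpha + 1)$, which imitates differentiating under the integral sign. The step `d` is an arbitrary choice.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

alpha = 2.0  # illustrative parameter value; the formula needs alpha != -1

def I_formula(x):
    """Antiderivative from Example 17.25 (constant of integration omitted)."""
    return (-x ** (alpha + 1) / (alpha + 1) ** 2
            + x ** (alpha + 1) * math.log(x) / (alpha + 1))

def H(a):
    """Integral of x^a over [1, 2], as a function of the parameter a."""
    return (2.0 ** (a + 1) - 1.0) / (a + 1)

exact = I_formula(2.0) - I_formula(1.0)
numeric = simpson(lambda x: x ** alpha * math.log(x), 1.0, 2.0)
d = 1e-5
by_parameter = (H(alpha + d) - H(alpha - d)) / (2 * d)  # dH/d(alpha)

print(exact, numeric, by_parameter)
```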

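Example 17.26 Show that
$$\int_0^\infty \frac{dx}{(x^2 + 1)^2} = \frac{\pi}{4}.$$
There is no parameter in the integral, so we shall introduce one and put α = 1 at the end. Define I(α), for α > 0, by
$$I(\alpha) = \int_0^\infty \frac{dx}{(x^2 + \alpha^2)^2}. \qquad \text{(i)}$$
Observe that
$$\frac{d}{d\alpha}\left(\frac{1}{x^2 + \alpha^2}\right) = -\frac{2\alpha}{(x^2 + \alpha^2)^2} \quad\text{or}\quad \frac{1}{(x^2 + \alpha^2)^2} = -\frac{1}{2\alpha}\frac{d}{d\alpha}\left(\frac{1}{x^2 + \alpha^2}\right).$$
Then
$$I(\alpha) = -\frac{1}{2\alpha}\int_0^\infty \frac{d}{d\alpha}\left(\frac{1}{x^2 + \alpha^2}\right)dx = -\frac{1}{2\alpha}\frac{d}{d\alpha}\int_0^\infty \frac{dx}{x^2 + \alpha^2}. \qquad \text{(ii)}$$
But
$$\int_0^\infty \frac{dx}{x^2 + \alpha^2} = \left[\frac{1}{\alpha}\arctan\left(\frac{x}{\alpha}\right)\right]_0^\infty = \frac{\pi}{2\alpha}. \qquad \text{(iii)}$$
Put (iii) into (ii); we obtain
$$I(\alpha) = -\frac{1}{2\alpha}\frac{d}{d\alpha}\left(\frac{\pi}{2\alpha}\right) = \frac{\pi}{4\alpha^3}. \qquad \text{(iv)}$$
By putting α = 1 into (iv) we obtain from (i)
$$\int_0^\infty \frac{dx}{(x^2 + 1)^2} = \frac{\pi}{4},$$
as requested (though (iv) is a more general result).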
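The sketch below checks result (iv) of Example 17.26 numerically. Simpson's rule cannot be applied directly to an infinite range, so the sketch maps $[0, \infty)$ onto $[0, 1)$ with $x = t/(1 - t)$, a change of variable chosen here and not taken from the text. Under this mapping the integrand becomes $(1 - t)^2/\left(t^2 + \alpha^2(1 - t)^2\right)^2$, which is smooth on $[0, 1]$. The values α = 1 and α = 2 and the helper `simpson` are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def I_numeric(alpha, n=2000):
    """Integral of 1/(x^2 + alpha^2)^2 over [0, inf) via x = t/(1 - t)."""
    def g(t):
        # dx = dt/(1 - t)^2 and x^2 + alpha^2 = (t^2 + alpha^2 (1 - t)^2)/(1 - t)^2
        s = 1.0 - t
        return s * s / (t * t + alpha * alpha * s * s) ** 2
    return simpson(g, 0.0, 1.0, n)

for alpha in (1.0, 2.0):
    print(alpha, I_numeric(alpha), math.pi / (4 * alpha ** 3))
```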

Self-test 17.8

Evaluate
$$\int x^\alpha(\ln x)^2\,dx.$$
(Hint: use the result from Example 17.25.)

Problems

17.1 (Section 17.1). Obtain $\int f(x)\,dx$ when the f(x) are as follows.
(a) $\sin 3x$; (b) $\cos 4x$; (c) $e^{-3x}$;
(d) $(1 + x)^{10}$; (e) $(1 - x)^9$;
(f) $(3 - 2x)^5$; (g) $(1 + 2x)^{1/n}$;
(h) $x(x - 1)^4$; (i) $(1 - x)^{1/2}$;
(j) $(2x - 3)^{-2}$ for $x > \frac{3}{2}$;
(k) $1/(3x + 2)^2$; (l) $1/(1 - x)^4$;
(m) $1/(1 + x)$; (n) $1/(2x + 3)$;
(o) $x/(1 - x)^2$; (p) $(1 + x)/(1 - x)$;
(q) $x/(x - 1)^{1/2}$ for $x > 1$;
(r) $\cos(1 - 2x)$; (s) $\sin(2x - 3)$.

17.2 (Section 17.1). Evaluate the following indefinite integrals.
(a) $\int (2t - 5)^5\,dt$; (b) $\int \sin\frac{1}{2}(3t - 1)\,dt$;
(c) $\int \frac{dw}{(2w + 1)^{1/2}}$; (d) $\int e^{-3r}\,dr$;
(e) $\int (-t)^{1/2}\,dt$ if $t < 0$; (f) $\int \frac{s\,ds}{(1 - s)^3}$;
(g) $\int \cos(\omega t - \phi)\,dt$.

17.3 (Section 17.2). Obtain $\int f(x)\,dx$ when the f(x) are as follows.
(a) $x e^{-x^2}$; (b) $x\sin x^2$; (c) $x\cos x^2$; (d) $x\cos(x^2 + 3)$;
(e) $x\cos(1 - 3x^2)$; (f) $x(x^2 - 1)^4$; (g) $x(3x^2 + 4)^3$; (h) $x/(1 + 2x^2)$;
(i) $x^3(1 - x^2)^3$ (note: $x^3 = x\cdot x^2$);
(j) $x/(1 + x^2)$; (k) $x/(3x^2 - 2)$.

17.4 (Section 17.3). Find $\int f(x)\,dx$ where the f(x) are as follows.
(a) $\sin x\cos x$; (b) $\sin^2 x\cos x$;
(c) $\sin^2 2x\cos 2x$; (d) $\cos^2 x\sin x$;
(e) $\cos^2 3x\sin 3x$; (f) $\sin^3 x\cos x$;
(g) $\cot 2x$; (h) $\tan\frac{1}{2}x$; (i) $(\sin^3 x)/\cos x$;
(j) $\sin^3 x$ $(= \sin^2 x\sin x)$; (k) $\tan^3 x$ (compare (j));
(l) $\cos^3 x$ (compare (j)).

17.5 Evaluate the following definite integrals by using any necessary substitutions.
(a) $\int_{-1}^{1}(1 + x)^7\,dx$; (b) $\int_{-1}^{1}(1 - \frac{1}{2}x)^7\,dx$;
(c) $\int_0^1 x(1 - x^2)^3\,dx$; (d) $\int_0^1 \frac{x\,dx}{2x + 3}$;
(e) $\int_{-3}^{-2}\frac{dx}{1 + x}$ (note: $x < -1$); (f) $\int_3^4 \frac{dx}{2 - 3x}$;
(g) $\int_0^1 x^3(1 - x^2)^3\,dx$; (h) $\int_0^{\frac{1}{4}\pi}\tan t\,dt$;
(i) $\int_{\ldots}^{\ldots}\cot 3w\,dw$; (j) $\int_0^{\ldots}\sin u\cos u\,du$;
(k) $\int_0^{\frac{1}{2}\pi}(\sin v)^{1/2}\cos v\,dv$; (l) $\int_{-\frac{1}{2}\pi}^{\frac{1}{2}\pi}\cos^3\theta\,d\theta$;
(m) $\int_0^{\frac{1}{2}\pi}\sin 2t\,dt$; (n) $\int_{-\frac{1}{2}\pi/\omega}^{\frac{1}{2}\pi/\omega}\cos(\omega t + \phi)\,dt$.

17.6 Use the identities $\cos^2 A = \frac{1}{2}(1 + \cos 2A)$, $\sin^2 A = \frac{1}{2}(1 - \cos 2A)$, or $\sin A\cos A = \frac{1}{2}\sin 2A$ to evaluate the following.
(a) $\int_0^\pi \sin^2 t\,dt$; (b) $\int_0^\pi \cos^2 t\,dt$;
(c) $\int_0^{\frac{1}{2}\pi}\sin^2 2t\,dt$; (d) $\int_0^{2\pi}\cos^2\frac{1}{2}t\,dt$;
(e) $\int_{-\pi}^{\pi}\sin^2 3t\cos 3t\,dt$; (f) $\int_0^\pi \cos^4 u\,du$.

17.7 Use the substitutions suggested to evaluate $\int f(x)\,dx$ for the following f(x). (In several of the questions the identity $1 + \tan^2 A = 1/\cos^2 A$ is needed. You may also have to refer to the table, Appendix E.)
(a) $\ln x/x$ (put $x = e^u$);
(b) $x(1 - x^2)^{1/2}$ (try (i) $u = 1 - x^2$, (ii) $x = \sin u$);
(c) $1/(e^x + e^{-x})$ (put $u = e^x$);
(d) $1/(1 - x^2)^{1/2}$ (try (i) $x = \sin u$, (ii) $x = \cos u$; why do the results seem to be different?);
(e) $\tan^2 x$ (put $u = \tan x$);
(f) $1/x^2(1 + x^2)$ (put $x = 1/u$, followed by another process);
(g) $1/(1 + x^2)$ (put $x = \tan u$);
(h) $1/\cos^2 x$ (put $u = \tan x$);
(i) $1/t^{1/2}(1 + t)$ (put $t = u^2$);
(j) $\frac{1}{t^2}\sin\frac{1}{t}$ (put $t = 1/u$);
(k) $(1 - x^2)^{1/2}$ (put $x = \sin u$);
(l) $1/(1 + x^2)^{1/2}$ (put $x = \tan u$).

17.8 Use partial fractions to evaluate $\int f(x)\,dx$ for the following f(x).
(a) $1/(x^2 - 4)$; (b) $1/x(x + 2)$;
(c) $1/x^2(x - 1)$; (d) $x/(2x + 1)(x + 1)$;
(e) $(x + 1)/(4x^2 - 9)$; (f) $1/x(x^2 + 1)$;
(g) $x/(2x^2 + 3x + 1)$; (h) $1/x^2(2x + 1)$;
(i) $1/\cos x$ (first put $u = \sin x$);
(j) $1/\sin x$ (first put $u = \cos x$).

17.9 Obtain $\int f(x)\,dx$ for each of the following f(x), noting that they take the form $c\,g(u)\,du/dx$ (see the remark at the end of Section 17.5), so that (a), for example, will respond to the substitution $u = x^3 - 1$.
(a) $x^2(x^3 - 1)^5$; (b) $(x - 1)(x^2 - 2x + 3)^{-1}$;
(c) $1/(x\ln x)$; (d) $x^{1/2}(3x^{3/2} + 2)^{1/2}$;
(e) $(e^x - e^{-x})/(e^x + e^{-x})$; (f) $1/x^{1/2}(x^{1/2} + 1)$;
(g) $x^2/(x^3 + 1)$.

17.10 Use integration by parts (Section 17.7) to obtain $\int f(x)\,dx$ for each of the following f(x).
(a) $x e^{-x}$; (b) $x e^{3x}$; (c) $x e^{-3x}$;
(d) $x\cos x$; (e) $x\sin x$; (f) $x\cos\frac{1}{2}x$;
(g) $x\sin 2x$; (h) $x(1 - x)^{10}$; (i) $x\ln x$;
(j) $x^n\ln x$, $n \ne -1$;
(k) $(\ln x)/x$ (the method might seem to have failed; but look again).

17.11 Use integration by parts (see Example 17.22), writing the integrand as f(x)(1), to obtain $\int f(x)\,dx$ for each of the following f(x).
(a) $\ln^2 x$; (b) $\arcsin x$; (c) $\arccos x$; (d) $\arctan x$.

17.12 To evaluate $\int f(x)\,dx$ for the following f(x), integrate by parts twice; then look closely at your result. (If it does not work out you have probably made a mistake with a sign.) Compare your results with (15.11).
(a) $e^x\sin x$; (b) $e^{-x}\sin x$; (c) $e^{-x}\cos x$.

17.13 (Integration by parts: definite integrals, Section 17.8). Evaluate the following.
(a) $\int_0^\pi x\cos x\,dx$; (b) $\int_0^{\ldots} x\cos 2x\,dx$;
(c) $\int_0^\pi x^2\cos x\,dx$;
(d) $\int_0^\pi e^{-x}\sin x\,dx$ (integrate by parts twice);
(e) $\int_0^\infty e^{-x}\cos x\,dx$ (integrate by parts twice);
(f) $\int_1^2 \frac{\ln x}{x}\,dx$; (g) $\int_0^1 \arcsin x\,dx$;
(h) $\int_{-1}^{1}\arccos x\,dx$; (i) $\int_0^1 \arctan x\,dx$;
(j) $\int_1^2 \ln x\,dx$.

17.14 (Compare (17.9).) Denote $\int_0^1 x^k e^x\,dx$ by F(k) for k = 0, 1, 2, … . Integrate by parts to obtain the reduction formula
$$F(k) = e - kF(k - 1)$$
(provided that k is positive). By applying it four times, show that
$$F(4) = \int_0^1 x^4 e^x\,dx = -15e + 24F(0) = 9e - 24.$$

17.15 Denote $\int_0^{\frac{1}{2}\pi}\cos^k x\,dx$ by F(k) when k = 0, 1, 2, … . Integrate by parts to show that
$$F(k) = \frac{k - 1}{k}F(k - 2) \quad\text{for } k = 2, 3, \ldots.$$
Evaluate F(0) and F(1). Use the reduction formula repeatedly, together with F(0) and F(1), to evaluate $\int_0^{\frac{1}{2}\pi}\cos^4 x\,dx$ and $\int_0^{\frac{1}{2}\pi}\cos^5 x\,dx$.

17.16 Follow the lines of Problems 17.14 and 17.15 to obtain the following reduction formulae, and to integrate the special cases given. The letter k is an integer as specified for each case.
(a) Let
$$F(k) = \int_1^2 (\ln x)^k\,dx \quad (k \ge 0).$$
Show that $F(k) = 2(\ln 2)^k - kF(k - 1)$ for $k \ge 1$, and evaluate $\int_1^2 (\ln x)^3\,dx$.
(b) Let $F(k) = \int_0^\pi x^k\sin x\,dx$ $(k \ge 0)$. Integrate by parts twice to show that
$$F(k) = \pi^k - k(k - 1)F(k - 2)$$
for $k \ge 2$. Evaluate $\int_0^\pi x^4\sin x\,dx$ and $\int_0^\pi x^5\sin x\,dx$.
(c) Obtain a reduction formula for
$$F(k) = \int_0^{\frac{1}{2}\pi}\sin^k x\,dx,$$
and use it to evaluate $\int_0^{\frac{1}{2}\pi}\sin^4 x\,dx$ and $\int_0^{\frac{1}{2}\pi}\sin^5 x\,dx$.

17.17 (Change of variable etc.). Denote the integral $\int_1^c \frac{dx}{x}$ for $c > 0$ by F(c). Deduce the properties (a) to (d) below. F(c) is obviously equal to ln c, but do not use any of the known properties of the logarithm; pretend that this is the first time you have ever seen the integral.
(a) $F(a^{-1}) = -F(a)$ if $a > 0$. (Hint: put $c = a^{-1}$ in the definition; then change the variable to u where $u = x^{-1}$.)
(b) $F(ab) = F(a) + F(b)$ if a and $b > 0$.
(c) $F(a/b) = F(a) - F(b)$, where a and $b > 0$.
(d) $F(a^n) = nF(a)$ if $a > 0$ and n has any value.

17.18 (Integration by parts). It is stated in Example 17.19 that, in obtaining v from dv/dx, we may take any antiderivative (so naturally we always take the simplest one, with C = 0 in the tables). Confirm that this is true for Example 17.19, in which $u = x$ and $dv/dx = e^x$, by choosing $v(x) = e^x + A$ instead. Prove that the truth of (17.7) is always unaffected by the choice of antiderivative for v(x).

17.19 (Integration by parts: an apparent paradox). Consider the following calculation.
$$\int x^{-1}\,dx = \int x^{-1}(1)\,dx = x^{-1}x - \int (-x^{-2})x\,dx = 1 + \int x^{-1}\,dx.$$
Therefore 0 = 1. How is this to be resolved?

17.20 Verify the following moments of inertia I about the axis stated (Section 16.4):
(a) thin circular disc, mass m, radius a, about a diameter: $I = \frac{1}{4}ma^2$;
(b) solid uniform sphere, mass m, radius a, about a diameter: $I = \frac{2}{5}ma^2$;
(c) thin spherical shell, mass m, radius a, about a diameter: $I = \frac{2}{3}ma^2$;
(d) thin rectangle, mass m, side lengths 2a and 2b, about a diagonal: $I = 2ma^2b^2/[3(a^2 + b^2)]$;
(e) solid uniform cone, mass m, base radius a, height h, about its axis: $I = \frac{3}{10}ma^2$.

17.21 Assume that
$$\int e^{-at}\cos bt\,dt = A e^{-at}\cos bt + B e^{-at}\sin bt + C,$$
where A, B, and C are constants. By differentiating this expression and matching both sides, obtain the constants A and B in terms of a and b. Compare your result with eqn (15.11).

17.22 Evaluate the indefinite integral
$$I(\alpha) = \int x^2 e^{-\alpha x}\,dx,$$
using the technique of differentiating under the integral sign. (Hint: $(d^2/d\alpha^2)(e^{-\alpha x}) = x^2 e^{-\alpha x}$.)

17.23 (Some additional special substitutions). Evaluate the following integrals starting with the substitution suggested (further substitutions may be required: the table of integrals in Appendix E may also be helpful):
(a) $\int \frac{dx}{x\sqrt{x^2 - a^2}}$, $x = a/u$;
(b) $\int \frac{dx}{x\sqrt{a^2 - x^2}}$, $x = a/u$;
(c) $\int \frac{dx}{a^2\sin^2 x + b^2\cos^2 x}$, $u = (a\tan x)/b$;
(d) $\int \frac{dx}{\sin x}$, $u = \tan\frac{1}{2}x$;
(e) $\int \frac{dx}{3 + 5\cos x}$, $u = \tan\frac{1}{2}x$;
(f) $\int \frac{dx}{5\cosh x + 4\sinh x}$, $u = \tanh\frac{1}{2}x$;
(g) $\int \sec x\,dx$, $u = \sec x + \tan x$;
(h) $\int_0^4 \frac{dx}{1 + \sqrt{x}}$, $x = u^2$;
(i) $\int x(1 + x)^{1/3}\,dx$, $x = u^3 - 1$;
(j) $\int \frac{(x^2 + 1)\,dx}{x\sqrt{x^4 + 7x^2 + 1}}$, $u = x - \frac{1}{x}$;
(k) $\int_0^4 \sqrt{1 + \sqrt{x}}\,dx$, $u = 1 + \sqrt{x}$.

17.24 If p(x) is a polynomial of degree n, show that
$$\int e^x p(x)\,dx = e^x\left[p(x) - p'(x) + p''(x) - \cdots + (-1)^n p^{(n)}(x)\right] + C.$$
Hence evaluate
$$\int_0^1 e^x(x^3 - 2x^2 + x - 2)\,dx.$$
What is the formula for
$$\int e^{-x}p(x)\,dx?$$
What is the value of the infinite integral
$$\int_0^\infty e^{-x}p(x)\,dx?$$

17.25 Find the centroid of the uniform plate bounded by the parabola $y^2 = 4ax$ and the straight line $x = h$ $(a, h > 0)$.
18 Unforced linear differential equations with constant coefficients

CONTENTS

18.1 Differential equations and their solutions
18.2 Solving first-order linear unforced equations
18.3 Solving second-order linear unforced equations
18.4 Complex solutions of the characteristic equation
18.5 Initial conditions for second-order equations
Problems

Suppose that we have a problem in which a quantity x that we are studying depends on the time t; that is to say, x is a function of t, which we will write as x(t). From the physics and geometry of the problem we can often obtain an indirect relation between x and t, called an equation for x. The equation might be an ordinary algebraic equation such as $x^2 + 2xt = 1$, but it might contain $dx/dt$ or $d^2x/dt^2$, as in the equation $d^2x/dt^2 = g$ for a falling body, where g is the gravitational acceleration. This is a simple example of a differential equation, and we can solve it by the methods of earlier chapters (compare Problem 14.8f).
The equation
$$\frac{dx}{dt} = 3x$$
is also a differential equation, but we do not yet know how to find an explicit solution for x in terms of t. Obviously not just anything will do; if for instance we try $x = t^2$ it does not work, because then $dx/dt = 2t$, but $3x = 3t^2$, and these are quite different.
A clue is given by interpreting the equation: it says that a quantity x always grows at a rate proportional to the amount of x already present. This is a property of the exponential function (see Section 1.10), so we might try exponential functions of t. In fact,
$$x = e^{3t}$$
solves the equation, because then $dx/dt = 3e^{3t}$, and this is equal to 3x, as required. However, it is not the only solution, because
$$x = Ae^{3t},$$
where A is any constant, also solves the equation.

18.1 Differential equations and their solutions


In general, a differential equation for a function x(t) is an equation involving at least one derivative of x (dx/dt, $d^2x/dt^2$, …) as well as, possibly, x and t separately. Some examples are
$$\frac{dx}{dt} + 2xt = 1, \qquad \frac{d^2x}{dt^2} + \frac{dx}{dt} + x = 0, \qquad \frac{d^3x}{dt^3} = \frac{x^2}{t^2}.$$
In such equations, t is called the independent variable and x the dependent variable. An equation is called first-order, second-order, and so on, according to the order of the highest derivative in it: dx/dt, $d^2x/dt^2$, and so on.

[Fig. 18.1: RL circuit containing a resistance R, an inductance L, a switch, and an applied voltage E(t).]

Problems in science and engineering are often most easily formulated in terms of differential equations. Suppose for example that in the RL circuit of Fig. 18.1 the switch is closed at time t = 0, and that subsequently the voltage applied is E(t). Then the current x(t) is found by solving the differential equation
$$L\frac{dx}{dt} + Rx = E(t).$$
Here we have collected all the terms that involve x (including dx/dt) on the left side and have put the term that does not involve x, namely E(t), on the right. This is the conventional arrangement. The term independent of x which comes on the right is then called the forcing term, the reason being obvious in this case, since E(t) drives the circuit.
The differential equation with the same left-hand side, but with a zero forcing term on the right, plays a key role in obtaining solutions of the original equation. Such equations are called unforced differential equations, or sometimes homogeneous equations, and are the subject of this chapter. Also, for the present, we shall further restrict ourselves to linear equations with constant coefficients, which have the form:

Linear unforced differential equations with constant coefficients

(a) First-order:
$$\frac{dx}{dt} + cx = 0 \quad (c \text{ constant}).$$
(b) Second-order:
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0 \quad (b, c \text{ constants}). \qquad (18.1)$$
These are called linear because there are no squares, products, etc., involving x and its derivatives. Such equations have comparatively simple characteristics. The simplest instance of all is
$$\frac{dx}{dt} = 0.$$
It has solutions x = A, where A is any constant. There is therefore an infinity of solutions, and we must expect this to be true in more general cases too.
A solution of a differential equation is any function x(t) which fits, or satisfies,
the equation. This is illustrated in the next two examples.

Example 18.1 For the differential equation $dx/dt + 2x = 0$, verify that (a) $x = e^{2t}$ is not a solution, (b) $x = 2e^{-2t}$ is a solution.
(a) Test $x = e^{2t}$. Then $dx/dt = 2e^{2t}$ and so
$$\frac{dx}{dt} + 2x = 2e^{2t} + 2e^{2t} = 4e^{2t}.$$
This is not zero, so $e^{2t}$ is not a solution.
(b) Test $x = 2e^{-2t}$. Then $dx/dt = -4e^{-2t}$ and so
$$\frac{dx}{dt} + 2x = -4e^{-2t} + 4e^{-2t} = 0.$$
The zero value is what the equation requires, so $2e^{-2t}$ is a solution.
Incidentally, we can confirm in the same way that $x = Ae^{-2t}$, where A is any constant, is always a solution. We have
$$\frac{dx}{dt} + 2x = -2Ae^{-2t} + 2Ae^{-2t} = 0,$$
as it should be. This is the infinity of solutions we were expecting.

Example 18.2 Verify that the following functions are solutions of the second-order equation $d^2x/dt^2 + 4x = 0$: (a) $x = \cos 2t$, (b) $x = \sin 2t$, (c) $x = A\cos 2t + B\sin 2t$, where A and B are any constants.
Note that 'verify' means 'try out': you are not expected to show how the solutions were obtained.
(a) If $x = \cos 2t$, then $dx/dt = -2\sin 2t$, and $d^2x/dt^2 = -4\cos 2t$. Therefore
$$\frac{d^2x}{dt^2} + 4x = -4\cos 2t + 4\cos 2t = 0$$
as required.
(b) Similarly, if $x = \sin 2t$, then
$$\frac{d^2x}{dt^2} + 4x = -4\sin 2t + 4\sin 2t = 0.$$
(c) Confirmation is straightforward, but the underlying reason why the previous solutions can be combined into a new solution in this way is made clearer by organizing the calculation as follows.
$$\frac{d^2x}{dt^2} + 4x = \frac{d^2}{dt^2}(A\cos 2t + B\sin 2t) + 4(A\cos 2t + B\sin 2t) = A\left(\frac{d^2}{dt^2}\cos 2t + 4\cos 2t\right) + B\left(\frac{d^2}{dt^2}\sin 2t + 4\sin 2t\right),$$
by rearranging the terms. We already know that the two bracketed expressions are zero, so the whole expression is zero as required.
The separation of $d^2x/dt^2 + 4x$ into an 'A' part and a 'B' part in this way is possible only because the equation is linear.

18.2 Solving first-order linear unforced equations


Consider the equation
$$\frac{dx}{dt} + cx = 0 \quad (c \text{ a fixed constant}). \qquad (18.2)$$
If we write it in the form
$$\frac{dx}{dt} = (-c)x,$$
it can be seen to describe the variation of a quantity x(t) which decays (if c is positive) or grows (if c is negative) at a rate proportional to the amount of x already present. From Section 1.10, we know that exponential functions have this property. We shall therefore test for solutions of the form
$$x(t) = Ae^{mt}, \qquad (18.3)$$
where A and m are unknown constants which we shall try to adjust to fit the equation. From (18.3),
$$\frac{dx}{dt} + cx = Ame^{mt} + cAe^{mt} = A(m + c)e^{mt}.$$
This quantity must be zero for all values of t in order to fit the differential equation (18.2). Since $e^{mt}$ is never zero, and ignoring the possibility A = 0, which gives us the so-called trivial solution x(t) = 0, we must have
$$m = -c,$$
and in that case it does not matter what value is given to A. We have therefore found a collection of solutions $x(t) = Ae^{-ct}$, where A is an arbitrary constant. It can be proved that there are no other solutions, and so we call the solutions we have found the general solution of the equation.

The general solution of
$$\frac{dx}{dt} + cx = 0,$$
where c is a given constant, is
$$x = A e^{-ct},$$
where A is any constant. (18.4)

Example 18.3 Find the general solution of $dx/dt - 4x = 0$.
We will rework the theory. Look for solutions of the form $x = A e^{mt}$:
$$\frac{dx}{dt} - 4x = Am e^{mt} - 4A e^{mt} = A e^{mt}(m - 4).$$
This is zero for all time if m = 4, whatever the value of A. Therefore the general solution (which includes the solution $x(t) \equiv 0$) is
$$x = A e^{4t}, \quad\text{with } A \text{ an arbitrary constant.}$$
Figure 18.2 depicts several of these solutions, corresponding to various values of the
arbitrary constant A.

Fig. 18.2 Solution curves $x = A e^{4t}$ of Example 18.3 for several positive and negative values of A, including A = 0, plotted against t. The values of A are indicated on the curves.

Each value of A gives a different curve, and these solution curves fill the whole plane.
Also the curves do not cross, so there is one and only one curve through every point. This
corresponds to the fact that the slope dx/dt has one and only one value at every point,
namely the value prescribed by the differential equation dx/dt = 4x taken at the point.
This is all strong evidence that we have found all the solutions. More is said about the
graphical way of understanding differential equations in Chapters 22 and 23.

Example 18.4 Find all the solutions of $3\,dx/dt + 2x = 0$.
We could carry out the full calculation as in the previous example. However, if instead we want to quote the formula (18.4), we must first write the equation in the form
$$\frac{dx}{dt} + \tfrac{2}{3}x = 0.$$
Therefore $c = \tfrac{2}{3}$ (not 2), and the general solution is
$$x = A e^{-2t/3}, \quad\text{with } A \text{ any constant.}$$
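A small Python sketch can confirm (18.4) numerically and show why the division in Example 18.4 matters. It assumes the equation is given as $a\,dx/dt + bx = 0$; the helper names `first_order_general_solution` and `residual` are illustrative only, and the derivative is estimated by a central difference, so the residual is only approximately zero.

```python
import math

def first_order_general_solution(a, b, A):
    """x(t) = A e^{-ct} for a*dx/dt + b*x = 0, using c = b/a as in (18.4)."""
    if a == 0:
        raise ValueError("the coefficient of dx/dt must be non-zero")
    c = b / a  # divide through by a first: dx/dt + (b/a) x = 0
    return lambda t: A * math.exp(-c * t)

def residual(a, b, x, t, h=1e-5):
    """Central-difference estimate of a*x'(t) + b*x(t); close to 0 for a solution."""
    dxdt = (x(t + h) - x(t - h)) / (2 * h)
    return a * dxdt + b * x(t)

# Example 18.4: 3 dx/dt + 2x = 0, so c = 2/3 (not 2).
x = first_order_general_solution(3, 2, A=1.5)
wrong = lambda t: 1.5 * math.exp(-2 * t)   # the mistake of taking c = 2
for t in (0.0, 1.0, 2.0):
    print(t, residual(3, 2, x, t), residual(3, 2, wrong, t))
```

With $c = \tfrac{2}{3}$ the residual vanishes up to rounding, whereas the mistaken choice c = 2 leaves a residual of about −6 at t = 0.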
It is worthwhile to memorize the formula (18.4).
In practical cases we do not usually need all the solutions, but only the one which satisfies some further condition of the problem. Frequently the condition supplied describes the state prevailing at the start of the action, or at some other time, as in the following.

Example 18.5 Find the solution of $dx/dt - 4x = 0$ for which x = 2 when t = 1.
Other ways of saying this are ‘find the solution curve which passes through the point (1, 2)’, or ‘find a solution x(t) so that x(1) = 2’.
From Example 18.3, all the possible solutions are given by
$$x = A e^{4t}.$$
Since x = 2 when t = 1, we must have $2 = A e^{4}$. Therefore
$$A = 2e^{-4},$$
and the single solution picked out is
$$x = (2e^{-4})e^{4t} = 2e^{4(t-1)}.$$

An extra condition of this type is called an initial condition. It describes the state of the system at a given time. The differential equation together with its initial condition is called an initial-value problem.

Initial-value problem, first-order equation
(a) Differential equation: $dx/dt + cx = 0$.
(b) Initial condition: $x = x_0$ at $t = t_0$ (or $x(t_0) = x_0$), with $x_0$ and $t_0$ specified. (18.5)
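The initial-value problem (18.5) has the closed-form solution $x(t) = x_0 e^{-c(t - t_0)}$, obtained by fixing A exactly as in Example 18.5. The Python sketch below checks this against a numerical integration. The classical fourth-order Runge-Kutta stepper `rk4` is a standard numerical method brought in only as an independent check; it is not part of the text's method, and the step count `n` is an arbitrary choice.

```python
import math

def ivp_first_order(c, t0, x0):
    """Closed-form solution of dx/dt + c x = 0 with x(t0) = x0.

    From x = A e^{-ct} and x(t0) = x0 we get A = x0 e^{c t0}, so
    x(t) = x0 e^{-c (t - t0)}.
    """
    return lambda t: x0 * math.exp(-c * (t - t0))

def rk4(f, t0, x0, t1, n=1000):
    """Classical fourth-order Runge-Kutta for dx/dt = f(t, x), from t0 to t1 in n steps."""
    h = (t1 - t0) / n
    t, x = t0, x0
    for _ in range(n):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

# Example 18.5: dx/dt - 4x = 0 (so c = -4) with x(1) = 2.
c, t0, x0 = -4.0, 1.0, 2.0
exact = ivp_first_order(c, t0, x0)
numeric = rk4(lambda t, x: -c * x, t0, x0, 1.5)
print(exact(1.5), numeric)   # both close to 2 e^2
```

For Example 18.5 both values should agree with $2e^{4(1.5 - 1)} = 2e^2$ to many decimal places.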

Self-test 18.1
Solve $dx/dt - 10x = 0$ with the condition x = 1 at t = 2.

18.3 Solving second-order linear unforced equations


For second-order differential equations of the type (18.1b), we use a similar
technique.

Example 18.6 Find some solutions of the equation
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 0.$$
We will look first for absolutely basic solutions. Test whether there are any solutions of the form $x(t) = e^{mt}$, where m is constant. Because $dx/dt = m e^{mt}$ and $d^2x/dt^2 = m^2 e^{mt}$, we have
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = m^2 e^{mt} + m e^{mt} - 2e^{mt} = e^{mt}(m^2 + m - 2).$$
This is zero for all time if $m^2 + m - 2 = 0$, that is if
$$m = 1 \text{ or } -2.$$
This gives us two solutions, namely
$$x(t) = e^{t} \quad\text{and}\quad x(t) = e^{-2t}.$$
From this basis, we can obtain more solutions. Guided by Example 18.2(c), we show that also
$$x(t) = A e^{t} + B e^{-2t},$$
where A and B are arbitrary constants, is a solution. By substituting into the equation and sorting the terms into those with coefficient A and those with coefficient B, we obtain
$$A\left(\frac{d^2}{dt^2}e^{t} + \frac{d}{dt}e^{t} - 2e^{t}\right) + B\left(\frac{d^2}{dt^2}e^{-2t} + \frac{d}{dt}e^{-2t} - 2e^{-2t}\right) = 0,$$
because $e^{t}$ and $e^{-2t}$ are known already to be solutions, so both of the bracketed expressions are zero.

This is the principle, but consider now the general case
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0.$$
Look for solutions of the form $x = e^{mt}$. Then
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = e^{mt}(m^2 + bm + c).$$
This will be zero for all t, as required by the differential equation, if
$$m^2 + bm + c = 0, \tag{18.6}$$
which is called the characteristic equation. Being quadratic, it may have two real solutions, exactly one real solution, or two complex solutions, according as the discriminant $b^2 - 4c$ is positive, zero, or negative. Consider the real cases first:
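The three cases can be told apart by the sign of the discriminant $b^2 - 4c$. The following Python sketch (the function name `characteristic_roots` is illustrative) returns the roots of (18.6) together with a label for the case, using `cmath.sqrt` so that the complex case of Section 18.4 is handled as well. Exact comparison with zero is only reliable for integer or rational coefficients; with measured floating-point data a tolerance would be needed.

```python
import cmath

def characteristic_roots(b, c):
    """Roots of m^2 + b m + c = 0, labelled by which case of the text applies.

    The discriminant b^2 - 4c decides the case: > 0 gives two real roots,
    = 0 one repeated real root, < 0 a complex-conjugate pair.
    """
    disc = b * b - 4 * c
    if disc > 0:
        r = disc ** 0.5
        return "real and different", ((-b + r) / 2, (-b - r) / 2)
    if disc == 0:
        return "coincident", (-b / 2, -b / 2)
    r = cmath.sqrt(disc)  # purely imaginary because disc < 0
    return "complex", ((-b + r) / 2, (-b - r) / 2)

print(characteristic_roots(1, -2))   # Example 18.6
print(characteristic_roots(4, 4))    # Example 18.8
print(characteristic_roots(2, 2))    # Example 18.10
```

The three calls should give the roots 1 and −2, the repeated root −2, and the pair −1 ± i, matching Examples 18.6, 18.8, and 18.10.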

Solutions $m_1$ and $m_2$ of the characteristic equation real and different
In this case,
$$x(t) = e^{m_1 t} \quad\text{and}\quad x(t) = e^{m_2 t}$$
are solutions of the differential equation, and from these we can construct a whole family of solutions
$$x(t) = A e^{m_1 t} + B e^{m_2 t},$$
where A and B are arbitrary. It can be proved that there are no more solutions: this gives the general solution. The pair of functions $(e^{m_1 t}, e^{m_2 t})$ is called a basis for the general solution.

Characteristic equation with solutions real and unequal
For $d^2x/dt^2 + b\,dx/dt + cx = 0$, with solutions $m_1$ and $m_2$ of $m^2 + bm + c = 0$ real and different:
Basis of solutions: $e^{m_1 t}$, $e^{m_2 t}$.
General solution: $A e^{m_1 t} + B e^{m_2 t}$ (A, B arbitrary). (18.7)

Example 18.7 Find the general solution of $2\,d^2x/dt^2 - dx/dt - x = 0$.
To correspond with the standard form (18.7), we should have to write the equation in the form $d^2x/dt^2 - \tfrac{1}{2}\,dx/dt - \tfrac{1}{2}x = 0$, but there is no need to do this if we directly test for solutions of the form $x = e^{mt}$. The characteristic equation then takes the form $2m^2 - m - 1 = 0$, or $(2m + 1)(m - 1) = 0$, so that $m_1 = -\tfrac{1}{2}$, $m_2 = 1$. Therefore the basis for the general solution is the solution pair $(e^{-t/2}, e^{t})$, and the general solution is
$$x(t) = A e^{-t/2} + B e^{t}, \quad A \text{ and } B \text{ arbitrary.}$$

Solutions $m_1$ and $m_2$ of the characteristic equation are equal

Suppose that $m_1 = m_2 = m_0$, say. We have then only one function for our basis instead of two, and we might expect the general solution to be $A e^{m_0 t}$. However, all we know is that there is essentially only one solution of the form $e^{mt}$ (ignoring simple multiples of $e^{mt}$), but we shall see in the next example that there is also a solution which is not of this form, namely
$$x(t) = t e^{m_0 t}. \tag{18.8}$$
We might therefore think there will be no end to it: if $t e^{m_0 t}$ is a solution, then why not $t^2 e^{m_0 t}$, or some function of great complication? However, it can be proved that every second-order linear differential equation of this type has two linearly independent solutions, and no more than two (linearly independent means that neither is just a constant multiple of the other); also that any such pair forms a basis of solutions: we do not need any others to construct the most general solution. Formally:

Basis and general solution of
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0$$
(a) There exist two linearly independent solutions.
(b) If u(t) and v(t) are any two linearly independent solutions, these form a basis for the general solution; that is to say, the general solution is given by
$$x(t) = Au(t) + Bv(t),$$
where A and B are arbitrary constants. (18.9)

Example 18.8 Find the general solution of
$$\frac{d^2x}{dt^2} + 4\frac{dx}{dt} + 4x = 0.$$
The characteristic equation, formed by substituting $x(t) = e^{mt}$, is
$$m^2 + 4m + 4 = (m + 2)^2 = 0,$$
and the only value of m that we find is m = −2. It corresponds to the basic solution $e^{-2t}$. The theorem (18.9) guarantees there is another independent solution, and it does not matter how we find it. Test the truth of (18.8), which proposes an independent solution having the form
$$x(t) = t e^{-2t}.$$
Then
$$\frac{dx}{dt} = (1 - 2t)e^{-2t}, \quad\text{and}\quad \frac{d^2x}{dt^2} = (-4 + 4t)e^{-2t}.$$
Therefore
$$\frac{d^2x}{dt^2} + 4\frac{dx}{dt} + 4x = [(-4 + 4t) + 4(1 - 2t) + 4t]e^{-2t},$$
which is zero, so $x(t) = t e^{-2t}$ is a second solution, and it is independent of the first. By (18.9), the solution basis is therefore
$$(e^{-2t}, t e^{-2t}), \qquad \text{(i)}$$
and the general solution is
$$x(t) = A e^{-2t} + Bt e^{-2t}, \quad A \text{ and } B \text{ arbitrary.}$$

The second solution always takes a form similar to (i) in Example 18.8:

Characteristic equation: coincident solutions
If $d^2x/dt^2 + b\,dx/dt + cx = 0$, in which $b^2 - 4c = 0$, and $m_0 = -b/2$ is the single solution of the characteristic equation $m^2 + bm + c = 0$, then the solution basis is $(e^{m_0 t}, t e^{m_0 t})$ and the general solution is $x(t) = A e^{m_0 t} + Bt e^{m_0 t}$ (A and B arbitrary constants). (18.10)

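An alternative way to justify the second solution $t e^{-2t}$ in Example 18.8 is to try $x = f(t)e^{-2t}$. Since $dx/dt = (f' - 2f)e^{-2t}$ and $d^2x/dt^2 = (f'' - 4f' + 4f)e^{-2t}$, we find that
$$\frac{d^2x}{dt^2} + 4\frac{dx}{dt} + 4x = f''(t)e^{-2t},$$
which is zero for all t if $f''(t) = 0$. Hence $f(t) = A + Bt$, and $x(t) = (A + Bt)e^{-2t}$, which is the general solution found in Example 18.8.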

Self-test 18.2
Find the general solution of
$$\frac{d^2x}{dt^2} - (1 + a)\frac{dx}{dt} + ax = 0$$
for the cases (a) a ≠ 1, and (b) a = 1.

18.4 Complex solutions of the characteristic equation


If $b^2 < 4c$, the solutions $m_1$ and $m_2$ of the characteristic equation $m^2 + bm + c = 0$ for the differential equation $d^2x/dt^2 + b\,dx/dt + cx = 0$ are complex. Since they are solutions of a quadratic equation with real coefficients, they must be complex conjugates, so put
$$m_1 = \alpha + i\beta, \quad m_2 = \alpha - i\beta,$$
where α and β are real numbers. The corresponding functions
$$e^{(\alpha + i\beta)t} \quad\text{and}\quad e^{(\alpha - i\beta)t} \tag{18.11}$$
are genuine solutions of the differential equation. They are complex functions, so we call (18.11) a complex basis for solutions of the differential equation. If we are interested in complex as well as real solutions, then we can allow the arbitrary constants A and B to be complex as well, in an all-inclusive general complex solution
$$x(t) = A e^{(\alpha + i\beta)t} + B e^{(\alpha - i\beta)t}.$$
Suppose, however, that we want the general solution to consist only of real functions. Then a basis for real solutions can be got from (18.11) in the following way. By (6.8),
$$e^{(\alpha + i\beta)t} = e^{\alpha t}e^{i\beta t} = e^{\alpha t}\cos\beta t + i\,e^{\alpha t}\sin\beta t.$$
This function solves the differential equation, and since the coefficients b and c are real, its real and imaginary parts separately must also solve it. Therefore
$$(e^{\alpha t}\cos\beta t,\; e^{\alpha t}\sin\beta t)$$
is a real basis for the general (real) solution
$$x(t) = A e^{\alpha t}\cos\beta t + B e^{\alpha t}\sin\beta t, \tag{18.12}$$
where A and B are arbitrary (but real, of course). The second complex solution, $e^{(\alpha - i\beta)t}$, has the basis $(e^{\alpha t}\cos\beta t, -e^{\alpha t}\sin\beta t)$, which leads to the same family of real solutions, so we get nothing new by considering it.
Equation (18.12) can be written in a different form. Using the identity (1.18), we have
$$A\cos\beta t + B\sin\beta t = C\cos(\beta t + \phi),$$
where C and φ are constants related to A and B. Therefore (18.12) can be written
$$x(t) = C e^{\alpha t}\cos(\beta t + \phi).$$
Since A and B are arbitrary, so are C and φ.
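One way to compute C and φ from A and B, without quoting (1.18), is to expand $C\cos(\beta t + \phi)$ and match coefficients, which gives $C\cos\phi = A$ and $C\sin\phi = -B$. The Python sketch below does this with `math.hypot` and `math.atan2` (the function name `amplitude_phase` is illustrative); `atan2` picks the angle in the correct quadrant, which a plain arctangent of $-B/A$ would not.

```python
import math

def amplitude_phase(A, B):
    """Return (C, phi) with A cos(beta t) + B sin(beta t) = C cos(beta t + phi).

    Expanding C cos(beta t + phi) = C cos(phi) cos(beta t) - C sin(phi) sin(beta t)
    and matching coefficients requires C cos(phi) = A and C sin(phi) = -B.
    """
    C = math.hypot(A, B)        # C = sqrt(A^2 + B^2)
    phi = math.atan2(-B, A)     # quadrant-correct angle with those cos and sin
    return C, phi

# The solution cos 2t + sin 2t of Example 18.11 (A = B = 1):
C, phi = amplitude_phase(1.0, 1.0)
print(C, phi)   # sqrt(2) and -pi/4
```

For A = B = 1 this gives $C = \sqrt{2}$ and $\phi = -\pi/4$, that is, $\cos 2t + \sin 2t = \sqrt{2}\cos(2t - \pi/4)$.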

Example 18.9 Find the general solution of
$$\frac{d^2x}{dt^2} + 4x = 0.$$
The characteristic equation is $m^2 + 4 = 0$. Its solutions are $m = \pm 2i$. Therefore the complex solution basis is $(e^{2it}, e^{-2it})$. But
$$e^{2it} = \cos 2t + i\sin 2t,$$
and the real and imaginary parts give a basis for the real solutions:
$$(\cos 2t, \sin 2t).$$
Therefore the general solution is
$$x(t) = A\cos 2t + B\sin 2t \quad (A, B \text{ arbitrary}).$$

Example 18.10 Find the general solution of
$$\frac{d^2x}{dt^2} + 2\frac{dx}{dt} + 2x = 0.$$
Setting $x = e^{mt}$ gives the characteristic equation $m^2 + 2m + 2 = 0$, so that $m = -1 \pm i$. Therefore
$$(e^{(-1+i)t}, e^{(-1-i)t})$$
is a basis for complex solutions. But
$$e^{(-1+i)t} = e^{-t}(\cos t + i\sin t),$$
whose real and imaginary parts are
$$e^{-t}\cos t, \quad e^{-t}\sin t.$$
These form the basis for the real solutions. The general solution is
$$x(t) = A e^{-t}\cos t + B e^{-t}\sin t.$$
If we chose instead to take the real and imaginary parts of $e^{(-1-i)t}$, we would obtain $(e^{-t}\cos t, -e^{-t}\sin t)$ as a basis. The minus sign will be absorbed into the arbitrary constant B: no new solutions appear.

The general solution method can be summed up as follows:

Characteristic equation: complex solutions
For $d^2x/dt^2 + b\,dx/dt + cx = 0$, when $m^2 + bm + c = 0$ has complex roots $m_1, m_2 = \alpha \pm i\beta$ (i.e. $b^2 < 4c$):
Complex basis: $e^{(\alpha + i\beta)t}$, $e^{(\alpha - i\beta)t}$.
Real basis: $e^{\alpha t}\cos\beta t$, $e^{\alpha t}\sin\beta t$.
General solution:
(a) $x(t) = A e^{\alpha t}\cos\beta t + B e^{\alpha t}\sin\beta t$ (A and B arbitrary); or
(b) $x(t) = C e^{\alpha t}\cos(\beta t + \phi)$ (C and φ arbitrary). (18.13)
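The results (18.7), (18.10), and (18.13a) can be collected into a single routine. The Python sketch below (the names `general_solution` and `residual` are illustrative) builds x(t) for given A and B according to the sign of $b^2 - 4c$, then estimates $d^2x/dt^2 + b\,dx/dt + cx$ with finite differences. The residual is only approximate, so small non-zero values from rounding and truncation are expected; exact comparison of the discriminant with zero assumes integer or rational coefficients.

```python
import math

def general_solution(b, c, A, B):
    """Real general solution of x'' + b x' + c x = 0 for given constants A, B."""
    disc = b * b - 4 * c
    if disc > 0:                                   # (18.7): real, different roots
        r = math.sqrt(disc)
        m1, m2 = (-b + r) / 2, (-b - r) / 2
        return lambda t: A * math.exp(m1 * t) + B * math.exp(m2 * t)
    if disc == 0:                                  # (18.10): coincident root m0
        m0 = -b / 2
        return lambda t: (A + B * t) * math.exp(m0 * t)
    alpha, beta = -b / 2, math.sqrt(-disc) / 2     # (18.13a): m = alpha +/- i beta
    return lambda t: math.exp(alpha * t) * (A * math.cos(beta * t) + B * math.sin(beta * t))

def residual(b, c, x, t, h=1e-4):
    """Finite-difference estimate of x'' + b x' + c x at t (near 0 for a solution)."""
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
    return d2 + b * d1 + c * x(t)

# Equations of Examples 18.6, 18.8, 18.9 and 18.10, each with A = 2, B = -1:
for b, c in [(1, -2), (4, 4), (0, 4), (2, 2)]:
    x = general_solution(b, c, 2.0, -1.0)
    print(b, c, max(abs(residual(b, c, x, t)) for t in (0.0, 0.5, 1.0)))
```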
A very important case is when b = 0 and c > 0, illustrated by Example 18.9. In that case, α = 0. In conventional notation, putting $c = \omega^2$, we obtain the following result:

Characteristic equation: special case
For $d^2x/dt^2 + \omega^2 x = 0$:
Characteristic equation: $m^2 + \omega^2 = 0$; $m_1, m_2 = \pm i\omega$.
Complex basis: $e^{i\omega t}$, $e^{-i\omega t}$.
Real basis: $\cos\omega t$, $\sin\omega t$.
General solution: (a) $x(t) = A\cos\omega t + B\sin\omega t$, or (b) $x(t) = C\cos(\omega t + \phi)$. (18.14)

Fig. 18.3 Graph of $x(t) = 4e^{-0.2t}\cos(2t - 1)$, shown with the curve $x = 4e^{-0.2t}$.

In the special case (18.14), the alternative solution form
$$x(t) = C\cos(\omega t + \phi)$$
shows that the solutions oscillate regularly, swinging above and below the t axis to an extent governed by the amplitude C. In the general case (18.13),
$$x(t) = C e^{\alpha t}\cos(\beta t + \phi),$$
the solutions oscillate, but the amplitude is modulated by the factor $C e^{\alpha t}$. If α is positive, the oscillation constantly grows; if α is negative, it dies away to zero. This is fully discussed in Chapter 20, but Fig. 18.3 shows a particular case where α is negative.
The damped unforced linear oscillator is the simplest linear model of an oscillating mechanical or electrical system which has a small amount of friction or some other form of energy-loss mechanism (see Chapter 20 for a full discussion). In a customary notation the equation is
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega^2 x = 0.$$
The term $2k\,dx/dt$ expresses the energy-absorbing property. Assume
$$k^2 < \omega^2.$$
The characteristic equation is $m^2 + 2km + \omega^2 = 0$, so that
$$m = -k \pm (k^2 - \omega^2)^{1/2} = -k \pm i(\omega^2 - k^2)^{1/2},$$
since $k^2 < \omega^2$. From (18.13), $\alpha = -k$ and $\beta = (\omega^2 - k^2)^{1/2}$, so finally:

Damped linear oscillator
For $d^2x/dt^2 + 2k\,dx/dt + \omega^2 x = 0$, where $k^2 < \omega^2$:
General solution:
(a) $x(t) = A e^{-kt}\cos[(\omega^2 - k^2)^{1/2}t] + B e^{-kt}\sin[(\omega^2 - k^2)^{1/2}t]$ (A and B arbitrary constants); or
(b) $x(t) = C e^{-kt}\cos[(\omega^2 - k^2)^{1/2}t + \phi]$ (C and φ arbitrary). (18.15)

Self-test 18.3
Find the general solution of
$$\frac{d^2x}{dt^2} - 4\frac{dx}{dt} + 13x = 0.$$

18.5 Initial conditions for second-order equations


The general solution of a second-order differential equation involves two arbitrary constants, so the solutions form a two-parameter family, far more numerous than the one-parameter family of the first-order case. Unlike the first-order case, the solution curves may cross: in fact, there is an infinite number of solution curves through any point on the (x, t) plane, as indicated in Fig. 18.4a.

Fig. 18.4 (a) An infinite number of curves pass through each point P. (b) Selection of a solution, given P = $(t_0, x_0)$ and the slope at P.

To pick out a particular solution, we need to determine the two arbitrary constants. Two pieces of information are necessary. These may consist of two initial conditions, conditions which define the state of the system at some starting time $t_0$: the values of x(t) and the slope dx/dt at $t = t_0$ are given (see Fig. 18.4b). For example, the equation $d^2x/dt^2 + \omega_0^2 x = 0$ describes the oscillations of a particle on a spring; the initial conditions tell us its position and velocity (i.e. its state) when it starts off. We then have an initial-value problem:

Initial-value problem
(i) Equation: $d^2x/dt^2 + b\,dx/dt + cx = 0$.
(ii) Initial conditions:
$$x = x_0 \quad\text{and}\quad \frac{dx}{dt} = x_1 \quad\text{at } t = t_0,$$
which may be expressed alternatively as
$$x(t_0) = x_0, \quad x'(t_0) = x_1,$$
where $x_0$ and $x_1$ are given. (18.16)

Example 18.11 Find the solution of $d^2x/dt^2 + 4x = 0$ for which x = 1 and dx/dt = 2 at t = 0 (i.e. x(0) = 1, x′(0) = 2).
First we need all the solutions. From Example 18.9, these are $x(t) = A\cos 2t + B\sin 2t$, where A and B may take any values. Since x = 1 at t = 0,
$$1 = A + 0, \quad\text{so } A = 1.$$
For the other condition, we first need x′(t) in general:
$$x'(t) = -2A\sin 2t + 2B\cos 2t.$$
At t = 0, we are given that x′(0) = 2, so the last equation becomes
$$2 = 0 + 2B, \quad\text{or } B = 1.$$
The required solution is therefore $x(t) = \cos 2t + \sin 2t$.
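For the initial-value problem (18.16) the two constants can be written down directly once the basis is known. A minimal Python sketch follows, assuming the A, B forms of (18.7), (18.10), and (18.13a) written in the shifted time $s = t - t_0$ (a convenience not used in the text, which keeps the algebra for A and B simple); the function name `solve_ivp_2nd` is illustrative and unrelated to any library routine of similar name.

```python
import math

def solve_ivp_2nd(b, c, t0, x0, x1):
    """Solution of x'' + b x' + c x = 0 with x(t0) = x0, x'(t0) = x1, as in (18.16).

    Works in the shifted time s = t - t0, so A and B follow directly from
    the basis functions and their derivatives at s = 0.
    (Exact comparison of b^2 - 4c with 0 assumes integer or rational inputs.)
    """
    disc = b * b - 4 * c
    if disc > 0:                                   # basis e^{m1 s}, e^{m2 s}
        r = math.sqrt(disc)
        m1, m2 = (-b + r) / 2, (-b - r) / 2
        B = (x1 - m1 * x0) / (m2 - m1)             # from A + B = x0, m1 A + m2 B = x1
        A = x0 - B
        return lambda t: A * math.exp(m1 * (t - t0)) + B * math.exp(m2 * (t - t0))
    if disc == 0:                                  # basis e^{m0 s}, s e^{m0 s}
        m0 = -b / 2
        A, B = x0, x1 - m0 * x0                    # x'(t0) = m0 A + B
        return lambda t: (A + B * (t - t0)) * math.exp(m0 * (t - t0))
    alpha, beta = -b / 2, math.sqrt(-disc) / 2     # basis e^{alpha s} cos, e^{alpha s} sin
    A, B = x0, (x1 - alpha * x0) / beta            # x'(t0) = alpha A + beta B
    return lambda t: math.exp(alpha * (t - t0)) * (
        A * math.cos(beta * (t - t0)) + B * math.sin(beta * (t - t0)))

# Example 18.11: x'' + 4x = 0, x(0) = 1, x'(0) = 2, so x(t) = cos 2t + sin 2t.
x = solve_ivp_2nd(0, 4, 0, 1, 2)
```

Applied to Example 18.11, this reproduces $x(t) = \cos 2t + \sin 2t$.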

The solution of higher order linear differential equations can be developed in a similar manner. For example, the third-order equation
$$\frac{d^3x}{dt^3} + a\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0$$
has the associated characteristic equation
$$m^3 + am^2 + bm + c = 0.$$
If this equation has three distinct solutions $m_1$, $m_2$, $m_3$, then the general solution is
$$x = A e^{m_1 t} + B e^{m_2 t} + C e^{m_3 t},$$
where A, B, and C are arbitrary constants. In higher order equations interesting mixtures of repeated and complex solutions of the characteristic equation can occur.
Problems

(For the ‘dash’ notation $x'(t) = dx/dt$ etc., see (4.1).)

18.1 Say which of the following equations are linear, unforced, with constant coefficients (i.e. can be rearranged to conform with (18.1a)).
(a) $x' = 3t$; (b) $x' = \tfrac{1}{2}x$; (c) $x' + tx = 0$; (d) $3x' - 2x^2 = 0$; (e) $x' - x = 0$; (f) $x' = 0$; (g) $x'/x^2 = 3$; (h) $dy/dx + \tfrac{1}{2}y = 1$; (i) $\dfrac{1}{y}\dfrac{dy}{dx} = 2$; (j) $L\,dI/dt + RI = 0$; (k) $\dfrac{v' + v + v^2}{v' - v + v^2} = 1$.

18.2 Write down all the solutions of the following equations. Check one or two of them by substitution into the differential equation.
(a) $x' + 5x = 0$; (b) $x' - \tfrac{1}{2}x = 0$; (c) $x' - x = 0$; (d) $x' + 3x = 0$; (e) $3x' + 4x = 0$; (f) $x' = 2x$; (g) $x' = 3x$; (h) $x'/x = -3$; (i) $(x' + 1)/(x + 1) = 1$.

18.3 Solve the following initial-value problems.
(a) $x' + 2x = 0$, x = 3 when t = 0;
(b) $3x' - x = 0$, x = 1 when t = 1;
(c) $y' - 2y = 0$, y = 2 when x = −3;
(d) $x' + x = 0$, $x(-1) = 10$;
(e) $2y' - 3y = 0$, $y(0) = 1$;
(f) Find the curve whose slope at any point (x, y) is equal to 5y, and which passes through the point (1, −2).

18.4 Suppose that the generator in Fig. 18.1 is short-circuited and cut out at a moment when the current in the circuit is $I_0$. Find an expression for the current subsequently. Show that the ratio L/R provides a measure of the time it takes for the current to die away.

18.5 A radioactive element disintegrates at a rate proportional to the amount of the original element still remaining. Show that if A(t) represents the activity of the element at time t, then
$$\frac{dA}{dt} + kA = 0,$$
where k is a positive constant.
(a) Solve the initial-value problem for A if $A = A_0$ (given) at time t = 0.
(b) The time taken for the activity to drop to half of the starting value is called the half-life period. For uranium-232, it is found that 17.5% has decayed after 20 years. Show that its half-life period is about 72 years.

18.6 (Götterdämmerung). Once upon a time, rabbits in Elysium reached maturity instantly and bred with a birthrate of 20 rabbits per year per couple. No rabbit ever died. At the start of the experiment Zeus released 50 male and 50 female rabbits. By treating the number of rabbits as a continuously varying quantity and considering the number born in a short time δt, construct a differential equation and then an initial-value problem for R(t), the rabbit population. Find how many rabbits there were at the end of Year 4.
Appalled by this result and assisted by Pluto, Zeus launched another similar experiment, in which any rabbit was allowed to live for one year only. Construct the differential equation for the population. Did this alleviate the situation appreciably?

18.7 Obtain all solutions of the following equations. (The characteristic equations all have real roots, not necessarily distinct.)
(a) $x'' - 3x' + 2x = 0$; (b) $x'' + x' - 2x = 0$; (c) $x'' - x = 0$; (d) $x'' - 4x = 0$; (e) $3x'' - \tfrac{1}{4}x = 0$; (f) $x'' - 9x = 0$; (g) $x'' + 2x' - x = 0$; (h) $x'' - 2x' - 2x = 0$; (i) $2x'' + 2x' - x = 0$; (j) $3x'' - x' - 2x = 0$; (k) $x'' + 4x' + 4x = 0$; (l) $x'' + 6x' + 9x = 0$; (m) $4x'' + 4x' + x = 0$; (n) $x'' = 0$.

18.8 Verify that, when the characteristic equation corresponding to $x'' + bx' + cx = 0$ has coincident roots, $m_1 = m_2 = m_0$, say, then the function $x(t) = t e^{m_0 t}$ provides a second solution for the basis of the general solution. (For coincident roots, $b^2 = 4c$.)

18.9 Solve the following initial-value problems.
(a) $x'' - 4x = 0$, $x(0) = 1$, $x'(0) = 0$;
(b) $x'' + x' - 2x = 0$, $x(0) = 0$, $x'(0) = 2$;
(c) $y'' - 4y' + 4y = 0$, $y(0) = 0$, $y'(0) = -1$;
(d) $y'' + 2y' + y = 0$, $y(1) = 0$, $y'(1) = 1$;
(e) $x'' - 9x = 0$, $x(1) = 1$, $x'(1) = 1$;
(f) $x'' - 4x' = 0$, $x(1) = 1$, $x'(1) = 0$.

18.10 Obtain all solutions of the following equations. (The roots of the characteristic equations are complex.)
(a) $x'' + x = 0$; (b) $x'' + 9x = 0$; (c) $x'' + \tfrac{1}{4}x = 0$; (d) $x'' + \omega_0^2 x = 0$; (e) $x'' + 2x' + 2x = 0$; (f) $y'' - 2y' + 2y = 0$; (g) $y'' + y' + y = 0$; (h) $2x'' + 2x' + x = 0$; (i) $3x'' + 4x' + 2x = 0$; (j) $3x'' - 4x' + 2x = 0$.

18.11 Solve the following initial-value problems.
(a) $x'' + x = 0$, $x(0) = 0$, $x'(0) = 1$;
(b) $x'' + 4x = 0$, $x(0) = 1$, $x'(0) = 0$;
(c) $x'' + \omega_0^2 x = 0$, $x(0) = a$, $x'(0) = b$;
(d) $x'' + 2kx' + x = 0$, $x(0) = 0$, $x'(0) = b$ for the cases $k^2 < 1$, $k^2 > 1$, and $k^2 = 1$.
(Use the A, B form: finding the constants C and φ in (18.14b) for an initial-value problem can be comparatively difficult.)

18.12 The approximate equation for small swings of a pendulum is
$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\theta = 0,$$
where θ is the inclination from the vertical (in radians), l is the length, and g the gravitational acceleration. The pendulum is held still at an angle α, and is then passively released. Find the subsequent motion.

18.13 The pendulum in Problem 18.12 is hanging at rest; then the bob is given a small velocity v in the direction of θ increasing. Find the subsequent motion.

18.14 If there is a little friction in the pendulum of Problem 18.12, the equation of motion takes the form
$$\frac{d^2\theta}{dt^2} + K\frac{d\theta}{dt} + \frac{g}{l}\theta = 0,$$
where K is an additional positive constant which takes account of the friction (assumed to be proportional to the angular velocity). In a particular case (SI units), g = 9.7, l = 20, K = 0.066. The pendulum is at rest at first, hanging freely. It is then pushed so as to give the bob a velocity of 1 metre per second. Find the subsequent motion.

18.15 Consider the third-order differential equation
$$\frac{d^3y}{dx^3} - y = 0.$$
Proceed by analogy with the method of Section 18.3: by substituting $y = e^{mx}$, and obtaining a characteristic equation for m (a cubic equation), find three distinct basic solutions of this type. By introducing arbitrary constants A, B, C, find as wide a variety of solutions as you can (in fact, this is the general solution).

18.16 By proceeding as in Problem 18.15, find a wide variety of solutions of the equation
$$\frac{d^3y}{dx^3} + y = 0.$$

18.17 By proceeding with the equation
$$\frac{d^4y}{dx^4} - y = 0$$
as in Problem 18.15, obtain the collection of solutions
$$y(x) = A e^{x} + B e^{-x} + C\cos x + D\sin x,$$
where A, B, C, D are arbitrary constants.

18.18 A tapered concrete column of height H metres is to support a statue of mass M (i.e. weight Mg force units, where g is the gravitational acceleration) at the top. Pressure (force per unit area) may not exceed P. Show that the most economical construction for the column is for its cross-sectional area A(y), where y is distance above the ground, to satisfy the equation
$$A(y) = \frac{Mg}{P} + \frac{\rho g}{P}\int_y^H A(u)\,du,$$
where ρ is the density of concrete. By differentiating this expression (see Section 15.10), obtain a differential equation for A(y), and an initial condition for the equation, and solve it.
19 Forced linear differential equations

CONTENTS

19.1 Particular solutions for standard forcing terms
19.2 Harmonic forcing term, by using complex solutions
19.3 Particular solutions: exceptional cases
19.4 The general solution of forced equations
19.5 First-order linear equations with a variable coefficient
Problems

The previous chapter treated differential equations that are linear, have constant coefficients, and are homogeneous (meaning unforced, or having a zero right-hand side). It is now shown how such equations are involved in obtaining the general solution of linear equations which have a non-zero forcing term when this term is a linear combination of polynomials, exponentials $e^{ax}$, and trigonometric functions $\sin bx$ and $\cos bx$. Additionally, more general results about linear equations are stated. Section 19.5 describes the integrating factor procedure for first-order linear equations having non-constant coefficients as well as a non-zero forcing term; this method therefore has very general application.

19.1 Particular solutions for standard forcing terms

Consider the equations
$$\frac{dx}{dt} + cx = f(t), \qquad \frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = f(t).$$

The function f(t) is called the forcing term; it represents physically the external
input to the physical system that the equation describes, and the system will
respond with an output x(t) which depends on the input f(t).
If f(t) is an exponential function $K e^{\alpha t}$, a sine or cosine function $K\sin\beta t$ or $K\cos\beta t$, or a polynomial, then we can find an individual particular solution by trial.
Example 19.1 Find a particular solution of
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 3e^{2t}.$$
Try for a solution containing the same exponential as that on the right-hand side of the equation:
$$x(t) = p e^{2t},$$
where p is some constant: not an arbitrary constant, but one whose value we shall settle by substitution, since only one value will do. Then
$$\frac{dx}{dt} = 2p e^{2t}, \quad\text{and}\quad \frac{d^2x}{dt^2} = 4p e^{2t},$$
so that
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 4p e^{2t} + 2p e^{2t} - 2p e^{2t} = e^{2t}(4 + 2 - 2)p = 4p e^{2t}.$$
This must equal the given right-hand side, $3e^{2t}$, for all values of t, which is only possible if 4p = 3, or
$$p = \tfrac{3}{4}.$$
Therefore a particular solution is
$$x(t) = \tfrac{3}{4}e^{2t}.$$
There are many other solutions, as we shall see in Section 19.4, but they are all based on this particular solution.
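The calculation of Example 19.1 generalizes directly: substituting $x = p e^{\alpha t}$ into $d^2x/dt^2 + b\,dx/dt + cx = K e^{\alpha t}$ gives $(\alpha^2 + b\alpha + c)p = K$. The Python sketch below (the function name `exp_forcing_coefficient` is illustrative) evaluates p exactly with `fractions.Fraction` when given integer or rational inputs, and refuses the exceptional case in which α is a root of the characteristic equation.

```python
from fractions import Fraction

def exp_forcing_coefficient(b, c, K, alpha):
    """p such that x = p e^{alpha t} solves x'' + b x' + c x = K e^{alpha t}.

    Substituting the trial form gives (alpha^2 + b alpha + c) p = K.  If the
    bracket vanishes, alpha is a root of the characteristic equation and the
    trial form must be modified (the exceptional cases of Section 19.3).
    """
    denom = alpha * alpha + b * alpha + c
    if denom == 0:
        raise ValueError("alpha solves the characteristic equation; see Section 19.3")
    return Fraction(K) / denom

print(exp_forcing_coefficient(1, -2, 3, 2))   # Example 19.1: 3/4
```

Taking α = 0 covers a constant forcing term, so with b = 0 and c = 4 the same function gives both pieces $\tfrac{1}{4}$ and $\tfrac{1}{5}e^{-t}$ of Example 19.6 (K = 1 with α = 0 and α = −1).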

Example 19.2 Find a particular solution of $d^2x/dt^2 + 4x = 2\cos 3t$.
Guess that there might be a solution of the form
$$x = p\cos 3t.$$
Then
$$\frac{dx}{dt} = -3p\sin 3t, \quad\text{and}\quad \frac{d^2x}{dt^2} = -9p\cos 3t,$$
so that
$$\frac{d^2x}{dt^2} + 4x = -9p\cos 3t + 4p\cos 3t = -5p\cos 3t.$$
This must be the same as the right-hand side of the equation in order for the guessed function to be a solution, so
$$-5p\cos 3t = 2\cos 3t.$$
Therefore $p = -\tfrac{2}{5}$, and the required solution is
$$x(t) = -\tfrac{2}{5}\cos 3t.$$

In most cases when the right-hand side is a sine or cosine, the method does not work so simply, as the following example illustrates.

Example 19.3 Find a particular solution of
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 2\cos 3t.$$
The form $p\cos 3t$ cannot be made to fit this equation because the dx/dt term on the left produces a $\sin 3t$ term, making the left and right sides impossible to match for all t. Try instead
$$x = p\cos 3t + q\sin 3t.$$
Then
$$\frac{dx}{dt} = -3p\sin 3t + 3q\cos 3t,$$
and
$$\frac{d^2x}{dt^2} = -9p\cos 3t - 9q\sin 3t.$$
Therefore
$$\begin{aligned}\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x &= (-9p + 3q - 2p)\cos 3t + (-9q - 3p - 2q)\sin 3t\\ &= (-11p + 3q)\cos 3t + (-3p - 11q)\sin 3t\\ &= 2\cos 3t,\end{aligned}$$
which must match the right-hand side of the equation for all t. The only way to satisfy this condition is to require both
$$-11p + 3q = 2, \quad -3p - 11q = 0.$$
The solution of these two simultaneous equations for p and q is
$$p = -\tfrac{11}{65}, \quad q = \tfrac{3}{65}.$$
Therefore a particular solution is
$$x = -\tfrac{11}{65}\cos 3t + \tfrac{3}{65}\sin 3t.$$
It will be necessary in nearly all cases to take both sine and cosine terms into account.
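Solving the pair of equations in Example 19.3 is routine linear algebra, and it can be done once for a general cosine forcing term. The Python sketch below (the function name `cos_forcing_coefficients` is illustrative) applies Cramer's rule to the system obtained by matching the $\cos\beta t$ and $\sin\beta t$ terms, using `Fraction` for exact results when the inputs are integers or fractions.

```python
from fractions import Fraction

def cos_forcing_coefficients(b, c, K, beta):
    """p, q such that x = p cos(beta t) + q sin(beta t) solves
    x'' + b x' + c x = K cos(beta t)  (integer or Fraction inputs).

    Matching the cos and sin terms gives
        (c - beta^2) p + b beta q = K,
        -b beta p + (c - beta^2) q = 0,
    the general form of the pair solved in Example 19.3.
    """
    d = c - beta * beta
    e = b * beta
    det = d * d + e * e               # determinant of the 2x2 system
    if det == 0:
        raise ValueError("i*beta is a root of the characteristic equation (Section 19.3)")
    return Fraction(K * d, det), Fraction(K * e, det)   # Cramer's rule

print(cos_forcing_coefficients(1, -2, 2, 3))   # Example 19.3: p = -11/65, q = 3/65
print(cos_forcing_coefficients(0, 4, 2, 3))    # Example 19.2: p = -2/5, q = 0
```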

The case when f(t) is a constant often occurs:

Example 19.4 Find a particular solution of $d^2x/dt^2 - 2\,dx/dt + 4x = 3$.
Test whether there is a constant solution
$$x(t) = p \quad\text{(a constant)}.$$
By substituting this in the differential equation, we get
$$0 + 0 + 4p = 3,$$
so that $p = \tfrac{3}{4}$, and the particular solution is just $x(t) = \tfrac{3}{4}$, which is obvious after it has been worked out.

If the right-hand side is a polynomial, then we can look for a particular solution that is also a polynomial:

Example 19.5 Find a solution of $d^2x/dt^2 - dx/dt + x = 3 + 2t^2$.
Try a solution of the form $x(t) = p + qt + rt^2$, where p, q, and r are constants. It is normally necessary to try a polynomial of the same degree as the forcing term, which in this case has degree 2, and to include all the lower-degree terms in the trial solution. Since
$$\frac{dx}{dt} = q + 2rt, \quad\text{and}\quad \frac{d^2x}{dt^2} = 2r,$$
we must have
$$2r - (q + 2rt) + (p + qt + rt^2) = 3 + 2t^2.$$
Match up the coefficients of the three powers of t; we find that
$$r = 2, \quad -2r + q = 0, \quad 2r - q + p = 3.$$
The equations are easy to solve and lead to the solution
$$x(t) = 3 + 4t + 2t^2.$$
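The coefficient matching of Example 19.5 can be organized as a back-substitution that starts from the highest power of t. The Python sketch below is one way to do it (the function name `polynomial_particular_solution` is illustrative, with exact arithmetic via `Fraction`); it assumes c ≠ 0, which is what makes a trial polynomial of the same degree as the forcing term work.

```python
from fractions import Fraction

def polynomial_particular_solution(b, c, f):
    """Coefficients a[0..N] of a polynomial x(t) = sum a[k] t^k solving
    x'' + b x' + c x = f(t), where f[k] is the coefficient of t^k.

    Equating coefficients of t^k gives
        c a[k] + b (k+1) a[k+1] + (k+2)(k+1) a[k+2] = f[k],
    which is solved from the top degree N downwards.
    """
    if c == 0:
        raise ValueError("c = 0 is an exceptional case; see Section 19.3")
    N = len(f) - 1
    a = [Fraction(0)] * (N + 3)        # a[N+1] = a[N+2] = 0 pad the recurrence
    for k in range(N, -1, -1):
        a[k] = (Fraction(f[k]) - b * (k + 1) * a[k + 1]
                - (k + 2) * (k + 1) * a[k + 2]) / c
    return a[:N + 1]

print(polynomial_particular_solution(-1, 1, [3, 0, 2]))   # Example 19.5: 3 + 4t + 2t^2
```

The same call with b = −2, c = 4, and forcing coefficients [3] gives the constant $\tfrac{3}{4}$ of Example 19.4.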

These methods apply equally to first-order (and higher order) equations having constant coefficients. Summarizing for first and second orders:

Particular solutions of
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = f(t) \quad\text{and}\quad \frac{dx}{dt} + cx = f(t)$$
(a) $f(t) = K e^{\alpha t}$: try a solution $x(t) = p e^{\alpha t}$.
(b) $f(t) = K\cos\beta t$ or $K\sin\beta t$: try a solution $x(t) = p\cos\beta t + q\sin\beta t$.
(c) f(t) is a polynomial of degree N: try a polynomial of the same degree, with all its terms present. (19.1)

There are exceptional cases where these substitutions have to be modified. For example, $d^2x/dt^2 = t$ has a polynomial solution of degree 3, not degree 1. These cases are treated in Section 19.3.
If the forcing term on the right-hand side consists of the sum of several constituent terms, then obtain a particular solution for each one, and add them, as in the following example.

Example 19.6 Obtain a particular solution of
$$\frac{d^2x}{dt^2} + 4x = 1 + e^{-t}.$$
Solve $d^2x_1/dt^2 + 4x_1 = 1$ for $x_1(t)$ and $d^2x_2/dt^2 + 4x_2 = e^{-t}$ for $x_2(t)$; then
$$x(t) = x_1(t) + x_2(t)$$
will be a particular solution of the original equation.
For $x_1(t)$, try for a constant solution $x_1(t) = p$: it is found that $p = \tfrac{1}{4}$, so $x_1(t) = \tfrac{1}{4}$.
For $x_2(t)$, try $x_2(t) = q e^{-t}$ (following the method of Example 19.1). The substitution gives $q e^{-t} + 4q e^{-t} = e^{-t}$, so that $q = \tfrac{1}{5}$, and the solution is $x_2(t) = \tfrac{1}{5}e^{-t}$.
Therefore, a particular solution x(t) of the original equation is
$$x(t) = x_1(t) + x_2(t) = \tfrac{1}{4} + \tfrac{1}{5}e^{-t}.$$

The method just described is another consequence of the linearity of the class
of equations considered. It is also called the superposition principle, and applies
to linear equations of all orders.

Example 19.7 Obtain a particular solution of $dx/dt + x = 3\cos 2t$.
Remembering Example 19.3, we expect the solution will have to contain both cosine and sine terms, so try
$$x(t) = p\cos 2t + q\sin 2t.$$
The substitution gives $(p + 2q)\cos 2t + (-2p + q)\sin 2t = 3\cos 2t$, so that $p = \tfrac{3}{5}$, $q = \tfrac{6}{5}$, and the solution is
$$x(t) = \tfrac{3}{5}\cos 2t + \tfrac{6}{5}\sin 2t.$$

Self-test 19.1
Find particular solutions of
$$\frac{d^2x}{dt^2} - \frac{dx}{dt} + 3x = f(t)$$
in the cases (a) $f(t) = 2e^{-2t}$; (b) $f(t) = \sin 2t$; (c) $f(t) = 3t^3$.

19.2 Harmonic forcing term, by using complex solutions


In Example 19.3, we solved a second-order equation with the term $2\cos 3t$ on the right by choosing constants p and q so that the expression $p\cos 3t + q\sin 3t$ would fit the equation. We shall explain another important method for obtaining such solutions, which derives the required real solutions from complex solutions of a related equation.
First of all, consider the complex general differential equation
$$\frac{d^2X}{dt^2} + b\frac{dX}{dt} + cX = a e^{i\beta t}, \tag{19.2}$$
where b, c, a, and β are real constants, and i is the imaginary unit ($i^2 = -1$). Since the forcing term is an exponential, we shall test for a particular solution of (19.2) having the form
$$X(t) = P e^{i\beta t}$$
as in Example 19.1 (but this time we must expect that P will be a complex constant).

Example 19.8 Find a particular (complex) solution of the complex


FORCED LINEAR DIFFERENTIAL EQUATIONS

d2 X d X
differential equation + + X = 3 e2 i t .
d t2 dt
Look for a solution of the form X(t) = P e 2it. To find P, substitute this expression into the
left-hand side of the differential equation:
(2i)2P e 2it + (2i)P e 2it + P e 2it = P(−4 + 2i + 1) e 2it = P(−3 + 2i) e 2it.
This must be the same as the right-hand side of the equation, 3 e2it, for all values of t.
Therefore, P(−3 + 2i) = 3, so
3 3(−3 − 2i)
P= = = − 133 (3 + 2i).
−3 + 2i (−3)2 + 22
Therefore X(t) = − 133 (3 + 2i) e2it is a particular solution. When expanded, it becomes
X(t) = − 133 (3 + 2i)(cos 2t + i sin 2t)
= − 133 (3 cos 2t − 2 sin 2t) + i [− 133 (2 cos 2t + 3 sin 2t)].
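Python's built-in complex numbers reproduce the arithmetic of Example 19.8. In this sketch (the helper name `complex_amplitude` is ours) $P$ is found by dividing the forcing amplitude by $s^2 + bs + c$ with $s = 2i$, and the coefficients of $\cos 2t$ and $\sin 2t$ in $\operatorname{Re} X$ and $\operatorname{Im} X$ are read off from $P\,e^{2it} = (\operatorname{Re}P + i\operatorname{Im}P)(\cos 2t + i\sin 2t)$.

```python
def complex_amplitude(a, s, b, c):
    """P such that X = P e^{st} satisfies X'' + b X' + c X = a e^{st}.

    Substituting X = P e^{st} gives (s^2 + b s + c) P = a.
    """
    denom = s * s + b * s + c
    if denom == 0:
        # e^{st} is then a complementary function (an exceptional case)
        raise ValueError("trial form fails: s is a root of the characteristic equation")
    return a / denom

P = complex_amplitude(3, 2j, 1, 1)          # Example 19.8: b = c = 1, s = 2i
print(P, -3 / 13 * (3 + 2j))                # both approximately -0.6923-0.4615j

# Re X = Re(P) cos 2t - Im(P) sin 2t,   Im X = Im(P) cos 2t + Re(P) sin 2t
re_cos, re_sin = P.real, -P.imag
im_cos, im_sin = P.imag, P.real
```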

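Consider next the real equation for $x(t)$:
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = a\cos\beta t, \tag{19.3}$$
where $b$, $c$, $a$, $\beta$ are all real. We know that
$$\cos\beta t = \operatorname{Re} e^{i\beta t}$$
(see (6.8)). Therefore, if we can find a particular solution $X(t)$ of the complex equation (19.2), its real part will solve the corresponding real equation (19.3): because $b$ and $c$ are real, taking the real part of each side of (19.2) gives (19.3) with $x = \operatorname{Re} X$.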

Example 19.9 Find a particular solution of the equation
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 2\cos 3t.$$
This is the same problem as Example 19.3, reworked so that the methods can be compared. Since $\cos 3t = \operatorname{Re} e^{3it}$, the corresponding complex equation for $X(t)$ is
$$\frac{d^2X}{dt^2} + \frac{dX}{dt} - 2X = 2\,e^{3it}.$$
To find a particular solution of this new equation, try $X(t) = P\,e^{3it}$:
$$\frac{d^2X}{dt^2} + \frac{dX}{dt} - 2X = 9i^2P\,e^{3it} + 3iP\,e^{3it} - 2P\,e^{3it} = (9i^2 + 3i - 2)P\,e^{3it} = (-11 + 3i)P\,e^{3it}.$$
This must equal $2\,e^{3it}$ for all values of $t$, so
$$P = \frac{2}{-11 + 3i} = \frac{2(-11 - 3i)}{(-11)^2 + 3^2} = -\left(\tfrac{11}{65} + \tfrac{3}{65}i\right).$$
Therefore we have a complex solution of the complex equation:
$$X(t) = -\left(\tfrac{11}{65} + \tfrac{3}{65}i\right)e^{3it} = -\left(\tfrac{11}{65} + \tfrac{3}{65}i\right)(\cos 3t + i\sin 3t).$$
For $x(t)$, we require only the real part of this expression:
$$x(t) = \operatorname{Re} X(t) = -\tfrac{11}{65}\cos 3t + \tfrac{3}{65}\sin 3t,$$
which is what we obtained in Example 19.3 for the same problem.

As a bonus we also obtain a particular solution of
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - 2x = 2\sin 3t,$$
namely the imaginary part of $X$,
$$x = -\tfrac{3}{65}\cos 3t - \tfrac{11}{65}\sin 3t.$$
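As a rough numerical check (not part of the method itself), the sketch below substitutes the two real solutions of Example 19.9 into the left-hand side, approximating the derivatives by central differences with an assumed step $h = 10^{-4}$. Both residuals should be tiny, confirming that $\operatorname{Re} X$ answers the cosine problem and $\operatorname{Im} X$ the sine problem.

```python
import math

# Re X and Im X for X(t) = -(11/65 + 3i/65) e^{3it} from Example 19.9
def x_cos(t):
    """Claimed solution of x'' + x' - 2x = 2 cos 3t."""
    return -11 / 65 * math.cos(3 * t) + 3 / 65 * math.sin(3 * t)

def x_sin(t):
    """Claimed solution of x'' + x' - 2x = 2 sin 3t."""
    return -3 / 65 * math.cos(3 * t) - 11 / 65 * math.sin(3 * t)

def lhs(x, t, h=1e-4):
    """Central-difference approximation to x'' + x' - 2x at time t."""
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
    return d2 + d1 - 2 * x(t)

ts = [0.05 * n for n in range(100)]
err_cos = max(abs(lhs(x_cos, t) - 2 * math.cos(3 * t)) for t in ts)
err_sin = max(abs(lhs(x_sin, t) - 2 * math.sin(3 * t)) for t in ts)
print(err_cos, err_sin)   # both small: only finite-difference error remains
```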

In the case when the right-hand side of the equation has the form $a\sin\omega t$, the calculation is the same, but the imaginary part of the complex solution must be extracted instead of the real part. The following example also demonstrates how right-hand sides of the form
$$a\,e^{\alpha t}\cos\beta t, \qquad a\,e^{\alpha t}\sin\beta t$$
can be handled in the same way.

Example 19.10 Find a solution of $\dfrac{d^2x}{dt^2} + x = e^{-2t}\sin 3t$.
Use the fact that
$$e^{-2t}\sin 3t = \operatorname{Im}(e^{-2t}e^{3it}) = \operatorname{Im} e^{(-2+3i)t}.$$
Therefore, consider the corresponding complex equation
$$\frac{d^2X}{dt^2} + X = e^{(-2+3i)t}.$$
To find a solution, try the form
$$X(t) = P\,e^{(-2+3i)t}.$$
We find in the usual way that
$$(-2 + 3i)^2P\,e^{(-2+3i)t} + P\,e^{(-2+3i)t} = e^{(-2+3i)t}$$
for all values of $t$. Therefore
$$P = \frac{1}{(-2 + 3i)^2 + 1} = \frac{-1}{4(1 + 3i)} = -\tfrac{1}{40}(1 - 3i)$$
and
$$X(t) = -\tfrac{1}{40}(1 - 3i)\,e^{(-2+3i)t}.$$
If we take the imaginary part of $X(t)$, we obtain a solution of the original equation:
$$x(t) = \operatorname{Im}\left[-\tfrac{1}{40}(1 - 3i)(e^{-2t}e^{3it})\right] = -\tfrac{1}{40}e^{-2t}(-3\cos 3t + \sin 3t).$$
The same result could be obtained by substituting
$$x(t) = p\,e^{-2t}\cos 3t + q\,e^{-2t}\sin 3t,$$
but this would be a very laborious and error-prone process.
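For comparison, the laborious real substitution can be handed to a few lines of code. Substituting $x = e^{\alpha t}(p\cos\beta t + q\sin\beta t)$ into $x'' + bx' + cx$ gives $e^{\alpha t}[(up + vq)\cos\beta t + (uq - vp)\sin\beta t]$, where $u + iv = s^2 + bs + c$ with $s = \alpha + i\beta$ (this identity is our own working, easily checked by differentiating). Matching a forcing term $a\,e^{\alpha t}\sin\beta t$ then gives two linear equations; the sketch below (function name ours) solves them exactly and recovers $p = \tfrac{3}{40}$, $q = -\tfrac{1}{40}$, in agreement with $x(t) = -\tfrac{1}{40}e^{-2t}(-3\cos 3t + \sin 3t)$.

```python
from fractions import Fraction

def exp_sin_coeffs(a, alpha, beta, b, c):
    """(p, q) so that x = e^{alpha t}(p cos(beta t) + q sin(beta t)) solves
    x'' + b x' + c x = a e^{alpha t} sin(beta t)."""
    a, alpha, beta, b, c = map(Fraction, (a, alpha, beta, b, c))
    u = alpha**2 - beta**2 + b * alpha + c     # real part of s^2 + b s + c
    v = 2 * alpha * beta + b * beta            # imaginary part of s^2 + b s + c
    det = u * u + v * v
    # cos terms: u p + v q = 0;   sin terms: -v p + u q = a
    p = -a * v / det
    q = a * u / det
    return p, q

p, q = exp_sin_coeffs(1, -2, 3, 0, 1)    # Example 19.10: b = 0, c = 1
print(p, q)                              # 3/40 -1/40
```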
The method is particularly advantageous when the coefficients are general
constants. The following equation will be important in Chapter 20.

Example 19.11 Find a particular solution of
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = a\cos\omega t, \quad\text{where } a > 0.$$
Since $\cos\omega t = \operatorname{Re} e^{i\omega t}$, first find a solution of
$$\frac{d^2X}{dt^2} + 2k\frac{dX}{dt} + \omega_0^2 X = a\,e^{i\omega t}.$$
By substituting $X(t) = P\,e^{i\omega t}$, we find that
$$P = a/[(\omega_0^2 - \omega^2) + i(2k\omega)].$$
It is easier if we put $P$ into polar coordinates: $P = |P|\,e^{i\phi}$, where
$$|P| = \left|a/[(\omega_0^2 - \omega^2) + i(2k\omega)]\right| = a/[(\omega_0^2 - \omega^2)^2 + (2k\omega)^2]^{1/2},$$
and
$$\phi = \arg[(\omega_0^2 - \omega^2) - i(2k\omega)],$$
since $a > 0$. Then we obtain
$$x(t) = \operatorname{Re}(P\,e^{i\omega t}) = \frac{a\cos(\omega t + \phi)}{[(\omega_0^2 - \omega^2)^2 + (2k\omega)^2]^{1/2}}, \tag{19.4}$$
where $\phi$ is the polar angle of the point $(\omega_0^2 - \omega^2,\ -2k\omega)$ on an Argand diagram.
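The amplitude and phase in (19.4) are simply the modulus and argument of $P$, so `cmath.polar` from Python's standard library gives both at once. The sketch below (function name ours; the parameter values are arbitrary and for illustration only) compares them with the closed forms and checks that the resulting $x(t)$ satisfies the equation, using $x' = -\omega|P|\sin(\omega t + \phi)$ and $x'' = -\omega^2 x$.

```python
import cmath
import math

def forced_response(a, k, w0, w):
    """Amplitude |P| and phase phi of the particular solution (19.4) of
    x'' + 2k x' + w0^2 x = a cos(w t), assuming a > 0."""
    P = a / complex(w0**2 - w**2, 2 * k * w)
    return cmath.polar(P)        # (|P|, arg P), with arg P in [-pi, pi]

a, k, w0, w = 1.0, 0.1, 2.0, 1.5     # illustrative values only
amp, phi = forced_response(a, k, w0, w)

amp_formula = a / math.sqrt((w0**2 - w**2) ** 2 + (2 * k * w) ** 2)
phi_formula = math.atan2(-2 * k * w, w0**2 - w**2)   # polar angle of (w0^2 - w^2, -2kw)

def residual(t):
    """x'' + 2k x' + w0^2 x - a cos(w t) for x = amp cos(w t + phi)."""
    x = amp * math.cos(w * t + phi)
    dx = -w * amp * math.sin(w * t + phi)
    d2x = -w**2 * x
    return d2x + 2 * k * dx + w0**2 * x - a * math.cos(w * t)

print(amp, amp_formula, phi, phi_formula)
```

With $k > 0$ and $\omega > 0$ the point $(\omega_0^2 - \omega^2, -2k\omega)$ lies below the real axis, so $\phi$ comes out negative.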

Particular solution of $\dfrac{d^2x}{dt^2} + b\dfrac{dx}{dt} + cx = f(t)$
(a) $f(t) = a\cos\beta t$ or $a\sin\beta t$.
Put $X(t) = P\,e^{i\beta t}$ to solve
$$X'' + bX' + cX = a\,e^{i\beta t}.$$
Then $x(t) = \operatorname{Re} X(t)$ or $\operatorname{Im} X(t)$, corresponding to $\cos\beta t$ or $\sin\beta t$ respectively.
(b) $f(t) = a\,e^{\alpha t}\cos\beta t$ or $a\,e^{\alpha t}\sin\beta t$.
Solve $X'' + bX' + cX = a\,e^{(\alpha + i\beta)t}$, and continue as in (a). (19.5)

Self-test 19.2
Using a complex method, find particular solutions of
$$\frac{d^2x}{dt^2} + \frac{dx}{dt} - x = f(t)$$
in the cases (a) $f(t) = e^{-t}\cos t$; (b) $f(t) = e^{-t}\sin t$.

19.3 Particular solutions: exceptional cases

There are exceptional cases for each of the three rules (19.1), when the suggested substitution does not give any result because the trial function delivers zero when it is substituted into the left-hand side. This means (as with the similar exceptional case of a single solution of the characteristic equation, (18.10)) that we must choose a trial solution having a different form.
One important exception is the case of the equation
$$\frac{d^2x}{dt^2} + \beta^2 x = a\cos\beta t \quad (\text{or } a\sin\beta t).$$
Note that $\beta$ occurs on both sides of the equation. This is a special case of Example 19.11 in which $k = 0$ and $\omega^2 = \omega_0^2 = \beta^2$. The rule in (19.1) suggests substituting
$$x(t) = p\cos\beta t + q\sin\beta t,$$
and choosing $p$ and $q$ so that the two sides match. But we already know that this is a solution of the unforced equation $d^2x/dt^2 + \beta^2 x = 0$ (see (18.14)), and the inevitable zero that we get on making the substitution cannot be matched to $a\cos\beta t$ on the right.
In this case, the solution is quite different. The following results can be confirmed by direct substitution.

Particular solutions: two exceptional cases
(a) $\dfrac{d^2x}{dt^2} + \beta^2 x = a\cos\beta t$: solution is
$$x(t) = \frac{a}{2\beta}\,t\sin\beta t.$$
(b) $\dfrac{d^2x}{dt^2} + \beta^2 x = a\sin\beta t$: solution is
$$x(t) = -\frac{a}{2\beta}\,t\cos\beta t. \tag{19.6}$$
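For completeness, here is the direct substitution that confirms both results in (19.6), written out term by term.

```latex
% (a)  x = \frac{a}{2\beta}\, t \sin\beta t
\frac{dx}{dt} = \frac{a}{2\beta}\left(\sin\beta t + \beta t\cos\beta t\right), \qquad
\frac{d^2x}{dt^2} = \frac{a}{2\beta}\left(2\beta\cos\beta t - \beta^2 t\sin\beta t\right)
                  = a\cos\beta t - \beta^2 x,
\quad\text{so}\quad \frac{d^2x}{dt^2} + \beta^2 x = a\cos\beta t.

% (b)  x = -\frac{a}{2\beta}\, t \cos\beta t
\frac{dx}{dt} = -\frac{a}{2\beta}\left(\cos\beta t - \beta t\sin\beta t\right), \qquad
\frac{d^2x}{dt^2} = -\frac{a}{2\beta}\left(-2\beta\sin\beta t - \beta^2 t\cos\beta t\right)
                  = a\sin\beta t - \beta^2 x,
\quad\text{so}\quad \frac{d^2x}{dt^2} + \beta^2 x = a\sin\beta t.
```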

Example 19.12 Find a particular solution of $\dfrac{d^2x}{dt^2} + 9x = 5\sin 3t$.
Here, $x = p\cos 3t$ and $x = q\sin 3t$ both give $d^2x/dt^2 + 9x = 0$, so the standard solution form does not work. From (19.6), with $\beta = 3$ and $a = 5$, the required solution is
$$x(t) = -\frac{5}{2\times 3}\,t\cos 3t = -\tfrac56\,t\cos 3t.$$
This solution is sketched in Fig. 19.1. Unlike the ordinary sine- and cosine-type solutions, it grows indefinitely. Such solutions have an important physical significance described in Chapter 20.

There are other exceptional cases that are not so frequently encountered; some
examples are given among the problems at the end of the chapter.
[Fig. 19.1: the graph of $x = -\tfrac56 t\cos 3t$, oscillating between the lines $x = \pm\tfrac56 t$; the $t$ axis is marked at $\pm\tfrac16\pi$, $\pm\tfrac12\pi$, $\pm\tfrac56\pi$.]

Self-test 19.3
Solve the characteristic equation of
$$\frac{d^2x}{dt^2} - 2\frac{dx}{dt} - 3x = 0.$$
Explain why a particular solution of
$$\frac{d^2x}{dt^2} - 2\frac{dx}{dt} - 3x = 2e^{-t}$$
is a special case. Find a particular solution.

19.4 The general solutions of forced equations


Consider the equation
$$\frac{d^2x}{dt^2} - x = -2\cos t. \tag{19.7}$$
A particular solution, $x_p(t)$, say, is
$$x_p(t) = \cos t. \tag{19.8}$$

From earlier experience, we should expect other solutions. In order to find some, consider what happens when we substitute various functions $x(t)$ in the expression
$$\frac{d^2}{dt^2}x(t) - x(t). \tag{19.9}$$
For example, when we put $x(t) = \cos t$ (the 'particular solution' mentioned above), we obtain
$$\frac{d^2}{dt^2}\cos t - \cos t = -2\cos t,$$
as demanded by (19.7).
Suppose now that we can find another function, $x(t) = x_c(t)$ say, which produces zero out of (19.9). For example, $x(t) = x_c(t) = e^t$ makes $d^2x/dt^2 - x$ equal to zero. It is then obvious that if we put
$$x(t) = x_p(t) + x_c(t) = \cos t + e^t$$
into (19.9), we again obtain $-2\cos t$, the right-hand side of (19.7): that is to say, we have found another solution of (19.7).
But we already know, from Chapter 18, all the functions $x_c(t)$ that give zero when they are put into (19.9): they are the solutions of the equation
$$\frac{d^2x}{dt^2} - x = 0, \tag{19.10}$$
and are given by
$$x(t) = x_c(t) = A\,e^t + B\,e^{-t},$$
where $A$ and $B$ are any constants. Therefore
$$x(t) = \cos t + A\,e^t + B\,e^{-t} \tag{19.11}$$
is always a solution of (19.7). The differential equation (19.10) is called the unforced or homogeneous equation corresponding to the original equation (19.7), and its solutions $x(t)$ are called the complementary functions of the problem (they complement or extend the particular solution of (19.7) that we obtained).
To show that we have obtained all possible solutions of (19.7), take the particular solution $\cos t$ that we obtained, and suppose that $x_p(t)$ is any other solution of (19.7). The function $x(t) = x_p(t) - \cos t$ satisfies (19.10), because (19.9) is linear in $x$, so substituting the difference gives $-2\cos t - (-2\cos t) = 0$. Hence $x(t)$ must be a complementary function. Therefore
$$x_p(t) = \cos t + (\text{a complementary function}),$$
so $x_p(t)$ must be one of the solutions already expressed by (19.11). Therefore (19.11) is the general solution of (19.7). Exactly the same argument would have applied in the general case:
General solution of $\dfrac{d^2x}{dt^2} + b\dfrac{dx}{dt} + cx = f(t)$
(i) Obtain any particular solution, $x_p(t)$.
(ii) Obtain all the solutions $Ax_{c1}(t) + Bx_{c2}(t)$ of the corresponding unforced or homogeneous equation
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = 0$$
(the complementary functions).
The sum of these gives the general solution:
$$x(t) = x_p(t) + Ax_{c1}(t) + Bx_{c2}(t),$$
where $A$, $B$ are arbitrary constants. (19.12)
The theory and method are exactly the same for linear equations of the first order, and of any order, whether the coefficients are constant or not.

Example 19.13 Find the general solution of $\dfrac{d^2x}{dt^2} + 4x = 3\cos 5t$.
Particular solution $x_p(t)$. Looking forward into the calculation, it can be seen that the solution needs no $\sin 5t$ term (the equation has no first-derivative term, so a $\cos 5t$ trial function produces only $\cos 5t$ terms). Therefore try
$$x_p(t) = p\cos 5t,$$
where $p$ is a constant. Then substitution into the equation requires
$$p(-25\cos 5t) + 4p\cos 5t = 3\cos 5t$$
for all $t$, so $p = -\tfrac17$. Therefore
$$x_p(t) = -\tfrac17\cos 5t.$$
Complementary functions $x_c(t)$. We require the solutions $x_c(t)$ of the corresponding unforced equation $d^2x_c/dt^2 + 4x_c = 0$. Try for solutions of the form $x_c(t) = e^{mt}$. The substitution produces the characteristic equation $m^2 + 4 = 0$. Therefore $m = \pm 2i$, so a pair of solutions
$$(e^{2it},\ e^{-2it})$$
constitutes a complex basis. To get a real basis, choose either one, say $e^{2it}$, and find its real and imaginary parts. These are
$$\cos 2t, \quad \sin 2t,$$
and this is the required real basis. Therefore, all the complementary functions are given by
$$x_c(t) = A\cos 2t + B\sin 2t \quad (A \text{ and } B \text{ arbitrary constants}).$$
General solution. This is the sum of the two:
$$x(t) = -\tfrac17\cos 5t + A\cos 2t + B\sin 2t.$$

As explained in Section 19.3, the straightforward trial method for the particular solution fails if the forcing term on the right is already a complementary function, so it can be a useful tactic to look at the complementary functions first. The following example contains this feature, and is also an initial-value problem.

Example 19.14 (a) Obtain the general solution of
$$\frac{d^2x}{dt^2} + 4x = 3 + 2\cos 2t.$$
(b) Find the particular solution for which $x = 0$ and $dx/dt = 0$ when $t = 0$.
(a) Complementary functions $x_c(t)$. These are the solutions of $d^2x_c/dt^2 + 4x_c = 0$. We found them in Example 19.13: they are
$$x_c(t) = A\cos 2t + B\sin 2t \quad\text{with } A \text{ and } B \text{ arbitrary}.$$
Particular solution $x_p(t)$. There are two terms on the right, so find a particular solution for each term separately, and add them.
For a solution, $x_{p1}(t)$ say, of $d^2x_{p1}/dt^2 + 4x_{p1} = 3$, we can obviously take
$$x_{p1}(t) = \tfrac34.$$
Corresponding to the other term, we need a solution, $x_{p2}(t)$ say, of $d^2x_{p2}/dt^2 + 4x_{p2} = 2\cos 2t$. We should normally expect a solution of the form $p\cos 2t + q\sin 2t$. However, looking at the complementary functions we found, this function is already a complementary function; so we have the exceptional case (19.6), which gives
$$x_{p2}(t) = \tfrac12 t\sin 2t.$$
Therefore a particular solution of the original equation is
$$x_p(t) = x_{p1}(t) + x_{p2}(t) = \tfrac34 + \tfrac12 t\sin 2t.$$
General solution.
$$x(t) = A\cos 2t + B\sin 2t + \tfrac34 + \tfrac12 t\sin 2t.$$
(b) Initial-value problem. We require also $dx/dt$:
$$\frac{dx}{dt} = -2A\sin 2t + 2B\cos 2t + \tfrac12\sin 2t + t\cos 2t.$$
The initial conditions prescribe $x(0) = 0$, or
$$A + \tfrac34 = 0,$$
and $x'(0) = 0$, or
$$2B = 0.$$
Therefore $A = -\tfrac34$ and $B = 0$, so the required particular solution is
$$x(t) = -\tfrac34\cos 2t + \tfrac34 + \tfrac12 t\sin 2t.$$
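An independent check on Example 19.14 is to integrate the initial-value problem numerically and compare with the closed form. The sketch below uses the classical fourth-order Runge–Kutta method (our choice of technique, not one used in this chapter), with an assumed 2000 steps on $0 \le t \le 5$, after rewriting the equation as the first-order system $x' = v$, $v' = 3 + 2\cos 2t - 4x$. The exact derivative used for comparison, $2\sin 2t + t\cos 2t$, is the text's $dx/dt$ with $A = -\tfrac34$, $B = 0$.

```python
import math

def rhs(t, y):
    """x'' + 4x = 3 + 2 cos 2t as the system x' = v, v' = 3 + 2 cos 2t - 4x."""
    x, v = y
    return (v, 3 + 2 * math.cos(2 * t) - 4 * x)

def rk4(f, y0, t0, t1, n):
    """Classical fourth-order Runge-Kutta with n equal steps from t0 to t1."""
    h = (t1 - t0) / n
    t, y = t0, tuple(y0)
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k1)))
        k3 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k2)))
        k4 = f(t + h, tuple(yi + h * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                  for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4))
        t += h
    return y

def exact(t):
    """Solution of Example 19.14(b)."""
    return -0.75 * math.cos(2 * t) + 0.75 + 0.5 * t * math.sin(2 * t)

def exact_dot(t):
    """Its derivative."""
    return 2 * math.sin(2 * t) + t * math.cos(2 * t)

x_num, v_num = rk4(rhs, (0.0, 0.0), 0.0, 5.0, 2000)
print(x_num, exact(5.0))
```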

Self-test 19.4
Obtain the general solution of
$$\frac{d^2x}{dt^2} - 2\frac{dx}{dt} - 3x = 2e^{-t}$$
(see Self-test 19.3), and find the solution which satisfies $x = 0$, $dx/dt = 1$ at $t = 0$.

19.5 First-order linear equations with a variable coefficient
So far, the coefficient $c$ in the first-order equation $dx/dt + cx = f(t)$ has been a constant. We shall now suppose $c$ to be variable; call it $g(t)$:
$$\frac{dx}{dt} + g(t)x = f(t). \tag{19.13}$$
The equation is of linear type (no squares, products, etc., between terms involving $x$ are present), and the idea of obtaining a general solution by adding complementary functions to any particular solution still holds good. However, it is nearly impossible to guess suitable trial functions, so we need a new approach to finding solutions.
If we could express the left-hand side $dx/dt + g(t)x$ of the equation as
$$\frac{d}{dt}(\text{something}),$$
then the equation would be easy to solve. In general this cannot be done directly, but we can instead do the next best thing. This is to obtain a certain function $I(t)$, called an integrating factor, such that
$$I(t)\left(\frac{dx}{dt} + g(t)x\right) = \frac{d}{dt}[I(t)x] \tag{19.14}$$
identically (i.e. for every function $x(t)$ and for all values of $t$).
The following example shows the meaning of this idea and the way it is used.

Example 19.15 (a) Show that $I(t) = e^t$ is an integrating factor for the expression $dx/dt + x$. (b) Use it to find the general solution of the equation $dx/dt + x = e^{2t}$.
(a) We shall confirm (19.14), that
$$e^t\left(\frac{dx}{dt} + x\right) = \frac{d}{dt}(e^t x). \tag{i}$$
Work from the right-hand side of (i). Differentiate the product $e^t x$:
$$\frac{d}{dt}(e^t x) = e^t\frac{dx}{dt} + e^t x = e^t\left(\frac{dx}{dt} + x\right),$$
which is the same as the left-hand side of (i), so $e^t$ is an integrating factor.
(b) Multiply both sides of the differential equation by $e^t$:
$$e^t\left(\frac{dx}{dt} + x\right) = e^t e^{2t} = e^{3t}.$$
Because of the result in (a), we can write this as
$$\frac{d}{dt}(e^t x) = e^{3t}.$$
Therefore
$$e^t x = \int e^{3t}\,dt = \tfrac13 e^{3t} + A \quad (A \text{ arbitrary}),$$
or
$$x = \tfrac13 e^{2t} + A\,e^{-t}.$$

To find a general expression for an integrating factor, refer back to the definition (19.14); the integrating factor $I(t)$ is chosen so that
$$I(t)\left(\frac{dx}{dt} + g(t)x\right) = \frac{d}{dt}[I(t)x].$$
This is the same as
$$I(t)\frac{dx}{dt} + I(t)g(t)x = I(t)\frac{dx}{dt} + x\frac{dI(t)}{dt}, \quad\text{or}\quad I(t)g(t) = \frac{dI(t)}{dt}$$
(after cancelling $I(t)\,dx/dt$, and dividing through by $x$). This can be written
$$\frac{1}{I(t)}\frac{dI(t)}{dt} = g(t) \quad\text{or}\quad \frac{d\ln I(t)}{dt} = g(t).$$
Therefore
$$\ln I(t) = \int g(t)\,dt, \quad\text{so}\quad I(t) = e^{\int g(t)\,dt}.$$
(In the case of Example 19.15, we had $g(t) = 1$, and the present formula gives $I(t) = e^{\int dt} = e^{t + C}$; the choice $C = 0$ gives the integrating factor suggested, but any other choice would do.)

Integrating factor for the equation $\dfrac{dx}{dt} + g(t)x = f(t)$
Put $I(t) = e^{\int g(t)\,dt}$;
then
$$I(t)\left(\frac{dx}{dt} + g(t)x\right) \equiv \frac{d}{dt}[I(t)x]. \tag{19.15}$$

Solution of $\dfrac{dx}{dt} + g(t)x = f(t)$
Multiply both sides by $I(t)$ (see (19.15)): the equation becomes
$$\frac{d}{dt}[I(t)x(t)] = I(t)f(t);$$
then
$$I(t)x(t) = \int I(t)f(t)\,dt + C, \tag{19.16}$$
giving $x(t)$.

Example 19.16 Find the general solution of $\dfrac{dx}{dt} - \dfrac1t x = t^3$.
Firstly, consider the range $t > 0$. Here $g(t) = -1/t$. Then $\int g(t)\,dt = C - \ln t$, so that
$$I(t) = e^{-\ln t} = t^{-1},$$
where we have chosen $C = 0$ for convenience. Multiply both sides by $I(t) = t^{-1}$:
$$t^{-1}\left(\frac{dx}{dt} - \frac1t x\right) = t^{-1}t^3 = t^2.$$
By (19.15), this can be written
$$\frac{d}{dt}(t^{-1}x) = t^2.$$
Therefore
$$t^{-1}x = \int t^2\,dt = \tfrac13 t^3 + C,$$
so that
$$x(t) = \tfrac13 t^4 + Ct.$$
The solution obviously falls into the shape
particular solution + complementary function,
and is clearly the general solution for all $t \neq 0$ (the equation itself is not defined at $t = 0$). We should have gained nothing by considering negative $t$, or by adding an arbitrary constant, when working out $\int(1/t)\,dt$: we only need any integrating factor, not all possible ones.
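Formula (19.16) can also be evaluated numerically when the integrals have no convenient closed form. The sketch below is an illustration under stated assumptions: an initial value $x(t_0) = x_0$ fixes the constant $C$ (the integrating factor is normalised so that $I(t_0) = 1$), and both integrals are approximated by the trapezoidal rule with an arbitrarily chosen number of steps; the function name is ours. Applied to Example 19.16 with the assumed condition $x(1) = 1$, the exact solution is $x = \tfrac13 t^4 + \tfrac23 t$, so $x(2) = \tfrac{20}{3}$.

```python
import math

def solve_linear_first_order(g, f, t0, x0, t1, n=20000):
    """Approximate x(t1) for dx/dt + g(t) x = f(t), x(t0) = x0, via (19.16).

    The integrating factor is I(t) = exp(integral of g from t0 to t), so
    I(t0) = 1 and (19.16) reads I(t1) x(t1) = x0 + integral of I f.
    Both integrals use the trapezoidal rule on n equal steps.
    """
    h = (t1 - t0) / n
    ts = [t0 + i * h for i in range(n + 1)]
    ln_I = [0.0]                                   # ln I(t) at each grid point
    for i in range(n):
        ln_I.append(ln_I[-1] + h * (g(ts[i]) + g(ts[i + 1])) / 2)
    integrand = [math.exp(li) * f(t) for li, t in zip(ln_I, ts)]
    integral = h * (sum(integrand) - (integrand[0] + integrand[-1]) / 2)
    return (x0 + integral) / math.exp(ln_I[-1])

# Example 19.16 with the (assumed) initial condition x(1) = 1, so C = 2/3
x2 = solve_linear_first_order(lambda t: -1 / t, lambda t: t**3, 1.0, 1.0, 2.0)
print(x2, 20 / 3)
```

The same function handles Example 19.15: with $g(t) = 1$, $f(t) = e^{2t}$ and $x(0) = 1$ it should return approximately $\tfrac13 e^2 + \tfrac23 e^{-1}$ at $t = 1$.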

Notice particularly that, in the examples, we did not need to calculate or check the truth of a statement like
$$t^{-1}\left(\frac{dx}{dt} - \frac1t x\right) = \frac{d}{dt}(t^{-1}x).$$
We already know that $t^{-1}$ is an integrating factor, and this is the very property that an integrating factor is designed to possess.
Be prepared to recognize this type of equation in disguised form, or when different letters are involved; for example,
$$\frac{dy}{dx} = \frac{x + y}{x + 1}$$
is the same as
$$\frac{dy}{dx} - \frac{1}{x + 1}y = \frac{x}{x + 1}.$$

Self-test 19.5
Find the integrating factor of
$$\frac{dx}{dt} - x\sin t = e^{-t - \cos t}\sin t,$$
and obtain the general solution of the equation.
Problems
19.1 Find a particular solution of each of the following equations by trial as in Section 19.1.
(a) $x' + x = 3e^{2t}$;
(b) $x' - 3x = t^3 + 1$;
(c) $2x' + 3x = t + 3e^t$;
(d) $x'' + x = 3e^{2t}$;
(e) $x'' - \tfrac14 x = 2e^t + 3e^{-t}$;
(f) $x'' - 2x' + x = 3$;
(g) $x'' + 4x' - x = 3t^2 - t$;
(h) $x'' - x = 2\cos t$;
(i) $2x'' + 3x = 2\sin 3t$;
(j) $2x'' + x' = \sin t - \cos t$;
(k) $x'' + 2x' + x = \cos 2t$;
(l) $\dfrac{d^2y}{dx^2} - y = 1 - 3e^{2x}$;
(m) $\dfrac{d^2y}{dx^2} - \dfrac{dy}{dx} + 2y = 3\sin 2x$.

19.2 Use the method of Section 19.2 to find a particular solution of the following.
(a) $x'' - x = 3\cos 2t$;
(b) $x'' + x = 2\sin 3t$;
(c) $x'' + 2x' + x = 3\sin t$;
(d) $x'' - x' - x = 3\cos t$;
(e) $2x'' + x' + 2x = 2\cos 2t$;
(f) $3x'' + 2x' + x = 2\sin 2t$;
(g) $x'' - 4x = e^{-t}\cos t$ (note: $e^{-t}\cos t = \operatorname{Re} e^{(-1+i)t}$);
(h) $x'' - 4x = 3e^t\sin 2t$ (note: $e^t\sin 2t = \operatorname{Im} e^{(1+2i)t}$);
(i) Show that a solution of $x'' + x' + 4x = 5\cos 3t$ is $(5/\sqrt{34})\cos(3t + \phi)$, where $\phi = \arctan\tfrac35$.

19.3 The following differential equations are examples of the exceptional cases treated in Section 19.3. Find a particular solution in each case.
(a) $x'' + x = 3\cos t$;
(b) $x'' + 4x = 3\sin 2t$;
(c) $x'' + 4x = 1 + 3\cos 2t$;
(d) $\dfrac{d^2y}{dx^2} + 9y = 2\sin 3x$;
(e) $\dfrac{d^2y}{dx^2} - 2\dfrac{dy}{dx} + 2y = e^x\cos x$.

19.4 The following are exceptional cases of types not described in Section 19.3. Find a particular solution for each.
(a) $x'' - x = e^t$; try a solution of the form $pt\,e^t$.
(b) $x'' - 2x' + x = e^t$; try a solution of the form $pt^2e^t$. (In this case, both $e^t$ and $t\,e^t$ are complementary functions, so the form in (a) will not work.)
(c) Consider the simple differential equation $d^2x/dt^2 = t$. A first try with the form $pt + q$ suggested by (19.1) does not lead to a result. Try polynomials of higher degree than 1.
(d) $\dfrac{d^2y}{dx^2} + \dfrac{dy}{dx} = x$. The absence of a term in $y$ causes the second-degree trial function $px^2 + qx + r$ to fail. Try a third-degree polynomial instead.
(e) $x'' - 2x' + 2x = e^t\cos t$. Try $t\,e^t(p\cos t + q\sin t)$, or modify the complex-number approach of Section 19.2 to obtain a particular solution.
(f) First-order equations also have exceptional cases. Consider the equation $\dfrac{dy}{dx} - y = e^x$. (If you have read as far as Section 19.5, you can also handle it by using an integrating factor.)

19.5 Find the general solution of the following equations.
(a) $x'' + 9x = 3e^{2t}$;
(b) $x'' - 4x = 2e^{-t}$;
(c) $4x'' - x = 1 + 3\cos 2t$;
(d) $\dfrac{d^2y}{dx^2} + 2\dfrac{dy}{dx} + 2y = 3$;
(e) $x'' - 2x' + 2x = 3\sin 2t$;
(f) $4x'' - 2x' - 2x = 3t^2$;
(g) $x'' + x' = 2 - 3e^{-t}\cos t$;
(h) $2x'' + x' - x = \tfrac12 t + 3e^{-t}$;
(i) $\dfrac{d^2y}{dx^2} + y = 1 + 2e^{3x} + x^2$;
(j) $\dfrac{d^2y}{dx^2} + 2\dfrac{dy}{dx} + y = 3\cos 2x + \sin 2x$;
(k) $\dfrac{d^2y}{dx^2} + 4\dfrac{dy}{dx} + 5y = e^{-x}\sin x$.

19.6 Use an integrating factor (Section 19.5) to find the general solution of the following equations.
(a) $x' - 3x = 0$;
(b) $x' + 2x = 3$;
(c) $x' - 2tx = t$;
(d) $x' - t^{-1}x = t + t\,e^{-t}$;
(e) $x' - t^{-1}x = t - 1$;
(f) $tx' - 2x + 3 = 0$;
(g) $\dfrac{dy}{dx} + \dfrac{1}{x + 1}y = \sin x$ (you will need to use integration by parts to perform the integration);
(h) $3\dfrac{dy}{dx} + \dfrac1x y = x$;
(i) $(x - 1)\dfrac{dy}{dx} - y = (x - 1)^2$;
(j) $x' - \dfrac1t x = \ln t$;
(k) $tx' - x = 1 + t$;
(l) $\dfrac{dy}{dx} = \dfrac{x + y}{x + 1}$;
(m) $x' + x\cos t = \cos t$;
(n) $x\dfrac{dy}{dx} = \dfrac{1 - y}{1 - x}$;
(o) $(1 - t^2)x' + tx = t$.

19.7 Show that the general solution of
$$\frac{dy}{dx} + \frac1x y = f(x)$$
is given by
$$y(x) = \frac1x\int xf(x)\,dx + \frac{C}{x},$$
where $C$ is any constant. Find the solution of the equation
$$\frac{dy}{dx} + \frac1x y = \ln x$$
for $x > 0$, for which $y = 0$ when $x = 1$.

19.8 (a) Use an integrating factor to show that the general solution of
$$\frac{dy}{dx} + y = f(x)$$
is
$$y(x) = e^{-x}\int e^x f(x)\,dx + C\,e^{-x},$$
where $C$ is an arbitrary constant.
(b) Show that the particular solution for which $y = y_0$ when $x = 0$ is given by
$$y(x) = y_0\,e^{-x} + e^{-x}\int_0^x e^u f(u)\,du.$$

19.9 (Newton cooling). An object is heated above, or cooled below, the ambient air temperature $T_0$. Under certain physical assumptions, the body temperature $T$ satisfies the equation
$$dT/dt = -k(T - T_0),$$
where $k$ is a positive constant. Find the general solution of the equation.
The body is at 100°C in an atmosphere at 40°C. After 3 minutes, its temperature is 85°C. Find the value of $k$, and determine when the body will reach 60°C.
20 Harmonic functions and the harmonic oscillator

CONTENTS
20.1 Harmonic oscillations
20.2 Phase difference: lead and lag
20.3 Physical models of a differential equation
20.4 Free oscillations of a linear oscillator
20.5 Forced oscillations and transients
20.6 Resonance
20.7 Nearly linear systems
20.8 Stationary and travelling waves
20.9 Compound oscillations; beats
20.10 Travelling waves; beats
20.11 Dispersion; group velocity
20.12 The Doppler effect
Problems

The two previous chapters mainly describe formal techniques for solving linear differential equations, rather than their applications. The present chapter presents qualitative ideas and terminology used for practical phenomena governed, exactly or approximately, by linear differential equations whose solutions have harmonic behaviour; that is to say, vibrations or waves occur that are sine or cosine functions having various physical properties, such as wavelength, amplitude and phase velocity. Such a system is called a harmonic oscillator. Section 20.9 treats travelling waves such as sound waves, and Sections 20.10–20.12 some more advanced phenomena: beats, arising when two harmonic signals are superposed; dispersion and group velocity, where the velocity is frequency-dependent; the Doppler effect, arising when a sound or optical source or observer is moving; and diffraction of superposed waves in space.

20.1 Harmonic oscillations


Consider the equation
$$\frac{d^2x}{dt^2} + \omega^2 x = 0,$$
where we assume $\omega > 0$. Its solutions (see (18.14b)) are
$$x(t) = C\cos(\omega t + \phi), \tag{20.1}$$
where $C$ and $\phi$ are any constants. We can write
$$x(t) = C\cos(\omega t + \phi) = C\cos\omega[t + (\phi/\omega)].$$
Therefore the graph of (20.1) is merely the graph $x = C\cos\omega t$ shifted, or translated, a distance $\phi/\omega$ along the $t$ axis; to the left if $\phi$ is positive, to the right if $\phi$ is negative. Sine functions are included in the collection (20.1), because $\sin\omega t = \cos(\omega t - \tfrac12\pi)$. These functions are spoken of generally as harmonic functions.
In applications it is usual to adjust $C$ and $\phi$ so that
$$C > 0 \quad\text{and}\quad -\pi < \phi \le \pi, \tag{20.2}$$
which can always be done without changing the function values generated by the expression (20.1). We say then that the function (20.1) is in standard form.

Example 20.1 Express $-2\cos(3t - \tfrac73\pi)$ in standard form.
Note that $\cos(A + \pi) = -\cos A$. Therefore
$$-2\cos(3t - \tfrac73\pi) = 2\cos(3t - \tfrac73\pi + \pi) = 2\cos(3t - \tfrac43\pi).$$
We now have positive $C$, but $\phi$ is still out of range according to (20.2). To bring it within range, increase it by $2\pi$, which alters nothing:
$$2\cos(3t - \tfrac43\pi) = 2\cos(3t - \tfrac43\pi + 2\pi) = 2\cos(3t + \tfrac23\pi),$$
which is now in standard form.
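The two moves in Example 20.1 (absorb a negative sign into the phase by adding $\pi$, then shift the phase by a multiple of $2\pi$) can be automated. The sketch below assumes the convention $C > 0$, $-\pi < \phi \le \pi$ of (20.2); the function name is ours, and `math.remainder` (standard library, Python 3.7+) returns the representative of $\phi$ modulo $2\pi$ nearest zero. Phases that land within rounding error of $\pm\pi$ may come out on either side of the boundary.

```python
import math

def standard_form(C, phi):
    """Rewrite C cos(w t + phi) as C' cos(w t + phi') with C' > 0 and
    -pi < phi' <= pi, as in (20.2)."""
    if C == 0:
        raise ValueError("the zero function has no standard form")
    if C < 0:                               # -cos A = cos(A + pi)
        C, phi = -C, phi + math.pi
    phi = math.remainder(phi, 2 * math.pi)  # shift by a multiple of 2*pi into [-pi, pi]
    if phi == -math.pi:                     # the convention keeps +pi, not -pi
        phi = math.pi
    return C, phi

C, phi = standard_form(-2, -7 * math.pi / 3)   # Example 20.1
print(C, phi / math.pi)                        # 2 and approximately 0.6667
```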

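[Fig. 20.1: $x = C\cos(\omega t + \phi)$; $C > 0$, $-\pi < \phi \le \pi$. The graph swings between $\pm C$ (amplitude $C$), has period $2\pi/\omega$, and is shifted by $\phi/\omega$ relative to $x = C\cos\omega t$; the $t$ axis is marked at $\pm\pi/\omega$, $\pm 2\pi/\omega$.]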

The features of the function $x(t) = C\cos(\omega t + \phi)$ are shown in Fig. 20.1. Assume that the expression is in standard form (20.2). The graph swings between $\pm C$, and $C$ is its amplitude. It is periodic (see Section 1.6), repeating itself at intervals of length $2\pi/\omega$, which is its minimum period. The number of complete oscillations per unit time is the frequency (e.g. in cycles per second, or hertz units), and
$$\text{Frequency} = (\text{period})^{-1} = \omega/2\pi. \tag{20.3}$$
The parameter $\omega$ is the angular frequency, often shortened merely to 'frequency'. The parameter $\phi$ is the phase or phase angle. As explained above, $\phi/\omega$ represents the distance that the graph $x = C\cos\omega t$ has to be shifted to coincide with (20.1).
Frequently the independent variable represents length $x$ instead of time $t$, as in a form such as $y = C\cos(\omega x + \phi)$. Then $2\pi/\omega$ is called wavelength rather than 'period', and $\omega/2\pi$ the wave number rather than 'frequency'.
Graphs of harmonic functions are often displayed by plotting $x$ against the dimensionless variable
$$\tau = \omega t$$
($\tau$ is the Greek letter 'tau') rather than against $t$. Thus $\tau$ will be the name of the new time-like axis, so that
$$x = C\cos(\tau + \phi).$$
The $x$, $\tau$ graph is drawn in Fig. 20.2. It has $\tau$ period $2\pi$: it repeats itself when $\tau$ increases by $2\pi$. It is the same as the graph of $x = C\cos\tau$ displaced through an interval in $\tau$ of length $\phi$. Expressed in terms of the period or wavelength in Fig. 20.2, it is clear that $\phi = \pi$ represents a displacement of half a wavelength, $\phi = \tfrac12\pi$ represents a displacement of a quarter of a wavelength, and so on.

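[Fig. 20.2: $x = C\cos(\tau + \phi)$ plotted against $\tau\ (= \omega t)$ from $-2\pi$ to $2\pi$; the curve swings between $\pm C$, has $\tau$ period $2\pi$, and is displaced by $\phi$.]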

Self-test 20.1
Express $x(t) = -2\sin(3t - \tfrac13\pi)$ in standard form.

20.2 Phase difference: lead and lag


Suppose that two oscillations have the same angular frequency $\omega$, but are out of step because they have different phases:
$$x_1(t) = C_1\cos(\omega t + \phi_1), \qquad x_2(t) = C_2\cos(\omega t + \phi_2).$$
Then they are said to be out of phase by an angle $\phi_2 - \phi_1$, or $\phi_1 - \phi_2$. More specifically, the following terminology is widely used in science and engineering applications:
Phase difference; lead and lag
$x(t) = C_1\cos(\omega t + \phi_1)$ and $y(t) = C_2\cos(\omega t + \phi_2)$ are harmonic functions in standard form with the same circular frequency $\omega$. If $\phi_1 > \phi_2$, then $x$ is said to lead $y$, or $y$ is said to lag $x$, by an angle $\phi_1 - \phi_2$. (20.4)

The reason for these terms is illustrated in Example 20.2.

Example 20.2 If a voltage $v = v_0\cos\omega t$ is applied to a coil having self-inductance $L$, the resulting current is $i = \dfrac{v_0}{\omega L}\cos(\omega t - \tfrac12\pi)$, so that $v$ leads $i$, or $i$ lags $v$, by $\tfrac12\pi$ (or $90^\circ$). Illustrate the sense of the terms graphically for this case.

[Fig. 20.3: solid curve $v = v_0\cos\tau$; dashed curve $i = (v_0/\omega L)\cos(\tau - \tfrac12\pi)$, plotted against $\tau\ (= \omega t)$ with points A, B and C marked. The period inspected, of length $2\pi$, is symmetrical about the chosen feature at B.]

The curves in Fig. 20.3 are plotted against the variable $\tau = \omega t$ and represent
$$v = v_0\cos\tau \quad\text{and}\quad i = \frac{v_0}{\omega L}\cos(\tau - \tfrac12\pi).$$
Choose one of the curves, say the $v$ curve, and select a prominent feature, say the maximum at B. Now search an interval within $\pm\pi$ of B (that is to say, within half a period on either side of B) for the corresponding feature of $i$. This is the maximum of $i$ at A.
Now, as we move from left to right (time increasing) through the interval, B appears before A; that is to say, B occurs a quarter period ($\tfrac12\pi$) earlier than A. This will be true for any feature of $v$ within its own symmetrical corresponding interval of $\pm\pi$. It is equivalent to saying that, when the two variables to be compared are in standard form, the one with the greater phase leads, and the other lags, by the phase difference (taken positively).
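Definition (20.4) is easy to apply mechanically once both phases are in standard form. The sketch below (function names ours) reduces each phase to $(-\pi, \pi]$ and then compares them exactly as (20.4) prescribes; it assumes positive amplitudes, so that the only normalisation needed is in the phase.

```python
import math

def principal_phase(phi):
    """Reduce phi to (-pi, pi] without changing cos(w t + phi)."""
    phi = math.remainder(phi, 2 * math.pi)
    return math.pi if phi == -math.pi else phi

def lead_lag(phi_x, phi_y):
    """Apply definition (20.4) to x = C1 cos(w t + phi_x) and
    y = C2 cos(w t + phi_y), assuming C1, C2 > 0 (standard form)."""
    px, py = principal_phase(phi_x), principal_phase(phi_y)
    if px > py:
        return "x leads y", px - py
    if px < py:
        return "x lags y", py - px
    return "in phase", 0.0

# Example 20.2: v = v0 cos(w t) compared with i = (v0 / (w L)) cos(w t - pi/2)
print(lead_lag(0.0, -math.pi / 2))   # ('x leads y', 1.5707963267948966)
```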

In Example 20.2 it is essential to limit the search to the prescribed single period.
Otherwise we could argue (see Fig. 20.3) that because, say, C appears before B,
therefore i leads v, which is contrary to the definition. (Any basic period of length
2π will give the same priority.) Also, notice that if one entity leads another it does
not in the least imply that the first is to be taken as the cause of the second.
Suppose that two oscillations, having the same amplitude and frequency, differ
in phase by π so they are displaced by half a period. If the oscillations are added
together, there is total cancellation, as shown in Fig. 20.4. The following example
shows what happens when the phase difference is less extreme.
[Fig. 20.4: solid curve $x = C\cos(\omega t + \phi)$; dashed curve $x = C\cos(\omega t + \phi \pm \pi)$; both swing between $\pm C$.]

Example 20.3 Two waves described by $C\cos\omega t$ and $C\cos(\omega t + \phi)$ are superimposed (added). Show that the result is a harmonic wave of the same frequency, and show how the amplitude varies as $\phi$ varies between $\pm\pi$.
From Appendix B the sum can be written
$$C[\cos\omega t + \cos(\omega t + \phi)] = [2C\cos\tfrac12\phi]\cos(\omega t + \tfrac12\phi).$$
This is a harmonic oscillation with angular frequency $\omega$, phase $\tfrac12\phi$, and amplitude $2C\cos\tfrac12\phi$. As $\phi$ goes from $-\pi$ through zero to $\pi$, the amplitude goes from zero (cancellation) through the value $2C$ and back to zero.
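Adding harmonic functions of the same frequency is the same as adding complex numbers, since $C\cos(\omega t + \phi) = \operatorname{Re}(C\,e^{i\phi}e^{i\omega t})$ (compare Section 19.2). The sketch below (names ours) adds the two complex amplitudes, often called phasors, $C$ and $C\,e^{i\phi}$, and compares the modulus and argument of the sum with the amplitude $2C\cos\tfrac12\phi$ and phase $\tfrac12\phi$ found in Example 20.3, for a few values of $\phi$ inside $(-\pi, \pi)$.

```python
import cmath
import math

def superpose(C, phi):
    """Amplitude and phase of C cos(w t) + C cos(w t + phi), obtained by
    adding the complex amplitudes C and C e^{i phi}."""
    total = C + C * cmath.exp(1j * phi)
    return abs(total), cmath.phase(total)

C = 1.0
for phi in (-3.0, -1.5, 0.0, 1.5, 3.0):       # values inside (-pi, pi)
    amp, ph = superpose(C, phi)
    print(f"phi={phi:5.2f}  amplitude={amp:.6f}  "
          f"2C cos(phi/2)={2 * C * math.cos(phi / 2):.6f}  phase={ph:.6f}")
```

At $\phi = \pm\pi$ the sum is zero to rounding error, which is the total cancellation shown in Fig. 20.4.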

This type of superposition is of importance in describing interference and diffraction phenomena. If the amplitudes of the components are not the same, a similar calculation applies (see Problem 20.4 at the end of this chapter).

Self-test 20.2
Two waves are represented by $C\cos(\omega t + \phi_1)$ and $C\cos(\omega t + \phi_2)$. The two waves are superimposed. What is the amplitude of the resulting harmonic wave? For what values of the phases does cancellation occur?

20.3 Physical models of a differential equation


Figure 20.5a shows a piston of mass $m$ running in a cylinder, controlled by a spring which obeys Hooke's law (a linear spring) and has stiffness $s$, acted on by an external force $F(t)$. The displacement of the piston from its equilibrium position is $x(t)$. Assume also that there is a frictional resistance proportional to the velocity:
$$\text{frictional resistance} = K\frac{dx}{dt} \quad (K > 0).$$
The equation of motion, force equals mass times acceleration, becomes
$$F(t) - K\frac{dx}{dt} - sx = m\frac{d^2x}{dt^2}, \quad\text{or}\quad \frac{d^2x}{dt^2} + \frac{K}{m}\frac{dx}{dt} + \frac{s}{m}x = \frac{F(t)}{m}. \tag{20.5}$$
[Fig. 20.5: (a) Mass–spring system: mass $m$ acted on by the spring force $sx$, the friction $K\,dx/dt$ and the external force $F(t)$, with $x$ measured from equilibrium. The arrows indicate the actual direction of the forces when $F(t)$, $sx$, and $K\,dx/dt$ take positive values. (b) Schematic representation with mass, spring and damper: the spring and the frictional element must be in parallel.]

This equation is of the type discussed in Chapter 19:
$$\frac{d^2x}{dt^2} + b\frac{dx}{dt} + cx = f(t), \tag{20.6}$$
in which
$$b = \frac{K}{m}, \quad c = \frac{s}{m}, \quad\text{and}\quad f(t) = \frac{F(t)}{m}.$$
Figure 20.6 represents an $LCR$ circuit driven by a voltage source of zero impedance, $V(t)$. If $Q$ is the charge on the capacitor, then
$$L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac1C Q = V(t), \quad\text{or}\quad \frac{d^2Q}{dt^2} + \frac{R}{L}\frac{dQ}{dt} + \frac{1}{LC}Q = \frac1L V(t). \tag{20.7}$$
Again, this is an equation of the type (20.6), with
$$x = Q, \quad b = \frac{R}{L}, \quad c = \frac{1}{LC}, \quad f(t) = \frac1L V(t).$$
These two physical systems serve as models of the differential equation (20.6). They are also models of each other, for by choosing the same values of $b$ and $c$ and the same forcing term $f(t)$ the circuit would serve as a precise analogue of the piston and mimic its behaviour exactly. A vast number of systems share the governing equation (20.6), at least approximately. Such a system is called a linear oscillator.
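The analogy can be made concrete. Matching $b$ and $c$ between (20.5) and (20.7) is achieved, for example, by $L = \lambda m$, $R = \lambda K$, $1/C = \lambda s$ for any positive scale $\lambda$, and the forcing terms then agree if $V(t) = \lambda F(t)$. This particular mapping and all the function names are our illustration (only $b$, $c$ and $f$ need agree), and the numerical values below are arbitrary.

```python
def mass_spring_bc(m, K, s):
    """b and c of (20.6) for the piston of (20.5)."""
    return K / m, s / m

def lcr_bc(L, R, C):
    """b and c of (20.6) for the circuit of (20.7)."""
    return R / L, 1 / (L * C)

def circuit_analogue(m, K, s, scale=1.0):
    """One hypothetical choice of (L, R, C) with the same b and c as the
    piston: L = scale*m, R = scale*K, C = 1/(scale*s). The forcing terms
    then match if V(t) = scale*F(t)."""
    return scale * m, scale * K, 1 / (scale * s)

m, K, s = 2.0, 0.5, 8.0                 # illustrative values only
L, R, C = circuit_analogue(m, K, s)
print(mass_spring_bc(m, K, s), lcr_bc(L, R, C))   # (0.25, 4.0) (0.25, 4.0)
```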

[Fig. 20.6: $LCR$ circuit, with capacitance $C$ and resistance $R$, driven by the voltage source $V(t)$.]

20.4 Free oscillations of a linear oscillator

Suppose that in the piston system there is no external force acting, so that $F(t) = 0$ for all $t$. We shall choose a conventional notation that simplifies the algebra a little. Equation (20.6) will be written
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = 0 \tag{20.8}$$
(in which we have put $K/m = 2k$, $s/m = \omega_0^2$, $F(t) = 0$). This equation describes the free oscillations of the mass–spring system.
The parameter $k$ is a measure of the amount of friction in the system. We shall consider the case when $k$ is 'small'. On its own this is not very meaningful, because $k$ is not dimensionless, so we could change our units so as to make it as large as we wished. The only thing that makes sense is to compare it with another parameter having the same dimensions (see Appendix I). We specify that
$$k^2 < \omega_0^2. \tag{20.9}$$
We have already worked out this problem (see (18.15)). The solutions of (20.8) subject to (20.9) are given by
$$x(t) = C\,e^{-kt}\cos[(\omega_0^2 - k^2)^{1/2}t + \phi], \tag{20.10}$$

where C and φ are arbitrary. These are called the free oscillations or natural oscilla-
tions of the system represented by the equation.
If the friction, or so-called damping, is zero then k = 0 and the equation for the free oscillations becomes
$$\frac{d^2x}{dt^2} + \omega_0^2 x = 0 \qquad (20.11)$$
with solutions
$$x(t) = C\cos(\omega_0 t + \phi), \qquad (20.12)$$
which are harmonic functions with circular frequency $\omega_0$.


The friction, or damping, changes (20.12) into (20.10). The frequency is changed from $\omega_0$ to $(\omega_0^2 - k^2)^{1/2}$, which is a small change if k is small, and the regular oscillations of (20.12) are caused to die away through the factor $e^{-kt}$ in (20.10). The general effect is shown in Fig. 20.7. We say that the oscillation decays exponentially down to zero, when all the initial energy is used up on friction. This is weak damping and the oscillation is said to be underdamped.

Fig. 20.7 [Figure: x plotted against t. (a) A decaying, underdamped oscillation; (b) an overdamped response that does not oscillate.]
HARMONIC FUNCTIONS AND THE HARMONIC OSCILLATOR

If $k^2 > \omega_0^2$, then there is a comparatively large amount of friction, and the form of the solution is different from (20.10) (see Fig. 20.7b). There are no oscillations; the x(t) curve dies away without crossing the t axis more than once, as in a dead-beat electrical instrument or shock absorber. This is the case of heavy damping or an overdamped oscillation.

Self-test 20.3
Let $\omega_0 = k$ in eqn (20.8), so that x satisfies
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + k^2 x = 0.$$
Solve the equation, and discuss how the output behaves.

20.5 Forced oscillations and transients


Return to the equation (20.5) for the mass–spring system with a non-zero external force F(t) acting. As before, put $K/m = 2k$, $s/m = \omega_0^2$, and $F(t)/m = f(t)$, so that we get
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = f(t),$$
which is a forced equation of the type considered in Chapter 19.
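We shall consider only the case when $f(t) = K\cos\omega t$ (so that K now denotes the amplitude of the forcing term):
$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = K\cos\omega t, \qquad (20.13)$$
and suppose as before that the friction (or resistance) is ‘small’:
$$k^2 < \omega_0^2.$$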
The mass in the piston system is now subject to competing stimuli. Left to itself it would oscillate as in (20.10) with circular frequency $(\omega_0^2 - k^2)^{1/2}$, and finally come to rest. However, the forcing term is trying to make it oscillate with a different circular frequency ω. The result is described by the general solution of (20.13). This is equal to the sum of a particular solution (already worked out in (19.4)) and the complementary functions, which are the free oscillations given in (18.15):

General solution of the forced linear oscillator equation ($k^2 < \omega_0^2$)

$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = K\cos\omega t,$$
$$x(t) = \frac{K}{[(\omega_0^2 - \omega^2)^2 + 4k^2\omega^2]^{1/2}}\cos(\omega t + \Phi) + C e^{-kt}\cos\left[(\omega_0^2 - k^2)^{1/2}\, t + \phi\right],$$
in which Φ is the polar angle of the point $(\omega_0^2 - \omega^2, -2k\omega)$, and C and φ are arbitrary. (20.14)

The structure of (20.14) is very important: the general features are summarized
in (20.15) below.

Forced oscillations of a linear oscillator

(A) The forced oscillation (first term of (20.14)) coexists with a free oscillation
(second term). The free oscillation proceeds as if no forcing term were present.



(B) The term representing the forced oscillation is harmonic, with the same
frequency as the forcing term, but a different phase and amplitude. The
term is invariable; initial conditions can have no effect on it since it contains
no adjustable constants.
(C) The free oscillation term adjusts to any initial conditions by means of the
constants C and φ.
(D) If k is positive, that is to say if there is any friction (or resistance in the
case of a circuit), the free oscillations die away to zero due to the factor e−kt.
Therefore all solutions ultimately settle into the same steady oscillation,
independently of the initial conditions. (20.15)
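Point (D) can be illustrated numerically. The following is a minimal sketch, not taken from the text: it integrates (20.13) with the classical fourth-order Runge–Kutta method from three different initial states and compares each result at t = 30 with the forced term of (20.14). The function names `simulate` and `forced_term`, the step size, and the parameter values k = 0.5, ω₀ = 2, K = 1, ω = 1.5 are illustrative assumptions of our own.

```python
import math


def simulate(x0, v0, k, w0, K, w, t_end, dt=1e-2):
    """Integrate x'' + 2k x' + w0^2 x = K cos(w t), eqn (20.13), by classical RK4.

    Starts from x(0) = x0, x'(0) = v0 and returns (x, x') at t = t_end.
    """
    def deriv(t, x, v):
        # First-order system: x' = v, v' = K cos(w t) - 2k v - w0^2 x
        return v, K * math.cos(w * t) - 2 * k * v - w0**2 * x

    t, x, v = 0.0, x0, v0
    for _ in range(round(t_end / dt)):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return x, v


def forced_term(t, k, w0, K, w):
    """First term of (20.14): the steady forced oscillation."""
    amplitude = K / math.sqrt((w0**2 - w**2) ** 2 + 4 * k**2 * w**2)
    Phi = math.atan2(-2 * k * w, w0**2 - w**2)  # polar angle of (w0^2 - w^2, -2kw)
    return amplitude * math.cos(w * t + Phi)


params = dict(k=0.5, w0=2.0, K=1.0, w=1.5)  # illustrative values with k^2 < w0^2
for x0, v0 in [(0.0, 0.0), (1.0, -2.0), (-3.0, 5.0)]:
    x_end, _ = simulate(x0, v0, t_end=30.0, **params)
    print(f"start ({x0}, {v0}): x(30) = {x_end:.6f}")
print(f"forced term at t = 30: {forced_term(30.0, **params):.6f}")
```

By t = 30 the factor e^{−kt} has fallen to about 3 × 10⁻⁷, so the three runs should agree with the forced term to within the integration error.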

On account of (D) the free oscillation is called a transient oscillation, and may
show itself, for example, by a brief irregularity in the voltage or current upon
switching an electrical apparatus:

Example 20.4 The circuit shown in Fig. 20.8 is initially quiescent and
uncharged. Find the charge Q(t) on the capacitor after switching the circuit on.

Fig. 20.8 [Figure: series circuit with $L = 10^{-3}$, $R = 8 \times 10^{-3}$, $C = 10^{-1}$, driven by $E(t) = 2\cos 90t$.]

We shall rework the problem from first principles. The equation is (see (20.7))
$$10^{-3}\frac{d^2Q}{dt^2} + 8\times 10^{-3}\frac{dQ}{dt} + 10Q = 2\cos 90t,$$
or
$$\frac{d^2Q}{dt^2} + 8\frac{dQ}{dt} + 10^4 Q = 2\times 10^3\cos 90t.$$
Complementary functions Qc (natural oscillation). The characteristic equation is $m^2 + 8m + 10^4 = 0$, so that $m = -4 \pm 99.92i$ and the complementary functions are $Q_c = B e^{-4t}\cos(99.92t + \phi)$, where B and φ are arbitrary.
Particular solution Qp (forced oscillation). Look for a solution to the corresponding complex equation
$$\frac{d^2X}{dt^2} + 8\frac{dX}{dt} + 10^4 X = 2\times 10^3 e^{90it},$$
and take its real part. By trying a solution of the form $X(t) = P e^{90it}$ we obtain
$$P = \frac{2\times 10^3}{10^4 - 90^2 + 720i} = \frac{2000}{1900 + 720i} = 0.9205 - 0.3488i.$$
In polar coordinates this becomes $P = 0.9843\, e^{-0.3622i}$. The corresponding complex solution, in polar coordinates, is

$$X(t) = 0.9843\, e^{(90t - 0.3622)i}.$$
Therefore the particular solution is
$$Q_p(t) = \operatorname{Re} X(t) = 0.9843\cos(90t - 0.3622).$$
The general solution. This is
$$Q(t) = 0.9843\cos(90t - 0.3622) + B e^{-4t}\cos(99.92t + \phi).$$
Initial conditions. At t = 0, Q and dQ/dt are zero. After obtaining dQ/dt and substituting t = 0 into Q and dQ/dt, we obtain the equations
$$B\cos\phi = -0.9204, \qquad 4B\cos\phi + 99.92B\sin\phi = 31.389.$$
The solution is B = 0.9851, φ = 2.777, so Q(t) is given by
$$Q(t) = 0.9843\cos(90t - 0.3622) + 0.9851\, e^{-4t}\cos(99.92t + 2.777).$$
Figure 20.9 shows the individual contributions of the two terms.

Fig. 20.9 [Figure: three panels plotted against 90t from 0 to 32π, vertical scale −1 to 1.] (a) Forced oscillation, $Q_p = 0.9843\cos(90t - 0.3622)$. (b) Transient, $Q_c = 0.9851\, e^{-4t}\cos(99.92t + 2.777)$. (c) Total oscillation, $Q = Q_p + Q_c$.
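The arithmetic of Example 20.4 can be checked in a few lines. This sketch follows the same steps as the text (complex amplitude of the particular solution, then the two initial conditions solved for B cos φ and B sin φ); the variable names `wd`, `amp`, `phase`, `B`, `phi` are our own and purely illustrative.

```python
import cmath
import math

# Example 20.4 after dividing by L: Q'' + 8 Q' + 1e4 Q = 2e3 cos(90 t)
b, c, F, w = 8.0, 1.0e4, 2.0e3, 90.0

# Complementary functions: roots of m^2 + b m + c = 0 are -b/2 ± i wd
wd = math.sqrt(c - b**2 / 4)

# Particular solution X = P e^{i w t}: substituting gives P (c - w^2 + i b w) = F
P = F / complex(c - w**2, b * w)
amp, phase = abs(P), cmath.phase(P)        # P = amp e^{i phase}

# Initial quiescence, with Q = amp cos(w t + phase) + B e^{-(b/2) t} cos(wd t + phi):
#   Q(0) = 0   ->  B cos(phi) = -amp cos(phase)
#   Q'(0) = 0  ->  (b/2) B cos(phi) + wd B sin(phi) = -amp w sin(phase)
B_cos = -amp * math.cos(phase)
B_sin = (-amp * w * math.sin(phase) - (b / 2) * B_cos) / wd
B, phi = math.hypot(B_cos, B_sin), math.atan2(B_sin, B_cos)


def Q(t):
    """Charge on the capacitor: forced oscillation plus transient."""
    return (amp * math.cos(w * t + phase)
            + B * math.exp(-b / 2 * t) * math.cos(wd * t + phi))


print(f"wd = {wd:.2f}, P = {P:.4f}, amp = {amp:.4f}, phase = {phase:.4f}")
print(f"B = {B:.4f}, phi = {phi:.3f}")
```

Using `atan2` for φ places it in the correct quadrant automatically (here B cos φ < 0 and B sin φ > 0, so φ lies between ½π and π).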

20.6 Resonance

Return to eqn (20.14) for the linear oscillator and its solutions and examine the forced oscillation, which is all that is left after the transient has died away. Its amplitude, A say, is given by
$$A = \frac{K}{[(\omega_0^2 - \omega^2)^2 + 4k^2\omega^2]^{1/2}}.$$

Different values for the forcing frequency ω will produce different amplitudes;
some values of ω will be more effective than others in generating a large
amplitude.
Regard $\omega_0$ and k as representing the fixed characteristics of some kind of system, and consider an experiment in which we try to excite it with a controllable input $K\cos\omega t$, keeping K constant but trying various values of ω. The amplitude A will be greatest when $g(\omega) = (\omega_0^2 - \omega^2)^2 + 4k^2\omega^2$ is a minimum with respect to the variable ω. It is found, by solving dg/dω = 0 (see Problem 20.15), that the minimum occurs when
$$\omega^2 = \omega_0^2 - 2k^2, \qquad (k^2 < \tfrac{1}{2}\omega_0^2).$$

When ω² takes this value the amplitude A takes its greatest possible value for the given K, $\omega_0$, and k, namely
$$A = \frac{K}{2k(\omega_0^2 - k^2)^{1/2}}.$$
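A brute-force scan of A(ω) gives an independent check on these resonance formulas. The sketch below assumes the illustrative values K = 1, ω₀ = 2, k = 0.3 (which satisfy k² < ½ω₀²) and a frequency grid of spacing 10⁻⁴; the function name `amplitude` is our own.

```python
import math


def amplitude(w, K, w0, k):
    """Forced amplitude A(w) of the linear oscillator, as in (20.14)."""
    return K / math.sqrt((w0**2 - w**2) ** 2 + 4 * k**2 * w**2)


K, w0, k = 1.0, 2.0, 0.3          # illustrative values, k^2 < w0^2 / 2

# Brute-force scan of the forcing frequency on (0, 4]
grid = [i * 1e-4 for i in range(1, 40001)]
w_best = max(grid, key=lambda w: amplitude(w, K, w0, k))

# Predictions from the resonance conditions
w_res = math.sqrt(w0**2 - 2 * k**2)
A_res = K / (2 * k * math.sqrt(w0**2 - k**2))

print(f"scan:    w = {w_best:.4f}, A = {amplitude(w_best, K, w0, k):.6f}")
print(f"formula: w = {w_res:.4f}, A = {A_res:.6f}")
```

The peak lies slightly below ω₀, and A at ω = ω₀ is K/(2kω₀), which is smaller than the resonance amplitude.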

Figure 20.10 shows schematically how the amplitude A, and also the phase Φ in
(20.14), vary with forcing frequency ω. Different curves are obtained according
to the amount of friction or damping (or resistance in the case of a circuit) in the
system, measured by the size of k; as the damping decreases, the maximum
increases. When the condition for a maximum is satisfied, the system is said to be
in a state of resonance.

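Fig. 20.10 [Figure: (a) Forced amplitude A against ω for k = 0, k small, and k large; when k = 0, A becomes infinite at ω = ω₀. (b) Phase Φ against ω for the same values of k, lying between 0 and −π and passing through −½π at ω = ω₀.]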

Resonating system

$$\frac{d^2x}{dt^2} + 2k\frac{dx}{dt} + \omega_0^2 x = K\cos\omega t.$$
Forced amplitude: $A = \dfrac{K}{[(\omega_0^2 - \omega^2)^2 + 4k^2\omega^2]^{1/2}}$.
Resonant frequency: $\omega^2 = \omega_0^2 - 2k^2$.
Resonance amplitude: $\dfrac{K}{2k(\omega_0^2 - k^2)^{1/2}}$. (20.16)

A physical feeling for the buildup of a large amplitude can be obtained by thinking of a child being pushed on a swing by two people, one on either side of the swing. The method is to push the swing the way it wants to go, and not to work against it. This is best done by pushing it, forward and backward alternately, when it is at the bottom of its path. The driving frequency is then the same as the natural frequency of the swing. The driving cycle is a quarter of a period out of phase with the swing’s cycle, because the force is a maximum when the displacement is zero. In terms of (20.16), k is assumed small, so that $\omega^2 = \omega_0^2$ very nearly, and the phase Φ is nearly $-\tfrac{1}{2}\pi$, a quarter of a period, the forcing term leading the response by this amount.
Suppose next that there is zero friction, k = 0, so that
$$\frac{d^2x}{dt^2} + \omega_0^2 x = K\cos\omega t, \qquad (20.17)$$
and from (20.16) the forced amplitude A is
$$A = \frac{K}{|\omega_0^2 - \omega^2|}. \qquad (20.18)$$
The natural frequency of this system is exactly $\omega_0$. When ω (the forcing frequency) gets close to $\omega_0$, the amplitude A can become very large, approaching infinity as ω approaches $\omega_0$: see Fig. 20.10a.
When $\omega = \omega_0$ the equation becomes
$$\frac{d^2x}{dt^2} + \omega_0^2 x = K\cos\omega_0 t, \qquad (20.19)$$
and apparently A = ∞. This result cannot describe a steady solution of (20.19), but it must be reconcilable with (20.19) in some way. In fact (20.19) is the ‘exceptional case’ of eqn (19.6a), and has a solution
$$x(t) = \frac{K}{2\omega_0}\, t\sin\omega_0 t. \qquad (20.20)$$
This particular solution conveniently satisfies the initial conditions

x(0) = 0, x′(0) = 0, (20.21)

that is to say, the conditions for initial quiescence. It therefore represents a system



without friction and in a state of resonance, which starts up from rest. Its oscillations grow without bound because of the factor t in (20.20). The equation does not have any solutions corresponding to steady forced oscillations, such as we found earlier in systems having even a small amount of friction.
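The growing solution (20.20) can be compared with a direct numerical integration of (20.19) from rest. This is a sketch with illustrative values K = 1, ω₀ = 2 and a Runge–Kutta step of 10⁻³, all our own choices; `integrate_from_rest` and `resonant_solution` are hypothetical names.

```python
import math


def integrate_from_rest(K, w0, w, t_end, dt=1e-3):
    """RK4 solution of x'' + w0^2 x = K cos(w t), eqn (20.17), with x(0) = x'(0) = 0."""
    def deriv(t, x, v):
        return v, K * math.cos(w * t) - w0**2 * x

    t, x, v = 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return x


def resonant_solution(t, K, w0):
    """Closed form (20.20): x(t) = K t sin(w0 t) / (2 w0)."""
    return K / (2 * w0) * t * math.sin(w0 * t)


K, w0 = 1.0, 2.0                   # illustrative values
for t_end in (5.0, 10.0, 20.0):
    numeric = integrate_from_rest(K, w0, w0, t_end)   # forcing exactly at w = w0
    exact = resonant_solution(t_end, K, w0)
    print(f"t = {t_end:4.1f}: numeric {numeric:+.6f}, (20.20) {exact:+.6f}")
```

The envelope K t/(2ω₀) grows linearly, which is the unbounded growth described above.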

Self-test 20.4
Using (20.14) and (20.16), find the phase at the resonant frequency of a
forced linear oscillator.

20.7 Nearly linear systems


Consider the pendulum of Fig. 20.11. It consists of a weightless rod of length l, pivoted at the top and carrying a point mass m at the lower end. It makes an angle θ(t) with the vertical. The equation of motion, obtained by taking moments about O and assuming no resistance, is
$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\sin\theta = 0, \qquad (20.22)$$
where g is the acceleration due to gravity.

Fig. 20.11 The pendulum. [Figure: rod of length l at angle θ to the vertical, with the weight mg acting on the bob.]

This equation is nonlinear since sin θ is not of the form aθ + b, so the methods of Chapter 19 do not apply to it. However, the Taylor series for sin θ begins
$$\sin\theta = \theta - \tfrac{1}{6}\theta^3 + \cdots$$
(θ in radians), so provided that θ remains small enough we can approximate sin θ by
$$\sin\theta \approx \theta.$$
The error is about 10% when θ = 45°, and 0.1% at 5°. Put this into (20.22); we obtain the approximate linearized equation
$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\theta = 0. \qquad (20.23)$$
The general solution is $\theta(t) = C\cos[(g/l)^{1/2} t + \phi]$ (see Section 20.1). The values of C and φ will depend on how the pendulum was set going; the initial conditions amount to prescribing the position and angular velocity at t = 0. However, C must be small for the approximation to be justified.
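The error figures quoted for sin θ ≈ θ are easy to reproduce. The sketch below measures the error relative to θ, which is our assumption about what ‘error’ means here; with that reading 45° gives about 10.0% and 5° about 0.13%, consistent with the rounded values in the text. The function name is hypothetical.

```python
import math


def small_angle_error(theta_deg):
    """Relative error (theta - sin theta) / theta of the approximation sin(theta) ≈ theta."""
    theta = math.radians(theta_deg)
    return (theta - math.sin(theta)) / theta


for angle in (45, 20, 5):
    print(f"{angle:>2} degrees: {100 * small_angle_error(angle):.3f}%")
```

Because the leading error term is θ²/6, halving the angle roughly quarters the relative error.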
Exactly linear equations are uncommon in applications. Most frequently they
occur as the result of a simplifying approximation such as we carried out for the
pendulum. Usually some function in the equation is linearized at the expense of
a restriction on the dependent variable.

Example 20.5 A mass m is fixed at the midpoint of a piece of elastic having natural length l and stiffness s. The elastic is stretched between two points a distance L > l apart. Find the period of small lateral vibrations.
(See Fig. 20.12.) The extension e of the branch AC is AC − ½l, so the tension T in either branch is
$$T = se = s(AC - \tfrac{1}{2}l).$$

Fig. 20.12 A and B are fixed a distance L apart where L > l; the mass m at C is displaced from equilibrium at N by a distance x(t). [Figure: A and B each lie ½L from N; the tension T acts along each branch AC and BC.]

The total restoring force F is 2T sin θ, where θ is the angle between AC and AB, and
$$\sin\theta = NC/AC = x/AC,$$
so $F = 2s(AC - \tfrac{1}{2}l)x/AC = 2s[1 - l/(2AC)]x$. The equation of motion is
$$m\frac{d^2x}{dt^2} = -F,$$
so we must put AC in terms of x. Now
$$AC = (\tfrac{1}{4}L^2 + x^2)^{1/2},$$
so the equation for x is
$$m\frac{d^2x}{dt^2} = -2s\left[1 - \frac{l}{(L^2 + 4x^2)^{1/2}}\right]x,$$
which is clearly nonlinear.
However, if the oscillations which we expect are of small amplitude compared with L, we can put
$$AC \approx \tfrac{1}{2}L$$
with an error of something like $2x^2/L^2$, and the approximation to the restoring force becomes
$$F \approx 2s(1 - l/L)x.$$
The equation of motion becomes approximately
$$m\frac{d^2x}{dt^2} = -2s(1 - l/L)x, \quad\text{or}\quad \frac{d^2x}{dt^2} + \frac{2s}{m}(1 - l/L)x = 0.$$
This is the linearized equation, good for small amplitudes. It has solutions (with l ≠ L)
$$x(t) = C\cos\left\{\left[\frac{2s}{m}\left(1 - \frac{l}{L}\right)\right]^{1/2} t + \phi\right\},$$
where C and φ are arbitrary. The approximate period is
$$2\pi\left[\frac{2s}{m}\left(1 - \frac{l}{L}\right)\right]^{-1/2}.$$
It is interesting to consider the case when the string is unstretched in the equilibrium position, so that L = l. In this case $l(l^2 + 4x^2)^{-1/2} \approx 1 - (2/l^2)x^2$, using the binomial expansion. To lowest order for small |x|, the equation of motion is the nonlinear equation
$$m\frac{d^2x}{dt^2} = -\frac{4s}{l^2}x^3.$$
This case is discussed in Example 23.2.
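For Example 20.5 the linearized period can be compared with the period of the full nonlinear equation, measured numerically. In this sketch the values m = s = 1, l = 1, L = 2 and the starting amplitudes are illustrative assumptions, and the period is timed between successive downward zero crossings of x(t) under a Runge–Kutta integration; none of this procedure comes from the text, and the function names are hypothetical.

```python
import math


def linear_period(m, s, l, L):
    """Approximate period 2*pi*[(2s/m)(1 - l/L)]^(-1/2) from the linearized equation."""
    return 2 * math.pi / math.sqrt(2 * s / m * (1 - l / L))


def nonlinear_period(m, s, l, L, a, dt=1e-3):
    """Period of m x'' = -2s[1 - l/(L^2 + 4x^2)^(1/2)] x, started from x = a > 0 at rest.

    Integrates with RK4 and times two successive downward zero crossings of x.
    """
    if a <= 0:
        raise ValueError("starting displacement a must be positive")

    def acc(x):
        return -2 * s / m * (1 - l / math.sqrt(L**2 + 4 * x**2)) * x

    t, x, v = 0.0, a, 0.0
    crossings = []
    while len(crossings) < 2:
        k1x, k1v = v, acc(x)
        k2x, k2v = v + dt / 2 * k1v, acc(x + dt / 2 * k1x)
        k3x, k3v = v + dt / 2 * k2v, acc(x + dt / 2 * k2x)
        k4x, k4v = v + dt * k3v, acc(x + dt * k3x)
        x_new = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        if x > 0 >= x_new:                                # downward crossing in this step
            crossings.append(t + dt * x / (x - x_new))   # linear interpolation
        x, t = x_new, t + dt
    return crossings[1] - crossings[0]


m, s, l, L = 1.0, 1.0, 1.0, 2.0    # illustrative values with L > l
print(f"linearized period: {linear_period(m, s, l, L):.5f}")
for a in (0.01, 0.5):
    print(f"nonlinear, a = {a}: {nonlinear_period(m, s, l, L, a):.5f}")
```

With a small starting amplitude the two periods agree closely; a larger amplitude such as a = 0.5 gives a shorter period, because the restoring coefficient in the nonlinear equation grows with |x|.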

20.8 Stationary and travelling waves


The simplest type of wave motion is a harmonic oscillation: a periodic time
variation of the form C cos(ω t + φ). However, physical vibrations arising in
connection with subjects such as sound, elasticity, radar, X-ray analysis, optics,
and many others involve one or more space variables x, y, z as well as time t.
Oscillations are taking place everywhere in a region, and the variations in phase
and amplitude from point to point are all-important for many applications.

Case (i) Stationary waves in one space dimension


Figure 20.13 shows a sinusoidal or harmonic wave in one space dimension z,
exemplified by the displacement u of the string on a musical instrument when a
perfectly pure tone is played. At every moment the shape of the string is sinusoidal
in z, but each ordinate oscillates harmonically from moment to moment. The points N on the string where the displacement is always zero (nodes), and the points M where the amplitude of oscillation is greatest (antinodes), are fixed points. The motion is called a standing wave or a stationary
wave. The ordinates go up and down as we watch, but there is no overall motion
along the z axis.
The most general expression for a stationary sinusoidal wave is
$$u(t, z) = A\cos\left[2\pi\left(\frac{t}{T}\right) + \phi\right]\cos\left[2\pi\left(\frac{z}{\lambda}\right) + \alpha\right], \qquad (20.24)$$

Fig. 20.13 A standing wave on a stretched string. The nodes N and antinodes M remain fixed in position. [Figure: sinusoidal string profile along the z axis, with nodes N on the axis and antinodes M alternately above and below it.]

where u is the displacement, z the coordinate along the string, λ the wavelength, T the period of the oscillation, A > 0 the amplitude, and φ and α arbitrary phase angles. A sine can be written as a cosine by shifting the phase by ½π, since $\sin\theta = \cos(\theta - \tfrac{1}{2}\pi)$, so (20.24) covers sines as well as cosines. We can express (20.24) in terms of angular frequency ω and wave number k by putting
$$T = \frac{2\pi}{\omega}, \qquad \lambda = \frac{2\pi}{k}; \qquad (20.25a)$$
the frequency f in cycles per second (f hertz) is given by
$$f = \frac{1}{T} = \frac{\omega}{2\pi}. \qquad (20.25b)$$
Therefore
$$u(t, z) = A\cos(\omega t + \phi)\cos(kz + \alpha). \qquad (20.26)$$
The nodes N occur where cos(kz + α) = 0, that is, where $z = [(n + \tfrac{1}{2})\pi - \alpha]/k$ for integer n.

Case (ii) Travelling waves in one space dimension


Figure 20.14 illustrates another type of wave motion. It may be thought of in terms
of a very long, taut string extending along the z axis to the right. A steady wave
motion moving to the right can be initiated by wobbling the left-hand end of
the string. It is called a travelling or progressive wave. The general form of a
progressive harmonic wave is given by
u(t, z) = A cos(ω t − kz + φ). (20.27)

Fig. 20.14 A travelling wave A cos(ωt − kz + φ) moving with phase velocity v = ω/k. [Figure: the wave profile against z at time t and a moment later, shifted to the right; the wavelength is λ = 2π/k and the amplitude is A.]
The shape of u(t, z) at any fixed moment $t = t_0$ takes the form
$$u(t_0, z) = A\cos(-kz + B),$$
where $B = \omega t_0 + \phi$, a constant. Therefore, the graph of u against z maintains a constant shape for all $t_0$, but it is translated bodily along the z axis by a distance depending on $t_0$. To find the velocity of the translatory motion we may track the motion of any feature of the graph as time increases: we could use, say, any maximum of (20.27). A maximum occurs for values of t and z connected by $\omega t - kz + \phi = 0$ (or any integer multiple of 2π); that is, at the moving point
$$z = \frac{\omega t}{k} + \frac{\phi}{k}.$$

From (20.25a,b) and (20.26), the velocity v of the wave along the z axis is therefore given by
$$v = \frac{\omega}{k} = \frac{\lambda}{T} = \lambda f. \qquad (20.28)$$

The velocity of a sinusoidal wave is called the phase velocity: it is the velocity of a point for which the phase maintains a constant value; for example, following a maximum as above, or a zero of u. A more direct way of justifying the equation v = λf
is as follows. The number of waves crossing the fixed point P per second is equal
to the frequency f. Therefore, the length of the wave train crossing P per unit time
is f × wavelength, which is the velocity v.
The wavelength λ and the frequency f cannot be assigned independently in a
physical problem (and the same applies to ω and k) since they are connected by
(20.28), and the velocity of propagation v is determined by the physical medium
(even if it varies with the frequency).

Case (iii) Plane waves in three dimensions along the z axis


Set up right-handed axes as in Fig. 20.15, and consider a disturbance (such as
pressure in a sound wave) u(t, x, y, z), described by

u(t, x, y, z) = A cos(ω t − kz + φ), (20.29)

in which A, ω, k, φ are constants. Although this looks similar to (20.27), its meaning is different; eqn (20.29) defines u at every point in the x, y, z space. The values of u are independent of x and y; that is to say, the value of u over any fixed plane perpendicular to the z axis, such as Σ in Fig. 20.15, is uniform at any particular moment, though this value varies with time t. The waves are therefore called plane waves.
The z axis is one ray of the three-dimensional wave; along it the situation is the same as in Case (ii) above. The value of u over any plane Σ′ perpendicular to the z axis is equal to the value at the point where Σ′ meets the z axis, so a plane Σ′ (Fig. 20.15) that follows a given constant value of u must move to the right with speed v = ω/k = λ/T. To sum up these results:

Fig. 20.15 A plane travelling wave. The disturbance u is uniform on the fixed plane Σ. It is uniform and also constant over Σ′, which moves with velocity v. [Figure: right-handed axes Ox, Oy, Oz; the fixed plane Σ and the moving plane Σ′ are both perpendicular to the z axis, which is the wave direction.]

Plane wave travelling along the z axis

$$u(t, x, y, z) = A\cos(\omega t - kz + \phi),$$
with amplitude A > 0, circular frequency ω, wave number k, phase angle φ, velocity v = ω/k. (20.30)

Case (iv) Plane waves in any direction


Suppose that v is any unit vector (i.e. a vector of unit length; see Section 9.7). It
can be used to indicate a definite direction in space. Equation (20.30) describes a
plane wave travelling along the z axis; we shall verify that a plane wave travelling
in the direction v is given by:

Plane wave u(t, r) travelling in direction v

$$u(t, \mathbf{r}) = A\cos(\omega t - k\,\mathbf{v}\cdot\mathbf{r} + \phi),$$
where v is a unit vector in the direction of propagation, and r the position vector of any point of observation (x, y, z). (20.31a)

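In (20.31a), v·r represents the scalar or ‘dot’ product (see Section 10.1). If α, β, γ are the angles made by v with the positive directions of the x, y, z axes, then
$$\mathbf{v} = \mathbf{i}\cos\alpha + \mathbf{j}\cos\beta + \mathbf{k}\cos\gamma.$$
The components are direction cosines, so that, since v is a unit vector,
$$|\mathbf{v}|^2 = \cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1,$$
and
$$\mathbf{v}\cdot\mathbf{r} = x\cos\alpha + y\cos\beta + z\cos\gamma. \qquad (20.31b)$$

Fig. 20.16 [Figure: P is a point of observation; the axis Oz′ through the origin O is parallel to v and meets, at Q, the plane through P perpendicular to v; θ is the angle between OP and Oz′.]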

To prove the result (20.31a), see Fig. 20.16. P : (x, y, z) is a representative point. Σ is the plane which passes through P, and is perpendicular to the unit vector v. OQ is perpendicular to Σ at Q, and passes through the origin O. Extend OQ to form a new coordinate axis Oz′: this is parallel to v and in the same sense. Then (see (10.14))
$$\mathbf{v}\cdot\mathbf{r} = |\mathbf{v}||\mathbf{r}|\cos\theta = 1 \times OP\cos\theta = OQ = z',$$
where z′ is the Oz′ coordinate of Q (and P). The formula becomes
$$u(t, \mathbf{r}) = A\cos(\omega t - k\,\mathbf{v}\cdot\mathbf{r} + \phi) = A\cos(\omega t - kz' + \phi).$$
This expression is the same as (20.30), but refers to an axis Oz′ parallel to v, in
place of the axis Oz. It therefore represents a plane travelling wave of the type
(20.30) propagated in the direction of v.
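The two properties that make (20.31a) a plane wave travelling along v can be checked numerically: u is uniform over any plane perpendicular to v, and a value of u is carried along v at speed ω/k. In this sketch the unit vector is called `nu` (to avoid a clash with the speed v), and the direction angles α = 60°, β = 45° and the wave parameters are illustrative assumptions; `plane_wave` is a hypothetical name.

```python
import math


def plane_wave(t, r, nu, A, w, k, phi):
    """u(t, r) = A cos(w t - k nu.r + phi), eqn (20.31a); nu must be a unit vector."""
    nu_dot_r = sum(n_i * r_i for n_i, r_i in zip(nu, r))
    return A * math.cos(w * t - k * nu_dot_r + phi)


# Direction cosines: alpha = 60°, beta = 45°; cos(gamma) then follows from |nu| = 1
ca, cb = math.cos(math.radians(60)), math.cos(math.radians(45))
nu = (ca, cb, math.sqrt(1 - ca**2 - cb**2))

wave = dict(nu=nu, A=1.0, w=3.0, k=2.0, phi=0.4)   # illustrative values
r0 = (0.3, -1.2, 2.0)

# (i) u is uniform over the plane through r0 perpendicular to nu
e1 = (nu[1], -nu[0], 0.0)                          # e1 . nu = 0
same_plane = [tuple(a + s * b for a, b in zip(r0, e1)) for s in (-2.0, 0.5, 3.0)]

# (ii) the value at r0 is carried along nu with speed v = w / k
v, dt = wave["w"] / wave["k"], 0.7
r1 = tuple(a + v * dt * n for a, n in zip(r0, nu))

print([round(plane_wave(1.0, r, **wave), 12) for r in same_plane])
print(plane_wave(1.0, r0, **wave), plane_wave(1.0 + dt, r1, **wave))
```

Property (ii) relies on nu·nu = 1, which is exactly why v in (20.31a) must be a unit vector.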

Self-test 20.5
Two travelling waves x = A cos(ω t − kz + φ1), x = A cos(ω t + kz + φ2) are
superimposed. What type is the resulting wave? Where are its nodes?

20.9 Compound oscillations; beats


Consider the superposition of two sinusoidal oscillations, u1(t) and u2(t). We shall
limit the discussion to oscillations having the same amplitude and zero phase
angle, but different frequencies, so that
$$u(t) = u_1(t) + u_2(t), \qquad (20.32a)$$
with
$$u_1(t) = A\cos\omega_1 t, \qquad u_2(t) = A\cos\omega_2 t. \qquad (20.32b)$$

The function u(t) is not necessarily periodic. If $\omega_1/\omega_2$ is an irrational number, such as √2 (see Section 1.1), then u(t) is not periodic. If $\omega_1/\omega_2$ is a rational number, let
$$\frac{\omega_1}{\omega_2} = \frac{p}{q},$$
where p and q are integers and p/q is in its lowest terms. In that case u(t) is periodic, with period T given by
$$T = \frac{2\pi p}{\omega_1} \quad\left(\text{which also} = \frac{2\pi q}{\omega_2}\right). \qquad (20.33)$$
Evidently T may be very much larger than the periods 2π/ω1 and 2π /ω 2 of the
individual components because of the possibly large size of the factors p and q
(see Problem 20.22).
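Formula (20.33) can be evaluated exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` to reduce ω₁/ω₂ to lowest terms p/q; passing the frequencies as decimal strings is our own device to keep the ratio exact, and `compound_period` is a hypothetical name. For the values ω₁ = 10, ω₂ = 13.1 used in Fig. 20.17 it also compares T with the beat period 2π/Δω.

```python
import math
from fractions import Fraction


def compound_period(w1, w2):
    """Period (20.33) of A cos(w1 t) + A cos(w2 t) when w1/w2 is rational.

    w1 and w2 are passed as exact decimal strings (e.g. "13.1") so that the
    ratio p/q = w1/w2 is found exactly; Fraction reduces it to lowest terms.
    """
    ratio = Fraction(w1) / Fraction(w2)
    p, q = ratio.numerator, ratio.denominator
    return 2 * math.pi * p / float(Fraction(w1)), p, q


# Values used for Fig. 20.17: w1 = 10, w2 = 13.1
T, p, q = compound_period("10", "13.1")
beat_period = 2 * math.pi / (13.1 - 10)      # 2*pi / (delta omega)


def u(t):
    return math.cos(10 * t) + math.cos(13.1 * t)   # A = 1


print(f"p/q = {p}/{q}, T = {T:.4f}, beats per period = {T / beat_period:.4f}")
```

Here p/q = 100/131, so the period T = 20π is far longer than either component period, as the text anticipates.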
Express the difference between the component frequencies in (20.32) by Δω:
$$\omega_2 = \omega_1 + \Delta\omega, \qquad (20.34)$$
and use the trigonometric identity from Appendix B(d):
$$u(t) = A\cos\omega_1 t + A\cos\omega_2 t \equiv 2A\cos\tfrac{1}{2}(\omega_2 - \omega_1)t\,\cos\tfrac{1}{2}(\omega_1 + \omega_2)t = (2A\cos\tfrac{1}{2}\Delta\omega\, t)\cos\tfrac{1}{2}(\omega_1 + \omega_2)t. \qquad (20.35)$$

This expression consists of an oscillation $\cos\tfrac{1}{2}(\omega_1 + \omega_2)t$, with angular frequency equal to the average of the component frequencies, modulated by an amplitude function B(t) given by
$$B(t) = 2A\cos\tfrac{1}{2}\Delta\omega\, t. \qquad (20.36)$$
If Δω is fairly small compared with $\tfrac{1}{2}(\omega_1 + \omega_2)$, then B(t) is a slowly varying function compared with $\cos\tfrac{1}{2}(\omega_1 + \omega_2)t$, and (20.35) takes on the appearance of Fig. 20.17c.
Figure 20.17 shows the components u1(t) and u2(t), and the composite function u(t), together with the functions ±B(t), which form the profile of a stream of wave packets made up of faster oscillations. These wave packets are called beats. The beats arise from a kind of interference: where u1(t) and u2(t) are nearly in phase (see Fig. 20.17b) they reinforce each other so that u(t) is large; where they are opposed u(t) is small (see Fig. 20.17c). Despite appearances, the beats will not in general contain an exact number of complete cycles of u(t): in this case the period of u(t) is 31 beats long (see Problem 20.22b). The period and frequency of the beats (as distinct from those of the amplitude function B(t)) are defined to be equal to the period and frequency of the wave packets; therefore

Beat period
$$T_B = \tfrac{1}{2}\,(\text{period of } B(t)) = 2\pi/\Delta\omega.$$

Beat frequency
$$f_B = 2 \times (\text{frequency of } B(t)) = \Delta\omega/(2\pi). \qquad (20.37)$$

If the wave concerned is a sound wave, the tone that is detected by the ear corresponds to the pulse or beat frequency $f_B$ of (20.37) (twice the frequency of B(t)) rather than to that of the underlying frequencies f1 and f2. In cases where f1 and f2 are large compared with the frequency fB of the beats, the underlying rapid oscillation is sometimes referred to as a carrier wave, and the beats correspond to a signal.

Fig. 20.17 Here $A = 1$, $\omega_1 = 10$, $\omega_2 = 13.1$. (a) $u_1 = A\cos\omega_1 t$, with $u_2 = A\cos\omega_2 t$. (b) Phase reinforcement of $u_1$, $u_2$ near points A. (c) $u_1 + u_2$ and $\pm B(t) = \pm 2A\cos\tfrac{1}{2}\Delta\omega\, t$. [Figure: three panels plotted against t.]

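Fig. 20.18 Two loosely coupled oscillating masses display beats. [Figure: fixed points A and B with particles P and Q between them; spring S1 joins A to P, S2 joins P to Q, and S3 joins Q to B; x and y are the displacements of P and Q from their equilibrium positions.]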

For a mechanical example of the occurrence of beats, see Fig. 20.18. Unit particles P and Q are connected to fixed points A, B through springs S1, S2, S3 of natural lengths l, with $l < \tfrac{1}{3}AB$; the stiffness of each of the springs S1 and S3 is K, and that of the connecting spring S2 is k. The displacements of P, Q from equilibrium are x(t), y(t) respectively. The particles oscillate along the line AB, and their equations of motion are
$$\ddot{x} + (K + k)x = ky, \qquad \ddot{y} + (K + k)y = kx. \qquad (20.38)$$

A related mechanical example was also considered in Section 13.8 in the chapter
on eigenvalues.
It can be checked by substitution that a particular pair of solutions is given by
x(t) = cos(t√K) + cos[t√(K + 2k)], (20.39a)

y(t) = cos(t√K) − cos[t√(K + 2k)]. (20.39b)

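As in (20.36) we obtain the corresponding amplitude functions, say Bx and By:
$$B_x(t) = 2\cos\tfrac{1}{2}[\sqrt{K + 2k} - \sqrt{K}]t, \qquad (20.40a)$$
and
$$B_y(t) = 2\sin\tfrac{1}{2}[\sqrt{K + 2k} - \sqrt{K}]t = 2\cos\{\tfrac{1}{2}[\sqrt{K + 2k} - \sqrt{K}]t - \tfrac{1}{2}\pi\}, \qquad (20.40b)$$
which define the beats. (Here By multiplies the carrier $\sin\tfrac{1}{2}[\sqrt{K + 2k} + \sqrt{K}]t$, since $y(t) = 2\sin\tfrac{1}{2}[\sqrt{K + 2k} - \sqrt{K}]t\,\sin\tfrac{1}{2}[\sqrt{K + 2k} + \sqrt{K}]t$.)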


The particles P and Q therefore behave similarly, only their phases being different, so we shall only look in detail at the beats Bx(t) corresponding to the motion x(t) of P (eqn (20.40a)). To obtain an approximation, we shall assume that k is small compared with K. Put
$$\sqrt{K} = \omega_1, \qquad \sqrt{K + 2k} = \omega_2.$$
Then
$$\Delta\omega = \omega_2 - \omega_1 = \sqrt{K + 2k} - \sqrt{K} = \sqrt{K}\{\sqrt{1 + (2k/K)} - 1\} \approx k/\sqrt{K},$$
which is small compared with $\omega_1 = \sqrt{K}$ (using the first two terms of the binomial theorem (Section 1.18 or 5.4) to approximate to $\sqrt{1 + (2k/K)}$). Therefore Δω ≈ k/√K in (20.36). The displacement x(t) of particle P has beat period $T_B$ given by (20.37):
$$T_B = \frac{2\pi}{\Delta\omega} \approx \frac{2\pi\sqrt{K}}{k}.$$
Beats with the same period also occur in y(t), but are out of phase with those
of x(t) by half a beat period. A fixed stock of free mechanical energy is handed
back and forth between P and Q: when one is vibrating vigorously, the other
has only a small amplitude, and P and Q alternate in this respect. The same
phenomenon occurs when two pendulum bobs are coupled by a weak spring.
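A short check of the coupled-mass results: the sketch substitutes (20.39a,b) into (20.38) using their exact second derivatives, then compares the exact beat period 2π/Δω with the approximation 2π√K/k. The values K = 1, k = 0.05 are illustrative assumptions chosen so that k is small compared with K; the helper names are ours.

```python
import math

K, k = 1.0, 0.05                     # illustrative values, k small compared with K
w1, w2 = math.sqrt(K), math.sqrt(K + 2 * k)


def x(t):
    return math.cos(w1 * t) + math.cos(w2 * t)                    # (20.39a)


def y(t):
    return math.cos(w1 * t) - math.cos(w2 * t)                    # (20.39b)


def x_dd(t):
    return -w1**2 * math.cos(w1 * t) - w2**2 * math.cos(w2 * t)   # exact x''


def y_dd(t):
    return -w1**2 * math.cos(w1 * t) + w2**2 * math.cos(w2 * t)   # exact y''


# Largest residual of the equations of motion (20.38) on a grid of times
times = [0.1 * i for i in range(2000)]
residual = max(max(abs(x_dd(t) + (K + k) * x(t) - k * y(t)),
                   abs(y_dd(t) + (K + k) * y(t) - k * x(t))) for t in times)

T_exact = 2 * math.pi / (w2 - w1)            # beat period 2*pi / (delta omega)
T_approx = 2 * math.pi * math.sqrt(K) / k    # small-coupling approximation

print(f"max residual {residual:.1e}; T_B exact {T_exact:.2f}, approx {T_approx:.2f}")
```

For these values the approximation is within about 2.5% of the exact beat period; the gap shrinks as k/K decreases.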

20.10 Travelling waves; beats


Consider the superposition of two sinusoidal travelling waves, u1(t, z) and u2(t, z),
having the same amplitude and zero phase angle, but different frequencies, so that
u(t, z) = u1(t, z) + u2(t, z), (20.41a)

where
u1(t, z) = A cos(ω 1t − k1z), u2(t, z) = A cos(ω 2t − k2 z). (20.41b)

Fig. 20.19 u(t, z) against z at a particular time. [Figure: u(t0, z) and the envelope ±B(t0, z) plotted against z, with the beat wavelength marked and the beats moving with velocity v.]

By the identity from Appendix B(d), u(t, z) may be written as
$$u(t, z) = 2A\cos[\tfrac{1}{2}(\omega_2 - \omega_1)t - \tfrac{1}{2}(k_2 - k_1)z]\cos[\tfrac{1}{2}(\omega_2 + \omega_1)t - \tfrac{1}{2}(k_2 + k_1)z]. \qquad (20.42)$$
In this section we visualize graphs of u plotted against z as in Fig. 20.19, for different values of time t.
Suppose firstly that the phase velocity v is constant for all sinusoidal waves. Put
$$\omega_2 = \omega_1 + \Delta\omega, \qquad k_2 = k_1 + \Delta k. \qquad (20.43)$$
The wave numbers and angular frequencies are connected by (20.28):
$$vk_1 = \omega_1, \qquad vk_2 = \omega_2, \qquad v\Delta k = \Delta\omega.$$
Equation (20.42) becomes
$$u(t, z) = 2A\cos\tfrac{1}{2}\Delta\omega[t - (z/v)]\cos\tfrac{1}{2}(\omega_1 + \omega_2)[t - (z/v)]. \qquad (20.44)$$

This wave has the same form as the oscillation discussed in the previous section (eqn (20.35)), except that we have t − (z/v) in place of t. Plotted against z, as in Fig. 20.19, the wave travels unchanged at speed v. There is a carrier wave with wavelength $\lambda_C = 2\pi/[\tfrac{1}{2}(k_1 + k_2)]$, multiplied by a beat function B(t, z):
$$B(t, z) = 2A\cos(\tfrac{1}{2}t\Delta\omega - \tfrac{1}{2}z\Delta k).$$
In terms of k the beat wavelength $\lambda_B$ is
$$\lambda_B = \tfrac{1}{2}(\text{wavelength of } B(t, z)) = \tfrac{1}{2}[2\pi/(\tfrac{1}{2}\Delta k)] = 2\pi/\Delta k,$$
and the beat frequency $f_B$ is
$$f_B = 2(\text{frequency of } B(t, z)) = 2[(\tfrac{1}{2}\Delta\omega)/(2\pi)] = \Delta\omega/(2\pi).$$
The beats travel with the phase velocity v:
$$\text{beat velocity} = \tfrac{1}{2}\Delta\omega/(\tfrac{1}{2}\Delta k) = v\Delta k/\Delta k = v.$$
This is to be expected, since the components u1 and u2 have equal velocity v, and therefore remain in a constant phase relationship, reinforcing and cancelling each other over segments that remain in step as the waves travel. The theory of beats in travelling waves is related to amplitude modulation in radio transmission.

20.11 Dispersion; group velocity



There are media and special situations where the velocity of a travelling sinusoidal
wave varies with its frequency (or, equivalently, with its wavelength). Such waves
are called dispersive waves: light waves are dispersive, leading to their spectral
decomposition upon entering a refractive medium.
In a dispersive medium, one component wave will overtake the other; therefore
u1 and u2 will not maintain a constant phase relationship as the wave travels, and
the velocity associated with the beats is affected. Suppose that two dispersive
waves, u1(t, z) and u2(t, z), have different angular frequencies ω 1 and ω 2, and
phase velocities v1 and v2 ≠ v1 (whose values will depend on ω 1 and ω 2). Refer
back to (20.42): if $\omega_2$ and $\omega_1$ are fairly close then distinct beats occur (in both time t and space z). Their profile is determined by the curves ±B(t, z), where
$$B(t, z) = 2A\cos(\tfrac{1}{2}t\Delta\omega - \tfrac{1}{2}z\Delta k), \qquad (20.45)$$
in which $\Delta\omega = \omega_2 - \omega_1$, $\Delta k = k_2 - k_1$. This beat profile represents a wave travelling along the z axis with a velocity $v_g$, called the group velocity, where
$$v_g = \frac{\tfrac{1}{2}\Delta\omega}{\tfrac{1}{2}\Delta k} = \frac{\omega_2 - \omega_1}{k_2 - k_1}. \qquad (20.46)$$
In general vg will differ from the phase velocities v1 and v2 of the constituent waves
u1 and u2, given by v1 = ω 1 /k1, v2 = ω 2 /k2. In so-called anomalous cases, vg can even
exceed them; a signal may travel faster than the phase velocity of the constituent
waves.

If the wave number k for a sinusoidal wave in a medium (whether dispersive


or not) is prescribed, then the value of the angular frequency ω is also settled by
ω = kv. We can therefore regard both ω and v as functions of k only, and write
ω (k) = kv(k). (20.47)

A graph of v(k) against k would contain all we need to know about the behaviour
of v to enable wave interactions of any degree of complexity to be computed. (We
could instead work with ω rather than k as the independent variable, or λ, or
f (see Problem 20.18), but in any case only one parameter is needed to specify
the variation of v.)
We shall relate this observation to the group velocity problem just discussed,
for cases where the wave number ∆k of the beats is small compared with the wave
number k of the carrier wave. In this case the beats will be very distinct. From
(20.45), (20.46), and (20.47)
$$\frac{\Delta\omega}{\Delta k} = v_g = \frac{\omega_2 - \omega_1}{k_2 - k_1} = \frac{k_2 v(k_2) - k_1 v(k_1)}{k_2 - k_1}. \qquad (20.48)$$
The form of this equation suggests that we can approximate to $v_g$ by an expression involving the derivative of kv(k). Suppose that
$$\Delta k = k_2 - k_1 \to 0, \quad\text{or}\quad k_2 \to k_1.$$
For simplicity, write k in place of $k_1$; then from (20.47)
$$\frac{\Delta\omega}{\Delta k} \to \frac{d(kv(k))}{dk} = v(k) + k\frac{dv(k)}{dk}.$$
Therefore, for any value of k, and Δk small enough, we have

Group velocity approximation
$$v_g \approx v + k\frac{dv}{dk},$$
or
$$\text{group velocity} \approx \text{phase velocity} + k\frac{dv}{dk}. \qquad (20.49)$$

We have assumed ∆k to be ‘small’, but in a physical context one would like to


know when ∆k is small enough for the approximation to be useful. It is sufficient
that the dimensionless quantity ∆k/k should be small (see Problems 20.25 and
20.26).
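The approximation (20.49) can be compared with the beat velocity (20.46) for a concrete dispersion relation. The sketch assumes ω = √(gk), the usual relation for gravity waves on deep water, chosen only for illustration; for it the group velocity is exactly half the phase velocity. The derivative dv/dk is estimated by a central difference, and the function names are our own.

```python
import math

g = 9.81   # m/s^2


def omega(k):
    """Assumed dispersion relation w(k) = sqrt(g k) (deep-water gravity waves)."""
    return math.sqrt(g * k)


def phase_velocity(k):
    """v(k) = w(k) / k, as in (20.47)."""
    return omega(k) / k


def group_velocity_beats(k1, k2):
    """Beat (group) velocity (w2 - w1) / (k2 - k1) of (20.46) for two wave numbers."""
    return (omega(k2) - omega(k1)) / (k2 - k1)


def group_velocity_approx(k, h=1e-6):
    """Approximation (20.49): v + k dv/dk, with dv/dk from a central difference."""
    dv_dk = (phase_velocity(k + h) - phase_velocity(k - h)) / (2 * h)
    return phase_velocity(k) + k * dv_dk


k = 1.0
print(f"phase velocity {phase_velocity(k):.5f}, (20.49) gives {group_velocity_approx(k):.5f}")
for dk in (0.01, 0.1, 0.5):
    print(f"dk/k = {dk}: beat velocity {group_velocity_beats(k, k + dk):.5f}")
```

The beat velocity approaches the value from (20.49) as Δk/k shrinks, which illustrates the condition stated above.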

20.12 The Doppler effect


Suppose that a steady, plane sound wave is being emitted from a large plane membrane vibrating with frequency f cycles per second, which is moving from left to right with velocity u < v, where v is the sound velocity relative to the medium (see Fig. 20.20). E and E′ show the positions of the membrane at times $t_0$ and $t_1 > t_0$. There is a fixed observation point at P. The wave front that is at P at time $t_0$ travels with phase velocity v to the point Q at $t_1$. Speaking broadly, if the emitter is chasing the waves, then a given number of waves occupy a shorter length ahead of the emitter than if it were stationary. Therefore the motion of E reduces the wavelength λ. The frequency f of the emitter and the phase velocity v (relative to the medium, which we take to be stationary) are fixed, so the frequency of arrival of waves at any fixed point P must be greater than f.

Fig. 20.20 A plane vibrating membrane at E is moving forward with velocity u < v. [Figure: at t = t0 the emitter is at E and the wave front at the observer P; at t1 = t0 + Δt the emitter has moved a distance uΔt to E′ and the wave front from P a distance vΔt to Q.]
To examine the effect quantitatively, see Fig. 20.20. In the following, u may be positive, as shown, or negative (corresponding to E moving oppositely to the wave direction). Put
$$EP = L, \qquad t_1 - t_0 = \Delta t.$$
Then at time $t_1$, E has moved to E′ and the wave front to Q, so that
$$E'Q = L - u\Delta t + v\Delta t = L + (v - u)\Delta t.$$
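Let the wavelength be λ. Then EP contains $L/\lambda$ wavelengths, and E′Q contains
$$\frac{E'Q}{\lambda} = \frac{L + (v - u)\Delta t}{\lambda} \text{ wavelengths.}$$
In the interval $\Delta t$, $f\Delta t$ new waves have been generated, so that
$$\frac{L + (v - u)\Delta t}{\lambda} - \frac{L}{\lambda} = f\Delta t, \quad\text{or}\quad \frac{(v - u)\Delta t}{\lambda} = f\Delta t.$$
Therefore
$$\lambda = \frac{v - u}{f}. \qquad (20.50)$$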
The fixed receiver at P records, say, $f_P$ waves per second, and these are travelling at the normal phase velocity v, so from (20.28) and (20.50),
$$f_P = \frac{v}{\lambda} = \frac{vf}{v - u} = \frac{f}{1 - (u/v)}. \qquad (20.51)$$
If $u > 0$ (E moving in the direction of v) then $f_P > f$. If $u < 0$ (E moving oppositely to the direction of v) then $f_P < f$.
The effect is observable if a vehicle with a siren speeds past an observer at
a point P: as it passes there is a sudden lowering of the pitch. If the speed of the
vehicle is u, the frequency drop $\Delta f_P$ is given by
$$\Delta f_P = \frac{f}{1 - (u/v)} - \frac{f}{1 + (u/v)}, \qquad (20.52)$$
which is approximately equal to $2uf/v$ if $u/v$ is small. The so-called ‘red shift’ in astronomy, by which the velocity of a receding galaxy can be observed from the change towards longer wavelengths (lower frequencies) in its spectrum, is explained on the same lines.
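Formulas (20.51) and (20.52) translate directly into a few lines of code. The sketch below compares the exact pitch drop with the small-$u/v$ approximation $2uf/v$. The numerical values are illustrative assumptions, not taken from the text: a 500 Hz siren, $u$ = 20 m/s, and $v$ = 340 m/s for sound in air.

```python
def received_frequency(f, u, v):
    """Frequency recorded at a fixed point, eqn (20.51).

    f: emitted frequency (Hz); u: emitter velocity in the wave direction (m/s);
    v: wave speed relative to the still medium (m/s). Assumes |u| < v.
    """
    if abs(u) >= v:
        raise ValueError("formula (20.51) assumes |u| < v")
    return f / (1 - u / v)


def siren_drop(f, u, v):
    """Exact pitch drop (20.52) as a source of speed u passes a fixed observer."""
    return received_frequency(f, u, v) - received_frequency(f, -u, v)


f, u, v = 500.0, 20.0, 340.0  # illustrative values (assumed)
exact = siren_drop(f, u, v)
approx = 2 * u * f / v
print(f"exact drop {exact:.2f} Hz, approximation 2uf/v {approx:.2f} Hz")
```

With these values $u/v = 1/17$. The exact drop is about 59.03 Hz and the approximation gives about 58.82 Hz.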
PROBLEMS
20.1 Express the following in standard amplitude–phase form $C\cos(\omega t + \phi)$, with $C > 0$ and $-\pi < \phi \le \pi$.
(a) $3\cos(3t + \tfrac32\pi)$; (b) $3\cos(\omega t - 3\pi)$; (c) $2\sin 3t$; (d) $3\sin(2t + \tfrac12\pi)$; (e) $-3\cos(2t - \tfrac12\pi)$; (f) $-4\cos(2t + \tfrac14\pi)$; (g) $-\sin t$; (h) $3\cos 2t + 4\sin 2t$; (i) $\cos 2t + \cos(2t - \pi)$; (j) $\cos(2t - \tfrac32\pi) - \cos(2t + \tfrac32\pi)$.

20.2 State whether x leads or lags y in the following cases, and by how much.
(a) $x = 4\cos 3t$, $y = 3\cos(3t - \tfrac12\pi)$.
(b) $x = 2\cos(2t + \tfrac14\pi)$, $y = 3\cos(2t + \tfrac92\pi)$.
(c) $x = -3\cos 2t$, $y = 4\cos 2t$.
(d) $x = \cos 3t$, $y = \sin 3t$.
(e) $x = 2\cos 3t$, $y = \cos(3t - \tfrac94\pi)$.

20.3 Obtain the free oscillations of the following in the form $C\cos(\omega t + \phi)$. State (i) the natural frequency if the damping coefficient is put to zero; (ii) the frequency that actually occurs in the cosine term of the solution; (iii) the number of complete cycles needed for the amplitude to drop to 0.1 of its value at t = 0.
(a) $x'' + 20x' + (2.5\times10^5)x = 0$. (b) $x'' + 0.5x' + 4x = 0$. (c) $x'' + 0.15x' + 3x = 0$. (d) $x'' + x' + 20x = 0$.

20.4 Express $A\cos\omega t + B\sin(\omega t + \tfrac14\pi)$ in the standard form $C\cos(\omega t + \phi)$ when (a) $A = 3^{1/2}$, $B = 1$; (b) $A = 3^{1/2}$, $B = -1$; (c) $A = -3^{1/2}$, $B = 1$; (d) $A = -3^{1/2}$, $B = -1$.

20.5 (a) Show that the maxima and minima of $x(t) = Ce^{-kt}\cos(\omega t + \phi)$ occur at times $T_N$ given by
$$\omega T_N + \phi = -\arctan\left(\frac{k}{\omega}\right) + N\pi,$$
where N is any integer.
(b) Show that the values of x(t) at these points are given by
$$x(T_N) = \frac{(-1)^N\,\omega C e^{-kT_N}}{(\omega^2 + k^2)^{1/2}}.$$

20.6 Consider an expression of the form
$$x(t) = e^{-t/T}g(t),$$
where T is a constant, and g(t) itself does not have any term in it like $e^{\pm kt}$ (e.g. g(t) might be a constant, or cos t, or even $t^3$, but it must not be, for example, $e^{-2t}\cos t$). Then T is called the time constant of x(t).
(a) State the time constant for $Q_c$ in Example 20.4.
(b) Describe how T provides a measure of the rate of exponential decay of x(t), rather like the half-life period of a radioactive substance.

20.7 (Heavy damping). Find the general solution of the equation
$$x'' + 2kx' + \omega^2x = 0$$
when $k^2 > \omega^2$. Describe the general character of the solutions, contrasting them with the case when $k^2 < \omega^2$.

20.8 Solve the equation
$$x'' + 10x' + 24x = 0$$
subject to the initial conditions $x(0) = -3$, $x'(0) = 20$. Show that the solution curve crosses the t axis only once, at the point $t = \ln 2$.

20.9 (‘Critical damping’). Find the general solution of the equation
$$x'' + 2kx' + \omega^2x = 0$$
for the case when $k^2 = \omega^2$.

20.10 The following equation could represent the damped vertical motion of a mass supported by a spring and subjected to an external periodic force:
$$x'' + x' + 36x = 10\cos\omega t, \quad\text{for } t > 0,$$
the system being in equilibrium under no force for $t < 0$.
(a) Find the period of the free (damped) oscillations. Show that any free oscillations stimulated at startup are reduced by a factor of about 14 after five periods of oscillation.
(b) Obtain expressions in terms of ω for the amplitude and phase of the forced oscillation.
(c) Find the condition for resonance.
(d) Plot curves of amplitude and phase against ω for a range $4 \le \omega \le 8$.

20.11 A particle rolls to and fro under gravity at the bottom of a parabolic cylinder having vertical cross-section $y = ax^2$. There is negligible friction. The equation of motion in terms of horizontal displacement x is then
$$x'' + 2ax(g + 2ax'^2)/(1 + 4a^2x^2) = 0.$$
Show that for small oscillations the period is $2^{1/2}\pi/(ag)^{1/2}$.

20.12 A particle is balanced at the topmost point, x = y = 0, of an inverted parabolic cylinder whose shape is described by $y = -ax^2$, y being measured vertically upward. Its equation of motion is
$$x'' + 2ax(2ax'^2 - g)/(1 + 4a^2x^2) = 0.$$
By linearizing the equation show that, if the particle is slightly disturbed, it starts to move away from its initial position (0, 0) at an increasing rate. (This condition is called unstable equilibrium.)

20.13 The equation for the displacement x(t) of an electrical circuit fixed on springs and influenced by a current-carrying conductor is
$$x'' + 4[x - 2/(3 - x)] = 0.$$
(a) Show that there are two positions x at which the circuit could theoretically be in equilibrium. (‘Equilibrium’ means that x(t) = constant (see eqn (20.24)) is a solution of the equation.)
(b) Call the equilibrium positions x = a and x = b. To investigate the state of affairs near x = a, put
$$x = a + u$$
into the equation, so as to obtain an equation for u(t), which is the distance from a. Then do the same thing near x = b by putting
$$x = b + v,$$
where v is distance from b. Tidy the equations as far as possible.
(c) Suppose that u in one case, and v in the other, are small, and linearize the equations in each case.
(d) Show that in one case small oscillations take place, but that in the other the displacement tends to increase. (One is called a stable equilibrium state, the other unstable.)

20.14 A particle moves in a plane under a central attractive force $\gamma/r^\alpha$ per unit mass, where r and θ are its plane polar coordinates relative to an origin in the attracting body. Its equation of motion can be expressed in the form
$$\frac{d^2u}{d\theta^2} + u - \frac{\gamma}{H^2}u^{\alpha-2} = 0,$$
where $u = r^{-1}$ and H is its (constant) angular momentum per unit mass. Show that the equation has a constant solution $u = u_0$, which is equivalent to a circular orbit. Does it stay close to this orbit if its position u is slightly changed from $u_0$, while H keeps its original value? (Hint: put $u = u_0 + x$ and linearize the equation for small x. You may assume that for small values of $x/u_0$ (see (5.4d))
$$(u_0 + x)^{\alpha-2} \approx u_0^{\alpha-2}\left(1 + (\alpha - 2)\frac{x}{u_0}\right).)$$

20.15 Given the expression for the forced amplitude A in eqn (20.16), deduce the expressions for the resonant frequency and resonant amplitude.

20.16 (a) A plane sound wave has period $\tfrac{1}{250}$ s and wavelength 1.2 m. Obtain the speed of sound in the medium.
(b) There is a broadcasting station with tuning frequency close to 100 MHz. Obtain the corresponding wavelength. (The speed of electromagnetic waves can be taken to be $3\times10^8$ m s$^{-1}$ in round figures.)

20.17 Show that the stationary plane wave
$$u(t, z) = A\cos[(2\pi t/T) + \phi]\cos[(2\pi z/\lambda) + \alpha]$$
is equivalent to the superposition of two plane waves travelling in opposite directions.

20.18 $u = \cos(\omega t - kz + \phi)$ represents a travelling plane wave. Express u in terms of (i) period T and wavelength λ; (ii) frequency f and wavelength λ; (iii) circular frequency ω and phase velocity v.

20.19 Show that the travelling wave $u = \cos(\omega t - kz + \phi)$ is equivalent to the superposition of two stationary waves.

20.20 $u(t, z) = A\cos(4500t - 3z)$ represents a travelling wave. Obtain the phase velocity v, the period T, the frequency f, and the wavelength λ.

20.21 Obtain a general expression for a plane harmonic wave u(t, x, y, z) of angular frequency ω, travelling in a medium of wave velocity v, in the direction making equal (acute) angles with the axes Ox, Oy, Oz.

20.22 (a) Prove that $u(t, z) = A\cos\omega_1t + A\cos\omega_2t$ is periodic if, and only if, $\omega_1/\omega_2$ is a rational number (see Section 1.1), and confirm eqn (20.33) for the period.
(b) Obtain the period T of the oscillation $\cos 10t + \cos 13.1t$ (shown in Fig. 20.17). Compare T with the period $T_B$ of the beats, and with the periods $T_1$ and $T_2$ of the constituent oscillations.

20.23 (a) Obtain the sum of the two oscillations given by $A_1\cos(\omega t + \phi_1)$ and $A_2\cos(\omega t + \phi_2)$, in the form $A\cos(\omega t + \phi)$.
(b) Obtain the sum of two travelling waves $A_1\cos(\omega t - kz + \phi_1)$ and $A_2\cos(\omega t - kz + \phi_2)$ in the form of another travelling wave.

20.24 (a) A beam of light falls perpendicularly upon a surface, and is reflected without change of amplitude or phase. What is the nature of the combined wave?
(b) Consider separately the effects of a change in phase and a change in amplitude upon reflection.

20.25 Two superposed plane waves, $u_1$ and $u_2$, travel in the z direction through a dispersive medium in which the phase velocity v is regarded as a function of wavelength λ. They have the same amplitude A and phase angle zero, but different wave numbers (and consequently different angular frequencies ω). Show that $v_g = v - \lambda\,dv/d\lambda$, where $v_g$ is the group velocity. (Hint: start with $v_g = \Delta\omega/\Delta k$.)

20.26 In the notation of Section 20.10, prove the identities
$$\left(1 + \frac{\Delta f}{f}\right)\left(1 + \frac{\Delta\lambda}{\lambda}\right) = 1 + \frac{\Delta v}{v}$$
and
$$\left(1 + \frac{\Delta k}{k}\right)\left(1 + \frac{\Delta v}{v}\right) = 1 + \frac{\Delta\omega}{\omega}.$$
(These relations are exact, but show that when small values are being considered the natural variables to use are $\Delta f/f$ etc.)

20.27 A fire truck speeds along a highway at 100 km h$^{-1}$, sounding its siren at a frequency of 350 cycles per second. Obtain the drop in pitch noticed by an observer standing on the sidewalk as it goes past.
21 Steady forced oscillations: phasors, impedance, transfer functions

CONTENTS
21.1 Phasors
21.2 Algebra of phasors
21.3 Phasor diagrams
21.4 Phasors and complex impedance
21.5 Transfer functions in the frequency domain
21.6 Phasors and waves; complex amplitude
Problems

We shall consider circuits driven by an applied harmonically alternating voltage, with resistances placed so that any free oscillations set up by switching on the circuit die away, leaving only a periodic forced oscillation, as described in Section 20.5. It is only this remaining, steadily oscillating state that is discussed here.

21.1 Phasors
Let x(t) represent any variable in the circuit, such as the current in a particular
branch. If the frequency of the applied voltage is ω /2π, then all these possible
variables x(t) share the same frequency ω /2π once the transients have died away,
though in general the phases and amplitudes of different variables are different.
Here we adopt the standardized amplitude/phase form of (20.2), assuming that
$$x(t) = c\cos(\omega t + \phi), \quad\text{with } c > 0 \text{ and } -\pi < \phi \le \pi. \qquad (21.1)$$

We can write x(t) in a complex form instead:
$$x(t) = \mathrm{Re}\left(c\,e^{i(\omega t + \phi)}\right) = \mathrm{Re}\left(c\,e^{i\phi}e^{i\omega t}\right).$$
The complex coefficient $c\,e^{i\phi}$ that multiplies $e^{i\omega t}$ is called the phasor corresponding to x(t), and it is independent of time t. Every variable x(t) will have its own phasor, but the factor $e^{i\omega t}$ is the same for each one. In a circuit, φ and c usually depend on ω, so the values of the corresponding phasors will depend on ω.
Corresponding to each variable denoted by a lowercase letter, we use a bold capital letter to denote the phasor. This style is traditional, and emphasizes that phasors, being complex numbers, can be treated as vectors in the Argand diagram. Typically, the variables are the voltages between any two nodes of a circuit, and the currents in any branch.

Phasor of a harmonic oscillation
The phasor of $x(t) = c\cos(\omega t + \phi)$ is the complex number $X = c\,e^{i\phi}$. (21.2)

In engineering applications, phasors $c\,e^{i\phi}$ are sometimes written in the form $c\angle\phi$, and φ may be expressed in degrees. Thus, if $X = 3e^{-\frac14\pi i}$, we can write
$$X = 3\angle{-\tfrac14\pi} = 3\angle{-45^\circ}.$$
The two numbers displayed are the polar coordinates of the point which represents the phasor on an Argand diagram, in this case the point $(3/\sqrt2, -3/\sqrt2)$ corresponding to
$$X = 3\cos(-45^\circ) + i\,3\sin(-45^\circ) = \frac{3}{\sqrt2} - i\frac{3}{\sqrt2}.$$
It is often convenient to express a phasor in the form $a + ib$ rather than in the polar form $c\,e^{i\phi}$.
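Python's standard `cmath` module converts between the polar form $c\angle\phi$ and the rectangular form $a + ib$. The two helper names below are introduced here for illustration. Angles are converted to degrees to match the engineering notation.

```python
import cmath
import math


def phasor_from_polar_degrees(c, phi_deg):
    """Return the phasor c∠phi (phi in degrees) as a complex number a + ib."""
    return cmath.rect(c, math.radians(phi_deg))


def polar_degrees(X):
    """Return (c, phi in degrees) for a phasor X; cmath gives phi in [-180, 180]."""
    c, phi = cmath.polar(X)
    return c, math.degrees(phi)


X = phasor_from_polar_degrees(3, -45)
print(X)                 # approximately (2.1213-2.1213j), i.e. 3/√2 - i 3/√2
print(polar_degrees(X))  # approximately (3.0, -45.0)
```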

Example 21.1 Find the phasor of $x(t) = -3\cos(2t + \tfrac12\pi)$.

In standard form (21.1), $x(t) = 3\cos(2t - \tfrac12\pi)$. The phasor X is therefore given by
$$X = 3e^{-\frac12\pi i} \quad\text{or}\quad 3\angle{-90^\circ}.$$

Example 21.2 Given that the prevailing angular frequency is $\omega = 10^4$, find the functions x(t) having the following phasors. (a) $X = 1/(-1 + i)$, (b) $X = (1 - \sqrt3 i)/(-1 + i)$.

(a) Put X into polar form:
$$X = \frac{1}{-1 + i} = \frac{-1 - i}{(-1)^2 + 1^2} = -\tfrac12 - \tfrac12 i = \frac{1}{\sqrt2}e^{-\frac34\pi i}.$$
Therefore
$$x(t) = \frac{1}{\sqrt2}\cos(10^4t - \tfrac34\pi).$$
(b) $1 - \sqrt3 i = 2e^{-\frac13\pi i}$ (as can be seen by putting the point $1 - \sqrt3 i$ on an Argand diagram). Therefore, using (a),
$$X = \left(2e^{-\frac13\pi i}\right)\left(\frac{1}{\sqrt2}e^{-\frac34\pi i}\right) = \sqrt2\,e^{-\frac{13}{12}\pi i}.$$
The phase $(-\tfrac{13}{12}\pi)$ is out of the standard range (21.1), so add 2π to it, leaving X unchanged. We obtain, in the standard form, $X = \sqrt2\,e^{\frac{11}{12}\pi i}$, so
$$x(t) = \sqrt2\cos(10^4t + \tfrac{11}{12}\pi).$$

Example 21.3 Let $x(t) = \sqrt3\cos\omega t - \sin\omega t$. Find the corresponding phasor.

Take the terms separately:
$$\sqrt3\cos\omega t = \mathrm{Re}(\sqrt3\,e^{i\omega t}); \qquad \sin\omega t = \cos(\omega t - \tfrac12\pi) = \mathrm{Re}(e^{-\frac12\pi i}e^{i\omega t}).$$
Combining them, we obtain
$$x(t) = \mathrm{Re}\left[(\sqrt3 - e^{-\frac12\pi i})e^{i\omega t}\right].$$
Therefore
$$X = \sqrt3 - e^{-\frac12\pi i} = \sqrt3 - (-i) = 2e^{\frac16\pi i} \quad\text{or}\quad 2\angle30^\circ.$$
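The steps of Examples 21.2 and 21.3 can be mechanized. The procedure is: build each phasor from (21.2), add the phasors by (21.3), and then bring the phase into the standard range $-\pi < \phi \le \pi$ of (21.1). The helpers `phasor` and `standard_form` are hypothetical names used only in this sketch. The function `cmath.polar` already returns a phase in $[-\pi, \pi]$, so the only adjustment is to map $-\pi$ to $+\pi$.

```python
import cmath
import math


def phasor(c, phi):
    """Phasor X = c e^{i phi} of x(t) = c cos(omega t + phi), eqn (21.2)."""
    return c * cmath.exp(1j * phi)


def standard_form(X):
    """Return (c, phi) with c > 0 and -pi < phi <= pi such that X = c e^{i phi}.

    Assumes X != 0. cmath.polar gives phi in [-pi, pi]; move -pi to +pi.
    """
    c, phi = cmath.polar(X)
    if phi <= -math.pi:
        phi += 2 * math.pi
    return c, phi


# Example 21.3: x(t) = sqrt(3) cos(wt) - sin(wt), with sin(wt) = cos(wt - pi/2)
X = phasor(math.sqrt(3), 0.0) - phasor(1.0, -math.pi / 2)
c, phi = standard_form(X)
print(c, math.degrees(phi))  # about 2.0 and 30.0

# Example 21.2(b): X = (1 - sqrt(3) i) / (-1 + i)
c2, phi2 = standard_form((1 - math.sqrt(3) * 1j) / (-1 + 1j))
print(c2, phi2 / math.pi)    # about 1.4142 (sqrt 2) and 0.9167 (11/12)
```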

Self-test 21.1
Find the phasor of $x(t) = -2\cos(3t - \tfrac12\pi)$.

21.2 Algebra of phasors


As seen in Example 21.3, when oscillations associated with the same value of ω combine by addition, so do their phasors. Suppose, for instance, that u(t) and v(t) have the same angular frequency ω, and that their phasors are U and V. Then $u(t) = \mathrm{Re}(Ue^{i\omega t})$ and $v(t) = \mathrm{Re}(Ve^{i\omega t})$, so
$$u(t) + v(t) = \mathrm{Re}[(U + V)e^{i\omega t}],$$
whose phasor is U + V. The addition holds similarly if there are more terms present.

Addition principle for phasors
If u(t), v(t), … have a common frequency, and $z(t) = u(t) + v(t) + \cdots$, then $Z = U + V + \cdots$, where Z, U, V, … are the corresponding phasors. (21.3)

Differentiation and integration give important results. If $x(t) = \mathrm{Re}(Xe^{i\omega t})$, where X is the phasor, then $dx/dt = \mathrm{Re}(i\omega Xe^{i\omega t})$, so that the phasor of $dx/dt$ is $i\omega X$. Differentiate again, and a further factor $i\omega$ is introduced, so that the phasor of $d^2x/dt^2$ is $(i\omega)^2X$, and so on. For $\int x(t)\,dt$, we find in the same way that the phasor is $X/(i\omega)$. The additive arbitrary constant in $\int e^{i\omega t}dt$ has been put to zero because, in normal use, all the variables that occur oscillate sinusoidally.

Phasors of derivatives and integrals

Variable:   x              dx/dt     d²x/dt²     ∫ x dt
Phasor:     X = c e^{iφ}   iωX       −ω²X        X/(iω)
(21.4)
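The rules in (21.4) can be confirmed numerically for any particular oscillation. The sketch below picks arbitrary illustrative values of ω, c and φ (assumptions, not from the text). It compares finite-difference derivatives of $x(t)$, and the antiderivative $(c/\omega)\sin(\omega t + \phi)$ with zero constant, against the signals recovered from the phasors $i\omega X$, $-\omega^2X$ and $X/(i\omega)$.

```python
import cmath
import math

omega = 3.0          # illustrative angular frequency (assumed)
c, phi = 2.0, 0.4    # illustrative amplitude and phase (assumed)
X = c * cmath.exp(1j * phi)   # phasor of x(t) = c cos(omega t + phi), eqn (21.2)


def x(t):
    return c * math.cos(omega * t + phi)


def from_phasor(P, t):
    """Recover the time signal Re(P e^{i omega t}) from a phasor P."""
    return (P * cmath.exp(1j * omega * t)).real


h1, h2 = 1e-5, 1e-4   # step sizes for first and second differences
for t in (0.0, 0.7, 1.9):
    dx_numeric = (x(t + h1) - x(t - h1)) / (2 * h1)
    d2x_numeric = (x(t + h2) - 2 * x(t) + x(t - h2)) / h2**2
    print(t,
          dx_numeric, from_phasor(1j * omega * X, t),          # phasor i omega X
          d2x_numeric, from_phasor(-omega**2 * X, t),          # phasor -omega^2 X
          (c / omega) * math.sin(omega * t + phi),             # antiderivative
          from_phasor(X / (1j * omega), t))                    # phasor X/(i omega)
```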

Example 21.4 Obtain the phasor of the expressions
$$\text{(a) } L\frac{d^2q}{dt^2} + R\frac{dq}{dt} + \frac{q}{C}, \qquad \text{(b) } L\frac{di}{dt} + \frac1C\int i\,dt,$$
in terms of the phasors Q of q(t) and I of i(t). (L, R, and C are circuit constants, and the prevailing frequency is ω.)

(a) From (21.4) and the addition principle (21.3), the phasor is
$$L(i\omega)^2Q + R(i\omega)Q + (1/C)Q = [(1/C) - L\omega^2 + iR\omega]Q.$$
(b) The phasor is
$$L(i\omega)I + \frac{1}{Ci\omega}I = i\left(L\omega - \frac{1}{C\omega}\right)I.$$

Example 21.5 Find the steady-state solution of
$$\frac{d^2x}{dt^2} + 8\frac{dx}{dt} + 10^4x = 2\times10^3\cos 90t.$$
This is equivalent to the circuit equation in Example 20.5, with x(t) in place of q(t). The prevailing value of ω is 90. Let X be the phasor of x(t). The phasor of the right-hand side is $2\times10^3$, so by using (21.4) we obtain
$$[(90i)^2 + 8(90i) + 10^4]X = 2\times10^3,$$
or
$$(1900 + 720i)X = 2\times10^3,$$
from which X can be found:
$$X = \frac{2\times10^3}{1900 + 720i} = \frac{1}{0.95 + 0.36i} = \frac{1}{1.0159\,e^{0.3622i}} = 0.984\,e^{-0.362i}.$$
Therefore
$$x(t) = \mathrm{Re}[0.984\,e^{-0.362i}e^{90it}] = 0.984\cos(90t - 0.362),$$
as we found in Example 20.4 for the forced oscillation.
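The same algebra works for any equation $a\,x'' + b\,x' + c\,x = F\cos\omega t$. By (21.4) the phasor equation is $[c - a\omega^2 + ib\omega]X = F$. The helper `steady_state` below is a name introduced for this sketch. Applied to Example 21.5, it reproduces the amplitude and phase found above.

```python
import cmath


def steady_state(a, b, c, F, omega):
    """Phasor X of the steady solution of a x'' + b x' + c x = F cos(omega t).

    By (21.4) the left-hand side has phasor [a (i omega)^2 + b (i omega) + c] X
    and the right-hand side has phasor F, so X = F / [c - a omega^2 + i b omega].
    """
    return F / (c - a * omega**2 + 1j * b * omega)


X = steady_state(1.0, 8.0, 1e4, 2e3, 90.0)
amplitude, phase = cmath.polar(X)
print(f"x(t) = {amplitude:.3f} cos(90t {phase:+.3f})")   # x(t) = 0.984 cos(90t -0.362)
```

The same call, with the coefficients changed, can be used to check an answer to Self-test 21.2.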

Self-test 21.2
Using phasors, find the steady-state solution of
$$\frac{d^2x}{dt^2} + 4\frac{dx}{dt} + 2504x = 10^3\cos 100t.$$

21.3 Phasor diagrams


Complex numbers can be represented by vectors in an Argand diagram (see Section 6.2), and they are added in the same way as the corresponding vectors. Phasors are just complex numbers, so they add like vectors too. This fact can be used to show pictorially how a number of superposed oscillations which are not in phase with each other contribute to the sum. The diagrams concerned are called phasor diagrams.

Example 21.6 Let $u(t) = 2\cos 10t$, $v(t) = \cos(10t - \tfrac12\pi)$, and $w(t) = 3\cos(10t + \tfrac14\pi)$. Find $p(t) = u(t) + v(t) + w(t)$ by means of a phasor diagram.

[Fig. 21.1: two Argand diagrams. (a) U, V, W drawn as position vectors, with W at 45° to the real axis. (b) The same vectors placed head to tail; the resultant OP has real component 3/√2 + 2 and imaginary component 3/√2 − 1, and makes angle φ with the real axis.]
Fig. 21.1 (a) Argand diagram showing U, V, W. (b) The sum U + V + W = OP.

The phasors corresponding to u, v, and w are $U = 2$, $V = e^{-\frac12\pi i}$, and $W = 3e^{\frac14\pi i}$. In the polar-coordinate notation they are $U = 2\angle0^\circ$, $V = 1\angle{-90^\circ}$, $W = 3\angle45^\circ$. They are shown as position vectors in Fig. 21.1a, and in Fig. 21.1b they are strung together as usual for addition. The vector OP can be measured off from the diagram, or calculated using the dimensions shown. We have
$$|OP| = \left[(3/\sqrt2 + 2)^2 + (3/\sqrt2 - 1)^2\right]^{1/2} = 4.27,$$
$$\phi = \arctan\frac{3/\sqrt2 - 1}{3/\sqrt2 + 2} = 0.266 \text{ (radians)}.$$
Therefore $p(t) = 4.27\cos(10t + 0.266)$.
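The phasor diagram can be replaced by complex arithmetic. The sketch below adds the three phasors of Example 21.6 and reads off the amplitude and phase of the sum. It then checks the result against direct summation of the three cosines at a few instants. The helper name `phasor` is introduced here for illustration.

```python
import cmath
import math


def phasor(c, phi):
    """Phasor c e^{i phi} of c cos(omega t + phi)."""
    return c * cmath.exp(1j * phi)


# u = 2 cos 10t, v = cos(10t - pi/2), w = 3 cos(10t + pi/4)
P = phasor(2, 0) + phasor(1, -math.pi / 2) + phasor(3, math.pi / 4)
amplitude, phase = cmath.polar(P)
print(f"p(t) = {amplitude:.2f} cos(10t + {phase:.3f})")   # p(t) = 4.27 cos(10t + 0.266)

# Direct check at a few instants
for t in (0.0, 0.05, 0.3):
    direct = (2 * math.cos(10 * t) + math.cos(10 * t - math.pi / 2)
              + 3 * math.cos(10 * t + math.pi / 4))
    assert abs(direct - amplitude * math.cos(10 * t + phase)) < 1e-12
```

The phase printed is $\arctan[(3/\sqrt2 - 1)/(3/\sqrt2 + 2)] \approx 0.266$ rad.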

Self-test 21.3
If $u(t) = \cos(5t - \tfrac14\pi)$, $v(t) = 2\cos(5t + \tfrac12\pi)$ and $w(t) = 3\cos(5t + \tfrac14\pi)$, find $\phi(t) = u(t) + v(t) + w(t)$ by means of a phasor diagram.

21.4 Phasors and complex impedance


In the following table, an electric current
$$i(t) = c\cos(\omega t + \phi)$$
with phasor $I = c\,e^{i\phi}$ is caused to pass through a resistor, an inductor, and a capacitor, separately. The resulting voltage drop v(t) associated with each is shown, together with its phasor V. It is the unique steadily oscillating state that is being described by the phasors.

[Each element carries current i, with voltage drop v measured from the + terminal to the − terminal.]

                     Resistor          Inductor              Capacitor
Voltage drop v:      v = Ri            v = L di/dt           v = (1/C) ∫ i dt
Voltage phasor V:    Rc e^{iφ} = RI    iωL I                 I/(iωC)
Voltage phase:       φ (in phase)      φ + ½π (v leads i)    φ − ½π (v lags i)
(21.5)

A similar table can be constructed if the voltage rather than current is prescribed. The entries can be read from the table above; for example, if the phasor of the voltage applied to an inductor is V, the phasor of the resulting current is V/(iωL).
Discussion of circuits in terms of phasors is said to take place in the frequency
domain, rather than the time domain associated with the differential equations
of the circuits.
Each of the three cases in the table can be written in the form
$$V = ZI,$$
where Z is either R, iωL, or $(i\omega C)^{-1}$. The quantity Z is called the complex impedance of these elements. There is a plain analogy with Ohm’s law for direct current through a resistance. We have

Complex impedance Z
Resistor     Z = R
Inductor     Z = iωL
Capacitor    Z = 1/(iωC)
(21.6)

By stringing elements of this type together in series and parallel, we can form composite units. The combined unit has a complex impedance made up from the complex impedances of the individual elements, as in the following Examples.

Example 21.7 Show that the complex impedance Z of two elements in series, whose complex impedances are $Z_1$ and $Z_2$, is given by
$$Z = Z_1 + Z_2.$$

[Fig. 21.2: impedances Z1 and Z2 in series carrying current i; the voltage drops across them are v1 and v2, and the total drop is v.]
Fig. 21.2

Suppose that the impedance of the unit is Z; we mean by this that, if V is the phasor of the voltage drop across the unit and I is the phasor of the current through it (see Fig. 21.2), then
$$V = ZI.$$
From Fig. 21.2, $v = v_1 + v_2$; therefore, by (21.3), the corresponding phasors satisfy
$$V = V_1 + V_2.$$
But i, and therefore I, is the same for $Z_1$ and $Z_2$, so
$$V_1 = Z_1I, \qquad V_2 = Z_2I.$$
Therefore $V = Z_1I + Z_2I = ZI$, or
$$Z = Z_1 + Z_2.$$

If the two impedances are in parallel, the analogy with Ohm’s law again exists:

Example 21.8 Show that the complex impedance Z of any two elements $Z_1$ and $Z_2$ in parallel is given by
$$\frac1Z = \frac1{Z_1} + \frac1{Z_2}.$$

[Fig. 21.3: Z1 (current i1) and Z2 (current i2) connected in parallel with total current i and voltage drop v, equivalent to a single impedance Z carrying i.]
Fig. 21.3 Two impedances in parallel and their combined impedance $Z = Z_1Z_2/(Z_1 + Z_2)$.

From Fig. 21.3, $i = i_1 + i_2$; so, by (21.3), $I = I_1 + I_2$. The voltage drop is the same for both branches, so
$$I_1 = V/Z_1, \qquad I_2 = V/Z_2, \qquad I = V/Z.$$
Therefore $I = (1/Z_1 + 1/Z_2)V = (1/Z)V$, from which the result follows.

It is easy to extend these two results to encompass more elements, and therefore we have the following general result.

Complex impedance Z of series and parallel circuits
(a) Impedances $Z_1, Z_2, \ldots$, in series:
$$Z = Z_1 + Z_2 + \cdots.$$
(b) Impedances $Z_1, Z_2, \ldots$, in parallel:
$$\frac1Z = \frac1{Z_1} + \frac1{Z_2} + \cdots.$$
(21.7)
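Rules (21.6) and (21.7) map onto two small functions, `series` and `parallel`; these names are introduced here for illustration. The sketch builds the circuit of Example 21.9 (inductor L in series with R and C in parallel). It uses hypothetical component values: R = 10, L = 0.05, C = 10⁻³, ω = 100, v₀ = 5. It then compares the current phasor with the closed-form expression derived in that example.

```python
import cmath


def series(*Z):
    """Combined impedance of elements in series, rule (21.7a)."""
    return sum(Z)


def parallel(*Z):
    """Combined impedance of elements in parallel, rule (21.7b)."""
    return 1 / sum(1 / z for z in Z)


def inductor(L, omega):
    return 1j * omega * L          # Z = i omega L, (21.6)


def capacitor(C, omega):
    return 1 / (1j * omega * C)    # Z = 1/(i omega C), (21.6)


# Hypothetical component values for the circuit of Example 21.9
R, L, C, omega, v0 = 10.0, 0.05, 1e-3, 100.0, 5.0
Z = series(inductor(L, omega), parallel(complex(R), capacitor(C, omega)))
I = v0 / Z
closed_form = v0 * (1 + 1j * omega * R * C) / (R * (1 - omega**2 * L * C) + 1j * omega * L)
print(Z, I, abs(I - closed_form))
```

These particular values make the total impedance purely resistive ($Z = 5$), so the current phasor is $I = 1$, in phase with the source.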

The analogy with resistive circuits, evident from these formulae, goes much
further. The general rules which govern voltages and currents in a passive linear
circuit are Kirchhoff’s laws: (i) that the algebraic sum of the voltages around any
closed circuit is zero; (ii) that the resultant current entering any junction is zero.
There is also a linear voltage/current relation for each branch. In terms of phasors
and complex impedances for a circuit in a state of steady harmonic oscillation,
these conditions become the following.

Kirchhoff’s laws
Around any closed circuit, ∑ V = 0.
At any junction, ∑ I = 0.
On any branch, V = ZI.
(21.8)

These rules have the same form as the rules for resistive direct-current circuits,
with V, I, and Z appearing in them in place of v, i, and R. It follows that general
rules applicable to DC circuits may be borrowed for the purpose of the circuits
we have been considering. Such rules are the Wheatstone bridge rules, Thévenin’s
theorem, and the structure of equivalent circuits. However, the restriction to
steady harmonic oscillation must be remembered: many circuits can be made
to ‘balance’ like a Wheatstone bridge for steady oscillations, but not for more
general disturbances.

Example 21.9 Find the steady alternating current in the circuit shown in Fig. 21.4.

[Fig. 21.4: source v(t) = v0 cos ωt driving an inductor L in series with a unit made of a resistor R and a capacitor C in parallel.]
Fig. 21.4

The unit comprising R and C consists of two complex impedances in parallel, R and $(i\omega C)^{-1}$. If Z is the combined impedance, then
$$\frac1Z = \frac1R + \frac{1}{(i\omega C)^{-1}},$$
which gives
$$Z = \frac{R}{1 + i\omega RC}.$$
Z is in series with the other impedance, iωL, so the impedance of the circuit is given by
$$Z = \frac{R}{1 + i\omega RC} + i\omega L = \frac{R(1 - \omega^2LC) + i\omega L}{1 + i\omega RC}.$$
Since I = V/Z, and V = v₀, we obtain
$$I = \frac{v_0(1 + i\omega RC)}{R(1 - \omega^2LC) + i\omega L} = \frac{v_0(1 + \omega^2R^2C^2)^{1/2}}{[R^2(1 - \omega^2LC)^2 + \omega^2L^2]^{1/2}}\,e^{i(\phi_1 - \phi_2)},$$
where
$$\tan\phi_1 = \omega RC, \qquad \tan\phi_2 = \frac{\omega L}{R(1 - \omega^2LC)},$$
with $\phi_2$ taken in the quadrant of the point $(R(1 - \omega^2LC), \omega L)$. Finally,
$$i(t) = \mathrm{Re}(Ie^{i\omega t}) = \frac{v_0(1 + \omega^2R^2C^2)^{1/2}}{[R^2(1 - \omega^2LC)^2 + \omega^2L^2]^{1/2}}\cos(\omega t + \phi_1 - \phi_2).$$
2

Example 21.10 Find the steady alternating current entering the circuit shown in Fig. 21.5.

[Fig. 21.5: source v(t) = v0 cos ωt connected between M and Q; a capacitor C joins M and Q directly, and the path MNPQ contains R and L in series.]
Fig. 21.5

The phasor of the voltage source is V = v₀. By (21.6) the impedance of MNPQ is R + iωL, and that of MQ is 1/(iωC). These are in parallel, so by (21.7) the impedance Z of the circuit viewed between M and Q is given by
$$\frac1Z = \frac{1}{R + i\omega L} + \frac{1}{1/(i\omega C)}.$$
Therefore
$$I = \frac VZ = v_0\left(\frac{1}{R + i\omega L} + i\omega C\right).$$
The simplest way to get an expression for i(t) is to treat the two terms in the brackets on the right separately (though this does not give the answer in standard form). We obtain
$$i(t) = \frac{v_0}{(R^2 + \omega^2L^2)^{1/2}}\cos\left(\omega t - \arctan\frac{\omega L}{R}\right) + v_0\omega C\cos(\omega t + \tfrac12\pi).$$

Example 21.11 (Balanced bridge circuit.) (a) For Fig. 21.6a, show that (i) if i(t) = 0, then $Z_1/Z_2 = Z_3/Z_4$, (ii) if $Z_1/Z_2 = Z_3/Z_4$, then i(t) = 0. (b) Check that i(t) = 0 in the circuit of Fig. 21.6b.

[Fig. 21.6: (a) a bridge with arms Z1, Z2 (upper) and Z3, Z4 (lower), driven by v = v0 cos ωt, with detector current i across the middle; (b) the same bridge built from unit-valued components.]
Fig. 21.6

(a) The analogy (21.8) between resistive and general circuits for steady harmonic oscillations enables us to borrow ordinary Wheatstone-bridge theory, substituting current and voltage phasors and complex impedances for the usual constant currents, voltages, and resistances. We can therefore say immediately that the circuit is balanced (i(t) = 0) if, and only if,
$$Z_1/Z_2 = Z_3/Z_4.$$
(b) $Z_1$ consists of a capacitor and resistor in parallel; so, by (21.6) and (21.7),
$$\frac1{Z_1} = \frac{1}{(i\omega)^{-1}} + \frac11 \quad\text{or}\quad Z_1 = \frac{1}{1 + i\omega}.$$
Also
$$Z_2 = i\omega, \qquad Z_3 = \frac{1}{i\omega}, \qquad Z_4 = 1 + i\omega.$$
Therefore
$$\frac{Z_1}{Z_2} = \frac{1}{i\omega(1 + i\omega)} \quad\text{and}\quad \frac{Z_3}{Z_4} = \frac{1}{i\omega(1 + i\omega)};$$
so, from (a), i(t) = 0 and the bridge is balanced.
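Balance in Example 21.11(b) should hold at every frequency, because the two ratios agree identically in ω. The sketch below evaluates both ratios at a few sample frequencies, which are arbitrary choices. The impedances are those listed in the example.

```python
def bridge_ratios(omega):
    """Z1/Z2 and Z3/Z4 for the bridge of Fig. 21.6b (unit-valued components)."""
    iw = 1j * omega
    Z1 = 1 / (1 + iw)   # unit resistor in parallel with unit capacitor: admittances add
    Z2 = iw             # unit inductor
    Z3 = 1 / iw         # unit capacitor
    Z4 = 1 + iw         # unit resistor in series with unit inductor
    return Z1 / Z2, Z3 / Z4


for omega in (0.1, 1.0, 50.0):
    r12, r34 = bridge_ratios(omega)
    print(omega, r12, r34, abs(r12 - r34))
```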

21.5 Transfer functions in the frequency domain


Consider the circuit of Fig. 21.7, in which the applied voltage is $v_1(t)$, with phasor $V_1 = c_1e^{i\phi_1}$. Suppose that the voltage drop $v_2(t)$ across $R_2$ has a phasor $V_2 = c_2e^{i\phi_2}$. Consider the ratio of these two phasors, denoting it by $G_{12}$ (the letter G usually stands for voltage gain):
$$G_{12} = \frac{V_2}{V_1} = \frac{c_2e^{i\phi_2}}{c_1e^{i\phi_1}} = \frac{c_2}{c_1}e^{i(\phi_2 - \phi_1)}.$$

[Fig. 21.7: a network containing R, C, L and R2, driven by v1(t); the output v2(t) is the voltage across R2.]
Fig. 21.7

Then (assuming that $c_1, c_2 > 0$)
$$|G_{12}| = \frac{c_2}{c_1},$$
which is the ratio of the peak voltages, or amplitudes, of $v_2(t)$ and $v_1(t)$. The argument (polar angle) of $G_{12}$ is the phase difference between them. If instead we are interested in the current $i_2(t)$ through $R_2$ produced by $v_1(t)$, then we need the ratio
$$Z_{12} = V_1/I_2,$$
where $I_2$ is the phasor of $i_2(t)$. This quantity, a voltage divided by a current, is called a transfer impedance. Alternatively, we could consider the ratio
$$Y_{21} = I_2/V_1,$$
in which $Y_{21}$ is called a transfer admittance (its counterpart in DC theory is conductance).
In general, the ratio of an output (such as a current in a selected branch) to
an input (such as a voltage driving a network) is called a transfer function in
the frequency domain. A different class of transfer functions is discussed in
Chapter 25, on Laplace transforms.

Example 21.12 Find the transfer impedance $Z_{12} = V_1/I_2$ for the circuit of Fig. 21.8 when the prevailing angular frequency ω is 200.

[Fig. 21.8: source v1(t) = 10 cos 200t across A and E; elements 2 (A to B) and 0.01 (B to C) along the top; vertical branches 0.005 (B to E) and 3 (C to D), with D joined to E. The currents are marked i1, i1 − i2 and i2.]
Fig. 21.8

The currents indicated take account of Kirchhoff’s second rule (21.8), that the sum of the currents entering a junction is zero. The first law expressed in terms of the phasors (see (21.8)), that the sum of the voltage drops round closed circuits is zero, gives for the circuits ABCDEA and BCDEB respectively:
$$2I_1 + \left(\frac{1}{200\times0.01i} + 3\right)I_2 = V_1,$$
and
$$\left(\frac{1}{200\times0.01i} + 3\right)I_2 - (200\times0.005i)(I_2 - I_1) = 0.$$
After simplification, these become
$$2I_1 + (3 - \tfrac12 i)I_2 = V_1,$$
$$iI_1 + (3 - \tfrac32 i)I_2 = 0.$$
The solution for $I_2$ is
$$I_2 = \frac{i}{-\tfrac{11}{2} + 6i}V_1.$$
The transfer function required is
$$Z_{12} = V_1/I_2 = 6 + \tfrac{11}{2}i = 8.14\,e^{0.74i}.$$
The amplitude of $i_2(t)$ is given by
$$|I_2| = |V_1|/|Z_{12}| = 10/8.14 = 1.23.$$
Its phase is
$$(\text{phase of } V_1) - (\text{phase of } Z_{12}) = 0 - 0.74 = -0.74.$$
The current lags the voltage by this amount.

The methods described in this chapter were invented in the late nineteenth cen-
tury to assist engineers working with alternating current to interpret and make
calculations on their circuits. So long as only steady harmonic oscillations had to
be considered, there was no need to solve differential equations: only algebraic
equations are involved and these are much simpler to manipulate. Since that time,
the methods have been extensively developed so as to permit computer calculation
for circuits of any degree of complexity, using matrix algebra, graph theory, and
other sophisticated techniques. In Section 24.16, another method for algebrizing
circuit equations is described, using Laplace transforms.

21.6 Phasors and waves; complex amplitude


The use of phasors is not restricted to electrical circuits. Phase–amplitude vari-
ation from point to point in space, rather than between one circuit element and
another, determines the diffraction, interference, and scattering properties of
electromagnetic and other types of wave motion. It is not possible to describe the
physical content of these subjects in this book, but only to indicate how phasors
can be useful in such a context. They are employed in Sections 27.10 to 27.12 for
diffraction problems.

(i) Phasors and simple oscillations


Firstly we shall slightly generalize Example 20.3 (the case of two superposed out-of-phase oscillations having the same frequency) by allowing the amplitudes to be different. In the notation we shall use, we take u(t) to be given by
$$u(t) = A_1\cos(\omega t + \phi_1) + A_2\cos(\omega t + \phi_2), \qquad (21.9a)$$
and assume for simplicity that $A_1 > 0$, $A_2 > 0$.
As in Section 21.1, u(t) may be written as the real part of a complex function:
$$u(t) = \mathrm{Re}[A_1e^{i(\omega t + \phi_1)} + A_2e^{i(\omega t + \phi_2)}] = \mathrm{Re}[e^{i\omega t}(A_1e^{i\phi_1} + A_2e^{i\phi_2})]. \qquad (21.9b)$$
The phasors of the two oscillations are $U_1$ and $U_2$, given by
$$U_1 = A_1e^{i\phi_1}, \qquad U_2 = A_2e^{i\phi_2}. \qquad (21.10)$$
The phasor U of the composite oscillation is therefore given by
$$U = U_1 + U_2, \qquad (21.11a)$$
and then
$$u(t) = \mathrm{Re}[Ue^{i\omega t}]. \qquad (21.11b)$$

Figure 21.9 shows a phasor diagram illustrating vectorial addition of the phasors.
The real and imaginary parts of U, needed to evaluate (21.11b), are equal to the
components of the vector resultant shown. (Clearly, the greater the number of
wave components, the greater the advantage of using phasors.)
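Equations (21.10) and (21.11) reduce the sum to one complex addition. The sketch below, with illustrative amplitudes and phases (assumed), forms $U = U_1 + U_2$ and reads off the amplitude $|U|$ and phase $\arg U$. It then compares $|U|\cos(\omega t + \arg U)$ with the direct sum at a few instants. The helper name `combine` is hypothetical.

```python
import cmath
import math


def combine(terms):
    """Amplitude and phase of a sum of A cos(omega t + phi) terms with common omega.

    Each (A, phi) gives the phasor A e^{i phi}, eqn (21.10); the phasors add, (21.11a).
    """
    U = sum(A * cmath.exp(1j * phi) for A, phi in terms)
    return cmath.polar(U)


omega = 2.0
terms = [(1.5, 0.3), (0.8, 2.0)]   # illustrative (A1, phi1), (A2, phi2)
A, phi = combine(terms)
for t in (0.0, 0.4, 1.3):
    direct = sum(a * math.cos(omega * t + p) for a, p in terms)
    print(t, direct, A * math.cos(omega * t + phi))
```

The same function accepts any number of terms, which is where the phasor approach saves the most work.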

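[Fig. 21.9: Argand diagram with the vector $A_1e^{i\phi_1}$ at angle φ1 and the vector $A_2e^{i\phi_2}$ at angle φ2 placed head to tail; the resultant from O is U.]
Fig. 21.9 Phasor diagram for u(t), eqn (21.9a).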

(ii) Complex amplitude and travelling waves


Here we take as an example the superposition of two plane travelling waves, $u_1$ and $u_2$ (see Chapter 20), which have the same direction, amplitude, and frequency, but different phases. Propagation is in the direction of the z axis. The variable t is to be thought of as a stopwatch time; we shall suppose the watch is switched on at the moment t = 0 when the origin of z is at a wave maximum. We lose no generality by this, and it simplifies the algebra. The composite wave is given (see Appendix B(d)) by
$$u(t, x, y, z) = u_1 + u_2 = A_0\cos(\omega t - kz) + A_0\cos(\omega t - kz + \phi) = 2A_0\cos\tfrac12\phi\,\cos(\omega t - kz + \tfrac12\phi). \qquad (21.12)$$
This is a travelling wave having amplitude $2A_0\cos\tfrac12\phi$ and phase $\tfrac12\phi$.


Alternatively, put
$$A_0e^{-ikz} = U_1, \qquad A_0e^{i(-kz + \phi)} = U_2. \qquad (21.13)$$
Then $U_1$, $U_2$, called the complex amplitudes of $u_1$ and $u_2$, behave like phasors. Since $u = u_1 + u_2$,
$$u = \mathrm{Re}[(U_1 + U_2)e^{i\omega t}] = \mathrm{Re}[Ue^{i\omega t}],$$
where U is the complex amplitude of u.
Figure 21.10 shows the phasor diagram for obtaining U on the plane z = 0, from which the amplitude and phase of u(t, x, y, 0) are readily obtainable and agree with (21.12). On any other plane $z = z_0$, the phase angles become $-kz_0$ and $\phi - kz_0$: the phasor diagram is rotated bodily about the origin through the angle $-kz_0$ (clockwise when $kz_0 > 0$).

[Fig. 21.10: (a) on z = 0, vectors $A_0$ (along the real axis) and $A_0e^{i\phi}$, with resultant U(0); (b) on z = z0, vectors $A_0e^{-ikz_0}$ at angle $-kz_0$ and $A_0e^{i(-kz_0 + \phi)}$ at angle $-kz_0 + \phi$, with resultant U(z0).]
Fig. 21.10 Phasor diagram for (21.12). (a) z = 0, (b) z = z0.
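The rotation of the phasor diagram from plane to plane can be checked with complex arithmetic. The sketch below uses the complex amplitudes $U_1 = A_0e^{-ikz}$ and $U_2 = A_0e^{i(\phi - kz)}$ with illustrative values of $A_0$, k and φ (assumed). It confirms that on each plane $z = z_0$ the modulus stays $2A_0\cos\tfrac12\phi$ while the phase is $\tfrac12\phi - kz_0$, modulo 2π.

```python
import cmath
import math


def complex_amplitude(A0, k, z, phi):
    """U = U1 + U2 with U1 = A0 e^{-ikz} and U2 = A0 e^{i(phi - kz)}."""
    return A0 * cmath.exp(-1j * k * z) + A0 * cmath.exp(1j * (phi - k * z))


A0, k, phi = 1.0, 2.0, 1.2      # illustrative values (assumed)
for z0 in (0.0, 0.5, 1.7):
    amp, phase = cmath.polar(complex_amplitude(A0, k, z0, phi))
    expected_amp = 2 * A0 * math.cos(phi / 2)                      # amplitude from (21.12)
    phase_err = math.remainder(phase - (phi / 2 - k * z0), 2 * math.pi)  # reduce mod 2 pi
    print(z0, amp, expected_amp, phase_err)
```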

(iii) Intensity of a plane wave


Travelling waves carry energy along with them. A measure of this property is
the energy being transported across a fixed wave surface per unit time, per unit
area: that is, power per unit area. The time average of this quantity is called the
intensity of the wave. For the plane sinusoidal wave
$$u(t,x,y,z) = A\cos(\omega t - kz + \varphi),$$
the instantaneous rate of transport can be shown to be proportional to $u^2$, so we require the time average of $u^2$. The motion is periodic (i.e. it is repetitive), so it is sufficient to average $u^2$ over a single period $2\pi/\omega$:
$$\text{time average of } u^2 = \frac{\omega}{2\pi}\int_0^{2\pi/\omega} A^2\cos^2(\omega t - kz + \varphi)\,dt = \frac{\omega}{4\pi}\int_0^{2\pi/\omega} A^2[1 + \cos 2(\omega t - kz + \varphi)]\,dt = \tfrac12A^2.$$
Therefore
$$\text{intensity } I = KA^2, \tag{21.14}$$

where K is a constant for the medium. In the case of optics, I is directly related to
the brightness of an image on a screen. (We adopt standard practice by describing
a light beam by means of a scalar wave.)
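The time average used in deriving (21.14) can be confirmed numerically. The following sketch approximates $\frac{\omega}{2\pi}\int_0^{2\pi/\omega}A^2\cos^2(\omega t - kz + \varphi)\,dt$ by the midpoint rule; the function name and parameter values are assumptions for illustration. For a trigonometric integrand over a whole period the midpoint rule is exact apart from rounding, so the result should match $\frac12A^2$ closely for any z and φ.

```python
import math

def time_average_u_squared(A, omega, k, z, phi, n=1000):
    """Midpoint-rule estimate of the average of u^2 over one period 2*pi/omega,
    where u = A*cos(omega*t - k*z + phi)."""
    period = 2 * math.pi / omega
    dt = period / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * dt                 # midpoint of the j-th subinterval
        total += (A * math.cos(omega * t - k * z + phi)) ** 2
    return total * dt / period

avg = time_average_u_squared(A=3.0, omega=5.0, k=1.0, z=0.7, phi=0.2)
print(avg, 0.5 * 3.0 ** 2)   # both should be close to 4.5
```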
STEADY FORCED OSCILLATIONS: PHASORS, IMPEDANCE, TRANSFER FUNCTIONS

Now suppose that u is expressed in the form $u = \mathrm{Re}[Ue^{i\omega t}]$. For the superposition in (ii), U = U1 + U2, where
$$U_1 = A_0e^{-ikz} \quad\text{and}\quad U_2 = A_0e^{i(-kz+\varphi)}.$$
Also |U| is the amplitude of u (for this superposition $|U| = 2A_0|\cos\frac12\varphi|$, by (21.12)), so that, from (21.14),
$$I = K|U|^2, \tag{21.15}$$
where K is constant. Intensities are not simply additive:
$$|U|^2 = |U_1 + U_2|^2 \neq |U_1|^2 + |U_2|^2.$$
As with a particle on a spring, doubling the amplitude involves four times as much energy.
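To see why intensities do not simply add, expand $|U_1 + U_2|^2 = |U_1|^2 + |U_2|^2 + 2\,\mathrm{Re}(U_1\overline{U_2})$. The sketch below, with assumed illustrative values, evaluates both sides for $U_1 = A_0e^{-ikz}$ and $U_2 = A_0e^{i(-kz+\varphi)}$; the extra interference term works out to $2A_0^2\cos\varphi$, so that $|U|^2 = 4A_0^2\cos^2\frac12\varphi$, in agreement with (21.12).

```python
import cmath
import math

A0, k, z, phi = 1.0, 2.0, 0.3, 0.5   # illustrative values (assumed)

U1 = A0 * cmath.exp(-1j * k * z)
U2 = A0 * cmath.exp(1j * (-k * z + phi))

combined = abs(U1 + U2) ** 2             # proportional to the intensity of u1 + u2
separate = abs(U1) ** 2 + abs(U2) ** 2   # sum of the separate intensities
cross = 2 * (U1 * U2.conjugate()).real   # interference term 2 Re(U1 conj(U2))

print(combined, separate, cross)
# combined = separate + cross; with phi = 0 (waves in phase) combined would be
# 4*A0**2, twice the sum of the separate intensities.
```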

(iv) Interference of two inclined plane waves in two dimensions


Consider two identical plane light waves u1 and u2, inclined to each other at an
angle γ , and overlapping in a region. One propagates along the z direction and
the other parallel to the (y, z) plane in the direction v with direction cosines (0,
sin γ, cos γ ) (see eqn (20.31) and Fig. 21.11). They impinge on a vertical screen,
which for simplicity we take to be the plane z = 0.

Fig. 21.11 Two uniform light beams, in directions Oz and v (inclined at angle γ), interfere on arrival at a screen z = 0.

Let
$$u_1(t,y,z) = A_0\cos(\omega t - kz), \tag{21.16a}$$
$$u_2(t,y,z) = A_0\cos[\omega t - k(y\sin\gamma + z\cos\gamma)], \tag{21.16b}$$
from (20.31a). The wave arriving at the screen z = 0 is then the resultant of the two waves
$$u_1(t,y,0) = A_0\cos\omega t, \qquad u_2(t,y,0) = A_0\cos(\omega t - ky\sin\gamma). \tag{21.17}$$

By using the identity for the sum of two cosines (Appendix B(d)), we obtain
$$u_1 + u_2 = 2A_0\cos(\tfrac12ky\sin\gamma)\cos(\omega t - \tfrac12ky\sin\gamma). \tag{21.18}$$
This represents a pattern of oscillatory disturbance on the screen having amplitude $2A_0\cos(\frac12ky\sin\gamma)$ and phase $(-\frac12ky\sin\gamma)$, where both depend on the vertical coordinate y. It arises as the result of interference between the incoming waves where they meet on the screen.

On the screen z = 0 the phasors of the incoming waves are
$$U_1 = A_0, \qquad U_2 = A_0e^{-iky\sin\gamma}.$$

Fig. 21.12 Phasor diagram for interference of two beams on a screen. U1 = OP, U2 = PQ, U = OQ.


The phasor diagram for the equation U = U1 + U2 is shown in Fig. 21.12. The triangle OPQ is isosceles (OP = PQ = A0), so its base angles at O and Q are equal; the signed angle at O is $\theta = -\frac12ky\sin\gamma$, and if N is the foot of the perpendicular from P to OQ, then
$$OQ = ON + NQ = 2A_0|\cos(\tfrac12ky\sin\gamma)|.$$
Therefore the polar components of U are, as in (21.18),
$$|U| = OQ = 2A_0|\cos(\tfrac12ky\sin\gamma)|, \qquad \arg(U) = \theta = -\tfrac12ky\sin\gamma.$$
In order to express the variations in brightness of the resulting pattern over the screen, we need the intensity of the combined wave (see (iii), above). By (21.15), this depends only on $|U|^2$, rather than U itself. From the phasor diagram (or from (21.18))
$$|U|^2 = |U_1 + U_2|^2 = 4A_0^2\cos^2(\tfrac12ky\sin\gamma) = 2A_0^2[1 + \cos(ky\sin\gamma)].$$
From (21.15), the intensity distribution I on the screen is therefore proportional to
$$1 + \cos(ky\sin\gamma). \tag{21.19}$$

The constant term in (21.19) corresponds to a constant average background illumination. The term cos(ky sin γ) describes the varying brightness of the interference fringes. On arrival at the screen the beams reinforce or cancel each other at regular intervals in the y direction, due to changes in their phase difference ky sin γ. In principle, a photographic plate on the (x, y) plane would show a pattern of brighter and darker interference fringes parallel to the x axis and equally spaced by an amount 2π/(k sin γ) in the y direction.
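The fringe pattern (21.19) can be tabulated directly. In this sketch the wavelength (500 nm) and the angle γ (0.5°) are assumed values, not taken from the text. It samples $1 + \cos(ky\sin\gamma)$ across the screen with $k = 2\pi/\lambda$, picks out the interior brightness maxima, and compares their spacing with $2\pi/(k\sin\gamma) = \lambda/\sin\gamma$.

```python
import math

wavelength = 500e-9            # assumed: 500 nm light
k = 2 * math.pi / wavelength   # wavenumber
gamma = math.radians(0.5)      # assumed angle between the two beams

def relative_intensity(y):
    """Intensity on the screen z = 0, up to a constant factor, from (21.19)."""
    return 1 + math.cos(k * y * math.sin(gamma))

spacing = 2 * math.pi / (k * math.sin(gamma))   # predicted fringe spacing

# Sample five fringe spacings of the screen and locate the interior local maxima.
n_samples = 20000
y_max = 5 * spacing
ys = [y_max * j / n_samples for j in range(n_samples + 1)]
values = [relative_intensity(y) for y in ys]
peaks = [ys[j] for j in range(1, n_samples) if values[j - 1] < values[j] >= values[j + 1]]

print(f"fringe spacing = {spacing * 1e6:.1f} micrometres")   # about 57.3 for these values
print([round(p / spacing, 3) for p in peaks])                # peaks at 1, 2, 3, 4 spacings
```

A small angle γ gives widely spaced fringes, which is why interference between nearly parallel beams is easiest to observe.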
Problems

21.1 Write down the phasors X corresponding to the oscillations x(t) given below, in polar and a + ib form.
(a) 2 cos(10t + ½π); (b) −2 cos(10t + ½π); (c) 3 sin ωt; (d) −4 sin(3t − ¼π).

21.2 Write the following phasors X in polar form, and give the corresponding oscillations x(t) when the angular frequency is ω.
(a) 1 − i; (b) 2i; (c) −3i; (d) −√2 − √2i; (e) −2√3 − 2i; (f) −1 − √3i; (g) 1/(1 − 2i); (h) i/(1 − 2i); (i) (2 + 3i)/(2 − 3i); (j) 1/(3i) + 2i.

21.3 Write down the phasors corresponding to the following oscillations. State the amplitude and phase.
(a) cos 2t + cos(2t − ¼π); (b) cos 3t − sin 3t; (c) sin 3t + 2 cos 3t.

21.4 Either algebraically, or by calculation or measurement based on a phasor diagram, give the phasors of the following functions.
(a) −cos 2t + cos(2t + ¼π) + cos(2t − ½π);
(b) cos 1760t − 3 cos(1760t − ½π) + cos(1760t + ½π).

21.5 Show that the point on an Argand diagram corresponding to c e^{i(ωt+φ)} moves on a circle, centre the origin and radius c, with constant angular velocity ω, and that its projection on the x axis is given by x = c cos(ωt + φ). (A phasor diagram such as Fig. 21.1 is a snapshot of conditions at t = 0 in such a representation: the whole diagram rotates unaltered.)

21.6 Obtain the complex impedance of the following circuit branches (Fig. 21.13).

Fig. 21.13 Circuit branches (a) to (l), built from resistors R, inductors L, and capacitors C; branch (l) contains the general impedances Z1, Z2, Z3, Z4.

21.7 A voltage v = 2 cos ωt is applied across each of the circuits in Problem 21.6. Find the amplitude and phase of the current passing through the branches.

21.8 In Fig. 21.14, numerical values are given to the complex impedances (the standard units are ohms, although the quantities may be complex). A voltage with phasor V0 is applied. Obtain the phasor V1 as indicated, the corresponding voltage gain V1/V0, and the transfer impedance V0/I1.

Fig. 21.14 Circuits for Problem 21.8, with input phasor V0, current I1, and output phasor V1. (a) Impedances 3, 3i, −i. (b) Impedances 1, 1, −2i, −i.

21.9 Sketch phasor diagrams for the following cases and in each case calculate (or measure) to obtain the sum:
(a) cos 10t + 2 cos(10t + 0.3);
(b) cos 10t + 2 sin(10t + 10.2);
(c) cos 10t + 3 cos(10t − 0.2);
(d) sin 20t − 3 cos(20t + 0.75);
(e) 2 cos(50t + 0.4) + sin(50t + 0.3) − 3 cos(50t − 0.5).

21.10 Use phasor diagrams for the following problems.
(a) Given axes x, y, z, obtain the general form for a plane sound wave u(t, x, y, z), travelling in the direction that makes equal (acute) angles with the positive directions of the three axes. Investigate the form of the wave on a screen placed in the (x, y) plane.
(b) The wave in (a) is crossed by another identical plane wave, which travels in the direction of the z axis. Obtain the interference pattern on the plane z = 0.
22 Graphical, numerical, and other aspects of first-order equations

CONTENTS

22.1 Graphical features of first-order equations
22.2 The Euler method for numerical solution
22.3 Nonlinear equations of separable type
22.4 Differentials and the solution of first-order equations
22.5 Change of variable in a differential equation
Problems

Chapters 19–21 were largely concerned with the theory and applications of linear differential equations having constant coefficients. We cannot expect that every physical situation will be described, even approximately, by equations having this form; the field of differential equations is naturally far more varied. In this chapter we mainly consider first-order equations which are either nonlinear, or have a nonconstant coefficient. We first show some simple graphical and numerical methods that can give a good general picture of the solutions, and indicate how a solution will develop starting from a given initial value. (For refinements of such methods, consult books on numerical analysis: see, for example, Boyce and DiPrima (1997).)
In cases where analytic solutions (that is, solutions expressible as more or less explicit formulae) are obtainable, the solution methods are different for each type of equation. An arbitrary constant still arises, but it is embedded in the general solution, with no parallel to the simple structure of particular and complementary functions that we have seen for linear equations. In Sections 22.3–22.5 we show a few frequently occurring types: separable equations are particularly common. There exist several reference books containing vast collections of special results and methods: see, for example, Zwillinger (1992).

22.1 Graphical features of first-order equations


In this section we shall use x for the independent variable (instead of t), and y as the dependent variable (instead of x), and consider differential equations of the form
$$\frac{dy}{dx} = f(x, y), \tag{22.1}$$
where f(x, y) is unrestricted. If f(x, y) happens to take the form g(x) + h(x)y, the equation is linear and can be handled by the method of Section 19.5; otherwise none of the methods so far discussed will work.
However, we can always obtain a rough picture of the solution curves by using
a simple fact. Choose any point x = a, y = b, on the (x, y) plane. Then eqn (22.1)
says that the slope of the solution curve which passes through (a, b) must be equal
to f(a, b), and this has a definite numerical value that we can work out. So take
a large number of points (a, b) on the (x, y) plane. For each of them, work out
f(a, b), and draw through the point a short line whose slope is equal to f(a, b), as
is done in Fig. 22.1a for the special case f(x, y) = xy. These are called direction
indicators. Given enough of these, it is possible to draw a family of curves which
follow their directions smoothly, as in Fig. 22.1b. Each of the curves represents a
solution of (22.1), because its slope, or derivative, is correctly reproduced at each
point on it. The picture is called a lineal-element diagram or a direction field. The
technique can in principle be used for first-order equations however complicated
they may be.
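Computing the direction indicators is purely mechanical, so it is easily delegated to a computer. The following minimal Python sketch, in which the function name `lineal_elements` and the grid and segment length are hypothetical choices, evaluates the slope f(a, b) at each grid point and turns it into a short segment of fixed length centred on (a, b), as drawn in Fig. 22.1a. It uses f(x, y) = xy as in the text and stops short of plotting; the segments could be passed to any plotting library.

```python
import math

def f(x, y):
    """Right-hand side of dy/dx = f(x, y); here the example f = xy."""
    return x * y

def lineal_elements(f, xs, ys, length=0.2):
    """Return segments ((x1, y1), (x2, y2)), one for each grid point (a, b).

    Each segment is centred on (a, b), has the given length, and has slope f(a, b).
    """
    segments = []
    for a in xs:
        for b in ys:
            slope = f(a, b)
            norm = math.hypot(1.0, slope)   # length of the direction vector (1, slope)
            dx = 0.5 * length / norm
            dy = 0.5 * length * slope / norm
            segments.append(((a - dx, b - dy), (a + dx, b + dy)))
    return segments

grid = [0.5 * i for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
segments = lineal_elements(f, grid, grid)
print(len(segments))   # 81 segments for a 9 x 9 grid
```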

Fig. 22.1 Lineal-element diagram indicating solution curves for dy/dx = xy. (a) The lineal element through P: (a, b). (b) Solution curves.

Rather than placing the direction indicators at grid points as in Fig. 22.1, it is often easier to look for curves, called isoclines, along which the slope is constant, as in the following example.

Example 22.1 Sketch the solution curves of dy/dx = x − y.
Here dy/dx takes constant values K on the isoclines x − y = K, or y = x − K. For example, dy/dx = 0 on y = x, dy/dx = 1 on y = x − 1, and so on. If we draw the line y = x − K, then the indicators along it are all parallel, with slope K, so it is easier to draw a large number of them. Figure 22.2 is constructed in this way. (This equation is in fact linear with constant coefficients, its solutions being $y = x - 1 + Ce^{-x}$.)

Fig. 22.2 Solution curves of dy/dx = x − y in the first quadrant. Broken lines: isoclines, values of K indicated; solid lines: solution curves.
Fig. 22.3 Pattern of solution curves of dy/dx = x² + y² in the first quadrant. Broken lines: isoclines; solid lines: solution curves.

Example 22.2 Sketch the solution curves of dy/dx = x² + y².
The isoclines are the circles x² + y² = K (see Fig. 22.3), on each of which the slope is equal to K (which must be a positive number here).

Closed curves can occur, as in the following example.

Example 22.3 Sketch the solution curves of dy/dx = −x/y.

Fig. 22.4 The solution curves of dy/dx = −x/y, with the isoclines K = 1 and K = −1 labelled. Broken lines: isoclines; solid lines: solution curves.

The isoclines having slope K are the radial straight lines −x/y = K (see Fig. 22.4), or
$$y = -\frac{1}{K}x.$$
Thus, for example, if K = −1, the corresponding isocline is y = x, and solutions cut this straight line with slope −1 as shown in the figure. On y = 0, the slope K must be infinite, so the direction indicators are vertical.
The method illustrates why there is always an infinite number of solutions: there will be a single solution curve through every point where f(x, y) has a definite value. The type of exception that might arise is illustrated by the case $f(x, y) = (xy)^{1/2}$, which only has a meaning when x and y have the same sign; there are solutions in only the first and third quadrants of the (x, y) plane. Again, there can be points from which several solution curves emanate: this occurs at the origin in Example 22.3, where f(x, y) takes the indeterminate form 0/0; nothing can be taken for granted at such a point, but elsewhere the curves do not intersect.
To prescribe a point (a, b) through which a curve must pass is equivalent to imposing an initial condition on the solution: the corresponding initial condition would read ‘Find the solution for which y = b when x = a’. Therefore, it can be seen that even when an equation is not linear an initial condition of this sort will give exactly one solution, points where f(a, b) is indeterminate being excepted.

Self-test 22.1
Sketch the isoclines of the differential equation dy/dx = x2 − y2. Using the
isoclines as a guide, sketch solution curves of the equation.

22.2 The Euler method for numerical solution


For the equation dy/dx = f(x, y),
consider an adaptation of the graphical method described in the previous section.
Start at any point P0 : (x0, y0), and draw an indicator with slope f(x0, y0) from P0 to
P1 : (x1, y1) a short distance away (see Fig. 22.5). Then P1 will lie close to the solu-
tion curve through P0. Do the same thing starting with P1, and so on, continuing
as far as is necessary. It is also possible to proceed backwards from P0. Provided
that the steps are small enough, it seems likely that P0, P1, P2, … will be close to the
solution curve through P0, so we have an approximate solution to the initial-value
problem: Find the solution of dy/dx = f(x, y) for which y = y0 when x = x0.
Obviously, to draw the solution curve in this way is not really practicable;
but we need not actually draw it, because the same process can be carried out
numerically as follows.
As shown in Fig. 22.6, choose a small, constant step length h in x for going from point to point: P0 → P1 → P2 → …, where the vectors $\overrightarrow{P_0P_1}$, $\overrightarrow{P_1P_2}$, $\overrightarrow{P_2P_3}$, … point in the direction of the indicators at their respective starting points P0, P1, P2, …. Corresponding to the x steps, call the y steps k1, k2, k3, …:
$$y_1 = y_0 + k_1, \quad y_2 = y_1 + k_2, \quad y_3 = y_2 + k_3, \quad \ldots,$$
where
$$k_1 = hy'(x_0) = hf(x_0, y_0), \quad k_2 = hy'(x_1) = hf(x_1, y_1), \quad \ldots.$$

Fig. 22.5 Step-by-step use of direction indicators along a particular solution curve.
Fig. 22.6 Three steps in the numerical solution of dy/dx = f(x, y), starting at the point P0: (x0, y0).

Therefore
$$\text{for } P_1:\quad x_1 = x_0 + h, \quad y_1 = y_0 + hf(x_0, y_0);$$
$$\text{for } P_2:\quad x_2 = x_1 + h, \quad y_2 = y_1 + hf(x_1, y_1);$$
and, in general, with n = 1, 2, 3, … in turn,
$$\text{for } P_n:\quad x_n = x_{n-1} + h, \quad y_n = y_{n-1} + hf(x_{n-1}, y_{n-1}).$$
We expect that the points will be close to the solution curve. This is the Euler
method for approximating to the solution of the differential equation.

Euler method for initial-value problems


Differential equation: dy/dx = f(x, y).
Initial condition: y = b when x = a.
Approximate solution: Put x0 = a, y0 = b; then
$$x_n = x_{n-1} + h, \qquad y_n = y_{n-1} + f(x_{n-1}, y_{n-1})h \tag{22.2}$$
for n = 1, 2, … successively.

This recipe, or algorithm, describes a step-by-step repetitive process, or iteration;


essentially the same thing has to be done over and over again. The procedure
which produces a new (x, y) from the preceding (x, y) is called a recurrence
relation (compare Newton’s method for solving equations in Chapter 4). Such a
process is easy to program on a computer, and Fig. 22.7 is the skeleton of a flow
diagram. The program should contain a method for stopping itself when x has
gone far enough; also, since small intervals h are usually necessary, it is useful
to include a means of recording only the results for preset values of x to avoid
voluminous output.
Fig. 22.7 Flow diagram for the initial-value problem: input a, b, h; set x = a, y = b; write x, y; replace y by y + f(x, y)h and x by x + h (using the old x in f); return to the write step.
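The flow diagram in Fig. 22.7 translates almost line for line into code. The Python sketch below is one possible version, not the book's own program; the function name `euler` and its parameters `x_end` (the stopping point) and `record_every` (how often a point is written out) are assumptions introduced to reflect the two practical requirements mentioned in the text: the program stops itself, and it records only selected results.

```python
def euler(f, a, b, h, x_end, record_every=1):
    """Euler approximation (22.2) to dy/dx = f(x, y) with y = b when x = a.

    Takes steps of length h until x reaches x_end (assumed greater than a),
    keeping only every `record_every`-th point to limit the output.
    Returns a list of (x, y) pairs, starting with (a, b).
    """
    n_steps = round((x_end - a) / h)   # stopping rule: nearest whole number of steps
    x, y = a, b
    results = [(x, y)]
    for n in range(1, n_steps + 1):
        y = y + f(x, y) * h            # slope taken at the old point (x, y)
        x = a + n * h                  # recomputed from n to avoid accumulated rounding
        if n % record_every == 0:
            results.append((x, y))
    return results

# Usage (hypothetical test problem): dy/dx = y, y(0) = 1, exact solution e^x.
points = euler(lambda x, y: y, a=0.0, b=1.0, h=0.1, x_end=1.0, record_every=2)
for x, y in points:
    print(f"{x:.1f}  {y:.6f}")
```

For this test problem each step multiplies y by 1 + h, so the final value is 1.1¹⁰ ≈ 2.5937, compared with e ≈ 2.7183; the gap shrinks as h is reduced.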

Example 22.4 Use the Euler method to obtain a solution of the initial-value problem
$$\frac{dy}{dx} = xy^2, \quad\text{with } y = 1 \text{ at } x = 0,$$
between x = 0 and x = 1. Compare the result with the exact solution $y = (1 - \frac12x^2)^{-1}$ when steps of h = 0.2, 0.1, 0.01, and 0.001 are adopted.
From (22.2), the first few terms are given by:
$$x_1 = h,\ y_1 = 1; \qquad x_2 = 2h,\ y_2 = 1 + h^2; \qquad x_3 = 3h,\ y_3 = (1 + h^2) + 2h^2(1 + h^2)^2;$$
and so on.
The following results are obtained; the entries give y between x = 0 and x = 1 with
various values of step lengths h.

x 0 0.2 0.4 0.6 0.8 1.0

Exact y 1.0000 1.0204 1.0870 1.2195 1.4706 2.0000


h = 0.2 1.0000 1.0000 1.0400 1.1265 1.2788 1.5405
h = 0.1 1.0000 1.0100 1.0623 1.1687 1.3601 1.7129
h = 0.01 1.0000 1.0193 1.0843 1.2139 1.4576 1.9618
h = 0.001 1.0000 1.0203 1.0867 1.2189 1.4693 1.9960
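The rows of this table can be regenerated in a few lines. This sketch assumes the table lists the Euler values at x = 0.2, 0.4, …, 1.0; the helper names `euler_table` and `exact` are hypothetical. It prints each row for the four step lengths, followed by the exact values from $y = (1 - \frac12x^2)^{-1}$, so the convergence as h decreases can be watched directly.

```python
def f(x, y):
    """Right-hand side of Example 22.4: dy/dx = x y^2."""
    return x * y ** 2

def exact(x):
    """Exact solution y = 1 / (1 - x^2/2)."""
    return 1.0 / (1.0 - 0.5 * x ** 2)

def euler_table(h, x_points=(0.2, 0.4, 0.6, 0.8, 1.0)):
    """Euler values (22.2) with y(0) = 1, reported at the given x values."""
    targets = {round(xp / h): xp for xp in x_points}   # step number -> x value
    x, y = 0.0, 1.0
    row = {}
    for n in range(1, max(targets) + 1):
        y += f(x, y) * h      # uses the slope at the old point (x, y)
        x = n * h
        if n in targets:
            row[targets[n]] = y
    return row

for h in (0.2, 0.1, 0.01, 0.001):
    row = euler_table(h)
    print(f"h = {h:<6}", " ".join(f"{row[xp]:.4f}" for xp in sorted(row)))
print("exact     ", " ".join(f"{exact(xp):.4f}" for xp in (0.2, 0.4, 0.6, 0.8, 1.0)))
```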

Euler’s method is very simple; but it is usually good enough to provide reason-
able accuracy over a finite range, provided that small enough intervals are used.
The simplest way of checking accuracy is to experiment with successively smaller
intervals h, noting when further reduction in h does not change the values of y
obtained at the number of decimal places required. Several problems on these
lines are given at the end of the chapter.
There exist, however, far more sophisticated algorithms which will give great
accuracy over long ranges without having to use minute values of h (which can
introduce problems of its own). The computer programs for such methods can be
found in libraries of computer routines. For example, the software Mathematica
has a program for the numerical solution and plotting of initial-value problems:
see the projects in Chapter 42. The theoretical side of the subject is called numer-
ical analysis; mathematical theory makes it possible, for example, to estimate the
size of interval required without carrying out trials.

22.3 Nonlinear equations of separable type


The equation
$$\frac{dy}{dx} = \frac{y^2}{x^2}$$
is nonlinear (note the y² term), and none of the theory of Chapters 18 and 19 can be adapted to solve it. Write it in the form
$$\frac{dy}{y^2} = \frac{dx}{x^2}.$$
On the left only y appears, and on the right only x appears. The form looks like an invitation to integrate both sides:
$$\int\frac{dy}{y^2} = \int\frac{dx}{x^2} + C,$$
so −1/y = −1/x + C. Therefore
$$y = \frac{x}{1 - Cx},$$
where C is arbitrary. You should check that these really are solutions by substituting them into the equation. Notice that there is no sign of the complementary functions and particular solutions found for linear equations: an arbitrary constant C does occur, but it is embedded deep in the expression. The solution curves are shown in Fig. 22.8: each curve has its individual asymptotes, namely the lines $y = -C^{-1}$, $x = C^{-1}$. The solution has two branches for each C. For example,
if C = 1, then the solution can be expressed as
(y + 1)(1 − x) = 1,
which has two branches as shown. The curves are hyperbolas.
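The substitution check can also be done numerically. The sketch below is an illustrative spot check, not a proof: it estimates dy/dx for y = x/(1 − Cx) by a central difference and compares it with y²/x² at a few sample points, chosen (as an assumption) to avoid x = 0 and the vertical asymptote x = 1/C.

```python
def y(x, C):
    """Proposed solution y = x / (1 - C x) of dy/dx = y^2 / x^2."""
    return x / (1 - C * x)

def residual(x, C, step=1e-5):
    """Central-difference estimate of dy/dx minus y^2/x^2 (close to zero for a solution)."""
    dydx = (y(x + step, C) - y(x - step, C)) / (2 * step)
    return dydx - y(x, C) ** 2 / x ** 2

checks = [(x, C) for x in (0.3, 0.5, 2.0, -1.5) for C in (-2.0, 0.0, 1.0)]
worst = max(abs(residual(x, C)) for x, C in checks)
print(worst)   # small: only discretization and rounding error remain
```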
Fig. 22.8 Solution curves y = x/(1 − Cx) for dy/dx = y²/x². Values of C indicated on the curves.

The method is called separation of variables. It can be applied to equations which are separable, that is to say, ones that can be arranged in the form
$$\frac{dy}{dx} = g(x)h(y),$$
where the right-hand side is the product of two terms, one a function of x only, and the other a function of y only. (Alternatively, you might see it more easily as an equation which can be put into the form
$$Y(y)\,dy = X(x)\,dx,$$
X(x) and Y(y) being functions respectively of x only and y only.) For example, the
equations
dy dy y dy
= x2 y2, = ex sin y, = cos(y2 )
1 1

dx dx x dx
are of the right type.

Separation of variables
Equation type: dy/dx = g(x)h(y).
Separate the terms: dy/h(y) = g(x) dx.
Integrate: $\displaystyle\int\frac{dy}{h(y)} = \int g(x)\,dx + C$, so that y is expressed as a function of x (usually an implicit function). C may take a range of values. (22.3)

Example 22.5 Find solutions of the equation y dy/dx = cos x.
This can be written y dy = cos x dx. By integrating both sides, we obtain ½y² = sin x + C, giving $y = \pm 2^{1/2}(\sin x + C)^{1/2}$, where C is only to a certain extent arbitrary. It cannot be completely arbitrary; for example, if C = −100, then sin x + C will always be negative (because −1 ≤ sin x ≤ 1), so the square root never has a real value. We must have C ≥ −1 to get any real solution. If −1 < C < 1 there are regularly spaced intervals on which sin x + C ≥ 0, giving the oval curves in Fig. 22.9. If C > 1, then sin x + C > 0 for all x, giving the wavy phase paths.

Fig. 22.9 Solution curves $y = \pm 2^{1/2}(\sin x + C)^{1/2}$ for the equation y(dy/dx) = cos x.

Example 22.6 Find solutions of $\dfrac{dy}{dx} = \dfrac{y(x + 1)}{x(y + 1)}$.
After separating, we have
$$\int\frac{1 + y}{y}\,dy = \int\frac{1 + x}{x}\,dx + C,$$
or
$$\int\left(1 + \frac{1}{y}\right)dy = \int\left(1 + \frac{1}{x}\right)dx + C.$$
Therefore y + ln|y| = x + ln|x| + C, or $ye^y = Axe^x$, where A is arbitrary.
We cannot further reduce this ‘solution’ to express y explicitly in terms of x. The ‘answer’ is nearly as obscure as the original equation. More intelligible information about the solutions could be obtained by using the graphical or numerical methods of Sections 22.1 and 22.2.

The separation-of-variables technique requires initiative, even in the simplest cases, as can be seen from the following example.

Example 22.7 Find solutions of $\dfrac{dy}{dx} = 2y^{1/2}$.
After separating, we have
$$\int\tfrac12y^{-1/2}\,dy = \int dx, \quad\text{or}\quad y^{1/2} = x + C.$$
To express y in terms of x, square both sides. We get
$$y = (x + C)^2.$$

Fig. 22.10 (a) y = (x + C)² for various C. (b) The solutions of the differential equation, consisting only of the right-hand branches of the parabolas, y = (x + C)² for x ≥ −C. (Note: y(x) = 0 is also a valid solution.)

This represents a family of parabolas, as shown in Fig. 22.10a. But it cannot be right: the curves cross at every point, although dy/dx has only one value at any point. In fact, since $y^{1/2} \geq 0$, only nonnegative values of y′(x) are legitimate, and this gives the right-hand branches, shown in Fig. 22.10b.

We also lost a solution, namely y(x) ≡ 0. This is connected with the fact that, for this solution, we in effect divided by zero when we first separated the equation.

The production of pseudosolutions and the non-appearance of certain singular solutions in the final formula, as in Example 22.7, is a problem which constantly arises in nonlinear differential equations.
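The difference between true solutions and pseudosolutions in Example 22.7 shows up at once on substitution. In this sketch the value C = 1 and the sample points are assumptions. The residual dy/dx − 2y^{1/2} is evaluated for y = (x + C)², using its exact derivative 2(x + C): it vanishes on the right-hand branch x ≥ −C, is nonzero on the left-hand branch, and the lost solution y ≡ 0 gives zero residual everywhere.

```python
import math

def residual_parabola(x, C):
    """dy/dx - 2*sqrt(y) for y = (x + C)^2, using the exact derivative 2(x + C)."""
    y = (x + C) ** 2
    return 2 * (x + C) - 2 * math.sqrt(y)

def residual_zero(x):
    """dy/dx - 2*sqrt(y) for the lost solution y(x) = 0, whose derivative is also 0."""
    y, dydx = 0.0, 0.0
    return dydx - 2 * math.sqrt(y)

C = 1.0
right_branch = [residual_parabola(x, C) for x in (-1.0, -0.5, 0.0, 2.0)]   # x >= -C
left_branch = [residual_parabola(x, C) for x in (-1.5, -2.0, -4.0)]        # x < -C
print(right_branch)   # [0.0, 0.0, 0.0, 0.0]: genuine solutions
print(left_branch)    # [-2.0, -4.0, -12.0]: pseudosolutions
```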

Self-test 22.2
Show that the equation
$$\frac{dy}{dx} = \frac{x^2 - y^2}{xy}$$
can be transformed into a separable equation in v and x by using the change of variable y = vx. Hence solve the equation.

22.4 Differentials and the solution of first-order equations
In this section, it is important to distinguish between an identity such as d(y²)/dx = 2y dy/dx, which is true for any y(x) (it is just a special case of the chain rule), and an equation such as dy/dx = xy, which will only be true for special functions y(x).
Take an identity such as
$$\frac{d}{dx}x^2 = 2x, \tag{22.4}$$
and consider another way of writing it:
$$d(x^2) = 2x\,dx. \tag{22.5}$$
It is as if we formally multiply (22.4) by dx to obtain (22.5). Conversely, if we divide (22.5) by dx, we recover (22.4). We have already used this process to help to change the variable in an integral in Section 17.1.
Now consider a more complicated identity, obtained from the product rule for differentiation (y(x) represents any function of x):
$$\frac{d}{dx}(xy) = y + x\frac{dy}{dx}. \tag{22.6}$$
The parallel expression of the same identity, obtained as before, is
$$d(xy) = y\,dx + x\,dy. \tag{22.7}$$

Given either one of them, we can immediately construct the other, so we shall
regard such pairs of expressions as being simply different ways of writing the
same thing. In effect this is what we did when carrying out the separation-of-
variables process for differential equations in Section 22.3, and we are leading up
to a generalization of this method.
In general, a differential expression or differential form has the shape
P(x, y) dx + Q(x, y) dy, (22.8)

where P(x, y) and Q(x, y) are two functions of x and y. In (22.5), we had P(x, y) = 2x
and Q(x, y) = 0; in (22.7), we had P(x, y) = y and Q(x, y) = x. The symbols on the
left of (22.5) and (22.7), d(x2) and d(xy), are called the differentials of x2 and xy
respectively.
The table (22.9) (below) gives a list of useful identities written in the usual form and the differential form for comparison.
$$\begin{array}{ll}
\text{Standard form} & \text{Differential form}\\[4pt]
\dfrac{d}{dx}(C) = 0 \ (C \text{ constant}) & dC = 0\\[6pt]
\dfrac{d}{dx}(x^2) = 2x & d(x^2) = 2x\,dx\\[6pt]
\dfrac{d}{dx}(y^2) = 2y\dfrac{dy}{dx} & d(y^2) = 2y\,dy\\[6pt]
\dfrac{d}{dx}(xy) = x\dfrac{dy}{dx} + y & d(xy) = y\,dx + x\,dy\\[6pt]
\dfrac{d}{dx}\left(\dfrac{y}{x}\right) = \dfrac{1}{x^2}\left(x\dfrac{dy}{dx} - y\right) & d\left(\dfrac{y}{x}\right) = -\dfrac{1}{x^2}(y\,dx - x\,dy)\\[6pt]
\dfrac{d}{dx}\left(\dfrac{x}{y}\right) = \dfrac{1}{y^2}\left(y - x\dfrac{dy}{dx}\right) & d\left(\dfrac{x}{y}\right) = \dfrac{1}{y^2}(y\,dx - x\,dy)\\[6pt]
\dfrac{d}{dx}\left(\ln\dfrac{y}{x}\right) = \dfrac{1}{xy}\left(x\dfrac{dy}{dx} - y\right) & d\left(\ln\dfrac{y}{x}\right) = -\dfrac{1}{xy}(y\,dx - x\,dy)
\end{array} \tag{22.9}$$

Differential forms can be manipulated. For example:


(i) 2x dx − y dy = d(x²) − d(½y²) = d(x² − ½y²).
(ii) (x + y) dx + x dy = x dx + (y dx + x dy) = d(½x²) + d(xy) = d(½x² + xy).
(iii) If u(x) and v(x) are two functions, then
d(uv) = u dv + v du,
which is the product rule for derivatives in differential form. These results are all
identities; that is, true for all functions y(x), u(x), v(x).

Example 22.8 Put d(x 3 + x sin y) into the form P dx + Q dy.


We have
$$d(x^3 + x\sin y) = d(x^3) + d(x\sin y).$$
For the first term, (d/dx)x³ = 3x², and in differential form this becomes d(x³) = 3x² dx. For the second term, we can use the product rule in the form (iii) above (or write it in standard form first):
$$d(x\sin y) = \sin y\,dx + x\,d(\sin y),$$
followed by the chain rule:
$$d(x\sin y) = \sin y\,dx + x\cos y\,dy.$$
Finally,
$$d(x^3 + x\sin y) = (3x^2 + \sin y)\,dx + (x\cos y)\,dy,$$
so that, in (22.8), P(x, y) = 3x² + sin y and Q(x, y) = x cos y.
First-order differential equations can be written alternatively as differential forms. The simplest is the equation
$$\frac{dy}{dx} = 0,$$
which has solutions y(x) = C, where C is any constant. In differential form, the equation becomes
$$dy = 0,$$
and the solutions are compatible with the first entry in the table (22.9).

Example 22.9 Find solutions of the equation $\dfrac{dy}{dx} = -\dfrac{x^3}{y^2}$.
In differential form, this becomes
$$x^3\,dx + y^2\,dy = 0.$$
But
$$x^3\,dx + y^2\,dy = d(\tfrac14x^4) + d(\tfrac13y^3) = d(\tfrac14x^4 + \tfrac13y^3).$$
This will be zero if
$$\tfrac14x^4 + \tfrac13y^3 = C,$$
where C is, in this case, any constant. The equation is in fact separable; you should compare this with the process in Section 22.3.

Example 22.10 Find solutions of $\dfrac{dy}{dx} = \dfrac{x - y}{x + y}$.
In differential form, this becomes
$$0 = (x - y)\,dx - (x + y)\,dy = x\,dx - y\,dx - x\,dy - y\,dy.$$
Try to rearrange it so that recognizable forms appear:
$$0 = x\,dx - y\,dy - (y\,dx + x\,dy) = d(\tfrac12x^2) - d(\tfrac12y^2) - d(xy) = d(\tfrac12x^2 - \tfrac12y^2 - xy).$$
This differential will be zero as required if
$$\tfrac12x^2 - \tfrac12y^2 - xy = C,$$
where C is the ‘variable constant’, or parameter, which will generate a whole family of solutions.
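One way to confirm the rearrangement in Example 22.10 is to check that the rates of change of $F(x, y) = \frac12x^2 - \frac12y^2 - xy$ with respect to x (y held fixed) and y (x held fixed) reproduce the coefficients x − y and −(x + y) of the differential form. The sketch below does this with central differences at a few arbitrary points; it is a numerical spot check under those assumptions, not a proof, and the helper names are hypothetical.

```python
def F(x, y):
    """Candidate function with dF = (x - y) dx - (x + y) dy (Example 22.10)."""
    return 0.5 * x ** 2 - 0.5 * y ** 2 - x * y

def P(x, y):
    return x - y        # coefficient of dx

def Q(x, y):
    return -(x + y)     # coefficient of dy

def partials(F, x, y, h=1e-6):
    """Central-difference rates of change of F in x (y fixed) and in y (x fixed)."""
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return Fx, Fy

points = [(0.5, 1.0), (2.0, -1.0), (-1.5, 0.25)]
errors = []
for x, y in points:
    Fx, Fy = partials(F, x, y)
    errors.append(max(abs(Fx - P(x, y)), abs(Fy - Q(x, y))))
print(max(errors))   # tiny: F is quadratic, so only rounding error remains
```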

In the previous example, the terms were rearranged in a search for a group like
y dx + x dy that would simplify, in that case, to d(xy). If a differential form can be
expressed identically (that is to say, for all y(x)) in the form of a single differential
P(x, y) dx + Q(x, y) dy ≡ dF(x, y), (22.10)

where F(x, y) is a fixed function of x and y, then it is called a perfect differential form, or a perfect differential. Usually this is impossible. For example, consider the differential form y dx. It can be proved that there does not exist any fixed function F(x, y) such that
$$y(x)\,dx = dF(x, y(x)) \quad \text{for every } y(x)$$
(try looking for one).


To solve differential equations by using differentials, we may search for a perfect
differential in order to be able to conclude with the steps
‘dF(x, y) = 0; therefore F(x, y) = C’,
as in the earlier Examples. But if a perfect differential is not already present, we
might be able to produce one by multiplying through by a suitable function, called
an integrating factor for the expression. For example, y dx − x dy is not a perfect differential; but, from the table (22.9),
$$\frac{1}{x^2}(y\,dx - x\,dy) = d\left(-\frac{y}{x}\right),$$
so the new expression is a perfect differential.

Perfect differential forms


Let y be an arbitrary function of x. Then P(x, y) dx + Q(x, y) dy is a perfect
differential if it can be written as
P(x, y) dx + Q(x, y) dy ≡ dF(x, y),
where F(x, y) is a fixed function of x and y. (22.11)

Integrating factor for differential forms


A function I(x, y) is an integrating factor for the differential form
P dx + Q dy if I(P dx + Q dy) is a perfect differential. (22.12)

It can be proved that every differential form has an integrating factor, but only
occasionally is it easy to see one. Examples 22.11 and 22.12 show cases that are
amenable.

Example 22.11 Find a family of solutions of $x\dfrac{dy}{dx} = y$.
(This is a linear equation and it is also separable, so we have two other methods for
solving it.) In differential form:
y dx − x dy = 0,
and we cannot do anything with the left-hand side as it stands. However, the remark
above suggests we multiply by the integrating factor 1/x2, obtaining
0 = (1/x2)(y dx − x dy) = d(−y/x).
Therefore
− y/x = C, or y = −Cx,
are the solutions, as is easily confirmed.
There are other possibilities; for example (see (22.9)), we might divide by y 2 or xy.
In the end these lead to the same set of solutions.
Note that, in Example 22.11, the equation is linear:
$$\frac{dy}{dx} + \left(-\frac{1}{x}\right)y = 0,$$
and so we can alternatively use the method of Section 19.5. An ‘integrating factor’ I(x) is also used there: it is
$$I(x) = e^{-\int x^{-1}\,dx} = e^{-\ln x + C} = 1/x$$
(where we choose C = 0 for simplicity), which is different from, though related to, the ones which work for the differential form y dx − x dy above.

Example 22.12 Find a set of solutions of $x\dfrac{dy}{dx} = y + y^2x$.
Equivalently, y dx − x dy + y²x dx = 0. The first two terms cannot be written as dF(x, y). The table (22.9) offers three integrating factors, $x^{-2}$, $y^{-2}$, and $(xy)^{-1}$, to choose from. It is, however, also necessary to be able to manage the remaining term, y²x dx, after multiplying by the integrating factor, so we choose $y^{-2}$, which gives
$$0 = \frac{1}{y^2}(y\,dx - x\,dy) + x\,dx = d\left(\frac{x}{y}\right) + d(\tfrac12x^2) = d\left(\frac{x}{y} + \tfrac12x^2\right).$$
Therefore x/y + ½x² = C, or y = x/(C − ½x²), are solutions.
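The solutions found in Example 22.12 can be spot-checked by substitution. In this sketch the sample values of x and C are assumptions, chosen to keep away from the zeros of C − ½x²; dy/dx is estimated by a central difference, and the residual x dy/dx − (y + y²x) should come out close to zero.

```python
def y(x, C):
    """Proposed solution y = x / (C - x^2/2) from Example 22.12."""
    return x / (C - 0.5 * x ** 2)

def residual(x, C, h=1e-5):
    """x dy/dx - (y + y^2 x), with dy/dx from a central difference."""
    dydx = (y(x + h, C) - y(x - h, C)) / (2 * h)
    yx = y(x, C)
    return x * dydx - (yx + yx ** 2 * x)

samples = [(x, C) for x in (0.5, 1.0, -0.7) for C in (2.0, 3.0, -1.0)]
worst = max(abs(residual(x, C)) for x, C in samples)
print(worst)   # close to zero for every sample
```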

Self-test 22.3
Find the general solution of
$$(x^2 + 2xy + x\cos xy) + (2xy + y^2 + y\cos xy)\frac{dy}{dx} = 0.$$

22.5 Change of variable in a differential equation


Occasionally we can find a change of variable, or substitution, which will simplify
a differential equation. Some general types are given here for illustration.

Equations not involving y


If a differential equation contains
x, dy/dx, d²y/dx², …,
but not y, introduce the new dependent variable w = dy/dx in place of y, producing an equation for w of lower order. (22.13)

To see how this works, consider the following Example.


Example 22.13 Find solutions of $\dfrac{d^2y}{dx^2} + \left(\dfrac{dy}{dx}\right)^2 = 0$.
The variable $y$ is not independently present, so put
$$w = \frac{dy}{dx}.$$
Then $d^2y/dx^2 = dw/dx$, so that the equation becomes
$$\frac{dw}{dx} + w^2 = 0.$$
This is a separable equation, and the method of Section 22.3 gives
$$-\int\frac{dw}{w^2} = \int dx, \quad\text{or}\quad \frac{1}{w} = x + A,$$
where $A$ is constant, so that
$$w = \frac{1}{x + A}.$$
For the second stage, remember that $w = dy/dx$, so we have
$$\frac{dy}{dx} = \frac{1}{x + A}.$$
Therefore
$$y = \ln|x + A| + B,$$
where $A$ and $B$ are constants which we see, in retrospect, may be chosen entirely arbitrarily.
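The two stages of Example 22.13 can also be followed numerically. The Python sketch below makes some illustrative assumptions: $A = 1$, $B = 0$, the interval $0 \le x \le 2$ and the number of Euler steps. It steps the reduced equation $dw/dx = -w^2$ and, at the same time, accumulates $y$ from $dy/dx = w$. It then compares both results with the closed forms $w = 1/(x + A)$ and $y = \ln|x + A| + B$.

```python
# Sketch of the substitution w = dy/dx for y'' + (y')**2 = 0 (Example 22.13).
# Both first-order equations w' = -w**2 and y' = w are advanced together by
# Euler's method.  A = 1, B = 0, x_end and n are illustrative choices.
import math

def solve(A=1.0, B=0.0, x_end=2.0, n=20000):
    """Euler steps for w' = -w**2 and y' = w, starting at x = 0."""
    h = x_end / n
    x, w, y = 0.0, 1.0 / A, math.log(abs(A)) + B   # exact starting values
    for _ in range(n):
        # the right-hand side uses the old w for both updates (Euler's method)
        x, w, y = x + h, w - h * w**2, y + h * w
    return x, w, y

x, w, y = solve()
print(w, 1.0 / (x + 1.0))      # numerical w against 1/(x + A) with A = 1
print(y, math.log(x + 1.0))    # numerical y against ln|x + A| + B with B = 0
```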

Sometimes it is possible to change the dependent variable $y$ into something else to obtain a more manageable equation:

Equation of the form $\dfrac{dy}{dx} = f\left(\dfrac{y}{x}\right)$
Change to a new dependent variable $v$ by
$$v = y/x$$
and solve the resulting separable equation. (To make the change write $y = xv$, so that $dy/dx = x\,dv/dx + v$.) (22.14)

Example 22.14 Find solutions of $\dfrac{dy}{dx} = \dfrac{3y - x}{3x - y}$.
This equation can be written in the form
$$\frac{dy}{dx} = \frac{3y/x - 1}{3 - y/x},$$
which has the form $f(y/x)$, so change the dependent variable from $y$ to $v = y/x$. To obtain $dy/dx$ in terms of $v$, write $y(x) = xv(x)$. Then $dy/dx = x\,dv/dx + v$, and in terms of $v$ the equation becomes
$$x\frac{dv}{dx} + v = \frac{3v - 1}{3 - v}, \quad\text{or}\quad x\frac{dv}{dx} = \frac{v^2 - 1}{3 - v}.$$
This new equation is separable (Section 22.3) (a separable equation will always be obtained at this stage). Following (22.3):
$$\int\frac{1}{x}\,dx = \int\frac{3 - v}{v^2 - 1}\,dv = \int\left(\frac{1}{v - 1} - \frac{2}{v + 1}\right)dv.$$
Therefore $\ln|(v - 1)/(v + 1)^2| = \ln|x| + C$, where $C$ is an arbitrary constant, so that $(v - 1)/(v + 1)^2 = cx$ with $c = \pm e^C$. Returning to $y$ by putting $v = y/x$, the left-hand side becomes $x(y - x)/(y + x)^2$. Cancelling the factor $x$ gives
$$\frac{y - x}{(y + x)^2} = c. \tag{22.15}$$
The solution curves are shown in Fig. 22.11, plotted by working directly from the differential equation and using a numerical method (see Section 22.2).

Fig. 22.11 Solution curves for $\dfrac{dy}{dx} = \dfrac{3y - x}{3x - y}$. Notice the special solutions $y = \pm x$; the line $y = -x$ is marked as a singular solution.
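Fig. 22.11 was produced by working numerically from the differential equation. The Python sketch below illustrates that idea; it is not the program used for the figure. It follows the solution through $(1, 0)$ with Euler's method and checks that the combination $(y - x)/(y + x)^2$ in (22.15) stays near its starting value $c = -1$. The starting point, end point and step count are arbitrary choices. Along this curve $3x - y$ stays positive, so the right-hand side remains finite.

```python
# Follow the curve of Example 22.14 through (1, 0) by Euler's method and
# monitor c = (y - x)/(y + x)**2 from (22.15), which should stay near -1.
import math

def f(x, y):
    """Right-hand side of dy/dx = (3y - x)/(3x - y)."""
    return (3 * y - x) / (3 * x - y)

def euler(x, y, x_end, n):
    """n Euler steps from (x, y) to x = x_end."""
    h = (x_end - x) / n
    for _ in range(n):
        y += h * f(x, y)
        x += h
    return x, y

x, y = euler(1.0, 0.0, 2.0, 10000)
c = (y - x) / (y + x) ** 2

# Exact point on the same curve: with u = y + x the curve is y - x = -u**2,
# so x = (u + u**2)/2, y = (u - u**2)/2, and x = 2 gives u**2 + u - 4 = 0.
u = (-1 + math.sqrt(17)) / 2
print(f"Euler: y(2) = {y:.5f}, c = {c:.5f};  exact y(2) = {(u - u**2) / 2:.5f}")
```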

The methods of separation of variables, differentials, substitutions, etc., used to solve nonlinear equations are rather hazardous. In Example 22.14 there are two solutions which do not appear in (22.15). They are represented by the straight line segments $y = -x$ for $x > 0$ and $x < 0$ in Fig. 22.11, and correspond to the limiting case of infinite $c$. Therefore (22.15) is not a truly general solution. These extra singular solutions can be found independently in this case by trying the form $y = mx$ in the equation and solving the resulting quadratic for $m$.
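With $y = mx$, the slope is $m$ and the right-hand side becomes $(3m - 1)/(3 - m)$. Equating them gives $m(3 - m) = 3m - 1$, that is $m^2 - 1 = 0$. The short Python sketch below solves this quadratic with the usual formula and confirms that both roots satisfy the slope condition.

```python
# Straight-line solutions y = m x of dy/dx = (3y - x)/(3x - y):
# m = (3m - 1)/(3 - m) reduces to the quadratic m**2 - 1 = 0.
import math

a, b, c = 1.0, 0.0, -1.0                 # coefficients of m**2 + 0*m - 1 = 0
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
print(roots)

# Each root gives a straight-line solution: check the slope condition.
for m in roots:
    assert abs(m - (3 * m - 1) / (3 - m)) < 1e-12
```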
Singular solutions are important in the theory of vibrations, population problems,
and other nonlinear fields. In general, they represent limiting cases of the ordinary
solutions; in this case, y = x is an envelope of the ordinary solutions: that is, a curve
that is tangential to each ordinary solution. This was the case with the singular
solution y(x) = 0 of Example 22.7.
The following Example illustrates another way of transforming a differential equation. In a mechanical context, the transformation of the derivative $d^2x/dt^2$ involved is called the energy transformation.
Example 22.15 The acceleration of a vehicle is constrained by its velocity: $d^2x/dt^2 = Kv^{-2}$, where $v = dx/dt$ and $K$ is a constant. Find the velocity as a function of distance if $x(0) = 0$ and $v(0) = 0$.
Transform the acceleration by the chain rule, using $x$ as the intermediate variable:
$$\frac{d^2x}{dt^2} = \frac{dv}{dt} = \frac{dv}{dx}\frac{dx}{dt} = v\frac{dv}{dx} = \frac12\frac{d(v^2)}{dx}.$$
The differential equation now relates $v$ to $x$:
$$\frac12\frac{d(v^2)}{dx} = Kv^{-2}.$$
We could put $v^2 = u$ and separate, but for the sake of adventurousness we shall leave $v^2$ as it is and turn the equation upside down:
$$2\frac{dx}{d(v^2)} = K^{-1}v^2.$$
Now integrate both sides, courageously using $v^2$ as the variable:
$$2x = \int K^{-1}v^2\,d(v^2) = \tfrac12K^{-1}(v^2)^2 + C,$$
or
$$v^4 = 4Kx - 2CK.$$
The initial condition $x(0) = v(0) = 0$ then gives $C = 0$, so that
$$v(x) = (4Kx)^{1/4}.$$
$x(t)$ is obtainable by solving the separable equation $v = dx/dt = (4Kx)^{1/4}$.
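Carrying out that last step: separate $dx/dt = (4Kx)^{1/4}$ and take the solution that leaves $x = 0$ at $t = 0$. This gives $\tfrac43x^{3/4} = (4K)^{1/4}t$, so $x(t) = [\tfrac34(4K)^{1/4}t]^{4/3}$. The Python sketch below checks this by finite differences, with $K = 2$ chosen arbitrarily. The numerical $dx/dt$ should match $v(x) = (4Kx)^{1/4}$, and the numerical $d^2x/dt^2$ should match $Kv^{-2}$.

```python
# Check of the motion in Example 22.15 (K = 2 is an arbitrary choice).
# Separating dx/dt = (4Kx)**(1/4) with x(0) = 0 gives
# x(t) = ((3/4) * (4K)**(1/4) * t)**(4/3).
K = 2.0

def x(t):
    """Distance travelled at time t."""
    return (0.75 * (4 * K) ** 0.25 * t) ** (4.0 / 3.0)

def v_of_x(xx):
    """Velocity as a function of distance, v(x) = (4Kx)**(1/4)."""
    return (4 * K * xx) ** 0.25

t, h = 1.5, 1e-4
v = (x(t + h) - x(t - h)) / (2 * h)             # numerical dx/dt
a = (x(t + h) - 2 * x(t) + x(t - h)) / h**2     # numerical d2x/dt2
print(v, v_of_x(x(t)))                          # these should agree
print(a, K / v**2)                              # acceleration against K v**-2
```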

Self-test 22.4
Find the general solution of
$$2\frac{dy}{dx} = \frac{y}{x} + \frac{y^2}{x^2}.$$

Problems
In these problems, $y'$ means $dy/dx$.

22.1 Sketch a lineal-element diagram for the solution curves of each of the following.
(a) $y' = -y$; (b) $y' = x - y$; (c) $y' = x/y$; (d) $y' = xy$; (e) $y' = -y/x$; (f) $y' = y/x$; (g) $y' = (x - 1)y$; (h) $y' = \dfrac{1}{x^2 + y^2}$; (i) $y' = \dfrac{1}{x^2 + y^2 - 1}$; (j) $y' = (1 - y^2)^{1/2}$; (k) $y' = (y/x)^{1/2}$. Make sure that not all your curves lie in the first quadrant.

22.2 (Computational). Use the Euler method to compute approximate solutions to the following initial-value problems. Try various values of the step $h$. Compare the results with the exact solutions provided.
(a) $y' = -\tfrac12y$ with $y = 1$ at $x = 0$, over the range $0 \le x \le 2$. (The exact solution is $y = e^{-x/2}$.)
(b) $y' = -x/y$ with $y = -1$ at $x = -1$, over the range $-1 \le x \le 1$. (The exact solution is $y = -(2 - x^2)^{1/2}$. Try to extend your results forward and backward (using negative $h$) to the range $-2 \le x \le 2$.)
(c) $y' = (1 - y^2)^{1/2}$, with $y = 0$ at $x = 0$, over the range $0 \le x \le \tfrac12\pi$. (The exact solution is $y = \sin x$.)

22.3 (Computational). Use the Euler method to calculate a few representative solution curves in the following cases. Each curve will have a different initial condition. You should follow each curve forwards, and probably backwards as well (negative $h$), sufficiently far to get a clear idea of how it is behaving. It might be advantageous to use smaller intervals $h$ over some sections than over others.
(a) $y' = \dfrac{y(x + 1)}{x(y + 1)}$. This refers to Example 22.6, with the 'solutions' $ye^y = Axe^x$, where $A$ is an arbitrary constant.
(b) $y' = 2y^{1/2}$. This refers to Example 22.7.
(c) $y' = (y/x)^{1/2}$ has solution curves in the first and third quadrants. Sketch a lineal-element diagram to obtain the broad pattern, then compute a few representative curves. (The general solution is $|y| = (|x|^{1/2} - C)^2$ for $|x|^{1/2} \ge C$.)

22.4 (Separation of variables). Obtain solutions of the following equations.
(a) $y' = x/y$; (b) $y' = 2x/y$; (c) $y' = x/(y + 2)$; (d) $y' = (x + 3)/(y + 2)$; (e) $y' = x^2/y^2$; (f) $y' = -x^2/y^2$; (g) $y' = y^2/x^2$; (h) $y' = -y^2/x^2$; (i) $2xy' = y^2$; (j) $yy' + x = 1$; (k) $\dfrac{dx}{dt} = 3t^2x^3$; (l) $(\sin x)\dfrac{dx}{dt} = t$; (m) $e^{x+y}\dfrac{dy}{dx} = 1$; (n) $(1 + x^2)\dfrac{dy}{dx} + (1 + y^2) = 0$, with $y(0) = -1$.

22.5 Show that the solution of the initial-value problem
$$\frac{dy}{dx} = -\frac{x}{y},$$
where $y = 1$ when $x = 2$, is obtained from the equation
$$\int_1^y u\,du = -\int_2^x v\,dv.$$
Generalize this technique to apply to the initial-value problem
$$\frac{dy}{dx} = g(x)h(y)$$
where $y = b$ when $x = a$.

22.6 Solve the following equations and sketch the solution curves. Take care to avoid spurious solutions as in Example 22.7. Look out for solutions you might have lost in the process: these are usually suggested by the sketch.
(a) $x\dfrac{dy}{dx} = 2y^{1/2}$; (b) $\dfrac{dy}{dx} = xy^{1/2}$; (c) $\dfrac{dy}{dx} = (1 - y^2)^{1/2}$; (d) $x\dfrac{dy}{dx} = (1 - y^2)^{1/2}$.

22.7 (Differential method). Obtain a family of solutions of the following equations. (Usually they must be left in implicit form.)
(a) $\dfrac{dy}{dx} = \dfrac{2x - y}{x + 2y}$ (check whether there is also a solution of the form $y = mx$);
(b) $\dfrac{dy}{dx} = \dfrac{y}{y^2 - x}$; (c) $\dfrac{dy}{dx} = \dfrac{x^2 - y}{x + y}$;
(d) $\dfrac{dy}{dx} = \dfrac{2x - y}{x - 2y}$; (e) $\dfrac{dy}{dx} = \dfrac{x - 2xy}{x^2 - y}$;
(f) $\dfrac{dy}{dx} = \dfrac{3x}{3y^2 + 1}$;
(g) $\dfrac{dy}{dx} + \dfrac{2xy}{x^2 - 1} = 0$ (this is also a linear equation);
(h) $(1 - \sin y)\dfrac{dy}{dx} + \cos x = 0$;
(i) $(1 + 3e^{3y})\dfrac{dy}{dx} = 2e^{2x} - 1$;
(j) $(e^{x+y} + 1) + (e^{x+y} - 1)\dfrac{dy}{dx} = 0$;
(k) $\dfrac{dy}{dx} = \dfrac{1 + \cos x\sin y}{1 - \sin x\cos y}$.

22.8 (Differential method). Solve the following equations. Some of these need an integrating factor (see eqn (22.12)) such as the ones suggested.
(a) $\dfrac{dy}{dx} = \dfrac{y}{x}\,\dfrac{y - 2x}{x - 2y}$ (check also for solutions of the form $y = mx$);
(b) $\dfrac{dy}{dx} = \dfrac{y(1 - x^2)}{x(1 + x^2)}$ (divide by $x^2$);
(c) $\dfrac{dy}{dx} = \dfrac{y^2}{y^2 - 1}$;
(d) $\dfrac{dy}{dx} = \dfrac{y(y - 1)}{y^2 - x}$ (divide by $y^2$);
(e) $\dfrac{dy}{dx} = \dfrac{y(x^2 + y^2 - y)}{x(x^2 + y^2)}$ (divide by $x^2y^2$);
(f) $\dfrac{dy}{dx} = \dfrac{y}{x}\,\dfrac{x^3 - y}{x^3 + y}$ (show that this reduces to $x^3y^2\,d(x/y) = y\,d(xy)$; now put $u = xy$ and $v = x/y$).
22.9 The 'logistic equation' $dP/dt = aP - bP^2$, where $a > 0$ and $b > 0$, represents the growth of a population $P(t) > 0$ in which unrestricted growth is prevented by the term $-bP^2$ representing pressure on the means of subsistence. Solve the equation, sketch the solution curves, and show that in all circumstances $P(t) \to a/b$ as $t$ tends to infinity.

22.10 (Computational). A population $P(t)$ of protozoa is assumed to increase according to the equation $dP/dt = aP - bP^2$, where $a$ and $b$ are constants. Starting with 10 protozoa, they are observed to increase by 150% per day while the numbers are still low, and to reach a fairly steady level ($dP/dt = 0$) of 25 000 after a few weeks. Find an approximation to $a$ and $b$. Use a numerical method to compute the population curve for the first 10 days. Compare it with the curve obtained from the law $dP/dt = aP - bP^4$.

22.11 (See Example 22.15.) (a) A falling body of mass $m$ is subject to air resistance equal to $Kv^\alpha$, where $v$ is its speed of fall: its equation of motion is then
$$\frac{d^2x}{dt^2} = g - \frac{K}{m}\left(\frac{dx}{dt}\right)^\alpha,$$
where $g$ is the gravitational acceleration and $x$ represents its position measured vertically downwards. Without solving the equation, show that the limiting speed of fall is equal to $(mg/K)^{1/\alpha}$.
(b) Substitute $v = dx/dt$ to obtain an equation for $v^2$ of the form
$$\frac{d(v^2)}{dx} = 2\left(g - \frac{K}{m}(v^2)^{\alpha/2}\right).$$
(c) Assume that (in mks units) $K = 4$, $m = 80$, $\alpha = 1.2$, $g = 10$, and that the mass is dropped from rest. Use Euler's method (22.2) to obtain $v^2$, and hence $v$, over a sufficient distance to compare with the limiting speed of fall.

22.12 (See Section 17.5.) Solve the following (implicitly) by putting $y = xw$.
(a) $\dfrac{dy}{dx} = \dfrac{x^2 - xy + y^2}{xy}$; (b) $\dfrac{dy}{dx} + \dfrac{x^2 + y^2}{xy} = 0$; (c) $\dfrac{dy}{dx} + \dfrac{x - y}{3x + y} = 0$; (d) $\dfrac{dy}{dx} = \dfrac{2xy}{3x^2 - 4y^2}$; (e) $\dfrac{dy}{dx} + \dfrac{2(2x^2 + y^2)}{xy} = 0$.

22.13 Show that, if the substitution $w = y^{1-n}$ is made in the equation $y' + g(x)y = h(x)y^n$ (called the Bernoulli equation), we obtain the linear equation $w' + (1 - n)g(x)w = (1 - n)h(x)$. Use this result to solve (a) $y' + y = y^4$, (b) $y' + y = y^{-1/2}$.

22.14 The equation
$$\frac{d^2y}{dx^2} + \frac{b}{x}\frac{dy}{dx} + \frac{c}{x^2}y = 0$$
is called equidimensional (the $dx^2$, $x\,dx$, and $x^2$ in the denominators are considered to have the same dimensions). It is a linear equation with zero on the right, so we expect a general solution of the form $Ay_1(x) + By_2(x)$, where $A$ and $B$ are arbitrary. Show how a basis of solutions $(y_1(x), y_2(x))$ can be obtained in two ways:
(a) Look for solutions having the form $y = x^M$, where $M$ is an unknown constant. Note that $M$ might be complex; in that case, to obtain real solutions, use
$$x^{\alpha + i\beta} = e^{(\alpha + i\beta)\ln x} = e^{\alpha\ln x}e^{i\beta\ln x} = x^\alpha[\cos(\beta\ln x) + i\sin(\beta\ln x)],$$
as in Section 18.4.
(b) Change the independent variable to $t$, where $t = \ln x$, or $x = e^t$. The new equation has constant coefficients.
(c) Use either method to find the solutions of the equations
(i) $\dfrac{d^2y}{dx^2} - \dfrac{2}{x}\dfrac{dy}{dx} + \dfrac{2}{x^2}y = 0$;
(ii) $\dfrac{d^2y}{dx^2} - \dfrac{1}{x}\dfrac{dy}{dx} + \dfrac{1}{x^2}y = 0$;
(iii) $\dfrac{d^2y}{dx^2} + \dfrac{3}{x}\dfrac{dy}{dx} + \dfrac{2}{x^2}y = 0$.

22.15 (Computational). A boat enters a river at $O$, and tries to reach the point $A$ on the other bank, directly opposite $O$ and distant $H$ from $O$, by keeping its bow pointed towards $A$, at an angle $\theta$ from $OA$ (see Fig. 22.12). The speed of the boat in still water is $V$, and the uniform stream speed is $v < V$.

Fig. 22.12 (labels: the two banks, $O$, $A$, the width $H$, axes $x$ and $y$, the boat at $(x, y)$, the angle $\theta$, and the stream speed $v$)

Show that, when the boat is at $(x, y)$,
$$\frac{dx}{dt} = V\cos\theta, \qquad \frac{dy}{dt} = v - V\sin\theta,$$
where $t$ is time. By dividing one equation by the other find a differential equation for $y$ in terms of $x$. Given the values $v = 1\ \mathrm{m\,s^{-1}}$, $V = 4\ \mathrm{m\,s^{-1}}$, $H = 30$ m,
compute the path of the boat. (As you approach close to $A$, you will encounter a problem with $dy/dx$.)

22.16 (Computational). As in Problem 22.15, but construct a differential equation for a stream having a parabolic distribution of velocity, greatest in the middle and zero at the banks, of the form $v(x) = ax(H - x)$. Put in plausible values for $V$, $v$, $H$, and $a$, and compute the path.

22.17 A mouse $M$ enters a room at $O$ and rushes to its hole at $H$ with speed $v$, pursued by the cat $C$, who starts from $B$ at the same moment as the mouse appears (see Fig. 22.13). The cat runs with speed $V > v$, always directly towards the mouse. Show that
$$\frac{dr}{dt} = v\cos\theta - V \quad\text{and}\quad \frac{d\theta}{dt} = -\frac{v\sin\theta}{r},$$
where $r$ and $\theta$ are polar coordinates for the cat relative to the (moving) mouse. Construct a differential equation for $r$ in terms of $\theta$, and solve it (it is really only a question of integration).

Fig. 22.13 (labels: $O$, the mouse $M$, its hole $H$, the cat $C$ starting from $B$, and the angle $\theta$)

22.18 A satellite of mass $m$ takes off vertically with speed $V$ at time $t = 0$ from the surface of a planet of radius $a$. Assume that it is only influenced by the gravitational pull of the planet. If $r$ is the distance of the satellite from the centre of the planet at time $t$, then, by Newton's law of gravitation, its equation of motion is
$$m\frac{d^2r}{dt^2} = -\frac{\gamma Mm}{r^2},$$
where $M$ is the mass of the planet and $\gamma$ is the gravitational constant. Using the identity
$$\frac{d^2r}{dt^2} = v\frac{dv}{dr}, \quad\text{where } v = \frac{dr}{dt},$$
solve the first-order differential equation in $v$ and $r$ to obtain
$$\tfrac12(v^2 - V^2) = \gamma M\left(\frac1r - \frac1a\right).$$
Confirm that the escape velocity (i.e. the velocity above which the satellite will not return to the planet) from the surface of the planet is $\sqrt{2\gamma M/a}$.
23 Nonlinear differential equations and the phase plane

CONTENTS

23.1 Autonomous second-order equations
23.2 Constructing a phase diagram for $(x, \dot{x})$
23.3 $(x, \dot{x})$ phase diagrams for other linear equations; stability
23.4 The pendulum equation
23.5 The general phase plane
23.6 Approximate linearization
23.7 Classification of linear equilibrium points
23.8 Limit cycles
23.9 A numerical method for phase paths
Problems

However many methods may be invented for solving differential equations, there
will always remain equations beyond their scope. But this does not mean that
nothing can be done with them. The important van der Pol equation, which
models a type of electrical oscillator,
$$\frac{d^2x}{dt^2} + c(x^2 - 1)\frac{dx}{dt} + x = 0,$$
where $c > 0$, cannot be solved explicitly. However, there are still comparatively
simple methods which enable us to demonstrate its really important feature,
which is that every solution, no matter how the device is started off, settles down
into the same regular periodic oscillation.
Techniques enabling such conclusions to be drawn without actually solving the
equation are called qualitative methods. This chapter outlines a way of looking
at differential equations which is at the basis of many of these techniques.
Qualitative methods do not consist of a collection of fixed results, and tend to be
exploratory. Therefore computation is important. In the final section, a simple
computing method is described which is easy to program but is effective enough
to analyse realistic physical and biological models.
We shall take $t$ (time) as the independent variable. For derivatives with respect to time, we use the conventional dot notation (just like the dash notation (4.1)):
$$\dot{x} = \frac{dx}{dt}, \qquad \ddot{x} = \frac{d^2x}{dt^2}.$$

23.1 Autonomous second-order equations

Let the independent variable be $t$ and the dependent variable $x$. We shall only discuss equations which can be written in the form
$$\ddot{x}(t) = Q(x(t), \dot{x}(t)),$$
in which $t$ does not appear independently under $Q$ on the right-hand side. Such equations are called autonomous. For example, the equation $\ddot{x} - x\dot{x} + 1 = 0$ is autonomous, but $t\ddot{x} - x\dot{x} + 1 = 0$ is not autonomous. If startup conditions at $t = t_0$ are supplied so as to specify an initial-value problem
$$\ddot{x} = Q(x, \dot{x}), \quad x(t_0) = x_0, \quad \dot{x}(t_0) = y_0 \tag{23.1}$$
(there is a special reason for using the symbol $y_0$ here), we expect that the initial conditions will select exactly one solution.
Suppose that the equation represents an electrical system, and that a graph of $x$ against $t$ for $t \ge t_0$ can be plotted automatically. If we find the clock in the plotter has been wrongly set, this does not make the graph unusable; only its starting time $t_0$ will be wrong. Similarly, if we do one experiment starting at $t = 8.00$ h and repeat it at $t = 13.00$ h, the graphs plotted will be the same shape although the starting times are different (see Fig. 23.1). Intuition suggests that for autonomous equations, namely those in which $t$ does not occur independently, it will not be a local clock time $t_0$ assigned to startup that counts, but the 'stopwatch' time elapsed from startup, $t - t_0$.

Fig. 23.1 Two experiments, with a device described by an autonomous equation, which start at different times ($x(t)$ plotted against $t$; the starts are at $t = 8$ and $t = 13$).

This intuition is correct. The mathematical reason is that shifting the time origin from $t$ to $t - t_0$ does not change the form of the differential equation, so the same phenomena follow. Put
$$T = t - t_0, \quad\text{and write}\quad x(t) = X(T).$$
Then $dX/dT = dx/dt$ and $d^2X/dT^2 = d^2x/dt^2$. Also $t = t_0$ becomes $T = 0$, so that the new initial-value problem is
$$\ddot{X} = Q(X, \dot{X}), \quad X(0) = x_0, \quad \dot{X}(0) = y_0, \tag{23.2}$$
where the dots now denote derivatives with respect to $T$. The equation is unchanged, but the starting time is assigned the value zero. Suppose (23.2) is solved in terms of $T$. Restore $t$ and $x(t)$ by putting $T = t - t_0$. The solution $x(t)$ of (23.1) is then a function only of $t - t_0$, so it depends only on the time elapsed from startup.
Example 23.1 Solve the initial-value problems $\ddot{x} + \omega^2x = 0$, with $x(t_0) = x_0$ and $\dot{x}(t_0) = y_0$.
The general solution is $x(t) = A\cos\omega t + B\sin\omega t$ (see (18.14)). The process of finding $A$ and $B$ from the equations obtained by substituting the expressions for $x(t)$ and $\dot{x}(t)$ into the initial conditions is quite complicated (try it). Instead, put
$$T = t - t_0, \quad x(t) = X(T).$$
Then
$$\ddot{X} + \omega^2X = 0, \quad X(0) = x_0, \quad \dot{X}(0) = y_0.$$
The solution of this system is simple:
$$X(T) = x_0\cos\omega T + \omega^{-1}y_0\sin\omega T.$$
Put $T = t - t_0$; then the required solution is
$$x(t) = x_0\cos\omega(t - t_0) + \omega^{-1}y_0\sin\omega(t - t_0),$$
that is to say, $x$ is a function only of the elapsed time $t - t_0$.
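The conclusion of Example 23.1 can be checked numerically. The sketch below integrates $\ddot{x} + \omega^2x = 0$ as the pair $\dot{x} = y$, $\dot{y} = -\omega^2x$, starting at the startup time $t_0 = 8$. It then compares the result with the formula written in terms of $t - t_0$. It uses the classical fourth-order Runge–Kutta step, chosen here only for accuracy; this is not a method described in this chapter. The values of $\omega$, $x_0$ and $y_0$ are illustrative.

```python
# Integrate x'' + omega**2 x = 0 from t0 = 8 with classical RK4 steps and
# compare with x0 cos w(t - t0) + (y0/w) sin w(t - t0) from Example 23.1.
import math

omega, t0, x0, y0 = 2.0, 8.0, 1.0, 0.5   # illustrative values

def rhs(t, x, y):
    """x' = y, y' = -omega^2 x; t is accepted but unused (autonomous)."""
    return y, -omega**2 * x

def rk4(t, x, y, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(t, x, y)
    k2 = rhs(t + h/2, x + h/2 * k1[0], y + h/2 * k1[1])
    k3 = rhs(t + h/2, x + h/2 * k2[0], y + h/2 * k2[1])
    k4 = rhs(t + h, x + h * k3[0], y + h * k3[1])
    x += h / 6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    y += h / 6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return t + h, x, y

t, x, y = t0, x0, y0
for _ in range(2000):                     # march from t = 8 to t = 10
    t, x, y = rk4(t, x, y, 0.001)

exact = x0 * math.cos(omega * (t - t0)) + y0 / omega * math.sin(omega * (t - t0))
print(t, x, exact)
```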

Autonomous second-order equations

The solution of the initial-value problem
$$\ddot{x} = Q(x, \dot{x})$$
with $x(t_0) = x_0$ and $\dot{x}(t_0) = y_0$ is a function only of elapsed time $t - t_0$. (23.3)

23.2 Constructing a phase diagram for $(x, \dot{x})$

Consider again the initial-value problem of Example 23.1:
$$\ddot{x} + \omega^2x = 0, \quad x(t_0) = x_0, \quad \dot{x}(t_0) = y_0. \tag{23.4}$$

This problem could arise in connection with a mass oscillating on a spring. The initial conditions imply that the position and velocity are prescribed at the start, $t = t_0$. If asked what the system was doing at $t = t_0$, specification of the position and velocity seems to constitute an adequate description of its state. It is in fact a perfect description, since it is exactly what is required to determine the whole future of the system. It is therefore reasonable to call the pair of numbers
$$(x_0, y_0)$$
the state of the system at $t_0$.
Subsequently the system moves smoothly through a succession of states: $x$ and $\dot{x}$ will vary in time. Catch the system at any moment $t_1$; then the state $(x(t_1), \dot{x}(t_1))$ serves as fresh initial conditions for all the subsequent motion, but there can never be any conflict with what was predicted from the original initial conditions. It is the succession of states which is the subject of this chapter; the precise time that the states occur takes a secondary place.
To track the succession of states $(x(t), \dot{x}(t))$ for the initial-value problem (23.4), we could in principle begin by finding its solution. Taking $t_0 = 0$, the solution (Example 23.1) is
$$x(t) = x_0\cos\omega t + \omega^{-1}y_0\sin\omega t,$$
so
$$\dot{x}(t) = -\omega x_0\sin\omega t + y_0\cos\omega t.$$
In effect, these equations specify the states, $x(t)$ and $\dot{x}(t)$, parametrically with parameter $t$. However, the expressions do not clearly reveal the association of $x$ with $\dot{x}$, which is what we actually observe from moment to moment. Moreover, we need a method for equations that we cannot solve.
We shall take a different route to the states $(x, \dot{x})$ which does not require that we solve the differential equation. Write
$$\dot{x} = y. \tag{23.5a}$$
Since $\ddot{x} = -\omega^2x$, and $\ddot{x} = (d/dt)\dot{x}$, we have
$$\dot{y} = -\omega^2x. \tag{23.5b}$$
These two simultaneous first-order differential equations are equivalent to the second-order differential equation (23.4). Divide (23.5b) by (23.5a), and use (3.8):
$$\frac{dy}{dt}\bigg/\frac{dx}{dt} = \frac{dy}{dx} = -\omega^2\frac{x}{y}. \tag{23.6a}$$
Time has disappeared from the problem, and we now have a single first-order equation connecting $x$ and $y$ (i.e. connecting $x$ and $\dot{x}$). Separate the variables (see Section 22.3) in (23.6a), and we obtain
$$\int y\,dy = -\omega^2\int x\,dx,$$
or
$$\omega^2x^2 + y^2 = C, \tag{23.6b}$$
where $C$ is a positive, but otherwise arbitrary, constant.
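Equation (23.6b) says that $\omega^2x^2 + y^2$ keeps a fixed value along each phase path. The Python sketch below checks this numerically; $\omega = 1.5$ and the starting state $(1, 0)$ are chosen purely for illustration. It steps the pair (23.5a), (23.5b) with a classical Runge–Kutta step (a method assumed here, not taken from the text) and records the quantity after every step. Its spread should stay at rounding-error level.

```python
# Monitor C = omega**2 x**2 + y**2 along a numerical solution of
# x' = y, y' = -omega**2 x  (equations (23.5a), (23.5b)).
omega = 1.5

def f(x, y):
    """Right-hand sides of (23.5a) and (23.5b)."""
    return y, -omega**2 * x

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step for the pair (x, y)."""
    k1 = f(x, y)
    k2 = f(x + h/2 * k1[0], y + h/2 * k1[1])
    k3 = f(x + h/2 * k2[0], y + h/2 * k2[1])
    k4 = f(x + h * k3[0], y + h * k3[1])
    return (x + h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y + h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

x, y = 1.0, 0.0
C0 = omega**2 * x**2 + y**2
C_values = []
for _ in range(5000):
    x, y = rk4_step(x, y, 0.002)
    C_values.append(omega**2 * x**2 + y**2)
print(C0, min(C_values), max(C_values))
```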


The motive for introducing $y$ as the symbol for $\dot{x}$ now becomes clear. Set up a pair of axes $x$ and $y$ as in Fig. 23.2. This framework is called the $(x, \dot{x})$ phase plane. The solution (23.6b) represents the family of ellipses displayed in the figure, and is called an $(x, \dot{x})$ phase diagram for the differential equation $\ddot{x} + \omega^2x = 0$.

Fig. 23.2 $(x, \dot{x})$ phase diagram for $\ddot{x} + \omega^2x = 0$, displaying $y = \dot{x}$ against $x$ (the initial state $A: (x_0, y_0)$ is marked).
Any point on the diagram, say $A: (x_0, y_0)$, represents an initial state. If we follow the curve passing through $A$, we obtain the sequence of states for the corresponding solution. Equation (23.6a) alone does not tell us which way to travel along the curve; for this purpose, we momentarily resurrect $t$. The arrows indicate the directions that correspond to time going forwards rather than backwards. We defined $y$ by
$$\frac{dx}{dt} = \dot{x} = y. \tag{23.7}$$
If we are in the upper half plane, then $y > 0$, so $dx/dt$ is positive. Therefore $x(t)$ is increasing, and the directive arrow points from left to right. By a similar argument with $y < 0$ we find that in the lower half plane the arrow points from right to left. We must follow the arrow. Supplied with arrows, the state curves are called phase paths, or trajectories, or orbits for the differential equation $\ddot{x} + \omega^2x = 0$.
Starting from $A: (x_0, y_0)$, follow the phase path. In going round, we can pick out a new feature in passing: that $\dot{x}$ is zero when $x$ is at a minimum or maximum, and vice versa. Eventually we get back to $A$, renewing the initial state. Continue to follow the path around, duplicating the first circuit; the succession of states is repeated time after time.
This repetition does not itself establish that this is a truly periodic process. When we meet $A$ again at the end of the first circuit, it is at a later time, $t_1$ say, so the initial conditions for the original equation are to this extent changed. Even though the system must follow the same path, perhaps it takes twice as long to go round the second time. However, from the discussion in Section 23.1, the time to complete any circuit, or to go repeatedly between any two fixed points on the circuit, is invariable because the equation is autonomous. This argument does not depend on what equation we started with, so we can say in general: any closed phase path represents a periodic oscillation.
Finally notice in Fig. 23.2 the bullet at the origin. This point represents a true solution, namely
$$x(t) = 0, \quad y(t) = 0$$
for all $t$ (corresponding to $C = 0$ in (23.6b)). It is a special case of an equilibrium point, meaning, in general, a constant solution
$$x(t) = k, \quad \dot{x}(t) = 0,$$
where $k$ is a constant. Equilibrium points are of great importance in phase diagrams. An equilibrium point surrounded by closed curves is called a centre. It represents periodic oscillations about equilibrium. (Note, however, that oscillations of different amplitudes do not usually have the same period.)
Since we chose a simple case, we have not discovered anything we did not know
already, so consider the following example.

Example 23.2 Sketch an $(x, \dot{x})$ phase plane for the equation $\ddot{x} + cx^3 = 0$, where $c > 0$.
This represents small lateral oscillations $x(t)$ of a mass attached to the middle of an elastic string that is fixed at the ends and is unextended when $x = 0$. It is the same as Example 20.5, with $l = L$, and $c = 4s/ml$. We can regard explicit solutions as being unobtainable.
Put $y = \dot{x}$. Then $\ddot{x} = \dot{y}$, and we have two first-order equations, together equivalent to the original equation:
$$\dot{x} = y, \qquad \dot{y} = -cx^3.$$
Therefore
$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = -c\,\frac{x^3}{y}.$$
By separating the variables, we obtain
$$\int y\,dy = -c\int x^3\,dx,$$
so that $\tfrac12y^2 = -\tfrac14cx^4 + C$, where $C$ is an arbitrary constant. Therefore
$$y = \pm(c/2)^{1/2}(A - x^4)^{1/2},$$
where $A = 4C/c$ is arbitrary. For any $A > 0$, the phase path consists of two curves, one for $y > 0$ and one for $y < 0$. On both curves $y = 0$ where $x = \pm A^{1/4}$. The curves join smoothly at these two points so that the phase paths are closed curves. The family of phase paths is shown in Fig. 23.3. The origin is an equilibrium point since $x = 0$, $y = 0$ is a solution.

Fig. 23.3 A centre (always stable).

The phase diagram of Fig. 23.3 consists entirely of closed curves; so every solution of the differential equation is a periodic oscillation. (However, we cannot say that they all have the same period: in fact they do not.) This phase diagram has therefore revealed an important fact about an equation that we could not solve.
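The claim that the closed paths of Fig. 23.3 do not share a common period can be illustrated numerically. In the Python sketch below, $c = 1$ and the amplitudes 1 and 2 are illustrative choices. The motion starting from rest at $x = a$ is stepped with a classical Runge–Kutta step (assumed here for accuracy) until $x$ first reaches zero. By the symmetry of the closed paths, this takes a quarter of a period. For $\ddot{x} + cx^3 = 0$, the substitution $x = aX(at)$ shows that the period is proportional to $1/a$, so doubling the amplitude should halve the period.

```python
# Quarter-period of x'' + c x**3 = 0 released from rest at x = a, found by
# RK4 stepping until x first reaches 0, with linear interpolation of the
# crossing time.  c = 1 and the amplitudes are illustrative choices.
c = 1.0

def f(x, y):
    """Right-hand sides of x' = y, y' = -c x^3."""
    return y, -c * x**3

def quarter_period(a, h=1e-4):
    """Time taken to go from rest at x = a to the first zero of x."""
    t, x, y = 0.0, a, 0.0
    while True:
        k1 = f(x, y)
        k2 = f(x + h/2 * k1[0], y + h/2 * k1[1])
        k3 = f(x + h/2 * k2[0], y + h/2 * k2[1])
        k4 = f(x + h * k3[0], y + h * k3[1])
        xn = x + h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        yn = y + h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        if xn <= 0.0:
            # interpolate linearly between (t, x) and (t + h, xn)
            return t + h * x / (x - xn)
        t, x, y = t + h, xn, yn

T1 = 4 * quarter_period(1.0)
T2 = 4 * quarter_period(2.0)
print(T1, T2, T2 / T1)        # the ratio should be close to 0.5
```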

Self-test 23.1
Write down and solve the equation for the phase paths of $\ddot{x} - x^3 = 0$. Sketch the phase diagram.
23.3 $(x, \dot{x})$ phase diagrams for other linear equations; stability
For the moment, we shall stay with equations that are familiar from Chapter 18.

Example 23.3 Construct the phase diagram for $\ddot{x} - \omega^2x = 0$.
Put $\dot{x} = y$; then $\dot{y} = \omega^2x$. The origin is an equilibrium point. To eliminate $t$, form
$$\frac{\dot{y}}{\dot{x}} = \frac{dy}{dx} = \omega^2\frac{x}{y}.$$
Separate the variables:
$$\int y\,dy = \omega^2\int x\,dx,$$
or
$$y^2 - \omega^2x^2 = A,$$
where $A$ is arbitrary. This represents the family of hyperbolas in Fig. 23.4, having asymptotes $y = \pm\omega x$. The directions of the arrows follow the rule in Section 23.2: left to right in the upper half plane.

Fig. 23.4 A saddle point (always unstable).

Differential equations usually describe the behaviour of some circuit, machine, ecosystem, or something else in the real world, and our concern is with interpreting the phase diagrams in such a way as to bring to light features of practical importance. One important question is that of stability of equilibrium of a system. In phase diagrams, the question turns into that of the stability of an equilibrium point, such as the origin in Examples 23.1 to 23.3.
In practice, systems are always subject to small external disturbances and internal fluctuations. For a system to work, it is important that small causes give small effects, and that the effects do not commence to grow catastrophically. In that case, a system is said to be stable with respect to small disturbances; otherwise it is unstable. The precise criterion for tolerable behaviour will depend on what we require from the particular system.
An equilibrium point surrounded by a structure of curves resembling those shown in Fig. 23.4 is called a saddle. It could hardly be classed as anything but an unstable equilibrium point. Apart from two special directions, if equilibrium is disturbed – even by a hairsbreadth – the system will find itself on one of the hyperbolas, and so it will be swept further and further away from equilibrium.

On the other hand a centre, exemplified in Figs 23.2 and 23.3, would often be called a stable equilibrium point. If equilibrium is disturbed by a small amount, the system does not go wild; it simply oscillates around its equilibrium position. However, a vehicle which behaved like that would be regarded as very unstable. Subject to a continual battering, it would vibrate objectionably, and the vibrations would never die away; therefore the stronger sort of stability illustrated in the following two examples is preferable.

Example 23.4 Construct a phase diagram for $\ddot{x} + \tfrac14\dot{x} + x = 0$ (weak damping).
The equivalent pair of first-order equations is
$$\dot{x} = y, \qquad \dot{y} = -\tfrac14y - x. \tag{i}$$
Therefore
$$\frac{dy}{dx} = \frac{-\tfrac14y - x}{y}. \tag{ii}$$
This is hard to solve. To produce a phase diagram, we could solve the original equation for $x(t)$ by the methods of Chapter 18, and then obtain $y = \dot{x}(t)$. These are parametric equations for $(x, y)$ curves. However, this would not be in the spirit of this chapter, because almost never are we able to solve the original equation. Instead, the Euler method of Section 22.2 can be used to obtain solution curves for (ii). (As we shall see later, it is easier to work from (i) using Section 23.9.) We obtain the pattern of spiral curves surrounding the origin shown in Fig. 23.5. Clearly $x = 0$, $y = 0$ is a solution of (i), so that the origin is an equilibrium point.

Fig. 23.5 A stable spiral.

Example 23.4 is a case of a linear oscillator with small damping, discussed in Section 18.4. The phase paths show that, from any starting point, the origin is approached via a sequence of diminishing spirals. Therefore any initial disturbance from equilibrium dies away. The equilibrium point is called a stable spiral. For the equation
$$\ddot{x} - \tfrac14\dot{x} + x = 0,$$
the pattern of curves gives an outgoing unstable spiral.
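Following the suggestion in Example 23.4, Euler's method can be applied directly to the pair (i). The Python sketch below reports the final distance from the origin for the damped equation $\ddot{x} + \tfrac14\dot{x} + x = 0$ and for $\ddot{x} - \tfrac14\dot{x} + x = 0$. The step size, running time and starting point are illustrative assumptions. The first distance shrinks, consistent with a stable spiral. The second grows, consistent with an outgoing unstable spiral.

```python
# Euler's method on x' = y, y' = -k*y - x (the pair (i) of Example 23.4),
# with k = 1/4 (stable spiral) and k = -1/4 (unstable spiral).
import math

def final_radius(k, x=1.0, y=0.0, h=0.001, t_end=40.0):
    """Euler steps up to t_end; returns the distance sqrt(x^2 + y^2)."""
    for _ in range(int(round(t_end / h))):
        x, y = x + h * y, y + h * (-k * y - x)   # both updates use old x, y
    return math.hypot(x, y)

print(final_radius(0.25))    # far smaller than the starting distance 1
print(final_radius(-0.25))   # far larger: an outgoing spiral
```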
Example 23.5 Construct a phase diagram for $2\ddot{x} + 7\dot{x} + 3x = 0$.
This is a case of heavy damping (Section 20.4). We obtain
$$\dot{x} = y, \qquad \dot{y} = -\tfrac72y - \tfrac32x,$$
or
$$\frac{dy}{dx} = \frac{-\tfrac72y - \tfrac32x}{y}.$$
The phase paths, calculated numerically, are as shown in Fig. 23.6. The system has an equilibrium point at the origin.

Fig. 23.6 A stable node (the straight-line paths $y = -\tfrac12x$ and $y = -3x$ are marked).

In Fig. 23.6, the origin is called a stable node. All solutions fall straight into the origin without any oscillations: the system is deadbeat. Notice the structure of the node. There are two straight line solutions to the equation
$$\frac{dy}{dx} = \frac{-\tfrac72y - \tfrac32x}{y},$$
which can be found by trying for solutions of the form $y = mx$. Then $dy/dx = m = (-\tfrac72m - \tfrac32)/m$, or $2m^2 + 7m + 3 = 0$. Therefore $m = -\tfrac12$ or $m = -3$, and the two linear solutions are $y = -\tfrac12x$, $y = -3x$. They divide the plane into four sectors which contain curved phase paths. Each of the curves has the property that it is tangential to $y = -\tfrac12x$ at the origin, and parallel to $y = -3x$ at infinity. This behaviour is characteristic of nodes arising from linear equations, and the mutual tangency at the origin is common to all nodes, even those arising from nonlinear equations.
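The quadratic for the straight-line paths of the node can be solved and checked mechanically. The short Python sketch below applies the quadratic formula to $2m^2 + 7m + 3 = 0$. It then confirms that the phase-path slope on each line $y = mx$ equals $m$; the sample point $x = 0.7$ is arbitrary.

```python
# Straight-line phase paths y = m x for 2x'' + 7x' + 3x = 0 (Example 23.5):
# substituting into dy/dx = (-(7/2)y - (3/2)x)/y gives 2m**2 + 7m + 3 = 0.
import math

a, b, c = 2.0, 7.0, 3.0
disc = math.sqrt(b * b - 4 * a * c)          # sqrt(49 - 24) = 5
m1, m2 = (-b + disc) / (2 * a), (-b - disc) / (2 * a)
print(m1, m2)

def slope(x, y):
    """Phase-path slope dy/dx for Example 23.5 (undefined when y = 0)."""
    return (-3.5 * y - 1.5 * x) / y

for m in (m1, m2):
    x = 0.7                                   # any nonzero x will do
    assert abs(slope(x, m * x) - m) < 1e-12
```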
The technique for second-order differential equations can be summed up as
follows.

$(x, \dot{x})$ phase plane for $\ddot{x} = Q(x, \dot{x})$

(a) Phase path equations:
$$\dot{x} = y, \qquad \dot{y} = Q(x, y).$$
(b) The direction of a phase path $(x, y)$ is left to right if $y > 0$, and right to left if $y < 0$.
(c) Equilibrium points correspond to constant solutions of $\dot{x} = 0$, $\dot{y} = 0$. They occur at $(x, 0)$, where $x$ is any solution of $Q(x, 0) = 0$.
(d) Alternative equation $dy/dx = Q(x, y)/y$. (Shows that different phase paths may meet only at equilibrium points or other points where $Q(x, y)/y$ is undefined, and that paths cross the $x$ axis at right angles.) (23.8)
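Parts (c) and (d) of (23.8) translate directly into code. The Python sketch below is one possible way to do it, not a method from the text. The function names `equilibria` and `slope` are hypothetical. The equilibrium search scans $Q(x, 0)$ for sign changes on a grid and refines each one by bisection, so it can miss roots where $Q(x, 0)$ touches zero without changing sign. As a demonstration it uses $Q(x, y) = -\sin x$, the pendulum equation of Section 23.4 with $\omega = 1$.

```python
# Sketch of recipe (23.8): equilibrium points (x, 0) with Q(x, 0) = 0, found
# by a sign-change scan plus bisection, and the phase-path slope Q(x, y)/y.
# `equilibria` and `slope` are hypothetical helper names.
import math

def equilibria(Q, a, b, n=1000, tol=1e-12):
    """Approximate roots of Q(x, 0) = 0 in [a, b]."""
    g = lambda x: Q(x, 0.0)
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    roots = []
    for x0, x1 in zip(xs, xs[1:]):
        if g(x0) == 0.0:                 # grid point is already an equilibrium
            roots.append(x0)
        elif g(x0) * g(x1) < 0.0:        # sign change: refine by bisection
            lo, hi = x0, x1
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    if g(xs[-1]) == 0.0:                 # check the right-hand end point too
        roots.append(xs[-1])
    return roots

def slope(Q, x, y):
    """dy/dx along a phase path, (23.8)(d); undefined on the x axis."""
    return Q(x, y) / y

# Demonstration with the pendulum equation (omega = 1): Q(x, y) = -sin x.
Q = lambda x, y: -math.sin(x)
print(equilibria(Q, -7.0, 7.0))          # approximately -2pi, -pi, 0, pi, 2pi
```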
Self-test 23.2

Discuss the possible phase diagrams of the linear equation $\ddot{x} + \dot{x} + cx = 0$ for nonzero values of the constant $c$.
23.4 The pendulum equation
The equation for a pendulum (Fig. 23.7) consisting of a light rod AB of length l
freely pivoted at A and carrying a mass at B is
$$\ddot{x} + \omega^2\sin x = 0,$$
where $x$ is the angle of inclination and $\omega^2 = g/l$; here $g$ is the gravitational acceleration. This equation can be solved, but only with difficulty, and by using recondite functions. From (23.7), the equations for the phase paths are
$$\dot{x} = y, \qquad \dot{y} = -\omega^2\sin x. \tag{23.9}$$

Fig. 23.7

The equilibrium points solve the equation sin x = 0, so


x = 0, ±π, ±2π, … , with y = 0. (23.10a)

If
x = 0, ±2π, ±4π, … , (23.10b)

the pendulum is hanging vertically from its pivot in equilibrium. These values of x
all represent the same observed state, though on the phase plane they correspond
to different points. Similarly the values
x = ±π, ±3π, … (23.10c)

represent a state which is not usually thought of in connection with a pendulum: the pendulum rod is perched vertically upwards (and insecurely) on its pivot A.
Consider x = 0 as being representative of the freely hanging state, the equilibrium
points (23.10b). See what happens when the displacement from x = 0 is small, but
the pendulum is no longer in equilibrium. We can then put
sin x ≈ x.
The original equation becomes $\ddot{x} + \omega^2 x = 0$ (approximately), with solutions $x = C\cos(\omega t + \phi)$, where $C$ (small) and $\phi$ are arbitrary. This is the familiar condition of small, isochronous oscillations. We have already solved the same problem for the phase plane in Section 23.2 and we found a centre: the family of ellipses shown in Fig. 23.2:
$$\omega^2 x^2 + y^2 = C,$$
where $C$ is an arbitrary non-negative constant. This family will be repeated (for small $C$) around $x = \pm 2\pi, \pm 4\pi, \dots$ in a progressively developing phase diagram; see Fig. 23.8.

Fig. 23.8 Phase paths near the equilibrium points for the pendulum equation $\ddot{x} + \omega^2 \sin x = 0$ (horizontal axis $x$, marked at $-\pi$, O, $\pi$, $2\pi$, $3\pi$, $4\pi$).

Next, consider the case when the pendulum stands vertically: we choose as
representative of (23.10c) the case x = π. To find what happens when the state is
slightly displaced from the point (π, 0), put
$$x = \pi + X,$$
where the new variable $X$ is going to be small. Then
$$\sin x = \sin(\pi + X) = \sin\pi\cos X + \cos\pi\sin X = -\sin X \approx -X.$$


Replacing $\sin x$ by $-X$ in the pendulum equation (and noting that $\ddot{x} = \ddot{X}$), the (approximate) equation for the displacement $X$ from the equilibrium point is
$$\ddot{X} - \omega^2 X = 0.$$
From Example 23.3, this is equivalent to a saddle point on the phase plane at $X = 0$ (i.e. at $x = \pi$), and all the other equilibrium points in (23.10c) will have the identical structure, which implies instability.
The state of affairs around the equilibrium points is shown in Fig. 23.8. The rest of the phase diagram could be computed from (23.9), but in this case it is not difficult to sketch it in its entirety, as in Fig. 23.9. From (23.9),
$$\frac{dy}{dx} = -\frac{\omega^2 \sin x}{y}.$$
By separating the variables, $y\,dy = -\omega^2 \sin x\,dx$, we obtain the equations of the paths, which can be written in the form
$$y = \pm\sqrt{2}\,\omega\,(\cos x - A)^{1/2}, \qquad (23.11)$$
where $A$ is (to an extent) arbitrary. Since $\cos x$ has period $2\pi$, the repetitious nature of Fig. 23.8 is explained. Notice that $(\cos x - A)^{1/2}$ is real only when $\cos x \ge A$.
Fig. 23.9 Phase diagram $(x, \dot{x})$ for the pendulum equation $\ddot{x} + \omega^2 \sin x = 0$, given by $y = \pm\sqrt{2}\,\omega(\cos x - A)^{1/2}$ with $A \le 1$. The figure extends with period $2\pi$. The undulating curves (for $A < -1$) represent a whirling motion. The separatrices correspond to $A = -1$. There are centres at $x = 0, \pm 2\pi, \dots$, and saddles at $x = \pm\pi, \pm 3\pi, \dots$.

Therefore $A \le 1$. With that limitation, there are two main ranges of $A$ which give significantly different patterns of $(x, y)$ curves: $-1 < A < 1$ and $A < -1$. The centres correspond to $A = 1$, and the special curves joining the saddles, called the separatrices, correspond to $A = -1$. Notice the regular whirling motions which occur if $y = \dot{x}$ is large enough.
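For the path through a given state $(x_0, y_0)$, eqn (23.11) gives $A = \cos x_0 - y_0^2/(2\omega^2)$, so the type of motion can be read off from a single state. A minimal Python sketch; the function names and the tolerance `tol` are illustrative choices, not part of the text.

```python
import math


def pendulum_path_constant(x0, y0, omega):
    """A in y**2 = 2*omega**2*(cos x - A), eqn (23.11), for the path through (x0, y0)."""
    return math.cos(x0) - y0 * y0 / (2.0 * omega * omega)


def classify_pendulum_path(x0, y0, omega, tol=1e-12):
    """Classify the phase path of x'' + omega**2 sin x = 0 through (x0, y0)."""
    A = pendulum_path_constant(x0, y0, omega)
    if abs(A - 1.0) <= tol:
        return "centre"        # only at x0 = 0, ±2π, ... with y0 = 0
    if abs(A + 1.0) <= tol:
        return "separatrix"    # includes the saddle points themselves
    if A > -1.0:
        return "oscillation"   # closed path around a centre (-1 < A < 1)
    return "whirling"          # undulating path (A < -1)


print(classify_pendulum_path(0.5, 0.0, 1.0))  # oscillation
print(classify_pendulum_path(0.0, 2.0, 1.0))  # separatrix
print(classify_pendulum_path(0.0, 3.0, 1.0))  # whirling
```

The second call shows that a pendulum started from the hanging position with speed exactly $2\omega$ lies on a separatrix: it creeps up towards the inverted position without ever reaching it.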

23.5 The general phase plane


There exists a great field of problems which, right from the start, take the form of
simultaneous first-order differential equations:
$$\dot{x} = P(x, y), \qquad \dot{y} = Q(x, y). \qquad (23.12)$$

Example 23.6 A community of foxes and rabbits lives in uneasy harmony on an island. The rabbit population is $x(t)$, and they eat grass. The fox population is $y(t)$; they eat rabbits. Construct a differential equation model for the population variation.
In a short time $\delta t$ there is a rabbit population increase $ax\,\delta t$ ($a > 0$) due to births and natural deaths, and a decrease $bxy\,\delta t$ ($b > 0$) due to meetings with foxes, the frequency of which we suppose to be jointly proportional to the populations of these animals. The net change in time $\delta t$ is therefore
δx = ax δt − bxy δt.
Divide by $\delta t$ and let $\delta t \to 0$; we obtain
$$\frac{dx}{dt} = ax - bxy. \qquad \text{(i)}$$
For the foxes, assume that a shortage of prey causes a death rate c from starvation, offset
by a fecundity factor dxy among those who get something to eat. Then, in time δt,
δy = −cy δt + dxy δt.
Divide by $\delta t$ and let $\delta t \to 0$:
$$\frac{dy}{dt} = -cy + dxy. \qquad \text{(ii)}$$
(i) and (ii) form a simultaneous (nonlinear) first-order system:
$$\dot{x} = ax - bxy, \qquad \dot{y} = -cy + dxy, \qquad (23.13)$$
with $a, b, c, d > 0$.
We shall now show how important characteristics of the solutions $x(t)$ and $y(t)$ of the equations (23.13) resulting from this example can be revealed on a general phase plane by plotting $y$ against $x$. (Notice that $\dot{x}$ is no longer equal to $y$, so this is not the same as the $(x, \dot{x})$ phase plane that we had before.) The pair of values $(x, y)$ will be called a state. Although we do not prove it, the values of $x$ and $y$ at a particular time $t_0$ constitute initial conditions determining the solution for all subsequent time $t \ge t_0$, and because the equations are autonomous, the solutions are functions only of $t - t_0$.
Firstly, look for any constant solutions (x = constant, y = constant). These must
satisfy the differential equations: that is to say
0 = x(a − by) and 0 = −y(c − dx).
Therefore (x, y) = (0, 0) and (x, y) = (c/d, a/b) are the constant solutions: on the
(x, y) phase plane, they are the equilibrium points.
As before, divide the two equations, obtaining
$$\frac{dy}{dt}\bigg/\frac{dx}{dt} = \frac{dy}{dx} = -\frac{(c - dx)\,y}{(a - by)\,x}. \qquad (23.14)$$
This is a separable equation; after separation it becomes
$$\int\left(\frac{a}{y} - b\right)dy = -\int\left(\frac{c}{x} - d\right)dx,$$
or
$$a\ln y - by + c\ln x - dx = C, \qquad (x > 0,\ y > 0). \qquad (23.15)$$
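The left-hand side of (23.15) should stay constant along any computed path. A sketch that checks this, assuming the hypothetical parameter values $a = 4$, $b = 2$, $c = 2$, $d = 1$ (those of Problem 23.5(e)) and a hypothetical starting state $(1, 1)$. It integrates (23.13) with the classical fourth-order Runge–Kutta method, a swapped-in technique chosen for accuracy (this chapter's own method is Euler's rule (23.36)), and records the largest drift of (23.15) from its starting value.

```python
import math


def fox_rabbit_rates(x, y, a, b, c, d):
    """Right-hand sides of (23.13): x' = ax - bxy, y' = -cy + dxy."""
    return a * x - b * x * y, -c * y + d * x * y


def rk4_step(x, y, h, params):
    """One classical fourth-order Runge-Kutta step for (23.13)."""
    k1 = fox_rabbit_rates(x, y, *params)
    k2 = fox_rabbit_rates(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], *params)
    k3 = fox_rabbit_rates(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], *params)
    k4 = fox_rabbit_rates(x + h * k3[0], y + h * k3[1], *params)
    x_new = x + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y_new = y + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x_new, y_new


def first_integral(x, y, a, b, c, d):
    """Left-hand side of (23.15); constant along every path with x > 0, y > 0."""
    return a * math.log(y) - b * y + c * math.log(x) - d * x


params = (4.0, 2.0, 2.0, 1.0)  # hypothetical a, b, c, d
x, y = 1.0, 1.0                # hypothetical initial state
C0 = first_integral(x, y, *params)
max_drift = 0.0
for _ in range(5000):          # t from 0 to 5 with h = 0.001
    x, y = rk4_step(x, y, 0.001, params)
    max_drift = max(max_drift, abs(first_integral(x, y, *params) - C0))
print(max_drift)               # very small: the computed state stays on one closed curve
```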

Equation (23.15) represents the closed curves shown in Fig. 23.10. It is possible to determine in advance that they are closed: the reason will be given shortly.
The direction arrows on the figure do not obey the rule (23.8b) for the $(x, \dot{x})$ phase plane. Each case has to be treated separately. The principle is easy: we have to find the direction at a single point, and the directions elsewhere are settled by continuity of direction: we expect adjacent curves to have the same direction. We might take the point M : $(0, m)$ in Fig. 23.10. At this point, the second equation of (23.13) gives $\dot{y} = -cm < 0$, so $y$ is decreasing at M. Once this direction is settled, the directions on the other curves follow by continuity.
There is a centre at the equilibrium point E : $(c/d, a/b)$. If the rabbit/fox populations take the values at E, the equations predict that the state will be permanent. A bad season for grass, or a disease amongst the foxes, will put the population state somewhere else, and thereafter the populations will undergo periodic oscillations. If foxes feast and thrive, rabbits languish, eventually starving the foxes; therefore rabbits prosper again; and so on.

Fig. 23.10 Phase paths of (23.13) in the $(x, y)$ plane, with the centre E : $(c/d, a/b)$.


The equilibrium point at O is unstable. If rabbits are introduced into a desert
island paradise the population increases indefinitely, following the x axis arrow.
If foxes are introduced to control the rabbits, a great periodic cycle is set up which
goes on for as long as nothing else changes. Clearly the model is imperfect; but,
provided that we have a program which will plot general phase paths, the com-
plexity of the model is really a matter of no importance.
Earlier we said that the implicit equation (23.15) for the phase paths gives
closed curves. It is useful to be able to recognize this feature.

Condition that $f(x) + g(y) = C$ ($C$ arbitrary) represents a centre

If $f(x)$ has a minimum at $x = \alpha$, and $g(y)$ has a minimum at $y = \beta$, then there is an equilibrium point at $(\alpha, \beta)$ which is locally a centre (one can also substitute ‘maximum’ for ‘minimum’ in both places). (23.16)

To understand this you might need to look ahead to Section 28.1. In three
dimensions, x, y, z, the surface z = f(x) + g(y) is bowl-shaped, with a minimum
or maximum at (α, β ). The paths are closed curves cut out by intersection with
the horizontal planes z = C. The functions c ln x − dx and a ln y − by have maxima
at x = c/d, y = a/b.
For a general system
$$\dot{x} = P(x, y), \qquad \dot{y} = Q(x, y),$$
the equilibrium points are where
$$P(x, y) = Q(x, y) = 0,$$
and might therefore appear anywhere in the phase plane, not just on the $x$ axis as with the $(x, \dot{x})$ plane. On $Q(x, y) = 0$, $\dot{y} = 0$, so phase paths cross this curve parallel to the $x$ axis. Similarly, on $P(x, y) = 0$, $\dot{x} = 0$, so paths cross this curve parallel to the $y$ axis. Between these curves the slopes of the paths will be
either positive or negative depending on the sign of Q(x, y)/P(x, y). The curves
Q(x, y)/P(x, y) = constant are known as isoclines (as in Section 22.1), that is,
curves along which the slopes of the phase paths are constant. The following
statements recall the main features encountered in this section.

General phase plane


(a) Phase path equations: $\dot{x} = P(x, y)$, $\dot{y} = Q(x, y)$.
(b) Equilibrium points: the solutions of P(x, y) = Q(x, y) = 0.
(c) Phase path direction: find the direction at one point and use continuity
for other paths.
(d) Alternative equation: dy/dx = Q(x, y)/P(x, y).
(e) Phase paths have zero slope on Q(x, y) = 0, and infinite slope on P(x, y) = 0.
(23.17)

Self-test 23.3
Locate the equilibrium points of the autonomous system $\dot{x} = y(x^2 - 1)$, $\dot{y} = -x(y^2 - 1)$. Some solutions can be seen without solving the equations. What are they?

23.6 Approximate linearization


In connection with the pendulum problem of Section 23.4, we were able to
analyse equilibrium points by using a linear approximation valid near the points.
In the general case, suppose that $\dot{x} = P(x, y)$, $\dot{y} = Q(x, y)$, and that $(k, l)$ is an equilibrium point:
$$P(k, l) = Q(k, l) = 0. \qquad (23.18)$$

We shall obtain a linear approximation to P(x, y) and Q(x, y) valid near (k, l). Put
x = k + X, y = l + Y, (23.19)

where we suppose that $X$ and $Y$ are small. Then, because of (23.18), the approximations will take the form
$$\dot{X} = P(x, y) \approx aX + bY, \qquad \dot{Y} = Q(x, y) \approx cX + dY \qquad (23.20)$$
(with no constant term present).


Equations (23.20) should provide information about the phase paths near the equilibrium point $(x, y) = (k, l)$, which has become $(X, Y) = (0, 0)$, and tell us at least whether the equilibrium point is stable or unstable. This is often true, but not always: if, to take an extreme case, $a = b = c = d = 0$, then we would hardly want to rely on it.


The single equation corresponding to (23.17d) which connects X and Y in the
approximation is
$$\frac{dY}{dX} = \frac{cX + dY}{aX + bY}. \qquad (23.21)$$
It might be expected that the methods described in earlier chapters would be sufficient to deal with this family of equations, but they lead to implicit relations between $X$ and $Y$ that are difficult to interpret. Therefore, we summarize the results in (23.22) for the use of readers who are not familiar with matrix algebra, and give their derivation separately in Section 23.7, which follows.

Equilibrium point $(0, 0)$ of the linear system $\dot{X} = aX + bY$, $\dot{Y} = cX + dY$

Put $p = a + d$, $q = ad - bc$, $\Delta = p^2 - 4q$.
(a) If $q > 0$ and $\Delta > 0$: a node ($p < 0$, stable; $p > 0$, unstable).
(b) If $q > 0$ and $\Delta < 0$: a spiral ($p < 0$, stable; $p > 0$, unstable).
(c) If $q < 0$: a saddle.
(d) If $p = 0$ and $q > 0$: a centre.
(e) Path directions: investigate one point. (23.22)
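The rules in (23.22) translate directly into a small decision procedure. A minimal Python sketch; the function name is illustrative, and the branches for $q = 0$ and $\Delta = 0$ are borderline cases that (23.22) does not list, labelled as such here.

```python
def classify_linear(a, b, c, d):
    """Classify the equilibrium point (0, 0) of X' = aX + bY, Y' = cX + dY by (23.22)."""
    p = a + d
    q = a * d - b * c
    disc = p * p - 4 * q  # Delta
    if q == 0:
        return "degenerate (q = 0): equilibrium points not isolated"
    if q < 0:
        return "saddle"
    if p == 0:
        return "centre"
    stability = "stable" if p < 0 else "unstable"
    if disc > 0:
        return stability + " node"
    if disc < 0:
        return stability + " spiral"
    return stability + " node (repeated eigenvalue: borderline case)"


# The two equilibrium points of Example 23.7:
print(classify_linear(1, -1, -1, -1))  # saddle
print(classify_linear(1, -1, 1, 1))    # unstable spiral
```

As the following paragraph warns, a "centre" returned for a linearization does not guarantee a centre for the original nonlinear system.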
Since the differential equations in (23.20) are linear, the phase diagram centred on $(0, 0)$ is self-similar: the pattern of paths is the same if viewed centrally through a microscope or seen over an immense field, so for the linear system we are not restricted to small $X$ and $Y$.
In applying (23.22), do not be too ready to decide that the original equations
have a centre just because the linearized ones do: the small difference from the
linear approximation may be all that is necessary to change a centre into a spiral.

Example 23.7 Classify the equilibrium points of the system


$$\dot{x} = x - y, \qquad \dot{y} = 1 - xy.$$
The equilibrium points are where x − y = 0, 1 − xy = 0; that is, at (1, 1) and (−1, −1).
Near (1, 1). Put x = 1 + X, y = 1 + Y. Then x − y = X − Y, and
1 − xy = 1 − (1 + X)(1 + Y) ≈ −X − Y
for X and Y small. Therefore, in (23.20),
a = 1, b = −1, c = −1, d = −1;
so p = 0, q = −2, ∆ = 8. According to (23.22), this is a saddle point (which is an unstable
equilibrium point).
Near $(-1, -1)$. Put $x = -1 + X$, $y = -1 + Y$; then we obtain
$$x - y = X - Y, \qquad 1 - xy = X + Y - XY \approx X + Y,$$
so that $a = 1$, $b = -1$, $c = 1$, $d = 1$. Therefore $q = 2 > 0$, $p = 2 > 0$, $\Delta = -4 < 0$; so, by (23.22), the point is an unstable spiral. The computed phase diagram is shown in Fig. 23.11. Note that the paths have zero slope on $xy = 1$ and infinite slope on $y = x$. See the remarks before (23.17).

Fig. 23.11 Computed phase diagram for Example 23.7, with equilibrium points at $(1, 1)$ and $(-1, -1)$.
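The coefficients $a, b, c, d$ in (23.20) are the rates of change of $P$ and $Q$ with respect to $x$ and $y$ at the equilibrium point, so they can also be estimated numerically. A Python sketch using central differences; the step `eps` is an arbitrary choice, and the function name is illustrative. Applied to Example 23.7 it should reproduce the hand-computed values $(1, -1, -1, -1)$ at $(1, 1)$ and $(1, -1, 1, 1)$ at $(-1, -1)$.

```python
def linear_coefficients(P, Q, k, l, eps=1e-6):
    """Estimate a, b, c, d of (23.20) at the equilibrium point (k, l) by central differences."""
    a = (P(k + eps, l) - P(k - eps, l)) / (2 * eps)
    b = (P(k, l + eps) - P(k, l - eps)) / (2 * eps)
    c = (Q(k + eps, l) - Q(k - eps, l)) / (2 * eps)
    d = (Q(k, l + eps) - Q(k, l - eps)) / (2 * eps)
    return a, b, c, d


def P(x, y):
    return x - y      # Example 23.7


def Q(x, y):
    return 1 - x * y  # Example 23.7


for point in [(1.0, 1.0), (-1.0, -1.0)]:
    a, b, c, d = linear_coefficients(P, Q, *point)
    p, q = a + d, a * d - b * c
    print(point, [round(v, 6) for v in (a, b, c, d)], "p =", round(p, 6), "q =", round(q, 6))
```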

Self-test 23.4
Investigate the linear approximations of
$\dot{x} = y(x^2 - 1)$, $\dot{y} = -x(y^2 - 1)$
at its equilibrium points (see Self-test 23.3). Sketch the phase diagram.

23.7 Classification of linear equilibrium points



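We return to the linearized time-dependent system $\dot{X} = aX + bY$, $\dot{Y} = cX + dY$. In matrix form the system may be written as
$$\dot{\mathbf{X}} = \mathbf{A}\mathbf{X}, \qquad (23.23)$$
where
$$\mathbf{X}(t) = \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix}, \qquad \dot{\mathbf{X}}(t) = \begin{pmatrix} \dot{X}(t) \\ \dot{Y}(t) \end{pmatrix}, \qquad \mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad (23.24)$$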

and we shall assume that


det A = ad − bc ≠ 0 (23.25)

to secure that the origin is the unique equilibrium point (otherwise there is a line
consisting of an infinite number of equilibrium points).
We look for a basis of solutions having the form
$$\mathbf{X}_1(t) = \mathbf{U}_1 e^{\lambda_1 t}, \qquad \mathbf{X}_2(t) = \mathbf{U}_2 e^{\lambda_2 t}, \qquad (23.26)$$

in which U1 and U2 are linearly independent constant vectors. Every solution X(t)
of (23.23) will be the sum of multiples of the basic solutions (23.26).
To determine λ1, λ2, U1 and U2, substitute the form
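$$\mathbf{X}(t) = \mathbf{U}e^{\lambda t}$$
into (23.23). After cancelling the common factor $e^{\lambda t}$, we obtain $\lambda\mathbf{U} = \mathbf{A}\mathbf{U}$, or
$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{U} = \mathbf{0}, \qquad (23.27)$$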

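where $\mathbf{I}$ is the identity matrix. The solutions $\lambda$ are the eigenvalues of $\mathbf{A}$, and $\mathbf{U}_1$ and $\mathbf{U}_2$ a pair of linearly independent eigenvectors. Equation (23.27) has nonzero solutions for $\mathbf{U}$ if, and only if,
$$\det(\mathbf{A} - \lambda\mathbf{I}) = \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = 0,$$
which is the quadratic equation
$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0.$$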
Put
$$p = a + d, \qquad q = ad - bc, \qquad \Delta = p^2 - 4q. \qquad (23.28)$$
Then the eigenvalues are given by
$$\lambda_1, \lambda_2 = \tfrac{1}{2}\left(p \pm \Delta^{1/2}\right). \qquad (23.29)$$
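Formula (23.29) can be evaluated with complex arithmetic so that the case $\Delta < 0$ needs no special treatment. In the sketch below, an eigenvector is taken as $\mathbf{U} = (b, \lambda - a)$, a choice that assumes $b \ne 0$: its first row of $(\mathbf{A} - \lambda\mathbf{I})\mathbf{U}$ is $(a - \lambda)b + b(\lambda - a) = 0$, and its second row is $-(\lambda^2 - p\lambda + q)$, which vanishes because $\lambda$ satisfies the quadratic. The function names are illustrative.

```python
import cmath


def eigenvalues(a, b, c, d):
    """lambda_1, lambda_2 = (p ± Delta**(1/2))/2 from (23.28)-(23.29); complex when Delta < 0."""
    p = a + d
    q = a * d - b * c
    root = cmath.sqrt(p * p - 4 * q)
    return (p + root) / 2, (p - root) / 2


def eigenvector(a, b, c, d, lam):
    """A nonzero solution U of (A - lam I)U = 0, assuming b != 0."""
    return (b, lam - a)


a, b, c, d = 1, -1, 1, 1  # the point (-1, -1) of Example 23.7
for lam in eigenvalues(a, b, c, d):
    u, v = eigenvector(a, b, c, d, lam)
    residual = (abs((a - lam) * u + b * v), abs(c * u + (d - lam) * v))
    print(lam, residual)
```

Here $\Delta = -4$, so the eigenvalues are the complex pair $1 \pm i$, with positive real part, consistent with the unstable spiral found in Example 23.7.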

There are three main categories, corresponding to very different behaviours of the phase paths. The general solution, with basis (23.26), takes the form
$$\mathbf{X}(t) = C_1\mathbf{U}_1 e^{\lambda_1 t} + C_2\mathbf{U}_2 e^{\lambda_2 t}, \qquad (23.30)$$
where $C_1$ and $C_2$ are arbitrary constants.


Firstly, suppose that $\Delta > 0$. Then $\lambda_1$, $\lambda_2$, $\mathbf{U}_1$, $\mathbf{U}_2$, $C_1$ and $C_2$ are all real. If $\lambda_1 > 0$ and $\lambda_2 > 0$, all $\mathbf{X}(t) \to \infty$ as $t \to \infty$ and $\mathbf{X}(t) \to \mathbf{0}$ as $t \to -\infty$. If $\lambda_1 < 0$ and $\lambda_2 < 0$, then all $\mathbf{X}(t) \to \mathbf{0}$ as $t \to \infty$ and $\mathbf{X}(t) \to \infty$ as $t \to -\infty$. These cases correspond respectively to unstable and stable nodes (Fig. 23.6 is an example of a node).
If the eigenvalues are of opposite sign, say $\lambda_1 < 0$ and $\lambda_2 > 0$, then all $\mathbf{X}(t) \to \infty$ as $t \to \infty$, with the exception of the two straight-line paths corresponding to $C_2 = 0$, which enter the origin as $t \to \infty$. There are also two that emerge from the origin, corresponding to $C_1 = 0$. This behaviour defines a saddle point (see Fig. 23.4).
Now suppose that $\Delta < 0$. Then $\Delta^{1/2}$ is pure imaginary, and $\lambda_1$ and $\lambda_2$ are complex conjugates:
$$\lambda_1 = \alpha + i\beta, \qquad \lambda_2 = \alpha - i\beta, \qquad (23.31)$$

say, where $\alpha$ and $\beta$ are real. Also the eigenvectors $\mathbf{U}_1$ and $\mathbf{U}_2$ are complex conjugates (or may be so chosen), and since we require the most general real solution we choose $C_2 = \bar{C}_1$. Express these parameters in the form
$$C_1 = |C_1|e^{i\gamma}, \quad C_2 = |C_1|e^{-i\gamma}, \quad \mathbf{U}_1 = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} |u|e^{i\rho} \\ |v|e^{i\sigma} \end{pmatrix}, \quad \mathbf{U}_2 = \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix} = \begin{pmatrix} |u|e^{-i\rho} \\ |v|e^{-i\sigma} \end{pmatrix}. \qquad (23.32)$$
The general (real) solution for the system becomes
$$X(t) = Ce^{\alpha t}|u|\cos(\gamma + \rho + \beta t), \qquad Y(t) = Ce^{\alpha t}|v|\cos(\gamma + \sigma + \beta t), \qquad (23.33)$$
in which $C$ $(= 2|C_1|)$ and $\gamma$ are arbitrary. In the $X, Y$ plane this represents two simultaneous harmonic motions having the same circular frequency $\beta$ and different phases, $\gamma + \rho$ and $\gamma + \sigma$, and amplitudes modulated by the factor $e^{\alpha t}$.
Consider first the case $\alpha = 0$ in (23.31), so that
$$X(t) = C|u|\cos(\gamma + \rho + \beta t), \qquad Y(t) = C|v|\cos(\gamma + \sigma + \beta t). \qquad (23.34)$$
X(t) and Y(t) are periodic, with period 2π/β, and therefore (X(t), Y(t)) represents a
closed path in the X, Y phase plane, which must surround the origin, since the origin
is the only equilibrium point. By varying C we generate a family of geometrically
similar curves. Therefore the origin is a centre (see, for example, Fig. 23.3: it can
be shown that these paths are central ellipses inclined to the axes in general).
In eqn (23.33) for $\alpha \ne 0$, the factor $e^{\alpha t}$ modulates the amplitude of (23.34). The closed paths (23.34) are no longer closed, but expand along every cycle if $\alpha > 0$, and contract, approaching the origin, if $\alpha < 0$. We therefore have a family of spirals (see, for example, Fig. 23.5), and the origin is therefore called a spiral point.

23.8 Limit cycles


Spirals, centres, etc., occur for both linear and nonlinear systems, but a limit cycle
is a feature only of nonlinear systems. When it occurs it usually represents the
most important phenomenon in the phase plane. The following example includes
a limit cycle.

Example 23.8 Sketch a phase diagram for


$$\ddot{x} + (x^2 + \dot{x}^2 - 1)\dot{x} + x = 0.$$
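Put
$$\dot{x} = y, \qquad \dot{y} = (1 - x^2 - y^2)y - x. \qquad \text{(i)}$$
It is possible to express the phase paths in polar coordinates $r$, $\theta$:
$$r^2 = x^2 + y^2 \quad\text{and}\quad \tan\theta = y/x.$$
Differentiate these equations with respect to $t$:
$$r\dot{r} = x\dot{x} + y\dot{y}, \qquad \dot{\theta}/\cos^2\theta = (x\dot{y} - y\dot{x})/x^2. \qquad \text{(ii)}$$
Substitute (i) into (ii): remember that $\dot{x} = y$, and put $x = r\cos\theta$ and $y = r\sin\theta$ as necessary. Then $r(t)$ and $\theta(t)$ are found to satisfy
$$\dot{r} = -r(r^2 - 1)\sin^2\theta, \qquad \text{(iii)}$$
$$\dot{\theta} = -1 - (r^2 - 1)\sin\theta\cos\theta. \qquad \text{(iv)}$$
A particular solution of (iii) and (iv) is $r = 1$, with $\dot{\theta} = -1$. This indicates a path consisting of the circle $r = 1$, followed around in the clockwise direction with unit angular velocity. Also, from (iii),
$$\dot{r} \begin{cases} \ge 0 & \text{if } r < 1, \\ \le 0 & \text{if } r > 1, \end{cases} \qquad \text{(v)}$$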
so the circle is approached from points inside by means of expanding spirals, and from
points outside by contracting spirals. The phase diagram is shown in Fig. 23.12.

Fig. 23.12 Phase diagram for Example 23.8 in the $(x, y)$ plane, showing the limit cycle $r = 1$.

If we start from any initial conditions except for the equilibrium point (0, 0), the
system settles down gradually to the regular oscillation represented by the circle. This
behaviour has a physical explanation. The ‘coefficient’ $x^2 + \dot{x}^2 - 1$, although variable, serves the purpose of a damping coefficient. Outside the circle, when $x^2 + y^2 - 1 > 0$ (remember $y = \dot{x}$), energy is lost and the paths tend to drift inwards. When $x^2 + y^2 - 1 < 0$ there is negative damping; energy is being supplied, so the amplitude of paths within the circle increases. For points on the circle $x^2 + y^2 - 1 = 0$ the damping is zero, so the motion is harmonic (the solutions are $x = \cos(t + \phi)$, with $\phi$ any constant), consistent with the circular path.
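The approach to the circle can also be seen numerically. A sketch under stated assumptions: it uses the classical fourth-order Runge–Kutta method (a swapped-in integrator, not Euler's rule (23.36)) with a hypothetical step $h = 0.01$, integrates (i) from one point inside and one outside the circle up to $t = 30$, and reports the final distance from the origin.

```python
import math


def field(x, y):
    """Right-hand sides of (i) in Example 23.8."""
    return y, (1.0 - x * x - y * y) * y - x


def rk4_path(x, y, h, steps):
    """Advance (x, y) by `steps` classical Runge-Kutta steps of size h."""
    for _ in range(steps):
        k1 = field(x, y)
        k2 = field(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = field(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


for r0 in (0.2, 2.0):             # start inside and outside the circle r = 1
    x, y = rk4_path(r0, 0.0, 0.01, 3000)
    print(r0, math.hypot(x, y))   # both final radii are close to 1
```

Starting exactly at the equilibrium point $(0, 0)$, the computed state does not move at all, in agreement with the exception noted in the text.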

Fig. 23.13 (a) Limit cycle for $\ddot{x} + 10(x^2 - 1)\dot{x} + x = 0$ in the $(x, y)$ plane. (b) The solution $x(t)$ corresponding to the limit cycle.

The circular path $r = 1$ in Example 23.8 is an example of a limit cycle, which is defined generally as an isolated closed phase path. If the paths approach it
spirally (in a broad sense) from both sides, it is called a stable limit cycle. It then
represents a stable oscillation: if we disturb, or perturb, the oscillation by a small
amount, it simply creeps back into the original oscillation. If the paths on one or
both sides point away from the limit cycle, it is called unstable and is unlikely ever
to be observed in practice.
To show how distorted a limit cycle can be, we return to the van der Pol equation for the special case $\ddot{x} + c(x^2 - 1)\dot{x} + x = 0$ with $c = 10$, a comparatively large parameter value. Figure 23.13 shows its limit cycle in the $(x, \dot{x})$ phase plane together with the solution represented by the limit cycle. As $c$ becomes larger the oscillator has a nearly constant output for long intervals, and then suddenly switches, exhibiting what are known as relaxation oscillations.
An extensive advanced treatment of the phase plane in differential equations is
given by Jordan and Smith (2007a,b).

Self-test 23.5
Using polar coordinates, show that $\ddot{x} + (x^2 + \dot{x}^2 - 1)\dot{x} + x = 0$ has a limit cycle whose path is given by $r = 1$. Is it stable?

23.9 A numerical method for phase paths


We shall show a numerical method, related to Euler’s method of Section 22.2, for
plotting phase paths of the system
$$\dot{x}(t) = P(x(t), y(t)), \qquad \dot{y}(t) = Q(x(t), y(t)). \qquad (23.35)$$

Essentially we use $t$ as a parameter in a step-by-step solution. Start from an initial point $P_0 : (x_0, y_0)$ at time $t_0$. The choice of $t_0$ does not affect the path constructed because the equations are autonomous. Take short time steps of length $h$. Then we proceed from point to point in the diagram:
$$P_0 : (x_0, y_0) \to P_1 : (x_1, y_1) \to P_2 : (x_2, y_2) \to \cdots.$$
Since, approximately,
$$x_{n+1} - x_n = h\dot{x}(t_n) = hP(x_n, y_n),$$
and similarly for $y_{n+1} - y_n$, the rule for getting from $P_n$ to $P_{n+1}$ is as follows.

Euler’s method for $\dot{x} = P(x, y)$, $\dot{y} = Q(x, y)$

$$x_{n+1} = x_n + hP(x_n, y_n), \qquad y_{n+1} = y_n + hQ(x_n, y_n).$$
Compute for $n = 0, 1, 2, \dots$ successively. (23.36)
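Rule (23.36) takes only a few lines of code. A minimal Python sketch; the name `euler_path` is illustrative. The usage example is Problem 23.1(a), $\dot{x} = y$, $\dot{y} = -4x$, for which one step gives $4x_{n+1}^2 + y_{n+1}^2 = (1 + 4h^2)(4x_n^2 + y_n^2)$ exactly (the cross terms $\pm 8hx_ny_n$ cancel). The computed path therefore spirals slowly outwards instead of closing, which is why a smaller $h$ is advised there.

```python
def euler_path(P, Q, x0, y0, h, n):
    """Apply Euler's rule (23.36) n times from (x0, y0); return lists of x and y values.

    A negative h traces the path backwards in time.
    """
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(n):
        # both increments use the old point (x_n, y_n), as in (23.36)
        x, y = x + h * P(x, y), y + h * Q(x, y)
        xs.append(x)
        ys.append(y)
    return xs, ys


# Problem 23.1(a): x' = y, y' = -4x, whose exact paths are the ellipses 4x^2 + y^2 = C.
h, n = 0.001, 1000
xs, ys = euler_path(lambda x, y: y, lambda x, y: -4.0 * x, 1.0, 0.0, h, n)
ratio = (4 * xs[-1] ** 2 + ys[-1] ** 2) / 4.0
print(ratio, (1 + 4 * h * h) ** n)  # the two numbers agree up to rounding
```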

This process gives rise to rather unevenly spaced points on a phase path, widely
spaced when P and Q are large, and very closely spaced near an equilibrium point,
where P and Q are inevitably small. However, they have the advantage that, if
necessary, regular time indications can be marked on the path while it is being
computed.
If evenly spaced points are wanted, the parameter can be changed from time $t$ to arc-length $s$. We have $\delta s^2 = \delta x^2 + \delta y^2$; so
$$\frac{ds}{dt} = \left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2\right]^{1/2} = (P^2 + Q^2)^{1/2}.$$
Therefore
$$\frac{dx}{ds} = \frac{dx/dt}{ds/dt} = \frac{P}{(P^2 + Q^2)^{1/2}} \quad\text{and}\quad \frac{dy}{ds} = \frac{dy/dt}{ds/dt} = \frac{Q}{(P^2 + Q^2)^{1/2}}$$
are equivalent equations for the path, in terms of arc-length $s$. This gives the following method.

To compute the paths of the system $\dot{x} = P(x, y)$, $\dot{y} = Q(x, y)$, at evenly spaced points

Apply (23.36) to the equivalent system
$$\frac{dx}{ds} = L(x, y), \qquad \frac{dy}{ds} = M(x, y),$$
where
$$L = P/(P^2 + Q^2)^{1/2}, \qquad M = Q/(P^2 + Q^2)^{1/2}.$$
The step length $h$ is the distance along the path. (23.37)
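Since $(L, M)$ is a unit vector, each Euler step of (23.37) moves exactly a distance $h$. A minimal Python sketch; the function name is illustrative, and stopping at a point where $P = Q = 0$ is an added safeguard, because $L$ and $M$ are undefined at an equilibrium point. The usage example is the saddle of Problem 23.1(b) from a hypothetical starting point $(1, 0.5)$.

```python
import math


def arc_length_path(P, Q, x0, y0, h, n):
    """Euler's rule (23.36) applied to dx/ds = L, dy/ds = M of (23.37).

    Each step moves a distance h; the loop stops early on an equilibrium
    point, where L and M are undefined.
    """
    xs, ys = [x0], [y0]
    x, y = x0, y0
    for _ in range(n):
        p, q = P(x, y), Q(x, y)
        speed = math.hypot(p, q)  # (P^2 + Q^2)^(1/2)
        if speed == 0.0:
            break
        x, y = x + h * p / speed, y + h * q / speed
        xs.append(x)
        ys.append(y)
    return xs, ys


# Problem 23.1(b): the saddle x' = y, y' = x, starting from (1, 0.5) with h = 0.05.
xs, ys = arc_length_path(lambda x, y: y, lambda x, y: x, 1.0, 0.5, 0.05, 200)
steps = [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) for i in range(len(xs) - 1)]
print(min(steps), max(steps))  # both equal h up to rounding
```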


Problems

Many of the problems involve computation. The method of Section 23.9 is sufficient, but a high-accuracy computer library routine would allow the use of much larger values of $h$, and therefore be more efficient.

23.1 (Computation). Practise computing a phase diagram in the following cases. Information is given for checking, but imagine that you do not have it. The equilibrium points are at $(0, 0)$. Take different starting points; you may have to work backwards as well as forwards, by changing the sign of $h$.
(a) $\dot{x} = y$, $\dot{y} = -4x$. (A centre, $4x^2 + y^2 = C$. If your paths do not nearly close, try a smaller interval $h$.)
(b) $\dot{x} = y$, $\dot{y} = x$. (A saddle, $x^2 - y^2 = C$. Find the asymptotes by trying $y = mx$ in the equations: they are $y = \pm x$. It is difficult to make sense of the diagram without this information.)
(c) $\dot{x} = y$, $\dot{y} = -2x - 3y$. (Stable node. Find the two solutions $y = -x$, $y = -2x$, which are radial straight lines as in (b). These represent four paths since they are interrupted by the origin.)
(d) $\dot{x} = y$, $\dot{y} = -3x - y$. (Stable spiral.)
(e) $\dot{x} = y$, $\dot{y} = -2x + y$. (Unstable spiral.)
(f) Recompute (a), marking off a time scale on each of the paths, showing intervals in $t$ of around 0.3.
(g) Recompute (b) with a time scale as in (f).
(h) $\dot{x} = y$, $\dot{y} = -2y$. A different type: what is the second-order equation that it comes from?

23.2 Sketch the phase paths for the following equations by first solving for them: form $dy/dx$ and separate the variables.
(a) $\dot{x} = y$, $\dot{y} = x$; (b) $\dot{x} = x$, $\dot{y} = y$;
(c) $\dot{x} = -y$, $\dot{y} = x$; (d) $\dot{x} = -x$, $\dot{y} = y$;
(e) $\dot{x} = 2y$, $\dot{y} = x$; (f) $\dot{x} = -2y$, $\dot{y} = x$.

23.3 Solve the following by using the energy transformation $d^2x/dt^2 = \tfrac{1}{2}\,d(\dot{x}^2)/dx$ (Example 20.15), and sketch the $(x, \dot{x})$ phase diagrams.
(a) $\ddot{x} = e^x$;
(b) $\ddot{x} + \dot{x}^2 + x = 0$ (the transformed equation is linear in $y^2$);
(c) $\ddot{x} - 8x\dot{x} = 0$;
(d) $\ddot{x} = e^x - e^{-x}$ (the Poisson–Boltzmann equation).

23.4 Classify the equilibrium point $(0, 0)$ for each of the following linear equations by using (23.22). Sketch the phase diagram: in cases where it is appropriate you should first obtain the radial straight paths $y = mx$ by substitution. State which are unstable.
(a) $\dot{x} = x - 5y$, $\dot{y} = x - y$;
(b) $\dot{x} = x + y$, $\dot{y} = x - 2y$;
(c) $\dot{x} = -4x + 2y$, $\dot{y} = 3x - 2y$;
(d) $\dot{x} = x + 2y$, $\dot{y} = 2x + 2y$;
(e) $\dot{x} = 4x - 2y$, $\dot{y} = 3x - y$;
(f) $\dot{x} = 2x + 3y$, $\dot{y} = -3x - 3y$.

23.5 For the equations given: find any equilibrium points; obtain a linear approximation at each equilibrium point by the method of Section 23.6; classify it from (23.22) (finding the straight line paths in the case of nodes and saddles); and put the sketches on a phase diagram. Guess how the diagram away from the equilibrium points is filled in (isoclines, Section 17.1, might help here). Then turn to Problem 23.6.
(a) $\dot{x} = x - y$, $\dot{y} = x + y - 2xy$;
(b) $\dot{x} = 1 - xy$, $\dot{y} = (x - 1)(y + 1)$;
(c) $\dot{x} = x - y$, $\dot{y} = x^2 - 1$;
(d) $\ddot{x} + x - x^3 = 0$ (with $\dot{x} = y$);
(e) $\dot{x} = 4x - 2xy$, $\dot{y} = -2y + xy$, for $x \ge 0$ and $y \ge 0$ (foxes and rabbits, Example 23.6: classify $(0, 0)$ as if $x$ and $y$ could be negative).

23.6 (Computational). Check some of the phase diagrams you sketched in Problem 23.5 by computing representative phase paths. Look out for separatrices, which end at equilibrium points.

23.7 Sketch possible phase diagrams from the information given. If a phase path ends in mid air, or if you have a closed curve without an equilibrium point inside, then there is something wrong. There are often several possibilities: for example, a path might either join two equilibrium points or split, forming two branches going to infinity. Suppose that the only equilibrium points at a finite distance are those given in the following cases.
(a) centre at $(0, 0)$, saddle at $(1, 0)$;
(b) centre at $(0, 0)$, saddles at $(\pm 1, 0)$;
(c) unstable node at $(0, 0)$, stable node at $(1, 0)$;
(d) centres at $(\pm 1, 0)$.

23.8 (Computational). Obtain a phase diagram for the following (in some of these the linear approximation at the equilibrium point is zero, so it gives no information):
(a) $\ddot{x} + |\dot{x}|\dot{x} + x = 0$; (b) $\ddot{x} + |\dot{x}|\dot{x} + x^3 = 0$;
(c) $\ddot{x} = x^4 - x^2$; (d) $\dot{x} = 2xy$, $\dot{y} = y^2 - x^2$;
(e) $\dot{x} = 2xy$, $\dot{y} = x^2 - y^2$;
(f) $\ddot{x} + \dot{x}(x^2 + \dot{x}^2) + x = 0$ (notice that the origin is a spiral, although the linear approximation has a centre; see the remark following (23.22)).

23.9 From the Taylor series (5.4b), $\sin x \approx x - \tfrac{1}{6}x^3$ for small $x$, so the pendulum equation (23.9) is approximated by $\ddot{x} + \omega^2(x - \tfrac{1}{6}x^3) = 0$ (the Duffing equation). Sketch or compute the phase diagram, and comment on the differences from Fig. 23.9, for the exact equation.

23.10 (Computational). For a modified form of the predator–prey problem (compare Example 23.6), in a special case, the equations are
$$\dot{x} = 4x - 2xy - x^2, \qquad \dot{y} = -2y + xy - 2y^2.$$
The additional terms in $x^2$ and $y^2$ are meant to account for competition for resources among rabbits and among foxes. Use a linear approximation at the equilibrium points in order to classify them, then compute the phase diagram.

23.11 A model for $H(t)$ hosts supporting $P(t)$ dangerous parasites is $\dot{H} = (a - bP)H$, $\dot{P} = (c - dP/H)P$, where $a, b, c, d$ are positive. Analyse the system in the $(H, P)$ plane.

23.12 Figure 23.14 represents a spring of stiffness $s$ and natural length $l$, pivoted at A at a height $h$ above a smooth wire CD. At B is a bead $m$, attached to the spring and sliding on the wire. The equation of motion is
$$\ddot{x} + \frac{s}{m}\left(1 - \frac{l}{(h^2 + x^2)^{1/2}}\right)x = 0.$$
Classify the equilibrium points when $l < h$, $l = h$, and $l > h$.

Fig. 23.14 Spring AB pivoted at A, a height $h$ above the wire CD; the bead B is at displacement $x$ along the wire.

23.13 (Computational). Solve Problem 23.12 modified so that there is friction between the bead and the wire equal to $k\dot{x}$. Classify the equilibrium points and construct the phase diagrams.

23.14 (Computational). Construct phase diagrams for the equation $\ddot{x} + k\dot{x} - x + x^2 = 0$. Consider various values of $k$.

23.15 (Computational). Construct a phase diagram for the following equations. (They each contain a limit cycle.)
(a) $\ddot{x} + \tfrac{1}{2}(x^2 + \dot{x}^2 - 1)\dot{x} + x = 0$;
(b) $\ddot{x} + \tfrac{1}{5}(x^2 - 1)\dot{x} + x = 0$;
(c) $\ddot{x} + \tfrac{1}{5}(\tfrac{1}{3}\dot{x}^2 - 1)\dot{x} + x = 0$;
(d) $\ddot{x} + 5(x^2 - 1)\dot{x} + x = 0$.

23.16 As in Problem 23.7, sketch phase diagrams for the general $(x, y)$ phase plane compatible with the following information. The equilibrium points and limit cycles specified are the only ones allowed.
(a) $(0, 0)$ is a spiral and $x^2 + y^2 = 1$ is a stable limit cycle.
(b) $(0, 0)$ is a spiral, $x^2 + y^2 = 1$ a stable limit cycle, and $x^2 + y^2 = 4$ another limit cycle.
(c) $(\pm 1, 0)$ are saddles, $(0, 0)$ is a centre, and $x^2 + y^2 = 4$ is a stable limit cycle.
(d) $(\pm 1, 0)$ are centres, $(0, 0)$ is a saddle, and $x^2 + y^2 = 4$ is a stable limit cycle.
(e) $(0, 0)$ is a centre; the only closed path with $x^2 + y^2 > 1$ is the stable limit cycle $x^2 + y^2 = 4$.

23.17 Show that, in polar coordinates, the system
$$\dot{x} = -y + x(1 - x^2 - y^2), \qquad \dot{y} = x + y(1 - x^2 - y^2)$$
becomes
$$\dot{r} = r(1 - r^2), \qquad \dot{\theta} = 1.$$
By investigating the sign of $\dot{r}$, explain why the system has just one limit cycle, which is stable. Sketch the phase diagram.

23.18 Find the locations of all the equilibrium points of
$$\dot{x} = (x^2 + y^2 - 1)y, \qquad \dot{y} = -(x^2 + y^2 - 1)x.$$
Explain why the circle $x^2 + y^2 = 1$ does not represent periodic motion.

23.19 Verify that the differential equation
$$\ddot{x} + \left(1 - x^2 - \frac{\dot{x}^2}{\omega^2}\right)\dot{x} + \omega^2 x = 0$$
has the particular solution $x = \cos\omega(t - t_0)$ for any $t_0$. What is the corresponding phase path in the $(x, y = \dot{x})$ plane? Put further details on a sketch of the phase diagram.

23.20 Locate all equilibrium points of the system
$$\dot{x} = (x^2 - 1)y, \qquad \dot{y} = (y^2 - 1)x,$$
and sketch its phase diagram.
Part 4
Transforms and Fourier Series

24 The Laplace transform

CONTENTS

24.1 The Laplace transform
24.2 Laplace transforms of $t^n$, $e^{\pm t}$, $\sin t$, $\cos t$
24.3 Scale rule; shift rule; factors $t^n$ and $e^{kt}$
24.4 Inverting a Laplace transform
24.5 Laplace transforms of derivatives
24.6 Application to differential equations
24.7 The unit function and the delay rule
24.8 The division rule for $f(t)/t$
Problems

The Laplace transform $\mathcal{L}$ is an integral operator; it symbolizes a process that transforms any function $f(t)$ supplied to it into another function, $F(s)$ say, denoted by $\mathcal{L}\{f(t)\}$ and called the Laplace transform of $f(t)$. Its form is defined by eqn (24.1).
It has many applications, but here we shall use it for simplifying the solution
of linear differential equations, usually with constant coefficients, under given
initial conditions. The method greatly simplifies what can otherwise be a com-
plicated process, especially if the equations are of high order, or if a forcing term
has jump discontinuities.
The particular virtue of the Laplace transform is to convert the differential
equation into a linear algebraic equation for F(s) that automatically incorporates
the initial conditions as parameters. This equation is easily solved to give F(s)
explicitly. Finally we recover the unknown f(t) from its Laplace transform F(s).
This is called inversion of the transform. The required inverse can be expressed
in all cases as a so-called contour integral, but for the applications in this book
ordinary algebra is sufficient.

24.1 The Laplace transform


Suppose that $f(t)$ is a specified function, defined for $0 < t < \infty$, and that $s$ is a real positive parameter (that is to say, a supplementary variable). Then the integral
$$\int_0^\infty e^{-st} f(t)\,dt = F(s)$$
(for $s$ large enough to ensure convergence) is called the Laplace transform of $f(t)$: the integral transforms $f(t)$ into another function $F(s)$.

For example, suppose that $f(t) = e^{2t}$ and s > 2. Then

$$F(s) = \int_0^\infty e^{-st}e^{2t}\,dt = \int_0^\infty e^{-(s-2)t}\,dt = \left[-\frac{1}{s-2}e^{-(s-2)t}\right]_0^\infty = -\frac{1}{s-2}\left(e^{-\infty} - e^{0}\right) = -\frac{1}{s-2}(0-1) = \frac{1}{s-2}.$$

This result is true only if s > 2; otherwise the integral is infinite. We shall always assume that s is large enough to ensure that the integrals we encounter remain finite, or converge (see Section 15.6).

We also use the symbol $\mathcal{L}$ to stand for 'the Laplace transform of'. We have just proved that

$$F(s) = \mathcal{L}\{e^{2t}\} = \frac{1}{s-2}.$$

Laplace transform of f(t):

$$\mathcal{L}\{f(t)\} = F(s) = \int_0^\infty e^{-st} f(t)\,dt, \quad t \ge 0. \qquad (24.1)$$

Another, very useful, notation is to indicate a transformed function by a tilde sign: $\mathcal{L}\{f(t)\} = \tilde f(s)$, $\mathcal{L}\{x(t)\} = \tilde x(s)$, and so on.
The letter p is often used for the parameter instead of s, especially in mainly theoretical texts.
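As an informal numerical check on the definition (24.1), the infinite integral can be cut off at a large time and evaluated by Simpson's rule. The following Python sketch is only an illustration, not a method used in this book: the name `laplace_numeric`, the cut-off `t_max` and the number of steps `n` are assumptions, and the truncation is trustworthy only when s is large enough that $e^{-st}f(t)$ is negligible beyond `t_max`.

```python
import math

def laplace_numeric(f, s, t_max=40.0, n=40_000):
    """Approximate F(s) = integral from 0 to infinity of e^{-st} f(t) dt.

    Simpson's rule on [0, t_max]; assumes e^{-st} f(t) is negligible for t > t_max.
    """
    if n % 2:
        n += 1                            # Simpson's rule needs an even number of panels
    h = t_max / n

    def g(t):
        return math.exp(-s * t) * f(t)    # the integrand of (24.1)

    total = g(0.0) + g(t_max)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

# The worked example: L{e^{2t}} = 1/(s - 2), valid only for s > 2.
s = 5.0
print(laplace_numeric(lambda t: math.exp(2 * t), s), 1 / (s - 2))
# Two entries derived in Section 24.2: L{t^3} = 3!/s^4 and L{cos t} = s/(s^2 + 1).
print(laplace_numeric(lambda t: t**3, 2.0), 6 / 2**4)
print(laplace_numeric(math.cos, 1.0), 1.0 / (1.0**2 + 1))
```

With s = 5 the approximation agrees with 1/(s − 2) = 1/3 to many decimal places; trying s < 2 instead makes the truncated integral grow with `t_max`, the numerical face of the divergence mentioned above.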

24.2 Laplace transforms of $t^n$, $e^{\pm t}$, sin t, cos t


(a) Positive, whole-number powers $t^n$, n = 0, 1, 2, … .

$$\mathcal{L}\{t^n\} = \int_0^\infty e^{-st}t^n\,dt.$$

Simplify the integral by substituting u = st, so that

$$t = \frac{u}{s} \quad\text{and}\quad dt = \frac{du}{s}.$$

Provided that s is positive, the limits of integration t = 0 and ∞ correspond to u = 0 and ∞ respectively. Therefore

$$\mathcal{L}\{t^n\} = \int_0^\infty e^{-u}\left(\frac{u}{s}\right)^n\frac{du}{s} = \frac{1}{s^{n+1}}\int_0^\infty e^{-u}u^n\,du = \frac{n!}{s^{n+1}}$$

for n = 0, 1, 2, … (from the standard integral (17.9) for the factorial). Note that 0! is to be interpreted as being equal to 1.

Laplace transform of powers:

$$\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}, \quad n = 0, 1, 2, \dots, \quad t > 0.$$

Special cases:

$$\mathcal{L}\{1\} = \frac{1}{s}, \quad \mathcal{L}\{t\} = \frac{1}{s^2}, \quad \mathcal{L}\{t^2\} = \frac{2!}{s^3}, \quad \mathcal{L}\{t^3\} = \frac{3!}{s^4}. \qquad (24.2)$$

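Example 24.1 Find the Laplace transform F(s) of f(t) when

$$f(t) = 1 - t + \frac{1}{2!}t^2 - \frac{1}{3!}t^3, \quad t > 0.$$

Composite expressions are dealt with in the following way.

$$F(s) = \mathcal{L}\{f(t)\} = \mathcal{L}\left\{1 - t + \frac{1}{2!}t^2 - \frac{1}{3!}t^3\right\} = \mathcal{L}\{1\} - \mathcal{L}\{t\} + \frac{1}{2!}\mathcal{L}\{t^2\} - \frac{1}{3!}\mathcal{L}\{t^3\}$$

(which follows from the fact that each $\mathcal{L}\{\cdots\}$ stands for the integral (24.1))

$$= \frac{1}{s} - \frac{1}{s^2} + \frac{1}{2!}\,\frac{2!}{s^3} - \frac{1}{3!}\,\frac{3!}{s^4} = \frac{1}{s} - \frac{1}{s^2} + \frac{1}{s^3} - \frac{1}{s^4}.$$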

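(b) Exponential $e^{\pm t}$.

$$\mathcal{L}\{e^{\pm t}\} = \int_0^\infty e^{-st}e^{\pm t}\,dt = \int_0^\infty e^{-(s\mp1)t}\,dt = -\frac{1}{s\mp1}\left[e^{-(s\mp1)t}\right]_0^\infty.$$

Both s − 1 and s + 1 are positive if we take s > 1, in which case

$$\mathcal{L}\{e^{\pm t}\} = -\frac{1}{s\mp1}(0-1) = \frac{1}{s\mp1}.$$

Exponential function, t ≥ 0:

$$\mathcal{L}\{e^{t}\} = \frac{1}{s-1}, \quad \mathcal{L}\{e^{-t}\} = \frac{1}{s+1}. \qquad (24.3)$$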

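(c) Sine and cosine.

Trigonometric functions, t ≥ 0:

$$\mathcal{L}\{\cos t\} = \frac{s}{s^2+1}, \quad \mathcal{L}\{\sin t\} = \frac{1}{s^2+1}. \qquad (24.4)$$

Since $\cos t + i\sin t = e^{it}$, both of these can be verified at the same time by working out $\mathcal{L}\{e^{it}\}$ and then separating the real and imaginary parts:

$$\mathcal{L}\{e^{it}\} = \int_0^\infty e^{-st}e^{it}\,dt = \int_0^\infty e^{-(s-i)t}\,dt = -\frac{1}{s-i}\left[e^{-(s-i)t}\right]_0^\infty = -\frac{1}{s-i}(0-1)$$

(since s is positive, $|e^{-(s-i)t}| = e^{-st} \to 0$ as $t \to \infty$)

$$= \frac{1}{s-i} = \frac{s+i}{s^2+1}.$$

Therefore, as in (24.4),

$$\int_0^\infty e^{-st}\cos t\,dt = \operatorname{Re}\int_0^\infty e^{-st}e^{it}\,dt = \frac{s}{s^2+1}; \qquad \int_0^\infty e^{-st}\sin t\,dt = \operatorname{Im}\int_0^\infty e^{-st}e^{it}\,dt = \frac{1}{s^2+1}.$$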
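Python's built-in complex numbers give a quick spot check of the real/imaginary split: the real and imaginary parts of 1/(s − i) should match s/(s² + 1) and 1/(s² + 1). The helper name and the sample values of s below are arbitrary choices for illustration.

```python
def transform_of_exp_it(s):
    """L{e^{it}} = 1/(s - i) for real s > 0, as derived above."""
    return 1 / complex(s, -1)      # complex(s, -1) is the number s - i

for s in (0.5, 1.0, 2.0, 10.0):
    z = transform_of_exp_it(s)
    cos_part = s / (s**2 + 1)      # L{cos t} from (24.4)
    sin_part = 1 / (s**2 + 1)      # L{sin t} from (24.4)
    print(f"s={s}: Re={z.real:.6f} vs {cos_part:.6f}, Im={z.imag:.6f} vs {sin_part:.6f}")
```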

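Example 24.2 Find the Laplace transform of $3t^2 + 2e^{-t} - 5\cos t$.

$$\mathcal{L}\{3t^2 + 2e^{-t} - 5\cos t\} = 3\mathcal{L}\{t^2\} + 2\mathcal{L}\{e^{-t}\} - 5\mathcal{L}\{\cos t\} = 3\,\frac{2!}{s^3} + 2\,\frac{1}{s+1} - 5\,\frac{s}{s^2+1} = \frac{6}{s^3} + \frac{2}{s+1} - \frac{5s}{s^2+1},$$

from (24.2), (24.3) and (24.4).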

Self-test 24.1
Find, using the definition (24.1), the Laplace transform of (a) sin 2t, (b) $\tfrac12\sin 2t + e^{2t}$ (for s > 2). Show that $\mathcal{L}\{H(t-1)\} = e^{-s}/s$, where H is the unit function (1.13).

24.3 Scale rule; shift rule; factors $t^n$ and $e^{kt}$


The following rules make it easy to derive more complicated transforms from the
basic ones of Section 24.2.

Scale rule
If $\mathcal{L}\{f(t)\} = F(s)$, and k > 0, then for t ≥ 0

$$\mathcal{L}\{f(kt)\} = \frac{1}{k}F\left(\frac{s}{k}\right). \qquad (24.5)$$

The proof is as follows.

$$\mathcal{L}\{f(kt)\} = \int_0^\infty e^{-st}f(kt)\,dt.$$

Change the variable by putting u = kt, so that t = u/k and dt = du/k. The limits of integration t = 0 and ∞ go into u = 0 and ∞ respectively because k > 0. Therefore

$$\mathcal{L}\{f(kt)\} = \int_0^\infty e^{-s(u/k)}f(u)\,\frac{du}{k} = \frac{1}{k}\int_0^\infty e^{-(s/k)u}f(u)\,du = \frac{1}{k}F\left(\frac{s}{k}\right),$$

since $F(s) = \int_0^\infty e^{-su}f(u)\,du$.
The following are special cases of the scale rule.

If k is any constant, positive or negative, then for t ≥ 0

$$\text{(a)}\ \mathcal{L}\{e^{kt}\} = \frac{1}{s-k}, \qquad \text{(b)}\ \mathcal{L}\{\cos kt\} = \frac{s}{s^2+k^2}, \qquad \text{(c)}\ \mathcal{L}\{\sin kt\} = \frac{k}{s^2+k^2}. \qquad (24.6)$$

These are proved from the definition of L in (24.1), or as follows.


(a) Suppose that m is a positive number; then combining (24.3) with the scale rule (24.5) gives

$$\mathcal{L}\{e^{\pm mt}\} = \frac{1}{m}\cdot\frac{1}{s/m \mp 1} = \frac{1}{s\mp m}.$$

The result (24.6a) therefore holds good for both positive and negative k.
(b) From (24.4),

$$\mathcal{L}\{\cos t\} = \frac{s}{s^2+1}.$$

Therefore, by the scale rule (24.5), if k > 0,

$$\mathcal{L}\{\cos kt\} = \frac{1}{k}\,\frac{s/k}{(s/k)^2+1} = \frac{s}{s^2+k^2}.$$

This is true also if k is negative, since $\cos kt = \cos(-k)t$: the integral $\int_0^\infty e^{-st}\cos kt\,dt$ in (24.1) is unchanged when k is replaced by −k, and so is the right-hand side.
(c) is similar to (b).

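Example 24.3 Find the Laplace transform of $\cos(3t + \tfrac14\pi)$.

$$\cos(3t + \tfrac14\pi) = \cos\tfrac14\pi\cos 3t - \sin\tfrac14\pi\sin 3t = (\cos 3t - \sin 3t)/\sqrt2.$$

Therefore by (24.6)

$$\mathcal{L}\{\cos(3t + \tfrac14\pi)\} = \cos\tfrac14\pi\,\mathcal{L}\{\cos 3t\} - \sin\tfrac14\pi\,\mathcal{L}\{\sin 3t\} = \frac{1}{\sqrt2}\left(\frac{s}{s^2+9} - \frac{3}{s^2+9}\right) = \frac{1}{\sqrt2}\,\frac{s-3}{s^2+9}.$$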
Suppose that we know the Laplace transform F(s) of a function f(t) already. Then the Laplace transform of $e^{kt}f(t)$ can immediately be written down:

$$\mathcal{L}\{e^{kt}f(t)\} = \int_0^\infty e^{-st}e^{kt}f(t)\,dt = \int_0^\infty e^{-(s-k)t}f(t)\,dt.$$

But $\int_0^\infty e^{-st}f(t)\,dt = F(s)$, which is supposed to be known, and here we have s − k in place of s. Therefore

$$\mathcal{L}\{e^{kt}f(t)\} = F(s-k).$$

Shift rule (multiplication by $e^{kt}$)
If $\mathcal{L}\{f(t)\} = F(s)$ and k is any constant, then for t ≥ 0

$$\mathcal{L}\{e^{kt}f(t)\} = F(s-k). \qquad (24.7)$$

The shift rule is so called because the transform function F(s) is 'shifted' a distance k along the s axis by the presence of the factor $e^{kt}$.

Example 24.4 Find $\mathcal{L}\{e^{-3t}\sin 2t\}$.

From (24.6),

$$\mathcal{L}\{\sin 2t\} = \frac{2}{s^2+4}.$$

By the shift rule (24.7) with k = −3, we deduce that

$$\mathcal{L}\{e^{-3t}\sin 2t\} = \frac{2}{(s+3)^2+4} = \frac{2}{s^2+6s+13}.$$

Example 24.5 Find $\mathcal{L}\{t^3e^{4t}\}$.

From (24.2), $\mathcal{L}\{t^3\} = 3!/s^4$. The shift rule with k = 4 gives

$$\mathcal{L}\{e^{4t}t^3\} = \frac{3!}{(s-4)^4}.$$

There is a rule similar to (24.7) by which we can find the Laplace transform of $t^nf(t)$ when the transform of f(t) is known:

Multiplication by $t^n$, n = 1, 2, …
If $\mathcal{L}\{f(t)\} = F(s)$, and n is a positive integer, then for t ≥ 0

$$\mathcal{L}\{t^nf(t)\} = (-1)^n\frac{d^nF(s)}{ds^n}. \qquad (24.8)$$
The simplest way to prove this is to start with the right-hand side. Since

$$\int_0^\infty e^{-st}f(t)\,dt = F(s),$$

then, assuming that differentiation and integration can be interchanged,

$$\frac{dF(s)}{ds} = \frac{d}{ds}\int_0^\infty e^{-st}f(t)\,dt = \int_0^\infty\frac{d(e^{-st})}{ds}f(t)\,dt = \int_0^\infty(-t\,e^{-st})f(t)\,dt = -\int_0^\infty e^{-st}\bigl(tf(t)\bigr)\,dt = -\mathcal{L}\{tf(t)\}.$$

Every time we differentiate, another factor t and another multiplication by −1 appear, which takes us to (24.8).

Example 24.6 Find $\mathcal{L}\{t\cos 3t\}$.

Since, by (24.6b),

$$F(s) = \mathcal{L}\{\cos 3t\} = \frac{s}{s^2+9},$$

then, by (24.8),

$$\mathcal{L}\{t\cos 3t\} = -\frac{d}{ds}\left(\frac{s}{s^2+9}\right) = -\frac{9-s^2}{(s^2+9)^2} = \frac{s^2-9}{(s^2+9)^2}.$$

Note the two following special cases, which occur frequently.

For t ≥ 0,

$$\mathcal{L}\{t\cos kt\} = \frac{s^2-k^2}{(s^2+k^2)^2}, \qquad \mathcal{L}\{t\sin kt\} = \frac{2ks}{(s^2+k^2)^2}. \qquad (24.9)$$
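Rule (24.8) with n = 1 can be spot-checked numerically: approximate −dF/ds by a central difference and compare it with the closed forms (24.9). In this sketch the step `h`, the value k = 3 and the sample points are assumptions chosen only for the check; a finite difference confirms the formulas at those points, it does not prove them.

```python
def F_cos(s, k):
    return s / (s**2 + k**2)          # L{cos kt}, (24.6b)

def F_sin(s, k):
    return k / (s**2 + k**2)          # L{sin kt}, (24.6c)

def minus_dF_ds(F, s, k, h=1e-5):
    """-(dF/ds) by a central difference: rule (24.8) with n = 1."""
    return -(F(s + h, k) - F(s - h, k)) / (2 * h)

k = 3.0
for s in (1.0, 2.5, 7.0):
    t_cos = (s**2 - k**2) / (s**2 + k**2) ** 2   # (24.9): L{t cos kt}
    t_sin = 2 * k * s / (s**2 + k**2) ** 2       # (24.9): L{t sin kt}
    print(s, minus_dF_ds(F_cos, s, k), t_cos, minus_dF_ds(F_sin, s, k), t_sin)
```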

Example 24.7 Find $\mathcal{L}\{t^3e^{-3t}\}$ (a) by using the shift rule, (b) by using (24.8), (c) by working directly from the definition of the Laplace transform.
(a) From (24.2),

$$\mathcal{L}\{t^3\} = \frac{6}{s^4}.$$

Therefore, using the shift rule (24.7) with k = −3,

$$\mathcal{L}\{e^{-3t}t^3\} = \frac{6}{(s+3)^4}.$$

(b) From (24.6) with k = −3,

$$\mathcal{L}\{e^{-3t}\} = \frac{1}{s+3}.$$

From (24.8) with n = 3,

$$\mathcal{L}\{t^3e^{-3t}\} = (-1)^3\frac{d^3}{ds^3}\left(\frac{1}{s+3}\right) = (-1)^3\frac{(-1)(-2)(-3)}{(s+3)^4} = \frac{6}{(s+3)^4}.$$

(c) From the definition (24.1),

$$\mathcal{L}\{t^3e^{-3t}\} = \int_0^\infty e^{-st}t^3e^{-3t}\,dt = \int_0^\infty e^{-(s+3)t}t^3\,dt.$$

This is the integral for $\mathcal{L}\{t^3\}$ with s + 3 in place of s, so from (24.2) it is equal to

$$\frac{3!}{(s+3)^4}.$$

Self-test 24.2
Find $\mathcal{L}\{f(t)\}$ where f(t) for t ≥ 0 is given by (a) $e^tt^3$; (b) $e^{-\frac12 t}\bigl(H(t) - H(t-1)\bigr)$, and H is the unit function (1.13).

Self-test 24.3
Given that $\mathcal{L}\{1\} = 1/s$, show that (24.8), used repeatedly, yields the transforms of successive powers of t. Obtain the transform of $t^3e^{-kt}$ by applying the rule (24.8), given that $\mathcal{L}\{e^{-kt}\} = 1/(s+k)$.

24.4 Inverting a Laplace transform


Given a function f(t), we obtain its transform F(s) by using the definition (24.1). Alternatively, if a function F(s) is presented, then we can try to recover the function f(t) from which F(s) is obtained. This second question is the inverse problem for the Laplace transform: to find '?' in the equation

$$\mathcal{L}\{?\} = F(s).$$

It can be proved that there is at most one continuous function f(t) that answers this problem. The process of finding f(t) from F(s) is called inversion of F(s).
The notation

$$f(t) \leftrightarrow F(s)$$

is also useful, since it underlines the two-way correspondence between f(t) and F(s).
We can open up a ‘dictionary’ for this purpose, as we did for derivatives and
integrals. The most important results we have so far are given in the table (24.10)
below.
$$\begin{array}{ll}
f(t)\ \text{for}\ t \ge 0 & F(s) \\ \hline
t^n \quad (n = 0, 1, \dots) & n!/s^{n+1} \\
t^{m-1}/(m-1)! \quad (m = 1, 2, \dots) & 1/s^m \\
e^{kt} \quad (\text{any } k) & 1/(s-k) \\
\cos kt \quad (\text{any } k) & s/(s^2+k^2) \\
\sin kt \quad (\text{any } k) & k/(s^2+k^2)
\end{array} \qquad (24.10)$$

A much fuller table which also includes the various rules can be found in Appendix F. Remember that everything we do with Laplace transforms refers to t ≥ 0 only: the defining integral (24.1) calls only on values of t ≥ 0.
Partial fractions are often useful for inverting transforms as follows:

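Example 24.8 Given the transform 1/[s(s + 1)], find the inverse transform.
In partial fractions,

$$\frac{1}{s(s+1)} = \frac{1}{s} - \frac{1}{s+1}.$$

From the table above,

$$\frac{1}{s} \leftrightarrow 1 \quad\text{and}\quad \frac{1}{s+1} \leftrightarrow e^{-t},$$

so that

$$\frac{1}{s(s+1)} \leftrightarrow 1 - e^{-t}.$$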

Example 24.9 Invert the Laplace transform

$$\frac{s+1}{s(s^2+4)}.$$

The partial-fraction rules require the form

$$\frac{s+1}{s(s^2+4)} = \frac{A}{s} + \frac{Bs+C}{s^2+4}.$$

When the constants are determined by the method of Section 1.14, we find that $A = \tfrac14$, $B = -\tfrac14$, C = 1, so that

$$\frac{s+1}{s(s^2+4)} = \frac{\tfrac14}{s} + \frac{-\tfrac14 s + 1}{s^2+4} = \frac14\,\frac{1}{s} - \frac14\,\frac{s}{s^2+4} + \frac{1}{s^2+4}.$$

From (24.2),

$$\frac{1}{s} \leftrightarrow 1.$$

From the table (24.10),

$$\frac{s}{s^2+4} \leftrightarrow \cos 2t, \qquad \frac{1}{s^2+4} \leftrightarrow \tfrac12\sin 2t.$$

Therefore

$$\frac{s+1}{s(s^2+4)} \leftrightarrow \tfrac14 - \tfrac14\cos 2t + \tfrac12\sin 2t.$$
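A cheap way to catch a slip in a partial-fraction decomposition is to evaluate both sides at a few values of s away from the poles. The sketch below does this for Example 24.9 and also recovers A by the familiar 'cover-up' idea (multiply by s, then put s = 0); the function names and sample points are illustrative only.

```python
def original(s):
    return (s + 1) / (s * (s**2 + 4))

def decomposed(s, A=0.25, B=-0.25, C=1.0):
    # A/s + (Bs + C)/(s^2 + 4), with the constants found in Example 24.9
    return A / s + (B * s + C) / (s**2 + 4)

# Cover-up for A: s * original(s) = (s + 1)/(s^2 + 4), evaluated at s = 0.
A_cover = (0 + 1) / (0**2 + 4)
print("A from cover-up:", A_cover)

for s in (0.3, 1.0, -2.0, 5.5):   # any s other than the pole s = 0
    print(s, original(s), decomposed(s))
```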

Example 24.10 Invert the Laplace transform

$$\frac{3s+2}{s^2+2s+2}.$$

The quadratic denominator does not have real factors, so partial fractions are not available. Instead we complete the square:

$$s^2+2s+2 = (s+1)^2 - 1 + 2 = (s+1)^2 + 1.$$

We aim to write the whole expression in terms of s + 1 so that we can apply the shift rule (24.7). So put also

$$3s+2 = 3(s+1) - 3 + 2 = 3(s+1) - 1,$$

and the transform becomes

$$\frac{3(s+1)-1}{(s+1)^2+1}.$$

If we had s instead of s + 1, we could invert the transform:

$$\frac{3s-1}{s^2+1} = \frac{3s}{s^2+1} - \frac{1}{s^2+1} \leftrightarrow 3\cos t - \sin t.$$

Therefore, by the shift rule with k = −1,

$$\frac{3(s+1)-1}{(s+1)^2+1} \leftrightarrow e^{-t}(3\cos t - \sin t).$$

There is an integral representation of the function f(t) in terms of its Laplace transform F(s). This inversion formula requires the theory of complex variables and contour integration, which is beyond the scope of this text (for discussion of the inverse see, for example, Riley, Hobson and Bence (1997, Ch. 18)). There are also extensive tables of Laplace transforms and their inverses (see Roberts and Kaufman (1966)).

Self-test 24.4
Show that

$$\frac{1}{(s-1)(s^2+1)} \leftrightarrow \tfrac12\left(e^t - \cos t - \sin t\right)$$

by using partial fractions.

24.5 Laplace transforms of derivatives

Suppose that $\mathcal{L}\{f(t)\} = F(s)$. Then the Laplace transforms of $df(t)/dt$, $d^2f(t)/dt^2$, … can be expressed in terms of F(s).

In the definition,

$$\mathcal{L}\left\{\frac{df(t)}{dt}\right\} = \int_0^\infty e^{-st}\frac{df(t)}{dt}\,dt.$$

Integrate the right-hand side by parts. Using the notation of Section 17.7, put

$$u = e^{-st}, \qquad \frac{dv}{dt} = \frac{df(t)}{dt},$$

so that

$$\frac{du}{dt} = -s\,e^{-st}, \qquad v = f(t).$$

Then

$$\mathcal{L}\left\{\frac{df(t)}{dt}\right\} = \bigl[e^{-st}f(t)\bigr]_0^\infty - \int_0^\infty(-s\,e^{-st})f(t)\,dt = 0 - e^{0}f(0) + s\int_0^\infty e^{-st}f(t)\,dt = -f(0) + s\mathcal{L}\{f(t)\}.$$

(The upper limit contributes zero because s is taken large enough that $e^{-st}f(t) \to 0$ as $t \to \infty$.)

In other words, if $\mathcal{L}\{f(t)\} = F(s)$, then

$$\mathcal{L}\left\{\frac{df(t)}{dt}\right\} = sF(s) - f(0). \qquad (24.11)$$

(Note that it is f(0), not F(0), that arises here.)


We can use (24.11) recursively to transform higher derivatives in succession. For example,

$$\mathcal{L}\left\{\frac{d^2f(t)}{dt^2}\right\} = \mathcal{L}\left\{\frac{d}{dt}\frac{df(t)}{dt}\right\} = s\mathcal{L}\left\{\frac{df(t)}{dt}\right\} - f'(0) = s\bigl[s\mathcal{L}\{f(t)\} - f(0)\bigr] - f'(0) = s^2F(s) - sf(0) - f'(0),$$

and similarly for higher derivatives, from which we obtain the sequence:

Laplace transform of derivatives
If $\mathcal{L}\{f(t)\} = F(s)$, then

$$\mathcal{L}\left\{\frac{df(t)}{dt}\right\} = sF(s) - f(0),$$

$$\mathcal{L}\left\{\frac{d^2f(t)}{dt^2}\right\} = s^2F(s) - sf(0) - f'(0),$$

$$\mathcal{L}\left\{\frac{d^3f(t)}{dt^3}\right\} = s^3F(s) - s^2f(0) - sf'(0) - f''(0),$$

and so on. (24.12)

Example 24.11 Obtain the transform of the expression

$$\frac{d^2x}{dt^2} + 2\frac{dx}{dt} + 3x,$$

when x = 4 and dx/dt = 5 at t = 0.
Put $\mathcal{L}\{x(t)\} = X(s)$. Then

$$\mathcal{L}\left\{\frac{d^2x}{dt^2} + 2\frac{dx}{dt} + 3x\right\} = \mathcal{L}\left\{\frac{d^2x}{dt^2}\right\} + 2\mathcal{L}\left\{\frac{dx}{dt}\right\} + 3\mathcal{L}\{x\}$$

$$= s^2X - sx(0) - x'(0) + 2[sX - x(0)] + 3X = s^2X - 4s - 5 + 2(sX - 4) + 3X = (s^2+2s+3)X - 4s - 13.$$

Self-test 24.5
Obtain the Laplace transform of the expression

$$\frac{d^2x}{dt^2} - 2\frac{dx}{dt} - 6x,$$

where x(0) = 2 and x′(0) = −1.

24.6 Application to differential equations

The results (24.12) enable initial-value problems for linear differential equations
having constant coefficients to be solved.
Example 24.12 Find the solution of

$$\frac{dx}{dt} + 2x = e^{-t}$$

for which x = 3 when t = 0.
Since

$$\frac{dx}{dt} + 2x = e^{-t},$$

it is also true that

$$\mathcal{L}\left\{\frac{dx}{dt}\right\} + 2\mathcal{L}\{x\} = \mathcal{L}\{e^{-t}\}.$$

Write $\mathcal{L}\{x(t)\} = X(s)$. By (24.12) the transformed equation becomes

$$sX - 3 + 2X = \frac{1}{s+1}$$

(where we put x(0) = 3 as specified by the initial condition). The transform X(s) of x(t) is therefore given by (using partial fractions)

$$X(s) = \frac{3s+4}{(s+1)(s+2)} = \frac{1}{s+1} + \frac{2}{s+2} \leftrightarrow x(t) = e^{-t} + 2e^{-2t},$$

which is the required solution. Although the Laplace transform process takes account only of t values which are positive, the solution is valid for all values of t.

It can be seen that the terms involving f(0), f ′(0), … in (24.12), far from being
merely a nuisance, are exactly what is required to translate a differential equation
together with initial conditions into a simpler problem in ordinary algebra. We
do not have to match up arbitrary constants with the initial conditions; these
conditions are built into the transformed equations.
In many physical situations, we want to know what happens when an inactive or quiescent system is 'switched on'. In such cases, we have zero initial conditions at some time $t_0 \ge 0$. For a system described by a second-order differential equation, the variable and its first derivative are initially set to zero.

Example 24.13 A system is described by the equation

$$\frac{d^2x}{dt^2} + 2\frac{dx}{dt} + 4x = 1.$$

It is initially quiescent and is then switched on at time $t_0 = 0$. Find the subsequent time variation of x.
We have x(0) = x′(0) = 0. Let x(t) ↔ X(s). Then the equation transforms to

$$s^2X + 2sX + 4X = \frac{1}{s}$$

(notice the 1/s) so that

$$X = \frac{1}{s(s^2+2s+4)} = \frac14\,\frac{1}{s} - \frac14\,\frac{s+2}{s^2+2s+4}.$$

The quadratic has no real factors; therefore the second term is rewritten in the manner of Example 24.10:

$$X = \frac14\,\frac{1}{s} - \frac14\,\frac{(s+1)+1}{(s+1)^2+3} = \frac14\,\frac{1}{s} - \frac14\left(\frac{s+1}{(s+1)^2+3} + \frac{1}{(s+1)^2+3}\right).$$

To invert the last two terms: from (24.10)

$$\frac{s}{s^2+3} \leftrightarrow \cos\sqrt3\,t, \qquad \frac{1}{s^2+3} \leftrightarrow \frac{1}{\sqrt3}\sin\sqrt3\,t.$$

By using the shift rule (24.7) with k = −1, we obtain

$$\frac{s+1}{(s+1)^2+3} \leftrightarrow e^{-t}\cos\sqrt3\,t, \qquad \frac{1}{(s+1)^2+3} \leftrightarrow \frac{1}{\sqrt3}e^{-t}\sin\sqrt3\,t.$$

Therefore

$$x(t) = \frac14 - \frac14\left(e^{-t}\cos\sqrt3\,t + \frac{1}{\sqrt3}e^{-t}\sin\sqrt3\,t\right).$$
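The answer to Example 24.13 can be compared with a direct numerical integration of the differential equation. The sketch below is a check, not part of the Laplace method: it rewrites the equation as a first-order system and applies the classical fourth-order Runge-Kutta scheme from the quiescent state. The function names, the step count `n` and the end times are arbitrary choices for illustration.

```python
import math

def rhs(t, x, v):
    """x'' + 2x' + 4x = 1 as the system x' = v, v' = 1 - 2v - 4x (t unused: constant forcing)."""
    return v, 1.0 - 2.0 * v - 4.0 * x

def rk4(t_end, n=2000):
    """Classical fourth-order Runge-Kutta from x(0) = x'(0) = 0; returns x(t_end)."""
    h = t_end / n
    t, x, v = 0.0, 0.0, 0.0
    for _ in range(n):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
    return x

def x_laplace(t):
    """Closed-form solution of Example 24.13."""
    r3 = math.sqrt(3.0)
    return 0.25 - 0.25 * math.exp(-t) * (math.cos(r3 * t) + math.sin(r3 * t) / r3)

for t_end in (0.5, 2.0, 6.0):
    print(t_end, rk4(t_end), x_laplace(t_end))
```

Both columns settle towards the steady value 1/4 as the transient $e^{-t}$ terms die away.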

Example 24.14 Solve the equation

$$\frac{d^2x}{dt^2} + \omega_0^2x = a\cos\omega_0t,$$

with x(0) = x′(0) = 0.
If we put $\mathcal{L}\{x(t)\} = X(s)$, then the equation transforms into

$$s^2X + \omega_0^2X = \frac{as}{s^2+\omega_0^2},$$

so that

$$X = \frac{as}{(s^2+\omega_0^2)^2}.$$

We can read off the inverse from (24.9) with $k = \omega_0$:

$$x(t) = \frac{a}{2\omega_0}\,t\sin\omega_0t.$$

This equation is one of the exceptional resonant types discussed in Section 19.3. The advantage of using the Laplace transform is easy to see.

Example 24.15 Solve the simultaneous equations

$$\frac{dx}{dt} = x - y, \qquad \frac{dy}{dt} = x + y,$$

with the initial conditions x(0) = 1, y(0) = 0.
Let $\mathcal{L}\{x(t)\} = X(s)$ and $\mathcal{L}\{y(t)\} = Y(s)$. Then the transformed equations, including initial conditions, are

$$sX - 1 = X - Y, \qquad sY = X + Y.$$

Therefore

$$(1-s)X - Y = -1, \qquad X + (1-s)Y = 0.$$

By solving these equations, we obtain

$$X = \frac{s-1}{s^2-2s+2}, \qquad Y = \frac{1}{s^2-2s+2}.$$

The denominators, $s^2 - 2s + 2$, have no real factors, so use the method of Example 24.10 to rewrite these expressions as

$$X = \frac{s-1}{(s-1)^2+1}, \qquad Y = \frac{1}{(s-1)^2+1},$$

so that the shift rule (24.7) can be used to invert them. By (24.10),

$$\frac{s}{s^2+1} \leftrightarrow \cos t, \qquad \frac{1}{s^2+1} \leftrightarrow \sin t.$$

Therefore, by the shift rule with k = 1,

$$x(t) = e^t\cos t, \qquad y(t) = e^t\sin t.$$

Self-test 24.6
Solve the equation

$$\frac{d^2x}{dt^2} - 3\frac{dx}{dt} + 2x = e^{3t}$$
with initial conditions x(0) = 1, x′(0) = 2.

24.7 The unit function and the delay rule


The Heaviside unit function H(t) (or U(t)) was introduced in Section 1.4. Here is a reminder of its definition:

Unit function H(t)

$$H(t) = \begin{cases} 0 & \text{when } t < 0,\\ 1 & \text{when } t \ge 0. \end{cases} \qquad (24.13)$$

[Fig. 24.1 (a) x = H(t), (b) x = H(t − c), (c) x = H(t − d) − H(t − c), (d) x = t[H(t) − H(t − 1)], (e) x = e^t[H(t − 1) − H(t − 2)].]

It is shown again in Fig. 24.1a. Figures 24.1b–e show how it can be used to
describe various step functions and switching functions.
For example, the composition of the three segments of Fig. 24.1e is specified by:

$$e^t[H(t-1) - H(t-2)] = \begin{cases} e^t(0-0) = 0 & \text{if } t < 1,\\ e^t(1-0) = e^t & \text{if } 1 < t < 2,\\ e^t(1-1) = 0 & \text{if } t > 2. \end{cases}$$

Related Laplace transforms are given as follows.

Laplace transform for the unit function

$$\mathcal{L}\{H(t)\} = \frac{1}{s}, \qquad \mathcal{L}\{H(t-c)\} = \frac{e^{-cs}}{s} \quad (c \text{ positive}). \qquad (24.14)$$

The various combination rules such as the shift rule (24.7) work for H(t) in the
same way as for smooth functions f(t), as is shown in the following examples.

Example 24.16 Find $\mathcal{L}\{f(t)\}$ when $f(t) = e^t[H(t-1) - H(t-2)]$.

This is the function shown in Fig. 24.1e. Then, from the definition,

$$\mathcal{L}\{f(t)\} = \int_0^\infty e^{-st}e^t[H(t-1) - H(t-2)]\,dt = \int_1^2 e^{-(s-1)t}\,dt = -\frac{1}{s-1}\bigl[e^{-(s-1)t}\bigr]_1^2 = -\frac{1}{s-1}\left(e^{-2(s-1)} - e^{-(s-1)}\right).$$

Alternatively, we could use the shift rule (24.7), though it has no particular advantage.

Example 24.17 Find the Laplace transform of the square wave function shown in Fig. 24.2.

[Fig. 24.2: the square wave x(t), alternating between 1 and −1 on successive unit intervals of t.]

By considering the segments one at a time and using Fig. 24.1c, we have

$$x(t) = [H(t) - H(t-1)] - [H(t-1) - H(t-2)] + [H(t-2) - H(t-3)] - \cdots = H(t) - 2H(t-1) + 2H(t-2) - 2H(t-3) + \cdots.$$

From (24.14)

$$H(t-n) \leftrightarrow \frac{e^{-ns}}{s}.$$

Therefore

$$\mathcal{L}\{x(t)\} = \frac{1}{s} - \frac{2}{s}\left(e^{-s} - e^{-2s} + e^{-3s} - \cdots\right).$$

The brackets contain an infinite geometric series with first term $e^{-s}$ and common ratio $-e^{-s}$ (see (1.37)). Therefore

$$\mathcal{L}\{x(t)\} = \frac{1}{s} - \frac{2}{s}\,\frac{e^{-s}}{1+e^{-s}} = \frac{1-e^{-s}}{s(1+e^{-s})}.$$
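The geometric-series step in Example 24.17 can be checked by summing the bracket term by term and comparing with the closed form. As an aside (a standard identity, not used in the text), $(1 - e^{-s})/(1 + e^{-s}) = \tanh(s/2)$, so the result may also be written $\tanh(s/2)/s$; the sketch checks that too. The number of terms kept is an assumption.

```python
import math

def square_wave_series(s, terms):
    """Partial sum 1/s - (2/s)(e^{-s} - e^{-2s} + e^{-3s} - ...) from Example 24.17."""
    bracket = sum((-1) ** (n + 1) * math.exp(-n * s) for n in range(1, terms + 1))
    return 1 / s - 2 / s * bracket

def square_wave_closed(s):
    """Closed form (1 - e^{-s}) / (s(1 + e^{-s}))."""
    return (1 - math.exp(-s)) / (s * (1 + math.exp(-s)))

for s in (0.5, 1.0, 3.0):
    print(s, square_wave_series(s, 200), square_wave_closed(s), math.tanh(s / 2) / s)
```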

Suppose that we have a function g(t) which has a meaning for all positive t, such as $g(t) = e^{-t}$. Its Laplace transform is $G(s) = \int_0^\infty e^{-st}g(t)\,dt$. All values of g(t) for t positive are called on to contribute to this integral, but none of its values for negative t are called upon (Fig. 24.3a).
Now translate the function a distance c (positive) to the right as in Fig. 24.3b. The translated graph represents g(t − c). It brings with it a section NA which originally corresponded to negative values of t. We cannot expect that the Laplace transform of this new function g(t − c) can be expressed in terms of G(s), because none of these t values played any part in the calculation of G(s).

[Fig. 24.3 (a) Graph of g(t)H(t). (b) Graph of g(t − c)H(t − c).]
Therefore we cut out the section NA and consider not g(t − c), but g(t − c)H(t − c),
which is shaded in Fig. 24.3b. It is congruent to the shaded part of Fig. 24.3a.
Then

$$\mathcal{L}\{g(t-c)H(t-c)\} = \int_0^\infty e^{-st}g(t-c)H(t-c)\,dt = \int_c^\infty e^{-st}g(t-c)\,dt.$$

Put t − c = u, so that t = u + c and dt = du. The integral becomes

$$\int_0^\infty e^{-s(u+c)}g(u)\,du = e^{-sc}\int_0^\infty e^{-su}g(u)\,du = e^{-sc}G(s).$$

This is the second shift rule, or the delay rule, so called because g(t − c)H(t − c) does not start until t = c.

Delay rule
If G(s) ↔ g(t) and c > 0, then

$$e^{-cs}G(s) \leftrightarrow g(t-c)H(t-c). \qquad (24.15)$$

It is most often useful in inverting a Laplace transform.

Example 24.18 Find the inverse Laplace transform of $e^{-2s}/s^2$.

Put $G(s) = 1/s^2$. Then

$$G(s) = \frac{1}{s^2} \leftrightarrow g(t) = t.$$

By the delay rule,

$$\frac{e^{-2s}}{s^2} = e^{-2s}G(s) \leftrightarrow (t-2)H(t-2),$$

a function which suddenly takes off from zero at t = 2.

Example 24.19 Find the inverse Laplace transform of

$$\frac{e^{-2(s+1)}}{(s+1)(s+2)}.$$

Put

$$G(s) = \frac{1}{(s+1)(s+2)} = \frac{1}{s+1} - \frac{1}{s+2} \leftrightarrow g(t) = e^{-t} - e^{-2t}.$$

We require the inverse transform of $e^{-2(s+1)}G(s)$. By the delay rule with c = 2, this is given by

$$e^{-2(s+1)}G(s) = e^{-2}e^{-2s}G(s) \leftrightarrow e^{-2}\left(e^{-(t-2)} - e^{-2(t-2)}\right)H(t-2) = \left(e^{-t} - e^{-2t+2}\right)H(t-2).$$

Example 24.20 Solve the differential equation

$$\frac{dx}{dt} + 2x = f(t)$$

with x(0) = 0, where (Fig. 24.4)

$$f(t) = \begin{cases} 0 & \text{when } t < 1,\\ e^{-t} & \text{when } 1 < t < 2,\\ 0 & \text{when } t > 2. \end{cases}$$

[Fig. 24.4: graph of y = f(t).]

Let $\mathcal{L}\{x(t)\} = X(s)$. We need

$$\mathcal{L}\{f(t)\} = \int_1^2 e^{-st}e^{-t}\,dt = \int_1^2 e^{-(s+1)t}\,dt = \frac{1}{s+1}\left(e^{-(s+1)} - e^{-2(s+1)}\right) = F(s),$$

say. The transformed equation is then

$$sX + 2X = F(s), \quad\text{or}\quad X = F(s)/(s+2).$$

Therefore

$$X = \frac{e^{-(s+1)} - e^{-2(s+1)}}{(s+1)(s+2)} = \left(e^{-(s+1)} - e^{-2(s+1)}\right)\left(\frac{1}{s+1} - \frac{1}{s+2}\right) = e^{-1}e^{-s}\left(\frac{1}{s+1} - \frac{1}{s+2}\right) - e^{-2}e^{-2s}\left(\frac{1}{s+1} - \frac{1}{s+2}\right).$$

Apply the delay rule with c = 1 and c = 2, noting that

$$\frac{1}{s+1} - \frac{1}{s+2} \leftrightarrow e^{-t} - e^{-2t}.$$

We obtain

$$x(t) = e^{-1}\left(e^{-(t-1)} - e^{-2(t-1)}\right)H(t-1) - e^{-2}\left(e^{-(t-2)} - e^{-2(t-2)}\right)H(t-2) = \left(e^{-t} - e^{1-2t}\right)H(t-1) - \left(e^{-t} - e^{2-2t}\right)H(t-2).$$

Both terms are zero before 'switch-on' at t = 1. Between t = 1 and 2, only the first term contributes. For t > 2 both terms are present, the second causing 'switching off'.
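A quick way to test the delay-rule answer to Example 24.20 is to substitute it back into the equation: estimate dx/dt by a central difference and check that dx/dt + 2x − f(t) is close to zero at points away from the switching times t = 1 and t = 2. The sketch also checks that x(t) is continuous at both switching times. The function names and sample points are hypothetical choices for illustration.

```python
import math

def f(t):
    """Forcing term of Example 24.20: e^{-t} on 1 < t < 2, zero otherwise."""
    return math.exp(-t) if 1 < t < 2 else 0.0

def H(t):
    """Unit function; its value exactly at the jump does not matter here."""
    return 1.0 if t >= 0 else 0.0

def x(t):
    """Solution obtained with the delay rule."""
    return ((math.exp(-t) - math.exp(1 - 2 * t)) * H(t - 1)
            - (math.exp(-t) - math.exp(2 - 2 * t)) * H(t - 2))

def residual(t, h=1e-6):
    """dx/dt + 2x - f(t), with dx/dt estimated by a central difference."""
    dxdt = (x(t + h) - x(t - h)) / (2 * h)
    return dxdt + 2 * x(t) - f(t)

for t in (0.5, 1.3, 1.9, 2.4, 4.0):   # points away from the switching times t = 1, 2
    print(t, x(t), residual(t))
```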

Self-test 24.7
Use the delay rule to find the inverse of the Laplace transform

$$\frac{e^{-s}}{s} - \frac{e^{-2s}}{2s^3}.$$

Sketch the inverse function for t ≥ 0.

24.8 The division rule for f(t)/t


Assume that $\mathcal{L}\{f(t)\} = F(s)$. Let g(t) = f(t)/t (assuming that f(t)/t can be defined at t = 0). Take the Laplace transform of both sides of f(t) = tg(t):

$$\mathcal{L}\{f(t)\} = \mathcal{L}\{tg(t)\} = -\frac{d}{ds}\mathcal{L}\{g(t)\} = -G'(s),$$

where $G(s) = \mathcal{L}\{g(t)\}$, by (24.8). Hence F(s) = −G′(s). This separable equation has a solution which can be expressed in the form

$$G(s) = \int_s^\infty F(u)\,du,$$

since G(s) → 0 as s → ∞ (this is necessary for all transforms). Finally we obtain the rule:

Division rule for f(t)/t

$$\mathcal{L}\left\{\frac{f(t)}{t}\right\} = \int_s^\infty F(u)\,du. \qquad (24.16)$$

Example 24.21 Find the Laplace transform of (sin t)/t.

Since $\mathcal{L}\{\sin t\} = 1/(s^2+1)$, then, by (24.16),

$$\mathcal{L}\left\{\frac{\sin t}{t}\right\} = \int_s^\infty\frac{du}{u^2+1} = \bigl[\arctan u\bigr]_s^\infty = \tfrac12\pi - \arctan s = \arctan(1/s).$$

The division rule has to be applied with care. For example the function (cos t)/t will not have a transform since (cos t)/t → ∞ as t → 0, unlike (sin t)/t which has the limit 1 as t → 0.
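For s > 1 the result can be cross-checked without any integration. Expanding $\sin t/t = \sum_{n\ge0}(-1)^n t^{2n}/(2n+1)!$ and transforming term by term with (24.2) gives $\sum_{n\ge0}(-1)^n/\bigl((2n+1)s^{2n+1}\bigr)$, which is the power series of $\arctan(1/s)$. The sketch below assumes, without proof here, that term-by-term transformation is legitimate; the function name and the number of terms are arbitrary choices.

```python
import math

def sinc_transform_series(s, terms=200):
    """Sum over n of L{(-1)^n t^(2n)/(2n+1)!} = (-1)^n (2n)!/((2n+1)! s^(2n+1))."""
    total = 0.0
    power = 1.0 / s                  # holds 1/s^(2n+1), updated each step
    for n in range(terms):
        # (2n)!/(2n+1)! simplifies to 1/(2n+1)
        total += (-1) ** n * power / (2 * n + 1)
        power /= s * s
    return total

for s in (1.5, 2.0, 5.0):
    print(s, sinc_transform_series(s), math.atan(1 / s))
```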

Self-test 24.8
Find the Laplace transform of $(e^{-t} - 1)/t$.
Problems

The dot notation, $\dot x = dx/dt$, $\ddot x = d^2x/dt^2$, etc., is used in some of the questions.

24.1 Write down $\mathcal{L}\{x(t)\}$, where x(t) is as follows.
(a) $e^t$; (b) $4e^{-t}$; (c) $3e^t - e^{-t}$; (d) $3t^2 - 1$; (e) $\tfrac12t^3 + 2t^2 - 3$; (f) $3 + 2t^4$; (g) $3\sin t - \cos t$; (h) $2(\cos t - \sin t)$; (i) $1 + \frac{t}{1!} + \frac{t^2}{2!} + \cdots + \frac{t^n}{n!}$ (you get a geometric series; see Section 1.16).

24.2 (Scale rule). Find $\mathcal{L}\{x(t)\}$ for the following cases of x(t).
(a) $e^{3t}$; (b) $1 - 2e^{-2t}$; (c) $\sin\omega t$; (d) $\cos\omega t$; (e) $3\cos 2t - 2\sin 2t$; (f) $\cos^2t$ (express it in terms of cos 2t); (g) $\sin^2t$ (see (f)).

24.3 (See Section 24.3.) Find $\mathcal{L}\{x(t)\}$ in the following cases of x(t).
(a) $t^2e^t$ (easiest to start with $t^2$); (b) $te^{-2t}$; (c) $t^2e^{-t}$; (d) $e^{2t}\cos t$; (e) $e^{-t}\sin t$; (f) $e^t\sin 3t$; (g) $e^{-2t}\sin 3t$; (h) $e^{-3t}\cos 2t$; (i) $t\cos 3t$; (j) $t\sin 3t$; (k) $t^2\sin t$; (l) $t^4e^{-t}$ (compare the three methods: (i) start with $t^4$ and use the shift rule, (ii) start with $e^{-t}$ and use (24.8), (iii) work directly from the definition (24.1)).

24.4 Obtain the Laplace transform for $t\sin kt$ by differentiating that of $\cos kt$ with respect to k.

24.5 Invert the following Laplace transforms.
(a) $1/s^2$; (b) $1/s$; (c) $3/(2s)$; (d) $3/s^5$; (e) $1/(s-3)$; (f) $1/(s+4)$; (g) $3/(2s-1)$; (h) $2/(2-3s)$; (i) $1/[s(s-1)]$; (j) $1/(s^2+s-1)$; (k) $s/(s^2-1)$; (l) $(2s-1)/(s^2-1)$; (m) $s/(s^2+1)$; (n) $1/(s^2+4)$; (o) $(2s-1)/(s^2+4)$; (p) $(2s-1)/[s(s-1)]$; (q) $(s^2-1)/[s(s-1)(s+2)(s+3)]$; (r) $s/[(s-1)(s^2+1)]$; (s) $1/(s-1)^3$; (t) $(2s+1)/(s^2-2s+2)$; (u) $s/[(s^2+1)(s^2+4)]$.

24.6 Find the Laplace transform of the following expressions involving x(t), where $\mathcal{L}\{x(t)\} = X(s)$.
(a) $\dot x(t)$, where x(0) = 6; (b) $\dot x(t)$, where x(0) = 0; (c) $\ddot x(t)$, where x(0) = 3, $\dot x(0) = 5$; (d) $\ddot x(t)$, where x(0) = 0, $\dot x(0) = 0$; (e) $2\ddot x + 3\dot x - 2x$, where x(0) = 5, $\dot x(0) = -2$; (f) $3\ddot x - 5\dot x + x - 1$, where x(0) = 0, $\dot x(0) = 0$.

24.7 Use the Laplace transform to solve the following initial-value problems.
(a) $\ddot x + 3\dot x + 2x = 0$, x(0) = 0, $\dot x(0) = 1$; (b) $\ddot x + \dot x - 2x = 0$, x(0) = 3, $\dot x(0) = 0$; (c) $\ddot x + 4\dot x = 0$, $x(0) = x_0$, $\dot x(0) = y_0$; (d) $\ddot x + \omega^2x = 0$, x(0) = c, $\dot x(0) = 0$; (e) $\ddot x + 2\dot x + 5x = 0$, x(0) = 3, $\dot x(0) = -3$; (f) $d^4y/dx^4 - y = 0$, y(0) = 1, y′(0) = 0, y″(0) = 0, y‴(0) = 0 (use x instead of t as the variable in the Laplace transform).

24.8 Use the Laplace transform to solve the following initial-value problems.
(a) $\ddot x = 1 + t + e^t$, x(0) = 0, $\dot x(0) = 0$; (b) $\ddot x + x = 3$, x(0) = 0, $\dot x(0) = 1$; (c) $\ddot x + 2\dot x + 2x = 3$, x(0) = 1, $\dot x(0) = 0$; (d) $\ddot x - x = e^{2t}$, x(0) = 0, $\dot x(0) = 1$; (e) $\ddot x - x = te^t$, x(0) = 1, $\dot x(0) = 1$; (f) $\ddot x - 4x = 1 - e^{2t}$, x(0) = 1, $\dot x(0) = -1$; (g) $\ddot x - 4x = e^{2t} + e^{-2t}$, x(0) = 0, $\dot x(0) = 0$; (h) $\ddot x + \omega^2x = C\cos\omega t$, $x(0) = x_0$, $\dot x(0) = y_0$; (i) $\dddot x - 2\ddot x - \dot x + 2x = e^{-2t}$, x(0) = 0, $\dot x(0) = 0$, $\ddot x(0) = 2$ (look out for factors in the denominator of X(s)).

24.9 Solve the following simultaneous first-order differential equations, for the given initial values.
(a) $\dot x = x - y$, $\dot y = x + y$, x(0) = 1, y(0) = 0; (b) $\dot x = 2x + 4y + e^{4t}$, $\dot y = x + 2y$, x(0) = 1, y(0) = 0; (c) $\dot x = x - 4y$, $\dot y = x + 2y$, x(0) = 2, y(0) = 1.

24.10 Find the general solution of the following by putting x(0) = A, $\dot x(0) = B$, where A and B are arbitrary.
(a) $\ddot x + x = e^t$; (b) $\ddot x - x = 3$; (c) $\ddot x - 2\dot x + x = e^t$.

24.11 Find the general solution of $d^4y/dx^4 - y = e^x$, by putting y(0) = A, y′(0) = B, y″(0) = C, y‴(0) = D, where A, B, C, D are arbitrary. (Let the variable in the Laplace transform (24.1) be x instead of t.)

24.12 This is a system of first-order equations for $x_0(t), x_1(t), \dots, x_n(t)$:

$$\dot x_0 = -\beta x_0, \qquad \dot x_r = \beta(x_{r-1} - x_r), \qquad x_0(0) = 1, \qquad x_r(0) = 0$$

for r = 1, 2, …, n. Solve them by using the Laplace transform, showing that

$$x_r = \frac{1}{r!}(\beta t)^r e^{-\beta t}.$$

24.13 Use the delay rule (24.15) to obtain the Laplace transform of $e^{-t}(t-2)\cos(t-2)H(t-2)$.

24.14 Find the functions which give rise to the following Laplace transforms:
(a) $e^{-2s}/(s+3)$; (b) $(1 - se^{-s})/(s^2+1)$; (c) $e^{-2s}/(s-4)$; (d) $se^{-s}/[(s+1)(s+2)]$; (e) $e^{-s}/[(s-1)(s^2-2s+2)]$.

24.15 Solve the following differential equations assuming that the initial state is of quiescence, $x(0) = \dot x(0) = 0$:
(a) $\ddot x + x = f(t)$, where f(t) = 1 for 0 < t < 1, and f(t) = 0 for t > 1;
(b) $\ddot x - 4x = f(t)$, where f(t) = 1 for 0 < t < 1, and f(t) = 0 for t > 1;
(c) $\ddot x - 4x = f(t)$, where f(t) = t for 0 < t < 1, f(t) = 2 − t for 1 < t < 2, and f(t) = 0 for t > 2;
(d) $\ddot x + x = f(t)$, where f(t) = cos t for 0 < t < π, and f(t) = 0 for t > π.
25 Laplace and z transforms: applications

CONTENTS

25.1 Division by s and integration 527


25.2 The impulse function 530
25.3 Impedance in the s domain 533
25.4 Transfer functions in the s domain 535
25.5 The convolution theorem 541
25.6 General response of a system from its impulsive response 543
25.7 Convolution integral in terms of memory 544
25.8 Discrete systems 545
25.9 The z transform 548
25.10 Behaviour of z transforms in the complex plane 552
25.11 z transforms and difference equations 556
Problems 558

Most of the applications described in this chapter are drawn from electronics, using terms such as signal, input, output, impulse, feedback, and so on. Such terminology is also adopted in describing analogous behaviour of systems of all sorts, from mechanics to biology. For linear systems, central mathematical concepts are the delta or impulse function (Section 25.2), convolution (Section 25.5), and the treatment of discrete systems (Section 25.8). The z transform is closely related to the Laplace transform, and simplifies the algebra of discrete systems to some extent.
Notice that as in Chapter 21 on phasors, in Sections 25.3 and 25.4 on impedance and transfer functions we consider only situations in which any transients (exponentially decreasing terms in the solutions, which arise from the initial conditions) have already died away. The currents, voltages, etc. are then sine/cosine oscillations all having the prevailing frequency but with various amplitudes and phases.

25.1 Division by s and integration


Multiplication by s is associated with differentiation (see (24.12)). Division by s is
associated with integration, as follows.
Division rule

If G(s) ↔ g(t), then

$$\frac{1}{s}G(s) \leftrightarrow \int_0^t g(\tau)\,d\tau. \qquad (25.1)$$

To prove this, put (1/s)G(s) = F(s); then we must express f(t) in terms of g(t). Rewrite the relation between F(s) and G(s) in the form

$$sF(s) = G(s).$$

From (24.12) we know that, in general,

$$\frac{df}{dt} \leftrightarrow sF(s), \quad\text{provided that } f(0) = 0;$$

so then we have df/dt ↔ G(s). This is equivalent to the initial-value problem df/dt = g(t), with f(0) = 0. By integration we obtain

$$f(t) = \int_0^t g(\tau)\,d\tau.$$

Example 25.1 Find f(t) when $F(s) = 1/[s(s^2+1)]$, (a) by using partial fractions, (b) by using (25.1).
(a)

$$\frac{1}{s(s^2+1)} = \frac{1}{s} - \frac{s}{s^2+1} \leftrightarrow 1 - \cos t.$$

(b) In the notation of (25.1), put

$$G(s) = \frac{1}{s^2+1} \leftrightarrow \sin t.$$

Therefore

$$F(s) = \frac{1}{s(s^2+1)} = \frac{1}{s}\,\frac{1}{s^2+1} \leftrightarrow \int_0^t\sin\tau\,d\tau = 1 - \cos t.$$

Figure 25.1 shows a capacitor, of capacitance C, being charged by a current i(t), the voltage drop across the plates being v(t). Assume that the capacitor is uncharged at t = 0; then, at a later time t,

$$v(t) = \frac{1}{C}\int_0^t i(\tau)\,d\tau.$$

[Fig. 25.1: capacitor C charged by the current i(t).]

Therefore, according to (25.1), the relation between the Laplace transforms of v(t) and i(t) is

$$V(s) = \frac{1}{Cs}I(s). \qquad (25.2)$$

We say that (25.2) describes the situation in the s domain, as we spoke of description in the frequency, or ω, domain in Section 21.4.
If the capacitor has a nonzero initial charge $q_0$, then

$$v(t) = \frac{1}{C}\left(\int_0^t i(\tau)\,d\tau + q_0\right).$$

Since $q_0 \leftrightarrow s^{-1}q_0$, this transforms into

$$V(s) = \frac{1}{Cs}\bigl[I(s) + q_0\bigr]. \qquad (25.3)$$

We shall not be concerned with this case.

Example 25.2 The circuit shown in Fig. 25.2 is switched on at time t = 0. It is initially quiescent, and there is zero charge on the capacitor. Find the current for t ≥ 0.

[Fig. 25.2: circuit driven by the voltage v(t) = v₀ cos ωt, carrying the current i(t), with capacitor C.]

The circuit equation is

$$v_0\cos\omega t = Ri(t) + \frac{1}{C}\int_0^t i(\tau)\,d\tau.$$

Such an equation is called an integral equation for i(t). The Laplace transform of the equation is

$$\frac{v_0s}{s^2+\omega^2} = RI(s) + \frac{1}{Cs}I(s),$$

so

$$I(s) = \frac{v_0}{R}\,\frac{s^2}{\bigl(s + 1/(RC)\bigr)(s^2+\omega^2)} = \frac{v_0}{R}\,\frac{1}{1+(RC\omega)^2}\left(\frac{(RC\omega)^2s}{s^2+\omega^2} - \frac{RC\omega^2}{s^2+\omega^2} + \frac{1}{s+1/(RC)}\right)$$

after splitting into partial fractions. Therefore, for t ≥ 0,

$$i(t) = \frac{v_0}{R}\,\frac{1}{1+(RC\omega)^2}\left[(RC\omega)^2\cos\omega t - RC\omega\sin\omega t + e^{-t/(RC)}\right].$$

The first two terms represent a steady forced oscillation and the final term is a transient.
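The partial fractions and the inversion in Example 25.2 can be spot-checked numerically. The component values below are hypothetical, chosen only for the check. Besides comparing the two forms of I(s) at a few values of s, the sketch tests the initial current: at t = 0 the integral in the circuit equation vanishes, so v₀ = R i(0) and the formula should give i(0) = v₀/R.

```python
import math

# Hypothetical component values, used only for this check.
R, C, omega, v0 = 2.0, 0.5, 3.0, 1.0
a = 1 / (R * C)              # the pole of I(s) is at s = -1/(RC)
rc_omega = R * C * omega     # the dimensionless group RC*omega

def I_direct(s):
    """I(s) straight from the transformed integral equation."""
    return v0 / R * s**2 / ((s + a) * (s**2 + omega**2))

def I_partial(s):
    """The partial-fraction form quoted in Example 25.2."""
    return v0 / R / (1 + rc_omega**2) * (rc_omega**2 * s / (s**2 + omega**2)
                                         - R * C * omega**2 / (s**2 + omega**2)
                                         + 1 / (s + a))

def i_of_t(t):
    """The inverted current i(t) for t >= 0."""
    return v0 / R / (1 + rc_omega**2) * (rc_omega**2 * math.cos(omega * t)
                                         - rc_omega * math.sin(omega * t)
                                         + math.exp(-t / (R * C)))

for s in (0.7, 2.0, 10.0):
    print(s, I_direct(s), I_partial(s))
print("i(0) =", i_of_t(0.0), "expected v0/R =", v0 / R)
```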

25.2 The impulse function



Figure 25.3 shows the graph of a function which is zero everywhere except for a
tall, narrow rectangle with width ε and height 1/ε, so that the area under the
graph is equal to 1. Imagine that ε is a very small number, as small as we wish.
This very tall and very narrow picture is a simplified version of the impulse function
or delta function, usually denoted by δ(t). It is used in problems involving sudden
and brief events, to represent (say) impulsive force between two bodies in collision;
voltage from a lightning strike; or, if the variable is position rather than time, a
point force.

The impulse or delta function δ(t)


Informal definition: δ(t) = 1/ε for 0 < t < ε, and δ(t) = 0 elsewhere, where ε is as small as is necessary. (25.4)

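[Fig. 25.3: x = δ(t), a rectangle of width ε and height 1/ε.]

[Fig. 25.4: the strip x = δ(t − c) between t = c and t = c + ε, crossed by the curve x = f(t) at C, where x = f(c); the integration runs from t = a to t = b.]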

In Figure 25.4, δ(t) is moved to the right so as to be at t = c; the vertical strip therefore represents δ(t − c). An ordinary function f(t) crosses it at C. Consider the integral
$$\int_a^b f(t)\,\delta(t-c)\,dt,$$
where c lies between a and b. The integrand is zero except between c and c + ε; over this very narrow interval, f(t) hardly changes from the value f(c). Therefore (as closely as we wish)
$$\int_a^b f(t)\,\delta(t-c)\,dt \approx \int_c^{c+\varepsilon} f(c)\,\varepsilon^{-1}\,dt = \frac{f(c)}{\varepsilon}\int_c^{c+\varepsilon} dt = f(c).$$

If c does not lie between a and b, then the integral is zero. The delta function is
sometimes called a sifting function because of this property.

Sifting property of δ(t)

$$\int_a^b f(t)\,\delta(t-c)\,dt = \begin{cases} f(c) & \text{if } a \le c < b,\\ 0 & \text{otherwise.} \end{cases} \qquad (25.5)$$
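The sifting property can be illustrated numerically with the informal rectangle of (25.4). In this sketch the helper name `sifted_integral` and the use of the midpoint rule are our own choices; the computed value tends to f(c) as ε shrinks, and is zero when the strip lies outside [a, b].

```python
import math

def sifted_integral(f, a, b, c, eps, n=1000):
    """Midpoint-rule value of the integral over [a, b] of f(t) * delta_eps(t - c).

    delta_eps is the informal rectangle of (25.4): height 1/eps on [c, c + eps],
    zero elsewhere, so only the overlap of that strip with [a, b] contributes.
    """
    lo, hi = max(a, c), min(b, c + eps)
    if hi <= lo:
        return 0.0  # the strip lies outside [a, b]
    h = (hi - lo) / n
    return sum(f(lo + (j + 0.5) * h) for j in range(n)) * h / eps

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, sifted_integral(math.cos, 0.0, 2.0, 1.0, eps), math.cos(1.0))
```

With the rectangle model the error is roughly ½εf′(c), which is why ε must be taken as small as is necessary.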

We can obtain the Laplace transform of δ(t) from (25.5):

Laplace transform of δ(t − c)

$$\mathcal{L}\{\delta(t-c)\} = \int_0^\infty e^{-st}\,\delta(t-c)\,dt = e^{-cs},$$
for c ≥ 0. In particular, L{δ(t)} = 1. (25.6)

Example 25.3 The equation $d^2x/dt^2 + \omega^2x = f(t)/m$ represents the displacement x of a particle of mass m on a spring of stiffness $m\omega^2$ with external force f(t). Find the motion for t ≥ 0 if the particle is subjected to an impulse (I/m) δ(t − 1) at time t = 1, assuming equilibrium at t = 0. (I has the physical dimensions [force × time]: see Appendix I.)
The equation is $d^2x/dt^2 + \omega^2x = Im^{-1}\,\delta(t-1)$. Its transform is
$$s^2X + \omega^2X = Im^{-1}e^{-s},$$
where x(t) ↔ X(s). Therefore
$$X(s) = \frac{Im^{-1}}{s^2+\omega^2}\,e^{-s}.$$
We know that
$$\frac{1}{s^2+\omega^2} \leftrightarrow \frac{1}{\omega}\sin\omega t;$$
so, by the delay rule (22.15), we have
$$X(s) = \frac{Im^{-1}}{s^2+\omega^2}\,e^{-s} \leftrightarrow x(t) = \frac{Im^{-1}}{\omega}\sin\omega(t-1)\,H(t-1),$$
where H stands for the unit function (24.13). There is no motion until t = 1, when the impulse sets up free oscillations $[I/(m\omega)]\sin\omega(t-1)$.

Example 25.4 Find the current resulting from an impulsive voltage Iv δ(t)
applied to the circuit of Fig. 25.5, the current being zero before application of
the voltage. (The physical dimensions of Iv are [emf × time]: see Appendix I.)
The equation for the current is L di/dt + Ri = Iv δ(t). After transformation, with i(0) = 0, it becomes
LsI(s) + RI(s) = Iv.
Therefore
$$I(s) = \frac{I_v}{L(s + R/L)} \leftrightarrow i(t) = \frac{I_v}{L}\,e^{-Rt/L}.$$

[Fig. 25.5: circuit driven by the voltage v(t) = Iv δ(t), containing the inductance L, with current i(t).]

The great, though brief, applied voltage gives only a finite current because of the
counter-emf generated by the coil.

The delta function can be regarded formally as the derivative of the unit function H(t). As in Fig. 25.6a, smooth out the transition of H(t), from zero to one, as t passes through the origin, by means of a sloping straight line segment. The derivative of this function is equal to zero outside the transition interval (0, ε) and equal to 1/ε inside it; this specifies δ(t) as in (25.4).

[Fig. 25.6: (a) the unit function smoothed by a straight line segment of slope $\varepsilon^{-1}$, rising from 0 to 1 between O and ε; (b) its derivative, of height 1/ε between O and ε.]

Connection between H(t) and δ(t)

$$\frac{dH(t)}{dt} = \delta(t). \qquad (25.7)$$

This only conforms with the Laplace-transform derivative rule (24.12),
$$s\left(\frac{1}{s}\right) - H(0) = 1,$$
if we rather arbitrarily interpret H(0) as being zero. It can be justified by taking the Laplace transform of the function defined in Fig. 25.6a, which is zero at t = 0, namely
$$\frac{1}{\varepsilon}\int_0^\varepsilon t\,e^{-st}\,dt + \int_\varepsilon^\infty e^{-st}\,dt = \frac{1 - e^{-\varepsilon s}}{s^2\varepsilon}.$$
The right-hand side approaches 1/s as ε → 0 (use l’Hôpital’s rule of Section 5.8).
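The limit can also be watched numerically. This sketch (function names are ours) compares the closed form with a direct quadrature of the ramp's transform, and multiplies by s: because the ramp is zero at t = 0, s times its transform is the transform of its derivative, which should tend to L{δ(t)} = 1. `math.expm1` is used to avoid cancellation in $1 - e^{-\varepsilon s}$ when εs is tiny.

```python
import math

def ramp_transform(s, eps):
    """(1 - e^{-eps s}) / (s^2 eps), using expm1 to avoid cancellation for tiny eps*s."""
    return -math.expm1(-eps * s) / (s * s * eps)

def ramp_transform_numeric(s, eps, n=2000):
    """(1/eps) * integral_0^eps t e^{-st} dt (midpoint rule) + integral_eps^inf e^{-st} dt."""
    h = eps / n
    ramp_part = sum((j + 0.5) * h * math.exp(-s * (j + 0.5) * h) for j in range(n)) * h / eps
    return ramp_part + math.exp(-eps * s) / s

s = 2.0
for eps in (0.5, 1e-2, 1e-4, 1e-6):
    # columns: eps, transform, its limit 1/s, and s * transform (tends to 1)
    print(eps, ramp_transform(s, eps), 1 / s, s * ramp_transform(s, eps))
```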



It should be understood that certain weaknesses result from treating the impulse function very informally; the real justification for its use is an elaborate mathematical subject called distribution theory.

Self-test 25.1
An impulsive voltage Kδ(t) is applied at t = 0 to a circuit consisting of an
inductance L and a capacitor C in series. Obtain the current i(t), assuming
the circuit is initially quiescent.

25.3 Impedance in the s domain


In the table (25.8) below three basic circuit elements are shown, together with
their voltage-drop/current relations and the Laplace transforms of these rela-
tions, on the assumption that the initial current through the inductor and the
charge on the capacitor are zero. The expression ‘s domain’ refers to transformed
quantities.

(In each element the current is i(t) and the voltage drop is v(t).)

                  Resistor          Inductor               Capacitor
Time domain:      v(t) = Ri(t)      v(t) = L di(t)/dt      v(t) = (1/C) ∫_0^t i(τ) dτ
s domain:         V(s) = RI(s)      V(s) = LsI(s)          V(s) = I(s)/(Cs)
Impedance Z(s):   R                 Ls                     1/(Cs)

(25.8)

Table (25.8) should be compared with the table (21.5) for the case of steady
forced oscillations of frequency ω /(2π). The impedances Z(s) in the s plane are
analogous to the complex impedances R, iω L, and 1/(iω C) of (21.6) for the
steady case. One can pass from one to the other by substituting iω for s, or −is for
ω. However, the s forms allow arbitrary inputs to the circuit to be considered.
Impedances combine in series and parallel in the same way as do complex
impedances (see (21.7)) in the frequency domain, but it is to be remembered that
they refer to zero initial conditions only.
Combination of impedances Z(s) in the s domain for zero initial state

Impedances in series:
$$Z = Z_1 + Z_2 + \cdots.$$
Impedances in parallel:
$$\frac{1}{Z} = \frac{1}{Z_1} + \frac{1}{Z_2} + \cdots. \qquad (25.9)$$
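The combination rules translate directly into code. The sketch below (the helper names `series`, `parallel` and `example_impedance` are hypothetical) uses exact rational arithmetic and, anticipating Example 25.5 below, checks that R = 3 in parallel with C = 1/12, the pair in series with L = 4, gives Z(s) = 4(s + 1)(s + 3)/(s + 4) at several integer values of s.

```python
from fractions import Fraction

def series(*zs):
    """Impedances in series add: Z = Z1 + Z2 + ..."""
    return sum(zs)

def parallel(*zs):
    """Impedances in parallel combine reciprocally: 1/Z = 1/Z1 + 1/Z2 + ..."""
    return 1 / sum(1 / z for z in zs)

def example_impedance(s, R=Fraction(3), C=Fraction(1, 12), L=Fraction(4)):
    """Z(s) of R in parallel with C, the pair in series with L (values of Example 25.5)."""
    return series(parallel(R, 1 / (C * s)), L * s)

for k in range(1, 4):
    s = Fraction(k)
    print(s, example_impedance(s), 4 * (s + 1) * (s + 3) / (s + 4))
```

Evaluating at exact rational points is only a spot check, but agreement at more points than the degrees involved would rule out an algebra slip.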

Example 25.5 The circuit shown in Fig. 25.7a is initially quiescent, with zero
charge on the capacitor. The constant voltage v0 is switched on at t = 1 and off
at t = 2. Find the current i(t).

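[Fig. 25.7: (a) the circuit, with R = 3, C = 1/12 and L = 4, driven by the constant voltage v0; (b) the s-domain version, in which the grouped R and C give Z1 = R/(1 + RCs) and the inductor gives Z2 = Ls, driven by V(s).]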

The corresponding s-domain impedances are shown in Fig. 25.7b, in which the elements R and C are grouped. They are in parallel, so (25.8) and (25.9) give
$$\frac{1}{Z_1} = \frac{1}{R} + \frac{1}{(Cs)^{-1}} = \frac{1}{3} + \frac{s}{12} = \frac{s+4}{12}.$$
Hence
$$Z_1 = \frac{12}{s+4},$$
and also Z2 = Ls = 4s. Then Z for the whole circuit is given by
$$Z = 4s + \frac{12}{s+4} = \frac{4(s+1)(s+3)}{s+4}.$$
Therefore
$$I(s) = \frac{s+4}{4(s+1)(s+3)}\,V(s).$$
Taking into account switch-on at t = 1 and switch-off at t = 2,
$$v(t) = v_0[H(t-1) - H(t-2)],$$
so
$$V(s) = v_0\left(\frac{1}{s}e^{-s} - \frac{1}{s}e^{-2s}\right).$$
Therefore
$$I(s) = \frac{v_0(s+4)}{4s(s+1)(s+3)}\,(e^{-s} - e^{-2s}) = v_0\left(\frac{1}{3s} - \frac{3}{8}\,\frac{1}{s+1} + \frac{1}{24}\,\frac{1}{s+3}\right)(e^{-s} - e^{-2s}).$$
The first bracketed factor transforms back to
$$v_0\left(\tfrac13 - \tfrac38 e^{-t} + \tfrac1{24}e^{-3t}\right),$$
and, by using the delay rule (24.15) to deal with the exponentials,
$$i(t) = v_0\left(\tfrac13 - \tfrac38 e^{-(t-1)} + \tfrac1{24}e^{-3(t-1)}\right)H(t-1) - v_0\left(\tfrac13 - \tfrac38 e^{-(t-2)} + \tfrac1{24}e^{-3(t-2)}\right)H(t-2).$$
Nothing happens until the system is switched on at t = 1, when the first term (only) is
activated. At t = 2, when it is switched off, the second term comes in also; some current
persists but it dies away to zero.
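The partial-fraction coefficients above are residues at simple poles, so they can be checked exactly by the cover-up rule. This sketch (with v0 = 1 assumed for illustration; the function names are ours) computes them with Python's `fractions` module and assembles i(t) by the delay rule: the current is zero before t = 1, starts from zero at t = 1, and decays after t = 2.

```python
import math
from fractions import Fraction

POLES = (0, -1, -3)  # simple poles of (s + 4) / (4 s (s + 1)(s + 3))

def residue(p):
    """Exact partial-fraction coefficient of 1/(s - p) at the simple pole p."""
    den = Fraction(4)
    for q in POLES:
        if q != p:
            den *= p - q
    return Fraction(p + 4) / den

def bracket(t):
    """Inverse transform of the bracketed factor: sum of residue * e^{p t}, t >= 0."""
    return sum(float(residue(p)) * math.exp(p * t) for p in POLES)

def current(t, v0=1.0):
    """i(t) for v0 switched on at t = 1 and off at t = 2, via the delay rule."""
    i = bracket(t - 1) if t >= 1 else 0.0
    if t >= 2:
        i -= bracket(t - 2)
    return v0 * i

print([residue(p) for p in POLES])  # [Fraction(1, 3), Fraction(-3, 8), Fraction(1, 24)]
```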

It must be emphasized that such a problem is considerably complicated when the initial conditions are not zero. For example, the expression (25.3) for an initially charged capacitor is not in the form of a voltage–impedance–current relationship. In such cases, it is necessary to start with the differential equations for individual branches.

25.4 Transfer functions in the s domain


The impedance Z(s) which directly connects the current in a unit with the voltage
drop across the same unit is a special case of a more general idea: to relate any two
currents or voltages which occur in the network.
We suppose as before that we have a passive circuit consisting of linear resistors,
capacitors, and inductors, and a single source of voltage which drives the circuit.
Figure 25.8 represents such a network. We denote driving voltage by f(t), because
much of what we say can be taken over into mechanical and other systems. The
unknown voltages and currents we call the variables.

[Fig. 25.8: a network driven by f(t), the driving voltage or current; the input p(t) is taken across terminals P, P′ and the output q(t) across terminals Q, Q′.]
The circuit is initially quiescent. Suppose there are N branches, and N voltages v1(t), v2(t), … , vN(t), with transforms V1(s), V2(s), … , VN(s), to be determined. An external potential difference (voltage) f(t) with transform F(s) is applied across any two points in the network. Apply the s-transform version of Kirchhoff’s equations (eqn 21.8) to obtain N equations sufficient to determine the Vn(s). Each equation takes one of only two possible forms:
either a1V1 + a2V2 + … + aNVN = 0
or b1V1 + b2V2 + … + bNVN = F,
whose coefficients are functions of s. Therefore, after the N linear equations are solved, the transforms are proportional to F in the form
Vn(s) = Gn(s)F(s), (n = 1, 2, … , N).
The coefficients Gn(s) depend on the circuit parameters, and are called voltage-to-voltage transfer functions; they determine the voltage induced in any branch by the sudden establishment of the given applied voltage F(s). It is only a single further step to derive voltage-to-current and current-to-current transfer functions between particular pairs of branches. We can put the results in the following form:

Transfer functions GPQ(s) (initial conditions zero)

Let p(t) (input) and q(t) (output) be the voltages or currents in any two branches. Then, since P(s) = GP(s)F(s) and Q(s) = GQ(s)F(s), it follows that
$$\frac{Q(s)}{P(s)} = \frac{G_Q(s)}{G_P(s)} = G_{PQ}(s), \quad\text{say},$$
where P and Q are the transforms of p and q, and GPQ = GQ(s)/GP(s) is the appropriate transfer function to take p to q. (25.10)

Transfer functions that connect the various types of variable (voltages or currents) are given various names in the literature on systems. For example, in the s domain the ‘transfer’ current → voltage is achieved via impedance, voltage → current by admittance; voltage → voltage is voltage gain.

Example 25.6 Find the transfer function GPQ(s) between the voltage transform P(s) across R, regarded as the input, and the voltage transform Q(s) across C, regarded as the output, in Fig. 25.9a.
Let the current i(t) be as indicated. The impedances of the various groups are shown in Fig. 25.9b; these are in fact transfer functions from current to voltage for each unit. In terms of the transforms,
$$P(s) = RI(s), \qquad Q(s) = \frac{1}{Cs}\,I(s).$$
Therefore
$$G_{PQ}(s) = \frac{Q(s)}{P(s)} = \frac{1}{RCs}.$$

[Fig. 25.9: (a) the circuit, with inductance L and resistance r, the resistance R (voltage p(t)) and the capacitor C (voltage q(t)), the current i(t) and the applied voltage v(t); (b) the s-domain version, with impedances rLs/(r + Ls), R and 1/(Cs), and transforms P(s), Q(s), I(s) and V(s).]

Thus
$$Q(s) = \frac{1}{RC}\,\frac{1}{s}\,P(s), \quad\text{and so}\quad q(t) = \frac{1}{RC}\int_0^t p(\tau)\,d\tau,$$
as expected.

Suppose now that we have a circuit such as the one in Fig. 25.10a, called
Circuit A, where p(t) is the input voltage and q(t) the output voltage. Figure 25.10b
schematizes the arrangement and specifies the transfer function G(s) = Q(s)/P(s)
between p and q.

[Fig. 25.10: (a) Circuit A, with resistance RA and inductance LA, input voltage p(t) and output voltage q(t); (b) Circuit A in the s domain, with GA = Q/P = LAs/(RA + LAs); (c) the scheme P(s) → GA → Q(s).]

We could also symbolize the dependence of q on p by the scheme in Fig. 25.10c.


However, this figure suggests the beginnings of some kind of series arrangement:
it looks as if we could attach another circuit to the original one without altering
the transfer function, and so get an easy calculation for the combined circuit.
This is not true in general, but sometimes it is a useful approximation.
To illustrate this question, we will append to Circuit A another Circuit B. It is
shown in Fig. 25.11a, together with its s domain representation and its transfer
function. In Fig. 25.11b, A and B are connected across MN; here p, q, r, and their
transforms P, Q, R represent the actual voltages across the terminals indicated.
The question is ‘do the transfer functions written in the boxes still correctly give
Q(s) in terms of P(s), and then R(s) in terms of Q(s)?’
If an appreciable amount of current passes between Circuits A and B after attachment, then Q(s) must change, so the true transfer functions of both of the circuits will be changed, and the changes will not compensate each other. In special circumstances, however, the circuits may behave almost independently, or can be made to do so by means of technical arrangements such as feedback.

[Fig. 25.11: (a) Circuit B, with RB and LB, and its s-domain transfer function GB = LBs/(RB + LBs); (b) Circuits A and B connected across MN: P(s) → GA = LAs/(RA + LAs) → Q(s) → GB = LBs/(RB + LBs) → R(s).]

Example 25.7 The two circuits A and B shown in Fig. 25.12 are connected to
form a composite circuit C. Show that
G(s) ≈ GA(s)GB(s)
(where the G(s) are the transfer functions for the voltages shown) if 1/R is much
smaller than 1/r + 1/r1.

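[Fig. 25.12: Circuit A, with r1 and r, input V1 and output VA across r; Circuit B, with R, L and C, input V2 and output VB across C; Circuit C, the two joined, with input V, loop currents I and I1, and output VC across C.]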

For Circuit A alone:
$$G_A(s) = \frac{V_A(s)}{V_1(s)} = \frac{r}{r+r_1}.$$
For Circuit B alone:
$$G_B(s) = \frac{V_B(s)}{V_2(s)} = \frac{\text{impedance of } C}{\text{total impedance}} = \frac{1}{Cs}\cdot\frac{1}{R + Ls + 1/(Cs)}.$$
Therefore
$$G_A(s)G_B(s) = \frac{1}{Cs}\cdot\frac{r}{r+r_1}\cdot\frac{1}{R + Ls + 1/(Cs)}.$$
For Circuit C, by following the voltage drops around closed subcircuits as usual, we get
$$V = r_1I + r(I - I_1),$$
$$0 = (R + Ls + 1/(Cs))I_1 - r(I - I_1),$$
from which
$$I_1 = \frac{Vr}{(r+r_1)(r + R + Ls + 1/(Cs)) - r^2},$$
which represents the current ‘leaking’ between A and B. Therefore
$$G_C(s) = \frac{V_C(s)}{V(s)} = \frac{1}{Cs}\cdot\frac{r}{(r+r_1)(r + R + Ls + 1/(Cs)) - r^2},$$
which we have to compare with GA(s)GB(s) above. Since $(r+r_1)(r + R + Ls + 1/(Cs)) - r^2 = (r+r_1)(R + Ls + 1/(Cs)) + rr_1$, we can rewrite GC(s) in the form
$$G_C(s) = \frac{1}{Cs}\cdot\frac{r}{r+r_1}\cdot\frac{1}{R + Ls + 1/(Cs) + rr_1/(r+r_1)}.$$
It can be seen that GC(s) ≈ GA(s)GB(s) if rr1/(r + r1) is much smaller than R. But
$$\frac{rr_1}{r+r_1} = 1\bigg/\left(\frac{1}{r} + \frac{1}{r_1}\right),$$
and this is much smaller than R if 1/r + 1/r1 is much greater than 1/R. The relation between the circuits could be represented in this case approximately by Fig. 25.13, as if they processed the voltage signals independently.

[Fig. 25.13: the approximate scheme V1 → GA = r/(r + r1) → VA → GB = (1/(Cs))·1/(R + Ls + 1/(Cs)) → VB.]
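A numerical sketch can confirm both the rewritten form of GC(s) and the condition for the approximation. Here GC is obtained by solving Circuit C's two loop equations directly (Cramer's rule, with V = 1), independently of the closed form; the values R = 10, L = 1, C = 1 and the test frequency s = 0.5 + 2i are arbitrary assumptions.

```python
def transfer_functions(s, r, r1, R, L, C):
    """Return (GA*GB, GC) at complex frequency s for the circuits of Example 25.7.

    GC comes from solving Circuit C's two loop equations (with V = 1) by
    Cramer's rule, independently of any closed form, so the two can be compared.
    """
    zb = R + L * s + 1 / (C * s)       # series R, L, C branch of Circuit B
    ga = r / (r + r1)
    gb = (1 / (C * s)) / zb
    # Loop equations:  (r1 + r) I - r I1 = V,   -r I + (zb + r) I1 = 0
    a11, a12, a21, a22 = r1 + r, -r, -r, zb + r
    det = a11 * a22 - a12 * a21
    i1 = -a21 / det                    # Cramer's rule with right-hand side (1, 0)
    return ga * gb, i1 / (C * s)       # VC = I1/(Cs)

def gc_closed_form(s, r, r1, R, L, C):
    """GC = (1/Cs) * r/(r + r1) / (R + Ls + 1/(Cs) + r r1/(r + r1))."""
    return (1 / (C * s)) * (r / (r + r1)) / (R + L * s + 1 / (C * s) + r * r1 / (r + r1))

s = 0.5 + 2j                           # an arbitrary test frequency
for r, r1 in ((0.01, 0.01), (100.0, 100.0)):
    prod, gc = transfer_functions(s, r, r1, R=10.0, L=1.0, C=1.0)
    print(r, r1, abs(gc / prod - 1))   # small only when r r1/(r + r1) << R
```

Since GC/(GA GB) = Z/(Z + rr1/(r + r1)) with Z = R + Ls + 1/(Cs), the ratio is close to 1 exactly when rr1/(r + r1) is small compared with the branch impedance.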

Example 25.8 Figure 25.14 shows a chain of three systems, which act
independently upon their inputs according to the transfer functions GA(s),
GB(s), and GC(s) indicated in the boxes. Find the transfer function G(s)
between F(s) and FC(s). Find fC(t) when f(t) = H(t), for zero initial conditions.

[Fig. 25.14: devices A, B and C in a chain: F(s) → GA(s) = 1/s → FA(s) → GB(s) = 1/(s + 1) → FB(s) → GC(s) = 1/(s + 2) → FC(s).]

We have
$$\frac{F_C}{F} = \frac{F_C}{F_B}\,\frac{F_B}{F_A}\,\frac{F_A}{F} = \frac{1}{s+2}\,\frac{1}{s+1}\,\frac{1}{s},$$
so
$$G(s) = \frac{1}{s(s+1)(s+2)}.$$
Now let f(t) = H(t); then F(s) = 1/s. Therefore
$$F_C(s) = G(s)F(s) = \frac{1}{s(s+1)(s+2)}\cdot\frac{1}{s} = \frac{1}{s^2(s+1)(s+2)}.$$
In partial fractions,
$$F_C(s) = -\frac{3}{4}\,\frac{1}{s} + \frac{1}{2}\,\frac{1}{s^2} + \frac{1}{s+1} - \frac{1}{4}\,\frac{1}{s+2}.$$
Therefore $f_C(t) = -\tfrac34 + \tfrac12 t + e^{-t} - \tfrac14 e^{-2t}$ for t ≥ 0.
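Because each device in Fig. 25.14 is a first-order system, the chain can also be simulated directly in the time domain: 1/s integrates its input, 1/(s + 1) means fB′ + fB = fA, and 1/(s + 2) means fC′ + 2fC = fB. The sketch below integrates these equations from rest with the classical fourth-order Runge–Kutta method (our choice of solver and step count) and compares the result with the closed form.

```python
import math

def closed_form(t):
    """fC(t) from Example 25.8 (unit-function input, zero initial state), t >= 0."""
    return -0.75 + 0.5 * t + math.exp(-t) - 0.25 * math.exp(-2 * t)

def derivs(y):
    """The chain A -> B -> C as first-order ODEs, with input f(t) = 1 for t >= 0.

    GA = 1/s     : fA' = f
    GB = 1/(s+1) : fB' = fA - fB
    GC = 1/(s+2) : fC' = fB - 2 fC
    """
    fa, fb, fc = y
    return [1.0, fa - fb, fb - 2 * fc]

def simulate(t_end, n=3000):
    """Classical fourth-order Runge-Kutta from the quiescent state; returns fC(t_end)."""
    h = t_end / n
    y = [0.0, 0.0, 0.0]
    for _ in range(n):
        k1 = derivs(y)
        k2 = derivs([a + h / 2 * b for a, b in zip(y, k1)])
        k3 = derivs([a + h / 2 * b for a, b in zip(y, k2)])
        k4 = derivs([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + w)
             for a, p, q, r, w in zip(y, k1, k2, k3, k4)]
    return y[2]

for t in (0.5, 1.0, 3.0):
    print(t, simulate(t), closed_form(t))
```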

It is possible to get an idea of important features of an output without going through the whole calculation:

Example 25.9 In a particular system, the output X(s) in the s domain is related to the input F(s) by X(s) = G(s)F(s). Find the general character of x(t) if f(t) = cos 2t, G(s) = s/[(s + 1)(s² + 4s + 5)].

Since f(t) ↔ s/(s² + 4), we have
$$X(s) = \frac{s}{(s+1)(s^2+4s+5)}\,\frac{s}{s^2+4}.$$
If we expanded this in partial fractions, we should have terms of the types
$$\frac{1}{s+1}, \quad \frac{s}{(s+2)^2+1}, \quad\text{and}\quad \frac{1}{(s+2)^2+1} \quad (\text{from } G),$$
and
$$\frac{s}{s^2+4} \quad\text{and}\quad \frac{1}{s^2+4} \quad (\text{from } F).$$
Therefore, in terms of time, we should obtain terms like
$e^{-t}$, $e^{-2t}\sin t$, and $e^{-2t}\cos t$ from G (which are transients),
cos 2t and sin 2t from F (a forced oscillation).

Finally we illustrate the relation between transfer functions in the s domain and
complex transfer functions in the ω domain (Section 21.5).

Example 25.10 The transfer function between an input F(s) and an output
X(s) is 1/(s2 + 1). Find the amplitude and phase of the steady forced oscillation
produced by an input f(t) = 3 sin 2t.
As pointed out in Section 25.3, the complex impedance is simply the s domain impedance with iω substituted for s. The same is true for any transfer function. In the ω domain representation, the input and output will be represented by phasors $F(\omega) = 3e^{-\frac12\pi i}$ and X(ω), corresponding to circular frequency ω = 2 in this case. Then
$$X(\omega) = \frac{3e^{-\frac12\pi i}}{(2i)^2+1} = -e^{-\frac12\pi i} = e^{\frac12\pi i}.$$
The amplitude is the modulus of X, which is 1, and the phase is ½π.
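Complex arithmetic makes the substitution s → iω a one-liner. In this sketch we assume, consistently with $F(\omega) = 3e^{-\frac12\pi i}$, that a phasor A stands for $\mathrm{Re}(Ae^{i\omega t})$; the output phasor then reproduces the steady oscillation −sin 2t, which indeed satisfies x″ + x = 3 sin 2t.

```python
import cmath
import math

def G(s):
    """Transfer function of Example 25.10."""
    return 1 / (s * s + 1)

omega = 2.0
F = 3 * cmath.exp(-0.5j * math.pi)  # phasor of 3 sin 2t, since sin 2t = Re(e^{i(2t - pi/2)})
X = G(1j * omega) * F               # substitute s = i*omega
print(abs(X), cmath.phase(X))       # amplitude and phase of the output phasor

def steady_output(t):
    """Re(X e^{i omega t}): the steady forced oscillation in the time domain."""
    return (X * cmath.exp(1j * omega * t)).real
```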

25.5 The convolution theorem

The following result enables us to interpret Laplace transforms that take the form of a product of two functions.

Convolution theorem
Suppose F(s) = G(s)H(s), and
G(s) ↔ g(t), H(s) ↔ h(t).
Then
$$f(t) = \int_0^t g(t-\tau)h(\tau)\,d\tau$$
(which is the same as $\int_0^t h(t-\tau)g(\tau)\,d\tau$). (25.11)

The integral is called the convolution of g(t) and h(t). This result will be proved
in Chapter 32, Example 32.12, by using double integration. For the present we
shall verify that it is true in some special cases.

Example 25.11 Find the inverse Laplace transform of
$$F(s) = \frac{1}{(s+1)(s+2)}.$$
Put F(s) = G(s)H(s), where
$$G(s) = \frac{1}{s+1}, \qquad H(s) = \frac{1}{s+2};$$
then $g(t) = e^{-t}$ and $h(t) = e^{-2t}$. The convolution theorem (25.11) gives
$$F(s) \leftrightarrow f(t) = \int_0^t e^{-(t-\tau)}e^{-2\tau}\,d\tau = \int_0^t e^{-t-\tau}\,d\tau = \int_0^t e^{-t}e^{-\tau}\,d\tau = e^{-t}\int_0^t e^{-\tau}\,d\tau \qquad \text{(i)}$$
$$= e^{-t}(-e^{-t} + 1) = e^{-t} - e^{-2t}.$$

This result can be confirmed by using partial fractions instead:
$$\frac{1}{(s+1)(s+2)} = \frac{1}{s+1} - \frac{1}{s+2} \leftrightarrow e^{-t} - e^{-2t}.$$

Notice very carefully the distinction between t and τ in the integrals (25.11): τ is the variable of integration. The variable t is a constant so far as the integration process is concerned; so, for example, in eqn (i), Example 25.11, we took $e^{-t}$ outside the integral sign.

Example 25.12 Find the inverse transform of 1/[s(s² + 1)].
In Example 25.1, we showed in two different ways that
$$\frac{1}{s(s^2+1)} \leftrightarrow 1 - \cos t.$$
To confirm that (25.11) gives the same result, put
$$G(s) = \frac{1}{s^2+1} \quad\text{and}\quad H(s) = \frac{1}{s},$$
say. Then (for t ≥ 0) g(t) = sin t and h(t) = 1, so
g(t − τ) = sin(t − τ) and h(τ) = 1.
Therefore, by the convolution theorem,
$$F(s) \leftrightarrow \int_0^t \sin(t-\tau)\cdot 1\,d\tau = \big[\cos(t-\tau)\big]_{\tau=0}^{t} = 1 - \cos t,$$
as expected.
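The convolution integral can also be evaluated numerically as a check on Examples 25.11 and 25.12. The following sketch (a hypothetical `convolve` helper using Simpson's rule) reproduces 1 − cos t and $e^{-t} - e^{-2t}$, and also shows that swapping g and h leaves the value unchanged, as Example 25.13 below proves.

```python
import math

def convolve(g, h, t, n=1000):
    """Simpson's-rule value of the convolution integral_0^t g(t - tau) h(tau) dtau (n even)."""
    if t == 0:
        return 0.0
    step = t / n
    total = g(t) * h(0.0) + g(0.0) * h(t)  # endpoint terms tau = 0 and tau = t
    for j in range(1, n):
        tau = j * step
        total += (4 if j % 2 else 2) * g(t - tau) * h(tau)
    return total * step / 3

one = lambda tau: 1.0
exp1 = lambda u: math.exp(-u)
exp2 = lambda u: math.exp(-2 * u)
for t in (0.5, 1.0, 2.0):
    print(t,
          convolve(math.sin, one, t), 1 - math.cos(t),               # Example 25.12
          convolve(exp1, exp2, t), math.exp(-t) - math.exp(-2 * t))  # Example 25.11
```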

Example 25.13 (See (25.11)). Confirm directly that
$$\int_0^t g(t-\tau)h(\tau)\,d\tau = \int_0^t h(t-\tau)g(\tau)\,d\tau.$$
In the first integral, change the variable, putting
u = t − τ.
Then (remember t is to be treated like a constant) du = − dτ. Therefore
$$\int_0^t g(t-\tau)h(\tau)\,d\tau = \int_t^0 g(u)h(t-u)(-du) = \int_0^t h(t-u)g(u)\,du,$$
which is the integral required, merely using u instead of τ for the variable of integration.

Example 25.14 Find an expression for the inverse transform of
$$F(s) = \frac{1}{s+1}\,H(s)$$
in terms of h(t), the inverse transform of H(s).
Use the convolution theorem, (25.11), putting G(s) = 1/(s + 1). Then
$g(t) = e^{-t}$.
We therefore obtain from (25.11)
$$f(t) = \int_0^t e^{-(t-\tau)}h(\tau)\,d\tau,$$
or its alternative form
$$f(t) = \int_0^t e^{-\tau}h(t-\tau)\,d\tau.$$

Self-test 25.2
Use the convolution theorem (25.11) to obtain the solution for the unknown function x(t) in the ‘integral equation’
$$\int_0^t (t-\tau)x(\tau)\,d\tau = \tfrac12 - e^{t} + \tfrac12 e^{2t}.$$

25.6 General response of a system from its impulsive response
We shall take an electrical network as our example, though what we say applies
to linear mechanical systems as well. Suppose that the system is quiescent, and it
is activated at time t = 0 by an applied voltage f(t) (regarded as the input). Focus
on any particular one of the currents or voltages in the circuit, and call it x(t)
(the output). The transfer function between input and output will be called G(s).
We have then
X(s) = G(s)F(s). (25.12)

Suppose that we conduct an experiment in which we excite the circuit by means of a voltage impulse Iv δ(t), where Iv is a given constant, and record the result (the dimensions of Iv are [emf × time]). Then
f(t) = Iv δ(t), so that F(s) = Iv.
The output resulting from this special voltage (an impulsive input) will be called x*(t), with transform X*(s). Now put F(s) = Iv and X* for X into (25.12), and it becomes
X*(s) = IvG(s). (25.13)

Such an experiment would therefore give us the corresponding transfer function G(s) directly (we could even arrange for Iv to equal unity). Thus, even if the circuit is a ‘black box’ with its details unknown, we still know from (25.13) what to put into (25.12) for the case when f(t) is any function at all:
$$X(s) = I_v^{-1}X^*(s)F(s).$$
Therefore, by the convolution theorem (25.11),
$$x(t) = I_v^{-1}\int_0^t x^*(t-\tau)f(\tau)\,d\tau, \quad\text{or}\quad I_v^{-1}\int_0^t x^*(\tau)f(t-\tau)\,d\tau.$$

This type of result applies to the other circuit variables such as voltages and
charges, and to mechanical systems governed by linear differential equations.
In terms of general outputs and inputs:
Output x(t) from an input f(t) to a quiescent linear system, in terms of the output x*(t) from an impulsive input I δ(t)

$$x(t) = I^{-1}\int_0^t x^*(t-\tau)f(\tau)\,d\tau,$$
or
$$x(t) = I^{-1}\int_0^t x^*(\tau)f(t-\tau)\,d\tau. \qquad (25.14)$$

Example 25.15 The displacement x*(t) caused by an impulse I δ(t) applied to a certain mechanical linear system at rest is found to be $x^*(t) = e^{-t} - \sin 2t$. Find the displacement x(t) corresponding to an applied force f(t) = sin t starting at t = 0.
We have
$$x(t) = I^{-1}\int_0^t [e^{-(t-\tau)} - \sin 2(t-\tau)]\sin\tau\,d\tau \quad (\text{from } (25.14))$$
$$= I^{-1}e^{-t}\int_0^t e^{\tau}\sin\tau\,d\tau - I^{-1}\int_0^t \sin(2t-2\tau)\sin\tau\,d\tau$$
$$= I^{-1}e^{-t}\int_0^t e^{\tau}\sin\tau\,d\tau - \tfrac12 I^{-1}\int_0^t [\cos(2t-3\tau) - \cos(2t-\tau)]\,d\tau,$$
by using the identity (1.17b). In the end, we find
$$x(t) = \tfrac12 I^{-1}e^{-t} + \tfrac13 I^{-1}\sin 2t - \tfrac16 I^{-1}(3\cos t + \sin t)$$
for t ≥ 0. The first term is a transient, the second an induced free oscillation, and the third term represents the forced oscillation.
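The closed form of Example 25.15 can be checked by evaluating (25.14) numerically. This sketch takes I = 1 (an assumption for illustration; x(t) simply scales with $I^{-1}$) and approximates the convolution integral with Simpson's rule.

```python
import math

def x_star(t):
    """Measured impulse response of Example 25.15, taking I = 1."""
    return math.exp(-t) - math.sin(2 * t)

def response(t, f=math.sin, n=4000):
    """(25.14) with I = 1: integral_0^t x*(t - tau) f(tau) dtau by Simpson's rule."""
    if t == 0:
        return 0.0
    h = t / n
    total = x_star(t) * f(0.0) + x_star(0.0) * f(t)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * x_star(t - j * h) * f(j * h)
    return total * h / 3

def closed_form(t):
    """x(t) = (1/2)e^{-t} + (1/3)sin 2t - (1/6)(3 cos t + sin t), with I = 1."""
    return 0.5 * math.exp(-t) + math.sin(2 * t) / 3 - (3 * math.cos(t) + math.sin(t)) / 6

for t in (0.5, 1.0, 3.0, 7.0):
    print(t, response(t), closed_form(t))
```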

25.7 Convolution integral in terms of memory


An integral of the type
$$x(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau,$$
such as arose in the convolution theorem (25.11), is called a convolution integral.

Typically, f acts as some kind of ‘cause’, such as a driving force or voltage, and x(t) stands for a certain ‘effect’ produced.
Choose a time t for observation; then divide the interval τ = 0 to τ = t into a large number of equal time steps δτ. We have
$$x(t) = \int_0^t g(t-\tau)f(\tau)\,d\tau \approx \sum_{\tau=0}^{\tau=t} g(t-\tau)f(\tau)\,\delta\tau.$$
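[Fig. 25.15: (a) a representative function g(α) plotted against the elapsed time, or age, α; (b) the curve y = f(τ), with the curve y = f(τ1)g(τ − τ1) starting from the value f(τ1) at τ1 and reaching f(τ1)g(t − τ1) at the observation time t, an age t − τ1 later.]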

Now choose any moment τ1 between 0 and t: there was a force f(τ1) applied at this moment, and its contribution to x at time t ≥ τ1 is
g(t − τ1)f(τ1) δτ.
The factor g(t − τ1) takes into account the time elapsed between the cause and its effect; in some problems it would be appropriate to call t − τ1 the ‘age’ of f(τ1) at the moment t of observation, and g an ageing factor. Depending on the type of problem, this factor might weaken or amplify the contribution of f(τ1) to the integral as time t passes. The elapsed time is increased if either we take an earlier τ1, or delay the time of observation by increasing t. Figure 25.15a shows a representative function g(α), where α stands for ‘age’, and Fig. 25.15b illustrates its effect on the influence of f at time τ1 on x at a later time t.
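The step from the integral to the sum can be made concrete. This sketch (our own illustration, borrowing g and f from Example 25.11) adds up the contributions g(t − τ)f(τ)δτ using left endpoints, so the final step ending at τ = t is not counted separately; that makes no difference in the limit. The error shrinks roughly in proportion to δτ.

```python
import math

def riemann_convolution(g, f, t, steps):
    """Sum of g(t - tau) f(tau) * dtau over tau = 0, dtau, ..., t - dtau (left endpoints)."""
    dtau = t / steps
    return sum(g(t - k * dtau) * f(k * dtau) for k in range(steps)) * dtau

# Ageing factor and cause borrowed from Example 25.11: g(age) = e^{-age}, f(tau) = e^{-2 tau}.
g = lambda age: math.exp(-age)
f = lambda tau: math.exp(-2 * tau)
exact = math.exp(-1.0) - math.exp(-2.0)  # x(1) from e^{-t} - e^{-2t}

for steps in (10, 100, 1000, 10000):
    print(steps, riemann_convolution(g, f, 1.0, steps) - exact)
```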

25.8 Discrete systems


Suppose we have a system or processor, which we shall generally think of as an electrical circuit. All the time functions used are zero for t < 0. The input will be denoted by x(t) and the output by y(t), and either may be referred to as a signal. The system is said to be linear and time invariant if there is a fixed transfer function G(s) such that for all inputs and at all times t ≥ 0 the input/output relation between the Laplace transforms of x(t) and y(t) has the form
Y(s) = G(s)X(s) (25.15)

where
x(t) ↔ X(s) and y(t) ↔ Y(s),
subject to the condition of quiescence at t = 0. Thus G(s) completely describes the effect of the circuit. By the convolution theorem (25.11), (25.15) is equivalent to
$$y(t) = \int_0^t x(\tau)g(t-\tau)\,d\tau \quad\text{or}\quad \int_0^t x(t-\tau)g(\tau)\,d\tau \qquad (25.16)$$
where g(t) ↔ G(s).


For the impulsive input x(t) = x*(t), where x*(t) = δ(t), we have, by (25.6), X*(s) = 1, so by (25.15), Y*(s) = G(s), or
g(t) = y*(t), where g(t) ↔ G(s). (25.17)

[Fig. 25.16: (a) a smooth signal x(t), sampled at t = 0, T, 2T, 3T, 4T; (b) the sample {x(0), x(T), … } as a succession of spikes at the same times.]

In other words, the interpretation of g(t) is that it is equal to the output from a
unit delta-function input at t = 0. This repeats the result (25.14).
So far in the chapter we have only considered circuits made up from the traditional elements, resistances, capacitances, and inductances, but there exists a far
greater variety of basic units. We shall not describe the circuits which contain
these new features, but only specify their properties.
Figure 25.16a shows a smooth signal x(t) starting at t = 0. Imagine that this
serves as the input to a circuit that picks out the values of x(t) at times t = 0, T, 2T,
3T, … , samples them over very short time intervals, and ignores the values of x(t)
in between, treating them as if they were zero. This process is indicated by the
shaded strips in Fig. 25.16a. The device registers a sequence of values
{x(0), x(T), x(2T), x(3T), … },
called a sample of x(t) at equal intervals T. In an actual instrument the output
will consist of a succession of ‘spikes’ as in Fig. 25.16b. These can be thought of
as brief puffs of energy generated by the circuit, which are equal in ‘content’ to
the sequence of values above, so it is plausible to represent the sample, y(t) say, by
$$y(t) = \sum_{k=0}^{K} x(kT)\,\delta(t-kT) \qquad (25.18)$$

(where K may be infinite). Such a function is called discrete. The circuit works like
the first stage of an analogue-to-digital converter.
Suppose next that we have a circuit which processes discrete inputs of interval T,
and produces discrete outputs of interval T. Such circuits may amplify, or filter, or
delay, or modify the input in a variety of ways. We then have a completely discrete
system. The input x(t) and output y(t), and their Laplace transforms X(s) and Y(s),
take the form
$$x(t) = \sum_{n=0}^{N} x_n\,\delta(t-nT), \qquad X(s) = \sum_{n=0}^{N} x_n e^{-nTs}, \qquad (25.19)$$
$$y(t) = \sum_{k=0}^{K} y_k\,\delta(t-kT), \qquad Y(s) = \sum_{k=0}^{K} y_k e^{-kTs}, \qquad (25.20)$$

where xn and yk are constants, and N and K may be infinite. We may alternatively
express x(t) and y(t) in the form
x(t) = {x0, x1, x2, … , xN}, or simply as {xn};
y(t) = {y0, y1, y2, … , yK}, or as {yk}.
Thus, {n + 3} stands for {3, 4, 5, … }. In a case such as {1, 2, 0, 0, 0, 0, … } we may further shorten it to {1, 2}.
Assume next that there exists a transfer function G(s) so that Y(s) = G(s)X(s).
Let g(t) ↔ G(s); then from (25.17) g(t) is equal to the output resulting from the
unit impulsive input
x*(t) = δ(t)
(or x*(t) = {1} or {1, 0, 0, 0, … } in the sequence form). The device generates only
discrete outputs. Then, g(t), which is equal to the response to x*(t), must also
have a discrete form:
$$g(t) = \sum_{m=0}^{M} g_m\,\delta(t-mT), \quad\text{so}\quad G(s) = \sum_{m=0}^{M} g_m e^{-mTs}. \qquad (25.21)$$

Example 25.16 A discrete circuit delays any incoming signal by an interval T (see Fig. 25.17a). (a) Obtain a transfer function G(s) by considering the response to a delta-function input. (b) Confirm that this transfer function delays an arbitrary discrete signal by an interval T, and that therefore the circuit is linear.

[Fig. 25.17: (a) an arbitrary discrete input x(t) and the output y(t), with impulses at multiples of T; (b) the input x(t) = δ(t) and the delayed output g(t) = δ(t − T) ↔ G(s).]

(a) If the input is x(t), then the output is y(t) = x(t − T). Therefore, if x(t) = δ(t), the output is δ(t − T) as in Fig. 25.17b, and by (25.17), we must have g(t) = δ(t − T), so the transfer function is $G(s) = e^{-Ts}$.
(b) To check that this transfer function really works for a general discrete input, put
$$x(t) = \sum_{n=0}^{\infty} x_n\,\delta(t-nT).$$
We then have
$$Y(s) = X(s)G(s) = X(s)e^{-Ts}.$$
By the delay rule (24.25),
y(t) = x(t − T)H(t − T),
where H(t) is the unit function. Therefore x(t) is delayed by an interval T.

In the general case when the transfer function takes the form {g0, g1, … , gM}, an input x(t) = δ(t), represented by x(t) = {1}, generates a string of impulses
$$\sum_{m=0}^{M} g_m\,\delta(t-mT)$$
at times mT, m = 0 to M. We shall look at the system’s response to an arbitrary input in the next sections.

Self-test 25.3
The transfer function g(t) of a discrete system with time interval T is given by
g(t) = ½δ(t − T) + δ(t − 2T).
By using (25.18) and (25.6) obtain the output y(t) given the input x(t) = x0 δ(t) + x1 δ(t − T).

25.9 The z transform


In the previous section the only functions of s that appear are exponentials of the form $e^{-nTs}$, representing δ(t − nT), where n is a positive integer or zero. They may be written
$$e^{-nTs} = (e^{Ts})^{-n} = 1/(e^{Ts})^n.$$
The algebra connected with discrete systems is simplified by introducing a new variable z, defined by
$$z = e^{Ts}. \qquad (25.22)$$
Then we may write for the transform of a typical discrete input x(t):
$$X(s) = \sum_{n=0}^{N} x_n e^{-nTs} \equiv \sum_{n=0}^{N} \frac{x_n}{z^n}.$$

We shall reformulate the previous results in terms of z. Suppose we have a discrete signal x(t) = {x0, x1, x2, … } consisting of equally spaced impulses or samples with interval T, and x(t) = 0 for t < 0. Then the function X(z) given by
$$X(z) = x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \frac{x_3}{z^3} + \cdots \qquad (25.23)$$
is called the z transform of x(t). Given x(t) we can write down the z transform. Conversely, given a suitable function X(z), we can expand it by Taylor’s theorem for large z in powers of $z^{-1}$ in order to obtain the sequence of coefficients {x0, x1, x2, … } in (25.23), which defines x(t). This sequence is called the inverse transform of X(z).
Suppose that {xn} is supplied as input to a discrete linear system. The z transform of the output y(t) = {y0, y1, y2, … }, say, is
$$Y(z) = y_0 + \frac{y_1}{z} + \frac{y_2}{z^2} + \cdots. \qquad (25.24)$$
We already know from (25.21) that if the circuit is linear it has a transfer function G(s) taking the form of a similar sequence of impulsive terms. Therefore g(t) has a z transform:
$$G(z) = g_0 + \frac{g_1}{z} + \frac{g_2}{z^2} + \cdots. \qquad (25.25)$$
Finally, from (25.15) (since all we have done is to write a shorthand for $e^{Ts}$), the z transforms of output and input are related by
$$Y(z) = G(z)X(z), \qquad (25.26)$$
which is simply the product of two polynomials in powers of $z^{-1}$.
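Multiplying two power series in $z^{-1}$ is the same as convolving their coefficient sequences, so (25.26) is the discrete counterpart of (25.16). In the sketch below (the helper names are hypothetical) the product for Example 25.20 below, {1, 1}·{1, 1} = {1, 2, 1}, is reproduced, and the one-interval delay of Example 25.16 appears as G(z) = 1/z, i.e. the sequence {0, 1}.

```python
from fractions import Fraction

def multiply_z(g, x):
    """Coefficients of Y(z) = G(z) X(z), with g and x listed in ascending powers of 1/z.

    The coefficient of z^{-k} is y_k = sum over m + n = k of g_m x_n, which is the
    discrete convolution of the sequences {g_m} and {x_n}.
    """
    y = [0] * (len(g) + len(x) - 1)
    for m, gm in enumerate(g):
        for n, xn in enumerate(x):
            y[m + n] += gm * xn
    return y

def evaluate(coeffs, z):
    """Exact value of sum c_k / z^k, for spot-checking the product at a particular z."""
    return sum(Fraction(c) / Fraction(z) ** k for k, c in enumerate(coeffs))

g, x = [1, 1], [1, 1]          # G(z) = 1 + 1/z, X(z) = 1 + 1/z
y = multiply_z(g, x)
print(y)                       # [1, 2, 1]
```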
We have lost sight of T in these expressions, but we can always recover it by returning to time-domain or s-domain formulae by putting $z = e^{Ts}$. To summarize:

The z transform of a discrete signal


(a) If x(t) = {x0, x1, x2, …}, its z transform is
$$X(z) = x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \cdots.$$
(b) The sequence {x0, x1, x2, … } is called the inverse transform of X(z).
(c) z is related to the Laplace transform by $z = e^{Ts}$. (25.27)

The transfer function in terms of z


(a) The transfer function takes the form
$$G(z) = g_0 + \frac{g_1}{z} + \frac{g_2}{z^2} + \cdots.$$
(b) The input/output relation is
Y(z) = G(z)X(z).
(c) The inverse of G(z) is the response to an input δ(t), and has the form
{g0, g1, g2, … }.
(Here G(z) ≡ G(s) ↔ g(t) where $z = e^{Ts}$.)
(25.28)

Example 25.17 Obtain the z transform of the discrete signal x(t) defined by the sequences (a) {1}; (b) xn = 1 for n ≥ 0; (c) xn = 1 if n is even, xn = 0 if n is odd.

(a) $X(z) = 1 + \dfrac{0}{z} + \dfrac{0}{z^2} + \cdots = 1.$
(b) $X(z) = 1 + \dfrac{1}{z} + \dfrac{1}{z^2} + \cdots.$
This is an infinite geometric series with common ratio $z^{-1}$ (it converges only if |z| > 1, but do not worry about this). From eqn (5.4a):
$$X(z) = \frac{1}{1 - z^{-1}} = \frac{z}{z-1}.$$
(c) $X(z) = 1 + \dfrac{1}{z^2} + \dfrac{1}{z^4} + \cdots.$
The common ratio is $z^{-2}$, so by Section 5.4
$$X(z) = \frac{1}{1 - z^{-2}} = \frac{z^2}{z^2-1}.$$

Example 25.18 Obtain the z transform of x(t) = {1, 2, 3, … }, or {n + 1}.

We see that
$$X(z) = 1 + \frac{2}{z} + \frac{3}{z^2} + \frac{4}{z^3} + \cdots.$$
To sum this series, multiply it by 1/z:
$$\frac{1}{z}X(z) = \frac{1}{z} + \frac{2}{z^2} + \frac{3}{z^3} + \cdots.$$
Subtract the second expression from the first:
$$\left(1 - \frac{1}{z}\right)X(z) = 1 + \frac{1}{z} + \frac{1}{z^2} + \cdots = \frac{z}{z-1}$$
(as in the previous Example). Therefore
$$X(z) = \frac{z}{z-1}\bigg/\left(1 - \frac{1}{z}\right) = \frac{z^2}{(z-1)^2}.$$
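The closed forms of Examples 25.17 and 25.18 can be spot-checked by summing many terms of each series at a point where it converges. The choice z = 3 and the 200-term cutoff are arbitrary assumptions of this sketch; any |z| > 1 would do.

```python
def z_transform(seq, z, terms=200):
    """Partial sum x0 + x1/z + x2/z^2 + ... using the first `terms` samples of seq(n)."""
    return sum(seq(n) / z**n for n in range(terms))

z = 3.0  # a point inside the region of convergence |z| > 1
cases = {
    "x_n = 1":         (lambda n: 1.0, z / (z - 1)),                              # 25.17(b)
    "x_n = 1, n even": (lambda n: 1.0 if n % 2 == 0 else 0.0, z**2 / (z**2 - 1)),  # 25.17(c)
    "x_n = n + 1":     (lambda n: n + 1.0, z**2 / (z - 1) ** 2),                  # 25.18
}
for name, (seq, closed) in cases.items():
    print(f"{name}: partial sum {z_transform(seq, z):.12f}, closed form {closed:.12f}")
```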

Example 25.19 (a) Obtain the inverse z transform of the function X(z) = z/(z − 2). (b) Deduce the time function x(t) which it represents.
(a) We need to find the coefficients in the infinite series form for X(z):
$$X(z) = x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \cdots.$$
This is a Taylor expansion of X(z) in powers of 1/z for large z (see Section 5.6). To obtain it, we start by expressing X(z) in terms of 1/z:
$$X(z) = \frac{z}{z - 2} = 1\Big/\left(1 - \frac{2}{z}\right) = \left(1 - \frac{2}{z}\right)^{-1}.$$
The binomial expansion (5.4f), with α = −1 and x = −2/z, gives
$$X(z) = 1 + \frac{2}{z} + \frac{2^2}{z^2} + \frac{2^3}{z^3} + \cdots.$$
Therefore the sequence of coefficients (i.e. the inverse) is {1, 2, 2², 2³, … }.
(b) The corresponding time function x(t) is therefore
$$x(t) = \delta(t) + 2\delta(t - T) + 2^2\delta(t - 2T) + 2^3\delta(t - 3T) + \cdots.$$
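The expansion in powers of 1/z can also be generated mechanically by long division. The following Python sketch matches coefficients of w = 1/z term by term; the function name `inverse_z` and the highest-power-first coefficient convention are choices made here for illustration, not part of the text:

```python
def inverse_z(num, den, n_terms):
    """First n_terms coefficients x_0, x_1, ... of X(z) = num(z)/den(z) in powers of 1/z.

    num and den list polynomial coefficients from the highest power of z down,
    so z/(z - 2) is num=[1, 0], den=[1, -2]. Requires deg num <= deg den and
    den[0] != 0, so that no positive powers of z appear in the expansion.
    """
    m, n = len(num) - 1, len(den) - 1
    if m > n:
        raise ValueError("numerator degree exceeds denominator degree")
    # With w = 1/z, num(z)/z**n has num[i] as the coefficient of w**(n - m + i),
    # and den(z)/z**n has den[j] as the coefficient of w**j.
    a = [0.0] * (n - m) + [float(c) for c in num]
    b = [float(c) for c in den]
    x = []
    for k in range(n_terms):
        # Match the coefficient of w**k in den(w) * X(w) = num(w).
        s = a[k] if k < len(a) else 0.0
        for j in range(1, min(k, n) + 1):
            s -= b[j] * x[k - j]
        x.append(s / b[0])
    return x


print(inverse_z([1, 0], [1, -2], 5))   # Example 25.19: [1.0, 2.0, 4.0, 8.0, 16.0]
```

The same function reproduces Example 25.17(c): `inverse_z([1, 0, 0], [1, 0, -1], 5)` gives the alternating pattern 1, 0, 1, 0, 1.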

Example 25.20 The response of a discrete system to the input x(t) = δ(t) + δ(t − T) is found to be y(t) = δ(t) + 2δ(t − T) + δ(t − 2T). Find (a) the z transfer function G(z), (b) the Laplace transfer function G(s), (c) the response to a unit impulse δ(t).
(a) Put
$$X(z) = 1 + \frac{1}{z}, \quad Y(z) = 1 + \frac{2}{z} + \frac{1}{z^2}, \quad G(z) = g_0 + \frac{g_1}{z} + \frac{g_2}{z^2} + \cdots$$
(for all we know at this stage, there might be an infinite number of terms in G(z)). Since Y(z) = G(z)X(z),
$$G(z) = \frac{Y(z)}{X(z)} = \left(1 + \frac{2}{z} + \frac{1}{z^2}\right)\left(1 + \frac{1}{z}\right)^{-1} = \left(1 + \frac{1}{z}\right)^2\left(1 + \frac{1}{z}\right)^{-1} = 1 + \frac{1}{z}.$$
(b) Restore s by putting $z = e^{Ts}$, where T is the spacing interval:
$$G(s) = 1 + e^{-Ts}.$$
(c) The impulse response is the inverse transform, g(t), of G(s):
$$g(t) = \delta(t) + \delta(t - T),$$
which can be obtained also from (a).
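Part (a) can also be carried out numerically: matching the coefficients of 1, 1/z, 1/z², … on both sides of Y(z) = G(z)X(z) determines each $g_n$ in turn. A Python sketch; the function name and the number of terms requested are illustrative assumptions:

```python
def divide_sequences(y, x, n_terms):
    """Return g_0, ..., g_{n_terms-1} such that Y(z) = G(z) X(z).

    y and x hold the coefficients of 1, 1/z, 1/z**2, ... (missing terms are zero);
    x[0] must be nonzero.
    """
    g = []
    for n in range(n_terms):
        yn = y[n] if n < len(y) else 0
        # Coefficient of 1/z**n: y_n = g_0 x_n + g_1 x_{n-1} + ... + g_n x_0.
        known = sum(g[r] * x[n - r] for r in range(n) if n - r < len(x))
        g.append((yn - known) / x[0])
    return g


print(divide_sequences([1, 2, 1], [1, 1], 5))   # [1.0, 1.0, 0.0, 0.0, 0.0], i.e. G(z) = 1 + 1/z
```

Because only finitely many terms are computed, this shows the leading coefficients of G(z); here they vanish after $g_1$, consistent with the closed form found above.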

Example 25.21 A smooth signal x(t) is sampled at intervals T to produce the discrete signal {x(0), x(T), x(2T), … }. Obtain the z transform when (a) x(t) = cos ωt; (b) x(t) = sin ωt.
The sample sequences in (a) and (b) are respectively {1, cos ωT, cos 2ωT, … } and {0, sin ωT, sin 2ωT, … }. We can deal with both at the same time by remembering that cos nωT and sin nωT are respectively equal to the real and imaginary parts of $e^{in\omega T}$. Therefore, consider the sequence resulting from the complex input sequence
$$\{1, e^{i\omega T}, e^{2i\omega T}, \ldots\},$$
which has the z transform
$$1 + \frac{e^{i\omega T}}{z} + \frac{e^{2i\omega T}}{z^2} + \cdots = 1 + (e^{i\omega T}z^{-1}) + (e^{i\omega T}z^{-1})^2 + \cdots.$$
This is an infinite geometric series with common ratio $e^{i\omega T}z^{-1}$, so its sum is equal to
$$\frac{1}{1 - e^{i\omega T}z^{-1}} = \frac{z}{z - e^{i\omega T}}.$$
Treating z as real, the complex conjugate of the denominator is $z - e^{-i\omega T}$, so write
$$\frac{z}{z - e^{i\omega T}} = \frac{z}{z - e^{i\omega T}}\cdot\frac{z - e^{-i\omega T}}{z - e^{-i\omega T}} = \frac{z(z - e^{-i\omega T})}{z^2 - z(e^{i\omega T} + e^{-i\omega T}) + 1} = \frac{z(z - e^{-i\omega T})}{z^2 - 2z\cos\omega T + 1}.$$
The transforms of the sampled cosine and sine are the real and imaginary parts respectively of this expression, so:
$$\text{transform of } \cos\omega t = \frac{z(z - \cos\omega T)}{z^2 - 2z\cos\omega T + 1};$$
$$\text{transform of } \sin\omega t = \frac{z\sin\omega T}{z^2 - 2z\cos\omega T + 1}.$$
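A quick numerical check of both results, and of the single complex form they come from. The values ω = 1.3, T = 0.5 and the real sample point z = 1.7 are arbitrary assumptions chosen for the test:

```python
import cmath
import math

omega, T = 1.3, 0.5   # arbitrary sample values, assumed for the check
z = 1.7               # real sample point with |z| > 1
N = 400               # terms kept in each truncated series

theta = omega * T
cos_series = sum(math.cos(n * theta) / z**n for n in range(N))
sin_series = sum(math.sin(n * theta) / z**n for n in range(N))

den = z**2 - 2 * z * math.cos(theta) + 1
cos_closed = z * (z - math.cos(theta)) / den
sin_closed = z * math.sin(theta) / den

# The single complex form z/(z - e^{i omega T}) carries both results.
complex_closed = z / (z - cmath.exp(1j * theta))
print(cos_series, cos_closed, complex_closed.real)
print(sin_series, sin_closed, complex_closed.imag)
```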
Finally, we note the discrete form of the convolution theorem, (25.11), expressed in terms of z. For a discrete linear system there exists a transfer function G(z) such that input and output are related by Y(z) = G(z)X(z), or
$$y_0 + \frac{y_1}{z} + \cdots = \left(g_0 + \frac{g_1}{z} + \cdots\right)\left(x_0 + \frac{x_1}{z} + \cdots\right).$$
By matching the coefficients of inverse powers of z on both sides we obtain:

Discrete form of the convolution theorem

If Y(z) = G(z)X(z), then
$$y_0 = g_0x_0, \qquad y_1 = g_1x_0 + g_0x_1, \qquad y_2 = g_2x_0 + g_1x_1 + g_0x_2,$$
and so on. In general,
$$y_n = \sum_{r=0}^{n} g_r x_{n-r}. \qquad (25.29)$$

The structure of these formulae resembles that of a convolution integral, with r in place of τ and n in place of t in (25.11).
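For sequences with finitely many nonzero terms, (25.29) amounts to multiplying two polynomials in 1/z. The following Python sketch (the name `discrete_convolution` is illustrative) accumulates each product $g_rx_k$ into $y_{r+k}$, which is the sum in (25.29) arranged differently:

```python
def discrete_convolution(g, x):
    """Output terms y_n = sum_{r=0}^{n} g_r x_{n-r} of (25.29) for finite sequences."""
    y = [0] * (len(g) + len(x) - 1)
    for r, g_r in enumerate(g):
        for k, x_k in enumerate(x):
            y[r + k] += g_r * x_k   # the product g_r x_k contributes to y_{r+k}
    return y


print(discrete_convolution([1, 1], [1, 1]))   # [1, 2, 1], as in Example 25.20
```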

Self-test 25.4
The response of a discrete system to the input x(t) = δ(t) + δ(t − T) is found
to be
δ(t) + δ(t − T) − δ(t − 2T) − δ(t − 3T).
By using z transforms show that the transfer function g(t) = δ(t) − δ(t − 2T).

25.10 Behaviour of z transforms in the complex plane


Suppose that a string of impulses represented by x(t) = {x0, x1, x2, … } is fed into a
discrete processor. When the first impulse arrives at t = 0, it triggers the circuit to
produce a scaled copy of the transfer function G(z) in the time domain, a string of
impulses given by x0g(t) = {x0g0, x0g1, x0g2, … }. The second impulse is felt at
t = T, and G(z) forms another scaled copy of itself, x1g(t − T), starting at t = T,
and so on. These sequences overlap: the second one starts before the first has
ended, and the output sequence consists of the sum of all the effects which are still
present at T, 2T, 3T, … . This is illustrated in Fig. 25.18.
If G(z) has an infinite number of terms, the effect of any input term will be
present for ever after. This extension into the distant future of the influence of an
individual piece of input resembles the presence of transients in systems governed
by differential equations. Very long-term effects are usually undesirable; in particular, they should not increase as time goes on. Their increase or decrease is described by the rate of increase or decrease of the coefficients in the series
$$G(z) = g_0 + \frac{g_1}{z} + \frac{g_2}{z^2} + \cdots. \qquad (25.30)$$
We shall illustrate how information about this question can be obtained by examining the behaviour of G(z) when it is given in closed form, and the variable z is allowed to be complex.

Fig. 25.18 [Input δ(t) produces the output g(t), and input δ(t − T) produces g(t − T); hence the input x(t) = δ(t) + δ(t − T) produces the output y(t) = g(t) + g(t − T).]
We limit consideration to cases where G(z) is a rational function of z:
$$G(z) = \frac{a_Mz^M + a_{M-1}z^{M-1} + \cdots + a_0}{b_Nz^N + b_{N-1}z^{N-1} + \cdots + b_0}. \qquad (25.31)$$
We shall assume that M ≤ N. Suppose that the $a_m$ and $b_n$ are all real numbers, and that the N solutions of the equation
$$b_Nz^N + b_{N-1}z^{N-1} + \cdots + b_0 = 0 \qquad (25.32)$$
are
$$z = z_1, z_2, z_3, \ldots, z_N.$$
For simplicity, we shall assume that these numbers are all different. The denominator of (25.31) then has N different factors of the form $(z - z_n)$, for n = 1 to N, so (25.31) can be written
$$G(z) = \frac{a_Mz^M + a_{M-1}z^{M-1} + \cdots + a_0}{b_N(z - z_1)(z - z_2)\cdots(z - z_N)}. \qquad (25.33)$$
Notice that G(z) is infinite at the points $z_1, z_2, \ldots, z_N$. These points are called the poles of G(z). Some of them may be complex numbers. If so, they occur in pairs: if $z_n$ is a solution of (25.32), then so is its complex conjugate $\bar z_n$. Equation (25.33) may now be expressed as the sum of partial fractions (now in general complex) as in Section 1.14:
$$G(z) = \frac{C_1}{z - z_1} + \frac{C_2}{z - z_2} + \cdots + \frac{C_N}{z - z_N}, \qquad (25.34)$$
where $C_1$ to $C_N$ are constants; this form holds when M < N (if M = N there is an additional constant term $a_M/b_N$, which affects only $g_0$). A typical term has the form
$$\frac{C}{z - c}, \qquad (25.35)$$
where c may be complex: if so, then C might be complex as well. This term is the source of a part of the discrete output signal g(t) produced by an input x(t) = δ(t), and we shall see whether it generates an increasing or a decreasing output.
Suppose firstly that we find a pole at z = c in (25.35), where c is a real number. Then C is also real, and
$$\frac{C}{z - c} = \frac{C}{z}\left(1 - \frac{c}{z}\right)^{-1} = \frac{C}{z} + \frac{Cc}{z^2} + \frac{Cc^2}{z^3} + \cdots.$$
In the time domain this corresponds to the sequence {0, C, Cc, Cc², … }, that is, after a delay of one interval T, to the terms C, Cc, Cc², … . If |c| > 1 the terms are increasing in magnitude, and the system is said to be unstable. If |c| < 1 they are decreasing in magnitude. The rate of increase or decrease is actually exponential, because
$$|Cc^n| = |C|\,e^{n\ln|c|}.$$
If c = ±1, then the magnitude of the output terms does not decrease, and the system is again classed as unstable; after the initial zero the terms are {C, ±C, C, ±C, … }.
Next, suppose that c is complex. Then there is another pole at $z = \bar c$, with coefficient $\bar C$. Taking these together, we obtain a pair of complex conjugate terms (for real z), generating real coefficients:
$$\frac{C}{z - c} + \frac{\bar C}{z - \bar c} = 2\,\mathrm{Re}\,\frac{C}{z - c} = 2\,\mathrm{Re}\left(\frac{C}{z}\,\frac{1}{1 - cz^{-1}}\right) = 2\,\mathrm{Re}\left(\frac{C}{z} + \frac{Cc}{z^2} + \frac{Cc^2}{z^3} + \cdots\right),$$
which corresponds to the sequence
$$\{0,\ 2\,\mathrm{Re}(C),\ 2\,\mathrm{Re}(Cc),\ 2\,\mathrm{Re}(Cc^2),\ \ldots\}. \qquad (25.36)$$
Evidently the magnitude (modulus) of the coefficients follows the same rule as before.

Each of the terms in (25.34) contributes to g(t) in a similar way. Therefore, the response y(t) to a delta function input x(t) = δ(t) depends upon the poles $c_n$ of G(z) as follows:

Stability of a linear system

(i) If $|c_n| < 1$ for every pole $c_n$ of G(z), the response y(t) dies away, so the system is stable.
(ii) If not, then the system is unstable. (25.37)
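The argument can be checked on a small case. The Python sketch below uses a hypothetical transfer function, G(z) = 1/[(z − 0.5)(z + 0.8)], chosen only for illustration: both poles lie inside the unit circle, so by (25.37) the impulse response should die away. The coefficients obtained from the partial fractions (25.34), namely $g_n = \sum_k C_kc_k^{n-1}$ for n ≥ 1, are compared with those from direct division in powers of 1/z:

```python
# Hypothetical example: G(z) = 1/((z - 0.5)(z + 0.8)) = 1/(z**2 + 0.3z - 0.4).
poles = [0.5, -0.8]
# Partial-fraction coefficients C_k for simple poles; the numerator is 1 and
# b_N = 1 here, so C_k = 1/(c_k - c_j) with c_j the other pole.
residues = [1 / (0.5 - (-0.8)), 1 / (-0.8 - 0.5)]


def g_from_poles(n):
    """Coefficient of 1/z**n in sum_k C_k/(z - c_k); zero when n = 0."""
    if n == 0:
        return 0.0
    return sum(C * c ** (n - 1) for C, c in zip(residues, poles))


def g_from_division(n_terms):
    """Same coefficients, from matching powers of 1/z in (z**2 + 0.3z - 0.4) G(z) = 1."""
    g = []
    for n in range(n_terms):
        rhs = 1.0 if n == 2 else 0.0   # the right-hand side 1 equals z**2 * (1/z**2)
        prev1 = g[n - 1] if n >= 1 else 0.0
        prev2 = g[n - 2] if n >= 2 else 0.0
        g.append(rhs - 0.3 * prev1 + 0.4 * prev2)
    return g


for n, g_div in enumerate(g_from_division(8)):
    print(n, round(g_from_poles(n), 6), round(g_div, 6))
```

The terms shrink roughly like $0.8^n$, the modulus of the pole farthest from the origin.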

We can interpret the complex poles more closely. Put
$$C = |C|e^{i\varphi} \quad\text{and}\quad c = |c|e^{i\omega}. \qquad (25.38)$$
From (25.38)
$$Cc^n = |C|\,|c|^n e^{i(n\omega + \varphi)}.$$
Therefore, for n = 0, 1, 2, … ,
$$2\,\mathrm{Re}(Cc^n) = 2|C|\,|c|^n\cos(n\omega + \varphi).$$
Put this into the time sequence (25.36). The terms $2\,\mathrm{Re}(Cc^n)$, n = 0, 1, 2, …, become
$$\{2|C|\cos\varphi,\ 2|C|\,|c|\cos(\omega + \varphi),\ 2|C|\,|c|^2\cos(2\omega + \varphi),\ 2|C|\,|c|^3\cos(3\omega + \varphi),\ \ldots\}.$$
This sequence would be obtained by sampling at t = 0, 1, 2, … from the smooth function
$$2|C|\,|c|^t\cos(\omega t + \varphi)H(t), \qquad (25.39)$$
so a picture of the progress of the discrete transient can be obtained as in Fig. 25.19. Alternatively, this is equivalent to samples at t = 0, T, 2T, … taken from
$$2|C|\,|c|^{t/T}\cos(\omega t/T + \varphi)H(t). \qquad (25.40)$$
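The identity $2\,\mathrm{Re}(Cc^n) = 2|C|\,|c|^n\cos(n\omega + \varphi)$ can be checked with the values used in Fig. 25.19. A short Python sketch using the standard `cmath` module (note that ω here is the argument of c, not an angular frequency):

```python
import cmath
import math

C = 0.5                          # the values used in Fig. 25.19
c = 0.8 * cmath.exp(2.9j)

mod_C, phi = abs(C), cmath.phase(C)     # |C| = 0.5, phi = 0
mod_c, omega = abs(c), cmath.phase(c)   # |c| = 0.8, omega = 2.9 (the argument of c)

for n in range(7):
    direct = 2 * (C * c**n).real                                 # 2 Re(C c^n)
    smooth = 2 * mod_C * mod_c**n * math.cos(n * omega + phi)    # (25.39) at t = n
    print(n, round(direct, 6), round(smooth, 6))
```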

In Fig. 25.20 we show an Argand diagram with the unit circle |z| = 1 indicated.
This is used as a design tool to obtain a qualitative idea of how a proposed circuit
will behave, and to modify its properties. We can find the poles (the points where
G(z) is infinite), and place them on the diagram. Poles within the circle promise
transients which die away; if there is a pole outside, then a stimulus applied to the
circuit will produce ever-increasing output, so the system will be unstable. Poles
lying on the circle |z| = 1 produce transients which do not approach zero or infinity
in magnitude. If the values associated with the circuit elements can be adjusted so
that all the poles lie inside the unit circle, then we shall have a circuit for which all disturbances die away with time.

Fig. 25.19 Discrete transient of C/(z − c). Suppose that C = 0.5 and $c = 0.8e^{2.9i}$. Then |C| = 0.5, |c| = 0.8, φ = 0, ω = 2.9. The curve $y = (0.8)^t\cos(2.9t)$ and the impulsive response to δ(t) are shown.

Fig. 25.20 The unit circle |z| = 1 in the z plane (real and imaginary axes), and several poles $z = c_1, c_2, c_3$ of a transfer function G(z). One of the poles is outside the circle so the circuit is unstable and a transient associated with this pole will grow exponentially.

Self-test 25.5
A discrete system has the transfer function given (in terms of z = eTs) by
$G(z) = z/(16z^2 - 16z + 5)$.
Use the result (25.37) to determine the stability of the system.

25.11 z transforms and difference equations


Systems can be constructed whose output $y_{n+1}$ at time t = (n + 1)T depends not only on the input up to that time, but also on the preceding outputs. This is achieved by delay elements which pick up each $y_n$ at time nT, store it for a time T, then feed it back into the system so as to modify $y_{n+1}$ in some way. A chain of delay elements can reach back further into the history of the outputs. In this way $y_{n+1}$ may be related to the current input and earlier outputs by equations such as
$$y_{n+1} = y_n + x_n,$$
or
$$y_{n+2} = 2y_{n+1} - y_n + x_n,$$
for n = 0, 1, 2, … . Such equations are called difference equations or recurrence relations. The equations above are called linear difference equations, because the terms in y appear only linearly. The circuits producing such equations are not necessarily linear in the sense we have used so far, that is, in the sense of possessing a transfer function. Difference equations are treated more fully in Chapter 38; for the present we shall outline a connection with z transforms.

Example 25.22 Obtain the sequence $\{y_n\}$, where
$$y_{n+1} = y_n + x_n, \quad n = 0, 1, 2, \ldots,$$
given that $y_0 = 3$ and $\{x_n\} = \{1, 2, 3, \ldots\}$.
This is easily done by simply counting.
For n = 0: $y_1 = y_0 + x_0 = 3 + 1$.
For n = 1: $y_2 = y_1 + x_1 = (3 + 1) + 2$.
For n = 2: $y_3 = y_2 + x_2 = (3 + 1 + 2) + 3$, and so on. Evidently,
$$y_n = 3 + (1 + 2 + 3 + \cdots + n) = 3 + \tfrac{1}{2}n(n + 1)$$
by using a well-known formula (see Appendix A(f)).
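Iterating the difference equation directly confirms the closed form. A minimal Python sketch; the helper name `iterate` is illustrative, and the input is supplied as a function of n:

```python
def iterate(y0, x, n_steps):
    """Return y_0, ..., y_{n_steps} generated by y_{n+1} = y_n + x_n."""
    y = [y0]
    for n in range(n_steps):
        y.append(y[n] + x(n))
    return y


ys = iterate(3, lambda n: n + 1, 10)   # x_n = n + 1 gives the input {1, 2, 3, ...}
print(ys)   # [3, 4, 6, 9, 13, 18, 24, 31, 39, 48, 58]
print(all(ys[n] == 3 + n * (n + 1) // 2 for n in range(len(ys))))   # True
```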

Notice that we had to prescribe y0: it was not given by the difference equation,
and we could have assigned any value to it. It resembles the initial condition of a
first-order differential equation.

Example 25.23 Use z transforms to determine the stability of a feedback circuit which processes the digital signal $\{x_n\}$ according to the difference equation
$$y_{n+2} = 3y_{n+1} - 2y_n + x_n,$$
where $y_0$, $y_1$, and the sequence $\{x_n\}$ are given.
It is usual to collect together the y terms on the left-hand side:
$$y_{n+2} - 3y_{n+1} + 2y_n = x_n \qquad \text{(i)}$$
as with a differential equation. The sequences and their transforms are given by
$\{x_n\} = \{x_0, x_1, x_2, \ldots\}$, $\quad X(z) = x_0 + x_1z^{-1} + x_2z^{-2} + \cdots$;
$\{y_n\} = \{y_0, y_1, y_2, \ldots\}$, $\quad Y(z) = y_0 + y_1z^{-1} + y_2z^{-2} + \cdots$;
$\{y_{n+1}\} = \{y_1, y_2, y_3, \ldots\}$, $\quad Y_1(z) = y_1 + y_2z^{-1} + y_3z^{-2} + \cdots$;
$\{y_{n+2}\} = \{y_2, y_3, y_4, \ldots\}$, $\quad Y_2(z) = y_2 + y_3z^{-1} + y_4z^{-2} + \cdots$.
By simply looking at the z series it can be seen that
$$Y_1(z) = zY(z) - zy_0$$
and
$$Y_2(z) = zY_1(z) - zy_1 = z^2Y(z) - z^2y_0 - zy_1.$$
Therefore the z transforms of the sequences obeying the relation (i) are connected by
$$(z^2Y(z) - z^2y_0 - zy_1) - 3(zY(z) - zy_0) + 2Y(z) = X(z),$$
or
$$(z^2 - 3z + 2)Y(z) - (z^2 - 3z)y_0 - zy_1 = X(z).$$
From this equation we obtain
$$Y(z) = \frac{X(z) + (z^2 - 3z)y_0 + zy_1}{z^2 - 3z + 2} = \frac{X(z) + (z^2 - 3z)y_0 + zy_1}{(z - 1)(z - 2)}. \qquad \text{(ii)}$$
The denominator is independent of $y_0$, $y_1$, and $\{x_n\}$, which are arbitrary. Y(z) has a pole at z = 2, so the results of the previous section predict that unless $y_0$, $y_1$, and $\{x_n\}$ are specially chosen, the output will grow exponentially.
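The prediction can be seen by running the recurrence with zero input. With $X(z) = 0$, eqn (ii) gives $Y(z) = z/[(z - 1)(z - 2)]$ for $y_0 = 0$, $y_1 = 1$, so the pole at z = 2 is excited; but for $y_0 = y_1 = 1$ the numerator is $z(z - 2)$, the factor cancels, and $Y(z) = z/(z - 1)$. A Python sketch (names illustrative):

```python
def run(y0, y1, x, n_steps):
    """Iterate y_{n+2} = 3 y_{n+1} - 2 y_n + x_n from the starting values y0, y1."""
    y = [y0, y1]
    for n in range(n_steps):
        y.append(3 * y[n + 1] - 2 * y[n] + x(n))
    return y


def no_input(n):
    return 0


print(run(0, 1, no_input, 8))   # [0, 1, 3, 7, 15, 31, 63, 127, 255, 511]: grows like 2**n
print(run(1, 1, no_input, 8))   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]: y0 = y1 avoids the pole at z = 2
```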

In Example 25.23, eqn (ii), the denominator has the form
$$az^2 + bz + c,$$
where a, b, c are the coefficients of $y_{n+2}$, $y_{n+1}$, and $y_n$ respectively. The denominator alone determines the growth of transients, so there is really no need to work right through the problem if all we want is information about the stability. In fact, if $y_0 = y_1 = 0$, which would be a natural condition, the circuit has a transfer function equal to $1/(az^2 + bz + c)$, so the situation is exactly the same as in the previous section. Similar considerations apply to linear difference equations of any order.
A table of z-transforms can be found in Råde and Westergren (1995).
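For a second-order equation, the stability question therefore reduces to locating the roots of $az^2 + bz + c = 0$. A minimal Python sketch, limited to quadratic denominators and using the standard `cmath` module; poles lying exactly on |z| = 1 are a borderline case that floating-point arithmetic cannot settle reliably:

```python
import cmath


def poles_of_quadratic(a, b, c):
    """Roots of a z**2 + b z + c = 0 (a != 0), i.e. the poles of 1/(a z**2 + b z + c)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def is_stable(a, b, c):
    """Criterion (25.37): both poles lie strictly inside the unit circle |z| = 1."""
    return all(abs(p) < 1 for p in poles_of_quadratic(a, b, c))


print(is_stable(1, -3, 2))     # Example 25.23: poles at 2 and 1, so False
print(is_stable(2, -1, 0.25))  # poles (1 ± i)/4, modulus about 0.354, so True
```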

Problems

25.1 Invert the transforms (a) $1/[s(s^2 + 1)]$, (b) $1/[s^2(s^2 + 1)]$, (c) $1/[s^3(s^2 + 1)]$, by using (25.1).

25.2 The equation for the current i(t) in an RLC circuit for zero initial charge is
$$L\frac{di}{dt} + Ri + \frac{1}{C}\int_0^t i(\tau)\,d\tau = v(t).$$
(a) Solve this equation when L = 2, R = 3, C = 1/3, v(t) = 3 cos t in conveniently scaled units, for zero initial current and charge.
(b) Adapt the equation to the case when v(t) = 0 and there is an initial charge $q_0$ on the capacitor, and solve it, given that i(0) = 0.
(c) The circuit in (a) is quiescent with zero charge; then, at $t = t_0$, a voltage of 300 units acts in it for 0.01 time units. Approximate the applied voltage by a suitable impulse function, and solve the equation for i(t).

25.3 The displacement x(t) of a mass on a spring with velocity damping and external force f(t) per unit mass reduces to the conventional form $\ddot{x} + 2k\dot{x} + \omega^2x = f(t)$. The initial conditions are x(0) = 1, $\dot{x}(0) = 1$. An impulse I is applied at $t = t_0$. Find the solution for t > 0 for k²  ω².

25.4 A light plank of length l rests across a crevasse, and sags under the weight of a mountaineer of mass M standing at the centre. The displacement u(x), where x is measured from one end, is determined in general by $K\,d^4u/dx^4 = f(x)$, where K is constant and f(x) is force per unit length along the plank. The boundary conditions, which say that the plank merely rests on its ends, are
$$u(0) = u''(0) = u(l) = u''(l) = 0.$$
Treat the mountaineer as a point force and solve the problem using Laplace transforms. (Hint: two conditions are prescribed at x = 0, but four are needed: call the missing ones A, B. Find A and B by requiring u(l) = u″(l) = 0.)

25.5 Find the impedances of the circuits in Fig. 25.21.

Fig. 25.21 [Circuits (a), (b), (c) built from resistors R, capacitors C, and inductors L, with element values marked.]

25.6 Find the transfer functions $V_2(s)/V_1(s)$ and $V_2(s)/I(s)$ in the circuits in Fig. 25.22.

Fig. 25.22 [Circuits (a) and (b), each with input voltage $V_1(s)$, input current I(s), and output voltage $V_2(s)$; element values R, L, C marked.]

25.7 Evaluate the convolution integral
$$\int_0^t g(\tau)h(t - \tau)\,d\tau, \quad\text{or}\quad \int_0^t h(\tau)g(t - \tau)\,d\tau,$$
in the following cases. Sometimes it might be easier to invert the corresponding Laplace transform (25.11).
(a) $g(t) = e^t$, h(t) = 1; (b) g(t) = 1, h(t) = 1; (c) $g(t) = e^t$, $h(t) = e^t$; (d) $g(t) = e^{-t}$, h(t) = t; (e) g(t) = t, h(t) = sin t; (f) g(t) = cos t, h(t) = t; (g) g(t) = sin 3t, $h(t) = e^{-2t}$; (h) g(t) = sin t, h(t) = sin t; (i) $g(t) = t^4$, h(t) = sin t; (j) $g(t) = t^n$, $h(t) = t^m$.

25.8 Use the convolution theorem (25.11) to obtain an expression, in the form of an integral, for a particular solution of the following equations.
(a) $\dfrac{d^2x}{dt^2} + \omega^2x = f(t)$; (b) $\dfrac{d^2x}{dt^2} - \omega^2x = f(t)$.

25.9 Use the convolution theorem (25.11) to find a solution x(t) of the following Volterra-type integral equations.
(a) $\int_0^t x(\tau)(t - \tau)\,d\tau = t^4$;
(b) $x(t) = 1 + \int_0^t x(\tau)(t - \tau)\,d\tau$;
(c) $x(t) = \sin t + \int_0^t x(\tau)\cos(t - \tau)\,d\tau$.

25.10 By following a similar argument to that leading up to (25.14), show that
$$x(t) = \frac{d}{dt}\int_0^t x^{**}(\tau)f(t - \tau)\,d\tau,$$
where $x^{**}(t)$ represents the response of a quiescent 'black box' to a unit-function input H(t), and x(t) is its response from quiescence to an input f(t). Suppose that the transform $X^{**}(s)$ of the unit-function response is given by $1/[(s - 1)(s + 2)]$ in a particular case. Obtain the response from zero initial conditions to an input H(t) sin ωt.

25.11 (a) A student learning a language aims to memorize 50 new words a day, starting at t = 0. She is successful in this but, after a time lapse α, remembers only a fraction $e^{-0.01\alpha}$ of those learned at any time. Express the number N(t) of words still in her vocabulary in terms of a convolution integral, and evaluate it.
(b) The student decides to offset the words forgotten by attempting (50 + 0.1t) words per day. Find the new N(t), assuming the same initial success and the same rate of forgetting.

25.12 A population p(t) for t ≥ 0 develops as follows. The $p_0$ individuals in existence at t = 0 die out on average via a factor $e^{-\gamma t}$, so that at time t only about $p_0e^{-\gamma t}$ are still in existence. For the rest, take any time τ < t. The number born between τ and τ + δτ is bp(τ) δτ, where b is the birthrate; these individuals die out through a factor $e^{-\beta(t - \tau)}$, where β  γ and t − τ is the time elapsed from birth. Show that
$$p(t) = p_0e^{-\gamma t} + b\int_0^t p(\tau)e^{-\beta(t - \tau)}\,d\tau,$$
and solve the equation.

25.13 A simple harmonic oscillator with displacement x is subject to a constant force $F_0$ for $0 < t < t_0$, and allowed to oscillate freely for $t > t_0$. If H(t) is the Heaviside function its equation of motion is
$$m\ddot{x} + kx = F_0[H(t) - H(t - t_0)].$$
If the system starts from rest in equilibrium, show that the Laplace transform is
$$\mathcal{L}\{x(t)\} = \frac{F_0}{m}\,\frac{1 - e^{-st_0}}{s(s^2 + \omega^2)},$$
where $\omega = \sqrt{k/m}$. Show that, for $0 < t < t_0$, the solution is
$$x(t) = \frac{F_0}{k}(1 - \cos\omega t),$$
and find the solution for $t > t_0$.

25.14 An equation of the form
$$\frac{dx(t)}{dt} = x(t - 1) + t,$$
which relates the derivative at time t to the value of the function at an earlier time, is an example of a differential delay equation. If x(t) = 0 for t ≤ 0, show that the Laplace transform of the solution is
$$\mathcal{L}\{x(t)\} = \frac{1}{s^2(s - e^{-s})} = \frac{1}{s^3(1 - e^{-s}/s)}.$$
Expand $1/(1 - e^{-s}/s)$ in powers of $e^{-s}/s$ using a binomial expansion, and show that
$$x(t) = \sum_{n=0}^{\lfloor t\rfloor}\frac{(t - n)^{n+2}}{(n + 2)!},$$
where ⌊t⌋ is the integer floor function (the largest integer less than or equal to t: for example, ⌊2.3⌋ = 2, ⌊3⌋ = 3, and ⌊−2.3⌋ = −3).

25.15 The equation
$$2\int_0^t\cos(t - u)x(u)\,du = x(t) - t$$
is an example of an integral equation. Note that the integral is of convolution type, which means that the Laplace transform of the equation is
$$2\mathcal{L}\{\cos t\}X(s) = X(s) - \frac{1}{s^2}.$$
Show that the solution is $x(t) = 2(t - 1)e^t + t + 2$.

25.16 The differential equation
$$\frac{d^2x}{dt^2} + t\frac{dx}{dt} - x = 0$$
does not have constant coefficients: the coefficient of dx/dt is t. Using the results (24.8) and (24.12), show that the transform of the differential equation subject to the conditions x(0) = 0 and x′(0) = 1 satisfies the first-order equation
$$-s\frac{dX(s)}{ds} + (s^2 - 2)X(s) = 1.$$
Verify that $X(s) = 1/s^2$ satisfies this equation, and hence obtain the required solution of the original equation.

25.17 Using the method outlined in Problem 25.16, solve the following variable-coefficient equations using Laplace transforms:
(a) tx″(t) + (1 − t)x′(t) − x(t) = 0, x(0) = x′(0) = 1;
(b) x″(t) + tx′(t) − 2x(t) = 2, x(0) = x′(0) = 0;
(c) tx″(t) − x′(t) + tx(t) = sin t, x(0) = 1, x′(0) = 0.

25.18 (Discrete systems, Section 25.8). The following signals are expressed in sequence form. Write the explicit form of x(t) and its Laplace transform in each case.
(a) {1, 2, 1, 0, 0, 0, … }. (b) {0, 1, 2, 3, … }. (c) {3}. (d) $\{(-2)^n\}$. (e) {0, 0, 3}.

25.19 The transfer functions g(t) in the time domain, and inputs x(t), are given below. Obtain the outputs y(t) in each case.
(a) g(t) = {1, 1}, x(t) = {1, 1}.
(b) $g(t) = \{1, 1/2, 1/2^2, \ldots\}$, x(t) = {1, 1}.
(c) g(t) = {1, −1, 1, −1, … }, x(t) = {0, 2, 2}.
25.20 Obtain the output y(t), when the transfer function is $G(s) = 1/(1 - \tfrac{1}{3}e^{-Ts})$ and the Laplace transform of the input is $X(s) = e^{-Ts} + 2e^{-2Ts}$. (Hint: expand G(s) in the form of an appropriate infinite series in powers of $e^{-sT}$.)

25.21 Obtain the z transforms corresponding to the various specifications that follow:
(a) x(t) = δ(t − T) + 2δ(t − 2T) − δ(t − 3T).
(b) x(t) = {1, −1, 1, −1, … }.
(c) $x(t) = \{1/2^n\}$.
(d) $X(s) = e^{-Ts}/(1 - e^{-2Ts})$.

25.22 The following functions are sampled at interval T. Obtain the z transform of the (discrete) sampled functions (H(t) is the unit function (1.13)).
(a) tH(t). (b) $e^{-t}H(t)$.

25.23 Obtain the z transforms of the transfer functions, G(z), of various discrete, linear systems which have been tested for the particular input x(t) and output y(t) as specified:
(a) x(t) = {1, 1}, y(t) = {1, −1}, and find the sequence for g(t).
(b) x(t) = {1, 0, 0, 3}, y(t) = {1, 1}.
(c) x(t) = {1, −1}, y(t) = {1, 1}.
(d) x(t) = {1, 1, 1, … }, y(t) = {1, 0, −1, 0, 1, 0, −1, 0, 1, … }.
(e) x(t) = {1, 0, 1, 0, … }, y(t) = {1, 0, −1, 0, 1, 0, −1, 0, 1, … }.

25.24 Prove that if the z transform of the discrete function given by {x0, x1, x2, … } is X(z), then the discrete transform of $y(t) = \{x_0, x_1e^{-CT}, x_2e^{-2CT}, \ldots\}$ is $X(e^{CT}z)$.

25.25 (a) Prove that if the z transform of the discrete function x(t) defined by {x0, x1, x2, … } is X(z), then the transform of {0, x0, x1, … } is (1/z)X(z).
(b) Deduce that the transform of {0, 0, … , 0, x0, x1, … } (starting with N zeros) is $(1/z)^NX(z)$. (This is a time-delay rule for z transforms.)

25.26 Prove that if the z transform of {x0, x1, x2, … } is X(z), then the transform of $\{x_N, x_{N+1}, x_{N+2}, \ldots\}$ is
$$z^NX(z) - z^Nx_0 - z^{N-1}x_1 - \cdots - zx_{N-1}.$$
(This resembles the differentiation rule for Laplace transforms, (24.12). Start the process with N = 1, then N = 2 etc., until the sequence becomes clear.)

25.27 The following represent transfer functions for discrete systems, G(z). Find the poles, mark them on an Argand diagram as in Fig. 25.20, and state whether the systems are stable or not. Obtain the rate of growth or decay of their transients.
(a) $(z + 1)/(z^2 - 4)$. (b) $(z^2 - z)/(4z^2 - 1)$. (c) $1/(4z^2 + 1)$. (d) $(z^3 + 1)/(2z^4 + 5z^2 + 2)$.

25.28 $\{x_n\}$ and $\{y_n\}$ represent inputs and outputs to discrete systems governed by the difference equations shown. Use z transforms to obtain the transforms Y(z) in terms of X(z) and the initial values $y_0$ and $y_1$. State whether the systems are stable or not.
(a) $4y_{n+2} - y_n = x_n$; $y_0 = 1$, $y_1 = 2$.
(b) $y_{n+2} - 3y_{n+1} + 2y_n = 2x_n$; $y_0 = 0$, $y_1 = 1$.
(c) $2y_{n+2} + y_{n+1} + y_n = x_{n+1} - x_n$; $y_0 = 0$, $y_1 = 1$.
(d) $2y_{n+2} + 3y_{n+1} - y_n = x_n$; $y_0 = 1$, $y_1 = 1$.
26 Fourier series

CONTENTS

26.1 Fourier series for a periodic function
26.2 Integrals of periodic functions
26.3 Calculating the Fourier coefficients
26.4 Examples of Fourier series
26.5 Use of symmetry: sine and cosine series
26.6 Functions defined on a finite range: half-range series
26.7 Spectrum of a periodic function
26.8 Obtaining one Fourier series from another
26.9 The two-sided Fourier series
Problems

If a note on a piano is played, firstly by pressing the key and then by plucking the
string, the sounds produced are very different although the pitch or fundamental
frequency heard is the same in both cases. The note produced by an instrument is
not a pure tone or sinusoidal wave; it is a richer sound which contains other frequencies. These occur in different proportions when the same note is stimulated
in different ways, or is sounded on different instruments.
A trained ear can detect some detail in these differences; the extra components
can be distinguished and their pitch recognized, or they can be isolated by using
resonators. The extra component frequencies of a note are all higher than the
fundamental frequency, and related to it in a simple manner. If the fundamental
frequency is f, then the harmonics present have frequencies
f, 2f, 3f, 4f, 5f, …,
the strength of the harmonics dropping off to zero as their frequency increases.
When these components are added, a profile for the composite wave is obtained.
A particular note was found to have components as shown:
Order of harmonic: 1 2 3 4 5 …
Frequency: f 2f 3f 4f 5f …
Relative amplitude: 1.0 0.9 0.3 0.3 0.1 …
The shape and amplitude of the component harmonic waves, and of the composite wave, are shown in Fig. 26.1.

Fig. 26.1 [The component harmonic waves (f), (2f), (3f), (4f), (5f) and the compound sound; time axis in ms.]

By means of an electronic synthesizer the proportions in which harmonics
occur can be controlled and a great variety of sound quality generated, from flute
to drum. Given any particular fundamental frequency f, it is plausible that we
could generate a sound wave of any preassigned quality (that is to say, of any
shape) by adjusting the balance of the harmonics. This possibility is essentially
what the theory of Fourier series is about, though in a wider context than that of
sound waves.

26.1 Fourier series for a periodic function


The following symbols are used in connection with periodic functions (see
Section 20.1):
f = frequency (cycles/time; if time is in seconds, the unit is the hertz);
ω = angular frequency (radians/time): ω = 2πf = 2π/T;
T = period or wavelength: T = 1/f = 2π /ω.
A typical periodic function P(t) with period T is shown in Fig. 26.2. Any full-
period interval may be chosen for discussion; suppose it is the interval between
t = −π/ω and t = π /ω.
We shall express P(t) over this interval, denoted by [−π /ω, π/ω], as the sum of
harmonic (sinusoidal) curves having frequencies f, 2f, 3f, … , where f = 1/T; or,
equivalently, angular frequencies ω, 2ω, 3ω, … , where ω = 2π /T. A constant
term is also needed, since the average value of P(t) will not generally be zero.

Fig. 26.2 A periodic function P(t) with period T = 2π/ω (the points t = −π/ω and t = π/ω are marked on the time axis).

Both sine and cosine terms are needed, because if we involve only sines or only cosines, the sum will have a symmetry, odd or even (Section 15.9), which P(t) might not have. Then we expect that, for suitable values of the constants $a_n$ and $b_n$,

Fourier series for a function P(t) of period T

$$P(t) = \tfrac{1}{2}a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty}(a_n\cos n\omega t + b_n\sin n\omega t),$$
where ω = 2π/T, −∞ < t < ∞. (26.1)

Equation (26.1) is a Fourier series for P(t), and the constants $a_0$; $a_1, b_1$; $a_2, b_2$; … are its Fourier coefficients. It will be shown how to determine the coefficients in Section 26.3: the factor ½ in the constant term $\tfrac{1}{2}a_0$ is introduced to simplify the working.
We have spoken in terms of the one-period range t = −π/ω to t = π/ω, but every
term on the right of (26.1) is periodic with the same period T = 2π/ω as P(t).
Therefore the series will describe P(t) for every value of t, not merely for t in the
interval between ±π /ω.

26.2 Integrals of periodic functions


We prove two results needed for the next section, in which the values of the
coefficients in (26.1) are determined. Figure 26.3 represents a periodic function
P(t) with period T. Choose any value of t, say $t = t_0$, and compare the two integrals
$$\int_0^T P(t)\,dt \quad\text{and}\quad \int_{t_0}^{t_0+T} P(t)\,dt,$$

each of which is taken over a one-period interval of P(t). The figure shows that the
integrals are equal by virtue of the area analogy (15.13). The two shaded areas in
Figs 26.3a, b are assembled from identical segments which are simply added up
in a different order.
Alternatively, differentiation with respect to $t_0$ gives
$$\frac{d}{dt_0}\int_{t_0}^{t_0+T} P(t)\,dt = P(t_0 + T) - P(t_0) = 0,$$
using (15.20) and the periodicity of P(t). Hence the integral is independent of $t_0$.

Fig. 26.3 Illustrating the area analogy for (a) $\int_0^T P(t)\,dt$ and (b) $\int_{t_0}^{t_0+T} P(t)\,dt$, where P(t) has period T.

The integral over any one-period interval of a function P(t) having period T,
$$\int_{t_0}^{t_0+T} P(t)\,dt,$$
does not depend on $t_0$. (26.2)

Example 26.1 Show that
$$\int_0^{\pi}\sin 2t\cos^2 t\,dt = 0.$$

The period of $\cos^2 t$ is π, because
$$\cos^2 t = \tfrac{1}{2}(1 + \cos 2t)$$
and the period of cos 2t is π. The period of sin 2t is also π. Therefore, by (26.2),
$$\int_0^{\pi}\sin 2t\cos^2 t\,dt = \int_{-\pi/2}^{\pi/2}\sin 2t\cos^2 t\,dt,$$
since the range −½π to ½π also covers a period π. But the integrand is an odd function about the origin, so that the value of the last version is zero (Section 15.9).
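Both the example and the general result (26.2) can be illustrated numerically: the integral over [t0, t0 + π] should not depend on the starting point t0. A Python sketch using the composite midpoint rule; the second function g(t) = (1 + sin 2t)² is an extra test case chosen here because its integral over a period, 3π/2, is nonzero:

```python
import math


def integrate(f, a, b, n=1000):
    """Composite midpoint rule with n subintervals on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def f(t):
    """Integrand of Example 26.1; it has period pi."""
    return math.sin(2 * t) * math.cos(t) ** 2


def g(t):
    """Another function of period pi, with a nonzero integral over a period."""
    return (1 + math.sin(2 * t)) ** 2


starts = [0.0, -math.pi / 2, 0.7, 2.0]   # arbitrary choices of t0
print([integrate(f, t0, t0 + math.pi) for t0 in starts])   # all essentially zero
print([integrate(g, t0, t0 + math.pi) for t0 in starts])   # all close to 3*pi/2
```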

The following special results can be proved by using the trigonometric identities in Appendix B which convert products to sums.

Trigonometric integrals over a one-period interval

(a) For n and m = 0, 1, 2, … with n ≠ m,
$$\int_{-\pi/\omega}^{\pi/\omega}\cos n\omega t\cos m\omega t\,dt = 0, \qquad \int_{-\pi/\omega}^{\pi/\omega}\sin n\omega t\sin m\omega t\,dt = 0, \qquad \int_{-\pi/\omega}^{\pi/\omega}\cos n\omega t\sin m\omega t\,dt = 0.$$
(b) For n = 1, 2, …
$$\int_{-\pi/\omega}^{\pi/\omega}\cos^2 n\omega t\,dt = \int_{-\pi/\omega}^{\pi/\omega}\sin^2 n\omega t\,dt = \pi/\omega.$$
For n = 0, we obtain
$$\int_{-\pi/\omega}^{\pi/\omega}dt = 2\pi/\omega \quad\text{and}\quad \int_{-\pi/\omega}^{\pi/\omega}0\,dt = 0.$$
(c) The range −π/ω to π/ω may be replaced by any interval of length 2π/ω.
(26.3)
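The results in (26.3) are easy to spot-check numerically. A Python sketch with the arbitrary choice ω = 2 and a few sample values of n and m; the midpoint rule is used because it integrates low-frequency trigonometric polynomials over a full period essentially exactly:

```python
import math


def integrate(f, a, b, n=2000):
    """Composite midpoint rule with n subintervals on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


omega = 2.0                                  # arbitrary sample angular frequency
lo, hi = -math.pi / omega, math.pi / omega   # one period, length 2*pi/omega

checks = {
    "cos 2wt cos 3wt": integrate(lambda t: math.cos(2 * omega * t) * math.cos(3 * omega * t), lo, hi),
    "sin 2wt sin 3wt": integrate(lambda t: math.sin(2 * omega * t) * math.sin(3 * omega * t), lo, hi),
    "cos 2wt sin 2wt": integrate(lambda t: math.cos(2 * omega * t) * math.sin(2 * omega * t), lo, hi),
    "cos^2 3wt": integrate(lambda t: math.cos(3 * omega * t) ** 2, lo, hi),   # expect pi/omega
    # (c): the same integral over the shifted period [0.4, 0.4 + 2*pi/omega]
    "cos^2 3wt shifted": integrate(lambda t: math.cos(3 * omega * t) ** 2, 0.4, 0.4 + 2 * math.pi / omega),
}
for name, value in checks.items():
    print(name, round(value, 10))
```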

26.3 Calculating the Fourier coefficients


From (26.1), we expect that any periodic function P(t) having period 2π/ω can be expressed in the form of a Fourier series
$$P(t) = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty}(a_n\cos n\omega t + b_n\sin n\omega t). \qquad (26.4)$$
To find a particular coefficient $a_N$, multiply both sides of (26.4) by cos Nωt:
$$P(t)\cos N\omega t = \tfrac{1}{2}a_0\cos N\omega t + \sum_{n=1}^{\infty}(a_n\cos n\omega t\cos N\omega t + b_n\sin n\omega t\cos N\omega t).$$
Integrate both sides of this equation over the period between −π/ω and π/ω:
$$\int_{-\pi/\omega}^{\pi/\omega}P(t)\cos N\omega t\,dt = \tfrac{1}{2}a_0\int_{-\pi/\omega}^{\pi/\omega}\cos N\omega t\,dt + \sum_{n=1}^{\infty}\left(a_n\int_{-\pi/\omega}^{\pi/\omega}\cos n\omega t\cos N\omega t\,dt + b_n\int_{-\pi/\omega}^{\pi/\omega}\sin n\omega t\cos N\omega t\,dt\right). \qquad (26.5)$$

(i) The constant term $a_0$

Consider the case N = 0. According to (26.3a), all terms under the summation sign in (26.5) are zero; so, after putting cos Nωt = cos 0 = 1, we are left with
$$\int_{-\pi/\omega}^{\pi/\omega}P(t)\,dt = \tfrac{1}{2}a_0\int_{-\pi/\omega}^{\pi/\omega}dt = \frac{\pi}{\omega}a_0.$$
Therefore
$$a_0 = \frac{\omega}{\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\,dt, \qquad (26.6)$$
and $\tfrac{1}{2}a_0$ is equal to the average value of P(t) over a period.

(ii) The cosine terms, 1 ≤ N < ∞

Suppose that N ≠ 0. By (26.3), all the integrals on the right of (26.5) are zero except the single one that involves $a_N$, so (26.5) reduces to
$$\int_{-\pi/\omega}^{\pi/\omega}P(t)\cos N\omega t\,dt = a_N\int_{-\pi/\omega}^{\pi/\omega}\cos^2 N\omega t\,dt = \frac{\pi}{\omega}a_N.$$
Therefore, for N = 1, 2, 3, … ,
$$a_N = \frac{\omega}{\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\cos N\omega t\,dt. \qquad (26.7)$$
By comparing (26.7) with (26.6), it can be seen that $a_0$ and $a_1, a_2, \ldots$ are all given by the same formula. That is why the constant term in (26.4) is written as $\tfrac{1}{2}a_0$ instead of $a_0$.

(iii) The sine terms, 1 ≤ N < ∞

To find $b_N$ for N = 1, 2, … , multiply (26.4) by sin Nωt and integrate. In a similar way to that described above, we find that, for N = 1, 2, 3, … ,
$$b_N = \frac{\omega}{\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\sin N\omega t\,dt. \qquad (26.8)$$
Since P(t) is a known function, the integrals in (26.6), (26.7), and (26.8) can be evaluated to give all the coefficients in the Fourier series (26.4).

In the following summary, the letter n is used in place of N to simplify the form of the results.

Fourier series for periodic functions

Function: P(t), period T = 2π/ω,
Fourier series: $P(t) = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty}(a_n\cos n\omega t + b_n\sin n\omega t)$,
Fourier coefficients:
$$a_n = \frac{\omega}{\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\cos n\omega t\,dt \quad (n = 0, 1, 2, \ldots),$$
$$b_n = \frac{\omega}{\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\sin n\omega t\,dt \quad (n = 1, 2, \ldots)$$
(in place of the range of integration −π/ω to π/ω, any other one-period interval may be used). (26.9)
It can be seen also that since
$$\tfrac12 a_0 = \frac{\omega}{2\pi}\int_{-\pi/\omega}^{\pi/\omega}P(t)\,dt = \frac{1}{T}\int_{-T/2}^{T/2}P(t)\,dt,$$
the following is true:

Average value of P(t)

The average value of P(t) over a one-period interval is equal to the constant term ½a0. (26.10)

Notice the case of period 2π, which often occurs. In such cases ω = 2π /T = 1:

Fourier series for functions P(t) with period 2π

$$P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left(a_n\cos nt + b_n\sin nt\right),$$
where
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi}P(t)\cos nt\,dt, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi}P(t)\sin nt\,dt.$$
(The integrals may be taken over any one-period interval instead of [−π, π].) (26.11)

For some Fourier coefficients such as
$$\int t^k\cos n\omega t\,dt \qquad (k \text{ a positive integer}),$$
the integrals can be obtained by repeated integration by parts. Define the following indefinite integrals:


$$F_1(t) = \int f(t)\,dt, \qquad F_2(t) = \int F_1(t)\,dt, \qquad F_3(t) = \int F_2(t)\,dt, \quad \dots.$$

Using integration by parts (see Section 17.7),
$$\begin{aligned}\int P(t)f(t)\,dt &= P(t)F_1(t) - \int P'(t)F_1(t)\,dt\\ &= P(t)F_1(t) - P'(t)F_2(t) + \int P''(t)F_2(t)\,dt\\ &= P(t)F_1(t) - P'(t)F_2(t) + P''(t)F_3(t) - P'''(t)F_4(t) + \cdots.\end{aligned}$$


This is known as Kronecker’s method. If $P(t) = t^k$, then the series will terminate, because the (k + 1)th and all higher derivatives of P(t) are zero. The formula is particularly useful for Fourier coefficients since the indefinite integrals Fi(t) are easy to evaluate. For example, if f(t) = cos nωt, then we can choose
$$F_1(t) = \frac{1}{n\omega}\sin n\omega t, \qquad F_2(t) = -\frac{1}{n^2\omega^2}\cos n\omega t, \qquad F_3(t) = -\frac{1}{n^3\omega^3}\sin n\omega t, \quad \dots$$
and so on. Similar indefinite integrals can be found for f(t) = sin nωt.
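For the common case $P(t) = t^k$, f(t) = cos ct (with c = nω), the terminating series can be evaluated mechanically. The Python sketch below assumes c ≠ 0; the case n = 0 needs separate treatment, as the examples that follow show. It writes the repeated integrals in the compact form F_m(t) = cos(ct − mπ/2)/c^m, which reproduces the F1, F2, F3 listed above. The function names are hypothetical labels chosen for illustration, and the Simpson's-rule comparison is only a sanity check.

```python
import math

def kronecker_tk_cos(k, c, lo, hi):
    """Integral of t**k * cos(c*t) from lo to hi by Kronecker's method (c != 0).

    P(t) = t**k has j-th derivative k!/(k-j)! * t**(k-j) and all derivatives
    beyond the k-th vanish, so P F1 - P' F2 + P'' F3 - ... stops after k + 1 terms.
    """
    def antiderivative(t):
        total = 0.0
        for j in range(k + 1):
            deriv_j = math.factorial(k) // math.factorial(k - j) * t ** (k - j)
            F = math.cos(c * t - (j + 1) * math.pi / 2) / c ** (j + 1)  # F_{j+1}(t)
            total += (-1) ** j * deriv_j * F
        return total
    return antiderivative(hi) - antiderivative(lo)

def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule with m (even) subintervals, used only as a cross-check."""
    h = (hi - lo) / m
    s = g(lo) + g(hi)
    s += sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, m))
    return s * h / 3

print(kronecker_tk_cos(3, 2.5, -1.0, 2.0),
      simpson(lambda t: t ** 3 * math.cos(2.5 * t), -1.0, 2.0))
```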

Self-test 26.1
If P(t) is 2π-periodic, and P(t) = t² for 0 < t < 2π, use the Kronecker formula to find the Fourier series of P(t).

26.4 Examples of Fourier series


The actual calculation of Fourier coefficients requires attention to detail, especially in respect of a0.

Example 26.2 Find the Fourier series of the function P(t) shown in Fig. 26.4.

[Fig. 26.4: graph of P(t) against t, period 2π, with maximum value π.]

The period is 2π, so that ω = 2π/2π = 1. Choosing the interval −π to π as the basis of the calculation yields
$$P(t) = \begin{cases}-t & \text{if } -\pi \le t \le 0,\\ t & \text{if } 0 \le t \le \pi.\end{cases}$$
The coefficients can be obtained from (26.11):
Coefficients bn. P(t) is an even function about the origin (see Section 15.9), and sin nt is
odd; therefore P(t) sin nt is odd. Hence the integrals defining bn are all zero:
bn = 0 (n = 1, 2, … ). (i)
Coefficients an. Since P(t) is even and cos nt is even, P(t) cos nt is even; so (26.11) gives
$$a_n = \frac{2}{\pi}\int_0^{\pi}P(t)\cos nt\,dt = \frac{2}{\pi}\int_0^{\pi}t\cos nt\,dt = \frac{2}{\pi}\left(\left[\frac{t\sin nt}{n}\right]_0^{\pi} - \int_0^{\pi}\frac{\sin nt}{n}\,dt\right), \qquad \text{(ii)}$$
after integrating by parts.


At this point it is seen that n = 0, as before, is a case requiring separate treatment (basically because ∫ cos nt dt ≠ n⁻¹ sin nt + C when n = 0). Postponing the question of n = 0, suppose firstly that n = 1, 2, … . The formula becomes
$$a_n = \frac{2}{\pi n^2}\left[\cos nt\right]_0^{\pi} = \frac{2}{\pi}\,\frac{(-1)^n - 1}{n^2}.$$

Therefore
$$a_n = \begin{cases}0 & \text{if } n \text{ is even},\\ -4/(\pi n^2) & \text{if } n \text{ is odd}.\end{cases} \qquad \text{(iii)}$$

We still have to find a0, which is given by (26.11) as
$$a_0 = \frac{2}{\pi}\int_0^{\pi}t\,dt = \pi. \qquad \text{(iv)}$$

Collect the coefficients from (i), (iii), and (iv) and put them back into the Fourier series:
$$P(t) = \tfrac12\pi - \frac{4}{\pi}\left(\frac{\cos t}{1^2} + \frac{\cos 3t}{3^2} + \frac{\cos 5t}{5^2} + \cdots\right).$$

In Fig. 26.5, we show how P(t) gradually takes shape as we take more and more terms of the Fourier series in Example 26.2. Here
$$P(t) = \tfrac12\pi - \frac{4}{\pi}\left(\frac{\cos t}{1^2} + \frac{\cos 3t}{3^2} + \frac{\cos 5t}{5^2} + \cdots\right) = 1.571 - 1.273\cos t - 0.141\cos 3t - 0.051\cos 5t - \cdots.$$

[Fig. 26.5 Partial sums on −π ≤ t ≤ π: (a) 1.571; (b) 1.571 − 1.273 cos t; (c) 1.571 − 1.273 cos t − 0.141 cos 3t; (d) 1.571 − 1.273 cos t − 0.141 cos 3t − 0.051 cos 5t.]
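The behaviour in Fig. 26.5 can be checked numerically. This Python sketch is an illustration, not part of the original text. It sums the series of Example 26.2 up to a chosen odd harmonic, then measures the largest deviation from P(t) = |t| on a grid over −π ≤ t ≤ π; the grid size is an arbitrary choice. Every cos nt equals 1 at t = 0, so the largest error occurs there and equals the neglected tail of the series.

```python
import math

def partial_sum(t, n_max):
    """1/2*pi - (4/pi) * sum of cos(n t)/n**2 over odd n <= n_max (Example 26.2)."""
    s = 0.5 * math.pi
    for n in range(1, n_max + 1, 2):
        s -= 4 / math.pi * math.cos(n * t) / n ** 2
    return s

def max_error(n_max, points=1001):
    """Largest |P(t) - partial sum| on an even grid over [-pi, pi], with P(t) = |t|."""
    ts = [-math.pi + 2 * math.pi * k / (points - 1) for k in range(points)]
    return max(abs(abs(t) - partial_sum(t, n_max)) for t in ts)

# Decimal coefficients quoted in the text, then the error for panels (a)-(d) and beyond.
print(round(math.pi / 2, 3), round(4 / math.pi, 3),
      round(4 / (9 * math.pi), 3), round(4 / (25 * math.pi), 3))
for n_max in (0, 1, 3, 5, 51):
    print(n_max, round(max_error(n_max), 4))
```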

Example 26.3 Find the Fourier series for the function shown in Fig. 26.6.
The period is T = 2π, so that ω = 1 and the Fourier series is
$$P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left(a_n\cos nt + b_n\sin nt\right).$$
It makes no difference to the ease of calculation whether −π to π or 0 to 2π is chosen as the basic interval. We will take 0 to 2π to remind you of the possibility. Then
$$P(t) = \begin{cases}t & (0 < t < \pi),\\ 0 & (\pi < t < 2\pi).\end{cases}$$
[Fig. 26.6: graph of P(t) against t for −3π ≤ t ≤ 4π, with maximum value π.]

Coefficient an. From (26.11),
$$a_n = \frac{1}{\pi}\int_0^{2\pi}P(t)\cos nt\,dt = \frac{1}{\pi}\int_0^{\pi}t\cos nt\,dt.$$

Warned by Example 26.2, we deal first with the case n = 1, 2, 3, …:
$$a_n = \frac{1}{\pi}\left[\frac{1}{n}t\sin nt + \frac{1}{n^2}\cos nt\right]_0^{\pi} = \frac{1}{\pi}\left[\left(0 + \frac{1}{n^2}\cos n\pi\right) - \left(0 + \frac{1}{n^2}\right)\right] = \frac{1}{\pi n^2}\left[(-1)^n - 1\right].$$
The sequence has every even-order term zero:
$$a_1 = -\frac{2}{\pi}, \quad a_2 = 0, \quad a_3 = -\frac{2}{3^2\pi}, \quad a_4 = 0, \quad a_5 = -\frac{2}{5^2\pi}, \quad \dots.$$
The case n = 0 is again special:
$$a_0 = \frac{1}{\pi}\int_0^{2\pi}P(t)\,dt = \frac{1}{\pi}\int_0^{\pi}t\,dt = \frac{1}{\pi}\left[\tfrac12 t^2\right]_0^{\pi} = \tfrac12\pi.$$

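Coefficient bn.
$$\begin{aligned}b_n &= \frac{1}{\pi}\int_0^{2\pi}P(t)\sin nt\,dt = \frac{1}{\pi}\int_0^{\pi}t\sin nt\,dt = \frac{1}{\pi}\left[-\frac{t\cos nt}{n} + \frac{1}{n^2}\sin nt\right]_0^{\pi}\\ &= \frac{1}{\pi}\left[-\frac{1}{n}(\pi\cos n\pi - 0) + \frac{1}{n^2}(0 - 0)\right] = -\frac{(-1)^n}{n}.\end{aligned}$$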
The series is difficult to write out if the cosine and sine terms are kept together. By separating them, we obtain
$$P(t) = \tfrac14\pi - \frac{2}{\pi}\left(\cos t + \frac{1}{3^2}\cos 3t + \frac{1}{5^2}\cos 5t + \cdots\right) + \left(\sin t - \frac12\sin 2t + \frac13\sin 3t - \cdots\right).$$

In Example 26.3, the function P(t) jumps from π to zero at the points t = …, −π, π, 3π, … . To see what values are generated by the series at such points put, say, t = π into the series we obtained. All the cosine terms become (−1) and all the sine terms are zero, so that at t = π the series delivers
$$\tfrac14\pi + \frac{2}{\pi}\left(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\right).$$
A few minutes with a calculator make it clear that this series for P(π) cannot add up to π, and plainly it does not give zero either. In fact its sum is ½π, half-way between these values. The general rule is as follows.

Fourier series at a jump in value of a function

The sum of a Fourier series at a jump is equal to the average of the two function values on either side. This is written as
$$\tfrac12\left[x(t_0^-) + x(t_0^+)\right]. \qquad (26.12)$$

Figure 26.7 shows how the function is fitted by the series when the six terms up
to cos 3t and sin 3t are taken.

[Fig. 26.7: P(t) of Example 26.3 for −3π ≤ t ≤ 4π, with the six-term partial sum; at each jump the series gives the value ½π.]

In (26.12), $x(t_0^-)$ and $x(t_0^+)$ are the left- and right-hand limits at the jump. As can be seen in Fig. 26.7, $x(\pi^-) = \pi$ and $x(\pi^+) = 0$ at the discontinuity at t = π.
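Here is a quick numerical check of (26.12) for Example 26.3. It is a Python sketch, not something taken from the book. Partial sums of the series, evaluated at the jump t = π, approach ½π ≈ 1.5708 rather than either one-sided value. The cut-off values of n are arbitrary.

```python
import math

def series_26_3(t, n_max):
    """Partial sum, through harmonic n_max, of the Fourier series of Example 26.3."""
    s = math.pi / 4                                # the constant term a_0 / 2
    for n in range(1, n_max + 1):
        a_n = ((-1) ** n - 1) / (math.pi * n ** 2)
        b_n = -((-1) ** n) / n
        s += a_n * math.cos(n * t) + b_n * math.sin(n * t)
    return s

for n_max in (3, 31, 301, 3001):
    print(n_max, series_26_3(math.pi, n_max))      # tends to pi/2 at the jump
print(series_26_3(1.0, 3001))                      # inside 0 < t < pi: close to P(1) = 1
```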

Self-test 26.2
Using Example 26.2, what is the sum of the series $1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots$?

26.5 Use of symmetry: sine and cosine series


In general, the Fourier series for a periodic function will contain both sine and
cosine terms. However, the following results hold.

Even and odd functions P(t)


(a) If P(t) is even about the origin, then
b1 = b2 = ··· = 0.
(b) If P(t) is odd about the origin, then
a0 = a1 = a2 = ··· = 0. (26.13)
These results follow from (26.9) and (26.11), because P(t) sin nωt is odd if P(t) is even, and P(t) cos nωt is odd if P(t) is odd, and the integral of an odd function over the symmetric interval −π/ω to π/ω is zero.
Example 26.4 Obtain the Fourier series for the switching function P(t) shown in Fig. 26.8.
The period T is 2, so that ω = π. Choose the basic interval to be t = −1 to 1. On this interval,
$$P(t) = \begin{cases}-1 & \text{for } -1 < t < 0,\\ 1 & \text{for } 0 < t < 1.\end{cases}$$
Since P(t) is odd about the origin,
a0 = a1 = a2 = ··· = 0.
For the bn, from (26.9), since the integrands are even functions,
$$b_n = \int_{-1}^{1}P(t)\sin n\pi t\,dt = 2\int_0^1\sin n\pi t\,dt = -\frac{2}{n\pi}\left[\cos n\pi t\right]_0^1 = -\frac{2}{n\pi}\left[(-1)^n - 1\right]$$

for n = 1, 2, … . The sequence bn is therefore
$$b_1 = \frac{4}{\pi}, \quad b_2 = 0, \quad b_3 = \frac{4}{3\pi}, \quad b_4 = 0, \quad \dots,$$
and the Fourier series is
$$P(t) = \frac{4}{\pi}\left(\sin\pi t + \frac13\sin 3\pi t + \frac15\sin 5\pi t + \cdots\right) = \frac{4}{\pi}\sum_{r=1}^{\infty}\frac{\sin(2r-1)\pi t}{2r-1}.$$

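[Fig. 26.8: the switching function P(t) of Example 26.4, taking the values ±1, shown for −1 ≤ t ≤ 2.]
[Fig. 26.9: the switching function P(t) of Example 26.5, shown for −2 ≤ t ≤ 2.]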

Example 26.5 Obtain the Fourier series for the switching function P(t) shown in Fig. 26.9.
The period is 2, so that ω = π. Choose [−1, 1] as the representative interval; then
$$P(t) = \begin{cases}1 & \text{if } -\tfrac12 < t < \tfrac12,\\ 0 & \text{elsewhere on the interval}.\end{cases}$$
Since P(t) is an even function, b1 = b2 = b3 = ··· = 0. The coefficients an are given by
$$a_n = \int_{-1}^{1}P(t)\cos n\pi t\,dt = \int_{-1/2}^{1/2}\cos n\pi t\,dt = \frac{1}{n\pi}\left[\sin n\pi t\right]_{-1/2}^{1/2} = \frac{2}{n\pi}\sin\tfrac12 n\pi.$$

As we have seen before, a0 gives trouble since this formula is meaningless when n = 0. We have, in fact,
$$a_0 = \int_{-1/2}^{1/2}1\,dt = 1.$$
Then
a0 = 1, a1 = 2/π, a2 = 0, a3 = −2/(3π), a4 = 0, …,
so that the odd-order coefficients alternate in sign.

Finally the series is
$$P(t) = \tfrac12 + \frac{2}{\pi}\left(\cos\pi t - \frac13\cos 3\pi t + \frac15\cos 5\pi t - \cdots\right) = \tfrac12 - \frac{2}{\pi}\sum_{r=1}^{\infty}\frac{(-1)^r\cos(2r-1)\pi t}{2r-1}.$$
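The coefficients of Example 26.5 can be cross-checked by numerical integration. This Python sketch is only an illustration. It applies the midpoint rule on [−1, 1], with a cell width chosen so that the jumps at t = ±½ fall on cell boundaries. Since ω/π = 1 here, no extra factor is needed. The closed form treats n = 0 separately, exactly as the text does.

```python
import math

def pulse(t):
    """One period (-1 <= t < 1) of the switching function in Example 26.5."""
    return 1.0 if abs(t) < 0.5 else 0.0

def a_n_numeric(n, cells=20000):
    """a_n = integral of P(t) cos(n pi t) over [-1, 1] (omega/pi = 1), by the midpoint rule."""
    h = 2.0 / cells
    mids = (-1 + (k + 0.5) * h for k in range(cells))
    return h * sum(pulse(t) * math.cos(n * math.pi * t) for t in mids)

def a_n_closed(n):
    """Closed form from Example 26.5; n = 0 has to be treated separately."""
    return 1.0 if n == 0 else 2 / (n * math.pi) * math.sin(n * math.pi / 2)

for n in range(6):
    print(n, round(a_n_numeric(n), 6), round(a_n_closed(n), 6))
```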

26.6 Functions defined on a finite range: half-range series
It is often necessary to obtain a Fourier-type series for a function which is of
interest only over some finite interval, and whose natural extension, if any, is not
necessarily periodic. A Fourier series is invariably periodic, so that it cannot fit a
non-periodic function everywhere. For example, consider the problem of finding
a Fourier series which will fit f(t), where (Fig. 26.10a)
f(t) = t between t = 0 and π,
when our only concern is whether the series fits f(t) between 0 and π, the
behaviour of the series elsewhere being a matter of indifference.
Figure 26.10 illustrates a technique for producing such series. We hold on to the given function inside the interval of interest, but extend it by means of an artificial function which is periodic. This extended function will have a Fourier series of its own, and it will agree with f(t) on the interval 0 to π.

[Fig. 26.10 (a) f(t) = t on 0 ≤ t ≤ π, with natural non-periodic extension. (b) fs(t) = t on 0 ≤ t ≤ π, has period 2π and is an odd function. (c) fc(t) = t on 0 ≤ t ≤ π, has period 2π and is an even function. (d) fa(t) = t on 0 ≤ t ≤ π, and has an arbitrary extension of period 3π.]

In Fig. 26.10b we have extended the non-periodic function f(t) = t on 0 ≤ t ≤ π to an artificial function fs(t) which has period 2π and is an odd function. Being odd, it has a Fourier series consisting of sine terms only, and this odd extension will correctly reproduce f(t) on 0 < t < π.
Alternatively, Fig. 26.10c shows how to get a series of cosine terms by an even extension fc(t), keeping fc(t) = f(t) on 0 ≤ t ≤ π.
Again, Fig. 26.10d shows a fairly arbitrary extension of period 3π, which will
have a Fourier series containing both sine and cosine terms. Obviously there is an
infinite number of possibilities, the most important being the so-called half-range
sine and cosine series, corresponding to odd and even extensions respectively.

Example 26.6 Obtain a Fourier sine series for f(t) = t on the interval 0 < t < π.
Extend f(t) on 0 < t < π as an odd function fs(t) with period 2π (not π) as shown in Fig. 26.10b. Then ω = 1 in (26.9). Choose the interval −π to π as basic. Then since fs(t) is odd, we know in advance from (26.13) that
$$f_s(t) = \sum_{n=1}^{\infty}b_n\sin nt$$
(sine terms only), where
$$\begin{aligned}b_n &= \frac{1}{\pi}\int_{-\pi}^{\pi}f_s(t)\sin nt\,dt\\ &= \frac{2}{\pi}\int_0^{\pi}f_s(t)\sin nt\,dt \qquad (\text{since } f_s(t) \text{ is odd; see (15.17)})\\ &= \frac{2}{\pi}\int_0^{\pi}f(t)\sin nt\,dt \qquad (\text{since } f_s(t) \text{ agrees with } f(t) \text{ on } 0 < t < \pi)\\ &= \frac{2}{\pi}\int_0^{\pi}t\sin nt\,dt = \frac{2}{\pi}\left[-\frac{1}{n}t\cos nt + \frac{1}{n^2}\sin nt\right]_0^{\pi} = \frac{2}{\pi}\left(-\frac{\pi}{n}\cos n\pi\right) = \frac{2}{n}(-1)^{n+1}.\end{aligned}$$

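Therefore the required series is
$$2\left(\sin t - \frac12\sin 2t + \frac13\sin 3t - \cdots\right) = 2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}\sin nt}{n},$$
and this is equal to t on −π < t < π (since the odd extension of t is t itself there), and so in particular on the interval of interest 0 < t < π, but nowhere else. (The value delivered by the series at t = π is zero, which is to be expected by (26.12), since fs(t) jumps from π to −π at t = π.)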
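The claim that the sine series reproduces t on −π < t < π, and not beyond, can be tested numerically. This Python sketch is an illustration, not taken from the book, and it sums an arbitrarily chosen large number of terms. Convergence is slow because the coefficients fall off only like 1/n, so the tolerance has to be loose. Outside the interval, the sum follows the odd periodic extension fs(t); for example, fs(4) = 4 − 2π.

```python
import math

def sine_series_t(t, n_max):
    """Partial sum of 2 * sum_{n=1}^{n_max} (-1)**(n+1) * sin(n t) / n (Example 26.6)."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * t) / n for n in range(1, n_max + 1))

for t in (-1.0, 0.5, 2.0, math.pi, 4.0):
    print(t, round(sine_series_t(t, 10000), 4))
```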

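(a) Half-range cosine series for 0 < t < π
$$f(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}a_n\cos nt, \qquad a_n = \frac{2}{\pi}\int_0^{\pi}f(t)\cos nt\,dt.$$
(b) Half-range sine series for 0 < t < π
$$f(t) = \sum_{n=1}^{\infty}b_n\sin nt, \qquad b_n = \frac{2}{\pi}\int_0^{\pi}f(t)\sin nt\,dt.$$
(26.14)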
[Fig. 26.11 f(t), 0 < t < t0; odd extension, fs(t), period 2t0.]

Suppose that, more generally, a sine series representing f(t) for 0 < t < t0 is required (Fig. 26.11). Extend f(t) to an odd function fs(t) having period 2t0. Then, in eqn (26.9),
$$\omega = 2\pi/2t_0 = \pi/t_0,$$
and, since fs(t) is odd,
$$f_s(t) = \sum_{n=1}^{\infty}b_n\sin(n\pi t/t_0),$$
where
$$b_n = \frac{1}{t_0}\int_{-t_0}^{t_0}f_s(t)\sin\frac{n\pi t}{t_0}\,dt = \frac{2}{t_0}\int_0^{t_0}f(t)\sin\frac{n\pi t}{t_0}\,dt,$$
since fs(t) = f(t) on the interval 0 to t0.


If the extension is carried out so as to produce an even periodic function, a
similar calculation leads to a cosine expansion.

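(a) Half-range cosine series for 0 < t < t0
$$f(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}a_n\cos\frac{n\pi t}{t_0}, \qquad a_n = \frac{2}{t_0}\int_0^{t_0}f(t)\cos\frac{n\pi t}{t_0}\,dt.$$
(b) Half-range sine series for 0 < t < t0
$$f(t) = \sum_{n=1}^{\infty}b_n\sin\frac{n\pi t}{t_0}, \qquad b_n = \frac{2}{t_0}\int_0^{t_0}f(t)\sin\frac{n\pi t}{t_0}\,dt.$$
(26.15)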
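Formulas (26.15) translate directly into a small numerical routine. The Python sketch below is an illustration under stated assumptions. The function f is supplied as a Python function on 0 < t < t0, the midpoint rule is used so that f is never evaluated at the endpoints, and the cell count is arbitrary. As a check, for f(t) = t with t0 = π, the sine coefficients should reproduce 2(−1)^{n+1}/n from Example 26.6. The cosine coefficients should reproduce Example 26.2, since the even extension of t is |t|.

```python
import math

def half_range_coefficients(f, t0, n_max, cells=20000):
    """Midpoint-rule approximations to the (26.15) coefficients of f on 0 < t < t0.

    Returns (a, b): a[n] = (2/t0) * integral of f(t) cos(n pi t / t0), n = 0..n_max,
    and b[n] = (2/t0) * integral of f(t) sin(n pi t / t0), n = 1..n_max (b[0] = 0).
    """
    h = t0 / cells
    mids = [(k + 0.5) * h for k in range(cells)]
    fv = [f(t) for t in mids]
    a = [2 / t0 * h * sum(v * math.cos(n * math.pi * t / t0) for v, t in zip(fv, mids))
         for n in range(n_max + 1)]
    b = [0.0] + [2 / t0 * h * sum(v * math.sin(n * math.pi * t / t0) for v, t in zip(fv, mids))
                 for n in range(1, n_max + 1)]
    return a, b

a, b = half_range_coefficients(lambda t: t, math.pi, 4)
print([round(x, 5) for x in a], [round(x, 5) for x in b])
```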

Self-test 26.3
Obtain the sine series expansion $\sum_{n=1}^{\infty}b_n\sin nt$ which represents cos t over the restricted interval 0 < t < π.

26.7 Spectrum of a periodic function

Suppose that P(t) is a periodic function with period T. The Fourier series has the form
$$P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left(a_n\cos n\omega t + b_n\sin n\omega t\right)$$
where ω = 2π/T. By the identity (1.18),


$$a_n\cos n\omega t + b_n\sin n\omega t = c_n\cos(n\omega t + \phi_n),$$
where φn is a phase angle, and
$$c_n = \sqrt{a_n^2 + b_n^2} \qquad (n = 1, 2, \dots),$$
which is the (positive) amplitude or strength of the nth term. For completeness, we include
$$c_0 = \tfrac12|a_0|.$$
The sequence of amplitudes c0, c1, c2, … is called the spectrum of P(t). If the series consists only of cosine (or sine) terms, then correspondingly cn = √(an²) = |an| (or |bn|); the spectral components are always positive or zero.
The spectrum can be displayed as if it were a physical spectrum. Figure 26.12
shows the spectrum of the function worked out in Example 26.5. The property
which makes the spectrum a useful concept is that the spectrum is independent of
the time origin of t, although the Fourier series itself is not. If

$$P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}c_n\cos(n\omega t + \phi_n),$$
then the series for P(t − t0), whose graph is the same shape as P(t) but moved to the right a distance t0, is
$$P(t - t_0) = \tfrac12 a_0 + \sum_{n=1}^{\infty}c_n\cos[n\omega(t - t_0) + \phi_n].$$

[Fig. 26.12 (See Example 26.5.) (a) P(t) = ½ + (2/π)(cos πt − ⅓ cos 3πt + ··· ), shown for −5/2 ≤ t ≤ 5/2. (b) Spectral components ½, 2/π, 2/(3π), 2/(5π), …, plotted against n.]
The cn remain the same, and only the phase angle changes: the nth term of the shifted series is cn cos(nωt + φn − nωt0), so φn is replaced by φn − nωt0. Therefore it is only the shape of P(t) which determines its spectrum, not its clock-timing. For this reason the spectral or harmonic composition of a piano note is always the same, independently of what time of day the note is played.
It is important to realize that the spectrum refers only to a complete periodic function, and not to an isolated segment such as those discussed in Section 26.6. The different periodic extensions shown in Fig. 26.10 have different Fourier series, and therefore different spectra.
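The invariance of the spectrum under a time shift can be illustrated numerically. The following Python sketch shows one way to do it; it is not the book's procedure. It computes cn = √(an² + bn²) and a phase φn for n ≥ 1 for the function of Example 26.5, and again for a copy delayed by an arbitrarily chosen t0 = 0.3, using the midpoint rule over one period. (The term c0 = ½|a0| is unaffected by a shift in any case.) The phase convention φn = atan2(−bn, an) follows from writing an cos x + bn sin x = cn cos(x + φn).

```python
import math

OMEGA = math.pi      # Example 26.5 has period T = 2, so omega = pi and omega/pi = 1

def pulse(t):
    """Example 26.5 extended periodically: 1 for |t| < 1/2, 0 elsewhere in [-1, 1)."""
    u = (t + 1) % 2 - 1              # reduce t to the basic interval [-1, 1)
    return 1.0 if abs(u) < 0.5 else 0.0

def spectrum(P, n_max, cells=20000):
    """Return (c, phi) for n = 1..n_max, with a_n, b_n from (26.9) by the midpoint rule."""
    h = 2.0 / cells
    ts = [-1 + (k + 0.5) * h for k in range(cells)]
    vals = [P(t) for t in ts]
    c, phi = [], []
    for n in range(1, n_max + 1):
        a = h * sum(v * math.cos(n * OMEGA * t) for v, t in zip(vals, ts))
        b = h * sum(v * math.sin(n * OMEGA * t) for v, t in zip(vals, ts))
        c.append(math.hypot(a, b))               # c_n = sqrt(a_n^2 + b_n^2)
        phi.append(math.atan2(-b, a))            # a cos x + b sin x = c cos(x + phi)
    return c, phi

t0 = 0.3                                         # arbitrary delay
c, phi = spectrum(pulse, 5)
c_shift, phi_shift = spectrum(lambda t: pulse(t - t0), 5)
for n in range(5):
    print(n + 1, round(c[n], 5), round(c_shift[n], 5), round(phi[n], 3), round(phi_shift[n], 3))
```

The amplitudes agree to quadrature accuracy, while the first phase moves by ωt0 = 0.3π, as the argument above predicts.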

26.8 Obtaining one Fourier series from another


There exist ‘dictionaries’ of Fourier series, but the entries cannot match exactly all
the functions required in practice. If the broad shape of the dictionary entry is the
same as that of the function whose series is needed, then scaling it or translating it
along the t axis, or along the axis of P(t), might be all that is required. The transition can require more than one stage.
The examples which follow are based on the standard form shown in Fig. 26.13, which was expressed as a Fourier series in Example 26.2:
$$P(t) = \tfrac12\pi - \frac{4}{\pi}\left(\frac{\cos t}{1^2} + \frac{\cos 3t}{3^2} + \frac{\cos 5t}{5^2} + \cdots\right). \qquad (26.16)$$

[Fig. 26.13: the standard form P(t), with maximum value π, shown for −2π ≤ t ≤ 3π.]
[Fig. 26.14: Q(t), the same shape with maximum value 1, shown for −2π ≤ t ≤ 3π.]

Example 26.7 Find the Fourier expansion of the function Q(t) shown in Fig. 26.14.
This is the same as Fig. 26.13 except that the vertical dimension is reduced by a factor 1/π. Therefore, from (26.16),
$$Q(t) = \tfrac12 - \frac{4}{\pi^2}\left(\cos t + \frac{1}{3^2}\cos 3t + \cdots\right).$$

Example 26.8 Find the Fourier expansion of the function Q(t) shown in Fig. 26.15.
Here the t scale is changed by a factor π. We obtain, from (26.16),
$$Q(t) = \tfrac12\pi - \frac{4}{\pi}\left(\cos\pi t + \frac{1}{3^2}\cos 3\pi t + \frac{1}{5^2}\cos 5\pi t + \cdots\right).$$
It is necessary to be careful here: it is not t/π but πt in the new series. Check the period: it is equal to 2, which is correct.

[Fig. 26.15: Q(t) of Example 26.8, maximum value π, shown for −3 ≤ t ≤ 6.]
[Fig. 26.16: Q(t) of Example 26.9, maximum value π, shown for −5π/2 ≤ t ≤ 5π/2.]

Example 26.9 Find the Fourier expansion of Q(t) in Fig. 26.16.


The graph of P(t) in Fig. 26.13 has been shifted a distance ½π to the left (see Fig. 26.16). Therefore
$$\begin{aligned}Q(t) &= P(t + \tfrac12\pi)\\ &= \tfrac12\pi - \frac{4}{\pi}\left(\cos(t + \tfrac12\pi) + \frac{1}{3^2}\cos 3(t + \tfrac12\pi) + \frac{1}{5^2}\cos 5(t + \tfrac12\pi) + \cdots\right).\end{aligned}$$
As n goes through the sequence 1, 3, 5, 7, …, cos ½nπ = 0 and sin ½nπ becomes the alternating sequence 1, −1, 1, −1, …, so that cos n(t + ½π) = −sin ½nπ sin nt. Therefore
$$Q(t) = \tfrac12\pi - \frac{4}{\pi}\left(-\sin t + \frac{1}{3^2}\sin 3t - \frac{1}{5^2}\sin 5t + \cdots\right).$$
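The shifted series of Example 26.9 can be compared directly with the shifted function. In this Python sketch, which is illustrative only, the standard form P(t) of Fig. 26.13 is implemented as |t| reduced to the basic interval [−π, π). The partial sum keeps odd harmonics up to an arbitrary cut-off.

```python
import math

def P(t):
    """Standard form of Fig. 26.13: |t| on [-pi, pi), extended with period 2*pi."""
    u = (t + math.pi) % (2 * math.pi) - math.pi
    return abs(u)

def Q_series(t, n_max):
    """Partial sum of the Example 26.9 result:
    pi/2 - (4/pi) * (-sin t + sin 3t / 3**2 - sin 5t / 5**2 + ...)."""
    s = 0.5 * math.pi
    for n in range(1, n_max + 1, 2):
        # for odd n, cos n(t + pi/2) = -sin(n*pi/2) * sin(n t)
        s -= 4 / math.pi * (-math.sin(n * math.pi / 2)) * math.sin(n * t) / n ** 2
    return s

for t in (-2.0, 0.0, 1.0, 3.0):
    print(t, round(Q_series(t, 2001), 4), round(P(t + math.pi / 2), 4))
```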

Self-test 26.4
It was shown in Example 26.6 that
$$t = 2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}\sin nt}{n} \qquad (0 < t < \pi).$$
By integrating both sides of the equation over an interval (0, τ), obtain the Fourier cosine series for τ² (0 ≤ τ ≤ π).

26.9 The two-sided Fourier series


Equations (26.9) define the Fourier series in terms of circular frequency ω, where ω = 2π/T, and T is the period. For the rest of the chapter we shall instead use the fundamental frequency f0 (complete cycles per unit time), since it will simplify the subsequent development of Fourier transforms in Chapter 27. We then have
$$T = \frac{1}{f_0} \qquad \text{and} \qquad \omega = 2\pi f_0.$$
In terms of f0, (26.9) becomes

Fourier series in terms of frequency f0

xP(t) a real or complex function with period T = 1/f0.
(a) Fourier series
$$x_P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left(a_n\cos 2\pi nf_0t + b_n\sin 2\pi nf_0t\right).$$
(b) Coefficients
$$a_n = 2f_0\int_{\text{Period}}x_P(t)\cos 2\pi nf_0t\,dt, \qquad b_n = 2f_0\int_{\text{Period}}x_P(t)\sin 2\pi nf_0t\,dt.$$
(26.17)

We shall now show that (26.17) may be reorganized into another shape, as follows:

The two-sided Fourier series

xP(t) a real or complex function with period T = 1/f0.
(a) Two-sided series
$$x_P(t) = \sum_{n=-\infty}^{\infty}X_n e^{i2\pi nf_0t}.$$
(b) Coefficients Xn
$$X_n = f_0\int_{\text{Period}}x_P(t)e^{-i2\pi nf_0t}\,dt.$$
(26.18)

The coefficients are in general complex even if xP(t) is real, and the series runs from n = −∞ to n = ∞.
To prove (26.18) we shall work backwards from it to arrive at (26.17). Start with (26.18a):
$$\begin{aligned}x_P(t) &= X_0 + \sum_{n=1}^{\infty}X_ne^{i2\pi nf_0t} + \sum_{n=-\infty}^{-1}X_ne^{i2\pi nf_0t}\\ &= X_0 + \sum_{n=1}^{\infty}X_ne^{i2\pi nf_0t} + \sum_{n=1}^{\infty}X_{-n}e^{-i2\pi nf_0t}, \qquad (26.19)\end{aligned}$$
changing the counting index n to (−n) in the final term.


From (26.18b), for n positive and negative,
$$X_n = f_0\int_{\text{Period}}x_P(t)\left[\cos 2\pi nf_0t - i\sin 2\pi nf_0t\right]dt = \tfrac12(a_n - ib_n), \qquad (26.20)$$
where
$$a_n = 2f_0\int_{\text{Period}}x_P(t)\cos 2\pi nf_0t\,dt, \qquad b_n = 2f_0\int_{\text{Period}}x_P(t)\sin 2\pi nf_0t\,dt. \qquad (26.21)$$

Therefore
a−n = an and b−n = −bn. (26.22)

It follows from (26.20) and (26.22) that when n > 0, as in the sums (26.19),
$$X_n = \tfrac12(a_n - ib_n), \qquad X_{-n} = \tfrac12(a_{-n} - ib_{-n}) = \tfrac12(a_n + ib_n), \qquad (26.23)$$
where an and bn are the same numbers as the coefficients in the original series (26.17). Also, putting n = 0 in (26.20) gives X0 = ½a0, since b0 = 0.
Finally, (26.19) becomes
$$x_P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left[\tfrac12(a_n - ib_n)e^{i2\pi nf_0t} + \tfrac12(a_n + ib_n)e^{-i2\pi nf_0t}\right].$$

After using Euler’s formula (6.8) for the exponentials, and carrying out the multiplications, the terms in which i appears cancel, and we are left with
$$x_P(t) = \tfrac12 a_0 + \sum_{n=1}^{\infty}\left(a_n\cos 2\pi nf_0t + b_n\sin 2\pi nf_0t\right),$$
which is the original form (26.17a). (Since xP(t) may be complex, so may an and bn, so we should not shorten the final calculation by taking twice the real part of $\tfrac12(a_n - ib_n)e^{i2\pi nf_0t}$.)
The following properties sometimes save calculation:

Properties of Xn in the two-sided Fourier series


(a) Xn = ½(an − ibn), and X−n = ½(an + ibn), where an and bn are derived as in (26.17).
(b) If xP(t) is real, then $X_{-n} = \overline{X_n}$, the complex conjugate of Xn.
(c) If xP(t) is real and even, Xn is real and X−n = Xn.
(d) If xP(t) is real and odd, Xn is pure imaginary and X−n = −Xn. (26.24)
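Properties (26.24) are easy to confirm numerically for a particular signal. This Python sketch is an illustration; the test signal and f0 = 0.5 are arbitrary choices. It evaluates Xn from (26.18b), and an, bn from (26.17b), using the periodic trapezoidal rule on [0, T). That rule is exact for low-degree trigonometric polynomials such as this one.

```python
import cmath
import math

F0 = 0.5                       # fundamental frequency, so T = 1/F0 = 2
SAMPLES = 256

def x(t):
    """Real test signal: a_0 = 2, a_1 = 3, b_2 = -4, all other coefficients zero."""
    return 1 + 3 * math.cos(2 * math.pi * F0 * t) - 4 * math.sin(4 * math.pi * F0 * t)

def X(n):
    """X_n = f0 * integral over one period of x(t) exp(-i 2 pi n f0 t) dt, as in (26.18b)."""
    h = 1 / (F0 * SAMPLES)
    return F0 * h * sum(x(k * h) * cmath.exp(-2j * math.pi * n * F0 * k * h)
                        for k in range(SAMPLES))

def ab(n):
    """a_n and b_n from (26.17b), by the same quadrature."""
    h = 1 / (F0 * SAMPLES)
    ts = [k * h for k in range(SAMPLES)]
    a = 2 * F0 * h * sum(x(t) * math.cos(2 * math.pi * n * F0 * t) for t in ts)
    b = 2 * F0 * h * sum(x(t) * math.sin(2 * math.pi * n * F0 * t) for t in ts)
    return a, b

for n in range(4):
    a, b = ab(n)
    print(n, X(n), 0.5 * (a - 1j * b), X(-n))
```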

Example 26.10 Obtain the two-sided Fourier series for the function xP(t),
having period T, of which a single period is shown in Fig. 26.17.
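In (26.18) f0 = 1/T. Therefore
$$\begin{aligned}X_n &= \frac{1}{T}\int_{-T/2}^{T/2}x_P(t)e^{-i2\pi nt/T}\,dt = \frac{1}{T}\int_{-\tau/2}^{\tau/2}e^{-i2\pi nt/T}\,dt\\ &= \frac{1}{T}\,\frac{T}{(-i2\pi n)}\left[e^{-i2\pi nt/T}\right]_{-\tau/2}^{\tau/2} = \frac{1}{i2\pi n}\left(e^{i\pi n\tau/T} - e^{-i\pi n\tau/T}\right)\\ &= \frac{1}{\pi n}\sin(\pi n\tau/T) \qquad \text{(from (6.10))}.\end{aligned}$$
[Fig. 26.17: one period of xP(t), equal to 1 for −½τ < t < ½τ and 0 elsewhere on −½T < t < ½T.]
Finally
$$x_P(t) = \sum_{n=-\infty}^{\infty}\frac{1}{\pi n}\sin\frac{\pi n\tau}{T}\,e^{i2\pi nt/T},$$
where the n = 0 term is to be read as its limiting value X0 = τ/T, which is also what (26.18b) gives directly.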
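The result of Example 26.10 can be cross-checked by evaluating (26.18b) numerically. In this Python sketch, which is illustrative only, T = 2 and τ = 0.5 are arbitrary choices. The midpoint rule is applied over −½T ≤ t ≤ ½T, with cells whose boundaries include the pulse edges ±½τ. The n = 0 coefficient is compared with the limiting value τ/T.

```python
import cmath
import math

def X_numeric(n, T, tau, cells=20000):
    """(26.18b) with f0 = 1/T: (1/T) * integral over [-T/2, T/2] of
    x_P(t) exp(-i 2 pi n t / T) dt, evaluated by the midpoint rule."""
    h = T / cells
    total = 0j
    for k in range(cells):
        t = -T / 2 + (k + 0.5) * h
        if abs(t) < tau / 2:                      # x_P(t) = 1 inside the pulse, 0 outside
            total += cmath.exp(-2j * math.pi * n * t / T)
    return total * h / T

def X_closed(n, T, tau):
    """Example 26.10: sin(pi n tau/T) / (pi n), with the limiting value tau/T at n = 0."""
    return tau / T if n == 0 else math.sin(math.pi * n * tau / T) / (math.pi * n)

for n in range(-3, 4):
    print(n, X_numeric(n, 2.0, 0.5), X_closed(n, 2.0, 0.5))
```

Because this xP(t) is real and even, the numerical Xn come out real and symmetric in n, in line with property (c) of (26.24).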

Problems

26.1 Draw a sketch of the following odd 2π-periodic functions defined for −π < t < π, and find a general formula for their Fourier coefficients:
(a)
$$f(t) = \begin{cases}-1 & (-\pi < t < 0),\\ 1 & (0 < t < \pi);\end{cases}$$
(b) f(t) = t (−π < t < π);
(c)
$$f(t) = \begin{cases}-t^2 & (-\pi < t < 0),\\ t^2 & (0 < t < \pi);\end{cases}$$
(d)
$$f(t) = \begin{cases}e^{t} - 1 & (-\pi < t < 0),\\ -(e^{-t} - 1) & (0 < t < \pi);\end{cases}$$
(e)
$$f(t) = \begin{cases}1 & (-\pi < t < -\tfrac12\pi),\\ -1 & (-\tfrac12\pi < t < 0),\\ 1 & (0 < t < \tfrac12\pi),\\ -1 & (\tfrac12\pi < t < \pi).\end{cases}$$

26.2 Draw a sketch of the following even 2π-periodic functions defined for −π < t < π, and find a general formula for their Fourier coefficients:
(a)
$$f(t) = \begin{cases}-1 & (-\pi < t < -\tfrac12\pi),\\ 1 & (-\tfrac12\pi < t < \tfrac12\pi),\\ -1 & (\tfrac12\pi < t < \pi);\end{cases}$$
(b) f(t) = t²; (c) f(t) = cos ½t.

26.3 Draw a sketch of the following 2π-periodic functions defined for −π < t < π, and find a general formula for their Fourier coefficients:
(a)
$$f(t) = \begin{cases}0 & (-\pi < t < 0),\\ t & (0 < t < \pi);\end{cases}$$
(b)
$$f(t) = \begin{cases}t + \pi & (-\pi < t < 0),\\ t & (0 < t < \pi).\end{cases}$$

26.4 A wave is described by the 2π-periodic function
$$f(t) = \begin{cases}0 & (-\pi < t < 0),\\ \sin t & (0 < t < \pi).\end{cases}$$
Find the Fourier series of f(t).

26.5 Show that the Fourier series of the 2π-periodic function
$$f(t) = \begin{cases}0 & (-\pi < t < 0),\\ 1 & (0 < t < \pi),\end{cases}$$
is
$$\tfrac12 + \frac{2}{\pi}\left(\sin t + \frac13\sin 3t + \frac15\sin 5t + \cdots\right).$$
What value does the Fourier series take at t = 0? By choosing a particular value of t, find the sum of the series
$$1 - \frac13 + \frac15 - \frac17 + \cdots.$$

26.6 A signal F sin t with amplitude F > 0 is fully rectified into F|sin t|. Find the Fourier series of the rectified signal. What is the amplitude of its first harmonic?
26.7 A Fourier series is given by
$$\sum_{n=1}^{\infty}\frac{n + a}{n^3 + an + 3}\sin nt,$$
where a is a design parameter in the system. Find a in order that the leading harmonics n = 1 and n = 2 have amplitudes in the ratio 2 : 1. What is the amplitude of the next harmonic?

26.8 The two 2π-periodic signals shown in Fig. 26.18 are added. Find the Fourier series of the combined signal. What value should F take in order that the leading harmonic should disappear?

[Fig. 26.18: the two signals, plotted for −π ≤ t ≤ π: (a) taking values between −F and F; (b) taking values between −1 and 1.]

26.9 A T-periodic function is defined by
$$Q(t) = \tfrac14T^2 - t^2 \qquad \text{for } -\tfrac12T < t < \tfrac12T.$$
Find the Fourier series of Q(t). What is the error between the sum of the first four terms of the series and Q(t) at (a) t = 0, (b) t = ¼T?

26.10 A 2π-periodic function is defined by
$$f(t) = \begin{cases}\beta t(\pi - t) & (0 < t < \pi),\\ \beta t(\pi + t) & (-\pi < t < 0).\end{cases}$$
Find the Fourier series for f(t). What is the ratio of the amplitudes of the third and first harmonics? Compare the values of f(t) and the Fourier series up to and including the coefficient b3 at t = ½π.

26.11 Find the Fourier series of the 2π-periodic function defined by f(t) = t(π² − t²) for −π < t < π. Find the derivative of f(t) for −π < t < π and find its Fourier series. Confirm that the derivative of the Fourier series of f(t) obtained by term-by-term differentiation is the same series as the Fourier series of f′(t).
Consider now the function g(t) = t³ defined for −π < t < π. Find the Fourier series of g(t) and g′(t). Confirm that the derivative of the Fourier series of g(t) is not the same series as the Fourier series of g′(t).
Comparing the functions f(t) and g(t), what feature of g(t) do you think causes the problem with its differentiated Fourier series?

26.12 Sketch the wave defined by
$$P(t) = \begin{cases}0 & (-\pi < t < 0),\\ |\sin 2t| & (0 < t < \pi),\end{cases}$$
extended so as to have period 2π. Find its Fourier series. (The identities
$$\sin A\cos B = \tfrac12[\sin(A + B) + \sin(A - B)], \qquad \sin A\sin B = \tfrac12[-\cos(A + B) + \cos(A - B)],$$
will be needed.)

26.13 Show that
$$t = 2\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}\sin nt$$
for −π < t < π. Integrate the terms from t = 0 to t = x, and rearrange them to show that
$$x^2 = 4\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^2} - 4\sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n^2}\cos nx.$$
Now use (26.10) to establish the value of the constant term in this Fourier series.

26.14 From Problem 26.13, or by direct means, obtain the Fourier series valid for −π ≤ t ≤ π:
$$t^2 = \tfrac13\pi^2 + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos nt.$$
By integrating all the terms in this expression from t = 0 to t = x, obtain a Fourier series for x³ − π²x. (It is always valid to integrate a Fourier series in this way in order to obtain a new one, but differentiation of the terms does not always lead to a valid series.)

26.15 Obtain the Fourier series of the function having period T which is defined for −½T < t < ½T by
$$P(t) = \begin{cases}-2t & (-\tfrac12T < t < 0),\\ 2t & (0 < t < \tfrac12T).\end{cases}$$
Display its spectrum as in Fig. 26.12.
26.16 The function f(t) is defined on the interval 0 < t < 1 by f(t) = 1. Express f(t) as a half-range Fourier series on 0 < t < 1, (a) as a sine series, (b) as a cosine series. Sketch the sum of the series on −∞ < t < ∞ in both cases.

26.17 The function f(t) is defined on 0 < t < 1 by f(t) = t. Express f(t) as a half-range Fourier series on 0 < t < 1, (a) as a cosine series, (b) as a sine series.

26.18 Express f(t) = sin ωt on 0 < t < π/ω as a half-range cosine series. Sketch the sum of the series on −∞ < t < ∞.

26.19 Express f(t) = cos ωt on 0 < t < π/ω as a half-range sine series. Sketch the sum of the series on −∞ < t < ∞.

26.20 Express f(t) = cos t on 0 < t < 2π as a half-range sine series.

26.21 Express f(t) = cos t on 0 < t < 2π as a half-range cosine series.

26.22 Express the function f(t), for 0 < t < π, (a) as a half-range sine series, (b) as a half-range cosine series:
$$f(t) = \begin{cases}1 & (0 < t < \tfrac12\pi),\\ 0 & (\tfrac12\pi < t < \pi).\end{cases}$$

26.23 The Fourier series for the function P(t), period 2π, given by
$$P(t) = \begin{cases}-t & (-\pi < t < 0),\\ t & (0 < t < \pi),\end{cases}$$
is
$$P(t) = \tfrac12\pi - \frac{4}{\pi}\left(\frac{\cos t}{1^2} + \frac{\cos 3t}{3^2} + \frac{\cos 5t}{5^2} + \cdots\right)$$
(see Example 26.2). Deduce from this the Fourier expansions of the following periodic functions.
(a) Q(t), period 4, where
$$Q(t) = \begin{cases}-3t & (-2 < t < 0),\\ 3t & (0 < t < 2);\end{cases}$$
(b) R(t), period 2, where
$$R(t) = \begin{cases}1 + t & (-1 < t < 0),\\ 1 - t & (0 < t < 1).\end{cases}$$
(Sketch R(t) to understand the connection with P(t).)
(c) Check that P(t), Q(t), R(t) have similar spectra.

26.24 The Fourier series for the function P(t), period 2, given by
$$P(t) = \begin{cases}-1 & (-1 < t < 0),\\ 1 & (0 < t < 1),\end{cases}$$
for one period, is
$$P(t) = \frac{4}{\pi}\left(\sin\pi t + \frac13\sin 3\pi t + \frac15\sin 5\pi t + \cdots\right)$$
(see Example 26.4). Deduce the expansion of the function Q(t), period T, defined on one period by
$$Q(t) = \begin{cases}-a & (0 < t < \tfrac12T),\\ a & (\tfrac12T < t < T).\end{cases}$$

26.25 Find the Fourier series of the 2π-periodic sawtooth wave defined by f(t) = t (−π < t < π). Determine the forced part of the solution of the second-order differential equation
$$\frac{d^2x}{dt^2} + \Omega^2x = K\sin\omega t,$$
where ω ≠ ±Ω. Hence put together the periodic output of the forced system
$$\frac{d^2x}{dt^2} + \Omega^2x = f(t),$$
where f(t) is the sawtooth wave above. For what values of Ω does the system exhibit resonance?

26.26 The Fourier series of a function with period T is given by
$$f(t) = \tfrac12a_0 + \sum_{n=1}^{\infty}\left(a_n\cos n\omega t + b_n\sin n\omega t\right),$$
where T = 2π/ω. Multiply both sides of the equation by f(t) and integrate between −½T and ½T, to obtain Parseval’s identity
$$\frac{2}{T}\int_{-T/2}^{T/2}f(t)^2\,dt = \tfrac12a_0^2 + \sum_{n=1}^{\infty}\left(a_n^2 + b_n^2\right).$$
(a) Let T = π, and
$$f(t) = \begin{cases}-1 & (-\tfrac12\pi < t < 0),\\ 1 & (0 < t < \tfrac12\pi).\end{cases}$$
Show that
$$\sum_{n=0}^{\infty}\frac{1}{(2n+1)^2} = \frac{\pi^2}{8}.$$
(b) Let f(t) = t (−π < t < π) be a 2π-periodic function. Find its Fourier series (see Problem 26.1b), and deduce the corresponding Parseval identity.
26.27 The function f(t) with period T has the Fourier series
$$f(t) = \tfrac12a_0 + \sum_{n=1}^{\infty}\left(a_n\cos n\omega t + b_n\sin n\omega t\right).$$
Find the Laplace transform of the function as the sum of a series of Laplace transforms of the trigonometric terms. Hence find the Laplace transform of the 2π-periodic function defined by
$$f(t) = \begin{cases}-t^2 & (-\pi < t < 0),\\ t^2 & (0 < t < \pi).\end{cases}$$
(See Problem 26.1c.)

26.28 A radio wave described by
$$x(t) = a\cos\omega t\cos\omega_0t,$$
where ω0 is very much greater than ω, represents a carrier wave subject to amplitude modulation by the comparatively slowly varying term cos ωt, which represents a musical note. Roughly sketch the general character of x(t).
(a) If ω = 500 and ω0 = 100 001 (notice the 1 at the end), what is the period of x(t)?
(b) Let ω = p/q and ω0 = r/s, where p, q, r, s are whole numbers such that no two of them have a common divisor other than 1. What is the period? Express x(t) as the sum of two waves with angular frequencies ω ± ω0 (these are called the sidebands). What is the Fourier cosine expansion based on this period?
(c) If you know about irrational numbers, show that x1(t) = cos t cos √2 t never repeats itself exactly: it is not periodic.

26.29 (a) Prove that
$$\int_{-T/2}^{T/2}e^{i2\pi nf_0t}e^{-i2\pi mf_0t}\,dt = \begin{cases}T, & m = n,\\ 0, & m \ne n.\end{cases}$$
(b) Confirm that the expansion (26.18) is valid by multiplying both sides of (26.18a) by $e^{-i2\pi Nf_0t}$ and integrating the result over one period.

26.30 Obtain the two-sided Fourier series for the sawtooth function xP(t) defined by xP(t) = t/T for 0 < t < T, together with its periodic extension of period T.
27 Fourier transforms

CONTENTS

27.1 Sine and cosine transforms
27.2 The exponential Fourier transform
27.3 Short notations: alternative expressions
27.4 Fourier transforms of some basic functions
27.5 Rules for manipulating transforms
27.6 The delta function and periodic functions
27.7 Convolution theorem for Fourier transforms
27.8 The shah function
27.9 Energy in a signal: Rayleigh’s theorem
27.10 Diffraction from a uniformly radiating strip
27.11 General source distribution and the inverse transform
27.12 Transforms in radiation problems
Problems

Fourier series are used to express functions defined over a finite range as a periodic series of harmonic terms. Fourier integrals, which are the subject of this chapter, are used to describe non-periodic functions over an infinite range. For the infinite interval t ≥ 0 there exist cosine or sine transforms, and corresponding inverse transforms. We shall treat these as intuitive extensions of the corresponding Fourier series to functions having an infinite period. For functions arbitrary over the two-sided infinite interval −∞ < t < ∞ there exist (subject to certain technical limitations) the complex exponential transform and its inverse, which we formulate as a combination of cosine/sine transforms.
The strict mathematical arguments necessary to prove the results go far beyond the scope of this book, so intuitive justification is used freely. There are some apparent restrictions. For example, convergence of the integrals (see Section 15.6, on improper integrals) requires the functions concerned to approach zero as the variable t approaches ±∞. This restriction would, for instance, rule out consideration of periodic functions. However, in some circumstances restrictions can safely be disregarded as illustrated in the later sections of this chapter. (There exists a sophisticated theory of generalized functions regulating such liberties.)

27.1 Sine and cosine transforms

Figure 27.1 shows examples of functions that are not periodic. Non-periodic functions can still be expressed in terms of harmonic functions (sines, cosines, and their complex exponential forms), but instead of an infinite series of harmonic terms associated with discrete frequencies f0, 2f0, 3f0, …, where f0 is the fundamental frequency, an infinite integral over a continuum of frequencies is required.

[Fig. 27.1 Three non-periodic functions x(t).]

A full derivation of such results is too complicated to give in this book, but representation by a continuous distribution of frequencies can be made plausible by regarding a non-periodic function as the limit of a periodic function as its period becomes infinite. To illustrate this idea we shall consider a simple case. Let $p_O(t)$ be a real-valued function for $-\infty < t < \infty$, periodic with period T, and odd (i.e. $p_O(-t) = -p_O(t)$). Further, suppose that it consists of a stream of discrete, equally spaced ‘pulses’, each of duration $\tau < T$, and has the value zero between them, as illustrated in Fig. 27.2. The function $p_O(t)$ can be represented by a Fourier sine series (see Section 26.6). Up to this point we have expressed Fourier series in terms of the circular frequency $\omega = 2\pi/T$, but here we shall use the fundamental frequency $f_0 = 1/T$ instead. Then (26.15b) becomes

$$p_O(t) = \sum_{n=1}^{\infty} b_n \sin(2\pi n f_0 t), \tag{27.1}$$

where
$$b_n = 2f_0\int_{-\frac12 T}^{\frac12 T} p_O(t)\sin(2\pi n f_0 t)\,dt = 4f_0\int_{0}^{\frac12 T} p_O(t)\sin(2\pi n f_0 t)\,dt \tag{27.2}$$
(since the integrand is an even function).

Fig. 27.2 The odd periodic pulse train $p_O(t)$: period T, pulse duration τ.
We shall seek a representation of the fixed single pulse present in the interval $-\frac12 T < t < \frac12 T$ by letting $T \to \infty$. This pushes the periodic copies of the central pulse away to infinity, whilst leaving the central pulse unaffected. (In a physical context, as in passing a solitary pulse through an electrical filter, we should be likely to disregard extraneous pulses that arrive only every hour, or every month, or every century, as the period T is taken larger and larger.)
The series (27.1) becomes increasingly intractable when $T \gg \tau$ (meaning T ‘is much greater than’ τ): too many terms have to be taken in order to get a reasonable approximation to $p_O(t)$. However, we can recast the series as an integral, and this problem disappears. Write (27.2) in the form
$$\frac{b_n}{2f_0} = 2\int_0^{\frac12 T} p_O(t)\sin(2\pi n f_0 t)\,dt = X_s(nf_0) \quad \text{(say)}, \tag{27.3}$$
from (27.2). As $T \to \infty$, $f_0 = 1/T \to 0$. Also, the successive frequency components $nf_0$ are separated by a distance $f_0$. Now, for small $f_0$, write
$$f_0 = \delta f, \qquad nf_0 = f_n.$$
Since $b_n = 2f_0 X_s(nf_0) = 2\,\delta f\,X_s(f_n)$, equation (27.1) becomes

$$p_O(t) = 2\,\delta f\sum_{n=1}^{\infty} X_s(f_n)\sin(2\pi f_n t). \tag{27.4}$$

When $T \to \infty$, so that $\delta f \to 0$, the periodic copies of the central pulse are consigned to infinity, and we are left with a solitary pulse x(t) given by
$$x(t) = \begin{cases} p_O(t), & -\frac12\tau < t < \frac12\tau; \\ 0, & \text{elsewhere.} \end{cases}$$
At the same time (by eqn (15.9)) the sum in (27.4) approaches an infinite integral,
as does the finite integral in (27.3). We then have the symmetrical pair of relations:

Fourier sine transform

(a) $$x(t) = 2\int_0^{\infty} X_s(f)\sin(2\pi f t)\,df,$$
where
(b) $$X_s(f) = 2\int_0^{\infty} x(t)\sin(2\pi f t)\,dt. \tag{27.5}$$

The function $X_s(f)$ is called the Fourier sine transform of x(t), or the spectral density or frequency distribution function corresponding to x(t), in the context of sine transforms. All positive frequencies are represented. Notice that (27.5a) automatically defines x(t) as an odd function if the context demands that we be concerned with the time range $-\infty < t < \infty$. The normal use for the sine transform is, however, for $t \ge 0$ only.
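The sine transform (27.5b) can be checked numerically. The sketch below is only an illustration under our own assumptions: it uses Python with NumPy, takes $x(t) = e^{-t}$ as the test function, truncates the infinite range at t = 40 (where $e^{-t}$ is negligible), and applies the trapezoidal rule. Integrating by parts gives the closed form $X_s(f) = 4\pi f/(1 + 4\pi^2 f^2)$ for comparison.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be real or complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def sine_transform(x_func, f, t_max=40.0, n=200_001):
    """Approximate X_s(f) = 2 * int_0^inf x(t) sin(2 pi f t) dt, cut off at t_max."""
    t = np.linspace(0.0, t_max, n)
    return 2 * trapezoid(x_func(t) * np.sin(2 * np.pi * f * t), t)

def sine_transform_exp(f):
    """Closed-form sine transform of x(t) = exp(-t)."""
    return 4 * np.pi * f / (1 + 4 * np.pi**2 * f**2)

for f in (0.1, 0.5, 2.0):
    approx = sine_transform(lambda t: np.exp(-t), f)
    print(f"f = {f}: quadrature {approx:.8f}, closed form {sine_transform_exp(f):.8f}")
```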
If we start with an even periodic chain of pulses $p_E(t)$ and its Fourier cosine series, we arrive similarly at the cosine transform pair:

Fourier cosine transform

(a) $$x(t) = 2\int_0^{\infty} X_c(f)\cos(2\pi f t)\,df,$$
where
(b) $$X_c(f) = 2\int_0^{\infty} x(t)\cos(2\pi f t)\,dt. \tag{27.6}$$

The equations (27.5a) and (27.6a) are also known as the inverse transforms
of Xs(f ) and Xc(f ). As with Laplace transforms, they solve the problem: ‘given a
frequency distribution, obtain the corresponding time function’.
To arrive at the sine and cosine transforms we assumed that the signal x(t) consists of a pulse of finite extent τ. However, the results are true for suitably behaved functions having infinite extent, from t = 0 to t = ∞ (or from t = −∞ to ∞, provided that they are appropriately odd or even functions). Thus $x(t) = e^{-t}$ for $t \ge 0$ has both a sine and a cosine transform for $t \ge 0$.
As in eqn (26.12) for Fourier series, the value attributed to x(t) by (27.5) and (27.6) at a jump discontinuity $t = t_0$ is the average of its values on either side:
$$x(t_0) = \tfrac12\left[x(t_0^-) + x(t_0^+)\right] \tag{27.7}$$
(in the notation of (26.12)).

Example 27.1 (a) Obtain the cosine transform of the function x(t) given by
$$x(t) = \begin{cases} 1, & 0 \le t < 1, \\ 0, & t > 1 \end{cases}$$
(see Fig. 27.3a) and write down the inverse transform (without evaluating it).
(b) Deduce that
$$\int_0^{\infty}\frac{\sin u}{u}\,du = \tfrac12\pi.$$
(c) Show that the value attributed to x(1), a point of discontinuity of x(t), conforms with eqn (27.7).

Fig. 27.3 (a) The function x(t) of Example 27.1, equal to 1 for 0 ≤ t < 1. (b) Its even extension to −∞ < t < ∞.

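(a) From (27.6b),
$$X_c(f) = 2\int_0^{\infty} x(t)\cos(2\pi f t)\,dt = 2\int_0^{1}\cos(2\pi f t)\,dt = \frac{1}{\pi f}\big[\sin(2\pi f t)\big]_0^1 = \frac{\sin(2\pi f)}{\pi f}.$$
The inverse transform is
$$2\int_0^{\infty}\frac{\sin(2\pi f)}{\pi f}\cos(2\pi f t)\,df. \qquad \text{(i)}$$
This is equal to x(t) at points of continuity of x(t).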


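(b) By putting (−t) for t into (i), x(t) is extended to an even function (i.e. x(−t) = x(t)) on $-\infty < t < \infty$ (see Fig. 27.3b). The point t = 0 is therefore a point of continuity of x(t) on $-\infty < t < \infty$, so that
$$x(0) = 2\int_0^{\infty}\frac{\sin 2\pi f}{\pi f}\,df = 1. \qquad \text{(ii)}$$
Substitute $u = 2\pi f$, $df/f = du/u$; then (ii) becomes $(2/\pi)\int_0^\infty (\sin u/u)\,du = 1$, and we obtain the standard integral
$$\int_0^{\infty}\frac{\sin u}{u}\,du = \tfrac12\pi. \qquad \text{(iii)}$$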
0 u
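(c) The point t = 1 marks a jump in value of x(t) from 1 to 0. Equation (27.7) predicts that the integral (i) will deliver the value $x(1) = \frac12(1 + 0) = \frac12$. To confirm this, put t = 1 in (i) and use $2\sin 2\pi f\cos 2\pi f = \sin 4\pi f$:
$$2\int_0^{\infty}\frac{\sin 2\pi f}{\pi f}\cos 2\pi f\,df = \int_0^{\infty}\frac{\sin 4\pi f}{\pi f}\,df.$$
Put $v = 4\pi f$, $df/f = dv/v$; then by using the result (iii):
$$\int_0^{\infty}\frac{\sin 4\pi f}{\pi f}\,df = \frac{1}{\pi}\int_0^{\infty}\frac{\sin v}{v}\,dv = \frac{1}{\pi}\cdot\tfrac12\pi = \tfrac12,$$
as predicted.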
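The conclusions of Example 27.1 can be illustrated numerically. In the following sketch (Python with NumPy; the cut-off frequency F = 200 and the step size are arbitrary choices of ours) the infinite integral (i) is truncated at f = F and evaluated by the trapezoidal rule. Note that $X_c(f) = \sin(2\pi f)/(\pi f) = 2\,\mathrm{sinc}(2f)$, and NumPy's `np.sinc` uses the same normalized definition $\sin(\pi x)/(\pi x)$ as (27.14), so f = 0 needs no special treatment. The truncated integral converges slowly (the error falls off roughly like 1/F), so agreement is only to about three decimal places.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def X_c(f):
    """Cosine transform of the unit pulse on 0 <= t < 1: sin(2 pi f)/(pi f) = 2 sinc(2f)."""
    return 2 * np.sinc(2 * f)

def inverse_cosine(t, F=200.0, n=400_001):
    """Truncated form of (i): 2 * int_0^F X_c(f) cos(2 pi f t) df."""
    f = np.linspace(0.0, F, n)
    return 2 * trapezoid(X_c(f) * np.cos(2 * np.pi * f * t), f)

print(inverse_cosine(0.5))  # inside the pulse: close to 1
print(inverse_cosine(1.0))  # at the jump: close to 1/2, as eqn (27.7) predicts
print(inverse_cosine(1.5))  # outside the pulse: close to 0
```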

Self-test 27.1
Find the Fourier cosine transform of $x(t) = e^{-t}$, and deduce from its inverse that, for t > 0,
$$\int_0^{\infty}\frac{\cos u}{u^2 + t^2}\,du = \frac{\pi}{2t}\,e^{-t}.$$
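A quick numerical check of the identity in Self-test 27.1 is sketched below (Python with NumPy; the cut-off U = 2000 and the step size are our own choices). The integrand decays like $1/u^2$, so truncating at u = U leaves an error of order $1/U^2$, which is negligible here. This checks the final formula only; it is not a substitute for the derivation from the inverse cosine transform.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def lhs(t, U=2000.0, n=400_001):
    """Truncated integral int_0^U cos(u)/(u^2 + t^2) du, for t > 0."""
    u = np.linspace(0.0, U, n)
    return trapezoid(np.cos(u) / (u**2 + t**2), u)

def rhs(t):
    """Claimed closed form (pi / 2t) e^{-t}."""
    return np.pi / (2 * t) * np.exp(-t)

for t in (0.5, 1.0, 3.0):
    print(t, lhs(t), rhs(t))
```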

27.2 The exponential Fourier transform


The sine and cosine transforms have limited usefulness for the applications in this chapter, since they refer only to functions x(t) that are odd or even, unless we restrict the interval of t to $t \ge 0$. Functions on $-\infty < t < \infty$ are in general neither odd nor even, and the following transform pair is free from these limitations:

The complex exponential Fourier transform pair

x(t) is any well-behaved real or complex function on $-\infty < t < \infty$ such that $\int_{-\infty}^{\infty}|x(t)|\,dt$ exists. Then at points of continuity of x(t)

(a) $$x(t) = \int_{-\infty}^{\infty} X(f)\,e^{2\pi i f t}\,df,$$
where
(b) $$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi i f t}\,dt. \tag{27.8}$$

These formulae closely resemble eqns (26.18) for the two-sided (complex) Fourier
series, and it is possible to calculate the transition from (26.18) to the exponential
transform by the procedure described in the previous section. Alternatively, (27.8)
can be obtained from the sine and cosine transforms.

The condition that $\int_{-\infty}^{\infty}|x(t)|\,dt$ should exist (i.e. converge) appears to be rather restrictive. For example, any function which does not tend to zero as $t \to \pm\infty$ is suspect. Besides functions like t and $e^t$, this condition would disqualify all periodic functions such as $\sin\omega t$. The imprecise term ‘well-behaved’ in (27.8) implies further unspecified restrictions. Here we shall only say that if $\int_{-\infty}^{\infty}|x(t)|\,dt$ exists, the only exclusions are functions having a degree of eccentricity rarely encountered in physical applications. Simple jump discontinuities in the value of x(t) are allowed, and, as with Fourier series, eqn (27.8a) delivers a value at such points equal to the average of the values of x(t) on either side of the jump. If there is a jump at $t = t_0$, then (as in (26.12))
$$x(t_0) = \tfrac12\left[x(t_0^-) + x(t_0^+)\right]. \tag{27.9}$$

The scope of the Fourier transform is not paralysed by the restrictions, and the
examples in this chapter will show that the system is far more flexible than this
discussion might suggest.

Example 27.2 (Compare Example 26.10.) Find the spectral distribution function X(f) of the signal x(t) defined by
$$x(t) = \begin{cases} 1, & -\frac12\tau < t < \frac12\tau, \\ 0, & \text{elsewhere.} \end{cases}$$
Figure 27.4a shows x(t). The spectral distribution function, or Fourier transform, of x(t) is given by
$$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi i f t}\,dt = \int_{-\frac12\tau}^{\frac12\tau} e^{-2\pi i f t}\,dt = \frac{1}{-2\pi i f}\big[e^{-2\pi i f t}\big]_{-\frac12\tau}^{\frac12\tau} = \frac{i}{2\pi f}\left(e^{-\pi i f\tau} - e^{\pi i f\tau}\right)$$
$$= \frac{i}{2\pi f}(-2i)\sin\pi f\tau = \frac{1}{\pi f}\sin\pi f\tau.$$
Fig. 27.4 (a) The rectangular pulse x(t) of width τ. (b) Its spectral distribution X(f), with X(0) = τ and zeros at f = ±1/τ, ±2/τ, ….

The signal x(t) is reconstructed from the spectral components X(f) by
$$x(t) = \int_{-\infty}^{\infty} X(f)\,e^{2\pi i f t}\,df = \int_{-\infty}^{\infty}\frac{\sin\pi f\tau}{\pi f}\,e^{2\pi i f t}\,df.$$

Figure 27.4b shows the frequency distribution, which in this case is a real function.
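The result of Example 27.2 can be confirmed by evaluating (27.8b) numerically. In this sketch (Python with NumPy; τ = 2 and the test frequencies are arbitrary) the integral over the pulse is computed with the trapezoidal rule and compared with $\sin(\pi f\tau)/(\pi f)$, written as $\tau\,\mathrm{sinc}(\tau f)$ using `np.sinc`, which follows the normalized definition $\sin(\pi x)/(\pi x)$ used in this chapter. The imaginary part of the numerical result vanishes to rounding error, consistent with X(f) being real for this even signal.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def pulse_transform(f, tau, n=20_001):
    """X(f) = int_{-tau/2}^{tau/2} exp(-2 pi i f t) dt, by quadrature."""
    t = np.linspace(-tau / 2, tau / 2, n)
    return trapezoid(np.exp(-2j * np.pi * f * t), t)

tau = 2.0
for f in (0.0, 0.2, 0.5, 1.3):
    numeric = pulse_transform(f, tau)
    exact = tau * np.sinc(tau * f)      # = sin(pi f tau) / (pi f)
    print(f, numeric.real, numeric.imag, exact)
```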

The following properties are sometimes useful:

Properties of the exponential Fourier transform X(f)

If x(t) is a real function, then:
(a) $X(-f) = \overline{X(f)}$, the complex conjugate of X(f).
(b) If x(t) is even, X(f) is real and even.
(c) If x(t) is odd, X(f) is purely imaginary, and odd. (27.10)

27.3 Short notations: alternative expressions


There are several conventional notations in common use in connection with the sine, cosine, and exponential Fourier transforms. For example:

(a) $$\mathcal{F}[x(t)] = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi i f t}\,dt \tag{27.11a}$$
(sometimes expressed as $\mathcal{F}[x](f)$) denotes the exponential Fourier transform of x(t). $\mathcal{F}_s$ and $\mathcal{F}_c$ are used for the sine and cosine transforms respectively.
(b) $\mathcal{F}^{-1}[X(f)]$ denotes the inverse Fourier transform: it gives us back the originating function x(t) if we know the frequency distribution X(f):
$$\mathcal{F}^{-1}[X(f)] = \int_{-\infty}^{\infty} X(f)\,e^{2\pi i f t}\,df = x(t). \tag{27.11b}$$
(c) The ‘tilde’ notation is often convenient:
$$\mathcal{F}[x(t)] = \tilde{x}(f).$$
Thus, if we have several time-dependent functions u(t), v(t), …, their transforms may be written $\tilde{u}(f), \tilde{v}(f), \dots$. Many books also use the notation $x(t) \leftrightarrow X(f)$.
More importantly, there are several different versions of the results (27.5), (27.6), and (27.8), all of which are widely used, so it is necessary to establish which one has been adopted in a particular piece of work. For example, the Fourier cosine transform and its inverse may appear in the form
$$G_c(\omega) = \frac{2}{\pi}\int_0^{\infty} x(t)\cos\omega t\,dt, \qquad x(t) = \int_0^{\infty} G_c(\omega)\cos\omega t\,d\omega, \tag{27.12}$$
where ω can be interpreted as circular frequency ($\omega = 2\pi f$); comparing with (27.6), $G_c(2\pi f) = X_c(f)/\pi$. Here we have preferred to present more symmetrical expressions, but all versions are equivalent to each other by means of a change of variable. There are similar variants of the exponential transform (27.8); in particular, the positive and negative signs in the complex exponents $\pm 2\pi i f t$ may be interchanged. See also Problem 27.7.

27.4 Fourier transforms of some basic functions


(Note: a more complete list of transforms is given in Appendix G.)

(a) The top-hat function Π(t)


The functions Π(t) and Π(t/τ ) are shown in Fig. 27.5. The transforms were found
in Example 27.2:

Fig. 27.5 (a) Π(t), (b) Π(t/τ) (width τ).

Top-hat function
(a) $$\mathcal{F}[\Pi(t)] = \frac{\sin\pi f}{\pi f}.$$
(b) $$\mathcal{F}[\Pi(t/\tau)] = \frac{\sin(\pi f\tau)}{\pi f}. \tag{27.13}$$

(b) The function sinc


The functions on the right of (27.13) are related to a standard function defined by
$$\operatorname{sinc} x = \frac{\sin(\pi x)}{\pi x}.$$

Fig. 27.6 sinc x = sin(πx)/(πx).

Its graph is shown in Fig. 27.6. It is an even function, and it can be shown that the signed area under the curve is equal to unity (see, e.g., eqn (ii) of Example 27.1, after substituting x = 2f).

The function sinc x
(a) $\operatorname{sinc} x = \sin(\pi x)/(\pi x)$, $\operatorname{sinc} 0 = 1$.
(b) $$\int_{-\infty}^{\infty}\operatorname{sinc} x\,dx = 1. \tag{27.14}$$

The transforms of the top-hat functions (27.13) become

Fourier transform of Π(t)
(a) $\mathcal{F}[\Pi(t)] = \operatorname{sinc} f$.
(b) $\mathcal{F}[\Pi(t/\tau)] = \tau\operatorname{sinc}(\tau f)$. (27.15)

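For the Fourier transform of sinc t, start with (27.15b). Since $\tau\operatorname{sinc}\tau f = \mathcal{F}[\Pi(t/\tau)]$, it follows that
$$\Pi(t/\tau) = \mathcal{F}^{-1}[\tau\operatorname{sinc}(\tau f)] = \int_{-\infty}^{\infty}\tau\operatorname{sinc}(\tau f)\,e^{i2\pi f t}\,df.$$
Interchange the letters t and f, take the complex conjugate of the result to make the sign in the exponential negative (Π and sinc are real), and put 1/τ in place of τ. We obtain
$$\Pi(\tau f) = \int_{-\infty}^{\infty}\frac{1}{\tau}\operatorname{sinc}\frac{t}{\tau}\,e^{-i2\pi f t}\,dt.$$
Multiply through by τ to obtain the results:

Fourier transform of sinc t
(a) $\mathcal{F}[\operatorname{sinc} t] = \Pi(f)$.
(b) $\mathcal{F}[\operatorname{sinc}(t/\tau)] = \tau\,\Pi(\tau f)$. (27.16)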

Equations (27.15) and (27.16) illustrate a general fact: that as the duration of
a signal increases (e.g. as τ increases in (27.15b)), the effective frequency range
tends to become narrower, and conversely.
Fig. 27.7 (a) $x(t) = e^{-\alpha t}H(t)$. (b) Real and imaginary parts of $X(f) = \mathcal{F}[e^{-\alpha t}H(t)]$.

(c) A one-sided exponential function


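Consider the function in Fig. 27.7a defined by
$$x(t) = e^{-\alpha t}H(t),$$
where H(t) is the unit function (1.13) and α is positive. Then
$$\mathcal{F}[x(t)] = \int_{-\infty}^{\infty} x(t)\,e^{-i2\pi f t}\,dt = \int_0^{\infty} e^{-\alpha t}e^{-i2\pi f t}\,dt = \int_0^{\infty} e^{-(\alpha + i2\pi f)t}\,dt = \frac{-1}{\alpha + i2\pi f}\big[e^{-(\alpha + i2\pi f)t}\big]_0^{\infty} = \frac{1}{\alpha + i2\pi f}.$$
Therefore
$$\mathcal{F}[e^{-\alpha t}H(t)] = \frac{1}{\alpha + i2\pi f}. \tag{27.17}$$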
Since x(t) is neither even nor odd the spectral distribution is a complex function.
Its real and imaginary parts are shown in Fig. 27.7b.
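Both (27.17) and property (27.10a) can be checked with a short computation. The sketch below (Python with NumPy; α = 1.5, the cut-off t = 40 and the grid are our assumptions) evaluates the transform of $e^{-\alpha t}H(t)$ by quadrature over $t \ge 0$, compares it with $1/(\alpha + i2\pi f)$, and confirms that X(−f) is the complex conjugate of X(f) for this real signal.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

alpha = 1.5
t = np.linspace(0.0, 40.0, 400_001)   # H(t) removes t < 0; e^{-alpha t} is negligible by t = 40

def X_numeric(f):
    """Quadrature for F[e^{-alpha t} H(t)] at frequency f."""
    return trapezoid(np.exp(-alpha * t) * np.exp(-2j * np.pi * f * t), t)

def X_exact(f):
    """Closed form (27.17)."""
    return 1 / (alpha + 2j * np.pi * f)

for f in (0.3, 1.0):
    print(f, X_numeric(f), X_exact(f))
    print("  X(-f) - conj(X(f)) =", X_numeric(-f) - np.conj(X_numeric(f)))
```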

Example 27.3 Find the Fourier transform of the function given by $x(t) = e^{-|t|}$ (see Fig. 27.8a).

Fig. 27.8 (a) $x(t) = e^{-|t|}$. (b) $X(f) = \mathcal{F}[e^{-|t|}]$.

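We have
$$X(f) = \int_{-\infty}^{\infty} e^{-|t|}e^{-i2\pi f t}\,dt = \int_{-\infty}^{0} e^{t}e^{-i2\pi f t}\,dt + \int_0^{\infty} e^{-t}e^{-i2\pi f t}\,dt$$
$$= \frac{1}{1 - i2\pi f}\big[e^{(1 - i2\pi f)t}\big]_{-\infty}^{0} + \frac{-1}{1 + i2\pi f}\big[e^{-(1 + i2\pi f)t}\big]_0^{\infty} = \frac{1}{1 - i2\pi f} + \frac{1}{1 + i2\pi f} = \frac{2}{1 + 4\pi^2 f^2}.$$
This function is shown in Fig. 27.8b.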
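In practice, continuous transforms such as that in Example 27.3 are often approximated with the discrete Fourier transform. The following is a minimal sketch of one common approach, not a method from the text: sample $x(t) = e^{-|t|}$ at spacing Δt on −40 ≤ t < 40, compute the FFT with `np.fft.fft`, and multiply by Δt and by the phase factor $e^{-2\pi i f t_0}$, which accounts for the grid starting at $t_0 = -40$ rather than at 0. `np.fft.fftfreq` supplies the matching frequencies k/(NΔt), including the negative ones. The step Δt = 0.01 and the range are our choices; the error is of order Δt².

```python
import numpy as np

dt = 0.01
n = 8000                      # samples cover -40 <= t < 40
t0 = -40.0
t = t0 + dt * np.arange(n)
x = np.exp(-np.abs(t))

# Riemann-sum approximation of X(f) = int x(t) exp(-2 pi i f t) dt on the FFT frequency grid
f = np.fft.fftfreq(n, d=dt)
X_approx = dt * np.exp(-2j * np.pi * f * t0) * np.fft.fft(x)
X_exact = 2 / (1 + 4 * np.pi**2 * f**2)      # result of Example 27.3

low = np.abs(f) < 2.0
max_err = np.max(np.abs(X_approx[low] - X_exact[low]))
print(f"largest error for |f| < 2: {max_err:.2e}")
```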

27.5 Rules for manipulating transforms


The following rules enable new transforms to be obtained from known ones. The
constants A, B, C, D, K are assumed to be real, but the signals x(t) may be real or
complex.
The proofs are left to the problems. Most of them are obtained by writing
down the appropriate Fourier integral or its inverse and then making a simple
change of variable. The Examples illustrate how these results are used.

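Rule: signal x(t) ↔ transform X(f) = F[x(t)]
(a) Linearity: $Ax_1(t) + Bx_2(t) \leftrightarrow AX_1(f) + BX_2(f)$
(b) Time scaling: $x(At) \leftrightarrow X(f/A)/|A|$; time reversal: $x(-t) \leftrightarrow X(-f)$
(c) Time delay: $x(t - B) \leftrightarrow X(f)\,e^{-i2\pi Bf}$
(d) Frequency scaling: $x(t/C)/|C| \leftrightarrow X(Cf)$
(e) Frequency shift: $x(t)\,e^{i2\pi Dt} \leftrightarrow X(f - D)$
(f) Modulation: $x(t)\cos 2\pi Kt \leftrightarrow [X(f - K) + X(f + K)]/2$; $x(t)\sin 2\pi Kt \leftrightarrow [X(f - K) - X(f + K)]/(2i)$
(g) Duality: $X(t) \leftrightarrow x(-f)$
(h) Differentiation: $dx(t)/dt \leftrightarrow (i2\pi f)X(f)$; $d^n x(t)/dt^n \leftrightarrow (i2\pi f)^n X(f)$
(27.18)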
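Rules (27.18b) and (27.18c) are easy to test numerically on a signal whose transform is known. This sketch (Python with NumPy) assumes the Gaussian pair $e^{-\pi t^2} \leftrightarrow e^{-\pi f^2}$, which is quoted without proof in Self-test 27.2; the values A = 2, B = 0.7 and the integration grid are arbitrary choices.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def ft(samples, t, f):
    """Quadrature approximation to (27.8b) at a single frequency f."""
    return trapezoid(samples * np.exp(-2j * np.pi * f * t), t)

def gauss(s):
    """e^{-pi s^2}, which is its own Fourier transform."""
    return np.exp(-np.pi * s**2)

t = np.linspace(-12.0, 12.0, 48_001)
A, B = 2.0, 0.7

for f in (0.0, 0.3, 1.1):
    scaled = ft(gauss(A * t), t, f)          # rule (b): X(f/A)/|A|
    delayed = ft(gauss(t - B), t, f)         # rule (c): X(f) e^{-i 2 pi B f}
    print(f, abs(scaled - gauss(f / A) / abs(A)),
          abs(delayed - gauss(f) * np.exp(-2j * np.pi * B * f)))
```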

Example 27.4 Given that $\mathcal{F}[\Pi(t)] = \operatorname{sinc} f$, obtain $\mathcal{F}[\Pi(t/\tau)]$ (τ > 0).

Use the time-scaling rule (27.18b) with A = 1/τ:
$$\Pi(t/\tau) \leftrightarrow \tau\operatorname{sinc}(f\tau).$$

Example 27.5 Given that $\mathcal{F}[\Pi(t)] = \operatorname{sinc} f$, obtain $\mathcal{F}[\Pi(at + d)]$.

We have
$$\Pi(at + d) = \Pi(a\{t + d/a\}).$$
From the time-scaling rule (27.18b),
$$\mathcal{F}[\Pi(at)] = (1/|a|)\operatorname{sinc}(f/a).$$
Then, using the time-delay rule (27.18c) with B = −d/a,
$$\mathcal{F}[\Pi(a\{t + d/a\})] = (1/|a|)\operatorname{sinc}(f/a)\,e^{i2\pi df/a}.$$

Example 27.6 Obtain the signal x(t) produced by the spectral distribution X(f) shown in Fig. 27.9.

Fig. 27.9 X(f): two rectangular pulses of unit height, on −3 < f < −1 and 1 < f < 3.

The two rectangular pulses are arrived at by extending the range of Π(f) by a factor 2 to give Π(f/2), then shifting this graph along the f axis a distance 2 to the left and 2 to the right to give
$$X(f) = \Pi(\tfrac12\{f + 2\}) + \Pi(\tfrac12\{f - 2\}).$$
From the frequency-scaling rule (27.18d) with $C = \frac12$,
$$\Pi(\tfrac12 f) \leftrightarrow 2\operatorname{sinc} 2t.$$
Then, by the frequency-shift rule (27.18e), with D = ±2,
$$X(f) \leftrightarrow (e^{i4\pi t} + e^{-i4\pi t})\cdot 2\operatorname{sinc} 2t = 4\cos 4\pi t\,\operatorname{sinc} 2t.$$
(The modulation rule (27.18f) could have been adopted for the final stage instead.)
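Because X(f) in Example 27.6 vanishes outside two finite bands, the inverse transform (27.8a) reduces to two finite integrals, which makes a direct numerical check easy. The sketch below (Python with NumPy; the sample times are arbitrary) compares the quadrature with $4\cos 4\pi t\,\mathrm{sinc}\,2t$, using `np.sinc`, whose normalized definition matches (27.14).

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def x_numeric(t, n=20_001):
    """x(t) = int X(f) e^{2 pi i f t} df, with X = 1 on 1 < |f| < 3 and 0 elsewhere."""
    total = 0j
    for lo, hi in ((-3.0, -1.0), (1.0, 3.0)):
        f = np.linspace(lo, hi, n)
        total += trapezoid(np.exp(2j * np.pi * f * t), f)
    return total

def x_formula(t):
    """Result of Example 27.6."""
    return 4 * np.cos(4 * np.pi * t) * np.sinc(2 * t)

for t in (0.0, 0.1, 0.37, 1.25):
    print(t, x_numeric(t), x_formula(t))
```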

Example 27.7 Let
$$x(t) = \begin{cases} -e^{t}, & t < 0, \\ e^{-t}, & t > 0. \end{cases} \qquad \text{(i)}$$
Given that
$$x(t) \leftrightarrow -4\pi i f/(1 + 4\pi^2 f^2),$$
use the duality theorem (27.18g) to deduce the Fourier transform of $t/(1 + t^2)$.

Put
$$X(f) = -4\pi i f/(1 + 4\pi^2 f^2).$$
By the frequency-scaling rule (27.18d), with $C = 1/2\pi$,
$$X\!\left(\frac{f}{2\pi}\right) = -2if/(1 + f^2) \leftrightarrow 2\pi x(2\pi t).$$
Divide by (−2i) and rename the functions obtained Y(f) and y(t):
$$f/(1 + f^2) = Y(f) \leftrightarrow i\pi x(2\pi t) = y(t). \qquad \text{(ii)}$$
Since x(t) is given, we know the inverse transform of
$$Y(f) = f/(1 + f^2).$$
We need the transform of the identical time function given by
$$Y(t) = t/(1 + t^2).$$
The duality theorem (27.18g) tells how time may be exchanged for frequency in a given function. Applying it to Y(f) in (ii) we have
$$Y(t) \leftrightarrow y(-f) = i\pi x(-2\pi f),$$
so we must substitute (−2πf) for t into (i) (including the inequalities t < 0 and t > 0), giving
$$t/(1 + t^2) \leftrightarrow \begin{cases} i\pi e^{2\pi f}, & f < 0, \\ -i\pi e^{-2\pi f}, & f > 0. \end{cases}$$

Example 27.8 (Sidebands) The voltage signal $x(t) = v(t)\cos 2\pi f_0 t$ represents an audiofrequency signal v(t) used to modulate a carrier wave of high frequency $f_0$. Suppose that $\mathcal{F}[v(t)] = V(f)$, where V(f) is zero outside the range $-f_m \le f \le f_m$, with $f_m \ll f_0$. Use the modulation formula (27.18f) to illustrate the general nature of the spectral distribution function $X(f) = \mathcal{F}[x(t)]$.

From (27.18f)
$$X(f) = \tfrac12[V(f - f_0) + V(f + f_0)]. \qquad \text{(i)}$$
$V(f - f_0)$ is zero unless $-f_m \le f - f_0 \le f_m$, that is unless
$$f_0 - f_m \le f \le f_0 + f_m, \qquad \text{(ii)}$$
and similarly $V(f + f_0)$ is zero unless
$$-f_0 - f_m \le f \le -f_0 + f_m. \qquad \text{(iii)}$$
The intervals (ii) and (iii) do not overlap, since $f_m < f_0$. Therefore the spectral distribution (i) falls into two separate parts on opposite sides of the origin of f, as in Fig. 27.10. They are related to the sidebands of communication engineering. The two parts have the same shape, since their graphs consist of the graph of V(f) moved through distances $\pm f_0$. (In general they would be complex, and even if they are real they will not generally correspond to two real signals.)

Fig. 27.10 (a) Spectral distribution of v(t). (b) Spectral distribution of $v(t)\cos 2\pi f_0 t$.

Self-test 27.2
(a) Prove that $\mathcal{F}[dx/dt] = i2\pi f X(f)$, and $\mathcal{F}[d^2x/dt^2] = -4\pi^2 f^2 X(f)$. (b) Given that $\mathcal{F}[e^{-\pi t^2}] = e^{-\pi f^2}$ (we cannot prove this result here), deduce that $\mathcal{F}[e^{-t^2}] = \sqrt{\pi}\,e^{-\pi^2 f^2}$. (c) Use the results (a) and (b) to prove that
$$\mathcal{F}[t^2 e^{-t^2}] = \tfrac12\sqrt{\pi}\,e^{-\pi^2 f^2}(1 - 2\pi^2 f^2).$$
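The answers to Self-test 27.2(b) and (c) can be checked numerically before attempting the proofs. The sketch below (Python with NumPy; the grid on −10 ≤ t ≤ 10 and the test frequencies are our choices) evaluates (27.8b) by the trapezoidal rule for $e^{-t^2}$ and $t^2 e^{-t^2}$.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(-10.0, 10.0, 40_001)

def ft(samples, f):
    """Quadrature approximation to (27.8b) at frequency f."""
    return trapezoid(samples * np.exp(-2j * np.pi * f * t), t)

def gauss_claim(f):
    """Part (b): sqrt(pi) e^{-pi^2 f^2}."""
    return np.sqrt(np.pi) * np.exp(-np.pi**2 * f**2)

def t2_gauss_claim(f):
    """Part (c): (1/2) sqrt(pi) e^{-pi^2 f^2} (1 - 2 pi^2 f^2)."""
    return 0.5 * np.sqrt(np.pi) * np.exp(-np.pi**2 * f**2) * (1 - 2 * np.pi**2 * f**2)

for f in (0.0, 0.25, 0.6):
    print(f, ft(np.exp(-t**2), f), gauss_claim(f),
          ft(t**2 * np.exp(-t**2), f), t2_gauss_claim(f))
```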

27.6 The delta function and periodic functions


The delta or impulse function δ(t) was defined in Section 25.2. It is convenient for the present purposes to modify slightly the definition given there. For the Laplace transform, which is concerned with $t \ge 0$ only, we constructed a rectangle of width ε and height 1/ε, placing it on the interval t = 0 to t = ε. Here we are dealing with $-\infty < t < \infty$, and instead we place it on the interval $t = -\frac12\varepsilon$ to $t = \frac12\varepsilon$ (see Fig. 27.11). Nothing else changes. The principal properties are:

Fig. 27.11 The rectangle of width ε and height 1/ε on $-\frac12\varepsilon < t < \frac12\varepsilon$ that approximates δ(t).

The impulse or delta function δ(t)

(a) Informal definition: $\delta(t) = 1/\varepsilon$ for $-\frac12\varepsilon < t < \frac12\varepsilon$, and $\delta(t) = 0$ elsewhere (allowing ε to be as small as we wish).
(b) Sifting property: if a < c < b,
$$\int_a^b f(t)\,\delta(t - c)\,dt = f(c),$$
and the integral is zero if c < a or c > b.
(c) Fourier transform: by (b),
$$\mathcal{F}[\delta(t - c)] = \int_{-\infty}^{\infty}\delta(t - c)\,e^{-2\pi i f t}\,dt = e^{-2\pi i f c}. \tag{27.19}$$
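The informal definition (27.19a) can be explored numerically: replace δ(t − c) by the rectangle of width ε and height 1/ε centred at t = c, and let ε shrink. In this sketch (Python with NumPy; the test function cos t, c = 0.3 and f = 0.8 are arbitrary choices) the sifting integral approaches f(c) and the transform approaches $e^{-2\pi i f c}$, the differences shrinking roughly like ε², as the exact result $e^{-2\pi i f c}\,\mathrm{sinc}(\varepsilon f)$ for the rectangle suggests.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def sift(g, c, eps, n=2001):
    """int g(t) * delta_eps(t - c) dt, where delta_eps is 1/eps on |t - c| < eps/2."""
    t = np.linspace(c - eps / 2, c + eps / 2, n)   # integrand vanishes outside this interval
    return trapezoid(g(t), t) / eps

c, f = 0.3, 0.8
for eps in (0.1, 0.01, 0.001):
    value = sift(np.cos, c, eps)
    transform = sift(lambda t: np.exp(-2j * np.pi * f * t), c, eps)
    print(eps, value - np.cos(c), abs(transform - np.exp(-2j * np.pi * f * c)))
```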

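The signal giving rise to $\delta(f - f_0)$ is given by the inverse transform:
$$\mathcal{F}^{-1}[\delta(f - f_0)] = \int_{-\infty}^{\infty}\delta(f - f_0)\,e^{i2\pi f t}\,df = e^{i2\pi f_0 t}.$$
Therefore
$$e^{i2\pi f_0 t} \leftrightarrow \delta(f - f_0),$$
and similarly
$$e^{-i2\pi f_0 t} \leftrightarrow \delta(f + f_0).$$
These are complex signals. But
$$\cos(2\pi f_0 t) = \tfrac12\left(e^{i2\pi f_0 t} + e^{-i2\pi f_0 t}\right);$$
so
$$\cos(2\pi f_0 t) \leftrightarrow \tfrac12[\delta(f - f_0) + \delta(f + f_0)]. \tag{27.20a}$$
Similarly,
$$\sin(2\pi f_0 t) = \frac{1}{2i}\left(e^{i2\pi f_0 t} - e^{-i2\pi f_0 t}\right),$$
so
$$\sin(2\pi f_0 t) \leftrightarrow \frac{1}{2i}[\delta(f - f_0) - \delta(f + f_0)]. \tag{27.20b}$$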
Therefore, the (real) cosine and sine functions having frequency $f_0$ are each associated with a pair of spectral lines, located at $f = \pm f_0$, as in Fig. 27.12.

Fig. 27.12 Spectral lines at $f = \pm f_0$: (a) $\mathcal{F}[\cos 2\pi f_0 t]$; (b) $i\,\mathcal{F}[\sin 2\pi f_0 t]$.

The delta function is not at all a normal function. It belongs to a class of mathematical entities called generalized functions. They are essential in practical applications, since their use greatly simplifies what would otherwise be very difficult calculations. Generalized functions play a part similar to the symbol i in complex numbers: i is not an ordinary number, but in most ways it behaves like one.
There are apparent anomalies associated with generalized functions; for example, we have just obtained the Fourier transform of $\cos(2\pi f_0 t)$, but the normal definition of a Fourier transform (27.8b) does not work with a periodic function, because the integral does not approach a definite value when we apply the infinite limits of integration. Exact justification and interpretation of these questions are far beyond the scope of this book. You should regard relations such as (27.20) as being usually safe, and call on them as if you were using a dictionary, as did the original inventors of these methods.
In this sense we can represent the Fourier transform of a general periodic function in the form:

Transform of a periodic function

$x_P(t)$ is periodic with period T, and $f_0 = 1/T$.
(a) $$\mathcal{F}[x_P(t)] = \sum_{n=-\infty}^{\infty} X_n\,\delta(f - nf_0),$$
where
(b) $$X_n = f_0\int_{\text{Period}} x_P(t)\,e^{-i2\pi n f_0 t}\,dt. \tag{27.21}$$

The spectral frequency distribution consists of an infinite row of ‘spikes’ $\delta(f - nf_0)$ spaced at equal intervals $f_0$. These are weighted by $X_n$, which are just the two-sided Fourier series coefficients for the periodic function $x_P(t)$ given by (26.18b).
To prove the result (27.21), take the two-sided Fourier series representation (26.18a), $x_P(t) = \sum_{n=-\infty}^{\infty} X_n e^{i2\pi n f_0 t}$, and transform it term by term using $e^{i2\pi n f_0 t} \leftrightarrow \delta(f - nf_0)$, the result preceding (27.20a) with $nf_0$ in place of $f_0$. We obtain
$$\mathcal{F}[x_P(t)] = \sum_{n=-\infty}^{\infty} X_n\,\delta(f - nf_0),$$
which is (27.21a). The coefficients $X_n$ are given by (26.18b), which is the same as (27.21b).
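A discrete analogue of (27.21b) shows how the weights $X_n$ can be computed in practice. If a periodic signal is sampled at N equally spaced points over one period, then `np.fft.fft` divided by N gives the rectangle-rule approximation to $f_0\int x_P e^{-i2\pi n f_0 t}\,dt$, which is exact for a trigonometric polynomial whose harmonics satisfy |n| < N/2. The test signal $x_P(t) = 1 + 2\cos 2\pi f_0 t + \sin 6\pi f_0 t$ with T = 2 is our own choice; its two-sided coefficients are $X_0 = 1$, $X_{\pm 1} = 1$, $X_3 = 1/(2i) = -i/2$, $X_{-3} = i/2$, and zero otherwise.

```python
import numpy as np

T = 2.0
f0 = 1 / T
N = 16                                   # samples per period
t = T * np.arange(N) / N
xp = 1 + 2 * np.cos(2 * np.pi * f0 * t) + np.sin(2 * np.pi * 3 * f0 * t)

# c[k] approximates X_n for n = k (k < N/2) and for n = k - N (k > N/2)
c = np.fft.fft(xp) / N

def X(n):
    """Two-sided Fourier coefficient X_n, read from the FFT output."""
    return c[n % N]

for n in range(-4, 5):
    print(n, np.round(X(n), 12))
```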

27.7 Convolution theorem for Fourier transforms


Suppose that we have a spectral distribution X(f ) which can be written as the
product of two simpler functions X1( f ) and X2( f ), whose inverses we know:
$$X(f) = X_1(f)X_2(f), \tag{27.22}$$
where
$$x_1(t) \leftrightarrow X_1(f), \qquad x_2(t) \leftrightarrow X_2(f). \tag{27.23}$$

The inverse transform of X(f) is given by
$$x(t) = \int_{-\infty}^{\infty} e^{i2\pi f t}X_1(f)X_2(f)\,df, \tag{27.24}$$
in which
$$X_1(f) = \int_{-\infty}^{\infty} e^{-i2\pi f t}x_1(t)\,dt = \int_{-\infty}^{\infty} e^{-i2\pi f u}x_1(u)\,du, \tag{27.25}$$
changing t to u, since t is already in use in (27.24). Substitute (27.25) into (27.24):
$$x(t) = \int_{-\infty}^{\infty} e^{i2\pi f t}\left(\int_{-\infty}^{\infty} e^{-i2\pi f u}x_1(u)\,du\right)X_2(f)\,df = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{i2\pi f(t - u)}x_1(u)\,du\right)X_2(f)\,df$$
$$= \int_{-\infty}^{\infty} x_1(u)\left(\int_{-\infty}^{\infty} e^{i2\pi f(t - u)}X_2(f)\,df\right)du$$
after changing the order of integration (this process is justified in Section 32.1). The interior integral is equal to the inverse of $X_2(f)$ at time (t − u), so it is equal to $x_2(t - u)$. Therefore
$$x(t) = \int_{-\infty}^{\infty} x_1(u)\,x_2(t - u)\,du. \tag{27.26a}$$

If we had started by substituting for $X_2(f)$, we should obviously have arrived at
$$x(t) = \int_{-\infty}^{\infty} x_1(t - u)\,x_2(u)\,du, \tag{27.26b}$$
confirming that the two integrals on the right of (27.26a and b) are equal (as can also be seen directly by the substitution u → t − u). This enables us to invert products of spectral distributions.
The integrals
$$\int_{-\infty}^{\infty} x_1(u)\,x_2(t - u)\,du \quad\text{or}\quad \int_{-\infty}^{\infty} x_1(t - u)\,x_2(u)\,du \tag{27.27a}$$
(which are equal) are often written in the short notation
$$x_1(t) * x_2(t) \quad\text{or}\quad x_2(t) * x_1(t). \tag{27.27b}$$
$x_1(t) * x_2(t)$ (or $x_2(t) * x_1(t)$) is called the convolution of $x_1(t)$ and $x_2(t)$. The result (27.26) is the convolution theorem. In the short notation:

Convolution theorem for Fourier transforms

Let $x_1(t) \leftrightarrow X_1(f)$ and $x_2(t) \leftrightarrow X_2(f)$. Then
$$X_1(f)X_2(f) \leftrightarrow x_1(t) * x_2(t),$$
where $x_1(t) * x_2(t)$ is given by (27.27). (27.28)
In the convolution integrals (27.27a) the variable of integration is u, and t is to be
treated like a parameter.
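The same structure holds for sequences, which is how convolutions are usually computed in practice: the linear convolution of two finite sequences equals the inverse DFT of the product of their zero-padded DFTs. This is the discrete counterpart of (27.28), not the continuous theorem itself; the sketch below (Python with NumPy, arbitrary test sequences) compares `np.convolve` with the FFT route. Padding both inputs to length len(a) + len(b) − 1 prevents the wrap-around that a circular convolution would otherwise introduce.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5, -1.0])

direct = np.convolve(a, b)                    # equals [0, 1, 2.5, 3, -0.5, -3]

L = len(a) + len(b) - 1                       # zero-pad both inputs to this length
via_fft = np.fft.ifft(np.fft.fft(a, L) * np.fft.fft(b, L)).real

print(direct)
print(np.round(via_fft, 12))
```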

Example 27.9 Obtain $x(t) = x_1(t) * x_2(t)$ when $x_1(t) = \Pi(t)$ and $x_2(t) = 1/(1 + t^2)$.

Write the convolution in the form
$$x(t) = \int_{-\infty}^{\infty} x_1(u)\,x_2(t - u)\,du = \int_{-\infty}^{\infty}\frac{\Pi(u)}{1 + (t - u)^2}\,du.$$
Since Π(u) = 0 unless $-\frac12 < u < \frac12$, the limits of integration become $\pm\frac12$, so
$$x(t) = \int_{-\frac12}^{\frac12}\frac{du}{1 + (t - u)^2} \qquad \text{(i)}$$
$$= \int_{t - \frac12}^{t + \frac12}\frac{dv}{1 + v^2} \qquad \text{(after putting } t - u = v\text{)}$$
$$= \arctan(t + \tfrac12) - \arctan(t - \tfrac12) = \arctan\frac{4}{4t^2 + 3},$$
which can be obtained from an addition formula in Appendix B(b).
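The closed form in Example 27.9 can be checked by evaluating the reduced integral over −½ ≤ u ≤ ½ numerically. In this sketch (Python with NumPy; the sample values of t are arbitrary) the trapezoidal rule is compared with both forms of the answer.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

u = np.linspace(-0.5, 0.5, 10_001)            # Pi(u) = 1 here and 0 outside

def conv_numeric(t):
    """int_{-1/2}^{1/2} du / (1 + (t - u)^2)."""
    return trapezoid(1 / (1 + (t - u)**2), u)

for t in (-2.0, 0.0, 0.4, 3.0):
    difference = np.arctan(t + 0.5) - np.arctan(t - 0.5)
    closed = np.arctan(4 / (4 * t**2 + 3))
    print(t, conv_numeric(t), difference, closed)
```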

At (i) in Example 27.9 the limits of integration were modified to take account of the fact that the integrand is zero except over the interval $-\frac12 < u < \frac12$. In many typical cases it is quite awkward to establish the new limits. Consider, for example, the convolution of two identical pulses Π(t):
$$x(t) = \Pi(t) * \Pi(t) = \int_{-\infty}^{\infty}\Pi(u)\,\Pi(t - u)\,du. \tag{27.29}$$
For different values of t, Π(t − u) occupies a different position on the u axis. For certain ranges of t it partially overlaps Π(u) from the left, or from the right, and for other ranges of t there is no overlap, as illustrated in Fig. 27.13.

Fig. 27.13 Π(u) and Π(t − u) for various t: no overlap, overlap from the left, overlap from the right, no overlap.

To take this into account, set up a diagram as in Fig. 27.14, with axes t and u. The region in which Π(u)Π(t − u) is nonzero is easy to find by carrying out the following construction.
(i) Π(u) is nonzero only if $-\frac12 < u < \frac12$. The edges of this region are the straight lines
$$u = -\tfrac12 \quad\text{and}\quad u = \tfrac12.$$
Draw these and label them with the u values.

Fig. 27.14 The (t, u) plane: the lines $u = \pm\frac12$ and $u = t \pm \frac12$ enclose a parallelogram with vertices at (t, u) = (−1, −½), (0, ½), (1, ½), (0, −½); a vertical line marks the current value of t.

(ii) Π(t − u) is nonzero only if $-\frac12 < t - u < \frac12$, or if
$$u - \tfrac12 < t < u + \tfrac12.$$
In terms of u, the edges of this region are therefore
$$u = t + \tfrac12 \quad\text{and}\quad u = t - \tfrac12.$$
Draw these lines and label them.
(iii) The parallelogram enclosed by the four lines contains the (t, u) values for which the integrand in (27.29) is nonzero. Next, draw a vertical line representing the current value of t, as in Fig. 27.14. The effective limits of integration are represented by the points where the t line intersects the sides, and the u values are already written on these sides. (The limits of integration are therefore different functions of t for values of t on either side of a vertex.)
Other ways of interpreting convolution integrals will be found elsewhere, but
this is by far the simplest way for working them out. It can be adapted for use
whatever the nonzero intervals for the two functions may be. In practice it is used
as follows:

Example 27.10 (a) Show that $\Pi(t) * \Pi(t) = \Lambda(t)$, where (Fig. 27.15a)
$$\Lambda(t) = \begin{cases} 1 + t, & -1 \le t \le 0, \\ 1 - t, & 0 \le t \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
(b) Show that $\mathcal{F}[\Lambda(t)] = \operatorname{sinc}^2 f$.

Fig. 27.15 (a) Λ(t). (b) $\mathcal{F}[\Lambda(t)] = \operatorname{sinc}^2 f$.

(a) Put $x(t) = \Pi(t) * \Pi(t)$, and use the diagram Fig. 27.14 as described in (iii) above.
If t < −1 or t > 1, there is no overlap, so x(t) = 0.
If $-1 \le t \le 0$, the limits of integration are from $u = -\frac12$ to $t + \frac12$, so
$$x(t) = \int_{-\frac12}^{t + \frac12} 1 \times 1\,du = 1 + t.$$
If $0 \le t \le 1$, the limits of integration are from $u = t - \frac12$ to $\frac12$, so
$$x(t) = \int_{t - \frac12}^{\frac12} 1 \times 1\,du = 1 - t.$$
Therefore the convolution is equal to Λ(t), shown in Fig. 27.15a.
(b) From the convolution theorem (27.28),
$$\mathcal{F}[\Pi(t) * \Pi(t)] = \mathcal{F}[\Pi(t)]\,\mathcal{F}[\Pi(t)] = \{\mathcal{F}[\Pi(t)]\}^2,$$
and $\mathcal{F}[\Pi(t)] = \operatorname{sinc} f$ by (27.15). Therefore (see Fig. 27.15b)
$$\mathcal{F}[\Lambda(t)] = \operatorname{sinc}^2 f.$$

A more convenient way to express the triangle function is given in (27.30a):

Triangle function
(a) Definition
$$\Lambda(t) = \begin{cases} 1 - |t|, & -1 \le t \le 1; \\ 0, & \text{elsewhere.} \end{cases}$$
(b) Transform
$$\mathcal{F}[\Lambda(t)] = \operatorname{sinc}^2 f. \tag{27.30}$$
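The parallelogram construction amounts to saying that Π(t) * Π(t) is the length of the overlap of the intervals (−½, ½) and (t − ½, t + ½). The sketch below (Python with NumPy; grid sizes are our choices) uses that observation to check (27.30a), and checks (27.30b) by integrating $\Lambda(t)e^{-i2\pi f t}$ over −1 ≤ t ≤ 1 and comparing with `np.sinc(f)**2`.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule; y may be complex."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

def box_conv(t):
    """Pi(t) * Pi(t): length of the overlap of (-1/2, 1/2) and (t - 1/2, t + 1/2)."""
    return max(0.0, min(0.5, t + 0.5) - max(-0.5, t - 0.5))

def triangle(t):
    """Lambda(t) = 1 - |t| on [-1, 1], zero elsewhere (27.30a)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

t = np.linspace(-1.0, 1.0, 20_001)            # the kink at t = 0 falls on a grid point

def triangle_transform(f):
    """Quadrature approximation to F[Lambda](f)."""
    return trapezoid(triangle(t) * np.exp(-2j * np.pi * f * t), t)

for s in (-1.5, -0.6, 0.0, 0.25, 0.9):
    print(s, box_conv(s), triangle(s))
for f in (0.0, 0.5, 1.7):
    print(f, triangle_transform(f), np.sinc(f)**2)
```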

27.8 The shah function


The generalized function $\text{Ш}_T(t)$ (pronounced ‘shah’), otherwise called a Dirac comb, is defined by
$$\text{Ш}_T(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT)$$

Fig. 27.16 (a) $\text{Ш}_T(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT)$. (b) $\mathcal{F}[\text{Ш}_T(t)] = f_0\,\text{Ш}_{f_0}(f) = f_0\sum_{n=-\infty}^{\infty}\delta(f - nf_0)$, where $f_0 = 1/T$.

(Fig. 27.16a). It is an even function consisting of an infinite string of equal ‘spikes’ (the delta functions) spaced at a constant interval T, one of them being at t = 0. Since it is periodic, its Fourier transform is given by (27.21a):
$$\mathcal{F}[\text{Ш}_T(t)] = \sum_{n=-\infty}^{\infty} X_n\,\delta(f - nf_0), \tag{27.31}$$
where $f_0 = 1/T$ (the frequency spacing) and, from (27.21b),
$$X_n = f_0\int_{\text{Period}} e^{-i2\pi n f_0 t}\,\text{Ш}_T(t)\,dt = f_0\sum_{m=-\infty}^{\infty}\int_{-\frac12 T}^{\frac12 T} e^{-i2\pi n f_0 t}\,\delta(t - mT)\,dt = f_0,$$
by the sifting rule (27.19b), since the only delta function within the period is the one with m = 0, where the exponential equals 1. Therefore, from (27.31),
$$\mathcal{F}[\text{Ш}_T(t)] = f_0\,\text{Ш}_{f_0}(f).$$
It is shown in Fig. 27.16b.

The shah function
(a) $$\text{Ш}_T(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT).$$
(b) $$\mathcal{F}[\text{Ш}_T(t)] = f_0\,\text{Ш}_{f_0}(f), \quad\text{where } f_0 = 1/T. \tag{27.32}$$
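A discrete analogue of (27.32) can be seen with the FFT: a sampled comb with a unit spike every M samples out of N transforms into a comb with a spike every N/M frequency bins, each of height N/M. The sketch below (Python with NumPy; N = 60 and M = 10 are arbitrary, with M dividing N) illustrates this. The reciprocity between the two spacings mirrors the roles of T and $f_0 = 1/T$ in the continuous result.

```python
import numpy as np

N, M = 60, 10                    # N samples, one unit spike every M samples
comb = np.zeros(N)
comb[::M] = 1.0

spectrum = np.fft.fft(comb)
spacing = N // M                 # bins between spectral spikes
spikes = np.flatnonzero(np.abs(spectrum) > 1e-9)

print("spike bins:", spikes)
print("spike heights:", np.round(spectrum[spikes].real, 12))
```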

Example 27.11 The function x(t) is zero when $t < -\frac12 T$ and when $t > \frac12 T$. Show that the convolution $y(t) = \text{Ш}_T(t) * x(t)$ is the periodic function, having period T, which agrees with x(t) in the range $-\frac12 T < t < \frac12 T$.

Write
$$\text{Ш}_T(t) * x(t) = \int_{-\infty}^{\infty} x(u)\,\text{Ш}_T(t - u)\,du = \sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty} x(u)\,\delta(t - u - nT)\,du = \sum_{n=-\infty}^{\infty} x(t - nT),$$
using the sifting property (the critical points are where t − u − nT = 0).

Fig. 27.17 (a) x(t) (non-periodic). (b) $\text{Ш}_T(t) * x(t)$.

The term with n = 0 reproduces x(t), which is zero outside the range $-\frac12 T$ to $\frac12 T$. The term with n = 1 slides that graph a distance T to the right, and we have a non-overlapping copy of x(t) in the range $\frac12 T$ to $\frac32 T$, and so on. The general picture is shown in Fig. 27.17: y(t) is a periodic copy of x(t), with period T.

Rules for Fourier transforms and a short table of Fourier transforms are listed
in Appendix G. A longer table of transforms is given by Råde and Westergren
(1995), but note they use an alternative definition of the transform (see the
comments at the end of Section 27.3).

27.9 Energy in a signal: Rayleigh’s theorem


The total energy E carried by a signal, from t = −∞ to t = ∞, often takes the form
$$E = \int_{-\infty}^{\infty}|x(t)|^2\,dt.$$

This can be expressed in terms of the spectral distribution X( f ) as follows:

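Rayleigh’s theorem
$$\int_{-\infty}^{\infty}|x(t)|^2\,dt = \int_{-\infty}^{\infty}|X(f)|^2\,df$$
or
$$\int_{-\infty}^{\infty} x(t)\,\overline{x(t)}\,dt = \int_{-\infty}^{\infty} X(f)\,\overline{X(f)}\,df. \tag{27.33}$$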

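We have
$$E = \int_{-\infty}^{\infty}|x(t)|^2\,dt = \int_{-\infty}^{\infty} x(t)\,\overline{x(t)}\,dt = \int_{-\infty}^{\infty} x(t)\left(\int_{-\infty}^{\infty}\overline{X(f)}\,e^{-i2\pi f t}\,df\right)dt$$
(after expressing x(t) as the inverse transform of X(f), and taking its complex conjugate). Now change the order of integration:
$$E = \int_{-\infty}^{\infty}\overline{X(f)}\left(\int_{-\infty}^{\infty} x(t)\,e^{-i2\pi f t}\,dt\right)df = \int_{-\infty}^{\infty}\overline{X(f)}\,X(f)\,df = \int_{-\infty}^{\infty}|X(f)|^2\,df.$$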

Parseval’s theorem extends this result for cases when the energy depends on two functions, x(t) and y(t), as in the case of current and voltage in circuits. It states that
$$\int_{-\infty}^{\infty} x(t)\,\overline{y(t)}\,dt = \int_{-\infty}^{\infty} X(f)\,\overline{Y(f)}\,df. \tag{27.34}$$
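Rayleigh’s theorem can be checked on the pair of Example 27.3, $x(t) = e^{-|t|}$ and $X(f) = 2/(1 + 4\pi^2 f^2)$, for which both sides of (27.33) equal 1 (the left side is $2\int_0^\infty e^{-2t}\,dt$). The sketch below (Python with NumPy; the cut-offs and grids are our assumptions) evaluates both integrals numerically, and then checks the discrete analogue $\sum|x_j|^2 = (1/N)\sum|X_k|^2$, which holds for `np.fft.fft` with its default normalization.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2

t = np.linspace(-20.0, 20.0, 40_001)          # the kink of e^{-|t|} at t = 0 is a grid point
f = np.linspace(-200.0, 200.0, 400_001)       # |X(f)|^2 decays like f^{-4}

energy_time = trapezoid(np.exp(-2 * np.abs(t)), t)
energy_freq = trapezoid((2 / (1 + 4 * np.pi**2 * f**2))**2, f)
print(energy_time, energy_freq)               # both close to 1

# Discrete analogue for an arbitrary complex sequence
rng = np.random.default_rng(0)
z = rng.normal(size=256) + 1j * rng.normal(size=256)
Z = np.fft.fft(z)
print(np.sum(np.abs(z)**2), np.sum(np.abs(Z)**2) / len(z))
```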

27.10 Diffraction from a uniformly radiating strip



(Note: For the necessary background to waves and phasors see Sections 20.8 and 21.6.)
We shall illustrate a type of calculation which arises in diffraction problems in several branches of physics. In optics it occurs in Fraunhofer diffraction by a narrow slit, and there are similar problems in acoustics. There is also a close connection with the theory of radiating antennas. We shall present the problem in an abstract way, since the process of tailoring it to a real situation involves additional physical considerations.
Consider the half-space z > 0, criss-crossed by travelling waves all having the same frequency f and wavelength λ. At every point P there is a disturbance u(t, P) produced by superposition of all the rays passing through P, and interference between these rays determines the resultant amplitude and phase of the oscillation at P. Instead of using u(t, P) we shall assign a phasor, or complex amplitude (see Section 21.6), U(P) to every point, so that
$$u(t, P) = \operatorname{Re}\left[U(P)\,e^{2\pi i f t}\right].$$
Fig. 27.18 A ray along the axis Oz, passing through the points Q and P.

We need a preliminary result. Figure 27.18 shows a ray directed along an arbitrary axis Oz. It has constant amplitude a. The disturbance is given by
$$u(t, z) = a\cos\left[2\pi\left(\frac{t}{T} - \frac{z}{\lambda}\right) + \phi\right], \tag{27.35}$$

where T, λ, and φ are the period, wavelength, and a constant phase angle, and the wave velocity $v = \lambda/T$ is directed towards the right. Let Q and P be arbitrary fixed points on Oz. In an obvious notation
$$u(t, z_Q) = a\cos\left[2\pi\left(\frac{t}{T} - \frac{z_Q}{\lambda}\right) + \phi\right],$$
$$u(t, z_P) = a\cos\left[2\pi\left(\frac{t}{T} - \frac{z_P}{\lambda}\right) + \phi\right].$$
The corresponding phasors or complex amplitudes at Q and P are UQ, UP given by

27.10
UQ = a ei[φ −(2πizQ /λ)] = a eiφQ,
UP = a ei[φ −(2πizP /λ)] = a eiφP.

DIFFRACTION FROM A UNIFORMLY RADIATING STRIP


The out-of-step behaviour of the oscillations at Q and P is defined by the phase difference

$$\phi_{QP} = \phi_P - \phi_Q = -2\pi(z_P - z_Q)/\lambda.$$

Therefore:

Phase change along a ray QP

$$U_P = U_Q\,e^{i\phi_{QP}} = U_Q\,e^{-2\pi i\,QP/\lambda},$$

where QP is the distance from Q to P. Therefore the phase of the complex amplitude decreases from Q to P by an amount equal to
2π × (distance QP measured in wavelengths). (27.36)
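The phasor bookkeeping behind (27.36) can be checked numerically. This minimal sketch assumes arbitrary values for a, φ, T, λ and for the positions z_Q and z_P. The function names `u` and `phasor` are illustrative only. The sketch confirms two things:

- Re[U(z) e^{2πift}] reproduces the travelling wave (27.35).
- U_P = U_Q e^{−2πi QP/λ}.

```python
import cmath
import math

# Arbitrary illustrative values (assumptions, not from the text).
a, phi = 2.0, 0.3      # amplitude and constant phase angle
T, lam = 0.5, 1.7      # period and wavelength
f = 1 / T              # frequency

def u(t, z):
    """Real disturbance (27.35): a cos[2*pi*(t/T - z/lam) + phi]."""
    return a * math.cos(2 * math.pi * (t / T - z / lam) + phi)

def phasor(z):
    """Complex amplitude U(z) = a exp[i(phi - 2*pi*z/lam)]."""
    return a * cmath.exp(1j * (phi - 2 * math.pi * z / lam))

zQ, zP = 0.4, 3.1

# Re[U(z) exp(2*pi*i*f*t)] reproduces the real wave at any time t.
for t in (0.0, 0.13, 0.77):
    rebuilt = (phasor(zP) * cmath.exp(2j * math.pi * f * t)).real
    assert abs(rebuilt - u(t, zP)) < 1e-12

# (27.36): the phase drops by 2*pi*QP/lam between Q and P.
QP = zP - zQ
print(abs(phasor(zP) - phasor(zQ) * cmath.exp(-2j * math.pi * QP / lam)) < 1e-12)  # True
```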

Figure 27.19 shows an infinite radiating strip in the (x, y) plane having width h, its central line along the y axis, and infinite length −∞ < y < ∞. Each infinitesimal element, or source, δA on the strip emits a harmonic wave spreading equally in all directions (i.e. it generates a spherical wave). We make two assumptions:

- The distribution of sources on the strip is uniform. The contribution to the oscillation from any element of area δA is αδA, where α is independent of position on the strip (one may imagine a uniform distribution of tiny, equal, hemispherical loudspeakers).
- All the sources have the same frequency and phase: they are all oscillating in step.
Since the strip is infinitely long and the source distribution is independent of y, the problem is two-dimensional: the wave fields over all cross-sections y = constant are identical. Figure 27.20 shows the cross-section y = 0, z > 0. P is a typical point distant r from O, and OP is inclined at θ to Oz (the positive direction for θ is clockwise here). Q is a typical elementary source at (0, x), with −½h ≤ x ≤ ½h, and width δx.

The waves arriving at P from all points on the strip SS′ interfere. When the quantities h/λ and r/λ are of the appropriate magnitude, a systematic distribution of intensity is observed as the angle of view θ varies between −½π and ½π. This is called a diffraction pattern or angular spectrum. (Note: in the physics literature, this problem is usually referred to as one-dimensional.)

[Fig. 27.19: Infinite radiating strip, width h, parallel to the y axis, radiating into z > 0; δA is a typical radiating element.]
[Fig. 27.20: Cross-section y = 0 of Fig. 27.19, showing the strip SS′ (−½h ≤ x ≤ ½h), an element Q of width δx, and the field point P at distance r from O and angle θ to Oz.]
Firstly we shall determine the phases of the rays at P. For simplicity we take the common phase of the sources to be zero. Then by (27.36) the phase φ_QP of the complex amplitude component induced at P by the ray QP (Fig. 27.20) is given by

$$\phi_{QP} = -2\pi\,QP/\lambda. \qquad (27.37)$$

To obtain an expression for the length QP, apply the cosine rule (Appendix B(f)) to the triangle OQP:

$$QP = \left[r^2 + x^2 - 2rx\cos\left(\tfrac{1}{2}\pi + \theta\right)\right]^{1/2} = r\left(1 + \frac{2x\sin\theta}{r} + \frac{x^2}{r^2}\right)^{1/2} = r(1 + q)^{1/2},$$

say, where

$$q = \frac{2x\sin\theta}{r} + \frac{x^2}{r^2}.$$
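It can be shown that if h/r < 2(√2 − 1) ≈ 0.83 (which would always be so in practice), then |q| < 1 for all x in −½h ≤ x ≤ ½h and all θ in −½π ≤ θ ≤ ½π. To see this, note the following:

- The largest value of q is h/r + h²/(4r²), attained at x = ½h, θ = ½π. This is less than 1 precisely when h/r < 2(√2 − 1).
- q always exceeds −1 in this range.

In that case we can use the binomial theorem (5.4f) to approximate (1 + q)^{1/2}. The first few terms are given by

$$(1 + q)^{1/2} = 1 + \tfrac{1}{2}q - \tfrac{1}{8}q^2 + \cdots.$$

Therefore

$$r(1 + q)^{1/2} = r\left[1 + \frac{1}{2}\left(\frac{2x\sin\theta}{r} + \frac{x^2}{r^2}\right) - \frac{1}{8}\left(\frac{2x\sin\theta}{r} + \frac{x^2}{r^2}\right)^2 + \cdots\right]$$
$$= r + x\sin\theta + r\left(\frac{1}{2}\frac{x^2}{r^2}\cos^2\theta - \frac{1}{2}\frac{x^3}{r^3}\sin\theta - \frac{1}{8}\frac{x^4}{r^4}\right) + \cdots$$
$$= (r + x\sin\theta) + \frac{x^2}{2r}\left(\cos^2\theta - \frac{x}{r}\sin\theta - \frac{x^2}{4r^2} + \cdots\right). \qquad (27.38)$$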

We shall retain the linear expression (r + x sin θ) as the approximation. However, we must ensure that the error is much smaller than one wavelength λ. Otherwise the subsequent calculation of the interference effect at P of all the sources together will be seriously affected.

The error consists of the neglected group of terms in (27.38), whose leading part is ½(x²/r) cos²θ. We therefore require ½(x²/r) cos²θ ≪ λ, where ≪ means ‘is very much less than’ and merely indicates an order of magnitude. Since x² ≤ ¼h² and cos²θ ≤ 1, this is satisfied for all the x and θ values if

$$h^2/r \ll 8\lambda. \qquad (27.39)$$

Finally we have

Phase change φ_QP along the ray QP (Fig. 27.20)

If −½h ≤ x ≤ ½h and h²/r ≪ 8λ, then
$$\phi_{QP} = -2\pi(r + x\sin\theta)/\lambda,$$
with an error ≪ 2π (radians). (27.40)

Note that the approximation to φ_QP is linear in x sin θ.
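The size of the neglected terms can be checked directly. This sketch uses a hypothetical helper that is not part of the text. It compares the exact path length QP from the cosine rule with the linear approximation r + x sin θ over a grid of strip points and view angles. It then reports the worst phase error 2π|QP − (r + x sin θ)|/λ. The grid resolution n = 201 is an arbitrary choice.

```python
import math

def exact_qp(r, x, theta):
    """Exact distance QP by the cosine rule in triangle OQP."""
    return math.sqrt(r * r + x * x - 2 * r * x * math.cos(math.pi / 2 + theta))

def max_phase_error(h, r, lam, n=201):
    """Worst phase error (radians) of the linear approximation r + x sin(theta).

    Scans x over [-h/2, h/2] and theta over [-pi/2, pi/2] on an n-by-n grid.
    """
    worst = 0.0
    for i in range(n):
        x = -h / 2 + h * i / (n - 1)
        for j in range(n):
            theta = -math.pi / 2 + math.pi * j / (n - 1)
            path_error = abs(exact_qp(r, x, theta) - (r + x * math.sin(theta)))
            worst = max(worst, 2 * math.pi * path_error / lam)
    return worst

print(max_phase_error(h=10, r=1e4, lam=1))  # h^2/r = 0.01 << 8: tiny error
print(max_phase_error(h=10, r=20, lam=1))   # h^2/r = 5: condition violated
```

The two cases behave quite differently:

- With h²/r = 0.01λ, the worst error is below 0.01 rad, comfortably ≪ 2π.
- With h²/r = 5λ, the worst error is several radians, so the linear phase (27.40) would no longer be usable.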


Next we shall transform all distances into multiples of a wavelength λ. Put

$$R = r/\lambda, \qquad X = x/\lambda, \qquad H = h/\lambda. \qquad (27.41)$$

These are natural variables for the problem; the physical outcome depends on the number of wavelengths in h, for example. If we double the wavelength we must double r, x, and h to preserve the same geometry. Since R, X, and H are dimensionless, they are unaffected if the unit of length is changed, say from metres to ångström units. Equation (27.40) becomes:

Phase change φ_QP along the ray QP
(Distances measured in wavelengths.)
If −½H ≤ X ≤ ½H and H²/R ≪ 8, then
$$\phi_{QP} = -2\pi(R + X\sin\theta),$$
accurate to a small fraction of 2π. (27.42)

Suppose that the amplitude of the source at Q is a δX in the new units. We can allow to some extent for attenuation along the ray QP, provided that it depends effectively only on the distance R. We approximate its contribution, δU_Q, to the complex amplitude U_P at P by putting

$$\delta U_Q = u(R)\,e^{-2\pi i X\sin\theta}, \qquad (27.43)$$

where, using (27.42),

$$u(R) = a(R)\,e^{-2\pi i R}\,\delta X, \qquad (27.44)$$

and a(R) also includes an attenuation factor.


The resultant complex amplitude U_P at P arising from all the sources on −½H ≤ X ≤ ½H is then given by

$$U_P = \lim_{\delta X\to 0}\sum \delta U_Q = u(R)\int_{-\frac{1}{2}H}^{\frac{1}{2}H} e^{-2\pi i X\sin\theta}\,dX = u(R)\int_{-\infty}^{\infty}\Pi(X/H)\,e^{-2\pi i X\sin\theta}\,dX, \qquad (27.45)$$

where Π is the top-hat function (27.13).


We are interested only in the dependence of the expression (27.45) for U_P on the angle θ. We are not concerned with the variation with distance, or with the R-dependent part of the phase of U_P. We therefore define the angular spectrum function F(sin θ) by casting off u(R), so that for constant R:

$$F(S) = \int_{-\infty}^{\infty}\Pi(X/H)\,e^{-2\pi i XS}\,dX, \qquad (27.46a)$$

where

$$S = \sin\theta. \qquad (27.46b)$$

By comparing (27.46a) with (27.8b) (with X in place of t and S in place of f), it can be seen that F(S) is the Fourier transform of Π(X/H). Also, we can refer to (27.13b) to evaluate it (with H standing in place of τ). We obtain

$$F(S) = \frac{\sin(\pi SH)}{\pi S} = H\,\mathrm{sinc}(HS). \qquad (27.47)$$

In terms of the original variables x, h, λ, θ, therefore, over a circular arc r = constant,

angular distribution of amplitude ∝ sinc(h sin θ/λ). (27.48a)
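The evaluation (27.47) can be confirmed by computing the integral (27.46a) numerically. The sketch below is an illustration only, under these assumptions:

- It uses a midpoint rule with an arbitrary number of sub-intervals.
- It takes sinc u = sin(πu)/(πu), the convention implied by (27.47).
- It picks H = 5 wavelengths for definiteness.
- `strip_spectrum` is a hypothetical helper name.

```python
import cmath
import math

def sinc(v):
    """Normalised sinc: sin(pi v)/(pi v), with sinc(0) = 1."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)

def strip_spectrum(H, S, n=4000):
    """Midpoint-rule estimate of F(S) = integral of Pi(X/H) exp(-2 pi i X S) dX.

    Pi(X/H) is 1 on -H/2 < X < H/2 and 0 elsewhere, so only that range is summed.
    """
    dX = H / n
    total = 0j
    for k in range(n):
        X = -H / 2 + (k + 0.5) * dX
        total += cmath.exp(-2j * math.pi * X * S) * dX
    return total

H = 5.0  # strip width in wavelengths (illustrative choice)
for S in (0.0, 0.1, 0.2, 0.35, 0.9):
    numeric = strip_spectrum(H, S)
    print(f"S={S:4.2f}  numeric={numeric.real:+.6f}  H*sinc(HS)={H * sinc(H * S):+.6f}")
```

At S = 0.2 = 1/H the computed value is essentially zero, the first null of the pattern.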

This angular dependence is illustrated in Fig. 27.21b. The intensity distribution (see Section 21.4(iii)) is proportional to |U_P|², so that

angular spectrum of intensity ∝ sinc²(h sin θ/λ), (27.48b)

shown in Fig. 27.21c.

[Fig. 27.21: (a) Source distribution ∝ Π(x/h), nonzero for −½h < x < ½h. (b) Amplitude spectrum ∝ sinc(h sin θ/λ), with zeros at sin θ = nλ/h (n = ±1, ±2, …). (c) Intensity spectrum ∝ sinc²(h sin θ/λ).]
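The zeros listed in the caption of Fig. 27.21 give the directions of the dark bands. The small sketch below uses a hypothetical helper, with the strip width supplied as h/λ. It lists the angles θ in (−90°, 90°) at which sin θ = nλ/h for nonzero integers n. If h < λ there is no such angle, so the pattern then has no nulls at all.

```python
import math

def null_angles_deg(h_over_lambda):
    """Directions (degrees, strictly between -90 and 90) of the zeros of sinc(h sin(theta)/lambda).

    Zeros occur where sin(theta) = n * lambda / h for nonzero integers n.
    """
    H = h_over_lambda
    if H <= 0:
        raise ValueError("strip width must be positive")
    angles = []
    n = 1
    while n / H < 1:
        theta = math.degrees(math.asin(n / H))
        angles.extend([-theta, theta])
        n += 1
    return sorted(angles)

print(null_angles_deg(2.5))  # four nulls, where sin(theta) = -0.8, -0.4, 0.4, 0.8
print(null_angles_deg(0.8))  # []: a strip narrower than a wavelength gives no nulls
```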

27.11 General source distribution and the inverse transform

Suppose now that the source distribution is not necessarily confined absolutely to a strip, and is not necessarily uniform. Let its complex amplitude be e(x), where

$$e(x) = |e(x)|\,e^{i\phi(x)}, \qquad -\infty < x < \infty. \qquad (27.49)$$

By following exactly the same procedure, we obtain the angular spectrum

$$F(S) = \int_{-\infty}^{\infty} E(X)\,e^{-2\pi i SX}\,dX, \qquad (27.50)$$

where S = sin θ, X = x/λ, and E(X) = e(x) = e(λX). In principle, the source may be infinitely extended in the ±x directions, though realistically we shall assume E(X) to be negligible beyond a certain range of values. Equation (27.50) is again the Fourier transform of the source distribution, and its inverse transform is given by

$$E(X) = \int_{-\infty}^{\infty} F(S)\,e^{2\pi i SX}\,dS. \qquad (27.51)$$

Equation (27.51) suggests that we might be able to construct a source distribution E(X) (consisting, for example, of a suitable array of antennas) having prescribed directional properties. These properties would be defined by a fairly arbitrary function of direction F(S) = F(sin θ).

However, there seems to be a difficulty. The inversion integral (27.51) requires a value of F(S) at every value of S, −∞ < S < ∞. But in the physical world S stands for sin θ with −½π ≤ θ ≤ ½π, that is, −1 ≤ S ≤ 1. Outside this range we cannot prescribe values of F(S) in advance.
To meet this difficulty we shall add another approximation requirement to the small print of the theory. For ‘well-behaved’ functions (see the remarks following (27.8)), |F(S)| defined by (27.50) approaches zero as S → ±∞. Suppose we can be confident that

$$\int_{-\infty}^{\infty} F(S)\,e^{2\pi i SX}\,dS \approx \int_{-1}^{1} F(S)\,e^{2\pi i SX}\,dS \qquad (27.52)$$

to an acceptable degree of accuracy. Then we may ignore the range |S| > 1 for the purpose of obtaining E(X) from a given F(S).

A commonly arising physical situation that supports the approximation (27.52) involves radiation fields that are strongly directional, the diffracted rays being effectively confined to a fairly narrow range of θ. The radiation from a uniform strip (Section 27.10) is of this character if the dimensions are right. Its spectrum H sinc(HS) has its central lobe in −1/H < S < 1/H, which lies well inside −1 ≤ S ≤ 1 when H ≫ 1.
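How well (27.52) holds for the uniform strip can be estimated by carrying out the truncated inversion numerically. The sketch below uses the midpoint rule; the helper name and the sample widths H = 2 and H = 20 are arbitrary choices. It evaluates the right-hand side of (27.52) with F(S) = H sinc(HS) from (27.47). It then compares the result with the exact inverse Π(X/H), which is 1 at X = 0 and 0 at X = 1.5H.

```python
import cmath
import math

def sinc(v):
    """Normalised sinc: sin(pi v)/(pi v), with sinc(0) = 1."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)

def truncated_inverse(H, X, n=20000):
    """Right-hand side of (27.52) for the uniform strip, F(S) = H sinc(HS).

    Midpoint rule over the visible range -1 <= S <= 1 only.
    """
    dS = 2.0 / n
    total = 0j
    for k in range(n):
        S = -1 + (k + 0.5) * dS
        total += H * sinc(H * S) * cmath.exp(2j * math.pi * S * X) * dS
    return total

for H in (2.0, 20.0):
    centre = truncated_inverse(H, 0.0).real        # exact inverse gives Pi(0) = 1
    outside = truncated_inverse(H, 1.5 * H).real   # exact inverse gives 0 here
    print(f"H={H:4.1f}  centre={centre:.4f}  outside={outside:+.4f}")
```

The accuracy depends strongly on the strip width:

- For H = 20 the truncated integral recovers the centre value to within about 1%.
- For H = 2 the error is nearer 10%.

This is in line with the requirement that the radiation be strongly directional.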

27.12 Transforms in radiation problems


We continue to use the dimensionless variables (27.41) for simplicity of expression. The reformulation of earlier results (see Appendix G(a)) in terms of the new variables is obtained by the following correspondences:

Appendix G(b) symbols:   f             t   x   X   D   B   u
Current symbols:         S (= sin θ)   X   E   F   K   D   W

The results we shall be using, recast in the present notation, are:

Rules for radiation problems

Rule                          Source distribution    Amplitude spectrum
(a) Scaling                   E(AX)                  F(S/A)/|A|
(b) Linear phase factor       E(X)e^{iKX}            F[S − (K/2π)]
(c) Source displacement       E(X − D)               F(S)e^{−2πiDS}
(d) Reciprocity (duality)     F(X)                   E(−S)
(e) Convolution               g(X) * E(X)            G(S)F(S), where G(S) = F[g(X)]

where the convolution is

$$f(X) * g(X) = \int_{-\infty}^{\infty} f(W)\,g(X - W)\,dW = \int_{-\infty}^{\infty} f(X - W)\,g(W)\,dW. \qquad (27.53)$$

We now give some examples showing the significance of these rules for radiation problems. It will be assumed that the estimates (27.42) and (27.52) apply where necessary. Notice that if the effective diffracted range of θ is small enough, S can be identified with θ for the purpose of visualizing the diffraction patterns that arise.

(i) Change of scale

Suppose that F[E(X)] = F(S) and F[E(AX)] = F_A(S), where F(S) and F_A(S) describe the respective angular spectra.

If 0 < A < 1, the graph of E(AX) is obtained from that of E(X) by stretching it uniformly by a factor 1/A > 1 along the X axis. The scaling rule (27.53a) then states that the graph of F_A(S) is obtained by contracting the graph of F(S) by a factor A. (The amplitude of F_A(S) is affected by the factor 1/|A|, but this does not affect the angular distribution, which is all we are interested in.) If A > 1, then E(X) is contracted and F(S) is stretched.

This is illustrated by the uniform source distribution shown in Fig. 27.20, where (in terms of X and H, which are measured in wavelengths)

$$E(X) = \Pi\left(\frac{X}{H}\right), \qquad F(S) = H\,\mathrm{sinc}(HS),$$

so that A = 1/H relative to the unit top hat Π(X). The breadth of the central loop of F(S) and its satellites is therefore inversely proportional to H.

(ii) Linear phase change across a radiating strip

Let E(X) and E(X)e^{iKX} be two source distributions, and F(S) the angular spectrum of E(X). Rule (27.53b) states that the spectrum of E(X)e^{iKX} is equal to F[S − (K/2π)]. The diffracted pattern (described in terms of S) is therefore shifted by a constant amount K/(2π), its shape remaining unchanged. The pattern ‘swings’ through an angle determined by ΔS = K/(2π), where S = sin θ and ΔS is the change in S. Such a phase gradient can be induced across an antenna array to redirect the main beam.
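As a worked illustration of ΔS = K/(2π), the sketch below converts a phase gradient into a new main-lobe direction. The array numbers are hypothetical. A phase that advances by π/2 from one element to the next, with elements half a wavelength apart, is treated as a linear phase gradient K = π radians per wavelength of X. The function refuses any shift that would move the main lobe outside the visible range |S| ≤ 1.

```python
import math

def steered_angle_deg(K, theta0_deg=0.0):
    """New main-lobe direction after multiplying the source by exp(iKX).

    K is the phase gradient in radians per wavelength of X. By rule (27.53b) the
    pattern shifts in S = sin(theta) by K / (2*pi); theta0_deg is the original
    main-lobe direction.
    """
    S = math.sin(math.radians(theta0_deg)) + K / (2 * math.pi)
    if abs(S) > 1:
        raise ValueError("main lobe would move outside the visible range |S| <= 1")
    return math.degrees(math.asin(S))

# Hypothetical array: the phase advances by pi/2 from element to element and the
# elements are half a wavelength apart, modelled as K = (pi/2) / 0.5 = pi rad per wavelength.
K = (math.pi / 2) / 0.5
print(steered_angle_deg(K))  # close to 30 degrees, since Delta S = 0.5
```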
(iii) Displacement of the source

Rule (27.53c) states that if we move the emitter bodily up the X axis by a distance D (wavelengths), then F(S) becomes F(S)e^{−2πiDS}. This result may seem a little curious, since it is physically obvious that the new spectrum is simply the old spectrum moved up a distance D. However, θ is still being measured from the same origin, and this formula, easy to prove, condenses some awkward geometry.

Since |e^{−2πiDS}| = 1, the modulus |F(S)|, and hence the intensity pattern, is unchanged; only the phase varies with S. Note that, by the duality rule (27.53d), this property is the dual of item (ii).

(iv) Interference between two narrow uniform sources

Figure 27.22 shows two identical and uniform radiating strips of width h placed symmetrically a distance d apart, as measured between their centres. The source distribution e(x) is given by

$$e(x) = a\,\Pi\left(\frac{x + \frac{1}{2}d}{h}\right) + a\,\Pi\left(\frac{x - \frac{1}{2}d}{h}\right),$$

where a is a constant. In terms of the dimensionless variables X, H, and D = d/λ,

$$E(X) = a\,\Pi\left(\frac{X + \frac{1}{2}D}{H}\right) + a\,\Pi\left(\frac{X - \frac{1}{2}D}{H}\right). \qquad (27.54)$$
By (27.47),

$$\mathcal{F}[\Pi(X/H)] = F_1(S) = H\,\mathrm{sinc}(HS),$$

where F_1 denotes the spectrum of a single strip centred at the origin. Then by the displacement rule (27.53c), with displacements −½D and +½D,

$$F(S) = aF_1(S)\,e^{\pi i DS} + aF_1(S)\,e^{-\pi i DS} = 2aH\,\mathrm{sinc}(HS)\cos(\pi DS). \qquad (27.55)$$

The zeros of F(S) come from two factors, and they are interlaced:

- The factor sinc(HS) vanishes at S = n/H (n = ±1, ±2, …).
- The factor cos(πDS) vanishes at S = (n + ½)/D (n = 0, ±1, ±2, …).

Suppose D ≫ H (not necessarily hugely greater, but perhaps 10 times greater). Then an interference pattern of the type shown in Fig. 27.23 is obtained for the angular spectrum F(S). The envelope is proportional to sinc(HS). The intensity spectrum is proportional to the square of this function, sinc²(HS) cos²(πDS). If D is very much greater than H, the underlying fine-scale oscillation may be difficult to resolve instrumentally.

[Fig. 27.22: two radiating strips of width h, their centres a distance d apart on the x axis.]
[Fig. 27.23: The angular spectrum F(S) of the arrangement in Fig. 27.22.]
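The closed form can be checked against a direct numerical transform of (27.54). The sketch below takes a = 1, H = 2, and D = 20, so that D is about 10 times H, as in the text. These values and the helper names are illustrative assumptions. The code integrates each strip by the midpoint rule and compares the result with 2aH sinc(HS) cos(πDS). That expression is what follows from applying the displacement rule (27.53c) to (27.54).

```python
import cmath
import math

def sinc(v):
    """Normalised sinc: sin(pi v)/(pi v), with sinc(0) = 1."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)

def two_strip_numeric(a, H, D, S, n=4000):
    """Midpoint-rule transform of (27.54): strips of width H centred at X = -D/2 and X = D/2."""
    dX = H / n
    total = 0j
    for centre in (-D / 2, D / 2):
        for k in range(n):
            X = centre - H / 2 + (k + 0.5) * dX
            total += a * cmath.exp(-2j * math.pi * X * S) * dX
    return total

def two_strip_closed(a, H, D, S):
    """Closed form 2aH sinc(HS) cos(pi D S)."""
    return 2 * a * H * sinc(H * S) * math.cos(math.pi * D * S)

a, H, D = 1.0, 2.0, 20.0  # illustrative values with D = 10H
for S in (0.0, 0.013, 0.025, 0.05, 0.31):
    numeric = two_strip_numeric(a, H, D, S)
    print(f"S={S:5.3f}  numeric={numeric.real:+.6f}  closed={two_strip_closed(a, H, D, S):+.6f}")
```

At S = 0.025 = 1/(2D) the cosine factor vanishes, giving the first of the fine-scale zeros. The first zero of the sinc envelope is at S = 1/H = 0.5.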

(v) Arrays of sources in terms of a convolution

Suppose (see Fig. 27.24) that we have an array of N identical radiating elements centred on the points X = X_1, X_2, …, X_N. The nth element has the distribution E_0(X − X_n)Π[(X − X_n)/H], where H is the constant width (in wavelengths). Assume that X_1, X_2, …, X_N are spaced so that the elements are non-overlapping. The overall source distribution E(X) is then given by

$$E(X) = \sum_{n=1}^{N} E_0(X - X_n)\,\Pi\left(\frac{X - X_n}{H}\right). \qquad (27.56)$$

[Fig. 27.24: An array of sources, each of width H, centred at X_1, X_2, X_3, X_4 along the X = x/λ axis.]

We shall show that E(X) can be expressed in the form of the convolution

$$E(X) = E_0(X)\,\Pi(X/H) * g(X), \qquad (27.57a)$$

where

$$g(X) = \sum_{n=1}^{N}\delta(X - X_n), \qquad (27.57b)$$

and δ represents the delta function (27.19). The function g(X) is called the distribution function for the array. To prove (27.57), use the definition of the convolution (27.53e):

$$E_0(X)\,\Pi\left(\frac{X}{H}\right) * g(X) = \int_{-\infty}^{\infty} E_0(X')\,\Pi\left(\frac{X'}{H}\right)\sum_{n=1}^{N}\delta(X - X' - X_n)\,dX'$$
$$= \sum_{n=1}^{N}\int_{-\infty}^{\infty} E_0(X')\,\Pi\left(\frac{X'}{H}\right)\delta(X - X' - X_n)\,dX'$$
$$= \sum_{n=1}^{N}\int_{-\infty}^{\infty} E_0(X - X_n - w)\,\Pi\left(\frac{X - X_n - w}{H}\right)\delta(w)\,dw$$

(after putting w = X − X_n − X′)

$$= \sum_{n=1}^{N} E_0(X - X_n)\,\Pi\left(\frac{X - X_n}{H}\right)$$

(from the sifting property of the delta function (27.19b))

$$= E(X),$$

as required by (27.57a).

Let the transforms of g(X) and E_0(X)Π(X/H) be given by

$$\mathcal{F}[g(X)] = G(S), \qquad \mathcal{F}[E_0(X)\,\Pi(X/H)] = F_0(S).$$

Then by (27.53e) and (27.57a)

$$\mathcal{F}[E(X)] = \mathcal{F}[E_0(X)\,\Pi(X/H) * g(X)] = G(S)\,F_0(S). \qquad (27.58)$$

Therefore the spectrum of the array is equal to the transform of the array distribution function, multiplied by the spectrum of the single element centred on the origin.

Alternatively, we can obtain G(S) explicitly:

$$G(S) = \int_{-\infty}^{\infty}\sum_{n=1}^{N}\delta(X - X_n)\,e^{-2\pi i XS}\,dX = \sum_{n=1}^{N} e^{-2\pi i X_n S}, \qquad (27.59)$$

so that

$$F(S) = F_0(S)\sum_{n=1}^{N} e^{-2\pi i X_n S}. \qquad (27.60)$$

Each displaced source

E_0(X − X_n)Π[(X − X_n)/H]

is subject to the displacement rule (27.53c). By (iii) above, this rule introduces the factor e^{−2πiX_nS} into the spectrum of E_0(X)Π(X/H). Therefore we obtain the sum (27.60) more directly. This approach is simpler, but for more general cases convolution methods are more versatile.
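For equally spaced elements the sum (27.59) can be evaluated in closed form, which gives a useful check. Assume X_n = nd (n = 1, …, N) with a hypothetical spacing of d wavelengths. Then (27.59) is a geometric series, and |G(S)| = |sin(πNdS)/sin(πdS)|, equal to N when dS is an integer. The sketch below compares this with the direct sum for six elements half a wavelength apart. By (27.60), the full spectrum is this factor multiplied by F_0(S).

```python
import cmath
import math

def array_factor(positions, S):
    """G(S) = sum over n of exp(-2 pi i X_n S), equation (27.59)."""
    return sum(cmath.exp(-2j * math.pi * Xn * S) for Xn in positions)

def uniform_array_magnitude(N, d, S):
    """|G(S)| for X_n = n d (n = 1..N): |sin(pi N d S) / sin(pi d S)|, equal to N when d S is an integer."""
    denom = math.sin(math.pi * d * S)
    if abs(denom) < 1e-12:
        return float(N)
    return abs(math.sin(math.pi * N * d * S) / denom)

N, d = 6, 0.5  # hypothetical array: six elements half a wavelength apart
positions = [n * d for n in range(1, N + 1)]
for S in (0.0, 0.1, 1 / 3, 0.5):
    print(f"S={S:.3f}  direct={abs(array_factor(positions, S)):.6f}  "
          f"closed={uniform_array_magnitude(N, d, S):.6f}")
```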

Problems

27.1 Obtain the Fourier sine and cosine transforms of the function x(t) = e^{−t} for t > 0. Find the value delivered by the inverse sine transform at t = 0. (Hint: cos(2πft) + i sin(2πft) = e^{2πift}.)

27.2 Show that the cosine transform of the function
x(t) = 1 − t for 0 ≤ t ≤ 1, x(t) = 0 for t > 1,
is X_c(f) = sin²(πf)/(π²f²). By considering the inverse transform show that
$$\int_0^{\infty}\frac{\sin^2 u}{u^2}\,du = \tfrac{1}{2}\pi.$$

27.3 Let x(t) = e^{−t²}, t ≥ 0. Use the procedure that follows to show that its Fourier cosine transform, X_c(f), is given by X_c(f) = √π e^{−π²f²}:
(i) Write the integral defining X_c(f), and obtain dX_c/df by differentiating under the integral sign (see Sections 17.9 or 27.8).
(ii) Integrate by parts to obtain the differential equation
$$\frac{dX_c}{df} = -2\pi^2 f X_c,$$
and obtain the general solution.
(iii) Use the fact (see Example 32.11) that
$$\int_0^{\infty} e^{-x^2}\,dx = \tfrac{1}{2}\sqrt{\pi}$$
to provide the initial condition X_c(0) = √π for (ii), and deduce that X_c(f) = √π e^{−π²f²}.

27.4 (a) Show that if x(t) is an even function, then F[x(t)] is an even function of f, and takes the form of the cosine transform of x(t). (Hint: split the range of integration into two parts, −∞ to 0 and 0 to ∞.) (b) Use the result of Problem 27.3 that the cosine transform of e^{−t²} is √π e^{−π²f²} to find the cosine transform of e^{−αt²}, α > 0.

27.5 Prove that if x(t) is an odd function then X(f) is a pure imaginary odd function. Show that X(f) reduces to −iX_s(f), where X_s is the sine transform of x.

27.6 From Problem 27.3, the cosine transform of e^{−t²} is √π e^{−π²f²}. Use this result together with the scaling rule (27.17b) to prove that F[e^{−πt²}] = e^{−πf²}.

27.7 By means of a change of variable show that an alternative form of the Fourier transform pair is given by
$$X(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x(t)\,e^{i\omega t}\,dt, \qquad x(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} X(\omega)\,e^{-i\omega t}\,d\omega.$$

27.8 Prove that if x(t) is an even function then F[x(t)] is an even function of f. Use this fact to reduce the Fourier transform pair to a real form.

27.9 Prove that if x(t) is an odd, real function, then X(f) is a pure imaginary odd function. Show that the Fourier transform pair can then be reduced to a pair of real equations.

27.10 Prove the time-scaling rule, (27.18b), and the time-delay rule, (27.18c).
27.11 By (27.15), F[Π(t)] = sinc f.
(a) Use the time-delay rule (27.18c) to obtain the transform of
x(t) = 1 for 0 < t < 1, x(t) = 0 elsewhere.
(b) Confirm the result (a) by evaluating F[x(t)] directly.
(c) Use the time-delay rule and the time-scaling rule to obtain F[x(t)] where b > ½c and
x(t) = −1 for −b − ½c < t < −b + ½c; x(t) = 1 for b − ½c < t < b + ½c; x(t) = 0 elsewhere.
(Hint: sketch a diagram.)

27.12 Given that F[Λ(t)] = sinc²f (proved in Example 27.10), where
Λ(t) = 1 + t for −1 ≤ t ≤ 0; Λ(t) = 1 − t for 0 ≤ t ≤ 1; Λ(t) = 0 elsewhere,
obtain (a) F[Λ(2t)]; (b) F[Λ(2t − 3)].

27.13 (a) Prove the frequency-shift property, (27.18). (b) Obtain F[x(t)e^{±i2πf₀t}]. (c) From (b) deduce the modulation rules, (27.18), for F[x(t) cos 2πf₀t] and F[x(t) sin 2πf₀t]. (d) Obtain F[Π(½t) cos 2πf₀t] and F[Π(½t) sin 2πf₀t].

27.14 (a) Given that Λ(t) ↔ sinc²f, obtain F[sinc²t] either by using the duality rule (27.18), or by a direct method. (b) Use the result (a), together with the time-delay and time-scaling rules, to find F[sinc²(at + b)]. (Λ(t) is defined by Λ(t) = 1 + t for −1 ≤ t ≤ 0; Λ(t) = 1 − t for 0 ≤ t ≤ 1; Λ(t) = 0 elsewhere.)

27.15 (a) Prove the differentiation rule (27.18). (b) Given that e^{−|t|} ↔ 2/(1 + 4π²f²), obtain F⁻¹[if/(1 + 4π²f²)].

27.16 From the result e^{−αt}H(t) ↔ 1/(α + i2πf), use the time-reversal rule to obtain F[e^{−α|t|}], where α > 0.

27.17 (a) Obtain F[e^{−αt} cos βt H(t)] and F[e^{−αt} sin βt H(t)], where α > 0. (Hint: look at the table of simplifying rules (27.18) before trying to tackle these directly.) (b) Obtain F[e^{−αt} cos(2πf₀t + φ)H(t)], where α > 0.

27.18 Prove that
$$H(t) * \{x(t)H(t)\} = \int_0^t x(\tau)\,d\tau.$$

27.19 (a) Obtain x₁(t) * x₂(t) when x₁(t) = x₂(t) = e^{−t}H(t). (b) Use your result together with the convolution theorem (27.28) to obtain the transform of a new function, t e^{−t}. (c) Obtain F[t e^{−αt}] from (b), where α > 0. (d) Obtain the same result as in (c) by noticing that
$$\frac{d}{d\alpha}\left(e^{-\alpha t}\right) = -t\,e^{-\alpha t}.$$

27.20 (a) Prove that Π(t − ½) * Π(t + ½) = Λ(t). (Hint: use the convolution theorem, (27.28).) (b) Show that Π(t − a) * Π(t − b) = Λ(t − a − b). (c) Show that Π(t) * Π(½t) takes the following values:
- 0 for t ≤ −3/2 and for t ≥ 3/2;
- 3/2 + t for −3/2 ≤ t ≤ −½;
- 1 for −½ ≤ t ≤ ½;
- 3/2 − t for ½ ≤ t ≤ 3/2.

27.21 Show that the total energy in the signal x(t) = e^{−αt}H(t) (α > 0) is equal to 1/(2α). Show that the total energy due to the frequency range −f₀ ≤ f ≤ f₀ is equal to (1/(πα)) arctan(2πf₀/α).

27.22 Prove the result of Example 27.11 by using the convolution theorem (27.28) together with the expression (27.31) for F[Ш_T(t)].

27.23 Use the Fourier transform to obtain a particular solution of the differential equation
$$\frac{d^2x}{dt^2} - x = \frac{1}{1 + t^2}$$
in the form of a convolution integral.

27.24 (a) Given that F[sinc t] = Π(t), deduce that
$$\int_0^{\infty}\frac{\sin u}{u}\,du = \tfrac{1}{2}\pi \quad\text{and}\quad \int_0^{\infty}\mathrm{sinc}\,u\,du = \tfrac{1}{2}.$$
(b) Given (see Problem 27.6) that F[e^{−πt²}] = e^{−πf²}, deduce that F[t e^{−πt²}] = −if e^{−πf²}.

27.25 Use the convolution theorem and time-delay rule (27.17) to show that
sinc t * sinc t = sinc t
and, more generally, that
sinc(t − a) * sinc(t + a) = sinc t.

27.26 The function g_τ(t) defined by
$$g_\tau(t) = \frac{1}{\tau}\int_{t - \frac{1}{2}\tau}^{t + \frac{1}{2}\tau} g(u)\,du$$
is called the moving average of g over a range of length τ. (The output from a recording instrument is often a moving average over a short interval τ.) (a) Show that g_τ(t) = τ⁻¹Π(t/τ) * g(t). (b) Obtain the moving average g_τ(t) when g(t) = Π(t), for values τ = ¼, ¾, 2, and indicate their general nature by sketches.

27.27 In the manner of Example 27.11, interpret the convolution
Ш_T(t − a) * g(t + b)Π({t + b}/τ),
where τ < T.

27.28 Obtain the function h(t) = Π(t) * Λ(t) and its Fourier transform. Use the result to evaluate ∫_{−∞}^{∞} sinc³u du. (Hint: the segments of h(t) should join up continuously (check this), and the work is halved by noticing that h(t) is even.)

27.29 x, y, and z are any three (suitable) functions. Prove that
(a) x(t) * {Ay(t) + Bz(t)} = Ax(t) * y(t) + Bx(t) * z(t);
(b) x(t) * y(t) = y(t) * x(t);
(c) x(t) * {y(t) * z(t)} = {x(t) * y(t)} * z(t) (i.e. the brackets may be omitted).
Part 5
Multivariable calculus

28 Differentiation of functions of two variables

CONTENTS
28.1 Depiction of functions of two variables
28.2 Partial derivatives
28.3 Higher derivatives
28.4 Tangent plane and normal to a surface
28.5 Maxima, minima, and other stationary points
28.6 The method of least squares
28.7 Differentiating an integral with respect to a parameter
Problems

Quantities in nature usually depend on, or are functions of, more than one variable.
The elevation H of land above sea level depends on two map coordinates x and y;
so H is a function of the two variables x and y, and we write H(x, y). If we want to
take account of geological changes, then time t becomes a consideration, and in
that case H is a function of three variables x, y, t, and we write H(x, y, t). It is easy
to produce examples involving many variables; for example, the distance between
two points P : (x1, y1, z1) and Q : (x2, y2, z2) is a function of six variables. The state of
the economy is a function of a multitude of variables. We alternatively speak of a
function in one, two, three, … dimensions.
Suppose that a quantity z, called the dependent variable, depends on two
independent variables x and y. The dependence can often be expressed by an
explicit formula such as
z = x³ + y³, z = e^{x−2y}, z = |xy|,
and so on. To make statements which apply to all sorts of dependence we use the notation
z = f(x, y),
or z = g(x, y) etc. The letter f on its own signifies a particular function or process: a computer subroutine, a particular formula, or a set of rules which will generate a single number z when two numbers x and y are fed to it in the right order. Thus, if
f(x, y) = 2x + y²,
then
f(3, −2) = (2 × 3) + (−2)² = 10,
f(−2, 3) = [2 × (−2)] + 3² = 5,
f(a, b) = 2a + b², f(u², v) = 2u² + v²,
f(−x, x) = −2x + x², f(y, x) = 2y + x².
Notice the last one particularly: it is different from f(x, y).
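The evaluations above can be reproduced with a one-line Python function. This trivial sketch (the name f simply mirrors the text) also shows that swapping the arguments changes the result, as noted for f(y, x).

```python
def f(x, y):
    """The example function f(x, y) = 2x + y^2."""
    return 2 * x + y ** 2

print(f(3, -2))  # 10
print(f(-2, 3))  # 5

# Swapping the arguments gives a different value in general:
x, y = 1.5, 4.0
print(f(x, y), f(y, x))  # 19.0 10.25
```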
The ‘graph’ of a function of two variables, f(x, y), takes the form of a surface z = f(x, y) in three dimensions x, y, z. The partial derivatives represent the slopes of the surface in the x and y directions, from which the slope in any direction can be found. Read carefully Section 28.4 on the tangent plane at a point on the surface, regarded as the plane through the point that best fits the surface at the point. This idea has far-reaching implications for the chapters that follow.
In Sections 28.5 and 28.6 we apply the theory to find maxima and minima of
functions of two variables. These are stationary points (compare Section 4.2 for the
case of a single variable). Notable features of a surface are any points where it
rises to a local maximum, like the top of a hill. The height coordinate z falls away
from the crest in all directions, but further away it may reach a local minimum, like
the bottom of a mixing bowl, and there might be other local maxima and minima.

28.1 Depiction of functions of two variables


Consider the particular function

f(x, y) = x² + y².

Set up x, y, z axes; put

z = x² + y²,

and proceed as if plotting a graph. Take a large number of pairs (x, y), work out z for each, then put the point (x, y, z) in the axes. For example, if x = 1 and y = 2, then z = 5 and we ‘plot’ the point (1, 2, 5) as shown in Fig. 28.1a. For Fig. 28.1b, a great number of points is supposed to have been plotted. They cover a surface shaped like a bowl.

[Fig. 28.1: Depicting a point P on the surface z = f(x, y) = x² + y². (a) The single point P: (1, 2, 5). (b) The point P on the surface formed by many such points.]

Every function has a characteristic surface shape, which is the analogue in three dimensions of the graphs used for functions of a single variable. Some other functions are depicted in Fig. 28.2.

[Fig. 28.2: (a) The plane z = 2x + 4y − 2. (b) The hemisphere z = (1 − x² − y²)^{1/2}. (c) The cone z = (x² + y²)^{1/2}. (d) The saddle z = x² − y².]

[Fig. 28.3: contour map of a patch of countryside, with contours at heights 800 to 1200 and the points A, B, C, E, F marked.]
Another way of depicting a function is to sketch its contour map consisting of
its level curves. Figure 28.3 shows a contour map of a patch of countryside. Along
each contour the height is constant, and is indicated on the curve. The important
features of the terrain are very easy to pick out; there are peaks at A and B, a pass
at C (which is a ‘saddle’ as in Fig. 28.2d), valleys north west and south east of C
and ascents north east and south west of C. At E the contours are close together,
so the slope is steep, and at F the contours are widely spaced so the slopes are
comparatively gentle.
Consider again the function f(x, y) = x² + y² depicted in Fig. 28.1. The contour of height c is the curve

x² + y² = c,

where c > 0, which is a circle of radius c^{1/2}, as shown in Fig. 28.4a. This can be visualized as in Fig. 28.4b, as a horizontal slice of the surface z = x² + y² at height c, projected on to the (x, y) plane.

[Fig. 28.4: (a) Contours x² + y² = c for c = 1, 2, 3, 4 (radii 1, 1.41, 1.73, 2). (b) A horizontal slice of the surface z = x² + y² at height c, projected to give the contour x² + y² = c.]

Example 28.1 Sketch the contours of the function f(x, y) = xy.

The contour of height c is given by the equation
z = xy = c,
or, for c ≠ 0,
y = c/x.
These curves are known as rectangular hyperbolas. For c = 0 the contour xy = 0 consists of the two coordinate axes. By varying c, taking positive and negative values, the contour map or level curves of Fig. 28.5 are obtained.

[Fig. 28.5: Contours xy = c for c = 0, ±1, ±2, ±3, ±4; positive c in the first and third quadrants, negative c in the second and fourth.]

28.2 Partial derivatives

Suppose that z = f(x, y) represents the height above sea level of a piece of countryside. In Fig. 28.6a, an observer stands at the point P: (x, y), facing east, in the direction of the x axis. A short step forward takes the observer to Q: (x + δx, y), up or down a slope. The altitude changes by an amount

δz = f(x + δx, y) − f(x, y).

[Fig. 28.6: (a) A step δx eastward from P to Q. (b) A step δy northward from P to Q.]

The average slope in this direction over the step length δx is δz/δx, so the slope at P facing the observer is given by

$$\lim_{\delta x\to 0}\frac{\delta z}{\delta x} = \lim_{\delta x\to 0}\frac{f(x + \delta x, y) - f(x, y)}{\delta x}.$$

Since the variable y is constant during the step, this is in effect an ordinary derivative, taken with respect to x only. However, it is customary to signal that another variable is present. This is done by using the special sign ∂ (still called ‘dee’) instead of the usual d for the derivative, writing

∂f/∂x or ∂z/∂x

instead of df/dx or dz/dx. This is called the partial derivative of f(x, y), or of z, with respect to x.

If the observer faces north and takes a step δy, as in Fig. 28.6b, then we obtain in the same way the slope ∂f/∂y or ∂z/∂y in the y direction.

Partial derivatives
If z = f(x, y), then
$$\frac{\partial f}{\partial x}\ \text{or}\ \frac{\partial z}{\partial x} = \lim_{\delta x\to 0}\frac{f(x + \delta x, y) - f(x, y)}{\delta x},$$
$$\frac{\partial f}{\partial y}\ \text{or}\ \frac{\partial z}{\partial y} = \lim_{\delta y\to 0}\frac{f(x, y + \delta y) - f(x, y)}{\delta y}. \qquad (28.1)$$

Example 28.2 Find ∂z/∂x and ∂z/∂y at the point x = 1, y = 3 when

z = x²y + 2x² − 3y + 4.

For ∂z/∂x, y has the status of a constant for the purpose of the differentiation, so

∂z/∂x = 2xy + 4x − 0 + 0 = 2xy + 4x.

At the point (1, 3), ∂z/∂x = 10.
For ∂z/∂y, x is treated as constant, so

∂z/∂y = x² + 0 − 3 + 0 = x² − 3.

At (1, 3), ∂z/∂y = −2.
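Definition (28.1) also suggests a simple numerical check of such results: replace the limit by a small but finite step. The sketch below uses central differences [f(x + h, y) − f(x − h, y)]/(2h), a standard variant of the one-sided quotient in (28.1). The step h = 10⁻⁵ and the helper names are arbitrary choices.

```python
def z(x, y):
    """The function of Example 28.2."""
    return x ** 2 * y + 2 * x ** 2 - 3 * y + 4

def partial_x(func, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative with respect to x (y held fixed)."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-5):
    """Central-difference estimate of the partial derivative with respect to y (x held fixed)."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

print(round(partial_x(z, 1, 3), 6))  # 10.0, matching 2xy + 4x at (1, 3)
print(round(partial_y(z, 1, 3), 6))  # -2.0, matching x^2 - 3 at (1, 3)
```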

We often need to indicate the particular point at which a derivative is to be evaluated, like the point (1, 3) in the previous example. There are many notations in use for this purpose. We use

$$\left(\frac{\partial z}{\partial x}\right)_{(a,b)} \text{ and } \left(\frac{\partial f}{\partial x}\right)_{(a,b)} \quad\text{or}\quad \left(\frac{\partial z}{\partial y}\right)_{P} \text{ and } \left(\frac{\partial f}{\partial y}\right)_{P}$$

to mean that the derivatives are to be evaluated at P: (a, b). In this connection, the following definitions are equivalent to (28.1):

Partial derivatives at (a, b)
$$\left(\frac{\partial z}{\partial x}\right)_{(a,b)}\ \text{or}\ \left(\frac{\partial f}{\partial x}\right)_{(a,b)} = \lim_{x\to a}\frac{f(x, b) - f(a, b)}{x - a},$$
$$\left(\frac{\partial z}{\partial y}\right)_{(a,b)}\ \text{or}\ \left(\frac{\partial f}{\partial y}\right)_{(a,b)} = \lim_{y\to b}\frac{f(a, y) - f(a, b)}{y - b}. \qquad (28.2)$$

Example 28.3 Obtain
$$\text{(a)}\ \frac{\partial}{\partial x}\left(\frac{x}{x + y}\right); \qquad \text{(b)}\ \frac{\partial}{\partial y}\left(\frac{1}{(x^2 + y^2)^{1/2}}\right).$$

(a) We hold y constant and use the quotient rule (3.2):
$$\frac{\partial}{\partial x}\left(\frac{x}{x + y}\right) = \frac{\dfrac{\partial x}{\partial x}(x + y) - x\dfrac{\partial}{\partial x}(x + y)}{(x + y)^2} = \frac{y}{(x + y)^2},$$
since ∂x/∂x = 1, and y is constant.

(b) x is held constant. Use the chain rule (3.3), putting
u = x² + y², z = u^{−1/2};
then
$$\frac{\partial z}{\partial y} = \frac{dz}{du}\,\frac{\partial u}{\partial y}.$$
(We write ∂u/∂y instead of du/dy in the chain rule because both x and y are present in u, and x is being held constant.) Continuing, we have
$$\frac{\partial z}{\partial y} = \left(-\tfrac{1}{2}u^{-3/2}\right)(2y) = -y(x^2 + y^2)^{-3/2}.$$

Example 28.4 The potential function V(x, t) = A e^{−qt} sin k(x − ct) represents an attenuating wave travelling to the right along a cable with speed c. Here A, q, k, c are constants. Find (a) the rate of change of V with time t at any fixed point x; (b) the ‘potential gradient’ ∂V/∂x along the cable at any moment.

(a) For ∂V/∂t, use the product rule (3.1) with u = A e^{−qt} and v = sin k(x − ct). We treat x as constant, so ∂v/∂t instead of dv/dt will be written into the product rule:
$$\frac{\partial V}{\partial t} = \frac{\partial (uv)}{\partial t} = u\frac{\partial v}{\partial t} + v\frac{du}{dt}$$
$$= A e^{-qt}[-kc\cos k(x - ct)] + (-qA e^{-qt})\sin k(x - ct)$$
$$= -A e^{-qt}[kc\cos k(x - ct) + q\sin k(x - ct)].$$

(b)
$$\frac{\partial V}{\partial x} = \frac{\partial}{\partial x}\left[A e^{-qt}\sin k(x - ct)\right] = kA e^{-qt}\cos k(x - ct),$$
t being treated as constant.

It will be seen that no new rules have to be learned in order to obtain the partial derivatives of given functions. In fact you have always unconsciously carried out partial differentiation when differentiating expressions like A sin(ωt + φ), without worrying whether A, ω, φ were really constants or just to be treated as such while differentiating.

Self-test 28.1
Obtain the first partial derivatives of f(x, y) when f(x, y) is given by (a) cos²(xy); (b) cos(x² − y²); (c) e^x ln(xy) (xy > 0).

28.3 Higher derivatives


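Having differentiated a function, we might want to differentiate it again. If z = f(x, y), we can form ∂z/∂x and then
$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right) \quad\text{or}\quad \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right),$$
thus forming second derivatives, or derivatives higher than the second. There are four second derivatives, written as follows:
$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^2 z}{\partial x^2}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right) = \frac{\partial^2 z}{\partial y\,\partial x},$$
$$\frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^2 z}{\partial x\,\partial y}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right) = \frac{\partial^2 z}{\partial y^2}.$$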

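Example 28.5 Obtain the four second derivatives when
$$z = x^3 y + x y^2 + x + y^2 + 1.$$
The first derivatives are
$$\frac{\partial z}{\partial x} = 3x^2 y + y^2 + 1, \qquad \frac{\partial z}{\partial y} = x^3 + 2xy + 2y.$$
Therefore
$$\frac{\partial^2 z}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial z}{\partial x}\right) = 6xy,$$
$$\frac{\partial^2 z}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial x}\right) = 3x^2 + 2y,$$
$$\frac{\partial^2 z}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial z}{\partial y}\right) = 3x^2 + 2y,$$
$$\frac{\partial^2 z}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial z}{\partial y}\right) = 2x + 2.$$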

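In the last example, we see that the mixed derivatives satisfy $\partial^2 z/\partial y\,\partial x = \partial^2 z/\partial x\,\partial y$. This is always true for the functions normally met in practice (those whose second derivatives are continuous), although the proof is difficult: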

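Mixed derivatives
For any function f(x, y) whose second derivatives are continuous,
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}.$$
In higher derivatives, the ∂x and ∂y in the denominator may be arranged in any order. (28.3)

For example, $\dfrac{\partial^3 f}{\partial x\,\partial y^2} = \dfrac{\partial^3 f}{\partial y^2\,\partial x} = \dfrac{\partial^3 f}{\partial y\,\partial x\,\partial y}$, and so on.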
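The equality of mixed derivatives can be illustrated, though of course not proved, numerically. The sketch below applies nested central differences in both orders to the function of Example 28.5 and compares them with 3x² + 2y; the step size and test point are arbitrary choices of ours.

```python
def z(x, y):
    # the function of Example 28.5
    return x**3 * y + x * y**2 + x + y**2 + 1


def d_dx(f, h=1e-4):
    """Return a function approximating df/dx by central differences."""
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)


def d_dy(f, h=1e-4):
    """Return a function approximating df/dy by central differences."""
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)


x0, y0 = 1.2, -0.5
d2z_dydx = d_dy(d_dx(z))(x0, y0)   # differentiate with respect to x first, then y
d2z_dxdy = d_dx(d_dy(z))(x0, y0)   # differentiate with respect to y first, then x
exact = 3 * x0**2 + 2 * y0

print(d2z_dydx, d2z_dxdy, exact)
```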
The next example shows how to manage a problem in notation. Often a function
f(x, y) is used in which the variables x and y only occur in a fixed combination
u = h(x, y), so that
f(x, y) = g(u), with u = h(x, y),
where g represents a general, unspecified, function of a single variable. To obtain a general formula for ∂f/∂x use the chain rule (3.3) (see also Example 4.2c):
$$\frac{\partial f}{\partial x} = \frac{dg}{du}\,\frac{\partial u}{\partial x} = g'(u)\,\frac{\partial u}{\partial x} = g'[h(x, y)]\,\frac{\partial h}{\partial x}.$$

It is a common mistake to write ∂g/∂x instead of g′[h(x, y)] in this context, presumably misreading the chain rule. You must work out g′(u) first, before substituting u = h(x, y). Thus suppose that f(x, y) = g(5x − 3y); then
$$\frac{\partial f}{\partial x} = 5g'(5x - 3y) \quad\text{and}\quad \frac{\partial f}{\partial y} = -3g'(5x - 3y).$$

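If z = g(u), where u = h(x, y), then
$$\frac{\partial z}{\partial x} = \frac{dg}{du}\,\frac{\partial u}{\partial x} = g'(h(x, y))\,\frac{\partial h}{\partial x},$$
$$\frac{\partial z}{\partial y} = \frac{dg}{du}\,\frac{\partial u}{\partial y} = g'(h(x, y))\,\frac{\partial h}{\partial y}. \tag{28.4}$$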
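To see (28.4) at work on the example f(x, y) = g(5x − 3y), here is a sketch that takes g(u) = sin u (our own sample choice, so that g′(u) = cos u) and compares the chain-rule results with finite-difference estimates.

```python
import math


def g(u):
    return math.sin(u)          # sample choice of g


def g_prime(u):
    return math.cos(u)          # g'(u), found before substituting u = 5x - 3y


def f(x, y):
    return g(5 * x - 3 * y)     # f(x, y) = g(h(x, y)) with h(x, y) = 5x - 3y


x0, y0, step = 0.3, 0.8, 1e-6
num_fx = (f(x0 + step, y0) - f(x0 - step, y0)) / (2 * step)
num_fy = (f(x0, y0 + step) - f(x0, y0 - step)) / (2 * step)
chain_fx = 5 * g_prime(5 * x0 - 3 * y0)    # (28.4) with dh/dx = 5
chain_fy = -3 * g_prime(5 * x0 - 3 * y0)   # (28.4) with dh/dy = -3

print(f"df/dx: numerical {num_fx:.8f}, chain rule {chain_fx:.8f}")
print(f"df/dy: numerical {num_fy:.8f}, chain rule {chain_fy:.8f}")
```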

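Example 28.6 Prove that if z = φ(x − ct), where φ is any function, then
$$\frac{\partial^2 z}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 z}{\partial t^2}.$$
Put z = φ(u) where u = x − ct. Then
$$\frac{\partial z}{\partial x} = \frac{d\varphi}{du}\,\frac{\partial u}{\partial x} = \varphi'(u).$$
By the chain rule again,
$$\frac{\partial^2 z}{\partial x^2} = \frac{\partial}{\partial x}\varphi'(u) = \frac{d\varphi'(u)}{du}\,\frac{\partial u}{\partial x} = \varphi''(u). \tag{i}$$
Similarly
$$\frac{\partial z}{\partial t} = \frac{d\varphi}{du}\,\frac{\partial u}{\partial t} = \varphi'(u)(-c),$$
so
$$\frac{\partial^2 z}{\partial t^2} = \frac{\partial}{\partial t}\left[-c\varphi'(u)\right] = \frac{d}{du}\left[-c\varphi'(u)\right]\frac{\partial u}{\partial t} = (-c)^2\varphi''(u). \tag{ii}$$
Therefore, from (i) and (ii),
$$\frac{\partial^2 z}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 z}{\partial t^2}.$$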

The equation
$$\frac{\partial^2 z}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 z}{\partial t^2}$$
in Example 28.6 is called the wave equation in one space dimension. It is a partial differential equation as contrasted with the ordinary differential equations treated earlier in the book. We have verified that φ(x − ct) is always a solution, for any function φ. The general solution is
$$\varphi(x - ct) + \psi(x + ct),$$
where φ and ψ are arbitrary functions. The general solution of partial differential equations involves arbitrary functions rather than the arbitrary constants that occur in ordinary differential equations: even the simple equation ∂z/∂x = 0 has the general solution z = f(y), where f(y) is an arbitrary function.
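The claim that φ(x − ct) + ψ(x + ct) satisfies the wave equation for arbitrary φ and ψ can be spot-checked numerically. In the sketch below φ, ψ and c are arbitrary smooth sample choices of ours, and the second derivatives are approximated by second central differences; the residual should be small compared with the individual terms.

```python
import math

c = 2.5  # sample wave speed


def phi(u):
    return math.exp(-u * u)      # arbitrary smooth profile (sample choice)


def psi(u):
    return math.sin(3 * u)       # a second arbitrary profile (sample choice)


def z(x, t):
    return phi(x - c * t) + psi(x + c * t)


def second_diff(f, a, h=1e-4):
    """Second central difference approximating f''(a)."""
    return (f(a + h) - 2 * f(a) + f(a - h)) / (h * h)


x0, t0 = 0.7, 0.2
z_xx = second_diff(lambda x: z(x, t0), x0)
z_tt = second_diff(lambda t: z(x0, t), t0)
residual = z_xx - z_tt / c**2

print(f"z_xx = {z_xx:.6f}, z_tt/c^2 = {z_tt / c**2:.6f}, residual = {residual:.2e}")
```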

Self-test 28.2
If $u(x, y) = x^2y^2/(x + y)$, show that
$$x\frac{\partial^2 u}{\partial x^2} + y\frac{\partial^2 u}{\partial x\,\partial y} = 2\frac{\partial u}{\partial x}.$$

28.4 Tangent plane and normal to a surface


The tangent plane to a surface z = f(x, y) at a point Q on the surface plays the
same role as the tangent line to a curve for functions of a single variable. The tan-
gent plane is the plane that fits the surface near Q better than any other possible
plane, as when a coin is pressed against a teapot at a particular point.
Suppose that the tangent plane at Q (Fig. 28.7) has the equation
z = Ax + By + C.
Fig. 28.7 The tangent plane to z = f(x, y) at Q : (a, b, c), where c = f(a, b).

There are three constants to be determined, so we need three conditions to settle the values. The conditions it is reasonable to expect the tangent plane to satisfy are
(i) It must pass through Q; so c = Aa + Bb + C.
(ii) In the x direction at Q, the slope A of the plane must be equal to the slope of the surface; so
$$A = \left(\frac{\partial f}{\partial x}\right)_Q.$$
(iii) In the y direction at Q the slope B of the plane must be equal to the slope of the surface; so
$$B = \left(\frac{\partial f}{\partial y}\right)_Q.$$
Then the equation for the tangent plane becomes
$$z = \left(\frac{\partial f}{\partial x}\right)_Q x + \left(\frac{\partial f}{\partial y}\right)_Q y + \left[c - \left(\frac{\partial f}{\partial x}\right)_Q a - \left(\frac{\partial f}{\partial y}\right)_Q b\right],$$
or, more tidily,
$$z - c = \left(\frac{\partial f}{\partial x}\right)_Q (x - a) + \left(\frac{\partial f}{\partial y}\right)_Q (y - b),$$
where the values of c, ∂f/∂x and ∂f/∂y are to be calculated using z = f(x, y).

Tangent plane at Q : (a, b, c) on the surface z = f(x, y)
$$z - c = \left(\frac{\partial f}{\partial x}\right)_{(a,b)}(x - a) + \left(\frac{\partial f}{\partial y}\right)_{(a,b)}(y - b). \tag{28.5}$$

Example 28.7 Find the equation of the tangent plane at the point Q : (2, 1, −2) on the sphere $x^2 + y^2 + z^2 = 9$.
Recast the equation into the form z = f(x, y), noticing that Q is on the lower half of the sphere:
$$z = -(9 - x^2 - y^2)^{1/2}.$$
Work out the coefficients first. The chain rule gives:
$$\frac{\partial f}{\partial x} = -(-2x)\cdot\tfrac{1}{2}(9 - x^2 - y^2)^{-1/2}, \quad\text{and}\quad \left(\frac{\partial z}{\partial x}\right)_{(2,1)} = 1;$$
$$\frac{\partial f}{\partial y} = -(-2y)\cdot\tfrac{1}{2}(9 - x^2 - y^2)^{-1/2}, \quad\text{and}\quad \left(\frac{\partial z}{\partial y}\right)_{(2,1)} = \tfrac{1}{2}.$$
Therefore the equation of the tangent plane at Q is
$$z - (-2) = 1(x - 2) + \tfrac{1}{2}(y - 1),$$
or
$$z = x + \tfrac{1}{2}y - \tfrac{9}{2}.$$
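The tangent-plane recipe (28.5) is easy to mechanize. This sketch (the helper name tangent_plane is hypothetical, introduced only here) estimates the two slopes at (a, b) by central differences and returns the coefficients A, B, C of z = Ax + By + C; applied to Example 28.7 it should reproduce z = x + ½y − 9/2 up to rounding.

```python
import math


def f(x, y):
    # lower half of the sphere x^2 + y^2 + z^2 = 9
    return -math.sqrt(9 - x * x - y * y)


def tangent_plane(f, a, b, h=1e-6):
    """Return (A, B, C) with z = A x + B y + C, following (28.5)."""
    c = f(a, b)
    A = (f(a + h, b) - f(a - h, b)) / (2 * h)   # slope in the x direction at Q
    B = (f(a, b + h) - f(a, b - h)) / (2 * h)   # slope in the y direction at Q
    return A, B, c - A * a - B * b


A, B, C = tangent_plane(f, 2.0, 1.0)
print(f"z = {A:.4f} x + {B:.4f} y + ({C:.4f})")
```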

A straight line SQR (Fig. 28.8) is said to be normal or perpendicular to the surface z = f(x, y) at Q if it is perpendicular to its tangent plane at Q. The equation (28.5) for the tangent plane can be written in the form
$$\left(\frac{\partial f}{\partial x}\right)_Q x + \left(\frac{\partial f}{\partial y}\right)_Q y + (-1)z = C,$$
where C is a constant, so (see eqn (10.22)) a triplet of direction ratios for the line normal to the surface at Q is
$$\left(\left(\frac{\partial f}{\partial x}\right)_Q,\ \left(\frac{\partial f}{\partial y}\right)_Q,\ -1\right), \tag{28.6}$$
which are the coefficients of x, y, z.

Fig. 28.8 The normal line SQR through Q, and a normal vector n placed at Q.

Example 28.8 Find the cartesian (x, y, z) equation of the straight line normal to the surface $x^2 + y^2 + z^2 = 9$ at (2, 1, −2).
From Example 28.7 (which has the same data), the direction ratios in (28.6) are
$$1,\ \tfrac{1}{2},\ -1.$$
Therefore the equation of the normal line at Q is (see Section 10.9)
$$\frac{x - 2}{1} = \frac{y - 1}{\tfrac{1}{2}} = \frac{z + 2}{-1}.$$

The triplet of direction ratios in (28.6) can be regarded as the three components of any vector parallel to the normal line. Such a vector is also called a normal vector at Q, and is usually denoted by n:
$$\mathbf{n} = \left(\left(\frac{\partial z}{\partial x}\right)_Q,\ \left(\frac{\partial z}{\partial y}\right)_Q,\ -1\right).$$
Any multiple of this vector is another normal vector, since it will be parallel to the same line. A normal vector placed at Q is shown in Fig. 28.8.

Normal vector n at Q : (a, b, c), where c = f(a, b), on the surface z = f(x, y)
$$\mathbf{n} = \left(\frac{\partial z}{\partial x}\right)_Q \mathbf{i} + \left(\frac{\partial z}{\partial y}\right)_Q \mathbf{j} + (-1)\,\mathbf{k} = \left(\left(\frac{\partial z}{\partial x}\right)_Q,\ \left(\frac{\partial z}{\partial y}\right)_Q,\ -1\right),$$
or any multiple of this vector. Its components are direction ratios of the normal line at Q. (28.7)

Example 28.9 Find several vectors normal to the sphere $x^2 + y^2 + z^2 = 9$ at the point (2, 1, −2) on the sphere.
The data are again the same as in Example 28.7. The normal taken from (28.7) is $(1, \tfrac{1}{2}, -1)$. Another is $(-1, -\tfrac{1}{2}, 1)$, pointing in the opposite direction, while $(\tfrac{2}{3}, \tfrac{1}{3}, -\tfrac{2}{3})$ is a unit vector which is a normal.
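Normalizing the normal vector of Example 28.9 takes one line. In the sketch below, an independent check is that for a sphere centred at the origin the unit normal should also lie along the radius vector (2, 1, −2); a few points Q + s n on the normal line of Example 28.8 are also generated (the parameter values s are arbitrary).

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(comp * comp for comp in v))
    return tuple(comp / length for comp in v)


# Normal (28.7) at Q = (2, 1, -2), using the slopes 1 and 1/2 from Example 28.7
n = (1.0, 0.5, -1.0)
opposite = tuple(-comp for comp in n)
n_hat = unit(n)

# For a sphere centred at O the normal should lie along the radius OQ
radial = unit((2.0, 1.0, -2.0))

# A few points Q + s*n on the normal line of Example 28.8
Q = (2.0, 1.0, -2.0)
line_points = [tuple(q + s * comp for q, comp in zip(Q, n)) for s in (0.0, 1.0, -2.0)]

print("n =", n, " opposite =", opposite)
print("unit normal =", n_hat, " radial direction =", radial)
print("points on the normal line:", line_points)
```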

Self-test 28.3
Find the equations of the tangent planes to the surface z = x2 + y2 at the four
points (±1, ±2). Find the region in the x,y plane bounded by the tangent planes.

28.5 Maxima, minima, and other stationary points


For a function of a single variable, a local maximum or minimum or a point of
inflection occurs where the tangent line to the graph of the function is horizontal.
For a function f(x, y) of two variables, there are similar possibilities at points where
the tangent plane is horizontal. Such points, or rather their (x, y) coordinates, are
called stationary points of f(x, y), because as we pass through them the function
is momentarily neither increasing nor decreasing. Sometimes a stationary point is
a local minimum or maximum as illustrated in Fig. 28.9a, b.
The condition for the tangent plane at Q on z = f(x, y) to be horizontal is that
the normal n at Q should be vertical, or parallel to the z axis. Therefore the x and
y components of n in (28.7) must be zero:
$$\frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0.$$

Fig. 28.9 (a) A local minimum. (b) A local maximum. (c) A saddle. (d) A shoulder.
These constitute two simultaneous equations whose solutions (x, y) are the stationary points of f(x, y).

Stationary points of f(x, y)
are at the simultaneous solutions (x, y) of
$$\frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0. \tag{28.8}$$

We shall usually describe a stationary point of f(x, y) as being ‘at P : (x, y)’ rather
than ‘at Q : (x, y, z) on z = f(x, y)’. If necessary, the corresponding value of z can
be worked out after finding (x, y).

Example 28.10 Find the stationary points of
$$f(x, y) = \tfrac{1}{3}x^3 - xy^2 - 2y,$$
and the value of f(x, y) there.
Since ∂f/∂x = x² − y² and ∂f/∂y = −2xy − 2, stationary points occur where
$$x^2 - y^2 = 0, \qquad xy + 1 = 0.$$
The first equation is equivalent to y = ±x. Consider these alternatives separately:
If y = x, the second equation becomes x² + 1 = 0, which has no solution. Therefore reject y = x.
If y = −x, the second equation becomes −x² + 1 = 0, which has solutions x = ±1. Corresponding to these we have
$$y = -x = \mp 1.$$
Therefore there are two stationary points, (1, −1) and (−1, 1). The values of f(x, y) at these points are
$$f(1, -1) = \tfrac{4}{3}, \qquad f(-1, 1) = -\tfrac{4}{3}.$$
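Exact rational arithmetic gives a clean check of Example 28.10. The sketch below uses Python's standard fractions module to confirm that both first derivatives vanish at (1, −1) and (−1, 1) and to evaluate f there without rounding.

```python
from fractions import Fraction


def f(x, y):
    return Fraction(1, 3) * x**3 - x * y**2 - 2 * y


def f_x(x, y):
    return x**2 - y**2


def f_y(x, y):
    return -2 * x * y - 2


results = {}
for x, y in [(1, -1), (-1, 1)]:
    X, Y = Fraction(x), Fraction(y)
    results[(x, y)] = (f_x(X, Y), f_y(X, Y), f(X, Y))
    print((x, y), "f_x =", f_x(X, Y), " f_y =", f_y(X, Y), " f =", f(X, Y))
```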

A stationary point at (a, b) is a local maximum if f(a, b) is greater than f(x, y) at all other points in its immediate locality; it is a local minimum if the words 'less than' are substituted for 'greater than'. On a contour map, a maximum or minimum shows its presence by being surrounded by closed contours, as illustrated in Fig. 28.10b for the surface shown in Fig. 28.10a.

Fig. 28.10 (a) The surface $z = -y^2 - \tfrac{1}{2}x^4 + x^2$, showing maxima at Q1 and Q3, and a saddle point at Q2. (b) The corresponding contour map, showing closed level curves around the maxima.
As with functions of a single variable, the criteria for a maximum or minimum involve higher derivatives. The following test enables maxima, minima, and other stationary points to be distinguished in most cases, but we omit the proof, which is difficult.


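Test for the character of a stationary point P : (a, b) of f(x, y)
Suppose that ∂f/∂x = ∂f/∂y = 0 at P. Then P is
(a) a saddle if $\dfrac{\partial^2 f}{\partial x^2}\dfrac{\partial^2 f}{\partial y^2} - \left(\dfrac{\partial^2 f}{\partial x\,\partial y}\right)^2 < 0$ at P,
(b) a maximum if $\dfrac{\partial^2 f}{\partial x^2}\dfrac{\partial^2 f}{\partial y^2} - \left(\dfrac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ with $\dfrac{\partial^2 f}{\partial x^2} < 0$ (or $\dfrac{\partial^2 f}{\partial y^2} < 0$) at P,
(c) a minimum if $\dfrac{\partial^2 f}{\partial x^2}\dfrac{\partial^2 f}{\partial y^2} - \left(\dfrac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ with $\dfrac{\partial^2 f}{\partial x^2} > 0$ (or $\dfrac{\partial^2 f}{\partial y^2} > 0$) at P.
(d) If none of these apply, the point might be any type. (28.9)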

We can hint at the reason for the conditions in (28.9) by considering the particular function
$$f(x, y) = \tfrac{1}{2}ax^2 + hxy + \tfrac{1}{2}by^2, \tag{28.10}$$
where a, h, and b are constants. It follows that the derivatives are given by
$$\frac{\partial f}{\partial x} = ax + hy, \qquad \frac{\partial f}{\partial y} = hx + by,$$
$$\frac{\partial^2 f}{\partial x^2} = a, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = h, \qquad \frac{\partial^2 f}{\partial y^2} = b.$$
Therefore f(x, y) has a stationary point where
$$ax + hy = 0, \qquad hx + by = 0.$$
Provided ab ≠ h², the only solution is x = y = 0, so the function has a single stationary point, at (0, 0).
Assuming that a ≠ 0, we can rewrite (28.10) in the form (completing the square):
$$f(x, y) = \tfrac{1}{2}a\left(x + \frac{hy}{a}\right)^2 + \frac{1}{2a}\left(ab - h^2\right)y^2.$$
Hence, for all (x, y) ≠ (0, 0),
f(x, y) > f(0, 0) = 0 if a > 0 and ab − h² > 0 (minimum);
f(x, y) < f(0, 0) = 0 if a < 0 and ab − h² > 0 (maximum).
If ab − h² < 0 then, irrespective of the sign of a, there are values of (x, y) for which f(x, y) > 0, and other values for which f(x, y) < 0 for the same parameter values, so (0, 0) is a saddle.

The function given by (28.10) is a quadratic form which could be an approximation to a more general function near the origin. This local approximation obtained by a Taylor series in two variables (not considered in this book) forms the basis of the justification of (28.9).

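Example 28.11 Find and classify the stationary points of
$$f(x, y) = \tfrac{1}{3}x^3 + \tfrac{1}{3}y^3 - x^2 - y^2.$$
The stationary points are the solutions of ∂f/∂x = 0, ∂f/∂y = 0, or
$$x^2 - 2x = 0, \qquad y^2 - 2y = 0.$$
From the first, we obtain x = 0 or x = 2. From the second, y = 0 or y = 2. Therefore there are stationary points at (0, 0), (0, 2), (2, 0), (2, 2). To test them, we need the second derivatives at a general point:
$$\frac{\partial^2 f}{\partial x^2} = 2x - 2, \qquad \frac{\partial^2 f}{\partial y^2} = 2y - 2, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 0.$$
At (0, 0), these become respectively −2, −2, 0. Then
$$\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 = 4 > 0, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2} = -2 < 0.$$
Since the conditions of (28.9b) apply, the point is a maximum.
At (0, 2) and (2, 0),
$$\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 = -4 < 0;$$
so, by (28.9a), both points are saddles.
At (2, 2),
$$\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 = 4 > 0, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2} = 2 > 0.$$
Therefore, by (28.9c), the point is a minimum.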
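The test (28.9) translates directly into a small function. The sketch below (the function name classify is our own) takes the three second derivatives at a stationary point and returns its character, with "undecided" standing for case (d). It is applied to the four points of Example 28.11, and to x⁴ + y⁴ at the origin from Problem 28.12(n), where all second derivatives vanish.

```python
def classify(fxx, fyy, fxy):
    """Character of a stationary point from the test (28.9)."""
    D = fxx * fyy - fxy**2
    if D < 0:
        return "saddle"                                   # (28.9a)
    if D > 0:
        # D > 0 forces fxx != 0, and fxx, fyy then share the same sign
        return "maximum" if fxx < 0 else "minimum"        # (28.9b), (28.9c)
    return "undecided"                                    # (28.9d)


# Example 28.11: f_xx = 2x - 2, f_yy = 2y - 2, f_xy = 0
points = [(0, 0), (0, 2), (2, 0), (2, 2)]
example = {(x, y): classify(2 * x - 2, 2 * y - 2, 0) for x, y in points}
print(example)

# Problem 28.12(n): x^4 + y^4 at (0, 0) has all second derivatives zero
print(classify(0, 0, 0))
```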

Self-test 28.4
A container with no lid has a triangular base in the form of an equilateral
triangle of side-length a, with vertical sides of height h. If the surface area is
a constant A, what are the dimensions of the container of maximum volume?

28.6 The method of least squares


Suppose that a succession of experiments is performed in which we vary one
quantity x, such as voltage applied to a circuit, and measure the corresponding
value of another variable y, say the resulting current. The values recorded for y
might be subject to random errors of measurement; on a graph of the results, this
will show up as scatter among the points, as in Fig. 28.11.
Fig. 28.11 Scattered data points and a candidate line y = ax + b; $e_n$ is the vertical deviation of the point $(x_n, y_n)$ from the line.

We might have reason to believe that the underlying relation between x and y is a
straight line. There is no way of deducing this line with certainty, but the follow-
ing method is often used to obtain a convincing straight line fit to the points.
Suppose that there are N points altogether; call them
$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_N, y_N).$$
The general point is called $(x_n, y_n)$. Figure 28.11 shows a candidate for the best-fitting straight line,
$$y = ax + b,$$
and we have to adjust the constants a and b to obtain a good fit. The vertical deviation $e_n$ of a point $(x_n, y_n)$ from the line is shown:
$$e_n = y_n - (ax_n + b).$$
The criterion we shall use to determine the best straight line is to choose a and b so that $\sum_{n=1}^{N} e_n^2$ is as small as possible; that is to say, we want to minimize
$$\sum_{n=1}^{N} e_n^2 = \sum_{n=1}^{N}(y_n - ax_n - b)^2 = f(a, b)\ \text{(say)}.$$
Therefore a and b are the variables in this problem, and everything else has fixed values.
For a minimum, we require at least that
$$\frac{\partial f}{\partial a} = \frac{\partial f}{\partial b} = 0.$$
The derivatives are given by
$$\frac{\partial f}{\partial a} = \sum_{n=1}^{N} 2(-x_n)(y_n - ax_n - b) = 2\sum_{n=1}^{N}\left(ax_n^2 + bx_n - x_n y_n\right),$$
$$\frac{\partial f}{\partial b} = \sum_{n=1}^{N}(-2)(y_n - ax_n - b) = 2\sum_{n=1}^{N}\left(ax_n + b - y_n\right).$$
Noting that $\sum_{n=1}^{N} b = b + b + \cdots + b = Nb$, we find the conditions for a minimum as the following pair of simultaneous equations for a and b:

Method of least squares
To fit a straight line y = ax + b to the N points $(x_n, y_n)$ (n = 1, 2, …, N): find a and b by solving the simultaneous linear equations
$$a\sum_{n=1}^{N} x_n^2 + b\sum_{n=1}^{N} x_n = \sum_{n=1}^{N} x_n y_n,$$
$$a\sum_{n=1}^{N} x_n + bN = \sum_{n=1}^{N} y_n. \tag{28.11}$$

We shall not prove that the stationary point of f(a, b) found by this method is
actually a minimum (see Problem 28.21).

Example 28.12 Find the straight line which best fits the data:
$x_n$: 0.0 1.1 3.2 3.9 7.1 8.9
$y_n$: 1.1 1.6 1.6 2.8 2.9 3.8
Here N = 6, and the coefficients in (28.11) are
$$\sum_{n=1}^{6} x_n = 24.2, \qquad \sum_{n=1}^{6} y_n = 13.8,$$
$$\sum_{n=1}^{6} x_n^2 = 156.28, \qquad \sum_{n=1}^{6} x_n y_n = 72.21.$$
The equations for a and b therefore become
$$156.28a + 24.2b = 72.21,$$
$$24.2a + 6b = 13.8.$$
By solving these we find that a = 0.28, b = 1.16, so the required line is y = 0.28x + 1.16.
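The normal equations (28.11) form a 2 × 2 linear system, which the sketch below solves by Cramer's rule (our choice of method; any linear solver would do), carrying full floating-point precision as advised in the next paragraph. The function name least_squares_line is our own. Applied to the data of Example 28.12 it prints the line quoted above.

```python
def least_squares_line(xs, ys):
    """Solve the normal equations (28.11) for the line y = a x + b."""
    N = len(xs)
    Sx = sum(xs)
    Sy = sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    det = Sxx * N - Sx * Sx          # determinant of the 2 x 2 system
    a = (Sxy * N - Sx * Sy) / det
    b = (Sxx * Sy - Sx * Sxy) / det
    return a, b


xs = [0.0, 1.1, 3.2, 3.9, 7.1, 8.9]
ys = [1.1, 1.6, 1.6, 2.8, 2.9, 3.8]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")
```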

The equations for a and b are sometimes ill-conditioned, meaning that the
solutions are very sensitive to small changes in the coefficients. It is therefore
advisable to retain all the significant figures given by the data while solving them,
despite the fact that we know they already embody the errors of measurement.

Self-test 28.5
In the method of least squares (28.11), suppose that $x_n = n$ (n = 1, 2, …, N), that is, $y_n$ is measured at successive integer values of $x_n$. Find a and b in the straight line fit y = ax + b. (Hint: use the summations in Appendix A(f).)

28.7 Differentiating an integral with respect to a parameter
Suppose that we have an integral whose integrand contains a parameter α as well as the variable of integration – for example,
$$\int_0^1 e^{\alpha t}\,dt, \qquad \int_{-\infty}^{\infty} g(x)h(x + \alpha)\,dx, \qquad \int \frac{dx}{x + \alpha}.$$
We shall consider a definite integral, though the process works in the same way for indefinite integrals. Indicate the dependence on α in the general case by
$$I(\alpha) = \int_a^b f(t, \alpha)\,dt.$$
Then dI(α)/dα can be obtained by the following rule:

Differentiating an integral with respect to a parameter
If $\int_a^b f(t, \alpha)\,dt = I(\alpha)$, then
$$\frac{dI(\alpha)}{d\alpha} = \int_a^b \frac{\partial f(t, \alpha)}{\partial \alpha}\,dt. \tag{28.12}$$

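This process is also called differentiation under the integral sign. To prove (28.12), change α to α + δα; then I(α) changes to I(α + δα). Put
$$I(\alpha + \delta\alpha) - I(\alpha) = \delta I(\alpha).$$
Then
$$\frac{\delta I(\alpha)}{\delta\alpha} = \frac{I(\alpha + \delta\alpha) - I(\alpha)}{\delta\alpha} = \frac{1}{\delta\alpha}\left(\int_a^b f(t, \alpha + \delta\alpha)\,dt - \int_a^b f(t, \alpha)\,dt\right) = \int_a^b \frac{f(t, \alpha + \delta\alpha) - f(t, \alpha)}{\delta\alpha}\,dt.$$
Now let δα → 0. Then δI(α)/δα becomes dI(α)/dα, and the integrand becomes ∂f(t, α)/∂α, which is the result (28.12).

Example 28.13 Evaluate $I(\alpha) = \displaystyle\int_0^\infty \frac{dt}{t^2 + \alpha^2}$, where α > 0, and use (28.12) to evaluate $J(\alpha) = \displaystyle\int_0^\infty \frac{dt}{(t^2 + \alpha^2)^2}$.
From Appendix E,
$$I(\alpha) = \int_0^\infty \frac{dt}{t^2 + \alpha^2} = \left[\alpha^{-1}\arctan(t/\alpha)\right]_0^\infty = \frac{\pi}{2\alpha}.$$
By (28.12),
$$\frac{dI}{d\alpha} = \int_0^\infty \frac{\partial}{\partial\alpha}\left(\frac{1}{t^2 + \alpha^2}\right)dt = \int_0^\infty \frac{-2\alpha}{(t^2 + \alpha^2)^2}\,dt = \frac{d}{d\alpha}\left(\frac{\pi}{2\alpha}\right) = -\frac{\pi}{2\alpha^2}.$$
Therefore
$$J(\alpha) = \int_0^\infty \frac{dt}{(t^2 + \alpha^2)^2} = \frac{\pi}{4\alpha^3}.$$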
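The value of J(α) can be checked by numerical quadrature. The sketch below maps [0, ∞) onto [0, π/2) with the substitution t = tan θ (our own choice, made because it turns both integrands into smooth, bounded functions of θ) and applies the midpoint rule; the results should agree with π/(2α) and π/(4α³). The sample value of α is arbitrary.

```python
import math


def integrate_0_inf(g, n=20000):
    """Midpoint rule for the integral of g over [0, inf), via t = tan(theta)."""
    dtheta = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta            # midpoints never reach pi/2
        t = math.tan(theta)
        total += g(t) / math.cos(theta) ** 2  # dt = sec^2(theta) dtheta
    return total * dtheta


alpha = 1.7   # any positive value will do
I_num = integrate_0_inf(lambda t: 1 / (t * t + alpha * alpha))
J_num = integrate_0_inf(lambda t: 1 / (t * t + alpha * alpha) ** 2)

print(f"I: numerical {I_num:.8f}, pi/(2 alpha)   = {math.pi / (2 * alpha):.8f}")
print(f"J: numerical {J_num:.8f}, pi/(4 alpha^3) = {math.pi / (4 * alpha**3):.8f}")
```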

Self-test 28.6
Using Example 28.13, evaluate
$$K(\alpha) = \int_0^\infty \frac{dt}{(t^2 + \alpha^2)^3}.$$

Problems

28.1 Sketch contour maps of the following functions:
(a) 2x − 3y + 4; (b) −x + 2y − 1; (c) (x − 1)(y − 1); (d) $x^2 + \tfrac{1}{4}y^2 - 1$; (e) $x^2 + 2x + y^2$ (complete the square in x); (f) y/x; (g) $y^2 - x^2$; (h) $y/x^3$; (i) $x^3 + 4y^2$; (j) y/(x + y).

28.2 By sketching rough contour maps, indicate the paths of steepest ascent (the paths on which z increases most rapidly), starting at the point (1, 1):
(a) z = 2x − 3y + 4; (b) z = x − y; (c) $z = x^2y^2$; (d) $z = (x - 1)^2 + \tfrac{1}{4}(y - 1)^2$.

28.3 Obtain ∂f/∂x and ∂f/∂y at the point (2, 1) for the following functions.
(a) 3x + 7y − 2; (b) −2x + 3y + 4; (c) $2x^2 - 3y^2 - 2xy - x - y + 1$; (d) $\tfrac{1}{8}x^3 + y^3 - 2y - 1$; (e) $x^4y^2 - 1$; (f) (x − 1)(y − 2); (g) 1/(xy); (h) x/y; (i) $\dfrac{x - y}{x + y}$; (j) $\dfrac{3}{x^2 + y^2}$; (k) $(x^2 + y^2)^{1/2}$; (l) $(2x - 3y + 2)^3$; (m) $e^{x^2 + y^2}$; (n) $\cos(x^2 - y^2)$; (o) sin(x/y); (p) arctan(y/x).

28.4 (a) Let z = g(ax + by), where a and b are constants. Express ∂z/∂x and ∂z/∂y in terms of g′(ax + by) (which means g′(u) when u is subsequently put equal to ax + by). Check your result for the cases when g(u) = cos u and $g(u) = e^u$.
(b) Let z = g(sin xy). Express ∂z/∂x and ∂z/∂y in terms of x, y, and g′(sin xy). Check the result by differentiating $e^{\sin xy}$ directly.
(c) A certain physical quantity V is a function only of the radial coordinate r in plane polar coordinates: V = g(r), where $r = (x^2 + y^2)^{1/2}$. Express ∂V/∂x and ∂V/∂y, firstly in terms of x and y, then in terms of r and θ.

28.5 In plane polar coordinates (r, θ) in the first quadrant, $r = (x^2 + y^2)^{1/2}$ and x = r cos θ. Form ∂r/∂x and ∂x/∂r, and show that
$$\frac{\partial r}{\partial x}\,\frac{\partial x}{\partial r} \neq 1.$$
By considering the meaning of the derivatives ∂r/∂x and ∂x/∂r near a particular point P in the manner of Fig. 28.6, show why it is not to be expected that the product should equal 1. (In the case of a single variable and ordinary derivatives, we often get true results by formally cancelling out symbols like dx, du, etc., as in the chain rule. This almost never works when more variables are present: see for example the next problem.)

28.6 (a) Let z = sin(x − y); show that
$$\frac{\partial z}{\partial x}\Big/\frac{\partial z}{\partial y} = -1.$$
(b) Let z = g(x − y); show that
$$\frac{\partial z}{\partial x}\Big/\frac{\partial z}{\partial y} = -1.$$

28.7 Show that, if z = g(x/y), then
$$x\frac{\partial z}{\partial x} + y\frac{\partial z}{\partial y} = 0,$$
and check the result in the case z = sin(x/y).

28.8 Find $\dfrac{\partial^2 f}{\partial x^2}$, $\dfrac{\partial^2 f}{\partial y^2}$, $\dfrac{\partial^2 f}{\partial y\,\partial x}$, $\dfrac{\partial^2 f}{\partial x\,\partial y}$ in each of the following cases (see (28.3)).
(a) ax + by + c; (b) $x^2 + 2y^2 + 3xy - x + 1$; (c) sin(x − y); (d) y/x; (e) $e^{2x + 3y}$; (f) 1/x + 1/y; (g) sin 3x + cos 2y; (h) $(3x - 4y)^4$; (i) 1/(x + y); (j) ln xy; (k) $1/(x^2 + y^2)^{1/2}$.

28.9 Confirm that, if $r = (x^2 + y^2)^{1/2}$ and z = ln r, then
$$\frac{\partial z}{\partial x} = \frac{x}{r^2} \quad\text{and}\quad \frac{\partial^2 z}{\partial x^2} = \frac{1}{r^2} - \frac{2x^2}{r^4}.$$
Show that z = ln r is a solution of the equation
$$\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = 0.$$
(This is called Laplace's partial differential equation in two dimensions.)

28.10 Obtain the tangent plane and a normal vector for the following surfaces at the points given.
(a) $z = x^2 + y^2$ at (1, 1, 2); (b) z = xy at (2, 2, 4); (c) z = x/y at (2, 1, 2); (d) $z = (29 - x^2 - y^2)^{1/2}$ at (3, 4, 2); (e) $z = x^2 + y^2 - 2x - 2y$ at (1, 1, −2); (f) $z = e^{xy}$ at (0, 0, 1).

28.11 The two surfaces $z = x^2 + y^2$ and z = x − y + 2 intersect at the point Q : (1, 1, 2). Find normal vectors at Q to each of the two surfaces, $\mathbf{n}_1$ to the first and $\mathbf{n}_2$ to the second. By considering the scalar product $\mathbf{n}_1 \cdot \mathbf{n}_2$, find the angle between the normals and hence the angle at which the surfaces cut at Q.

28.12 Find the stationary points of the following functions, and classify them using (28.9).
(a) (x − 1)(y + 2); (b) $x^2 + y^2 - 2x + 2y$; (c) $\tfrac{1}{3}x^3 - \tfrac{1}{3}y^3 - x + y + 3$; (d) cos x + cos y; (e) $\ln(x^2 + x) + \ln(y^2 + y)$; (f) $e^{x^2 + y^2 - 2x + 2y}$; (g) xy + 1/x + 1/y; (h) $x^3 + y^3 - 3xy + 1$; (i) sin x + sin y; (j) $xy^2 - x^2y + x - y + 1$; (k) $(x^2 - y^2) + 2xy$; (l) $(2 - x^2 - y^2)^2$; (m) $x^4 + y^4 + y - x$; (n) $x^4 + y^4$ (this eludes the test (28.9); the point is obviously a minimum).

28.13 Classify the stationary point of $ax^2 + 2hxy + by^2$ at (0, 0) for various relations between a, b, and h.

28.14 Find positive numbers a, b, c so that
(a) a + b + c = 21 and abc is a maximum.
(b) abc = 64 and a + b + c is a minimum.

28.15 Find the absolute maximum value of $(2 - x^2 - y^2)^2$ in the 'box' −1 ≤ x ≤ 2, −1 ≤ y ≤ 1. (It will be necessary to investigate the function on the four edges of the box separately, since the absolute maximum will not be revealed by the conditions (28.9) if it is on the edges.)

28.16 Find the shortest distance between the straight lines x = y = z and 2x = y = z + 2, by using a simple parametrization of each line. (Use different letters for the two parameters: these will be the new variables for the minimization.)

28.17 N points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ are given in a plane, and P : (x, y) is a general point. Find P so that the sum of the squares of its distances from the N given points is as small as possible.

28.18 (a) A rectangular box with a lid must hold a given volume V, and have the smallest possible surface area. Show that it must be a cube. (Call the lengths of two of its sides x and y.)
(b) An open-topped rectangular box must have a given volume V and its surface area must be as small as possible. Find its dimensions.
(c) A circular-cylindrical box must have a fixed volume V and minimum surface area. Find its dimensions (i) if it has a lid, (ii) if it has no lid.
(d) A rectangular container is required to have total surface area S, and a volume as large as possible. Find its dimensions (i) if it has a lid, (ii) if it does not have a lid.

28.19 Find the straight line which best fits the following experimental data in the sense of Section 28.6:
x: 1 2 3 4 5
y: 3.1 2.1 2.0 1.8 1.2

28.20 The population P of a fast-breeding rodent was observed over a period of 12 months, and the following estimates obtained:
t (months): 0 2 3 5 8 10 12
P (pop'n): 12 23 26 60 170 300 690
Assume that the underlying growth law takes the form (see Section 1.12)
$$P = A e^{bt},$$
where A and b are constants. To estimate A and b, take the logarithm of this expression and treat y = ln P as a variable in the least-squares method of Section 28.6.

28.21 For the least-squares method of Section 28.6, use the test (28.9) to show that the values of a and b obtained do minimize the sum of squares. (This is, of course, rather obvious intuitively.)

28.22 Using Laplace transforms with respect to t, solve the partial differential equation
$$\frac{\partial z}{\partial t} + x\frac{\partial z}{\partial x} + z = 2x,$$
for x > 0 and t > 0, where z(0, t) = 0 and z(x, 0) = 0.

28.23 A grain silo of height 2a with a square floor on z = 0 has vertical sides given by x = 0, x = a, y = 0, y = a in x, y, z space. Grain is poured into the silo and eventually settles with a surface given by
$$z = \left[2a^2 - \left(x - \tfrac{1}{2}a\right)^2 - y^2\right]\Big/a.$$
Find the highest and lowest points of the surface of the grain in the silo. (Note: the lowest point(s) are not stationary points.)

28.24 If z = f(x, y), how many nth-order partial derivatives of f(x, y) are there of the form $\partial^n f/\partial x^r\,\partial y^{n-r}$, assuming that the order of differentiation is immaterial? How many would there be if the order did matter?

28.25 (A necessary condition for functional dependence.) By using eqn (28.4), show that if g(x, y) = H{f(x, y)}, then
$$\Delta = \frac{\partial f}{\partial x}\frac{\partial g}{\partial y} - \frac{\partial f}{\partial y}\frac{\partial g}{\partial x} \equiv 0,$$
H being a function of a single variable.
29 Functions of two variables: geometry and formulae

CONTENTS
29.1 The incremental approximation
29.2 Small changes and errors
29.3 The derivative in any direction
29.4 Implicit differentiation
29.5 Normal to a curve
29.6 Gradient vector in two dimensions
Problems

In Section 28.4 the tangent plane at a point on the surface z = f(x, y) was defined
to be the plane that best fits the surface in the neighbourhood of the point.
Written in algebraic terms this property becomes the incremental approximation
to the surface in the neighbourhood of the point, and constitutes the best linear
approximation to f(x, y) close to the point of contact. The incremental approximation is the origin of all applications of this topic throughout this and the next two chapters. An immediate application is to the question of approximating to the
effect of making small changes δx, δy in the variables in a complicated formula
z = f(x, y), and the associated question of estimating errors in z when x and y are
subject to errors.

29.1 The incremental approximation


It was explained in Section 28.4 that the tangent plane at a point is the plane that best fits a surface at the point. The formula for the tangent plane to a surface z = f(x, y) at Q : (a, b, c), where c = f(a, b), is
$$z - c = \left(\frac{\partial f}{\partial x}\right)_{(a,b)}(x - a) + \left(\frac{\partial f}{\partial y}\right)_{(a,b)}(y - b)$$
(see (28.5)). We will set up new axes with origin at Q, parallel to the old ones, and call them δx, δy, δz (see Fig. 29.1), anticipating that we shall be concerned with small distances from Q. Then
$$\delta x = x - a, \qquad \delta y = y - b, \qquad \delta z = z - c.$$

Fig. 29.1 The surface z = f(x, y) and its tangent plane at Q, with new axes δx, δy, δz through Q.

In the new coordinates, the equation of the tangent plane is
$$\delta z = \left(\frac{\partial f}{\partial x}\right)_{(a,b)}\delta x + \left(\frac{\partial f}{\partial y}\right)_{(a,b)}\delta y.$$
Now consider the quantity δf, where
$$\delta f = f(x, y) - f(a, b).$$
This is the exact change in z on the surface z = f(x, y) from its value at Q. The tangent plane is the best-fitting plane to the surface at Q, so the formula
$$f(x, y) - f(a, b) = \delta f \approx \left(\frac{\partial f}{\partial x}\right)_{(a,b)}\delta x + \left(\frac{\partial f}{\partial y}\right)_{(a,b)}\delta y$$
must give the best-fitting linear approximation to δf near x = a, y = b:

Best linear approximation to f(x, y) near (a, b)
$$\delta f \approx \left(\frac{\partial f}{\partial x}\right)_{(a,b)}\delta x + \left(\frac{\partial f}{\partial y}\right)_{(a,b)}\delta y,$$
where δf = f(x, y) − f(a, b), δx = x − a, δy = y − b. (29.1)

You are more likely to remember the formula obtained by calling the general
point (x, y) instead of (a, b), and putting z in place of f. Also the approximation
will be good enough to be useful only when δx and δy are ‘small’ (how small will
depend on circumstances):
Incremental approximation for f(x, y) (mnemonic version)
For small enough increments δx and δy:
$$f(x + \delta x, y + \delta y) - f(x, y) \approx \frac{\partial f}{\partial x}\delta x + \frac{\partial f}{\partial y}\delta y.$$
If we put z = f(x, y), this can be written
$$\delta z \approx \frac{\partial z}{\partial x}\delta x + \frac{\partial z}{\partial y}\delta y. \tag{29.2}$$

This will be the source of almost all our results from now on, but remember
that ∂z/∂x and ∂z /∂y in (29.2) are constants given by the explicit formulas (28.2)
and (29.1).

Example 29.1 Let $z = x^2 + 3y^2$. Find an approximation to δz in terms of δx and δy near the points (a) x = 2, y = 1; (b) x = 3, y = 2; (c) x = 0, y = 0. (d) Find the exact value of δz in case (a) and compare it with the approximate values in the three cases when δx and δy both take the values 0.1, 0.01, and 0.001.
In general,
$$\frac{\partial z}{\partial x} = 2x \quad\text{and}\quad \frac{\partial z}{\partial y} = 6y.$$
(a) At (2, 1), ∂z/∂x = 4 and ∂z/∂y = 6. Therefore, from (29.2),
δz = 4 δx + 6 δy approximately.
(b) At (3, 2), ∂z/∂x = 6 and ∂z/∂y = 12; so
δz = 6 δx + 12 δy approximately.
(c) At (0, 0), ∂z/∂x = ∂z/∂y = 0; so the formula predicts δz = 0 approximately. The reason is that (0, 0) is a stationary point, so z hardly changes when we move a short distance from (0, 0).
(d) From (a), the approximation near (2, 1) when δx = δy = 0.1 is
δz = (4 × 0.1) + (6 × 0.1) = 1.0.
The exact value is given by
δz = f(2.1, 1.1) − f(2, 1) = 1.04,
so the error in estimating δz is about −4% of the exact value. If δx = δy = 0.01, the error is about −0.4%; if δx = δy = 0.001, it is about −0.04%.
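The percentages in part (d) can be reproduced with a few lines of code. The sketch below computes the exact and linearized changes at (2, 1) for the three step sizes, and the relative error (approximate minus exact, divided by exact); the values come out at about −3.8%, −0.40% and −0.040%, which round to the figures quoted.

```python
def z(x, y):
    return x**2 + 3 * y**2


def linear_estimate(x, y, dx, dy):
    """Incremental approximation (29.2), with dz/dx = 2x and dz/dy = 6y."""
    return 2 * x * dx + 6 * y * dy


rows = []
for d in (0.1, 0.01, 0.001):
    exact = z(2 + d, 1 + d) - z(2, 1)
    approx = linear_estimate(2, 1, d, d)
    percent = 100 * (approx - exact) / exact   # error relative to the exact change
    rows.append((d, exact, approx, percent))
    print(f"step {d}: exact {exact:.6f}, approx {approx:.6f}, error {percent:.3f}%")
```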

We see from (d) in the Example that the approximation improves percentage-
wise as δx and δy get smaller: it is not merely that the error decreases because δx,
δy, δz all go to zero together. The following Example shows the reason for this.

Example 29.2 Find the exact algebraic form of the error incurred by using (29.2) to estimate δz at (2, 1) when $z = x^2 + 3y^2$ (see Example 29.1a).
Put x = 2 + δx and y = 1 + δy. Then
$$\begin{aligned}\delta z = f(2 + \delta x, 1 + \delta y) - f(2, 1) &= (2 + \delta x)^2 + 3(1 + \delta y)^2 - 7\\ &= (4\,\delta x + 6\,\delta y) + (\delta x^2 + 3\,\delta y^2).\end{aligned}$$
The first two terms represent the linear approximation obtained in Example 29.1a. The remainder is the error incurred, the part we ignore in the approximation. The error consists only of higher powers of δx and δy, and this will always be the case. Therefore the error is an order of magnitude smaller than the linear terms retained in the incremental approximation (29.2).
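The remainder found in Example 29.2 can be confirmed numerically for arbitrary increments. The sketch below draws a few random (δx, δy) pairs (seeded so the run is repeatable; the range ±0.5 is an arbitrary choice) and checks that the exact change minus the linear part equals δx² + 3δy² up to rounding error.

```python
import random


def z(x, y):
    return x**2 + 3 * y**2


random.seed(1)          # fixed seed so the run is repeatable
max_gap = 0.0
for _ in range(5):
    dx = random.uniform(-0.5, 0.5)
    dy = random.uniform(-0.5, 0.5)
    exact = z(2 + dx, 1 + dy) - z(2, 1)
    linear = 4 * dx + 6 * dy                # the incremental approximation at (2, 1)
    remainder = exact - linear
    max_gap = max(max_gap, abs(remainder - (dx**2 + 3 * dy**2)))
    print(f"dx = {dx:+.3f}, dy = {dy:+.3f}, remainder = {remainder:.6f}")

print("largest difference from dx^2 + 3 dy^2:", max_gap)
```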

Self-test 29.1
The hypotenuse of a right-angled triangle with side lengths x and y is given by $z = \sqrt{x^2 + y^2}$. Find an approximation to δz in terms of δx and δy. Calculate the approximate value of δz at x = 3, y = 4 if δx = 0.1 and δy = −0.1.

29.2 Small changes and errors


The incremental approximation (29.1) or (29.2) can be used to estimate the effect
of making small changes in the values of variables in a formula.

Example 29.3 Estimate the change in the value of
$$z = \frac{1}{(x^2 + y^2)^{1/2}}$$
when (x, y) change from (3, 4) to (3.1, 3.8).
Using (29.2), put (x, y) = (3, 4), δx = 0.1, δy = −0.2. We require
$$\left(\frac{\partial z}{\partial x}\right)_{(3,4)} = \left[-x(x^2 + y^2)^{-3/2}\right]_{(3,4)} = -\frac{3}{125},$$
$$\left(\frac{\partial z}{\partial y}\right)_{(3,4)} = \left[-y(x^2 + y^2)^{-3/2}\right]_{(3,4)} = -\frac{4}{125}.$$
Therefore, approximately,
$$\delta z = \left(-\frac{3}{125}\right)(0.1) + \left(-\frac{4}{125}\right)(-0.2) = 0.004.$$
(The exact value of δz is 0.00391… .)
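The estimate in Example 29.3 and the exact change it approximates can be compared directly. The sketch below evaluates both, using the same reference point and increments as the example.

```python
import math


def z(x, y):
    return 1 / math.sqrt(x * x + y * y)


x0, y0, dx, dy = 3.0, 4.0, 0.1, -0.2
r_cubed = (x0 * x0 + y0 * y0) ** 1.5      # (x^2 + y^2)^(3/2), which is 125 here
dz_dx = -x0 / r_cubed
dz_dy = -y0 / r_cubed

approx = dz_dx * dx + dz_dy * dy          # incremental approximation (29.2)
exact = z(x0 + dx, y0 + dy) - z(x0, y0)
print(f"approximate change {approx:.5f}, exact change {exact:.5f}")
```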

Example 29.4 The period T of the swings of a pendulum is equal to $2\pi(l/g)^{1/2}$, where l is its length and g the acceleration due to gravity. Estimate the error in calculating T if, instead of using the closely correct values l = 1.015 and g = 9.812 in the formula, we use the rounded values l = 1 and g = 10.
The formula corresponding to (29.2) is
$$\delta T \approx \frac{\partial T}{\partial l}\,\delta l + \frac{\partial T}{\partial g}\,\delta g.$$
Suppose for simplicity we decide to substitute the rounded values l = 1 and g = 10 into the coefficients. We obtain
$$\frac{\partial T}{\partial l} = \left(\pi l^{-1/2} g^{-1/2}\right)_{(1,10)} = 0.993, \qquad \frac{\partial T}{\partial g} = \left(-\pi l^{1/2} g^{-3/2}\right)_{(1,10)} = -0.099.$$
Equation (29.2) then requires that we put
δl = (true value) − (rounded value) = 0.015,
δg = (true value) − (rounded value) = −0.188.
Then
$$\delta T = (\text{true value}) - (\text{rounded value}) \approx (0.993)(0.015) + (-0.099)(-0.188) = 0.0335.$$
But this is not the error. For that we need
(error) = (rounded value) − (true value) = −δT,
so the error is about −0.0335. (The exact error is −0.0339 … .)
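As a check on Example 29.4, the Python sketch below repeats the calculation with the coefficients evaluated at the rounded values. It also computes the exact error directly from the formula for T. The variable names are our own.

```python
import math

def period(l, g):
    # T = 2*pi*sqrt(l/g)
    return 2 * math.pi * math.sqrt(l / g)

l_true, g_true = 1.015, 9.812      # closely correct values
l_round, g_round = 1.0, 10.0       # rounded values used in the formula

# Coefficients evaluated at the rounded values, as in the example
dT_dl = math.pi * l_round**-0.5 * g_round**-0.5
dT_dg = -math.pi * l_round**0.5 * g_round**-1.5

delta_l = l_true - l_round         # (true value) - (rounded value)
delta_g = g_true - g_round
delta_T = dT_dl * delta_l + dT_dg * delta_g

estimated_error = -delta_T         # error = (rounded value) - (true value)
exact_error = period(l_round, g_round) - period(l_true, g_true)
print(f"estimated error = {estimated_error:.4f}, exact error = {exact_error:.4f}")
```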

In the last example, we substituted the rounded (erroneous) values into ∂T/∂l and ∂T/∂g, which led to a complication we might have avoided. Usually, however, there is no choice, because the exact values are unknown. Let z = f(x, y), and suppose that we want to estimate the error in z which could arise from using measured (i.e. approximate) values for x and y. The error ∆x in x is defined to be
∆x = (measured value of x) − (exact value of x),
and similarly for ∆y and ∆z.
Usually we know only a range of possible error, not the errors themselves. For example, we might say that a parcel weighed 1430(±15) g, meaning that we think it is between 1415 g and 1445 g. Therefore the values of ∆x and ∆y are unknown. It follows that the exact values of x and y are also unknown, and so they cannot be put into (29.2) in place of (x, y). Instead, in such cases, we take x and y to be convenient reference values (normally the measured values) at which the derivatives are evaluated. To correspond with this, the definition of δx, δy, δz in (29.2) requires
δx, δy, δz = (true values) − (reference values).
Therefore
δx = −∆x, δy = −∆y, δz = −∆z
go into (29.2). Every term then has a negative sign, so the formula in terms of ∆x, ∆y, ∆z has the same shape as the incremental formula:

Small-error formula
If z = f(x, y), then
$$\Delta z = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y \quad \text{(approximately)},$$
where x and y are reference values, and ∆ stands for
error = (reference value) − (exact value). (29.3)

This is used in the following way.
Example 29.5 In a triangle ABC, the side BC has length a given by
$$a = \frac{c \sin A}{\sin(A + B)}.$$
Suppose that c = 10 (exactly), and angles A and B are measured to 5° accuracy: A = 45(±5)°, B = 30(±5)°. Estimate the largest possible resulting error in a.
Put
$$a = f(A, B) = 10\,\frac{\sin A}{\sin(A + B)},$$
where A and B are measured in radians. Then
$$\frac{\partial a}{\partial A} = 10\,\frac{\sin(A + B)\cos A - \cos(A + B)\sin A}{\sin^2(A + B)} = 10\,\frac{\sin B}{\sin^2(A + B)}.$$
Also
$$\frac{\partial a}{\partial B} = -10\,\frac{\cos(A + B)\sin A}{\sin^2(A + B)}.$$
Choose as reference values A = 45° and B = 30° (for this seems to be the simplest choice). We get ∂a/∂A = 5.36 and ∂a/∂B = −1.96. The error formula (29.3) becomes
∆a = 5.36 ∆A − 1.96 ∆B
approximately, where ∆A and ∆B must be measured in radians.
The greatest possible magnitude of ∆a occurs if ∆A and ∆B happen to have opposite signs and their greatest possible magnitudes; that is, if ∆A = −∆B = ±0.087 radians. In that case, ∆a = ±0.64. Therefore
$$a = f\left(\tfrac14\pi, \tfrac16\pi\right) \pm 0.64 = 7.32 \pm 0.64,$$
showing a possible error of about 8.7%.
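The worst case in Example 29.5 can be explored numerically. The Python sketch below computes the linear bound from (29.3). As an extra check that is not part of the text, it also evaluates a exactly at the four corners of the tolerance box. Because a is not linear in A and B, the largest corner errors (about +0.67 and −0.61) differ a little from the linear estimate ±0.64. The helper names and `C_SIDE` are our own.

```python
import math
from itertools import product

C_SIDE = 10.0                       # side c, taken as exact

def side_a(A, B):
    # a = c sin A / sin(A + B), angles in radians
    return C_SIDE * math.sin(A) / math.sin(A + B)

def da_dA(A, B):
    return C_SIDE * math.sin(B) / math.sin(A + B) ** 2

def da_dB(A, B):
    return -C_SIDE * math.cos(A + B) * math.sin(A) / math.sin(A + B) ** 2

A0, B0 = math.radians(45), math.radians(30)   # reference (measured) values
tol = math.radians(5)                         # +/- 5 degrees, in radians

# Linear worst case from (29.3): the two terms reinforce each other
bound = (abs(da_dA(A0, B0)) + abs(da_dB(A0, B0))) * tol

# Extra check: exact errors at the corners of the tolerance box,
# with error = (reference value) - (exact value), so exact angle = reference - error
a0 = side_a(A0, B0)
corner_errors = [a0 - side_a(A0 - eA, B0 - eB) for eA, eB in product((tol, -tol), repeat=2)]
print(f"a = {a0:.2f} +/- {bound:.2f} (linear estimate)")
print("corner errors:", [round(e, 3) for e in corner_errors])
```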

Example 29.6 One solution of the equation $x^2 + bx + c = 0$ is $x = \tfrac12\left[-b + (b^2 - 4c)^{1/2}\right]$. (a) Find an approximate expression for the error ∆x arising from small errors ∆b and ∆c in b and c. (b) Estimate the maximum possible error in the solution x if b and c are rounded to one decimal place to give b ≈ 3.1, c ≈ 2.1.
(a) We have ∆x = (∂x/∂b) ∆b + (∂x/∂c) ∆c, in which we must put
$$\frac{\partial x}{\partial b} = \tfrac12\left[-1 + b(b^2 - 4c)^{-1/2}\right], \qquad \frac{\partial x}{\partial c} = -(b^2 - 4c)^{-1/2}.$$
(b) Since b and c are rounded numbers, all that we know about them is that
b = 3.1(±0.05), c = 2.1(±0.05),
meaning that the error might be anywhere in the range indicated. Putting the reference values b = 3.1 and c = 2.1 into (a), we obtain
$$\frac{\partial x}{\partial b} = 0.909, \qquad \frac{\partial x}{\partial c} = -0.909;$$
so, by (29.3),
∆x = 0.909 ∆b − 0.909 ∆c.
This takes its greatest possible magnitude when ∆b and ∆c take their maximum magnitudes and have opposite signs: that is, when
∆b = ±0.05, ∆c = ∓0.05.
In that case ∆x = ±0.909(0.05 + 0.05) = ±0.091.
The value of x estimated from the rounded coefficients is x = −1. Although the rounding error in the coefficients is at most 2.4%, the error in the solution could be as large as 9.1%.
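A short Python sketch can compare the linear estimate in Example 29.6 with the errors obtained by solving the quadratic exactly at the corners of the rounding box. Under these assumptions the true extreme errors come out lopsided, roughly +0.106 and −0.081. This is a reminder that (29.3) gives an estimate rather than a guaranteed bound. The names are illustrative.

```python
import math
from itertools import product

def root(b, c):
    # x = (-b + sqrt(b^2 - 4c)) / 2, assumes b^2 > 4c
    return 0.5 * (-b + math.sqrt(b * b - 4 * c))

b0, c0, tol = 3.1, 2.1, 0.05        # rounded coefficients and rounding tolerance

s = math.sqrt(b0 * b0 - 4 * c0)
dx_db = 0.5 * (-1 + b0 / s)         # partial derivatives from part (a)
dx_dc = -1 / s
linear_bound = (abs(dx_db) + abs(dx_dc)) * tol

# Exact errors at the corners of the rounding box: error = rounded - exact
x0 = root(b0, c0)
corner_errors = [x0 - root(b0 - eb, c0 - ec) for eb, ec in product((tol, -tol), repeat=2)]
print(f"x = {x0:.3f}, linear bound = +/-{linear_bound:.3f}")
print("corner errors:", [round(e, 4) for e in corner_errors])
```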

Self-test 29.2
The volume V of a circular cylinder of base radius r and height h is given by $V = \pi r^2 h$. Find δV in terms of δr and δh. If r = 4 and h = 5, and r increases by 0.1 and h increases by 0.2, estimate the percentage increase in volume.

29.3 The derivative in any direction


The plane in Fig. 29.2 is a map of a surface z = f(x, y) with all detail omitted.
On the surface above P : (x, y) we see a slope ∂z/∂x if we look east, a slope ∂z/∂y looking north, and other slopes in other directions. We can express the slopes in all other directions in terms of ∂z/∂x and ∂z/∂y. It might seem that the intermediate slopes could take any values whatever. If the surface at P is smooth enough to have a tangent plane, however, this is not so: the slopes we see are then the slopes of the tangent plane in the various directions.

[Fig. 29.2: a short step of length δs from P at angle θ to the x axis, with components δx and δy.]

Consider the direction PQ which makes an angle θ with the positive x axis, the direction for positive angles being anticlockwise as with polar coordinates. Let the length PQ = δs, a short step, and let δx and δy be as shown. Then, by (29.2), the change in elevation in this direction is given approximately by
$$\delta z \approx \frac{\partial z}{\partial x}\,\delta x + \frac{\partial z}{\partial y}\,\delta y.$$
Divide by δs; we obtain
$$\frac{\delta z}{\delta s} \approx \frac{\partial z}{\partial x}\frac{\delta x}{\delta s} + \frac{\partial z}{\partial y}\frac{\delta y}{\delta s} = \frac{\partial z}{\partial x}\cos\theta + \frac{\partial z}{\partial y}\sin\theta,$$
since δx = δs cos θ and δy = δs sin θ (Fig. 29.2). Now let δs → 0; the approximation becomes exact, and we have an expression for the slope in any direction. Using the notation for the directional derivative,
$$\lim_{\delta s \to 0} \frac{\delta z}{\delta s} = \frac{dz}{ds},$$
we have the following formula.
Directional derivative
The slope of z = f(x, y) at P in direction θ:
$$\frac{dz}{ds} = \left(\frac{\partial z}{\partial x}\right)_P \cos\theta + \left(\frac{\partial z}{\partial y}\right)_P \sin\theta. \qquad (29.4)$$

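Example 29.7 Find the slope of the surface $z = xy + x^2$ at P : (2, 3) in the direction (−120)°.
The direction is shown on Fig. 29.3.

[Fig. 29.3: the direction PQ from P : (2, 3), at angle −120° to the positive x direction.]

$$\left(\frac{\partial z}{\partial x}\right)_{(2,3)} = (y + 2x)_{(2,3)} = 7, \qquad \left(\frac{\partial z}{\partial y}\right)_{(2,3)} = (x)_{(2,3)} = 2.$$
Also
$$\cos(-120^\circ) = -\sin 30^\circ = -\tfrac12 \quad\text{and}\quad \sin(-120^\circ) = -\cos 30^\circ = -\tfrac12\sqrt3;$$
so
$$\frac{dz}{ds} = 7\left(-\tfrac12\right) + 2\left(-\tfrac12\sqrt3\right) = -\tfrac12\left(7 + 2\sqrt3\right),$$
which means that the surface is descending in the direction (−120)°.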
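Formula (29.4) can be checked against a direct difference quotient taken along the direction θ. In the Python sketch below, `slope_numeric` takes a small step h (an arbitrary choice) along the unit vector (cos θ, sin θ) from P : (2, 3). Both values should be close to −½(7 + 2√3) ≈ −5.232.

```python
import math

def f(x, y):
    return x * y + x**2

def slope_formula(x, y, theta):
    # (29.4) with dz/dx = y + 2x and dz/dy = x
    return (y + 2 * x) * math.cos(theta) + x * math.sin(theta)

def slope_numeric(x, y, theta, h=1e-6):
    # Central difference along the unit vector (cos theta, sin theta)
    ux, uy = math.cos(theta), math.sin(theta)
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

theta = math.radians(-120)
print(slope_formula(2, 3, theta), slope_numeric(2, 3, theta))
```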

Example 29.8 The temperature distribution in a plate heated at the point (0, 0) is given by $T = 1/(x^2 + y^2)^{1/2}$. (a) Find the temperature gradient at the point (3, 3) in a direction of 45° to the positive x axis. (b) In polar coordinates, T = 1/r. Show that the result (a) is the same as ∂T/∂r taken at any point on the circle $r = 3\sqrt2$.
(a)
$$\left(\frac{\partial T}{\partial x}\right)_{(3,3)} = \left(-\frac{x}{(x^2 + y^2)^{3/2}}\right)_{(3,3)} = -\frac{1}{18\sqrt2}, \qquad \left(\frac{\partial T}{\partial y}\right)_{(3,3)} = \left(-\frac{y}{(x^2 + y^2)^{3/2}}\right)_{(3,3)} = -\frac{1}{18\sqrt2}.$$
Also cos θ = 1/√2 and sin θ = 1/√2. Therefore the temperature gradient at (3, 3) in the given direction is
$$\frac{dT}{ds} = -\frac{1}{18}.$$
(b) T = 1/r, so ∂T/∂r = −1/r². At the given point, r = 3√2, so ∂T/∂r = −1/18 and the result is the same.

Example 29.9 At any point on the plane $z = \sqrt3\,x - y + 4$, find (a) an expression for the slope dz/ds in every direction, (b) the directions in which dz/ds = 0, (c) the directions in which dz/ds is a maximum and a minimum.
(a) ∂z/∂x = √3 and ∂z/∂y = −1; and these are the same at every point. By (29.4),
$$\frac{dz}{ds} = \sqrt3\cos\theta - \sin\theta.$$
(b) dz/ds = 0 where $\sqrt3\cos\theta - \sin\theta = 0$, or tan θ = √3. Therefore θ = 60° or θ = −120°. These directions are opposed: see Fig. 29.4. They give the direction of the contour through any point.

[Fig. 29.4: at a point P, the contour directions 60° and −120°, the direction of steepest ascent −30°, and the direction of steepest descent 150°.]

(c) dz/ds is a maximum (direction of steepest ascent), or a minimum (steepest descent), in directions such that
$$\frac{d}{d\theta}\left(\frac{dz}{ds}\right) = 0,$$
or $-\sqrt3\sin\theta - \cos\theta = 0$, or tan θ = −1/√3. Therefore θ = −30° or θ = 150°, these directions being directly opposed: see Fig. 29.4. By considering the sign of
$$\frac{d^2}{d\theta^2}\left(\frac{dz}{ds}\right),$$
or just by thinking about it, it can be seen that the directions of steepest ascent and descent from P are as shown.
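For Example 29.9 a brute-force scan over whole-degree directions recovers the same angles. This Python sketch is only an illustration: it relies on the answers happening to be whole numbers of degrees. Note that √3 cos θ − sin θ = 2 cos(θ + 30°), which explains the maximum value 2 at θ = −30°.

```python
import math

A, B = math.sqrt(3), -1.0           # dz/dx and dz/dy for z = sqrt(3) x - y + 4

def slope(theta_deg):
    # (29.4) for the plane; the same at every point P
    t = math.radians(theta_deg)
    return A * math.cos(t) + B * math.sin(t)

angles = range(-180, 180)           # whole-degree directions
steepest_ascent = max(angles, key=slope)
steepest_descent = min(angles, key=slope)
contour = [t for t in angles if abs(slope(t)) < 1e-9]
print(steepest_ascent, steepest_descent, contour)
```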

In the last example, the directions of steepest ascent/descent at any point are perpendicular to the directions of the contours; we shall now show that this is true for all surfaces. On the contour map of z = f(x, y), the slope in the direction θ at a point P : (x, y) has the form (29.4):
$$\frac{dz}{ds} = A\cos\theta + B\sin\theta,$$
where A and B are the values of ∂z/∂x and ∂z/∂y at P. This is zero in the directions θ₁ where
$$\tan\theta_1 = -A/B.$$
The two directions θ₁ which satisfy this equation differ by π, so they indicate smooth passage of the contour through P. The slope dz/ds is a maximum or minimum when
$$\frac{d}{d\theta}\left(\frac{dz}{ds}\right) = -A\sin\theta + B\cos\theta = 0,$$
that is, in directions θ₂ where
$$\tan\theta_2 = B/A,$$
which give the directions of steepest ascent/descent. Since
$$\tan\theta_1 \tan\theta_2 = -1,$$
these directions are perpendicular (see (1.9)), a fact known intuitively by any hill walker.

Steepest ascent/descent
At each point on the map of z = f(x, y), the direction of steepest ascent/descent
is perpendicular to the contour. (29.5)

The two systems of curves, consisting of the contours and the curves which follow directions of steepest ascent or descent, are perpendicular wherever they cross, so they are called orthogonal systems of curves (see Section 30.4).

Self-test 29.3
A surface is defined by $z = e^{-x^2 - 2y^2}$. A person walks on the surface on the line x + y = 1 between (0, 1) and (1, 0). Find the rate of ascent/descent of the walker at each point on the line.

29.4 Implicit differentiation


An equation of the type
f(x, y) = c,
where c is a constant, describes a curve or curves in the (x, y) plane, since we
can imagine solving it to obtain y as a function of x. For example, $x^2 + y^2 = 4$ represents the two semicircles $y = \pm(4 - x^2)^{1/2}$. Another interpretation is that the equation f(x, y) = c describes the contour z = c of a surface z = f(x, y), projected into the (x, y) plane, as in Fig. 28.4b.
Although it is usually impossible in practice to solve for y in terms of x, it is always possible to obtain an expression for the slope dy/dx of the curve in terms of x and y, at least wherever ∂f/∂y ≠ 0.

[Fig. 29.5: neighbouring points P : (x, y) and Q : (x + δx, y + δy) on the curve f(x, y) = c.]

Choose any point P : (x, y) on the curve (Fig. 29.5), and move along it a short distance to Q : (x + δx, y + δy). Then dy/dx on the curve is given by
$$\frac{dy}{dx} = \lim_{\delta x \to 0} \frac{\delta y}{\delta x}.$$
Since P and Q both lie on the curve, δf = 0; so the incremental approximation (29.2) gives
$$\frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y \approx 0,$$
or
$$\frac{\delta y}{\delta x} \approx -\frac{\partial f/\partial x}{\partial f/\partial y}.$$
Now let δx → 0. The ‘≈’ becomes ‘=’, and δy/δx becomes dy/dx, from which we obtain:

The implicit-differentiation formula
The slope of f(x, y) = c at any point (x, y) on the curve is given by
$$\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y}. \qquad (29.6)$$

The process is called implicit differentiation because f(x, y) = c gives y in terms of x only ‘implicitly’, not explicitly.

Example 29.10 Find an expression for dy/dx at a general point (x, y) on the circle $x^2 + y^2 = 4$.
Here $f(x, y) = x^2 + y^2$, and so
$$\frac{\partial f}{\partial x} = 2x, \qquad \frac{\partial f}{\partial y} = 2y.$$
Therefore, by (29.6),
$$\frac{dy}{dx} = -\frac{2x}{2y} = -\frac{x}{y},$$
where $x^2 + y^2 = 4$.
In the last Example we would have obtained exactly the same result for the circle $x^2 + y^2 = 1$, or $x^2 + y^2 = 100$. It is the numerical values of x and y put into the right-hand side that distinguish the circle under discussion from all the other circles. In fact the equation we obtained,
$$\frac{dy}{dx} = -\frac{x}{y},$$
can be thought of as a differential equation. Its solutions (obtained by the method of Section 22.3) are $x^2 + y^2 = C$, which includes the given circle and all the others as well.

Example 29.11 Find dy/dx on the curve $x^3y - xy^3 = 6$ at the point (2, 1).
(You can check that the point (2, 1) is really on the curve.) Putting $f(x, y) = x^3y - xy^3$, we have
$$\frac{\partial f}{\partial x} = 3x^2y - y^3, \qquad \frac{\partial f}{\partial y} = x^3 - 3xy^2.$$
Therefore, at any point (x, y) on the curve,
$$\frac{dy}{dx} = -\frac{3x^2y - y^3}{x^3 - 3xy^2}.$$
At (2, 1), the slope is
$$\left(\frac{dy}{dx}\right)_{(2,1)} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{11}{2}.$$
(This is not a differential equation: it is a numerical value which holds at only a single point.)
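Example 29.11 can be confirmed numerically without solving for y explicitly. The Python sketch below assumes the branch of the curve through (2, 1). It finds y for x near 2 by Newton's method in y, a technique chosen here for convenience rather than one used in the text, and then estimates dy/dx by a central difference.

```python
def f(x, y):
    # Curve x^3 y - x y^3 = 6, written as f(x, y) = 0
    return x**3 * y - x * y**3 - 6

def f_y(x, y):
    return x**3 - 3 * x * y**2

def solve_y(x, y_guess=1.0, steps=50):
    # Newton's method in y with x held fixed, starting on the branch through (2, 1)
    y = y_guess
    for _ in range(steps):
        y -= f(x, y) / f_y(x, y)
    return y

h = 1e-5
slope_numeric = (solve_y(2 + h) - solve_y(2 - h)) / (2 * h)
# (29.6) at (2, 1): -(3x^2 y - y^3) / (x^3 - 3x y^2) = -11/2
slope_formula = -(3 * 2**2 * 1 - 1**3) / (2**3 - 3 * 2 * 1**2)
print(slope_numeric, slope_formula)
```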

The link with differential equations can be used in many ways, as in the following example.

Example 29.12 Find the family of curves which is orthogonal (perpendicular) to the curves xy = C.
The rectangular hyperbolas xy = C are the contours of the function f(x, y) = xy, and the new family will be the curves of steepest ascent/descent on the contour map of xy. The differential equation of the family xy = C is, from the implicit-differentiation formula (29.6),
$$\frac{dy}{dx} = -\frac{y}{x}.$$
Wherever the new curves intersect with these they must cut at a right angle, so the product of their slopes at any intersection must be equal to −1. Therefore the new family must have the differential equation
$$\frac{dy}{dx} = \frac{x}{y},$$
because $(-y/x)_P\,(x/y)_P = -1$ at any point P. This equation can be solved by separating the variables (Section 22.3), which gives
$$\int y\,dy = \int x\,dx,$$
or
$$y^2 - x^2 = B,$$
where B is an arbitrary constant. This is another family of hyperbolas. A small region of the (x, y) plane is shown in Fig. 29.6.

[Fig. 29.6: the hyperbolas xy = C and y² − x² = B crossing at right angles at a point P.]
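One way to check the result of Example 29.12 is to integrate the differential equation dy/dx = x/y numerically from a starting point and confirm that y² − x² stays constant along the solution. The Python sketch below uses the classical fourth-order Runge–Kutta method, a standard scheme chosen here for convenience. The starting point (1, 2), which gives B = 3, is an arbitrary choice.

```python
def rk4(fun, x0, y0, x_end, n=1000):
    """Classical fourth-order Runge-Kutta for dy/dx = fun(x, y)."""
    h = (x_end - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = fun(x, y)
        k2 = fun(x + h / 2, y + h * k1 / 2)
        k3 = fun(x + h / 2, y + h * k2 / 2)
        k4 = fun(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return x, y

def orthogonal_slope(x, y):
    # Differential equation of the orthogonal family: dy/dx = x / y
    return x / y

x1, y1 = rk4(orthogonal_slope, 1.0, 2.0, 3.0)
B_start = 2.0**2 - 1.0**2           # y^2 - x^2 at the starting point
B_end = y1**2 - x1**2               # should agree if y^2 - x^2 = B is the solution
print(B_start, B_end)
```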

Self-test 29.4
Using the implicit-differentiation formula (29.6), find dy/dx where $x^2 + 2xy + 4y^2 = 4$ (the curve is an ellipse: draw a sketch of it). Find where the maximum and minimum values of y occur on the ellipse.

29.5 Normal to a curve


The slope at any point P on the curve f(x, y) = c is equal to
$$-\left(\frac{\partial f}{\partial x}\right)_P \bigg/ \left(\frac{\partial f}{\partial y}\right)_P \quad \text{(by (29.6))}.$$
We shall obtain a vector n perpendicular or normal to the curve at P. A straight line through P perpendicular to the curve must have slope
$$\left(\frac{\partial f}{\partial y}\right)_P \bigg/ \left(\frac{\partial f}{\partial x}\right)_P,$$
because the product of the slopes must be equal to −1. A vector with components (a, b) has slope b/a, so one normal vector n is
$$\mathbf{n} = \left(\left(\frac{\partial f}{\partial x}\right)_P, \left(\frac{\partial f}{\partial y}\right)_P\right).$$
Any multiple of this n is also a normal at the point. Dropping the suffix P, we have the following result.

Normal vector n at the point (x, y) on the curve f(x, y) = c
$$\mathbf{n} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j}.$$
Any multiple of this is also a normal. (29.7)

Example 29.13 Find several normal vectors at the point (2, 1) on the curve $x^2 + y^2 = 5$.
Putting $f(x, y) = x^2 + y^2$, we have ∂f/∂x = 2x and ∂f/∂y = 2y; so
$$\left(\frac{\partial f}{\partial x}\right)_{(2,1)} = 4, \qquad \left(\frac{\partial f}{\partial y}\right)_{(2,1)} = 2.$$
Therefore one vector normal to the circle at (2, 1) is
n = (4, 2)
and from this any number of other normal vectors can be constructed by taking multiples. For example, (2, 1), (−2, −1), and $(2/\sqrt5, 1/\sqrt5)$ are also normals, the last one being a unit normal (one having unit length), which is often important.

Example 29.14 Find the angle of intersection between the curves $x^2 + y^2 = 5$ and $x^2 - y^2 = 3$ at the point (2, 1).
In the last Example, we showed that n₁ = (4, 2) is normal to $x^2 + y^2 = 5$ at (2, 1). Similarly the vector n₂ = (4, −2) is normal to $x^2 - y^2 = 3$ at the point. From Fig. 29.7, it can be seen that the acute angle θ between the normals is equal to one of the angles between the curves (the other is π − θ). From (10.4),
$$\cos\theta = \frac{\mathbf{n}_1 \cdot \mathbf{n}_2}{|\mathbf{n}_1|\,|\mathbf{n}_2|} = \frac{(4, 2)\cdot(4, -2)}{\sqrt{20}\,\sqrt{20}} = \frac{3}{5},$$
so θ = 53.1°.

[Fig. 29.7: the circle x² + y² = 5 and the hyperbola x² − y² = 3 meeting at (2, 1), with normals n₁ and n₂ and the angle θ between them.]
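The calculation in Example 29.14 generalises to any pair of curves: take the two normal vectors (29.7) at the point of intersection and apply (10.4). Here is a minimal Python sketch, with a helper name of our own choosing. Taking the absolute value of the dot product gives the acute angle.

```python
import math

def angle_between(n1, n2):
    """Acute angle (degrees) between two curves whose normals are n1 and n2."""
    dot = n1[0] * n2[0] + n1[1] * n2[1]
    cos_theta = abs(dot) / (math.hypot(*n1) * math.hypot(*n2))
    return math.degrees(math.acos(cos_theta))

x, y = 2.0, 1.0
n1 = (2 * x, 2 * y)     # normal (29.7) to x^2 + y^2 = 5 at (2, 1)
n2 = (2 * x, -2 * y)    # normal (29.7) to x^2 - y^2 = 3 at (2, 1)
theta = angle_between(n1, n2)
print(f"theta = {theta:.1f} degrees")
```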

Self-test 29.5
On the ellipse $x^2 + 2xy + 4y^2 = 4$ (see Self-test 29.4), find the direction of the normal to the ellipse at the point where x = 1 and y > 0.


29.6 Gradient vector in two dimensions
It is familiar that the value of a quantity such as pressure or temperature depends on, or is a function of, position (x, y). These are scalar functions: the values they take are ordinary numbers. There are also vector quantities that depend on position. Figure 29.8 shows some streamlines for a fluid flowing over a long cylinder (assuming that the flow is always in the plane of the paper). The velocity v is a vector which varies from point to point, so we can write v = v(x, y). Gravitational, magnetic, and electric fields are other instances of vector functions of position, or vector fields.

[Fig. 29.8: Schematic for flow past a cylinder.]

Associated with any scalar function, there is an important vector function which arises as follows. We repeatedly produce formulae involving the pair of elements ∂f/∂x and ∂f/∂y in combinations of the form U ∂f/∂x + V ∂f/∂y, where U and V are constants or functions; for example, as in (29.7), (29.2), and (29.4). We can manipulate this pair as a unit by regarding
$$\frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} \quad\text{or}\quad \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$
as a vector function. We call this vector function the gradient of f and denote it by
grad f or ∇f
(∇ is pronounced ‘del’ or ‘nabla’). We shall see that it works rather like an ordinary derivative, but in two dimensions; hence its name.
Alternatively we can regard the symbol grad or ∇ standing alone as an operator (compare d/dx). It operates on scalar functions f(x, y), instructing us to carry out the operation i ∂/∂x + j ∂/∂y or (∂/∂x, ∂/∂y) on f(x, y):
$$\operatorname{grad} f(x, y) = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y}\right) f(x, y).$$

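Gradient in two dimensions
Given a scalar function f(x, y), grad f or ∇f stands for
$$\frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} \quad\text{or}\quad \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right).$$
Alternatively, grad or ∇ stands for the operator
$$\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} \quad\text{or}\quad \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right). \qquad (29.8)$$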

Example 29.15 Let $f(x, y) = x^2 + y^2$. Obtain (a) the vector function grad f; (b) the value of grad f at the point (1, 2); (c) an expression for the magnitude, or length, of grad f at (x, y).
(a)
$$\operatorname{grad} f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} = 2x\,\mathbf{i} + 2y\,\mathbf{j};$$
or we can use the alternative notations, and even the operator viewpoint:
$$\nabla f = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)(x^2 + y^2) = (2x, 2y).$$
(b) At x = 1, y = 2, we have
grad f = (2, 4).
(c) The magnitude or length of a vector v = (a, b) is $|\mathbf{v}| = (a^2 + b^2)^{1/2}$, so
$$|\operatorname{grad} f| = \left[(2x)^2 + (2y)^2\right]^{1/2} = 2(x^2 + y^2)^{1/2}.$$
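When f is awkward to differentiate by hand, grad f can be estimated by central differences. The Python sketch below reproduces parts (b) and (c) of Example 29.15. The helper `grad` and the step h = 10⁻⁶ are our own assumptions, not part of the text.

```python
import math

def grad(f, x, y, h=1e-6):
    """Central-difference estimate of grad f = (df/dx, df/dy) at (x, y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def f(x, y):
    return x**2 + y**2

gx, gy = grad(f, 1.0, 2.0)          # part (b): compare with (2, 4)
magnitude = math.hypot(gx, gy)      # part (c): compare with 2(x^2 + y^2)^(1/2) = 2*sqrt(5)
print((round(gx, 6), round(gy, 6)), magnitude)
```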

We can re-express some earlier results in terms of grad. For example, we may
write (29.7) immediately as follows.

A normal vector n at the point (x, y) on the curve f(x, y) = c is
n = grad f(x, y). (29.9)

As we remarked earlier, expressions occurring in physical theory frequently take the form
$$U\frac{\partial f}{\partial x} + V\frac{\partial f}{\partial y},$$
where U and V may be constants, or various functions. We can then write such expressions as a scalar (‘dot’) product (see Section 10.1) by introducing a new vector function S = Ui + Vj:

If S = Ui + Vj, then
$$U\frac{\partial f}{\partial x} + V\frac{\partial f}{\partial y} = (U, V)\cdot\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = \mathbf{S}\cdot\operatorname{grad} f. \qquad (29.10)$$
Now consider the directional-derivative formula (29.4), regarding it as representing the rate of change of f(x, y) in the direction θ:
$$\frac{df}{ds} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta.$$
To recast this in the form of (29.10), we require the vector
i cos θ + j sin θ.
This is a unit vector (i.e. it has length unity) because
$$(\cos^2\theta + \sin^2\theta)^{1/2} = 1;$$
so put
i cos θ + j sin θ = v,
where v is a unit vector pointing in the desired direction, and (29.10) becomes

Directional derivative
In the direction of a unit vector v, the rate of change of f(x, y) is given by
$$\frac{df}{ds} = \mathbf{v}\cdot\operatorname{grad} f;$$
that is to say, df/ds is equal to the component of grad f in the direction of v. (29.11)

Equation (29.11) can be written in a different way. If a and b are two vectors, then the angle between them, φ, can be obtained from the identity
a · b = |a| |b| cos φ, with 0 ≤ φ ≤ π
(see (10.4)). If we put a = v and b = grad f, and use the fact that |v| = 1, we obtain an alternative form of (29.11).

Directional derivative (alternative form)
$$\frac{df}{ds} = |\operatorname{grad} f|\cos\varphi,$$
where φ is the interior angle between grad f and the required direction. (29.12)

By using (29.12), the perpendicularity of the directions of steepest ascent and the contours of f(x, y), proved in Section 29.3, can be recovered.
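Formulae (29.11) and (29.12) translate directly into a few lines of Python. The sketch below uses f = x² + y² from Example 29.15, whose gradient at (1, 2) is (2, 4), together with an arbitrarily chosen direction (3, 4). The direction must first be scaled to unit length before (29.11) applies.

```python
import math

def directional_derivative(grad_f, direction):
    """df/ds = v . grad f, with v the unit vector along `direction`, as in (29.11)."""
    dx, dy = direction
    length = math.hypot(dx, dy)
    vx, vy = dx / length, dy / length      # (29.11) requires a unit vector
    return vx * grad_f[0] + vy * grad_f[1]

g = (2.0, 4.0)                 # grad f at (1, 2) for f = x^2 + y^2
direction = (3.0, 4.0)         # arbitrary illustrative direction
rate = directional_derivative(g, direction)

# Alternative form (29.12): |grad f| cos(phi), phi = angle between grad f and the direction
phi = math.atan2(direction[1], direction[0]) - math.atan2(g[1], g[0])
rate_alt = math.hypot(*g) * math.cos(phi)
print(rate, rate_alt)          # both equal (3*2 + 4*4)/5 = 4.4, up to rounding
```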

Self-test 29.6
Find the rate of change of $f(x, y) = x^2 - 2xy^2 + y^3$ at (−1, 1) in the direction (1, 2).
Problems

29.1 Use the incremental approximation (29.1) or (29.2) to estimate the change δz due to changes δx and δy as specified, and check the percentage error by calculating the exact result.
(a) $z = x^2 + y^2$ at (3, 1), δx = 0.1, δy = 0.3;
(b) $z = \sin xy$ at (0.5, 1.2), δx = 0.1, δy = −0.05;
(c) $z = e^{x^2 + 3y^2}$ at (1, 1), δx = 0.1, δy = 0.2;
(d) $z = 1/(x^2 + y^2)^{1/2}$ at (2, 1), δx = −0.2, δy = 0.1.

29.2 Given $z = x^2 - y^2$ and two points P : (1.0, 2.1) and Q : (1.1, 2.0), (a) estimate the change in z in going from P to Q; (b) estimate the change in going from Q to P; (c) explain in general terms why the second estimate is not precisely the negative of the first.

29.3 (See Example 29.2.) Obtain the exact algebraical form of the error incurred in δf, where
$$\delta f = f(x + \delta x, y + \delta y) - f(x, y),$$
in using the approximation $\delta f \approx \dfrac{\partial f}{\partial x}\,\delta x + \dfrac{\partial f}{\partial y}\,\delta y$
(a) for f(x, y) = xy near the point (2, 1);
(b) for f(x, y) = x/y near the point (2, 1).

29.4 The relation between the object distance u, the image distance v, and the focal length f of a thin lens is
$$\frac{1}{u} + \frac{1}{v} = \frac{1}{f}.$$
Suppose that the measured values of u and v are u = 0.31(±0.01), v = 0.56(±0.03); calculate the greatest possible error in estimating f, and the corresponding percentage error.

29.5 A viscous liquid is forced through a tube of diameter d = 0.002 ± 0.0001 m and length l = 0.1 m under a pressure p = 5000 ± 50 N m⁻², and is found to pass fluid at a rate q = 1.66 × 10⁻⁶ m³ s⁻¹. The viscosity η is given by the formula
$$\eta = \frac{\pi p d^4}{128\,q l}.$$
Find the maximum error in the viscosity estimate.

29.6 One root of the equation $x^2 + bx + c = 0$ is $x = \tfrac12\left[-b + (b^2 - 4c)^{1/2}\right]$. Suppose that b = 20.4 and c = 95.5. Estimate the percentage error in the root which would arise if these were rounded to b = 20, c = 96.

29.7 The area S of a triangle with base b and base angles A and C is given by
$$S = \tfrac12 b^2\,\frac{\tan A \tan C}{\tan A + \tan C}.$$
Suppose that nominally b = 2, A = 30°, C = 60°, but that C is found to be too large by 5%. By what amount should A be changed so that S would be restored to the correct area?

29.8 A certain type of experiment to measure surface tension S requires the formula $S = ahr^3/p^2$, where a is a constant and h, r, and p are measured quantities. Take the logarithm of the formula to find the fractional change δS/S in S in terms of simultaneous fractional changes in h, r, and p.

29.9 Find the directional derivative df/ds of each of the following functions according to the data. Also, for the given point, find the directions of the contour and the direction of steepest ascent.
(a) $f(x, y) = x^2 + y^2$ at (1, 2), direction θ = 30°;
(b) $f(x, y) = x^2y^2$ at (2, 1), direction θ = −45°;
(c) $f(x, y) = x^2y - xy^2 + 2$ at (−1, 1), direction θ = 120°;
(d) $f(x, y) = \sin xy$ at (½, π), direction θ = −90°;
(e) $f(x, y) = \cos(x^2 - y)$ at (0, −π), direction θ = 0;
(f) $f(x, y) = e^{x - y}$ at (1, 1), direction θ = −45°.

29.10 Find dy/dx at the prescribed points on the curves given.
(a) xy = 1 at (2, ½);
(b) $x^2 + y^2 = 25$ at (3, 4);
(c) 1/x − 1/y = ½ at (1, 2);
(d) $\tfrac{1}{10}x^2 + \tfrac{1}{15}y^2 = 1$ at (2, 3);
(e) $x^3 + 2y^3 = 3$ at (1, 1);
(f) $x^3y + 3x^2 - y^2 - 19 = 0$ at (2, 1);
(g) $xy^2 - x^2y + 6 = 0$ at (3, 2);
(h) $x^2 + y^2 = 4$ at (2 cos θ, 2 sin θ);
(i) $x^2/a^2 + y^2/b^2 = 1$ at (a cos t, b sin t);
(j) x cos y = y sin x at (π/2, 0);
(k) $y^2 - 4ax = 0$ at (at², 2at).

29.11 The ideal-gas equation, for a fixed mass of gas, is PV = RT, where R is a constant. There are three variables: P is pressure, V is volume, and T is absolute temperature. Show that
$$\left(\frac{\partial V}{\partial P}\right)_T = -\left(\frac{\partial T}{\partial P}\right)_V \bigg/ \left(\frac{\partial T}{\partial V}\right)_P.$$
(The notation $(\partial u/\partial v)_w$ means that the variable w is kept constant during differentiation when u = g(v, w). Use (29.6).)

29.12 Find the cartesian equation of the tangent line at a point (x₁, y₁) on each of the following curves. (Find dy/dx first.)
(a) $x^2 + y^2 = a^2$; (b) $x^2/a^2 + y^2/b^2 = 1$; (c) $a^2x^2 - b^2y^2 = c$; (d) xy = 1; (e) $x^{2/3} + y^{2/3} = 1$; (f) $ax^2 + 2hxy + by^2 + 2gx + 2fy + c = 0$.

29.13 Suppose that the curves f(x, y) = α and g(x, y) = β intersect at right angles at a point (a, b). Find dy/dx at the point for each curve and deduce that, at (a, b),
$$\frac{\partial f}{\partial x}\frac{\partial g}{\partial x} + \frac{\partial f}{\partial y}\frac{\partial g}{\partial y} = 0.$$
Use this result to confirm that, in the following cases, the two systems of curves are orthogonal (i.e. they always intersect at right angles). Here α and β are the parameters for the two systems: by varying them we obtain all the curves for the systems.
(a) $x^2 + y^2 = \alpha$, $y/x = \beta$; (b) $x^2 - y^2 = \alpha$, $xy = \beta$;
(c) $y^3 - x^3 = \alpha$, $1/y + 1/x = \beta$; (d) $(x^2 + y^2)/x = \alpha$, $(x^2 + y^2)/y = \beta$.

29.14 Let (x, y) be any point on the curve $y^3 - x^3 = 1$. Find an expression for dy/dx at the point. Since this expression holds good for every point on the curve, it is a differential equation, having the given curve as one of its solution curves. Verify this by solving it, and obtain the other solutions.

29.15 (Numerical). Form the differential equation for the following families of curves, in which c is the parameter; then use the numerical solution method of Section 22.2 to obtain a contour map of the functions concerned.
(a) $x^2 + 2y^2 = c$, c > 0; (b) $x^2 + xy - y^3 = c$; (c) $(x^2 + y)/(x + y^2) = c$; (d) $xy\,e^{-x} = c$.

29.16 Form the differential equation for each system of curves, and deduce the differential equation for the orthogonal (perpendicular) system. Solve it to obtain the orthogonal system.
(a) $y^2 - x^2 = c$; (b) $y^3 + x^3 = c$; (c) $y^2 = cx$; (d) $e^y - e^x = c$.

29.17 Find the curves of steepest ascent from an arbitrary point (a, b) for each of the following functions.
(a) $\tfrac12 x^2 + y^2$; (b) $x^3y^3$; (c) $\tfrac12 y^2 - y - x^2$.

29.18 Implicit differentiation of y with respect to x can be carried out as follows when f(x, y) is given explicitly. Consider $f(x, y) = x^2 + 2xy + y^2 = c$. Then, by differentiating this equation and treating y as a function of x, we obtain
$$2x + 2x\frac{dy}{dx} + 2y + 2y\frac{dy}{dx} = 0,$$
from which dy/dx can be found. Check that (29.6) gives the same result.

29.19 Find normal vectors to the curves below, and find the angle between them at the intersection given.
(a) xy = 2, $x^2 - y^2 = -3$, at intersection (1, 2);
(b) $y = x^3$, $x^2 + \tfrac12 y^2 = 36$, at intersection (2, 8);
(c) $x^2 + xy + y^2 = 3$, x + y = 2, at intersection (1, 1); interpret your result geometrically;
(d) $ax^2 + 2hxy + by^2 + c = 0$ and $ax_0x + \tfrac12 h(x_0 + x)(y_0 + y) + by_0y + c = 0$, at any point (x₀, y₀) which lies on the first curve.

29.20 Find d²y/dx² on the following curves.
(a) $x^4 - y^4 = 1$; (b) xy = 1; (c) $xy\,e^{xy} = 1$.

29.21 Obtain grad f, where f(x, y) is given by the following. Give its components, its direction, and its magnitude at the points specified.
(a) 1/(x + y) at (1, −2); (b) y/x at (2, 0); (c) $y^2 - 3x^2 + 1$ at (0, 0); (d) 1/x − 1/y at (2, 1);
(e) 1/r, where r is the polar coordinate, $r = (x^2 + y^2)^{1/2}$; confirm that the gradient vector points in a radial direction.

29.22 Use the gradient vector to obtain a unit vector perpendicular to the following curves at the points given.
(a) 2x − 3y + 1 = 0 at any point;
(b) $x^2 + y^2 = 5$ at (2, 1);
(c) $x^2 + y^2 = r^2$ at (x₀, y₀) on the circle;
(d) $x^2/a^2 + y^2/b^2 = 1$ at (x₀, y₀) on the ellipse;
(e) $y = 3x^2 - 2$ at (2, 10).

29.23 Use the property (29.9) to find the angle of intersection of the following curves at the point of intersection given.
(a) $y^2 - x^2 = -3$ and $x^3 - y^3 = 7$ at (2, 1);
(b) $x^2y - xy^2 = 0$ and x/y − y/x = 0 at (2, 2);
(c) $x^2 + y^2 + 2x - 4y + 4 = 0$ and $y = x^2 + 2x + 2$ at (−1, 1); explain the result geometrically.

29.24 Use (29.12) to prove the results given in Section 29.3 for a general f(x, y): that (a) the directions of most rapid increase and decrease through a point (x, y) are perpendicular to the direction of the contour through the point; (b) the maximum rate of increase from the point is equal to |grad f| at the point.
30 Chain rules, restricted maxima, coordinate systems

CONTENTS

30.1 Chain rule for a single parameter 664
30.2 Restricted maxima and minima: the Lagrange multiplier 667
30.3 Curvilinear coordinates in two dimensions 672
30.4 Orthogonal coordinates 675
30.5 The chain rule for two parameters 676
30.6 The use of differentials 679
Problems 681

A chain rule is a rule for manipulating ‘functions of functions’. Chain rules are
analogous to the chain rule for a single variable of Sections 3.3 and 3.6, but there
are many forms available for use with two or more variables.
The first application of a chain rule (eqn (30.1)) is to the question of locating a
maximum/minimum. Other applications are described in subsequent sections.

30.1 Chain rule for a single parameter


Suppose that x and y depend on, or are functions of, another variable t (say) which
we call the parameter. It might represent time, for example. We shall write
x = x(t), y = y(t).
As t varies, the point (x, y) follows a curve of some sort which is said to be defined
parametrically. The curve also has a characteristic direction, which is the direction in which the curve is described as t is increasing, and is indicated by an arrow.
We then have a directed path.

Example 30.1 Show that both of the following parametrizations define a unit semicircle, centred at the origin, in the upper half plane, traced anticlockwise: (a) x = cos t, y = sin t, where t increases from 0 to π; (b) $x = -u$, $y = (1 - u^2)^{1/2}$, where u increases from −1 to 1.
(a) The shape of the curve is obtainable by eliminating t:
$$x^2 + y^2 = \cos^2 t + \sin^2 t = 1;$$
so the points lie on the unit circle. Also, as t increases from 0 to π, y ≥ 0 and x decreases from 1 to −1. The path is the upper semicircle from (1, 0) to (−1, 0), described in a single direction, as shown in Fig. 30.1a.

[Fig. 30.1: (a) the upper unit semicircle, with t = 0 at (1, 0), t = ¼π, ½π, ¾π along the arc, and t = π at (−1, 0); (b) the same semicircle, with u = −1 at (1, 0), u = −0.8, −0.4, 0, 0.4, 0.8 along the arc, and u = 1 at (−1, 0).]

(b) $x^2 + y^2 = (-u)^2 + (1 - u^2) = 1$. As u increases from −1 to 1, y remains non-negative while x decreases from 1 to −1. The path is as in (a): see Fig. 30.1b.

Given a function f(x, y) which can take values all over the (x, y) plane, the function
g(t) = f(x(t), y(t))
picks out only the values on the path (x(t), y(t)). As we move along this path, the value of g(t) varies, and we might be concerned with the rate at which it changes with t. (This is generally different from the rate at which f(x, y) changes with distance along the path, which is equal to the directional derivative (29.4), and corresponds to using arc-length s as the parameter.)
To find this rate, dg/dt, which we shall write simply as df/dt, suppose that t increases from t to t + δt. Then, on the curve (x(t), y(t)), x changes from x to x + δx and y from y to y + δy. Divide (29.2) (the incremental approximation) by δt:
$$\frac{\delta f}{\delta t} \approx \frac{\partial f}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial f}{\partial y}\frac{\delta y}{\delta t}.$$
Let δt → 0. Then ‘≈’ becomes ‘=’, δx/δt → dx/dt, and δy/δt → dy/dt, and we have the chain rule (or total derivative):

Chain rule for one parameter
Given f(x, y), x = x(t) and y = y(t),
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
(and similarly with z in place of f, if we write z = f(x, y)). (30.1)
This expression is like the chain rule (3.3) for functions of a single variable, with an extra term for the variable y. Partial-derivative rather than ordinary-derivative signs are written where necessary.

Example 30.2 Let $f(x, y) = xy - y^2$, $x = t^2$, $y = t^3$. (a) Find df/dt using the chain rule; (b) find df/dt by substitution.
(a)
$$\frac{\partial f}{\partial x} = y, \quad \frac{\partial f}{\partial y} = x - 2y, \quad \frac{dx}{dt} = 2t, \quad \frac{dy}{dt} = 3t^2.$$
Therefore, by (30.1),
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = y(2t) + (x - 2y)3t^2 = 2t^4 + (t^2 - 2t^3)3t^2 = 5t^4 - 6t^5.$$
(This expression can be written in various ways in terms of x and y, for example as $5x^2 - 6xy$, or $5yx^{1/2} - 6x^2y^{1/3}$. These all look very different, but they all take the same values since x and y are connected by the fact that (x, y) lies on the given curve.)
(b) By substitution,
$$f(x(t), y(t)) = xy - y^2 = t^2t^3 - (t^3)^2 = t^5 - t^6.$$
Therefore, as before,
$$\frac{df}{dt} = 5t^4 - 6t^5.$$
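The chain rule can also be checked numerically. The following Python sketch evaluates df/dt for Example 30.2 from (30.1). It compares the result with a central-difference estimate of the derivative of f(x(t), y(t)) and with the closed form $5t^4 - 6t^5$. The function names are our own, chosen for illustration.

```python
# Numerical check of the one-parameter chain rule (30.1) for Example 30.2:
# f(x, y) = x*y - y**2 on the path x = t**2, y = t**3.

def f(x, y):
    return x * y - y ** 2

def chain_rule_dfdt(t):
    """df/dt assembled from (30.1): f_x * dx/dt + f_y * dy/dt."""
    x, y = t ** 2, t ** 3
    f_x, f_y = y, x - 2 * y            # partial derivatives of f
    dx_dt, dy_dt = 2 * t, 3 * t ** 2   # derivatives along the path
    return f_x * dx_dt + f_y * dy_dt

def finite_difference_dfdt(t, h=1e-6):
    """Central-difference estimate of d/dt f(x(t), y(t))."""
    g = lambda s: f(s ** 2, s ** 3)
    return (g(t + h) - g(t - h)) / (2 * h)

for t in (0.5, 1.0, 1.5):
    exact = 5 * t ** 4 - 6 * t ** 5
    print(t, chain_rule_dfdt(t), finite_difference_dfdt(t), exact)
```

All three columns should agree to within the accuracy of the finite difference.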

Example 30.3 Prove the implicit-differentiation formula (29.6) by using the chain rule with x treated as the parameter.
If f(x, y) = c, then there is a solution y = y(x) for which
$$f(x, y(x)) = c$$
is automatically true for every value of x involved (i.e. it is an identity). Therefore
$$\frac{d f(x, y(x))}{dx} = 0.$$
Comparing this with (30.1), the chain rule, we have x in place of t for the parameter. In terms of the chain rule, we therefore have
$$0 = \frac{\partial f}{\partial x}\frac{dx}{dx} + \frac{\partial f}{\partial y}\frac{dy}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$
From this we recover the implicit-derivative formula (29.6)
$$\frac{dy}{dx} = -\frac{\partial f}{\partial x}\Big/\frac{\partial f}{\partial y}.$$
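A similar numerical check can be made for the implicit-derivative formula. This sketch uses the particular curve $f(x, y) = x^2 + y^2 = 1$, which is our choice and not from the text. It compares $-(\partial f/\partial x)/(\partial f/\partial y)$ with a finite-difference slope of the explicit upper branch $y = (1 - x^2)^{1/2}$:

```python
import math

# Implicit-derivative formula (29.6): dy/dx = -(df/dx) / (df/dy),
# illustrated on the circle f(x, y) = x**2 + y**2 = 1 (upper half, y > 0).

def implicit_slope(x, y):
    f_x, f_y = 2 * x, 2 * y   # partial derivatives of f
    return -f_x / f_y

def explicit_slope(x, h=1e-6):
    """Central difference of the explicit branch y(x) = sqrt(1 - x^2)."""
    y = lambda s: math.sqrt(1 - s * s)
    return (y(x + h) - y(x - h)) / (2 * h)

x = 0.6
y = math.sqrt(1 - x * x)   # the point (0.6, 0.8) on the upper semicircle
print(implicit_slope(x, y), explicit_slope(x))
```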

The chain rule is more useful for obtaining general results, as in Example 30.3,
than in working out special instances such as Example 30.2.

Self-test 30.1
Let $z = f(x, y) = x(3y^2 - x^2)$. Show how z varies on the surface z = f(x, y) when x = cos t, y = sin t ($0 \le t \le 2\pi$). Find dz/dt on x = cos t, y = sin t as a function of t using the chain rule. Where do the stationary values of z occur on this curve on the surface?
30.2 Restricted maxima and minima: the Lagrange multiplier

Consider the simple function
$$f(x, y) = x + y.$$
This has no maxima, minima, or other stationary points, since z = x + y represents an inclined plane. However, if we travel around the plane on a particular path, we are likely to encounter high points and low points, and points where we are momentarily travelling on the level. Suppose that we walk on the circular path $x^2 + y^2 = 1$ on the plane, shown on a map in Fig. 30.2.

Fig. 30.2 Map showing the circular path $x^2 + y^2 = 1$ and the contours of $z = x + y$ projected on to the plane z = 0. A, at $(1/\sqrt{2}, 1/\sqrt{2})$ where $z = \sqrt{2}$, is the maximum on the path; B, at $(-1/\sqrt{2}, -1/\sqrt{2})$ where $z = -\sqrt{2}$, is the minimum.

Then A corresponds to the highest point; this is where we were walking uphill but then turn downhill: it is a local maximum point on the path. If we plotted a graph of elevation against time, this point would show up as a local maximum on the graph.
The clue which reveals A to be a maximum is that one of the contours of x + y is a tangent to the path at A. Those nearby contours that the path crosses are all lower than the one through A. Similarly, at B, there is a local minimum for the path.
This is an example of a restricted stationary-point problem, the 'restriction' being the condition that the only points considered are those that lie on a particular curve. A general statement of the problem is as follows.

Restricted stationary-point problem

Find the stationary points of f(x, y) subject to the condition g(x, y) = c. (30.2)

Very simple problems of this type can be solved by an elementary method, as in the following example.

Example 30.4 Find the maximum possible area a rectangle may have if the perimeter is restricted to length 10 units.
Call the sides x and y. Then we require the maximum of the area A:
$$A = f(x, y) = xy \qquad \text{(i)}$$
subject to the restriction on the perimeter P
$$P = g(x, y) = 2x + 2y = 10. \qquad \text{(ii)}$$
From the perimeter equation (ii), we have y = 5 − x; so the area can be expressed in terms of x only:
$$A = x(5 - x).$$
This has a stationary point where dA/dx = 0, or
$$5 - 2x = 0,$$
that is at $x = \tfrac{5}{2}$. The perimeter equation (ii) gives correspondingly $y = \tfrac{5}{2}$, so the desired shape is a square of area $\tfrac{25}{4}$.

However, although the following problem looks very similar, there turns out to
be a difficulty.
Example 30.5 Find the maxima/minima of $z = f(x, y) = x^2 - y^2$ on the circle $g(x, y) \equiv x^2 + y^2 = 1$.
On the circle,
$$y^2 = 1 - x^2, \qquad -1 \le x \le 1. \qquad \text{(i)}$$
The values taken by $z = x^2 - y^2$ on this curve are given in terms of x by
$$z = x^2 - (1 - x^2) = 2x^2 - 1, \qquad -1 \le x \le 1.$$
The only stationary points of this function are where
$$\frac{d}{dx}(2x^2 - 1) = 0,$$
which is at x = 0. At x = 0, the curve equation (i) gives y = ±1, so we have found the points A: (0, 1) and A′: (0, −1). These are in fact minima, and they are shown on the path in Fig. 30.3a.
However, there are plainly two maxima also, at B and B′, which are completely missed by the process above. We could have found them (but lost A and A′) if we had substituted for x instead of y by means of $x^2 = 1 - y^2$. You can see the reason for losing B and B′ if you sketch the function $2x^2 - 1$ between x = ±1. The maximum values are at the ends, but cannot be found by differentiating; see also Example 4.8.
The restricted maximum and minimum values occur on the curve which is the intersection of the circular cylinder $x^2 + y^2 = 1$ with the saddle $z = x^2 - y^2$ shown in Fig. 30.3b.
Fig. 30.3 (a) Contour map of $f(x, y) = x^2 - y^2$, showing also the curve $g(x, y) = x^2 + y^2 = 1$. Here A and A′ are minima, and B and B′ are maxima. (b) The circle $x^2 + y^2 = 1$ in the x, y plane, with the corresponding values of $z = x^2 - y^2$ shown.

We can get over this difficulty by parametrizing the curve g(x, y) = c as in the
following example, which repeats Example 30.5.

Example 30.6 Find the stationary points of $x^2 - y^2$ on the curve $x^2 + y^2 = 1$.
Put
$$x = \cos t, \quad y = \sin t, \quad 0 \le t \le 2\pi;$$
then the circle C: $x^2 + y^2 = 1$ is traced once, anticlockwise, starting and ending at (1, 0). On C,
$$f(x, y) = \cos^2 t - \sin^2 t.$$
As we go along the path C, stationary points are encountered where
$$0 = \frac{d f(x(t), y(t))}{dt} = 2\cos t(-\sin t) - 2\sin t\cos t = -4\sin t\cos t = -2\sin 2t.$$
The solutions of this equation in the range $0 \le t \le 2\pi$ are $t = 0, \tfrac12\pi, \pi, \tfrac32\pi, 2\pi$. These correspond to the points (1, 0), (0, 1), (−1, 0), (0, −1), and (1, 0) again. Therefore this approach successfully found all the stationary points on the path, which the method of Example 30.5 failed to do. The stationary points can be classified by using the second-derivative test, eqn (4.2).
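The parametric method is also easy to carry out numerically. This Python sketch is our own construction. It scans t for sign changes of $dF/dt = -2\sin 2t$ from Example 30.6, refines each one by bisection, and reports the corresponding point (x, y) together with the value of $f = x^2 - y^2$ there:

```python
import math

def F(t):
    """f = x^2 - y^2 evaluated on the path x = cos t, y = sin t."""
    return math.cos(t) ** 2 - math.sin(t) ** 2

def dF_dt(t):
    """Derivative along the path, as found in Example 30.6."""
    return -2 * math.sin(2 * t)

def bisect(g, a, b, tol=1e-12):
    """Root of g in [a, b]; assumes g(a) and g(b) have opposite signs."""
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if (gm > 0) == (ga > 0):
            a, ga = m, gm
        else:
            b = m
    return 0.5 * (a + b)

def stationary_points(n=1000, shift=0.1):
    """Scan one full turn of the circle for sign changes of dF/dt.

    Starting at t = -shift puts the root at t = 0 (equivalently 2*pi)
    strictly inside the scan, so the point (1, 0) is reported only once.
    """
    ts = [-shift + 2 * math.pi * k / n for k in range(n + 1)]
    found = []
    for a, b in zip(ts, ts[1:]):
        if (dF_dt(a) > 0) != (dF_dt(b) > 0):
            t = bisect(dF_dt, a, b)
            x = round(math.cos(t), 6) + 0.0   # adding 0.0 turns -0.0 into 0.0
            y = round(math.sin(t), 6) + 0.0
            found.append(((x, y), round(F(t), 6)))
    return found

for point, value in stationary_points():
    print(point, value)
```

The values f = 1 at (±1, 0) and f = −1 at (0, ±1) agree with the maxima B, B′ and minima A, A′ of Example 30.5.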

We shall now describe the Lagrange-multiplier method for solving the restricted stationary-value problem (30.2). This uses the parametric idea, but all reference to a parameter is eliminated eventually, so that we do not have to invent a parametrization and then go through the resulting algebra.
Think of time t as a possible parameter, and P: (x(t), y(t)) as a point moving along the curve with velocity (dx/dt, dy/dt). We shall imagine that g(x, y) = c is expressed parametrically so that (a) the path is traced exactly once as t moves through its range, and (b) dx/dt and dy/dt are never both zero together (if t is time, this means that the moving point P never pauses).


Then as P: (x(t), y(t)) moves along g(x, y) = c, the points Q where df/dt = 0 are the stationary points of f(x(t), y(t)). Therefore, by the chain rule (30.1),
$$\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 0.$$
To get rid of dx/dt and dy/dt, which are special to the particular parametrization chosen, we need another equation. On the curve, g(x, y) has a constant value c, so dg/dt = 0 at every point including Q. Therefore by the chain rule,
$$\frac{\partial g}{\partial x}\frac{dx}{dt} + \frac{\partial g}{\partial y}\frac{dy}{dt} = 0.$$
These last two equations can be regarded as a pair of homogeneous algebraic equations for dx/dt and dy/dt. From Section 12.5, the equations have a nontrivial solution if and only if the determinant of the coefficients is zero, so at the (unknown) point Q
$$\frac{\partial f}{\partial x}\frac{\partial g}{\partial y} - \frac{\partial f}{\partial y}\frac{\partial g}{\partial x} = 0.$$
This can be written alternatively as the pair of equations
$$\frac{\partial f}{\partial x} = \lambda\frac{\partial g}{\partial x}, \qquad \frac{\partial f}{\partial y} = \lambda\frac{\partial g}{\partial y}, \qquad \text{(30.3a, b)}$$
where λ ('lambda') is a new unknown constant, called the Lagrange multiplier for the problem. This step assumes that ∂g/∂x and ∂g/∂y are not both zero at Q, so that λ is well defined. We have lost some information here, because the condition dg/dt = 0 did not distinguish between one value of c and another, so we reassert the condition
$$g(x, y) = c. \qquad \text{(30.3c)}$$
Looking back, we have three unknowns: x and y (the coordinates of any stationary point Q) and λ, another constant. To determine these, there are three equations: (30.3a, b) and (30.3c). Finally, we summarize the method.

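Lagrange-multiplier method for the restricted stationary-value problem

To find the stationary points of f(x, y) subject to g(x, y) = c, solve the following equations for x, y, λ:
$$g(x, y) = c, \qquad \text{(i)}$$
$$\frac{\partial f}{\partial x} - \lambda\frac{\partial g}{\partial x} = 0, \qquad \text{(ii)}$$
$$\frac{\partial f}{\partial y} - \lambda\frac{\partial g}{\partial y} = 0. \qquad \text{(iii)}$$
(The value of λ can usually be discarded.) (30.4)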
Notice that all reference to the parameter t has disappeared. There are many ways of proving (30.4), but this is probably the simplest for two dimensions. The problem is treated for three dimensions in Section 31.8.


Example 30.7 Find the stationary points of $x^2 - y^2$ on the circle $x^2 + y^2 = 1$ (compare Examples 30.5 and 30.6).
In (30.4), $f(x, y) = x^2 - y^2$ and $g(x, y) = x^2 + y^2$, with c = 1. The equations to be solved, in the order of (30.4), become
$$x^2 + y^2 = 1, \qquad \text{(i)}$$
$$2x - \lambda(2x) = 0 \quad\text{or}\quad (1 - \lambda)x = 0, \qquad \text{(ii)}$$
$$-2y - \lambda(2y) = 0 \quad\text{or}\quad (1 + \lambda)y = 0. \qquad \text{(iii)}$$
From (ii), either λ = 1 or x = 0. Taking these possibilities in order:
If λ = 1, then (iii) gives y = 0; consequently (i) gives x = ±1. Therefore we have found the points (1, 0) and (−1, 0), which we called B′ and B in Example 30.5.
If x = 0, then (i) gives y = ±1, and since y ≠ 0, (iii) then gives λ = −1. We have therefore found the points (0, 1) and (0, −1), which we called A and A′ in Example 30.5.
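Equations (i) to (iii) of (30.4) can also be solved numerically. The sketch below is not part of the method described in the text. It applies Newton's method, with a finite-difference Jacobian, to the three equations for x, y and λ. The helper names `solve3` and `lagrange_newton` are hypothetical. Newton's method finds only the one solution near its starting guess, so it complements the systematic case analysis of Example 30.7 but does not replace it.

```python
def solve3(A, b):
    """Solve the 3x3 system A p = b by Gaussian elimination with partial pivoting."""
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= factor * M[col][k]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(M[r][k] * p[k] for k in range(r + 1, 3))
        p[r] = (M[r][3] - s) / M[r][r]
    return p


def lagrange_newton(grad_f, g, grad_g, c, start, tol=1e-12, max_iter=50, h=1e-7):
    """Newton's method on equations (i)-(iii) of (30.4) for the unknowns (x, y, lam)."""

    def residual(p):
        x, y, lam = p
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        return [g(x, y) - c, fx - lam * gx, fy - lam * gy]

    p = list(start)
    for _ in range(max_iter):
        r = residual(p)
        if max(abs(v) for v in r) < tol:
            break
        # Jacobian of the residual by central differences; column j is d/dp[j].
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            up, down = list(p), list(p)
            up[j] += h
            down[j] -= h
            r_up, r_down = residual(up), residual(down)
            for i in range(3):
                J[i][j] = (r_up[i] - r_down[i]) / (2 * h)
        step = solve3(J, [-v for v in r])
        p = [pi + si for pi, si in zip(p, step)]
    return tuple(p)


# Example 30.7: f = x^2 - y^2 on x^2 + y^2 = 1, starting near B' = (1, 0).
print(lagrange_newton(lambda x, y: (2 * x, -2 * y),
                      lambda x, y: x * x + y * y,
                      lambda x, y: (2 * x, 2 * y),
                      1.0, (0.9, 0.1, 0.5)))

# Example 30.8: area 4xy on x^2 + 4y^2 = 1, starting near the expected vertex.
print(lagrange_newton(lambda x, y: (4 * y, 4 * x),
                      lambda x, y: x * x + 4 * y * y,
                      lambda x, y: (2 * x, 8 * y),
                      1.0, (0.7, 0.35, 0.9)))
```

Started near (1, 0), the first call should settle on B′ with λ = 1. The second should approach the vertex $(1/\sqrt{2}, 1/(2\sqrt{2}))$ found in Example 30.8, again with λ = 1.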

The equations obtained are often awkward to solve. It is best to be very systematic, not wandering aimlessly between the equations. Be careful not to overlook possibilities (such as that (ii) in Example 30.7 is solved by λ = 1); and check at the end that the solutions actually fit. The values found for λ do have a special significance in certain subjects but otherwise can be thrown away.

Example 30.8 Find the rectangle of maximum area which can be placed symmetrically in the ellipse $x^2 + 4y^2 = 1$ as shown in Fig. 30.4.

Fig. 30.4 A rectangle placed symmetrically in the ellipse $x^2 + 4y^2 = 1$, with vertex A at (x, y).

Suppose that one of the vertices, say A, is at (x, y). We shall require that x and y be positive, since this is sufficient, given the symmetry. The area is equal to $4xy = f(x, y)$, while x and y are subject to $g(x, y) = x^2 + 4y^2 = 1$.
The three equations, taken in the order of (30.4), become (after dividing the second through by 2 and the third by 4)
$$x^2 + 4y^2 = 1, \qquad \text{(i)}$$
$$2y - \lambda x = 0, \qquad \text{(ii)}$$
$$x - 2\lambda y = 0. \qquad \text{(iii)}$$
Suppose that neither x nor y is zero (that could not give a maximum). Then, from (ii) and (iii),
$$\lambda = 2y/x = x/(2y), \qquad \text{(iv)}$$
so $x^2 = 4y^2$, that is x = ±2y. However, x and y must have the same sign for positive area, and we postulated that x and y should be positive. Therefore
$$x = 2y > 0; \qquad \text{(v)}$$
so, from (iv) again,
$$\lambda = 1. \qquad \text{(vi)}$$
Use (v) to substitute for x in (i): we get $8y^2 = 1$, or (rejecting negative values of y)
$$y = 1/(2\sqrt{2}),$$
and (v) gives correspondingly
$$x = 1/\sqrt{2}.$$
The sides have lengths $2y = 1/\sqrt{2}$ and $2x = \sqrt{2}$, so the area is 1.

Self-test 30.2
Using the Lagrange multiplier method, find the stationary values of $f(x, y) = x^2 - 3xy + y^2$ on the circle $x^2 + y^2 = 2$.

30.3 Curvilinear coordinates in two dimensions


Suppose that x and y are functions of two parameters, or variables, u and v. To
indicate this, write
x = x(u, v), y = y(u, v).

This situation arises when we change coordinates from (x, y) to another system.
For example, the equations
x = u cos v, y = u sin v,
represent polar coordinates, with u as the radial and v as the angular coordinate.
Now hold v constant; put
v = β,
say, and let u vary. Then
x = u cos β, y = u sin β.
Here u is the only active parameter; as it varies, (x, y) traces a radial straight line.
Suppose instead that u is held constant, say
u = α;
then, as v varies, (x, y) follows the circle of radius | α |
x = α cos v, y = α sin v.
The point where the two curves intersect can be described either by
u = α, v = β
in the new (polar) coordinates, or in the original coordinates by
x = α cos β, y = α sin β.
In general, if we have
x = x(u, v), y = y(u, v),
and vary u and v together in an arbitrary way, then the corresponding points (x, y)
will completely cover some area in the (x, y) plane. If, however, we put u = α and vary
v, then put v = β and vary u, we obtain two families of curves in parametric form:
(x(α, v), y(α, v)) and (x(u, β ), y(u, β )),
where the parameters are u and v. By choosing different values for α and β, we
produce a net consisting of two independent systems of curves. This can serve as
a new coordinate system.

Example 30.9 Sketch the coordinate system defined by


x = u + v, y = u − v.
Put u = α (constant) and vary v in the equations
x = α + v, y = α − v.
Eliminating the active parameter v between the two equations:
y = −x + 2α,
which is a straight line. By taking different values of α, we obtain a system of parallel
straight lines as in Fig. 30.5a.

Fig. 30.5 (a) The family u = constant (lines y = −x + 2u; u = −2, −1, 0, 1, 2 shown). (b) The family v = constant (lines y = x − 2v; v = −2, −1, 0, 1, 2 shown). (c) Both families together, forming the (u, v) coordinate grid.

Put v = β and vary u:


x = u + β, y = u − β.
Therefore
y = x − 2β,
which gives another family of parallel straight lines, obtained by taking various values
for the constant β, as in Fig. 30.5b.
The two families happen to be at right angles. Taken together, as in Fig. 30.5c, they
form a left-handed system of cartesian coordinates (u, v) with origin at x = 0, y = 0.
New coordinates (u, v) can alternatively be defined in the form
u = u(x, y), v = v(x, y).


For example, the system
$$u = (x^2 + y^2)^{1/2}, \qquad v = \arctan(y/x)$$
defines polar coordinates, u (radial) and v (angular). To sketch the curves corresponding to constant u or v, we put
α = u(x, y) or β = v(x, y);
each of these gives the corresponding curve implicitly.

Example 30.10 Sketch the coordinate system (u, v) described by
$$u = y^2 - 2x^2, \qquad v = x^{1/2}y.$$
The curve u = α is obtained in terms of x and y by solving
$$\alpha = y^2 - 2x^2.$$
These curves (for various values of α) are in fact recognizable without solving, being a system of hyperbolas with asymptotes
$$y = \pm\sqrt{2}\,x.$$
The curves v = β are given by $x^{1/2}y = \beta$, so $y = \beta/x^{1/2}$. The system is sketched in Fig. 30.6 for the first quadrant. Notice that v = 0 on both x = 0 and y = 0: the connection between (x, y) and (u, v) is not one-to-one over the whole (x, y) plane.

Fig. 30.6 The curves u = −1, 0, 1 and v = 0, 1, 2, 3 in the first quadrant.

Self-test 30.3
Sketch the curvilinear coordinates defined by the elliptic system
x = cosh u cos v, y = sinh u sin v.
30.4 Orthogonal coordinates

Suppose that we have a (u, v) system of coordinates defined either by x = x(u, v), y = y(u, v), or by u = u(x, y), v = v(x, y), and the curves u = α and v = β always intersect at right angles for any constants α and β. Then the (u, v) system is said to be an orthogonal system of coordinates. For example, polar coordinates are orthogonal. Coordinate systems which are not orthogonal are seldom used because of the complexity of the formulae connected with them. A test for orthogonality is the following.

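Conditions for an orthogonal system of coordinates

The (u, v) system is orthogonal if
either (a) u = u(x, y), v = v(x, y), and
$$\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = 0;$$
or (b) x = x(u, v), y = y(u, v), and
$$\frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \frac{\partial y}{\partial u}\frac{\partial y}{\partial v} = 0. \qquad (30.5)$$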

We prove this result as follows.

(a) Consider the curve from each family which passes through (x, y). According to (29.7), normal vectors to the two curves at (x, y) are $\mathbf{n}_1 = (\partial u/\partial x, \partial u/\partial y)$ and $\mathbf{n}_2 = (\partial v/\partial x, \partial v/\partial y)$ respectively. The curves meet in a right angle if their normals do so, and the condition for this is $\mathbf{n}_1 \cdot \mathbf{n}_2 = 0$, which is equivalent to the condition given in (30.5a).
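(b) Consider the curves u = α and v = β which pass through a point P which has new coordinates (α, β). Their parametric equations are
x = x(α, v), y = y(α, v), for the curve u = α,
x = x(u, β), y = y(u, β), for the curve v = β.
Their slopes at P are given by
$$\left(\frac{dy}{dx}\right)_P = \left(\frac{\partial y}{\partial v}\Big/\frac{\partial x}{\partial v}\right)_P \text{ on } u = \alpha, \qquad \left(\frac{dy}{dx}\right)_P = \left(\frac{\partial y}{\partial u}\Big/\frac{\partial x}{\partial u}\right)_P \text{ on } v = \beta.$$
The condition for the curves to be perpendicular is that the product of these slopes should equal −1, that is $\frac{\partial y}{\partial u}\frac{\partial y}{\partial v} = -\frac{\partial x}{\partial u}\frac{\partial x}{\partial v}$, and this is equivalent to the result in (30.5b).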

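Example 30.11 Confirm that the following coordinate systems (u, v) are orthogonal. (a) $u = y^2 - 2x^2$, $v = x^{1/2}y$ (x > 0); (b) $x = 2uv$, $y = u^2 - v^2$.
(a) Use (30.5a). We have
$$\frac{\partial u}{\partial x} = -4x, \quad \frac{\partial v}{\partial x} = \tfrac12 x^{-1/2}y, \quad \frac{\partial u}{\partial y} = 2y, \quad \frac{\partial v}{\partial y} = x^{1/2},$$
so
$$\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = -4x\left(\tfrac12 x^{-1/2}y\right) + 2y\left(x^{1/2}\right) = 0.$$
(b) Use (30.5b); notice how this condition is differently structured from (30.5a). We have
$$\frac{\partial x}{\partial u} = 2v, \quad \frac{\partial x}{\partial v} = 2u, \quad \frac{\partial y}{\partial u} = 2u, \quad \frac{\partial y}{\partial v} = -2v;$$
so
$$\frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \frac{\partial y}{\partial u}\frac{\partial y}{\partial v} = 2v(2u) + 2u(-2v) = 0.$$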
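Condition (30.5b) can be spot-checked numerically. In this sketch, which is our own, the partial derivatives are estimated by central differences. The shear system x = u + v, y = v is an added non-orthogonal contrast and does not come from the text.

```python
# Numerical check of condition (30.5b) with central-difference partial derivatives.

def orthogonality_b(x_of, y_of, u, v, h=1e-6):
    """Return x_u*x_v + y_u*y_v at (u, v); zero (to rounding) for an orthogonal system."""
    x_u = (x_of(u + h, v) - x_of(u - h, v)) / (2 * h)
    x_v = (x_of(u, v + h) - x_of(u, v - h)) / (2 * h)
    y_u = (y_of(u + h, v) - y_of(u - h, v)) / (2 * h)
    y_v = (y_of(u, v + h) - y_of(u, v - h)) / (2 * h)
    return x_u * x_v + y_u * y_v

systems = {
    "Example 30.11(b): x = 2uv, y = u^2 - v^2": (lambda u, v: 2 * u * v, lambda u, v: u * u - v * v),
    "Example 30.9: x = u + v, y = u - v": (lambda u, v: u + v, lambda u, v: u - v),
    "shear (not orthogonal): x = u + v, y = v": (lambda u, v: u + v, lambda u, v: v),
}

for name, (x_of, y_of) in systems.items():
    values = [orthogonality_b(x_of, y_of, u, v) for u, v in [(1.3, -0.4), (0.2, 0.9)]]
    print(name, [round(val, 6) for val in values])
```

The first two systems give zero at every sampled point. The shear gives 1 everywhere, because its coordinate curves cross at 45°.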

Self-test 30.4
Confirm that the elliptic system x = cosh u cos v, y = sinh u sin v is orthogonal (see Self-test 30.3 and Example 30.11).

30.5 The chain rule for two parameters


Suppose that we have a new set of coordinates defined by
x = x(u, v), y = y(u, v),
and a function f(x, y): an arbitrary function of position. The function f(x, y) can be expressed in terms of the new coordinates; for example, if
$$x = u^2 - v^2, \quad y = 2uv, \quad\text{and}\quad f(x, y) = x^2 + y^2,$$
then
$$f(x, y) = (u^2 - v^2)^2 + (2uv)^2 = (u^2 + v^2)^2$$
when evaluated at the same point.
If we put
z = f(x, y),
then the derivatives ∂z/∂u and ∂z/∂v indicate how z, or f(x, y), changes as we follow the curves of constant v and constant u respectively. Consider the derivative ∂z/∂u, in which v is held constant, at v = β say. Since only u varies, we are able to adopt the single-variable chain rule (30.1), with u instead of t. However, we must write ∂x/∂u and ∂y/∂u instead of dx/du and dy/du in order to indicate that another variable v is present, although it is regarded as constant for the differentiation. We obtain the following.
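Chain rule for two parameters

If x = x(u, v), y = y(u, v), z = f(x, y), then
$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u},$$
$$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}.$$
(Or f may be written instead of z.) (30.6)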

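Example 30.12 Use the chain rule (30.6) to obtain ∂z/∂v where $x = u^2 - v^2$, y = 2uv, and z = xy; check the result by substitution.
For the chain rule, we require
$$\frac{\partial z}{\partial x} = y, \quad \frac{\partial z}{\partial y} = x, \quad \frac{\partial x}{\partial v} = -2v, \quad \frac{\partial y}{\partial v} = 2u.$$
Then
$$\frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v} = -2yv + 2xu = 2u^3 - 6uv^2.$$
To check the result, write z in terms of u and v:
$$z = xy = (u^2 - v^2)2uv = 2u^3v - 2uv^3.$$
Therefore $\partial z/\partial v = 2u^3 - 6uv^2$, as before.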
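As with Example 30.2, the two-parameter chain rule can be spot-checked numerically. This Python sketch compares (30.6) for Example 30.12 with a central difference in v, taken with u held fixed. The function names are illustrative only.

```python
def x_of(u, v):
    return u * u - v * v

def y_of(u, v):
    return 2 * u * v

def dz_dv_chain(u, v):
    """dz/dv from the chain rule (30.6) for z = xy."""
    x, y = x_of(u, v), y_of(u, v)
    z_x, z_y = y, x              # partial derivatives of z = xy
    x_v, y_v = -2 * v, 2 * u     # partial derivatives in v, with u held constant
    return z_x * x_v + z_y * y_v

def dz_dv_numeric(u, v, h=1e-6):
    """Central difference in v of z(u, v) = x(u, v) * y(u, v)."""
    z = lambda s: x_of(u, s) * y_of(u, s)
    return (z(v + h) - z(v - h)) / (2 * h)

u, v = 1.3, 0.7
print(dz_dv_chain(u, v), dz_dv_numeric(u, v), 2 * u ** 3 - 6 * u * v ** 2)
```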

There is clearly no advantage in using the chain rule for a simple explicit case
such as this. The use of such rules is to obtain general results as in the following
examples.

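Example 30.13 Find expressions for ∂z/∂r and ∂z/∂θ when x = r cos θ, y = r sin θ, and z is a function of position.
To use (30.6), put (r, θ) in place of (u, v):
$$\frac{\partial z}{\partial r} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial r} = \cos\theta\,\frac{\partial z}{\partial x} + \sin\theta\,\frac{\partial z}{\partial y},$$
$$\frac{\partial z}{\partial \theta} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial \theta} = -r\sin\theta\,\frac{\partial z}{\partial x} + r\cos\theta\,\frac{\partial z}{\partial y}.$$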

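Example 30.14 Find expressions for ∂z/∂x and ∂z/∂y in terms of ∂z/∂r and ∂z/∂θ, where x = r cos θ, y = r sin θ.
The appropriate form for chain rule (30.6) will be
$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial z}{\partial \theta}\frac{\partial \theta}{\partial x}, \qquad \frac{\partial z}{\partial y} = \frac{\partial z}{\partial r}\frac{\partial r}{\partial y} + \frac{\partial z}{\partial \theta}\frac{\partial \theta}{\partial y}.$$
To find ∂r/∂x etc., use the alternative form for polar coordinates:
$$r = (x^2 + y^2)^{1/2}, \qquad \theta = \arctan(y/x);$$
then
$$\frac{\partial r}{\partial x} = \frac{x}{(x^2 + y^2)^{1/2}} = \frac{r\cos\theta}{r} = \cos\theta;$$
$$\frac{\partial \theta}{\partial x} = \frac{1}{1 + (y/x)^2}\left(-\frac{y}{x^2}\right) = -\frac{y}{x^2 + y^2} = -\frac{r\sin\theta}{r^2} = -\frac{\sin\theta}{r}.$$
Therefore
$$\frac{\partial z}{\partial x} = \cos\theta\,\frac{\partial z}{\partial r} - \frac{\sin\theta}{r}\frac{\partial z}{\partial \theta}.$$
Similarly ∂r/∂y and ∂θ/∂y can be calculated to give
$$\frac{\partial z}{\partial y} = \sin\theta\,\frac{\partial z}{\partial r} + \frac{\cos\theta}{r}\frac{\partial z}{\partial \theta}.$$
(These can also be obtained by treating the pair of expressions for ∂z/∂r and ∂z/∂θ obtained in Example 30.13 as if they were a pair of simultaneous equations for ∂z/∂x and ∂z/∂y, and solving them.)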
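The formulas of Example 30.14 can be tested at a single point. The sketch below uses the test function $f(x, y) = x^2y + \sin y$, chosen purely for illustration. It estimates ∂z/∂r and ∂z/∂θ by central differences, combines them as in Example 30.14, and compares the results with the exact ∂z/∂x and ∂z/∂y:

```python
import math

def f(x, y):
    """Test function chosen only for this check."""
    return x * x * y + math.sin(y)

def z_polar(r, theta):
    """The same function expressed in polar coordinates."""
    return f(r * math.cos(theta), r * math.sin(theta))

def partial(fun, a, b, which, h=1e-6):
    """Central-difference partial derivative of fun(a, b) in argument 0 or 1."""
    if which == 0:
        return (fun(a + h, b) - fun(a - h, b)) / (2 * h)
    return (fun(a, b + h) - fun(a, b - h)) / (2 * h)

r, theta = 1.5, 0.8
x, y = r * math.cos(theta), r * math.sin(theta)
z_r = partial(z_polar, r, theta, 0)
z_theta = partial(z_polar, r, theta, 1)

# Formulas of Example 30.14
z_x_formula = math.cos(theta) * z_r - math.sin(theta) / r * z_theta
z_y_formula = math.sin(theta) * z_r + math.cos(theta) / r * z_theta

# Exact partial derivatives of the test function
z_x_exact = 2 * x * y
z_y_exact = x * x + math.cos(y)

print(z_x_formula, z_x_exact)
print(z_y_formula, z_y_exact)
```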

Example 30.15 Supposing that no further information is provided, simplify the expression
$$\frac{\partial P}{\partial U}\frac{\partial U}{\partial M} + \frac{\partial P}{\partial V}\frac{\partial V}{\partial M}.$$
We may understand from the notation that
P = P(U, V).
The partial derivative notation ∂U/∂M and ∂V/∂M indicates that
U = U(M, …) and V = V(M, …),
at least one more variable being present: the expression does not tell us its name. The chain rule automatically simplifies the expression to
$$\frac{\partial P}{\partial U}\frac{\partial U}{\partial M} + \frac{\partial P}{\partial V}\frac{\partial V}{\partial M} = \frac{\partial P}{\partial M}.$$

Notice how the expressions in (30.6) are formed. Suppose that
P = P(U, V), Q = Q(U, V), U = U(X, Y), V = V(X, Y).
To form, for example, ∂P/∂X, write
$$\frac{\partial P}{\partial X} = \frac{\partial P}{\square}\frac{\square}{\partial X} + \frac{\partial P}{\square}\frac{\square}{\partial X},$$
then fill in the spaces in the first term with ∂U and the second with ∂V.

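Example 30.16 Prove that if (x, y) and (u, v) are coordinates related by
x = x(u, v) and y = y(u, v), (i)
or alternatively by
u = u(x, y) and v = v(x, y), (ii)
then
$$\begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{bmatrix} \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2.$$
In the first matrix, the relations (i) are implied, and in the second the relations (ii).
By multiplying the matrices we obtain
$$\begin{bmatrix} \dfrac{\partial x}{\partial u}\dfrac{\partial u}{\partial x} + \dfrac{\partial x}{\partial v}\dfrac{\partial v}{\partial x} & \dfrac{\partial x}{\partial u}\dfrac{\partial u}{\partial y} + \dfrac{\partial x}{\partial v}\dfrac{\partial v}{\partial y} \\[2mm] \dfrac{\partial y}{\partial u}\dfrac{\partial u}{\partial x} + \dfrac{\partial y}{\partial v}\dfrac{\partial v}{\partial x} & \dfrac{\partial y}{\partial u}\dfrac{\partial u}{\partial y} + \dfrac{\partial y}{\partial v}\dfrac{\partial v}{\partial y} \end{bmatrix}. \qquad \text{(iii)}$$
Each of these elements has the right shape for the representation of a derivative by the chain rule (30.6), though the variable combinations occupying the various positions may seem unusual. The matrix becomes
$$\begin{bmatrix} \dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial y} \\[2mm] \dfrac{\partial y}{\partial x} & \dfrac{\partial y}{\partial y} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$
since x = x(u, v) = x(u(x, y), v(x, y)) = x and y = y(u, v) = y(u(x, y), v(x, y)) = y identically.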
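The identity of Example 30.16 can be checked numerically for a particular system. This sketch assumes the system $x = u^2 - v^2$, $y = 2uv$, restricted to u > 0. Its inverse is $u = ((r + x)/2)^{1/2}$, $v = y/(2u)$ with $r = (x^2 + y^2)^{1/2}$; this is our own derivation, used only for illustration. The forward Jacobian is exact, while the inverse one is estimated by central differences:

```python
import math

def inverse(x, y):
    """(u, v) from (x, y) for x = u^2 - v^2, y = 2uv, taking the branch u > 0."""
    r = math.hypot(x, y)
    u = math.sqrt((r + x) / 2)
    return u, y / (2 * u)

def jac_forward(u, v):
    """[[x_u, x_v], [y_u, y_v]], computed exactly."""
    return [[2 * u, -2 * v], [2 * v, 2 * u]]

def jac_inverse(x, y, h=1e-6):
    """[[u_x, u_y], [v_x, v_y]] by central differences of the inverse map."""
    d_dx = [(a - b) / (2 * h) for a, b in zip(inverse(x + h, y), inverse(x - h, y))]
    d_dy = [(a - b) / (2 * h) for a, b in zip(inverse(x, y + h), inverse(x, y - h))]
    return [[d_dx[0], d_dy[0]], [d_dx[1], d_dy[1]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

u, v = 1.2, 0.5
x, y = u * u - v * v, 2 * u * v
print(inverse(x, y))                                  # recovers (u, v) up to rounding
print(matmul(jac_forward(u, v), jac_inverse(x, y)))   # close to the identity matrix
```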

Self-test 30.5
If x = r cos θ, y = r sin θ, verify identity (iii) in Example 30.16.

30.6 The use of differentials


Problems are sometimes made easier by working directly with the incremental approximation (29.2): if z = f(x, y), then
$$\delta z \approx \frac{\partial z}{\partial x}\delta x + \frac{\partial z}{\partial y}\delta y.$$
This can be more fruitful than searching for a chain rule or other formula which will work. It is customary in certain applications, particularly in thermodynamics, to write such a formula in the form
$$dz = \frac{\partial z}{\partial x}dx + \frac{\partial z}{\partial y}dy,$$
in which '≈' becomes '=' and dx, dy, dz are put in place of δx, δy, δz. Such expressions can be easily manipulated in the same way as the differential forms described in Section 22.4 for functions of a single variable (the theory, however, is somewhat difficult). Here we shall adopt '=' for brevity, but retain δx etc.

Example 30.17 Find a vector normal to the curve f(x, y) = c at a point (x, y) on the curve. (Compare Section 29.5.)
Let P be (x, y) and Q a nearby point (x + δx, y + δy) also on the curve (see Fig. 30.7). Put z = f(x, y).

Fig. 30.7 The curve f(x, y) = c, with points P, Q and R, and the increments δx and δy between P and Q.

Then, since z is constant on the curve (it equals c),
$$\delta z = 0 = \frac{\partial z}{\partial x}\delta x + \frac{\partial z}{\partial y}\delta y,$$
where the derivatives are evaluated at P. This can be written
$$\left(\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y}\right)\cdot(\delta x, \delta y) = 0.$$
But $(\delta x, \delta y) = \overrightarrow{PQ}$ is in the direction of the tangent at P (more and more nearly as PQ becomes smaller, of course), so (∂z/∂x, ∂z/∂y) is a vector in the direction of the normal, as we found in Section 29.6.

Example 30.18 Show that the coordinate system (u, v) defined by
$$x = 2uv \quad\text{and}\quad y = v^2 - u^2$$
is orthogonal.

Fig. 30.8 The curves u = α and v = β meeting at P, with Q nearby on u = α (displacement (2u δv, 2v δv)) and R nearby on v = β (displacement (2v δu, −2u δu)).

We have to show that any two curves given respectively by u = α and v = β intersect in a right angle, as in Fig. 30.8. If u and v are allowed to vary arbitrarily, the incremental formula gives
$$\delta x = 2v\,\delta u + 2u\,\delta v, \qquad \delta y = -2u\,\delta u + 2v\,\delta v. \qquad \text{(i)}$$
But u does not vary on the curve u = α, so δu = 0 and (i) becomes δx = 2u δv, δy = 2v δv. The vector $\overrightarrow{PQ}$ points nearly in the direction of the tangent at P:
$$\overrightarrow{PQ} = (\delta x, \delta y) = (2u\,\delta v, 2v\,\delta v). \qquad \text{(ii)}$$
Similarly, on the curve v = β, we have δv = 0; so
$$\delta x = 2v\,\delta u, \qquad \delta y = -2u\,\delta u. \qquad \text{(iii)}$$
$\overrightarrow{PR}$ points in the direction of the tangent to v = β, and
$$\overrightarrow{PR} = (\delta x, \delta y) = (2v\,\delta u, -2u\,\delta u).$$
From (ii) and (iii), we have
$$\overrightarrow{PQ}\cdot\overrightarrow{PR} = (2u\,\delta v, 2v\,\delta v)\cdot(2v\,\delta u, -2u\,\delta u) = 4uv\,\delta u\,\delta v - 4uv\,\delta u\,\delta v = 0,$$
so the curves intersect in a right angle.

Problems

30.1 Find a parametrization (x(t), y(t)) suitable for the following curves, specifying the range of t required to traverse the curve exactly once, in the anticlockwise direction if the curve is closed.
(a) $x^2 + y^2 = 25$;
(b) $\tfrac14 x^2 + \tfrac19 y^2 = 1$;
(c) $xy = 4$;
(d) $x^2 - y^2 = 1$ (try using the identity $1 + \tan^2 A = 1/\cos^2 A$);
(e) $\tfrac14 x^2 - \tfrac19 y^2 = 1$;
(f) $y^2 = 4ax$;
(g) $(x - 1)^2 + (y - 2)^2 = 9$;
(h) $2x - 5y + 2 = 0$.

30.2 For each of the following cases, obtain df/dt in terms of t by means of the chain rule (30.1).
(a) $f(x, y) = x^2 + y^2$, $x(t) = t$, $y(t) = 1/t$;
(b) $f(x, y) = x^2 - y^2$, $x(t) = \cos t$, $y(t) = \sin t$;
(c) $f(x, y) = xy$, $x(t) = 2\cos t$, $y(t) = \sin t$;
(d) $f(x, y) = x\sin y$, $x(t) = 2t$, $y(t) = t^2$;
(e) $f(x, y) = 4x^2 + 9y^2$, $x(t) = \tfrac12\cos t$, $y(t) = \tfrac13\sin t$.

30.3 Two athletes run around concentric circular tracks of radius r and R with speeds v and V respectively. They start on the same radial line. By using time as a parameter, find the rate of change with time of the distance between them and interpret any stationary points.

30.4 Use the Lagrange-multiplier method to solve the following problems.
(a) Find the maximum area of a rectangle having perimeter of length 10.
(b) Find the rectangle with area 9 which has the shortest perimeter.
(c) Find the stationary points of $x^2 + 2y^2$ subject to $x^2 + y^2 = 1$.
(d) Find the largest rectangle in the first quadrant of the (x, y) plane which has two of its sides along x = 0 and y = 0 respectively, and a vertex on the line 2x + y = 1.
(e) Find the minimum distance of the straight line x + 2y = 1 from the point (1, 1). (It is easier to consider the square of the distance.)
(f) Find the shortest distance from the origin to the curve $x^2 + 8xy + 7y^2 = 225$.
(g) With reference to Fig. 30.4, find the rectangle in the ellipse which has the maximum perimeter.
(h) Find the stationary points of $(x - y + 1)^2$ on $y = x^2$.
(i) Show that in general there are three normals to a parabola from any given point inside it.

30.5 Find the stationary points of f(x, y) on g(x, y) = c (i) by parametrizing the given path as in Example 30.6, (ii) by using the Lagrange-multiplier technique, in each of the following cases.
(a) $f(x, y) = x^2 + y^2$ on $g(x, y) = xy = 1$;
(b) $f(x, y) = x^2 + y^2$ on $(x - 1)^2 + y^2 = 1$;
(c) $f(x, y) = x^2 + 4y^2$ on $x^2 + y^2 = 1$;
(d) $f(x, y) = 3x - 2y$ on $x^2 - y^2 = 4$;
(e) $f(x, y) = xy$ on $g(x, y) = x^2 + y^2 = 1$ (compare this with (a)).

30.6 Show by means of sketches that, for the restricted stationary-value problem, a stationary point can be expected at any point where the curve g(x, y) = c is tangential to a contour of f(x, y). Use this observation to derive the Lagrange-multiplier principle. (Hint: consider the normals at the point of tangency; or use implicit differentiation to get expressions for the directions of the curves there.)
There are cases when a stationary point can occur although the curves are not tangential there. Try to identify these cases by sketching various possibilities. (Hint: they correspond to λ = 0.)

30.7 A change of coordinates from (x, y) to (u, v) is specified by each of the following. Show that the new coordinate system is orthogonal.
(a) $u = 2x + 3y$, $v = -3x + 2y$;
(b) $u = xy$, $v = x^2 - y^2$;
(c) $u = x^2 + 2y^2$, $v = y/x^2$;
(d) $u = xy^2$, $v = y^2 - 2x^2$;
(e) $u = x + 1/x + y^2/x$, $v = y - 1/y + x^2/y$;
(f) $x = 2u - v$, $y = u + 2v$;
(g) $x = u^2 - v^2$, $y = 2uv$;
(h) $x = u/(u^2 + v^2)$, $y = v/(u^2 + v^2)$;
(i) $x = u^2 - v^2$, $y = -2uv$.

30.8 Let r(t) and θ(t) be polar coordinates which are functions of a parameter t.
(a) Express dx/dt and dy/dt in terms of dr/dt, dθ/dt, r, and θ.
(b) Use (a) to obtain expressions for $d^2x/dt^2$ and $d^2y/dt^2$.
(c) Prove that
$$\cos\theta\,\frac{d^2x}{dt^2} + \sin\theta\,\frac{d^2y}{dt^2} = \frac{d^2r}{dt^2} - r\left(\frac{d\theta}{dt}\right)^2,$$
$$\cos\theta\,\frac{d^2y}{dt^2} - \sin\theta\,\frac{d^2x}{dt^2} = \frac{1}{r}\frac{d}{dt}\left(r^2\frac{d\theta}{dt}\right).$$
(These two equations express the radial and tangential components of acceleration, given on the left, in terms of polar coordinates.)

30.9 Use the chain rule (30.6) to find ∂f/∂u and ∂f/∂v in terms of u and v in each of the following cases.
(a) $f(x, y) = 2x - y$, $x = uv$, $y = u^2 - v^2$;
(b) $f(x, y) = y/x$, $x = u + v$, $y = u - v$;
(c) $f(x, y) = y^2$, $x = u^2 + v^2$, $y = v/u$;
(d) $f(x, y) = (x - y)/(x + y)$, $x = v$, $y = u - v$.

30.10 By using the chain rule (30.6) twice, obtain $\partial^2 f/\partial u^2$, $\partial^2 f/\partial v^2$, and $\partial^2 f/\partial u\,\partial v$ in each of the following cases.
(a) $f(x, y) = y/x$, $x = u + v$, $y = u - v$;
(b) $f(x, y) = x^2 + y^2$, $x = uv$, $y = u^2 - v^2$;
(c) $f(x, y) = y^2$, $x = uv$, $y = v$.

30.11 Find expressions for ∂f/∂u, ∂f/∂v, $\partial^2 f/\partial u^2$, $\partial^2 f/\partial v^2$, and $\partial^2 f/\partial u\,\partial v$ if $f(x, y) = g(x^2 - y^2)$, $x = u + v$, $y = u - v$. (The expressions will involve the functions $g'(x^2 - y^2)$ etc.)

30.12 Let w = w(u, v), u = u(x, y), v = v(x, y), where u and v are related in such a way that
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$
Prove that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0.$$
Use the chain rule (30.6) to prove that
$$\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} = \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2\right]\left[\frac{\partial^2 w}{\partial u^2} + \frac{\partial^2 w}{\partial v^2}\right].$$

30.13 Let r and θ be the usual polar coordinates, and z = f(x, y); show that:
(a) $$\left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2 = \left(\frac{\partial z}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial z}{\partial \theta}\right)^2;$$
(b) $$\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = \frac{\partial^2 z}{\partial r^2} + \frac{1}{r}\frac{\partial z}{\partial r} + \frac{1}{r^2}\frac{\partial^2 z}{\partial \theta^2}.$$
31 Functions of any number of variables

CONTENTS

31.1 The incremental approximation; errors
31.2 Implicit differentiation
31.3 Chain rules
31.4 The gradient vector in three dimensions
31.5 Normal to a surface
31.6 Equation of the tangent plane
31.7 Directional derivative in terms of gradient
31.8 Stationary points
31.9 The envelope of a family of curves
Problems

The occurrence of functions of many variables is very common. Some subjects demand continual use of the chain rules in various forms; for example, Hamiltonian mechanics, thermodynamics and general relativity. Fortunately the basic results are mostly natural extensions, to more terms, of those applying to two and three variables.
When the variables number three or more, the possibility of representing the situation adequately on a two-dimensional sheet of paper using geometrical intuition begins to fail, and the treatment becomes more algebraical. As in Chapters 29 and 30, everything springs from the (linear) incremental approximations (31.1), from which the formulae derived in this chapter mostly follow without difficulty. It is useful to know about the gradient vector (see Section 29.6) and the scalar product of two vectors (see Chapter 10).

31.1 The incremental approximation; errors


For functions of three and more variables, simple pictorial representations are not available. Nevertheless many of the important formulae follow the pattern of the two-variable case, simply containing more terms of the same type. This follows from the incremental approximation (29.1), extended to three and more variables.
Suppose that f(x, y, z, …) is any function of N (≥ 3) variables. The partial derivatives ∂f/∂x, ∂f/∂y, ∂f/∂z, … have the same meaning as they did in Chapter 28: during differentiation, all the variables except the named one are treated as constants.

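Higher derivatives are defined as with functions of two variables; for example,
$$\frac{\partial^3 f}{\partial x\,\partial y\,\partial z} = \frac{\partial}{\partial x}\left(\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial z}\right)\right).$$
It follows from the result for second derivatives (Equation (28.3)) that, for smooth functions,
$$\frac{\partial^3 f}{\partial x\,\partial y\,\partial z} = \frac{\partial^3 f}{\partial y\,\partial x\,\partial z} = \frac{\partial^3 f}{\partial z\,\partial y\,\partial x}$$
and so on: the derivatives may be taken in any order.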
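The incremental approximation has the same form as (29.1) and (29.2), simply containing further terms corresponding to the extra variables:

Incremental approximation for f(x, y, z, …)

For small enough increments δx, δy, δz, …:
$$\delta f = f(x + \delta x, y + \delta y, z + \delta z, \dots) - f(x, y, z, \dots) \approx \frac{\partial f}{\partial x}\delta x + \frac{\partial f}{\partial y}\delta y + \frac{\partial f}{\partial z}\delta z + \cdots.$$
If we put w = f(x, y, z, …), this can be written
$$\delta w \approx \frac{\partial w}{\partial x}\delta x + \frac{\partial w}{\partial y}\delta y + \frac{\partial w}{\partial z}\delta z + \cdots. \qquad (31.1)$$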

To prove (31.1), the idea of a tangent plane is not available, so we must go directly for the linear approximation to the function. Put w = f(x, y, z, …), and consider a fixed 'point' P : (x, y, z, …) and another nearby point Q : (x + δx, y + δy, z + δz, …). Then w changes to w + δw. We assume that the relation between δw and δx, δy, δz, … is close to linear for small δx, δy, δz, …; that is to say,
$$\delta w = A\,\delta x + B\,\delta y + C\,\delta z + \cdots + \varepsilon, \tag{31.2}$$
where A, B, … are constants (they depend on the point P but not on the increments) and the error ε is of a lower order of magnitude than the δ quantities (compare (29.2) for two variables).
To find A first, vary only x, so that
δx ≠ 0, δy = δz = ··· = 0.
Put these into (31.2) and divide by δx, giving
$$\frac{\delta w}{\delta x} = A + \frac{\varepsilon}{\delta x}.$$
Now let δx → 0. Then δw/δx → ∂w/∂x, and (since ε is of lower order of magnitude than δx) ε/δx → 0. Therefore
$$\frac{\partial w}{\partial x} = A.$$
Similarly ∂w/∂y = B, and so on, which gives the result (31.1).
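The claim that the error ε in (31.2) is of lower order than the increments can be checked numerically. The following Python sketch is only an illustration: the test function f(x, y, z) = x²y + e^z, the point (1, 2, 0) and the increment pattern (s, −s, 2s) are arbitrary choices, not taken from the text. Shrinking the increments by a factor of 10 should shrink the discrepancy between the exact δf and the linear estimate (31.1) by roughly a factor of 100.

```python
import math

def f(x, y, z):
    # Arbitrary smooth test function, chosen only for illustration
    return x**2 * y + math.exp(z)

def grad_f(x, y, z):
    # Exact partial derivatives of f above
    return (2 * x * y, x**2, math.exp(z))

def increment_check(p, d):
    """Return (exact change in f, linear estimate from (31.1)) for increments d at p."""
    moved = [pi + di for pi, di in zip(p, d)]
    exact = f(*moved) - f(*p)
    linear = sum(g * di for g, di in zip(grad_f(*p), d))
    return exact, linear

p = (1.0, 2.0, 0.0)
for s in (0.1, 0.01, 0.001):
    exact, linear = increment_check(p, (s, -s, 2 * s))
    print(f"s = {s:g}: exact {exact:.6e}, linear {linear:.6e}, difference {exact - linear:.2e}")
```

For this particular choice the difference behaves roughly like 2s², the second-order term that the linear approximation leaves out.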
The incremental approximation (31.1) can be used to estimate errors as in Section 29.2:

Small-error formula
If w = f(x, y, z, …), then (approximately)
$$\Delta w \approx \frac{\partial w}{\partial x}\,\Delta x + \frac{\partial w}{\partial y}\,\Delta y + \frac{\partial w}{\partial z}\,\Delta z + \cdots,$$
where x, y, z, … stand for the measured values and Δ stands for
error = (measured value) − (exact value). (31.3)

Example 31.1 In a triangle ABC, cos C = (a² + b² − c²)/(2ab) (cosine rule, Appendix B). In a particular case, the measured side lengths are a = 3, b = 4, and c = 5.5 units. Possible errors of measurement lie between ±0.1 units. Find the error in estimating C in the worst case. (Compare Example 29.6.)
For ease of differentiation put
$$w = \cos C = \frac{a}{2b} + \frac{b}{2a} - \frac{c^2}{2ab}.$$
Then
$$\Delta(\cos C) = \Delta w \approx \frac{\partial w}{\partial a}\,\Delta a + \frac{\partial w}{\partial b}\,\Delta b + \frac{\partial w}{\partial c}\,\Delta c,$$
where
$$\frac{\partial w}{\partial a} = \frac{1}{2b} - \frac{b}{2a^2} + \frac{c^2}{2a^2 b}, \qquad \frac{\partial w}{\partial b} = -\frac{a}{2b^2} + \frac{1}{2a} + \frac{c^2}{2ab^2}, \qquad \frac{\partial w}{\partial c} = -\frac{c}{ab}.$$
From the measurements,
∂w/∂a = 0.323, ∂w/∂b = 0.388, ∂w/∂c = −0.458.
Therefore Δ(cos C) ≈ 0.323 Δa + 0.388 Δb − 0.458 Δc.
To obtain ΔC from Δ(cos C):
$$\Delta(\cos C) \approx \left(\frac{d}{dC}\cos C\right)\Delta C = (-\sin C)\,\Delta C,$$
where C is in radians. The value of C estimated from the cosine rule using the measured values is 1.791 radians (cos C = −0.219), and sin 1.791 = 0.976. Therefore
ΔC ≈ −0.331 Δa − 0.398 Δb + 0.469 Δc.
The magnitude of ΔC is a maximum if by chance the errors are
Δa = ∓0.1, Δb = ∓0.1, Δc = ±0.1,
and then
ΔC ≈ ±0.120,
which is about a 7% error.
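The arithmetic of Example 31.1 can be reproduced in a few lines of Python. This is a sketch using the partial derivatives derived above, with the cosine rule in the form cos C = (a² + b² − c²)/(2ab); the variable names are our own.

```python
import math

a, b, c = 3.0, 4.0, 5.5
tol = 0.1  # maximum measurement error in each side

# Cosine rule: cos C = (a^2 + b^2 - c^2) / (2ab)
cos_C = (a**2 + b**2 - c**2) / (2 * a * b)
C = math.acos(cos_C)

# Partial derivatives of w = cos C, as derived in the example
dw_da = 1 / (2 * b) - b / (2 * a**2) + c**2 / (2 * a**2 * b)
dw_db = -a / (2 * b**2) + 1 / (2 * a) + c**2 / (2 * a * b**2)
dw_dc = -c / (a * b)

# Delta(cos C) = -sin C * Delta C, so each coefficient is divided by -sin C
coeffs = [d / -math.sin(C) for d in (dw_da, dw_db, dw_dc)]

# Worst case: every error takes the sign that increases |Delta C|
worst = tol * sum(abs(k) for k in coeffs)

print(f"C = {C:.3f} rad")
print("Delta C coefficients:", [round(k, 3) for k in coeffs])
print(f"worst-case |Delta C| = {worst:.3f} rad ({100 * worst / C:.1f}% of C)")
```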

Self-test 31.1
The area A of a triangle in which the angle between two sides of lengths a and b is C is given by A = ½ab sin C. The measured values are a = 2, b = 3 and C = 30°. Possible errors of measurement are ±0.1 for a and b, and ±3° for C. Find the maximum error in A in the worst case.

31.2 Implicit differentiation


There is an analogy with the implicit-differentiation formula (29.6). Suppose that
$$f(x, y, z, \dots) = 0. \tag{31.4}$$
This condition implies that any one of the variables depends on, or is a function of, all the others. For example, if the variables are x, y, z, and r, and
x² + y² + z² − r² = 0,
then
y = ±(r² − x² − z²)^{1/2}.
Subject to (31.4) we can therefore talk about partial derivatives such as ∂y/∂x: we think of y as being a function of the other variables, with all the variables except x and y held constant.
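Suppose that (x, y, z, …) and (x + δx, y + δy, z + δz, …) both satisfy condition (31.4). Then δf = 0 and the incremental approximation gives
$$\frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y + \frac{\partial f}{\partial z}\,\delta z + \cdots \approx 0. \tag{31.5}$$
Suppose next that all the variables except x and y are kept constant, so that δx ≠ 0 and δy ≠ 0, but δz = ··· = 0. Equation (31.5) becomes (∂f/∂x) δx + (∂f/∂y) δy ≈ 0, so that
$$\frac{\delta y}{\delta x} \approx -\frac{\partial f/\partial x}{\partial f/\partial y}.$$
Now let δx → 0 and the equation becomes (compare (29.6))
$$\frac{\partial y}{\partial x} = -\frac{\partial f/\partial x}{\partial f/\partial y}:$$

Implicit differentiation
If f(x, y, z, …) = 0, then
$$\frac{\partial y}{\partial x} = -\frac{\partial f/\partial x}{\partial f/\partial y} \qquad (\text{provided } \partial f/\partial y \neq 0).$$
Any other two variables may be substituted for x and y. (31.6)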
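A short Python check of (31.6) for the sphere example x² + y² + z² − r² = 0 given earlier in this section: the formula gives ∂y/∂x = −x/y, which is compared with a central-difference derivative of the explicit positive root. The sample values x = 0.6, z = 0.3, r = 1 and the helper names are arbitrary choices for illustration.

```python
import math

def implicit_dy_dx(fx, fy):
    """(31.6): dy/dx = -(df/dx)/(df/dy), all other variables held fixed."""
    if fy == 0:
        raise ValueError("df/dy vanishes: y is not locally a function of x here")
    return -fx / fy

# f = x^2 + y^2 + z^2 - r^2 = 0, taking the positive root for y
x, z, r = 0.6, 0.3, 1.0
y = math.sqrt(r**2 - x**2 - z**2)

formula = implicit_dy_dx(2 * x, 2 * y)  # df/dx = 2x, df/dy = 2y

def y_of(xx):
    # Explicit solution y(x) with z and r held constant
    return math.sqrt(r**2 - xx**2 - z**2)

h = 1e-6
numeric = (y_of(x + h) - y_of(x - h)) / (2 * h)
print(formula, numeric)
```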

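Example 31.2 For a fixed mass of gas, an equation of the form f(P, V, T) = 0 holds (the 'equation of state'), where P, V, and T represent the pressure, volume, and temperature respectively. Show that
$$\text{(a)}\ \frac{\partial P}{\partial T}\,\frac{\partial T}{\partial V} = -\frac{\partial P}{\partial V}, \qquad \text{(b)}\ \frac{\partial P}{\partial T}\,\frac{\partial T}{\partial V}\,\frac{\partial V}{\partial P} = -1.$$
The relation f(P, V, T) = 0 implies that any of P, V, or T is a function of the other two variables: P = P(V, T), V = V(T, P), and T = T(P, V). If we hold, say, P constant, then implicit differentiation, by (31.6), gives ∂V/∂T or ∂T/∂V in terms of ∂f/∂V and ∂f/∂T (where we are reminded of the 'constant P' condition by the partial derivative signs instead of dV/dT and dT/dV). Similarly we obtain ∂P/∂T and ∂T/∂P (V constant), and ∂P/∂V and ∂V/∂P (T constant).
(a) From (31.6),
$$\frac{\partial P}{\partial T} = -\frac{\partial f/\partial T}{\partial f/\partial P} \quad\text{and}\quad \frac{\partial T}{\partial V} = -\frac{\partial f/\partial V}{\partial f/\partial T}.$$
Therefore
$$\frac{\partial P}{\partial T}\,\frac{\partial T}{\partial V} = \frac{\partial f/\partial V}{\partial f/\partial P} = -\frac{\partial P}{\partial V} \quad \text{(using (31.6) again).}$$
(b) By repeating the process (a) with different variables,
$$\frac{\partial P}{\partial T}\,\frac{\partial T}{\partial V}\,\frac{\partial V}{\partial P} = \left(-\frac{\partial f/\partial T}{\partial f/\partial P}\right)\left(-\frac{\partial f/\partial V}{\partial f/\partial T}\right)\left(-\frac{\partial f/\partial P}{\partial f/\partial V}\right) = -1.$$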
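The identities of Example 31.2 hold for any equation of state, so they can be checked on a particular one. The sketch below assumes, purely for illustration, an ideal gas, f(P, V, T) = PV − nRT = 0 with n = 1 mol; it solves explicitly for each variable and estimates each partial derivative by a central difference, holding the third variable fixed.

```python
n, R = 1.0, 8.314  # hypothetical: one mole of ideal gas, R in J/(mol K)

def P_of(V, T):
    return n * R * T / V

def T_of(P, V):
    return P * V / (n * R)

def V_of(T, P):
    return n * R * T / P

def deriv(g, u, rel=1e-6):
    """Central-difference derivative of a one-variable function g at u."""
    h = rel * abs(u)
    return (g(u + h) - g(u - h)) / (2 * h)

V0, T0 = 0.024, 300.0
P0 = P_of(V0, T0)

dP_dT = deriv(lambda T: P_of(V0, T), T0)  # (dP/dT) at constant V
dT_dV = deriv(lambda V: T_of(P0, V), V0)  # (dT/dV) at constant P
dV_dP = deriv(lambda P: V_of(T0, P), P0)  # (dV/dP) at constant T
dP_dV = deriv(lambda V: P_of(V, T0), V0)  # (dP/dV) at constant T

print(dP_dT * dT_dV, -dP_dV)   # identity (a): the two numbers should agree
print(dP_dT * dT_dV * dV_dP)   # identity (b): close to -1
```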

There are many more similar formulae obtainable by permuting P, T, V: these identities are important in the theory of thermodynamics. A helpful notation, encountered in physics texts, contains a reminder of the variables to be held constant during differentiation. If f(P, V, T) = 0, then the new notation puts
$$\left(\frac{\partial P}{\partial V}\right)_T$$
in place of ∂P/∂V. The formula in Example 31.2a would become
$$\left(\frac{\partial P}{\partial T}\right)_V \left(\frac{\partial T}{\partial V}\right)_P = -\left(\frac{\partial P}{\partial V}\right)_T,$$
indicating that the symbols ∂T on the left cannot simply be 'cancelled' as with ordinary derivatives.

Self-test 31.2
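Let f(x, y, z) ≡ x³ + yz + xy² + z² = 0. Find the implicit derivatives
$$\left(\frac{\partial x}{\partial z}\right)_y, \quad \left(\frac{\partial z}{\partial y}\right)_x, \quad \left(\frac{\partial x}{\partial y}\right)_z.$$
Verify that
$$\left(\frac{\partial x}{\partial z}\right)_y \left(\frac{\partial z}{\partial y}\right)_x = -\left(\frac{\partial x}{\partial y}\right)_z.$$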

31.3 Chain rules



The chain rule for a single parameter t is obtained exactly as in the case of a single
variable: divide (31.1) by δt and take the limit, to give the following formula.

Chain rule for a single parameter


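Given f(x, y, z, …), where x = x(t), y = y(t), z = z(t), …,
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} + \cdots$$
(or with w in place of f if w = f(x, y, z, …)). (31.7)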

Notice that (x(t), y(t), z(t)) defines a directed path in three dimensions.
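As a sketch (not from the text), the single-parameter chain rule (31.7) can be checked numerically. The test function f = xyz + x² and the helical path x = cos t, y = sin t, z = t are arbitrary choices; the chain-rule value is compared with a direct central difference of f along the path.

```python
import math

def f(x, y, z):
    return x * y * z + x**2  # arbitrary test function

def grad_f(x, y, z):
    return (y * z + 2 * x, x * z, x * y)

def path(t):
    return (math.cos(t), math.sin(t), t)  # a helix, chosen for illustration

def path_velocity(t):
    return (-math.sin(t), math.cos(t), 1.0)  # (dx/dt, dy/dt, dz/dt)

def df_dt_chain(t):
    # (31.7): sum of (partial f / partial x_i) * (dx_i / dt)
    return sum(g * v for g, v in zip(grad_f(*path(t)), path_velocity(t)))

t0, h = 0.7, 1e-6
direct = (f(*path(t0 + h)) - f(*path(t0 - h))) / (2 * h)
print(df_dt_chain(t0), direct)
```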
In the case of more than one parameter, the results of Section 30.1 may be
extended as follows.

Chain rule for more than one parameter


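For a function f(x, y, z, …), where x, y, z, … are functions of parameters u, v, …, we have
$$\frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial u} + \cdots,$$
$$\frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial v} + \cdots,$$
and so on for any other parameters. (If w = f(x, y, z, …) then w may be written in place of f.) (31.8)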

31.4 The gradient vector in three dimensions


The gradient vector function, introduced for two dimensions in Section 29.6, extends to any number of dimensions, though we shall restrict consideration to three variables in this section.
In the equations we have obtained, such as (31.1) and (31.8), there repeatedly occurs the triplet of elements ∂f/∂x, ∂f/∂y, ∂f/∂z, together with various multipliers. We can manipulate this group as a unit by regarding
$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right), \quad\text{or}\quad \mathbf{i}\frac{\partial f}{\partial x} + \mathbf{j}\frac{\partial f}{\partial y} + \mathbf{k}\frac{\partial f}{\partial z},$$
as a vector function, the gradient of f (now in three dimensions), and denote it by
grad f or ∇f,
as before. As in Section 29.6, we can also think of grad or ∇ standing alone as an operator: an instruction to carry out the process
$$\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}, \quad\text{or}\quad \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right),$$
on some scalar function f(x, y, z). The definition is stated for reference as follows.

Gradient vector function (three dimensions)


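For a scalar function f(x, y, z):
$$\operatorname{grad} f \text{ or } \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \mathbf{i}\frac{\partial f}{\partial x} + \mathbf{j}\frac{\partial f}{\partial y} + \mathbf{k}\frac{\partial f}{\partial z}.$$
Alternatively, grad or ∇ stands for the operator
$$\left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right), \quad\text{or}\quad \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right). \tag{31.9}$$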

Example 31.3 Let f(x, y, z) = x² + y² + z². Obtain (a) the vector function grad f(x, y, z); (b) the value of grad f(x, y, z) at the point (1, 2, 3); (c) an expression for the magnitude (or length) of grad f(x, y, z).
(a)
$$\operatorname{grad} f(x, y, z) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = (2x, 2y, 2z);$$
or one can use the 'operator' idea and the other way of writing a vector:
$$\operatorname{grad} f = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)(x^2 + y^2 + z^2) = \mathbf{i}(2x) + \mathbf{j}(2y) + \mathbf{k}(2z).$$
(b) At x = 1, y = 2, z = 3,
grad f = (2, 4, 6).
(c) The magnitude or length |v| of a vector v = (a, b, c) is |v| = (a² + b² + c²)^{1/2}; so
$$|\operatorname{grad} f| = [(2x)^2 + (2y)^2 + (2z)^2]^{1/2} = 2(x^2 + y^2 + z^2)^{1/2}.$$
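When f is only available as a computable function, grad f can be approximated numerically. The following Python sketch (the helper name `grad` and the step h = 10⁻⁶ are our own choices) applies a central difference to each variable in turn and reproduces parts (b) and (c) of Example 31.3 at (1, 2, 3), where |grad f| = 2√14.

```python
import math

def grad(f, p, h=1e-6):
    """Approximate grad f at point p by central differences, one per coordinate."""
    g = []
    for i in range(len(p)):
        up = list(p)
        down = list(p)
        up[i] += h
        down[i] -= h
        g.append((f(*up) - f(*down)) / (2 * h))
    return tuple(g)

def f(x, y, z):
    return x**2 + y**2 + z**2

g = grad(f, (1.0, 2.0, 3.0))
print(g, math.hypot(*g))  # compare with (2, 4, 6) and 2*sqrt(14)
```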

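Expressions which occur in the theory frequently take the form
$$U\frac{\partial f}{\partial x} + V\frac{\partial f}{\partial y} + W\frac{\partial f}{\partial z}, \tag{31.10}$$
where U, V, and W may be constants, or various functions. If we put
$$\mathbf{i}U + \mathbf{j}V + \mathbf{k}W = \mathbf{S},$$
where S is another vector (compare Section 29.6), then we can write (31.10) in the form
$$U\frac{\partial f}{\partial x} + V\frac{\partial f}{\partial y} + W\frac{\partial f}{\partial z} = (U, V, W)\cdot\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = \mathbf{S}\cdot\operatorname{grad} f,$$
as in the following example.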

Example 31.4 Suppose that the concentration of plankton in the sea is C(x, y, z, t). A whale travels on the path x = x(t), y = y(t), z = z(t), where t is time. Show that, along the path of the whale,
$$\frac{dC}{dt} = \frac{\partial C}{\partial t} + \mathbf{v}\cdot\operatorname{grad} C,$$
where v is its velocity and grad C is taken with respect to x, y, z only.
By the chain rule (31.7), treating t as a fourth variable,
$$\frac{dC}{dt} = \frac{\partial C}{\partial x}\frac{dx}{dt} + \frac{\partial C}{\partial y}\frac{dy}{dt} + \frac{\partial C}{\partial z}\frac{dz}{dt} + \frac{\partial C}{\partial t},$$
after putting dt/dt = 1 into the final term. The whale's velocity is
$$\mathbf{v} = \left(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right),$$
so that
$$\frac{dC}{dt} = \frac{\partial C}{\partial t} + \mathbf{v}\cdot\operatorname{grad} C.$$
(If the whale drifted with the motion of the sea, v would represent the velocity of the current. This case is related to the concept of material derivative in fluid mechanics. Instead of C there is a quantity such as the density or momentum of a particular piece of fluid, whose variation we follow as the fluid moves around.)
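The result of Example 31.4 can be illustrated numerically. In this Python sketch both the plankton field C = e^(−0.1t)(x² + yz) and the whale's path (cos t, sin t, −t) are invented for the purpose; the code compares ∂C/∂t + v · grad C with a direct central difference of C along the path.

```python
import math

K = 0.1  # hypothetical decay rate of the plankton field

def C(x, y, z, t):
    # Hypothetical concentration field, chosen only for illustration
    return math.exp(-K * t) * (x**2 + y * z)

def dC_dt_partial(x, y, z, t):
    return -K * C(x, y, z, t)

def grad_C(x, y, z, t):
    # Gradient with respect to x, y, z only
    e = math.exp(-K * t)
    return (2 * x * e, z * e, y * e)

def whale(t):
    return (math.cos(t), math.sin(t), -t)  # hypothetical path: circling while diving

def whale_velocity(t):
    return (-math.sin(t), math.cos(t), -1.0)

def dC_dt_along_path(t):
    # dC/dt = (partial C / partial t) + v . grad C
    p = whale(t)
    v = whale_velocity(t)
    return dC_dt_partial(*p, t) + sum(vi * gi for vi, gi in zip(v, grad_C(*p, t)))

t0, h = 1.3, 1e-6
direct = (C(*whale(t0 + h), t0 + h) - C(*whale(t0 - h), t0 - h)) / (2 * h)
print(dC_dt_along_path(t0), direct)
```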

Self-test 31.3
If f(x, y, z) = x² + y² + 2z², find grad f. At what point on the surface x² + y² + 2z² = 1 is grad f in the direction of the vector (1, 1, 1)?

31.5 Normal to a surface


An equation of the form
g(x, y, z) = k
represents a surface in three dimensions, because we can imagine 'solving' the equation for z in order to obtain equivalent equation(s) of the form
z = f(x, y).
Thus, if x² + y² + z² = 1, then z = ±(1 − x² − y²)^{1/2}. The normal to the surface can be expressed neatly as follows.

Normal (perpendicular) to a surface

Let P be any point on a surface g(x, y, z) = k. Then grad g, evaluated at P, is normal to the surface at P (provided that grad g ≠ 0 at P). (31.11)

[Fig. 31.1: the surface g(x, y, z) = C, showing the point P : (x, y, z), a nearby point Q : (x + δx, y + δy, z + δz) on the surface, and the vector grad g(x, y, z) at P.]

(Compare (29.9), for the normal to a curve in two dimensions.) The proof is as follows. In Fig. 31.1, P : (x, y, z) is the given point on the surface and Q : (x + δx, y + δy, z + δz) is any nearby point on the surface. Then
g(x + δx, y + δy, z + δz) − g(x, y, z) = 0, or δg = 0.
Therefore, by the incremental formula (31.1),
$$0 = \delta g \approx \frac{\partial g}{\partial x}\,\delta x + \frac{\partial g}{\partial y}\,\delta y + \frac{\partial g}{\partial z}\,\delta z = (\operatorname{grad} g)\cdot(\delta x, \delta y, \delta z).$$
This shows that, in the limit as Q approaches P, grad g is perpendicular to the vector PQ = (δx, δy, δz). But PQ can be chosen to point in any direction from P in the surface, so the only possibility is that grad g is perpendicular to the surface itself at P.
We already know (from (28.7)) that a vector normal to a surface described in the form z = f(x, y) is
$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, -1\right).$$
This is reconciled with (31.11) if we write the equation of the surface in the form
g(x, y, z) = f(x, y) − z = 0.

31.6 Equation of the tangent plane


Suppose that a surface is specified in the form g(x, y, z) = k, and that the point P : (a, b, c) is on the surface. The conditions that the tangent plane must satisfy are (a) it contains the point P; and (b) it is perpendicular to the normal vector grad g(x, y, z) evaluated at P, as in (31.11). These conditions are satisfied by the equation
$$\left(\frac{\partial g}{\partial x}\right)_P (x - a) + \left(\frac{\partial g}{\partial y}\right)_P (y - b) + \left(\frac{\partial g}{\partial z}\right)_P (z - c) = 0, \tag{31.12}$$
where g(a, b, c) = k and the subscript P means 'evaluated at P'. It can be seen that the expression is zero when x = a, y = b, z = c. Also the coefficient vector
$$\left(\left(\frac{\partial g}{\partial x}\right)_P, \left(\frac{\partial g}{\partial y}\right)_P, \left(\frac{\partial g}{\partial z}\right)_P\right) = [\operatorname{grad} g]_P$$
is perpendicular to the plane. Therefore we may state the equation of the plane as follows.

Tangent plane to the surface g(x, y, z) = k at P : (a, b, c)

$$\left(\frac{\partial g}{\partial x}\right)_P (x - a) + \left(\frac{\partial g}{\partial y}\right)_P (y - b) + \left(\frac{\partial g}{\partial z}\right)_P (z - c) = 0. \tag{31.13}$$
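Formula (31.13) translates directly into code: the coefficients of the plane are the components of grad g at P. In this sketch the surface x² + y² + z² = 9 and the point P = (1, 2, 2) are hypothetical choices; the plane is returned as a normal vector n and a constant d with n · (x, y, z) = d.

```python
def tangent_plane(grad_g, p):
    """Return (n, d) such that the tangent plane (31.13) at p is n . (x, y, z) = d."""
    n = grad_g(*p)  # normal vector [grad g]_P, as in (31.11)
    d = sum(ni * pi for ni, pi in zip(n, p))
    return n, d

def grad_sphere(x, y, z):
    # g(x, y, z) = x^2 + y^2 + z^2 (hypothetical surface g = 9, a sphere of radius 3)
    return (2 * x, 2 * y, 2 * z)

P = (1.0, 2.0, 2.0)
normal, d = tangent_plane(grad_sphere, P)
print(normal, d)  # (2.0, 4.0, 4.0) 18.0, i.e. the plane 2x + 4y + 4z = 18
```

Writing the plane as n · (x, y, z) = d rather than in the expanded form of (31.13) is only a convenience; the two are the same equation.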

Self-test 31.4
Find the normal to the surface x² + 2xy + xz³ = 4 at the point (1, 1, 1). Hence find the tangent plane to the surface at (1, 1, 1).

31.7 Directional derivative in terms of gradient


The vector grad f contains the necessary information to calculate the rate of change of f(x, y, z) in any direction. In Fig. 31.2, let P : (x, y, z) be any point. Suppose that we require the rate of change with distance of f(x, y, z) in the direction PR.

[Fig. 31.2: a direction PR from the point P, with a nearby point Q on PR and the increments δx, δy, δz along the coordinate directions.]

Choose a nearby point Q : (x + δx, y + δy, z + δz) on PR, and put
PQ = δs = [(δx)² + (δy)² + (δz)²]^{1/2}
(where δs is a standard symbol for a small element of distance). Then
$$\frac{\delta x}{\delta s} = \cos\alpha, \qquad \frac{\delta y}{\delta s} = \cos\beta, \qquad \frac{\delta z}{\delta s} = \cos\gamma,$$
where cos α, cos β, cos γ are the direction cosines of PQ (Section 10.5): α, β, γ are the angles PQ makes with the directions of the coordinate axes. Now divide the incremental approximation (31.1) through by δs and take the limit as δs → 0. We obtain an expression for the rate of change of f(x, y, z) with distance in any direction:

Directional derivative in three dimensions

In the direction having direction cosines (cos α, cos β, cos γ):
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\cos\alpha + \frac{\partial f}{\partial y}\cos\beta + \frac{\partial f}{\partial z}\cos\gamma. \tag{31.14}$$

In the two-dimensional version (29.4), the coefficients cos θ and sin θ are equal to the two-dimensional direction cosines, cos θ and cos(½π − θ), so (29.4) is compatible with (31.14).
The direction cosines cos α, cos β, cos γ have the property cos²α + cos²β + cos²γ = 1 (see Section 10.5), so they are the components of a unit vector v which points in the desired direction. Therefore (31.14) can be written differently:

Directional derivative in three dimensions in terms of the gradient


In the direction of the unit vector v,
$$\frac{df}{ds} = \mathbf{v}\cdot\operatorname{grad} f,$$
which is the component of grad f in direction v. (31.15)
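A numerical illustration of (31.15), using as a test case the function f = 4 − x² − ½y² − ½z² that appears in Example 31.5, at P = (1, 1, 1) and in the (arbitrarily chosen) direction of (1, 2, 2). The value v · grad f is compared with a central difference of f along the unit vector v; both should be close to −2.

```python
import math

def f(x, y, z):
    return 4 - x**2 - 0.5 * y**2 - 0.5 * z**2  # the function of Example 31.5

def grad_f(x, y, z):
    return (-2 * x, -y, -z)

def directional_derivative(p, direction):
    """df/ds = v . grad f, where v is the unit vector along `direction` (31.15)."""
    length = math.sqrt(sum(c * c for c in direction))
    v = [c / length for c in direction]
    return sum(vi * gi for vi, gi in zip(v, grad_f(*p))), v

P = (1.0, 1.0, 1.0)
dfds, v = directional_derivative(P, (1.0, 2.0, 2.0))

def point_along(s):
    # The point at distance s from P in the direction v
    return [pi + s * vi for pi, vi in zip(P, v)]

h = 1e-5
numeric = (f(*point_along(h)) - f(*point_along(-h))) / (2 * h)
print(dfds, numeric)
```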

As in Section 29.6, the result (31.15) can be expressed in a third way. If a and b
are two vectors, and φ is the angle between them, then a · b = | a | | b| cos φ. Putting
v for a and grad f for b in (31.15), and using the fact that |v| = 1, we obtain the next
result.

Directional derivative in three dimensions


$$\frac{df}{ds} = |\operatorname{grad} f|\cos\phi,$$
where φ is the angle between grad f and the unit direction vector v. (31.16)

Now take a function f(x, y, z), and a point P : (x1, y1, z1) as in Fig. 31.3. By means of (31.16), we can explore the rate of variation of f(x, y, z) in all directions, by pointing v in the required directions. The only thing that changes the value of df/ds when we do this is the angle φ. It can be seen from (31.16) that: (i) if φ = ½π, then df/ds = 0, which is consistent with v pointing tangentially to the surface f(x, y, z) = fP, where fP is the value of f at P; (ii) df/ds takes its maximum value |grad f| when φ = 0. That is to say, grad f points in the direction of most rapid increase of f, and it is normal to the surface f(x, y, z) = fP.
It is worth noticing that, for a fixed angle φ, the unit vector v may point anywhere along the generators of a cone having axis grad f, as shown in Fig. 31.3. The directional derivative df/ds is the same in all these directions.

[Fig. 31.3: the vector grad f at P and a unit vector v making an angle φ with it; axes x, y, z.]

Example 31.5 Let f(x, y, z) = 4 − x² − ½y² − ½z² represent the atmospheric concentration of a chemical which attracts insects. (a) Write down grad f at (x, y, z). (b) Find a unit vector v which points in the direction of most rapid rate of increase in f(x, y, z) at the point (1, 1, 1). (c) An insect sets off from (1, 1, 1) and flies a short distance δs in the direction given by (b). Find its new coordinates (approximately).
(a)
$$\operatorname{grad} f(x, y, z) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = (-2x, -y, -z), \quad\text{or}\quad -2x\,\mathbf{i} - y\,\mathbf{j} - z\,\mathbf{k}.$$
(b) grad f always points in the required direction; at the point (1, 1, 1), its components are (−2, −1, −1). To obtain the corresponding unit vector v, divide by the length [(−2)² + (−1)² + (−1)²]^{1/2} = √6, obtaining v = (−2/√6, −1/√6, −1/√6).
(c) The insect moves a distance δs from the point P : (1, 1, 1) along v (see Fig. 31.4), so its vector displacement is
$$\mathbf{v}\,\delta s = \left(-\frac{2}{\sqrt6}\,\delta s, -\frac{1}{\sqrt6}\,\delta s, -\frac{1}{\sqrt6}\,\delta s\right).$$

[Fig. 31.4: the displacement v δs from the point P : (1, 1, 1).]

The components of this vector are the x, y, z displacements
$$\delta x = -\frac{2}{\sqrt6}\,\delta s, \qquad \delta y = -\frac{1}{\sqrt6}\,\delta s, \qquad \delta z = -\frac{1}{\sqrt6}\,\delta s.$$
The new coordinates are therefore
$$x = 1 - \frac{2}{\sqrt6}\,\delta s, \qquad y = 1 - \frac{1}{\sqrt6}\,\delta s, \qquad z = 1 - \frac{1}{\sqrt6}\,\delta s.$$

Example 31.6 Using Example 31.5c as a model, give a systematic method, suitable for computation, for approximating to the path of an insect which always flies in the direction of most rapidly increasing concentration.

[Fig. 31.5: the path P0, P1, P2, …, Pn, Pn+1, … made up of steps of length h, with the unit vector vn at Pn.]

Figure 31.5 shows notionally the path of such an insect. It starts at P0 : (x0, y0, z0), and the path consists of short steps of equal length h (instead of δs, for the purpose of programming). The progression is P0, P1, P2, …, Pn, Pn+1, …, with coordinates numbered (xn, yn, zn) for n = 0, 1, 2, …. At each Pn, the insect moves in the unit-vector direction vn of grad f at Pn, where, as in Example 31.5c,
$$\mathbf{v}_n = \left[\frac{\operatorname{grad} f}{|\operatorname{grad} f|}\right]_{P_n} = \frac{(-2x_n, -y_n, -z_n)}{(4x_n^2 + y_n^2 + z_n^2)^{1/2}} = (a_n, b_n, c_n) \quad \text{(say)}. \tag{31.17}$$
For the general step from Pn to Pn+1, we obtain the small displacement components δxn, δyn, δzn in the x, y, and z directions (in general these will differ from step to step):
(δxn, δyn, δzn) = vn h = (an h, bn h, cn h),
from (31.17). Therefore
$$(x_{n+1}, y_{n+1}, z_{n+1}) = (x_n + \delta x_n, y_n + \delta y_n, z_n + \delta z_n) = (x_n + a_n h, y_n + b_n h, z_n + c_n h). \tag{31.18}$$
Equations (31.17) and (31.18), with the starting point (x0, y0, z0) given, form a step-by-step process which is easy to computerize. The following table of the early stages was calculated with h = 0.05; the starting point in this case is the point (1, 1, 1), where f(x, y, z) = 4 − x² − ½y² − ½z² as in Example 31.5.

n xn yn zn

0 1 1 1
1 0.959 0.980 0.980
2 0.919 0.959 0.959
3 0.878 0.938 0.938
4 0.839 0.917 0.917
5 0.799 0.895 0.895
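A minimal Python sketch of the process defined by (31.17) and (31.18), with the gradient of Example 31.5 hard-coded; with h = 0.05 and starting point (1, 1, 1) it should reproduce the table above to three decimal places. The function name `steepest_ascent` is our own.

```python
import math

def grad_f(x, y, z):
    # f(x, y, z) = 4 - x^2 - y^2/2 - z^2/2, as in Example 31.5
    return (-2 * x, -y, -z)

def steepest_ascent(p0, h, steps):
    """Iterate (31.17)-(31.18): step a distance h along the unit vector of grad f."""
    path = [p0]
    x, y, z = p0
    for _ in range(steps):
        gx, gy, gz = grad_f(x, y, z)
        norm = math.sqrt(gx**2 + gy**2 + gz**2)
        if norm == 0:  # stationary point reached: no direction of ascent
            break
        a, b, c = gx / norm, gy / norm, gz / norm  # (a_n, b_n, c_n) in (31.17)
        x, y, z = x + a * h, y + b * h, z + c * h  # (31.18)
        path.append((x, y, z))
    return path

for n, (x, y, z) in enumerate(steepest_ascent((1.0, 1.0, 1.0), 0.05, 5)):
    print(f"{n}  {x:.3f}  {y:.3f}  {z:.3f}")
```

With a fixed step h the iteration will generally not land exactly on the maximum of f at the origin: once the insect is within about h of it, successive steps overshoot. In practice one would stop when |grad f| becomes small, or reduce h.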
If a surface is defined by the equation
f(x, y, z) = k,
where k is a constant, it is called a level surface of the function f (it is the analogue of a contour in the theory for functions of two variables). For example, the level surfaces of
$$f(x, y, z) = \frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \quad (a, b, c \text{ constants})$$
are the ellipsoids
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = k \quad (k > 0).$$
According to (31.11), therefore, we can say in different language:

Normal to a level surface of f(x, y, z)


grad f, evaluated at a point P, is perpendicular to the level surface of
f(x, y, z) through P. (31.19)

It follows that the insect in Example 31.6 crosses perpendicularly all the level
surfaces that it meets.

Self-test 31.5
If f(x, y, z) = z − x² − y², what are the level surfaces of the function? Find the directional derivative at (1, 1, 3). Sketch the level surface through (1, 1, 3) and the direction of the directional derivative.

31.8 Stationary points


Stationary points (which include maxima and minima) are more difficult to discuss and visualize in more than two dimensions, since we no longer have the horizontal tangent plane to refer to. We should expect a stationary point of f(x, y, z, …) to occur at any point Q where
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = \frac{\partial f}{\partial z} = \cdots = 0, \tag{31.20}$$
since all our previous formulae have been merely extended versions of the two-dimensional case. To show that this criterion is the right one, choose any path through Q, and suppose that we describe it parametrically by
x = x(t), y = y(t), z = z(t), … .
Then, if (31.20) holds at Q, the chain rule (31.7) gives
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} + \cdots = 0 \quad \text{at } Q.$$
Therefore f(x(t), y(t), z(t), …) is stationary with respect to t at Q on every path passing through Q, and this is what we should wish to happen for the point Q to be described as stationary.

Stationary points of f(x, y, z, … )


The stationary points are the solutions (x, y, z, …) of the equations
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = \frac{\partial f}{\partial z} = \cdots = 0. \tag{31.21}$$

Example 31.7 Find the stationary points of the function
f(x, y, z) = x² + y² + z² − xy − 2yz − zx − z.
The conditions (31.21) become
∂f/∂x = 2x − y − z = 0,
∂f/∂y = −x + 2y − 2z = 0,
∂f/∂z = −x − 2y + 2z − 1 = 0.
By elimination (adding the last two equations gives −2x − 1 = 0), the only solution of these equations is x = −1/2, y = −5/8, z = −3/8.
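The three linear conditions in Example 31.7 can be solved exactly by machine. This sketch uses Gauss-Jordan elimination with Python's `fractions.Fraction` type, so the answer comes out as exact fractions; the helper `solve3` is our own and assumes the system has a unique solution.

```python
from fractions import Fraction

def solve3(A, rhs):
    """Solve a 3x3 linear system exactly by Gauss-Jordan elimination."""
    n = 3
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # First row (from col down) with a nonzero entry; assumes a unique solution exists
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Conditions (31.21) for Example 31.7, written as A (x, y, z) = rhs
A = [[2, -1, -1],
     [-1, 2, -2],
     [-1, -2, 2]]
rhs = [0, 0, 1]
print(solve3(A, rhs))  # [Fraction(-1, 2), Fraction(-5, 8), Fraction(-3, 8)]
```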

Restricted stationary-value problems (see Section 30.2) may occur in any number of dimensions. In three dimensions, the restriction may be either to values of f(x, y, z) on some given curve, or to values on some given surface.
To help visualize a three-dimensional situation, suppose that a fish swims through a field of pollution of density P = f(x, y, z). At some point in the sea the pollution is at an overall maximum, but this is of no concern to the fish if it does not swim through it. However, it will notice highs and lows along its own path even if there is nothing special about such points from an overall viewpoint. These are restricted maxima and minima on the fish's path. Suppose that the path of the fish is expressed parametrically:
x = x(t), y = y(t), z = z(t).
Then the stationary points peculiar to the path are where df/dt = 0. By the chain rule (31.7), these are the points where
$$\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = 0.$$
When written in terms of t, this is an equation giving the critical values of t. (We must be careful to avoid a parametrization such that dx/dt = dy/dt = dz/dt = 0 at some point on the path: at such a point, a non-existent stationary point would be predicted.)

Stationary points of f(x, y, z) on the path (x(t), y(t), z(t))


The stationary points are the solutions of
$$\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = 0. \tag{31.22}$$
(In particular cases it might be easier to substitute x(t), y(t), z(t) directly into f(x, y, z) and look for the stationary points of f with respect to t.)
It is more usual for restricted stationary-value problems to be formulated in a way that avoids parametric considerations. The restriction to a surface is the easier case. Instead of a fish in the body of the sea, consider a crab which confines itself to the undulating seabed described by an equation of the form
g(x, y, z) = c,
encountering there the local pollution, given throughout the sea by f(x, y, z). The crab does not know about the rest of the sea, but as it moves around it will meet highs and lows (and other stationary points) unconnected with possibly more extreme pollution in the body of the sea. A stationary point will be found at a point Q on the surface g(x, y, z) = c if
df/ds = 0 at Q
in all directions v from Q which do not point into the body of the sea, but are tangential to the surface g(x, y, z) = c.

[Fig. 31.6: a point Q on the surface g(x, y, z) = c, with grad f, grad g, and several tangential unit vectors v at Q.]

Figure 31.6 shows such a point Q, and various tangential directions denoted by unit vectors v pointing away from Q. From (31.15), one condition for a restricted stationary point is
$$\frac{df}{ds} = 0 = \mathbf{v}\cdot\operatorname{grad} f \quad \text{at } Q, \text{ for all such } \mathbf{v}. \tag{31.23}$$
In other words, grad f must be perpendicular to the surface at Q (ignoring for the moment the chance that grad f might be zero at Q). But, by (31.11), grad g is always perpendicular to the surface g(x, y, z) = c; in particular at Q. Therefore grad f and grad g, evaluated at Q, are parallel vectors; so
grad f = λ grad g at Q,
where λ is an (unknown) constant, called a Lagrange multiplier for the problem.
By writing grad f and grad g in their components, we obtain
$$\frac{\partial f}{\partial x} - \lambda\frac{\partial g}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} - \lambda\frac{\partial g}{\partial y} = 0, \qquad \frac{\partial f}{\partial z} - \lambda\frac{\partial g}{\partial z} = 0. \tag{31.24a,b,c}$$
We now have three equations for the four unknowns: (x, y, z) (the position of Q) and λ. To find another equation, notice that (31.24a,b,c) would be unaffected if we had g(x, y, z) equal to some constant other than c, so it is necessary to reassert the particular surface:
$$g(x, y, z) = c. \tag{31.24d}$$
The special possibility mentioned above, that (by chance) grad f = 0 at Q, is still governed by equations (31.24): when they are solved, we merely find that λ = 0 (provided grad g ≠ 0 at Q). This case corresponds to the unrestricted stationary-point problem (see (31.20)), where the point found happens to lie in the specified surface.

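Restricted stationary-point problem: stationary points of f(x, y, z) subject to g(x, y, z) = c
Solve for x, y, z, λ the equations
g = c, (i)
$$\frac{\partial f}{\partial x} - \lambda\frac{\partial g}{\partial x} = 0 \ \text{(ii)}, \qquad \frac{\partial f}{\partial y} - \lambda\frac{\partial g}{\partial y} = 0 \ \text{(iii)}, \qquad \frac{\partial f}{\partial z} - \lambda\frac{\partial g}{\partial z} = 0 \ \text{(iv)}. \tag{31.25}$$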

Example 31.8 Find the stationary points of f(x, y, z) = x² + y² + yz + zx on the hyperboloid g(x, y, z) = x² + y² − z² = 1.
The four equations (31.25) are
x² + y² − z² = 1, (i)
2x + z − 2λx = 0, or (2 − 2λ)x + z = 0; (ii)
2y + z − 2λy = 0, or (2 − 2λ)y + z = 0; (iii)
y + x + 2λz = 0, or x + y + 2λz = 0. (iv)
Equations (ii), (iii), and (iv) constitute a set of homogeneous linear algebraic equations for x, y, z. The only possibilities are either that x = y = z = 0, which is excluded since these values do not satisfy (i), or that the determinant of the coefficients is zero:
$$\det\begin{pmatrix} 2 - 2\lambda & 0 & 1 \\ 0 & 2 - 2\lambda & 1 \\ 1 & 1 & 2\lambda \end{pmatrix} = 0,$$
so that (1 − λ)(2λ² − 2λ + 1) = 0. The quadratic factor has no real roots (its discriminant is 4 − 8 < 0), so the only real solution is λ = 1. The equations then become
z = 0, z = 0, x + y + 2z = 0,
or
z = 0, y = −x. (v)
Substitute for y in terms of x into (i). Then
2x² = 1, or x = ±1/√2.
Therefore, using (v) again gives two stationary points at
(1/√2, −1/√2, 0) and (−1/√2, 1/√2, 0).
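A quick numerical check of the answer to Example 31.8: at each claimed point, the sketch below confirms that the constraint (i) holds and that grad f = λ grad g with λ = 1, which is what equations (31.25) require. The helper names are our own.

```python
import math

def grad_f(x, y, z):  # f = x^2 + y^2 + yz + zx
    return (2 * x + z, 2 * y + z, y + x)

def grad_g(x, y, z):  # g = x^2 + y^2 - z^2
    return (2 * x, 2 * y, -2 * z)

def check_point(p, lam, tol=1e-12):
    """True if g(p) = 1 and grad f = lam * grad g at p, i.e. (31.25) is satisfied."""
    x, y, z = p
    on_surface = abs(x**2 + y**2 - z**2 - 1) < tol
    parallel = all(abs(a - lam * b) < tol for a, b in zip(grad_f(*p), grad_g(*p)))
    return on_surface and parallel

r = 1 / math.sqrt(2)
for p in [(r, -r, 0.0), (-r, r, 0.0)]:
    print(p, check_point(p, lam=1.0))

# Discriminant of the quadratic factor 2*lam^2 - 2*lam + 1: negative, so lam = 1 is the only real root
disc = (-2) ** 2 - 4 * 2 * 1
print(disc)
```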

For the corresponding problem of finding the stationary points of f(x, y, z) on a specified curve, as for the fish problem discussed at the beginning of the section, the curve will be assumed to be specified by the intersection of two surfaces:
g(x, y, z) = c1, h(x, y, z) = c2.
The method of solution involves two Lagrange multipliers:

Restricted stationary-point problem: stationary points of f(x, y, z) subject to g(x, y, z) = c1 and h(x, y, z) = c2
Solve for x, y, z, λ, µ the equations
g = c1, h = c2, (i), (ii)
$$\frac{\partial f}{\partial x} - \lambda\frac{\partial g}{\partial x} - \mu\frac{\partial h}{\partial x} = 0, \quad \text{(iii)}$$
$$\frac{\partial f}{\partial y} - \lambda\frac{\partial g}{\partial y} - \mu\frac{\partial h}{\partial y} = 0, \quad \text{(iv)}$$
$$\frac{\partial f}{\partial z} - \lambda\frac{\partial g}{\partial z} - \mu\frac{\partial h}{\partial z} = 0. \quad \text{(v)} \tag{31.26}$$

We shall not give the proof in full. Briefly, the situation is shown in Fig. 31.7. Q is a stationary point on the curve of intersection and v a unit vector tangential to it at Q.

[Fig. 31.7: the surfaces g(x, y, z) = c1 and h(x, y, z) = c2 and their curve of intersection, with the vectors grad f, grad g, grad h and the tangent vector v at Q.]

Since
$$\frac{df}{ds} = \mathbf{v}\cdot\operatorname{grad} f = 0 \quad \text{at } Q,$$
grad f is perpendicular to v at Q. For the same reason as in the earlier case, grad g and grad h are also perpendicular to v at Q, since v is tangential to both surfaces. Therefore the three vectors grad f, grad g, grad h all lie in the same plane (which is perpendicular to v), so, provided that grad g and grad h are not parallel at Q, grad f can be expressed in terms of the other two vectors:
grad f = λ grad g + µ grad h,
where λ and µ are certain constants, the Lagrange multipliers for this problem. Then split this equation into its components to obtain (31.26).

Example 31.9 Find the stationary points of x² + y² + z² on the curve of intersection of the vertical cylinder x² + y² = 1 with the plane x + y + z = 1. (This is an inclined ellipse.)
Here f(x, y, z) = x² + y² + z², g(x, y, z) = x² + y², and h(x, y, z) = x + y + z. The equations to be solved become
x² + y² = 1, (i)
x + y + z = 1, (ii)
2x − 2λx − µ = 0, or 2x(1 − λ) = µ, (iii)
2y − 2λy − µ = 0, or 2y(1 − λ) = µ, (iv)
2z − µ = 0. (v)
From (iii) and (iv), either (a) λ = 1, so that µ = 0, or (b) λ ≠ 1, so that x = y. We consider these possibilities in order.
(a) The case λ = 1, µ = 0. (We cannot deduce anything about x and y from (iii) and (iv) if this is true.) From (v) we obtain z = 0, so (i) and (ii) become
x² + y² = 1, x + y = 1.
The solutions are x = 0, y = 1, and x = 1, y = 0. Then we have found two solutions:
(0, 1, 0) and (1, 0, 0).
(b) The case λ ≠ 1, x = y. From (i), x = y = ±1/√2. Equation (ii) then gives z = 1 − x − y = 1 ∓ √2. Thus we have two more solutions:
(1/√2, 1/√2, 1 − √2) and (−1/√2, −1/√2, 1 + √2).

For a restricted stationary-point problem in N variables, there may be up to N − 1 restricting equations, or constraints, with the same number of Lagrange multipliers. The equations to be solved then follow the pattern of (31.25) and (31.26).
The identification of a maximum or minimum is usually of most interest. The general question is difficult, but sometimes it is fairly obvious. For instance, in the previous example the values of f are restricted to a closed curve; since f = 1 at the points (a), and f = 4 ∓ 2√2 (about 1.17 and 6.83) at the points (b), the points (a) give minima of f, and the points (b) give maxima.
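The classification in Example 31.9 can be confirmed by brute force. The sketch below (our own construction, not from the text) parametrizes the ellipse as x = cos t, y = sin t, z = 1 − cos t − sin t, samples f = x² + y² + z² at 3600 equally spaced values of t, and reports the samples that are larger or smaller than both neighbours.

```python
import math

def point_on_curve(t):
    # Parametrization of the intersection of x^2 + y^2 = 1 with x + y + z = 1
    x, y = math.cos(t), math.sin(t)
    return x, y, 1 - x - y

def f(x, y, z):
    return x * x + y * y + z * z

N = 3600
ts = [2 * math.pi * k / N for k in range(N)]
values = [f(*point_on_curve(t)) for t in ts]

extremes = []
for k in range(N):
    # The curve is closed, so the neighbours wrap around
    prev_v, v, next_v = values[k - 1], values[k], values[(k + 1) % N]
    if v < prev_v and v < next_v:
        extremes.append(("min", v, point_on_curve(ts[k])))
    elif v > prev_v and v > next_v:
        extremes.append(("max", v, point_on_curve(ts[k])))

for kind, v, p in extremes:
    print(kind, round(v, 4), tuple(round(c, 3) for c in p))
```

It should report minima of 1 at (1, 0, 0) and (0, 1, 0), and maxima of 4 ∓ 2√2 at the points found in case (b). Sampling like this only locates extrema to within the grid spacing; here the stationary points happen to fall on grid points.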

Self-test 31.6
Find the points where f(x, y, z) = xy + 2z is stationary subject to x + y − z = 0 and x² + y² + z² = 6.

31.9 The envelope of a family of curves


Figure 31.8a shows the straight lines
y = α − α²x,
for several values of α, which we call the parameter of the family of straight lines. The 'boundary' of the family is starting to form itself into a curve E, which is sketched in Fig. 31.8b. The curve E is sharply defined because all the straight lines are tangential to it, and therefore reinforce it along its length. The curve is called the envelope of the family y = α − α²x, where α is the parameter of the family.
[Fig. 31.8: (a) the lines y = α − α²x for several values of α; (b) the envelope E of the family. Axes x and y.]

The family does not have to consist of straight lines. Suppose that the family is described by
f(x, y, α) = 0.
To find the envelope (Fig. 31.9) consider two close values of the parameter, α and α + δα, the corresponding curves of the family being
f(x, y, α) = 0 and f(x, y, α + δα) = 0.
The intersection point R is the point where
f(x, y, α) = f(x, y, α + δα) (= 0).
Therefore, at the point R,
$$\frac{f(x, y, \alpha + \delta\alpha) - f(x, y, \alpha)}{\delta\alpha} = 0.$$

[Fig. 31.9: curves of the family with parameters α and α + δα intersecting at R, a point Q, and the envelope E.]

Now let δα → 0. Then R and Q come together at P on the envelope, and this equation becomes
$$\frac{\partial f(x, y, \alpha)}{\partial \alpha} = 0 \tag{31.27}$$
at P. Also P lies on the curve
$$f(x, y, \alpha) = 0. \tag{31.28}$$
(We had not so far used the fact that f is zero rather than some other constant.) If we eliminate α between (31.27) and (31.28), we obtain an equation in x and y which describes the envelope.

Envelope of the family of curves f(x, y, α) = 0, where α is a parameter


The result of eliminating α between the equations f(x, y, α) = 0 and ∂f/∂α = 0 contains the envelope. (31.29)

(The result may also include the loci of other special points, such as cusps or nodes of the curves of the family.)

Example 31.10 Find the envelope of the family of straight lines y = α − α²x, where α is the parameter. (See Fig. 31.8.)
Let f(x, y, α) = y − α + α²x. Then
∂f/∂α = −1 + 2αx = 0.
Therefore
α = 1/(2x). (i)
On the envelope, also
y − α + α²x = 0; (ii)
so, from (i), y − 1/(2x) + 1/(4x) = 0, or
y = 1/(4x),
which is a rectangular hyperbola (see Fig. 31.8b).
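The envelope found in Example 31.10 can be checked directly: by (i), the line with parameter α should meet y = 1/(4x) at x = 1/(2α) and have the same slope −α² there. The sketch below tests this for a few sample values of α (chosen arbitrarily).

```python
def line_y(alpha, x):
    return alpha - alpha**2 * x  # member of the family y = alpha - alpha^2 x

def envelope_y(x):
    return 1 / (4 * x)  # envelope found in Example 31.10

for alpha in (0.5, 1.0, 2.0, 3.0):
    x_touch = 1 / (2 * alpha)  # from (i): alpha = 1/(2x)
    same_point = abs(line_y(alpha, x_touch) - envelope_y(x_touch)) < 1e-12
    # Slope of the hyperbola, d/dx [1/(4x)] = -1/(4x^2), should equal the line's slope -alpha^2
    same_slope = abs(-1 / (4 * x_touch**2) + alpha**2) < 1e-12
    print(alpha, x_touch, same_point, same_slope)
```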

Self-test 31.7
Find the envelope of the family of circles (x − α)² + y² = α (α > 0).
Problems

31.1 Write down the incremental approximation for δf in the following cases.
(a) f(x, y, z) = 2x + 3y² + 4z² − 3;
(b) $f(x, y, t) = (x^2 + y^2)^{-1/2} e^{-t}$;
(c) $f(r, \theta, t) = e^{-t} r \cos\theta$;
(d) f(x, y, z, t) = x² + y² + z² − t²;
(e) f(x1, y1, x2, y2) = (x1 − x2)² + (y1 − y2)²;
(f) $f(x, y, t) = (1/r)\, e^{-(x^2 + y^2)/t}$. Compare with the expression for δg when $g(r, t) = (1/r)\, e^{-r^2/t}$ in polar coordinates.

31.2 The distance d between two points (x1, y1, z1) and (x2, y2, z2) in space is given by d² = (x1 − x2)² + (y1 − y2)² + (z1 − z2)². Find approximately the change in d when the points (1, 1, 2), (1, 2, 1) move to (1.1, 0.9, 1.8), (0.9, 2.1, 1.1).

31.3 R1, R2, R3, R4 are resistances in a circuit whose overall resistance is R, arranged so that
1/R = 1/R4 + (R1 + R2)/(R1R2 + R2R3 + R3R1).
Find an expression for δR in terms of δR1, δR2, δR3, and δR4. Suppose that initially R1 = 3, R2 = 10, R3 = 5, and R4 = 10, and that R1 becomes 3.2 and R2 becomes 9.8. Estimate the change in R3 necessary if R is to remain unaltered.

31.4 The equation 2x³ − 3x − 45 = 0 has a solution x = 3. Find an approximate solution to the equation 2.1x³ − 2.9x − 47 = 0.

31.5 Estimate the maximum possible error and the corresponding percentage error in w for the following cases.
(a) w = yz + zx + xy, x = 2 (±0.1), y = 3 (±0.2),

(c) (For A) The area of a triangle with sides a, b, c is given by
$$A = [s(s - a)(s - b)(s - c)]^{1/2},$$
where s = ½(a + b + c). Consider the case when a = 2, b = 4, c = 3, all with possible errors as large as ±0.1. (You can substitute s directly into the formula for A or A², but it is easier algebraically to obtain two simultaneous equations, with numerical coefficients, involving ΔA, Δs, Δa, Δb, Δc.)

31.7 (Section 24.2). If f(x, y, z, w) = c (a constant), then any of the four variables is a function of the other three. Use (31.6) to show that
$$\text{(a)}\ \frac{\partial x}{\partial y}\,\frac{\partial y}{\partial x} = 1; \qquad \text{(b)}\ \frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z} = -\frac{\partial x}{\partial z}.$$
$$\text{(c) Simplify}\ \frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\,\frac{\partial z}{\partial w}\,\frac{\partial w}{\partial x}.$$
Test the truth of the results in the cases: (i) x + 2y + 3z + 4w = 5; (ii) xy²z³w = 1.

31.8 Assume that the following relations define z implicitly as a function of x and y. Write down the relation between δx, δy, δz at the points prescribed. Without solving for z, deduce ∂z/∂x and ∂z/∂y at the points.
(a) 2x − 3y + 4z = 1 at points satisfying the condition;
(b) x² + y² + z² = 14 at (1, 2, −3);
(c) 4x³ + y⁴ + 9z³ − xyz² = 13 at (1, 1, 1);
(d) x² − z² = 9 at x = 5, y = y0, z = 4 (this is a hyperbolic cylinder).
31.9 (a) Compare the result of using the chain
z = 1 (±0.1).
rule (31.7) with that of direct substitution in
(b) w = (x − y)(y − z)(z − x), x = 1 (±0.1),
order to find df/dt when f(x, y, z) = xy /z and
y = 2 (±0.1), z = 3 (±0.1).
x = t, y = 4t, z = 2t.
(c) w = (x + y + z − t)−1, where it is known only
(b) The same parametrization as in (a), but with
that x = 1.2, y = 2.9, z = 1.9, and t = 2.1 after
the function f(x, y, z) = sin(xy /z).
rounding to 1 decimal place. Compare with
(c) Obtain an expression for df/dt on the path in (a)
the exact maximum and percentage errors.
when f(x, y, z) = g(xy /z), g being any function, and
31.6 Estimate the maximum error and the confirm that it works with case (b). (Hint: express
maximum percentage error for the following. the result in terms of g′.)
(a) (For c) c2 = a2 + b2 − 2ab cos A (the ‘cosine rule’
31.10 Cylindrical coordinates r, θ, z are shown in
for a triangle ABC). Here a = 2 (±0.1), b = 4
Fig. 31.10. They are related to x, y, z by x = r cos θ,
(±0.1), A = 135° (±2°). (Note: ∆(c 2) ≈ 2c ∆c.)
y = r sin θ, z = z.
(b) (For d )
(a) Given f(x, y, z), use the chain rule (31.8) with r,
d 2 = (x1 − x2)2 + (y1 − y2)2 + (z1 − z2)2, θ, z as the parameters to express ∂f/ ∂r, ∂f /∂θ,
where the measured values (x1, y1, z1) = (1, 2, 1) ∂f /∂z in terms of ∂f/ ∂x, ∂f /∂y, ∂f /∂z.
and (x2, y2, z2) = (2, 1, 1) have been rounded to (b) Regarding (a) as a pair of equations for ∂f/∂x
1 significant figure. (Note: ∆(d 2) ≈ 2d ∆d.) and ∂f/∂y, show that
$$\frac{\partial f}{\partial x} = \cos\theta\,\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial \theta}, \qquad \frac{\partial f}{\partial y} = \sin\theta\,\frac{\partial f}{\partial r} + \frac{\cos\theta}{r}\frac{\partial f}{\partial \theta}.$$
(c) The results (b) show that the differentiation operations ∂/∂x and ∂/∂y are equivalent respectively to the polar forms
$$\cos\theta\,\frac{\partial}{\partial r} - \frac{\sin\theta}{r}\frac{\partial}{\partial \theta} \quad\text{and}\quad \sin\theta\,\frac{\partial}{\partial r} + \frac{\cos\theta}{r}\frac{\partial}{\partial \theta}.$$
Use this fact to confirm that
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 f}{\partial r^2} + \frac{1}{r}\frac{\partial f}{\partial r} + \frac{1}{r^2}\frac{\partial^2 f}{\partial \theta^2}.$$

Fig. 31.10 Cylindrical coordinates r, θ, z of a point P(x, y, z).

31.11 Obtain the vector function grad f for each of the following.
(a) x + y + z; (b) 2x − 3y + 5z − 6; (c) $x^2 + y^2 + z^2$; (d) $x^3 + 3z^3 - 1$ (in three dimensions); (e) $x^2 - \tfrac14 y^2 + \tfrac19 z^2$; (f) 1/r, where $r = (x^2 + y^2 + z^2)^{1/2}$; confirm that the gradient vector is parallel to the position vector (x, y, z).

31.12 Obtain a vector which is normal to the following surfaces at the points specified, and construct a unit vector from it:
(a) x − 2y + z = 0 at any point;
(b) $y^2 + z^2 = 2$ at any point;
(c) $x^2 + y^2 + z^2 = 9$ at (2, 1, −2);
(d) $\tfrac14 x^2 + \tfrac19 y^2 + \tfrac1{16} z^2 = 3$ at (2, 3, 4);
(e) $x^3y + zx^3 = 5$ at (1, 2, 3);
(f) $\dfrac1x + \dfrac1y + \dfrac1z = 1$ at (2, 3, 6);
(g) $(x^2 + 4y^2 - z^2)^{-1} = \tfrac1{16}$ at (4, 1, 2).

31.13 By finding the gradient vectors, obtain the angle between the following surfaces at the point of intersection given:
(a) $x^2 + y^2 + z^2 = 9$, $x^2 - z^2 = 0$ at (2, 1, 2);
(b) $x^2 - y^2 + z^2 = 1$, 2x − 3y + z + 1 = 0 at (2, 2, 1);
(c) $x^2 + y^2 - z^2 = 0$, 3x + 4y + 5z = 50 at (3, 4, 5). Explain the result.

31.14 (a) Find grad f for
$$f(x, y, z) = A\,e^{\alpha(2x^2 + 4y^2 + z^2)^{1/2}},$$
where A and α are constants. Deduce that the vector (2x, 4y, z) points in the direction of grad f.
(b) Let f(x, y, z) = g[u(x, y, z)], where g and u are two other functions. Show that
$$\operatorname{grad} f = \left(g'(u)\frac{\partial u}{\partial x},\; g'(u)\frac{\partial u}{\partial y},\; g'(u)\frac{\partial u}{\partial z}\right),$$
and deduce that grad u points either in the same or in the opposite direction to grad f.

31.15 Write down expressions for the directional derivative of the following at the point (x, y, z), in terms of a unit direction vector v.
(a) x + 2y + 3z; (b) $x^2 - y^2 - 3z$; (c) $(x - 1)^3 + y^3 + z^3$.

31.16 Find df/ds for the following functions f, taken at the point (2, 3, 2) in the direction $\mathbf{v} = (\tfrac14\sqrt2, \tfrac14\sqrt2, \tfrac12\sqrt3)$.
(a) x − y + 2z; (b) xy + yz + zx; (c) $(xy + yz + zx)^2$; (d) $x^2 - y^2 + 5$ (in three dimensions: this represents a vertical cylinder).

31.17 The equations for two surfaces, f(x, y, z) = a, g(x, y, z) = b, where a and b are constants, together represent their curve of intersection, C. Show that the vector product grad f × grad g, evaluated at a point on C, points in the direction of C. Use this to find a unit vector v in the direction of C in the following cases:
(a) 2x + 3y − z = 1, x − y − z = 0, at any common point.
(b) x + y = 0, x − z = 0, at any common point.
(c) $x^2 + y^2 + z^2 = 6$, x − y + z = 0, at (1, 2, 1).
(d) $x^2 + (y - 1)^2 = 1$, $x^2 + (y - 2)^2 = 4$, at x = 0, y = 0, and any value of z. Explain what is happening here.
(e) xy + yz + zx = 3, x + y + z = 3, at (1, 1, 1).
31.18 Find the stationary points of the following functions with respect to all the variables named in f:
(a) $f(x, y, z) = x^2 + y^2 + z^2$;
(b) $f(x, y, z) = x^3 - 3x + y^3 - 3yz + 2z^2$;
(c) f(x, y, z) = xy + yz + zx + y − z;
(d) f(x, y, z) = x/z + y/x + z/y;
(e) $f(x, y, z, \lambda) = (x + y + z) - \lambda(x^2 + y^2 + z^2 - 1)$;
(f) $f(x, y, z) = x^4 + y^4 + z^4 - 2(x - y + z)^2$.

31.19 Find the stationary points of $x^2 + y^2 + z^2$ on the path
$$x = \cos t, \quad y = \sin t, \quad z = \sin\tfrac12 t,$$
where 0 ≤ t ≤ 4π.

31.20 At the point (x, y, z) in the air, an insecticide maintains a concentration
$$s = C\exp\{-\alpha[2(x - 1)^2 + 4y^2 + z^2]\},$$
where C is a constant. An insect is trying to escape by following the path of most rapid decrease in concentration.
(a) Show that, when it is at (x, y, z), its direction is that of the vector (2(x − 1), 4y, z).
(b) In a short interval of time δt, it moves to (x + δx, y + δy, z + δz). Show that, approximately,
$$\frac{\delta x}{2(x - 1)} = \frac{\delta y}{4y} = \frac{\delta z}{z}.$$
(c) By letting δt → 0, show that its path is described by the two differential equations
$$\frac{dz}{dx} = \frac{z}{2(x - 1)}, \qquad \frac{dz}{dy} = \frac{z}{4y}.$$
(Such simultaneous equations would often be written dx/2(x − 1) = dy/4y = dz/z.)
(d) Show that the general solution of these equations, which expresses a path in space, can be written as
$$z = Ay^{1/4} = B(x - 1)^{1/2},$$
where A and B are arbitrary constants.
(e) Assuming that the insect starts at (0, 1, 1), find its path.

31.21 Use the Lagrange-multiplier technique of (31.25)–(31.26) to solve the following restricted stationary-point (SP) problems.
(a) SPs of x + y + z subject to 1/x + 1/y + 1/z = 1.
(b) SPs of xyz subject to 1/x + 1/y + 1/z = 1.
(c) SPs of $x^2 + y^2 + z^2$ subject to ax + by + cz = 1.
(d) SPs of xy + yz + zx subject to xyz = 1. (This corresponds to finding the rectangular block of given volume which has the smallest surface area.)
(e) The problem of the rectangular block of greatest volume which can be fitted into an ellipsoid leads to the problem: find the SPs of xyz subject to $x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$.
(f) SPs of $x^2 + 4y^2 + z^2$ on the intersection of the two planes x − y − 2z = 0 and z = 1.
(g) SPs of $x^2 - y^2 - z^2$ on the straight line
$$\frac{x - 1}{2} = \frac{y - 2}{-1} = \frac{z - 2}{3}.$$
(h) SPs of xyz subject to xy + yz + zx = 1. (Compare (d).)
(i) SPs of x − y − 2z on the intersection of z = 1 with $x^2 + 4y^2 + z^2 = 6$. (Compare (f).)

31.22 (Numerical). (a) Write a program to carry out the numerical scheme suggested in Example 31.6 in two dimensions in order to obtain the curves along which a function f(x, y) increases most rapidly. It may also be possible for you to display the curves on the screen.
(b) Use (a) to obtain a numerical solution of the following problems. Try a succession of decreasing step lengths h in order to ensure plotting accuracy.
(c) The altitude H of a part of a hill is given in km by
$$H = 0.5 - x^2 - 4y^2.$$
Start for example with (x, y) = (2, 2), and go to the summit. (For comparison, the exact solution to the problem is $y = \tfrac18 x^4$; the summit is at the origin.)
(d) Plot the track of most rapid descent from the point (3, 2) in the case where $H = \tfrac12 + x^2 - y^2$. (To descend, use negative h. The shape is a saddle: viewed from the origin, H increases east and west and decreases north and south.)
(e) In certain types of fluid flow in two dimensions, the velocity vector v(x, y) is equal to the gradient of a single scalar potential function φ(x, y): v = grad φ. A streamline through any point is in the direction of v. Plot some streamlines on and outside of the circle $x^2 + y^2 = 1$ when
$$\phi(x, y) = x\left(1 + \frac{1}{x^2 + y^2}\right).$$

31.23 Often a function f(x, y, z) takes the form of 'a function of a function': w = f(x, y, z) = g(u(x, y, z)). (An example is
w = f(x, y, z) = sin xyz: u = xyz, w = sin u.)
(a) Write down several examples of functions which can be regarded in this way.
(b) Show that
$$\frac{\partial f}{\partial x} = g'(u)\frac{\partial u}{\partial x}, \qquad \frac{\partial f}{\partial y} = g'(u)\frac{\partial u}{\partial y}, \qquad \frac{\partial f}{\partial z} = g'(u)\frac{\partial u}{\partial z}.$$
(You only need the one-variable chain rule (3.3).)
(c) Check the correctness of the formulae (b) in the cases when (i) $w = e^{x^2 - y^2 + z^2}$, (ii) w = sin(xy/z).
(d) Using the results (b), rewrite the chain rule (31.7) in the form appropriate to functions of the form g(u(x, y, z)).
(e) The path x = cos t, y = sin t, z = t represents a helix whose axis is the z axis. Find an expression in terms of t for df/dt on the path when f(x, y, z) = g(xy/z), where g is any function. Confirm the result for any simple case.

31.24 Often a function takes the form f(u, v, w), where u, v, and w are themselves functions of x, y, and z. Write down a version of the chain rule (31.8) which enables ∂f/∂x, ∂f/∂y, ∂f/∂z to be found. (In this case, x, y, and z function like parameters and u, v, and w like the principal variables.) Use this result to prove the following results:
(a) If φ = f(x − y, y − z, z − x), where f is any function, then
$$\frac{\partial \phi}{\partial x} + \frac{\partial \phi}{\partial y} + \frac{\partial \phi}{\partial z} = 0.$$
Check your result with the function (x − y)(y − z)(z − x).
(b) If φ = f(y/x, z/x), where f is any function of two variables, then
$$x\frac{\partial \phi}{\partial x} + y\frac{\partial \phi}{\partial y} + z\frac{\partial \phi}{\partial z} = 0.$$
Try it out with the function x/y + y/z + z/x, noting that
$$\frac{y}{z} = \frac{y/x}{z/x}.$$

31.25 Let $f(x, y, z, t) = e^{i(k_1x + k_2y + k_3z - \omega t)}$, where i is the complex element ($i^2 = -1$), and $k_1$, $k_2$, $k_3$, and ω are constants. Show that
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \frac{1}{c^2}\frac{\partial^2 f}{\partial t^2},$$
where $c = \omega/\sqrt{k_1^2 + k_2^2 + k_3^2}$. (This is called the wave equation in three dimensions, and f(x, y, z, t) is one of its solutions.) Prove that $g(k_1x + k_2y + k_3z - \omega t)$, where g is any function of a single variable, is also a solution.

31.26 Find the envelopes of the following families.
(a) $y = \alpha + \alpha^2 x$ (parameter α);
(b) $y + \alpha^2 x = \alpha$ (parameter α);
(c) $\dfrac{x}{\alpha} + \dfrac{y}{1 - \alpha} = 1$ (parameter α);
(d) x cos θ + y sin θ = 1 (parameter θ).

31.27 The cross-sectional profile of a long cylindrical mirror is the semicircle $x^2 + y^2 = 1$ in the right-hand half plane. Rays from the left, parallel to the x axis, fall on the mirror.
(a) Show that the equation of the ray reflected from the point (cos θ, sin θ) on the mirror is x sin 2θ − y cos 2θ = sin θ.
(b) By regarding θ as the parameter, show that the envelope of these reflected rays is given by $x^2 + y^2 = \tfrac14(3y^{2/3} + 1)$. (In optics, this envelope is called the caustic of the reflected rays.)

31.28 Show that the envelope of the family of straight lines such that the length cut off between the x and y axes is L, a constant, is given by $x^{2/3} + y^{2/3} = L^{2/3}$. Sketch the curve (it has four segments).
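For Problem 31.22, here is a minimal sketch of such a program. It is not necessarily the scheme of Example 31.6: it assumes that each step moves a fixed distance |h| along the unit vector grad f/|grad f| (an Euler step along the gradient direction). The function name `gradient_path` and its stopping rule (stop when the step direction reverses, i.e. the path has just overshot a stationary point) are our own choices. It is applied here to 31.22(c).

```python
import math

def gradient_path(grad, x0, y0, h=1e-3, max_steps=100_000):
    """Trace a curve of most rapid increase of f(x, y), starting at (x0, y0).

    Each step moves a distance |h| along grad f / |grad f| (h > 0 climbs,
    h < 0 descends). Tracing stops when the unit direction reverses (the
    path has just overshot a stationary point) or after max_steps steps.
    """
    x, y = x0, y0
    path = [(x, y)]
    prev = None
    for _ in range(max_steps):
        gx, gy = grad(x, y)
        norm = math.hypot(gx, gy)
        if norm == 0.0:                    # exactly at a stationary point
            break
        ux, uy = gx / norm, gy / norm      # unit gradient direction
        if prev is not None and ux * prev[0] + uy * prev[1] < 0:
            break                          # direction reversed: summit overshot
        x, y = x + h * ux, y + h * uy
        path.append((x, y))
        prev = (ux, uy)
    return path

# Problem 31.22(c): H = 0.5 - x**2 - 4*y**2, so grad H = (-2x, -8y).
path = gradient_path(lambda x, y: (-2 * x, -8 * y), 2.0, 2.0)
x_end, y_end = path[-1]
print(f"{len(path)} points; ends near ({x_end:.4f}, {y_end:.4f})")
for x, y in path[::500]:
    print(f"x = {x:.3f}, y = {y:.4f}, x**4/8 = {x**4 / 8:.4f}")
```

With h < 0 the same function traces a descent, as in 31.22(d), although there the path never reaches a stationary point and stops only after `max_steps` steps. Halving h and comparing tracks is the accuracy check suggested in 31.22(b).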
32 Double integration

CONTENTS
32.1 Repeated integrals with constant limits
32.2 Examples leading to repeated integrals with constant limits
32.3 Repeated integrals over non-rectangular regions
32.4 Changing the order of integration for non-rectangular regions
32.5 Double integrals
32.6 Polar coordinates
32.7 Separable integrals
32.8 General change of variable; the Jacobian determinant
Problems

In Chapter 15 we defined the definite integral of a function of a single variable f(x) as the limit of a sum of certain elements over the interval of integration. The double integral extends this idea to a function f(x, y) of two variables, taken over a two-dimensional region in the x,y plane, thus greatly expanding the scope of integration in applications.
Treating y as if it were constant, integrate f(x, y) with respect to the variable x over an interval a ≤ x ≤ b, where a and b are constants. The resulting expression is a function of y. Integrate the result with respect to y over c ≤ y ≤ d, where c and d are constants. The final result is then a constant, independent of both x and y. Such a process is known as repeated integration. The values of f(x, y) at all points in the rectangle a ≤ x ≤ b, c ≤ y ≤ d of the x,y plane are involved in the integration process. We illustrate the procedure and show that the order of the integrations does not affect the answer: we may integrate first with respect to x, or first with respect to y, and reach the same final result.
Many scientific problems involve integration over a region that is not merely a rectangle with sides parallel to the axes. In such cases we are led to repeated integrals with variable limits, obtained by dissecting the region into strips, or, more generally, to the definition of a double integral over an arbitrary region, together with the possibility of transforming to coordinates better suited to the shape of the region. In addition to the examples worked out in the chapter, a list of different types of application for double integrals is given at the end of Section 32.5.

32.1 Repeated integrals with constant limits

Before explaining how they arise, we show first how a repeated integral is written and evaluated. The following is an example of a repeated integral with constant limits:
$$I = \int_0^1\!\!\int_0^2 (xy + y^2 - 1)\,dx\,dy.$$

There are two stages of integration; first with respect to x, then with respect to y,
this being determined by the order in which dx and dy appear under the integral
signs.
You are advised to follow the procedure below step by step at first.
(i) Put brackets round the inner integral, which is the first to be evaluated:
$$I = \int_0^1\left(\int_0^2 (xy + y^2 - 1)\,dx\right) dy.$$
(ii) Make it clear which variable connects with which limits of integration, by explicitly labelling them as shown:
$$I = \int_{y=0}^{1}\left(\int_{x=0}^{2} (xy + y^2 - 1)\,dx\right) dy.$$
(iii) Evaluate the inner integral with respect to the first variable (here x), treating the other variable (y) as a constant:
$$\int_{x=0}^{2} (xy + y^2 - 1)\,dx = \left[\tfrac12 x^2 y + y^2 x - x\right]_{x=0}^{2} = 2y + 2y^2 - 2.$$
This process eliminates the variable x.
(iv) Use the result of (iii) as the integrand of the outer integral:
$$I = \int_{y=0}^{1} (2y + 2y^2 - 2)\,dy = \left[y^2 + \tfrac23 y^3 - 2y\right]_{y=0}^{1} = -\tfrac13.$$
This eliminates the variable y, so that the final result is a definite number. If you find you are left with an x or y in the result, then you have not followed the process correctly.
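The four steps can be mirrored numerically. The following is a minimal sketch, assuming the composite midpoint rule with n subintervals for each one-dimensional integral (our choice of rule; the helper name `repeated_integral` is hypothetical). The inner sum plays the part of step (iii), with y held fixed, and the outer sum that of step (iv).

```python
def repeated_integral(f, x_lims, y_lims, n=400):
    """Approximate the repeated integral of f(x, y) dx dy (inner over x,
    outer over y) with constant limits, by the midpoint rule in each variable."""
    (a, b), (c, d) = x_lims, y_lims
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for j in range(n):                   # outer integral over y
        y = c + (j + 0.5) * hy
        # inner integral over x, with y treated as a constant (step (iii))
        inner = sum(f(a + (i + 0.5) * hx, y) for i in range(n)) * hx
        total += inner * hy              # accumulate the outer integral (step (iv))
    return total

I = repeated_integral(lambda x, y: x * y + y**2 - 1, (0, 2), (0, 1))
print(I)   # close to -1/3
```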

Example 32.1 Evaluate the repeated integrals
$$\text{(a)}\; I = \int_0^1\!\!\int_2^4 (xy + 1)\,dx\,dy, \qquad \text{(b)}\; J = \int_2^4\!\!\int_0^1 (xy + 1)\,dy\,dx.$$
(a) As a repeated integral
$$I = \int_{y=0}^{1}\left(\int_{x=2}^{4} (xy + 1)\,dx\right) dy.$$
The inner integral becomes
$$\int_{x=2}^{4} (xy + 1)\,dx = \left[\tfrac12 x^2 y + x\right]_{x=2}^{4} = 8y + 4 - (2y + 2) = 6y + 2.$$
This forms the integrand of the outer integral:
$$I = \int_{y=0}^{1} (6y + 2)\,dy = \left[3y^2 + 2y\right]_{y=0}^{1} = 5.$$
(b) Here the order of the symbols $\int_{x=2}^{4}$ and $\int_{y=0}^{1}$ has been reversed, and also the order of dx and dy. In other words, the same processes are to be carried out, but in the reverse order. The details, however, look different. We have
$$J = \int_{x=2}^{4}\left(\int_{y=0}^{1} (xy + 1)\,dy\right) dx.$$
The inner integral is with respect to y, and we treat x as a constant:
$$\int_{y=0}^{1} (xy + 1)\,dy = \left[\tfrac12 x y^2 + y\right]_{y=0}^{1} = \tfrac12 x + 1.$$
The outer integral becomes
$$J = \int_{x=2}^{4} \left(\tfrac12 x + 1\right) dx = \left[\tfrac14 x^2 + x\right]_{x=2}^{4} = 5,$$
which is the same as I in (a).

In this Example, it makes no difference to the result whether we integrate with respect to x or y first. Later we show that this is always true when the repeated integral has constant limits.

Self-test 32.1
Evaluate the repeated integrals
$$I = \int_1^3\!\!\int_0^2 (x^2y + xy^2)\,dx\,dy, \qquad J = \int_0^2\!\!\int_1^3 (x^2y + xy^2)\,dy\,dx.$$

32.2 Examples leading to repeated integrals with constant limits
Figure 32.1a represents a heap of grain in a rectangular silo of length 8 m and
breadth 4 m. The vertical sides of the silo are x = 0; x = 8; y = 0; y = 4. The top
surface of the grain is curved, with the equation
$$z = \tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2 \qquad (0 \le x \le 8;\ 0 \le y \le 4),$$
and the problem is to find the volume, V say, of the grain.

Fig. 32.1 (a) Grain in a rectangular silo of length 8 and breadth 4, with top surface $z = \tfrac1{32}x^2 + \tfrac1{16}y^2 + 2$; a slice of thickness δy, on which y is constant, is indicated. (b) The face ABCD of the slice, shown in elevation in the (x, z) plane.
Imagine the grain divided into thin vertical plane slices, parallel to the (x, z)
plane, the thickness of a slice being δy. A typical slice is shown in Fig. 32.1a, and
the value of y is constant on its faces. It is lifted out and displayed in elevation
separately in Fig. 32.1b. The face area is given by
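$$\text{Area } ABCD = \int_{x=0}^{8} \left(\tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2\right) dx,$$
in which y takes the current constant value. Therefore its volume, δV say, is given by
$$\delta V \approx \left(\int_{x=0}^{8} \left(\tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2\right) dx\right)\delta y.$$
When we take the sum of all the elements δV and let δy → 0, we obtain in the usual way
$$V = \int_{y=0}^{4}\left(\int_{x=0}^{8} \left(\tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2\right) dx\right) dy.$$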

The result has therefore taken the form of a repeated integral of the kind described in Section 32.1. In evaluating it, the inner integral gives the cross-sectional area of a slice on which y is held constant:
$$\int_{x=0}^{8} \left(\tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2\right) dx = \left[\tfrac{1}{96}x^3 + \tfrac{1}{16}y^2 x + 2x\right]_{x=0}^{8} = \tfrac{64}{3} + \tfrac12 y^2.$$
Finally
$$V = \int_{y=0}^{4} \left(\tfrac{64}{3} + \tfrac12 y^2\right) dy = \left[\tfrac{64}{3}y + \tfrac16 y^3\right]_{y=0}^{4} = 96.$$

It can be seen that if we had taken the slices parallel to the (y, z) plane the process would have led to the integral
$$V = \int_{x=0}^{8}\left(\int_{y=0}^{4} \left(\tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2\right) dy\right) dx.$$
The integrand is the same, and the result must be the same, when the integrations over x and y are carried out in the opposite order.
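The equality of the two orders can be checked numerically. Below is a minimal sketch (the helper `midpoint_2d` is a hypothetical name of our own) that approximates the silo volume with the composite midpoint rule, once with the x-integration inside (slices on which y is constant) and once with the y-integration inside. Both sums contain exactly the same terms f(x_i, y_j) δx δy, added in a different order, which is the discrete counterpart of (32.1) below.

```python
def midpoint_2d(f, a, b, c, d, n=200, x_inner=True):
    """Midpoint-rule value of a repeated integral of f(x, y) over
    a <= x <= b, c <= y <= d.

    x_inner=True : integrate over x first (slices on which y is constant);
    x_inner=False: integrate over y first (slices on which x is constant).
    """
    hx, hy = (b - a) / n, (d - c) / n
    xs = [a + (i + 0.5) * hx for i in range(n)]
    ys = [c + (j + 0.5) * hy for j in range(n)]
    if x_inner:
        return sum(sum(f(x, y) for x in xs) * hx for y in ys) * hy
    return sum(sum(f(x, y) for y in ys) * hy for x in xs) * hx

depth = lambda x, y: x**2 / 32 + y**2 / 16 + 2   # height of the grain surface

V_xy = midpoint_2d(depth, 0, 8, 0, 4, x_inner=True)    # slices parallel to the (x, z) plane
V_yx = midpoint_2d(depth, 0, 8, 0, 4, x_inner=False)   # slices parallel to the (y, z) plane
print(V_xy, V_yx)   # both close to the exact value 96
```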
In general a repeated integral with constant limits,
$$\int_c^d\!\!\int_a^b f(x, y)\,dx\,dy,$$
can be interpreted, when f(x, y) is positive, as the volume of material in a box standing on the rectangle a ≤ x ≤ b, c ≤ y ≤ d, when the material in it has depth f(x, y). If f(x, y) is negative over any part of this rectangle, then that part makes a negative contribution to the integral. This signed-volume analogy is closely similar to the signed-area analogy (15.13). Therefore in all cases, to change the order of integration is simply to add all the signed volume elements in a different order:

Changing order of integration in a repeated integral with constant limits
$$\int_c^d\!\!\int_a^b f(x, y)\,dx\,dy = \int_a^b\!\!\int_c^d f(x, y)\,dy\,dx. \qquad (32.1)$$

There is frequently an advantage to be had from changing the order of integration in this way.

Example 32.2 Evaluate $I = \int_0^1\!\!\int_0^2 x e^{xy}\,dx\,dy$.
The inner integral is
$$\int_{x=0}^{2} x e^{xy}\,dx,$$
which, though not very difficult, does involve integration by parts. To avoid this, try the alternative order of integration:
$$I = \int_{x=0}^{2}\left(\int_{y=0}^{1} x e^{xy}\,dy\right) dx.$$
The inner integral, with x being treated as a constant, is
$$\int_{y=0}^{1} x e^{xy}\,dy = \left[x \cdot \frac{1}{x} e^{xy}\right]_{y=0}^{1} = e^x - 1.$$
Then
$$I = \int_0^2 (e^x - 1)\,dx = \left[e^x - x\right]_0^2 = e^2 - 3,$$
which is much simpler.
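As a numerical cross-check of Example 32.2, the sketch below evaluates the outer integral in both orders with the midpoint rule. For the original order it uses the result of the integration by parts that the example avoids, $\int_0^2 x e^{xy}\,dx = e^{2y}(2/y - 1/y^2) + 1/y^2$ (our own working, whose limit as y → 0 is 2); the function names are hypothetical.

```python
import math

def inner_x_first(y):
    """Inner integral of x*exp(x*y) over 0 <= x <= 2, by integration by parts:
    e^(2y) (2/y - 1/y^2) + 1/y^2 for y != 0 (its limit as y -> 0 is 2)."""
    if y == 0.0:
        return 2.0
    return math.exp(2 * y) * (2 / y - 1 / y**2) + 1 / y**2

def inner_y_first(x):
    """Inner integral of x*exp(x*y) over 0 <= y <= 1: simply e^x - 1."""
    return math.exp(x) - 1

def midpoint(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

I1 = midpoint(inner_x_first, 0.0, 1.0)   # original order: dx, then dy
I2 = midpoint(inner_y_first, 0.0, 2.0)   # reversed order: dy, then dx
print(I1, I2, math.exp(2) - 3)
```

Both values agree with $e^2 - 3$; the reversed order simply needs far less algebra to set up.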


Self-test 32.2
Change the order of integration and evaluate
$$I = \int_0^1\!\!\int_0^{\pi/2} x(\sin xy + \cos xy)\,dx\,dy.$$

32.3 Repeated integrals over non-rectangular regions


Suppose now that the base of the silo, loaded with grain, has the triangular shape OPQ shown in Fig. 32.2; to take a definite instance, we could consider its depth f(x, y) to be the same as before: $f(x, y) = \tfrac{1}{32}x^2 + \tfrac{1}{16}y^2 + 2$; but our discussion will hold for a general function f. Again we measure the volume V by summing the volumes of slices parallel to the x axis, having thickness δy.

Fig. 32.2 (a) Silo with triangular base OPQ; OP = 8, PQ = 4. (b) A cross-section y = OR. (c) The region of integration, with a strip AB.

Figure 32.2b shows a typical slice ABCD lifted out and viewed in (x, z) axes in
order to obtain its face area, and Fig. 32.2c shows the base of the silo in plan view;
the slice chosen is along AB.
The slices all have different x values at their starting points, and these values
depend on y, so the limits of integration are not constant in this case. In order to
determine the range of integration of the slice at level y, it is necessary to refer to
the triangular area in Fig. 32.2c, called the region of integration for this problem.
The equation of the side OQ is
$$y = \tfrac12 x$$
for 0 ≤ x ≤ 8. Since we need x in terms of y, we express this as
$$x = 2y,$$
and it is helpful to write it on OQ, as shown, together with the simpler information required for the other limits of integration.
The face area of the slice ABCD at level y is therefore given by
$$\text{area } ABCD = \int_{x=2y}^{8} f(x, y)\,dx.$$
Its volume δV is approximately (area ABCD × δy), so
$$\delta V \approx \left(\int_{x=2y}^{8} f(x, y)\,dx\right)\delta y,$$
and finally the whole volume V is
$$V = \int_0^4\!\!\int_{2y}^{8} f(x, y)\,dx\,dy.$$

Notice that the limits of integration have nothing to do with the integrand
f(x, y), but depend only on the shape of the region of integration in the (x, y) plane
(in this case the triangle OPQ in Fig. 32.2c). The limits of integration are the same whatever the integrand.

Example 32.3 Evaluate the integral $I = \int_0^1\!\!\int_{2y}^{2} (x + y)\,dx\,dy$.
Write the integral
$$I = \int_{y=0}^{1}\left(\int_{x=2y}^{2} (x + y)\,dx\right) dy.$$
The inner integral is
$$\int_{x=2y}^{2} (x + y)\,dx = \left[\tfrac12 x^2 + yx\right]_{x=2y}^{2} = (2 + 2y) - (2y^2 + 2y^2) = 2 + 2y - 4y^2.$$
(Follow the calculation carefully.) Then
$$I = \int_{y=0}^{1} (2 + 2y - 4y^2)\,dy = \left[2y + y^2 - \tfrac43 y^3\right]_0^1 = \tfrac53.$$
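The only new feature compared with constant limits is that the inner limits now depend on the outer variable. A minimal sketch, assuming the midpoint rule and a hypothetical helper `repeated_integral_var` that takes the inner limits as functions of y, reproduces Example 32.3:

```python
def repeated_integral_var(f, y_lims, x_lo, x_hi, n=400):
    """Midpoint-rule value of the repeated integral of f(x, y) dx dy in which
    y runs between constant limits and x runs from x_lo(y) to x_hi(y)."""
    c, d = y_lims
    hy = (d - c) / n
    total = 0.0
    for j in range(n):
        y = c + (j + 0.5) * hy
        a, b = x_lo(y), x_hi(y)          # ends of the strip at level y
        hx = (b - a) / n
        total += sum(f(a + (i + 0.5) * hx, y) for i in range(n)) * hx * hy
    return total

# Example 32.3: x runs from 2y to 2, and y from 0 to 1
I = repeated_integral_var(lambda x, y: x + y, (0.0, 1.0),
                          lambda y: 2 * y, lambda y: 2.0)
print(I)   # close to 5/3
```

As the text stresses, the limit functions describe only the region; the same `x_lo` and `x_hi` would serve for any integrand over this triangle.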

Example 32.4 Evaluate the integral $I = \int_0^2\!\!\int_0^y xy\,dx\,dy$, and sketch the region of integration.
Let
$$I = \int_{y=0}^{2}\left(\int_{x=0}^{y} xy\,dx\right) dy.$$
The inner integral is
$$\int_{x=0}^{y} xy\,dx = \left[\tfrac12 x^2 y\right]_{x=0}^{y} = \tfrac12 y^3.$$
Therefore
$$I = \int_0^2 \tfrac12 y^3\,dy = \tfrac18\left[y^4\right]_0^2 = 2.$$
A sketch of the region of integration can be constructed in the following way. The region of integration consists of the points (x, y) which simultaneously satisfy
(i) 0 ≤ y ≤ 2 and (ii) 0 ≤ x ≤ y.
One way of finding these is to sketch the boundaries of the required region. These are the lines
y = 0, y = 2, and x = 0, x = y,
and they are shown in Fig. 32.3. The region consists of the points which lie between both pairs of boundaries (i) and (ii): the triangle with vertices (0, 0), (0, 2), and (2, 2).

Fig. 32.3 The region of integration for Example 32.4, bounded by the lines x = 0, y = 2, and y = x.

Self-test 32.3
Find the volume of the grain in the silo shown in Fig. 32.2.

32.4 Changing the order of integration for non-rectangular regions
If the region of integration is a rectangle with sides parallel to the axes, then (32.1) states that changing the order of integration simply involves performing the same operations in the opposite order. But if we do the same thing with the previous example, we get
$$\int_0^2\!\!\int_0^y xy\,dy\,dx.$$
This is obviously nonsense: the answer contains y, whereas we ought to get the answer 2 again. In fact the new form means nothing at all.
If the region of integration is non-rectangular we have to write
$$\iint f(x, y)\,dy\,dx$$
and begin again, filling in the limits of integration so that we cover the same region. The inner integral is now with respect to y, so we must start with strips parallel to the y axis as shown in Fig. 32.4.

Fig. 32.4 The region of Fig. 32.3 covered by vertical strips of width δx, each running from y = x to y = 2.

Then the inner integral gives the contribution from the strip:
$$\delta x \int_{y=x}^{2} f(x, y)\,dy.$$
The outer integral involves all x between 0 and 2, so finally we have
$$\int_0^2\!\!\int_0^y f(x, y)\,dx\,dy = \int_0^2\!\!\int_x^2 f(x, y)\,dy\,dx.$$

Each case has to be considered individually in this way. The region of integration
should be sketched to ensure the correct change of limits.

Example 32.5 Change the order of integration in the repeated integral
$$\int_0^{1/2}\!\!\int_{-\sqrt{1-4y^2}}^{\sqrt{1-4y^2}} y\,dx\,dy$$
and so evaluate the integral.

Write the integral in the form
$$I = \int_{y=0}^{1/2}\left(\int_{x=-\sqrt{1-4y^2}}^{\sqrt{1-4y^2}} y\,dx\right) dy.$$
The limits of integration express the boundaries of the region of integration:
$$x = (1 - 4y^2)^{1/2}, \quad x = -(1 - 4y^2)^{1/2}, \quad y = 0, \quad y = \tfrac12.$$
These are shown in Fig. 32.5 (the curved part can be written $x^2 + 4y^2 = 1$: an ellipse with semi-axes equal to 1 and ½). Figure 32.5a shows how the form given is obtained, by starting with horizontal strips which end at $x = \pm(1 - 4y^2)^{1/2}$.
In Fig. 32.5b, the position with regard to vertical strips is shown, for which the inner integral will be over y. When the order of integration is changed by this means, we obtain
$$I = \int_{x=-1}^{1}\left(\int_{y=0}^{\frac12\sqrt{1-x^2}} y\,dy\right) dx.$$

Fig. 32.5 The upper half of the ellipse $x^2 + 4y^2 = 1$, covered (a) by horizontal strips of width δy running from $x = -(1 - 4y^2)^{1/2}$ to $x = (1 - 4y^2)^{1/2}$, and (b) by vertical strips of width δx starting on y = 0.

The inner integral is now
$$\int_0^{\frac12\sqrt{1-x^2}} y\,dy = \left[\tfrac12 y^2\right]_0^{\frac12\sqrt{1-x^2}} = \tfrac18(1 - x^2).$$
Therefore
$$I = \tfrac18\int_{-1}^{1} (1 - x^2)\,dx = \tfrac18\left[x - \tfrac13 x^3\right]_{-1}^{1} = \tfrac16.$$
It is left to you to try it in the original form; it is perfectly possible, but more complicated.
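Both forms of Example 32.5 can be compared numerically. In this minimal sketch (midpoint rule; `strip_y` and `strip_x` are hypothetical names for the contributions of the horizontal and vertical strips of Fig. 32.5) each outer integral is built from its own strips. In the original order the outer integrand, $2y\sqrt{1 - 4y^2}$, has an infinite slope at y = ½, so the midpoint rule converges more slowly there than in the reversed order, whose outer integrand is the polynomial $\tfrac18(1 - x^2)$.

```python
import math

def midpoint(g, a, b, n=1000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def strip_y(y):
    """Horizontal strip at level y: x from -sqrt(1 - 4y^2) to +sqrt(1 - 4y^2)."""
    half = math.sqrt(1 - 4 * y**2)
    return midpoint(lambda x: y, -half, half, n=50)

def strip_x(x):
    """Vertical strip at x: y from 0 to sqrt(1 - x^2)/2."""
    top = 0.5 * math.sqrt(1 - x**2)
    return midpoint(lambda y: y, 0.0, top, n=50)

I_original = midpoint(strip_y, 0.0, 0.5)    # dx inside, dy outside
I_reversed = midpoint(strip_x, -1.0, 1.0)   # dy inside, dx outside
print(I_original, I_reversed)               # both close to 1/6
```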

Self-test 32.4
R is the region in the x,y plane bounded by the straight lines x = 1, y = 0 and the parabola $x = y^2$. Evaluate
$$I = \iint_R \sqrt{x}\,e^{-y\sqrt{x}}\,dx\,dy.$$

32.5 Double integrals


The repeated-integral notation is very informative and self-contained: all the
information needed is contained in the integral. It even suggests the coordinates
to be used, and gives explicitly the boundary of the region of integration.
However, problems do not always fall easily into this form.
Suppose we have a lake of any shape, as shown in Fig. 32.6a, whose depth is
f(x, y): notional contours showing water depth are suggested. We want to find the
volume of water in the lake.
Fig. 32.6 (a) A lake occupying the region of integration R, with contours of water depth. (b) A mesh of area elements covering R, with a typical element δA at P.

Call the area covered by the lake the region of integration R for the problem.
Construct a mesh on R consisting of small area elements δA as in Fig. 32.6b: the
mesh may be quite arbitrary for the present purpose. A typical area element δA is
at P. Below δA is a depth of water we shall denote by f(P) (we shall not use f(x, y)
because cartesian coordinates might not be the ones we eventually want to use).
The volume δV in the vertical column of water below δA at the point P is approximated by
$$\delta V \approx f(P)\,\delta A.$$
If we add up all the volume elements in the usual way, we obtain the total volume V. Denote this operation by
$$V = \sum_R \delta V,$$
thus indicating a certain region of integration R, which can be obtained by reference to the diagram, or might be specified separately in words or in some other way.
Now let all the area elements δA approach zero, while becoming more numerous in order to cover R. We obtain
$$V = \sum_R \delta V = \lim_{\delta A \to 0} \sum_R f(P)\,\delta A.$$
It is natural to write this as some kind of integral as we did in one dimension (see Section 15.1). There are several notations; we shall write
$$V = \iint_R f(P)\,dA,$$
which is to be read: the double integral of f over the region R. Unlike a repeated integral it does not give any clue as to how to evaluate it.
As a rule, the argument that gives rise to a double integral by way of a certain summation will not have anything to do with volume; but, as a result of the summation that it represents, the signed-volume analogy referred to in Section 32.2 will always hold good:

Double integral $I = \iint_R f(P)\,dA$ and the signed-volume analogy
(i) I stands for $\lim_{\delta A \to 0} \sum_R f(P)\,\delta A$; R represents a given region in a plane; and δA is a typical area element of R, taken at the point P. The summation is over all the elements δA of R.
(ii) (Signed-volume analogy) Whatever its origin, the integral is numerically equal to the signed volume between a surface z = f(x, y) and the plane z = 0, taken over the region R. (Where z is negative the contribution counts as negative.) (32.2)
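Definition (32.2)(i) can be tried out directly on a computer. The sketch below is a minimal illustration under our own assumptions: the mesh is made of equal squares covering a bounding box, P is taken at the centre of each square, and a square counts as part of R when its centre lies in R (the name `mesh_sum` and the disc-shaped 'lake' are hypothetical). As the mesh is refined the sum approaches the double integral.

```python
import math

def mesh_sum(f, inside, xlim, ylim, n=400):
    """Approximate the double integral of f over a region R by sum f(P) dA.

    The box xlim x ylim is cut into n*n equal cells of area dA; P is the centre
    of each cell, and a cell is counted only when inside(P) is true.
    """
    (x0, x1), (y0, y1) = xlim, ylim
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    dA = hx * hy
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            if inside(x, y):
                total += f(x, y) * dA
    return total

# Hypothetical 'lake': the unit disc. With depth 1 the sum is the area (pi);
# with depth 1 - x^2 - y^2 it is a volume whose exact value is pi/2.
disc = lambda x, y: x * x + y * y <= 1.0
area = mesh_sum(lambda x, y: 1.0, disc, (-1, 1), (-1, 1))
volume = mesh_sum(lambda x, y: 1 - x * x - y * y, disc, (-1, 1), (-1, 1))
print(area, math.pi)
print(volume, math.pi / 2)
```

Nothing in `mesh_sum` refers to a particular order of summation along strips, which is the point of the notation $\iint_R f(P)\,dA$: it is only when we choose coordinates and arrange the elements into strips that a repeated integral appears.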

Example 32.6 A flat plate occupying a region R is acted on at every point P on the plate by a variable normal stress σ(P) per unit area. Express the total (resultant) force F on the plate as a double integral.
The position is as in Fig. 32.7: R is the plate. The force δF acting on a typical area element δA at P is given by
$$\delta F \approx \sigma(P)\,\delta A.$$

Fig. 32.7 The region R of the plate, with the force δF = σ(P) δA acting on a typical element δA.

Add up the contributions of all the elements covering R, and take the limit as the mesh becomes finer and finer. We obtain
$$F = \lim_{\delta A \to 0} \sum_R \sigma(P)\,\delta A = \iint_R \sigma(P)\,dA.$$

Confronted with a double integral, one has to decide which coordinate system to use (say cartesian or polar coordinates) so as to turn it into a repeated integral. In the following example, cartesian coordinates (x, y) are appropriate.

Example 32.7 We have shown in Example 32.6 that the resultant force F on any
flat plate R subject to a (perpendicular) pressure σ(P) per unit area is given by

F=  σ(P) dA. Find the force on a rectangular plate of sides 2 and 3 units
R
when σ = 3(r 2 − 2), where r is the distance from one of the corners.
This double-integral expression is perfectly general, applying to any plate, any
distribution of force, and any coordinates. We have to reformulate the problem ➚
720
Example 32.7 continued
DOUBLE INTEGRATION

y
3

Horizontal
δy strip
δA

O 2 x
32

δx Fig. 32.8

for this case. Place the rectangle as in Fig. 32.8, with the corner to which the data refer
at the origin. A suitable mesh is the rectangular mesh, with δA having sides δx and δy:
the area element is δA = δx δy. Also σ = 3(x2 + y2 − 2).
We can add (which ultimately means integrate) the contributions
δF ≈ 3(x2 + y2 − 2) δx δy
in any order that is convenient. Suppose we decide to add the contributions along each
horizontal strip at level y, and then to add the results from the strips. Then from the
strip at level y we obtain the contribution
⎛ 2


⎝ 
x=0
3(x2 + y2 − 2) dx⎟ δy,

after letting δx → 0. When we add the contributions from all the strips and let δy → 0,
we have the repeated integral
3 2 3
F= 
0 0
3(x2 + y2 − 2) dx dy = 3  (−
0
4
3 + 2y2 ) dy = 42.

If we had considered vertical strips we would have obtained the same result for F – this
would correspond to inverting the order of integration in the repeated integral.
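As a rough numerical check on Example 32.7, the Python sketch below (our own illustration, not part of the original working) carries out the summation of σ δx δy directly on a fine rectangular mesh, taking σ at the centre of each element (the midpoint rule). The helper name `midpoint_double` and the mesh size n = 400 are assumptions chosen for the illustration.

```python
def midpoint_double(f, a, b, c, d, n=400):
    """Approximate the repeated integral of f(x, y) over a <= x <= b, c <= y <= d
    by summing f(P) * dx * dy, with P at the centre of each cell of an n-by-n mesh."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for j in range(n):              # horizontal strips at level y
        y = c + (j + 0.5) * dy
        for i in range(n):          # elements along the strip
            x = a + (i + 0.5) * dx
            total += f(x, y)
    return total * dx * dy


def sigma(x, y):
    # sigma = 3(r^2 - 2), with r measured from the corner placed at the origin
    return 3 * (x**2 + y**2 - 2)


F = midpoint_double(sigma, 0.0, 2.0, 0.0, 3.0)
print(F)  # close to the exact value 42
```

Because the same mesh points are used whichever way the cells are visited, this sketch cannot distinguish the two orders of integration; it only confirms the value of the repeated integral.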

The following examples show the adaptability of the notation. In each case, R
represents the region in question, with P a representative point of R and dA the
corresponding element.
(i) Area. Area of R: $\iint_R dA$.
(ii) Variable surface density. Total mass of a thin flat plate, of variable mass σ(P) per unit area: $\iint_R \sigma(P)\,dA$.
(iii) Moments. Moment of (ii) about the x axis: $\iint_R y\,\sigma(P)\,dA$.
(iv) Moments of inertia. Moment of inertia of (ii) about the y axis: $\iint_R x^2\,\sigma(P)\,dA$.
(v) Probability. A function f(x, y) is eligible to be a probability density function for random variables X and Y over a region R if f(x, y) ≥ 0 and $\iint_R f(x, y)\,dA = 1$. The probability that (X, Y) lies in a subregion S is then $\iint_S f(x, y)\,dA$. (Here it is helpful to retain x and y: we are not obliged to use P if it does not have the right associations.)
(vi) Vector resultant. A force per unit area, f(P) (stress), variable in direction and magnitude, is applied to the surface of a flat plate R. The resultant force F is given by $\mathbf{F} = \iint_R \mathbf{f}(P)\,dA$.
In order to interpret or evaluate the integral, we write f in its components $\mathbf{f} = \mathbf{i}f_1 + \mathbf{j}f_2 + \mathbf{k}f_3$: the original double integral with a vector function as integrand is really three double integrals in one.
The component $\mathbf{F}\cdot\mathbf{v}$ of the resultant force in a fixed direction given by a unit vector v is $\iint_R \mathbf{f}(P)\cdot\mathbf{v}\,dA$. This integrand f · v is not a vector. It can be rewritten in any convenient way: for example, as |f| cos θ, where θ is the angle between f and v.

Self-test 32.5
A thin rectangular plate has uniform density ρ and side-lengths 2a and 2b. Find the moment of inertia of the plate about an axis perpendicular to the plate through its centre. (For a plate of general shape R, the moment of inertia about a perpendicular axis is

$$\iint_R \rho\,(x^2 + y^2)\,dx\,dy,$$

where the origin is at the axis.)

32.6 Polar coordinates


If the boundary of R in a double integral is circular with centre at the origin, or if R is a circular sector, it might be easiest to work it out using polar coordinates. However, when we do this, the integrand changes in an important way.
Figure 32.9a shows an annular sector R whose boundaries are specified by r = a, r = b, θ = α, θ = β, where r and θ are polar coordinates. We want to evaluate

$$\iint_R f(P)\,dA = \lim_{\delta A \to 0} \sum_R f(P)\,\delta A, \tag{32.3}$$

where P is a representative point of R, and δA for the moment permits any kind of division of R into small area elements. We want to put everything in terms of polar coordinates. This process must include a suitable choice of elements δA, so that the summation, or integration, (32.3) can be carried out in an orderly way over the δA elements: the equivalent of ‘strips’ in (x, y) coordinates.

[Fig. 32.9: (a) the annular sector R bounded by r = a, r = b, θ = α, θ = β, covered by a polar mesh; (b) a typical element δA, nearly a rectangle with sides δr and r δθ, so that δA ≈ r δr δθ.]

The mesh suitable for this purpose is also shown in Fig. 32.9a, and one of the area elements δA is shown in Fig. 32.9b. It is nearly a rectangle, with sides δr and r δθ, so

$$\delta A \approx r\,\delta r\,\delta\theta.$$
The sum in (32.3) therefore becomes, in polar coordinates,

$$\lim_{\substack{\delta r \to 0 \\ \delta\theta \to 0}} \sum_R f(r, \theta)\,(r\,\delta r\,\delta\theta).$$

The sum of the elements along the radial line θ is

$$\left(\int_{r=a}^{b} f(r, \theta)\,r\,dr\right)\delta\theta$$

after letting δr → 0. Now add up the contributions from all these narrow sectors, ranging from θ = α to θ = β, and let δθ → 0. We obtain the repeated integral

$$\int_{\theta=\alpha}^{\beta}\left(\int_{r=a}^{b} f(r, \theta)\,r\,dr\right)d\theta.$$

Notice that this contains an extra factor r in the integrand.

Double integral in polar coordinates, when the region of integration R is an annular sector
If the sector R is the region a ≤ r ≤ b, α ≤ θ ≤ β, then

$$\iint_R f(P)\,dA = \int_\alpha^\beta \int_a^b f(r, \theta)\,r\,dr\,d\theta. \tag{32.4}$$

Effectively the integral in polar coordinates becomes a repeated integral with constant limits over a rectangle in an (r, θ) plane, but with integrand r f(r, θ).

Example 32.8 Find the volume V between the two planes x + y + z = 4 and z = 0, over the quadrant $0 \le r \le 1$, $0 \le \theta \le \tfrac12\pi$.
Here R is the region $0 \le r \le 1$, $0 \le \theta \le \tfrac12\pi$ in the plane z = 0. Expressed as a double integral, the required volume V is

$$V = \iint_R f(P)\,dA = \iint_R z\,dA.$$

Here z is given by

$$z = 4 - x - y = 4 - r\cos\theta - r\sin\theta = f(r, \theta).$$

Then by (32.4),

$$\begin{aligned}
V &= \int_0^{\pi/2}\int_0^1 (4 - r\cos\theta - r\sin\theta)\,r\,dr\,d\theta
= \int_{\theta=0}^{\pi/2}\left(\int_{r=0}^{1} (4r - r^2\cos\theta - r^2\sin\theta)\,dr\right)d\theta \\
&= \int_{\theta=0}^{\pi/2}\left[2r^2 - \tfrac13 r^3\cos\theta - \tfrac13 r^3\sin\theta\right]_{r=0}^{1} d\theta
= \int_0^{\pi/2}\left(2 - \tfrac13\cos\theta - \tfrac13\sin\theta\right)d\theta = \pi - \tfrac23.
\end{aligned}$$
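The extra factor r in (32.4) is easy to forget, so a numerical cross-check can help. The following Python sketch (our own illustration; the name `polar_midpoint` is hypothetical) applies the midpoint rule on the rectangle in the (r, θ) plane, multiplying each sample by r, and compares the result for Example 32.8 with π − 2/3.

```python
import math


def polar_midpoint(f, a, b, alpha, beta, n=400):
    """Midpoint-rule estimate of (32.4): the integral of f(r, theta) * r over
    the annular sector a <= r <= b, alpha <= theta <= beta."""
    dr = (b - a) / n
    dt = (beta - alpha) / n
    total = 0.0
    for j in range(n):
        theta = alpha + (j + 0.5) * dt
        for i in range(n):
            r = a + (i + 0.5) * dr
            total += f(r, theta) * r     # the extra factor r from dA = r dr dtheta
    return total * dr * dt


def height(r, theta):
    # z = 4 - x - y written in polar coordinates
    return 4 - r * math.cos(theta) - r * math.sin(theta)


V = polar_midpoint(height, 0.0, 1.0, 0.0, math.pi / 2)
print(V, math.pi - 2 / 3)
```

Deleting the factor `* r` from the loop gives a quite different number (it would then approximate 2π − 1 instead), which is the usual symptom of forgetting the polar area element.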

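Example 32.9 A circular disc of radius 0.1 m has a surface charge density $\sigma = 10^{-6}(1 + 10^3 r^3 \sin\tfrac12\theta)$ coulombs per square metre. Find the total charge.
The total charge Q is given by $\iint_R \sigma(P)\,dA$, where the region R is the disc $0 \le r \le 0.1$, $0 \le \theta \le 2\pi$ (if in doubt, sketch it). Remembering that, in polar coordinates, dA = r dr dθ (or reading straight from (32.4)), we have

$$\begin{aligned}
Q &= \int_0^{2\pi}\int_0^{0.1} \sigma(r, \theta)\,r\,dr\,d\theta = \int_0^{2\pi}\int_0^{0.1} 10^{-6}\left(1 + 10^3 r^3 \sin\tfrac12\theta\right) r\,dr\,d\theta \\
&= 10^{-6}\int_{\theta=0}^{2\pi}\left(\int_{r=0}^{0.1} \left(r + 10^3 r^4 \sin\tfrac12\theta\right) dr\right) d\theta
= 10^{-6}\int_{\theta=0}^{2\pi}\left[\tfrac12 r^2 + \tfrac15 10^3 r^5 \sin\tfrac12\theta\right]_{r=0}^{0.1} d\theta \\
&= 10^{-8}\int_0^{2\pi}\left(\tfrac12 + \tfrac15\sin\tfrac12\theta\right) d\theta
= 10^{-8}\left[\tfrac12\theta - \tfrac25\cos\tfrac12\theta\right]_0^{2\pi}
= 10^{-8}\left(\pi + \tfrac45\right) = 3.94 \times 10^{-8}.
\end{aligned}$$

Since the repeated integral has constant limits, the same result would be obtained by integrating in the reverse order (see (32.1)).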
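A minimal numerical sketch of Example 32.9, assuming the same midpoint-rule approach on the (r, θ) rectangle (the helper `polar_midpoint` is our own hypothetical name, not from the text), reproduces the total charge $10^{-8}(\pi + \tfrac45)$ coulombs.

```python
import math


def polar_midpoint(f, a, b, alpha, beta, n=400):
    """Midpoint-rule estimate of the integral of f(r, theta) * r over
    a <= r <= b, alpha <= theta <= beta, as in (32.4)."""
    dr = (b - a) / n
    dt = (beta - alpha) / n
    total = 0.0
    for j in range(n):
        theta = alpha + (j + 0.5) * dt
        for i in range(n):
            r = a + (i + 0.5) * dr
            total += f(r, theta) * r
    return total * dr * dt


def charge_density(r, theta):
    # sigma = 1e-6 (1 + 1e3 r^3 sin(theta/2)), coulombs per square metre
    return 1e-6 * (1 + 1e3 * r**3 * math.sin(theta / 2))


Q = polar_midpoint(charge_density, 0.0, 0.1, 0.0, 2 * math.pi)
Q_exact = 1e-8 * (math.pi + 0.8)   # value obtained in the working above
print(Q, Q_exact)
```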

Example 32.10 The curve r = cos θ ($0 \le \theta \le \tfrac14\pi$), together with the radii from the origin to its ends, forms the boundary of the region R, and is shown in Fig. 32.10. Obtain (a) the area of R, and (b) its moment about the y axis.
(a) In general, the area of a region R is $\iint_R dA$. In this case, we shall add up the contributions δA along radial sectors inclined at angle θ, one of which is shown, and then sum the results for all these sectors to obtain the total area A. We can indicate this, together with the range for r and θ, by writing

$$A = \sum_{\theta=0}^{\pi/4}\ \sum_{r=0}^{\cos\theta} \delta A \approx \sum_{\theta=0}^{\pi/4}\ \sum_{r=0}^{\cos\theta} (r\,\delta r\,\delta\theta).$$

When we let δr and δθ tend to zero, we have a repeated integral with a variable limit:

$$A = \int_0^{\pi/4}\int_0^{\cos\theta} r\,dr\,d\theta = \int_{\theta=0}^{\pi/4}\left[\tfrac12 r^2\right]_{r=0}^{\cos\theta} d\theta = \tfrac12\int_0^{\pi/4}\cos^2\theta\,d\theta = \tfrac{1}{16}\pi + \tfrac18.$$

[Fig. 32.10: the region R bounded by the curve r = cos θ and the radii from O, with a radial sector of angle δθ at inclination θ containing an element δA.]

(b) The moment δM, about the y axis, of an area element δA is

$$\delta M \approx x\,\delta A;$$

so, as a double integral, the total moment M is given by

$$M = \iint_R x\,dA.$$

In the same way as in (a), but with an extra factor x = r cos θ, we have

$$M = \int_0^{\pi/4}\int_0^{\cos\theta} r^2\cos\theta\,dr\,d\theta = \tfrac{1}{32}\pi + \tfrac{1}{12}.$$
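The variable upper limit r = cos θ is the new feature in Example 32.10. The Python sketch below (our own illustration; `polar_variable_limit` is a hypothetical name) mirrors the verbal description: for each narrow sector at angle θ, the inner sum runs out along the radial line to the curve, and the sector results are then added.

```python
import math


def polar_variable_limit(f, r_max, alpha, beta, n=400):
    """Midpoint-rule estimate of the integral of f(r, theta) * r over the region
    0 <= r <= r_max(theta), alpha <= theta <= beta: for each narrow sector the
    inner sum runs out along the radial line to the curve r = r_max(theta)."""
    dt = (beta - alpha) / n
    total = 0.0
    for j in range(n):
        theta = alpha + (j + 0.5) * dt
        dr = r_max(theta) / n
        inner = 0.0
        for i in range(n):
            r = (i + 0.5) * dr
            inner += f(r, theta) * r
        total += inner * dr
    return total * dt


area = polar_variable_limit(lambda r, theta: 1.0, math.cos, 0.0, math.pi / 4)
# moment about the y axis: extra factor x = r cos(theta)
moment = polar_variable_limit(lambda r, theta: r * math.cos(theta), math.cos, 0.0, math.pi / 4)
print(area, math.pi / 16 + 1 / 8)
print(moment, math.pi / 32 + 1 / 12)
```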

Self-test 32.6
Find the volume of the region bounded by the paraboloid $z = 1 - x^2 - y^2$ and the plane z = 0.

32.7 Separable integrals


Suppose that we have a repeated integral I with constant limits, whose integrand f(x, y) is the product of a function of x only with a function of y only:

$$I = \int_c^d\int_a^b f(x, y)\,dx\,dy = \int_c^d\int_a^b g(x)h(y)\,dx\,dy.$$

It is called a separable integral because of the following property.
The inner integral is

$$\int_a^b g(x)h(y)\,dx = h(y)\int_a^b g(x)\,dx,$$

since y is held constant. Therefore

$$I = \int_c^d h(y)\left(\int_a^b g(x)\,dx\right)dy.$$

The integral $\int_a^b g(x)\,dx$, once worked out, is just a constant, so we can take it out from under the y integral, obtaining

$$I = \int_a^b g(x)\,dx \int_c^d h(y)\,dy,$$

which is simply the product of two ordinary integrals. We have the following result:

Separable integrals

$$\int_c^d\int_a^b g(x)h(y)\,dx\,dy = \int_a^b g(x)\,dx\int_c^d h(y)\,dy. \tag{32.5}$$

This can sometimes speed up the working when evaluating integrals, but the following example proves an important result by applying (32.5) the other way round.

Example 32.11 Prove that

$$\int_0^\infty e^{-x^2}\,dx = \tfrac12\pi^{1/2}.$$

Put $I = \int_0^\infty e^{-x^2}\,dx$. The name given to the variable of integration in a definite integral is a matter of indifference, so we can equally well put

$$I = \int_0^\infty e^{-y^2}\,dy.$$

The product is I²:

$$I^2 = \int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy.$$

By (32.5), this can be written as a repeated integral

$$I^2 = \int_0^\infty\int_0^\infty e^{-x^2}e^{-y^2}\,dx\,dy, \tag{32.6}$$

because this repeated integral is separable.
Regard x and y as cartesian coordinates. The region of integration R is the whole of the first quadrant ($0 \le x < \infty$, $0 \le y < \infty$) in Fig. 32.11. Now change to polar coordinates, putting

$$e^{-x^2}e^{-y^2} = e^{-(x^2 + y^2)} \quad\text{and}\quad x^2 + y^2 = r^2.$$

The area element is dA = r dr dθ, and the same region R is described in polars by $0 \le r < \infty$, $0 \le \theta \le \tfrac12\pi$. Then

$$I^2 = \int_0^{\pi/2}\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = \int_0^{\pi/2} d\theta \int_0^\infty e^{-r^2}\,r\,dr \quad\text{(by (32.5) again)}$$

$$= -\tfrac12\pi \cdot \tfrac12\left[e^{-r^2}\right]_{r=0}^{\infty} = \tfrac14\pi.$$

Therefore $I = \tfrac12\pi^{1/2}$.

[Fig. 32.11: the first quadrant R of the (x, y) plane, covered by a polar mesh with a typical element δA of sides δr and r δθ.]
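A quick numerical check of Example 32.11, as a sketch under one stated assumption: the infinite range is cut off at L = 10, beyond which both integrands are below 1e-40. It compares the one-dimensional integral with ½√π, and the polar form of I² with π/4 (the helper `midpoint` is our own name).

```python
import math


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


L = 10.0   # assumed cut-off for the infinite range (integrands are below 1e-40 beyond it)
I = midpoint(lambda x: math.exp(-x * x), 0.0, L)
# Polar form of I^2 from the working above: (pi/2) times the integral of exp(-r^2) r dr.
I_squared_polar = (math.pi / 2) * midpoint(lambda r: math.exp(-r * r) * r, 0.0, L)
print(I, math.sqrt(math.pi) / 2)
print(I * I, I_squared_polar, math.pi / 4)
```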

Example 32.12 Prove the convolution theorem for Laplace transforms (see (25.11)): that is, if F(s) and G(s) are the Laplace transforms of f(t) and g(t) respectively, then F(s)G(s) is the Laplace transform of $\int_0^t f(\tau)g(t - \tau)\,d\tau$.
Consider the Laplace transform P(s) of $\int_0^t f(\tau)g(t - \tau)\,d\tau$:

$$P(s) = \int_0^\infty e^{-st}\int_0^t f(\tau)g(t - \tau)\,d\tau\,dt = \int_0^\infty\int_0^t e^{-st} f(\tau)g(t - \tau)\,d\tau\,dt.$$

The region of integration is the infinite triangle in the (τ, t) plane shown in Fig. 32.12.

[Fig. 32.12: the infinite triangular region 0 ≤ τ ≤ t in the (τ, t) plane, with a vertical strip at fixed τ.]

Change the order of integration by summing vertical strips: we find that

$$P(s) = \int_0^\infty\int_\tau^\infty e^{-st} f(\tau)g(t - \tau)\,dt\,d\tau.$$

Now change the variable in the inner integral from t to u, where

$$u = t - \tau,$$

remembering that τ is constant in the inner integral. We obtain

$$\begin{aligned}
P(s) &= \int_0^\infty\int_0^\infty g(u)f(\tau)\,e^{-s(u + \tau)}\,du\,d\tau = \int_0^\infty\int_0^\infty e^{-su}e^{-s\tau} g(u)f(\tau)\,du\,d\tau \\
&= \int_0^\infty e^{-su}g(u)\,du \int_0^\infty e^{-s\tau}f(\tau)\,d\tau \quad\text{(since the integral is separable)} \\
&= F(s)G(s).
\end{aligned}$$
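The theorem can be spot-checked numerically. The sketch below is our own illustration, assuming the test functions f(t) = e^{−2t} and g(t) = t (so that F(s) = 1/(s + 2) and G(s) = 1/s²) and the single value s = 1; it computes the convolution by an inner midpoint sum, then its Laplace transform with the t range truncated at T = 40, and compares with F(s)G(s) = 1/3.

```python
import math


def laplace_of_convolution(f, g, s, T=40.0, n=2000, m=200):
    """Midpoint-rule estimate of P(s), the Laplace transform of the convolution
    h(t) = integral from 0 to t of f(tau) g(t - tau) dtau, with the infinite
    t-range truncated at T (harmless when e^(-sT) h(T) is negligible)."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        dtau = t / m
        h = dtau * sum(f((k + 0.5) * dtau) * g(t - (k + 0.5) * dtau) for k in range(m))
        total += math.exp(-s * t) * h
    return total * dt


f = lambda t: math.exp(-2.0 * t)   # Laplace transform F(s) = 1/(s + 2)
g = lambda t: t                    # Laplace transform G(s) = 1/s^2
s = 1.0
P = laplace_of_convolution(f, g, s)
FG = (1.0 / (s + 2.0)) * (1.0 / s**2)
print(P, FG)
```

This checks only one pair of functions at one value of s; it illustrates the result rather than proving it.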
32.8 General change of variable; the Jacobian determinant

Consider the integral

$$I = \iint_R f(x, y)\,dx\,dy \tag{32.7}$$

where R is the region of integration in the (x, y) plane and the area elements δA are small rectangles of side δx and δy as in Fig. 32.13. The shape of R might suggest the use of another system of coordinates to evaluate I. The special case of polar coordinates was illustrated in Section 32.6.
Suppose that new coordinates u and v are defined by the relations

$$x = x(u, v), \qquad y = y(u, v), \tag{32.8}$$

where there is a one-to-one correspondence between (x, y) and (u, v). The objective is to put (32.7) entirely in terms of u and v.

[Fig. 32.13: the region R in the (x, y) plane covered by a rectangular mesh, with element δA of sides δx and δy. Fig. 32.14: the coordinate curves u = u_P and v = v_P through a point P.]

Figure 32.14 shows a general point P at (x_P, y_P), or at (u = u_P, v = v_P) in the new coordinates. The coordinate curves u = u_P and v = v_P through P are also shown.
Now let δu and δv represent positive small increments in u and v respectively. In Fig. 32.15(a) the two curves u = u_P + δu and v = v_P + δv are also shown. The area element PQRS, denoted by δA′, is of the type appropriate for the new coordinates, and when δu and δv are small, PQRS is nearly a parallelogram, as indicated in Fig. 32.15(b).

[Fig. 32.15: (a) area element δA′ for the u, v coordinates, bounded by u = u_P, u = u_P + δu, v = v_P, v = v_P + δv; (b) when δu and δv are small, PQRS is nearly a parallelogram, with P: (x_P, y_P), Q: (x_Q, y_Q), S: (x_S, y_S).]


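The area of the parallelogram PQRS is given by

$$\delta A' = \left|\det\begin{pmatrix} x_Q - x_P & x_S - x_P \\ y_Q - y_P & y_S - y_P \end{pmatrix}\right|$$

where the verticals stand for the modulus of the determinant between them (see Problem 32.17, and also Example 11.2).
The elements of the determinant are given approximately by

$$\begin{aligned}
x_Q - x_P &= x(u_P + \delta u, v_P) - x(u_P, v_P) \approx \frac{\partial x}{\partial u}\,\delta u, \\
x_S - x_P &= x(u_P, v_P + \delta v) - x(u_P, v_P) \approx \frac{\partial x}{\partial v}\,\delta v, \\
y_Q - y_P &= y(u_P + \delta u, v_P) - y(u_P, v_P) \approx \frac{\partial y}{\partial u}\,\delta u, \\
y_S - y_P &= y(u_P, v_P + \delta v) - y(u_P, v_P) \approx \frac{\partial y}{\partial v}\,\delta v,
\end{aligned}$$

where the partial derivatives are evaluated at P. Therefore

$$\delta A' \approx \left|\det\begin{pmatrix} \dfrac{\partial x}{\partial u}\,\delta u & \dfrac{\partial x}{\partial v}\,\delta v \\[4pt] \dfrac{\partial y}{\partial u}\,\delta u & \dfrac{\partial y}{\partial v}\,\delta v \end{pmatrix}\right| = \left|\det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix}\right|\delta u\,\delta v, \tag{32.9}$$

(where the vertical lines denote the modulus of the determinants) since we required δu and δv to be positive.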
The determinant which occurs in (32.9) is of wide importance. It is called the Jacobian of the transformation (32.8), and has the notation

$$\frac{\partial(x, y)}{\partial(u, v)}.$$

For brevity it is sometimes denoted simply by J(u, v).

The Jacobian determinant of the transformation x = x(u, v), y = y(u, v):

$$J(u, v) \ \text{ or } \ \frac{\partial(x, y)}{\partial(u, v)} = \det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} \tag{32.10}$$

From (32.8) and (32.9), remembering the modulus, we can therefore say:

Area δA′ of an element at P with sides in the directions of the u, v coordinate curves

$$\delta A' \approx \left|\frac{\partial(x, y)}{\partial(u, v)}\right|\delta u\,\delta v \quad \left(\text{or } |J(u, v)|\,\delta u\,\delta v\right),$$

where δu, δv are positive, and the Jacobian determinant is evaluated at P. (32.11)
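When the partial derivatives are awkward to write down, the Jacobian (32.10) can be estimated numerically. The Python sketch below is our own illustration (the name `jacobian` and the step h = 1e-6 are assumptions); it uses central differences for the four partial derivatives. For polar coordinates it should return r, and for the linear map x = v − u, y = v + u used in Example 32.14 it should return −2, so that the area factor |J| in (32.11) is 2.

```python
import math


def jacobian(x, y, u, v, h=1e-6):
    """Central-difference estimate of the Jacobian determinant d(x, y)/d(u, v)
    of (32.10) at the point (u, v), for a map given by functions x(u, v), y(u, v)."""
    x_u = (x(u + h, v) - x(u - h, v)) / (2 * h)
    x_v = (x(u, v + h) - x(u, v - h)) / (2 * h)
    y_u = (y(u + h, v) - y(u - h, v)) / (2 * h)
    y_v = (y(u, v + h) - y(u, v - h)) / (2 * h)
    return x_u * y_v - x_v * y_u


# Polar coordinates, (u, v) = (r, theta): the determinant should come out as r (here 1.5).
J_polar = jacobian(lambda r, t: r * math.cos(t), lambda r, t: r * math.sin(t), 1.5, 0.7)
# The linear map x = v - u, y = v + u: constant determinant -2, area factor |J| = 2.
J_linear = jacobian(lambda u, v: v - u, lambda u, v: v + u, 0.3, 0.8)
print(J_polar, J_linear, abs(J_linear))
```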

We can now rewrite the original integral (32.7) in terms of the new coordinates u and v:

To express a double integral in new coordinates
If x = x(u, v), y = y(u, v), then

$$\iint_R f(x, y)\,dx\,dy = \iint_S f\big(x(u, v), y(u, v)\big)\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv,$$

where S is the region R transformed to the cartesian (u, v) plane. (32.12)

The effect of making the change of variable is to change the integrand to something different. This is not surprising; a similar thing happens in the one-dimensional case. If

$$I = \int_a^b f(x)\,dx$$

and we change the variable by putting x = x(u), the new factor dx/du appears in the integrand, and the limits change.
The final step is to convert (32.12) into a repeated integral in terms of u and v, so that the integrations can be carried out.

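Example 32.13 Transform the integral $I = \iint_R (x^2 + y^2)\,dx\,dy$ into polar coordinates r, θ, where R is the region shown in Fig. 32.16.

[Fig. 32.16: the region R, the part of the annulus 1 ≤ r ≤ 2 lying between the x axis and the line through O at 45°.]

Here r and θ stand in place of u and v, and

$$x = r\cos\theta, \qquad y = r\sin\theta,$$

$$\frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial x}{\partial\theta} = -r\sin\theta, \quad \frac{\partial y}{\partial r} = \sin\theta, \quad \frac{\partial y}{\partial\theta} = r\cos\theta.$$

Therefore,

$$J(r, \theta) = \frac{\partial(x, y)}{\partial(r, \theta)} = \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r.$$

This is already positive, so the new area elements are given by

$$\delta A' = r\,\delta r\,\delta\theta,$$

as we found in Example 32.11. Also

$$f(x, y) = x^2 + y^2 = r^2.$$

Finally

$$I = \iint_R (x^2 + y^2)\,dx\,dy = \iint_S r^2 J(r, \theta)\,dr\,d\theta = \iint_S r^3\,dr\,d\theta.$$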

This is to be read straightforwardly as a fresh double integral in variables called r and θ, with rectangular area elements δr δθ. When we draw the diagram to find the shape of S, r and θ are to be treated as cartesian coordinates in axes labelled r and θ (see Fig. 32.17). In this frame, S is bounded by the straight lines r = 1, r = 2, θ = 0, θ = ¼π.

[Fig. 32.17: the rectangle S, 1 ≤ r ≤ 2, 0 ≤ θ ≤ ¼π, drawn in axes labelled r and θ.]

Expressed as a repeated integral, using (say) strips parallel to the r axis,

$$I = \int_0^{\pi/4}\int_1^2 r^3\,dr\,d\theta = \int_1^2 r^3\,dr \int_0^{\pi/4} d\theta \quad\text{(since the integral is separable)}$$

$$= \left[\tfrac14 r^4\right]_1^2 \left[\theta\right]_0^{\pi/4} = \tfrac{15}{16}\pi.$$
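Formula (32.12) can also be applied numerically, as a sketch: the Python code below (our own illustration; `change_of_variables` is a hypothetical name) samples f(x(u, v), y(u, v)) |J| at the centres of a mesh on the rectangle S, estimating J by central differences, and reproduces 15π/16 for Example 32.13.

```python
import math


def change_of_variables(f, x, y, u0, u1, v0, v1, n=200, h=1e-6):
    """Right-hand side of (32.12) over a rectangle S = [u0, u1] x [v0, v1] in the
    (u, v) plane: the midpoint rule applied to f(x(u, v), y(u, v)) |J(u, v)|,
    with the Jacobian estimated by central differences."""
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for j in range(n):
        v = v0 + (j + 0.5) * dv
        for i in range(n):
            u = u0 + (i + 0.5) * du
            x_u = (x(u + h, v) - x(u - h, v)) / (2 * h)
            x_v = (x(u, v + h) - x(u, v - h)) / (2 * h)
            y_u = (y(u + h, v) - y(u - h, v)) / (2 * h)
            y_v = (y(u, v + h) - y(u, v - h)) / (2 * h)
            J = x_u * y_v - x_v * y_u
            total += f(x(u, v), y(u, v)) * abs(J)
    return total * du * dv


# Example 32.13: f = x^2 + y^2, polar map, S is 1 <= r <= 2, 0 <= theta <= pi/4.
I = change_of_variables(lambda X, Y: X**2 + Y**2,
                        lambda r, t: r * math.cos(t),
                        lambda r, t: r * math.sin(t),
                        1.0, 2.0, 0.0, math.pi / 4)
print(I, 15 * math.pi / 16)
```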

Example 32.14 Evaluate $I = \iint_R (y^2 - x^2)\,dx\,dy$ over the square region in Fig. 32.18 by changing the variables to u, v, where x = v − u, y = v + u.
Figure 32.18 shows the x, y and the u, v equations of the sides of R. We have

$$\frac{\partial(x, y)}{\partial(u, v)} = \det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} = \det\begin{pmatrix} -1 & 1 \\ 1 & 1 \end{pmatrix} = -2.$$

Therefore

$$\delta A = \left|\frac{\partial(x, y)}{\partial(u, v)}\right|\delta u\,\delta v = 2\,\delta u\,\delta v.$$

In terms of u and v, $y^2 - x^2 = 4uv$, so

$$I = \iint_S 4uv\,(2\,du\,dv) = 8\iint_S uv\,du\,dv.$$

The corresponding region S in the (u, v) plane is shown in Fig. 32.19.

[Fig. 32.18: the square R with vertices (0, 0), (1, 1), (0, 2), (−1, 1); its sides are y = x (u = 0), y = x + 2 (u = 1), y = −x (v = 0) and x + y = 2 (v = 1). Fig. 32.19: the unit square S in the (u, v) plane bounded by u = 0, u = 1, v = 0, v = 1.]

Therefore

$$I = 8\int_0^1\int_0^1 uv\,du\,dv = 8\int_0^1 u\,du \int_0^1 v\,dv \quad\text{(since the integral is separable)}$$

$$= 8 \times \tfrac12 \times \tfrac12 = 2.$$
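As a cross-check on Example 32.14, the sketch below (our own illustration; `strips_midpoint` is a hypothetical name) evaluates the integral directly in (x, y) with horizontal strips, splitting R at y = 1 (for 0 ≤ y ≤ 1 the strip runs from x = −y to x = y, and for 1 ≤ y ≤ 2 from x = y − 2 to x = 2 − y), and compares it with the (u, v) form 8∫∫ uv du dv over the unit square.

```python
def strips_midpoint(f, y0, y1, x_lo, x_hi, n=400):
    """Midpoint rule for a repeated integral with variable inner limits: for each
    level y in [y0, y1], integrate f(x, y) in x from x_lo(y) to x_hi(y)."""
    dy = (y1 - y0) / n
    total = 0.0
    for j in range(n):
        y = y0 + (j + 0.5) * dy
        a, b = x_lo(y), x_hi(y)
        dx = (b - a) / n
        total += dx * sum(f(a + (i + 0.5) * dx, y) for i in range(n))
    return total * dy


def integrand(x, y):
    return y * y - x * x


# Directly in (x, y): the square R split at y = 1 into two stacks of horizontal strips.
I_xy = (strips_midpoint(integrand, 0.0, 1.0, lambda y: -y, lambda y: y)
        + strips_midpoint(integrand, 1.0, 2.0, lambda y: y - 2.0, lambda y: 2.0 - y))
# In (u, v): integrand 4uv times |J| = 2 over the unit square S
# (the helper is reused with u in the role of x and v in the role of y).
I_uv = strips_midpoint(lambda u, v: 8.0 * u * v, 0.0, 1.0, lambda v: 0.0, lambda v: 1.0)
print(I_xy, I_uv)
```

The (u, v) version has constant limits and a separable integrand, which is exactly why the change of variable simplifies the working.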

Self-test 32.7
Sketch the region bounded by the parabolas $y = x^2$, $y = 2x^2$, $x = y^2$, $x = 2y^2$. Using the change of variable $u = y/x^2$, $v = x/y^2$, find the area enclosed by the parabolas. What is the area bounded by $y = 2x^2$, $y = x^2$ and $x = 2y^2$?
Problems

32.1 Evaluate the following repeated integrals 2 y 2 1− 12 x

with constant limits. (f ) 


0 − 1
2y
y 2 sin xy dx dy; (g)  0 0
x 2 dy dx;

 
1 2 1 1

(a) xy2 dx dy; (b) y exy dx dy; 1 √( 1−y 2) 1 √(1−x 2 )


0 1 0 0 (h)  x dx dy; (i)  x dy dx.

  dx dy;   dx dy;
d b b d 0 0 0 0
(c) (d)
c a a c
32.4 Find the volume of the wedge-shaped object

1 1


d b


2

(e) dy dx; (f) y sin xy dx dy; having one curved surface which is part of the
c a 0 0
cylinder x2 + y2 = 1, and whose flat surfaces are z = 0

and the plane z = 2y. (Consider the simple wedge in


   x dy dx;
1 1 2 1

(g) x2 dx dy; (h) 2 z  0 only.)


−1 −1 1 0

  (xy − x y) dx dy;
1 1 32.5 Reverse the order of integration in each of the
2 2
(i) following cases. It is necessary to sketch the region
0 −1
of integration and to indicate a typical strip

  (xy − x y) dy dx;
1 1
2 2
corresponding to the new order of integration,
(j) as in Section 32.4.
−1 0
1 y 1 1

  (x + y + 1) dx dy;   f (x, y) dx dy;   f (x, y) dx dy;


1 1

(k) 2 2 (a) (b)


0 0 0 y
0 0


1

1 2 y+1

(l)   cos(x + y)dy dx;


0 0
(c) 
1 0
f (x, y) dx dy;

2 1

(m)   dx dy.
x 1 √(1− y 2 )

y 1 0
(d) 
0 −√(1− y 2 )
f (x, y) dx dy;

1
4 2y
32.2 Find the signed volume between the given
surfaces and the plane z = 0 over the specified
(e) 
2 0
f (x, y) dx dy;
rectangular regions.
(a) z = xy, 0  x  1, 0  y  1; 1 x2

(b) z = xy, −1  x  1, 0  y  1 (explain the


result);
(f) 
0 x3
f (x, y) dy dx;

(c) z = x + y, −1  x  2, −2  y  1; 1 1−x

(d) z = −1, a  x  b, c  y  d;
(e) z = 2x − y + 3, 0  x  1, 0  y  1;
(g) 
0 −1+x
f (x, y) dy dx (it becomes the sum of
two integrals);
(f) z = 1 /(x + y), 1  x  2, 0  y  1; 1 1+√(1−x 2 )
(g) z = (x + 2y − 1)2, −2  x  1, −1  y  1. (h) 
−1 1−√(1−x 2 )
f (x, y) dy dx.

32.3 In the following problems, the region of


integration is not rectangular. In each case, sketch 32.6 Change the order of integration in the
the region of integration and indicate a typical following, and hence evaluate them. It is necessary
strip for the inner integral. to sketch the region of integration R and to
y

  dx dy; (b)   x y dx dy;


1 1 1
2 indicate the strip corresponding to the new
(a)
0 0 0 y
inner integral.
2π 2(y −1)
1 1
2

 
x

(c)   x y dy dx (compare (b));


1 2

2 (a) x sin xy dx dy; (b) x 2 dx dy;


0 0 0 0 1 0
y y

(d)   (x + y ) dx dy; (e)   y dx dy;


1 1 1 y

0 0
2 2

0 −y
(c)  x e
0 0
2 xy
dx dy;
∞ y−2


(h) R is the half plane y  0, and f(P) = y e −(x 2 +y 2)
.
x 2y e −x y dx dy;
2 2
(d)
(Hint: separate the integral: see (32.5).)

1 −2
0 4
y

1 2


32.9 A circular hole of radius 12 a is drilled through
(e) y(1 − x 2 − y 2 )2 dx dy; a sphere of radius a in such a way that the edge of
−1 −2
the hole passes through the centre of the sphere.
2 1

 x
y Let the equation of the sphere and the cylinder be
(f ) dx dy; x2 + y2 + z2 = a2 and (x − 12 a)2 + y 2 = 41 a 2. If x = r
2
+ y2
1 0
cos θ and y = r sin θ, show that the volume Vc of
∞ ∞ material removed (the section in the (x, y) plane is

1
(g) dx dy; shown in Fig. 32.20) is given by
1 0
(x + y)3
2π a cos θ
1

(h)
1


1

y(x 2 − y 2 ) dx dy;
1
2
Vc = 2  
− 12 π 0
a 2 − r 2 r dr dθ .

0 y
Hence find the volume of the remaining part of the
2 y −1

 2
sphere.
(i) x dx dy (the integral must be split
0 − y −1 into two parts);
y
1 1


y dx dy
(j) 1 .

0 y (x 2 − y 2 ) 2
r
32.7 The symbol ∫∫R f(P) dA represents a double θ
integral taken over the region R , and dA is the area O 1
2
a a x
element at the point P in R (see Section 32.5). In
the following cases, the region R is described, and
f(P) given in cartesian coordinates. Evaluate the
integrals.
(a) R is the rectangle with corners at (1, 1), (2, 1),
(2, 4), and (1, 4), and f(P) = x2 + y2.
Fig. 32.20
(b) R is the equilateral triangle with vertices at
(0, −1), ( 3 2 , 0), and (0, 1), and f(P) = x.
1

32.10 Find the Jacobian


(c) R is the circle of radius 2, centred at the origin,
and f(P) = y 2. ∂x ∂x
∂(x, y)
J(u, v) = = ∂u ∂v
32.8 As in Problem 32.7, but polar coordinates ∂(u, v) ∂y ∂y
are to be used for the evaluation (see Section 32.6). ∂u ∂v
Remember the change in the area element; of the following transformations:
see (32.4). (a) x = u2 − v2, y = uv;
(a) R is the disc x2 + y2  1, and f(P) = x2 + y 2. (b) x = u − v, y = 2v;
(b) R is the disc x2 + y2  1, and f(P) = y2. (c) u = 2x − y, v = x + 2y;
(c) R is the area whose boundary consists of the x (d) x = u − e−v, y = u − ev.
axis between x = 0 and 2, the y axis between y
= 0 and 2, and a quarter of the circle x2 + y 2 = 4. 32.11 Find the Jacobian of the transformation
Also f(x, y) = xy. x = u/v, y = uv. Let R be the region bounded by
(d) R is the sector 1  r  2, 0  θ  12 π , and y = 2x, y = x, xy = 1, xy = 8. Express
f(P) = xy.
(e) R is the disc x2 + y2  4, and f(x, y) = arctan
(y/x).  xy dx dy
R
2

(f) R is the first quadrant of the plane, and f(x, y) as a repeated integral in the (u, v) plane, and
= e − 4(x +y ).
2 2

evaluate it.
(g) Show that the volume of a sphere of radius a is
3 πa
4 3. (Consider the hemisphere
32.12 Sketch the region in the (x, y) plane
0  z  (a 2 − x 2 − y 2 ) 2 .) bounded by the parabolas y = x2, y = 2x2, x = y2,
1
x = 2y². Find the Jacobian of the transformation given by

$$u = \frac{y}{x^2}, \qquad v = \frac{x}{y^2}.$$

Hence find the area bounded by the parabolas.

32.13 Evaluate

$$\iint_R x\,e^{x+y}\,dA,$$

where R is the region bounded by the square |x| + |y| = 1.

32.14 A plastic component is cut from a solid plastic rod which is cylindrical with cross-section bounded by the rhombus R, y = 1 − ½x, y = −1 − ½x, y = 1 + ½x, y = −1 + ½x in the (x, y) plane, with the rod in the z direction. The ends of the component are shaped into the surfaces z = x² + 2 and z = −(x² + 2). Find the volume

$$V = 2\iint_R (x^2 + 2)\,dA$$

of the component.

32.15 For the polar transformation x = r cos θ, y = r sin θ, show that

$$\frac{\partial(x, y)}{\partial(r, \theta)} = r.$$

Show that r and θ are given by

$$r = (x^2 + y^2)^{1/2}, \qquad \tan\theta = \frac{y}{x}.$$

Show that

$$\frac{\partial(r, \theta)}{\partial(x, y)} = \frac1r = 1 \Big/ \frac{\partial(x, y)}{\partial(r, \theta)}.$$

In fact under fairly general conditions, the Jacobian satisfies this inverse rule, which is helpful in some cases since it can avoid the inversion of transformations (see Example 30.16). Find ∂(u, v)/∂(x, y) if u = y/x² and v = x/y² using this rule, and confirm that

$$\frac{\partial(x, y)}{\partial(u, v)} = \frac{1}{3u^2v^2}.$$

32.16 Find the Jacobian J(u, v) of the transformation u = x² − y², v = 2xy. Draw a sketch of the region R in the (x, y) plane bounded by the curves x² − y² = 1, x² − y² = 4, xy = 2, xy = 4. By using the change of variable from (x, y) to (u, v), evaluate

$$\iint_R (x^2 + y^2)\,dx\,dy.$$

32.17 Let PQRS be a parallelogram with P at (x_P, y_P), Q at (x_Q, y_Q), and S at (x_S, y_S). Show that its area is given by the modulus of the determinant

$$\det\begin{pmatrix} x_Q - x_P & x_S - x_P \\ y_Q - y_P & y_S - y_P \end{pmatrix}.$$

(Hint: simplify the problem by placing P at the origin of coordinates.)

32.18 The following technique can be regarded as integrating under another integral sign with respect to a parameter (compare Section 17.9).
(a) Noting that

$$\int_a^b e^{-xy}\,dy = \frac1x\left(e^{-ax} - e^{-bx}\right),$$

evaluate the integral

$$\int_0^\infty \frac{e^{-ax} - e^{-bx}}{x}\,dx,$$

where a > 0 and b > 0.
(b) In a similar way, evaluate

$$\int_{-\infty}^{\infty} \frac{\cos ax - \cos bx}{x^2}\,dx,$$

where a and b may take any values. In this problem the result depends on the signs of a and b. (You may assume that $\int_0^\infty \frac{\sin u}{u}\,du = \tfrac12\pi$.)
33 Line integrals

CONTENTS

33.1 Evaluation of line integrals
33.2 General line integrals in two and three dimensions
33.3 Paths parallel to the axes
33.4 Path independence and perfect differentials
33.5 Closed paths
33.6 Green’s theorem
33.7 Line integrals and work
33.8 Conservative fields
33.9 Potential for a conservative field
33.10 Single-valuedness of potentials
Problems

Consider the following scenario. The success of a museum is measured by two variables. These are the monthly income x from visitors, and the monthly income y from grants and donations. The variation is smoothed out so that they form a continuous record. The exhibitions director receives a variable monthly bonus I which rewards success and penalizes failure in promoting attendance: when attendance changes by a small amount δx, positive or negative, there is a change in bonus (up or down) of δI, where

$$\delta I \approx f(x, y)\,\delta x. \tag{33.1}$$

Figure 33.1 charts the fortunes, or the state (x, y), of the museum over a period, starting at state A and arriving at state B, in the form of a curve joining A and B. Time does not register on this diagram, except that direction of development as time increases is indicated by the arrow. The directed curve is called the path from A to B, denoted by (AB) or AB (there may be more letters in the brackets).
Suppose that the bonus at the starting state A is I_A, and at the state B it is I_B. Then the problem is to find the change in bonus over the period, I(AB):

$$I_B - I_A = I(AB). \tag{33.2}$$

Divide up the path into many short segments such as PP′ (Fig. 33.1). The increment δI over a typical segment is given by

$$(\delta I)_{\text{along } PP'} \approx f(x, y)\,\delta x.$$

[Fig. 33.1: the path from A to B in the (x, y) plane (x: visitors, y: grants), with a typical segment PP′ of components δx, δy; δx < 0 along (AC) and δx > 0 along (CB).]

Add the contributions of all the segments to obtain I(AB):

$$I(AB) \approx \sum_{(AB)} f(x, y)\,\delta x. \tag{33.3}$$

Given a specific function f(x, y) and a specific path (AB), I(AB) could in principle
be computed by carrying out the summation (33.3) numerically, taking δx very
small, and allowing for the fact that δx is sometimes positive and sometimes
negative. We have to split up the path for this purpose: in Fig. 33.1, δx is negative
along (AC) and positive along (CB). If the path is vertical along a section, then δx
will be zero, and there will be a zero change in the bonus along this part despite
the fact that y is changing.
In imagination, let δx → 0. Then ‘≈’ becomes ‘=’. It is natural to write the result as a kind of integral:

$$I(AB) = \lim_{\delta x \to 0} \sum_{(AB)} f(x, y)\,\delta x = \int_{(AB)} f(x, y)\,dx, \tag{33.4}$$

where the notation reminds us that we take values of (x, y) which lie on (AB) and take account of the sign of δx at each point on the path.
The integral in (33.4) is called a line integral. It is not straightforwardly an ordinary integral because the direction, left to right or right to left, at every point
must be taken into account. The director is losing money along (AC). In order
to arrive at B from A many paths are possible. In general, a line integral I(AB) will
depend on the total history, on what path has been followed between A and B, and
we say that the integral is path dependent.

33.1 Evaluation of line integrals


To show how to interpret (33.4) in terms of ordinary integrals, suppose that the bonus function f(x, y) is given by

$$f(x, y) = x + \tfrac12 y, \quad\text{so that}\quad \delta I = \left(x + \tfrac12 y\right)\delta x$$

(in suitable units). Suppose that the museum passes from state A: (3, 3) to state B as in Fig. 33.2a, or to state C as in Fig. 33.2b; it thrives in Fig. 33.2a and declines in Fig. 33.2b.

[Fig. 33.2: (a) the path from A: (3, 3) to B: (5, 5) along y = x² − 7x + 15; (b) the path from A: (3, 3) to C: (1, 1) along y = x² − 3x + 3.]
Consider the case (a). The graph AB can be expressed in principle as a function of x, and the curve chosen for illustration is

$$y = x^2 - 7x + 15,$$

where B is the point (5, 5). Then

$$I(AB) = \int_{(AB)} \left(x + \tfrac12 y\right) dx = \int_{(AB)} \left[x + \tfrac12(x^2 - 7x + 15)\right] dx.$$

But δx is positive all the way; so, regarded as the limit of the sum in (33.4), this is just an ordinary integral. After simplifying the integrand, we have

$$I = \int_{x=3}^{5} \tfrac12(x^2 - 5x + 15)\,dx = 11.33.$$

For the case (b), the equation of the curve from A to C is

$$y = x^2 - 3x + 3,$$

so that I(AC) becomes

$$I(AC) = \int_{(AC)} \tfrac12(x^2 - x + 3)\,dx,$$

where C is the point (1, 1). In this case, however, the δx in the sum (33.4) are all negative: x is decreasing. To turn this into an ordinary integral, we have therefore to reverse the sign:

$$I(AC) = -I(CA) = -\int_{x=1}^{3} \tfrac12(x^2 - x + 3)\,dx = -5.33. \tag{33.5}$$

There is a reduction of bonus for bringing the museum to the edge of ruin.
While we are still observing things, notice first that, in connection with the sign change for negative δx on (AC) in (33.5), we can write

$$I = -\int_1^3 \tfrac12(x^2 - x + 3)\,dx = (+)\int_3^1 \tfrac12(x^2 - x + 3)\,dx.$$
In other words, we obtain the correct result by setting the x coordinate of the
starting point as the lower limit, and that of the end-point as the upper limit,
whether x is constantly decreasing or constantly increasing along the path, and
this is a general result.
Lastly we compare the result for the parabolic path (AC) in Fig. 33.2 with a straight path from A to C whose equation is

$$y = x.$$

Then

$$\int_{(AC)} \left(x + \tfrac12 y\right) dx = \int_3^1 \tfrac32 x\,dx = \tfrac32\left[\tfrac12 x^2\right]_3^1 = -6.$$

This is different from (33.5), so we must in general expect that line integrals will
be path dependent.
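The defining sum (33.3) can be carried out literally on a computer, which makes the sign convention concrete. In this Python sketch (our own illustration; `line_sum_dx` is a hypothetical name) the path is given as a function t ↦ (x(t), y(t)) and cut into short segments; each segment contributes f × δx with δx taking its own sign, so no separate bookkeeping for decreasing x is needed. It reproduces 34/3 (≈ 11.33) for Fig. 33.2a, −16/3 (≈ −5.33) for the parabola from A to C, and −6 for the straight path.

```python
def line_sum_dx(f, path, t0, t1, n=20000):
    """Approximate the sum (33.3): split the path t -> (x(t), y(t)), traced from
    t = t0 to t = t1, into n short segments and add f * dx for each one, with f taken
    at the midpoint of the segment. Each dx carries its own sign."""
    total = 0.0
    x_prev, y_prev = path(t0)
    for k in range(1, n + 1):
        x, y = path(t0 + (t1 - t0) * k / n)
        total += f(0.5 * (x + x_prev), 0.5 * (y + y_prev)) * (x - x_prev)
        x_prev, y_prev = x, y
    return total


def bonus(x, y):
    return x + 0.5 * y


I_AB = line_sum_dx(bonus, lambda t: (t, t * t - 7 * t + 15), 3.0, 5.0)   # A to B, Fig. 33.2a
I_AC = line_sum_dx(bonus, lambda t: (t, t * t - 3 * t + 3), 3.0, 1.0)    # A to C, parabola
I_AC_line = line_sum_dx(bonus, lambda t: (t, t), 3.0, 1.0)              # A to C, straight line
print(I_AB, I_AC, I_AC_line)   # compare with 34/3, -16/3 and -6
```

The two A-to-C values differ, which is the path dependence just observed.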
The following summary generalizes the special case we have discussed.

The line integral $I(AB) = \int_{(AB)} f(x, y)\,dx$
(a) Definition: $I(AB) = \lim_{\delta x \to 0} \sum_{(AB)} f(x, y)\,\delta x$, where (x, y) takes values on the path (AB) from A to B.
(b) I(AB) is path dependent, and I(BA) = −I(AB).
(c) If δx has constant sign on the path y = y(x) from C: (x_C, y_C) to D: (x_D, y_D), then

$$\int_{(CD)} f(x, y)\,dx = \int_{x_C}^{x_D} f(x, y(x))\,dx. \tag{33.6}$$

Example 33.1 Evaluate the two line integrals

(a) $\int_{(AB)} xy\,dx$, (b) $\int_{(ACB)} xy\,dx$,

on the paths shown in Figs 33.3a,b respectively.
(a) On (AB), y = x and δx > 0. Here A = (2, 2) and B = (4, 4); so, by (33.6),

$$I(AB) = \int_{(AB)} xy\,dx = \int_2^4 x^2\,dx = \left[\tfrac13 x^3\right]_2^4 = \tfrac{56}{3}.$$

(b) The path (ACB) has to be broken into two parts: (AC), on which δx < 0, and (CB), on which δx > 0, where C = (0, 4). Then

$$I(ACB) = \int_{(ACB)} xy\,dx = \int_{(AC)} xy\,dx + \int_{(CB)} xy\,dx.$$

[Fig. 33.3: (a) the straight path y = x from A: (2, 2) to B: (4, 4); (b) the path from A: (2, 2) along y = 4 − x to C: (0, 4), then along y = 4 to B: (4, 4).]

On (AC), y = 4 − x; on (CB), y = 4. Therefore

$$I(ACB) = \int_{(AC)} x(4 - x)\,dx + \int_{(CB)} 4x\,dx = \int_2^0 (4x - x^2)\,dx + \int_0^4 4x\,dx$$

$$= \left[2x^2 - \tfrac13 x^3\right]_2^0 + 2\left[x^2\right]_0^4 = \tfrac{80}{3}.$$

These results illustrate the path dependence between A and B.
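Both paths in Example 33.1 are made of straight legs, so a polygonal version of the defining sum is enough to check them. The Python sketch below is our own illustration (`polyline_dx` is a hypothetical name): each leg is cut into n equal pieces and f is sampled at the centre of each piece, times that piece's signed δx, which is negative along (AC).

```python
def polyline_dx(f, vertices, n=2000):
    """Line integral of f(x, y) dx along the polygonal path through `vertices`, taken
    in order. Each straight leg is cut into n equal pieces; f is sampled at the centre
    of each piece and multiplied by that piece's signed dx."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(vertices, vertices[1:]):
        dx = (xb - xa) / n          # negative on legs where x decreases
        for k in range(n):
            s = (k + 0.5) / n
            total += f(xa + s * (xb - xa), ya + s * (yb - ya)) * dx
    return total


def f(x, y):
    return x * y


I_AB = polyline_dx(f, [(2, 2), (4, 4)])            # Fig. 33.3a
I_ACB = polyline_dx(f, [(2, 2), (0, 4), (4, 4)])   # Fig. 33.3b, via C = (0, 4)
print(I_AB, 56 / 3)
print(I_ACB, 80 / 3)
```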

Despite (33.6c), reduction to ordinary integrals over x is not usually the best
way to evaluate line integrals, as will be seen later on.

Self-test 33.1
The path (AB) is the straight line directed from A: (1, 2) to B: (2, 1). Show that

(a) $\int_{(AB)} (ax + by)\,dx = \tfrac32(a + b)$ and (b) $\int_{(AB)} (ax + by)\,dy = -\tfrac32(a + b)$.

33.2 General line integrals in two and three dimensions
Line integrals of the type

$$\int_{(AB)} g(x, y)\,dy$$

are to be understood in a similar way:

$$\int_{(AB)} g(x, y)\,dy \quad\text{means}\quad \lim_{\delta y \to 0} \sum_{(AB)} g(x, y)\,\delta y,$$

in which the sign of δy is positive on a segment along which y is increasing and negative on a segment where y is decreasing. We can similarly consider paths and functions in three dimensions:

$$\int_{(AB)} f(x, y, z)\,dx, \qquad \int_{(AB)} g(x, y, z)\,dy, \qquad \int_{(AB)} h(x, y, z)\,dz,$$

and string these types together to obtain a general integral

$$\int_{(AB)} (f\,dx + g\,dy + h\,dz).$$

In the following definition, (AB) is a directed path in three dimensions with representative point P: (x, y, z), and f, g, h are any three functions, their values at P being denoted by f_P, g_P, h_P:

General line integrals
(a)
$$\int_{(AB)} (f\,dx + g\,dy + h\,dz) = \lim_{\delta x, \delta y, \delta z \to 0} \sum_{(AB)} (f_P\,\delta x + g_P\,\delta y + h_P\,\delta z)$$
(δx, δy, δz are positive (negative) where x, y, z are increasing (decreasing)).
(b)
$$\int_{(AB)} (f\,dx + g\,dy + h\,dz) = -\int_{(BA)} (f\,dx + g\,dy + h\,dz).$$
(c) For two dimensions, suppress the variable z.
(d) The above integrals generally depend on the path between A and B. (33.7)

To organize such an integral in order to take account of the signs of δx, δy, δz
is often difficult. For example, if the path (AB) consists of an ellipse inclined to the
three axes, each term must be broken into two sections, leading to six integrals
in all, to ensure constancy of sign of δx, δy, or δz along each. However, if a
parametric representation of the path is adopted, the correct interpretation is
obtained automatically.
Consider the integral with respect to x,

$$\int_{(AB)} f(x, y, z)\,dx,$$

where (AB) is parametrized as

$$x = x(t), \qquad y = y(t), \qquad z = z(t),$$

so that, as the parameter t increases or decreases from t_A at A to t_B at B, the path is traced exactly once in the right direction. Then, in the short interval from t to t + δt, the change δx in x is approximated by

$$\delta x \approx \frac{dx}{dt}\,\delta t,$$

and δx automatically has the right sign, determined by dx/dt and the sign of δt. Now put (dx/dt) δt into the defining sum in place of δx; correspondingly (dx/dt) dt will go into the original integral in place of dx. After doing the same thing with the y and z integrals, we have the following result.

Parametric evaluation of line integrals

(a) If x = x(t), y = y(t), z = z(t), and P : (x, y, z) covers (AB) exactly once in the correct direction as t increases or decreases from $t_A$ to $t_B$, then

$$\int_{(AB)} (f\,dx + g\,dy + h\,dz) = \int_{t_A}^{t_B} \left( f\frac{dx}{dt} + g\frac{dy}{dt} + h\frac{dz}{dt} \right) dt.$$

(b) For the two-dimensional case, omit terms in z. (33.8)

Example 33.2 Evaluate $I = \int_{(AB)} (x^2 + y)\,dy$, where (AB) is the path shown in Fig. 33.4.

[Fig. 33.4: the straight path (AB) from A : (1, −1) through O to B : (−1, 1).]

On (AB), y = −x, so we can use x = t, y = −t, with t running from t = 1 to t = −1. This covers (AB) once in the right direction (it is like using x as the parameter). Then

$$x^2 + y = t^2 - t \quad\text{and}\quad \frac{dy}{dt} = -1,$$

so

$$I = -\int_{1}^{-1} (t^2 - t)\,dt = -\left[\tfrac{1}{3}t^3 - \tfrac{1}{2}t^2\right]_{1}^{-1} = \tfrac{2}{3}.$$
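Because (33.8) reduces a line integral to an ordinary integral over t, any one-variable quadrature rule can be used to check such calculations numerically. The sketch below is one way to do this, assuming composite Simpson's rule and a hand-coded derivative r′(t); the function `line_integral` and its argument names are illustrative, not part of the text. For Example 33.2 the integrand in t is a quadratic, so Simpson's rule reproduces I = 2/3 up to rounding.

```python
def line_integral(components, r, r_prime, t_a, t_b, n=200):
    """Approximate the integral of (f dx + g dy [+ h dz]) along r(t), t from t_a to t_b.

    Implements (33.8): integrate f x'(t) + g y'(t) [+ h z'(t)] over t.
    components: tuple (f, g) or (f, g, h) of functions of the coordinates.
    r, r_prime: functions returning the point and its derivative at t.
    Composite Simpson's rule with n subintervals; t_b < t_a is allowed.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    step = (t_b - t_a) / n

    def integrand(t):
        point = r(t)
        return sum(F(*point) * d for F, d in zip(components, r_prime(t)))

    total = integrand(t_a) + integrand(t_b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(t_a + k * step)
    return total * step / 3


# Example 33.2: (x^2 + y) dy along x = t, y = -t, t running from 1 to -1.
I = line_integral(
    components=(lambda x, y: 0.0, lambda x, y: x**2 + y),
    r=lambda t: (t, -t),
    r_prime=lambda t: (1.0, -1.0),
    t_a=1.0,
    t_b=-1.0,
)
print(I)  # approximately 0.6667
```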
It is immaterial what parametrization is used, so long as it satisfies the conditions
in (33.8). The following example compares two parametrizations.
Example 33.3 Evaluate $I = \int_{(AB)} (x\,dy - y\,dx)$, where (AB) is the semicircle shown in Fig. 33.5, by means of two parametrizations:

[Fig. 33.5: the right-hand half of the unit circle, from A : (0, −1) through (1, 0) to B : (0, 1).]

(a) x = cos t, y = sin t for $-\tfrac{1}{2}\pi \le t \le \tfrac{1}{2}\pi$;

(b) $x = (1 - t^2)^{1/2}$, y = t, for $-1 \le t \le 1$.

(a) x = cos t, y = sin t; so

$$\frac{dx}{dt} = -\sin t, \quad \frac{dy}{dt} = \cos t.$$

Then

$$I = \int_{-\pi/2}^{\pi/2} [\cos t\cos t - \sin t(-\sin t)]\,dt = \int_{-\pi/2}^{\pi/2} (\cos^2 t + \sin^2 t)\,dt = \int_{-\pi/2}^{\pi/2} dt = \pi.$$

(b) $x = (1 - t^2)^{1/2}$, y = t; so

$$\frac{dx}{dt} = -t(1 - t^2)^{-1/2}, \quad \frac{dy}{dt} = 1.$$

Then

$$I = \int_{-1}^{1} \{(1 - t^2)^{1/2} - t[-t(1 - t^2)^{-1/2}]\}\,dt = \int_{-1}^{1} \frac{dt}{(1 - t^2)^{1/2}} = [\arcsin t]_{-1}^{1} = \pi.$$
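A numerical check of Example 33.3 illustrates that the choice of parametrization does not matter. The sketch below (plain Python, illustrative names) uses the composite midpoint rule, chosen here because it never evaluates the integrand at the end-points, where parametrization (b) gives (1 − t²)^(−1/2) → ∞. Both results approach π, although (b) converges much more slowly because of those integrable end-point singularities.

```python
import math


def midpoint(func, a, b, n):
    """Composite midpoint rule; it never evaluates func at a or b."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


# (a) x = cos t, y = sin t: the integrand x y'(t) - y x'(t) is identically 1.
def integrand_a(t):
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)
    return x * dy - y * dx


# (b) x = (1 - t^2)^(1/2), y = t: the integrand is (1 - t^2)^(-1/2),
# which is infinite at t = -1 and t = 1.
def integrand_b(t):
    x, y = math.sqrt(1.0 - t * t), t
    dx, dy = -t / math.sqrt(1.0 - t * t), 1.0
    return x * dy - y * dx


I_a = midpoint(integrand_a, -math.pi / 2, math.pi / 2, 1000)
I_b = midpoint(integrand_b, -1.0, 1.0, 200_000)
print(I_a, I_b, math.pi)
```

With 200 000 subintervals, I_b still differs from π in the third decimal place, while I_a agrees to rounding error; the end-point singularity, not the parametrization itself, controls the accuracy.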

Example 33.4 Evaluate $I = \int_{(AB)} (x\,dx + y\,dy + z\,dz)$, where (AB) is the path x = a cos t, y = a sin t, z = bt between t = 0 and 4π. ((AB) is a helix along the z axis.)

We have

$$\frac{dx}{dt} = -a\sin t, \quad \frac{dy}{dt} = a\cos t, \quad \frac{dz}{dt} = b,$$

so the integral becomes

$$I = \int_0^{4\pi} (-a^2\cos t\sin t + a^2\sin t\cos t + b^2 t)\,dt = b^2\int_0^{4\pi} t\,dt = 8b^2\pi^2.$$
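In Example 33.4 the integrand x dx + y dy + z dz is the differential of ½(x² + y² + z²), a perfect differential of the kind discussed in Section 33.4, so the result should equal ½(|r_B|² − |r_A|²) with A = (a, 0, 0) and B = (a, 0, 4πb). The sketch below compares a Simpson's-rule evaluation of the parametric integral with that end-point formula; the values a = 2, b = 0.5 are arbitrary choices.

```python
import math


def simpson(func, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def helix_integral(a, b):
    """Integral of (x dx + y dy + z dz) along x = a cos t, y = a sin t, z = b t, 0 <= t <= 4 pi."""
    def integrand(t):
        x, y, z = a * math.cos(t), a * math.sin(t), b * t
        dx, dy, dz = -a * math.sin(t), a * math.cos(t), b
        return x * dx + y * dy + z * dz
    return simpson(integrand, 0.0, 4 * math.pi)


def half_r_squared_change(a, b):
    """(1/2)(|r_B|^2 - |r_A|^2) for A = (a, 0, 0) and B = (a, 0, 4 pi b)."""
    r_a2 = a**2
    r_b2 = a**2 + (4 * math.pi * b) ** 2
    return 0.5 * (r_b2 - r_a2)


a, b = 2.0, 0.5
print(helix_integral(a, b), half_r_squared_change(a, b), 8 * b**2 * math.pi**2)
```

All three printed values agree, since ½(4πb)² = 8b²π²; the radius a drops out, as in the worked example.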
Self-test 33.2

Obtain $\int_{⁄} (y^2\,dx - x^2\,dy)$, where the path ⁄ is the parabolic arc $y^2 = x$ from A : (1, −1) to B : (4, 2).

33.3 Paths parallel to the axes


Sometimes it is necessary to evaluate line integrals along line segments which are
parallel to the axes. In such cases the easiest approach is a direct one, as shown in
the following example.

Example 33.5 (See Fig. 33.6.) Evaluate the line integral

$$\int_{(AB)} x\,dy$$

(a) over the path (AOB); (b) over the path (AQB).

[Fig. 33.6: O at the origin, A : (1, 0) on the x axis, B : (0, 1) on the y axis, and Q : (1, 1).]

In this method, we refer to the sums in the definition (33.7).

(a) On (AO), δy = 0 since y is constant; so $\int_{(AO)} x\,dy = 0$.

On (OB), x = 0; so $\int_{(OB)} x\,dy = 0$.

Therefore

$$\int_{(AOB)} x\,dy = \int_{(AO)} x\,dy + \int_{(OB)} x\,dy = 0.$$

(b) On (AQ), x = 1; so

$$\int_{(AQ)} x\,dy = \int_{(AQ)} 1\,dy = \int_0^1 dy = 1.$$

On (QB), y is constant; so δy = 0, and $\int_{(QB)} x\,dy = 0$. Therefore

$$\int_{(AQB)} x\,dy = \int_{(AQ)} x\,dy + \int_{(QB)} x\,dy = 1.$$
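Paths made of straight segments, like those in Example 33.5, can also be handled mechanically: parametrize each segment as start + t(end − start), 0 ≤ t ≤ 1, apply (33.8), and add the results. The sketch below assumes the coordinates A : (1, 0), O : (0, 0), B : (0, 1), Q : (1, 1) read from Fig. 33.6; the helper names are hypothetical.

```python
def segment_integral(f, g, start, end):
    """Integral of (f dx + g dy) along the straight segment from start to end.

    Parametrize as (x, y) = start + t (end - start), 0 <= t <= 1, so that
    dx = (x1 - x0) dt and dy = (y1 - y0) dt; Simpson's rule with 100
    subintervals is then applied to the resulting integral over t.
    """
    (x0, y0), (x1, y1) = start, end
    ddx, ddy = x1 - x0, y1 - y0

    def integrand(t):
        x, y = x0 + t * ddx, y0 + t * ddy
        return f(x, y) * ddx + g(x, y) * ddy

    n, h = 100, 1.0 / 100
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


def polyline_integral(f, g, vertices):
    """Sum the segment integrals along a path through the listed vertices in order."""
    return sum(segment_integral(f, g, p, q) for p, q in zip(vertices, vertices[1:]))


# Example 33.5: the integral of x dy, i.e. f = 0, g = x.
f = lambda x, y: 0.0
g = lambda x, y: x
A, O, B, Q = (1.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
print(polyline_integral(f, g, [A, O, B]))  # path (AOB)
print(polyline_integral(f, g, [A, Q, B]))  # path (AQB)
```

Segments parallel to the x axis contribute nothing automatically, because ddy = 0 there, which mirrors the δy = 0 argument in the text.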
Self-test 33.3

The points O : (0, 0), A : (2, 0), B : (2, 1) are vertices of a rectangle with sides parallel to the axes.

(a) Show that $\int_{Z} (y\,dx + x\,dy) = 2$, where Z denotes the path from O to B following the sides OA and AB of the rectangle.

(b) Show that $\int_{z} (y\,dx + x\,dy) = 2$, where z is the straight line joining O and B.

(c) Notice that y dx + x dy can be written as d(xy). How does this relate to the results (a) and (b)?

33.4 Path independence and perfect differentials


Although the value of a line integral taken between two given points usually depends on the path chosen, there are many cases of physical importance for which the value is independent of the path: all paths between A and B lead to the same value. The following two examples show that such cases exist by exhibiting integrands for which the integral is independent of the path.

Example 33.6 (In two dimensions). Show that

$$I = \int_{(AB)} (y\,dx + x\,dy)$$

is independent of the path chosen from A to B.

We can write the integrand in terms of a perfect differential (see Section 22.4):

$$y\,dx + x\,dy = d(xy).$$

Now express the integral in the form

$$I = \int_{(AB)} (y\,dx + x\,dy) = \int_{(AB)} d(xy). \qquad \text{(i)}$$

From (33.7), the meaning of the integral is the limit of a certain sum, which can be recast in the form

$$\lim_{\delta x, \delta y \to 0} \sum_{(AB)} (y\,\delta x + x\,\delta y) = \lim_{\delta(xy) \to 0} \sum_{(AB)} \delta(xy). \qquad \text{(ii)}$$

As we travel along (AB), the value of xy starts at $x_A y_A$, where the values are taken at A, then goes by steps δ(xy) until it attains the value $x_B y_B$. In other words,

$$\sum_{(AB)} \delta(xy) = x_B y_B - x_A y_A,$$

which is independent of what path connects A to B. We can summarize the operation by simply writing

$$I = \int_{(AB)} (y\,dx + x\,dy) = \int_{(AB)} d(xy) = x_B y_B - x_A y_A.$$

Example 33.7 Prove that

$$I = \int_{(AB)} [(y + z)\,dx + (z + x)\,dy + (x + y)\,dz]$$

is independent of the path from A to B.

We can use the idea of differentials in exactly the same way: it is suggested by the incremental approximation (31.1). If we put

$$f(x, y, z) = yz + zx + xy,$$

the incremental approximation gives

$$\delta f \approx (y + z)\,\delta x + (z + x)\,\delta y + (x + y)\,\delta z,$$

which parallels the corresponding statement involving differentials:

$$d(yz + zx + xy) = (y + z)\,dx + (z + x)\,dy + (x + y)\,dz.$$

In the same way as in Example 33.6, we have

$$I = \int_{(AB)} [(y + z)\,dx + (z + x)\,dy + (x + y)\,dz] = \int_{(AB)} d(yz + zx + xy) = (yz + zx + xy)_B - (yz + zx + xy)_A,$$

the suffixes A and B meaning the values of the brackets at the end-points A and B. This is independent of the path.
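The path independence in Example 33.7 can be checked numerically: integrate along two different paths with the same end-points and compare with $S_B - S_A$, where S = yz + zx + xy. In the sketch below the end-points A = (0, 0, 0), B = (1, 2, 3) and the curved path are arbitrary illustrative choices; for these points the exact value is $S_B - S_A$ = 6 + 3 + 2 = 11.

```python
import math


def simpson(func, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def integrand_along(r, r_prime):
    """Return t -> (y+z) x'(t) + (z+x) y'(t) + (x+y) z'(t) for the path r(t)."""
    def integrand(t):
        x, y, z = r(t)
        dx, dy, dz = r_prime(t)
        return (y + z) * dx + (z + x) * dy + (x + y) * dz
    return integrand


def S(x, y, z):
    return y * z + z * x + x * y


A, B = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)

# Path 1: the straight line from A to B, 0 <= t <= 1.
straight = integrand_along(lambda t: (t, 2 * t, 3 * t), lambda t: (1.0, 2.0, 3.0))

# Path 2: a curved path with the same end-points, 0 <= t <= 1 (chosen arbitrarily).
curved = integrand_along(
    lambda t: (t**2, 2 * math.sin(math.pi * t / 2), 3 * t**3),
    lambda t: (2 * t, math.pi * math.cos(math.pi * t / 2), 9 * t**2),
)

print(simpson(straight, 0.0, 1.0), simpson(curved, 0.0, 1.0), S(*B) - S(*A))
```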

In general, suppose that we can recognize the functions f, g, and h in the differential form

$$f(x, y, z)\,dx + g(x, y, z)\,dy + h(x, y, z)\,dz$$

as being expressible in terms of a single-valued function S(x, y, z) in the following way:

$$f = \frac{\partial S}{\partial x}, \quad g = \frac{\partial S}{\partial y}, \quad h = \frac{\partial S}{\partial z}.$$

Then we can write f dx + g dy + h dz as a perfect differential

$$f\,dx + g\,dy + h\,dz = \frac{\partial S}{\partial x}\,dx + \frac{\partial S}{\partial y}\,dy + \frac{\partial S}{\partial z}\,dz = dS.$$

This can be substituted into our integrals to yield

$$\int_{(AB)} (f\,dx + g\,dy + h\,dz) = \int_{(AB)} dS = S_B - S_A.$$

Provided that there is no ambiguity in the values to be assigned to $S_B$ and $S_A$ (this possibility is discussed in Section 33.10 below), this provides a way of evaluating the integral and possibly demonstrating path independence.

Evaluation of integrals over perfect differentials

If f dx + g dy + h dz = dS, and S is single-valued, then

$$\int_{(AB)} (f\,dx + g\,dy + h\,dz) = S_B - S_A. \qquad (33.9)$$
Self-test 33.4

Prove that if A and B are any two fixed points, then $\int_{⁄} [(x^2 - yz)\,dx + (y^2 - zx)\,dy + (z^2 - xy)\,dz]$ has the same value for every path ⁄ connecting them.

33.5 Closed paths

A closed path is one that returns to its starting point, so that B has the same coordinates as A, as in Figs 33.7a,b. We shall discuss only simple closed paths. These do not cross over themselves, as do the curves in Fig. 33.7b.

[Fig. 33.7: (a) A simple closed path. (b) Closed paths which are not simple.]

It is clear from the definition (33.7) that, when A and B are the same point and the path is closed, the position of this point on the curve does not affect the value of the integral. Consequently its coordinates are not usually stated; a closed path is indicated by a symbol such as C, and the integral is written

$$\int_C (f\,dx + g\,dy + h\,dz).$$

(The notation $\oint_C$ is also used for line integrals around closed curves.) In three dimensions, the direction along C is specified by extra information, such as by an arrow on a sketch of the curve. However, in two dimensions, a convention operates: if it is not otherwise indicated, the standard direction is anticlockwise.

Example 33.8 Evaluate $I = \int_C (x\,dy - y\,dx)$, where C is the ellipse $x^2/a^2 + y^2/b^2 = 1$, described in the standard direction.

The ellipse can be parametrized for the anticlockwise direction by

$$x = a\cos t, \quad y = b\sin t \quad (0 \le t \le 2\pi),$$

where we assume a and b to be positive. We had a choice for the range of t, because we can start at any point on the ellipse. For the choice 0 to 2π, the path starts and ends at (a, 0). Then

$$\frac{dx}{dt} = -a\sin t, \quad \frac{dy}{dt} = b\cos t;$$

so

$$I = \int_0^{2\pi} [a\cos t\,(b\cos t) - b\sin t\,(-a\sin t)]\,dt = ab\int_0^{2\pi} dt = 2\pi ab.$$
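Integrals around closed parametrized curves, such as the one in Example 33.8, have periodic integrands, and for a smooth periodic integrand the plain trapezoidal rule over one period is a very accurate choice. The sketch below (illustrative names, with a = 4 and b = 1.5 chosen arbitrarily) reproduces 2πab; here the integrand x dy/dt − y dx/dt equals ab for every t, so the check is exact apart from rounding.

```python
import math


def closed_curve_integral(integrand, n=64):
    """Trapezoidal rule over one period 0 <= t <= 2 pi.

    For a smooth periodic integrand the trapezoidal rule converges very fast,
    which suits parametrized closed paths; all n sample points get equal weight.
    """
    h = 2 * math.pi / n
    return h * sum(integrand(k * h) for k in range(n))


def ellipse_integrand(a, b):
    """x y'(t) - y x'(t) for x = a cos t, y = b sin t (anticlockwise)."""
    def integrand(t):
        x, y = a * math.cos(t), b * math.sin(t)
        dx, dy = -a * math.sin(t), b * math.cos(t)
        return x * dy - y * dx
    return integrand


a, b = 4.0, 1.5
print(closed_curve_integral(ellipse_integrand(a, b)), 2 * math.pi * a * b)
```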

The following result is sometimes useful:

A criterion for general path independence

If $\int_C (f\,dx + g\,dy + h\,dz) = 0$ for every closed curve C, then $\int_{(AB)} (f\,dx + g\,dy + h\,dz)$ is path independent for every A and B. (33.10)

[Fig. 33.8: (a) Two paths M and N between A and B. (b) Closed curve C using M and N reversed.]

To prove this, see Fig. 33.8a. Here A and B are any two points, and (AMB) and (ANB) are any two paths from A to B. If we reverse the direction of the path (ANB) we have Fig. 33.8b, which is a closed curve C. Suppose that we know the integral around every closed curve to be zero. Then, on C,

$$0 = \int_{(AMBNA)} (f\,dx + g\,dy + h\,dz) = \int_{(AMB)} (f\,dx + g\,dy + h\,dz) + \int_{(BNA)} (f\,dx + g\,dy + h\,dz)$$
$$= \int_{(AMB)} (f\,dx + g\,dy + h\,dz) - \int_{(ANB)} (f\,dx + g\,dy + h\,dz).$$

Therefore the integrals along (AMB) and (ANB) are equal.


Path independence for any two particular points is sufficient to ensure path independence between all pairs of points:

Path independence between given points

Let A and B be any two given points. Then, if I(AB) is path independent, I(PQ) is path independent for every pair of points P and Q. (33.11)

[Fig. 33.9: fixed points A and B joined by a path (AKB); another pair of points P and Q joined by a path (PRQ), with paths (AP) and (QB).]

The fixed points A and B are shown in Fig. 33.9, and (P, Q) is any other pair of points. (AKB), (AP), (QB), and (PRQ) are arbitrary paths joining the points specified by their brackets. Since I(AB) is independent of the path joining A and B,

$$I(APRQB) = I(AKB),$$

so

$$I(AP) + I(PRQ) + I(QB) = I(AKB),$$

or

$$I(PRQ) = I(AKB) - I(AP) - I(QB).$$

But the right-hand side does not depend on which path was chosen for (PQ). Therefore I(PQ) is independent of the path joining P and Q, which proves the result.

33.6 Green’s theorem


The following theorem connects a two-dimensional line integral around a closed
curve C with a certain double integral over the region A enclosed by C, as shown
in Fig. 33.10. The functions P(x, y) and Q(x, y) which occur are assumed to be
‘smooth’ in the region considered. ‘Smoothness’ has a technical meaning, that
all the first derivatives of P and Q are continuous, but it will be enough for us
to say that P and Q must have no undefined values, jumps or infinities on C and its
interior A taken together. The theorem is:

Green’s theorem in a plane

C is a simple closed path containing a region A; P(x, y) and Q(x, y) are smooth functions. Then

$$\int_C (P\,dx + Q\,dy) = \iint_A \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA,$$

where dA is the area element, and the line integral direction is anticlockwise. (33.12)

[Fig. 33.10: (a) The diagram for Green’s theorem: C encloses the region A, with upper arc AMB and lower arc ANB. (b) For the integration of ∂P/∂y, between y = g(x) and y = f(x) for a ≤ x ≤ b. (c) For the integration of ∂Q/∂x, between x = k(y) and x = h(y) for c ≤ y ≤ d.]

Although the result is true in general, we shall prove it only for a curve like that in Fig. 33.10a, for which lines parallel to the axes cut the curve in at most two points.

Green’s theorem involves the sum of two results. Firstly, consider

$$\iint_A \frac{\partial P}{\partial y}\,dA.$$

We shall integrate it by vertical strips as in Fig. 33.10b. Suppose that the top part AMB of C, between x = a and x = b, has the equation y = f(x), and the lower part ANB is y = g(x). Then

$$\iint_A \frac{\partial P}{\partial y}\,dA = \int_a^b \int_{g(x)}^{f(x)} \frac{\partial P}{\partial y}\,dy\,dx = \int_a^b [P(x, f(x)) - P(x, g(x))]\,dx$$
$$= \int_{(AMB)} P(x, y)\,dx - \int_{(ANB)} P(x, y)\,dx$$
$$= -\int_{(BMA)} P(x, y)\,dx - \int_{(ANB)} P(x, y)\,dx$$
$$= -\int_C P(x, y)\,dx. \qquad \text{(i)}$$

Similarly, but by using horizontal strips as in Fig. 33.10c, with the right-hand part of C given by x = h(y) and the left-hand part by x = k(y) for c ≤ y ≤ d,

$$\iint_A \frac{\partial Q}{\partial x}\,dA = \int_c^d \int_{k(y)}^{h(y)} \frac{\partial Q}{\partial x}\,dx\,dy = \int_C Q\,dy. \qquad \text{(ii)}$$

By subtracting (i) from (ii) we obtain the result required.
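Green's theorem can be confirmed numerically on a simple region. The sketch below takes the unit square 0 ≤ x, y ≤ 1 and the illustrative pair P = −x²y, Q = xy², for which ∂Q/∂x − ∂P/∂y = x² + y²; the anticlockwise boundary integral and the double integral should both equal 2/3. Simpson's rule is exact for the low-degree polynomial integrands that arise here, so any discrepancy is rounding error.

```python
def simpson(func, a, b, n=20):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


# Illustrative choice: P = -x^2 y, Q = x y^2, so dQ/dx - dP/dy = y^2 + x^2.
P = lambda x, y: -x * x * y
Q = lambda x, y: x * y * y


def segment(start, end):
    """Integral of (P dx + Q dy) along the straight segment from start to end."""
    (x0, y0), (x1, y1) = start, end
    ddx, ddy = x1 - x0, y1 - y0

    def integrand(t):
        x, y = x0 + t * ddx, y0 + t * ddy
        return P(x, y) * ddx + Q(x, y) * ddy

    return simpson(integrand, 0.0, 1.0)


# Boundary of the unit square, traversed anticlockwise.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
line_side = sum(segment(p, q) for p, q in zip(corners, corners[1:]))

# Double integral of (dQ/dx - dP/dy) = x^2 + y^2 over the square, by repeated Simpson.
area_side = simpson(lambda x: simpson(lambda y: x * x + y * y, 0.0, 1.0), 0.0, 1.0)

print(line_side, area_side)  # both approximately 0.6667
```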

Example 33.9 Show that if C is a simple closed curve, the geometrical area it encloses is equal to $\tfrac{1}{2}\int_C (x\,dy - y\,dx)$.

Put P = −y and Q = x in Green’s theorem (33.12):

$$\tfrac{1}{2}\int_C (-y\,dx + x\,dy) = \tfrac{1}{2}\iint_A \left( \frac{\partial}{\partial x}(x) - \frac{\partial}{\partial y}(-y) \right) dA = \tfrac{1}{2}\iint_A 2\,dA = \iint_A dA,$$

which is the geometrical area enclosed.
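For a polygon, the area formula of Example 33.9 can be evaluated exactly edge by edge: along the straight edge from (x₀, y₀) to (x₁, y₁) the integral of x dy − y dx equals x₀y₁ − x₁y₀, which gives the familiar "shoelace" formula. The sketch below is a minimal version; the test shapes (a right-angled triangle and a 1000-sided polygon inscribed in an ellipse) are arbitrary choices, not from the text.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon as (1/2) times the closed integral of (x dy - y dx).

    Along the straight edge from (x0, y0) to (x1, y1) the line integral of
    (x dy - y dx) equals x0*y1 - x1*y0 exactly, so summing over the edges
    gives the 'shoelace' formula. Vertices listed anticlockwise give a
    positive area; clockwise order gives its negative.
    """
    n = len(vertices)
    twice_area = 0.0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return 0.5 * twice_area


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]  # base 4, height 3, anticlockwise
print(polygon_area(triangle))  # 6.0

# A 1000-gon inscribed in the ellipse x = 5 cos t, y = sin t (area 5 pi).
ellipse = [(5 * math.cos(2 * math.pi * k / 1000), math.sin(2 * math.pi * k / 1000)) for k in range(1000)]
print(polygon_area(ellipse), 5 * math.pi)
```

The sign convention matches the anticlockwise convention of Section 33.5: reversing the vertex order reverses the direction of C and changes the sign of the result.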


Green’s theorem (33.12) enables us to produce a criterion by which path independence can be recognized:

A condition for path independence

If ∂P/∂y = ∂Q/∂x, then for any points A and B, $\int_{(AB)} (P\,dx + Q\,dy)$ is independent of the path (AB). (33.13)

To prove this, let C be any closed curve. Its interior is denoted by A. Then, by Green’s theorem (which requires P and Q to be smooth on C and throughout A),

$$\int_C (P\,dx + Q\,dy) = \iint_A \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = 0.$$

Therefore, by (33.10), we have path independence.
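The condition (33.13) is easy to test numerically at sample points by central differences. The sketch below does this; the helper name, step h and tolerance are illustrative assumptions. Agreement at a few points is evidence rather than proof, and Green's theorem also needs P and Q to be smooth throughout the region enclosed by C (Section 33.10 shows what can go wrong when they are not).

```python
def mixed_partials_match(P, Q, points, h=1e-5, tol=1e-6):
    """Check numerically whether dP/dy equals dQ/dx at each sample point.

    Central differences: dP/dy ~ (P(x, y+h) - P(x, y-h)) / 2h, and similarly
    for dQ/dx.
    """
    for x, y in points:
        dP_dy = (P(x, y + h) - P(x, y - h)) / (2 * h)
        dQ_dx = (Q(x + h, y) - Q(x - h, y)) / (2 * h)
        if abs(dP_dy - dQ_dx) > tol:
            return False
    return True


samples = [(0.3, -1.2), (2.0, 0.5), (-1.5, 1.7)]

# Example 33.6: P = y, Q = x, so that y dx + x dy = d(xy).
print(mixed_partials_match(lambda x, y: y, lambda x, y: x, samples))   # True
# P = y, Q = -x: dP/dy = 1 but dQ/dx = -1.
print(mixed_partials_match(lambda x, y: y, lambda x, y: -x, samples))  # False
```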

33.7 Line integrals and work


A particle follows a certain path (AB) in three-dimensional space under the
action of various forces and its own inertia. Consider one of the forces, F(x, y, z),
which might be the contribution of a force field such as gravity, or a point force
such as friction or the tension in a string. F is a vector, and it does not necessarily
point along the path of the particle.
The path can be parametrized by using t, the time, as the parameter, and the particle's position specified briefly by its position vector r(t):

$$\mathbf{r}(t) = (x(t), y(t), z(t)),$$

as in Fig. 33.11a. In a time interval δt, the particle moves from P to Q, and r(t) changes to r(t + δt), the change being denoted by δr. Figure 33.11a also shows the force F acting on the particle when it is at P. During the interval δt, the work δW done by F alone on the particle is approximated by

$$\delta W \approx (\text{component of } \mathbf{F} \text{ in direction } PQ) \times (\text{distance } PQ) = (|\mathbf{F}|\cos\theta)\,|\delta\mathbf{r}| = \mathbf{F}\cdot\delta\mathbf{r}.$$

[Fig. 33.11: (a) The path from A to B, with position vectors r and r + δr of P and Q, and the force F at P making an angle θ with δr. (b) The components δx, δy, δz of δr.]
The total work W(AB) done on the particle by F along the path (AB) is given by

$$W(AB) = \sum_{(AB)} \delta W \approx \sum_{(AB)} \mathbf{F}\cdot\delta\mathbf{r}.$$

When the step length goes to zero the sum can be written as an integral:

$$W(AB) = \int_{(AB)} \mathbf{F}\cdot d\mathbf{r}.$$

This integral has an ordinary meaning when it is written in (dx, dy, dz) form by splitting F and δr into their components (see Fig. 33.11b):

$$\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k} \quad\text{and}\quad \delta\mathbf{r} = \delta x\,\mathbf{i} + \delta y\,\mathbf{j} + \delta z\,\mathbf{k}.$$

Then

$$\mathbf{F}\cdot\delta\mathbf{r} = F_1\,\delta x + F_2\,\delta y + F_3\,\delta z,$$

and

$$W(AB) \approx \sum_{(AB)} (F_1\,\delta x + F_2\,\delta y + F_3\,\delta z).$$

Finally, taking the limit as δx, δy, δz approach zero, we obtain the exact result.

Work done by a force F along a path (AB)

$$W(AB) = \int_{(AB)} \mathbf{F}\cdot d\mathbf{r} = \int_{(AB)} (F_1\,dx + F_2\,dy + F_3\,dz).$$

(For two dimensions, suppress z.) (33.14)

Example 33.10 A field of force F is constant everywhere. Show that the work done by F alone on a particle which moves from a fixed point A to a fixed point B is independent of the path followed.

Put $\mathbf{F} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$, where a, b, c are constants. Then, by (33.14), W(AB) is given by

$$W(AB) = \int_{(AB)} (a\,dx + b\,dy + c\,dz) = \int_{(AB)} d(ax + by + cz) = (ax + by + cz)_B - (ax + by + cz)_A,$$

which is a quantity independent of the path followed.

Example 33.11 (a) r is a position vector in axes x, y, z, and r = |r|. Show that, in terms of differentials,

$$d(r^{-1}) = -(x/r^3)\,dx - (y/r^3)\,dy - (z/r^3)\,dz.$$

(b) The gravitational force F of the earth acting on a particle of mass m at a distance r from the earth’s centre O is equal to $-m\gamma\mathbf{r}/r^3$ (γ constant; F is directed towards O and has magnitude $m\gamma/r^2$). Show that the work done by F when the particle moves between any two points A and B is equal to $m\gamma(r_B^{-1} - r_A^{-1})$ (it is path independent).

(a) $r = (x^2 + y^2 + z^2)^{1/2}$, so by (31.1)

$$\delta(r^{-1}) \approx \frac{\partial}{\partial x}(r^{-1})\,\delta x + \frac{\partial}{\partial y}(r^{-1})\,\delta y + \frac{\partial}{\partial z}(r^{-1})\,\delta z$$
$$= -\frac{x}{(x^2 + y^2 + z^2)^{3/2}}\,\delta x - \frac{y}{(x^2 + y^2 + z^2)^{3/2}}\,\delta y - \frac{z}{(x^2 + y^2 + z^2)^{3/2}}\,\delta z$$
$$= -(x/r^3)\,\delta x - (y/r^3)\,\delta y - (z/r^3)\,\delta z.$$

The corresponding differential relation is

$$d(r^{-1}) = -(x/r^3)\,dx - (y/r^3)\,dy - (z/r^3)\,dz.$$

(b) Refer to Fig. 33.12 and (33.14). The components of F are $(F_1, F_2, F_3) = (-m\gamma x/r^3, -m\gamma y/r^3, -m\gamma z/r^3)$, obtained by splitting r into its components.

[Fig. 33.12: the earth, with points A and B at position vectors r_A and r_B from its centre, and the force F on the particle directed towards the centre.]

Therefore the work W(AB) done is

$$W(AB) = \int_{(AB)} (F_1\,dx + F_2\,dy + F_3\,dz) = m\gamma\int_{(AB)} \left( -\frac{x}{r^3}\,dx - \frac{y}{r^3}\,dy - \frac{z}{r^3}\,dz \right)$$
$$= m\gamma\int_{(AB)} d(r^{-1}) \quad \text{(from (a))}$$
$$= m\gamma(r_B^{-1} - r_A^{-1}).$$
Examples 33.10 and 33.11 illustrate cases where the work done by a force
between two fixed points is independent of the path between them, but this is
not a universal state of affairs: for example, it is not the case for the force field
F = (y, −x, 0).
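The contrast between path-independent and path-dependent work can be seen numerically with (33.14). The sketch below uses Simpson's rule; it takes mγ = 1 in the gravitational force of Example 33.11, and the end-points and paths are arbitrary illustrative choices. For gravity both paths give $1/r_B - 1/r_A$ = −0.5, whereas for F = (y, −x, 0) the upper and lower unit semicircles from (1, 0, 0) to (−1, 0, 0) give −π and +π respectively.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def work(F, r, r_prime, t_a, t_b):
    """W = integral of F . dr along r(t), t from t_a to t_b, as in (33.14)."""
    def integrand(t):
        Fx, Fy, Fz = F(*r(t))
        dx, dy, dz = r_prime(t)
        return Fx * dx + Fy * dy + Fz * dz
    return simpson(integrand, t_a, t_b)


def gravity(x, y, z):
    """F = -r / r^3, i.e. Example 33.11 with m * gamma = 1."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-x / r3, -y / r3, -z / r3)


def swirl(x, y, z):
    """F = (y, -x, 0)."""
    return (y, -x, 0.0)


# Gravity, from A = (1, 0, 0) to B = (0, 0, 2): a straight line, then a detour.
W_line = work(gravity, lambda t: (1 - t, 0.0, 2 * t), lambda t: (-1.0, 0.0, 2.0), 0.0, 1.0)
W_detour = work(
    gravity,
    lambda t: (1 - t, math.sin(math.pi * t), 2 * t),
    lambda t: (-1.0, math.pi * math.cos(math.pi * t), 2.0),
    0.0,
    1.0,
)
print(W_line, W_detour)  # both approximately -0.5

# F = (y, -x, 0), from (1, 0, 0) to (-1, 0, 0) over the upper and lower unit semicircles.
W_upper = work(swirl, lambda t: (math.cos(t), math.sin(t), 0.0),
               lambda t: (-math.sin(t), math.cos(t), 0.0), 0.0, math.pi)
W_lower = work(swirl, lambda t: (math.cos(t), -math.sin(t), 0.0),
               lambda t: (-math.sin(t), -math.cos(t), 0.0), 0.0, math.pi)
print(W_upper, W_lower)  # approximately -pi and +pi
```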

33.8 Conservative fields


We have spoken in a general way of a field of force and its action on a particle. By
a particle we mean an object small enough for its exact shape, physical constitu-
tion, state of rotation, and so on to be unimportant on the scale of the problem
being considered; it behaves in the way we imagine a point should behave.
However, the magnitude of the force exerted upon it by gravity, electrostatic
influence, etc., will still depend on the mass or charge assigned to the particle. We
need a way to specify the strength of the force field itself, a field intensity, which
is independent of what particle we put into it. When the field strength is specified,
we should be able to deduce its effect on any particle.
This is not always quite straightforward, because the introduction of a new particle into (say) an electrostatic field might change the distribution of charge that constitutes the source of the field, so that in effect we would be putting the particle into a modified situation. The case is similar with gravity: if an asteroid enters the moon’s gravitational field, the moon will respond by moving, and the field entered will change, if only by a little. For the purpose of defining field intensity, we imagine that somehow such an effect is prevented from taking place. Subject to this, we have the following definition.

Field intensity $\mathbf{f}_P$ at P

$\mathbf{f}_P$ is equal to the vector force that would act on a particle of unit mass (charge, etc.) at P if the sources are assumed to be unaffected by the particle. (33.15)

Therefore, if the gravitational field intensity is $\mathbf{G}_P$ at P, the force with which the field acts on a particle of mass m at P is $m\mathbf{G}_P$. One can alternatively imagine a particle of extremely small mass µ to be introduced as a test particle. Then $\mathbf{f}_P$ will be equal to $\mu^{-1}$ times the force exerted on such a particle.

Consider the action of a field of intensity f(x, y, z) on a unit particle which is travelling on a path (AB) (Fig. 33.13). We shall consider not the work done by f on the particle, but the work done against the field by the particle, which has the opposite sign. Denote this quantity generally by v. The work done against f in a step PQ is given by

$$\delta v \approx -\mathbf{f}\cdot\delta\mathbf{r}. \qquad (33.16)$$

The total work along the path is the limit of the sum of the δv, which can be expressed as a line integral as before:

$$v(AB) = -\int_{(AB)} \mathbf{f}\cdot d\mathbf{r} = -\int_{(AB)} (f_1\,dx + f_2\,dy + f_3\,dz),$$

where $\mathbf{f} = (f_1, f_2, f_3)$.


The important case is when v(AB) is independent of the path from A to B, in which case the field is said to be a conservative field. In practice our ‘field’ will not usually consist of the whole of space; some space will be occupied by impenetrable bodies, or we might be interested only in the region R inside a metal cage.

[Fig. 33.13: a unit particle moving along the path (AB); P and Q have position vectors r and r + δr, and δr is the step from P to Q.]

According to (33.11) it is only necessary to check path independence for any single pair of points (A, B) within this region. Therefore:
Conservative field in a region R

Let A and B be two given points in R. Then f(x, y, z) is conservative in R if $v(AB) = -\int_{(AB)} \mathbf{f}\cdot d\mathbf{r}$ is independent of the path in R from A to B. (Or equivalently if $\int_C \mathbf{f}\cdot d\mathbf{r}$ is zero for every closed path C in R: see (33.10).) (33.17)

The constant field of Example 33.10 and the gravitational field of Example 33.11 are conservative.

33.9 Potential for a conservative field

Suppose that f(x, y, z) is conservative in a region R, and that A is a fixed point in R. Since f is conservative, the integral

$$v(AP) = -\int_{(AP)} \mathbf{f}\cdot d\mathbf{r},$$

where P is another point in R, is independent of the path (AP), and so its value depends only on the location (x, y, z) of P. Therefore we shall write

$$v(AP) = V(x, y, z) \text{ or } V_P, \qquad (33.18)$$

in which we have suppressed the coordinates of A since they are constant.

[Fig. 33.14: the path (AP) from A to P, extended by a step δx parallel to the x axis from P to Q; f is the field at P.]

In Fig. 33.14, suppose that (AP) is a fixed reference path from A to P : (x, y, z). Let Q : (x + δx, y, z) be a point close to P, displaced from it a distance δx in the x direction only. Choose a path (AQ) consisting of two parts: the selected path (AP) and a straight line extension (PQ) from P to Q. The choice of these paths rather than any others does not affect the values of $V_P$ and $V_Q$ since the field is conservative. Then, from (33.18),

$$v(AQ) - v(AP) = V_Q - V_P = V(x + \delta x, y, z) - V(x, y, z).$$

But also, from (33.16), δv ≈ −f·δr with δy = δz = 0; so

$$v(AQ) - v(AP) \approx -f_1\,\delta x,$$

where $f_1$ is the x component of f. Equating the last two results and dividing by δx, we obtain

$$f_1 \approx -\frac{V(x + \delta x, y, z) - V(x, y, z)}{\delta x}.$$

When δx → 0, this becomes

$$f_1 = -\frac{\partial V}{\partial x},$$

and similarly

$$f_2 = -\frac{\partial V}{\partial y} \quad\text{and}\quad f_3 = -\frac{\partial V}{\partial z}.$$

Therefore

$$\mathbf{f} = f_1\mathbf{i} + f_2\mathbf{j} + f_3\mathbf{k} = -\left( \frac{\partial V}{\partial x}\mathbf{i} + \frac{\partial V}{\partial y}\mathbf{j} + \frac{\partial V}{\partial z}\mathbf{k} \right),$$

or

$$\mathbf{f} = -\operatorname{grad} V. \qquad (33.19)$$

We call V a potential function for the field f, or simply a potential. The single
scalar function V(x, y, z) contains all the information necessary to define the
three scalar components of f: f1(x, y, z), f2(x, y, z), f3(x, y, z). The point A is
commonly taken to be at infinity: you might recognize the idea of ‘the work
required to bring a particle in from infinity’ in mechanics. However, if we choose
a different reference point A, it only changes V by an additive constant, and does
not, therefore, affect the truth of (33.19); we get the same f whatever location A
has. We sum up this result as follows.

Potential V of a conservative field f

If f(x, y, z) is conservative in a region R, then

$$\mathbf{f} = -\operatorname{grad} V$$

in R, where V is a scalar potential function for f. Also V is defined in the region R by

$$V_P = -\int_{(AP)} \mathbf{f}\cdot d\mathbf{r},$$

where A is a fixed point. (33.20)

As an example of a potential, the gravitational field from a particle of mass M, namely $\mathbf{f} = -M\gamma\mathbf{r}/r^3$, has the potential $V = -M\gamma/r$ (taking the reference point A at infinity). This can be checked from the working of Example 33.11. The potential function V is equal to the work done against the field in moving a unit particle from a fixed point A to the current point P in cases when the field is conservative. Therefore the potential energy at P of a particle of mass m, relative to the reference point A, is equal to $mV_P$. Alternatively, V can be regarded as energy stored by the gravitational field, like energy stored in a spring.
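The relation f = −grad V for the gravitational potential V = −Mγ/r can be spot-checked by approximating the gradient with central differences. The sketch below takes Mγ = 1 and an arbitrary sample point; both are illustrative assumptions, as are the helper names.

```python
import math

M_GAMMA = 1.0  # illustrative value of the product M * gamma


def V(x, y, z):
    """Gravitational potential -M gamma / r, reference point at infinity."""
    return -M_GAMMA / math.sqrt(x * x + y * y + z * z)


def field(x, y, z):
    """f = -M gamma r / r^3."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (-M_GAMMA * x / r3, -M_GAMMA * y / r3, -M_GAMMA * z / r3)


def minus_grad(func, x, y, z, h=1e-5):
    """-grad func at (x, y, z) by central differences."""
    return (
        -(func(x + h, y, z) - func(x - h, y, z)) / (2 * h),
        -(func(x, y + h, z) - func(x, y - h, z)) / (2 * h),
        -(func(x, y, z + h) - func(x, y, z - h)) / (2 * h),
    )


point = (0.6, -1.1, 0.8)
print(minus_grad(V, *point))
print(field(*point))
```

The two printed triples agree to many decimal places, consistent with (33.19).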
33.10 Single-valuedness of potentials

There is a connection between the question of single-valuedness in a perfect differential and the conservative property of a force field. There exist fields f(x, y, z) which have a potential, but are not conservative because they do not satisfy the condition (33.17), that $\int_{(AB)} \mathbf{f}\cdot d\mathbf{r}$ should be independent of path.

Potential field

If there is a scalar function V such that

$$\mathbf{f} = -\operatorname{grad} V,$$

then f(x, y, z) is called a potential field. (33.21)

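We can test whether such a field is conservative or not:

Condition for a potential field to be conservative

If f = −grad V, and V is single valued, then f is conservative. In this case, the work v(AB) in moving a unit particle from A to B against the field is equal to $V_B - V_A$. (33.22)

This is proved as follows:

$$v(AB) = -\int_{(AB)} \mathbf{f}\cdot d\mathbf{r} = \int_{(AB)} (\operatorname{grad} V)\cdot d\mathbf{r}$$
$$= \int_{(AB)} \left( \mathbf{i}\frac{\partial V}{\partial x} + \mathbf{j}\frac{\partial V}{\partial y} + \mathbf{k}\frac{\partial V}{\partial z} \right)\cdot(\mathbf{i}\,dx + \mathbf{j}\,dy + \mathbf{k}\,dz)$$
$$= \int_{(AB)} \left( \frac{\partial V}{\partial x}\,dx + \frac{\partial V}{\partial y}\,dy + \frac{\partial V}{\partial z}\,dz \right) = \int_{(AB)} dV \quad \text{(see (31.1))}$$
$$= V_B - V_A.$$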
Provided that $\int_{(AB)} dV$ is independent of the path from A to B, the value to be assigned to $V_B - V_A$ is unambiguous and we say that V is single valued. However, the values of V may depend not only on the position, but also on the way in which the position was reached (analogously to the time spent reaching a point on the other side of a road being dependent on whether you cross directly or via the underpass). For example, in the plane, let

$$V = \theta,$$

where θ is the polar angle traversed in reaching the current position, measured continuously from a given starting point. What do we mean by

$$\int_{(AB)} d\theta\,?$$

[Fig. 33.15: two paths from A, on the positive x axis, to B, on the positive y axis: (ACB) goes more or less directly, and (ADB) first circles the origin O.]

Figure 33.15 shows two paths from A to B: (ACB) goes from A to B more or less directly, and (ADB) circles the origin completely first. The definition of the integral is that

$$\int_{(AB)} d\theta = \lim_{\delta\theta \to 0} \sum_{(AB)} \delta\theta,$$

where the summation is carried out by taking small steps along the path. On (ACB), θ passes smoothly from θ = 0 to $\theta = \tfrac{1}{2}\pi$, so

$$\int_{(ACB)} dV = \int_{(ACB)} d\theta = \theta_B - \theta_A = \tfrac{1}{2}\pi.$$

But on (ADB), θ starts at $\theta_A = 0$ and increases smoothly through the values $\tfrac{1}{2}\pi$, π, $\tfrac{3}{2}\pi$, 2π, to $\theta_B = \tfrac{5}{2}\pi$. Therefore

$$\int_{(ADB)} dV = \int_{(ADB)} d\theta = \theta_B - \theta_A = \tfrac{5}{2}\pi.$$

Therefore V in this case is path dependent. If the potential of a force field is given by

$$V = \theta,$$

where θ is the traversed polar angle, then strictly the field is not conservative: various paths from A to B involve different amounts of work by a unit particle moving in the field.
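The continuously measured polar angle θ used above can be tracked numerically along a finely sampled path by adding up the small angle changes between successive position vectors. The sketch below does this with `math.atan2`, assuming A = (1, 0) and B = (0, 1) on the unit circle; the two circular arcs stand in for (ACB) and (ADB) and, like the helper names, are illustrative choices.

```python
import math


def accumulated_angle(points):
    """Total change in the polar angle theta along a path given by sample points.

    Each step's change is atan2 of the cross and dot products of consecutive
    position vectors, which lies in (-pi, pi]; for finely sampled paths that
    avoid the origin this tracks theta continuously.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return total


def arc(start_angle, end_angle, radius=1.0, n=1000):
    """Sample points on a circular arc, the angle changing monotonically."""
    points = []
    for k in range(n + 1):
        angle = start_angle + (end_angle - start_angle) * k / n
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


direct = arc(0.0, math.pi / 2)    # like (ACB)
around = arc(0.0, 2.5 * math.pi)  # like (ADB): a full turn first
print(accumulated_angle(direct) / math.pi)  # approximately 0.5
print(accumulated_angle(around) / math.pi)  # approximately 2.5
```

Both paths end at the same point B, yet the accumulated angles differ by 2π, which is exactly the ambiguity in V = θ.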

Example 33.12 Show that the two-dimensional field

$$\mathbf{f}(x, y) = \frac{-y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}$$

is not a conservative field.

Apart from physical constants this represents the circumferential magnetic field around a straight wire carrying a current, or the velocity field of a vortex. Put $\mathbf{r} = x\mathbf{i} + y\mathbf{j}$; then

$$\mathbf{f}\cdot\mathbf{r} = x\,\frac{-y}{x^2 + y^2} + y\,\frac{x}{x^2 + y^2} = 0.$$

[Fig. 33.16: the field f around the origin O, everywhere perpendicular to the radius vector.]

Therefore the field is perpendicular to the radius vector at every point, as in Fig. 33.16. It is easy to confirm that

$$\mathbf{f} = \operatorname{grad}\theta = -\operatorname{grad} V, \quad\text{with } V = -\theta,$$

where θ is the (path-dependent) continuous polar angle, so that tan θ = y/x. Thus, apart from its sign, V is the function θ of the case we have just discussed. (We cannot write θ = arctan(y/x), because this function is discontinuous across the y axis: it would have an infinite gradient there.) The figure makes it obvious that the field is not conservative: more work is done if you take a unit magnetic pole against the field 50 times around the origin in order to travel between two points than if you go directly.

The field in Example 33.12 is not conservative, but whole classes of paths are equivalent. Suppose that, as in Fig. 33.17, we have two paths, (AMB) and (ANB), which can be steadily deformed into each other (as if A and B were connected by a piece of elastic) without passing over the origin. Then these two paths are equivalent. In this case, θ starts at $\theta_A = 0$; although the value of θ wanders about on (ANB), increasing and decreasing, it still ends at the value $\theta_B = \tfrac{1}{2}\pi$, as on the path (AMB).

[Fig. 33.17: paths (AMB), (ANB) and (ALB) from A, on the positive x axis, to B, on the positive y axis; (ALB) passes round the other side of the origin O.]

However, (AMB) cannot be deformed into the third path (ALB) without passing over the origin; by following it around, it can be seen that $\theta_B = -\tfrac{3}{2}\pi$ for this path.

[Fig. 33.18: a region R containing A and B which neither contains nor surrounds the origin O.]

Suppose that we confine consideration to a ‘patch’, or region R as in Fig. 33.18, which neither contains nor surrounds the origin O. Then, within this region, the field behaves as if it were conservative, because any path from A to B inside the region can be deformed into any other without crossing the origin. We could not tell, from experiments confined to R, that the field is not conservative over the whole plane.
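The contrast between the whole plane and a patch such as R can be checked numerically by integrating f·dr for the field of Example 33.12 around circles. The sketch below uses the trapezoidal rule in t; the circles chosen (the unit circle about O, and the unit circle about (3, 0), which lies in a region excluding O) and the helper names are illustrative assumptions. The first closed integral is 2π, the second is 0.

```python
import math


def vortex(x, y):
    """f = (-y, x) / (x^2 + y^2), the field of Example 33.12."""
    d = x * x + y * y
    return (-y / d, x / d)


def circulation(cx, cy, radius, n=256):
    """Closed integral of f . dr around the circle centred at (cx, cy), anticlockwise.

    Uses the trapezoidal rule in t for x = cx + radius cos t, y = cy + radius sin t.
    """
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        fx, fy = vortex(x, y)
        total += fx * (-radius * math.sin(t)) + fy * (radius * math.cos(t))
    return total * h


print(circulation(0.0, 0.0, 1.0) / math.pi)  # approximately 2: the circle surrounds O
print(circulation(3.0, 0.0, 1.0))            # approximately 0: the circle does not surround O
```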

Problems

33.1 (Section 33.1). Evaluate the following line integrals where (AOB) is shown in Fig. 33.19a.
(a) $\int_{(AOB)} x\,dx$; (b) $\int_{(AOB)} y\,dx$; (c) $\int_{(AOB)} x^2\,dx$.

[Fig. 33.19: (a) the path (AOB) from A : (1, −1) through O to B : (1, 1); (b) the parabolic path (AOB) on y² = x between the same points.]

33.2 Evaluate the following integrals; P represents the parabolic path (AOB) on $y^2 = x$, shown in Fig. 33.19b.
(a) $\int_P x\,dx$; (b) $\int_P y\,dx$; (c) $\int_P x^2\,dx$; (d) $\int_P (x + y)\,dy$; (e) $\int_P xy^2\,dy$; (f) $\int_P (x\,dx + y\,dy)$; (g) $\int_P (\tfrac{1}{2}\,dx - y\,dy)$; (h) $\int_P (y\,dx - x\,dy)$.

33.3 (Section 33.2). Evaluate the following line integrals over the various paths P, which are specified parametrically.

(a) $\int_P xy^2\,dx$; P is $x = t^2$, y = t; $0 \le t \le 1$.
(b) $\int_P (x\,dy - y\,dx)$; P is x = cos t, y = sin t; $0 \le t \le \pi$.
(c) $\int_P (z\,dx - x\,dy + y\,dz)$; P is x = t + 1, y = t, z = 2t; $0 \le t \le 1$.
(d) $\int_P (x^2\,dx + y^2\,dy + z^2\,dz)$; P is x = cos t, y = sin t, z = t; $0 \le t \le 2\pi$.
(e) Compare (c) when P joins the same two points, (1, 0, 0) to (2, 1, 2), but $x = t^2 + 1$, $y = 2t - t^2$, $z = 2t^2$; $0 \le t \le 1$.

33.4 (Section 33.2). The line integral $\int_{(AB)} f(x, y)\,dy$, where the path (AB) is described by the curve y = k(x), can be written formally as

$$\int_{(AB)} f(x, k(x))\,\frac{dk}{dx}\,dx.$$

Apply this formula to $\int_{(AB)} (x + y)\,dy$, taken over the parabolic path in Fig. 33.19b. Express it as the sum of two ordinary integrals over x. (This is like using x as the parameter in Section 33.1.)

33.5 (Section 33.3). The references are to Fig. 33.20.

[Fig. 33.20: points O, A, B, C in the (x, y) plane, with axis marks at x = 1 and y = 2.]

(a) $\int_{(ABC)} dx$; (b) $\int_{(AOC)} dy$; (c) $\int_{(ABC)} (x\,dy - y\,dx)$; (d) $\int_{(AOC)} (x\,dy - y\,dx)$; (e) $\int_{(ABC)} y\,dy$; (f) $\int_{(AOC)} y\,dy$; (g) $\int_{(ABC)} (y\,dx + x\,dy)$; (h) $\int_{(AOC)} (y\,dx + x\,dy)$.

33.6 (Section 33.4). The integrands given are perfect differentials; P represents any path having the right direction which joins the two given points. Evaluate
(a) $\int_P (x\,dx + y\,dy + z\,dz)$; P is (−1, 1, −1) to (1, −1, 1).
(b) $\int_P (yz\,dx + zx\,dy + xy\,dz)$; P is (0, 0, 0) to (1, 1, 1).
(c) $\int_P e^{x^2 + y^2 + z^2}(x\,dx + y\,dy + z\,dz)$; P is (0, 0, 0) to (1, 1, 1).
(d) $\int_P [(y + z)\,dx + (z + x)\,dy + (x + y)\,dz]$; P is (1, 1, 1) to (0, 1, 0).
(e) $\int_P \cos(xy + yz + zx)\,[(y + z)\,dx + (z + x)\,dy + (x + y)\,dz]$; P is (1, 0, π) to (0, π, 1).
(f) $\int_P (xy^2\,dx + x^2y\,dy)$; P is (1, 1) to (2, 2).

33.7 (Section 33.5). Evaluate the following two-dimensional line integrals over the closed paths C given, the direction being anticlockwise.
(a) $\int_C (x\,dy - y\,dx)$; C is the circle $x^2 + y^2 = 4$.
(b) $\int_C \left( \frac{x}{y}\,dx + \frac{y}{x}\,dy \right)$; C is the ellipse $\tfrac{1}{4}x^2 + \tfrac{1}{9}y^2 = 1$; use the parametrization x = 2 cos θ, y = 3 sin θ.

33.8 Evaluate the following (all the paths C are closed).
(a) $\int_C (y\,dx + z\,dy + x\,dz)$; C is x = sin t, y = cos t, z = sin t; $0 \le t \le 2\pi$.
(b) $\int_{(ABC)} (y\,dx + z\,dy + x\,dz)$; (ABC) is the triangle A : (1, 0, 0), B : (0, 1, 0), C : (0, 0, 1).
(c) $\int_C (yz\,dx + zx\,dy + xy\,dz)$; C is any closed path.

33.9 Show that $\int_{(AB)} (yx^2\,dx + \tfrac{1}{3}x^3\,dy)$ is path independent between any two points A and B. Use this fact to evaluate the integral along the spiral path given in polar coordinates (r, θ) by $r = e^{\theta/2\pi}$ for $0 \le \theta \le \pi$.
33.10 Show that if $\int_{(AB)} (f\,dx + g\,dy)$ is independent of the path (AB) for every two points A and B, then the integral around every closed path is zero. (Hint: A and B may coincide.)

33.11 Show that if the variables are changed in a perfect differential form, it remains a perfect differential. Illustrate this by transforming the identity y dx + x dy = d(xy) into polar coordinates.

33.12 (Green’s theorem, Section 33.6). Confirm the truth of Green’s theorem (33.12) for some very simple cases for which you know you can work out both the line integral and the double integral involved.

33.13 (Green’s theorem, Section 33.6). Check the correctness of the area formula, Example 33.9, by evaluating the line integral $\tfrac{1}{2}\int_C (x\,dy - y\,dx)$ taken around the following closed paths.
(a) The circle $x^2 + y^2 = 4$.
(b) The ellipse $\tfrac{1}{4}x^2 + \tfrac{1}{9}y^2 = 1$.
(c) The triangle with vertices (−1, 0), (2, 0), (0, 4).

33.14 Find the area of the star-shaped region bounded by the curve $x^{2/3} + y^{2/3} = 1$, by parametrizing its equation as in Example 33.9.

33.15 The gravitational force F arising from a particle of mass M at the origin upon a particle of mass m at a point with position vector r is given by $\mathbf{F} = -\gamma Mm\mathbf{r}/r^3$. Find the work done by F on a particle which travels in from infinity to r.

33.16 Use Green’s theorem with (33.10) to decide whether the following represent conservative fields (in two dimensions) or not in the stated regions.
(a) $(x^2 - y^2, 2xy)$; all x, y.
(b) $(\tfrac{1}{2}\ln(x^2 + y^2), \arctan(y/x))$; x > 0.

33.17 A force field has field intensity $\mathbf{f}(x, y, z) = y\mathbf{i} + \mathbf{j} + x\mathbf{k}$. Is f conservative? Find the work done against the field by a unit particle moving in a straight line from (0, 0, 0) to (1, 1, 1).

33.18 A force f is given by $\mathbf{f}(x, y, z) = yz\,\mathbf{i} + xz\,\mathbf{j} + xy\,\mathbf{k}$. Show that it is conservative. Find the work done against f along the path x = cos t, y = sin t, z = sin t cos t; $-\tfrac{1}{2}\pi \le t \le \tfrac{1}{2}\pi$. Are you doing this the easiest way?

33.19 Prove that a force field f having the form $\mathbf{f} = r^\alpha\hat{\mathbf{r}}$, where α is any constant, r is distance from the origin, $\hat{\mathbf{r}}$ is the unit position vector, and $\hat{\mathbf{r}} = \mathbf{r}/r$, is a conservative field. (Hint: start by putting $r = (x^2 + y^2 + z^2)^{1/2}$, and guess something that f might be the gradient of. If you cannot guess, then use the fact that $\operatorname{grad} F(r) = \hat{\mathbf{r}}\,(\partial F/\partial r)$.)

33.20 Generalize Problem 33.19 to a field $\mathbf{f} = \hat{\mathbf{r}} f(r)$. What is the potential of such a field?

33.21 Confirm that Green’s theorem still holds for the boundary C of the annular region A between the circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 4$ for the line integral

$$\int_C [(2x - y^3)\,dx - xy\,dy].$$

What are the directions on C?

33.22 Show that $\int_C (5x^4y\,dx + x^5\,dy) = 0$ holds for any closed curve C for which Green’s theorem is true.

33.23 Sketch the curve given parametrically by $x = \cos t - \tfrac{1}{2}\sin 2t$, y = sin t; $0 \le t \le 2\pi$. Using Green’s theorem, find the area enclosed by the curve.
34 Vector fields: divergence and curl

CONTENTS

34.1 Vector fields and field lines
34.2 Divergence of a vector field
34.3 Surface and volume integrals
34.4 The divergence theorem; flux of a vector field
34.5 Curl of a vector field
34.6 Cylindrical polar coordinates
34.7 General curvilinear coordinates
34.8 Stokes’s theorem
Problems

Vector fields in two dimensions have already been encountered in Section 29.6.
A vector field in three dimensions extends this concept to a vector with three components which are functions of position in space. In terms of cartesian components a vector field F(x, y, z) will have the form
$$\mathbf F(x, y, z) = F_1(x, y, z)\,\hat{\mathbf i} + F_2(x, y, z)\,\hat{\mathbf j} + F_3(x, y, z)\,\hat{\mathbf k}.$$
Vector fields abound in physical and engineering applications. Fluid velocity,
gravitational forces, magnetic and electric fields are examples of vector fields. In
time-varying applications the vector field and its components will also depend on
a fourth variable, namely time, but here we shall concentrate only on the space
variables.

34.1 Vector fields and field lines


At each point where the vector field is defined we can draw a vector. Figure 34.1
shows a region with a sample of local vectors drawn. Generally their magnitudes
and directions will vary from point to point.
Assuming that the components of the vector field are smooth functions, we can associate with the vector field a family of field lines, or integral curves, which are such that the vector field at any point is tangential to the field line through that point (Fig. 34.2). (The streamlines in Fig. 29.8 are field lines for a two-dimensional velocity field.)
[Fig. 34.1 Vector field. Fig. 34.2 Field lines.]

Suppose that a particular field line is given by the position vector r = r(t), where t is any suitable parameter. Then its tangent is in the direction of dr/dt (see eqn (9.18)), which must be the same as the direction of F:
$$\frac{d\mathbf r}{dt} = \mu(t)\,\mathbf F(x, y, z),$$
where µ(t) is some scalar function of the parameter t. Hence in component form
$$\frac{dx}{dt} = \mu(t)F_1(x, y, z), \qquad \frac{dy}{dt} = \mu(t)F_2(x, y, z), \qquad \frac{dz}{dt} = \mu(t)F_3(x, y, z).$$
Elimination of the unknown µ(t) leads to:

Equations for field lines
$$\frac{dx}{F_1(x, y, z)} = \frac{dy}{F_2(x, y, z)} = \frac{dz}{F_3(x, y, z)}, \tag{34.1}$$

which are two simultaneous differential equations for x, y, and z in differential form (see Section 22.4). The solution is not always easy, but here is an example of a vector field whose field lines can be found.

Example 34.1 Find the field lines of the vector field
$$\mathbf F = xy^2z\,\hat{\mathbf i} + xz\,\hat{\mathbf j} + x\,\hat{\mathbf k}.$$
Equation (34.1) becomes
$$\frac{dx}{xy^2z} = \frac{dy}{xz} = \frac{dz}{x},$$
which is equivalent to the two differential equations
$$\frac{dx}{dy} = y^2, \qquad \frac{dy}{dz} = z,$$
which are both separable differential equations. Hence
$$\int dx = \int y^2\,dy, \quad\text{or}\quad x = \tfrac13 y^3 + C_1, \tag{34.2}$$
for any value of z, and
$$\int dy = \int z\,dz, \quad\text{or}\quad y = \tfrac12 z^2 + C_2, \tag{34.3}$$
for any value of x. Equations (34.2) and (34.3) are two families of surfaces (both are cylindrical) and their curves of intersection are the field lines of F.
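Example 34.1 can be checked numerically. The minimal Python sketch below (an illustration only, not part of the original method) takes µ(t) = 1, so that dr/dt = F, and follows one field line with a hand-written classical fourth-order Runge-Kutta step. The starting point (1, 0.5, 0.5), the step size and the number of steps are arbitrary assumptions. Along the computed line the quantities x − ⅓y³ and y − ½z², which are the constants C₁ and C₂ of (34.2) and (34.3), should stay fixed up to the integration error.

```python
def F(p):
    """Vector field of Example 34.1: F = x y^2 z i + x z j + x k."""
    x, y, z = p
    return (x * y * y * z, x * z, x)


def rk4_step(f, p, h):
    """One classical fourth-order Runge-Kutta step for dp/dt = f(p)."""
    def shifted(base, slope, s):
        return tuple(b + s * k for b, k in zip(base, slope))

    k1 = f(p)
    k2 = f(shifted(p, k1, h / 2))
    k3 = f(shifted(p, k2, h / 2))
    k4 = f(shifted(p, k3, h))
    return tuple(pi + h / 6 * (a + 2 * b + 2 * c + d)
                 for pi, a, b, c, d in zip(p, k1, k2, k3, k4))


def invariants(p):
    """C1 = x - y^3/3 from (34.2) and C2 = y - z^2/2 from (34.3)."""
    x, y, z = p
    return (x - y ** 3 / 3, y - z ** 2 / 2)


def trace_field_line(p0, h=1e-3, steps=500):
    """Follow the field line through p0, taking mu(t) = 1 so that dr/dt = F."""
    path = [p0]
    for _ in range(steps):
        path.append(rk4_step(F, path[-1], h))
    return path


path = trace_field_line((1.0, 0.5, 0.5))
c_start = invariants(path[0])
c_end = invariants(path[-1])
print("start:", path[0], "C1, C2 =", c_start)
print("end:  ", path[-1], "C1, C2 =", c_end)
```

A different starting point gives a different pair (C₁, C₂), that is, a different field line.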

Self-test 34.1
Find the field lines of the vector field given by F = (y, −x, x). What kind of
curves are the field lines?

34.2 Divergence of a vector field


The divergence of a vector field F, denoted by div F, is a scalar field defined by:

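Divergence of a vector field
$$\operatorname{div}\mathbf F = \nabla\cdot\mathbf F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}. \tag{34.4}$$

The notation ∇·F again emphasizes the del operator (Section 29.6)
$$\nabla = \hat{\mathbf i}\frac{\partial}{\partial x} + \hat{\mathbf j}\frac{\partial}{\partial y} + \hat{\mathbf k}\frac{\partial}{\partial z}.$$
Here ∇·F is the ‘scalar product’ of the operator and the vector field. It can be proved that div F is invariant, that is, its value does not change, under translation or rotation of the axes.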

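Example 34.2 Find the divergence of
$$\mathbf F = \sin(xy)\,\hat{\mathbf i} + y\cos z\,\hat{\mathbf j} + xz\cos z\,\hat{\mathbf k}.$$
From the definition above
$$\begin{aligned}
\operatorname{div}\mathbf F &= \frac{\partial}{\partial x}(\sin(xy)) + \frac{\partial}{\partial y}(y\cos z) + \frac{\partial}{\partial z}(xz\cos z)\\
&= y\cos(xy) + (1 + x)\cos z - xz\sin z.
\end{aligned}$$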
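A quick way to catch algebra slips in a divergence calculation is to compare the result with central-difference approximations to the three partial derivatives in (34.4). The sketch below does this for Example 34.2. The step h = 10⁻⁵ and the sample points are assumptions chosen only for illustration.

```python
import math


def F(x, y, z):
    """Vector field of Example 34.2: F = sin(xy) i + y cos z j + xz cos z k."""
    return (math.sin(x * y), y * math.cos(z), x * z * math.cos(z))


def divergence(field, x, y, z, h=1e-5):
    """Approximate dF1/dx + dF2/dy + dF3/dz, as in (34.4), by central differences."""
    d1 = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2 * h)
    d2 = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2 * h)
    d3 = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2 * h)
    return d1 + d2 + d3


def div_exact(x, y, z):
    """The divergence found in Example 34.2."""
    return y * math.cos(x * y) + (1 + x) * math.cos(z) - x * z * math.sin(z)


for point in [(0.3, -1.2, 0.7), (1.5, 0.4, -2.0)]:
    print(point, divergence(F, *point), div_exact(*point))
```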
Self-test 34.2
Let $\mathbf r = x\,\hat{\mathbf i} + y\,\hat{\mathbf j} + z\,\hat{\mathbf k}$ and $r = |\mathbf r|$. Show that $\operatorname{div}(\mathbf r/r^3) = 0$.


34.3 Surface and volume integrals
Let S be a surface (Fig. 34.3), and let δS be an element of area on S. Suppose that f(x, y, z) is a given function. The surface integral of f(x, y, z) over the surface S, written as
$$\iint_S f(x, y, z)\,dS,$$
is the limit
$$\lim_{\delta S\to 0}\sum_S f(x, y, z)\,\delta S.$$
This superficially resembles, but does not in fact represent, a double integral of the type (32.2) considered earlier. Select x and y as basic variables (we could take y, z or z, x instead). Then, although the surface S is defined as a function of x and y by writing z in terms of x and y, the elements δS are inclined to the x,y plane, so they do not have area δx δy. We allow for this and convert the expression into an ordinary double integral as follows.
Let the projection of S on to the (x, y) plane be R, and let δA be the projection of the element δS. (We assume that any line parallel to the z axis cuts S in at most one point.) The element δA could be the rectangular element having area δx δy, in which case, for small δx and δy, δS would be approximately a parallelogram on the tangent plane at a point P within δS.
The relation between δS and δA in Fig. 34.3 depends on the unit normal n̂ at P. Consider a vertical plane through P containing the vectors n̂ and k̂ as shown in Fig. 34.4. Let θ be the angle between n̂ and k̂, so that 0 ≤ θ ≤ 180°. Then the length of any straight line element in δS perpendicular to the plane of n̂ and k̂ is unaltered by the projection, but all line elements in δS lying in the plane of n̂ and k̂ are changed in length by projection by the factor |cos θ| = |n̂·k̂|. Hence
$$\delta A = |\hat{\mathbf n}\cdot\hat{\mathbf k}|\,\delta S.$$
Thus
$$\iint_S f(x, y, z)\,dS = \iint_R f(x, y, z)\,\frac{dA}{|\hat{\mathbf n}\cdot\hat{\mathbf k}|}, \tag{34.5}$$

which can be used as a definition of the surface integral. To obtain the surface
area S put f(x, y, z) = 1.
[Fig. 34.3: a surface S with element δS at P, and its projection δA in the region R of the (x, y) plane. Fig. 34.4: the unit normal n̂ at P making the angle θ with k̂.]

Example 34.3 The roof of a building has the cylindrical shape z = h − bx² over a square floor plan given by |x| ≤ a, |y| ≤ a, where h > 2a²b (see Fig. 34.5). Find the surface area of the roof.

[Fig. 34.5: the roof z = h − bx² over the square floor plan. Fig. 34.6: the square |x| ≤ a, |y| ≤ a in the (x, y) plane, with a rectangular element of sides δx and δy.]

The surface area is given by
$$S = \iint_S dS.$$
In this case we use cartesian coordinates to define the element δA, which is the rectangle with sides parallel to the axes with lengths δx and δy. Thus δA = δx δy. The integration takes place over the square |x| ≤ a, |y| ≤ a in the (x, y) plane (Fig. 34.6). We also require the unit normal n̂. By (28.7) the unit normal will be
$$\hat{\mathbf n} = \frac{(-2bx,\ 0,\ -1)}{\sqrt{4b^2x^2 + 1}}.$$
Hence
$$|\hat{\mathbf n}\cdot\hat{\mathbf k}| = \frac{1}{\sqrt{4b^2x^2 + 1}},$$
and by (34.5)
$$S = \int_{-a}^{a}\int_{-a}^{a}\sqrt{4b^2x^2 + 1}\,dx\,dy.$$
The repeated integral is separable (Section 32.7). Hence
$$S = \int_{-a}^{a}\sqrt{4b^2x^2 + 1}\,dx\int_{-a}^{a}dy = 2a\int_{-a}^{a}\sqrt{4b^2x^2 + 1}\,dx.$$
The remaining integral can be evaluated using the substitution x = (sinh u)/(2b). The result is
$$S = \frac{a}{b}\left[2ab\sqrt{4a^2b^2 + 1} + \sinh^{-1}(2ab)\right].$$
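The closed form in Example 34.3 can be cross-checked by evaluating the single integral 2a∫√(4b²x² + 1) dx numerically. The sketch uses a hand-written composite Simpson rule. The roof dimensions a = 5 and b = 0.1 (so that 2ab = 1) are hypothetical values, not taken from the text.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def roof_area_numeric(a, b):
    """S = 2a * integral from -a to a of sqrt(4 b^2 x^2 + 1) dx (Example 34.3)."""
    return 2 * a * simpson(lambda x: math.sqrt(4 * b * b * x * x + 1), -a, a)


def roof_area_exact(a, b):
    """Closed form S = (a/b)[2ab sqrt(4a^2 b^2 + 1) + sinh^{-1}(2ab)]."""
    return (a / b) * (2 * a * b * math.sqrt(4 * a * a * b * b + 1) + math.asinh(2 * a * b))


# Hypothetical roof: half-width a = 5, curvature parameter b = 0.1.
print(roof_area_numeric(5.0, 0.1), roof_area_exact(5.0, 0.1))
```

As expected, the roof area exceeds the floor area (2a)² = 100 because the roof is inclined to the floor.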

If the surface S is given by z = f(x, y), we can obtain a general cartesian formula
for the surface area. A vector in the direction of the normal at any point on the
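surface is given (see Section 28.4) by
$$\mathbf n = \left(-\frac{\partial f}{\partial x},\ -\frac{\partial f}{\partial y},\ 1\right),$$
where we have chosen n to be in the direction in which its z component is positive. This ensures a positive value for the area. A unit vector in the direction of n is
$$\hat{\mathbf n} = \left(-\frac{\partial f}{\partial x},\ -\frac{\partial f}{\partial y},\ 1\right)\Bigg/\sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}.$$
Hence
$$\hat{\mathbf k}\cdot\hat{\mathbf n} = 1\Bigg/\sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2},$$
and the surface area of S is therefore given by
$$\iint_S dS = \iint_R\sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\,dx\,dy,$$
where R is the projection of S on to the (x, y) plane.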
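The cartesian surface-area formula just derived translates directly into a short numerical routine. The sketch below approximates ∂f/∂x and ∂f/∂y by central differences and applies the midpoint rule over a rectangle R. The two test surfaces are assumptions chosen only because their areas are known in closed form: the plane z = 2x + 3y, whose area is √(1 + 2² + 3²) times the area of R, and the parabolic cylinder z = ½x², whose area over the unit square is ∫₀¹√(1 + x²) dx. The grid size and difference step are also assumptions.

```python
import math


def surface_area(f, x0, x1, y0, y1, n=200, h=1e-6):
    """Area of the surface z = f(x, y) over the rectangle [x0, x1] x [y0, y1]:
    the integral of sqrt(1 + f_x^2 + f_y^2), using the midpoint rule on an
    n-by-n grid and central differences (step h) for the partial derivatives."""
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
            fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
            total += math.sqrt(1 + fx * fx + fy * fy)
    return total * dx * dy


plane = surface_area(lambda x, y: 2 * x + 3 * y, 0, 1, 0, 2)
cylinder = surface_area(lambda x, y: x * x / 2, 0, 1, 0, 1)
print(plane, 2 * math.sqrt(14))
print(cylinder, 0.5 * (math.sqrt(2) + math.asinh(1)))
```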


A surface in three dimensions is a two-dimensional object, which means that it
can be represented by a position vector which is a function of two parameters.
Remember that for a curve in three dimensions, the position vector is a function of a single parameter. Unlike the cartesian form z = f(x, y), parametric equations enable the creation of much more complicated surfaces.

Parametric form of a surface


A surface can be represented by a position vector r as a function of two parameters u and v in the form
$$\mathbf r(u, v) = x(u, v)\,\hat{\mathbf i} + y(u, v)\,\hat{\mathbf j} + z(u, v)\,\hat{\mathbf k}, \qquad a \le u \le b,\ c \le v \le d. \tag{34.6}$$

The parameters u and v are defined over a rectangle in the (u, v) plane.
For example, for the surface
$$\mathbf r = a\cos u\sin v\,\hat{\mathbf i} + a\sin u\sin v\,\hat{\mathbf j} + a\cos v\,\hat{\mathbf k},$$
we can see that
$$\begin{aligned}
|\mathbf r| &= \sqrt{a^2\cos^2u\sin^2v + a^2\sin^2u\sin^2v + a^2\cos^2v}\\
&= \sqrt{a^2(\cos^2u + \sin^2u)\sin^2v + a^2\cos^2v}\\
&= a\sqrt{\sin^2v + \cos^2v} = a,
\end{aligned}$$
which means that the position vector r traces out a sphere of radius a, centre at the origin. We need to specify u and v to determine which part of the sphere is defined. For the whole surface, these parameters must range over the intervals 0 ≤ u ≤ 2π and 0 ≤ v ≤ π.
More complicated surfaces can be generated in this way, and their graphical representation has become easier using symbolic computer software (see Chapter 42, projects for this chapter). For example, the position vector r defined by
$$\mathbf r = (3 + \cos v)\cos u\,\hat{\mathbf i} + (3 + \cos v)\sin u\,\hat{\mathbf j} + \sin v\,\hat{\mathbf k},$$
where 0 ≤ u ≤ 2π and 0 ≤ v ≤ 2π, generates a torus (like the shape of a doughnut) with its axis in the z direction (Fig. 34.7a). The vase-shaped surface in Fig. 34.7b is generated by
$$\mathbf r = (1 + a\sin bu)\cos v\,\hat{\mathbf i} + (1 + a\sin bu)\sin v\,\hat{\mathbf j} + u\,\hat{\mathbf k},$$
where a = 0.3, b = 3.5 for 0 ≤ u ≤ 2 and 0 ≤ v ≤ 2π.

[Fig. 34.7 (a) Torus; (b) a vase.]
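These parametric descriptions can be checked point by point. The sketch below samples the sphere (with an assumed radius a = 2) and the torus on a grid of (u, v) values. For each sample it measures how far the point lies from the implicit equations |r| = a and (√(x² + y²) − 3)² + z² = 1. The second equation is the standard implicit form of a torus about the z axis whose centre circle has radius 3 and whose tube has radius 1.

```python
import math


def sphere(u, v, a=2.0):
    """r = a cos u sin v i + a sin u sin v j + a cos v k (radius a = 2 assumed)."""
    return (a * math.cos(u) * math.sin(v), a * math.sin(u) * math.sin(v), a * math.cos(v))


def torus(u, v):
    """r = (3 + cos v) cos u i + (3 + cos v) sin u j + sin v k."""
    return ((3 + math.cos(v)) * math.cos(u), (3 + math.cos(v)) * math.sin(u), math.sin(v))


def torus_residual(p):
    """(sqrt(x^2 + y^2) - 3)^2 + z^2 - 1, which is zero on the torus about the z axis."""
    x, y, z = p
    return (math.hypot(x, y) - 3) ** 2 + z ** 2 - 1


# (u, v) grid covering 0 <= u, v <= 2*pi; the sphere uses v/2, so 0 <= v/2 <= pi.
grid = [(2 * math.pi * i / 24, 2 * math.pi * j / 24) for i in range(25) for j in range(25)]
max_sphere = max(abs(math.sqrt(sum(c * c for c in sphere(u, v / 2))) - 2.0) for u, v in grid)
max_torus = max(abs(torus_residual(torus(u, v))) for u, v in grid)
print(max_sphere, max_torus)
```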
Triple integrals or volume integrals can also be defined in vector calculus. By analogy with the double integral, the triple integral of a function f (either a scalar or vector field) over a three-dimensional region V is
$$I = \iiint_V f(P)\,dV,$$
where δV is an increment of volume and P is a point in δV (see Fig. 34.8). Its evaluation requires it to be converted into a repeated integral with three integrations.

[Fig. 34.8: a volume element δV within a region V.]

Example 34.4 A cube of metal occupying |x| ≤ a, |y| ≤ a, |z| ≤ a has a radial density distribution given by
$$\rho(x, y, z) = \alpha + \beta(x^2 + y^2 + z^2),$$
where α and β are positive constants. Find the mass of the cube.
We choose a rectangular grid with volume element δV = δx δy δz (Fig. 34.9). The mass of this element is approximately
$$\rho\,\delta x\,\delta y\,\delta z = [\alpha + \beta(x^2 + y^2 + z^2)]\,\delta x\,\delta y\,\delta z.$$

[Fig. 34.9: the cube with a rectangular volume element δV = δx δy δz.]

The total mass is therefore the sum or integral of these elements within the cube. The integral, with δx, δy, and δz parallel to the axes, sweeps out the interior of the cube if it is integrated in the x, y, and z directions, in turn, between −a and a in each case. Hence the mass M of the cube is
$$M = \int_{-a}^{a}\int_{-a}^{a}\int_{-a}^{a}[\alpha + \beta(x^2 + y^2 + z^2)]\,dx\,dy\,dz.$$
This integral can now be evaluated as a repeated integral as follows:
$$\begin{aligned}
M &= \int_{-a}^{a}\int_{-a}^{a}\left[\alpha x + \beta\left(\tfrac13x^3 + xy^2 + xz^2\right)\right]_{-a}^{a}dy\,dz\\
&= \int_{-a}^{a}\int_{-a}^{a}\left[2\alpha a + 2\beta\left(\tfrac13a^3 + ay^2 + az^2\right)\right]dy\,dz\\
&= \int_{-a}^{a}\left[2\alpha ay + 2\beta\left(\tfrac13a^3y + \tfrac13ay^3 + ayz^2\right)\right]_{-a}^{a}dz\\
&= \int_{-a}^{a}\left[4\alpha a^2 + 4\beta\left(\tfrac13a^4 + \tfrac13a^4 + a^2z^2\right)\right]dz = \left[4\alpha a^2z + 4\beta\left(\tfrac23a^4z + \tfrac13a^2z^3\right)\right]_{-a}^{a}\\
&= 8a^3(\alpha + \beta a^2).
\end{aligned}$$
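The density in Example 34.4 is a quadratic polynomial, and Simpson's rule is exact for cubics. Applying Simpson's rule in each of the three directions (a product rule) therefore reproduces the triple integral exactly, apart from rounding. The sketch below uses that fact to confirm M = 8a³(α + βa²). The values of α, β and a are arbitrary.

```python
def simpson_weights(n):
    """Composite Simpson weights 1, 4, 2, 4, ..., 2, 4, 1 (n even)."""
    return [1 if i in (0, n) else (4 if i % 2 else 2) for i in range(n + 1)]


def cube_mass(alpha, beta, a, n=4):
    """Product Simpson rule for the mass integral of Example 34.4 over |x|, |y|, |z| <= a."""
    h = 2 * a / n
    w = simpson_weights(n)
    pts = [-a + i * h for i in range(n + 1)]
    total = 0.0
    for wx, x in zip(w, pts):
        for wy, y in zip(w, pts):
            for wz, z in zip(w, pts):
                rho = alpha + beta * (x * x + y * y + z * z)  # density at (x, y, z)
                total += wx * wy * wz * rho
    return total * (h / 3) ** 3


def cube_mass_exact(alpha, beta, a):
    """M = 8 a^3 (alpha + beta a^2), the result of Example 34.4."""
    return 8 * a ** 3 * (alpha + beta * a * a)


print(cube_mass(2.0, 0.5, 1.5), cube_mass_exact(2.0, 0.5, 1.5))
```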

Self-test 34.3
Find the surface area of the paraboloid defined by z = 1 − x² − y² for z ≥ 0.

34.4 The divergence theorem; flux of a vector field


The divergence theorem (due to Gauss) relates a volume integral to a surface
integral. Let V be a region in three dimensions which is bounded by a smooth
surface S. We shall prove the theorem in the restricted case in which any straight
line parallel to any of the cartesian axes cuts S in at most two points. It will look
something like the surface shown in Fig. 34.10. The theorem is:

Divergence theorem and flux
Let S be a surface enclosing a region V, and let F be a smooth vector field defined in V. Then
$$\iiint_V \operatorname{div}\mathbf F\,dV = \iint_S \mathbf F\cdot\hat{\mathbf n}\,dS, \tag{34.7}$$
where n̂ is the unit normal to S drawn outwards from V (the integral on the right is called the flux of F out of S).

[Fig. 34.10: the region V bounded below by S₁: z = g₁(x, y) and above by S₂: z = g₂(x, y), the two surfaces meeting on the curve C, with outward normals n̂₁, n̂₂, surface elements δS₁, δS₂ and volume element δV.]

Within the restrictions imposed on S we can divide S into two surfaces, an upper one S₂ with equation, say, z = g₂(x, y) and a lower one S₁ with equation z = g₁(x, y), the two surfaces meeting on the curve C. We shall use the cartesian increment δx δy δz for δV.
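The divergence theorem is really the sum of three results. Suppose that F = F₁î + F₂ĵ + F₃k̂, and consider first
$$I_3 = \iiint_V \frac{\partial F_3}{\partial z}\,dx\,dy\,dz.$$
If R is the projection of C on to the (x, y) plane, then as a repeated integral,
$$\begin{aligned}
I_3 &= \iint_R\left[\int_{z=g_1(x,y)}^{z=g_2(x,y)}\frac{\partial F_3}{\partial z}\,dz\right]dx\,dy = \iint_R\Big[F_3(x, y, z)\Big]_{z=g_1(x,y)}^{z=g_2(x,y)}\,dx\,dy\\
&= \iint_R\big[F_3(x, y, g_2(x, y)) - F_3(x, y, g_1(x, y))\big]\,dx\,dy.
\end{aligned}$$
From the previous section, noting carefully the directions of n̂₁ and n̂₂, the outward normals to S₁ and S₂, it follows that
$$dx\,dy = \hat{\mathbf k}\cdot\hat{\mathbf n}_2\,dS_2 \ \text{ on } S_2, \qquad\text{but}\qquad dx\,dy = -\hat{\mathbf k}\cdot\hat{\mathbf n}_1\,dS_1 \ \text{ on } S_1,$$
since the angle between k̂ and n̂₁ is obtuse. Hence
$$\begin{aligned}
I_3 &= \iint_{S_2} F_3(x, y, g_2(x, y))\,\hat{\mathbf k}\cdot\hat{\mathbf n}_2\,dS_2 + \iint_{S_1} F_3(x, y, g_1(x, y))\,\hat{\mathbf k}\cdot\hat{\mathbf n}_1\,dS_1\\
&= \iint_S F_3\,\hat{\mathbf k}\cdot\hat{\mathbf n}\,dS.
\end{aligned}$$
Similarly it can be shown that
$$I_1 = \iiint_V \frac{\partial F_1}{\partial x}\,dV = \iint_S F_1\,\hat{\mathbf i}\cdot\hat{\mathbf n}\,dS, \qquad I_2 = \iiint_V \frac{\partial F_2}{\partial y}\,dV = \iint_S F_2\,\hat{\mathbf j}\cdot\hat{\mathbf n}\,dS.$$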
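Addition of these results gives the divergence theorem:
$$\begin{aligned}
I_1 + I_2 + I_3 &= \iiint_V\left(\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}\right)dV = \iiint_V \operatorname{div}\mathbf F\,dV\\
&= \iint_S (F_1\hat{\mathbf i} + F_2\hat{\mathbf j} + F_3\hat{\mathbf k})\cdot\hat{\mathbf n}\,dS = \iint_S \mathbf F\cdot\hat{\mathbf n}\,dS.
\end{aligned}$$

[Fig. 34.11: streamlines of a fluid flow crossing an element δS of a closed surface, with local velocity v.]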
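The divergence theorem (34.7) is easy to test numerically on the unit cube 0 ≤ x, y, z ≤ 1, where the outward normals are ±î, ±ĵ, ±k̂. The sketch below compares the volume integral of a central-difference divergence with the total outward flux through the six faces, both computed by the midpoint rule. The field F = x²y î + y²z ĵ + z²x k̂ is an assumed example. Its divergence is 2xy + 2yz + 2zx, and both sides of (34.7) equal 3/2.

```python
def F(x, y, z):
    """Assumed sample field F = x^2 y i + y^2 z j + z^2 x k."""
    return (x * x * y, y * y * z, z * z * x)


def div(field, x, y, z, h=1e-5):
    """Central-difference approximation to div F, definition (34.4)."""
    return ((field(x + h, y, z)[0] - field(x - h, y, z)[0])
            + (field(x, y + h, z)[1] - field(x, y - h, z)[1])
            + (field(x, y, z + h)[2] - field(x, y, z - h)[2])) / (2 * h)


def volume_integral(field, n=20):
    """Midpoint-rule value of the triple integral of div F over the unit cube."""
    d = 1.0 / n
    mids = [(i + 0.5) * d for i in range(n)]
    return sum(div(field, x, y, z) for x in mids for y in mids for z in mids) * d ** 3


def flux(field, n=20):
    """Midpoint-rule value of the outward flux of F through the six faces of the unit cube."""
    d = 1.0 / n
    mids = [(i + 0.5) * d for i in range(n)]
    total = 0.0
    for s in mids:
        for t in mids:
            # Faces x = 1 and x = 0 have outward normals +i and -i; similarly for y and z.
            total += field(1.0, s, t)[0] - field(0.0, s, t)[0]
            total += field(s, 1.0, t)[1] - field(s, 0.0, t)[1]
            total += field(s, t, 1.0)[2] - field(s, t, 0.0)[2]
    return total * d * d


print(volume_integral(F), flux(F))
```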

The divergence theorem tells us something about the physical interpretation of the divergence of a vector field. In Fig. 34.11 the curves represent streamlines of the flow of an incompressible fluid, and v is the local velocity of the fluid. Consider any fixed closed surface S drawn in the flow. Then the outflow through an element of area δS on the surface (that is, the flux through δS) will be v·n̂ δS per unit time. The total outflow through S will be
$$\iint_S \mathbf v\cdot\hat{\mathbf n}\,dS.$$
Assuming that the fluid is incompressible, and fluid is neither being created nor destroyed within S, it follows that
$$\iint_S \mathbf v\cdot\hat{\mathbf n}\,dS = 0,$$
that is, the net outflow, the flux through S, is zero. By the divergence theorem it must be true that
$$\iiint_V \operatorname{div}\mathbf v\,dV = 0$$
for every closed surface S. Since S, and therefore the region V it encloses, is arbitrary (and div v is assumed continuous), it follows that
$$\operatorname{div}\mathbf v = 0$$
throughout the flow. This is known as the equation of continuity of an incompressible fluid in fluid dynamics. A vector field which satisfies div v = 0 is said to be solenoidal. Generally the divergence of a vector field at a point P measures the rate at which the vector field spreads out from P: by the divergence theorem, div F at P is the limit of the flux of F out of a small closed surface around P divided by the volume that the surface encloses.
[Fig. 34.12 (a) and (b): regions to which the basic proof does not directly apply.]

It is not difficult to generalize the divergence theorem to regions which have corners or parts of S parallel to an axis as in Fig. 34.12a, or to regions for which the two-point rule does not apply as in Fig. 34.12b. Such a region can be split into subregions to each of which the divergence theorem applies, and the theorem then holds for the whole region by addition. The surface integrals over the internal joins cancel out, because the outward normals of adjacent subregions are opposite there.

Self-test 34.4
C is a simple (not self-intersecting) closed curve enclosing an area A on the plane z = h > 0 in three dimensions. A cone is formed by joining each point on C
to the origin by a straight line. Using the divergence theorem find the volume
of the cone. Deduce the volume of a regular tetrahedron and a regular
octahedron both having all sides of length a.

34.5 Curl of a vector field


The curl of a vector field
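$$\mathbf F(x, y, z) = F_1(x, y, z)\,\hat{\mathbf i} + F_2(x, y, z)\,\hat{\mathbf j} + F_3(x, y, z)\,\hat{\mathbf k}$$
is a vector field defined as follows:

Curl of a vector field
$$\begin{aligned}
\operatorname{curl}\mathbf F &= \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\hat{\mathbf i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\hat{\mathbf j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\hat{\mathbf k}\\
&= \begin{vmatrix}\hat{\mathbf i} & \hat{\mathbf j} & \hat{\mathbf k}\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ F_1 & F_2 & F_3\end{vmatrix}
\end{aligned} \tag{34.8}$$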
This ‘determinant’ is a useful hybrid form which has unit vectors on the top row, operators on the second, and components on the third, and it is evaluated using the first row expansion rule for determinants. This rule is analogous to the determinant rule for the vector product given in Section 11.2. The del form is
$$\operatorname{curl}\mathbf F = \nabla\times\mathbf F.$$
It can be proved that curl F is a vector invariant: the physical entity it represents
does not vary under translation or rotation of the axes.

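Example 34.5 Find the curl of
$$\mathbf F = e^{xyz}\,\hat{\mathbf i} + (x^2 + y)\,\hat{\mathbf j} + xze^{y}\,\hat{\mathbf k}.$$
Using (34.8)
$$\begin{aligned}
\operatorname{curl}\mathbf F &= \begin{vmatrix}\hat{\mathbf i} & \hat{\mathbf j} & \hat{\mathbf k}\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ e^{xyz} & x^2 + y & xze^{y}\end{vmatrix}\\
&= \left(\frac{\partial}{\partial y}(xze^{y}) - \frac{\partial}{\partial z}(x^2 + y)\right)\hat{\mathbf i} + \left(\frac{\partial}{\partial z}(e^{xyz}) - \frac{\partial}{\partial x}(xze^{y})\right)\hat{\mathbf j}\\
&\quad + \left(\frac{\partial}{\partial x}(x^2 + y) - \frac{\partial}{\partial y}(e^{xyz})\right)\hat{\mathbf k}\\
&= xze^{y}\,\hat{\mathbf i} + (xye^{xyz} - ze^{y})\,\hat{\mathbf j} + (2x - xze^{xyz})\,\hat{\mathbf k}.
\end{aligned}$$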
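As with the divergence, the components of curl F in (34.8) can be checked by central differences. The sketch below compares such an approximation with the result of Example 34.5 at one arbitrarily chosen point. The difference step is an assumption.

```python
import math


def F(x, y, z):
    """Field of Example 34.5: F = e^{xyz} i + (x^2 + y) j + x z e^y k."""
    return (math.exp(x * y * z), x * x + y, x * z * math.exp(y))


def curl(field, x, y, z, h=1e-5):
    """Central-difference curl, following definition (34.8)."""
    def d(i, k):  # dF_i / dx_k, with (x_0, x_1, x_2) = (x, y, z)
        p, m = [x, y, z], [x, y, z]
        p[k] += h
        m[k] -= h
        return (field(*p)[i] - field(*m)[i]) / (2 * h)

    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


def curl_exact(x, y, z):
    """The curl found in Example 34.5."""
    e = math.exp(x * y * z)
    return (x * z * math.exp(y), x * y * e - z * math.exp(y), 2 * x - x * z * e)


print(curl(F, 0.4, -0.7, 1.1))
print(curl_exact(0.4, -0.7, 1.1))
```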

We can interpret the curl of a vector field as follows. Consider a rectilinear shear fluid flow with velocity
$$\mathbf v = \omega y\,\hat{\mathbf i} \qquad (\omega \text{ constant}).$$
Imagine that we are looking down on the surface of the flow in Fig. 34.13. The divergence of v is zero so that the flow satisfies the equation of continuity. Its curl is given by
$$\operatorname{curl}\mathbf v = \begin{vmatrix}\hat{\mathbf i} & \hat{\mathbf j} & \hat{\mathbf k}\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ \omega y & 0 & 0\end{vmatrix} = -\omega\,\hat{\mathbf k},$$
which is a vector perpendicular to the x,y plane. The fluid as a whole does not appear to rotate, but (taking ω > 0) a small leaf placed on the flow will rotate in a clockwise sense as it is carried along with the stream. For example, if it is placed so that y > 0 for all points on the leaf, then the points furthest from the x axis will be moving faster than those nearest. The spin, or local angular velocity, turns out to be
$$\tfrac12\operatorname{curl}\mathbf v = -\tfrac12\omega\,\hat{\mathbf k}$$
everywhere.

[Fig. 34.13: the shear flow v = ωy î viewed from above.]
A vector field which satisfies curl v = 0 is said to be irrotational. There are two
important identities for special vector fields.

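A conservative field grad φ, derived from a scalar potential φ, is irrotational since
$$\operatorname{curl}\operatorname{grad}\phi = \mathbf 0. \tag{34.9}$$

The curl of a vector field F is solenoidal since
$$\operatorname{div}\operatorname{curl}\mathbf F = 0. \tag{34.10}$$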

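The verification of these results is straightforward. For the first one
$$\begin{aligned}
\operatorname{curl}\operatorname{grad}\phi &= \begin{vmatrix}\hat{\mathbf i} & \hat{\mathbf j} & \hat{\mathbf k}\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ \dfrac{\partial\phi}{\partial x} & \dfrac{\partial\phi}{\partial y} & \dfrac{\partial\phi}{\partial z}\end{vmatrix}\\
&= \left(\frac{\partial^2\phi}{\partial y\,\partial z} - \frac{\partial^2\phi}{\partial z\,\partial y}\right)\hat{\mathbf i} + \left(\frac{\partial^2\phi}{\partial z\,\partial x} - \frac{\partial^2\phi}{\partial x\,\partial z}\right)\hat{\mathbf j} + \left(\frac{\partial^2\phi}{\partial x\,\partial y} - \frac{\partial^2\phi}{\partial y\,\partial x}\right)\hat{\mathbf k}\\
&= \mathbf 0,
\end{aligned}$$
assuming that the scalar field is smooth enough to ensure that all the mixed partial derivatives cancel.
For the second result, with the same assumption about the components of F,
$$\begin{aligned}
\operatorname{div}\operatorname{curl}\mathbf F &= \frac{\partial}{\partial x}\left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right) + \frac{\partial}{\partial y}\left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right) + \frac{\partial}{\partial z}\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\\
&= \frac{\partial^2F_3}{\partial x\,\partial y} - \frac{\partial^2F_2}{\partial x\,\partial z} + \frac{\partial^2F_1}{\partial y\,\partial z} - \frac{\partial^2F_3}{\partial y\,\partial x} + \frac{\partial^2F_2}{\partial z\,\partial x} - \frac{\partial^2F_1}{\partial z\,\partial y}\\
&= 0.
\end{aligned}$$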
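The identities (34.9) and (34.10) can also be seen numerically. In the sketch below, grad, curl and div are built from central differences with an assumed step H = 10⁻⁴, and they are applied to arbitrary sample fields φ and G. Taking a central difference in y and then in z uses the same four-point stencil as taking one in z and then in y. The discrete mixed derivatives therefore cancel just as the exact ones do, and the results are zero up to rounding error.

```python
import math

H = 1e-4  # central-difference step (an arbitrary choice)


def partial(f, point, axis):
    """Central-difference approximation to df/dx_axis at point, for scalar f(x, y, z)."""
    p, m = list(point), list(point)
    p[axis] += H
    m[axis] -= H
    return (f(*p) - f(*m)) / (2 * H)


def component(field, i):
    """The scalar function F_i(x, y, z) of a vector field F."""
    return lambda x, y, z: field(x, y, z)[i]


def grad(phi):
    """grad phi, returned as a vector field (a function of x, y, z)."""
    return lambda x, y, z: tuple(partial(phi, (x, y, z), k) for k in range(3))


def curl(field):
    """curl F from definition (34.8), returned as a vector field."""
    def c(x, y, z):
        p = (x, y, z)

        def d(i, k):  # dF_i / dx_k
            return partial(component(field, i), p, k)

        return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))
    return c


def div(field, x, y, z):
    """div F from definition (34.4)."""
    return sum(partial(component(field, k), (x, y, z), k) for k in range(3))


# Arbitrary sample fields, chosen only for illustration.
def phi(x, y, z):
    return x * x * y * math.sin(z) + math.exp(x * z)


def G(x, y, z):
    return (y * z * z, math.sin(x * y), x ** 3 * z)


print(curl(grad(phi))(0.3, 0.8, -0.5))  # (34.9): close to (0, 0, 0)
print(div(curl(G), 0.3, 0.8, -0.5))     # (34.10): close to 0
```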

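Example 34.6 Show that the vector field
$$\mathbf F = (y^2z + z + ye^{xy})\,\hat{\mathbf i} + (2xyz + xe^{xy})\,\hat{\mathbf j} + (xy^2 + x)\,\hat{\mathbf k}$$
is conservative. Find the scalar potential φ of F.
We first check that curl F = 0. Thus
$$\begin{aligned}
\operatorname{curl}\mathbf F &= \begin{vmatrix}\hat{\mathbf i} & \hat{\mathbf j} & \hat{\mathbf k}\\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z}\\ y^2z + z + ye^{xy} & 2xyz + xe^{xy} & xy^2 + x\end{vmatrix}\\
&= \left(\frac{\partial}{\partial y}(xy^2 + x) - \frac{\partial}{\partial z}(2xyz + xe^{xy})\right)\hat{\mathbf i} + \left(\frac{\partial}{\partial z}(y^2z + z + ye^{xy}) - \frac{\partial}{\partial x}(xy^2 + x)\right)\hat{\mathbf j}\\
&\quad + \left(\frac{\partial}{\partial x}(2xyz + xe^{xy}) - \frac{\partial}{\partial y}(y^2z + z + ye^{xy})\right)\hat{\mathbf k}\\
&= (2xy - 2xy)\,\hat{\mathbf i} + [(y^2 + 1) - (y^2 + 1)]\,\hat{\mathbf j} + [(2yz + e^{xy} + xye^{xy}) - (2yz + e^{xy} + xye^{xy})]\,\hat{\mathbf k}\\
&= \mathbf 0.
\end{aligned}$$
We now need to find φ such that grad φ = F, that is
$$\frac{\partial\phi}{\partial x} = y^2z + z + ye^{xy}, \qquad \frac{\partial\phi}{\partial y} = 2xyz + xe^{xy}, \qquad \frac{\partial\phi}{\partial z} = xy^2 + x.$$
Integrate the partial derivatives with respect to x, y, and z to give:
$$\phi = \int(y^2z + z + ye^{xy})\,dx + f(y, z) = xy^2z + xz + e^{xy} + f(y, z); \tag{34.11}$$
$$\phi = \int(2xyz + xe^{xy})\,dy + g(z, x) = xy^2z + e^{xy} + g(z, x); \tag{34.12}$$
$$\phi = \int(xy^2 + x)\,dz + h(x, y) = xy^2z + xz + h(x, y). \tag{34.13}$$
Here the ‘constants of integration’ become functions of the other two variables in each case since partial derivatives are being integrated. Finally, φ given by (34.11), (34.12), and (34.13) must all result in the same answer. This can be achieved by the choices
$$f(y, z) = C, \qquad g(z, x) = xz + C, \qquad h(x, y) = e^{xy} + C,$$
where C is any constant. Hence
$$\phi = xy^2z + xz + e^{xy} + C.$$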
Note that potentials of conservative fields can only be found to within an additive
constant.
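For Example 34.6 the sketch below first checks numerically that grad φ reproduces F. It then illustrates what a potential means in practice: the work ∫F·dr from (0, 0, 0) to (1, 1, 1) should equal φ(1, 1, 1) − φ(0, 0, 0) = 1 + e whatever path is taken. The two paths, the use of Simpson's rule for the line integrals, and the difference step are assumptions made for illustration.

```python
import math


def F(x, y, z):
    """Field of Example 34.6."""
    e = math.exp(x * y)
    return (y * y * z + z + y * e, 2 * x * y * z + x * e, x * y * y + x)


def phi(x, y, z):
    """Potential found in Example 34.6, with the additive constant C = 0."""
    return x * y * y * z + x * z + math.exp(x * y)


def grad(f, x, y, z, h=1e-6):
    """Central-difference gradient of a scalar function f(x, y, z)."""
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))


def work(path, velocity, n=200):
    """Line integral of F . dr along r = path(t), 0 <= t <= 1, by Simpson's rule.
    velocity(t) is dr/dt."""
    h = 1.0 / n

    def integrand(t):
        return sum(f * v for f, v in zip(F(*path(t)), velocity(t)))

    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


straight = work(lambda t: (t, t, t), lambda t: (1.0, 1.0, 1.0))
curved = work(lambda t: (t, t * t, t ** 3), lambda t: (1.0, 2 * t, 3 * t * t))
print(straight, curved, phi(1, 1, 1) - phi(0, 0, 0))
```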
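Self-test 34.5
Let $\mathbf F(x, y, z) = xy\,\hat{\mathbf i} + yz^2\,\hat{\mathbf j} + xyz\,\hat{\mathbf k}$ and $\mathbf G(x, y, z) = y\,\hat{\mathbf i} + z\,\hat{\mathbf j} + (x^2 + y^2)\,\hat{\mathbf k}$. Determine curl F, curl G, and div(F × G). Verify the identity
$$\operatorname{div}(\mathbf F\times\mathbf G) = \mathbf G\cdot\operatorname{curl}\mathbf F - \mathbf F\cdot\operatorname{curl}\mathbf G.$$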

34.6 Cylindrical polar coordinates


In many applications it is advantageous to use alternative three-dimensional coordinate systems. Usually the geometry of the application suggests an appropriate system, and one system which is suitable for problems involving cylinders uses cylindrical polar coordinates (ρ, φ, z) shown in Fig. 34.14. They are related to x, y, z by:

Cylindrical polar coordinates ρ, φ, z
$$x = \rho\cos\phi, \qquad y = \rho\sin\phi, \qquad z = z \qquad (0 \le \rho < \infty,\ 0 \le \phi < 2\pi,\ -\infty < z < \infty). \tag{34.14}$$

[Fig. 34.14 Cylindrical polar coordinates: the point P: (x, y, z), or P: (ρ, φ, z), with unit vectors êρ, êφ, êz. Fig. 34.15: the cylinder ρ = constant, the vertical plane φ = constant and the horizontal plane z = constant through P.]

A point P can be viewed as lying at the intersection of three surfaces (Fig. 34.15):
the cylinder ρ = a constant, the radial plane φ = a constant through the z axis, and
the horizontal plane z = a constant. These surfaces meet at right angles at every
point, and coordinate systems with this property are said to be orthogonal.
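The point P can be represented by the position vector r, where
$$\mathbf r = \mathbf r(\rho, \phi, z) = \rho\cos\phi\,\hat{\mathbf i} + \rho\sin\phi\,\hat{\mathbf j} + z\,\hat{\mathbf k}.$$
Along the ρ-increasing line through P, φ and z are constant. The vector ∂r/∂ρ, evaluated at P, is a tangent to this curve at P, pointing in the direction of increasing ρ. The corresponding unit vector in this direction is
$$\hat{\mathbf e}_\rho = \frac{\partial\mathbf r}{\partial\rho}\Bigg/\left|\frac{\partial\mathbf r}{\partial\rho}\right| = \frac{1}{h_\rho}\frac{\partial\mathbf r}{\partial\rho},$$
where $h_\rho = |\partial\mathbf r/\partial\rho|$ is called the scale factor associated with ρ. In cylindrical coordinates
$$h_\rho = \left|\frac{\partial\mathbf r}{\partial\rho}\right| = |\cos\phi\,\hat{\mathbf i} + \sin\phi\,\hat{\mathbf j}| = (\cos^2\phi + \sin^2\phi)^{1/2} = 1.$$
Similarly, $h_\phi = \rho$ and $h_z = 1$. Therefore the unit vectors in cylindrical polars are
$$\hat{\mathbf e}_\rho = \frac{\partial\mathbf r}{\partial\rho} = \cos\phi\,\hat{\mathbf i} + \sin\phi\,\hat{\mathbf j}, \qquad \hat{\mathbf e}_\phi = \frac{1}{\rho}\frac{\partial\mathbf r}{\partial\phi} = -\sin\phi\,\hat{\mathbf i} + \cos\phi\,\hat{\mathbf j}, \qquad \hat{\mathbf e}_z = \frac{\partial\mathbf r}{\partial z} = \hat{\mathbf k}. \tag{34.15}$$
All partial derivatives are to be evaluated at P.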
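The gradient of a scalar function U(x, y, z) can be expressed in terms of the unit vectors êρ, êφ, êz. Suppose that
$$\operatorname{grad}U = g_\rho\,\hat{\mathbf e}_\rho + g_\phi\,\hat{\mathbf e}_\phi + g_z\,\hat{\mathbf e}_z,$$
where we require the components $g_\rho$, $g_\phi$, $g_z$. Treating U as a function of ρ, φ, z, the incremental formula gives
$$\delta U = \frac{\partial U}{\partial\rho}\delta\rho + \frac{\partial U}{\partial\phi}\delta\phi + \frac{\partial U}{\partial z}\delta z. \tag{34.16}$$
Also, from (31.15), the directional derivative of U is
$$\frac{dU}{ds} = \mathbf v\cdot\operatorname{grad}U,$$
where v is a unit vector in an arbitrary direction. Since |δs| = |δr|,
$$\mathbf v = \frac{d\mathbf r}{ds} = \frac{\partial\mathbf r}{\partial\rho}\frac{d\rho}{ds} + \frac{\partial\mathbf r}{\partial\phi}\frac{d\phi}{ds} + \frac{\partial\mathbf r}{\partial z}\frac{dz}{ds} = h_\rho\frac{d\rho}{ds}\hat{\mathbf e}_\rho + h_\phi\frac{d\phi}{ds}\hat{\mathbf e}_\phi + h_z\frac{dz}{ds}\hat{\mathbf e}_z.$$
Therefore, since êρ, êφ, êz are mutually orthogonal unit vectors,
$$\frac{dU}{ds} = h_\rho g_\rho\frac{d\rho}{ds} + h_\phi g_\phi\frac{d\phi}{ds} + h_z g_z\frac{dz}{ds},$$
or, expressed in increments,
$$\delta U = h_\rho g_\rho\,\delta\rho + h_\phi g_\phi\,\delta\phi + h_z g_z\,\delta z. \tag{34.17}$$
Compare (34.16) and (34.17), which are true for arbitrary δρ, δφ, δz. In turn, set one of δρ, δφ, δz to a nonzero value, and the other two to zero. We obtain
$$g_\rho = \frac{1}{h_\rho}\frac{\partial U}{\partial\rho}, \qquad g_\phi = \frac{1}{h_\phi}\frac{\partial U}{\partial\phi}, \qquad g_z = \frac{1}{h_z}\frac{\partial U}{\partial z},$$
so that grad U is given by
$$\operatorname{grad}U = \frac{1}{h_\rho}\frac{\partial U}{\partial\rho}\hat{\mathbf e}_\rho + \frac{1}{h_\phi}\frac{\partial U}{\partial\phi}\hat{\mathbf e}_\phi + \frac{1}{h_z}\frac{\partial U}{\partial z}\hat{\mathbf e}_z = \frac{\partial U}{\partial\rho}\hat{\mathbf e}_\rho + \frac{1}{\rho}\frac{\partial U}{\partial\phi}\hat{\mathbf e}_\phi + \frac{\partial U}{\partial z}\hat{\mathbf e}_z.$$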
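The cylindrical form of grad U can be checked against the cartesian gradient resolved along êρ, êφ, êz from (34.15). The sketch below does this with central differences for an assumed sample field U = x²y + yz + sin x at an arbitrary point.

```python
import math


def U(x, y, z):
    """Assumed sample scalar field, for illustration only."""
    return x * x * y + y * z + math.sin(x)


def to_cartesian(rho, phi, z):
    """(34.14): x = rho cos phi, y = rho sin phi, z = z."""
    return (rho * math.cos(phi), rho * math.sin(phi), z)


def d(f, args, k, h=1e-6):
    """Central difference of f with respect to its k-th argument."""
    p, m = list(args), list(args)
    p[k] += h
    m[k] -= h
    return (f(*p) - f(*m)) / (2 * h)


def grad_cylindrical(rho, phi, z):
    """(g_rho, g_phi, g_z) = (dU/drho, (1/rho) dU/dphi, dU/dz), with U written in rho, phi, z."""
    def u_cyl(r, p, zz):
        return U(*to_cartesian(r, p, zz))

    a = (rho, phi, z)
    return (d(u_cyl, a, 0), d(u_cyl, a, 1) / rho, d(u_cyl, a, 2))


def grad_cartesian_projected(rho, phi, z):
    """Cartesian grad U resolved along e_rho, e_phi, e_z of (34.15)."""
    p = to_cartesian(rho, phi, z)
    gx, gy, gz = (d(U, p, k) for k in range(3))
    return (gx * math.cos(phi) + gy * math.sin(phi),
            -gx * math.sin(phi) + gy * math.cos(phi),
            gz)


print(grad_cylindrical(1.3, 0.7, -0.4))
print(grad_cartesian_projected(1.3, 0.7, -0.4))
```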
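The divergence and curl also have their cylindrical polar forms. For the vector field F = Fρ êρ + Fφ êφ + Fz êz, these are
$$\operatorname{div}\mathbf F = \frac{1}{\rho}\left[\frac{\partial}{\partial\rho}(\rho F_\rho) + \frac{\partial}{\partial\phi}(F_\phi) + \frac{\partial}{\partial z}(\rho F_z)\right],$$
and
$$\begin{aligned}
\operatorname{curl}\mathbf F &= \frac{1}{\rho}\begin{vmatrix}\hat{\mathbf e}_\rho & \rho\,\hat{\mathbf e}_\phi & \hat{\mathbf e}_z\\ \dfrac{\partial}{\partial\rho} & \dfrac{\partial}{\partial\phi} & \dfrac{\partial}{\partial z}\\ F_\rho & \rho F_\phi & F_z\end{vmatrix}\\
&= \left(\frac{1}{\rho}\frac{\partial F_z}{\partial\phi} - \frac{\partial F_\phi}{\partial z}\right)\hat{\mathbf e}_\rho + \left(\frac{\partial F_\rho}{\partial z} - \frac{\partial F_z}{\partial\rho}\right)\hat{\mathbf e}_\phi + \frac{1}{\rho}\left(\frac{\partial}{\partial\rho}(\rho F_\phi) - \frac{\partial F_\rho}{\partial\phi}\right)\hat{\mathbf e}_z.
\end{aligned}$$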
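The same kind of check applies to the cylindrical divergence and curl. The sketch below defines an assumed field through its cylindrical components, Fρ = ρ²z sin φ, Fφ = ρz, Fz = ρz² cos φ, and converts it to cartesian form using (34.15). It then compares the cylindrical formulas with cartesian central-difference values. The êφ component of the curl, ∂Fρ/∂z − ∂Fz/∂ρ, is included as a check on its sign. The evaluation point and difference step are arbitrary.

```python
import math


def cyl_components(rho, phi, z):
    """Assumed sample field, given by its cylindrical components (F_rho, F_phi, F_z)."""
    return (rho * rho * z * math.sin(phi), rho * z, rho * z * z * math.cos(phi))


def cartesian_field(x, y, z):
    """The same field as F1 i + F2 j + F3 k, using e_rho and e_phi from (34.15)."""
    rho, phi = math.hypot(x, y), math.atan2(y, x)
    f_rho, f_phi, f_z = cyl_components(rho, phi, z)
    return (f_rho * math.cos(phi) - f_phi * math.sin(phi),
            f_rho * math.sin(phi) + f_phi * math.cos(phi),
            f_z)


def d(f, args, k, h=1e-5):
    """Central difference of f with respect to its k-th argument."""
    p, m = list(args), list(args)
    p[k] += h
    m[k] -= h
    return (f(*p) - f(*m)) / (2 * h)


def comp(f, i):
    """The i-th component of a vector-valued function, as a scalar function."""
    return lambda *args: f(*args)[i]


def div_cylindrical(rho, phi, z):
    """(1/rho)[d(rho F_rho)/drho + dF_phi/dphi + d(rho F_z)/dz]."""
    a = (rho, phi, z)
    t1 = d(lambda r, p, zz: r * cyl_components(r, p, zz)[0], a, 0)
    t2 = d(comp(cyl_components, 1), a, 1)
    t3 = d(lambda r, p, zz: r * cyl_components(r, p, zz)[2], a, 2)
    return (t1 + t2 + t3) / rho


def curl_phi_cylindrical(rho, phi, z):
    """e_phi component of curl F: dF_rho/dz - dF_z/drho."""
    a = (rho, phi, z)
    return d(comp(cyl_components, 0), a, 2) - d(comp(cyl_components, 2), a, 0)


def div_and_curl_phi_cartesian(rho, phi, z):
    """Cartesian div F and (curl F) . e_phi at the same point, for comparison."""
    p = (rho * math.cos(phi), rho * math.sin(phi), z)

    def g(i, k):  # dF_i / dx_k in cartesian coordinates
        return d(comp(cartesian_field, i), p, k)

    div = g(0, 0) + g(1, 1) + g(2, 2)
    curl = (g(2, 1) - g(1, 2), g(0, 2) - g(2, 0), g(1, 0) - g(0, 1))
    return div, -curl[0] * math.sin(phi) + curl[1] * math.cos(phi)


print(div_cylindrical(1.2, 0.9, 0.5), curl_phi_cylindrical(1.2, 0.9, 0.5))
print(div_and_curl_phi_cartesian(1.2, 0.9, 0.5))
```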

34.7 General curvilinear coordinates


In this section we simply summarize the generalizations of the results given in
the previous section for cylindrical polar coordinates to orthogonal curvilinear
coordinates. Suppose that the position vector r of a point is expressed in terms of
the curvilinear coordinates u1, u2, u3 so that

$$\mathbf r = \mathbf r(u_1, u_2, u_3) = x(u_1, u_2, u_3)\,\hat{\mathbf i} + y(u_1, u_2, u_3)\,\hat{\mathbf j} + z(u_1, u_2, u_3)\,\hat{\mathbf k}.$$

[Fig. 34.16 Orthogonal curvilinear coordinates: the surfaces u1 = constant, u2 = constant and u3 = constant through P, with unit vectors ê1, ê2, ê3.]

Assume that the curvilinear coordinates are orthogonal, that is the surfaces
u1 = a constant, u2 = a constant, u3 = a constant meet at right angles at every point
(Fig. 34.16). The unit vector ê1 is in the direction of the curve along which the
surfaces u2 = constant and u3 = constant meet, and it points in the direction of u1
increasing. The other unit vectors are in the directions of the intersections of the
other surface pairs as shown in Fig. 34.16.
The scale factors and unit vectors are given by:
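Scale factors, unit vectors
$$h_1 = \left|\frac{\partial\mathbf r}{\partial u_1}\right|, \qquad h_2 = \left|\frac{\partial\mathbf r}{\partial u_2}\right|, \qquad h_3 = \left|\frac{\partial\mathbf r}{\partial u_3}\right|;$$
$$\hat{\mathbf e}_1 = \frac{1}{h_1}\frac{\partial\mathbf r}{\partial u_1}, \qquad \hat{\mathbf e}_2 = \frac{1}{h_2}\frac{\partial\mathbf r}{\partial u_2}, \qquad \hat{\mathbf e}_3 = \frac{1}{h_3}\frac{\partial\mathbf r}{\partial u_3}.$$
Elements of distance δs in the u1, u2, u3 directions are respectively
$$h_1\,\delta u_1, \qquad h_2\,\delta u_2, \qquad h_3\,\delta u_3. \tag{34.18}$$

We simply state the formulae for grad, div, and curl in general curvilinear coordinates without derivation. They are given by:

Gradient of U
$$\operatorname{grad}U = \frac{1}{h_1}\frac{\partial U}{\partial u_1}\hat{\mathbf e}_1 + \frac{1}{h_2}\frac{\partial U}{\partial u_2}\hat{\mathbf e}_2 + \frac{1}{h_3}\frac{\partial U}{\partial u_3}\hat{\mathbf e}_3. \tag{34.19}$$

Divergence of F = F1 ê1 + F2 ê2 + F3 ê3
$$\operatorname{div}\mathbf F = \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial u_1}(h_2h_3F_1) + \frac{\partial}{\partial u_2}(h_3h_1F_2) + \frac{\partial}{\partial u_3}(h_1h_2F_3)\right]. \tag{34.20}$$

Curl in orthogonal curvilinear coordinates
$$\operatorname{curl}\mathbf F = \frac{1}{h_1h_2h_3}\begin{vmatrix}h_1\hat{\mathbf e}_1 & h_2\hat{\mathbf e}_2 & h_3\hat{\mathbf e}_3\\ \dfrac{\partial}{\partial u_1} & \dfrac{\partial}{\partial u_2} & \dfrac{\partial}{\partial u_3}\\ h_1F_1 & h_2F_2 & h_3F_3\end{vmatrix}. \tag{34.21}$$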

Derivations of these formulae are given by Riley, Hobson and Bence (1997).

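Example 34.7 In terms of (x, y, z), spherical polar coordinates (r, θ, φ) are given by
$$\mathbf r = x\,\hat{\mathbf i} + y\,\hat{\mathbf j} + z\,\hat{\mathbf k} = r\sin\theta\cos\phi\,\hat{\mathbf i} + r\sin\theta\sin\phi\,\hat{\mathbf j} + r\cos\theta\,\hat{\mathbf k},$$
$$r \ge 0, \qquad 0 \le \theta \le \pi, \qquad 0 \le \phi < 2\pi.$$
The coordinates are shown in Fig. 34.17: the coordinates are orthogonal with coordinate surfaces of a sphere, r = constant, a vertical plane, φ = constant, and a cone, θ = constant. Find the scale factors of these curvilinear coordinates. Hence, obtain the gradient of the scalar field U, and the divergence of the vector field F = Fr êr + Fθ êθ + Fφ êφ in spherical polar coordinates.

[Fig. 34.17 Spherical polar coordinates (r, θ, φ) of a point P.]

The scale factors are
$$h_r = \left|\frac{\partial\mathbf r}{\partial r}\right| = |\sin\theta\cos\phi\,\hat{\mathbf i} + \sin\theta\sin\phi\,\hat{\mathbf j} + \cos\theta\,\hat{\mathbf k}| = \sqrt{\sin^2\theta(\cos^2\phi + \sin^2\phi) + \cos^2\theta} = 1,$$
$$h_\theta = \left|\frac{\partial\mathbf r}{\partial\theta}\right| = |r\cos\theta\cos\phi\,\hat{\mathbf i} + r\cos\theta\sin\phi\,\hat{\mathbf j} - r\sin\theta\,\hat{\mathbf k}| = r\sqrt{\cos^2\theta(\cos^2\phi + \sin^2\phi) + \sin^2\theta} = r,$$
$$h_\phi = \left|\frac{\partial\mathbf r}{\partial\phi}\right| = |-r\sin\theta\sin\phi\,\hat{\mathbf i} + r\sin\theta\cos\phi\,\hat{\mathbf j}| = r\sin\theta.$$
From (34.19)
$$\operatorname{grad}U = \frac{\partial U}{\partial r}\hat{\mathbf e}_r + \frac{1}{r}\frac{\partial U}{\partial\theta}\hat{\mathbf e}_\theta + \frac{1}{r\sin\theta}\frac{\partial U}{\partial\phi}\hat{\mathbf e}_\phi.$$
By (34.20)
$$\operatorname{div}\mathbf F = \frac{1}{r^2}\frac{\partial}{\partial r}(r^2F_r) + \frac{1}{r\sin\theta}\frac{\partial}{\partial\theta}(\sin\theta\,F_\theta) + \frac{1}{r\sin\theta}\frac{\partial F_\phi}{\partial\phi}.$$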
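The scale factors of Example 34.7 can be confirmed by differentiating the position vector numerically. The sketch below forms central-difference approximations to ∂r/∂r, ∂r/∂θ and ∂r/∂φ and takes their lengths. It also checks that the three directions are mutually perpendicular, as the orthogonality assumption of Section 34.7 requires. The sample point and difference step are arbitrary.

```python
import math


def position(r, theta, phi):
    """Position vector in spherical polar coordinates (Example 34.7)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def tangent(coords, k, h=1e-6):
    """Central-difference approximation to dr/du_k, with (u_1, u_2, u_3) = (r, theta, phi)."""
    p, m = list(coords), list(coords)
    p[k] += h
    m[k] -= h
    return tuple((a - b) / (2 * h) for a, b in zip(position(*p), position(*m)))


def scale_factors_and_orthogonality(r, theta, phi):
    """Return (h_r, h_theta, h_phi) and the largest |e_i . e_j| over i != j."""
    t = [tangent((r, theta, phi), k) for k in range(3)]
    h = [math.sqrt(sum(c * c for c in v)) for v in t]
    e = [tuple(c / hk for c in v) for v, hk in zip(t, h)]
    dots = [abs(sum(a * b for a, b in zip(e[i], e[j]))) for i, j in ((0, 1), (1, 2), (0, 2))]
    return tuple(h), max(dots)


hs, worst_dot = scale_factors_and_orthogonality(2.0, 0.8, 1.1)
print(hs, (1.0, 2.0, 2.0 * math.sin(0.8)), worst_dot)
```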

Self-test 34.6
Spherical polar coordinates are given by x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ. Using $h_r$, $h_\theta$, $h_\phi$ given in Example 34.7 and (34.21), obtain curl F. Verify that curl F = 0 if $\mathbf F = 2r\sin\theta\,\hat{\mathbf e}_r + r\cos\theta\,\hat{\mathbf e}_\theta$.

34.8 Stokes’s theorem


Figure 34.18 shows a finite open region $S_C$ bounded by a simple, closed, directed curve C, and whose projections on to the three coordinate planes are also simple. The projection on to the x,y plane is shown, and a unit vector perpendicular to $S_C$ at a general point on $S_C$ is indicated.
[Fig. 34.18 The region $S_C$; its boundary C with direction of C indicated (this direction corresponds to the choice of the positive side of S as the upper side of the diagram); the positive direction of the normal n̂, and the projection of the system on to the x,y plane (the region $A_3$ with boundary $C_3$ and element $\delta A_3$).]

Suppose also that $S_C$ is such that every straight line parallel to î, ĵ or k̂ cuts $S_C$ at most once. This is a drastic restriction on $S_C$; for example, if x, y or z has a local maximum or minimum at a point on $S_C$ the condition is not satisfied; but this is the first step towards consideration of a general form of surface. The condition implies that on $S_C$
$$x = X(y, z), \qquad y = Y(z, x), \qquad z = Z(x, y), \tag{34.22}$$
where X, Y and Z are single-valued functions.


Next, identify a ‘negative side’ $S_C^-$ and a ‘positive side’ $S_C^+$ for $S_C$. These may be chosen arbitrarily. (We have chosen these to be the underside and upper side respectively in Fig. 34.18.) The direction of the unit normals is defined as emerging from $S_C$ into the space on the positive side $S_C^+$. The direction along C is decided by the right-hand screw rule: if a standard corkscrew is turned so as to advance through C from the negative side into the positive space, then C must follow the direction of its rotation.
We assume that all the functions involved are ‘smooth’ enough to validate the processes used. This includes smoothness conditions on $S_C$ and C, but in applications the functions involved are normally unproblematic.
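Now suppose that the geometrical system described lies in a vector field
$$\mathbf V(x, y, z) = \hat{\mathbf i}\,u(x, y, z) + \hat{\mathbf j}\,v(x, y, z) + \hat{\mathbf k}\,w(x, y, z).$$
Then
$$\operatorname{curl}\mathbf V = \operatorname{curl}(\hat{\mathbf i}u) + \operatorname{curl}(\hat{\mathbf j}v) + \operatorname{curl}(\hat{\mathbf k}w). \tag{34.23}$$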

We shall show that under the restrictions described
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}\mathbf V\,dS = \oint_C \mathbf V\cdot d\mathbf s, \tag{34.24}$$
where dS is an area element of $S_C$, and ds a vectorial element of C in the direction of C; $\oint_C$ denotes a line integral around the closed curve C. It may help in interpreting (34.24) to think of V as representing the velocity field of a fluid. Equation (34.24) then states that the flux of the vector field curl V (the vorticity) through $S_C$ is equal to the circulation of V round the curve C.
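Consider the contribution of the first term in (34.23) to $\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}\mathbf V\,dS$. From the definition of curl, $\operatorname{curl}(\hat{\mathbf i}u) = \hat{\mathbf j}\,\partial u/\partial z - \hat{\mathbf k}\,\partial u/\partial y$, so
$$[\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf i}u)]_{S_C}\,dS = \hat{\mathbf n}\cdot\left(\hat{\mathbf j}\frac{\partial u}{\partial z} - \hat{\mathbf k}\frac{\partial u}{\partial y}\right)_{S_C}dS = \left(\hat{\mathbf n}\cdot\hat{\mathbf j}\,\frac{\partial u}{\partial z} - \hat{\mathbf n}\cdot\hat{\mathbf k}\,\frac{\partial u}{\partial y}\right)_{S_C}dS, \tag{34.25}$$
the suffix indicating that the derivatives of u are to be evaluated on $S_C$. Since z = Z(x, y) on $S_C$,
$$u_{S_C}(x, y, z) = u(x, y, Z(x, y)) = U(x, y),$$
say, where U is a function of two variables. (Notice that
$$\left(\frac{\partial u(x, y, z)}{\partial x}\right)_{S_C} \ne \frac{\partial U(x, y)}{\partial x};$$
these terms are connected through a chain rule.)
If r(x, y, z) is the position vector of a point on $S_C$, then
$$\mathbf r = \hat{\mathbf i}x + \hat{\mathbf j}y + \hat{\mathbf k}Z(x, y) = \mathbf R(x, y).$$
But (see Section 28.5) ∂R/∂y is tangent to $S_C$, and so is perpendicular to n̂ at the point, so that from (34.22)
$$\hat{\mathbf n}\cdot\frac{\partial\mathbf R}{\partial y} = \hat{\mathbf n}\cdot\hat{\mathbf j} + \hat{\mathbf n}\cdot\hat{\mathbf k}\,\frac{\partial Z}{\partial y} = 0.$$
By substituting for n̂·ĵ in (34.25) we obtain
$$[\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf i}u)]_{S_C}\,dS = -\hat{\mathbf n}\cdot\hat{\mathbf k}\left(\frac{\partial u}{\partial y} + \frac{\partial u}{\partial z}\frac{\partial Z}{\partial y}\right)_{S_C}dS = -\hat{\mathbf n}\cdot\hat{\mathbf k}\,\frac{\partial U}{\partial y}\,dS$$
(by the chain rule). Also n̂·k̂ dS = $dA_3$, where $dA_3$ is the area of dS projected on to the x,y plane, as in Fig. 34.18, and C projects on to $C_3$. Therefore
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf i}u)\,dS = \iint_{A_3}\left(-\frac{\partial U}{\partial y}\right)dx\,dy.$$
Green’s theorem in the plane (see Section 33.6) applies to this projection: applying eqn (33.12) to a line integral whose only term is U(x, y) dx (so that the coefficient of dy is zero), we obtain
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf i}u)\,dS = \oint_{C_3}U(x, y)\,dx = \oint_C u(x, y, z)\,dx,$$
since U(x, y) on $C_3$ is equal to u(x, y, z) on C. Due to the property (34.22), projections on to the y,z and z,x planes must lead similarly to
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf j}v)\,dS = \oint_C v(x, y, z)\,dy,$$
and
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}(\hat{\mathbf k}w)\,dS = \oint_C w(x, y, z)\,dz.$$
By adding the three results we obtain
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}\mathbf V\,dS = \oint_C (u\,dx + v\,dy + w\,dz) = \oint_C \mathbf V\cdot d\mathbf s,$$
confirming eqn (34.24).

[Fig. 34.19 Notional illustration of adjacent patches S′ (boundary C′) and S″ (boundary C″) and the boundary curve C. On opposite sides of a common edge, the directions of circulation are opposed, as on AB.]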
Now consider a smooth surface $S_C$ having boundary $C$ which does not necessarily satisfy the geometrical limitations prescribed. It is plausible that such a surface can be partitioned into sub-areas, each of which does satisfy the conditions applied so far; that is, the surface can be covered by $N$ non-overlapping 'patches' $S_i$ of this type, with boundaries $C_i$. The result (34.24) applies to each patch separately, and since $S_C$ is the union of the $N$ patches, addition gives
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}\mathbf V\,dS = \sum_{i=1}^{N}\oint_{C_i}\mathbf V\cdot d\mathbf s. \qquad (34.26)$$
The direction of each $C_i$ round its patch is determined by the right-hand screw rule.
Figure 34.19 shows two contiguous patches $S'$ and $S''$ that share a common boundary segment $AB$. Along $AB$, $d\mathbf s'' = -d\mathbf s'$, and $\mathbf V$ is continuous across $AB$, so the contributions of $AB$ to the two circulations satisfy
$$\int_{AB}\mathbf V\cdot d\mathbf s'' = -\int_{AB}\mathbf V\cdot d\mathbf s'.$$
Therefore, when the summation (34.26) is carried out, cancellation takes place along all the edges of adjoining patches, leaving only the contributions from the uncompensated segments constituting the boundary $C$. We have thus obtained a general form of Stokes's theorem (of which Green's theorem can be regarded as a special case):

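Stokes's theorem
$$\iint_{S_C}\hat{\mathbf n}\cdot\operatorname{curl}\mathbf V\,dS = \oint_C\mathbf V\cdot d\mathbf s,$$
where $\mathbf V$ is a vector field with continuous first partial derivatives on $S_C$, the boundary of $S_C$ is a simple, closed, directed curve $C$, and $\hat{\mathbf n}$ is the continuously varying unit normal to $S_C$ with direction relative to $C$ determined by the right-hand screw rule. (34.27)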

Note 1. The vectorial expressions in Stokes's theorem do not depend upon the axes $x, y, z$ that were used in the proof; the course of the argument would have been the same whatever right-handed axes had been used. (The expressions in (34.27) are invariant with respect to transformations between right-handed systems of axes.)
Note 2. The flux of $\operatorname{curl}\mathbf V$ through $S_C$ is the same, namely the circulation round $C$, for all surfaces $S_C$ spanning a fixed curve $C$. (This also follows from the Divergence Theorem (34.7) since, by (34.10), $\operatorname{div}\operatorname{curl}\mathbf V \equiv 0$.)
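Note 2 can be checked numerically. The following Python sketch is an illustration added here, not part of the proof: it assumes the hypothetical field $\mathbf V = -y\,\mathbf i + x\,\mathbf j + z\,\mathbf k$ (so that $\operatorname{curl}\mathbf V = 2\mathbf k$, worked out by hand) and takes $C$ to be the unit circle in the $x,y$ plane, described anticlockwise. Midpoint sums approximate the circulation round $C$ and the flux of $\operatorname{curl}\mathbf V$ through two different surfaces spanning $C$, the flat unit disc and the upper unit hemisphere; all three should be close to $2\pi$.

```python
import math

def V(x, y, z):
    """Hypothetical example field V = -y i + x j + z k."""
    return (-y, x, z)

def curl_V(x, y, z):
    """curl V for the field above, worked out by hand: 2k everywhere."""
    return (0.0, 0.0, 2.0)

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def circulation(n=2000):
    """Midpoint sum for the line integral of V . ds round the unit circle, anticlockwise."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        point = (math.cos(t), math.sin(t), 0.0)
        ds = (-math.sin(t) * h, math.cos(t) * h, 0.0)   # r'(t) dt
        total += dot(V(*point), ds)
    return total

def flux_disc(n=200):
    """Flux of curl V through the unit disc z = 0 with normal k (polar coordinates)."""
    h_rho, h_phi = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h_rho
        for j in range(n):
            phi = (j + 0.5) * h_phi
            point = (rho * math.cos(phi), rho * math.sin(phi), 0.0)
            # dS = rho drho dphi on the disc
            total += dot(curl_V(*point), (0.0, 0.0, 1.0)) * rho * h_rho * h_phi
    return total

def flux_hemisphere(n=200):
    """Flux of curl V through the upper unit hemisphere with outward (upward) normal."""
    h_theta, h_phi = (math.pi / 2) / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h_theta
        for j in range(n):
            phi = (j + 0.5) * h_phi
            # on the unit sphere the outward unit normal equals the position vector
            n_hat = (math.sin(theta) * math.cos(phi),
                     math.sin(theta) * math.sin(phi),
                     math.cos(theta))
            # dS = sin(theta) dtheta dphi on the unit sphere
            total += dot(curl_V(*n_hat), n_hat) * math.sin(theta) * h_theta * h_phi
    return total

if __name__ == "__main__":
    print(circulation(), flux_disc(), flux_hemisphere(), 2 * math.pi)
```

The disc and the hemisphere give the same flux even though $\operatorname{curl}\mathbf V\cdot\hat{\mathbf n}$ differs pointwise on them, as Note 2 predicts; by the right-hand screw rule both normals point upwards for an anticlockwise $C$.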
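Problems

34.1 Find the surface area of the spherical cap of height $h$ whose equation for $z \ge 0$ is $z = \sqrt{a^2 - x^2 - y^2} - a + h$, $(0 \le h \le a)$.

34.2 Evaluate the following triple integrals as repeated integrals:
(a) $\int_0^1\int_0^z\int_y^{2y} x\,dx\,dy\,dz$;
(b) $\int_0^1\int_0^z\int_0^{\sqrt{1-y^2}} x\,dx\,dy\,dz$;
(c) $\int_0^1\int_0^z\int_{-\frac12\sqrt{1-y^2-z^2}}^{\sqrt{1-y^2-z^2}} x^3\,dx\,dy\,dz$.

34.3 It is intended to evaluate the integral $\iiint_V f(x, y, z)\,dx\,dy\,dz$ as a repeated integral over the interior of the sphere $x^2 + y^2 + z^2 = a^2$ which lies in the first octant $x \ge 0$, $y \ge 0$, $z \ge 0$. Work out the limits of integration if the order of integration is $x$ followed by $y$ followed by $z$.

34.4 Show that the volume of the tetrahedron bounded by the coordinate planes $x = 0$, $y = 0$, $z = 0$ and the plane
$$\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1, \qquad a > 0,\ b > 0,\ c > 0,$$
is $\frac16 abc$.

34.5 Find the area of the surface $z = x^2 + y$ for which $|x| \le 1$ and $|y| \le 1$.

34.6 Show that the vector field
$$\mathbf F = (yz\,e^{xyz} - y\sin xy + z)\,\mathbf i + (xz\,e^{xyz} - x\sin xy)\,\mathbf j + (xy\,e^{xyz} + x)\,\mathbf k$$
is irrotational. Find the scalar potential of $\mathbf F$.

34.7 Paraboloidal coordinates $(u, v, \phi)$ are defined by
$$x = uv\cos\phi, \qquad y = uv\sin\phi, \qquad z = \tfrac12(u^2 - v^2),$$
where $u \ge 0$, $v \ge 0$, and $0 \le \phi \le 2\pi$. Find the corresponding scale factors. Find also $\operatorname{div}\mathbf F$ in paraboloidal coordinates.

34.8 Using the definitions of grad, div, and curl, verify the following identities:
(a) $\operatorname{grad}(UV) = U\operatorname{grad}V + V\operatorname{grad}U$;
(b) $\operatorname{div}(U\mathbf F) = (\operatorname{grad}U)\cdot\mathbf F + U\operatorname{div}\mathbf F$;
(c) $\operatorname{div}(\mathbf F\times\mathbf G) = (\operatorname{curl}\mathbf F)\cdot\mathbf G - \mathbf F\cdot(\operatorname{curl}\mathbf G)$;
(d) $\operatorname{curl}\operatorname{curl}\mathbf F = \operatorname{grad}(\operatorname{div}\mathbf F) - \operatorname{div}\operatorname{grad}\mathbf F$. By $\operatorname{div}\operatorname{grad}\mathbf F$ is meant $\mathbf i\operatorname{div}(\operatorname{grad}F_1) + \mathbf j\operatorname{div}(\operatorname{grad}F_2) + \mathbf k\operatorname{div}(\operatorname{grad}F_3)$.
(e) $\operatorname{grad}(\mathbf F\cdot\mathbf G) = \mathbf F\times\operatorname{curl}\mathbf G + \mathbf G\times\operatorname{curl}\mathbf F + (\mathbf F\cdot\operatorname{grad})\mathbf G + (\mathbf G\cdot\operatorname{grad})\mathbf F$.

34.9 Show that
$$\operatorname{div}\operatorname{grad}\phi = \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2}.$$
This is often written as $\nabla^2\phi$. The equation $\nabla^2\phi = 0$ is known as Laplace's equation. Show that $\phi = 1/\sqrt{x^2 + y^2 + z^2}$ is a solution of Laplace's equation.

34.10 Prove that
(a) $\operatorname{div}(\mathbf F + \mathbf G) = \operatorname{div}\mathbf F + \operatorname{div}\mathbf G$;
(b) $\operatorname{curl}(\mathbf F + \mathbf G) = \operatorname{curl}\mathbf F + \operatorname{curl}\mathbf G$.

34.11 Find the divergence of each of the following vector fields:
(a) $\mathbf F = e^{xyz}\,\mathbf i + e^{y^2 z}\,\mathbf j + e^{xz}\,\mathbf k$;
(b) $\mathbf F = (xz - y)\,\mathbf i + yz\,\mathbf j + 2xy\,\mathbf k$;
(c) $\mathbf F = (xz - y^2)\,\mathbf i + yz\,\mathbf j + 2x^2y\,\mathbf k$.
Indicate any vector fields which are solenoidal.

34.12 Find the curl of each of the following vector fields:
(a) $\mathbf F = e^{xyz}\,\mathbf i + e^{y^2 z}\,\mathbf j + e^{xz}\,\mathbf k$;
(b) $\mathbf F = (xz - y)\,\mathbf i + yz\,\mathbf j + 2xy\,\mathbf k$;
(c) $\mathbf F = (2xy + yz)\,\mathbf i + (x^2 + xz)\,\mathbf j + xy\,\mathbf k$.

34.13 A vector field $\mathbf v$ is both irrotational and solenoidal. Show that its scalar potential $\Phi$ satisfies Laplace's equation: $\nabla^2\Phi = 0$.

34.14 If $\mathbf r = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$ and $r$ is its magnitude, find
(a) $\operatorname{div}(r^2\mathbf r)$; (b) $\operatorname{curl}(r^3\mathbf r)$; (c) $\operatorname{grad} r^3$; (d) $\operatorname{div}(\mathbf r/r^3)$; (e) $\operatorname{curl}(\mathbf r/r^2)$; (f) $\operatorname{div}\operatorname{grad} r^3$.

34.15 Prove the identity
$$(\mathbf v\cdot\operatorname{grad})\mathbf v = \tfrac12\operatorname{grad}v^2 - \mathbf v\times\operatorname{curl}\mathbf v,$$
where $v = |\mathbf v|$.

34.16 Show that Laplace's equation (see Problem 34.9) in cylindrical polar coordinates is given by
$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial U}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 U}{\partial\phi^2} + \frac{\partial^2 U}{\partial z^2} = 0.$$
If $U = f(\rho)$, that is $U$ is independent of the other variables, show that $f$ satisfies the ordinary differential equation
$$\rho f''(\rho) + f'(\rho) = 0.$$
Hence show that $f(\rho) = A + B\ln\rho$, where $A$ and $B$ are constants.

34.17 Show that Laplace's equation in spherical polar coordinates is given by
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial U}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial U}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 U}{\partial\phi^2} = 0.$$
A solution with spherical symmetry is sought for $U$, that is with $U = f(r)$. Show that $f(r) = A + (B/r)$, where $A$ and $B$ are constants.

34.18 A vector field is given by $\mathbf F = xy^2\,\mathbf i + xz\,\mathbf j + xyz\,\mathbf k$. Let $S$ be the surface of a cube bounded by the planes $x = \pm1$, $y = \pm1$, $z = \pm1$. Use the divergence theorem to evaluate
$$\iint_S\mathbf F\cdot\hat{\mathbf n}\,dS,$$
where $\hat{\mathbf n}$ is the outward normal to the cube.

34.19 Prove that
$$\iint_S\hat{\mathbf n}\cdot\operatorname{curl}\mathbf F\,dS = 0$$
for any closed surface $S$ for which the divergence theorem holds.

34.20 Suppose that $\mathbf F$ is a smooth vector field which equals the outward unit normal $\hat{\mathbf n}$ on $S$. Use the divergence theorem to show that the surface area of $S$ is given by
$$\iiint_V\operatorname{div}\mathbf F\,dV,$$
where $V$ is the interior of $S$.

34.21 Let $S$ be a closed surface surrounding a region $V$ for which the divergence theorem holds. By using the vector field $\mathbf r = x\,\mathbf i + y\,\mathbf j + z\,\mathbf k$, show that the volume enclosed by $S$ is
$$\frac13\iint_S\mathbf r\cdot\hat{\mathbf n}\,dS,$$
where $\hat{\mathbf n}$ is the outward normal to $S$. Using this result verify that
(a) the volume of the sphere enclosed by $x^2 + y^2 + z^2 = a^2$ is $\frac43\pi a^3$;
(b) the volume of a cone with vertex at the origin and plane base of area $A$ in the plane $z = h$ is $\frac13 Ah$.

34.22 Let $S$ be a closed surface surrounding a region $V$ for which the divergence theorem holds. Let $\mathbf F$ be a vector field which satisfies $\operatorname{div}\mathbf F = 1$ in a region which contains $V$. Show that the volume enclosed by $S$ is given by the formula
$$\iint_S\mathbf F\cdot\hat{\mathbf n}\,dS.$$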
Part 6
Discrete mathematics
35 Sets

CONTENTS

35.1 Notation
35.2 Equality, union, and intersection
35.3 Venn diagrams
Problems

We are often interested in grouping together objects that have common charac-
teristics or features. We might be interested in the integers 1, 2, 3, 4, or in all the
integers. Such a group is called a set. The set of all points in a plane would consist
of pairs of numbers of the form (x, y), where x and y are coordinates which can
take any real values. These examples all involve numbers, but the elements of sets
can be other objects such as functions, or matrices, or Fourier series, or Laplace
transforms, etc.

35.1 Notation
A set is a collection of objects or elements. The elements in the set can be defined
by a rule or in any descriptive manner. Sets are usually denoted by capital letters
such as S, A, B, X, etc., and their elements by lowercase letters such as s, a, b, x,
etc. The elements in a set are listed between braces { … }. If the set A consists of
just two numbers 0 and 1, then we write
A = {0, 1}, or A = {1, 0}, (35.1)

the order being a matter of indifference. We say that 0 and 1 are the elements or
members of the set A, or belong to A. We write
0 ∈A, 1 ∈A,
read as ‘0 belongs to the set A’, etc. The number 2 does not belong to A, and we
write
2 ∉A,
that is ‘2 does not belong to the set A’.
The set defined by (35.1) is the binary set, which could represent the on and off
states of a system. This could be the state of a light switch, for example.
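For finite sets, Python's built-in `set` type mirrors this notation closely, which makes it a convenient, if informal, way to experiment. A minimal sketch using the binary set (35.1):

```python
# The binary set A = {0, 1} of (35.1); a Python set ignores the order of listing.
A = {0, 1}
assert A == {1, 0}          # the order being a matter of indifference

print(0 in A, 1 in A)       # True True   (0 ∈ A, 1 ∈ A)
print(2 not in A)           # True        (2 ∉ A)
```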
Sets can be either finite, having a finite number of elements, or infinite, in which case the set contains an infinite number of elements. Thus the set given by (35.1) defines a finite set A, while

B = {1, 2, 3, … },

the list of all positive integers, defines an infinite set.


Some of the more common sets have their own special symbols:

Notation for sets of numbers
ℝ, the set of all real numbers
ℂ, the set of all complex numbers
ℝ+, the set of all positive real numbers (excludes zero)
ℤ, the set of all integers (positive, negative, and zero)
ℤ+, the set of all positive integers
ℤ−, the set of all negative integers
ℚ, the set of all rational numbers (i.e. numbers of the form p/q where p and q are integers and q ≠ 0) (35.2)

Often the elements are defined by a rule rather than by a list or formula. We
write the set as
S = {x | x satisfies specified rules},
which can be translated as ‘S is the set of values of x which satisfy the stated rules’.
The rules occur after the vertical bar |. Thus
S = {x | x ∈ ℤ+ and 2 ≤ x ≤ 8}
is an alternative way of writing S = {2, 3, 4, 5, 6, 7, 8}. As another example,
S = {x | x ∈ ℝ and 0 ≤ x ≤ 1}
is the closed interval [0, 1], that is, all real numbers between 0 and 1 including 0 and 1.
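In code, the rule-based notation corresponds to a set comprehension. An infinite set such as ℤ+ cannot be stored, so the sketch below assumes a finite stand-in range of candidates (1 to 100, an arbitrary choice that is large enough to contain every x satisfying the rule). An infinite set such as [0, 1] is better represented by a membership test than by a list.

```python
# {x | x ∈ Z+ and 2 <= x <= 8}: draw candidates from a finite stand-in for Z+.
candidates = range(1, 101)
S = {x for x in candidates if 2 <= x <= 8}
print(sorted(S))   # [2, 3, 4, 5, 6, 7, 8]

# {x | x ∈ R and 0 <= x <= 1} cannot be listed; a predicate stands in for it.
def in_unit_interval(x: float) -> bool:
    """Membership test for the closed interval [0, 1]."""
    return 0.0 <= x <= 1.0

print(in_unit_interval(0.0), in_unit_interval(1.0), in_unit_interval(1.5))  # True True False
```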

Self-test 35.1
List in full the elements in the following sets:
(a) S₁ = {x | x ∈ ℤ+ and −2 ≤ x ≤ 8},
(b) S₂ = {p/q | p ∈ ℤ+, q ∈ ℤ+, 1 ≤ p ≤ 3 and 2 ≤ q ≤ 4}.

35.2 Equality, union, and intersection


Two sets A and B are said to be equal if they contain exactly the same elements. If
this is the case, we write
A = B.
For example,
A = {1, 2, 3}, B = {3, 2, 1}, C = {3, 1, 2, 1}
are all equal, that is A = B = C. The order of the elements is immaterial, and repeated elements are discounted.
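Python's `set` equality follows the same convention: order is ignored and repeated elements collapse when the set is built. A short check of the example above:

```python
A = {1, 2, 3}
B = {3, 2, 1}
C = {3, 1, 2, 1}      # the repeated 1 is discounted
print(A == B == C)    # True
print(len(C))         # 3
```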
In a given context, the set of all elements of interest is known as the universal
set, usually denoted by U. The definition of U depends on the context. For example,
for the set A above, the universal set might be ℤ+ (the set of all positive integers), or ℝ+ (the set of all positive real numbers), or some other set which includes {1, 2, 3}, depending on the particular application.
We now define how sets can be combined to create new sets. The union of two
sets A and B is the set of all elements that belong to A, or to B, or to both. It is
written as
A ∪ B = {x| x ∈A or x ∈ B or both},
and read as ‘A union B’.

Example 35.1 Find the union of
A = {x | x ∈ ℝ and 0 ≤ x ≤ 2} and B = {x | x ∈ ℝ and 1 ≤ x ≤ 3}.
The elements in the union have to belong to one or other of the intervals 0 ≤ x ≤ 2 or 1 ≤ x ≤ 3, or to both. The condition is satisfied by all numbers in the interval 0 ≤ x ≤ 3, and by no others. Hence
A ∪ B = {x | x ∈ ℝ and 0 ≤ x ≤ 3}.

The intersection of two sets A and B is the set A ∩ B that contains all elements
common to both A and B. It is written and defined by
A ∩ B = {x| x ∈A and x ∈B}.

Example 35.2 Find the intersection of the sets A and B in Example 35.1.
The elements in the intersection have to belong simultaneously to both intervals, that is to the overlapping part of the intervals [0, 2] and [1, 3], which is [1, 2]. Thus
A ∩ B = {x | x ∈ ℝ and 1 ≤ x ≤ 2}.
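For closed real intervals, as in Examples 35.1 and 35.2, union and intersection reduce to comparisons of end-points. The sketch below assumes each interval is stored as a hypothetical `(lo, hi)` pair with `lo <= hi`. It returns `None` for an empty intersection, and also for the union of two disjoint intervals, since that union is not a single interval.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]   # closed interval [lo, hi], assuming lo <= hi

def intersection(p: Interval, q: Interval) -> Optional[Interval]:
    """Intersection of two closed intervals, or None if it is the empty set."""
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None

def union(p: Interval, q: Interval) -> Optional[Interval]:
    """Union of two closed intervals when it is a single interval, else None."""
    if intersection(p, q) is None:
        return None               # disjoint: the union consists of two separate pieces
    return (min(p[0], q[0]), max(p[1], q[1]))

A, B = (0, 2), (1, 3)
print(union(A, B))          # (0, 3), as in Example 35.1
print(intersection(A, B))   # (1, 2), as in Example 35.2
```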

In the definitions of A ∪ B and A ∩ B above, we can see that the logical operation 'or' is associated with union, while 'and' is associated with intersection.
If A and B have no elements in common, then A and B are said to be disjoint. The set with no elements is called the empty set and denoted by ∅. Thus, if A and B are disjoint, then A ∩ B = ∅. For example, if A = {1, 2, 3} and B = {4, 5, 6}, then A ∩ B = ∅.
The complement of a set A is the set of all those elements which belong to the universal set U but do not belong to A. We denote this set by $\overline{A}$ (the notations $A^c$ and $A'$ are also frequently used); it will depend on the definition of U. Hence, the complement of A is, assuming that x ∈ U,
$$\overline{A} = \{x \mid x \notin A\}.$$

We say that A is a subset of B, expressed as A ⊆ B, if every element of A also belongs to the set B. It follows that if A ⊆ B and B ⊆ U, then A ⊆ U. If A ⊆ B and there are elements of B which are not in A, then A is called a proper subset of B and written A ⊂ B. The statement A ⊆ B includes the possibility that A = B, while A ⊂ B does not. If A ⊆ B and B ⊆ A, then all elements in A are contained in B, and vice versa; in other words, A = B.
The sets of integers ℤ and rational numbers ℚ are proper subsets of the real numbers ℝ, that is
ℤ ⊂ ℝ and ℚ ⊂ ℝ.
We can summarize the results as follows.

Set operations
(a) Union: A ∪ B = {x | x ∈ A or x ∈ B or both}.
(b) Intersection: A ∩ B = {x | x ∈ A and x ∈ B}.
(c) Complement: $\overline{A} = \{x \mid x \notin A\}$.
(d) Empty set: ∅, the set with no elements.
(e) Subset: A ⊆ B means that A is a subset of B.
(f) Proper subset: A ⊂ B means that A ⊆ B but A ≠ B. (35.3)
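The operations in (35.3) have direct counterparts among Python's set operators: `|` for union, `&` for intersection, `-` for difference (which gives the complement when the first operand is U), `<=` for ⊆ and `<` for ⊂. The universal set below is a hypothetical choice, since a complement is only defined relative to some U.

```python
U = set(range(1, 11))        # hypothetical universal set {1, 2, ..., 10}
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

union = A | B                # A ∪ B
intersection = A & B         # A ∩ B
complement_A = U - A         # complement of A relative to U
print(sorted(union), sorted(intersection), sorted(complement_A))
# [1, 2, 3, 4, 5] [1, 2, 3] [4, 5, 6, 7, 8, 9, 10]

print(A.isdisjoint({9, 10}))   # True: A ∩ {9, 10} = ∅
print(A <= B, A < B)           # True True   (A ⊆ B and A ⊂ B)
print(B <= B, B < B)           # True False  (B ⊆ B, but B is not a proper subset of itself)
```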

Self-test 35.2
Find the union and intersection of A = {x | x ∈  and −1  x  2},
B = {x | x ∈ + and 1  x  4}.

35.3 Venn diagrams


Useful graphical views and interpretations of sets and operations on them can
be provided by Venn diagrams. We represent sets by regions in the plane, with
the interpretation that the region stands for those elements belonging to the given
set. The diagrams are symbolic: the set A = {1, 2}, for example, could be repres-
ented by the circle as shown in Fig. 35.1. Usually, sets are represented by the
interiors of circles, but any closed curves can be used. In a given context, all the
sets are subsets of a certain universal set U, whose nature will differ according to
the context.
If the universal set is represented by a rectangle, then a subset A of U is repres-
ented by the interior of a circle within the rectangle shown in Fig. 35.2. This is a
Venn diagram for U and A. Remember that A could represent an infinite number
of elements, or one element, or be the empty set ∅. Two regions in U may have
elements in common. For example, A = {1, 2, 3} and B = {3, 4, 5} have the common
element 3. In a Venn diagram this is represented by intersecting regions A and B as
in Fig. 35.3a.
[Fig. 35.1 The set A = {1, 2} represented by a circle.]
[Fig. 35.2 Venn diagram for the universal set U and a set A.]
[Fig. 35.3 (a) Union A ∪ B. (b) Intersection A ∩ B. (c) Complement $\overline{A}$. (d) Proper subset A ⊂ B.]

The union, intersection, complement, and proper subset can be represented by


the Venn diagrams shown in Fig. 35.3. The shaded regions indicate the elements
defined by the operations.
From the definitions of union, intersection, and complement, or from Venn
diagrams, the following laws of the algebra of sets can be deduced:

Algebra of sets
(a) Idempotent laws:
A ∪ A = A, A ∩ A = A.
(b) Commutative laws:
A ∪ B = B ∪ A, A ∩ B = B ∩ A.
(c) Associative laws (see Fig. 35.4):
(A ∪ B) ∪ C = A ∪ (B ∪ C),
(A ∩ B) ∩ C = A ∩ (B ∩ C).
(d) Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). (35.4)

[Fig. 35.4 (a) (A ∪ B) ∪ C or A ∪ (B ∪ C). (b) (A ∩ B) ∩ C or A ∩ (B ∩ C).]

Sets also satisfy the following identity and complementary laws:

Identity laws: A ∪ ∅ = A, A ∩ U = A.
Complementary laws:
$A \cup \overline{A} = U$, $A \cap \overline{A} = \varnothing$, $\overline{\overline{A}} = A$. (35.5)

For example, $\overline{A}$ consists of all elements that do not belong to A, and none that do; so there are no elements common to A and $\overline{A}$. Therefore $A \cap \overline{A} = \varnothing$.
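The laws (35.4) and (35.5) can be spot-checked by brute force on a small universal set. The sketch below assumes the hypothetical universe U = {0, 1, 2} and tests every law for all 8³ = 512 choices of subsets A, B, C. A check over one finite universe illustrates the laws; it does not replace the general argument from the definitions.

```python
from itertools import chain, combinations, product

U = frozenset({0, 1, 2})     # a small hypothetical universal set
EMPTY = frozenset()

def subsets(s):
    """All subsets of s (the power set), as frozensets."""
    items = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def comp(a):
    """Complement relative to U."""
    return U - a

def check_laws():
    P = subsets(U)
    for A, B, C in product(P, repeat=3):
        assert A | A == A and A & A == A                                  # idempotent
        assert A | B == B | A and A & B == B & A                          # commutative
        assert (A | B) | C == A | (B | C) and (A & B) & C == A & (B & C)  # associative
        assert A & (B | C) == (A & B) | (A & C)                           # distributive
        assert A | (B & C) == (A | B) & (A | C)
        assert A | EMPTY == A and A & U == A                              # identity (35.5)
        assert A | comp(A) == U and A & comp(A) == EMPTY                  # complementary
        assert comp(comp(A)) == A
    return len(P)

print(check_laws())   # 8 (the number of subsets of U)
```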
The difference of the sets A and B, written as A\B, consists of the set of those elements that belong to A but do not belong to B. Thus
$$A\setminus B = \{x \mid x \in A \text{ and } x \notin B\} = A \cap \overline{B}.$$
(The notation A − B is also used for A\B.) Figure 35.5 shows a Venn diagram for A\B.

[Fig. 35.5 Venn diagram for the difference A\B (shaded).]
[Fig. 35.6 Venn diagram of two sets A and B.]

Example 35.3 Using Fig. 35.6 as the Venn diagram of two sets A and B, mark by shading the following sets:
(a) $A \cup \overline{B}$, (b) $A \cap \overline{B}$, (c) $\overline{A} \cap \overline{B}$, (d) $\overline{A} \cup \overline{B}$, (e) $\overline{A \cup B}$, (f) $\overline{A \cap B}$.
Venn diagrams of the sets are shown in Fig. 35.7.

[Fig. 35.7 Panels (a) to (f): the shaded sets of Example 35.3.]

Examples 35.3(c), (e) and 35.3(d), (f) confirm de Morgan's laws, which are

De Morgan's laws
$$\overline{A \cup B} = \overline{A} \cap \overline{B}, \qquad \overline{A \cap B} = \overline{A} \cup \overline{B}. \qquad (35.6)$$
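De Morgan's laws (35.6), and the description $A\setminus B = A \cap \overline{B}$ of the difference, can be spot-checked in the same exhaustive way. This sketch assumes a hypothetical five-element universe, generates all 32 of its subsets, and tests all 1024 ordered pairs (A, B).

```python
from itertools import product

U = frozenset(range(5))      # hypothetical universal set {0, 1, 2, 3, 4}

def all_subsets(universe):
    """Every subset of universe, built from the 2**n include/exclude patterns."""
    items = sorted(universe)
    return [frozenset(x for x, keep in zip(items, bits) if keep)
            for bits in product((False, True), repeat=len(items))]

def comp(a):
    """Complement relative to U."""
    return U - a

def check_de_morgan():
    P = all_subsets(U)
    for A, B in product(P, repeat=2):
        assert comp(A | B) == comp(A) & comp(B)   # first de Morgan law (35.6)
        assert comp(A & B) == comp(A) | comp(B)   # second de Morgan law
        assert A - B == A & comp(B)               # A\B = A ∩ (complement of B)
    return len(P) ** 2

print(check_de_morgan())   # 1024
```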

Example 35.4 Using Fig. 35.8 as the Venn diagram of three sets A, B, and C,
shade the following sets:
(a) (A ∩ B) ∪ C, (b) (A ∩ B) ∩ C, (c) (A ∩ B) ∩ (A ∩ C),
(d) (A ∪ B) ∪ (A ∩ C).
The required sets are shown in Fig. 35.9. (Figs. 35.9b, c confirm that (A ∩ B) ∩ C = (A ∩ B) ∩ (A ∩ C).)

[Fig. 35.8 Venn diagram of three sets A, B, and C.]
[Fig. 35.9 (a) (A ∩ B) ∪ C. (b) (A ∩ B) ∩ C. (c) (A ∩ B) ∩ (A ∩ C). (d) (A ∪ B) ∪ (A ∩ C).]

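Example 35.5 Show that $(A \cap B) \cup (A \cap \overline{B}) = A$.
By the distributive law (35.4),
$$\begin{aligned}
(A \cap B) \cup (A \cap \overline{B}) &= A \cap (B \cup \overline{B})\\
&= A \cap U \quad\text{(by the complementary law)}\\
&= A \quad\text{(by the identity law).}
\end{aligned}$$
This can be confirmed graphically by drawing a Venn diagram.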

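Example 35.6 Show that (A ∪ B) ∪ (A\B) = A ∪ B.
From Fig. 35.5, we can observe that $A\setminus B = A \cap \overline{B}$. Hence
$$\begin{aligned}
(A \cup B) \cup (A\setminus B) &= (A \cup B) \cup (A \cap \overline{B})\\
&= A \cup (B \cup (A \cap \overline{B})) &&\text{(associative law)}\\
&= A \cup ((B \cup A) \cap (B \cup \overline{B})) &&\text{(distributive law)}\\
&= A \cup ((B \cup A) \cap U) &&\text{(complementary law)}\\
&= A \cup (B \cup A) &&\text{(identity law)}\\
&= (B \cup A) \cup A &&\text{(commutative law)}\\
&= B \cup (A \cup A) = B \cup A &&\text{(associative and idempotent laws)}\\
&= A \cup B &&\text{(commutative law).}
\end{aligned}$$
Alternatively, and more intuitively, we may notice that, since A\B is a subset of A, it is therefore also a subset of A ∪ B, and so adds nothing to A ∪ B when united with it.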

Example 35.7 In a manufacturing process, a product passes through three production stages and is given a quality check at all three stages, which it either passes or fails. Let $P_i$ represent the set of products passing the quality check at stage $i$. Draw a Venn diagram of the process. Interpret the quality failures of the products in the sets given by $\overline{P_1}$, $P_2\setminus(P_1 \cup P_3)$, and $(P_1 \cup P_2) \cap P_3$. What set represents the completely satisfactory products?
A production run of 1000 occurs, of which 8 fail all stages, 20 pass only stage $P_1$, 31 only stage $P_2$, and 17 only stage $P_3$; 814 pass stages $P_1$ and $P_2$, 902 stages $P_2$ and $P_3$, and 800 stages $P_3$ and $P_1$. Determine the final number which pass all quality checks.
$\overline{P_1}$ represents all products which fail the $P_1$ quality check.
$P_2\setminus(P_1 \cup P_3)$ represents those products which pass only the $P_2$ stage.
$(P_1 \cup P_2) \cap P_3$ represents those products which are satisfactory at stage $P_3$ and at $P_1$ or $P_2$.
The set $P_1 \cap P_2 \cap P_3$ represents those products which are satisfactory at all stages.
The numbers associated with each subset of the universal set U are shown in Fig. 35.10. Since 8 fail all quality checks, the number of elements in $P_1 \cup P_2 \cup P_3$ is 992. In the figure, k represents the number of products which pass all the quality checks. Hence 814 − k, for example, represents those products which are satisfactory in stages $P_1$ and $P_2$ but fail in $P_3$.

[Fig. 35.10 Venn diagram of $P_1$, $P_2$, $P_3$ in U (1000 products). Regions: only $P_1$, 20; only $P_2$, 31; only $P_3$, 17; $P_1$ and $P_2$ but not $P_3$, 814 − k; $P_2$ and $P_3$ but not $P_1$, 902 − k; $P_3$ and $P_1$ but not $P_2$, 800 − k; all three, k; total in $P_1 \cup P_2 \cup P_3$, 992.]

Thus $P_1 \cup P_2 \cup P_3$ contains
$$992 = 20 + 31 + 17 + (814 - k) + (902 - k) + (800 - k) + k$$
products. Hence 992 = 2584 − 2k, and so
k = 796.
Of the 1000 products manufactured, 796 passed all the quality checks.
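The counting argument of Example 35.7 reduces to a single linear equation in k. A short sketch that repeats the arithmetic and confirms that every region of Fig. 35.10 receives a non-negative count (the dictionary keys are illustrative labels only):

```python
# Data of Example 35.7
total, fail_all = 1000, 8
only = {"P1": 20, "P2": 31, "P3": 17}               # passed exactly one stage
pairs = {"P1P2": 814, "P2P3": 902, "P3P1": 800}     # passed (at least) both named stages

n_union = total - fail_all                          # n(P1 ∪ P2 ∪ P3) = 992
# n_union = sum(only) + sum(pair - k) + k = sum(only) + sum(pairs) - 2k
numerator = sum(only.values()) + sum(pairs.values()) - n_union
assert numerator % 2 == 0
k = numerator // 2

regions = list(only.values()) + [p - k for p in pairs.values()] + [k]
assert all(r >= 0 for r in regions) and sum(regions) == n_union
print(k)   # 796
```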

In the previous Example, we are really interested in the numbers of elements


in each of the sets. For example, the number of elements in U is 1000 and the
number of elements in P2\(P1 ∪ P3), those products which pass only stage 2, is 31.
We write
n(U) = 1000, n[P2\(P1 ∪ P3)] = 31.
The number of elements in the set S is n(S): this number is known as the cardinality of S. Sets can also have infinite cardinality. For example, n(ℚ), where ℚ is the set of rational numbers, is infinite. We write n(ℚ) = ∞. The empty set ∅ has no elements: hence n(∅) = 0.
The following results apply to finite sets. If two finite sets A and B are disjoint,
then they have no elements in common. It follows that
n(A ∪ B) = n(A) + n(B).
This result extends to any number of pairwise disjoint sets. Disjointness is essential, since otherwise elements common to two or more of the sets would be counted more than once.
This last result also provides a useful method of counting elements when combined with a Venn diagram. Consider just two sets A and B as shown in Fig. 35.11, in which the three subsets A\B, A ∩ B, and B\A of the Venn diagram are labelled. Since these sets are disjoint, we can obtain a formula for the number of elements in the union of A and B, namely
n(A ∪ B) = n(A\B) + n(A ∩ B) + n(B\A). (35.7)

For sets A and B separately,


n(A) = n(A\B) + n(A ∩ B), n(B) = n(B\ A) + n(A ∩ B). (35.8)

Elimination of n(A\ B) and n(B\ A) between (35.7) and (35.8) leads to the altern-
ative result
n(A ∪ B) = n(A) + n(B) − n(A ∩ B).

[Fig. 35.11 Counting elements in the union of two sets: the disjoint regions A\B, A ∩ B, and B\A.]
For three finite sets A, B, and C the corresponding result is
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) + n(A ∩ B ∩ C) − n(B ∩ C) − n(C ∩ A) − n(A ∩ B).
This result can be constructed from the Venn diagram.
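Both counting formulas can be tested against direct counting. The sketch below draws random finite sets from a hypothetical universe {0, ..., 19}, with a fixed seed so that the run is repeatable, and compares each formula with `len` of the union. Random trials can expose a wrong formula but do not prove a correct one.

```python
import random

def n_union_two(A, B):
    """n(A ∪ B) = n(A) + n(B) - n(A ∩ B)."""
    return len(A) + len(B) - len(A & B)

def n_union_three(A, B, C):
    """The three-set formula from the text."""
    return (len(A) + len(B) + len(C) + len(A & B & C)
            - len(B & C) - len(C & A) - len(A & B))

def random_set(rng, universe_size=20, max_draws=15):
    """A random subset of {0, ..., universe_size - 1}; it may be empty."""
    return {rng.randrange(universe_size) for _ in range(rng.randrange(max_draws))}

def check(trials=1000, seed=1):
    rng = random.Random(seed)          # fixed seed: repeatable run
    for _ in range(trials):
        A, B, C = random_set(rng), random_set(rng), random_set(rng)
        assert n_union_two(A, B) == len(A | B)
        assert n_union_three(A, B, C) == len(A | B | C)
    return trials

print(check())   # 1000
```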
Further discussion of sets and their algebra can be found in Garnier and
Taylor (1991).

Self-test 35.3
In Fig. 35.8, shade
(a) A ∩ (B ∩ C); (b) A ∪ B; (c) (A ∪ C) ∩ (B ∪ C).

Problems
35.1 (Section 35.1). List the elements in the (a) A = {x| x ∈, and −2  x  1},
following sets: B = {x |x ∈, and −1  x  2};
(a) S = {x | x ∈+ and 3  x  10}; (b) A = {x| x ∈+ and −5  x  2},
(b) S = {x | x ∈+ and −2  x  4}; B = {x | x ∈, and −5  x  2};
(c) S = {x | x ∈ and −2  x  4}; (c) A = {n |n = 1/m and m ∈ +},
(d) S = {x | x ∈+ or −, and −2  x  4}; B = {n | n = 1/m2 and m ∈+};
(e) S = {1/x |x ∈+ and 3  x  8}; (d) A = {x | x ∈ and x2 − 3x + 2 = 0},
(f ) S = {x2 |x ∈+ and | x |  3}; B = {x | x ∈ and 2x2 + x − 3 = 0};
(g) S = {x + iy | x ∈+, y ∈+, 1  x  4, (e) A = {x| x ∈ and |x |  2},
2  y  5}. B = {x | x ∈ and | x − 1 |  1}.

35.2 (Section 35.3). Show on Venn diagrams the 35.5 (Section 35.3). Construct a set formula for
following sets: the shaded sets of Fig. 35.12:
(a) A ∪ E; (b) D ∩ E;
(c) A ∩ (B ∪ C); (d) (A ∩ B) ∪ (B ∩ C); (a)
(e) A__ _[_ _B ;
A B
(f) (A\ B) ∩ C;
(g) A\ (B ∩ C);
(h) (_A\_ B_)_ _ _]_ (_ _B _\_C _).
C
35.3 (Section 35.2). Determine the union A ∪ B
of each of the following pairs of sets A and B:
(a) A = {x |x ∈ and −1  x  2},
B = {x | x ∈ and −1  x  4}; (b)
(b) A = {x |x ∈ and −1  x  0}, A B
B = {x | x ∈ and 0  x  1};
(c) A = {1, 2, 3, 4}, B = {− 4, −3, −2, −1};
(d) A = {y |y = cos x, x ∈, and 0  x  --21 π},
B = {y| y = sin x, x ∈, and −--21 π  x  --21 π}. C

35.4 (Section 35.2). Determine the intersections


A ∩ B of the following sets: Fig. 35.12
(c) 35.10 The cartesian product of two sets A and B is
the set of all ordered pairs {(a, b)}, where a ∈A and

A B
b ∈B. It is written as
A × B = {(a, b) | a ∈A and b ∈B}.
If A = B, then we write A × A = A2. Let A = {1, 2}

C and B = {1, 2, 3}; write down all the elements in


the sets A × B, B × A, A2, and B2.

(d) 35.11 The cartesian product extends to the


B products of three or more sets. Thus
A × B × C = {(a, b, c) | a ∈A and b ∈B and c ∈C}.
A C
Let A = {1, 2, 3}, B = {0, 1}, and C = {1, 2}. Write
down all the elements in
D
A × B × C, A2 × C, (A ∪ B) × C, (A ∩ B) × C.

35.12 At the end of a production process, 500


Fig. 35.12 (continued)
electrical components pass through three quality
35.6 The set S consists of products, each of which checks P, Q, and R. It is found that 38 components
is given n pass/fail tests, numbered 1 to n. The set fail check P, 29 fail Q, 30 fail R, 7 fail P and Q, 5
Sr , consists of those products that pass test r. What fail Q and R, 8 fail R and P, and 3 fail all checks.
is the set of products that Determine how many components:
(a) fails all tests, (b) fails only test 1, (a) pass all checks,
(c) fails some tests? (b) fail just one check,
(c) fail just two checks.
35.7 At Keele University, all first-year students
must take three subjects of which at least one must 35.13 (Section 35.3). For three finite sets A, B, and
be a science subject, and at least one must be a C, show that the number of elements in the union
humanities or social science subject. Let A be the of the sets is given by
set of all first-year students in a given year, A1 n(A ∪ B ∪ C) = n(A) + n(B) + n(C)
the set of students who take exactly one science
+ n(A ∩ B ∩ C) − n(B ∩ C)
subject, B1 the set of students who take just one
humanities subject, and B2 the set of those who − n(C ∩ A) − n(A ∩ B).
take two social science subjects. Draw a Venn
diagram to represent the different sets of students 35.14 If A and B are two finite sets, explain why,
classified by groups of subjects. Give set formulae for the cartesian product (defined in Problem 35.10
for students who take above),
(a) just one social science subject, n(A × B) = n(A)n(B).
(b) no humanities subject,
(c) one subject from each group. 35.15 The menu in a restaurant contains three
courses: 4 starters (set A), 5 main courses (set B),
35.8 (Section 35.3). The rules listed in (35.4) and 3 sweets (set C). Customers can choose either
illustrate the duality principle which states that the full menu or, alternatively, a main course and a
every statement involving sets which is true for all sweet. In terms of cartesian products what is the set
sets has a dual in which ∪ and ∩ are interchanged, of all possible meals (the answer is really a set of
and ∅ and U are interchanged everywhere. pairs and triples). For how many different orders
Use Venn diagrams to establish the following: can customers ask?
(a) (A\ B) ∩ C = (A ∩ C)\ B;
(b) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). 35.16 Given A = {1, 2, 3}, B = {3, 4}, and C = {2, 3,
What are their dual identities? 4, 5}, find the elements in the sets B ∪ C, B ∩ C, and
the cartesian products A × B and A × C. Verify that
35.9 Three sets A, B, and C satisfy
A × (B ∪ C) = (A × B) ∪ (A × C),
A ∩ B ∩ C = (A ∩ C) ∪ (B ∩ C).
Explain why the duality principle of Problem 35.8 A × (B ∩ C) = (A × B) ∩ (A × C).
does not apply. What condition of the duality (This example suggests general results which are
principle is violated? true for all sets.)
36 Boolean algebra: logic gates and switching functions

CONTENTS

36.1 Laws of Boolean algebra
36.2 Logic gates and truth tables
36.3 Logic networks
36.4 The inverse truth-table problem
36.5 Switching circuits
Problems

We are now going to present some new operations between special entities. They have analogies with ordinary addition and multiplication, and the symbols for them will be similar but not the same, since we need to emphasize that these are Boolean operations. The algebra involved is named after George Boole (1815–64), who first developed the modern ideas of symbolic logic. Boolean algebra has applications in logic and switching circuits.

36.1 Laws of Boolean algebra


Consider a set B which consists of just two elements 0 and 1, that is B = {0, 1}. We shall denote the sum of two elements a and b of B by a ⊕ b (the notations ∨, ∪, and +, and the alternative term join are also used); we denote the product of the two elements by a * b (the notations ∧, ∩, ×, and ·, or simply ab, and the alternative term meet are also in use) and the complement of a by $\bar a$ (~a and ¬a are used in logic). These operations applied to the members of B are defined to give the elements shown in Table 36.1.
Table 36.1 Binary operations

Sum              Product          Complement
a  b  a ⊕ b      a  b  a * b      a  ā
0  0    0        0  0    0        0  1
0  1    1        0  1    0        1  0
1  0    1        1  0    0
1  1    1        1  1    1

Thus, for example
0 ⊕ 1 = 1, 1 ⊕ 1 = 1, 0 * 0 = 0, 1 * 1 = 1, $\bar 0 = 1$, $\bar 1 = 0$.
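On the binary set, Table 36.1 says that ⊕ returns the larger of its two arguments, * the smaller, and the complement 1 − a. The sketch below encodes exactly this, with hypothetical function names. Note that ⊕ here is the Boolean sum (OR), not the exclusive-or that the same symbol often denotes in computing, since 1 ⊕ 1 = 1.

```python
def b_sum(a: int, b: int) -> int:
    """a ⊕ b on {0, 1}: the larger of a and b (Boolean OR, not exclusive-or)."""
    return max(a, b)

def b_prod(a: int, b: int) -> int:
    """a * b on {0, 1}: the smaller of a and b (Boolean AND)."""
    return min(a, b)

def b_comp(a: int) -> int:
    """Complement of a on {0, 1}."""
    return 1 - a

# Rows a, b, a ⊕ b, a * b in the order of Table 36.1
for a in (0, 1):
    for b in (0, 1):
        print(a, b, b_sum(a, b), b_prod(a, b))
print([b_comp(a) for a in (0, 1)])   # [1, 0]
```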
The elements of B are known as Boolean variables. We have restricted our set B
to one with just two elements or binary digits, because this is the main applica-
tion in circuits and computer design, but definitions can be interpreted for more
general sets. A Boolean algebra is a set with the operations ⊕, *, and ¯ defined on
it, together with the following laws on any elements a, b, c which belong to B:

Commutative laws:
a ⊕ b = b ⊕ a, a * b = b * a;
Associative laws:
a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c, a * (b * c) = (a * b) * c;
Distributive laws:
a * (b ⊕ c) = (a * b) ⊕ (a * c),
a ⊕ (b * c) = (a ⊕ b) * (a ⊕ c). (36.1)

In addition, the set must contain distinct identity elements 0 and 1 for the opera-
tions ⊕ and * respectively. For these elements we must have the identity and com-
plement laws:

Identity laws:
$a \oplus 0 = a$, $a * 1 = a$;
Complement laws:
$a \oplus \bar a = 1$, $a * \bar a = 0$. (36.2)

To summarize, we can say that a Boolean algebra consists of the collection


(B, ⊕, *, ¯, 0, 1),
in other words, a set B, the binary operations ⊕ and *, the complement ¯, and the
identity elements 0 and 1.
In our case B = {0, 1}, the binary set, which consists simply of identity elements.
We can check that the definitions in Table 36.1 satisfy the laws in (36.1). They are
essentially the laws of set operations with sum ⊕ and product * replacing union ∪
and intersection ∩, and with 1 replacing the universal set U and 0 the empty set ∅.
Just as with sets, we can deduce further laws, some of which are included
in (36.3):

Absorption laws:
a ⊕ (a * b) = a, a * (a ⊕ b) = a;
de Morgan's laws: $\overline{a \oplus b} = \bar a * \bar b$, $\overline{a * b} = \bar a \oplus \bar b$;
Identity laws:
1 ⊕ a = a ⊕ 1 = 1, 0 * a = a * 0 = 0;
Reflexive law: $\overline{\overline{a}} = a$. (36.3)
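Because B = {0, 1} is finite, each law in (36.1), (36.2), and (36.3) can be confirmed for this particular algebra by checking all 2³ = 8 assignments of a, b, c, which amounts to comparing truth tables. The sketch below does this with Python's bitwise `|` and `&`, which act as ⊕ and * on the integers 0 and 1. For the two-element algebra the exhaustive check is a complete verification, but it says nothing about Boolean algebras on larger sets.

```python
from itertools import product

def OR(a, b):
    return a | b     # ⊕ on {0, 1}

def AND(a, b):
    return a & b     # * on {0, 1}

def NOT(a):
    return 1 - a     # complement on {0, 1}

def check_boolean_laws():
    count = 0
    for a, b, c in product((0, 1), repeat=3):
        assert OR(a, b) == OR(b, a) and AND(a, b) == AND(b, a)        # commutative (36.1)
        assert OR(a, OR(b, c)) == OR(OR(a, b), c)                     # associative
        assert AND(a, AND(b, c)) == AND(AND(a, b), c)
        assert AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c))           # distributive
        assert OR(a, AND(b, c)) == AND(OR(a, b), OR(a, c))
        assert OR(a, 0) == a and AND(a, 1) == a                       # identity (36.2)
        assert OR(a, NOT(a)) == 1 and AND(a, NOT(a)) == 0             # complement
        assert OR(a, AND(a, b)) == a and AND(a, OR(a, b)) == a        # absorption (36.3)
        assert NOT(OR(a, b)) == AND(NOT(a), NOT(b))                   # de Morgan
        assert NOT(AND(a, b)) == OR(NOT(a), NOT(b))
        assert OR(1, a) == 1 and AND(0, a) == 0                       # identity laws (36.3)
        assert NOT(NOT(a)) == a                                       # reflexive law
        count += 1
    return count

print(check_boolean_laws())   # 8
```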
Note that * takes precedence over ⊕ in the absence of brackets. Thus, in the first absorption law, a ⊕ a * b means a ⊕ (a * b); in the second absorption law, the brackets are essential.
We will prove one of the absorption laws to illustrate how proofs are approached in Boolean algebra.

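Example 36.1 Prove that a ⊕ a * b = a.
For all a, b ∈ B
$$\begin{aligned}
a \oplus a * b &= a * 1 \oplus a * b &&\text{(identity law)}\\
&= a * (1 \oplus b) &&\text{(distributive law).}
\end{aligned}$$
Now
$$\begin{aligned}
1 \oplus b &= (1 \oplus b) * 1 &&\text{(identity law)}\\
&= 1 * (b \oplus 1) &&\text{(commutative laws)}\\
&= (b \oplus \bar b) * (b \oplus 1) &&\text{(complement law)}\\
&= b \oplus \bar b * 1 &&\text{(distributive law)}\\
&= b \oplus \bar b &&\text{(identity law)}\\
&= 1 &&\text{(complement law).}
\end{aligned}$$
Finally
$$a \oplus a * b = a * 1 = a \quad\text{(identity law).}$$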

36.2 Logic gates and truth tables

Any expression made up from the elements of B and the operations ⊕, *, and ¯ is
known as a Boolean expression. For example,

$a \oplus b$, $a \oplus \bar b$, $a \oplus \bar a * b$,

are Boolean expressions. For the binary set, the elements 1 and 0 can represent
‘on’ or ‘off’ states in digital circuits. The basic components in a computer are
logic gates which can produce an output from inputs. All the outputs and inputs
can be in one of two states, usually either low voltage (0) or high voltage (1).
The fundamental Boolean operations of ⊕, *, and ¯ correspond to devices
known respectively as the OR gate, AND gate, and NOT gate. As with circuit com-
ponents such as resistance and inductance, each has its own symbol.
The OR gate has two inputs and a single output represented by the symbol in Fig. 36.1. The output is f = a ⊕ b. The inputs a and b can each take either of the values 0 or 1. Hence there are four possible input combinations for the device, as listed in Table 36.2. The final column f can be completed using the sum rule in Table 36.1. Thus, if a is 'on' (1) and b is 'off' (0), the output f is 'on' (1). Table 36.2 is known as the truth table of the OR gate.
Table 36.2 Truth table for the OR gate
a  b  f = a ⊕ b
0  0  0
0  1  1
1  0  1
1  1  1

[Fig. 36.1 The OR gate: inputs a and b, output f = a ⊕ b.]

The symbol and truth table for the AND gate are shown in Fig. 36.2 and
Table 36.3. Again the device has two inputs and the single output f = a * b, the
product of a and b.

Table 36.3 Truth table for the AND gate
a  b  f = a * b
0  0  0
0  1  0
1  0  0
1  1  1

[Fig. 36.2 The AND gate: inputs a and b, output f = a * b.]

Finally the NOT gate is shown in Fig. 36.3 with its truth table given as Table 36.4. The NOT gate has a single input and a single output which is the complement of its input.

Table 36.4 Truth table for the NOT gate
a  f = ā
0  1
1  0

[Fig. 36.3 The NOT gate: input a, output f = ā.]

There is further jargon associated with these gates. The output a ⊕ b is known as the disjunction of a and b, while a * b is known as the conjunction of a and b, and $\bar a$ is called the negation of a.
These devices can be connected in series and parallel to create new logic
devices, each of which will have its own truth table.
A series connection of an AND gate followed by a NOT gate is shown in Fig. 36.4a.
The output a * b of the AND gate becomes the input of the NOT gate, which gives
the output $\overline{a * b}$. This combined device is known as the NAND gate. It has
its own symbolic representation, shown in Fig. 36.4b, and its truth table is given in
Table 36.5.
Table 36.5 Truth table for the NAND gate

a  b  f = $\overline{a * b}$
0  0  1
0  1  1
1  0  1
1  1  0

[Fig. 36.4 The NAND gate: (a) an AND gate followed by a NOT gate; (b) the NAND gate symbol, with inputs a, b and output f = $\overline{a * b}$.]

A series connection of an OR gate followed by a NOT gate produces the NOR gate,
as shown in Fig. 36.5a. The output $f = \overline{a \oplus b}$ is the complement of the sum of a and b.
The NOR gate also has its own compact symbol, shown in Fig. 36.5b. Its truth
table is given in Table 36.6.

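Table 36.6 Truth table for the NOR gate

a  b  f = $\overline{a \oplus b}$
0  0  1
0  1  0
1  0  0
1  1  0

[Fig. 36.5 The NOR gate: (a) an OR gate followed by a NOT gate; (b) the NOR gate symbol, with inputs a, b and output f = $\overline{a \oplus b}$.]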
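The five gates can be modelled as functions on the two values 0 and 1. The Python sketch below is illustrative only: the function names are our own, and 0 and 1 are represented as integers. It reproduces the output columns of Tables 36.2, 36.3, 36.5 and 36.6. NAND and NOR are built exactly as in the series connections of Figs 36.4a and 36.5a, by placing a NOT gate after an AND or OR gate.

```python
from itertools import product

# Boolean values are represented by the integers 0 and 1.
def or_gate(a, b):
    return a | b              # a ⊕ b

def and_gate(a, b):
    return a & b              # a * b

def not_gate(a):
    return 1 - a              # complement of a

# Series connections: a NOT gate placed after an AND or an OR gate.
def nand_gate(a, b):
    return not_gate(and_gate(a, b))

def nor_gate(a, b):
    return not_gate(or_gate(a, b))

def outputs(gate):
    """Output column of a two-input truth table, rows ordered 00, 01, 10, 11."""
    return [gate(a, b) for a, b in product((0, 1), repeat=2)]

for name, gate in [("OR", or_gate), ("AND", and_gate),
                   ("NAND", nand_gate), ("NOR", nor_gate)]:
    print(name, outputs(gate))
```

The printed columns are [0, 1, 1, 1] for OR, [0, 0, 0, 1] for AND, [1, 1, 1, 0] for NAND and [1, 0, 0, 0] for NOR, in agreement with the tables.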

Self-test 36.1
The output of an AND gate (Fig. 36.2) is attached to a NOT gate (Fig. 36.3).
Construct the truth table for the system.

36.3 Logic networks


The five gates introduced in the previous section can be linked in series and
parallel combinations to create further logic networks. Some examples are
presented here.

Example 36.2 Construct the Boolean expression for the output f of the device
shown in Fig. 36.6.
[Fig. 36.6 An AND gate (inputs a, b) and an OR gate (inputs c, d) feeding a final OR gate with output f.]

Starting from the left in Fig. 36.6, the upper AND gate produces an output a * b and
the lower OR gate has an output c ⊕ d. These become the inputs into the OR gate
on the right. Hence the final output is

f = (a * b) ⊕ c ⊕ d.

Since there are four inputs, the output f can be determined for each of the $2^4 = 16$
possible input combinations. For example, if a = 1, b = 0, c = 0, d = 1, then
a * b = 0 and c ⊕ d = 1, so the output is f = 1.
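Running through all 16 input combinations is mechanical, so a few lines of Python can do it. This is a sketch only, with 0 and 1 as integers; the function name `network_36_6` is a hypothetical label for the network of Fig. 36.6. It also lists the inputs for which the output is 0.

```python
from itertools import product

def network_36_6(a, b, c, d):
    """f = (a * b) ⊕ c ⊕ d for the network of Fig. 36.6 (inputs are 0 or 1)."""
    upper = a & b             # upper AND gate
    lower = c | d             # lower OR gate
    return upper | lower      # final OR gate

inputs = list(product((0, 1), repeat=4))       # all 2^4 = 16 combinations
off = [x for x in inputs if network_36_6(*x) == 0]

print(len(inputs))                # 16
print(network_36_6(1, 0, 0, 1))   # 1, the case worked in the text
print(off)                        # [(0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0)]
```

The output is 0 only when c = d = 0 and a * b = 0, which happens in three of the sixteen cases.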

Example 36.3 Figure 36.7 shows a logical network with three inputs a, b, c, and
four devices. Find a Boolean expression for the output f. Write down the truth
table for the system.

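[Fig. 36.7 Network with inputs a, b, c: AND gate P (inputs a, b) feeds NOT gate R; OR gate Q has inputs b and c; the outputs of R and Q feed OR gate S, whose output is f.]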

Note that the input b feeds both devices P and Q. The output from the AND gate
P is a * b, and the output from the NOT gate R is $\overline{a * b}$. The output from the
OR gate Q is b ⊕ c. Hence the inputs $\overline{a * b}$ and b ⊕ c into S produce an output

$f = \overline{a * b} \oplus b \oplus c$.

The truth table for this network is given in Table 36.7. Whatever the inputs, the device
is always ‘on’.

Table 36.7

a  b  c  a * b  b ⊕ c  $\overline{a * b} \oplus (b \oplus c)$
0  0  0  0      0      1
0  0  1  0      1      1
0  1  0  0      1      1
0  1  1  0      1      1
1  0  0  0      0      1
1  0  1  0      1      1
1  1  0  1      1      1
1  1  1  1      1      1
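A machine check that the network of Fig. 36.7 is always ‘on’ follows the gates P, R, Q, S in turn. The sketch below is illustrative; the function name is ours, and 0 and 1 are integers.

```python
from itertools import product

def network_36_7(a, b, c):
    p = a & b        # AND gate P
    r = 1 - p        # NOT gate R: complement of a * b
    q = b | c        # OR gate Q
    return r | q     # OR gate S

column = [network_36_7(a, b, c) for a, b, c in product((0, 1), repeat=3)]
print(column)        # final column of Table 36.7
print(all(column))   # True: the output is 1 for every input
```

The reason is visible in the code: r is 0 only when a = b = 1, and then q = 1.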

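Example 36.4 Show that, using just the NOR gate, it is possible to build a logic
network to model any Boolean expression.
Since every Boolean expression is built from ⊕, *, and ¯, it is enough to show that,
given inputs a and b, devices can be constructed using just NOR gates with outputs
a ⊕ b, a * b, and $\bar{a}$. For inputs a and b, a single NOR gate generates the output
$\overline{a \oplus b}$. Figure 36.8 shows three devices which simulate the required outputs.

Fig. 36.8 The simulations are:
(a) OR gate: a and b feed one NOR gate, whose output is fed into both inputs of a second NOR gate, giving $\overline{\overline{a \oplus b} \oplus \overline{a \oplus b}} = a \oplus b$;
(b) AND gate: NOR gates with tied inputs give $\overline{a \oplus a} = \bar{a}$ and $\overline{b \oplus b} = \bar{b}$, and a third NOR gate gives $\overline{\bar{a} \oplus \bar{b}} = a * b$;
(c) NOT gate: a single NOR gate with both inputs equal to a gives $\overline{a \oplus a} = \bar{a}$.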
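The three constructions of Fig. 36.8 can be checked exhaustively, because each has at most four input combinations. In this sketch only `nor` is treated as a primitive gate, and the helper names are illustrative.

```python
from itertools import product

def nor(a, b):
    return 1 - (a | b)

def not_from_nor(a):
    return nor(a, a)                   # complement of (a ⊕ a) is ā

def or_from_nor(a, b):
    x = nor(a, b)                      # complement of a ⊕ b
    return nor(x, x)                   # complementing again gives a ⊕ b

def and_from_nor(a, b):
    return nor(nor(a, a), nor(b, b))   # complement of (ā ⊕ b̄) is a * b

pairs = list(product((0, 1), repeat=2))
print(all(not_from_nor(a) == 1 - a for a in (0, 1)))
print(all(or_from_nor(a, b) == (a | b) for a, b in pairs))
print(all(and_from_nor(a, b) == (a & b) for a, b in pairs))
```

Each line prints True. The AND construction is an instance of de Morgan's law: the complement of ā ⊕ b̄ is a * b.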

Example 36.5 Design a logic network using OR, AND, and NOT gates to
reproduce the Boolean expression $f = a * \bar{b} \oplus a$ for inputs a and b.
From input b we obtain $\bar{b}$ by a NOT gate. The inputs a and $\bar{b}$ are then fed into
an AND gate to produce $a * \bar{b}$. Finally, a spur from the a input and the output
$a * \bar{b}$ are fed into an OR gate, as shown in Fig. 36.9.

[Fig. 36.9 Network for $f = a * \bar{b} \oplus a$: NOT gate on b, AND gate with inputs a and $\bar{b}$, OR gate with inputs $a * \bar{b}$ and a.]
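Tabulating the network of Fig. 36.9 shows that its output always equals the input a. This is consistent with the absorption law a ⊕ a * x = a, applied here with x = b̄. The sketch is illustrative, and the function and variable names are ours.

```python
from itertools import product

def network_36_9(a, b):
    b_bar = 1 - b              # NOT gate on input b
    a_and_b_bar = a & b_bar    # AND gate: a * b̄
    return a_and_b_bar | a     # OR gate fed by the spur from a

table = [(a, b, network_36_9(a, b)) for a, b in product((0, 1), repeat=2)]
print(table)                               # [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(all(f == a for a, _, f in table))    # True
```

A network that simply reproduces one input illustrates why simplifying an expression before building it can save gates.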

Self-test 36.2
An AND gate with inputs a and b, and a NOT gate with input c are connected
to a NOR gate. Find a Boolean expression for the output f, and construct a
truth table for the system.

36.4 The inverse truth-table problem


In this section we consider the inverse problem: creating a Boolean expression
for a given truth table. For example, Table 36.8 is a truth table for two inputs a
and b. We illustrate a method for constructing a Boolean expression which
will generate this truth table. Pick out the cases for which f = 1. For the case a = 0,
b = 1, write down $\bar{a} * b$, and for a = 1, b = 0 write down $a * \bar{b}$; that is, in each
product use the complement of any element that is zero. Thus, if a = 0 and b = 1,
then $\bar{a} = 1$ and $\bar{a} * b = 1$. Similarly, if a = 1 and b = 0, then $a * \bar{b} = 1$. Hence, by
the truth tables of the AND and OR gates (Tables 36.3 and 36.2),

$\bar{a} * b \oplus a * \bar{b} = 1$

for these cases. For the remaining rows (a = b) both products are 0, so f = 0.
We obtain

$f = \bar{a} * b \oplus a * \bar{b}$,   (36.4)

and the final output f can be checked against the table.

Table 36.8 Truth table for the EXOR gate

a  b  f
0  0  0
0  1  1
1  0  1
1  1  0

[Fig. 36.10 The exclusive-OR gate: inputs a, b, output $f = a * \bar{b} \oplus \bar{a} * b$.]


This particular gate is known as the exclusive-OR gate, or EXOR gate, and has
its own symbol, shown in Fig. 36.10. The form of f obtained by the construction
just described is known as the disjunctive normal form. By the definitions in
Table 36.1, the construction guarantees a Boolean expression for any truth table.
Applied to the truth table for the OR gate (Table 36.2), the disjunctive normal form gives

$f = (\bar{a} * b) \oplus (a * \bar{b}) \oplus (a * b)$,

which is evidently a more complicated version of a ⊕ b.
The method can be applied to more complex truth tables. Table 36.9 shows an
output for three inputs. The output 1 appears in rows 2, 4, 5, 7 and 8.

Table 36.9

a  b  c  f
0  0  0  0
0  0  1  1
0  1  0  0
0  1  1  1
1  0  0  1
1  0  1  0
1  1  0  1
1  1  1  1

In row 2, a = 0, b = 0 and c = 1. Hence we introduce $\bar{a} * \bar{b} * c$, which equals 1
for this row. Apply the same procedure to rows 4, 5, 7 and 8, introducing the
complement for each zero Boolean variable. Following the rules for products of
elements and their complements, the disjunctive normal form for a corresponding
Boolean expression is

$f = \bar{a} * \bar{b} * c \oplus \bar{a} * b * c \oplus a * \bar{b} * \bar{c} \oplus a * b * \bar{c} \oplus a * b * c$.

Check that f does give the required output. The disjunctive normal form always
guarantees an answer, but it is not necessarily the simplest or most efficient in
circuit architecture.
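The construction can be automated: keep the rows with f = 1 and, for each one, form the product of the variables, complementing those that are 0. The following is a minimal sketch. The helper names are hypothetical, and a complement is written x′ (as x') for ease of printing. Truth-table rows are assumed to be listed in the order used in Tables 36.8 and 36.9.

```python
from itertools import product

def minterm_rows(outputs, n):
    """Input rows (tuples of 0/1) where f = 1; rows run 00...0, 00...1, ..., 11...1."""
    return [row for row, f in zip(product((0, 1), repeat=n), outputs) if f == 1]

def dnf_string(names, outputs):
    """Disjunctive normal form as text; x' stands for the complement of x."""
    terms = []
    for row in minterm_rows(outputs, len(names)):
        # a 1 contributes the variable itself, a 0 contributes its complement
        literals = [name if bit else name + "'" for name, bit in zip(names, row)]
        terms.append(" * ".join(literals))
    return " ⊕ ".join(terms) if terms else "0"

def dnf_value(outputs, values):
    """Evaluate the DNF: OR over the selected rows of the AND of the literals."""
    total = 0
    for row in minterm_rows(outputs, len(values)):
        term = 1
        for v, bit in zip(values, row):
            term &= v if bit else 1 - v
        total |= term
    return total

f_table_36_9 = [0, 1, 0, 1, 1, 0, 1, 1]
print(dnf_string(("a", "b", "c"), f_table_36_9))
print(dnf_string(("a", "b"), [0, 1, 1, 0]))   # the EXOR gate: a' * b ⊕ a * b'
```

The first line printed is the five-term expression derived above for Table 36.9. Evaluating `dnf_value` on every row reproduces the f column, which is the check the text asks for.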

Self-test 36.3
Construct a Boolean expression for the truth table

a  b  $\overline{a * b}$
0  0  1
0  1  1
1  0  1
1  1  0

using the disjunctive normal form. Compare the answer with the answer of
Self-test 36.1.

36.5 Switching circuits


A circuit of on–off switches can also be represented by Boolean expressions. For
example, Fig. 36.11 shows a simple on–off switch in part of a circuit. Current
flows if the switch S is in the on or closed position (a = 1), and does not flow if the
switch is in the off or open position (a = 0). The variable a represents the state of
the switch.

[Fig. 36.11 On–off switch S with state a.]

Consider two switches S1 and S2 in series (Fig. 36.12). Current only flows if both
switches are closed, that is when a = 1 and b = 1, where a and b represent the
states of the switches. Hence the truth table for the series switches is as shown in
Table 36.10. Thus the state of current flow is given by f = a * b, the product of
a and b.
Similarly two switches in parallel (Fig. 36.13) correspond to the sum of a and b.
The truth table is given in Table 36.11. The final column indicates that f = a ⊕ b.
The complement of a, the state of switch S1, is another switch S2 in the circuit
which is always in the complementary state to S1, off when S1 is on and vice versa.
It can be represented symbolically by Fig. 36.14, in which the switches S1 and S2
are joined by a rigid tie.

Table 36.10 Truth table for two switches in series

a  b  f
0  0  0
0  1  0
1  0  0
1  1  1

[Fig. 36.12 Two switches in series: S1 (state a) and S2 (state b).]

Table 36.11 Truth table for two switches in parallel

a  b  f
0  0  0
0  1  1
1  0  1
1  1  1

[Fig. 36.13 Two switches in parallel: S1 (state a) and S2 (state b).]

[Fig. 36.14 Complement of a switch using a rigid tie: S1 (state a) and S2 (state $\bar{a}$).]
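These devices are analogous to the gates introduced earlier in this chapter: switches
in series, switches in parallel, and the tied complementary switch play the roles of
the AND, OR, and NOT gates. For switching circuits, the Boolean expressions are often
referred to as switching functions.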

Example 36.6 Find a switching function f for the system shown in Fig. 36.15.

[Fig. 36.15 Switches S2 and S3 in parallel, in series with S4; this combination is in parallel with S1.]

Let a1, a2, a3, a4 represent respectively the states of the switches S1, S2, S3, S4.
Since S2 and S3 are in parallel, their combined state is a2 ⊕ a3. This combination,
in series with S4, gives (a2 ⊕ a3) * a4. In turn, this is in parallel with S1. Hence
the switching function is

(a2 ⊕ a3) * a4 ⊕ a1.
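Series and parallel connections compose just like the product and sum, so a switching function can be built from two small helpers. This is an illustrative sketch, and the helper names `series` and `parallel` are our own. It counts the switch settings of Fig. 36.15 that let current through.

```python
from itertools import product

def series(x, y):
    return x & y   # current flows only if both switches are closed (product)

def parallel(x, y):
    return x | y   # current flows if either switch is closed (sum)

def circuit_36_15(a1, a2, a3, a4):
    """Switching function (a2 ⊕ a3) * a4 ⊕ a1 of Fig. 36.15."""
    return parallel(series(parallel(a2, a3), a4), a1)

flowing = [s for s in product((0, 1), repeat=4) if circuit_36_15(*s)]
print(len(flowing))   # 11 of the 16 switch settings let current through
```

All 8 settings with S1 closed conduct. With S1 open, S4 must be closed together with S2 or S3, which gives the remaining 3.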

Example 36.7 A light on a staircase is controlled by two switches S1 and S2, one
at the bottom of the stairs and one at the top. Switches can be separately ‘up’ or
‘down’. If both switches are up, the light is off. Either switch changed to down
switches the light on, and any subsequent change to a switch alters the state of
the light. Construct a truth table for the circuit.
The truth table is shown in Table 36.12, where the state of Si (i = 1, 2) is ai = 0 when
the switch is up and ai = 1 when the switch is down. The light on is f = 1, and
the light off is f = 0. This truth table is the same as that for the exclusive-OR gate in
Section 36.4. Hence, from (36.4), the circuit can be represented by the switching
function

$f = a_1 * \bar{a}_2 \oplus \bar{a}_1 * a_2$.

The actual circuit is shown in Fig. 36.16, where S1 and S2 are one-pole two-way
switches. At S1, one contact carries the state a1 and the other its complement $\bar{a}_1$,
so exactly one of them is closed at any time; S2 is arranged in the same way.

Table 36.12

Switch S1  Switch S2  Light  a1  a2  f
up         up         off    0   0   0
down       up         on     1   0   1
down       down       off    1   1   0
up         down       on     0   1   1

[Fig. 36.16 Two-switch light control: two-way switches S1 (contacts a1, $\bar{a}_1$) and S2 (contacts a2, $\bar{a}_2$), with an AC supply and the light.]
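The defining property of the staircase circuit is that every change to either switch toggles the light. The sketch below simulates a sequence of switch changes using the switching function f = a1 * ā2 ⊕ ā1 * a2. The sequence of flips is an arbitrary illustration, and the names are hypothetical.

```python
def light(a1, a2):
    """f = a1 * ā2 ⊕ ā1 * a2: the exclusive-OR of the two switch states."""
    return (a1 & (1 - a2)) | ((1 - a1) & a2)

state = [0, 0]                  # both switches up: light off
history = [light(*state)]
for switch in (0, 1, 1, 0):     # flip S1, then S2, S2, S1
    state[switch] = 1 - state[switch]
    history.append(light(*state))
print(history)                  # [0, 1, 0, 1, 0]: every change toggles the light
```

Because f is the exclusive-OR, changing either input always changes the output. This is exactly the behaviour required of the staircase light.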

Further explanation of Boolean algebra with many applications to switching
circuits can be found in Garnier and Taylor (1991).

Problems

36.1 Read through Example 36.1. Now prove the other absorption law:
a * (a ⊕ b) = a.
(Example 36.1 and this result illustrate the duality principle, which states that any
theorem which can be proved in Boolean algebra implies another theorem with * and
⊕ interchanged for the same elements.)

36.2 (Section 36.1). Prove the de Morgan result
$\overline{a \oplus b} = \bar{a} * \bar{b}$,
by showing that $(a \oplus b) \oplus (\bar{a} * \bar{b}) = 1$. Explain how the duality result
(Problem 36.1) gives the other de Morgan theorem.

36.3 (Section 36.1). Let B be the Boolean algebra with the two elements 0 and 1.
For arbitrary a, b ∈ B, prove the following:
(a) $a * (\bar{a} \oplus b) = a * b$;
(b) $(a \oplus b) * (a \oplus \bar{b}) = a$;
(c) $(a \oplus b) * \bar{a} * \bar{b} = 0$.

36.4 (Section 36.1). Using the laws of Boolean algebra for the set with two elements
0 and 1, show that:
(a) $a * b \oplus a * \bar{b} = a$;
(b) $a \oplus \bar{a} * \bar{b} * c = a \oplus \bar{b} * c$.
Use the result to obtain the truth tables in each case.

36.5 (Section 36.4). In Problem 36.4b, it is shown that
$a \oplus \bar{a} * \bar{b} * c = a \oplus \bar{b} * c$.
Design two sequences of gates which give the same output for the inputs a, b, and c.
The resultant gates are said to be logically equivalent.

36.6 (Section 36.4). Design a circuit of gates to produce the output
$(a \oplus \bar{b}) * (a \oplus \bar{c})$.
Construct the truth table for this Boolean expression.

36.7 (Section 36.1). Show that the Boolean expressions $(a \oplus b) * (\bar{a} \oplus b) \oplus a$
and a ⊕ b are equivalent.

36.8 (Section 36.1). Show that the following Boolean expressions are equivalent:
(a) a ⊕ b; (b) a ⊕ b * b.

36.9 (Section 36.3). Find a Boolean expression f which corresponds to the truth table
shown in Table 36.13.

Table 36.13

a  b  c  f
0  0  0  1
0  0  1  1
0  1  0  0
0  1  1  0
1  0  0  1
1  0  1  1
1  1  0  0
1  1  1  0

36.10 (Section 36.3). Construct Boolean expressions for the output f in the devices
shown in Figs 36.17a–d. Construct the truth tables in each case.

[Fig. 36.17 Four logic networks (a)–(d), with inputs drawn from a, b, c, d and output f.]

36.11 Find the outputs f and g in the logic circuits shown in Fig. 36.18. This device
can represent binary addition, in which g is the ‘carry’ in the binary table shown in
Table 36.14. The output g gives the ‘1’ in the ‘10’ in the binary sum 1 + 1 = 10.

[Fig. 36.18 Logic circuit with inputs a, b and outputs f, g.]

Table 36.14

x  y  x + y
0  0  0
0  1  1
1  0  1
1  1  10

36.12 (Section 36.3). Reproduce the logic network in Fig. 36.6 using just the NOR gate.

36.13 (Section 36.4). Using the disjunctive normal form, construct a Boolean
expression f for the truth tables given in Tables 36.15 and 36.16.

Table 36.15

a  b  f
0  0  0
0  1  1
1  0  1
1  1  1

Table 36.16

a  b  c  f
0  0  0  1
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  1
1  0  1  0
1  1  0  1
1  1  1  0

36.14 (Section 36.3). Show that any Boolean expression can be modelled using just the
NAND gate. (Hint: use a method similar to that explained in Example 36.4.)

36.15 (Section 36.4). Find switching functions for the switching circuits shown in
Figs 36.19a,b.

[Fig. 36.19 Two switching circuits (a) and (b), each built from switches S1–S5.]

36.16 A lecture theatre has three entrances and the lighting can be controlled from
each entrance; that is, it can be switched on or off independently from any of them.
The light is ‘on’ if the output f equals 1 and ‘off’ if f = 0. Let ai = 1 (i = 1, 2, 3) when
switch i is up, and let ai = 0 (i = 1, 2, 3) when it is down. Construct a truth table for
the state of the lighting for all states of the switches. Also specify a Boolean
expression which will control the lighting.
37 Graph theory and its applications

CONTENTS

37.1 Examples of graphs
37.2 Definitions and properties of graphs
37.3 How many simple graphs are there?
37.4 Paths and cycles
37.5 Trees
37.6 Electrical circuits: the cutset method
37.7 Signal-flow graphs
37.8 Planar graphs
37.9 Further applications
Problems

A graph is a network or diagram composed of points, called nodes or vertices, joined
together by lines or edges, each of which has a vertex at each end. Figure 37.1
shows a graph which has four vertices {a, b, c, d} and six edges {ab, ab, ad, bd, bc,
cd}. Only one pair of vertices, a and c, is not joined in this graph, while a and b are
joined by two edges. Generally, it is not the shape of the graph which is important;
what is significant is the number of edges and the way they connect the vertices. The
terminology is unfortunate: graphs in this context should not be confused with
curves generated by functions as in Chapter 1. ‘Networks’ might be a more
appropriate term, but historical precedent is difficult to overturn. However, the
context usually fixes the meaning.

[Fig. 37.1 A graph with vertices a, b, c, d.]

37.1 Examples of graphs

Here are some practical examples of situations and objects which can be usefully
represented by graphs.
(i) Electrical circuits. Figure 37.2a shows an electrical circuit with four resistors
R1, R2, R3 and R4, an inductor L, and a voltage source V1. Each edge has just one
component, and the joins between components are the vertices (the term node is
frequently used in circuit theory) in the graph. Care has to be taken with the
definition of nodes (see Section 37.6): they are not necessarily where three or more
wires meet. This circuit has four vertices a, b, c, d, and it can be represented by the
graph in Fig. 37.2b if we are only interested in the links, not what they contain. The
presence of a line or edge between two nodes in the graph indicates that there is a
component between the nodes.

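[Fig. 37.2 (a) Circuit with resistors R1, R2, R3, R4, inductor L1 and voltage source v1, joined at nodes a, b, c, d; (b) its graph on vertices a, b, c, d.]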

Figure 37.3 shows another circuit with six vertices in which the boxes indicate
electrical components. The wires joining c to f and b to e cross over each other.
In the design of printed circuits, it is useful to know whether the circuit can be
redrawn so that no wires cross. Such a graph, with no edges crossing, is known
as a planar graph. The graph in Fig. 37.2 is planar, but the graph of the circuit in
Fig. 37.3 has no planar drawing: at least two edges will cross in any plane diagram
of it. We shall discuss this notion in Section 37.8.

[Fig. 37.3 Circuit with six vertices a–f; the wires joining c to f and b to e cross.]

(ii) Chemical molecules. Molecular diagrams look like candidates for graphs. The
molecule of ethanol can be represented by Fig. 37.4a. In its graph representation
in Fig. 37.4b, the vertices represent atoms and the edges represent bonds. The number of
bonds which meet at an atom is the valency of the atom. Thus carbon (C) has
valency 4, oxygen (O) valency 2, and hydrogen (H) valency 1. Generally in graphs,
the number of edges that meet at a vertex is known as the degree of the vertex.

[Fig. 37.4 Ethanol molecule: (a) structural diagram with two carbon atoms, one oxygen atom and six hydrogen atoms; (b) its graph.]

(iii) Road maps. Road maps and street plans are graphs with roads as edges and
junctions as vertices. However, most road networks include one-way streets.
Hence graphs need to be modified to indicate directions in which movement or
flow is permitted. Figure 37.5a shows a typical section of a street plan with some
one-way streets. We have to associate directions with the edges, as shown in the
graph of the plan in Fig. 37.5b. Note that two-way streets now have two directed
edges associated with them. This is an example of a directed graph, which is also
known by the shortened term digraph.

[Fig. 37.5 (a) Traffic flow in a road grid. (b) Digraph representation of the roads in (a).]
(iv) Shortest paths. Figure 37.6 shows a digraph with weights associated with
each edge. The graph could represent routes between towns S and T which pass
through intermediate towns A, B, … ; the weight associated with each directed
edge could stand for a distance or a time. This graph is shown as a digraph, but
weights could be present without directions in some cases. In this example we
might be interested in the shortest distance between the start (S) and the finish (T).

[Fig. 37.6 Weighted digraph from start S to finish T through intermediate vertices A, B, C, D, E, F.]

37.2 Definitions and properties of graphs
As we have seen, a graph is an object composed of vertices and edges with one
vertex at each end of every edge. An edge which joins a vertex to itself is known
as a loop. If two or more edges join the same two vertices then they are known as
multiple edges. A graph with no loops or multiple edges is known as a simple
graph. A graph with loops and/or multiple edges is known as a multigraph.
A graph in which every vertex can be reached from every other vertex along
a succession of edges is said to be connected. Otherwise the graph is said to be
disconnected. A connected graph is in one piece; a disconnected graph is in two
or more pieces.
The degree of a vertex x is the number of edges that meet there, denoted by
deg(x). If, in a graph G, all the vertices have the same degree r, then G is said to be
regular of degree r.

Example 37.1 Find the degree of the vertices in the graph in Fig. 37.1.
Three edges meet at the vertex a. Hence deg(a) = 3. Four edges meet at b. Hence
deg(b) = 4. Similarly, deg(c) = 2 and deg(d) = 3.

A simple graph in which every vertex is joined to every other vertex by just one
edge is called a complete graph (see also Section 37.8).
Figure 37.7 shows some examples of the various graphs described above.

[Fig. 37.7 (a) Connected simple graph. (b) Connected multigraph. (c) Disconnected multigraph. (d) Regular graph of degree 3. (e) Complete graph with five vertices: deg(a) = 4.]
Since every edge has a vertex at each end, it follows that the sum of all the vertex
degrees equals twice the number of edges. This is known as the handshaking
lemma. For example, from Example 37.1,

deg(a) + deg(b) + deg(c) + deg(d) = 3 + 4 + 2 + 3 = 12,

which is twice the number of edges in the graph shown in Fig. 37.1.
There are two immediate consequences of the handshaking lemma:
(i) the sum of all the vertex degrees in a graph is an even number;
(ii) the number of vertices of odd degree is even, since the degrees of the vertices of
even degree add up to an even number, so the odd degrees must also add up to an even
number, which requires an even number of them.
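The handshaking lemma is easy to check by computer once a graph is stored as a list of edges. In this sketch a multiple edge is simply listed twice. The edge list is that of Fig. 37.1 as given in the text; the function name is our own.

```python
from collections import Counter

# Edges of the graph in Fig. 37.1; ab is listed twice because a and b are joined by two edges.
edges = [("a", "b"), ("a", "b"), ("a", "d"), ("b", "d"), ("b", "c"), ("c", "d")]

def degrees(edge_list):
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1   # every edge adds 1 to the degree at each of its two ends
        deg[v] += 1
    return deg

deg = degrees(edges)
print(sorted(deg.items()))                   # [('a', 3), ('b', 4), ('c', 2), ('d', 3)]
print(sum(deg.values()) == 2 * len(edges))   # True: the handshaking lemma
odd = sorted(v for v, d in deg.items() if d % 2 == 1)
print(odd)                                   # ['a', 'd']: an even number of odd vertices
```

The degrees agree with Example 37.1. Since each edge contributes exactly 2 to the total, the lemma holds by construction for any edge list.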

Self-test 37.1
List the degrees of each vertex, as an increasing sequence, for each graph
in Fig. 37.7.

37.3 How many simple graphs are there?


Graphs can be described as labelled, in which case the vertices are distinguishable
as in Fig. 37.8a or unlabelled as in Fig. 37.8b. If we look at graphs with just three
vertices, there are eight labelled simple graphs as shown in Fig. 37.9, but there are
just four distinct unlabelled graphs as shown in Fig. 37.10. In Fig. 37.9, the three
labelled graphs with one edge will correspond to the one unlabelled graph in
Fig. 37.10.
The number of labelled simple graphs with n vertices is fairly easy to calculate.
Between any two vertices, there is the possibility of an edge. Any vertex can be
joined to n − 1 other vertices. Since counting in this way counts every edge twice,
there are $\tfrac{1}{2}n(n-1)$ possible edges. Each edge may be either present or absent.
Hence the number of possible combinations of present and absent edges is
$2^{n(n-1)/2}$, which is the number of labelled graphs (this number increases
extremely rapidly with n). Thus there are $2^{4(4-1)/2} = 2^6 = 64$ labelled graphs
with four vertices; these correspond to just 11 distinct unlabelled graphs, which are
shown in Fig. 37.11. Of the 11 unlabelled graphs it can be seen that six are connected
and four are regular.

[Fig. 37.8 (a) A labelled graph on vertices a, b, c, d; (b) an unlabelled graph.]
[Fig. 37.9 Labelled graphs with three vertices.]
[Fig. 37.10 Unlabelled graphs with three vertices.]
[Fig. 37.11 All unlabelled graphs with four vertices.]
For applications involving electrical circuits, the main interest is in connected
graphs. The numbers of the various categories of graphs up to n = 7 vertices are
given in Table 37.1.

Table 37.1

n                    1   2   3    4      5        6            7
Labelled graphs      1   2   8   64   1024   32 768    2 097 152
Unlabelled graphs    1   2   4   11     34      156        1 044
Connected graphs     1   1   2    6     21      112          853
Regular graphs       1   2   2    4      3        8            6

It can be seen from the table that the number of unlabelled graphs is much smaller
than the number of labelled graphs, and that regular graphs are comparatively rare.
The counting of unlabelled graphs does not follow from a simple formula.
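For small n the first four columns of Table 37.1 can be reproduced by brute force. The sketch generates every labelled graph and sorts the graphs into isomorphism classes. Each class gets a relabelling-invariant key: the smallest sorted edge list obtainable under any permutation of the vertices. This canonical-form trick is our own choice of method, not one from the text, and it is practical only up to about n = 5 because it tries all n! relabellings.

```python
from itertools import combinations, permutations

def labelled_count(n):
    """Number of labelled simple graphs on n vertices: 2^(n(n-1)/2)."""
    return 2 ** (n * (n - 1) // 2)

def all_graphs(n):
    """Every labelled simple graph on vertices 0..n-1, as a frozenset of edges."""
    possible = list(combinations(range(n), 2))
    for mask in range(2 ** len(possible)):
        yield frozenset(e for i, e in enumerate(possible) if mask >> i & 1)

def canonical(edges, n):
    """Smallest sorted edge list over all relabellings: equal keys mean isomorphic graphs."""
    best = None
    for p in permutations(range(n)):
        relabelled = sorted(tuple(sorted((p[u], p[v]))) for u, v in edges)
        if best is None or relabelled < best:
            best = relabelled
    return tuple(best)

def is_connected(edges, n):
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for x, y in edges:
            for a, b in ((x, y), (y, x)):
                if a == u and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return len(seen) == n

def is_regular(edges, n):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return len(set(deg)) == 1

def unlabelled_counts(n):
    """(unlabelled, connected, regular) counts, using one representative per class."""
    classes = {}
    for g in all_graphs(n):
        classes.setdefault(canonical(g, n), g)
    reps = list(classes.values())
    return (len(reps),
            sum(is_connected(g, n) for g in reps),
            sum(is_regular(g, n) for g in reps))

for n in range(1, 5):
    print(n, labelled_count(n), *unlabelled_counts(n))
```

For n = 4 this prints 4 64 11 6 4, matching Table 37.1 and the count of 11 unlabelled graphs in Fig. 37.11.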

Self-test 37.2

List all unlabelled simple graphs with five vertices. Indicate which graphs
are connected, and which are regular. What are the degrees of the regular
graphs?

37.4 Paths and cycles


Suppose we follow a succession of connected edges between two vertices a and z
in a graph, along which there may be repeated edges and vertices. This is known
as a walk between a and z. If all the edges walked are different (i.e. no edge is
covered more than once but vertices may be visited more than once), then the walk
defines what is known as a trail. A trail is said to be closed if the first and last
vertices are the same. If all the vertices on a trail are different, except possibly the
end pair, then the succession defines a path. A closed path is known as a cycle. For
example, in Fig. 37.12, a–f–b–c–d is a path between a and d, but a–b–f–e–b–c–d
is only a trail since vertex b is passed through twice. Also, a–b–c–d–e–f–a is an
example of a cycle.

[Fig. 37.12 Graph on vertices a, b, c, d, e, f.]
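The definitions of walk, trail, path and cycle differ only in what may be repeated, so a short function can classify a given sequence of vertices. This sketch assumes a simple graph, so that an edge is identified by its unordered pair of end vertices. It does not check that the edges actually exist in the graph.

```python
def classify(walk):
    """Classify a walk (sequence of vertices) as 'cycle', 'path', 'trail' or 'walk'.

    Assumes a simple graph, so each edge is identified by its unordered pair of ends.
    """
    edges = [frozenset(pair) for pair in zip(walk, walk[1:])]
    if len(set(edges)) < len(edges):
        return "walk"                        # some edge is used more than once
    closed = len(walk) > 1 and walk[0] == walk[-1]
    inner = walk[:-1] if closed else walk    # the end pair is allowed to coincide
    if len(set(inner)) == len(inner):        # no repeated vertex
        return "cycle" if closed else "path"
    return "trail"

for w in ("afbcd", "abfebcd", "abcdefa"):
    print(w, classify(w))                    # path, trail, cycle
```

The three sequences are the examples from Fig. 37.12 discussed above. They are written as strings because every vertex has a one-letter name.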

Example 37.2 Electrical circuits are usually such that every edge of their
representative graph is part of a cycle. List all the distinct cycles in the graph
of the circuit in Fig. 37.2a.
The graph of the circuit is repeated in Fig. 37.13. The complete list of cycles is:
3-edge cycles: a–b–c–a, a–b–d–a, a–d–c–a, b–d–c–b;
4-edge cycles: a–b–d–c–a, a–d–b–c–a, a–b–c–d–a.

[Fig. 37.13 The graph of the circuit in Fig. 37.2a, on vertices a, b, c, d.]
[Fig. 37.14 Graph on vertices a, b, c, d, e, f, g.]
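The list in Example 37.2 can be confirmed by brute-force enumeration. Two vertex sequences describe the same cycle exactly when they use the same set of edges, so each cycle is stored as its edge set. The listed cycles use every pair of a, b, c, d, so the sketch takes the graph of Fig. 37.13 to have all six such edges. The method tries every ordering of every vertex subset and is practical only for small graphs.

```python
from itertools import combinations, permutations

def distinct_cycles(vertices, edges):
    """Brute-force set of distinct cycles, each stored as its set of edges."""
    found = set()
    for k in range(3, len(vertices) + 1):
        for subset in combinations(vertices, k):
            first = subset[0]
            for order in permutations(subset[1:]):
                cycle = (first,) + order + (first,)
                cycle_edges = frozenset(frozenset(e) for e in zip(cycle, cycle[1:]))
                if cycle_edges <= edges:
                    found.add(cycle_edges)   # same edge set = same cycle
    return found

vertices = "abcd"
edges = {frozenset(p) for p in combinations(vertices, 2)}
cycles = distinct_cycles(vertices, edges)
counts = {}
for c in cycles:
    counts[len(c)] = counts.get(len(c), 0) + 1
print(sorted(counts.items()))   # [(3, 4), (4, 3)]
```

The result is four 3-edge cycles and three 4-edge cycles, as listed in the example.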

Some graphs have special closed-path and cycle properties. A connected graph
G is said to be eulerian if there exists a closed trail that includes every edge in G.
A connected graph G is said to be hamiltonian if there exists a cycle that includes
every vertex in G. The graph in Fig. 37.13 is hamiltonian but not eulerian. One
hamiltonian cycle in its graph is a–b–d–c–a. Note that this cycle does not have to
cover every edge in the graph.
The graph in Fig. 37.14 is both eulerian and hamiltonian. An eulerian trail is
a–b–c–d–e–f–g–e–c–g–b–f–a,
and a hamiltonian cycle is
a–b–c–d–e–g–f–a.
It can be shown that a connected graph is eulerian if and only if every vertex
has even degree. This provides an easy test for the eulerian property of a graph.
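The even-degree test is immediate to code. To actually find a closed trail through every edge, a standard method is Hierholzer's algorithm, which is not described in the text. The sketch below implements both. The edge list for Fig. 37.14 is read from the eulerian trail quoted above, which uses every edge once. The graph of Fig. 37.13 has every pair of a, b, c, d joined, so all four of its vertices have degree 3.

```python
from collections import Counter, defaultdict

def is_eulerian(edges):
    """Even-degree test; the graph is assumed to be connected."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return all(d % 2 == 0 for d in deg.values())

def eulerian_trail(edges, start):
    """Hierholzer's algorithm: a closed trail from `start` using every edge once."""
    adj = defaultdict(list)                 # vertex -> [(neighbour, edge index), ...]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()                    # drop edges already walked from the other end
        if adj[u]:
            v, i = adj[u].pop()
            used[i] = True
            stack.append(v)                 # keep walking along unused edges
        else:
            trail.append(stack.pop())       # stuck: this vertex is final in the trail
    return trail[::-1]

def uses_every_edge_once(trail, edges):
    walked = Counter(frozenset(p) for p in zip(trail, trail[1:]))
    return walked == Counter(frozenset(e) for e in edges)

fig_37_14 = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g"),
             ("g", "e"), ("e", "c"), ("c", "g"), ("g", "b"), ("b", "f"), ("f", "a")]
fig_37_13 = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]

print(is_eulerian(fig_37_14), is_eulerian(fig_37_13))   # True False
trail = eulerian_trail(fig_37_14, "a")
print("-".join(trail), uses_every_edge_once(trail, fig_37_14))
```

The trail found may differ from the one quoted in the text, because an eulerian graph usually has many eulerian trails.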

37.5 Trees
A connected graph which has no cycles is known as a tree. An example of a tree is
shown in Fig. 37.15. The edges in a tree are called branches.
Suppose that a graph G consists of the set V(G) of vertices and the set E(G) of
edges. Then any graph whose vertices and edges are subsets of V(G) and E(G)
respectively is called a subgraph. It is important to note that the subgraph must
be a graph whose vertices and edges come from G; and only edges that join two
vertices of the subgraph are permitted in the subset of E(G).
[Fig. 37.15 An example of a tree.]

Suppose that G is a connected graph.

A spanning tree of G is a subgraph of G which is a tree and includes all vertices
of G.

Figure 37.16a shows a connected graph G and Fig. 37.16b shows a spanning tree
of G. Graphs can have many different spanning trees. The set of edges that are not
part of the spanning tree (the broken edges in Fig. 37.16b) is known as the cotree,
and its edges are called links.

[Fig. 37.16 (a) Connected graph. (b) The same graph with a spanning tree.]

Construct a tree from a vertex by adding edges. Each edge added must introduce
a new vertex, since otherwise a cycle would be created and the graph would no
longer be a tree. A tree with two vertices has one edge, a tree with three vertices
has two edges, and so on. Hence a tree with n vertices has exactly n − 1
branches. It follows that a connected graph with n vertices and e edges has a cotree
with e − (n − 1) = e − n + 1 links.
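The growth argument above is exactly what a breadth-first search does: an edge is kept as a branch only when it reaches a new vertex. The sketch below is one way, among many, to build a spanning tree, and the function name is our own. It confirms the counts n − 1 and e − n + 1 on the graph of Fig. 37.14, whose edges are read from the eulerian trail quoted in Section 37.4.

```python
from collections import deque

def spanning_tree(vertices, edges):
    """Breadth-first spanning tree of a connected graph.

    Returns (branches, links): indices into `edges` of the tree and cotree edges.
    """
    adj = {v: [] for v in vertices}
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    root = vertices[0]
    seen, queue, branches = {root}, deque([root]), []
    while queue:
        u = queue.popleft()
        for v, i in adj[u]:
            if v not in seen:              # edge reaches a new vertex: keep it as a branch
                seen.add(v)
                branches.append(i)
                queue.append(v)
    if len(seen) != len(vertices):
        raise ValueError("the graph is not connected")
    tree = set(branches)
    links = [i for i in range(len(edges)) if i not in tree]
    return branches, links

vertices = list("abcdefg")
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g"),
         ("g", "e"), ("e", "c"), ("c", "g"), ("g", "b"), ("b", "f"), ("f", "a")]
branches, links = spanning_tree(vertices, edges)
n, e = len(vertices), len(edges)
print(len(branches), n - 1)     # 6 6
print(len(links), e - n + 1)    # 6 6
```

A depth-first search would serve equally well. Different search orders simply produce different spanning trees of the same graph.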
We now introduce the cutset: a minimal set of edges whose removal disconnects a
graph into two subgraphs which together contain all the original vertices.

Cutset
In a connected graph, a cutset is a set of edges (a) whose removal disconnects
the graph into two subgraphs and (b) no proper subset of the cutset disconnects
the graph.
[Fig. 37.17 A cutset of a graph on vertices a–f: (a) the broken line C1 cuts the edges ba, bf, bc; (b) the broken line C2 cuts a larger set of edges of the same graph.]

A proper subset of the cutset is a subset that is not the whole cutset, so condition (b)
means that there must be no redundancy in the cutset. Thus, for example, in
Fig. 37.17a the broken line C1, which removes the edges ba, bf, and bc, defines a
cutset {ba, bf, bc}, since {b} and {a, c, d, e, f} are disconnected subgraphs. C2 in
Fig. 37.17b does not define a cutset, since its proper subset {ab, bf, bc} already
disconnects the graph.
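The two conditions in the definition of a cutset can be tested directly. For condition (b) it is enough to restore one edge at a time: if some proper subset disconnected the graph, then so would every larger set obtained by restoring a single edge not in that subset. The sketch uses the graph of Fig. 37.14 again, because the edges of Fig. 37.17 cannot be recovered here. Edges are referred to by their index in the list.

```python
def stays_connected(vertices, edges, removed):
    """True if the graph is still connected after deleting the edges indexed by `removed`."""
    adj = {v: [] for v in vertices}
    for i, (u, v) in enumerate(edges):
        if i not in removed:
            adj[u].append(v)
            adj[v].append(u)
    start = vertices[0]
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def is_cutset(vertices, edges, candidate):
    candidate = set(candidate)
    # (a) removing the whole set must disconnect the graph
    if stays_connected(vertices, edges, candidate):
        return False
    # (b) no proper subset may disconnect it: restoring any single edge must reconnect
    return all(stays_connected(vertices, edges, candidate - {i}) for i in candidate)

vertices = list("abcdefg")
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g"),
         ("g", "e"), ("e", "c"), ("c", "g"), ("g", "b"), ("b", "f"), ("f", "a")]
print(is_cutset(vertices, edges, {0, 11}))      # True: {ab, fa} isolates a
print(is_cutset(vertices, edges, {0, 11, 1}))   # False: bc is redundant
print(is_cutset(vertices, edges, {2, 3}))       # True: {cd, de} isolates d
```

The second call mirrors C2 in Fig. 37.17b: the set disconnects the graph, but it contains a smaller set that already does so.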

Self-test 37.3
(a) What are the degrees of the vertices in the spanning tree in Fig. 37.16(b)?
Design a spanning tree with vertex degrees {1, 1, 2, 2, 2}. (b) Indicate span-
ning trees of the graph in Fig. 37.14 with vertex degrees (i) {1, 1, 2, 2, 2, 2, 2},
(ii) {1, 1, 1, 1, 2, 2, 4}.

37.6 Electrical circuits: the cutset method


In this section we give a brief description of the representation of circuits by
graphs, and show how Kirchhoff’s laws can be applied to cutsets of the resulting
graphs. Figure 37.18a shows a plan of a circuit with seven resistors, a voltage sup-
ply, and two capacitors. This particular circuit has 10 components and 10 edges.

[Fig. 37.18 A circuit and its graph: (a) circuit with resistors R1–R7, voltage supply V and capacitors C1, C2, with joins labelled A, B, C, D, E; (b) its graph.]
Note that A is a vertex, or node (the preferred term in circuits), but the joins
B, C, and D are not separate nodes: they can be coalesced into a single node.
The equivalent graph is shown in Fig. 37.18b: it has five nodes and 10 edges. Note
that it is a multigraph with two nodes joined by two edges and two nodes joined
by four edges.
A loop in the circuit is a cycle in the graph.
Kirchhoff’s laws have already been stated in eqn (21.8), but for convenience they
are given again here in graph terms. They state (i) that the algebraic sum of the
voltages around any loop is zero, and (ii) that the algebraic sum of the currents
entering any node is zero.
In addition, for resistors we also have Ohm’s law which states that the voltage
across a resistor is directly proportional to the current flowing through it, that is
v∝i or v = Ri,
where the constant R is measured in units called ohms (Ω). Figure 37.19 shows a
circuit with two independent maintained current sources iX and iY: the symbol of
the circle enclosing an arrow represents a maintained current in the direction
of the arrow.

The corresponding six-node digraph with currents i1, i2, … , i8 in the directions
indicated is shown in Fig. 37.20. If any current turns out to be negative then its
direction will be opposite to that shown.

[Fig. 37.19 Circuit with resistors R1–R8 and maintained current sources iX, iY.]
[Fig. 37.20 Its six-node digraph, with nodes a–f and currents i1, …, i8.]

Now introduce nodal voltages va, vb, … , vf as shown in Fig. 37.21. The use
of nodal voltages means that Kirchhoff’s first law is automatically satisfied.
The earthing at e makes ve = 0, and the other voltages are measured relative to
this zero ground potential.
This circuit has 13 unknowns: 8 currents and 5 nodal voltages. The difficulty
with circuits is the selection of the minimum number of consistent equations
from Kirchhoff’s laws and Ohm’s law sufficient to determine the unknowns.
The graph of this circuit is the same as that in Fig. 37.16a, and we shall use the
same spanning tree as shown in Fig. 37.16b. In this graph, the number of nodes n is
6 and the number of edges e is 10. Hence, from the previous section, the spanning
tree has n − 1 = 5 branches and the cotree has e − n + 1 = 10 − 6 + 1 = 5 links.
Any cutset of the original graph which contains one and only one branch of the
spanning tree (the rest of the cutset consisting of links) is known as a fundamental
cutset of the circuit. Since each of the five branches defines one such cutset, we can
associate five fundamental cutsets with the spanning tree in Fig. 37.16b. These
cutsets C1, C2, … , C5 are shown in Fig. 37.22.

[Fig. 37.21 The digraph with nodal voltages va, vb, vc, vd, vf and ve = 0.]
[Fig. 37.22 Fundamental cutsets C1, …, C5.]
By repeated application of Kirchhoff’s second law to the nodes on one side of a
cutset, it follows that the algebraic sum of the currents crossing the cutset must be
zero. Hence the five cutset equations are:
C1: i1 − i3 + iX = 0, (37.1)
C2: i1 − i3 + i4 + i5 + i7 − i8 = 0, (37.2)
C3: i1 − i2 + i7 − i8 = 0, (37.3)
C4: i6 − i5 − i7 + i8 = 0, (37.4)
C5: iY − i8 = 0. (37.5)

These equations must be independent since each one contains a current from a
branch of the spanning tree which does not appear in any other equation. Further
any non-fundamental cutset equation will be a linear combination of the five
fundamental cutset equations. The number of branches in the spanning tree
defines the number of independent equations.
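Fundamental cutsets can be generated mechanically. Deleting one branch splits the spanning tree into two pieces, and the fundamental cutset is every graph edge that joins the two pieces. The sketch below makes two assumptions because Figs 37.20 and 37.22 are not reproduced here. First, the resistor edges are read from Ohm's law (37.6)–(37.13), while the ends of the sources (iX joining a and b, iY joining d and e) are inferred from the cutset equations. Second, the branches are taken to be the five currents that each occur in exactly one of (37.1)–(37.5). Under these assumptions the computed edge sets coincide with those of C1, …, C5.

```python
# Assumed graph of Fig. 37.20: resistor edges from (37.6)-(37.13); source ends inferred.
edges = {
    "i1": ("b", "c"), "i2": ("c", "f"), "i3": ("b", "f"), "i4": ("f", "a"),
    "i5": ("f", "e"), "i6": ("a", "e"), "i7": ("c", "e"), "i8": ("c", "d"),
    "iX": ("a", "b"), "iY": ("d", "e"),
}
# Assumed spanning-tree branches: the currents occurring in exactly one cutset equation.
branches = ["iX", "i4", "i2", "i6", "iY"]

def reachable(start, allowed):
    """Nodes reachable from `start` along the edges named in `allowed`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for name in allowed:
            x, y = edges[name]
            for p, q in ((x, y), (y, x)):
                if p == u and q not in seen:
                    seen.add(q)
                    stack.append(q)
    return seen

def fundamental_cutset(branch):
    """Delete one branch from the tree; the cutset is every edge joining the two pieces."""
    side = reachable(edges[branch][0], [b for b in branches if b != branch])
    return sorted(name for name, (u, v) in edges.items() if (u in side) != (v in side))

for k, b in enumerate(branches, start=1):
    print(f"C{k}:", fundamental_cutset(b))
```

The output runs from C1: ['i1', 'i3', 'iX'] to C5: ['i8', 'iY']. Each list contains exactly one branch, which is why the five equations are independent.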
We can also apply Ohm’s law to each resistor (note that current flows from
high to low potential). Thus the voltage difference across R1 is vc − vb, so that
i1 = (vc − vb)/R1. (37.6)

Similarly
i2 = (vf − vc)/R2, (37.7)
i3 = (vb − vf)/R3, (37.8)
i4 = (vf − va)/R4, (37.9)
i5 = vf/R5, (37.10)
i6 = −va/R6, (37.11)
i7 = vc/R7, (37.12)
i8 = (vd − vc)/R8. (37.13)

We can now substitute for the currents from (37.6) to (37.13) into (37.1) to (37.5)
resulting in five linear equations to determine the nodal voltages va , vb , vc , vd , vf in
terms of the known currents iX and iY . The remaining currents can then be
calculated from (37.6) to (37.13).

Example 37.3 Using the cutset method, find all currents and nodal voltages in
the circuit shown in Fig. 37.23.

[Fig. 37.23 Circuit with R1 = ½ Ω, R2 = 3 Ω, R3 = 1 Ω, R4 = 2 Ω, R5 = 2 Ω and current sources iX = 2 A, iY = 1 A.]

The circuit can be represented by a graph with five nodes (Fig. 37.24), with the
currents i1, i2, i3, i4, i5 in the directions shown.

[Fig. 37.24 Graph of the circuit, with nodes a, b, c, d and e (ve = 0), currents i1, …, i5 and sources iX, iY.]
[Fig. 37.25 Fundamental cutsets.]

A spanning tree, whose cotree contains three links, is shown in Fig. 37.25 together with the fundamental cutsets C1, C2, C3, C4. Hence Kirchhoff’s second law implies:
C1: i1 − i3 + i2 = 0, (37.14)
C2: iX − i3 + i2 = 0, (37.15)
C3: −iY + i5 − i3 + i2 = 0, (37.16)
C4: −iY + i4 + i2 = 0. (37.17)
With ve = 0, the currents in terms of the nodal voltages va, vb, vc, vd are, by Ohm’s law:
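$$
\begin{aligned}
i_1 &= (v_a - v_b)/R_1 = 2(v_a - v_b), &\qquad& (37.18)\\
i_2 &= (v_c - v_b)/R_2 = \tfrac{1}{3}(v_c - v_b), && (37.19)\\
i_3 &= (v_b - v_d)/R_3 = v_b - v_d, && (37.20)\\
i_4 &= (v_c - v_d)/R_4 = \tfrac{1}{2}(v_c - v_d), && (37.21)\\
i_5 &= v_d/R_5 = \tfrac{1}{2}v_d. && (37.22)
\end{aligned}
$$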
Eliminate the currents in (37.14) to (37.17) using (37.18) to (37.22):
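$$
\begin{aligned}
2v_a - \tfrac{10}{3}v_b + \tfrac{1}{3}v_c + v_d &= 0, &\qquad& (37.23)\\
\tfrac{4}{3}v_b - \tfrac{1}{3}v_c - v_d &= 2, && (37.24)\\
-\tfrac{4}{3}v_b + \tfrac{1}{3}v_c + \tfrac{3}{2}v_d &= 1, && (37.25)\\
-\tfrac{1}{3}v_b + \tfrac{5}{6}v_c - \tfrac{1}{2}v_d &= 1. && (37.26)
\end{aligned}
$$

These are linear equations which can be solved using the methods of Chapter 12. Computer algebra is also very useful in solving sets of equations of this type (see the computer algebra applications for Chapter 12 in Chapter 42). Adding (37.24) and (37.25) gives ½vd = 3, so that vd = 6 V. The remaining equations then give
va = 9 V, vb = 8 V, vc = 8 V, vd = 6 V,
and hence, from (37.18) to (37.22),
i1 = 2 A, i2 = 0, i3 = 2 A, i4 = 1 A, i5 = 3 A.
As a check, subtracting (37.15) from (37.16) gives i5 = iX + iY = 3 A. Since vc = vb, no current flows through the resistor on bc.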
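The four equations can be checked with exact rational arithmetic. This sketch uses plain Python with `fractions.Fraction`. The helper name `gauss_solve` is our own, not the book's, and it assumes the system has a unique solution. It solves (37.23) to (37.26) by Gaussian elimination with back substitution, then recovers the branch currents from (37.18) to (37.22).

```python
from fractions import Fraction as F

# Coefficients of (va, vb, vc, vd) and right-hand sides of (37.23)-(37.26).
A = [
    [F(2), F(-10, 3), F(1, 3), F(1)],
    [F(0), F(4, 3), F(-1, 3), F(-1)],
    [F(0), F(-4, 3), F(1, 3), F(3, 2)],
    [F(0), F(-1, 3), F(5, 6), F(-1, 2)],
]
b = [F(0), F(2), F(1), F(1)]


def gauss_solve(a, rhs):
    """Exact Gaussian elimination with back substitution (assumes a unique solution)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [F(0)] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


va, vb, vc, vd = gauss_solve(A, b)
# Ohm's law (37.18)-(37.22) then gives the branch currents.
i1, i2, i3, i4, i5 = 2 * (va - vb), (vc - vb) / 3, vb - vd, (vc - vd) / 2, vd / 2
print(va, vb, vc, vd)
print(i1, i2, i3, i4, i5)
```

It gives va = 9, vb = 8, vc = 8, vd = 6 (volts) and i1 = 2, i2 = 0, i3 = 2, i4 = 1, i5 = 3 (amperes). The current returning to the earthed node e through R5 is therefore iX + iY = 3 A.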
We can summarize the result for an earthed circuit which contains only resistors and current sources. Suppose that the representative graph of the circuit contains n nodes and e edges, of which f contain known current sources. The circuit will have e − f unknown currents and n − 1 unknown nodal voltages, giving e − f + n − 1 unknowns in total. Its spanning tree will have n − 1 edges, which lead to n − 1 fundamental cutset equations, and Ohm’s law will apply to the e − f resistors. Hence we always have as many equations, e − f + n − 1, as there are unknowns.
This result can be extended to circuits with current sources, voltage sources (batteries), and resistors. Suppose the representative graph has n nodes and e edges, of which f contain current sources and s contain maintained voltage sources. The number of unknown currents will then be e − f. The number of unknown nodal voltages will be n − 1 − s, since the nodal voltage difference across a battery is known. Hence the number of unknowns is e − f + n − 1 − s. These unknowns satisfy n − 1 cutset equations and e − f − s applications of Ohm’s law, again as many equations as unknowns.
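As a concrete check of the first count (no voltage sources), take Example 37.3. Its graph has n = 5 nodes and e = 7 edges, f = 2 of which carry the current sources iX and iY:

$$
\underbrace{(e - f) + (n - 1)}_{\text{unknowns}} = 5 + 4 = 9,
\qquad
\underbrace{(n - 1)}_{\text{cutset equations}} + \underbrace{(e - f)}_{\text{Ohm's law}} = 4 + 5 = 9.
$$

These are the four cutset equations (37.14) to (37.17) and the five Ohm’s-law relations (37.18) to (37.22).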

37.7 Signal-flow graphs


Figure 37.26 shows a block diagram of a negative-feedback control system. The input into the system is P(s) and the output Q(s). All operations are defined by their transfer functions (see Section 25.4). The boxes represent devices or controllers. The circle represents a summing operator, and the sign on the return signal F(s) indicates positive or negative feedback. The output signal Q(s) is fed back to the input through H(s). Here the feedback is negative, so it reduces the output. A device with positive feedback is considered in a later problem. Thus the input into G(s) is
A(s) = P(s) − F(s). (37.27)

The boxes each produce outputs given by the transfer functions


Q(s) = G(s)A(s), (37.28)

F(s) = H(s)Q(s). (37.29)

[Fig. 37.26 Negative-feedback control system: P(s) and the fed-back signal F(s) are combined at a summing junction to give A(s), which passes through G(s) to give Q(s); Q(s) is fed back through H(s) to give F(s).]

[Fig. 37.27 Block-reduced diagram for Fig. 37.26: a single block G(s)/[1 + G(s)H(s)] from P(s) to Q(s).]

[Fig. 37.28 A multiple-feedback control system: blocks G1(s), G2(s), G3(s) in series from P(s) to Q(s), with feedback through H1(s) and H2(s).]

We wish to find Q(s) in terms of P(s), G(s), and H(s) from equations (37.27) to (37.29). Thus, from (37.28),
Q(s) = G(s)A(s) = G(s)[P(s) − F(s)] = G(s)[P(s) − H(s)Q(s)].
Collecting the terms in Q(s) gives [1 + G(s)H(s)]Q(s) = G(s)P(s). Hence the output is

$$
Q(s) = \frac{G(s)}{1 + G(s)H(s)}\,P(s),
$$

where G(s)/[1 + G(s)H(s)] is the closed-loop transfer function. The actual signal can be obtained by finding the inverse Laplace transform of Q(s). Hence the system is equivalent to that shown in Fig. 37.27.
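When G(s) and H(s) are ratios of polynomials in s, the closed-loop transfer function can be formed mechanically. The following is a minimal Python sketch, not a standard routine. The function names, and the representation of a polynomial as a list of coefficients with the highest power of s first, are assumptions made here. Writing G = g_num/g_den and H = h_num/h_den, it returns the numerator and denominator of G/(1 + GH) = g_num·h_den/(g_den·h_den + g_num·h_num). Common factors are not cancelled.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest power of s first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add coefficient lists (highest power first) by right-aligning them."""
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + list(p)
    q = [0] * (n - len(q)) + list(q)
    return [a + b for a, b in zip(p, q)]


def closed_loop(g_num, g_den, h_num, h_den):
    """Return (numerator, denominator) of G/(1 + G H) for rational G and H."""
    num = poly_mul(g_num, h_den)
    den = poly_add(poly_mul(g_den, h_den), poly_mul(g_num, h_num))
    return num, den
```

For example, G(s) = 1/(s + 1) with H(s) = 2 gives `closed_loop([1], [1, 1], [2], [1]) == ([1], [1, 3])`, that is, Q(s) = P(s)/(s + 3).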
If the feedback reinforces the input signal, it is called positive feedback. Figure 37.28 shows a multiple-feedback control system with one positive and one negative feedback. The output signal is given by

$$
Q(s) = \frac{G_1(s)G_2(s)G_3(s)}{1 - G_2(s)H_1(s) + G_1(s)G_2(s)G_3(s)H_2(s)}\,P(s), \tag{37.30}
$$

which can be obtained by the method of block-diagram reduction. For example, the feedback through H1 makes the system equivalent to that shown in Fig. 37.29. Combining the devices in series then reduces the system to the negative-feedback control system considered at the beginning of this section. The details are omitted here.

[Fig. 37.29 First stage in the block reduction of the multiple-feedback control system: G1(s), G2(s)/[1 − G2(s)H1(s)], and G3(s) in series, with negative feedback through H2(s).]
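Formula (37.30) can be checked without any block reduction, by writing down the signal equations directly and solving them. This sketch assumes, as the signal-flow graph of Fig. 37.30 makes explicit, that H1(s) feeds the output of G2(s) positively back to its input. It also assumes that H2(s) feeds Q(s) negatively back to the input of G1(s). The transfer functions are replaced by arbitrary illustrative numbers, standing for their values at one fixed s. These values and the helper `solve` are our own assumptions.

```python
from fractions import Fraction as F


def solve(a, b):
    """Exact Gauss-Jordan elimination; assumes the system has a unique solution."""
    n = len(b)
    m = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]          # scale the pivot row to 1
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [row[n] for row in m]


# Illustrative values standing for G1(s), ..., H2(s) at one fixed s.
G1, G2, G3, H1, H2, P = F(2), F(3), F(1, 2), F(1, 10), F(2, 5), F(1)

# Unknowns (x1, x2, x3, q): x1 enters G1, x2 enters G2, x3 leaves G2, q = Q(s).
a = [
    [1, 0, 0, H2],      # x1 = P - H2*q       (negative feedback through H2)
    [-G1, 1, -H1, 0],   # x2 = G1*x1 + H1*x3  (positive feedback through H1)
    [0, -G2, 1, 0],     # x3 = G2*x2
    [0, 0, -G3, 1],     # q  = G3*x3
]
q = solve(a, [P, 0, 0, 0])[3]
formula = G1 * G2 * G3 * P / (1 - G2 * H1 + G1 * G2 * G3 * H2)  # eqn (37.30)
print(q, formula)
```

With these values, both routes give Q = 30/19.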

This block-reduction method can become quite complicated for a complex feedback system. Instead of using block reduction in this way, we can represent the system by a weighted digraph, as shown in Fig. 37.30. Here the weights are the transfer functions, except that the edges representing the input and output are assigned weight 1, since they carry no devices. Also, the negative feedback is replaced by −H2(s), to make sure that it reduces the input into G1(s). This is the signal-flow graph of the system.

[Fig. 37.30 Signal-flow graph for the multiple-feedback control system shown in Fig. 37.28, with nodes x1, x2, x3, x4, forward edges 1, G1(s), G2(s), G3(s), 1 from P(s) to Q(s), and feedback edges H1(s) and −H2(s).]

Let the inputs into the nodes be x1, x2, x3, and x4 as shown. Then, for the positive-feedback cycle,
x3 = G2x2, x2 = G1x1 + H1x3.
(The argument (s) has now been dropped from the working.) Hence

$$
x_3 = \frac{G_1 G_2 x_1}{1 - G_2 H_1}.
$$
In other words, we can replace (a) by (b) in Fig. 37.31.

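[Fig. 37.31 (a) The cycle formed by G1 from x1 to x2, G2 from x2 to x3, and H1 from x3 back to x2; (b) the equivalent single edge G1G2/(1 − G2H1) from x1 to x3.]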

There are other rules. A complete list of the replacements for subgraphs in the graph now follows.
(a) Multiple edges. See Fig. 37.32. This follows since
x2 = Gx1 + Hx1 = (G + H)x1.

[Fig. 37.32 Multiple edges: edges G and H from x1 to x2 are replaced by the single edge G + H.]

(b) Edges in series. See Fig. 37.33. This follows since


x3 = Hx2 = H(Gx1) = HGx1.

[Fig. 37.33 Edges in series: edges G from x1 to x2 and H from x2 to x3 are replaced by the single edge GH from x1 to x3.]
(c) Cycles. See Fig. 37.34. This follows since
x3 = Hx2 and x2 = Gx1 + Jx3,
so that x3 = GHx1/(1 − HJ). Assume that HJ ≠ 1; otherwise there is infinite gain.

[Fig. 37.34 Cycle: the edges G from x1 to x2 and H from x2 to x3, with the return edge J from x3 to x2, are replaced by the single edge GH/(1 − HJ) from x1 to x3.]

(d) Loops. See Fig. 37.35. This follows since
x2 = Gx1 + Hx2, so that x2 = Gx1/(1 − H),
with H ≠ 1.
(e) Stems. See Fig. 37.36. This follows since
x2 = Gx1, x3 = Hx2 = HGx1, x4 = Jx2 = JGx1.
Apply these rules to the successive reduction of the feedback system in Fig. 37.30. The sequence of steps in the reduction of the signal-flow graph to a single-edge graph is shown in Fig. 37.37. The weight of the final edge agrees with the output in eqn (37.30).

[Fig. 37.35 Loop: the edge G from x1 to x2, with a loop H at x2, is replaced by the single edge G/(1 − H).]

[Fig. 37.36 Stem: the edge G from x1 to x2, followed by edges H from x2 to x3 and J from x2 to x4, is replaced by edges GH from x1 to x3 and GJ from x1 to x4.]

[Fig. 37.37 Successive steps in the reduction of the signal-flow graph of the control system shown in Fig. 37.28: rule (c) replaces the H1 cycle by G1G2/(1 − H1G2); rule (b) combines this with G3 to give G1G2G3/(1 − H1G2); rule (c), applied with the feedback −H2, gives the single edge G1G2G3/(1 − H1G2 + G1G2G3H2) from P(s) to Q(s).]
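Rules (a) to (d) translate directly into small functions. In this minimal sketch the function names are our own. Transfer functions are again replaced by numerical values at a fixed s, represented exactly by `Fraction`. `reduce_fig_37_30` follows the three steps of Fig. 37.37. Rule (e) only applies the series product along each branch, so it is not coded separately.

```python
from fractions import Fraction as F


def parallel(g, h):
    """Rule (a): multiple edges G and H between the same two nodes -> G + H."""
    return g + h


def series(*weights):
    """Rule (b): edges in series -> the product of their weights."""
    result = F(1)
    for w in weights:
        result *= w
    return result


def cycle(g, h, j):
    """Rule (c): G into x2, H from x2 to x3, J back from x3 to x2 -> GH/(1 - HJ)."""
    if h * j == 1:
        raise ZeroDivisionError("HJ = 1: infinite gain")
    return g * h / (1 - h * j)


def loop(g, h):
    """Rule (d): G into x2 with a self-loop H at x2 -> G/(1 - H)."""
    if h == 1:
        raise ZeroDivisionError("H = 1: infinite gain")
    return g / (1 - h)


def reduce_fig_37_30(g1, g2, g3, h1, h2):
    """Follow the three reduction steps of Fig. 37.37 (output edge 1 changes nothing)."""
    inner = cycle(g1, g2, h1)          # rule (c): the positive-feedback H1 cycle
    forward = series(inner, g3)        # rule (b): combine with G3
    return cycle(1, forward, -h2)      # rule (c): input edge 1 and feedback -H2


G1, G2, G3, H1, H2 = F(2), F(3), F(1, 2), F(1, 10), F(2, 5)
print(reduce_fig_37_30(G1, G2, G3, H1, H2))
```

With G1 = 2, G2 = 3, G3 = ½, H1 = 1/10, H2 = 2/5, the reduction returns 30/19, the same value that (37.30) gives.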
Essentially, the operations in a signal-flow graph are those applied to a weighted digraph, as illustrated in the following example.

Example 37.4 Find the output–input relation in the signal-flow graph shown in Fig. 37.38.
Applying rule (a) to the multiple edge and rule (c) to the cycle reduces the graph to Fig. 37.39. Applying the series rule to the divided edges gives Fig. 37.40. Finally, the multiple-edge and series rules give Fig. 37.41. Thus the output is given by

$$
q = \left[\frac{abd}{1 - bc} + he(g + f)\right] p.
$$

In the actual control system a, b, c, … will be transfer functions.

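[Fig. 37.38 Signal-flow graph from input p to output q, with edge weights a, b, c, d, e, f, g, h, and 1. Fig. 37.39 After rules (a) and (c): edges ab/(1 − bc), d, h, g + f, e, and 1. Fig. 37.40 After the series rule: edges abd/(1 − bc), h(g + f)e, and 1. Fig. 37.41 The single edge abd/(1 − bc) + he(g + f) from p to q.]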

Self-test 37.4
Suppose that c is in the opposite direction in the signal-flow graph Fig. 37.38.
Find the new output–input relation.

37.8 Planar graphs


As we remarked in Section 37.1, planar graphs are important in circuit design
since planar circuits can be manufactured as a single board. A planar graph is
a graph that can be drawn with no edges crossing or meeting except at vertices.
The standard example of a simple application which cannot be represented by a
planar graph is the delivery of three services, water (W), gas (G), and electricity (E),
to three houses A, B, C (Fig. 37.42). This graph has no plane drawing. The reorgan-
ization of the graph in Fig. 37.43 shows the impossibility of this; if W and C are
connected last then this edge must cross either AE or BG.
[Fig. 37.42 Bipartite graph K3,3, joining the services W, G, E to the houses A, B, C. Fig. 37.43 The graph of Fig. 37.42 reorganized.]

The graph in Fig. 37.42 is an example of a bipartite graph. In such a graph the vertices are split into two sets, and a vertex of one set may be joined to vertices of the other set but not to vertices in the same set. If every vertex in one set is joined by one edge to every vertex in the other set, then the graph is called a complete bipartite graph. If the sets have m and n vertices respectively, then the complete bipartite graph is denoted by Km,n. Figure 37.42 shows the graph K3,3, and this graph is not planar. Check that the graphs K2,2 and K2,3 are planar.
In planar graphs there is a relation between the numbers of vertices, edges, and
faces. In a plane drawing of a graph, the plane is divided into regions called faces.
One face is the region external to the graph. Figure 37.44 shows a planar graph
with five vertices and seven edges, and with four faces: A, B, C, and the external
face D.
A remarkable formula, due to Euler, links the numbers of vertices, edges, and faces of a connected planar graph.

Theorem (Euler). Suppose that the connected graph G has a planar drawing, and let v be the number of vertices, e the number of edges, and f the number of faces of G. Then
v − e + f = 2.

Proof. For the graph G, construct a spanning tree (see, for example, Fig. 37.45). The spanning tree contains all v vertices of G and v − 1 edges (see Section 37.5). Since it contains no cycles, it also has just one face. Since
v − (v − 1) + 1 = 2,
Euler’s formula holds for the spanning tree. Now restore the remaining edges of G one at a time. Each added edge joins two vertices already in the drawing and divides an existing face into two. So e and f each increase by 1, and v − e + f is unchanged. Hence
v − e + f = 2
for the reconstructed graph G.
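As a quick check, the planar graph of Fig. 37.44 has v = 5 vertices, e = 7 edges, and f = 4 faces (A, B, C, and the external face D), so that

$$
v - e + f = 5 - 7 + 4 = 2.
$$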
[Fig. 37.44 A planar graph with five vertices a, b, c, d, e, seven edges, and four faces A, B, C, D. Fig. 37.45 A graph with a spanning tree.]

The complete graph with n vertices is denoted by Kn. Every vertex is joined to the other n − 1 vertices, and each edge joins two vertices, so Kn has ½n(n − 1) edges. The graphs K2, K3, K4, and K5 are shown in Fig. 37.46. Of these graphs, K2, K3, and K4 are planar, but K5 and all succeeding complete graphs are not.

[Fig. 37.46 The complete graphs Kn for n = 2, 3, 4, 5.]

The graphs K3,3 and K5 are the keys to tests for the planarity of graphs. These tests decide, for example, whether a plane printed-circuit board can be designed to make the required connections between electronic components. Kuratowski proved in 1930 that every non-planar graph contains a subgraph which is either K3,3 or K5, or is K3,3 or K5 with additional vertices inserted on its edges.
Further discussion of graph theory with many applications can be found in the
introductory text by Wilson and Watkins (1990).

Self-test 37.5
(a) A regular dodecahedron has 12 faces (pentagons) and 30 edges. How
many vertices does it have?
(b) An icosahedron has 20 faces (triangles) and 12 vertices. How many edges
does it have?

37.9 Further applications



Braced frameworks
Consider a frame which consists of four struts in the shape of a rectangle (Fig. 37.47a) with pin joints at each corner. Without a diagonal tie the structure will not support a vertical load, but will collapse into a parallelogram as shown in Fig. 37.47b. The structure can be made rigid and load-bearing by inserting a diagonal strut, as in Fig. 37.48.

[Fig. 37.47 Single unbraced pin-jointed frame: (a) the rectangular frame under load; (b) the frame collapsed into a parallelogram. Fig. 37.48 Braced frame.]

Consider now a pin-jointed framework of m × n rectangular frames with some individual frames braced. How can we decide whether a particular framework is braced, that is, whether no part of it can be sheared? And if it is braced, how many ties could be removed to leave a minimum bracing? The framework is similar to a vertical section of scaffolding or a steel-framed building. In both of these cases the joints are bolted, but they can still need bracing to ensure rigidity.
Figure 37.49 shows a 5 × 6 framework with 11 braces as shown (braces can be
diagonal struts in either direction). Label the cell rows r1, r2, … , r5 and the cell
columns c1, c2, … , c6 as shown in Fig. 37.49. The framework will be represented
by a bipartite graph (see Section 37.8) with the cell rows and columns as vertices.
Arrange them in rows as shown in Fig. 37.50.

[Fig. 37.49 5 × 6 framework with cell rows r1, …, r5 and cell columns c1, …, c6. Fig. 37.50 Bipartite graph of the framework, with vertices r1, …, r5 and c1, …, c6; its cycle is marked.]
If a particular rectangular cell is braced, then its row and column vertices are joined by an edge. Thus the cell r1c1 is braced, so an edge joins r1 and c1 in the bipartite graph. No edge joins r1 and c3, since this cell is not braced. The bipartite graph representing the framework is shown in Fig. 37.50. If the graph is connected, then the framework is braced, since the shearing of any cell or group of cells is then not possible. The graph is connected in this case, and so the framework is braced.

Can any braces be removed in such a way that the framework is still braced? Any brace which is removed must not disconnect the graph. If the graph contains a cycle (Section 37.4), then removing any one edge of the cycle will not disconnect the graph. This removal rule can be applied to each cycle in the graph. If, at the end of this process, no cycles remain and the graph is still connected, then the framework is said to have a minimum bracing. The framework graph in Fig. 37.50 contains just one cycle, namely r1 c1 r3 c3 r4 c6 r2 c2 r1 (see Fig. 37.49). Any edge can be removed from this cycle, leaving a minimum bracing. The removal of any further edges will disconnect the graph.
If every cell in a framework is braced, then the bipartite graph will be complete, and the framework will be seriously overbraced. Note that the complete bipartite graph Km,n has mn edges. A minimum bracing for an m × n framework, however, has m + n − 1 edges, the number of edges in a spanning tree on the m + n vertices (Section 37.5). For example, if m = 5 and n = 6 then mn = 30 whilst m + n − 1 = 10.
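The connectivity test and the count of removable ties can be automated. The sketch below is one possible approach, not the book's. It builds the bipartite row–column graph and tracks its components with a union–find structure. A minimum bracing is then a spanning tree of that graph, with m + n − 1 edges. The function name `bracing_status` and the (row, column) input format are assumptions.

```python
def bracing_status(m, n, braces):
    """Classify an m x n pin-jointed framework from its braced cells.

    braces: iterable of (row, col) pairs, 1-based, one per braced cell.
    Row vertex ri is index i-1; column vertex cj is index m+j-1.
    Returns (is_braced, surplus), where surplus is the number of ties that
    can be removed while leaving a minimum bracing (0 if not braced).
    """
    parent = list(range(m + n))          # union-find over the m + n vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = m + n
    edges = 0
    for i, j in set(braces):             # duplicate entries count once
        edges += 1
        a, b = find(i - 1), find(m + j - 1)
        if a != b:
            parent[a] = b
            components -= 1
    braced = components == 1
    # A minimum bracing is a spanning tree of the bipartite graph: m + n - 1 edges.
    surplus = edges - (m + n - 1) if braced else 0
    return braced, surplus
```

For a 5 × 6 framework with every cell braced, it reports `(True, 20)`, since 30 − 10 = 20 ties are surplus.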
Figure 37.51 shows an unbraced 4 × 5 framework, its (disconnected) graph, and
the same framework sheared.

[Fig. 37.51 An unbraced framework: the 4 × 5 framework with cell rows r1, …, r4 and columns c1, …, c5, its bipartite graph, and the sheared framework.]

Phasing of traffic signals


Figure 37.52 shows a road junction with eight incoming lanes of traffic and a one-way exit. Suppose that each lane can be controlled by its own individual signal. One solution for traffic management would be to give each lane a green signal in turn, with all the remaining lanes on red. This would be inefficient, however, since several lanes of traffic can clearly move simultaneously without risk. How can an efficient phasing of the signals be designed?
Label each incoming lane a, b, c, …, h as shown, and let these be the vertices of a graph (Fig. 37.53). Starting, say, with a, we decide which traffic lanes are compatible with a, that is, which lanes can also have green lights simultaneously without risk of a collision. Thus a and b are compatible, and we therefore join a and b by an edge. Lanes a and c are also compatible, but a and e are not, and so on. The graph G in Fig. 37.53 shows which lanes are compatible, and is known as the compatibility graph for this junction.

[Fig. 37.52 Road junction with incoming lanes a, …, h. Fig. 37.53 Compatibility graph G for the junction.]


We now look for complete subgraphs (Section 37.8) in G. An edge is a complete subgraph (K2), a triangle (K3) is a complete subgraph with three vertices, K4 has four vertices, and so on. We try to use the largest subgraphs in any covering of G, that is, in any list of subgraphs which includes all the vertices. In G, abcd, abdf, and abfg are K4 subgraphs, and there are a large number of triangles. For example, we can cover G by the set of subgraphs
{abcd, abfg, def, fgh}.
Generally, we include as many large subgraphs as possible. In this list it is better to use fgh rather than just gh. The edge gh alone would complete the covering, since f is already included in other subgraphs.
Suppose that the period of the traffic signal sequence is T seconds, with each lane having a green light for at least ⅕T. There are four different traffic flows, represented by the subgraphs. Suppose that each subgraph list of lanes has a green light for ¼T. The green/red phasing sequence is shown in Table 37.2. The actual phasing lane by lane is shown in Fig. 37.54, where the solid line indicates the green light for a lane. For example, between ¼T and ½T, lanes a, b, f, g are on green with the others on red.

Table 37.2

                 Subgraph
Time        abcd    abfg    def     fgh
0–¼T        green   red     red     red
¼T–½T       red     green   red     red
½T–¾T       red     red     green   red
¾T–T        red     red     red     green

[Fig. 37.54 Traffic phasing: green periods (solid lines) for lanes a, …, h, with the phases abcd, abfg, def, fgh switching at ¼T, ½T, and ¾T. Fig. 37.55 The same phases with switching times on a grid of ⅕T.]

The total waiting time for the traffic at the junction is a measure of the efficiency of the timings and phases. Let ta, tb, tc, … be the waiting times of the lanes. From Fig. 37.54, we can see that ta = ½T, tb = ½T, tc = ¾T, etc. Hence the total waiting time WT is given by

$$
W_T = t_a + t_b + \cdots + t_h = \tfrac{1}{2}T + \tfrac{1}{2}T + \tfrac{3}{4}T + \tfrac{1}{2}T + \tfrac{3}{4}T + \tfrac{1}{4}T + \tfrac{1}{2}T + \tfrac{3}{4}T = \tfrac{9}{2}T.
$$

Can the waiting time be reduced within the time constraints, either by choosing a different set of subgraphs to cover G or by choosing a different sequence of timings? Figure 37.55 shows the same choice of subgraphs but with different timings. The result is a slightly shorter waiting time of 22/5 T.
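The waiting-time calculation can be written out directly: a lane waits for T minus the total length of the phases that include it. In this minimal sketch, the function name and the representation of each phase as a (lanes, duration) pair, with durations in units of T, are our own assumptions. It reproduces the total for Table 37.2.

```python
from fractions import Fraction as F


def total_waiting_time(lanes, phases):
    """Sum over lanes of the red time per cycle, in units of the period T.

    phases: list of (lanes on green, duration as a fraction of T).
    A lane waits for T minus the total duration of the phases containing it.
    """
    if sum(d for _, d in phases) != 1:
        raise ValueError("phase durations must add up to the period T")
    total = F(0)
    for lane in lanes:
        green = sum((d for group, d in phases if lane in group), F(0))
        total += 1 - green
    return total


quarter = F(1, 4)
table_37_2 = [("abcd", quarter), ("abfg", quarter), ("def", quarter), ("fgh", quarter)]
print(total_waiting_time("abcdefgh", table_37_2))  # 9/2, i.e. W_T = 9T/2

# One assumed alternative timing (not necessarily the timing of Fig. 37.55):
fifth = F(1, 5)
alternative = [("abcd", 2 * fifth), ("abfg", fifth), ("def", fifth), ("fgh", fifth)]
print(total_waiting_time("abcdefgh", alternative))  # 22/5
```

The alternative timing gives abcd ⅖T and each other phase ⅕T, so every lane still has at least ⅕T of green. It gives 22/5 T, the same shorter total as the one quoted for Fig. 37.55.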

Problems

37.1 (Section 37.2). Write down the degree of each vertex in the graph in Fig. 37.56.

[Fig. 37.56 A graph with vertices a, b, c, d, e.]

37.2 (Section 37.2). Draw the complete graph with six vertices. How many edges does it have?

37.3 (Section 37.2). Sketch the 21 connected unlabelled graphs with five vertices. How many of them are planar?

37.4 Sketch the eight regular graphs with six vertices. How many of them are connected?

37.5 The adjacency matrix of a graph G with no loops is a vertex–vertex matrix. The element in the ith row and jth column is 0 if vertices i and j are not joined by an edge, and r if i and j are joined by r edges. Thus, if we list the vertices a, b, c, d as 1, 2, 3, 4 respectively, then the adjacency matrix of the graph in Fig. 37.1 is

$$
A = \begin{bmatrix} 0 & 2 & 0 & 1\\ 2 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 1 & 1 & 1 & 0 \end{bmatrix}.
$$

Note that the leading diagonal has zeros if there are no loops. The adjacency matrix is a complete description of the graph. Evaluate A². What is the interpretation of the matrix in terms of the edges of G?

37.6 Draw the graphs defined by the following adjacency matrices:

$$
\text{(a)}\ A = \begin{bmatrix} 0&1&1&1&1\\ 1&0&1&1&1\\ 1&1&0&1&1\\ 1&1&1&0&1\\ 1&1&1&1&0 \end{bmatrix},
\qquad
\text{(b)}\ A = \begin{bmatrix} 0&2&0&0\\ 2&0&1&1\\ 0&1&0&1\\ 0&1&1&0 \end{bmatrix}.
$$

37.7 Write down the adjacency matrices of the graphs in Fig. 37.7. Note that a single loop introduces an element 1 into the appropriate position on the leading diagonal. What characterizes the matrix of a disconnected graph?

37.8 (Section 37.4). How many different cycles pass through a single vertex in a complete graph with four vertices?

37.9 (Section 37.4). List all trails between vertices a and f in the graph shown in Fig. 37.57. Identify which trails in the list are also paths.

[Fig. 37.57]

37.10 (Section 37.4). Is the graph in Fig. 37.57 eulerian? If it is, find an eulerian closed trail. Is it hamiltonian?

37.11 (Section 37.5). Construct a spanning tree for the graph shown in Fig. 37.57. Draw its cotree. Show that there is a spanning tree in which no vertex has degree more than two.

37.12 Figure 37.58 shows a graph with seven vertices.
(a) Decide whether the graph is eulerian.
(b) Construct a spanning tree for the graph. How many branches does the tree have?
(c) Draw a cutset which disconnects the vertices a, b, g, f from the vertices c, d, e.

[Fig. 37.58 A graph with seven vertices a, b, c, d, e, f, g.]

37.13 Figure 37.59 shows a digraph. How many trails are there between a and e? Which of them are also paths? Can you find a four-edge cycle?

[Fig. 37.59]

37.14 (Section 37.6). Figure 37.60 shows a circuit with an independent current source i0. Represent the circuit by a graph. How many vertices does the graph have?

[Fig. 37.60 Circuit with resistors R1, …, R7 and current source i0.]

37.15 (Section 37.6). A circuit is represented by the graph shown in Fig. 37.61. The current i0 is from an independent source, and every other edge contains a resistor: the current i1 passes through a resistor R1, and so on. Define a spanning tree for the graph. How many fundamental cutsets are required? Write down the current equations associated with each of the cutsets. If i1 = 2 A, a maintained current, vc = 0 (earthed), and Rk = 1 Ω for k = 0, 1, 2, … , 7, find the remaining voltages va, vb, vd, ve.

[Fig. 37.61 Graph with vertices a, b, c, d, e and edge currents i0, …, i7.]

37.16 (Section 37.6). Figures 37.62a,b show two circuits with current sources and resistors. Use the cutset method to find the nodal voltages and the currents through the resistors.

[Fig. 37.62 (a) Circuit with R1 = 3 Ω, R2 = 2 Ω, R3 = 1 Ω, R4 = 1 Ω, R5 = 2 Ω, R6 = 1 Ω and current sources iX = 1 A, iY = 2 A, iZ = 2 A. (b) Circuit with R1 = 1 Ω, R2 = 2 Ω, R3 = 3 Ω, R4 = 1 Ω, R5 = 2 Ω, R6 = 1 Ω, R7 = 1 Ω, R8 = 1 Ω and current sources iX = 1 A, iY = 2 A.]

37.17 Complete the block-reduction method for the multiple-feedback control system shown in Fig. 37.28.

37.18 (Section 37.7). Figure 37.63 shows a positive-feedback control system. If P(s) is the system input, find its output Q(s), and the transfer function of a single equivalent device.

[Fig. 37.63 Positive-feedback control system with input P(s), block G(s), feedback H(s), and output Q(s).]

37.19 (Section 37.7). Find the outputs in the systems shown in Figs 37.64a,b by progressively replacing parts of the system by equivalent devices until just one device remains. Find the transfer function of the resulting equivalent single device.

[Fig. 37.64 (a) Blocks G1(s), G2(s), G3(s) in series from P(s) to Q(s), with feedback through H1(s) and H2(s); (b) as (a), with a further feedback through H3(s).]

37.20 (Section 37.7). Reduce each of the signal-flow graphs in Figs 37.65a,b,c,d to an equivalent single edge, and (e) to a stem, and find the transfer function in each case.

[Fig. 37.65 Signal-flow graphs (a)–(e).]

37.21 (Section 37.8). Label the edges, vertices, and faces of the graphs shown in Figs 37.66a,b and verify Euler’s formula.

[Fig. 37.66 Graphs (a) and (b).]

37.22 (Section 37.8). Show that the bipartite graph K2,3 has a planar representation.

37.23 (Section 37.8). The complete graph K5 does not have a plane drawing. What is the minimum number of edge crossings in a plane representation of the graph?

37.24 (Section 37.1). List all the paths between S and T in the network given in Fig. 37.6, and hence find the shortest and longest paths. (This method of simply listing all paths can become very extensive for larger networks: efficient algorithms are really required to reduce the number of calculations.)

37.25 (Section 37.9). Show that the framework in Fig. 37.67 is overbraced. How many ties can be removed to leave a minimum bracing?

[Fig. 37.67]

37.26 (Section 37.9). How many ties will be needed to secure a minimum bracing for the framework shown in Fig. 37.68? Draw in a suitable set of ties for a minimum bracing.

[Fig. 37.68]

37.27 (Section 37.9). Decide whether the frameworks shown in Fig. 37.69 are overbraced, have a minimum bracing, or are not braced.

[Fig. 37.69 Frameworks (a), (b), and (c).]

37.28 (Section 37.9). The framework in Fig. 37.69c is to be strengthened so that it is overbraced, with each diagonal tie an edge in at least one cycle of the associated bipartite graph. What is the minimum number of ties which must be added?

37.29 Figure 37.70 shows a junction with eight distinct lanes of traffic, each controlled by a separate traffic signal. This is really a ‘design and solve’ problem. Here is one model: of the doubtful cases, assume that lane a is compatible with both c and e, and that e is compatible with h. Draw the compatibility graph for this junction. List all complete subgraphs with four and three vertices. Suppose the period of the traffic signal cycle is T, and the subgraphs
{abef, cdg, aeh}
are chosen, each allowed green for ⅓T. Calculate the total waiting time. Suppose instead that the subgraph abef runs for ½T and the others for ¼T each. How does this affect the total waiting time?

[Fig. 37.70 Road junction with lanes a, …, h.]


38 Difference equations

CONTENTS

38.1 Discrete variables
38.2 Difference equations: general properties
38.3 First-order difference equations and the cobweb
38.4 Constant-coefficient linear difference equations
38.5 The logistic difference equation
Problems

In many applications, functions can only take discrete values – that is, they cannot (for various reasons) take a continuous spectrum of values. It is reasonable to
model the temperature in a room by a function which varies continuously with
time – most of the calculus in this book is concerned with such functions. On the
other hand, the population size of a country can only take integer values. As
births and deaths occur, the population size is discontinuous in time, and the
graph of population size against time will be a step function. Between births and
deaths the population number will be constant so that we are only concerned
with changes which take place at these events. In this problem jumps occur at
variable time intervals.
We can obtain discrete data from a continuous signal or function by sampling
the signal at regular time steps rather than keeping a continuous record. This is
often the situation in microprocessor-driven operations.
The progress of events is often described in the form of equations linking several
successive events: so-called difference equations. The reader may notice analogies
between the solutions of these and the solutions of differential equations.

38.1 Discrete variables


Let us start by considering a simple financial application which generates discrete
values. In compound interest the sum of £P0 is invested in an account to which
interest accrues annually at a compound rate of 100I%. If £P1 is the amount in
the account at the end of the first year, then
P1 = (1 + I)P0. (38.1)
Let £Pn be the sum after n years. Then, similarly,
Pn = (1 + I)Pn−1. (38.2)

This is an example of a difference equation or recurrence relation. It gives the values of Pn at the integer values n = 1, 2, … in terms of the immediately preceding value. Treating n as the variable, the step in n is 1 in this case. The notation P(n) instead of Pn is often used to emphasize the function aspect of P, but we have chosen the more economical subscript form Pn.
It is fairly easy to solve (38.2) by repeated application of the formula, starting with (38.1). Thus

$$
P_2 = (1 + I)P_1 = (1 + I)^2 P_0, \qquad P_3 = (1 + I)P_2 = (1 + I)^3 P_0,
$$

and so the formula

$$
P_n = (1 + I)^n P_0 \tag{38.3}
$$

holds at least for values of n up to 3. Suppose that (38.3) holds for n = k. Then (38.2) implies that

$$
P_{k+1} = (1 + I)P_k = (1 + I)^{k+1} P_0.
$$

So the same formula holds for Pk+1. Hence, if the result is true for k then it is also true for k + 1. Equation (38.1) confirms that it is true for k = 1. It follows sequentially that it is true for n = 2, n = 3, and so on. (This method of proof is known as induction.)

Example 38.1 £1000 is invested for 5 years at the following rates: (a) 5% annually; (b) 5/12% calendar monthly; (c) 5/365% daily (ignoring leap years). Calculate the final amount in the account in each case.
In each case the formula is

$$
P_n = (1 + I)^n P_0,
$$

with P0 = 1000, but I and n differ.
(a) This is the original problem with n = 5 and I = 0.05. Hence

$$
P_5 = (1 + 0.05)^5 \times 1000 = 1.05^5 \times 1000 = 1276.28
$$

(in £, to the nearest penny).
(b) This account has 12 compounding periods each year, giving a total of 60 over the 5 years. Hence we require

$$
P_{60} = \left(1 + \frac{0.05}{12}\right)^{60} \times 1000 = 1283.36.
$$

(c) For the daily rate, there are 365 × 5 = 1825 compounding periods. Thus we require

$$
P_{1825} = \left(1 + \frac{0.05}{365}\right)^{1825} \times 1000 = 1284.00.
$$

There is a slight gain as the number of compounding periods increases.
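The three cases can be reproduced by iterating the recurrence (38.2) step by step and comparing with the closed form (38.3). This is a minimal sketch in ordinary floating point. The function name `compound` is our own.

```python
def compound(p0, rate, periods):
    """Iterate P_n = (1 + I) P_{n-1} from P_0 = p0, with I = rate per period."""
    p = p0
    for _ in range(periods):
        p = (1 + rate) * p
    return p


cases = [("annual", 0.05, 5), ("monthly", 0.05 / 12, 60), ("daily", 0.05 / 365, 1825)]
for label, rate, periods in cases:
    iterated = compound(1000, rate, periods)
    closed_form = 1000 * (1 + rate) ** periods      # formula (38.3)
    print(f"{label:8s} {iterated:.2f} {closed_form:.2f}")
```

Both columns print 1276.28, 1283.36 and 1284.00, the amounts found in Example 38.1.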
The following financial application, a loan repayment, leads to a difference equation.

The general mortgage problem is as follows. An amount £P is borrowed for a period of N years at an interest rate of i% per annum. As a fraction, I = i/100, so the interest is £I per year per pound of debt. Repayment is made by N equal payments £A, one at the end of every year, starting at the end of the first year. Each payment A has two constituents. One part pays the interest on the debt that was carried during the previous year. The rest is used for capital repayment to reduce future debt. Given P, N, and I, we want to know the regular annual repayment A required to clear the debt exactly at the end of year N. (Other mortgage models in use calculate interest daily or monthly. The method above can be adapted to these cases by taking N to be the number of payment periods and I the interest rate per period.)
The nth payment A is made at the end of year n, after which the debt outstanding is denoted by un. The payment A comprises
(interest owed on un−1 through year n) + (a capital repayment).
Therefore
A = Iun−1 + (un−1 − un) or un = −A + (1 + I)un−1, (38.4)
where n = 1, 2, … , N, u0 = P, and the constant A is to be chosen so that the final payment clears the debt, that is, uN = 0.
The difference equation can be solved by step-by-step use of the recurrence relation (38.4). In general, writing β = 1 + I for brevity,
un = −A + βun−1.
Start with u0 = P, and calculate the sequence u1, u2, u3, … , uN:

$$
\begin{aligned}
u_1 &= -A + \beta P,\\
u_2 &= -A + \beta u_1 = -A + \beta(-A + \beta P) = -A(1 + \beta) + \beta^2 P,\\
u_3 &= -A + \beta u_2 = -A + \beta\{-A(1 + \beta) + \beta^2 P\} = -A(1 + \beta + \beta^2) + \beta^3 P,
\end{aligned}
$$

and so on; the rule for subsequent terms of the sequence is clear. Use eqn (1.34) to sum the N terms of the emerging geometric series in β; then

$$
u_N = -A(1 + \beta + \beta^2 + \cdots + \beta^{N-1}) + \beta^N P = -\frac{A(\beta^N - 1)}{\beta - 1} + \beta^N P.
$$

Using the condition uN = 0, and β − 1 = I, we find that

$$
A = \frac{I(1 + I)^N P}{(1 + I)^N - 1}.
$$

Example 38.2 The sum of £50 000 is borrowed over 25 years, to be repaid in equal instalments, the interest on the outstanding balance in any year being 8%. Find the annual repayments over the term of the loan.
In the notation above, P = £50 000, I = 0.08, N = 25. Therefore the annual repayment, to the nearest penny, is

$$
A = \frac{I(1 + I)^N P}{(1 + I)^N - 1} = \frac{0.08 \times 1.08^{25} \times 50\,000}{1.08^{25} - 1} = \text{£}4683.94.
$$

The total repayment over 25 years is
NA = 25A = £117 098.47.
The capital repayment included in A at the end of year 1 is only
u0 − u1 = A − IP = £683.94,
which indicates how interest payments predominate in the early years of the mortgage.
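The repayment formula and the recurrence (38.4) can be checked against each other numerically. This minimal sketch uses our own function names and floating-point arithmetic. It computes A from the formula, then runs the recurrence for N years to confirm that the debt is cleared.

```python
def annual_repayment(principal, rate, years):
    """A = I (1 + I)^N P / ((1 + I)^N - 1), the payment that makes u_N = 0."""
    growth = (1 + rate) ** years
    return rate * growth * principal / (growth - 1)


def outstanding_debt(principal, rate, payment, years):
    """Iterate u_n = -A + (1 + I) u_{n-1} from u_0 = P; return [u_0, ..., u_N]."""
    u = [principal]
    for _ in range(years):
        u.append(-payment + (1 + rate) * u[-1])
    return u


P, I, N = 50_000, 0.08, 25
A = annual_repayment(P, I, N)
u = outstanding_debt(P, I, A, N)
print(f"A = {A:.2f}, total = {N * A:.2f}, first capital repayment = {u[0] - u[1]:.2f}")
print(f"final balance u_N = {u[-1]:.6f}")   # zero up to rounding error
```

The first line prints A = 4683.94, total = 117098.47 and first capital repayment = 683.94, matching Example 38.2.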

38.2 Difference equations: general properties


Any equation of the form
un = f(un−1, un−2, … , un−m), (38.5)
where m ≥ 1 is an integer, holding for a consecutive sequence of integers n which may or may not terminate, is known as a difference equation. The term discrete dynamical system is also frequently used. Thus
$$
\begin{aligned}
u_n &= 2u_{n-1} + 2, &\qquad& (38.6)\\
u_n &= 3u_{n-1} + 2u_{n-2} + n^2, && (38.7)\\
u_{n+1} &= k u_n(1 - u_n) && (38.8)
\end{aligned}
$$

are examples of difference equations.


The number $m$ in (38.5) is known as the order of the difference equation: it is the difference between the largest and smallest subscripts attached to $u$, namely
$$n - (n - m) = m.$$
Thus (38.6) and (38.8) are first-order difference equations, while (38.7) is second-order. The sequence of integers attached to $u$ can be translated (i.e. any integer can be added to the index $n$) without affecting the difference equation. The difference equation
$$u_{n+2} = 3u_{n+1} + 2u_n + (n + 2)^2$$
is the same as (38.7): $n$ has been replaced by $n + 2$ throughout, although the limits on $n$ change.
Given initial conditions, the successive terms are very easy to compute. For a first-order difference equation, we can assume that $u_0$ is given, but it could be any term, say $u_r$, which is taken as the initial condition. Generally, our aim is to find a sequence $\{u_n\}$, and a formula for $u_n$ for $n \geq r$, which satisfies the difference equation.

The difference equation (38.8) (which is known as the logistic equation) with $k = 2$ is
$$u_{n+1} = 2u_n(1 - u_n). \tag{38.9}$$
Suppose that we put $u_0 = \tfrac14$; then the sequence
$$u_1 = \tfrac{3}{8},\quad u_2 = \tfrac{15}{32},\quad u_3 = \tfrac{255}{512},\quad u_4 = \tfrac{65\,535}{131\,072},\ \ldots$$
follows by successive substitution. This sequence of numbers is actually approaching the value $\tfrac12$ as $n$ increases. We can sketch the sequence by plotting its terms as discrete points at the integer values of $n$ on the usual cartesian axes. The series of dots in Fig. 38.1 is a graphical representation of the sequence.

[Fig. 38.1 Iterations of the sequence $u_{n+1} = 2u_n(1 - u_n)$ with $u_0 = \tfrac14$: the points $u_0, u_1, \ldots, u_4$ plotted against $n$, approaching the level $u_n = \tfrac12$.]

The implied limiting value of $u_n$ as $n \to \infty$ for this particular sequence suggests that $u_n = \tfrac12$ is a constant solution of the difference equation (38.9), and this can be confirmed. We can find all constant solutions by simply putting $u_n = u$ for all $n$. From (38.9), the constant solutions are given by
$$u = 2u(1 - u), \quad\text{or}\quad 2u^2 - u = 0,$$
which implies that $u = 0$ and $u = \tfrac12$ are solutions. These are also known as the fixed points or equilibrium values of the difference equation.
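The terms quoted above can be reproduced exactly with Python's standard `fractions.Fraction` type, which avoids any rounding. The sketch (the helper name `logistic_orbit` is ours) also prints the gaps $\tfrac12 - u_n$. Since $\tfrac12 - 2u(1 - u) = 2(u - \tfrac12)^2$, each gap is twice the square of the previous one, which is why the approach to $\tfrac12$ is so rapid.

```python
from fractions import Fraction

def logistic_orbit(u0, k, steps):
    """Exact iterates u_0, ..., u_steps of u_{n+1} = k u_n (1 - u_n)."""
    orbit = [Fraction(u0)]
    for _ in range(steps):
        u = orbit[-1]
        orbit.append(k * u * (1 - u))
    return orbit

orbit = logistic_orbit(Fraction(1, 4), 2, 4)
print(orbit[1:])
# [Fraction(3, 8), Fraction(15, 32), Fraction(255, 512), Fraction(65535, 131072)]
print([Fraction(1, 2) - u for u in orbit])   # gaps 1/4, 1/8, 1/32, 1/512, 1/131072
```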

Fixed points or equilibrium values
For any first-order difference equation $u_{n+1} = f(u_n)$, its fixed points are given by solutions of
$$u = f(u). \tag{38.10}$$

You might notice, by trial computation, that the solutions of (38.9) vary qualitatively with the initial value $u_0$. If $0 < u_0 < 1$, then $u_n$ appears to approach $\tfrac12$ as $n$ becomes large, but, if $u_0 > 1$ or $u_0 < 0$, then $u_n$ becomes unbounded for large $n$. We shall discuss the logistic equation further in Section 38.5.
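A trial computation of this kind takes only a few lines. In this sketch the function `iterate`, the 60-step limit and the cut-off `bound` are arbitrary choices of ours: an orbit is reported as $\pm\infty$ once $|u_n|$ exceeds the bound, and otherwise the last iterate is returned.

```python
import math

def iterate(u0, steps=60, bound=1e6):
    """Iterate u_{n+1} = 2 u_n (1 - u_n); return the last iterate, or ±inf once |u_n| > bound."""
    u = u0
    for _ in range(steps):
        u = 2 * u * (1 - u)
        if abs(u) > bound:
            return math.copysign(math.inf, u)   # treat the orbit as unbounded
    return u

for u0 in (0.1, 0.5, 0.9, 1.2, -0.1):
    print(u0, iterate(u0))
```

Starting values inside $(0, 1)$ settle on $0.5$, while $u_0 = 1.2$ and $u_0 = -0.1$ run off towards $-\infty$.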
For second-order difference equations, the same process gives equilibrium values. For example, if
$$u_{n+2} - 2u_{n+1} + 4u_n = 6,$$
then this equation has an equilibrium value obtained by putting $u_{n+2} = u_{n+1} = u_n = u$, so that
$$u - 2u + 4u = 6 \quad\text{or}\quad u = 2.$$
On the other hand, the second-order difference equation (38.7) has no equilibrium values, since
$$u - 3u - 2u - n^2 = -4u - n^2$$
can never be zero for constant $u$ and all $n$.
Self-test 38.1
The sum of £100 000 is borrowed over a 25 year term at an annual interest rate of 6.5%. Find the annual repayment, assuming that the interest rate remains the same throughout. At the end of 5 years, the interest rate is increased to 7%. To what should the annual repayments be increased in order to repay the outstanding loan over the remaining 20 years?

38.3 First-order difference equations and the cobweb


An alternative method of representing solutions of difference equations graphically is the cobweb construction. Consider the first-order difference equation
$$u_{n+1} = f(u_n) = \tfrac12 u_n + 1.$$
The equation has a fixed point or equilibrium value where
$$u = \tfrac12 u + 1, \quad\text{so that}\quad u = 2.$$
With this in view, plot the lines $y = x$ and $y = \tfrac12 x + 1$ (Fig. 38.2) in the $x,y$ plane. These straight lines intersect at $x = y = 2$, which corresponds to the fixed point. Select an initial value, say $u_0 = \tfrac12$, and represent it by the point $P_0 : (u_0, 0) = (\tfrac12, 0)$ in the $x,y$ plane. From the difference equation
$$u_1 = \tfrac12 u_0 + 1 = \tfrac12 \cdot \tfrac12 + 1 = \tfrac54.$$
We can represent this by the point $P_1 : (u_0, u_1) = (\tfrac12, \tfrac54)$ in Fig. 38.2. Join $P_0$ to $P_1$, and then $P_1$ to $Q_1 : (u_1, u_1) = (\tfrac54, \tfrac54)$ on the line $y = x$. Now join $Q_1$ to $P_2 : (u_1, u_2) = (\tfrac54, \tfrac{13}{8})$ on $y = \tfrac12 x + 1$. Repeat the process by drawing lines between $y = x$ and $y = \tfrac12 x + 1$ using the same rules.


The usefulness of the method is that a graphical representation and interpretation of the solutions can be achieved by simple line drawings, as shown in Fig. 38.2 and in the following example. It is particularly helpful for finding fixed points and assessing their stability. The connected lines are known as cobwebs for obvious reasons. We can observe that this difference equation has only one fixed point, at $(2, 2)$, which is stable, since all cobwebs approach the point from any initial point.

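[Fig. 38.2 Cobweb for $u_{n+1} = \tfrac12 u_n + 1$: the lines $y = x$ and $y = \tfrac12 x + 1$ meet at the fixed point $(2, 2)$; the path $P_0 P_1 Q_1 P_2 \ldots$ starts from $P_0$ on the $x$ axis.]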
For a general difference equation $u_{n+1} = f(u_n)$, the cobweb construction takes place between the straight line $y = x$ and the curve $y = f(x)$.
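The construction is mechanical enough to automate. The sketch below uses our own helper, `cobweb` (not from the text), which returns the vertices $P_0, P_1, Q_1, P_2, Q_2, \ldots$ in order. Joining consecutive vertices with straight segments, in whatever plotting tool you prefer, draws the cobweb. The usage line reproduces the points found for Fig. 38.2.

```python
def cobweb(f, u0, n):
    """Vertices P0, P1, Q1, P2, ..., Qn-1, Pn of the cobweb for u_{k+1} = f(u_k)."""
    pts = [(u0, 0.0), (u0, f(u0))]      # P0 on the x axis, then up to P1 = (u0, u1)
    u = u0
    for _ in range(n - 1):
        u_next = f(u)
        pts.append((u_next, u_next))      # across to Q on the line y = x
        pts.append((u_next, f(u_next)))   # vertically to P on the curve y = f(x)
        u = u_next
    return pts

print(cobweb(lambda x: 0.5 * x + 1, 0.5, 2))
# [(0.5, 0.0), (0.5, 1.25), (1.25, 1.25), (1.25, 1.625)]
```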

Example 38.3 Sketch a cobweb solution for
$$u_{n+1} = -ku_n + k,$$
for (a) $k = \tfrac12$, (b) $k = \tfrac32$, (c) $k = 1$, using the initial value $u_0 = \tfrac34$ in each case.

[Fig. 38.3 Cobweb for $u_{n+1} = -\tfrac12 u_n + \tfrac12$ with $u_0 = \tfrac34$.]
[Fig. 38.4 Cobweb for $u_{n+1} = -\tfrac32 u_n + \tfrac32$ with $u_0 = \tfrac34$.]
[Fig. 38.5 Cobweb for $u_{n+1} = -u_n + 1$ with $u_0 = \tfrac34$.]

(a) Plot the lines $y = x$ and $y = -\tfrac12 x + \tfrac12$. They intersect at the fixed point $(\tfrac13, \tfrac13)$. Starting from $P_0 : (\tfrac34, 0)$, the cobweb traces $P_0 P_1 Q_1 P_2 Q_2 P_3 \ldots$ in Fig. 38.3. Evidently it approaches the fixed point as $n \to \infty$, indicating stability.
(b) The lines are $y = x$ and $y = -\tfrac32 x + \tfrac32$. The fixed point is at $(\tfrac35, \tfrac35)$, and the cobweb path is $P_0 P_1 Q_1 P_2 Q_2 \ldots$ in Fig. 38.4. The path moves away from the fixed point, implying its instability.
(c) The lines are $y = x$ and $y = -x + 1$, with fixed point $(\tfrac12, \tfrac12)$. The path starting at $P_0 : (\tfrac34, 0)$ follows the rectangle $P_1 Q_1 P_2 Q_2$, indicating periodicity (Fig. 38.5). This is true for any starting value except that of the fixed point itself.
Graphs of the sequences $u_n$ versus $n$ are shown in Fig. 38.6.

[Fig. 38.6 Solutions $u_n$ plotted against $n$ for $u_{n+1} = -ku_n + k$ with $u_0 = \tfrac34$: (a) $k = \tfrac12$, (b) $k = \tfrac32$, (c) $k = 1$.]


The stability of the fixed point of the general first-order linear difference equation can be summarized as follows.

Stability
The first-order difference equation $u_{n+1} = -ku_n + a$ has a fixed point at $u = a/(1 + k)$, $(k \neq -1)$. The fixed point is stable if $|k| < 1$ and unstable if $|k| > 1$; if $k = 1$ the solutions are periodic (period 2).
If $k = -1$, the equation has no fixed point unless $a = 0$, in which case every value of $u$ is a fixed point. (38.11)
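The criterion (38.11) can be checked with exact arithmetic. Writing $u^* = a/(1 + k)$, the equation gives $u_{n+1} - u^* = -k(u_n - u^*)$, so the distance from the fixed point is multiplied by $|k|$ at every step. This minimal sketch (the helper `orbit` is our own) takes $a = k$ and $u_0 = \tfrac34$, as in Example 38.3.

```python
from fractions import Fraction

def orbit(k, u0, steps):
    """Exact iterates u_0, ..., u_steps of u_{n+1} = -k u_n + k."""
    us = [Fraction(u0)]
    for _ in range(steps):
        us.append(-k * us[-1] + k)
    return us

u0 = Fraction(3, 4)
for k in (Fraction(1, 2), Fraction(3, 2), Fraction(1)):
    fixed = k / (1 + k)                 # u* = a/(1 + k) with a = k
    gaps = [abs(u - fixed) for u in orbit(k, u0, 6)]
    print(k, fixed, [float(g) for g in gaps])
```

The gaps halve for $k = \tfrac12$, grow by a factor $\tfrac32$ for $k = \tfrac32$, and stay at $\tfrac14$ for $k = 1$, where the orbit alternates $\tfrac34, \tfrac14, \tfrac34, \ldots$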

Self-test 38.2
Consider the difference equation $u_{n+1} = f(u_n) = \tfrac12 - u_n^2$. Plot the curve $y = f(x) = \tfrac12 - x^2$ and the straight line $y = x$. What are the coordinates of the fixed point in the $x,y$ plane? Given $u_1 = 0.2$, compute $u_2, u_3, u_4, u_5$. Draw the corresponding cobweb. Does it indicate stability of the fixed point?

38.4 Constant-coefficient linear difference equations


Any difference equation of the form
$$u_n + a_1 u_{n-1} + \cdots + a_m u_{n-m} = f(n),$$
where the $a_i$ $(i = 1, \ldots, m)$ are constants, is a constant-coefficient linear difference equation. We shall look in detail at the second-order case
$$u_{n+2} + 2au_{n+1} + bu_n = f(n), \tag{38.12}$$
where $a$ and $b$ are constants and $f(n)$ is a given function. The methods generalize in a fairly obvious way to higher-order equations.
There are many parallels between the difference equation (38.12) and second-order constant-coefficient differential equations (Chapters 18–19). The equation is said to be homogeneous if $f(n) = 0$, and inhomogeneous otherwise, just as in the case of second-order differential equations. However, this section is self-contained and reference back is not necessary. The general solution of the inhomogeneous case requires that of the homogeneous case: hence we start with the latter.

Homogeneous equations
We can see how to proceed by looking at the first-order constant-coefficient equation
$$u_{n+1} - cu_n = 0. \tag{38.13}$$
As can be seen from (38.2) or verified directly, the general solution of this equation is
$$u_n = Ac^n, \tag{38.14}$$
where $A$ is any constant. Notice that we could equally well write
$$u_n = Ac^{n-1} \quad\text{or}\quad u_n = Ac^{n+1}:$$
the result would be equally correct, although $A$ would take different values for the same initial condition. The significant property of (38.13) and its solution (38.14) is that $u_{n+1}$ is a constant multiple of $u_n$.


With this in view, we attempt to find solutions of
$$u_{n+2} + 2au_{n+1} + bu_n = 0 \tag{38.15}$$
in the form $u_n = p^n$, where $p$ is a constant. Thus
$$u_{n+2} + 2au_{n+1} + bu_n = p^{n+2} + 2ap^{n+1} + bp^n = (p^2 + 2ap + b)p^n = 0,$$
for all $n$, if $p = 0$ or
$$p^2 + 2ap + b = 0. \tag{38.16}$$
The case $p = 0$ leads to the self-evident solution $u_n = 0$. We are interested in solutions of (38.16), which is known as the characteristic equation of (38.15). There are various cases to consider. Suppose that the roots of (38.16) are the distinct numbers $p_1$ and $p_2$. Hence $u_n = p_1^n$ and $u_n = p_2^n$ are solutions of (38.15). Since this equation is homogeneous and linear, it follows that any linear combination of $p_1^n$ and $p_2^n$ is also a solution. We state this as follows.

Distinct roots
The general solution of $u_{n+2} + 2au_{n+1} + bu_n = 0$ for distinct roots $p_1$ and $p_2$ of $p^2 + 2ap + b = 0$ is
$$u_n = Ap_1^n + Bp_2^n, \quad\text{for any constants } A \text{ and } B. \tag{38.17}$$

Example 38.4 Find the general solution of
$$u_{n+2} - u_{n+1} - 6u_n = 0. \tag{38.18}$$
The characteristic equation of (38.18) is
$$p^2 - p - 6 = 0, \quad\text{or}\quad (p - 3)(p + 2) = 0.$$
The roots are $p_1 = 3$, $p_2 = -2$. Hence the general solution is
$$u_n = A \cdot 3^n + B(-2)^n.$$

Example 38.5 Find the solution of
$$u_{n+2} + 2u_{n+1} - 3u_n = 0$$
that satisfies $u_0 = 1$, $u_1 = 2$.
The characteristic equation is
$$p^2 + 2p - 3 = 0, \quad\text{or}\quad (p + 3)(p - 1) = 0.$$
The roots are $p_1 = -3$, $p_2 = 1$. Hence the general solution is
$$u_n = A(-3)^n + B \cdot 1^n = A(-3)^n + B.$$
From the initial conditions,
$$u_0 = 1 = A + B, \qquad u_1 = 2 = -3A + B.$$
Hence $A = -\tfrac14$ and $B = \tfrac54$. The required solution is
$$u_n = -\tfrac14 (-3)^n + \tfrac54.$$
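The procedure of Example 38.5 (find the roots, then fit $A$ and $B$ to $u_0$ and $u_1$) can be sketched in Python as follows. The function `closed_form` and its argument `two_a` (standing for the coefficient $2a$) are our own names; `cmath.sqrt` is used so that complex roots are handled too, and the repeated-root case is rejected because it needs a different form of solution. The check compares the closed form with direct iteration of the recurrence.

```python
import cmath

def closed_form(two_a, b, u0, u1):
    """Return n -> u_n for u_{n+2} + two_a*u_{n+1} + b*u_n = 0, assuming distinct roots."""
    disc = cmath.sqrt(two_a**2 - 4 * b)
    p1, p2 = (-two_a + disc) / 2, (-two_a - disc) / 2
    if p1 == p2:
        raise ValueError("repeated root: use u_n = (A + Bn) p^n instead")
    # Solve A + B = u0 and A*p1 + B*p2 = u1 for the constants.
    A = (u1 - p2 * u0) / (p1 - p2)
    B = u0 - A
    return lambda n: A * p1**n + B * p2**n

# Example 38.5: u_{n+2} + 2u_{n+1} - 3u_n = 0, u_0 = 1, u_1 = 2
u = closed_form(2, -3, 1, 2)
seq = [1, 2]
for _ in range(8):
    seq.append(-2 * seq[-1] + 3 * seq[-2])   # u_{n+2} = -2u_{n+1} + 3u_n
print(all(abs(u(n) - seq[n]) < 1e-9 for n in range(10)))   # True
```

Here the roots come out in the order $p_1 = 1$, $p_2 = -3$, so the fitted constants are $\tfrac54$ and $-\tfrac14$: the same solution as in the text, with the labels exchanged.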
The characteristic equation can have equal roots, which is a special case. Consider the difference equation
$$u_{n+2} - 2au_{n+1} + a^2 u_n = 0,$$
where $a \neq 0$. Its characteristic equation is
$$p^2 - 2ap + a^2 = 0, \quad\text{or}\quad (p - a)^2 = 0,$$
which has the repeated root $p = a$. One solution is $Aa^n$, but we require a second independent solution. Consider the expression $u_n = na^n$. Then
$$\begin{aligned}
u_{n+2} - 2au_{n+1} + a^2 u_n &= (n + 2)a^{n+2} - 2(n + 1)a^{n+2} + na^{n+2}\\
&= a^{n+2}\{n + 2 - 2(n + 1) + n\} = 0.
\end{aligned}$$
Hence a further independent solution is $u_n = Bna^n$.

Equal roots
The general solution of $u_{n+2} - 2au_{n+1} + a^2 u_n = 0$ is
$$u_n = (A + Bn)a^n. \tag{38.19}$$
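The verification that $na^n$ gives a second solution is easily repeated with exact rational arithmetic for a sample value of $a$. The choices $a = \tfrac32$, $A = 5$, $B = -2$ below are arbitrary, and `residual` is our own helper name.

```python
from fractions import Fraction

def residual(u, a, n):
    """Left-hand side u_{n+2} - 2a u_{n+1} + a^2 u_n for a sequence given as a function u."""
    return u(n + 2) - 2 * a * u(n + 1) + a * a * u(n)

a = Fraction(3, 2)                 # arbitrary nonzero value
A, B = Fraction(5), Fraction(-2)   # arbitrary constants
u = lambda n: (A + B * n) * a**n   # the general solution (38.19)
print(all(residual(u, a, n) == 0 for n in range(20)))   # True
```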

Roots can also be complex. Consider the difference equation
$$u_{n+2} + 2u_{n+1} + 2u_n = 0.$$
Its characteristic equation is
$$p^2 + 2p + 2 = 0,$$
with roots $p_1 = -1 + i$, $p_2 = -1 - i$. The method still works and the general solution becomes
$$u_n = A(-1 + i)^n + B(-1 - i)^n.$$
For a real-valued problem, the constants $A$ and $B$ will be complex conjugates, which ensures that $u_n$ is real. The solution can be cast in real form by using the polar forms (Section 6.3) of the complex numbers. In this case
$$-1 \pm i = \sqrt{2}\, e^{\pm \frac34 \pi i}.$$
Hence
$$\begin{aligned}
u_n &= A\,2^{\frac12 n} e^{\frac34 \pi i n} + B\,2^{\frac12 n} e^{-\frac34 \pi i n}\\
&= 2^{\frac12 n}\left[A\left(\cos \tfrac34 \pi n + i \sin \tfrac34 \pi n\right) + B\left(\cos \tfrac34 \pi n - i \sin \tfrac34 \pi n\right)\right]\\
&= 2^{\frac12 n}\left(C \cos \tfrac34 \pi n + D \sin \tfrac34 \pi n\right),
\end{aligned}$$
where $C = A + B$ and $D = (A - B)i$.

Complex roots, $\alpha \pm i\beta = re^{\pm \theta i}$
The general complex solution of
$$u_{n+2} + 2au_{n+1} + bu_n = 0,$$
where $a^2 < b$, is
$$u_n = A(\alpha + i\beta)^n + B(\alpha - i\beta)^n.$$
The general real solution is
$$u_n = r^n(C \cos n\theta + D \sin n\theta). \tag{38.20}$$
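For the example above, the modulus and argument of the root $-1 + i$ can be obtained with `cmath.polar`, and the real form $r^n(C\cos n\theta + D\sin n\theta)$ checked numerically against the recurrence. This is a minimal sketch with arbitrary real constants $C = 1$, $D = 2$; the printed residual should be zero up to floating-point rounding.

```python
import cmath
import math

# Roots of p^2 + 2p + 2 = 0 are -1 ± i; take the one with positive imaginary part.
r, theta = cmath.polar(complex(-1, 1))      # r = sqrt(2), theta = 3*pi/4

C, D = 1.0, 2.0                             # arbitrary real constants
def u(n):
    return r**n * (C * math.cos(n * theta) + D * math.sin(n * theta))

# Largest residual of u_{n+2} + 2u_{n+1} + 2u_n over the first 15 values of n.
print(max(abs(u(n + 2) + 2 * u(n + 1) + 2 * u(n)) for n in range(15)))
```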
Example 38.6 Obtain the general solution of
$$u_{n+2} + u_n = 0.$$
The characteristic equation is
$$p^2 + 1 = 0,$$
giving roots $p_1 = i$, $p_2 = -i$. Hence
$$u_n = Ai^n + B(-i)^n.$$
In polar form, $i = e^{\frac12 \pi i}$, $-i = e^{-\frac12 \pi i}$. Hence the real form of the solution is
$$u_n = C \cos \tfrac12 \pi n + D \sin \tfrac12 \pi n.$$

Inhomogeneous equations
The general inhomogeneous equation is
$$u_{n+2} + 2au_{n+1} + bu_n = f(n) \tag{38.21}$$
(see (38.12)). Let $u_n = v_n + q_n$, where $v_n$ is the general solution of the corresponding homogeneous equation. Substitute this form of $u_n$ into (38.21):
$$(v_{n+2} + q_{n+2}) + 2a(v_{n+1} + q_{n+1}) + b(v_n + q_n) = f(n),$$
or
$$(v_{n+2} + 2av_{n+1} + bv_n) + (q_{n+2} + 2aq_{n+1} + bq_n) = f(n).$$
Since $v_n$ satisfies the homogeneous equation, it follows that
$$q_{n+2} + 2aq_{n+1} + bq_n = f(n),$$
which means that $q_n$ must be a particular solution of the inhomogeneous equation. As in differential equations, $v_n$ is known as the complementary function. We construct particular solutions by appropriate choices of functions, usually containing adjustable parameters, which are suggested by the form of the function $f(n)$. If a particular choice fails, then we reject it and try something else.

Example 38.7 Obtain the general solution of
$$u_{n+2} - u_{n+1} - 6u_n = 4.$$
From Example 38.4, the complementary function is
$$v_n = 3^n A + (-2)^n B.$$
For the particular solution, we try $q_n = C$, since $f(n) = 4$. Then
$$q_{n+2} - q_{n+1} - 6q_n - 4 = C - C - 6C - 4 = -6C - 4 = 0,$$
if $C = -\tfrac23$. Hence $q_n = -\tfrac23$, and the general solution is
$$u_n = 3^n A + (-2)^n B - \tfrac23.$$

Example 38.8 Obtain the general solution of
$$u_{n+2} + 2u_{n+1} - 3u_n = 4.$$
From Example 38.5, the complementary function is
$$v_n = (-3)^n A + B.$$
In this case we expect the choice $q_n = C$ to fail: a constant is already part of the complementary function, so substituting it makes the left-hand side of the difference equation vanish. When this happens, we try
$$q_n = Cn.$$
Then
$$q_{n+2} + 2q_{n+1} - 3q_n - 4 = C(n + 2) + 2C(n + 1) - 3Cn - 4 = 2C + 2C - 4 = 4C - 4 = 0,$$
if $C = 1$. Hence the general solution is
$$u_n = (-3)^n A + B + n.$$

Table 38.1 lists some simple forcing terms $f(n)$ with suggested forms of particular solution and alternatives containing parameters to be determined by direct substitution.

Table 38.1

| $f(n)$ | Trial solution $q_n$ |
|---|---|
| $k$ (a constant) | $C$; or $Cn$, if $C$ fails; or $Cn^2$, if $C$ and $Cn$ fail; etc. |
| $k^n$ | $Ck^n$; or $Cnk^n$, if $Ck^n$ fails; etc. |
| $n$ | $C_0 + C_1 n$ |
| $n^p$ ($p$ an integer) | $C_0 + C_1 n + \cdots + C_p n^p$ (may need higher powers of $n$ in special cases) |
| $\sin kn$ or $\cos kn$ | $C_1 \cos kn + C_2 \sin kn$ |

Example 38.9 Find the general solution of
$$u_{n+2} - 4u_n = n.$$
The characteristic equation is
$$p^2 - 4 = 0, \quad\text{or}\quad (p - 2)(p + 2) = 0.$$
The roots are $p_1 = 2$, $p_2 = -2$. Hence the complementary function is
$$v_n = 2^n A + (-2)^n B.$$
For the particular solution, try (choosing from Table 38.1)
$$q_n = C_0 + C_1 n.$$
Then
$$q_{n+2} - 4q_n - n = C_0 + C_1(n + 2) - 4C_0 - 4C_1 n - n = (-3C_0 + 2C_1) + n(-3C_1 - 1).$$
The right-hand side vanishes for all $n$ if
$$-3C_0 + 2C_1 = 0, \qquad -3C_1 - 1 = 0.$$
Hence $C_1 = -\tfrac13$, $C_0 = 2C_1/3 = -\tfrac29$, and the general solution is
$$u_n = 2^n A + (-2)^n B - \tfrac29 - \tfrac13 n.$$
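Particular solutions found this way are easy to confirm by substitution. The sketch below uses our own helper, `is_particular`, to check the trial solutions of Examples 38.7, 38.8 and 38.9 exactly for the first few $n$, writing each equation as $q_{n+2} + c_1 q_{n+1} + c_0 q_n = f(n)$. The last line shows the constant trial failing in Example 38.8.

```python
from fractions import Fraction as F

def is_particular(q, c1, c0, f, n_max=20):
    """True if q_{n+2} + c1*q_{n+1} + c0*q_n == f(n) for n = 0, ..., n_max - 1."""
    return all(q(n + 2) + c1 * q(n + 1) + c0 * q(n) == f(n) for n in range(n_max))

# Example 38.7: u_{n+2} - u_{n+1} - 6u_n = 4, with q_n = -2/3
print(is_particular(lambda n: F(-2, 3), -1, -6, lambda n: 4))        # True
# Example 38.8: u_{n+2} + 2u_{n+1} - 3u_n = 4, with q_n = n
print(is_particular(lambda n: F(n), 2, -3, lambda n: 4))             # True
# Example 38.9: u_{n+2} - 4u_n = n, with q_n = -2/9 - n/3
print(is_particular(lambda n: F(-2, 9) - F(n, 3), 0, -4, lambda n: n))  # True
# A constant trial in Example 38.8 fails: constants solve the homogeneous equation
print(is_particular(lambda n: F(1), 2, -3, lambda n: 4))             # False
```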
Self-test 38.3
Find the general solution of $u_{n+2} - 4u_{n+1} + 4u_n = 2^n$.

38.5 The logistic difference equation


Consider again the logistic difference equation
$$u_{n+1} = \alpha u_n(1 - u_n), \tag{38.22}$$
where $\alpha$ is a parameter which will take various values. This nonlinear equation can model the growth of a population from one generation to the next. If $u_n$ represents the population size of generation $n$ and $\alpha$ is the birthrate, then we might expect the population size of the next generation to be $\alpha u_n$ in the absence of any inhibiting factors such as lack of resources or overcrowding. If $\alpha > 1$, then the population model given by the first-order difference equation $u_{n+1} = \alpha u_n$ would imply that the population would grow to infinity, since the equation has the solution $u_n = \alpha^n u_0$. To counter this possibility, we can introduce a feedback term $-\alpha u_n^2$, which will tend to reduce population growth when the population is large.

Fixed points of the equation (38.22) occur where
$$u = \alpha u(1 - u);$$
that is, for $u = 0$ and $u = 1 - 1/\alpha$. We can adapt the cobweb method of Section 38.3 to this nonlinear difference equation by plotting graphs of the parabola $y = f(x) = \alpha x(1 - x)$ and the straight line $y = x$. Fixed points of the difference equation occur where the line and the parabola intersect. The values of $x$ at these points are given by the solutions of
$$\alpha x(1 - x) = x, \quad\text{or}\quad x(\alpha x + 1 - \alpha) = 0.$$
In the cobweb, the fixed points have coordinates $(0, 0)$ and $P : (1 - 1/\alpha,\ 1 - 1/\alpha)$. We shall only look at values of $\alpha > 1$, so that one fixed point is in the first quadrant, $x > 0$, $y > 0$. A cobweb solution starting at $x = u_0$ for the case $\alpha = 2.8$ is shown in Fig. 38.7.
Notice that, for this choice of $\alpha$ and $u_0$, the fixed point $P$ appears to be stable; that is, the cobweb solution approaches $P$. The slope of the graph of $y = \alpha x(1 - x)$ at $P$ determines the stability or instability of the solutions. The slope at $P$ is
$$m = f'(1 - 1/\alpha) = \alpha - 2\alpha(1 - 1/\alpha) = 2 - \alpha.$$

[Fig. 38.7 Cobweb solution for $u_{n+1} = 2.8u_n(1 - u_n)$, showing the line $y = x$, the parabola $y = 2.8x(1 - x)$, and a solution starting from $x = u_0$ approaching the fixed point at $P$.]

As with the cobweb for two intersecting lines for the linear difference equation in Section 38.3, the fixed point $P$ is locally stable if $m = 2 - \alpha > -1$ (the condition $m < 1$ holds automatically since $\alpha > 1$), in that all cobweb paths starting close to $(\alpha - 1)/\alpha$ approach the fixed point $P$ as $n \to \infty$. This inequality implies that $\alpha < 3$. Notice also that, if $1 < \alpha < 2$, then $y = x$ intersects the parabola $y = \alpha x(1 - x)$ between the origin and its maximum. This follows since the maximum occurs at $x = \tfrac12$, and $0 < 1 - 1/\alpha < \tfrac12$ implies $1 < \alpha < 2$.

For $\alpha > 3$ the solutions become more complicated. Both fixed points are now unstable (at the origin the slope is $f'(0) = \alpha > 1$, and at $P$ it is $2 - \alpha < -1$), so there is no stable fixed point which solutions can approach. We can obtain a clue as to what happens if we look at the function of a function given by
$$y = f(f(x)) = \alpha[\alpha x(1 - x)][1 - \alpha x(1 - x)] = \alpha^2 x(1 - x) - \alpha^3 x^2(1 - x)^2.$$
When $\alpha = 3$, this curve intersects $y = x$ at $x = 0$ and at $P$ only. This can be checked by noting that the fixed points can be found from
$$x = 9x(1 - x) - 27x^2(1 - x)^2,$$
which can be written as
$$x(27x^3 - 54x^2 + 36x - 8) = 0, \quad\text{or}\quad x(3x - 2)^3 = 0.$$
Graphs of the curves $y = f(x)$ and $y = f(f(x))$ for $\alpha = 3$ are shown in Fig. 38.8a. The fixed point $P$ on $y = f(x)$ is at $(\tfrac23, \tfrac23)$. As $\alpha$ increases, two additional fixed points develop on the line $y = x$. Further graphs of the two functions $y = f(x)$ and $y = f(f(x))$ for $\alpha = 3.4$ are shown in Fig. 38.8b, together with the line $y = x$. The graph indicates that there are now four fixed points, at $O$, $A$, $B$, $C$.
For general $\alpha$, the fixed points of $y = f(f(x))$ occur where
$$x = \alpha^2 x(1 - x) - \alpha^3 x^2(1 - x)^2,$$
or
$$x(\alpha - 1 - \alpha x)\left[\alpha^2 x^2 - \alpha(1 + \alpha)x + 1 + \alpha\right] = 0,$$
where we could have predicted the solution $x = (\alpha - 1)/\alpha$, corresponding to the point $B$ in Fig. 38.8b. The solutions of
$$\alpha^2 x^2 - \alpha(1 + \alpha)x + 1 + \alpha = 0 \tag{38.23}$$
are
$$x_1, x_2 = \frac{1}{2\alpha}\left[1 + \alpha \mp \sqrt{(\alpha + 1)(\alpha - 3)}\right] \qquad (\alpha > 3),$$
which determine, respectively, the coordinates of $A$ and $C$.

[Fig. 38.8 (a) Graph of $y = f(f(x))$ for the critical case $\alpha = 3$. (b) Graph of $y = f(f(x))$ for $\alpha = 3.4$, showing fixed points $O$, $A$, $B$, $C$. The dashed curve shows $y = f(x)$ in both cases.]
From (38.23),
$$x_1 + x_2 = (1 + \alpha)/\alpha. \tag{38.24}$$
Also
$$\begin{aligned}
f(x_1) = \alpha x_1(1 - x_1) &= \alpha x_1 - \alpha x_1^2\\
&= \alpha x_1 - (1/\alpha)\left[\alpha(1 + \alpha)x_1 - 1 - \alpha\right] \quad\text{(using (38.23))}\\
&= (1/\alpha)(-\alpha x_1 + 1 + \alpha) = x_2
\end{aligned}$$
by eqn (38.24). Similarly $f(x_2) = x_1$. It follows that
$$f(f(x_1)) = f(x_2) = x_1 \quad\text{and}\quad f(f(x_2)) = f(x_1) = x_2.$$
Hence if $x = x_1$ initially, then subsequently $x$ alternates between $x_1$ and $x_2$, as shown by the square in Fig. 38.8b. This phenomenon is known as period doubling. The values $x = x_1$ and $x = x_2$ are fixed points of $y = f(f(x))$, and their stability is determined by the slopes of $y = f(f(x))$ at these points.
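A numerical check of this period-2 behaviour at $\alpha = 3.4$ (the value used for Fig. 38.8b) is sketched below. Here `two_cycle` is a hypothetical helper that evaluates the roots of (38.23), taking the smaller root as $x_1$.

```python
import math

def two_cycle(alpha):
    """Roots x1 < x2 of (38.23) for u_{n+1} = alpha u_n (1 - u_n), valid for alpha > 3."""
    root = math.sqrt((alpha + 1) * (alpha - 3))
    return (1 + alpha - root) / (2 * alpha), (1 + alpha + root) / (2 * alpha)

alpha = 3.4
f = lambda x: alpha * x * (1 - x)
x1, x2 = two_cycle(alpha)
print(round(x1, 4), round(x2, 4))
print(abs(f(x1) - x2) < 1e-12, abs(f(x2) - x1) < 1e-12)   # True True
print(abs(x1 + x2 - (1 + alpha) / alpha) < 1e-12)          # checks (38.24): True
```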
The critical slopes for stability at $A$ and $C$ are both $-1$; we now find the value of $\alpha$ at which this occurs. We have
$$\frac{d}{dx} f(f(x)) = \alpha^2 - 2\alpha^2 x - \alpha^3(2x - 6x^2 + 4x^3) = \alpha^2 - 2\alpha^2(1 + \alpha)x + 6\alpha^3 x^2 - 4\alpha^3 x^3. \tag{38.25}$$
We require the value of $\alpha$ given by
$$\frac{d}{dx} f(f(x)) = -1, \quad\text{or}\quad 4\alpha^3 x^3 - 6\alpha^3 x^2 + 2\alpha^2(1 + \alpha)x - \alpha^2 = 1, \tag{38.26}$$
when $x$ satisfies (38.23). Remove the $x^3$ term from (38.26) by multiplying (38.23) by $4\alpha x$ and subtracting it from (38.26). Then
$$-2\alpha^2(\alpha - 2)x^2 + 2\alpha(\alpha - 2)(\alpha + 1)x - (1 + \alpha^2) = 0. \tag{38.27}$$
Equations (38.23) and (38.27) must have the same roots in $x$. In each case, make the coefficient of $x^2$ equal to 1. The equations for comparison are
$$x^2 - \frac{\alpha + 1}{\alpha}x + \frac{\alpha + 1}{\alpha^2} = 0,$$
$$x^2 - \frac{\alpha + 1}{\alpha}x + \frac{\alpha^2 + 1}{2\alpha^2(\alpha - 2)} = 0.$$
These equations have the same roots if
$$\frac{\alpha + 1}{\alpha^2} = \frac{\alpha^2 + 1}{2\alpha^2(\alpha - 2)},$$
or
$$\alpha^2 - 2\alpha - 5 = 0. \tag{38.29}$$
We are interested in values of $\alpha > 3$, so that the required root of (38.29) is $\alpha = 1 + \sqrt6 = 3.449\ldots$. In fact the slopes at $A$ and $C$ both become $-1$ for this value of $\alpha$. Thus, for
$$3 < \alpha < 1 + \sqrt6,$$
the 2-cycle solution is stable.

As $\alpha$ increases from 1, the stable fixed point at $x = (\alpha - 1)/\alpha$ becomes unstable at $\alpha = 3$, where it bifurcates into a stable period-2 solution. At $\alpha = 1 + \sqrt6$, the system bifurcates again into a 4-cycle or period-4 solution, which corresponds to the set of stable fixed points of $y = f(f(f(f(x))))$. A graph of this function for $\alpha = 3.54$ is shown in Fig. 38.9, together with the eight fixed points. The cycle doubles again at about $\alpha = 3.544\ldots$, and so on. The intervals between successive period-doubling bifurcations rapidly decrease, until a limit is reached at about $\alpha = 3.570\ldots$, beyond which chaos occurs. The iterations are no longer periodic for most values of $\alpha$ beyond this point, although there are some brief intervals of periodicity.
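As an independent cross-check on this result (not the route taken in the text), the slope of $y = f(f(x))$ at $x_1$ is, by the chain rule, $f'(f(x_1))f'(x_1) = f'(x_2)f'(x_1) = \alpha^2(1 - 2x_1)(1 - 2x_2)$. Using $x_1 + x_2 = (1 + \alpha)/\alpha$ from (38.24) and $x_1x_2 = (1 + \alpha)/\alpha^2$ from (38.23), this simplifies to $-\alpha^2 + 2\alpha + 4$, which equals $-1$ exactly when $\alpha^2 - 2\alpha - 5 = 0$. The sketch evaluates the slope numerically; the function name `cycle_slope` is ours.

```python
import math

def cycle_slope(alpha):
    """Slope of y = f(f(x)) at the 2-cycle of f(x) = alpha x (1 - x), for alpha > 3."""
    root = math.sqrt((alpha + 1) * (alpha - 3))
    x1 = (1 + alpha - root) / (2 * alpha)
    x2 = (1 + alpha + root) / (2 * alpha)
    fprime = lambda x: alpha * (1 - 2 * x)     # derivative of alpha x (1 - x)
    return fprime(x1) * fprime(x2)             # chain rule: f'(x2) f'(x1)

for alpha in (3.2, 3.4, 1 + math.sqrt(6), 3.5):
    print(round(alpha, 4), round(cycle_slope(alpha), 4))
```

For $\alpha = 3.2$ and $3.4$ the slope lies between $-1$ and $1$ (a stable 2-cycle); it reaches $-1$ at $\alpha = 1 + \sqrt6$, and at $\alpha = 3.5$ it is $-1.25$, so the 2-cycle has lost stability.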

[Fig. 38.9 Fixed points of $y = f(f(f(f(x))))$ for $\alpha = 3.54$, given by the intersections of the curve and the line $y = x$.]

Logistic equation
$u_{n+1} = f(u_n) = \alpha u_n(1 - u_n)$.
Fixed point in $x > 0$ (for $\alpha > 1$) at $x_0 = (\alpha - 1)/\alpha$.
Fixed point $x_0$ stable if
$f'(x_0) = \alpha - 2\alpha x_0 = -\alpha + 2 > -1$, that is, if $\alpha < 3$.
Period-2 solution: fixed points $(\alpha > 3)$
$x_1, x_2 = \{1 + \alpha \mp \sqrt{(\alpha + 1)(\alpha - 3)}\}/(2\alpha)$.
Period-2 solution stable if $3 < \alpha < 1 + \sqrt6$. (38.28)
The sequence of period-doubling bifurcations is known as the Feigenbaum sequence, and it has certain universal aspects, in that it is not just a consequence of the logistic equation but has common features with other difference equations which generate period doubling.

The simplest way to view the progressively complex behaviour is through a computer-drawn picture of the iterations of
$$u_{n+1} = \alpha u_n(1 - u_n)$$
for stepped increases in $\alpha$ from $\alpha = 2.8$ up to $\alpha = 3.8$, which covers the main area of interest. The result is shown in Fig. 38.10. The series of single dots for each $\alpha$ in $2.8 < \alpha < 3$ indicates the fixed point, which then bifurcates into a stable 2-cycle attractor for $3 < \alpha < 1 + \sqrt6$. This in turn bifurcates into a stable 4-cycle attractor at $\alpha = 1 + \sqrt6$, and so on. The effect of infinite period doubling is that the solution is ultimately non-periodic. The generally chaotic and noisy behaviour of the difference equation can clearly be seen in the large number of dots for larger values of $\alpha$. These non-periodic sets are known as strange attractors. The successive iterates of the logistic equation wander about in a seemingly random but bounded manner, and never settle into a periodic solution. However, within the chaotic band of $\alpha$ values, there appear windows of periodic cycles. Problem 38.26, for example, confirms that there is a 3-cycle around $\alpha = 3.83$.

The logistic equation can be thought of as a relatively simple model example. Many other nonlinear difference equations exhibit similar period-doubling bifurcations and strange attractors.
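A picture like Fig. 38.10 is built by iterating for many values of $\alpha$, discarding a transient, and plotting the values that remain. The sketch below returns those values for one $\alpha$; the function `attractor` and its parameters `transient`, `keep` and `tol` are our own choices. Points closer than `tol` are merged, so the number of distinct values hints at the period. Plotting the returned values against $\alpha$ over a fine grid would produce the diagram.

```python
def attractor(alpha, u0=0.5, transient=5000, keep=64, tol=1e-6):
    """Approximate the distinct long-run values of u_{n+1} = alpha u_n (1 - u_n)."""
    u = u0
    for _ in range(transient):          # discard the approach to the attractor
        u = alpha * u * (1 - u)
    tail = []
    for _ in range(keep):
        u = alpha * u * (1 - u)
        tail.append(u)
    tail.sort()
    distinct = [tail[0]]
    for v in tail[1:]:                  # merge values closer than tol
        if v - distinct[-1] > tol:
            distinct.append(v)
    return distinct

for alpha in (2.9, 3.2, 3.5, 3.7, 3.83):
    print(alpha, len(attractor(alpha)))
```

With these settings one would expect 1, 2 and 4 values for $\alpha = 2.9$, 3.2 and 3.5, many scattered values for $\alpha = 3.7$, and three values for $\alpha = 3.83$ (the 3-cycle window). The counts are heuristic: close to a bifurcation, convergence is slow and a longer transient would be needed.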

un
1.0

0.8

0.6

0.4

0.2 Fig. 38.10 Period doubling for the


logistic equation for increasing α,
α
followed by chaotic iterations
2.8 3.0 3.2 3.4 3.6 3.8 4.0 beyond about α = 3.57.
859
Problems

38.1 £1000 is invested over 10 years at an interest rate of 6% annually. Find the final total investment. What should the monthly interest rate be to achieve the same final total?

38.2 The sum of £50 000 is borrowed over 25 years and the money is repaid in equal annual instalments. The interest rate on the outstanding balance in any year is 10%. Find what the annual repayments would be. After 5 years, the interest rate is reduced to 9%.
(a) Find the required adjustment to the annual repayments for the loan to be repaid over the original term.
(b) If the repayments are not changed, by how much will the mortgage term be reduced?

38.3 Find the fixed points of the following difference equations:
(a) $u_{n+1} = u_n(2 - u_n)$;
(b) $u_{n+1} = u_n(1 + u_n)(2 - 3u_n)$;
(c) $u_{n+1} = \sin u_n$;
(d) $u_{n+1} = \tfrac12 \sin u_n$;
(e) $u_{n+1} = e^{u_n} - 1$.

38.4 Given the initial value $u_0$ in each case, calculate the sequence of terms up to $u_5$ for each of the following first-order difference equations:
(a) $u_{n+1} = 2u_n(3 - u_n)$, $u_0 = 1$;
(b) $u_{n+1} = 2u_n(1 - u_n)$, $u_0 = \tfrac12$;
(c) $u_{n+1} = 3.2u_n(1 - u_n)$, $u_0 = \tfrac12$;
(d) $u_{n+1} = 4u_n(1 - u_n)$, $u_0 = \tfrac12$.

38.5 (Section 38.3). Sketch the cobweb solutions for the following first-order equations with the stated initial conditions, and discuss the stability of the fixed point:
(a) $u_{n+1} = \tfrac12 u_n + \tfrac12$, $u_0 = \tfrac12$ and $u_0 = \tfrac32$;
(b) $u_{n+1} = 2u_n - 2$, $u_0 = \tfrac12$ and $u_0 = \tfrac32$;
(c) $u_{n+1} = -u_n + 2$, $u_0 = \tfrac12$ and $u_0 = \tfrac34$;
(d) $u_{n+1} = -\tfrac12 u_n + \tfrac32$, $u_0 = \tfrac12$ and $u_0 = \tfrac32$;
(e) $u_{n+1} = -2u_n + 3$, $u_0 = \tfrac12$ and $u_0 = \tfrac32$.

38.6 The function $f(n)$ satisfies
$$f(n) = f(\tfrac12 n) + 1.$$
Put $n = 2^m$ and $g(m) = f(2^m)$, and show that
$$g(m) = g(m - 1) + 1.$$
Hence find $f(n)$ given that $f(1) = 0$.

38.7 Use the method suggested in the previous problem to solve
$$f(n) = f(\tfrac13 n) + \tfrac58,$$
given the initial condition $f(1) = 0$.

38.8 (Section 38.4). Find the general solutions of the following difference equations:
(a) $u_{n+2} + 2u_{n+1} - 3u_n = 0$;
(b) $u_{n+2} - 9u_n = 0$;
(c) $u_{n+2} + 9u_n = 0$;
(d) $u_n - 4u_{n-1} + 5u_{n-2} = 0$;
(e) $u_{n+2} - 4u_{n+1} + 4u_n = 0$;
(f) $u_{n+3} - u_{n+2} + u_{n+1} - u_n = 0$;
(g) $u_{n+3} - u_n = 0$;
(h) $u_{n+3} - 3u_{n+2} + 3u_{n+1} - u_n = 0$;
(i) $u_{n+2} - u_{n+1} - u_n + u_{n-1} = 0$.

38.9 Express the solution of the initial-value problem
$$u_{n+2} - 6u_{n+1} + 13u_n = 0, \qquad u_0 = 0,\ u_1 = 1,$$
in real form.

38.10 Find the difference equation satisfied by
$$u_n = A \cdot 2^n + B \cdot (-5)^n,$$
for all $A$ and $B$.

38.11 Obtain particular solutions of the following inhomogeneous difference equations:
(a) $u_{n+2} + 2u_{n+1} - 3u_n = f(n)$, where (i) $f(n) = 2^n$; (ii) $f(n) = n$; (iii) $f(n) = 2$; (iv) $f(n) = (-3)^n$.
(b) $u_{n+2} + 2u_{n+1} + 2u_n = f(n)$, where (i) $f(n) = 1$; (ii) $f(n) = n + 3$; (iii) $f(n) = \cos \tfrac34 \pi n$.
(c) $u_{n+3} - 3u_{n+2} + 3u_{n+1} + u_n = f(n)$, where (i) $f(n) = 1$; (ii) $f(n) = n$; (iii) $f(n) = n^2$.
(d) $u_{n+2} - 6u_{n+1} + 9u_n = f(n)$, where (i) $f(n) = 2^n$; (ii) $f(n) = 3$; (iii) $f(n) = 3^n$; (iv) $f(n) = n3^n$.

38.12 A ball bearing is dropped from a height $z = h_0$ on to a metal plate, and the coefficient of restitution between the ball and the plate is $\varepsilon$, where $0 < \varepsilon < 1$. Set up a difference equation for the maximum height reached after $n$ impacts. Solve the equation. (Assume that a ball dropped from a height $h$ hits the plate with speed $v = \sqrt{2gh}$, where $g$ is the acceleration due to gravity. The rebound speed of the ball is $\varepsilon v$.) Instead of being stationary, the plate now oscillates so that it is moving upwards at a speed $u$ (a constant) at the moment of each impact with the ball. Find the difference equation for $h_n$. Show that the difference equation has a fixed point and interpret its meaning.

38.13 $D_n(x)$ is the $n \times n$ determinant defined by
$$D_n(x) = \begin{vmatrix} 2x & 1 & 0 & \cdots & 0\\ 1 & 2x & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 2x \end{vmatrix} \quad (n \geq 2), \qquad D_2(x) = \begin{vmatrix} 2x & 1\\ 1 & 2x \end{vmatrix}, \qquad D_1(x) = 2x.$$
Show that
$$D_n(x) = 2xD_{n-1}(x) - D_{n-2}(x).$$
Solve the difference equation for $x \neq 1$ and $x = 1$.

38.14 Let $\{u_n\}$ $(n = 0, 1, \ldots)$ be a sequence. The power series
$$f(u_n, x) = \sum_{n=0}^{\infty} u_n x^n$$
is known as the generating function of the sequence. Thus, for example, if $u_n = (-1)^n/n!$, then
$$f(u_n, x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^n = e^{-x},$$
which means that $e^{-x}$ is the generating function of $\{u_n\}$. The generating function of $\{u_{n+1}\}$ is
$$f(u_{n+1}, x) = \sum_{n=0}^{\infty} u_{n+1} x^n = \frac{1}{x} \sum_{n=0}^{\infty} \frac{(-1)^{n+1}}{(n+1)!} x^{n+1} = \frac{1}{x}\left(\sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^n - 1\right) = \frac{1}{x}\left[f(u_n, x) - 1\right].$$
Consider the difference equation
$$u_{n+2} + u_{n+1} - 2u_n = 0, \qquad u_0 = 1,\ u_1 = -2.$$
By taking the generating function of the equation, show that
$$f(u_n, x) = \frac{1}{1 + 2x}.$$
Using the binomial theorem, find $u_n$.

38.15 A Fibonacci sequence is defined as a sequence in which any term is the sum of the two preceding terms. For the Fibonacci sequence starting with $u_1 = 1$, $u_2 = 2$, find and solve the difference equation for $u_n$.

38.16 Solve the initial-value difference equation
$$3u_{n+2} - 2u_{n+1} - u_n = 0, \qquad u_1 = 2,\ u_2 = 1,$$
and show that $u_n \to \tfrac54$ as $n \to \infty$.

38.17 A symmetric random walk takes place on the integer steps on the line between $x = 0$ and $x = N$. At any position $x = r$ $(1 \leq r \leq N - 1)$, the probability that the walker moves to either $x = r + 1$ or $x = r - 1$ at any stage is $\tfrac12$. The probability $u_k$ that the walker reaches $x = 0$ first, given an initial position $x = k$, satisfies the difference equation
$$u_k = \tfrac12 u_{k-1} + \tfrac12 u_{k+1}, \qquad u_0 = 1,\ u_N = 0,$$
for $1 \leq k \leq N - 1$. Find $u_k$. What is the probability that the walker reaches $x = N$ first? If $d_k$ is the expected number of steps in the walk before it reaches 0 or $N$, then $d_k$ satisfies
$$d_k = \tfrac12(1 + d_{k+1}) + \tfrac12(1 + d_{k-1}), \qquad d_0 = d_N = 0,$$
for $1 \leq k \leq N - 1$. Find the expected duration of the walk.

38.18 Show that $u_n = n!$ is a solution of the second-order difference equation
$$u_{n+2} = (n + 2)(n + 1)u_n.$$
By using the substitution $u_n = v_n n!$, find a second independent solution.

38.19 Given that
$$s_n = \sum_{k=1}^{n} k^3,$$
find a first-order difference equation for $s_n$. Solve the equation to find a formula for the sum $s_n$.

38.20 Show that the difference equation
$$u_{n+2} + 2au_{n+1} + bu_n = 0$$
can be expressed as
$$z_{n+1} = Az_n,$$
where
$$z_n = \begin{bmatrix} u_n\\ v_n \end{bmatrix}, \qquad A = \begin{bmatrix} -2a & b\\ -1 & 0 \end{bmatrix}.$$
Deduce that
$$z_n = A^n z_0.$$
Consider the case with $a = 1$ and $b = -8$. Find the eigenvalues of $A$ and use the methods of Section 13.5 to find a formula for $A^n$. Hence solve the difference equation for $u_n$ in terms of $u_0$ and $u_1$.

38.21 (Section 38.5). Consider the logistic equation
$$u_{n+1} = \alpha u_n(1 - u_n).$$
Draw cobweb solutions starting at $u_0 = \tfrac12$ for the cases $\alpha = 2.7$, $\alpha = 2.9$, and $\alpha = 3.3$. What do you infer about the stability of the fixed point in the first quadrant?

38.22 (Section 38.5). In the logistic equation $u_{n+1} = \alpha u_n(1 - u_n)$, for what positive values of $\alpha$ is the origin a stable fixed point?

38.23 (Section 38.5). Find the two stable values between which $u_n$ ultimately oscillates in the logistic equation $u_{n+1} = 3.25u_n(1 - u_n)$.

38.24 Consider the difference equation
$$u_{n+1} = \alpha\left(\tfrac12 - |u_n - \tfrac12|\right).$$
Sketch the function $y = f(x) = \alpha(\tfrac12 - |x - \tfrac12|)$ for $\alpha = \tfrac32$. Where are the equilibrium points of the difference equation for $\alpha > 1$? Show that the origin is stable if $\alpha < 1$, and unstable if $\alpha > 1$. What happens if $\alpha = 1$? Sketch the graph of $y = f(f(x))$ for $\alpha = 2$. Show that there exists a 2-cycle and locate the periodic values of $u_n$.

38.25 Find the fixed points of
$$u_{n+1} = \alpha u_n(1 - u_n^3),$$
for all $\alpha$. Determine the slope of $y = f(x) = \alpha x(1 - x^3)$ at the nonzero fixed point. Confirm that this fixed point is stable if $\alpha < \tfrac53$ and unstable if $\alpha > \tfrac53$. Sketch cobweb solutions for $\alpha = 1.2, 1.4, 1.8$.

38.26 By starting from $u_0 = 0.957\,417$, compute $u_1, u_2, \ldots, u_5$ for the difference equation
$$u_{n+1} = \alpha u_n(1 - u_n), \qquad \alpha = 3.83,$$
and confirm that the logistic equation appears to have a 3-cycle for this value of $\alpha$.

38.27 Find the fixed points of the difference equation
$$u_{n+1} = \alpha u_n(1 - u_n)^2,$$
in the three cases (a) $\alpha = 9$, (b) $\alpha = 4$, (c) $\alpha = \tfrac94$. Discuss the stability of the fixed points in each case.

38.28 Show that the special logistic equation
$$u_{n+1} = 4u_n(1 - u_n)$$
has the solution
$$u_n = \sin^2(2^n C\pi),$$
where $C$ is any constant. This general solution includes closed-form chaotic solutions. For example, if $C = 1/\pi$, then
$$u_n = \sin^2(2^n),$$
which never repeats itself for $n = 0, 1, 2, \ldots$.
Part 7
Probability and statistics

39 Probability

Contents
39.1 Sample spaces, events, and probability
39.2 Sets and probability
39.3 Frequencies and combinations
39.4 Conditional probability
39.5 Independent events
39.6 Total probability
39.7 Bayes’ theorem
Problems

An experiment or trial is described as random if its result or outcome is not predictable or contains uncertainty. The theory of probability is essential in the modelling and analysis of random experiments. In some aspects of life we expect, and often hope, that the situations we meet behave in a predictable or deterministic manner. We expect water to freeze at 0°C under normal pressure; we expect the sun to rise at the appropriate time each day. For important safety reasons we expect an aircraft to have predictable characteristics in a wide range of sometimes extreme situations. However, the weather is largely unpredictable more than a week or so ahead. The distinction between random and deterministic has become less ‘certain’ in recent times. Some physical phenomena, such as the weather, can be modelled by deterministic equations but still exhibit long-term, seemingly unpredictable behaviour. Such systems, which display what is known as chaos (see Section 38.5 for a model difference equation with a chaotic output), show extreme sensitivity to small changes in their initial conditions. Chaos is distinct from random behaviour, but the two can look very similar in practice.
If an experiment is repeatable we can count the occasions when a particular outcome occurs. (This only makes sense if the conditions surrounding the experiment do not change with time.) In repeating such an experiment many times, the proportion of favourable outcomes may achieve some regularity. We can measure this by calculating the relative frequency of this outcome, defined by
$$\text{relative frequency} = \frac{\text{number of occurrences of the given outcome}}{\text{total number of experiments}}.$$
After a large number of experiments, this ratio may approach a steady value, which is known as the probability of this particular outcome.

For example, the standard die has six faces numbered 1, 2, 3, 4, 5, 6. After a large
number of throws, we would expect the number 1 (or any other number) to appear
on the upper face with a relative frequency of 1/6. Hence we expect that the prob-
ability of a 1 appearing is 1/6.
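The relative-frequency idea can be explored with a short simulation. The following is a minimal sketch, assuming that Python's seeded pseudo-random generator `random.Random` is an adequate stand-in for a fair die; the function name `relative_frequency_of_one` and the seed value are illustrative choices, not taken from the text.

```python
import random


def relative_frequency_of_one(num_throws: int, seed: int = 1) -> float:
    """Simulate num_throws throws of a fair die (modelled by a seeded
    pseudo-random generator) and return the relative frequency of a 1."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    ones = sum(1 for _ in range(num_throws) if rng.randint(1, 6) == 1)
    return ones / num_throws


for n in (60, 600, 6_000, 60_000, 600_000):
    print(f"{n:>7} throws: relative frequency of a 1 = {relative_frequency_of_one(n):.4f}")
```

The printed ratios should drift towards 1/6 ≈ 0.1667 as the number of throws grows, although any finite run only approximates the probability.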
Many probabilities are based on data, past records, ‘degree of belief’, the views of individuals, and so on. Horse races are usually not repeated, so there can be no relative frequency approach, but bookmakers and punters bet on the basis of the previous form of the horses, the state of the course, and the pattern of bets. Generally, as the race approaches, the bookmakers’ odds reflect how the accumulated bets have been distributed among the runners. Many outcomes will be assigned probabilities with at least some subjective element.
Probabilities are important in measuring risk, and there can be surprising results. Past data suggest that the earth receives a significant meteor impact roughly once every 100 years. The probability of a particular individual being killed by such an impact is very small but nonzero. However, the impact could be cataclysmic, which means that by some measures the probability of being killed by a meteor impact is greater than that of being killed in a plane crash. In engineering, as the reliability of components improves, failure becomes less likely, but as a consequence it might have more serious implications if it does occur.

39.1 Sample spaces, events, and probability


The first task with our random experiment is to define the list, or set, of all possible
outcomes which is known as the sample space. A simple example is the single
spin of a coin, in which there are two possible outcomes with either a head or a tail
showing. The outcomes can be denoted by H (for head) and T (for tail). The sample
space S for this experiment has two elements H and T and we denote it in set
terms by
S = {H, T}.
(Information about sets and set notation can be found in Chapter 35.) For the
single throw of a fair die, the sample space has the six possible outcomes, namely
1, 2, 3, 4, 5, 6. Hence its sample space is
S = {1, 2, 3, 4, 5, 6}.
Some sample spaces have an infinite number of elements. Suppose we spin a
coin until a tail appears. Any number of heads could appear before a tail. Hence
the sample space is
S = {0 heads, 1 head, 2 heads, 3 heads, and so on}.
However, the sample space is countable; that is, its elements can be put in one-to-one correspondence with the positive integers. A sample space is said to be discrete if it contains a finite or countably infinite set of outcomes. A list of outcomes such as {2, 4, 6, 8, … } would be countably infinite.
A collection of elements in a sample space satisfying a common requirement is known as an event. For a die the event could be the appearance of a particular number, say 5, an odd-number outcome, or any number less than 5 on a single throw. These events are respectively the sets
A1 = {5}, A2 = {1, 3, 5}, A3 = {1, 2, 3, 4}:
they are subsets of the sample space; that is, in the notation of (35.3), Ai ⊆ S for i = 1, 2, 3.
As we mentioned in the introduction, the probability of an event is the value which the relative frequency of the event approaches in a large number of repetitions of the experiment. The probability of an event A is denoted by P(A). For the single spin of a coin we expect heads and tails to be equally likely to occur. Thus
$$P(H) = \tfrac{1}{2}, \qquad P(T) = \tfrac{1}{2}.$$
We can also view this in a non-experimental way. If an event can occur in n different ways out of a total number of N possible ways, all of which are equally likely, then the probability of the event is n/N. For a fair coin a head can arise in one way out of two equally likely ways. Hence P(H) = 1/2.
For the die the probability that any individual number x is face up is given by P(x) = 1/6 on a single throw. The probability that a number less than 5 appears will be
$$P(A_3) = \frac{\text{number of ways in which numbers less than 5 occur}}{\text{total number of possible outcomes}} = \frac{4}{6} = \frac{2}{3},$$
where A3 = {1, 2, 3, 4}.
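The counting rule n/N can be expressed directly with sets. This is a minimal Python sketch, assuming all outcomes are equally likely; the helper name `classical_probability` is hypothetical, and `fractions.Fraction` is used so that results stay exact.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for a single throw of a die


def classical_probability(event: set, space: set) -> Fraction:
    """Return n/N: the number of outcomes in the event over the number in
    the sample space. Valid only when all outcomes are equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


A1 = {5}                          # the number 5 appears
A2 = {1, 3, 5}                    # an odd number appears
A3 = {x for x in S if x < 5}      # a number less than 5 appears

for name, event in (("A1", A1), ("A2", A2), ("A3", A3)):
    print(name, classical_probability(event, S))
```

The subset check mirrors the requirement that every event is a subset of the sample space.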

Example 39.1 Two coins are spun. What is the probability that at least one
head appears?
It is essential in the solution to distinguish the coins, as, say, a and b. Thus if Ha is the
event that coin a shows a head, Ta that a shows a tail, and so on, then the sample space
has four elements:
S = {(Ha, Hb ), (Ha, Tb ), (Ta , Hb ), (Ta, Tb)},
which are all equally likely. Thus
$$P((H_a, H_b)) = P((H_a, T_b)) = P((T_a, H_b)) = P((T_a, T_b)) = \tfrac{1}{4}.$$
The event A, that at least one head appears, is the subset
A = {(Ha, Hb ), (Ha, Tb ), (Ta, Hb )},
which contains three of the four elements. Hence at least one head occurs with probability P(A) = 3/4.
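The enumeration in Example 39.1 can be mechanized. In this Python sketch, `itertools.product` generates ordered pairs, and the first entry of each pair is taken to be coin a and the second coin b, so that the coins are distinguished as the solution requires.

```python
from fractions import Fraction
from itertools import product

# Ordered pairs (coin a, coin b); keeping the order distinguishes the coins
S = list(product("HT", repeat=2))   # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
A = [outcome for outcome in S if "H" in outcome]   # at least one head

p_at_least_one_head = Fraction(len(A), len(S))
print(p_at_least_one_head)   # 3/4
```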

Example 39.2 Two distinguishable dice a and b are rolled. What are the
elements of the sample space? What is the probability that the sum of the face
values of the two dice is 8? What is the probability that at least one 5 appears?
We distinguish the outcome of each die separately, so that there are 6 × 6 = 36 possible
outcomes for the pair. The sample space has 36 elements of the form (i, j), where i and j take all integer values 1, 2, 3, 4, 5, 6, i is the outcome of die a, and j is the outcome of die b. The full list is
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)},
and they are all equally likely. If A1 is the event that the sum of the dice is 8, then from
the list
A1 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
which contains 5 of the 36 elements. Hence
$$P(A_1) = \frac{5}{36}.$$
The event that at least one 5 appears is the list
A2 = {(1, 5), (2, 5), (3, 5), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 5)},
which has 11 elements. Hence
$$P(A_2) = \frac{11}{36}.$$
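Listing 36 pairs by hand is error-prone, so a short program can confirm the counts. This Python sketch assumes the pairs (i, j) are equally likely; the set comprehensions mirror set-builder notation, and the helper `P` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# (i, j): i is the score on die a, j the score on die b
S = set(product(range(1, 7), repeat=2))


def P(event: set) -> Fraction:
    """Classical probability over the 36 equally likely pairs in S."""
    return Fraction(len(event), len(S))


A1 = {(i, j) for (i, j) in S if i + j == 8}   # sum of the face values is 8
A2 = {(i, j) for (i, j) in S if 5 in (i, j)}  # at least one 5 appears

print(len(S), P(A1), P(A2))   # 36 5/36 11/36
```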

Self-test 39.1
Two distinguishable dice are rolled. Using the list given in Example 39.2,
what is the probability of the event A1 that the sum of the face values is 7?
What is the probability that (a) the event A2 that no 3 or 5 appears, (b) the
event A3 that no 3 and 5 appears?

39.2 Sets and probability


Set notation is very helpful in representing sample spaces and events. This section uses the properties of sets and Venn diagrams explained in Chapter 35. Consider again Example 39.2, in which two dice are rolled. In set terms the sample space S can be thought of as the universal set for this experiment. Suppose
that we are interested in the event A3 in which either the sum of the two dice is 8
(event A1 ) or at least one 5 appears (event A2 ), or both. This event is the union of
the subsets of S, namely A1 and A2 , represented by
A3 = A1 ∪ A2,
where

A1 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)},
A2 = {(1, 5), (2, 5), (3, 5), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 5)}.
The event A3 has 14 elements of which two are common to both A1 and A2.
If A4 is the event that both A1 and A2 occur, then A4 is the intersection of A1 and A2, namely
A4 = A1 ∩ A2 = {(3, 5), (5, 3)}.
The two events are shown diagrammatically in Fig. 39.1.
Remember that the complement of a set or event A is denoted by $\bar{A}$, and the empty set by Ø.

[Fig. 39.1 Venn diagrams of the events A1 and A2 in the sample space S: (a) the event A3 = A1 ∪ A2; (b) the event A4 = A1 ∩ A2.]

Example 39.3 Suppose that A, B, C are three events in the sample space S. Write
down the sets which represent the events that: (a) A occurs, but neither B nor C
occurs; (b) A, B, and C all occur.
(a) The event that B or C occurs will be B ∪ C. The event that neither B nor C occurs will be the complement $\overline{B \cup C}$. The required set will be the intersection of this event and A, namely
$$A \cap \overline{(B \cup C)}.$$
By de Morgan’s first law (35.6), this is equivalent to
$$A \cap (\bar{B} \cap \bar{C}),$$
which can be written unambiguously as $A \cap \bar{B} \cap \bar{C}$ by the associative law for intersection (35.4).
(b) Events B and C occur in the set B ∩ C. Events A and B ∩ C occur in the event
A ∩ (B ∩ C) or A ∩ B ∩ C.
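The de Morgan step in part (a) can be checked on a concrete sample space. This Python sketch assumes a single die and three events A, B, C chosen purely for illustration (they are not from the text); the helper `complement` takes complements relative to S.

```python
S = set(range(1, 7))   # single throw of a die


def complement(event: set) -> set:
    """Complement of an event relative to the sample space S."""
    return S - event


# Illustrative events chosen for this check only
A = {2, 4, 6}   # even score
B = {1, 2}      # score at most 2
C = {6}         # a six

lhs = A & complement(B | C)                  # A ∩ complement of (B ∪ C)
rhs = A & complement(B) & complement(C)      # A ∩ (not B) ∩ (not C), by de Morgan's law
print(lhs, rhs)   # {4} {4}
print(A & B & C)  # set(): A, B, and C never occur together here
```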

Two events are said to be mutually exclusive if they cannot occur together in a single trial (or experiment), which in set terms is equivalent to the two subsets of S being disjoint: that is, having no elements in common. As an illustration, consider a single die which is rolled and the score noted. An event of interest in a random experiment can be specified in many ways. A player could be interested in even or odd scores, the score 2 or not, or scores which are factors of 6 or not. In each case the sample space is divided into two disjoint sets, or mutually exclusive events, which together form an exhaustive list of outcomes (meaning that every outcome lies in at least one of the events). For example, if A stands for the event of an even score, then $\bar{A}$ must represent an odd score. Thus
$$A \cap \bar{A} = \varnothing \quad\text{and}\quad A \cup \bar{A} = S,$$
since the outcome must be either even or odd.
If Ai denotes the event of a score i, where i = 1, 2, … , 6, for the rolling of a die, then the events are mutually exclusive and exhaustive, and the sample space will be the union of these events:
S = A1 ∪ A2 ∪ ··· ∪ A6.
Any union of events can be expressed in terms of the union of certain mutually exclusive events. For example, the union A ∪ B of two events A and B can be partitioned into the mutually exclusive events $A \cap \bar{B}$, $A \cap B$, and $\bar{A} \cap B$. Then
$$A \cup B = (\bar{A} \cap B) \cup (A \cap B) \cup (A \cap \bar{B}).$$
In another example, for any two events A and B in the same sample space, A can be divided as
$$A = (A \cap B) \cup (A \cap \bar{B}),$$
which can be interpreted as meaning that A can occur either with B or without B.
Suppose the sample space is partitioned into the n mutually exclusive and
exhaustive events A1, A2, … , An. If A is any event, then
A = (A ∩ A1) ∪ (A ∩ A2) ∪ ··· ∪ (A ∩ An).
This means that, if A occurs, then it must occur as one, and only one, of the events
A1, A2, … , An. It might happen that A ∩ Ai = Ø for some intersections, but this
does not matter.

Example 39.4 In Example 39.2, express the sample space S and the events A1
and A2 in set terms.
The sample space is given by
S = {(i, j )|i, j = 1, 2, 3, 4, 5, 6},
which has 36 elements since the dice are distinguishable and (i, j) is distinct from (j, i).
The events A1 (the sum of the faces of the two dice is 8) and A2 (at least one 5 appears on the two dice) can be written
A1 = {(i, j)|i + j = 8},
A2 = {(i, j)|either i = 5 or j = 5 or both}.

The rules governing probability are as follows:

Axioms of probability
For every event A in a sample space S, the probability P(A) must satisfy:
(a) $0 \le P(A) \le 1$;
(b) for the empty set (or non-event) and the sample space S:
$$P(\varnothing) = 0, \qquad P(S) = 1;$$
(c) for n mutually exclusive events A1, A2, … , An,
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n). \tag{39.1}$$

The rules can be interpreted as follows:
(a) every probability must lie between 0 and 1 inclusive;
(b) the probability of an impossible event is zero, and the occurrence of some element of the sample space is certain;
(c) the probability that one of a set of mutually exclusive events occurs is the sum of the probabilities of each event.

Example 39.5 Two dice are rolled. What is the probability that a total score
of 4 or 7 occurs?
Let A1 be the event of a score 4 and A2 be the event of a score 7. These cannot occur together, so they are mutually exclusive events. The event of a score 4 or 7 is A1 ∪ A2. Hence by (39.1c) and the complete list of outcomes in Example 39.2,
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) = \frac{3}{36} + \frac{6}{36} = \frac{1}{4}.$$

If two events A1 and A2 are not mutually exclusive then they must have elements of the sample space in common. Using the partitioning explained earlier in this section, A1, A2, and therefore the union of A1 and A2, can be partitioned into unions of mutually exclusive events. Thus, since $A_1 \cap \bar{A}_2$ is ‘A1 but not A2’, and $A_1 \cap A_2$ is ‘A1 and A2’,
$$A_1 = (A_1 \cap \bar{A}_2) \cup (A_1 \cap A_2).$$
Similarly
$$A_2 = (\bar{A}_1 \cap A_2) \cup (A_1 \cap A_2).$$
Therefore
$$A_1 \cup A_2 = (A_1 \cap \bar{A}_2) \cup (\bar{A}_1 \cap A_2) \cup (A_1 \cap A_2),$$
since $(A_1 \cap A_2) \cup (A_1 \cap A_2) = A_1 \cap A_2$. Hence by rule (c) in (39.1),
$$P(A_1) = P(A_1 \cap \bar{A}_2) + P(A_1 \cap A_2), \tag{39.2}$$
$$P(A_2) = P(\bar{A}_1 \cap A_2) + P(A_1 \cap A_2), \tag{39.3}$$
$$P(A_1 \cup A_2) = P(A_1 \cap \bar{A}_2) + P(\bar{A}_1 \cap A_2) + P(A_1 \cap A_2). \tag{39.4}$$
Elimination of $P(A_1 \cap \bar{A}_2)$ and $P(\bar{A}_1 \cap A_2)$ between (39.2), (39.3), and (39.4), by substituting $P(A_1 \cap \bar{A}_2) = P(A_1) - P(A_1 \cap A_2)$ and $P(\bar{A}_1 \cap A_2) = P(A_2) - P(A_1 \cap A_2)$ into (39.4), leads to:

Probability addition law
For two events which are not necessarily mutually exclusive:
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2). \tag{39.5}$$

Geometrically the result can be seen from Fig. 39.2 in which the intersection
A1 ∩ A2 is ‘counted twice’ in P(A1) + P(A2).
[Fig. 39.2 Venn diagram of the overlapping events A1 and A2, showing the intersection A1 ∩ A2.]

Example 39.6 If two dice are rolled, what is the probability that either the sum
is 8 or at least one 5 appears?
As we saw in Example 39.2, if A1 is the event that the sum is 8 and A2 the event that at
least one 5 appears, then
$$P(A_1) = \frac{5}{36}, \qquad P(A_2) = \frac{11}{36}.$$
These are not mutually exclusive events because both events occur when the outcome is (5, 3) or (3, 5). Therefore
A1 ∩ A2 = {(3, 5), (5, 3)},
in which case
$$P(A_1 \cap A_2) = \frac{2}{36} = \frac{1}{18}.$$
Hence, by (39.5),
$$P(A_1 \cup A_2) = \frac{5}{36} + \frac{11}{36} - \frac{2}{36} = \frac{7}{18},$$
which means that the sum is 8 or at least one 5 appears with probability 7/18. This can be checked by counting the occurrences of 8 or at least one 5 in the list in Example 39.2.
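The check suggested at the end of Example 39.6 can be carried out by machine. This Python sketch, assuming the 36 equally likely pairs of Example 39.2, counts the union directly and compares the result with the addition law (39.5); the helper `P` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # 36 equally likely pairs (i, j)


def P(event: set) -> Fraction:
    return Fraction(len(event), len(S))


A1 = {(i, j) for (i, j) in S if i + j == 8}    # sum is 8
A2 = {(i, j) for (i, j) in S if 5 in (i, j)}   # at least one 5

direct = P(A1 | A2)                           # count the union directly
addition_law = P(A1) + P(A2) - P(A1 & A2)     # formula (39.5)
print(sorted(A1 & A2), direct, addition_law)  # [(3, 5), (5, 3)] 7/18 7/18
```

Subtracting P(A1 & A2) removes exactly the two outcomes that would otherwise be counted twice.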

Self-test 39.2
Using a Venn diagram show that, for three events A, B, C,
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
Extend formula (39.5) to the probability of three events A, B, C, namely
P(A ∪ B ∪ C).

39.3 Frequencies and combinations


In many applications the total number of elements in a sample space or in an event can be counted, although enumeration of outcomes can be a lengthy process. For example, suppose that an experiment consists of trials such as the spinning of k coins or the rolling of k dice. If there are n possible outcomes for each coin or die, then the sample space has $n^k$ possible outcomes. The rolling of four distinguishable dice leads to a sample space with $6^4 = 1296$ elements.
As we saw in the introduction to this chapter, probabilities can be obtained using relative frequency arguments. For the counting process which is needed, permutation and combination formulae are often useful. Section 1.17 provides a full account of permutations and combinations. We shall review the main results here.
A permutation is a particular ordered selection. The notation ${}^nP_r$ means the number of ways in which r different items can be selected from n distinct items taking regard of the order of selection. If items are not replaced, the first item can be chosen in n ways, leaving n − 1 items. Hence the second item can be chosen in n − 1 ways, and the first two items can be chosen in n(n − 1) different ways. Continuing this process r times we obtain
$${}^nP_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$

Example 39.7 How many permutations of the letters a, b, c, d can be made if two are selected each time?
In this example n = 4 and r = 2. Thus
$${}^4P_2 = 4 \cdot 3 = 12.$$
The full list of permutations is
{ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc}.

A combination is an unordered selection, simply a subset. The notation for a combination is ${}^nC_r$, which means the number of ways in which r different items can be selected from n items without regard to order. Among the ${}^nP_r$ permutations there are r! which give the same combination, because the r selected items can be arranged in r! orders: the first position can be chosen in r ways, the second in r − 1 ways, and so on. Thus
$${}^nC_r = \frac{{}^nP_r}{r!} = \frac{n!}{(n-r)!\,r!}.$$
In Example 39.7 the selections ab and ba (and similarly each other pair) are not distinguished in a combination, so that two different letters may be chosen from four different letters in
$${}^4C_2 = \frac{4!}{2!\,2!} = 6$$
ways.
Note that
$${}^nC_r = \frac{n!}{(n-r)!\,r!} = \frac{n!}{(n-(n-r))!\,(n-r)!} = {}^nC_{n-r}.$$
An alternative notation for ${}^nC_r$ is
$${}^nC_r = \binom{n}{r}$$
(see eqn (1.46)). Notice also that the sequence ${}^nC_r$ (r = 0, 1, 2, … , n) generates the coefficients of the binomial series in $(a + b)^n$ (see Section 1.18 and Appendix A(c)).
In the formulae 0! is interpreted as having the value 1, so that the special values are
$${}^nC_0 = 1, \qquad {}^nC_n = 1.$$
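The counting formulae can be checked against explicit enumeration. This sketch assumes Python 3.8 or later, where `math.perm` and `math.comb` compute ${}^nP_r$ and ${}^nC_r$; `itertools.permutations` and `itertools.combinations` list the selections themselves. The values n = 10, r = 3 in the identity checks are arbitrary illustrative choices.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = "abcd"

# Example 39.7: ordered selections (permutations) of two letters
ordered = ["".join(p) for p in permutations(letters, 2)]
print(perm(4, 2), ordered)

# Unordered selections (combinations): ab and ba count only once
unordered = ["".join(c) for c in combinations(letters, 2)]
print(comb(4, 2), unordered)

# nCr = nPr / r!, the symmetry nCr = nC(n-r), and the special values (0! = 1)
n, r = 10, 3
assert comb(n, r) == perm(n, r) // factorial(r)
assert comb(n, r) == comb(n, n - r)
assert comb(n, 0) == 1 and comb(n, n) == 1
```

The permutations are generated in the same order as the list in Example 39.7.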

Example 39.8 How many different five-card hands can be dealt from a standard deck with 52 cards? What is the probability that a hand dealt at random consists of five spades?
This is a combination problem, not a permutation one. Thus there are
$${}^{52}C_5 = \frac{52!}{47!\,5!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = 2\,598\,960$$
different hands.
The number of different hands consisting of five spades is, since there are 13 spades in the pack,
$${}^{13}C_5 = \frac{13!}{8!\,5!} = \frac{13 \cdot 12 \cdot 11 \cdot 10 \cdot 9}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = 1287.$$
To obtain the probability that a random five-card hand contains five spades we can use the counting argument, namely that out of the 2 598 960 equally likely different hands 1287 will have five spades. Hence, by the frequency argument,
$$P(\text{five-card spade hand}) = \frac{1287}{2\,598\,960} \approx 0.0005,$$
which implies that about one hand in 2000 will have five spades.
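The two binomial coefficients in Example 39.8 are easy to confirm numerically. This minimal Python sketch assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

hands = comb(52, 5)        # different five-card hands
spade_hands = comb(13, 5)  # hands made up entirely of spades
p = spade_hands / hands

print(hands, spade_hands)   # 2598960 1287
print(f"P = {p:.4f}, about one hand in {round(1 / p)}")
```

The exact ratio is roughly one hand in 2019, consistent with the ‘about one hand in 2000’ quoted above.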

Example 39.9 A box contains 20 balls of which 7 are red (r), 5 are white (w), and 8 are black (b). If three balls are drawn at random, without replacement, find the probability that
(a) two red balls and one black ball are drawn;
(b) one of each colour is drawn;
(c) one or more red balls are drawn;
(d) all are of the same colour.
The total number of three-ball selections which can be made is
$$N = {}^{20}C_3 = 1140$$
for labelled balls. They are all equally likely to be drawn.
(a) The number of ways in which two red balls and one black ball can be drawn is
$${}^7C_2 \times 8 = \frac{7 \cdot 6}{1 \cdot 2} \cdot 8 = 168.$$
Hence
$$P(2r \text{ and } 1b) = \frac{168}{N} = \frac{168}{1140} = \frac{14}{95} \approx 0.15.$$
(b) The number of ways in which one of each colour can be chosen is 7 × 5 × 8 = 280 from a total of 1140. Hence
$$P(1r \text{ and } 1w \text{ and } 1b) = \frac{280}{1140} = \frac{14}{57} \approx 0.25.$$
(c) The number of ways in which no red ball is drawn is ${}^{13}C_3 = 286$ from the total of 1140. Hence the probability that a selection contains at least one red ball is
$$P(\ge 1r) = 1 - P(0r) = 1 - \frac{286}{1140} = \frac{854}{1140} = \frac{427}{570} \approx 0.75.$$
(d) Since the events 3r, 3w, and 3b are mutually exclusive, using (39.1c),
$$P(3r \text{ or } 3w \text{ or } 3b) = P(3r \cup 3w \cup 3b) = P(3r) + P(3w) + P(3b) = \frac{{}^7C_3 + {}^5C_3 + {}^8C_3}{{}^{20}C_3} = \frac{101}{1140} \approx 0.09.$$
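All four parts of Example 39.9 follow from binomial coefficients, so they can be reproduced exactly. This Python sketch assumes Python 3.8 or later for `math.comb` and uses `fractions.Fraction` so that the answers appear in lowest terms.

```python
from fractions import Fraction
from math import comb

red, white, black = 7, 5, 8
total = comb(red + white + black, 3)   # N = 20C3 = 1140 equally likely selections

p_a = Fraction(comb(red, 2) * comb(black, 1), total)                    # two red, one black
p_b = Fraction(red * white * black, total)                               # one of each colour
p_c = 1 - Fraction(comb(white + black, 3), total)                        # at least one red
p_d = Fraction(comb(red, 3) + comb(white, 3) + comb(black, 3), total)   # all same colour

for label, p in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(f"({label}) {p} ≈ {float(p):.2f}")
```

Part (c) uses the complement: it is simpler to count selections with no red ball than those with one, two, or three.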

Self-test 39.3
In Example 39.8, suppose that six cards are dealt. What is the probability
that the hand consists of red cards only?

39.4 Conditional probability


In many applications we are interested in an event A given that an event B occurs. The probability of A, conditional on B occurring, is written as P(A|B). A Venn diagram showing the overlapping events is displayed in Fig. 39.3a. The probability P(A|B) refers to the restricted set in Fig. 39.3b, in which effectively the new universal set is B. When all outcomes are equally likely, in enumeration terms we can derive
$$P(A \mid B) = \frac{\text{number of outcomes in } A \cap B}{\text{number of outcomes in } B} = \frac{(\text{number of outcomes in } A \cap B)/(\text{number of outcomes in } S)}{(\text{number of outcomes in } B)/(\text{number of outcomes in } S)}.$$
Hence the formal definition is, assuming that P(B) ≠ 0,

Conditional probability of A given B
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{39.6}$$

[Fig. 39.3 (a) Both A and B occur in the shaded intersection A ∩ B. (b) P(A | B) refers to the new universal set B.]

Example 39.10 Six cards are dealt from a well-shuffled deck of playing cards. Given that all six cards are black, find the probability that they are all of the same suit.
Let A and B represent the following events:
A = {the six cards are of the same suit}, B = {the cards are black}.
Thus
A ∩ B = {six black cards of the same suit}.
Therefore
$$P(A \cap B) = \frac{\text{number of combinations of six clubs or six spades}}{\text{number of combinations of six cards}} = \frac{2 \cdot {}^{13}C_6}{{}^{52}C_6}.$$
Also
$$P(B) = \frac{\text{number of combinations of six black cards}}{\text{number of combinations of six cards}} = \frac{{}^{26}C_6}{{}^{52}C_6}.$$
Hence the conditional probability that they are all of the same black suit is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2 \cdot {}^{13}C_6}{{}^{52}C_6} \cdot \frac{{}^{52}C_6}{{}^{26}C_6} = 2 \cdot \frac{13!}{6!\,7!} \cdot \frac{6!\,20!}{26!} = \frac{12}{805} \approx 0.015.$$
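Definition (39.6) can be applied directly with exact arithmetic. This Python sketch assumes Python 3.8 or later for `math.comb`; the variable names are illustrative. Note that the common factor ${}^{52}C_6$ cancels, just as in the hand calculation.

```python
from fractions import Fraction
from math import comb

all_hands = comb(52, 6)
p_black = Fraction(comb(26, 6), all_hands)                # P(B): all six cards black
p_black_same_suit = Fraction(2 * comb(13, 6), all_hands)  # P(A ∩ B): six clubs or six spades

p_same_suit_given_black = p_black_same_suit / p_black      # definition (39.6)
print(p_same_suit_given_black, round(float(p_same_suit_given_black), 3))   # 12/805 0.015
```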

Note the following properties of conditional probabilities:

(i) P(A|A) = 1.
(ii) P(A|B)P(B) = P(B|A)P(A).

The latter follows since A ∩ B = B ∩ A and


P(A| B) = P(A ∩ B)/P(B), P(B | A) = P(B ∩ A)/P(A)
from definition (39.6).

Example 39.11 A production line is supplied with the same component made
by two different machines M1 and M2. It is known from samples of the outputs
that the probability that a component from M1 is not faulty is 0.91 and from
M2 is 0.85. Machine M1 supplied 60% of the components and machine M2 40%.
Components are chosen at random and tested before the next stage of
production. What is the probability that
(a) given that a component was made by M2 it is not faulty?
(b) a component is not faulty?
Let A1, A2, and B be the events
A1 = {component made by M1}, A2 = {component made by M2},
B = {component not faulty}.
From the 60%/40% supply we know that P(A1) = 0.6 and that P(A2) = 0.4. The known failure rates in M1 and M2 give the conditional probabilities P(B | A1) = 0.91 and P(B | A2) = 0.85.
(a) The answer is P(B | A2) = 0.85.
(b) Write the event B as (B ∩ A1) ∪ (B ∩ A2), which is still the event that the component is not faulty. Since B ∩ A1 and B ∩ A2 are mutually exclusive, it follows that
P[(B ∩ A1) ∪ (B ∩ A2)] = P(B ∩ A1) + P(B ∩ A2)
= P(B | A1)P(A1) + P(B | A2)P(A2)
= 0.91 × 0.6 + 0.85 × 0.4 = 0.886,
using (39.6). Hence the probability of a non-faulty component is approximately 0.89.
In solving this problem we have encountered in (b) a new law, called the law of total probability, which will be discussed further in Section 39.6.

Self-test 39.4
A manufacturer buys components from three suppliers: 50% from supplier S1, 30% from S2, and 20% from S3. A component from S1 is found to be faulty with probability 0.05, from S2 with probability 0.07, and from S3 with probability 0.06. What is the probability that a component chosen at random is not faulty?

39.5 Independent events


The recognition of independence of events and data is crucial in probability and
statistics. Two events are said to be independent if the occurrence of either event
has no effect on the occurrence of the other. In terms of conditional probability
this means that two events A and B are independent if and only if
$$P(B \mid A) = P(B) \quad\text{or}\quad P(A \mid B) = P(A). \tag{39.7a}$$
In that case
$$P(A \cap B) = P(A)P(B) \tag{39.7b}$$
by (39.6). The independence result (39.7b) generalizes for N independent events A1, A2, … , AN to
$$P(A_1 \cap A_2 \cap \cdots \cap A_N) = P(A_1)P(A_2)\cdots P(A_N). \tag{39.8}$$
The following simple illustration shows the distinction between dependent and independent events. Two cards are chosen at random from a pack of 52 cards. In the first case, the first card is replaced before the second card is chosen. The events considered are
A = {first card is an ace},
B = {second card is an ace}.
Then, since with replacement the second card is drawn from the full pack,
$$P(A) = P(B) = \frac{4}{52} = \frac{1}{13}, \qquad P(B \mid A) = \frac{4}{52} = \frac{1}{13} = P(B).$$
In other words the events are independent.
On the other hand, if there is no replacement, then P(B) is still 1/13 (any of the 52 cards is equally likely to be the second card), but
$$P(B \mid A) = \frac{3}{51} \ne P(B),$$
indicating that A and B are not independent events.
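The contrast between drawing with and without replacement can be confirmed by enumerating every equally likely ordered pair of cards. In this Python sketch the cards are labelled 0 to 51, and the choice that cards 0 to 3 are the aces is a hypothetical encoding; `conditional_second_ace` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical encoding: cards are labelled 0-51 and cards 0-3 are the aces
deck = range(52)
aces = set(range(4))


def conditional_second_ace(pairs):
    """Return (P(B | A), P(B)), where A = first card is an ace and
    B = second card is an ace, over equally likely ordered pairs."""
    pairs = list(pairs)
    first_ace = [pair for pair in pairs if pair[0] in aces]
    both_aces = [pair for pair in first_ace if pair[1] in aces]
    second_ace = [pair for pair in pairs if pair[1] in aces]
    return (Fraction(len(both_aces), len(first_ace)),
            Fraction(len(second_ace), len(pairs)))


with_replacement = conditional_second_ace(product(deck, repeat=2))   # 52 * 52 pairs
without_replacement = conditional_second_ace(permutations(deck, 2))  # 52 * 51 pairs
print(with_replacement)      # (Fraction(1, 13), Fraction(1, 13))
print(without_replacement)   # (Fraction(1, 17), Fraction(1, 13))
```

Without replacement P(B | A) = 1/17 = 3/51, while P(B) remains 1/13, so the events are dependent.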

Example 39.12 Figure 39.4 shows parts of two circuits which contain electrical
components P, Q, and R placed in parallel and series. For the parallel case the
circuit fails if all three components fail, but in the series case failure occurs if just
one component fails. In some time interval the probabilities of failure of P, Q,
and R are respectively p, q, and r. What are the probabilities of circuit
breakdown in the two cases?

[Fig. 39.4 (a) Components P, Q, and R in parallel. (b) Components P, Q, and R in series. Each component is labelled with its failure probability p, q, or r.]

Let A, B, and C be the events


A = {P fails}, B = {Q fails}, C = {R fails},
where we assume that failures are independent events.
For the parallel case failure occurs if A ∩ B ∩ C occurs. By (39.8)
P(A ∩ B ∩ C) = P(A)P(B)P(C) = pqr,
which means that the probability of failure is pqr.
For the series case failure occurs if the event A ∪ B ∪ C occurs. Using (39.5) twice
and (39.8)
P(A ∪ B ∪ C) = P(A) + P(B ∪ C) − P(A ∩ (B ∪ C))
= p + P(B ∪ C) − P(A)P(B ∪ C)
= p + (1 − p)(P(B) + P(C) − P(B ∩ C))
= p + (1 − p)(q + r − qr)
= (p + q + r) − (qr + rp + pq) + pqr
which is the probability of series failure.
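The two results of Example 39.12 can be checked by brute force over the 8 fail/work patterns of the three components, assuming, as the example does, that failures are independent. In this Python sketch the failure probabilities p = 0.1, q = 0.2, r = 0.3 are illustrative values only, and `breakdown_probabilities` is a hypothetical helper name.

```python
from itertools import product


def breakdown_probabilities(p: float, q: float, r: float):
    """Enumerate all 8 fail/work patterns of P, Q, R (assumed independent)
    and return (P(parallel circuit fails), P(series circuit fails))."""
    probs = (p, q, r)
    parallel = series = 0.0
    for failed in product((True, False), repeat=3):
        weight = 1.0
        for f, prob in zip(failed, probs):
            weight *= prob if f else 1 - prob   # independence: multiply probabilities
        if all(failed):      # parallel: fails only if all three fail
            parallel += weight
        if any(failed):      # series: fails if at least one fails
            series += weight
    return parallel, series


p, q, r = 0.1, 0.2, 0.3   # illustrative values, not from the text
par, ser = breakdown_probabilities(p, q, r)
print(par, p * q * r)
print(ser, (p + q + r) - (q * r + r * p + p * q) + p * q * r)
```

The series result also equals 1 − (1 − p)(1 − q)(1 − r), the complement of the event that all three components work.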
Self-test 39.5
In a large collection of components, the probability that any component is faulty is 0.05. Three components are chosen at random. Determine the following probabilities: (a) the three components are all faulty; (b) only one component is faulty; (c) at least one component is faulty; (d) at least two components are faulty.

39.6 Total probability


Suppose that a sample space is partitioned (see Section 39.2) into two events A1
and A2 which are mutually exclusive. In other words A1 ∩ A2 = Ø and A1 ∪ A2 = S.
Let B be an event in S (see Fig. 39.5). The sets B ∩ A1 and B ∩ A2 are mutually
exclusive so that
P(B) = P(B ∩ A1) + P(B ∩ A2).
[Fig. 39.5 The sample space S partitioned into the events A1 and A2.]
From the notion of conditional probability (39.6) we obtain:

The law of total probability
For mutually exclusive and exhaustive events A1 and A2,
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2). \tag{39.9}$$

The result generalizes to the case in which S contains n mutually exclusive and exhaustive events A1, A2, … , An. If B is an event in S, then
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)P(A_i).$$
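The general formula is a weighted sum, which translates directly into code. This Python sketch assumes the conditional probabilities P(B | Ai) and the probabilities P(Ai) are supplied as two equal-length lists; the function name `total_probability` is hypothetical. As a check it is applied to the data of Example 39.11.

```python
from math import fsum


def total_probability(likelihoods, priors):
    """Law of total probability: P(B) = sum of P(B | A_i) P(A_i).
    Assumes the events A_i are mutually exclusive and exhaustive,
    so that the probabilities P(A_i) sum to 1."""
    if len(likelihoods) != len(priors):
        raise ValueError("need one conditional probability per event A_i")
    if abs(fsum(priors) - 1.0) > 1e-9:
        raise ValueError("the probabilities P(A_i) must sum to 1")
    return fsum(pb * pa for pb, pa in zip(likelihoods, priors))


# Data of Example 39.11: P(B | A_i) for machines M1, M2, and their shares P(A_i)
p_not_faulty = total_probability([0.91, 0.85], [0.6, 0.4])
print(round(p_not_faulty, 3))   # 0.886
```

The check that the P(Ai) sum to 1 is a simple guard against supplying events that do not partition S.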

Example 39.13 A box contains 8 red and 13 black components. A machine draws components at random from the box and fits them into a circuit. What is the probability that the second component is red?
Suppose now that components in the box are replaced with components of the same colour as they are used. What is the probability that the second component is red?
Define the events as follows:
A1 = {first component is red},
A2 = {first component is black},
B = {second component is red}.
Then
$$P(A_1) = \frac{8}{21}, \qquad P(A_2) = \frac{13}{21}.$$
Also
$$P(B \mid A_1) = \frac{7}{20}, \qquad P(B \mid A_2) = \frac{8}{20}.$$
Using (39.9), since A1 and A2 are mutually exclusive and exhaustive,
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) = \frac{7}{20} \cdot \frac{8}{21} + \frac{8}{20} \cdot \frac{13}{21} = \frac{8}{21}.$$
Hence the probability that the second component drawn is red is 8/21, which is the same as P(A1). This suggests (correctly) that, when the colour of the first component is not known, the probability that the second component is red is the same as that for the first component.
The first solution was selection without replacement. In the second part of the question the components are replaced. In this case P(B) = 8/21; in other words, with or without replacement, the probability that the second component is red is still 8/21.
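The result P(B) = 8/21 without replacement can be confirmed without the law of total probability, by counting every ordered draw of two distinct components. In this Python sketch the labelling (components 0 to 7 red, the rest black) is a hypothetical encoding of the box.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labelling: components 0-7 are red, components 8-20 are black
colours = ["red"] * 8 + ["black"] * 13

# Every ordered draw of two distinct components, without replacement (21 * 20 draws)
draws = list(permutations(range(len(colours)), 2))
second_red = sum(1 for _first, second in draws if colours[second] == "red")

p_second_red = Fraction(second_red, len(draws))
print(p_second_red)   # 8/21
```

By symmetry, each of the 21 components is equally likely to be the one drawn second, which is why the answer matches P(A1).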

Self-test 39.6
In Example 39.13, what is the probability that the third component drawn at random from the box is red?

39.7 Bayes’ theorem


Suppose that the sample space S is the union of the mutually exclusive events A1 and A2. In this case $A_2 = \bar{A}_1$, and the notation suggests the generalization which follows. Suppose that an event B occurs. We ask the question: if B occurs, what is the probability that A1 occurs? In other words, what is P(A1 | B)?
From the rule for conditional probability (39.6) we can deduce that, since B ∩ A1 = A1 ∩ B, etc.,
$$P(B \cap A_1) = P(A_1 \mid B)P(B) = P(B \mid A_1)P(A_1), \tag{39.10}$$
$$P(B \cap A_2) = P(A_2 \mid B)P(B) = P(B \mid A_2)P(A_2). \tag{39.11}$$
From (39.9) we also have
$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2). \tag{39.12}$$
Elimination of P(B) between (39.10) and (39.12) leads to:

Bayes’ theorem
For mutually exclusive and exhaustive events A1 and A2, with P(B) ≠ 0,
$$P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{P(B)} = \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)}. \tag{39.13}$$

Example 39.14 It is known that 4% of a batch of components in a manufacturing process are faulty. Components are tested on the production line with 90% probability that a faulty component is detected, but it is known that in 2% of the cases a component which is not faulty is nevertheless recorded as faulty. What is the probability that a component which is recorded as faulty is actually faulty?
Let A1 and A2 be the events
A1 = {component faulty},
A2 = {component not faulty},
and let B be the event
B = {test indicates faulty}.
Then
P(A1) = 0.04, P(A2) = 0.96, P(B | A1) = 0.90, P(B | A2) = 0.02.
We require the probability that the component is faulty given that the test recorded faulty, that is P(A1 | B). By Bayes’ theorem (39.13)
$$P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2)} = \frac{0.9 \times 0.04}{0.9 \times 0.04 + 0.02 \times 0.96} \approx 0.65.$$
If the sample space is partitioned by {Ai} (i = 1, 2, … , n), then the generalized Bayes’ theorem is
$$P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)P(A_j)}.$$
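The generalized form of Bayes’ theorem needs only the lists of P(B | Aj) and P(Aj). This Python sketch uses the hypothetical helper name `bayes_posterior` and assumes the events Aj partition the sample space; the denominator is computed by the law of total probability (39.9). It reproduces Example 39.14.

```python
from math import fsum


def bayes_posterior(likelihoods, priors, i):
    """Generalized Bayes' theorem: P(A_i | B) for a partition {A_j} of S,
    given P(B | A_j) in likelihoods and P(A_j) in priors."""
    evidence = fsum(lik * pri for lik, pri in zip(likelihoods, priors))  # P(B)
    if evidence == 0:
        raise ValueError("P(B) must be nonzero")
    return likelihoods[i] * priors[i] / evidence


# Example 39.14: index 0 is A1 (faulty), index 1 is A2 (not faulty)
posterior = bayes_posterior([0.90, 0.02], [0.04, 0.96], 0)
print(round(posterior, 2))   # 0.65
```

Despite the 90% detection rate, about a third of the components recorded as faulty are not, because faulty components are rare (4%).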

Problems

39.1 How many elements do the following sample spaces contain?
(a) the spinning of five coins;
(b) the sum of the faces of three dice;
(c) a coin and a die randomly thrown together;
(d) a dart thrown at a dartboard.

39.2 Two dice are rolled. What is the probability that the sum of the face values is 7? What is the probability that no 5 appears? What is the probability that the score is 7 or less?

39.3 Two dice are rolled and the scores noted. Write down the elements in the sample space. How many elements does the set have? Let A denote the event {the sum of the outcomes is 5}, and B denote the event {at least one die shows 4}. Express the sets of these events in formula terms. List all the elements in A, B, A ∪ B, and A ∩ B.

39.4 Suppose that A, B, and C are three events of the sample space S. Write down the set formulae for the events:
(a) only B occurs,
(b) exactly one of A, B, or C occurs.

39.5 Suppose that a sample space S includes the events A and B. Show that the number of elements in A ∪ B can be expressed as
$$n(A \cup B) = n(A \cap B) + n(\bar{A} \cap B) + n(A \cap \bar{B})$$
(this is an alternative version of (35.7)). Suppose two dice are rolled. Let A denote the event {the sum of the outcomes is 6} and B the event {both dice show the same number}. List the elements in A ∩ B, Ā ∩ B, and A ∩ B̄, and find n(A ∪ B) using the formula above.

39.6 A card is drawn from a deck of 52 playing cards. If A is the event that an ace is drawn, B is the event that a heart is drawn, and C is the event that a black card is drawn, explain in terms of the cards drawn what the following events represent:
(a) A ∩ B; (b) A ∩ C; (c) A ∪ B; (d) A ∪ B ∪ C; (e) A\B; (f) Ā\B̄; (g) Ā\C̄; (h) (A ∩ B) ∪ C; (i) (A ∩ B) ∪ (A ∩ C̄).

39.7 Cards are drawn from a deck of 52 playing cards without replacement. What is the probability that
(a) the first card is a king?
(b) the first two cards are kings?
(c) the first card is a king, the second and third cards are not kings, and the fourth card is a king?

39.8 A well-shuffled deck of cards is cut twice randomly. What is the probability that two aces are shown? (This is a problem of selection with replacement.)

39.9 Evaluate the following permutations:
(a) ${}^{5}P_{3}$; (b) ${}^{10}P_{4}$; (c) ${}^{7}P_{7}$; (d) ${}^{7}P_{1}$.

39.10 How many different three-letter ‘words’ can be made up from the letters a, b, c, d, e with no repetition of letters?

39.11 How many five-digit numbers can be formed (numbers cannot start with 0) from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, if
(a) numbers are selected without replacement?
(b) any number of repetitions of numbers is allowed?
(c) without replacement but such that the number must be divisible by 5?

39.12 Calculate the following combinations:
(a) ${}^{7}C_{3}$; (b) ${}^{99}C_{96}$; (c) ${}^{11}C_{5}$.

39.13 Prove that
$${}^{n-1}C_{r} + {}^{n-1}C_{r-1} = {}^{n}C_{r}.$$

39.14 Prove that
(a) $\sum_{r=0}^{n} {}^{n}C_{r} = 2^n$, (b) $\sum_{r=0}^{n} {}^{n}C_{r}\,3^r = 4^n$.

39.15 How many different four-card hands can be dealt from a deck of 52 playing cards? How many hands contain four cards of the same suit? What is the probability that a hand dealt randomly contains four cards from the same suit?

39.16 In the previous question investigate how the probabilities change for n-card hands (1 ≤ n ≤ 13) with n cards from the same suit.

39.17 A box contains 22 balls of which 7 are red, 9 are white, and 6 are black. Four balls are drawn at random from the box without replacement. Find the probability that
(a) three red balls and one white ball are drawn;
(b) the balls are all red;
(c) the balls are all of the same colour;
(d) there is at least one ball of each colour.

39.18 A production line is supplied with the same component made by two different machines M1 and M2. It is known from samples of the outputs that the probability that a component from M1 is not faulty is 0.89 and from M2 is 0.83. Machine M1 supplies 70% of the components and machine M2 30%. Components are chosen at random and tested before the next stage of production. What is the probability that
(a) given that it was made by M1, a component is not faulty?
(b) a component is not faulty?
(c) given that a component was faulty, it was manufactured by M2?

39.19 A production line is supplied with the same component made by three different machines M1, M2, and M3. It is known from samples of the outputs that the probability that a component from M1 is not faulty is 0.87, from M2 is 0.84, and from M3 is 0.91. Machine M1 supplies 45% of the components, machine M2 30%, and machine M3 25%. Components are chosen at random and tested before the next stage of production. What is the probability that
(a) a component is not faulty?
(b) given that a component was faulty, it was manufactured by M2?
(c) given that a component was faulty, it was made by M1 or M2?

[Fig. 39.6: part of a circuit with six components, labelled with failure probabilities p1, p2, p3, q, r1, and r2.]

39.20 Figure 39.6 shows part of a circuit with six components in a parallel and series combination. The probabilities of failures of components are p1, p2, p3, q, r1, and r2 as shown, and are independent. What is the probability that this part of the circuit fails? If all components have the same probability of failure of 0.98, what is the probability that this part of the circuit fails? (Parallel and series failures are as in Example 39.12.)

39.21 It is known that in a batch of 100 microprocessors, 5 are defective.
(a) A microprocessor is chosen at random without replacement. What is the probability that it is defective?
(b) Two are chosen at random without replacement. What is the probability that both are defective?
(c) Two are chosen without replacement. Given that the first is defective, what is the probability that the second is also defective?

39.22 In the UK national lottery 6 numbered balls are selected at random from 49 balls numbered 1, 2, 3, …, 49 without replacement. Prizes are given to those who correctly select three, four, five, or six numbers. Find the probability of winning in each case. A seventh bonus ball is also drawn from the remaining 43 balls and further prizes are given for those who correctly choose the bonus ball and any five of the six drawn numbers. Find the probability of winning in this case. What is the overall probability that a lottery ticket wins at least one prize?

39.23 A game is played in which n players each spin a coin and the outcome is examined. The game continues until the outcome is either n − 1 heads and 1 tail, or 1 head and n − 1 tails. The single player with the different outcome wins the coins from the other players. Show that the probability that the game ends at a given play is $n/2^{n-1}$, and that the probability that the game finishes at the ith play is given by the geometric distribution
$$\frac{n}{2^{n-1}}\left(1 - \frac{n}{2^{n-1}}\right)^{i-1}.$$
Find also the mean number of plays to the end of the game.
40 Random variables and probability distributions

CONTENTS

40.1 Probability distributions
40.2 The binomial distribution
40.3 Expected value and variance
40.4 Geometric distribution
40.5 Poisson distribution
40.6 Other discrete distributions
40.7 Continuous random variables and distributions
40.8 Mean and variance of continuous random variables
40.9 The normal distribution
Problems

In experiments or trials in which the outcome is numerical, the outcomes are values of what is known as a random variable. For example, suppose that a coin is spun three times and we record the outcomes and ask: how many heads appear? Then the answer will be 0, 1, 2, or 3 heads. The sample space S, which lists all possible outcomes in a trial, has eight elements given by
S = {(HHH), (THH), (HTH), (HHT), (TTH), (THT), (HTT), (TTT)}.
The random variable X associated with the question is the number of heads
obtained. Generally, the random variable X assigns a number to each element (outcome) of the sample space S. The set of these numbers is denoted by SX. In this example
SX = {0, 1, 2, 3},
which is a list of the possible numerical outcomes of the number of heads.
The random variable X can be thought of as a function or mapping from the
sample space S to SX which, since it is a set of real numbers, can be represented by
points on a straight line. A representation of the mapping is displayed in Fig. 40.1
in which it is shown that the element s in S is mapped by X into the value X(s) on
the real line SX. In the example s could be (HTT) giving X(HTT) = 1, but notice
that X(THT) = X(TTH) = 1 also.
In the example above, X is a discrete random variable with a finite set of outcomes. In some cases the possible outcomes are infinite in number but can still be counted. For example, suppose that X is the random variable of the number of spins of a coin until a head appears. The list of possible outcomes is {1, 2, 3, … }, which is unbounded but countable.

[Fig. 40.1 Mapping of the random variable X from the sample space S onto the real line SX.]
Obviously many random variables can be associated with the same experiment.
In the example above where a coin is spun three times, a random variable Y, say,
could be the number of tails observed.
Generally, capital letters X, Y, … denote random variables and small letters
{x1, x2, … } sets of elements in a sample space.

40.1 Probability distributions


Let the random variable X take the values x1, x2, … (depending on the context it
is sometimes more convenient to start with x0, x1, … ), where the set of numbers
can be finite or infinite. In terms of a random variable we write probabilities as
P(X = xi ), which means the probability that the random variable X takes the value
xi. We might be interested in P(X < xi), which is the probability that the random variable takes values strictly less than xi, and so on. Often we denote P(X = xi)
simply by the symbol pi . The pairs (xi, pi) for i = 1, 2, … define the probability
distribution or probability function for the random variable X. Note that for any
probability distribution of a discrete random variable we must have:

Probability distribution P(X = xi) = pi

(i) 0 ≤ pi ≤ 1;
(ii) $\sum_{i=1}^{n} p_i = 1$ if X has n possible outcomes, or $\sum_{i=1}^{\infty} p_i = 1$ if X has a countably infinite set of outcomes. (40.1)

A discrete probability distribution can be expressed in a table such as

xi = x1 x2 x3 …
pi = p1 p2 p3 …

For the coin spun three times at the start of this chapter, the distribution would be, with x0 representing no heads, x1 one head, and so on,

xi = 0 1 2 3
pi = 1/8 3/8 3/8 1/8

assuming that each of the outcomes in the original S is equally likely.
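The view of X as a mapping from S to SX can be made concrete by listing S and counting. The Python sketch below assumes, as the text does, that all eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: the 8 equally likely outcomes of three spins, e.g. ('H', 'T', 'T')
outcomes = list(product("HT", repeat=3))

# The random variable X maps each outcome s to the number of heads X(s)
counts = Counter(s.count("H") for s in outcomes)

# Probability distribution {x_i: p_i}, assuming all outcomes are equally likely
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(dist)
```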

Example 40.1 A box contains six components of which two are defective.
Components are selected at random without replacement until a defective
component is chosen. Find the probability distribution of the number of
components drawn from the box.
Let X be the random variable (number of components withdrawn including the
defective). Then
SX = {1, 2, 3, 4, 5} = {xi} (i = 1, 2, 3, 4, 5).
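The probability
$$p_1 = P(X = x_1) = \tfrac{2}{6} = \tfrac{1}{3},$$
since there is a 2 in 6 chance of choosing a defective on the first selection. Also
$$P(X = x_2) = \tfrac{4}{6}\cdot\tfrac{2}{5} = \tfrac{4}{15},$$
since the probability of choosing a non-defective component at the first stage is 4/6, which leaves two defective in the remaining five. Similarly
$$P(X = x_3) = \tfrac{4}{6}\cdot\tfrac{3}{5}\cdot\tfrac{2}{4} = \tfrac{1}{5}, \qquad P(X = x_4) = \tfrac{4}{6}\cdot\tfrac{3}{5}\cdot\tfrac{2}{4}\cdot\tfrac{2}{3} = \tfrac{2}{15},$$
$$P(X = x_5) = \tfrac{4}{6}\cdot\tfrac{3}{5}\cdot\tfrac{2}{4}\cdot\tfrac{1}{3}\cdot 1 = \tfrac{1}{15}.$$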
The complete distribution is

xi = 1 2 3 4 5
pi = 1/3 4/15 1/5 2/15 1/15

The distribution can be represented graphically as shown in Fig. 40.2.

[Fig. 40.2 Bar chart of pi against xi for the distribution of Example 40.1.]
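The chain of conditional probabilities in Example 40.1 works for any box of this kind. The Python sketch below multiplies the same factors using exact fractions. The function name `draws_until_defective` is hypothetical, and the sketch assumes that draws are uniformly random and made without replacement.

```python
from fractions import Fraction


def draws_until_defective(total, defective):
    """Distribution of the number of components drawn (without replacement)
    up to and including the first defective one."""
    good = total - defective
    dist = {}
    all_good_so_far = Fraction(1)   # P(first k - 1 draws are all non-defective)
    for k in range(1, good + 2):
        remaining = total - (k - 1)
        # The k-th draw is the first defective
        dist[k] = all_good_so_far * Fraction(defective, remaining)
        # Condition for the next step: the k-th draw is non-defective
        all_good_so_far *= Fraction(good - (k - 1), remaining)
    return dist


dist = draws_until_defective(total=6, defective=2)
print(dist)
```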

Self-test 40.1
Two dice are rolled until the sum of the face values is 7. Find the distribution
of the number of throws. (Use the table in Example 39.2: note that the
distribution is infinite.)

40.2 The binomial distribution

Suppose that a series of trials is independent and that each trial has two possible outcomes, which occur with probabilities p and 1 − p. If p is constant throughout, then these are known as Bernoulli trials. A simple example is the spinning of a coin. We could define a random variable X which takes the value 1 if a head appears and 0 if a tail appears. Each of these occurs with probability 1/2. The terms success and failure are frequently used in this context, and generally Bernoulli trials apply to populations that naturally divide into pairs of alternatives, for example on/off, male/female, alive/dead, etc. With 1/0 representing success/failure, a Bernoulli sequence of trials might look like
1000111100101100….
If p is the probability of success at each trial and q = 1 − p is the probability of
failure, then the probability distribution of Bernoulli trials is

xi = 0 1
pi = q p

Let us consider a further distribution which can arise from Bernoulli trials. A series of independent Bernoulli trials takes place with the probability of success or failure of any given trial given by p or q, where p + q = 1. Consider the probability distribution of i successes in a fixed number of trials n. In the notation of probability distributions
xi = i (i = 0, 1, 2, … , n).
Here is a particular sequence:
$$\underbrace{11\cdots1}_{i\ \text{times}}\,\underbrace{00\cdots0}_{n-i\ \text{times}}.$$

This sequence, in which there are i successes followed by n − i failures, occurs with probability
$$p^i q^{n-i},$$
since the probability of a success followed by another success is $p \times p = p^2$, and so on. However, there are many sequences which have i successes (1) and n − i failures (0), and the number of possible arrangements is nCi (see Section 39.4 for an explanation of the combination notation). Including every arrangement, the probability of i successes in n trials is
$${}^{n}C_i\, p^i q^{n-i}.$$
This is called the binomial distribution for X, the random variable of the number of successes in n trials.

Binomial distribution, which has the probability function
$$P(X = x_i) = p_i = {}^{n}C_i\, p^i q^{n-i} = \frac{n!\, p^i q^{n-i}}{(n-i)!\, i!} \quad (i = 0, 1, 2, \dots, n). \tag{40.2}$$
The binomial distribution contains two parameters, n the number of trials and p the probability of success. Since q = 1 − p, the first few entries in the distribution are
$$\begin{array}{c|ccccc} x_i & 0 & 1 & 2 & 3 & \cdots \\ \hline p_i & q^n & npq^{n-1} & \dfrac{n(n-1)}{2!}p^2q^{n-2} & \dfrac{n(n-1)(n-2)}{3!}p^3q^{n-3} & \cdots \end{array}$$
which are recognizably the first few terms in the binomial expansion of $(p + q)^n$ (see Appendix A(c)). Hence
$$\sum_{i=0}^{n} p_i = \sum_{i=0}^{n} \frac{n!\, p^i q^{n-i}}{(n-i)!\, i!} = (p+q)^n = 1,$$

since p + q = 1. This confirms that (40.2) does satisfy the key requirement for a
probability distribution. Some bar charts for the binomial distribution are shown
for n = 10 and p = 0.3, 0.5, 0.7 in Fig. 40.3.

[Fig. 40.3 Binomial distribution for n = 10 and (a) p = 0.3, (b) p = 0.5, (c) p = 0.7: bar charts of pi against xi.]
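The bar charts in Fig. 40.3 can be regenerated numerically from (40.2). The sketch below uses `math.comb`, which is available from Python 3.8. It checks that each distribution sums to 1 and reports where its peak lies. The function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(n, p):
    """List of P(X = i) = nCi p^i q^(n-i) for i = 0, 1, ..., n."""
    q = 1 - p
    return [comb(n, i) * p**i * q**(n - i) for i in range(n + 1)]


for p in (0.3, 0.5, 0.7):
    probs = binomial_pmf(10, p)
    most_likely = max(range(11), key=lambda i: probs[i])
    print(f"p = {p}: sum = {sum(probs):.12f}, most likely i = {most_likely}")
```

For p = 0.3 and p = 0.7 the peaks fall at i = 3 and i = 7, mirroring each other as the charts suggest.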

Example 40.2 Three dice are rolled simultaneously. What is the probability that
two 5s appear with the third face showing a different number?
Let the random variable X be the number of 5s which appear. Then
SX = {0, 1, 2, 3}.
The outcomes from each die are independent, with a 5 showing called a success and no 5 showing a failure. The probability that a single die shows a 5 is 1/6. Hence X has a binomial distribution with parameters n = 3 and p = 1/6, and by (40.2)
$$P(X = 2) = {}^{3}C_2\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right) = \tfrac{15}{216} \approx 0.069,$$
which is quite small. The other probabilities are
$$P(X = 0) = \tfrac{125}{216} \approx 0.579, \qquad P(X = 1) = \tfrac{75}{216} \approx 0.347, \qquad P(X = 3) = \tfrac{1}{216} \approx 0.005.$$
The chance of obtaining three 5s is 1 in 216.
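The sample space here has only 6³ = 216 equally likely points, so the binomial answer can be cross-checked by brute-force enumeration. The sketch below assumes fair, independent dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6**3 = 216 equally likely results of rolling three dice
rolls = list(product(range(1, 7), repeat=3))

# X = number of 5s showing in each result
counts = Counter(roll.count(5) for roll in rolls)
dist = {k: Fraction(c, len(rolls)) for k, c in sorted(counts.items())}

for k, prob in dist.items():
    print(k, prob, round(float(prob), 3))
```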

Self-test 40.2
In a population of 1000 individuals, it is found that 350 are of height greater than 1.8 m. A random sample of eight individuals is chosen. Find the probability distribution of the number of individuals of height greater than 1.8 m.

40.3 Expected value and variance

The expected value or mean or expectation of a random variable is defined in terms of a weighted average of outcomes: the weighting is equal to the probability pi with which xi occurs. Thus if X is a random variable which can take the values x1, x2, … with probabilities p1, p2, …, then

Expected value or mean of X is defined by
$$E(X) = \sum_i p_i x_i, \tag{40.3}$$
where the summation is over all i, either finite or countably infinite. The symbol µ is often used for the expected value E(X).

For the binomial distribution (40.2) with parameters n and p, the expected value is (note that the distribution has n + 1 elements)
$$\begin{aligned} E(X) &= \sum_{i=0}^{n} i\,{}^{n}C_i\, p^i q^{n-i} = \sum_{i=1}^{n} \frac{n!\, p^i q^{n-i}}{(n-i)!\,(i-1)!} = np\sum_{i=1}^{n} \frac{(n-1)!\, p^{i-1} q^{n-i}}{(n-i)!\,(i-1)!} \\ &= np\sum_{i=0}^{n-1} \frac{(n-1)!\, p^{i} q^{n-1-i}}{(n-1-i)!\, i!} = np(p+q)^{n-1} = np, \end{aligned}$$
using the binomial expansion (Appendix A(c)).

Example 40.3 In Example 40.2 what is the expected value of the number of 5s
which appear when three dice are rolled?
From the definition of expected value and the results in the previous example,
$$E(X) = \sum_{i=0}^{3} P(X = i)\, i = \tfrac{125}{216}\cdot 0 + \tfrac{75}{216}\cdot 1 + \tfrac{15}{216}\cdot 2 + \tfrac{1}{216}\cdot 3 = \tfrac{108}{216} = \tfrac{1}{2}.$$
This result checks with np = 3/6 = 1/2.
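Definition (40.3) and the closed form np can be compared directly with exact arithmetic. The sketch below rebuilds the distribution of Example 40.2 from (40.2) using fractions.

```python
from fractions import Fraction
from math import comb

n, p = 3, Fraction(1, 6)
# Binomial distribution of the number of 5s, from (40.2)
dist = {i: comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)}

# Expected value (40.3): probability-weighted average of the outcomes
mean = sum(i * prob for i, prob in dist.items())
print(mean, n * p)
```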

Random variables can be combined as in X + Y, and it is possible to consider functions of random variables g(X). Expected values satisfy the following theorems (which will not be proved here, however). If c is a constant, X and Y are random variables, and g(X) is a given function of X, then

Rules for expected values

(i) E(cX) = cE(X);
(ii) E(X + Y) = E(X) + E(Y);
(iii) E(XY) = E(X)E(Y) (if X and Y are independent);
(iv) $E(g(X)) = \sum_{i=1}^{n} g(x_i)\, p_i$ (for a finite distribution). (40.4)
Whilst the expected value of a random variable is a useful average, it gives no idea of the spread of the distribution about the expected value. Two distributions can have the same mean but very different shapes in relation to the mean. A natural candidate for measuring the spread is the difference X − E(X), the difference between the random variable and its mean. However, its expectation is always zero since, using (40.4)(i), (ii) above,
$$E(X - E(X)) = \sum_{i=1}^{n} (x_i - E(X))\, p_i = \sum_{i=1}^{n} x_i p_i - E(X)\sum_{i=1}^{n} p_i = E(X) - E(X)\cdot 1 = 0,$$

which is obviously not helpful as a measure of spread. Instead we choose the random variable $(X - E(X))^2$. Its expected value is known as the variance and is denoted by

Variance of a random variable
$$\mathrm{Var}(X) = \sigma^2 = E[(X - E(X))^2] = E[(X - \mu)^2], \quad \text{where } \mu = E(X). \tag{40.5}$$

Using (40.4), note that the variance can be expressed in the more convenient form
$$\mathrm{Var}(X) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2. \tag{40.6}$$


Since the units associated with the variance are the squares of those of X, the symbol $\sigma^2$ is frequently used for variance, so that in ‘linear’ terms the spread can be defined by $\sigma = \sqrt{\mathrm{Var}(X)}$. This is known as the

Standard deviation of the random variable X:
$$\sigma = \sqrt{\mathrm{Var}(X)}. \tag{40.7}$$
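As a quick check that (40.5) and (40.6) agree, the sketch below evaluates both for the distribution of Example 40.1, chosen only because it is already in the text; any finite distribution would serve. Exact fractions are used so that the two results can be compared for equality.

```python
from fractions import Fraction

# Distribution of Example 40.1: x_i -> p_i
dist = {1: Fraction(1, 3), 2: Fraction(4, 15), 3: Fraction(1, 5),
        4: Fraction(2, 15), 5: Fraction(1, 15)}

mu = sum(x * p for x, p in dist.items())                           # (40.3)
var_definition = sum((x - mu) ** 2 * p for x, p in dist.items())   # (40.5)
var_shortcut = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2  # (40.6)
sigma = float(var_definition) ** 0.5                               # (40.7)
print(mu, var_definition, var_shortcut, round(sigma, 3))
```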

Example 40.4 Find the variance of the binomial distribution given by (40.2).
Using (40.2) and (40.4)(iv),
$$\mathrm{Var}(X) = E(X^2) - \mu^2 = \sum_{i=0}^{n} i^2\,{}^{n}C_i\, p^i q^{n-i} - \mu^2 = \sum_{i=1}^{n} \frac{i\cdot n!\, p^i q^{n-i}}{(n-i)!\,(i-1)!} - \mu^2.$$
As a device for summing the series we assume that p and q are independent parameters, and use the formula $E(X) = np(p+q)^{n-1}$ for the expected value of the binomial distribution. Thus
$$\begin{aligned} \sum_{i=1}^{n} \frac{i\cdot n!\, p^i q^{n-i}}{(n-i)!\,(i-1)!} - \mu^2 &= p\frac{\partial}{\partial p}\left(\sum_{i=1}^{n} \frac{n!\, p^i q^{n-i}}{(n-i)!\,(i-1)!}\right) - \mu^2 \\ &= p\frac{\partial}{\partial p}(E(X)) - \mu^2 = p\frac{\partial}{\partial p}\left(np(p+q)^{n-1}\right) - \mu^2 \\ &= p\left[n(p+q)^{n-1} + n(n-1)p(p+q)^{n-2}\right] - n^2p^2 \\ &= pn[1 + p(n-1)] - n^2p^2 \quad (\text{since } p + q = 1) \\ &= np(1-p). \end{aligned}$$
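The partial-derivative device is easy to cross-check by summing the series directly. The sketch below does this with exact fractions for a few illustrative (n, p) pairs and compares the sums with np and np(1 − p). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_mean_var(n, p):
    """Mean and variance of the binomial distribution, summed directly."""
    dist = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    mean = sum(i * d for i, d in enumerate(dist))
    second_moment = sum(i * i * d for i, d in enumerate(dist))   # E(X^2)
    return mean, second_moment - mean**2                         # (40.6)


for n, p in [(3, Fraction(1, 6)), (10, Fraction(3, 10)), (25, Fraction(7, 10))]:
    mean, var = binomial_mean_var(n, p)
    print(n, p, mean == n * p, var == n * p * (1 - p))
```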
The following rules for variances can be proved:

Rules for variances
(i) Var(X + c) = Var(X);
(ii) Var(cX) = c² Var(X);
(iii) Var(X + Y) = Var(X) + Var(Y) (if X and Y are independent). (40.8)

Self-test 40.3
Find the expected value and variance of the distribution obtained in
Self-test 40.2.

40.4 Geometric distribution


Consider again a sequence of independent Bernoulli trials, as explained in Section 40.2, with a probability p of success (1) and a probability q = 1 − p of failure (0) in any trial. Suppose that we are interested in the number of trials up to and including the first success. Call this random variable X, and let X = i correspond to the sequence
$$\underbrace{00\cdots0}_{i-1\ \text{times}}\,1.$$
The probability of i − 1 consecutive failures is $q^{i-1}$, so that the probability that i − 1 failures are followed by a success is
$$P(X = i) = p_i = q^{i-1}p \quad (i = 1, 2, \dots).$$
Unlike the binomial distribution, this distribution has an infinite sample space. It defines

The geometric distribution
$$P(X = i) = p_i = (1-p)^{i-1}p \quad (i = 1, 2, \dots). \tag{40.9}$$

Note that
$$\sum_{i=1}^{\infty} p_i = p\sum_{i=1}^{\infty} q^{i-1} = p(1 + q + q^2 + \cdots) = \frac{p}{1-q} = 1,$$
using the formula for the sum of a geometric series (see Section 1.16). A bar chart of a geometric distribution with p = 0.2 is shown in Fig. 40.4.
The expected value of the random variable of the geometric distribution is
$$\begin{aligned} \mu = E(X) &= \sum_{i=1}^{\infty} i p_i = \sum_{i=1}^{\infty} i p q^{i-1} = p(1 + 2q + 3q^2 + \cdots) \\ &= p[(1 + q + q^2 + \cdots) + q(1 + 2q + 3q^2 + \cdots)] \\ &= \frac{p}{1-q} + q\mu = 1 + q\mu. \end{aligned}$$

[Fig. 40.4 Geometric distribution with p = 0.2: bar chart of pi against xi.]

Hence (1 − q)µ = 1, that is, µ = 1/p. In a similar manner it can be shown that the variance is given by
$$\sigma^2 = \frac{1-p}{p^2}.$$
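The series for the mean and variance converge quickly, so truncated sums give a simple numerical check of µ = 1/p and σ² = (1 − p)/p². The sketch below takes p = 0.2, the value used in Fig. 40.4. The truncation point N is an illustrative choice.

```python
p = 0.2
q = 1 - p
N = 500  # truncation point: q**N is of order 1e-49 here, so the neglected tail is negligible

trials = range(1, N + 1)
pmf = [q ** (i - 1) * p for i in trials]             # (40.9)
total = sum(pmf)
mean = sum(i * pr for i, pr in zip(trials, pmf))
var = sum(i * i * pr for i, pr in zip(trials, pmf)) - mean ** 2
print(total, mean, var)   # compare with 1, 1/p = 5 and (1 - p)/p**2 = 20
```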

Example 40.5 In a drug-testing programme, independent and sequential tests are conducted. Each test costs £500. The probability of success at each test is p. However, for each test after the first there is an additional cost per test of £200. What should p be greater than if the expected cost of the tests should not exceed £2000?
Let X be the random variable of the number of tests up to and including the first success. We are actually interested in a random variable which is a function of X, namely the cost C(X), which is given by
C(X) = 500X + 200(X − 1) = 700X − 200.
Thus C(1) = 500, C(2) = 1200, C(3) = 1900, etc. Using (40.4), the expected value of C(X) is
$$E(C(X)) = E(700X - 200) = 700E(X) - 200 = \frac{700}{p} - 200,$$
since X is a random variable with a geometric distribution, for which E(X) = 1/p. This expected cost does not exceed £2000 if
$$\frac{700}{p} - 200 \le 2000 \quad\text{or}\quad \frac{700}{p} \le 2200.$$
Hence the probability must satisfy the inequality p ≥ 7/22 ≈ 0.32.
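The threshold can be checked in two ways: exactly from E(X) = 1/p, and by simulating the testing programme directly. The sketch below does both. The function names, the seed and the number of runs are illustrative choices. The simulation assumes that each test succeeds independently with probability p.

```python
import random
from fractions import Fraction


def expected_cost(p):
    """E(C(X)) = 700 E(X) - 200, using E(X) = 1/p for the geometric distribution."""
    return 700 / p - 200


def simulated_cost(p, runs=200_000, seed=1):
    """Monte Carlo estimate of the mean cost: repeat tests until the first success."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        x = 1
        while rng.random() >= p:   # this test fails, with probability 1 - p
            x += 1
        total += 700 * x - 200     # C(X) = 500X + 200(X - 1)
    return total / runs


p_min = Fraction(7, 22)            # solves 700/p - 200 = 2000
print(expected_cost(p_min))        # exactly 2000
print(simulated_cost(float(p_min)))
```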

Self-test 40.4
For the geometric distribution, prove that its variance is (1 − p)/p2.

40.5 Poisson distribution


Let X be a random variable which can take values 0, 1, 2, … with probability
$$p_n = P(X = n) = \frac{\lambda^n e^{-\lambda}}{n!} \quad (n = 0, 1, 2, \dots).$$
This is a probability distribution since
$$\sum_{n=0}^{\infty} p_n = \sum_{n=0}^{\infty} \frac{\lambda^n e^{-\lambda}}{n!} = e^{-\lambda}e^{\lambda} = 1,$$
using the power series for $e^{\lambda}$ (see eqn (5.4b)). This is known as the Poisson distribution with parameter λ. It occurs in problems in which discrete data accumulate. The distribution is appropriate for data arriving in a sequential random manner.
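The Poisson distribution has mean
$$\mu = E(X) = \sum_{n=0}^{\infty} \frac{n\lambda^n e^{-\lambda}}{n!} = \sum_{n=1}^{\infty} \frac{\lambda^n e^{-\lambda}}{(n-1)!} = \lambda\sum_{n=0}^{\infty} \frac{\lambda^n e^{-\lambda}}{n!} = \lambda e^{-\lambda}\sum_{n=0}^{\infty} \frac{\lambda^n}{n!} = \lambda e^{-\lambda}e^{\lambda} = \lambda$$
(using (5.4b)). Its variance is
$$\begin{aligned} \sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 &= \sum_{n=1}^{\infty} \frac{n^2\lambda^n e^{-\lambda}}{n!} - \mu^2 = e^{-\lambda}\sum_{n=1}^{\infty} \frac{n\lambda^n}{(n-1)!} - \lambda^2 \\ &= e^{-\lambda}\lambda\frac{d}{d\lambda}\left[\lambda e^{\lambda}\right] - \lambda^2 = \lambda e^{-\lambda}(e^{\lambda} + \lambda e^{\lambda}) - \lambda^2 = \lambda. \end{aligned}$$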
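A truncated-sum check of µ = σ² = λ follows. The value λ = 1.8, which reappears in Example 40.6, and the truncation point are illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(n, lam):
    """P(X = n) = lam^n e^(-lam) / n!"""
    return lam**n * exp(-lam) / factorial(n)


lam = 1.8
N = 60  # truncation point; the neglected tail is far below double precision here
probs = [poisson_pmf(n, lam) for n in range(N)]
total = sum(probs)
mean = sum(n * pr for n, pr in enumerate(probs))
var = sum(n * n * pr for n, pr in enumerate(probs)) - mean**2
print(total, mean, var)   # compare with 1, lam and lam
```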
Apart from being a distribution in its own right, the Poisson distribution is also a useful approximation to the binomial distribution (see Section 40.2) for large n. In the binomial term ${}^{n}C_i\, p^i q^{n-i}$ put the parameter p = λ/n. Then
$$\begin{aligned} {}^{n}C_i\, p^i q^{n-i} &= \frac{n!}{(n-i)!\,i!}\left(\frac{\lambda}{n}\right)^i\left(1 - \frac{\lambda}{n}\right)^{n-i} = \frac{n(n-1)\cdots(n-i+1)}{i!}\left(\frac{\lambda}{n}\right)^i\left(1 - \frac{\lambda}{n}\right)^{n-i} \\ &= \left[\frac{\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{i-1}{n}\right)\lambda^i}{i!\left(1 - \frac{\lambda}{n}\right)^i}\right]\left(1 - \frac{\lambda}{n}\right)^n. \end{aligned}$$
Now let n → ∞ with i fixed. The term in the square brackets approaches $\lambda^i/i!$, whilst
$$\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}.$$
(This limit can be obtained by putting h = −n/λ in the approximation for e given in Section 1.10.) Thus
$${}^{n}C_i\, p^i q^{n-i} \to \frac{\lambda^i e^{-\lambda}}{i!}$$
as n → ∞. This is a useful approximation, as the following application illustrates.
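The limit can be watched numerically. The sketch below fixes λ and computes the largest gap between the binomial probabilities (with p = λ/n) and the Poisson probabilities over i = 0, …, 10 as n grows. The choices λ = 1.8, imax = 10 and the particular values of n are illustrative assumptions.

```python
from math import comb, exp, factorial


def max_abs_diff(n, lam, imax=10):
    """Largest |binomial - Poisson| over i = 0..imax, with p = lam/n."""
    p = lam / n
    return max(
        abs(comb(n, i) * p**i * (1 - p)**(n - i) - lam**i * exp(-lam) / factorial(i))
        for i in range(imax + 1)
    )


diffs = [max_abs_diff(n, 1.8) for n in (10, 100, 1000, 10000)]
print(diffs)   # the gap shrinks as n grows (roughly like 1/n)
```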
Example 40.6 Certain processors are known to have a failure rate of 1.2%.

They are shipped in batches of 150. What is the probability that a batch has
exactly one defective processor? What is the probability that it has two?
We assume that the defects are independent. We use the binomial distribution with probability
$$p_i(n, p) = {}^{n}C_i\, p^i q^{n-i}$$
with n = 150 and p = 0.012 (failure of the component is ‘success’ in the binomial convention). Hence for i = 1, 2, a direct calculation gives
$$p_1(150, 0.012) = {}^{150}C_1\,(0.988)^{149}(0.012)^1 = 0.297\,891,$$
$$p_2(150, 0.012) = {}^{150}C_2\,(0.988)^{148}(0.012)^2 = 0.269\,549.$$
In this problem n is ‘large’, so it is suitable for the Poisson approximation. The parameter λ for the corresponding Poisson distribution is given by
λ = np = 150 × 0.012 = 1.8.
Hence the probability of one failure is
$$\frac{\lambda}{1!}e^{-\lambda} = 1.8\,e^{-1.8} = 0.297\,538,$$
and of two failures is
$$\frac{\lambda^2}{2!}e^{-\lambda} = \frac{(1.8)^2}{2}e^{-1.8} = 0.267\,784,$$
which agree with the binomial values to 2 decimal places. This is more than sufficient in many applications. The Poisson approximation avoids the rounding errors which can occur in calculating probabilities raised to large powers.
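The four figures quoted in Example 40.6 can be reproduced in a few lines of Python, which makes the size of the approximation error easy to see.

```python
from math import comb, exp, factorial

n, p = 150, 0.012
lam = n * p   # 1.8

results = {}
for i in (1, 2):
    binomial = comb(n, i) * p**i * (1 - p)**(n - i)
    poisson = lam**i * exp(-lam) / factorial(i)
    results[i] = (binomial, poisson)
    print(f"i = {i}: binomial {binomial:.6f}, Poisson {poisson:.6f}")
```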

Self-test 40.5
The Poisson distribution is given by $p_i = \lambda^i e^{-\lambda}/i!$ (i = 0, 1, 2, … ) and the binomial distribution by $b_i = n!\,p^i(1-p)^{n-i}/[(n-i)!\,i!]$ (i = 0, 1, 2, … , n) (see (40.2)). As shown in this section, the Poisson distribution can be used as an approximation to the binomial distribution for large n. See how the distributions compare numerically for n = 20, p = 0.1, λ = np = 2, for i = 0, 1, 2, … , 6.

40.6 Other discrete distributions


(a) The Pascal or negative binomial distribution
This is the distribution with function
$$p_i = {}^{i-1}C_{k-1}\, p^k (1-p)^{i-k} \quad (i = k, k+1, \dots).$$
This distribution is an extension of the geometric distribution, and arises from the random variable which is the number of Bernoulli trials needed to achieve k successes, where a success occurs with probability p. This is sometimes known as inverse sampling, since the number of successes k is specified in advance.
Its mean and variance are
$$\mu = \frac{k}{p}, \qquad \sigma^2 = \frac{k(1-p)}{p^2}.$$
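As with the geometric case, the stated mean and variance can be checked with truncated sums. The sketch below uses illustrative parameters k = 3 and p = 0.4. With k = 1 it reduces to the geometric distribution.

```python
from math import comb


def pascal_pmf(i, k, p):
    """P(the k-th success occurs on trial i), for i = k, k + 1, ..."""
    return comb(i - 1, k - 1) * p**k * (1 - p)**(i - k)


k, p = 3, 0.4   # illustrative parameters
N = 400         # truncation point for the infinite sums; the tail is negligible here
probs = {i: pascal_pmf(i, k, p) for i in range(k, N)}

total = sum(probs.values())
mean = sum(i * pr for i, pr in probs.items())
var = sum(i * i * pr for i, pr in probs.items()) - mean**2
print(total, mean, var)   # compare with 1, k/p = 7.5 and k(1 - p)/p**2 = 11.25
```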
(b) Hypergeometric distribution
Consider a box containing w white balls and b black balls. Suppose that n balls are chosen at random from the box without replacement. What is the probability that i white balls are chosen? The i white balls must be chosen from the w white ones, and the n − i black balls from the b black ones. Hence the number of possible samples containing exactly i white balls is ${}^{w}C_i\,{}^{b}C_{n-i}$, out of ${}^{w+b}C_n$ equally likely samples in all. By this counting method we obtain
$$P(X = i) = p_i = \frac{{}^{w}C_i\,{}^{b}C_{n-i}}{{}^{w+b}C_n},$$
where
$$i = \begin{cases} 0, 1, 2, \dots, n & \text{if } n \le w, \\ 0, 1, 2, \dots, w & \text{if } n > w. \end{cases}$$
The function pi defines the hypergeometric distribution. Its mean and variance are given by
$$\mu = \frac{nw}{w+b}, \qquad \sigma^2 = \frac{nwb(w+b-n)}{(w+b)^2(w+b-1)}.$$
The same problem with replacement leads to the binomial distribution.
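The counting formula and the mean and variance can be checked exactly for a small box. The sketch below uses w = 7, b = 15 and n = 4, which are illustrative numbers only. It compares the directly summed moments with nw/(w + b) and nwb(w + b − n)/[(w + b)²(w + b − 1)].

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(i, w, b, n):
    """P(i white balls among n drawn without replacement from w white and b black)."""
    return Fraction(comb(w, i) * comb(b, n - i), comb(w + b, n))


w, b, n = 7, 15, 4   # illustrative numbers only
dist = {i: hypergeometric_pmf(i, w, b, n) for i in range(min(n, w) + 1)}

mean = sum(i * pr for i, pr in dist.items())
var = sum(i * i * pr for i, pr in dist.items()) - mean**2
print(mean, Fraction(n * w, w + b))
print(var, Fraction(n * w * b * (w + b - n), (w + b) ** 2 * (w + b - 1)))
```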

40.7 Continuous random variables and distributions


In many applications a discrete random variable, which takes its values from a countable list, is inappropriate. For example, the random variable X could be the time from, say, t = 0 until a light bulb fails. Whilst it would be possible to measure failure to the nearest hour and use a discrete random variable, it is often more convenient and more accurate to use a continuous random variable, which is defined for the continuous variable t ≥ 0 and is no longer a countable list of values.
Instead of the sequence of probabilities {P(X = xi)} = {pi}, we define a probability density function (pdf) f(x) over −∞ < x < ∞ which has the properties:

Probability density function (pdf)
(a) f(x) ≥ 0 (−∞ < x < ∞);
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$;
(c) for any x1, x2 such that −∞ < x1 < x2 < ∞,
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx. \tag{40.10}$$

A graph of a possible density function f(x) against x is shown in Fig. 40.5. By (a) the
curve must never fall below the x axis, by (b) the area under the curve must be 1,
and by (c) the probability that X lies between two values x1 and x2 is the shaded
area under the graph. Unlike pi, the pdf f(x) is not itself a probability.
[Fig. 40.5 Probability density function f(x) plotted against x, with the points x1 and x2 marked.]

We can associate with the pdf a cumulative distribution function (cdf) F(x), which is defined by

Cumulative distribution function (cdf)
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du. \tag{40.11}$$

For continuous random variables, P(X = x) = 0 for every x, which is unhelpful. Only the probability of X over an interval of x, such as P(X ≤ x), has a meaning. By (40.10b) it follows that
F(x) → 1 as x → ∞,
and
$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(u)\,du = F(x_2) - F(x_1).$$
A typical cdf, which must be a non-decreasing function, is shown in Fig. 40.6.

[Fig. 40.6 Cumulative distribution function (cdf) F(x) plotted against x, levelling off at 1.]

Example 40.7 Let X be the random variable of time to failure of a light bulb measured from time t = 0. Assume that X has a pdf
$$f(t) = \begin{cases} \alpha e^{-\alpha t}, & t \ge 0, \\ 0, & t < 0, \end{cases}$$
where α > 0 is a constant and t is measured in hours. What is the probability that the bulb has failed by t = 10 hours? What is the probability that the light bulb fails between t = 10 hours and t = 20 hours?
Note that f(t) is a pdf since f(t) ≥ 0 and
$$\int_{-\infty}^{\infty} f(t)\,dt = \int_0^{\infty} \alpha e^{-\alpha t}\,dt = \left[-e^{-\alpha t}\right]_0^{\infty} = 1.$$
For the first question we require
$$P(X \le 10) = \int_0^{10} \alpha e^{-\alpha t}\,dt = \left[-e^{-\alpha t}\right]_0^{10} = 1 - e^{-10\alpha}.$$
For the second question
$$P(10 \le X \le 20) = \int_{10}^{20} \alpha e^{-\alpha t}\,dt = \left[-e^{-\alpha t}\right]_{10}^{20} = e^{-10\alpha} - e^{-20\alpha} = e^{-10\alpha}(1 - e^{-10\alpha}).$$
Thus the light bulb fails before 10 hours with probability $1 - e^{-10\alpha}$, and between 10 hours and 20 hours with probability $e^{-10\alpha}(1 - e^{-10\alpha})$.

The pdf in the previous example is the exponential distribution, which is frequently used in ‘time to failure’ problems. Its cdf is given by
$$F(x) = \begin{cases} \displaystyle\int_0^x \alpha e^{-\alpha u}\,du = 1 - e^{-\alpha x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$
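The relation P(x1 ≤ X ≤ x2) = F(x2) − F(x1) can be checked numerically by integrating the pdf directly. The sketch below does this for the second question of Example 40.7, using composite Simpson's rule as the quadrature method. The rate α = 0.05 per hour is a hypothetical value, since the example leaves α general. The helper names are illustrative.

```python
from math import exp


def exp_pdf(t, alpha):
    """Exponential pdf: alpha e^(-alpha t) for t >= 0, and 0 for t < 0."""
    return alpha * exp(-alpha * t) if t >= 0 else 0.0


def exp_cdf(x, alpha):
    """Exponential cdf: 1 - e^(-alpha x) for x >= 0, and 0 for x < 0."""
    return 1 - exp(-alpha * x) if x >= 0 else 0.0


def simpson(f, a, b, m=1000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, m, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, m, 2))
    return s * h / 3


alpha = 0.05   # hypothetical failure rate (per hour), for illustration only
closed_form = exp_cdf(20, alpha) - exp_cdf(10, alpha)           # F(20) - F(10)
numerical = simpson(lambda t: exp_pdf(t, alpha), 10, 20)          # integral of the pdf
print(closed_form, numerical)
```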

Note that density functions do not have to be continuous: they can include jumps.
Also if some event can only take place after a given time, say, then we put the density
function equal to zero until that time.

Self-test 40.6
In Example 40.7, assume that
$$f(t) = \begin{cases} 0, & t < 0, \\ \gamma, & 0 \le t \le \beta, \\ \gamma e^{-\alpha(t-\beta)}, & t > \beta. \end{cases}$$
What is the relation between the parameters α, β, and γ for f(t) to be a pdf? What is the probability that a light bulb has failed by time t0 if (a) t0 ≤ β; (b) t0 > β?

40.8 Mean and variance of continuous random variables


By analogy with the discrete case, the expected value (or mean) and the variance of a continuous random variable X with pdf f(x) are defined to be

Mean of continuous random variable:
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Variance of continuous random variable:
$$\sigma^2 = \mathrm{Var}(X) = E((X-\mu)^2) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx. \tag{40.12}$$
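For the exponential distribution with pdf given by
$$f(t) = \begin{cases} \alpha e^{-\alpha t}, & t \ge 0, \\ 0, & t < 0, \end{cases}$$
its mean or expected value is
$$\begin{aligned} \mu = \int_{-\infty}^{\infty} t f(t)\,dt &= \int_0^{\infty} \alpha t e^{-\alpha t}\,dt = -\int_0^{\infty} t\frac{d}{dt}\left(e^{-\alpha t}\right)dt \\ &= -\left[t e^{-\alpha t}\right]_0^{\infty} + \int_0^{\infty} \frac{d(t)}{dt}\, e^{-\alpha t}\,dt \quad \text{(integrating by parts)} \\ &= 0 + \int_0^{\infty} e^{-\alpha t}\,dt = \frac{1}{\alpha}. \end{aligned}$$
It can be shown similarly that
$$\sigma^2 = \frac{1}{\alpha^2}.$$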
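A Monte Carlo check of µ = 1/α and σ² = 1/α² is shown below. It uses `random.expovariate` from Python's standard library, whose argument is the rate α. The value α = 0.05, the seed and the sample size are illustrative assumptions. The sample moments will only be close to the theoretical values, not equal to them.

```python
import random

alpha = 0.05                  # hypothetical failure rate per hour
rng = random.Random(2024)     # fixed seed so the run is repeatable
samples = [rng.expovariate(alpha) for _ in range(200_000)]

m = len(samples)
sample_mean = sum(samples) / m
sample_var = sum((x - sample_mean) ** 2 for x in samples) / m
print(sample_mean, 1 / alpha)         # theory: mu = 1/alpha = 20
print(sample_var, 1 / alpha ** 2)     # theory: sigma^2 = 1/alpha^2 = 400
```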

Self-test 40.7
Prove that the variance of the exponential distribution with pdf
$$f(t) = \begin{cases} 0, & t < 0, \\ \alpha e^{-\alpha t}, & t \ge 0, \end{cases}$$
is $1/\alpha^2$.

40.9 The normal distribution


The normal distribution with pdf defined by

Normal distribution, N(µ, σ²),
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty, \tag{40.13}$$

is particularly important in many applications. It has a symmetrical bell-shaped distribution about its mean µ. Note also that σ in (40.13) is its standard deviation. A typical normal distribution is shown in Fig. 40.7. It can be proved that
$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad \int_{-\infty}^{\infty} x f(x)\,dx = \mu, \qquad \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \sigma^2.$$
The normal distribution N(µ, σ²) is a two-parameter distribution with its mean and standard deviation as parameters.

[Fig. 40.7 A normal distribution f(x). Fig. 40.8 The standard normal curve f(z), shown for −3 ≤ z ≤ 3.]

The standardized normal distribution is N(0, 1), with pdf given by
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}.$$
It has mean zero and standard deviation 1. Any normal random variable X with distribution N(µ, σ²) can be ‘standardized’ by considering the random variable Z = (X − µ)/σ. In the distribution (40.13) this is equivalent to the substitution z = (x − µ)/σ. The standard normal curve representing N(0, 1) is shown in Fig. 40.8.
The intervals within 1, 2, and 3 standard deviations of the mean are also shown in the figure. If Z is the corresponding random variable, then the probability that Z lies within one standard deviation of the mean zero is the area under the curve between −1 and 1. Thus
$$P(-1 \le Z \le 1) = \frac{1}{\sqrt{2\pi}}\int_{-1}^{1} e^{-\frac{1}{2}z^2}\,dz = 0.6827,$$
where numerical integration is required to evaluate this integral. Tables of the standard normal distribution can also be used to estimate the answer (see Appendix H). Similarly
$$P(-2 \le Z \le 2) = \frac{1}{\sqrt{2\pi}}\int_{-2}^{2} e^{-\frac{1}{2}z^2}\,dz = 0.9545,$$
$$P(-3 \le Z \le 3) = \frac{1}{\sqrt{2\pi}}\int_{-3}^{3} e^{-\frac{1}{2}z^2}\,dz = 0.9973.$$
The last result implies that there is a 99.73% chance that a selected item lies within three standard deviations of the mean for the standardized normal distribution.
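Instead of numerical integration or tables, these probabilities can be computed from the error function. The identity Φ(z) = [1 + erf(z/√2)]/2 gives P(−k ≤ Z ≤ k) = 2Φ(k) − 1 = erf(k/√2). The sketch below uses `math.erf` from Python's standard library. The function name is illustrative.

```python
from math import erf, sqrt


def prob_within(k):
    """P(-k <= Z <= k) for Z ~ N(0, 1).

    Uses Phi(z) = (1 + erf(z / sqrt(2))) / 2, so 2 Phi(k) - 1 = erf(k / sqrt(2)).
    """
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```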
The importance of the normal distribution lies in the observation that in many
measurements, which almost always involve random experimental errors, the
distribution of the errors seems to be normal.
The cdf for the standardized normal distribution N(0, 1) is
$$\Phi(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-\frac{1}{2}u^2}\,du,$$
whose values can be obtained from Appendix H. A graph of Φ(z) against z is shown in Fig. 40.9: it can be used to estimate probabilities for the normal distribution.

[Fig. 40.9 Cumulative distribution function Φ(z) for the standardized normal distribution, plotted for −4 ≤ z ≤ 4.]

Example 40.8 The mean height of 459 university students is 180 cm with a standard deviation of 4.2 cm. Assuming that the heights are normally distributed, estimate the number of students who have heights greater than 200 cm, and the number who have heights between 175 cm and 185 cm.
For this sample µ = 180 and σ = 4.2. Hence the normal distribution N(180, 17.64) has pdf

$$f(x) = \frac{1}{4.2\sqrt{2\pi}}\,e^{-(x-180)^2/35.28}.$$

We can obtain the corresponding standardized normal distribution by putting z = (x − µ)/σ, that is,

$$z = \frac{x - 180}{4.2}.$$

If x = 200 then z = (200 − 180)/4.2 ≈ 4.76. Hence

$$P(Z \ge 4.76) = 1 - \Phi(4.76) \approx 1 \times 10^{-6},$$

a value too small to be read from Fig. 40.9 or from most tables. Hence the expected number of students with heights in excess of 200 cm is about 459 × 10⁻⁶, which is less than 0.001: effectively none. Similarly, if x = 175 then z = −5/4.2 ≈ −1.19, and if x = 185 then z ≈ 1.19. Thus, using Φ(−z) = 1 − Φ(z),

$$P(-1.19 \le Z \le 1.19) = \Phi(1.19) - \Phi(-1.19) = \Phi(1.19) - (1 - \Phi(1.19)) = 2\Phi(1.19) - 1 = 2 \times 0.8830 - 1 = 0.766,$$

approximately. Hence about 0.766 × 459 ≈ 352 students are expected to have heights between 175 cm and 185 cm.
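The calculation in Example 40.8 can be checked with a short Python sketch. It assumes Φ(z) is evaluated through the standard-library error function, Φ(z) = ½[1 + erf(z/√2)], rather than read from Appendix H, and it standardizes with z = (x − µ)/σ as defined earlier in this section. The helper names `phi` and `normal_prob` are hypothetical. For the upper tail it uses `math.erfc`, which avoids subtracting two nearly equal numbers.

```python
import math

def phi(z: float) -> float:
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(lo: float, hi: float, mu: float, sigma: float) -> float:
    """P(lo <= X <= hi) for X ~ N(mu, sigma**2), standardizing with z = (x - mu) / sigma."""
    return phi((hi - mu) / sigma) - phi((lo - mu) / sigma)

n_students, mu, sigma = 459, 180.0, 4.2

# Upper tail P(X > 200) = P(Z > z) = erfc(z / sqrt(2)) / 2, avoiding 1 - Phi(z) cancellation.
z_200 = (200.0 - mu) / sigma
p_tall = 0.5 * math.erfc(z_200 / math.sqrt(2.0))
p_mid = normal_prob(175.0, 185.0, mu, sigma)

print(f"z = {z_200:.2f}, P(X > 200) = {p_tall:.1e}, expected number = {n_students * p_tall:.4f}")
print(f"P(175 <= X <= 185) = {p_mid:.3f}, expected number = {n_students * p_mid:.0f}")
```

The second line should report a probability of about 0.766 and about 352 students.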

For an extensive treatment of probability and random processes, including theory and applications, see Grimmett and Stirzaker (2001).
Problems

40.1 A biased coin is spun three times. The probability of a head appearing is 0.45 and of a tail 0.55. If X is the random variable of the number of heads shown, what is the sample space of X? What is the probability distribution of X?
Sketch a bar chart showing the probability distribution. What is the probability that X is greater than or equal to 1, that is P(X ≥ 1)?

40.2 Explain why the sequence

$$p_j = \frac{1}{3}\left(\frac{1}{2^j} + \frac{1}{2^{j-1}}\right) \quad (j = 1, 2, 3, \ldots)$$

can be interpreted as a probability distribution. If P(X = j) = p_j find P(X  6).

40.3 The probability of success in a sequence of independent Bernoulli trials is 1/3. If 12 trials take place calculate the probabilities of 0, 1, …, 12 successes. Calculate also the mean and standard deviation of the random variable which is the number of successes.

40.4 The uniform distribution has the pdf

$$f(x) = \begin{cases} 1/(b-a), & a \le x \le b, \\ 0, & \text{elsewhere.} \end{cases}$$

Sketch the graphs of the pdf and its cdf. Find the mean and standard deviation of the uniform distribution.

40.5 Prove that the variance of the geometric distribution $p_i = (1-p)^{i-1}p$ (i = 1, 2, …) is $(1-p)/p^2$.

40.6 Components join a production assembly line in sequence. The probability that a particular component is faulty is 0.012. How many components (excluding the faulty component) will be expected to join the assembly line before a faulty one is encountered? What is the standard deviation of the number of components to a failure?

40.7 A coin is spun until a tail is shown. What is the probability that eight heads appear before the first tail?

40.8 In a series of Bernoulli trials the probability of success is p. Let X be the random variable of the number of trials until r successes occur. For example, if 1 denotes success and 0 denotes failure then, in the sequence

1001100011100100

the seventh success occurs at the 14th trial, that is X = 14 for r = 7 in this case. Show that

$$p_i = P(X = i) = \binom{i-1}{r-1} p^r (1-p)^{i-r}$$

for i = r, r + 1, r + 2, …. This is the negative binomial distribution. Confirm that

$$\sum_{i=r}^{\infty} p_i = 1.$$

Show that

$$E(X) = \frac{r}{p}, \qquad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}.$$

40.9 In a milk-bottling plant bottles are filled with milk and their weights checked. If a bottle is underweight or more than 4% overweight the production line is stopped and the problem investigated. Assume that a bottle fails randomly with the same probability p. What would be an appropriate distribution for this problem? On average it is found that breakdown occurs every 1503 bottles. What is the probability that an individual bottle fails the weight test?

40.10 Suppose that the random variable X has the exponential distribution with pdf

$$f(t) = \begin{cases} 1.5\,e^{-1.5t}, & t \ge 0, \\ 0, & t < 0. \end{cases}$$

Find the following probabilities: (a) P(0 ≤ X ≤ 1); (b) P(X  0); (c) P(X  1); (d) P(X  1); (e) P(X  2) or P(X  1).

40.11 Calls to a freephone information line are assumed to occur so that the times between calls are exponentially distributed with mean time of 20 minutes between calls. If X is the random variable of the time between calls, (a) what is the probability that there are no calls in a one-hour interval? (b) What is the probability that there is at least one call within a 15-minute interval?

40.12 A Geiger counter is an instrument for counting the number of radioactive particles emitted by a radioactive sample which strike the instrument. In a probability model of the counter, the random variable X, which is the number of radioactive particles detected in a given time interval, has a Poisson distribution

$$P(X = n) = \frac{e^{-\lambda}\lambda^n}{n!} \quad (n = 0, 1, 2, \ldots),$$

where λ is a parameter which characterizes the radioactivity of the sample. Show that the mean and variance of the probability distribution are both λ. What is the probability that five or more hits occur in the time interval?

40.13 The random variable Z has a standardized normal distribution. Estimate the following probabilities: (a) P(Z  0.8); (b) P(Z  0.7); (c) P(−0.5 ≤ Z ≤ 0.8).

40.14 A particular repetitive operation on a production line has a uniform distribution (see Problem 40.4) with pdf f(t) = 0.1 for 33 ≤ t ≤ 43, where time is measured in seconds. What are the mean time and variance of the operation? On average what proportion of operations take longer than 40 seconds?

40.15 It is required in an application that

$$f(t) = \begin{cases} A(a^2 - t^2), & -a \le t \le a, \\ 0, & \text{elsewhere} \end{cases}$$

should be a pdf. What should the parameter A be in terms of a? Find the variance of the distribution. What should A and a be for the distribution to have a standard deviation of 1?

40.16 The time to failure of catalytic converters in exhaust systems of cars is modelled by a normal random variable with mean of 1200 hours. If 95% of the converters are to last at least 1000 hours without failure, what is the maximum value which the standard deviation of the normal distribution can take?

40.17 The random variable of the time to failure in a batch of light bulbs is assumed to be exponentially distributed with mean time to failure of 500 hours. What is the probability that a light bulb is still functioning after 640 hours? A room is lit by four light bulbs which are not replaced as they fail. What is the probability that just two bulbs will still be working at 640 hours?
Descriptive statistics
41

CONTENTS

41.1 Representing data 903


41.2 Random samples and sampling distributions 908
41.3 Sample mean and variance, and their estimation 910
41.4 Central limit theorem 911
41.5 Regression 913
Problems 915

Statistics is a subject concerned with the collection, analysis, and interpretation of data. The data set usually consists of a random sample from some larger set called a population, and may be quite a small proportion of it. The objective is to make inferences about the population as a whole from the sample: this is known as statistical inference, and any method which seeks to interpret the data in this way is a branch of it. For example, if we want to find out the mean salary of a population, then a random sample of individuals is taken, and the mean of the resulting sample is used to estimate and make inferences about the unknown mean salary of the population.
Generally the values in a sample are known as variates. Any quantity calculated from the sample is known as a statistic, and the corresponding (usually unknown) value in the population is known as a parameter.

41.1 Representing data


Let us first look at graphical ways of representing the data. Table 41.1 shows the number of vehicles which cross an automatic census cable on a road on a particular day. The day is split into two-hour time slots from midnight to midnight. We can represent the data by a histogram, in which each two-hour slot is represented by a rectangle whose height is the frequency (here, the number of vehicles) and whose width is the time interval, as shown in Fig. 41.1. We can draw a frequency polygon by joining the midpoints of the tops of the rectangles. If there is a large number of time intervals then the polygon may sometimes be replaced by a smooth curve fitted to the data.
Table 41.1

Time interval    Number of vehicles
00:00–02:00       6
02:00–04:00       4
04:00–06:00       9
06:00–08:00      21
08:00–10:00      24
10:00–12:00      15
12:00–14:00      16
14:00–16:00      18
16:00–18:00      29
18:00–20:00      20
20:00–22:00      16
22:00–24:00      10

Fig. 41.1 Histogram of the data in Table 41.1 (number of vehicles against time in hours).

Given a set of data, the design of a histogram for the data is a matter of judgement. In the example, the 24 hours were divided into 12 two-hour time intervals, but we could alternatively have used 24 one-hour intervals. The intervals are also known as cells or bins. The intervals should usually be of equal 'width', and the number of intervals should not be too large for the data. A working rule is that the number of intervals should be roughly √n, where n is the number of observations. In the example above there are 188 observed vehicles, so the rule suggests about √188 ≈ 14 intervals, which is close to our choice of 12.
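Here is a short Python sketch of the √n working rule and the frequency-polygon construction. It is an illustration only: `suggested_bin_count` is a hypothetical helper that rounds √n to the nearest whole number, and the counts are those of Table 41.1.

```python
import math

def suggested_bin_count(n_observations: int) -> int:
    """Working rule from the text: use roughly sqrt(n) intervals."""
    return max(1, round(math.sqrt(n_observations)))

# Table 41.1: vehicle counts in twelve two-hour slots starting at 00:00.
counts = [6, 4, 9, 21, 24, 15, 16, 18, 29, 20, 16, 10]
width = 2.0  # hours per slot

n = sum(counts)
print(n, suggested_bin_count(n))  # 188 observations -> about 14 intervals

# Frequency polygon: join the midpoints of the tops of the histogram rectangles.
polygon = [(width * (i + 0.5), c) for i, c in enumerate(counts)]
print(polygon[:3])  # [(1.0, 6), (3.0, 4), (5.0, 9)]
```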
Here is another example. A 'snapshot' of vehicles on a short stretch of road is taken at the same hour on the same weekday on 59 occasions. Table 41.2 is a frequency table which collates the number of occasions on which each number of vehicles was observed. The histogram is shown in Fig. 41.2.

Table 41.2

Number of vehicles (xi)    Frequency (fi)
0                          12
1                          15
2                          13
3                           8
4                           5
5                           3
6                           2
7                           1

Fig. 41.2 Histogram of Table 41.2 (frequency against number of vehicles on the stretch of road).
The sample mean $\bar x$ of n observed values {xi}, where xi occurs with frequency fi, is defined by

$$\bar x = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i},$$

which is equivalent to the average, or mean, of the total set of observations since there must be $\sum_{i=1}^{n} f_i$ of them. For the traffic census in Table 41.2

$$\bar x = \frac{(0\times 12)+(1\times 15)+(2\times 13)+(3\times 8)+(4\times 5)+(5\times 3)+(6\times 2)+(7\times 1)}{12+15+13+8+5+3+2+1} = \frac{119}{59} \approx 2.017.$$

The mean $\bar x$ of this sample will be an estimate for the true population mean. If the observations are not grouped into categories then fi = 1 and, as before,

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

where n is now the number of observations.
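As a small check of the grouped-mean formula, this Python sketch uses exact fractions so that the result 119/59 appears without rounding. The helper name `grouped_mean` is illustrative.

```python
from fractions import Fraction

# Table 41.2: number of vehicles x_i and frequency f_i.
values = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [12, 15, 13, 8, 5, 3, 2, 1]

def grouped_mean(xs, fs) -> Fraction:
    """Mean of a frequency table: sum(f_i * x_i) / sum(f_i), kept exact."""
    return Fraction(sum(f * x for x, f in zip(xs, fs)), sum(fs))

m = grouped_mean(values, freqs)
print(m, round(float(m), 3))  # 119/59 2.017
```

Expanding the table into its 59 individual observations and averaging them gives the same value, which is the equivalence stated in the text.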


There are other measures of the central characteristics of samples. The mode is the value of xi which occurs most often in a sample, and is therefore most likely to occur in other random samples. Thus in Table 41.2, the number 1 appears most often (15 times), so that the mode of these data is 1.
The central item in an ordered list of sample values is known as the median. Suppose that a list of examination marks is given by
Examination marks: 31, 36, 38, 39, 45, 46, 57, 60, 65, 65, 69, 72, 75, 79
in increasing order. If the sample has an odd number of items, say 2n + 1, then the median is the (n + 1)th item; if the number is even, say 2n, then it is defined to be the average of the nth and the (n + 1)th numbers in the ranking. In the list of examination marks above the median is ½(57 + 60) = 58.5. The mode is 65, but it is a number of no particular significance in this list, since only two marks take that value.
In Table 41.2, there are 59 numbers from 0 to 7 consisting of 12 zeros, 15 ones, etc. The median is the 30th number, which will be one of the twos. Hence the median is 2.
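Python's standard `statistics` module computes the median and mode directly. For an even number of items, its `median` uses the same convention as the text: the average of the two central values. Here is a sketch applied to the examination marks and to Table 41.2, expanded into individual observations.

```python
import statistics

marks = [31, 36, 38, 39, 45, 46, 57, 60, 65, 65, 69, 72, 75, 79]
print(statistics.median(marks))  # 58.5: mean of the 7th and 8th of 14 ordered marks
print(statistics.mode(marks))    # 65 (occurs twice)

# Table 41.2 expanded into its 59 individual observations.
table_41_2 = {0: 12, 1: 15, 2: 13, 3: 8, 4: 5, 5: 3, 6: 2, 7: 1}
obs = [x for x, f in table_41_2.items() for _ in range(f)]
print(len(obs), statistics.median(obs), statistics.mode(obs))  # 59 2 1
```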
The box plot displays graphically important features of data such as the median, the spread, and the symmetry of the data, and is particularly useful in comparing different data sets, as for example the results in a series of associated examination papers. Suppose that the examination marks in three papers, as percentages and each in increasing order, are as shown in Table 41.3. We first find the median of the marks in each paper. Thus the medians are 59.5 for Paper 1, 61 for Paper 2, and 56 for Paper 3.
Suppose that the data contain 2n observations. Then the first quartile is the median of the n smallest observations, the second quartile is the overall median of the data, and the third quartile the median of the n largest observations. If the data contain 2n + 1 observations then the first quartile is the median of the n + 1 smallest observations and the third quartile the median of the n + 1 largest observations. (The second quartile is the overall median.) The quartiles divide the observations into four approximately equal numbers of observations.

Table 41.3

Examination marks (0–100)
Paper 1 (16 results)   27, 40, 46, 48, 55, 55, 56, 58, 61, 63, 64, 66, 68, 69, 72, 78
Paper 2 (11 results)   30, 38, 39, 48, 58, 61, 64, 68, 69, 70, 81
Paper 3 (9 results)    26, 40, 43, 54, 56, 61, 62, 72, 74

Table 41.4

           First quartile   Median   Third quartile
Paper 1    51.5             59.5     67
Paper 2    43.5             61       68.5
Paper 3    43               56       62
For the examination marks paper by paper the quartiles are given in Table 41.4.
The difference between the third and first quartiles is a measure of the spread of
the data, and is known as the interquartile range.
Create a vertical scale 0–100 as shown in Fig. 41.3. For each paper, position a
box such that its upper edge is level with the third quartile on the scale, and its
lower edge is level with the first quartile. The line across the middle of the box is
the median. Extend each box by a line to the extreme marks above and below the
box. These lines are known as whiskers. Visually we can see how the average and
spread of the marks compare. A compressed box indicates poor discrimination in
the marks, and long whiskers might indicate exceptional successes or failures
(often known as outliers). Examiners may wish to take remedial action by scaling
in the light of the comparative box plots if there are candidates in common
among the papers.

Fig. 41.3 Box plots for three examination papers (marks 0–100; each box runs from the first quartile to the third quartile with the median marked, and the whiskers extend to the highest and lowest marks).
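The quartile rule above, together with the five numbers that a box plot displays, can be sketched as follows. Library routines such as `statistics.quantiles` use other interpolation conventions and may give slightly different quartiles. The helper `quartiles` below is a hypothetical implementation of the text's rule only.

```python
import statistics

def quartiles(data):
    """First quartile, median, third quartile using the rule in the text:
    for 2n values take the medians of the n smallest and n largest;
    for 2n + 1 values take the medians of the n + 1 smallest and n + 1 largest."""
    xs = sorted(data)
    m = len(xs)
    half = m // 2 if m % 2 == 0 else m // 2 + 1
    return statistics.median(xs[:half]), statistics.median(xs), statistics.median(xs[-half:])

papers = {
    "Paper 1": [27, 40, 46, 48, 55, 55, 56, 58, 61, 63, 64, 66, 68, 69, 72, 78],
    "Paper 2": [30, 38, 39, 48, 58, 61, 64, 68, 69, 70, 81],
    "Paper 3": [26, 40, 43, 54, 56, 61, 62, 72, 74],
}
for name, marks in papers.items():
    q1, q2, q3 = quartiles(marks)
    # Five-number summary drawn by a box plot: lowest, Q1, median, Q3, highest.
    print(name, min(marks), q1, q2, q3, max(marks), "IQR =", q3 - q1)
```

The printed quartiles reproduce Table 41.4.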


Self-test 41.1
A class of 15 students sit four final examination papers. The marks as percentages are as follows:
Paper 1: 8, 30, 38, 41, 42, 44, 44, 47, 50, 55, 56, 56, 60, 70, 75;
Paper 2: 25, 35, 40, 46, 49, 50, 50, 58, 59, 60, 61, 66, 70, 75, 80;
Paper 3: 36, 37, 41, 43, 44, 45, 46, 50, 55, 57, 59, 60, 67, 70, 78;
Paper 4: 30, 40, 45, 45, 45, 47, 48, 49, 50, 55, 64, 66, 70, 76, 84.
Find the median, the first and third quartiles for each paper. Plot the corresponding box plots from the data.
What is the mean mark for each paper?

41.2 Random samples and sampling distributions


One aim in statistics for a set of data is to fit a probability model to it so that
inferences can be drawn concerning the data. This usually requires the selection
of a probability distribution to model the data, often on the basis of minimal
information. Having chosen the distribution, the parameter values of the dis-
tribution have to be estimated from the data. The set of data is the only hard
information.
Consider a simple yes/no poll of the population in the UK. The question is
asked: are you in favour of the UK joining the European Monetary Union (EMU)?
The answer must be either ‘yes’ or ‘no’: ‘don’t know’ responses are not permitted.
The question could be put to the whole population in a referendum (at consider-
able expense), but if we are interested just in an opinion poll then the question
could be put, say, to a random sample of 500 individuals chosen randomly from
the population. (There is the question of how this can be achieved but we will not
dwell on this polling problem.) Suppose that in the sample 56% say ‘yes’ and 44%
say ‘no’. We could conclude that the population is in favour but we could also ask
how much weight should be attached to this poll result. Is it far enough away from
the 50% critical value, for example, for a confident prediction?
More information might be obtained if we took a number of 500-person polls from the population, and examined the distribution of the random variable X representing the number of votes for EMU. These polls will not actually take place, but we can see whether any inferences can be drawn from this hypothetical set of polls. This distribution is known as the sampling distribution of X. We could model the posing of the question to 500 individuals as a series of 500 independent Bernoulli trials, so that X has a binomial distribution (see Section 40.3) with parameters n = 500 and an unknown probability p of a 'yes' vote.
For a single poll with n persons, the binomial distribution (the probability of i yes votes) is

$$P(X = i) = \frac{n!}{(n-i)!\,i!}\,p^i(1-p)^{n-i} \quad (i = 0, 1, \ldots, n).$$

The mean of the binomial distribution, which is the mean number of yes votes, is np, so it seems reasonable to estimate p from each poll by X/n. We shall denote this estimator for p by $\hat p$. The value that an estimator takes for a particular sample is known as an estimate. The symbol $\hat p$ is used for both the random variable and its value.
One test of whether we are looking at an appropriate measure of the probability p is the behaviour of the expected value of $\hat p$. Thus

$$E(\hat p) = E\!\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p, \qquad (41.1)$$

since we are assuming a binomial distribution. Hence the mean of the sampling distribution of $\hat p$ is the probability p itself. Generally, if the expected value of an estimator equals the parameter being estimated, then the estimator is called unbiased; if this is not the case then the estimator is called biased.
The spread of the estimator can be found by calculating the expected value $E[(\hat p - p)^2]$. Then

$$E[(\hat p - p)^2] = E[\{\hat p - E(\hat p)\}^2] = \mathrm{Var}[\hat p]$$

by (40.5). Hence, the variance of the sampling distribution of $\hat p$ is given by

$$\mathrm{Var}[\hat p] = \mathrm{Var}\!\left[\frac{X}{n}\right] = \frac{1}{n^2}\,\mathrm{Var}[X] \quad \text{(by (40.8(ii)))}$$

$$= \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}, \qquad (41.2)$$

for the binomial distribution. As we might expect, the variance of the estimator decreases as the sample size n increases.
Given the one sample at the beginning of this section, the estimate for p is $\hat p = 0.56$. The estimated variance (see (41.2)) of this single sample, replacing p by $\hat p$, is

$$\hat\sigma^2 = \widehat{\mathrm{Var}}[\hat p] = \frac{\hat p(1-\hat p)}{n} = \frac{0.56 \times 0.44}{500} \approx 0.0005.$$

The corresponding standard deviation of $\hat p$ is $\hat\sigma \approx 0.022$.
We can turn the result round and ask the question: what sample size should we choose to achieve a given $\hat\sigma$? From (41.2) the sample size will be

$$n = \frac{p(1-p)}{\hat\sigma^2}.$$
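The arithmetic for the poll, and the sample-size formula, can be sketched in Python. The target standard deviation 0.015 is an illustrative choice, not from the text, and the helper names are hypothetical. Rounding up guarantees that the chosen n meets the target.

```python
import math

def proportion_sd(p_hat: float, n: int) -> float:
    """Estimated standard deviation of p_hat, from (41.2): sqrt(p_hat (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

def sample_size_for_sd(p: float, target_sd: float) -> int:
    """Smallest whole n with p (1 - p) / n <= target_sd**2, i.e. p (1 - p) / target_sd**2 rounded up."""
    return math.ceil(p * (1.0 - p) / target_sd ** 2)

p_hat, n = 0.56, 500
var_hat = p_hat * (1.0 - p_hat) / n
print(f"estimated variance {var_hat:.5f}, standard deviation {proportion_sd(p_hat, n):.3f}")
print(sample_size_for_sd(p_hat, 0.015))  # 1096
```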
41.3 Sample mean and variance, and their estimation

The general sampling problem is as follows. Suppose that we take a sample of n observations x1, x2, …, xn of a population where xi is a value of a random variable Xi. The random variables (or random sample) X1, X2, …, Xn are assumed to be independent, each with the same probability distribution. We assume that the sampling distribution can be modelled by a known probability distribution. Estimates, preferably unbiased, are required for the mean and variance of the underlying distribution.
An obvious choice for the mean is simply the sample mean, which is the average value of the sample.
The sample mean is a random variable defined by

Sample mean for sample size n

$$\bar X = \frac{X_1 + X_2 + \cdots + X_n}{n}. \qquad (41.3)$$

If x1, x2, …, xn are values obtained in a particular sample then its mean is

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

What is the relation between the sample mean and the mean of the population? The expected value of $\bar X$ is

$$E(\bar X) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\,n\mu = \mu.$$

As we might expect, the expected value of the sample mean is the same as the mean of the population.
The variance of the sample mean is, by (40.6),

$$\mathrm{Var}(\bar X) = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad (41.4)$$

where σ² is the unknown variance of the population. Its standard deviation σ/√n is known as the standard error of the sample mean.
We also need an estimate for the variance σ² of the population. We might choose

$$\hat S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar X)^2}{n},$$

which is the variance of the sample, but is it unbiased? In other words, does its expected value equal σ²? The following algebra supplies the answer:

$$E(\hat S^2) = E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2\right] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}[(X_i - \mu) - (\bar X - \mu)]^2\right]$$

$$= E\!\left[\frac{1}{n}\sum_{i=1}^{n}[(X_i - \mu)^2 - 2(X_i - \mu)(\bar X - \mu) + (\bar X - \mu)^2]\right]$$

$$= \frac{1}{n}E\!\left[\sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar X - \mu)^2\right] = \frac{1}{n}\left[n\sigma^2 - nE[(\bar X - \mu)^2]\right]$$

$$= \sigma^2 - \mathrm{Var}(\bar X) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2,$$

using $\sum_{i=1}^{n}(X_i - \mu) = n(\bar X - \mu)$ in the third line and (41.4) in the last line. In other words $\hat S^2$ is a biased estimator of the population variance σ²: its expected value is smaller by the factor (n − 1)/n. A better statistic, which is an unbiased estimator of the variance, is

Estimator for the sample variance

$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar X)^2}{n-1}. \qquad (41.5)$$

For large samples the difference between $\hat S^2$ and S² is small, but it can be significant for small sample sizes. The estimator S² is often known simply as the sample variance.
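A simulation makes the bias of the divide-by-n estimator visible. The sketch below assumes normally distributed data with µ = 10, σ = 2 and samples of size n = 5, all illustrative choices. In the standard library, `statistics.pvariance` divides by n, while `statistics.variance` divides by n − 1 as in (41.5).

```python
import random
import statistics

def simulate(n=5, trials=10000, mu=10.0, sigma=2.0, seed=1):
    """Draw many normal samples of size n and average three statistics over them:
    the sample mean, the divide-by-n variance and the divide-by-(n - 1) variance S^2."""
    rng = random.Random(seed)
    means, biased, unbiased = [], [], []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(xs))
        biased.append(statistics.pvariance(xs))   # sum (x_i - xbar)^2 / n
        unbiased.append(statistics.variance(xs))  # sum (x_i - xbar)^2 / (n - 1), as in (41.5)
    return (statistics.fmean(means), statistics.pvariance(means),
            statistics.fmean(biased), statistics.fmean(unbiased))

mean_of_means, var_of_means, e_biased, e_unbiased = simulate()
# Theory for sigma^2 = 4, n = 5: E(Xbar) = 10, Var(Xbar) = 4/5 = 0.8 by (41.4),
# E(divide-by-n variance) = (4/5) * 4 = 3.2, E(S^2) = 4.
print(round(mean_of_means, 2), round(var_of_means, 2), round(e_biased, 2), round(e_unbiased, 2))
```

The averages should land close to 10, 0.8, 3.2 and 4, matching E(X̄) = µ, (41.4), the factor (n − 1)/n, and the unbiasedness of S².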

41.4 Central limit theorem


The normal distribution was introduced in Section 40.9, and its importance in the context of random errors was hinted at there. The pdf for a normal distribution with mean µ and variance σ² is (eqn (40.10))

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}.$$

The central limit theorem (which will not be proved here) states that if random samples of size n are taken from a distribution with mean µ and standard deviation σ, then the sampling distribution of the sample mean $\bar X$ approaches a normal distribution with mean µ and standard deviation σ/√n as n → ∞, whatever the original distribution of the Xi. Analytically this can be expressed as

Central limit theorem

$$\lim_{n\to\infty} P\!\left(\frac{(\bar X - \mu)\sqrt{n}}{\sigma} \le x\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du. \qquad (41.6)$$

In this result $(\bar X - \mu)\sqrt{n}/\sigma$, which can equally be written as $(X_1 + X_2 + \cdots + X_n - n\mu)/(\sigma\sqrt{n})$, is the standardized random variable derived from $\bar X$, and for large n it is approximately normally distributed.
As we have already stated, the true significance of this result is that it is independent of the distribution of each Xi, which need not be normal.
This result can be illustrated by throwing n dice and recording the frequencies of the average score. The probabilities can easily be computed (see Project 41.4 in Chapter 42) for small values of n. For example, if n = 2, then the possible average scores and the probabilities with which they occur are given in Table 41.5 and Fig. 41.4a. Bar charts for n = 2, 4, 6, generated by a computer program, are shown in Fig. 41.4. For n = 6 the bounding curve already begins to show the familiar shape of the normal distribution.
Table 41.5

Average score   1      3/2    2      5/2    3      7/2    4      9/2    5      11/2   6
Probability     1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

Fig. 41.4 Probabilities versus average scores for rolling (a) two, (b) four and (c) six dice.
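The dice probabilities in Table 41.5 can be computed exactly by repeated convolution. The Python sketch below shows one way to do this; it is not the program used for Fig. 41.4. It also confirms that the average of n dice has mean 7/2 and variance 35/(12n), in agreement with (41.4), since a single die has variance 35/12.

```python
from collections import Counter
from fractions import Fraction

def dice_average_distribution(n: int) -> dict:
    """Exact probabilities of the average score of n fair dice, by repeated convolution."""
    totals = Counter({0: 1})  # ways of obtaining each running total
    for _ in range(n):
        nxt = Counter()
        for total, ways in totals.items():
            for face in range(1, 7):
                nxt[total + face] += ways
        totals = nxt
    outcomes = 6 ** n
    return {Fraction(t, n): Fraction(w, outcomes) for t, w in sorted(totals.items())}

def mean_and_variance(dist: dict):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(a * p for a, p in dist.items())
    var = sum((a - mean) ** 2 * p for a, p in dist.items())
    return mean, var

dist2 = dice_average_distribution(2)
print([str(a) for a in dist2])                # average scores 1, 3/2, ..., 6
print([int(p * 36) for p in dist2.values()])  # probabilities as numerators over 36
for n in (2, 4, 6):
    print(n, mean_and_variance(dice_average_distribution(n)))
```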

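Example 41.1 A die is rolled 6000 times. The number T of times face 1 appears is counted. Find m1 and m2 in P(m1 ≤ T ≤ m2) in order that T should lie within one standard deviation of its mean value 1000.
Let Xi be the random variable which takes the value 1 if a 1 appears face up on the ith roll and 0 otherwise, so that T = X1 + X2 + ⋯ + X6000. Each Xi has mean µ = 1/6 and variance

$$\sigma^2 = E(X_i^2) - E(X_i)^2 = \frac{1}{6} - \frac{1}{36} = \frac{5}{36},$$

so that E(T) = 6000µ = 1000 and Var(T) = 6000σ² = 5000/6, giving a standard deviation of √(5000/6) ≈ 28.87. By the central limit theorem

$$P\!\left(k_1 \le \frac{T - 6000\cdot\frac{1}{6}}{\sqrt{\frac{5}{36}}\,\sqrt{6000}} \le k_2\right) \approx \frac{1}{\sqrt{2\pi}}\int_{k_1}^{k_2} e^{-\frac{1}{2}u^2}\,du,$$

or

$$P\!\left(k_1\sqrt{\tfrac{5000}{6}} + 1000 \le T \le k_2\sqrt{\tfrac{5000}{6}} + 1000\right) \approx \frac{1}{\sqrt{2\pi}}\int_{k_1}^{k_2} e^{-\frac{1}{2}u^2}\,du.$$

For T to lie within one standard deviation of its mean we take k1 = −1, k2 = 1. Hence

$$m_1 = -\sqrt{\tfrac{5000}{6}} + 1000 \approx 971, \qquad m_2 = \sqrt{\tfrac{5000}{6}} + 1000 \approx 1029,$$

and, since Φ(1) = 0.8413 by the normal distribution table (Appendix H), the probability that T lies in this range is approximately 2 × 0.8413 − 1 = 0.6827.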
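The numbers in Example 41.1 can be checked with a short Python sketch. As an illustrative extra, not part of the example, it also evaluates the exact binomial probability P(972 ≤ T ≤ 1028), which covers the integer values within one standard deviation of the mean. It uses integer arithmetic so that the comparison with the normal approximation does not underflow. The helper names are hypothetical.

```python
import math

def within_one_sd(n_rolls: int):
    """Mean, standard deviation and the one-standard-deviation limits m1, m2
    for T = number of 1s in n_rolls rolls of a fair die (binomial, p = 1/6)."""
    mean = n_rolls / 6
    sd = math.sqrt(n_rolls * (1 / 6) * (5 / 6))
    return mean, sd, mean - sd, mean + sd

def exact_prob(n_rolls: int, lo: int, hi: int) -> float:
    """Exact binomial P(lo <= T <= hi) = sum C(n, k) 5^(n-k) / 6^n, in exact integers."""
    num = sum(math.comb(n_rolls, k) * 5 ** (n_rolls - k) for k in range(lo, hi + 1))
    return num / 6 ** n_rolls

mean, sd, m1, m2 = within_one_sd(6000)
print(f"mean {mean:.0f}, sd {sd:.2f}, m1 ~ {m1:.1f}, m2 ~ {m2:.1f}")
print(f"normal approximation: {math.erf(1 / math.sqrt(2)):.4f}")
print(f"exact binomial P(972 <= T <= 1028): {exact_prob(6000, 972, 1028):.4f}")
```

The width m2 − m1 is twice the standard deviation of T, so the function can be reused for other numbers of rolls.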
Self-test 41.2
Suppose in Example 41.1, the die is rolled 12 000 times. Find the new m1 and m2 in P(m1 ≤ T ≤ m2) in order that T should be within one standard deviation of its mean of 2000. If n is the number of times the die is rolled, how does m2 − m1 behave in terms of n?
41.5 Regression
Suppose that we have a set of data in which one quantity is measured in relation
to another quantity. For example, the fuel consumption of a car will vary with
the speed of the car, or the weight of an individual will vary with the height of the
person. We may wish to speculate as to what the relationship is between two (or
more) quantities.
Suppose that a sample of measurements is taken, for example fuel consump-
tion (y) for different speeds (x) of a car. This leads to the paired data (x1, y1),
(x2, y2), … , (xn, yn), in which one or both variables may contain random errors.
We can obtain an idea of the likely relationship between x and y by plotting the
coordinates (xi, yi) as points in rectangular cartesian coordinates, giving what
is known as a scatter diagram. Some examples are shown in Fig. 41.5. If we fit a
curve to the data shown in the scatter diagrams, then we might guess a straight
line fit to the data in Fig. 41.5a, and a curve in Fig. 41.5b, whereas in Fig. 41.5c,
which shows data centred around a point, we might feel that no relationship
exists between the variables. Often in scientific experiments the relationship
between the variables can be inferred from some underlying theory although
parameters may be unknown. For example, it might be known that the formula
relating x and y is linear so that we need to find the best straight line fit to the data.
For others we might need to guess the likely shape of the curve from the scatter of
the data as in Fig. 41.5b.
In some data sets there can be errors in both measurements. In others, one
variable known as the controlled or independent variable x is specified (measure-
ments could be made at specified times which are known accurately) and y, which
will contain random errors, is known as the response or dependent variable.

Fig. 41.5 Scatter diagrams (a), (b), (c) of y against x.

In the fuel efficiency tests the speed of the car could be measured accurately (controlled variable), but the fuel consumption (response variable) might be affected by other factors (ambient temperature, engine tuning, etc.). On the other hand, in the height/weight data the measurements could be accurate, although the weight could vary over time. There is unlikely to be a 'formula' relating height and weight (there may be other parameters involved), but nevertheless it is useful to have a working relation between the two for life tables used by insurance companies. The process of estimating the response variable from a set of controlled variables is known as regression.
If the hypothesis is that the data follow a straight line relationship then the model is known as a linear regression model. This regression model assumes that the random variable Y of the data {yi} is given by

$$Y = ax + b + \varepsilon,$$

where a and b are unknown parameters and ε is a random error with mean 0 and unknown variance σ². Note that, since x is a controlled (non-random) variable, the variance of Y is

$$\mathrm{Var}(Y) = \mathrm{Var}(ax + b + \varepsilon) = \mathrm{Var}(ax + b) + \mathrm{Var}(\varepsilon) = \mathrm{Var}(\varepsilon) = \sigma^2.$$

With x as a controlled variable, the vertical deviation of the point (xi, yi) from the line is

$$e_i = y_i - (ax_i + b).$$

We use the method of least squares, which requires the minimum of the sum of the squares of the deviations,

$$f(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - ax_i - b)^2$$
(see Section 28.6 for a full derivation of a and b). The minimum occurs where ∂f/∂a = ∂f/∂b = 0 and, as in (28.11), the best straight line fit is given by the solution of

$$a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i,$$

$$a\sum_{i=1}^{n} x_i + bn = \sum_{i=1}^{n} y_i.$$

The solutions of these equations are the least-squares estimates for a and b, and using the notation for estimators we shall distinguish them by $\hat a$ and $\hat b$:

Least-squares estimates:

$$\hat b = \bar y - \hat a\,\bar x, \qquad \hat a = \frac{\sum_{i=1}^{n} x_i y_i - n\bar x\bar y}{\sum_{i=1}^{n} x_i^2 - n\bar x^2},$$

where

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i. \qquad (41.7)$$
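The formulas (41.7) translate directly into code. In this minimal Python sketch, `least_squares_line` is a hypothetical helper and the data are made up for illustration. In practice a library routine, or the software mentioned later in this section, would normally be used.

```python
def least_squares_line(xs, ys):
    """Least-squares estimates (a_hat, b_hat) for y = a x + b, using (41.7):
    a_hat = (sum x y - n xbar ybar) / (sum x^2 - n xbar^2),  b_hat = ybar - a_hat xbar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a_hat = sxy / sxx
    b_hat = ybar - a_hat * xbar
    return a_hat, b_hat

# Illustrative (made-up) data lying close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a_hat, b_hat = least_squares_line(xs, ys)
print(f"y_hat = {a_hat:.3f} x + {b_hat:.3f}")
```

For these data the fitted line is ŷ = 1.970x + 1.060, close to the line y = 2x + 1 used to invent them.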
The least-squares regression estimator $\hat y$ is given by

$$\hat y = \hat a x + \hat b,$$

and this can be used to estimate y for other values of x. It also defines the equation of the regression line of y on x through the data. The regression line of x on y, which generally will be a different line, can be found similarly.
The estimates $\hat a$ and $\hat b$ have been obtained by least squares. Are they unbiased estimators of a and b? We can decide the answer to this question by finding their expected values. Thus, noting that Yi is the random variable with value yi, that xi is a controlled variable, and that $n\bar x\bar Y = \bar x\sum_{i=1}^{n} Y_i$,

$$E(\hat a) = E\!\left[\frac{\sum_{i=1}^{n} x_i Y_i - n\bar x\bar Y}{\sum_{i=1}^{n} x_i^2 - n\bar x^2}\right] = \frac{E\!\left[\sum_{i=1}^{n} x_i(ax_i + b + \varepsilon_i) - \bar x\sum_{i=1}^{n}(ax_i + b + \varepsilon_i)\right]}{\sum_{i=1}^{n} x_i^2 - n\bar x^2}$$

$$= \frac{\sum_{i=1}^{n} x_i(ax_i + b) - \bar x\sum_{i=1}^{n}(ax_i + b)}{\sum_{i=1}^{n} x_i^2 - n\bar x^2} \quad (\text{since } E(\varepsilon_i) = 0)$$

$$= \frac{a\sum_{i=1}^{n} x_i^2 + bn\bar x - na\bar x^2 - nb\bar x}{\sum_{i=1}^{n} x_i^2 - n\bar x^2} = a.$$

Also, by (41.7) and the result for $E(\hat a)$ above,

$$E(\hat b) = E(\bar Y - \hat a\,\bar x) = \frac{1}{n}E\!\left[\sum_{i=1}^{n}(ax_i + b + \varepsilon_i)\right] - \bar x\,E(\hat a) = \frac{1}{n}\sum_{i=1}^{n}(ax_i + b) - \bar x a = a\bar x + b - \bar x a = b.$$

Hence $\hat a$ and $\hat b$ are unbiased estimators of a and b respectively.
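The unbiasedness of $\hat a$ and $\hat b$ can be illustrated by simulation. The derivation above needs only E(εi) = 0, so the following are all illustrative assumptions: normally distributed errors εi with σ = 1, fixed controlled values x = 1, …, 10, and true parameters a = 0.5, b = 3. Averaging the estimates over many simulated data sets should give values close to a and b.

```python
import random

def fit(xs, ys):
    """Least-squares slope and intercept from (41.7)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    a_hat = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / (sum(x * x for x in xs) - n * xbar ** 2)
    return a_hat, ybar - a_hat * xbar

def average_estimates(a=0.5, b=3.0, sigma=1.0, trials=5000, seed=7):
    """Simulate Y_i = a x_i + b + eps_i with fixed (controlled) x_i and normal errors,
    refit each data set, and average the estimates a_hat, b_hat over all trials."""
    rng = random.Random(seed)
    xs = [float(x) for x in range(1, 11)]  # controlled variable: same x values every trial
    sum_a = sum_b = 0.0
    for _ in range(trials):
        ys = [a * x + b + rng.gauss(0.0, sigma) for x in xs]
        a_hat, b_hat = fit(xs, ys)
        sum_a += a_hat
        sum_b += b_hat
    return sum_a / trials, sum_b / trials

print(average_estimates())  # should be close to (0.5, 3.0)
```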


Regression lines are most easily determined and compared with the data by using computer software. Whilst we have only discussed regression lines, in many applications regression curves are more appropriate; the important point is that, for the least-squares method above to carry over directly, the curve must be linear in the unknown parameters, as y = ax² + bx + c is in a, b, and c.
This is a very brief introduction to statistical methods. For a detailed treatment consult Montgomery and Runger (1994).

Problems
41.1 Find the mean, median, first and third 41.2 In a university degree examination
quartiles, and the interquartile range of the with four papers each taken by 20
following two data sets: candidates the percentage marks
(a) 10, 11, 11, 15, 17, 20, 25, 25, 27, 30, 38, 42, 47; are as shown in Table 41.6. Draw
(b) 5, 12, 15, 16, 20, 29, 29, 32, 39, 44. comparable box plots for the
Draw box plots for both sets of data. results.
Table 41.6 Examination marks (0–100)
Paper 1: 24, 27, 27, 30, 40, 42, 48, 55, 58, 60, 61, 63, 64, 66, 66, 68, 69, 72, 78, 85
Paper 2: 30, 35, 36, 38, 39, 40, 44, 45, 48, 51, 54, 58, 61, 64, 65, 65, 69, 70, 81, 90
Paper 3: 26, 29, 30, 35, 36, 37, 46, 48, 49, 49, 50, 54, 56, 61, 69, 70, 71, 71, 72, 74
Paper 4: 10, 20, 22, 34, 41, 44, 45, 45, 45, 50, 55, 55, 55, 56, 64, 65, 66, 70, 85, 91

41.3 Samples of packets of crisps are weighed at the end of a manufacturing process. Packets have to contain a minimum of 25 g. The sample weights (in g) are
25.1, 25.3, 25.0, 25.7, 25.3, 25.2, 25.1, 25.5, 25.7, 25.1.
Calculate the sample mean, mode, and standard deviation.

41.4 In a continuous production process a machine cuts pipes into nominal lengths of 10 metres. The actual lengths in a production run are given in Table 41.7. Draw a histogram over (a) 10 intervals of width 0.1 metres, (b) 5 intervals of width 0.2 metres. Add a frequency polygon to both histograms.

Table 41.7
Length interval (m)    Frequency of pipes
9.5 ≤ x < 9.6          1
9.6 ≤ x < 9.7          4
9.7 ≤ x < 9.8          5
9.8 ≤ x < 9.9          12
9.9 ≤ x < 10.0         20
10.0 ≤ x < 10.1        21
10.1 ≤ x < 10.2        15
10.2 ≤ x < 10.3        11
10.3 ≤ x < 10.4        5
10.4 ≤ x < 10.5        2

41.5 In an experiment 127 observations are taken which can be assigned to a maximum of 36 intervals. If you wish to display the data in a histogram, what would be a suitable number of intervals to use?

41.6 A random variable X has a uniform distribution (see Problem 40.4) with pdf
$$f(x) = \begin{cases} 1, & 1 \le x \le 2; \\ 0, & \text{otherwise.} \end{cases}$$
A random sample of size 35 is taken. Find the mean and estimate the variance of the sample. What can you say about the distribution of the sample mean?

41.7 A random sample is taken from a population which has mean µ and variance σ². The sample values are
9.71, 10.26, 9.80, 9.85, 9.99, 10.10, 9.79.
Estimate the sample mean and the sample variance, σ².

41.8 A die is thrown 9000 times, and the number of times face 1 appears is recorded. If T is the random variable for the number of 1s in 9000 throws, calculate k₁ and k₂ such that
$$P(1460 \le T \le 1540) = \frac{1}{\sqrt{2\pi}} \int_{k_1}^{k_2} e^{-\frac12 x^2}\,dx.$$

41.9 Fuel consumption figures for standard urban cycles of a selection of cars together with their weights are given in Table 41.8. Find the least-squares estimator for a regression line of fuel consumption (c) on weight (w).

Table 41.8
Vehicle   Weight w (kg)   Fuel consumption c (km l⁻¹)
A         2100            4.96
B         1350            9.10
C         1008            12.04
D         1323            7.68
E         710             15.15
F         1215            10.98
G         1436            7.75
H         1561            8.25
I         2120            4.85
J         1975            4.64
K         1535            5.56

An unbiased estimator for the variance in linear regression is given by
$$\sum_{i=1}^{n} \frac{(y_i - t_i)^2}{n - 2},$$
where tᵢ = s xᵢ + S. Estimate the variance of the regression line. One point is some distance from the regression line (such rogue values are known as outliers). If this particular vehicle is excluded from the data, how are the regression line and the estimated variance affected?
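For Problem 41.9, the least-squares line and the variance estimator can be checked with a short computation. The following is a minimal Python sketch using only the standard library (an assumption: any language or package would serve); following the formula tᵢ = s xᵢ + S as printed, it calls the slope s and the intercept S. The point to drop is chosen here as the one with the largest residual, which is one reasonable reading of "some distance from the regression line", not necessarily the book's choice.

```python
# Table 41.8: vehicle -> (weight w in kg, fuel consumption c in km per litre)
DATA = {
    "A": (2100, 4.96), "B": (1350, 9.10), "C": (1008, 12.04), "D": (1323, 7.68),
    "E": (710, 15.15), "F": (1215, 10.98), "G": (1436, 7.75), "H": (1561, 8.25),
    "I": (2120, 4.85), "J": (1975, 4.64), "K": (1535, 5.56),
}

def fit_line(points):
    """Least-squares regression line c = s*w + S of c on w; returns (s, S)."""
    n = len(points)
    mean_w = sum(w for w, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    s_ww = sum((w - mean_w) ** 2 for w, _ in points)
    s_wc = sum((w - mean_w) * (c - mean_c) for w, c in points)
    s = s_wc / s_ww
    return s, mean_c - s * mean_w

def residual_variance(points, s, S):
    """Unbiased estimate: sum of (y_i - t_i)^2 divided by (n - 2), with t_i = s*x_i + S."""
    return sum((c - (s * w + S)) ** 2 for w, c in points) / (len(points) - 2)

pts = list(DATA.values())
s, S = fit_line(pts)
print(f"all 11 cars: c = {s:.5f} w + {S:.3f}, variance estimate {residual_variance(pts, s, S):.3f}")

# Treat the car with the largest residual as the outlier, drop it, and refit.
worst = max(DATA, key=lambda k: abs(DATA[k][1] - (s * DATA[k][0] + S)))
rest = [p for k, p in DATA.items() if k != worst]
s2, S2 = fit_line(rest)
print(f"without {worst}: c = {s2:.5f} w + {S2:.3f}, variance estimate {residual_variance(rest, s2, S2):.3f}")
```

Comparing the two printed lines shows directly how the slope, the intercept, and the variance estimate respond to removing the suspect point.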
Part 8
Projects

42 Applications projects using symbolic computing

Contents
42.1 Symbolic computation
42.2 Projects

42.1 Symbolic computation


In recent years there have been a number of significant advances in symbolic computation and computer algebra. Such systems bring together symbolic, numerical, and graphical operations in one software package. The mathematical methods introduced in this book provide particularly suitable contexts in which to take a first look at them.
The software Mathematica† has been used extensively in this text, both to produce the drawings of curves and surfaces and to check examples and problems. At an elementary level Mathematica is particularly helpful with operations such as differentiation (including partial derivatives), the construction of Taylor series, elementary algebraic operations involving matrices and linear equations, elementary integration (including repeated integrals), and difference equations; but most topics in this book can be approached to some extent using Mathematica. It is also useful in curve sketching, since a quick view of the general features of a curve can be obtained, which can then be revised and edited to produce detailed graphs as required.
It is not the purpose of this book to provide an introduction to Mathematica. A number of texts do, including the handbook that comes with the system. Other software packages, including MAPLE‡, can also be used in mathematics. Apart from this chapter, Mathematical Techniques is software-free.
Useful information about Mathematica and its applications can be found in the texts by Abell and Braselton (1992), Blackman (1992), Skeel and Keeper (1993), and Wolfram (1996).

† Mathematica is a registered trade mark of Wolfram Research Inc.


‡ MAPLE is a registered trade mark of Waterloo Maple Software.

42.2 Projects

The following projects are listed by chapter. They are selected samples of problems and do not cover every topic in the book. The intention is that they can be approached mainly using built-in Mathematica commands: very few require programming in Mathematica. It is generally inadvisable to attempt these problems by hand, since many involve a great deal of manipulation, although some projects are prompted by examples and problems in the relevant chapters.
It is worth emphasizing that computer algebra systems usually generate outputs or answers without explaining how the results are arrived at, unless the programming within them is investigated. Outputs can go wrong for many mathematical reasons. For example, a curve can oscillate too rapidly for the built-in point spacing to detect, which can result in a false graph. This can be corrected by increasing the number of plot points, but the potential difficulty has to be recognized when the problem is formulated. Symbolic computation is not a substitute for understanding mathematical techniques.
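The plotting pitfall just described is easy to reproduce. The sketch below is written in Python rather than Mathematica (an assumption made so that it runs with the standard library alone). It samples y = x + sin 5x (Chapter 4, Project 5) at a fixed number of equally spaced points on 0 ≤ x ≤ 25, joins them by straight lines as a plotting routine does, and measures the worst gap between that polyline and the true curve.

```python
import math

def f(x):
    """y = x + sin 5x, the function of Chapter 4, Project 5."""
    return x + math.sin(5 * x)

def sample(a, b, n):
    """n equally spaced plot points (x, f(x)) on [a, b], end points included."""
    h = (b - a) / (n - 1)
    return [(a + k * h, f(a + k * h)) for k in range(n)]

def polyline_error(points, checks=5000):
    """Largest gap between f and the straight-line joins of `points`,
    measured at checks + 1 equally spaced test values of x."""
    a, b = points[0][0], points[-1][0]
    worst, j = 0.0, 0
    for k in range(checks + 1):
        x = a + (b - a) * k / checks
        # advance to the segment [x_j, x_(j+1)] that contains x
        while j < len(points) - 2 and x > points[j + 1][0]:
            j += 1
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        y_join = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        worst = max(worst, abs(f(x) - y_join))
    return worst

for n in (20, 50, 2000):
    print(f"{n:5d} plot points: largest error of the joined-up graph = {polyline_error(sample(0, 25, n)):.3f}")
```

With 20 points the spacing 25/19 ≈ 1.32 exceeds the period 2π/5 ≈ 1.26 of sin 5x, so whole oscillations fall between plot points and the joined-up graph can miss the curve by almost 2; with a few thousand points the gap becomes negligible.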
Mathematica notebooks for each project are available on the web at:

www.oxfordtextbooks.co.uk/orc/jordan_smith4e

Any comments should be sent to the authors at:

School of Computing and Mathematics, Keele University, Keele, Staffordshire ST5 5BG, UK. (Email: [email protected])
Chapter 1

1. Draw the graphs of y = x³, y = (x − 1)³, y − 1 = x³, y − 1 = (x − 1)³ for −1.5 ≤ x ≤ 2.5. How do they differ?

2. (a) Plot the points (n, n² + 1) for n = 1, 2, 3, 4, 5.
(b) Plot the points in (a) but with successive points joined by straight lines.
(c) Plot y = x² between x = 0 and x = 5.
(d) Show the curves from (b) and (c) on the same graph.

3. Plot curves defined by the following relations between x and y:
(a) x² + 3y² = 4, −2 ≤ x ≤ 2;
(b) x² + 2y² − xy + 2y = 4, −3 ≤ x ≤ 3;
(c) x⁴ + 2y² − xy − 2x²y = 4, −2 ≤ x ≤ 3.

4. Define the function f(x) = x(1 − x²). Plot the graphs
(a) y = f(x); (b) y = f(1 − x); (c) y = f(−x); (d) y = f(|x|);
all for −2 ≤ x ≤ 2.

5. Define the Heaviside function H(t) and the signum function sgn t. Plot graphs of the following functions on −4 ≤ t ≤ 4:
(a) H(t); (b) sgn t; (c) H(t) + H(−t); (d) sgn(sin t).

6. Plot the graphs of the curves defined by the following polar equations:
(a) r = ½(1 − cos θ) for 0 ≤ θ ≤ π (cardioid);
(b) r = (4 sin²θ − 1) cos θ for 0 ≤ θ ≤ 2π (folium).

7. Express
$$\frac{1}{(x-1)(x-2)(x-3)(x-4)(x-5)}$$
in partial fractions.
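For Project 7 the factors are distinct and linear, so the cover-up rule gives every coefficient directly: the coefficient of 1/(x − k) is 1/∏(k − j), the product taken over the other roots j. A small Python sketch with exact fractions follows; it stands in for a computer algebra command (an assumption of this example), and ends with a spot check at a point that is not a root.

```python
from fractions import Fraction

def cover_up(roots):
    """Partial-fraction coefficients of 1/prod(x - r) for distinct roots r.

    Returns {r: A_r} such that 1/prod(x - r) = sum of A_r/(x - r).
    """
    coeffs = {}
    for r in roots:
        denom = Fraction(1)
        for s in roots:
            if s != r:
                denom *= r - s   # cover up (x - r), evaluate the other factors at x = r
        coeffs[r] = 1 / denom
    return coeffs

coeffs = cover_up([1, 2, 3, 4, 5])
for r, a in coeffs.items():
    print(f"({a})/(x - {r})")

# Spot check at x = 7/2: both sides should agree exactly.
x = Fraction(7, 2)
lhs = 1 / ((x - 1) * (x - 2) * (x - 3) * (x - 4) * (x - 5))
rhs = sum(a / (x - r) for r, a in coeffs.items())
print(lhs == rhs)
```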
Chapter 2

1. Define the function
$$f(x) = \frac{x \sin x - 1 + \cos x}{\sin 2x + 2 - 2e^{x}}.$$
Find $\lim_{x \to 0} f(x)$. Plot the function for −0.5 ≤ x ≤ −0.001 and for 0.001 ≤ x ≤ 0.5, and check graphically that this agrees with the limit.

2. Find the derivative of
$$f(x) = 7x^2 + 8x^3 + 9x^4 + 10x^5 + 11x^6 + 12x^7$$
and its values f′(0.2) and f′(0.4).

3. Find the derivative of
$$f(x) = x^4 + 2x^3 - 3x^2 - 2x + 4.$$
Find the approximate values of x where f′(x) = 0, using a numerical solution routine. Plot graphs of y = f(x) and y = f′(x) on the same axes and compare the zeros of f′(x) with the points of zero slope on y = f(x).

4. Find the equation of the tangent to the curve y = x sin 2x at x = 0.7. Plot the graphs of the curve and its tangent.

5. Find the first three derivatives of
$$f(x) = x \sin^2 x + x^2 \sin(x^2),$$
and confirm that the first nonzero higher derivative at x = 0 is f⁽³⁾(0) = 6.

6. Plot the graphs of y = f(x), y = f′(x), and y = f″(x) for
$$f(x) = x^2(x^2 - 3)$$
in the interval −2 ≤ x ≤ 2.5. (This should confirm the results from Problem 2.19.)

Chapter 3

1. Display rules for the derivatives of the following general forms:
(a) f(x)g(x); (b) f(x)/g(x); (c) f(g(x)); (d) f(x)g(x)h(x); (e) f(x)g(x)/h(x); (f) f(h(x))/h(x).

2. Find the first derivative of
$$f(x) = e^{\sin^2 x} \cos x \sin x.$$
The function is periodic. What is its minimum period? Plot its graph and the graph of f′(x) over one cycle. Estimate where f(x) is stationary and then find each of the roots of f′(x) = 0 to 5 decimal places using a root-finding routine.

3. If x² + 2y² − xy − 2yx² = 4, find dy/dx as a function of x and y.

Chapter 4

1. Display rules for the first and second derivatives with respect to x of the following general forms:
(a) f(x²); (b) f(sin x); (c) f(sin(x²)).

2. Find the first and second derivatives of
$$f(x) = 0.1x^5 - 0.5x^4 + 0.2x^3 + x^2 - 0.7x + 2.2.$$
Estimate the roots of f′(x) = 0 from a graph of y = f(x). Then find the roots to 5 decimal places by a root-finding routine. Calculate f″(x) at each stationary point, and confirm the second-derivative test for stationary points. Points of inflection are given by f″(x) = 0. Find their locations on the original graph of y = f(x).

3. Plot the graph of
$$y = \frac{x^2 - 1}{2x + 1},$$
and its asymptotes y = ½x − ¼ and x = −½ (see Fig. 4.13).

4. Plot the graph of y = f(x) = x⁵ − 2x³ + x² − 3x + 1 in the interval −1 ≤ x ≤ 3, and estimate the roots of f(x) = 0 in this interval. Set up a Newton routine
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
for calculating the roots of f(x) = 0, and find, starting at x = 0.5 and 1.6, the roots to 10 significant figures. What is the smallest number of iterations required in each case to calculate the roots to 10 significant figures?

5. Plot the graph of y = x + sin 5x in the interval 0 ≤ x ≤ 25 using
(a) the default plotting routine,
(b) plotting with 20 plot points,
(c) plotting with 50 plot points.
Explain why the graphs are different for this type of function.

Chapter 5

1. Obtain formulae for the Taylor polynomials for the following functions centred at x = a as far as (x − a)³:
(a) f(x); (b) [f(x)]²; (c) f(x)g(x); (d) $e^{f(x)}$.
State the coefficient of (x − a)² in each case.
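A minimal Python sketch of the Newton routine in Chapter 4, Project 4, for readers without a computer algebra system. The stopping rule (successive iterates agreeing to a relative tolerance of 10⁻¹²) is an assumption chosen to exceed 10 significant figures comfortably; the iteration counts it reports include the final confirming step, so they may differ by one from a count based on another criterion.

```python
def f(x):
    return x**5 - 2*x**3 + x**2 - 3*x + 1

def fprime(x):
    return 5*x**4 - 6*x**2 + 2*x - 3

def newton(x0, tol=1e-12, max_iter=50):
    """Newton iteration x_(n+1) = x_n - f(x_n)/f'(x_n).

    Stops when successive iterates agree to relative tolerance `tol`;
    returns (root, number of iterations used).
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) <= tol * abs(x_new):
            return x_new, n
        x = x_new
    raise RuntimeError("Newton iteration did not converge")

for start in (0.5, 1.6):
    root, n = newton(start)
    print(f"start {start}: root {root:.10g} after {n} iterations")
```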
2. Find Taylor expansions about x = 0 up to and including x⁵ for each of the following functions:
(a) eˣ; (b) (x + 1) cos x; (c) ln(1 + sin x); (d) exp(sin(eˣ − 1)).

3. Find the Taylor polynomials for (sin²x)/x² up to and including $x^N$ for N = 2, 4, 6. Plot the graphs of the function and its Taylor polynomials for 0.001 ≤ x ≤ 2, and compare them. At approximately what values of x do the Taylor polynomials visibly part company from the exact function?

4. Find the Taylor polynomial for ln x about x = 1 for N = 6. Construct an error function which is the difference of ln x and its Taylor polynomial. Show that, at approximately x = 2.159, this error starts to exceed 0.2 as x increases. Plot this error function against x for 1 ≤ x ≤ 2.2.

Chapter 6

1. Solve, for the complex number a, the equation z = 0, where
$$z = \frac{(2 + 3i)^4}{(1 - 5i)^3} + \frac{a - 2i}{(1 + 5i)^4}.$$

2. If z = x + iy, find the real and imaginary parts of z eᶻ cos z.

3. Find the 13 roots of z¹³ = 1 + i, and plot the roots on the Argand diagram.

4. Let z₁ = 1 − 2i, z₂ = 3 + i. Plot the following points on the Argand diagram:
z₁ + z₂, z̄₁ + z̄₂, z₁ − z₂, z̄₁ + z₂, z₁z₂, z₁/z₂.

5. Find |z| and Arg z, where
$$z = \frac{(1 + 2i)^4}{1 + 3i} + \frac{2(3 - 4i)^3}{1 + 4i}.$$

Chapter 7

1. Let
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ -2 & 3 & -4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & -1 & 2 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 1 & -2 & 1 & 2 \\ -3 & 1 & -3 & 1 \\ 2 & 1 & 2 & 1 \end{bmatrix},$$
$$C = \begin{bmatrix} 3 & 1 & 2 & 1 \\ p & p & 1 & 2 \\ 1 & -2 & -3 & 2 \\ 2 & 1 & 0 & -1 \end{bmatrix}.$$
Find and compare
(a) AB and BA; (b) A(BC) and (AB)C; (c) (A + B)ᵀ and Aᵀ + Bᵀ; (d) (AB)ᵀ and BᵀAᵀ.

2. Find the inverse of
$$\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix}$$
(see Problem 7.18). Find the equation of the parabola of the form y = a + bx + cx² through the points (−1, −2), (½, −1), and (5/2, 2).

3. Let
$$A = \begin{bmatrix} \frac13 & \frac13 & \frac16 & \frac16 \\ \frac14 & \frac12 & \frac18 & \frac18 \\ \frac18 & \frac14 & \frac38 & \frac14 \\ \frac12 & \frac16 & \frac16 & \frac16 \end{bmatrix}.$$
Find A², A⁴, A⁸, A¹⁶. How do you expect Aⁿ to behave as n → ∞?

Chapter 8

1. Let
$$A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 3 & 1 & 0 & -3 \\ 2 & -1 & 3 & -1 \\ 2 & -1 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 4 & -3 & 1 \\ 0 & -1 & 4 & 3 \\ -2 & -2 & 3 & 1 \\ -2 & 5 & 6 & -5 \end{bmatrix}.$$
Find det A, det B, det A⁻¹, and det AB. Confirm that
det A⁻¹ = 1/det A, det A det B = det AB.

2. Factorize the following determinants:
$$\text{(a)} \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix}; \quad \text{(b)} \begin{vmatrix} 1 & 1 & 1 & 1 \\ a & b & c & d \\ a^2 & b^2 & c^2 & d^2 \\ a^3 & b^3 & c^3 & d^3 \end{vmatrix}; \quad \text{(c)} \begin{vmatrix} 1 & 1 & 1 & 1 \\ a & b & c & d \\ a^2 & b^2 & c^2 & d^2 \\ a^4 & b^4 & c^4 & d^4 \end{vmatrix}.$$

3. Find the values of a for which
$$\begin{vmatrix} 5 & a & -1 & 1 \\ 2 & 1 & a & 2 \\ 3 & a & 1 & 4 \\ -1 & 0 & a & 2 \end{vmatrix}$$
is zero.
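For Chapter 7, Project 3, note that every row of A sums to 1, so A is a (row-)stochastic matrix and so are all its powers. The Python sketch below is an illustration using exact fractions, not the book's Mathematica notebook; it forms A², A⁴, A⁸, A¹⁶ by repeated squaring, which is exactly the sequence the project asks for.

```python
from fractions import Fraction as F

A = [
    [F(1, 3), F(1, 3), F(1, 6), F(1, 6)],
    [F(1, 4), F(1, 2), F(1, 8), F(1, 8)],
    [F(1, 8), F(1, 4), F(3, 8), F(1, 4)],
    [F(1, 2), F(1, 6), F(1, 6), F(1, 6)],
]

def matmul(X, Y):
    """Exact product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

powers = {1: A}
for n in (2, 4, 8, 16):
    half = powers[n // 2]
    powers[n] = matmul(half, half)   # A^(2m) = A^m A^m

A16 = powers[16]
for row in A16:
    print(["%.6f" % float(v) for v in row])
```

The printed rows of A¹⁶ are already almost identical, which suggests that Aⁿ tends to a matrix whose rows all equal the same probability vector.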
Chapter 9

1. Plot the curve which has the position vector
r = (2 cos t) î + (2 sin t) ĵ + 0.3t k̂
from t = 0 to t = 20. What is the curve called? The position vector represents a particle moving along the curve. Find the velocity vector v and the acceleration vector a of the particle. Show that v · a = 0.

2. Plot the trefoil knot given parametrically by
r = (1 + a cos 3t)(cos 2t î + sin 2t ĵ) + a sin 3t k̂
with a ≈ 0.25 and 0 ≤ t ≤ 2π.

Chapter 10

1. Show that the matrix
$$\begin{bmatrix} -2^{-3/2} & 2^{-1/2} & 2^{-3/2}3^{1/2} \\ -2^{-3/2} & -2^{-1/2} & 2^{-3/2}3^{1/2} \\ 2^{-1}3^{1/2} & 0 & 2^{-1} \end{bmatrix}$$
defines a rotation of axes. If each row defines the direction of the X, Y, Z axes in the x, y, z frame, find the equation of the plane x + 2y − 2z = 1 in the new axes.

Chapter 11

1. The area of a triangle whose vertices are the points with position vectors a, b, and c is given by the formula
½|b × c + c × a + a × b|.
Devise a program based on this formula to determine the area for general vertices. What is the area if a = (1, 0, 1), b = (2, −1, 1), and c = (1, 1, 2)? Plot a diagram showing the triangle.

2. A tetrahedron has vertices with position vectors
a = (1, −1, 2), b = (−1, 2, 3), c = (2, −1, 3), d = (1, 3, −2).
Find its surface area. Draw a three-dimensional plot showing the tetrahedron viewed from the point with position vector (2.1, −2.4, 1.5).

Chapter 12

1. Use a row-reduction routine to solve the linear equations
x + 2y − 3z = q, 2x + py + z = −1, x − 2y − z = 4,
where p and q are two parameters. Determine for what values of p and q the equations have (a) a unique solution, (b) no solution, (c) an infinite set of solutions.

2. Use a row-reduction method to solve the linear equations
x + 2y + pz = 5, 3x + 2y + z = q, 2x − y + 4z = 7,
where p and q are two parameters. Confirm that
$$z = \frac{63 - 5q}{11 + 7p} \qquad \left(p \ne -\tfrac{11}{7}\right),$$
and discuss the nature of solutions for all values of p and q.

3. Using a row-reduction instruction, show that
x₁ + 3x₃ = 5,
−x₁ + x₂ − x₃ + x₄ = −1,
x₁ + 2x₂ + 11x₃ = 4,
−x₁ + 2x₂ + 3x₃ + x₄ = 3
is an inconsistent set of equations.

Chapter 13

1. Find the eigenvalues and eigenvectors of
$$A = \begin{bmatrix} -6 & 1 & 2 & 0 \\ 1 & 0 & -3 & -1 \\ 2 & 1 & -6 & 0 \\ -2 & 2 & 0 & -3 \end{bmatrix}.$$
How many linearly independent eigenvectors does A have? Find the eigenvalues of the following matrices:
(a) A⁻¹; (b) A²; (c) A + kI.

2. Find the eigenvalues and eigenvectors of
$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix}.$$
Construct a matrix C of eigenvectors and confirm that A = CDC⁻¹, where D is a diagonal matrix of eigenvalues. Obtain the general formula for Aⁿ = CDⁿC⁻¹.

3. Find the inverse and transpose of
$$A = \frac13 \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & -2 \\ 2 & -2 & 1 \end{bmatrix},$$
and verify that A is an orthogonal matrix. Find the eigenvalues of A. What expected property do they have?
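For Chapter 13, Project 2, the eigenpairs of the symmetric matrix can be found by hand: λ = −1 with (1, −1, 0), λ = 1 with (1, 1, −2), and λ = 4 with (1, 1, 1). The Python sketch below (exact arithmetic, standard library only; an illustration rather than the project's intended Mathematica solution) checks these pairs, builds C and D, and confirms Aⁿ = CDⁿC⁻¹ against direct multiplication.

```python
from fractions import Fraction

A = [[1, 2, 1], [2, 1, 1], [1, 1, 2]]

# Eigenvalue/eigenvector pairs worked out by hand; the loop below verifies them.
PAIRS = [(-1, [1, -1, 0]), (1, [1, 1, -2]), (4, [1, 1, 1])]

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

for lam, v in PAIRS:
    Av = [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]
    assert Av == [lam * x for x in v], (lam, v)

C = [[v[i] for _, v in PAIRS] for i in range(3)]   # eigenvectors as columns
# A is symmetric and its eigenvectors are mutually orthogonal, so C^(-1) is C^T
# with row k divided by the squared length of the k-th eigenvector.
C_inv = [[Fraction(x, sum(c * c for c in v)) for x in v] for _, v in PAIRS]
assert matmul(C_inv, C) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def power(n):
    """A^n = C D^n C^(-1), where D is the diagonal matrix of eigenvalues."""
    Dn = [[PAIRS[i][0] ** n if i == j else 0 for j in range(3)] for i in range(3)]
    return matmul(matmul(C, Dn), C_inv)

# Cross-check against repeated multiplication for n = 5.
A5 = A
for _ in range(4):
    A5 = matmul(A5, A)
print(power(5) == A5)
```

Because the eigenvectors are orthogonal, the general formula can be read off as Aⁿ = Σ λᵢⁿ vᵢvᵢᵀ/|vᵢ|²; for example, the (1, 1) entry of Aⁿ is ½(−1)ⁿ + 1/6 + ⅓·4ⁿ.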
4. Find the eigenvalues of
$$A = \begin{bmatrix} 5 & 5 & -6 & 2 \\ -3 & 13 & -6 & 2 \\ -3 & 7 & 0 & 2 \\ 3 & -15 & 12 & 2 \end{bmatrix}.$$
Find the expression det(A − λI₄), and demonstrate the Cayley–Hamilton theorem of Problem 13.21.

Chapter 14

1. Plot the graphs of the derivative dy/dx = sin 2x and the equation of the curve through (π, −1) of which this is the derivative (see Example 14.7).

2. Plot the graph of
$$\frac{dy}{dx} = x e^{-x} + \sin x - x^2 \cos 2x$$
for 0 ≤ x ≤ 10. Show that an antiderivative which is zero when x = 0 is
$$y = 2 + \tfrac14\left[-4(1 + x)e^{-x} - 4\cos x - 2x\cos 2x + \sin 2x - 2x^2 \sin 2x\right].$$
Plot the graph of the signed area between x = 0 and x = 10.

Chapter 15

1. Set up a program to compute the area under the curve y = f(x) between x = a and x = b using the approximation
$$h \sum_{n=0}^{N-1} f(x_n),$$
where h = (b − a)/N and xₙ = a + nh. Apply the method to the following functions, limits, and subdivision numbers:
(a) f(x) = x², 1 ≤ x ≤ 3, N = 200;
(b) f(x) = x e⁻ˣ, 0 ≤ x ≤ 3, N = 20;
(c) f(x) = x³ sin x, 0 ≤ x ≤ π, N = 30;
(d) f(x) = cos(e⁻ˣ), 0 ≤ x ≤ 1, N = 25.
In cases (a), (b), and (c), compare the numerical result with the areas obtained by integration. In these cases, how many subdivisions are required to obtain a numerical result correct to 3 decimal places? In (a), show that over 10 000 steps are required. Why is this?

2. Use a symbolic integration program to obtain the following indefinite integrals:
(a) ∫(ln x)³ dx; (b) ∫sin⁵x cos³x dx; (c) ∫x²eˣ sin x dx; (d) ∫√(1 − x²) dx;
$$\text{(e)} \int \frac{dx}{x(x + 1)(x + 2)(x + 3)}; \qquad \text{(f)} \int \frac{dx}{1 - x^3}.$$
Check each answer by recovering the integrands by differentiation.

3. Evaluate the following definite integrals:
$$\text{(a)} \int_1^2 x(\ln x)^3\,dx; \quad \text{(b)} \int_0^1 \frac{x\,dx}{\sqrt{5 + 4x - 4x^2}}; \quad \text{(c)} \int_0^{1/2} \frac{x^3\,dx}{(1 - x^2)^{5/2}}; \quad \text{(d)} \int_0^1 \sum_{n=0}^{100} \frac{x^n}{n!}\,dx.$$

4. Find
$$I(a) = \int_1^a (\ln x)^3\,dx.$$
Find the limit
$$\lim_{b \to 0} \frac{b\,I(1/b)}{(-\ln b)^3}.$$
How does I(a) behave as a → ∞? Does
$$\int_1^\infty (\ln x)^3\,dx$$
exist?

5. A cylindrical hole of circular cross-section and radius b is drilled through a sphere of radius a, a > b, the axis of the hole passing through the centre of the sphere. Find the volume of the remaining object. Display a diagram of the object for some values of a and b.

Chapter 16

1. Plot the graph of the polar equation r = sin 5θ for 0 ≤ θ ≤ 2π. Find the area enclosed by the five 'petals' of the curve. Show that the area of the 2n + 1 petals of r = sin(2n + 1)θ (n ≥ 1) is independent of n.

2. Devise a program to generate the trapezium rule:
$$\int_a^b f(x)\,dx \approx \frac{b - a}{N}\left[\tfrac12 f(a) + \bigl(f(x_1) + f(x_2) + \dots + f(x_{N-1})\bigr) + \tfrac12 f(b)\right].$$
Apply the program to the integral
$$\int_0^2 e^{-2x} \sin^2 x\,dx,$$
and compare the result with the exact value of the integral. Investigate how many steps are
required to obtain a result accurate to 3 decimal places. Apply the program also to Problem 16.20.

3. A thin plane metal plate consists of an isosceles triangle of height h and base length 2a with a semicircle of radius a attached symmetrically by its diameter to the base of the triangle. Find the location of its centroid on its axis of symmetry.

4. Set up a program to generate Simpson's rule
$$\int_a^b f(x)\,dx \approx \frac{b - a}{3N}\left[f(a) + f(b) + 4\sum_{k=1}^{N/2} f(x_{2k-1}) + 2\sum_{k=1}^{N/2 - 1} f(x_{2k})\right],$$
where N is an even number. Apply the method to $f(x) = e^{-x^2}$, with b = 1, a = 0. Compare results with the trapezium rule above.

Chapter 17

1. Illustrate the substitution method in integration by writing a program to integrate
$$\int \frac{x - 2}{\sqrt{5 + 4x - x^2}}\,dx,$$
using the substitutions x = u + 2, u = 3 sin t. Integrate directly and through the substitutions.

2. Integrate the following, and compare your answers with computer-integrated ones:
$$\text{(a)} \int \frac{x\,dx}{4x^2 + 1}; \quad \text{(b)} \int \tan x\,dx; \quad \text{(c)} \int \cos^4 x\,dx; \quad \text{(d)} \int \frac{x\,dx}{\sqrt{x - 1}}; \quad \text{(e)} \int \frac{\sin^3 x}{\cos x}\,dx.$$

3. Computer-integrate the infinite integrals
$$I_{10} = \int_0^\infty t^{10} e^{-t}\,dt, \qquad I_{11} = \int_0^\infty t^{11} e^{-t}\,dt,$$
and confirm that I₁₁/I₁₀ = 11.

4. Computer-integrate the following infinite integrals:
$$\text{(a)} \int_0^\infty e^{-x} \sin x\,dx; \quad \text{(b)} \int_1^\infty \frac{\ln x}{x^{10}}\,dx; \quad \text{(c)} \int_1^\infty x^3 e^{-ax^2}\,dx.$$

5. Evaluate the integral
$$f(a) = \int_1^a \frac{(\ln x)^6}{x^2}\,dx$$
for a ≥ 1. Find f(10), f(20), and f(∞). The results indicate that f(a) tends to a limit very slowly as a → ∞. Find where
$$g(x) = \frac{(\ln x)^6}{x^2}$$
has a maximum value, and plot the graph y = g(x) for 1 ≤ x ≤ 100.

Chapter 18

1. Solve the differential equation ẋ + x = 0 for the initial conditions (a) x(0) = 0, (b) x(0) = 1, (c) x(0) = 2, and plot the solutions on the same axes for 0 ≤ t ≤ 2.

2. Solve the differential equations
(a) 2ẍ + 3ẋ + x = 0, (b) ẍ + 2ẋ + 2x = 0, (c) ẍ + 2ẋ + x = 0,
for the six sets of initial conditions:
(i) x(0) = 0, ẋ(0) = 1; (ii) x(0) = 0, ẋ(0) = 2; (iii) x(0) = 0, ẋ(0) = 3;
(iv) ẋ(0) = 0, x(0) = 1; (v) ẋ(0) = 0, x(0) = 2; (vi) ẋ(0) = 0, x(0) = 3.
Plot all solutions on the same axes for each differential equation, for 0 ≤ t ≤ 5.

Chapter 19

1. Solve the differential equation 2ẍ + 3ẋ + x = cos t subject to ẋ(0) = 0, x(0) = 1. Plot the solution for 0 ≤ t ≤ 50.

2. Solve the differential equation ẍ + x = cos t subject to x(0) = 0, ẋ(0) = 0. Plot the solution for 0 ≤ t ≤ 20.

Chapter 20

1. Solve the differential equation ẍ + x = 0 subject to the initial conditions x(0) = 1, ẋ(0) = 0. Also solve ẍ + sin x = 0 by a built-in numerical solution method for 0 ≤ t ≤ 10 subject to the same initial conditions. Plot both solutions for 0 ≤ t ≤ 10. Comparing the plotted solutions will indicate by how much the period decreases when the linear approximation is used. Rerun the programs for different amplitudes x(0).

Chapter 21

1. Draw the phasor diagram of the sum of the three phasors of
u(t) = 2 cos 10t, v(t) = cos(10t − ½π), w(t) = 3 cos(10t + ¼π)
(see Example 21.6).

Chapter 22

1. Draw the lineal-element diagram of dy/dx = xy produced by a standard package in the square {0 ≤ x ≤ 1, 0 ≤ y ≤ 1} (see Section 22.1). Compare this with the exact solutions (see Section 22.1) drawn through the points (0, 0.2), (0, 0.4), and (0, 0.6).

2. Repeat the above process for the differential equation dy/dx = x − y of Example 22.1.

3. Design a program for Euler's method (Section 22.2) for the initial-value problem
dy/dx = xy², y(0) = 1
(see Example 22.4) with step length h = 0.2 and five steps. Run the program for the cases h = 0.1 and h = 0.01 and compare the results.

4. Plot numerical solutions for
$$\frac{dy}{dx} = \frac{3y - x}{3x - y}$$
(Example 22.14 and Fig. 22.11) using built-in routines. As with many equations of this type, it is often easier to solve the equivalent simultaneous equations
$$\frac{dx}{dt} = 3x - y, \qquad \frac{dy}{dt} = 3y - x$$
numerically for various initial values of x(0) and y(0).

Chapter 23

1. By splitting the differential equation ẍ + 2x³ = 0 into the system
ẋ = y, ẏ = −2x³,
and plotting four phase paths through the four points
(x(0), y(0)) = (0.3, 0), (0.6, 0), (0.9, 0), (1.2, 0)
respectively, over the interval −1.5 ≤ x ≤ 1.5, show that the solutions appear to be periodic.

2. Plot phase paths for the van der Pol equation
ẍ + 10(x² − 1)ẋ + x = 0
showing the limit cycle. Also show the corresponding (t, x) graph of the periodic solution (the periodic solution has an initial value close to x(0) = 2, ẋ(0) = 0).

Chapter 24

1. Computer algebra systems are quite efficient at finding Laplace transforms of complicated expressions involving standard functions. Test the system with the following transforms:
(a) $\mathcal{L}\{t^8 e^{-t}\}$;
(b) $\mathcal{L}\{t^2 e^{-t} \cos t\}$;
(c) $\mathcal{L}\{d^3x/dt^3\}$;
(d) $\mathcal{L}\{f(t)\}$, where f(t) = 1 if 0 ≤ t ≤ c and f(t) = 0 if t > c;
(e) $\mathcal{L}\{e^t/t^{1/2}\}$;
(f) $\mathcal{L}\{\cosh at\}$.

2. Solve
ẋ + 2x = e⁻ᵗ, x(0) = 3
using a Laplace-transform package, and compare the answer with that of Example 24.12. Plot the input e⁻ᵗ and the output against t for 0 ≤ t ≤ 3.

3. Using a Laplace-transform package, solve the system
ẍ + 2ẋ + x = a cos ωt, x(0) = 0, ẋ(0) = 0.
Plot the input and output functions for a = 1, ω = 1, and 0 ≤ t ≤ 30. Estimate the eventual amplitude of the periodic output.

4. Find the functions whose Laplace transforms are:
$$\text{(a)} \frac{1}{s(s + 1)(s + 2)(s + 3)}; \qquad \text{(b)} \frac{e^{-s}}{(s^2 + 4)(s + 1)}.$$
Plot the functions in each case.

5. Consider the function f(t) = ln t. Show that the Laplace-transform package produces the transform
$$-\frac{1}{s}(\gamma + \ln s),$$
where γ is Euler's constant given by
$$\gamma = \lim_{m \to \infty}\left(\sum_{k=1}^{m} \frac{1}{k} - \ln m\right).$$
Devise a program to calculate Euler's constant. It should give γ = 0.577 215… .

Chapter 25

1. Find the Laplace transform of the solution of
ẍ + ω²x = a δ(t − 1), x(0) = ẋ(0) = 0,
which has impulse input applied at time t = 1. Invert the transform and plot the output for ω = 4, a = 1 (see Example 25.3).
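A minimal Python sketch of Euler's method for Chapter 22, Project 3. The exact solution used for comparison, y = 2/(2 − x²), follows by separating variables in dy/dx = xy² with y(0) = 1; the function names and the choice of printing only the value at x = 1 are illustrative assumptions.

```python
def euler(f, x0, y0, h, steps):
    """Euler's method y_(n+1) = y_n + h f(x_n, y_n); returns the list of (x_n, y_n)."""
    xs, ys = [x0], [y0]
    for n in range(steps):
        x, y = xs[-1], ys[-1]
        ys.append(y + h * f(x, y))
        xs.append(x0 + (n + 1) * h)   # recompute x_n from n to avoid accumulating rounding
    return list(zip(xs, ys))

def f(x, y):
    return x * y * y              # dy/dx = x y^2

def exact(x):
    return 2 / (2 - x * x)        # solution with y(0) = 1

for h in (0.2, 0.1, 0.01):
    steps = round(1 / h)          # integrate as far as x = 1
    x_end, y_end = euler(f, 0.0, 1.0, h, steps)[-1]
    print(f"h = {h}: y({x_end:.2f}) is about {y_end:.5f}; exact value {exact(x_end):.5f}")
```

Five steps of length 0.2 give y(1) ≈ 1.540 against the exact value 2; the error shrinks as h is reduced, roughly in proportion to h once h is small, as expected for a first-order method.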
2. Following the previous project, solve the more complicated problem with two impulses:
2ẍ + 3ẋ + 2x = a δ(t − π) cos t + b δ(t − 2π), x(0) = ẋ(0) = 0.
Plot the output for a = b = 1.

3. Let f(t) = t³, g(t) = cos t. Find the convolution
$$\int_0^t f(t - u)g(u)\,du.$$
Then verify that
$$\mathcal{L}\{f(t)\}\,\mathcal{L}\{g(t)\} = \mathcal{L}\left\{\int_0^t f(t - u)g(u)\,du\right\}.$$

4. A transfer function with a parameter a is given by (Section 25.10)
$$G(z) = \frac{4z^3 - 8z^2 - 2z + 4}{6z^4 - 6z^3 - 2a^2z^2 + 3z^2 + 2a^2z - 2a^2}.$$
Find the locations of the poles of G(z). For what values of a do all poles lie within the unit circle (indicating transient stability)? Plot the poles on an Argand diagram for a = 2.

Chapter 26

1. Consider the period-2 sawtooth function defined over its fundamental interval −1 < t < 1 by f(t) = t. Find its general Fourier coefficient and output its first four terms. Plot and compare the graphs of this truncated series and the sawtooth for −3 ≤ t ≤ 3.

2. Repeat the previous problem but with the function
$$f(t) = \begin{cases} 1 & (0 < t < 1), \\ -1 & (-1 < t < 0). \end{cases}$$
Plot the graphs of f(t) and the first 12 terms of its Fourier series. The graph should show the Gibbs phenomenon, in which the Fourier series approximation overshoots the function at discontinuities. You can try it with (say) 20 terms or more, but you should include more interpolating points in these cases.

3. Find the Fourier coefficients of the 2π-periodic function defined by
f(x) = x⁶ − 5π²x⁴ + 7π⁴x²
on the interval −π ≤ x ≤ π. What is the sum of the series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^6}?$$

Chapter 27

1. Find the Fourier transforms of the following functions:
(a) the top-hat function Π(t);
(b) the one-sided exponential e⁻ᵗH(t);
(c) $e^{-|t|}$;
(d) $e^{-|t-1|}$;
(e) 1/(1 + t²).
Plot the graph of the transform in (e).

2. Find the functions whose Fourier transforms are
(a) $e^{-f^2}$; (b) 1/(4 + f²); (c) 2; (d) 2 cos(f − a).

Chapter 28

1. Plot the saddle surface z = x² − y² in the cylinder x² + y² ≤ 1, using a three-dimensional parametric plot routine with parameters r and u, where
(x, y, z) = (r cos u, r sin u, r² cos 2u).
Also draw a contour plot of the surface in the (x, y) plane on the square −1 ≤ x ≤ 1, −1 ≤ y ≤ 1.

2. Plot the surface z = xy(x² − y²) in the cylinder x² + y² ≤ 1 using the same routine as in Project 28.1 above, but with the parametric equations
(x, y, z) = (r cos u, r sin u, ¼r⁴ sin 4u).
How would you describe this saddle? Draw its contour plot in the square −1 ≤ x ≤ 1, −1 ≤ y ≤ 1.

3. For the function
$$f(x, y) = e^{x^2 y} \sin(xy) + x \ln(x^2 + y^3),$$
verify that
$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}.$$

4. Plot the surface given by z = cos xy over −π ≤ x ≤ π, −½π ≤ y ≤ ½π. Find the partial derivatives at (¼π, 1) and construct the equation of the tangent plane there. Finally plot the surface and its tangent plane.

5. Find the stationary points of
f(x, y) = 0.3x³ + 0.2y² − x²y − xy + 2y
numerically by solving
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0.$$
Plot the contours on the (x, y) plane for −3 ≤ x ≤ 3, −9 ≤ y ≤ 3.
Find the values of the second derivatives at each stationary point and check the second-derivative tests (28.9) at each point.

6. Find the least-squares straight line fit to the points
(0, 1.1), (1, 2), (2, 2.9), (3, 3.9), (4, 4.5), (5, 5.1)
in the (x, y) plane. Plot the data and the least-squares straight line fit. If you are using a built-in routine, check your results against that given by (28.10).

Chapter 29

1. Find the family of curves orthogonal to that of
dy/dx = y e⁻ˣ.
Plot both families of curves for |x| ≤ 2, |y| ≤ 2.

Chapter 30

1. Find where the function
f(x, y) = x³ − 2xy − x + 3y²
is stationary subject to the condition x² + 2y² = 1. Devise a program which uses the Lagrange-multiplier method (30.4). Here is a suggested line of approach. First plot the contours of z = f(x, y) and the curve x² + 2y² = 1. Locate the approximate coordinates of any point of tangency. Then use a built-in root-finding scheme to locate the stationary values. There should be four.

Chapter 31

1. Find the equation of the tangent plane to the surface
x³y + zx + xy²z = −3
at (1, 2, −1).

2. Show graphically the intersection of the cylinder x² + y² = 1 and the plane x + y + z = 1 (Example 31.9).

3. Find the envelope of the family of curves
y(a² − 1 + ax) = x
with parameter a. Plot the envelope and a sample of touching curves in −3 ≤ x ≤ 3.

Chapter 32

1. By repeated integration, evaluate the integral
$$\int_{-1}^{1}\int_0^1 (x + y e^{-xy} + xy)\,dx\,dy,$$
using a symbolic routine. Plot the surface $z = x + y e^{-xy} + xy$ over 0 ≤ x ≤ 1, −1 ≤ y ≤ 1. Interpret the integral as the volume under the surface. Does the integral contain 'negative' volumes under the surface? Plot the positive part of the surface over the same rectangle.

2. Evaluate the repeated integral
$$\int_0^a \int_{-\sqrt{a^2 - y^2}/a}^{\sqrt{a^2 - y^2}/a} x^2 y\,dx\,dy.$$
Plot the region of integration in the (x, y) plane, and then check that the integral has the same value with the order of the integration reversed.

Chapter 33

1. Let
f(x, y, z) = xy î + yz ĵ + (z − y)x k̂.
Find f as a function of t on the line x = t, y = t, z = t. Evaluate the line integral
$$\int \mathbf{f} \cdot d\mathbf{r}$$
on this line between (0, 0, 0) and (1, 1, 1). Repeat the process with the curve x = t², y = t³, z = t⁴, and the same end-points. Plot both paths of integration.

Chapter 34

1. Plot the surfaces defined parametrically by the following position vectors:
(a) r = (3 + cos v) cos u î + (3 + cos v) sin u ĵ + sin v k̂ (see Section 34.3);
(b) r = (1 + a sin(bu)) cos v î + (1 + a sin(bu)) sin v ĵ + u k̂, where a = 0.3 and b = 3.5 (see Section 34.3).

2. Given that
$$\mathbf{f}(x, y, z) = e^{xyz}\,\hat{\imath} + z\cos(xy)\,\hat{\jmath} + (x^2 + y^2)\,\hat{k},$$
find (a) div f; (b) curl f; (c) div curl f; (d) curl curl f at the point (1, 0, −1).

3. Using symbolic computation, test the validity of the following identities:
(a) (F · grad)F = ½ grad(F · F) − F × curl F;
(b) div(F × G) = G · curl F − F · curl G;
(c) curl(F × G) = (G · grad)F − (F · grad)G − G div F + F div G;
(d) div(U grad V − V grad U) = U∇²V − V∇²U;
(e) curl curl F = grad div F − ∇²F.
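For Chapter 33, Project 1, the line integral reduces to an ordinary integral in t once r(t) and dr/dt are substituted. The Python sketch below uses the standard library only, and Simpson's rule is our choice of quadrature rather than part of the project. Substituting by hand gives the integrands 2t² on the line and 2t⁶ − 4t⁸ + 7t⁹ on the curve, so the exact values are 2/3 and 341/630.

```python
def f(x, y, z):
    """The vector field f = xy i + yz j + (z - y)x k."""
    return (x * y, y * z, (z - y) * x)

def line_integral(path, dpath, t0=0.0, t1=1.0, n=200):
    """Approximate the integral of f . dr along r(t), t0 <= t <= t1, by Simpson's rule.

    `path(t)` returns r(t) and `dpath(t)` returns dr/dt; n must be even.
    """
    def integrand(t):
        fx, fy, fz = f(*path(t))
        dx, dy, dz = dpath(t)
        return fx * dx + fy * dy + fz * dz
    h = (t1 - t0) / n
    total = integrand(t0) + integrand(t1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(t0 + k * h)
    return total * h / 3

line = line_integral(lambda t: (t, t, t), lambda t: (1.0, 1.0, 1.0))
curve = line_integral(lambda t: (t**2, t**3, t**4), lambda t: (2 * t, 3 * t**2, 4 * t**3))
print(line, 2 / 3)          # straight line: integrand 2t^2
print(curve, 341 / 630)     # curve: integrand 2t^6 - 4t^8 + 7t^9
```

Since the two values differ for paths with the same end-points, f cannot be the gradient of a scalar field.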
Chapter 35

1. A and B are the sets of integers defined by
A = {2n + 5(−1)ⁿ | n ∈ ℤ⁺, 1 ≤ n ≤ 100},
B = {n² − n + 1 | n ∈ ℤ⁺, 1 ≤ n ≤ 10}.
Produce lists of the elements in A ∪ B and A ∩ B. How many elements does each of these sets have?

2. Let A, B, and C be the following sets:
A = {n(n − 1) | n ∈ ℤ⁺, 2 ≤ n ≤ 100},
B = {|n² − 100n| | n ∈ ℤ⁺, 1 ≤ n ≤ 160},
C = {4n | n ∈ ℤ⁺, 1 ≤ n ≤ 2200}.
Verify the first distributive law
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).
How many elements are there in the set A ∩ (B ∪ A)?

Chapter 36

1. Design programs to generate the truth tables for the or gate, the and gate, the not gate, the nand gate, and the nor gate.

2. Design a program to simulate the truth table in Example 36.3, which has the output
f = 15452 ⊕ b ⊕ c
for inputs a, b, and c.

Chapter 37

1. In the cutset method applied to the circuit in Fig. 37.23, the currents i₁, i₂, i₃, i₄, i₅ and the voltages $v_a, v_b, v_c, v_d$ satisfy the nine equations
$$\begin{aligned} &i_1 - i_3 + i_2 = 0, && i_X - i_3 + i_2 = 0, \\ &-i_Y + i_5 - i_3 + i_2 = 0, && -i_Y + i_4 + i_2 = 0, \\ &i_1 = (v_a - v_b)/R_1, && i_2 = (v_c - v_b)/R_2, \\ &i_3 = (v_b - v_d)/R_3, && i_4 = (v_c - v_d)/R_4, \\ &i_5 = v_d/R_5, \end{aligned}$$
where $i_X$ = 2 A, $i_Y$ = 2 A, and R₁ = ½ Ω, R₂ = 3 Ω, R₃ = 1 Ω, R₄ = 2 Ω, R₅ = 2 Ω. Solve this set of linear equations for the currents and voltages.

2. Draw the labelled drawings of the bipartite graphs K₅,₆ and K₆,₆. Answer the following for each graph by the built-in diagnostic test.
(a) How many edges has each graph?
(b) Is the graph eulerian? If it is, list an eulerian walk.
(c) Is it hamiltonian? If it is, list a hamiltonian cycle.

3. Check the complete graphs Kₙ, 2 ≤ n ≤ 7, and the bipartite graphs $K_{i,j}$ (2 ≤ i ≤ 5; i ≤ j ≤ 6) for planarity, using a built-in diagnostic test.

Chapter 38

1. Rework Example 38.2 using a symbolic package for solving difference equations. Solve the mortgage difference equation
$$Q_m - (1 + I)Q_{m-1} = -A,$$
with I = 0.08 and Q₀ = P = 50 000 (in £). Given that Q₂₅ = 0, find A. List the outstanding debt $Q_m$ each year m to the nearest £. Plot (a) the outstanding debt against years and (b) the annual interest repayments $A - IQ_m$ against years.

2. Solve the following homogeneous difference equations:
(a) $u_{n+2} - u_{n+1} - 12u_n = 0$;
(b) $u_{n+2} + 2u_{n+1} + 2u_n = 0$;
(c) $u_{n+2} + 4u_{n+1} + 4u_n = 0$;
(d) $u_{n+3} + 3u_{n+2} + 3u_{n+1} + u_n = 0$, u₀ = 0, u₁ = 1, u₂ = −1.

3. Solve the following inhomogeneous difference equations:
(a) $u_{n+2} - u_{n+1} - 12u_n = 2 + n + n^2$;
(b) $u_{n+2} - u_{n+1} + 4u_n = 2n$;
(c) $u_{n+3} + 3u_{n+2} + 3u_{n+1} + u_n = n^2$, u₀ = 0, u₁ = 1, u₂ = −1.

4. Devise a program to generate cobweb plots for the first-order difference equation
$$u_{n+1} = -ku_n + k$$
for (a) k = ½, (b) k = 3/2, (c) k = 1, with initial value u₀ = ¾ in each case (see Example 38.3).

5. Display cobweb plots for the logistic difference equation
$$u_{n+1} = \alpha u_n(1 - u_n)$$
for selected values of α. Some suggested values are:
(a) α = 2.8, to show a stable fixed point;
(b) α = 3.4: find the period-2 solution;
(c) α = 3.5: find the period-4 solution;
(d) α = 3.7: chaotic output;
(e) α = 3.83: you should be able to locate a stable period-3 solution.

6. Design a program to generate the period-doubling display shown in Fig. 38.11 for the logistic equation $u_{n+1} = \alpha u_n(1 - u_n)$ for α increasing from α = 2.8 to α = 4.

Chapter 39

1. (See Example 39.8.) A box contains 40 balls of which 7 are red, 12 are white, and 21 are black.
930
In each of the cases n = 2, 3, 4, 5, 6, 7, n balls Chapter 41
are drawn at random from the box without
APPLICATIONS PROJECTS USING SYMBOLIC COMPUTING

1. Devise a program to draw comparative


replacement. What is the total number of n-ball box plots for the examination data given
selections which can be made? What is the in Problem 41.2.
probability that there are n (n = 2, 3, 4, 5, 6, 7)
balls of the same colour? Show the probabilities 2. Produce a histogram and frequency polygon
graphically in a bar chart. for the pipe length data given in the table
accompanying Problem 41.4.
Chapter 40 3. Some randomized points (xi, yi ) are generated
1. List the probabilities of the binomial by the Mathematica command
distribution for n = 12 and p = 0.7. Check that Table[{x+0.2*Random[],x+2+
their sum is 1. Plot this discrete distribution as 1.2*Random[]},{x,0,6,0.5}].
a bar chart.
Find the regression lines of y on x, and of
2. Plot graphs of the probability density function x on y, for the data. Plot the data and both
(pdf ) and the cumulative distribution function regression lines. Also find the mass centre of
(cdf ) for the standardized normal distribution the data, and add this point to the graph.
N(0, 1). Where does the mass centre lie in relation
to the regression lines?
3. Model a sequence of n Bernoulli trials with
success /failure equally likely, in which the 4. Two dice are rolled and the average scores
number of successes is recorded. You could try recorded. Compute the probabilities of the
n = 50 run 500 times and count the number of possible average scores, and plot them in a bar
successes i for i = 0, 1, 2, … , n. This should chart. Repeat the program for four and six dice.
approximate to the binomial distribution Plot bar charts in each case to illustrate the
i n−i
nCi p q . Plot this distribution and compare development normal distribution predicted by
it with the simulation. the central limit theorem.
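For Project 1 of Chapter 37, the nine equations form a linear system in the unknowns $(i_1, \ldots, i_5, v_a, \ldots, v_d)$. The projects are written with Mathematica in mind. What follows is a minimal Python sketch instead: it assumes the equations exactly as listed and solves them by Gauss-Jordan elimination in exact rational arithmetic. The function names `solve_linear` and `solve_circuit` are illustrative only.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # First row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]

def solve_circuit(iX=2, iY=2, R=(Fraction(1, 2), 3, 1, 2, 2)):
    """Set up the nine equations of Project 1 and solve them."""
    R1, R2, R3, R4, R5 = R
    names = ["i1", "i2", "i3", "i4", "i5", "va", "vb", "vc", "vd"]
    #     i1  i2  i3  i4  i5  va  vb  vc  vd
    a = [
        [1,   1, -1,  0,  0,  0,  0,  0,  0],   # i1 - i3 + i2 = 0
        [0,   1, -1,  0,  0,  0,  0,  0,  0],   # iX - i3 + i2 = 0
        [0,   1, -1,  0,  1,  0,  0,  0,  0],   # -iY + i5 - i3 + i2 = 0
        [0,   1,  0,  1,  0,  0,  0,  0,  0],   # -iY + i4 + i2 = 0
        [R1,  0,  0,  0,  0, -1,  1,  0,  0],   # R1 i1 = va - vb
        [0,  R2,  0,  0,  0,  0,  1, -1,  0],   # R2 i2 = vc - vb
        [0,   0, R3,  0,  0,  0, -1,  0,  1],   # R3 i3 = vb - vd
        [0,   0,  0, R4,  0,  0,  0, -1,  1],   # R4 i4 = vc - vd
        [0,   0,  0,  0, R5,  0,  0,  0, -1],   # R5 i5 = vd
    ]
    b = [0, -iX, iY, iY, 0, 0, 0, 0, 0]
    return dict(zip(names, solve_linear(a, b)))

for name, value in solve_circuit().items():
    print(f"{name} = {value}")
```

With the listed data this gives $i_1 = 2$, $i_2 = \tfrac13$, $i_3 = \tfrac73$, $i_4 = \tfrac53$, $i_5 = 4$ (amperes) and $v_a = v_c = \tfrac{34}{3}$, $v_b = \tfrac{31}{3}$, $v_d = 8$ (volts), provided the equations have been transcribed correctly. Exact fractions are used so that the result can be checked by hand.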
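Projects 4 to 6 all iterate a first-order difference equation $u_{n+1} = f(u_n)$. The following is a plotting-library-free Python sketch, not the book's own program. It computes the cobweb vertices, estimates the period of the attracting cycle of the logistic map, and generates the $(\alpha, u)$ points of a period-doubling diagram. The helper names are hypothetical, and the transient lengths and tolerance are arbitrary choices.

```python
def iterate(f, u0, n):
    """Return [u0, u1, ..., un] for u_{k+1} = f(u_k)."""
    us = [u0]
    for _ in range(n):
        us.append(f(us[-1]))
    return us

def cobweb_path(f, u0, n):
    """Vertices of the cobweb polyline, starting on the axis at (u0, 0).

    Each step goes vertically to the curve y = f(x), then horizontally to the
    line y = x. Plot these points with y = f(x) and y = x to get the cobweb.
    """
    us = iterate(f, u0, n)
    pts = [(us[0], 0.0)]
    for a, b in zip(us, us[1:]):
        pts.append((a, b))  # vertical segment to the curve
        pts.append((b, b))  # horizontal segment to the diagonal
    return pts

def linear(k):
    """u_{n+1} = -k u_n + k (Project 4)."""
    return lambda u: -k * u + k

def logistic(alpha):
    """u_{n+1} = alpha u_n (1 - u_n) (Projects 5 and 6)."""
    return lambda u: alpha * u * (1.0 - u)

def attracting_period(alpha, transient=20000, max_period=64, tol=1e-9):
    """Smallest p with f^p(u) close to u after a long transient, else None.

    Starts from u0 = 0.5, the critical point of the logistic map, which is
    attracted to a stable cycle whenever the map has one.
    """
    f = logistic(alpha)
    u = 0.5
    for _ in range(transient):
        u = f(u)
    v = u
    for p in range(1, max_period + 1):
        v = f(v)
        if abs(v - u) < tol:
            return p
    return None

def bifurcation_points(a_min=2.8, a_max=4.0, steps=600, transient=1000, keep=100, u0=0.3):
    """(alpha, u) pairs for a period-doubling diagram; plot them as dots."""
    pts = []
    for i in range(steps + 1):
        alpha = a_min + (a_max - a_min) * i / steps
        f = logistic(alpha)
        u = u0
        for _ in range(transient):
            u = f(u)
        for _ in range(keep):
            u = f(u)
            pts.append((alpha, u))
    return pts

for alpha in (2.8, 3.4, 3.5, 3.7, 3.83):
    print(alpha, attracting_period(alpha))
```

For $k = 1$ the linear equation simply alternates between $\tfrac34$ and $\tfrac14$. For $k = \tfrac12$ it converges to the fixed point $\tfrac13$, and for $k = \tfrac32$ it diverges. For $\alpha = 3.7$ one would expect `attracting_period` to find no short cycle, which is consistent with chaotic output.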
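For Chapter 39 Project 1, the number of $n$-ball selections is ${}^{40}C_n$. The probability that all $n$ balls share a colour is $({}^{7}C_n + {}^{12}C_n + {}^{21}C_n)/{}^{40}C_n$. The Python sketch below evaluates this exactly and draws a crude text bar chart in place of a graphical one. The scaling of the bars is an arbitrary choice.

```python
from fractions import Fraction
from math import comb

# Box contents from Project 1 of Chapter 39.
COLOURS = {"red": 7, "white": 12, "black": 21}
TOTAL = sum(COLOURS.values())  # 40 balls

def selections(n):
    """Number of different n-ball selections without replacement (order ignored)."""
    return comb(TOTAL, n)

def p_same_colour(n):
    """Exact probability that all n balls drawn have the same colour."""
    # comb(k, n) is 0 when n > k, so a colour with too few balls contributes nothing.
    favourable = sum(comb(k, n) for k in COLOURS.values())
    return Fraction(favourable, selections(n))

for n in range(2, 8):
    p = p_same_colour(n)
    bar = "#" * round(200 * p)  # text bar chart: one '#' per 0.005
    print(f"n={n}  selections={selections(n):>8}  P={float(p):.5f}  {bar}")
```

For $n = 2$, for instance, there are 780 selections, of which $21 + 66 + 210 = 297$ are single-coloured.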
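Projects 1 and 3 of Chapter 40 can share one binomial routine. The Python sketch below uses exact fractions for $n = 12$, $p = 0.7$, so that the check $\sum_i {}^{12}C_i\,p^i q^{12-i} = 1$ holds exactly. It then compares a seeded simulation of 500 runs of 50 fair Bernoulli trials with the expected counts $500\,{}^{50}C_i(\tfrac12)^{50}$. The function names and the seed are illustrative assumptions.

```python
import random
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """[P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p); exact if p is a Fraction."""
    q = 1 - p
    return [comb(n, i) * p**i * q**(n - i) for i in range(n + 1)]

def simulate_successes(n, runs, p=0.5, seed=None):
    """counts[i] = number of runs (each of n Bernoulli trials) with i successes."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(runs):
        successes = sum(rng.random() < p for _ in range(n))
        counts[successes] += 1
    return counts

# Project 1: n = 12, p = 0.7, in exact arithmetic so the sum check is exact.
probs = binomial_pmf(12, Fraction(7, 10))
print("sum of probabilities:", sum(probs))
for i, pr in enumerate(probs):
    print(f"{i:2d} {float(pr):.4f} " + "#" * round(100 * pr))

# Project 3: 500 runs of n = 50 fair trials, against 500 * 50Ci (1/2)^50.
n, runs = 50, 500
observed = simulate_successes(n, runs, seed=1)
expected = [runs * pr for pr in binomial_pmf(n, Fraction(1, 2))]
for i in range(15, 36):
    print(f"{i:2d} observed {observed[i]:3d} expected {float(expected[i]):6.1f}")
```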
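For Chapter 41 Project 3, both regression lines can be written as "deviation from the mean = coefficient × deviation from the other mean". This shows that each passes through the mass centre $(\bar x, \bar y)$, so the mass centre is their point of intersection. The Python sketch below assumes that uniform numbers from `random.random()` are an acceptable stand-in for Mathematica's `Random[]`. The function names and seed are hypothetical.

```python
import random

def make_points(seed=None):
    """Python analogue of Table[{x+0.2*Random[],x+2+1.2*Random[]},{x,0,6,0.5}]."""
    rng = random.Random(seed)
    pts = []
    for k in range(13):  # x = 0, 0.5, ..., 6
        x = 0.5 * k
        pts.append((x + 0.2 * rng.random(), x + 2 + 1.2 * rng.random()))
    return pts

def regression_lines(pts):
    """Lines y = m1 x + c1 (y on x) and x = m2 y + c2 (x on y), and the mass centre."""
    n = len(pts)
    xbar = sum(x for x, _ in pts) / n
    ybar = sum(y for _, y in pts) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    syy = sum((y - ybar) ** 2 for _, y in pts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    m1 = sxy / sxx  # regression coefficient of y on x
    m2 = sxy / syy  # regression coefficient of x on y
    y_on_x = (m1, ybar - m1 * xbar)
    x_on_y = (m2, xbar - m2 * ybar)
    return y_on_x, x_on_y, (xbar, ybar)

pts = make_points(seed=2)
(m1, c1), (m2, c2), (xc, yc) = regression_lines(pts)
print(f"y on x: y = {m1:.4f} x + {c1:.4f}")
print(f"x on y: x = {m2:.4f} y + {c2:.4f}")
print(f"mass centre: ({xc:.4f}, {yc:.4f})")
# Both residuals are zero up to rounding: the centre lies on both lines.
print("residuals at centre:", m1 * xc + c1 - yc, m2 * yc + c2 - xc)
```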
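For Chapter 41 Project 4, the exact distribution of the average of $d$ fair dice comes from repeatedly convolving the single-die distribution. It does not require simulation. The Python sketch below builds the distribution with a counter of ways to reach each total and prints text bar charts for two, four and six dice. The bar scaling is arbitrary.

```python
from collections import Counter
from fractions import Fraction

def average_distribution(dice):
    """Exact distribution {average score: probability} for `dice` fair six-sided dice."""
    totals = Counter({0: 1})  # number of ways to reach each total so far
    for _ in range(dice):
        nxt = Counter()
        for total, ways in totals.items():
            for face in range(1, 7):
                nxt[total + face] += ways
        totals = nxt
    outcomes = 6 ** dice
    return {Fraction(t, dice): Fraction(w, outcomes) for t, w in sorted(totals.items())}

for dice in (2, 4, 6):
    print(f"{dice} dice")
    for avg, p in average_distribution(dice).items():
        # Text bar chart: the profile narrows and becomes more bell-shaped as dice increase.
        print(f"  {float(avg):5.2f} {float(p):.4f} " + "#" * round(300 * p))
```

With two dice, for example, the most likely average is $3.5$, with probability $\tfrac{6}{36} = \tfrac16$.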
42
Self-tests: Selected answers

Chapter 1
1.1 $2 < x < 3$.
1.2 $AB = BC = \sqrt{26}$, $AC = \sqrt{52}$; the angle at $B$ is a right angle.
1.3 The circles intersect at the points $(0, 1)$ and $(2, -1)$.
1.4 The graph is $x = t$ for $|t| < 1$ and $x = 0$ for $|t| > 1$.
1.5 $\cos\frac{1}{12}\pi = \dfrac{1 + \sqrt3}{2\sqrt2}$, $\sin\frac{1}{12}\pi = \dfrac{\sqrt3 - 1}{2\sqrt2}$.
1.6 $\sin(2\arctan x) = 2x/(1 + x^2)$.
1.7 $r = 1 + \cos\theta$, which is a cardioid.
1.8 $y = 1/(1 - x^2e^{-x})$.
1.9 Time $T = (\ln 10)/k$.
1.11 (b) $f(x) = \dfrac{a}{(a - b)^2(x - a)} - \dfrac{b}{(a - b)(x - b)^2} - \dfrac{a}{(a - b)^2(x - b)}$; (c) $f(x) = \dfrac{a}{(x - a)^3} + \dfrac{1}{(x - a)^2}$.
1.12 Sum to infinity is $\dfrac{3}{(1 - x)^2} - \dfrac{2}{1 - x}$.
1.13 (a) 5040; (b) 9990.
1.14 $2[1 + {}^{2n}C_2x^2 + {}^{2n}C_4x^4 + \cdots + {}^{2n}C_{2n}x^{2n}]$.

Chapter 2
2.1 Tangent: $y = -2x + 2$; normal: $y = -4x + \frac{19}{8}$; intersection point $(\frac{3}{16}, \frac{13}{8})$.
2.2 $dV/dr = 4\pi r^2$.
2.3 $dy/dx = 70(x^6 + x^9)$.
2.4 (a) 2; (b) 2; (c) 3.
2.5 $d(\cosh x)/dx = \sinh x$; $d(\sinh x)/dx = \cosh x$.
2.6 $(2r)!\,x^r/r!$.

Chapter 3
3.1 $dy/dx = e^x(\sin x + \cos x)$.
3.2 $\dfrac{dy}{dx} = \dfrac{1 + x^2 - 2x^2\ln x}{x(1 + x^2)^2}$.
3.3 $dy/dx = 1728e^{12x}(1 + 12e^{12x})^{11}$.
3.4 $dy/dx = a^{kx}\ln a$.
3.5 $dy/dx = 2xe^{x^2}\cos(e^{x^2})$.
3.6 $dy/dx = \frac12 x^{-1/2}\cos x[\cos x(2 + \ln x) - 4x\sin x\ln x]$.
3.7 $dy/dx = (x - 3)/[3x(1 + 2y^2)]$. At $(1, 1)$, $dy/dx = -\frac29$.
3.8 $dy/dx = 2/(1 - 4x^2)$.
3.9 $dy/dx = -(b/a)\cot t$. Tangents with slope $-1$ occur at $(a^2, b^2)/\sqrt{a^2 + b^2}$ and $(-a^2, -b^2)/\sqrt{a^2 + b^2}$.

Chapter 4
4.1 (a) $f'(x) = e^x[\cos(x^2) - 2x\sin(x^2)]$; (b) $f'(x^2) = e^{x^2}[\cos(x^4) - 2x^2\sin(x^4)]$; (c) $df(x^2)/dx = 2xe^{x^2}[\cos(x^4) - 2x^2\sin(x^4)]$.
4.2 $x = 1$ is a maximum, and $x = 2$ is a minimum.
4.3 $x = 0$ is a minimum, and $x = 1$ is a point of inflection (using a slope test).
4.4 The area change is $\delta A = 8\pi r\,\delta r = 5.027$; the exact change is 5.152.
4.5 Solution is $x = 0.7686$ to four decimal places, requiring three steps.

Chapter 5
5.1 $1 + 2x + 2x^2 + \frac43x^3 + \frac23x^4 + \frac{4}{15}x^5$.
5.2 Required accuracy needs terms as far as $x^5$.
5.3 $1 - \frac12x - \frac18x^2 + \frac{13}{48}x^3$.
5.4 $\displaystyle\sum_{n=0}^{\infty}(-1)^n\frac{(x - \frac12\pi)^{2n}}{(2n)!}$.
5.5 $-2$.

Chapter 6
6.1 (a) $4 + 3i$; (b) $i$; (c) $2i$.
6.2 $z = 1 + i$, $\bar z = 1 - i$, $z^2 = 2i$, $\bar z^2 = -2i$, $2z = 2 + 2i$, $z\bar z = 2$.
π π
6.3 z = 2(cos 5π
3 + i sin 3 ), Z = 2(cos 3 + i sin 3 ),

Chapter 10
2z = 4(cos 5π3 + i sin 5π3 ), z2 = 2√7(cos θ +i sin θ ),

where cos θ = 2/√7, sin θ = −√(3/7). 10.4 (b) − 45°.

6.4 z10 = 32i. 10.5 (b) (1/√2, 1/√2, 0).

6.5 cos 4θ = 8 cos4θ − 8 cos2θ + 1. 10.6 Angle is arccos (− 16 ).

6.6 z = 2nπi, z = ln(2 ± √3) + 2nπi, (n = 0, ±1, ±2, … ). 10.7 (b) Perpendicular distances are 1/√14, 4/√14.
6.7 S(θ ) = cos(cos θ ) cosh(sin θ ). x− –1 y − –51 z
10.8 (b) Line is 5
= = 1.
–1
5 − –51 − –5
Chapter 7
7.1 In full the matrix is Chapter 11
G −1 1 −1 J 11.1 |c | = √26.
H −2 4 −8 K .
I −3 9 −27 L 11.4 (a) −7î + 6q + x.

G 4 −1 −1 J
G1 0J
7.2 AB =
I0 , BA = H −1 −2 7K . Chapter 12
6L I −2 −1 5L
12.1 Solution is x1 = 2, x2 = −2, x3 = −3.
7.3 2A + 3B, A2, AB + BA are symmetric: AB and BA
are not symmetric. G −2 −1 −5 −2J
H 5 2 9 4K
7.4 A4 = abcdI4, so that A−1 = A3/(abcd). 12.2 A−1 = .
H 7 3 13 6K
I −8 −3 −15 −7 L
Chapter 8
12.3
8.1 det A = 2(k − 1)2; k = 1.
(a) If a ≠ −3/2, the system has the unique solution
8.2 Dn = (x − a)n−1(x + na).
x = (a + b)/(3 + 2a),
8.3 The adjoint and inverse are given by y = (−3 + 2b)/(3 + 2a),
G −3 −k − 2 −2k + 2 J z = (a + b)/(3 + 2a).
adjA = H −4 −1 6 K (b) If a = −3/2 and b ≠ 3/2, the system has no
I −1 k + 1 −2k − 1 L solutions.
G3 k+2 2k − 2 J (c) If a = −3/2 and b = 3/2, then the system has the set
A =
−1 1 H
4 1 −6 K of solutions x = λ, y = −1 + 2λ, z = λ.
4k+5 I
1 −k − 1 2k + 1 L
The matrix is singular if k = − 54. The product A adj(A) Chapter 13
will always be zero for a singular matrix.
13.1 Eigenvalues: −1, 1 − √2, 1 + √2.
Eigenvectors (−1, 2, 2)T, (−1 + (1/√2), 1/√2, 1)T,
Chapter 9 −1 − (1/√2), −1/√2, 1)T.
9.1 A_D = (29, 35); direction is 0.878… rads to x 13.2 k ≠ ±1.
direction.
13.3 Eigenvalues are −2, 1, 3. The corresponding
9.3 Relative speed = 86.02 km/hr; direction is 35.5° E
eigenvectors are
of S.
9.4 Plane is x − 1 = λ − 2µ, y + 1 = λ, z − 2 = 3λ − 5µ.
(−2, 1, 2)T, (2, −1, 1)T, (3, 1, 2)T.

9.5 The point of intersection is (−1, −2p/(1 − p), A possible matrix C is given by
(1 + p)/(1 − p)); the locus is the straight line x = −1, G −2 2 3J
y + z = 1. C = H 1 −1 1 K .
9.6 ] = −ω 2 r. I 2 1 2L
13.4 17.6 (x + 3) + ln| x + 1| + ln| x + 3| + C.
9
2
−1 1
4
3
4



G 8 −8 −12 J G 8 4 8J xn+1 A 1 D
1 (−1)n H 17.7 ln x − , (n ≠ −1); 12(ln x)2, (n = −1).
A = H 20 −20 −30 K +
n
−10 −5 10K n+1C n + 1F
15 I 15 I
−18 18 27 L 12 6 12 L
xα+1
G −15 6 10 J 17.8 [2 − 2(α + 1) ln x + (α + 1)2(ln x)2].
2n (α + 1)3
+ H −15 15 10 K
15 I
15 −6 15 L
Chapter 18
13.5 The eigenvalues are −8, 3, 5, and the
eigenvectors are (−1, −3, 1)T, (−3, 2, 3)T, (1, 0, 1)T. 18.1 x = e10t−20.

18.2 (a) x = Aet + Beat; (b) x = (A + Bt)et.


Chapter 14 18.3 x = e2t (A cos 3t + B sin 3t).
14.1 x = − cos 3t + 4t.
2
3
2
3

14.2 (a) 13 e3x − cos 2x + C; (b) −3x−1 + C; Chapter 19


(c) 4 ln |x | + C.
19.1 (a) x = 29 e−2t; (b) x = 25 cos 2t − 15 sin 2t;
14.3 (a) − 12 e−x ; (b) esinx.
2
(c) x = t 3 + t 2 − 43 t − 109 .
14.4 Signed area = e − e−1 − 2; geometrical area 19.2 The complex solution is x = −e−t+it/(2 + i).
= e + e−1 − 2. (a) x = e−t(− 25 cos t − 15 sin t);
(b) x = e−t(15 cos t − 25 sin t).
Chapter 15 19.3 A particular solution is x = − 12 t e−t.
15.1 Approximate area = 19
25 ; exact area = 3 .
2
19.4 x = 38 (−e−t + e3t) − 12 t e−t.
15.2 (a) 13 (b3 − a3); (b) (ex − ex )/x.
3 2
19.5 x = − 12 (cos t + sin t)e−t−cos t + C e−cos t.
15.3 1
18 sin18 x + C.
15.4 rms[f(t)] = a/√2. Chapter 20
15.5 1
. 20.1 x(t) = 2 cos(3t + 16 π).
2

15.6 1/(1 + b2). 20.2 The amplitude of the superimposed waves is


2C cos 12(φ1 − φ2). Cancellation occurs if φ1 − φ2 = π.
15.7 sinh3 x cosh2 x is an odd function; cos3t is odd
about t = 12 π. 20.3 x = (A + Bt) e−kt.

15.8 I(x) = 2xex cos(x2) − ex cos x. 20.4 The resonant phase occurs at the polar angle
2

given by (2k2, −2k√(ω 20 − 2k2)).

Chapter 16 20.5 Nodes occur at z = [(2n + 1)π + (φ1 − φ2)]/2k.

16.1 Volume = 28
15 π.
Chapter 21
16.2 Area = 1
12 π.
21.1 X = 2 e –2 πi.
1

16.3 Y = 2∫ 0 y√(1 − y)dy/∫ −1(1 − x2) dx.


1 1

21.2 X = −0.1319 − 0.00141i.

Chapter 17 21.3 p(t) = √(14 + 4√2) cos(5t + φ) where φ is the


polar angle of (2√2, 2 + √2).
17.1 1
2 x− 1
12 sin(6x + 8) + C.
17.2 − 12 cos(x2) + C.
Chapter 22
17.3 I1 = − 14 cos4 x + C, I2 = − 12
1
cos4(3x + 2) + C.
22.1 The isoclines are given by the hyperbolas
17.4 I1 = ln 2, I2 = 12 (ln 2)2. x2 − y2 = constant.
17.5 (a) 4 − 2 ln 3; (b) π. 22.2 General solution is x2(x2 − 2y2) = constant.
22.3 General solution is $x^2y + xy^2 + \sin xy = \text{constant}$.
22.4 General solution is $xy^2 = C(y - x)^2$, where $C$ is a constant.

Chapter 23
23.1 The origin is the only equilibrium point. The equation of the phase paths is $y^2 = \frac12x^4 + C$, where $C$ is a constant.
23.2 For $c < 0$, the origin is a saddle; for $0 < c < \frac14$, the origin is a stable node; and for $c > \frac14$, the origin is a spiral.
23.3 Equilibrium points are at $(0, 0)$, $(1, 1)$, $(-1, -1)$, $(1, -1)$, $(-1, 1)$. Solutions are $x = \pm1$, $y = \pm1$.
23.4 The origin is a centre; the points $(1, 1)$, $(-1, -1)$, $(1, -1)$, $(-1, 1)$ are all saddle points.
23.5 Since K  0 for r ≠ 1, the limit cycle is unstable.

Chapter 24
24.1 (a) $2/(s^2 + 4)$.
24.2 (b) $2(1 - e^{-1/2}e^{-s})/(2s + 1)$.
24.3 $\mathcal{L}\{t^3e^{-kt}\} = 6/(s + k)^4$.
24.5 $(s^2 - 2s - 6)X(s) - 2s + 5$.
24.6 $x(t) = \frac12(e^t + e^{3t})$.
24.8 $\mathcal{L}\{(e^{-t} - 1)/t\} = \ln[s/(s + 1)]$.

Chapter 25
25.1 $i(t) = (K/L)\cos(t/\sqrt{LC})$.
25.2 $x(t) = -e^t + 2e^{2t}$.

Chapter 26
26.1 The Fourier coefficients are $a_0 = 8\pi^2/3$, $a_n = 4/n^2$, $b_n = -4\pi/n$ $(n = 1, 2, \ldots)$.
26.2 $\frac18\pi^2$.
26.3 Sine series is $\displaystyle\sum_{n=1}^{\infty}\frac{8n}{\pi(4n^2 - 1)}\sin 2nt$.

Chapter 27
27.1 $2/(1 + 4\pi^2f^2)$.

Chapter 28
28.1 (a) $\partial f/\partial x = -2y\cos(xy)\sin(xy)$, $\partial f/\partial y = -2x\cos(xy)\sin(xy)$; (b) $\partial f/\partial x = -2x\sin(x^2 - y^2)$, $\partial f/\partial y = 2y\sin(x^2 - y^2)$; (c) $\partial f/\partial x = (xy)^x[1 + \ln(xy)]$, $\partial f/\partial y = x^2(xy)^{x-1}$.
28.3 Tangent planes are given by $\pm2x \pm 2y - z = 2$. The tangent planes intersect the $x, y$ plane in a square.
28.4 For maximum volume $a = 2\sqrt{A/(3\sqrt3)}$.
28.5 $a = \dfrac{6[2\sum ny_n - (N + 1)\sum y_n]}{N(N^2 - 1)}$, $b = \dfrac{2[-3\sum ny_n + (2N + 1)\sum y_n]}{N(N - 1)}$, where all summations are from 1 to $N$.
28.6 $K(\alpha) = 3\pi/(16\alpha^5)$.

Chapter 29
29.1 At $(3, 4)$, $\delta z = \frac35\delta x + \frac45\delta y$. The approximate change is $-0.02$.
29.2 Percentage increase in volume is approximately 9%.
29.3 In terms of $x$, the rate can be expressed as $\dfrac{dz}{ds} = \sqrt2(x - 2)e^{-x^2 - 2(1 - x)^2}$ for $x$ between 0 and 1.
29.4 $dy/dx = -(x + y)/(x + 4y)$. The maximum occurs at $(-2/\sqrt3, 2/\sqrt3)$ and the minimum at $(2/\sqrt3, -2/\sqrt3)$.
29.5 The direction of the normal is $(\frac14\sqrt{17} - 1, 2\sqrt{17})$.
29.6 $df/ds = 2\sqrt5$.

Chapter 30
30.1 $dz/dt = -3\sin t(\sin^2t - 3\cos^2t)$. Stationary at $t = 0, \frac13\pi, \frac23\pi, \pi, \frac43\pi, \frac53\pi$.
30.2 Stationary points are at $(1, 1)$, $(-1, -1)$, $(1, -1)$, $(-1, 1)$.
30.3 The families of curves are confocal ellipses and hyperbolas.

Chapter 31
31.1 Maximum error = 0.261 units for an area $A = 1.5$ units.
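The formulas in Self-test 28.5 can be read as the slope $a$ and intercept $b$ of the least-squares line $y = an + b$ through the points $(n, y_n)$, $n = 1, \ldots, N$. That reading of the underlying problem is an assumption here, but algebraically the formulas do reduce to those values. The Python sketch below evaluates them and compares the result with the usual normal-equation solution, using exact fractions so that agreement is exact. The data values are made up for illustration.

```python
from fractions import Fraction

def fit_selftest_formulas(ys):
    """a and b from Self-test 28.5 for data y_1, ..., y_N (sums over n = 1..N)."""
    N = len(ys)
    s_y = sum(ys)
    s_ny = sum(n * y for n, y in enumerate(ys, start=1))
    a = 6 * (2 * s_ny - (N + 1) * s_y) / (N * (N**2 - 1))
    b = 2 * (-3 * s_ny + (2 * N + 1) * s_y) / (N * (N - 1))
    return a, b

def fit_normal_equations(ys):
    """Slope and intercept of the least-squares line y = a n + b through (n, y_n)."""
    N = len(ys)
    ns = range(1, N + 1)
    s_n, s_nn = sum(ns), sum(n * n for n in ns)
    s_y = sum(ys)
    s_ny = sum(n * y for n, y in zip(ns, ys))
    det = N * s_nn - s_n**2
    return (N * s_ny - s_n * s_y) / det, (s_nn * s_y - s_n * s_ny) / det

ys = [Fraction(v) for v in (3, 5, 4, 8, 9)]  # illustrative data only
print(fit_selftest_formulas(ys), fit_normal_equations(ys))
```

Both functions give $a = \tfrac32$, $b = \tfrac{13}{10}$ for these data. Data lying exactly on a line, such as $y_n = 2n + 1$, are recovered exactly.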
31.3 The point is (2, 2, 1)/√10. Chapter 36



31.4 The tangent plane is 5x + 2y + 3z = 10. 36.1 The output is a * b which has the truth table:

31.5 The directional derivative (−2, −2, 1).


a b a*b
31.6 Restricted stationary values occur at (1, 1, 2),
0 0 1
(−1, −1, −2), (1, −2, −1), (−2, 1, −1). 0 1 1
31.7 The envelope is the parabola y2 = x + 14 . 1 0 1
1 1 0

Chapter 32
36.2 f = (a * b)  C. The truth table is
32.1 I = J = 28.
a b c f
32.2 I = 12 π.
0 0 0 0
32.3 Volume = 152/3. 0 1 0 0
32.4 I = e−1. 0 1 1 1
1 0 0 0
32.5 The moment of inertia is 13 M(a2 + b2), where M 0 0 1 1
is the mass of the plate. 1 0 1 1
32.6 The volume is V = 12 π. 1 1 0 0
1 1 1 0
32.7 Both areas = 12
1
.
36.3 f = (A * B)  (a * B)  (A * b): this problem and
Chapter 33 Self-test 36.1 have the same truth table.

33.2 9/10.
Chapter 37
Chapter 34 37.1 (a) {1, 2, 2, 2, 3}; (b) {2, 2, 3, 3}; (c) {1, 1, 1, 2,
3}; (d) {3, 3, 3, 3, 3, 3}; (e) {4, 4, 4, 4, 4}.
34.1 The field lines are ellipses being the intersection
of circular cylinders and inclined planes. 37.2 21 are connected of which three are regular with
degrees 0, 2 and 4.
34.3 Surface area is 16 (5√5 − 1).
37.3 (b)(ii) A spanning tree could be the graph with
34.4 Volume = 13 Ah; volume of tetrahedron = 12
1 3
a; edges {ba, bf, bg, bc, ge, cd}.
volume of octahedron = a √2. 1 3
3
37.4 ad(b + c) + eh(g + f ).
34.5 curl F = (x − 2yz)î − yzq − xx;
curl G = (2y − 1)î − 2xq − x. 37.5 By Euler’s theorem: (a) the dodecahedron has
20 vertices; (b) the icosahedron has 30 edges.

Chapter 35
Chapter 38
35.1 (a) S1 = {1, 2, 3, 4, 5, 6, 7, 8};
(b) S2 = {14 , 13 , 12 , 23 , 34 , 1, 32 }. 38.1 At 6.5% the repayment is £8198.15; at 7% the
repayment is £8526.64.
35.2 A  B = {x | x ∈R and −1  x  2, x = 3 x = 4};
A  B = {1, 2}. 38.2 The fixed point is ( 12 (√3 − 1), 12 (√3 − 1)). The
iteration gives u2 = 0.460, u3 = 0.288, u4 = 0.417,
35.3 (a) Same as Fig. 35.9b; (b) same as Fig. 35.9d; u5 = 0.326 to 3 decimal places, which indicates
(c) elements which are not only in A or B or C. stability.
38.3 un = (A + Bn + 18 n2)2n.
Chapter 39
39.1 $P(A_1) = \frac16$; (i) $P(A_2) = \frac49$; (ii) $P(A_3) = \frac{17}{18}$.
39.2 $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(B \cap C) - P(C \cap A) - P(A \cap B) + P(A \cap B \cap C)$.
39.3 The probability of six red cards is 0.0113.
39.4 Component is faulty with probability 0.942.
39.5 (a) 0.000125; (b) 0.1354; (c) 0.1426; (d) 0.1425.
39.6 $\frac{8}{21}$, the same as the probability for the second drawn component.

Chapter 40
40.1 The probability that the sum 7 occurs at throw $i$ is $p_i = \frac16(\frac56)^{i-1}$ $(i = 1, 2, \ldots)$.
40.2 The probability $p_i$ that $i$ individuals are over-height is given by $p_i = {}^8C_i\left(\frac{13}{20}\right)^i\left(\frac{7}{20}\right)^{8-i}$ $(i = 0, 1, 2, \ldots, 8)$.
40.3 Expected value is 2.8, and the variance is 1.82.
40.5 With $n = 20$, $\lambda = 2$ the distributions are compared in the following table:

i          0      1      2      3      4      5      6
Binomial   0.122  0.270  0.285  0.190  0.090  0.032  0.009
Poisson    0.135  0.271  0.271  0.180  0.090  0.036  0.012

40.6 $\gamma = \alpha/(\alpha\beta + 1)$; the probabilities are (a) $\gamma t_0$; (b) $\dfrac{\alpha}{\alpha\beta + 1}\left[\beta + \dfrac{1}{\alpha}e^{-\alpha(t_0 - \beta)}\right]$.

Chapter 41
41.1 The medians and quartiles are as follows:

           1st quartile   median   3rd quartile   mean
Paper 1    41.5           47       56             47.7
Paper 2    47.5           58       63.5           54.9
Paper 3    43.5           50       59.5           52.5
Paper 4    45             49       65             54.3

41.2 $m_1 \approx 1966$; $m_2 \approx 2034$. $m_2 - m_1$ behaves like $\sqrt n$ as $n \to \infty$.
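The table in Self-test 40.5 compares $\text{Binomial}(20, p)$ with $\text{Poisson}(2)$. The Python sketch below recomputes both rows. It assumes $p = \lambda/n = 0.1$, the usual pairing for the Poisson approximation, and this choice reproduces the binomial row to three decimal places.

```python
from math import comb, exp, factorial

def binomial(n, p, i):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson(lam, i):
    """P(Y = i) for Y ~ Poisson(lam)."""
    return exp(-lam) * lam**i / factorial(i)

n, lam = 20, 2.0
p = lam / n  # assumption: the binomial row uses p = lambda / n = 0.1
print("i       " + "".join(f"{i:>7d}" for i in range(7)))
print("Binomial" + "".join(f"{binomial(n, p, i):7.3f}" for i in range(7)))
print("Poisson " + "".join(f"{poisson(lam, i):7.3f}" for i in range(7)))
```

The two rows agree most closely near the mean $\lambda = np = 2$. The Poisson values are slightly larger in the tails, as expected for a distribution with variance $\lambda = 2$ against the binomial variance $np(1 - p) = 1.8$.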
Answers to selected problems

Full solutions of these end-of-chapter problems can be found at the website:


www.oxfordtxtbooks.co.uk/orc/jordan_smith4e

Chapter 1 (f) 1/4x − 1/4(x + 2) − 1/2(x + 2)2.


(h) 1/2(x − 3) + 1/2(x + 1).
1.2 (a) y = −2x + 3; (b) y = 1; (c) y = 32 x − 13 .
Intersections are A : (2, 1), B : (54, 12), C : (1, 1). 1.37 (b) 1/[2(x − 1)] + 1/[2(x2 + 1)] − x /[2(x2 + 1)].

AB = 14 √13 , AC = 1, BC = 14 √5. 1.38 (b) x − 3 − 1/(x + 1) + 8/(x + 2).

1.3 (b) Slope = --31 . Intersection with axes at (2, 0), 1.39 (b) 1 + 1/2 + 1/5 + 1/10 + 1/17.
(0, − ).
2 6
3
1.40 (b) ∑( ) 1 n
3 = ( 13 )2 + ( 13 )3 +  + ( 13 )6
1.4 (b) (y + 2)/(x + 1) = −2, so y = −2x − 4. n= 2

(d) (y − 2)/(x − 1) = 3, so y = 3x − 1. = ( 13 )2 [1 + 1
3 +  + ( 13 )4 ].
Now (1.31) gives the sum in the brackets. Finally we
1.7 (b) Centre (1, 0), radius 2.
obtain 121/729.
(d) Centre ( , − ), radius √11.
1 1 1
2 2 2
(e) −341/1024.
1.9 (b) x = − 53 ± 1
5 √14, y = − 15 ± 15 √14. 1.44 (c) 1/99; (e) 30/11.
1.14 (b) 1. (d) −1/√2. (f ) −√3/2. 1.45 (b) 10/9; (d) 2/3.

1.16 (b) cos x; (d) −cos x. 1.47 (b)(i) 2m m!.

1.17 (b) 2 cos 12 (x + y) sin 12 (x − y). 1.49 (c) 256; (d) 20; (f) 59.

1.18 In the following, n represents any integer: 1.50 (a) 72; (b) 360.
(b) 12 π + n π; (d) 16 + 13 n; (f ) 2n. 1.51 (b) 24; (d) 164.
1.19 (b) amp. = 1.5; ang. freq. = 0.2; period = 31.41; 1.54 (a) 2880; (b) 720.
phase = −0.48.
1.55 (a) 120; (b) 720; (c) 220; (d) 1000.
1.20 (b) 12 x − 32 ; (d) arcsin 12 x, 0  x  2.
(f ) arccos(arcsin x), 0  x  sin 1. Chapter 2
1
(h) − 12 + (1 + 4x) 2 , x  − 14 . 2.1 (b) 0.5; (e) 2; (g) 1.
1.22 (b) 13 e 2; (d)ln 13, or − 13 ln 3; (f ) 2; (h) ±√2;
1
3 2.2 (c) 6; (e) − 14 ; (g) −4.
(l) Hint: write sinh 2x = 12 (e 2x − e − 2x ) and obtain a
2.3 (c) −1/x2; (f) 4x.
quadratic equation for e2x. x = 12 ln(4 + √17).
2.4 (c) −8.
1.26 Hint: x = tanh y = (ey − e−y)/(e y + e−y). Form an
2.5 (c) 32, −32.
equation for e y and solve it.
2.8 (c) dE/dT = 4kT3.
1.28 5 cos(ω t − 0.927).
2.9 (b) 7x6 − 18x5 + 1.
1.29 C = 2, α = 1.386, f(2) = 1/8.
2.11 Use the formula for tan(A − B) in Appendix B(b).
1.30 Tidal period = 12.57 h. It floats for 9.20 h.
2.12 (b) --21 ; (d) 1; (g) 2; (i) π /180 = 0.0175.
Hint: it floats when sin 0.5t  −0.666. Sketch
y = sin 0.5t and y = 0.666 and find the intersections. 2.15 (a) 2 cos x + 3 sin x.

1.33 The vertex is (−4, 7). 2.16 (b) y = 24x − 39; (d) y = e−1x.

1.36 (b) 2/(x + 2) − 1/(x + 1). 2.17 (b) 6x − 2, 6, 0.


(d) 1/2x − 1/(x + 1) + 1/2(x + 2). 2.20 y = (−x + x0 + 2a2x30)/(2ax0).
Chapter 3 5.3 (b) The terms in the expansion of sin x are of size
|x | 2n−1/(2n − 1)! with n = 1, 2, … . We need to choose n

3.1 (b) x cos x + sin x; (f ) 2x ln x + x.


so that this is less than 0.000 05 when x = ±2. The first
3.2 (b) 1/(1 + x)2; (f ) (x2 − 2x sin x cos x)/x4 cos2x. value within the limits is n = 7. The polynomial is
(m) nx n−1. x− 1
x3 + 1
x5 − 1
x7 + 1
x9 − 1
x11 + 1
x13.
3! 5! 7! 9! 11! 13!
2 2
dg df df df dg dg 5.4 (b) 12 π − x.
3.3 (d) f +g , g 2 +2 +f 2,
dx dx dx dx dx dx
5.5 (b) 12 + 14 x + 18 x 2 +  , −2  x  2.
d 3f d 2f dg df d 2 g d3 g
g 3 +3 2 +3 + f . (h) 1 − 21! x + 41! x 2 −  , valid for all x.
dx dx dx dx dx 2 dx 3
5.6 (b) 1 + 12 x − 18 x 2.
3.4 (b) −2 cos x sin x; (e) 2 sin x/cos3x;
(j) 12x2(x 3 + 1)3; (n) −3 e−3x. 5.7 (b) tan x ≈ (x − 16 x 3 + 1
120 x )(1 − 12 x 2 +
5 1
24 x 4 )−1
− 12
≈ x + 13 x 3 + 2 5
x.
; (i) − 12 x − 2 .
3 15
3.5 (f ) 12 x
5.8 (d) ln(1 + x + x2) = ln[x2(1 + 1/x + 1/x2)]
3.6 (f ) e−t(cos t − sin t); (k) 2 sin x(cos x − sin x)/x 3.
= 2 ln x + ln(1 + 1/x + 1/x2).
3.9 (c) (−2x sin x2)/cos x2. The original function only
Then treat 1/x + 1/x as the small variable.
2
has a meaning when cos x2  0.
5.11 (b) Suppose that the first nonzero derivative is
3.10 (b) et(cos t + t cos t − t sin t).
the N th : f (N)(c) ≠ 0. Consider whether N is even or
1 1
3.11 (b) dy/dx = −y 2 /x 2 . This can be written in other odd, and whether f (N)(c) is positive or negative.
1 1
ways; for example, put y 2 = 1 − x 2 from the equation
5.17 (c) 12 (ex + e−x).
of the curve.
5.18 (c) 45 .
3.15 (b) −5.

3.16 (b) dy/dx = ± x/[2√(1 − (x/2)2)].


Chapter 6
6.1 (b) 3 ± i.
Chapter 4
6.3 (b) 3 − 5i; (d) 9 + 3i; (f ) 1 + 6i.
4.1 (b) 2t 2; (c) 4t 3.
6.5 (d) − 13
25 −
9
i.
4.2 (c) x = e−1 (min); (g) x = 0 (min); 25

(i) x = −1/√3 (min), x = 1/√3 (max); 6.6 (a) −4i; (c) − 15 + 58 i.


(t) Points of inflection at x = nπ; maxima at 6.7 (a) 1 − i; (c) −2i.
x = (2n + 12 )π; minima at (2n − 12 )π.
6.8 (b) 16.233 − 0.167i; (d) 88.669.
4.5 If base = x and rectangle height = y, then
A = xy + 18 π x 2 (constant), and P = (1 + 12 π)x + 2y. 6.9 (b) | z2 | = 8; Arg z 2 = − 13 π. (d) |z4 | = 3; Arg z4 = π.
Substitute for y from the formula for A to express P 6.10 (b) y = 2; (d) the parabola, y 2 = 4x;
in terms of x only. The minimum of P is reached (f ) y = x (x  0).
1
when x = [2A /(1 + 14 π)] 2 . πi − 13 π i
; (g) e2 ei; ( j) √2 e 4 πi.
3 3
6.11 (a) √2 e 4 ; (d) 14 e
4.10 (b) δy ≈ −0.2 (exact value −0.227… ).
6.16 (a) 2nπi (n = 0, ±1, ±2, … ); (c) (2n + 1)πi.
(d) δy ≈ −0.4 (exact value −0.5).
6.18 (a) cos(ln 2) + i sin(ln 2).
4.11 (a) δv ≈ −0.11; (d) δA ≈ −0.08.
6.23 (a) x2 − y2 + 2xyi.
(d) cos x cosh y − i sin x sinh y.
Chapter 5
6.28 2 + i, 2 − i, −1 − i, −1 + i.
1
5.1 (b) (1 + x) 2 ≈ 1 + 12 x − 18 x 2 + 1
x 3. For 2 decimal
16 6.29 (b) e2cosθ cos(2 sin θ ).
places, we need | 16 x 3 |  0.005, or −0.43  x  0.43.
1

(d) To four terms,


Chapter 7
sin 2x ≈ 2x − 1.333x 3 + 0.267x5 − 0.025x7,
7.2 x = −2, y = 1.
where (for this context) the coefficients are rounded
to 3 decimal places. For two-decimal accuracy, we ⎡−10 −5⎤
7.6 BA = ⎢ .
need −0.79  x  0.79. ⎣ 20 10⎥⎦
⎡− 5 6 16⎤ Chapter 10
7.7 A 2 + C 2 = ⎢− 8 11 2⎥ .



⎢ ⎥ 10.1 (a) 10. (e) zero.
⎣⎢− 6 − 6 −7 ⎦⎥
10.3 If your diagram is a parallelogram ABCD,
7.11 A2n−1. the theorem obtained is AC 2 + BD2 = 2(AB2 + AD2).
7.16 x = −17, y = −2, z = 8.
If you use the triangle rule the result gives the median
of a triangle in terms of the sides.
10.5 (a) 6. (b) −5.
Chapter 8
10.6 (a) 35.3°.
8.1 (c) 1; (e) −1.
10.8 54.7°.
8.4 (b) 1728; (d) −8132.
10.9 33x2 + 13y2 − 95z2 + 48xy − 144yz + 96zx = 0.
8.6 (b − c)(c − a)(a − b)(a + b + c).
10.10 32.5°, 78.9°, 68.6°.
8.14 x = a, b, c, −a − b − c.
10.12 F = − 13
11
a+ 30
11
b+ 57
11
c.
⎡ 7 1 −5⎤ 10.16 α = − 52 , β = 57 , γ = 2
15 .
8.16 det(AB) = −36, A−1 = 1 ⎢−2 0 2⎥ .
2
⎢ ⎥ 10.17 x = 0, y = 0, z = 1.
⎢⎣ 1 1 −1⎥⎦
2 2
⎛ 1⎞ ⎛ 1⎞
10.18 (a) (2√2, 0). (b) ⎜ X − ⎟ + ⎜Y + ⎟ = 1.
Chapter 9 ⎝ √2 ⎠ ⎝ √2 ⎠

9.1 (a) P_Q = (5, −3), Q_P = (−5, 3). 10.19 (c) (l, m, n) = ( 13 , − 32 , − 32 ).

9.2 (f ) Length = 5, θ = 126.9°. 10.21 (a) ±( 133 , 4


13 , 12
13 ).

⎛ 3 3√3 ⎞ 10.26 (a) 19.1°.


9.3 (b) ⎜ , ⎟.
⎝2 2 ⎠ 10.30 (a) P1 is −y + z = 4, P2 is 2x − 2y + z = 5.
(b) 45°. (c) 2√2. (d) and
9.4 B_E = (0, −4); BE = 4; bearing south. (e). The line L is given by r = λ(1, 4, −4). Show that
9.5 (c) √6. intersection with P1 and P2 occurs when λ = − 12.

9.7 (b) 2a = (6, 4, 6), 3b = (3, 3, 6), 2a − 3b = (3, 1, 0). 10.34 Begin by finding any two points on the line of
intersection. (The resulting form is not unique.)
9.10 (a) (3, 3, −6). (b) (X + 2)2 + (Y − 1)2 + (Z + 3)2 = 1.

9.16 Speed 10√2; direction towards north east. (Hint: Chapter 11


use vwe = vw − ve in components, with vN = (u, v).)
11.1 (a) (4, 7, 5). (d) −9. (h) (−24, 3, 15).
9.22 (b) 43 a + 14 b; (c) 32 a − 1
b.
2
11.9 Hint: the determinant is equal to a · (b × c),
9.23 (a) (a + λ b)/(1 − λ). (b) (a − λ b)/(1 + λ). where Q_A = a, etc.
(c) The point is on the extension of AB in the
11.12 X = − 16 , Y = − 32 , Z = − 56 .
direction of A_B.
11.13 (c) λ = µ = − 12 , v = − 32 . L3 meets L1
9.26 (a) y + z = 1. (b) 3x − 2y − z = 0.
at (−1, 0, − 12) and L2 at ( 12 , − 32 , − 12 ).
9.27 √2.
11.15 (a) ( 243 , , − 43 ). (b) ( 163 , 243 , − 163 ). (c) 163 i.
16
3

9.28 (a) ±(3/√34, 4/√34, 3√34). (b) ±( , , ). 2


7
3
7
6
7
(Note: the unit vector in the direction of î − 2q − 2x
is 13(î − 2q − 2x).)
9.29 (a) −3î + 2q + 4x. Length √29.
11.16 (a) −6. (b) 6. (c) 0. (d) 0. (e) −2√3.
1⎛ a b⎞
9.36 r = ⎜ + ⎟ . (Hint: draw a diagram
2 ⎝ | a | | b |⎠
involving T and s.) Chapter 12
9.37 The minimum separation occurs when 12.1 (c) x1 = 1, x2 = −1, x3 = −5.
t = 12 12 s. (e) x1 = 2, x2 = −1, x3 = 2, x4 = 2.
12.7 x1 = 40, x2 = 88, x3 = −68, x4 = −59. (k) x + ln x + C ((x + 1)/x = 1 + x−1);
1
2x − 2x 2 + C; ln| x | − 2x −1 − 12 x −2 + C.

⎡ 5 0 −5⎤
12.9 (b) 1 ⎢− 6 10 1⎥ . 14.2 (b)
25
⎢ ⎥ − 15 (1 − x)5 + C; − 32 (8 − 3x)− 2 + C; 32 (1 − x) 3 + C.
3 4

⎢⎣ 7 5 3⎥⎦
⎡ 1 0 0 0 0⎤ 14.3 (b) −ln | 1 − x | + C; − 15 ln| 4 − 5x | + C.
⎢−1 1 0 0 0⎥
⎢ ⎥ 14.4 (c) 38 x + 1
4 sin 2x + 1
32 sin 4x + C.
(e) ⎢ 0 −1 1 0 0⎥ .
⎢ 0 0 −1 1 0⎥ 14.5 x2 ex − 2x ex + 2 ex + C.
⎢ 0 0 0 −1 1⎥⎦
⎣ 14.6 (a) 2; (h) −ln 2.

12.12 The shadow on the z plane has vertices at the 14.7 (c) 4 − x2  0 if −1  x  2, and
points (−1, 0, 0), (−1, −2, 0), (1, 0, 0). 4 − x2  0 if 2  x  3. The geometrical area is
12.16 Non-trivial solutions if k = 1, −1, 4. |F(x)| 2−1 + |F(x)| 23,
where F(x) = 4x − 13 x 3.
12.18 Non-trivial solutions if k = −6, −1, 3, 4.
14.8 (a) At + B; (b) 16 t 3 + At + B.
12.22 x1 = 1.398, x2 = 1.090, x3 = −0.2844,
x4 = −0.3697.
Chapter 15
Chapter 13 x =1


1

⎡−3⎤ ⎡1⎤
15.1 (b) lim
δx→ 0
∑ x 5 δx =
x = −1
x 5 dx = [ 16 x 6 ]1−1 = 0.
−1
13.1 (b) Eigenvalues 4, 9. Eigenvectors ⎢ ⎥ , ⎢ ⎥ .
⎣ 2⎦ ⎣1⎦
(e) Eigenvalues 3 − 4√2, 3 + 4√2. Eigenvectors 15.2 (b) (x + 1) dx = 1
2 2
3 (x + 1) 2 + C .
3

⎡−1 − 2√2⎤ ⎡−1 + 2√2⎤


⎢ 7 ⎥, ⎢ 7 ⎥.
⎣ ⎦ ⎣ ⎦
 dx = [x] = 2; (i) (2
2
3
15.3 (c) 2
0
2
3
2
− 1).
13.4 (c) Eigenvalues (−2, 2, 3). Eigenvectors 0

⎡ 0⎤ ⎡1⎤ ⎡0⎤
⎢−1⎥ , ⎢0⎥ , ⎢2⎥ .
 (x − 1)dx = [ x − x]
1

⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 15.4 (b) 2 1
3
3 1
−1 = − 43 .
⎢⎣ 2⎥⎦ ⎣⎢0⎦⎥ ⎣⎢1⎦⎥ −1

13.7 a = −2 and a = − 72 . ∞

e − 12 v
dv = −2 [e − 2 v] 0∞ = − 2(0 − 1) = 2 .
1
15.5 (b)
13.12 The matrix C is given by
0

⎡ 7 −1 −1⎤
C = ⎢−1 −1 1⎥ .
 (1 − e ) dt = T + e
T

⎢ ⎥ 15.6 (c) 2/π; (h) −t −T


− 1.
⎢⎣−1 0 2⎥⎦ 0

1
⎡1 1 1⎤ (T + e −T − 1) = 1 + T −1 e −T − T −1 → 1
T
13.16 lim An = 13 ⎢1 1 1⎥ .
n→∞ ⎢ ⎥ as T → ∞.
⎢⎣1 1 1⎥⎦
13.22 Eigenvalues are 0, 4, 4, 12. 15.7 The integrands are (a) even; (b) odd; (c) odd;
(d) odd.
13.26 A3n = I3, A3n+1 = A, A3n+2 = A2.
15.9 (b) The exact result is √π/2.
− 12
sin (x + 1) − 12 x − 2 sin x.
1

Chapter 14 15.10 (e) 12 (x + 1)

14.1 (a) 16 x 6 + C; 53 x 5 + C; 12 x 4 + C; 19 x 3 + C; 15.11 (b) x  −1 : 1


2 (constant); −1  x  1 : 12 x 2;
3x + C; 3x + C; C.
2
x  1 : 12 (constant)
(g) ex + C; −e−x + C;
+ C; −2 e − 2 x + C; − 32 e −2x + C.
1
5 2x
2e 15.14 6.
Chapter 16 17.7 (e) tan x − x + C; (f) −x − arctan(x ) + C;
−1 −1
1
(k) 12 [arcsin x + x(1 − x 2 ) 2 ] + C.



16.1 5.3 × 10 −3.
17.8 (b) 12 ln| x /(x + 2)| + C.

 (20 − 10t) dt = −20, x(4) = −17.


4

16.2 (d) ln |x + 1 | − --21 ln |2x + 1 | + C.


2 (f ) ln | x | − --21 ln(x2 + 1) + C.
(i) --21 ln[(1 + sin x)/(1 − sin x)] + C.
16.3 (b) 12 π; (g) π.
17.9 (b) --21 ln(x2 − 2x + 3) + C. (e) ln(ex + e−x) + C.
16.5 (a) 43 πab2. 1
(f ) 2 ln(x 2 + 1) + C.

 πx dy =  π(2y) dy = 28π/3.
2 2

16.6 v = 2 2 17.10 (b) 13 x e3x − e3x + C.


1
9

1 1 (f ) 2x sin x + 4 cos 12 x + C.
1
2
(i) 12 x 2 ln x − 14 x 2 + C. ( j) xn+1 [ln x − 1/(n + 1)]/(n + 1).
 mx dx =
L

16.7 Put x = 0 at A; moment = 1


2 mL2. (k) Hint: bring together the two terms ∫ (ln x/x) dx.
0
17.11 (a) Hint: there are two stages required; see
16.8 1.18.
Example 15.20.
16.9 0.015 g.
17.12 Hint: the same integral occurs on both sides
16.12 A sketch shows that x(x − 1)  −x if 0  x  2. but with a different factor.
Therefore the area is 17.13 (b) zero; (d) 12 ; (h) π.
x =2

∑ [x(x − 1) − (−x)] δx =  x
2

lim 2
dx = 83 . 17.15 F(0) = 1
2 π, F(1) = 1, F(4) = 3
16 π, F(5) = 8
15 .
δx→ 0
x=0 0
17.16 (a) 2 (ln 2)3 − 6 (ln 2)2 + 12 ln 2 − 6.
16.13 3
2 . (b) F(0) = 2, F(1) = π, F(4) = π 4 + 12π 2 + 48,
16.14 (b) π.
F(5) = π5 + 20π3 + 120π.

16.15 In a plane perpendicular to the end, y is 17.23 (c) (a/b) arctan [(a tan x)/b] + C;
downward and x is horizontal; the origin is at the (d) ln(tan 12 x) + C;
top. Area elements are horizontal strips of width δy (g) ln|sec x + tan x| + C; (j) ln[(1 + √5)/2];
in the end face. Force = 12 ρgLH 2. Moment = 16 ρgLH 3. (k) 8(6√3 + 1)/15.

16.16 Distance of centre of mass from vertex is 34 H. 17.25 Coordinates of centroid: ( 35 h, 0).

16.17 1
12 σ a3b(σ = mass per unit area).
16.18 (a) 1
4 σBH 3; (b) 1
48 σHB3, where σ is mass per Chapter 18
unit area. 18.2 (b) x = A e 2 t; (e) x = A e − 3 t ; (i) x = A et.
1 4

16.23 8a.
18.3 (b) x = e 3 (t−1); (d) x = 10 e−(t+1).
1

18.4 I(t) = I0 e−Rt/L. I reduces to a fraction 1/n of itself


Chapter 17 in any interval of length (L/R) ln n.
17.1 (c) − 13 e −3x + C; (f ) − 121 (3 − 2x)6 + C;
18.5 (a) A(t) = C e−kt (C arbitrary).
(j) (2x − 3) 2 + C; (n) 12 ln| 2x + 3 | + C;
1

1
(o) ln| 1 − x | + 1/(1 − x) + C. (b) The half-life T = ln 2 years. The information
3
k
17.2 (b) − 32 cos 12 (3t − 1) + C; (e) − 32 (− t) 2 + C; implies that e = 1 − 0.175 = 0.825, so k = 0.0096.
−20k

Therefore T = 72 years.
17.3 (d) 1
2 sin(x 2 + 3) + C. (j) 1
2 ln(1 + x 2 ) + C.
18.6 If N(t) is the number, then δN ≈ 20(--21 N) δt so the
17.4 (c) sin3 2x + C. (g) Put cot 2x = cos 2x/sin 2x,
1
equation is dN/dt = 10N. In the second experiment
6
then u = sin 2x, giving --21 ln | sin 2x| + C.
there is an average death-rate of 1 per rabbit per year,
(j) --31 cos3x − cos x + C.
so dN/dt = 9N.
17.5 (b) 205/32; (e) −ln 2; (h) --21 ln 2;
18.7 (b) A et + B e−2t. (e) A e t/2√3 + B e−t/2√3.
(k) zero; (n) (2 /ω) cos φ.
(l) A e−3t + Bt e−3t.
17.6 (b) --π; 1
2 (d) --π + --;
1
4
1
2 (f ) --π.
3
8 (n) A + Bt (this is an exception to (18.10)).
1
18.9 (b) --32 (et − e−2t). (b) Amplitude = 10 /[(36 − ω 2 )2 + ω 2 ] 2 ,
(d) The general solution is A e−x + Bx e−x, phase = −arctan[ω /(36 − ω 2) ].

y = e(x − 1)e−x. (c) Resonance: ω = 5.958.


18.10 (b) A cos 3t + B sin 3t.
(d) A cos ω 0 t + B sin ω 0 t. (f ) et(A cos t + B sin t). Chapter 21
(i) e − 3 t (A cos 13 √2t + B sin 13 √2t).
2
1
πi − 12 πi
21.1 (b) −2 e 2 (2 e in standard form).
18.11 (c) a cos ω 0 t + (b/ω 0) sin ω 0 t. − 34 πi
1
21.2 (d) 2 e ; 2 cos(ω t − 43 π).
18.12 θ = α cos( g /l ) t. 2
(i) e 1.97i
; cos(ω t + 1.97).
18.13 The initial angular velocity dθ /dt is v/l; − 12 πi
= 1 + i = √2 e 4 πi.
1
21.3 (b) 1 − e
1

v ⎛ g⎞ 2

21.4 (b) 1 − 3 e
− 12 π i
+ e 2 πi = 1 + 4i = √17 eiφ , where
1

θ= sin ⎜ ⎟ t.
1
(lg) 2 ⎝ l⎠ φ = arctan 4 = 1.33.
18.14 θ = 0.0719 e−0.033t sin 0.696t. 21.6 (b) R + ω Li. (d) R /(1 + ω RCi).

18.18 A = (Mg/P) e −ρ g(y−H)P


. (i) R + iω L /(1 − ω 2LC).
(k) iω RL /[R(1 − ω 2LC) + iω L].
21.7 V = ZI and V = 2.
Chapter 19 1
(d) I = 2(1 + iω RC)/R; | I| = 2(1 + ω 2R2C 2 ) 2 /R;
19.1 (b) − t − t − t −
1 3
3
1 2
3
2
9
11
27 . arg I = arctan(ω RC).
(d) 53 e 2t. (i) − 152 sin 3t.
21.8 (b) V1 / V0 = 2
(3 − 2i); V0 / I1 = 12 (5 − i).
(k) − 253 cos 2t + 254 sin 2t. 13

19.2 (d) 15 (−6 cos t − 3 sin t).


(f ) − 137
2
(4 cos 2t + 11 sin 2t).
Chapter 22
(h) 65 e (4 cos 2t + 7 sin 2t).
3 t
22.4 (b) 2x2 − y2 = C. (g) y = x /(1 + Cx).
(k) x = ± 2 − 2(C − t 3)− 2 for t 3  C.
1 1

19.3 (b) − 43 t cos 2t.


(n) arctan y + arctan x = --41 π. Take the tangent of this
19.4 (b) 12 t 2 e t; (e) 12 t e t sin t. expression and use the formula for tan(A + B); we
19.5 (c) A e 2 + B e
1
t − 12 t
− 1 − 173 cos 2t. find that y = (x + 1)/(x − 1).
(i) A cos x + B sin x + x2 − 1 + 15 e3x . 22.6 (b) y = 16 (x + C) for x + C  0. y = 0 is also a
1 2 2 2

solution. (d) Those parts of the curves y = sin(ln |x |


19.6 (c) − 12 + A e . t2
+ C) for which x and dy /dx have the same sign. Also
(g) (sin x − cos x − x cos x + A)/(x + 1).
y = ±1 are solutions.
(l) (x + 1) ln |x + 1| + 1 + A(x + 1).
22.7 (b) y 3 − 3xy = C. (d) xy − y 2 − x2 = C.
19.9 11 12 minutes.
(f ) y 3 + y − x3 = C. (h) y + cos y + sin x = C.
( j) ex+y + y − x = C.
Chapter 20 22.8 (b) xy + y/x = C; (d) x /y + y − x = C;
20.1 (b) 3 cos(ω t + π). (e) 3 cos(2t + --21 π). (e) y /x − x/y − 1/x = C; (f ) x2 /(2y2) + 1/(xy) = C.
(h) 5 cos(2t + φ), φ = −arctan --. 4 1
3
22.12 (b) x(1 + 2y 2 /x 2 ) 4 = C.
20.2 (c) x leads y by π. (d) x − 4y = Cy .
2 2 3

20.3 (b) (i) 0.318 cycles/s. (ii) 0.316 cycles/s.


(iii) About 3 cycles. Chapter 23
20.4 (b) C = √(4 − √6), φ = arctan(1/(√6 − 1)), 23.2 (b) y = Cx (this is not covered by (23.22)).
(− 12 π  φ  0). (d) xy = C (a saddle).
20.7 The solutions are of exponential type. 23.4 (b) Saddle (i.e. unstable). m = 12 (−3 ± √13).
(f) Stable spiral; directions are clockwise round
20.8 x = e−4t − 4 e−6t.
origin.
20.9 A e−kt + Bt e−kt.
23.5 (b) Equilibrium points at (1, 1). (1, 1) is a stable
20.10 (a) Period = 1.0508. spiral, anticlockwise about (1, 1). (d) Equilibrium
points at (−1, 0), (0, 0), (0, 1); (0, 0) is a centre and 25.5 (b) 2s /(6s + s + 1).
2

(−1, 0), (1, 0) are saddle points.



25.6 (b) V2 /V1 = 3/(20s2 + 12s + 5); V2 /I = 3/(4s2 + 1).

25.7 (b) t; (f) 1 − cos t; (h) 12 (−t cos t + sin t);


Chapter 24 (j) n!m!tn+m+1/(n + m + 1)!.
24.1 (b) 4/(s + 1); (d) 6/s − 1/s; (g) (3 − s)/(s + 1).
3 2

 f (τ )(e
t
1
25.8 (b) ω ( t− τ )
− e −ω (t− τ )) dτ .
24.2 (b) 1/s − 2/(s + 2); (e) (3s − 4) /(s + 4); 2
2ω 0
(g) --21 [1/s − s /(s2 + 4)].
25.9 (b) cosh t.
24.3 (b) 1/(s + 2)2; (d) (s − 2) /(s2 − 4s + 5);
(i) (s2 − 9) /(s2 + 9)2; (l) 24 /(s + 1)5. 25.19 (a) x(t) = δ(t) + 2δ(t − T ) + δ(t − 2T ),
1
X(s) = 1 + 2 e−sT + e−2sT.
24.5 (b) 1; (d) 18 t 4 ; (g) e ; (k) 12 e t + 12 e − t;
3
2
2t

(o) 2 cos 2t − 1
sin 2t; (s) 12 e tt 2; (u) 13 (cos t − cos 2t). 25.21 (a) z −1 + 2z −2 − z −3. (b) 1 − z −1 + z −2 − ···
2
= z /(z + 1). (c) 2z /(2z − 1). (d) z /(z2 − 1).
24.6 (e) (2s2 + 3s − 2)X(s) − 10s − 9.
25.22 (a) Tz /(z − 1)2.
24.7 (b) 2 et + e−2t; (e) 3 e−t cos 2t;
25.23 (a) (z − 1)/(z + 1), g(t) = {1, −2, 2, −2, … }.
(f ) y = 1
4 ex + 1
4 e− x + 1
2 cos x.
25.27 (a) Unstable. Poles at z = ±2, giving growth --43 2n
24.8 (b) 3 − 3 cos t + sin t.
and --41(−1)n2n. (c) Stable. Poles at z = ± --21 i, giving decay
(e) − 18 e − t + 98 e t − 14 t e t + 14 t 2 e t.
1 1 1
(i) − 76 e t − 12 e − t + 43 e 2t − 121 e −2t. cos πn.
4 2n 2
24.9 (b) x = 3
8 + 5
8 e4 t + 12 t e4 t; y = − 163 + 3
16 e4 t + 14 t e4 t.
24.10 (b) e t( 12 A + 12 B + 32 ) + e − t( 12 A − 12 B + 32 ) − 3, Chapter 26
where A and B are arbitrary. This is the same
26.1 (b) an = 0, bn = −2(−1)n/n.
as C et + D e−t − 3, where C and D are arbitrary.
2
(e) an = 0, bn = [1 + (−1)n − 2 cos( 12 nπ)].
24.13 e−2 e−2s[(s + 1)2 − 1]/[(s + 1)2 + 1]2 πn
= e−2 e−2s s(s + 2)/(s2 + 2s + 2)2.
2π 2 4
26.2 (b) bn = 0, a0 = , an = 2 (−1)n(n = 1, 2, … ).
24.14 (b) H(t) sin t − H(t − 1) cos(t − 1). 3 n
24.15 (b) ( 18 e 2t + 1 −2t
− 14 )H(t), 4(−1)n
8 e (c) bn = 0, an = − .
−( 18 e 2( t−1) + 18 e −2( t−1) − 14 )H(t − 1). π(4n 2 − 1)
(d) 12 H(t)t sin t + 12 H(t − π)(t − π) sin(t − π). 2
26.3 (a) a0 = 12 π, a2n = 0, a2n −1 = − ,
πn 2
Chapter 25 (−1)n
bn = − (n = 1, 2, … ).
25.3 Hint for working: s2 + 2ks + ω 2 has real factors
n
when k2  ω 2; so put s2 + 2ks + ω 2 = (s − α)(s − β ), 26.5 Series sum is 14 π.
1
where α, β = −k ± (k 2 − ω 2 ) 2 . Then x(t) is given by
26.8 F = 2.
(α − β )−1[(α + κ ) eα t − ( β + κ ) eβ t]H(t)
+ I(α − β )−1[eα (t−t0) − eβ(t−t0)]H(t − t0), 4β
26.10 a0 = 0, an = 0, bn = [1 − (−1)n](n = 1, 2, … ).
πn3
where κ = 1 + 2k.

4
25.4 By proceeding as suggested, we obtain 26.16 (a) ∑ (2n − 1)π sin(2n − 1)πt.
n=1
u(x) = Ax + 16 Bx 3 + (Mg /6K)(x − 12 l )3 H(x − 12 l ).

2 4
The conditions at x = l give A = Mgl 2/16K, 26.18 −∑ cos 2nω t.
B = −Mg /2K. This problem could be solved by π n=1 π(4n 2 − 1)
integrating the equation four times, and linking the
1 41 1
solutions over [0, --21 l] and [--21 l, l ] by the condition 26.23 (b) R(t) = – + –icosπt + — cos 3πt
2 π3 32
that u(x), u′(x), u″(x) are continuous at x = --21 l, but
this is automatically secured in the Laplace-transform 1 5
+ —2 cos 5πt + ...i.
method. 5 7
944

26.26 (b) $\sum_{n=1}^{\infty}\dfrac{1}{n^2} = \dfrac{\pi^2}{6}$.
26.30 $\tfrac12 + \sum_{n=-\infty,\,n\ne0}^{\infty}\dfrac{i}{2\pi n}e^{i2\pi nt/T}$.

Chapter 27
27.1 $X_s(f) = 4\pi f/(1 + 4\pi^2f^2)$; $X_c(f) = 2/(1 + 4\pi^2f^2)$.
27.8 $x(t) = 2\int_0^\infty X(f)\cos 2\pi ft\,df$, where $X(f) = 2\int_0^\infty x(t)\cos 2\pi ft\,dt$.
27.11 (c) $2c\,\operatorname{sinc}(cf)\cos 2\pi bcf$.
27.12 (a) $\tfrac12\operatorname{sinc}^2\tfrac12 f$. (b) $\tfrac12\operatorname{sinc}^2\tfrac12 f\,e^{-i3\pi f}$.
27.17 $\{1/[\alpha + i(2\pi f + \beta)] + 1/[\alpha + i(2\pi f - \beta)]\}$.
27.19 (b) $1/(1 + i2\pi f)^2$.
27.20 (b) The Fourier transform is $\operatorname{sinc}^2(f)\,e^{-i2\pi(a+b)f} \leftrightarrow \Lambda[t - (a+b)]$.

Chapter 28
28.3 (c) $4x - 2y - 1$; $-6y - 2x - 1$. (f) $y - 2$; $x - 1$. (i) $2y/(x+y)^2$; $-2x/(x+y)^2$. (k) $x(x^2+y^2)^{-1/2}$; $y(x^2+y^2)^{-1/2}$.
28.4 (c) $\partial V/\partial x = g'(r)\cos\theta$; $\partial V/\partial y = g'(r)\sin\theta$.
28.8 $\partial^2f/\partial x^2$, $\partial^2f/\partial y^2$, and $\partial^2f/\partial x\partial y = \partial^2f/\partial y\partial x$ are given in order: (b) 2, 4, 3. (d) $2y/x^3$, 0, $-1/x^2$. (h) $108(3x-4y)^2$, $192(3x-4y)^2$, $-144(3x-4y)^2$. (k) $-r^{-3} + 3x^2r^{-5}$, $-r^{-3} + 3y^2r^{-5}$, $3xyr^{-5}$, where $r = (x^2+y^2)^{1/2}$.
28.10 (b) $2x + 2y - z = 4$; one normal is (2, 2, −1). (d) $3x + 4y + 8z = 29$; one normal is $(-\tfrac32, -2, -1)$.
28.11 78.9° or 101.1°.
28.12 (b) (1, −1), min; (d) (nπ, mπ); min if n and m odd, max if n and m even, otherwise saddle; (h) (0, 0) saddle; (1, 1) minimum; (k) (0, 0), saddle.
28.14 (a) a = b = c = 7; (b) a = b = c = 4.
28.15 The maximum is 9, attained at (2, ±1).
28.16 Minimum distance = √2.
28.18 (b) Depth = $2^{-2/3}V^{1/3}$; square base, side $2^{1/3}V^{1/3}$.
28.23 Lowest point is $z = \tfrac43 a$ at (0, a) and (a, a).

Chapter 29
29.1 (b) δz = 0.0718… (exactly). The incremental approximation gives δz ≈ 0.0784. Error = 9.1%.
29.3 (b) $-\delta y(\delta n + \delta y)/(1 + \delta y)$.
29.6 −5.7%.
29.7 1.67% reduction, approximately.
29.9 (b) $-2\sqrt2$; (d) zero (it is the same in all directions).
29.10 (b) $-\tfrac43$; (e) $-\tfrac12$; (j) 1.
29.12 (b) $x_1x/a^2 + y_1y/b^2 = x_1^2/a^2 + y_1^2/b^2$. (f) $ax_1x + h(y_1x + x_1y) + byy_1 + g(x + x_1) + f(y + y_1) + c = 0$.
29.16 (b) $x^{-1} - y^{-1}$ = constant; (d) $e^x + e^y$ = constant.
29.17 (b) $y^2 - x^2 = b^2 - a^2$.
29.19 (b) 49.8° or 130.2°. (d) Hint: compare Problem 29.12f.
29.21 (b) $(0, \tfrac12)$; (d) $(-\tfrac14, 1)$.
29.22 (b) $(2, 1)/\sqrt5$.
29.23 (b) φ = 0.

Chapter 30
30.2 (b) $-4\sin t\cos t$; (d) $2\sin(t^2) + 4t^2\cos(t^2)$.
30.3 It is easiest to start by expressing the distance D in terms of polar coordinates (r, θ), (R, φ) by using the cosine rule (Appendix B(f)). Then
$\dfrac{dD}{dt} = -\dfrac{(Rv - rV)\sin(\phi - \theta)}{[R^2 + r^2 - 2Rr\cos(\phi - \theta)]^{1/2}}$,
where θ = vt/r, φ = Vt/R.
30.4 (b) x = y = 3. (e) The coordinates of the nearest point on the given line are $(\tfrac53, \tfrac15)$. Distance = $2/\sqrt5$.
30.5 (b) (0, 0), (2, 0). (A suitable parametrization is x = 1 + cos t, y = sin t.) (d) $(\pm6/\sqrt5, \pm4/\sqrt5)$. (A suitable parametrization would be x = 2/cos t, y = 2 tan t.)
30.8 (b) $F = -2KI\sin\theta + \} \cos\theta - I^2r\cos\theta - Jr\sin\theta$, $H = 2KI\cos\theta + \} \sin\theta - I^2r\sin\theta + Jr\cos\theta$.
30.9 (c) $\partial f/\partial u = -2v^2/u^3$, $\partial f/\partial v = 2v/u$.
30.10 (b) $\partial^2f/\partial u^2 = 12u^2 - 2v^2$, $\partial^2f/\partial u\partial v = -4uv$, $\partial^2f/\partial v^2 = -2u^2 + 12v^2$.
30.11 It is easiest to put $x^2 - y^2$ in terms of uv. Finally, $\partial^2f/\partial u^2 = 16v^2g''(4uv)$, $\partial^2f/\partial v^2 = 16u^2g''(4uv)$, $\partial^2f/\partial u\partial v = 4g'(4uv) + 16uv\,g''(4uv)$.
Chapter 31
31.1 (b) $\delta f \approx -x(x^2+y^2)^{-3/2}e^{-t}\,\delta x - y(x^2+y^2)^{-3/2}e^{-t}\,\delta y - (x^2+y^2)^{-1/2}e^{-t}\,\delta t$. (e) $\delta f \approx 2(x_1 - x_2)\,\delta x_1 - 2(x_1 - x_2)\,\delta x_2 + 2(y_1 - y_2)\,\delta y_1 - 2(y_1 - y_2)\,\delta y_2$.
31.2 −0.07.
31.3 It is easiest to write $\delta(1/R) \approx -\delta R/R^2$. We obtain $\delta R \approx 0.198\,\delta R_1 + 0.018\,\delta R_2 + 0.334\,\delta R_3$. The required $\delta R_3$ is −0.108.
31.4 Put $ax^3 - bx - c = f(a, b, c, x)$ and use (31.1).
31.5 (b) Hint: use logarithmic differentiation: $\delta w \approx -3\,\delta x + 3\,\delta z$ and $\delta w \approx 2(\pm0.6)$. What is the significance of the absence of a term in δy?
31.8 (b) $2\,\delta x + 4\,\delta y - 6\,\delta z = 0$. For $\partial z/\partial x$, put δy = 0: $\partial z/\partial x = \tfrac13$. Similarly $\partial z/\partial y = \tfrac23$.
31.11 (b) (2, −3, 5); (d) $(3x^2, 0, 9z^2)$; (f) $(-x/r^3, -y/r^3, -z/r^3)$, where $r = (x^2+y^2+z^2)^{1/2}$.
31.12 (b) (0, 2y, 2z). Unit vector = $\left(0,\ y/(y^2+z^2)^{1/2},\ z/(y^2+z^2)^{1/2}\right)$.
31.13 (b) $\cos\phi = 11/(3\sqrt{14})$, so φ = 11.5° (i.e. the angle of intersection of smallest magnitude).
31.15 (b) $\mathbf{v}\cdot(2x, -2y, -3)$.
31.16 (b) (Check that v as given is a unit vector.) $df/ds = 7.51$.
31.17 (b) $-2\hat{\mathbf{i}} - 2\hat{\mathbf{k}}$.
31.18 (b) (±1, 0, 0) and $(\pm1, \tfrac14, \tfrac3{16})$. (d) x = y = z is a line of stationary points (excluding the origin). (e) $x = y = z = \pm1/\sqrt3$, $\lambda = \pm\tfrac12\sqrt3$.
31.19 Stationary at (1, 0, 0), (−1, 0, 1), (−1, 0, −1).
31.21 (b) (3, 3, 3); (e) $(a/\sqrt3, b/\sqrt3, c/\sqrt3)$; (g) $(\tfrac13, \tfrac73, 1)$.
31.26 (b) $4xy = 1$; (d) $x^2 + y^2 = 1$.

Chapter 32
32.1 (b) e − 2; (d) (d − c)(b − a); (i) $-\tfrac13$; (m) $\tfrac12\ln 2$.
32.2 (b) Zero. Refer to the signed-volume analogy (30.2b). (f) ln(27/16).
32.4 $\tfrac43$.
32.5 (b) $\int_0^1\int_0^x f(x, y)\,dy\,dx$. (d) $\int_{-1}^1\int_0^{\sqrt{1-x^2}} f(x, y)\,dy\,dx$. (g) $\int_{-1}^0\int_0^{1+y} f(x, y)\,dx\,dy + \int_0^1\int_0^{1-y} f(x, y)\,dx\,dy$.
32.6 (b) $\tfrac32$; (d) $\tfrac32$; (h) $\tfrac1{12}$.
32.7 (b) 1.
32.8 (b) $\tfrac14\pi$; (d) $\tfrac{15}8$; (f) $\pi/16$.
32.9 $2a^3(4 + 3\pi)/9$.
32.10 (a) $2(u^2 + v^2)$; (b) 2; (c) 1/5; (d) $-2\cosh v$.
32.11 The value of the integral is $2(257 - 129\sqrt2)/5$.
32.12 Area = 1/12.
32.13 1/e.
32.14 Volume = 64/3.
32.15 1/4.
32.18 (a) $\sqrt\pi(|b| - |a|)$.

Chapter 33
33.1 (b) 1.
33.2 (b) $\tfrac43$; (d) $\tfrac23$; (f) 0.
33.3 (b) π; (d) $\tfrac83\pi^3$.
33.5 (b) 2; (d) 0; (g) 0.
33.6 (b) 1; (d) −3; (f) $\tfrac{15}2$.
33.7 (b) 0.
33.8 (b) $-\tfrac32$.
33.9 Zero.
33.11 Put x = x(u, v) and y = y(u, v), where u and v are the new coordinates. Then put $dx = \dfrac{\partial x}{\partial u}du + \dfrac{\partial x}{\partial v}dv$ etc.
33.14 $\tfrac38\pi$.
33.16 (b) Non-conservative.

Chapter 34
34.1 $\pi[a^3 - (a-h)^3]/a$.
34.2 (a) 1/84; (b) 1/24; (c) 13/384.
34.5 $2\sqrt6 + 2\sinh^{-1}(\sqrt2)$.
34.6 Scalar potential is $e^{xyz} + \cos xy + zx + C$.
34.7 Scale factors are $h_1 = h_2 = \sqrt{u^2 + v^2}$, $h_3 = uv$.
34.11 (b) div F = 2z.
34.12 (b) curl F = $2x\hat{\mathbf{i}} + (x - 2y)\hat{\mathbf{j}} + \hat{\mathbf{k}}$.
34.14 (a) $5r^2$; (b) 0; (c) $3r\mathbf{r}$; (d) 0; (e) 0; (f) $12r$.
34.18 0.
Chapter 35
35.1 (c) −2, −1, 0, 1, 2, 3, 4; (f) 1, 4, 9.
35.3 (c) A ∪ B = {−4, −3, −2, −1, 1, 2, 3, 4}.
35.4 (b) A ∩ B = {x | x ∈ ℝ and −5 ≤ x ≤ 2}. (d) A ∩ B = {1}.
35.5 (b) B\(A ∪ C); (d) (B ∩ D)\A.
35.6 (b) S₁\(S ∪ S₂ ∪ ··· ∪ S_r).
35.7 (b) [(A\A₁)\B₁] ∪ B₂.
35.10 A² = {(1, 1), (1, 2), (2, 1), (2, 2)}.
35.12 (b) 66.

Chapter 36
36.6 See the table below.
36.10 (a) (a * b) * (b ⊕ c); (d) (\A\\*\b\\9\a\\*B\ \)\ \ (c\ \*\d\)\.
36.15 (a) If a₁ represents the state of switch S₁, etc., then the switching function is (a₁ ⊕ a₂) ⊕ [(a₃ ⊕ a₄) * a₅].
36.16 See the table below.

Solution for 36.6
a b c   (a ⊕ b̄) * (a ⊕ c̄)
0 0 0   1
0 0 1   0
0 1 0   0
0 1 1   0
1 0 0   1
1 0 1   1
1 1 0   1
1 1 1   1

Solution for 36.16
a1 a2 a3   f
0  0  0    0
0  0  1    1
0  1  0    1
0  1  1    0
1  0  0    1
1  0  1    0
1  1  0    0
1  1  1    1

Chapter 37
37.3 Twenty are planar.
37.4 Five are connected.
37.8 Six not including reversed order.
37.13 There are three different paths between a and e.
37.14 Five vertices.
37.15 $i_1 = -\tfrac4{21}i_0$, $i_2 = -\tfrac1{21}i_0$, $i_3 = -\tfrac{10}{21}i_0$, $i_4 = -\tfrac{11}{21}i_0$, $i_5 = \tfrac1{21}i_0$, $i_6 = \tfrac4{21}i_0$, $i_7 = -\tfrac37 i_0$.
37.17 The transfer function is $Q = \dfrac{PG_1G_2G_3}{1 - G_2H_1 + G_1G_2G_3H_2}$.
37.19 (a) The transfer function is $Q = \dfrac{PG_1G_2G_3}{1 + G_2H_1 - G_1G_2G_3H_2}$.
37.20 (a) $\dfrac{G_1G_3}{(1 - G_1G_2H_2)(1 + G_3H_1)}$. (d) $\dfrac{G_1G_2G_3G_4}{1 + G_2G_3H_2} + \dfrac{G_5G_6G_7}{1 - H_1}$.
37.24 SAFT, length 12.
37.25 2 ties.
37.27 (b) Framework is overbraced.
37.28 Two ties.
37.29 Waiting times are 13T/3 and 4T.

Chapter 38
38.1 £1790.85, 0.487%.
38.2 (b) 16.9 years.
38.3 (b) 0, $(-1 \pm \sqrt{13})/6$.
38.6 $f(n) = (\ln n)/\ln 2$.
38.8 (b) $u_n = A3^n + B(-3)^n$. (c) $u_n = 3^n(A\cos\tfrac12 n\pi + B\sin\tfrac12 n\pi)$.
38.11 (a) (ii) $u_n = -\tfrac3{16}n + \tfrac18 n^2$. (b) (ii) $u_n = \tfrac15 n + \tfrac{11}{25}$. (c) (iii) $u_n = \tfrac12 n^2$. (d) (iii) $u_n = \tfrac1{18}n^2 3^n$.
38.13 $D_n(1) = n + 1$.
38.16 $u_n = \tfrac54 - \tfrac94(-\tfrac13)^n$.
38.17 $d_k = k(N - k)$.
38.19 $s_n = \tfrac14 n^2(1 + n)^2$.
38.22 $0 < \alpha < 1$.
38.23 Oscillates between 0.4953 and 0.8124.
38.24 The periodic values of the 2-cycle are 0.4 and 0.8.

Chapter 39
39.1 (d) 63.
39.2 The probability that the score is 7 or less is 7/12.
39.5 n(A ∪ B) = 10.
39.6 (b) Ace of clubs or ace of spades drawn; (d) any ace or any heart or any black card drawn; (f) any heart except the ace of hearts; (h) ace of hearts or any black card.
39.7 (b) 1/221; (b) 0.004 166… .
39.9 (b) 5040; (d) 7.
39.11 (a) 27 216; (c) 3360.
39.12 (b) 156 849.
39.15 270 725; 0.010 56… .
39.17 (a) 9/209; (c) 16/665; (d) 683/1463.
39.18 (b) 0.872; (c) 0.4.
39.19 (b) 0.37; (c) 0.82.
39.20 With the same probability of failure 0.98, probability that circuit fails is 0.963.
39.21 (b) 1/495; (c) 4/99.
39.22 Overall probability is approximately 1/53.7.
39.23 Mean number of plays to the end of the game is $2^{n-1}/n$.

Chapter 40
40.1 P(X ≤ 1) = 0.833.
40.2 P(X ≥ 6) = 1/32.
40.3 Mean = 4; standard deviation = 1.633.
40.4 Mean = (a + b)/2; standard deviation = $(b - a)/(2\sqrt3)$.
40.6 Mean number of non-faulty components to failure is 82.33; standard deviation of the number of components to failure is 82.83.
40.7 1/29.
40.9 Probability that a bottle fails the test is 0.000 67.
40.10 (a) 0.777; (c) 0.223.
40.11 (b) 0.528.
40.13 (b) P(Z ≤ 0.7) = 0.758.
40.14 On average 30% of operations take longer than 40 seconds.
40.15 Standard deviation of 1 if a = √5 and $A = 3/(20\sqrt5)$.
40.16 Maximum value of standard deviation is 121.6.
40.17 Probability that just two bulbs will be still working is 0.242.

Chapter 41
41.1 (b) Mean = 24.1; median = 24.5; interquartile range = 17.
41.3 Sample mean = 25.3; mode = 25.1; variance = 0.0644.
41.5 About 11 intervals.
41.6 Estimated variance of the sample is 1/12.
41.8 $k_1 = -1.1337$; $k_2 = 1.1337$.
41.9 For full data â = −0.0071; S = 18.76.
Appendices

A Some algebraical rules

(a) Index laws for real numbers
(i) $a^0 = 1$.
(ii) $a^pa^q = a^{p+q}$.
(iii) $a^{-p} = 1/a^p$.
(iv) $(a^p)^q$ or $(a^q)^p = a^{pq}$ (so $a^{p/q} = (a^p)^{1/q}$ or $(a^{1/q})^p$).
(v) $a^pb^p = (ab)^p$.
Conventionally, $a^{1/2}$ and $\sqrt a$ represent the positive root when we are talking about real numbers (for complex numbers, see Chapter 6). For all the rules to hold in all cases, a must be positive so that $a^{p/q}$ is always a real number. For example, $(-8)^{1/2}$ or $\sqrt{-8}$ is not real: there is no real number whose square is equal to −8. But $(-8)^{1/3}$ or $\sqrt[3]{-8} = -2$.

(b) Quadratic equations


$ax^2 + bx + c = 0$ has the solutions
$x_1, x_2 = [-b \pm \sqrt{b^2 - 4ac}\,]/2a$.
(i) In terms of $x_1$ and $x_2$, the factors are
$ax^2 + bx + c = a(x - x_1)(x - x_2)$.
(ii) Sum and product of solutions:
$x_1 + x_2 = -b/a$, $x_1x_2 = c/a$.
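The solution formula and rules (i) and (ii) can be checked numerically. The following is a minimal Python sketch (Python is our choice of language; the function name `quadratic_roots` is hypothetical). It uses `cmath.sqrt` so that a negative discriminant yields a complex pair rather than an error.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return (x1, x2), the two solutions of a*x**2 + b*x + c = 0 with a != 0.

    Uses x1, x2 = [-b ± sqrt(b**2 - 4ac)] / 2a; cmath.sqrt lets a negative
    discriminant give a complex-conjugate pair instead of raising an error.
    """
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    root_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)


# 2x^2 - 3x - 5 = (2x - 5)(x + 1), so the solutions are 2.5 and -1.
x1, x2 = quadratic_roots(2, -3, -5)
print(x1, x2)
# Rule (ii): x1 + x2 = -b/a = 3/2 and x1*x2 = c/a = -5/2.
print(x1 + x2, 3 / 2)
print(x1 * x2, -5 / 2)
```

For $x^2 + 1 = 0$ the same function returns the pair $\pm i$.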

(c) Binomial theorem

(i) If n is a positive integer (or whole number)
$(a + b)^n = a^n + na^{n-1}b + \dfrac{n(n-1)}{2!}a^{n-2}b^2 + \dfrac{n(n-1)(n-2)}{3!}a^{n-3}b^3 + \cdots + b^n = \sum_{r=0}^{n}\dbinom{n}{r}a^{n-r}b^r$,
where the binomial coefficients are denoted by
$\dbinom{n}{r} = \dfrac{n!}{(n-r)!\,r!}$.
There are (n + 1) terms in this sum, and it is symmetrical in a and b.
An important special case is
$(1 + x)^n = 1 + nx + \dfrac{n(n-1)}{2!}x^2 + \dfrac{n(n-1)(n-2)}{3!}x^3 + \cdots + x^n$.
(ii) Pascal’s triangle. Each entry (apart from the numerals 1) is the sum of two entries in the previous row: the one directly above it and the one above and to the left. For example, the 6 in the row n = 4 is 3 + 3 from the row n = 3:
n=1 1 1
n=2 1 2 1
n=3 1 3 3 1
n=4 1 4 6 4 1
and so on. Thus
$(1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4$.
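The addition rule in (ii) translates directly into a row-by-row generator. This Python sketch (the name `pascal_rows` is our own) pads the previous row with a zero at each end, so the entry "above and to the left" and the entry "above" are added pairwise and the outer entries stay equal to 1.

```python
def pascal_rows(n_max):
    """Yield the rows n = 0, 1, ..., n_max of Pascal's triangle.

    Each new entry is the sum of the entry above it and the entry above and
    to the left; padding with 0 at both ends keeps the outer entries at 1.
    """
    row = [1]
    for _ in range(n_max + 1):
        yield row
        row = [above_left + above
               for above_left, above in zip([0] + row, row + [0])]


for n, row in enumerate(pascal_rows(4)):
    print(f"n={n}", *row)
```

The last row printed, `n=4 1 4 6 4 1`, gives the coefficients of $(1 + x)^4$.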
(iii) Permutations and combinations (see Section 1.17).
${}^nP_r = \dfrac{n!}{(n-r)!}$, ${}^nC_r = \dfrac{n!}{(n-r)!\,r!}$.
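The two counting formulas can be evaluated exactly with integer arithmetic. A small Python sketch follows; `n_p_r` and `n_c_r` are hypothetical names for this illustration, while `math.perm` and `math.comb` are the standard-library equivalents available from Python 3.8.

```python
import math


def n_p_r(n, r):
    """Permutations nPr = n!/(n - r)!, using exact integer division."""
    return math.factorial(n) // math.factorial(n - r)


def n_c_r(n, r):
    """Combinations nCr = n!/((n - r)! r!), using exact integer division."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


print(n_p_r(5, 2), n_c_r(5, 2))          # 20 10
# The standard library offers the same counts directly (Python 3.8+).
print(math.perm(5, 2), math.comb(5, 2))  # 20 10
```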

(d) Factorization
$a^2 - b^2 = (a + b)(a - b)$,
$a^3 - b^3 = (a - b)(a^2 + ab + b^2)$,
$a^3 + b^3 = (a + b)(a^2 - ab + b^2)$.

(e) Constants
e = 2.718 281 82… ,
π = 3.141 592 65… ,
1 radian = 57.295 78… °,
1° = 0.017 45… radians,
360° = 2π radians.

(f) Sums of powers of integers

$\sum_{r=1}^{n} r = 1 + 2 + 3 + \cdots + n = \tfrac12 n(n + 1)$
$\sum_{r=1}^{n} r^2 = 1^2 + 2^2 + 3^2 + \cdots + n^2 = \tfrac16 n(n + 1)(2n + 1)$
$\sum_{r=1}^{n} r^3 = 1^3 + 2^3 + 3^3 + \cdots + n^3 = \tfrac14 n^2(n + 1)^2$.
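The three closed forms are easy to confirm by brute force. This Python sketch (the helper name is hypothetical) uses integer division, which is exact here because $n(n+1)$ is always even and $n(n+1)(2n+1)$ is always divisible by 6; it writes $\tfrac14 n^2(n+1)^2$ as $[\tfrac12 n(n+1)]^2$.

```python
def check_power_sums(n):
    """Compare brute-force sums of r, r^2, r^3 (r = 1..n) with the closed forms."""
    rs = range(1, n + 1)
    return (
        sum(rs) == n * (n + 1) // 2,
        sum(r**2 for r in rs) == n * (n + 1) * (2 * n + 1) // 6,
        sum(r**3 for r in rs) == (n * (n + 1) // 2) ** 2,  # = n^2 (n+1)^2 / 4
    )


print(all(all(check_power_sums(n)) for n in range(1, 200)))  # True
```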
r =1

B Trigonometric formulae
(a) Relation between trigonometric functions
$\sin^2A + \cos^2A = 1$,
tan A = sin A/cos A; sec A = 1/cos A; cosec A = 1 /sin A.

(b) Addition formulae


$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B$,
$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B$,
$\tan(A \pm B) = (\tan A \pm \tan B)/(1 \mp \tan A\tan B)$.
(c) Addition formulae: special cases
$\sin 2A = 2\sin A\cos A$,
$\cos 2A = \cos^2A - \sin^2A = 2\cos^2A - 1 = 1 - 2\sin^2A$,
$\tan 2A = 2\tan A/(1 - \tan^2A)$,
$\sin 3A = 3\sin A - 4\sin^3A$,
$\cos 3A = 4\cos^3A - 3\cos A$.

(d) Product formulae


$\sin A\sin B = \tfrac12[\cos(A - B) - \cos(A + B)]$,
$\cos A\cos B = \tfrac12[\cos(A - B) + \cos(A + B)]$,
$\sin A\cos B = \tfrac12[\sin(A - B) + \sin(A + B)]$,
$\sin C + \sin D = 2\sin\tfrac12(C + D)\cos\tfrac12(C - D)$,
$\sin C - \sin D = 2\sin\tfrac12(C - D)\cos\tfrac12(C + D)$,
$\cos C + \cos D = 2\cos\tfrac12(C + D)\cos\tfrac12(C - D)$,
$\cos C - \cos D = -2\sin\tfrac12(C + D)\sin\tfrac12(C - D)$.

(e) Product formulae: special cases


$\sin^2A = \tfrac12(1 - \cos 2A)$,
$\cos^2A = \tfrac12(1 + \cos 2A)$,
$\sin^3A = \tfrac14(3\sin A - \sin 3A)$,
$\cos^3A = \tfrac14(3\cos A + \cos 3A)$.

(f) Triangle formulae


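(i) $\alpha + \beta + \gamma = 180°$.
(ii) Cosine rule: $a^2 = b^2 + c^2 - 2bc\cos\alpha$.
(iii) Sine rule: $\dfrac{\sin\alpha}{a} = \dfrac{\sin\beta}{b} = \dfrac{\sin\gamma}{c}$.
[Figure: triangle with angles α, β, γ opposite the sides a, b, c respectively]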

(g) Trigonometric equations


In the following, n represents any integer (i.e. any whole number, positive or negative); x is in radians.
(i) sin x = 0 and tan x = 0 when x = nπ; cos x = 0 when x = ½π + nπ.
(ii) The following formulae show how to obtain all the solutions of certain equations when one solution has been obtained (e.g. a hand calculator or a computer gives only one solution of sin x = −½, namely x = arcsin(−½) = −0.5236…).
If sin α = c, then all the solutions of sin x = c are $x = n\pi + (-1)^n\alpha$.
If cos β = c, then all the solutions of cos x = c are $x = 2n\pi \pm \beta$.
If tan γ = c, then all the solutions of tan x = c are $x = n\pi + \gamma$.
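The three general-solution formulas can be turned into generators of solutions over a chosen range of integers n. In this Python sketch (function names are our own), the one solution supplied by `math.asin`, `math.acos` or `math.atan` plays the role of α, β or γ.

```python
import math


def solutions_sin(c, ns):
    """Solutions x = n*pi + (-1)**n * alpha of sin x = c, for each n in ns."""
    alpha = math.asin(c)                 # one solution (principal value)
    return [n * math.pi + (-1) ** n * alpha for n in ns]


def solutions_cos(c, ns):
    """Solutions x = 2*n*pi ± beta of cos x = c, for each n in ns."""
    beta = math.acos(c)
    return [2 * n * math.pi + sign * beta for n in ns for sign in (1, -1)]


def solutions_tan(c, ns):
    """Solutions x = n*pi + gamma of tan x = c, for each n in ns."""
    gamma = math.atan(c)
    return [n * math.pi + gamma for n in ns]


for x in solutions_sin(-0.5, range(-2, 3)):
    print(f"x = {x:+.4f}   sin x = {math.sin(x):+.6f}")
```

Every printed value of sin x equals −0.5 to the displayed precision, confirming that the formula produces solutions other than the calculator's −0.5236….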

(h) Hyperbolic functions


$\cosh x = \tfrac12(e^x + e^{-x})$; $\sinh x = \tfrac12(e^x - e^{-x})$; $\tanh x = \sinh x/\cosh x$;
$\operatorname{sech} x = 1/\cosh x$; $\coth x = \cosh x/\sinh x$; $\operatorname{cosech} x = 1/\sinh x$,
$\sinh(x \pm y) = \sinh x\cosh y \pm \cosh x\sinh y$,
$\cosh(x \pm y) = \cosh x\cosh y \pm \sinh x\sinh y$,
$\cosh^2x - \sinh^2x = 1$,
$\sinh 2x = 2\sinh x\cosh x$,
$\cosh 2x = \cosh^2x + \sinh^2x$,
$\cosh ix = \cos x$; $\sinh ix = i\sin x$;
$\sinh^{-1}x = \ln[x + (x^2 + 1)^{1/2}]$,
$\cosh^{-1}x = \ln[x + (x^2 - 1)^{1/2}]$ $(x \ge 1)$,
$\tanh^{-1}x = \tfrac12\ln[(1 + x)/(1 - x)]$ $(-1 < x < 1)$.
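The logarithmic forms of the inverse hyperbolic functions can be compared with Python's built-in `math.asinh`, `math.acosh` and `math.atanh`. The helper names below are hypothetical. As a caution, the formula for $\sinh^{-1}x$ loses accuracy for large negative x, because $x + \sqrt{x^2+1}$ then involves cancellation; library routines avoid this.

```python
import math


def asinh_log(x):
    """sinh^-1 x = ln[x + (x^2 + 1)^(1/2)]."""
    return math.log(x + math.sqrt(x * x + 1))


def acosh_log(x):
    """cosh^-1 x = ln[x + (x^2 - 1)^(1/2)], valid for x >= 1."""
    return math.log(x + math.sqrt(x * x - 1))


def atanh_log(x):
    """tanh^-1 x = (1/2) ln[(1 + x)/(1 - x)], valid for -1 < x < 1."""
    return 0.5 * math.log((1 + x) / (1 - x))


for x in (0.0, 0.5, 2.0):
    print(x, asinh_log(x), math.asinh(x))
print(acosh_log(2.0), math.acosh(2.0))
print(atanh_log(0.5), math.atanh(0.5))
```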

C Areas and volumes


(a) The area of a triangle is $\tfrac12 bh$, where b is the length of one side and h its height from that side.
(b) The circumference of a circle is $2\pi r$, where r is its radius.
(c) The area of a circle is $\pi r^2$, where r is its radius.
(d) The area of a circle sector is $\tfrac12 r^2\theta$, where r is its radius and θ the angle of the sector in radians.
(e) The volume of a sphere is $\tfrac43\pi r^3$, where r is its radius.
(f) The surface area of a sphere is $4\pi r^2$, where r is its radius.
(g) The volume of a cone is $\tfrac13 Ah$, where h is its height and A the cross-sectional area of its base.
(h) The area of an ellipse is $\pi ab$, where a and b are the lengths of its semi-axes.
(i) The area of a regular n-sided polygon of side-length a is $\tfrac14 na^2\cot(\pi/n)$.

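D A table of derivatives

$y$    $dy/dx$

c (constant)    0
$x^n$ (n any constant)    $nx^{n-1}$
$e^{ax}$    $ae^{ax}$
$k^x$ $(k > 0)$    $k^x\ln k$
$\ln x$ $(x > 0)$    $x^{-1}$
$\sin ax$    $a\cos ax$
$\cos ax$    $-a\sin ax$
$\tan ax$    $a/\cos^2ax$
$\cot ax$    $-a/\sin^2ax$
$\sec ax$    $(a\sin ax)/\cos^2ax$
$\operatorname{cosec} ax$    $-(a\cos ax)/\sin^2ax$
$\arcsin ax$    $a/(1 - a^2x^2)^{1/2}$
$\arccos ax$    $-a/(1 - a^2x^2)^{1/2}$
$\arctan ax$    $a/(1 + a^2x^2)$
$\sinh ax$    $a\cosh ax$
$\cosh ax$    $a\sinh ax$
$\tanh ax$    $a/\cosh^2ax$
$\sinh^{-1}ax$    $a/(1 + a^2x^2)^{1/2}$
$\cosh^{-1}ax$    $a/(a^2x^2 - 1)^{1/2}$
$\tanh^{-1}ax$    $a/(1 - a^2x^2)$
$u(x)v(x)$    $u\dfrac{dv}{dx} + v\dfrac{du}{dx}$
$\dfrac{u(x)}{v(x)}$    $\dfrac{1}{v^2}\left(v\dfrac{du}{dx} - u\dfrac{dv}{dx}\right)$
$\dfrac{1}{v(x)}$    $-\dfrac{1}{v^2}\dfrac{dv}{dx}$
$y(u(x))$    $\dfrac{dy}{du}\dfrac{du}{dx}$
$y(v(u(x)))$    $\dfrac{dy}{dv}\dfrac{dv}{du}\dfrac{du}{dx}$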

E Tables of indefinite and definite integrals

$f(x)$    $\int f(x)\,dx$ (C is an arbitrary constant.)
$x^m$ $(m \ne -1)$    $\dfrac{1}{m+1}x^{m+1} + C$
$x^{-1}$    $\ln|x| + C$, or $\ln|Cx|$
$e^{ax}$    $(1/a)e^{ax} + C$
$k^x$ $(k > 0)$    $k^x/\ln k + C$
$\ln x$ $(x > 0)$    $x\ln x - x + C$
$\sin ax$    $-(1/a)\cos ax + C$
$\cos ax$    $(1/a)\sin ax + C$
$\tan ax$    $-(1/a)\ln|\cos ax| + C$ or $-(1/a)\ln|C\cos ax|$
$\cot ax$    $(1/a)\ln|\sin ax| + C$ or $(1/a)\ln|C\sin ax|$
$\sec ax$    $-(1/2a)\ln[(1 - \sin ax)/(1 + \sin ax)] + C$
$\operatorname{cosec} ax$    $(1/2a)\ln[(1 - \cos ax)/(1 + \cos ax)] + C$
$\arcsin ax$    $(1/a)(1 - a^2x^2)^{1/2} + x\arcsin ax + C$
$\arccos ax$    $-(1/a)(1 - a^2x^2)^{1/2} + x\arccos ax + C$
$\arctan ax$    $-(1/a)\ln(1 + a^2x^2)^{1/2} + x\arctan ax + C$
$\sinh ax$    $(1/a)\cosh ax + C$
$\cosh ax$    $(1/a)\sinh ax + C$
$\tanh ax$    $(1/a)\ln(\cosh ax) + C$
$1/(x^2 + a^2)$    $(1/a)\arctan(x/a) + C$
$1/(x^2 - a^2)$    $(1/2a)\ln|(x - a)/(x + a)| + C$ or $-(1/a)\tanh^{-1}(x/a) + C$ $(|x| < a)$
$1/(a^2 - x^2)^{1/2}$    $\arcsin(x/a) + C$ (or $-\arccos(x/a) + C$)
$1/(a^2 + x^2)^{1/2}$    $\sinh^{-1}(x/a) + C$ or $\ln[x + (x^2 + a^2)^{1/2}] + C$
$1/(x^2 - a^2)^{1/2}$    $\ln[x + (x^2 - a^2)^{1/2}] + C$
$xe^{ax}$    $(1/a^2)(ax - 1)e^{ax} + C$
$x\cos ax$    $(1/a^2)(\cos ax + ax\sin ax) + C$
$x\sin ax$    $(1/a^2)(\sin ax - ax\cos ax) + C$
$x\ln x$    $\tfrac12 x^2\ln x - \tfrac14 x^2 + C$
$e^{ax}\cos bx$    $[1/(a^2 + b^2)]e^{ax}(a\cos bx + b\sin bx) + C$
$e^{ax}\sin bx$    $[1/(a^2 + b^2)]e^{ax}(-b\cos bx + a\sin bx) + C$
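A table of definite integrals

$\int_0^\infty \dfrac{dx}{a^2 + x^2} = \dfrac{\pi}{2a}$

$\int_0^{\pi/2}\sin x\,dx = \int_0^{\pi/2}\cos x\,dx = 1$

$\int_0^\pi \sin mx\sin nx\,dx = \begin{cases}0, & m \ne n\\ \tfrac12\pi, & m = n\end{cases}$ (m, n positive integers)

$\int_0^\pi \cos mx\cos nx\,dx = \begin{cases}0, & m \ne n\\ \tfrac12\pi, & m = n\end{cases}$ (m, n positive integers)

$\int_0^\pi \sin mx\cos nx\,dx = \begin{cases}0, & m + n \text{ even}\\ 2m/(m^2 - n^2), & m + n \text{ odd}\end{cases}$ (m, n positive integers)

$\int_0^\infty x^ne^{-x}\,dx = n!$ (n = 0, 1, 2, …)

$\int_0^\infty e^{-ax^2}\,dx = \tfrac12\sqrt{\pi/a}$ $(a > 0)$

$\int_0^\infty e^{-ax}\cos bx\,dx = \dfrac{a}{a^2 + b^2}$ $(a > 0)$

$\int_0^\infty e^{-ax}\sin bx\,dx = \dfrac{b}{a^2 + b^2}$ $(a > 0)$

Gradshteyn and Ryzhik (1994) is a useful source of hundreds of indefinite and definite integrals.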
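Entries in this table can be spot-checked numerically. The sketch below applies the composite Simpson's rule (our choice of quadrature; the function name `simpson` and the truncation point x = 40 for the infinite ranges are assumptions of this illustration, justified because $e^{-40}$ is negligible).

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b], n even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# sin mx cos nx over [0, pi] with m + n odd: the table gives 2m/(m^2 - n^2).
m_val, n_val = 3, 2
approx = simpson(lambda x: math.sin(m_val * x) * math.cos(n_val * x), 0, math.pi)
print(approx, 2 * m_val / (m_val**2 - n_val**2))

# Infinite ranges truncated at x = 40, where e^(-ax) and e^(-ax^2) are negligible.
a, b = 1.0, 2.0
print(simpson(lambda x: math.exp(-a * x) * math.cos(b * x), 0, 40, 4000), a / (a**2 + b**2))
print(simpson(lambda x: math.exp(-a * x * x), 0, 40, 4000), 0.5 * math.sqrt(math.pi / a))
```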

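F Laplace transforms, inverses, and rules

In the following tables, n represents a positive integer or zero, and m a positive integer. The constants k and c are arbitrary unless otherwise indicated.

Transforms, $F(s) = \int_0^\infty e^{-st}f(t)\,dt$:
$t^n \leftrightarrow \dfrac{n!}{s^{n+1}}$
$e^{kt} \leftrightarrow \dfrac{1}{s-k}$
$t^ne^{kt} \leftrightarrow \dfrac{n!}{(s-k)^{n+1}}$
$\cos kt \leftrightarrow \dfrac{s}{s^2+k^2}$
$\sin kt \leftrightarrow \dfrac{k}{s^2+k^2}$
$t\cos kt \leftrightarrow \dfrac{s^2-k^2}{(s^2+k^2)^2}$
$t\sin kt \leftrightarrow \dfrac{2ks}{(s^2+k^2)^2}$
$H(t-c)$ $(c \ge 0)$ $\leftrightarrow e^{-cs}/s$
$\delta(t-c)$ $(c \ge 0)$ $\leftrightarrow e^{-cs}$

Inverses, $F(s) \leftrightarrow f(t)$:
$\dfrac{1}{s^m} \leftrightarrow \dfrac{t^{m-1}}{(m-1)!}$
$\dfrac{1}{s-k} \leftrightarrow e^{kt}$
$\dfrac{1}{(s-k)^m} \leftrightarrow \dfrac{t^{m-1}e^{kt}}{(m-1)!}$
$\dfrac{s}{s^2+k^2} \leftrightarrow \cos kt$
$\dfrac{1}{s^2+k^2} \leftrightarrow \dfrac1k\sin kt$
$\dfrac{s^2-k^2}{(s^2+k^2)^2} \leftrightarrow t\cos kt$
$\dfrac{s}{(s^2+k^2)^2} \leftrightarrow \dfrac{1}{2k}t\sin kt$
$e^{-cs}/s$ $(c \ge 0)$ $\leftrightarrow H(t-c)$
$e^{-cs}$ $(c \ge 0)$ $\leftrightarrow \delta(t-c)$

Summary of rules: In the following rules, $F(s) \leftrightarrow f(t)$.

Scale rule (24.5): $f(kt) \leftrightarrow \dfrac1k F\!\left(\dfrac sk\right)$ and $F(ks) \leftrightarrow \dfrac1k f\!\left(\dfrac tk\right)$ $(k > 0)$.
Shift rule, or multiplication by $e^{kt}$ (24.7): If k is any constant, $e^{kt}f(t) \leftrightarrow F(s-k)$.
Powers of t (24.8): If n is a positive integer, then $t^nf(t) \leftrightarrow (-1)^n\dfrac{d^nF(s)}{ds^n}$.
Derivatives (24.12): $\dfrac{df(t)}{dt} \leftrightarrow sF(s) - f(0)$, $\dfrac{d^2f(t)}{dt^2} \leftrightarrow s^2F(s) - sf(0) - f'(0)$.
Delay rule (24.15): If $c \ge 0$, then $e^{-cs}F(s) \leftrightarrow f(t-c)H(t-c)$ (where H is the Heaviside unit function).
1/s as an integration operator (25.1): If $F(s) \leftrightarrow f(t)$, then $\dfrac1s F(s) \leftrightarrow \int_0^t f(\tau)\,d\tau$.
Convolution theorem (25.11): If $g(t) \leftrightarrow G(s)$ and $f(t) \leftrightarrow F(s)$, then
$F(s)G(s) \leftrightarrow \int_0^t g(t-\tau)f(\tau)\,d\tau \;\left(= \int_0^t g(\tau)f(t-\tau)\,d\tau\right)$.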
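A few of the transform pairs can be confirmed by evaluating the defining integral $F(s) = \int_0^\infty e^{-st}f(t)\,dt$ numerically. This Python sketch (the name `laplace_numeric` and the cut-off `t_max = 60` are assumptions; the cut-off is harmless when $e^{-st}f(t)$ is negligible there) uses the composite Simpson's rule and compares the results with the table for s = 2, k = 3.

```python
import math


def laplace_numeric(f, s, t_max=60.0, n=60000):
    """Approximate F(s) = integral from 0 to infinity of e^(-st) f(t) dt.

    Composite Simpson's rule on [0, t_max] with n (even) subintervals; the
    infinite range is simply truncated at t_max.
    """
    h = t_max / n

    def g(t):
        return math.exp(-s * t) * f(t)

    total = g(0.0) + g(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


s, k = 2.0, 3.0
checks = {
    "t^3":      (lambda t: t**3, math.factorial(3) / s**4),
    "t sin kt": (lambda t: t * math.sin(k * t), 2 * k * s / (s**2 + k**2) ** 2),
    "t cos kt": (lambda t: t * math.cos(k * t), (s**2 - k**2) / (s**2 + k**2) ** 2),
}
for name, (f, exact) in checks.items():
    print(f"{name:9s} numeric={laplace_numeric(f, s):.8f} table={exact:.8f}")
```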

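G Exponential Fourier transforms and rules

Signal (time function) $\leftrightarrow$ Transform (frequency distribution)

Fourier transform pair: $x(t) = \int_{-\infty}^{\infty}X(f)e^{i2\pi ft}\,df$, $X(f) = \int_{-\infty}^{\infty}x(t)e^{-i2\pi ft}\,dt$
Linearity: $Ax_1(t) + Bx_2(t) \leftrightarrow AX_1(f) + BX_2(f)$
Time scaling: $x(At) \leftrightarrow |A|^{-1}X(A^{-1}f)$
Time reversal: $x(-t) \leftrightarrow X(-f)$
Time delay: $x(t-B) \leftrightarrow X(f)e^{-i2\pi Bf}$
Frequency scaling: $|C|^{-1}x(C^{-1}t) \leftrightarrow X(Cf)$
Frequency shift: $x(t)e^{i2\pi Dt} \leftrightarrow X(f-D)$
Modulation: $x(t)\cos 2\pi Kt \leftrightarrow \tfrac12[X(f+K) + X(f-K)]$; $x(t)\sin 2\pi Kt \leftrightarrow \tfrac12 i[X(f+K) - X(f-K)]$
Differentiation: $dx(t)/dt \leftrightarrow (i2\pi f)X(f)$; $d^nx(t)/dt^n \leftrightarrow (i2\pi f)^nX(f)$
Duality: $X(t) \leftrightarrow x(-f)$
Convolution: $x_1(t) * x_2(t) = \int_{-\infty}^{\infty}x_1(u)x_2(t-u)\,du = \int_{-\infty}^{\infty}x_1(t-u)x_2(u)\,du \leftrightarrow X_1(f)X_2(f)$
Multiplication: $x_1(t)x_2(t) \leftrightarrow \int_{-\infty}^{\infty}X_1(f-v)X_2(v)\,dv = \int_{-\infty}^{\infty}X_1(v)X_2(f-v)\,dv$
Periodic function $x_P(t)$ (period T): $x_P(t) \leftrightarrow \sum_{n=-\infty}^{\infty}X_n\delta(f - nf_0)$, where $f_0 = 1/T$ and $X_n = f_0\int_{\text{period}}x_P(t)e^{-2\pi inf_0t}\,dt$

Short table of Fourier transforms

Signal $\leftrightarrow$ Transform
$\Pi(t) = H(t + \tfrac12) - H(t - \tfrac12) \leftrightarrow \operatorname{sinc} f$
$\operatorname{sinc} t \leftrightarrow \Pi(f)$
$e^{-\pi t^2} \leftrightarrow e^{-\pi f^2}$
$\Lambda(t) = \begin{cases}1 + t, & -1 \le t \le 0\\ 1 - t, & 0 \le t \le 1\\ 0, & \text{elsewhere}\end{cases} \leftrightarrow \operatorname{sinc}^2 f$
$\operatorname{sinc}^2 t \leftrightarrow \Lambda(f)$
$e^{-t}H(t) \leftrightarrow 1/(1 + i2\pi f)$
$te^{-t}H(t) \leftrightarrow 1/(1 + i2\pi f)^2$
$e^{-|t|} \leftrightarrow 2/(1 + 4\pi^2f^2)$
$\dfrac{1}{1 + t^2} \leftrightarrow \pi e^{-2\pi|f|}$
$\delta(t) \leftrightarrow 1$
$1 \leftrightarrow \delta(f)$
$\cos 2\pi f_0t \leftrightarrow \tfrac12[\delta(f + f_0) + \delta(f - f_0)]$
$\sin 2\pi f_0t \leftrightarrow \tfrac12 i[\delta(f + f_0) - \delta(f - f_0)]$
$\mathrm{III}_T(t) = \sum_{n=-\infty}^{\infty}\delta(t - nT)$ $(T > 0)$ $\leftrightarrow f_0\,\mathrm{III}_{f_0}(f)$ $(f_0 = 1/T)$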

H Probability distributions and tables

(a) Distributions, means, and variances
(i) Discrete distributions
Distribution: probability; mean (µ); variance (σ²)

Binomial: $\dfrac{n!}{(n-r)!\,r!}p^rq^{n-r}$; $np$; $np(1-p)$
Geometric: $(1-p)^{r-1}p$; $\dfrac1p$; $\dfrac{1-p}{p^2}$
Poisson: $\dfrac{\lambda^ne^{-\lambda}}{n!}$; $\lambda$; $\lambda$
Pascal: ${}^{r-1}C_{k-1}\,p^k(1-p)^{r-k}$; $\dfrac kp$; $\dfrac{k(1-p)}{p^2}$
Hypergeometric: $\dfrac{{}^wC_r\,{}^bC_{n-r}}{{}^{w+b}C_n}$; $\dfrac{nw}{w+b}$; $\dfrac{nwb(w+b-n)}{(w+b)^2(w+b-1)}$

(ii) Continuous distributions
Distribution: density; mean (µ); variance (σ²)

Exponential: $\begin{cases}\lambda e^{-\lambda x}, & x \ge 0\\ 0, & x < 0\end{cases}$; $\dfrac1\lambda$; $\dfrac1{\lambda^2}$
Uniform: $\begin{cases}1/(b-a), & a \le x \le b\\ 0, & \text{elsewhere}\end{cases}$; $\tfrac12(a+b)$; $\tfrac1{12}(b-a)^2$
Standardized normal: $\dfrac{1}{\sqrt{2\pi}}e^{-\frac12x^2}$; 0; 1
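The tabulated means and variances can be checked by summing $r\,P(r)$ and $r^2P(r)$ directly. This Python sketch (function names are our own; `math.comb` needs Python 3.8+) does so for the binomial and hypergeometric cases, where the hypergeometric probability counts r white balls in a sample of n drawn without replacement from w white and b black.

```python
import math


def moments(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(r * p for r, p in pmf.items())
    var = sum(r * r * p for r, p in pmf.items()) - mean**2
    return mean, var


def binomial_pmf(n, p):
    q = 1.0 - p
    return {r: math.comb(n, r) * p**r * q**(n - r) for r in range(n + 1)}


def hypergeometric_pmf(w, b, n):
    """P(r white) when n balls are drawn without replacement from w white, b black."""
    total = math.comb(w + b, n)
    # math.comb(w, r) is 0 when r > w, so impossible values get probability 0.
    return {r: math.comb(w, r) * math.comb(b, n - r) / total for r in range(n + 1)}


n, p = 12, 0.3
print(moments(binomial_pmf(n, p)), (n * p, n * p * (1 - p)))

w, b, n = 5, 7, 4
N = w + b
print(moments(hypergeometric_pmf(w, b, n)),
      (n * w / N, n * w * b * (N - n) / (N**2 * (N - 1))))
```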

(b) Cumulative normal distribution tables

Standardized cumulative normal distribution giving the values of
$\Phi(x) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac12t^2}\,dt$
for $0 \le x \le 3.0$ at 0.01 intervals. For $x < 0$, $\Phi(x)$ can be calculated from $\Phi(x) = 1 - \Phi(-x)$.
(The column headings give the second decimal place of x.)

x 0 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
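Entries in the table above can be regenerated from the error function, since $\Phi(x) = \tfrac12[1 + \operatorname{erf}(x/\sqrt2)]$. A short Python sketch using `math.erf` (the name `phi` is our own); Python 3.8+ also offers `statistics.NormalDist().cdf` for the same quantity.

```python
import math


def phi(x):
    """Standardized cumulative normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (0.0, 1.0, 1.79, 1.90, 1.96, 3.0):
    print(f"{x:4.2f}  {phi(x):.4f}")
# For negative arguments: Phi(-x) = 1 - Phi(x).
print(f"{phi(-1.0):.4f}", f"{1 - phi(1.0):.4f}")
```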

Table giving x for specified values of Φ(x) for 0.50 ≤ Φ(x) ≤ 0.99 at 0.01 intervals

Φ (x) x Φ (x) x Φ (x) x


0.50 0.0000 0.67 0.4399 0.84 0.9945
0.51 0.0251 0.68 0.4677 0.85 1.0364
0.52 0.0502 0.69 0.4959 0.86 1.0803
0.53 0.0753 0.70 0.5244 0.87 1.1264
0.54 0.1004 0.71 0.5534 0.88 1.1750
0.55 0.1257 0.72 0.5828 0.89 1.2265
0.56 0.1510 0.73 0.6128 0.90 1.2816
0.57 0.1764 0.74 0.6433 0.91 1.3408
0.58 0.2019 0.75 0.6745 0.92 1.4051
0.59 0.2275 0.76 0.7063 0.93 1.4758
0.60 0.2533 0.77 0.7388 0.94 1.5548
0.61 0.2793 0.78 0.7722 0.95 1.6449
0.62 0.3055 0.79 0.8064 0.96 1.7507
0.63 0.3319 0.80 0.8416 0.97 1.8808
0.64 0.3585 0.81 0.8779 0.98 2.0537
0.65 0.3853 0.82 0.9154 0.99 2.3263
0.66 0.4125 0.83 0.9542
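The inverse table can be reproduced by solving $\Phi(x) = p$ numerically. Because Φ is increasing, simple bisection is enough. This Python sketch (names and the search interval [−10, 10] are our own assumptions) computes Φ via `math.erf`; Python 3.8+ also provides `statistics.NormalDist().inv_cdf` for this purpose.

```python
import math


def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_inverse(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve Phi(x) = p for x by bisection (Phi is strictly increasing)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.73, 0.95, 0.99):
    print(p, round(phi_inverse(p), 4))
```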

I Dimensions and units

Physical quantities of different types, such as acceleration, force, momentum,
electrical potential, can be classified by expressing them as simple combinations
of certain primary dimensions such as mass, length and time. These expressions
determine how we can state the magnitude of a physical quantity – for example
any velocity can be expressed in metres per second, but never in metres per
kilogram. Five primary dimensions provide a basis sufficient for all common
purposes. Their names, the algebraic symbols denoting their dimension, and
appropriate units (the international (SI) system) are shown in the following table.

Basic quantity Dimension symbol SI unit Unit symbol

length L metre m
mass M kilogram kg
time T second s
electric current I ampere A
absolute temperature θ kelvin K

We can now assign dimensions to any derived physical quantity, which classifies
it without indicating its magnitude. For example, the velocity at any moment of any
particle, substance, electromagnetic wave, etc., could in principle be measured as
(a distance travelled)/(time taken). Symbolically we write:
[velocity] = LT−1,
where the square brackets mean ‘the dimension of’. The form LT−1 of the right-
hand side indicates that an appropriate SI unit of measurement would be metres
per second. The following table comprises mechanical and electromagnetic
quantities, their dimensions, and conventional SI terms for certain special units
of measurement. Notice how known dimensional forms may be multiplied and
divided to obtain more complicated ones.

Derived quantity Dimension SI unit Symbol Name

angle dimensionless – rad radian


velocity (displacement/unit time) LT−1 m s−1 – –
acceleration (velocity/unit time) LT−2 m s−2 – –
force (mass × acceleration) MLT−2 kg m s−2 N newton
momentum (mass × velocity) MLT−1 kg m s−1 – –
moment of momentum ML2T−1 kg m2 s−1 – –
pressure (force/unit area) ML−1T−2 N m−2 Pa pascal
work, energy (force × distance) ML2T−2 Nm J joule
area L2 m2 – –
volume L3 m3 – –
power (work/unit time) ML2T−3 J s−1 W watt
angular frequency (radian/unit time) T−1 s−1 Hz hertz
charge (current × time) IT As C coulomb
potential (work/unit charge) ML2T−3I−1 J C−1 V volt
resistance (potential/unit current) ML2T−3I−2 V A−1 Ω ohm
magnetic flux (work/unit current) ML2T−2I−1 J A−1 Wb weber
inductance (magnetic flux/unit current) ML2T−2I−2 Wb A−1 H henry
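The remark that dimensional forms may be multiplied and divided can be made mechanical by storing each dimension as a vector of exponents, so that multiplying quantities adds exponents and dividing subtracts them. This Python sketch (the `Dim` tuple and helper names are hypothetical, not a units library) rebuilds several rows of the table from length, mass, time and current.

```python
from collections import namedtuple

# Exponents of the primary dimensions M, L, T, I.
Dim = namedtuple("Dim", "M L T I")


def mul(a, b):
    """Dimension of a product: exponents add."""
    return Dim(*(x + y for x, y in zip(a, b)))


def div(a, b):
    """Dimension of a quotient: exponents subtract."""
    return Dim(*(x - y for x, y in zip(a, b)))


mass, length = Dim(1, 0, 0, 0), Dim(0, 1, 0, 0)
time, current = Dim(0, 0, 1, 0), Dim(0, 0, 0, 1)

velocity = div(length, time)
acceleration = div(velocity, time)
force = mul(mass, acceleration)
energy = mul(force, length)            # work = force x distance
charge = mul(current, time)
potential = div(energy, charge)        # work / unit charge
resistance = div(potential, current)
flux = div(energy, current)            # work / unit current
inductance = div(flux, current)

print("potential", potential)     # potential Dim(M=1, L=2, T=-3, I=-1)
print("resistance", resistance)   # resistance Dim(M=1, L=2, T=-3, I=-2)
print("inductance", inductance)   # inductance Dim(M=1, L=2, T=-2, I=-2)
```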
For comprehensive tables of units and constants, consult Kaye and Laby (1995).
If two physically meaningful expressions are equal, then both sides must obviously
have the same physical dimensions. This often provides a useful check on a
calculation. Also, in any expression containing the sum of two or more terms, the
terms must all have the same dimensions if it is to make any physical sense. For
example, expressions equivalent to the form (energy + momentum), or (current +
voltage) can have no physical significance. However, in such cases the dimensions
of any letters used as constant factors must not be overlooked: the expression
(momentum + (k × energy)) could be meaningful provided [k] = TL−1.
The dimensions of quantities that appear as derivatives and integrals are treated in the following way. Suppose for example that t is time ([t] = T) and x(t) is a function representing displacement ([x] = L). Then
$\left[\dfrac{dx}{dt}\right] = LT^{-1}$, $\left[\dfrac{d^2x}{dt^2}\right] = LT^{-2}$,
and so on. Also
$\left[\int_a^b x(t)\,dt\right] = LT$, and $\left[\int_a^d t\,dt\right] = T^2$.
These follow from the definition of the integral as a sum.


Dimensional analysis is helpful in checking the validity of equations. For example, in the pendulum equation (see eqn 20.22)
$\dfrac{d^2\theta}{dt^2} + \dfrac{g}{l}\sin\theta = 0$
all terms should have the same dimensions, which is true since
$\left[\dfrac{d^2\theta}{dt^2}\right] = T^{-2}$ and $\left[\dfrac{g}{l}\sin\theta\right] = LT^{-2}L^{-1} = T^{-2}$,
where g is the acceleration due to gravity, l is the length of the pendulum, and the angle θ and sin θ are dimensionless. Physically the equation
$\dfrac{d^2\theta}{dt^2} + \dfrac{g}{l^2}\sin\theta = 0$,
with the same definition of symbols, could not represent a general physical law because the dimensions of the two terms are different.
Dimensional analysis also indicates how equations can be simplified by making them dimensionless. In the pendulum equation above, let $\tau = t\sqrt{g/l}$. Then the dimensionless pendulum equation becomes
$\dfrac{d^2\theta}{d\tau^2} + \sin\theta = 0$,
which includes pendulums of all lengths, in any uniform gravitational field.
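The pendulum check can be carried out with exponent vectors over M, L and T. This Python sketch (all names are hypothetical) uses `fractions.Fraction` so that the square root in $\sqrt{l/g}$ gives the exact exponent ½. It confirms that both terms of the correct equation have dimension $T^{-2}$, that the $g/l^2$ version does not balance, and that $\sqrt{l/g}$ is a time, which is why $\tau = t\sqrt{g/l}$ is dimensionless.

```python
from fractions import Fraction

BASE = ("M", "L", "T")   # only mass, length and time are needed here


def dim(**exponents):
    """Dimension as a tuple of exponents over BASE, e.g. dim(L=1, T=-2) for LT^-2."""
    return tuple(Fraction(exponents.get(b, 0)) for b in BASE)


def mul(a, b):
    return tuple(x + y for x, y in zip(a, b))


def div(a, b):
    return tuple(x - y for x, y in zip(a, b))


def power(a, p):
    return tuple(x * Fraction(p) for x in a)


DIMENSIONLESS = dim()              # theta and sin(theta)
length, time = dim(L=1), dim(T=1)
g = dim(L=1, T=-2)                 # acceleration due to gravity

term1 = div(DIMENSIONLESS, power(time, 2))                  # [d^2 theta/dt^2]
term2 = mul(div(g, length), DIMENSIONLESS)                  # [(g/l) sin theta]
term2_bad = mul(div(g, power(length, 2)), DIMENSIONLESS)    # [(g/l^2) sin theta]
print(term1 == term2, term1 == term2_bad)                   # True False

# sqrt(l/g) has the dimension of time, so tau = t*sqrt(g/l) is dimensionless.
time_scale = power(div(length, g), Fraction(1, 2))
print(time_scale == time)                                   # True
```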
Further reading

Abell, M.L. and Braselton, J.P. (1992) Mathematica by Example, Academic Press, San
Diego.
Blachman, N. (1992) Mathematica: A Practical Approach, Academic Press, San Diego.
Boyce, W.E. and DiPrima, R.C. (1997) Elementary Differential Equations and Boundary
Value Problems (6th edn), Wiley, New York.
Garnier, R. and Taylor, J. (1991) Discrete Mathematics for New Technology, Adam
Hilger, Bristol.
Gradshteyn, I.S. and Ryzhik, I.M. (1994) Table of Integrals, Series, and Products (5th
edn), Academic Press, San Diego.
Grimmett, G.R. and Stirzaker, D.R. (2001) Probability and Random Processes (3rd edn),
Oxford University Press.
Jordan, D.W. and Smith, P. (2007a) Nonlinear Ordinary Differential Equations (4th
edn), Oxford University Press.
Jordan, D.W. and Smith, P. (2007b) Nonlinear Ordinary Differential Equations:
Problems and Solutions, Oxford University Press.
Kaye and Laby (1995) Tables of Physical and Chemical Constants (16th edn), National Physical
Laboratory (available online at www.kayelaby.npl.co.uk).
Montgomery, D.C. and Runger, G.C. (1994) Applied Statistics and Probability for
Engineers, Wiley, New York.
Råde, L. and Westergren, B. (1995) Mathematics Handbook for Physics and Engineering,
Studentlitteratur, Lund.
Riley, K.F., Hobson, M.P. and Bence, S.J. (1997) Mathematical Methods for Physics and
Engineering, Cambridge University Press.
Roberts, G.E. and Kaufman, H. (1966) Table of Laplace Transforms, Saunders,
Philadelphia.
Seggern, D.H. von (1990) CRC Handbook of Mathematical Curves and Surfaces, CRC
Press, Boca Raton.
Skeel, R.D. and Keiper, J.B. (1993) Elementary Numerical Computing with
Mathematica, McGraw-Hill, New York.
Whitelaw, T.A. (1983) An Introduction to Linear Algebra, Blackie, Glasgow.
Wilson, R.J. and Watkins, J.J. (1990) Graphs: An Introductory Approach, Wiley, New
York.
Wolfram, S. (1996) The Mathematica Book (3rd edn), Wolfram Media/Cambridge
University Press.
Zwillinger, D. (1992) Handbook of Differential Equations (2nd edn), Academic Press,
Boston.
Index

Pages of the main topics are given in heavy type for quick reference.

A
Abscissa 6
absolute value 6
acceleration 68, 212
  polar components 216
  radial 216
  transverse 216
  vector 213
adjacency matrix 837
adjoint (adjugate) matrix 189, 190
admittance 536
algebra, Boolean 801–813 (see also Boolean algebra)
algorithm, (numerical) 118, 464 (see also approximation)
amplitude 22, 414
  complex 453
angle 16
  degree 16
  dimension 959
  polar 12
  radian 16, 949
angular (circular) frequency 22, 415, 428
angular momentum 258
angular spectrum function 612
angular velocity 258
antiderivative 307–318
  and area 314
  bracket notation 316
  composite 317
  table of 313
antidifferentiation 307–318 (see also antiderivative)
approximations
  algorithm 118, 464
  bisection method 122
  Euler method 463, 499
  Gauss–Seidel method 273
  incremental 115, 645, 683
  iterative process 118
  linear 646
  Newton’s method (for equations) 116–119
  rectangle rule 322, 347
  Simpson’s rule 355, 925
  step-by-step 108
  Taylor polynomials 125, 922
  trapezium rule 347, 924
arccos, arcsin, arctan functions 25
arc length 355
area (see also integrals)
  analogy for integrals 333, 346
  as a definite integral 323
  geometrical 344
  as a line integral 761
  parallelogram 248
  in polar coordinates 345
  signed 314, 320, 327
  as a sum 285
  of a surface 343
  table 951
  trapezium rule 347
  of a triangle 951
Argand diagram 144
  imaginary axis 144
  parallelogram rule 145
  for phasors 443
  real axis 144
argument
  complex number 146
  function 13
  principal value 146
asymptote 11, 109, 113, 114
attenuation 611
attractor 858
  strange 858
augmented matrix 263
autonomous differential equations 481
axes, cartesian 6
  abscissa 6
  ordinate 6
  origin 6
  right-handed 6, 198, 246
  rotation of 223, 226–229

B
bar chart 888
basis
  differential equations 385–390
  vectors 210
Bayes’ theorem 880
beam problem 354
beats 431–437
  frequency 432
  period 432
Bernoulli
  equation 478
  trial 887
binary
  operation 801
  set 789, 802
binomial distribution 887–888
  mean 889
  Poisson approximation 893
  variance 890
binomial theorem 51–54, 120, 948
  coefficient 51
  Pascal’s triangle 52
  Taylor series 131
bins (statistics) 904
bipartite graph 832
  complete 832
bisection method 122
block diagram 827
  reduction 828
Boolean algebra 801–813
  absorption laws 802
  algebra 802
  AND gate 804
  associative laws 802
coordinates 6 binary addition 813
Jacobi method 274
left-handed 6, 198, 246 binary operation 801
lineal element diagram 461,
oblique 257 binary set 802
926
Boolean expression 803 cdf (cumulative distribution complex impedance 446–451
commutative laws 802 function) 895 (see also impedance)

complement 801 cells (statistics) 904 complex numbers 140–156 (see
complement laws 802 central limit theorem 911 also Argand diagram)
conjunction 804 centre (phase plane) 484, 485, argument 146
de Morgan’s laws 802 492, 497 conjugate 142, 143
disjunction 804 centre of mass 348, 350 de Moivre’s theorem 150
disjunctive normal form 808 centroid 349, 363 difference 142
distributive laws 802 chain rules 86, 91, 101, 631, 664, division 142
duality principle 812 668, 676 exponential form 148, 151
exclusive-OR-gate 808 more than one parameter 668, Euler’s formula 149
expression 803 676 imaginary part 141
EXOR gate 808 one parameter 664, 668 logarithm 142
identity laws 802 chaos 857, 861, 865, 929 modulus 141, 145
join 801 characteristic equation ordered pair 144
logic gates 803 difference equations 850 parallelogram rule 145
logic networks 805 differential equation 385–391 polar coordinates 146
logically equivalent gates matrices 279 principal value 146
812 circle 10 product 142
meet 801 area 951 quotient 142
NAND gate 805 cartesian equation 10 real part 141
negation 804 circumference 951 reciprocal 142
NOR gate 805 vector equation 207 rules for 141
NOT gate 804 circuits (electrical) 105, 380, 418, standard form 141
OR gate 804 446–453, 823–827, 838, sum 141
product 801 839, 878 compound interest 122, 842–843
reflexive law 802 balanced bridge 451 conditional probability 875–877
sum 801 cutset method 823 cone 241, 625
switches in parallel 810 LCR 418 surface area 342
switches in series 810 Laplace transform methods volume 342, 951
switching circuits 809 535 conic sections 12
switching function 810 parallel 448, 878 conjunction 804
truth table 803 RL 380 conjugate, complex 142, 143
truth table, inverse 808 series 448, 878 connected graph 817, 820
variables 802 signal flow graphs 827 conservative field 752–759, 775
box plot 906, 930 switching 809 potential 754, 775
interquartile range 907 cobweb diagram 847–849 continuity equation 772
median 907 cofactor 180 contour map 625–626, 636, 927
quartiles 906 combinations 49–51, 949 convergence
outliers 907 common ratio (geometric series) of infinite series 129
whiskers 907 43 of integrals 330
branch (graph theory) 821 compatibility convolution 541, 927 (see also
linear equations 267 Fourier transform;
complement (of a set) 781 Laplace transform;
C
complementary function (see z-transform)
Capacitor 447, 528 also difference discrete 552
complex impedance 447, 533 equations; differential Fourier transform 535–538,
phasor 446 equations) 956
cardinality (of a set) 798 difference equations 852 Laplace transform 541, 726,
cardioid 57, 355, 920 differential equations 405 927
carrier wave 432, 585, 598 complete graph 817 memory and 544
caustic 707 completing the square 11, 140, theorem 541, 535, 726
Cayley-Hamilton theorem 302 367 z-transform 490
coordinates, three-dimensional orthogonal systems of 928 dash notation 100
(see also axes) parametric equations 95, 664 definition 65
cartesian 623 point of inflection 93, 238 dot notation 215, 480
curvilinear 672 radius of curvature of 123, 239 function of a function rule 86
cylindrical polar 777 sketching 108–114 higher order 77, 102
orthogonal systems of 675 slope 62–65 implicit 93
paraboloidal 785 tangent line 62–66 and incremental
rotation of 226–229 tangent vector 212 approximation 115
spherical polar 780 curvilinear coordinates 672 index notation 125
coordinates, two-dimensional curl 780 of inverse functions 94
(see also axes) cylindrical polars 777 logarithmic 92
cartesian 6 divergence 780 material 690
origin 6 elliptic system 674 notations 65, 100
orthogonal systems 675 gradient 780 parameter, in terms of 95
polar 28–30 paraboloidal 785 of polynomials 126
rotation 223 scale factor 780 of product 83, 101
coplanar vectors 218, 251 spherical polars 780 of quotient 85, 101
cosh function 37 cutset (graph theory) 822, 929 and rate of change 67
Taylor series 131 fundamental 824 of reciprocal 85, 101
cosine function 18 (see also cycle (graph theory) 820 second 79
trigonometric functions) cylindrical polar coordinates of sums 70
antiderivative 313 704, 777 table of derivatives 76, 91, 952
derivative 76 total 665
exponential form 150 of vectors 213
D
Taylor series 130 of f(ax + b) 90
cosine rule 58, 116, 950 damper 418 of eˣ 75
cosine/sine transforms 587–590 damping 419 of ln x 76
inverse 589 critical 439 of cos x, sin x 75
at a jump 589 heavy 420 of xⁿ 69, 89
counting index (series) 43 weak 420 derivative, partial 627
Cramer’s rule 260, 262, 270 dash notation for derivative 100 higher 629
cross product (see vector deadbeat 420, 488 mixed 630
product) decay, radioactive 36, 393 second 629
cumulative distribution definite integral 320–338 determinants 173, 175, 179–190,
function (cdf) 895 degree (of angle) 16 922
curl 773–776 degree (of a vertex) 817, 818 2 × 2 173, 179
in curvilinear coordinates 780 delay rule (second shift rule) 522 3 × 3 175, 180
determinant formula 774 del (grad) operator 659 cofactor 180
identities 785 delta function (impulse function) cofactor, sign rule 181
curvature 238, 243 530, 599 expansion by first row 180
centre of 122 and discrete systems 546 expansion, general 185
radius of 123, 239 Fourier transform of 599 factorization 191, 922
curves and Heaviside unit function Jacobian 728
angle between intersecting 658 532 minor 303
asymptotes 11, 109, 113, 114 Laplace transform of 531 notation 179
caustic 707 de Moivre’s theorem 150 product 188
chord 62–65 de Morgan’s laws 795, 802 product 188
convex/concave 239 derivative, directional (see suffix permutation 180
curvature of 238, 243 directional derivative) tridiagonal 192
envelope 475, 702, 928 derivative, ordinary 65 (see also zero 186
gradient 62 derivative, partial) diagonal dominance 274
length 355 and antiderivative 307 diagonalization of a matrix
normal to 238, 657, 658 chain rule 86, 91 286–289, 923
difference (sets) 794 variable coefficients, linear state of the system 482
difference equations 842–861 407 trajectory (phase path) 484

attractor 858 differential equations, linear van der Pol equation 480, 492
bifurcation 857 constant coefficient differential form 469, 679
chaos 857 379–412 for differential equation
characteristic equation 850 basis 385, 388 469 –473
cobweb 847–849 characterstic equation integrating factor 472
complementary function 385–391 and line integals 744
852 complementary function 405 perfect 472, 744
compound interest 843 damped oscillator 390 table of 470
constant coefficient 849 first-order 382 differentiation 61–80 (see also
difference 843 forced equations 395–407 derivative)
equilibrium 846 general solution 382, 386, 404, chain rule 86, 91
Feigenbaum sequence 858 405, 420 function of a function rule 86
first-order 847 harmonic forcing 399 implicit 93
fixed point 846 homogeneous (unforced) of integral with respect to
forcing term, table equations 379 parameter 640
generating function initial conditions 384, 391 of inverse functions 94
homogeneous 849–852 particular solutions 395–404 logarithmic 92
inhomogeneous 852–853 second-order, unforced partial (see also derivative,
linear, constant coefficients 384–392 partial)
logistic equation 845, second-order, forced 395–412 product rule 83, 101
854–858, 861 superposition principle 399 quotient rule 85, 101
order 845 unforced equations 379–394 reciprocal rule 85
particular solution 852–853 differential equations, nonlinear reversing 307
period-2 cycle 858 (qualitative methods) of vectors 213
period-3 cycle 858, 861 480–502 diffraction 417, 608–618
period-4 cycle 858 autonomous 481 angular spectrum 610
period doubling 856 centre 484, 494 array distribution 617
recurrence relation 843 direction of paths 488, attenuation 611
stability 847, 854 492–493 convolution 616
strange attractor 858 Duffing equation 502 interference 615
z-transform 556 equilibrium point 484, 493 pattern 610
differential-delay equation 560 Euler’s method 500 phase change on ray 609, 611
differential equations, first order initial value problem 481 radiating strip 608
379–382, 407–410, instability 486 radiating strip 608
460–479 limit cycle 497 radiation 613
Bernoulli equation 478 linearization 494 source distribution 612
change of variable 473 linearized systems, digraph (directed graph) 816
and differentials 469–473 classification of 494, 496 weighted 828
direction field 461 node 488, 494 dimensions 959–960
direction indicators 461 numerical method 499 directed graph 816
energy transformation 473 orbit (phase path) 484 directed line segment 198
Euler numerical method 463 periodic motion 484 direction cosines 225
graphical method 460 phase diagram 482 direction ratios 229, 230
integrating factor 408–410 phase paths (trajectories, directional derivative 651–654,
isoclines 462 orbits) 484, 488, 493 661, 692–696
lineal-element diagram 461 phase plane 483 discrete systems 545–558 (see
logistic 478 saddle 484, 494 also z-transform)
numerical solution 473 self-similar systems 495 impulsive input 545
separable 466–469, 474 separatrix 491 input/output 545
singular solutions 468, 475 spiral 487, 494 sampling 546
solution curves 461 stability 486, 487 signal 545
time invariant 545 eigenvalues 279–304 exp(x) (see exponential function)
transfer function 549 characteristic equation 279 expected value (mean,
disjunction 804 complex 280 expectation) 889, 897


disjunctive normal form 808 in differential equations 496 exponential distribution 897, 957
dispersion 436 orthogonal 293 exponential function 30–33
displacement 193 repeated 283 derivative 75
relative 195 vibrating system 298 doubling period 35
displacement vector 195 zero 283 doubling principle 35, 36
addition 197 eigenvectors 279–304 growth, decay 32, 35
components 196 in differential equations 279 half-life period 36
distance 7 orthogonal 293 Laplace transform of 507, 509
of point from plane 234 for repeated eigenvalues 283 limit of axⁿe⁻ᶜˣ 110
distribution, sampling 908 elementary row operations 262 Taylor series 130
distributions, probability (see ellipse 11 value of e 32
probability distributions) area 951 expression 5
divergence (of a vector field) 764, parametric equations 99
779, 780 polar coordinates 29
F
in curvilinear coordinates 780 semi-axes 11
in cylindrical polars 779 empty set 791 face (of a graph) 832
identities 785 energy transformation 475, 501 factorial function 48, 78, 372
in spherical polars 781 envelope 475, 702, 928 feedback 827
theorem 771 equilibrium (forces) 236 Feigenbaum sequence 858
divergent series 129 equilibrium point 484, 493 Fibonacci sequence 860
dodecahedron 833 centre 484, 494 field (see also vector field)
Doppler effect 437 node 488, 494 conservative 752–759, 775
dot product (see scalar product) saddle 484, 494 intensity 753
double integration 708–734 spiral 487, 494 potential 754, 775
change of variable 727–731 stability of 487 fixed point 846
changing order, constant errors 649, 683–685 (see also stability 847–849
limits 712 approximation) fluid flow 662, 690, 706, 772, 774
changing order, non-constant escape velocity 479 material derivative 690
limits 715 estimate (statistical) 906, 909 flux 770
constant limits 709–713 estimator of parameter 909 focal length 121, 662
double integrals 717 biased/unbiased 909 force
inner integral 709 sample mean 905, 911 at a point 235
Jacobian 727 sample variance 911 components 236
Jacobian, inverse 734 standard error 910 equilibrium 235
non-rectangular regions 713 ethanol molecule 816 moment 251, 254
outer integral 709 eulerian graph 821 resultant 235
polar coordinates 721 Euler’s constant 926 Fourier coefficients 564–576
repeated integral 709 Euler’s formula (complex Fourier series 562–585
region of integration 718 numbers) 149 average value 567
separable type 724 Euler’s method (differential carrier wave 585
signed volume analogy equations) 463, 499 coefficients 566
duality principle 596, 800, 812 Euler’s theorem (graph theory) complex coefficients 580
Duffing equation 502 832 cosine series 572
dummy variable 13 events 866 even functions 572
exhaustive 869 extensions 576
independent 877 fundamental frequency 562
E
intersection 869 Gibbs’ phenomenon 927
e, numerical value 949 mutually exclusive 869 half-range series 574–576
echelon form 264 partitioned 870 harmonics 562
edge (of graph) 814 union 868 at a jump 572
Laplace transform of 585 frequency 22, 414 curves, angle between
odd functions 572 angular 22, 415 curvilinear coordinates 672

Parseval’s identity 585, 608 domain 451 dependent/independent
period 2π 568 forcing 376 variables 623
period T 564 polygon (statistics) 903 depiction of 624
periodic function 563, 567 friction 418 derivatives, mixed 630
pitch 562 function 12 (see functions of one, directional derivative 652, 681
sawtooth wave 5 two and N variables) errors 648–650
sine series 572 complementary 405, 852 gradient vector 659
spectrum 577 generating 860 higher derivatives 629
switching functions 573 implicit 12 implicit differentiation
symmetry 572 functions of one variable 12–35 654–656, 666
two-sided 579–582 (see also derivative; incremental approximation 645
Fourier transforms 587–620, 956 differentiation) Lagrange multiplier 667–672,
(see also diffraction) argument 12 681
convolution 601–605 delta 530, 599 least squares method 638–640
cosine transform 586, 589 dependent/independent level curves 625
definitions 588, 589, 591 variables 12 linear approximation 646
delta function 599 discontinuous 14 maximum/minimum 635
of derivative 596 even 13 maximum/minimum,
Dirac comb 605 exponential 30, 33 restricted 667–672
duality 596 harmonic 21, 413 normal to a curve 658, 660
exponential 591, 592 Heaviside 14 normal to surface 632
of exponential function 595 hyperbolic 36, 153, 951 orthogonal systems of curves
Fourier transform pair 591 implicit 12 656
frequency distribution impulse 530 partial derivatives 627
function 588 incremental approximation saddle point 637
frequency scaling 596 115, 645, 683 stationary points, Lagrange
frequency shift 596 input/output 12 multipliers 670
fundamental frequency 587 inverse 23–25 stationary points, restricted
generalized functions 600 inverse hyperbolic 38 667
inverse transform 589, 591 inverse trigonometric 25–28 stationary points, tests for 637
jump discontinuity 591 logarithm 33 steepest ascent/descent
modulation 596 maximum/minimum 102 653–654
notations 527 mean value 339 surface 624
Parseval theorem 608 odd 13 tangent plane 632
periodic function 599 periodic 22 functions of many variables
Rayleigh’s theorem 607 point of inflection 93 683–707
rules, table of 596, 956 rational 14 chain rule 688
shah function 605 signum (sgn) 15 derivative, mixed 684
sidebands 598 stationary points 102 directional derivative 692, 693
signal energy 607 switching 810 envelope 702, 703, 707
sinc function 593, 594 translation of 13 errors 685
sine transform 587, 588 trigonometric 17–22, 25–27, gradient vector 688, 689
spectral density 588 949 (table) higher derivatives 684
table 956 unit step 14 implicit differentiation 686
time scaling 596 functions of two variables incremental approximation
top-hat function 593, 623–642, 645–683 683, 684
594 chain rule, one parameter Lagrange multipliers 699–701,
triangle function 605 664–665 706
frameworks 834–835 chain rule, two parameters level surface 696
bipartite graph 834 676–679 material derivative 690
minimum bracing 834 contour map 625 normal to surface 690
partial derivatives 683 compatibility graph 836 harmonic oscillator 413–425
restricted stationary points complete graph 817, 833 amplitude 414
697–702 connected graph 817, 820 angular frequency 415


stationary points 696 cotree 822 damped 419
tangent plane 691, 692 cutset 822, 929 lead and lag 415
cutset, fundamental 824 period 414
cutset, proper 823 phase (difference) 415
G
cycle 820 wavelength 415
gamma function 372 degree of a vertex 817, 818 wave number 415
gas, equation of state 687 digraph (directed graph) 816 harmonics 562
gate (logic) 803 disconnected graph 817 Heaviside unit function 14
AND 804 edge 814 Laplace transform of 519
EXOR 808 eulerian graph 821 histogram 903
NAND 805 Euler’s theorem 832 homogeneous linear equations
NOR 805 face 832 271 (see also linear
NOT 804 frameworks 834–835 algebraic equations)
OR 804 hamiltonian graph 821 Hooke’s law 417
Gauss-Seidel method 273 handshaking lemma 818 hyperbola 11
diagonal dominance 274 labelled graph 818–820 asymptotes 11
Gaussian elimination 263, 264 link 822 rectangular 626
back substitution 263 loop 817 hyperbolic functions 36, 153
echelon form 264 multigraph 817 derivatives 88, 952
inverse matrix 265 node 815 identities 37, 38, 951(table)
pivots 264 path 820 inverse 38
generalized function 600 path, shortest 816 inverse as logarithms 38
geometric distribution 891, 957 planar graph 815, 831 trigonometric functions,
mean 891 regular graph 817, 820 relation with 153
variance 892 signal flow graph 827–831 hypergeometric distribution 895
geometric sequence simple graph 817 mean 895
(progression) 43 spanning tree 822 variance 895, 957
geometric series 43 subgraph 821
common ratio 43 traffic signal phasing 835, 841
I
infinite 45 trail 820
sum of 44, 46 tree 821 Icosahedron 833
geometrical area 314 unlabelled graph 818–820 identity 5
in polar coordinates 344 vertex 814 impedance 446–451, 533–535
Gibbs’ phenomenon 927 walk 820 capacitor 447, 533
gradient (curve) 9, 61 weighted digraphs 828 complex 446–451
gradient vector (grad) 659, 688 gravitational field 755 in frequency domain 446, 533
curvilinear coordinates 780 Green’s theorem 748 inductor 447, 533
in cylindrical polars 778 group velocity 436–437 parallel 448, 534
identities 785 resistor 447, 533
in spherical polars 781 series 448, 534
H
graphs (see also curves) 7 in s-domain 533
gradient 61 Half-life 36 implicit differentiation 654, 666,
sketching 108 hamiltonian graph 821 686
slope 61–65 handshaking lemma 818 implicit function 12
graph theory (networks) harmonic forcing 399 improper integral 328
814–841 harmonic function 21, 414 convergence 330
bipartite graph 832 standard form 414 divergence 330
branch 821 harmonic oscillation 413 impulse function 530, 599 (see
circuits, electrical 821, phase diagram 483 also delta function)
824–827, 838, 839 phasor 443 impulsive input 544
increment 63 surface 765 L
incremental approximation 115, symmetrical 335

645, 683 table of integrals 953–954 Lagrange multipliers 667–672,
indefinite integral 324 trapezium rule 346, 347 681, 955
table 953 variable limits 336 Laplace equation 785, 786
identity matrix 170 volume 765 Laplace transforms 505–561,
index laws 4, 948 integral equation 529, 559, 560 926–927, 955 (see also
induction 843 Volterra 559 z-transform)
inductor integrand 324 convolution theorem 541, 726,
impedance 533 integrating factor 407–410 927
phasor 446 integration 320–378 (see also cosine function 507
complex impedance 447 integral; double definition 505
inequality 5 integration) of derivatives 515
infinite series 128 change of variable 362–366 delay rule (second shift rule)
convergence 129 of inverse function 370 522
divergence 129 partial fractions 366 delta function 530
geometric 43 by parts 368–373 differential-delay equation 560
partial sums 129 reduction formulae 373 differential equations 516–519
sum 128 by substitution 356–366, 378 differential equations,
Taylor series 130 of trigonometric products variable coefficients 560
inflection, point of 93, 238 362 discrete systems 545
inner product (see scalar interference 417, 456, 615 division by s 528
product) fringes 457 division rule 524
integer floor function 560 intersection (sets) 791 of Fourier series 585
integers 4 interval 5 Heaviside unit function 519
sums of powers of 949 infinite 5 impedance, s-domain 530
integrals 320–378 (see also inverse function 23–25 impulse function 530
antiderivative; derivative of 94 impulsive input 543
integration; double integration of 370 integral equations 529, 559,
integral; line integral) reciprocal relations 23 560
and area 323, 333, 327 reflection property 24 inverse 505, 512
area, polar coordinates 345 inverse matrix 172, 190 inverses, table of 955
area analogy 327, 346 Gaussian elimination 265 multiplication by eᵏᵗ 510
of complex functions 331 inverse trigonometric functions multiplication by tⁿ 510
definite 323 25 notation 506
differentiation of (variable principal values 26 of powers, tⁿ 507
limits) 336 irrotational field 775 partial fractions 513
differentiation with respect isocline 461, 493 quiescent system 517
to parameter 374 iterative methods (see rules, list of 955
even function 334 approximation) s-domain 529
improper 328 scale rule 508, 955
indefinite 324 shift rules 510, 955
J
infinite 329 sifting 531
as limit of a sum 341–353 Jacobi method (for linear sine function 507
limits of integration, variable equations) 274 square wave 521
336 Jacobian (double integration) table of 513
numerical evaluation of 322, 728 and transfer function,
339, 346, 355 jump (discontinuity) 14 s-domain 535
odd function 334 and transfer function,
rectangle rule 322, 347 ω-domain 540
K
Simpson’s rule 355, 925 Volterra integral equation 559
solid of revolution 343 Kirchhoff laws 449, 824, 825 and z transform 548
square bracket notation 316 Kuratowski 833 lead and lag 415
least squares geometrical interpretation 268 determinant of 175, 190 (see
estimates 914 homogeneous 271 also determinants)
method 638–640, 928 ill-conditioned 640 diagonal 170


Leibniz’s formula 123 incompatible 267 diagonalization of 286–289
level curve 625 Jacobi numerical method 275 echelon 264
normal to 657 pivots 264 eigenvalues 281
level surface 696 trivial, nontrivial solutions eigenvectors 285
normal to 696 271 idempotent 301
l’Hôpital’s rule 136 –138 linear dependence 185, 285 identity 170
light switches (Boolean linear independence 286 inverse 172
application) 811, 813 linear oscillator 418–425 (see leading diagonal 169
limit 65, 72, 98, 110, 111 also harmonic oscillator; lower triangular 274
for derivative 65 oscillations) non-singular 173
important limits 72–76 circuit model 418 null 163
left /right 111 damping 419 order 161
limit cycle 497, 926 deadbeat 420 orthogonal 295–298, 923
stability 499 free (natural) oscillations 419 positive-definite 295
line integrals 735–761 overdamped (heavy damping) powers of 171, 289–292
closed path 746 420 quadratic form 292
definition 736 resonance 423 rank 304
evaluation 736 transient 420 rectangular 162
field, conservative 752, underdamped (weak row-stochastic
753–759 damping) 420 singular 173
field intensity 753 link (graph theory) 822 skew-symmetric 169
Green’s theorem 748 ln (see logarithm) square 162
non-conservative potential logarithm 33–35 symmetric 169, 294
field 757 of complex number 157 trace of 301
as an ordinary integral 736 derivative of 76 transpose 168
parametric form 740–742 properties 34 unit 170
path 735 Taylor series 131 upper triangular 274
path dependence 736, 738 logarithmic differentiation 92 vector 162, 169
path independence 744, logic gates (see Boolean algebra; zero 163
747–750 gate) 803 matrix algebra 161–178 (see also
paths parallel to axes 743 logic networks (see Boolean matrices; matrix,
of perfect differential 744–745 algebra) 805 inverse)
potential 755, 756 logistic equation 478, 845, associative law 164
potential field 756 854–858, 860, 929 Cayley-Hamilton theorem 302
in two and three dimensions loop (of graph) 817 conformable for
739 multiplication 165
work 750, 753, 755 difference 163
M
linear algebraic equations elementary row operations 262
259–278 mass-centre 348, 350 equality 162
augmented matrix 263 mass-spring system 418 linear equations 259–278
back substitution 263 material derivative 690 multiplication 165
compatible 267 Mathematica projects 920–930 multiplication by a constant
Cramer’s rule 229 matrices 161–176 (see 163
diagonal dominance 274 eigenvalues; multiplication on left/right
echelon form 264 eigenvectors; matrix 167
elementary row operations 262 algebra; matrix, inverse) postmultiplication 167
elimination 259 adjacency 837 premultiplication 167
Gauss-Seidel numerical adjoint (adjugate) 189, 190 row-on-column operation 165
method 273 augmented 263 sum 163
Gaussian elimination 263, 265 characteristic equation 279 summation notation 166
matrix, inverse 172, 189, 923 (see normal underdamped (light damping)
matrix; matrix algebra) to curve 238, 657, 658 420

by Gaussian elimination to plane 232 oscillator, linear 419–425
of a product 174 to surface 632, 690 outcome 866
rule for 2 × 2 173 normal coordinates 299 outlier 907
rule for 3 × 3 175 normal distribution 898–900
maximum/minimum standard normal curve 899
P
local 103 standardized 899, 957
N variables 696 table 958 parabola 12
one variable 102 number line 5 paraboloidal coordinates 785
one variable, classification 104 number 3 parallelepiped
restricted 107, 670, 697, 699 complex (see complex volume (determinant) 257
(see also Lagrange numbers) volume (vector) 251
multipliers) exponent (index) 4 parallelogram
two variables 635–638 area (determinant) 734
two variables, classification 637 index laws 948 area (vector) 248
mean (expected value, infinity sign 4 parallelogram rule
expectation) 889, 897 integer 4 complex numbers 145
median 906 irrational 4 vector addition 201
mode 906 modulus 6 parameter (statistics) 903
modulus 6 powers 4 parametric equations of a curve
moment (see also force) rational 4 95, 664
about an axis 253 real 3 Parseval identity 585, 608
of force 251, 255 recurring decimal 4, 46 partial derivative 627 (see
vector 252 set notations 790 functions of N variables)
moment of inertia 348, 350–352 numerical methods (see higher 629
cone 377 approximation) mixed 630
disc 352, 377 second 629
rectangle 354 partial differentiation 623–705
O
sphere 377 partial fractions 39–42
triangle 351 Ohm’s law 824 in integration 366
moment of momentum 258 operator 66 and Laplace transforms 513
mortgage 844 ordered pair 144 rules 40
moving average 620 ordinate 6 and z-transforms 554
multigraph 817 origin of coordinates 6 partial sum 129
mutually exclusive events 869 orthogonal matrix 295–298, Pascal distribution 894, 957
923 Pascal’s triangle 52, 949
rotation of axes 296 path, phase (see line integral;
N
orthogonal systems phase plane)
nabla (gradient) 659 of coordinates 675 path (graph theory) 820
negation (Boolean algebra) 804 of curves 654, 928 pdf (probability density
negative binomial (Pascal) oscillations function) 895
distribution 894, 957 addition 417, 454 pendulum 71, 394, 425, 489–491,
mean 894, 957 beats 431–437 648, 960
variance 894, 957 compound 431 perfect differential form 471–473
Newton cooling 412 damped 419 line integrals 744
Newton’s method 116 –119 deadbeat 420, 488 period 9, 22, 414
nodal analysis (circuits) 824 forced 420 period doubling 856
node (graph theory) 815 harmonic 413, 427 periodic functions 22 (see also
node (phase plane) 488, 494 longitudinal 298 harmonic functions;
nonlinear differential equations overdamped (heavy damping) Fourier series)
(see differential 420 amplitude 22, 414
equations, nonlinear) transients 420 and Fourier series 563, 567
angular frequency 22, 415 normal vector 231 expected value 889
frequency 22, 414 tangent 632, 691 exponential 897
integrals of 564 vector equation of 207, 232 geometric 891


lead/lag 415 Poisson distribution 892, 957 hypergeometric 895
mean value 328 mean 893 independent event 877
period 22, 414 variance 893 mean 889, 897
phase 22, 415 approximation to binomial negative binomial 894
phase difference 415 distribution 893 normal 898
spectrum 577 polar angle 17 normal, standardized 899
wavelength 19, 415 polar coordinates 28–30 Pascal 894
wave number 428 complex numbers 146 Poisson 892
permutations 46–51, 949 cylindrical 704, 777 probability density function
circular 49 in double integration 721 (pdf) 895
perspective 243 geometrical area in 344 relative frequency 865
phase (angle) 22, 415 motion in 214–216 sample space 866
phase difference 415 spherical 780 standard deviation 890
phase plane 483, 488, 491 (see polygon table 957
also differential regular, area 951 trial 865
equations, nonlinear) polynomial 40 uniform 901
equilibrium points 484, 488, derivative 126 variance 890, 897
493 Taylor 125 product rule (differentiation) 83,
general 491–497 Taylor 125 product rule (differentiation) 83,
limit cycle 497, 926 estimated mean 906
numerical method 499 population problems 36, 393,
Q
path direction 484, 488, 493 478, 491, 560, 643
phase diagram 483 position vector 206 quadratic equation 140, 948
phase path 484, 488, 493 derivative of 213 quadratic form 292
phase velocity 429 positive definite matrix 295 positive definite 295
phasors 442–459 potential 755 quartiles 906
addition 444 energy 71, 755 quotient rule (differentiation) 85,
algebra of 444 field 756 101
Argand diagram 443 single-valued 670
capacitor 447 probability 865–883
R
complex amplitude 453–454 addition law 871
complex impedance 446–451 axioms 870 radian 16, 949
definition 443 Bayes’ theorem 880 radiation problems 613–618 (see
of derivative 444 conditional 875–877 diffraction, interference)
diagram 445 event 866 radioactive decay 393
frequency domain 447, 451 frequency 872 random sample 903, 910
harmonic oscillation 443 mutually exclusive event 869 random variable 884–902 (see
inductor 447 and sets 868 also probability
of integrals 444 total 879 distributions)
interference 456 Venn diagrams 868 continuous 895
oscillations 453 probability distributions discrete 884
resistor 447 884–902, 957 mean (expected value) 889, 897
time domain 447 Bernoulli trials 887 probability distribution
transfer function 451 binomial 887–888 (function) 885
and waves 453 continuous 895–900 standard deviation 890
pitch 562 counting method variance 890, 891, 897
pivot 264 cumulative distribution random walk 860
planar graph 815, 831 function (cdf) 895 rank (of a matrix) 304
plane density function (pdf) rational function 40
cartesian equation 208 discrete 884–895 Rayleigh’s theorem 607
reciprocal rule (differentiation) sampling distribution 908 shah function 605
85 sawtooth wave 584, 927 shift rule (Laplace transform)

INDEX
rectangle rule (integration) 322, scalar 219 510, 955
347 scalar function 659 shoulder (surface) 635
recurrence relation 464 (see also scalar (dot, inner) product sidebands 585, 598
difference equations) 218–240 (see also sifting function 531
recurring decimal 2, 46 vectors) signal 545, 589
red shift 438 angle between vectors 221 signal energy 607
reduction formula 373 of basis vectors 222 signal flow graphs 827–831
regression 913–915 invariance 248 block diagram 827
controlled variable 913 perpendicular vectors 222 cycle 830
least squares estimate 914 scalar triple product 249 edges in series 829
linear model 914 cyclic order 250 feedback 827, 828
line 915 scale rule (Laplace transform) loop 830
response 913 508, 955 multiple edges 829
scatter diagram 913 scatter diagram 913 stem 830
straight line fit 914 separable differential equations weighted digraph 828
unbiased estimators 916 466–469 signed area 314, 320
relaxation oscillation 499 separation of variables 466–469 signum (sgn) function 15, 920
repeated integral 709–717 (see separatrix 491 simple harmonic motion (see
double integration) sequence 43, 129, 845 harmonic oscillator)
separable 724 of partial sums 129 Simpson’s rule 355, 925
resistor series (see Fourier series; sinc function 594
impedance 533 geometric series; infinite sine function 18 (see also
phasor 447 series; Taylor series) trigonometric functions)
complex impedance 447 sets 789–800 antiderivative 312
resonance 423 associative laws 793 derivative 75
restricted stationary values 106, binary 789 exponential form 150
667, 697 cardinality 798 rectified 582
resultant of forces 235 cartesian product 800 Taylor series 130
root mean square (rms) 328 commutative law 793 sine rule 950
rotation of axes (see also axes) complement 791 sine transform (see cosine/sine
223, 226 complementary laws 794 transform)
row operations, elementary 262 de Morgan’s laws 795 singular matrix 173
difference 794 singular solutions (of differential
disjoint 791 equations) 468
S
distributive law 794 sinh function 37
saddle (phase plane) 484, 494 duality 800 Taylor series 131
saddle (surface) 625, 635–638, elements 789 sinusoid 22
927 empty 791 slope 9, 61–65, 66 (see curve;
sample 903 equality 790 functions of two
mean 905, 910 finite 790 variables; straight line)
standard error of mean 910 identity laws 794 solenoidal field 775
variance 911 infinite 790 solid of revolution 343
sample space 866 (see also intersection 791 surface area integral 343
events) number sets 790 volume integral 343
countable 866 ordered pairs 800 spectral density 588
discrete 866 proper subset 792 spectrum (Fourier series) 577
elements of 866 subset 792 speed 98
event 866 union 791 sphere
exhaustive 869 universal 791 surface area 951
partitioning of 870 Venn diagram 792 volume 951
Venn diagram 868 sgn (signum) function 15, 920 spherical polar coordinates 780
spiral (curve) 28 Stokes’s theorem 784 Taylor polynomial 125, 921, 922
archimedean 57 straight line 8 Taylor series 124–138

equiangular 57 cartesian form in three binomial series 131


spiral (phase plane) 487, 494 dimensions 235 composite functions 132
square root 4 cartesian form in two general point, about 134
standard deviation 890 dimensions 8, 230 large variable 134
standard error 910 determinant equation of 191 polynomial approximation
stationary points (see also direction ratios 230 125
maximum/minimum) gradient 9 table 130
N variables 696 parametric form 235 tetrahedron 773, 785, 923
one variable 103 perpendicular 9 thermodynamic equations 687
one variable, classification 104 slope 9 top-hat function 593
restricted 107, 667, 697 vector equation 235 Fourier transform 593
two variables 635 strange attractor 858 torque 253
two variables, classification streamline 706 torus 768
637 subgraph 821 total derivative 665
statistic 903 subset 792 total probability 879
statistical inference 903 substitution, method of total waiting time 837
statistics 903–916 (integration) 356–366, traffic signals 835–837, 841
bins 904 378 compatibility graph 836
box plot 906 summation sign 43 phasing 836
cells 904 sums of integer powers 949 subgraph 836
central limit theorem 911 superposition principle 399 total waiting time 837
controlled variable 913 surface 625, 690 trail 820
estimate 906, 909 area 767 transfer admittance 452
estimator 909, 911 cone 625 transfer function 535, 536, 549
frequency polygon 903 contour map 625, 927 frequency domain 451
histogram 903 hemisphere 625 s-domain 535
interquartile range 907 integral 765 transfer impedance 452
least squares estimate 914 maximum/minimum 635, 637 transform (see Laplace
mean 905 normal to 632, 690 transform; z-transform;
median 906 parametric form 768 discrete systems; Fourier
mode 906 saddle 625, 635, 637, 927 transform; cosine/sine
outlier 907 shoulder 635 transform)
parameter 903 stationary points 635 transient 421
population 903 stationary points classification translated function 13
quartiles 906 637 transpose 168
regression 913–915 tangent plane 632, 691 trapezium rule 346, 347
regression estimator 915 switches travelling waves 427, 430, 434,
regression line 915 light 811, 813 454
response variable 913 in parallel 810 complex amplitude 454
sample mean 905, 910 in series 810 tree (graph theory) 821
sample variance 911 truth tables 810, 811 spanning 822
sampling distribution 908 switching circuits (Boolean trefoil knot 923
scatter diagram 913 algebra) 809 trial 865
standard error 910 switching function 713 Bernoulli 887
statistic 903 triangle function 605
variates 903 Fourier transform 605
T
whisker 907 triangle rule 200 (vectors)
steepest ascent/descent 654 tangent (to a curve) 65 triangle, vector area 923
stem 830 equation 66 trigonometric functions 17–21
stiffness of spring 298, 417 vector 212, 238 derivatives 18
Hooke’s law 417 tangent plane 632, 691 exponential form 150
identities 20, 155, 566, multiplication by a scalar 200 solenoidal 772, 775
949–950 (table) normal to curve 680 Stokes’s theorem 781–784

integrals of products 362 normal to plane 231 surface area 767
inverse 25 normal to a surface 632, 690 surface integral 765
Taylor series 130 parallel 199 triple integral 769
truth tables 803 (Boolean parallelogram rule 201 volume integral 765, 769
algebra) perpendicular vorticity 782
for gates 803 plane equation 208, 231 vector (cross) product 244 –258
inverse method 808 position 206 direction of 246
for switches 809 and relative velocity 204 invariance 248
right/left-handed system of rules 245
198 of unit vectors 245
U
row 162, 169 vector space 285
uniform distribution 901 rules of vector algebra base vectors 285
union (of sets) 791 199–202 vector triple product 255
unit step function 14 scalar (dot, inner) product velocity 67, 212, 213
units (SI) 959–960 220 angular
unit vector 211, 223 scalar triple product 249 polar components 216
universal set 791 straight line 209, 230, 234 relative 204
subtraction 200 Venn diagram 792, 868
sum of 181 vertex 814
V
tangent to curve 213, 238 vibrations (see oscillations)
valency 816 triangle rule 200 volume
van der Pol equation 499, 926 unit 211, 223 of cone 342, 951
variable, random (see random vector product 244 (see vector ellipsoid 353
variable) product) integral 769
variable, dependent/independent vector triple product 255 parallelepiped 251, 257
12 and velocity (see velocity) 213, of solid of revolution 343
variance 890, 897 216 table 951
variance, sample 911 vector fields 762–786 (see also tetrahedron 785
variate 793 curl; divergence;
vectors 193–258 (see also axes; gradient)
W
vector field) cylindrical polar coordinates
acceleration 213 777 walk (graph theory) 820
addition of 200 curl 773–777 walk, random 860
angle between 220 curvilinear coordinates water clock 354
basis 210 779–781 wave (see also diffraction;
column 162, 169 divergence 764 interference;
components 199 divergence theorem 770–773 oscillations)
coplanar 203, 251 field lines 762–764 antinode 427
cross product (see vector fluid flow 772, 774, 775 attenuating 629
product) flux 770 beats 431, 432, 434
and curvature 238 gradient 780 carrier 432, 585, 598
differentiation 212–214 identities 785 complex amplitude 453–454
directed line segment 198 integral curves 762 compound oscillation 431
displacement 193, 197 irrotational 775 diffraction 417
dot product (see scalar Laplace’s equation 786 dispersive 436
product) orthogonal coordinates Doppler effect 437
equality 199 (general) equation 631
gradient 659–661, 688 paraboloidal coordinates 785 frequency modulation 435
(see also gradient) scale factor 778 group velocity 436
invariance 248 spherical polar coordinates intensity 455
magnitude (length) 199 780 interference 417
modulation 432, 435 wave number 428 complex plane 522–556
node 427 wave packets 432 definition 549

number 415 wave train 429 delay circuit 547


phase velocity 429 Wheatstone bridge 122, 449, 451 difference equations 556
plane 429, 430 whisker 907 differentiation analogue 561
progressive 428 work 341, 750–752 discrete signal 561
sinusoidal 427 inverse 549
standing 427 linear system 555
Z
stationary 427 poles of 554
train 429 z-transform (see also discrete stability of discrete system 554
travelling 428, 430, 434 systems) 548–561 time-delay rule 561
wavelength 19, 415 convolution theorem 552 transfer function 549
