Book 02 17 2020
Farzali Izadi
4.2.1 Independence 30
4.2.2 Samples 31
4.2.3 Conditional Probability 31
4.3 The Poisson Distribution 32
4.3.1 The Poisson Limit 33
4.4 Exercises 33
Index 125
1
Selected Topics in Calculus
1.1 Limits
1
Central to calculus is the value of the slope of a line, Δy/Δx, but
when both the numerator and denominator become almost zero. To
evaluate the slope, that ratio, under those vanishing conditions,
requires the idea of a limit. And central to the idea of a limit is
the idea of a sequence of rational numbers.
We encounter such a sequence in geometry when we determine
a value for π, which is the ratio of the circumference of a circle to
the diameter. To do that, we inscribe in the circle a regular polygon.
The ratio of the perimeter of the polygon to the diameter, which
we can actually calculate, will be an approximation to π. And as
we increase the number of sides – that is, if we consider a sequence
of polygons: 60 sides, 61 sides, 62, 63, 64, and so on – then the
sequence of those ratios gets closer and closer to π. Now, the circle
is never equal to any polygon. But by considering a sufficiently
large number of sides, the difference between the circle and that
polygon, the error, will be less than any small number we name.
Less even than
0.00000000000000000000000000000001!
That is the idea of a sequence approaching a limit, or a boundary.
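The polygon argument above can be sketched numerically. The following is a minimal illustration, not part of the quoted source: it assumes a circle of radius 1, starts from an inscribed hexagon rather than a 60-sided polygon, and doubles the number of sides at each step using the chord identity s_2n = sqrt(2 − sqrt(4 − s_n²)). The function name polygon_ratios is hypothetical.

```python
import math


def polygon_ratios(doublings):
    """Perimeter/diameter ratios of regular polygons inscribed in a unit circle.

    Starts from a hexagon (side length 1) and doubles the number of sides
    at each step with s_2n = sqrt(2 - sqrt(4 - s_n**2)).
    """
    sides, side_len = 6, 1.0
    ratios = [(sides, sides * side_len / 2.0)]  # the diameter is 2
    for _ in range(doublings):
        side_len = math.sqrt(2.0 - math.sqrt(4.0 - side_len ** 2))
        sides *= 2
        ratios.append((sides, sides * side_len / 2.0))
    return ratios


ratios = polygon_ratios(10)
for n, ratio in ratios:
    # Each ratio stays below pi, and the error shrinks as n grows.
    print(n, ratio, math.pi - ratio)
```

Every ratio is an approximation from below, which matches the remark that the circle is never equal to any polygon.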
By that process, we can approximate the value of π as closely as
we possibly can. For instance, the reader surely can recognize the
1
This section has been quoted from:
“http://www.salonhogar.net/themathpage/aCalc/limits.htm”.
1.2 Derivative
2
The problem of finding the tangent line to a curve and the prob-
lem of finding the velocity of an object both involve finding the
same type of limit. This special type of limit is called a derivative
and we will see that it can be interpreted as a rate of change in
any of the sciences or engineering.
In general, suppose an object moves along a straight line accord-
ing to an equation of motion s = f (t), where s is the displacement
(directed distance) of the object from the origin at time t. The
function f that describes the motion is called the position func-
tion of the object. In the time interval from t = a to t = a + h the
change in position is f (a + h) − f (a). The average velocity over
this time interval is
$$\text{Average velocity} = \frac{\text{displacement}}{\text{time}} = \frac{f(a+h) - f(a)}{h},$$
which is the same as the slope of the secant line.
Now suppose we compute the average velocities over shorter and
shorter time intervals [a, a + h]. In other words, we let h approach
0. We define the velocity (or instantaneous velocity) at time t = a
to be the limit of these average velocities:
$$v(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$
This means that the velocity at time t = a is equal to the slope of
the tangent line at this point.
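The limiting process can be illustrated with a short computation. This is only a sketch under an assumption that is not in the text: the position function is taken to be the hypothetical f(t) = t², for which the average velocity over [a, a + h] is 2a + h, so the values should approach 2a as h shrinks.

```python
def average_velocity(f, a, h):
    """Average velocity of position function f over the interval [a, a + h]."""
    return (f(a + h) - f(a)) / h


def position(t):
    # Hypothetical position function s = f(t) = t**2 (an assumption for illustration).
    return t ** 2


a = 3.0
steps = [10.0 ** (-k) for k in range(1, 6)]  # h = 0.1, 0.01, ..., 0.00001
velocities = [average_velocity(position, a, h) for h in steps]

for h, v in zip(steps, velocities):
    # The average velocities approach the instantaneous velocity 2 * a = 6.
    print(h, v)
```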
We have seen that the same type of limit arises in finding the
slope of a tangent line or the velocity of an object. In fact, limits
of the form
$$\lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$
arise whenever we calculate a rate of change in any of the sciences
or engineering, such as a rate of reaction in chemistry or a marginal
cost in economics. Since this type of limit occurs so widely, it is
given a special name and notation, namely derivative.
2
This section is quoted from:
Stewart, James, Daniel K. Clegg, and Saleem Watson. Calculus: early transcen-
dentals. Cengage Learning, 2020.
Indefinite Integral
3
This section is quoted from:
Stewart, James, Daniel K. Clegg, and Saleem Watson. Calculus: early transcen-
dentals. Cengage Learning, 2020.
1.3 Definite and Indefinite Integrals 5
2. $\int_a^b f(t)\,dt = F(b) - F(a)$, where $F$ is an antiderivative of $f$,
that is, $F' = f$.
Both parts of the Fundamental Theorem establish connections
between antiderivatives and definite integrals. Part 1 says that if
$f$ is continuous, then $\int_a^x f(t)\,dt$ is an antiderivative of $f$.
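Statement 2 of the Fundamental Theorem can be checked numerically. The sketch below assumes a hypothetical integrand f(t) = 3t² with antiderivative F(t) = t³, and approximates the definite integral with a midpoint rule; the helper name midpoint_integral is an illustration, not something taken from the source.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def f(t):
    # Hypothetical integrand chosen for illustration.
    return 3 * t ** 2


def F(t):
    # An antiderivative of f, since F'(t) = 3 * t**2.
    return t ** 3


a, b = 1.0, 2.0
numeric = midpoint_integral(f, a, b)
exact = F(b) - F(a)
print(numeric, exact)  # the two values should agree closely
```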
Definite Integral
1.4 Exercises
2.1.1 Vectors
Scalars
Vectors
2.1.2 Matrices
and define the product Ax by the left-hand side of (2.1), then (2.1)
is equivalent to Ax = b. The scalars $a_{ij}$ are called the elements of
A. The set of m × n matrices with real elements is written $\mathbb{R}^{m \times n}$.
The set of m × n matrices with real or complex components is
written $\mathbb{C}^{m \times n}$. The indices i and j of the elements $a_{ij}$ of a matrix
are called respectively the row index and the column index.
where each $A_{ii}$ is square and all entries below the $A_{ii}$ blocks
are zero. One can easily show that the characteristic polynomial
$\det(A - \lambda I)$ of $A$ is the product $\prod_{i=1}^{b} \det(A_{ii} - \lambda I)$ of the
characteristic polynomials of the $A_{ii}$ and therefore that the set $\lambda(A)$ of
eigenvalues of $A$ is the union $\bigcup_{i=1}^{b} \lambda(A_{ii})$ of the sets of eigenvalues of
the diagonal blocks $A_{ii}$. The canonical forms that we compute will
be block triangular and will proceed computationally by break-
ing up large diagonal blocks into smaller ones. If we start with
a complex matrix A, the final diagonal blocks will be 1-by-1, so
the ultimate canonical form will be triangular. If we start with a
real matrix A, the ultimate canonical form will have 1-by-1 diago-
nal blocks (corresponding to real eigenvalues) and 2-by-2 diagonal
blocks (corresponding to complex conjugate pairs of eigenvalues);
such a block triangular matrix is called quasi-triangular.
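The product formula for the characteristic polynomial can be spot-checked on a small example. The sketch below uses a hypothetical 3-by-3 block upper triangular matrix with a 2-by-2 block A11 and a 1-by-1 block A22 (the entries are made up for illustration) and compares det(A − λI) with det(A11 − λI) · det(A22 − λI) at several integer values of λ.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def shifted(m, lam):
    """Return the matrix m - lam * I."""
    size = len(m)
    return [[m[i][j] - (lam if i == j else 0) for j in range(size)]
            for i in range(size)]


# Hypothetical diagonal blocks and the block upper triangular matrix built from them.
A11 = [[1, 2],
       [3, 4]]
A22 = [[5]]
A = [[1, 2, 7],
     [3, 4, 8],
     [0, 0, 5]]

checks = [
    det(shifted(A, lam)) == det(shifted(A11, lam)) * det(shifted(A22, lam))
    for lam in range(-3, 4)
]
print(all(checks))
```

Because the two polynomials agree, their roots agree as well, which is the statement that the eigenvalues of A are the union of the eigenvalues of the diagonal blocks.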
2.3 Exercises
3.1 Statistics
1
Statistics is the science of sampling. How one set of measurements
differs from another and what the implications of those differences
might be are its primary concerns. Conceptually, the subject is
rooted in the mathematics of probability, but its applications are
everywhere. Statisticians are as likely to be found in a research lab
or a field station as they are in a government office, an advertising
firm, or a college classroom.
Properly applied, statistical techniques can be enormously effec-
tive in clarifying and quantifying natural phenomena. In general,
statistical techniques are employed either to (1) describe what did
happen or (2) predict what might happen. It is unarguably true
that the interplay between description and prediction.
the population that the data are thought to represent. Even when
a data analysis draws its main conclusions using inductive statis-
tical analysis, descriptive statistics are generally presented along
with more formal analysis. For example, in a paper reporting on
a study involving human subjects, there typically appears a table
giving the overall sample size, sample sizes in important subgroups
(e.g. for each treatment or exposure group), and demographic or
clinical characteristics such as the average age, the proportion of
subjects with each gender, and the proportion of subjects with
related comorbidities.
3.2.1 Mean
3.2.5 Median
3.2.6 Mode
The mode is the value that occurs the most frequently in a data
set or a probability distribution. In some fields, notably education,
sample data are often called scores, and the sample mode is known
as the modal score. Like the statistical mean and the median, the
mode is a way of capturing important information about a random
variable or a population in a single quantity. The mode is in general
different from the mean and median, and may be very different for
strongly skewed distributions. The mode is not necessarily unique,
since the same maximum frequency may be attained at different
values. The most ambiguous case occurs in uniform distributions,
wherein all values are equally likely.
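The non-uniqueness of the mode is easy to see in code. The following sketch uses Python's standard statistics.multimode on two made-up samples (the data are hypothetical): one with two modes and one in which every value occurs equally often.

```python
from statistics import multimode

# Hypothetical sample in which 3 and 7 both occur twice.
scores = [2, 3, 3, 5, 7, 7, 9]
modes = multimode(scores)

# Hypothetical sample in which every value occurs once, so every value is a mode.
uniform_sample = [1, 2, 3, 4]
uniform_modes = multimode(uniform_sample)

print(modes)
print(uniform_modes)
```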
3.2.7 Range
The range is the length of the smallest interval which contains all
the data. It is calculated by subtracting the smallest observation
(sample minimum) from the greatest (sample maximum) and pro-
vides an indication of statistical dispersion. It is measured in the
same units as the data. Since it only depends on two of the obser-
vations, it is a poor and weak measure of dispersion except when
the sample size is large.
The range, in the sense of the difference between the highest and
lowest scores, is also called the crude range. When a new scale for
measurement is developed, then a potential maximum or minimum
will emanate from this scale. This is called the potential (crude)
range. Of course this range should not be chosen too small, in order
to avoid a ceiling effect. When the measurement is obtained,
the resulting smallest and greatest observations will provide the observed
(crude) range.
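As a minimal sketch of the definition, the observed range is the sample maximum minus the sample minimum. The data and the function name crude_range below are hypothetical; the second sample shows that changing the interior observations leaves the range unchanged, which is why it depends on only two of the observations.

```python
def crude_range(data):
    """Observed (crude) range: sample maximum minus sample minimum."""
    return max(data) - min(data)


# Hypothetical measurements, all in the same units.
observations = [12.5, 9.0, 15.25, 11.0]
observed = crude_range(observations)

# Same extremes, different interior values: the range does not change.
altered = [9.0, 9.5, 9.75, 15.25]
observed_altered = crude_range(altered)

print(observed, observed_altered)
```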
3.2 Descriptive Statistics 23
3.2.10 Variance
3.3 Exercises
that impact the results of the experiment. There are often variables
that you don’t even know about. A confounding variable is an
extraneous variable that varies across the independent variable.
1. Ordinal data are also categorical, but in this case .......... have
an order and can be ranked. Examples include stages of breast
cancer.
2. Numeric data can be discrete or continuous. Discrete data have
fixed values. .......... data can take any value, frequently within
a given range.
3. Data can be broadly .......... as categorical or numeric. Cate-
gorical data may be nominal, ordinal or binary.
4. Binary, or dichotomous, data have only two possible outcomes.
Common .......... are Yes/No or True/False responses, but they
could also include other common epidemiological outcomes,
such as “survive” and “not survive”.
5. Numeric data which may include something such as weight and
length (where the range would be from .......... to, theoretically,
infinity).
6. Nominal Data describes categorical data without an order. Ex-
amples .......... blood groups (O, A, B, AB), eye colour and
marital status.
7. The most obvious first step in assessing a trend is to plot the
............. of interest by year.
8. Time series analysis refers to a particular collection of spe-
cialised .......... methods that illustrate trends in the data.
9. Interval data are numerical data where the differences between
two ........... can be interpreted, but the ratio between two num-
bers is meaningless.
10. Moving averages (or rolling averages) provide a useful way of
.......... time series data.
11. Ratio data are numerical and have a true zero and .......... dif-
ferences and ratios are meaningful.
12. There are four data scales: nominal, ordinal, interval and ratio.
Nominal and ordinal .......... have already been described.
4
Selected Topics in Probability
4.1 Probability
1
Probability theory arose originally in connection with games of
chance and then for a long time it was used primarily to investigate
the credibility of testimony of witnesses in the “ethical” sciences.
Nevertheless, probability has become a very powerful mathemat-
ical tool in understanding those aspects of the world that cannot
be described by deterministic laws. Probability has succeeded in
finding strict determinate relationships where chance seemed to
reign and so terming them “laws of chance” combining such con-
trasting notions in the nomenclature appears to be quite justified.
This introductory chapter discusses such notions as determinism,
chaos and randomness, predictability and unpredictability, some ini-
tial approaches to formalizing randomness and it surveys certain
problems that can be solved by probability theory. This will per-
haps give one an idea to what extent the theory can answer ques-
tions arising in specific random occurrences and the character of
the answers provided by the theory.
Games of chance and the analysis of testimony of witnesses
were originally the basic areas of application of probability theory.
Games of chance involving cards, dice and flipping coins naturally
permitted the creation of appropriate random experiments (this
terminology first appeared in the twentieth century) so that their
1
This section has been quoted from:
Skorokhod, Valeriy. Basic principles and applications of probability theory.
Springer Science & Business Media, 2005.
4.2.1 Independence
4.2.2 Samples
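$$\lim_{\substack{n \to \infty \\ p \to 0 \\ np = \text{const}}} P(X = k) = \lim_{\substack{n \to \infty \\ p \to 0 \\ np = \text{const}}} \binom{n}{k} p^k (1 - p)^{n-k} = \frac{e^{-np} (np)^k}{k!}$$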
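The limit above can be observed numerically. The sketch below holds np fixed at a hypothetical value of 2 and compares the binomial probability of k = 3 successes with the Poisson value as n grows; the function names and the chosen numbers are assumptions for illustration.

```python
import math


def binomial_pmf(n, p, k):
    """P(X = k) for X binomial with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(lam, k):
    """P(X = k) for X Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam, k = 2.0, 3  # np = lam is held constant while n grows and p = lam / n shrinks
sizes = (10, 100, 1000, 10000)
gaps = [abs(binomial_pmf(n, lam / n, k) - poisson_pmf(lam, k)) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(n, gap)  # the gap shrinks as n increases
```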
4.4 Exercises
5.1 Groups
1
Let S be a set. A mapping
S×S →S
is sometimes called a law of composition (of S into itself). If x, y
are elements of S, the image of the pair (x, y) under this mapping
is also called their product under the law of composition, and will
be denoted by xy. (Sometimes, we also write x · y, and in many
cases it is also convenient to use an additive notation, and thus to
write x + y. In that case, we call this element the sum of x and y.
It is customary to use the notation x + y only when the relation
x + y = y + x holds.)
Let S be a set with a law of composition. If x, y, z are elements
of S, then we may form their product in two ways: (xy)z and
x(yz). If (xy)z = x(yz) for all x, y, z in S then we say that the law
of composition is associative.
An element e of S such that ex = x = xe for all x ∈ S is called
a unit element. When the law of composition is written additively,
the unit element is denoted by 0, and is called a zero element. A
unit element is unique, for if e′ is another unit element, we have
e = ee′ = e′ by assumption. In most cases, the unit element is
written simply 1 (instead of e).
1
This chapter has been quoted from:
Serge Lang, Algebra, 2002 Springer-Verlag New York, Inc.
38 5 Selected Topics in Algebra
S × S → S and S × T → T
Then a product (xy)z makes sense with x ∈ S, y ∈ S, and z ∈
T . The product x(yz) also makes sense for such elements x, y, z
and thus it makes sense to say that our law of composition is
associative, namely to say that for all x, y, z as above we have
(xy)z = x(yz). If the law of composition of G is commutative, we
also say that G is commutative (or Abelian).
By a submonoid of G, we shall mean a subset H of G containing
the unit element e, and such that, if x, y ∈ H then xy ∈ H (we
say that H is closed under the law of composition). It is then clear
that H is itself a monoid, under the law of composition induced by
that of G.
A group G is a monoid, such that for every element x ∈ G there
exists an element y ∈ G such that xy = yx = e. Such an element y
is called an inverse for x. Such an inverse is unique, because if y′ is
also an inverse for x, then y′ = y′e = y′(xy) = (y′x)y = ey = y. We
denote this inverse by x⁻¹ (or by −x when the law of composition
is written additively).
Let G be a group. A subgroup H of G is a subset of G contain-
ing the unit element, and such that H is closed under the law of
composition and inverse (i.e. it is a submonoid, such that if x ∈ H
then x⁻¹ ∈ H). A subgroup is called trivial if it consists of the unit
element alone. The intersection of an arbitrary non-empty family
of subgroups is a subgroup (trivial verification).
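For a finite set, the defining properties of a group can be verified by brute force. The sketch below is an illustration only: it takes the integers modulo 6 under addition (written additively, so the unit element is 0) and checks closure, associativity, the unit element, and inverses. The function name is_group and the chosen examples are hypothetical.

```python
from itertools import product


def is_group(elements, op, identity):
    """Brute-force check of the group axioms on a finite set."""
    elements = list(elements)
    if identity not in elements:
        return False
    closed = all(op(x, y) in elements for x, y in product(elements, repeat=2))
    associative = all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(elements, repeat=3)
    )
    has_identity = all(op(identity, x) == x == op(x, identity) for x in elements)
    has_inverses = all(
        any(op(x, y) == identity == op(y, x) for y in elements) for x in elements
    )
    return closed and associative and has_identity and has_inverses


def add_mod_6(x, y):
    # Law of composition written additively: x + y reduced modulo 6.
    return (x + y) % 6


G = range(6)          # the integers modulo 6
H = [0, 2, 4]         # a subset that is closed under the law and under inverses
not_closed = [0, 1, 2]  # 1 + 2 = 3 falls outside this subset

print(is_group(G, add_mod_6, 0))
print(is_group(H, add_mod_6, 0))
print(is_group(not_closed, add_mod_6, 0))
```

Here H passes the same test as G, which is what it means for H to be a subgroup of G.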
5.2 Rings and Homomorphism 39
5.3 Module
5.4 Exercises
6.2 Interpolation
$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
where h is the step size. If we use a Taylor expansion for f (x), we
can write
$$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \cdots$$
We can then set the computed derivative $f_c'(x)$ as
$$f_c'(x) \approx \frac{f(x+h) - f(x)}{h} \approx f'(x) + \frac{h f''(x)}{2} + \cdots$$
Assume now that we will employ two points to represent the func-
tion f by way of a straight line between x and x + h. This means
that we could represent
$$f_2'(x) = \frac{f(x+h) - f(x)}{h} + O(h).$$
where the suffix 2 refers to the fact that we are using two points
to define the derivative and the dominating error goes like O(h).
This is the forward derivative formula. Alternatively, we could use
the backward derivative formula
$$f_2'(x) = \frac{f(x) - f(x-h)}{h} + O(h).$$
If the second derivative is close to zero, this simple two point
formula can be used to approximate the derivative.
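The O(h) behaviour of the two-point formulas can be seen by halving the step size. The following sketch assumes the test function f(x) = sin x at x = 1 (a hypothetical choice, with exact derivative cos 1) and prints the errors of the forward and backward formulas; each error should roughly halve when h is halved.

```python
import math


def forward_difference(f, x, h):
    """Two-point forward formula (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def backward_difference(f, x, h):
    """Two-point backward formula (f(x) - f(x - h)) / h."""
    return (f(x) - f(x - h)) / h


x = 1.0
exact = math.cos(x)  # derivative of sin at x
step_sizes = (0.1, 0.05, 0.025)

forward_errors = [abs(forward_difference(math.sin, x, h) - exact) for h in step_sizes]
backward_errors = [abs(backward_difference(math.sin, x, h) - exact) for h in step_sizes]

for h, fe, be in zip(step_sizes, forward_errors, backward_errors):
    print(h, fe, be)
```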
Above we have used the interpolating polynomial to approximate
values of a function f at points where f is not known. Another
use of the interpolating polynomial, of equal or even higher
importance in practice, is the imitation of the fundamental operations
of calculus. In all these applications the basic idea is extremely
simple: instead of performing the operation on the function
f, which may be difficult or (in cases where f is known at
discrete points only) impossible, the operation is performed on a
suitable interpolating polynomial.
6.5 Numerical Integration 49
6.6 Exercises
1
This section has been quoted from:
Tenenbaum, Morris, and Harry Pollard. Ordinary differential equations: an elementary
textbook for students of mathematics, engineering, and the sciences.
Dover Publications, 1963.
56 7 Selected Topics in Differential Equation
for every x in I.
3
This section has been quoted from:
http://en.wikipedia.org/wiki/Boundary_value_problem
7.6 Exercises
any point x on the endpoints will be ..........to the string itself. This
force is called the tension in the string and its magnitude will be
given by T (x, t).
Finally, we will let Q(x, t) represent the vertical ..........per unit
mass of any force acting on the string. Provided we again assume
that the slope of the string is small the vertical displacement of
the string at any point is then given by,
$$\rho(x) \frac{\partial^2 u}{\partial t^2} = \frac{\partial}{\partial x}\left( T(x,t) \frac{\partial u}{\partial x} \right) + \rho(x) Q(x,t)$$
This is a very difficult partial differential equation to solve so
we need to make some further simplifications.
First, we’re now going to assume that the string is ..........elas-
tic. This means that the magnitude of the tension, T (x, t), will
only depend upon how much the string stretches near x. Again,
recalling that we’re assuming that the slope of the string at any
point is small this means that the tension in the string will then
very nearly be the same as the tension in the string in its equilib-
rium position. We can then assume that the tension is a constant
value, T (x, t) = T0 .
Further, in most cases the only external force that will act upon
the string is .......... and if the string is light enough the effects of
gravity on the vertical displacement will be small, and so we will also
assume that Q(x, t) = 0. This leads to
$$\rho \frac{\partial^2 u}{\partial t^2} = T_0 \frac{\partial^2 u}{\partial x^2}$$
If we now divide by the mass density and define,
$$c^2 = \frac{T_0}{\rho}$$
we arrive at the 1-D wave equation,
$$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}$$
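A quick numerical check can make the 1-D wave equation concrete. The sketch below assumes a hypothetical travelling wave u(x, t) = sin(x − ct) with c = 2 (neither is taken from the text) and estimates both second derivatives with central differences; the residual of the equation should be close to zero.

```python
import math

c = 2.0  # hypothetical wave speed, c**2 = T0 / rho


def u(x, t):
    # Hypothetical travelling-wave profile used only for illustration.
    return math.sin(x - c * t)


def second_difference(g, s, h=1e-3):
    """Central-difference estimate of the second derivative of g at s."""
    return (g(s + h) - 2.0 * g(s) + g(s - h)) / h ** 2


x0, t0 = 0.7, 0.3
u_tt = second_difference(lambda t: u(x0, t), t0)
u_xx = second_difference(lambda x: u(x, t0), x0)
residual = u_tt - c ** 2 * u_xx

print(u_tt, c ** 2 * u_xx, residual)
```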
The initial conditions will also be a little different here from what
we saw with the heat equation. Here we have a 2nd order time
derivative and so we’ll also need two initial conditions. At any
cal concept, to decide how general its definition should be. The
definition finally settled on may seem a bit abstract, but as you
work through the various ways of constructing topological spaces,
you will get a better feeling for what the concept means. A topology
on a set X is a collection τ of subsets of X having the following
properties:
1. ∅ and X are in τ .
2. The union of the elements of any subcollection of τ is in τ .
3. The intersection of the elements of any finite subcollection of τ
is in τ .
A set X for which a topology τ has been specified is called a
topological space.
Properly speaking, a topological space is an ordered pair (X, τ )
consisting of a set X and a topology τ in X, but we often omit
specific mention of τ if no confusion will arise.
If X is a topological space with topology τ , we say that a subset
U of X is an open set of X if U belongs to the collection τ . Using
this terminology, one can say that a topological space is a set X
together with a collection of subsets of X, called open sets, such
that ∅ and X are both open, and such that arbitrary unions and
finite intersections of open sets are open.
If X is any set, the collection of all subsets of X is a topology
on X; it is called the discrete topology. The collection consisting
of X and ∅ is also a topology on X; we shall call it the indiscrete
topology, or the trivial topology.
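On a finite set the three properties can be checked directly. The sketch below is an illustration with hypothetical names (is_topology, power_set) and a made-up set X = {1, 2, 3}. Because the collection is finite, closure under pairwise unions and pairwise intersections is enough to give closure under all unions and finite intersections.

```python
from itertools import combinations


def is_topology(X, tau):
    """Check the topology axioms for a finite collection tau of subsets of X."""
    X = frozenset(X)
    tau = {frozenset(s) for s in tau}
    if frozenset() not in tau or X not in tau:
        return False
    if any(not s <= X for s in tau):
        return False
    # Pairwise closure suffices here because tau is finite.
    return all((a | b) in tau and (a & b) in tau for a, b in combinations(tau, 2))


def power_set(X):
    """All subsets of X, i.e. the discrete topology on X."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


X = {1, 2, 3}
discrete = power_set(X)
indiscrete = [set(), X]
not_a_topology = [set(), {1}, {2}, X]  # the union {1, 2} is missing

print(is_topology(X, discrete))
print(is_topology(X, indiscrete))
print(is_topology(X, not_a_topology))
```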
Suppose that τ and τ′ are two topologies on a given set X. If
τ′ ⊃ τ, we say that τ′ is finer than τ; if τ′ properly contains τ,
we say that τ′ is strictly finer than τ. We also say that τ is coarser
than τ′, or strictly coarser, in these two respective situations. We
say τ is comparable with τ′ if either τ ⊃ τ′ or τ′ ⊃ τ.
Other terminology is sometimes used for this concept. If τ′ ⊃ τ,
some mathematicians would say that τ′ is larger than τ and τ is
smaller than τ′. Many mathematicians use the words “weaker” and
“stronger” in this context. Unfortunately, some of them (particularly
analysts) are apt to say that τ′ is stronger than τ if τ′ ⊃ τ,
while others (particularly topologists) are apt to say that τ′ is
weaker than τ in the same situation.
8.2 Euler Characteristic 67
8.3.1 Simplexes
Oriented Simplexes
8.4 Exercises
9.2 Combinatorics
At the beginning of the 18th century, the following problem was
proposed:
9.3 Graphs and Trees 77
9.4 Recursion
9.5 Exercises
1. Translate the following sentences.
Certain graph problems deal with finding a path between two
vertices such that each edge is traversed exactly once, or finding a
path between two vertices while visiting each vertex exactly once.
These paths are better known as Euler path and Hamiltonian path
respectively. The Euler path problem was first proposed in the
1700s.
• An Euler path is a path that uses every edge of a graph exactly
once. It starts and ends at different vertices.
• An Euler circuit is a circuit that uses every edge of a graph
exactly once. It starts and ends at the same vertex.
There are simple criteria for determining whether a multigraph
has an Euler path or an Euler circuit. For any multigraph to have an
Euler circuit, all the degrees of the vertices must be even.
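The degree criterion can be sketched in a few lines. The code below assumes the multigraph is connected and is given as a list of edges; it uses the even-degree condition stated above for circuits, together with the standard companion fact (not stated in the passage) that a connected multigraph has an Euler path with different endpoints exactly when two vertices have odd degree. The function names and the sample graphs are hypothetical.

```python
from collections import Counter


def vertex_degrees(edges):
    """Degree of every vertex in a multigraph given as a list of (u, v) edges."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def euler_classification(edges):
    """Classify a connected multigraph by the number of odd-degree vertices."""
    odd = sum(1 for d in vertex_degrees(edges).values() if d % 2 == 1)
    if odd == 0:
        return "circuit"
    if odd == 2:
        return "path"
    return "neither"


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
chain = [("a", "b"), ("b", "c")]
# The seven bridges of Koenigsberg as a multigraph on four land masses.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

print(euler_classification(triangle))
print(euler_classification(chain))
print(euler_classification(bridges))
```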
82 9 Selected Topics in Discrete Mathematics
1
This chapter has been quoted from:
Introduction to Operations Research Seventh Edition, Hillier and Lieberman 2001
The McGraw-Hill Companies.
86 10 Selected Topics in Optimization
and
x1 ≥ 0, x2 ≥ 0, . . . , xn ≥ 0.
Common terminology for the linear programming model can now
be summarized. The function being maximized, c1 x1 + c2 x2 + · · · +
illustrate how this is done with two examples at the end of this
section.
This reference to a unit cost implies the following basic assump-
tion for any transportation problem.
The cost assumption: The cost of distributing units
from any particular source to any particular destination
is directly proportional to the number of units distributed.
Therefore, this cost is just the unit cost of distribution times
the number of units distributed. (We let cij denote this unit
cost for source i and destination j.)
The only data needed for a transportation problem model are the
supplies, demands, and unit costs. These are the parameters of the
model.
The model: Any problem (whether involving transporta-
tion or not) fits the model for a transportation problem if
it can be described completely in terms of a parameter and
it satisfies both the requirements assumption and the cost
assumption. The objective is to minimize the total cost of
distributing the units.
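Under the cost assumption, the total cost is the sum over all source and destination pairs of the unit cost times the number of units shipped. The sketch below uses entirely hypothetical data for two sources and two destinations; the names unit_cost, shipments, and total_cost are illustrations, and the code only evaluates one given shipping plan rather than searching for the cheapest one.

```python
def total_cost(unit_cost, shipments):
    """Sum of c_ij * x_ij over all sources i and destinations j."""
    return sum(
        c * x
        for cost_row, ship_row in zip(unit_cost, shipments)
        for c, x in zip(cost_row, ship_row)
    )


# Hypothetical unit costs c_ij (rows: sources, columns: destinations).
unit_cost = [[4, 6],
             [5, 3]]
# Hypothetical plan x_ij: units shipped from source i to destination j.
shipments = [[10, 0],
             [5, 20]]

supplied = [sum(row) for row in shipments]          # units leaving each source
received = [sum(col) for col in zip(*shipments)]    # units reaching each destination
cost = total_cost(unit_cost, shipments)

print(supplied, received, cost)
```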
10.4 Exercises
11.6 Exercises
However, recognition of this fact is one that took a long time for
mathematicians to accept. For example, John Wallis wrote, “These
Imaginary Quantities (as they are commonly called) arising from
the Supposed Root of a Negative Square (when they happen) are
reputed to imply that the Case proposed is impossible”.
Through the Euler formula, a complex number z = x + iy may
be written in “phasor” form
The term elementary generally denotes a method that does not use
complex analysis. For example, the prime number theorem was
first proven using complex analysis in 1896, but an elementary
proof was found only in 1949 by Erdős and Selberg.[76] The term
is somewhat ambiguous: for example, proofs based on complex
Tauberian theorems (for example, Wiener-Ikehara) are often seen
as quite enlightening but not elementary, in spite of using Fourier
1
This chapter has been quoted from:
https://en.wikipedia.org/wiki/Number_theory#Elementary_tools
106 12 Selected Topics in Number Theory
While the word algorithm goes back only to certain readers of al-
Khwārizmī, careful descriptions of methods of solution are older
than proofs: such methods (that is, algorithms) are as old as any
a ≡ b (mod n)
The parentheses mean that (mod n) applies to the entire equation,
not just to the right-hand side (here b). Sometimes, = is used
instead of ≡ ; in this case, if the parentheses are omitted, this
generally means that “mod” denotes the modulo operation applied
to the right-hand side, and the equality implies thus that 0 ≤ a < n.
The number n is called the modulus of the congruence.
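The difference between the congruence relation and the modulo operation can be shown in code. In the sketch below, congruent is a hypothetical helper that tests a ≡ b (mod n) by checking that n divides a − b, while Python's % operator plays the role of the modulo operation and returns a remainder r with 0 ≤ r < n for positive n. The numbers are made up for illustration.

```python
def congruent(a, b, n):
    """True when a and b are congruent modulo n, i.e. n divides a - b."""
    return (a - b) % n == 0


# Congruence is a relation between two integers.
first = congruent(38, 14, 12)   # 38 - 14 = 24 is a multiple of 12
second = congruent(-8, 7, 5)    # -8 - 7 = -15 is a multiple of 5
third = congruent(38, 15, 12)   # 38 - 15 = 23 is not a multiple of 12

# The modulo operation returns a single remainder in the range 0 <= r < n.
remainder = 38 % 12

print(first, second, third, remainder)
```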
12.3.2 Applications
12.4 Exercises
1. Translate the following text.
A continued fraction is an expression obtained through an it-
erative process of representing a number as the sum of its inte-
ger part and the reciprocal of another number, then writing this
other number as the sum of its integer part and another reciprocal,
and so on. In a finite continued fraction (or terminated continued
fraction), the iteration/recursion is terminated after finitely many
steps by using an integer in lieu of another continued fraction. In
contrast, an infinite continued fraction is an infinite expression. In
either case, all integers in the sequence, other than the first, must
be positive. The integers ai are called the coefficients or terms of
the continued fraction.
Continued fractions have a number of remarkable properties
related to the Euclidean algorithm for integers or real numbers.
Every rational number p/q has two closely related expressions as a
finite continued fraction, whose coefficients ai can be determined
by applying the Euclidean algorithm to (p, q). The numerical value
of an infinite continued fraction is irrational; it is defined from its
infinite sequence of integers as the limit of a sequence of values for
finite continued fractions. Each finite continued fraction of the se-
quence is obtained by using a finite prefix of the infinite continued
fraction’s defining sequence of integers. Moreover, every irrational
number α is the value of a unique infinite continued fraction, whose
coefficients can be found using the non-terminating version of the
Euclidean algorithm applied to the incommensurable values α and
1. This way of expressing real numbers (rational and irrational) is
called their continued fraction representation.
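The link between the Euclidean algorithm and the coefficients of a finite continued fraction can be sketched as follows. The example fraction 415/93 and the function names are hypothetical choices for illustration; the successive quotients of the Euclidean algorithm are the coefficients, and evaluating them from the last term backwards recovers the fraction.

```python
from fractions import Fraction


def continued_fraction(p, q):
    """Coefficients of p/q obtained from the Euclidean algorithm on (p, q)."""
    terms = []
    while q != 0:
        quotient, remainder = divmod(p, q)
        terms.append(quotient)
        p, q = q, remainder
    return terms


def evaluate(terms):
    """Value of the finite continued fraction with the given coefficients."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


terms = continued_fraction(415, 93)
print(terms)
print(evaluate(terms))

# The second, closely related expression splits the last term 7 into 6 + 1/1.
alternative = terms[:-1] + [terms[-1] - 1, 1]
print(alternative, evaluate(alternative))
```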
2. Use the correct form of the word.
1. Number theory is a vast and fascinating field of mathematics,
sometimes called “higher arithmetic,” (consist) ..........of the
study of the properties of whole numbers.
2
This section has been quoted from:
http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/The_four_colour_theorem.html
120 13 Famous problems in Math history
However the final ideas necessary for the solution of the Four
Colour Conjecture had been introduced before these last two re-
sults. Heesch in 1969 introduced the method of discharging. This
consists of assigning to a vertex of degree i the charge 6 − i. Now
from Euler’s formula we can deduce that the sum of the charges
over all the vertices must be 12. A given set S of configurations
can be proved unavoidable if for a triangulation T which does not
contain a configuration in S we can redistribute the charges (with-
out changing the total charge) so that no vertex ends up with a
positive charge.
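The claim that the charges sum to 12 can be checked on small triangulations. The sketch below assigns the charge 6 − i to each vertex of degree i and sums the charges for three standard triangulations of the sphere, described only by their vertex degrees. The function name total_charge is hypothetical, and the examples were chosen for illustration rather than taken from the source.

```python
def total_charge(degrees):
    """Sum of the charges 6 - i over all vertices, where i is the vertex degree."""
    return sum(6 - d for d in degrees)


# Vertex degrees of three triangulations of the sphere.
tetrahedron = [3] * 4    # 4 vertices, each of degree 3
octahedron = [4] * 6     # 6 vertices, each of degree 4
icosahedron = [5] * 12   # 12 vertices, each of degree 5

charges = {
    "tetrahedron": total_charge(tetrahedron),
    "octahedron": total_charge(octahedron),
    "icosahedron": total_charge(icosahedron),
}
print(charges)
```

In each case the total is 6V − 2E, and a triangulation has E = 3V − 6 by Euler's formula, which gives 12.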
Heesch thought that the Four Colour Conjecture could be
solved by considering a set of around 8900 configurations. There
were difficulties with his approach since some of his configurations
had a boundary of up to 18 edges and could not be tested for re-
ducibility. The tests for reducibility used Kempe chain arguments
but some configurations had obstacles to prevent reduction.
The year 1976 saw a complete solution to the Four Colour
Conjecture when it was to become the Four Colour Theorem for
the second, and last, time. The proof was achieved by Appel and
Haken, basing their methods on reducibility using Kempe chains.
They carried through the ideas of Heesch and eventually they con-
structed an unavoidable set with around 1500 configurations. They
managed to keep the boundary ring size down to 14, making computations
easier than for the Heesch case. There was a long period
where they essentially used trial and error together with unbeliev-
able intuition to modify their unavoidable set and their discharging
procedure. Appel and Haken used 1200 hours of computer time to
work through the details of the final proof. Koch assisted Appel
and Haken with the computer calculations.
The Four Colour Theorem was the first major theorem to be
proved using a computer, having a proof that could not be verified
directly by other mathematicians. Despite some worries about this
initially, independent verification soon convinced everyone that the
Four Colour Theorem had finally been proved. Details of the proof
appeared in two articles in 1977. Recent work has led to improve-
ments in the algorithm.
Index
abscissa, 46 boundary, 1, 59
absolute value, 2, 10
abstract, 9 calculate, 1
additive, 37 calculus, 1
additively, 37 canonical, 13
adjacent, 78, 121 category, 98
algebraic, 105 ceil, 22
algorithm, 47 chain, 70
analogy, 76 chaos, 29
analytic, 106 circle, 1
analytical, 47 circumference, 1
angle, 10 coarser, 66
antiderivative, 4 collection, 96
applicability, 50 column, 10
approach, 1–3 combinatorics, 76, 107
approximate, 2 commutativity, 38
approximation, 1, 45 compact, 95
arc, 78 compactness, 95
arithmetic, 105 comparable, 66
array, 9 completeness, 98
assignment, 88 complex, 105
associative, 37 complexity, 86
associativity, 38 component, 11
assume, 47 componentwise, 11
assumption, 89 composition, 37
asymptotic, 33 compute, 3
average, 3 conclusion, 76
axiomatically, 9 conditional, 76
confidence, 23
barycentric, 69 configuration, 122
bases, 9 conjecture, 106
basis, 97 conjugate, 11, 100
orthogonal-, 97 conjunction, 75
connective, 76 element, 10
constraint, 88 eliminate, 56
functional-, 88 elliptic, 60
nonnegativity-, 88 endomorphism, 39
continuous, 4 equation
contradict, 75 characteristic, 81
converge, 2 equivalent, 99
convergent, 98 error, 1
convex, 97 estimate, 122
coordinates, 97 evaluating, 5
coordinatize, 97 event, 30
corollary, 117 execute, 47
correctness, 75 experiment, 29
countable, 98 explicit, 77
counter-example, 117 extreme, 95
cover, 96
cryptography, 109 factorization, 115
curve, 5 fallacious, 114
cycle, 70 fallacy, 114
false, 75
decomposition, 9 feasible, 86
degenerate, 10 region, 88
denominator, 1 field, 10
derangement, 77 figure, 45
derivative, 3 finer, 66
partial-, 58 floating-point, 10
deterministic, 29 form
diagonal, 13 bilinear-, 9
diameter, 1 quadratic-, 9
dichotomy, 99 frequency, 22, 32
difference, 46 function, 3
Differential, 4 position-, 3
differential, 55 objective, 88
differentiate, 55
differentiation, 4 geometry, 1
digit, 45 graph, 77
significant-, 45 acyclic, 79
dimension, 9, 11 bipartite, 78
disjunction, 76 complete, 78
disks, 95 planar, 79
dispersion, 23 simple-, 78
distance, 5 undirected-, 78
distributivity, 39 weighted, 79
domain, 99
homogeneous, 81
edge, 67, 78 homomorphism, 39
eigenfunction, 59 hyperbolic, 60
eigenvalue, 13 hyperplane, 69
eigenvector, 13 hypotheses, 47
perimeter, 1 sample, 31
permutation, 77 sampling, 31
phenomenon, 19 scalar, 9
plane, 95 scientific, 85
polygon, 1 secant, 3
polyhedron, 67 sentence, 75
polynomial declarative-, 75
characteristic-, 13 sequence, 1, 47
interpolating-, 48 sequential, 96
root, 13 sesqui, 100
predictability, 29 sesquilinear, 100
premise, 76 set
probabilistic, 107 ordered-, 95
probability, 19, 29 orthogonal-
conditional-, 31 maximal-, 97
process, 1 ordered, 66
proof, 108 side, 1
propagation, 49 simplex
proportional, 90 method, 87
proposition, 75 simplicial, 70
compound-, 75 skew, 22
slope, 1
quadratic, 106 solution, 88
quadrature, 49 feasible-, 88
quantitatively, 19 optimal-, 88
quartile, 24 space, 95
quasi triangular, 14 Euclidean-, 95
quaternion, 120 metric-, 96
quotient, 30 complete-, 97
topological-, 96
randomness, 29 vector-, 12
range, 22 linear-, 12
crude-, 22 quotient-, 69
rate, 3 standard deviation, 23
ratio, 1 statistical, 19
recurrence, 81 statistics, 19
Recursion, 80 descriptive-, 19
reducibility, 122 inductive-, 19
reducible, 122 inferential-, 19
reformulate, 91 stochastic, 30
regular, 1 subgraph, 78
relation subgroup, 40
equivalence-, 98 subinterval, 5
congruence, 108 submodule, 41
equivalence, 69 submonoid, 38
replacement, 31 subsequence, 96
restrictive, 91 subsequent, 45
ring, 39 subspaces, 9
rounding, 45 subtract, 22
successive, 46 true, 75
surface, 69
surjective, 99 uncountable, 98
symmetric, 78 uniform, 22
uniformly, 96
union, 66
tangent, 3
unoriented, 70
term, 2
unpredictability, 29
terminology, 66
tetrahedron, 68
variability, 23
topology, 10, 65
variable, 2
discrete-, 66 decision-, 87
indiscrete, 66 random-, 20
trajectory, 57 variance, 24
transitive, 45 variation
transportation, 88 coefficient of- , 23
tree, 79 vector, 9
rooted-, 79 space, 10
spanning-, 79 velocity, 3
minimal, 79 instantaneous-, 3
triangle verification, 38
right, 114 vertex, 67, 78
triangulable, 70 degree of a-, 78
triangular, 13 isolated-, 78
triangulation, 70, 122 volume, 5
trivial, 38
trivially, 39 weighting, 24