The Best Approximation Theorem INCOMPLETE
Siddharth Kothari
September 2023
1 Abstract
In this paper, we demonstrate the power of abstraction through linear algebra,
and more specifically, through the ‘best approximation theorem’. We begin
with the problem of solving, or finding the best possible solution to, an
over-determined system of linear equations. We then aim to generalize this
idea and apply it to seemingly unrelated areas, such as approximating
trigonometric and exponential functions using polynomials, or even doing it the
other way around: approximating polynomials using trigonometric functions!
2 The Setup in Rⁿ
2.1 Solving Overdetermined Systems
To begin, we have a system of linear equations, namely Ax = b, where A is
an m by n real matrix, and b is an m-dimensional vector. For a system to be
over-determined, there must be more equations than unknowns; that is, m must
be greater than n.
While we know that an exact solution to an over-determined system Ax = b may
not exist (note: the system of equations is unsolvable exactly when the vector
b is not in the column space of the matrix A), we can still try to find the ‘best’
solution. In this case, we quantify the ‘best’ solution to be the one for which
the ‘error’ e = ∥Ax − b∥ is as small as possible. Phrased differently, we want
to answer the question: which vector p in C(A) is closest to b?
Once we find the vector p, we can recover the solution x by simply solving the
equation Ax = p, which is solvable precisely because p lies in C(A). Geometrically,
the column space of a matrix can be a line, a plane, a three-dimensional
subspace, and so on!
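To make this concrete, here is a minimal numerical sketch of finding the best solution to an over-determined system. It assumes NumPy is available; the matrix and vector are hypothetical examples, and np.linalg.lstsq minimizes exactly the error ∥Ax − b∥ described above:

```python
import numpy as np

# An over-determined system: 3 equations, 2 unknowns (m = 3 > n = 2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# lstsq finds the x minimizing the error ||Ax - b||.
x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

p = A @ x   # p is the vector in C(A) closest to b
e = b - p   # the error vector
# The error is perpendicular to the column space: A^T e = 0 (up to round-off).
print(x, A.T @ e)
```

Here b is deliberately chosen to lie outside C(A), so no exact solution exists and the best approximation is the interesting case.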
One way to visualize a specific (2D) case of this problem is to go back to the
matrix-vector equation that we had for the line of best fit. There, the column
space of the coefficient matrix is a plane in Rᵐ, and the vector b is (almost
always) sticking out of the plane. What vector in C(A) is closest to b? The key
to solving this problem (and the general one) is to generalize the familiar fact
that the distance between a point and a line (a one-dimensional subspace) is
least when the line segment connecting the two is perpendicular to the given line.
Similarly, the distance between a point (b) and a plane (C(A)) is least when
the line connecting the two is perpendicular to the plane! It turns out this
holds in general, and we give a formal ‘proof’, starting with the one-dimensional
case: to project b onto the line spanned by a vector a, the error b − p must be
perpendicular to a:
a · (b − p) = 0

Since p is in the span of a, we can write p = xa, for some number x:

a · (b − xa) = 0

a · b = x(a · a)

Finally, solving for x, we get:

x = (a · b)/(a · a)

Multiplying by a, we get that p is equal to:

p = ((a · b)/(a · a)) a
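The one-dimensional projection formula above can be checked numerically. A small sketch, assuming NumPy, with hand-picked example vectors a and b:

```python
import numpy as np

# Projection of b onto the line spanned by a:  p = (a·b / a·a) a
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 3.0])

x = np.dot(a, b) / np.dot(a, a)   # the scalar from the derivation
p = x * a                          # the closest point on the line

# The error b - p is perpendicular to a:
print(np.dot(a, b - p))   # 0.0
```

For these vectors a · b = 9 and a · a = 9, so x = 1 and p = a itself; the dot product of a with the error vanishes, as the derivation predicts.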
Applying theorem 2.1.1 again, we know that for p to be the vector in C(A) that
is closest to b, the corresponding error e must be perpendicular to C(A). That is,
it should be perpendicular to all the basis vectors of C(A), i.e. to the columns
a₁, …, aₙ of A. We get n equations, and the ith equation reads: aᵢ · (b − p) = 0.
We can package all these n equations into a single matrix-vector equation:

Aᵀ(b − p) = 0
Now note that p is in the column space of A, and thus we can find scalars
x₁, …, xₙ such that

p = x₁a₁ + · · · + xₙaₙ

This is identical to multiplying the matrix A by the vector x, so we have
that p = Ax. Substituting this into the previous equation:
Aᵀ(b − Ax) = 0

Aᵀb = AᵀAx

x = (AᵀA)⁻¹Aᵀb

Here the inverse exists because, by Theorem A1 in the appendix, AᵀA is invertible
whenever the columns of A are linearly independent.
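The normal equations AᵀAx = Aᵀb can be solved directly. A short sketch (assuming NumPy; the data is a hypothetical example) that also verifies the error is perpendicular to every column of A:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations A^T A x = A^T b
# (preferable in practice to forming the inverse explicitly).
x = np.linalg.solve(A.T @ A, A.T @ b)

p = A @ x           # the projection of b onto C(A)
e = b - p           # the error vector
print(x, A.T @ e)   # A^T e should be the zero vector
```

Solving the linear system with np.linalg.solve rather than computing (AᵀA)⁻¹ is numerically safer, though the result is the same x the formula above describes.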
3 The Set Up in an Inner Product Space
4 Application to the Space of Continuous Functions
5 Appendix
Theorem A1: Given a matrix A whose columns are linearly independent, the
matrix AᵀA is invertible.
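A quick numerical illustration of this theorem, assuming NumPy; both matrices are hypothetical examples:

```python
import numpy as np

# Independent columns: A^T A has full rank, hence is invertible.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
print(np.linalg.matrix_rank(A.T @ A))   # 2

# Dependent columns (second = 2 x first): A^T A is singular.
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
print(np.linalg.matrix_rank(B.T @ B))   # 1
```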