
The Best Approximation Theorem

Siddharth Kothari
September 2023

1 Abstract
In this paper, we demonstrate the power of abstraction through linear algebra,
and more specifically, through the ‘best approximation theorem’. We begin
with the problem of solving, or finding the best possible solution to, an
over-determined system of linear equations. We then generalize this idea and
apply it to seemingly unrelated areas, such as approximating trigonometric
and exponential functions using polynomials, or even the other way around:
approximating polynomials using trigonometric functions!

2 The Setup in R^n
2.1 Solving Overdetermined Systems
To begin, we have a system of linear equations, Ax = b, where A is
an m by n real matrix and b is an m-dimensional vector. For a system to be
over-determined, there must be more equations than unknowns; that is, m must
be greater than n.

As an example, over-determined systems show up when computing a line of best
fit for a set of data points. Let's say we have n data points,
P_i = (x_i, y_i), with i ranging from 1 to n. The equation for the line of best
fit looks like y = mx + c, and we can construct a system of n equations to
solve for m and c, where the ith equation reads y_i = m x_i + c.
Recasting this into a matrix-vector equation, we get:

\[
\begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}
\begin{pmatrix} m \\ c \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\]
Note that an over-determined system of equations may or may not have an
exact solution (for example, a line can pass through all three of the points
(1,2), (2,4), and (3,6)), but intuitively, we know that most of the time it
will not have an exact solution.
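
Concretely, for those three points the system

\[
\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} m \\ c \end{pmatrix}
=
\begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}
\]

is solved exactly by m = 2, c = 0 (the line y = 2x), while perturbing any one
of the y-values would leave the system with no exact solution.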

While an exact solution to an over-determined system Ax = b may not exist
(note: the system is unsolvable exactly when the vector b is not in the
column space of the matrix A), we can still try to find the ‘best’ solution.
In this case, we take the ‘best’ solution to be the one for which the ‘error’,
e = ∥Ax − b∥, is as small as possible. Phrased differently, we want to answer
the question:

Question 2.1.1: In the vector space R^m, given an m by n matrix A and a
vector b, find the vector p in C(A) that is closest to b, that is, such that
the quantity ∥p − b∥ is minimized. This vector p is called the projection of
the vector b onto C(A).

Once we find the vector p, to find the solution x we can simply solve the
equation Ax = p. Geometrically, the column space of a matrix can be a line,
a plane, a three-dimensional subspace, and so on!

One way to visualize a specific (two-dimensional) case of this problem is to
go back to the matrix-vector equation that we had for the line of best fit.
The column space of the coefficient matrix is a plane in R^n, and the vector
b is (almost always) sticking out of the plane. Which vector in C(A) is
closest to b? The key to solving this problem (and the general one) is to
generalize the familiar fact that the distance between a point and a line
(a one-dimensional subspace) is least when the line segment connecting the
two is perpendicular to the given line. Similarly, the distance between a
point (b) and a plane (C(A)) is least when the line connecting the two is
perpendicular to the plane! It turns out this holds in general, and we give
a formal statement and proof sketch of this fact:

Theorem 2.1.1: In the vector space R^m, given an m by n matrix A and
a vector b, the vector p in C(A) is closest to b if and only if e = b − p is
perpendicular to C(A), that is, e is perpendicular to all the vectors in C(A).
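
Proof sketch: suppose e = b − p is perpendicular to C(A), and let q be any
other vector in C(A). The difference p − q lies in C(A), so it is orthogonal
to e, and the Pythagorean theorem gives

\[
\|b - q\|^2 = \|(b - p) + (p - q)\|^2 = \|b - p\|^2 + \|p - q\|^2 \ge \|b - p\|^2,
\]

with equality only when q = p, so p is the closest vector. Conversely, if e
had a nonzero component along some unit vector u in C(A), then replacing p
with p + (e · u)u (still a vector in C(A)) would strictly shrink the error,
so the closest p must have e perpendicular to C(A).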

2.2 The One-Dimensional Case

In this section, we find an explicit formula for p, the projection of b onto
the column space of A, an m by 1 matrix. That is, C(A) = span(a), a
one-dimensional subspace of R^m.

By Theorem 2.1.1, we know that for p to be the best approximation of b, the
vector b − p must be orthogonal to a. In equation form (remember that two
vectors a and b are orthogonal iff a · b = 0),

a · (b − p) = 0

Since p is in the span of a, we can write p = ax for some number x:

a · (b − ax) = 0
a · b = x(a · a)

Finally, solving for x, we get:

\[
x = \frac{a \cdot b}{a \cdot a}
\]

Multiplying by a, we get that p is equal to:

\[
p = \frac{a \cdot b}{a \cdot a}\, a
\]
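
As a quick numerical sanity check of this formula, here is a minimal sketch in
Python with NumPy (the specific vectors a and b are invented for illustration):

import numpy as np

# Project b onto the line spanned by a, using p = ((a . b) / (a . a)) a.
a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 2.0, 3.0])

x = np.dot(a, b) / np.dot(a, a)  # scalar coefficient: 6 / 3 = 2
p = x * a                        # projection: (2, 2, 2)

# By Theorem 2.1.1, the error e = b - p must be orthogonal to a.
e = b - p
print(p, np.dot(a, e))           # -> [2. 2. 2.] 0.0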

2.3 The n-dimensional Case

Now we make the generalization from the one-dimensional case to the
n-dimensional case. The matrix whose column space we are projecting onto
is such that its columns are linearly independent (independence guarantees
that the coefficients below are unique and, by Theorem A1, that A^T A is
invertible). In other words, A is an m by n matrix such that
C(A) = span(a_1, · · · , a_n) (the ith column of the matrix A is equal to a_i).

Applying Theorem 2.1.1 again, we know that for p to be the vector in C(A) that
is closest to b, the corresponding e must be perpendicular to C(A). That is, it
should be perpendicular to all the basis vectors of C(A). We get n equations,
where the ith equation reads a_i · (b − p) = 0. We can collect all these
n equations into a single matrix-vector equation:

A^T (b − p) = 0

Now note that p is in the column space of A, and thus we can find scalars
x_1, · · · , x_n such that

p = x_1 a_1 + · · · + x_n a_n

This is identical to multiplying the matrix A by the vector
x = (x_1, · · · , x_n), so we have p = Ax. Substituting this into the previous
equation:

A^T (b − Ax) = 0
A^T b = A^T A x

To solve for x, we can multiply both sides by the inverse of A^T A, which
exists because the columns of A are linearly independent (see Theorem A1):

x = (A^T A)^{-1} A^T b
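
To see this formula in action, here is a minimal sketch that computes a line of
best fit by solving the normal equations, and cross-checks the result against
NumPy's built-in least-squares routine (the data points are invented for
illustration):

import numpy as np

# Five data points give an over-determined 5-by-2 system for y = m*x + c.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

A = np.column_stack([xs, np.ones_like(xs)])  # columns: [x_i, 1]

# Normal equations: A^T A x = A^T b. Using solve() avoids forming the inverse.
x_hat = np.linalg.solve(A.T @ A, A.T @ ys)
p = A @ x_hat                                # projection of ys onto C(A)

# Cross-check against NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(x_hat, np.allclose(x_hat, x_ref))      # slope ~ 2, intercept ~ 1, True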

2.4 Using an Orthonormal Basis

The equation simplifies drastically if we use an orthonormal basis for the
column space of the matrix. If the columns of a matrix Q are orthonormal, then
Q^T Q = I (the dot product q_i · q_j is 1 when i = j and 0 otherwise), so the
normal equations reduce to x = Q^T b, and the projection becomes p = Q Q^T b,
with no matrix inversion required.
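
A minimal sketch of this simplification, reusing the invented data above and
obtaining an orthonormal basis for C(A) via the QR factorization:

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 7.1, 8.8])
A = np.column_stack([xs, np.ones_like(xs)])

# Q has orthonormal columns spanning C(A), so Q^T Q = I.
Q, R = np.linalg.qr(A)

# With an orthonormal basis, projecting is just p = Q Q^T b: no inverse needed.
p_orth = Q @ (Q.T @ ys)
p_normal = A @ np.linalg.solve(A.T @ A, A.T @ ys)
print(np.allclose(p_orth, p_normal))         # True: same projection either way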

3 The Set Up in an Inner Product Space
4 Application to the Space of Continuous Functions
5 Appendix
Theorem A1: Given a matrix A whose columns are linearly independent, the
matrix A^T A is invertible.
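
A short proof: suppose A^T A x = 0 for some vector x. Then

\[
0 = x^T A^T A x = (Ax)^T (Ax) = \|Ax\|^2,
\]

so Ax = 0. Since the columns of A are linearly independent, Ax = 0 forces
x = 0. The square matrix A^T A therefore has a trivial null space, and so it
is invertible.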
