Interpolation and approximation
Given a set of points $\{(x_j, y_j),\ j = 1, \dots, n\}$, and given a functional
form $f(x; \vec\beta)$, find the "best" $\vec\beta$ so that $f(x; \vec\beta)$ "models" the data.

Interpolation: $f(x_j; \vec\beta) = y_j$

Approximation: $f(x_j; \vec\beta) + \varepsilon_j = y_j$, where $\varepsilon_j$ is "noise", $E(\varepsilon_j) = 0$
Least squares approximation
$E(\varepsilon_j) = 0$, $E(\varepsilon_j^2) = \sigma_j^2$, $\sigma_j \approx \mathrm{const}$

$$\chi^2(\vec\beta) = \sum_{j=1}^{n} \left( y_j - f(x_j; \vec\beta) \right)^2 \;\Rightarrow\; \min$$
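A minimal numerical sketch (not from the slides) of such a fit, using scipy.optimize.curve_fit; the exponential-decay model and the synthetic data are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: f(x; beta) = beta_0 * exp(-beta_1 * x)
def f(x, b0, b1):
    return b0 * np.exp(-b1 * x)

# Synthetic data with roughly constant noise level (sigma_j ~ const)
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = f(x, 2.0, 0.7) + 0.05 * rng.standard_normal(x.size)

# curve_fit minimizes chi^2(beta) = sum_j (y_j - f(x_j; beta))^2
beta, cov = curve_fit(f, x, y, p0=[1.0, 1.0])
print(beta)
```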
Weighted least squares
$E(\varepsilon_j) = 0$, $E(\varepsilon_j^2) = \sigma_j^2$

If the $\sigma_j$ are significantly different (heteroscedasticity),

$$\chi^2(\vec\beta) = \sum_{j=1}^{n} \frac{\left( y_j - f(x_j; \vec\beta) \right)^2}{\sigma_j^2} \;\Rightarrow\; \min$$
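A sketch of the weighted case (again not from the slides): passing the per-point sigma to scipy.optimize.curve_fit makes it minimize the weighted sum above. The model and noise levels are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, b0, b1):
    return b0 * np.exp(-b1 * x)

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 50)
sigma = 0.02 + 0.1 * x            # heteroscedastic noise levels sigma_j
y = f(x, 2.0, 0.7) + sigma * rng.standard_normal(x.size)

# With sigma given, curve_fit minimizes
# chi^2(beta) = sum_j ((y_j - f(x_j; beta)) / sigma_j)^2
beta, cov = curve_fit(f, x, y, p0=[1.0, 1.0], sigma=sigma, absolute_sigma=True)
print(beta)
```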
Least absolute deviations
Also known as $L_1$ regression:

$$S(\vec\beta) = \sum_{j=1}^{n} \left| y_j - f(x_j; \vec\beta) \right| \;\Rightarrow\; \min$$
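A minimal sketch of $L_1$ regression via a general-purpose minimizer; the straight-line model, the synthetic data, and the outlier are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def f(x, beta):
    return beta[0] + beta[1] * x          # straight-line model for illustration

def s(beta, x, y):
    # S(beta) = sum_j |y_j - f(x_j; beta)|
    return np.sum(np.abs(y - f(x, beta)))

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
y[5] += 3.0                               # an outlier: L1 is less sensitive to it

# Nelder-Mead handles the non-smooth absolute-value objective
res = minimize(s, x0=[0.0, 0.0], args=(x, y), method="Nelder-Mead")
print(res.x)
```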
Total least squares
Also known as orthogonal distance regression: minimize the sum of
squares of the orthogonal distances from the observations to the curve.
Can be more appropriate, e.g., if both variables, x and y, have
measurement errors.
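One way to do this in practice is scipy.odr; the sketch below assumes a straight-line model and illustrative error levels for both x and y:

```python
import numpy as np
from scipy import odr

def f(beta, x):                            # note the (beta, x) argument order for scipy.odr
    return beta[0] + beta[1] * x

rng = np.random.default_rng(3)
x_true = np.linspace(0, 1, 30)
x = x_true + 0.02 * rng.standard_normal(x_true.size)                # errors in x ...
y = 1.0 + 2.0 * x_true + 0.05 * rng.standard_normal(x_true.size)    # ... and in y

# Supply error estimates for both variables
data = odr.RealData(x, y, sx=0.02 * np.ones_like(x), sy=0.05 * np.ones_like(x))
result = odr.ODR(data, odr.Model(f), beta0=[0.0, 1.0]).run()
print(result.beta)
```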
Linear least squares
Linear least squares
Consider an ordinary least squares problem,

$$\xi(\vec\beta) = \sum_{j=1}^{n} \left( y_j - f(x_j; \vec\beta) \right)^2 \;\Rightarrow\; \min$$

Let the model $f(x; \vec\beta)$ be a linear function of $\vec\beta$, a linear combination
of $m$ basis functions $\varphi_k(x)$:

$$f(x; \vec\beta) = \sum_{k=1}^{m} \beta_k \varphi_k(x)$$

Typically, we want $m < n$.
Linear least squares
Let the model $f(x; \vec\beta)$ be a linear function of $\vec\beta$, a linear combination
of $m$ basis functions $\varphi_k(x)$:

$$f(x; \vec\beta) = \sum_{k=1}^{m} \beta_k \varphi_k(x)$$

The basis functions need not be linear:
▶ polynomials: $\varphi_k(x) = x^k$
▶ Fourier series: $\varphi_k(x) = e^{i s_k x}$
▶ $\varphi_k(x) = x^k \log x$
▶ ...
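As an illustration (not from the slides), a model that is linear in $\vec\beta$ built from a small polynomial basis; the basis choice is an assumption:

```python
import numpy as np

# Illustrative polynomial basis phi_k(x) = x^k, k = 0, 1, 2  (m = 3)
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

def model(x, beta):
    # f(x; beta) = sum_k beta_k * phi_k(x) -- linear in beta, not necessarily in x
    return sum(b * phi(x) for b, phi in zip(beta, basis))

x = np.linspace(0, 1, 5)
print(model(x, [1.0, -2.0, 0.5]))
```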
Linear least squares
We minimize with respect to $\vec\beta$

$$\xi(\vec\beta) = \sum_{j=1}^{n} |z_j|^2$$

where ($j = 1, \dots, n$)

$$z_j = y_j - \left( \beta_1 \varphi_1(x_j) + \beta_2 \varphi_2(x_j) + \cdots + \beta_m \varphi_m(x_j) \right)$$

which is equivalent to

$$\xi(\vec\beta) = \left\| y - A\vec\beta \right\|_2^2$$

with $y = (y_1, \cdots, y_n)^T$ and $A_{jk} = \varphi_k(x_j)$.
Design matrix
The design matrix A is an $n \times m$ matrix

$$A = \begin{pmatrix}
\varphi_1(x_1) & \varphi_2(x_1) & \cdots & \varphi_m(x_1) \\
\varphi_1(x_2) & \varphi_2(x_2) & \cdots & \varphi_m(x_2) \\
\vdots & \vdots & & \vdots \\
\varphi_1(x_n) & \varphi_2(x_n) & \cdots & \varphi_m(x_n)
\end{pmatrix}$$

The dimensions of the design matrix are (# of observations) × (# of parameters).
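A minimal sketch of assembling the design matrix from a list of basis functions (the polynomial basis is illustrative):

```python
import numpy as np

# Same illustrative basis as above: phi_k(x) = x^k, k = 0, 1, 2
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

def design_matrix(x, basis):
    """A[j, k] = phi_k(x_j): one row per observation, one column per basis function."""
    return np.column_stack([phi(x) for phi in basis])

x = np.linspace(0, 1, 7)
A = design_matrix(x, basis)
print(A.shape)        # (7, 3) = (# of observations, # of parameters)
```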
Example: straight line fit

The model is

$$f(x; \vec\beta) = \beta_1 + \beta_2 x$$

$m = 2$: $\varphi_1(x) = 1$, $\varphi_2(x) = x$,

and the design matrix is

$$A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$$
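For this example the fit can be computed with np.linalg.lstsq; the synthetic data below are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix for f(x; beta) = beta_1 + beta_2 * x
A = np.column_stack([np.ones_like(x), x])

# np.linalg.lstsq solves the linear least squares problem directly
beta, residual, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(beta)           # approximately [1.0, 2.0]
```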
Linear least squares
Normal equations
Linear least squares: normal equations
To minimize the quadratic form

$$\xi(\vec\beta) = \left\| y - A\vec\beta \right\|_2^2,$$

set the derivatives to zero,

$$\frac{\partial}{\partial \beta_k}\, \xi(\vec\beta) = 0, \qquad k = 1, \cdots, m,$$

and obtain the normal equations:

$$A^T A\, \vec\beta = A^T y$$
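A direct sketch of solving the normal equations for the straight-line example (synthetic data assumed); simple, but see the conditioning caveat on the next slide:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
A = np.column_stack([np.ones_like(x), x])      # straight-line design matrix

# Solve the normal equations A^T A beta = A^T y directly
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)
```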
Linear least squares: normal equations
The normal equations

$$A^T A\, \vec\beta = A^T y$$

give a formal solution of a linear least squares problem.

However,

$$\mathrm{cond}\left( A^T A \right) = \left[ \mathrm{cond}\, A \right]^2,$$

so that typically the system of normal equations is very poorly conditioned.
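This is easy to observe numerically; the sketch below uses a monomial (Vandermonde) design matrix as an illustrative ill-conditioned case:

```python
import numpy as np

# A deliberately ill-conditioned design matrix: a high-degree monomial basis
x = np.linspace(0, 1, 50)
A = np.vander(x, 10)

print(np.linalg.cond(A))          # cond(A)
print(np.linalg.cond(A.T @ A))    # ~ [cond(A)]^2, much larger
```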
Linear least squares
QR factorization of the design matrix
Linear least squares: QR factorization
Recall that a matrix A can be factorized into

$$A = QR,$$

where Q is orthogonal ($Q^T Q = 1$) and R is upper triangular.

Since a design matrix is thin and tall ($m < n$), the last $n - m$ rows of R
are zero:

$$A = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix},$$

where $R_1$ is $m \times m$.
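In numpy terms, the "complete" and "reduced" (thin) QR factorizations of a tall design matrix look like this; the basis and sizes here are illustrative:

```python
import numpy as np

x = np.linspace(0, 1, 50)
A = np.column_stack([np.ones_like(x), x, x**2])   # n = 50, m = 3

# "complete" mode: Q is n x n, R is n x m with its last n - m rows zero
Q, R = np.linalg.qr(A, mode="complete")
print(Q.shape, R.shape)                  # (50, 50) (50, 3)

# "reduced" (thin) mode keeps only the first m columns of Q and R1 itself
Q1, R1 = np.linalg.qr(A, mode="reduced")
print(Q1.shape, R1.shape)                # (50, 3) (3, 3)
```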
Linear least squares: QR factorization
Since the 2-norm of a vector is invariant under rotation by an
orthogonal matrix Q, we rotate the residual $y - A\vec\beta$:

$$\xi(\vec\beta) = \left\| y - A\vec\beta \right\|_2^2
= \left\| Q^T \left( y - A\vec\beta \right) \right\|_2^2
= \left\| Q^T y - \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \vec\beta \right\|_2^2$$

Next, write

$$Q^T y = \begin{bmatrix} f \\ r \end{bmatrix}$$

with $\dim f = m$ and $\dim r = n - m$.
Linear least squares: QR factorization
This way,

$$\xi(\vec\beta) = \left\| f - R_1 \vec\beta \right\|_2^2 + \left\| r \right\|_2^2,$$

and the minimum of $\xi(\vec\beta)$ satisfies

$$R_1 \vec\beta = f$$
Algorithm
▶ Factorize the design matrix, $A = QR$
▶ Rotate $y \to Q^T y$ (only need the first $m$ rows $\Rightarrow$ thin QR)
▶ Solve $R_1 \vec\beta = f$ by back substitution.
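A minimal sketch of the whole algorithm, assuming numpy/scipy and the straight-line example data:

```python
import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, y):
    """Linear least squares via the thin QR factorization of the design matrix."""
    Q1, R1 = np.linalg.qr(A, mode="reduced")     # A = Q1 R1, with R1 of size m x m
    f = Q1.T @ y                                 # the first m rows of Q^T y
    return solve_triangular(R1, f, lower=False)  # back substitution

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 40)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
A = np.column_stack([np.ones_like(x), x])
print(lstsq_qr(A, y))      # close to [1.0, 2.0]
```

In practice np.linalg.lstsq performs a comparably stable factorization-based solve internally, so this sketch is mainly to make the three algorithm steps explicit.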