Regression Using LS Handout
Function class. For the problem (1) to make sense, our search
needs to be limited to a class of functions F . For example, we
might restrict F to be polynomial, or twice differentiable, etc.
Choosing this F is a modeling problem, as it basically encodes
what we believe to be true (or reasonably close to true) about the
function we are trying to discover. It also has to be something we can search over computationally.
Loss function. This penalizes the deviation of the $f(t_m)$ from the $y_m$. Given a loss function $\ell(\cdot,\cdot) : \mathbb{R}\times\mathbb{R}\to\mathbb{R}_+$ that quantifies the penalty for the deviation at a single sample location, we can assign a (positive) numeric score to the performance of every candidate $f$ by summing over all the samples. This allows us to write (2) more precisely as an optimization problem:
$$\min_{f\in\mathcal{F}}\ \sum_{m=1}^{M} \ell\bigl(y_m, f(t_m)\bigr).$$
There are again many loss functions you might consider, and depending on the context, some might be more natural than others. The classical choice is the squared error, which gives the least-squares problem
$$\min_{f\in\mathcal{F}}\ \sum_{m=1}^{M} \bigl|y_m - f(t_m)\bigr|^2.$$
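As a small numerical illustration of this objective, here is a sketch in Python with NumPy; the sample values and the candidate function are invented for the example.

```python
import numpy as np

# Hypothetical sample data: locations t_m (scalars here) and observations y_m.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

def sq_loss(y_m, z):
    # squared-error loss  ell(y_m, f(t_m)) = |y_m - f(t_m)|^2
    return (y_m - z) ** 2

def empirical_loss(f, t, y, loss=sq_loss):
    # sum_m ell(y_m, f(t_m)) for a candidate function f
    return sum(loss(ym, f(tm)) for tm, ym in zip(t, y))

# Score one candidate from F, e.g. the linear function f(t) = 2t.
print(empirical_loss(lambda s: 2.0 * s, t, y))
```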
Suppose, for example, that we take $\mathcal{F}$ to be the set of linear functions of $t\in\mathbb{R}^D$,
$$f(t) = \langle t, w\rangle = t^T w.$$
We then want to choose $w\in\mathbb{R}^D$ so that
$$y_m \approx t_m^T w, \qquad m = 1,\ldots,M,$$
which we can do by solving
$$\min_{w\in\mathbb{R}^D}\ \sum_{m=1}^{M} \bigl|y_m - t_m^T w\bigr|^2.$$
Stacking the $t_m^T$ as the rows of an $M\times D$ matrix $A$, and the observations into a vector $y$,
$$A = \begin{bmatrix} t_1^T \\ t_2^T \\ \vdots \\ t_M^T \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix},$$
we can write this compactly as
$$\min_{w\in\mathbb{R}^D}\ \|y - Aw\|_2^2.$$
Once we have a solution $\hat w$, our estimate of the underlying function is
$$\hat f(t) = t^T\hat w = \hat w_1 t_1 + \cdots + \hat w_D t_D.$$
We can also include a constant offset by taking
$$f(t) = w_0 + w_1 t_1 + \cdots + w_D t_D,$$
which fits the same framework if we append a 1 to the front of each $t_m$.
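A minimal sketch of this fit in NumPy, assuming synthetic data; `np.linalg.lstsq` solves $\min_w \|y - Aw\|_2^2$ directly, and the constant offset is handled by appending a column of ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: M sample locations t_m in R^D, noisy linear observations.
M, D = 50, 3
T = rng.standard_normal((M, D))              # row m is t_m^T
w_true = np.array([1.0, -2.0, 0.5])
y = T @ w_true + 0.01 * rng.standard_normal(M)

# Stack the t_m^T as rows of A and solve  min_w ||y - A w||_2^2.
A = T
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
f_hat = lambda t: t @ w_hat                  # estimated function  fhat(t) = t^T w_hat

# To include a constant offset w_0, append a column of ones to A.
A_aff = np.hstack([np.ones((M, 1)), T])
w_aff, *_ = np.linalg.lstsq(A_aff, y, rcond=None)

print(w_hat)            # close to w_true
print(w_aff[0])         # estimated offset w_0 (close to zero here)
```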
More generally, we can model the target function as a linear combination of fixed basis functions $\psi_1,\ldots,\psi_N$,
$$f(t) = \sum_{n=1}^{N} x_n\psi_n(t),$$
and look for coefficients $x_n$ such that
$$y_1 \approx \sum_{n=1}^{N} x_n\psi_n(t_1), \quad\cdots,\quad y_M \approx \sum_{n=1}^{N} x_n\psi_n(t_M).$$
We find the $x_n$ by solving
$$\min_{x\in\mathbb{R}^N}\ \sum_{m=1}^{M}\Bigl|y_m - \sum_{n=1}^{N} x_n\psi_n(t_m)\Bigr|^2 \;=\; \min_{x\in\mathbb{R}^N}\ \sum_{m=1}^{M}\bigl|y_m - \Psi(t_m)^T x\bigr|^2, \qquad (3)$$
where $\Psi(\cdot) : \mathbb{R}^D \to \mathbb{R}^N$ is the map
$$\Psi(t) = \begin{bmatrix} \psi_1(t) \\ \psi_2(t) \\ \vdots \\ \psi_N(t) \end{bmatrix}.$$
Collecting the $\Psi(t_m)^T$ as the rows of an $M\times N$ matrix $A$ (so that $A[m,n] = \psi_n(t_m)$), this is again
$$\min_{x\in\mathbb{R}^N}\ \|y - Ax\|_2^2,$$
and the estimate of the underlying function is
$$\hat f(t) = \sum_{n=1}^{N}\hat x_n\psi_n(t).$$
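A sketch of the same recipe with a nonlinear basis, for scalar $t$; the choice $\psi_n(t) = t^{n-1}$ (a polynomial basis) and the synthetic data are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar example with the polynomial basis  psi_n(t) = t^(n-1),  n = 1, ..., N.
M, N = 40, 4
t = np.sort(rng.uniform(-1.0, 1.0, M))
y = np.sin(2.0 * t) + 0.05 * rng.standard_normal(M)

def Psi(s):
    # the map Psi : R -> R^N,  Psi(s) = (psi_1(s), ..., psi_N(s))
    return np.array([s ** n for n in range(N)])

# A[m, n] = psi_n(t_m), i.e. the m-th row of A is Psi(t_m)^T.
A = np.stack([Psi(tm) for tm in t])

# Solve  min_x ||y - A x||_2^2  and form the estimated function.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
f_hat = lambda s: Psi(s) @ x_hat             # fhat(t) = sum_n xhat_n psi_n(t)

print(np.linalg.norm(y - A @ x_hat))         # residual of the fit
```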
We have seen that in both the linear regression case and the
nonlinear regression case, the problem reduces to the same form.
We will look at problems like this multiple times in this course,
and it is worth starting to study this kind of problem from the
perspective of linear algebra.
We will answer these questions using the following facts from
linear algebra. For any M × N matrix A,
adding any $v\in\mathrm{Null}(A)$ to a vector $x$ does not change $Ax$, since $Av = 0$.
4. If $\mathrm{rank}(A) = M$, then there exists at least one $\hat x$ such that $\|y - A\hat x\| = 0$, that is, $A\hat x = y$. If in addition $M < N$, then there will be an infinity of such solutions.
The first thing to realize (and this holds in the general case as well) is that the solution to the above will be in Row(A). We know that the row and null spaces are orthogonal complements of one another, and so every $x\in\mathbb{R}^N$ can be written as
$$x = x_r + x_0, \qquad x_r\in\mathrm{Row}(A),\ x_0\in\mathrm{Null}(A).$$
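The sketch below illustrates this numerically: for an underdetermined $A$, the minimum-norm least-squares solution returned by `np.linalg.pinv` is orthogonal to Null(A) (so it lies in Row(A)), and adding a null-space vector leaves $Ax$ unchanged. The sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Underdetermined example: M < N, so Null(A) is non-trivial.
M, N = 3, 6
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# np.linalg.pinv gives the minimum-norm least-squares solution.
x_hat = np.linalg.pinv(A) @ y

# Rows M..N-1 of Vt span Null(A) (A almost surely has rank M here).
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[M:]

print(np.abs(null_basis @ x_hat).max())              # ~ 0: x_hat is orthogonal to Null(A)
v = 5.0 * null_basis[0]                              # any null-space vector ...
print(np.linalg.norm(A @ (x_hat + v) - A @ x_hat))   # ... leaves A x unchanged
```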
The problem with solving (7) when A has a non-trivial null space
is not just that there are an infinity of solutions, it is also that
the space of solutions is unbounded – you can add a vector from
the null space of arbitrary size and not change the functional.
Even when A technically has full column rank, if it is poorly
conditioned, then a small change in y can amount to a massive
change in x̂.
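A small numerical illustration of this sensitivity, using a made-up, nearly rank-deficient $A$:

```python
import numpy as np

# A made-up, poorly conditioned matrix: two almost-parallel columns.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-6],
              [1.0, 1.0 - 1e-6]])
y = np.array([1.0, 1.0, 1.0])
x_hat = np.linalg.lstsq(A, y, rcond=None)[0]

# Perturb y very slightly; the least-squares solution moves by a huge amount.
y_pert = y + np.array([0.0, 1e-4, 0.0])
x_pert = np.linalg.lstsq(A, y_pert, rcond=None)[0]

print(np.linalg.cond(A))                     # ~ 1e6: poorly conditioned
print(np.linalg.norm(y_pert - y))            # 1e-4: tiny change in y
print(np.linalg.norm(x_pert - x_hat))        # orders of magnitude larger change in x_hat
```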
One common remedy is to add a penalty on the size of $x$, and solve the regularized least-squares problem
$$\min_{x\in\mathbb{R}^N}\ \|y - Ax\|_2^2 + \delta\|x\|_2^2. \qquad (9)$$
Setting the gradient of (9) to zero, the minimizer must satisfy
$$(A^T A + \delta I)\,x = A^T y. \qquad (10)$$
Since $A^T A + \delta I$ is invertible for any $\delta > 0$, this means
$$\hat x = (A^T A + \delta I)^{-1} A^T y.$$
Equivalently,
$$\hat x = A^T (A A^T + \delta I)^{-1} y.$$
You can verify this by plugging the expression above into the left
hand side of (10).
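The check can also be done numerically; the sketch below (with arbitrary sizes and $\delta$) computes $\hat x$ both ways and confirms the two expressions agree.

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary sizes; the two expressions agree for any M, N and delta > 0.
M, N, delta = 20, 50, 0.1
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# Solve the N x N system  (A^T A + delta I) x = A^T y ...
x1 = np.linalg.solve(A.T @ A + delta * np.eye(N), A.T @ y)

# ... or the equivalent M x M system  x = A^T (A A^T + delta I)^{-1} y.
x2 = A.T @ np.linalg.solve(A @ A.T + delta * np.eye(M), y)

print(np.linalg.norm(x1 - x2))               # ~ 0 up to numerical precision
```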
We can pose the same kind of regularized problem when the function class is a Hilbert space $\mathcal{S}$: for some $\delta \ge 0$, we solve
$$\min_{f\in\mathcal{S}}\ \sum_{m=1}^{M}\bigl|y_m - f(t_m)\bigr|^2 + \delta\|f\|_{\mathcal{S}}^2. \qquad (12)$$
Note that we are using the Hilbert space norm to penalize the
size of the function. If we again introduce a basis, modeling the
target function as being in the span of ψ1, . . . , ψN , then we can
rewrite this as
$$\min_{x\in\mathbb{R}^N}\ \sum_{m=1}^{M}\bigl|y_m - \Psi(t_m)^T x\bigr|^2 + \delta\Bigl\|\sum_{n=1}^{N} x_n\psi_n\Bigr\|_{\mathcal{S}}^2,$$
where the penalty term can be expanded as
$$\Bigl\|\sum_{n=1}^{N} x_n\psi_n\Bigr\|_{\mathcal{S}}^2 = \sum_{n=1}^{N}\sum_{k=1}^{N} x_n x_k\,\langle\psi_n,\psi_k\rangle_{\mathcal{S}} = x^T G x,$$
where $G$ is the $N\times N$ Gram matrix with entries $G[n,k] = \langle\psi_n,\psi_k\rangle_{\mathcal{S}}$.
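As an illustration, the sketch below builds $G$ for a hypothetical polynomial basis, with the $L^2([0,1])$ inner product standing in for $\langle\cdot,\cdot\rangle_{\mathcal{S}}$, and checks that $\|\sum_n x_n\psi_n\|^2 = x^T G x$.

```python
import numpy as np

# Hypothetical basis psi_n(t) = t^(n-1) on [0, 1], with the L2([0, 1]) inner
# product standing in for <., .>_S, so that G[n, k] = <psi_n, psi_k>_S.
N = 4
tt = np.linspace(0.0, 1.0, 2001)

def psi(n, s):
    return s ** n                            # psi_{n+1}(s) = s^n

# Gram matrix by numerical integration (for this basis it is a Hilbert matrix).
G = np.array([[np.trapz(psi(n, tt) * psi(k, tt), tt) for k in range(N)]
              for n in range(N)])

# Check  || sum_n x_n psi_n ||^2  =  x^T G x  for a random coefficient vector.
x = np.random.default_rng(4).standard_normal(N)
f_vals = sum(x[n] * psi(n, tt) for n in range(N))
print(np.trapz(f_vals ** 2, tt), x @ G @ x)  # the two values agree closely
```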