
EE263 Autumn 2015   S. Boyd and S. Lall

Least-squares data fitting

Least-squares data fitting

we are given:

- functions f1, ..., fn : S → R, called regressors or basis functions
- data or measurements (si, gi), i = 1, ..., m, where si ∈ S and (usually) m ≫ n

problem: find coefficients x1, ..., xn ∈ R so that

    x1 f1(si) + · · · + xn fn(si) ≈ gi,   i = 1, ..., m

i.e., find a linear combination of the functions that fits the data

least-squares fit: choose x to minimize the total square fitting error

    \sum_{i=1}^{m} \left( x_1 f_1(s_i) + \cdots + x_n f_n(s_i) - g_i \right)^2

Least-squares data fitting

- total square fitting error is ‖Ax − g‖^2, where Aij = fj(si)

- hence, the least-squares fit is given by

      x = (A^T A)^{-1} A^T g

  (assuming A is skinny and full rank; a small numerical sketch is given below)

- the corresponding fitted function is

      f_lsfit(s) = x1 f1(s) + · · · + xn fn(s)

- applications:
  - interpolation, extrapolation, smoothing of data
  - developing a simple, approximate model of data
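a minimal numpy sketch of this computation (the basis functions and data below are made up for illustration; np.linalg.lstsq returns the same x as the normal-equations formula above, but computes it more reliably than forming A^T A):

```python
import numpy as np

# basis functions f_1, ..., f_n (illustrative choices, not from the slides)
f = [lambda s: np.ones_like(s), lambda s: s, lambda s: np.sin(s)]

s = np.linspace(0, 5, 100)                        # sample points s_i
g = 2.0 + 0.5 * s + 0.1 * np.random.randn(100)    # measurements g_i (synthetic)

A = np.column_stack([fj(s) for fj in f])          # A_ij = f_j(s_i)
x, *_ = np.linalg.lstsq(A, g, rcond=None)         # least-squares fit coefficients

fit = A @ x                                       # fitted values x1 f1(si) + ... + xn fn(si)
print("total square fitting error:", np.sum((fit - g) ** 2))
```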

Least-squares polynomial fitting

problem: fit a polynomial of degree < n,

    p(t) = a0 + a1 t + · · · + an−1 t^{n−1},

to data (ti, yi), i = 1, ..., m

- basis functions are fj(t) = t^{j−1}, j = 1, ..., n
- matrix A has form Aij = ti^{j−1}:

    A = \begin{bmatrix}
          1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
          1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
          \vdots & \vdots & \vdots & & \vdots \\
          1 & t_m & t_m^2 & \cdots & t_m^{n-1}
        \end{bmatrix}

(called a Vandermonde matrix)
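in numpy this matrix can be built directly with np.vander (a quick sketch; the sample points and data are made up):

```python
import numpy as np

t = np.array([0.0, 0.5, 1.0, 1.5])         # sample points t_1, ..., t_m (illustrative)
y = np.array([0.1, 0.4, 0.3, 0.2])         # data y_1, ..., y_m (illustrative)
n = 3                                      # fit a polynomial of degree < n

A = np.vander(t, N=n, increasing=True)     # A_ij = t_i^(j-1): columns 1, t, t^2
a, *_ = np.linalg.lstsq(A, y, rcond=None)  # polynomial coefficients a_0, ..., a_{n-1}
```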

Vandermonde matrices

assuming tk ≠ tl for k ≠ l and m ≥ n, A is full rank:

- suppose Aa = 0
- then the corresponding polynomial p(t) = a0 + · · · + an−1 t^{n−1} vanishes at the m points t1, ..., tm
- by the fundamental theorem of algebra, a nonzero p of degree < n can have no more than n − 1 zeros; since m ≥ n, p must be identically zero, so a = 0
- hence the columns of A are independent, i.e., A is full rank

Example

- fit g(t) = 4t/(1 + 10t^2) with a polynomial
- m = 100 points between t = 0 and t = 1
- fits for degrees 1, 2, 3, 4 have RMS errors .135, .076, .025, .005, respectively

[Figure: g(t) and its least-squares polynomial fits of degree 1, 2, 3, 4 on 0 ≤ t ≤ 1]
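a sketch reproducing this experiment (an illustration, not the slides' original script; the exact RMS values depend on how the 100 points are placed):

```python
import numpy as np

m = 100
t = np.linspace(0, 1, m)
g = 4 * t / (1 + 10 * t**2)

for n in (2, 3, 4, 5):                        # polynomial degree n - 1 = 1, 2, 3, 4
    A = np.vander(t, N=n, increasing=True)
    a, *_ = np.linalg.lstsq(A, g, rcond=None)
    rms = np.sqrt(np.mean((A @ a - g) ** 2))
    print(f"degree {n - 1}: RMS error {rms:.3f}")
```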

Growing sets of regressors

consider the family of least-squares problems

    minimize   \left\| \sum_{i=1}^{p} x_i a_i - y \right\|

for p = 1, ..., n

(a1, ..., ap are called regressors)

- approximate y by a linear combination of a1, ..., ap
- project y onto span{a1, ..., ap}
- regress y on a1, ..., ap
- as p increases, we get a better fit, so the optimal residual decreases

Growing sets of regressors

solution for each p ≤ n is given by

    x_{\rm ls}^{(p)} = (A_p^T A_p)^{-1} A_p^T y = R_p^{-1} Q_p^T y

where

- Ap = [a1 · · · ap] ∈ R^{m×p} is the first p columns of A
- Ap = Qp Rp is the QR factorization of Ap
- Rp ∈ R^{p×p} is the leading p × p submatrix of R
- Qp = [q1 · · · qp] is the first p columns of Q
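a sketch of this computation, factoring A once and reusing the leading blocks of Q and R for every p (random data, just to exercise the formula):

```python
import numpy as np

m, n = 50, 7
A = np.random.randn(m, n)                     # regressors a_1, ..., a_n (synthetic)
y = np.random.randn(m)

Q, R = np.linalg.qr(A)                        # thin QR: A = QR, Q is m x n, R is n x n
for p in range(1, n + 1):
    Qp, Rp = Q[:, :p], R[:p, :p]              # first p columns of Q, leading p x p block of R
    xp = np.linalg.solve(Rp, Qp.T @ y)        # x_ls^(p) = R_p^{-1} Q_p^T y
    res = np.linalg.norm(A[:, :p] @ xp - y)   # optimal residual using first p regressors
    print(f"p = {p}: residual {res:.3f}")
```

the printed residuals are non-increasing in p, which is the behavior plotted on the next slide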

Norm of optimal residual versus p

plot of the optimal residual versus p shows how well y can be matched by a linear combination of a1, ..., ap, as a function of p

[Figure: ‖residual‖ versus p for p = 0, 1, ..., 7, decreasing from ‖y‖ at p = 0, to min over x1 of ‖x1 a1 − y‖ at p = 1, down to min over x1, ..., x7 of ‖x1 a1 + · · · + x7 a7 − y‖ at p = 7]
Least-squares system identification

we measure input u(t) and output y(t) for t = 0, ..., N of an unknown system

    [Block diagram: u(t) → unknown system → y(t)]

system identification problem: find a reasonable model for the system based on the measured I/O data u, y

example with scalar u, y (vector u, y readily handled): fit the I/O data with a moving-average (MA) model with n delays

ŷ(t) = h0 u(t) + h1 u(t − 1) + · · · + hn u(t − n)

where h0 , . . . , hn ∈ R
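written as code, the MA prediction is a finite sum of lagged inputs (a small illustrative helper, not from the slides):

```python
import numpy as np

def ma_predict(h, u):
    """MA model: yhat(t) = h[0]*u(t) + h[1]*u(t-1) + ... + h[n]*u(t-n), for t = n, ..., N."""
    n = len(h) - 1
    # u[t-n:t+1][::-1] lists u(t), u(t-1), ..., u(t-n), matching h[0], ..., h[n]
    return np.array([np.dot(h, u[t - n:t + 1][::-1]) for t in range(n, len(u))])
```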

System identification

we can write the model or predicted output as

    \begin{bmatrix} \hat{y}(n) \\ \hat{y}(n+1) \\ \vdots \\ \hat{y}(N) \end{bmatrix}
    =
    \begin{bmatrix}
      u(n)   & u(n-1) & \cdots & u(0) \\
      u(n+1) & u(n)   & \cdots & u(1) \\
      \vdots & \vdots &        & \vdots \\
      u(N)   & u(N-1) & \cdots & u(N-n)
    \end{bmatrix}
    \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_n \end{bmatrix}

model prediction error is

    e = (y(n) − ŷ(n), ..., y(N) − ŷ(N))

least-squares identification: choose the model (i.e., h) that minimizes the norm of the model prediction error ‖e‖

. . . a least-squares problem (with variables h)
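a sketch of this identification step, assuming u and y are numpy arrays holding the measured I/O samples for t = 0, ..., N (the helper name fit_ma is made up for illustration):

```python
import numpy as np

def fit_ma(u, y, n):
    """Least-squares fit of an MA model with n delays to measured I/O data u, y."""
    N = len(u) - 1
    # row t (for t = n, ..., N) holds u(t), u(t-1), ..., u(t-n)
    U = np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])
    h, *_ = np.linalg.lstsq(U, y[n:], rcond=None)    # minimize ||U h - (y(n), ..., y(N))||
    e = y[n:] - U @ h                                # model prediction error
    return h, np.linalg.norm(e) / np.linalg.norm(y[n:])
```

the second return value is the relative prediction error ‖e‖/‖y‖ (computed over t = n, ..., N) used on the following slides; for instance, fit_ma(u, y, 7) would produce the kind of model shown in the next example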

Example

data used to fit model

[Figure: input u(t) (left) and output y(t) (right) used to fit the model, for t = 0, ..., 70]

Example
for n = 7 we obtain the MA model with

    (h0, ..., h7) = (.024, .282, .418, .354, .243, .487, .208, .441)

with relative prediction error ‖e‖/‖y‖ = 0.37


[Figure: y(t) actual output and ŷ(t) predicted from the n = 7 model, for t = 0, ..., 70]


Model order selection

question: how large should n be?

- obviously, the larger n is, the smaller the prediction error on the data used to form the model

- this suggests using the largest possible model order to get the smallest prediction error

Model order selection

[Figure: relative prediction error ‖e‖/‖y‖ on the modeling data versus model order n = 0, ..., 55; the error decreases as n grows]

difficulty: for n too large the predictive ability of the model on other I/O data
(from the same system) becomes worse

Out of sample validation

- evaluate model predictive performance on another I/O data set not used to develop the model (the model validation data set)
- check the prediction error of the models (developed using the modeling data) on the validation data, as in the sketch following the figure below
- the plot suggests n = 10 is a good choice

[Figure: relative prediction error ‖e‖/‖y‖ versus model order n = 0, ..., 55 on the validation data; the error is smallest near n = 10]
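a self-contained sketch of this validation procedure, with a made-up "true" system standing in for the slides' data (all names and numbers here are illustrative and will not match the figures):

```python
import numpy as np

def lagged(u, n):
    """Rows t = n, ..., N of lagged inputs u(t), u(t-1), ..., u(t-n)."""
    N = len(u) - 1
    return np.column_stack([u[n - k : N + 1 - k] for k in range(n + 1)])

def rel_error(h, u, y):
    """Relative prediction error ||e|| / ||y|| of MA model h on I/O data (u, y)."""
    n = len(h) - 1
    e = y[n:] - lagged(u, n) @ h
    return np.linalg.norm(e) / np.linalg.norm(y[n:])

rng = np.random.default_rng(0)
h_true = rng.standard_normal(11)                    # made-up MA system with 10 delays
simulate = lambda u: np.convolve(u, h_true)[:len(u)] + 0.1 * rng.standard_normal(len(u))

u_mod, u_val = rng.standard_normal(200), rng.standard_normal(200)   # modeling / validation inputs
y_mod, y_val = simulate(u_mod), simulate(u_val)

for n in (2, 5, 10, 20, 50):
    h, *_ = np.linalg.lstsq(lagged(u_mod, n), y_mod[n:], rcond=None)  # fit on modeling data only
    print(f"n = {n:2d}: modeling error {rel_error(h, u_mod, y_mod):.3f}, "
          f"validation error {rel_error(h, u_val, y_val):.3f}")
```

the modeling error necessarily shrinks as n grows, while the validation error should stop improving once n exceeds the true order; this is the overfit effect discussed on the next slide
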
Validation

for n = 50 the actual and predicted outputs on system identification and model
validation data are:

[Figure: actual and predicted outputs for n = 50 on the system identification data (left) and the model validation data (right), for t = 0, ..., 70]

- y(t) actual output, ŷ(t) predicted from model

- the loss of predictive ability when n is too large is called model overfit or overmodeling
