Numerical Optimization: Lecture Notes #24 Nonlinear Least Squares - Orthogonal Distance Regression

This document provides an overview of orthogonal distance regression (ODR), which is a technique for nonlinear least squares regression that accounts for errors in both the dependent and independent variables. ODR formulates the problem as minimizing the sum of squared orthogonal distances between data points and the model curve. This allows the error to be measured as the shortest distance to the model, rather than just the vertical distance. ODR can be solved using standard nonlinear least squares methods, but a direct implementation may be computationally expensive due to the increased number of parameters. The document discusses error models, weighted least squares, and how ODR relates to standard nonlinear regression.




Numerical Optimization
Lecture Notes #24
Nonlinear Least Squares Orthogonal Distance Regression
Peter Blomgren, [email protected]
Department of Mathematics and Statistics
Dynamical Systems Group
Computational Sciences Research Center
San Diego State University
San Diego, CA 92182-7720
http://terminus.sdsu.edu/
Fall 2013
Outline

1. Summary
   - Linear Least Squares
   - Nonlinear Least Squares

2. Orthogonal Distance Regression
   - Error Models
   - Weighted Least Squares / Orthogonal Distance Regression
   - ODR = Nonlinear Least Squares, Exploiting Structure

Summary: Linear Least Squares

Our study of non-linear least squares problems started with a look at
linear least squares, where each residual r_j(x) is linear, and the
Jacobian therefore is constant. The objective of interest is

    f(x) = \frac{1}{2} \| J x + r_0 \|_2^2, \qquad r_0 = r(0);

solving for the stationary point \nabla f(x^*) = 0 gives the normal equations

    (J^T J) x^* = -J^T r_0.

We have three approaches to solving the normal equations for x^*, in
increasing order of computational complexity and stability:

  (i)   Cholesky factorization of J^T J,
  (ii)  QR-factorization of J, and
  (iii) Singular Value Decomposition of J.

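As a concrete, made-up illustration (not part of the original notes), here is a small NumPy sketch that solves one random linear least squares problem by all three routes and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 4                      # m >> n, as in the lecture
J = rng.standard_normal((m, n))   # constant Jacobian of the linear residuals
r0 = rng.standard_normal(m)       # r_0 = r(0)

# (i) Cholesky factorization of J^T J (normal equations)
A, b = J.T @ J, -J.T @ r0
L = np.linalg.cholesky(A)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

# (ii) QR-factorization of J:  R x = -Q^T r0
Q, R = np.linalg.qr(J)
x_qr = np.linalg.solve(R, -Q.T @ r0)

# (iii) Singular Value Decomposition of J:  x = -V diag(1/s) U^T r0
U, s, Vt = np.linalg.svd(J, full_matrices=False)
x_svd = -Vt.T @ ((U.T @ r0) / s)

print(np.allclose(x_chol, x_qr), np.allclose(x_qr, x_svd))
```
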
Summary: Nonlinear Least Squares (1 of 4)

Problem: Nonlinear Least Squares

    x^* = \arg\min_{x \in R^n} f(x)
        = \arg\min_{x \in R^n} \left[ \frac{1}{2} \sum_{j=1}^{m} r_j(x)^2 \right],
      \qquad m \ge n,

where the residuals r_j(x) are of the form r_j(x) = y_j - φ(x; t_j).

Here, y_j are the measurements taken at the locations/times t_j, and
φ(x; t_j) is our model.

The key approximation for the Hessian:

    \nabla^2 f(x) = J(x)^T J(x) + \sum_{j=1}^{m} r_j(x) \nabla^2 r_j(x)
                  \approx J(x)^T J(x).

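To make the approximation concrete, a small sketch of mine with the made-up model φ(x; t) = x_1 e^{x_2 t}: it compares the Gauss-Newton term J^T J with the neglected second-order term Σ_j r_j ∇² r_j near a small-residual point.

```python
import numpy as np

# Illustrative model phi(x; t) = x[0] * exp(x[1] * t), with small residuals
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0, 20)
y = 1.3 * np.exp(-0.7 * t) + 0.01 * rng.standard_normal(t.size)

def residuals(x):                       # r_j(x) = y_j - phi(x; t_j)
    return y - x[0] * np.exp(x[1] * t)

def jacobian(x):                        # J_{ji} = d r_j / d x_i
    e = np.exp(x[1] * t)
    return np.column_stack([-e, -x[0] * t * e])

x = np.array([1.3, -0.7])
r, J = residuals(x), jacobian(x)

H_gn = J.T @ J                          # Gauss-Newton Hessian approximation
# The full Hessian adds sum_j r_j * Hess(r_j); for this particular model:
e = np.exp(x[1] * t)
second = np.zeros((2, 2))
for rj, tj, ej in zip(r, t, e):
    Hj = np.array([[0.0,      -tj * ej],
                   [-tj * ej, -x[0] * tj**2 * ej]])
    second += rj * Hj

print(np.linalg.norm(second) / np.linalg.norm(H_gn))  # small when residuals are small
```
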
Summary: Nonlinear Least Squares (2 of 4)

Line-search algorithm: Gauss-Newton, with the subproblem:

    \left[ J(x_k)^T J(x_k) \right] p_k^{GN} = -\nabla f(x_k).

Guaranteed descent direction, fast convergence (as long as the Hessian
approximation holds up); equivalence to a linear least squares problem
(used for efficient, stable solution).

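A bare-bones Gauss-Newton iteration sketch (mine, with an illustrative exponential model); each step solves the equivalent linear least squares problem min_p ||J p + r||_2 with a QR/SVD-based solver rather than forming J^T J.

```python
import numpy as np

t = np.linspace(0.0, 2.0, 20)
y = 1.3 * np.exp(-0.7 * t)                      # synthetic, noise-free data

def residuals(x):
    return y - x[0] * np.exp(x[1] * t)

def jacobian(x):
    e = np.exp(x[1] * t)
    return np.column_stack([-e, -x[0] * t * e])

x = np.array([1.0, -0.5])                        # starting guess
for _ in range(20):
    r, J = residuals(x), jacobian(x)
    # Gauss-Newton step: solve the linear LSQ  min_p ||J p + r||_2
    p, *_ = np.linalg.lstsq(J, -r, rcond=None)
    x = x + p                                    # no line search in this bare-bones sketch
    if np.linalg.norm(p) < 1e-10:
        break

print(x)   # approaches (1.3, -0.7) for this well-behaved example
```
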
Summary: Nonlinear Least Squares (3 of 4)

Trust-region algorithm: Levenberg-Marquardt, with the subproblem:

    p_k^{LM} = \arg\min_{p \in R^n} \frac{1}{2} \| J(x_k) p + r_k \|_2^2,
    \qquad \text{subject to } \| p \| \le \Delta_k.

Slight advantage over Gauss-Newton (global convergence), same local
convergence properties; also (locally) equivalent to a linear least
squares problem.

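In the classical damped formulation the step solves (J^T J + λI) p = -J^T r, where λ plays the role of the multiplier of the trust-region constraint. A sketch of one such step (the toy model and the simple λ update rule are mine, not from the notes), again avoiding the normal equations by writing it as an augmented linear least squares problem:

```python
import numpy as np

def lm_step(J, r, lam):
    """One damped LM step: argmin_p ||J p + r||^2 + lam ||p||^2,
    solved as the augmented linear LSQ  min_p || [J; sqrt(lam) I] p + [r; 0] ||_2."""
    n = J.shape[1]
    J_aug = np.vstack([J, np.sqrt(lam) * np.eye(n)])
    r_aug = np.concatenate([r, np.zeros(n)])
    p, *_ = np.linalg.lstsq(J_aug, -r_aug, rcond=None)
    return p

# Toy usage with the exponential model from before
t = np.linspace(0.0, 2.0, 20)
y = 1.3 * np.exp(-0.7 * t)
res = lambda x: y - x[0] * np.exp(x[1] * t)
jac = lambda x: np.column_stack([-np.exp(x[1] * t), -x[0] * t * np.exp(x[1] * t)])

x, lam = np.array([1.0, -0.5]), 1e-2
for _ in range(50):
    p = lm_step(jac(x), res(x), lam)
    if np.sum(res(x + p) ** 2) < np.sum(res(x) ** 2):
        x, lam = x + p, lam / 3.0          # good step: accept, loosen damping
    else:
        lam *= 3.0                         # bad step: reject, tighten damping
print(x)
```
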
Summary: Nonlinear Least Squares (4 of 4)

Hybrid Algorithms:

When implementing Gauss-Newton or Levenberg-Marquardt, we should include
a safeguard for the large-residual case, where the Hessian approximation
fails.

If, after some reasonable number of iterations, we realize that the
residuals are not going to zero, then we are better off switching to a
general-purpose algorithm for nonlinear optimization, such as a
quasi-Newton (BFGS) or Newton method.

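A sketch of that fallback (my own illustration, not prescribed by the notes): hand the same objective f(x) = ½||r(x)||² and its exact gradient J^T r to a general-purpose quasi-Newton routine such as SciPy's BFGS.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 2.0, 20)
y = 1.3 * np.exp(-0.7 * t) + 0.3 * np.sin(5 * t)   # model cannot fit well: large residuals

res = lambda x: y - x[0] * np.exp(x[1] * t)
jac = lambda x: np.column_stack([-np.exp(x[1] * t), -x[0] * t * np.exp(x[1] * t)])

f    = lambda x: 0.5 * np.dot(res(x), res(x))      # f(x) = 1/2 ||r(x)||^2
grad = lambda x: jac(x).T @ res(x)                 # exact gradient J^T r

out = minimize(f, x0=np.array([1.0, -0.5]), jac=grad, method="BFGS")
print(out.x, out.fun)
```
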
Fixed Regressor Models vs. Errors-In-Variables Models

So far we have assumed that there are no errors in the variables
describing where / when the measurements are made; i.e. in the data set
{t_j, y_j}, where t_j denote the times of measurement and y_j the
measured values, we have assumed that the t_j are exact and the
measurement errors are in the y_j.

Under this assumption, the discrepancies between the model and the
measured data are

    ε_j = y_j - φ(x; t_j), \qquad j = 1, 2, \ldots, m.

Next, we will take a look at the situation where we take errors in t_j
into account. These models are known as errors-in-variables models, and
their solutions in the linear case are referred to as total least
squares optimization, or in the non-linear case as orthogonal distance
regression.

Least Squares vs. Orthogonal Distance Regression
[Figure omitted: two panels showing the same data set, with t ranging over [1, 5] and y over [0.9, 1.4].]

Figure: (left) An illustration of how the error is measured in standard
(fixed regressor) least squares optimization. (right) An example of
orthogonal distance regression, where we measure the shortest distance
to the model curve. [The right figure is actually not correct, why?]

Orthogonal Distance Regression

For the mathematical formulation of orthogonal distance regression we
introduce perturbations (errors) δ_j for the variables t_j, in addition
to the errors ε_j for the y_j's.

We relate the measurements and the model in the following way

    ε_j = y_j - φ(x; t_j + δ_j),

and define the minimization problem:

    (x^*, δ^*) = \arg\min_{x, δ} \frac{1}{2} \sum_{j=1}^{m}
        \left[ w_j^2 \left( y_j - φ(x; t_j + δ_j) \right)^2 + d_j^2 δ_j^2 \right],

where d and w are two vectors of weights which denote the relative
significance of the error terms.

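A direct transcription of this objective into NumPy (a sketch; the model φ, the data, and the weights below are made up for illustration):

```python
import numpy as np

def odr_objective(x, delta, t, y, w, d, phi):
    """1/2 * sum_j [ w_j^2 (y_j - phi(x, t_j + delta_j))^2 + d_j^2 delta_j^2 ]."""
    eps = y - phi(x, t + delta)
    return 0.5 * np.sum(w**2 * eps**2 + d**2 * delta**2)

# Tiny made-up example with equal weights
phi = lambda x, t: x[0] * np.exp(x[1] * t)
t = np.linspace(1.0, 5.0, 9)
y = 1.1 * np.exp(0.05 * t)
w = np.ones(t.size)       # equal weights: each term is the squared distance contribution
d = np.ones(t.size)
print(odr_objective(np.array([1.0, 0.1]), np.zeros(t.size), t, y, w, d, phi))
```
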
Orthogonal Distance Regression: The Weights

The weight-vectors d and w must either be supplied by the modeler, or
estimated in some clever way.

If all the weights are the same, w_j = d_j = 1, then each term in the
sum is simply the (squared) shortest distance between the point
(t_j, y_j) and the curve φ(x; t) (as illustrated in the previous figure).

In order to get the orthogonal-looking figure, I set w_j = 1/0.5 and
d_j = 1/4, thus adjusting for the different scales in the t- and
y-directions.

The shortest path between the point and the curve will be normal
(orthogonal) to the curve at the point of intersection.

We can think of the scaling (weighting) as adjusting for measuring time
in fortnights, seconds, milliseconds, microseconds, or nanoseconds...

Orthogonal Distance Regression: In Terms of Residuals r_j

By identifying the 2m residuals

    r_j(x, δ) =
    \begin{cases}
        w_j \left( y_j - φ(x; t_j + δ_j) \right), & j = 1, 2, \ldots, m, \\
        d_{j-m} \, δ_{j-m},                       & j = (m+1), (m+2), \ldots, 2m,
    \end{cases}

we can rewrite the optimization problem

    (x^*, δ^*) = \arg\min_{x, δ} \frac{1}{2} \sum_{j=1}^{m}
        \left[ w_j^2 \left( y_j - φ(x; t_j + δ_j) \right)^2 + d_j^2 δ_j^2 \right]

in terms of the 2m-vector r(x, δ):

    (x^*, δ^*) = \arg\min_{x, δ} \frac{1}{2} \sum_{j=1}^{2m} r_j(x, δ)^2
               = \arg\min_{x, δ} \frac{1}{2} \| r(x, δ) \|_2^2.

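The stacked residual vector is straightforward to form; a sketch of mine that builds the 2m-vector r(x, δ) for a made-up model and hands it, with its structure ignored, to a generic nonlinear least squares solver:

```python
import numpy as np
from scipy.optimize import least_squares

phi = lambda x, t: x[0] * np.exp(x[1] * t)             # illustrative model
t = np.linspace(1.0, 5.0, 9)
y = 1.1 * np.exp(0.05 * t) + 0.01 * np.random.default_rng(2).standard_normal(t.size)
w = np.ones(t.size)
d = np.ones(t.size)
n, m = 2, t.size

def stacked_residuals(z):
    x, delta = z[:n], z[n:]                             # z packs the (n + m) unknowns
    top = w * (y - phi(x, t + delta))                   # r_j,            j = 1..m
    bottom = d * delta                                  # r_{m+j} = d_j delta_j
    return np.concatenate([top, bottom])                # 2m residuals

z0 = np.concatenate([np.array([1.0, 0.1]), np.zeros(m)])
sol = least_squares(stacked_residuals, z0)              # generic NLSQ solver, ignores structure
x_star, delta_star = sol.x[:n], sol.x[n:]
print(x_star)
```
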
Orthogonal Distance Regression → Least Squares

If we take a cold hard stare at the expression

    (x^*, δ^*) = \arg\min_{x, δ} \frac{1}{2} \sum_{j=1}^{2m} r_j(x, δ)^2
               = \arg\min_{x, δ} \frac{1}{2} \| r(x, δ) \|_2^2,

we realize that this is now a standard (nonlinear) least squares problem
with 2m terms and (n + m) unknowns (x, δ).

We can use any of the techniques we have previously explored for the
solution of the nonlinear least squares problem.

However, a straight-forward implementation of these strategies may prove
to be quite expensive, since the number of residual terms has doubled to
2m and the number of independent variables has grown from n to (n + m).
Recall that usually m ≫ n, so this is a drastic growth of the problem.

Orthogonal Distance Regression → Least Squares: Problem Size

[Figure omitted: the original m × n Jacobian compared with the
2m × (n + m) Jacobian of the recast problem.]
Figure: We recast ODR as a much larger standard nonlinear least squares
problem.

Standard LSQ-solution via QR/SVD is O(m n^2) for m ≥ n; the recast
problem slows down by a factor of 2 (1 + m/n)^2.

ODR → Least Squares: Exploiting Structure

Fortunately, we can save a lot of work by exploiting the structure of
the Jacobian of the least squares problem originating from the
orthogonal distance regression: many entries are zero!

    \frac{\partial r_j}{\partial δ_i}
        = \frac{\partial \left[ w_j \left( y_j - φ(x; t_j + δ_j) \right) \right]}{\partial δ_i}
        = 0, \qquad i, j \le m, \; i \ne j,

    \frac{\partial r_j}{\partial δ_i}
        = \frac{\partial \left[ d_{j-m} δ_{j-m} \right]}{\partial δ_i}
        = \begin{cases} 0,       & i \ne (j - m), \; j > m, \\
                        d_{j-m}, & i = (j - m), \; j > m, \end{cases}

    \frac{\partial r_j}{\partial x_i}
        = \frac{\partial \left[ d_{j-m} δ_{j-m} \right]}{\partial x_i}
        = 0, \qquad i = 1, 2, \ldots, n, \; j > m.

Let

    v_j = \frac{\partial \left[ w_j \left( y_j - φ(x; t_j + δ_j) \right) \right]}{\partial δ_j},
    \qquad D = \mathrm{diag}(d), \qquad V = \mathrm{diag}(v);

then we can write the Jacobian of the residual function in matrix form...

ODR → Least Squares: The Jacobian

We now have

    J(x, δ) = \begin{bmatrix} \hat{J} & V \\ 0 & D \end{bmatrix},

where D and V are the m × m diagonal matrices D = \mathrm{diag}(d) and
V = \mathrm{diag}(v), and \hat{J} is the m × n matrix defined by

    \hat{J} = \left[ \frac{\partial \left[ w_j \left( y_j - φ(x; t_j + δ_j) \right) \right]}{\partial x_i} \right],
    \qquad j = 1, 2, \ldots, m, \quad i = 1, 2, \ldots, n.

We can now use this matrix in e.g. the Levenberg-Marquardt algorithm...

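A sketch (mine) of assembling this block Jacobian for an illustrative exponential model with made-up weights; Ĵ, V, and D are built exactly as defined above.

```python
import numpy as np

phi     = lambda x, t: x[0] * np.exp(x[1] * t)                  # illustrative model
dphi_dx = lambda x, t: np.column_stack([np.exp(x[1] * t),
                                        x[0] * t * np.exp(x[1] * t)])
dphi_dt = lambda x, t: x[0] * x[1] * np.exp(x[1] * t)

def odr_jacobian(x, delta, t, w, d):
    """J(x, delta) = [[ Jhat, V ], [ 0, D ]] for the 2m stacked ODR residuals."""
    ts = t + delta
    Jhat = -w[:, None] * dphi_dx(x, ts)     # d/dx_i of  w_j (y_j - phi(x; t_j + delta_j))
    v = -w * dphi_dt(x, ts)                 # v_j = d/d delta_j of the same expression
    V, D = np.diag(v), np.diag(d)
    m = t.size
    top = np.hstack([Jhat, V])
    bottom = np.hstack([np.zeros((m, x.size)), D])
    return np.vstack([top, bottom]), Jhat, V, D

t = np.linspace(1.0, 5.0, 5)
J, Jhat, V, D = odr_jacobian(np.array([1.0, 0.1]), np.zeros(5), t, np.ones(5), np.ones(5))
print(J.shape)    # (2m, n + m) = (10, 7)
```
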
ODR → Least Squares: The Jacobian Structure

[Figure omitted: sparsity pattern of the 2m × (n + m) Jacobian, with a
dense m × n block \hat{J}, diagonal m × m blocks V and D, and a zero block.]
Figure: If we exploit the structure of the Jacobian, the problem is
still somewhat tractable.

ODR → Least Squares → Levenberg-Marquardt (1 of 2)

If we partition the step vector p and the residual vector r into

    p = \begin{bmatrix} p_x \\ p_δ \end{bmatrix},
    \qquad
    r = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix},

where p_x ∈ R^n, p_δ ∈ R^m, and r_1, r_2 ∈ R^m, then e.g. we can write
the Levenberg-Marquardt subproblem in partitioned form

    \begin{bmatrix}
        \hat{J}^T \hat{J} + λ I_n & \hat{J}^T V \\
        V \hat{J}                 & V^2 + D^2 + λ I_m
    \end{bmatrix}
    \begin{bmatrix} p_x \\ p_δ \end{bmatrix}
    = -
    \begin{bmatrix} \hat{J}^T r_1 \\ V r_1 + D r_2 \end{bmatrix}.

Since the (2,2)-block V^2 + D^2 + λ I_m is diagonal, we can eliminate
the p_δ variables from the system...

ODR → Least Squares → Levenberg-Marquardt (2 of 2)

    \begin{bmatrix}
        \hat{J}^T \hat{J} + λ I_n & \hat{J}^T V \\
        V \hat{J}                 & V^2 + D^2 + λ I_m
    \end{bmatrix}
    \begin{bmatrix} p_x \\ p_δ \end{bmatrix}
    = -
    \begin{bmatrix} \hat{J}^T r_1 \\ V r_1 + D r_2 \end{bmatrix}

    p_δ = -\left( V^2 + D^2 + λ I_m \right)^{-1}
           \left[ \left( V r_1 + D r_2 \right) + V \hat{J} p_x \right]

This leads to the n × n system A p_x = b, where

    A = \hat{J}^T \hat{J} + λ I_n
        - \hat{J}^T V \left( V^2 + D^2 + λ I_m \right)^{-1} V \hat{J},

    b = -\hat{J}^T r_1
        + \hat{J}^T V \left( V^2 + D^2 + λ I_m \right)^{-1} \left( V r_1 + D r_2 \right).

Hence, the total cost of finding the LM-step is only marginally more
expensive than for the standard least squares problem.

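A sketch of the elimination (mine, following the formulas above; the function name and argument layout are my own): since V² + D² + λI is diagonal, p_δ costs only a componentwise division once the small n × n system for p_x has been solved.

```python
import numpy as np

def lm_odr_step(Jhat, v, d, r1, r2, lam):
    """Partitioned Levenberg-Marquardt step for ODR.
    Solves A p_x = b with the diagonal (2,2)-block eliminated, then recovers p_delta.
    Jhat: m x n;  v, d, r1, r2: length-m vectors (diagonals of V, D and the residual parts)."""
    W = v**2 + d**2 + lam                     # diagonal of V^2 + D^2 + lam*I, as a vector
    JtV = Jhat.T * v                          # Jhat^T V   (V diagonal)
    A = Jhat.T @ Jhat + lam * np.eye(Jhat.shape[1]) - (JtV / W) @ (v[:, None] * Jhat)
    b = -Jhat.T @ r1 + (JtV / W) @ (v * r1 + d * r2)
    p_x = np.linalg.solve(A, b)
    p_delta = -(v * r1 + d * r2 + v * (Jhat @ p_x)) / W   # cheap componentwise recovery
    return p_x, p_delta
```
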
ODR LSQ (2m × (n + m)) → Levenberg-Marquardt LSQ (m × n)

The derived system may be ill-conditioned, since we have formed a
modified version of the normal equations, \hat{J}^T \hat{J} + stuff...
With some work we can recast it as an m × n linear least squares problem

    p_x = \arg\min_{p} \| \tilde{A} p - \tilde{b} \|_2,

where

    \tilde{A} = \hat{J} + λ [\hat{J}^T]^† - V \left( V^2 + D^2 + λ I_m \right)^{-1} V \hat{J},

    \tilde{b} = -r_1 + V \left( V^2 + D^2 + λ I_m \right)^{-1} \left( V r_1 + D r_2 \right),

and where the mystery factor [\hat{J}^T]^† is the pseudo-inverse of \hat{J}^T.

Expressed in terms of the (thin) QR-factorization Q R = \hat{J}, we have

    \hat{J}^T = R^T Q^T, \qquad [\hat{J}^T]^† = Q R^{-T},

since \hat{J}^T [\hat{J}^T]^† = R^T Q^T Q R^{-T} = I_n (using Q^T Q = I_n).

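The pseudo-inverse itself is cheap once the thin QR factorization of Ĵ is available; a small sketch (mine) with a numerical check on made-up data:

```python
import numpy as np

def pinv_of_Jhat_T(Jhat):
    """Pseudo-inverse of Jhat^T via the thin QR factorization Jhat = Q R:
    [Jhat^T]^+ = Q R^{-T}, so that Jhat^T @ (Q R^{-T}) = I_n."""
    Q, R = np.linalg.qr(Jhat)                        # Q: m x n, R: n x n
    return Q @ np.linalg.inv(R).T

# Quick check on a random full-column-rank Jhat
rng = np.random.default_rng(3)
Jhat = rng.standard_normal((30, 4))
P = pinv_of_Jhat_T(Jhat)
print(np.allclose(Jhat.T @ P, np.eye(4)))            # True
print(np.allclose(P, np.linalg.pinv(Jhat.T)))         # True (full column rank case)
```
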
Software and References

MINPACK   Implements the Levenberg-Marquardt algorithm. Available for
          free from http://www.netlib.org/minpack/.

ODRPACK   Implements the orthogonal distance regression algorithm.
          Available for free from http://www.netlib.org/odrpack/.

Other     The NAG (Numerical Algorithms Group) library and HSL (formerly
          the Harwell Subroutine Library) provide several robust
          nonlinear least squares implementations.

GvL       Golub and van Loan's "Matrix Computations", 4th edition
          (chapters 5-6), has a comprehensive discussion of
          orthogonalization and least squares, explaining in gory detail
          much of the linear algebra (e.g. the SVD and QR-factorization)
          we swept under the rug.

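For completeness (not mentioned in the original notes): SciPy exposes ODRPACK as scipy.odr, so the whole construction above can be exercised in a few lines. The model and data here are made up.

```python
import numpy as np
from scipy import odr

# Model in ODRPACK's convention: f(beta, t)
f = lambda beta, t: beta[0] * np.exp(beta[1] * t)

rng = np.random.default_rng(4)
t_true = np.linspace(1.0, 5.0, 25)
y = 1.1 * np.exp(0.3 * t_true) + 0.05 * rng.standard_normal(25)   # noise in y ...
t = t_true + 0.05 * rng.standard_normal(25)                       # ... and in the regressor t

data = odr.RealData(t, y, sx=np.full(25, 0.05), sy=np.full(25, 0.05))  # sx, sy set the weights
out = odr.ODR(data, odr.Model(f), beta0=[1.0, 0.2]).run()
print(out.beta, out.sd_beta)
```
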