Least Squares
A. Fotiou
Professor, Aristotle University of Thessaloniki, Dept. of Geodesy and Surveying,
Lab. of Geodetic Methods and Satellite Applications
[email protected]
Summary: Some remarks are presented on least squares adjustment topics that are often bypassed or given little attention. The minimum number of observable parameters, the relation between observables and other unknown parameters, the role of the functional model and the linearization effect are fundamental to building a correct adjustment model. By discussing simple worked examples from the fields of geodesy and surveying, with extensions to practical applications, a closer look is gained. In particular, a better understanding of the principal concepts of least squares can be very helpful for students, young scientists and non-experts involved in problems requiring adjustment of observations.
The final step of the adjustment is the assessment of the results, where estimated covariance matrices are used together with error and other estimates. This is usually done through a statistical evaluation or reliability control. In this step, the distribution of errors must be known. Among various choices, the normal (Gauss) distribution has been accepted because it is both a realistic and a mathematically simple hypothesis. First, the reliability of the results is examined by hypothesis testing, verifying the validity of the a priori stated model hypotheses (null hypotheses). Usually, we try to detect and localize possible significant model errors, looking mainly at their estimated size in relation to their intercorrelation as given by the estimated covariance matrix. Ending with an acceptable set of observations and a realistic covariance matrix, in association with an efficient functional model, measures of marginally detectable errors or of their marginal effects on the estimated parameters can also be derived (reliability measures). At this point, computed measures of precision or uncertainty are also measures of accuracy/quality, like confidence intervals, areas and spaces. The assessment process can also be carried out in a pre-analysis step, before the measurement campaign, in a trial-and-error optimization under certain standards and specification criteria set a priori (design criteria, optimization problems).
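As a small numerical sketch of such a statistical evaluation (not from this paper; the residuals, weights and tabulated quantiles are assumed for illustration), a global chi-square test compares the a posteriori variance factor with its a priori value:

```python
import numpy as np

def global_model_test(v, P, sigma0_sq, f, chi2_lo, chi2_hi):
    """Two-sided chi-square test on the a posteriori variance factor.

    v         : residual vector
    P         : weight matrix
    sigma0_sq : a priori variance factor
    f         : degrees of freedom
    chi2_lo, chi2_hi : chi-square quantiles for the chosen risk (assumed given)
    """
    s0_hat_sq = float(v @ P @ v) / f      # a posteriori variance factor
    T = f * s0_hat_sq / sigma0_sq         # test statistic ~ chi2(f) under H0
    return s0_hat_sq, T, (chi2_lo <= T <= chi2_hi)

# Hypothetical residuals with unit weights, f = 3 degrees of freedom
v = np.array([0.8, -1.1, 0.5, -0.2, 0.9, -0.6])
P = np.eye(6)
# tabulated chi-square quantiles for alpha = 0.05, f = 3
s0, T, ok = global_model_test(v, P, sigma0_sq=1.0, f=3,
                              chi2_lo=0.216, chi2_hi=9.348)
```

If the statistic falls outside the interval, the null hypothesis (model and stochastic assumptions are correct) is rejected at the chosen risk level.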
The accuracy or quality of an estimated parameter expresses the degree of closeness to its real (unknown) value; therefore, accuracy is always unknown. On the other hand, the precision of an observation or an estimated parameter expresses the degree of closeness between repeated observations or estimated parameters. Precision of a parameter is usually given by the variance or the standard deviation, also called r.m.s.e. (root mean square error) for unbiased estimations. In case the observations are free of systematic errors (biases) and outliers (gross errors), measures of precision are also measures of accuracy, as errors are then influenced only by the (unavoidable) random errors. Pre-processing and post-adjustment tests - usually statistical tests - refer to the reliability control.
Some authors characterize outliers as mistakes and define errors as being only random and systematic; obviously, mistakes have large values and can normally be detected and excluded easily. We should note that random errors may take any large value, as illustrated by the normal distribution, though this is exceptionally unlikely for a correct model. A more appropriate term for all types of errors might be model errors; the division into three or two categories mainly serves teaching requirements and an easy understanding with respect to their sources. Generally, it is difficult to distinguish the specific type of an estimated error, especially in models with many interrelated variables and parameters, for instance in the adjustment of geodetic and surveying networks, where a weak configuration in relation to the associated weights seriously affects the error estimates, making a correct error localization difficult or even impossible.
Despite any statistical evaluation, there is no hundred percent guarantee that all possibly existing errors have been removed. Based on a statistical decision, there is always a probability (risk) of rejecting a correct null hypothesis - committing a type-I error - or of accepting a wrong one - committing a type-II error. In the present paper, we will not deal with the statistical assessment of the results. The goal is to discuss and make some useful remarks on basic topics, sometimes bypassed or completely omitted in the literature. The minimum number of observable parameters needed for an adjustment, the relation between the number of observables and other unknown parameters, the role of the adjustment model and the linearization process are essential topics for any adjustment problem.
With P the weight matrix and C the variance-covariance matrix of the errors or, equally, of the observations, we usually have two types of stochastic models:

C = σ²Q,  P = C⁻¹ = (σ²Q)⁻¹   (σ² known or a priori known, and Q known)      (2.2)
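A small numerical sketch of (2.2) (the variance factor and cofactor matrix are assumed for illustration): the weight matrix is simply the inverse of the scaled cofactor matrix.

```python
import numpy as np

# Assumed a priori variance factor and cofactor matrix Q of two observations
sigma2 = 4.0
Q = np.array([[1.0, 0.2],
              [0.2, 2.0]])

C = sigma2 * Q          # variance-covariance matrix, C = sigma^2 * Q
P = np.linalg.inv(C)    # weight matrix, P = C^-1 = (sigma^2 * Q)^-1

# P C must be the identity
assert np.allclose(P @ C, np.eye(2))
```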
a solution does not exist. In an adjustment problem there is always more than one proper selection of the n_o observables, and therefore more than one solution, but not the LS solution.
The general form of the functional model is,

u(y^α, x^α) = 0      (2.7)
where N = AᵀPA is the (symmetric) normal equation matrix and u = AᵀPb the (m × 1) constant vector. From (3.4) we get the LS solution/estimates,

x̂ = (AᵀPA)⁻¹(AᵀPb) = N⁻¹u  [det(N) ≠ 0],   x̂^α = x̂^o + x̂      (3.5)
Any other estimate based on the above best estimates is also a best estimate.
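As a minimal numerical sketch of (3.5) (the design matrix, observations and weights are assumed), the normal-equations solution can be computed directly; solving N x̂ = u is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

# Assumed linearized model b = A x + v with weight matrix P
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.02, 2.01, 2.98])
P = np.diag([1.0, 1.0, 2.0])

N = A.T @ P @ A               # normal equation matrix, N = A^T P A
u = A.T @ P @ b               # constant vector,        u = A^T P b
x_hat = np.linalg.solve(N, u) # LS estimate (det(N) != 0)

v_hat = b - A @ x_hat         # estimated residuals
```

The residuals satisfy the orthogonality property AᵀPv̂ = 0, a quick internal check of any implementation.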
In case of initially linear equations, y^α = Ax^α + t or b = Ax^α + v, where a constant vector t is possibly present. Comparing with (3.3), y^b − t ≡ b and x^α ≡ x, meaning that x^o = 0 (no approximate values are needed), while matrix A remains the same. The approximate values x^o should be close enough to the correct estimates, otherwise serious linearization errors will affect the solution. In any case, the adjustment must be repeated, using the previous estimates as approximate values, until a negligible difference between consecutive results, e.g. between the unknown parameter estimates, is achieved.
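The iteration just described can be sketched as follows (a generic outline with assumed model functions and data, not the author's algorithm): at each step the model is re-linearized at the previous estimate until the corrections become negligible.

```python
import numpy as np

def iterate_adjustment(f, jac, y_obs, x0, P, tol=1e-10, max_iter=20):
    """Repeat the linearized LS adjustment until corrections are negligible.

    f   : model function, y = f(x)
    jac : its Jacobian A(x), evaluated at the current approximate values
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        A = jac(x)
        b = y_obs - f(x)                               # reduced observations
        dx = np.linalg.solve(A.T @ P @ A, A.T @ P @ b) # corrections
        x = x + dx                                     # new approximate values
        if np.max(np.abs(dx)) < tol:                   # negligible difference
            break
    return x

# Toy nonlinear model (assumed): distances from an unknown point to two
# fixed points; the true point is (4, 3)
pts = np.array([[0.0, 0.0], [4.0, 0.0]])
f = lambda x: np.linalg.norm(pts - x, axis=1)
def jac(x):
    d = f(x)
    return (x - pts) / d[:, None]
y_obs = np.array([5.0, 3.0])
x_hat = iterate_adjustment(f, jac, y_obs, [3.5, 3.5], np.eye(2))
```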
where M is the (symmetric) normal matrix (det(M) ≠ 0). The adjusted observations, and the a posteriori variance factor when needed, are given by

x̂^α = x̂^o + x̂,  v̂ = P⁻¹BᵀM⁻¹(w + Ax̂),  ŷ^α = y^b − v̂,  σ̂² = v̂ᵀPv̂ / (s − m)      (3.18)

We remind that a complete elimination of the unknown parameters from the observation equations (3.1) results in condition equations (3.10).

The covariance matrices for the above estimates are then given by

C_x̂ = C_x̂^α = N⁻¹  if P = C⁻¹,   Ĉ_x̂ = Ĉ_x̂^α = σ̂²N⁻¹  if P = Q⁻¹      (3.19)

C_v̂ = P⁻¹BᵀM⁻¹[I − AN⁻¹AᵀM⁻¹]BP⁻¹  (P = C⁻¹),   Ĉ_v̂ = σ̂²C_v̂  (P = Q⁻¹)      (3.20)
h(x^α) = h(x^o) + (∂h/∂x^α)|_o (x^α − x^o) = −z + Hx = 0,   or   Hx = z      (3.21)
must be included, as a new subset of equations, in the corresponding linear systems. For the related adjustment algorithms, not given here, we underline some issues. The existence of k constraints means that one or more of the m parameters are not independent and therefore not fundamental for the description of the problem. If the (m − k) = r < m independent parameters are known, the remaining k parameters are determined; thus, we must include k equations in the functional model.
In the method of observation equations with constraints, the new total number of equations is given by s′ = n + m − r = n + k, where m − r = k or m = r + k. Likewise, in the method of mixed equations, the new total number of equations is s′ = (n + m − r) + k = s + k, where m − r ≤ k or m ≤ r + k.
Sometimes it is better to eliminate a part of, or even all, the constraints. A simple case of constraints that are eliminated refers to fixing a number of point coordinates in a geodetic network, in order to define the coordinate system or datum of the adjustment.
4. An introductory example
The role of the parametric degree r is fundamental in any adjustment model. Moreover, the selection of the adjustment method and the linearization are important issues. With the help of worked examples we will try to clarify some related points.
Consider a problem where the shape of a triangle on a horizontal plane must be determined. For this purpose, we measured just two angles of the triangle. Obviously, the third angle is the difference of their sum from 180°. Geometrically, the triangle's shape is determined by taking an arbitrary length for the side with the two angles at its ends and intersecting the two lines formed by each angle (the third vertex is thus defined). Note that there are two solutions, one intersection in each half-plane; we choose one of them if there is some knowledge of the relative orientation. However, with two measured angles it is not possible to have error control; any serious error in the observations directly affects the position of the third vertex, resulting in an erroneous shape. Realizing that the shape is determined by a minimum number of two angles (observables), the parametric degree is r = 2. Also, n = 2 and therefore the condition n > r for an LS adjustment is not fulfilled.
Let's modify the design and measure also the third angle, in total the three angles (ω_A, ω_B, ω_C). Suppose that the measurements have known precision given by their standard deviations σ_ωA, σ_ωB, σ_ωC. While r remains constant (r = 2), regardless of the increase of observations, we will have n = 3 > 2. In addition, there is a minimum number of m = r = 2 unknown parameters needed to describe the problem; choosing, e.g., the first two angles, the unknown parameters are x^α = [z_A  z_B]ᵀ and the functional model y^α = f(x^α) is

ω_A = z_A,  ω_B = z_B,  ω_C = 180° − (z_A + z_B).

Although these equations are directly linear and one should take advantage of it (see below), we will treat the model as if it were nonlinear.

With x^o = [z_A^o  z_B^o]ᵀ, e.g., z_A^o = ω′_A, z_B^o = ω′_B,  x = x^α − x^o = [δz_A  δz_B]ᵀ, and

y^o = f(x^o) = [ω_A^o  ω_B^o  ω_C^o]ᵀ = [z_A^o  z_B^o  180° − (z_A^o + z_B^o)]ᵀ,
the algorithm is applied as given above, noting that correct units should be used
throughout the computational steps. Applying the algorithm, we begin with the
evaluation of matrices A, b of the linear system b = Ax + v :
A = [1  0;  0  1;  −1  −1],

b = y^b − y^o = [ω′_A − z_A^o   ω′_B − z_B^o   ω′_C − ω_C^o]ᵀ = [0   0   ω′_C − (180° − (ω′_A + ω′_B))]ᵀ
The easily formed (2×2) normal matrix must be inverted and all the relevant estimates {δẑ_A, δẑ_B, ẑ_A, ẑ_B, v̂_ωA, v̂_ωB, v̂_ωC, ω̂_A, ω̂_B, ω̂_C} are then computed. In this example, having initially linear equations, we could alternatively proceed with a direct estimation of (ẑ_A, ẑ_B) instead of their corrections. In this case matrix A is the same while b is different. The same holds for N and u.
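As a numerical check of this worked example (the observed angles are assumed for illustration; equal weights), the adjustment distributes the misclosure among the three angles:

```python
import numpy as np

# Assumed measured angles omega'_A, omega'_B, omega'_C (degrees)
w_obs = np.array([60.002, 59.997, 60.007])

# Design matrix for parameters z_A, z_B (third angle = 180 - z_A - z_B)
A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
# With z^o taken as the first two observed angles, b = [0, 0, misclosure]
misclosure = w_obs[2] - (180.0 - w_obs[0] - w_obs[1])
b = np.array([0.0, 0.0, misclosure])

N = A.T @ A                              # normal matrix (unit weights)
dz = np.linalg.solve(N, A.T @ b)         # corrections to z^o
z_hat = w_obs[:2] + dz                   # adjusted z_A, z_B
w_hat = np.array([z_hat[0], z_hat[1], 180.0 - z_hat.sum()])
```

The adjusted angles sum exactly to 180°, as the functional model requires.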
ω_A = arctan(y_C / x_C),  ω_B = arctan(y_C / (x_B − x_C)),  ω_C = arctan(x_C / y_C) + arctan((x_B − x_C) / y_C)

Also, the analytical structure of A and b is,

b = y^b − y^o = [ω′_A − ω_A^o   ω′_B − ω_B^o   ω′_C − ω_C^o]ᵀ
g(y^α) = ω_A + ω_B + ω_C − 180° = 0

which is also linear. This equation could easily be written directly, realizing the geometric angle condition. The matrices of the linear system are then formed,

B = [1  1  1],   w = [ω′_A + ω′_B + ω′_C − 180°]
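A sketch of the condition-equation solution for the same triangle (observed angles assumed, equal weights): with B = [1 1 1] and the misclosure w, the residuals follow from v̂ = P⁻¹Bᵀ(BP⁻¹Bᵀ)⁻¹w, distributing the misclosure according to the weights.

```python
import numpy as np

w_obs = np.array([60.002, 59.997, 60.007])  # assumed measured angles (deg)
P = np.eye(3)                               # equal weights

B = np.array([[1.0, 1.0, 1.0]])             # angle condition coefficients
w = np.array([w_obs.sum() - 180.0])         # misclosure

M = B @ np.linalg.inv(P) @ B.T              # normal matrix of the conditions
v_hat = np.linalg.inv(P) @ B.T @ np.linalg.solve(M, w)
y_hat = w_obs - v_hat                       # adjusted angles
```

With equal weights the misclosure is split equally; the result coincides with that of the observation-equation method, as expected.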
In geodetic networks, where coordinate estimation is of primary importance, the ease of expressing the observation equations and the direct estimation of the covariance matrices of the unknown parameters, together with the abundance of computing power, have made the observation equation method almost the standard choice for some decades now.
The application of the algorithm results in ẑ_A. After the end of the adjustment algorithm, the estimate ẑ_B = ω̂_B is also obtained.
We can also apply the mixed equations method using coordinates, as previously described. As regards algorithmic and computing ease, we will see that in some problems, such as the best fit of a function to data points (see below), the mixed equation method takes precedence.
safely detected, i.e. if the reason is the quality of the observations or the quality of
the redundant constraints or both.
Using the observation equations method, the problem can be solved by choosing three of the observables (m = r = 3) as also being unknown parameters and subsequently formulating the respective equations, as we did in the previous worked example. Alternatively, we can choose heights to describe the problem and the three unknown parameters, e.g., the heights of points (2, 3, 4), x^α = [h_2  h_3  h_4]ᵀ, while a fixed height is given to the fourth point, i.e., h_1 = q_1 = constant, in order to define the height system/datum by minimal constraints (k = 1). From the three estimated heights and the one known height, any other height difference is well defined.
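A sketch of this height parameterization (the network geometry and the measured height differences are assumed for illustration; point 1 is held fixed at h_1 = q_1):

```python
import numpy as np

# Assumed leveling lines (from, to) and observed height differences (m)
lines = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
dh = np.array([1.203, 0.498, -0.702, -0.996, 1.704])

h1 = 0.0                    # fixed height of point 1 (minimal constraint, k = 1)
idx = {2: 0, 3: 1, 4: 2}    # unknown parameters x = [h2, h3, h4]

A = np.zeros((len(lines), 3))
b = dh.copy()
for row, (i, j) in enumerate(lines):
    if i == 1:
        b[row] += h1        # dh = h_j - h1  ->  move the constant to b
    else:
        A[row, idx[i]] = -1.0
    if j == 1:
        b[row] -= h1        # dh = h1 - h_i  ->  move the constant to b
    else:
        A[row, idx[j]] = 1.0

h_hat = np.linalg.solve(A.T @ A, A.T @ b)   # heights of points 2, 3, 4
```

Here n = 5, m = r = 3, so the degrees of freedom are f = 2 and error control is possible.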
6. Derived observations
Sometimes, instead of the original observations, derived observations are used. The advantage could be a simpler functional model, but with a probably more complicated stochastic model and more demanding software. A simple example will clarify these concepts. Consider a measured plane (convex) quadrilateral (1-2-3-4) with observables the 12 horizontal directions, three from each vertex to the other ones.
(Figure: the plane quadrilateral with vertices 1, 2, 3, 4.)
A horizontal direction β_ij is an angular quantity measured from a point (i) to another point (j) with respect to a reference direction θ_i^o, e.g. the zero direction of the horizontal circle of the theodolite. The reference direction is an orientation parameter that must be defined, among the other parameters, in the method of observation equations (or even in the mixed model); otherwise the concept of a direction is meaningless and the problem is not correctly described. A series of measured directions refers to the directions from one point to other points, all of them sharing the same orientation direction. On the other hand, the difference between any two directions forms the respective horizontal angle, ω_ijk = β_ik − β_ij, where ik is the right sighting and ij the left sighting. From a horizontal angle and one of its directions, the other direction is directly derived.
Directions as observable parameters define not only the shape of the quadrilateral but also the four orientation parameters, one at each point. The parametric degree is r = 8, and this should be explained through a construction scheme, as it is not so clear. Let's take an arbitrary length, e.g., for the distance d_12, with an arbitrary orientation. From point 1 we use β_14 as an orientation direction and draw β_13, β_12. From point 2 we take β_21 as reference and draw β_24, β_23, so that point 3 is determined by the intersection with β_13; the intersection of the respective directions defines points 3 and 4. Up to now we have used six directions, so that the shape of the quadrilateral is well defined; any other angle can be derived using simple geometry. However, something is still missing, namely the possibility to describe directions from points 3 and 4. This is feasible by using a reference direction for point 3, e.g., β_32, and one for point 4, e.g., β_43. At this point the problem has been fully described by a minimum number of eight observables, hence the parametric degree r = 8 and the degrees of freedom f = 12 − 8 = 4. Other equivalent subsets of eight observables are possible.
Having n = 12 and r = 8, we must choose m = r = 8 unknown parameters in case the method of observation equations is used. One option is to take a suitable set of eight unknown parameters among the set of the twelve observables and form the functional model accordingly (some geometric manipulations are needed). The other option, almost always preferred, is to use point coordinates to describe the problem. In this case the reference system/datum should also be defined, e.g., by fixing the coordinates (x1 = 0, y1 = 0) and (x2 = q, y2 = 0), q being a constant (minimal constraints). Therefore, with four unknown coordinates, (x3, y3) and (x4, y4), and four unknown orientation parameters (θ1, θ2, θ3, θ4), eight unknown parameters are used in total. We remind that the observation equation of a horizontal direction is given by,
β_ij = α_ij − θ_i = arctan( (x_j − x_i) / (y_j − y_i) ) − θ_i
where α_ij is the plane azimuth (the angle from grid north to the direction ij). Usually, in geodetic networks, θ_i coincides with the leftmost measured direction in a series from a station point; its approximate value is given by the azimuth computed from approximate coordinates.
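The direction observation equation can be sketched numerically as follows (coordinates and orientation values are assumed; note the geodetic convention arctan(Δx/Δy), with the azimuth reckoned from the y/north axis):

```python
import math

def azimuth(xi, yi, xj, yj):
    """Plane azimuth alpha_ij from grid north, in degrees [0, 360)."""
    a = math.degrees(math.atan2(xj - xi, yj - yi))   # note: atan2(dx, dy)
    return a % 360.0

def direction(xi, yi, xj, yj, theta_i):
    """Horizontal direction beta_ij = alpha_ij - theta_i (degrees)."""
    return (azimuth(xi, yi, xj, yj) - theta_i) % 360.0

# Assumed station 1 at the origin, target 2 due east, target 3 north-east
b12 = direction(0.0, 0.0, 100.0, 0.0, theta_i=30.0)    # azimuth 90 deg
b13 = direction(0.0, 0.0, 100.0, 100.0, theta_i=30.0)  # azimuth 45 deg
angle_213 = (b12 - b13) % 360.0  # horizontal angle, 1->2 right, 1->3 left
```

The orientation parameter cancels in the difference of two directions, which is exactly why angles derived from directions carry no orientation unknowns.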
In case of condition equations, the number of the independent equations should be
(x_i − x_c)² + (y_i − y_c)² − R² = 0      (7.1)
Due to errors, all points do not satisfy the same equation; any triplet of data points determines a different circle. The goal is to find the best solution, i.e. the best circle under some fitting criterion. Generally, in fitting problems, the data points should have a suitable distribution, here along a large arc and preferably along the whole circle, so that the circle is correctly defined, avoiding probable inconsistencies and bad approximate values. In particular, small arcs should be avoided, as considerable errors may grossly affect the estimates of the circle parameters.
Denoting by (x_i, y_i) the observables and by (v_xi, v_yi) the errors, an efficient fitting criterion is based on the minimization of the squares of both errors. This can be expressed by the minimization of the geometric distances of the observed points from the best-fit circle,

Σ d_i² = Σ ( √((x′_i − x_c)² + (y′_i − y_c)²) − R )²      (7.2)

Such distances are reckoned along the directions from each data point to the center of the circle, in case the errors/observations have the same uncertainty, a common situation in practice (geometric fit, orthogonal fit).
A criterion not so strict, but competitive and often preferable to the previous one for its mathematical simplicity, is based on the minimization of the squares of the so-called algebraic distances, i.e.,

Σ d_i² = Σ ( (x′_i − x_c)² + (y′_i − y_c)² − R² )²      (7.3)
In the literature (see, e.g., Chernov 2010) there are many approaches and model modifications for both of the above model implementations, e.g., using either the model A(x_i² + y_i²) + Bx_i + Cy_i + D = 0 (under a constraint on A, B, C, D) or z_i + Bx_i + Cy_i + D = 0, where z_i = x_i² + y_i²; with such substitutions we derive a model that is linear with respect to the unknown parameters, with simple and fast computational algorithms. We do not intend to go into details here; instead we will point out how this problem can be faced with each one of the three basic adjustment methods, and how one can handle the effect of the linearization to obtain a best, or at least a sufficient, solution.
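A sketch of the linear (algebraic) fit just mentioned, with z_i + Bx_i + Cy_i + D = 0 and z_i = x_i² + y_i² (the data points are assumed); the LS estimates of B, C, D then give x_c = −B/2, y_c = −C/2 and R² = (B² + C²)/4 − D:

```python
import numpy as np

# Assumed data points on a circle centred at (2, 1) with R = 5, plus noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
x = 2.0 + 5.0 * np.cos(t) + rng.normal(0.0, 0.01, t.size)
y = 1.0 + 5.0 * np.sin(t) + rng.normal(0.0, 0.01, t.size)

# Linear model: z + B x + C y + D = 0, with z = x^2 + y^2
Amat = np.column_stack([x, y, np.ones_like(x)])
z = x * x + y * y
B, C, D = np.linalg.lstsq(Amat, -z, rcond=None)[0]

xc, yc = -B / 2.0, -C / 2.0           # circle center
R = np.sqrt((B * B + C * C) / 4.0 - D)  # circle radius
```

Such an algebraic solution is also a convenient source of approximate values for the iterative geometric fit.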
From equations (7.5), we take three of them and analytically express the coordinates of the circle center and its radius in terms of the coordinates of the three points used. Next, we substitute these expressions into the remaining (N − 3) equations, and in this form no unknown parameter except the observables is included. The matrix M to be inverted is (N−3)×(N−3), and nothing notable is gained; moreover, the complexity of the equations is obvious. Therefore, the mixed equation model is again the preferred one among the three basic adjustment methods.
u_i^o + (∂u_i/∂x_c)|_o (x_c − x_c^o) + (∂u_i/∂y_c)|_o (y_c − y_c^o) + (∂u_i/∂R)|_o (R − R^o) + (∂u_i/∂x_i)|_o (x_i − x_i^o) + (∂u_i/∂y_i)|_o (y_i − y_i^o) + … = 0      (7.6)

Noting that

x_c = x_c^o + δx_c,  y_c = y_c^o + δy_c,  R = R^o + δR,

and

x_i = x′_i − v_xi,  y_i = y′_i − v_yi,  x_i^o = x′_i − v_xi^o,  y_i^o = y′_i − v_yi^o,

from where,

x_i − x_i^o = (x′_i − v_xi) − (x′_i − v_xi^o) = −v_xi + v_xi^o,   y_i − y_i^o = −v_yi + v_yi^o,

we have,

u_i(x_c^o, y_c^o, R^o, x_i^o, y_i^o) = (x_i^o − x_c^o)² + (y_i^o − y_c^o)² − (R^o)²
  = (x′_i − v_xi^o − x_c^o)² + (y′_i − v_yi^o − y_c^o)² − (R^o)²

(∂u_i/∂x_c)|_o (x_c − x_c^o) + (∂u_i/∂y_c)|_o (y_c − y_c^o) + (∂u_i/∂R)|_o (R − R^o)
  = −2 [ (x_i^o − x_c^o)   (y_i^o − y_c^o)   R^o ] [ δx_c   δy_c   δR ]ᵀ

(∂u_i/∂x_i)|_o (x_i − x_i^o) + (∂u_i/∂y_i)|_o (y_i − y_i^o) = 2(x_i^o − x_c^o)(x_i − x_i^o) + 2(y_i^o − y_c^o)(y_i − y_i^o)
  = 2(x′_i − v_xi^o − x_c^o)(−v_xi + v_xi^o) + 2(y′_i − v_yi^o − y_c^o)(−v_yi + v_yi^o)
  = −2 [ (x′_i − v_xi^o − x_c^o)   (y′_i − v_yi^o − y_c^o) ] [ v_xi   v_yi ]ᵀ + 2 [ (x′_i − v_xi^o − x_c^o)   (y′_i − v_yi^o − y_c^o) ] [ v_xi^o   v_yi^o ]ᵀ
  = −B_i v_i + B_i [ v_xi^o   v_yi^o ]ᵀ,

where B_i = 2 [ (x′_i − v_xi^o − x_c^o)   (y′_i − v_yi^o − y_c^o) ] and v_i = [ v_xi   v_yi ]ᵀ.

With

w_i = (x′_i − v_xi^o − x_c^o)² + (y′_i − v_yi^o − y_c^o)² − (R^o)² + B_i [ v_xi^o   v_yi^o ]ᵀ
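As a numerical companion (not the author's full mixed-model algorithm; the data points are assumed), the geometric criterion (7.2) can be minimized by a simple Gauss-Newton iteration on the parameters (x_c, y_c, R), started from rough values or from the algebraic fit:

```python
import numpy as np

# Assumed points near a circle centred at (1, -2) with R = 3
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
xp = 1.0 + 3.0 * np.cos(t) + rng.normal(0.0, 0.005, t.size)
yp = -2.0 + 3.0 * np.sin(t) + rng.normal(0.0, 0.005, t.size)

# Gauss-Newton on the geometric distances d_i = sqrt(...) - R
p = np.array([0.0, 0.0, 1.0])        # rough start for (xc, yc, R)
for _ in range(50):
    dx, dy = xp - p[0], yp - p[1]
    di = np.hypot(dx, dy)
    r = di - p[2]                    # residuals of criterion (7.2)
    J = np.column_stack([-dx / di, -dy / di, -np.ones_like(di)])
    dp = np.linalg.solve(J.T @ J, -J.T @ r)
    p += dp
    if np.max(np.abs(dp)) < 1e-12:   # negligible corrections: stop
        break

xc, yc, R = p
```

With data spread along the whole circle, the iteration converges quickly; with short arcs and poor starting values it may not, which illustrates the linearization issues discussed above.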
8. Concluding remarks
A means to study the real world and to obtain quantitative determinations for various physical problems is provided by the analysis of observations in relation to a mathematical model that reflects the unknown reality. For that purpose, the adjustment of observations constitutes an indispensable part of geodetic sciences, geomatics and other related fields.
Central LS adjustment topics, concerning the correct choice of the adjustment model/method and the efficient algorithmic implementation for parameter estimation
References
Chernov, N. (2010). Circular and Linear Regression: Fitting Circles and Lines by Least Squares. CRC Press, A Chapman & Hall Book.
Dermanis, A. (1986). Adjustment of Observations and Estimation Theory, Vol. I (in Greek). Ziti editions, Thessaloniki.
Dermanis, A. (1987). Adjustment of Observations and Estimation Theory, Vol. II (in Greek). Ziti editions, Thessaloniki.
Dermanis, A. and A. Fotiou (1992). Methods and Applications of Observation Adjustment (in Greek). Ziti editions, Thessaloniki.
Fotiou, A. (2007). Geometric Geodesy - Theory and Practice (in Greek). Ziti editions, Thessaloniki.
Fotiou, A. (2017a). Line of Best Fit by a Least Squares Modified Mixed Model. Volume 'Living with GIS' in honour of the memory of Prof. I. Paraschakis, School of Rural and Surveying Engineering, Aristotle University of Thessaloniki, 57-80.
Fotiou, A. (2017b). Computationally Efficient Methods and Solutions with Least Squares Similarity Transformation Models. Volume 'Living with GIS' in honour of the memory of Prof. I. Paraschakis, School of Rural and Surveying Engineering, Aristotle University of Thessaloniki, 107-128.
Koch, K.R. (1987). Parameter Estimation and Hypothesis Testing in Linear Models. Springer-Verlag.
Mikhail, E.M. and F. Ackermann (1976). Observations and Least Squares. IEP-A Dun-Donnelley Publisher.
Pan, G., Y. Zhou, H. Sun, and W. Guo (2015). Linear observation based total least squares. Survey Review, 47(340), 18-27.
Pope, A.J. (1972). Some pitfalls to be avoided in the iterative adjustment of nonlinear problems. In Proc. 38th Annual Meeting, American Society of Photogrammetry, Washington, D.C., 449-473.
Rossikopoulos, D. (1999). Surveying Networks and Computations (in Greek). Ziti editions, Thessaloniki.
Schaffrin, B. and A. Wieser (2008). On weighted total least-squares adjustment for linear regression. J. Geod., 82, 415-421.
Sneeuw, N., F. Krumm and M. Roth (2015). Adjustment Theory. Lecture Notes, Rev. 4.43, Geodetic Institute, University of Stuttgart.
Wells, D.E. and E.J. Krakiwsky (1971). The Method of Least Squares. Lecture Notes, Department of Geodesy and Geomatics Engineering, U.N.B.