
Journal of Computational and Applied Mathematics 239 (2013) 250–269

Contents lists available at SciVerse ScienceDirect

Journal of Computational and Applied Mathematics

journal homepage: www.elsevier.com/locate/cam

Fitting cylinders to data


Yves Nievergelt
Department of Mathematics, Eastern Washington University, Cheney, WA 99004-2418, USA

article info

Article history:
Received 31 October 2011
Received in revised form 10 September 2012

abstract

The problem of fitting cylinders to data arises in science and industry. This article proves the existence of generalized cylinders – Cartesian products of generalized spheres and affine manifolds – fitted to data by using many criteria, including generalized least-squares, weighted median, and midrange regressions.

© 2012 Elsevier B.V. All rights reserved.

MSC:
41A10
41A28
41A50
41A52
41A63
65D10
65D17

Keywords:
Cylinders
Least-squares methods
Median
Midrange
Regression

0. Introduction

The problem of fitting circles, spheres, and cylinders to data occurs in surveying, engineering, and metrology, to measure
how accurately the data fit specifications:
1. Industrial manufacturers fit concentric spheres or concentric cylinders to parts for certification [1–8].
2. Photogrammetric quality control engineers compare specifications to manufactured products by fitting circles to wheel
rims [9], and cylinders [10,11] or spheres [12, pp. 183–188], [13, p. 1127] to storage tanks.
3. Photogrammetry engineers reconstruct cylindrical objects from photographs in the investigation of accidents [14].
4. Structural engineers monitor deformations to avert failures by fitting surfaces to buildings, for example, cylinders to oil
production platforms [15,16].
This article addresses the apparently overlooked issue of whether there is any best-fitting cylinder (Cartesian product of a
sphere and affine manifold) of any co-dimension.
Best-fitting circles or spheres need not exist, because the moduli spaces of (parameters of) circles or spheres do not contain
the limiting cases of lines or planes and thus are not compact. Nevertheless, the spaces of generalized circles, consisting of
all circles and lines in the plane and at infinity, or generalized spheres, consisting of all spheres and planes in space and at
infinity, or, more generally generalized spheres of dimension K , consisting of all spheres of dimension K and affine manifolds
of dimension K in RL and at infinity, with 0 ≤ K ≤ L, are compact spaces. The existence of at least one such best-fitting

E-mail address: [email protected].

doi:10.1016/j.cam.2012.09.037

least-squares, median, and midrange generalized circle in the plane, spheres in space, and other hyperspheres with unit
co-dimension in hyperspace follows not only from the compactness of the moduli space of parameters of all generalized
hyperspheres, but also from the compactness of each finite set of data points, which ensures that the best-fitting generalized
hyperspheres are not at infinity [4,17–23]. In some situations, there exists at least one best-fitting circle with a positive
radius, as opposed to a point or a line. For example, at least five data points in general position in the plane admit at least
one median circle [18, p. 902, Lemma 1; p. 903, Lemma 2]; more generally, at least L + 3 data points in general position in
R^L admit at least one median hypersphere [21, p. 501, Remark 3.3]. As another example, relative to the principal axes of
a distribution of data in the plane, if a third moment does not vanish ($\overline{xxy} \neq 0$), then there is a least-squares circle with
a positive radius [20, p. 68, Theorem 8]; similar results for higher-dimensional least-squares spheres do not seem to have
been established yet.
Best-fitting circles or spheres need not be unique: there are data with several midrange and several least-squares
circles [22, pp. 260–261, Example 1.2.1], and an infinite continuum of median circles [21, p. 595, Example 7.2]. Circles and
spheres are particular cases of best-fitting cylinders, which therefore need not be unique either.
In contrast, the existence of any geometrically best-fitting circle, cylinder, or sphere with a higher co-dimension seems to
have received little attention. Thus the problem consists of fitting a generalized cylinder, defined as the Cartesian product of a
generalized circle or sphere and an affine manifold, with both factors allowed to have any non-negative integral dimensions.
For example, a cylinder in space is the Cartesian product of a one-dimensional circle and a one-dimensional line. A circle in
space is the Cartesian product of a circle and a zero-dimensional point. Because the unit sphere on the real line R consists of
two distinct points {−1, 1}, a pair of parallel lines in the plane is the Cartesian product of such a zero-dimensional sphere
and a line.
The following notation accommodates all kinds of generalized cylinders, with any dimension and any co-dimension. In
the general setting considered here, the data consist of a finite sequence of points X ′ = (x1 , . . . , xN ) – where ′ denotes
transposition – in a metric space (X, d), called the ambient space or data space.

Example 1. The metric space (X, d) may be the Euclidean space X = RL containing the data X ′ = (x1 , . . . , xN ), with its
Euclidean distance d.

Example 2. The metric space (X, d) may be the linear space A⊥ perpendicular to the axis A of a cylinder and containing the
orthogonal projections of the data on A⊥ .
The situation examined here also includes a non-empty set E of subsets of X.

Example 3. In linear regression, E is the set of all lines in the plane X = R2 , or all planes in space X = R3 , or all hyperplanes
in hyperspace X = RL .

Example 4. In circular regression, E is the set of all generalized circles in the plane, or generalized spheres in space, or
generalized hyperspheres in hyperspace.

Example 5. In the present context, E may be the set of all generalized circular cylinders in space, or the set of all generalized
circles in space, or the set of all pairs of parallel lines in the plane. In general, E is the set of all generalized cylinders of a
specific type – Cartesian products of a generalized sphere of dimension M − 1 and an affine manifold of dimension K – in
RL , with K , L, and M fixed.
The distance from a point p ∈ X to a subset A ⊆ X is defined by
d(p, A) := inf{d(p, q): q ∈ A} ∈ [0, ∞].
If A is an orientable hypersurface, or any subset A ⊂ X that splits its complement X \ A into two mutually disjoint
connected components, such that X \ A = A+ ∪ A− with A+ ∩ A− = ∅, then it may be advantageous to define a signed
distance by d± (p, A) := d(p, A) for p ∈ A+ , whereas d± (p, A) := −d(p, A) for p ∈ A− . Thus distances may be positive above
but negative below a plane A in the space R3 .
The problem considered here consists in finding a cylinder (Cartesian product of a sphere and an affine manifold) E ∈ E
that minimizes an objective function F : RN → R+ of the array d := (d1 , . . . , dN ) ∈ RN of signed or unsigned ‘‘residual’’
distances dj := d(xj , E ) from the data points to E. Thus E must minimize the objective F (E ) := F (d). Defined independently
of the distance d on the space (X, d), the objective F may be any norm on RN , which may be called the residual space.

Example 6. In least-squares regression, F is the squared Euclidean ℓ₂-norm ∥·∥₂², so F(d₁, . . . , d_N) = ∥(d₁, . . . , d_N)∥₂² = |d₁|² + · · · + |d_N|².

Example 7. In median regression, F is the ℓ₁-norm ∥·∥₁, also called the ‘‘taxi cab’’ or ‘‘octahedron’’ norm: F(d₁, . . . , d_N) = ∥(d₁, . . . , d_N)∥₁ = |d₁| + · · · + |d_N|.

Example 8. In midrange regression, F is the ℓ∞-norm ∥·∥∞, also called the ‘‘supremum’’ or ‘‘cube’’ norm: F(d₁, . . . , d_N) = ∥(d₁, . . . , d_N)∥∞ = max{|d₁|, . . . , |d_N|}.

The objective F need only be continuous on R^N and such that for each t ∈ R^N

$$\liminf_{\|d\| \to \infty} F(d) > F(t). \tag{1}$$

Condition (1) means that for each t ∈ R^N, there exist δ > 0 and ε > 0 such that for each d ∈ R^N, if ∥d∥ > δ, then F(d) > F(t) + ε. Condition (1) does not depend on the choice of the norm ∥d∥, because all norms are topologically equivalent on R^N [24, p. 95, Theorem 3.12-A]. With any such objective F, a phrase such as ‘‘an object A fits the data better than an object B does’’ means that F(A) < F(B).
For practical purposes of computations, and for theoretical purposes of proofs of existence of minima, Section 1 reviews
parameterizations of sets of lines, planes, and other affine subspaces, and Section 2 reviews parameterizations of sets of
circles, spheres, and other generalized hyperspheres. Section 3 establishes the main theoretical result: the existence of
generalized cylinders that fit data best according to various regression criteria. For least-squares regressions, Section 4
narrows the search for optimal circles in higher-dimensional spaces by showing that generalized least-squares circles lie
within a plane through the generalized mean of the data. Yet least-squares circles need not lie in a least-squares plane,
sphere, or cylinder. Merely as a proof of concept, Section 5 outlines prototypes of algorithms for Section 6, which presents
examples with real data.
For comparison and contrast, Appendix A shows that each least-squares affine manifold with positive dimension and
co-dimension contains a least-squares affine manifold of a smaller dimension, and is contained in a least-squares affine
manifold of a larger dimension. Counter-examples reveal that such a nesting can fail for median affine manifolds and for
least-squares circles. Resolving an issue raised in the literature, Appendix B verifies that large objects such as surfaces and
hypersurfaces generally fit data better than small objects such as points and curves do. For instance, with at least two distinct
data points in the plane, relative to any ℓp -norm of the residual distances, no singletons can be a best-fitting circle.

1. Parameterizations of sets of affine subspaces in Euclidean spaces

Practical and theoretical situations call for calculations in spaces of affine manifolds, where each element is an affine
manifold, and where a topology defines a concept of neighborhood of affine manifolds. For instance, orthogonal least-
squares regressions determine among other affine manifolds an affine manifold that fits data best. Similarly, finding a best-
fitting cylinder entails finding its axis. Likewise, finding a best-fitting circle in space entails finding the plane supporting it.
Such calculations operate not on the affine manifolds but on their parameters, for instance, unit normal vectors or parallel
orthonormal bases. Thus, specifying a plane and an orientation – ‘‘up’’ and ‘‘down’’ – in space R3 amounts to specifying a
point on the plane and a unit vector perpendicular to the plane. Alternatively, for situations that call for a plane in space
without any orientation, a normal unit vector and its opposite are equivalent. In either case, a matrix factorization of a
normal vector yields an orthonormal basis for the plane. Practical and theoretical situations may also call for such a basis,
for instance, to calculate lines and circles within a plane. Because of such versatility and ubiquity of unit vectors, this section
describes computationally efficient methods for specifying unit vectors and orthonormal bases in any finite-dimensional
Euclidean linear space RM . The results can then also specify the parameters defining generalized cylinders.
The spaces of parameters are compact, guaranteeing the existence of a minimum for each continuous objective function
defined on any closed subset of parameters.

1.1. Parameterizations of sets of unit vectors

There are no ways to make an (M − 1)-dimensional space Y ⊆ R^{M−1} of parameters correspond to the unit sphere S^{M−1} bijectively and bi-continuously (because the unit sphere is not homeomorphic to any proper subset of itself [25, p. 364, Example 3], and hence not to any Y ⊆ R^{M−1}). Still, there are ways to parameterize hemispheres, for instance, through inverse stereographic projections of R^{M−1} into (but not onto) S^{M−1}.

Geometrically, for each fixed point p ∈ S^{M−1}, called a pole, the equatorial hyperplane p⊥ normal to p through the center 0 can be an (M − 1)-dimensional space of parameters: the inverse stereographic projection P_p^{−1} through p maps each vector of parameters z ∈ p⊥ to the point where the line through z and p intersects the unit sphere. Algebraically, for each index j, the unit sphere punctured at (without) the pole e_j = (0, . . . , 0, 1, 0, . . . , 0)′, with 1 only in coordinate j, can be parameterized by the inverse stereographic projection P_{e_j}^{−1} through e_j from the equatorial hyperplane e_j⊥ = R^{M−1}, given by the formulas [26, p. 23, Eqs. (3.3) and (3.31)]

$$P_{e_j}^{-1}\colon \mathbb{R}^{M-1} \to S^{M-1} \setminus \{e_j\}, \qquad
z \mapsto u := \frac{(2z_1,\ \ldots,\ 2z_{j-1},\ \|z\|_2^2 - 1,\ 2z_j,\ \ldots,\ 2z_{M-1})}{\|z\|_2^2 + 1},$$

$$P_{e_j}\colon S^{M-1} \setminus \{e_j\} \to \mathbb{R}^{M-1}, \qquad
u \mapsto z := \frac{(u_1,\ \ldots,\ u_{j-1},\ u_{j+1},\ \ldots,\ u_M)}{1 - u_j}.$$

The map P_{e_j}^{−1} is also used to parameterize just the hemisphere above the plane e_j⊥, where u_j > 0. The inverse stereographic projection through the opposite pole −e_j is

$$P_{-e_j}^{-1}\colon \mathbb{R}^{M-1} \to S^{M-1} \setminus \{-e_j\}, \qquad
w \mapsto u := \frac{(2w_1,\ \ldots,\ 2w_{j-1},\ 1 - \|w\|_2^2,\ 2w_j,\ \ldots,\ 2w_{M-1})}{1 + \|w\|_2^2},$$

$$P_{-e_j}\colon S^{M-1} \setminus \{-e_j\} \to \mathbb{R}^{M-1}, \qquad
u \mapsto w := \frac{(u_1,\ \ldots,\ u_{j-1},\ u_{j+1},\ \ldots,\ u_M)}{1 + u_j}.$$
Any two such projections suffice to parameterize the entire unit sphere, for instance, P_{e_j} and P_{−e_j}, or P_{e_k} and P_{e_ℓ} for k ≠ ℓ. Computationally, to avoid parameters diverging to infinity, it suffices to restrict the domain of each of P_{e_j} and P_{−e_j} to a ball centered at 0 with any radius r > 1 in R^{M−1} and still cover all of S^{M−1}. Alternatively, 2M open unit balls and the maps (P_{±e_j})_{j=1}^{j=M} parameterize S^{M−1} with 2M open hemispheres. This alternative is useful for avoiding multiple coverings of unoriented directions, and to parameterize hemispheres where one of the coordinates u_j vanishes nowhere, for instance, for use with the QR factorization defined in Section 1.2.

Either way an equatorial band or the intersection of two hemispheres on S^{M−1} gets covered twice, but this double cover can be a computational advantage: an algorithm near the boundary of the domain of one inverse stereographic projection P_{e_j} is also deeper in the interior of the domain of the other inverse stereographic projection P_{−e_j}. Consequently, an optimization algorithm may occasionally need to switch between inverse stereographic projections, but it need only focus on unconstrained optimizations, without ever encountering a boundary constraint. Indeed, S^{M−1} has no boundary!
In hyperspace R^M, if u is a point between two opposite poles −e_j and e_j on the unit sphere S^{M−1}, then −1 < u_j < 1, so z := P_{e_j}(u) and w := P_{−e_j}(u) are defined, with $\sum_{k \neq j} u_k^2 = 1 - u_j^2 > 0$. If an algorithm goes through an iteration z with ∥z∥₂ > 2, then switching to w = P_{−e_j}[P_{e_j}^{−1}(z)] yields a return to a value of w with ∥w∥₂ < 1/2. Indeed, substituting the formula for u = P_{e_j}^{−1}(z) into the formula for w := P_{−e_j}(u) reveals that the change of parameters w = P_{−e_j}[P_{e_j}^{−1}(z)] is an inversion in the unit sphere [27], defined by w = z/∥z∥₂²:

$$z \mapsto P_{e_j}^{-1}(z) = u = \frac{(2z_1,\ \ldots,\ 2z_{j-1},\ \|z\|_2^2 - 1,\ 2z_j,\ \ldots,\ 2z_{M-1})}{\|z\|_2^2 + 1},$$

$$u \mapsto P_{-e_j}(u) = w = \frac{(u_1,\ \ldots,\ u_{j-1},\ u_{j+1},\ \ldots,\ u_M)}{1 + u_j}
= \frac{(2z_1,\ \ldots,\ 2z_{j-1},\ 2z_j,\ \ldots,\ 2z_{M-1})}{(\|z\|_2^2 + 1)\cdot\left(1 + \dfrac{\|z\|_2^2 - 1}{\|z\|_2^2 + 1}\right)} = \frac{z}{\|z\|_2^2}.$$

Moreover, at least one of the maps (P_{e_j}^{−1})_{j=1}^{j=M} covers a neighborhood of u or −u. The foregoing formulas can serve to parameterize oriented axes of cylinders or oriented normals to lines, planes, and hyperplanes. Subsequent subsections provide alternative formulas for parameterizing unoriented axes and hyperplanes.
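For instance, a minimal numpy sketch of these charts, with the pole fixed at the last coordinate for brevity (the helper names are illustrative only):

```python
import numpy as np

def inv_stereo_north(z):
    """Inverse stereographic projection through e_M (last coordinate):
    maps z in R^{M-1} to a unit vector u in S^{M-1} without e_M."""
    z = np.asarray(z, dtype=float)
    n2 = z @ z
    return np.append(2 * z, n2 - 1) / (n2 + 1)

def stereo_south(u):
    """Stereographic projection through -e_M:
    maps u in S^{M-1} without -e_M to w in R^{M-1}."""
    u = np.asarray(u, dtype=float)
    return u[:-1] / (1 + u[-1])

z = np.array([3.0, 4.0])            # far from 0: near the boundary of one chart
u = inv_stereo_north(z)             # point on the unit sphere S^2
w = stereo_south(u)                 # switch charts
print(np.allclose(w, z / (z @ z)))  # True: the switch is the inversion w = z/||z||^2
```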

1.2. Parameterizations of orthonormal bases for hyperplanes

With the space of all oriented hyperplanes through 0 parameterized, for instance, by their unit normal vectors as in
Section 1.1, there arises the question of how to parameterize each such hyperplane. In the plane RM = R2 , for each unit
vector u = (u1 , u2 )′ , which may be normal to an oriented line L through 0, the vector v := (−u2 , u1 )′ is a unit vector
perpendicular to u, parallel to L, so (u, v) is a right-handed basis of R2 such that v can be used to parameterize L. In any
space RM , each non-zero vector u = (u1 , u2 , u3 , . . .)′ , which may be normal to an oriented hyperplane H through 0, can be
complemented to an orthogonal right-handed basis (±u, v, w, . . .) of RM , such that (v, w, . . .) is a basis for H, through the
Householder reflection Q defined by

$$q := \begin{pmatrix} u_1 + \operatorname{sign}(u_1)\,\|u\|_2 \\ u_2 \\ u_3 \\ \vdots \end{pmatrix}, \qquad
c := \frac{1}{\|u\|_2 \cdot (\|u\|_2 + |u_1|)}, \qquad
Q := I - c\, q\, q'.$$

The matrix Q represents the reflection across the median hyperplane between u and −sign(u₁) e₁, so det(Q) = −1. Also, Q is symmetric (Q = Q′), orthogonal (Q Q′ = I) and involutive (Q² = I), with Q e₁ = −sign(u₁) u/∥u∥₂ and Q u = −sign(u₁) ∥u∥₂ e₁, which states that the last M − 1 columns (v, w, . . .) of Q are perpendicular to u. Thus (sign(u₁) u/∥u∥₂, v, w, . . .) is a right-handed orthonormal basis of R^M.
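For instance, a minimal numpy sketch of this construction (illustrative names; sign(0) is taken here as +1, in line with the remark on u₁ = 0 below):

```python
import numpy as np

def householder_basis(u):
    """Return (b, W): b = sign(u_1) * u/||u||, and W, an M x (M-1) matrix whose
    columns form an orthonormal basis of the hyperplane perpendicular to u."""
    u = np.asarray(u, dtype=float)
    norm = np.linalg.norm(u)
    s = 1.0 if u[0] >= 0 else -1.0        # sign(u_1), with sign(0) := +1
    q = u.copy()
    q[0] += s * norm                      # q := u + sign(u_1) ||u|| e_1
    c = 1.0 / (norm * (norm + abs(u[0])))
    Q = np.eye(len(u)) - c * np.outer(q, q)
    return s * u / norm, Q[:, 1:]         # last M-1 columns are perpendicular to u

u = np.array([1.0, 2.0, 2.0])
b, W = householder_basis(u)
print(np.round(W.T @ u, 12))              # ~0: columns of W are perpendicular to u
print(np.round(W.T @ W, 12))              # identity: the basis is orthonormal
```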

Computationally, to produce an orthogonal (but not necessarily orthonormal) basis (±u, v, w, . . .) without computing any norm, the Householder reflection can also be replaced by M − 1 ‘‘fast Givens reflections’’ [28, Chapter 5, Section 5.1.13, pp. 205–209].

Thus an inverse stereographic projection P_p^{−1} of the parameter z ∈ R^{M−1} to u = P_p^{−1}(z) ∈ S^{M−1} ⊂ R^M followed by a Householder reflection Q produces an oriented hyperplane H ⊂ R^M perpendicular to u and an orthonormal basis (v, w, . . .) for H.
If u1 = 0, then either sign works for sign(u1 ), but at the cost of a discontinuity in the factorization u = Q · R. To en-
sure a local continuity, it suffices to adopt a similar factorization based not on u1 but on any uj ̸= 0, for instance, such that
|uj | = ∥u∥∞ := max{|u1 |, . . . , |uM |}, with Q representing the reflection across the median hyperplane between u and
−sign(uj ) ej . Such a discontinuity also corresponds to the boundary of a parameterization. To restore a local continuity, it
suffices then to switch to another parameterization of a neighborhood of u, for instance, another inverse stereographic pro-
jection, or one of the canonical projections defined in Section 1.3. In general such global discontinuities cannot be avoided,
because there are no ways to define bases perpendicular to unit vectors everywhere continuously on the sphere: there are
no global continuous tangent frames on S 2 or S M −1 ; such frames exist only on S 1 , S 3 , S 7 [29, p. 87, Corollary 2].

1.3. Parameterizations of sets of lines and hyperplanes through the origin

Without imposing any orientation, specifying a hyperplane H through the origin in RM amounts to specifying a non-zero
normal vector r ∈ RM . Multiplying r by a non-zero scalar λ ∈ R gives the same hyperplane H. Thus the set
[r] := {λ r: λ ∈ R \ {0}}
of all non-zero multiples of r corresponds to the unoriented normal to H. Yet normal vectors in different directions
correspond to different hyperplanes. Hence there is a bijection between the set of all hyperplanes through the origin and
the projective space
P(RM ) := {[r]: r ∈ RM \ {0}}.
There are also no ways to make an (M − 1)-dimensional space Y ⊆ RM −1 of parameters correspond to the projective space
P(RM ) bijectively and bi-continuously [30, p. 237, Theorem 40.6]. Nevertheless, there are many ways to cover it multiple
times or to parameterize pieces of it. For instance, the projection S M −1 → P(RM ) defined by u → [u] covers P(RM ) twice; it
also guarantees that P(RM ) is compact [31, p. 42]. Also, an inverse stereographic projection of an open hemisphere ‘‘above’’
or ‘‘below’’ an equatorial plane on the unit sphere S M −1 ⊂ RM parameterizes all of P(RM ) except a lower-dimensional
subspace P(RM −1 ), and inverse stereographic projections of M such open hemispheres cover all of P(RM ).
A commonly used alternative set of parameterizations begins with the maps

$$g_j\colon \mathbb{R}^{M-1} \to \mathbb{R}^M, \qquad (x_1,\ \ldots,\ x_{M-1})' \mapsto (x_1,\ \ldots,\ x_{j-1},\ 1,\ x_j,\ \ldots,\ x_{M-1})'$$

for j ∈ {1, . . . , M}. (If x := (x₁, . . . , x_{M−1})′ is in the unit cube, which is the unit ℓ∞-ball B∞^{M−1} in R^{M−1}, so that ∥x∥∞ = max{|x₁|, . . . , |x_{M−1}|} ≤ 1, then g_j(x) lies on a face of the unit ℓ∞-sphere S∞^{M−1} in R^M. Thus g_j is also the inverse stereographic projection from B∞^{M−1} into S∞^{M−1} from the point at infinity in the direction −e_j.)

Based on the maps g₁, . . . , g_M just described, the inverse canonical projections

$$f_j\colon \mathbb{R}^{M-1} \to P(\mathbb{R}^M), \qquad (x_1,\ \ldots,\ x_{M-1})' \mapsto [(x_1,\ \ldots,\ x_{j-1},\ 1,\ x_j,\ \ldots,\ x_{M-1})']$$

for j ∈ {1, . . . , M} also cover P(R^M). For each [r] ∈ P(R^M), since r ≠ 0, there exists an index j ∈ {1, . . . , M} such that r_j ≠ 0, and then letting x_k := r_k/r_j gives

$$(x_1,\ \ldots,\ x_{j-1},\ x_{j+1},\ \ldots,\ x_M)' \mapsto \left[\left(\frac{r_1}{r_j},\ \ldots,\ \frac{r_{j-1}}{r_j},\ 1,\ \frac{r_{j+1}}{r_j},\ \ldots,\ \frac{r_M}{r_j}\right)'\right] = [r].$$

Thus f_j parameterizes a neighborhood of [r] in P(R^M).
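For instance, a minimal numpy sketch of such a chart, choosing the index j of largest magnitude so that the affine parameters stay bounded by 1 in absolute value (illustrative names only):

```python
import numpy as np

def canonical_chart(r):
    """Represent the projective class [r] by a chart index j and the M-1
    affine parameters x_k = r_k / r_j, choosing j with |r_j| maximal."""
    r = np.asarray(r, dtype=float)
    j = int(np.argmax(np.abs(r)))
    x = np.delete(r, j) / r[j]
    return j, x

def chart_to_representative(j, x):
    """Inverse canonical projection f_j: insert 1 at slot j."""
    return np.insert(x, j, 1.0)

r = np.array([3.0, -6.0, 2.0])
j, x = canonical_chart(r)                      # j = 1, x = (-0.5, -1/3)
print(chart_to_representative(j, x) * r[j])    # recovers r up to the scalar r_j
```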

1.4. Parameterizations of sets of affine hyperplanes

For all hyperplanes, not necessarily through 0, the parameterization f_M factors through the map g_M from Section 1.3:

$$g_M\colon \mathbb{R}^{M-1} \to T := \mathbb{R}^{M-1} \times \{1\} \subset \mathbb{R}^M, \qquad
(x_1,\ \ldots,\ x_{M-1})' \mapsto (x_1,\ \ldots,\ x_{M-1},\ 1)'.$$

Thus an affine hyperplane in R^{M−1} × {1} is an affine hyperplane in R^{M−1} raised by 1. Yet a subset of R^{M−1} × {1} is an (M − 2)-dimensional affine hyperplane if and only if it is the intersection of R^{M−1} × {1} with an (M − 1)-dimensional linear hyperplane through the origin in R^M. Consequently, the parameterizations f_j in Section 1.3 for the space of all linear hyperplanes through the origin in R^M also parameterize all the affine hyperplanes in R^{M−1}, including the hyperplane at
infinity, which corresponds to the linear hyperplane perpendicular to e_M in R^M. Therefore, as P(R^M), the space of all the affine hyperplanes in R^{M−1} is also compact.

The compactness of the projective parameter space P(R^M) guarantees the existence of a minimum for each continuous objective F on a closed subset D ⊆ P(R^M).

Fig. 1. A line L on the horizontal plane T = R² × {1} is the intersection of T with a plane Π through the origin of the space R³. The parameter z ∈ R² = C through an inverse stereographic projection determines a unit vector u ∈ S² ⊂ R³ and hence also the normal plane Π = u⊥. A factorization u = Q · R = (±u, v, w) · R yields an orthonormal basis W = (v, w) parallel to the plane Π. A factorization (u, e₃) = Q · R = (±u, r u + s e₃, L) · R yields a unit vector L parallel to the line L = Π ∩ T. Algorithms on unit vectors or linear subspaces in space, or affine lines in the plane, need only operate on the parameter z ∈ C.

Example 9. A plane P ⊂ R³ in space may be defined as the intersection of the space

$$T := \mathbb{R}^3 \times \{1\} = \{(x, y, z, 1)\colon x, y, z \in \mathbb{R}\} \subset \mathbb{R}^4$$

with a three-dimensional linear subspace Π ⊂ R⁴ through 0. The hyperplane Π can be specified by any unit normal vector u ∈ R⁴, for instance, from an inverse stereographic projection, or a canonical projection from Section 1.3. A point (x, y, z) is on the plane P if and only if (x, y, z, 1) ∈ Π ∩ T, and so (x, y, z, 1) · u = 0. Another normal vector ũ defines the same plane Π, and hence intersects T along the same plane P, if and only if u and ũ are non-zero multiples of each other. A vertical normal vector u, parallel to e₄ = (0, 0, 0, 1)′, corresponds to the plane at infinity in T. Thus the space of affine planes in space, including the plane at infinity, is homeomorphic to the compact projective space P(R⁴). A factorization [u, e₄] = Q · R with R upper triangular [32, Section 4.7] yields in the third and fourth columns of the orthogonal matrix Q, which is a product of Givens or Householder reflections, an orthonormal basis (v, w) parallel to the plane P, which can be used to parameterize P, or to project data orthogonally on P. Fig. 1 shows similar constructions for a line in a plane, and a plane through 0.
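For instance, a minimal numpy sketch of Example 9, using numpy's built-in QR factorization in place of explicit Givens or Householder steps (illustrative names only):

```python
import numpy as np

def affine_plane_basis(u):
    """Given a unit normal u in R^4 for the linear hyperplane Pi = u-perp,
    return an orthonormal basis (v, w) of the directions of the affine plane
    P = Pi intersected with R^3 x {1}, via a QR factorization of (u, e_4)."""
    u = np.asarray(u, dtype=float)
    e4 = np.array([0.0, 0.0, 0.0, 1.0])
    Q, R = np.linalg.qr(np.column_stack([u, e4]), mode="complete")
    # The last two columns of Q are orthonormal and perpendicular to both
    # u and e_4, hence parallel to the affine plane P.
    return Q[:, 2], Q[:, 3]

u = np.array([1.0, 2.0, 2.0, -3.0])
u /= np.linalg.norm(u)
v, w = affine_plane_basis(u)
print(np.round([v @ u, w @ u, v @ w], 12))  # ~0: v, w span directions within Pi
```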

1.5. Parameterizations of intermediate-dimensional affine subspaces

Although the finitely many examples in Section 6 do not include such dimensions, the inclusion of the present subsection
allows the subsequent proofs and results to hold for all dimensions. More general practical and theoretical situations may
call for parameterizations of affine subspaces of any dimension K in RM . For instance, fitting a circle to data in R5 entails
finding the two-dimensional plane in R5 that supports such a circle. To this end, there already exist parameterizations
similar to those in Sections 1.3 and 1.4 [33, p. 193]. Specifically, a K-dimensional linear subspace Z ⊆ R^M can be specified by a (not necessarily orthonormal) basis U := (u₁, . . . , u_K) ∈ R^{M×K}. A basis V := (v₁, . . . , v_K) ∈ R^{M×K} spans the same linear subspace Z if and only if there exists an invertible matrix Λ ∈ R^{K×K} such that V = U · Λ, or, equivalently, V′ = Λ′ · U′, similarly to the non-zero multiples λ r in Section 1.3. Also, similarly to the choice of an index j with r_j ≠ 0, since U is a basis, it has a non-singular K × K principal minor submatrix, so if Λ is its inverse matrix, then the corresponding principal minor of V = U · Λ is the identity. For illustration purposes, if this principal minor has row and column indices 1, . . . , K, then V = U · Λ has the form

$$V = U \cdot \Lambda = \begin{pmatrix} I \\ A \end{pmatrix},$$

where I ∈ R^{K×K} is the identity, and A ∈ R^{(M−K)×K} is any matrix. Thus the function

$$f_{1,\ldots,K}\colon \mathbb{R}^{(M-K)\times K} \to G(K, \mathbb{R}^M), \qquad B \mapsto \begin{pmatrix} I \\ B \end{pmatrix}$$

parameterizes a neighborhood of Z in the space G(K, R^M) of all K-dimensional linear subspaces of R^M. If B = A, then f_{1,...,K}(A) is a basis for Z. If B is near A in the topology of R^{(M−K)×K}, then the linear subspace with basis f_{1,...,K}(B) is near Z in the topology of G(K, R^M). Parameterizations of all of G(K, R^M) arise from selecting other K × K minors, of which there are a total of M!/[K! (M − K)!].
The space G(K , RM ) of all K -dimensional linear subspaces of RM is also the set of all equivalence classes [U ] of bases,
with two bases U and V mutually equivalent if and only if there exists an invertible matrix Λ such that V = U · Λ, which
shows that G(K , RM ) is also compact [33, p. 194].
The transposition mapping B to B′ from R^{(M−K)×K} to R^{K×(M−K)} reflects the isomorphism between G(K, R^M) and G(M − K, R^M). Also, for each basis U = (u₁, . . . , u_K) of a K-dimensional linear subspace Z ⊆ R^M, a factorization U = Q · R with R upper triangular [32, Section 4.7] yields in the last M − K columns of the orthogonal matrix Q, which is a product of Givens or Householder reflections, an orthonormal basis W = (w₁, . . . , w_{M−K}) parallel to the orthogonal complement Z⊥ ⊆ R^M, which can be used to parameterize Z⊥, or to project data orthogonally on Z⊥.

As in Section 1.4, the space of all K-dimensional affine subspaces of R^{M−1} is homeomorphic to the compact space G(K, R^M) of all K-dimensional linear subspaces of R^M, through their intersections with R^{M−1} × {1}.
The compactness of the Grassmannian parameter space G(K , RM ) guarantees the existence of a minimum for each
continuous objective F on a closed subset D ⊆ G(K , RM ). Thus one of the main conclusions from the present section is
that to guarantee the existence of a minimum for an objective F , it suffices to verify that F is defined and continuous on a
non-empty closed subset D ⊆ G(K , RM ).
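For instance, a minimal numpy sketch of the chart f_{1,...,K} and of the QR-based basis for Z⊥ (illustrative names only):

```python
import numpy as np

def grassmann_chart(B):
    """Chart f_{1,...,K}: stack the K x K identity on top of B in
    R^{(M-K) x K} to obtain a basis of a K-dimensional subspace of R^M."""
    K = B.shape[1]
    return np.vstack([np.eye(K), B])

def orthogonal_complement_basis(U):
    """Given a basis U in R^{M x K} of Z, return an orthonormal basis of
    Z-perp from the last M - K columns of the complete QR factorization."""
    Q, _ = np.linalg.qr(U, mode="complete")
    return Q[:, U.shape[1]:]

B = np.array([[1.0, 2.0], [0.5, -1.0]])   # parameters for a 2-plane in R^4
U = grassmann_chart(B)
W = orthogonal_complement_basis(U)
print(np.round(W.T @ U, 12))              # ~0: W spans the orthogonal complement
```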

2. The existence of hyperspheres minimizing the objective F

This section summarizes results from the literature about the continuity of the Euclidean distance from a point to
a generalized hypersphere in hyperspace. From the continuity of the Euclidean distance relative to the parameters of a
generalized hypersphere, and from a priori bounds on such parameters in terms of the data, which restrict the parameters
to a compact set, follows the existence of a best-fitting generalized hypersphere relative to any objective that is continuous
relative to the residual distances.
A generalized hypersphere within R^M corresponds to a Cartesian equation

$$a \cdot \|x\|_2^2 + 2 \cdot h' \cdot x + c = 0 \tag{2}$$

with coefficients a, c ∈ R and h ∈ R^M that are not all zero. (If a = 0 = c and h = 0, then the solution set is all of R^M.) Multiplying Eq. (2) by any non-zero scalar s ∈ R \ {0} does not change its solution set: the parameters (a, h, c) and (s · a, s · h, s · c) correspond to the same solutions. There are many ways to specify unique parameters. To avoid computational overflows, one way to specify parameters divides (a, h, c) by a coefficient with a larger magnitude, for instance, as follows.

If |a| > max{|c|, ∥h∥₂}/2, then use the parameters (1, z, t) := (1, h/a, c/a).
If |c| > max{|a|, ∥h∥₂}/2, then use the parameters (s, w, 1) := (a/c, h/c, 1).
If ∥h∥₂ > max{|a|, |c|}, then let q := ∥h∥∞ := max{|h₁|, . . . , |h_M|}, where |h_j| = q for some j, and use (α, β, γ) := (a/q, h/q, c/q), where β_j = h_j/q = ±1.

Algorithms on generalized hyperspheres need only operate on these parameters. The preceding cases overlap one another, so an algorithm approaching the boundary of one case finds itself well into the interior of another case, as in Section 1. Section 5.2 revisits computational and geometric aspects of these three cases.
To see why a continuous objective has a minimum on the set of parameters corresponding to generalized hyperspheres, piece the foregoing three cases back together. To this end, introduce an equivalence relation ≡ on the space R^{M+2} \ {(0, 0, 0)} of parameters, so that (a₁, h₁, c₁) ≡ (a₂, h₂, c₂) if and only if there is a non-zero scalar s ∈ R \ {0} such that s · (a₁, h₁, c₁) = (a₂, h₂, c₂). The set [a₁, h₁, c₁] of all parameters (a₂, h₂, c₂) equivalent to (a₁, h₁, c₁) is the equivalence class of (a₁, h₁, c₁). All the parameters in the same equivalence class correspond to the same solution set. The set P(R^{M+2}) of all such equivalence classes is the projective space of R^{M+2}. With parameters σ in the unit ball in R^{M+1}, at least one of the inverse stereographic projections (P_{e_j}^{−1})_{j=1}^{j=M+1} from Section 1 covers a neighborhood of [a, h, c] in P(R^{M+2}). Algorithms may operate on σ instead of (z, t) or (s, w) or (α, β, γ).

If (a₁, h₁, c₁) ≡ (a₂, h₂, c₂), then a₁ = 0 if and only if a₂ = 0. Thus the equation a = 0 is well defined on P(R^{M+2}), and similarly for h = 0 and c = 0.

If a = 0 but h ≠ 0, then Eq. (2) corresponds to a hyperplane in R^M. If a = 0 and h = 0, then c ≠ 0 by the definition of P(R^{M+2}) and Eq. (2) corresponds to the hyperplane at infinity in R^M. If a ≠ 0, then Eq. (2) is equivalent to

$$a^2 \cdot \|x + a^{-1} h\|_2^2 = \|h\|_2^2 - a\,c. \tag{3}$$

Two cases can arise. In the case where ∥h∥₂² − a c < 0, Eq. (3) has no solutions, because its left-hand side is non-negative. In the alternative case, where ∥h∥₂² − a c ≥ 0, Eq. (3) corresponds to a hypersphere with radius r and center z given by

$$r^2 = \frac{\|h\|_2^2 - a\,c}{a^2}, \qquad z = -a^{-1} h.$$

As ratios of homogeneous polynomials of the same degree, r² and each coordinate of z have the same value at (a, h, c) and (s · a, s · h, s · c). Thus they are well-defined functions P(R^{M+2}) → P(R). Extending their co-domains to P(R) allows for a continuous transition to the case where a = 0, as r² or z tends to infinity. The set of parameters G := {[a, h, c] ∈ P(R^{M+2}): ∥h∥₂² − a c ≥ 0} is closed in P(R^{M+2}) and hence compact, and for each [a, h, c] ∈ G the solution set of Eq. (2) is either a hyperplane for a = 0, or a hypersphere for a ≠ 0. Relative to the Euclidean norm on R^M, the signed distance d_x([a, h, c]) from a point x ∈ R^M to the generalized hypersphere is a continuous function of x ∈ R^M and
[a, h, c] ∈ G \ {[0, 0, 1]} [21, p. 583, Eq. (3.3)]:

$$d_x([a, h, c]) :=
\begin{cases}
\dfrac{\|a x + h\|_2 - \sqrt{\|h\|_2^2 - a c}}{a} & \text{for } a \neq 0 \leq \|h\|_2^2 - a c,\\[2ex]
\dfrac{a \cdot \|x\|_2^2 + 2\,h' x + c}{\|a x + h\|_2 + \sqrt{\|h\|_2^2 - a c}} & \text{for } \|a x + h\|_2 + \sqrt{\|h\|_2^2 - a c} > 0,
\end{cases}
\tag{4}$$

where a and $\|a x + h\|_2 + \sqrt{\|h\|_2^2 - a c}$ cannot vanish simultaneously on G \ {[0, 0, 1]}, because $a = 0 = \|a x + h\|_2 + \sqrt{\|h\|_2^2 - a c}$ implies h = 0, whence [a, h, c] = [0, 0, 1]. Moreover, for each point x ∈ R^M, formulas (4) show that for [a, h, c] ∈ G \ {[0, 0, 1]},

$$\lim_{[a,h,c] \to [0,0,1]} d_x([a, h, c]) = \lim_{[s,w,1] \to [0,0,1]} d_x([s, w, 1]) = \infty. \tag{5}$$

For each data set X′ = (x₁, . . . , x_N), define the vector of distances d = (d₁, . . . , d_N) ∈ R^N by d_j := d_{x_j}([a, h, c]) as in formulas (4) for each j ∈ {1, . . . , N}. Let [a⁰, h⁰, c⁰] be parameters for any real hypersphere, for instance, centered at the mean of the data z := (x₁ + · · · + x_N)/N with mean square radius $r^2 := \sum_{j=1}^{N} \|x_j - z\|_2^2 / N$, and let d⁰ be the vector of distances, with d⁰_j := d_{x_j}([a⁰, h⁰, c⁰]). Applying condition (1) to t = d⁰ and to any norm ∥d∥ of the vector of distances d gives ε > 0 and δ > 0 such that inf{F(d): ∥d∥ > δ} ≥ F(d⁰) + ε. By the continuity of each d_{x_j} the set

$$\mathcal{D} := \{[a, h, c] \in G\colon \|d\| \leq \delta\} \subset G \setminus \{[0, 0, 1]\}$$

is closed in G and hence compact. Note that δ, ε, and D depend on F. Also, D contains [a⁰, h⁰, c⁰] and F(d) > F(d⁰) for every [a, h, c] outside of D. Consequently, every objective F that is continuous on G \ {[0, 0, 1]} reaches a minimum on D, which is then also a minimum on G \ {[0, 0, 1]} for any objective F satisfying hypothesis (1).
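For instance, formulas (4) admit a direct numpy transcription (illustrative names; the branch selection exploits the overlap of the two cases, which agree whenever both apply):

```python
import numpy as np

def signed_distance(x, a, h, c):
    """Signed distance from x to the generalized hypersphere
    a*||x||^2 + 2*h'x + c = 0, following formulas (4)."""
    x, h = np.asarray(x, float), np.asarray(h, float)
    disc = h @ h - a * c                       # ||h||^2 - ac, >= 0 on G
    s = np.linalg.norm(a * x + h) + np.sqrt(disc)
    if s > 0:                                  # second branch of (4)
        return (a * (x @ x) + 2 * (h @ x) + c) / s
    return (np.linalg.norm(a * x + h) - np.sqrt(disc)) / a  # first branch

# Unit circle [a, h, c] = [1, 0, -1]: the point (2, 0) lies at distance 1.
print(signed_distance([2.0, 0.0], 1.0, [0.0, 0.0], -1.0))
```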

Example 10. The circle centered at the origin with radius equal to one half satisfies the equation ∥x∥₂² − 1/4 = 0, with parameters [a, h, c] = [1, 0, −1/4]. An open neighborhood of these parameters consists of all parameters [(1, z, t)] such that −∞ < t < 0, corresponding to the circles centered at −z with squared radius r² = ∥z∥₂² − t ≥ −t > 0, with equation ∥x∥₂² + 2 z′x + t = 0. If iterations lead to a value of −t > 0 so large that overflows occur, then an algorithm can switch to the equivalent parameters [a, h, c] = [1/t, z/t, 1] = [s, w, 1], with s < 0 and equation s ∥x∥₂² + 2 w′x + 1 = 0.

Remark 11. From parameters [a_k, h_k, c_k] such that ∥h_k∥₂² − a_k c_k ≥ 0, if an algorithm produces parameters [a_{k+1}, h_{k+1}, c_{k+1}] such that ∥h_{k+1}∥₂² − a_{k+1} c_{k+1} < 0, then it may be necessary to modify the algorithm, for instance, by reducing the step size.

Remark 12. For least-squares, median, or midrange regression, for each center z the optimal radius is the mean, median, or midrange radius from z to the data [23, p. 170, Corollary 2], [21, p. 585, Theorem 4.1], [22, p. 279, Lemma 3.1.1]. Therefore, if there are at least two distinct data points, then the optimal radius is positive. Thus the minimum of the objective occurs in the interior G° of G, where ∥h∥₂² − a c > 0. Yet the compactness of P(R^{M+2}) was used to prove the existence of a minimum.
Appendix B extends the preceding result to the existence of best-fitting sets of any number of concentric hyperspheres
and sets of any number of parallel hyperplanes.

3. The existence of least-squares, median, and midrange circles and cylinders

This section proves that for all data in any Euclidean space there exist Cartesian products of generalized spheres and affine manifolds in any dimension minimizing any continuous generalized least-squares, median, or midrange objective F: R^N → R under condition (1). The proof relies on the continuity of distances and orthogonal projections, on the compactness of moduli spaces of linear manifolds and generalized spheres, and on the compactness of the data.

3.1. Distances from points to cylinders in manifolds

Generalized cylinders include a circle in space, and a pair of lines in space. They may thus lie within a plane, hyperplane,
or other affine submanifold of the same space. Lemma 13 pertains to the Euclidean distance from a point to an object in a
manifold.

Lemma 13. Consider, in a Euclidean space R^J with the Euclidean distance d, a point p ∈ R^J, and an affine manifold H ⊆ R^J containing a subset E ⊆ H. Let p̂ be the orthogonal projection of p on H. Then

$$d(p, E)^2 = d(p, H)^2 + d(\hat{p}, E)^2.$$

Proof. If q ∈ E ⊆ H, then (p − p̂)′ · (p̂ − q) = 0, and the Pythagorean theorem gives

$$d(p, q)^2 = d(p, \hat{p})^2 + d(\hat{p}, q)^2.$$

Both sides reach their minimum (or infimum) at the same (sequence of) points q. □

Lemma 14 pertains to the Euclidean distance from a point to a cylinder.

Lemma 14. Consider, in a Euclidean space R^J with the Euclidean distance d, a point p ∈ R^J, a linear subspace A ⊆ R^J, a subset S ⊆ A⊥ in the orthogonal complement, and the cylinder C := S + A. Let p̂ be the orthogonal projection of p on A⊥. Then

$$d(p, C) = d(\hat{p}, S).$$

Proof. For each w ∈ A, apply Lemma 13 to H_w := A⊥ + w, E_w := S + w ⊆ H_w, and the orthogonal projection p̂_w on H_w of a point p ∈ R^J:

$$d(p, E_w)^2 = d(\hat{p}_w, E_w)^2 + d(p, H_w)^2.$$

If also z ∈ A, then H_w and H_z are parallel to A⊥, so d(p̂_z, E_z) = d(p̂_w, E_w). Moreover, since A ⊕ A⊥ = R^J there exists q ∈ A such that p ∈ A⊥ + q = H_q. Thus

$$d(p, E_w)^2 = d(\hat{p}, S)^2 + d(p, H_w)^2 \geq d(\hat{p}, S)^2 + 0 = d(\hat{p}, S)^2 + d(p, H_q)^2 = d(\hat{p}, S)^2,$$

where the first equality results from setting z := 0 ∈ A, and where d(p, H_q) = 0. □
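For instance, a minimal numpy sketch of Lemma 14 for a circular cylinder in R³ (illustrative names; u is assumed to be a unit vector spanning the axis A, and the center a point of A⊥):

```python
import numpy as np

def distance_to_cylinder(p, u, center, radius):
    """Distance from p to the circular cylinder with axis direction u (unit
    vector) and cross-sectional circle of the given center and radius in
    the plane perpendicular to u; projects first, as in Lemma 14."""
    p, u, center = (np.asarray(v, float) for v in (p, u, center))
    p_hat = p - (p @ u) * u            # orthogonal projection of p on A-perp
    return abs(np.linalg.norm(p_hat - center) - radius)

# Cylinder of radius 1 around the z-axis: the point (2, 0, 5) is at distance 1.
print(distance_to_cylinder([2.0, 0.0, 5.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0], 1.0))
```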

3.2. Parameterizations of sets of generalized cylinders

The next considerations focus on parameterizing sets of generalized cylinders in hyperspace. By way of an introduction,
the following examples show how to parameterize sets of generalized cylinders in the plane and in space. Examples 15 and
16 show how to parameterize sets of circular cylinders that are hypersurfaces.

Example 15. In the two-dimensional Euclidean plane R2 , the common direction A of a pair of parallel lines is a linear
subspace A ⊂ R2 through 0, which can be specified as in Sections 1.1 and 1.3 by a unit vector u ∈ S 1 , or the equivalence
class [u] ∈ P(R2 ). Section 1.2 then derives from u a unit vector v ∈ A⊥ spanning the line A⊥ that supports the cross-section.
The pair of parallel lines can be specified by the signed distances (coordinates) r and s along v from each of the lines to the
origin.

Example 16. In the three-dimensional Euclidean space R3 , the direction of the axis A of a cylinder can be specified as in
Sections 1.1 and 1.3 by a unit vector u ∈ S 2 , or the equivalence class [u] ∈ P(R3 ). Section 1.2 then derives from u an
orthonormal basis W = (v, w) for the normal subspace Π = A⊥ . A circular cross-section in A⊥ can be specified by canonical
or stereographic projections of parameters [a, h, c ] in R2+2−1 = R3 relative to the basis W from Section 2, as in Example 10
and Fig. 2.

Examples 17 and 18 focus on circular cylinders that are not hypersurfaces.

Example 17. For a circle in R3 , the plane P containing the circle, and an orthonormal basis W = (v, w) parallel to P, can
be specified as in Example 9 from Section 1.4 by a unit vector s ∈ S 3 ⊂ R4 or an equivalence class [s] ∈ P(R4 ), and a
QR-factorization [s, e4 ] = Q · R, where Q = (±s, p s + q e4 , v, w). A circle in P can be specified by canonical or stereographic
projections of parameters [a, h, c ] in R2+2−1 = R3 with the basis W = (v, w) from Section 2, as in Example 10 and Fig. 3.

Example 18. To specify a pair of parallel lines in space R3 , the plane P containing the parallel lines and a basis W for P can
be specified as in Example 17. Within P, the parallel lines can be specified as described in Example 15.

Examples 17 and 18 suggest splitting proofs and algorithms for fitting a generalized cylinder C into two (not necessarily
mutually independent) parts. One part may fit the parameters of an affine subspace P ⊆ RL with the smallest dimension
containing C , in space as in Example 17, in general as in Section 1.5. The other part may determine a best-fitting generalized
cylinder C with unit co-dimension (a hypersurface) within P.

Fig. 2. The parameter z ∈ R2 = C through an inverse stereographic projection determines a unit vector u ∈ S 2 ⊂ R3 and also the normal plane
Π = A⊥ = u⊥ . A factorization u = Q · R = (±u, v, w) · R yields an orthonormal basis W = (v, w) for Π . Relative to the basis W , the circle C ⊂ Π has
parameters [a, h, c ] as described in Section 2. The cylinder E has an axis parallel to u with cross-section C ⊂ Π = u⊥ . Algorithms on cylinders in R3 need
operate only on z ∈ C and stereographic projections of [a, h, c ] in R3 .

Fig. 3. A parameter z ∈ R³ through an inverse stereographic projection determines a unit vector s = (s₁, s₂, s₃, s₄)′ ∈ S³ ⊂ R⁴ and also the normal hyperplane Π = s⊥ ⊂ R⁴ and hence the affine plane P := Π ∩ (R³ × {1}) normal to r := (s₁, s₂, s₃)′ with equation s₁x + s₂y + s₃z + s₄ = 0. A factorization (s, e₄) = Q · R = (±s, p s + q e₄, v, w) · R yields an orthonormal basis W = (v, w) for P. Relative to the basis W, the circle C ⊂ P has parameters [a, h, c] as described in Section 2. Algorithms on circles in R³ need operate only on z ∈ R³ and stereographic projections of [a, h, c] in R³.

3.3. The existence of best-fitting generalized cylinders

Theorem 19 proves the existence of such best-fitting generalized cylinders.

Theorem 19. For all non-negative integers K , L, M, and N with 0 < K < L and 1 ≤ M ≤ L − K , let C be the set of all sums S + A
(isomorphic to Cartesian products S × A) of a linear manifold A of dimension K and a generalized sphere S ⊆ A⊥ of dimension
M − 1 in RL . Then for each data sequence X ′ = (x1 , . . . , xN ) in RL , and for each non-negative continuous objective F : RN → R
satisfying condition (1), there exists at least one generalized cylinder C = S + A ∈ C minimizing F .

Proof. The conclusion follows from the continuity of the objective F on a closed subset of a compact Cartesian product of Grassmann spaces. □

To prove the existence of best-fitting generalized cylinders of a fixed dimension K + M − 1, the proof of Theorem 19 uses
the compactness of the space G(M + K , RL+1 ) of all affine subspaces of dimension M + K in RL , each of which might contain
such a best-fitting generalized cylinder. In some cases, however, it may suffice to restrict the search to a smaller subset of
affine subspaces, as in Section 4.

4. Weighted least-squares circles lie in a plane through the weighted mean

This section shows that a weighted least-squares generalized circle in space lies in a plane passing through the weighted
mean of the data; similarly, a weighted least-squares pair of parallel lines in space also lies in a plane passing through the
weighted mean of the data. Consequently, to find weighted least-squares circles or pairs of parallel lines in space, it suffices to
search in planes through the weighted mean of the data. Counter-examples show that a weighted total least-squares circle
need not lie in a weighted total least-squares cylinder, sphere, or plane (which also passes through the weighted mean
but need not have the same normal). Thanks to notation from linear algebra, the same proof holds for all dimensions, and
also holds for subsets other than cylinders, for instance, ellipsoids and polytopes. Thus in R4 a weighted least-squares two-
dimensional sphere (homeomorphic to S 2 ), cylinder, or pair of parallel planes lies in some three-dimensional hyperplane
through the weighted mean of the data.

Theorem 20. Let E be a non-empty set of non-empty closed subsets of R^L. Assume that E contains all translations of all of its members: if E ∈ E and t ∈ R^L, then E + t ∈ E, too. Also assume that the convex hull of each E ∈ E has dimension J: for each E ∈ E there exists one J-dimensional affine subspace P ⊆ R^L such that E ⊆ P.

With any invertible diagonal matrix T ∈ R^{N×N}, define F(d) := ∥T · d∥₂².

Then a best-fitting E ∈ E lies in a J-dimensional affine subspace of R^L that passes through the weighted mean mean_T(X) of the data, defined by 1 = (1, . . . , 1)′ and

$$\operatorname{mean}_T(X) := \frac{\mathbf{1}' \cdot T' \cdot T \cdot X}{\mathbf{1}' \cdot T' \cdot T \cdot \mathbf{1}}.$$

In other words, if E ∈ E lies in a J-dimensional affine subspace P ⊂ R^L that does not pass through the weighted mean mean_T(X) of the data, then there exists t ∈ R^L such that P + t passes through mean_T(X) and E + t fits the data better than E does.

Proof. For each point p ∈ R^L and each circle, sphere, cylinder, or other object E ∈ E, let P be a J-dimensional affine manifold containing E, let p̂ be the orthogonal projection of p on P, and let p̆ be a point on the subset E closest to p. Then, by Lemma 13, d(p, p̆)² = d(p, p̂)² + d(p̂, p̆)². Shifting the affine manifold P by a vector t perpendicular to P shifts p̂ and E by t and hence fixes d(p̂, p̆)². Thus

$$F(d) = \sum_{k=1}^{N} T_{k,k}^2 \cdot d(x_k, \breve{x}_k)^2 = \sum_{k=1}^{N} T_{k,k}^2 \cdot d(x_k, \hat{x}_k)^2 + \sum_{k=1}^{N} T_{k,k}^2 \cdot d(\hat{x}_k, \breve{x}_k)^2,$$

where the second sum is invariant under the shift t, and the first sum has its minimum if and only if P passes through the weighted mean of the data, by Theorem 23. □
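For instance, a minimal numpy sketch of this weighted mean (illustrative names; the rows of X are the data points):

```python
import numpy as np

def weighted_mean(X, T):
    """mean_T(X) = (1' T'T X) / (1' T'T 1) for a diagonal weight matrix T;
    with w = diag(T)**2, this is the ordinary weighted average of the rows."""
    w = np.diag(T) ** 2
    return (w @ X) / w.sum()

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
T = np.diag([1.0, 1.0, 2.0])
print(weighted_mean(X, T))   # [1/3, 8/3]: the heavier third point pulls the mean
```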
In contrast to the case for affine manifolds, however, counter-examples show that a least-squares circle may lie in a plane
that need not be a least-squares plane: the two planes may have different normals. Similarly, a least-squares circle need not
lie on a least-squares cylinder. Likewise, a least-squares circle need not lie on a least-squares sphere.

5. Prototypes of algorithms for the plane and space

Whereas the present article focuses on proofs of the existence of best-fitting generalized cylinders, efficient algorithms
for finding them constitute a topic for further study. Nevertheless, as a proof of concept, Section 6 applies the foregoing
theory to examples with real data. To this end, this section summarizes the relevant computational methods:
1. Choose a positive integer J; for each j ∈ {1, . . . , J} choose a parameter z_j ∈ C and map it to a unit vector u_j such that the unit vectors u₁, . . . , u_J are nearly equally spaced on the surface of the unit sphere or projective space.
2. For each unit vector u_j compute an orthonormal basis W_j = (v_j, w_j) of the normal plane A_j⊥ = u_j⊥.
3. Project the data orthogonally on the normal plane A_j⊥ by means of the basis W_j.
4. Fit the parameters [a_j, h_j, c_j] of a generalized circle C_j to the projected data.
5. Select the parameters [a_m, h_m, c_m] that give the smallest value of the objective.
6. Refine (z_m, [a_m, h_m, c_m]) by any iterative minimization routine (a sketch of steps 1–3 appears after this list).
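For instance, a minimal numpy sketch of steps 1–3 (illustrative names; the golden-angle spiral is one simple way, not necessarily the article's, to space sample directions nearly equally on a hemisphere):

```python
import numpy as np

def candidate_axes(n):
    """Step 1: n roughly equally spaced unit vectors on a hemisphere of S^2,
    generated by a simple golden-angle spiral."""
    k = np.arange(n)
    zc = (k + 0.5) / n                         # heights in (0, 1): one hemisphere
    phi = k * np.pi * (3.0 - np.sqrt(5.0))     # golden-angle increments
    r = np.sqrt(1.0 - zc**2)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), zc])

def project_on_normal_plane(X, u):
    """Steps 2-3: orthonormal basis W of the plane perpendicular to u via a
    complete QR factorization of u, then the planar coordinates of the data."""
    Q, _ = np.linalg.qr(u.reshape(-1, 1), mode="complete")
    return X @ Q[:, 1:]                        # N x 2 projected coordinates

X = np.random.default_rng(0).normal(size=(10, 3))   # placeholder data points
for u in candidate_axes(5):
    Y = project_on_normal_plane(X, u)          # step 4 would fit a circle to Y
```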
For pairs of parallel lines in the plane, and for pairs of parallel planes in space, Appendix B shows that there exists a
partition of the data into two mutually disjoint subsets of data, with each parallel line or plane fitted to each subset of data
but with the same normal as in, for instance, Section 6.2.

5.1. The parameterization and search for axes in projective planes and spaces

For lack of a better algorithm for determining the axis of a best-fitting generalized cylinder, the algorithms suggested
here begin by testing finitely many sample mesh points in a space of parameters for all axes.
In space, the axis of a circular cylinder and the axis of a circle (the line through the center of the circle and perpendicular to
the plane containing the circle) are also one-dimensional subspaces A ⊂ R3 . The set of all such one-dimensional subspaces
can be parameterized by the parameter space of unit vectors u ∈ S 2 ⊂ R3 , or their equivalence classes [u] ∈ P(R3 ),
which can be parameterized by two inverse stereographic projections of the real plane R2 . Thus the parameter space for
A⊥ consists of two copies of a subset of R2 , for instance, the open ball B 2 (0, 2) := {z ∈ R2 : ∥z∥2 < 2}. Consequently, an
algorithm need search through finitely many sample mesh points in a disc B 2 (0, 2) at most twice. Alternatively, in space
R3 , the northern and southern hemispheres, opposite to ∓e3 , can be parameterized by nearly equally spaced grid points
on the surface of the hemisphere [34]. For each sample unit normal vector u, Section 1.2 shows how to get an orthonormal basis W = (v, w) for the two-dimensional orthogonal complement A⊥ by a QR-factorization of u. To derive from u an orthonormal basis W = (v, w) for A⊥ by a factorization u = Q · R = (±u, v, w) · R in a way that depends continuously on z, it may be advantageous to use the three stereographic projections (P_{e_j})_{j=1}^{j=3}, as explained in Section 1.
Either method (grid or stereographic projection) places nearly equally spaced sample points u (sample vectors of
parameters) on the sphere. To select an initial vector of parameters, it suffices to test each sample vector u by fitting to

the orthogonal projections of the data on the perpendicular plane A⊥ the parameters [a, h, c ] of a generalized circle relative
to an orthonormal basis W = (v, w) by the methods described in Section 5.2, and pick one sample vector u that fits the data
best. General purpose optimization algorithms can then refine the values of the parameters z and canonical or stereographic
projections of [a, h, c ].

5.2. The parameterization and search for cross-sections in projective spaces

Chernov and Lesort have conducted an extensive theoretical and computational study of various algorithms for starting and refining the search for optimal parameters [a, h, c] of generalized circles in the plane [19,20], much of which generalizes, mutatis mutandis, to generalized spheres in space, and to generalized hyperspheres in hyperspace. Thus for each axis A, specified by a unit vector u, the generalized circle in A⊥ that best fits the orthogonal projections of the data on the plane A⊥ can be computed by Chernov and Lesort's algorithms [19,20]. They impose the condition that ∥2 h∥₂² − 4 a c = 1 [19, p. 243, Eq. (2.6)]. They point out, as regards guaranteeing a smooth objective, that it is unlikely that the optimal center coincides with a data point: it is indeed impossible [23, p. 173, Lemma 7]. However, they also report divergence if the optimal center z = −a⁻¹h = 0 is at the origin: an algorithm might stall there, for instance by suffering from perturbations leading to 1 + 4 a c < 0, where they restart the algorithm after random shifts of coordinates away from the origin [19, p. 245]. The following considerations attempt to explain such computational singularities for centers near the origin.
If h = 0, then the remaining parameters (a, c) are on, not in an open neighborhood in, the subset of R^{L+2} defined by the constraint ∥2 h∥₂² − 4 a c = 1, where arbitrarily small perturbations of a or c can lead to 1 + 4 a c < 0. Yet a vector of parameters (a, h, c) such that h = 0 and 1 + 4 a c = 0 corresponds to the circle centered at the origin with squared radius r² = −c/a = 1/(4 a²). There are parameterizations of neighborhoods of the parameters of hyperspheres with any positive radius centered at the origin.

If 2 |a| > max{∥h∥₂, |c|} > 0, then use the parameters (1, z, t) := (1, h/a, c/a). The inequalities 4 a² > ∥h∥₂² > a c become 4 > ∥z∥₂² > t. The open set of parameters

$$\Omega := \{(\zeta, \tau) \in \mathbb{R}^{L+1}\colon 8 > \|\zeta\|_2^2 > \tau\}$$

is an open neighborhood of (1, z, t), which allows for small perturbations while keeping the center ζ and squared radius ∥ζ∥₂² − τ away from the boundary. This case allows for all centers ζ such that ∥ζ − 0∥₂² < 8, and arbitrary radii, because −∞ < τ < ∥ζ∥₂².

If 2 |c| > max{|a|, ∥h∥₂} > 0, then use the parameters (s, w, 1) := (a/c, h/c, 1). The inequalities 4 c² > ∥h∥₂² > a c become 4 > ∥w∥₂² > s. The open set

$$\Omega := \{(\omega, \sigma) \in \mathbb{R}^{L+1}\colon 8 > \|\omega\|_2^2 > \sigma\}$$

is an open neighborhood of (s, w, 1), which allows for small but arbitrary perturbations of ω and σ. This case allows for continuous transitions between hyperplanes at a distance ∥c/(2 h)∥₂ > 1/4 from the origin for a = 0 and large spheres for a near 0.

If ∥h∥₂ > max{|a|, |c|} > 0, then ∥h∥₂² > |a| |c|, so ∥h∥₂² − a c > 0 automatically. Let q := ∥h∥∞ := max{|h₁|, . . . , |h_L|} and use the parameters (α, β, γ) := (a/q, h/q, c/q). This case allows for continuous transitions between hyperplanes at a distance ∥c/(2 h)∥₂ < 1/2 from the origin for a = 0 and large spheres for a near 0.
The preceding three cases overlap one another, so an algorithm can switch from one case to another without ever
encountering a boundary.
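For instance, the three overlapping cases translate into a simple normalization routine; a minimal numpy sketch (illustrative names only):

```python
import numpy as np

def normalize_parameters(a, h, c):
    """Rescale (a, h, c) by the coefficient of dominant magnitude, following
    the three overlapping cases of Sections 2 and 5.2; returns a case label
    and rescaled parameters describing the same generalized hypersphere."""
    h = np.asarray(h, dtype=float)
    hn = np.linalg.norm(h)
    if 2 * abs(a) > max(abs(c), hn):
        return "center-radius", (1.0, h / a, c / a)       # (1, z, t)
    if 2 * abs(c) > max(abs(a), hn):
        return "near-origin", (a / c, h / c, 1.0)         # (s, w, 1)
    q = np.max(np.abs(h))                                  # ||h||_inf
    return "near-hyperplane", (a / q, h / q, c / q)        # (alpha, beta, gamma)

print(normalize_parameters(2.0, [0.1, 0.2], -0.3)[0])      # "center-radius"
```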

6. Examples

Section 6.1 shows a circle and a cylinder fitted to the same data in space, while Section 6.2 demonstrates parallel lines
fitted to archaeological data.

6.1. Fitting circles and circular cylinders to data in space

One of the many methods used to study quantitatively the deformation of blood vessels during cardiac motion relies on
various measures of vessel curvature, for instance, by fitting a circle or a cylinder to data points measured along the vessel.

Example 21. Table 1 lists a sample from data supplied by Eric Petersen at Boston Scientific, Three Scimed Place, Maple Grove, MN 55369 (personal communication). After initial searches with methods from Section 5 in each direction on a grid on P(R³), refinements with Matlab's fmins give a circle and a cylinder displayed in Fig. 4, with parameters in Table 2, which
also lists parameters for a fitted line, plane, and sphere, for comparison. In this example, the fitted circle lies neither on the
fitted cylinder nor on the fitted sphere, but it lies nearly in the fitted plane, because it also lies in a plane through the mean of
the data, and both planes have computationally nearly identical normals. Nevertheless, the fitted circle, cylinder, and sphere
have comparable radii and thus also comparable curvatures.

Table 1
Data points in space arranged in the columns of a transposed design matrix X′.

x   62   82   89   93   94   85   65   77   12   48
y  397  347  326  288  266  209  163  187  102  138
z  103  107  112  120  128  149  169  157  198  180

Table 2
Curves and surfaces fitted by the geometric total least-squares method to the data in Table 1.

Curve or surface   Radius R   Center (x, y, z)          Normal (x, y, z)             Fit σ (a)
Circle             244.1      (−140.3, 292.9, 191.2)    (0.2777, 0.2656, 0.9232)     1.620
Cylinder           230.2      (−118.0, 296.0, 213.6)    (0.4500, 0.4601, 0.7654)     0.190
Sphere             244.3      (−142.1, 291.2, 185.3)                                 0.266
Line (b)           ∞          (70.7, 242.3, 142.3)      (−0.9483, 0.2297, 0.2193)    19.826
Plane (c)          ∞          (70.7, 242.3, 142.3)      (0.2778, 0.2656, 0.9232)     1.587

(a) Root mean square distance: square root of the average squared distance to the data points.
(b) Only one normal is listed here; another normal is the normal to the fitted plane, by Theorem 25.
(c) The selected center of the fitted line and plane is the mean of the data, by Theorem 25. At the limit of a circle or sphere as the radius increases to ∞, a projective center can be defined by the normals already listed.

Fig. 4. A circle with its center (·), and a cylinder with its center (+), fitted to the data in Table 1.

6.2. Fitting parallel lines to data in the plane

Alexander Thom and Alexander Strang Thom measured the Cartesian coordinates of Megalithic (3000–1000 B.C.) stones
on both sides of West Kennet Avenue at Avebury, shown in Fig. 5, which they found to be nearly parallel along each section:

From the positions of the stones shown in [the figure] it can safely be assumed that the avenue was intended to
be uniform in width. If we assume further that the six sections [. . .] identified in the figure were intentionally
straight, we may ask whether or not the changes in direction at each corner are consistent with any simple geometric
construction [35, p. 195].

Identifying the direction of a section thus leads to fitting parallel lines to its two sides.

Example 22. Table 3 lists the Cartesian coordinates of the stones along the edges of the fifth section of West Kennet Avenue
at Avebury [35, p. 193, Table 1], with a conjectured resulting accuracy better than 1 foot in x and 0.5 foot in y [35, p. 195].
Thus the variance in x is four times the variance in y, but there is no indication of any correlation between the
measurement errors in x and y. Therefore, to assign squared weights proportional to the reciprocal of the variances within
each data point [36, p. 144], the inner product and the induced distance in the data space RL = R2 may be changed to
⟨p, q⟩ := p1 q1 + 4p2 q2 = p′ · A · q with A = diagonal(1, 4), which amounts to a change of coordinates defined by x′ := x
and y′ := 2 y. However, there is no indication of any correlation of measurement errors between data points, so the weight
matrix T is the identity.

The southwest edge corresponds to the first sequence of points, X₁, with mean ⃗x₁ = (23 038, 1598.6)′, whereas the
northeast edge corresponds to the second sequence of points, X₂, with mean ⃗x₂ = (23 106, 1110.6)′. For the set {X₁, X₂},

Fig. 5. West Kennet Avenue; photograph courtesy Tim Prevett, http://[email protected].

Table 3
Cartesian coordinates of stones along West Kennet Avenue at Avebury [35, p. 193, Table 1].

Fifth section

Pair of stones^a | Stone on southwest edge (northwest sequence X₁), (−x, y) [ft/10] | Stone on northeast edge (southeast sequence X₂), (−x, y) [ft/10]
28               | ⃗x′₁,₁ = (21 470, 1340)  | ⃗x′₂,₁ = (21 520, 835)
29               | ⃗x′₁,₂ = (22 260, 1478)  | ⃗x′₂,₂ = (22 300, 1010)
30               | ⃗x′₁,₃ = (23 040, 1597)  | ⃗x′₂,₃ = (23 130, 1116)
31               | ⃗x′₁,₄ = (23 860, 1728)  | ⃗x′₂,₄ = (23 930, 1230)
32               | ⃗x′₁,₅ = (24 560, 1850)  | ⃗x′₂,₅ = (24 650, 1362)
Means            | ⃗x′₁ = (23 038, 1598.6)  | ⃗x′₂ = (23 106, 1110.6)

^a Numbers assigned by Thom and Thom to identify mutually facing pairs of stones on opposite edges.

the design matrix becomes
$$
X := \begin{pmatrix} X_1' - \mathbf{1} \cdot \vec{x}_1{}' \\ X_2' - \mathbf{1} \cdot \vec{x}_2{}' \end{pmatrix}
  = \begin{pmatrix}
      1522 & 251.4 \\
       822 & 129.4 \\
         2 & -1.6 \\
      -778 & -120.6 \\
     -1568 & -258.6 \\
      1544 & 251.4 \\
       824 & 119.4 \\
        24 & 5.4 \\
      -806 & -100.6 \\
     -1586 & -275.6
    \end{pmatrix}.
$$
Measuring distances in the data space by the inner product with A = diagonal(1, 4) amounts to post-multiplying the design
matrix X on its right by the square root B′ := diagonal(1, 2) before computing the singular-value decomposition of X · B′.
Rounded to sufficiently many digits for comparison, its smallest singular value is σ̂₂ = 74.7, whose square equals the sum of
the squared weighted distances, and it corresponds to the (normal) right-singular vector v̂₂ = (−0.309, 0.951)′. Transforming v̂₂ by
diagonal(1, 2) and normalizing the result gives a normal vector ⃗v₂ = (−0.16023, 0.98708)′ in the data space.
In comparison, with unweighted distances (A = I), the smallest singular value of X is σ₂ = 38.7, whose square equals the sum
of the squared distances, and it corresponds to the (normal) right-singular vector v₂ = (−0.16017, 0.98709)′.
The weighted and unweighted total least-squares pairs of parallel lines are visually nearly indistinguishable at the scale
of Fig. 6.
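For readers who wish to reproduce this computation, here is a minimal numerical sketch (assuming only numpy; variable names are illustrative): it stacks the two centered sequences, applies the weight B′ = diagonal(1, 2), and reads off the smallest singular value and its right-singular vector, which should match σ̂₂ ≈ 74.7 and, up to sign, v̂₂ ≈ (−0.309, 0.951)′.

```python
import numpy as np

# Centered coordinates from the design matrix above, one row per stone.
X = np.array([
    [ 1522.0,  251.4], [  822.0,  129.4], [    2.0,   -1.6],
    [ -778.0, -120.6], [-1568.0, -258.6],
    [ 1544.0,  251.4], [  824.0,  119.4], [   24.0,    5.4],
    [ -806.0, -100.6], [-1586.0, -275.6],
])

B = np.diag([1.0, 2.0])              # square root of A = diagonal(1, 4)
_, S, Vt = np.linalg.svd(X @ B)      # SVD of the weighted design matrix
sigma_hat = S[-1]                    # smallest singular value, about 74.7
v_hat = Vt[-1]                       # about (-0.309, 0.951), up to sign

v = B @ v_hat                        # transform back to the data space
v /= np.linalg.norm(v)               # about (-0.160, 0.987), up to sign
print(sigma_hat, v)
```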
Thom and Thom also conjecture that the cosecant of each angle between consecutive sections is an integer [35, p. 195].
However, the current direction of a section may differ from the original direction, because
[. . .] it has been excavated and the stones re-erected or marked by plinths set up in what were considered to have
been their original positions. It is difficult to know how successful the restorers were in placing the stones and plinths
exactly in their original positions [35, p. 193].

Fig. 6. Pair of parallel lines ( = ) fitted by the total least-squares method to data (•).
Source: From [35, p. 193, Table 1].

Consequently, while the straightness, direction, and uniform width of each section of the avenue can be established
mathematically to some degree, the data do not appear to contain sufficient information to go further and determine any
greatest common divisor of any trigonometric function of the angles between sections [37–40].

7. Conclusions

The present article proves the existence and presents examples of best-fitting single or concentric generalized cylinders,
relative to various distances in the ambient space of the data, and various metrics in the space of residual distances. Efficient
algorithms for finding such best-fitting generalized cylinders constitute a topic for further study.

Acknowledgments

This work was supported in part by a Faculty Research and Creative Works Grant from Eastern Washington University. I
thank the referee for reading through several drafts and for suggestions that led to vast enhancement of the manuscript.

Appendix A. Generalized total least-squares affine manifolds form flags

For use in Section 4, the present section generalizes to all dimensions the results that in space a least-squares line passes
through the least-squares point and lies within a least-squares plane. For a distance d induced by an inner product on the
data space RL , and for an objective F also induced by an inner product on the residual space RN , an optimal affine manifold
(minimizing F ) is called a generalized orthogonal (or total) least-squares, or least-squares for short, affine manifold. This section
shows that such least-squares affine manifolds are nested as a flag of affine subspaces
{p} = V0 ⊂ V1 ⊂ · · · ⊂ VL−1 ⊂ VL = RL ,
in the sense that each least-squares affine manifold VL−K of co-dimension K lies in a least-squares affine manifold of co-
dimension K − 1 if K > 0 and contains a least-squares affine manifold of co-dimension K + 1 if K < L. For example, a
least-squares line in space lies within a least-squares plane and contains a least-squares point.

A.1. Least-squares entities correlated with singular matrices

Some applications of least-squares methods in statistics involve linear combinations of the residuals defined by singular
correlation matrices [41]. Such singular matrices preclude the use of any reversible change of coordinates to and from a
Euclidean space with its usual dot product. Nevertheless, the following considerations show that such potentially singular
correlation matrices define semi-scalar products and semi-norms, with which calculations can proceed almost exactly as
with the usual dot product.
In the data space R^L, because the Euclidean norm is defined in terms of the inner product, and vice versa by the polar
identity ∥p + q∥₂² − ∥p − q∥₂² = 4 p′ · q [24, p. 108, Eq. (3.2-3)], for a hyperplane P = ν⊥, the objective F can also be defined
in terms of the array of signed distances from the data to P, with 1 := (1, . . . , 1)′ ∈ R^N:
$$
d_\pm
  = \begin{pmatrix} d_\pm(x_1, P) \\ \vdots \\ d_\pm(x_N, P) \end{pmatrix}
  = \begin{pmatrix} (x_1 - q)'\,\nu \\ \vdots \\ (x_N - q)'\,\nu \end{pmatrix}
  = \begin{pmatrix} (x_1 - q)' \\ \vdots \\ (x_N - q)' \end{pmatrix} \nu
  = \left[ \begin{pmatrix} x_1' \\ \vdots \\ x_N' \end{pmatrix} - \mathbf{1} \cdot q' \right] \nu.
$$

For unweighted orthogonal least-squares regression, the objective F is the squared Euclidean norm of the array of signed
distances: F (d± ) = ∥d± ∥22 . For weighted least-squares regression, which multiplies each signed distance d± (xj , P ) by a pos-
itive weight factor Dj,j , the objective F becomes F (d± ) = ∥D · d± ∥22 , with the diagonal matrix D = diagonal(D1,1 , . . . , DN ,N ).
For generalized least-squares regression, the objective F can be the squared Euclidean norm of any linear combination of
the array of signed distances with any matrix T , so F (d± ) = ∥T · d± ∥22 . The only requirement on T is that T · 1 ̸= 0, so that
1′ · T ′ · T · 1 ̸= 0 [42, p. 209]. If T is not invertible, then d → T · d is not an invertible change of coordinates. Neverthe-
less, calculations with ⟨d, b⟩T ′ T := d′ · T ′ · T · b, which is called a semi-scalar product [43, p. 250], and ∥d∥2T ′ T := ⟨d, d⟩T ′ T ,
which is the induced semi-norm [43, p. 23], work here as they do with the dot product. Examples of such matrices T appear
in [41].
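To illustrate, here is a small sketch (the matrix T below is a hypothetical rank-deficient example, not taken from [41]) showing that the semi-norm can vanish on non-zero vectors of ker(T) while all the formulas above remain computable:

```python
import numpy as np

# A rank-deficient T (hypothetical, not from [41]) with T @ 1 != 0:
T = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])              # rank 2, hence not invertible
assert np.linalg.norm(T @ np.ones(3)) > 0

def semi_scalar(d, b):                        # <d, b>_{T'T} = d' T' T b
    return d @ T.T @ (T @ b)

def semi_norm(d):                             # the induced semi-norm
    return np.sqrt(semi_scalar(d, d))

# The semi-norm vanishes on nonzero vectors of ker(T), unlike a norm.
d = np.array([1.0, -1.0, 0.0])                # T @ d == 0
assert semi_norm(d) == 0.0
```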
Define the design matrix X as the transposed matrix of data, so that X = (X′)′:
$$
X = (x_1, \ldots, x_N)' \in \mathbb{R}^{N \times L}.
$$
Also, define the generalized mean of the data X′ and of the design matrix X by
$$
\operatorname{mean}_T(X')' := \operatorname{mean}_T(X)
  := \frac{\mathbf{1}' \cdot T' \cdot T \cdot X}{\mathbf{1}' \cdot T' \cdot T \cdot \mathbf{1}}
  \in \mathbb{R}^{1 \times L}.
$$
Define the centered design matrix by subtracting the generalized mean from the data:
$$
\check X := X - \mathbf{1} \cdot \operatorname{mean}_T(X) \in \mathbb{R}^{N \times L}.
$$

Theorem 23 characterizes generalized least-squares hyperplanes.

Theorem 23. The generalized mean of the centered design matrix is zero.
For each point q ∈ R^L, let E be the set of all the hyperplanes through q; a hyperplane P ∈ E minimizes the objective
F(d±) = ∥T · d±∥₂² if and only if P is perpendicular to a right-singular vector for the smallest singular value of T · (X − 1 · q′).
For each unit vector u ∈ S^{L−1} ⊂ R^L, let E be the set of all the hyperplanes perpendicular to u; if T · 1 ≠ 0, then a hyperplane
P ∈ E minimizes F(d±) = ∥T · d±∥₂² if and only if P passes through mean_T(X)′, where F(d±) = ∥T · X̌ · u∥₂².
Therefore, every least-squares hyperplane passes through mean_T(X)′ and is perpendicular to a right-singular vector for the
smallest singular value of T · X̌.
Moreover, for all unit vectors u₁, . . . , u_K ∈ S^{L−1} ⊂ R^L, a set of hyperplanes P₁ ⊥ u₁, . . . , P_K ⊥ u_K with signed distances d±_k
from the data to P_k minimizes the objective Σ_{k=1}^K ∥T · d±_k∥₂² if and only if every P_k passes through mean_T(X)′.

Proof. For the unweighted least-squares case see [44–46]. More generally,
$$
\langle \mathbf{1}, \check X \rangle_{T'T}
  = \mathbf{1}' \cdot T' \cdot T \cdot [X - \mathbf{1} \cdot \operatorname{mean}_T(X)]
  = \mathbf{1}' \cdot T' \cdot T \cdot X - \mathbf{1}' \cdot T' \cdot T \cdot \mathbf{1} \cdot
    \frac{\mathbf{1}' \cdot T' \cdot T \cdot X}{\mathbf{1}' \cdot T' \cdot T \cdot \mathbf{1}}
  = \mathbf{0}',
$$
whence mean_T(X̌) = 0′. Also, for each point q ∈ R^L,
$$
F(d_\pm) = \|T \cdot d_\pm\|_2^2 = \|T \cdot (X - \mathbf{1} \cdot q') \cdot u\|_2^2
$$
reaches a minimum relative to u ∈ S^{L−1} if and only if u is a right-singular vector for the smallest singular value of T · (X − 1 · q′).
Moreover, by the Pythagorean Theorem relative to the semi-scalar product ⟨d, b⟩_{T′T} := d′ · T′ · T · b, for each u ∈ S^{L−1} ⊂ R^L,
$$
\begin{aligned}
\|(X - \mathbf{1} \cdot q') \cdot u\|_{T'T}^2
  &= \|(X - \mathbf{1} \cdot \operatorname{mean}_T(X) + \mathbf{1} \cdot \operatorname{mean}_T(X) - \mathbf{1} \cdot q') \cdot u\|_{T'T}^2 \\
  &= \|\check X \cdot u\|_{T'T}^2 + \|\mathbf{1} \cdot [\operatorname{mean}_T(X) - q'] \cdot u\|_{T'T}^2 \\
  &\ge \|\check X \cdot u\|_{T'T}^2,
\end{aligned}
$$
with equality if and only if [mean_T(X) − q′] · u = 0 (because T · 1 ≠ 0), which states that mean_T(X) − q′ is parallel to P
and hence mean_T(X) and q both lie in P.
Combining the previous two conclusions reveals that an optimal unit normal vector u is a right-singular vector for
the smallest singular value of the matrix T · X̌.
The last conclusion follows by adding the minima of each summand ∥T · d±_k∥₂². □
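As a computational companion to Theorem 23, here is a hedged Python sketch (illustrative names; it assumes T · 1 ≠ 0) that returns the generalized mean, an optimal unit normal, and the smallest singular value of T · X̌:

```python
import numpy as np

def generalized_lsq_hyperplane(X, T):
    """Sketch of Theorem 23 (illustrative names): returns the generalized
    mean, an optimal unit normal, and the smallest singular value of
    T @ X_centered. Assumes T @ 1 != 0; X is N-by-L with one data point
    per row."""
    ones = np.ones(X.shape[0])
    w = T.T @ (T @ ones)                      # the vector T'T 1
    mean_T = (w @ X) / (w @ ones)             # (1'T'T X) / (1'T'T 1)
    X_centered = X - np.outer(ones, mean_T)   # the centered design matrix
    _, S, Vt = np.linalg.svd(T @ X_centered)
    return mean_T, Vt[-1], S[-1]              # point, unit normal, sigma_L

# With T = I this reduces to ordinary orthogonal (total) least squares.
X = np.array([[0.0, 0.1], [1.0, -0.1], [2.0, 0.05], [3.0, -0.05]])
point, normal, sigma = generalized_lsq_hyperplane(X, np.eye(4))
```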

A.2. Relative positions of total least-squares manifolds

In preparation for generalized least-squares affine manifolds, Theorem 24 characterizes the minimum of weighted sums
of squares. Let ej := (0, . . . , 0, 1, 0, . . . , 0) with a single 1, in the jth coordinate, and denote by Span(v1 , . . . , vM ) the linear
subspace of RL spanned by any vectors v1 , . . . , vM ∈ RL .

Theorem 24. On the unit sphere S^{L−1}, for all real σ₁ ≥ · · · ≥ σ_L ≥ 0 the quadratic form P(w) := Σ_{j=1}^L w_j² σ_j² reaches its
minimum value σ_L² at w := e_L.
On a Cartesian product of K unit spheres, with W = (w₁, . . . , w_K) ∈ (S^{L−1})^K, the quadratic form Q(W) := Σ_{k=1}^K P(w_k)
has a minimum where w_k ∈ Span(e_{L+1−K}, . . . , e_L) for every k ∈ {1, . . . , K}.
For orthonormal vectors w₁, . . . , w_K, the minimum value is σ²_{L+1−K} + · · · + σ_L².

Proof. For each unit vector w = (w₁, . . . , w_L) ∈ S^{L−1}, substituting w_L² = 1 − Σ_{j=1}^{L−1} w_j² in P gives
P(w) = Σ_{j=1}^{L−1} w_j² (σ_j² − σ_L²) + 1 · σ_L² ≥ σ_L², with equality for w = e_L, because σ_j² − σ_L² ≥ 0 for every j ∈ {1, . . . , L − 1}.
More generally, substituting Σ_{j=1}^{L−K} w_j² = 1 − Σ_{j=L+1−K}^{L} w_j² into P gives
$$
\begin{aligned}
P(w) &= \sum_{j=L+1-K}^{L} \sigma_j^2 w_j^2 + \sum_{j=1}^{L-K} \sigma_j^2 w_j^2 \\
     &\ge \sum_{j=L+1-K}^{L} \sigma_j^2 w_j^2 + \sigma_{L+1-K}^2 \sum_{j=1}^{L-K} w_j^2 \\
     &= \sum_{j=L+1-K}^{L} \sigma_j^2 w_j^2 + \sigma_{L+1-K}^2 \left( 1 - \sum_{j=L+1-K}^{L} w_j^2 \right).
\end{aligned}
$$
Thus replacing w_{L+1−K} by w†_{L+1−K} := (w²_{L+1−K} + 1 − Σ_{j=L+1−K}^{L} w_j²)^{1/2} and replacing w by w† := (0, . . . , 0, w†_{L+1−K}, w_{L+2−K},
. . . , w_L) gives a vector w† ∈ S^{L−1} ∩ Span(e_{L+1−K}, . . . , e_L) where P(w†) ≤ P(w).

For orthonormal vectors w₁, . . . , w_K in R^K, the matrix W := (w₁, . . . , w_K) ∈ R^{K×K} is orthogonal and thus preserves
the Euclidean norm of vectors and hence also the Frobenius norm of matrices, ∥A∥_F² = Σ_{k=1}^M Σ_{ℓ=1}^N |A_{k,ℓ}|², in the sense
that ∥W · A∥_F = ∥A∥_F [47, Section 2.5.2, pp. 70–71]. On Span(e_{L+1−K}, . . . , e_L), with W and Σ = diagonal(σ_{L+1−K}, . . . , σ_L)
re-indexed, the quadratic form Q becomes
$$
\begin{aligned}
Q(W) &= \sum_{k=1}^{K} P(w_k)
      = \sum_{k=1}^{K} \sum_{j=L+1-K}^{L} \sigma_j^2 (w_k)_j^2
      = \sum_{k=1}^{K} \sum_{j=L+1-K}^{L} [(w_k)_j \, \Sigma_{j,j}]^2 \\
     &= \sum_{k=1}^{K} \sum_{j=L+1-K}^{L} [W'_{k,j} \, \Sigma_{j,j}]^2
      = \|W' \Sigma\|_F^2 = \|\Sigma\|_F^2
      = \sigma_{L+1-K}^2 + \cdots + \sigma_L^2.
\end{aligned}
$$
Thus for orthonormal vectors the minimum value is σ²_{L+1−K} + · · · + σ_L². □
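A quick numerical spot check of Theorem 24, a sketch with randomly generated σ's and a random orthonormal K-frame (all names are illustrative):

```python
import numpy as np

# Spot check of Theorem 24: for orthonormal w_1, ..., w_K the minimum of
# Q is the sum of the K smallest squared singular values.
rng = np.random.default_rng(0)
L, K = 5, 3
sigma = np.sort(rng.uniform(0.0, 10.0, L))[::-1]   # sigma_1 >= ... >= sigma_L

def Q(W):                                  # columns of W are the w_k
    return float(np.sum((sigma[:, None] ** 2) * W ** 2))

W_random, _ = np.linalg.qr(rng.standard_normal((L, K)))  # orthonormal columns
W_optimal = np.eye(L)[:, L - K:]                         # e_{L+1-K}, ..., e_L
assert np.isclose(Q(W_optimal), np.sum(sigma[L - K:] ** 2))
assert Q(W_optimal) <= Q(W_random) + 1e-12
```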

Theorem 25. Consider an affine manifold of co-dimension K as the intersection of K hyperplanes P₁ = s₁⊥, . . . , P_K = s_K⊥.
Denote by d±_k the array of signed distances from the data to P_k. Each least-squares affine manifold of co-dimension K, minimizing
Σ_{k=1}^K ∥T · d±_k∥₂², lies in a least-squares affine manifold of co-dimension K − 1 if K > 0 and contains a least-squares affine
manifold of co-dimension K + 1 if K < L.


Proof. Theorem 23 shows that each least-squares affine manifold of co-dimension K minimizing Σ_{k=1}^K ∥T · d±_k∥₂² passes
through the generalized mean of the data, where Σ_{k=1}^K ∥T · d±_k∥₂² = Σ_{k=1}^K ∥T · X̌ · s_k∥₂².
The singular-value decomposition (SVD) of the weighted centered data T · X̌ has the form T · X̌ = U · Σ · V′ = Σ_{j=1}^L σ_j u_j v_j′,
where U = (u₁, . . . , u_N) ∈ R^{N×N} and V = (v₁, . . . , v_L) ∈ R^{L×L} are orthogonal matrices (with orthonormal columns), and
Σ = diagonal(σ₁, . . . , σ_L) ∈ R^{N×L} is diagonal with σ₁ ≥ · · · ≥ σ_L ≥ 0. Consider K mutually orthonormal vectors w₁, . . . ,
w_K ∈ R^L with coordinates (w_k)_j = w_{j,k} relative to the orthonormal basis V, so that w_k = Σ_{j=1}^L w_{j,k} v_j. Then
$$
\sum_{k=1}^{K} \|(T \cdot \check X) \cdot w_k\|_2^2
  = \sum_{k=1}^{K} \sum_{j=1}^{L} w_{j,k}^2 \, \sigma_j^2
  = \sum_{j=1}^{L} \left( \sum_{k=1}^{K} w_{j,k}^2 \right) \sigma_j^2
  = Q(W).
$$
Theorem 24 shows that the objective Q reaches its minimum value σ_L² + · · · + σ²_{L+1−K} for w_k = v_{L+1−k}. Thus at the
minimum the normals s₁, . . . , s_K may be taken to be the last K right-singular vectors, so the generalized total least-squares
affine manifolds form an orthogonal flag, normal to v_L, v_{L−1}, . . . , v₂, v₁ for the optimal point, normal to v_L, v_{L−1}, . . . , v₂
for the optimal line, . . . , normal to v_L for the optimal hyperplane, and normal to ∅ for R^L. □
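Computationally, Theorem 25 says that a single SVD yields the whole flag at once; the following hedged sketch (illustrative names, same assumptions as before) returns, for each dimension D, a point and an orthonormal basis of directions:

```python
import numpy as np

def least_squares_flag(X, T):
    """Sketch of Theorem 25: one SVD of the weighted centered data yields
    a nested flag of least-squares affine manifolds. The manifold of
    dimension D passes through the generalized mean and is spanned by the
    first D right-singular vectors v_1, ..., v_D."""
    N, L = X.shape
    ones = np.ones(N)
    w = T.T @ (T @ ones)
    mean_T = (w @ X) / (w @ ones)
    _, _, Vt = np.linalg.svd(T @ (X - np.outer(ones, mean_T)))
    # flag[D] = (point, L-by-D basis of directions) for dimension D
    return {D: (mean_T, Vt[:D].T) for D in range(L + 1)}
```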

Theorem 25 fails for objectives that are not induced by inner products, for instance, for median regressions, for which F is
the ℓ¹-norm of the residuals. In this case, a point ⃗x in R^L with the Euclidean norm ∥ ∥₂ is a median point for data ⃗x₁, . . . , ⃗x_N if
and only if ⃗x minimizes F(d) := Σ_{k=1}^N ∥⃗x − ⃗x_k∥₂ = ∥d∥₁. Counter-examples show that a median line need not pass through
a median point.

Appendix B. Many large objects fit data better than a few small objects do

The literature on fitting circles to data in the plane raised the question of whether a singleton can fit data better than
a circle can, and with at least two distinct data points answered it negatively for median circles [18, p. 902, Lemma 1]
and least-squares circles [20, p. 55]. The same negative answer also holds for fitting median [21, p. 585, Theorem 4.2],
midrange [22, p. 279, Lemma 3.1.1], and least-squares [23, p. 170, Corollary 2] hyperspheres to at least two distinct data
points in hyperspace. Here, the corresponding question is whether a point, line, circle, or other curve can fit the data better
than a plane, cylinder, or other surface can. Theorem 27 confirms that the answer is negative for every objective F that
increases with each variable.

Example 26. A cylinder fits data better than a line can: for each data set with at least three non-collinear points x1 , x2 , x3
in the Euclidean space (RK , ∥ ∥2 ), and for every line L ⊂ RK , there exists a cylinder E ⊆ RK such that F (E ) < F (L), for
instance, a cylinder E containing L and xℓ with xℓ ̸∈ L: apply Theorem 27 to the sets C of all lines and E of all cylinders in RK .

The following setup accommodates a variety of such examples. For every collection E of larger objects, for instance,
cylinders, and for every collection C of smaller objects, for instance, points, in the sense that for every C ∈ C there exists
E ∈ E such that C ⊆ E, the question arises of whether a smaller object C ∈ C can fit the data better than any larger object
E ∈ E can. Again for the purposes of informal exposition, the objects in C may be called ‘curves’ while the objects in E may
be called ‘surfaces’ (with quotation marks). As another instance, C can be the set of all singletons and E can be the set of all
spheres in X. Theorem 27 confirms that the answer is negative for every objective F that increases with each variable.

Theorem 27. Suppose that every ‘curve’ C ∈ C is a closed subset of the metric space (X, d). For some integers N > M > 0
assume that for all points p1 , . . . , pM ∈ X there exists a ‘curve’ C ∈ C such that p1 , . . . , pM ∈ C . Also assume that for every
point p ∈ X and every ‘curve’ C ∈ C there exists a ‘surface’ E ∈ E such that p ∈ E and C ⊆ E. Assume further that the data
x1 , . . . , xN are in general position in the sense that there does not exist any ‘curve’ C ∈ C such that x1 , . . . , xN ∈ C .
Then for each ‘curve’ C ∈ C there exists a ‘surface’ E ∈ E for which d(xj , E ) ≤ d(xj , C ) for every j ∈ {1, . . . , N }, with a strict
inequality for at least one such index.
If also F is a strictly increasing function of each variable, then F (E ) < F (C ).

Proof. With data in general position, for each C ∈ C there exists a data point xℓ such that xℓ ̸∈ C . Hence d(xℓ , C ) > 0
because C is a closed set in (X, d). By hypothesis there exists E ∈ E with xℓ ∈ E and C ⊆ E. From C ⊆ E it follows that
d(xj , E ) ≤ d(xj , C ) for every j ∈ {1, . . . , N }, while from xℓ ∈ E it follows that d(xℓ , E ) = 0 < d(xℓ , C ). If F increases strictly
with each variable, then F [d(x1 , E ), . . . , d(xN , E )] < F [d(x1 , C ), . . . , d(xN , C )], which states that F (E ) < F (C ). 

Small objects (‘curves’) can be unions of one or a few closed sets, for instance, single lines, whereas large objects
(‘surfaces’) can consist of the union of many more of the same closed sets, for instance, parallel lines. While Theorem 27
already shows that such ‘surfaces’ fit the data better than such ‘curves’ do, Theorem 28 reveals that the constituent ‘curves’
of such ‘surfaces’ partition the data relative to which ‘curve’ lies closest to which data points. To this end, let |Y | denote the
cardinality of a set Y .

Theorem 28. Assume that F is a strictly increasing function of each variable. Let A be a set of closed subsets of a metric space
(X, d) such that for each point p ∈ X there exists an object A ∈ A with p ∈ A. For all K , M ∈ N such that 1 ≤ K < M, let
C be the set of unions of K not necessarily distinct objects from A, and let E be the set of unions of M not necessarily distinct
objects from A. Then for each data set in general position, in the sense that no ‘curves’ in C contain all the data points, there exists
a ‘surface’ E ∈ E such that F (E ) < F (C ) for every C ∈ C .
Also, for each E = A₁ ∪ · · · ∪ A_M ∈ E there exists a partition of the data X′ = X′_1 ∪ · · · ∪ X′_M such that d(x_j, A_k) ≤ d(x_j, A_ℓ)
for all k, ℓ ∈ {1, . . . , M} and every x_j ∈ X′_k. Therefore, if F_Y is any weighted p-norm on R^{|Y|}, then E minimizes F_{X′} if and only if
each A_k minimizes F_{X′_k} over all such partitions of the data X′.

Proof. Theorem 27 yields the first conclusion.
The second conclusion holds because for each 'surface' E = A₁ ∪ · · · ∪ A_M ∈ E there exists a partition of the data
X′ = X′_1 ∪ · · · ∪ X′_M such that d(x, E) = d(x, A_k) for every x ∈ X′_k. □

Theorem 28 thus shows that to fit a union of M ‘curves’ to data it suffices to consider all partitions of the data into M
subsets and fit one ‘curve’ to each subset.
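For instance, here is a brute-force sketch of that reduction for unions of M parallel lines in the plane, using the common-normal fit of Example 29 below for each candidate partition (exponential in N, so only for small illustrative data sets; names are illustrative):

```python
import numpy as np
from itertools import product

def fit_union_of_parallel_lines(X, M):
    """Brute-force sketch: try every assignment of the N data points (rows
    of X) to the M lines, fit a common normal to the stacked per-subset
    centered data (as in Example 29 below), and keep the best assignment."""
    N = X.shape[0]
    best = (np.inf, None, None)
    for labels in product(range(M), repeat=N):
        labels = np.array(labels)
        if len(np.unique(labels)) < M:
            continue                      # skip assignments with empty lines
        stacked = np.vstack([X[labels == k] - X[labels == k].mean(axis=0)
                             for k in range(M)])
        _, S, Vt = np.linalg.svd(stacked)
        if S[-1] < best[0]:               # S[-1]**2 = sum of squared distances
            best = (S[-1], labels, Vt[-1])
    return best                           # residual, partition, common normal
```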

Example 29. For each non-zero vector ν ∈ R^L let C be the set of hyperplanes normal to ν, and let E be the set of unions of
two parallel or identical hyperplanes normal to ν. Then for each data set with at least L + 1 points in general position, there
is a partition X′ = X′_1 ∪ X′_2 such that an optimal union of hyperplanes H = H₁ ∪ H₂ ∈ E consists, for each k ∈ {1, 2},
of an optimal hyperplane H_k for the subset X′_k. In particular, for orthogonal least-squares approaches, Appendix A shows
that each hyperplane H_k passes through the centroid of the subset X′_k. Consequently, the optimal normal direction is the
right-singular vector ν = v_L for the smallest singular value σ_L of the matrix (X̌′_1, X̌′_2)′, where ′ denotes transposition,
and each X̌′_k is X′_k centered around its own mean.

Example 30. Some applications require cylindrical or spherical shells bounded by concentric cylinders or spheres [1–6,8].
To this end, write the equation of a generalized hypersphere in the form a · ∥x∥₂² + b′ · x + c = 0. For each projective center
z = −b/(2 a) let C be the set of generalized hyperspheres centered at z, and let E be the set of unions of pairs of concentric
or identical generalized hyperspheres centered at z. Then for each data set with at least L + 1 points in general position,
there is a partition X′ = X′_1 ∪ X′_2 such that an optimal pair of generalized hyperspheres S = S₁ ∪ S₂ ∈ E consists, for each
k ∈ {1, 2}, of an optimal generalized hypersphere S_k for the subset X′_k.
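For a fixed center z, the distance from x_j to a hypersphere centered at z with radius R is |r_j − R| with r_j = ∥x_j − z∥₂, so for the least-squares objective the optimal partition reduces to a one-dimensional two-means problem on the sorted distances; a hedged sketch (illustrative names):

```python
import numpy as np

def two_concentric_radii(X, z):
    """Sketch for the concentric case with a fixed center z: the distance
    from x_j to a hypersphere centered at z with radius R is |r_j - R|, so
    least-squares fitting of two concentric hyperspheres is a 1-D 2-means
    problem on the sorted distances r_j (optimal clusters are contiguous)."""
    r = np.sort(np.linalg.norm(X - z, axis=1))
    best = (np.inf, None)
    for m in range(1, len(r)):            # split r into r[:m] and r[m:]
        res = np.sum((r[:m] - r[:m].mean()) ** 2) \
            + np.sum((r[m:] - r[m:].mean()) ** 2)
        if res < best[0]:
            best = (res, (r[:m].mean(), r[m:].mean()))
    return best                           # residual, (smaller, larger radius)
```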

References

[1] Pankaj K. Agarwal, Boris Aronov, Sariel Har-Peled, Micha Sharir, Approximation and exact algorithms for minimum-width annuli and shells, Discrete
and Computational Geometry 24 (4) (2000) 687–705.
[2] Timothy M. Chan, Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus, in: SCG’00: Proceedings of the
Sixteenth Annual Symposium on Computational Geometry, SCG’00, ACM, New York, NY, USA, 2000, pp. 300–309.
[3] Olivier Devillers, Franco P. Preparata, Evaluating the cylindricity of a nominally cylindrical point set, in: Proceedings of the Eleventh Annual ACM–SIAM
Symposium on Discrete Algorithms, SODA’00, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000, pp. 518–527.
[4] Zvi Drezner, Stefan Steiner, George O. Wesolowsky, On the circle closest to a set of points, Computers & Operations Research 29 (6) (2002) 637–650.
[5] Christian A. Duncan, Michael T. Goodrich, Edgar A. Ramos, Efficient approximation and optimization algorithms for computational metrology,
in: Michael Saks (Ed.), Proceedings of the Eighth Annual ACM–SIAM Symposium on Discrete Algorithms, SODA’97, Society for Industrial and Applied
Mathematics, Philadelphia, PA, USA, 1997, pp. 121–130.
[6] Jesús García-López, Pedro A. Ramos, Fitting a set of points by a circle, in: Jean Daniel Boissonnat (Ed.), Proceedings of the Thirteenth Annual Symposium
on Computational Geometry, SCG’97, ACM, New York, NY, USA, 1997, pp. 139–146.
[7] David L. Powers, Finding a best approximate circle, The UMAP Journal 13 (2) (1992) 101–112.
[8] T.J. Rivlin, Approximation by circles, Computing (ISSN: 0010-485X) 21 (2) (1978–1979) 93–104. http://dx.doi.org/10.1007/BF02253130.
[9] C.S. Fraser, J. Shao, Scale-space methods for image feature modeling in vision metrology, Photogrammetric Engineering and Remote Sensing 64 (4)
(1998) 323–328.
[10] Uzi Ethrog, Measuring the deformation of storage tanks, Photogrammetria 40 (3) (1986) 299–310.
[11] H. Papo, B. Shmutter, Tank calibration by stereophotogrammetry, Photogrammetria 34 (3) (1978) 101–109.
[12] W. Faig, Close-range precision photogrammetry for industrial purposes, Photogrammetria 36 (5) (1981) 183–191.
[13] J.D. Siegwarth, J.F. LaBreque, C.L. Carrol, Volume uncertainty of a large tank calibrated by photogrammetry, Photogrammetric Engineering and Remote
Sensing 50 (8) (1984) 1127–1134.
[14] Uzi Ethrog, Metric information from nonmetric photographs of circular or cylindrical objects, Photogrammetria 42 (4) (1988) 163–176.
[15] R.M. Feltham, Determining cylindrical parameters, The Photogrammetric Record 13 (75) (1990) 407–414.
[16] E. Vozikis, Engineering applications for the WILD NTP concept, The Photogrammetric Record 12 (69) (1987) 307–321.
[17] Jack Brimberg, Henrik Juel, Anita Schöbel, Locating a minisum circle on a sphere, Operations Research 55 (4) (2007) 782–791.
[18] Jack Brimberg, Henrik Juel, Anita Schöbel, Locating a minisum circle in the plane, Discrete Applied Mathematics 157 (2009) 901–912.
[19] N. Chernov, C. Lesort, Least squares fitting of circles, Journal of Mathematical Imaging and Vision 23 (2005) 239–252.
[20] Nikolai Chernov, Circular and Linear Regression: Fitting Circles and Lines by Least Squares, in: Monographs on Statistics and Applied Probability,
vol. 117, CRC Press, Boca Raton, FL, 2011.
[21] Yves Nievergelt, Median spheres: theory, algorithms, applications, Numerische Mathematik 114 (4) (2010) 573–606.
[22] Yves Nievergelt, A finite algorithm to fit geometrically all midrange lines, circles, planes, spheres, hyperplanes, and hyperspheres, Numerische
Mathematik 91 (2) (2002) 257–303.
[23] Yves Nievergelt, Perturbation analysis for circles, spheres, and generalized hyperspheres fitted to data by geometric total least-squares, Mathematics
of Computation 73 (245) (2004) 169–180.
[24] Angus Ellis Taylor, Introduction to Functional Analysis, Wiley, New York, NY, 1958.
[25] James Dugundji, Topology, Allyn and Bacon, Boston, MA, 1966.
[26] Hans Schwerdtfeger, Geometry of Complex Numbers, Dover, New York, NY, 1979.
[27] David E. Blair, Inversion Theory and Conformal Mapping, American Mathematical Society, Providence, RI, 2000.
[28] Gene H. Golub, Charles F. Van Loan, An analysis of the total least squares problem, SIAM Journal on Numerical Analysis 17 (6) (1980) 883–893.
[29] Raoul Bott, John Milnor, On the parallelizability of the spheres, Bulletin of the American Mathematical Society 64 (1958) 87–89.
[30] James R. Munkres, Elements of Algebraic Topology, Benjamin/Cummings, Menlo Park, CA, 1984.
[31] James W. Vick, Homology Theory, Academic Press, New York, NY, 1973.
[32] Josef Stoer, Roland Bulirsch, Introduction to Numerical Analysis, third ed., Springer-Verlag, New York, NY, 2002.
[33] Phillip A. Griffiths, Joseph Harris, Principles of Algebraic Geometry, Wiley, New York, NY, 1978.
[34] Donna A. Calhoun, Christiane Helzel, Randall J. Leveque, Logically rectangular grids and finite volume methods for PDEs in circular and spherical
domains, SIAM Review 50 (4) (2008) 723–752.
[35] Alexander Thom, Alexander Strang Thom, Avebury (2): The West Kennet Avenue, Journal for the History of Astronomy 7 (1976) 193–197.
[36] Gilbert Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, MA, 1986.
[37] P.R. Freeman, Thom’s survey of the Avebury ring, Journal for the History of Astronomy 8 (1977) 134–136.
[38] P.R. Freeman, A Bayesian analysis of the megalithic yard, Journal of the Royal Statistical Society, Series A 139 (1976) 20–55.
[39] J. Gates, Testing for circularity of spatially located objects, Journal of Applied Statistics 20 (1) (1993) 95–103.
[40] J. Gates, Distance mean and variance functions and sphere fitting, Statistics 25 (3) (1994) 251–266.
[41] George Zyskind, Frank B. Martin, On best linear estimation and a general Gauss–Markov theorem in linear models with arbitrary nonnegative
covariance structure, SIAM Journal on Applied Mathematics 17 (6) (1969) 1190–1202.

[42] Dennis Leech, Keith Cowling, Generalized regression estimation from grouped observations: a generalization and an application to the relationship
between diet and mortality, Journal of the Royal Statistical Society 145 (2) (1982) 208–223.
[43] Kôsaku Yosida, Functional Analysis, fourth ed., in: Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen, vol. 123, Springer-
Verlag, New York, NY, 1974.
[44] Leonhard Euler, Theoria Motus Corporum Solidorum Seu Rigidorum: EX Primis Nostrae Cognitionis Principiis Stabilita et ad Omnes Motus, qui in
Huiusmodi Corpora Cadere Possunt, Accommodata, A.F. Roese, Rostock and Greifswald, MDCCLXV, 1765. Available on the Euler Archives http://www.
math.dartmouth.edu/∼euler/pages/E289.html. Translated into English by Ian Bruce http://www.17centurymaths.com/contents/mechanica3.html.
[45] Leonhard Euler, Charles Blanc (Eds.), Theoria Motus Corporum Solidorum Seu Rigidorum EX Primis Nostrae Cognitionis Principiis Stabilita et ad Omnes
Motus qui in Huiusmodi Corpora Cadere Possunt Accomodata 1st Part, in: Opera Omnia, vols. 2–3, Birkhäuser, Basel, CH, 1948.
[46] Karl Pearson, On lines and planes of closest fit to systems of points in space, Philosophical Magazine Series 6 2 (11) (1901) 559–572.
[47] Gene H. Golub, Charles F. Van Loan, Matrix Computations, second ed., Johns Hopkins University Press, Baltimore, MD, 1989.
