
Fast Algorithms for L∞ Problems in Multiview Geometry

Sameer Agarwal Noah Snavely Steven M. Seitz


[email protected] [email protected] [email protected]
University of Washington, Seattle

Abstract

Many problems in multi-view geometry, when posed as minimization of the maximum reprojection error across observations, can be solved optimally in polynomial time. We show that these problems are instances of a convex-concave generalized fractional program. We survey the major solution methods for solving problems of this form and present them in a unified framework centered around a single parametric optimization problem. We propose two new algorithms and show that the algorithm proposed by Olsson et al. [21] is a special case of a classical algorithm for generalized fractional programming. The performance of all the algorithms is compared on a variety of datasets, and the algorithm proposed by Gugat [12] stands out as a clear winner. An open source MATLAB toolbox that implements all the algorithms presented here is made available.

1. Introduction

As the theory of multi-view geometry has matured, the focus of research has recently shifted from the study of the geometric and algebraic structure of the problem to the numerical solution of the resulting optimization problems. A particularly fruitful line of work has been the development of methods that minimize the maximum reprojection error across observations (the L∞ norm of the vector of reprojection errors) instead of the more commonly used sum of squared reprojection errors. The advantage of this approach is that in many cases the resulting optimization problem has structure that is amenable to global optimization. In particular these optimization problems turn out to be quasi-convex [13, 15, 16], enabling efficient, globally optimal solutions using the methods of convex optimization [5]. A wide range of multi-view geometry problems have been solved in the L∞ framework, including triangulation, camera resectioning, homography estimation, structure and translation with known rotations, reconstruction using a reference plane, camera motion estimation and outlier removal [13, 15–17, 24, 25].

In all of these works, the method used for solving the L∞ optimization problem is a bisection search for the mini-max reprojection error. While this approach may be reasonable for small problems like triangulation and camera resectioning, the bisection algorithm is very slow for large scale problems like structure and translation estimation with known rotations, where the number of variables can be in the hundreds of thousands for large problems [18].

The objective of this paper is to present fast algorithms for the solution of large scale L∞ problems. We first show that L∞ problems in multi-view geometry are convex-concave generalized fractional programs (Section 2). Like the L∞ problem, generalized fractional programs are also quasi-convex and can be solved using the bisection algorithm. However, unlike a generic quasi-convex program they have specific structure that can be exploited to build algorithms which are significantly faster than the bisection algorithm. We then introduce the parametric optimization problem that lies at the heart of a number of methods for solving generalized fractional programs (Section 3). We survey the major methods for solving generalized fractional programs and present them in a unified framework centered around this parametric optimization problem (Sections 4-7). Along the way, we propose two new algorithms for solving L∞ problems (Section 4) and show that a recently proposed algorithm by Olsson et al. [21] for L∞ optimization is a special case of a classical algorithm for generalized fractional programming (Section 5). We then compare the performance of the various algorithms on a variety of large scale data sets and show that an algorithm proposed by Gugat [12] stands out as a clear winner (Section 8). Last but not least, we make available an open source MATLAB toolbox for doing large scale L∞ optimization. The toolbox includes all the code used to perform the experiments reported in this paper.

We now summarize the notational conventions used in the rest of the paper. Upper case letters, e.g., Pi, denote matrices, lower case Roman and Greek letters, e.g., a, γ, denote scalars, and bold-faced letters, e.g., x, λ, denote column vectors. 0 and 1 denote vectors of all zeros and ones respectively. Superscripted symbols, e.g., x^k, indicate iterates of an algorithm, and the superscript ∗, e.g., γ∗, denotes an optimal solution. For two vectors x = [x1, . . . , xn] and y = [y1, . . . , yn], x ⪯ y is used to indicate xi ≤ yi, ∀i = 1, . . . , n. Finally, given scalar functions fi(x), i = 1, . . . , m, f(x) = [f1(x), . . . , fm(x)].

2. The L∞ problem

We begin with a brief review of the L∞ problem in multi-view geometry and its relation to generalized fractional programming. We use the triangulation problem as an example.

Given camera matrices Pi = [Ri | ti], i = 1, . . . , m, where Ri = [r_{i1}, r_{i2}, r_{i3}]^⊤ and ti = [t_{i1}, t_{i2}, t_{i3}]^⊤, and the corresponding images [ui, vi] of a point x ∈ R^3, we wish to find that value of x which minimizes the maximum reprojection error across all images:

$$\min_{\mathbf{x}} \max_{i=1,\ldots,m} \left\| \left[\, u_i - \frac{r_{i1}^\top \mathbf{x} + t_{i1}}{r_{i3}^\top \mathbf{x} + t_{i3}},\;\; v_i - \frac{r_{i2}^\top \mathbf{x} + t_{i2}}{r_{i3}^\top \mathbf{x} + t_{i3}} \,\right] \right\|
\quad \text{subject to } r_{i3}^\top \mathbf{x} + t_{i3} > 0,\; \forall\, i = 1, \ldots, m.$$

The constraint r_{i3}^⊤ x + t_{i3} > 0 ensures that the point x lies in front of each camera, and making use of it, the above problem can be re-written as a general problem of the form

$$\min_{\mathbf{x}} \max_{i=1,\ldots,m} \frac{\left\| \left[\, a_{i1}^\top \mathbf{x} + b_{i1},\; a_{i2}^\top \mathbf{x} + b_{i2} \,\right] \right\|}{a_{i3}^\top \mathbf{x} + b_{i3}}
\quad \text{subject to } C\mathbf{x} \preceq \mathbf{d},$$

where the constants a_{ij}, b_{ij}, C and d are appropriately defined.
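To make the reduction concrete, the following sketch builds the coefficients a_{i1}, a_{i2}, a_{i3} and b_{i1}, b_{i2}, b_{i3} for a single camera and observation. It is a minimal illustration in Python/NumPy (not part of the paper's MATLAB toolbox); the names R, t, u, v are hypothetical stand-ins for one camera Pi = [Ri | ti] and its observation [ui, vi].

```python
import numpy as np

def fractional_coefficients(R, t, u, v):
    """For one camera P = [R | t] and observation (u, v) of a 3D point x,
    return (A, b) with rows (a1, b1), (a2, b2), (a3, b3) such that the
    reprojection error is ||[a1.x + b1, a2.x + b2]|| / (a3.x + b3)."""
    r1, r2, r3 = R                      # rows of the rotation matrix Ri
    t1, t2, t3 = t
    # u - (r1.x + t1)/(r3.x + t3) = ((u*r3 - r1).x + (u*t3 - t1)) / (r3.x + t3)
    a1, b1 = u * r3 - r1, u * t3 - t1
    a2, b2 = v * r3 - r2, v * t3 - t2
    a3, b3 = r3, t3                     # cheirality: a3.x + b3 > 0
    return np.vstack([a1, a2, a3]), np.array([b1, b2, b3])

# Toy example: axis-aligned camera translated along the optical axis.
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])
A, b = fractional_coefficients(R, t, u=0.1, v=-0.2)
x = np.array([0.05, -0.1, 2.0])
f = np.linalg.norm(A[:2] @ x + b[:2])   # convex numerator f_i(x)
g = A[2] @ x + b[2]                     # affine denominator g_i(x)
print("reprojection error:", f / g)
```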
There is flexibility in the choice of the norm ‖·‖. The L2-norm leads to the formulation considered by Kahl [15] and Ke & Kanade [16], and the L1-norm leads to the formulation considered by Seo & Hartley [23]. In both of these cases, each fraction in the objective function is of the form fi(x)/gi(x), where fi(x) = ‖[a_{i1}^⊤ x + b_{i1}, a_{i2}^⊤ x + b_{i2}]‖ is a convex function and gi(x) = a_{i3}^⊤ x + b_{i3} is concave, in particular gi(x) is affine. For the remainder of this paper we will not differentiate between the two norms, and consider the generic optimization problem

$$\min_{\mathbf{x} \in \mathcal{X}} \max_{i=1,\ldots,m} \frac{f_i(\mathbf{x})}{g_i(\mathbf{x})} \qquad (P)$$

where X = {x | Cx ⪯ d} is the convex polyhedral feasible set. Compactness of the feasible set is a common requirement for the convergence analysis of optimization algorithms. For L∞ problems, the set X is usually not compact. This is however not a significant hurdle. A closed bounded set is a compact set. The feasible set X is closed by definition, and it is always possible to enforce compactness by adding a constraint ‖x‖∞ ≤ M to X for some large constant M without affecting the solution to the original problem. Therefore, without loss of generality, we assume that X is a compact convex polyhedral set with a non-empty interior.

2.1. Generalized Fractional Programming

A non-linear optimization problem is a generalized fractional program if it can be written as

$$\min_{\mathbf{x} \in \mathcal{X}} \max_{i=1,\ldots,m} \frac{f_i(\mathbf{x})}{g_i(\mathbf{x})} \qquad (GFP)$$

where X is a nonempty subset of R^n, fi(x) and gi(x) are continuous on X, and gi(x) are positive on X [10]. Further, if we assume

1. X is a convex set,
2. ∀i, fi(x) is convex and gi(x) is concave, and
3. ∀i, either fi(x) are non-negative or the functions gi(x) are affine,

then GFP is a convex-concave generalized fractional program. P is therefore a convex-concave generalized fractional program. From here on, the phrase generalized fractional program will always refer to convex-concave generalized fractional programs.

As was shown in [15] and [16], P is quasiconvex. In fact, all generalized fractional programs are quasiconvex. The proof is as follows:

By definition, a function h(x) is quasi-convex if its domain is convex and for all γ, the sublevel sets Sγ = {x | h(x) ≤ γ} are convex [5]. The domain of the objective function h(x) = max_i fi(x)/gi(x) is the convex set X and its γ-sublevel set is given by

$$\begin{aligned}
S_\gamma &= \{\mathbf{x} \in \mathcal{X} \mid h(\mathbf{x}) \le \gamma\} \\
&= \{\mathbf{x} \in \mathcal{X} \mid \max_i f_i(\mathbf{x})/g_i(\mathbf{x}) \le \gamma\} \\
&= \{\mathbf{x} \in \mathcal{X} \mid f_i(\mathbf{x})/g_i(\mathbf{x}) \le \gamma,\; \forall i = 1, \ldots, m\} \\
&= \{\mathbf{x} \in \mathcal{X} \mid \mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) \preceq \mathbf{0}\}
\end{aligned}$$

If f(x) is non-negative, then Sγ is empty for γ < 0; otherwise g(x) is affine and f(x) − γg(x) is a convex function for all γ ∈ R. Thus, depending on the value of γ, the set Sγ is either an intersection of a set of convex sets, or else Sγ is empty; in either case it is a convex set.

2.2. The Bisection Algorithm

Given initial bounds on the optimal value γ∗, we can perform a bisection search to find the minimum value of γ for which Sγ is non-empty, at each iteration solving an instance of the following feasibility problem (Algorithm 1).

$$\text{Find } \mathbf{x} \quad \text{subject to } \mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) \preceq \mathbf{0},\; \mathbf{x} \in \mathcal{X} \qquad (P_\gamma)$$

If Sγ is non-empty, the optimization algorithm will return some x ∈ Sγ as output, otherwise it will report that the problem is infeasible. For the L2 norm this reduces to a Second Order Cone Program (SOCP), and for the L1 norm it reduces to a Linear Program (LP). In both cases, efficient polynomial time methods exist for solving the resulting optimization problem [5]. This is the standard method for solving generic quasi-convex optimization problems [5] and the algorithm suggested by Kahl [15] and Ke & Kanade [16].

Algorithm 1 Bisection Algorithm
Require: Initial interval [l^1, u^1] s.t. l^1 ≤ γ∗ ≤ u^1.
1: loop
2:   γ^k = (l^k + u^k)/2, solve P_{γ^k} to get x^k
3:   if feasible then
4:     x∗ = x^k, u^{k+1} = max_i fi(x^k)/gi(x^k), l^{k+1} = l^k
5:   else
6:     l^{k+1} = γ^k, u^{k+1} = u^k
7:   end if
8:   if u^{k+1} − l^{k+1} ≤ ε1 then
9:     return (x∗, u^{k+1})
10:  end if
11: end loop
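The feasibility test Pγ is straightforward to pose with an off-the-shelf modeling layer. The sketch below checks whether Sγ is empty for the L1-norm formulation (an LP) and wraps the test in a plain bisection driver; it is a minimal sketch assuming cvxpy and the hypothetical coefficient arrays A_list, b_list built as in the previous example, not the paper's toolbox code. (Algorithm 1 additionally tightens the upper bound to max_i fi(x^k)/gi(x^k) whenever a feasible point is found.)

```python
import cvxpy as cp

def is_feasible(A_list, b_list, gamma):
    """Feasibility problem P_gamma (L1 norm): is there an x with
    f_i(x) - gamma * g_i(x) <= 0 for every observation i?"""
    x = cp.Variable(3)
    cons = []
    for A, b in zip(A_list, b_list):
        f_i = cp.norm1(A[:2] @ x + b[:2])       # convex numerator
        g_i = A[2] @ x + b[2]                   # affine denominator
        cons += [f_i - gamma * g_i <= 0, g_i >= 1e-9]  # small margin stands in for g_i > 0
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    feasible = prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)
    return feasible, x.value

def bisect(A_list, b_list, lo=0.0, hi=100.0, tol=0.01):
    """Plain bisection search for the minimal feasible gamma."""
    x_best = None
    while hi - lo > tol:
        gamma = 0.5 * (lo + hi)
        ok, x = is_feasible(A_list, b_list, gamma)
        if ok:
            hi, x_best = gamma, x
        else:
            lo = gamma
    return x_best, hi
```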
Consider the sequence γ^k, which converges to γ∗ in the limit. If, for some α and c < 1, the following limit exists

$$\lim_{k \to \infty} \frac{|\gamma^{k+1} - \gamma^*|}{|\gamma^k - \gamma^*|^\alpha} = c$$

then the sequence γ^k is said to have an order of convergence α. Sequences for which α = 1 are said to converge linearly. In general, sequences with higher orders of convergence converge faster than sequences with lower orders. In many cases of interest, the exact order of convergence is hard to prove and we have to satisfy ourselves with lower bounds on the order of convergence. For example, if

$$\lim_{k \to \infty} \frac{|\gamma^{k+1} - \gamma^*|}{|\gamma^k - \gamma^*|} = 0,$$

it implies that α > 1 and the sequence is said to be superlinearly convergent.

While simple to implement and analyze, the bisection algorithm suffers from two major shortcomings. First, in each iteration, the bisection algorithm reduces the search space by half, hence α = 1 and c = 1/2. Thus the bisection algorithm converges linearly. Second, effort spent on searching for a feasible point when the set Sγ is empty is wasted, i.e., it tells us nothing about the solution beyond the fact that the optimal mini-max reprojection error is greater than γ.

2.3. Related Work in Computer Vision

The complexity of the L∞ optimization problem is a function of the number of observations. One interesting feature of the L∞ problem is that only a small subset of the observations actually constrain the solution, i.e., if we remove all observations not in the support set of the optimum, this reduced problem would have the same solution [25]. Using this observation, Seo & Hartley propose an iterative algorithm that solves a series of L∞ problems to construct a subset of the observations which is guaranteed to contain the support set of the optimal solution [23]. The hope is that each intermediate L∞ problem is small enough to be solved quickly and that the total effort is less than what is needed to solve the full problem. In our experience, the performance of this algorithm depends crucially on the distribution of reprojection errors, and for distributions with thick tails the performance can be quite poor.

Olsson et al. exploited the pseudo-convexity of the reprojection error to construct an interior point method that numerically solves the Karush-Kuhn-Tucker equations [21]. This method was found to be numerically unstable with slow or premature convergence. They also proposed a second method based on solving a series of SOCPs, which had good empirical performance. However, no convergence theory for this method was given. In this paper we show that this second method is in fact a classical method for solving generalized fractional programs and has super-linear convergence.

We note that in this paper we do not cover the work on interior point methods developed specially for fractional programs [11, 20]. These methods require the development of specialized codes. Our interest is in methods which can exploit the development of advanced solvers for linear and conic programming to build scalable algorithms.

3. A Parametric View of L∞ Optimization

Let us now consider the following parametric optimization problem:

$$w(\gamma) = \begin{cases} \min_{w, \mathbf{x}} \; w \\ \text{subject to } \mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) \preceq w\mathbf{1},\; \mathbf{x} \in \mathcal{X} \end{cases} \qquad (Q_\gamma)$$

Parametric here implies that we will be considering the solution of this optimization problem for various values of the parameter γ. We denote by w(γ) the optimal value function; for each γ, w(γ) is equal to the minimum value attained by the variable w. Earlier, we saw that for fixed values of γ, f(x) − γg(x) is convex, thus Qγ is also a convex program. Note that if we fix w = 0 then Qγ reduces to Pγ. Further, if the set X is non-empty, then Qγ is feasible for all values of γ, and w(γ) > 0 if and only if Pγ is infeasible.

The optimal value function w(γ) has a number of interesting properties:

Theorem 1 ([7]). For all γ, w(γ) is finite, decreasing and continuous. P and Qγ always have optimal solutions. The optimal value of P, γ∗, is finite and w(γ∗) = 0. w(γ) = 0 implies γ = γ∗.

Theorem 1 establishes a link between the solutions of Qγ and P. The problem of solving P can now be rephrased as the problem of finding the zero of the function w(γ). In the next four sections we describe four different approaches to this problem. All of the approaches are based on solving a series of problems Q_{γ^k}; what differentiates them from each other is how they exploit the structure of P and Qγ to determine the sequence γ^k, and how quickly this sequence converges to the optimal γ∗.
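Every method in the remaining sections repeatedly solves Qγ. The sketch below expresses Qγ for the L1 norm as an LP and returns both w(γ) and the minimizer; it is a minimal sketch under the same assumptions as the earlier examples (cvxpy, hypothetical coefficient arrays A_list, b_list), and the box constraint standing in for the compact set X is purely illustrative.

```python
import cvxpy as cp

def solve_Q(A_list, b_list, gamma):
    """Parametric problem Q_gamma: minimize w s.t. f(x) - gamma*g(x) <= w*1, x in X.
    Returns (w(gamma), x). By Theorem 1, w(gamma) > 0 iff P_gamma is infeasible."""
    x = cp.Variable(3)
    w = cp.Variable()
    cons = []
    for A, b in zip(A_list, b_list):
        f_i = cp.norm1(A[:2] @ x + b[:2])       # convex numerator f_i(x)
        g_i = A[2] @ x + b[2]                   # concave (affine) denominator g_i(x)
        cons += [f_i - gamma * g_i <= w, g_i >= 1e-9]
    cons.append(cp.norm(x, "inf") <= 1e6)       # ||x||_inf <= M keeps X compact
    prob = cp.Problem(cp.Minimize(w), cons)
    prob.solve()
    return w.value, x.value
```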
4. Bisection and related methods

The problem of finding the roots of a function of one variable is one of the oldest problems in mathematics, and there are a wide variety of solution methods available.

The simplest algorithm is the bisection algorithm, which starts with an interval known to contain a root and performs a binary search to find it. The bisection method for solving quasi-convex problems, described earlier, does exactly this; since this method only considers the sign of w(γ), an indication of the feasibility or infeasibility of Pγ is enough.

We now consider two new methods for finding the root of w(γ).

4.1. A New Bisection Algorithm

Because the feasible set Sγ gets smaller as γ gets closer to γ∗, the feasibility problem Pγ gets harder as we get closer to the solution. Since Qγ is always feasible, our first proposal is to use Qγ in place of Pγ in the bisection problem, replacing the feasibility test with a test for the sign of w(γ^k).

4.2. Brent's Method

The bisection algorithm is an extremely robust algorithm, but this robustness comes at the price of linear convergence. Interpolation-based algorithms, such as the secant method and the method of false position, use a model (linear or quadratic) to predict the position of the root based on the current knowledge of the function. A number of interpolation-based methods have superlinear convergence.

A modern method which combines the speed of interpolation-based methods with the robustness of the bisection algorithm is the Brent method [6]. This method uses an inverse quadratic interpolation scheme with safeguards that include bracketing and switching to bisection when the interpolation update moves too slowly. Our second proposal is to use Brent's method to find the roots of w(γ).
interpolation update moves too slowly. Our second proposal by Eq. 1, Crouzeix et al. suggested [8] using
is to use Brent’s method to find the roots of w(γ). f (xk )
γ k+1 = max
i g(xk )
5. Dinkelbach’s Algorithm
In the last section we considered root finding methods to solve the case when m > 1. Algorithm 2 describes the
that solve the equation w(γ) = 0 by querying the value of resulting algorithm. One would hope that an analog of the
w(γ) at various values of γ. These methods do not make supergradient inequality will hold true for this algorithm too.
use of the structure of P or Qγ , and treat the function w(γ) Unfortunately that is not true and only a weaker inequality
as a black box. There is hope that methods which consider holds: [8]
the form of P and Qγ can achieve better performance than (
w(γ k ) − mini {gi (xk )}(γ − γ k ) γ > γk
such black box methods. Starting in this section we consider w(γ) ≤
methods that take the specific form of the objective function w(γ k ) − maxi {gi (xk )}(γ − γ k ) γ < γk
of P into account. (2)
The first class of methods we consider are based on esti- Consequently, the resulting algorithm converges only lin-
mating and using the gradient of w(γ). We begin by consid- early.
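A minimal sketch of the multi-ratio Dinkelbach iteration above (Algorithm 2), again reusing the hypothetical solve_Q helper and coefficient arrays from the earlier examples; the initial value is assumed to satisfy γ^1 ≥ γ∗.

```python
import numpy as np

def dinkelbach_type1(A_list, b_list, gamma0=50.0, eps2=0.001, max_iter=50):
    """Dinkelbach's Procedure of Type I: gamma^{k+1} = max_i f_i(x^k)/g_i(x^k)."""
    gamma = gamma0                      # must start above the optimal value gamma*
    x = None
    for _ in range(max_iter):
        w, x = solve_Q(A_list, b_list, gamma)
        # L1 reprojection errors f_i(x^k) / g_i(x^k) for all observations
        errs = [np.sum(np.abs(A[:2] @ x + b[:2])) / (A[2] @ x + b[2])
                for A, b in zip(A_list, b_list)]
        gamma = max(errs)
        if abs(w) <= eps2:
            break
    return x, gamma
```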
5.1. Scaled Dinkelbach's Algorithm

But all hope is not lost. Observe that

$$\min_{\mathbf{x} \in \mathcal{X}} \max_{i=1,\ldots,m} \frac{f_i(\mathbf{x})/v_i}{g_i(\mathbf{x})/v_i}$$

for any vi > 0 has exactly the same solution as P. In particular, this is true for vi = gi(x∗), where x∗ is an optimal solution to P. Let us consider the corresponding parametric problem

$$w'(\gamma) = \begin{cases} \min \; w \\ \text{subject to } \dfrac{f_i(\mathbf{x}) - \gamma g_i(\mathbf{x})}{g_i(\mathbf{x}^*)} \le w,\; i = 1, \ldots, m,\; \mathbf{x} \in \mathcal{X} \end{cases}$$

At γ∗, the above problem has the solution x∗. Now let us see what happens to Eq. 2 at (γ∗, x∗). Since max_i{gi(x∗)/gi(x∗)} = 1 = min_i{gi(x∗)/gi(x∗)}, the two cases of the inequality 2 collapse into one. Thus, in the neighborhood of x∗, −max_i{gi(x)/gi(x∗)} is approximately the supergradient and we can recover superlinear convergence. Of course we do not know x∗ a priori. But it suggests a modification to Algorithm 2, where Qγ is replaced by the scaled problem

$$\begin{aligned} \min \;& w \\ \text{subject to } \;& \mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) \preceq w\, \mathbf{g}(\mathbf{x}^{k-1}),\; \mathbf{x} \in \mathcal{X} \end{aligned} \qquad (Q'_\gamma)$$

The resulting algorithm is known as Dinkelbach's Procedure of Type II, or the differential correction algorithm [2].

5.1.1 Equivalence to Olsson et al.

There is another way in which we can arrive at this algorithm. Let us re-write P as

$$\min \; \gamma \quad \text{subject to } \mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) \preceq \mathbf{0},\; \mathbf{x} \in \mathcal{X} \qquad (P)$$

Now, consider the first-order Taylor expansion of the term γgi(x) around (γ^{k−1}, gi(x^{k−1})):

$$\begin{aligned}
\gamma g_i(\mathbf{x}) &\approx \gamma^{k-1} g_i(\mathbf{x}^{k-1}) + (\gamma - \gamma^{k-1})\, g_i(\mathbf{x}^{k-1}) + \gamma^{k-1}\left(g_i(\mathbf{x}) - g_i(\mathbf{x}^{k-1})\right) \\
&= \gamma^{k-1} g_i(\mathbf{x}) + (\gamma - \gamma^{k-1})\, g_i(\mathbf{x}^{k-1})
\end{aligned} \qquad (3)$$

Let w = γ − γ^{k−1}. Then P can be approximated as

$$\min \; w + \gamma^{k-1} \quad \text{subject to } \mathbf{f}(\mathbf{x}) - \gamma^{k-1} \mathbf{g}(\mathbf{x}) \preceq w\, \mathbf{g}(\mathbf{x}^{k-1}),\; \mathbf{x} \in \mathcal{X}$$

which is exactly the optimization problem suggested by [21] for the case of the L2 norm. [21] reported good empirical performance of the resulting algorithm, but did not provide any convergence analysis. Since γ^{k−1} is a constant, this optimization problem is equivalent to Q'_{γ^{k−1}}. Thus we have shown that the algorithm suggested in [21] is the classical Dinkelbach Procedure of Type II, and therefore has superlinear convergence. Further, the algorithm is applicable to both the L1 and L2 norm cases.
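The Type II subproblem Q′γ differs from Qγ only in that the slack w is weighted by g(x^{k−1}). Below is a minimal sketch of the scaled iteration, following the paper's description of Algorithm 2 with Qγ replaced by Q′γ; it reuses the same hypothetical cvxpy setup and coefficient arrays, and is an illustration rather than the authors' implementation.

```python
import cvxpy as cp
import numpy as np

def solve_Q_scaled(A_list, b_list, gamma, x_prev):
    """Scaled problem Q'_gamma: min w s.t. f_i(x) - gamma*g_i(x) <= w * g_i(x_prev)."""
    x, w = cp.Variable(3), cp.Variable()
    cons = []
    for A, b in zip(A_list, b_list):
        f_i = cp.norm1(A[:2] @ x + b[:2])
        g_i = A[2] @ x + b[2]
        scale = float(A[2] @ x_prev + b[2])     # g_i evaluated at the previous iterate
        cons += [f_i - gamma * g_i <= w * scale, g_i >= 1e-9]
    cp.Problem(cp.Minimize(w), cons).solve()
    return w.value, x.value

def dinkelbach_type2(A_list, b_list, x0, gamma0=50.0, eps2=0.001, max_iter=50):
    """Dinkelbach's Procedure of Type II (the sequence the paper identifies with
    the SOCP iteration of Olsson et al. [21])."""
    x, gamma = x0, gamma0
    for _ in range(max_iter):
        w, x = solve_Q_scaled(A_list, b_list, gamma, x_prev=x)
        errs = [np.sum(np.abs(A[:2] @ x + b[:2])) / (A[2] @ x + b[2])
                for A, b in zip(A_list, b_list)]
        gamma = max(errs)
        if abs(w) <= eps2:
            break
    return x, gamma
```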
6. Dual Dinkelbach's Algorithm

The parametric problem Qγ is a convex program. We now consider an algorithm that uses the Lagrangian dual of Qγ to construct a superlinearly convergent algorithm [3]. Let Σ denote the set of vectors λ ∈ R^m such that λ ⪰ 0, 1^⊤λ = 1, and let

$$\gamma(\boldsymbol{\lambda}) = \min_{\mathbf{x} \in \mathcal{X}} \frac{\boldsymbol{\lambda}^\top \mathbf{f}(\mathbf{x})}{\boldsymbol{\lambda}^\top \mathbf{g}(\mathbf{x})}. \qquad (4)$$

Then the following theorem characterizes γ(λ).

Theorem 2 ([3]). If λ∗ = arg max_{λ∈Σ} γ(λ), then γ(λ∗) = γ∗, and if x∗ = arg min_{x∈X} λ∗⊤f(x)/λ∗⊤g(x) then x∗ is also an optimal solution of P.

Theorem 2 motivates solving P by solving max_{λ∈Σ} γ(λ). Notice that like w(γ), γ(λ) is the optimal value function of an optimization problem, in this case the minimum value of a generalized fractional program with a single fraction. Making an analogy with the duality theory of constrained optimization [5], it is possible to consider Eq. 4 to be a dual of P with zero duality gap.

So how does one go about maximizing γ(λ)? Since Eq. 4 is a fractional program, consider the parametric problem associated with it:

$$w(\gamma, \boldsymbol{\lambda}) = \min_{\mathbf{x} \in \mathcal{X}} \boldsymbol{\lambda}^\top \left(\mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x})\right)$$

From Theorem 1 we know that w(γ, λ) > 0 implies that γ(λ) > γ. This suggests the following iterative update

$$\boldsymbol{\lambda}^{k+1} = \arg\max_{\boldsymbol{\lambda} \in \Sigma} w(\boldsymbol{\lambda}, \gamma(\boldsymbol{\lambda}^k)) \qquad (5)$$

and the following result holds true.

Theorem 3 ([3]). If f(x) is positive, strictly convex and g(x) is positive concave, then Eq. 5 converges superlinearly.

Consider now the Lagrangian of Qγ:

$$L(\mathbf{x}, w, \boldsymbol{\lambda}, \boldsymbol{\mu}; \gamma) = w + \boldsymbol{\lambda}^\top \left(\mathbf{f}(\mathbf{x}) - \gamma\, \mathbf{g}(\mathbf{x}) - w\mathbf{1}\right) + \boldsymbol{\mu}^\top (C\mathbf{x} - \mathbf{d}) \qquad (6)$$

Our use of the symbol λ as the dual variable associated with the constraint f(x) − γg(x) ⪯ w1 is deliberate. Indeed the following holds true:

Theorem 4 ([3]). If (x∗, w∗, λ∗, µ∗) is a primal-dual solution to Qγ, then λ∗ = arg max_{λ∈Σ} w(λ, γ).

Algorithm 3 describes the resulting algorithm. The algorithm successively approximates γ∗ from below and can be considered a dual to the Dinkelbach algorithms, which approximate γ∗ from above.

Algorithm 3 Dual Dinkelbach's Algorithm
1: Choose λ^0 ∈ Σ
2: loop
3:   γ^k = min_{x∈X} λ^{k−1⊤}f(x)/λ^{k−1⊤}g(x)
4:   Solve Q_{γ^k} to get (x^k, w^k, λ^k, µ^k)
5:   if |w^k| ≤ ε1 then
6:     return
7:   end if
8: end loop

7. Gugat's Algorithm

Recall that the reason why Dinkelbach's Procedure of Type I has linear convergence in the case of multiple ratios is because the supergradient inequality does not hold true anymore. The Scaled Dinkelbach's Algorithm works well in the neighborhood of the optimal solution, but its linearization breaks down away from the solution.

The classical presentation of Newton's method is based on assuming that the function being considered is differentiable. If, however, we are satisfied with superlinear convergence, Newton's method can be constructed using the notion of the one-sided derivative:

$$\partial_\gamma^+ w(\gamma) = \lim_{\delta \to 0^+} \frac{w(\gamma + \delta) - w(\gamma)}{\delta}$$

The general problem of estimating the derivatives of the optimal value function with respect to the parameters of the optimization problem is addressed in the perturbation theory of optimization problems [4]. Under suitable regularity conditions, a classical result relates the one-sided derivative of the optimal value function with the derivatives of the Lagrangian as follows:

$$\partial_\gamma^+ w(\gamma) = \inf_{\mathbf{x}, w \in X(\gamma)} \; \sup_{\boldsymbol{\lambda}, \boldsymbol{\mu} \in \Lambda(\gamma)} \partial_\gamma L(\mathbf{x}, w, \boldsymbol{\lambda}, \boldsymbol{\mu}; \gamma)$$

Here, L(x, w, λ, µ; γ) is the Lagrangian given by Eq. 6, and X(γ) and Λ(γ) are the sets of the primal and dual solutions of Qγ. Results of this type are, however, not particularly useful from a computational point of view, since they involve finding a saddle point over the Cartesian product of all primal and dual solutions.

Gugat showed that, given a particular primal-dual solution pair (x∗, w∗, λ∗, µ∗), the derivative of the Lagrangian at that point approximates the one-sided derivative well enough that the Newton update converges superlinearly [12], i.e.,

$$\partial_\gamma^+ w(\gamma) \approx \partial_\gamma L(\mathbf{x}^*, w^*, \boldsymbol{\lambda}^*, \boldsymbol{\mu}^*; \gamma) = -\boldsymbol{\lambda}^{*\top} \mathbf{g}(\mathbf{x}^*)$$

Thus, the update rule for γ can now be stated as

$$\gamma^{k+1} = \gamma^k + \frac{w^k}{\boldsymbol{\lambda}^{k\top} \mathbf{g}(\mathbf{x}^k)} \qquad (7)$$

It is interesting to note here that for the case when m = 1, λ∗ = 1 and w^k = f1(x^k) − γ^k g1(x^k), and the update rule reduces to the familiar Dinkelbach update for single fractions: γ^{k+1} = f1(x^k)/g1(x^k).

Gugat's algorithm combines update rule 7 with bracketing, which ensures that the iterates are always bounded and the algorithm does not diverge. Finally, we need a number σ that determines how much we can increase the lower bound l^k, if w(γ^k) is non-negative, without missing the root. This modification ensures that the algorithm does not oscillate. σ should obey σ ≥ max_i max_{x∈X} gi(x). For our purposes, a large upper bound (1e6) is sufficient.

Algorithm 4 Gugat's Algorithm
Require: l^1 ≤ γ^1 ≤ u^1, such that l^1 ≤ γ∗ ≤ u^1.
1: loop
2:   Solve Q_{γ^k} to get (x^k, w^k, λ^k, µ^k)
3:   z^k = max_i fi(x^k)/gi(x^k)
4:   if z^k < γ∗ then
5:     x∗ = x^k, γ∗ = z^k
6:   end if
7:   u^{k+1} = min(u^k, z^k)
8:   if w^k ≥ 0 then
9:     l^{k+1} = max(l^k, γ^k + w^k/σ)
10:  else
11:    l^{k+1} = l^k
12:  end if
13:  if |w^k| ≤ ε1 or (u^{k+1} − l^{k+1}) ≤ ε2 then
14:    return (x∗, γ∗)
15:  end if
16:  γ^{k+1} = max(l^{k+1}, min(γ^k + w^k/λ^{k⊤}g(x^k), u^{k+1}))
17: end loop
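Gugat's update (Eq. 7) only needs the dual variables λ of the ratio constraints in Qγ, which conic solvers return as a by-product. The sketch below is a simplified rendering of Algorithm 4 on top of cvxpy's dual values, with the same hypothetical coefficient arrays as before; it is illustrative only and omits some of the bookkeeping of the full algorithm.

```python
import cvxpy as cp
import numpy as np

def gugat(A_list, b_list, lo=0.0, hi=100.0, gamma=50.0,
          sigma=1e6, eps1=0.01, eps2=0.001, max_iter=50):
    """Simplified Gugat iteration: Newton step gamma += w / (lambda . g(x)),
    safeguarded by the bracket [lo, hi] and the constant sigma (cf. Algorithm 4)."""
    best_x, best_gamma = None, np.inf
    for _ in range(max_iter):
        x, w = cp.Variable(3), cp.Variable()
        ratio_cons = []
        for A, b in zip(A_list, b_list):
            ratio_cons.append(cp.norm1(A[:2] @ x + b[:2]) - gamma * (A[2] @ x + b[2]) <= w)
        cheirality = [A[2] @ x + b[2] >= 1e-9 for A, b in zip(A_list, b_list)]
        cp.Problem(cp.Minimize(w), ratio_cons + cheirality).solve()
        xk, wk = x.value, w.value
        lam = np.array([c.dual_value for c in ratio_cons])          # dual variables lambda^k
        g = np.array([A[2] @ xk + b[2] for A, b in zip(A_list, b_list)])
        f = np.array([np.sum(np.abs(A[:2] @ xk + b[:2])) for A, b in zip(A_list, b_list)])
        z = (f / g).max()                     # achievable mini-max error at x^k
        if z < best_gamma:
            best_x, best_gamma = xk, z
        hi = min(hi, z)
        if wk >= 0:
            lo = max(lo, gamma + wk / sigma)
        if abs(wk) <= eps1 or hi - lo <= eps2:
            break
        gamma = float(np.clip(gamma + wk / (lam @ g), lo, hi))      # Eq. (7), safeguarded
    return best_x, best_gamma
```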
8. Experiments

In this section we compare the performance of the various algorithms we have described. Since our primary interest is in large scale L∞ optimization, we restrict our attention to the problem of estimating structure and translation with known rotations. As demonstrated by Martinec & Pajdla, solving this problem offers an alternative approach to the problem of reconstruction from multiple views [18].

Six algorithms were compared. Bisect I is from [15]. Bisect II and Brent were proposed in Section 4. Dinkel I and Dinkel II refer to Dinkelbach procedures of type I and II respectively. Gugat refers to Gugat's algorithm. The Dual Dinkelbach algorithm is omitted because it displayed extreme numerical instability when solving for γ^k using the single ratio problem. All algorithms were implemented in MATLAB. The MATLAB function fzero implements Brent's method and we use this implementation in our experiments. For the L2 norm, Qγ is a SOCP and we use SeDuMi [27] as our solver.¹ For the L1 norm, Qγ is an LP, and we use MOSEK as the solver, as it had better runtime performance than SeDuMi and was able to handle problems that required memory greater than 2GB.

¹ We also experimented with MOSEK [19], which is a leading commercial LP and SOCP solver, and found that Qγ for moderate to large sized problems triggered a bug in the solver leading to poor numerical performance.

The algorithms were compared on 8 datasets. Tables 1 and 2 list the details of each dataset along with the performance of each algorithm on it. For each algorithm we list the runtime in seconds. Only the time used by the solver is noted here. The number in parentheses is the number of times the subproblem Qγ was solved (Pγ for Bisect I).

The Dino and the Oxford datasets are available from the Oxford Visual Geometry Group. The four Temple datasets are from [22]. The Pisa data set is a proprietary dataset, and the Trevi dataset is based on images of the Trevi fountain found on Flickr [1]. Except for the first two datasets, which come with camera information, the camera rotations and focal lengths were obtained from an independent bundle adjustment process [26]. Since outliers are a big issue in L∞ problems, we used two kinds of datasets. The Dino, Oxford, and Temple 1-4 datasets are clean datasets with no significant outliers. The Trevi and Pisa datasets contain a significant number of outliers. No results for the L2 norm are reported for the Temple 3 & 4 and the Pisa datasets. For Temple 3, SeDuMi returned with a numerical failure. Temple 4 and Pisa were too large for the 32-bit version of SeDuMi to fit in memory. All experiments were run with an initial guess of γ = 50, a lower bound of 0, and an upper bound of 100 pixels error. The termination parameters were ε1 = 0.01, ε2 = 0.001.

There are a number of interesting features in both tables. Bisect I, Bisect II and Dinkel I are linearly convergent algorithms and it shows in their poor runtime performance as compared to the other three superlinear methods. There is no clear winner between Bisect I and Bisect II, and while Dinkel I is usually better on datasets with low noise, on datasets with a lot of outliers it consistently performs the worst.

Of the three blackbox methods, Bisect I, Bisect II and Brent, for L1 problems Brent's method usually performs the best, but for L2 problems Bisect I beat both Bisect II and Brent's method. Even for L1 problems, Bisect I becomes more competitive as the problem size increases. This difference in performance can be explained by taking a closer look at the problems Pγ and Qγ. Bisect I is based on solving Pγ, which is a feasibility problem, whereas Bisect II and Brent's method use Qγ, which is an optimization problem. For similar values of γ, Pγ is easier to solve since the optimizer terminates as soon as it finds a point inside the feasible set, whereas it has to find the analytic center of the constraints in the case of Qγ. Unfortunately Bisect II and Brent's method are unable to exploit the value of w(γ) effectively enough to offset the cost of solving more expensive sub-problems. This becomes obvious if we look at the number of iterations for these methods. Bisect I consistently takes more iterations and still performs better on runtime as compared to Bisect II and Brent's method.

The clear winner out of the six algorithms is Gugat's algorithm, which had the best performance on every dataset. It particularly shines for large-scale sets, where it is between 1.5 to 4 times better than the bisection algorithm. Its clever construction that exploits the dual solution to estimate the gradient of w(γ) makes this algorithm both numerically robust and computationally efficient. Based on our experience, we recommend that Gugat's algorithm be used as a standard algorithm for L∞ optimization.

9. Discussion

In summary, we have shown that L∞ problems are a particular case of generalized fractional programming, and methods for solving them can be used with great success in multi-view geometry. While our experimental results have only considered the structure and translation estimation problem, the methods presented in this paper are general and applicable to all the different L∞ problems. We hope that Gugat's algorithm will become a standard tool for solving L∞ problems in multi-view geometry.

It is also our observation that the L2 problems are poorly conditioned as compared to the corresponding L1 problems. Further, since LP solvers are much more mature than SOCP solvers, the L1 norm formulation is a better one to solve in our opinion. The exact cause of the conditioning problems of L2 problems is a problem that deserves more attention. In future work we hope to use the dual structure of Qγ to analyze the problem of outlier removal.
Dataset Images Points Observations Bisect-I Bisect-II Brent Dinkel-I Dinkel-II Gugat
Dino 36 328 2663 12(13) 12(9) 7(5) 7(5) 6(4) 4(3)
Oxford 11 737 4035 19(13) 25(12) 12(6) 41(21) f(f) 10(5)
Temple 1 43 4233 29163 226(13) 196(9) 109(5) 132(6) 104(5) 81(4)
Temple 2 103 8063 63373 676(13) 576(10) 275(5) 339(6) 277(5) 220(4)
Temple 3 203 15898 128530 985(13) 1646(10) 778(5) 1079(7) 794(5) 472(3)
Temple 4 312 22033 178897 1353(13) 1875(9) 1042(5) 1426(7) 1237(6) 619(3)
Trevi 58 4054 15085 191(14) 101(10) 70(7) 247(24) 59(6) 50(5)
Pisa 100 64023 436060 14435(14) 17311(13) 13665(10) 28396(28) 11352(7) 4617(4)

Table 1. Runtimes for L1 norm reprojection error. All times are in seconds. The number in parentheses indicates the number of times Qγ or Pγ was solved. f denotes numerical failure. Parameter settings: ε1 = 0.01, ε2 = 0.001, σ = 1e6.

Dataset Images Points Observations Bisect-I Bisect-II Brent Dinkel-I Dinkel-II Gugat
Dino 36 328 2663 6(9) 21(9) 11(5) 9(4) 9(4) 8(4)
Oxford 11 737 4035 12(12) 20(9) 62(28) 84(30) 30(10) 10(4)
Temple 1 43 4233 29163 180(11) 356(9) 226(5) 298(7) 199(5) 121(3)
Temple 2 103 8063 63373 439(11) 512(5) 558(5) 842(8) 566(5) 315(3)
Trevi 58 4054 15085 123(13) 156(8) 229(13) 743(30) 130(6) 33(2)

Table 2. Runtimes for L2 norm reprojection error. All times are in seconds. The number in parentheses indicates the number of times Qγ or Pγ was solved. Parameter settings: ε1 = 0.01, ε2 = 0.001, σ = 1e6.
Acknowledgements

The authors are grateful to Prof. Paul Tseng for several useful discussions, Erling Andersen for his help with implementing the algorithms in MOSEK, and Kristin Branson for reading several drafts of the paper.

This work was supported in part by National Science Foundation grant IIS-0743635, the Office of Naval Research, Microsoft, and the UW Animation Research Labs.

References

[1] Photo Tourism. http://phototour.cs.washington.edu.
[2] I. Barrodale, M. Powell, and F. Roberts. The differential correction algorithm for rational l∞-approximation. SIAM J. on Num. Anal., 9(3):493–504, 1972.
[3] A. Barros, J. Frenk, S. Schaible, and S. Zhang. A new algorithm for generalized fractional programs. Math. Prog., 72(2):147–175, 1996.
[4] J. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, 2000.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[6] R. Brent. Algorithms for Minimization Without Derivatives. Courier Dover Publications, 2002.
[7] J. Crouzeix and J. Ferland. Algorithms for generalized fractional programming. Math. Prog., 52(1):191–207, 1991.
[8] J. P. Crouzeix, J. A. Ferland, and S. Schaible. An algorithm for generalized fractional programs. J. of Opt. Theory and Appl., 47:35–49, 1985.
[9] W. Dinkelbach. On nonlinear fractional programming. Man. Sci., 13(7):492–498, 1967.
[10] J. Frenk and S. Schaible. Fractional Programming. Springer, 2004.
[11] R. Freund and F. Jarre. An interior-point method for fractional programs with convex constraints. Math. Prog., 67(1):407–440, 1994.
[12] M. Gugat. A fast algorithm for a class of generalized fractional programs. Man. Sci., 42(10):1493–1499, 1996.
[13] R. Hartley and F. Schaffalitzky. l∞ minimization in geometric reconstruction problems. In CVPR, pages 504–509, 2004.
[14] T. Ibaraki. Parametric approaches to fractional programs. Math. Prog., 26(3):345–362, 1983.
[15] F. Kahl. Multiple view geometry and the L∞-norm. In ICCV, pages 1002–1009, 2005.
[16] Q. Ke and T. Kanade. Quasiconvex optimization for robust geometric reconstruction. In ICCV, pages 986–993, 2005.
[17] H. Li. A practical algorithm for l∞ triangulation with outliers. In CVPR, 2007.
[18] D. Martinec and T. Pajdla. Robust rotation and translation estimation in multiview reconstruction. In CVPR, 2007.
[19] MOSEK ApS, Denmark. The MOSEK optimization tools manual, Version 5.0 (Revision 60).
[20] Y. Nesterov and A. Nemirovskii. An interior-point method for generalized linear-fractional programming. Math. Prog., 69(1):177–204, 1995.
[21] C. Olsson, A. Eriksson, and F. Kahl. Efficient optimization for l∞ problems using pseudoconvexity. In ICCV, 2007.
[22] S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. pages 519–526, 2006.
[23] Y. Seo and R. Hartley. A fast method to minimize L∞ error norm for geometric vision problems. In ICCV, 2007.
[24] K. Sim and R. Hartley. Recovering camera motion using l∞ minimization. In CVPR, pages 1230–1237, 2006.
[25] K. Sim and R. Hartley. Removing outliers using the l∞-norm. In CVPR, pages 485–494, 2006.
[26] N. Snavely, S. Seitz, and R. Szeliski. Photo Tourism: Exploring photo collections in 3D. TOG, 25(3):835–846, 2006.
[27] J. Sturm. Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Opt. Meth. and Soft., 11-12:625–653, 1999.
