THE ALGORITHM SELECTION PROBLEM
John R. Rice
Computer Science Department
Purdue University
West Lafayette, Indiana 47907
July 1975
CSD-TR 152
CONTENTS
Section 1: Introduction
*This work was partially supported by the National Science Foundation through
Grant GP-32940X. This paper was presented as the George E. Forsythe
Memorial Lecture at the Computer Science Conference, February 19, 1975,
Washington, DC.
1. INTRODUCTION
obscures the common features of this selection problem and the primary
It should be made clear that we do not believe that these models will
This will always require exploitation of the specific nature of the situa-
tion at hand. Even so, we do believe that these models will clarify the
Three concrete examples are given below which the reader can use to
   ∫_a^b f(x) dx
duce (a) high batch throughput, (b) good response to interactive jobs,
(c) good service to semi-interactive jobs and (d) high priority fidelity.
effective, i.e., never loses and wins whenever an opponent's mistake allows
it.
and these parameters are then chosen so as to satisfy (as well as they can)
are decision trees (with size, shape and individual decision elements as
Problem Space: The set of problems involved is very large and quite
diverse. This set is of high dimension in the sense that there are a number
is large and diverse. Ideally there may be millions of algorithms and prac-
distinguish between two which are identical except for the value of some
particular algorithm for a particular problem are complex and hard to com-
pare (e.g. one wants fast execution, high accuracy and simplicity). Again
measures.
2. ABSTRACT MODELS
2.1 The Basic Model and Associated Problems. We describe the basic abstract
model by the diagram in Figure 1. The items in this model are defined below
[Figure 1. The basic abstract model: problem space, selection mapping,
algorithm space, performance mapping and norm mapping, with
‖p‖ = norm on ℝⁿ providing one number to evaluate an algorithm's performance.]
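To fix ideas, here is a minimal sketch (in modern Fortran; the performance
numbers are invented, and the vector norm ‖p‖ is reduced to a single number
per algorithm-problem pair) of a selection mapping that simply picks, for
each problem, the algorithm with the best performance:

    ! Minimal sketch of the basic model: for each problem x the selection
    ! mapping S picks the algorithm A maximizing the performance ||p(A,x)||.
    ! The performance table is illustrative only.
    program basic_model
      implicit none
      integer, parameter :: nalg = 3, nprob = 2
      real    :: p(nalg, nprob)   ! ||p(A,x)|| for each algorithm and problem
      integer :: x, s
      p = reshape([0.9, 0.4, 0.7,  0.2, 0.8, 0.5], [nalg, nprob])
      do x = 1, nprob
         s = maxloc(p(:, x), dim=1)   ! the selection mapping S(x)
         print *, 'problem', x, ': select algorithm', s
      end do
    end program basic_model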
Algorithm Selection Problem: Given all the other items in the above
There must be, of course, some criteria for this selection and we present
choose just one algorithm from a subclass 𝒜₀ to apply to every member
These four criteria do not exhaust the meaningful criteria but they do
illustrate the principal ideas. There are five main steps to the analysis
classical theory within this framework. The space 𝒫 is a function space
and the algorithm space 𝒜 may be identified with a subspace of 𝒫. The
algorithm enters as the means of evaluating elements of 𝒜. The performance
mapping is
Two concrete examples of the model are discussed in detail in
Sections 3 and 4 of this paper. We present a third, simpler one from the
area of artificial intelligence.
for playing Tic-Tac-Toe. The problem space is the set of partial games of
Tic-Tac-Toe. While this number is large, there are in fact only 28 distinct
involves only the existence of immediate winning positions and vacant position
There are 16 parameters aᵢ which take on one of the following five values.
[Figure 2. The form of the selection mapping for the Tic-Tac-Toe
example: a decision tree with tests such as "Is a corner free?" at the
nodes. Each aᵢ is one of five moves.]
This example is so simple that one can make immediate assignments of certain
of the values of the aᵢ. Experiments have shown that a variety of crude
schemes for computing values of the aᵢ (selecting the best algorithm) work
one would compute this if one had no a priori information about the game.
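A minimal sketch of such a scheme (modern Fortran; the position features
tested and the parameter values are invented, and only four of the 16
parameters aᵢ are represented):

    ! Sketch of the Figure 2 selection mapping: binary tests on the position
    ! select one leaf, and the parameter a(leaf) names one of five move types.
    program ttt_select
      implicit none
      integer :: a(4), leaf
      logical :: can_win, must_block, corner_free
      a = [1, 2, 3, 4]                 ! each entry encodes one of five moves
      can_win = .false.; must_block = .true.; corner_free = .true.
      if (can_win) then                ! walk the decision tree
         leaf = 1
      else if (must_block) then
         leaf = 2
      else if (corner_free) then
         leaf = 3
      else
         leaf = 4
      end if
      print *, 'selected move type:', a(leaf)
    end program ttt_select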
not and we call this selection based on features of the problem. This
[Figure 3. The model with selection based on features of the problem:
x ∈ 𝒫 (problem space) → feature extraction F → f(x) ∈ ℱ (feature space)
→ selection mapping S(f(x)) → A ∈ 𝒜 (algorithm space) → performance
mapping p(A,x) → p ∈ ℝⁿ (performance measure space), with
‖p‖ = algorithm performance.]
Note that the selection mapping now depends only on the features f(x), and
yet the performance mapping still depends on the problem x. The introduc-
criteria for selection are still valid for this new model as well as
the five steps in the analysis and solution of the problem. The deter-
process, often one of the most important parts. One may view the features
those problems with the same features would have the same performance for
the dimension m of ℱ, what m features are the best for the prediction
of algorithm performance? That is, determine ℱ* so that (with 𝒫(f)
denoting the set of problems whose feature vector is f)

   d_m^*(A) = max_{f∈ℱ*} max_{x,y∈𝒫*(f)} ‖p(A,x) − p(A,y)‖
            ≤ max_{f∈ℱ} max_{x,y∈𝒫(f)} ‖p(A,x) − p(A,y)‖
then the effective dimension of 𝒫 (for the problem at hand) is probably much
larger than m; if not, the effective dimension of 𝒫 is close to m.
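A sketch of how this spread might be estimated from a finite sample
(modern Fortran; the feature values and performances are invented):

    ! Estimate d(A): problems with equal features should perform alike for
    ! a fixed algorithm A, so the largest performance gap within a group of
    ! problems sharing a feature value measures what the features miss.
    program feature_spread
      implicit none
      integer, parameter :: n = 6
      integer :: feat(n), i, j
      real    :: perf(n), d
      feat = [1, 1, 2, 2, 2, 3]                    ! feature value f(x)
      perf = [0.90, 0.85, 0.40, 0.55, 0.50, 0.75]  ! ||p(A,x)||
      d = 0.0
      do i = 1, n
         do j = i + 1, n
            if (feat(i) == feat(j)) d = max(d, abs(perf(i) - perf(j)))
         end do
      end do
      print *, 'estimated d(A) =', d
    end program feature_spread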
and the dimension m of ℱ, what m features are the best for prediction
The determination of the best (or even good) features is one of the
Many problem spaces 𝒫 are known only in vague terms and hence an experi-
over 𝒫. That is, one chooses a sample from 𝒫 and restricts consideration
and if one has a good set of features for 𝒫, then one can at least force
mation most relevant to the performance of algorithms for the problem at hand.
upon (if not explicitly stated) set of features. For example, consider the
this problem with considerable confidence. The selection problem for quad-
situation exists for problems that have been studied for one or two
uncertainties for problems that have just appeared in the past one or two
decades.
2.3 Alternate Definitions of Best for the Models. In the preceding sections
It is reasonable to ignore the performance for the worst case and, instead,
Minimax Approach
The corresponding mathematical statements for the least squares and least
The use of integrals in these formulations implies that a topology has been
introduced in the problem space 𝒫. Many common examples for 𝒫 are dis-
crete in nature and in these cases the topology introduced reduces the
Note that the only difference between the two new formulations is the
   ∫_𝒫 | ‖p(B(x),x)‖ − ‖p(S₀(x),x)‖ |^r dx  ≤  ∫_𝒫 | ‖p(B(x),x)‖ − ‖p(S(x),x)‖ |^r dx

   for all S ∈ 𝒮₀
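For reference, the minimax, least squares and least deviation criteria are
all instances of a single family (a sketch in the notation above, with B(x)
the best selection):

    \min_{S \in \mathcal{S}_0} \left[ \int_{\mathcal{P}}
      \bigl|\, \|p(B(x),x)\| - \|p(S(x),x)\| \,\bigr|^{r}\, dx \right]^{1/r},
    \qquad r = 1 \text{ (least deviation)},\quad r = 2 \text{ (least squares)},
    \quad r \to \infty \text{ (minimax)}.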
sistent approach for the reformulations. That is, if we use least squares
on the problem space we also use it on the feature space ℱ and the algorithm
space 𝒜. In this case the quantity d_m(A) of Problem E becomes
   d_m^r(A,ℱ) = [ ∫_{f∈ℱ} ∫_{x,y∈𝒫(f)} ‖p(A,x) − p(A,y)‖^r ]^{1/r}
then for Problem E: Best Feature for a Particular Algorithm, the objective
   d_m^{r*}(A) = d_m^r(A,ℱ*) = min_ℱ [ ∫_{f∈ℱ} ∫_{x,y∈𝒫(f)} ‖p(A,x) − p(A,y)‖^r ]^{1/r}
significant in the larger context, but they are very significant in determining
context. This lesson is, roughly, that the crucial ingredients for success
are proper choices of the subclasses 𝒫₀, 𝒜₀ and 𝒮₀. Once these are made
properly then the mathematical optimization should be made for that value of
r that gives the least difficulty. If the problem is completely linear then
culty. The situation is more variable for nonlinear problems. Note that there
no doubt there are similar cases for the algorithm selection problem. We
of instances.
2.4 The Model with Variable Performance Criteria. We have assumed so far
weight given to each of these might vary from almost zero to almost 100%.
A model for this version of the selection problem is shown in the diagram
of Figure 4.
[Figure 4. The model with variable performance criteria: x ∈ 𝒫 (problem
space) → feature extraction F → f(x) ∈ ℱ (feature space), and so on as in
Figure 3, except that a norm mapping g(p,w) now produces ‖p‖ from the
performance p and the criteria weights w.]
Some of the mappings now have changed domains, but their nature is the same.
The choice of ℝⁿ for the criteria space is clearly arbitrary (and perhaps
are:
Problem subclasses 𝒫₀
Norm mapping g
from formulating all of them. Some of the more important problems are:
performance:
for all S ∈ 𝒮₀. Note that g(p(B(x,w),x),w) is the best possible per-
formance and the other g terms are the performances of the algorithms
actually selected.
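A minimal sketch of one such norm mapping g(p,w) (here simply a weighted
sum of the performance measures; all values are invented):

    ! g(p,w) = sum_i w_i * p_i: the criteria weights w, chosen by the user,
    ! collapse the performance vector p to the single number ||p||.
    program criteria_norm
      implicit none
      integer, parameter :: n = 3
      real :: p(n), w(n)
      p = [0.9, 0.3, 0.6]     ! performance vector p(A,x)
      w = [0.5, 0.3, 0.2]     ! criteria weights, summing to one
      print *, 'g(p,w) =', dot_product(w, p)
    end program criteria_norm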
We note that the abstract model presented in this section could be elaborated
from simple models can visualize how this would be done. However, the
crucial point of a model is not its theoretical structure but its relevance
to underlying real world problems. In other words, does this model allow us
3.1 The Components in the Abstract Model. The next two sections are
involves spaces for the problems, the features, the criteria, the algorithms
and the performance measures. These spaces are described as follows:
is from a very large population of many millions (see Rice (1975)). The actual
algorithms one might consider are mentioned in the various references
and many of them are named later.
There have been ten substantial testing efforts reported which are listed
below in chronological order. We indicate the test functions used (by A,
B or C), the requested accuracies (by E values) and the algorithms involved.
The algorithms are named and described, but detailed references are not
given here; one must refer to the test reports.
6. Piessens (1973).
   Complete details not reported
   Test set A with ε = 10⁻², 10⁻³, ..., 10⁻¹³
   Algorithms: CCQUAD, SQUANK
Also see Lyness and Kaganove (1975) for further discussion on the nature
of this problem. This testing has provided much useful information and
served to identify some poor algorithms. However, it has not been well
enough organized to allow definitive conclusions and there is still consid-
erable doubt about the relative merits of the better algorithms. We note
that a much better experiment can be performed.
are oscillatory with a singularity). This process gives a test set which
produces a grid over the entire feature space. This test set can be combined
with accuracy values of ε = 10⁻², 10⁻⁴, 10⁻⁸, 10⁻¹² to permit a much more
precise measurement of algorithm performance.
There are about a dozen existing algorithms that merit inclusion in this
experiment and a little estimation shows that a rather substantial compu-
tation is required for this experiment. An important result of the syste-
matic nature of this approach is that one can consider probability distribu-
tion in the problem space which induce a probability distribution on the
feature space and algorithm performances can be compared (over this problem
subdomain) without repeating the experiment.
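A sketch of how the crossed design might be laid out (the feature levels
are stand-in grid indices; the ε values are those quoted above):

    ! Cross a grid of feature levels with the requested accuracies to get
    ! one test case per (feature level, accuracy) pair.  In the real
    ! experiment each feature level would be a family of integrands.
    program test_grid
      implicit none
      integer, parameter :: nfeat = 5, nacc = 4
      real :: eps(nacc)
      integer :: i, j, ncase
      eps = [1.0e-2, 1.0e-4, 1.0e-8, 1.0e-12]
      ncase = 0
      do i = 1, nfeat
         do j = 1, nacc
            ncase = ncase + 1   ! here one would run each candidate algorithm
         end do
      end do
      print *, 'total test cases:', ncase
    end program test_grid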
This suggested experiment is far from the most general of interest and is
clearly biased against certain well known algorithms. For example, SQUANK
takes considerable care in handling round-off effects (a feature omitted
here) and explicitly ignores oscillations (a feature included here) and thus
one would not expect SQUANK to compare favorably with some other algorithms
on the basis of this experiment.
   p₂(A,x) = 1/(1 + .1(log …))

This places a severe penalty on failing to achieve εₓ and a mild penalty
on achieving much more accuracy than εₓ. These conventions allow us to
find the performance vector (p₁(A,x), p₂(A,x)) and we introduce a criteria
unit vector (w₁,w₂); the norm of p(A,x) is then
4.1 The Components in the Abstract Model. The general case of this
problem may be expressed as follows:
The abstract model involves spaces for the problems, the features, the
criteria, the algorithms and the performance measures. These spaces are
described as follows:
Batch - small job response: median and maximum turnaround for jobs
with small resource requirements
Batch - large job response: median and maximum turnaround for all
batch jobs other than small ones (or special runs)
On line response - median and maximum response time for common service
functions (e.g. fetching a file, editing a line, submitting a
batch job)
Interactive response - median and maximum response times for standard
short requests
Throughput - total number of jobs processed per unit time, number of CPU
hours billed per day, etc.
where

   R₁(r₁) = 2·r₁
   R₂(r₂) = |r₂ − 150100|/128
   R₅(r₅) = { …          if r₅ = 0
            { 300 + …    if r₅ ≥ 1
   R₆(r₆) = r₆

and R₃ and R₄ are shown in Figure 5(a) and (b).
[Figure 5. (a) priority contribution for CPU time; (b) priority
contribution for I/O units.]
   a₉·max(a₁₀ − r₄, 0) + a₁₁·|a₁₂ − r₄| + a₁₃·|a₁₄ − r₄|
where w is from the three-dimensional criteria space with wᵢ > 0 and
w₁ + w₂ + w₃ = 1.
Thus we see that the 19 coefficients are in fact functions of six other
independent variables. One could, for example, attempt to determine coeffi-
cients αᵢⱼ so that
We now consider how to find the best scheduler of this form. To set
the context, let us outline how the computation might go in an ideal world.
The basic building block would be the computation of the best aᵢ for given
wⱼ and fⱼ. This block is designated by the function OPT, i.e. OPT(w,f) is
the set of 19 best coefficients. Note that this does not involve any assump-
tion about the form of the relationship between the aᵢ and the variables
wⱼ and fⱼ, i.e. the αᵢⱼ are not involved. We would then select an appro-
priate set of values for the variables wⱼ and fⱼ, say wⱼₗ, l = 1 to m_w,
and fⱼₖ, k = 1 to m_f, and execute the algorithm
It seems plausible that one can obtain values of ‖p‖ which are fairly
tightly associated with values of a, w and f. This means that it is, in
principle, feasible to carry out the optimization problem. A simplified
example of the situation is shown in Figure 6 where we assume there is 1
variable for a, and 1 variable for w and f.
[Figure 6. Function values of ‖p‖ plotted against a, for scattered
values of w and f, obtained when there is no direct control over some of
the arguments (f in this case).]
In order to compensate for the irregular nature of the values obtained, one
should use an integral form of the minimization problem and then introduce
quadrature rules to accommodate the irregularity. Standard quadrature rules
for this situation are not available. Reasonable accuracy can be achieved
by using ordinary Riemann sums with areas determined from a space-filling
curve map. That is, one maps the high dimensional domain onto [0,1], then
one assigns weights to the points according to the length their images span
in [0,1].
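A sketch of the weight-assignment step (assuming the points have already
been mapped to sorted images t(i) in [0,1]; the values are invented):

    ! Riemann-sum weights for irregular points: each point receives the
    ! length of the part of [0,1] nearer to its image t(i) than to any
    ! other image, i.e. the gap between the midpoints to its neighbors.
    program sfc_weights
      implicit none
      integer, parameter :: n = 5
      real :: t(n), wgt(n)
      integer :: i
      t = [0.05, 0.20, 0.45, 0.70, 0.95]  ! sorted space-filling-curve images
      do i = 1, n
         if (i == 1) then
            wgt(i) = (t(1) + t(2))/2.0
         else if (i == n) then
            wgt(i) = 1.0 - (t(n-1) + t(n))/2.0
         else
            wgt(i) = (t(i+1) - t(i-1))/2.0
         end if
      end do
      print *, 'weights:', wgt, '  sum =', sum(wgt)   ! weights sum to 1
    end program sfc_weights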
Note that certain values of f might be very uncommon and hence the
optimization obtained there might be unreliable. Fortunately, the rarity of
As a final note, we consider the time that might be required for a complete
determination of the "best" scheduling algorithm. Given a fairly constant
job configuration, we assume that we can obtain values for ‖p‖ and all
other quantities within a 10 minute time interval. This corresponds to 1
function evaluation. Thus we are led to assume that one evaluation of OPT
takes from 1/2 to 1 day of system time. The inefficiency due to the lack
of control over setting parameters will probably double this time, say to
1 1/2 days. The number of evaluations of OPT needed to obtain semi-reasonable
reliability in the αᵢⱼ computations is probably the order of 50 or 100.
This implies (at 1 1/2 days per evaluation) about 3 to 6 months to select
the best scheduling algorithm.
Note how this approach differs from the common theoretical approach.
There one assumes some model for the computer operation and then analytically
obtains a good (or optimum) scheduling algorithm for that model. Here there
is no explicit model of the computer operation; one tries to obtain a good
scheduling algorithm by observing the system's behavior directly rather than
through the intermediary of a mathematical model. It is, of course, yet to
be seen just how feasible or effective this direct approach will be.
examine these questions, to indicate what light can be shed on them from
the existing theory of approximation and to point out the new problems in
4. Computation
6.2 Norms and approximation forms. The question of norms enters in the
formance space ℝⁿ to the single number which represents the algorithm per-
the possibilities are well-known. The most common are of the form
   ‖p‖ = [ Σ_{i=1}^n wᵢ pᵢ^r ]^{1/r}
minimax norm). However, the nature of the selection problem is such that
we can anticipate using non-standard norms. The reason is that the perfor-
where
   a(p₂) = {  0          for p₂ ≤ 10,000
            {  10⁻⁵       for 10,000 < p₂ ≤ 20,000
            {  2·10⁻⁵     for 20,000 < p₂ ≤ 30,000
            {  p₂·10⁻⁹    for p₂ > 30,000

and the corresponding term for p₃ is 0 for p₃ < .5
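A sketch of such a norm (one plausible reading of the definitions above;
the weight on p₁ and the p₃ penalty beyond its first branch are placeholders):

    ! A non-standard norm ||p|| = w1*p1 + a(p2) + (p3 term), with the
    ! piecewise penalty a(p2) taken from the thresholds in the text.
    program penalty_norm
      implicit none
      real :: p1, p2, p3
      p1 = 0.8; p2 = 25000.0; p3 = 0.3
      print *, '||p|| =', 0.5*p1 + apen(p2) + bpen(p3)
    contains
      real function apen(q)          ! the penalty a(p2)
        real, intent(in) :: q
        if (q <= 10000.0) then
           apen = 0.0
        else if (q <= 20000.0) then
           apen = 1.0e-5
        else if (q <= 30000.0) then
           apen = 2.0e-5
        else
           apen = q*1.0e-9
        end if
      end function apen
      real function bpen(q)          ! p3 term: only the first branch is known
        real, intent(in) :: q
        bpen = 0.0
        if (q >= 0.5) bpen = q       ! placeholder beyond the known branch
      end function bpen
    end program penalty_norm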
There are two observations, one positive and one negative, about such construc-
tions. The negative one is that they do complicate the theory sometimes and,
more often, make the computations substantially more difficult. The positive
the choice of approximation form. That is, if one has a good choice of approx-
imation form, one obtains a good approximation for any reasonable norm. This
implies that one can, within reason, modify the norm used so as to simplify
tion is that one cannot compensate for a poor choice of approximation form
a. discrete
b. linear
c. piecewise
d. general non-linear:
standard mathematical
separable
abstract
theory. The form is to be used for the selection mapping S(f(x)): ℱ → 𝒜
and we visualize a parameter (or coefficient) space 𝒞 plus a particular
Rational: S(f(x),c) =
Exponential: S(f(x), c)
Non-linear, separable:
[Tree form example: a decision tree whose nodes test conditions such as
c₃f₁f₂ + c₄ > c₅(f₁+f₂)², with NO and YES branches leading to further
tests or to the selected value S.]

Algorithm:
      FUNCTION S(F,C)
C     An algorithm form: the selection mapping is itself a small program
C     whose coefficient array C contains loop limits, multipliers and
C     branch thresholds.
      REAL F(*), C(*)
      N1 = C(1)
      SUM = 0.
      DO 20 K = 1, N1
   20 SUM = SUM + C(K+1)*F(K)
      IF ( F(1) .GT. C(1) ) SUM = SUM/( C(N1+1) + 1. )
      PROD = 1.
      IF ( F(N1+2) .LT. ( C(N1+1) + F(2) )/F(3) ) PROD = F(1)*F(2)
      N3 = C(N1+3)
      DO 40 K = 1, N3
   40 PROD = ( F(K) + C(K) )*PROD + C(N1+K+3)
      S = C(1)*SUM + C(2)*PROD + C( N1+N3+1 )*F(1)
      RETURN
      END
The main thrust of approximation theory is for the case where the co-
this manifold.
One thus may conclude that there are three distinct situations as
most favorable situation is for the linear, piecewise linear and nonlinear
it currently exists. This does not mean that all of these cases are already
solved and all one has to do is to "copy" the solutions from somewhere.
Rather, it means that these are the kinds of problems the machinery is supposed
The second situation is for the tree and algorithm forms. Here it seems
that a major change in emphasis is required. The exact nature of the new
machinery is certainly unclear and no doubt there are hidden difficulties which
are not apparent from a casual inspection. However, it seems plausible that
the general spirit of the approach and techniques may well be similar to that
already existing. For example, the piecewise linear forms may be visualized
as one of the simplest of the tree forms. The development and analysis for
the piecewise forms (even for variable pieces) has progressed fairly smoothly
over the past 10 years and the resulting body of results is very much of the
There were (and still are), of course, some difficult questions for the piece-
wise linear, but the prospects do not appear to be too bad for developing a
useful body of approximation theory machinery for the tree and algorithm forms.
The third and least favorable situation is for the discrete forms. The
in this case. One ascertains the best selection mapping by a finite enumer-
ation. Unfortunately, the enumeration may well be over very large sets. Even
sidered (at least in some abstract sense). It is not at all clear how
algorithm selection procedures are to evolve in this situation and the develop-
ment of such procedures is one of the foremost open questions in this entire
area of study.
machinery comes into play after this choice is made. Thus it is essential to
have insight into both the problem and algorithm spaces and into the possible
This section has two distinct parts. First, we introduce the concept
ting the overall value of various approximation forms for the algorithm
selection problem.
and 1 disk
c. Scene analysis.
fewer lines
to analyze.
algorithms are developed for a particular class of problems even though the
goes from easy to hard. Thus one visualizes a nested set of problems where
the innermost set consists of very easy problems and the largest set consists
classification (at least in a reasonable way) for complex problem spaces. One
is lacking the insight to know in all circumstances just what makes a problem
hard or easy.
6.3.2 Degree of Convergence. The idea of degree of convergence comes from con-
these forms do as one goes further out in the sequence? A standard example
We assume that for each approximation from the sequence we have the best
coefficients possible.
algorithm for every problem. If we let A*(x) be the best algorithm for
problem x and let A_N(x) be the algorithm chosen by the best coefficients for
the N-th approximation form, then the question is: How does
   E_N = max_{x∈𝒫} ‖p(A*(x),x) − p(A_N(x),x)‖

behave as N increases? The rate at which E_N goes to zero is
called the degree of convergence for the problem space ~ and the sequence of
approximation forms.
degree of convergence is known for many cases. In the standard case the
problem is to evaluate a function f(x) and the best algorithm A*(x) is taken
   E_N ~ K·N^{−N}  for some constant K. In this case E_N goes to zero extremely
fast. If one replaces sin(x) by ABS(X−1), then E_N ~ K·N^{−1} which is not
matical context where one knows a variety of properties. We can say, however,
that really fast convergence using simple forms (i.e. polynomials and similar
everywhere). A large proportion (at least 50%) of the "functions" that arise
operation is allowed.
in Figure 7.
[Figure 7. A simple machine with input, memory (X, NTERMS, COEF(K)),
a special multiply unit, an add unit and a test unit, executing the
program:

      READ X
      F = COEF(0)
      DO 10 K = 1, NTERMS
      F = F + X*COEF(K)
   10 CONTINUE
      PRINT F
]
machine. For example, a machine which can only add and multiply but which
into the same framework as the piecewise forms and the tree or algorithm
These problems are orders of magnitude simpler than the typical situation
that arises in the algorithm selection problem. Thus there is little hope
for the near future that we will obtain optimal algorithms for most of these
complexity in the algorithm selection problem, there are three good reasons
to consider the idea. First, it provides the proper framework within which
to contemplate the problem. Second, the results for simple problems show
that the standard ways of doing things are often not optimal or even anywhere
is likely that further theoretical developments in the area will indicate that
problems.
situation becomes more and more extreme. We do not attempt to define this
moves away from these easy problems. A robust algorithm then is one whose
performance degrades slowly as one moves away from the problems for which
it was designed. Since the problem space is so large and so poorly under-
There is a reasonable probability that one will face a problem with a com-
student in a classroom. One has three candidate algorithms for the estimate:
the average wealth, the median wealth and the mid-range wealth. In a
mid-range now produces ridiculous estimates like $200 or $300 million and
the average is not much better with estimates like $20 or $30 million. The
a wealthy person and thus is a very robust algorithm for this problem.
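A sketch of the three estimates (the dollar figures are invented):

    ! Mean, median and mid-range wealth for ten "students", before and
    ! after one very wealthy person joins; only the median stays sensible.
    program robust_wealth
      implicit none
      real :: wealth(10)
      wealth = [5.0e3, 8.0e3, 1.0e4, 1.2e4, 1.5e4, &
                1.8e4, 2.0e4, 2.5e4, 3.0e4, 4.0e4]
      call report(wealth, 'ordinary class:     ')
      wealth(10) = 5.0e8                  ! a wealthy person joins
      call report(wealth, 'with wealthy person:')
    contains
      subroutine report(w, label)
        real, intent(in) :: w(:)
        character(*), intent(in) :: label
        integer :: n
        n = size(w)   ! w is kept sorted, so the estimates read off directly
        print *, label, ' mean =', sum(w)/n, &
                 '  median =', (w(n/2) + w(n/2+1))/2.0, &
                 '  mid-range =', (w(1) + w(n))/2.0
      end subroutine report
    end program robust_wealth

With the wealthy person present the mid-range jumps to about $250 million
and the mean to about $50 million, while the median barely moves.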
While the average is more robust than the mid-range, it is not very satis-
measure. In some situations one can achieve robustness with very simple
specific situations exist which exhibit behaviors exactly opposite the usual
one. We have already noted that the most crucial decision in the algorithm
and disadvantages of the forms as they interact with the special features of
a choice of form for the algorithm selection mapping is made which achieves
situations.
6.4.1 Discrete Forms. One might tend to dismiss this case as "degenerate". After
all, if one is merely to select the best one of three or eleven algorithms,
there seems to be little need for any elaborate machinery about approximation
forms. We do.not imply that how to:identify the best will be easy. rather
there are some very interesting and challenging features of these forms.
in fact or in concept a very large set. Even though we may have selected
samples from a very much larger set. Recall from the discussion of the
numerical quadrature problem that there may well be tens of millions of algo-
rithms of even a rather restricted nature. Thus in the mind's eye there is
is in its ability to handle problems involving very large finite sets. The
emphasis has been on developing tools to handle problems with infinite sets
(e.g. the continuum) and one frequently draws a complete blank when faced
We are really saying that the proper way to consider discrete forms is
about continuous forms (such as presented later in this section) and hope-
these lines because we have no knowledge of the possible continuum behind the
discrete set.
We conclude by recalling that robustness is a property of individual
evaluated for each algorithm in the discrete set. However, if the set is
large, then this is impractical. In this latter case, one probably must
continuum.
6.4.2 Linear Forms. There are so many obviously nice things about linear forms
that we might tend to concentrate too much on what is bad about them; or
we might tend to ignore anything bad about them. Some of these nice things
are:
These observations imply that we should give these forms first consideration
and that we should try other things only after we are fairly sure that some
The bad thing about these forms comes from the following experimentally
observed fact: Many real world processes are not linear or anywhere close to
cannot prove them here. Indeed, certain theoretical results (e.g. the
Weierstrass Theorem) are frequently used to support just the opposite con-
space 𝒫 has just one attribute of consequence and we call it x (which
identifies the problem with a real number that measures this attribute).
Our algorithm space 𝒜 is likewise simple with one attribute which we call
A. Suppose that x and A range between 0 and 1 and suppose the best algorithm
is A = .27 if x < .41, A = .82 if .41 < x < .8 and A = .73 for
x > .8. The best or optimal algorithm selection mapping is then as shown
in Figure 8 (left).
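Written out, this optimal mapping is the step function below (a direct
transcription of the values just given):

    ! The optimal selection mapping of the illustration:
    ! A = .27 for x < .41,  A = .82 for .41 <= x <= .8,  A = .73 for x > .8
    real function sopt(x)
      implicit none
      real, intent(in) :: x
      if (x < 0.41) then
         sopt = 0.27
      else if (x <= 0.8) then
         sopt = 0.82
      else
         sopt = 0.73
      end if
    end function sopt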
[Figure 8. The optimal algorithm selection mapping (left) and a linear
approximation to it (right).]
polynomials, e.g.
expect results such as shown in Figure 9 (provided one has been careful
in the computations). It is hard to argue that either one of these selec-
tion mappings is a good approximation to the optimal one. Note that in both
cases the polynomials are truncated at either A=0 or at A=1 in order
[Figure 9. Two polynomial approximations to the optimal selection
mapping of Figure 8, oscillating about the three levels and truncated at
A = 0 and A = 1.]
nomials? One frequently sees Fourier Series (sines and cosines), exponen-
tials, Bessel functions, etc., etc. None of these give noticeably better
selection mappings — unless, of course, one includes the optimal mapping
itself, and then we find c₁* = 0 and c₂* = 1 gives a perfect approxima-
tion.
This last observation shows the impossibility of making universal
right things, then the linear forms can do very well indeed. In practice,
though, one is usually limited to just a few possibilities and one has
very little information about the optimal mapping. Note that a typical
not likely to hit upon the optimal mapping as one of the things to include
tions there are numerous results about how the error of polynomial and
there will always be a large error at that jump. We also see that the large
then it is known that the degree of convergence for N-terms is like liN.
That means that if 1 term gives a unit error, then 10 terms give a .1 error,
100 terms give .01 error, etc. This is a very bad situation even for the
again the number of terms. For K=5, if 1 term gives a unit error then we
would expect to need about 32 terms for 1/2 unit error, 1000 terms for 1/4
unit error and 100,000 for .1 error. For K=lO, the corresponding numbers
are 1,000, 1,000,000 and 10¹⁰, respectively, for errors of 1/2, 1/4 and .1.
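The arithmetic behind these counts, assuming the stated rate:

    E_N \approx N^{-1/K} \quad\Longrightarrow\quad N \approx E_N^{-K},
    \qquad\text{e.g. } K = 5:\ N = 2^5 = 32 \text{ for } E_N = 1/2,
    \quad N = 10^5 \text{ for } E_N = 0.1 .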
How often can one expect the problem space to produce selection
phenomena from physics and engineering problems indicates more than 50% of
these functions are unsuitable for polynomials and other standard linear
mathematical forms. This includes Fourier Series which are currently widely
results. There is an intuitive reason why one should expect this. Many
dominate the behavior. As one goes from one domain to another there is a
being required. Recall that polynomials, Fourier Series, etc. have the
many real world situations and is another intuitive reason for doubting the
One must admit that the above arguments are taken from simplified and
tion problems is very tenuous indeed. Yet, we conjecture that things get
worse rather than better as one gets away from these situations into a
6.4.3 Piecewise Linear Forms. In simple terms, we break up the problem domain into
pieces and use separate linear forms on each piece. The motivation is to
cases the most crucial step is to determine the appropriate pieces and yet
these forms assume that they are fixed and given by some a priori process.
In these cases we in fact have a two stage process: the first is an intuitive-
coefficients for each of the linear pieces. Note that there are often some
interconnections between the pieces (for example, broken lines are piecewise
linear functions of one variable which join up continuously) which give rise
to mathematical problems which are non-standard but still linear (and hence
usually tractable).
because of the vagueness of the process for determining the pieces. Indeed,
if the pieces are poorly chosen or too big, then one can have all the
difficulties mentioned with the traditional linear forms. On the other hand,
improvement to happen.
(ii) Sometimes the problem domain is small enough that one can
something like 1/N to 1/N², where N is the number of coefficients
computation.
6.4.4 General Nonlinear Forms. It is not very profitable to discuss such forms
in the abstract. These forms include everything, including the best possible
selection mapping, and thus one can do perfectly with them. Thus we must
a variety of such classes. A partial list of these with simple examples is:
   Polynomials:                  c₁ + c₂x + c₃x²

   Rational Functions:           (c₁ + c₂x + c₃x²)/(c₄ + c₅x)

   Exponential/Trigonometric Functions:   c₁e^{c₂x} + c₃e^{c₄x} + ⋯

   Piecewise Polynomials:        c₁ + c₂x + c₃x²            for −∞ < x ≤ c₄
                                 c₅ + c₆x + c₇x² + c₈x³     for c₄ < x ≤ c₉
                                 c₁₀ + c₁₁x + c₁₂x²         for c₉ < x < ∞
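For instance, the piecewise polynomial above can be evaluated directly (a
sketch; the break points c₄ and c₉ are stored with the other coefficients):

    ! The three-piece polynomial example: pieces joined at the (variable)
    ! break points c(4) and c(9).
    real function pwpoly(x, c)
      implicit none
      real, intent(in) :: x, c(12)
      if (x <= c(4)) then
         pwpoly = c(1) + c(2)*x + c(3)*x**2
      else if (x <= c(9)) then
         pwpoly = c(5) + c(6)*x + c(7)*x**2 + c(8)*x**3
      else
         pwpoly = c(10) + c(11)*x + c(12)*x**2
      end if
    end function pwpoly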
There are several general statements that one can make about these
forms:
(i) A considerable (or even very extensive) amount of analysis has been
(iii) The computational effort required to obtain best (or even very
a variety of cases.
selection mapping. One then chooses that nonlinear form which possesses this
behavior and for which one can handle the analytical and computational
difficulties.
somewhat of an art and there is no algorithm for making the choice. On the
other hand, the degree of convergence and complexity results for rational
functions and piecewise polynomials show that they have great flexibility
and are likely to do well in most situations. Doing well might not be
good enough. In real problems the dimensionalities are high and needing
for an n-dimensional feature (or problem) space. With n=2 this is a modest
coefficients of another approach, but in either case one cannot use the
forms.
6.4.5 Tree and Algorithm Forms. These forms are most intriguing because they
promise so much and have the mystery of the unknown. Perhaps it is a case
of the grass being greener on the other side of the fence. These forms may
have difficulties and disadvantages which are not apparent now but which
The primary basis for their promise is their flexibility and potential
traditional forms have taken many years to develop and even now can be
will severely restrict the usefulness of these forms for many years.
The piecewise linear forms are an example of a simple tree form and
their success bodes well for other cases. Computational techniques and
theoretical analysis for these forms is progressing steadily and we can look
for them to enter into the "standard and routine" category before long. This
development should serve as a useful guide for other simple tree and algo-
rithmic forms. Still, we are very far removed from the time when we can
select as our approximation form a 72 line Fortran program and then compute
the best "coefficient values" (Fortran statements) for a particular appli-
cation.
In summary, we have very little hard information about these forms,
but they appear to hold great promise and to provide a great challenge for
6.5 An error to avoid. Occasionally one observes the following situation develop:
(ii) A crude model is made of it. This model perhaps has some
In the specific instance at hand, the real world problem is the algorithm
selection mapping, the model is the approximation form selected and the
effort is in determining the coefficients of this form. The error that one
can make is in believing that finding the best coefficients of the selection
to believe that the best coefficients will give good selections. One is
particularly susceptible to making this error when using simple linear forms
for the selection mapping. One may refer to Figure 8 for an illustration
of this situation.
6.6 The Mathematical Theory Questions. This section presents an intuitive
6.6.1 The Existence Question. In concrete situations one rarely worries about
worries about the existence of good ones). Yet, from time to time this
are just sets of real numbers and the question is readily reduced to a
problem about sets of real numbers. One then attempts to show that:
This line of reasoning may fail at various points for nonlinear approximations
S(f,c).
easy to see such silliness in more complex examples. The second example
to rewrite this form so that the difficulty disappears. One can, however,
   S(f,c) = 1/(1 + c₂f²),   f ∈ {−1, 0, 1}

Thus the feature f can take on only one of three possible values and we
Suppose now that the best selection (of all possible forms and problems) is
   S(0,c) = 1/(1 + 0·c₂) = 1,    S(−1,c) = S(+1,c) = 1/(1 + c₂)
We can make S(±1,c) as close to zero as we want by making c₂ large; however,
if we set c₂ = ∞, then S(0,c) is ruined. The difficulty in this example is
the approximation form chosen must be extended in some way. A simple mathe-
matical example of this occurs for the two-exponential form c₁e^{c₂f} + c₃e^{c₄f}.
Now let c₃ = −c₁, c₂ = c₄ + ε, c₁ = a/ε and then let ε go to zero. The result is

   a·f·e^{c₄f}
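The degeneration can be checked directly:

    \lim_{\varepsilon \to 0} \frac{a}{\varepsilon}
      \left( e^{(c_4+\varepsilon)f} - e^{c_4 f} \right)
      = a f e^{c_4 f},
    \qquad\text{since } \frac{e^{\varepsilon f} - 1}{\varepsilon}
      \longrightarrow f .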
and we see that this form with two exponentials also contains a function of
completely different mathematical form. However, the plot of f·e^f and
is a singularity at the north and south poles for the geographic coordinates
[Figure 10. The curve f·e^f and nearby curves of the form c₁e^{c₂f} + c₃e^{c₄f}
with various values of c₁, c₂ and c₃.]
Consider piecewise linear forms (broken lines) with variable break points.
Figure 11 shows two things that can happen when the break points come together.
On the left we see that two of them can converge so that the result is a
step function with a jump discontinuity. On the right we see that four
Figure 11. Two ways that non-linear break points in a broken line form
can introduce new forms: a jump discontinuity (left) and
a "delta" function (right).
the definition are discovered, then one can expect computational difficulties.
For example, if one is using the two-exponential form c₁e^{c₂f} + c₃e^{c₄f}
and the best approximation is f·e^f (or nearly so), then the computations
expect the same phenomena to occur for the tree and algorithm forms. A very
6.6.2 The Uniqueness Question. One is usually not interested in this question
per se; any best (or good) approximation will do. However, its study, like
may arise.
[Figure 12. A curve representing the forms S obtained by varying the
coefficients c, with sample points x and y and their closest points on the
curve.]
Figure 12. First is that almost all points have a unique best approximation
even if a few do not. Second, we see that when there is more than one
The point x, for example, has best approximations x₁ and x₂. Finally, the
point y illustrates the most difficult situation where even though the
in Figure 12. First, and somewhat less important, one can expect trouble
at those points where two or more closest points are close together. This
occurs near the three ends of the "lines of non-uniqueness" in Figure 12.
More important is the fact that computational schemes are almost always local
in nature and thus might well locate y₂ as the closest point to y. Further,
such schemes usually give no inkling that there might be a point much closer
approximation (y₂ is far from y) and our limited experience in these matters
does support the hope that "good" locally best approximations are likely to
best coefficients C*, then we have minimized something, namely our measure
for example, the derivation of the normal equations for least squares
linear programming problems is obtained this way modulo the changes necess-
Now, the maximum only occurs at the extrema of |f−S| and if we denote them
by tᵢ*, then

   (∂/∂cⱼ) |f(t) − S(c,t)|  at t = tᵢ*  is 0,    j = 1,2,…,n;  i = 1,2,3,…

or

   sign[f − S] · (∂/∂cⱼ) S(c,t)  at t = tᵢ*  is 0,    j = 1,2,…,n;  i = 1,2,3,…
If S(c,t) is linear, i.e., S(c,t) = Σ_{j=1}^n cⱼφⱼ(t), then we have

   (1)   sign[f − S] · φⱼ(t)  at t = tᵢ*  is 0,    j = 1,2,…,n;  i = 1,2,3,…
extrema tᵢ* must occur with a combination of signs so that it is impossible
to satisfy (1); the signs must alternate, so that

   sign[f − S]  at t = tᵢ*  is (−1)^i or (−1)^{i+1},    i = 1,2,…,n+1
approximations.
The main point made is that almost all characterization conditions come
from setting derivatives equal to zero even though in some cases it may look
The implication for computation is that they also are based on finding
in nature (unless one is lucky) and share many of the computational properties
difficult to initialize for convergence. Some methods may have all three of
6.7 Conclusions, open questions and problems. One objective of this section
We also conclude that approximation theory currently lacks much of the necessary
new results for and apply known techniques to these new circumstances. The
problem and general optimization theory. This is not surprising since the
have not attempted to detail this relationship here, but one may refer to Rice (1970)
moderate to high dimensionality and thus one should expect them to be quite
the best selection. Indeed, the results of Rabin (1974) suggest that this com-
form for the selection mapping. It is here that theories give the least
1. What is the relationship between tree forms and piecewise linear forms?
Can all tree forms be made equivalent to some piecewise form, linear or
non-linear?
2. What are the algorithm forms for the standard mathematical forms? Do
they suggest useful simple classes of algorithm forms? See Hart et al.
(1968, Chapter 4) for algorithm forms for some polynomial and rational forms.
8. Obtain more precise information about the nature of real world functions?
For simplicity, one may use one evaluation of f(x) as the unit of
mathematical forms.
12. Develop techniques to partition high dimensional problem sets into subsets
in one dimension.
13. Develop existence theorems for various classes of tree form approximations.
14. What are the relationships between best algorithm selection and the
REFERENCES
Hart, John F., et al., Computer Approximations, John Wiley, New York, 1968.