
Annals of Operations Research (2024) 343:573–605

https://doi.org/10.1007/s10479-024-06245-5

ORIGINAL RESEARCH

The unexpected power of linear programming: an updated collection of surprising applications

Bruce Golden1 · Linus Schrage2 · Douglas Shier3 · Lida Anna Apergi1

Received: 14 June 2024 / Accepted: 22 August 2024 / Published online: 21 September 2024
© The Author(s) 2024

Abstract
Linear programming has had a tremendous impact in the modeling and solution of a great
diversity of applied problems, especially in the efficient allocation of resources. As a result,
this methodology forms the backbone of introductory courses in operations research. What
students, and others, may not appreciate is that linear programming transcends its linear
nomenclature and can be applied to an even wider range of important practical problems.
The objective of this article is to present a selection, and just a selection, from this range of
problems that at first blush do not seem amenable to linear programming formulation. The
exposition focuses on the most basic models in these selected applications, with pointers
to more elaborate formulations and extensions. Thus, our intent is to expand the modeling
awareness of those first encountering linear programming. In addition, we hope this article
will be of interest to those who teach linear programming and to seasoned academics and
practitioners, alike.

Keywords Linear programming · Curve fitting · Goal programming · Discriminant analysis · AHP · Game theory · DEA · Applications in finance · Scheduling · Intercollegiate gymnastics

Bruce Golden, Linus Schrage, Douglas Shier and Lida Anna Apergi have contributed equally to this work.

This is an updated version of the paper “The power of linear programming: Some surprising and unexpected
LPs” that appeared in 4OR, 19(1), 15–40, 2021.

Douglas Shier
[email protected]
Bruce Golden
[email protected]
Linus Schrage
[email protected]
Lida Anna Apergi
[email protected]
1 Decision, Operations and Information Technologies, Smith School of Business, University of
Maryland, Van Munching Hall, College Park, MD 20742, USA
2 Booth School of Business, University of Chicago, Chicago, IL 60637, USA
3 School of Mathematical and Statistical Sciences, Clemson University, Martin Hall, Clemson, SC 29634,
USA


1 Background

Graduate students in operations research almost always take a course in linear programming
during their first year of study. They learn about applications to farming, nutrition, efficient
manufacturing, transportation optimization, energy planning, telecommunication, etc. They
study the simplex method as well as duality theory and sensitivity analysis. They might also
learn about well-studied special cases such as the transportation problem. If time permits, an
introduction to integer programming might also be included. All of this is extremely useful.
On the other hand, it is hard to appreciate the power and flexibility of linear programming
at first glance. For most of us, it takes many years to fully understand how widely applicable
the linear programming model is. The primary purpose of this article is to help our younger
colleagues appreciate George Dantzig’s gift to us at the beginning of their unfolding careers
as operations researchers.
George Dantzig proposed the simplex method in the summer of 1947. In October 1947,
Dantzig and John von Neumann met for the first time. Dantzig described the linear pro-
gramming model and von Neumann shared his initial insights with Dantzig (more on this
later).
In September 1948, there was a meeting of the Econometric Society in Madison, Wiscon-
sin. Dantzig presented a lecture on linear programming entitled “Programming in a Linear
Structure” to a distinguished audience of economists, statisticians, and mathematicians. After
the presentation, Professor Harold Hotelling, a prominent statistician at the University of
North Carolina, made the critical comment: “But we all know the world is nonlinear.” (For
more historical details, the reader is referred to Assad and Gass (2011) and Lenstra et al.
(1991).)
In the following sections, we demonstrate the power of linear programming to model
not only some important nonlinear relationships but also an unexpected variety of decision
problems. This paper is an extension of Golden et al. (2021), including additional sections
on measuring circularity of approximately circular manufacturing data, optimization with a
ratio objective, single resource scheduling, and team construction in gymnastics.

2 Prediction and curve fitting

In this section, we use prediction and curve fitting to demonstrate that linear programming,
despite its name, can also be used to model some important nonlinear relationships.
Linear regression is perhaps the most commonly used technique for predictive analysis.
The goal in linear regression is to model the relationship between a dependent variable,
which is the value we are trying to predict, and one or more independent variables. When
the model consists of one independent variable we have simple linear regression and when
it consists of two or more independent variables we have multiple linear regression. As can
be inferred by the name, the model is linear, which means that the relationship between the
dependent and the independent variables is assumed to be linear. In this section, we will try to
answer the following questions. First, can linear regression handle nonlinear data? Second,
can linear programming (LP for short) be used to solve the same prediction problems as linear
regression? We hope to use this line of reasoning to respond, at least partially, to Hotelling’s
criticism.
A common way of estimating the parameters of a linear regression model is the ordi-
nary least squares (OLS) method. Consider the following multiple linear regression model


consisting of k independent variables: yi = β0 + β1 xi1 + · · · + βk xik + εi, with y the
dependent variable, x the independent variables, βj the regression coefficients, and ε the
random error component. The objective of the OLS method is to minimize the sum of the
squares of the errors, which are the differences between the dependent variable observed
in the data and the prediction of the model. The least squares function is
∑_{i=1}^n εi² = ∑_{i=1}^n (yi − β0 − ∑_{j=1}^k βj xij)²
and is minimized with respect to β0, β1, ..., βk. In the case of a simple linear regression
model, the least squares function becomes ∑_{i=1}^n (yi − β0 − β1 xi)².
The OLS method has been around for about 200 years (Stigler, 1981). Since we can take par-
tial derivatives and set them equal to zero, we can solve for the beta values analytically.
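Since setting the partial derivatives to zero yields the normal equations, the OLS coefficients can be computed with a few lines of linear algebra. The sketch below uses made-up data (not the dataset of Fig. 1) purely to illustrate the mechanics:

```python
import numpy as np

# Made-up sample: y is roughly linear in x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Setting the partial derivatives of the least squares function to zero
# gives the normal equations (X'X) beta = X'y.
X = np.column_stack([np.ones_like(x), x])
beta0, beta1 = np.linalg.solve(X.T @ X, X.T @ y)
```

For this sample the solve returns the intercept and slope directly, with no iterative search.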
Nevertheless, there are other ways of estimating the parameters of a linear regression
model. One approach is the least absolute deviations (LAD) method. In this case, the
objective changes to minimizing the sum of absolute errors instead of the squared errors,
as in the case of OLS. In other words, the objective is to minimize the following function:
∑_{i=1}^n |yi − β0 − ∑_{j=1}^k βj xij|. The LAD model can be formulated as an LP. For
example, in the case of a simple linear regression model, the LP for LAD is the following:

min ∑_{i=1}^n ui (1)
subject to
−ui ≤ yi − β0 − β1 xi ≤ ui, ∀i (2)
ui ≥ 0, ∀i (3)
β0, β1 unrestricted. (4)

In the above LP, yi and xi are inputs since they are observations in our data. The decision
variables in this LP are ui, β0, and β1. Variable ui is the absolute error of observation i,
resulting from our regression model. In particular, constraints (2) and (3) ensure that the
absolute error of each observation will be nonnegative and at least as large as the difference
between the observed and the predicted value. Constraint (2) is equivalent to |yi − β0 − β1 xi| ≤ ui.
(In general, |z| ≤ r is equivalent to −r ≤ z ≤ r.) The fact that the objective function (1)
is to minimize the sum of ui over i ensures that each error will be equal to the absolute value
of the difference between the observation and the prediction. Variables β0 and β1 give us the
intercept and slope coefficient of our simple linear regression model, respectively.
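The LP (1)–(4) translates almost line by line into code. The sketch below uses scipy.optimize.linprog on a small made-up dataset (five points on a line plus one outlier); it is meant only to show the LP structure, and it also illustrates LAD's robustness, since the fitted line ignores the outlier:

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(x, y):
    """Fit y ~ beta0 + beta1*x by least absolute deviations, i.e. the LP (1)-(4)."""
    n = len(x)
    # Variable order: [beta0, beta1, u_1, ..., u_n]; objective (1) is the sum of u_i.
    c = np.concatenate([[0.0, 0.0], np.ones(n)])
    A_ub, b_ub = [], []
    for i in range(n):
        # y_i - beta0 - beta1*x_i <= u_i, rewritten as a <= row
        row = np.zeros(n + 2); row[0] = -1.0; row[1] = -x[i]; row[2 + i] = -1.0
        A_ub.append(row); b_ub.append(-y[i])
        # -(y_i - beta0 - beta1*x_i) <= u_i
        row = np.zeros(n + 2); row[0] = 1.0; row[1] = x[i]; row[2 + i] = -1.0
        A_ub.append(row); b_ub.append(y[i])
    bounds = [(None, None), (None, None)] + [(0, None)] * n  # betas free, u_i >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1], res.fun  # intercept, slope, sum of absolute errors

# Made-up data: five points exactly on y = 2 - 1.5x plus one outlier.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 - 1.5 * x
y[2] += 10.0  # outlier
b0, b1, sae = lad_fit(x, y)  # LAD recovers the line through the five clean points
```

Here the optimal LAD line passes through the five collinear points, so the entire objective value equals the outlier's residual.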
The main difference between the OLS and the LAD models is that OLS gives greater
emphasis to outliers, while LAD gives equal emphasis to each observation. This makes LAD
less sensitive to outliers than OLS. On the other hand, LAD can admit multiple optimal
solutions, while the OLS solution is unique. However, both models can be solved quickly,
with OLS being solved analytically and LAD iteratively. Consider the example depicted in
Fig. 1. We can see from the figure that the relationship between y and x does not seem to be
linear, but we can disregard this for now. Another quick observation is that there seem to be
two outliers in the upper left. In this example, we have 22 observations, which we want to fit
using simple linear regression. We have applied both the OLS and LAD methods in order to
estimate the parameters of these two lines. The model for OLS is y = 3.3082 − 1.9955x and
for LAD it is y = 2.3737 − 1.3157x. The results show that the OLS model is affected more
by outliers, since the line passes closer to observations (0, 5.9) and (0.2, 4.9). On the other
hand, the LAD model passes closer to a larger number of observations. Furthermore, the
number of negative and positive residuals (i.e., observations below and above the regression
line) is more equally balanced in the case of the LAD line. In particular, there are 10 negative,


Fig. 1 Regression lines for the OLS and LAD models

10 positive, and 2 residuals equal to zero in the case of the LAD. In the case of the OLS,
there are 13 negative and 9 positive residuals. This behavior is to be expected.
Robustness is an advantage for the LAD model, whereas non-uniqueness is a limitation.
In practical situations, one would suspect that the vast majority of LAD solutions are unique.
Suppose we wanted to test this conjecture. In particular, consider the question: How frequently
do multiple optimal LAD solutions arise as a function of the number of observations, number
of independent variables, and R-squared value? For a given dataset, we could determine if an
LAD solution is unique by solving one additional linear program. We present this surprisingly
simple linear program in Appendix A. Based on our limited experiments, it seems highly
unlikely that LAD admits multiple optimal solutions in practical situations.
Another reason why it sometimes makes more sense to use LAD than OLS involves side
constraints. Prior or sample information or relevant economic considerations may suggest
additional constraints such as
1. 0 ≤ β0
2. 0 ≤ β0 ≤ β1
3. y = β0 + β1 x must pass through point (x̃,ỹ)
or something similar. When we add these constraints to an LAD model, we still have a
relatively simple LP to solve. On the other hand, the OLS model becomes a quadratic program,
which is typically more difficult to solve.
Another method for estimating the parameters of a linear regression model involves min-
imizing the maximum absolute deviation (MMD). In this case, the objective is to minimize
the maximum absolute error across all observations. The aim of this approach is to generate a
regression model that is more equally fair for all observations, by trying to avoid cases where
a few observations have a very large deviation from the predicted value. This method is also
formulated as an LP, as presented below.

min r (5)
subject to
−r ≤ yi − β0 − β1 xi ≤ r, ∀i (6)
r ≥ 0, (7)
β0, β1 unrestricted. (8)

Fig. 2 Regression lines for the OLS, LAD, and MMD models

The formulation has similarities to the LAD model. The decision variables in this case
are r, β0, and β1. Variable r is the maximum absolute error across all observations. This is
the main difference between the LAD and MMD formulations. Otherwise, a similar
formulation technique is followed since r basically replaces the individual errors ui in the
LAD formulation.
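A sketch of (5)–(8), again with scipy.optimize.linprog and a made-up three-point dataset; the MMD (Chebyshev) line balances the maximum positive and negative errors:

```python
import numpy as np
from scipy.optimize import linprog

def mmd_fit(x, y):
    """Fit y ~ beta0 + beta1*x by minimizing the maximum absolute deviation, LP (5)-(8)."""
    # Variable order: [beta0, beta1, r]; objective (5) is r alone.
    c = np.array([0.0, 0.0, 1.0])
    A_ub, b_ub = [], []
    for i in range(len(x)):
        # y_i - beta0 - beta1*x_i <= r
        A_ub.append([-1.0, -x[i], -1.0]); b_ub.append(-y[i])
        # -(y_i - beta0 - beta1*x_i) <= r
        A_ub.append([1.0, x[i], -1.0]); b_ub.append(y[i])
    bounds = [(None, None), (None, None), (0, None)]  # betas free, r >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x  # beta0, beta1, r

# Three tent-shaped points: the best line sits halfway, with residuals -1, +1, -1.
b0, b1, r = mmd_fit(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0, 0.0]))
```

The alternating-sign residuals of equal magnitude are the classical equioscillation signature of a Chebyshev fit.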
Figure 2 extends the example presented before to include the regression line for the MMD
method, which is y = 4.3486 − 2.8160x. We see from the figure that the MMD line is the
one that is affected most by the outliers. This is expected because MMD tries to minimize
the maximum distance between an observed value and its prediction. This is also illustrated
in Fig. 3, which includes bands showing the positive and negative errors with respect to the
observed and predicted values. We see that the best balance between the maximum positive
and the maximum negative distance is observed in the third graph, which corresponds to the
MMD model. The greatest imbalance is observed in the case of LAD, and OLS is in-between.
As mentioned earlier, the data in our example do not seem to follow a linear relationship. In
particular, the scatterplot indicates that a logarithmic transformation applied to the y variable
could help obtain a linear relationship. After transforming the data and using the OLS, LAD,
and MMD methods on the resulting dataset we get the models displayed in Fig. 4. We can
see from the data points that the relationship between y and x has become more linear.
Furthermore, the resulting regressions are very similar to each other. In particular, the OLS
model is log(y) = 1.3592 − 1.6186x, the LAD model is log(y) = 1.2339 − 1.5534x, and
the MMD model is log(y) = 1.3432 − 1.6236x. The reason for the similarity is that there
are no extreme outliers in the data after the transformation, which would cause each method
to behave differently based on its objective.
As a final step, we can transform our models to curves (expressed in terms of y) and see
how well they fit the data. In other words, the OLS model becomes y = e^(1.3592 − 1.6186x),
the LAD model is y = e^(1.2339 − 1.5534x), and the MMD model is y = e^(1.3432 − 1.6236x). Figure 5
displays the resulting curves and it illustrates how they fit the original data. From this example,


Fig. 3 Bands of maximum positive and negative errors for the OLS, LAD, and MMD models

Fig. 4 Regression lines for the OLS, LAD, and MMD models on transformed data

we can see that we can use LP methods to model data that are not necessarily linear. This
provides a partial response to Hotelling’s remark about our nonlinear world. In addition,
linear programming has been used to address a variety of other nonlinear problems (e.g.,
minimize a piecewise linear convex function (Charnes & Lemke, 1954) and solve linear
fractional programming problems (Charnes & Cooper, 1962)).

3 Circular reasoning

The previous section illustrated how LP can be used to estimate model parameters in which
the absolute as well as maximum error of fitting a linear function can be minimized. Here


Fig. 5 Curves for the OLS, LAD, and MMD models

Table 1 Coordinates of drilled holes

k     1    2    3    4    5    6    7    8    9
xk    0.2  0.4  0.5  0.8  1.2  1.6  1.9  2.0  2.2
yk    1.6  0.4  2.2  2.4  0.2  2.3  0.4  0.6  1.8

we indicate how LP plays an important role in determining an excellent approximation in
fitting a circle to existing data. Again this illustrates that the linear designation in “linear
programming” is not an accurate description of the wide applicability of LP to important
engineering problems.
High-precision manufacturing often requires the use of coordinate-measuring machines
to assess the quality of the items produced. Specifically, a number of holes are to be drilled
into an object in a circular pattern in a way that respects a pre-specified tolerance. While this
does not appear to be a problem amenable to linear analysis, it turns out that LP can again
come to the rescue, aided by the use of the MMD metric introduced in Sect. 2.
To provide a specific motivating example, consider how we might measure the deviation
from circularity of the holes drilled by certain precision machinery, whose coordinates
(xk, yk) are given in Table 1 and displayed graphically in Fig. 6. We want to determine the
center (x0, y0) of a circle of radius r0 that best approximates the given set of n points (xk, yk).
This artificial problem was constructed to show how our approach could handle data that was
purposely designed to be noncircular and, therefore, a challenge for approximation.
A natural way to measure the accuracy of this approximation is by computing the maximum
deviation δ of the given data points (xk, yk), k = 1, . . . , n from the circumference of the
proposed approximating circle centered at (x0, y0) and having radius r0. We would then
like to choose parameters x0, y0, r0 to minimize δ = max_k |rk − r0|, where rk measures the
Euclidean distance between (xk, yk) and the proposed center (x0, y0). Since the expression
rk = √((xk − x0)² + (yk − y0)²) is rather intimidating, we will instead try to minimize
Δ = max_k |rk² − r0²| = max_k |(xk − x0)² + (yk − y0)² − r0²| = max_k |Rk|, where
Rk = (xk − x0)² + (yk − y0)² − r0² = (xk² + yk²) − 2xk x0 − 2yk y0 − ρ0,
ρ0 = r0² − x0² − y0².


Fig. 6 Positions of drilled holes

This now leads to the formulation of the following minimax LP (Gass et al., 1998), in
which we wish to minimize the largest value of |Rk| by suitable choice of the unrestricted
variables x0, y0, ρ0:

min t (9)
subject to
t ≥ (xk² + yk²) − 2xk x0 − 2yk y0 − ρ0, ∀k (10)
−t ≤ (xk² + yk²) − 2xk x0 − 2yk y0 − ρ0, ∀k (11)
x0, y0, ρ0 unrestricted. (12)
Once we have solved this LP, we will have values x0, y0, ρ0 and then can compute r0
using ρ0 = r0² − x0² − y0², or r0² = ρ0 + x0² + y0². While ρ0 is not restricted in sign, Gass et
al. (1998) have established that at optimality ρ0 + x0² + y0² ≥ 0 must hold, so that we will
indeed obtain a non-negative real value for the radius r0.
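As an illustration, the minimax LP (9)–(12) can be solved for the Table 1 coordinates with an off-the-shelf solver such as scipy.optimize.linprog (a minimal sketch; the radius is recovered afterward from ρ0):

```python
import numpy as np
from scipy.optimize import linprog

# Table 1 coordinates of the drilled holes.
xs = np.array([0.2, 0.4, 0.5, 0.8, 1.2, 1.6, 1.9, 2.0, 2.2])
ys = np.array([1.6, 0.4, 2.2, 2.4, 0.2, 2.3, 0.4, 0.6, 1.8])

# Variable order: [x0, y0, rho0, t]; minimize t subject to -t <= R_k <= t.
s = xs**2 + ys**2
c = np.array([0.0, 0.0, 0.0, 1.0])
A_ub, b_ub = [], []
for k in range(len(xs)):
    # R_k = s_k - 2*x_k*x0 - 2*y_k*y0 - rho0 <= t   (constraint (10))
    A_ub.append([-2 * xs[k], -2 * ys[k], -1.0, -1.0]); b_ub.append(-s[k])
    # -R_k <= t                                      (constraint (11))
    A_ub.append([2 * xs[k], 2 * ys[k], 1.0, -1.0]); b_ub.append(s[k])
bounds = [(None, None), (None, None), (None, None), (0, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x0, y0, rho0, t = res.x
r0 = np.sqrt(rho0 + x0**2 + y0**2)  # recover the radius from rho0
```

Running this reproduces the center, radius, and objective value reported below for Fig. 7.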
Solution of the problem represented in Table 1 is shown in Fig. 7, where the best-fitting
circle, centered at (x0, y0) = (1.2, 1.28) with radius r0 = 1.122, is displayed along with the
given data points. The optimum objective function value is found to be t = Δ = 0.156. The
true maximum deviation δ = max_k |rk − r0| = 0.072, suggesting that the drilled holes are
within the specified manufacturing tolerance of (say) 0.10.

4 Estimating possible prediction performance

In Sect. 2, we examined two different ways in which we can use LP in order to estimate linear
regression parameters. In this section, we discuss how LP can be used to estimate how well
a dataset can predict a specific outcome, or, in other words, how good is the quality of the
data. This approach can be used before starting the analysis of a dataset in order to get a
preliminary assessment of the expected performance of predictions based on available data.
Furthermore, it can be used to evaluate the prediction models resulting from the analysis.


Fig. 7 Best-fitting minimax circle

This is achieved by generating a lower bound on the absolute prediction error of all models
predicting a specific outcome, based on a given dataset. If the models resulting from the
analysis are close to achieving that bound, it means that it is unlikely that better models can
be obtained. On the other hand, if there is a large gap, it would make sense to investigate
other modeling approaches to better predict the outcome of interest, based on the available
data.
The lower bound can be generated through an LP formulation, which was proposed in
Anderson and Bjarnadottir (2024). This is based on the notion of internal consistency, which
states that if two inputs are very similar, then the corresponding predicted outcomes should
also be similar. The LP follows below:


min ∑i (εi+ + εi−) (13)
subject to
− f(D(i, j)) ≤ Pi − Pj ≤ f(D(i, j)), ∀i, j : D(i, j) ≤ δ (14)
Pi − Yi + εi+ − εi− = 0, ∀i (15)
εi+, εi− ≥ 0, ∀i (16)
Pi unrestricted, ∀i. (17)

In the above LP, the decision variables are εi+, εi−, and Pi. Variable Pi denotes the prediction
for observation i. Variables εi− and εi+ denote the positive and negative difference
between the prediction Pi and the true value observed, which is denoted by Yi. Constraint
(15) ensures that the true difference is calculated between the prediction and the observation.
The objective (13) of the LP is to minimize the sum of errors determined by (15). Finally,
constraint (14) ensures that two observations that are similar to each other will also have sim-
ilar predictions. In particular, the distance between two observations is denoted by D(i, j),
while δ denotes the distance threshold under which two observations are considered to be
similar. If D(i, j) ≤ δ (i.e., observations i and j are similar), then their predictions should


Fig. 8 MAE before and after the transformation for each model and the lower bound

not be farther apart than an amount f (D(i, j)). By f (D(i, j)) we denote a function link-
ing the distance D(i, j) of two observations to the maximum allowable difference of their
predictions. Of course, (14) is equivalent to |Pi − P j | ≤ f (D(i, j)).
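The LP (13)–(17) is straightforward to build in code. The sketch below uses scipy.optimize.linprog; the tiny dataset, beta_max, and delta are made up purely to show the mechanics of the bound:

```python
import numpy as np
from scipy.optimize import linprog

def mae_lower_bound(X, y, beta_max, delta):
    """Lower bound on achievable MAE via the internal-consistency LP (13)-(17),
    with f(D(i, j)) = beta_max * D(i, j) and D the Euclidean distance."""
    n = len(y)
    # Variable order: [P_1..P_n, eps_plus_1..n, eps_minus_1..n].
    c = np.concatenate([np.zeros(n), np.ones(2 * n)])  # objective (13)
    A_ub, b_ub = [], []
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(np.asarray(X[i], float) - np.asarray(X[j], float))
            if d <= delta:
                # constraint (14): |P_i - P_j| <= beta_max * d, as two <= rows
                for sgn in (1.0, -1.0):
                    row = np.zeros(3 * n)
                    row[i], row[j] = sgn, -sgn
                    A_ub.append(row); b_ub.append(beta_max * d)
    A_eq = np.zeros((n, 3 * n))
    for i in range(n):
        # constraint (15), rearranged: P_i + eps_plus_i - eps_minus_i = Y_i
        A_eq[i, i] = 1.0; A_eq[i, n + i] = 1.0; A_eq[i, 2 * n + i] = -1.0
    bounds = [(None, None)] * n + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub or None, b_ub=b_ub or None,
                  A_eq=A_eq, b_eq=np.asarray(y, float), bounds=bounds, method="highs")
    return res.fun / n  # mean absolute error lower bound

# Two identical inputs with outcomes 0 and 2: no internally consistent model
# can fit both, so the best predictions satisfy P1 = P2 and the MAE bound is 1.
bound = mae_lower_bound(np.array([[0.0], [0.0]]), [0.0, 2.0], beta_max=1.0, delta=0.0)
```

With delta = 0, only identical inputs are forced to share a prediction; raising delta (or lowering beta_max) adds constraints of type (14) and can only raise the bound.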
In order to better demonstrate how the above LP can be applied on a dataset, we use one of
the examples provided in Anderson and Bjarnadottir (2024). In particular, the authors provide
a technology adoption example using data from Agarwal and Karahanna (2000). The goal
of this study is to predict the intention to use the internet based on the perceived ease of use
and usefulness. The output of multiple linear regression is Intent = 0.306 · Perceived Ease Of
Use + 0.478· Perceived Usefulness. Based on this output, the authors choose f (D(i, j)) =
0.478 · D(i, j) to use in (14). In other words, they select the maximum absolute value across
all coefficients of the regression output. The reasoning is that a small change in the input
is not expected to cause more than 0.478 times that change in the outcome. Therefore, it is
recommended to use f(D(i, j)) = βmax · D(i, j). Regarding the value of threshold δ, as it increases the
lower bound for the mean absolute error increases. For example, when δ = 0, only identical
observations are restricted to give identical predictions. However, when δ is large (e.g.,
approaching 0.25 in Fig. 8), each pair of observations is restricted in terms of predictions. In
other words, the number of constraints of type (14) increases substantially. Including more
constraints on the value that the predictions can take can lead to a higher objective value,
since the objective is to minimize the sum of errors.
As a final step, we generate a lower bound for the mean absolute error of the dataset
used in Sect. 2. In this case, we have a simple linear regression. We decided to use the
parameters resulting from the OLS method, since this is the one most commonly used. The
OLS regression model for the data before the transformation was y = 3.3082 − 1.9955x, and
for the data after the transformation log(y) = 1.3592 − 1.6186x. We calculate two bounds.
The first bound is used to estimate the quality of the data before the transformation and uses
f (D(i, j)) = 1.9955 · D(i, j). The second is applied to the data after the transformation
and uses f (D(i, j)) = 1.6186 · D(i, j). Since this is a simple linear regression, there is only
one β to choose from for βmax . (For higher-dimensional datasets, this step becomes trickier.)
The resulting mean absolute error (MAE) bounds are included in Fig. 8. The figure on the
left-hand side includes the MAE of the bound before the transformation as we gradually


increase the value of δ. We also show the MAE of each of the models resulting from the
OLS, LAD, and MMD methods. We can see that the LAD model is the one closest to the
bound. This was expected since the objective of the LAD method is to minimize the sum of
absolute errors, leading to a minimization of MAE. The figure on the right-hand side includes
the MAE of the bound after the logarithmic transformation. As expected, the three models
have nearly equal MAE, since their equations are also very similar. Again the LAD does
slightly better than the other two models. The information conveyed in Fig. 8 is consistent
with that seen in Figs. 2 and 5. In the former, it is clear that none of the linear models provides
an adequate fit. In the latter, it looks like (with the exception of the two outliers), the fit is
excellent. We can see that the MAE values of the graph on the right-hand side are lower than
in the graph on the left-hand side for the bound and the models. However, it is not possible to
make comparisons between the two, since they are estimated over a different range of values.
After the transformation, y becomes log(y), which generally leads to lower error values. In
addition, the βmax values are not the same for the original and transformed datasets.
We also point out that the internal consistency LP model is less informative for small
values of δ. As mentioned previously, when δ is very small (e.g., less than 0.1 in Fig. 8),
constraints (14) do very little to restrict the objective function. Therefore, the MAE lower
bound will be nearly impossible for any model to approach.

5 Goal programming

In the beginning of nearly every book on linear programming, we are told that LPs have
constraints, decision variables, and an objective function. It follows, therefore, that linear
programming can only handle a single objective. But, is this true?
Let’s consider the following toy problem (taken from Wang (2017)). A company produces
three different products. Let xi equal the amount produced of Product i for i = 1, 2, 3.
Suppose the company initially proposes the (overly optimistic) constraints below:

40x1 + 30x2 + 20x3 ≤ 100 employees (18)
2x1 + 4x2 + 3x3 = 10 tons of raw material (19)
5x1 + 8x2 + 4x3 ≥ 30 million dollars in profit (20)
x1, x2, x3 ≥ 0. (21)

If we solve for one variable in (19) and plug this value into (18) and (20), we see that the
solution requires another variable to take on a negative value, violating (21).
Goal programming (GP) was developed in Charnes and Cooper (1961) in order to handle
multiple goals or objectives. Weighted goal programming and lexicographic goal program-
ming are the two most frequently used variants. We focus on weighted goal programming in
this paper for two reasons:

1. Weighted goal programs can be viewed as regular linear programs and they can be solved
using the simplex method; lexicographic goal programs are more complicated and require
more advanced solution techniques.
2. Weighted GP is more flexible than lexicographic GP and sensitivity analysis is easier.

In weighted goal programming, there are typically two types of constraints: hard and soft.
Hard constraints cannot be violated. Soft constraints, also known as goal constraints, can
be violated in a solution. Deviation variables measure the degree of violation and weights


(coefficients) are assigned to each of these variables. The overall objective is to minimize the
sum of the weighted deviations.
As mentioned, it is not possible to satisfy (18)–(21) in the toy problem described earlier.
Let us assume that management decides to penalize (weight) violations as follows:
• For (18), 5 per unit over 100.
• For (19), 8 per unit below 10.
• For (19), 12 per unit over 10.
• For (20), 15 per unit below 30.
The goal program below emerges.
min z = 5D1− + (8D2+ + 12D2−) + 15D3+ (22)
subject to
40x1 + 30x2 + 20x3 + D1+ − D1− = 100 (23)
2x1 + 4x2 + 3x3 + D2+ − D2− = 10 (24)
5x1 + 8x2 + 4x3 + D3+ − D3− = 30 (25)
all decision variables ≥ 0. (26)
The constraints come from (18)–(21). We have added deviation variables to ensure that
each constraint is an equality. The objective function coefficients are the penalties or weights
given above. We can solve this as an LP and obtain the solution x2 = 10/3, D2− = 10/3,
D3+ = 10/3, all other decision variables = 0, and z = 90. Therefore, we can use LP to solve
multi-objective linear optimization problems.
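Since (22)–(26) is an ordinary LP, any solver reproduces this result; a minimal sketch with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: [x1, x2, x3, D1+, D1-, D2+, D2-, D3+, D3-].
# Objective (22): penalize D1- by 5, D2+ by 8, D2- by 12, D3+ by 15.
c = np.array([0, 0, 0, 0, 5, 8, 12, 15, 0], dtype=float)
A_eq = np.array([
    [40, 30, 20, 1, -1, 0,  0, 0,  0],   # (23) employees goal
    [ 2,  4,  3, 0,  0, 1, -1, 0,  0],   # (24) raw material goal
    [ 5,  8,  4, 0,  0, 0,  0, 1, -1],   # (25) profit goal
], dtype=float)
b_eq = np.array([100, 10, 30], dtype=float)
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 9, method="highs")
```

The optimal objective is z = 90, with x2 = 10/3, matching the solution above.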
An alternative (weighted) goal programming model was proposed by Flavell (1976).
Instead of trying to minimize the sum of the weighted deviations as we have done in (22)–
(26), suppose we focus on the goals. In other words, if we believe that the three goals are
equally important, a more relevant objective might be to balance (or equalize) the weighted
deviations for the three goals. The following LP accomplishes this.
min z = γ (27)
subject to
5D1− ≤ γ (28)
8D2+ + 12D2− ≤ γ (29)
15D3+ ≤ γ (30)
40x1 + 30x2 + 20x3 + D1+ − D1− = 100 (31)
2x1 + 4x2 + 3x3 + D2+ − D2− = 10 (32)
5x1 + 8x2 + 4x3 + D3+ − D3− = 30 (33)
all decision variables ≥ 0. (34)
Constraints (31)–(34) are identical to (23)–(26). The left-hand sides of (28)–(30) are
the weighted deviations for goals (18)–(20). Since each cannot exceed γ , by minimizing
γ (in (27)), we succeed in balancing the weighted deviations. The LP solution gives x1 =
0.29, x2 = 3.22, D1− = 8.32, D2− = 3.47, D3+ = 2.77, all other decision variables = 0, and
z = 41.6.
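The balanced model is likewise a plain LP; a sketch with scipy.optimize.linprog, using the same variable layout as before plus γ:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: [x1, x2, x3, D1+, D1-, D2+, D2-, D3+, D3-, gamma].
c = np.zeros(10); c[9] = 1.0  # objective (27): minimize gamma
A_ub = np.array([
    [0, 0, 0, 0, 5, 0,  0,  0, 0, -1],   # (28): 5*D1-        <= gamma
    [0, 0, 0, 0, 0, 8, 12,  0, 0, -1],   # (29): 8*D2+ + 12*D2- <= gamma
    [0, 0, 0, 0, 0, 0,  0, 15, 0, -1],   # (30): 15*D3+       <= gamma
], dtype=float)
b_ub = np.zeros(3)
A_eq = np.array([
    [40, 30, 20, 1, -1, 0,  0, 0,  0, 0],  # (31)
    [ 2,  4,  3, 0,  0, 1, -1, 0,  0, 0],  # (32)
    [ 5,  8,  4, 0,  0, 0,  0, 1, -1, 0],  # (33)
], dtype=float)
b_eq = np.array([100, 10, 30], dtype=float)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 10, method="highs")
```

Solving gives z ≈ 41.6, with all three weighted deviations equalized at γ, in line with the solution reported above.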
Readers interested in learning more about goal programming are encouraged to consult
(Jones & Tamiz, 2010).


6 Discriminant analysis

In discriminant analysis we have several observations on entities that can be categorized into
two or more given groups. Linear discriminant analysis (LDA) is the process of predicting
which group a new entity belongs to, based on a linear formula involving the attribute values
of the entity. Stated another way, LDA is simply forecasting where the dependent variable is
categorical, as opposed to standard regression, where the dependent variable is continuous.
For simplicity, we consider here the case where there are just two groups, 1 and 2. For
example, based on a biopsy, we might want to classify a tumor as benign or malignant.
The traditional approach for carrying out discriminant analysis was introduced by Fisher
(1936) and is referred to as Fisher’s Discriminant Analysis (FDA). This method works well
when its underlying assumptions about normality of the data hold, at least approximately.
An alternative approach, not relying on any distributional assumptions about the data, is
based on linear programming formulations. We present one such approach here, focusing for
simplicity on the case of two groups. Gochet et al. (1997) give a thorough introduction to
using LP to solve LDA problems when there are two or more groups. In an early article on
this topic, Freed and Glover (1981) model LDA as a goal program (see Sect. 5); subsequent
work by these authors in Freed and Glover (1986) developed alternative LP models for the
LDA problem.
We want to derive a linear scoring formula over the attributes of an entity, so that if the
score is < 0, we classify it as belonging to group 1. If the score is > 0 we classify it as
belonging to group 2. If the score is 0, we flip a coin. So one of the good features of a scoring
formula is that it tends to not produce scores of 0. We use the term training set for the set of
data from which we wish to derive the forecasting formula, in contrast to new data to which
we want to apply the forecasting formula.
Suppose that αik denotes the value of attribute k for observation i, and let s denote the threshold for deciding whether an observation belongs to group 1 (has score ≤ −s) or group 2 (has score ≥ s). Using a linear model (with coefficients βk to be estimated) for the scoring system, we have β0 + Σk βk αik − ei ≤ −s for each observation i in group 1. Here we allow for the error ei ≥ 0 in our scoring system. Similarly, for each observation i in group 2 we have β0 + Σk βk αik + ei ≥ s. Since we would like to make the errors as small as possible, we obtain the following linear programming problem:

min Σi ei (35)
subject to
β0 + Σk βk αik − ei ≤ −s, ∀i in group 1 (36)
β0 + Σk βk αik + ei ≥ s, ∀i in group 2 (37)
ei ≥ 0, ∀i (38)
βk unrestricted. (39)

Note that s > 0 is a scale factor for the solution; a reasonable value is s = 1. If β0 , βk , ei
is a feasible solution to (36)–(39), and we multiply s by θ > 0, then θ ∗ β0 , θ ∗ βk , θ ∗ ei
is also a solution to the modified system. Using s = 0 is not useful because it admits the
useless solution with all βk = 0.


Fig. 9 Candidate hyperplanes separating two groups of entities

To illustrate the motivation for this LP, consider the example shown in Fig. 9, where there
are six entities known to be in group 1, five entities known to be in group 2, and there are two
attributes. Each point in the figure corresponds to an entity and its coordinates indicate its
attribute values; entities in group 1 are indicated by circles and those in group 2 by squares.
Also shown in Fig. 9a are two candidate hyperplanes H−1 and H+1 defined respectively by
(36) and (37) with all ei = 0. We notice that these two hyperplanes do not fully separate
the two groups of entities, with the misclassified entities shown in black. However, Fig. 9b
shows that by allowing positive errors ei in (36), we can essentially shift the hyperplane H−1
upward and thus capture all entities in group 1. In a similar way, by allowing positive errors
ei in (37), the shifted hyperplane H+1 now captures all entities in group 2.
Koehler (1990) lists a number of characteristics that one might desire in a good LDA.
The recent book on LDA by Shinmura (2016) is another excellent source. In it, a variety of
approaches (some from statistics and some from mathematical programming) are compared
over many different types of datasets.
We conclude this section with a couple of caveats. First, for real datasets, where the number
of observations from group 1 may differ significantly from the number from group 2, it may
be important to attach weights to the error terms in the objective function, where each weight
is inversely related to the number of observations from the specific group. Second, if the
assumptions behind FDA are satisfied, then FDA is hard to beat. On the other hand, when
the explanatory variables are not normally distributed, the LP approach might be better.
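The formulation (35)–(39) is easy to set up in practice. The sketch below (SciPy; the two-attribute toy data and the helper name lda_lp are ours, not from the paper) solves the LP with s = 1 for two separable groups:

```python
import numpy as np
from scipy.optimize import linprog

def lda_lp(group1, group2, s=1.0):
    """Solve LP (35)-(39): find a score beta0 + sum_k beta_k * a_ik that is
    <= -s on group 1 and >= s on group 2, minimizing the total error."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2, k = len(g1), len(g2), g1.shape[1]
    nv = 1 + k + n1 + n2                       # beta0, beta_1..k, e_1..e_(n1+n2)
    c = np.zeros(nv); c[1 + k:] = 1.0          # (35): minimize sum of errors e_i
    A, b = [], []
    for i, a in enumerate(g1):                 # (36): beta0 + beta.a - e_i <= -s
        row = np.zeros(nv); row[0] = 1.0; row[1:1 + k] = a; row[1 + k + i] = -1.0
        A.append(row); b.append(-s)
    for i, a in enumerate(g2):                 # (37) as <=: -(beta0 + beta.a) - e_i <= -s
        row = np.zeros(nv); row[0] = -1.0; row[1:1 + k] = -a; row[1 + k + n1 + i] = -1.0
        A.append(row); b.append(-s)
    bounds = [(None, None)] * (1 + k) + [(0, None)] * (n1 + n2)   # (38)-(39)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[0], res.x[1:1 + k], res.fun

# Two separable groups with two attributes each (made-up training data)
b0, beta, total_err = lda_lp([[0, 0], [1, 0], [0, 1]], [[3, 3], [4, 3], [3, 4]])
print(total_err)   # ~0: the two groups can be separated with no error
```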

7 Analytic hierarchy process

The Analytic Hierarchy Process (AHP) is probably the most popular multi-criteria decision
making approach in operations research. The approach was developed by Thomas Saaty in
the 1970s and first formally described in Saaty (1977). Many hundreds of AHP applications
in numerous different areas have been published and presented at scholarly meetings. More
than a dozen AHP software products have been developed and international conferences
devoted to the advancement of AHP take place on a regular basis. Most frequently, AHP is
used to select a best alternative, rank a set of alternatives, or prioritize a set of alternatives.


Fig. 10 Step 1 in AHP

Fig. 11 A comparison matrix

AHP uses a four-step process.


1. Decompose the problem into an L-level hierarchy of interrelated decision criteria and
alternatives.
2. Use collected data to generate pairwise comparisons at each level of the hierarchy (except
for level 1).
3. Apply the eigenvalue method to estimate the weights of the elements at levels 2, 3, up to
L of the hierarchy.
4. Aggregate the relative weights over levels 2 to L to arrive at overall weights for the
alternatives.
As an example, suppose the state of Maryland is trying to determine the best fishery
management policy in the Chesapeake Bay in response to a dramatic decline in the population
of river herring in the Bay. A partial hierarchy (Step 1) is displayed in Fig. 10. This problem
arose in the 1980s (see (DiNardo et al., 1989) for details). River herring had been a very
abundant fish in the Chesapeake Bay. Many larger fish relied on river herring as a source of
food. Decision makers in Maryland wanted to guard against a ripple effect.
In Step 2, we compare elements two at a time. For example, at level 2, we ask the question:
With respect to the overall goal, which is more important – the scientific or economic factor
– and how much more important is it? This gives us entry aSE in a pairwise comparison matrix A. The entries are numbers from {1/9, 1/8, ..., 1/2, 1, 2, ..., 8, 9} and the result is a positive reciprocal matrix. An illustration is shown in Fig. 11, where for example the entry 5 indicates that the scientific factor is deemed five times as important as the political factor.
Fig. 12 Step 4 in AHP

If we fill in the entries within the triangle, the remaining entries are automatically determined. We observe that we have not been perfectly consistent (since 2 × 2 ≠ 5). AHP provides a way of measuring the consistency of decision makers in making comparisons, but
they are not required or expected to be perfectly consistent.
In Step 3, we apply the eigenvalue method to estimate the weights of the elements at each
level of the hierarchy. In particular, the weights associated with a comparison matrix A are
estimated by solving Aŵ = λMAX ŵ, where A is the pairwise comparison matrix, λMAX is the largest eigenvalue of A, and ŵ is its right eigenvector.
The solution to this (level 2) problem is: Scientific gets a weight of 0.595, Economic gets a weight of 0.276, and Political gets a weight of 0.128. We point out that the weights sum to
one (except for roundoff errors). The three alternatives include temporarily closing the river
herring fishery, restricting access (by size, season, or both), and maintaining the status quo.
In Step 4, we aggregate the weights over levels 2–4 to arrive at overall weights for the
alternatives at the bottom of the hierarchy (see Fig. 12). Given that the scientific factor is, by
far, the most important factor, it is not surprising that the overall AHP recommendation is to
close the fishery. We have skipped many details in the AHP example. The interested reader
should consult (Saaty, 1980) or (Golden et al., 1989). A recent literature review can be found
in Emrouznejad and Marra (2017).
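For the comparison matrix of Fig. 11, Step 3 amounts to a small eigenvector computation. The NumPy sketch below (our own rendering, with the entries aSE = 2, aEP = 2, aSP = 5 taken from the discussion above) reproduces the level-2 weights:

```python
import numpy as np

# Pairwise comparison matrix of Fig. 11 (Scientific, Economic, Political)
A = np.array([[1.0, 2.0, 5.0],
              [1/2, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals.real)            # lambda_MAX is the largest eigenvalue of A
w = eigvecs[:, i].real
w = w / w.sum()                        # normalize the right eigenvector to sum to 1
print(np.round(w, 3))                  # -> [0.595 0.276 0.128]
```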
In the remainder of this section, we present an entirely different approach to estimate the
weights for a pairwise comparison matrix, based on linear programming. In AHP, we specify aij to approximate wi/wj, where wi is the weight of element i. But since aij is not necessarily exact, we must acknowledge the possibility of an error. Let wi/wj = aij εij (i, j = 1, 2, ..., n) define an error εij in the estimate aij. If the decision maker is perfectly consistent, then εij = 1 and ln(εij) = 0. If aij < wi/wj, then εij > 1 and ln(εij) > 0. If aij > wi/wj, then εij < 1 and ln(εij) < 0.
Given A = [aij], which is n × n and positive reciprocal, decision variables wi and εij as defined above, and the fundamental equation

wi/wj = aij εij, ∀i, j, (40)

we can introduce transformed decision variables xi = ln(wi), yij = ln(εij), and zij = |yij|. If we take the natural log of both sides in (40), we obtain

xi − xj − yij = ln(aij), ∀i, j. (41)

Since the right-hand side is a constant, (41) is linear. From the nature of A, it follows that εij = 1/εji and yij = −yji. Furthermore, zij ≥ yij and zij ≥ yji identifies which of εij or εji

is greater than or equal to 1. Now we are in a position to present a two-stage LP approach to AHP. The first stage linear program is presented below.


min Σ(i=1..n−1) Σ(j=i+1..n) zij (42)
subject to
xi − xj − yij = ln(aij), i, j = 1, 2, ..., n; i ≠ j (43)
zij ≥ yij, i, j = 1, 2, ..., n; i < j (44)
zij ≥ yji, i, j = 1, 2, ..., n; i < j (45)
x1 = 0 (46)
xi − xj ≥ 0, i, j = 1, 2, ..., n; aij > 1 (47)
xi − xj ≥ 0, i, j = 1, 2, ..., n; aik ≥ ajk for all k; aiq > ajq for some q (48)
zij ≥ 0, i, j = 1, 2, ..., n (49)
xi, yij unrestricted, i, j = 1, 2, ..., n. (50)

Constraints (43) define the error terms. Constraints (44) and (45) select which of εij or εji is greater than or equal to 1. This enables the objective function (42) to minimize the product of errors (each of which is greater than or equal to 1). After the logarithmic transformation, this product becomes a summation. The objective function is, essentially, a measure of the inconsistency in the pairwise comparison matrix. Since there are an infinite number of solutions to (42)–(45), we can arbitrarily fix the value of w1. We, therefore, set x1 = 0 in (46). The final weights can be normalized to sum to one. Constraints (47) and (48) are element dominance and row dominance constraints, respectively. These are discussed in detail by Chandran et al. (2005).
The first stage LP minimizes the product of the errors that are greater than 1, but multiple optimal solutions may exist. In the second stage LP, we select from the set of all optimal solutions to the first stage LP, a solution that minimizes the maximum of the errors εij. The second stage LP is presented below:

min zmax (51)
subject to
Σ(i=1..n−1) Σ(j=i+1..n) zij = z*, (52)
zmax ≥ zij, i, j = 1, 2, ..., n; i < j (53)
and all first stage LP constraints. (54)

In the above formulation, z* is the optimal first stage solution value and zmax is the maximum value of the errors zij.
If we apply the two-stage LP approach to the comparison matrix in Fig. 11, the second stage LP yields x1 = 0.0, x2 = −0.7673, and x3 = −1.5347. This results in the same set of weights obtained using the eigenvalue method (i.e., 0.595, 0.276, 0.128). For the interested reader,
the first stage LP is presented in Appendix B. In addition, an interesting application of the
two-stage LP approach (involving the ranking of great U.S. military leaders) was undertaken
by Retchless et al. (2007).
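As a concrete illustration (our own SciPy encoding, not code from the paper), the two-stage LP for the 3 × 3 matrix of Fig. 11 can be written out explicitly. For this matrix the dominance constraints (47)–(48) both reduce to x1 ≥ x2 ≥ x3, and we impose (52) with a tiny slack to guard against numerical infeasibility:

```python
import numpy as np
from scipy.optimize import linprog

ln = np.log
# Variables: [x2, x3, y12, y21, y13, y31, y23, y32, z12, z13, z23]; x1 fixed at 0 (46)
X2, X3, Y12, Y21, Y13, Y31, Y23, Y32, Z12, Z13, Z23 = range(11)

A_eq, b_eq, A_ub, b_ub = [], [], [], []
def eq(coeffs, rhs):
    row = np.zeros(11)
    for j, v in coeffs: row[j] = v
    A_eq.append(row); b_eq.append(rhs)
def le(coeffs, rhs):
    row = np.zeros(11)
    for j, v in coeffs: row[j] = v
    A_ub.append(row); b_ub.append(rhs)

# (43): x_i - x_j - y_ij = ln(a_ij), for a12 = 2, a13 = 5, a23 = 2 (Fig. 11)
eq([(X2, -1), (Y12, -1)], ln(2));  eq([(X2, 1), (Y21, -1)], -ln(2))
eq([(X3, -1), (Y13, -1)], ln(5));  eq([(X3, 1), (Y31, -1)], -ln(5))
eq([(X2, 1), (X3, -1), (Y23, -1)], ln(2))
eq([(X3, 1), (X2, -1), (Y32, -1)], -ln(2))
# (44)-(45): z_ij >= y_ij and z_ij >= y_ji
for y, z in [(Y12, Z12), (Y21, Z12), (Y13, Z13), (Y31, Z13), (Y23, Z23), (Y32, Z23)]:
    le([(y, 1), (z, -1)], 0.0)
le([(X2, 1)], 0.0)                         # (47)-(48): x1 >= x2
le([(X3, 1), (X2, -1)], 0.0)               #            x2 >= x3
bounds = [(None, None)] * 8 + [(0, None)] * 3   # (49)-(50)

# Stage 1: minimize z12 + z13 + z23   (42)
c1 = np.zeros(11); c1[Z12:] = 1.0
A1_ub, A1_eq = np.array(A_ub), np.array(A_eq)
s1 = linprog(c1, A_ub=A1_ub, b_ub=b_ub, A_eq=A1_eq, b_eq=b_eq, bounds=bounds)

# Stage 2: among stage-1 optima, minimize the largest z_ij   (51)-(54)
zcol = lambda M: np.hstack([M, np.zeros((M.shape[0], 1))])  # add a z_max column
rows, rhs = list(zcol(A1_ub)), list(b_ub)
for z in (Z12, Z13, Z23):                                   # (53): z_ij <= z_max
    row = np.zeros(12); row[z] = 1.0; row[-1] = -1.0
    rows.append(row); rhs.append(0.0)
row = np.zeros(12); row[Z12] = row[Z13] = row[Z23] = 1.0    # (52) with tiny slack
rows.append(row); rhs.append(s1.fun + 1e-9)
c2 = np.zeros(12); c2[-1] = 1.0
s2 = linprog(c2, A_ub=np.array(rows), b_ub=rhs, A_eq=zcol(A1_eq), b_eq=b_eq,
             bounds=bounds + [(0, None)])

x = np.array([0.0, s2.x[X2], s2.x[X3]])
w = np.exp(x); w /= w.sum()                # recover and normalize the weights
print(np.round(w, 3))                      # ~ [0.595 0.276 0.128], as with eigenvalues
```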


Although the eigenvalue method has been around since the 1970s and has been well
studied, the LP approach has a number of benefits. It is easy to understand and computationally
fast. There is readily available (LP) software which facilitates sensitivity analysis. Finally, the
LP model explicitly seeks to minimize inconsistency; this is not the case with the eigenvalue
method.

8 Game theory and LP

Game theory is an important tool used to model rational decision making in situations involv-
ing competition or conflict. Its origins go back to a seminal work by John von Neumann and
Oskar Morgenstern, Theory of Games and Economic Behavior (von Neumann & Morgen-
stern, 1944). Interestingly enough, the publication of this work appeared around the same
time as when George Dantzig was formulating the simplex method for solving linear pro-
graming problems. As this section will illustrate, there is indeed a very close connection
between game theory and linear programming.
Let’s begin with a motivating example. Suppose that a valuable jewel (valued at $90,000)
has just been sold by a museum and is to be delivered to the buyer the next day. In the
meantime, it needs to be safely stored for the evening. There are two vaults (with different
security systems), located far apart in the city, where the jewel can be stored. A well-known
jewel thief knows of the sale, but not where the jewel will be held for the night. Given the
effort involved in defeating a security system, only one location can be selected by the thief
for the robbery. The security system at location A has a 1/10 chance of being breached, while
the more sophisticated system at location B has only a 1/25 chance of being breached. Where
is the best place for the museum to hide the jewel?
The museum (M) has two strategies: Either place the jewel at location A or at location
B. Similarly, the thief (T) has two strategies: Spend the night at location A or at location
B. Based on which pair of strategies is chosen by these two players, we can formulate the
following payoff matrix, which shows the potential rewards to the thief:

              T goes to A    T goes to B
M hides at A      9000             0
M hides at B         0          3600

The numerical value for (A, A) in this matrix is calculated as (1/10) ∗ 90,000 = 9000 and that for (B, B) is (1/25) ∗ 90,000 = 3600.
Since the payoff for going to location A is larger than that for B, the thief might be tempted
to go to A. However, the museum can figure this out and might then try to fool the thief and
place the jewel instead at B. The thief, being equally intelligent, would figure out the thinking
of the museum and would then go instead to B. However, the museum could again reproduce
this chain of reasoning, and thus place the jewel at A. And so this infinite loop of reasoning
between two skilled adversaries can continue, leading to no obvious rational solution.
The brilliant idea of von Neumann to resolve this impasse was to trust your decision to a
random device: In this case, T should select location A with probability 2/7 and location B
with probability 5/7. Then it won’t matter which strategy M chooses:
M chooses A: The thief’s expected reward is (2/7) ∗ 9000 + (5/7) ∗ 0 = 2571.43
M chooses B: The thief’s expected reward is (2/7) ∗ 0 + (5/7) ∗ 3600 = 2571.43


Because of symmetry, the museum should also select A with probability 2/7 and B with
probability 5/7. Then, it won’t matter which strategy the thief chooses.
Here is another two-person game, in which the row player R has two pure strategies and
the column player C has three pure strategies. The payoffs in the matrix below might indicate
for example the gains or losses in market share to row player R when each combination of
marketing strategies is selected by the two competing companies.

        C: 1    C: 2    C: 3
R: 1      0      −2       2
R: 2      5       4      −3
Suppose R plays the mixed strategy (1/2, 1/2); i.e., R chooses strategy 1 with probability
1/2 and strategy 2 with probability 1/2. If C simultaneously plays strategy 1, then the expected
payoff to R is (1/2) ∗ 0 + (1/2) ∗ 5 = 5/2. Alternatively, if C plays 2, then R would have the
expected payoff of 1, whereas if C plays 3, then R would have the expected payoff of −1/2.
In any event, all that R can guarantee is a payoff of min{5/2, 1, −1/2} = −1/2.
Alternatively, suppose R plays the mixed strategy (2/3, 1/3). Then one can calculate the
guaranteed expected gain for R to be min{5/3, 0, 1/3} = 0. At least 0 is better than −1/2,
with the former achieved by using the mixed strategy (2/3, 1/3). Let’s now find the optimal
mixed strategy (x 1 , x2 ) for player R, that is, one that achieves the largest expected payoff v.
Using the mixed strategy (x1 , x2 ) against player C’s first strategy produces an expected
payoff of 5x2 . When used against the second and third strategies, we obtain expected payoffs
of −2x1 + 4x2 and 2x1 − 3x2 , respectively. Consequently the overall guarantee against any
strategy of C is given by v = min{5x2 , −2x1 + 4x2 , 2x1 − 3x2 }. This problem can then be
formulated as the following linear program:

max v (55)
subject to
0x1 + 5x2 ≥ v (56)
− 2x1 + 4x2 ≥ v (57)
2x1 − 3x2 ≥ v (58)
x1 + x2 = 1 (59)
x1 , x2 ≥ 0. (60)

Note that the payoff for player R is not restricted in sign, as it is not known in advance
whether this game is inherently in favor of R or not. While the above linear program is not
in standard form, it can be easily converted into a more familiar form.
In a similar way, we can determine an optimal mixed strategy (y1 , y2 , y3 ) for Player C.
Namely, when this mixed strategy is used against the first strategy of R, the expected value to
Player R is −2y2 + 2y3 ; when played against the second strategy of R, the expected value to
Player R is 5y1 + 4y2 − 3y3 . Since Player C wants to limit the amount that Player R will gain,
Player C wants to choose (y1 , y2 , y3 ) to minimize w = max{−2y2 + 2y3 , 5y1 + 4y2 − 3y3 }.
This is again a linear program:

min w (61)
subject to


0y1 − 2y2 + 2y3 ≤ w (62)
5y1 + 4y2 − 3y3 ≤ w (63)
y1 + y2 + y3 = 1 (64)
y1, y2, y3 ≥ 0. (65)

These two linear programs are in fact duals of one another with v ∗ = w ∗ = 2/11, achieved
by setting x ∗ = (7/11, 4/11) and y ∗ = (0, 5/11, 6/11). Since the optimal objective value
is positive, this game is inherently in favor of Player R.
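Both players' LPs are easy to solve numerically. The sketch below (SciPy; our own encoding of (55)–(60), not code from the paper) recovers the optimal mixed strategy for R and the game value v* = 2/11:

```python
import numpy as np
from scipy.optimize import linprog

M = np.array([[0, -2,  2],                # payoff matrix to the row player R
              [5,  4, -3]], dtype=float)

# Variables: (x1, x2, v); maximize v  <=>  minimize -v
c = np.array([0.0, 0.0, -1.0])
# (56)-(58): for each column j, sum_i M[i,j] x_i >= v  ->  v - M[:,j].x <= 0
A_ub = np.hstack([-M.T, np.ones((3, 1))])
b_ub = np.zeros(3)
A_eq = np.array([[1.0, 1.0, 0.0]])        # (59): x1 + x2 = 1
b_eq = np.array([1.0])
bounds = [(0, None), (0, None), (None, None)]   # (60); v is unrestricted in sign

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:2], res.x[2]
print(np.round(x, 4), round(v, 4))        # x ~ [0.6364 0.3636] (7/11, 4/11), v ~ 0.1818 (2/11)
```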
In general, the two linear programs associated with this type of (zero-sum) game are duals
of one another, which is a manifestation of von Neumann’s minimax theorem, first proved
in a 1928 paper and then rejuvenated in his 1944 book with Morgenstern (von Neumann
& Morgenstern, 1944). With this background we can now appreciate the historic meeting
between George Dantzig and John von Neumann in October 1947 at the Institute for Advanced
Study at Princeton.
At that meeting (see (Lenstra et al., 1991)), Dantzig presented the motivating Air Force
logistics problem and its formulation as a linear program. Soon von Neumann impatiently
asked “Get to the point”. So Dantzig, a bit perturbed and ready to impress von Neumann,
quickly sped through the algebraic and geometric concepts underlying linear programming.
After Dantzig’s rapid-fire display of the intricacies of linear programming, von Neumann
stood up and said “Oh, that!” and for the next 90 min proceeded to give a lecture on the
mathematical theory of linear programs and duality, as well as the essentials of Farkas’
Lemma. Dantzig, who had carefully searched the literature and had found nothing like this
before, was simply astounded. Then von Neumann kindly reassured Dantzig that he had
simply reasoned by analogy from his recently published work (1944) on game theory. In
particular, for two-person zero-sum games we have two players with totally opposed interests,
so there is naturally a pair of linked (dual) programs. The extension to general linear programs
is then not such a great leap.

9 Data envelopment analysis

Data Envelopment Analysis (DEA) provides a way to compare the efficiencies of various
production units or firms: that is, organizations that transform measurable inputs into mea-
surable outputs. It has been applied to evaluating the effectiveness of schools, government
departments, hospitals, banks, retail establishments, and other organizations. The follow-
ing example illustrates some of the assumptions underlying this technique and how linear
programming can be applied to carry out such comparisons.
Suppose we have a number of firms, each of which transforms two inputs into a single
output. Specifically, if x1 and x2 are the amounts of inputs 1 and 2, respectively, then the
output is given by the production function y = f (x1 , x2 ). DEA assumes that the production
function f is linear and homogeneous (i.e., we have constant returns to scale). As a numerical
illustration, suppose there are eight firms, with respective inputs and outputs given in Table 2.
Because the production function is linear and homogeneous, we can examine the input
levels needed to produce an output level of 1 (see Table 3).
By comparing firms 2 and 6 in Table 3, we see that the same output of firm 6 can be
achieved by firm 2 using fewer resources: Specifically, firm 2 uses no more of input 1 and
strictly less of input 2 (3 < 3.5). That is, firm 2 dominates firm 6 in terms of efficiency. No
such similar dominance exists among the other seven firms. However, it is possible that some


Table 2 Example with eight production units

Firm    1     2     3     4     5     6     7     8
x1    144   460   110   360   780   115   760   231
x2    360   600   660   300   130   175   266   140
y      90   200   110   120   130    50   190    70

Table 3 Inputs needed to produce an output level of 1

Firm    1     2     3     4     5     6     7     8
x1    1.6   2.3     1     3     6   2.3     4   3.3
x2      4     3     6   2.5     1   3.5   1.4     2
y       1     1     1     1     1     1     1     1

Fig. 13 Inputs achieving unit output for an eight firm example

combination of firms jointly dominates some other firm. To investigate this possibility, we
plot in Fig. 13 the (x1 , x2 ) input values for the eight firms.
Visually we can now see that firm 4 is inefficient relative to the other firms. Namely, the
output of firm 4 can be produced more economically by using a combination of firms 2 and 8.
In other words, the economically efficient firms are those lying on the dotted line in Fig. 13.
To see this numerically, suppose we use 27/106 of firm 2’s inputs and 390/371 of firm 8’s
inputs. Then the total amounts x1 , x2 used will be
(27/106) ∗ [460 600] + (390/371) ∗ [231 140] = [360 300],
exactly the inputs needed by firm 4. However, the output produced by using this combination
of firms 2 and 8 would be
(27/106) ∗ [200] + (390/371) ∗ [70] = [124.53],
which exceeds the output of 120 provided by using firm 4 alone.
More generally, suppose there are k firms, where firm j requires the n inputs
x1 j , x2 j , . . . , xn j and produces the m outputs y1 j , y2 j , . . . , ym j . To determine if firm p is
inefficient relative to the other firms, we can simply solve an LP. Namely, we want to see if
there exists a nonnegative combination, using weights λ j , of the firms that uses no more than


the given inputs to firm p (see (67)) and which produces the maximum possible output (see
(66) and (68)). So we solve the following linear program in variables λ j and z:

max z (66)
subject to
Σ(j=1..k) xij λj ≤ xip, i = 1, . . . , n (67)
Σ(j=1..k) yrj λj ≥ z yrp, r = 1, . . . , m (68)
λj ≥ 0, j = 1, . . . , k. (69)

Notice that λ p = 1 (and all other λ j = 0) is a feasible solution having z = 1, so that the
optimal z ∗ ≥ 1. If in fact we find that z ∗ > 1, then we can conclude that unit p is inefficient,
demonstrated through the explicit weights λ j .
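The LP (66)–(69) can be set up directly from the data of Table 2. The following sketch (SciPy; the function name dea_z is ours) confirms that firm 4 is inefficient (z* ≈ 1.038, the 124.53/120 ratio computed above) while firm 2 is efficient (z* = 1):

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([[144, 460, 110, 360, 780, 115, 760, 231],   # inputs x1j, x2j (Table 2)
              [360, 600, 660, 300, 130, 175, 266, 140]], dtype=float)
Y = np.array([[90, 200, 110, 120, 130, 50, 190, 70]], dtype=float)  # outputs y1j

def dea_z(p):
    """Solve (66)-(69) for firm p (0-indexed); z* > 1 means firm p is inefficient."""
    k = X.shape[1]
    c = np.zeros(k + 1); c[-1] = -1.0                      # (66): maximize z
    A_in = np.hstack([X, np.zeros((X.shape[0], 1))])       # (67): X.lam <= x_p
    A_out = np.hstack([-Y, Y[:, [p]]])                     # (68): -Y.lam + y_p z <= 0
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([X[:, p], np.zeros(Y.shape[0])]))  # lam, z >= 0 (69)
    return -res.fun

print(round(dea_z(3), 4), round(dea_z(1), 4))   # firm 4: 1.0377 (inefficient); firm 2: 1.0
```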
A recent survey on DEA is available in Emrouznejad and Yang (2018).

10 Optimization with a ratio objective

10.1 The Omega ratio

Many optimization problems involve two or more sets of criteria, some of which are of the “less is better” type such as cost or risk, and some of which are of the “more is better” type, such as revenue, service level, or expected return. Let us consider the simplest case where there is one “less is better” criterion, risk (call it x1), and one “more is better” criterion, expected return to our investment portfolio (call it x2). One approach for taking into account these two, possibly incommensurate, criteria is to maximize the ratio x2/x1. If risk is measured by the standard deviation in return, this ratio is essentially equivalent to the widely used Sharpe ratio in finance, introduced in Sharpe (1966). Suppose the random return of a portfolio is
denoted by R. Given some threshold or target rate of return τ , the Sharpe ratio is defined as
E(R − τ )/S D(R), where E and SD denote the expected value and standard deviation of the
random variable R.
One criticism of using the Sharpe ratio is that it may be misleading for asymmetric
distributions. To illustrate, consider three possible portfolios and associated growth outcomes
shown in Fig. 14. In each, we invest $1. Portfolio A is a fairly safe portfolio: with probability
0.2 the $1 grows to $1.5 and with probability 0.8 the portfolio returns simply the $1 invested.
Portfolio C is risky in that there is a nontrivial probability of losing a lot of money. With
probability 0.2 the $1 drops in value to $0.7 and with probability 0.8 the portfolio grows to
$1.2. Portfolio B is somewhere in between the other two portfolios in terms of risk; with
probability 0.5 the $1 drops in value to $0.9 and with probability 0.5 the portfolio grows to
$1.3. An interesting feature of the three portfolios is that each has an expected growth factor
of 1.1 (or an expected return of 0.1), and a standard deviation in return of 0.2. Thus, for any
specified τ , all three portfolios have the same Sharpe ratio.
Keating and Shadwick (2002), concerned about downside risk, suggested what they called the Omega ratio. In the denominator, it considers only returns below the target, and disregards returns above the target. Kapsos et al. (2014) showed that maximizing the Omega ratio = E(R − τ)/E(max(0, τ − R)) can be formulated as a linear program, except for a ratio objective.


Fig. 14 Three portfolios and their associated growth outcomes

Suppose we have m observations (or scenarios) on the returns from n stocks (or invest-
ments), specifically, we are given: gs j = return in scenario s of stock j. The crucial decision
variables are w j , the fraction of the portfolio invested in stock j.
The Omega ratio model then has the constraints:

Σ(j=1..n) wj = 1, (Portfolio fractions must sum to 1)
rs = Σ(j=1..n) gsj wj, s = 1, . . . , m (Return under scenario s)
ds ≥ τ − rs, s = 1, . . . , m (Downside shortfall in scenario s)
x1 = Σ(s=1..m) ds/m, (Average downside shortfall)
x2 = (Σ(s=1..m) rs/m) − τ, (Average return minus target return)

and the objective is to maximize the Omega ratio x2/x1.


The above model is nonlinear because of the ratio objective. Charnes and Cooper (1962) showed that it can be made linear by scaling all constraints by the denominator, x1, of the objective function. This gives:
max x̂2/x̂1 = x̂2/1 (70)
subject to
δ = 1/x1 (or equivalently x1 = 1/δ), (71)
x̂2 = δx2, (72)
ŵj = δwj, j = 1, . . . , n (73)
r̂s = δrs, s = 1, . . . , m (74)
d̂s = δds, s = 1, . . . , m (75)
x̂1 = δx1 = x1/x1 = 1, (76)
Σ(j=1..n) ŵj = δ, (77)
r̂s = Σ(j=1..n) gsj ŵj, s = 1, . . . , m (78)
d̂s ≥ δτ − r̂s, s = 1, . . . , m. (79)


We now use the same trick as in the Circularity example of Sect. 3. We disregard the
nonlinear constraints (the first five) and then solve the remaining LP. In a post-processing
step, we use the first six constraints to recover values for the original w j , x1 and x2 .
The above example illustrated the case where the sense of the objective was Maximize. It
is easy to see that the same idea works if the objective is Minimize.
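The linearized model (70)–(79) is straightforward to implement once r̂s is substituted out via (78). The sketch below (SciPy; the five-scenario, two-stock return table is made up for illustration and chosen so that every portfolio has a positive average shortfall) solves the transformed LP and recovers the original weights in the post-processing step:

```python
import numpy as np
from scipy.optimize import linprog

g = np.array([[ 0.10,  0.00],          # g[s, j]: return of stock j in scenario s
              [-0.05,  0.08],          # (made-up data; scenario 5 is bad for both
              [ 0.12,  0.03],          #  stocks, so the denominator x1 is always > 0)
              [ 0.00,  0.06],
              [-0.02, -0.01]])
m, n = g.shape
tau = 0.02                              # target rate of return

# Variables: (w_hat_1..n, d_hat_1..m, delta), all >= 0; maximize x_hat_2
c = np.concatenate([-g.sum(axis=0) / m, np.zeros(m), [tau]])  # -x2_hat
A_eq = np.array([
    np.concatenate([np.zeros(n), np.ones(m) / m, [0.0]]),     # (76): mean d_hat = 1
    np.concatenate([np.ones(n),  np.zeros(m),   [-1.0]]),     # (77): sum w_hat = delta
])
b_eq = np.array([1.0, 0.0])
A_ub = np.hstack([-g, -np.eye(m), np.full((m, 1), tau)])      # (79): delta*tau - r_hat_s - d_hat_s <= 0
b_ub = np.zeros(m)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
delta = res.x[-1]
w = res.x[:n] / delta                   # post-processing: recover the original weights
omega = -res.fun                        # optimal Omega ratio x2/x1
print(np.round(w, 3), round(omega, 4))  # w ~ [0.2 0.8], Omega ~ 1.8125
```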

10.2 Using repeated LPs for the ratio objective case

A disadvantage of the above approach is that it does not work if the model either has integer
variables, or if some of the constraints are nonlinear. An alternative way of handling a problem
with a ratio objective was described by Dinkelbach (1967). We can apply this approach
to maximizing the ratio x2 /x1 as follows. Suppose we have a feasible solution for which
x2 /x1 = q. We could then add the constraint x2 /x1 ≥ q, or x2 − q x1 ≥ 0. Note, we assume
x1 > 0. Now, there is a feasible solution with x2 /x1 > q, if and only if, when we use the
objective z = max x2 − q x1 , we find z > 0. This leads to the Dinkelbach iterative algorithm:

0. Guess an initial value q;
1. Solve the problem with the objective max x2 − q x1;
2. Compute r = x2/x1;
3. If r = q, stop;
   else set q = r and go to (1).
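The steps above can be sketched as follows (SciPy; the five-scenario, two-stock return table is made up for illustration, not the 13-stock dataset of Table 4):

```python
import numpy as np
from scipy.optimize import linprog

g = np.array([[0.10, 0.00], [-0.05, 0.08], [0.12, 0.03],
              [0.00, 0.06], [-0.02, -0.01]])   # made-up scenario returns g[s, j]
m, n = g.shape
tau = 0.02

def solve_sub(q):
    """Step 1: max x2 - q*x1 over (w, d), where x2 = mean return - tau
    and x1 = mean downside shortfall."""
    c = np.concatenate([-g.sum(axis=0) / m, q * np.ones(m) / m])  # minimize -(x2 - q*x1)
    A_eq = [np.concatenate([np.ones(n), np.zeros(m)])]            # sum w = 1
    A_ub = np.hstack([-g, -np.eye(m)])                            # tau - r_s - d_s <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.full(m, -tau), A_eq=A_eq, b_eq=[1.0])
    w = res.x[:n]
    x2 = g.mean(axis=0) @ w - tau
    x1 = np.maximum(0.0, tau - g @ w).mean()   # recompute shortfall directly from w
    return w, x2 / x1

q = 0.0                                   # step 0: initial guess
for iters in range(1, 50):
    w, r = solve_sub(q)                   # steps 1-2
    if abs(r - q) < 1e-10:                # step 3: converged
        break
    q = r
print(np.round(w, 3), round(q, 4), iters) # w ~ [0.2 0.8], q ~ 1.8125, a few iterations
```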

An obvious question is how many iterations does the Dinkelbach method require. Table 4
includes the results for an Omega ratio problem using yearly data from 1975 to 2023 for
13 popular stocks such as IBM, P&G, and Merck. The models and data are available in the
Models Library at www.lindo.com. Search on Omega Ratio.
We can see that there were never more than 5 LP solves required. If a good initial guess
was supplied, then the number of LP solves was 2. If we had a perfect guess, then the number
of LP solves would be 1. One other interesting point about a “not quite” LP with a ratio objective is that there is always an optimum which is a corner point of the LP feasible region.
Thus, the number of iterations in the Dinkelbach method is never greater than the number of
corner points of the feasible region.

Table 4 Results of an Omega ratio problem (number of iterations for each initial q guess)

τ        q = 0   q = 1   q = 1000   Optimal q = r
0.0        4       3        3          10.512
0.025      4       4        3           6.070
0.05       4       4        3           3.391
0.075      4       4        4           1.862
0.10       4       3        5           1.174
0.125      3       3        4           0.677
0.15       2       3        3           0.377


11 A scheduling problem

11.1 An LP approach to the bicycle problem

While Integer Programming (IP) formulations are commonly used when trying to model a
scheduling problem, an LP approach can also be used to model some types of scheduling
problems. One interesting example is provided by the bicycle problem (Chvátal, 1983). In this
problem there are n people who want to travel from a common origin to a common destination
point, situated at a distance of d units away. There is one bicycle available, which can be
used by only one person at a time. We also assume that each person uses the bicycle at most
once. Each person i can walk with a speed of wi and bike with a speed of bi , where bi > wi .
The goal is to minimize the time it takes for the last person to reach the destination point. In
other words, this is a scheduling problem, where the schedule shows how the resource (i.e.,
the bicycle) can be shared among the people in order to transport all people to the destination
point in the least amount of time.
We will first disregard the sequencing details of the problem, and state some linear con-
straints that must be satisfied by any solution. We solve this LP and obtain a lower bound
t ∗ on the shortest duration solution. In many cases, the lower bound is attainable and an
optimal schedule can be easily deduced. When this is the case, the simple post-processing
step, presented in Sect. 11.2, can often be applied to obtain a detailed schedule with duration
equal to t ∗ . In other cases, the optimal duration to the scheduling problem is greater than t ∗ .
A formulation of the bicycle problem is as follows:

min t (80)
subject to
xi + ui + yi + zi ≤ t, i = 1, . . . , n (81)
wi xi − wi ui + bi yi − bi zi = d, i = 1, . . . , n (82)
Σ(i=1..n) yi + Σ(i=1..n) zi ≤ t, (83)
Σ(i=1..n) bi yi − Σ(i=1..n) bi zi ≤ d, (84)
xi, ui, yi, zi ≥ 0, i = 1, . . . , n. (85)

where x_i is the time person i spends walking forward, u_i is the time person i spends walking backwards, y_i is the time person i spends cycling forward, and z_i is the time person i spends cycling backwards. Again, in the above formulation, t provides a lower bound on the time it will take for the last person to reach the destination point.
Constraints (81) ensure that each person uses at most t units of time. Constraints (82) stipulate that each person travels exactly d units of distance. Constraint (83) guarantees that the total time spent on the bicycle is at most t. Constraint (84) says that the net forward travel distance on the bike is at most d. The objective function is in (80) and the non-negativity restrictions are in (85). A solution to this LP provides an efficient assignment of walking and cycling durations for each of the travelers, but not a detailed schedule.
An important observation is that this formulation allows people to move backwards toward the origin point. Indeed, for some instances of this problem, the optimal solution does have people moving in the reverse direction. Consider the example presented in Chvátal (1983), where we have n = 3 people who need to travel a distance of d = 100,

with the following moving speeds: w_1 = 1, b_1 = 6, w_2 = 2, b_2 = 8, w_3 = 1, b_3 = 6. In this case, the optimal value is t = 55 and a corresponding schedule is the following:

• Person 1: cycles from 0 to 54, and walks from 54 to 100
• Person 2: walks from 0 to 54, cycles backwards from 54 to 46, and walks from 46 to 100
• Person 3: walks from 0 to 46, and cycles from 46 to 100

[Fig. 15 Traveling schedule of each person from origin to destination: solid line shows travel on bicycle and dashed line travel on foot]

[Fig. 16 Utilization of the bicycle in the optimal solution of the example]

Fig. 15 includes a visualization of the above solution. The solid lines represent the distance traveled on the bicycle and the dashed lines represent travel on foot.
In other words, person 2 walks fast enough so that he can ride the bicycle back to distance
46 for person 3 to use. Thus, person 2 ends up walking for 108 units of distance in total.
In this solution, all people reach the destination in 55 units of time. One more interesting
observation about this optimal solution is that even though the bicycle allows all people to
travel faster, it is used for a relatively small portion of the time. Figure 16 shows the units of
time that the bicycle is being used by any of the three people. Of the 55 units of time that
it took for the three people to travel from the origin to the destination, the bicycle was only
used for 19 units of time.
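As a sanity check, the LP (80)–(85) for this three-traveler instance can be handed to any LP solver. The sketch below is our own illustration using scipy.optimize.linprog (an assumed tool, not part of the original paper); it recovers the lower bound t* = 55.

```python
# Sketch: solving the bicycle LP (80)-(85) for the Chvatal instance
# (n = 3 travelers, d = 100). Variables per person i: x_i (walk forward),
# u_i (walk backward), y_i (bike forward), z_i (bike backward); the last
# variable is t. All variables are nonnegative by linprog's defaults.
from scipy.optimize import linprog

n, d = 3, 100
w = [1, 2, 1]        # walking speeds
b = [6, 8, 6]        # biking speeds
nv = 4 * n + 1       # x_i, u_i, y_i, z_i for each i, plus t

c = [0.0] * (4 * n) + [1.0]     # (80): minimize t

A_ub, b_ub = [], []
for i in range(n):              # (81): x_i + u_i + y_i + z_i - t <= 0
    row = [0.0] * nv
    row[4*i:4*i+4] = [1.0, 1.0, 1.0, 1.0]
    row[-1] = -1.0
    A_ub.append(row)
    b_ub.append(0.0)

row = [0.0] * nv                # (83): total biking time - t <= 0
for i in range(n):
    row[4*i+2] = row[4*i+3] = 1.0
row[-1] = -1.0
A_ub.append(row)
b_ub.append(0.0)

row = [0.0] * nv                # (84): net forward bike distance <= d
for i in range(n):
    row[4*i+2], row[4*i+3] = float(b[i]), float(-b[i])
A_ub.append(row)
b_ub.append(float(d))

A_eq, b_eq = [], []
for i in range(n):              # (82): each person covers exactly d units
    row = [0.0] * nv
    row[4*i:4*i+4] = [float(w[i]), float(-w[i]), float(b[i]), float(-b[i])]
    A_eq.append(row)
    b_eq.append(float(d))

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(res.fun)                  # lower bound t*; 55.0 for this instance
```

The per-person biking durations y_i and z_i in res.x can then be fed to the post-processing step of Sect. 11.2.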
In the LP presented in (80)–(85), it can be shown (see Chvátal (1983)) that any optimal solution satisfies u_i = 0 and y_i z_i = 0. In other words, in an optimal solution, (1) no person ever walks backwards, and (2) each person may bike forward or backwards, but not both.
Although the LP provides a lower bound, and this bound holds for more general cases (e.g., multiple bikes), Chvátal (1983) provides an example (with four travelers) for which the bound is not tight. If there is one bicycle, at most one person can use it at any time; a constraint of this type is missing from the formulation, and adding one would probably require an IP formulation.
Integer programming is another fascinating and widely applicable topic in operations
research. As mentioned in Nemhauser and Wolsey (1999), LP algorithms are generally used
as critical subroutines in IP algorithms. It is, therefore, not much of an exaggeration to say:
without LP, there would be no IP.

11.2 Deducing a detailed schedule from the LP solution

If we restrict ourselves to Walk → Bike → Walk (W → B → W) schedules, where each person


first walks a distance (possibly 0), then bikes the distance (perhaps backwards) specified by

Table 5 Expected scores for 10 gymnasts in 4 events

Gymnast  Event 1  Event 2  Event 3  Event 4  Total
1        9.5      9.5      9.5      9.5      38.0
2        9.5      9.5      9.5      9.5      38.0
3        9.2      9.2      8.6      8.6      35.6
4        8.6      8.6      9.2      9.2      35.6
5        9.0      9.0      9.0      9.0      36.0
6        9.1      9.1      8.6      8.6      35.4
7        8.6      8.6      9.1      9.1      35.4
8        9.2      9.2      8.6      8.6      35.6
9        8.6      8.6      9.2      9.2      35.6
10       8.9      8.9      8.9      8.9      35.6
the LP solution, and then walks the remaining distance to the endpoint d, then it is fairly easy
to generate a feasible detailed schedule. This W → B → W algorithm is:

Place the bicycle initially at 0.
For p = 1 to number of persons:
    Person p walks to the current bicycle position;
    Person p bikes the distance specified by the LP solution;
    Update the bicycle position;
    Person p walks the rest of the way to finish at point d;

Note that the ordering of persons in the loop is important; a person who bikes backwards
should not correspond to p = 1.
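The steps above can be sketched in code. The function below is our own hedged illustration (the name wbw_schedule is hypothetical): it takes the net bike distance of each person from the LP solution, replays the W → B → W schedule in the given order, and also tracks when the bicycle becomes available so that feasibility can be checked.

```python
# Sketch of the W -> B -> W post-processing step (our illustration).
# Inputs: walking speeds w, biking speeds b, each person's net bike
# distance from the LP solution (negative = biking backwards), and d.
# Persons are processed in the given order; a person who bikes
# backwards should not come first.
def wbw_schedule(w, b, bike_dist, d):
    pos = 0.0        # current bicycle position
    free_at = 0.0    # time at which the bicycle is available at pos
    finish = []
    for p in range(len(w)):
        arrive = pos / w[p]              # walk from the origin to the bike
        pickup = max(arrive, free_at)    # wait if the bike is not yet there
        ride = abs(bike_dist[p]) / b[p]  # bike the LP-specified distance
        pos += bike_dist[p]
        free_at = pickup + ride          # bike is dropped at pos at this time
        finish.append(free_at + (d - pos) / w[p])  # walk the rest of the way
    return max(finish)

# Earlier Chvatal instance: LP bike distances +54, -8, +54 for persons 1, 2, 3.
print(wbw_schedule([1, 2, 1], [6, 8, 6], [54.0, -8.0, 54.0], 100))  # 55.0
```

On the earlier Chvátal instance, the function returns the makespan 55, matching the LP bound; the max() handles waiting for the bicycle, although no waiting actually occurs in this instance.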

12 Fundamental gymnastics

In the world of intercollegiate gymnastics, the selection of team members to compete in the
various events can be decisive in determining the overall outcome of a meet. Often the overall
scores of competing teams can differ by just half a point, out of a total team score
of around 180. So choosing an optimal lineup presents an interesting optimization problem
with practical consequences. This problem was first formulated by Ellis and Corn (1984),
the latter being a coach of women’s gymnastics at Utah State University.
In the competition being analyzed, there are four events (vault, uneven bars, balance
beam, floor exercise) and the coach needs to select at most six out of the n team members to
participate in each event. In addition, at least four of these participants are to be designated
as “all-rounders”, meaning that they must compete in all four events. Historical data are
available that indicate si j , the expected number of points out of 10 scored by participant i in
event j. An example problem (Land & Powell, 1985) involving a team of ten gymnasts is
shown in Table 5, which provides the expected score si j of gymnast i in event j.
Given a feasible assignment of gymnasts to events, the recorded team score is not simply
the sum of the si j for all selected participants i over the four events j, as formulated by
Ellis and Corn (1984). Rather, the recorded team score is computed as the sum of the top
five scores of team members in each of the four events. While maximizing the sum of all


participant scores over the four events will often maximize the recorded team score, this is
not always the case, as we will soon see.
The subsequent optimization model of Land and Powell (1985) incorporated the correct objective function by adding decision variables z_ij to the original model of Ellis and Corn (1984). Their formulation is provided below, where there are discrete 0–1 decision variables
xi j indicating whether (1) or not (0) gymnast i participates in event j, either as a regular
contestant or as an all-rounder. Additionally there are discrete 0–1 variables yi indicating
whether (1) or not (0) gymnast i participates as an all-rounder. To ensure that only the top
five scores in each event are recorded, Land and Powell (1985) introduced variables z i j ,
which serve to indicate whether or not the score si j of gymnast i is to be counted for event
j. Their relaxed formulation as a linear program (allowing all variables to be continuous) is
as follows.


max ∑_{i=1}^n ∑_{j=1}^4 s_ij z_ij                               (86)

subject to
x_ij ≥ y_i,              i = 1, . . . , n;  j = 1, . . . , 4    (87)
∑_{i=1}^n z_ij ≤ 5,      j = 1, . . . , 4                       (88)
∑_{i=1}^n y_i ≥ 4                                               (89)
∑_{i=1}^n x_ij = 6,      j = 1, . . . , 4                       (90)
z_ij ≤ x_ij,             i = 1, . . . , n;  j = 1, . . . , 4    (91)
0 ≤ y_i ≤ 1,             i = 1, . . . , n                       (92)
0 ≤ x_ij ≤ 1,  z_ij ≥ 0, i = 1, . . . , n;  j = 1, . . . , 4.   (93)
To understand this formulation, let’s first examine the constraints. Constraints (87) indicate that a gymnast designated as an all-rounder must participate in all four events, and constraint (89) enforces that at least four all-rounders must be selected. Constraints (88) restrict the recorded team score for each event to include only the top five scores for that event; constraints (91) enforce that scores are counted for an event only for gymnasts selected
to compete for that event. The equality (90) ensures that exactly six participants are selected
to compete for each event. Finally, the objective function to be maximized in (86) correctly
counts only the top five scores in each event.
As a specific illustration, consider the previous example involving a team of ten gymnasts,
competing in four events, with expected scores shown in Table 5. Solving the linear program
defined by (86)–(93) does indeed produce integer values for the x, y, z variables, yielding
the solution shown in Table 6. Here we see that gymnasts 1, 2, 4 and 8 are designated as all-rounders. The top five scores for each event are marked in Table 6, their overall sum being 186.0, the largest possible recorded team score. Note that the sum of all scores for the solution in Table 6 is 220.4, which is not the maximum possible sum of scores (221.2); however, an assignment producing this maximum sum yields a suboptimal value of 185.6 for the recorded team score.


Table 6 Team selection that maximizes recorded team score (counted scores marked with *)

Gymnast  Event 1  Event 2  Event 3  Event 4  Total
1        9.5*     9.5*     9.5*     9.5*     38.0
2        9.5*     9.5*     9.5*     9.5*     38.0
3        9.2*     9.2*                       18.4
4        8.6      8.6      9.2*     9.2*     35.6
5
6        9.1*     9.1*                       18.2
7                          9.1*     9.1*     18.2
8        9.2*     9.2*     8.6      8.6      35.6
9                          9.2*     9.2*     18.4
10
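Since (86)–(93) is an ordinary LP, it can be solved directly. The sketch below is our own illustration (scipy.optimize.linprog is an assumed solver, not used in the original papers); it builds the constraint matrices for the Table 5 data and reproduces the recorded team score of 186.0.

```python
# Sketch: LP relaxation (86)-(93) for the Table 5 data, n = 10 gymnasts,
# m = 4 events. Variable order: all x_ij, then y_i, then z_ij.
from scipy.optimize import linprog

s = [[9.5]*4, [9.5]*4, [9.2, 9.2, 8.6, 8.6], [8.6, 8.6, 9.2, 9.2],
     [9.0]*4, [9.1, 9.1, 8.6, 8.6], [8.6, 8.6, 9.1, 9.1],
     [9.2, 9.2, 8.6, 8.6], [8.6, 8.6, 9.2, 9.2], [8.9]*4]
n, m = 10, 4
X = lambda i, j: i*m + j            # index of x_ij
Y = lambda i: n*m + i               # index of y_i
Z = lambda i, j: n*m + n + i*m + j  # index of z_ij
nv = 2*n*m + n

c = [0.0]*nv
for i in range(n):
    for j in range(m):
        c[Z(i, j)] = -s[i][j]       # maximize (86) => minimize the negative

A_ub, b_ub = [], []
for i in range(n):
    for j in range(m):
        row = [0.0]*nv              # (87): y_i - x_ij <= 0
        row[Y(i)], row[X(i, j)] = 1.0, -1.0
        A_ub.append(row); b_ub.append(0.0)
        row = [0.0]*nv              # (91): z_ij - x_ij <= 0
        row[Z(i, j)], row[X(i, j)] = 1.0, -1.0
        A_ub.append(row); b_ub.append(0.0)
for j in range(m):
    row = [0.0]*nv                  # (88): sum_i z_ij <= 5
    for i in range(n):
        row[Z(i, j)] = 1.0
    A_ub.append(row); b_ub.append(5.0)
row = [0.0]*nv                      # (89): sum_i y_i >= 4
for i in range(n):
    row[Y(i)] = -1.0
A_ub.append(row); b_ub.append(-4.0)

A_eq, b_eq = [], []
for j in range(m):
    row = [0.0]*nv                  # (90): sum_i x_ij = 6
    for i in range(n):
        row[X(i, j)] = 1.0
    A_eq.append(row); b_eq.append(6.0)

bounds = [(0, 1)]*(n*m) + [(0, 1)]*n + [(0, None)]*(n*m)  # (92)-(93)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(round(-res.fun, 1))           # recorded team score: 186.0
```

Inspecting res.x lets one check whether the solver returned the integral solution of Table 6.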

This example again illustrates our theme of unexpected and surprising applications of
linear programming. It is unexpected that optimization models would arise in the selection of a
lineup for gymnastics competitions and it is surprising that a linear programming formulation
can yield discrete values for the decision variables.

13 Conclusions

In this paper, we have tried to illustrate the enduring power and flexibility of linear program-
ming. From Sect. 2 to Sect. 12, we have briefly outlined some noteworthy appearances of
linear programming. For each of these topics, there are additional refinements of the method-
ology presented as well as countless applications. Some of our references (e.g., Emrouznejad
and Marra (2017); Emrouznejad and Yang (2018); Jones and Tamiz (2010)) provide a partial
list of these applications. The key point, of course, is that Professor Harold Hotelling (recall
Sect. 1) and other intellectual giants from the early days of operations research would be
amazed at the broad array of surprising and unexpected applications of linear programming.
Although each section is self-contained, there emerge a few unifying ideas such as the
handling of absolute values, minimizing a max function or maximizing a min function via
bounding, and taking the natural log of constraints in ratio form. In addition, linear pro-
gramming relaxation allows us to drop nonlinear constraints in order to solve a simple LP;
sometimes a post-processor can be applied to recapture the dropped constraints, as in Sects. 10
and 11. Also, as observed in Sect. 10, ratio constraints and objectives can often be converted
to linear ones. However, the most important commonality is that with a dose of cleverness,
linear programming can be so widely and successfully applied.
There are other general applications that we might have included, e.g., minimizing a piece-
wise linear convex function due to Charnes and Lemke (1954), and the deep and fundamental
connection between linear programming and integer programming. An important and early
illustration of this last item is the classic article by Dantzig et al. (1954) in which the authors
solve a 49-city traveling salesman problem (an integer program) using linear programming
only; of course, this last item probably requires a book rather than a section.
Linear programming is the fundamental model in operations research and the simplex
method is the fundamental algorithm in operations research. Indeed, the simplex method
ranks among the top ten algorithms in the twentieth century (Dongarra & Sullivan, 2000).


While more than 75 years old, this technique continues to be competitive (and we marvel at
37-year-old Novak Djokovic). In closing, we should all be proud to be operations researchers.

Appendix A On the uniqueness of solutions to linear programs

Appa (2002) presents a method that determines whether an LP has a unique optimal solution and, if the solution is not unique, provides an alternative optimal solution. In particular, this is achieved by solving an additional LP, as described below.
Consider an LP in standard form:

max ∑_{j=1}^n c_j x_j                      (A1)
subject to
∑_{j=1}^n a_ij x_j = b_i,      ∀i          (A2)
x_j ≥ 0,                       ∀j.         (A3)
Let x* = {x*_1, x*_2, . . . , x*_n} be an optimal basic solution to the above LP, and let T denote the set of all variables in x* that are equal to zero. We also define the parameter d_j to equal 1 if j is in T, and 0 otherwise. To check whether x* is unique, we can solve the following LP:

max ∑_{j=1}^n d_j x_j                      (A4)
subject to
∑_{j=1}^n a_ij x_j = b_i,      ∀i          (A5)
∑_{j=1}^n c_j x_j = ∑_{j=1}^n c_j x*_j     (A6)
x_j ≥ 0,                       ∀j.         (A7)

Let x̄ = {x̄_1, x̄_2, . . . , x̄_n} be an optimal solution to the above LP. If its optimal objective value is zero (i.e., ∑_{j=1}^n d_j x̄_j = 0), then x* is the unique optimal solution to the initial LP. On the other hand, if ∑_{j=1}^n d_j x̄_j > 0, then x̄ is an alternative optimal solution of the initial LP. In other words, the second LP forces the variables in x* that are equal to zero to take strictly positive values if possible (objective function (A4)), while maintaining the same objective value obtained in the initial LP (constraint (A6)).
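To make the procedure concrete, here is a small sketch (our own, with scipy.optimize.linprog as an assumed solver) applied to a toy LP that clearly has alternative optima.

```python
# Sketch of the uniqueness check on a toy standard-form LP with
# alternative optima: max x1 + x2  s.t.  x1 + x2 + x3 = 1,  x >= 0
# (every point (a, 1 - a, 0) with 0 <= a <= 1 is optimal).
from scipy.optimize import linprog

c = [1.0, 1.0, 0.0]
A_eq, b_eq = [[1.0, 1.0, 1.0]], [1.0]

# First LP (A1)-(A3); linprog minimizes, so we negate c.
res1 = linprog([-v for v in c], A_eq=A_eq, b_eq=b_eq)
x_star = list(res1.x)
z_star = sum(ci * xi for ci, xi in zip(c, x_star))

# T = variables equal to zero in x*; d_j = 1 for j in T, 0 otherwise.
d = [1.0 if abs(xj) < 1e-9 else 0.0 for xj in x_star]

# Second LP (A4)-(A7): same constraints plus c x = c x* (constraint A6).
res2 = linprog([-dj for dj in d], A_eq=A_eq + [c], b_eq=b_eq + [z_star])
if -res2.fun > 1e-9:
    print("alternative optimum found:", list(res2.x))
else:
    print("x* is the unique optimum")
```

Whichever optimal vertex the solver returns for the first LP, the second LP drives one of its zero variables positive, confirming that the optimum is not unique.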
Note that, if the reduced costs are available, we can modify the definitions of T and d j .
In particular, we can let T denote the set of all variables in x ∗ that are equal to zero and have
a reduced cost of 0. If T is empty, then there are no multiple optimal solutions. If T is not
empty, set coefficient d j equal to 1 if j is in T , and equal to 0 otherwise. Then, we solve the
above LP.
The above approach will generate one alternative optimum. With probability approaching 1, one can generate all alternative corner-point optima by the following variation of the procedure:

Repeat until some termination condition is met:


For j = 1 to n:
Generate random values for d j in the interval [−1, +1];
Solve (A4)–(A7) and store the point if new.

Appendix B An example of a first stage LP for AHP

For the pairwise comparison matrix in Fig. 11, the following first-stage LP emerges:

min z 12 + z 13 + z 23
subject to
x1 − x2 − y12 = ln(2) x2 − x1 − y21 = ln(1/2)
x1 − x3 − y13 = ln(5) x3 − x1 − y31 = ln(1/5)
x2 − x3 − y23 = ln(2) x3 − x2 − y32 = ln(1/2)
z 12 ≥ y12 z 12 ≥ y21
z 13 ≥ y13 z 13 ≥ y31
z 23 ≥ y23 z 23 ≥ y32
x1 − x2 ≥ 0 x1 = 0
x1 − x3 ≥ 0 z i j ≥ 0, ∀i, j
x2 − x3 ≥ 0 xi , yi j unrestricted, ∀i, j.
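This first-stage LP is small enough to verify numerically. The sketch below is our own illustration (scipy.optimize.linprog is an assumed solver): it fixes x1 = 0, eliminates the y_ij by noting that z_ij ≥ y_ij and z_ij ≥ y_ji together say z_ij ≥ |x_i − x_j − ln a_ij|, and solves the resulting five-variable LP.

```python
# Sketch: solving the first-stage AHP LP above with x1 fixed at 0.
# The y_ij are eliminated, since z_ij >= y_ij and z_ij >= y_ji collapse
# to z_ij >= |x_i - x_j - ln a_ij|, giving two <= rows per pair.
# Variable order: [x2, x3, z12, z13, z23]; x2, x3 free, z >= 0.
import math
from scipy.optimize import linprog

ln2, ln5 = math.log(2), math.log(5)
c = [0.0, 0.0, 1.0, 1.0, 1.0]          # min z12 + z13 + z23
A_ub = [
    [-1,  0, -1,  0,  0],   # -x2 - z12 <= ln 2   (z12 >= x1 - x2 - ln 2)
    [ 1,  0, -1,  0,  0],   #  x2 - z12 <= -ln 2  (z12 >= x2 - x1 - ln(1/2))
    [ 0, -1,  0, -1,  0],   # -x3 - z13 <= ln 5
    [ 0,  1,  0, -1,  0],   #  x3 - z13 <= -ln 5
    [ 1, -1,  0,  0, -1],   #  x2 - x3 - z23 <= ln 2
    [-1,  1,  0,  0, -1],   #  x3 - x2 - z23 <= -ln 2
    [ 1,  0,  0,  0,  0],   #  x2 <= 0  (x1 - x2 >= 0 with x1 = 0)
    [ 0,  1,  0,  0,  0],   #  x3 <= 0  (x1 - x3 >= 0)
    [-1,  1,  0,  0,  0],   #  x3 - x2 <= 0  (x2 - x3 >= 0)
]
b_ub = [ln2, -ln2, ln5, -ln5, ln2, -ln2, 0.0, 0.0, 0.0]
bounds = [(None, None), (None, None), (0, None), (0, None), (0, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.fun)   # minimum total inconsistency = ln(5/4), about 0.2231
```

The optimal objective comes out to ln(5/4) ≈ 0.223, reflecting the mild inconsistency of the comparisons (2 × 2 ≠ 5); in the AHP setting, the priority weights would then be proportional to e^{x_i}.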

Acknowledgements We thank Austin Buchanan at Oklahoma State University for pointing out a small error
in an earlier version of Appendix A.

Funding Open access funding provided by the Carolinas Consortium.

Declarations
Conflict of interest The authors declare that they have no Conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Agarwal, R., & Karahanna, E. (2000). Time flies when you’re having fun: Cognitive absorption and beliefs
about information technology usage. MIS Quarterly, 24(4), 665–694.
Anderson, D. & Bjarnadottir M. (2024). As good as it gets? A new approach to estimating possible prediction
performance. To appear in PLOS ONE.
Appa, G. (2002). On the uniqueness of solutions to linear programs. Journal of the Operational Research
Society, 53(10), 1127–1132.
Assad, A. A., & Gass, S. I. (2011). Profiles in operations research: Pioneers and innovators. Springer.


Chandran, B., Golden, B., & Wasil, E. (2005). Linear programming models for estimating weights in the
analytic hierarchy process. Computers & Operations Research, 32(9), 2235–2254.
Charnes, A., & Cooper, W. W. (1961). Management models and industrial applications of linear programming.
John Wiley.
Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research Logis-
tics Quarterly, 9(3–4), 181–186.
Charnes, A., & Lemke, C. E. (1954). Minimization of non-linear separable convex functionals. Naval Research
Logistics Quarterly, 1(4), 301–312.
Chvátal, V. (1983). On the bicycle problem. Discrete Applied Mathematics, 5(2), 165–173.
Dantzig, G., Fulkerson, R., & Johnson, S. (1954). Solution of a large-scale traveling-salesman problem. Operations Research, 2(4), 393–410.
DiNardo, G., Levy, D., & Golden, B. (1989). Using decision analysis to manage Maryland’s river herring
fishery: An application of AHP. Journal of Environmental Management, 29(2), 192–213.
Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13(7), 492–498.
Dongarra, J., & Sullivan, F. (2000). Guest editors’ introduction: The top 10 algorithms. Computing in Science
& Engineering, 2(1), 22–23.
Ellis, P. M., & Corn, R. W. (1984). Using bivalent integer programming to select teams for intercollegiate
women’s gymnastics competition. Interfaces, 14(3), 41–46.
Emrouznejad, A., & Marra, M. (2017). The state of the art development of AHP (1979–2017): A literature
review with a social network analysis. International Journal of Production Research, 55(22), 6653–6675.
Emrouznejad, A., & Yang, G. L. (2018). A survey and analysis of the first 40 years of scholarly literature in DEA: 1978–2016. Socio-Economic Planning Sciences, 61, 4–8.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2),
179–188.
Flavell, R. (1976). A new goal programming formulation. Omega, 4(6), 731–732.
Freed, N., & Glover, F. (1981). Simple but powerful goal programming models for discriminant problems.
European Journal of Operational Research, 7(1), 44–60.
Freed, N., & Glover, F. (1986). Evaluating alternative linear programming models to solve the two-group
discriminant problem. Decision Sciences, 17(2), 151–162.
Gass, S., Witzgall, C., & Harary, H. H. (1998). Fitting circles and spheres to coordinate measuring machine
data. The International Journal of Flexible Manufacturing Systems, 10(1), 5–25.
Gochet, W., Stam, A., Srinivasan, V., et al. (1997). Multigroup discriminant analysis using linear programming.
Operations Research, 45(2), 213–225.
Golden, B., Schrage, L., Shier, D., et al. (2021). The power of linear programming: Some surprising and
unexpected LPs. 4OR, 19(1), 15–40.
Golden, B. L., Wasil, E. A., & Harker, P. T. (Eds.). (1989). The analytic hierarchy process: Applications and
studies. Springer.
Jones, D., & Tamiz, M. (2010). Practical goal programming. Springer.
Kapsos, M., Zymler, S., Christofides, N., et al. (2014). Optimizing the Omega ratio using linear programming.
Journal of Computational Finance, 17(4), 49–57.
Keating, C., & Shadwick, W. F. (2002). A universal performance measure. Journal of Performance Measure-
ment, 6(3), 59–84.
Koehler, G. J. (1990). Considerations for mathematical programming models in discriminant analysis. Man-
agerial and Decision Economics, 11(4), 227–234.
Land, A., & Powell, S. (1985). Note: More gymnastics. Interfaces, 15(4), 52–54.
Lenstra, J. K., Rinnooy Kan, A. H. G., & Schrijver, A. (Eds.). (1991). History of mathematical programming:
A collection of personal reminiscences. Centrum voor Wiskunde en Informatica.
Nemhauser, G. L., & Wolsey, L. A. (1999). Integer and Combinatorial Optimization. Wiley-Interscience.
Retchless, T., Golden, B., & Wasil, E. (2007). Ranking US army generals of the 20th century: A group
decision-making application of the analytic hierarchy process. Interfaces, 37(2), 163–175.
Saaty, T. L. (1977). A scaling method for priorities in hierarchical structures. Journal of Mathematical Psy-
chology, 15(3), 234–281.
Saaty, T. L. (1980). The analytic hierarchy process. McGraw-Hill.
Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1), 119–138.
Shinmura, S. (2016). New theory of discriminant analysis after R. Fisher. Springer.
Stigler, S. M. (1981). Gauss and the invention of least squares. The Annals of Statistics, 9(3), 465–474.
von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic Behavior (60th Anniversary
Commemorative Edition). Princeton University Press, http://www.jstor.org/stable/j.ctt1r2gkx.
Wang, Y. (2017). Operations research 04G: Goal programming. Retrieved January 8, 2020, from https://www.
youtube.com/watch?v=D1xYQdnmKvY


Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

