0% found this document useful (0 votes)
182 views22 pages

Dea

The document describes how to efficiently solve multiple small linear programming (LP) models that arise in Data Envelopment Analysis (DEA) by combining the models into larger batches. DEA involves solving a separate LP for each Decision Making Unit (DMU) to calculate its efficiency. Solving many small LPs individually in a loop can be slow in GAMS. The document proposes two approaches: 1) indexing the DEA model on the DMU set and looping to solve a single model for each DMU, and 2) combining multiple small DEA models into a single larger LP to improve performance when using modern solvers.

Uploaded by

Coxa100Nocao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
182 views22 pages

Dea

The document describes how to efficiently solve multiple small linear programming (LP) models that arise in Data Envelopment Analysis (DEA) by combining the models into larger batches. DEA involves solving a separate LP for each Decision Making Unit (DMU) to calculate its efficiency. Solving many small LPs individually in a loop can be slow in GAMS. The document proposes two approaches: 1) indexing the DEA model on the DMU set and looping to solve a single model for each DMU, and 2) combining multiple small DEA models into a single larger LP to improve performance when using modern solvers.

Uploaded by

Coxa100Nocao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

ERWIN KALVELAGEN

Abstract. Data Envelopment Analysis deals with solving a series of small


linear programming models. This document describes a simple way to combine
a number of these small models to improve performance. Especially the with
the current crop of state-of-the-art linear programming solvers it is beneficial
to solve these small models in relative large batches.

1. Data envelopment analysis


Data envelopment analysis or DEA [3, 4, 7] is an LP based technique for evaluating the relative efficiency of Decision Making Units (DMUs). In many cases
the performance non-profit and government organizational units is very difficult to
compare: their outputs are not readily comparable and no monetary value can be
easily assigned to inputs or outputs. With this technique, one can make draw some
conclusions, using concept related to an efficient frontier known from quadratic
programming applications in finance [13]. It is a non-parametric method: we dont
need an explicit specification of the functional relationship between inputs and
outputs (i.e. a production function) [5].
We assume that each DMU j has multiple inputs xi,j and multiple outputs yk,j .
A relative efficiency measure is defined by:
P
uk yk,j
(1)
Efficiency = Pk
i vi xi,j
where u and v are weights. Often the efficiency is scaled so that it ranges from
[0, 1].
The weights form a problem: setting a uniform value for them over all DMUs
is rather arbitrary. The main idea behind DEA, is that we allow each DMU j0
to set its own weights. It can use the following optimization problem for that:
maximize the efficiency of DMU j0 subject to the condition that all efficiencies of
other DMUs remain less than or equal to 1. I.e.

(2)

P
uk yk,j0
maximize 0 = Pk
u,v
i vi xi,j0
P
u
y
k k,j
subject to Pk
1 j
v
i
i xi,j
uk , vi 0

This is not an LP however. A simple work around is to fix the denominator to


a constant value, e.g. 1.0, which can be interpreted as setting a constraint on the
Date: November 12, 2002, updated November 2, 2004.
1

ERWIN KALVELAGEN

weights vi (often weights are normalized to add up to one; this can be considered
as a slightly more complex normalization). This results in:
maximize
u,v

subject to
(3)

uk yk,j0

vi xi,j0 = 1

uk yk,j

vi xi,j j

uk , vi 0
It is noted that x and y are no decision variables but rather data. The decision
variables are the weights u and v.
In some places [7] the dual has been mentioned as being preferable from a computational point of view (typical primal models have many more rows than columns).
The dual DEA model can be stated as:
minimize z0 = j0

X
j yk,j yk,j0
(4)

j0 xi,j0

j xi,j

j 0
Other forms for the DEA model have been proposed. The model we discussed
above is called the CCR model after the authors of [3]. Some variants set a lower
bound on uk and vi to prevent zero weights: uk , vi . Another basic model
is the BCC model [1]. This model is based on the dual, and adds a restriction on
the s:
minimize z0 = j0

X
j yk,j yk,j0
j

(5)

j0 xi,j0

j xi,j

j = 1

j 0
This transforms the model from being constant returns-to-scale to variable
returns-to-scale. The scores from this model are sometimes called pure technical
efficiency scores as they eliminate scale-efficiency from the analysis [2, 17].
2. GAMS implementation
We have to repeat the solution of the DEA LP model for every DMU. In GAMS
this is coded quite easily using a loop:

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

Model dea.gms.

$ontext
Data Envelopment Analysis (DEA) example
Erwin Kalvelagen, may 2002
Data from:
Emrouznejad, A (1995-2001),
" Ali Emrouznejads DEA HomePage",
Warwick Business School, Coventry CV4 7AL, UK

$offtext

sets i
j
inp(j)
outp(j)
;

"DMUs" /Depot1*Depot20/
inputs and outputs /stock, wages, issues, receipts, reqs/
inputs /stock, wages/
outputs /issues, receipts, reqs/

Table data(i,j)
stock wages
Depot1
3
5
Depot2
2.5
4.5
Depot3
4
6
Depot4
6
7
Depot5
2.3
3.5
Depot6
4
6.5
Depot7
7
10
Depot8
4.4
6.4
Depot9
3
5
Depot10
5
7
Depot11
5
7
Depot12
2
4
Depot13
5
7
Depot14
4
4
Depot15
2
3
Depot16
3
6
Depot17
7
11
Depot18
4
6
Depot19
3
4
Depot20
3
6
;

parameter
x0(inp)
y0(outp)
x(inp,i)
y(outp,i)
;

issues
40
45
55
48
28
48
80
25
45
70
45
45
65
38
20
38
68
25
45
57

receipts
55
50
45
20
50
20
65
48
64
65
65
40
25
18
50
20
64
38
67
60

reqs
30
40
30
60
25
65
57
30
42
48
40
44
35
64
15
60
54
20
32
40

inputs of DMU j0
outputs of DMU j0
inputs of DMU i
outputs of DMU i

positive variables
v(inp) input weights
u(outp) output weights
;
variable
eff efficiency
;
equations
objective
normalize

objective function: maximize efficiency


normalize input weights

1http://amsterdamoptimization.com/models/dea/dea.gms

ERWIN KALVELAGEN

limit(i)

"limit other DMUs efficiency";

objective..

eff =e= sum(outp, u(outp)*y0(outp));

normalize..

sum(inp, v(inp)*x0(inp)) =e= 1;

limit(i)..

sum(outp, u(outp)*y(outp,i)) =l= sum(inp, v(inp)*x(inp,i));

model dea /objective, normalize, limit/;

alias (i,iter);
x(inp,i) = data(i,inp);
y(outp,i) = data(i,outp);
parameter efficiency(i) efficiency of each DMU;
loop(iter,
x0(inp) = x(inp, iter);
y0(outp) = y(outp, iter);
solve dea using lp maximizing eff;
abort$(dea.modelstat<>1) "LP was not optimal";
efficiency(iter) = eff.l;
);

display efficiency;
*
* create sorted output
*
set r /rnk1*rnk1000/;
parameter rank(i);
alias (i,ii);
rank(i) = sum(ii$(efficiency(ii)>=efficiency(i)), 1);
parameter efficiency2(r,i);
efficiency2(r,i)=efficiency(i)$(rank(i)=ord(r));
option efficiency2:4:0:1;
display efficiency2;

The result is:


----

105 PARAMETER efficiency2

rnk4 .Depot12
rnk4 .Depot19
rnk7 .Depot5
rnk10.Depot10
rnk13.Depot1
rnk16.Depot4
rnk19.Depot8

1.0000
1.0000
0.9466
0.8889
0.8204
0.6528
0.5169

rnk4 .Depot14
rnk5 .Depot9
rnk8 .Depot2
rnk11.Depot13
rnk14.Depot3
rnk17.Depot11
rnk20.Depot18

1.0000
0.9634
0.9417
0.8254
0.8148
0.6313
0.4201

rnk4 .Depot15
rnk6 .Depot20
rnk9 .Depot16
rnk12.Depot6
rnk15.Depot7
rnk18.Depot17

1.0000
0.9517
0.9091
0.8228
0.7111
0.5495

The sorting step is interesting: it is rather non-intuitive, but it works. Notice


the behavior when multiple entries have the same values.
An alternative formulation can be formed by not copying data into x0 and y0
but to make the model indexed on a set. We then loop over this set.
Model dea2.gms.

$ontext
Data Envelopment Analysis (DEA) example
Indexed equations formulation.
2http://amsterdamoptimization.com/models/dea/dea2.gms

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

Erwin Kalvelagen, may 2002


Data from:
Emrouznejad, A (1995-2001),
" Ali Emrouznejads DEA HomePage",
Warwick Business School, Coventry CV4 7AL, UK

$offtext

sets i
j
inp(j)
outp(j)
;

"DMUs" /Depot1*Depot20/
inputs and outputs /stock, wages, issues, receipts, reqs/
inputs /stock, wages/
outputs /issues, receipts, reqs/

set j0(i) current DMU;


Table data(i,j)
stock wages
Depot1
3
5
Depot2
2.5
4.5
Depot3
4
6
Depot4
6
7
Depot5
2.3
3.5
Depot6
4
6.5
Depot7
7
10
Depot8
4.4
6.4
Depot9
3
5
Depot10
5
7
Depot11
5
7
Depot12
2
4
Depot13
5
7
Depot14
4
4
Depot15
2
3
Depot16
3
6
Depot17
7
11
Depot18
4
6
Depot19
3
4
Depot20
3
6
;

issues
40
45
55
48
28
48
80
25
45
70
45
45
65
38
20
38
68
25
45
57

receipts
55
50
45
20
50
20
65
48
64
65
65
40
25
18
50
20
64
38
67
60

reqs
30
40
30
60
25
65
57
30
42
48
40
44
35
64
15
60
54
20
32
40

parameter
x(inp,i) inputs of DMU i
y(outp,i) outputs of DMU i
;
positive variables
v(inp) input weights
u(outp) output weights
;
variable
eff efficiency
;
equations
objective(i)
normalize(i)
limit(i)

objective function: maximize efficiency


normalize input weights
"limit other DMUs efficiency";

objective(j0)..

eff =e= sum(outp, u(outp)*y(outp,j0));

normalize(j0)..

sum(inp, v(inp)*x(inp,j0)) =e= 1;

limit(i)..

sum(outp, u(outp)*y(outp,i)) =l= sum(inp, v(inp)*x(inp,i));

ERWIN KALVELAGEN

model dea /objective, normalize, limit/;

alias(i,iter);
x(inp,i) = data(i,inp);
y(outp,i) = data(i,outp);
parameter efficiency(i) efficiency of each DMU;
loop(iter,
*
* set j0 is the current DMU
*
j0(i) = no;
j0(iter) = yes;
solve dea using lp maximizing eff;
abort$(dea.modelstat<>1) "LP was not optimal";
efficiency(iter) = eff.l;
);

display efficiency;
*
* create sorted output
*
set r /rnk1*rnk1000/;
parameter rank(i);
alias (i,ii);
rank(i) = sum(ii$(efficiency(ii)>=efficiency(i)), 1);
parameter efficiency2(r,i);
efficiency2(r,i)=efficiency(i)$(rank(i)=ord(r));
option efficiency2:4:0:1;
display efficiency2;

Note that the set j0 is a dynamic set. The equations are therefore declared over
the set i, which is a static set. We then define the equations over the set j0 which
will be calculated inside the loop.
GAMS protects the modeler by forbidding the loop set to be used in equations.
However that is exactly what we need here. To work around this, we use a different
loop set iter and calculate the set j0 inside the loop.
3. Performance issues
The LPs in the model are all very small: 22 equations and 6 variables. Nevertheless GAMS will get slow if the number of DMUs gets large. Part of it we can
easily fix: the large amount of data written to the listing file. This can be reduced
to a minimum by the following statements:

option limrow=0; to remove the equation listing


option limcol=0; to remove the column listing
option solprint=off; to remove the solution listing
model.solprint=2; to suppress even more solver output
model.solvelink=2; to keep GAMS in memory while the solver executes

This will speed up GAMS but as the loop unfolds, GAMS may still become
unbearably slow. Basically, GAMS has too much overhead in solving very small

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

models in a loop. We can alleviate this by folding several small LPs into one. For
the model above, we can solve the whole thing in one swoop. Say a single model
for DMU i has the standard LP format:
maximize cTi xi
xi

(6)

Ai x i = bi
`i xi ui

then a combined model can look like:


maximize

(7)

cTi xi

Ax = b
`xu
xT1 xT2

where x =

. . . xTn

and

A1

A=

(8)

A2
..

.
An

I.e. the matrix becomes a block-diagonal matrix with disconnected blocks. In a


GAMS model we can implement this by introducing an extra index on all variables
and equations.
Model dea3.gms.

$ontext
Data Envelopment Analysis (DEA) example
One LP formulation.
Erwin Kalvelagen, may 2002
Data from:
Emrouznejad, A (1995-2001),
" Ali Emrouznejads DEA HomePage",
Warwick Business School, Coventry CV4 7AL, UK

$offtext

sets i
j
inp(j)
outp(j)
;

"DMUs" /Depot1*Depot20/
inputs and outputs /stock, wages, issues, receipts, reqs/
inputs /stock, wages/
outputs /issues, receipts, reqs/

alias (i,j0);
Table data(i,j)
stock wages
Depot1
3
5
Depot2
2.5
4.5
Depot3
4
6
Depot4
6
7
Depot5
2.3
3.5

issues
40
45
55
48
28

receipts
55
50
45
20
50

reqs
30
40
30
60
25

3http://amsterdamoptimization.com/models/dea/dea3.gms

Depot6
Depot7
Depot8
Depot9
Depot10
Depot11
Depot12
Depot13
Depot14
Depot15
Depot16
Depot17
Depot18
Depot19
Depot20
;

ERWIN KALVELAGEN

4
7
4.4
3
5
5
2
5
4
2
3
7
4
3
3

6.5
10
6.4
5
7
7
4
7
4
3
6
11
6
4
6

48
80
25
45
70
45
45
65
38
20
38
68
25
45
57

20
65
48
64
65
65
40
25
18
50
20
64
38
67
60

65
57
30
42
48
40
44
35
64
15
60
54
20
32
40

parameter
x(inp,i) inputs of DMU i
y(outp,i) outputs of DMU i
;
positive variables
v(inp,j0)
input weights
u(outp,j0) output weights
;
variable
eff(j0)
totaleff
;

efficiency of each DMU


summation of all efficiencies

equations
objective(j0)
normalize(j0)
limit(i,j0)
totalobj
;

objective function: maximize efficiency


normalize input weights
"limit other DMUs efficiency"
combines all individual objective functions

totalobj..

totaleff =e= sum(j0, eff(j0));

objective(j0)..

eff(j0) =e=

normalize(j0)..

sum(inp, v(inp,j0)*x(inp,j0)) =e= 1;

limit(i,j0)..

sum(outp, u(outp,j0)*y(outp,i)) =l= sum(inp, v(inp,j0)*x(inp,i));

sum(outp, u(outp,j0)*y(outp,j0));

model dea /objective, normalize, limit, totalobj/;


x(inp,i) = data(i,inp);
y(outp,i) = data(i,outp);
solve dea using lp maximizing totaleff;

*
* create sorted output
*
set r /rnk1*rnk1000/;
parameter rank(i);
alias (i,ii);
rank(i) = sum(ii$(eff.l(ii)>=eff.l(i)), 1);
parameter efficiency2(r,i);
efficiency2(r,i)=eff.l(i)$(rank(i)=ord(r));
option efficiency2:4:0:1;
display efficiency2;

This model has just 441 equations and 121 variables, so it is still very small for
current standards. In this case, a one LP formulation solves much quicker than

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

formulating twenty little models, one for each DMU. (We note that the actual
matrix being generated is not block-diagonal, but rather permuted block-diagonal:
after some simple row and column swaps the matrix can be made block-diagonal).
If you have many DMUs it is possible to find a balance between looping and
solving big LPs. E.g. suppose one has 100 DMUs, then it may make sense to solve
5 batches of 20 combined problems.
In the example below we set up a set dist which determines the distribution of
DMUs over runs. In this case we have two runs. This first takes care of DMUs 1
through 10, while the second run does DMUs 11 through 20.
Model dea4.gms.

$ontext
Data Envelopment Analysis (DEA) example
Flexible batch formulation
Erwin Kalvelagen, may 2002
Data from:
Emrouznejad, A (1995-2001),
" Ali Emrouznejads DEA HomePage",
Warwick Business School, Coventry CV4 7AL, UK

$offtext

sets i
j
inp(j)
outp(j)
;

"DMUs" /Depot1*Depot20/
inputs and outputs /stock, wages, issues, receipts, reqs/
inputs /stock, wages/
outputs /issues, receipts, reqs/

set j0(i) current DMU;


Table data(i,j)
stock wages
Depot1
3
5
Depot2
2.5
4.5
Depot3
4
6
Depot4
6
7
Depot5
2.3
3.5
Depot6
4
6.5
Depot7
7
10
Depot8
4.4
6.4
Depot9
3
5
Depot10
5
7
Depot11
5
7
Depot12
2
4
Depot13
5
7
Depot14
4
4
Depot15
2
3
Depot16
3
6
Depot17
7
11
Depot18
4
6
Depot19
3
4
Depot20
3
6
;

issues
40
45
55
48
28
48
80
25
45
70
45
45
65
38
20
38
68
25
45
57

receipts
55
50
45
20
50
20
65
48
64
65
65
40
25
18
50
20
64
38
67
60

reqs
30
40
30
60
25
65
57
30
42
48
40
44
35
64
15
60
54
20
32
40

set run batch run /run1*run2/;


set dist(run,i) distribution /
4

http://amsterdamoptimization.com/models/dea/dea4.gms

10

ERWIN KALVELAGEN

run1.(depot1*depot10),
run2.(depot11*depot20)
/;

parameter
x(inp,i) inputs of DMU i
y(outp,i) outputs of DMU i
;
positive variables
v(inp,i)
input weights
u(outp,i) output weights
;
variable
eff(i)
totaleff
;

efficiency of each DMU


summation of all efficiencies

equations
objective(i)
normalize(i)
limit(i,i)
totalobj
;

objective function: maximize efficiency


normalize input weights
"limit other DMUs efficiency"
combines all individual objective functions

totalobj..

totaleff =e= sum(j0, eff(j0));

objective(j0)..

eff(j0) =e=

normalize(j0)..

sum(inp, v(inp,j0)*x(inp,j0)) =e= 1;

limit(i,j0)..

sum(outp, u(outp,j0)*y(outp,i)) =l= sum(inp, v(inp,j0)*x(inp,i));

sum(outp, u(outp,j0)*y(outp,j0));

model dea /objective, normalize, limit, totalobj/;


x(inp,i) = data(i,inp);
y(outp,i) = data(i,outp);
parameter efficiency(i) efficiency of each DMU;

loop(run,
j0(i) = no;
j0(i)$dist(run,i) = yes;
solve dea using lp maximizing totaleff;
efficiency(j0) = eff.l(j0);
);

*
* create sorted output
*
set r /rnk1*rnk1000/;
parameter rank(i);
alias (i,ii);
rank(i) = sum(ii$(efficiency(ii)>=efficiency(i)), 1);
parameter efficiency2(r,i);
efficiency2(r,i)=efficiency(i)$(rank(i)=ord(r));
option efficiency2:4:0:1;
display efficiency2;

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

11

4. Numerical experiments
The best balance between size of a batch and the number of batches need to be
determined by experimenting. Some of the state-of-the-art LP solvers are really
good now in solving LP models quickly. This means that it is often advantageous
to make the batches rather large.
runs user time system time total
1
0.083
0.052 0.135
2
0.117
0.062 0.179
3
0.136
0.097 0.233
4
0.171
0.091 0.262
5
0.181
0.125 0.306
7
0.210
0.167 0.377
10
0.320
0.224 0.544
20
0.527
0.412 0.939
Table 1. Performance results

all combined
10 + 10
7+7+6
5+5+5+5
4+4+4+4+4
3+3+3+3+3+3+2
2 each
all individual
for dea4.gms

The above model is very small, so when we tried actual runs, the fastest strategy
was to combine all models in a single run. The timings are on a 1Ghz dual pentium
machine running Linux and were obtained using the time utility of the c-shell. For
this small example we see that combining the 20 models into one run gives us a
speed-up of almost a factor 10.

runs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

time
time
user system total runs
user system total
7.607
0.361 7.968
16
5.107
0.851
5.958
6.037
0.408 6.445
17
5.142
0.814
5.956
5.777
0.369 6.146
18
5.113
0.875
5.988
5.523
0.396 5.919
19
5.146
0.894
6.04
5.421
0.423 5.844
20
5.177
0.898
6.075
5.392
0.482 5.874
21
5.218
0.955
6.173
5.388
0.552
5.94
22
5.222
1.013
6.235
5.443
0.523 5.966
23
5.275
0.966
6.241
5.451
0.574 6.025
24
5.302
1.019
6.321
5.464
0.636
6.1
25
5.271
1.066
6.337
5.492
0.589 6.081
30
5.455
1.242
6.697
5.341
0.66 6.001
40
6.005
1.503
7.508
5.248
0.728 5.976
50
6.48
1.822
8.302
5.175
0.769 5.944
100
9.136
3.501 12.637
5.181
0.765 5.946
200 15.125
6.589 21.714
Table 2. Performance results for 200 DMU model

For a larger proprietary model we used the following GAMS code:


$set n 10
set run batch run /run1*run%n%/;
set dist(run,i);
set current(run);

12

ERWIN KALVELAGEN

current(run1) = yes;
loop(i,
dist(current,i) = yes;
current(run++1) = current(run);
);
display dist;

Given a value for the environment variable n (the number of batch runs), this
fragment will distribute the subproblems i over the runs. We can set n to any
number. To perform the timing we used a model with 200 DMUs, and varied
n between 1 and 200. Running the model in one run resulted in an LP with
40401 equations and 1601 variables. Each individual model is: 201 equations and
7 variables. The performance results are shown in table 4. Here we see that there
is a wide range of relative efficient combinations. Combining all models into one is
not the best approach here.
5. Examples
5.1. Dual formulation. In this example we show how the dual formulations of
the Constant Returns to Scale CCR model (equation 4) and the Variable Returns
to Scale BCC model (equation 5) can be solved as one big LP model instead of a
series of small models.
We use the data set from [11].
Model bundesliga.gms.

$ontext
DEA models:
input and output oriented
constant returns to scale (CCR) and variable returns to scale (BCC)
Instead of a loop batch equations together to forma single large LP.
Erwin Kalvelagen jan 2005
Reference:
Dieter Haas, Martin G. Kocher and Matthias Sutter,
"Measuring Efficiency of German Football Teams by Data Envelopment Analysis",
University of Innsbruck, 12 may 2003
$offtext

set i teams /
Bayern M
unchen
Bayer Leverkusen
Hamburger SV
1860 M
unchen
1. FC Kaiserslautern
Hertha BSC
Vfl Wolfsburg
Vfb Stuttgart
Werder Bremen
SpVgg Unterhaching
Borussia Dortmund
SC Freiburg
FC Schalke
Eintracht Frankfurt
Hansa Rostock
SSV Ulm
Arminia Bielefeld
5http://amsterdamoptimization.com/models/dea/bundesliga.gms

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

13

MSV Duisburg
/;
set j data
rank
wagep
wagec
points
spect
fill
rev
CL
UC

keys /
ranking at end of season 1999/2000
avg wage for players (annual, million dm)
wage for coach (monthly, 1000 dm)
points determining ranking
spectators (1000)
stadium utilization (%)
total revenue (million DM)
participation in Champions League
participation in UEFA Cup

/;
table data(i,j)
rank
Bayern M
unchen
1
Bayer Leverkusen
2
Hamburger SV
3
1860 M
unchen
4
1. FC Kaiserslautern 5
Hertha BSC
6
Vfl Wolfsburg
7
Vfb Stuttgart
8
Werder Bremen
9
SpVgg Unterhaching
10
Borussia Dortmund
11
SC Freiburg
12
FC Schalke
13
Eintracht Frankfurt 14
Hansa Rostock
15
SSV Ulm
16
Arminia Bielefeld
17
MSV Duisburg
18

wagep
63.0
30.5
31.0
30.0
31.0
32.5
19.0
20.5
20.0
12.0
60.0
9.5
40.0
20.0
14.0
8.0
16.0
11.5

wagec points
300
73
180
73
125
59
160
53
200
50
100
50
80
49
100
48
30
47
30
44
100
40
50
40
70
39
80
39
35
38
22
35
50
30
42
22

spect
894
382
703
555
684
809
292
500
507
163
1099
420
689
605
275
371
335
257

fill
83.5
89.7
76.6
51.8
96.9
62.8
83.5
65.3
84.5
76.6
93.7
98.8
65.4
58.3
66.0
97.0
74.4
50.1

rev
220
85
61
42
75
42
40
52
63
14
150
31
64
40
32
26
32
28

CL
1
1

1
1
1
1

display data;
set inp(j) inputs
/wagep,wagec/;
set outp(j) outputs /points,fill,rev/;
parameter x(inp,i); x(inp,i) = data(i,inp);
parameter y(outp,i); y(outp,i) = data(i,outp);
alias(i,i0);
positive variables lambda(i0,i);

efficiency for i0-th DMU


sum of efficiencies

*---------------------------------------------------------------------------* input oriented versions of constant returns to scale (CCR) and


* variable returns to scale (BCC) models
*----------------------------------------------------------------------------

equations
objective
input1(i0,outp)
input2(i0,inp)
convex(i0)
;
objective..

z =e= sum(i0, theta(i0));

input1(i0,outp).. sum(i,lambda(i0,i)*y(outp,i)) =g= y(outp,i0);

variables
theta(i0)
z
;

UC

14

ERWIN KALVELAGEN

input2(i0,inp)..

theta(i0)*x(inp,i0) =g= sum(i, lambda(i0,i)*x(inp,i));

convex(i0)..

sum(i, lambda(i0,i)) =e= 1;

model input_ccr /objective,input1,input2/;


model input_bcc /objective,input1,input2,convex/;
parameter results(i0,*,*);
solve input_ccr using lp minimizing z;
results(i0,input,CRS/CCR) = theta.l(i0);
solve input_bcc using lp minimizing z;
results(i0,input,VRS/BCC) = theta.l(i0);
*---------------------------------------------------------------------------* output oriented versions of constant returns to scale (CCR) and
* variable returns to scale (BCC) models
*---------------------------------------------------------------------------equations
output1(i0,inp)
output2(i0,outp)
;
output1(i0,inp)..

sum(i,lambda(i0,i)*x(inp,i)) =l= x(inp,i0);

output2(i0,outp)..

theta(i0)*y(outp,i0) =l= sum(i, lambda(i0,i)*y(outp,i));

model output_ccr /objective,output1,output2/;


model output_bcc /objective,output1,output2,convex/;

solve output_ccr using lp maximizing z;


results(i0,output,CRS/CCR) = theta.l(i0);
solve output_bcc using lp maximizing z;
results(i0,output,VRS/BCC) = theta.l(i0);

option results:4:1:2;
display results;

In the example both input oriented and output oriented efficiency scores are
calculated and presented in a results parameter:
----

149 PARAMETER results

Bayern M
unchen
Bayer Leverkusen
Hamburger SV
1860 M
unchen
1. FC Kaiserslautern
Hertha BSC
Vfl Wolfsburg
Vfb Stuttgart
Werder Bremen
SpVgg Unterhaching
Borussia Dortmund
SC Freiburg
FC Schalke
Eintracht Frankfurt
Hansa Rostock
SSV Ulm
Arminia Bielefeld
MSV Duisburg

input
CRS/CCR

input
VRS/BCC

output
CRS/CCR

output
VRS/BCC

1.0000
0.8288
0.5897
0.4282
0.7098
0.3934
0.6423
0.7578
1.0000
0.9219
0.7893
1.0000
0.5037
0.5997
0.7073
1.0000
0.6054
0.7242

1.0000
1.0000
0.7968
0.5918
1.0000
0.5545
0.8394
0.8107
1.0000
1.0000
1.0000
1.0000
0.5039
0.6003
0.7443
1.0000
0.6069
0.7450

1.0000
1.2065
1.6956
2.3354
1.4089
2.5419
1.5568
1.3196
1.0000
1.0847
1.2670
1.0000
1.9851
1.6674
1.4139
1.0000
1.6518
1.3809

1.0000
1.0000
1.0757
1.3119
1.0000
1.1827
1.0665
1.1527
1.0000
1.0000
1.0000
1.0000
1.3067
1.3698
1.1465
1.0000
1.3136
1.3695

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

15

5.2. Bootstrapping. Bootstrapping[6, 16] is used to provide additional information for statistical inference. The following model from [19] implements a resampling
strategy from [15]. Two thousand bootstrap samples are formed, each resulting in
a DEA model of 100 small LPs. In this example we batch the DEA models together in a single large LP, so that we only have to solve 2,000 LP models instead
of 200,000.
Model bootstrap.gms.

$ontext
DEA bootstrapping example
Erwin Kalvelagen, october 2004
References:
Mei Xue, Patrick T. Harker
"Overcoming the Inherent Dependency of DEA Efficiency Scores:
A Bootstrap Approach", Tech. Report, Department of Operations and
Information Management, The Wharton School, University of Pennsylvania,
April 1999
http://opim.wharton.upenn.edu/~harker/DEAboot.pdf
$offtext

sets
i hospital (DMU) /h1*h100/
j inputs and outputs /
FTE
The number of full time employees in the hospital in FY 1994-95
Costs
The expenses of the hospital ($million) in FY 1994-95
PTDAYS The number of the patient days produced by the hospital in FY 1994-95
DISCH
The number of patient discharges produced by the hospital in FY 1994-95
BEDS
The number of patient beds in the hospital in FY 1994-95
FORPROF Dummy variable, one if it is for-profit hospital, zero otherwise
TEACH
Dummy variable, one if it is teaching hospital, zero otherwise
RES
The number of the residents in the hospital in FY 1994-95
CONST
Constant term in regression model
/
inp(j) inputs /FTE,Costs/
outp(j) outputs /PTDAYS,DISCH/
;
table data(i,j)
FTE
h1
h2
h3
h4
h5
h6
h7
h8
h9
h10
h11
h12
h13
h14
h15
h16
h17
h18
h19
6

1571.86
816.54
533.74
805.2
3908.1
727.72
2571.75
521
718
1504.85
1234.49
873
1067.17
668
452.35
1523
3152
871.96
2901.86

Costs
174
69.9
61.7
75.4
396
63.9
220
89.1
50
121
84.6
68.8
85.8
47.5
36.4
97.4
198
30.7
290

PTDAYS

DISCH

BEDS FORPROF TEACH

71986
53081
25030
34163
187462
31330
130077
43390
27896
75941
57080
48932
50436
67909
25200
59809
108631
17925
130004

12665
5861
4951
11877
42735
8402
26877
8598
6113
16427
14180
12060
11317
6235
6860
13180
22071
4605
24133

365
224
286
256
829
194
620
290
150
393
317
281
278
244
155
394
578
160
549

RES

1
1

136.8

42.81

23.21

13.31

195.67

126.89

http://amsterdamoptimization.com/models/dea/bootstrap.gms

16

ERWIN KALVELAGEN

h20
h21
h22
h23
h24
h25
h26
h27
h28
h29
h30
h31
h32
h33
h34
h35
h36
h37
h38
h39
h40
h41
h42
h43
h44
h45
h46
h47
h48
h49
h50
h51
h52
h53
h54
h55
h56
h57
h58
h59
h60
h61
h62
h63
h64
h65
h66
h67
h68
h69
h70
h71
h72
h73
h74
h75
h76
h77
h78
h79
h80
h81
h82
h83
h84
h85
h86
h87
h88
h89
h90
h91

902.4
194.69
713.51
557.36
2259.2
462.22
1212.1
2391.94
1637
501
412.1
738.56
414.1
1097
742
1010
440.6
1203.3
2558.01
215.45
599.3
480.55
634.51
1211.9
285.5
1030.36
1374.81
953.56
561.11
644
376.55
404.79
397.9
374.2
1702
148.09
253.48
1445.68
414.1
642.58
203.75
421.8
320.62
679.79
2382
559.29
568.15
2408.04
632.34
917.22
554.34
780
663.82
1424
313
778
863.37
3509.12
1593.82
466
666.38
998.8
1018
3238.28
1431.1
1735.99
1769
484.56
204.7
1706.58
1029.11
1167.2

78.2
10.9
62.6
23.8
120
32.4
97.3
192
162
37.9
40.2
27
35.7
105
62.8
97.1
34.2
85.4
195
8.409936
30.4
29.5
29.9
91.4
23.9
67.1
95.5
49.8
41.7
57.1
19.6
32.8
29.4
3.944649
100
5.013379
16.9
99.3
26.5
48.5
13
18.3
17.3
25.6
226
58.1
35
155
54.6
55.2
56.9
75.9
56.9
146
20.7
78.4
62
290
152
40.1
48.2
121
98.2
326
107
273
190
36.2
13.9
287
71.9
111

35743
15555
32558
12728
74061
28886
74194
89843
80468
26813
23217
11514
55611
59443
42542
47246
30773
50710
128450
65743
23299
34279
27157
90008
16473
43486
74279
47934
24800
39663
22003
27566
26072
4179
114603
51660
17599
81041
20432
42733
16923
16179
18882
27561
166559
40534
37120
70392
37228
42135
32352
39213
34180
107457
20110
51496
50957
109673
82400
30647
28048
45513
61176
122118
48900
84118
105741
24070
28137
75153
49993
75004

8664
1530
8966
2291
12942
6101
12681
18396
21345
4594
6044
3052
4354
13101
8739
12073
4305
11470
20441
578
5338
6560
5198
17666
2873
9467
11862
10553
5498
8604
4759
7871
4248
819
17235
771
4044
12912
4068
5983
3467
2840
3370
4447
26019
8806
7242
9538
6359
7294
3320
7154
5284
18198
5967
12302
10557
19213
17707
7265
5182
6855
11386
19068
10623
16458
19256
6464
1615
13465
6690
21334

236
132
138
276
348
134
342
336
415
166
160
144
200
242
172
269
201
247
571
238
173
169
141
320
135
235
284
207
132
260
143
190
170
156
438
172
178
475
129
181
146
160
160
308
787
342
158
266
175
215
205
172
200
432
165
390
228
469
474
164
153
238
350
514
208
278
478
125
135
367
252
350

12.08

14.52

229.19

26.32

1.1

1
1

13.82
5.42

6.25

6.44

11.81

17.53

1
1

11.33
7.08

111.33

2.75

1
1

290.53
11.64

88.86

146.33

1
1

158.4
0.93

1
1

91.56
4

1
1
1

1
1

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

h92
h93
h94
h95
h96
h97
h98
h99
h100

1657.58
1017.16
1532.7
1462
1133.8
609
301.31
1930.08
1573.3

116
88.5
153
113
109
48.2
20.2
201
177

77753
64147
99998
119107
55540
71817
43214
87197
88124

17528
11135
17391
16053
15566
5639
2153
19315
19661

413
316
395
484
355
376
141
418
458

1
1
1
1

4.8
0.5
8.51
1

69.71

;
data(i,CONST) = 1;

*------------------------------------------------------------------------------* PHASE 1: Estimation of b(j)


*
* Run standard Constant Returns to Scale (CCR) Input-oriented DEA model
* followed by linear regression OLS estimation
*-------------------------------------------------------------------------------

*
* this is the standard DEA model
* instead of 100 small models we solve one big model, see
* http://www.gams.com/~erwin/dea/dea.pdf
*
parameter
x(inp,i) inputs of DMU i
y(outp,i) outputs of DMU i
;
alias(i,j0);
positive variables
v(inp,j0)
input weights
u(outp,j0)
output weights
;
variable
eff(j0) efficiency
z objective variable
;
equations
objective(j0)
normalize(j0)
limit(i,j0)
totalobj
;
totalobj..
objective(j0)..
normalize(j0)..
limit(i,j0)..

objective function: maximize efficiency


normalize input weights
"limit other DMUs efficiency"

z =e= sum(j0, eff(j0));


eff(j0) =e= sum(outp, u(outp,j0)*y(outp,j0));
sum(inp, v(inp,j0)*x(inp,j0)) =e= 1;
sum(outp, u(outp,j0)*y(outp,i)) =l= sum(inp, v(inp,j0)*x(inp,i));

model dea /totalobj,objective, normalize, limit/;


alias (i,iter);
x(inp,i) = data(i,inp);
y(outp,i) = data(i,outp);
option limrow=0;
option limcol=0;
dea.solprint=2;
dea.solvelink=2;
solve dea using lp maximizing z;
abort$(dea.modelstat<>1) "LP was not optimal";
display
"------------------------------------ DEA MODEL ------------------------",
eff.l;

17

18

*
*
*
*
*
*
*

ERWIN KALVELAGEN

now solve the regression problem


efficiency = b0 + b1*BEDS + b2*FORPROF + b3*TEACH + b4*RES
Use b = inv(X^TX) X^Ty
Standard errors are sigma^2 inv(X^TX)
See http://www.gams.com/~erwin/stats/ols.pdf

set e(j) explanatory variables /BEDS,FORPROF,TEACH,RES,CONST/;


*
* calculate inv(X^TX)
*
alias(e,ee,eee);
parameter XX(e,ee) matrix (X^TX);
XX(e,ee) = sum(i,data(i,e)*data(i,ee));
parameter Xy(e) X^Ty;
Xy(e) = sum(i, data(i,e)*eff.l(i));
parameter ident(e,ee) Identity matrix;
ident(e,e)=1;
variable
invXX(e,ee) matrix inv(X^TX)
dummy
;
equation
invert(e,ee)
edummy
;
invert(e,ee).. sum(eee, XX(e,eee)*invXX(eee,ee)) =e= ident(e,ee);
edummy.. dummy=e=0;
model matinv /invert,edummy/;
matinv.solprint=2;
matinv.solvelink=2;
solve matinv using lp minimizing dummy;
*
* calculate estimates and standard errors
*
parameter b(e);
b(e) = sum(ee, invXX.l(e,ee)*Xy(ee));
parameter resid(i) residuals;
resid(i) = eff.l(i) - sum(e,b(e)*data(i,e));
scalar rss residual sum of squares;
rss = sum(i, sqr(resid(i)));

*
* calculate standard errors
*
scalar df degrees of freedom;
df = card(i)-card(e);
scalar sigma_squared variance of estimate;
sigma_squared = rss/df;
parameter variance(e,ee);
variance(e,ee) = sigma_squared*invXX.l(e,ee);
parameter se(e) standard error;
se(e) = sqrt(variance(e,e));
parameter tval(e) "t statistic";
tval(e) = b(e)/se(e);
parameter pval(e) "p-values";
*
*

pvalue = 2 * pt( abs(tvalue), df)

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

*
= 2 * 0.5 * pbeta( df / (df + sqr(abs(tvalue))), df/2, 0.5)
*
= betareg( df / (df+sqr(tvalue)), df/2, 0.5)
*
pval(e) = betareg( df / (df+sqr(tval(e))), df/2, 0.5);
parameter ols(e,*);
ols(e,estimates) = b(e);
ols(e,std.error) = se(e);
ols(e,t value) = tval(e);
ols(e,p value) = pval(e);

display
"------------------------------------ OLS MODEL ------------------------",
ols;

*------------------------------------------------------------------------------* PHASE 2: BOOTSTRAP algorithm


*------------------------------------------------------------------------------set s sample /sample1*sample2000/;

parameter bs(s,i) bootstrap sample;


bs(s,i) = trunc( uniform(1,card(i)+0.999999999) );
*display bs;
* sanity check:
loop((s,i),
abort$(bs(s,i)<1) "Check bs for entries < 1";
abort$(bs(s,i)>card(i)) "Check bs for entries > card(i)";
);
alias(i,ii);
set mapbs(s,i,ii);
mapbs(s,i,ii)$(bs(s,i) = ord(ii)) = yes;
* this mapping says the ith sample data record is the iith record
* in the original data (for sample s)
loop((s,i),
abort$(sum(mapbs(s,i,ii),1)<>1) "mapbs is not unique";
);
parameter data_sample(i,j);
parameter sb(s,e) b(e) for each sample s;
loop(s,
*
* solve dea model
*
data_sample(i,j) = sum(mapbs(s,i,ii),data(ii,j));
x(inp,i) = data_sample(i,inp);
y(outp,i) = data_sample(i,outp);
solve dea using lp maximizing z;
abort$(dea.modelstat<>1) "LP was not optimal";
*
* solve OLS model
*
XX(e,ee) = sum(i,data_sample(i,e)*data_sample(i,ee));
Xy(e) = sum(i, data_sample(i,e)*eff.l(i));
solve matinv using lp minimizing dummy;
sb(s,e) = sum(ee, invXX.l(e,ee)*Xy(ee));

);

19

20

ERWIN KALVELAGEN

*
* get statistics
*
parameter bbar(e) "Averaged estimates";
bbar(e) = sum(s, sb(s,e)) / card(s);
parameter sehat(e) "Standard errors of bootstrap algorithm";
sehat(e) = sqrt(sum(s, sqr(sb(s,e)-bbar(e)))/(card(s)-1));
parameter tbootstrap(e) "t statistic for bootstrap";
tbootstrap(e) = b(e)/sehat(e);
scalar dfbootstrap degrees of freedom;
dfbootstrap = card(i) - (card(e) - 1) - 1;
parameter pbootstrap(e) "p-values for bootstrap";
*
* pvalue = 2 * pt( abs(tvalue), df)
*
= 2 * 0.5 * pbeta( df / (df + sqr(abs(tvalue))), df/2, 0.5)
*
= betareg( df / (df+sqr(tvalue)), df/2, 0.5)
*
pbootstrap(e) = betareg( dfbootstrap / (dfbootstrap+sqr(tbootstrap(e))), dfbootstrap/2, 0.5);
parameter bootstrap(e,*);
bootstrap(e,estimates) = b(e);
bootstrap(e,std.error) = sehat(e);
bootstrap(e,t value) = tbootstrap(e);
bootstrap(e,p value) = pbootstrap(e);

display
"------------------------------------ BOOTSTRAP MODEL ------------------------",
bootstrap;

The idea of this model is to build a regression equation:


(9)

i = 0 + 1 BEDSi + 2 FORPROFi + 3 TEACHi + 4 RESi + i

where i are the DEA efficiency scores. From the results


----

290 ------------------------------------ OLS MODEL ------------------------

----

290 PARAMETER ols


estimates

std.error

BEDS
1.040019E-4 1.244050E-4
FORPROF
0.099
0.042
TEACH
-0.057
0.039
RES
-0.001 3.303407E-4
CONST
0.607
0.035

t value

p value

0.836
0.405
2.390
0.019
-1.451
0.150
-3.133
0.002
17.330 3.59753E-31

we see that FORPROF is significant at = 0.05 (the corresponding p value is


smaller than 0.05). However when we apply the resampling technique from the
bootstrap algorithm, the results indicate a different interpretation:
----

380 ------------------------------------ BOOTSTRAP MODEL ------------------------

----

380 PARAMETER bootstrap


estimates

std.error

BEDS
1.040019E-4 1.107967E-4
FORPROF
0.099
0.060
TEACH
-0.057
0.036
RES
-0.001 2.442416E-4
CONST
0.607
0.042

t value

p value

0.939
0.350
1.651
0.102
-1.584
0.116
-4.237 5.234667E-5
14.417 1.18732E-25

EFFICIENTLY SOLVING DEA MODELS WITH GAMS

21

default
solvelink=2
real
27m12.745s real
14m29.518s
user
20m58.595s user
12m58.734s
sys
5m30.054s
sys
1m3.559s
Table 3. Solvelink results

Here the p-value for FORPROF is indicating this parameter is not significant at the
0.05 level. The p-values are calculated using the incomplete beta function which is
available as BetaReg() in GAMS[12].
It is noted that the option m.solvelink=2; is quite effective for this model.
Timings that illustrate this are reported in table 3.
A further small performance improvement can be achieved to augment the model
equations for the DEA model by the equations that calculate (X T X)1 . This will
combine the DEA and OLS model into one model. After this has been done there
is only one solve for each bootstrap sample.
6. Other DEA sources
We want to mention the work of [8] and [9] for large DEA models in conjunction
with GAMS. The software is described on the web page http://www.gams.com/
contrib/gamsdea/dea.htm [10].
Some earlier DEA modeling work with GAMS is documented in [14, 18].
References
1. R. D. Banker, A. Charnes, and W. W. Cooper, Some models for estimating technical and scale
efficiencies in data envelopment analysis, Management Science 30 (1984), no. 9, 10781092.
2. William F. Bowlin, Measuring performance: An introduction to data envelopment analysis
(dea), Tech. report, Department of Accounting, University of Northern Iowa, Cedar Falls, IA,
1998.
3. A. Charnes, W. W. Cooper, and E. Rhodes, Measuring the efficiency of decision making
units, European Journal of Operational Research 2 (1978), 429444.
4. A. Charnes, W. W. Cooper, and E. Rhodes, Evaluating program and managerial efficiency:
An application of data envelopment analysis to program follow through, Management Science
27 (1981), 668697.
5. Laurens Cherchye, Timo Kuosmanen, and Thierry Post, New tools for dealing with errors-invariables in DEA, Tech. report, Catholic University of Leuven, 2000.
6. Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall,
1993.
7. Ali Emrouznejad, Dea homepage, http://www.deazone.com/, 2001.
8. Michael C. Ferris and Meta M. Voelker, Slice models in general purpose modeling systems,
Tech. report, Computer Sciences Department, University of Wisconsin, 2000.
9.
, Cross-validation, support vector machines and slice models, Tech. report, Computer
Sciences Department, University of Wisconsin, 2001.
10. GAMS Development Corporation, GAMS/DEA, http://www.gams.com/contrib/gamsdea/
dea.htm, 2001.
11. Dieter Haas, Martin G. Kocher, and Matthias Sutter, Measuring Efficiency of German Football Teams by Data Envelopment Analysis, Tech. report, University of Innsbruck, May 2003.
12. Erwin Kalvelagen, New special functions in GAMS, http://amsterdamoptimization.com/
pdf/specfun.pdf.
13.
, Model building with gams, to appear.
14. O.B. Olesen and N.C. Petersen, A presentation of GAMS for DEA, Computers & Operations
Research 23 (1996), no. 4, 323339.

22

ERWIN KALVELAGEN

15. Leopold Simar and Paul W. Wilson, Sensitivity Analysis of Efficiency Scores: How to Bootstrap in Nonparametric Frontier Models, Journal of Applied Statistics 44 (1998), no. 1, 4961.
16.
, A general methodology for bootstrapping in nonparametric frontier models, Journal
of Applied Statistics 27 (2000), 779802.
17. Boris Vuj
ci
c and Igor Jemri
c, Efficiency of banks in transition: A DEA approach, Tech.
report, Croatian National Bank, 2001.
18. John B. Walden and James E. Kirkley, Measuring Technical Efficiency and Capacity in Fisheries by Data Envelopment Analysis Using the General Algebraic Modeling System (GAMS):
A Workbook, NOAA Technical Memorandum NMFS-NE-160, National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Woods Hole Lab., 166 Water St.,
Woods Hole, MA 02543, 2001.
19. Mei Xue and Patrick T. Harker, Overcoming the Inherent Dependency of DEA Efficiency
Scores: A Bootstrap Approach, Tech. report, Department of Operations and Information
Management, The Wharton School, University of Pennsylvania, April 1999.
Amsterdam Optimization Modeling Group, Washington D.C./The Hague
E-mail address: [email protected]

You might also like