Stochastic Dynamic Programming

Kjetil K. Haugen
Molde University College
The Creative Commons License CC-BY 4.0 gives permission to copy, distribute and disseminate the work in any medium or format, and to freely adapt the material for any purpose, including commercial ones. The licensor cannot withdraw these freedoms as long as you respect the following license conditions: for such dissemination and adaptation, you must provide correct citations and a reference to the license, together with an indication of whether changes have been made. You can do this in any reasonable way, as long as it cannot be construed that the licensor endorses you or your use of the work. You may not in any way prevent others from actions allowed by the license.
The book is published with support from Molde University College, Specialized University in Logistics.
This book was my first serious academic project after finishing my PhD thesis (Haugen, 1991) back in 1991. The primary subject of the thesis was an application of stochastic dynamic programming to petroleum field scheduling for Norwegian oil fields. Soon after defending my thesis, I was contacted by a US publisher, asking whether I would like to write a chapter in a new OR (operations research) series of books. This chapter was to cover stochastic (or probabilistic) dynamic programming and around half of the planned volume, the other half covering deterministic dynamic programming. Being somewhat young, inexperienced and ambitious, I said yes to the job.
Later on, after finishing this work, it turned out that the book series was cancelled. Naturally, I was not happy about such a decision. However, the job I did back in 1991–1994 turned out to be of decent quality – even today.
This new version of the book covers most classical concepts of stochastic
dynamic programming, but is also updated on recent research. A certain
emphasis on computational aspects is evident.
The book discusses both classical probabilistic dynamic programming techniques as well as more modern subjects, including some of my own results from my PhD. As such, the book can perhaps be categorized as a classic monograph. As a consequence, some knowledge of probability calculus as well as optimization and economic theory is needed for the general reader. However, the book can (with some added material) be used as a textbook on the subject, but as mentioned previously, it is not written as one.
Kjetil K. Haugen
Trondheim, Molde
1991–1994, September 2015
Acknowledgements

Contents

List of Figures
List of Tables
1 Introduction
  1.1 An illustrative example
  1.2 Solving the example by decision trees
  1.3 Solving the example by SDP
3 SDP - Benefits
  3.1 SDP versus Decision trees
  3.2 Nondiscrete state space
  3.3 Nondiscrete action space
  3.4 Handling non linearities
  3.5 Analytic solutions
  3.6 Concluding remarks
4 SDP - difficulties
  4.1 Curse of dimensionality
  4.2 Problem structure
6 Recent research
  6.1 “Cures” for the curse of dimensionality
  6.2 Compression methods
  6.3 State space relaxation
  6.4 Aggregation methods
  6.5 Forecast horizon
  6.6 SDP and supercomputing
Bibliography
Index
List of Figures

1.1 Basic decision/chance node structure for the house selling example
1.2 Full decision tree for the house selling example
1.3 Upper branch of decision tree for the house selling example
1.4 Evaluating uncertain outcomes by expected values in a decision tree
List of Tables

1.1 Data for the house selling example. (All numbers in $1000.)
1.2 Solution for the house selling example.
1.3 Definition of the immediate return function R(i, a) for the house selling example.
1.4 V2(i) for the house selling example.
1.5 V1(i) for the house selling example.
1.6 V1(i) for the house selling example with alternative definition of Pij(a)
1.7 V1(i) for the house selling example with alternative definition of Pij(a)
2.1 V2(i) for the house selling example with utility function, u(ξ)
2.2 V1(i) for the house selling example with utility function, u(ξ)
3.1 Solution for the house selling example with price uniformly distributed.
3.2 Solution to the house selling example with quadratic utility function and uniform density.
3.3 p1, p2, ..., p10
5.1 Net profit in each time period for various payment and maintenance possibilities
5.2 Probabilities for High (H), Medium (M) or Low (L) payments in the next period, given observed state values and your decisions
5.3 Possible policies and associated net profits for the MDP-example
5.4 Stationary distributions for all possible policies
5.5 Expected per period net profits for all possible policies
5.6 Behaviour of the Method of successive approximations
5.7 Policy improvement step
Introduction
widely known. We also get a nice way of comparing the two methods.
Table 1.1: Data for the house selling example. (All numbers in $1000.)
Figure 1.1: Basic decision/chance node structure for the house selling example
Figure 1.2: Full decision tree for the house selling example
Figure 1.3: Upper branch of decision tree for the house selling example
several conditional solutions. That is, we make alternative plans for all possible futures, as opposed to the deterministic case, where we only get one plan.
We also observe another important fact from table 1.2. The optimal
strategy is different between the two time periods. We see that it differs
in the optimal decision given a medium price observation. The salesman
waits in period 1 while he sells in period 2. This is an important distinction
which is treated well in the literature of SDP. Especially if we look at infinite
horizon problems, the possibility of obtaining stationary policies will prove
to be interesting. A stationary policy is a solution which is unconditioned on
time but conditioned on state. We will return to these topics later.
Table 1.3: Definition of the immediate return function R(i, a) for the house
selling example.
The next step we performed in the solution process was to move to period 1. Again we maximized over all states, but now also included the expected value of waiting with the sales decision until period 2. If we look at the high price state, the actual computation we performed was:
Note that the i subscript only takes on the three stochastic values in
period 1 as the fourth alternative from period 2 – “sold earlier” is impossible.
If we call the value function in period 1 V1 (i), equation (1.4) becomes:
\[
V_1(i) = \max_a \left[ R(i,a) + \sum_i p_i V_2(i) \right] \tag{1.5}
\]
Comparing tables 1.4 and 1.5 with table 1.2, we observe that our latter approach produced the same answer as the decision tree approach.
If we look at equation (1.5) we see that we have identified a recursive
method of computing the value function at different periods of time in our
problem.
Ross (Ross, 1983) defines the optimality equation as follows:
\[
V_n(i) = \max_a \left[ R(i,a) + \sum_j P_{ij}(a) V_{n+1}(j) \right] \tag{1.6}
\]
To be formal, the term Pij (a) in equation (1.6) states that the stochastic
mechanism affecting our optimization problem is a family of discrete Markov
processes with transition matrices Pij (a). Returning to our initial example
we note that using this terminology, the price of our object can be described
as follows:
\[
P_{ij}(a) =
\begin{array}{c|ccc}
 & H & M & L \\ \hline
H & 0.25 & 0.55 & 0.20 \\
M & 0.25 & 0.55 & 0.20 \\
L & 0.25 & 0.55 & 0.20
\end{array}
\qquad \forall a \in \{\text{“sell”},\text{“wait”}\} \tag{1.7}
\]
This means that if we observe a high, medium or low price (H,M, L) in
period 1, then the probability of observing the same set of prices in period 2
is independent of the observation in period 1.
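The optimality equation (1.6) translates almost directly into code. The sketch below is a minimal illustration, assuming the data reconstructed from tables 1.1–1.5: net profits of 100, 50 and −50 (in $1000) for selling at a high, medium and low price, a waiting value of 0 in the last period, and the state-independent transition probabilities 0.25, 0.55 and 0.20 of equation (1.7).

```python
# A minimal sketch of the backward recursion (1.6) for the two-period
# house selling example, assuming the data reconstructed from tables 1.1-1.5.
R = {"H": 100, "M": 50, "L": -50}          # immediate return of "sell" ($1000)
p = {"H": 0.25, "M": 0.55, "L": 0.20}      # price distribution next period

# Period 2 (last period): sell if the profit is nonnegative, otherwise wait.
V2 = {i: max(R[i], 0.0) for i in R}

# Expected value of waiting in period 1; under (1.7) the transition
# probabilities do not depend on the current state.
EV2 = sum(p[j] * V2[j] for j in p)

# Period 1: the optimality equation V1(i) = max[sell now, wait].
V1, policy1 = {}, {}
for i in R:
    sell, wait = R[i], EV2
    V1[i] = max(sell, wait)
    policy1[i] = "sell" if sell >= wait else "wait"

print(V2)              # {'H': 100, 'M': 50, 'L': 0.0}
print(EV2)             # 52.5
print(V1, policy1)
```

Running this sketch reproduces the values discussed above: the expected value of waiting is 52.5, so in period 1 the house is sold only if a high price is observed.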
Assume alternatively that Pij (a) had the following structure:
\[
P_{ij}(a) =
\begin{array}{c|ccc}
 & H & M & L \\ \hline
H & 1-\alpha & \frac{11}{15}\alpha & \frac{4}{15}\alpha \\
M & 0.25 & 0.55 & 0.20 \\
L & \frac{5}{16}\beta & \frac{11}{16}\beta & 1-\beta
\end{array}
\qquad \forall a \in \{\text{“sell”},\text{“wait”}\},\ \alpha \in [0, 0.75],\ \beta \in [0, 0.8] \tag{1.8}
\]
How may equation (1.8) be interpreted? Suppose we suspect that the
price we get selling a house one year depends on the price we get the year
before. A simple – but somewhat sensible assumption – may be to say that
a high price one year would “lead” to a high price next year and vice versa.
Equation (1.8) reflects such a reasoning. Note that if we choose α = 0.75
and β = 0.8 we obtain equation (1.7). Choosing the parameters α and β
equal to zero produces the other extreme – high and low price as absorbing
states. Surely we could try to give precise estimates on α and β if we had
additional information but that may be difficult. One way to attack such a
problem may be to try to solve the problem parametrically. That is, solve it
for all possible values of the parameters. In this example, this turns out to
be very simple so we might as well do it.
The calculations which lead to table 1.4 do not change. However, a new version of table 1.5 is shown in table 1.6.
The V1(i) values in table 1.6 are obtained as follows:
\[
\max_{\alpha \in [0,\,0.75]} \left[ 100,\ 100 - \tfrac{190}{3}\alpha \right] = 100 \tag{1.9}
\]
\[
\max_{\beta \in [0,\,0.8]} \left[ -50,\ \tfrac{525}{8}\beta \right] = \tfrac{525}{8}\beta \tag{1.10}
\]
This might be a somewhat surprising result. The solution structure is unchanged. That is, the optimal conditional decisions in table 1.2 are the same. The only difference is the expected value of waiting in period 1 given a low price, which moves linearly from 0 to 52.5 over the permitted values of β.

Table 1.6: V1(i) for the house selling example with alternative definition of Pij(a)

                R(i,a) + Σ_i p_i V_2(i)
                i = High price     i = Medium price    i = Low price
  sell          100                50                  -50
  wait          100 - (190/3)α     52.5                (525/8)β
  V_1(i)        100                52.5                (525/8)β
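The parametric expressions in (1.9) and (1.10) can be checked numerically. The sketch below assumes, as above, that V2 = (100, 50, 0) for the three price states, and simply evaluates the wait values implied by the rows of matrix (1.8).

```python
# A small check of the parametric results (1.9)-(1.10), under the assumption
# that V2(H) = 100, V2(M) = 50, V2(L) = 0 as computed earlier.
import numpy as np

V2 = np.array([100.0, 50.0, 0.0])

def wait_value_high(alpha):
    # Row H of the transition matrix (1.8).
    row = np.array([1 - alpha, 11/15 * alpha, 4/15 * alpha])
    return row @ V2            # equals 100 - (190/3)*alpha

def wait_value_low(beta):
    # Row L of the transition matrix (1.8).
    row = np.array([5/16 * beta, 11/16 * beta, 1 - beta])
    return row @ V2            # equals (525/8)*beta

for alpha in (0.0, 0.25, 0.5, 0.75):
    assert abs(wait_value_high(alpha) - (100 - 190/3 * alpha)) < 1e-9
for beta in (0.0, 0.4, 0.8):
    assert abs(wait_value_low(beta) - 525/8 * beta) < 1e-9

# In state H selling (100) always beats waiting, since the wait value is
# 100 - (190/3)*alpha <= 100; in state L waiting gives (525/8)*beta >= -50.
```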
Surely this is not a general result. However, it stresses an important point. It is normally the leap from a deterministic model to a stochastic model that gives a dramatically different solution, not the actual stochastic mechanism applied.
Action dependence may be harder to imagine in our house selling example.
Obviously we might imagine situations where a decision on selling a house or
not may affect our predictions on the future price. However, as our alternative
decision is to sell today, this example becomes somewhat artificial. If we
alternatively look at the possible decisions we have modelled, it should be
obvious that if we do not sell the house today, there is a whole pile of actions
we can take to try to change the price we may get tomorrow. We can paint it,
advertise more or differently, hire someone to set the house on fire and so on.
Such situations may be compared to insurance. That is, we can – by actions
– change our perspective of the future but the future is still hard to model
without using probabilistic techniques. Weather phenomena fit nicely into this pattern. We can affect the probability of rain by locating our business in the Sahara or in Newfoundland, but some probability of rain still exists in the Sahara. We can guard ourselves against theft by engaging a guard bureau, buying a gun or moving away from New York, but still some probability of theft exists. Surely we can also use insurance – not to remove the probability of the event – but to vary the consequences of the event.
Let us make a small change in our example to exemplify this. Suppose
we now include the possibility of painting our house if we do not sell it.
Assume that Pij (a) is given as in equation (1.7) for a ∈ {“sell”, “wait”}
while equation (1.11) gives the transition matrix for the third alternative;
a =“wait and paint”.
\[
P_{ij}(a) =
\begin{array}{c|ccc}
 & H & M & L \\ \hline
H & 0.90 & 0.10 & 0.00 \\
M & 0.50 & 0.40 & 0.10 \\
L & 0.20 & 0.50 & 0.30
\end{array}
\tag{1.11}
\]
Table 1.7: V1(i) for the house selling example with alternative definition of Pij(a)

                   R(i,a) + Σ_i p_i V_2(i)
                   i = High price    i = Medium price       i = Low price
  sell             100               50                     -50
  wait             52.5              52.5                   52.5
  wait and paint   95 - c            70 - c                 45 - c
  V_1(i)           100               max[52.5, 70 - c]      52.5
The results from table 1.7 show that the maximum price the owner of
the house would be interested in paying for the painting operation is 17.5.
This number is obtained as follows: Given that c > 17.5, the solution is
the decision to wait at a certain stage, observing the price (state value) simultaneously, the state value (price) at the next stage was not determined with certainty.
The other implication of the stochastic assumption relates to the calculation of the recursive relationship. As pointed out earlier, given an uncertain transition from one stage to the other, we need to decide how to deal with uncertain outcomes. In our examples, we have used the expected value as a means of dealing with uncertainty. As discussed in subsection 1.2, one answer to this problem is utility theory.
Table 2.1: V2 (i) for the house selling example with utility function, u(ξ)
Table 2.2: V1 (i) for the house selling example with utility function, u(ξ)
                R(i,a) + Σ_i p_i V_2(i)
                i = High price     i = Medium price               i = Low price
  sell          u(100)             u(50)                          u(-50)
  wait          .25 + .55u(50)     .25 + .55u(50)                 .25 + .55u(50)
  V_1(i)        u(100)             max[u(50), .25 + .55u(50)]     .25 + .55u(50)
The numbers in table 2.2 may need some further explanation. The first
line gives the utility values of selling now (at stage 1) yielding u(100), u(50)
and u(−50). If we wait at stage 1 we have to calculate the expected utility
of that decision yielding;
u(100) > u(50) > u(0) ⇒ u(100) > .25u(100) + .55u(50) + .20u(0) (2.8)
and
Figure 2.1: Graph of utility function given indifference between risky and
certain decision
Chapter 3
SDP - Benefits
\[
n_{t+1} = 6\left(\frac{n_t}{2}\right) = 3 n_t \quad \text{or} \quad n_t = 6(3)^{t-1} \qquad \forall t \in \{0, 1, \ldots, T-1\} \tag{3.1}
\]
Substituting T = 14 into equation (3.1) yields 28,697,814.
If we compare this to the SDP approach, we observe that the calculations
at each stage would not grow exponentially as in the decision tree. We would
still be doing computations leading to tables close to table 1.5 – adding the
fourth i =“sold earlier” state. That is, from a computational point of view,
we may save a lot of work by applying SDP as opposed to decision trees.
Surely, this effect is not due to the stochasticity of the problem. An excellent example comparing the complexity of a deterministic decision tree and the corresponding deterministic DP may be found in Wallace and Kall (Kall and Wallace, 1994).
Hence, one important difference between SDP and decision trees is a
computational superiority in favour of SDP.
x=p−C (3.2)
where C is the deterministic cost associated with the selling decision from table 1.1. Consequently, x is uniform, x ∼ U[−50, 100], and it gives the net profit obtained by performing a sale decision. The density of x becomes
\[
f(x) = \begin{cases} \frac{1}{150} & x \in [-50, 100] \\ 0 & \text{otherwise} \end{cases} \tag{3.3}
\]
Let us resolve the example under the new assumption. In period 2, we will sell if the observed x is positive. Alternatively, we wait or do nothing, as this is the last period. Mathematically:
\[
V_2(x) = \begin{cases} x & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.4}
\]
Note that the value function V2 is a continuous function of the continuous
state variable x. Continuing to period 1, we find V1(x) by the recursion
\[
V_1(x) = \max\left[ x,\ E\{V_2(x)\} \right] \tag{3.5}
\]
(Formally, equation (3.5) is slightly imprecise. This is due to the fact that the x is not the same: the first x is the outcome of the stochastic variable in period 1, while the x in E{V2(x)} is the outcome in period 2. However, as we compute the expectation, the x from period 2 vanishes, yielding no further notational problems.)
The expectation in equation (3.5) is calculated as
\[
E\{V_2(x)\} = \int_{-50}^{0} 0 \cdot \frac{1}{150}\,dx + \int_{0}^{100} x \cdot \frac{1}{150}\,dx = 33\tfrac{1}{3} \tag{3.6}
\]
giving
\[
V_1(x) = \begin{cases} x & x \ge 33\tfrac{1}{3} \\ 33\tfrac{1}{3} & \text{otherwise} \end{cases} \tag{3.7}
\]
Hence, in period 1 the house is sold if the observed x is larger than 33 1/3; otherwise we wait. To show the structure, we may construct a table like table 1.2.
Table 3.1: Solution for the house selling example with price uniformly distributed.
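As a rough numerical check of (3.6)–(3.7), the reservation value 33 1/3 can also be estimated by simulation. The sketch below assumes x ∼ U[−50, 100] as in the example.

```python
# A sketch checking the continuous-state solution (3.6)-(3.7): with
# x ~ U[-50, 100], the value of waiting is E{V2(x)} = E{max(x, 0)} = 33 1/3,
# so in period 1 we sell only if the observed x exceeds this threshold.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-50.0, 100.0, size=1_000_000)

wait_value = np.maximum(x, 0.0).mean()      # Monte Carlo estimate of E{V2(x)}
print(wait_value)                           # close to 33.333...

def V1(x_obs, threshold=100.0**2 / (2 * 150.0)):   # 100^2/300 = 33.33...
    # Sell in period 1 if x_obs exceeds the expected value of waiting.
    return x_obs if x_obs >= threshold else threshold

print(V1(50.0), V1(10.0))    # 50.0 (sell now), 33.33... (wait)
```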
the total area of the land is 1 unit of something and that we want to find an
optimal sales strategy over a two period horizon. Hence, we need decision
variables in each of the two periods. Let us define αt as the proportion of
the remaining land we sell in period t – αt ∈ [0, 1]. We keep the assumption
on p and x; x ∼ U[−50, 100].
Let us solve this example using SDP. In period 2 we need to have information on how much land we have available for sale in this period. Surely this is determined by the decision we make in period 1. Additionally (as in our earlier examples), we need information on the outcome of the stochastic mechanism. Hence, an SDP formulation will contain a two-dimensional state space in period 2. The value function in period 2 is:
The term (1 − α1) in equation (3.8) computes the available area for sale in period 2. As α1 ∈ [0, 1], (1 − α1) is nonnegative. Therefore, we will sell all of our remaining land if x > 0 – (α2 = 1). If, on the other hand, x < 0, we sell nothing – (α2 = 0). Then, V2(x, α1) may be written:
\[
V_2(x, \alpha_1) = \begin{cases} x(1-\alpha_1) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.9}
\]
The optimization problem for period 1 is formulated as:
\[
E\{V_2(x, \alpha_1)\} = 33\tfrac{1}{3}(1 - \alpha_1) \tag{3.11}
\]
and V1 (x) as
\[
V_1(x) = \max_{0 \le \alpha_1 \le 1} \left[ \alpha_1\left(x - 33\tfrac{1}{3}\right) + 33\tfrac{1}{3} \right] \tag{3.12}
\]
Solving the optimization problem (3.12) is straightforward. If x < 33 1/3, α1 is multiplied by a negative number. Therefore, the maximal V1(x | x < 33 1/3) is obtained by minimizing α1. Alternatively, if x ≥ 33 1/3, the optimal α1 is 1.
Consequently, V1 (x) becomes
\[
V_1(x) = \begin{cases} x & x \ge 33\tfrac{1}{3} \\ 33\tfrac{1}{3} & \text{otherwise} \end{cases} \tag{3.13}
\]
As in deterministic DP we need to “roll back” over the deterministic state variable (α1 in this case) in order to find a complete solution. The case x ≥ 33 1/3 yields selling the whole area in period 1, and nothing happens in period 2. On the other hand, if x < 33 1/3 we sell nothing in period 1, α1 = 0 and
\[
V_2(x) = \begin{cases} x & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.14}
\]
If we compare the solution of this example – equations (3.13) and (3.14) –
to the example in section 3.2 – equations (3.7) and (3.4), we observe that the
solutions are identical. This is of course due to the fact that the optimization
problems (3.8) and (3.12) are parametrical linear programming problems. We
know from LP theory that we always obtain corner solutions. In our case we
have two corner solutions, α = 1 or α = 0.
Actually, this is a quite general result. As an outline of a proof is instructive for later purposes, we will carry it through. Let us make some simple
generalizations. Assume that we look at our problem in a time frame of N
periods. Assume also that we generalize our uniform distribution for x to a
general distribution f [a, b] where a < 0. Let us start the backward recursion
in period N. Our optimization problem becomes:
\[
\max_{0 \le \alpha_N \le 1} \left[ x \left( \prod_{i=1}^{N-1} (1 - \alpha_i) \right) \alpha_N \right] \tag{3.15}
\]
The term \(\prod_{i=1}^{N-1}(1-\alpha_i)\) is merely the remaining area for sale in period N. The solution to (3.15) is straightforward, giving
\[
V_N(x, \alpha_1, \ldots, \alpha_{N-1}) = \begin{cases} x \prod_{i=1}^{N-1}(1-\alpha_i) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.16}
\]
If we move to stage N − 1 we obtain the following optimization problem;
\[
\max_{0 \le \alpha_{N-1} \le 1} \left[ x \left( \prod_{i=1}^{N-2}(1-\alpha_i) \right) \alpha_{N-1} + \left( \prod_{i=1}^{N-2}(1-\alpha_i) \right) (1 - \alpha_{N-1}) \int_0^b x f(x)\,dx \right] \tag{3.17}
\]
The product term \(\prod_{i=1}^{N-2}(1-\alpha_i)\) is a common factor and a constant in the optimization, and may hence be removed. Let us also define:
\[
\theta_{N-1} = \int_0^b x f(x)\,dx \tag{3.18}
\]
\[
V_{N-1}(x, \alpha_1, \ldots, \alpha_{N-2}) = \begin{cases} x \prod_{i=1}^{N-2}(1-\alpha_i) & x \ge \theta_{N-1} \\ \theta_{N-1} \prod_{i=1}^{N-2}(1-\alpha_i) & \text{otherwise} \end{cases} \tag{3.20}
\]
or at a general stage N − j,
Note that we have introduced the variable y for the stochastic variable x
in period 2 in order to avoid confusion with x in period 1.
In order to keep the mathematics at a reasonable level we introduce a
family of utility functions at this point. Assume that u(w) may be expressed
as follows:
\[
u(w) = Aw^2 + Bw \tag{3.27}
\]
Finally, we want u′(w) positive and u′′(w) negative. u′′(w) < 0 implies A < 0. Utilizing the fact that the maximal value of w is 100, we obtain an upper limit on B from equation (3.30). Hence we may choose B in the interval (0.01, 0.02). Figure 3.1 shows u(w) for some values of B. Note that the degree of risk aversion is increasing with increasing B in figure 3.1.
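As an illustration, the family (3.27) can be evaluated numerically. The extract does not state A explicitly; the sketch below assumes the normalization u(100) = 1, i.e. A = 0.0001 − 0.01B, which is consistent with the definition of H(B) in (3.37) and with the admissible interval for B.

```python
# A sketch of the quadratic utility family u(w) = A*w^2 + B*w of (3.27).
# A is not given explicitly in this extract; the values below assume the
# normalization u(100) = 1, i.e. A = 0.0001 - 0.01*B, which matches the
# definition of H(B) in (3.37) and the admissible interval B in (0.01, 0.02).
def make_utility(B):
    assert 0.01 < B < 0.02, "outside this range u' > 0, u'' < 0 fails on [0, 100]"
    A = 0.0001 - 0.01 * B           # assumed normalization u(100) = 1
    return lambda w: A * w * w + B * w

for B in (0.011, 0.013, 0.015, 0.017, 0.019):
    u = make_utility(B)
    # u is increasing and concave on [0, 100], and more curved (more risk
    # averse) the larger B is -- the pattern shown in figure 3.1.
    print(B, round(u(0), 4), round(u(50), 4), round(u(100), 4))
```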
Now we are in a position to evaluate the integral in equation (3.26).
Using (3.28), equation (3.26) may be expressed as
[Figure 3.1: u(w) plotted for w ∈ [0, 100] and B = 0.011, 0.013, 0.015, 0.017, 0.019.]
\[
H(B)\left(33\tfrac{1}{3} - x\right) - 2x^2 < 0 \tag{3.36}
\]
H(B) is defined as
\[
H(B) = \frac{B}{0.0001 - 0.01B} \tag{3.37}
\]
x(B) is defined as
\[
x(B) = 33\tfrac{1}{3} + 4444.44\left(\frac{0.0001}{B} - 0.01\right) \tag{3.38}
\]
Let us start investigating the solution (3.35), (3.36), (3.37) and (3.38) by
differentiating (3.32) with respect to α1 and solve for first order conditions.
(Note that we have simplified the notation of C1 , . . . , C4 by omitting the
parametric dependence of B and x.)
\[
\frac{1}{2\left[x^2 + 2222.22\right]}\left[\frac{B\left(33\tfrac{1}{3} - x\right)}{.0001 - .01B} + 4444.44\right] \ge 0 \tag{3.43}
\]
Further manipulations on equation (3.43) yield
\[
x > 33\tfrac{1}{3} + 4444.44\left(\frac{0.0001}{B} - 0.01\right) \tag{3.44}
\]
We see that the right hand side of inequality (3.44) is what we have
defined as x(B) in equation (3.38).
It is probably simplest to explain the meaning of the inequality by looking
at a simple graph. Suppose that we fix B to 0.011. Then, the inequality (3.44)
is simplified to
In figure 3.3, the objective from equation (3.32) is plotted for various
values of x.
[Figure 3.3: The objective of equation (3.32) as a function of α1, plotted for x = 28, 29, 29.29, 29.5 and 30.]
We observe from figure 3.3 that if x is smaller than 29.29, the interior optimal α1 is outside the region [0, 1]. Hence, the optimal solution is to set α1* equal to 0. Alternatively, if x is larger than 29.29, the optimal α1 falls inside the interval [0, 1] and we sell parts of the land in period 1 according to equation (3.35). Note that the lower limit for a split, x(B), is a decreasing function in B (dx(B)/dB = −0.44/B² < 0). Hence, increasing risk aversion (increasing B) implies a decreased lower acceptable value x(B). If for instance an individual with B = 0.011 observes x = 28 in period 1, nothing is sold in this period. If a more cautious individual – with B = 0.015 – observed x = 28 in period 1, he would sell ≈ 47.33% in period 1.
Note also that we are able to find an absolute lower limit x(B) in this
example. Look again at inequality (3.43). Our choice of utility function
made it necessary to limit the parameter measuring risk aversion B to the
interval [0.01, 0.02]. The inequality (3.43) states that in order to obtain a
split solution, x must be larger than the right hand side expression for any
possible B. Therefore, if we can solve the following problem,
\[
x > \max_{0.01 < B \le 0.02} \left[ 33\tfrac{1}{3} + 4444.44\left(\frac{0.0001}{B} - 0.01\right) \right] \tag{3.46}
\]
we would find a lower limit for x which would guarantee a non-split solution. Solving the reformulated inequality (3.46) yields
\[
\frac{1}{2\left[x^2 + 2222.22\right]}\left[\frac{B\left(33\tfrac{1}{3} - x\right)}{.0001 - .01B} + 4444.44\right] \le 1 \tag{3.48}
\]
This inequality is obtained from inequality (3.43). Multiplying by 2[x² + 2222.22] and rearranging terms give
\[
H(B)\left(33\tfrac{1}{3} - x\right) - 2x^2 < 0 \tag{3.49}
\]
which is inequality (3.36). Let us plot the left hand side of this inequality
as a function of x for a range of B values. Figure 3.4 shows the results. (Note
that B is increasing from top to bottom in figure 3.4.)
Some parts of figure 3.4 are simple to explain. Let us start by investigating small B's. We see that if B is sufficiently small, approximately B < 0.015, we obtain a structure with an upper limit on x. Look for instance at the case B = 0.013. Then, if x is smaller than 41, inequality (3.49) holds and we get a split solution. Alternatively, if x > 41, α1 is larger than 1 and the optimal solution is to sell all the land in period 1. If the degree of risk aversion is increased, for instance by choosing B = 0.014, we get the same type of solution but with a larger upper bound on x. Surely this seems sensible; the decision maker is more cautious and needs a higher observed x in period 1 in order to sell all his land. The bottom part of the figure is also easy to explain. If B ≥ 0.016, we observe that the function is always negative.
[Figure 3.4: The left hand side of inequality (3.49) as a function of x, for B = 0.013, 0.014, 0.015, 0.0155, 0.016 and 0.017.]
This implies that the degree of risk aversion is so large that we always obtain
a split solution. In this area, the decision maker is so cautious that he always
insures himself against a low future x by selling some of the land in period
1.
However, the mid part of figure 3.4 (B = 0.0155) is harder to explain.
Note that for this value of B, the inequality (3.49) is true for 2 intervals –
x < 55 and x > 87 approximately. To stress the implications of this pattern
we have constructed another graph. This graph is shown in figure 3.5.
In figure 3.5, we have plotted the objective function in the optimization
problem (3.32) as a function of α1 . We have done this for various values of x
and a fixed value of B = 0.0155. Additionally, we have plotted the optimal
values of α1 and connected these by a line.
This graph, α1*(x), shows how the optimal value of α1 changes as x changes. In order to explain this structure we have taken it out and plotted it in figure 3.6.
If we start examining figure 3.6 in point A, we observe that when we
move from point A to B, the optimal proportion sold in period 1 is increasing
towards 1. The horizontal line between points B and C is obtained because
α1∗ must be ≤ 1. This makes sense. However, the behaviour between points
C and D seems weird. Here, we obtain a solution where the optimal α1 is
decreasing when x increases. The interpretation is that when x becomes
[Figure 3.5: The objective function of problem (3.32) as a function of α1, for B = 0.0155 and x = 40, 50, 60, 70, 80, 90, 100, with the optimal α1 values connected by a line.]
\[
ARA = -\frac{u''(w)}{u'(w)} \tag{3.50}
\]
Loosely speaking, the idea is to use ARA as a measure of how risk aversion
changes when wealth (w) changes. Hence, you would expect ARA to be
decreasing with w. It is easier to engage in a high stake bet for a rich person
who can bear the loss than for a poor person who can not. The larger ARA
is, the more risk averse the utility function is.
Let us calculate ARA and dARA/dw for the quadratic family of utility functions².
\[
ARA = \frac{2A}{B - 2Aw} \quad \text{and} \quad \frac{dARA}{dw} = \frac{4A^2}{(B - 2Aw)^2} > 0 \tag{3.51}
\]
We observe from equations (3.51) that ARA depends on the argument w and, more importantly, ARA increases with the argument of the utility function. This last property shows why we get our weird result. When x in figure 3.6 becomes very big, this has the effect of increasing risk aversion, even if B is fixed at 0.0155. Hence, when x becomes close to 100, the decision maker gets more risk averse and shifts from the less risk averse decision of selling all in period 1 towards the more risk averse decision of a split sale between the two periods.
By aid of figure 3.4, we can construct a more precise solution structure
than equation (3.36) indicates. It should be easy to realize that we get three
different solution types in this area depending on the value of B. If B is
smaller than a value, call this value B1 , we obtain a solution corresponding
to the upper part of figure 3.4. That is, if B is smaller than B1 , we split
the solution. On the other hand, if B is larger than B1 we sell everything
in period 1 - (α1∗ = 1). It should also be evident that this solution structure
² We use the normal way of defining a quadratic utility function, see (Copeland and Weston, 1983): u(w) = Aw² − Bw, for these calculations.
is valid for values of B smaller than the B which is such that the larger of the roots in equation (3.52) equals 100. (Note from figure 3.4 that the value we are looking for should be close to 0.015, and that H(B) is defined in equation (3.37).)
\[
H(B)\left(33\tfrac{1}{3} - x\right) - 2x^2 = 0 \tag{3.52}
\]
The problem we have formulated in words above, may be formulated
mathematically as follows: Find the roots r1 , r2 from equation (3.52) and
solve equation (3.53) for B;
\[
r_1 = -\frac{H(B)}{4}\left[1 + \sqrt{1 + \frac{266\frac{2}{3}}{H(B)}}\right] \quad \text{and} \quad r_2 = -\frac{H(B)}{4}\left[1 - \sqrt{1 + \frac{266\frac{2}{3}}{H(B)}}\right] \tag{3.54}
\]
It is easy to show that max(r1 , r2 ) = r1 . Therefore, the second step in
our procedure involves solving the following equation.
\[
-\frac{H(B)}{4}\left[1 + \sqrt{1 + \frac{266\frac{2}{3}}{H(B)}}\right] = 100 \tag{3.55}
\]
\[
x^* = -\frac{H(B)}{4} \tag{3.57}
\]
The maximal value should equal 0:
\[
H(B)\left[33\tfrac{1}{3} + \frac{H(B)}{4}\right] - 2\left[\frac{H(B)^2}{16}\right] = 0 \tag{3.58}
\]
Equation (3.58) is a simple quadratic equation in H(B) with solution:
\[
H(B) = -266\tfrac{2}{3} \tag{3.59}
\]
Hence, B2 is found to be 0.016. (Refer to figure 3.4.) Let us try to sum
up the solution in a table.
Table 3.2: Solution to the house selling example with quadratic utility function and uniform density.
x(B) in Table 3.2, is defined in equation (3.38). The term “Split” refers
to a split solution computed by equation (3.40). r1 and r2 are defined in
equation (3.54).
This section has demonstrated that SDP may be applied to solve problems with non-linearities. However, the solution structure of this seemingly simple two-period problem turned out to be quite complex. Partially, this was due to a somewhat special choice of utility function.
VN (x) = x (3.61)
That is, in the last period the best we can do is to sell as any outcome
in [a, b] yields positive contribution to the objective. Let us move to period
N − 1. The optimality equation (3.60) then becomes:
\[
V_{N-1}(x) = \max\left[ x,\ \int_a^b x f(x)\,dx - c \right] \tag{3.62}
\]
If we had known the density f(x), the expression \(\int_a^b x f(x)\,dx - c\) could have been computed as a number. Let us call this number p1 and rewrite equation (3.62):
\[
V_{N-1}(x) = \max\left[ x,\ p_1 \right] \tag{3.63}
\]
Figure 3.7: VN −1 (x) in the house selling example with infinite horizon.
\[
V_{N-2}(x) = \max\left[x,\ E\{V_{N-1}(x)\}\right] = \max\left[x,\ E\{\max[x, p_1]\}\right] \tag{3.64}
\]
giving
\[
V_{N-2}(x) = \max\left[ x,\ \int_a^{p_1} p_1 f(x)\,dx + \int_{p_1}^{b} x f(x)\,dx - c \right] \tag{3.67}
\]
Now we can repeat the argument that led to equation (3.63) in this period.
Hence,
where
\[
p_j = \int_a^{p_{j-1}} p_{j-1} f(x)\,dx + \int_{p_{j-1}}^{b} x f(x)\,dx - c \tag{3.70}
\]
and
\[
p_1 = \int_a^b x f(x)\,dx - c \tag{3.71}
\]
Table 3.3: p1, p2, ..., p10

  j     p_j
  1     20.000
  2     22.000
  3     22.420
  4     22.513
  5     22.534
  6     22.540
  7     22.540
  8     22.540
  9     22.540
 10     22.540
\[
\lim_{j \to \infty} p_j \to p^* \tag{3.72}
\]
Consequently,
\[
V_{N-j}(x) \to V(x) = \max\left[x, p^*\right] = \begin{cases} x & x \ge p^* \\ p^* & x < p^* \end{cases} \tag{3.73}
\]
Thus, if we view our problem in an infinite horizon perspective, we have identified a strategy independent of time (stages). The strategy may be interpreted as follows: if at any time we observe an x smaller than p*, we do nothing. Alternatively, if we observe an x larger than p*, we sell our land immediately.
We have earlier referred to such a strategy as a stationary policy. Not
only have we identified such a strategy, we have also an equation to find it.
Namely,
\[
p^* = \int_a^{p^*} p^* f(x)\,dx + \int_{p^*}^{b} x f(x)\,dx - c \tag{3.74}
\]
Adding and subtracting the term \(\int_a^{p^*} x f(x)\,dx\) yields
\[
p^* = E(x) - c + \int_a^{p^*} (p^* - x) f(x)\,dx \tag{3.75}
\]
\[
p^* = 50 - 30 + \frac{1}{100}\int_0^{p^*} (p^* - x)\,dx \tag{3.77}
\]
or, after evaluating the integral, p* = 20 + (p*)²/200, which gives p* = 100 − √6000 ≈ 22.54.
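The recursion (3.70)–(3.71) is easy to iterate numerically. The sketch below uses the uniform density on [0, 100] and c = 30 from (3.77); it should reproduce the p_j values of table 3.3 and the fixed point p* ≈ 22.54.

```python
# A sketch of the recursion (3.70)-(3.71) for the uniform case of (3.77):
# f is U[0, 100], so E(x) = 50, and the impatience cost is c = 30.
a, b, c = 0.0, 100.0, 30.0

def next_p(p):
    # p_j = int_a^p p f(x) dx + int_p^b x f(x) dx - c, with f(x) = 1/(b-a).
    return (p * (p - a) + (b**2 - p**2) / 2.0) / (b - a) - c

p = (b**2 - a**2) / (2.0 * (b - a)) - c    # p_1 = E(x) - c = 20
for j in range(1, 11):
    print(j, round(p, 3))                  # reproduces table 3.3
    p = next_p(p)

print("fixed point:", round(100 - 6000**0.5, 3))   # about 22.54
```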
\[
\begin{array}{ll}
\max & Z = \sum_{t=1}^{\infty} \left[E(x) - (t-1)c\right]\delta_t \\
\text{s.t.} & \sum_{t=1}^{\infty} \delta_t \le 1 \\
& \delta_t \in \{0, 1\}
\end{array} \tag{3.81}
\]
Given our earlier assumptions of E(x) − c > a and a > 0, it is easy to realize that E(x) must be positive. Hence, the optimal solution to problem (3.81)
is easily found as
P (x ≥ p∗ ) = q (3.83)
Hence, q is the probability that x is larger than or equal to the reserve
price - p∗ . Then, the expected number of periods before the sale is made can
be computed as follows:
\[
= q \sum_{t=1}^{\infty} t(1-q)^{t-1} \tag{3.85}
\]
\[
E(\text{“Waiting time”}) = \frac{1}{q} \tag{3.86}
\]
Utilizing equation (3.86) we can compare the stochastic and the deterministic model. If q = 1, the probability of observing an x larger than p*
equals 1. Then, we obtain an expected waiting time of one period or the
same solution as in the deterministic case. (Note that our definition of pe-
riods means that immediate sale gives a waiting time of 1 period.) On the
other hand, if this probability decreases, the expected time before the sale is
made increases geometrically.
This example has shown that in some situations, SDP may be applied
to obtain analytic solutions to stochastic optimization problems. The reader
should not make the mistake of believing that this is a common situation. In
some situations, it may be helpful – at least as a way of obtaining principal
information on problem behaviour.
However, this example may be used to stress another important point.
In practice, many people tend to apply scenario analysis as a method of
taking care of uncertainty. By the term scenario analysis, we here refer to a
process where various scenarios or possible future developments of random
structures are substituted for stochastic variables. Then for each scenario,
a deterministic optimization problem is solved. Finally, one tries to weigh
these deterministic solutions together in order to find some solution that
takes uncertainty into consideration. The point we want to stress here is the fact that such a strategy may be dangerous: there may not exist any weighing strategy that captures the stochasticity of the problem.
Let us look at a “scenario analysis” way of solving our example. Suppose
that two scenarios are formulated: Good and Bad. Suppose that the Good
scenario is characterized by a fixed x = 70 in all future periods while the Bad
scenario has x = 10 in all future periods. If we then solve the deterministic
optimization problem (3.81) for these two scenarios, we obtain the same
solution in both cases; namely
Chapter 4
SDP - difficulties
This equation (4.1), is written in a form where the state space (possible
values for the state variable i) is one dimensional. Suppose our problem needs
a multidimensional state space definition; say i1 , . . . , im where we assume that
each state variable can take a set of discrete values. Then equation (4.1) may
be expressed:
I N
2 1024
3 59049
4 1048576
5 9765625
6 60466176
7 282475249
8 1073741824
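The growth displayed in the table is easy to reproduce. The listed numbers equal I^10, so they appear to correspond to a ten-dimensional state space with I discrete values per dimension (an assumption, since the defining equation is missing from this extract); the sketch below also indicates the memory needed to store one value-function table per stage.

```python
# The numbers in the table above equal I**10, i.e. they appear to show the
# total number of discrete states N for a ten-dimensional state space with
# I values per dimension (an assumption; the defining equation is missing
# from this extract).  Storing one value function table per stage quickly
# becomes infeasible:
for I in range(2, 9):
    N = I ** 10
    gigabytes = N * 8 / 1e9            # 8 bytes per double-precision value
    print(f"I = {I}:  N = {N:>13,}  ~ {gigabytes:8.2f} GB per stage")
```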
q_k1 at stage n, q_k2 at stage n + 1, and so on. Alternatively, if the house is sold at stage n + k, the resources needed at n + k are q_k1, q_k2 at n + k + 1, etc.
To make these resource commitments interesting, we need some limita-
tions on total use of resources at each stage. Let Qn be the available amount
of resource at stage n. For instance, the actual resources may involve build-
ing roads connecting the houses to an available road system, and we have a
limited crew of road builders.
So far, we have not said anything about uncertainty. In previous examples, the sales price of houses was modelled as a stochastic variable. This still seems to be a sensible assumption. Actually, the point we want to make is not necessarily dependent on which stochastic mechanism we implement. Therefore, we need not specify how the stochastic sales price may be modelled. Let us instead look at a certain stage n and ask the following question: what is the necessary information we need to make a decision at this stage? Surely, we need the value function at stage n + 1 and a stochastic mechanism to compute the expected value of this function. But we also need some additional information. Not only do we need information on whether a house has been sold or not, but if it has been sold, we need to know when. This information is necessary in order to check the resource constraints, that is, whether a decision/state combination is legal. Hence, we need for each house a state variable describing whether the house has been sold earlier and if so when. The following state variable structure yields the necessary information:
Let ikn be a state variable associated with house k at stage n and let ikn take
on the following values:
S4 = 10 − 5x5 (4.8)
Moving on to stage 3, the remaining resources are:
Chapter 5
Infinite horizon problems

If we look back at section 3.5, we solved an infinite horizon problem. The key point in such problems is to find a so-called stationary policy. Such a policy is characterized by being independent of the stages in the model. Hence, in a simple SDP model, we want to find decisions that only depend on the states of the system. The literature tends to use an alternative name for such problems – Markov decision processes or MDP's. To understand the reason for this somewhat confusing term, look again at the fundamental optimality equation:
\[
V_n(i) = \max_a \left[ R(i,a) + \sum_j P_{ij}(a) V_{n+1}(j) \right] \tag{5.1}
\]
Table 5.1: Net profit in each time period for various payment and maintenance possibilities

        HE     LE
  H     100    180
  M      50    130
  L       0     80
The link between your choice of strategy {HE, LE} and your tenants' choice of strategy {H, M, L} is not certain. Table 5.2 gives information on the probabilistic nature of the causality between strategic choices from your point of view.
Note from Table 5.2 the dual nature of the causality. Your maintenance effort only partly determines the probabilities for the state values in the next period; the observed state value also has an impact. The probability of obtaining a high payment in the next period, if today's payment is high, is larger than if today's payment is low – independently of your action. Hence, you assume some kind of underlying stochastic process governing the payment scheme. A sensible practical explanation may be that today's payment is an indication of the present economic situation of your tenants. Hence, if you observe a high payment today, you assume that your tenants also have money in the next period.
The decision problem facing you may then be described as follows: in each time period, you observe which payment you receive. Based on this
Table 5.2: Probabilities for High (H), Medium (M) or Low (L) payments in the next period, given observed state values and your decisions

  State   Decision    H     M     L
  H       HE          0.8   0.1   0.1
          LE          0.4   0.3   0.3
  M       HE          0.4   0.4   0.2
          LE          0.3   0.3   0.4
  L       HE          0.4   0.3   0.3
          LE          0.1   0.1   0.8
information, you must decide on which maintenance effort to use. This decision picks a future described by Table 5.2. This information/decision pattern is then repeated infinitely.
A sensible objective may seem to be maximization of the total expected net profit. However, it is easily observed from equation (5.1) that such a strategy may be hard to implement. As our problem lacks any impatience cost (refer to the example in section 3.5), the objective will be unbounded, i.e. grow to infinity. As we are interested in finding policies – that is, strategies independent of time – we might as well maximize the expected net profit per period. The alternative to this interpretation is to introduce explicit impatience costs, normally in the form of discounting. We will briefly return to this type of models later.
Table 5.3: Possible policies and associated net profits for the MDP-example

                State    1     2     3     4     5     6     7     8
  Policy        H        HE    HE    HE    HE    LE    LE    LE    LE
                M        HE    HE    LE    LE    HE    HE    LE    LE
                L        HE    LE    HE    LE    HE    LE    HE    LE
  Net profit    H        100   100   100   100   180   180   180   180
                M        50    50    130   130   50    50    130   130
                L        0     80    0     80    0     80    0     80
transition matrix. For instance, using Table 5.2 it is easily seen that policy 1 has the following matrix of transition probabilities:
\[
P_1 =
\begin{array}{c|ccc}
 & H & M & L \\ \hline
H & 0.8 & 0.1 & 0.1 \\
M & 0.4 & 0.4 & 0.2 \\
L & 0.4 & 0.3 & 0.3
\end{array}
\tag{5.2}
\]
\[
P_4 =
\begin{array}{c|ccc}
 & H & M & L \\ \hline
H & 0.8 & 0.1 & 0.1 \\
M & 0.3 & 0.3 & 0.4 \\
L & 0.1 & 0.1 & 0.8
\end{array}
\tag{5.3}
\]
Note that we use the notation Pp for the matrix of transition probabilities
associated with policy p, p ∈ {1, 2, . . . , 8}.
If we knew the probabilities of observing states H, M and L for each of the
possible policies in the long run, we could merely compute the expected profit
for each policy and choose the policy with the largest expected profit. Luckily,
such long run probabilities are easily obtained. The theory of stochastic processes gives a direct answer to this problem – refer for instance to (Ross, 1996). Hence, long run or steady state probabilities may be calculated for each policy p by the following set of linear equations:
\[
\pi_j^p = \sum_{i=1}^{I} \pi_i^p \, p_{ij}^p, \qquad \sum_{j=1}^{I} \pi_j^p = 1\ ; \quad \forall j \in \{1, 2, \ldots, I\} \text{ and } \forall p \in \{1, 2, \ldots, P\} \tag{5.4}
\]
The notation in equation (5.4) has the following meaning: π_j^p is the steady state probability of observing state j given policy p, and p_{ij}^p is element ij in matrix P_p. I is the number of states (3 in our example), while P is the number of policies (8 in our example). To show how one of these π's can be calculated, let us look at π¹. Using equation (5.4) and P_1 we get the following equational system:
\[
\begin{array}{l}
\pi_H^1 = 0.8\,\pi_H^1 + 0.4\,\pi_M^1 + 0.4\,\pi_L^1 \\
\pi_M^1 = 0.1\,\pi_H^1 + 0.4\,\pi_M^1 + 0.3\,\pi_L^1 \\
\pi_L^1 = 0.1\,\pi_H^1 + 0.2\,\pi_M^1 + 0.3\,\pi_L^1 \\
\pi_H^1 + \pi_M^1 + \pi_L^1 = 1
\end{array}
\]
Table 5.4: Stationary distributions for all possible policies

  State    π¹       π²       π³       π⁴       π⁵       π⁶       π⁷       π⁸
  H        .6667    .4762    .6379    .4167    .4000    .2326    .3700    .1923
  M        .1852    .1429    .1724    .1250    .3333    .2093    .3000    .1731
  L        .1481    .3809    .1897    .4583    .2667    .5581    .3300    .6346
Table 5.5: Expected per period net profits for all possible policies

  Policy    1        2        3        4        5        6        7         8
            75.930   85.237   86.202   94.584   88.665   96.981   105.600   107.885
From Table 5.5 we observe that policy 8 yields the maximal expected per period net profit. Hence, the landlord's strategy is the somewhat cynical one of maintaining the flat as little as possible.
Even though many people may feel that such a strategy is quite common in practice, it is easy to change the data somewhat such that an alternative strategy yields a higher expected per period net profit than the given one. For instance, assume that the cost associated with making a low effort is changed from $20 to $40, all other data unchanged. Then, the net profits for policies 7 and 8 become {160, 110, 0} for policy 7 and {160, 110, 60} for policy 8. Computing expected values for the two policies yields 92.200 for policy 7 and 87.885 for policy 8. Hence, under these assumptions, the optimal strategy has changed.
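The full enumeration procedure is small enough to carry out mechanically. The sketch below solves the steady-state equations (5.4) for each of the eight policies of Table 5.3 and computes the expected per period net profit; it should reproduce Tables 5.4 and 5.5.

```python
# A sketch of the full-enumeration approach: for each of the eight policies,
# solve the steady-state equations (5.4) and compute the expected per-period
# net profit.  Data from Tables 5.1 and 5.2; policy numbering as in Table 5.3.
import itertools
import numpy as np

states, actions = ["H", "M", "L"], ["HE", "LE"]
profit = {("H", "HE"): 100, ("H", "LE"): 180,
          ("M", "HE"): 50,  ("M", "LE"): 130,
          ("L", "HE"): 0,   ("L", "LE"): 80}
trans = {("H", "HE"): [0.8, 0.1, 0.1], ("H", "LE"): [0.4, 0.3, 0.3],
         ("M", "HE"): [0.4, 0.4, 0.2], ("M", "LE"): [0.3, 0.3, 0.4],
         ("L", "HE"): [0.4, 0.3, 0.3], ("L", "LE"): [0.1, 0.1, 0.8]}

def steady_state(P):
    # Solve pi = pi P together with sum(pi) = 1 (least squares on the
    # overdetermined but consistent system).
    A = np.vstack([P.T - np.eye(3), np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    return np.linalg.lstsq(A, b, rcond=None)[0]

for number, policy in enumerate(itertools.product(actions, repeat=3), start=1):
    P = np.array([trans[(s, a)] for s, a in zip(states, policy)])
    pi = steady_state(P)
    value = sum(pi[k] * profit[(states[k], policy[k])] for k in range(3))
    print(number, dict(zip(states, policy)), np.round(pi, 4), round(value, 3))
```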
\[
\delta_{ia} = \begin{cases} 1 & \text{if decision } a \text{ is chosen for observed state } i \\ 0 & \text{otherwise} \end{cases} \tag{5.7}
\]
where a is a decision, chosen from the set {1, 2, . . . , A} and i is a state,
chosen from the set {1, 2, . . . , I}.
We add the following set of constraints on δia :
\[
\sum_{a=1}^{A} \delta_{ia} = 1 \qquad \forall i \in \{1, 2, \ldots, I\} \tag{5.8}
\]
Given the definition (5.7) and the constraints (5.8), δia may be interpreted
as a binary variable picking all possible policies. For instance, policy 5 from
Table 5.3 can be picked by assigning the following values to δia ;
\[
\begin{array}{ll}
\delta_{11} = 0 & \delta_{12} = 1 \\
\delta_{21} = 1 & \delta_{22} = 0 \\
\delta_{31} = 1 & \delta_{32} = 0
\end{array} \tag{5.9}
\]
where state values 1, 2, 3 correspond to H, M, L and decisions 1, 2 correspond to HE, LE. Hence, the problem we would like to solve is to assign a set of values to δ_ia (picking a policy) which maximizes expected per period net profit. If such an optimization problem can be formulated, we can use mathematical programming methods to find the solution. As the title of this section indicates, we are looking for a linear programming formulation. The set of decision variables δ_ia which we have formulated indicates, however, an integer program. To avoid an integer program formulation, we perform a little trick. We introduce the term randomized policy. A randomized policy is an extension of a deterministic policy, characterized by the fact that we allow ourselves to choose the probability of performing an action. Suppose we change the values of policy 5 as follows:
\[
\begin{array}{ll}
\delta_{11} = 0 & \delta_{12} = 1 \\
\delta_{21} = \tfrac{1}{2} & \delta_{22} = \tfrac{1}{2} \\
\delta_{31} = 1 & \delta_{32} = 0
\end{array} \tag{5.10}
\]
Then, policy 5 may be interpreted as follows: If state 1 (H) is observed,
we make decision 2 (LE) with certainty. If state 3 (L) is observed, we make
decision 1 (HE) with certainty. However, if state 2 (M) is observed, we toss a
coin to decide on which action (HE or LE) to do. That is, we may interpret
δia as follows:
Therefore, our original decision variables δ_ia are easy to calculate by equation (5.18) when the y_ia's are known.
Let us now turn to the expression for the objective function. Let R_ia be the per period net profit of observing state i and making decision a. If we return to the example in section 5.1, these values are readily available from Table 5.1. That is,
We have three sets of constraints we need to take into account. First, our
unconditional probability density yia must sum to 1.
\[
\sum_{i=1}^{I} \sum_{a=1}^{A} y_{ia} = 1 \tag{5.21}
\]
where p_ij(a) are the transition probabilities from Table 5.2. Equation (5.22) must be expressed in the y_ia variables. This is easily achieved by utilizing equation (5.17), giving
\[
\sum_{a=1}^{A} y_{ja} = \sum_{i=1}^{I} \sum_{a=1}^{A} y_{ia}\, p_{ij}(a), \qquad \forall j \in \{1, 2, \ldots, I\} \tag{5.23}
\]
\[
y_{ia} \le \sum_{a=1}^{A} y_{ia} \tag{5.26}
\]
\[
\begin{array}{ll}
\max & \sum_{i=1}^{I}\sum_{a=1}^{A} R_{ia}\, y_{ia} \\
\text{s.t.} & \sum_{i=1}^{I}\sum_{a=1}^{A} y_{ia} = 1 \\
& \sum_{a=1}^{A} y_{ja} - \sum_{i=1}^{I}\sum_{a=1}^{A} y_{ia}\, p_{ij}(a) = 0, \quad \forall j \in \{1, 2, \ldots, I\} \\
& y_{ia} \ge 0, \quad \forall i \in \{1, 2, \ldots, I\},\ \forall a \in \{1, 2, \ldots, A\}
\end{array} \tag{5.27}
\]
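A sketch of how the LP (5.27) can be set up and solved for the example of section 5.1 is given below, using scipy.optimize.linprog as an (assumed) off-the-shelf LP solver; any LP solver would do. The reported solution should match (5.29) below: only y_12, y_22 and y_32 are positive, i.e. the deterministic policy 8.

```python
# A sketch of the LP (5.27) for the example of section 5.1, using
# scipy.optimize.linprog (an assumed tool).  Variables are ordered
# y_11, y_12, y_21, y_22, y_31, y_32: first index = state (H, M, L),
# second index = decision (HE, LE).
import numpy as np
from scipy.optimize import linprog

R = np.array([100, 180, 50, 130, 0, 80], dtype=float)     # R_ia, Table 5.1
P = {  # p_ij(a) for each variable, in the same order, from Table 5.2
    0: [0.8, 0.1, 0.1], 1: [0.4, 0.3, 0.3],
    2: [0.4, 0.4, 0.2], 3: [0.3, 0.3, 0.4],
    4: [0.4, 0.3, 0.3], 5: [0.1, 0.1, 0.8],
}

A_eq = [np.ones(6)]                      # sum of all y_ia equals one
b_eq = [1.0]
for j in range(3):                       # balance constraint for each state j
    row = np.zeros(6)
    row[2 * j] += 1.0                    # + sum_a y_ja
    row[2 * j + 1] += 1.0
    for var in range(6):                 # - sum_i sum_a y_ia p_ij(a)
        row[var] -= P[var][j]
    A_eq.append(row)
    b_eq.append(0.0)

res = linprog(c=-R, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * 6)
print(np.round(res.x, 4))    # only y_12, y_22, y_32 positive (policy 8)
print(round(-res.fun, 3))    # expected per-period profit, about 107.885
```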
Let us now use this formulation to formulate and solve the example in
section 5.1. Using the data from section 5.1 and equation (5.27) we obtain
the following linear programming problem:
y12 = 0.1923 , y22 = 0.1731 , y32 = 0.6346 , y11 = y21 = y31 = 0 (5.29)
Using equation (5.18), the corresponding optimal policy can be calculated
as:
\[
100 = \alpha \cdot 110 \;\Rightarrow\; \alpha = \frac{100}{110} \tag{5.32}
\]
Note that the basic assumption which leads to equation (5.33) is the
convergence of Vn . That is:
the initialization step, V⁰(i) = 0, ∀i. Then, we obtain the values for V¹(i) in Table 5.6 merely by maximizing each row in Table 5.1. V²(1) is found by the following expression:
\[
\begin{array}{l}
V^2(1, a=1) = 100 + 0.1\,(0.8 \cdot 180 + 0.1 \cdot 130 + 0.1 \cdot 80) = 116.5 \\
V^2(1, a=2) = 180 + 0.1\,(0.4 \cdot 180 + 0.3 \cdot 130 + 0.3 \cdot 80) = 193.5
\end{array} \tag{5.36}
\]
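A compact sketch of the full successive-approximation scheme, with the discount factor α = 0.1 used in the numerical illustration above and the data of Tables 5.1 and 5.2, might look as follows.

```python
# A sketch of the method of successive approximations: iterate
# V^{n+1}(i) = max_a [ R(i,a) + alpha * sum_j P_ij(a) V^n(j) ] from V^0 = 0,
# with alpha = 0.1 and data from Tables 5.1 and 5.2.
R = {("H", "HE"): 100, ("H", "LE"): 180, ("M", "HE"): 50,
     ("M", "LE"): 130, ("L", "HE"): 0, ("L", "LE"): 80}
P = {("H", "HE"): {"H": 0.8, "M": 0.1, "L": 0.1},
     ("H", "LE"): {"H": 0.4, "M": 0.3, "L": 0.3},
     ("M", "HE"): {"H": 0.4, "M": 0.4, "L": 0.2},
     ("M", "LE"): {"H": 0.3, "M": 0.3, "L": 0.4},
     ("L", "HE"): {"H": 0.4, "M": 0.3, "L": 0.3},
     ("L", "LE"): {"H": 0.1, "M": 0.1, "L": 0.8}}
alpha = 0.1

V = {"H": 0.0, "M": 0.0, "L": 0.0}          # V^0(i) = 0 for all i
for n in range(1, 21):
    V = {i: max(R[(i, a)] + alpha * sum(P[(i, a)][j] * V[j] for j in V)
                for a in ("HE", "LE"))
         for i in V}
    print(n, {i: round(v, 3) for i, v in V.items()})
# After one step V^1 = (180, 130, 80); after two steps V^2(H) = 193.5, as in
# equation (5.36); the iterates then converge geometrically.
```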
"
\[
V_p(i) = R(i, p(i)) + \alpha \sum_j P_{ij}(p(i))\, V_p(j), \qquad \forall i \in \{1, 2, \ldots, I\} \tag{5.37}
\]
where we use the notation V_p for the value function of the given policy p, and p(i) for the policy to emphasize the fact that it is state dependent. If we are able to find another policy which is better than p, then we can evaluate the new one in the same fashion. According to the title of this section, the fact that this is possible should not be surprising. The improvement step is done by performing the following set of calculations.
\[
\max_a \left[ R(i,a) + \alpha \sum_j P_{ij}(a)\, V_p(j) \right] \tag{5.38}
\]
117.930 = 100 + 0.1 (0.8 · 194.826 + 0.1 · 143.784 + 0.1 · 90.637) (5.42)
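The two steps, value determination (5.37) and policy improvement (5.38), can be combined into a small routine. The sketch below uses the same data and α = 0.1, starting (arbitrarily) from policy 1.

```python
# A sketch of the method of policy improvement: repeat a value-determination
# step (solve the linear system (5.37) for the current policy) and an
# improvement step (5.38), with alpha = 0.1 and the data of Tables 5.1-5.2.
import numpy as np

states, actions, alpha = ["H", "M", "L"], ["HE", "LE"], 0.1
R = {("H", "HE"): 100, ("H", "LE"): 180, ("M", "HE"): 50,
     ("M", "LE"): 130, ("L", "HE"): 0, ("L", "LE"): 80}
P = {("H", "HE"): [0.8, 0.1, 0.1], ("H", "LE"): [0.4, 0.3, 0.3],
     ("M", "HE"): [0.4, 0.4, 0.2], ("M", "LE"): [0.3, 0.3, 0.4],
     ("L", "HE"): [0.4, 0.3, 0.3], ("L", "LE"): [0.1, 0.1, 0.8]}

policy = {s: "HE" for s in states}           # start from policy 1
while True:
    # Value determination: V_p = R_p + alpha * P_p V_p  (equation (5.37)).
    Pp = np.array([P[(s, policy[s])] for s in states])
    Rp = np.array([R[(s, policy[s])] for s in states], dtype=float)
    Vp = np.linalg.solve(np.eye(3) - alpha * Pp, Rp)
    # Policy improvement: equation (5.38).
    new_policy = {
        s: max(actions,
               key=lambda a: R[(s, a)] + alpha * np.dot(P[(s, a)], Vp))
        for s in states}
    if new_policy == policy:
        break
    policy = new_policy

print(policy, np.round(Vp, 3))
```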
Chapter 6
Recent research
This concluding chapter will briefly discuss some important research issues. We will not give detailed descriptions of methods, but point to some relevant literature. A good general introduction to problems and recent research on MDP's may be found in White (White and White, 1989).
Table 6.1: Example illustrating the compression problem

  State (i)    Vn(i)
   1           100
   2            10
   3            20
   4            50
   5            70
   6            60
   7            30
   8            40
   9            90
  10            80
f (i) = 10 · i (6.2)
The point of this example is to show that how we number our state space may have an important effect on how well the approximation in equation (6.1) works.
[Figure: Vn(i) plotted against the state number i = 1, ..., 10 (data file 'COMPRESS.DAT').]
Table 6.2: Example illustrating the compression problem with resorted state space

  State (i)    Vn(i)
   1            10
   2            20
   3            30
   4            40
   5            50
   6            60
   7            70
   8            80
   9            90
  10           100
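The renumbering illustrated by Tables 6.1 and 6.2 amounts to sorting the states by their value. A minimal sketch:

```python
# A sketch of the renumbering idea behind Tables 6.1 and 6.2: after sorting
# the states by their value, the value function in this example is exactly
# reproduced by the simple compressed form f(i) = 10*i of equation (6.2),
# so the table of ten numbers can be replaced by one coefficient.
V = {1: 100, 2: 10, 3: 20, 4: 50, 5: 70,
     6: 60, 7: 30, 8: 40, 9: 90, 10: 80}          # Table 6.1

order = sorted(V, key=V.get)                      # old state numbers, sorted by value
relabel = {old: new for new, old in enumerate(order, start=1)}

V_sorted = {relabel[old]: value for old, value in V.items()}   # Table 6.2
print(dict(sorted(V_sorted.items())))             # values 10*i for i = 1..10

f = lambda i: 10 * i                              # equation (6.2)
assert all(f(i) == V_sorted[i] for i in V_sorted)
```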
• Aggregation based on LP
This holds if we only need the optimal solution in the first period. Normally, this is the case in a practical situation, as we would solve the model again in the next period anyway. However, if we want to inspect solution structures further into the future, the method is obviously limited.
• Vector computers
• Parallel computers
[Figure 6.2: A traditional serial algorithm structure compared to a decomposition structure where a master problem (MP) is split into subproblems (SP).]
As figure 6.2 shows, the traditional serial algorithmic approach on the left has a repeated computation/inference structure. A decomposition approach splits a master problem (MP) into subproblems (SP) with individual inference. Surely, decomposition algorithms are not new. However, the introduction of usable parallel computers has to some extent led to a rediscovery of some older approaches. Refer for instance to the importance of decomposition methods in stochastic programming. Parallel computing has also indisputably influenced modern algorithmic research; refer for instance to the method of scenario aggregation and the progressive hedging algorithm by Rockafellar and Wets (Rockafellar and Wets, 1991).
So, what has this to do with SDP? It is interesting to note that Bellman
and Dreyfus (Bellman and Dreyfus, 1962) actually discuss parallel operations
in relation to dynamic programming already in 1962. Surely, no usable parallel computers existed at that point. Let us return to the optimality equation
and investigate it in this perspective.
\[
V_n(i) = \max_a \left[ R(i,a) + \alpha \sum_j P_{ij}(a)\, V_{n+1}(j) \right] \tag{6.3}
\]
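At a fixed stage n, the maximization in (6.3) is independent across states i once V_{n+1} is known, which is what makes the stage-wise computation parallelizable. The sketch below illustrates the idea on synthetic data, handing each state to a worker process (Python's multiprocessing is used purely as an illustration).

```python
# A sketch of the observation behind parallel SDP: at a given stage n, the
# maximization in (6.3) can be carried out independently for every state i,
# once V_{n+1} is known.  Each state is handed to a separate worker process
# purely as an illustration (multiprocessing is an assumed tool).
from multiprocessing import Pool

import numpy as np

I, A, alpha = 50, 4, 0.9
rng = np.random.default_rng(1)
R = rng.uniform(0, 100, size=(I, A))             # R(i, a), illustrative data
P = rng.dirichlet(np.ones(I), size=(I, A))       # P_ij(a), rows sum to one
V_next = np.zeros(I)                             # V_{n+1}

def bellman_update(i):
    # max over a of R(i,a) + alpha * sum_j P_ij(a) V_{n+1}(j)
    return np.max(R[i] + alpha * P[i] @ V_next)

if __name__ == "__main__":
    with Pool() as pool:
        V_n = np.array(pool.map(bellman_update, range(I)))
    assert np.allclose(V_n, [bellman_update(i) for i in range(I)])
    print(V_n[:5])
```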
Bibliography

Beasley, J. E. (1987), 'Supercomputers and OR', J. Opl. Res. Soc. 38, 1085–1089.
Ross, S. M. (1996), Stochastic Processes, 2nd edition, John Wiley & Sons,
New York.
Index

Supercomputer, 88
support, 28
Tamaki, 58
time horizon, 15, 33
time preference, 78
toss a coin, 73
total expected discounted cost, 77
transition matrix, 21
transition probabilities, 70, 75
value determination, 81
value function, 21, 34, 36, 63, 79
vector computers, 88
vector notation, 71
wait, 15
Wallace and Kall, 57
Wallace, Stein, 5
Watson and Buede, 15
White, 72, 83
why SDP does not work, 83
Zenios, 88
Zipkin, 87
Author Biography: Kjetil K. Haugen is 56 years old (born 1959), living in Molde, Norway. He is a professor of Logistics and Sport Management at Molde University College, Specialized University in Logistics. Haugen holds a PhD in Management Science from the Norwegian Institute of Technology from 1991. He achieved a (full) professorship in Logistics in 2005 and a (full) professorship in Sport Management in 2011. Lately, his research interests have moved from operations management/logistics into sports economics/strategy. His main interest has been game theory applied in football, both as a tool to understand football economics as well as the game itself. Professor Haugen has published in journals such as Public Choice, EJOR, Annals of Operations Research, Kybernetika, INTERFACES, Operations Research Perspectives, Annals of Regional Science, PLoS ONE, Journal of Sports Economics, European Sport Management Quarterly, Sport Management Review and Sport in Society, among others.