A Parallel Dynamic Programming Algorithm for Multi-Reservoir System Optimization

Xiang Li a, Jiahua Wei a, Tiejian Li a, Guangqian Wang a, William W.-G. Yeh b

a State Key Laboratory of Hydroscience & Engineering, Tsinghua University, Beijing 100084, China
b Department of Civil and Environmental Engineering, University of California, Los Angeles, CA 90095, USA
Article history: Received 8 May 2013; Received in revised form 8 January 2014; Accepted 12 January 2014; Available online 30 January 2014

Keywords: Dynamic programming; Multi-reservoir system optimization; Joint operation; Parallel computing

Abstract
This paper develops a parallel dynamic programming algorithm to optimize the joint operation of a multi-reservoir system. First, a multi-dimensional dynamic programming (DP) model is formulated for a multi-reservoir system. Second, the DP algorithm is parallelized using a peer-to-peer parallel paradigm. The parallelization is based on the distributed memory architecture and the message passing interface (MPI) protocol. We consider both the distributed computing and distributed computer memory in the parallelization. The parallel paradigm aims at reducing the computation time as well as alleviating the computer memory requirement associated with running a multi-dimensional DP model. Next, we test the parallel DP algorithm on the classic, benchmark four-reservoir problem on a high-performance computing (HPC) system with up to 350 cores. Results indicate that the parallel DP algorithm exhibits good performance in parallel efficiency; the parallel DP algorithm is scalable and will not be restricted by the number of cores. Finally, the parallel DP algorithm is applied to a real-world, five-reservoir system in China. The results demonstrate the parallel efficiency and practical utility of the proposed methodology.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Dynamic programming (DP), an algorithm attributed largely to Bellman [3], is developed for optimizing a multi-stage (the term "stage" represents time step throughout the paper) decision process. If the return or cost at each stage is independent and satisfies the monotonicity and separability conditions [23], the original multi-stage problem can be decomposed into stages with decisions required at each stage. The decomposed problem then can be solved recursively, two stages at a time, using the recursive equation of DP. DP is particularly suited for optimizing reservoir management and operation as the structure of the optimization problem conforms to a multi-stage decision process. Over the past four decades, DP has been used extensively in the optimization of reservoir management and operation [4,6,8,13,22,35,37–40].
In the discrete form of DP, the storage of each reservoir is discretized into a finite number of levels. By exhaustive enumeration over all possible combinations of discrete levels at each stage for all reservoirs in a system, global optimality can be assured in a discrete sense. However, the well-known "curse of dimensionality" [2] limits the application of DP to multi-state variable problems, as the state space increases exponentially with an increase in the number of state variables. This drastic increase in state space and the consequent random access memory (RAM) requirement quickly can exceed the hardware capacity of a modern computer [13]. A variety of DP variants, such as incremental dynamic programming (IDP) [15], dynamic programming successive approximations (DPSA) [14], incremental dynamic programming and successive approximations (IDPSA) [32] and discrete differential dynamic programming (DDDP) [9], have been proposed to alleviate the dimensionality problem. However, these variants all require an initial trajectory for each state variable. For a non-convex problem, there is no assurance of convergence to the global optimum. Recently, Mousavi and Karamouz [22] reduced the computation time of a DP model for a multi-reservoir system by diagnosing infeasible storage combinations and removing them from further computations. Zhao et al. [40] proposed an improved DP model for optimizing reservoir operation by taking advantage of the monotonic relationship between reservoir storage and the optimal release decision. However, the model only can be applied to reservoir operation with a concave objective function.
Because of the hardware limitations of a single computer as well as large-scale computing requirements, parallel computing has been applied in many fields [25]. In the water resources field, there are several successful examples. Bastian and Helmig [1]

http://dx.doi.org/10.1016/j.advwatres.2014.01.002
0309-1708/© 2014 Elsevier Ltd. All rights reserved.
F_{t+1}(S_{t+1}) = max_{S_t} { f_t(S_t, S_{t+1}) + F_t(S_t) }    (1)

where t is the time index, t ∈ [1, T]; S_t is the storage vector at the beginning of time step t, S_t = (S_t^1, …, S_t^i, …, S_t^n)^T; S_{t+1} is the storage vector at the end of time step t; i is the reservoir index, i ∈ [1, n]; F_t is the maximum cumulative return from the first time step to the beginning of the t-th time step resulting from the joint operation of n reservoirs (initially, F_1 = 0); F_{t+1} is the maximum cumulative return from the first time step to the end of the t-th time step resulting from the joint operation of n reservoirs; and f_t is the objective function to be maximized during time step t. Note that Eq. (1) is the inverted form of a DP model with reservoir storages as the decision variables. In the non-inverted DP model, the releases are the decision variables. For a deterministic DP model, the ending storage is related to the beginning storage by the continuity equation.
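The inverted recursion of Eq. (1) can be sketched for a single reservoir with discretized storage. This is a minimal illustration, not the authors' implementation: the stage return `f_t` and feasibility rule passed in below are made-up placeholders standing in for the benefit function and the continuity/bound constraints.

```python
# Minimal sketch of the inverted DP recursion of Eq. (1) for ONE reservoir
# with discretized storage. f_t(t, s, s_next) and feasible(t, s, s_next)
# are caller-supplied placeholders for the stage return and constraints.

def solve_dp(levels, T, f_t, feasible):
    """levels: discrete storage values; returns (F, back) where F maps the
    final storage to the maximum cumulative return and back[t-1] stores the
    optimal transitions of time step t for the later traceback."""
    NEG = float("-inf")
    F = {s: 0.0 for s in levels}          # F_1 = 0 for every state
    back = []                             # optimal transitions per step
    for t in range(1, T + 1):
        F_next, arg = {}, {}
        for s_next in levels:             # storage at the end of step t
            best, best_s = NEG, None
            for s in levels:              # enumerate beginning-of-step storages
                if F[s] > NEG and feasible(t, s, s_next):
                    v = f_t(t, s, s_next) + F[s]
                    if v > best:
                        best, best_s = v, s
            F_next[s_next], arg[s_next] = best, best_s
        F, back = F_next, back + [arg]
    return F, back
```

For instance, with levels {0, 1, 2}, a constant inflow of 1, release s + 1 − s_next bounded by [0, 2], and the release itself as the stage benefit, the recursion accumulates the maximum total release over the horizon.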
2.2. Constraints
In the operation of a multi-reservoir system, each individual reservoir is subject to its own set of constraints, while the reservoir system is subject to system constraints brought by the interconnection of reservoirs. Specifically, we consider the following constraints:

Continuity equation:

S_{t+1} = S_t + I_t − M·R_t    ∀t    (2)

where I_t is the vector of inflows to reservoirs (i = 1, …, n) during time step t; R_t is the vector of total releases from reservoirs (i = 1, …, n) during time step t, R_t = (R_t^1, …, R_t^i, …, R_t^n)^T; and M is the n × n reservoir system connectivity matrix. Without loss of generality, we assume that evaporation loss is balanced by precipitation.
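Given storages at both ends of a time step, Eq. (2) can be inverted to recover the release vector, R_t = M⁻¹(S_t + I_t − S_{t+1}). A small sketch for a hypothetical two-reservoir cascade (reservoir 1 releasing into reservoir 2; all numbers are illustrative, not from the paper):

```python
# Hypothetical 2-reservoir cascade: reservoir 1 releases into reservoir 2.
# Continuity (Eq. 2): S_{t+1} = S_t + I_t - M R_t, so the releases follow
# by forward substitution, since M is lower triangular for a cascade.
M = [[ 1.0, 0.0],    # reservoir 1: loses its own release
     [-1.0, 1.0]]    # reservoir 2: gains R_1, loses R_2
S_t    = [5.0, 4.0]  # storages at the beginning of step t
I_t    = [2.0, 0.0]  # natural inflows during step t
S_next = [4.0, 5.0]  # chosen storages at the end of step t

b = [S_t[i] + I_t[i] - S_next[i] for i in range(2)]   # M R_t = b
R1 = b[0] / M[0][0]
R2 = (b[1] - M[1][0] * R1) / M[1][1]
print([R1, R2])   # -> [3.0, 2.0]
```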
Initial and final reservoir storages:

S_1 = S^initial    (3)
S_{T+1} ≥ S^final    (4)

where S^initial and S^final are the vectors of initial storages and final expected storages of reservoirs (i = 1, …, n).

Lower and upper bounds on storages:

S^min_{t+1} ≤ S_{t+1} ≤ S^max_{t+1}    ∀t    (5)

where S^min_{t+1} and S^max_{t+1} are the vectors of minimum and maximum storages of reservoirs (i = 1, …, n) at the end of time step t.
Lower and upper bounds on releases:

For all reservoirs:

R^min_t ≤ R_t ≤ R^max_t    ∀t    (6)

where R^min_t is the vector of the minimum required releases from reservoirs (i = 1, …, n) during time step t; and R^max_t is the vector of maximum allowable releases from reservoirs (i = 1, …, n) during time step t.
For reservoirs in parallel:

PR^min_l(t) ≤ Σ_{i∈U_l} R^i_t ≤ PR^max_l(t)    ∀t, ∀l    (7)

where l is the index of river confluence points, l ∈ [1, L], and L is the number of river confluence points; U_l is the set of reservoirs in parallel at river confluence point l; and PR^min_l(t) and PR^max_l(t) are the minimum and maximum releases at river confluence point l during time step t.
Lower and upper bounds on outputs:

For all reservoirs:

N^min_t ≤ N_t ≤ N^max_t    ∀t    (8)

where N_t is the vector of outputs produced from reservoirs (i = 1, …, n) during time step t, N_t = (N_t^1, …, N_t^i, …, N_t^n)^T; N^min_t is the vector of minimum required outputs from reservoirs (i = 1, …, n) during time step t; and N^max_t is the vector of maximum allowable outputs from reservoirs (i = 1, …, n) during time step t.

For the entire reservoir system:

N^min(t) ≤ Σ_{i=1}^n N^i_t ≤ N^max(t)    ∀t    (9)

where N^min(t) is the minimum required output from the reservoir system during time step t; and N^max(t) is the maximum allowable output from the reservoir system during time step t.

Note that reservoir storages, releases and outputs are variables to be solved; the others are known input data in the optimization.
3. Parallelization strategy

In this study, we use the Windows HPC Server 2012 R2 operating system (OS). The HPC system consists of 20 IBM HS22 blades and an InfiniBand 40 Gbps network. Each blade has two Intel Xeon E5645 2.40 GHz CPUs (each CPU has six physical cores) and 12 GB of RAM. The Intel Xeon …

… F_{t+1}(S_{t+1}) in the memory for use in the next time step. The procedure continues until the final time step is reached.
Procedure II: Trace back the optimal path, based on the saved optimal transitions, from the final time step to the first time step to determine the consequent storage trajectories (i.e. S_{T+1} → ⋯ → S_2 → S_1) and release trajectories (i.e. R_T → ⋯ → R_2 → R_1).
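Procedure II is a standard DP backward sweep over the saved transitions. A minimal sketch, where the data layout (`back[t][s]` giving the optimal beginning-of-step storage that leads to end-of-step storage `s` at step t+1) is an assumption made for illustration:

```python
# Minimal sketch of Procedure II: recover the storage trajectory
# S_1 -> ... -> S_{T+1} from saved optimal transitions. `back[t][s]` is
# the optimal beginning-of-step storage leading to end-of-step storage s
# at time step t+1 (an assumed layout, for illustration only).

def trace_back(back, s_final):
    path = [s_final]
    for t in range(len(back) - 1, -1, -1):   # final step back to the first
        path.append(back[t][path[-1]])
    path.reverse()                           # S_1, S_2, ..., S_{T+1}
    return path
```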
For this analysis, we perform a conversion for Eq. (1). Assuming the storage of each reservoir is discretized into m levels, the number of storage combinations of all interconnected reservoirs is m^n at any time step (see Fig. 2). Note that storage is defined either at the beginning or the end of a time step. For the sake of clarity, we introduce C(p_t, t) to denote a storage combination of all interconnected reservoirs in a system at the beginning of time step t, where p_t denotes the serial number of a storage combination at the beginning of time step t, p_t ∈ [1, m^n]. Thus all possible storage combinations of all reservoirs over the entire planning horizon can be expressed abstractly as:
C = [ C(1, 2)    …  C(1, t)    …  C(1, T+1)
        ⋮            ⋮              ⋮
                     C(p_t, t)
        ⋮            ⋮              ⋮
      C(m^n, 2)  …  C(m^n, t)  …  C(m^n, T+1) ]   (m^n × T)    (10)

where C is the m^n × T matrix, C = [C_2, …, C_t, …, C_{T+1}]; C(p_1, 1) is the initial storage combination, often fixed; C* is denoted as the m^n × T matrix of optimal transitions or candidate paths, used to save the optimal previous storage combinations to the current ones so as to trace back the optimal path; and C*'s element C*(p_{t+1}, t+1) saves the optimal storage combination at the beginning of time step t to storage combination p_{t+1} at the end of time step t. In addition, all maximum cumulative returns from the first time step to the beginning of the t-th time step are denoted as F_t, where F_t is the m^n × 1 matrix. By substituting C(p_t, t) (p_t = 1, …, m^n) for S_t, the vector-form Eq. (1) can be rewritten as the equivalent scalar-form Eq. (11):

F_{t+1}(C(p_{t+1}, t+1)) = max_{p_t} { f_t(C(p_t, t), C(p_{t+1}, t+1)) + F_t(C(p_t, t)) }    (11)
Fig. 3 schematically illustrates the computing procedures and computer memory used for the serial DP algorithm. Fig. 4 shows the two computing procedures of the serial DP algorithm, based on the scalar-form Eq. (11). First, the DP algorithm executes Procedure I: For a given p_{t+1}, to determine the maximum cumulative return F_{t+1}(C(p_{t+1}, t+1)) and optimal transition C*(p_{t+1}, t+1), the objective function f_t of all possible transitions between C(p_t, t) (p_t = 1, 2, …, m^n) and C(p_{t+1}, t+1) must be examined, and F_t(C(p_t, t)) (p_t = 1, 2, …, m^n) has to be added as well. Then, the maximum cumulative return F_{t+1}(C(p_{t+1}, t+1)) and optimal transition C*(p_{t+1}, t+1) are saved in computer memory. After all maximum cumulative returns F_{t+1}(C(p_{t+1}, t+1)) (p_{t+1} = 1, 2, …, m^n) and optimal transitions C*(p_{t+1}, t+1) (p_{t+1} = 1, 2, …, m^n) are derived at the end of time step t, the algorithm proceeds to the next time step until the end of the planning horizon. Then the DP algorithm carries out Procedure II: The optimal storage transition for each reservoir as well as the corresponding release policy can be traced by a backward sweep.

Fig. 1. Schematic representation of the HPC system.
Fig. 2. Conversion from reservoir storages to reservoir storage combinations.
Fig. 3. Schematic illustration of computing procedures and computer memory used for the serial DP algorithm.
The computation time of the DP algorithm, mainly from Procedure I, can be approximately estimated as:

τ_1 = m^{2n} · Δτ · T    (12)

where τ_1 is the wall clock time of using one computing process; there are a total of m^{2n} evaluations of Eq. (1) at each time step; and all evaluations of Eq. (1) are assumed to require the same average wall clock time Δτ. It should be noted that this is an upper-bound estimate that includes the possible infeasible transitions, which are discarded in the actual computation. The infeasible transitions are the transitions that cannot be reached due to insufficient inflow to a reservoir [22].
On the other hand, Procedure I requires large RAM capacity. For illustration, we consider the smallest amount of RAM occupied when running the DP algorithm on a single computer, as shown in Fig. 3 (right). There are two main types of variables that need to be saved in the sequential decision-making process: one is in the form of integer variables, used to save the optimal transitions C*; the other saves the maximum cumulative returns F_t and F_{t+1} for time steps t and t+1 in two one-dimensional arrays, respectively. It should be noted that the maximum cumulative returns at time steps t and t+1 are updated until the final time step is reached; that is to say, F_{t−1}, F_{t−2}, …, F_1 are no longer saved in the RAM during time step t (see Fig. 3). For the sake of simplicity, we assume the two main variables occupy the same amount of RAM per element, U bytes. Thus the DP algorithm's total RAM amount simply can be expressed as:

RAM_1 = m^n · (T + 2) · U    (13)

where RAM_1 is the RAM amount (in bytes) for running the DP algorithm on one computing process.

From Eqs. (12) and (13), it can be seen that computation time and computer memory grow as m^{2n} and m^n, respectively, for a multi-reservoir DP problem, and for this reason the "curse of dimensionality" issue arises.
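Eqs. (12) and (13) are easy to evaluate numerically; the sketch below does so for an illustrative uniform discretization (m = 11, n = 4, T = 12, U = 8 bytes, all assumed values, not the paper's measured ones):

```python
# Back-of-the-envelope scaling of Eqs. (12)-(13). m, n, T and U are
# illustrative assumptions; Eq. (12) is an upper bound that still counts
# infeasible transitions.
def dp_cost(m, n, T, U=8):
    evals = (m ** (2 * n)) * T        # Eq. (12): evaluations of Eq. (1)
    ram   = (m ** n) * (T + 2) * U    # Eq. (13): bytes on one process
    return evals, ram

evals, ram = dp_cost(m=11, n=4, T=12)
print(evals, ram)   # both grow exponentially with the number of reservoirs n
```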
3.2. Peer-to-peer parallel paradigm

In order to apply the DP algorithm to multi-reservoir system optimization, it is necessary to develop an effective parallelization strategy for the DP algorithm in order to shorten computation time as well as alleviate the RAM bottleneck. The RAM bottleneck would make the DP algorithm un-implementable when a single computing process is used or several computing processes are used on a shared memory architecture.

The purpose of parallelization is to decompose the original task into several subtasks. Moreover, the parallelization distributes the total RAM associated with the task into several subtasks, each of which requires less RAM. Fig. 5 illustrates the way we distribute the above-mentioned two types of variables among K computing processes. We believe that the distribution only can be made along the vertical direction since the DP model features accumulation time step by time step (refer to Eq. (1)). Parallelization is based on the peer-to-peer parallel paradigm, consisting of two types of computing processes, i.e. K peer processes and one transfer process. Each peer process is in charge of a sub-allocation of the total RAM. The total number of storage combinations m^n is distributed among the K peer processes, with the number of storage combinations m_k allocated to peer process k, k ∈ [1, K], yielding m^n = Σ_{k=1}^K m_k. For the sake of clarity, let C* = (C*_1, …, C*_k, …, C*_K)^T and F_t = (F_{1,t}, …, F_{k,t}, …, F_{K,t})^T, as shown in Fig. 5.
Parallelization should consider the concurrency and dependency among subtasks undertaken by the K peer processes. Here, the term "concurrency" refers to several subtasks that can be executed simultaneously on multiple computing processes. The term "dependency" refers to a computing process that can perform a subtask only after the other computing processes have finished certain subtasks. As far as concurrency is concerned, during time step t the computation of each optimal transition, say C*(p_{t+1}, t+1), is independent of the others in the DP algorithm; that is, the computations of all C*(p_{t+1}, t+1) (p_{t+1} = 1, 2, …, m^n) can be executed simultaneously. As far as dependency is concerned, to compute C*(p_{t+1}, t+1) during time step t, all optimal transitions C*(p_t, t) (p_t = 1, 2, …, m^n) and maximum cumulative returns F_t(C(p_t, t)) (p_t = 1, 2, …, m^n), which are saved among the K peer processes in this paradigm (see Fig. 5), should be known beforehand. In other words, a peer process cannot accomplish the computation for C*(p_{t+1}, t+1) unless all peer processes have finished all computations for C*(p_t, t) (p_t = 1, 2, …, m^n) and F_t(C(p_t, t)) (p_t = 1, 2, …, m^n). Thus there is an underlying synchronization during each time step in this paradigm.

Fig. 4. Flowcharts of Procedure I and Procedure II of the serial DP algorithm.
The two types of processes have the following basic functions. Each peer process evaluates a subtask of the current maximum cumulative returns and optimal transitions, based on the previous maximum cumulative returns completed by the K peer processes; after the evaluations, the subtask of current maximum cumulative returns and the optimal transitions are saved in the RAM. The transfer process is in charge of communicating maximum cumulative returns among all peer processes and tracing back the optimal path.

Figs. 6 and 7 show the two procedures (Procedure I and Procedure II) of the parallel DP algorithm. Unlike the procedures of the serial DP algorithm in Fig. 4, some inter-process communication statements are added between the peer processes and the transfer process.

Fig. 6 presents the flowchart of peer process k and the transfer process of Procedure I of the parallel DP algorithm. Peer process j is any one of the K peer processes that has the same workflows as peer process k; peer process k or the transfer process receives the message of peer process j based on the "first come, first served" principle; and w is a counter variable. The more specific workflows of Procedure I of the parallel DP algorithm include the following steps:
(1) Start and allocate each process's sub-allocation of the total RAM;
(2) When t = 1, for each peer process, say peer process k, initialize F_{k,2} and C*_k(2) under the given F_1 = 0 and C(p_1, 1);
(3) Each peer process, say peer process k, sends F_{k,t+1} to the transfer process at time step t;
(4) The transfer process receives F_{j,t+1} from a certain peer process j based on the principle of "first come, first served" and sends F_{j,t+1} to all the peer processes;
(5) Repeat Step (4) until F_{j,t+1} from each of the peer processes (i.e. j = 1, …, K) is received and sent by the transfer process;
(6) Each peer process, say peer process k, receives F_{j,t} at time step t+1 (i.e. F_{j,t+1} at time step t) from the transfer process based on the principle of "first come, first served" and carries out the recursive equation to compute and update F_{k,t+1} and C*_k(t+1), based on the sums of the objective function resulting from the transitions from C_j(t) to C_k(t+1) and F_{j,t};
(7) Repeat Step (6) until F_{j,t} is received K times from the transfer process (i.e. j = 1, …, K);
(8) Repeat Steps (3)–(7) until the final time step T is reached.
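The relay pattern of Steps (3)–(5), in which every peer's slice of F_t passes through the transfer process and is re-broadcast to every peer, can be mimicked serially with plain queues. This is a toy simulation for intuition only; the actual algorithm exchanges these messages with MPI point-to-point calls:

```python
from collections import deque

# Toy serial simulation of Steps (3)-(5): each peer sends its slice of
# F_t to the transfer process, which relays every slice to every peer.
# Plain queues stand in for the MPI message channels (an illustrative
# simplification, not the paper's implementation).

def relay_round(peer_slices):
    """peer_slices: {k: list of (serial_number, return) pairs held by peer k}."""
    to_transfer = deque(peer_slices.items())   # messages in arrival order
    inboxes = {k: {} for k in peer_slices}     # what each peer ends up with
    while to_transfer:                         # "first come, first served"
        j, slice_j = to_transfer.popleft()     # transfer receives from peer j
        for k in inboxes:                      # ... and re-broadcasts to all
            inboxes[k][j] = slice_j
    return inboxes
```

After one round, every peer holds the complete F_t, which is exactly the precondition Step (6) needs before the recursion can proceed.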
After Procedure I, each peer process, say peer process k, saves C*_k, F_{k,T} and F_{k,T+1} in its RAM. Then, Procedure II of the parallel DP algorithm begins, as shown in Fig. 7, and the workflows include the following steps:

(1) Each peer process, say peer process k, sends F_{k,T+1} and C*_k(T+1) to the transfer process;
(2) The transfer process receives F_{j,T+1} and C*_j(T+1) from a certain peer process j based on the principle of "first come, first served", compares the elements in F_{j,T+1}, and updates the maximum F_{T+1}(C(p_{T+1}, T+1)) and consequent C*(p_{T+1}, T+1);
(3) Repeat Step (2) until F_{j,T+1} and C*_j(T+1) from all peer processes (i.e. j = 1, …, K) are received and the maximum F_{T+1}(C(p_{T+1}, T+1)) and consequent C*(p_{T+1}, T+1) are derived;

Fig. 5. Schematic illustration of computing procedures and computer memory used for the parallel DP algorithm.
Fig. 6. Flowchart of peer process k and transfer process of Procedure I of the parallel DP algorithm.
Fig. 7. Flowchart of peer process k and transfer process of Procedure II of the parallel DP algorithm.
(4) If t > 1, go to Step (5); otherwise go to Step (10);
(5) Identify the peer process that saves C*(p_t, t) based on C*(p_{t+1}, t+1), and let X = 1 (where X is used to judge which choice the peer process should make) for that peer process, X = 2 for the other peer processes, and W = 0 (where W is used to judge whether to end the transfer process);
(6) The transfer process sends X to all the peer processes;
(7) Each peer process, say peer process k, receives X from the transfer process. If X = 1, the peer process sends C*_k(t) to the transfer process and then repeats receiving X for the next time step. If X = 2, the peer process directly repeats receiving X for the next time step;
(8) The transfer process receives the required C*_j(t), which saves the C*(p_t, t) at time step t (i.e. C*(p_{t+1}, t+1) at time step t−1), from the required peer process, say peer process j;
(9) Repeat Steps (4)–(8);
(10) Let X = 3 for all peer processes and let W = 1;
(11) Each peer process ends its process once it receives X = 3 from the transfer process;
(12) The transfer process ends its process when W = 1.

The optimal path is derived by the transfer process with Procedure II. Furthermore, the consequent storages and releases of multiple reservoirs also can be determined for all time steps with the derived optimal path.
We consider a straightforward workload balancing strategy to decrease the idle computing processes and further obtain good performance on parallel efficiency. The strategy is to approximately allocate the same amount of subtask to each peer process. Thus the number of storage combinations saved in a peer process is defined as:

m_k = a + 1,    1 ≤ k ≤ b
m_k = a,        b < k ≤ K    (14)

where

a = u(m^n / K)    (15)
b = v(m^n, K)    (16)

and where u is the floor integer function (e.g. u(5/3) = 1) and v is the remainder function (e.g. v(5, 3) = 2). However, we note that the discarded infeasible solutions (see Section 3.1) still result in imbalanced tasks assigned to the peer processes.
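Eqs. (14)–(16) are the usual near-even split of m^n items over K processes, which a single `divmod` expresses directly:

```python
# Near-even allocation of `total` storage combinations over K peer
# processes (Eqs. 14-16): the first b processes get a+1 combinations
# each, the remaining K-b processes get a each.

def allocate(total, K):
    a, b = divmod(total, K)   # a = floor(total/K), b = remainder
    return [a + 1 if k <= b else a for k in range(1, K + 1)]
```

For the four-reservoir example's 21,296 combinations over 350 peer processes, this assigns each process either 60 or 61 combinations, and the slices always sum back to the total.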
The computation time of using the parallel DP algorithm is estimated as:

τ_K = (τ′ + τ″ + τ‴) / K    (17)

where τ_K is the wall clock time of using K peer processes, consisting of the computation time fraction τ′, the communication time fraction τ″ and the workload imbalance time cost τ‴. We employ the parallel efficiency E_K as a measurement to evaluate the parallel performance, calculated as:

E_K = τ_1 / (K · τ_K)    (18)

The parallel efficiency of a parallel program is dependent on the ratio between the communication time and the computation time. If the ratio is small, the parallel efficiency is high; otherwise the parallel efficiency is low.
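Eqs. (17) and (18) can be checked against one configuration of the four-reservoir experiment reported in Table 1 (K = 350, τ′ = 3137.0 s, τ″ + τ‴ = 243.0 s, with serial time τ_1 = 1818.6 s):

```python
# Parallel wall clock time (Eq. 17) and parallel efficiency (Eq. 18)
# for one configuration taken from Table 1 of the four-reservoir example.
def parallel_metrics(tau1, comp, comm_plus_imbalance, K):
    tauK = (comp + comm_plus_imbalance) / K   # Eq. (17)
    EK = tau1 / (K * tauK)                    # Eq. (18)
    return tauK, EK

tauK, EK = parallel_metrics(1818.6, 3137.0, 243.0, 350)
print(round(tauK, 1), round(EK, 2))   # about 9.7 s and 0.54, matching Table 1
```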
The RAM of the parallel DP algorithm allocated to each peer process, say peer process k, is estimated as:

RAM_k = m^n · (T + 2) · U / K    (19)

where RAM_k is the RAM amount for each peer process (bytes). For a system of distributed memory architecture, such as that in Fig. 1, all computing processes in a blade share RAM. Suppose H computing processes share RAM whose size is RAM bytes. When using the parallel DP algorithm, we roughly can determine the upper bound of RAM usage by the following inequality:

m^n · (T + 2) · U / K · H ≤ RAM    (20)

From Eq. (20), we see that the RAM requirement for a multi-reservoir DP problem can be alleviated by increasing the number of computing processes (K). This is a breakthrough in that multi-reservoir system optimization problems that previously could not be solved by a serial DP algorithm on a single computer because of RAM requirements now can be solved with parallel computing. However, we note that an increase in the number of reservoirs will result in an increase in computing process demand.
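Eq. (20) can be rearranged to give the smallest K that fits a problem within the per-node RAM, K ≥ m^n (T+2) U H / RAM. A sketch with illustrative numbers (the per-element size U = 8 bytes and the problem dimensions below are assumptions, not the paper's values):

```python
import math

# Smallest number of peer processes K satisfying Eq. (20):
# m^n * (T+2) * U / K * H <= RAM   =>   K >= m^n * (T+2) * U * H / RAM.
# U and the problem sizes passed in are illustrative assumptions.

def min_processes(m, n, T, U, H, ram_bytes):
    need = (m ** n) * (T + 2) * U * H   # total bytes if K were 1
    return math.ceil(need / ram_bytes)
```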
4. The four-reservoir example

4.1. Problem description

We first apply the developed parallel DP algorithm to the classic, hypothetical four-reservoir problem (see Fig. 8). This problem has been studied in the literature by several researchers. Larson [15] solved the problem by IDP. Heidari et al. [9] solved the same problem by DDDP. More recently, Wardlaw and Sharif [34] used a genetic algorithm while Kumar and Reddy [12] used particle swarm optimization to evaluate the performance of their heuristic techniques.

From Fig. 8, we see that the reservoir system consists of both series and parallel connections (L = 1, U_1 = {1, 3}). The reservoir system connectivity matrix is:

M = [  1   0   0   0
       0   1   0   0
       0  −1   1   0
      −1   0  −1   1 ]    (21)

The inflows are assumed to be constant for the entire operating period and are set as:

I = (2, 3, 0, 0)^T    (22)
Fig. 8. The four-reservoir problem.
The main objectives of the system operation are to maximize the benefits from hydropower generation from reservoirs i = 1, 2, 3, 4 as well as the benefits from irrigation from reservoir i = 4 over 12 time steps (n = 4, T = 12). We expand the objective function f_t to be maximized during time step t to:

f_t = Σ_{i=1}^4 b_i(t)·R_i(t) + b_5(t)·R_4(t),                                   t < 12
f_t = Σ_{i=1}^4 b_i(t)·R_i(t) + b_5(t)·R_4(t) + Σ_{i=1}^4 g_i(S_i(13), d_i),    t = 12    (23)

where R_i(t) (i = 1, 2, 3, 4) can be computed directly from Eq. (2) once S_i(t) and S_i(t+1) (i = 1, 2, 3, 4) are chosen; b_i(t) is the benefit function (refer to [15]); and g_i is the penalty function for not meeting the ending storage constraint (d_i):

g_i(S_i(13), d_i) = −40·(S_i(13) − d_i)^2,    S_i(13) ≤ d_i
g_i(S_i(13), d_i) = 0,                        S_i(13) > d_i    (24)
Other constraints include:

S^min = (0, 0, 0, 0)^T,    S^max = (10, 10, 10, 15)^T    (25)

S^initial = (5, 5, 5, 5)^T,    S^final = (5, 5, 5, 7)^T    (26)

R^min = (0, 0, 0, 0)^T,    R^max = (3, 4, 4, 7)^T    (27)

Note that Eqs. (7)–(9) are inactive in this example.
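The end-of-horizon penalty of Eq. (24) is simple to express directly. The quadratic weight of 40 below follows the classic formulation of this benchmark problem as reconstructed here; treat it as an assumption:

```python
# Penalty of Eq. (24) for missing the target ending storage d_i: a
# quadratic penalty (weight 40, per the classic four-reservoir problem)
# applied only when the final storage falls short of the target.

def ending_storage_penalty(s_final, d, weight=40.0):
    if s_final <= d:
        return -weight * (s_final - d) ** 2   # shortfall penalized quadratically
    return 0.0                                # target met or exceeded: no penalty
```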
4.2. Results and discussion

In this example, the storage of each reservoir is discretized with ΔS = 1 unit, and thus the total number of storage combinations is 11 × 11 × 11 × 16 = 21,296. We solved the four-dimensional DP model on the HPC system described above. First, we applied the serial DP algorithm with a single computing process. The global optimum is 401.3. The wall clock time (τ_1) was 1818.6 s. Then we performed several executions using the developed parallel DP algorithm, varying the number of peer processes. As expected, all executions produced the same global optimum, but with different wall clock times. The computation time fraction τ′, the sum of communication time fraction and workload imbalance time cost τ″ + τ‴, the wall clock time τ_K and the parallel efficiency E_K are presented in Table 1. As we can see, the wall clock time τ_K is drastically reduced, from 1818.6 s to 9.7 s with 350 peer processes. This reduction is quite substantial.
Several probable reasons affect the parallel efficiency:

(1) The hyper-threading technology. (a) If there are one peer process and one transfer process, we can conclude from the result that the OS schedules the two processes on one physical core. The peer process does not compete with the transfer process for execution resources, and thus the computation time fraction is almost the same as with the serial DP algorithm. (b) If there are two peer processes and one transfer process, we conclude that one peer process and the transfer process are scheduled on one physical core, while the other peer process is scheduled on the other physical core. Because the three processes do not compete for execution resources, the parallel efficiency is as high as 0.99. (c) If there are three peer processes and one transfer process, there would be two peer processes scheduled on the same physical core. Because the two peer processes compete for execution resources, the parallel efficiency is reduced to 0.72. (d) By this reasoning, parallel efficiencies increase when there are even numbers of peer processes and decrease when there are odd numbers of peer processes. However, as the number of peer processes becomes large, the fluctuations are not obvious.

(2) The workload imbalance. From Fig. 9, we see that as the number of peer processes increases, the sum of communication time fraction and workload imbalance time cost increases, then decreases, and then increases again. This is because when the number of peer processes is small, the workload imbalance (because various numbers of infeasible solutions assigned to the peer processes will be discarded [see Section 3.1]) is significant, and the fast peer processes must wait until the slowest peer process finishes its task. However, when the number of peer processes becomes large, the amount of task assigned to each peer process decreases and the influence of workload imbalance diminishes. Although there are still imbalanced tasks among the peer processes, the fast peer processes do not have to wait long. Because the sum of communication time fraction and workload imbalance time cost increases linearly as the number of peer processes increases, we can infer that the developed parallel DP algorithm is scalable and not restricted by the increase in the number of cores.

(3) The turbo boost technology (also referred to as dynamic overclocking). The clock rate depends on the CPU's thermal limit, the number of cores in use as well as the maximum frequency of the active cores. If the CPU is below its thermal limits, the operating frequency will increase; otherwise the operating frequency will be fixed at the standard frequency.
Table 1
The computation time fraction, the sum of communication time fraction and workload imbalance time cost, the wall clock time and the parallel efficiency of the parallel DP algorithm for the classic four-reservoir problem.

No. of peer processes | τ′ (s)  | τ″ + τ‴ (s) | τ_K (s) | E_K
Serial                | –       | –           | 1818.6  | 1.00
1                     | 1836.2  | 0.0         | 1836.2  | 0.99
2                     | 1833.8  | 1.3         | 917.5   | 0.99
3                     | 2427.5  | 100.6       | 842.7   | 0.72
4                     | 2226.3  | 123.4       | 587.4   | 0.77
5                     | 2595.5  | 115.2       | 542.1   | 0.67
6                     | 2373.6  | 181.3       | 425.8   | 0.71
7                     | 2688.3  | 140.5       | 404.1   | 0.64
8                     | 2510.6  | 194.1       | 338.1   | 0.67
9                     | 2732.9  | 132.6       | 318.4   | 0.63
10                    | 2593.3  | 153.7       | 274.7   | 0.66
25                    | 2948.3  | 90.9        | 121.6   | 0.60
50                    | 2987.1  | 60.6        | 61.0    | 0.60
75                    | 3016.1  | 33.3        | 40.7    | 0.60
100                   | 3036.3  | 93.7        | 31.3    | 0.58
125                   | 3073.2  | 75.3        | 25.2    | 0.58
150                   | 3058.1  | 106.0       | 21.1    | 0.57
175                   | 3066.6  | 140.8       | 18.3    | 0.57
200                   | 3059.8  | 134.0       | 16.0    | 0.57
225                   | 3115.4  | 147.1       | 14.5    | 0.56
250                   | 3087.7  | 197.5       | 13.1    | 0.55
275                   | 3106.8  | 210.5       | 12.1    | 0.55
300                   | 3113.3  | 216.7       | 11.1    | 0.55
325                   | 3080.3  | 250.6       | 10.2    | 0.55
350                   | 3137.0  | 243.0       | 9.7     | 0.54
5. A real-world application

5.1. Problem description

We now apply the developed parallel DP algorithm to a real-world reservoir system located in the central Yangtze River Basin, China (see Fig. 10). The system consists of five reservoirs (n = 5): the Three Gorges Project (TGP) and the Gezhouba (GZB) on the Yangtze River; and the Shuibuya (SBY), the Geheyan (GHY) and the Gaobazhou (GBZ) on the Qingjiang River (which joins the Yangtze River at the town of Zhicheng). As shown in Fig. 10 (right), these reservoirs are numbered from i = 1 to i = 5 and the river confluence point is denoted as l = 1 (U_1 = {2, 5}). The natural inflow between the TGP and the GZB can be ignored because of their close proximity. At present, all five reservoirs are operated by two corporations. The TGP and GZB cascade hydropower plants (TGP–GZB) are managed by the China Three Gorges Corporation, while the SBY, GHY and GBZ cascade hydropower plants (SBY–GHY–GBZ) are under the jurisdiction of the Hubei Qingjiang Hydroelectric Development Co., Ltd. The joint operation of this five-reservoir system is of major interest to the Ministry of Science and Technology of China.
In this reservoir system, the SBY is a multi-year storage reservoir with 24.0 × 10^8 m^3 of active storage capacity; the GHY is a yearly storage reservoir with 11.5 × 10^8 m^3 of active storage capacity; the TGP is a seasonal storage reservoir with 221.5 × 10^8 m^3 of active storage capacity; and the GZB and GBZ are daily storage reservoirs with active storage capacities of 0.8 × 10^8 m^3 and 0.5 × 10^8 m^3, respectively, much smaller than the other three reservoirs. The main characteristics and the operating rules of the five reservoirs are listed in Table 2, where S_max − S_min denotes the active storage capacity; H(·) is the conversion function from reservoir storage to water level; N_min denotes the guaranteed output production of a hydropower plant; and N_max denotes the installed capacity of a hydropower plant.
Fig. 9. Computation time fraction versus the sum of communication time fraction and workload imbalance time cost.
Fig. 10. The five-reservoir system.
Table 2
Main characteristics and operating rules of the five reservoirs.

Reservoir                     TGP       GZB       SBY      GHY      GBZ
S_min (10^8 m^3)              171.5     6.3       19.0     18.7     3.5
S_max (10^8 m^3)              393.0     7.1       43.0     30.2     4.0
S_max − S_min (10^8 m^3)      221.5     0.8       24.0     11.5     0.5
H(S_min) (m)                  145.0     63.0      350.0    180.0    78.0
H(S_max) (m)                  175.0     66.0      400.0    200.0    80.0
R_min (m^3/s)                 6000      6000      0        0        0
R_max (m^3/s)                 101,700   113,400   13,200   18,000   18,400
N_min (MW)                    4990      1040      310      241.5    77.3
N_max (MW)                    22,400    2757      1840     1212     270
We use three scenarios for the case study, as shown in Table 3, where PR_min and PR_max, respectively, denote the mandatory release and the maximum allowable release at river confluence point l = 1. Scenario 1 is built for the TGP–GZB; scenario 2 is for the SBY–GHY–GBZ; and scenario 3 encompasses the entire five-reservoir system. Scenario 3 is conducted under the assumption that the five-reservoir system is under joint operation. To test our proposed methodology, we further assume that energy generation is transmitted to the same grid. This assumption may differ slightly from what actually occurs in practice. Furthermore, for confluence point l = 1, the
Table 3
Three scenarios used in the real-world problem.

Scenario   System                       PR_min (m^3/s)   PR_max (m^3/s)   N_min (MW)   N_max (MW)
1          TGP–GZB                      6000             56,700           6030         20,957
2          SBY–GHY–GBZ                  0                18,400           628.8        3322
3          The five-reservoir system    6000             56,700           6658.8       24,279
Table 4
The optimal energy generation from the three scenarios (10^8 kW h).

Scenario    TGP      GZB      SBY     GHY     GBZ     Total
1           8364.9   1528.8   –       –       –       9893.7
2           –        –        330.5   274.9   87.8    693.2
3           8411.9   1525.0   342.1   271.5   86.0    10,636.5
(3-1-2)*    47.0     −3.8     11.6    −3.4    −1.8    49.6

* Note: (3-1-2) denotes scenario 3 minus scenario 1 minus scenario 2.
(a) Release at river confluence point l = 1 from scenario 1.
(b) Release at river confluence point l = 1 from scenario 3.
Fig. 11. Comparison of releases at river confluence point l = 1, scenario 1 versus scenario 3.
mandatory release is 6000 m^3/s for both navigational and ecological purposes, and the maximum allowable release is 56,700 m^3/s for downstream flood protection.
The objectives of the three scenarios are to maximize energy generation over a recent 10-year horizon (June 2000 through May 2010). The operating period is divided into 360 time steps
(a) Output production from scenario 1.
(b) Output production from scenario 2.
(c) Output production from scenario 3.
Fig. 12. Comparison of output productions, scenarios 1 and 2 versus scenario 3.
(each with a length of about 10 days; T = 360). We expand the objective function f_t to be maximized during time step t to:

f_t = Σ_{i=1}^{n} N_i(t) Δt = Σ_{i=1}^{n} 9.81 η_i R′_i(t) H_i(t) Δt,  ∀t    (28)

where

R_i(t) = R′_i(t) + R″_i(t),  ∀t    (29)

H_i(t) = HF_i(t) − HT_i(t),  ∀t    (30)

η_i is the hydropower plant efficiency at reservoir i; R′_i(t) and R″_i(t) are the power release and non-power release from reservoir i during time step t; H_i(t) is the average head at reservoir i during time step t, which is the difference between the average reservoir fore-bay water level HF_i(t) (a function of the beginning- and end-of-period reservoir storages) and the tail-race water level HT_i(t) (a function of total release) at reservoir i during time step t; and Δt is the time interval. In this case, each reservoir operates under its own constraints (see Table 2), while the reservoir system operates under the system constraints (see Table 3).
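Eq. (28) can be checked with a short numerical sketch. The two-plant values below are hypothetical, chosen only to illustrate the formula; with release in m^3/s and head in m, the term 9.81·η·R′·H gives power in kW, so multiplying by the step length in hours yields energy in kW h:

```python
def step_energy(eta, power_release, head, dt_hours):
    """Energy generated during one time step, per Eq. (28):
    f_t = sum_i 9.81 * eta_i * R'_i(t) * H_i(t) * dt.
    Each summand is power in kW (release in m^3/s, head in m)."""
    return sum(9.81 * e * r * h
               for e, r, h in zip(eta, power_release, head)) * dt_hours

# Illustrative (hypothetical) values for a two-plant system:
eta = [0.90, 0.85]          # plant efficiencies eta_i
r_power = [5000.0, 800.0]   # power releases R'_i(t), m^3/s
head = [100.0, 40.0]        # average heads H_i(t), m
energy_kwh = step_energy(eta, r_power, head, dt_hours=240.0)  # ~10-day step
```

Any consistent unit system works; the 9.81 factor is gravitational acceleration times water density folded into kW units.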
5.2. Results and discussion
In the real-world case, the GZB and GBZ are treated as run-of-river hydropower plants because of their much smaller storages compared with the TGP, SBY and GHY (see Table 2), which are operated as storage reservoirs. The ratio of the active storages of the three storage reservoirs is approximately 20:2:1 (i.e., 221.5 × 10^8 m^3 : 24.0 × 10^8 m^3 : 11.5 × 10^8 m^3), and thus we discretize them into 200, 20 and 10 levels, respectively. The total numbers of storage combinations are 200 × 1 = 200, 20 × 10 = 200 and 200 × 20 × 10 = 40,000 for the three scenarios. Similarly, the computation was performed on the HPC system with Oracle 11g serving as the database system for data access.
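The combination counts above follow directly from the discretization; a minimal sketch (the reservoir names and the `levels` mapping are our own bookkeeping, with run-of-river plants contributing a single state):

```python
from math import prod

# Discretization levels per reservoir; run-of-river plants (GZB, GBZ) have 1 state
levels = {"TGP": 200, "GZB": 1, "SBY": 20, "GHY": 10, "GBZ": 1}

scenarios = {
    1: ["TGP", "GZB"],                        # TGP–GZB cascade
    2: ["SBY", "GHY", "GBZ"],                 # SBY–GHY–GBZ cascade
    3: ["TGP", "GZB", "SBY", "GHY", "GBZ"],   # joint five-reservoir system
}

# Storage combinations = product of the discretization levels involved
combos = {s: prod(levels[r] for r in rs) for s, rs in scenarios.items()}
# combos -> {1: 200, 2: 200, 3: 40000}
```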
The optimal energy generations of the three scenarios are shown in Table 4. The sum of the optimal energy production of scenario 1 and scenario 2 is 9893.7 + 693.2 = 10,586.9 × 10^8 kW h, less than the 10,636.5 × 10^8 kW h of scenario 3. This indicates that the coordinated operation of the five reservoirs can result in an average 4.96 × 10^8 kW h energy production increase (or about a 1.24 × 10^8 CNY increase, assuming the energy price of the five reservoirs is 0.25 CNY/kW h [18]) per year for the system. The comparisons of the releases at river confluence point l = 1 and of the output productions between scenarios 1 and 2 versus scenario 3 are shown in Figs. 11 and 12, respectively. Clearly, the SBY–GHY–GBZ system helps the TGP–GZB system relieve the stress of the water supply demand at river confluence point l = 1. The five-reservoir system can provide the grid with more secure and reliable output production.
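The gain from joint operation is simple arithmetic over the Table 4 totals and the 10-year horizon; as a sanity check (energy price as assumed above, per [18]):

```python
# Energy totals from Table 4, in units of 10^8 kWh over the 10-year horizon
separate = 9893.7 + 693.2       # scenarios 1 + 2, operated independently
joint = 10636.5                 # scenario 3, joint operation

gain_total = joint - separate   # ~49.6 x 10^8 kWh over 10 years
gain_per_year = gain_total / 10           # ~4.96 x 10^8 kWh per year
value_per_year = gain_per_year * 0.25     # ~1.24 x 10^8 CNY/yr at 0.25 CNY/kWh
```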
For scenario 3, the serial DP algorithm took more than 10 days (τ_1). As with the four-reservoir problem, we performed several executions of the developed parallel DP algorithm, varying the number of peer processes. The computing procedure is identical across executions, so the objective function value and the resulting storages and releases are the same. The wall clock time τ_K and the parallel efficiency E_K for various numbers of peer processes are shown in Table 5. The wall clock time is reduced from 266.83 h to 1.54 h with 350 peer processes. Again, this reduction is quite substantial, and an increase in the number of peer processes is consistently accompanied by a decrease in wall clock time. Although each peer process is allocated approximately the same amount of subtask, workload imbalance still exists, which may result in slightly lower efficiency relative to the hypothetical four-reservoir problem.
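The efficiencies in Table 5 follow from E_K = τ_1/(K·τ_K), i.e., speedup divided by process count. A sketch using the reported wall clock times; values recomputed this way agree with the table to within ±0.01, the rounding of the reported times:

```python
def parallel_efficiency(t_serial, t_parallel, k):
    """E_K = tau_1 / (K * tau_K): achieved speedup divided by K peer processes."""
    return t_serial / (k * t_parallel)

# Wall clock times (hours) from Table 5 for the five-reservoir system
t1 = 266.83
runs = {50: 10.11, 100: 5.17, 150: 3.51, 200: 2.61, 250: 2.12, 300: 1.77, 350: 1.54}

eff = {k: round(parallel_efficiency(t1, tk, k), 2) for k, tk in runs.items()}
```

The near-constant efficiency of about 0.5 across 50 to 350 processes is what the text refers to as good scalability.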
6. Conclusions
In this paper, we establish a multi-dimensional DP model for optimizing the joint operation of a multi-reservoir system, considering several indispensable constraints on the individual reservoirs and on the reservoir system. We illustrate the DP algorithm's solution space with the help of a matrix and estimate the smallest RAM required by the DP algorithm, in full preparation for parallelization. We believe an effective parallelization strategy for the DP algorithm should be designed specifically to alleviate the RAM bottleneck. For instance, the discretization should be sufficiently fine for an accurate simulation in real-time operation. If we use a discretization level of 10 cm for each of the five reservoirs in Section 5 and 30 days as the inflow forecast period, the RAM requirement increases to approximately 2.09 TB when using the serial DP algorithm. In this situation, a single computer or the shared memory architecture can no longer meet the large RAM requirement. The distributed memory architecture and the message passing interface (MPI) protocol make it possible to develop a parallel DP algorithm that considers both distributed computing and distributed computer memory and, further, to solve previously unsolvable multi-reservoir DP problems with parallel computing. In the above instance, the RAM requirement is reduced to approximately 2.09/K TB for each peer process, where K is the number of peer processes.
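A back-of-envelope reconstruction of the terabyte-scale figure is possible under our own assumptions (10 cm levels over each operating range in Table 2, 30 daily steps, and 4 bytes per stored transition entry); the paper's exact accounting may differ, but the order of magnitude matches the ~2.09 TB cited:

```python
# Operating water-level ranges (m) from Table 2, H(S_max) - H(S_min)
ranges_m = {"TGP": 175.0 - 145.0, "GZB": 66.0 - 63.0, "SBY": 400.0 - 350.0,
            "GHY": 200.0 - 180.0, "GBZ": 80.0 - 78.0}

# Number of 10 cm discretization levels per reservoir (our assumption)
levels = {r: round(h / 0.10) for r, h in ranges_m.items()}  # 300, 30, 500, 200, 20

combos = 1
for n in levels.values():
    combos *= n                 # 1.8e10 joint storage combinations

T = 30                          # 30-day forecast period, daily steps (assumed)
bytes_per_entry = 4             # one stored transition index per combo/step (assumed)
ram_tb = combos * T * bytes_per_entry / 1e12   # ~2.2 TB, order of the cited figure
```

Dividing `ram_tb` by the number of peer processes K gives the per-process footprint, which is what makes the distributed memory approach feasible.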
Using the developed parallel DP algorithm based on the peer-to-peer parallel paradigm, we solve the classic four-reservoir problem and a real-world five-reservoir system on an HPC system with up to 350 cores. The results indicate that the wall clock times are reduced drastically as the number of computing processes is increased. In both cases, we observe good performance in parallel efficiency. Furthermore, the real-world results indicate that operational effectiveness can be improved and the beneficial uses can be maximized by joint operation of the interconnected five reservoirs. Specifically, (1) the energy production of the system can be increased by an average of 4.96 × 10^8 kW h per year; (2) the stress of meeting the minimum water supply demand at river confluence point l = 1 can be relieved greatly by the inclusion of the SBY–GHY–GBZ system with the TGP–GZB system; and (3) more secure and reliable output productions can be guaranteed and transmitted to the grid.
The Casti et al. [5] paper sends a message of confidence in constructing computers with many processing elements to deal with future DP problems, even though at the time of that study (i.e., the 1970s) NASA's parallel computer had only 64 processing elements. Over the last twenty years, however, we have witnessed the rapid development of supercomputing worldwide. According to the TOP500 supercomputer sites' historical lists (http://www.top500.org/), the number of cores in top supercomputers has increased from several thousand to several million, and computer memory has
Table 5
The wall clock time and the parallel efficiency of the parallel DP algorithm for the five-reservoir system.

No. of peer processes   Serial   50      100    150    200    250    300    350
τ_K (h)                 266.83   10.11   5.17   3.51   2.61   2.12   1.77   1.54
E_K                     1.00     0.53    0.52   0.51   0.52   0.50   0.50   0.50
increased correspondingly. Given such advances, we predict that the barrier in computing resources may become inconsequential in the future. Moreover, we believe the benefits resulting from the hydropower generation of a multi-reservoir system far exceed the expense of the computing resources. We therefore hope this paper will stimulate future research on parallel computing in the field of reservoir operation.
Future work should implement the parallel DP algorithm using massively parallel computing resources. Indeed, we have noticed a trend toward massively parallel computing resources for water resources problems. For instance, Reed and Kollat [26] employed a massively parallel multi-objective evolutionary algorithm for a groundwater monitoring application with a maximum of 8192 processors, while Kollet et al. [11] and Maxwell [20], respectively, performed ParFlow simulations in various coupled modes utilizing up to 16,384 processors.
Future work also could introduce distributed database management techniques to further alleviate the RAM requirements of multi-reservoir DP problems. This paper distributes the optimal transitions C*_t and the cumulative returns F*_{t+1} and saves them in the RAM of several computing processes. Future work could distribute and save them in several databases on various computers. This would take advantage of large hard disk capacities, accessing the required data from the database of the identified computer when the DP algorithm solves the recursive equation and traces back the optimal path.
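The per-peer allocation m_k of storage combinations can be realized with a contiguous block partition. The sketch below is illustrative (the paper does not prescribe this exact partition scheme); it splits the 40,000 combinations of scenario 3 among 350 peers so that block sizes differ by at most one:

```python
def partition(n_combos, n_peers):
    """Split n_combos storage combinations into contiguous blocks, one per peer.
    Peer k evaluates combinations in [start, stop); sizes differ by at most 1."""
    base, extra = divmod(n_combos, n_peers)
    blocks, start = [], 0
    for k in range(n_peers):
        size = base + (1 if k < extra else 0)   # m_k for peer process k
        blocks.append((start, start + size))
        start += size
    return blocks

blocks = partition(40_000, 350)   # the five-reservoir case on 350 peers
# each peer holds 114 or 115 combinations (40,000 / 350 = 114.29)
```

Each peer then needs RAM only for its own block of C*_t and F*_t entries, which is the source of the roughly 1/K memory reduction noted above.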
Finally, the developed parallel DP algorithm can easily be applied to other DP-based variants, such as SDP, IDP and DDDP.
Notation

t            time index, t ∈ [1, T]
i            reservoir index, i ∈ [1, n]
l            river confluence point index, l ∈ [1, L]
k            peer process index, k ∈ [1, K]
S(t)         reservoir storage vector at the beginning of time step t, S(t) = [S_1(t), ..., S_i(t), ..., S_n(t)]^T
S(t + 1)     reservoir storage vector at the end of time step t
S_initial    initial reservoir storage vector
S_final      final expected reservoir storage vector
S_min(t + 1) minimum reservoir storage vector at the end of time step t
S_max(t + 1) maximum reservoir storage vector at the end of time step t
F_t          maximum cumulative return from the first time step to the beginning of the tth time step resulting from the joint operation of n reservoirs
f_t(·)       objective function to be maximized during time step t
I(t)         inflow vector during time step t
R(t)         total release vector during time step t, R(t) = [R_1(t), ..., R_i(t), ..., R_n(t)]^T
R_min(t)     minimum required release vector during time step t
R_max(t)     maximum allowable release vector during time step t
η_i          hydropower plant efficiency at reservoir i
R′_i(t)      power release from reservoir i during time step t
R″_i(t)      non-power release from reservoir i during time step t
H_i(t)       average head at reservoir i during time step t, equal to the average reservoir fore-bay water level HF_i(t) minus the tail-race water level HT_i(t)
Δt           time interval
M            reservoir system connectivity matrix
U_l          set of reservoirs in parallel at river confluence point l
PR_min,l(t)  minimum required release at river confluence point l during time step t
PR_max,l(t)  maximum allowable release at river confluence point l during time step t
N(t)         output vector during time step t, N(t) = [N_1(t), ..., N_i(t), ..., N_n(t)]^T
N_min(t)     minimum required output vector during time step t
N_max(t)     maximum allowable output vector during time step t
N_min(t)     minimum required output from a reservoir system during time step t
N_max(t)     maximum allowable output from a reservoir system during time step t
C            possible storage combination m^n × T matrix, C = [C(2), ..., C(t), ..., C(T + 1)]
C*           optimal transitions, C* = [C*_1, ..., C*_t, ..., C*_T]
F*_t         maximum cumulative returns from the first time step to the beginning of the tth time step, F*_t = [F*_{1,t}, ..., F*_{k,t}, ..., F*_{K,t}]^T
m            number of discretization levels of each reservoir
m_k          allocation number of storage combinations to peer process k
τ_1          wall clock time using 1 computing process
τ_K          wall clock time using K peer processes
Δτ           average wall clock time for a single objective function evaluation
τ′           computation time fraction
τ″           communication time fraction
τ‴           workload imbalance time cost
E_K          parallel efficiency
RAM_1        RAM amount for a single computing process (byte)
RAM_k        RAM amount for each peer process (byte)
Acknowledgements
The research is supported by the National Key Technologies R&D Program (#2013BAB05B03 and #2009BAC56B03) and the National Natural Science Foundation (#51109114) of China. The first author is supported by a fellowship from the Chinese government for his visit to the University of California, Los Angeles. Partial support is also provided by an AECOM endowment. The authors are very grateful to the two anonymous reviewers for their in-depth reviews and constructive comments, which greatly helped improve the paper.
References
[1] Bastian P, Helmig R. Efficient fully-coupled solution techniques for two-phase flow in porous media: parallel multigrid solution and large scale computations. Adv Water Resour 1999;23(3):199–216. http://dx.doi.org/10.1016/S0309-1708(99)00014-7.
[2] Bellman R. Adaptive control processes: a guided tour. Princeton, NJ: Princeton University Press; 1961.
[3] Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
[4] Bhaskar NR, Whitlatch EE. Derivation of monthly reservoir release policies. Water Resour Res 1980;16(6):987–93. http://dx.doi.org/10.1029/WR016i006p00987.
[5] Casti J, Richardson M, Larson R. Dynamic programming and parallel computers. J Optim Theory Appl 1973;12(4):423–38. http://dx.doi.org/10.1007/BF00940421.
[6] Chandramouli V, Raman H. Multireservoir modeling with dynamic programming and neural networks. J Water Resour Plann Manage 2001;127(2):89–98. http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89).
[7] El Baz D, Elkihel M. Load balancing methods and parallel dynamic programming algorithm using dominance technique applied to the 0–1 knapsack problem. J Parallel Distrib Comput 2005;65(1):74–84. http://dx.doi.org/10.1016/j.jpdc.2004.10.004.
[8] Hall WA, Butcher WS, Esogbue A. Optimization of the operation of a multiple-purpose reservoir by dynamic programming. Water Resour Res 1968;4(3):471–7. http://dx.doi.org/10.1029/WR004i003p00471.
[9] Heidari M, Chow VT, Kokotovic PV, Meredith DD. Discrete differential dynamic programming approach to water resources systems optimization. Water Resour Res 1971;7(2):273–82. http://dx.doi.org/10.1029/WR007i002p00273.
[10] Kollet SJ, Maxwell RM. Integrated surface–groundwater flow modeling: a free-surface overland flow boundary condition in a parallel groundwater flow model. Adv Water Resour 2006;29(7):945–58. http://dx.doi.org/10.1016/j.advwatres.2005.08.006.
[11] Kollet SJ, Maxwell RM, Woodward CS, Smith S, Vanderborght J, Vereecken H, Simmer C. Proof of concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources. Water Resour Res 2010;46(4). http://dx.doi.org/10.1029/2009WR008730.
[12] Kumar DN, Reddy MJ. Multipurpose reservoir operation using particle swarm optimization. J Water Resour Plann Manage 2007;133(3):192–201. http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192).
[13] Labadie JW. Optimal operation of multireservoir systems: state-of-the-art review. J Water Resour Plann Manage 2004;130(2):93–111. http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93).
[14] Larson RE, Korsak AJ. A dynamic programming successive approximations technique with convergence proofs. Automatica 1970;6(2):245–52. http://dx.doi.org/10.1016/0005-1098(70)90095-6.
[15] Larson RE. State increment dynamic programming. New York: Elsevier Science; 1968.
[16] Li T, Wang G, Chen J, Wang H. Dynamic parallelization of hydrological model simulations. Environ Modell Softw 2011;26(12):1736–46. http://dx.doi.org/10.1016/j.envsoft.2011.07.015.
[17] Li X, Wei J, Fu X, Li T, Wang G. A knowledge-based approach for reservoir system optimization. J Water Resour Plann Manage, in press. http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379.
[18] Li X, Li T, Wei J, Wang G, Yeh WWG. Hydro unit commitment via mixed integer linear programming: a case study of the Three Gorges Project, China. IEEE Trans Power Syst, in press. http://dx.doi.org/10.1109/TPWRS.2013.2288933.
[19] Martins WS, Del Cuvillo JB, Useche FJ, Theobald KB, Gao GR. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison. Pac Symp Biocomput 2001;6:311–22.
[20] Maxwell RM. A terrain-following grid transform and preconditioner for parallel, large-scale, integrated hydrologic modeling. Adv Water Resour 2012;53:109–17. http://dx.doi.org/10.1016/j.advwatres.2012.10.001.
[21] Message Passing Interface Forum. <http://www.mpi-forum.org/index.html>.
[22] Mousavi SJ, Karamouz M. Computational improvement for dynamic programming models by diagnosing infeasible storage combinations. Adv Water Resour 2003;26(8):851–9. http://dx.doi.org/10.1016/S0309-1708(03)00061-7.
[23] Nemhauser GL. Introduction to dynamic programming. New York: John Wiley; 1966.
[24] Piccardi C, Soncini-Sessa R. Stochastic dynamic programming for reservoir optimal control: dense discretization and inflow correlation assumption made possible by parallel computing. Water Resour Res 1991;27(5):729–41. http://dx.doi.org/10.1029/90WR02766.
[25] Pool R. Massively parallel machines usher in next level of computing power. Science 1992;256:50–1. http://dx.doi.org/10.1126/science.256.5053.50.
[26] Reed PM, Kollat JB. Visual analytics clarify the scalability and effectiveness of massively parallel many-objective optimization: a groundwater monitoring design example. Adv Water Resour 2013;56:1–13. http://dx.doi.org/10.1016/j.advwatres.2013.01.011.
[27] Rouholahnejad E, Abbaspour KC, Vejdani M, Srinivasan R, Schulin R, Lehmann A. A parallelization framework for calibration of hydrological models. Environ Modell Softw 2012;31:28–36. http://dx.doi.org/10.1016/j.envsoft.2011.12.001.
[28] Rytter W. On efficient parallel computations for some dynamic programming problems. Theor Comput Sci 1988;59(3):297–307. http://dx.doi.org/10.1016/0304-3975(88)90147-8.
[29] Sulis A. GRID computing approach for multireservoir operating rules with uncertainty. Environ Modell Softw 2009;24(7):859–64. http://dx.doi.org/10.1016/j.envsoft.2008.11.003.
[30] Tan G, Sun N, Gao GR. A parallel dynamic programming algorithm on a multi-core architecture. In: Proceedings of the nineteenth annual ACM symposium on parallel algorithms and architectures (ACM 2007); 2007. p. 135–44. http://dx.doi.org/10.1145/1248377.1248399.
[31] Tang Y, Reed PM, Kollat JB. Parallelization strategies for rapid and robust evolutionary multiobjective optimization in water resources applications. Adv Water Resour 2007;30(3):335–53. http://dx.doi.org/10.1016/j.advwatres.2006.06.006.
[32] Trott WJ, Yeh WWG. Optimization of multiple reservoir system. J Hydraul Eng Div 1973;99(10):1865–84.
[33] Wang H, Fu X, Wang G, Li T, Gao J. A common parallel computing framework for modeling hydrological processes of river basins. Parallel Comput 2011;37(6–7):302–15. http://dx.doi.org/10.1016/j.parco.2011.05.003.
[34] Wardlaw R, Sharif M. Evaluation of genetic algorithms for optimal reservoir system operation. J Water Resour Plann Manage 1999;125(1):25–33. http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25).
[35] Wurbs RA. Reservoir-system simulation and optimization models. J Water Resour Plann Manage 1993;119(4):455–72. http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455).
[36] Wu Y, Li T, Sun L, Chen J. Parallelization of a hydrological model using the message passing interface. Environ Modell Softw 2013;43:124–32. http://dx.doi.org/10.1016/j.envsoft.2013.02.002.
[37] Yakowitz S. Dynamic programming applications in water resources. Water Resour Res 1982;18(4):673–96. http://dx.doi.org/10.1029/WR018i004p00673.
[38] Yeh WWG. Reservoir management and operations models: a state-of-the-art review. Water Resour Res 1985;21(12):1797–818. http://dx.doi.org/10.1029/WR021i012p01797.
[39] Young GK. Finding reservoir operating rules. J Hydraul Eng Div 1967;93(HY6):297–321.
[40] Zhao T, Cai X, Lei X, Wang H. Improved dynamic programming for reservoir operation optimization with a concave objective function. J Water Resour Plann Manage 2012;138(6):590–6. http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205.