A Parallel Dynamic Programming Algorithm for Multi-Reservoir System Optimization

Xiang Li a, Jiahua Wei a, Tiejian Li a, Guangqian Wang a, William W.-G. Yeh b

a State Key Laboratory of Hydroscience & Engineering, Tsinghua University, Beijing 100084, China
b Department of Civil and Environmental Engineering, University of California, Los Angeles, CA 90095, USA
Article history: Received 8 May 2013; Received in revised form 8 January 2014; Accepted 12 January 2014; Available online 30 January 2014

Keywords: Dynamic programming; Multi-reservoir system optimization; Joint operation; Parallel computing

Abstract
This paper develops a parallel dynamic programming algorithm to optimize the joint operation of a multi-reservoir system. First, a multi-dimensional dynamic programming (DP) model is formulated for a multi-reservoir system. Second, the DP algorithm is parallelized using a peer-to-peer parallel paradigm. The parallelization is based on the distributed memory architecture and the message passing interface (MPI) protocol. We consider both the distributed computing and distributed computer memory in the parallelization. The parallel paradigm aims at reducing the computation time as well as alleviating the computer memory requirement associated with running a multi-dimensional DP model. Next, we test the parallel DP algorithm on the classic, benchmark four-reservoir problem on a high-performance computing (HPC) system with up to 350 cores. Results indicate that the parallel DP algorithm exhibits good performance in parallel efficiency; the parallel DP algorithm is scalable and will not be restricted by the number of cores. Finally, the parallel DP algorithm is applied to a real-world, five-reservoir system in China. The results demonstrate the parallel efficiency and practical utility of the proposed methodology.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Dynamic programming (DP), an algorithm attributed largely to Bellman [3], is developed for optimizing a multi-stage (the term "stage" represents time step throughout the paper) decision process. If the return or cost at each stage is independent and satisfies the monotonicity and separability conditions [23], the original multi-stage problem can be decomposed into stages with decisions required at each stage. The decomposed problem then can be solved recursively, two stages at a time, using the recursive equation of DP. DP is particularly suited for optimizing reservoir management and operation as the structure of the optimization problem conforms to a multi-stage decision process. Over the past four decades, DP has been used extensively in the optimization of reservoir management and operation [4,6,8,13,22,35,37–40].
In the discrete form of DP, the storage of each reservoir is discretized into a finite number of levels. By exhaustive enumeration over all possible combinations of discrete levels at each stage for all reservoirs in a system, global optimality can be assured in a discrete sense. However, the well-known "curse of dimensionality" [2] limits the application of DP to multi-state variable problems, as the state space increases exponentially with an increase in the number of state variables. This drastic increase in state space and the consequent random access memory (RAM) requirement quickly can exceed the hardware capacity of a modern computer [13]. A variety of DP variants, such as incremental dynamic programming (IDP) [15], dynamic programming successive approximations (DPSA) [14], incremental dynamic programming and successive approximations (IDPSA) [32] and discrete differential dynamic programming (DDDP) [9], have been proposed to alleviate the dimensionality problem. However, these variants all require an initial trajectory for each state variable. For a non-convex problem, there is no assurance of convergence to the global optimum. Recently, Mousavi and Karamouz [22] reduced the computation time of a DP model for a multi-reservoir system by diagnosing infeasible storage combinations and removing them from further computations. Zhao et al. [40] proposed an improved DP model for optimizing reservoir operation by taking advantage of the monotonic relationship between reservoir storage and the optimal release decision. However, the model only can be applied to reservoir operation with a concave objective function.
Because of the hardware limitations of a single computer as well as large-scale computing requirements, parallel computing has been applied in many fields [25]. In the water resources field, there are several successful examples. Bastian and Helmig [1]

http://dx.doi.org/10.1016/j.advwatres.2014.01.002
0309-1708/© 2014 Elsevier Ltd. All rights reserved.
F_{t+1}(S_{t+1}) = max_{S_t} { f_t(S_t, S_{t+1}) + F_t(S_t) }    (1)

where t is the time index, t ∈ [1, T]; S_t is the storage vector at the beginning of time step t, S_t = (S_t^1, …, S_t^i, …, S_t^n)^T; S_{t+1} is the storage vector at the end of time step t; i is the reservoir index, i ∈ [1, n]; F_t is the maximum cumulative return from the first time step to the beginning of the t-th time step resulting from the joint operation of n reservoirs (initially, F_1 = 0); F_{t+1} is the maximum cumulative return from the first time step to the end of the t-th time step resulting from the joint operation of n reservoirs; and f_t is the objective function to be maximized during time step t. Note that Eq. (1) is the inverted form of a DP model with reservoir storages as the decision variables. In the non-inverted DP model, the releases are the decision variables. For a deterministic DP model, the ending storage is related to the beginning storage by the continuity equation.
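The inverted recursion of Eq. (1) can be sketched for a single reservoir with discretized storage. This is a minimal illustration, not the authors' implementation: the stage return `f_t` and feasibility rule passed in below are made-up placeholders standing in for the benefit function and the continuity/bound constraints.

```python
# Minimal sketch of the inverted DP recursion of Eq. (1) for ONE reservoir
# with discretized storage. f_t(t, s, s_next) and feasible(t, s, s_next)
# are caller-supplied placeholders for the stage return and constraints.

def solve_dp(levels, T, f_t, feasible):
    """levels: discrete storage values; returns (F, back) where F maps the
    final storage to the maximum cumulative return and back[t-1] stores the
    optimal transitions of time step t for the later traceback."""
    NEG = float("-inf")
    F = {s: 0.0 for s in levels}          # F_1 = 0 for every state
    back = []                             # optimal transitions per step
    for t in range(1, T + 1):
        F_next, arg = {}, {}
        for s_next in levels:             # storage at the end of step t
            best, best_s = NEG, None
            for s in levels:              # enumerate beginning-of-step storages
                if F[s] > NEG and feasible(t, s, s_next):
                    v = f_t(t, s, s_next) + F[s]
                    if v > best:
                        best, best_s = v, s
            F_next[s_next], arg[s_next] = best, best_s
        F, back = F_next, back + [arg]
    return F, back
```

For instance, with levels {0, 1, 2}, a constant inflow of 1, release s + 1 − s_next bounded by [0, 2], and the release itself as the stage benefit, the recursion accumulates the maximum total release over the horizon.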
2.2. Constraints
In the operation of a multi-reservoir system, each individual reservoir is subject to its own set of constraints, while the reservoir system is subject to system constraints brought by the interconnection of reservoirs. Specifically, we consider the following constraints:

Continuity equation:

S_{t+1} = S_t + I_t − M·R_t    ∀t    (2)

where I_t is the vector of inflows to reservoirs (i = 1, …, n) during time step t; R_t is the vector of total releases from reservoirs (i = 1, …, n) during time step t, R_t = (R_t^1, …, R_t^i, …, R_t^n)^T; and M is the n × n reservoir system connectivity matrix. Without loss of generality, we assume that evaporation loss is balanced by precipitation.
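Given storages at both ends of a time step, Eq. (2) can be inverted to recover the release vector, R_t = M⁻¹(S_t + I_t − S_{t+1}). A small sketch for a hypothetical two-reservoir cascade (reservoir 1 releasing into reservoir 2; all numbers are illustrative, not from the paper):

```python
# Hypothetical 2-reservoir cascade: reservoir 1 releases into reservoir 2.
# Continuity (Eq. 2): S_{t+1} = S_t + I_t - M R_t, so the releases follow
# by forward substitution, since M is lower triangular for a cascade.
M = [[ 1.0, 0.0],    # reservoir 1: loses its own release
     [-1.0, 1.0]]    # reservoir 2: gains R_1, loses R_2
S_t    = [5.0, 4.0]  # storages at the beginning of step t
I_t    = [2.0, 0.0]  # natural inflows during step t
S_next = [4.0, 5.0]  # chosen storages at the end of step t

b = [S_t[i] + I_t[i] - S_next[i] for i in range(2)]   # M R_t = b
R1 = b[0] / M[0][0]
R2 = (b[1] - M[1][0] * R1) / M[1][1]
print([R1, R2])   # -> [3.0, 2.0]
```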
Initial and final reservoir storages:

S_1 = S^initial    (3)
S_{T+1} ≥ S^final    (4)

where S^initial and S^final are the vectors of initial storages and final expected storages of reservoirs (i = 1, …, n).

Lower and upper bounds on storages:

S^min_{t+1} ≤ S_{t+1} ≤ S^max_{t+1}    ∀t    (5)

where S^min_{t+1} and S^max_{t+1} are the vectors of minimum and maximum storages of reservoirs (i = 1, …, n) at the end of time step t.
Lower and upper bounds on releases:

For all reservoirs:

R^min_t ≤ R_t ≤ R^max_t    ∀t    (6)

where R^min_t is the vector of the minimum required releases from reservoirs (i = 1, …, n) during time step t; and R^max_t is the vector of maximum allowable releases from reservoirs (i = 1, …, n) during time step t.
For reservoirs in parallel:

PR^min_l(t) ≤ Σ_{i∈U_l} R^i_t ≤ PR^max_l(t)    ∀t, ∀l    (7)

where l is the index of river confluence points, l ∈ [1, L], and L is the number of river confluence points; U_l is the set of reservoirs in parallel at river confluence point l; and PR^min_l(t) and PR^max_l(t) are the minimum and maximum releases at river confluence point l during time step t.
Lower and upper bounds on outputs:

For all reservoirs:

N^min_t ≤ N_t ≤ N^max_t    ∀t    (8)

where N_t is the vector of outputs produced from reservoirs (i = 1, …, n) during time step t, N_t = (N_t^1, …, N_t^i, …, N_t^n)^T; N^min_t is the vector of minimum required outputs from reservoirs (i = 1, …, n) during time step t; and N^max_t is the vector of maximum allowable outputs from reservoirs (i = 1, …, n) during time step t.

For the entire reservoir system:

N^min(t) ≤ Σ_{i=1}^n N^i_t ≤ N^max(t)    ∀t    (9)

where N^min(t) is the minimum required output from the reservoir system during time step t; and N^max(t) is the maximum allowable output from the reservoir system during time step t.

Note that reservoir storages, releases and outputs are variables to be solved; the others are known input data in the optimization.
3. Parallelization strategy

In this study, we use the Windows HPC Server 2012 R2 operating system (OS). The HPC system consists of 20 IBM HS22 blades and an InfiniBand 40 Gbps network. Each blade has two Intel Xeon E5645 2.40 GHz CPUs (each CPU has six physical cores) and 12 GB of RAM. The Intel Xeon …

… F_{t+1}(S_{t+1}) in the memory for use in the next time step. The procedure continues until the final time step is reached.
Procedure II: Trace back the optimal path, based on the saved optimal transitions, from the final time step to the first time step to determine the consequent storage trajectories (i.e. S_{T+1} → ⋯ → S_2 → S_1) and release trajectories (i.e. R_T → ⋯ → R_2 → R_1).
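Procedure II is a standard DP backward sweep over the saved transitions. A minimal sketch, where the data layout (`back[t][s]` giving the optimal beginning-of-step storage that leads to end-of-step storage `s` at step t+1) is an assumption made for illustration:

```python
# Minimal sketch of Procedure II: recover the storage trajectory
# S_1 -> ... -> S_{T+1} from saved optimal transitions. `back[t][s]` is
# the optimal beginning-of-step storage leading to end-of-step storage s
# at time step t+1 (an assumed layout, for illustration only).

def trace_back(back, s_final):
    path = [s_final]
    for t in range(len(back) - 1, -1, -1):   # final step back to the first
        path.append(back[t][path[-1]])
    path.reverse()                           # S_1, S_2, ..., S_{T+1}
    return path
```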
For this analysis, we perform a conversion for Eq. (1). Assuming the storage of each reservoir is discretized into m levels, the number of storage combinations of all interconnected reservoirs is m^n at any time step (see Fig. 2). Note that storage is defined either at the beginning or the end of a time step. For the sake of clarity, we introduce C(p_t, t) to denote a storage combination of all interconnected reservoirs in a system at the beginning of time step t, where p_t denotes the serial number of a storage combination at the beginning of time step t, p_t ∈ [1, m^n]. Thus all possible storage combinations of all reservoirs over the entire planning horizon can be expressed abstractly as:
C = [ C(1, 2)    …  C(1, t)    …  C(1, T+1)
        ⋮            ⋮              ⋮
                     C(p_t, t)
        ⋮            ⋮              ⋮
      C(m^n, 2)  …  C(m^n, t)  …  C(m^n, T+1) ]   (m^n × T)    (10)

where C is the m^n × T matrix, C = [C_2, …, C_t, …, C_{T+1}]; C(p_1, 1) is the initial storage combination, often fixed; C* is denoted as the m^n × T matrix of optimal transitions or candidate paths, used to save the optimal previous storage combinations to the current ones so as to trace back the optimal path; and C*'s element C*(p_{t+1}, t+1) saves the optimal storage combination at the beginning of time step t to storage combination p_{t+1} at the end of time step t. In addition, all maximum cumulative returns from the first time step to the beginning of the t-th time step are denoted as F_t, where F_t is the m^n × 1 matrix. By substituting C(p_t, t) (p_t = 1, …, m^n) for S_t, the vector-form Eq. (1) can be rewritten as the equivalent scalar-form Eq. (11):

F_{t+1}(C(p_{t+1}, t+1)) = max_{p_t} { f_t(C(p_t, t), C(p_{t+1}, t+1)) + F_t(C(p_t, t)) }    (11)
Fig. 3 schematically illustrates the computing procedures and computer memory used for the serial DP algorithm. Fig. 4 shows the two computing procedures of the serial DP algorithm, based on the scalar-form Eq. (11). First, the DP algorithm executes Procedure I: For a given p_{t+1}, to determine the maximum cumulative return F_{t+1}(C(p_{t+1}, t+1)) and optimal transition C*(p_{t+1}, t+1), the objective function f_t of all possible transitions between C(p_t, t) (p_t = 1, 2, …, m^n) and C(p_{t+1}, t+1) must be examined, and F_t(C(p_t, t)) (p_t = 1, 2, …, m^n) has to be added as well. Then, the maximum cumulative return F_{t+1}(C(p_{t+1}, t+1)) and optimal transition C*(p_{t+1}, t+1) are saved in computer memory. After all maximum cumulative returns F_{t+1}(C(p_{t+1}, t+1)) (p_{t+1} = 1, 2, …, m^n) and optimal transitions C*(p_{t+1}, t+1) (p_{t+1} = 1, 2, …, m^n) are derived at the end of time step t, the algorithm proceeds to the next time step until the end of the planning horizon. Then the DP algorithm carries out Procedure II: The optimal storage transition for each reservoir as well as the corresponding release policy can be traced by a backward sweep.

Fig. 1. Schematic representation of the HPC system.
Fig. 2. Conversion from reservoir storages to reservoir storage combinations.
Fig. 3. Schematic illustration of computing procedures and computer memory used for the serial DP algorithm.
The computation time of the DP algorithm, mainly from Procedure I, can be approximately estimated as:

τ_1 = m^{2n} · Δτ · T    (12)

where τ_1 is the wall clock time of using one computing process; there are a total of m^{2n} evaluations of Eq. (1) at each time step; and all evaluations of Eq. (1) are assumed to require the same average wall clock time Δτ. It should be noted that this is an upper-bound estimate that includes the possible infeasible transitions, which are discarded in the actual computation. The infeasible transitions are the transitions that cannot be reached due to insufficient inflow to a reservoir [22].
On the other hand, Procedure I requires large RAM capacity. For illustration, we consider the smallest amount of RAM occupied when running the DP algorithm on a single computer, as shown in Fig. 3 (right). There are two main types of variables that need to be saved in the sequential decision-making process: one is in the form of integer variables, used to save the optimal transitions C*; the other saves the maximum cumulative returns F_t and F_{t+1} for time steps t and t+1 in two one-dimensional arrays, respectively. It should be noted that the maximum cumulative returns at time steps t and t+1 are updated until the final time step is reached; that is to say, F_{t−1}, F_{t−2}, …, F_1 are no longer saved in the RAM during time step t (see Fig. 3). For the sake of simplicity, we assume the two main variables occupy the same amount of RAM per element, U bytes. Thus the DP algorithm's total RAM amount simply can be expressed as:

RAM_1 = m^n · (T + 2) · U    (13)

where RAM_1 is the RAM amount (in bytes) for running the DP algorithm on one computing process.

From Eqs. (12) and (13), it can be seen that computation time and computer memory grow as m^{2n} and m^n, respectively, for a multi-reservoir DP problem, and for this reason the "curse of dimensionality" issue arises.
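Eqs. (12) and (13) are easy to evaluate numerically; the sketch below does so for an illustrative uniform discretization (m = 11, n = 4, T = 12, U = 8 bytes, all assumed values, not the paper's measured ones):

```python
# Back-of-the-envelope scaling of Eqs. (12)-(13). m, n, T and U are
# illustrative assumptions; Eq. (12) is an upper bound that still counts
# infeasible transitions.
def dp_cost(m, n, T, U=8):
    evals = (m ** (2 * n)) * T        # Eq. (12): evaluations of Eq. (1)
    ram   = (m ** n) * (T + 2) * U    # Eq. (13): bytes on one process
    return evals, ram

evals, ram = dp_cost(m=11, n=4, T=12)
print(evals, ram)   # both grow exponentially with the number of reservoirs n
```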
3.2. Peer-to-peer parallel paradigm

In order to apply the DP algorithm to multi-reservoir system optimization, it is necessary to develop an effective parallelization strategy for the DP algorithm in order to shorten computation time as well as alleviate the RAM bottleneck. The RAM bottleneck would make the DP algorithm un-implementable when a single computing process is used or several computing processes are used on a shared memory architecture.

The purpose of parallelization is to decompose the original task into several subtasks. Moreover, the parallelization distributes the total RAM associated with the task into several subtasks, each of which requires less RAM. Fig. 5 illustrates the way we distribute the above-mentioned two types of variables among K computing processes. We believe that the distribution only can be made along the vertical direction since the DP model features accumulation time step by time step (refer to Eq. (1)). Parallelization is based on the peer-to-peer parallel paradigm, consisting of two types of computing processes, i.e. K peer processes and one transfer process. Each peer process is in charge of a sub-allocation of the total RAM. The total number of storage combinations m^n is distributed among the K peer processes, with the number of storage combinations m_k allocated to peer process k, k ∈ [1, K], yielding m^n = Σ_{k=1}^K m_k. For the sake of clarity, let C* = (C*_1, …, C*_k, …, C*_K)^T and F_t = (F_{1,t}, …, F_{k,t}, …, F_{K,t})^T, as shown in Fig. 5.
Parallelization should consider the concurrency and dependency among subtasks undertaken by the K peer processes. Here, the term "concurrency" refers to several subtasks that can be executed simultaneously on multiple computing processes. The term "dependency" refers to a computing process that can perform a subtask only after the other computing processes have finished certain subtasks. As far as concurrency is concerned, during time step t the computation of each optimal transition, say C*(p_{t+1}, t+1), is independent of the others in the DP algorithm; that is, the computations of all C*(p_{t+1}, t+1) (p_{t+1} = 1, 2, …, m^n) can be executed simultaneously. As far as dependency is concerned, to compute C*(p_{t+1}, t+1) during time step t, all optimal transitions C*(p_t, t) (p_t = 1, 2, …, m^n) and maximum cumulative returns F_t(C(p_t, t)) (p_t = 1, 2, …, m^n), which are saved among the K peer processes in this paradigm (see Fig. 5), should be known beforehand. In other words, a peer process cannot accomplish the computation for C*(p_{t+1}, t+1) unless all peer processes have finished all computations for C*(p_t, t) (p_t = 1, 2, …, m^n) and F_t(C(p_t, t)) (p_t = 1, 2, …, m^n). Thus there is an underlying synchronization during each time step in this paradigm.

Fig. 4. Flowcharts of Procedure I and Procedure II of the serial DP algorithm.
The two types of processes have the following basic functions. Each peer process evaluates a subtask of the current maximum cumulative returns and optimal transitions, based on the previous maximum cumulative returns completed by the K peer processes; after the evaluations, the subtask of current maximum cumulative returns and the optimal transitions are saved in the RAM. The transfer process is in charge of communicating maximum cumulative returns among all peer processes and tracing back the optimal path.

Figs. 6 and 7 show the two procedures (Procedure I and Procedure II) of the parallel DP algorithm. Unlike the procedures of the serial DP algorithm in Fig. 4, some inter-process communication statements are added between the peer processes and the transfer process.

Fig. 6 presents the flowchart of peer process k and the transfer process of Procedure I of the parallel DP algorithm. Peer process j is any one of the K peer processes that has the same workflows as peer process k; peer process k or the transfer process receives the message of peer process j based on the "first come, first served" principle; and w is a counter variable. The more specific workflows of Procedure I of the parallel DP algorithm include the following steps:
(1) Start and allocate each process's sub-allocation of the total RAM;
(2) When t = 1, for each peer process, say peer process k, initialize F_{k,2} and C*_k(2) under the given F_1 = 0 and C(p_1, 1);
(3) Each peer process, say peer process k, sends F_{k,t+1} to the transfer process at time step t;
(4) The transfer process receives F_{j,t+1} from a certain peer process j based on the principle of "first come, first served" and sends F_{j,t+1} to all the peer processes;
(5) Repeat Step (4) until F_{j,t+1} from each of the peer processes (i.e. j = 1, …, K) is received and sent by the transfer process;
(6) Each peer process, say peer process k, receives F_{j,t} at time step t+1 (i.e. F_{j,t+1} at time step t) from the transfer process based on the principle of "first come, first served" and carries out the recursive equation to compute and update F_{k,t+1} and C*_k(t+1), based on the sums of the objective function resulting from the transitions from C_j(t) to C_k(t+1) and F_{j,t};
(7) Repeat Step (6) until F_{j,t} is received K times from the transfer process (i.e. j = 1, …, K);
(8) Repeat Steps (3)–(7) until the final time step T is reached.
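The relay pattern of Steps (3)–(5), in which every peer's slice of F_t passes through the transfer process and is re-broadcast to every peer, can be mimicked serially with plain queues. This is a toy simulation for intuition only; the actual algorithm exchanges these messages with MPI point-to-point calls:

```python
from collections import deque

# Toy serial simulation of Steps (3)-(5): each peer sends its slice of
# F_t to the transfer process, which relays every slice to every peer.
# Plain queues stand in for the MPI message channels (an illustrative
# simplification, not the paper's implementation).

def relay_round(peer_slices):
    """peer_slices: {k: list of (serial_number, return) pairs held by peer k}."""
    to_transfer = deque(peer_slices.items())   # messages in arrival order
    inboxes = {k: {} for k in peer_slices}     # what each peer ends up with
    while to_transfer:                         # "first come, first served"
        j, slice_j = to_transfer.popleft()     # transfer receives from peer j
        for k in inboxes:                      # ... and re-broadcasts to all
            inboxes[k][j] = slice_j
    return inboxes
```

After one round, every peer holds the complete F_t, which is exactly the precondition Step (6) needs before the recursion can proceed.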
After Procedure I, each peer process, say peer process k, saves C*_k, F_{k,T} and F_{k,T+1} in its RAM. Then, Procedure II of the parallel DP algorithm begins, as shown in Fig. 7, and the workflows include the following steps:

(1) Each peer process, say peer process k, sends F_{k,T+1} and C*_k(T+1) to the transfer process;
(2) The transfer process receives F_{j,T+1} and C*_j(T+1) from a certain peer process j based on the principle of "first come, first served", compares the elements in F_{j,T+1}, and updates the maximum F_{T+1}(C(p_{T+1}, T+1)) and consequent C*(p_{T+1}, T+1);
(3) Repeat Step (2) until F_{j,T+1} and C*_j(T+1) from all peer processes (i.e. j = 1, …, K) are received and the maximum F_{T+1}(C(p_{T+1}, T+1)) and consequent C*(p_{T+1}, T+1) are derived;

Fig. 5. Schematic illustration of computing procedures and computer memory used for the parallel DP algorithm.
Fig. 6. Flowchart of peer process k and transfer process of Procedure I of the parallel DP algorithm.
Fig. 7. Flowchart of peer process k and transfer process of Procedure II of the parallel DP algorithm.
(4) If t > 1, go to Step (5); otherwise go to Step (10);
(5) Identify the peer process that saves C*(p_t, t) based on C*(p_{t+1}, t+1), and let X = 1 (where X is used to judge which choice the peer process should make) for that peer process, X = 2 for the other peer processes, and W = 0 (where W is used to judge whether to end the transfer process);
(6) The transfer process sends X to all the peer processes;
(7) Each peer process, say peer process k, receives X from the transfer process. If X = 1, the peer process sends C*_k(t) to the transfer process and then repeats receiving X for the next time step. If X = 2, the peer process directly repeats receiving X for the next time step;
(8) The transfer process receives the required C*_j(t), which saves the C*(p_t, t) at time step t (i.e. C*(p_{t+1}, t+1) at time step t−1), from the required peer process, say peer process j;
(9) Repeat Steps (4)–(8);
(10) Let X = 3 for all peer processes and let W = 1;
(11) Each peer process ends its process once it receives X = 3 from the transfer process;
(12) The transfer process ends its process when W = 1.

The optimal path is derived by the transfer process with Procedure II. Furthermore, the consequent storages and releases of multiple reservoirs also can be determined for all time steps with the derived optimal path.
We consider a straightforward workload balancing strategy to decrease the idle computing processes and further obtain good performance on parallel efficiency. The strategy is to approximately allocate the same amount of subtask to each peer process. Thus the number of storage combinations saved in a peer process is defined as:

m_k = a + 1,    1 ≤ k ≤ b
m_k = a,        b < k ≤ K    (14)

where

a = u(m^n / K)    (15)
b = v(m^n, K)    (16)

and where u is the floor integer function (e.g. u(5/3) = 1) and v is the remainder function (e.g. v(5, 3) = 2). However, we note that the discarded infeasible solutions (see Section 3.1) still result in imbalanced tasks assigned to the peer processes.
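Eqs. (14)–(16) are the usual near-even split of m^n items over K processes, which a single `divmod` expresses directly:

```python
# Near-even allocation of `total` storage combinations over K peer
# processes (Eqs. 14-16): the first b processes get a+1 combinations
# each, the remaining K-b processes get a each.

def allocate(total, K):
    a, b = divmod(total, K)   # a = floor(total/K), b = remainder
    return [a + 1 if k <= b else a for k in range(1, K + 1)]
```

For the four-reservoir example's 21,296 combinations over 350 peer processes, this assigns each process either 60 or 61 combinations, and the slices always sum back to the total.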
The computation time of using the parallel DP algorithm is estimated as:

τ_K = (τ′ + τ″ + τ‴) / K    (17)

where τ_K is the wall clock time of using K peer processes, consisting of the computation time fraction τ′, the communication time fraction τ″ and the workload imbalance time cost τ‴. We employ the parallel efficiency E_K as a measurement to evaluate the parallel performance, calculated as:

E_K = τ_1 / (K · τ_K)    (18)

The parallel efficiency of a parallel program is dependent on the ratio between the communication time and the computation time. If the ratio is small, the parallel efficiency is high; otherwise the parallel efficiency is low.
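Eqs. (17) and (18) can be checked against one configuration of the four-reservoir experiment reported in Table 1 (K = 350, τ′ = 3137.0 s, τ″ + τ‴ = 243.0 s, with serial time τ_1 = 1818.6 s):

```python
# Parallel wall clock time (Eq. 17) and parallel efficiency (Eq. 18)
# for one configuration taken from Table 1 of the four-reservoir example.
def parallel_metrics(tau1, comp, comm_plus_imbalance, K):
    tauK = (comp + comm_plus_imbalance) / K   # Eq. (17)
    EK = tau1 / (K * tauK)                    # Eq. (18)
    return tauK, EK

tauK, EK = parallel_metrics(1818.6, 3137.0, 243.0, 350)
print(round(tauK, 1), round(EK, 2))   # about 9.7 s and 0.54, matching Table 1
```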
The RAM of the parallel DP algorithm allocated to each peer process, say peer process k, is estimated as:

RAM_k = m^n · (T + 2) · U / K    (19)

where RAM_k is the RAM amount for each peer process (bytes). For a system of distributed memory architecture, such as that in Fig. 1, all computing processes in a blade share RAM. Suppose H computing processes share RAM whose size is RAM bytes. When using the parallel DP algorithm, we roughly can determine the upper bound of RAM usage by the following inequality:

m^n · (T + 2) · U / K · H ≤ RAM    (20)

From Eq. (20), we see that the RAM requirement for a multi-reservoir DP problem can be alleviated by increasing the number of computing processes (K). This is a breakthrough in that multi-reservoir system optimization problems that previously could not be solved by a serial DP algorithm on a single computer because of RAM requirements now can be solved with parallel computing. However, we note that an increase in the number of reservoirs will result in an increase in computing process demand.
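Eq. (20) can be rearranged to give the smallest K that fits a problem within the per-node RAM, K ≥ m^n (T+2) U H / RAM. A sketch with illustrative numbers (the per-element size U = 8 bytes and the problem dimensions below are assumptions, not the paper's values):

```python
import math

# Smallest number of peer processes K satisfying Eq. (20):
# m^n * (T+2) * U / K * H <= RAM   =>   K >= m^n * (T+2) * U * H / RAM.
# U and the problem sizes passed in are illustrative assumptions.

def min_processes(m, n, T, U, H, ram_bytes):
    need = (m ** n) * (T + 2) * U * H   # total bytes if K were 1
    return math.ceil(need / ram_bytes)
```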
4. The four-reservoir example

4.1. Problem description

We first apply the developed parallel DP algorithm to the classic, hypothetical four-reservoir problem (see Fig. 8). This problem has been studied in the literature by several researchers. Larson [15] solved the problem by IDP. Heidari et al. [9] solved the same problem by DDDP. More recently, Wardlaw and Sharif [34] used a genetic algorithm while Kumar and Reddy [12] used particle swarm optimization to evaluate the performance of their heuristic techniques.

From Fig. 8, we see that the reservoir system consists of both series and parallel connections (L = 1, U_1 = {1, 3}). The reservoir system connectivity matrix is:

M = [  1   0   0   0
       0   1   0   0
       0  −1   1   0
      −1   0  −1   1 ]    (21)

The inflows are assumed to be constant for the entire operating period and are set as:

I = (2, 3, 0, 0)^T    (22)
Fig. 8. The four-reservoir problem.
The main objectives of the system operation are to maximize the benefits from hydropower generation from reservoirs i = 1, 2, 3, 4 as well as the benefits from irrigation from reservoir i = 4 over 12 time steps (n = 4, T = 12). We expand the objective function f_t to be maximized during time step t to:

f_t = Σ_{i=1}^4 b_i(t)·R_i(t) + b_5(t)·R_4(t),                                   t < 12
f_t = Σ_{i=1}^4 b_i(t)·R_i(t) + b_5(t)·R_4(t) + Σ_{i=1}^4 g_i(S_i(13), d_i),    t = 12    (23)

where R_i(t) (i = 1, 2, 3, 4) can be computed directly from Eq. (2) once S_i(t) and S_i(t+1) (i = 1, 2, 3, 4) are chosen; b_i(t) is the benefit function (refer to [15]); and g_i is the penalty function for not meeting the ending storage constraint (d_i):

g_i(S_i(13), d_i) = −40·(S_i(13) − d_i)^2,    S_i(13) ≤ d_i
g_i(S_i(13), d_i) = 0,                        S_i(13) > d_i    (24)
Other constraints include:

S^min = (0, 0, 0, 0)^T,    S^max = (10, 10, 10, 15)^T    (25)

S^initial = (5, 5, 5, 5)^T,    S^final = (5, 5, 5, 7)^T    (26)

R^min = (0, 0, 0, 0)^T,    R^max = (3, 4, 4, 7)^T    (27)

Note that Eqs. (7)–(9) are inactive in this example.
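The end-of-horizon penalty of Eq. (24) is simple to express directly. The quadratic weight of 40 below follows the classic formulation of this benchmark problem as reconstructed here; treat it as an assumption:

```python
# Penalty of Eq. (24) for missing the target ending storage d_i: a
# quadratic penalty (weight 40, per the classic four-reservoir problem)
# applied only when the final storage falls short of the target.

def ending_storage_penalty(s_final, d, weight=40.0):
    if s_final <= d:
        return -weight * (s_final - d) ** 2   # shortfall penalized quadratically
    return 0.0                                # target met or exceeded: no penalty
```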
4.2. Results and discussion

In this example, the storage of each reservoir is discretized with ΔS = 1 unit, and thus the total number of storage combinations is 11 × 11 × 11 × 16 = 21,296. We solved the four-dimensional DP model on the HPC system described above. First, we applied the serial DP algorithm with a single computing process. The global optimum is 401.3. The wall clock time (τ_1) was 1818.6 s. Then we performed several executions using the developed parallel DP algorithm, varying the number of peer processes. As expected, all executions produced the same global optimum, but with different wall clock times. The computation time fraction τ′, the sum of communication time fraction and workload imbalance time cost τ″ + τ‴, the wall clock time τ_K and the parallel efficiency E_K are presented in Table 1. As we can see, the wall clock time τ_K is drastically reduced, from 1818.6 s to 9.7 s with 350 peer processes. This reduction is quite substantial.
Several probable reasons affect the parallel efficiency:

(1) The hyper-threading technology. (a) If there are one peer process and one transfer process, we can conclude from the result that the OS schedules the two processes on one physical core. The peer process does not compete with the transfer process for execution resources, and thus the computation time fraction is almost the same as with the serial DP algorithm. (b) If there are two peer processes and one transfer process, we conclude that one peer process and the transfer process are scheduled on one physical core, while the other peer process is scheduled on the other physical core. Because the three processes do not compete for execution resources, the parallel efficiency is as high as 0.99. (c) If there are three peer processes and one transfer process, there would be two peer processes scheduled on the same physical core. Because the two peer processes compete for execution resources, the parallel efficiency is reduced to 0.72. (d) By this reasoning, parallel efficiencies increase when there are even numbers of peer processes and decrease when there are odd numbers of peer processes. However, as the number of peer processes becomes large, the fluctuations are not obvious.

(2) The workload imbalance. From Fig. 9, we see that as the number of peer processes increases, the sum of communication time fraction and workload imbalance time cost increases, then decreases, and then increases again. This is because when the number of peer processes is small, the workload imbalance (because various numbers of infeasible solutions assigned to the peer processes will be discarded [see Section 3.1]) is significant, and the fast peer processes must wait until the slowest peer process finishes its task. However, when the number of peer processes becomes large, the amount of task assigned to each peer process decreases and the influence of workload imbalance diminishes. Although there are still imbalanced tasks among the peer processes, the fast peer processes do not have to wait long. Because the sum of communication time fraction and workload imbalance time cost increases linearly as the number of peer processes increases, we can infer that the developed parallel DP algorithm is scalable and not restricted by the increase in the number of cores.

(3) The turbo boost technology (also referred to as dynamic overclocking). The clock rate depends on the CPU's thermal limit, the number of cores in use as well as the maximum frequency of the active cores. If the CPU is below its thermal limits, the operating frequency will increase; otherwise the operating frequency will be fixed at the standard frequency.
Table 1
The computation time fraction, the sum of communication time fraction and workload imbalance time cost, the wall clock time and the parallel efficiency of the parallel DP algorithm for the classic four-reservoir problem.

No. of peer processes | τ′ (s)  | τ″ + τ‴ (s) | τ_K (s) | E_K
Serial                | –       | –           | 1818.6  | 1.00
1                     | 1836.2  | 0.0         | 1836.2  | 0.99
2                     | 1833.8  | 1.3         | 917.5   | 0.99
3                     | 2427.5  | 100.6       | 842.7   | 0.72
4                     | 2226.3  | 123.4       | 587.4   | 0.77
5                     | 2595.5  | 115.2       | 542.1   | 0.67
6                     | 2373.6  | 181.3       | 425.8   | 0.71
7                     | 2688.3  | 140.5       | 404.1   | 0.64
8                     | 2510.6  | 194.1       | 338.1   | 0.67
9                     | 2732.9  | 132.6       | 318.4   | 0.63
10                    | 2593.3  | 153.7       | 274.7   | 0.66
25                    | 2948.3  | 90.9        | 121.6   | 0.60
50                    | 2987.1  | 60.6        | 61.0    | 0.60
75                    | 3016.1  | 33.3        | 40.7    | 0.60
100                   | 3036.3  | 93.7        | 31.3    | 0.58
125                   | 3073.2  | 75.3        | 25.2    | 0.58
150                   | 3058.1  | 106.0       | 21.1    | 0.57
175                   | 3066.6  | 140.8       | 18.3    | 0.57
200                   | 3059.8  | 134.0       | 16.0    | 0.57
225                   | 3115.4  | 147.1       | 14.5    | 0.56
250                   | 3087.7  | 197.5       | 13.1    | 0.55
275                   | 3106.8  | 210.5       | 12.1    | 0.55
300                   | 3113.3  | 216.7       | 11.1    | 0.55
325                   | 3080.3  | 250.6       | 10.2    | 0.55
350                   | 3137.0  | 243.0       | 9.7     | 0.54
5. A real-world application

5.1. Problem description

We now apply the developed parallel DP algorithm to a real-world reservoir system located in the central Yangtze River Basin, China (see Fig. 10). The system consists of five reservoirs (n = 5): the Three Gorges Project (TGP) and the Gezhouba (GZB) on the Yangtze River; and the Shuibuya (SBY), the Geheyan (GHY) and the Gaobazhou (GBZ) on the Qingjiang River (which joins the Yangtze River at the town of Zhicheng). As shown in Fig. 10 (right), these reservoirs are numbered from i = 1 to i = 5 and the river confluence point is denoted as l = 1 (U_1 = {2, 5}). The natural inflow between the TGP and the GZB can be ignored because of their close proximity. At present, all five reservoirs are operated by two corporations. The TGP and GZB cascade hydropower plants (TGP–GZB) are managed by the China Three Gorges Corporation, while the SBY, GHY and GBZ cascade hydropower plants (SBY–GHY–GBZ) are under the jurisdiction of the Hubei Qingjiang Hydroelectric Development Co., Ltd. The joint operation of this five-reservoir system is of major interest to the Ministry of Science and Technology of China.
In this reservoir system, the SBY is a multi-year storage reservoir with 24.0 × 10^8 m^3 of active storage capacity; the GHY is a yearly storage reservoir with 11.5 × 10^8 m^3 of active storage capacity; the TGP is a seasonal storage reservoir with 221.5 × 10^8 m^3 of active storage capacity; and the GZB and GBZ are daily storage reservoirs with active storage capacities of 0.8 × 10^8 m^3 and 0.5 × 10^8 m^3, respectively, much smaller than the other three reservoirs. The main characteristics and the operating rules of the five reservoirs are listed in Table 2, where S_max − S_min denotes the active storage capacity; H(·) is the conversion function from reservoir storage to water level; N_min denotes the guaranteed output production of a hydropower plant; and N_max denotes the installed capacity of a hydropower plant.
Fig. 9. Computation time fraction versus the sum of communication time fraction and workload imbalance time cost.
Fig. 10. The five-reservoir system.
Table 2
Main characteristics and operating rules of the five reservoirs.

Reservoir                     TGP       GZB       SBY      GHY      GBZ
S_min (10^8 m^3)              171.5     6.3       19.0     18.7     3.5
S_max (10^8 m^3)              393.0     7.1       43.0     30.2     4.0
S_max − S_min (10^8 m^3)      221.5     0.8       24.0     11.5     0.5
H(S_min) (m)                  145.0     63.0      350.0    180.0    78.0
H(S_max) (m)                  175.0     66.0      400.0    200.0    80.0
R_min (m^3/s)                 6000      6000      0        0        0
R_max (m^3/s)                 101,700   113,400   13,200   18,000   18,400
N_min (MW)                    4990      1040      310      241.5    77.3
N_max (MW)                    22,400    2757      1840     1212     270
We use three scenarios for the case study, as shown in Table 3, where PR_min and PR_max, respectively, denote the mandatory release and the maximum allowable release at river confluence point l = 1. Scenario 1 is built for the TGP–GZB; scenario 2 is for the SBY–GHY–GBZ; and scenario 3 encompasses the entire five-reservoir system. Scenario 3 is conducted under the assumption that the five-reservoir system is under joint operation. To test our proposed methodology, we further assume that energy generation is transmitted to the same grid. This assumption may differ slightly from what actually occurs in practice. Furthermore, for confluence point l = 1, the
Table 3
Three scenarios used in the real-world problem.

Scenario   System                       PR_min (m^3/s)   PR_max (m^3/s)   N_min (MW)   N_max (MW)
1          TGP–GZB                      6000             56,700           6030         20,957
2          SBY–GHY–GBZ                  0                18,400           628.8        3322
3          The five-reservoir system    6000             56,700           6658.8       24,279
Table 4
The optimal energy generation from the three scenarios (10^8 kW h).

Scenario    TGP      GZB      SBY     GHY     GBZ     Total
1           8364.9   1528.8   –       –       –       9893.7
2           –        –        330.5   274.9   87.8    693.2
3           8411.9   1525.0   342.1   271.5   86.0    10,636.5
(3-1-2)*    47.0     −3.8     11.6    −3.4    −1.8    49.6

* Note: (3-1-2) denotes scenario 3 minus scenario 1 minus scenario 2.
(a) Release at river confluence point l = 1 from scenario 1.
(b) Release at river confluence point l = 1 from scenario 3.
Fig. 11. Comparison of releases at river confluence point l = 1, scenario 1 versus scenario 3.
mandatory release is 6000 m^3/s for both navigational and ecological purposes, and the maximum allowable release is 56,700 m^3/s for downstream flood protection.
The objectives of the three scenarios are to maximize energy generation over a recent 10-year horizon (June 2000 through May 2010). The operating period is divided into 360 time steps
(a) Output production from scenario 1.
(b) Output production from scenario 2.
(c) Output production from scenario 3.
Fig. 12. Comparison of output productions, scenarios 1 and 2 versus scenario 3.
(each with a length of about 10 days; T = 360). We expand the objective function f_t to be maximized during time step t to:

f_t = Σ_{i=1}^{n} N_i(t) Δt = Σ_{i=1}^{n} 9.81 η_i R′_i(t) H_i(t) Δt,  ∀t    (28)

where

R_i(t) = R′_i(t) + R″_i(t),  ∀t    (29)

H_i(t) = HF_i(t) − HT_i(t),  ∀t    (30)

η_i is the hydropower plant efficiency at reservoir i; R′_i(t) and R″_i(t) are the power release and non-power release from reservoir i during time step t; H_i(t) is the average head at reservoir i during time step t, which is the difference between the average reservoir fore-bay water level HF_i(t) (a function of the beginning- and end-of-period reservoir storages) and the tail-race water level HT_i(t) (a function of total release) at reservoir i during time step t; and Δt is the time interval. In this case, each reservoir operates under its own constraints (see Table 2), while the reservoir system operates under the system constraints (see Table 3).
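Eq. (28) can be checked with a short numerical sketch. The two-plant values below are hypothetical, chosen only to illustrate the formula; with release in m^3/s and head in m, the term 9.81·η·R′·H gives power in kW, so multiplying by the step length in hours yields energy in kW h:

```python
def step_energy(eta, power_release, head, dt_hours):
    """Energy generated during one time step, per Eq. (28):
    f_t = sum_i 9.81 * eta_i * R'_i(t) * H_i(t) * dt.
    Each summand is power in kW (release in m^3/s, head in m)."""
    return sum(9.81 * e * r * h
               for e, r, h in zip(eta, power_release, head)) * dt_hours

# Illustrative (hypothetical) values for a two-plant system:
eta = [0.90, 0.85]          # plant efficiencies eta_i
r_power = [5000.0, 800.0]   # power releases R'_i(t), m^3/s
head = [100.0, 40.0]        # average heads H_i(t), m
energy_kwh = step_energy(eta, r_power, head, dt_hours=240.0)  # ~10-day step
```

Any consistent unit system works; the 9.81 factor is gravitational acceleration times water density folded into kW units.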
5.2. Results and discussion
In the real-world case, the GZB and GBZ are treated as run-of-river hydropower plants because of their much smaller storages compared with the TGP, SBY and GHY (see Table 2), which are operated as storage reservoirs. The ratio of the active storages of the three storage reservoirs is approximately 20:2:1 (i.e., 221.5 × 10^8 m^3 : 24.0 × 10^8 m^3 : 11.5 × 10^8 m^3), and thus we discretize them into 200, 20 and 10 levels, respectively. The total numbers of storage combinations are 200 × 1 = 200, 20 × 10 = 200 and 200 × 20 × 10 = 40,000 for the three scenarios. Similarly, the computation was performed on the HPC system with Oracle 11g serving as the database system for data access.
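The combination counts above follow directly from the discretization; a minimal sketch (the reservoir names and the `levels` mapping are our own bookkeeping, with run-of-river plants contributing a single state):

```python
from math import prod

# Discretization levels per reservoir; run-of-river plants (GZB, GBZ) have 1 state
levels = {"TGP": 200, "GZB": 1, "SBY": 20, "GHY": 10, "GBZ": 1}

scenarios = {
    1: ["TGP", "GZB"],                        # TGP–GZB cascade
    2: ["SBY", "GHY", "GBZ"],                 # SBY–GHY–GBZ cascade
    3: ["TGP", "GZB", "SBY", "GHY", "GBZ"],   # joint five-reservoir system
}

# Storage combinations = product of the discretization levels involved
combos = {s: prod(levels[r] for r in rs) for s, rs in scenarios.items()}
# combos -> {1: 200, 2: 200, 3: 40000}
```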
The optimal energy generations of the three scenarios are shown in Table 4. The sum of the optimal energy production of scenario 1 and scenario 2 is 9893.7 + 693.2 = 10,586.9 × 10^8 kW h, less than the 10,636.5 × 10^8 kW h of scenario 3. This indicates that the coordinated operation of the five reservoirs can result in an average 4.96 × 10^8 kW h energy production increase (or about a 1.24 × 10^8 CNY increase, assuming the energy price of the five reservoirs is 0.25 CNY/kW h [18]) per year for the system. The comparisons of the releases at river confluence point l = 1 and of the output productions between scenarios 1 and 2 versus scenario 3 are shown in Figs. 11 and 12, respectively. Clearly, the SBY–GHY–GBZ system helps the TGP–GZB system relieve the stress of the water supply demand at river confluence point l = 1. The five-reservoir system can provide the grid with more secure and reliable output production.
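The gain from joint operation is simple arithmetic over the Table 4 totals and the 10-year horizon; as a sanity check (energy price as assumed above, per [18]):

```python
# Energy totals from Table 4, in units of 10^8 kWh over the 10-year horizon
separate = 9893.7 + 693.2       # scenarios 1 + 2, operated independently
joint = 10636.5                 # scenario 3, joint operation

gain_total = joint - separate   # ~49.6 x 10^8 kWh over 10 years
gain_per_year = gain_total / 10           # ~4.96 x 10^8 kWh per year
value_per_year = gain_per_year * 0.25     # ~1.24 x 10^8 CNY/yr at 0.25 CNY/kWh
```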
For scenario 3, the serial DP algorithm took more than 10 days (τ_1). As with the four-reservoir problem, we performed several executions of the developed parallel DP algorithm, varying the number of peer processes. The computing procedure is identical across executions, so the objective function value and the resulting storages and releases are the same. The wall clock time τ_K and the parallel efficiency E_K for various numbers of peer processes are shown in Table 5. The wall clock time is reduced from 266.83 h to 1.54 h with 350 peer processes. Again, this reduction is quite substantial, and an increase in the number of peer processes is consistently accompanied by a decrease in wall clock time. Although each peer process is allocated approximately the same amount of subtask, workload imbalance still exists, which may result in slightly lower efficiency relative to the hypothetical four-reservoir problem.
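The efficiencies in Table 5 follow from E_K = τ_1/(K·τ_K), i.e., speedup divided by process count. A sketch using the reported wall clock times; values recomputed this way agree with the table to within ±0.01, the rounding of the reported times:

```python
def parallel_efficiency(t_serial, t_parallel, k):
    """E_K = tau_1 / (K * tau_K): achieved speedup divided by K peer processes."""
    return t_serial / (k * t_parallel)

# Wall clock times (hours) from Table 5 for the five-reservoir system
t1 = 266.83
runs = {50: 10.11, 100: 5.17, 150: 3.51, 200: 2.61, 250: 2.12, 300: 1.77, 350: 1.54}

eff = {k: round(parallel_efficiency(t1, tk, k), 2) for k, tk in runs.items()}
```

The near-constant efficiency of about 0.5 across 50 to 350 processes is what the text refers to as good scalability.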
6. Conclusions
In this paper, we establish a multi-dimensional DP model for optimizing the joint operation of a multi-reservoir system, considering several indispensable constraints on the individual reservoirs and on the reservoir system. We illustrate the DP algorithm's solution space with the help of a matrix and estimate the smallest RAM required by the DP algorithm, in full preparation for parallelization. We believe an effective parallelization strategy for the DP algorithm should be designed specifically to alleviate the RAM bottleneck. For instance, the discretization should be sufficiently fine for an accurate simulation in real-time operation. If we use a discretization level of 10 cm for each of the five reservoirs in Section 5 and 30 days as the inflow forecast period, the RAM requirement increases to approximately 2.09 TB when using the serial DP algorithm. In this situation, a single computer or the shared memory architecture can no longer meet the large RAM requirement. The distributed memory architecture and the message passing interface (MPI) protocol make it possible to develop a parallel DP algorithm that considers both distributed computing and distributed computer memory and, further, to solve previously unsolvable multi-reservoir DP problems with parallel computing. In the above instance, the RAM requirement is reduced to approximately 2.09/K TB for each peer process, where K is the number of peer processes.
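A back-of-envelope reconstruction of the terabyte-scale figure is possible under our own assumptions (10 cm levels over each operating range in Table 2, 30 daily steps, and 4 bytes per stored transition entry); the paper's exact accounting may differ, but the order of magnitude matches the ~2.09 TB cited:

```python
# Operating water-level ranges (m) from Table 2, H(S_max) - H(S_min)
ranges_m = {"TGP": 175.0 - 145.0, "GZB": 66.0 - 63.0, "SBY": 400.0 - 350.0,
            "GHY": 200.0 - 180.0, "GBZ": 80.0 - 78.0}

# Number of 10 cm discretization levels per reservoir (our assumption)
levels = {r: round(h / 0.10) for r, h in ranges_m.items()}  # 300, 30, 500, 200, 20

combos = 1
for n in levels.values():
    combos *= n                 # 1.8e10 joint storage combinations

T = 30                          # 30-day forecast period, daily steps (assumed)
bytes_per_entry = 4             # one stored transition index per combo/step (assumed)
ram_tb = combos * T * bytes_per_entry / 1e12   # ~2.2 TB, order of the cited figure
```

Dividing `ram_tb` by the number of peer processes K gives the per-process footprint, which is what makes the distributed memory approach feasible.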
Using the developed parallel DP algorithm based on the peer-to-peer parallel paradigm, we solve the classic four-reservoir problem and a real-world five-reservoir system on an HPC system with up to 350 cores. The results indicate that the wall clock times are reduced drastically as the number of computing processes is increased. In both cases, we observe good performance in parallel efficiency. Furthermore, the real-world results indicate that operational effectiveness can be improved and the beneficial uses can be maximized by joint operation of the interconnected five reservoirs. Specifically, (1) the energy production of the system can be increased by an average of 4.96 × 10^8 kW h per year; (2) the stress of meeting the minimum water supply demand at river confluence point l = 1 can be relieved greatly by the inclusion of the SBY–GHY–GBZ system with the TGP–GZB system; and (3) more secure and reliable output productions can be guaranteed and transmitted to the grid.
The Casti et al. [5] paper sends a message of confidence in constructing computers with many processing elements to deal with future DP problems, even though at the time of that study (i.e., the 1970s) NASA's parallel computer had only 64 processing elements. Over the last twenty years, however, we have witnessed the rapid development of supercomputing worldwide. According to the TOP500 supercomputer sites' historical lists (http://www.top500.org/), the number of cores in top supercomputers has increased from several thousand to several million, and computer memory has
Table 5
The wall clock time and the parallel efficiency of the parallel DP algorithm for the five-reservoir system.

No. of peer processes   Serial   50      100    150    200    250    300    350
τ_K (h)                 266.83   10.11   5.17   3.51   2.61   2.12   1.77   1.54
E_K                     1.00     0.53    0.52   0.51   0.52   0.50   0.50   0.50
increased correspondingly. Given such advances, we predict that the barrier in computing resources may become inconsequential in the future. Moreover, we believe the benefits resulting from the hydropower generation of a multi-reservoir system far exceed the expense of the computing resources. We therefore hope this paper will stimulate future research on parallel computing in the field of reservoir operation.
Future work should implement the parallel DP algorithm using massively parallel computing resources. Indeed, we have noticed a trend toward massively parallel computing resources for water resources problems. For instance, Reed and Kollat [26] employed a massively parallel multi-objective evolutionary algorithm for a groundwater monitoring application with a maximum of 8192 processors, while Kollet et al. [11] and Maxwell [20], respectively, performed ParFlow simulations in various coupled modes utilizing up to 16,384 processors.
Future work also could introduce distributed database management techniques to further alleviate the RAM requirements of multi-reservoir DP problems. This paper distributes the optimal transitions C*_t and the cumulative returns F*_{t+1} and saves them in the RAM of several computing processes. Future work could distribute and save them in several databases on various computers. This would take advantage of large hard disk capacities, accessing the required data from the database of the identified computer when the DP algorithm solves the recursive equation and traces back the optimal path.
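The per-peer allocation m_k of storage combinations can be realized with a contiguous block partition. The sketch below is illustrative (the paper does not prescribe this exact partition scheme); it splits the 40,000 combinations of scenario 3 among 350 peers so that block sizes differ by at most one:

```python
def partition(n_combos, n_peers):
    """Split n_combos storage combinations into contiguous blocks, one per peer.
    Peer k evaluates combinations in [start, stop); sizes differ by at most 1."""
    base, extra = divmod(n_combos, n_peers)
    blocks, start = [], 0
    for k in range(n_peers):
        size = base + (1 if k < extra else 0)   # m_k for peer process k
        blocks.append((start, start + size))
        start += size
    return blocks

blocks = partition(40_000, 350)   # the five-reservoir case on 350 peers
# each peer holds 114 or 115 combinations (40,000 / 350 = 114.29)
```

Each peer then needs RAM only for its own block of C*_t and F*_t entries, which is the source of the roughly 1/K memory reduction noted above.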
Finally, the developed parallel DP algorithm can easily be applied to other DP-based variants, such as SDP, IDP and DDDP.
Notation

t            time index, t ∈ [1, T]
i            reservoir index, i ∈ [1, n]
l            river confluence point index, l ∈ [1, L]
k            peer process index, k ∈ [1, K]
S(t)         reservoir storage vector at the beginning of time step t, S(t) = [S_1(t), ..., S_i(t), ..., S_n(t)]^T
S(t + 1)     reservoir storage vector at the end of time step t
S_initial    initial reservoir storage vector
S_final      final expected reservoir storage vector
S_min(t + 1) minimum reservoir storage vector at the end of time step t
S_max(t + 1) maximum reservoir storage vector at the end of time step t
F_t          maximum cumulative return from the first time step to the beginning of the tth time step resulting from the joint operation of n reservoirs
f_t(·)       objective function to be maximized during time step t
I(t)         inflow vector during time step t
R(t)         total release vector during time step t, R(t) = [R_1(t), ..., R_i(t), ..., R_n(t)]^T
R_min(t)     minimum required release vector during time step t
R_max(t)     maximum allowable release vector during time step t
η_i          hydropower plant efficiency at reservoir i
R′_i(t)      power release from reservoir i during time step t
R″_i(t)      non-power release from reservoir i during time step t
H_i(t)       average head at reservoir i during time step t, equal to the average reservoir fore-bay water level HF_i(t) minus the tail-race water level HT_i(t)
Δt           time interval
M            reservoir system connectivity matrix
U_l          set of reservoirs in parallel at river confluence point l
PR_min,l(t)  minimum required release at river confluence point l during time step t
PR_max,l(t)  maximum allowable release at river confluence point l during time step t
N(t)         output vector during time step t, N(t) = [N_1(t), ..., N_i(t), ..., N_n(t)]^T
N_min(t)     minimum required output vector during time step t
N_max(t)     maximum allowable output vector during time step t
N_min(t)     minimum required output from a reservoir system during time step t
N_max(t)     maximum allowable output from a reservoir system during time step t
C            possible storage combination m^n × T matrix, C = [C(2), ..., C(t), ..., C(T + 1)]
C*           optimal transitions, C* = [C*_1, ..., C*_t, ..., C*_T]
F*_t         maximum cumulative returns from the first time step to the beginning of the tth time step, F*_t = [F*_{1,t}, ..., F*_{k,t}, ..., F*_{K,t}]^T
m            number of discretization levels of each reservoir
m_k          allocation number of storage combinations to peer process k
τ_1          wall clock time using 1 computing process
τ_K          wall clock time using K peer processes
Δτ           average wall clock time for a single objective function evaluation
τ′           computation time fraction
τ″           communication time fraction
τ‴           workload imbalance time cost
E_K          parallel efficiency
RAM_1        RAM amount for a single computing process (byte)
RAM_k        RAM amount for each peer process (byte)
Acknowledgements
The research is supported by the National Key Technologies R&D Program (#2013BAB05B03 and #2009BAC56B03) and the National Natural Science Foundation (#51109114) of China. The first author is supported by a fellowship from the Chinese government for his visit to the University of California, Los Angeles. Partial support is also provided by an AECOM endowment. The authors are very grateful to the two anonymous reviewers for their in-depth reviews and constructive comments, which greatly helped improve the paper.
References
[1] Bastian P, Helmig R. Efficient fully-coupled solution techniques for two-phase flow in porous media: parallel multigrid solution and large scale computations. Adv Water Resour 1999;23(3):199–216. http://dx.doi.org/10.1016/S0309-1708(99)00014-7.
[2] Bellman R. Adaptive control processes: a guided tour. Princeton, NJ: Princeton University Press; 1961.
[3] Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
[4] Bhaskar NR, Whitlatch EE. Derivation of monthly reservoir release policies. Water Resour Res 1980;16(6):987–93. http://dx.doi.org/10.1029/WR016i006p00987.
[5] Casti J, Richardson M, Larson R. Dynamic programming and parallel computers. J Optim Theory Appl 1973;12(4):423–38. http://dx.doi.org/10.1007/BF00940421.
[6] Chandramouli V, Raman H. Multireservoir modeling with dynamic programming and neural networks. J Water Resour Plann Manage 2001;127(2):89–98. http://dx.doi.org/10.1061/(ASCE)0733-9496(2001)127:2(89).
[7] El Baz D, Elkihel M. Load balancing methods and parallel dynamic programming algorithm using dominance technique applied to the 0–1 knapsack problem. J Parallel Distrib Comput 2005;65(1):74–84. http://dx.doi.org/10.1016/j.jpdc.2004.10.004.
[8] Hall WA, Butcher WS, Esogbue A. Optimization of the operation of a multiple-purpose reservoir by dynamic programming. Water Resour Res 1968;4(3):471–7. http://dx.doi.org/10.1029/WR004i003p00471.
[9] Heidari M, Chow VT, Kokotovic PV, Meredith DD. Discrete differential dynamic programming approach to water resources systems optimization. Water Resour Res 1971;7(2):273–82. http://dx.doi.org/10.1029/WR007i002p00273.
[10] Kollet SJ, Maxwell RM. Integrated surface–groundwater flow modeling: a free-surface overland flow boundary condition in a parallel groundwater flow model. Adv Water Resour 2006;29(7):945–58. http://dx.doi.org/10.1016/j.advwatres.2005.08.006.
[11] Kollet SJ, Maxwell RM, Woodward CS, Smith S, Vanderborght J, Vereecken H, Simmer C. Proof of concept of regional scale hydrologic simulations at hydrologic resolution utilizing massively parallel computer resources. Water Resour Res 2010;46(4). http://dx.doi.org/10.1029/2009WR008730.
[12] Kumar DN, Reddy MJ. Multipurpose reservoir operation using particle swarm optimization. J Water Resour Plann Manage 2007;133(3):192–201. http://dx.doi.org/10.1061/(ASCE)0733-9496(2007)133:3(192).
[13] Labadie JW. Optimal operation of multireservoir systems: state-of-the-art review. J Water Resour Plann Manage 2004;130(2):93–111. http://dx.doi.org/10.1061/(ASCE)0733-9496(2004)130:2(93).
[14] Larson RE, Korsak AJ. A dynamic programming successive approximations technique with convergence proofs. Automatica 1970;6(2):245–52. http://dx.doi.org/10.1016/0005-1098(70)90095-6.
[15] Larson RE. State increment dynamic programming. New York: Elsevier Science; 1968.
[16] Li T, Wang G, Chen J, Wang H. Dynamic parallelization of hydrological model simulations. Environ Modell Softw 2011;26(12):1736–46. http://dx.doi.org/10.1016/j.envsoft.2011.07.015.
[17] Li X, Wei J, Fu X, Li T, Wang G. A knowledge-based approach for reservoir system optimization. J Water Resour Plann Manage, in press. http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000379.
[18] Li X, Li T, Wei J, Wang G, Yeh WWG. Hydro unit commitment via mixed integer linear programming: a case study of the Three Gorges Project, China. IEEE Trans Power Syst, in press. http://dx.doi.org/10.1109/TPWRS.2013.2288933.
[19] Martins WS, Del Cuvillo JB, Useche FJ, Theobald KB, Gao GR. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison. Pac Symp Biocomput 2001;6:311–22.
[20] Maxwell RM. A terrain-following grid transform and preconditioner for parallel, large-scale, integrated hydrologic modeling. Adv Water Resour 2012;53:109–17. http://dx.doi.org/10.1016/j.advwatres.2012.10.001.
[21] Message Passing Interface Forum. <http://www.mpi-forum.org/index.html>.
[22] Mousavi SJ, Karamouz M. Computational improvement for dynamic programming models by diagnosing infeasible storage combinations. Adv Water Resour 2003;26(8):851–9. http://dx.doi.org/10.1016/S0309-1708(03)00061-7.
[23] Nemhauser GL. Introduction to dynamic programming. New York: John Wiley; 1966.
[24] Piccardi C, Soncini-Sessa R. Stochastic dynamic programming for reservoir optimal control: dense discretization and inflow correlation assumption made possible by parallel computing. Water Resour Res 1991;27(5):729–41. http://dx.doi.org/10.1029/90WR02766.
[25] Pool R. Massively parallel machines usher in next level of computing power. Science 1992;256:50–1. http://dx.doi.org/10.1126/science.256.5053.50.
[26] Reed PM, Kollat JB. Visual analytics clarify the scalability and effectiveness of massively parallel many-objective optimization: a groundwater monitoring design example. Adv Water Resour 2013;56:1–13. http://dx.doi.org/10.1016/j.advwatres.2013.01.011.
[27] Rouholahnejad E, Abbaspour KC, Vejdani M, Srinivasan R, Schulin R, Lehmann A. A parallelization framework for calibration of hydrological models. Environ Modell Softw 2012;31:28–36. http://dx.doi.org/10.1016/j.envsoft.2011.12.001.
[28] Rytter W. On efficient parallel computations for some dynamic programming problems. Theor Comput Sci 1988;59(3):297–307. http://dx.doi.org/10.1016/0304-3975(88)90147-8.
[29] Sulis A. GRID computing approach for multireservoir operating rules with uncertainty. Environ Modell Softw 2009;24(7):859–64. http://dx.doi.org/10.1016/j.envsoft.2008.11.003.
[30] Tan G, Sun N, Gao GR. A parallel dynamic programming algorithm on a multi-core architecture. In: Proceedings of the nineteenth annual ACM symposium on parallel algorithms and architectures (ACM 2007); 2007. p. 135–44. http://dx.doi.org/10.1145/1248377.1248399.
[31] Tang Y, Reed PM, Kollat JB. Parallelization strategies for rapid and robust evolutionary multiobjective optimization in water resources applications. Adv Water Resour 2007;30(3):335–53. http://dx.doi.org/10.1016/j.advwatres.2006.06.006.
[32] Trott WJ, Yeh WWG. Optimization of multiple reservoir system. J Hydraul Eng Div 1973;99(10):1865–84.
[33] Wang H, Fu X, Wang G, Li T, Gao J. A common parallel computing framework for modeling hydrological processes of river basins. Parallel Comput 2011;37(6–7):302–15. http://dx.doi.org/10.1016/j.parco.2011.05.003.
[34] Wardlaw R, Sharif M. Evaluation of genetic algorithms for optimal reservoir system operation. J Water Resour Plann Manage 1999;125(1):25–33. http://dx.doi.org/10.1061/(ASCE)0733-9496(1999)125:1(25).
[35] Wurbs RA. Reservoir-system simulation and optimization models. J Water Resour Plann Manage 1993;119(4):455–72. http://dx.doi.org/10.1061/(ASCE)0733-9496(1993)119:4(455).
[36] Wu Y, Li T, Sun L, Chen J. Parallelization of a hydrological model using the message passing interface. Environ Modell Softw 2013;43:124–32. http://dx.doi.org/10.1016/j.envsoft.2013.02.002.
[37] Yakowitz S. Dynamic programming applications in water resources. Water Resour Res 1982;18(4):673–96. http://dx.doi.org/10.1029/WR018i004p00673.
[38] Yeh WWG. Reservoir management and operations models: a state-of-the-art review. Water Resour Res 1985;21(12):1797–818. http://dx.doi.org/10.1029/WR021i012p01797.
[39] Young GK. Finding reservoir operating rules. J Hydraul Eng Div 1967;93(HY6):297–321.
[40] Zhao T, Cai X, Lei X, Wang H. Improved dynamic programming for reservoir operation optimization with a concave objective function. J Water Resour Plann Manage 2012;138(6):590–6. http://dx.doi.org/10.1061/(ASCE)WR.1943-5452.0000205.