MSG Pass Dyn PDF
Foundations and Trends in Optimization
Vol. 1, No. 2 (2013) 70–122
© 2013 M. Kraning, E. Chu, J. Lavaei, and S. Boyd
DOI: xxx
1 Introduction
1.1 Overview
1.2 Related work
1.3 Outline
2 Network Model
2.1 Formal definition and notation
2.2 Dynamic optimal power flow problem
2.3 Discussion
2.4 Example
3 Device Examples
3.1 Generators
3.2 Transmission lines
3.3 Converters and interface devices
3.4 Storage devices
3.5 Loads
4 Convexity
4.1 Devices
4.2 Relaxations
7 Extensions
7.1 Closed-loop control
7.2 Security constrained optimal power flow
7.3 Hierarchical models and virtualized devices
7.4 Local stopping criteria and ρ updates
8 Conclusion
Abstract
We consider a network of devices, such as generators, fixed loads, de-
ferrable loads, and storage devices, each with its own dynamic con-
straints and objective, connected by AC and DC lines. The problem is
to minimize the total network objective subject to the device and line
constraints over a time horizon. This is a large optimization problem
with variables for consumption or generation for each device, power
flow for each line, and voltage phase angles at AC buses in each period.
We develop a decentralized method for solving this problem called
proximal message passing. The method is iterative: At each step, each
device exchanges simple messages with its neighbors in the network
and then solves its own optimization problem, minimizing its own ob-
jective function, augmented by a term determined by the messages it
has received. We show that this message passing method converges to
a solution when the device objective and constraints are convex. The
method is completely decentralized, and needs no global coordination
other than synchronizing iterations; the problems to be solved by each
device can typically be solved extremely efficiently and in parallel.
The proximal message passing method is fast enough that even a se-
rial implementation can solve substantial problems in reasonable time
frames. We report results for several numerical experiments, demon-
strating the method’s speed and scaling, including the solution of a
problem instance with over 30 million variables in 5 minutes for a serial
implementation; with decentralized computing, the solve time would be
less than one second.
1 Introduction
1.1 Overview
A traditional power grid is operated by solving a number of opti-
mization problems. At the transmission level, these problems include
unit commitment, economic dispatch, optimal power flow (OPF), and
security-constrained OPF (SC-OPF). At the distribution level, these
problems include loss minimization and reactive power compensation.
With the exception of the SC-OPF, these optimization problems are
static with a modest number of variables (often fewer than 10,000), and
are solved on time scales of 5 minutes or more. However, the operation
of next generation electric grids (i.e., smart grids) will rely critically on
solving large-scale, dynamic optimization problems involving hundreds
of thousands of devices jointly optimizing tens to hundreds of millions
of variables, on the order of seconds rather than minutes [16, 41]. More
precisely, the distribution level of a smart grid will include various types
of active dynamic devices, such as distributed generators based on solar
and wind, batteries, deferrable loads, curtailable loads, and electric ve-
hicles, whose control and scheduling amount to a very complex power
management problem [59, 9].
In this paper, we consider a general problem, which we call the
1.3 Outline
The rest of this paper is organized as follows. In Chapter 2 we give the
formal definition of our network model. In Chapter 3 we give examples
of how to model specific devices such as generators, deferrable loads
and energy storage systems in our formal framework. In Chapter 4, we
describe the role that convexity plays in the D-OPF and introduce the
idea of convex relaxations as a tool to find solutions to the D-OPF in
the presence of non-convex device objective functions. In Chapter 5 we
derive the proximal message passing equations. In Chapter 6 we present
a series of numerical examples, and in Chapter 7 we discuss how our
framework can be extended to include use cases we do not explicitly
cover in this paper.
2 Network Model
2.3 Discussion
We now describe our model in a less formal manner. Generators, loads,
energy storage systems, and other power sources and sinks are modeled
as single terminal devices. Transmission lines (or more generally, any
Figure 2.1: A simple network (left); its transformation into standard form (right).
2.4 Example
We illustrate how a traditional power network can be recast into our
network model in Figure 2.1. The original power network, shown on
the left, contains 2 loads, 3 buses, 3 transmission lines, 2 generators,
and a single battery storage system. We can transform this small power
grid into our model by representing it as a network with 11 terminals,
8 devices, and 3 nets, shown on the right of Figure 2.1. Terminals are
shown as small filled circles. Single terminal devices, which are used
to model loads, generators, and the battery, are shown as boxes. The
transmission lines are two terminal devices represented by solid lines.
The nets are shown as dashed rounded boxes. Terminals are associated
with the device they touch and the net in which they are contained.
The set of terminals can be partitioned by either the devices they
are associated with, or the nets in which they are contained. Figure 2.2
shows the network in Figure 2.1 as a bipartite graph, with devices on
the left and nets on the right. In this representation, terminals are
represented by the edges of the graph.
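The bipartite view can be sketched in a few lines of Python. The device and net names and the wiring below are hypothetical: the text fixes only the counts (2 loads, 2 generators, 1 battery, 3 lines, 3 buses), not which device sits on which bus. Each terminal is stored as a (device, net) pair, so it is simultaneously an edge of the graph and a member of exactly one device group and one net group.

```python
from collections import defaultdict

# Hypothetical wiring for the network of Figure 2.1 (names are illustrative).
single_terminal = {"L1": "N1", "L2": "N2", "G1": "N1", "G2": "N3", "B1": "N2"}
two_terminal = {"T1": ("N1", "N2"), "T2": ("N2", "N3"), "T3": ("N1", "N3")}

# One terminal per (device, net) attachment, i.e. one edge of the bipartite graph.
terminals = [(dev, net) for dev, net in single_terminal.items()]
terminals += [(dev, net) for dev, nets in two_terminal.items() for net in nets]

# The same terminal set, partitioned by device and by net.
by_device = defaultdict(list)
by_net = defaultdict(list)
for dev, net in terminals:
    by_device[dev].append((dev, net))
    by_net[net].append((dev, net))

print(len(terminals), len(by_device), len(by_net))  # 11 8 3
```

Both partitions cover every terminal exactly once, which is what lets the later per-device and per-net computations proceed independently.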
Figure 2.2: The network in Figure 2.1 represented as a bipartite graph. Devices
(boxes) are shown on the left with their associated terminals (dots). The terminals
are connected to their corresponding nets (solid boxes) on the right.
3 Device Examples
3.1 Generators
A generator is a single-terminal device with power schedule $p_{\mathrm{gen}}$,
which generates power over a range, $P^{\min} \le -p_{\mathrm{gen}} \le P^{\max}$,
and has ramp-rate constraints
$$R^{\min} \le -D p_{\mathrm{gen}} \le R^{\max},$$
which limit the change of power levels from one period to the next.
Here, the operator $D \in \mathbf{R}^{(T-1) \times T}$ is the forward difference operator,
defined as $(Dx)(\tau) = x(\tau + 1) - x(\tau)$ for $\tau = 1, \ldots, T - 1$.
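As a minimal sketch, the generator constraints can be checked numerically as follows. The function names and tolerance are our own, and the matrix `D` is the standard forward difference, $(Dx)(\tau) = x(\tau+1) - x(\tau)$, which we assume matches the definition intended here.

```python
import numpy as np

def forward_difference(T):
    """Return D in R^{(T-1) x T} with (D x)(tau) = x(tau+1) - x(tau)."""
    D = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def generator_feasible(p_gen, P_min, P_max, R_min, R_max, tol=1e-9):
    """Check the power-range and ramp-rate constraints on a schedule p_gen.

    Generation is negative consumption, so both constraints act on -p_gen.
    """
    out = -np.asarray(p_gen, dtype=float)
    ramp = forward_difference(len(out)) @ out
    in_range = np.all(out >= P_min - tol) and np.all(out <= P_max + tol)
    ramp_ok = np.all(ramp >= R_min - tol) and np.all(ramp <= R_max + tol)
    return bool(in_range and ramp_ok)

# A schedule ramping output from 1.0 to 2.0 in steps of 0.5 (illustrative values).
print(generator_feasible([-1.0, -1.5, -2.0], P_min=0, P_max=3, R_min=-1, R_max=1))
```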
$$p_1(\tau) + p_2(\tau) = 0, \quad |p_1(\tau)| \le C^{\max}, \quad \tau = 1, \ldots, T.$$
If the phase shifter can only support power flow in one direction, say,
from terminal 1 to terminal 2, then in addition we have the inequalities
$p_1(\tau) \ge 0$, $\tau = 1, \ldots, T$. The voltage phase angles $\theta_1$ and
$\theta_2$ are unconstrained. (Indeed, this is what a phase shifter is meant to do.)
When there is no capacity constraint, i.e., $C^{\max} = \infty$, we can think of a
phase shifter as a special type of net for AC terminals that enforces power
balance, but not voltage phase consistency. (However, we model it as
a device, not a net.)
where $q^{\mathrm{init}}$ is the initial charge. It has zero cost function and the
charge level must not exceed the battery capacity, i.e., $0 \le q(\tau) \le Q^{\max}$,
$\tau = 1, \ldots, T$. It is common to constrain the terminal battery charge
$q(T)$ to be some specified value or to match the initial charge $q^{\mathrm{init}}$.
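The capacity and terminal-charge constraints can be illustrated with a short check. The charge dynamics are not shown in the text above, so the sketch assumes the simplest lossless form, in which the charge equals the initial charge plus the cumulative power drawn by the battery; the function names are ours.

```python
import numpy as np

def battery_charge(p_bat, q_init):
    """Charge trajectory q(1..T), ASSUMING q(tau) = q_init + sum_{t <= tau} p_bat(t)."""
    return q_init + np.cumsum(np.asarray(p_bat, dtype=float))

def battery_feasible(p_bat, q_init, Q_max, q_final=None, tol=1e-9):
    """Check 0 <= q(tau) <= Q_max and, optionally, the terminal charge q(T)."""
    q = battery_charge(p_bat, q_init)
    ok = np.all(q >= -tol) and np.all(q <= Q_max + tol)
    if q_final is not None:
        ok = ok and abs(q[-1] - q_final) <= tol
    return bool(ok)

# Charge for two periods, then discharge back to the initial level.
print(battery_feasible([1.0, 1.0, -2.0], q_init=0.0, Q_max=2.0, q_final=0.0))
```

Passing `q_final=q_init` expresses the common requirement that the schedule leaves the battery where it started.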
More sophisticated battery models include (possibly state-
dependent) charging and discharging inefficiencies as well as charge
leakage [26]. In addition, they can include costs which penalize exces-
sive charge-discharge cycling.
The same general form can be used to model other types of energy
storage systems, such as those based on super-capacitors, flywheels,
pumped hydro, or compressed air, to name just a few.
3.5 Loads
Fixed load. A fixed energy load is a single-terminal device with zero
cost function which consists of a desired consumption profile, $l \in \mathbf{R}^T$.
This consumption profile must be satisfied in each period, i.e., we have
the constraint $p_{\mathrm{load}} = l$.

$$\alpha \mathbf{1}^T (l - p_{\mathrm{load}})_+,$$
4.1 Devices
We call a device convex if its objective function is convex. A network is
convex if all of its devices are convex. For convex networks, the D-OPF
is a convex optimization problem, which means that in principle we can
efficiently find a global solution [7]. When the network is not convex,
even finding a feasible solution for the D-OPF can become difficult,
and finding and certifying a globally optimal solution to the D-OPF
is generally intractable. However, special structure in many practical
power distribution problems can allow us to guarantee optimality.
In the examples from Chapter 3, the inverter, rectifier, phase shifter,
battery, fixed load, thermal load, curtailable load, electric vehicle, and
external tie are all convex devices using the constraints and objective
functions given. A deferrable load is convex if we drop the constraint
that it can only be turned on or off. We discuss the convexity properties
of the generator and AC and DC transmission lines next.
4.2 Relaxations
One technique to deal with non-convex networks is to use convex
relaxations. We use the notation $g^{\mathrm{env}}$ to denote the convex envelope [52]
of the function $g$. There are many equivalent definitions for the convex
envelope, for example, $g^{\mathrm{env}} = (g^*)^*$, where $g^*$ denotes the convex
conjugate of the function $g$. We can equivalently define $g^{\mathrm{env}}$ to be the
largest convex lower bound of $g$. If $g$ is a convex, closed, proper (CCP)
function, then $g = g^{\mathrm{env}}$.
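For a function of one variable sampled on a grid, the "largest convex lower bound" characterization can be approximated by taking the lower convex hull of the sample points. The sketch below does this with a monotone-chain scan. The cost function used in the example (zero when off, $0.5 + 0.1x^2$ when the output $x$ lies in $[1, 4]$) is a made-up stand-in for a generator that can be turned off, not the cost used in this paper.

```python
import numpy as np

def lower_convex_envelope(x, g):
    """Piecewise-linear lower convex envelope of the points (x[i], g[i]).

    x must be strictly increasing. Returns the envelope evaluated at x.
    """
    x, g = np.asarray(x, dtype=float), np.asarray(g, dtype=float)
    hull = []  # indices of the vertices kept on the lower hull
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (g[i] - g[a]) - (g[b] - g[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], g[hull])

# Hypothetical on/off generator cost, sampled only where it is finite.
x = np.concatenate(([0.0], np.linspace(1.0, 4.0, 31)))
g = np.where(x == 0.0, 0.0, 0.5 + 0.1 * x**2)
env = lower_convex_envelope(x, g)
```

In this example the envelope is a straight line from the origin up to the point where it touches the quadratic, and coincides with the original cost beyond that point.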
The relaxed dynamic optimal power flow problem (RD-OPF) is
Figure 4.1: Left: Cost function $\phi_{\mathrm{gen}}(p_{\mathrm{gen}})$ for a generator that can be
turned off. Right: Its convex relaxation $\phi^{\mathrm{env}}_{\mathrm{gen}}(p_{\mathrm{gen}})$. Both are plotted
against $-p_{\mathrm{gen}}$.
Figure 4.2: Left: Feasible sets of transmission lines with no loss (black) and AC
loss (grey), shown in the $(p_1, p_2)$ plane. Right: Their convex relaxations.
as shown in Figure 4.2 in black. When the transmission line has losses,
in most cases the loss function ℓ is a convex function of the input and
output powers, which leads to a feasible power region like the grey arc
in the left part of Figure 4.2.
The feasible set of a relaxed transmission line is given by the convex
hull of the original transmission line’s constraints. The right side of
Figure 4.2 shows examples of this for both lossless and lossy transmission
lines. Physically, this relaxation gives lossy transmission lines the ability
to discard some additional power beyond what is simply lost to heat.
Since electricity is generally a valuable commodity in power networks,
the transmission lines will generally not throw away any additional
power in the optimal solution to the RD-OPF, leading to the power
line constraints in the RD-OPF being tight and thus also satisfying
the unrelaxed power line constraints in the original D-OPF. As was
shown in [40], when the network is a tree, this relaxation is always
tight. In addition, when all locational marginal prices are positive and
no other non-convexities exist in the network, the line constraints in
the RD-OPF are guaranteed to be tight for networks in which every loop
contains a separate phase shifter whose shift parameter can be freely
chosen [54].
5 Proximal Message Passing
Notation
Whenever we have a set of variables that maps terminals to time periods,
$x : \mathcal{T} \to \mathbf{R}^T$ (which we can also associate with a $|\mathcal{T}| \times T$ matrix),
we will use the same index, over-line, and tilde notation for the variables
$x$ as we do for power schedules $p$ and phase schedules $\theta$. For example,
$x_t \in \mathbf{R}^T$ consists of the time period vector of values of $x$ associated
with terminal $t$, $\bar{x}_t = (1/|n|) \sum_{t' \in n} x_{t'}$, where $t \in n$, and $\tilde{x}_t = x_t - \bar{x}_t$,
5.1 Derivation
We derive the proximal message passing equations by reformulating the
D-OPF using the alternating direction method of multipliers (ADMM)
and then simplifying the resulting equations. We refer the reader to [6]
for a thorough overview of ADMM.
We first rewrite the D-OPF as
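$$\begin{array}{ll} \text{minimize} & \sum_{d \in \mathcal{D}} f_d(p_d, \theta_d) + \sum_{n \in \mathcal{N}} \big( g_n(z_n) + h_n(\xi_n) \big) \\ \text{subject to} & p = z, \quad \theta = \xi, \end{array}$$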
$$u_n^{k+1} := u_n^k + (p_n^{k+1} - z_n^{k+1}), \quad n \in \mathcal{N},$$
$$v_n^{k+1} := v_n^k + (\theta_n^{k+1} - \xi_n^{k+1}), \quad n \in \mathcal{N},$$
where the first step is carried out in parallel by all devices, and then
the second and third steps, followed by the fourth and fifth steps, are
carried out in parallel by all nets.
Since gn (zn ) and hn (ξn ) are simply indicator functions for each
net n, the second and third steps of the algorithm can be computed
analytically and are given by
$$u_n^{k+1} := u_n^k + \bar{p}_n^{k+1}, \quad n \in \mathcal{N},$$
$$v_n^{k+1} := \tilde{v}_n^k + \tilde{\theta}_n^{k+1}, \quad n \in \mathcal{N},$$
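To make the iteration concrete, here is a minimal sketch for a deliberately tiny case: a single net, a single time period ($T = 1$), no phase variables, and single-terminal devices with quadratic costs $f_d(p) = (a_d/2)p^2 + b_d p$. These simplifications and the cost coefficients are our own assumptions. The device step applies the prox operator to $p_d^k - \bar{p}^k - u^k$, which is the standard exchange form of ADMM described in [6]; the price step is the $u$ update above.

```python
import numpy as np

def prox_quadratic(v, a, b, rho):
    """argmin_p (a/2) p^2 + b p + (rho/2) (p - v)^2, computed elementwise."""
    return (rho * v - b) / (a + rho)

def message_passing_single_net(a, b, rho=1.0, iters=200):
    """Run power-only message passing for devices sharing one net (T = 1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = np.zeros_like(a)  # power of each device's single terminal
    u = 0.0               # scaled price of the net
    for _ in range(iters):
        p_bar = p.mean()                               # average power on the net
        p = prox_quadratic(p - p_bar - u, a, b, rho)   # device updates (parallel)
        u = u + p.mean()                               # scaled price update
    return p, u

a, b, rho = [1.0, 2.0, 4.0], [-3.0, 1.0, 2.0], 1.0
p, u = message_passing_single_net(a, b, rho)
print(p.sum())                    # approaches 0: power balance on the net
print(np.array(a) * p + b, -rho * u)
```

At a fixed point of this sketch the net is balanced and every device has the same marginal cost, $a_d p_d + b_d = -\rho u$, which follows from the stationarity condition of the prox step.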
5.2 Convergence
Theory. We now comment on the convergence of proximal message
passing. Since proximal message passing is a version of ADMM, all
convergence results that hold for ADMM also hold for proximal message
passing. In particular, when all devices have CCP objective functions
and a feasible solution to the D-OPF exists, the following hold:
2. Operation is optimal: $\sum_{d \in \mathcal{D}} f_d(p_d^k, \theta_d^k) \to f^\star$ as $k \to \infty$.
3. Optimal prices are found: $\rho u^k = y_p^k \to y_p^\star$ and $\rho v^k = y_\theta^k \to y_\theta^\star$ as
$k \to \infty$.
Here $f^\star$ is the optimal value for the D-OPF, and $y_p^\star$ and $y_\theta^\star$ are optimal
dual variables for the power schedule and phase consistency constraints,
respectively. The proof of these results (in the more general setting)
can be found in [6]. As a result of the third condition, the optimal
locational marginal prices $L^\star$ can be found for each net $n \in \mathcal{N}$ by
setting $L_n^\star = |n| (y_p^\star)_n$.
Stopping criterion. Following [6], we can define primal and dual resid-
uals, which for proximal message passing simplify to
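where $\epsilon^{\mathrm{pri}}$ and $\epsilon^{\mathrm{dual}}$ are, respectively, primal and dual tolerances. We
can normalize both of these quantities to network size by the relation
$$\epsilon^{\mathrm{pri}} = \epsilon^{\mathrm{dual}} = \epsilon^{\mathrm{abs}} \sqrt{|\mathcal{T}| T},$$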
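The normalization can be sketched as below. The residual norms are passed in by the caller, since their formulas are not reproduced here; the function names and the default value of `eps_abs` are illustrative assumptions.

```python
import math

def tolerances(num_terminals, T, eps_abs):
    """Return (eps_pri, eps_dual) = eps_abs * sqrt(|T| * T) for both tolerances."""
    eps = eps_abs * math.sqrt(num_terminals * T)
    return eps, eps

def should_stop(r_norm, s_norm, num_terminals, T, eps_abs=1e-3):
    """Stop when the primal and dual residual norms are within tolerance."""
    eps_pri, eps_dual = tolerances(num_terminals, T, eps_abs)
    return r_norm <= eps_pri and s_norm <= eps_dual

# Example: 11 terminals and 96 periods give a tolerance of about 0.0325.
print(tolerances(11, 96, 1e-3))
```

Scaling by $\sqrt{|\mathcal{T}| T}$ means `eps_abs` can be read as a bound on the root-mean-square residual per terminal and period, independent of network size.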
$$\rho^{k+1} := h(\rho^k, r^k, s^k),$$
$$u^{k+1} := \frac{\rho^k}{\rho^{k+1}} u^{k+1},$$
$$v^{k+1} := \frac{\rho^k}{\rho^{k+1}} v^{k+1},$$
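The rescaling keeps the unscaled dual variables $\rho u$ and $\rho v$ unchanged when $\rho$ changes. In the sketch below, the function `update_rho` plays the role of $h$; since $h$ is not specified in this passage, we substitute the residual balancing heuristic discussed in [6], with parameters `mu` and `tau` chosen arbitrarily. It should be read as one possible choice, not as the update used by the authors.

```python
import numpy as np

def update_rho(rho, r_norm, s_norm, mu=10.0, tau=2.0):
    """Stand-in for h: residual balancing (an assumption, not the paper's rule)."""
    if r_norm > mu * s_norm:
        return rho * tau   # primal residual dominates: increase rho
    if s_norm > mu * r_norm:
        return rho / tau   # dual residual dominates: decrease rho
    return rho

def rescale_duals(u, v, rho_old, rho_new):
    """Rescale the scaled duals so that rho * u and rho * v are preserved."""
    scale = rho_old / rho_new
    return scale * u, scale * v

u, v = np.array([1.0, -2.0]), np.array([0.5])
rho_old = 1.0
rho_new = update_rho(rho_old, r_norm=5.0, s_norm=0.1)
u_new, v_new = rescale_duals(u, v, rho_old, rho_new)
```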
using the prox functions of the relaxed device objective functions. Since
$f_d^{\mathrm{env}}$ is a CCP function for all $d \in \mathcal{D}$, proximal message passing in this
case is guaranteed to converge to the optimal value of the RD-OPF
and yield the optimal relaxed locational marginal prices.
5.3 Discussion
To compute the proximal messages, devices and nets only require
knowledge of who their network neighbors are, the ability to send small
vectors of numbers to those neighbors in each iteration, and the abil-
ity to store small amounts of state information and efficiently compute
prox functions (devices) or projections (nets). As all communication is
local and peer-to-peer, proximal message passing supports the ad hoc
formation of power networks, such as micro grids, and is self-healing
and robust to device failure and unexpected network topology changes.
Due to recent advances in convex optimization [61, 46, 47], many
of the prox function calculations that devices must perform can be
very efficiently executed at millisecond or microsecond time-scales on
inexpensive, embedded processors [30]. Since all devices and all nets
can each perform their computations in parallel, the time to execute a
single, network-wide proximal message passing iteration (ignoring
communication overhead) is equal to the sum of the maximum computation
time over all devices and the maximum computation time of all nets in
the network. As a result, the computation time per iteration is small
and essentially independent of the size of the network.
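The per-iteration timing argument amounts to the following arithmetic. The timings in the example are invented for illustration; the serial figure is included only as a point of comparison.

```python
def parallel_iteration_time(device_times, net_times):
    """Devices run in parallel, then nets run in parallel: max plus max."""
    return max(device_times) + max(net_times)

def serial_iteration_time(device_times, net_times):
    """A single processor performs every update in turn: sum plus sum."""
    return sum(device_times) + sum(net_times)

# Hypothetical per-update computation times in seconds.
device_times = [1e-3, 2e-3, 5e-4]
net_times = [5e-4, 1e-4]
print(parallel_iteration_time(device_times, net_times))
print(serial_iteration_time(device_times, net_times))
```

Adding more devices or nets changes the parallel figure only if one of them is slower than the current slowest, which is why the time per iteration is essentially independent of network size.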
In contrast, solving the D-OPF in a centralized fashion requires
In this way, when the distance between $i$ and $j$ is smaller than $d$, they
are connected with a fixed probability $\alpha > 0$, and when they are located
farther than distance $d$ apart, the probability decays as $1/\|x_i - x_j\|_2^2$.
After this process, we add a transmission line between any isolated net
and its nearest neighbor. We then introduce transmission lines between
distinct connected components by selecting two connected components
uniformly at random and then selecting two nets, one inside each com-
ponent, uniformly at random and connecting them by a transmission
line. We continue this process until the network is connected.
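The procedure can be sketched as follows. Two details are assumptions on our part because they are not stated in the passage above: the nets are placed uniformly in the square $[0, \sqrt{N}]^2$, and the link probability is taken to be $\alpha \min(1, d^2/\|x_i - x_j\|_2^2)$, which equals $\alpha$ within distance $d$ and decays as the inverse squared distance beyond it.

```python
import numpy as np

def generate_topology(N, d=0.11, alpha=0.8, seed=0):
    """Return net locations and a sorted list of lines (i, j) with i < j."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, np.sqrt(N), size=(N, 2))   # assumed placement domain
    dist2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist2, np.inf)                 # no self-connections

    # Assumed link probability: alpha within d, alpha * d^2 / dist^2 beyond.
    prob = alpha * np.minimum(1.0, d**2 / dist2)
    upper = np.triu(rng.random((N, N)) < prob, k=1)
    edges = {(int(i), int(j)) for i, j in zip(*np.nonzero(upper))}

    # Connect every isolated net to its nearest neighbor.
    degree = np.zeros(N, dtype=int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    for i in np.flatnonzero(degree == 0):
        j = int(np.argmin(dist2[i]))
        edges.add((min(int(i), j), max(int(i), j)))

    # Track connected components with a union-find structure.
    parent = list(range(N))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    # Join two random components by a random line until one component remains.
    while True:
        roots = sorted({find(i) for i in range(N)})
        if len(roots) == 1:
            break
        ra, rb = rng.choice(roots, size=2, replace=False)
        i = int(rng.choice([k for k in range(N) if find(k) == ra]))
        j = int(rng.choice([k for k in range(N) if find(k) == rb]))
        edges.add((min(i, j), max(i, j)))
        parent[find(i)] = find(j)
    return x, sorted(edges)

locations, lines = generate_topology(200)
```

The component search here is written for clarity rather than speed; it is adequate for a few thousand nets but would need a more careful implementation at the largest sizes considered.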
For the examples we present, we generate networks using the parameter
values d = 0.11 and α = 0.8. This results
in networks with an average degree of 2.1. Using these parameters, we
generated networks with 30 to 100000 nets, which resulted in optimiza-
tion problems with approximately 10 thousand to 30 million variables.
6.2 Devices
After we generate the network topology described above, we randomly
attach a single (one-terminal) device to each net according to the
distribution in Table 6.1. We also allow the possibility that a net acts as
a distributor and has no device attached to it other than transmission
lines. About 10% of the transmission lines are DC transmission lines,
while the others are AC transmission lines. The models for each device
and line in the network are identical to the ones given in Chapter 3,
with model parameters chosen in a manner we describe below.
For simplicity, our examples only include networks with the devices
listed below. For all devices, the time horizon was chosen to be T = 96,
corresponding to 15 minute intervals for 24 hour schedules, with the
time period τ = 1 corresponding to midnight.
Device              Fraction
None                0.4
Generator           0.4
Curtailable load    0.1
Deferrable load     0.05
Battery             0.05
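Drawing a device type for each net from this distribution can be sketched with the standard library. The dictionary simply transcribes the table; the function name and the use of the string "None" for a net with no attached device are our own conventions.

```python
import random

# Fractions transcribed from the table above.
DEVICE_FRACTIONS = {
    "None": 0.4,
    "Generator": 0.4,
    "Curtailable load": 0.1,
    "Deferrable load": 0.05,
    "Battery": 0.05,
}

def attach_devices(num_nets, seed=0):
    """Independently sample one device type per net from DEVICE_FRACTIONS."""
    rng = random.Random(seed)
    kinds = list(DEVICE_FRACTIONS)
    weights = list(DEVICE_FRACTIONS.values())
    return rng.choices(kinds, weights=weights, k=num_nets)

assignment = attach_devices(3000)
```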
tors lie in between. Large generators are generally more efficient than
small and medium generators, which is reflected in their cost function by
having smaller values of α and β. Whenever a generator is placed into
a network, its type is selected uniformly at random, and its parameters
are taken from the appropriate row in Table 6.2.
Fixed load. The load profile for a fixed load instance is a sinusoid,
uniformly at random from the interval [60, 72], ensuring that the load
profile peaks between the hours of 3pm and 6pm.
Figure 6.1: The relative suboptimality $|f^k - f^\star|/f^\star$ (left) and primal
infeasibility $\|\bar{p}^k\|_2/\sqrt{|\mathcal{T}| T}$ (right) of proximal message passing on a
network instance with N = 3000 nets (1 million variables), plotted against
iteration $k$. The dashed line shows when the stopping criterion is satisfied.
6.5 Results
We first consider a single example: a network instance with N = 3000 (1
million variables). Figure 6.1 shows that after fewer than 200 iterations
of proximal message passing, the relative suboptimality, the average net
power imbalance, and the average phase inconsistency are all less than
$10^{-3}$. The convergence rates for other network instances
over the range of sizes we simulated are similar.
In Figure 6.2, we present average timing results for solving the D-
OPF for a family of examples, using our serial implementation, with
networks of size N = 30, 100, 300, 1000, 3000, 10000, 30000, and
100000. For each network size, we generated and solved 10 network in-
stances to compute average solve times and confidence intervals around
those averages. The times were modeled with a log-normal distribution.
For network instances with N = 100000 nets, the problem has over 30
million variables, which we solve serially using proximal message pass-
ing in 5 minutes on average. By fitting a line to the proximal message
passing runtimes, we find that our parallel implementation empirically
scales as $O(N^{0.996})$, i.e., solve time is linear in problem size.
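The exponent comes from a straight-line fit in log-log coordinates, which can be sketched as below. The network sizes are those listed above, but the runtimes are synthetic values generated from an assumed power law, not the measured times.

```python
import numpy as np

def scaling_exponent(sizes, times):
    """Fit times ~ c * sizes**k by least squares on logarithms; return (k, c)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope, np.exp(intercept)

sizes = np.array([30, 100, 300, 1000, 3000, 10000, 30000, 100000], dtype=float)
synthetic_times = 3e-3 * sizes**0.996   # made-up data following a power law
k, c = scaling_exponent(sizes, synthetic_times)
print(k, c)
```

Fitting in logarithms also matches the log-normal model of the solve times, since it treats multiplicative timing noise as additive.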
For a peer-to-peer implementation, the runtime of proximal message
passing should be essentially constant, and in particular independent of
the size of the network. To solve a problem with N = 100000 nets (30
million variables) with approximately 200 iterations of our algorithm
then takes only 200 ms. In practice, the actual solve time would clearly
be dominated by network communication latencies and actual runtime
performance will be determined by how quickly and reliably packets can
be delivered [34]. As a result, in a true peer-to-peer implementation, a
negligible amount of time is actually spent on computation. However,
it goes without saying that many other issues must be addressed with a
peer-to-peer protocol, including handling network delays and security.
Figure 6.2 shows cold start runtimes for solving the D-OPF. If we
have good estimates of the power and phase schedules and dual vari-
ables for each terminal, we can use them to warm start our D-OPF
solver. To show the effect, we randomly convert 5% of the devices into
fixed loads and solve a specific instance with N = 3000 nets (1 million
variables). Let $K^{\mathrm{cold}}$ be the number of iterations needed to solve an
instance of this problem. We then uniformly scale the load profiles of
each device by separate and independent lognormal random variables.
The new profiles, $\hat{l}$, are obtained from the original profiles $l$ via
$$\hat{l} = l \exp(\sigma X),$$
where $X \sim \mathcal{N}(0, 1)$, and $\sigma > 0$ is given. Using the original solution
to warm start our solver, we solve the perturbed problem and report
the number of iterations $K^{\mathrm{warm}}$ needed. Figure 6.3 shows the ratio
$K^{\mathrm{warm}}/K^{\mathrm{cold}}$ as we vary $\sigma$, showing the significant savings possible
with warm-starting even under relatively large perturbations.
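The perturbation step can be sketched as follows, reading "uniformly scale" as one scalar factor per device applied to that device's whole profile. The function name and the example profiles are hypothetical.

```python
import numpy as np

def perturb_profiles(profiles, sigma, seed=0):
    """Scale each device's load profile by its own factor exp(sigma * X), X ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(len(profiles))   # one independent draw per device
    return [np.asarray(l, dtype=float) * np.exp(sigma * x)
            for l, x in zip(profiles, X)]

# Two made-up load profiles over four periods.
profiles = [np.array([1.0, 2.0, 3.0, 2.0]), np.array([0.5, 0.5, 1.0, 1.5])]
perturbed = perturb_profiles(profiles, sigma=0.1)
```

Because the factor is lognormal, perturbed loads stay positive and keep the shape of the original profile; only the overall level of each device changes.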
Figure 6.3: Relative number of iterations $K^{\mathrm{warm}}/K^{\mathrm{cold}}$ needed to converge
from a warm start for various perturbations $\sigma$ of load profiles (from 0 to 0.20)
compared to original number of iterations.
7 Extensions
Figure 7.1: Left: A simple network with four devices and two nets. Right: A hierar-
chical representation with only 2 devices at the highest level. All terminals connected
to the left-most net are internal to the virtual device.
Acknowledgments
The authors thank Yang Wang and Neal Parikh for extensive discus-
sions on the problem formulation as well as ADMM methods; Yang
Wang, Brendan O’Donoghue, Haizi Yu, Haitham Hindi, and Mikael
Johansson for discussions on optimal ρ selection and for help with the
ρ update method; Steven Low for discussions about end-point based
control; and Ed Cazalet, Ram Rajagopal, Ross Baldick, David Chas-
sin, Marija Ilic, Trudie Wang, and Jonathan Yedidia for many helpful
comments. We would like to thank Marija Ilic, Le Xie, and Boris De-
fourny for pointing us to DYNMONDS and other earlier Lagrangian
approaches. We are indebted to Misha Chertkov, whose questions on
an early version of this paper prodded us to make the concept of AC
and DC terminals explicit. Finally, we thank Warren Powell and Hugo
Simao for encouraging us to release implementations of these methods.
This research was supported in part by Precourt 1140458-
1-WPIAE, by AFOSR grant FA9550-09-1-0704, by NASA grant
NNX07AEIIA, and by the DARPA XDATA grant FA8750-12-2-0306.
After this paper was submitted, we became aware of [31] and [32],
which apply ADMM to power networks for the purpose of robust state
estimation. Our paper is independent of their efforts.
References
[28] M. Ilic, L. Xie, and J.-Y. Joo, “Efficient coordination of wind power and
price-responsive demand, Part I: Theoretical foundations,” IEEE Transactions
on Power Systems, vol. 26, pp. 1875–1884, Nov 2011.
[29] M. Ilic, L. Xie, and J.-Y. Joo, “Efficient coordination of wind power and
price-responsive demand, Part II: Case studies,” IEEE Transactions on Power
Systems, vol. 26, pp. 1885–1893, Nov 2011.
[30] J. L. Jerez, P. J. Goulart, S. Richter, G. A. Constantinides, E. C. Kerrigan,
and M. Morari, “Embedded online optimization for model predictive control at
megahertz rates,” Submitted, IEEE Transactions on Automatic Control, 2013.
[31] V. Kekatos and G. Giannakis, “Joint power system state estimation and breaker
status identification,” in Proceedings of the 44th North American Power Sym-
posium, 2012.
[32] V. Kekatos and G. Giannakis, “Distributed robust power system state estima-
tion,” IEEE Transactions on Power Systems, 2013.
[33] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, “Rate control in communication
networks: shadow prices, proportional fairness and stability,” Journal of the
Operational Research Society, vol. 49, pp. 237–252, 1998.
[34] A. Kiani and A. Annaswamy, “Wholesale energy market in a smart grid: A
discrete-time model and the impact of delays,” in Control and Optimization
Methods for Electric Smart Grids, (A. Chakrabortty and M. Ilic, eds.), pp. 87–
110, Springer US, 2012.
[35] B. H. Kim and R. Baldick, “Coarse-grained distributed optimal power flow,”
IEEE Transactions on Power Systems, vol. 12, no. 2, pp. 932–939, 1997.
[36] B. H. Kim and R. Baldick, “A comparison of distributed optimal power flow
algorithms,” IEEE Transactions on Power Systems, vol. 15, no. 2, pp. 599–604,
2000.
[37] M. Kraning, Y. Wang, E. Akuiyibo, and S. Boyd, “Operation and configuration
of a storage portfolio via convex optimization,” in Proceedings of the 18th IFAC
World Congress, pp. 10487–10492, 2011.
[38] A. Lam, B. Zhang, and D. Tse, “Distributed algorithms for optimal power flow
problem,” http://arxiv.org/abs/1109.5229, 2011.
[39] J. Lavaei and S. Low, “Zero duality gap in optimal power flow problem,” IEEE
Transactions on Power Systems, vol. 27, no. 1, pp. 92–107, 2012.
[40] J. Lavaei, D. Tse, and B. Zhang, “Geometry of power flows in tree networks,”
IEEE Power & Energy Society General Meeting, 2012.
[41] J. Liang, G. K. Venayagamoorthy, and R. G. Harley, “Wide-area measurement
based dynamic stochastic optimal power flow control for smart grids with high
variability and uncertainty,” IEEE Transactions on Smart Grid, vol. 3, pp. 59–
69, 2012.
[42] S. H. Low, L. Peterson, and L. Wang, “Understanding TCP Vegas: a duality
model,” in Proceedings of the 2001 ACM SIGMETRICS international con-
ference on Measurement and modeling of computer systems, (New York, NY,
USA), pp. 226–235, ACM, 2001.
[61] Y. Wang and S. Boyd, “Fast model predictive control using online optimiza-
tion,” IEEE Transactions on Control Systems Technology, vol. 18, pp. 267–278,
2010.
[62] J. Zhu, Optimization of Power System Operation. Wiley-IEEE Press, 2009.