Ghadimi EJOR Article in Press Aug 14 2023 1
Article info

Article history:
Received 13 April 2022
Accepted 1 August 2023
Available online xxx

Keywords:
Stochastic programming
Energy storage
Simulation optimization
Parametric cost function approximation
Rolling forecast

Abstract

Rolling forecasts have been almost overlooked in the renewable energy storage literature. In this paper, we provide a new approach for handling uncertainty not just in the accuracy of a forecast, but in the evolution of forecasts over time. Our approach shifts the focus from modeling the uncertainty in a lookahead model to accurate simulations in a stochastic base model. We develop a robust policy for making energy storage decisions by creating a parametrically modified lookahead model, where the parameters are tuned in the stochastic base model. Since computing unbiased stochastic gradients with respect to the parameters requires restrictive assumptions, we propose a simulation-based stochastic approximation algorithm based on numerical derivatives to optimize these parameters. While numerical derivatives, calculated from noisy function evaluations, provide biased gradient estimates, an online variance reduction technique built into the framework of our proposed algorithm enables us to control the accumulated bias errors and establish the finite-time rate of convergence of the algorithm. Our numerical experiments show the performance of this algorithm in finding policies outperforming the deterministic benchmark policy.
© 2023 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.ejor.2023.08.003
0377-2217/© 2023 Elsevier B.V. All rights reserved.
Please cite this article as: S. Ghadimi and W.B. Powell, Stochastic search for a parametric cost function approximation: Energy storage with rolling forecasts, European Journal of Operational Research, https://doi.org/10.1016/j.ejor.2023.08.003
JID: EOR
ARTICLE IN PRESS [m5G;August 12, 2023;19:0]
S. Ghadimi and W.B. Powell European Journal of Operational Research xxx (xxxx) xxx
predictive control (Almassalkhi & Hiskens, 2015; Kiaei & Lotfifard, 2018; Kumar et al., 2018; Zafar, Ravishankar, Fletcher, & Pota, 2018). To the best of our knowledge, in all existing energy storage models, forecasting is either ignored, or the set of forecasts over the entire horizon is fixed (see e.g., Dicorato, Forte, Pisani, & Trovato, 2012; Zhang, Zhang, Huang, & Lee, 2018). This assumption undermines real world problems where sources of uncertainty, in particular from renewable sources, are changing every few minutes. This highlights a broader challenge in sequential stochastic decision making problems when a given forecast for a source of uncertainty is updated frequently over the time horizon.

Several approaches have been proposed to generate forecasts of different sources of uncertainty in energy systems (which are then assumed to be fixed over the horizon). For example, time series models (Šaltytė Benth, Benth, & Jalinskas, 2007; Taylor & Buizza, 2003) and neural networks (Abhishek, Singh, Ghosh, & Anand, 2012; Khotanzad, Davis, Abaye, & Maratukulam, 1996) have been used to forecast the weather (temperature), which can affect both demand and supply. There are also different approaches to forecasting renewable energy such as solar radiation (Akarslan, Hocaoğlu, & Edizkan, 2014; Arbizu-Barrena, Ruiz-Arias, Rodríguez-Benítez, Pozo-Vázquez, & Tovar-Pescador, 2017) and wind speed (Liu, Shi, & Erdem, 2010; Traiteur, Callicutt, Smith, & Roy, 2012). More recently, a vector autoregression model was also proposed in Liu, Roberts, & Sioshansi (2018) for forecasting temperature, wind speed, and solar radiation.

Our approach formalizes the idea, widely used in industry, that an effective way to solve complex stochastic optimization problems is to shift the modeling of uncertainty from a lookahead approximation to the stochastic base model, which is captured by a simulator that includes the updating of rolling forecasts, as well as any other dynamics relevant to the problem. We first presented this idea more conceptually in Powell & Ghadimi (2022) for sequential decision making problems under uncertainty with updating forecasts. This approach, which we call parametric cost function approximations (CFAs), requires that we a) design a parameterized deterministic optimization problem and b) tune the parameters in a simulator. While the idea of parameterized policies is well known in the form of linear decision rules (also called affine policies), step functions such as order-up-to rules for inventory problems, or even neural networks, our idea of parameterizing an optimization model is new to the stochastic optimization community. We do not minimize the challenges of the two aforementioned steps, but they are done offline, and represent the research required to design a policy that is both robust yet no more difficult to compute than basic deterministic lookaheads. In this paper, we mainly focus on applying this idea to an energy storage problem in the presence of rolling forecasts and discuss its associated computational challenges.

We make three main contributions in this paper. First, we apply the idea of using parametric CFA to handle uncertainty in the context of an energy storage problem with rolling forecasts. In contrast with a basic parametric model, our parameterized optimization model performs critical scaling functions and makes it possible to handle high-dimensional decisions. Second, we present a new simulation-based stochastic approximation (SA) algorithm, based on the Gaussian random smoothing technique, to optimize (tune) the parameters in the parametric CFA model while using only two function evaluations at each iteration. Our proposed algorithm is equipped with an online variance reduction technique which makes it more robust than the vanilla stochastic gradient method using numerical derivatives. Furthermore, we establish the finite-time convergence of this algorithm and show that its sample complexity is of the same order as the one presented in Nesterov & Spokoiny (2017), with slightly better dependence on the problem parameters, when applied to nonsmooth nonconvex problems. Finally, we propose several policies for parameterization of the CFA model and show that, when optimized with our proposed algorithm, they can outperform a deterministic benchmark policy using vanilla point forecasts for an energy storage problem.

The rest of this paper is organized as follows. We discuss the issue of rolling forecasts and its importance in sequential decision making under uncertainty in Section 2. We then present our energy storage model in Section 3. We discuss solution strategies in Section 4 and present our parametric CFA approach. We also propose a stochastic policy search algorithm to optimize the parameters within the parametric CFA model in Section 5 and establish its finite-time rate of convergence. We further show the performance of this algorithm in optimizing the aforementioned policies for an energy storage problem in Section 6 and conclude the paper with some remarks in Section 7.

2. Rolling forecasts

The problem of planning in the presence of rolling forecasts, which exhibit potentially high errors, is difficult and has been largely overlooked in the energy storage literature. A basic feature of rolling forecasts is that noise accumulates as we predict farther into the future. For example, at time $t$, denoting the forecast of energy available from wind for time $t'$ by $\{f^E_{t,t'}\}_{t' \ge t}$ and assuming that $\{f^E_{0,t'}\}_{t'=0,\ldots,\min(H,T)}$ is given, one can generate the forecasts as

\[
f^E_{t+1,t'} = f^E_{t,t'} + \epsilon_{t+1,t'}, \qquad t = 0, \ldots, T-1, \quad t' = t+1, \ldots, \min(t+H, T), \tag{1}
\]

where $T$ is the problem horizon, $H$ is the size of the lookahead, $\epsilon_{t+1,t'} \sim N(0, \sigma^2)$, and $\sigma$ depends on $\rho_E f^E_{t,t'}$ for some constant $\rho_E$. This model is usually known as the "martingale model of forecast evolution" (see e.g., Graves, Dasu, & Qui, 1986; Heath & Jackson, 1994; Sapra & Jackson, 2004).

The standard approach to handling forecasts is to fix them over the planning horizon (ignoring the reality that they will actually be changing) and optimize over a deterministic future. However, an optimal policy would require modeling the evolution of forecasts over time, something that we have never seen done in a lookahead model. An alternative is to fix the forecast (say, at time $t$) over a horizon $t' \in \{t, t+1, \ldots, t+H\}$ and solve a stochastic dynamic program. With this strategy, the forecast becomes a latent variable in the lookahead model. Such models are computationally difficult, and it would be hard to solve them, for example, every 5–10 minutes as might be required for an energy storage problem.

The failure to capture rolling forecasts represents a more significant modeling error than has been recognized in the research literature. Fixing the forecast as a latent variable ignores our ability to wait to make decisions at a later time with a more accurate forecast. Properly modeling rolling forecasts and their associated errors represents a surprisingly complex challenge in a lookahead model. If we have a rolling forecast extending 24 hours into the future, including the forecast in the state variable introduces a 24-dimensional component of the state variable into the model, without any particular structure that we can exploit.

Indeed, there is an important tradeoff: including a dynamically varying forecast in the state variable produces a more complex, higher dimensional state variable, but one which does not have to be reoptimized when the forecast changes. By contrast, treating the forecast as a latent variable, as has been done in the classical dynamic programming models using Bellman's equation, simplifies the model, but requires that the model be re-optimized when it changes.

For this reason, we are going to adopt a completely different approach. Rather than developing a more accurate lookahead model, we are going to use a parameterized, deterministic lookahead model, where the parameters are tuned in the simulator
which captures the updating of rolling forecasts. While the parameterization needs to be carefully designed, this strategy shifts the focus from solving a complex lookahead model to using a realistic simulator, where it is much easier to handle complex dynamics. However, this parameterization policy brings the challenge of optimizing its parameters, which requires an efficient stochastic search method. We will present details of our proposed algorithm in Section 5. We will also discuss in more detail the base model and lookahead models in Section 4.

3. Energy storage model

In this section, we describe an energy storage model involving rolling forecasts of wind. Assume that a smart grid manager must satisfy a recurring power demand with a stochastic supply of renewable energy, unlimited supply of energy from the main power grid at a stochastic price, and access to local rechargeable storage devices. At the beginning of each period, the manager combines different sources of available energy from the grid, local storage, and wind to satisfy the current load. Moreover, depending on the main power grid price, they may decide to purchase more energy from the grid and recharge the local storage if it has any remaining capacity. In the case of excess energy from wind, they may also decide to sell it to the main power grid. We also consider different energy prices for the grid and customers to allow more flexibility in our model.

We now formally present our energy storage model by introducing the five key elements of sequential decision making under uncertainty (Powell, 2019; Powell & Meisel, 2016a), namely, state variables, decision variables, exogenous information variables, the transition function, and the objective function.

The state variables
The state variable at time $t$, $S_t$, includes the following.
$R_t$: The level of energy in storage satisfying $R_t \in [0, R^{\max}]$, where $R^{\max} > 0$ represents the storage capacity.
$\{f^E_{t,t'}\}_{t' \ge t}$: The forecast of energy from wind at time $t'$ made at time $t$, where the current energy $E_t = f^E_{t,t}$.
$P^g_{[t]}$: The forward curve of spot prices of electricity from the grid with the notation of $[t] = \{t'\}_{t' \ge t}$.
$P^m_{[t]}$: The market price of electricity.
$D_{[t]}$: The load curve.
Hence the state of the system can be represented by the vector

\[
S_t = \big(R_t, f^E_{t,t'}, P^m_{[t]}, P^g_{[t]}, D_{[t]}\big) \quad \forall t' \ge t.
\]

The decision variables
At time $t$, several decisions must be made to satisfy the load and replenish the storage device for the future.
$x^{wd}_t$: The available energy from the wind used to satisfy the load.
$x^{rd}_t$: The allocated energy from the storage used to satisfy the load.
$x^{gd}_t$: The purchased energy from the grid used to satisfy the load.
$x^{wr}_t$: The available energy from the wind transferred to storage.
$x^{gr}_t$: The purchased energy from the grid transferred to storage.
$x^{rg}_t$: The stored energy to be sold to the grid.
Hence, the manager's decision variables at time $t$ are defined as the vector $x_t$ given by

\[
x_t = \big(x^{wd}_t, x^{rd}_t, x^{gd}_t, x^{wr}_t, x^{gr}_t, x^{rg}_t\big) \ge 0,
\]

which should satisfy the following constraints:

\[
\begin{aligned}
\beta^c (x^{wr}_t + x^{gr}_t) - x^{rd}_t - x^{rg}_t &\le R^{\max} - R_t, \\
x^{wr}_t + x^{gr}_t &\le \gamma^c, \\
x^{rd}_t + x^{rg}_t &\le \gamma^d,
\end{aligned} \tag{2}
\]

where $\beta^c, \beta^d \in (0, 1)$ are the charge and discharge efficiencies, and $\gamma^c$ and $\gamma^d$ are the maximum amounts of energy that can be charged to or discharged from the storage device. Fig. 1 summarizes the model.

The exogenous information
The exogenous information $W_t$ describes the information that first becomes known at time $t$. For our energy storage model, we assume that the spot price of electricity from the grid, the market price of electricity, and the load are deterministic. However, for the sake of general modeling, we include the change in them from period $t$ to $t+1$ in the exogenous information. It also includes the change in forecasts of the wind. It should be pointed out that the exogenous information for the next period may depend on the current decision and/or state of the system.

The transition function
The transition function $S^M(\cdot)$ explicitly describes the relationship between the state of the model at time $t$ and $t+1$ such that $S_{t+1} = S^M(S_t, x_t, W_{t+1})$. More specifically, the relationship of storage levels between periods is defined as:

\[
R_{t+1} = R_t - x^{rd}_t + \beta^c \big(x^{wr}_t + x^{gr}_t\big) - x^{rg}_t. \tag{3}
\]

The forecast for the wind is also updated according to (1). The spot price of electricity from the grid, the market price of electricity, and the load are assumed to be fixed over the horizon and do not change once they are given at $t = 1$.

The objective function
To evaluate the effectiveness of a policy or sequence of decisions, we need an objective function representing the expected sum of the costs $C_t(S_t, x_t)$ in each time period $t$ over a finite horizon. Denoting the penalty of not satisfying the demand by $C^P$, for a given state $S_t$ and decision $x_t$, the cost realized at $t$ is given by

\[
C_t(S_t, x_t) = C^P D_t - (C^P + P^m_t)\big(x^{wd}_t + \beta^d x^{rd}_t + x^{gd}_t\big) - P^g_t \big(\beta^d x^{rg}_t - x^{gr}_t - x^{gd}_t\big). \tag{4}
\]

Therefore, we seek to find the policy that solves

\[
\min_{\pi \in \Pi} \; E\left[ \sum_{t=0}^{T} C_t\big(S_t, X^\pi_t(S_t)\big) \,\Big|\, S_0 \right], \tag{5}
\]

where $X^\pi_t(S_t)$ denotes the decision function (policy) determining the decision variable $x_t$ and the initial state $S_0$ is assumed to be known. If the cost function, transition function and constraints are linear, a deterministic lookahead policy can be constructed as a linear program if point forecasts of exogenous information are provided. Eq. (5), along with the transition function and the exogenous information process, is called the base model which can be
used to model virtually any sequential stochastic decision making problem, with possibly minor twists in the objective function for specific classes of the problem such as risk measures.

4. Designing policies

In this section, we first review the existing solution strategies to solve the base model and then present our new approach, namely, the parametric cost function approximation.

4.1. The four classes of policies

There are two general strategies for designing policies to solve the base model. The first is to use policy search, where we have to tune the parameters of a policy so that it works well over time. The second is to build a policy that makes the best decision now that minimizes costs now and into the future (we call these lookahead policies).

The policy search class can be divided into two classes: policy function approximations (PFAs), where the policy is an analytical function that maps states to actions (such as a linear model or a neural network); and cost function approximations (CFAs), which consist of an optimization problem that has been parameterized so that it produces good solutions over time.

The lookahead class can also be divided into two classes. The class of value function approximation (VFA) is the familiar approach based on Bellman's equation where we might compute (more often we approximate) the value of being in a downstream state produced by a decision now. The class of direct lookahead approximation (DLA) is based on direct lookaheads where we optimize over some planning horizon. The challenge with DLAs is how to handle uncertainty as we optimize over the horizon. Most practical tools such as Google maps use a deterministic approximation. Building uncertainty explicitly into the lookahead model is challenging. The ultimate stochastic lookahead would require solving

\[
X^*_t(S_t) = \operatorname*{argmin}_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\left[ \min_{\pi \in \Pi} E\left[ \sum_{t'=t+1}^{T} C_{t'}\big(S_{t'}, X^\pi_{t'}(S_{t'})\big) \,\Big|\, S_{t+1} \right] \Big|\, S_t, x_t \right] \right).
\]

In special cases, the lookahead portion of the above equation can be computed exactly using Bellman's equation:

\[
V_t(S_t) = \min_{x_t} \Big( C_t(S_t, x_t) + E\big[ V_{t+1}(S_{t+1}) \,\big|\, S_t, x_t \big] \Big),
\]

where $V_{t+1}(\cdot)$ denotes the value of the downstream impact of a decision $x_t$ made in state $S_t$. More often, we have to replace this value function with an approximation, but this only works when we can exploit structure such as convexity, linearity or monotonicity. Often, we have to directly approximate the lookahead by creating a lookahead model, opening the door to a variety of approximation strategies, including the use of deterministic lookaheads, approximating the state variable and exogenous information process (this is where we can ignore the presence of rolling forecasts), along with the use of restricted policies. However, the best approach depends on the problem setting (see Powell & Meisel, 2016 for more details).

4.2. The parametric cost function approximation

In this subsection, we propose using a hybrid policy of combining deterministic lookaheads with parametrically modified CFAs which can efficiently handle the issue of rolling forecasts. Consider a deterministic lookahead policy given by

\[
X^{D\text{-}LA}_t(S_t) = \operatorname*{argmin}_{x_t, \tilde{x}_t} \left( C_t(S_t, x_t) + \sum_{t'=t+1}^{\min(t+H,T)} C_{t'}\big(S_{t'}, \tilde{x}_{t,t'}\big) \right),
\]

s.t. (2), (3), $x_t, \tilde{x}_t \ge 0$, and

\[
\begin{aligned}
\tilde{x}^{wd}_{t,t'} + \beta^d \tilde{x}^{rd}_{t,t'} + \tilde{x}^{gd}_{t,t'} &\le D_{t'}, \\
\tilde{x}^{rd}_{t,t'} + \tilde{x}^{rg}_{t,t'} &\le R_{t'}, \\
\tilde{x}^{wr}_{t,t'} + \tilde{x}^{wd}_{t,t'} &\le f^E_{t,t'}, \\
\beta^c \big(\tilde{x}^{wr}_{t,t'} + \tilde{x}^{gr}_{t,t'}\big) - \tilde{x}^{rd}_{t,t'} - \tilde{x}^{rg}_{t,t'} + R_{t'} &\le R^{\max}, \\
\tilde{x}^{wr}_{t,t'} + \tilde{x}^{gr}_{t,t'} &\le \gamma^c, \\
\tilde{x}^{rd}_{t,t'} + \tilde{x}^{rg}_{t,t'} &\le \gamma^d, \\
R_{t'} - \tilde{x}^{rd}_{t,t'} + \beta^c \big(\tilde{x}^{wr}_{t,t'} + \tilde{x}^{gr}_{t,t'}\big) - \tilde{x}^{rg}_{t,t'} &= R_{t'+1},
\end{aligned} \tag{6}
\]

where $\tilde{x}_t = (\tilde{x}_{t,t'})_{t'=t+1}^{\min(t+H,T)}$. When we solve the above model, we keep $x_t$ to compute the portion of the cost function at time $t$, discard all $\tilde{x}_{t,t'}$, and repeat this process as we move forward over the problem horizon.

In the parametric CFA approach, we parameterize the lookahead model in (6), in which the parametric terms can be added to the cost function and/or constraints. In this paper, we focus on parameterizing constraints including noisy forecasts. Hence, our hybrid policy $X^\pi_t(S_t|\theta)$ is defined as the solution to the linear programming model (6) in which the wind energy constraint is updated as

\[
\tilde{x}^{wr}_{t,t'} + \tilde{x}^{wd}_{t,t'} \le b_t\big(f^E_{t,t'}, \theta\big), \tag{7}
\]

where $b_t$ is a real valued function and $\theta$ is the set of constraint parameters.

We then need to optimize the values of parameters $\theta$, for a given policy $\pi$, by solving

\[
\min_\theta \; F^\pi(\theta) := E_\omega\big[\bar{F}^\pi(\theta, \omega)\big] = E\left[ \sum_{t=0}^{T} C_t\big(S_t(\omega), X^\pi_t(S_t(\omega)|\theta)\big) \,\Big|\, S_0 \right], \tag{8}
\]

where $\omega$ denotes the randomness in the model for which $\bar{F}^\pi(\theta, \omega)$ represents the stochastic cumulative cost of the parametrized policy over the horizon. It should be pointed out that the more general optimization problem associated with the parametric CFA approach is to optimize over the structure of policies and their parameterization simultaneously. However, our focus in this paper is to solve problem (8) to optimize the parameters for a given structure of a policy. The tuned parameters capture the proper dynamics of forecasts, unlike an optimal solution to a stochastic lookahead that uses a fixed forecast. However, tuning is not easy since the above problem is usually nonconvex, and we will discuss an approximation algorithm in Section 5 to solve it. We refer interested readers to the companion paper (Powell & Ghadimi, 2022) in which we described the idea of the parametric CFA approach in more detail and for general decision making problems under uncertainty.

An important step in the parametric CFA approach is to consider meaningful parameterized policies in the model. This step is truly domain dependent and can be significantly different from one problem to another. Indeed, this step is the art of modeling that draws on a statistical model or the knowledge and insights
of the domain experts. For the energy storage problem in this paper, we assume that uncertainties only exist in the wind forecasts. Therefore, we propose the following:

• Constant parameterization ($\pi = \text{const}$) - This parameterization uses a single scalar to modify the forecast of energy from wind for the entire horizon such that $b_t$ in (7) is set to $b_t(f^E_{t,t'}, \theta) = \theta \cdot f^E_{t,t'}$.
• Lookup table parameterization ($\pi = \text{lkup}$) - Overestimating or underestimating forecasts of energy from wind influences how aggressively a policy will store energy. We can modify the forecast for each period of the lookahead model with a unique parameter $\theta_\tau$. This parameterization is a lookup table representation because there is a different $\theta_\tau$ for each lookahead period, $\tau = 0, 1, 2, \ldots$ This implies that $b_t(f^E_{t,t'}, \theta) = \theta_{t'-t} \cdot f^E_{t,t'}$, where $t' \in [t+1, \min(t+H, T)]$ and $\tau = t' - t$. If $\theta_\tau < 1$ the policy will be more conservative and decrease the risk of running out of energy. Conversely, if $\theta_\tau > 1$ the policy will be more aggressive and less adamant about maintaining large energy reserves. This is a time-independent (or stationary) parameterization since the modification of the forecast at each time period depends only on how far in the future forecasts are provided.
• Exponential decay parameterization ($\pi = \text{exp}$) - Instead of calculating a set of parameters for every period within the lookahead model, we can make our parameterization a function of time and a few parameters. Intuitively, we can assume the forecasts become worse when we are far in the future. Hence, it might be good to try some decaying functions of parameters to decrease the impact of errors in forecasts for the far future. To do this, we suggest using the following exponential function of two variables, which also limits the search space of parameters to a two dimensional plane, i.e., $b_t(f^E_{t,t'}, \theta) = f^E_{t,t'} \cdot \theta_1 \cdot e^{\theta_2 \cdot (t'-t)}$.

Similar parameterization schemes can also be proposed for the RHS of other constraints in the lookahead model, if they include noisy forecasts. The combination of these parameterizations can then be used in the parametric CFA model, but tuning the higher dimensional parameter vector becomes harder.

5. The stochastic search algorithm

Our goal in this section is to solve problem (8) under specific assumptions on $F^\pi(\theta)$. Even for a simple parameterization, this function is possibly nonconvex and nonsmooth, which makes the optimization problem hard to solve. On the other hand, computing unbiased (sub)gradient estimates of the objective function w.r.t. the parameters may be prohibitive or impossible. We first present a result on computing unbiased stochastic (sub)gradients of $F^\pi(\theta)$ under certain conditions. We then discuss the setting in which we cannot compute these stochastic (sub)gradients and we only have access to noisy evaluations of $F^\pi(\theta)$. We present a simulation-based optimization algorithm based on a randomized Gaussian smoothing technique and establish its finite-time rate of convergence to a stationary point of problem (8) when $F^\pi(\theta)$ is possibly nonsmooth and nonconvex.

Stochastic approximation algorithms require computing stochastic (sub)gradients of the objective function iteratively. Due to the special structure of $F^\pi(\theta)$, its (sub)gradient can be computed recursively under certain conditions as shown in the next result.

Proposition 5.1. Assume $\bar{F}^\pi(\cdot, \omega)$ is convex/concave for every $\omega \in \Omega$, and $F^\pi(\cdot)$ is finite valued in the neighborhood of $\theta$. If the distribution of $\omega$ is independent of $\theta$, we have

\[
\nabla_\theta F^\pi(\theta) = E\big[\nabla_\theta \bar{F}^\pi(\theta, \omega)\big],
\]

where

\[
\nabla_\theta \bar{F}^\pi(\theta) = \frac{\partial C_0}{\partial X^\pi_0} \cdot \frac{\partial X^\pi_0}{\partial \theta} + \sum_{t=1}^{T} \left[ \frac{\partial C_t}{\partial S_t} \cdot \frac{\partial S_t}{\partial \theta} + \frac{\partial C_t}{\partial X^\pi_t} \cdot \left( \frac{\partial X^\pi_t}{\partial S_t} \cdot \frac{\partial S_t}{\partial \theta} + \frac{\partial X^\pi_t}{\partial \theta} \right) \right], \tag{9}
\]

and

\[
\frac{\partial S_t}{\partial \theta} = \frac{\partial S_t}{\partial S_{t-1}} \cdot \frac{\partial S_{t-1}}{\partial \theta} + \frac{\partial S_t}{\partial X^\pi_{t-1}} \cdot \left( \frac{\partial X^\pi_{t-1}}{\partial S_{t-1}} \cdot \frac{\partial S_{t-1}}{\partial \theta} + \frac{\partial X^\pi_{t-1}}{\partial \theta} \right),
\]

in which the $\omega$ is dropped for simplicity.

Proof. If $\bar{F}^\pi(\cdot, \omega)$ is convex or concave for every $\omega \in \Omega$, and $F^\pi(\cdot)$ is finite valued in the neighborhood of $\theta$, then we have $\nabla_\theta E[\bar{F}^\pi(\theta, \omega)] = E[\nabla_\theta \bar{F}^\pi(\theta, \omega)]$ by Strassen (1965). Applying the chain rule, we find

\[
\begin{aligned}
\nabla_\theta \bar{F}^\pi(\theta) &= \frac{d}{d\theta} \left[ C_0(S_0, X^\pi_0) + \sum_{t=1}^{T} C_t(S_t, X^\pi_t) \right] = \frac{\partial C_0}{\partial X^\pi_0} \cdot \frac{\partial X^\pi_0}{\partial \theta} + \frac{d}{d\theta} \sum_{t=1}^{T} C_t(S_t, X^\pi_t) \\
&= \frac{\partial C_0}{\partial X^\pi_0} \cdot \frac{\partial X^\pi_0}{\partial \theta} + \sum_{t=1}^{T} \left[ \frac{\partial C_t}{\partial S_t} \cdot \frac{\partial S_t}{\partial \theta} + \frac{\partial C_t}{\partial X^\pi_t} \cdot \frac{d X^\pi_t}{d\theta} \right] \\
&= \frac{\partial C_0}{\partial X^\pi_0} \cdot \frac{\partial X^\pi_0}{\partial \theta} + \sum_{t=1}^{T} \left[ \frac{\partial C_t}{\partial S_t} \cdot \frac{\partial S_t}{\partial \theta} + \frac{\partial C_t}{\partial X^\pi_t} \cdot \left( \frac{\partial X^\pi_t}{\partial S_t} \cdot \frac{\partial S_t}{\partial \theta} + \frac{\partial X^\pi_t}{\partial \theta} \right) \right],
\end{aligned}
\]

where

\[
\frac{\partial S_t}{\partial \theta} = \frac{\partial S_t}{\partial S_{t-1}} \cdot \frac{\partial S_{t-1}}{\partial \theta} + \frac{\partial S_t}{\partial X^\pi_{t-1}} \cdot \left( \frac{\partial X^\pi_{t-1}}{\partial S_{t-1}} \cdot \frac{\partial S_{t-1}}{\partial \theta} + \frac{\partial X^\pi_{t-1}}{\partial \theta} \right).
\]

Note that if $\bar{F}^\pi(\theta)$ is not differentiable, then its subgradient can still be computed using (9). However, when $\bar{F}^\pi(\theta)$ is not convex (concave), its subgradient may not exist and the concept of generalized subgradient should be employed. If $\nabla_\theta \bar{F}^\pi(\theta, \omega)$ exists for every $\omega \in \Omega$, the ability to calculate its unbiased estimator allows us to use SA-type techniques such as stochastic gradient descent (SGD) to determine the optimal parameter $\theta^*$. However, this is not always the case. The function $F^\pi(\theta)$ can be generally nonsmooth and nonconvex and hence, its subgradient may not exist everywhere. Moreover, calculating (9) may not be easy. Therefore, we propose an alternative way to estimate the gradient of $F^\pi(\theta)$.

To simplify our notation, we drop the superscript $\pi$ for the policies in the definition of the objective function, and it only refers to the number $\pi$ in the rest of this section. Before we proceed, we assume that the objective function is Lipschitz continuous w.r.t. $\theta$ with constant $L_0 > 0$, for any $\omega \in \Omega$, i.e.,

\[
|\bar{F}(\theta_1, \omega) - \bar{F}(\theta_2, \omega)| \le L_0 \|\theta_1 - \theta_2\| \quad \forall\, \theta_1, \theta_2,
\]

which consequently implies that $F(\theta)$ is Lipschitz continuous with constant $L_0$. This is a reasonable assumption for most applications, as the cost (objective function) does not change suddenly w.r.t. a small change of resources (policies). This property will be used to establish the convergence analysis of our proposed algorithm. Furthermore, we assume that noisy evaluations of $F(\theta)$ can be obtained through simulations and hence, we can use techniques from simulation-based optimization where even the shape of the function may not be known (see e.g., Fu, 2015 and the references therein). In the remainder of this section, we provide a
zeroth-order SA algorithm and establish its finite-time convergence analysis to solve problem (8).

A smooth approximation of the function $F(\theta)$ can be defined by the following convolution:

\[
F_\eta(\theta) = \frac{1}{(2\pi)^{d/2}} \int F(\theta + \eta v)\, e^{-\frac{1}{2}\|v\|^2} \, dv = E_v[F(\theta + \eta v)]. \tag{10}
\]

Lemma 5.2. Assume that the function $F$ is Lipschitz continuous with constant $L_0$ and $\eta_1, \eta_2 > 0$. Then, for any $\theta \in \mathbb{R}^d$, we have

\[
\cdots \;\le\; \frac{2 L_0 d\, |\eta_2 - \eta_1|}{\eta_2},
\]

where the last inequality follows from the Lipschitz continuity of $F$ and the fact that $E_v[\|v\|^2] = d$.

for $\nabla F_{\eta_k}(\theta^k)$ at each iteration $k$, which is a convex combination of all generated zeroth-order gradient estimators up to iteration $k$. Indeed, taking this weighted average of gradient estimators will help us to reduce the variance of these estimates. When working with zeroth-order estimators, we have an additional level of noise (Gaussian noise in our case) to use in the finite-difference formula.

For $k = 1, \ldots, R$:
3: Update policy parameters as
and set

\[
\bar{G}^k = (1 - \alpha_k)\, \bar{G}^{k-1} + \alpha_k\, G_{\eta_k}(\theta^k, \omega_k). \tag{19}
\]

End For
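The smoothing in (10) and the averaging in (19) can be illustrated with a minimal sketch. Several parts of it are assumptions and not taken from the text: the toy cost `noisy_cost` stands in for one simulated cumulative cost $\bar{F}(\theta, \omega)$, the step sizes `eta`, `beta` and `alpha` are held constant, the estimator is a forward difference along a Gaussian direction using two evaluations with the same $\omega$, and the parameter update $\theta^k = \theta^{k-1} - \beta \bar{G}^{k-1}$ is a plausible form of the update step, which is not visible in the algorithm fragment above.

```python
import numpy as np


def noisy_cost(theta, omega):
    """Hypothetical stand-in for one simulated cost F_bar(theta, omega).

    Nonsmooth in theta; its expectation over omega is minimized at theta = 1.
    """
    return float(np.sum(np.abs(theta - (1.0 + omega))))


def zeroth_order_search(theta0, n_iters=2000, eta=0.05, beta=0.01,
                        alpha=0.1, seed=0):
    """Zeroth-order search with an averaged gradient estimator (sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    g_bar = np.zeros_like(theta)  # averaged estimator, started at zero
    for _ in range(n_iters):
        # Assumed update rule: step along the previous averaged estimator.
        theta = theta - beta * g_bar
        omega = 0.1 * rng.standard_normal(theta.shape)  # simulation noise
        v = rng.standard_normal(theta.shape)            # smoothing direction
        # Two function evaluations that share the same omega.
        diff = noisy_cost(theta + eta * v, omega) - noisy_cost(theta, omega)
        g = (diff / eta) * v
        # Weighted average of all estimators so far, as in (19).
        g_bar = (1.0 - alpha) * g_bar + alpha * g
    return theta


if __name__ == "__main__":
    print(zeroth_order_search(np.zeros(3)))
```

In an actual application, `noisy_cost` would be replaced by one run of the base model simulator under the parameterized lookahead policy, and the constant step sizes by whatever schedule the convergence analysis prescribes.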
E[ Gηk (θ k , ωk ) − Ḡk−1 2
] ≤ 4L20 (d + 4 )2 ,
N
which together with (26) and the assumption that Lηk βk ≤ c2 , im-
αk βk E ∇ Fηk (θ k ) 2 ≤ 2(1 + 2c1 c22 )(F ∗ − F (θ 0 )+2η1 L0 d )
ply that
k=1
2αk βk Ḡk−1 2
+ βk Ḡk 2
≤ βk−1 Ḡk−1 2
N
8c1 (ηk − ηk−1 )2
+ L20 (d + 4 ) 2
βk
αk ηk2 + 2 Fηk (θ ) − Fηk (θ
k k−1
) + L20 (d + 4 )2 (4 + c2 )βk αk2 .
k=1
Summing up the above inequalities, and rearranging the terms, and
+ [2(1 + c1 ) + (4 + c2 )(1 + 2c1 c22 )]αk2 , (23) noting the fact that Ḡ0 = 0, we obtain
N
N
where the expectation is taken w.r.t the random vector ω, and Gaus- 2 αk βk E[ Ḡk−1 2 ] ≤ 2N + L20 (d + 4 )2 (4 + c2 ) βk αk2 ,
sian random vector v. k=2 k=1
− k k
Ḡk−1 2
, (25)
2
≤2 ∇ Fηk (θ k ) − Ḡk 2
+ (1 − αk ) Ḡk−1 2
where δk := ∇ Fηk (θ k ) − Gηk (θ k , ωk ). Moreover, by (19), we have
Ḡ k 2
= Ḡ − Ḡ k k−1 2
+ Ḡ k−1 2
+ 2 Ḡ − Ḡk k−1
, Ḡ k−1
+ αk Gηk (θ k , ωk ) 2
, (28)
= α Gηk (θ , ω ) − Ḡ
2
k
k
+ Ḡ k k−1 2 k−1 2
where the second inequality comes from the convexity of · 2
−αk βk δk , Ḡ k−1
+ 2αk (1 − αk ) ∇ Fηk (θ ) − Ḡ
k k−1
, δk .
(30)
β α2
+ k2 k Gηk (θ k , ωk )−Ḡk−1 2 +Lηk βk Ḡk−1 2 . Moreover, by the convexity of · 2 and we have
(26) (1 − αk )(∇ Fηk (θ ) − Ḡ k k−1
) 2
= (1 − αk )(∇ Fηk−1 (θ k−1 ) − Ḡk−1 )
Dividing both sides of (19) by Ak (defined in (21)), summing them + αk ek 2
k
αi + αk ek 2
, (31)
Ḡk = Ak Gηi (θ i , ωi ), (27)
Ai where
i=1
k
αi
k
αi (1 − αi )
E[ Ḡk 2
] ≤ L20 (d + 4 )2 Ak ≤ L20 (d + 4 )2 , + 2Ak ∇ Fηi (θ i ) − Ḡi−1 , δi . (34)
Ai Ai
i=1 i=1
In the rest of the proof, only for the sake of simplicity, we assume that $\eta_0$ and $\theta^0$ are chosen such that $\nabla F_{\eta_0}(\theta^0) = 0$. Now, by (24), Lipschitz continuity of $\bar F(\cdot, \omega)$ and $\nabla F_\eta$, (13), and (14), we have

\[
\begin{aligned}
\|\nabla F_{\eta_k}(\theta^k) - \nabla F_{\eta_{k-1}}(\theta^{k-1})\|^2
&\le 2\left[\|\nabla F_{\eta_k}(\theta^k) - \nabla F_{\eta_k}(\theta^{k-1})\|^2 + \|\nabla F_{\eta_k}(\theta^{k-1}) - \nabla F_{\eta_{k-1}}(\theta^{k-1})\|^2\right] \\
&\le 2\left[(L_{\eta_k}\alpha_k\beta_k)^2\,\|\bar G^{k-1}\|^2 + \frac{4L_0^2 d^2(\eta_k-\eta_{k-1})^2}{\eta_k^2}\right],
\end{aligned}
\]

\[
\mathbb{E}[\|\delta_k\|^2] \le 2\left[\|\nabla F_{\eta_k}(\theta^k)\|^2 + \mathbb{E}[\|G_{\eta_k}(\theta^k,\omega^k)\|^2]\right] \le 2L_0^2(d+4)^2,
\]

\[
\mathbb{E}[\langle \nabla F_{\eta_k}(\theta^k) - \bar G^{k-1}, \delta_k\rangle] = 0.
\]

Therefore, taking the expectation of both sides of (34) and noting (32), we obtain

\[
\begin{aligned}
\sum_{k=1}^N \beta_k\alpha_k\,\mathbb{E}[\|\nabla F_{\eta_k}(\theta^k) - \bar G^k\|^2]
&\le \sum_{k=1}^N \beta_k\alpha_k A_k\left[\sum_{i=1}^k \frac{\alpha_i}{A_i}\,\mathbb{E}[\|e_i\|^2] + \sum_{i=1}^k \frac{\alpha_i^2}{A_i}\,\mathbb{E}[\|\delta_i\|^2]\right] \\
&= \sum_{k=1}^N \frac{\alpha_k}{A_k}\left(\sum_{i=k}^N \beta_i\alpha_i A_i\right)\left[\mathbb{E}[\|e_k\|^2] + \alpha_k\,\mathbb{E}[\|\delta_k\|^2]\right] \\
&\le \sum_{k=1}^N \frac{\alpha_k\beta_k}{A_k}\left(\sum_{i=k}^N \alpha_i A_i\right)\left[\mathbb{E}[\|e_k\|^2] + \alpha_k\,\mathbb{E}[\|\delta_k\|^2]\right] \\
&\le c_1\sum_{k=1}^N \alpha_k\beta_k\left[\mathbb{E}[\|e_k\|^2] + \alpha_k\,\mathbb{E}[\|\delta_k\|^2]\right] \\
&\le 2\sum_{k=1}^N\left[c_1c_2^2\,\alpha_k\beta_k\,\mathbb{E}[\|\bar G^{k-1}\|^2] + \frac{4L_0^2d^2\beta_k(\eta_k-\eta_{k-1})^2}{\alpha_k\eta_k^2}\right] + 2c_1L_0^2(d+4)^2\sum_{k=1}^N \beta_k\alpha_k^2,
\end{aligned}
\tag{35}
\]

where the second to the fourth inequalities follow from the assumptions in (20). Combining the above relation with (22) and (28), we obtain (23).

In the next result, we specialize the rate of convergence of Algorithm 1 by specifying its parameters.

Corollary 5.1. Let the assumptions in the statement of Theorem 5.1 hold and let an iteration limit $N \ge 1$ be given. If the parameters are set to

\[
\alpha_k = \frac{1}{\sqrt{\delta(d+4)N}}, \qquad \eta_k = \frac{\delta}{L_0\sqrt{d}}, \qquad \beta_k = \frac{\delta}{L_0^2 d}, \qquad k = 1, \ldots, N, \tag{36}
\]

for some $\delta > 0$, then we have

\[
\mathbb{E}[\|\bar G^R\|^2] \le \frac{L_0^2(d+4)^{3/2}\,(F^* - F(\theta^0) + 5)}{\sqrt{\delta N}}, \tag{37}
\]

\[
\mathbb{E}[\|\nabla F_{\eta_R}(\theta^R)\|^2] \le \frac{6L_0^2(d+4)^{3/2}\,(F^* - F(\theta^0) + 6)}{\sqrt{\delta N}}, \tag{38}
\]

where the expectation is also taken w.r.t. the random integer $R$, whose probability distribution is supported on $\{1, \ldots, N\}$ and is given by

\[
P_R(R = k) = \frac{\alpha_k\beta_k}{\sum_{i=1}^N \alpha_i\beta_i}, \qquad k \in \{1, \ldots, N\}. \tag{39}
\]

Proof. First, note that by the choices of parameters in (36), we have

\[
\sum_{k=1}^N \beta_k\alpha_k = \frac{\delta N}{L_0^2 d\,\sqrt{\delta(d+4)N}} = \frac{1}{L_0^2 d}\sqrt{\frac{\delta N}{d+4}},
\qquad
\sum_{k=1}^N \beta_k\alpha_k^2 = \frac{\delta N}{L_0^2 d\,\delta(d+4)N} \le \frac{1}{L_0^2 d\,(d+4)},
\]

\[
\sum_{i=k}^N \alpha_i A_i = \sum_{i=k}^N A_i\,\frac{A_{i-1} - A_i}{A_{i-1}} = \sum_{i=k}^N (A_{i-1} - A_i)(1-\alpha_i) = \sum_{i=k}^N (A_i - A_{i+1}) = A_k - A_{N+1} \le A_k \quad \forall k \ge 1,
\]

which implies that the assumptions in (20) hold with $c_1 = c_2 = 1$ and, together with (22) and (23), imply (37) and (38).

We now add a few remarks about the above results. First, note that a sufficient condition to obtain an $\epsilon$-stationary point of the smooth approximation problem (any $\bar\theta \in \mathbb{R}^d$ such that $\mathbb{E}[\|\nabla F_\eta(\bar\theta)\|^2] \le \epsilon$) is to make the RHS of (38) less than the target accuracy $\epsilon$, which, after neglecting some constants, implies that the total number of function evaluations ($2N$) is bounded by

\[
\mathcal{O}\left(\frac{L_0^4 d^3}{\delta\,\epsilon^2}\right). \tag{40}
\]

This bound is slightly better than the one obtained in Nesterov & Spokoiny (2017) (for the weighted average of $\mathbb{E}[\|\nabla F_\eta(\theta^k)\|^2]$ without introducing the random index $R$) in terms of dependence on $L_0$. It should be noted that due to the choice of $\eta$ in (36), the parameter $\delta$ controls the error between the original objective function and its smooth approximations, i.e., $|f(\theta) - f_{\delta/(L_0\sqrt{d})}(\theta)| \le \delta$ for any given $\theta$. Hence, as $\delta$ goes to zero, the output of Algorithm 1 will be closer to a stationary point of problem (8).

Second, we can adaptively choose $\beta_k$ and $\eta_k$ such that they gradually converge to zero. For example, if both $\beta_k$ and $\eta_k$ are in the order of $1/k^\gamma$ for some $\gamma \in (0, 1)$, the algorithm is still convergent, albeit with a worse complexity than (40). In this case, we do not need to use a very small smoothing parameter in the early iterations of the algorithm.

Third, the weighted average of stochastic gradients in (19) is used to reduce the variance associated with gradient estimates. To further reduce this variance, one can use a mini-batch of samples to compute (18). In particular, given a batch size of $m_k$ and generating samples $\omega^k = \{\omega^{k,i}\}_{i=1}^{m_k}$, the stochastic gradient used in (19) is computed as

\[
\frac{1}{m_k}\sum_{i=1}^{m_k} G_{\eta_k}(\theta^k, \omega^{k,i}). \tag{41}
\]

This additional averaging will further improve the practical performance of the algorithm, as shown in the next section. Also, it is worth noting that $\mathbb{E}[\|\nabla F_{\eta_R}(\theta^R) - \bar G^R\|^2]$ converges to zero with the same rate presented in Corollary 5.1. Hence, $\bar G^k$ can be used as an online certificate to assess the quality of generated solutions without taking an extra batch of samples. This is another advantage of using the weighted average of stochastic gradients to update the policy at each iteration of the SANG method.

Finally, when the smoothing parameter is fixed, $\beta_k$ can be set to any number while changing the rate of convergence by a constant factor. Hence, practically successful stepsize policies can be tried. For example, one can use the adaptive stepsize formula widely used in the machine learning community for stochastic optimization, namely, the Root Mean Square Propagation (RMSProp)
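The remarks above (forward differences along Gaussian directions, the weighted average in (19), mini-batching, and the random output index in (39)) can be illustrated with a minimal Python sketch. It is not a reproduction of Algorithm 1: the names `smoothed_gradient`, `averaged_ascent`, `sample_random_index`, and `toy_sim` are hypothetical, the toy objective is a placeholder for the storage simulator, and both the forward-difference form of the estimate and the ascent step $\theta \leftarrow \theta + \beta\,\bar G$ are assumptions made only for illustration.

```python
import numpy as np


def smoothed_gradient(sim, theta, eta, batch_size, rng):
    """Mini-batch average of forward-difference estimates along Gaussian directions."""
    g = np.zeros_like(theta)
    for _ in range(batch_size):
        v = rng.standard_normal(theta.shape)   # Gaussian direction v
        omega = rng.standard_normal()          # toy scenario noise (assumption)
        # Both evaluations reuse the same omega (common random numbers).
        g += (sim(theta + eta * v, omega) - sim(theta, omega)) / eta * v
    return g / batch_size


def averaged_ascent(sim, theta0, alpha, beta, eta, n_iters, batch_size=1, seed=0):
    """Gradient ascent driven by a weighted average of noisy gradient estimates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    g_bar = np.zeros_like(theta)               # averaged gradient starts at zero
    iterates, certificates = [], []
    for _ in range(n_iters):
        g = smoothed_gradient(sim, theta, eta, batch_size, rng)
        g_bar = (1.0 - alpha) * g_bar + alpha * g   # weighted average, cf. (19)
        iterates.append(theta.copy())
        certificates.append(float(g_bar @ g_bar))   # squared norm, online certificate
        theta = theta + beta * g_bar                # ascent step (assumed form)
    return theta, iterates, certificates


def sample_random_index(alphas, betas, rng):
    """Draw R in {1, ..., N} with probability proportional to alpha_k * beta_k."""
    weights = np.asarray(alphas, dtype=float) * np.asarray(betas, dtype=float)
    return int(rng.choice(len(weights), p=weights / weights.sum())) + 1


def toy_sim(theta, omega):
    """Placeholder noisy objective (maximized at theta = 1); not the paper's model."""
    return -float(np.sum((theta - 1.0) ** 2)) + 0.05 * omega * float(np.sum(theta))


theta_final, iterates, certificates = averaged_ascent(
    toy_sim, np.zeros(3), alpha=0.2, beta=0.05, eta=0.1, n_iters=500, batch_size=4
)
R = sample_random_index([0.2] * 500, [0.05] * 500, np.random.default_rng(1))
```

With constant $\alpha_k$ and $\beta_k$, the distribution of $R$ reduces to the uniform distribution on $\{1, \ldots, N\}$, so `iterates[R - 1]` is simply a uniformly chosen iterate; the stored `certificates` values require no additional simulations.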
Fig. 3. Averaged performance of the lookup parameterization policy under perfect forecasts. Each curve shows the performance of the lookup policy as a single θi (i = 1, . . . , 9) is varied while θj = 1 ∀ j ≠ i. The remaining θi (i > 9) behave similarly and are omitted for readability.
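The one-at-a-time sweep described in this caption can be sketched as follows. This is only an illustration under stated assumptions: `one_at_a_time_sweep` and `toy_evaluate` are hypothetical names, the grid of values is arbitrary, and `toy_evaluate` is a placeholder for the averaged policy performance returned by the simulator, which is not reproduced here.

```python
import numpy as np


def one_at_a_time_sweep(evaluate, dim, indices, grid, baseline=1.0):
    """Vary one coordinate of theta over `grid`, holding all others at `baseline`."""
    curves = {}
    for i in indices:
        values = []
        for x in grid:
            theta = np.full(dim, baseline)   # theta_j = baseline for all j != i
            theta[i] = x                     # only coordinate i is changed
            values.append(evaluate(theta))
        curves[i] = np.array(values)
    return curves


def toy_evaluate(theta):
    """Placeholder performance measure (assumption); stands in for the simulator."""
    return -float(np.sum((theta - 1.2) ** 2))


grid = np.linspace(0.0, 2.0, 21)
# dim = 23 matches the lookup-table search space; the first nine coordinates are swept.
curves = one_at_a_time_sweep(toy_evaluate, dim=23, indices=range(9), grid=grid)
```

Each entry of `curves` corresponds to one curve of the figure; in practice every call to `evaluate` would average the simulated objective over several sample paths.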
Table 1
Energy storage model parameters.
βd    βc    γd    γc    Rmax    R0    CP    Ptm
1     1     25    25    400     20    30    10
6. Numerical experiments
Fig. 6. Performance of Algorithm 1 and CMA-ES to optimize the time-independent lookup table parameterization policy with ρE = 0.2 over 10 runs with the starting point
of θi = 1.
Fig. 7. Performance of Algorithm 1 running from four different starting points to optimize the time-independent lookup table parameterization policy with ρE = 0.2 over 10 runs of 500 iterations. The algorithm's parameters are set to a = 2, b = 1, mk = 1.
the wind forecast under the exponential decay parameterized policy.

In our last set of experiments, we test the performance of Algorithm 1 in optimizing the parameters for the parametric CFA approach. We focus on the lookup table parameterization ($\pi$ = lkup), which has a larger search space ($\theta \in \mathbb{R}^{23}$). To do so, we first need to fine-tune the parameters used in the algorithm. We use the stepsize policy in (42) with the choice of $\gamma_k = 0.1$ (as is common in the machine learning literature), different batch sizes at each iteration to compute (18) with (41), and weight coefficients in (36) which are multiplied by a constant factor, i.e.,

\[
\alpha_k = \frac{a}{\sqrt{\delta(d+4)N}}, \tag{44}
\]

for some $a > 0$. We also set $\delta = 1$ and $\eta_k = 0.1$. We then do the fine-tuning for $a$ in (44), $b$ in (42), and $m_k$ in (41) by running Algorithm 1 five times, each for 200 iterations starting from $\theta_i = 1\ \forall i$. We then evaluate the objective improvement after using 40 samples (20 iterations when $m_k = 1$). First, we fine-tune the algorithm w.r.t. $b$ by setting $a = m_k = 1$. As can be seen from the top graph in Fig. 5, the best performance is achieved by the choice of $b = 1$. Using this choice of $b$ and $m_k = 1$, we fine-tune the parameter $a$ and report the results in the middle graph of Fig. 5. As can be observed, the choice of $a = 2$ achieves the best performance. Finally, we fine-tune the batch size $m_k$ by setting $b = 1$, $a = 2$. As can be seen from the bottom graph of Fig. 5, the best performance is obtained by the choice of $m_k = 1$. Hence, we set the parameters to $a = 2$, $b = 1$, $m_k = 1$ for the rest of this experiment. We should emphasize that a better approach for fine-tuning more than one hyperparameter might be to use a bilevel programming model in which the upper-level problem is defined as an optimization problem w.r.t. the hyperparameters over the validation set, while the lower-level problem is defined w.r.t. the parametrized policies over the training data set. Recently, gradient-based models have been developed to solve such problems efficiently; however, this is beyond the scope of this paper.

After fine-tuning the aforementioned parameters, we compare the performance of Algorithm 1 against that of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which is a well-known derivative-free global optimization method. Since this method is designed for deterministic problems, we use the sample average approximation (SAA) approach for estimating the function values. We use a fixed number of samples (50 samples in total in each iteration of CMA-ES), run both methods 10 times, and report the average and standard deviation of the percentage of the objective improvement in Fig. 6. As can be seen, Algorithm 1 outperforms CMA-ES in terms of both the quality of the solutions and their variability. A possible reason is that good estimates of the objective function are key to the success of CMA-ES, and obtaining them requires a large number of samples, which is computationally very expensive in our case.

We also test the performance of Algorithm 1 by running it for 500 iterations using four different starting points. One of them is $\theta_i = 1\ \forall i$ and the other three are randomly chosen such that $\theta_i \in [0.5, 1.5]\ \forall i$. Again, we repeat the runs 10 times and report the average and standard deviation of the percentage of the objective improvement in Fig. 7. As can be seen, Algorithm 1 is able to find good parameterized policies regardless of the starting points. Moreover, it can improve the benchmark policy by up to 150%, which is significantly higher than the improvement achieved by the exponential decay (100%) and constant (40%) parametrized policies. While the numbers can change with the problem parameters, this highlights again the importance of careful parametrization in our proposed parametric CFA approach.

7. Conclusion

We provide a hybrid policy of deterministic lookahead and cost function approximations (CFA), namely, the parametric CFA, to find the best policy for energy storage problems in the presence of rolling forecasts. While this approach can handle complex stochastic models associated with the rolling forecasts, it comes at the cost of tuning parameters (policies). The objective function in the parametric CFA model is likely to be nonconvex and its unbiased gradient estimates are not easy to calculate. Hence, we present a new stochastic numerical derivative-based algorithm which only uses noisy function evaluations (obtained via simulations) to provide biased gradient estimates. By properly taking a weighted average of these biased gradient estimates, we reduce the variance associated with them, which enables us to control the accumulated bias errors. Furthermore, we establish the finite-time rate of convergence of this algorithm under different settings and show that it can practically find policies that perform better than the deterministic benchmark policy in optimizing an energy storage system in the presence of rolling forecasts.

References

Abhishek, K., Singh, M., Ghosh, S., & Anand, A. (2012). Weather forecasting model using artificial neural network. Procedia Technology, 4, 311–318.
Akarslan, E., Hocaoğlu, F. O., & Edizkan, R. (2014). A novel M-D (multi-dimensional) linear prediction filter approach for hourly solar radiation forecasting. Energy, 73(C), 978–986.
Almassalkhi, M. R., & Hiskens, I. A. (2015). Model-predictive cascade mitigation in electric power systems with storage and renewables–Part II: Case-study. IEEE Transactions on Power Systems, 30(1), 78–87.
Arbizu-Barrena, C., Ruiz-Arias, J. A., Rodríguez-Benítez, F. J., Pozo-Vázquez, D., & Tovar-Pescador, J. (2017). Short-term solar radiation forecasting by advecting and diffusing MSG cloud index. Solar Energy, 155, 1092–1103.
Dicorato, M., Forte, G., Pisani, M., & Trovato, M. (2012). Planning and operating combined wind-storage system in electricity market. IEEE Transactions on Sustainable Energy, 3(2), 209–217.
Dokka, T., & Frimpong, R. (2019). Approximate policy iteration using neural networks for storage problems. arXiv preprint arXiv:1910.01895.
Fu, M. C. (Ed.). (2015). Handbook of simulation optimization. Springer.
Ghadimi, S., & Lan, G. (2013). Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4), 2341–2368.
Ghadimi, S., Ruszczynski, A., & Wang, M. (2020). A single timescale stochastic approximation method for nested stochastic optimization. SIAM Journal on Optimization, 30(1), 960–979.
Graves, S. C., Meal, H. C., Dasu, S., & Qiu, Y. (1986). Two-stage production planning in a dynamic environment. In S. Axsäter, C. Schneeweiss, & E. Silver (Eds.), Multi-stage production planning and inventory control (pp. 9–43). Berlin, Heidelberg: Springer Berlin Heidelberg.
Heath, D. C., & Jackson, P. L. (1994). Modeling the evolution of demand forecasts with application to safety stock analysis in production/distribution systems. IIE Transactions, 26(3), 17–30.
Jiang, D. R., & Powell, W. B. (2015). Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS Journal on Computing, 27(3), 525–543.
Keerthisinghe, C., Chapman, A. C., & Verbič, G. (2019). Energy management of PV-storage systems: Policy approximations using machine learning. IEEE Transactions on Industrial Informatics, 15(1), 257–265.
Khotanzad, A., Davis, M. H., Abaye, A., & Maratukulam, D. J. (1996). An artificial neural network hourly temperature forecaster with applications in load forecasting. IEEE Transactions on Power Systems, 11(2), 870–876.
Kiaei, I., & Lotfifard, S. (2018). Tube-based model predictive control of energy storage systems for enhancing transient stability of power systems. IEEE Transactions on Smart Grid, 9(6), 6438–6447.
Kumar, R., Wenzel, M. J., Ellis, M. J., ElBsat, M. N., Drees, K. H., & Zavala, V. M. (2018). A stochastic model predictive control framework for stationary battery systems. IEEE Transactions on Power Systems, 33(4), 4397–4406.
Liu, H., Shi, J., & Erdem, E. (2010). Prediction of wind speed time series using modified Taylor Kriging method. Energy, 35(12), 4870–4879.
Liu, Y., Roberts, M. C., & Sioshansi, R. (2018). A vector autoregression weather model for electricity supply and demand modeling. Journal of Modern Power Systems and Clean Energy, 6(4), 763–776.
Nesterov, Y., & Spokoiny, V. (2017). Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2), 527–566. https://doi.org/10.1007/s10208-015-9296-2.
Powell, W. B. (2019). A unified framework for stochastic optimization. European Journal of Operational Research, 275(3), 795–821.
Powell, W. B., & Ghadimi, S. (2022). The parametric cost function approximation: A new approach for multistage stochastic programming. arXiv preprint arXiv:2201.00258.
Powell, W. B., & Meisel, S. (2016). Tutorial on stochastic optimization in energy - Part I: Modeling and policies. IEEE Transactions on Power Systems, 31(2), 1459–1467.
Powell, W. B., & Meisel, S. (2016). Tutorial on stochastic optimization in energy - Part II: An energy storage illustration. IEEE Transactions on Power Systems, 31(2), 1468–1475.
Saltyte Benth, J., Benth, F. E., & Jalinskas, P. (2007). A spatial-temporal model for temperature with seasonal variance. Journal of Applied Statistics, 34(7), 823–841.
Sapra, A., & Jackson, P. L. (2004). The martingale evolution of price forecasts in a supply chain market for capacity: Technical report.
Sioshansi, R., Madaeni, S. H., & Denholm, P. (2014). A dynamic programming approach to estimate the capacity value of energy storage. IEEE Transactions on Power Systems, 29(1), 395–403.
Spall, J. C. (1992). Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3), 332–341.
Strassen, V. (1965). The existence of probability measures with given marginals. Annals of Mathematical Statistics, 38, 423–439.
Taylor, J. W., & Buizza, R. (2003). A comparison of temperature density forecasts from GARCH and atmospheric models. Journal of Forecasting, 23(5), 337–355.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. In COURSERA: Neural networks for machine learning.
Traiteur, J. J., Callicutt, D. J., Smith, M., & Roy, S. B. (2012). A short-term ensemble wind speed forecasting system for wind power applications. Journal of Applied Meteorology and Climatology, 51(10), 1763–1774.
Xi, X., & Sioshansi, R. (2016). A dynamic programming model of energy storage and transformer deployments to relieve distribution constraints. Computational Management Science, 13, 119–146.
Xi, X., Sioshansi, R., & Marano, V. (2014). A stochastic dynamic programming model for co-optimization of distributed energy storage. Energy Systems, 5, 475–505.
Zafar, R., Ravishankar, J., Fletcher, J. E., & Pota, H. R. (2018). Multi-timescale model predictive control of battery energy storage system using conic relaxation in smart distribution grids. IEEE Transactions on Power Systems, 33(6), 7152–7161.
Zhang, Z., Zhang, Y., Huang, Q., & Lee, W. (2018). Market-oriented optimal dispatching strategy for a wind farm with a multiple stage hybrid energy storage system. CSEE Journal of Power and Energy Systems, 4(4), 417–424.