
Lyapunov-Regularized Reinforcement Learning for Power System Transient Stability

Wenqi Cui and Baosen Zhang
Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA {wenqicui, zhangbao}@uw.edu
The authors are supported in part by the National Science Foundation grant ECCS-1930605 and the Washington Clean Energy Institute.

arXiv:2103.03869v1 [eess.SY] 5 Mar 2021

Abstract— Transient stability of power systems is becoming increasingly important because of the growing integration of renewable resources. These resources lead to a reduction in mechanical inertia, but also provide increased flexibility in frequency responses. Namely, their power electronic interfaces can implement almost arbitrary control laws. To design these controllers, reinforcement learning (RL) has emerged as a powerful method for searching for optimal non-linear control policies parameterized by neural networks. A key challenge is to enforce that a learned controller must be stabilizing. This paper proposes a Lyapunov-regularized RL approach for optimal frequency control for transient stability in lossy networks. Because of the lack of an analytical Lyapunov function, we learn a Lyapunov function parameterized by a neural network. The losses are specially designed with respect to the physical power system. The learned neural Lyapunov function is then utilized as a regularization to train the neural network controller by penalizing actions that violate the Lyapunov conditions. Case studies show that introducing the Lyapunov regularization enables the controller to be stabilizing and achieve smaller losses.

I. INTRODUCTION

Transient stability in power systems refers to the ability of a system to converge to an acceptable steady state after a disturbance [1], [2]. With the increased penetration of renewable energy sources (RES), power systems have reduced inertia and transient stability is becoming increasingly important [3]. Meanwhile, RES are connected to the grid via electronic interfaces and can be controlled freely by inverters to implement almost arbitrary control laws. Instead of the linear droop frequency response found in conventional generators, the response of inverter-based RES can be optimized to improve performance by implementing more flexible control laws [4].

Transient stability models describe how frequency changes in a system under large deviations of operating states, and use the full nonlinear AC power flow equations [1]. Two challenges emerge in controller design. Firstly, the problem is over a functional space, which is infinite-dimensional. Secondly, the controllers should be stabilizing, which is a nontrivial constraint to enforce algorithmically for nonlinear systems.

A popular way to address the first challenge is to parameterize the controllers (e.g., using a neural network) and train them using reinforcement learning (RL) [5]. Abundant algorithms, including Q-learning, deep deterministic policy gradient (DDPG), and actor-critic methods, have been proposed for optimal control (see, e.g., [6] and the references within). References [7]–[10] apply these algorithms to power system frequency regulation. However, the stabilizing requirement of the controllers is not considered in these works.

The challenge of ensuring controllers are stable is more difficult to address. If a Lyapunov function is available, it can potentially provide analytical constraints on the controller. For lossless power systems, using a well-known energy function [2], [11], our previous work in [12] showed how to impose structural constraints on the neural network controllers such that they are guaranteed to be stabilizing. Unfortunately, for lossy networks, there are no known analytic energy functions [1]. Most transmission lines have nonzero resistances, and distribution systems can have high r/x ratios.

If analytical Lyapunov functions are not available, a natural approach would be to learn a Lyapunov function to facilitate controller design. For example, given input/output data and the assumption that the underlying system is stable, reference [13] learns a Lyapunov function jointly with learning the system model to find stable system dynamics. The work in [14] uses satisfiability modulo theories solvers to formally verify that a function satisfies the Lyapunov conditions, and has been applied to microgrids [15]. However, it is currently computationally tractable only for small systems. Moreover, the above works focus on verifying that a system is stable and do not include controller design.

This paper proposes a Lyapunov regularization approach to guide the training of neural network controllers for primary frequency response in lossy power systems. We learn a Lyapunov function parameterized by a neural network. The loss function for training the neural Lyapunov function is designed to satisfy the positive definiteness of its value and the negative definiteness of its Lie derivative. Existing methods in [13]–[15] weigh all the states equally in the loss function, but this causes the Lyapunov function to be suboptimal near the equilibrium, since the magnitude of the states' time derivatives shrinks quickly when approaching the equilibrium. Considering that the states near the equilibrium are more important for control, we specially design the loss function such that the area around the equilibrium is emphasized.

The neural Lyapunov function is utilized as a regularization to train the neural network controller by penalizing actions that violate the Lyapunov conditions. The regularized RL is integrated in the RNN-based framework from our previous work [12] to increase its training efficiency. Simulation results show that the learned function satisfies the Lyapunov conditions for almost all points in the state space, thus making it a good tool for regularization. Case studies show that introducing the Lyapunov regularization enables the controller to achieve smaller loss. More importantly, a controller designed without regularization can lead to unstable behaviors. All of the code and data described in this paper are publicly available at https://github.com/Wenqi-Cui/Lyapunov-Regularized-RL. One important direction of future work is to verify whether the learned function satisfies the Lyapunov conditions for all points in a region.

II. MODEL AND PROBLEM FORMULATION

A. Frequency Dynamics

Let N be the number of buses and E be the set of transmission lines connecting the buses. The susceptance and conductance of line (i, j) ∈ E are Bij = Bji > 0 and Gij = Gji > 0, respectively. When (k, l) ∉ E, the values are 0. We use the Kron-reduced model to aggregate load buses into generator buses [16], [17]. We assume that each bus i has conventional inertia Mi, and the damping from synchronous generators and loads is denoted as Di. We assume that each bus has inverter-connected resources that can be controlled. Note that this is without loss of generality, since the upper and lower actuation bounds can be set to zero if a resource is not present.

The angle and frequency of bus i are δi and ωi, respectively. We assume that the bus voltage magnitudes are 1 p.u. and the reactive power flows are ignored. The dynamics of the power system are represented by the swing equation [18]:

δ̇i = ωi,  ∀i = 1, · · · , N   (1a)
Mi ω̇i = Pm,i − Di ωi − ui(ωi) − Σ_{j=1, j≠i}^{N} Bij sin(δi − δj) − Σ_{j=1, j≠i}^{N} Gij cos(δi − δj),  ∀i = 1, · · · , N   (1b)

where ui(ωi) is the controller that changes active power to provide primary frequency response. Because power systems do not have real-time communication infrastructure, we restrict ui to be a static feedback controller where only its local frequency measurement ωi is available. We envision the control being provided by renewable energy resources such as batteries and solar PV. In the primary frequency regulation timescale, from 100 ms to a few seconds, the main limitation on actuation comes from power injection constraints.
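To make the dynamics concrete, below is a minimal sketch of one forward-Euler step of (1), assuming PyTorch; the function and argument names (swing_step, M, D, Pm, B, G) are illustrative and not taken from the paper's released code. The 0.02 s step size matches the simulation setting in Section V.

```python
import torch

def swing_step(delta, omega, u, M, D, Pm, B, G, dt=0.02):
    """One forward-Euler step of the swing equation (1) for an N-bus system.

    delta, omega: (N,) tensors of angles [rad] and frequency deviations.
    M, D, Pm:     (N,) tensors of inertia, damping, and mechanical power.
    B, G:         (N, N) susceptance/conductance matrices with zero
                  diagonals, so the j = i terms vanish in the sums below.
    u:            callable mapping omega (N,) to the control actions (N,).
    """
    diff = delta.unsqueeze(1) - delta.unsqueeze(0)        # δi − δj, (N, N)
    flows = (B * torch.sin(diff) + G * torch.cos(diff)).sum(dim=1)
    omega_dot = (Pm - D * omega - u(omega) - flows) / M   # eq. (1b)
    return delta + dt * omega, omega + dt * omega_dot     # eq. (1a)
```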
B. Optimization Problem Formulation

The objective is to minimize the cost of frequency deviations and the control effort. In this paper, we use the frequency nadir, which is the infinity norm of ωi(t) over the time horizon from 0 to T, defined as ||ωi||∞ = sup_{0≤t≤T} |ωi(t)| [19]. We use a quadratic cost for the control actions, defined by ||ui||₂² = (1/T) ∫_0^T (ui(t))² dt [17], [20]. We aim to find an optimal stabilizing controller u(·) by solving (2):

min_u Σ_{i=1}^{N} ( ||ωi||∞ + γ||ui||₂² )   (2a)
s.t.  (1a)–(1b)   (2b)
      u̲i ≤ ui(ωi) ≤ ūi   (2c)
      ui(ωi) is stabilizing   (2d)

where γ is a tradeoff parameter between the cost of frequency deviation and the cost of action. The swing equations are in (2b). The controllers are power-limited within the upper bound ūi and lower bound u̲i in (2c). We impose the condition that the controller should be stabilizing in (2d). Constraints (2b)-(2d) hold for time t from 0 to T. Other costs (e.g., total frequency deviation and the rate of change of frequency) can also be accommodated.

Problem (2) is challenging to solve by conventional control techniques, and we will use RL to find u(·). The key difficulty is to quantify the stability requirement in (2d). We mitigate this difficulty by using a Lyapunov function, which provides algebraic conditions for (2d). Since a Lyapunov function is not known for lossy systems [1], we show how one can be learned in the next section.

III. LEARNING A LYAPUNOV FUNCTION

A. Lyapunov Conditions

From standard system theory, the Lyapunov function needs to satisfy conditions on its value and its Lie derivative [21]. Let the state space be D = {(δ, ω) | δ = (δ1, · · · , δN), ω = (ω1, · · · , ωN)}. The state transition dynamics (1) are written as (δ̇, ω̇) = fu(δ, ω), where fu stands for the state transition function with respect to the controller u. Using the notation from [14], we have

Definition 1 (Lie Derivative). The Lie derivative of the continuously differentiable scalar function V : D → R over the vector field fu is defined as

∇fu V(δ, ω) = Σ_{i=1}^{N} [ (∂V(δ, ω)/∂δi) δ̇i + (∂V(δ, ω)/∂ωi) ω̇i ]   (3)

It measures the rate of change of V along the direction of the system dynamics. The next proposition is standard in nonlinear systems.

Proposition 1 (Lyapunov function and asymptotic stability). Consider a controlled system described by (1) with equilibrium at (δ*, ω*). Suppose there exists a continuously differentiable function V : D → R that satisfies the following conditions:

V(δ, ω) > V(δ*, ω*)  ∀(δ, ω) ∈ D \ {(δ*, ω*)}   (4a)
∇fu V(δ, ω) < 0  ∀(δ, ω) ∈ D \ {(δ*, ω*)}   (4b)
∇fu V(δ*, ω*) = 0   (4c)

Then the system is asymptotically stable at the equilibrium.
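Because the partial derivatives in (3) are exactly what automatic differentiation provides, the Lie derivative of a (neural) Lyapunov function can be evaluated without deriving it by hand. The sketch below is one possible implementation, again assuming PyTorch; lie_derivative and its arguments are illustrative names.

```python
import torch

def lie_derivative(V, delta, omega, delta_dot, omega_dot):
    """Evaluate (3): ∇fu V = Σi ∂V/∂δi · δ̇i + ∂V/∂ωi · ω̇i, per sample.

    V:                    callable (H, N), (H, N) -> (H,) scalar values.
    delta, omega:         (H, N) batch of states.
    delta_dot, omega_dot: (H, N) time derivatives from the dynamics (1);
                          the controller enters only through omega_dot.
    """
    # States are treated as constants here; gradients with respect to the
    # controller flow through omega_dot, matching the use in Sec. IV.
    delta = delta.detach().requires_grad_(True)
    omega = omega.detach().requires_grad_(True)
    v = V(delta, omega).sum()   # summing keeps the per-sample gradients
    dV_ddelta, dV_domega = torch.autograd.grad(
        v, (delta, omega), create_graph=True)
    return (dV_ddelta * delta_dot).sum(dim=1) + (dV_domega * omega_dot).sum(dim=1)
```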
In this paper, the Lyapunov function is parameterized using a neural network with weights φ, and written as Vφ(δ, ω). For differentiability, we use ELU activation functions. Note that Vφ(δ, ω) is purely a function of the state variable (δ, ω), while ∇fu Vφ(δ, ω) will be affected by the controller ui through the term ω̇i in (3). Therefore, only ∇fu Vφ(δ, ω) will be utilized to regularize the controller once it is learned.

B. Learning the Lyapunov Function

The condition (4a) is easy to satisfy if we explicitly engineer the structure of Vφ(δ, ω). To name a few options, Vφ(δ, ω) can be formulated using a convex function achieving its minimum at the equilibrium. Or, given an arbitrary function g(δ, ω) and a positive scalar ε, (4a) can be enforced by taking Vφ(δ, ω) = (g(δ, ω) − g(δ*, ω*))² + ε||(δ, ω) − (δ*, ω*)||₂. However, such parameterizations may be too restrictive and make it hard to satisfy (4b). Therefore, we do not explicitly engineer the structure to satisfy condition (4a).

In this paper we use loss functions to penalize violations of (4a)-(4c). Training is implemented in a batch updating style, where the number of batches is H and the state of the h-th batch is randomly generated as (δh, ωh) ∈ D for h = 1, · · · , H. The losses are designed with respect to the following considerations (a code sketch assembling them follows at the end of this subsection):

1) Avoid overfitting when δ̇ and ω̇ are large: A loss that weighs all points in the space equally leads ∇fu Vφ(δ, ω) to have very negative values when δ and ω are far away from the equilibrium, and may violate (4b) for points close to the equilibrium. This contradicts the premise that the small region around the equilibrium should be stabilizing. Therefore, we design the loss term with ∇fu Vφ(δ, ω) to be

l1(φ) = (1/H) Σ_{h=1}^{H} tanh(∇fu Vφ(δh, ωh)) · exp(−||(δh, ωh) − (δ*, ω*)||₂ / µ)   (5)

where the term tanh(∇fu Vφ(δh, ωh)) avoids overfitting ∇fu Vφ(δh, ωh) to extremely negative values, and the term exp(−||(δh, ωh) − (δ*, ω*)||₂ / µ) emphasizes the importance of (δh, ωh) closer to the equilibrium. The hyper-parameter µ controls the rate of decay.

2) Penalty term with Vφ(δ*, ω*) − Vφ(δ, ω): In order to satisfy condition (4a), Vφ(δh, ωh) that is smaller than Vφ(δ*, ω*) needs to be penalized. Let σ(·) be the ReLU function. Define the loss term as:

l2(φ) = (1/H) Σ_{h=1}^{H} σ(−Vφ(δh, ωh) + Vφ(δ*, ω*))   (6)

3) Penalty term with ∇fu Vφ(δ*, ω*): This term is employed to mitigate numerical errors. We design an extra loss term to penalize the value of ∇fu Vφ(δ*, ω*):

l3(φ) = (∇fu Vφ(δ*, ω*))² + σ(∇fu Vφ(δ*, ω*))   (7)

where (∇fu Vφ(δ*, ω*))² guarantees the small magnitude of ∇fu Vφ(δ*, ω*). Considering that ∇fu Vφ(δh, ωh) should never be positive, we add σ(∇fu Vφ(δ*, ω*)) to guarantee that ∇fu Vφ(δ*, ω*) is negative and close to zero. This way, the zero action at the equilibrium is guaranteed to satisfy the Lyapunov conditions.

Combining (5)-(7), the total loss function is

Lq(φ) = q1 l1(φ) + q2 l2(φ) + q3 l3(φ)   (8)

where q1, q2, q3 are hyperparameters balancing the loss terms, with q3 tuned to be much larger than the others. Note that the equilibrium (δ*, ω*) is obtained from the steady state of (1), and we fix the equilibrium in training. Of course, the equilibrium changes if the load or the parameters change. More specifically, ω* = 0 always, while δ* varies. Since we only use the learned function as a regularization to train a controller, we are robust to changes in the equilibrium point. If the learned function is used to certify stability, then the changes in equilibrium should be carefully accounted for.
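The sketch below assembles (5)-(8) for a batch of sampled states, reusing lie_derivative from the earlier sketch. It is illustrative rather than the released implementation; the default weights follow the case study values in Section V.

```python
import torch
import torch.nn.functional as F

def lyapunov_loss(V, delta, omega, lie, delta_eq, omega_eq, lie_eq,
                  mu=50.0, q=(10.0, 5.0, 100.0)):
    """delta, omega: (H, N) sampled states; lie: (H,) their Lie derivatives.
    delta_eq, omega_eq: (N,) equilibrium; lie_eq: Lie derivative there."""
    dist = torch.linalg.norm(
        torch.cat([delta - delta_eq, omega - omega_eq], dim=1), dim=1)
    # (5): tanh caps how negative the Lie derivative can drive the loss,
    # and exp(-dist/mu) emphasizes samples near the equilibrium.
    l1 = (torch.tanh(lie) * torch.exp(-dist / mu)).mean()
    # (6): penalize Vφ falling below its equilibrium value (σ = ReLU).
    v_eq = V(delta_eq.unsqueeze(0), omega_eq.unsqueeze(0)).squeeze()
    l2 = F.relu(v_eq - V(delta, omega)).mean()
    # (7): pin the Lie derivative at the equilibrium to zero from below.
    l3 = lie_eq ** 2 + F.relu(lie_eq)
    q1, q2, q3 = q
    return q1 * l1 + q2 * l2 + q3 * l3                    # eq. (8)
```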
C. Algorithm with Active Sampling

The goal in training the neural Lyapunov function is to make a larger proportion of the batch samples satisfy the conditions (4). The pseudo-code for our proposed method is given in Algorithm 1. A linear controller is used to initialize training. Let ρ be the proportion of samples that satisfy the conditions (4). After most of the samples (e.g., ρ > 95%) have already satisfied the conditions, it becomes difficult to improve the neural Lyapunov function further, since the loss function remains almost unchanged even though ρ increases slightly. We augment the training performance by collecting samples that violate (4) and adding them to the next batch of training (a sketch of this step follows Algorithm 1). This way, the neural Lyapunov function improves efficiently and ρ can reach 99.9% in the end. The Adam algorithm is adopted to update the weights φ in each episode.

Algorithm 1 Learning the neural Lyapunov function
Require: Learning rate α, number of episodes I, state transition function (1), hyperparameters in (5)-(8)
Input: Droop coefficient li for the i-th bus, i = 1, · · · , N
Initialisation: Initial weights φ for the neural network
1: for episode = 1 to I do
2:   Generate batch state samples (δh, ωh) for the h-th batch, h = 1, · · · , H
3:   If ρ > ρ̄, add the samples (δ̂, ω̂) that violate the Lyapunov conditions: {(δ, ω)} ← {(δ, ω), (δ̂, ω̂)}
4:   Compute fu(δ, ω) for the sample states with linear droop control using (1)
5:   Calculate Vφ(δ, ω) and ∇fu Vφ(δ, ω)
6:   Identify the states (δ̂, ω̂) that do not satisfy the Lyapunov conditions and their proportion ρ
7:   Calculate the total loss over all batches using (5)-(8)
8:   Update the weights of the neural network by passing the loss to the Adam optimizer: φ ← φ − α·Adam(Loss)
9: end for
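A minimal sketch of the active-sampling step (line 3 of Algorithm 1) is given below; the names and thresholds are illustrative, with the sampling ranges taken from the simulation setting in Section V.

```python
import torch

def resample_with_violations(delta, omega, v, v_eq, lie, rho_bar=0.95,
                             delta_range=20.0, omega_range=30.0):
    """Build the next batch, recycling states that violate (4a) or (4b).

    delta, omega: (H, N) current samples; v, lie: (H,) values of Vφ and
    ∇fu Vφ at those samples; v_eq: Vφ at the equilibrium."""
    violated = (v <= v_eq) | (lie >= 0)           # Lyapunov conditions (4)
    rho = 1.0 - violated.float().mean().item()    # fraction satisfying (4)
    H, N = delta.shape
    new_delta = (torch.rand(H, N) * 2 - 1) * delta_range
    new_omega = (torch.rand(H, N) * 2 - 1) * omega_range
    if rho > rho_bar:                             # keep the hard states
        new_delta = torch.cat([new_delta, delta[violated]])
        new_omega = torch.cat([new_omega, omega[violated]])
    return new_delta, new_omega, rho
```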
IV. LEARNING A NEURAL NETWORK CONTROLLER WITH LYAPUNOV REGULARIZATION

We propose to use the learned neural Lyapunov function to guide the training of the neural network controller. We adopt the neural Lyapunov function as an additional regularization that is used during the training process of the neural network controller. The real-time control policy is computed from the input of the local frequency deviation and the weights that are trained offline. Note that we may be able to achieve better performance through a projection if the Lyapunov conditions are violated. However, such a projection requires information about all the state variables in real time, which is unrealistic for power systems with large numbers of nodes and limited communication.

A. Lyapunov Regularization

Given a Lyapunov function, Proposition 2 gives a condition for local exponential stability [22].

Proposition 2 (Locally exponentially stable condition). For a function V : D → R satisfying (4), if there is a constant β > 0 such that for all (δ, ω) ∈ D we have

∇fu V(δ, ω) ≤ −β(V(δ, ω) − V(δ*, ω*))   (9)

then the equilibrium is locally exponentially stable.

In order to satisfy (9) with the neural network controller, we propose a Lyapunov regularization approach in which the action is penalized if this inequality does not hold. Compared with traditional regularization (e.g., lasso, ridge) or a penalty term on large state magnitudes, we do not add regularization uniformly to all the weights or actions. Instead, the action is only penalized when (9) is violated. The regularization term is

Rφ(uθ) = σ(∇fu Vφ(δ, ω) + β(Vφ(δ, ω) − Vφ(δ*, ω*)))
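In code, this regularization is a one-line hinge that is active only when the exponential-stability condition (9) is violated; the sketch below is illustrative, with β = 0.005 as in the case study.

```python
import torch.nn.functional as F

def lyapunov_regularizer(v, v_eq, lie, beta=0.005):
    """Rφ(uθ): v, lie are Vφ and ∇fu Vφ along a trajectory; v_eq = Vφ(δ*, ω*)."""
    return F.relu(lie + beta * (v - v_eq))   # zero whenever (9) holds
```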


B. Controller and Architecture

The formulation of the controller and the training architecture are from our previous work [12]. For completeness, we reiterate the key design in this subsection. The work in [12] showed that a controller mapping frequency to active power needs to be a function that is monotonically increasing and goes through the origin. To this end, we explicitly engineer the neural network controller with a stacked-ReLU structure, represented as (10):

ui(ωi) = qi σ(1ωi + bi) + zi σ(−1ωi + ci)   (10a)
Σ_{j=1}^{l} qij ≥ 0,  Σ_{j=1}^{l} zij ≤ 0,  ∀l = 1, 2, · · · , m   (10b)
bi^1 = 0,  bi^l ≤ bi^{l−1},  ∀l = 2, 3, · · · , m   (10c)
ci^1 = 0,  ci^l ≤ ci^{l−1},  ∀l = 2, 3, · · · , m   (10d)

where m is the number of hidden units and 1 ∈ R^m is the all-ones column vector. The variables qi = [qi1 qi2 · · · qim] and zi = [zi1 zi2 · · · zim] are the weight vectors of bus i; bi = [bi^1 bi^2 · · · bi^m]ᵀ and ci = [ci^1 ci^2 · · · ci^m]ᵀ are the corresponding bias vectors. The variables to be trained are the weights θ = {q, b, z, c} in (10).
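One way to satisfy the constraints (10b)-(10d) by construction is to parameterize the nonnegative prefix sums of q (nonpositive for z) and the nonincreasing biases directly, as in the illustrative PyTorch sketch below; the paper's released code may enforce (10) differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedReLUController(nn.Module):
    """Monotonically increasing stacked-ReLU controller (10) with u(0) = 0."""

    def __init__(self, m=20):
        super().__init__()
        self.sq = nn.Parameter(torch.rand(m))      # prefix sums of q (made > 0)
        self.sz = nn.Parameter(torch.rand(m))      # prefix sums of -z (made > 0)
        self.db = nn.Parameter(torch.rand(m - 1))  # bias decrements (made > 0)
        self.dc = nn.Parameter(torch.rand(m - 1))

    def forward(self, omega):
        omega = omega.unsqueeze(-1)                # broadcast over the m units
        sq, sz = F.softplus(self.sq), F.softplus(self.sz)
        q = torch.diff(sq, prepend=sq.new_zeros(1))   # Σ_{j≤l} qj = sq_l > 0
        z = -torch.diff(sz, prepend=sz.new_zeros(1))  # Σ_{j≤l} zj = -sz_l < 0
        zero = self.db.new_zeros(1)
        b = torch.cat([zero, -torch.cumsum(F.softplus(self.db), 0)])  # b1 = 0, decreasing
        c = torch.cat([zero, -torch.cumsum(F.softplus(self.dc), 0)])  # c1 = 0, decreasing
        return (q * torch.relu(omega + b) + z * torch.relu(-omega + c)).sum(-1)
```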
The pseudo-code for learning the neural network controller
(l−1)
b1i = 0, bli ≤ bi , ∀l = 2, 3, · · · , m (10c) is given in Algorithm 2. Training is implemented in a batch
(l−1) updating style where the h-th batch initialized with randomly
c1i = 0, cli ≤ ci , ∀l = 2, 3, · · · , m (10d)
generated initial states {δih (0), ωih (0)} for all i = 1, · · · , N .
where m is the number of hidden units and 1 ∈ Rm is the The evolution of states in K stages will be computed through
all 1’s column vector. Variables qi = [qi1 qi2 · · · qim ] structure of RNN as shown by Fig.1. Although algorithms 1
and zi = [zi1 zi2 · · · zim ] are the weight vector of bus i; and 2 can be iterated to make further update, we did not see
bi = [b1i b2i · · · bm |
i ] and ci = [ci
1
c2i · · · cm |
i ] are an obvious improvement in simulation.
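The sketch below unrolls the discretized dynamics for K stages and assembles the loss (11), reusing swing_step, lie_derivative, StackedReLUController, and lyapunov_regularizer from the earlier sketches; the wiring is illustrative rather than the paper's released RNN implementation.

```python
import torch

def rollout_loss(delta, omega, controllers, V, v_eq, sys,
                 K=100, dt=0.02, gamma=0.005, beta=0.005, lam=0.01):
    """Loss (11) for one trajectory. delta, omega: (N,) initial states;
    controllers: list of N per-bus modules; sys: dict with M, D, Pm, B, G."""
    omegas, acts, regs = [], [], []
    for _ in range(K):
        act = torch.stack([c(omega[i]) for i, c in enumerate(controllers)])
        new_delta, new_omega = swing_step(delta, omega, lambda w: act,
                                          dt=dt, **sys)
        # Under forward Euler, (new_omega - omega)/dt is exactly ω̇ at step k,
        # and δ̇ = ω, so the Lie derivative (3) can be evaluated in place.
        lie = lie_derivative(V, delta.unsqueeze(0), omega.unsqueeze(0),
                             omega.unsqueeze(0),
                             ((new_omega - omega) / dt).unsqueeze(0))
        v = V(delta.unsqueeze(0), omega.unsqueeze(0)).squeeze()
        regs.append(lyapunov_regularizer(v, v_eq, lie.squeeze(), beta))
        omegas.append(omega)
        acts.append(act ** 2)
        delta, omega = new_delta, new_omega

    omegas, acts = torch.stack(omegas), torch.stack(acts)  # (K, N)
    nadir = omegas.abs().amax(dim=0)     # per-bus ||ωi||∞ over the horizon
    # Σi Yi³ = Rφ since Yi³ = Rφ/N, so the last term is λ·(1/K)·Σk Rφ(k).
    return (nadir + gamma * acts.mean(dim=0)).sum() + lam * torch.stack(regs).mean()
```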
Algorithm 2 Reinforcement learning with the RNN
Require: Learning rate α, batch size H, total time stages K, number of episodes I, parameters in the optimal frequency control problem (2)
Input: The neural Lyapunov function Vφ(δ, ω)
Initialisation: Initial weights θ for the control network
1: for episode = 1 to I do
2:   Generate initial states δih(0), ωih(0) for the i-th bus in the h-th batch, i = 1, · · · , N, h = 1, · · · , H
3:   Reset the state of the cells in each batch to the initial value xih ← {δih(0), ωih(0)}
4:   The RNN cells compute through K stages to obtain the outputs {Yh,i(0), Yh,i(1), · · · , Yh,i(K)}
5:   Calculate the total loss over all batches: Loss = (1/H) Σ_{h=1}^{H} Σ_{i=1}^{N} [ max_{k=0,···,K} |Y_{h,i}^1(k)| + γ (1/K) Σ_{k=1}^{K} Y_{h,i}^2(k) + λ (1/K) Σ_{k=1}^{K} Y_{h,i}^3(k) ]
6:   Update the weights of the neural network by passing the loss to the Adam optimizer: θ ← θ − α·Adam(Loss)
7: end for

V. CASE STUDY

Case studies are conducted on the IEEE New England 10-machine 39-bus (NE39) power network to illustrate the effectiveness of the proposed method. We visualize the learned Lyapunov function and its Lie derivative. Then we show that regularization is necessary, in the sense that a controller learned without it can be unstable. Lastly, we show the training losses.

A. Simulation Setting

The step size for the discrete simulation is set to 0.02 s (20 ms) and the number of time stages K is 100. The mechanical power Pm,i is set at the nominal values, the bound on the action ui is uniformly distributed in [0.8Pm,i, Pm,i], and γ is set to 0.005. The parameters for training the neural networks are:

• The neural Lyapunov function is parameterized as a dense neural network with one hidden layer of 50 neurons and ELU activation. The number of episodes is 4000. The hyper-parameters in (5)-(11) are µ = 50, q1 = 10, q2 = 5, q3 = 100, β = 0.005, λ = 0.01. Each episode has a batch size of 1000 with random state samples, where δih is uniformly distributed in [−20, 20] rad and ωih is uniformly distributed in [−30, 30] Hz. Note that these values are far larger than the nominal variation of δi and ωi. This configuration worked the best among the values we tried. Trainable weights are updated using Adam with a learning rate initialized at 0.05 that decays every 100 steps with a base of 0.9.
• The neural network controller is parameterized as the stacked-ReLU function (10) with 20 neurons (m = 20). The number of episodes is 400. The initial states of angle and frequency are randomly generated such that δi(0) is uniformly distributed in [−1, 1] rad and ωi(0) is uniformly distributed in [−0.5, 0.5] Hz. Trainable weights are updated using Adam with a learning rate initialized at 0.04 that decays every 30 steps with a base of 0.7.

B. Visualization of the Lyapunov Function and the Lie Derivative

To visualize the Lyapunov function with a large number of state variables, we fix all the states at their equilibrium values and vary the state variables of one generator bus. Fig. 2 illustrates the value of the Lyapunov function and the Lie derivative with the variation of δ and ω at generator bus 5. The Lyapunov function Vφ(δ, ω) achieves its minimum at the equilibrium point and thus satisfies condition (4a). The Lie derivative ∇fu Vφ(δ, ω) is smaller than zero in most of the region and thus also generally satisfies condition (4b). After convergence, only 0.1% of samples, with ω sufficiently close to zero, make ∇fu Vφ(δ, ω) slightly positive. Such a small positive value only leads to a small Lyapunov regularization term and therefore has negligible impact on the training of the neural network controller.

Fig. 2. Neural Lyapunov function (left) and Lie derivative (right) when changing (δ, ω) at generator 5 and keeping the state variables of the other generators at their equilibrium values.

Fig. 3. Dynamics of the frequency deviation ω and control action u at selected generator buses for (a) RNN-Lyapunov and (b) RNN-w.o.-Lyapunov: (a) dynamics of u (left) and ω (right) for the RNN with Lyapunov regularization; (b) dynamics of u (left) and ω (right) for the RNN without Lyapunov regularization. The neural network controller trained with Lyapunov regularization achieves better stabilizing performance.

C. Performance Comparison

Under the same hyperparameters and RNN structure, we train the neural network controller with Lyapunov regularization (labeled RNN-Lyapunov) and without Lyapunov regularization (labeled RNN-w.o.-Lyapunov), respectively. The dynamics of the system with the different controllers are illustrated in Fig. 3. For RNN-Lyapunov in Fig. 3(a), the largest frequency deviation varies in [−0.015, 0.02] Hz. By
contrast, RNN-w.o.-Lyapunov in Fig. 3(b) has a larger frequency deviation that varies in [−0.03, 0.02] Hz and cannot stabilize well.

We further compare RNN-Lyapunov and RNN-w.o.-Lyapunov with the benchmark of linear droop control, where the droop coefficient is obtained by solving problem (2) using the fmincon function of Matlab [12]. Fig. 4 illustrates the control policies obtained from the three methods. Compared with linear droop control, the stacked-ReLU neural network learns a highly non-linear controller. The average cost along the episodes, normalized by the cost of linear droop control, is shown in Fig. 5. Both RNN-Lyapunov and RNN-w.o.-Lyapunov converge in approximately 150 episodes. After convergence, RNN-Lyapunov reduces the cost by approximately 19% compared with linear droop control. The reduction is 5% more than that of RNN-w.o.-Lyapunov. Therefore, the proposed method learns a non-linear stabilizing controller that performs better than traditional linear droop control.

Fig. 4. Control action u of RNN-Lyapunov, RNN-w.o.-Lyapunov, and linear droop control for generator bus 10. Lyapunov regularization leads to a different non-linear control law.

Fig. 5. Normalized cost along the episodes during the training of the neural network controller with and without Lyapunov regularization. RNN-Lyapunov and RNN-w.o.-Lyapunov reduce the cost by approximately 19% and 14%, respectively, compared with linear droop control.

VI. CONCLUSION

This paper proposes a Lyapunov regularization approach to guide the training of neural network controllers for primary frequency response for transient stability. A function parameterized as a neural network is learned to overcome the difficulty brought by the non-existence of analytical Lyapunov functions for lossy power networks. By integrating the neural Lyapunov function as a regularization term in the RL training of the neural network controller, control actions that violate the Lyapunov conditions are penalized. Case studies verify that introducing the Lyapunov regularization enables the controller to be stabilizing and achieve smaller losses, whereas controllers trained without regularization can fail to stabilize the system.

REFERENCES

[1] H.-D. Chiang, "Study of the existence of energy functions for power systems with losses," IEEE Transactions on Circuits and Systems, vol. 36, no. 11, pp. 1423–1429, 1989.
[2] A. Arapostathis, S. Sastry, and P. Varaiya, "Global analysis of swing dynamics," IEEE Transactions on Circuits and Systems, vol. 29, no. 10, pp. 673–679, 1982.
[3] Y. Jiang, R. Pates, and E. Mallada, "Dynamic droop control in low-inertia power systems," IEEE Transactions on Automatic Control, 2020.
[4] B. B. Johnson, S. V. Dhople, A. O. Hamadeh, and P. T. Krein, "Synchronization of parallel single-phase inverters with virtual oscillator control," IEEE Transactions on Power Electronics, vol. 29, no. 11, pp. 6124–6138, 2013.
[5] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, "Reinforcement learning for decision-making and control in power systems: Tutorial, review, and vision," arXiv preprint arXiv:2102.01168, 2021.
[6] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[7] Z. Yan and Y. Xu, "Data-driven load frequency control for stochastic power systems: A deep reinforcement learning method with continuous action search," IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 1653–1656, 2018.
[8] C. Chen, M. Cui, F. F. Li, S. Yin, and X. Wang, "Model-free emergency frequency control based on reinforcement learning," IEEE Transactions on Industrial Informatics, 2020.
[9] J. Duan, D. Shi, R. Diao, H. Li, Z. Wang, B. Zhang, D. Bian, and Z. Yi, "Deep-reinforcement-learning-based autonomous voltage control for power grid operations," IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 814–817, 2019.
[10] D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel, "Reinforcement learning versus model predictive control: a comparison on a power system problem," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 517–529, 2008.
[11] I. Dobson and H.-D. Chiang, "Towards a theory of voltage collapse in electric power systems," Systems & Control Letters, vol. 13, no. 3, pp. 253–262, 1989.
[12] W. Cui and B. Zhang, "Reinforcement learning for optimal frequency control: A Lyapunov approach," arXiv preprint arXiv:2009.05654, 2020.
[13] G. Manek and J. Z. Kolter, "Learning stable deep dynamics models," arXiv preprint arXiv:2001.06116, 2020.
[14] Y.-C. Chang, N. Roohi, and S. Gao, "Neural Lyapunov control," Advances in Neural Information Processing Systems, 2019.
[15] T. Huang, S. Gao, and L. Xie, "Transient stability assessment of networked microgrids using neural Lyapunov methods," arXiv preprint arXiv:2012.01333, 2020.
[16] T. Nishikawa and A. E. Motter, "Comparative analysis of existing models for power-grid synchronization," New Journal of Physics, vol. 17, no. 1, p. 015012, 2015.
[17] A. Ademola-Idowu and B. Zhang, "Frequency stability using inverter power control in low-inertia power systems," IEEE Transactions on Power Systems, pp. 1–1, 2020.
[18] P. Kundur, N. J. Balu, and M. G. Lauby, Power System Stability and Control. McGraw-Hill, New York, 1994, vol. 7.
[19] D. Tabas and B. Zhang, "Optimal l-infinity frequency control in microgrids considering actuator saturation," arXiv preprint arXiv:1910.03720, 2019.
[20] F. Dörfler, M. Chertkov, and F. Bullo, "Synchronization in complex oscillator networks and smart grids," Proceedings of the National Academy of Sciences, vol. 110, no. 6, pp. 2005–2010, 2013.
[21] S. Sastry, Nonlinear Systems: Analysis, Stability, and Control. Springer Science & Business Media, 2013, vol. 10.
[22] K. J. Åström and R. M. Murray, Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press, 2010.
