
Lyapunov-Regularized Reinforcement Learning for Power System Transient Stability

Wenqi Cui and Baosen Zhang
Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA {wenqicui, zhangbao}@uw.edu
The authors are supported in part by the National Science Foundation grant ECCS-1930605 and the Washington Clean Energy Institute.

arXiv:2103.03869v1 [eess.SY] 5 Mar 2021

Abstract— Transient stability of power systems is becoming increasingly important because of the growing integration of renewable resources. These resources lead to a reduction in mechanical inertia, but also provide increased flexibility in frequency responses. Namely, their power electronic interfaces can implement almost arbitrary control laws. To design these controllers, reinforcement learning (RL) has emerged as a powerful method for searching for optimal non-linear control policies parameterized by neural networks. A key challenge is to enforce that a learned controller must be stabilizing. This paper proposes a Lyapunov-regularized RL approach for optimal frequency control for transient stability in lossy networks. Because of the lack of an analytical Lyapunov function, we learn a Lyapunov function parameterized by a neural network. The losses are specially designed with respect to the physical power system. The learned neural Lyapunov function is then utilized as a regularization to train the neural network controller by penalizing actions that violate the Lyapunov conditions. Case studies show that introducing the Lyapunov regularization enables the controller to be stabilizing and achieve smaller losses.

I. INTRODUCTION

Transient stability in power systems refers to the ability of a system to converge to an acceptable steady state after a disturbance [1], [2]. With the increased penetration of renewable energy sources (RES), power systems have reduced inertia and transient stability is becoming increasingly important [3]. Meanwhile, RES are connected to the grid via electronic interfaces and can be controlled freely by inverters to implement almost arbitrary control laws. Instead of the linear droop frequency response found in conventional generators, the response of inverter-based RES can be optimized to improve performance by implementing more flexible control laws [4].

Transient stability models describe how frequency changes in a system under large deviations of operating states, and use the full nonlinear AC power flow equations [1]. Two challenges emerge in controller design. Firstly, the problem is over a functional space, which is infinite-dimensional. Secondly, the controllers should be stabilizing, which is a nontrivial constraint to enforce algorithmically for nonlinear systems.

A popular way to address the first challenge is to parameterize the controllers (e.g., using a neural network) and train them using reinforcement learning (RL) [5]. Abundant algorithms, including Q-learning, deep deterministic policy gradient (DDPG), and actor-critic methods, have been proposed for optimal control (see, e.g., [6] and the references within). References [7]–[10] apply these algorithms to power system frequency regulation. However, the stabilizing requirement of the controllers is not considered in these works.

The challenge of ensuring controllers are stable is more difficult to address. If a Lyapunov function is available, it can potentially provide analytical constraints on the controller. For lossless power systems, using a well-known energy function [2], [11], our previous work in [12] showed how to impose structural constraints on the neural network controllers such that they are guaranteed to be stabilizing. Unfortunately, for lossy networks, there are no known analytic energy functions [1]. Most transmission lines have nonzero resistances, and distribution systems can have high r/x ratios.

If analytical Lyapunov functions are not available, a natural approach would be to learn a Lyapunov function to facilitate controller design. For example, given input/output data and the assumption that the underlying system is stable, reference [13] learns a Lyapunov function jointly with learning the system model to find stable system dynamics. The work in [14] uses satisfiability modulo theories solvers to formally verify that a function satisfies the Lyapunov conditions, and has been applied to microgrids [15]. However, it is currently computationally tractable only for small systems. Moreover, the above works focus on verifying that a system is stable and do not include controller design.

This paper proposes a Lyapunov regularization approach to guide the training of neural network controllers for primary frequency response in lossy power systems. We learn a Lyapunov function parameterized by a neural network. The loss function for training the neural Lyapunov function is designed to satisfy the positive definiteness of its value and the negative definiteness of its Lie derivative. Existing methods in [13]–[15] weigh all the states equally in the loss function, but this causes the Lyapunov function to be suboptimal near the equilibrium, since the magnitude of the states' time derivatives shrinks quickly when approaching the equilibrium. Considering that the states near the equilibrium are more important for control, we specially design the loss function such that the area around the equilibrium is emphasized.

The neural Lyapunov function is utilized as a regularization to train the neural network controller by penalizing actions that violate the Lyapunov conditions. The regularized RL is integrated in the RNN-based framework from our previous work [12] to increase its training efficiency. Simulation results show that the learned function satisfies the Lyapunov conditions for almost all points in the state space, thus making it a good tool for regularization. Case studies show that introducing the Lyapunov regularization enables the controller to achieve smaller loss. More importantly, a controller designed without regularization can lead to unstable behaviors. All of the code and data described in this paper are publicly available at https://github.com/Wenqi-Cui/Lyapunov-Regularized-RL. One important direction of future work is to verify whether the learned function satisfies the Lyapunov conditions for all points in a region.

II. MODEL AND PROBLEM FORMULATION

A. Frequency Dynamics

Let N be the number of buses and E be the set of transmission lines connecting the buses. The susceptance and conductance of line (i, j) ∈ E are Bij = Bji > 0 and Gij = Gji > 0, respectively. When (k, l) ∉ E, the values are 0. We use the Kron-reduced model to aggregate load buses into generator buses [16], [17]. We assume that each bus i has conventional inertia Mi, and the damping from synchronous generators and loads is denoted as Di. We assume that each bus has inverter-connected resources that can be controlled. Note that this is without loss of generality, since the upper and lower actuation bounds can be set to zero if a resource is not present.

The angle and frequency of bus i are δi and ωi, respectively. We assume that the bus voltage magnitudes are 1 p.u. and the reactive power flows are ignored. The dynamics of the power system are represented by the swing equation [18]:

δ̇i = ωi,  ∀i = 1, · · · , N   (1a)
Mi ω̇i = Pm,i − Di ωi − ui(ωi) − Σ_{j=1, j≠i}^{N} Bij sin(δi − δj) − Σ_{j=1, j≠i}^{N} Gij cos(δi − δj),  ∀i = 1, · · · , N   (1b)

where ui(ωi) is the controller that changes active power to provide primary frequency response. Because power systems do not have real-time communication infrastructure, we restrict ui to be a static feedback controller where only its local frequency measurement ωi is available. We envision the control being provided by renewable energy resources such as batteries and solar PV. In the primary frequency regulation timescale, from 100 ms to a few seconds, the main limitation on actuation comes from power injection constraints.
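To make the dynamics concrete, below is a minimal sketch of one forward-Euler step of (1), assuming PyTorch; the function and argument names (swing_step, M, D, Pm, B, G) are illustrative and not taken from the paper's released code. The 0.02 s step size matches the simulation setting in Section V.

```python
import torch

def swing_step(delta, omega, u, M, D, Pm, B, G, dt=0.02):
    """One forward-Euler step of the swing equation (1) for an N-bus system.

    delta, omega: (N,) tensors of angles [rad] and frequency deviations.
    M, D, Pm:     (N,) tensors of inertia, damping, and mechanical power.
    B, G:         (N, N) susceptance/conductance matrices with zero
                  diagonals, so the j = i terms vanish in the sums below.
    u:            callable mapping omega (N,) to the control actions (N,).
    """
    diff = delta.unsqueeze(1) - delta.unsqueeze(0)        # δi − δj, (N, N)
    flows = (B * torch.sin(diff) + G * torch.cos(diff)).sum(dim=1)
    omega_dot = (Pm - D * omega - u(omega) - flows) / M   # eq. (1b)
    return delta + dt * omega, omega + dt * omega_dot     # eq. (1a)
```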
B. Optimization Problem Formulation

The objective is to minimize the cost of frequency deviations and the control effort. In this paper, we use the frequency nadir, which is the infinity norm of ωi(t) over the time horizon from 0 to T, defined as ||ωi||∞ = sup_{0≤t≤T} |ωi(t)| [19]. We use a quadratic cost for the control actions, defined by ||ui||₂² = (1/T) ∫_0^T (ui(t))² dt [17], [20]. We aim to find an optimal stabilizing controller u(·) by solving (2):

min_u Σ_{i=1}^{N} ( ||ωi||∞ + γ||ui||₂² )   (2a)
s.t.  (1a)–(1b)   (2b)
      u̲i ≤ ui(ωi) ≤ ūi   (2c)
      ui(ωi) is stabilizing   (2d)

where γ is a tradeoff parameter between the cost of frequency deviation and the cost of action. The swing equations are in (2b). The controllers are power-limited within the upper bound ūi and lower bound u̲i in (2c). We impose the condition that the controller should be stabilizing in (2d). Constraints (2b)-(2d) hold for time t from 0 to T. Other costs (e.g., total frequency deviation and the rate of change of frequency) can also be accommodated.

Problem (2) is challenging to solve by conventional control techniques, and we will use RL to find u(·). The key difficulty is to quantify the stability requirement in (2d). We mitigate this difficulty by using a Lyapunov function, which provides algebraic conditions for (2d). Since a Lyapunov function is not known for lossy systems [1], we show how one can be learned in the next section.

III. LEARNING A LYAPUNOV FUNCTION

A. Lyapunov Conditions

From standard system theory, the Lyapunov function needs to satisfy conditions on its value and its Lie derivative [21]. Let the state space be D = {(δ, ω) | δ = (δ1, · · · , δN), ω = (ω1, · · · , ωN)}. The state transition dynamics (1) are written as (δ̇, ω̇) = fu(δ, ω), where fu stands for the state transition function with respect to the controller u. Using the notation from [14], we have

Definition 1 (Lie Derivative). The Lie derivative of the continuously differentiable scalar function V : D → R over the vector field fu is defined as

∇fu V(δ, ω) = Σ_{i=1}^{N} [ (∂V(δ, ω)/∂δi) δ̇i + (∂V(δ, ω)/∂ωi) ω̇i ]   (3)

It measures the rate of change of V along the direction of the system dynamics. The next proposition is standard in nonlinear systems.

Proposition 1 (Lyapunov function and asymptotic stability). Consider a controlled system described by (1) with equilibrium at (δ*, ω*). Suppose there exists a continuously differentiable function V : D → R that satisfies the following conditions:

V(δ, ω) > V(δ*, ω*)  ∀(δ, ω) ∈ D \ {(δ*, ω*)}   (4a)
∇fu V(δ, ω) < 0  ∀(δ, ω) ∈ D \ {(δ*, ω*)}   (4b)
∇fu V(δ*, ω*) = 0   (4c)

Then the system is asymptotically stable at the equilibrium.
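Because the partial derivatives in (3) are exactly what automatic differentiation provides, the Lie derivative of a (neural) Lyapunov function can be evaluated without deriving it by hand. The sketch below is one possible implementation, again assuming PyTorch; lie_derivative and its arguments are illustrative names.

```python
import torch

def lie_derivative(V, delta, omega, delta_dot, omega_dot):
    """Evaluate (3): ∇fu V = Σi ∂V/∂δi · δ̇i + ∂V/∂ωi · ω̇i, per sample.

    V:                    callable (H, N), (H, N) -> (H,) scalar values.
    delta, omega:         (H, N) batch of states.
    delta_dot, omega_dot: (H, N) time derivatives from the dynamics (1);
                          the controller enters only through omega_dot.
    """
    # States are treated as constants here; gradients with respect to the
    # controller flow through omega_dot, matching the use in Sec. IV.
    delta = delta.detach().requires_grad_(True)
    omega = omega.detach().requires_grad_(True)
    v = V(delta, omega).sum()   # summing keeps the per-sample gradients
    dV_ddelta, dV_domega = torch.autograd.grad(
        v, (delta, omega), create_graph=True)
    return (dV_ddelta * delta_dot).sum(dim=1) + (dV_domega * omega_dot).sum(dim=1)
```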
In this paper, the Lyapunov function is parameterized using a neural network with weights φ, and written as Vφ(δ, ω). For differentiability, we use ELU activation functions. Note that Vφ(δ, ω) is purely a function of the state variable (δ, ω), while ∇fu Vφ(δ, ω) will be affected by the controller ui through the term ω̇i in (3). Therefore, only ∇fu Vφ(δ, ω) will be utilized to regularize the controller once it is learned.

B. Learning the Lyapunov Function

The condition (4a) is easy to satisfy if we explicitly engineer the structure of Vφ(δ, ω). To name a few options, Vφ(δ, ω) can be formulated using a convex function achieving its minimum at the equilibrium. Or, given an arbitrary function g(δ, ω) and a positive scalar ε, (4a) can be enforced by taking Vφ(δ, ω) = (g(δ, ω) − g(δ*, ω*))² + ε||(δ, ω) − (δ*, ω*)||₂. However, such parameterizations may be too restrictive and make it hard to satisfy (4b). Therefore, we do not explicitly engineer the structure to satisfy condition (4a).

In this paper we use loss functions to penalize violations of (4a)-(4c). Training is implemented in a batch updating style, where the number of batches is H and the state of the h-th batch is randomly generated as (δh, ωh) ∈ D for h = 1, · · · , H. The losses are designed with respect to the following considerations (a code sketch assembling them follows at the end of this subsection):

1) Avoid overfitting when δ̇ and ω̇ are large: A loss that weighs all points in the space equally leads ∇fu Vφ(δ, ω) to have very negative values when δ and ω are far away from the equilibrium, and may violate (4b) for points close to the equilibrium. This contradicts the premise that the small region around the equilibrium should be stabilizing. Therefore, we design the loss term with ∇fu Vφ(δ, ω) to be

l1(φ) = (1/H) Σ_{h=1}^{H} tanh(∇fu Vφ(δh, ωh)) · exp(−||(δh, ωh) − (δ*, ω*)||₂ / µ)   (5)

where the term tanh(∇fu Vφ(δh, ωh)) avoids overfitting ∇fu Vφ(δh, ωh) to extremely negative values, and the term exp(−||(δh, ωh) − (δ*, ω*)||₂ / µ) emphasizes the importance of (δh, ωh) closer to the equilibrium. The hyper-parameter µ controls the rate of decay.

2) Penalty term with Vφ(δ*, ω*) − Vφ(δ, ω): In order to satisfy condition (4a), Vφ(δh, ωh) that is smaller than Vφ(δ*, ω*) needs to be penalized. Let σ(·) be the ReLU function. Define the loss term as:

l2(φ) = (1/H) Σ_{h=1}^{H} σ(−Vφ(δh, ωh) + Vφ(δ*, ω*))   (6)

3) Penalty term with ∇fu Vφ(δ*, ω*): This term is employed to mitigate numerical errors. We design an extra loss term to penalize the value of ∇fu Vφ(δ*, ω*):

l3(φ) = (∇fu Vφ(δ*, ω*))² + σ(∇fu Vφ(δ*, ω*))   (7)

where (∇fu Vφ(δ*, ω*))² guarantees the small magnitude of ∇fu Vφ(δ*, ω*). Considering that ∇fu Vφ(δh, ωh) should never be positive, we add σ(∇fu Vφ(δ*, ω*)) to guarantee that ∇fu Vφ(δ*, ω*) is negative and close to zero. This way, the zero action at the equilibrium is guaranteed to satisfy the Lyapunov conditions.

Combining (5)-(7), the total loss function is

Lq(φ) = q1 l1(φ) + q2 l2(φ) + q3 l3(φ)   (8)

where q1, q2, q3 are hyperparameters balancing the loss terms, with q3 tuned to be much larger than the others. Note that the equilibrium (δ*, ω*) is obtained from the steady state of (1), and we fix the equilibrium in training. Of course, the equilibrium changes if the load or the parameters change. More specifically, ω* = 0 always, while δ* varies. Since we only use the learned function as a regularization to train a controller, we are robust to changes in the equilibrium point. If the learned function is used to certify stability, then the changes in equilibrium should be carefully accounted for.
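The sketch below assembles (5)-(8) for a batch of sampled states, reusing lie_derivative from the earlier sketch. It is illustrative rather than the released implementation; the default weights follow the case study values in Section V.

```python
import torch
import torch.nn.functional as F

def lyapunov_loss(V, delta, omega, lie, delta_eq, omega_eq, lie_eq,
                  mu=50.0, q=(10.0, 5.0, 100.0)):
    """delta, omega: (H, N) sampled states; lie: (H,) their Lie derivatives.
    delta_eq, omega_eq: (N,) equilibrium; lie_eq: Lie derivative there."""
    dist = torch.linalg.norm(
        torch.cat([delta - delta_eq, omega - omega_eq], dim=1), dim=1)
    # (5): tanh caps how negative the Lie derivative can drive the loss,
    # and exp(-dist/mu) emphasizes samples near the equilibrium.
    l1 = (torch.tanh(lie) * torch.exp(-dist / mu)).mean()
    # (6): penalize Vφ falling below its equilibrium value (σ = ReLU).
    v_eq = V(delta_eq.unsqueeze(0), omega_eq.unsqueeze(0)).squeeze()
    l2 = F.relu(v_eq - V(delta, omega)).mean()
    # (7): pin the Lie derivative at the equilibrium to zero from below.
    l3 = lie_eq ** 2 + F.relu(lie_eq)
    q1, q2, q3 = q
    return q1 * l1 + q2 * l2 + q3 * l3                    # eq. (8)
```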
C. Algorithm with Active Sampling

The goal in training the neural Lyapunov function is to make a larger proportion of the batch samples satisfy the conditions (4). The pseudo-code for our proposed method is given in Algorithm 1. A linear controller is used to initialize training. Let ρ be the proportion of samples that satisfy the conditions (4). After most of the samples (e.g., ρ > 95%) have already satisfied the conditions, it becomes difficult to improve the neural Lyapunov function further, since the loss function remains almost unchanged even though ρ increases slightly. We augment the training performance by collecting samples that violate (4) and adding them to the next batch of training (a sketch of this step follows Algorithm 1). This way, the neural Lyapunov function improves efficiently and ρ can reach 99.9% in the end. The Adam algorithm is adopted to update the weights φ in each episode.

Algorithm 1 Learning the neural Lyapunov function
Require: Learning rate α, number of episodes I, state transition function (1), hyperparameters in (5)-(8)
Input: Droop coefficient li for the i-th bus, i = 1, · · · , N
Initialisation: Initial weights φ for the neural network
1: for episode = 1 to I do
2:   Generate batch state samples (δh, ωh) for the h-th batch, h = 1, · · · , H
3:   If ρ > ρ̄, add the samples (δ̂, ω̂) that violate the Lyapunov conditions: {(δ, ω)} ← {(δ, ω), (δ̂, ω̂)}
4:   Compute fu(δ, ω) for the sample states with linear droop control using (1)
5:   Calculate Vφ(δ, ω) and ∇fu Vφ(δ, ω)
6:   Identify the states (δ̂, ω̂) that do not satisfy the Lyapunov conditions and their proportion ρ
7:   Calculate the total loss over all batches using (5)-(8)
8:   Update the weights of the neural network by passing the loss to the Adam optimizer: φ ← φ − α·Adam(Loss)
9: end for
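A minimal sketch of the active-sampling step (line 3 of Algorithm 1) is given below; the names and thresholds are illustrative, with the sampling ranges taken from the simulation setting in Section V.

```python
import torch

def resample_with_violations(delta, omega, v, v_eq, lie, rho_bar=0.95,
                             delta_range=20.0, omega_range=30.0):
    """Build the next batch, recycling states that violate (4a) or (4b).

    delta, omega: (H, N) current samples; v, lie: (H,) values of Vφ and
    ∇fu Vφ at those samples; v_eq: Vφ at the equilibrium."""
    violated = (v <= v_eq) | (lie >= 0)           # Lyapunov conditions (4)
    rho = 1.0 - violated.float().mean().item()    # fraction satisfying (4)
    H, N = delta.shape
    new_delta = (torch.rand(H, N) * 2 - 1) * delta_range
    new_omega = (torch.rand(H, N) * 2 - 1) * omega_range
    if rho > rho_bar:                             # keep the hard states
        new_delta = torch.cat([new_delta, delta[violated]])
        new_omega = torch.cat([new_omega, omega[violated]])
    return new_delta, new_omega, rho
```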
IV. LEARNING A NEURAL NETWORK CONTROLLER WITH LYAPUNOV REGULARIZATION

We propose to use the learned neural Lyapunov function to guide the training of the neural network controller. We adopt the neural Lyapunov function as an additional regularization that is used during the training process of the neural network controller. The real-time control policy is computed from the input of the local frequency deviation and the weights that are trained offline. Note that we may be able to achieve better performance through a projection if the Lyapunov conditions are violated. However, such a projection requires information about all the state variables in real time, which is unrealistic for power systems with large numbers of nodes and limited communication.

A. Lyapunov Regularization

Given a Lyapunov function, Proposition 2 gives a condition for local exponential stability [22].

Proposition 2 (Locally exponentially stable condition). For a function V : D → R satisfying (4), if there is a constant β > 0 such that for all (δ, ω) ∈ D we have

∇fu V(δ, ω) ≤ −β(V(δ, ω) − V(δ*, ω*))   (9)

then the equilibrium is locally exponentially stable.

In order to satisfy (9) with the neural network controller, we propose a Lyapunov regularization approach in which the action is penalized if this inequality does not hold. Compared with traditional regularization (e.g., lasso, ridge) or a penalty term on large state magnitudes, we do not add regularization uniformly to all the weights or actions. Instead, the action is only penalized when (9) is violated. The regularization term is

Rφ(uθ) = σ(∇fu Vφ(δ, ω) + β(Vφ(δ, ω) − Vφ(δ*, ω*)))
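In code, this regularization is a one-line hinge that is active only when the exponential-stability condition (9) is violated; the sketch below is illustrative, with β = 0.005 as in the case study.

```python
import torch.nn.functional as F

def lyapunov_regularizer(v, v_eq, lie, beta=0.005):
    """Rφ(uθ): v, lie are Vφ and ∇fu Vφ along a trajectory; v_eq = Vφ(δ*, ω*)."""
    return F.relu(lie + beta * (v - v_eq))   # zero whenever (9) holds
```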


B. Controller and Architecture

The formulation of the controller and the training architecture are from our previous work [12]. For completeness, we reiterate the key design in this subsection. The work in [12] showed that a controller mapping frequency to active power needs to be a function that is monotonically increasing and goes through the origin. To this end, we explicitly engineer the neural network controller with a stacked-ReLU structure, represented as (10):

ui(ωi) = qi σ(1ωi + bi) + zi σ(−1ωi + ci)   (10a)
Σ_{j=1}^{l} qij ≥ 0,  Σ_{j=1}^{l} zij ≤ 0,  ∀l = 1, 2, · · · , m   (10b)
bi^1 = 0,  bi^l ≤ bi^{l−1},  ∀l = 2, 3, · · · , m   (10c)
ci^1 = 0,  ci^l ≤ ci^{l−1},  ∀l = 2, 3, · · · , m   (10d)

where m is the number of hidden units and 1 ∈ R^m is the all-ones column vector. The variables qi = [qi1 qi2 · · · qim] and zi = [zi1 zi2 · · · zim] are the weight vectors of bus i; bi = [bi^1 bi^2 · · · bi^m]ᵀ and ci = [ci^1 ci^2 · · · ci^m]ᵀ are the corresponding bias vectors. The variables to be trained are the weights θ = {q, b, z, c} in (10).
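One way to satisfy the constraints (10b)-(10d) by construction is to parameterize the nonnegative prefix sums of q (nonpositive for z) and the nonincreasing biases directly, as in the illustrative PyTorch sketch below; the paper's released code may enforce (10) differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedReLUController(nn.Module):
    """Monotonically increasing stacked-ReLU controller (10) with u(0) = 0."""

    def __init__(self, m=20):
        super().__init__()
        self.sq = nn.Parameter(torch.rand(m))      # prefix sums of q (made > 0)
        self.sz = nn.Parameter(torch.rand(m))      # prefix sums of -z (made > 0)
        self.db = nn.Parameter(torch.rand(m - 1))  # bias decrements (made > 0)
        self.dc = nn.Parameter(torch.rand(m - 1))

    def forward(self, omega):
        omega = omega.unsqueeze(-1)                # broadcast over the m units
        sq, sz = F.softplus(self.sq), F.softplus(self.sz)
        q = torch.diff(sq, prepend=sq.new_zeros(1))   # Σ_{j≤l} qj = sq_l > 0
        z = -torch.diff(sz, prepend=sz.new_zeros(1))  # Σ_{j≤l} zj = -sz_l < 0
        zero = self.db.new_zeros(1)
        b = torch.cat([zero, -torch.cumsum(F.softplus(self.db), 0)])  # b1 = 0, decreasing
        c = torch.cat([zero, -torch.cumsum(F.softplus(self.dc), 0)])  # c1 = 0, decreasing
        return (q * torch.relu(omega + b) + z * torch.relu(-omega + c)).sum(-1)
```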
The pseudo-code for learning the neural network controller
(l−1)
b1i = 0, bli ≤ bi , ∀l = 2, 3, · · · , m (10c) is given in Algorithm 2. Training is implemented in a batch
(l−1) updating style where the h-th batch initialized with randomly
c1i = 0, cli ≤ ci , ∀l = 2, 3, · · · , m (10d)
generated initial states {δih (0), ωih (0)} for all i = 1, · · · , N .
where m is the number of hidden units and 1 ∈ Rm is the The evolution of states in K stages will be computed through
all 1’s column vector. Variables qi = [qi1 qi2 · · · qim ] structure of RNN as shown by Fig.1. Although algorithms 1
and zi = [zi1 zi2 · · · zim ] are the weight vector of bus i; and 2 can be iterated to make further update, we did not see
bi = [b1i b2i · · · bm |
i ] and ci = [ci
1
c2i · · · cm |
i ] are an obvious improvement in simulation.
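The sketch below unrolls the discretized dynamics for K stages and assembles the loss (11), reusing swing_step, lie_derivative, StackedReLUController, and lyapunov_regularizer from the earlier sketches; the wiring is illustrative rather than the paper's released RNN implementation.

```python
import torch

def rollout_loss(delta, omega, controllers, V, v_eq, sys,
                 K=100, dt=0.02, gamma=0.005, beta=0.005, lam=0.01):
    """Loss (11) for one trajectory. delta, omega: (N,) initial states;
    controllers: list of N per-bus modules; sys: dict with M, D, Pm, B, G."""
    omegas, acts, regs = [], [], []
    for _ in range(K):
        act = torch.stack([c(omega[i]) for i, c in enumerate(controllers)])
        new_delta, new_omega = swing_step(delta, omega, lambda w: act,
                                          dt=dt, **sys)
        # Under forward Euler, (new_omega - omega)/dt is exactly ω̇ at step k,
        # and δ̇ = ω, so the Lie derivative (3) can be evaluated in place.
        lie = lie_derivative(V, delta.unsqueeze(0), omega.unsqueeze(0),
                             omega.unsqueeze(0),
                             ((new_omega - omega) / dt).unsqueeze(0))
        v = V(delta.unsqueeze(0), omega.unsqueeze(0)).squeeze()
        regs.append(lyapunov_regularizer(v, v_eq, lie.squeeze(), beta))
        omegas.append(omega)
        acts.append(act ** 2)
        delta, omega = new_delta, new_omega

    omegas, acts = torch.stack(omegas), torch.stack(acts)  # (K, N)
    nadir = omegas.abs().amax(dim=0)     # per-bus ||ωi||∞ over the horizon
    # Σi Yi³ = Rφ since Yi³ = Rφ/N, so the last term is λ·(1/K)·Σk Rφ(k).
    return (nadir + gamma * acts.mean(dim=0)).sum() + lam * torch.stack(regs).mean()
```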
Algorithm 2 Reinforcement learning with the RNN
Require: Learning rate α, batch size H, total time stages K, number of episodes I, parameters in the optimal frequency control problem (2)
Input: The neural Lyapunov function Vφ(δ, ω)
Initialisation: Initial weights θ for the control network
1: for episode = 1 to I do
2:   Generate initial states δih(0), ωih(0) for the i-th bus in the h-th batch, i = 1, · · · , N, h = 1, · · · , H
3:   Reset the state of the cells in each batch to the initial value xih ← {δih(0), ωih(0)}
4:   The RNN cells compute through K stages to obtain the outputs {Yh,i(0), Yh,i(1), · · · , Yh,i(K)}
5:   Calculate the total loss over all batches: Loss = (1/H) Σ_{h=1}^{H} Σ_{i=1}^{N} [ max_{k=0,···,K} |Y_{h,i}^1(k)| + γ (1/K) Σ_{k=1}^{K} Y_{h,i}^2(k) + λ (1/K) Σ_{k=1}^{K} Y_{h,i}^3(k) ]
6:   Update the weights of the neural network by passing the loss to the Adam optimizer: θ ← θ − α·Adam(Loss)
7: end for

V. CASE STUDY

Case studies are conducted on the IEEE New England 10-machine 39-bus (NE39) power network to illustrate the effectiveness of the proposed method. We visualize the learned Lyapunov function and its Lie derivative. Then we show that regularization is necessary, in the sense that a controller learned without it can be unstable. Lastly, we show the training losses.

A. Simulation Setting

The step size for the discrete simulation is set to 0.02 s (20 ms) and the number of time stages K is 100. The mechanical power Pm,i is set at the nominal values, the bound on the action ui is uniformly distributed in [0.8Pm,i, Pm,i], and γ is set to 0.005. The parameters for training the neural networks are:

• The neural Lyapunov function is parameterized as a dense neural network with one hidden layer of 50 neurons and ELU activation. The number of episodes is 4000. The hyper-parameters in (5)-(11) are µ = 50, q1 = 10, q2 = 5, q3 = 100, β = 0.005, λ = 0.01. Each episode has a batch size of 1000 with random state samples, where δih is uniformly distributed in [−20, 20] rad and ωih is uniformly distributed in [−30, 30] Hz. Note that these values are far larger than the nominal variation of δi and ωi. This configuration worked the best among the values we tried. Trainable weights are updated using Adam with a learning rate initialized at 0.05 that decays every 100 steps with a base of 0.9.
• The neural network controller is parameterized as the stacked-ReLU function (10) with 20 neurons (m = 20). The number of episodes is 400. The initial states of angle and frequency are randomly generated such that δi(0) is uniformly distributed in [−1, 1] rad and ωi(0) is uniformly distributed in [−0.5, 0.5] Hz. Trainable weights are updated using Adam with a learning rate initialized at 0.04 that decays every 30 steps with a base of 0.7.

B. Visualization of the Lyapunov Function and the Lie Derivative

To visualize the Lyapunov function with a large number of state variables, we fix all the states at their equilibrium values and vary the state variables of one generator bus. Fig. 2 illustrates the value of the Lyapunov function and the Lie derivative with the variation of δ and ω at generator bus 5. The Lyapunov function Vφ(δ, ω) achieves its minimum at the equilibrium point and thus satisfies condition (4a). The Lie derivative ∇fu Vφ(δ, ω) is smaller than zero in most of the region and thus also generally satisfies condition (4b). After convergence, only 0.1% of samples, with ω sufficiently close to zero, make ∇fu Vφ(δ, ω) slightly positive. Such a small positive value only leads to a small Lyapunov regularization term and therefore has negligible impact on the training of the neural network controller.

Fig. 2. Neural Lyapunov function (left) and Lie derivative (right) when changing (δ, ω) at generator 5 and keeping the state variables of the other generators at their equilibrium values.

Fig. 3. Dynamics of the frequency deviation ω and control action u at selected generator buses for (a) RNN-Lyapunov and (b) RNN-w.o.-Lyapunov: (a) dynamics of u (left) and ω (right) for the RNN with Lyapunov regularization; (b) dynamics of u (left) and ω (right) for the RNN without Lyapunov regularization. The neural network controller trained with Lyapunov regularization achieves better stabilizing performance.

C. Performance Comparison

Under the same hyperparameters and RNN structure, we train the neural network controller with Lyapunov regularization (labeled RNN-Lyapunov) and without Lyapunov regularization (labeled RNN-w.o.-Lyapunov), respectively. The dynamics of the system with the different controllers are illustrated in Fig. 3. For RNN-Lyapunov in Fig. 3(a), the largest frequency deviation varies in [−0.015, 0.02] Hz. By
contrast, RNN-w.o.-Lyapunov in Fig. 3(b) has a larger frequency deviation that varies in [−0.03, 0.02] Hz and cannot stabilize well.

We further compare RNN-Lyapunov and RNN-w.o.-Lyapunov with the benchmark of linear droop control, where the droop coefficient is obtained by solving problem (2) using the fmincon function of Matlab [12]. Fig. 4 illustrates the control policies obtained from the three methods. Compared with linear droop control, the stacked-ReLU neural network learns a highly non-linear controller. The average cost along the episodes, normalized by the cost of linear droop control, is shown in Fig. 5. Both RNN-Lyapunov and RNN-w.o.-Lyapunov converge in approximately 150 episodes. After convergence, RNN-Lyapunov reduces the cost by approximately 19% compared with linear droop control. The reduction is 5% more than that of RNN-w.o.-Lyapunov. Therefore, the proposed method learns a non-linear stabilizing controller that performs better than traditional linear droop control.

Fig. 4. Control action u of RNN-Lyapunov, RNN-w.o.-Lyapunov, and linear droop control for generator bus 10. Lyapunov regularization leads to a different non-linear control law.

Fig. 5. Normalized cost along the episodes during the training of the neural network controller with and without Lyapunov regularization. RNN-Lyapunov and RNN-w.o.-Lyapunov reduce the cost by approximately 19% and 14%, respectively, compared with linear droop control.

VI. CONCLUSION

This paper proposes a Lyapunov regularization approach to guide the training of neural network controllers for primary frequency response for transient stability. A function parameterized as a neural network is learned to overcome the difficulty brought by the non-existence of analytical Lyapunov functions for lossy power networks. By integrating the neural Lyapunov function as a regularization term in the RL training of the neural network controller, control actions that violate the Lyapunov conditions are penalized. Case studies verify that introducing the Lyapunov regularization enables the controller to be stabilizing and achieve smaller losses, whereas controllers trained without regularization can fail to stabilize the system.

REFERENCES

[1] H.-D. Chiang, "Study of the existence of energy functions for power systems with losses," IEEE Transactions on Circuits and Systems, vol. 36, no. 11, pp. 1423–1429, 1989.
[2] A. Arapostathis, S. Sastry, and P. Varaiya, "Global analysis of swing dynamics," IEEE Transactions on Circuits and Systems, vol. 29, no. 10, pp. 673–679, 1982.
[3] Y. Jiang, R. Pates, and E. Mallada, "Dynamic droop control in low-inertia power systems," IEEE Transactions on Automatic Control, 2020.
[4] B. B. Johnson, S. V. Dhople, A. O. Hamadeh, and P. T. Krein, "Synchronization of parallel single-phase inverters with virtual oscillator control," IEEE Transactions on Power Electronics, vol. 29, no. 11, pp. 6124–6138, 2013.
[5] X. Chen, G. Qu, Y. Tang, S. Low, and N. Li, "Reinforcement learning for decision-making and control in power systems: Tutorial, review, and vision," arXiv preprint arXiv:2102.01168, 2021.
[6] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[7] Z. Yan and Y. Xu, "Data-driven load frequency control for stochastic power systems: A deep reinforcement learning method with continuous action search," IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 1653–1656, 2018.
[8] C. Chen, M. Cui, F. F. Li, S. Yin, and X. Wang, "Model-free emergency frequency control based on reinforcement learning," IEEE Transactions on Industrial Informatics, 2020.
[9] J. Duan, D. Shi, R. Diao, H. Li, Z. Wang, B. Zhang, D. Bian, and Z. Yi, "Deep-reinforcement-learning-based autonomous voltage control for power grid operations," IEEE Transactions on Power Systems, vol. 35, no. 1, pp. 814–817, 2019.
[10] D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel, "Reinforcement learning versus model predictive control: a comparison on a power system problem," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 517–529, 2008.
[11] I. Dobson and H.-D. Chiang, "Towards a theory of voltage collapse in electric power systems," Systems & Control Letters, vol. 13, no. 3, pp. 253–262, 1989.
[12] W. Cui and B. Zhang, "Reinforcement learning for optimal frequency control: A Lyapunov approach," arXiv preprint arXiv:2009.05654, 2020.
[13] G. Manek and J. Z. Kolter, "Learning stable deep dynamics models," arXiv preprint arXiv:2001.06116, 2020.
[14] Y.-C. Chang, N. Roohi, and S. Gao, "Neural Lyapunov control," Advances in Neural Information Processing Systems, 2019.
[15] T. Huang, S. Gao, and L. Xie, "Transient stability assessment of networked microgrids using neural Lyapunov methods," arXiv preprint arXiv:2012.01333, 2020.
[16] T. Nishikawa and A. E. Motter, "Comparative analysis of existing models for power-grid synchronization," New Journal of Physics, vol. 17, no. 1, p. 015012, 2015.
[17] A. Ademola-Idowu and B. Zhang, "Frequency stability using inverter power control in low-inertia power systems," IEEE Transactions on Power Systems, pp. 1–1, 2020.
[18] P. Kundur, N. J. Balu, and M. G. Lauby, Power System Stability and Control. McGraw-Hill, New York, 1994, vol. 7.
[19] D. Tabas and B. Zhang, "Optimal l-infinity frequency control in microgrids considering actuator saturation," arXiv preprint arXiv:1910.03720, 2019.
[20] F. Dörfler, M. Chertkov, and F. Bullo, "Synchronization in complex oscillator networks and smart grids," Proceedings of the National Academy of Sciences, vol. 110, no. 6, pp. 2005–2010, 2013.
[21] S. Sastry, Nonlinear Systems: Analysis, Stability, and Control. Springer Science & Business Media, 2013, vol. 10.
[22] K. J. Åström and R. M. Murray, Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press, 2010.
