
Received September 12, 2021, accepted October 6, 2021, date of publication October 13, 2021, date of current version November 2, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3119745

D-STATCOM d-q Axis Current Reference Control Applying DDPG Algorithm in the Distribution System

JONG HA WOO1, LEI WU2 (Senior Member, IEEE), SUNG MIN LEE1, JONG-BAE PARK1 (Member, IEEE), AND JAE HYUNG ROH1 (Member, IEEE)
1 Department of Electrical and Electronics Engineering, Konkuk University, Seoul 05029, South Korea
2 Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
Corresponding author: Jae Hyung Roh ([email protected])
This work was supported in part by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology
Evaluation and Planning (KETEP), and in part by the Ministry of Trade, Industry and Energy (MOTIE), South Korea, under Grant
20204010600220 and Grant 20194030202370.

ABSTRACT The high penetration level of renewable energy in large-scale power systems can adversely affect power quality, for example voltage stability and harmonic pollution. This paper assesses the impacts of the Distribution Static Compensator (D-STATCOM), one of the Flexible AC Transmission System (FACTS) devices, on the power quality of 4.16 kV-level distribution systems via transient and steady-state analysis. Carrier-based Pulse Width Modulation (PWM) control of the D-STATCOM generates d-q axis current references via PID (Proportional-Integral-Differential) controllers to regulate the d-q axis currents and voltages. A new control method, based on reinforcement learning (RL) with the Deep Deterministic Policy Gradient (DDPG) algorithm, is studied to create new d-q axis current references for voltage control, which can improve voltage stability and transient response and yield fast convergence of the current and voltage at the D-STATCOM bus. Real-time simulations on an IEEE 13-bus system show that the proposed approach can control the D-STATCOM better than conventional control methods for enhancing voltage stability and transient performance.

INDEX TERMS D-STATCOM, FACTS device, reinforcement learning.

NOMENCLATURE
Abbreviations:
D-STATCOM  Distribution Static Compensator.
FACTS      Flexible AC Transmission System.
VSC        Voltage Source Converter.
PCC        Point of Common Coupling.
PLL        Phase Locked Loop.
PID        Proportional-Integral-Differential.
PWM        Pulse Width Modulation.
IGBT       Insulated Gate Bipolar Transistor.
NN         Neural Network.
RL         Reinforcement Learning.
DQN        Deep Q-Network.
DDPG       Deep Deterministic Policy Gradient.

Symbols:
Ts     RL sampling time.
σ      Gaussian action space noise.
V      Voltage in real time.
i      Current in real time.
B      A batch sampled from the training dataset (the replay buffer).
π      Policy; returns an action sampled from the actor network plus some noise for exploration.
θ      Critic (target) neural network; the Q-value critic Q(s, a) maps a state-action pair to a scalar value representing the expected total long-term reward.
ϕ      Actor (target) neural network; the deterministic policy actor ϕ(s) maximizes the expected cumulative long-term reward.
τ      Target smooth factor.


γ      Discount factor.
x      Difference between the current reference obtained from the PID controller output and the real value of the current.
ΔIref  Difference between the next and previous current reference.
ΔVref  Difference between the next and previous voltage reference.
R      Reward function.

The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.

I. INTRODUCTION
In recent years, the continuous development of the power industry has contributed to the increased need for high-precision equipment such as power electronic devices. Furthermore, considering the growth of microgrids, distribution system stability is potentially endangered by the rising penetration of distributed resources and the associated technical issues [1]. This trend makes the distribution system profoundly complex and deteriorates power quality [2], [3]. As a result, microgrids face significant system problems in an autonomous mode of operation [4]. Specifically, the associated power quality issues of distribution systems include power factor degradation, harmonic current injection, interruption, instantaneous voltage drop (sag), instantaneous voltage rise (swell), and voltage unbalance. FACTS devices have recently been researched to compensate for these power quality issues [5], [6]. In particular, a D-STATCOM is a compensating device that is used to control reactive power flows in distribution systems and enhance voltage stability [7]. In addition, a D-STATCOM, installed in parallel to the distribution system via a Voltage Source Converter (VSC), can act as an active filter to flexibly respond to external disturbances and load variations [8], [9].

A D-STATCOM is a shunt-connected bidirectional converter-based device that operates as an impedance converter, emulating either an inductive or a capacitive electrical element by adjusting its output voltage level. The D-STATCOM can solve the problems mentioned earlier, such as poor load power factor, poor voltage regulation, and unbalanced loads [10]. The D-STATCOM, consisting of three controllers (i.e., AC voltage regulator, DC voltage regulator, and current regulator), uses voltage source PWM inverters built with Insulated Gate Bipolar Transistors (IGBTs) to compensate the voltage in the discrete time domain with reactive power support [11]. Thus, studies on enhanced control performance, such as improving the transient response, become more vital, especially in the emerging power system with a high penetration of renewable energy sources [12]. The uncertainty and intermittency of renewable energy pose significant challenges in controlling nodal voltages to avoid power system collapse.

Techniques such as gain scheduling, robust control, and model predictive control are widely used to control nonlinear systems with complicated control tasks. However, tuning the parameters of these approaches, such as gains, is difficult and requires significant domain expertise from the control engineer, because it requires prior knowledge of the plant model and its intrinsic dynamic characteristics. On the other hand, despite its simple structure and empirical tuning schemes, the PID controller is not always effective in producing the desired system performance [13]–[15]. To overcome this problem, artificial intelligence methods have been introduced to replace conventional control methods. To improve the power quality compensator, an NN controller adapts its strategy to enhance adaptive stability, transient stability, steady-state stability, and robustness [16]. A Hopfield-type NN is used to compute the D-STATCOM's fundamental current under distorted voltage source conditions [17]. Multiple recurrent neural networks, especially the echo state network and the simplified dual network, are adopted for system identification and dynamic optimization in model predictive control [18]. In addition, a DNN trained by RL can also implement such complex controllers [19].

RL-based PI controller tuning has also been introduced and discussed for automatically tuning the PI gains of a discrete control system [20]. These methods can be self-taught, without domain expertise or knowledge from an expert control engineer. In this paper, instead of using PID controllers, we use the RL agent's actions to set the d-q axis reference currents for controlling the D-STATCOM to improve voltage stability.

In this paper, the DDPG algorithm is chosen among the various kinds of RL to handle this problem because of two advantages. First, the concept of an experience replay buffer applies to DDPG. Q-learning and SARSA do not have such a buffer: data are not stored, which is inefficient, as these algorithms use every sample once and then discard it [21]–[23]. In contrast, the experience replay buffer enables RL to learn quickly from a limited amount of data by reusing the data set stored in the buffer, without endangering convergence. It also significantly benefits continuous-variable control systems in practical real-time applications [24]–[26]. Second, although the DQN model also uses an experience replay buffer, a difference exists in the RL action space. Since the variables of the real-time control system are continuous, DQN is limited by its discrete action space. Thus, DDPG performs a wider range of exploration, which can improve the control system's performance after sufficient training and reduce the number of iterations needed to achieve the same performance as DQN [27].

To sum up, the main contribution is a novel strategy to foster the D-STATCOM's control ability, based on the idea that the reference of the control system should be variable rather than fixed. The proposed DDPG model with a continuous action space does not require access to a plant model or prior knowledge, and it can adapt to various plant conditions.

We proceed with a case study on an IEEE 13-bus system and control the D-STATCOM with a new RL-based control method, specifically the DDPG algorithm, by monitoring the PCC voltage, which forms part of the reinforcement learning state in the proposed approach.


In Section 2, the D-STATCOM topology is illustrated, explaining how the voltage controller works. Reinforcement learning and the DDPG algorithm are described in Section 3. Section 4 demonstrates how the DDPG algorithm applied to the control system is designed. As the variable reference rapidly converges to the desired real reference under the proposed approach, the observed current and voltage also converge faster. In Section 5, a performance analysis for test scenarios is illustrated after training the DDPG process; the scenarios include random voltage changes (±2%, corresponding to ±83.2 V) and load changes within 30%. In this section, we show that the transient response is improved under variable power system conditions. Variable main-feeder voltages and D-STATCOM set-reference voltages are tested under nonlinear load conditions to evaluate the model's performance and strategy. This article is concluded in Section 6.

TABLE 1. Design parameters of D-STATCOM.

II. TOPOLOGY OF D-STATCOM: CARRIER-BASED PWM CONTROL METHOD
The D-STATCOM mainly consists of an inverter connected to the network through a transformer and a capacitor C, which provides the dc-link voltage. In detail, the D-STATCOM includes a detailed representation of power electronic IGBT converters, and it is used to regulate the voltage of the distribution network. The D-STATCOM regulates the adjacent bus voltage by absorbing or generating reactive power. This reactive power transfer is done through the leakage reactance of the coupling transformer, by generating a secondary voltage in phase with the main feeder's primary voltage, which is provided by a voltage-sourced PWM inverter [7]. The D-STATCOM operates as either an inductance or a capacitance depending on the conditions. When the bus voltage is lower than the reference voltage at the D-STATCOM, it acts as a capacitor and injects reactive power into the grid. In contrast, when the bus voltage is higher than the reference voltage, it acts as an inductance and absorbs reactive power from the grid [7], [28].

The D-STATCOM consists of several key components, as shown in Figure 1. The LC damped filters are connected at the inverter output, and a capacitor acts as a DC source for the inverter. As the sensors measure the load-side voltages and currents, the supply-side current is calculated first, and then the compensating current of the D-STATCOM is calculated [7]. By measuring the difference between the load current and the compensating current, the PWM pulses for the IGBT inverter bridge, which are essential for the observed voltage to track the reference voltage, are then produced. These voltage-sourced PWM inverters consist of two IGBT bridges, so their control performance is more comprehensive than that of a single bridge. The twin inverter configuration produces fewer harmonics than a single bridge, resulting in smaller filters and an improved dynamic response [29].

Both the voltage and current components, obtained from the abc-to-dq transformation in the synchronous reference frame determined by the sin ωt and cos ωt signals provided by the Phase Locked Loop (PLL), are then regulated with two separate PID regulators with respect to the reference id and iq currents obtained earlier. As shown in Figure 1, four PID controllers are involved in the two current regulation loops (inner and outer). The inner current regulation loop consists of two PID controllers that control the d-axis and q-axis currents. The controllers' outputs are the Vd and Vq voltages used by the PWM inverter's pulse generator. The Vd and Vq voltages, which are obtained from the integrated outputs of the current PI controllers, are converted into the phase voltages Vabc. The Iq reference comes from the outer voltage regulation loop, and the Id reference comes from the DC-link voltage regulation loop [29].

The other two PI controllers, in the outer regulation loop, decide the Iq reference and the Id reference by calculating the difference between the reference voltage and the actual observed voltage. To be specific, as shown in Figure 1, the Id reference is determined by the PID controller whose input is the DC-link voltage error, and the Iq reference is determined by the PID controller whose input is the error of the AC voltage on the network side. This maintains the voltage regulation of the primary side equal to the reference value defined in the control system. After Vq and Vd are determined, the PWM pulses for the IGBTs of the VSC are generated and transmitted by the PWM pulse generator [30]. The filters, with a series inductance Lf of 800 µH, are connected to the primary-side bridge output, and the filter capacitance Cf of 100 µF in series with a resistance Rf of 10 Ω is connected to the secondary low-voltage side of the coupling transformer (Δ/Y, 4.16/1.2 kV). The detailed technical specifications of the D-STATCOM parameters are given in Table 1.
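Since the control scheme hinges on this abc-to-dq transformation, a small numerical sketch may help. The snippet below is an illustrative amplitude-invariant Park transformation driven by the PLL angle ωt, not the authors' Simulink implementation; the 60 Hz test signal and 1 pu amplitude are assumed example values.

```python
import numpy as np

def abc_to_dq(x_a, x_b, x_c, theta):
    """Amplitude-invariant Park transformation.

    x_a, x_b, x_c : instantaneous phase quantities (voltages or currents)
    theta         : PLL angle (omega * t), aligning the d-axis with phase a
    Returns (x_d, x_q).
    """
    two_thirds = 2.0 / 3.0
    x_d = two_thirds * (x_a * np.cos(theta)
                        + x_b * np.cos(theta - 2 * np.pi / 3)
                        + x_c * np.cos(theta + 2 * np.pi / 3))
    x_q = -two_thirds * (x_a * np.sin(theta)
                         + x_b * np.sin(theta - 2 * np.pi / 3)
                         + x_c * np.sin(theta + 2 * np.pi / 3))
    return x_d, x_q

# Example: a balanced 1 pu, 60 Hz set maps to roughly (d, q) = (1, 0) once the PLL is locked.
t = 0.004                    # arbitrary time instant [s]
w = 2 * np.pi * 60           # grid angular frequency [rad/s]
v_a = np.cos(w * t)
v_b = np.cos(w * t - 2 * np.pi / 3)
v_c = np.cos(w * t + 2 * np.pi / 3)
print(abc_to_dq(v_a, v_b, v_c, w * t))
```

With the PLL locked, the d-axis component carries the voltage magnitude and the q-axis component stays near zero, which is what allows the two separate PID regulators to act on decoupled d- and q-axis errors.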


FIGURE 1. D-STATCOM topology for carrier based control.

III. DDPG ALGORITHM FOR CONTROLLER DESIGN
RL is a value-based learning framework in which the agent learns by estimating the Q-value of the actions it can take in a given circumstance [31]. Once estimates of this value become reasonably accurate, actions (i.e., a policy) are chosen based on this value. In addition, Q-learning has an ε-greedy policy that estimates the value for all actions and then selects the action that corresponds to the largest of those values. However, learning is not an easy task if there are many behaviors in a continuous action space [32].

DDPG is a model-free off-policy algorithm for learning continuous actions. DDPG combines two ideas from the Deterministic Policy Gradient (DPG) and the Deep Q-Network (DQN): it uses experience replay, which enables RL agents to memorize past experience, and frozen target networks, and it can operate over continuous action spaces [33]. In the case of DQN, learning instability is reduced by using the experience replay and the frozen target network. Typical Q-learning obtains its data as the agent actually moves; thus, naturally, there is a significant correlation between the data samples [34], [35]. Therefore, the experience replay method is used to reduce the correlation between input data, significantly reducing the relationship among them. It also enables repetitive learning from past experiences.

A control system based on mathematical and numerical approaches requires interpretation in the z-domain to determine the PWM output. Changes in the operating status of the power system's elements, such as nodal voltages, are represented by a sequence of real or complex numbers as a discrete time-domain signal. Compared with DQN, DDPG is more appropriate for real-time changes in the discrete-time domain, because DQN updates the neural network using the total reward of one episode, while DDPG updates with the reward of each step [36]. Since the current and voltage data in the discrete time domain change in a continuous manner, unlike discrete movements such as top-bottom-left-right, the RL components can operate over continuous action spaces by using DDPG. To this end, DDPG keeps the advantages of DQN and extends them to a continuous action space using the actor-critic framework. The critic is used to estimate the value via the Bellman equation. The actor is used to generate the action according to the distribution of the action space by the chain rule [37], [38].

The DDPG algorithm uses two networks, named the actor and critic networks. The actor network proposes an action for a given state, and the critic network predicts whether the action is good (positive reward) or bad (negative reward) for the given state and action. To determine the parameters of DDPG, we first update the parameters of the critic network θ by backpropagating the critic loss. In every iteration, we update the actor model's parameters ϕ by performing gradient ascent on the output of the critic model, as in Eq. (1):

\theta \leftarrow \min_{\theta} B^{-1} \sum_{i=1}^{N} \left( y - Q_{\theta}(s, a) \right)^{2}
\nabla_{\phi} J(\phi) = B^{-1} \sum \nabla_{a} Q_{\theta}(s, a)\big|_{a = \pi_{\phi}(s)} \, \nabla_{\phi} \pi_{\phi}(s)    (1)

where B is a batch of [s_t, a_t, r_t, s_{t+1}] sampled from the replay buffer. After this, we calculate the parameters ϕ and θ, where the actor network ϕ and the critic network θ update their own weights in the same way as ordinary NN parameter updates. From this, the target networks are updated by Polyak averaging, as shown in Eq. (2), where θ_target is the newly calculated critic target, θ'_target is the previously trained critic target, ϕ_target is the newly calculated actor target, and ϕ'_target is the previously trained actor target. Thus, the target networks' parameter update depends on the target smooth factor, which decides how much the current network can affect the target network.

\theta'_{target} \leftarrow \tau \, \theta_{target} + (1 - \tau) \, \theta'_{target}
\phi'_{target} \leftarrow \tau \, \phi_{target} + (1 - \tau) \, \phi'_{target}    (2)

In addition, learning by sampling from all accumulated experience is better than learning only from recent experience. DDPG handles the differently scaled quantities by normalizing the observation, and it uses batch normalization to put the samples into a single minibatch and normalize all dimensions for better learning. The schematic diagram of the DDPG algorithm is illustrated in Figure 2.


FIGURE 2. Schematic diagram of DDPG algorithm.

FIGURE 3. IEEE 13-bus real unbalanced distribution network.

IV. THE PROPOSED APPROACH TO CONTROL D-STATCOM
The first process in constructing RL is designating the system environment in which the agent plays. The RL environment includes the plant, the reference signal, and the line information. In general, the RL environment can also include a transformer, filters, breakers, and power converters [38]. In this proposed approach, the IEEE 13-bus system, a real unbalanced distribution network, is used, as shown in Figure 3. The transformer is connected between bus 633 and bus 634, and the voltage regulator is installed between bus 650 and bus 632. An on-load tap-changer transformer (120/4.2 kV) is used as the voltage regulator. The line impedance data and system information of the IEEE 13-bus system are taken from the Electric Power Research Institute (EPRI) [39]. The D-STATCOM is installed in parallel to bus 632, and the dc-link capacitor used in this study is 10,000 µF. In addition, the DC-link voltage reference is set at 3,000 V. This system is used to illustrate that the proposed approach improves bus voltage control performance with an improved transient response.

The behavior of an RL policy is similar to the operation of a controller in a control system [38]. We introduced the outer voltage regulation loop, which uses two PID controllers to decide the Iq and Id references, in Section 2. This paper introduces a technique that designates the action vectors of the DDPG algorithm as the references, instead of the outer PID controllers in the discrete action domain, to achieve the system's desired objective. Insufficient training would deteriorate the control system, because its behavior would then be unpredictable and unexpected, resulting in poor controller performance.

To make the model robust against unpredictable disturbances, the RL agent has to learn a proper, systematic policy under variable system conditions. Furthermore, an efficient noise setting allows the agent to explore off-policy, ensuring sufficient exploration and avoiding a purely deterministic policy. Without efficient noise in the environment, the agent would probably not try a wide enough variety of actions during the learning process. In brief, these two kinds of unpredictable situations make the trained model stronger.

If the actions have different ranges and units within an agent, each action may require different noise parameters. For an agent with two actions [Iq, Id], we can set the standard deviation of each action to a different value. A common rule is to define the noise standard deviation as a value between 1% and 10% of the action range divided by √Ts, where Ts (2 ms) is the RL sampling time [40]. Therefore, as the Iq range is set to [−1.4, 1.4] and the Id range to [−0.7, 0.7] in this paper, the noise vectors have different standard deviations for the Gaussian action space noise (σ = 0.6261, 0.3130), applied by the stochastic noise model at each training step. Furthermore, the same decay rate (1e-4) is applied to both standard deviations.
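The quoted standard deviations follow from the 1% end of that rule with the stated action ranges and 2 ms sampling time, as the short check below shows (an illustrative calculation, not code from the paper).

```python
import math

Ts = 2e-3                       # RL sampling time [s]
iq_range = 1.4 - (-1.4)         # Iq action range
id_range = 0.7 - (-0.7)         # Id action range

# sigma = 1% of the action range divided by sqrt(Ts)
sigma_iq = 0.01 * iq_range / math.sqrt(Ts)
sigma_id = 0.01 * id_range / math.sqrt(Ts)
print(round(sigma_iq, 4), round(sigma_id, 4))   # 0.6261 and 0.313, matching the text
```

During training, the 1e-4 decay rate then gradually shrinks both values so exploration fades as the policy improves.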
In this model, the learning rate of the actor and critic networks is set at 1e-3, which corresponds to the neural network learning rates of the actor ϕ and critic θ. ReLU layers, which are well known for adding non-linearity to a NN, are used as the activation function in each network [41]. In addition, the discount factor γ is set at 0.99, as it commonly leads to the highest reward with convergence stability [42]. The specific parameters of DDPG are given in Table 2.

In making the model's policy, as voltage regulation is a focus of this paper, we start from the following observation: if the controller is designed to keep the power supply side voltage at 1 pu, theoretically we should set the voltage reference at 1 pu. However, our approach stems from the idea that a different reference should be given instead of 1 pu, even though the desired real reference is 1.

In a similar way, Idref and Iqref need not be the proper references to enhance the control system's transient response. In Figure 4, the two PID controllers of the outer loop are eliminated, and their outputs are, as a result, replaced by actions of the DDPG algorithm.

The observation of the RL system plays a vital role, since it is the core of the agent, enabling the agent to receive the results of its actions and the environment changes [36].


FIGURE 4. DDPG agent workflow: determining Iq, Id reference in D-STATCOM control system.

TABLE 2. Design parameters of DDPG algorithm.

In this paper, the observations are s = [V_acref, V_ac, V_acdif, K·Ts/(z−1)·V_acdif, V_dcref, V_dc, V_dcdif, K·Ts/(z−1)·V_dcdif]. However, the magnitudes of V_ac and V_dc are not at the same level, so it is necessary to normalize the data to the 0-1 range. The idea for the composition of the components of s (the RL state) originates from the input of a PID controller, which creates proportional and integral terms from the error generated by the difference between the measurement and the reference.
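As an illustration of how such an observation vector could be assembled, the sketch below builds the eight-element state, with the discrete integrator K·Ts/(z−1) realized as a forward-Euler accumulator, and rescales the channels to a comparable level. The class name, the integrator gain K, and the normalization constants are assumptions made only for this sketch, not values from the paper.

```python
class ObservationBuilder:
    """Builds s = [Vac_ref, Vac, Vac_dif, int(Vac_dif), Vdc_ref, Vdc, Vdc_dif, int(Vdc_dif)]."""

    def __init__(self, K=1.0, Ts=2e-3, vdc_base=3000.0):
        self.K, self.Ts = K, Ts          # integrator gain (assumed) and RL sampling time
        self.vdc_base = vdc_base         # DC-link base used for scaling (placeholder choice)
        self.int_ac = 0.0                # accumulator realizing K*Ts/(z-1) on Vac_dif
        self.int_dc = 0.0                # accumulator realizing K*Ts/(z-1) on Vdc_dif

    def step(self, vac_ref, vac, vdc_ref, vdc):
        vac_dif = vac_ref - vac
        vdc_dif = vdc_ref - vdc
        # Forward-Euler discrete integrator: y[k] = y[k-1] + K*Ts*u[k]  <->  K*Ts/(z-1)
        self.int_ac += self.K * self.Ts * vac_dif
        self.int_dc += self.K * self.Ts * vdc_dif
        s = [vac_ref, vac, vac_dif, self.int_ac,
             vdc_ref, vdc, vdc_dif, self.int_dc]
        # Per-channel scaling so the AC (per-unit) and DC (volt) channels sit on a 0-1 level
        scale = [1.0, 1.0, 1.0, 1.0,
                 self.vdc_base, self.vdc_base, self.vdc_base, self.vdc_base]
        return [x / c for x, c in zip(s, scale)]

obs = ObservationBuilder()
print(obs.step(vac_ref=1.0, vac=0.99, vdc_ref=3000.0, vdc=2950.0))
```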
As the RL agent learns through the reward received at each step, setting the reward vector criteria is crucial for making proper policies. The RL reward function can be designed to pursue the minimum steady-state error in the control system. In addition, the reward is used to compose the Q-value through the Bellman equation in the critic network, together with the mentioned states and two actions.

In this paper, in order to enhance voltage stability, the reward vector has to be formed from the degree of mismatch between the line voltage observation and the real reference. However, designing a reward vector that only focuses on voltage stability may not work properly and risks divergence during the model's training process, since there is a low correlation between nodal voltage stability and the RL action vectors (Idref and Iqref), as observed in our experiments. In order to mitigate this low correlation, the following reward strategy is suggested:
① Gather data from the PID controller's outputs (Idref and Iqref) and the nodal voltage over variable episodes in which the RL agent is not applied.
② Train the RL agent with a reward vector formed from the differences between the action vectors and the outer PID controller's outputs.
③ Find the total reward vector tr1 once the agent has learned enough.
④ Set another reward strategy:
  – The agent earns the reward tr1 evenly at each step.
  – If the nodal voltage stability is improved compared to the PID controllers, extra rewards are received in return for voltage stability.
⑤ Re-train the RL agent from the pre-trained agent of step ②.

Making the RL agent learn the outer PID controller's outputs (Idref and Iqref) is important for determining feasible RL agent actions in the control system, because these outputs have a more considerable correlation with the actions than the quantities (V_acref and V_dcref) that we originally desire to track.

x = I_{ref\_PID} − i
ΔI_{ref} = I_{ref2} − I_{ref1}
ΔV_{ref} = V_{ref2} − V_{ref1}    (3)

In Eq. (3), ΔIref is the difference between the next steady-state value of the current reference, I_ref2, and the previous steady-state value, I_ref1, and ΔVref has the same relationship as ΔIref.


For illustrating the reward function, the basic variable set is described in Eq. (3). The difference between the current reference obtained from the PID controller output and the real value of the current is tied to the variable x as the dependent variable of the reward function. Since I_ref_PID is obtained in step ①, in which I_ref1 and I_ref2 are also obtained, it is used to describe a reward function that reflects the previous step's value function.

R_1 = \begin{cases} e^{-x/\Delta I_{ref}}, & x > 0 \\ 1, & x \le 0 \end{cases}    (4)

R1 is illustrated in Eq. (4). When x is larger than 0, the reward is lower than 1 because of the term e^{−x/ΔI_ref}. As long as x is small, R1 also takes a higher value than when the steady-state current reference I_ref2 is not followed. At steps ① and ②, it is better to set the reward function by leveraging the relationship between the RL agent's actions and the outer PID controller's outputs (Idref and Iqref).

R_2 = \begin{cases} R_1^{\,(V_{ref}-v)/\Delta V_{ref}^{2}}, & x > 0 \\ 1, & x \le 0 \end{cases}    (5)

However, this alone cannot be expected to outperform the PID controller. Thus, we further tune the reward function after training the model, as described in steps ③ and ④. In step ⑤, using the pre-trained agent from step ②, the RL agent learns from the re-designed reward function. In Eq. (5), R2 is designed as R1 raised to the power (V_ref − v)/ΔV_ref². To explain in more detail, the slope of the reward function becomes steep or gentle according to the transient response of the real voltage with respect to V_ref2, so that the voltage performance considerably impacts the reward function together with the variable x. This illustrates that the reward R2 places priority on the voltage transient response over the current transient response, even though the current transient response is reflected in the reward function up to a certain point. Thus, if the voltage performance is good enough, the ability to trace the PI controller can be ignored up to a certain point. The comparison between R1 and R2 is described in Figure 5.

FIGURE 5. Reward function: comparison between R1 and R2.

Moreover, since the reward functions are only needed in the training process, they are neglected when we test the model.
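A direct transcription of Eqs. (4) and (5), as reconstructed above, is sketched below. The piecewise forms and the exponent (V_ref − v)/ΔV_ref² follow that reconstruction, and the sample arguments are placeholders rather than data from the experiments.

```python
import math

def reward_r1(x, d_iref):
    """Eq. (4): penalize deviation x from the PID-derived current reference."""
    return math.exp(-x / d_iref) if x > 0 else 1.0

def reward_r2(x, d_iref, v, v_ref, d_vref):
    """Eq. (5): reshape R1 so that the voltage tracking error dominates the reward."""
    if x <= 0:
        return 1.0
    exponent = (v_ref - v) / (d_vref ** 2)
    return reward_r1(x, d_iref) ** exponent

# Placeholder numbers: a small current mismatch while the voltage sits slightly below its reference.
print(reward_r1(x=0.05, d_iref=0.2))
print(reward_r2(x=0.05, d_iref=0.2, v=0.996, v_ref=1.0, d_vref=0.1))
```

The exponent is what makes the reward slope "steep or gentle": a larger remaining voltage error amplifies the penalty encoded in R1, while a vanishing voltage error lets the agent ignore the PI-tracing term, as described above.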

V. PERFORMANCE ANALYSIS
A. PROGRAMMABLE VOLTAGE SOURCE & CONTINGENCY
As shown in Figure 6, the voltage magnitude of the voltage source at bus 650 varies, fluctuating between 0.985 and 1.02. Bus 650 is the slack bus, that is, the IEEE 13-bus distribution system is connected to the main grid through bus 650. In the experiment, the D-STATCOM is installed in parallel to bus 632. A transformer and line impedance exist between bus 632 and bus 650.

FIGURE 6. Programmable voltage source at bus 650 (slack bus).

We designate the real reference of the D-STATCOM at 1 pu, which is shown by the purple line in Figure 7. In this case, the line-to-line RMS base voltage is set at 4.16 kV. The RL is implemented for 1,000 iterations, and the action vectors of the RL improve the transient response of the control system when the voltage magnitude of the programmable voltage source changes.

FIGURE 7. Result comparison of voltage (p.u.) at bus 632.


FIGURE 8. q-axis current changes according to reference changes in D-STATCOM.

FIGURE 9. Reactive power in each phase (D-STATCOM).

During the training, the programmable voltage also changes randomly every 0.1 seconds. The response characteristics with and without the D-STATCOM are shown in Figure 7. The yellow line in Figure 7 shows that, without the D-STATCOM, the voltage does not converge to 1. Besides, the transient response is enhanced when the DDPG approach is applied, in comparison with the existing control theory. This confirms that the proposed model works adaptably under various power system conditions with different voltage levels of the voltage source.

As the voltage magnitude of the programmable voltage source changes, the D-STATCOM controls reactive power to adjust the voltage of bus 632 to 1 pu. In order to control the AC voltage, the reactive power at bus 632 is changed via the D-STATCOM, as the q-axis current reference is changed. In Figure 8, especially in the second and third subplots, the legends indicate the 'q-axis current', colored red, and the 'q-axis current reference', colored black (dotted line). These black dotted lines correspond to the q-axis current reference in the first subplot. The figure shows that the q-axis current reference converges faster than the reactive power in the D-STATCOM, since the q-axis current reference reacts first to a disturbance and the q-axis voltage reacts afterward.

Furthermore, as the fast q-axis reference current drives fast convergence of the real q-axis current, the observed current also converges quickly by tracing the reference designated via the DDPG approach. Applying the DDPG approach to the control system, the reactive power also converges faster than with the conventional method, as shown in Figure 8. Thus, the voltage transient response is enhanced, since the reactive power output of the D-STATCOM converges quickly. In addition, the reactive power profiles of the individual phases are depicted in Figure 9; they differ because the IEEE 13-bus system has unbalanced loads. The summation of the reactive power is also shown in Figure 9.

In addition, the same scenario as Figure 6 is further tested under the setting that the circuit breaker between bus 671 and bus 692 opens at 0.1 s and closes at 0.5 s, with a breaker resistance of 0.0001 Ω. A significant transient response in the bus voltage and reactive powers is observed during feeder tripping and reclosing, as shown in Figure 10. It illustrates that the transient response associated with feeder tripping and reclosing depends on the presence of the D-STATCOM. The D-STATCOM absorbs active and reactive power during this period, the same as what we observed in Section 5.1. This confirms that the model works adaptably with the three-phase breaker operation and main feeder voltage changes.

B. VARIATION OF SET-REFERENCE VOLTAGE AT D-STATCOM
Solar power generation rises and reverse power flows into the system during the day, resulting in an overvoltage phenomenon exceeding the allowable limit. The D-STATCOM can adjust its reference voltage when this happens, assuming that the IEEE 13-bus system is connected to other distribution networks and the D-STATCOM coordinates with them for the entire system [43]. Thus, the proposed approach is tested on different reference voltage scenarios at bus 632. As illustrated by the yellow line of Figure 11, approximately 1.00238 pu (with the base voltage of 4.16 kV) is observed when the D-STATCOM is not present in the grid.


FIGURE 10. Result comparison of voltage (p.u) at bus 632 under load switching (3-phase breaker).

FIGURE 11. Result comparison of voltage (p.u.) at bus 632 under variable set-reference voltage.

FIGURE 12. Reactive power and q-axis current reference changes in D-STATCOM under variable set-reference voltage.

The purple line indicates that the reference of the D-STATCOM changes over time. It changes from 1 pu to 0.995 pu at 0.2 s, further to 0.99 pu at 0.4 s, and finally back to 1 pu at 0.6 s. As the reference voltage is changed, the D-STATCOM controls the distribution system's voltage by itself. In other words, the D-STATCOM injects inductive reactive power to hold the voltage at the reference. In Figure 12, the D-STATCOM injects 1.44 MVAR and then 4.55 MVAR at 0.2 seconds to adjust the AC voltage. At the same time, to improve the transient performance of the D-STATCOM, the q-axis current reference under the DDPG approach responds faster than with the conventional method.

The voltage settling time results of the two methods are compared in Table 3. As the reference changes from 1 to 0.995 at 0.2 s, the DDPG approach takes 0.0534 seconds to reach the steady state. In comparison, the conventional control system takes 0.0729 seconds to reach the steady state in the same setup. In chronological order, the settling times for the DDPG approach and the conventional control system are 0.0532 seconds and 0.0660 seconds, respectively, for the event at 0.4 seconds, and 0.0558 seconds and 0.0716 seconds, respectively, for the event at 0.6 seconds.


FIGURE 13. DC-link voltage and d-axis current reference changes in D-STATCOM.

FIGURE 14. D-STATCOM current comparison (p.u) with current wave form.

TABLE 3. D-STATCOM voltage performance analysis – settling time.

Various distribution system conditions can lead to a decrease or an increase in the dc-link capacitor voltage. For the sake of compensation, it is essential that the capacitor dc-link voltage remain as close to the reference value as possible. During transient operation, it is possible to improve the performance of the capacitor dc-link voltage by adding the RL action vector to the d-axis current reference. As the action vectors of our model consist of the d-q axis current references, the change in the new d-axis current reference is shown in Figure 13. It converges at a certain level to control the DC-link voltage at 3,000 V. Using the DDPG agent's action vector, the voltage of the capacitor is maintained constant at the reference value of 3,000 V, as shown in Figure 13.

The final result of the current waveform from the D-STATCOM is shown in Figure 14. In this scenario, both the conventional method and the DDPG approach guarantee voltage stability through reactive power control, which improves the current transient response. In control systems, the settling time, which requires the response to reach and stay within a specified range of 2% to 5% of its final value, is a crucial criterion in determining control performance.
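The settling times reported in Tables 3 and 4 can be extracted from simulation traces with a routine like the one below. This is a generic sketch assuming uniformly sampled data and a 2% tolerance band defined on the step size; it is not the authors' post-processing script.

```python
import numpy as np

def settling_time(t, y, y_initial, y_final, band=0.02):
    """Time from the start of the trace until y stays within a tolerance band around y_final.

    The tolerance is band * |y_final - y_initial| (2% of the step size by default),
    which is one common convention; the paper quotes a 2%-5% criterion.
    """
    tol = band * abs(y_final - y_initial)
    outside = np.abs(np.asarray(y) - y_final) > tol
    idx = np.where(outside)[0]
    if idx.size == 0:
        return 0.0                      # already settled over the whole trace
    if idx[-1] == len(y) - 1:
        return float("nan")             # never settles within this trace
    return t[idx[-1] + 1] - t[0]

# Synthetic first-order response from 1.0 pu toward 0.995 pu after a step at t = 0.2 s
t = np.linspace(0.2, 0.4, 2001)
y = 1.0 - 0.005 * (1 - np.exp(-(t - 0.2) / 0.015))
print(settling_time(t, y, y_initial=1.0, y_final=0.995))   # roughly 4 time constants, about 0.06 s
```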


TABLE 4. D-STATCOM current performance analysis – settling time.

The current settling time results of the two methods are compared in Table 4. As the reference changes from 1 to 0.995 at 0.2 s, the DDPG approach takes 0.0313 seconds to reach the steady state. In comparison, the conventional control system takes 0.0753 seconds to reach the steady state in the same setup. In chronological order, the settling times for the DDPG approach and the conventional control system are 0.0302 seconds and 0.0722 seconds, respectively, for the event at 0.4 seconds, and 0.0301 seconds and 0.0732 seconds, respectively, for the event at 0.6 seconds. As described in Table 4, the DDPG approach helps improve the performance and settling time when a time-varying reference is applied.

VI. CONCLUSION
With the increasing complexity of power systems, especially distribution systems, the role of FACTS devices becomes critical for system stability. In this paper, a 4.16 kV distribution system with a D-STATCOM is simulated by Simulink (in the discrete time domain) in real time. Instead of PID controllers, the proposed approach applies RL to regulate the d-axis and q-axis current references and control the nodal voltage. Since the references converge quickly, the observed current and voltage converge faster than with the conventional PID model. In addition, these references are key to the fast operation of the dc-link capacitor to inject reactive power into the grid. The simulations confirm that operating the D-STATCOM with the proposed model could induce a more stable voltage profile and a better transient response. Furthermore, they verify that the model is robust against changes in the D-STATCOM's reference voltage.

The role of the D-STATCOM will only become more essential with the increasing complexity of the grid due to the higher penetration level of renewable energy. Future work will include renewable energy sources, such as wind and solar, in the proposed study. Moreover, the AC OPF problem using FACTS devices will be dealt with in subsequent research.

REFERENCES
[1] R. Zamora and A. K. Srivastava, "Controls for microgrids with storage: Review, challenges, and research needs," Renew. Sustain. Energy Rev., vol. 14, no. 7, pp. 2009–2018, Sep. 2010.
[2] Y. Naderi, S. H. Hosseini, S. G. Zadeh, B. Mohammadi-Ivatloo, J. C. Vasquez, and J. M. Guerrero, "An overview of power quality enhancement techniques applied to distributed generation in electrical distribution networks," Renew. Sustain. Energy Rev., vol. 93, pp. 201–214, Oct. 2018.
[3] E. Jamil, S. Hameed, B. Jamil, and Qurratulain, "Power quality improvement of distribution system with photovoltaic and permanent magnet synchronous generator based renewable energy farm using static synchronous compensator," Sustain. Energy Technol. Assessments, vol. 35, pp. 98–116, Oct. 2019.
[4] B. Singh, P. Jayaprakash, and D. P. Kothari, "New control approach for capacitor supported DSTATCOM in three-phase four wire distribution system under non-ideal supply voltage conditions based on synchronous reference frame theory," Int. J. Elect. Power Energy Syst., vol. 33, no. 5, pp. 1109–1117, Sep. 2011.
[5] H. Yoon and Y. Cho, "Imbalance reduction of three-phase line current using reactive power injection of the distributed static series compensator," J. Elect. Eng. Technol., vol. 14, no. 3, pp. 1017–1025, Feb. 2019.
[6] B. S. Goud and B. L. Rao, "Power quality enhancement in grid-connected PV/wind/battery using UPQC: Atom search optimization," J. Elect. Eng. Technol., vol. 16, no. 2, pp. 821–835, Jan. 2021.
[7] C. Kumar and M. K. Mishra, "A voltage-controlled DSTATCOM for power-quality improvement," IEEE Trans. Power Del., vol. 29, no. 3, pp. 1499–1507, Jun. 2014.
[8] B. Pragathi, R. C. Poonia, B. Polaiah, and D. K. Nayak, "Evaluation and analysis of soft computing techniques for grid connected photo voltaic system to enhance power quality issues," J. Elect. Eng. Technol., vol. 16, pp. 1833–1840, Apr. 2021.
[9] S. Ramachandran and M. Ramasamy, "Solar photovoltaic interfaced quasi impedance source network based static compensator for voltage and frequency control in the wind energy system," J. Elect. Eng. Technol., vol. 16, no. 3, pp. 1253–1272, Feb. 2021.
[10] B. Blazic and I. Papic, "Improved D-StatCom control for operation with unbalanced currents and voltages," IEEE Trans. Power Del., vol. 21, no. 1, pp. 225–233, Jan. 2006.
[11] C. K. Sao, P. W. Lehn, M. R. Iravani, and J. A. Martinez, "A benchmark system for digital time-domain simulation of a pulse-width-modulated D-STATCOM," IEEE Trans. Power Del., vol. 17, no. 4, pp. 1113–1120, Oct. 2002.
[12] K. Sayahi, A. Kadri, F. Bacha, and H. Marzougul, "Implementation of a D-STATCOM control strategy based on direct power control method for grid connected wind turbine," Int. J. Elect. Power Energy Syst., vol. 121, Oct. 2020, Art. no. 106105.
[13] Y. M. Zhao, W. F. Xie, and X. W. Tu, "Performance-based parameter tuning method of model-driven PID control systems," ISA Trans., vol. 51, no. 3, pp. 393–399, May 2012.
[14] S. Lee, J. Kim, L. Baker, A. Long, N. Karavas, N. Menard, I. Galiana, and C. J. Walsh, "Autonomous multi-joint soft exosuit with augmentation-power-based control parameter tuning reduces energy cost of loaded walking," J. Neuroeng. Rehabil., vol. 15, no. 1, pp. 1–9, Dec. 2018.
[15] B. Tandon and R. Kaur, "Genetic algorithm based parameter tuning of PID controller for composition control system," Int. J. Eng. Sci. Technol., vol. 3, no. 8, pp. 6705–6711, Aug. 2011.
[16] S. R. Arya and B. Singh, "Neural network based conductance estimation control algorithm for shunt compensation," IEEE Trans. Ind. Informat., vol. 10, no. 1, pp. 569–577, Feb. 2014.
[17] L. L. Lai, "A two-ANN approach to frequency and harmonic evaluation," in Proc. 5th Int. Conf. Artif. Neural Netw., 1997, pp. 245–250.
[18] Y. Pan and J. Wang, "Model predictive control of unknown nonlinear dynamical systems based on recurrent neural networks," IEEE Trans. Ind. Electron., vol. 59, no. 8, pp. 3089–3101, Aug. 2012.
[19] M. T. Ahmad, N. Kumar, and B. Singh, "Generalised neural network-based control algorithm for DSTATCOM in distribution systems," IET Power Electron., vol. 10, no. 12, pp. 1529–1538, Oct. 2017.
[20] W. J. Shipman and L. C. Coetzee, "Reinforcement learning and deep neural networks for PI controller tuning," IFAC-PapersOnLine, vol. 52, no. 14, pp. 111–116, 2019.
[21] F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. 25th Int. Conf. Mach. Learn. (ICML), Helsinki, Finland, 2008, pp. 664–671.
[22] S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Proc. Adv. Neural Inf. Process. Syst., 1995, pp. 361–368.
[23] C. Szepesvári and W. D. Smart, "Interpolation-based Q-learning," in Proc. 21st Int. Conf. Mach. Learn. (ICML), 2004, pp. 791–798.
[24] S. Adam, L. Busoniu, and R. Babuska, "Experience replay for real-time reinforcement learning control," IEEE Trans. Syst., Man, Cybern., C (Appl. Rev.), vol. 42, no. 2, pp. 201–212, Mar. 2012.
[25] Z. Cao, Q. Xiao, R. Huang, and M. Zhou, "Robust neuro-optimal control of underactuated snake robots with experience replay," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 1, pp. 208–217, Jan. 2018.
[26] P. Zhu, W. Dai, J. Ma, Z. Zeng, and H. Lu, "Multi-robot flocking control based on deep reinforcement learning," IEEE Access, vol. 8, pp. 150397–150406, 2020.


[27] J. Duan, D. Shi, R. Diao, H. Li, Z. Wang, B. Zhang, and D. Bian, "Deep-reinforcement-learning-based autonomous voltage control for power grid operations," IEEE Trans. Power Syst., vol. 35, no. 1, pp. 814–817, Jan. 2020.
[28] J. Hussain, M. Hussain, S. Raza, and M. Siddique, "Power quality improvement of grid connected wind energy system using DSTATCOM-BESS," Int. J. Renew. Energy Res., vol. 9, no. 3, pp. 1388–1397, Sep. 2019.
[29] A. Banerji, S. K. Biswas, and B. Singh, "DSTATCOM control algorithms: A review," Int. J. Power Electron. Drive Syst. (IJPEDS), vol. 2, no. 3, pp. 285–296, Sep. 2012.
[30] S. Bansrlar and R. Nayak, "Modeling of adaptable voltage controller and its stability analysis in distributed generation system," Int. J. Current Eng. Technol., vol. 5, no. 3, pp. 1798–1801, Jun. 2015.
[31] L. C. Baird and A. W. Moore, "Gradient descent for general reinforcement learning," in Proc. Adv. Neural Inf. Process. Syst., 1999, pp. 968–974.
[32] Z. Yang, K. Merrick, L. Jin, and H. A. Abbass, "Hierarchical deep reinforcement learning for continuous action control," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 11, pp. 5174–5184, Nov. 2018.
[33] J. Li, T. Chai, F. L. Lewis, Z. Ding, and Y. Jiang, "Off-policy interleaved Q-learning: Optimal control for affine nonlinear discrete-time systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 5, pp. 1308–1320, May 2019.
[34] M. Ramicic and A. Bonarini, "Correlation minimizing replay memory in temporal-difference reinforcement learning," Neurocomputing, vol. 393, pp. 91–100, Jun. 2020.
[35] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, and J. Veness, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[36] J. H. Woo, L. Wu, J. B. Park, and J. H. Roh, "Real-time optimal power flow using twin delayed deep deterministic policy gradient algorithm," IEEE Access, vol. 8, pp. 213611–213618, 2020.
[37] S. Dankwa and W. Zheng, "Twin-delayed DDPG: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent," in Proc. 3rd Int. Conf. Vis., Image Signal Process., Aug. 2019, pp. 1–5.
[38] Z. Zhang, D. Zhang, and R. C. Qiu, "Deep reinforcement learning for power system applications: An overview," CSEE J. Power Energy Syst., vol. 6, no. 1, pp. 213–225, Mar. 2020.
[39] IEEE PES Distribution System Analysis Subcommittee's Distribution Test Feeder Working Group. Accessed: Jul. 2004. [Online]. Available: https://cmte.ieee.org/pes-testfeeders/resources/
[40] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[41] K. Eckle and J. S. Hieber, "A comparison of deep networks with ReLU activation function and linear spline-type methods," Neural Netw., vol. 110, pp. 232–242, Feb. 2019.
[42] Y. Lin, J. McPhee, and N. L. Azad, "Comparison of deep reinforcement learning and model predictive control for adaptive cruise control," IEEE Trans. Intell. Vehicles, vol. 6, no. 2, pp. 221–231, Jun. 2021.
[43] R. K. Varma and M. Siavashi, "PV-STATCOM: A new smart inverter for voltage control in distribution systems," IEEE Trans. Sustain. Energy, vol. 9, no. 4, pp. 1681–1691, Oct. 2018.

JONG HA WOO received the B.S. degree in electrical engineering from Konkuk University, Seoul, South Korea, in 2020, where he is currently pursuing the master's degree in electrical engineering under the supervision of Prof. J. H. Roh. His current research interests include the optimal power flow using deep reinforcement learning, smart grid, and power system operation.

LEI WU (Senior Member, IEEE) received the B.S. degree in electrical engineering and the M.S. degree in systems engineering from Xi'an Jiaotong University, Xi'an, China, in 2001 and 2004, respectively, and the Ph.D. degree in electrical engineering from Illinois Institute of Technology (IIT), Chicago, IL, USA, in 2008. From 2008 to 2010, he was a Senior Research Associate with the Robert W. Galvin Center for Electricity Innovation, IIT. He was a summer Visiting Faculty with NYISO, in 2012. He was a Professor with the Electrical and Computer Engineering Department, Clarkson University, Potsdam, NY, USA, till 2018. He is currently a Professor with the Electrical and Computer Engineering Department, Stevens Institute of Technology, Hoboken, NJ, USA. His research interests include power systems operation and planning, energy economics, and community resilience microgrid.

SUNG MIN LEE received the B.S. degree in electrical engineering from Konkuk University, Seoul, South Korea, in 2019, where he is currently pursuing an Integrated Ph.D. degree under the supervision of Prof. Y. H. Cho. His current research interests include high-power converters and grid-connected systems.

JONG-BAE PARK (Member, IEEE) received the B.S., M.S., and Ph.D. degrees from Seoul National University, South Korea, in 1987, 1989, and 1998, respectively. From 1998 to 2001, he was with the Electrical and Electronics Department, Anyang University, South Korea, as an Assistant Professor. From 2006 to 2008, he was a resident Researcher with EPRI, USA. Since 2001, he has been with the Electrical Engineering Department, Konkuk University, Seoul, South Korea, as a Professor. His major research interests include power system operation, planning, economics, and markets.

JAE HYUNG ROH (Member, IEEE) received the B.S. degree in nuclear engineering from Seoul National University, Seoul, South Korea, in 1993, the M.S. degree in electrical engineering from Hongik University, Seoul, in 2002, and the Ph.D. degree in electrical engineering from Illinois Institute of Technology, Chicago, IL, USA, in 2008. From 1992 to 2001, he was with Korea Electric Power Corporation. From 2001 to 2010, he was with Korea Power Exchange. Since 2010, he has been with the Department of Electrical and Electronics Engineering, Konkuk University, Seoul, as a Professor. His research interests include electricity market, smart grid, and resource planning. Dr. Roh was a recipient of the IEEE PES Technical Committee Prize Paper Award in 2015.
