0% found this document useful (0 votes)
33 views11 pages

Optimize The Operating Range For Improving The Cycle Life of Battery Energy

Uploaded by

AsmZziz Oo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views11 pages

Optimize The Operating Range For Improving The Cycle Life of Battery Energy

Uploaded by

AsmZziz Oo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Journal of Energy Storage 73 (2023) 109144

Contents lists available at ScienceDirect

Journal of Energy Storage


journal homepage: www.elsevier.com/locate/est

Research papers

Optimize the operating range for improving the cycle life of battery energy
storage systems under uncertainty by managing the depth of discharge
Seon Hyeog Kim a , Yong-June Shin b ,∗
a Digital Convergence Research Laboratory, Electronics and Telecommunications Research Institute, 218, Gajeong-ro, Daejeon, 34129, Yuseong-gu, Republic of
Korea
b
The School of Electrical and Electronic Engineering, Yonsei University, 50, Yonsei-ro, Seoul, 03722, Seodaemun-gu, Republic of Korea

ARTICLE INFO ABSTRACT

Keywords: Globally, renewable energy penetration is being actively promoted by renewable energy 100% (RE100)
Battery aging policies. BESS operators using time-of-use pricing in the electrical grid need to operate the BESS effectively
Battery energy storage system (BESS) to maximize revenue while responding to demand fluctuations. Battery energy storage (BESS) is needed to
Battery management
overcome supply and demand uncertainties in the electrical grid due to increased renewable energy resources.
Depth of discharge (DOD)
BESS operators using time-of-use pricing in the electrical grid need to operate the BESS effectively to maximize
Deep reinforcement learning
Time-of-use
revenue while responding to demand fluctuations. However, excessive discharge depth and frequent changes in
operating conditions can accelerate battery aging. Deep discharge depth increases BESS energy consumption,
which can ensure immediate revenue, but accelerates battery aging and increases battery aging costs. The
proposed BESS management system considers time-of-use tariffs, supply deviations, and demand variability to
minimize the total cost while preventing battery aging. In this study, we investigated a BESS management
strategy based on deep reinforcement learning that considers depth of discharge and state of charge range
while reducing the total operating cost. In the proposed BESS management system, the agent takes actions to
minimize the total operating cost while avoiding excessive discharge depth and low state of charge. A series
of experiments using a real BESS demonstrated that the proposed BESS management system has improved
performance compared to the existing methods.

1. Introduction conditions of the BESS has not been seriously considered. In [9],
the state-of-charge (SOC) range affected battery aging. A scheduling
Renewable energy deployed to achieve carbon neutrality relies on algorithm considering battery degradation was proposed in [10–13].
battery energy storage systems to address the instability of electric-
Excessive depth of discharge (DOD) can ensure immediate revenue,
ity supply. BESS can provide a variety of solutions, including load
but BESSs typically do not cycle beyond their maximum rate capacity.
shifting, power quality maintenance, energy arbitrage, and grid sta-
bilization [1]. Previous research has proposed an energy management Increasing DOD due to excessive charge/discharge for economic gain
system (EMS) operation strategy that integrates BESS, PV, and vehicle- increases the risk of BESS fire and accelerates battery aging. In [14,
to-grid functions to maximize the benefits of BESS [2,3]. Mixed-integer 15], the state of health (SOH) and end of life (EOL) of a battery is
linear programming was implemented to solve various grid scenarios highly dependent on depth of discharge (DOD) conditions. Lithium-
to reduce operating costs and peak hour consumption [4,5]. Model ion batteries are typically designed to last longer when charged to a
predictive control (MPC) is a modern optimal control strategy that can
moderate SOC range, such as 20%–80%. In additions, deep discharging
efficiently handle non-linearity and operational constraints. MPC can
provide improved performance and is well suited to EMS problems. can cause internal stress on the battery, which can lead to other issues
In [6,7], MPC was used to maximize the economic benefits of BESS and such as reduced charging capacity and decreased overall performance.
minimize the BESS performance degradation under different system The capacity degradation of a battery is accelerated by repeated deep
constraints. However, MPC performance can be affected by load/PV discharges and recharges at high SOC [16].
uncertainties [8]. The aforementioned studies have demonstrated improvements in
Existing energy management studies using BESSs have focused on charge and discharge scheduling, but they are model-based approaches
reducing electricity costs in time-of-use (TOU) tariffs, while the aging

∗ Corresponding author.
E-mail addresses: [email protected] (S.H. Kim), [email protected] (Y.-J. Shin).

https://doi.org/10.1016/j.est.2023.109144
Received 4 May 2023; Received in revised form 17 September 2023; Accepted 29 September 2023
Available online 16 October 2023
2352-152X/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-
nc-nd/4.0/).
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

that rely heavily on information from system models. To ensure op- Table 1
Time-of-use tariff.
timal operation even in complex environments, BESS management
Time Electricity price [cent/kWh]
methods based on reinforcement learning (RL) have been proposed.
Model-free approaches that do not require system model informa- Off-Peak 2.35
TOU
Mid-Peak 4.30
tion have achieved great success in decision-making applications using Tariff
On-Peak 32.6
RL [17]. Deep neural networks (DNNs) have also overcome the prob-
lem caused by the small state space of Q-learning. In [18], deep
reinforcement learning (DRL) combining RL and DNNs provided an
effective EMS without specific user information. In [19], a method to 2.2. Battery energy storage system
optimize scheduling in DR based on deep Q-learning (DQN) combining
Q-learning and DNNs was proposed. To overcome the high-dimensional BESS scheduling is optimized by considering demand/supply fore-
DQN problem and avoid being trapped in local optimization, double- casts, TOU and SOC. The inequality constraints include the utility grid’s
deep Q-learning (DDQN) was proposed. The authors in [20] imple- power capacity limits and EMS, as follows:
mented DDQN to learn optimal battery control policies considering 𝐺
𝑃𝑚𝑖𝑛 ≤ 𝑃𝑡𝐺 ≤ 𝑃𝑚𝑎𝑥
𝐺
, (2)
price uncertainty and battery degradation. To solve the DQN discrete
action problem, deterministic policy gradient (DPG) has been proposed. 𝐵𝐸𝑆𝑆
𝑃𝑚𝑖𝑛 ≤ 𝑃𝑡𝐵𝐸𝑆𝑆 ≤ 𝑃𝑚𝑎𝑥
𝐵𝐸𝑆𝑆
, (3)
However, DPG suffers from low sampling efficiency and slow conver-
gence due to large variance of gradient estimation. To overcome these Constraint (2) is the utility grid constraint and constraint (3) is the
drawbacks, a deep deterministic policy gradient (DDPG) method was charging/discharging rate limit in BESS scheduling imposed by the
proposed. The authors in [13] showed simulation results to derive an EMS. To prevent the battery from being over-charged or over-
optimal BESS control strategy based on DDPG. Recently, soft actor criti- discharged, the BESS SOC limit is defined as follows:
cism (SAC), a state-of-the-art DRL strategy that accelerates convergence
𝑆𝑂𝐶𝑚𝑖𝑛 ≤ 𝑆𝑂𝐶𝑡 ≤ 𝑆𝑂𝐶𝑚𝑎𝑥 , (4)
and improves optimization performance, has been used to intelligently
optimize EMS. The authors in [21–23] developed a method based on The defined SOC estimation method estimates residual capacity by
SAC that outperforms other DRL methods in optimizing EMS in complex calculating the BESS charge/discharge power per hour based on a
environments. energy capacity. The SOC is utilized to estimate the available energy
Existing DRL-based BESS scheduling methods have demonstrated of the BESS. The SOC of the BESS can be calculated as follows:
improved performance through simulation verification. However, they
have not simultaneously considered the DOD conditions of the BESS ⎧ |𝑃𝑡𝐵𝐸𝑆𝑆 |⋅𝛥𝑡
⎪𝑆𝑂𝐶𝑡 + 𝜂𝑐 𝐸𝐵
, 𝑃𝑡𝐵𝐸𝑆𝑆 > 0,
and the degradation cost due to the uncertainty of load/generation. In 𝑆𝑂𝐶𝑡+1 =⎨ (5)
𝐵𝐸𝑆𝑆
addition, performance analysis based on actual battery test results has ⎪𝑆𝑂𝐶𝑡 − 𝜂𝑑 |𝑃𝑡 |⋅𝛥𝑡
, 𝑃𝑡𝐵𝐸𝑆𝑆 < 0,
⎩ 𝐸𝐵
not been addressed. Based on this literature review, this paper proposes
a state-of-the-art DRL-based BESS scheduling that can learn optimized where 𝜂𝑐 represents the charging conversion efficiency and 𝜂𝑑 repre-
control to reduce grid operating costs, including the degradation cost, sents the discharging conversion efficiency. 𝐸𝐵 is the energy capacity
in a complex environment. We compare the BESS scheduling method of the BESS, which gradually decreases as the battery ages, so updating
using DRL with real battery DOD tests in similar environments to information about 𝐸𝐵 can improve the accuracy of SOC estimation.
analyze the impact on battery life and operating costs. The EMS plays an essential role in optimal operational scheduling
using BESSs, as it considers the grid states and TOU. The TOU pricing
The remainder of this paper is organized as follows. Section 2
provides consumers with opportunities to manage their electricity cost
describes a grid environment model and the battery aging model. The
by shifting use from on-peak periods to off-peak periods. The TOU
BESS management procedure based on DRL is introduced in Section 3.
𝐶𝑡𝑇 𝑂𝑈 is presented in Table 1 [24]. The electricity price during off-
In Section 4, we apply the proposed methods to various grid scenarios
peak hours is 2.35 cent/kWh, whereas that during on-peak hours is 32.6
based on real-world grid datasets and actual battery tests, and Section 5
cent/kWh. This TOU pricing can save electricity costs for on-peak loads
summarizes and concludes the paper.
by utilizing BESS at off-peak to charge energy at a lower cost. Thus, the
operating cost 𝐶𝑡𝑜 is determined by the utility grid as well as the BESS
2. Environment model charging/discharging schedule, and can be defined as follows:
𝐶𝑡𝑜 =(𝑃𝑡𝐺 + 𝑃𝑡𝐵𝐸𝑆𝑆 ) × 𝐶𝑡𝑇 𝑂𝑈 ,
2.1. Grid and time-of-use (6)
𝑠.𝑡. (1) − (4)

An electrical grid consists of a primary energy resource, the elec- The objective of the proposed EMS is to optimize the BESS schedul-
tricity grid, and renewable resources such as demand-driven loads, ing over a finite period such that the grid operates economically while
BESS, and PV. BESSs are installed to reduce the cost of electricity reducing the aging costs during demand and supply uncertainties. This
through arbitrage and to balance the energy imbalance caused by the objective function applies to similar electrical systems; these include
EVs such as BESSs.
uncertainty and irregular demand of solar power generation. The power
balance constraint that must be satisfied at all times can be formulated
2.3. Battery aging model
as follows:
𝑃𝑡𝐷𝐸 = 𝑃𝑡𝐺 + 𝜂𝑒 𝑃𝑡𝐵𝐸𝑆𝑆 + 𝜂𝑝 𝑃𝑡𝑃 𝑉 , The limited BESS lifespan is a critical factor in grid long term op-
⏟⏟⏟ ⏟⏟⏟ ⏟⏞⏞⏟⏞⏞⏟ ⏟⏟⏟ (1)
eration planning. Frequent charging/discharging will reduce the BESS
Demand Utility grid BESS PV
lifespan. In general, it is not recommended to discharge a battery
where 𝑃𝑡𝐷𝐸 is the grid demand, 𝑃𝑡𝐺 is the electrical power from the entirely, as this dramatically shortens its life. In other words, there
utility grid, 𝑃𝑡𝐵𝐸𝑆𝑆 is the BESS charging/discharging power, 𝜂𝑒 is the is a trade-off between the electricity and BESS aging costs in BESS
efficiency of the BESS inverter (𝜂𝑒 depends on the charge/discharge management. Increasing the BESS running time and cycling can re-
operating conditions), 𝑃𝑡𝑃 𝑉 is the PV output power, and 𝜂𝑝 is the duce the electrical costs but accelerate aging, which results in higher
efficiency of the PV inverter replacement costs. Without careful management, cyclical use causes

2
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Fig. 1. Battery lifespan impact of SOC operating range.

Fig. 2. (a) Cycle life depending on DOD. (b) Partial cycling of the BESS.

the BESS to age rapidly, which results in BESS system replacement (100%–80%), B (79%–60%), C (59%–40%), and D (39%–0%), as shown
costs [25]. In [26], an EMS that considers the ESS battery degradation in Fig. 2(b). The power output during the time the battery spends in
cost was proposed. However, the aging indices used in previous studies SOC range 𝑋 is written as [9]:
did not facilitate the evaluation of the cyclic aging of daily scheduling. 𝑇
∫𝑡 𝑆𝑂𝐶𝑥 𝑑𝑡
Therefore, a proactive BESS management system is required to optimize 𝜌𝑥 = 0
× 100 [%], (8)
𝑇
economic operation while minimizing aging factors; such a system is ∫𝑡 𝑆𝑂𝐶𝑇 𝑜𝑡𝑎𝑙 𝑑𝑡
0
described below.
In Eq. (8), the numerator is the cumulative power output during the
time the battery spends in each SOC range 𝑥. Weighting functions are
2.3.1. Depth of Discharge (DOD) then used to calculate the PC value as follows:
A battery’s lifetime is highly dependent on the DOD. The DOD
indicates the percentage of the battery that has been discharged relative 𝜌 = (𝑎 × 𝜌𝐴 + 𝑏 × 𝜌𝐵 + 𝑐 × 𝜌𝐶 + 𝑑 × 𝜌𝐷 ), (9)
to the battery’s overall capacity. Deep discharge reduces the battery’s where 𝑎, 𝑏, 𝑐, and 𝑑 are the linear weighting factors that are determined
cycle life, as shown in Fig. 1. Also, overcharging can cause unstable by the BESS scheduling conditions. Based on the battery manufacturer’s
conditions. To increase battery cycle life, battery manufacturers recom- data sheets and research [14–16], they recommend operating the BESS
mend operating in the reliable SOC range and charging frequently as in the 20%–80% range. A charging at high SOC range accelerates
battery capacity decreases, rather than charging from a fully discharged battery aging as a result of problems such as corrosion and electrolyte
SOC or maintaining a high SOC. Therefore, as suggested in this paper, stratification [9]. As shown in Fig. 2(b), cycling in a low SOC range
deep discharge should be avoided by utilizing BESS scheduling that (range D, 𝜌𝐷 ) causes more damage to the battery than cycling in other
considers the DOD. Fig. 2(a) illustrates the relationship between the SOC ranges, so 𝑑 has the highest weight for capacity degradation. In
DOD and the cycle life; the wider the DOD range, the shorter the both high and low SOCs (𝜌𝐴 ), excessive charging can increase DOD and
battery’s life cycle. The DOD is calculated as follows: deteriorate the cycle life [27]. Therefore, 𝑎 is set to be higher than 𝑏
and 𝑐.
𝐷𝑘 = max(𝑆𝑂𝐶𝑡 ) − min(𝑆𝑂𝐶𝑡 ) (7)
To account for immediate rewards in the learning process, a degra-
where 𝐷𝑘 denotes the DOD at the 𝑘th cycle and 𝑡 is the time stamp. dation coefficient 𝑐𝑑,𝑘 is proposed to estimate the reward for every
charging or discharging control action. The degradation coefficient can
2.3.2. Operating range of BESS be defined as follows:
The impact of aging varies depending on the SOC ranges where the 𝜌
𝑐𝑑,𝑘 = 𝑘 , (10)
battery operation is concentrated, which can be evaluated using a par- 𝐷𝑘
tial cycling (PC) [9]. The PC reflects the BESS degradation conditions where 𝑐𝑑,𝑘 is updated at each episode 𝑘 based on the last training
based on SOC range. The SOC range 𝑋 is divided into four ranges: A operation. The degradation level varies depending on the PC value even

3
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Fig. 3. Energy management system framework based on DRL in grid.

if the DOD is the same. As shown in the example in Fig. 2(b), the DOD and fed into the DRL to learn the optimal policy. The deviation between
of Range I, Range II, and Range III are the same but have different 𝑃 𝐶 the demand/supply forecasting data and the actual demand can be
values and have different effects on battery aging. written as:

2.3.3. State of Health (SOH) 𝛥𝑃𝑡 = (𝑃𝑡𝑓 ,𝑃 𝑉 − 𝑃𝑡𝑃 𝑉 ) − (𝑃𝑡𝑓 ,𝐷𝐸 − 𝑃𝑡𝐷𝐸 ), (12)
⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏟
SOH is a principal parameter that evaluates a battery’s lifespan. PV deviation Demand deviation
With the gradual loss of available capacity during aging, the SOH is
characterized by the ratio of the battery’s remaining available capacity where 𝑃𝑡𝑓 ,𝑃 𝑉
is the PV forecast, 𝑃𝑡𝑓 ,𝐷𝐸 is the demand forecast. The de-
to its initial available capacity, which can be expressed as: viation of the supply/demand is balanced by the BESS. If the actual PV
𝐸𝑘 supply is insufficient due to overestimation, additional BESS discharge
𝑆𝑂𝐻(𝑘) = × 100[%], (11) is required. If the actual PV supply exceeds the predicted PV supply,
𝐸0
BESS charging is instead required.
where 𝐸𝑘 represents the remaining available capacity at 𝑘 cycles and
Actor–critic models are optimized using offline DRL based on the
𝐸0 is the initial BESS capacity. In this study, SOH is measured through
SAC algorithm, as shown in Fig. 4. A critic is trained offline to es-
a complete discharge test to measure the exact capacity degradation
timate the demand, PV generation and operating cost. Based on the
of the BESS. However, since such a complete discharge test adversely
critic network, an actor is developed to optimize the BESS scheduling,
affects the performance and aging of the battery, SOH is measured
which is updated by the SAC. As the DRL process progresses, the EMS
every 50 cycles.
continues to improve the performance and minimize the operational
costs. After the offline DRL, the BESS model can directly observe the
3. BESS management using DRL
state using the grid model as well as the BESS degradation model and
The BESS-integrated grid considered in this study is installed in output a control action to minimize the expected total operating cost.
a set of buildings located in Seoul, Korea. Fig. 3 shows a schematic In online applications, the BESS profile generated from the data-driven
diagram of the grid with the BESS and DRL-based EMS system. The model based on real-world grid datasets is implemented using an actual
environment generates an observation vector 𝑠𝑡 from the grid and the battery under similar conditions to observe the battery states according
battery aging models. The EMS constitutes agents that gradually learn to the charging/discharging pattern, DOD, and PC.
control strategies by leveraging the experience of repetitive interactions As shown in Fig. 5, the entire BESS system is equipped with eight
with the environment. BESS rack systems (a total of 1 MWh is installed), and each BESS rack
In this paper, the agent can observe the uncertainty encapsulated in system consists of 14 battery modules. Each module also consists of 14
the data and use a long short-term memory (LSTM) network and DRL battery packs. In the offline training process, the BESS capacity is set to
techniques to learn the state transitions for the features in the actual 1 MWh, which is the same as the actual grid BESS. Since the grid-level
data set. The grid state values, forecasting data and TOU are taken BESS in the grid is too large for aging cycles, actual battery testing is
directly from the dataset indexed at 𝑡 + 1. In contrast, the BESS state implemented in a similar environment at a low level using the same
values are determined by the control actions taken at time step 𝑡. The battery pack, which is disassembled from the same model as the actual
left part of the workflow (Fig. 4) forecasts the demand and PV using grid battery module.
variational mode decomposition (VMD) and LSTM network. LSTM can
extract features from the historical data and prevent the vanishing gra-
3.1. State
dient problem. The forecasting method was proposed in our previous
study [28]. Using VMD, the demand/PV profile is decomposed into a
weekly demand profile and then decomposed into intrinsic mode func- The current state of the information 𝑠𝑡 contains the grid model and
tions that capture periodic features. Then, LSTM model is trained using BESS aging model states. The decision-making process of the Markov
intrinsic mode functions from the historical profile. The demand/PV is decision process (MDP) model for BESS scheduling is proposed in this
predicted by integrating the results of analyzing its periodic features. paper. The MDP signifies that the next state at 𝑡 + 1 is only related to
Then, the demand and PV are predicted, concatenated with other states, the action and state information at time 𝑡 and is independent of the

4
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Fig. 4. The workflow of the proposed BESS Scheduling based on SAC.

Fig. 5. Energy storage system, battery module and battery pack used in the experiment.

previous state at time 𝑡 − 1, 𝑡 − 2, 𝑡 − 3, …. The state 𝑠𝑡 ∈ 𝑆 at time step the proposed algorithm is to find the optimal policy 𝜋 ∗ that maximizes
𝑡 are defined below: the reward (reducing the overall cost).

𝑠𝑡 = [𝑃𝑡𝐺 , 𝑃𝑡𝐷𝐸 , 𝑃𝑡𝑃 𝑉 , 𝑃𝑡𝑓 ,𝐷𝐸 , 𝑃𝑡𝑓 ,𝑃 𝑉 , 𝑃𝑡𝐵𝐸𝑆𝑆 , 𝑆𝑂𝐶𝑡 , 𝐶𝑡𝑇 𝑂𝑈 ], (13)
⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏞⏟ ⏟⏞⏞⏞⏞⏞⏞⏟⏞⏞⏞⏞⏞⏞⏟ ⏟⏟⏟ 3.3. Reward and penalty
Microgrid BESS TOU

The reward 𝑟𝑡 at time slot 𝑡 indicates the immediate return, which


where 𝑃𝑡𝐺 , 𝑃𝑡𝐷𝐸 , 𝑃𝑡𝑃 𝑉 , 𝑃𝑡𝑓 ,𝐷𝐸 , 𝑃𝑡𝑓 ,𝑃 𝑉 is from the grid and the forecasting
is obtained when the agent executes the action 𝑎𝑡 based on the state
models, and 𝑃𝑡𝐵𝐸𝑆𝑆 and 𝑆𝑂𝐶𝑡 are from the BESS models, 𝐶𝑡𝑇 𝑂𝑈 is from
𝑠𝑡 . The reward is the key to achieve proper performance in BESS
the TOU pricing.
scheduling. In this paper, the goal of BESS scheduling is to maximize
the overall electricity cost savings while considering the cost of BESS
3.2. Action degradation. Supposing that the experiment start at time slot 𝑡 in one
episode, the cumulative reward is expressed as:
The control signal 𝑎𝑡 is sent by the EMS to control the BESS power [ ]
output. Note that the action is chosen by following the strategy 𝜋, 𝑅𝑡 = 𝑟𝑡 + 𝛾 𝑟𝑡+1 + 𝛾𝑟𝑡+2 ⋯ + 𝛾 𝑇 −𝑡−1 𝑟𝑇 , (14)
which will be updated by the SAC algorithm in the direction of a higher
where 𝑇 represents the finite MDP steps and 𝛾 ∈ [0, 1] is a discount
reward. The action 𝑎𝑡 ∶ −1 ≤ 𝑎𝑡 ≤ 1 is defined as the BESS’s normalized
factor, which is responsible for balancing the current and future return.
power to prevent DRL overestimation and divergence, since the SAC
Thus, given the direction of a policy 𝜋, the value function for state 𝑠𝑡
selects an action (charging/discharging or rest) from an action space
can be described as follows:
based on the policy 𝜋. The actual power supply can be reconstructed
[ ]
by multiplying 𝑃𝐵 (𝑃𝐵 is the maximum power of the BESS). The goal of 𝑉 𝜋 (𝑠𝑡 ) = 𝐸 𝑅(𝑠𝑡 , 𝑡)|𝑠𝑡 = 𝑠 , (15)

5
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

In addition, the battery degradation penalty and forecasting er- used to define the objective function by the mean-squared error in the
ror penalty functions are proposed to prevent excessive charging/ Q-function as follows [19,20]:
discharging and an increase in DOD/PC, which are defined as follows: [ ]
(𝜃) = E (𝑟𝑡 + 𝛾 max 𝑄(𝑠𝑡+1 , 𝑎𝑡+1 , 𝜃) − 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝜃))2 (20)
⎧ 𝑎𝑡+1
⎪𝜑1 × 𝑐𝑑,𝑘 × 𝐶𝐵 , 𝑆𝑂𝐶𝑚𝑖𝑛 < 𝑆𝑂𝐶𝑡 ≤ 0.4,
⎪ The DNN accepts a continuous state as an input and outputs an esti-
⎪0 0.4 < 𝑆𝑂𝐶𝑡 ≤ 0.6, mate of the Q-function for each discrete action; when acting, the DNN
𝜏𝑡𝐷 =⎨ (16)
chooses the maximum action-value at each state. However, the max
⎪0 0.6 < 𝑆𝑂𝐶𝑡 ≤ 0.8,
⎪ operator leads to overestimations. The DDQN mitigates this problem
⎪𝜑2 × 𝑐𝑑,𝑘 × 𝐶𝐵 , 0.8 < 𝑆𝑂𝐶𝑡 ≤ 𝑆𝑂𝐶𝑚𝑎𝑥 , using two separate networks to decouple the action selection from the

target Q value generation. The DDQN uses the following target:
where 𝜑1 and 𝜑2 are the degradation penalty coefficients and 𝐶𝐵 is the ( )
battery cost per kWh. In this study, 𝐶𝐵 is set to a constant value that 𝑦𝐷𝐷𝑄𝑁 = 𝑅𝑡 + 𝛾𝑄 𝑠𝑡+1 , arg min 𝑄(𝑠𝑡+1 , 𝑎𝑡+1 , 𝜃), 𝜃 − (21)
𝑡
does not change during daily scheduling [20], and the degradation cost 𝑎𝑡+1
is affected by aging indices such as the DOD and PC. The degradation The DDQN can more effectively overcome the curse of dimensionality
coefficient 𝑐𝑑,𝑘 can be determined by the defined aging index, which problem by selecting the best actions using the online instead of the
increase the degradation cost [29]. target model [30].
The proposed BESS scheduling method determines the optimal BESS
charging time and charge/discharge rate based on PV and load fore- 3.4.2. Deep Deterministic Policy Gradients (DDPG)
casts. However, deviations in demand and supply will occur due to The DDPG is a type of actor–critic-based off-policy method and
forecast errors. The proposed DRL and MPC models control the BESS model-free algorithm based on the DPG, and it can operate in a con-
based on predictions, and if the predictions are inaccurate, optimized tinuous state and action space. This algorithm uses DNNs to establish
BESS charging/discharging cannot be achieved, resulting in increased two approximation functions from the actor–critic algorithm [13].
operating costs. Therefore, this study considers a penalty for forecast The actor network can be described as a policy function 𝜇(𝑠|𝜃 𝜇 ) that
uncertainty. The forecasting error penalty is defined as follows: deterministically maps states to actions, whereas the critic 𝑄(𝑠, 𝑎) is
trained using the Bellman equation. While the agent is being trained,
⎧ 𝐵𝐸𝑆𝑆 , 𝛥𝑃 > 0,
⎪𝜓1 × 𝑃𝑡 the critic and actor network weights are continually updated based on
⎪ the observed reward at each time step. To facilitate training, these two
𝜏𝑡𝐹 = ⎨0 𝛥𝑃 = 0, (17)
⎪ networks are also created with a copy: a actor target network 𝜇 ′ with
′ ′
⎪𝜓2 × 𝑃𝑡𝐵𝐸𝑆𝑆 , 𝛥𝑃 < 0, parameter 𝜃 𝜇 and a critic target network 𝑄′ with parameter 𝜃 𝑄 . A loss
⎩ function 𝐿 is computed as the mean squared error between the target
where 𝜓1 and 𝜓2 are the deviation penalty coefficients. This penalty value and the critic’s estimated Q-value, which is written as:
function is imposed for charging the BESS when the actual load is
1 ∑
𝑀
greater than expected or discharging the BESS when the actual load 𝐿(𝜃 𝑄 ) = (𝑦 − 𝑄(𝑠𝑡 , 𝑎𝑡 |𝜃 𝑄 )2 ), (22)
is smaller than expected. 𝑀 𝑖=1 𝑖
The reward 𝑟𝑡 that results from the EMS 𝑎𝑡 is set to be equal to the where 𝑀 is the size of the experience mini-batch, and 𝑦𝑖 is obtained
grid’s negative overall cost 𝐶𝑡𝑜 . At each time step, the immediate reward by applying Eq. (21) as in Q-learning. The critic network’s parameters
can be expressed as follows: 𝜃 𝑄 are updated by minimizing 𝐿 across the mini-batch of experiences
sampled from the replay buffer. On the other hand, the actor network’s
𝑟𝑡 (𝑠𝑡 , 𝑎𝑡 ) = −[𝐶𝑡𝑜 + 𝜏𝑡𝐷 + 𝜏𝑡𝐹 ], (18)
parameters are updated according to the gradient of the value function
where 𝑟𝑡 is the reward of making decision 𝑎𝑡 in state 𝑠𝑡 . expectation 𝐽 . The resulting policy gradient ▿𝜃𝜇 𝐽 is used to update the
actor, and is written as:
𝑀 [ ]
3.4. Benchmark DRLs for performance evaluation 1 ∑
▽𝜃 𝜇 𝐽 ≈ ▽𝑎 𝑄(𝑠𝑖 , 𝑎|𝜃 𝑄 )|𝑎=𝜇(𝑠𝑖 |𝜃𝜇 ) ▽𝜃 𝑢 𝜇(𝑠𝑖 |𝜃 𝑢 ) (23)
𝑀 𝑖=1
Before introducing the proposed BESS scheduling results, we briefly
introduce some background information on DRL. We compare the Finally, the target actor and critic networks are updated using a
performance of the proposed methods using variant DRLs to optimize smoothing factor 𝜏 to prevent learning instabilities [31]:
′ ′
BESS scheduling. 𝜃 𝑄 ← 𝜏𝜃 𝑄 + (1 − 𝜏)𝜃 𝑄
′ ′ (24)
𝜃 𝜇 ← 𝜏𝜃 𝜇 + (1 − 𝜏)𝜃 𝜇
3.4.1. Double Deep Q Learning (DDQN)
Q-learning uses a critic network and the Q-function, which infers 3.4.3. Soft Actor Critic (SAC)
an optimal policy from the state–action pair. The action-value function Conventional model-free DRL methods have two limitations: high
indicates the extent to which the action taken in each state is effectively sampling complexity and weak convergence, which both depend on
denoted by 𝑄𝜋 (𝑠, 𝑎). The optimal 𝑄∗𝜋 ∗ (𝑠, 𝑎) is used to represent the max- parameter tuning. To improve the sample efficiency, off-policy al-
imum accumulative reward of action 𝑎𝑡 in state 𝑠𝑡 , and the action-value gorithms such as the DDPG were proposed, but their performance
𝑄(𝑠𝑡 , 𝑎𝑡 ) is updated using: relies heavily on hyper-parameters. Therefore, the state-of-the-art off-
[ ] policy DRL algorithm based on maximum-entropy, SAC, is proposed.
𝑄∗𝜋 ∗ (𝑠, 𝑎) ← (1 − 𝜃)𝑄(𝑠𝑡 , 𝑎𝑡 ) + 𝜃 𝑟𝑡 + 𝛾 max 𝑄(𝑠𝑡+1 , 𝑎𝑡+1 ) , (19) Similar to the DDPG, the SAC also uses an actor–critic architecture and
where 𝜃 represents the learning rate, which determines the effect of experience replay buffer that reuses past experiences for an off-policy
the new reward on the old 𝑄(𝑠𝑡 ) value, and 𝛾 is the discount factor formulation. Different from DDPG, the primary feature of the SAC is
entropy regularization; the algorithm is based on the maximum entropy
that balances the immediate and future rewards. However, Q-learning
in the reinforcement learning framework, and its goal is to maximize
is severely affected by the curse of dimensionality because of its tabular
both the expected rewards and the entropy. This goal is expressed as
approach to storing the Q-values. To overcome this problem, the value
follows [21–23]:
function for the standard Q-learning algorithm is replaced by a DQN [∝ ]
with the parameter 𝜃, which is given by the DNN’s weights and biases ∑
𝜋 ∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝜋 𝐸𝜋 𝛾 𝑡 (𝑟𝑡 + 𝛼𝐻𝑡𝜋 ) , (25)
such that 𝑄𝜋 (𝑠, 𝑎) ≈ 𝑄(𝑠, 𝑎, 𝜃). This approximation is subsequently 𝑡=0

6
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Fig. 6. (a) The MPC-EMS structure for BESS control. (b) The grid/ess model scheme.

where H is the Shannon entropy term that represents the agent’s atti- • Apply only the first optimal action and discard the remaining
tude in taking arbitrary actions, and 𝛼 is a regularization coefficient that samples
indicates the importance of the entropy term for rewards. In general,
Compared to DRL, MPC trains with a pre-defined model, so the
considering conventional DRL algorithms, 𝛼 is 0. The maximization
training time is much faster, but external interference or model uncer-
of this target function has a close connection with the exploration–
tainty can have a large impact on performance.
exploitation trade-off, and it ensures that the agent is explicitly pushed
towards the exploration of new policies and prevented from providing
4. Case study
sub-optimal results. As a result, the SAC provides learning robustness
and sampling efficiency.
In this section, the proposed DRL-based BESS scheduling was im-
plemented and compared with several common solutions, including
3.4.4. Model Predictive Control (MPC) the DDQN, DDPG and MPC. This section also includes a performance
MPC is widely used in industry as an effective approach to handling evaluation using simulated scenario based on a real-world grid datasets
large-scale multivariate constraint control problems. The MPC model is and actual battery cycle tests. For the purposes of this study, a cycle is
used for performance comparison with the proposed DRL model. The defined as 24 h of BESS scheduling starting at 0:00 AM. The dataset
MPC-EMS structure for BESS control and the grid/ess model scheme time interval used is 5 min, which is the measurement interval of a
are shown in Fig. 6. The MPC is to select control actions by iteratively smart meter, so the control interval in the simulation is also set to
solving an online constrained optimization problem that is designed to 5 min. The composition ratio of the dataset for training data and vali-
minimize a performance index over a finite prediction range based on dation data is 7:3, respectively. The proposed models estimate the total
predictions obtained from a model of the system. The operating/aging operating costs, including the battery degradation costs, and implement
cost objective function and constraints are formulated in the same an optimized BESS scheduling for actual batteries to compare the SOH
environmental model used for DRL. In the MPC approach, the control according to the DOD. In addition, to evaluate the generalization of a
inputs for each stage are computed online instead of using precomputed well-trained agent, three scenarios were introduced in Table 2.
offline. In each sampling period, as shown in Fig. 6(a), the system Battery packs with similar initial capacities were used as scheduling
state is updated, the optimal control problem is solved online, and the cycle test samples. The average full capacity of the battery pack used
controller’s time window is stepped back by one step. In this study, the was 53.81 Ah, which was reduced by 10.31% when compared to the
MPC-EMS model is controlled to minimize the operating cost using the nominal capacity (60 Ah) due to actual grid operation history.
same objective function (6) as the DRLs under the same constraints (1)–
(4). As shown in Fig. 6(b), the MPC model determined state predictions 4.1. Training process
and actions for each 𝑡 as follows:
The MPC model determines the control of the BESS by considering The hyperparameters used in this experiments were well-tuned by
the state variables and predicted results. The MPC model plans control hyperparameter tuning techniques from previous reports of similar
commands by considering the state variables at time 𝑡, outputs control studies. The Adam optimizer was used to learn the DNN weights. The
commands, and receives feedback on the current state variables. It com- discount factor was set to 0.95, batch size was 64 and the learning rate
pares and evaluates the output and response of the control commands was set to 0.001 [20,31]. The DDQN action spaces were discretized to
and updates the cost function to calculate the control input for the next 10 kW intervals from −150 kW to 150 kW. The examined DRL methods
control command. were implemented using MATLAB and Python.
At each time 𝑡: The training results of the networks of DRL models are shown in
Fig. 7. The DRL agents were trained by 7000 episodes, and the cost
• MPC model get new state to update the estimate of the current curve obtained using each DRL method is shown in Fig. 7. Fig. 7(a)
state represents the performance evolution of the network during the train-
• Solve the optimization problem within the constraints ing process of the proposed DRLs. The DDQN, DDPG and SAC are

7
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Table 2
Operating conditions in different scenarios.
Scenario BESS Initial SOC [%] Average load consumption [kW] Average forecasting deviation [%]
SC #1 50 396.23 7.21
SC #2 60 275.41 5.56
SC #3 40 335.87 3.89

Fig. 7. (a) Daily total operating cost evaluated for the examined DRL methods. (b) Excessive DOD penalty. (c) Forecasting error penalty.

off-policy DRL algorithms, and they use random sampling from inside dependent on TOU, but additional scheduling is performed according
the replay buffer for training. The DDQN had difficulty reaching the to the deviation in supply/demand.
optimized policy until the replay buffers were filled. The DDQN perfor- As shown in Fig. 9, the SOC ranges according to the proposed DRL
mance did not improve after 5000 episodes and converged. The DDPG methods are represented and are implemented in scenario 1. Since
demonstrated superior performance when compared to DDQN, but it the MPC-EMS method does not consider the SOC range, the DOD
has a slower learning rate when compared to the SAC. The SAC agent
is significantly increased due to excessive charging/discharging. As
improves its policy continuously during the first 5000 episodes and
shown in Table 3, the deep discharge time (DDT), another cause of
then stabilizes around a null reward. In Fig. 7(b) and Fig. 7(c), the
accelerated battery aging, is defined as the time in which the SOC
change in penalty values is presented. The excessive DOD penalty was
consistently reduced and converged over 3000 episodes. On the other is less than 40% [9]. The MPC-EMS method uses existing methods
hand, the forecasting error penalty increased to about 2000 episodes, based on TOUs without considering the BESS aging conditions. Thus,
but they trained rapidly afterward and converged after 5000 episodes the MPC-EMS is maximally charged in the lowest TOU range and
Since deviation in the demand/PV forecasting exists, the forecasting continuously discharges at peak loads with a DDT of 4.5 h, resulting
error penalty does not converge to zero. Similarly, the battery degra- in a high DOD as shown in Fig. 9. In particular, because of the low
dation penalty did not converge to zero because the scheduling SOC initial SOC conditions in scenario 3, there are DDT intervals for all
range was outside the 40%–80% range. This result occurred because scheduling methods presented in the comparison groups (MPC, DDQN,
it is advantageous to charge/discharge the BESS even if it undergoes and DDPG). However, there is no DDT in the SAC-EMS scheduling,
penalties. The experiment demonstrates that the SAC can be learned which maintained a stable DOD. Similarly, when comparing the overall
directly in an environment due to efficient learning. operating cost, the MPC-EMS method has a higher operating cost than
the three methods using DRL (DDQN, DDPG, and SAC). In the most
4.2. Results and discussion heavily loaded Scenario 1, the MPC-EMS method has a 27% higher
operating cost ($ 722.70) compared to the best performing SAC-EMS
4.2.1. Experimental results
($ 567.35). Similarly, MPC-EMS has a 25% higher operating cost ($
Fig. 8 shows an example of BESS scheduling. In Fig. 8(a), 𝑃𝑡𝐷𝐸
687.32) than SAC-EMS ($ 545.82) in Scenario 2 and 23% higher
represents the actual demand, 𝑃𝑡𝑃 𝑉 represents the actual PV generation,
operating cost ($ 669.91) than SAC-EMS ($ 541.62) in Scenario 3.
𝑃𝑡𝑓 ,𝐷𝐸 represents the predicted demand data, and 𝑃𝑡𝑓 ,𝑃 𝑉 represents
the predicted PV supply. Fig. 8(b) shows the TOU pricing. Fig. 8(c) Although the MPC-EMS method also forecasts the future state of PV,
represents the experiment results using the proposed scheduling DRL load, etc. and optimizes through iterative calculations reflecting the
methods (the red dotted line represents the BESS scheduling using state of the BESS, the performance is subject to the accuracy of the
the MPC based on TOU, the black dotted line represents the BESS defined model. Since the DRL has strengths in solving uncertainties and
scheduling using the DDQN, the green dotted line represents the BESS nonlinearities in the environment, it outperforms the MPC-EMS model
scheduling using the DDPG, and the blue solid line represents the BESS in conditions with uncertainties such as aging costs of batteries and
scheduling using the SAC). BESS scheduling methods are primarily deviations between demand and load presented in this study.

8
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Fig. 8. The BESS scheduling is examined using the proposed DRL methods. (a) Comparison of actual and forecast values. (b) Time-of-Use. (c) BESS Scheduling results.

Fig. 9. Average SOC range according to the proposed DRL methods.

Table 3
Daily total operating cost and depth of discharge in the different scenarios.
MPC DDQN DDPG SAC
Scenario SC #1 SC #2 SC #3 SC #1 SC #2 SC #3 SC #1 SC #2 SC #3 SC #1 SC #2 SC #3
Total operating cost [$] 722.70 687.32 669.91 626.83 599.82 588.72 592.25 556.72 549.81 567.35 545.82 541.62
Depth of discharge [%] 57.43 54.21 55.72 39.90 38.73 38.88 37.99 38.24 37.95 35.43 36.72 37.44
Deep discharge time [h] 4.5 2.25 6.75 0 0 2.5 0 0 0.75 0 0 0

4.2.2. Actual battery aging tests 350 cycles, the MPC-EMS capacity loss is about 11.51%. In comparison,
Fig. 10 shows the test results for the application of optimized the DDQN-EMS and DDPG-EMS capacity losses are 9.63% and 9.15%,
BESS charging/discharging scheduling to actual batteries and shows respectively. In particular, the SAC-EMS scheduling exhibited the small-
the effect of DOD on battery health. After each BESS scheduling, a full est capacity loss, 5.97%, in battery aging tests. The MPC-EMS method
discharge was carried out to verify the remaining capacity every 50 demonstrates faster capacity reduction due to the higher DOD while the
cycles. This full discharge shows that the SOH and capacity fade. After DRL methods showed slower capacity reduction since they maintained

9
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the Korea Institute of Energy Technol-


ogy Evaluation and Planning (No. KETEP-20202020800290 & 20202
020900290) and by the National Research Foundation of Korea (NRF)
grant funded by the Ministry of Science, ICT & Future Planning, (No.
NRF-2020R1A2B5B03001692). The authors would like to thank LG
electronics for the real-world grid datasets and battery energy storage
system.

References

[1] M. Wang, et al., The value of vehicle-to-grid in a decarbonizing California grid,


J. Power Sources 513 (2021) 230472.
Fig. 10. The SOH changes of battery packs using the proposed BESS scheduling
[2] L. Luo, Optimal scheduling of a renewable based microgrid considering photo-
methods.
voltaic system and battery energy storage under uncertainty, J. Energy Storage
28 (2020) 101306.
[3] J. Wu, et al., Energy management strategy for grid-tied microgrids consider-
a stable SOC and avoided the DDT. These experimental results indicate ing the energy storage efficiency, IEEE Trans. Ind. Electron. 65 (12) (2018)
9539–9549.
that if BESS maintains a high DOD with a low SOC range, it can reduce [4] H.A.U. Muqeet, A. Ahmad, Optimal scheduling for campus prosumer microgrid
the battery lifetime and increase the degradation costs. considering price based demand response, IEEE Access 8 (2020) 71378–71394.
[5] Y. Li, et al., Optimal scheduling of an isolated microgrid with battery storage
considering load and renewable generation uncertainties, IEEE Trans. Ind.
5. Conclusion Electron. 66 (2) (2019) 1565–1575.
[6] F. Garcia-Torres, et al., Optimal economic schedule for a network of microgrids
This study proposes the development of the BESS scheduling method with hybrid energy storage system using distributed model predictive control,
IEEE Trans. Ind. Electron. 66 (3) (2019) 1919–1929.
to address the grid energy management problem. Data-driven DRL
[7] F. Garcia-Torres, C. Bordons, Optimal economical schedule of hydrogen-based
optimization methods have been proposed because it is difficult to microgrids with hybrid storage using model predictive control, IEEE Trans. Ind.
have a perfect physical/predictive model in actual BESS operation; the Electron. 62 (8) (2015) 5195–5207.
BESS method considers the battery’s SOC range to reduce the opera- [8] U. Raveendrannair, et al., An analysis of multi objective energy scheduling in
PV-BESS system under prediction uncertainty, IEEE Trans. Energy Convers. 36
tion/degradation cost and extend the battery’s lifetime. This proposed
(3) (2021) 2276–2286.
method leverages the performance of the state-of-the-art SAC DRL in [9] L. Liu, et al., Managing battery aging for high energy availability in green
combination with the battery aging model, which is designed using the datacenters, IEEE Trans. Parallel Distrib. Syst. 28 (12) (2017) 3521–3536.
battery aging index. [10] M.A. Ortega-Vazquez, Optimal scheduling of electric vehicle charging and
vehicle-to-grid services at household level including battery degradation and
The proposed methods are implemented in an actual battery test
price uncertainty, IET Gener. Transm. Distrib. 8 (6) (2014) 1007–1016.
and contribute to real-time scheduling implementation. The proposed [11] C. Zhou, et al., Modeling of the cost of EV battery wear due to V2G application
method’s performance was evaluated by performing various case study in power systems, IEEE Trans. Energy Convers. 26 (4) (2011) 1041–1050.
to verify its adaptability in various situations; additionally, the aging [12] B. Xu, et al., Factoring the cycle aging cost of batteries participating in electricity
markets, IEEE Trans. Power Syst. 33 (2) (2018) 2248–2259.
cycle test shows that BESS management considering SOC/DOD condi-
[13] Yan, et al., Deep reinforcement learning-based optimal data-driven control of
tions can extend the battery’s lifetime. Furthermore, optimization for battery energy storage for power system frequency support, IET Gener. Transm.
DOD could become even more important when long-term operation of Distrib. 14 (25) (2020) 6071–6078.
the BESS is considered. The proposed approach is expected to be more [14] S.-J. Park, et al., Depth of discharge characteristics and control strategy to
optimize electric vehicle battery life, J. Energy Storage 59 (2023) 106477.
economical because long-term operation must also include long-term
[15] R.D. Deshpande, et al., Physics inspired model for estimating ‘cycles to failure’
maintenance/replacement costs due to battery aging. Therefore, it is as a function of depth of discharge for lithium ion batteries, J. Energy Storage
necessary to operate the BESS in an optimized DOD range to avoid 33 (2021) 101932.
increasing costs and capacity loss due to aging. [16] M. Eskandari, et al., Battery energy storage systems (BESSs) and the economy-
dynamics of microgrids: Review, analysis, and classification for standardization
of BESSs applications, J. Energy Storage 55 (Part B) (2022) 105627.
CRediT authorship contribution statement [17] S. Lee, D.H. Choi, Reinforcement learning-based energy management of smart
home with rooftop solar photovoltaic system, Sensors 19 (18) (2019) 3937.
[18] Y. Du, F. Li, Intelligent multi-microgrid energy management based on deep neural
Seon Hyeog Kim: Methodology, Writing – original draft, Valida- network and model-free reinforcement learning, IEEE Trans. Smart Grid 11 (2)
tion, Investigation, Experiments, Visualization, Reviewing and writing. (2020) 1066–1076.
Yong-June Shin: Resource, Funding acquisition, Conceptualization, [19] Z. Wan, et al., Model-free real-time EV charging scheduling based on deep
Supervision, Reviewing & editing. reinforcement learning, IEEE Trans. Smart Grid 10 (5) (2019) 5246–5257.
[20] J. Cao, et al., Deep reinforcement learning based energy storage arbitrage with
accurate lithium-ion battery degradation model, IEEE Trans. Smart Grid 11 (5)
Declaration of competing interest (2020) 4513–4521.
[21] B. Zhang, et al., Soft actor-critic–based multi-objective optimized energy con-
version and management strategy for integrated energy systems with renewable
The authors declare the following financial interests/personal rela- energy, Energy Convers. Manag. 243 (2021) 114381.
tionships which may be considered as potential competing interests: [22] S. Wang, et al., Deep reinforcement scheduling of energy storage systems for real-
Seon Hyeog Kim reports financial support was provided by Yonsei time voltage regulation in unbalanced LV networks with high PV penetration,
IEEE Trans. Sustain. Energy 12 (4) (2021) 2342–2352.
University. Seon Hyeog kim reports a relationship with Korea Institute
[23] J. Wu, et al., Battery thermal- and health-constrained energy management for
of Energy Technology Evaluation and Planning that includes: funding hybrid electric bus based on soft actor-critic DRL algorithm, IEEE Trans. Ind.
grants. Seon Hyeog Kim has patent pending to Seon Hyeog Kim. Electron. 17 (6) (2021) 3751–3761.

10
S.H. Kim and Y.-J. Shin Journal of Energy Storage 73 (2023) 109144

[24] Korea electric power corporation (KEPCO), 2020, [Online]. Available: https: [28] S. Kim, et al., Deep learning based on multi-decomposition for short-term load
//home.kepco.co.kr. forecasting, Energies 11 (12) (2018) 3433.
[25] T.A. Lehtola, A. Zahedi, Electric vehicle battery cell cycle aging in vehicle to [29] Z. Wang, et al., Dueling network architectures for deep reinforcement learning,
grid operations: a review, IEEE Trans. Emerg. Sel. Topics Power Electron. 9 (1) in: Proc. Int. Conf. Learning Representations, 2016.
(2021) 423–437. [30] V. Bui, et al., Double deep 𝑄-learning-based distributed operation of battery
[26] C. Ju, et al., A two-layer energy management system for microgrids with hybrid energy storage system considering uncertainties, IEEE Trans. Smart Grid 11 (1)
energy storage considering degradation costs, IEEE Trans. Smart Grid 9 (6) (2020) 457–469.
(2018) 6047–6057. [31] Y. Ye, et al., Model-free real-time autonomous control for a residential multi-
[27] E. Wikner, T. Thiringer, Extending battery lifetime by avoiding high SOC, Appl. energy system using deep reinforcement learning, IEEE Trans. Smart Grid 11 (4)
Sci. 8 (2018) 1825–1840. (2020) 3068–3082.

11

You might also like