Article
Controlling Algorithm of Reconfigurable Battery for State of
Charge Balancing Using Amortized Q-Learning
Dominic Karnehm 1,*, Wolfgang Bliemetsrieder 2, Sebastian Pohlmann 1 and Antje Neve 1
1 Electrical Engineering and Technical Informatics Department, University of the Bundeswehr Munich,
85577 Neubiberg, Germany; [email protected] (S.P.); [email protected] (A.N.)
2 Electrical Power Systems and Information Technology, University of the Bundeswehr Munich,
85577 Neubiberg, Germany; [email protected]
* Correspondence: [email protected]
Abstract: In the context of the electrification of the mobility sector, smart algorithms have to be
developed to control battery packs. Smart and reconfigurable batteries are a promising alternative
to conventional battery packs and offer new possibilities for operation and condition monitoring.
This work proposes a reinforcement learning (RL) algorithm to balance the State of Charge (SoC)
of reconfigurable batteries based on the topologies half-bridge and battery modular multilevel
management (BM3). As an RL algorithm, Amortized Q-learning (AQL) is implemented, which
enables the control of enormous numbers of possible configurations of the reconfigurable battery
as well as the combination of classical controlling approaches and machine learning methods. This
enhances the safety mechanisms during control. As the neural network of the AQL, a Feedforward Neural Network (FNN) consisting of three hidden layers is implemented. The experimental
evaluation using a 12-cell hybrid cascaded multilevel converter illustrates the applicability of the
method to balance the SoC and maintain the balanced state during discharge. The evaluation shows
a 20.3% slower balancing process compared to a conventional approach. Nevertheless, AQL shows
great potential for multiobjective optimizations and can be applied as an RL algorithm for control in
power electronics.
Keywords: reconfigurable battery; neural network; SoC balancing; reinforcement learning; amortized Q-learning
1. Introduction
The balancing of the State of Charge (SoC) in battery packs is one of the main challenges in the field of Electric Vehicles (EVs). Therefore, many types of balancing methods have been proposed, and a distinction is made between active and passive balancing methods. Passive methods are dissipative: battery cells with a higher SoC are discharged through a shunt resistor [1]. This method is cost-effective to implement, but it reduces the efficiency of the battery pack. Active equalization circuits can be used in battery packs to transfer energy from cells with higher charge to those with lower charge [2–4]. Van et al. [5] propose an SoC balancing algorithm for battery cells connected in series, in which a modified bidirectional Cuk converter transfers energy from cells with a higher SoC to cells with a lower one. The effectiveness of this algorithm has been demonstrated during the discharge and relaxation of the battery pack. In the literature, different Reinforcement Learning (RL) algorithms have been discussed to optimize system operation parameters, for instance, balancing the capacitor voltage and the thermal stress [6] of a modular multilevel converter (MMC) or optimizing the efficiency of the DC-DC converter of the dual active bridge (DAB) [7,8].
Reconfigurable batteries allow for the active balancing of different battery parameters, such as voltage [9], State of Temperature (SoT) [10], and SoC [11–13]. To address the
problem of SoC balancing in reconfigurable batteries and the modular multilevel inverter
(MMI), various algorithms have been suggested. Several centralized algorithms have
been proposed to balance the SoC during battery relaxation [12,13] and during the charge
and discharge of the reconfigurable battery [11]. To reduce the communication required
between the main controller and the individual modules, decentralized algorithms have
been introduced [14–17]. Furthermore, machine-learning-based algorithms have shown
possible benefits in the field of controlling reconfigurable batteries. Jiang et al. [18] propose
a Deep Q-Network (DQN) RL algorithm to control a reconfigurable battery using the three-switch Battery Modular Multilevel Management (BM3) topology [19]. Stevenson et al. [20] use a
DQN to reduce the imbalance of the SoC and the current of a reconfigurable battery with
four battery cells. The authors demonstrated the potential to increase economic viability and enhance battery operation, which is crucial in the context of the sustainability of EVs.
Mashayekh et al. [21] introduce a decentralized online RL algorithm to control an SoC-balanced modular reconfigurable battery converter based on the half-bridge converter topology. The focus of this method is to reduce the communication between the controller and the modules, as well as the intercommunication between the modules, to a minimum. To this end, an algorithm based on game theory is implemented, in which each module tries to maximize its own reward. Owing to the design of each module's reward function, the reward of the entire system increases when every individual module maximizes its own reward. The algorithm shows high usability for a balanced system.
In the case of an unbalanced initial state, the authors suggest the implementation of other
algorithms to balance first, followed by the usage of the decentralized algorithm to reduce
the communication requirements during the balanced state.
Yang et al. [22] propose an online-learning DQN algorithm to balance the SoC of a
reconfigurable battery with 64 cells and a predefined voltage output. For this, the authors
used a neural network with one hidden layer.
The multiparameter balancing of the SoC and the State of Health (SoH) with an offline DQN algorithm has been introduced in [23]. The authors implemented a DQN
algorithm on a direct current (DC) reconfigurable battery using a half-bridge topology with
10 modules.
To balance the SoC of an alternating current (AC) reconfigurable battery, a Q-learning algorithm has been proposed in [24]. The algorithm is restricted by the number of controllable modules in the system; nevertheless, Q-learning has shown potential for controlling reconfigurable batteries. Beyond multiparameter optimization, the limitation of the Q-learning algorithm regarding the possible number of controlled modules has to be addressed. This work proposes an algorithm based on Amortized Q-learning (AQL)
that addresses this problem. Additionally, the proposed algorithm allows for the combina-
tion of classical balancing algorithms and machine learning approaches. As a result, the
positive aspects of both approaches can be utilized. Among others, this includes the safety
of a classical approach and the flexibility and adaptation possibilities of machine learning
methods. Furthermore, the algorithm can be applied to AC and DC reconfigurable batteries.
To the best of the authors’ knowledge, this work introduces the first reinforcement learning
algorithm that enables the combination with classical algorithms to control reconfigurable
batteries with variable voltage levels.
2. Reconfigurable Battery
This paper discusses two topologies of MMC for reconfigurable batteries: the half-
bridge [9] and the BM3 [19] converter. In Figure 1, the circuits for both converters are
shown. For a reconfigurable battery, multiple modules are interconnected with each other.
A half-bridge converter module includes a battery and two MOSFET switches, $S_1$ and $S_2$, and can be switched to series and bypass modes. A BM3 module includes a battery and three MOSFETs, $S_1$, $S_2$, and $S_3$, and can be switched into series, parallel, or bypass mode.
Figure 1. Electrical circuit of a half-bridge converter module (a) and a BM3 converter module (b).
Table 1 shows the three switchable modes. The two states of the MOSFET switches
are defined as follows: on-state with a fixed resistance of 0.55 mΩ and off-state with an
infinite resistance. The resistance value of the MOSFET is based on the resistance Ron of the
switches installed on the device under test (DUT) during the experimental evaluation.
Table 1. Switch states of the MOSFETs for BM3 and half-bridge, modeled as electrical resistors.

              S1    S2    S3
Half-Bridge
  Bypass      on    off   -
  Series      off   on    -
BM3
  Bypass      on    off   off
  Series      off   on    off
  Parallel    on    off   on
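For illustration, the switch patterns of Table 1 can be captured in a small lookup structure; this is a sketch for clarity, not code from the paper (None marks S3, which the half-bridge module does not have):

```python
# Switch states from Table 1 as a lookup table; each entry gives the
# on/off state of (S1, S2, S3).
SWITCH_STATES = {
    "half-bridge": {
        "bypass": (True, False, None),
        "series": (False, True, None),
    },
    "bm3": {
        "bypass":   (True, False, False),
        "series":   (False, True, False),
        "parallel": (True, False, True),
    },
}
```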
In this work, the SoC is calculated by Coulomb counting, where the SoC at time step $t + \Delta t$ is defined as:

$$SOC(t + \Delta t) = SOC(t) - \int_{0}^{\Delta t} \frac{i_B}{Q_0}\, dt, \quad (1)$$
where $i_B$ is the current of the battery, $Q_0$ is the capacity of the cell, and $SOC(0)$ is the initial SoC of the battery cell.
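A discrete-time sketch of (1) might look as follows, assuming the battery current $i_B$ is constant over one step and the capacity $Q_0$ is given in ampere-seconds:

```python
# One explicit Euler step of the Coulomb counting in (1):
# SOC(t + dt) = SOC(t) - (i_B / Q_0) * dt
def soc_step(soc: float, i_b: float, q0: float, dt: float) -> float:
    """soc: current SoC (0..1), i_b: battery current in A,
    q0: cell capacity in As, dt: step size in s."""
    return soc - (i_b / q0) * dt
```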
Van et al. [5] explain that SoC balance is achieved by equalizing the SoC of each cell so that the difference between each cell's SoC and the overall average SoC is minimized. It is defined as follows:

$$\min \sum_{k=0}^{N} \left( SOC_k - MSOC \right)^2, \qquad MSOC := \frac{1}{N} \sum_{k=0}^{N} SOC_k \quad (2)$$
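The objective in (2) can be evaluated directly, for example with NumPy; the function below is an illustrative sketch:

```python
import numpy as np

# Balancing objective from (2): sum of squared deviations of each
# cell's SoC from the pack mean MSOC.
def imbalance(socs: np.ndarray) -> float:
    msoc = socs.mean()  # MSOC, the mean SoC over all cells
    return float(((socs - msoc) ** 2).sum())
```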
Based on the state and action, the environment returns a reward $r_t := r(a_t, s_t)$, determined by the reward function, and the next state $s_{t+1}$ [26,27].
Figure 2. Interaction between agent and environment in reinforcement learning [25].
Generally, the actions an agent can perform are consistent across all states. A vector $a$ represents an element of the action space $A$ [26]. The objective of the proposed algorithm is to combine classical control methods with the possibilities provided by an RL algorithm. In the classical DQN approach, the neural network takes the state of the environment as input and outputs the action to take. This method does not scale to large discrete action spaces and restricts the action space. For the use case of operating a reconfigurable battery, control of the output voltage is necessary. Additionally, the option of excluding single modules from use is necessary for safety. Van de Wiele et al. [28] introduced Amortized Q-learning (AQL), an RL algorithm for enormous action spaces. This approach applies Q-learning to high-dimensional or continuous action spaces: the costly maximization over all actions $a$ is replaced by the maximization over a smaller subset of candidate actions sampled from a learned proposal distribution.
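A minimal sketch of this amortized maximization is shown below, assuming a Q-network `q_net` that scores one state-action pair per forward pass and a learned `proposal` distribution over actions; both names are illustrative, not taken from [28]:

```python
import torch

# Approximate argmax_a Q(s, a) over a sampled subset of actions,
# as in Amortized Q-learning.
def select_action(q_net, proposal, state, n_samples: int = 64):
    """state: tensor of shape (1, state_dim)."""
    candidates = proposal.sample((n_samples,))        # sampled action subset
    states = state.expand(n_samples, -1)              # repeat state per candidate
    pairs = torch.cat([states, candidates], dim=1)    # (state, action) inputs
    q_values = q_net(pairs).squeeze(-1)               # one Q-value per candidate
    return candidates[q_values.argmax()]              # best sampled action
```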
The number of all possible actions is denoted by $W$. An action is defined as the vector of switching states $m$ of the modules of the reconfigurable battery pack:

$$a := \left[ m_1, m_2, \ldots, m_N \right], \quad (7)$$
where $N$ is the number of modules in the system. The switching state $m^{(k)}$ of module $k$ is described as follows:

$$m^{(k)} = \begin{cases} 0 & \text{if Bypass} \\ 1 & \text{if Serial} \\ 2 & \text{if Parallel} \end{cases} \quad (8)$$
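As a small illustration of (7) and (8), an action can be encoded as a tuple of per-module modes; this is a sketch, not the paper's implementation:

```python
from enum import IntEnum

# Switching state m^(k) of a module, as defined in (8).
class Mode(IntEnum):
    BYPASS = 0
    SERIAL = 1
    PARALLEL = 2

# An action per (7) is the vector of module modes, e.g. for N = 4:
action = (Mode.SERIAL, Mode.BYPASS, Mode.PARALLEL, Mode.SERIAL)
```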
Furthermore, the action space $A$ contains the subsets $A_V^{v(t)}$ to control the voltage levels. The number of battery cells switched to serial mode in order to generate the required voltage level at time $t$ is $v(t)$. Additionally, $A_C$ can restrict the action space $A$. The set $A_C$ contains excluded actions. It is possible to disable a collection of defined actions to allow for the combination of algorithmic and machine-learning-based control. An exemplary cause for such a restriction can be a broken MOSFET switch or a battery cell that is about to overheat. Consequently, the action space $A_t$ at time $t$ can be defined as:

$$A_t = A_V^{v(t)} - A_C. \quad (9)$$
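For a half-bridge pack, the restricted action space of (9) could be built as sketched below, where $A_V^{v(t)}$ contains all configurations with exactly $v(t)$ modules in series and `excluded` plays the role of $A_C$; this is illustrative code, not from the paper:

```python
from itertools import combinations

# A_t = A_V^{v(t)} - A_C for a half-bridge pack (modes: 0 = bypass,
# 1 = series); `excluded` is the set A_C of disallowed actions.
def restricted_action_space(n_modules: int, v_t: int, excluded: set):
    a_v = []
    for series_idx in combinations(range(n_modules), v_t):
        a_v.append(tuple(1 if k in series_idx else 0 for k in range(n_modules)))
    return [a for a in a_v if a not in excluded]
```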
In the reward function, $\gamma$ is a bias value to increase the effect of the reward during training. It ensures that the reward is not too small and does not interfere with the optimization of the neural network.
Figure 3. Architecture of the neural network: the normalized SoC values $SOC_{norm}^1, \ldots, SOC_{norm}^N$ and the switching states $a_B^1, \ldots, a_B^N$ of a candidate action serve as inputs to the hidden layers, and the action with the maximal Q-value output is selected (argmax).
Figure 3 illustrates the architecture of the network, and the table below lists its layers. It is implemented in Python 3.8 utilizing the machine learning library PyTorch 2.0.1.
Layers            Model
Input Layer       Dense (24)
Hidden Layer 1    Dense (128), ReLU, Dropout (0.1)
Hidden Layer 2    Dense (64), ReLU, Dropout (0.1)
Hidden Layer 3    Dense (32), ReLU, Dropout (0.1)
Output Layer      Dense (1)
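A minimal PyTorch sketch matching this layer layout is given below; the interpretation of the 24 inputs as the twelve normalized SoCs plus the twelve module states of one candidate action is our assumption based on Figure 3:

```python
import torch
import torch.nn as nn

# FNN with three hidden layers as listed above; the single output is
# the estimated Q-value of the candidate action encoded in the input.
class QNetwork(nn.Module):
    def __init__(self, n_inputs: int = 24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 128), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```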
Figure 4 illustrates the reward for episodes during the training of both topologies. The
hardware used to train the model was a PowerEdge R750xa (Dell, Round Rock, TX, USA)
with an A40 GPU (Nvidia, Santa Clara, CA, USA) and a Xeon Gold 6338 CPU (Intel, Santa Clara, CA, USA). The total training times for the half-bridge and BM3 topologies amount to 16.4 h and 47.0 h, respectively. Training requires a simulation due to the high number of training runs; a real-world setup could not achieve the required number of runs in an acceptable training time. For evaluation during training, nine random initial SoCs are set for each battery cell. To ensure the reproducibility of the evaluation, a random seed is used. The evaluation reward is the sum of the rewards, as defined in (11), over 0.1 s of simulation time with a step size of $\Delta t = 10^{-5}$ s. Due to the different complexity of the action spaces, the two models do not require the same number of epochs: 3000 epochs were used for the half-bridge topology and 10,000 for the BM3 topology.
Figure 4. Reward over the training of the model for half-bridge (blue) and BM3 (orange) control.
5. Experimental Analysis
The usability of the proposed algorithm is analyzed and discussed based on the results
of the experiments for the different scenarios:
• The simulated balancing of a 12-cell BM3 converter system.
• The experimental evaluation with a 12-cell half-bridge converter system and a comparison with the balancing algorithm proposed by Zheng et al. [30].
Figure 5. Simulated SoC (a), voltage (blue) and current (orange) (b), and states (c) of a BM3 converter over time using the proposed AQL algorithm for 10 s with a step size of $\Delta t = 10^{-4}$ s.
Figure 5a illustrates the process of balancing the SoC. It shows a discharge of ap-
proximately 0.3% SoC of 12 BM3 modules. The zoom-in on the figure highlights the SoC
balancing of the cells: the initially unbalanced SoCs converge toward each other until the balanced state is reached. Voltage and current during
discharge can be seen in Figure 5b. The stepwise generation of the overall voltage caused
by the multilevel inverter can be detected. The trained model uses all possible switch states: serial, bypass, and parallel. In 65% of all time steps, at least one cell is switched to parallel mode. It can be concluded that the proposed model can be utilized to control a BM3 reconfigurable battery and balance the SoC over time.
• 12× battery cell simulator: NGM202 power supply (Rohde & Schwarz, Munich, Germany).
Figure 6. Experimental setup with a 12-cell hybrid cascaded multilevel converter as DUT.
A power supply unit is used for battery simulation to ensure a reproducible setup.
Furthermore, this enables each cell to be set to a randomly chosen initial SoC between 70% and 100%. The experiment ID is set as the seed for random value generation. During
the experiment, cells are discharged from the initial SoC to 20%. The computing unit is
dedicated to battery cell data processing and neural network execution. A Raspberry Pi 4 serves as the control unit, connected via the Controller Area Network (CAN) bus to the
DUT. The control unit, the computing unit, and the battery cell simulators are connected
over Ethernet for communication. The control unit sets the DUT switching states based on
the determined actions by the computing unit.
The balancing process can be seen in Figure 7a. The discharge occurs for 100 s. The
system reaches the balanced state after 71.9 s at an SoC of 54%. During validation, the
balanced state is defined as a maximum SoC deviation of 1% within the cells with a
capacity of 0.1 Ah. Besides the discharge time itself, an approximately linear taper of the discharge curves of the single cells can be detected.
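The balanced-state criterion used during validation can be expressed as a simple predicate; a minimal sketch:

```python
import numpy as np

# Balanced state as defined above: maximum SoC deviation within the
# cells of at most 1% (SoC values given in percent).
def is_balanced(socs: np.ndarray, tol: float = 1.0) -> bool:
    return float(socs.max() - socs.min()) <= tol
```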
Figure 7. SoC balancing using the proposed AQL algorithm (a) and the switching-max algorithm (b); balanced state after 71.9 s and 64.7 s (red) of discharge, respectively.
The balancing algorithm proposed by Zheng et al. [30] is also examined to evaluate the method proposed in this paper. The algorithm is defined as follows: a cell with a higher SoC can be discharged more, so such cells are used preferentially, and thus the difference in SoC is reduced. The $n$ cells with the highest SoC are switched on, where $n$ is the number of cells required to reach the requested voltage level. The algorithm is therefore referred to as switching-max in the following. The results of the different balancing methods can be seen in Figure 7b. The initial SoC of each cell is set identically to ensure the same conditions as for evaluating the proposed AQL algorithm. During the experiment, the switching-max algorithm reaches the balanced state, with an SoC difference below 0.1%, after 64.7 s at an SoC of 55.4%.
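Under these definitions, the switching-max strategy reduces to selecting the n highest-SoC cells; an illustrative sketch:

```python
import numpy as np

# Switching-max: switch the n cells with the highest SoC to series
# mode (1) and bypass the rest (0).
def switching_max(socs: np.ndarray, n: int) -> np.ndarray:
    states = np.zeros(len(socs), dtype=int)
    if n > 0:
        states[np.argsort(socs)[-n:]] = 1  # n cells with the highest SoC
    return states
```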
For a valid evaluation of the proposed method and a discussion of the results compared
to the algorithm proposed by Zheng et al. [30], the experiment was repeated 50 times with
different randomly generated initial SoCs of each battery cell. The discharge time required
to reach the balanced state of each experiment can be seen in Figure 8. In addition, it shows
the mean (solid line) and standard deviation (shaded region) of both methods. It can be observed that the switching-max algorithm balances on average 11.99 s, or 20.3%, faster than the proposed AQL algorithm. A mean balancing time of 70.9 s with a standard deviation of 11.9 s can be observed for the AQL method, while the switching-max algorithm requires 58.9 s ± 10.0 s for balancing.
Figure 8. Comparison of the time required to reach the balanced state using the switching-max algorithm and the proposed AQL algorithm over 50 experiments; the shaded area highlights the standard deviation around the mean discharge time (line).
6. Discussion
The general applicability of the proposed method is demonstrated by the evaluation
based on the simulation of a BM3 converter and the experimental setup of the half-bridge
converter. Both scenarios have shown the suitability for SoC balancing. Furthermore,
after reaching the balanced state, the cells are steadily discharged. The algorithm can be
utilized to balance an unbalanced state and to control the system during the balanced state.
The computational cost is a significant factor, particularly for the BM3 topology. With 12 modules, the voltage level with five cells connected in series has the highest number of valid configurations, 19,448 in total. Without any action constraints, the neural network must examine each configuration to determine the best option. The limitation of
the action space, proposed in (9), can help reduce computational cost and time. Furthermore,
to decrease the number of switching operations per time step, a dynamic action space can
be utilized. For this, the current switching configuration must be considered when limiting the action space, as sketched below.
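One possible realization of such a dynamic action space, assuming actions are encoded as mode tuples per (7), is to keep only candidates within a bounded switching distance of the current configuration; this is a sketch, not part of the evaluated method:

```python
# Keep only actions that change at most `max_switches` module states
# relative to the current configuration, reducing switching operations.
def dynamic_action_space(actions, current, max_switches: int = 2):
    def changes(a):
        return sum(m_new != m_old for m_new, m_old in zip(a, current))
    return [a for a in actions if changes(a) <= max_switches]
```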
It is not possible to adequately compare the proposed method to other RL algorithms,
such as [18,20,22], because they cannot be applied to different voltage levels. A sepa-
rate neural network would be required for each voltage level. A comparison between the proposed AQL SoC balancing algorithm and the switching-max algorithm proposed by Zheng et al. [30] shows that the AQL algorithm balances 12.0 s, or 20.3%, slower. This conclusion
was reached by comparing 50 individual experiments for each algorithm, as shown in
Figure 8. However, the main goal of this work is to introduce an algorithm using a machine
learning method that allows for the control of reconfigurable batteries with variable voltage
output. The algorithm shows applicability for different types of topologies. Furthermore,
the proposed algorithm enables the combination of classical algorithms with the proposed
AQL algorithm by allowing dynamic action spaces as introduced in (9).
Author Contributions: Conceptualization, D.K. and A.N.; methodology, D.K.; software, D.K.; vali-
dation, D.K. and W.B.; formal analysis, D.K. and W.B.; data curation, D.K.; writing—original draft
preparation, D.K.; writing—review and editing, W.B., S.P. and A.N.; visualization, D.K.; supervision,
A.N.; and funding acquisition, A.N. All authors have read and agreed to the published version of
the manuscript.
Funding: This research is funded by dtec.bw—Digitalization and Technology Research Center of the Bun-
deswehr, which we gratefully acknowledge. dtec.bw is funded by the European Union—NextGenerationEU.
Further, we acknowledge financial support by the University of the Bundeswehr Munich.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:

AC    Alternating Current
AQL   Amortized Q-Learning
BM3   Battery Modular Multilevel Management
CAN   Controller Area Network
DAB   Dual Active Bridge
DC    Direct Current
DQN   Deep Q-Network
DUT   Device Under Test
EV    Electric Vehicle
FNN   Feedforward Neural Network
MMC   Modular Multilevel Converter
MMI   Modular Multilevel Inverter
RL    Reinforcement Learning
SoC   State of Charge
SoH   State of Health
SoT   State of Temperature
References
1. Gallardo-Lozano, J.; Romero-Cadaval, E.; Milanes-Montero, M.I.; Guerrero-Martinez, M.A. A novel active battery equalization
control with on-line unhealthy cell detection and cell change decision. J. Power Sources 2015, 299, 356–370. [CrossRef]
2. Zhang, Z.; Zhang, L.; Hu, L.; Huang, C. Active cell balancing of lithium-ion battery pack based on average state of charge. Int. J.
Energy Res. 2020, 44, 2535–2548. [CrossRef]
3. Ghaeminezhad, N.; Ouyang, Q.; Hu, X.; Xu, G.; Wang, Z. Active Cell Equalization Topologies Analysis for Battery Packs: A
Systematic Review. IEEE Trans. Power Electron. 2021, 36, 9119–9135. [CrossRef]
4. Cao, Y.; Abu Qahouq, J.A. Hierarchical SOC Balancing Controller for Battery Energy Storage System. IEEE Trans. Ind. Electron.
2021, 68, 9386–9397. [CrossRef]
5. Van, C.N.; Vinh, T.N.; Ngo, M.D.; Ahn, S.J. Optimal SoC Balancing Control for Lithium-Ion Battery Cells Connected in Series.
Energies 2021, 14, 2875. [CrossRef]
6. Jung, J.H.; Hosseini, E.; Liserre, M.; Fernández-Ramírez, L.M. Reinforcement Learning Based Modulation for Balancing Capacitor
Voltage and Thermal Stress to Enhance Current Capability of MMCs. In Proceedings of the 2022 IEEE 13th International
Symposium on Power Electronics for Distributed Generation Systems (PEDG), Kiel, Germany, 26–29 June 2022; pp. 1–6.
[CrossRef]
7. Tang, Y.; Hu, W.; Cao, D.; Hou, N.; Li, Y.; Chen, Z.; Blaabjerg, F. Artificial Intelligence-Aided Minimum Reactive Power Control
for the DAB Converter Based on Harmonic Analysis Method. IEEE Trans. Power Electron. 2021, 36, 9704–9710. [CrossRef]
8. Tang, Y.; Hu, W.; Xiao, J.; Chen, Z.; Huang, Q.; Chen, Z.; Blaabjerg, F. Reinforcement Learning Based Efficiency Optimization
Scheme for the DAB DC–DC Converter with Triple-Phase-Shift Modulation. IEEE Trans. Ind. Electron. 2021, 68, 7350–7361.
[CrossRef]
9. Tashakor, N.; Li, Z.; Goetz, S.M. A generic scheduling algorithm for low-frequency switching in modular multilevel converters
with parallel functionality. IEEE Trans. Power Electron. 2020, 36, 2852–2863. [CrossRef]
10. Kristjansen, M.; Kulkarni, A.; Jensen, P.G.; Teodorescu, R.; Larsen, K.G. Dual Balancing of SoC/SoT in Smart Batteries Using
Reinforcement Learning in Uppaal Stratego. In Proceedings of the IECON 2023-49th Annual Conference of the IEEE Industrial
Electronics Society, Singapore, 16–19 October 2023; pp. 1–6. [CrossRef]
11. Mashayekh, A.; Kersten, A.; Kuder, M.; Estaller, J.; Khorasani, M.; Buberger, J.; Eckerle, R.; Weyh, T. Proactive SoC Balancing
Strategy for Battery Modular Multilevel Management (BM3) Converter Systems and Reconfigurable Batteries. In Proceedings
of the 2021 23rd European Conference on Power Electronics and Applications (EPE’21 ECCE Europe), Ghent, Belgium, 6–10
September 2021; pp. P.1–P.10. [CrossRef]
12. Huang, H.; Ghias, A.M.; Acuna, P.; Dong, Z.; Zhao, J.; Reza, M.S. A fast battery balance method for a modular-reconfigurable
battery energy storage system. Appl. Energy 2024, 356, 122470. [CrossRef]
13. Han, W.; Zou, C.; Zhang, L.; Ouyang, Q.; Wik, T. Near-fastest battery balancing by cell/module reconfiguration. IEEE Trans.
Smart Grid 2019, 10, 6954–6964. [CrossRef]
14. McGrath, B.P.; Holmes, D.G.; Kong, W.Y. A decentralized controller architecture for a cascaded H-bridge multilevel converter.
IEEE Trans. Ind. Electron. 2013, 61, 1169–1178. [CrossRef]
15. Xu, B.; Tu, H.; Du, Y.; Yu, H.; Liang, H.; Lukic, S. A distributed control architecture for cascaded H-bridge converter with
integrated battery energy storage. IEEE Trans. Ind. Appl. 2020, 57, 845–856. [CrossRef]
16. Pinter, Z.M.; Papageorgiou, D.; Rohde, G.; Marinelli, M.; Træholt, C. Review of Control Algorithms for Reconfigurable Battery
Systems with an Industrial Example. In Proceedings of the 2021 56th International Universities Power Engineering Conference
(UPEC), Middlesbrough, UK, 31 August–3 September 2021; pp. 1–6. [CrossRef]
17. Morstyn, T.; Momayyezan, M.; Hredzak, B.; Agelidis, V.G. Distributed control for state-of-charge balancing between the modules
of a reconfigurable battery energy storage system. IEEE Trans. Power Electron. 2015, 31, 7986–7995. [CrossRef]
18. Jiang, B.; Tang, J.; Liu, Y.; Boscaglia, L. Active Balancing of Reconfigurable Batteries Using Reinforcement Learning Algorithms.
In Proceedings of the 2023 IEEE Transportation Electrification Conference & Expo (ITEC), Detroit, MI, USA, 21–23 June 2023; pp. 1–6.
[CrossRef]
19. Kuder, M.; Schneider, J.; Kersten, A.; Thiringer, T.; Eckerle, R.; Weyh, T. Battery modular multilevel management (bm3) converter
applied at battery cell level for electric vehicles and energy storages. In Proceedings of the PCIM Europe Digital Days 2020;
International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management,
Nuremberg, Germany, 7–8 July 2020; pp. 1–8.
20. Stevenson, A.; Tariq, M.; Sarwat, A. Reduced Operational Inhomogeneities in a Reconfigurable Parallelly-Connected Battery Pack
Using DQN Reinforcement Learning Technique. In Proceedings of the 2023 IEEE Transportation Electrification Conference &
Expo (ITEC), Detroit, MI, USA, 21–23 June 2023; pp. 1–5. [CrossRef]
21. Mashayekh, A.; Pohlmann, S.; Estaller, J.; Kuder, M.; Lesnicar, A.; Eckerle, R.; Weyh, T. Multi-Agent Reinforcement Learning-Based
Decentralized Controller for Battery Modular Multilevel Inverter Systems. Electricity 2023, 4, 235–252. [CrossRef]
22. Yang, F.; Gao, F.; Liu, B.; Ci, S. An adaptive control framework for dynamically reconfigurable battery systems based on deep
reinforcement learning. IEEE Trans. Ind. Electron. 2022, 69, 12980–12987. [CrossRef]
23. Yang, X.; Liu, P.; Liu, F.; Liu, Z.; Wang, D.; Zhu, J.; Wei, T. A DOD-SOH balancing control method for dynamic reconfigurable
battery systems based on DQN algorithm. Front. Energy Res. 2023, 11, 1333147. [CrossRef]
24. Karnehm, D.; Pohlmann, S.; Neve, A. State-of-Charge (SoC) Balancing of Battery Modular Multilevel Management (BM3)
Converter using Q-Learning. In Proceedings of the 15th Annual IEEE Green Technologies (GreenTech) Conference, Denver, CO,
USA, 19–21 April 2023.
25. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 1st ed.; MIT Press: Cambridge, MA, USA, 1998.
26. Jang, B.; Kim, M.; Harerimana, G.; Kim, J.W. Q-Learning Algorithms: A Comprehensive Classification and Applications. IEEE
Access 2019, 7, 133653–133667. [CrossRef]
27. Mirchevska, B.; Hügle, M.; Kalweit, G.; Werling, M.; Boedecker, J. Amortized Q-learning with Model-based Action Proposals for
Autonomous Driving on Highways. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation
(ICRA), Xi’an, China, 30 May–5 June 2021; pp. 1028–1035. [CrossRef]
28. Van de Wiele, T.; Warde-Farley, D.; Mnih, A.; Mnih, V. Q-Learning in enormous action spaces via amortized approximate maximization. arXiv 2020, arXiv:2001.08116.
29. Karnehm, D.; Sorokina, N.; Pohlmann, S.; Mashayekh, A.; Kuder, M.; Gieraths, A. A High Performance Simulation Framework
for Battery Modular Multilevel Management Converter. In Proceedings of the 2022 International Conference on Smart Energy
Systems and Technologies (SEST), Eindhoven, The Netherlands, 5–7 September 2022; pp. 1–6. [CrossRef]
30. Zheng, Z.; Wang, K.; Xu, L.; Li, Y. A Hybrid Cascaded Multilevel Converter for Battery Energy Management Applied in Electric
Vehicles. IEEE Trans. Power Electron. 2014, 29, 3537–3546. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.