Smart Microgrid Optimization using Deep Reinforcement Learning by utilizing the Energy Storage Systems
Abstract—The need for power is expanding quickly, and as fossil fuels are depleted, alternatives based on renewable energy are emerging. In recent years, microgrids, which are based on distributed generation and storage systems, have been on an increasing trend. This generates various new opportunities, one of which is the capacity to monitor, control, and regulate the energy flow inside and outside of the microgrid. By making the microgrid's distributed energy resources (DERs) and energy storage components economically viable, with artificial intelligence and machine learning incorporated into the cost-optimization process, the demand for and growth of these technologies will be accelerated significantly. This paper focuses on employing reinforcement learning (RL) algorithms to control energy flow in an AC microgrid. By incorporating artificial intelligence and machine learning into the energy management system (EMS), the paper aims to optimize costs and facilitate the integration of renewable energy sources. The RL agent is designed to trade energy with the main grid, taking advantage of the energy storage system and achieving cost savings. The RL agent is tested using real spot price data from Norway in a simulation model that combines Python and MATLAB-Simulink for efficient co-simulation. Results show significant cost savings of around 14% for a simple model and 7.5% for a complex dynamic model.

Index Terms—Smart Microgrids, Reinforcement Learning, PPO, Co-Simulation, Energy Management System

I. INTRODUCTION

Contemporary civilization requires a reliable supply of electricity to consumers and prosumers with a high level of power quality. In many nations and areas, the structure of the power grid is constantly evolving, creating challenges with energy flow changes, capacity constraints, and expensive investment costs to modernize the power grid. With new technology come new possibilities, and for many years, grids have been digitalized to enable centralized monitoring and management of the electricity network [1]. This "smart" digitalized grid has been a reality for the grid's high-voltage segment for a substantial amount of time. In the coming years, grids will increasingly rely on renewable energy, and customers will benefit from smart technology such as electric vehicle chargers and smart meters [2]. This creates a possibility to increase digitization of the grid's distribution segment. Furthermore, distributed generation and microgrids have been integrated into the power grid to increase its reliability and efficiency [3]. Microgrids are defined by the US Department of Energy as a "group of interconnected loads and distributed energy resources within clearly defined electrical boundaries that act as a single controllable entity with respect to the grid" [4]. A microgrid can operate in grid-tied mode or in standalone mode using a backup power supply. One of the most used backup power supplies in microgrids is a battery storage system, because of its quick response and reliability in the face of any power disruption or blackout to support critical loads [5]. To benefit from this costly investment, it is important to utilize the battery in other avenues, such as peak shaving and electricity trading. However, the microgrid is a complex and dynamic system with several distributed generation units and loads, and due to the intermittent nature of renewable sources and the difficulty of accurately predicting power generation and consumption [6], effectively optimizing the energy flow using the battery is a challenging problem. Reinforcement learning (RL) is among the potential approaches used in the literature to solve this problem [7], [8]. This paper uses reinforcement learning by training agents that receive power measurements from renewable sources, the grid, and the load and perform actions that optimize the operational cost of the microgrid. The rest of the paper is structured as follows: Section II provides a brief introduction to reinforcement learning, Section III explores related work and illustrates the novelty of this paper, Section IV explains the modeling and implementation approach, Section V shows the results, and Section VI concludes the paper.

II. REINFORCEMENT LEARNING

A reinforcement learning system consists of four components: a policy, a reward, a value function, and, in certain situations, a model of the environment (the microgrid), as can be seen in Fig. 1. An RL system's policy is the mapping of observations/inputs to outputs/actions. A reward is a number that the RL agent receives in the subsequent time step after performing an action, and it should indicate the quality of that action. The RL agent's only purpose is to maximize the overall reward. To prevent the RL agent from prioritizing the immediate benefit above the cumulative gain over the long run, a value function is employed to estimate the future projected return. The last, optional, component is the environment model, which attempts to replicate the environment in order to anticipate its behavior, typically the next observation and reward. Such approaches are referred to as model-based, while model-free methods are purely empirical with no explicit model of the environment [9].
[Fig. 1 shows the structure of the RL system: the agent (policy and value function) interacts with the microgrid environment (ESS, PV, wind turbine, load, and grid), sending an action (the battery power reference P_ref) to the ESS and receiving a reward together with observations of P_Grid, P_Load, P_wind+pv, the electricity price, and the day-ahead electricity prices.]

Fig. 1. Structure of the RL agent used in the proposed microgrid

For very complex dynamic systems, it may become extremely difficult, if not impossible, to construct an agent that models the entire system mathematically and so performs the optimal actions consistently. Rather than describing these rules explicitly, RL will continually learn and update its parameters to improve performance. Even if the environment changes over time, the RL agent will learn to adapt. For this learning to occur, however, random exploration of the environment is key, and thus one of the most critical and challenging criteria to select is the trade-off between exploration and exploitation. If the agent's exploration parameter is set to a high value, it will select nearly random actions, but if it is set to a low value, it will take the first and best solution it discovers and get trapped in a local minimum.

III. RELATED WORK AND NOVELTY

Several works in the literature have explored the problem of optimizing microgrid operational cost using reinforcement learning. The Deep Q-Network (DQN) algorithm used in [10] improved the efficiency and cost of a microgrid located in Belgium. However, real-time prices were not used, and the energy cost was set at a constant 2C/kWh. The proposed scheme in [11] created three different consumption profiles based on the customers' requirements and obtained a highly profitable scheme using the DDPG algorithm, but the results only covered a short time horizon of some weeks, and the battery simply got discharged at the end for one of the schemes, which does not prove how the trained algorithm would work in the long term. The authors in [12] compared ten different RL algorithms on a scenario of 10 days. The data they used was from Finland, and they found that a modified version of the Advantage Actor-Critic algorithm (A3C++) performed best and that Proximal Policy Optimization (PPO) achieved negative results. Moreover, they proposed in the future work section that the day-ahead electricity spot prices should be added to the observations. In another example, [13] used RL to reduce the operational cost by 20.75%, but the agent was only tested on a simple mathematical model. This paper will show the impact of integrating an RL agent into a complex environment that simulates the effect of switching converters, reactive power, and primary and secondary control. Additionally, the use of real variable spot prices and weather data from Norway will be introduced, and different algorithms and pricing schemes will be investigated.

The following points summarize the novelty and contributions of this paper:

1) Development and training of a reinforcement learning agent capable of optimizing and reducing the operational cost of a microgrid using real electricity prices, real weather data, and load data from a Norwegian community.
2) Development of a full-scale complex model with power electronic converters controlled using model predictive control (MPC) to verify the performance of the RL agent and compare it with the simplified mathematical model used for training.

IV. MODELING AND IMPLEMENTATION

A. Modeling Approach

The chosen framework for training the RL agent is Python, mainly using the stable-baselines3 library. To train an RL agent efficiently, a simplified model that expresses the most relevant dynamics of the microgrid should be made to reduce the time needed for training. A complex model developed in Simulink is then used to test the efficacy of the trained RL agent.
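The paper does not detail the interface used to couple the Python agent with the Simulink models. As an illustration only, the sketch below assumes a coupling through the MATLAB Engine API for Python, with a hypothetical model name and block path; the actual co-simulation setup may differ.

```python
import matlab.engine

MODEL = "microgrid_complex"   # hypothetical Simulink model name

eng = matlab.engine.start_matlab()
eng.load_system(MODEL, nargout=0)
eng.set_param(MODEL, "SimulationCommand", "start", nargout=0)
eng.set_param(MODEL, "SimulationCommand", "pause", nargout=0)

def cosim_step(p_ref_kw: float) -> dict:
    """Write the agent's battery power reference, advance the paused simulation by one
    step, and read back logged signals (assumed to be saved to the MATLAB workspace)."""
    eng.set_param(f"{MODEL}/Battery/Pref", "Value", str(p_ref_kw), nargout=0)
    eng.set_param(MODEL, "SimulationCommand", "step", nargout=0)
    return {name: eng.eval(f"{name}.Data(end)") for name in ("Pgrid", "Pload", "Pres", "SOC")}
```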
1) Modeling the PV Modules: For the PV modules, the dynamics of the PV, inverter, and filter are neglected, since the main concern of a cost-optimization algorithm is the output power, not the frequency or voltage. Since the relationship between the solar irradiance and the power generation is positive with a high correlation coefficient [14], [15], the PV modules can simply be modeled using a linear equation as:

$P_{pv} = G \cdot P_{pv,rated}$  (1)

where $P_{pv}$ is the power output of the PV system, $G$ is the irradiance, and $P_{pv,rated}$ is the rated power of the PV modules.

2) Modeling of the Wind Turbine: The wind turbine power was modeled using the Cubic Power with Cut-off model, where the wind power is given by (2) [16], [17].

$P_{wind} = V_{wind}^{3}\,\dfrac{P_{nom}}{U_{nom}^{3}-U_{min}^{3}} - \dfrac{P_{nom}\,U_{min}^{3}}{U_{nom}^{3}-U_{min}^{3}}$  (2)
where $P_{wind}$ is the output power of the wind turbine, $V_{wind}$ is the wind speed in m/s, $P_{nom}$ is the nominal power of the wind turbine, $U_{nom}$ is the nominal wind speed in m/s, and $U_{max}$ and $U_{min}$ are the maximum and minimum operating wind speeds in m/s, respectively. It is worth noting that the output power will be zero if the wind speed is outside the operating range (less than $U_{min}$ or greater than $U_{max}$), and it will be limited to the nominal power $P_{nom}$ if the result from (2) is higher than that.
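As a compact illustration of (1) and (2), the sketch below implements the two generation models in Python; the rated powers and wind-speed limits are illustrative placeholders, not the parameters of the studied microgrid.

```python
def pv_power(irradiance: float, p_pv_rated: float) -> float:
    """Simplified PV model from (1): output scales linearly with the irradiance G (per unit)."""
    return irradiance * p_pv_rated


def wind_power(v_wind: float, p_nom: float, u_min: float, u_nom: float, u_max: float) -> float:
    """Cubic power with cut-off model from (2), with cut-in/cut-out and rated-power limits."""
    if v_wind < u_min or v_wind > u_max:          # outside the operating range -> no output
        return 0.0
    p = (v_wind**3 * p_nom - p_nom * u_min**3) / (u_nom**3 - u_min**3)
    return min(p, p_nom)                           # clamp at the nominal (rated) power


# Illustrative values only (not the paper's plant ratings)
print(pv_power(0.6, 50e3))                         # 30 kW at 60% of rated irradiance
print(wind_power(9.0, 100e3, 3.0, 12.0, 25.0))     # cubic region between cut-in and rated speed
```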
B. Modeling of the Energy Storage System

The energy storage system used is a lithium-ion battery, and it is modeled as a simple system with power as input and state of charge (SOC) as output. The sign of the power determines whether the battery is charging or discharging: a positive sign means it is charging, while a negative sign means it is discharging. The state of charge is calculated based on the efficiency of the battery, the battery capacity, and the charging/discharging power using (3).

$SOC(k+1) = SOC(k) + P(k) \times T \times \dfrac{\eta}{C_{Bat}}$  (3)

where $SOC(k+1)$ is the state of charge in the next time step, $SOC(k)$ is the current state of charge, $P(k)$ is the input power to the battery, $T$ is the timestep, i.e., the time span over which the battery is charged/discharged with the given power, $\eta$ is the efficiency of the battery, and $C_{Bat}$ is the total capacity of the battery in kWh.
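A minimal sketch of the SOC update in (3); the efficiency and capacity values are placeholders, and the SOC is expressed here as a fraction of the capacity rather than in percent.

```python
def soc_update(soc: float, p_bat_kw: float, dt_h: float = 1.0,
               eta: float = 0.95, c_bat_kwh: float = 200.0) -> float:
    """SOC update from (3): positive power charges the battery, negative power discharges it.
    Efficiency eta and capacity c_bat_kwh are illustrative placeholders."""
    soc_next = soc + p_bat_kw * dt_h * eta / c_bat_kwh
    return min(max(soc_next, 0.0), 1.0)            # keep the SOC physically bounded in [0, 1]
```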
C. Modeling of the Main Grid and Cost Calculation

The grid is modeled as an ideal power source that is able to supply and accept infinite power. Therefore, the grid power is simply the load power minus the summation of the wind, PV, and battery powers, to achieve power balance as shown in (4).

$P_{Grid} = P_{Load} - (P_{PV} + P_{Wind} - P_{Bat})$  (4)

Cost Calculation: The instantaneous electricity cost (denoted by $H$) during a given hour $n$ is simply the grid power multiplied by the spot price (denoted by $S$) at that hour, as shown in (5).

$H(n) = S(n) \times P_{Grid}(n)$  (5)

The cumulative annual cost (denoted by $J$) is then obtained by summing the instantaneous cost over the entire year (8760 hours), as shown in (6).

$J = \sum_{n=1}^{8760} H(n)$  (6)
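The cost model in (4)-(6) reduces to a few lines of Python; the sketch below assumes hourly series of spot prices (in NOK/kWh) and grid power (in kW), consistent with the hourly resolution used above.

```python
def grid_power(p_load: float, p_pv: float, p_wind: float, p_bat: float) -> float:
    """Power balance from (4): positive values are imports from the grid, negative are exports.
    A charging battery (p_bat > 0) increases the import."""
    return p_load - (p_pv + p_wind - p_bat)


def annual_cost(spot_price_nok_per_kwh, p_grid_kw) -> float:
    """Cumulative cost from (5)-(6): sum of hourly spot price times grid power (8760 samples/year)."""
    return sum(s * p for s, p in zip(spot_price_nok_per_kwh, p_grid_kw))
```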
D. The EMS Environment

Reinforcement learning problems are modeled as Markov Decision Processes, with an agent that interacts with the environment by sending actions and receiving observations and rewards. At each timestep, the agent carries out an action and receives the next state observations and a reward. The observation space is the set of observations received by the agent. In grid-connected mode, these observations are the grid power, the total renewable power generation (PV power + wind power), the load power, the SOC of the battery, and the day-ahead spot prices, i.e., the list of electricity prices for the next 24 hours, since the day-ahead prices are usually published one day in advance and the agent has access to that information. The action space consists of a single action, which is the power reference given to the battery. The observation space has 28 dimensions, while the action space has only one.
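The sketch below shows one way the observation and action spaces described above could be declared with the Gymnasium API used by stable-baselines3. The class name, bounds, scaling, episode length, and the placeholder helpers are assumptions; the paper's actual environment implementation may differ.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class MicrogridEMSEnv(gym.Env):
    """Hypothetical EMS environment sketch: a 28-dimensional observation (grid power,
    renewable power, load power, SOC, and 24 day-ahead spot prices) and a single
    continuous action (the battery power reference, normalized to [-1, 1])."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(28,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.t = 0

    def _get_obs(self):
        # Placeholder: real observations would come from the simplified model or the co-simulation.
        return np.zeros(28, dtype=np.float32)

    def _reward(self, p_ref_kw: float) -> float:
        # Placeholder: the actual reward follows (7)-(10) in Section IV-E.
        return 0.0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self._get_obs(), {}

    def step(self, action):
        p_ref_kw = float(action[0]) * 100.0        # scale to kW using the nominal power (assumed)
        reward = self._reward(p_ref_kw)
        self.t += 1
        terminated = self.t >= 8760                # one simulated year per episode (assumption)
        return self._get_obs(), reward, terminated, False, {}
```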
E. The Reward Function

The reward function has to be designed carefully based on the objectives of the system. For the energy management system, the objectives are as follows:

1) Keep the battery state of charge between 10% and 90% to preserve the health of the battery and avoid degradation.
2) Perform economic optimization by minimizing the annual electricity cost incurred by the microgrid using the battery.
3) Disable economic optimization when the state of charge is below 30%.

To address the first objective, a severe penalty of -10 is applied to discourage the battery's state of charge (SOC) from exceeding the operational limits, preventing it from discharging to 0% or charging to 100%. This is formally expressed in (7).

$R_{SOC} = \begin{cases} -10, & SOC > 90\% \;\|\; SOC < 10\% \\ 0, & 10\% \leq SOC \leq 90\% \end{cases}$  (7)

Achieving the second objective, which focuses on encouraging the agent to make beneficial decisions regarding power trading with the grid, proved challenging due to the intermittent nature of renewable sources. Rewarding or punishing based on power buying or selling is unreliable, as it depends heavily on environmental factors. Instead, the paper proposes a different approach: creating a baseline called the "No Action Cost," defined as the instantaneous electricity cost when the agent takes no action. This approach eliminates the impact of luck on the reward function, ensuring that the agent is rewarded only when its actions outperform doing nothing. We can then mathematically define the reward as the difference between the no-action instantaneous cost $H_n$ and the instantaneous cost with action $H_a$, as shown in (8).

$R_{EO} = \dfrac{H_n - H_a}{N}$  (8)

where $N$ is a constant used to normalize the error to ensure the value is not very high. It is worth noting that the reward is limited between -2 and 2 to avoid sharp spikes and ensure that the penalties remain significant.

The third objective is to make sure there is enough charge in the battery in case of grid disconnection. Without this, the agent would try to exploit the battery energy and sell as much power as possible, keeping the SOC at around 10%. We therefore added another term to the reward, which penalizes the agent based on the amount of grid power supplied or absorbed by the grid. This ensures that, when the SOC is too low, the agent will attempt to direct any excess energy from renewables to charge the battery rather than sell it to the grid, and it will also not discharge the battery into the grid to decrease the SOC further. The mathematical representation is found in (9).

$R_{G} = \dfrac{|P_{Grid}|}{P_{Nom}}$  (9)

where $P_{Grid}$ is the grid power when taking an action and $P_{Nom}$ is the nominal grid power, which is 100 kW in this case. The complete reward function is simply $R_{EO}$ minus the penalties, as shown in (10).

$R = R_{EO} - R_{SOC} - R_{G}$  (10)

where $R_{EO}$ is the reward from the economic optimization in (8), $R_{SOC}$ is the penalty that keeps the SOC within the acceptable limits from (7), and $R_{G}$ is the penalty from (9) that depends on the grid power when the SOC falls below 30%.
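A sketch of the reward terms (7)-(10) in Python. The normalization constant N is not given in the text, so a placeholder value is used; the SOC term is applied here with the sign that makes the -10 value of (7) act as a penalty, which appears to be the intent; and the grid term (9) is activated only below 30% SOC, per the third objective.

```python
def ems_reward(soc: float, h_no_action: float, h_with_action: float,
               p_grid_kw: float, n_norm: float = 1000.0, p_nom_kw: float = 100.0) -> float:
    """Reward sketch following (7)-(10); n_norm is an assumed placeholder for N."""
    # (7) SOC limit term: -10 outside the 10%-90% band, 0 inside it
    r_soc = -10.0 if (soc > 0.9 or soc < 0.1) else 0.0

    # (8) economic term relative to the "No Action Cost" baseline, clipped to [-2, 2]
    r_eo = (h_no_action - h_with_action) / n_norm
    r_eo = max(-2.0, min(2.0, r_eo))

    # (9) grid-exchange penalty, active only while the SOC is below 30%
    r_g = abs(p_grid_kw) / p_nom_kw if soc < 0.3 else 0.0

    # (10) total reward; r_soc is added so that its -10 value acts as a penalty
    return r_eo + r_soc - r_g
```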
F. The Learning Algorithm

There is a variety of reinforcement learning algorithms used in different problems, depending on the action space, the observation space, and the nature of the problem, but generally there are two approaches to solving reinforcement learning problems: either through the value function or directly through optimizing the policy. Actor-critic methods use both the value function and the policy to learn: the critic estimates the value function, and the actor updates the policy based on the value-function estimate provided by the critic [9]. The most widely used algorithm for microgrids in the literature is the Deep Q-Network (DQN), which was used in [13] and [18], while [19] used Q-learning and [20] used Double DQN, which are similar to DQN. Other algorithms such as Monte Carlo with a deep neural network [21] and DDPG [22] were used as well. In addition, policy-based algorithms have started gaining more traction: the Vanilla Policy Gradient (VPG) algorithm was employed in [23], and a comparison between Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) was done in [24]. The authors in [25] recently utilized the PPO algorithm for scheduling optimization and found that it displayed superior performance to DQN. Furthermore, [12] compared seven different algorithms and proposed improved versions of PPO and the Asynchronous Advantage Actor-Critic (A3C) that provided superior performance. After testing a variety of algorithms including DQN, DDPG, Advantage Actor-Critic (A2C), and PPO, we found that PPO delivered the best results, which is why we decided to use PPO for training the RL agents. PPO was first introduced in [26] and was found to be superior to many of the existing algorithms such as A2C and DQN. It is a variant of an algorithm called Trust Region Policy Optimization [27] and maintains many of its advantages while being much simpler to implement. It works by alternating between sampling data from the policy and performing several rounds of first-order optimization.
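Training with PPO from stable-baselines3 then only requires wrapping the environment; the hyperparameters shown below are the library defaults given for illustration, not the values tuned in this work, and MicrogridEMSEnv refers to the hypothetical sketch in Section IV-D.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = MicrogridEMSEnv()
check_env(env)                                     # verify the environment follows the Gymnasium API

model = PPO("MlpPolicy", env, learning_rate=3e-4, gamma=0.99, verbose=1)
model.learn(total_timesteps=500_000)               # illustrative training budget
model.save("ppo_microgrid_ems")

# Deployment: the trained policy maps observations to a battery power reference
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```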
G. Complex Model and Model Predictive Control

A model that takes the switching of the power electronics and all layers of a hierarchical control structure into account was created to test the behavior of the RL agent within this system with more advanced dynamics. In this model, the rated power of the components is the same, and all the primary controllers for the inverters use Finite Set Model Predictive Control (FS-MPC), which tries to keep the DC-link voltage constant while producing the maximum power using MPPT algorithms. The battery controller also uses FS-MPC, and it receives a power reference from the RL agent. The control of the three-phase PV inverter is based on the strategy using the αβ-transformation developed by [28], while the battery and wind turbine converters were implemented based on the strategy found in [29]. The cost function components of the PV inverter MPC controller are depicted in (11)-(13).

$f_1 = \left(I_{s\alpha}^{*}(k) - I_{s\alpha}(k+1)\right)^2 + \left(I_{s\beta}^{*}(k) - I_{s\beta}(k+1)\right)^2$  (11)

$f_2 = \left(V_{Cf\alpha}^{*}(k) - V_{Cf\alpha}(k+1)\right)^2 + \left(V_{Cf\beta}^{*}(k) - V_{Cf\beta}(k+1)\right)^2$  (12)

$f_3 = \left(I_{Lf\alpha}^{*}(k) - I_{Lf\alpha}(k+1)\right)^2 + \left(I_{Lf\beta}^{*}(k) - I_{Lf\beta}(k+1)\right)^2$  (13)

where $I_{s\alpha}^{*}(k)$ and $I_{s\beta}^{*}(k)$ are the αβ components of the reference grid current, $I_{s\alpha}(k+1)$ and $I_{s\beta}(k+1)$ are the αβ components of the predicted grid current, $V_{Cf\alpha}^{*}(k)$ and $V_{Cf\beta}^{*}(k)$ are the αβ components of the reference filter capacitor voltage, $V_{Cf\alpha}(k+1)$ and $V_{Cf\beta}(k+1)$ are the αβ components of the predicted filter capacitor voltage, $I_{Lf\alpha}^{*}(k)$ and $I_{Lf\beta}^{*}(k)$ are the αβ components of the reference filter inductor current, and $I_{Lf\alpha}(k+1)$ and $I_{Lf\beta}(k+1)$ are the αβ components of the predicted filter inductor current.

The overall cost function $F$ is the weighted sum of $f_1$, $f_2$, and $f_3$, as shown in (14):

$F = f_1 + \lambda_v f_2 + \lambda_i f_3$  (14)

where $\lambda_v$ and $\lambda_i$ are weighting factors for the filter capacitor voltage and filter inductor current cost functions. More details about these equations can be found in [28].

The battery and wind turbine MPC cost function depends on the control of the active and reactive powers, as displayed in (15):

$f = |P^{*} - P(k+1)| + |Q^{*} - Q(k+1)|$  (15)

where $P^{*}$ and $Q^{*}$ are the reference active and reactive powers generated by the EMS in the case of the battery and by the speed controller in the case of the wind turbine, and $P(k+1)$ and $Q(k+1)$ are the predicted active and reactive powers obtained using the method described in [29].
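The following sketch illustrates how an FS-MPC controller evaluates the cost in (11)-(14) over the finite set of switching states and applies the one with the lowest cost; the prediction function is a hypothetical placeholder standing in for the converter and filter models of [28].

```python
def fsmpc_select(candidates, refs, predict, lambda_v: float = 0.1, lambda_i: float = 0.1):
    """Pick the switching state minimizing F = f1 + lambda_v*f2 + lambda_i*f3 from (11)-(14).
    `predict(state)` is a placeholder for the prediction model of [28]; it must return the
    predicted alpha-beta pairs (i_s, v_cf, i_lf) at step k+1 for the candidate state."""
    best_state, best_cost = None, float("inf")
    for state in candidates:                       # e.g. the 8 switch states of a two-level inverter
        i_s, v_cf, i_lf = predict(state)
        f1 = (refs["i_s"][0] - i_s[0]) ** 2 + (refs["i_s"][1] - i_s[1]) ** 2
        f2 = (refs["v_cf"][0] - v_cf[0]) ** 2 + (refs["v_cf"][1] - v_cf[1]) ** 2
        f3 = (refs["i_lf"][0] - i_lf[0]) ** 2 + (refs["i_lf"][1] - i_lf[1]) ** 2
        cost = f1 + lambda_v * f2 + lambda_i * f3
        if cost < best_cost:
            best_state, best_cost = state, cost
    return best_state
```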
[Figure: implementation workflow — acquire and process the data, explore the data, and develop the Simulink phasor and complex models.]
V. RESULTS

[Fig. 3: cumulative electricity cost over one year (time in hours vs. price in thousand NOK), comparing the cumulative cost without EMS with the cost obtained in the phasor simulation, the complex model, and the Python model.]

Fig. 3. Comparison of the achieved cumulative cost savings in (%)

[Fig. 4: normalized spot price and SOC versus time in hours; (a) result for 1 year, (b) result for the last 1000 hours.]

Fig. 4. Normalized results for the spot price and the SOC

[Fig. 5: normalized spot price, SOC, and RES power versus time in hours over two days.]

Fig. 5. Normalized results for the spot price, $P_{RES}$, and the SOC for 2 random days

Fig. 5 shows the renewable power along with the SOC and the spot price for two days. There is a correlation between the RES production and the SOC, where the SOC follows the graph of the renewable power production.
VI. CONCLUSION

This paper introduced a data-driven optimization technique that utilizes RL, exploiting both real weather data and electricity prices, and tested it on a full-scale complex model. Key differences between the complex and simplified models included reactive power, switching devices, controllers, and component dynamics. The algorithm's performance was less successful in the complex environment, with minor variations in power production and communication latency between the Simulink and Python models. The PPO algorithm, applied to the microgrid, achieved cost savings of 13.7%, which reduced to 7.48% in the complex model due to changing spot prices. Using reinforcement learning algorithms such as PPO for energy optimization yields promising results, and training with simplified models is a valid approach for complex environments. This research contributes to understanding how reinforcement learning can enhance microgrid performance and reduce operational costs.

ACKNOWLEDGMENTS

This work was supported in part by EEA and Norway Grants financed by Innovation Norway: DOITSMARTER (2022/337335), ENERGEIA (2022/346660), and Increased Knowledge on RES and Energy Efficiency (2022/346705).

REFERENCES

[1] M. C. Falvo, L. Martirano, D. Sbordone, and E. Bocci, "Technologies for smart grids: A brief review," in 2013 12th International Conference on Environment and Electrical Engineering, IEEE, Wroclaw, Poland, 2013, pp. 369–375.
[2] S. Kakran and S. Chanana, "Smart operations of smart grids integrated with distributed generation: A review," Renewable and Sustainable Energy Reviews, vol. 81, pp. 524–535, 2018.
[3] Y. Yoldaş, A. Önen, S. Muyeen, A. V. Vasilakos, and I. Alan, "Enhancing smart grid with microgrids: Challenges and opportunities," Renewable and Sustainable Energy Reviews, vol. 72, pp. 205–214, 2017.
[4] Department of Energy Office of Electricity Delivery and Energy Reliability, "Summary report: 2012 DOE microgrid workshop," 2012. [Online]. Available: https://www.energy.gov/sites/prod/files/2012%20Microgrid%20Workshop%20Report%2009102012.pdf [Accessed: 2022-05-24].
[5] M. Faisal, M. A. Hannan, P. J. Ker, A. Hussain, M. B. Mansor, and F. Blaabjerg, "Review of energy storage system technologies in microgrid applications: Issues and challenges," IEEE Access, vol. 6, pp. 35143–35164, 2018.
[6] J. Del Ser, D. Casillas-Perez, L. Cornejo-Bueno, et al., "Randomization-based machine learning in renewable energy prediction problems: Critical literature review, new results and perspectives," Applied Soft Computing, p. 108526, 2022.
[7] E. O. Arwa and K. A. Folly, "Reinforcement learning techniques for optimal power control in grid-connected microgrids: A comprehensive review," IEEE Access, vol. 8, pp. 208992–209007, 2020.
[8] D. Zhang, X. Han, and C. Deng, "Review on the research and practice of deep learning and reinforcement learning in smart grids," CSEE Journal of Power and Energy Systems, vol. 4, no. 3, pp. 362–370, 2018.
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[10] V. François-Lavet, D. Taralla, D. Ernst, and R. Fonteneau, "Deep reinforcement learning solutions for energy microgrids management," in European Workshop on Reinforcement Learning (EWRL 2016), Barcelona, Spain, 2016.
[11] P. Chen, M. Liu, C. Chen, and X. Shang, "A battery management strategy in microgrid for personalized customer requirements," Energy, vol. 189, p. 116245, 2019.
[12] T. A. Nakabi and P. Toivanen, "Deep reinforcement learning for energy management in a microgrid with flexible demand," Sustainable Energy, Grids and Networks, vol. 25, p. 100413, 2021.
[13] Y. Ji, J. Wang, J. Xu, X. Fang, and H. Zhang, "Real-time energy management of a microgrid using deep reinforcement learning," Energies, vol. 12, no. 12, p. 2291, 2019.
[14] Y.-K. Wu, C.-R. Chen, and H. Abdul Rahman, "A novel hybrid model for short-term forecasting in PV power generation," International Journal of Photoenergy, vol. 2014, pp. 1–9, 2014.
[15] M. Abuella and B. Chowdhury, "Solar power probabilistic forecasting by using multiple linear regression analysis," in SoutheastCon 2015, IEEE, Fort Lauderdale, FL, USA, 2015, pp. 1–5.
[16] V. Thapar, G. Agnihotri, and V. K. Sethi, "Critical analysis of methods for mathematical modelling of wind turbines," Renewable Energy, vol. 36, no. 11, pp. 3166–3177, 2011.
[17] R. Chedid, H. Akiki, and S. Rahman, "A decision support technique for the design of hybrid solar-wind power systems," IEEE Transactions on Energy Conversion, vol. 13, no. 1, pp. 76–83, 1998.
[18] D. Domínguez-Barbero, J. García-González, M. A. Sanz-Bobi, and E. F. Sánchez-Úbeda, "Optimising a microgrid system by deep reinforcement learning techniques," Energies, vol. 13, no. 11, p. 2830, 2020.
[19] E. Foruzan, L.-K. Soh, and S. Asgarpoor, "Reinforcement learning approach for optimal distributed energy management in a microgrid," IEEE Transactions on Power Systems, vol. 33, no. 5, pp. 5749–5758, 2018.
[20] V.-H. Bui, A. Hussain, and H.-M. Kim, "Double deep Q-learning-based distributed operation of battery energy storage system considering uncertainties," IEEE Transactions on Smart Grid, vol. 11, no. 1, pp. 457–469, 2019.
[21] Y. Du and F. Li, "Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1066–1076, 2019.
[22] H. Bian, X. Tian, J. Zhang, and X. Han, "Deep reinforcement learning algorithm based on optimal energy dispatching for microgrid," in 2020 5th Asia Conference on Power and Electrical Engineering (ACPEE), IEEE, Tianjin, China, 2020, pp. 169–174.
[23] M. ELamin, F. Elhassan, and M. A. Manzoul, "Enhancing energy trading between different islanded microgrids: A reinforcement learning algorithm case study in northern Kordofan state," in 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), IEEE, Khartoum, Sudan, pp. 1–6.
[24] M. ELamin, F. Elhassan, and M. A. Manzoul, "Comparison of deep reinforcement learning algorithms in enhancing energy trading in microgrids," in 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), IEEE, Khartoum, Sudan, 2021, pp. 1–6.
[25] Y. Ji, J. Wang, J. Xu, and D. Li, "Data-driven online energy scheduling of a microgrid based on deep reinforcement learning," Energies, vol. 14, no. 8, p. 2120, 2021.
[26] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[27] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning, PMLR, Lille, France, 2015, pp. 1889–1897.
[28] C. Xue, D. Zhou, and Y. Li, "Hybrid model predictive current and voltage control for LCL-filtered grid-connected inverter," IEEE Journal of Emerging and Selected Topics in Power Electronics, vol. 9, no. 5, pp. 5747–5760, 2021.
[29] M. P. Akter, S. Mekhilef, N. M. L. Tan, and H. Akagi, "Model predictive control of bidirectional AC-DC converter for energy storage system," Journal of Electrical Engineering and Technology, vol. 10, no. 1, pp. 165–175, 2015.