Deep Reinforcement Learning For Smart Home Energy Management
Abstract
In this paper, we investigate an energy cost minimization problem for a smart home in the absence of a building thermal
dynamics model with the consideration of a comfortable temperature range. Due to the existence of model uncertainty, parameter
uncertainty (e.g., renewable generation output, non-shiftable power demand, outdoor temperature, and electricity price) and
temporally-coupled operational constraints, it is very challenging to design an optimal energy management algorithm for scheduling
Heating, Ventilation, and Air Conditioning (HVAC) systems and energy storage systems in the smart home. To address the
challenge, we first formulate the above problem as a Markov decision process, and then propose an energy management algorithm
based on Deep Deterministic Policy Gradients (DDPG). It is worth mentioning that the proposed algorithm does not require prior knowledge of the uncertain parameters or the building thermal dynamics model. Simulation results based on real-world traces
demonstrate the effectiveness and robustness of the proposed algorithm.
Index Terms
Smart home, energy management, deep reinforcement learning, energy cost, thermal comfort, energy storage systems, HVAC
systems
I. INTRODUCTION
As a next-generation power system, smart grid is typified by an increased use of information and communications technology
(e.g., Internet of Things) in the generation, transmission, distribution, and consumption of electrical energy. In smart grid
environment, there are many opportunities for saving the energy cost of smart homes, which evolve from traditional homes by adopting three components, i.e., internal networks, intelligent controls, and home automation [1]. For example, dynamic
electricity prices could be utilized to reduce energy cost by scheduling Energy Storage Systems (ESS) and thermostatically
controllable loads intelligently. As one kind of thermostatically controllable loads, Heating, Ventilation, and Air Conditioning
(HVAC) systems consume about 40% of total energy in a household [2], which results in energy cost concerns for smart home
owners. Since the primary purpose of HVAC systems is to maintain thermal comfort for the occupants, it is of great importance
to optimize the energy cost of smart homes without sacrificing thermal comfort.
In this paper, we investigate an energy optimization problem for a smart home with renewable energies, ESS, HVAC systems,
and non-shiftable loads (e.g., televisions) in the absence of a building thermal dynamics model. To be specific, our objective is
to minimize the energy cost of the smart home during a time horizon with the consideration of a comfortable indoor temperature
range. However, it is very challenging to achieve the above aim due to the following reasons. Firstly, it is often intractable to
obtain accurate dynamics of indoor temperature, which can be affected by many factors [3]. Secondly, it is difficult to know
the statistical distributions of all combinations of random system parameters (e.g., renewable generation output, power demand
of non-shiftable loads, outdoor temperature, and electricity price). Thirdly, there are temporally-coupled operational constraints
associated with ESS and HVAC systems, which means that the current action would affect the future decisions. To address
the above challenge, we propose a Deep Deterministic Policy Gradients (DDPG)-based energy management algorithm, which can make decisions about ESS charging/discharging power and HVAC input power based only on the current observation information.
The main contributions of this paper are summarized as follows.
• We investigate an energy cost minimization problem for smart homes in the absence of a building thermal dynamics model
with the consideration of a comfortable temperature range, energy exchange between the smart home and the utility grid,
ESS charging/discharging, HVAC input power adjustment, and parameter uncertainties. Then, we reformulate the problem
as a Markov Decision Process (MDP), where environment state, action and reward function are designed.
L. Yu, W. Xie, D. Xie, Y. Zou, Z. Sun, L. Zhang are with Key Laboratory of Broadband Wireless Communication and Sensor Network Technology of Ministry
of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, P. R. China.
D. Zhang is with Jiangsu Key Laboratory of Broadband Wireless Communication and Internet of Things, School of Internet of Things, Nanjing University of
Posts and Telecommunications, Nanjing 210003, P. R. China.
Y. Zhang is with the Department of Engineering, University of Leicester, Leicester LE1 7RH, U.K.
T. Jiang is with Wuhan National Laboratory for Optoelectronics, School of Electronic Information and Communications, Huazhong University of Science and
Technology, Wuhan 430074, P. R. China.
• We propose an energy management algorithm to jointly schedule the ESS and the HVAC system based on DDPG. Since the proposed algorithm makes decisions based only on the current environment state, it does not require prior knowledge of uncertain parameters or the building thermal dynamics model.
• Extensive simulation results based on real-world traces show that the proposed algorithm can save energy cost by 8.10%-
15.21% without sacrificing thermal comfort when compared with two baselines. Moreover, the robustness testing shows
that the proposed algorithm has the potential of providing a more efficient and practical tradeoff between maintaining
thermal comfort and reducing energy cost than an “optimal” strategy.
The remainder of this paper is organized as follows. In Section II, we introduce related works. In Section III, system model
and problem formulation are given. Then, we propose a DDPG-based energy management algorithm in Section IV and its
effectiveness is verified by simulation results in Section V. Finally, we conclude this paper and discuss future work in Section VI.
A. Model-based approaches
In [4], Angelis et al. presented a home energy management approach to minimize the energy cost related to task execution,
energy storage, energy selling and heat pump without violating the given comfortable temperature range and other constraints.
In [5], Fan et al. proposed an online home energy management scheme to minimize the energy cost associated with electric
water heaters and HVAC systems with the consideration of indoor temperature ranges. In [6], Zhang et al. developed a home
energy management strategy to minimize energy cost related to the HVAC load and deferrable loads without violating the
given comfortable temperature range. In [7], Pilloni et al. proposed a Quality of Experience (QoE)-aware smart home energy
management system to save energy cost while minimizing the annoyance perceived by users. In [8], Yu et al. proposed an
online home energy management algorithm to minimize the sum of energy cost and thermal discomfort cost (here, the thermal discomfort cost is a function of the deviation between the indoor temperature and the comfortable temperature level).
In [11], Franceschelli et al. proposed a heuristic approach to optimize the peak-to-average power ratio of a large population
of thermostatically controlled loads considering comfortable temperature ranges. Although some advances have been made in
the above-mentioned works, their approaches need to model building thermal dynamics with simplified mathematical models,
e.g., Equivalent Thermal Parameters (ETP) model.
A. ESS Model
Let Bt be the stored energy in the ESS at time slot t. Then, the ESS storage dynamics model is given by
B_{t+1} = B_t + η_c c_t + d_t/η_d, ∀ t, (1)
where ηc ∈ (0, 1] and ηd ∈ (0, 1] are the charging and discharging efficiency coefficients, respectively; ct and dt are ESS
charging power and discharging power, respectively. Here, ct and dt are assigned with different signs (i.e., ct ≥ 0 and dt ≤ 0),
which contributes to the design of the action in Section III-F.
Since ESS cannot be charged above its capacity B max or discharged below the minimal energy level B min , we have
B min ≤ Bt ≤ B max , ∀ t. (2)
Due to the existence of ESS charging and discharging rate limitations, we have
0 ≤ ct ≤ cmax , ∀ t, (3)
−dmax ≤ dt ≤ 0, ∀ t, (4)
where cmax and dmax are maximum charging and discharging power of the ESS, respectively.
To avoid the simultaneous ESS charging and discharging, we have
ct · dt = 0, ∀ t. (5)
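To make the ESS model concrete, the following minimal Python sketch applies (1)-(5) to advance the stored energy by one slot; it assumes a one-hour slot (so power and energy are numerically interchangeable), the efficiency values match the simulation setup later in the paper, and the capacity and rate limits are illustrative placeholders rather than the paper's settings.

```python
def ess_step(B_t, c_t, d_t, eta_c=0.95, eta_d=0.95,
             B_min=0.0, B_max=6.0, c_max=3.0, d_max=3.0):
    """Advance the ESS energy level by one slot according to (1)-(5).

    c_t >= 0 is the charging power and d_t <= 0 is the discharging power
    (opposite signs), and simultaneous charging/discharging is ruled out by (5).
    Capacity and rate limits here are illustrative, not the paper's settings.
    """
    assert 0.0 <= c_t <= c_max and -d_max <= d_t <= 0.0   # (3), (4)
    assert c_t * d_t == 0.0                               # (5)
    B_next = B_t + eta_c * c_t + d_t / eta_d              # (1)
    assert B_min <= B_next <= B_max                       # (2)
    return B_next

# Example: charge at 2 kW for one slot starting from 1 kWh stored.
print(ess_step(B_t=1.0, c_t=2.0, d_t=0.0))  # 1.0 + 0.95*2 = 2.9
```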
B. HVAC Model
The HVAC system can be dynamically adjusted to maintain thermal comfort of the occupants in the smart home. Since
thermal comfort depends on many factors (e.g., air temperature, mean radiant temperature, relative humidity, air speed, clothing
insulation, and metabolic rate), its representation is very complex. In existing studies, many modeling approaches and parameter
measurement methods associated with thermal comfort have been developed [16] [22]–[28]. Similar to [3]–[6], this paper uses
a comfortable temperature range as the representation of thermal comfort for simplicity, i.e.,
T min ≤ Tt ≤ T max , ∀ t, (6)
where T^min and T^max are the minimum and maximum comfort level, respectively.
In this paper, we consider an HVAC system with inverter in the smart home, i.e., the HVAC system can adjust its input
power e_t continuously [8]. Let e^max be the rated power of the HVAC system; then, we have
0 ≤ et ≤ emax , ∀ t. (7)
C. Power Balancing
To keep the power balance in the smart home, the aggregated power supply should be equal to the served power demand.
Then, we have
gt + pt − dt = bt + et + ct , ∀ t, (8)
where gt , pt , bt are power drawn from the utility grid, renewable generation output, and non-shiftable power demand,
respectively. If gt < 0, it means that energy from the smart home will be sold to the utility grid. Otherwise, the smart
home will purchase energy from the utility grid.
D. Cost Model
Let vt and ut be the buying and selling price of energy, respectively. Then, the energy cost of the smart home at time slot
t can be calculated by
C_{1,t} = (v_t − u_t)/2 · |g_t| + (v_t + u_t)/2 · g_t, ∀ t, (9)
where the intuition behind (9) is that just one variable gt is needed to reflect the behavior of electricity buying or selling. For
example, when gt ≥ 0, C1,t = vt gt . For the case gt < 0, C1,t = ut gt .
It is well known that frequent discharging or charging would do harm to the lifetime of the ESS. To capture this phenomenon,
ESS depreciation cost at time slot t is introduced as follows [29]
C2,t = ψ(|ct | + |dt |), ∀ t, (10)
where ψ denotes ESS depreciation coefficient in $/kW.
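As an illustration of (8)-(10), the sketch below derives the grid exchange g_t from the power balance and then evaluates the energy cost and the ESS depreciation cost. Variable names follow the text; the depreciation coefficient ψ and the numerical values in the example are purely illustrative, with the selling price set to 0.9 times the buying price as in the simulation setup.

```python
def grid_power(p_t, b_t, e_t, c_t, d_t):
    """Power drawn from the grid via the balance (8):
    g_t + p_t - d_t = b_t + e_t + c_t  =>  g_t = b_t + e_t + c_t + d_t - p_t
    (recall d_t <= 0, so discharging reduces the grid import)."""
    return b_t + e_t + c_t + d_t - p_t

def energy_cost(g_t, v_t, u_t):
    """Energy cost (9): equals v_t*g_t when buying (g_t >= 0), u_t*g_t when selling."""
    return 0.5 * (v_t - u_t) * abs(g_t) + 0.5 * (v_t + u_t) * g_t

def ess_depreciation(c_t, d_t, psi=0.01):
    """ESS depreciation cost (10); psi in $/kW is an illustrative value."""
    return psi * (abs(c_t) + abs(d_t))

# Example: surplus solar is sold back to the grid at u_t = 0.9 * v_t.
g = grid_power(p_t=4.0, b_t=1.0, e_t=1.0, c_t=0.0, d_t=0.0)   # g = -2.0 (selling)
print(energy_cost(g, v_t=0.2, u_t=0.18))                      # 0.18 * (-2.0) = -0.36
```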
The operational constraints associated with the ESS and the HVAC system are temporally coupled, which means that the current action would affect future decisions. To handle this "time-coupling" property, typical methods are based on
dynamic programming [8], which suffers from “the curse of dimensionality” problem. In this paper, we provide a way of solving
P1 without requiring the dynamics of indoor temperature and prior knowledge of random system parameters. In particular, we
reformulate the above-mentioned sequential decision making problem as a MDP problem. Then, we develop a DDPG-based
energy management algorithm for the problem.
F. MDP Formulation
In the smart home, the indoor temperature at next time slot is only determined by the indoor temperature, HVAC power input,
and environment disturbances (e.g., outdoor temperature and solar irradiance intensity) in the current time slot [6] [7] [30] [31].
Moreover, the ESS energy level at next time slot just depends on the current energy level and current discharging/charging
power according to (1), which is independent of previous states and actions. Thus, both ESS scheduling and HVAC control can be regarded as an MDP. In the following parts, we will formulate the sequential decision-making problem associated with smart home energy management as an MDP. It is worth noting that the MDP formulation is an approximate description of the smart home energy management problem since some components of the environment state may not be Markovian in practice, e.g., renewable generation output and electricity price. According to existing works [15] [32], even though the environment is
not strictly MDP, the corresponding problem can still be solved by reinforcement learning based algorithms empirically, which
is also validated by simulation results in this paper. For non-Markovian environment, many approaches could be adopted
to improve the performance of reinforcement learning based algorithms, e.g., approximate state [32] [33], recurrent neural
networks [34], gated end-to-end memory policy networks [35], and eligibility traces [33].
Fig. 2. The agent-environment interaction in the MDP.
A discounted MDP is formally defined as a five-tuple M = (S, A, P, R, γ), where S is the set of environment states and
A is the set of actions. P : S × A × S → [0, 1] is the transition probability function, which models the uncertainty in the
evolution of states of the system based on the action taken by the agent [36]. R : S × A → R is the reward function and
γ ∈ [0, 1] is a discount factor. In this paper, the agent denotes the learner and decision maker (i.e., the HEMS agent), while the environment comprises the objects outside the agent (e.g., renewable generators, non-shiftable loads, the ESS, the HVAC system, the utility grid, and indoor/outdoor temperatures). The interaction between the agent and the environment is depicted in
Fig. 2, where the HEMS agent observes environment state st and takes action at . Then, environment state becomes st+1 and
the reward Rt+1 is returned. In the following parts, we will design key components of the MDP, including environment state,
action and reward function.
1) Environment State: The environment state consists of seven kinds of information, i.e., renewable generation output pt ,
non-shiftable power demand bt , ESS energy level Bt , outdoor temperature Ttout , indoor temperature Tt , buying electricity price
vt , and time slot index in a day t′ (t′ = mod (t, 24)). Since selling electricity price ut is typically related to buying electricity
price vt (e.g., ut = δvt [37]–[39], δ is a constant), ut is not selected as a part of the environment state. For brevity, st is
adopted to describe the environment state, i.e., st = (pt , bt , Bt , Ttout , Tt , vt , t′ ).
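For concreteness, the seven state components can be packed into a fixed-order vector as sketched below. The min-max normalization anticipates the preprocessing function φ introduced in Section IV; the (min, max) ranges here are illustrative assumptions, whereas in practice they would be taken from the training traces.

```python
import numpy as np

# Fixed ordering of the seven state components: (p_t, b_t, B_t, T_t^out, T_t, v_t, t').
STATE_KEYS = ("p", "b", "B", "T_out", "T_in", "v", "hour")

# Illustrative (min, max) ranges used for min-max normalization to [0, 1].
STATE_RANGES = {
    "p": (0.0, 6.0), "b": (0.0, 5.0), "B": (0.0, 6.0),
    "T_out": (60.0, 110.0), "T_in": (60.0, 90.0), "v": (0.0, 0.5), "hour": (0.0, 23.0),
}

def make_state(p, b, B, T_out, T_in, v, t):
    """Build s_t = (p_t, b_t, B_t, T_t^out, T_t, v_t, t') with t' = t mod 24,
    normalized component-wise to [0, 1]."""
    raw = dict(p=p, b=b, B=B, T_out=T_out, T_in=T_in, v=v, hour=t % 24)
    return np.array([
        (raw[k] - STATE_RANGES[k][0]) / (STATE_RANGES[k][1] - STATE_RANGES[k][0])
        for k in STATE_KEYS
    ], dtype=np.float32)

print(make_state(p=1.2, b=0.8, B=3.0, T_out=95.0, T_in=74.0, v=0.2, t=38))
```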
2) Action: The aim of HEMS agent is to optimally decide the amount of energy exchange between the smart home and
the utility grid (i.e., gt ), ESS charging power (i.e., ct ), ESS discharging power (i.e., dt ), and HVAC input power et . After ct ,
dt , and et are jointly decided, gt can be known immediately according to (8). Therefore, the action of the MDP consists of
ESS charging/discharging power ct /dt and HVAC input power et . Since adopting ct and dt simultaneously would complicate
the design of the energy management algorithm, we use just one variable f_t, where the range of f_t is [−d^max, c^max]. When f_t ≥ 0, c_t = f_t and d_t = 0. When f_t ≤ 0, c_t = 0 and d_t = f_t. Therefore, constraints (3)-(5) can be guaranteed. To guarantee the feasibility of (1)-(2), 0 ≤ c_t ≤ min{c^max, (B^max − B_t)/η_c} when f_t ≥ 0, and max{−d^max, (B^min − B_t)η_d} ≤ d_t ≤ 0 when f_t ≤ 0. According to (7), the range of e_t is [0, e^max]. When the indoor temperature T_t is lower than T^min, e_t should be zero to avoid further temperature deviation. Similarly, when T_t > T^max, the feasible e_t should be positive. For brevity,
at is used to describe the action, i.e., at = (ft , et ).
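The mapping from the raw action a_t = (f_t, e_t) to feasible ESS and HVAC commands described above can be sketched as follows. The clipping mirrors constraints (1)-(7); the capacity, rate, and rated-power values are illustrative, while the comfort bounds match the cooling-mode setting used later in the simulations.

```python
def decode_action(f_t, e_t, B_t, *, c_max=3.0, d_max=3.0,
                  B_min=0.0, B_max=6.0, eta_c=0.95, eta_d=0.95,
                  e_max=2.0, T_t=None, T_min=66.2, T_max=75.2):
    """Map f_t in [-d_max, c_max] to (c_t, d_t) and clip e_t to [0, e_max].

    The ESS bounds enforce (1)-(5); the HVAC clipping follows the text:
    for a cooling HVAC, e_t is forced to zero when T_t < T_min.
    """
    if f_t >= 0.0:      # charging branch
        c_t = min(f_t, c_max, (B_max - B_t) / eta_c)
        d_t = 0.0
    else:               # discharging branch (d_t <= 0)
        c_t = 0.0
        d_t = max(f_t, -d_max, (B_min - B_t) * eta_d)
    e_t = min(max(e_t, 0.0), e_max)
    if T_t is not None and T_t < T_min:
        e_t = 0.0       # avoid further cooling below the comfort band
    return c_t, d_t, e_t

print(decode_action(f_t=-2.5, e_t=1.4, B_t=1.0, T_t=78.0))  # (0.0, -0.95, 1.4)
```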
3) Reward: According to the MDP theory in [33], the transition of the environment state from st−1 to st could be triggered
by the execution of at−1 . Finally, the reward Rt will be obtained. Since the aim of the agent is to minimize the total energy
cost while maintaining the comfortable temperature range, the corresponding reward consists of three parts, namely the penalty for energy cost, the penalty for ESS depreciation, and the penalty for temperature deviation. Since the energy cost of the smart home at slot t−1 is C_{1,t−1}, the first part of R_t can be represented by −C_{1,t−1}(s_{t−1}, a_{t−1}). Similarly, the second part of R_t can be described by −C_{2,t−1}(s_{t−1}, a_{t−1}). To maintain the comfortable temperature range,
the third part of Rt can be computed by −C3,t (st ), where
C_{3,t}(s_t) = [T_t − T^max]^+ + [T^min − T_t]^+, ∀ t, (12)
which means that C3,t = 0 if T min ≤ Tt ≤ T max . Otherwise, C3,t = Tt − T max if Tt > T max , and C3,t = T min − Tt if
Tt < T min .
Taking the three parts into consideration, the final reward function can be designed as follows:
R_t = −β(C_{1,t−1}(s_{t−1}, a_{t−1}) + C_{2,t−1}(s_{t−1}, a_{t−1})) − C_{3,t}(s_t),
where β denotes a positive weight coefficient in °C/$.
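Putting the three penalty terms together, a minimal sketch of the reward computation is given below. The comfort bounds match the simulation setting in °F; β and the numerical values in the example are illustrative.

```python
def comfort_penalty(T_t, T_min=66.2, T_max=75.2):
    """C3 in (12): positive part of the temperature deviation outside [T_min, T_max]."""
    return max(T_t - T_max, 0.0) + max(T_min - T_t, 0.0)

def reward(C1_prev, C2_prev, T_t, beta=1.0):
    """R_t = -beta * (C1_{t-1} + C2_{t-1}) - C3_t(s_t); beta weights cost against comfort."""
    return -beta * (C1_prev + C2_prev) - comfort_penalty(T_t)

# Example: $0.30 of energy cost, $0.02 of ESS depreciation, indoor temperature 77 °F.
print(reward(C1_prev=0.30, C2_prev=0.02, T_t=77.0))   # -1.0*0.32 - 1.8 ≈ -2.12
```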
4) Action-Value Function: When jointly controlling the ESS and the HVAC system at time slot t, the HEMS agent intends
to maximize the expected return it receives over the future. In particular, the return is defined as the sum of the discounted rewards [33], i.e., R_t = Σ_{i=1}^{∞} γ^{i−1} R_{t+i}. Let Q^π(s, a) be the action-value function under a policy π (note that a policy is a mapping from states to probabilities of selecting each possible action), which represents the expected return if action a_t = a is taken in state s_t = s under the policy π. Then, the optimal action-value function Q^*(s, a) = max_π Q^π(s, a) and can be calculated by the following Bellman optimality equation in a recursive manner, i.e.,
Q^*(s, a) = E[R_{t+1} + γ max_{a′} Q^*(s_{t+1}, a′) | s_t = s, a_t = a]
         = Σ_{s′,r} P(s′, r | s, a)[r + γ max_{a′} Q^*(s′, a′)],
where s′ ∈ S, r ∈ R, a′ ∈ A, and P(s′, r | s, a) ∈ P.
To obtain Q∗ (s, a), system state transition probabilities P (s′ , r|s, a) are required. Since indoor temperature in the smart
home could be affected by many disturbances, it is difficult to accurately obtain state transition probabilities. To overcome this
challenge, Q-learning methods could be used, which do not require the knowledge of state transition probabilities. To support
the case with continuous system states, a function approximator could be adopted to estimate the Q-function. When a neural network with weights θ is adopted as the non-linear function approximator, we refer to it as a Q-network. In [15], a deep Q-network
(DQN) algorithm was proposed, which can use experience replay and target network to ensure the stability of reinforcement
learning methods when function approximators are adopted. However, DQN cannot be directly applied to problems with continuous action spaces, since it needs to discretize the action space, which leads to an explosion in the number of actions. As a result, low computational efficiency, decreased performance, and the requirement for more training data would be incurred
[16] [40].
A. Algorithmic Design
To solve the MDP problem defined in Section III-F, we propose a DDPG-based energy management algorithm. Different
from DQN, DDPG is capable of dealing with continuous states and actions. For example, just two network outputs are needed
to represent continuous actions in this paper, which avoids the explosion of the number of actions. Since DDPG is a kind of
actor-critic methods (i.e., methods that learn approximations to both the policy function and the value function), an actor network and a critic network are incorporated, as shown in Fig. 3. The input and output of the actor network are the environment state s_t and the action a, respectively. Then, a and s_t are adopted as the input of the critic network, whose output is the action-value function Q(s_t, a). Next, the policy gradient can be computed and used to update the weights of the actor network. Before computing Q(s_t, a), the weights of the critic network should be updated based on two mechanisms, i.e., memory replay and target networks.
More details will be introduced when explaining Algorithm 2.
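A minimal PyTorch sketch of the two networks is shown below. The hidden-layer counts follow Fig. 4 (two for the actor, four for the critic), while the layer widths, the tanh output squashing, and the simple state-action concatenation at the critic input are illustrative assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 7, 2, 64   # s_t has 7 components, a_t = (f_t, e_t)

class Actor(nn.Module):
    """mu(phi(s)|theta_mu): two hidden layers (per Fig. 4); outputs in [-1, 1]^2,
    to be rescaled to [-d_max, c_max] and [0, e_max] outside the network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(phi(s), a|theta_Q): four hidden layers (per Fig. 4); the action is
    concatenated with the state at the input for simplicity."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```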
The proposed DDPG-based energy management algorithm can be found in Algorithm 1, where the key step is to load the weights of the actor network θ^µ, which are trained by Algorithm 2. In each time slot, the actor network selects an action on ESS charging/discharging power and HVAC input power according to the current environment state s_t. Then, the action
at is executed and the environment state becomes st+1 . Meanwhile, the reward Rt+1 is obtained. In Algorithm 2, we first
initialize a replay memory D with capacity N, which stores the transition tuple (s_t, a_t, R_{t+1}, s_{t+1}). Moreover, a preprocessing function φ(s_t) is introduced to facilitate the learning process by normalizing the input data. Specifically, each component of the environment state at time slot t (e.g., κ_t) should be normalized to the range [0, 1] using the following expression: (κ_t − min_t κ_t)/(max_t κ_t − min_t κ_t). Then, we randomly initialize the critic network Q(φ(s), a|θ^Q) and the actor network µ(φ(s)|θ^µ) with weights θ^Q and θ^µ, respectively. Their architectures in the proposed energy management algorithm are described by Fig. 4, where there are two hidden layers in the actor network and four hidden layers in the critic network. Next, we initialize the weights of the target critic network Q(φ(s), a|θ^{Q′}) and the target actor network µ(φ(s)|θ^{µ′}) by copying, i.e., θ^{Q′} ← θ^Q and θ^{µ′} ← θ^µ. In each time slot of each episode, an action is selected based on the following expression in line 8, i.e.,
a_t = µ(φ(s_t)|θ^µ) + N_t, (13)
where N_t is the exploration noise. In this paper, we use the following way to introduce exploration noise, i.e.,
a_t = µ(φ(s_t)|θ^µ) if ω_t > ξ_t, and a_t = (U_{t,1}, U_{t,2}) if ω_t ≤ ξ_t, (14)
where ω_t, U_{t,1}, and U_{t,2} follow uniform distributions with parameters (0, 1), (−d^max/max{c^max, d^max}, c^max/max{c^max, d^max}), and (0, 1), respectively; ξ_t = max(ξ_t − ζ·(episode − N/P), ξ_min), ξ_0 = 1, and 0 < ζ < 1. After a_t is obtained, it will be
applied to ESS and the HVAC system. At the end of time slot t, the new state st+1 and the reward Rt+1 are returned from
the environment. Then, the transition tuple (φ(s_t), a_t, R_{t+1}, φ(s_{t+1})) will be stored in the memory for the training of the actor and critic networks, as shown in line 10. Next, K transitions are randomly sampled for training the deep neural networks, i.e., the actor network, critic network, target actor network, and target critic network. As shown in lines 12-14, Q(φ(s_i), a_i) and y_i generated by the critic network and the target networks are used to calculate the mean square error loss. By minimizing the loss function, the weights of the critic network can be updated. Then, we can calculate the sampled policy gradient as shown in line 15,
which is used to update the weight of actor network. Finally, the weights of target actor network and target critic network
could be updated as shown in lines 17-19. Note that a small τ should be selected in order to improve the learning stability.
Typically, 0 < τ ≪ 1.
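The core of one training iteration of Algorithm 2 as described above (target value, critic loss, policy gradient, soft target update) can be sketched as follows. It assumes the Actor/Critic classes and dimensions from the previous sketch; γ matches the simulation setting, while τ, the learning rates, and the toy mini-batch are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_tgt, critic_tgt,
                actor_opt, critic_opt, batch, gamma=0.995, tau=0.001):
    """One DDPG update on a sampled mini-batch (s, a, r, s_next):
    form the target y with the target networks, minimize the MSE critic loss,
    ascend the sampled policy gradient, then softly update the target networks."""
    s, a, r, s_next = batch

    # Critic update: y_i = r + gamma * Q'(s_next, mu'(s_next))
    with torch.no_grad():
        y = r + gamma * critic_tgt(s_next, actor_tgt(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: maximize Q(s, mu(s)), i.e., minimize -Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta', 0 < tau << 1
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_tgt, p in zip(tgt.parameters(), src.parameters()):
            p_tgt.data.mul_(1.0 - tau).add_(tau * p.data)

# Example wiring (K = 4 fake transitions just to exercise the function).
actor, critic = Actor(), Critic()
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
batch = (torch.rand(4, STATE_DIM), torch.rand(4, ACTION_DIM) * 2 - 1,
         torch.rand(4, 1), torch.rand(4, STATE_DIM))
ddpg_update(actor, critic, actor_tgt, critic_tgt, actor_opt, critic_opt, batch)
```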
The computational complexity of the proposed energy management algorithm in the testing process can be described by O(H_test). Given a fixed testing time horizon, a shorter time slot duration would result in a larger H_test. However, the time slot duration cannot be selected arbitrarily in practice due to the following reasons. On one hand, a too long duration would result in the loss of many control opportunities for saving energy cost and maintaining a comfortable temperature range. On the other hand, a too short duration may affect the training convergence of DRL-based algorithms, since the control actions taken by the DRL agent cannot take effect immediately in terms of environment states (e.g., indoor temperature) [17]. Therefore, the duration of a time slot should be selected appropriately in practice. In existing works, the
typical duration of a time slot is several minutes or one hour (e.g., 15 minutes [3], 1 hour [17]), which is far greater than the
computation time of the proposed energy management algorithm in a time slot. Therefore, the proposed energy management
algorithm can be implemented in a real-time way.
V. PERFORMANCE EVALUATION
In this section, we evaluate the performance of the proposed energy management algorithm. We first describe the simulation
setup. Then, we describe the baselines used for performance comparisons. Finally, we provide simulation results about the algorithmic convergence process, algorithmic performance under varying β, algorithmic effectiveness, and algorithmic robustness.
A. Simulation setup
In simulations, we use real-world traces related to solar generation, non-shiftable power demand, outdoor temperature, and
electricity price, which are extracted from the Pecan Street database1. Note that this database is the largest real-world open energy database on the planet and includes data related to home energy consumption and solar generation of the Mueller neighborhood in Austin, Texas, USA. For simplicity, the cooling mode of a residential HVAC system is considered. Since
summers in Austin are very hot2 , we use the data during the period from June 1 to August 31, 2018 for model training and
testing. To be specific, the data in June and July is used to train neural network models and the data in August is adopted for
performance testing. Some important system parameters are configured as follows: ut = 0.9vt [37], γ = 0.995, ηc = ηd = 0.95
[41], ζ = 0.0005, ξ_min = 0.1, T^min = 66.2°F (19°C) [3], T^max = 75.2°F (24°C) [3], other parameter configurations are
shown in TABLE I, where αa and αc denote the learning rate of actor network and critic network, respectively. In TABLE I,
Na and Nc denote the number of neurons in each hidden layer of actor network and critic network, respectively. To simulate the
environment, we adopt the following indoor temperature dynamics model for simplicity, i.e., T_{t+1} = εT_t + (1 − ε)(T_t^out − η_hvac e_t / A) [6] [7] [30] [31], where ε = 0.7 [42], η_hvac = 2.5 [30], and A = 0.14 kW/°F [30]. Note that a variant of the proposed energy management algorithm can be applied to any indoor temperature dynamics model by incorporating more environment-related variables in the system state, e.g., relative humidity and solar radiation intensity.
1 https://www.pecanstreet.org/
2 https://en.wikipedia.org/wiki/Austin, Texas#Climate
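The simulated environment's temperature update can be written directly from this model. The sketch below uses the stated values of ε, η_hvac, and A, assumes a cooling HVAC with temperatures in °F and e_t in kW, and exposes an optional disturbance term that anticipates the robustness tests; the example numbers are illustrative.

```python
import random

def indoor_temp_step(T_in, T_out, e_t, eps=0.7, eta_hvac=2.5, A=0.14, disturbance=0.0):
    """T_{t+1} = eps*T_t + (1 - eps)*(T_t^out - eta_hvac * e_t / A) + disturbance.

    Temperatures in °F, e_t in kW, A in kW/°F; eps, eta_hvac, A follow the
    simulation setup. 'disturbance' models the random term used in the robustness tests.
    """
    return eps * T_in + (1.0 - eps) * (T_out - eta_hvac * e_t / A) + disturbance

# One hot-afternoon slot: 1.2 kW of cooling roughly holds the indoor temperature.
print(indoor_temp_step(T_in=76.0, T_out=98.0, e_t=1.2))            # ≈ 76.2
print(indoor_temp_step(T_in=76.0, T_out=98.0, e_t=1.2,
                       disturbance=random.uniform(-1.8, 1.8)))     # robustness case
```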
TABLE I
MAIN PARAMETER SETTINGS
B. Baselines
To evaluate the performance of the proposed algorithm, we adopt three baselines as follows.
• Baseline1: this scheme adopts the ON/OFF policy [3] for building HVAC control but does not consider the use of the ESS. Specifically, the HVAC system will be turned on if T_t > T^max and it will be turned off if T_t < T^min (a minimal sketch of this policy is given after this list).
• Baseline2: this scheme uses the DDPG-based control policy in this paper for HVAC control but without considering the use
of the ESS, i.e., cmax = dmax = 0. Based on the performance comparison between Baseline2 and the proposed algorithm,
the energy cost saving caused by the use of the ESS can be known. Similarly, the energy cost saving incurred by the use
of the DDPG-based control policy can be obtained by comparing the performance of Baseline2 with that of Baseline1.
• Baseline3: this scheme intends to minimize the cumulative cost during the testing period H_test (i.e., Σ_{t=1}^{H_test}(C_{1,t} + C_{2,t}))
with the consideration of constraints (1)-(8), assuming that all uncertain system parameters and the dynamics model of
indoor temperature can be known beforehand. Although the optimal solution of this scheme is not achievable in practice
due to the existence of parameter and model uncertainties, it can provide the lower bound for the performance of the
proposed algorithm when all constraints in P1 are satisfied.
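As referenced in the Baseline1 item above, a minimal sketch of the ON/OFF policy is given here. The rated power e_max is an illustrative value, and holding the previous on/off state inside the dead band is an assumption about the policy's behavior between the two thresholds.

```python
def on_off_hvac(T_t, e_prev, T_min=66.2, T_max=75.2, e_max=2.0):
    """Baseline1 (cooling mode): run the HVAC at rated power when T_t > T_max,
    switch it off when T_t < T_min, and otherwise keep the previous command."""
    if T_t > T_max:
        return e_max
    if T_t < T_min:
        return 0.0
    return e_prev   # inside the dead band: hold the last on/off state

print(on_off_hvac(T_t=77.0, e_prev=0.0))   # 2.0 (turn on)
print(on_off_hvac(T_t=70.0, e_prev=2.0))   # 2.0 (stay on inside the band)
```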
C. Simulation Results
1) Algorithmic convergence process: According to Algorithm 1, the proposed energy management algorithm needs to know
the training result of Algorithm 2 before testing. In Fig. 5, the reward received during each episode generally increases.
Since the minimum exploration probability ξmin is 0.1 and system parameters (e.g., solar radiation power, non-shiftable power
demand, outdoor temperature, and electricity price) are varying in each episode, the episode reward fluctuates within a small
range. To show the changing trend of rewards more clearly, we provide the average value of the past 50 episodes. In Fig. 5,
it can be found that the average reward generally increases and becomes more and more stable.
2) Algorithmic performance under varying β: Since many random number generators are adopted in neural network initialization, mini-batch sampling for training, and action selection, the performance of the proposed algorithm varies even when the same system parameters are configured. To show the impact of β on the performance of the proposed algorithm
more clearly, mean values of total energy cost (i.e., the sum of energy cost and ESS depreciation cost) and total temperature
deviation with 95% confidence interval across 40 runs are considered and the corresponding results can be found in Fig. 6.
It can be observed that the mean values of total energy cost and total temperature deviation generally decrease and increase with the increase of β, respectively. Such a tendency is intuitive since a larger β places more importance on energy cost and less on temperature deviation. By taking the mean values of total energy cost and total temperature deviation into consideration, a proper value of β is 1, under which the mean value of total temperature deviation is less than 1°C.
3) Algorithmic effectiveness: Performance comparisons among the four schemes are shown in Fig. 7 (β = 0.6), where the proposed
energy management algorithm achieves better performance than Baseline1 and Baseline2. To be specific, the proposed energy
management algorithm can reduce the mean value of total energy cost by 15.21% and 8.10% when compared with Baseline1
and Baseline2, respectively. Moreover, the mean value of total temperature deviation under the proposed algorithm is smaller
than Baseline1 and Baseline2, which can be illustrated by Figs. 7(b) and (c). Compared with Baseline1, Baseline2 and the
proposed algorithm could save energy cost by increasing/decreasing HVAC input power when electricity price is low/high,
which can be depicted by Figs. 8(a) and (b). Compared with Baseline2, the proposed algorithm could reduce energy cost by
charging/discharging ESS when electricity price is low/high, which can be shown in Figs. 8(a) and (c). Though Baseline3
achieves the best performance, it requires all prior knowledge of uncertain system parameters and thermal dynamics model.
Thus, Baseline3 is adopted only as a performance reference. By observing the performance gap between the proposed algorithm and Baseline3, it can be seen that there is still considerable potential for reducing the mean value of total energy cost. In future work, more training data and more advanced DRL-based energy management algorithms would be adopted to reduce this performance gap.
4) Algorithmic robustness: Note that the thermal dynamics model used in the above-mentioned simulations cannot capture thermal disturbances in practice, e.g., thermal disturbances from solar irradiance, lighting systems, and computers. Thus, we evaluate the robustness of the proposed algorithm when a random thermal disturbance is introduced. To be specific, T_{t+1} = εT_t + (1 − ε)(T_t^out − η_hvac e_t / A) + ǫ_t [10], where the error term ǫ_t is assumed to follow a uniform distribution over [ϑ_l, ϑ_u] °F. In this scenario, three cases are considered, i.e., ϑ_u = −ϑ_l = 1.8, 3.6, 5.4. In Fig. 9, it can be observed that the proposed algorithm achieves better performance than Baseline1 in all three cases. Compared with Baseline3, the proposed
algorithm can save up to 10% of the total energy cost with a small increase in the total temperature violation. Moreover, unlike Baseline3, the proposed algorithm does not require any prior knowledge of uncertain parameters or the thermal dynamics model.
Therefore, the proposed algorithm has the potential of providing a more efficient and practical tradeoff between maintaining
thermal comfort and reducing energy cost than Baseline3.
VI. CONCLUSION
In this paper, we proposed a DDPG-based energy management algorithm for a smart home to efficiently control HVAC
systems and energy storage systems in the absence of a building thermal dynamics model, with the consideration of a
comfortable temperature range and many parameter uncertainties. Extensive simulation results based on real-world traces
showed the effectiveness and robustness of the proposed algorithm. In future work, more reasonable thermal comfort models
and more types of controllable loads (e.g., electric vehicles, electric water heaters) would be incorporated. In addition, more opportunities for saving energy cost could be captured by utilizing real-world occupant behavior information [43], which requires the adoption of more advanced deep neural network architectures/algorithms.
REFERENCES
[1] S. Wu, J. Rendall, M. Smith, S. Zhu, J. Xu, Q. Yang, H. Wang, and P. Qin, “Survey on prediction algorithms in smart homes,” IEEE Internet of Things
Journal, vol. 4, no. 3, pp. 636-644, June 2017.
[2] A. Afram and F. Janabi-Sharif, “Effects of dead-band and set-point settings of on/off controllers on the energy consumption and equipment switching
frequency of a residential HVAC system,” Journal of Process Control, vol. 47, pp. 161-174, 2016.
[3] T. Wei, Y. Wang, and Q. Zhu, “Deep reinforcement learning for building HVAC control,” The 54th Annual Design Automation Conference, 2017.
[4] F. Angelis, M. Boaro, D. Fuselli, S. Squartini, F. Piazza, and Q. Wei, “Optimal home energy management under dynamic electrical and thermal constraints,”
IEEE Trans. on Industrial Informatics, vol. 9, no. 3, pp. 1518-1527, Aug. 2013.
[5] W. Fan, N. Liu, and J. Zhang, “An event-triggered online energy management algorithm of smart home: lyapunov optimization approach,” Energies, vol.
9, no. 5, pp. 381-404, 2016.
[6] D. Zhang, S. Li, M. Sun, and Z. O’Neill, “An optimal and learning-based demand response and home energy management system,” IEEE Trans. on Smart
Grid, vol. 7, no. 4, pp. 1790-1801, July 2016.
[7] V. Pilloni, A. Floris, A. Meloni and L. Atzori, “Smart home energy management including renewable sources: a QoE-driven approach,” IEEE Trans. on
Smart Grid, vol. 9, no. 3, pp. 2006-2018, May 2018.
[8] L. Yu, T. Jiang, and Y. Zou, “Online energy management for a sustainable smart home with an HVAC load and random occupancy,” IEEE Trans. on
Smart Grid, vol. 10, no. 2, pp. 1646-1659, March 2019.
[9] M. Shad, A. Momeni, R. Errouissi, C.P. Diduch, M.E. Kaye, and L. Chang, “Identification and estimation for electric water heaters in direct load control
programs”, IEEE Trans. on Smart Grid, vol. 8, no. 2, pp. 947-955, Nov. 2015.
[10] E.C. Kara, M. Bergés, and G. Hug, “Impact of disturbances on modeling of thermostatically controlled loads for demand response”, IEEE Trans. on
Smart Grid, vol. 6, no. 5, pp. 2560-2568, Nov. 2015.
[11] M. Franceschelli, A. Pilloni, and A. Gasparri, “A heuristic approach for online distributed optimization of multi-agent networks of smart sockets and
thermostatically controlled loads based on dynamic average consensus”, 2018 European Control Conference, 2018.
[12] R. Lu, S. Hong, and M. Yu, “Demand response for home energy management using reinforcement learning and artificial neural network,” IEEE Trans.
on Smart Grid, DOI: 10.1109/TSG.2019.2909266, 2019.
[13] F. Ruelens, B. Claessens, S. Vandael, B. Schutter, R. Babuška, and R. Belmans, “Residential demand response of thermostatically controlled loads using
batch reinforcement learning,” IEEE Trans. on Smart Grid, vol. 8, no. 5, pp. 2149-2159, Sept. 2017.
[14] J. Vázquez-Canteli, and Z. Nagy, “Reinforcement learning for demand response: A review of algorithms and modeling techniques,” Applied Energy,
vol. 235, pp. 1072-1089, 2019.
[15] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529-541, 2015.
[16] G. Gao, J. Li, Y. Wen, “Energy-efficient thermal comfort control in smart buildings via deep reinforcement learning,” arXiv:1901.04693v1, 2019.
[17] Z. Zhang and K.P. Lam, “Practical implementation and evaluation of deep reinforcement learning control for a radiant heating system,” The 5th ACM
International Conference on Systems for Built Environments, 2018.
[18] W. Valladares, M. Galindo, J. Gutiérrez, W. Wu, K. Liao, J. Liao, K. Lu, and C. Wang, “Energy optimization associated with thermal comfort and indoor
air control via a deep reinforcement learning algorithm,” Building and Environment, vol. 155, pp. 105-117, 2019.
[19] Z. Wan, H. Li, and H. He, “Residential energy management with deep reinforcement learning,” International Joint Conference on Neural Networks
(IJCNN), 2018.
[20] A.I. Nousdilis, E.O. Kontis, G.C. Kryonidis, G.C. Christoforidis, and G.K. Papagiannis, “Economic assessment of lithium-ion battery storage systems
in the nearly zero energy building environment,” 20th International Symposium on Electrical Apparatus and Technologies, 2018.
[21] M. Yousefi, A. Hajizadeh, and M. Soltani, “A comparison study on stochastic modeling methods for home energy management system,” IEEE Trans.
on Industrial Informatics, DOI: 10.1109/TII.2019.2908431, 2019.
[22] B. Yang, X. Cheng, D. Dai, T. Olofsson, H. Li, and A. Meier, “Real-time and contactless measurements of thermal discomfort based on human poses
for energy efficient control of buildings,” Building and Environment, vol. 162, pp. 1-10, 2019.
[23] X. Cheng, B. Yang, A. Hedman, T. Olofsson, H. Li, and L. Gool, “A pilot study of online non-invasive measuring technology based on video magnification
to determine skin temperature,” Building and Environment, vol. 198, pp. 340-352, 2019.
[24] X. Cheng, B. Yang, T. Olofsson, G. Liu, and H. Li, “A pilot study of online non-invasive measuring technology based on video magnification to determine
skin temperature”, Building and Environment, vol. 121, pp. 1-10, 2017.
[25] Y. Wang, and Z. Lian, “A thermal comfort model for the non-uniform thermal environments,” Energy and Buildings, vol. 172, pp. 397-404, 2018.
[26] W. Li, J. Zhang, T. Zhao, and R. Liang, “Experimental research of online monitoring and evaluation method of human thermal sensation in different
active states based on wristband device,” Energy and Buildings, vol. 173, pp. 613-622, 2018.
[27] L. Yang, Z. Zheng, J. Sun, D. Wang, and X. Li, “A domain-assisted data driven model for thermal comfort prediction in buildings,” The ninth ACM
International Conference on Future Energy Systems, 2018.
[28] L. Yu, D. Xie, T. Jiang, Y. Zou, and K. Wang, “Distributed real-time hvac control for cost-efficient commercial buildings under smart grid environment,”
IEEE Internet of Things Journal, vol. 5, no. 1, pp. 44-55, Feb. 2018.
[29] H. Xu, X. Li, X. Zhang, and J. Zhang, “Arbitrage of energy storage in electricity markets with deep reinforcement learning,” arXiv:1904.12232v1, 2019.
[30] P. Constantopoulos, F. C. Schweppe, and R. C. Larson, “Estia: A realtime consumer control scheme for space conditioning usage under spot electricity
pricing,” Computers & Operations Research, vol. 18, no. 8, pp. 751-765, 1991.
[31] A.A. Thatte and L. Xie, “Towards a unified operational value index of energy storage in smart grid environment,” IEEE Trans. on Smart Grid, vol. 3,
no. 3, pp. 1418-1426, Sep. 2012.
[32] Z. Zhang, A. Chong, Y. Pan, C. Zhang, S. Lu, and K. Lam, “A deep reinforcement learning approach to using whole building energy model for hvac
optimal control,” 2018 Building Performance Modeling Conference and SimBuild co-organized by ASHRAE and IBPSA-USA, 2018.
[33] R.S. Sutton and A.G. Barto, “Reinforcement learning: an introduction,” The MIT Press, London, England, 2018.
[34] J. Schmidhuber, “Reinforcement learning in Markovian and non-Markovian environments,” Proceedings of the 3rd International Conference on Neural
Information Processing Systems, 1990.
[35] J. Perez and T. Silander, “Non-Markovian control with gated end-to-end memory policy networks,” arXiv:1705.10993v1, 2017.
[36] S. Padakandla, K.J. Prabuchandran and S. Bhatnagar, “Reinforcement learning in non-stationary environments,” https://arxiv.org/pdf/1905.03970.pdf
[37] L. Yu, T. Jiang and Y. Cao, “Energy cost minimization for distributed internet data centers in smart microgrids considering power outages,” IEEE Trans.
on Parallel and Distributed Systems, vol. 26, no. 1, pp. 120-130, Jan. 2015.
[38] L. Yu, T. Jiang, and Y. Zou, “Distributed real-time energy management in data center microgrids,” IEEE Trans. on Smart Grid, vol. 9, no. 4, pp.
3748-3762, July 2018.
[39] Y. Zhang, N. Gatsis, and G. B. Giannakis, “Robust management of distributed energy resources for microgrids with renewables,” IEEE Trans. on
Sustainable Energy, vol. 4, no. 4, pp. 944-953, Oct. 2013.
[40] T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,”
International Conference on Learning Representations, 2016.
[41] Y. Xu, L. Xie, and C. Singh, “Optimal scheduling and operation of load aggregator with electric energy storage in power markets,” North American
Power Symposium 2010, 2010.
[42] R. Deng, Z. Zhang, J. Ren, and H. Liang, “Indoor temperature control of cost-effective smart buildings via real-time smart grid communications,” IEEE
Globecom, 2016.
[43] S. Chen, T. Liu, F. Gao, J. Ji, Z. Xu, B. Qian, H. Wu, and X. Guan, “Butler, not servant: A human-centric smart home energy management system,”
IEEE Communications Magazine, vol. 55, no. 2, pp. 27-33, Feb. 2017.