Nonintrusive Load Monitoring Based on Self-Supervised Learning

Abstract— Deep learning models for nonintrusive load monitoring (NILM) tend to require a large amount of labeled data for training. However, it is difficult to generalize the trained models to unseen sites due to different load characteristics and operating patterns of appliances between datasets. To address such problems, self-supervised learning (SSL) is proposed in this article, where labeled appliance-level data from the target dataset or house are not required. Initially, only the aggregate power readings from the target dataset are required to pretrain a general network via a self-supervised pretext task that maps aggregate power sequences to derived representatives. Then, supervised downstream tasks are carried out for each appliance category to fine-tune the pretrained network, where the features learned in the pretext task are transferred. Utilizing labeled source datasets enables the downstream tasks to learn how each load is disaggregated, by mapping the aggregate to labels. Finally, the fine-tuned network is applied to load disaggregation for the target sites. For validation, multiple experimental cases are designed based on the three publicly accessible REDD, U.K.-DALE, and REFIT datasets. Besides, state-of-the-art neural networks are employed to perform the NILM task in the experiments. Based on the NILM results in various cases, SSL generally outperforms zero-shot learning in improving load disaggregation performance without any submetering data from the target datasets.

Index Terms— Deep neural network (DNN), nonintrusive load monitoring (NILM), self-supervised learning (SSL), sequence-to-point learning.

Manuscript received 23 October 2022; accepted 27 January 2023. Date of publication 22 February 2023; date of current version 2 March 2023. This work was supported in part by the Joint Funds of the National Natural Science Foundation of China under Grant U2066207 and in part by the National Key Research and Development Program of China under Grant 2020YFB0905904. The Associate Editor coordinating the review process was Dr. Guglielmo Frigo. (Corresponding author: Wenpeng Luan.)

Shuyi Chen, Bochao Zhao, Wenpeng Luan, and Yixin Yu are with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: [email protected]).

Mingjun Zhong is with the Department of Computing Science, University of Aberdeen, AB24 3UE Aberdeen, U.K. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIM.2023.3246504

I. INTRODUCTION

IN RECENT years, energy shortage and environmental pollution worldwide have become increasingly serious. Therefore, approaches for efficient energy utilization and carbon emissions reduction are being explored [1], [2]. Meanwhile, with the global deployment of smart meters, benign interaction between power suppliers and users has been established for enhancing demand side management and optimizing power grid operation [3]. As one of the energy conservation applications, electricity consumption detail monitoring has attracted extensive attention around the world [4]. In general, load monitoring technology is mainly categorized into an intrusive way and a nonintrusive way. Note that intrusive load monitoring requires extra sensor installation for submetering. Alternatively, the concept of nonintrusive load monitoring (NILM) was proposed by Hart [5] as identifying the power consumed by each individual appliance via analyzing aggregate power readings using only software tools. NILM offers appliance-level power consumption feedback to both the demand and supply sides economically and efficiently, contributing to power system planning and operation [1], energy bill savings [6], demand side management [7], and energy conservation and emission reduction [3], [6], [8].

NILM is a single-channel blind source separation problem, aiming to disaggregate the appliance-level energy consumption from the aggregate measurements [9]. Combinatorial optimization (CO) was initially applied to perform NILM in [5], searching for the best combination of operational states of individual appliances at each time instance. However, CO relies on the power range of each operational state as prior knowledge, making it unavailable to newly added appliances [10]. Benefiting from the recent technology development in big data, artificial intelligence, and edge computing, plenty of NILM approaches have been proposed based on machine learning, mathematics, and signal processing [8], [11]. The factorial hidden Markov model (FHMM) and its variants [12], [13], [14] are popular in carrying out NILM. Given an aggregate power signal as the observation, such FHMM-based NILM methods estimate the hidden operational states of each appliance considering their state continuity in time series [15], [16]. Thus, FHMM-based methods usually achieve good results in disaggregating loads with periodic operation, such as refrigerators. However, their performance is limited for loads with short-lasting working cycles and ones with less frequent usage. Note that FHMM-based methods are regarded as state-based NILM approaches, where the aggregate power measurement at each time instance is assigned to each operational state per appliance [17]. Alternatively, NILM approaches can be event-based, where sudden changes in power signals referring to turn-on, turn-off, and state-transition events are featured [17]. Such event-based NILM methods can be carried out via subtractive clustering and the maximum likelihood classifier [18]. Besides, graph signal processing concepts are applied to perform NILM, mapping the correlation among samples to an underlying graph structure [19], [20].
Although such event-based NILM approaches can achieve high load identification accuracy, they tend to suffer from measurement noise.

Deep neural networks (DNNs), performing well in computer vision, speech recognition, and natural language processing, have been employed in load disaggregation since 2015 [21]. Since then, DNN structures have become more and more popular for performing NILM, including long short-term memory (LSTM) [15], [21], [22], gated recurrent unit (GRU) [10], [23], denoising autoencoder (dAE) [21], [24], and convolutional neural network (CNN) [25], [26], [27], showing competitive performance against traditional NILM methods. Although LSTM is suitable for long time-series-related tasks due to avoiding the vanishing gradient problem, it underperforms in the NILM task compared to CNN [28], [29]. An attention mechanism is applied to LSTM in [22] for improving load disaggregation performance. As a variant of LSTM, GRU can also remember data patterns. In addition, GRU contains fewer parameters and thus requires shorter training time, which is suitable for online application in NILM [10], [30]. Note that the bidirectional GRU (Bi-GRU) is employed to perform NILM in [10], where the network can be trained simultaneously in positive and negative time directions. Besides, dAE is applied to NILM by recovering the power signal of target appliances (clean signal) from the aggregate (noisy signal) [8], [21], where CNN layers are usually embedded [21], [24]. A state-of-the-art CNN-based NILM method, S2p, is proposed in [9] and claimed to outperform benchmarks in the NILM task [9], [31], benefiting from meaningful latent features learned from submetering data. Recently, more improved network structures have been applied to NILM, such as generative adversarial networks (GANs) [32] and transformers [33]. The GAN framework helps to improve appliance profile estimation when performing NILM, as more outputs close to the ground truth (GT) are generated in network training. In terms of transformers, the attention mechanism is advantageous to NILM via capturing long-range dependency from sequential data [30].

Compared to traditional NILM methods, the advantages of DNN-based NILM approaches include automatic feature extraction from power readings and linearity between computational complexity and appliance amount [4]. However, the promising performance of the aforementioned DNN-based methods relies on a large amount of submetering data from the target set for training [4]. Since such data collection may last for months or even years [4], it is neither user-friendly nor economical in practice.

Alternatively, transfer learning concepts have been proposed, where transferable networks can be trained on a source (seen) dataset and applied to the load disaggregation task on a target (unseen) dataset [2], [32], [34], [35]. Depending on whether network fine-tuning is required, transfer learning can be classified into few-shot learning (FSL) and zero-shot learning (ZSL) [34]. For fine-tuning in FSL, a small amount of labeled data from the target set is still required [34]. However, when labels are unable to be captured from the target datasets, ZSL offers proper solutions. In [35], ZSL achieves a tiny performance drop compared to the baseline when it is employed in load disaggregation by both GRU and CNN networks, showing transferability across datasets. However, in ZSL, it is difficult to generalize networks between datasets with different load characteristics and operating patterns of appliances. The same as ZSL, self-supervised learning (SSL) requires no labeled data from the target set. SSL is an efficient way to extract universal features from large-scale unlabeled data, contributing to robustness enhancement [36]; thus, it performs well in image processing and speech recognition [37]. To the best of our knowledge, SSL has not been used to solve the NILM problem.

Driven by such research gaps, in this article, SSL is applied to three state-of-the-art NILM algorithms based on CNN, GRU, and LSTM, namely S2p [9], Bi-GRU [10], and LSTM [22]. For performing NILM, a self-supervised pretext task training is initially carried out for learning features from the aggregate power readings of the unlabeled data in the target set. Then, the pretrained network is fine-tuned in the supervised downstream task training based on the labeled data from the source set for transferring the prelearned knowledge to load disaggregation. After pretraining and fine-tuning, the network can be applied to load disaggregation for target sites. The proposed method is validated on real-world datasets at 1-min granularity, in scenarios designed for the same dataset or across various datasets. The contributions of this article are clarified as follows.

1) SSL is applied to load disaggregation based on deep learning without submetering on the target set, by setting a pretext task for network pretraining on unlabeled data from the target set with fine-tuning.
2) Experiments are carried out for all combinations of three state-of-the-art DNN-based NILM methods (S2p [9], Bi-GRU [10], and LSTM [22]) and learning frameworks (SSL and ZSL), on three real-world datasets.
3) Six cases differing in data selection are designed for performance evaluation on data across houses or sets, showing that SSL generally outperforms in various metrics and energy consumption estimation results, with comparable training time cost.

The rest of this article is organized as follows. In Section II, the NILM formulation is clarified, followed by the preliminaries for NILM neural networks and SSL. The methodology of SSL for NILM is explained in Section III. Section IV contains datasets, evaluation metrics, and experimental settings, followed by experimental results with discussion illustrated in Section V; eventually, the conclusion is drawn and future work is prospected in Section VI.

II. PRELIMINARIES

In this section, we first formulate the NILM problem and then clarify seq2seq and seq2point concepts, followed by an introduction to two seq2point network architectures. Finally, the overall structure of SSL is demonstrated.

A. NILM Problem Formulation

Assume that the aggregate power reading measured in a household at time index t ∈ [1, T] is y_t, where T refers to the total number of samples. Then, the simultaneous power
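In the standard additive NILM model (a sketch consistent with [5] and [9]; the symbols $x_t^{(m)}$ for the simultaneous power of appliance $m$ among $M$ appliances and $e_t$ for the unmetered residual are our notation, not necessarily the authors'), the aggregate reading decomposes as

$$
y_t = \sum_{m=1}^{M} x_t^{(m)} + e_t, \qquad t \in [1, T].
$$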
TABLE I
NETWORK SPECIFICATIONS FOR S2P [9]
Fig. 4. Self-supervised pretext task layout.

Fig. 5. Supervised downstream task layout.

As shown in Fig. 4, aggregate measurement sequences as sliding windows are iteratively imported into the networks, while the network outputs are their midpoints or ending points.

B. Supervised Downstream Task Training

To transfer the features extracted from the "unseen" target dataset without labels to the load disaggregation task, the network is updated iteratively. That is, the network parameters are further fine-tuned on the labeled data from the source set, as a supervised downstream task training. By carrying out the downstream task, the general feature representation learned via the pretext task contributes to predicting individual appliance power. As described in Section II, such a network is fine-tuned with the mains power in each sequence window as input and the appliance-level power estimation at its midpoint or ending point as output. Thus, the downstream task training for a typical appliance m is illustrated in Fig. 5.

Note that parameter fine-tuning can be carried out on either partial network layers or all layers, defined as partial fine-tuning and full fine-tuning, respectively. For performing partial fine-tuning on both the S2p and Bi-GRU architectures, only their dense layers and output layers are fine-tuned, while the other layers are frozen. In a DNN, general low-level features can be captured in shallow layers, while in deep layers, high-level features related to the task tend to be extracted [37]. Therefore, partial fine-tuning is usually applied to tasks with knowledge transfer for sharing common features and thus guaranteeing efficiency.
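As an illustration of the partial fine-tuning just described, the following minimal TensorFlow/Keras sketch freezes every layer of a pretrained seq2point-style network except its dense (fully connected) and output layers; the file name and data names are hypothetical, not part of the authors' released code.

```python
import tensorflow as tf

# Load the network pretrained on the self-supervised pretext task
# (the file name is hypothetical).
model = tf.keras.models.load_model("s2p_pretext_pretrained.h5")

# Partial fine-tuning: only dense layers (including the output layer,
# itself a dense layer) stay trainable; all other layers are frozen.
for layer in model.layers:
    layer.trainable = isinstance(layer, tf.keras.layers.Dense)

# Recompile so the new trainable flags take effect, then fine-tune on
# labeled (aggregate window -> appliance power) pairs from the source set.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
# model.fit(source_windows, source_targets, batch_size=1024, epochs=50)
```

Full fine-tuning corresponds to leaving every `trainable` flag set to `True` instead.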
However, it is claimed in [38] that fine-tuning all network
TABLE III
DATA SELECTION FOR TESTING CASES
from 20 U.K. houses from 2013 to 2015. Since the data from the REDD, U.K.-DALE, and REFIT datasets are collected by standard monitoring equipment and widely used in validating NILM approaches, their measurement uncertainty is acceptable, with negligible influence on NILM performance. Compared to the other two datasets, the REFIT dataset is noisier due to more unknown loads. Note that all data are downsampled to 1 min as low-rate measurements. Since the REFIT dataset is the largest [2], containing more houses with more appliances, it is chosen for fine-tuning in SSL and training in ZSL. The experimental data selected for pretext task training, downstream task training, and testing in SSL are clarified in Table III, as well as those for both training and testing in ZSL.

For exploring the performance boundary of SSL, six experimental cases are settled. In both Cases 1 and 2, pretraining, fine-tuning, and testing in SSL are carried out among houses from the same REFIT dataset. Cases 1 and 2 share the same data from the same house for testing, while they differ in the houses for building the pretraining dataset. Data from single House 2 are used for both pretraining and testing in Case 1, while more houses other than House 2 are used for pretraining in Case 2. Similar settings are employed for Cases 3 and 4; however, the data for both pretraining and testing are from REDD houses. For further investigating the impact of pretraining house selection on load disaggregation, we set the size of pretraining data from various houses in Case 6 equivalent to that from single House 1 in Case 5. In terms of the ZSL benchmarks, the datasets for training and testing are the same as those for fine-tuning and testing in SSL, respectively. In consideration of the appliance types across datasets, the five appliances with the most frequent usage are chosen to be disaggregated, including fridge (F), microwave (M), dishwasher (DW), washing machine (WM), and kettle (K). For each appliance, the data for fine-tuning its network come from houses heuristically picked based on [30].

B. Evaluation Metrics

Since various metrics have been applied to evaluate NILM performance in previous works, three widely used metrics are selected in this article. First, mean absolute error (MAE) [4] is applied to quantify the absolute error between the predicted signal and the GT on average. Namely, MAE is formulated as

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} |\hat{x}_t - x_t| \tag{8}$$

where $\hat{x}_t$ denotes the power prediction at time index $t \in [1, T]$, and $x_t$ is the simultaneous actual power measurement. Thus, $T$ is the total number of samples in the signal.

The second metric is the normalized signal aggregate error (SAE) [30], which is defined as

$$\mathrm{SAE} = \frac{|\hat{r} - r|}{r} \tag{9}$$

where $r$ and $\hat{r}$ refer to the actual total energy consumption and its prediction, respectively. SAE measures the relative error of the total energy.

Furthermore, the metric energy per day (EpD) [2] is adopted, which offers the average error of the daily appliance-level energy consumption prediction. EpD is defined as

$$\mathrm{EpD} = \frac{1}{D}\sum_{n=1}^{D} |\hat{e}_n - e_n| \tag{10}$$

where $D$ denotes the total number of days, $e_n$ is the actual daily energy consumption, and $\hat{e}_n$ refers to its prediction. Unlike MAE, where errors are calculated sample-by-sample, corresponding to power, SAE and EpD are calculated for energy consumption. However, EpD reflects daily errors, while SAE is normalized over the entire period.
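As a reference implementation of (8)–(10), a direct NumPy transcription might look as follows; the function and variable names are ours, and the daily energies for EpD are assumed to be precomputed by summing the power samples within each day.

```python
import numpy as np

def mae(x_hat: np.ndarray, x: np.ndarray) -> float:
    """Mean absolute error (8) between predicted and actual power."""
    return float(np.mean(np.abs(x_hat - x)))

def sae(x_hat: np.ndarray, x: np.ndarray) -> float:
    """Normalized signal aggregate error (9): relative error of total energy."""
    r_hat, r = x_hat.sum(), x.sum()
    return float(abs(r_hat - r) / r)

def epd(e_hat_daily: np.ndarray, e_daily: np.ndarray) -> float:
    """Energy per day (10): mean absolute error of daily energy estimates."""
    return float(np.mean(np.abs(e_hat_daily - e_daily)))
```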
C. Experimental Settings

Our program is implemented in Python using TensorFlow. The networks were trained on machines with an NVIDIA GeForce RTX 2080 Ti. Networks are trained by the ADAM optimizer algorithm, with an early-stopping mechanism to prevent overfitting as in [2], where the optimal network is picked with a validation loss lower than those in the following
six epochs. The batch size is set to 512 for the REDD dataset, as in Cases 3 and 4, while in the other cases, it is set to 1024 for the REFIT and U.K.-DALE datasets to accelerate data processing. Meanwhile, we set the learning rate to 0.001 as in [2]. The data normalization used in our experiments is the same as in [9]. In Bi-GRU, the dropout rate is set to 0.5. Since the data sampling rates in this article differ from those in [10] and [22], the window lengths for Bi-GRU and LSTM are heuristically characterized as ten for WMs, due to their longer operational duration, and five for other appliances, for guaranteeing the same temporal settings. For a similar reason, the general window length is set to 79 for all appliances in S2p, based on that set in [9].

TABLE IV
EXPERIMENTAL RESULTS FOR S2P. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW
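Put together, the settings above translate roughly into the sketch below: seq2point windows of length 79 whose target is the appliance power at the window midpoint, Adam with a 0.001 learning rate, a batch size of 1024, and early stopping that keeps the weights with the best validation loss over a six-epoch patience. The helper and data names are our own assumptions, not the authors' code.

```python
import numpy as np
import tensorflow as tf

WINDOW = 79  # general S2p window length used in the experiments

def make_seq2point_pairs(aggregate: np.ndarray, appliance: np.ndarray):
    """Slide a length-79 window over the aggregate signal; the target of
    each window is the appliance power at the window midpoint."""
    half = WINDOW // 2
    windows = np.stack([aggregate[i:i + WINDOW]
                        for i in range(len(aggregate) - WINDOW + 1)])
    targets = appliance[half:len(appliance) - half]
    return windows[..., np.newaxis], targets

# One plausible reading of the early-stopping rule: stop once the
# validation loss has not improved for six epochs, and restore the
# best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=6, restore_best_weights=True)

# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#               loss="mse")
# model.fit(x_train, y_train, batch_size=1024,
#           validation_split=0.1, callbacks=[early_stop])
```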
V. EXPERIMENTAL RESULTS

In this section, the overall experimental results are initially demonstrated: for all cases, SSL is evaluated against ZSL on the S2p, Bi-GRU, and LSTM networks in three metrics for each appliance. Then, the total energy consumption per appliance estimated in all experiments is illustrated, followed by a series of examples of disaggregated power signals. Finally, SSL and ZSL are compared for their total training time and execution time per sample. For simplification, SSL with full fine-tuning is abbreviated as SSL. The appliance abbreviations are: F for fridge, M for microwave, DW for dishwasher, WM for washing machine, and K for kettle.
TABLE V
EXPERIMENTAL RESULTS FOR BI-GRU. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW

TABLE VI
EXPERIMENTAL RESULTS FOR LSTM. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW
in the REFIT dataset. Such distinction in load characteristics also affects the disaggregation results for M. MAE results for both M and DW are increased by applying SSL to Bi-GRU, while in Case 3, the superiority of S2p-SSL in both SAE and EpD is caused by lower power estimation for misidentified loads. In Case 3, although EpD results are not improved by LSTM-SSL compared to LSTM-ZSL, the improvements on the other two metrics are significant, while in Case 4, dominating performance is achieved by LSTM-SSL. Since SSL generally outperforms others in Cases 3 and 4, the results show the knowledge transferability between the REDD dataset and the REFIT dataset.

For the remaining experiments in Cases 5 and 6, data for self-supervised pretraining in SSL and testing in both ZSL and SSL are selected from the U.K.-DALE dataset. From Table IV, applying SSL to S2p generally achieves the best disaggregation performance in both cases. Similar experimental results can be found in Cases 5 and 6 on the LSTM framework, showing general superiority of LSTM-SSL. Slightly poor performance for M in Case 5 is because of waveform-wise and temporal pattern distinction between the datasets for pretraining and testing. However, improvement is tiny for Bi-GRU. Moreover, replacing the data collected from a single house by that from multiple houses while maintaining the data size seems not to have a certain impact on load disaggregation performance; e.g., the close performance for fridges in Cases 5 and 6 is due to the high similarity in power characteristics of their operational cycles. Significant performance improvement is achieved by SSL for M in Cases 5 and 6, due to less misidentification. Bi-GRU-SSL outperforms Bi-GRU-ZSL for DW in energy estimation; however, Bi-GRU-ZSL achieves better sample-by-sample disaggregation results.

It can be concluded from the whole experimental results that self-supervised pretraining helps NILM performance improvement against ZSL, especially for S2p. Furthermore, S2p-SSL generally performs better than Bi-GRU-SSL and LSTM-SSL across metrics. With respect to transferability between datasets, the overall results show that the U.K.-DALE and REFIT datasets, both collected from the U.K., have stronger transferable potential than datasets from distinct regions. Moreover, the increase in pretraining data size leads to performance improvement for SSL. In terms of disaggregation performance for each appliance, the poor results for M are achieved in various cases, as little information can be captured
due to its short duration and low usage frequency. For more complex WM loads with multiple operational states, their characteristics differ a lot across houses and datasets, resulting in underestimation for power ranges and overestimation for operation times.

Fig. 7. Energy consumption estimation by ZSL and SSL with GT. (a) Case 1 and Case 2. (b) Case 3 and Case 4. (c) Case 5 and Case 6.

B. Energy Consumption Estimation

The power consumption estimation results for each appliance are demonstrated in Fig. 7. In both Cases 1 and 2, power consumption close to the GT is estimated by the various methods for F, M, and K, with ON/OFF operational states. However, they differ in power consumption estimation for multistate DW and WM, respectively, where Bi-GRU and LSTM suffer from overestimation. Around 8% and 16% of the samples in the power signals for DW and WM in the fine-tuning sets are in stand-by states below 10 W, collected from both REFIT House 5 and House 7. The stand-by power samples are estimated by Bi-GRU and LSTM for DW and WM, while they cannot be observed in the testing set. Similar results can be observed in Cases 3 and 4. Moreover, the generally worse performance of Bi-GRU and LSTM than S2p in estimating power consumption for F in Cases 3 and 4 is also caused by the distinction between the power characteristics of F in the fine-tuning set and those in the testing set. The worse performance of Bi-GRU and LSTM as underestimation for K in Cases 5 and 6 can be explained by the same reason. Thus, we can conclude that Bi-GRU and LSTM are more sensitive to the distinction between the distributions of the training and testing sets. Note that unknown loads consume 73% of the total energy in Cases 1 and 2, 65% in Cases 3 and 4, and 56% in Cases 5 and 6; thus, overestimation is more serious in Cases 1 and 2 due to more misidentification.
Fig. 8. Disaggregated power signals for U.K.-DALE House 2 in Case 6. (a) Microwave (M). (b) Fridge (F). (c) Dishwasher (DW). (d) Washing machine (WM). (e) Kettle (K).
TABLE VII
EXECUTION TIME PER SAMPLE FOR ALL METHODS. TIME UNIT IS SECOND/MEGABYTE

SSL is longer than that of ZSL by about 49 s, while for LSTM, the mean execution time per sample of ZSL is longer, mainly contributed by F, as its power characteristics varying across houses in the training dataset result in a longer convergence duration. However, in SSL, the pretext task helps to accelerate convergence in the downstream task.

VI. CONCLUSION

In this article, SSL is applied to the state-of-the-art NILM solutions based on S2p, Bi-GRU, and LSTM. A pretext task is proposed to pretrain a general network to map aggregate power sequences to derived representatives, where labeled data for the target dataset are not required. By performing a supervised downstream task based on the labeled data from source datasets, the pretrained network is fine-tuned with feature transference and then applied to disaggregate loads for the target sites. The publicly accessible REDD, U.K.-DALE, and REFIT datasets are utilized for validation, with multiple experimental cases designed to evaluate the performance of SSL using data within the same set and across datasets. For a comprehensive NILM performance demonstration, various metrics are employed, showing that SSL generally outperforms the ZSL scheme. Specifically, the features learned via SSL help to improve underestimation, reduce misidentification, and mitigate the influence of wrong labels. Such conclusions are also supported by the disaggregated power signals for each appliance and their total power consumption estimation results. Thus, SSL contributes to network refinement and disaggregation performance improvement, with a limited increment in execution time per sample.

Future work includes investigating the influence of activation balancing for compensating the skew of training data, analyzing the impact of the sampling frequency of the data for fine-tuning in the source domain on NILM performance, and applying an autoencoder to carry out the pretext task.

REFERENCES

[1] Y. Yu, B. Liu, and W. Luan, "Nonintrusive residential load monitoring and decomposition technology," Southern Power Syst. Technol., vol. 7, no. 4, pp. 1–5, Aug. 2013.
[2] M. D'Incecco, S. Squartini, and M. Zhong, "Transfer learning for non-intrusive load monitoring," IEEE Trans. Smart Grid, vol. 11, no. 2, pp. 1419–1429, Mar. 2020.
[3] B. Liu and W. Luan, "Conceptual cloud solution architecture and application scenarios of power consumption data based on load disaggregation," Power Syst. Technol., vol. 40, no. 3, pp. 791–796, 2016.
[4] G. Cui, B. Liu, W. Luan, and Y. Yu, "Estimation of target appliance electricity consumption using background filtering," IEEE Trans. Smart Grid, vol. 10, no. 6, pp. 5920–5929, Nov. 2019.
[5] G. W. Hart, "Nonintrusive appliance load data acquisition method: Progress report," MIT Energy Lab., Cambridge, MA, USA, Tech. Rep., 1984.
[6] M. Zeifman and K. Roth, "Nonintrusive appliance load monitoring: Review and outlook," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2011, pp. 76–84.
[7] P. Huber, A. Calatroni, A. Rumsch, and A. Paice, "Review on deep neural networks applied to low-frequency NILM," Energies, vol. 14, no. 9, p. 2390, Apr. 2021.
[8] J. R. Herrero et al., "Non intrusive load monitoring (NILM): A state of the art," in Proc. Int. Conf. Practical Appl. Agents Multi-Agent Syst. Cham, Switzerland: Springer, 2017, pp. 125–138.
[9] C. Zhang, M. Zhong, Z. Wang, N. Goddard, and C. Sutton, "Sequence-to-point learning with neural networks for non-intrusive load monitoring," in Proc. AAAI Conf. Artif. Intell., 2018, vol. 32, no. 1, pp. 1–8.
[10] O. Krystalakos, C. Nalmpantis, and D. Vrakas, "Sliding window approach for online energy disaggregation using artificial neural networks," in Proc. 10th Hellenic Conf. Artif. Intell., Jul. 2018, pp. 1–6.
[11] B. Zhao, M. Ye, L. Stankovic, and V. Stankovic, "Non-intrusive load disaggregation solutions for very low-rate smart meter data," Appl. Energy, vol. 268, Jun. 2020, Art. no. 114949.
[12] M. Zhong, N. Goddard, and C. Sutton, "Signal aggregate constraints in additive factorial HMMs, with application to energy disaggregation," in Proc. 27th Int. Conf. Neural Inf. Process. Syst., vol. 27. Cambridge, MA, USA: MIT Press, 2014, pp. 3590–3598.
[13] J. Z. Kolter and T. Jaakkola, "Approximate inference in additive factorial HMMs with application to energy disaggregation," in Proc. 15th Int. Conf. Artif. Intell. Statist., 2012, pp. 1472–1482.
[14] M. Zhong, N. Goddard, and C. Sutton, "Interleaved factorial non-homogeneous hidden Markov models for energy disaggregation," 2014, arXiv:1406.7665.
[15] L. Mauch and B. Yang, "A new approach for supervised power disaggregation by using a deep recurrent LSTM network," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Orlando, FL, USA, Dec. 2015, pp. 63–67.
[16] L. Mauch and B. Yang, "A novel DNN-HMM-based approach for extracting single loads from aggregate power signals," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Shanghai, China, Mar. 2016, pp. 2384–2388.
[17] P. G. Donato, A. Hernandez, M. A. Funes, I. Carugati, R. Nieto, and J. Urena, "Review of NILM applications in smart grids: Power quality assessment and assisted independent living," in Proc. Argentine Conf. Autom. Control (AADECA), Oct. 2020, pp. 1–6.
[18] N. Henao, K. Agbossou, S. Kelouwani, Y. Dubé, and M. Fournier, "Approach in nonintrusive type I load monitoring using subtractive clustering," IEEE Trans. Smart Grid, vol. 8, no. 2, pp. 812–821, Mar. 2017.
[19] B. Zhao, L. Stankovic, and V. Stankovic, "Blind non-intrusive appliance load monitoring using graph-based signal processing," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Orlando, FL, USA, Dec. 2015, pp. 68–72.
[20] B. Zhao, L. Stankovic, and V. Stankovic, "On a training-less solution for non-intrusive appliance load monitoring using graph signal processing," IEEE Access, vol. 4, pp. 1784–1799, 2016.
[21] J. Kelly and W. Knottenbelt, "Neural NILM: Deep neural networks applied to energy disaggregation," in Proc. 2nd ACM Int. Conf. Embedded Syst. Energy-Efficient Built Environ., 2015, pp. 55–64.
[22] J. Zhang, J. Sun, J. Gan, Q. Liu, and X. Liu, "Improving domestic NILM using an attention-enabled Seq2Point learning approach," in Proc. IEEE Int. Conf. Dependable, Autonomic Secure Comput., Int. Conf. Pervasive Intell. Comput., Int. Conf. Cloud Big Data Comput., Int. Conf. Cyber Sci. Technol. Congr. (DASC/PiCom/CBDCom/CyberSciTech), AB, Canada, Oct. 2021, pp. 434–439.
[23] T.-T.-H. Le, J. Kim, and H. Kim, "Classification performance using gated recurrent unit recurrent neural network on energy disaggregation," in Proc. Int. Conf. Mach. Learn. Cybern. (ICMLC), Jul. 2016, pp. 105–110.
[24] R. Bonfigli, A. Felicetti, E. Principi, M. Fagiani, S. Squartini, and F. Piazza, "Denoising autoencoders for non-intrusive load monitoring: Improvements and comparative evaluation," Energy Buildings, vol. 158, pp. 1461–1474, Jan. 2018.
[25] G. Zhou, Z. Li, M. Fu, Y. Feng, X. Wang, and C. Huang, "Sequence-to-sequence load disaggregation using multiscale residual neural network," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, 2021.
[26] J. Chen, X. Wang, X. Zhang, and W. Zhang, "Temporal and spectral feature learning with two-stream convolutional neural networks for appliance recognition in NILM," IEEE Trans. Smart Grid, vol. 13, no. 1, pp. 762–772, Jan. 2022.
[27] D. Yang, X. Gao, L. Kong, Y. Pang, and B. Zhou, "An event-driven convolutional neural architecture for non-intrusive load monitoring of residential appliance," IEEE Trans. Consum. Electron., vol. 66, no. 2, pp. 173–182, May 2020.
[28] P. Paulo, "Applications of deep learning techniques on NILM," Ph.D. dissertation, Dept. Elect. Eng., Federal Univ. Rio de Janeiro, Rio de Janeiro, Brazil, 2016.
[29] D. Kelly, "Disaggregation of domestic smart meter energy data," Ph.D. dissertation, Dept. Comput., Imperial College London, London, U.K., 2016.
[30] L. Wang, S. Mao, and R. M. Nelms, "Transformer for nonintrusive load monitoring: Complexity reduction and transferability," IEEE Internet Things J., vol. 9, no. 19, pp. 18987–18997, Oct. 2022.
[31] J. Barber, H. Cuayáhuitl, M. Zhong, and W. Luan, "Lightweight non-intrusive load monitoring employing pruned sequence-to-point learning," in Proc. 5th Int. Workshop Non-Intrusive Load Monitor., Nov. 2020, pp. 11–15.
[32] A. M. A. Ahmed, Y. Zhang, and F. Eliassen, "Generative adversarial networks and transfer learning for non-intrusive load monitoring in smart grids," in Proc. IEEE Int. Conf. Commun., Control, Comput. Technol. Smart Grids (SmartGridComm), Tempe, AZ, USA, Nov. 2020, pp. 1–7.
[33] Z. Yue, C. R. Witzig, D. Jorde, and H. Jacobsen, "BERT4NILM: A bidirectional transformer model for non-intrusive load monitoring," in Proc. 5th Int. Workshop Non-Intrusive Load Monit., 2020, pp. 89–93.
[34] L. Wang, S. Mao, B. M. Wilamowski, and R. M. Nelms, "Pre-trained models for non-intrusive appliance load monitoring," IEEE Trans. Green Commun. Netw., vol. 6, no. 1, pp. 56–68, Mar. 2022.
[35] D. Murray, L. Stankovic, V. Stankovic, S. Lulic, and S. Sladojevic, "Transferability of neural network approaches for low-rate energy disaggregation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton, U.K., May 2019, pp. 8330–8334.
[36] D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song, "Using self-supervised learning can improve model robustness and uncertainty," 2019, arXiv:1906.12340.
[37] L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 11, pp. 4037–4058, Nov. 2021.
[38] S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, and Y. Bengio, "Learning problem-agnostic speech representations from multiple self-supervised tasks," 2019, arXiv:1904.03416.
[39] J. Z. Kolter and M. J. Johnson, "REDD: A public data set for energy disaggregation research," in Proc. Workshop Data Mining Appl. Sustainability (SIGKDD), San Diego, CA, USA, 2011, pp. 59–62.
[40] J. Kelly and W. Knottenbelt, "The U.K.-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five U.K. homes," Sci. Data, vol. 2, no. 1, pp. 1–14, 2015.
[41] D. Murray, L. Stankovic, and V. Stankovic, "An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study," Sci. Data, vol. 4, no. 1, pp. 1–12, 2017.

Bochao Zhao (Member, IEEE) received the B.Eng. degree (Hons.) and the Ph.D. degree, both in electronic and electrical engineering, from the University of Strathclyde, Glasgow, U.K., in 2014 and 2020, respectively. He is currently a Lecturer with the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. His current research interests include demand response, energy trading in virtual power plants, energy disaggregation, and graph signal processing applications.

Mingjun Zhong (Member, IEEE) received the B.Sc. and Ph.D. degrees in applied mathematics from the Dalian University of Technology, Dalian, China, in July 1999 and July 2004, respectively. He was an Associate Professor with Dalian Nationalities University, Dalian. He was a Post-Doctoral Researcher with INRIA Rennes, Rennes, France, from December 2015 to December 2016. He was a Research Assistant with the School of Computing, University of Glasgow, Glasgow, U.K., from February 2008 to October 2011. He was a Research Associate with the School of Informatics, The University of Edinburgh, Edinburgh, U.K., from July 2013 to April 2017. He was an Associate Professor with the Dalian University of Technology. He was a Senior Lecturer with the University of Lincoln, Lincoln, U.K. He is currently a Lecturer with the Department of Computing Science, University of Aberdeen, Aberdeen, U.K. His current research interests include machine learning, computational statistics, and artificial intelligence.

Wenpeng Luan (Senior Member, IEEE) received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 1986, the M.S. degree in electrical engineering from Tianjin University, Tianjin, China, in 1989, and the Ph.D. degree in electrical engineering from Strathclyde University, Glasgow, U.K., in 1999. After long industry working experience, he joined Tianjin University as a Professor in 2019. His current research interests include smart metering data analytics, nonintrusive load monitoring, distribution system analysis, renewable energy resource integration, and utility advanced applications.