Nonintrusive Load Monitoring Based on Self-Supervised Learning

Abstract— Deep learning models for nonintrusive load monitoring (NILM) tend to require a large amount of labeled data for training. However, it is difficult to generalize the trained models to unseen sites due to different load characteristics and operating patterns of appliances between datasets. To address such problems, self-supervised learning (SSL) is proposed in this article, where labeled appliance-level data from the target dataset or house are not required. Initially, only the aggregate power readings from the target dataset are required to pretrain a general network via a self-supervised pretext task that maps aggregate power sequences to derived representatives. Then, supervised downstream tasks are carried out for each appliance category to fine-tune the pretrained network, where the features learned in the pretext task are transferred. Utilizing labeled source datasets enables the downstream tasks to learn how each load is disaggregated, by mapping the aggregate to labels. Finally, the fine-tuned network is applied to load disaggregation for the target sites. For validation, multiple experimental cases are designed based on the three publicly accessible REDD, U.K.-DALE, and REFIT datasets. Besides, state-of-the-art neural networks are employed to perform the NILM task in the experiments. Based on the NILM results in various cases, SSL generally outperforms zero-shot learning in improving load disaggregation performance without any submetering data from the target datasets.

Index Terms— Deep neural network (DNN), nonintrusive load monitoring (NILM), self-supervised learning (SSL), sequence-to-point learning.

Manuscript received 23 October 2022; accepted 27 January 2023. Date of publication 22 February 2023; date of current version 2 March 2023. This work was supported in part by the Joint Funds of the National Natural Science Foundation of China under Grant U2066207 and in part by the National Key Research and Development Program of China under Grant 2020YFB0905904. The Associate Editor coordinating the review process was Dr. Guglielmo Frigo. (Corresponding author: Wenpeng Luan.)

Shuyi Chen, Bochao Zhao, Wenpeng Luan, and Yixin Yu are with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: [email protected]).

Mingjun Zhong is with the Department of Computing Science, University of Aberdeen, AB24 3UE Aberdeen, U.K. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIM.2023.3246504

I. INTRODUCTION

IN RECENT years, energy shortage and environmental pollution worldwide have become increasingly serious. Therefore, approaches for efficient energy utilization and carbon emissions reduction are being explored [1], [2]. Meanwhile, with the global deployment of smart meters, benign interaction between power suppliers and users has been established for enhancing demand side management and optimizing power grid operation [3]. As one of the energy conservation applications, electricity consumption detail monitoring has attracted extensive attention around the world [4]. In general, load monitoring technology is mainly categorized into an intrusive way and a nonintrusive way. Note that intrusive load monitoring requires extra sensor installation for submetering. Alternatively, the concept of nonintrusive load monitoring (NILM) was proposed by Hart [5] as identifying the power consumed by each individual appliance via analyzing aggregate power readings using only software tools. NILM offers appliance-level power consumption feedback to both the demand and supply sides economically and efficiently, contributing to power system planning and operation [1], energy bill savings [6], demand side management [7], and energy conservation and emission reduction [3], [6], [8].

NILM is a single-channel blind source separation problem, aiming to disaggregate the appliance-level energy consumption from the aggregate measurements [9]. Combinatorial optimization (CO) was initially applied to perform NILM in [5], searching for the best combination of operational states of individual appliances at each time instance. However, CO relies on the power range of each operational state as prior knowledge, making it unavailable to newly added appliances [10]. Benefiting from the recent technology development in big data, artificial intelligence, and edge computing, plenty of NILM approaches have been proposed based on machine learning, mathematics, and signal processing [8], [11]. The factorial hidden Markov model (FHMM) and its variants [12], [13], [14] are popular in carrying out NILM. Given an aggregate power signal as the observation, such FHMM-based NILM methods estimate the hidden operational states of each appliance considering their state continuity in time series [15], [16]. Thus, FHMM-based methods usually achieve good results in disaggregating loads with periodic operation, such as refrigerators. However, their performance is limited for loads with short-lasting working cycles and ones with less frequent usage. Note that FHMM-based methods are regarded as state-based NILM approaches, where the aggregate power measurement at each time instance is assigned to each operational state per appliance [17]. Alternatively, NILM approaches can be event-based, where sudden changes in power signals referring to turn-on, turn-off, and state-transition events are featured [17]. Such event-based NILM methods can be carried out via subtractive clustering and the maximum likelihood classifier [18]. Besides, graph signal processing concepts are applied to perform NILM, mapping the correlation among samples to an underlying graph structure [19], [20].
Although such event-based NILM approaches can achieve high load identification accuracy, they tend to suffer from measurement noise.

Deep neural networks (DNNs), performing well in computer vision, speech recognition, and natural language processing, have been employed in load disaggregation since 2015 [21]. Since then, DNN structures have become more and more popular for performing NILM, including long short-term memory (LSTM) [15], [21], [22], gated recurrent unit (GRU) [10], [23], denoising autoencoder (dAE) [21], [24], and convolutional neural network (CNN) [25], [26], [27], showing competitive performance against traditional NILM methods. Although LSTM is suitable for long time-series-related tasks due to avoiding the vanishing gradient problem, it underperforms in the NILM task compared to CNN [28], [29]. An attention mechanism is applied to LSTM in [22] for improving load disaggregation performance. As a variant of LSTM, GRU can also remember data patterns. In addition, GRU contains fewer parameters and thus requires shorter training time, which is suitable for online application in NILM [10], [30]. Note that the bidirectional GRU (Bi-GRU) is employed to perform NILM in [10], where the network can be trained simultaneously in positive and negative time directions. Besides, dAE is applied to NILM by recovering the power signal of target appliances (clean signal) from the aggregate (noisy signal) [8], [21], where CNN layers are usually embedded [21], [24]. A state-of-the-art CNN-based NILM method, S2p, is proposed in [9] and claimed to outperform benchmarks in the NILM task [9], [31], benefiting from meaningful latent features learned from submetering data. Recently, more improved network structures have been applied to NILM, such as generative adversarial networks (GANs) [32] and transformers [33]. The GAN framework helps to improve appliance profile estimation when performing NILM, as more outputs close to the ground truth (GT) are generated in network training. In terms of transformers, the attention mechanism is advantageous to NILM via capturing long-range dependency from sequential data [30].

Compared to traditional NILM methods, the advantages of DNN-based NILM approaches include automatic feature extraction from power readings and linearity between computational complexity and appliance amount [4]. However, the promising performance of the aforementioned DNN-based methods relies on a large amount of submetering data from the target set for training [4]. Since such data collection may last for months or even years [4], it is neither user-friendly nor economical in practice.

Alternatively, transfer learning concepts have been proposed, where transferable networks can be trained on a source (seen) dataset and applied to the load disaggregation task on a target (unseen) dataset [2], [32], [34], [35]. Depending on whether network fine-tuning is required, transfer learning can be classified into few-shot learning (FSL) and zero-shot learning (ZSL) [34]. For fine-tuning in FSL, a small amount of labeled data from the target set is still required [34]. However, when labels are unable to be captured from the target datasets, ZSL offers proper solutions. In [35], ZSL achieves a tiny performance drop compared to the baseline when it is employed in load disaggregation by both GRU and CNN networks, showing transferability across datasets. However, in ZSL, it is difficult to generalize networks between datasets with different load characteristics and operating patterns of appliances. The same as ZSL, self-supervised learning (SSL) requires no labeled data from the target set. SSL is an efficient way to extract universal features from large-scale unlabeled data, contributing to robustness enhancement [36]; thus, it performs well in image processing and speech recognition [37]. To the best of our knowledge, SSL has not been used to solve the NILM problem.

Driven by such research gaps, in this article, SSL is applied to three state-of-the-art NILM algorithms based on CNN, GRU, and LSTM, namely S2p [9], Bi-GRU [10], and LSTM [22]. For performing NILM, a self-supervised pretext task training is initially carried out for learning features from the aggregate power readings of the unlabeled data in the target set. Then, the pretrained network is fine-tuned in the supervised downstream task training based on the labeled data from the source set for transferring the prelearned knowledge to load disaggregation. After pretraining and fine-tuning, the network can be applied to load disaggregation for target sites. The proposed method is validated on real-world datasets at 1-min granularity, in scenarios designed for the same dataset or across various datasets. The contributions of this article are clarified as follows.

1) SSL is applied to load disaggregation based on deep learning without submetering on the target set, by setting a pretext task for network pretraining on unlabeled data from the target set with fine-tuning.
2) Experiments are carried out for all combinations of three state-of-the-art DNN-based NILM methods (S2p [9], Bi-GRU [10], and LSTM [22]) and learning frameworks (SSL and ZSL), on three real-world datasets.
3) Six cases differing in data selection are designed for performance evaluation on data across houses or sets, showing that SSL generally outperforms in various metrics and energy consumption estimation results, with comparable training time cost.

The rest of this article is organized as follows. In Section II, the NILM formulation is clarified, followed by the preliminaries for NILM neural networks and SSL. The methodology of SSL for NILM is explained in Section III. Section IV contains datasets, evaluation metrics, and experimental settings, followed by experimental results with discussion illustrated in Section V; eventually, the conclusion is drawn and future work is prospected in Section VI.

II. PRELIMINARIES

In this section, we first formulate the NILM problem and then clarify seq2seq and seq2point concepts, followed by an introduction to two seq2point network architectures. Finally, the overall structure of SSL is demonstrated.

A. NILM Problem Formulation

Assume that the aggregate power reading measured in a household at time index t ∈ [1, T] is y_t, where T refers to the total number of samples. Then, the simultaneous power
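In the standard additive NILM model (a sketch consistent with [5] and [9]; the symbols $x_t^{(m)}$ for the simultaneous power of appliance $m$ among $M$ appliances and $e_t$ for the unmetered residual are our notation, not necessarily the authors'), the aggregate reading decomposes as

$$
y_t = \sum_{m=1}^{M} x_t^{(m)} + e_t, \qquad t \in [1, T].
$$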
TABLE I
NETWORK SPECIFICATIONS FOR S2P [9]
Fig. 4. Self-supervised pretext task layout.

Fig. 5. Supervised downstream task layout.

As shown in Fig. 4, aggregate measurement sequences as sliding windows are iteratively imported into the networks, while the network outputs are their midpoints or ending points.

B. Supervised Downstream Task Training

To transfer the features extracted from the "unseen" target dataset without labels to the load disaggregation task, the network is updated iteratively. That is, the network parameters are further fine-tuned on the labeled data from the source set, as a supervised downstream task training. By carrying out the downstream task, the general feature representation learned via the pretext task contributes to predicting individual appliance power. As described in Section II, such a network is fine-tuned with the mains power in each sequence window as input and the appliance-level power estimation at its midpoint or ending point as output. Thus, the downstream task training for a typical appliance m is illustrated in Fig. 5.

Note that parameter fine-tuning can be carried out on either partial network layers or all layers, defined as partial fine-tuning and full fine-tuning, respectively. For performing partial fine-tuning on both the S2p and Bi-GRU architectures, only their dense layers and output layers are fine-tuned, while the other layers are frozen. In a DNN, general low-level features can be captured in shallow layers, while in deep layers, high-level features related to the task tend to be extracted [37]. Therefore, partial fine-tuning is usually applied to tasks with knowledge transfer for sharing common features and thus guaranteeing efficiency.
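As an illustration of the partial fine-tuning just described, the following minimal TensorFlow/Keras sketch freezes every layer of a pretrained seq2point-style network except its dense (fully connected) and output layers; the file name and data names are hypothetical, not part of the authors' released code.

```python
import tensorflow as tf

# Load the network pretrained on the self-supervised pretext task
# (the file name is hypothetical).
model = tf.keras.models.load_model("s2p_pretext_pretrained.h5")

# Partial fine-tuning: only dense layers (including the output layer,
# itself a dense layer) stay trainable; all other layers are frozen.
for layer in model.layers:
    layer.trainable = isinstance(layer, tf.keras.layers.Dense)

# Recompile so the new trainable flags take effect, then fine-tune on
# labeled (aggregate window -> appliance power) pairs from the source set.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
# model.fit(source_windows, source_targets, batch_size=1024, epochs=50)
```

Full fine-tuning corresponds to leaving every `trainable` flag set to `True` instead.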
However, it is claimed in [38] that fine-tuning all network
TABLE III
DATA SELECTION FOR TESTING CASES
from 20 U.K. houses from 2013 to 2015. Since the data from the REDD, U.K.-DALE, and REFIT datasets are collected by standard monitoring equipment and widely used in validating NILM approaches, their measurement uncertainty is acceptable, with negligible influence on NILM performance. Compared to the other two datasets, the REFIT dataset is noisier due to more unknown loads. Note that all data are downsampled to 1 min as low-rate measurements. Since the REFIT dataset is the largest [2], containing more houses with more appliances, it is chosen for fine-tuning in SSL and training in ZSL. The experimental data selected for pretext task training, downstream task training, and testing in SSL are clarified in Table III, as well as those for both training and testing in ZSL.

For exploring the performance boundary of SSL, six experimental cases are settled. In both Cases 1 and 2, pretraining, fine-tuning, and testing in SSL are carried out among houses from the same REFIT dataset. Cases 1 and 2 share the same data from the same house for testing, while they differ in the houses for building the pretraining dataset. Data from single House 2 are used for both pretraining and testing in Case 1, while more houses other than House 2 are used for pretraining in Case 2. Similar settings are employed for Cases 3 and 4; however, the data for both pretraining and testing are from REDD houses. For further investigating the impact of pretraining house selection on load disaggregation, we set the size of pretraining data from various houses in Case 6 equivalent to that from single House 1 in Case 5. In terms of the ZSL benchmarks, the datasets for training and testing are the same as those for fine-tuning and testing in SSL, respectively. In consideration of the appliance types across datasets, the five appliances with the most frequent usage are chosen to be disaggregated, including fridge (F), microwave (M), dishwasher (DW), washing machine (WM), and kettle (K). For each appliance, the data for fine-tuning its network come from houses heuristically picked based on [30].

B. Evaluation Metrics

Since various metrics have been applied to evaluate NILM performance in previous works, three widely used metrics are selected in this article. First, mean absolute error (MAE) [4] is applied to quantify the absolute error between the predicted signal and the GT on average. Namely, MAE is formulated as

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} |\hat{x}_t - x_t| \tag{8}$$

where $\hat{x}_t$ denotes the power prediction at time index $t \in [1, T]$, and $x_t$ is the simultaneous actual power measurement. Thus, $T$ is the total number of samples in the signal.

The second metric is the normalized signal aggregate error (SAE) [30], which is defined as

$$\mathrm{SAE} = \frac{|\hat{r} - r|}{r} \tag{9}$$

where $r$ and $\hat{r}$ refer to the actual total energy consumption and its prediction, respectively. SAE measures the relative error of the total energy.

Furthermore, the metric energy per day (EpD) [2] is adopted, which offers the average error of the daily appliance-level energy consumption prediction. EpD is defined as

$$\mathrm{EpD} = \frac{1}{D}\sum_{n=1}^{D} |\hat{e}_n - e_n| \tag{10}$$

where $D$ denotes the total number of days, $e_n$ is the actual daily energy consumption, and $\hat{e}_n$ refers to its prediction. Unlike MAE, where errors are calculated sample-by-sample, corresponding to power, SAE and EpD are calculated for energy consumption. However, EpD reflects daily errors, while SAE is normalized over the entire period.
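As a reference implementation of (8)–(10), a direct NumPy transcription might look as follows; the function and variable names are ours, and the daily energies for EpD are assumed to be precomputed by summing the power samples within each day.

```python
import numpy as np

def mae(x_hat: np.ndarray, x: np.ndarray) -> float:
    """Mean absolute error (8) between predicted and actual power."""
    return float(np.mean(np.abs(x_hat - x)))

def sae(x_hat: np.ndarray, x: np.ndarray) -> float:
    """Normalized signal aggregate error (9): relative error of total energy."""
    r_hat, r = x_hat.sum(), x.sum()
    return float(abs(r_hat - r) / r)

def epd(e_hat_daily: np.ndarray, e_daily: np.ndarray) -> float:
    """Energy per day (10): mean absolute error of daily energy estimates."""
    return float(np.mean(np.abs(e_hat_daily - e_daily)))
```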
C. Experimental Settings

Our program is implemented in Python using TensorFlow. The networks were trained on machines with an NVIDIA GeForce RTX 2080 Ti. Networks are trained by the ADAM optimizer algorithm, with an early-stopping mechanism to prevent overfitting as in [2], where the optimal network is picked with a validation loss lower than those in the following
six epochs. The batch size is set to 512 for the REDD dataset, as in Cases 3 and 4, while in the other cases, it is set to 1024 for the REFIT and U.K.-DALE datasets to accelerate data processing. Meanwhile, we set the learning rate to 0.001 as in [2]. The data normalization used in our experiments is the same as in [9]. In Bi-GRU, the dropout rate is set to 0.5. Since the data sampling rates in this article differ from those in [10] and [22], the window lengths for Bi-GRU and LSTM are heuristically characterized as ten for WMs, due to their longer operational duration, and five for other appliances, for guaranteeing the same temporal settings. For a similar reason, the general window length is set to 79 for all appliances in S2p, based on that set in [9].

TABLE IV
EXPERIMENTAL RESULTS FOR S2P. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW
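Put together, the settings above translate roughly into the sketch below: seq2point windows of length 79 whose target is the appliance power at the window midpoint, Adam with a 0.001 learning rate, a batch size of 1024, and early stopping that keeps the weights with the best validation loss over a six-epoch patience. The helper and data names are our own assumptions, not the authors' code.

```python
import numpy as np
import tensorflow as tf

WINDOW = 79  # general S2p window length used in the experiments

def make_seq2point_pairs(aggregate: np.ndarray, appliance: np.ndarray):
    """Slide a length-79 window over the aggregate signal; the target of
    each window is the appliance power at the window midpoint."""
    half = WINDOW // 2
    windows = np.stack([aggregate[i:i + WINDOW]
                        for i in range(len(aggregate) - WINDOW + 1)])
    targets = appliance[half:len(appliance) - half]
    return windows[..., np.newaxis], targets

# One plausible reading of the early-stopping rule: stop once the
# validation loss has not improved for six epochs, and restore the
# best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=6, restore_best_weights=True)

# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
#               loss="mse")
# model.fit(x_train, y_train, batch_size=1024,
#           validation_split=0.1, callbacks=[early_stop])
```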
V. EXPERIMENTAL RESULTS

In this section, the overall experimental results are initially demonstrated: for all cases, SSL is evaluated against ZSL on the S2p, Bi-GRU, and LSTM networks in three metrics for each appliance. Then, the total energy consumption per appliance estimated in all experiments is illustrated, followed by a series of examples of disaggregated power signals. Finally, SSL and ZSL are compared for their total training time and execution time per sample. For simplification, SSL with full fine-tuning is abbreviated as SSL. The appliance abbreviations are: F for fridge, M for microwave, DW for dishwasher, WM for washing machine, and K for kettle.
TABLE V
EXPERIMENTAL RESULTS FOR BI-GRU. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW

TABLE VI
EXPERIMENTAL RESULTS FOR LSTM. BOLD NUMBERS REFER TO THE BEST RESULTS IN A ROW
in the REFIT dataset. Such distinction in load characteristics also affects the disaggregation results for M. MAE results for both M and DW are increased by applying SSL to Bi-GRU, while in Case 3, the superiority of S2p-SSL in both SAE and EpD is caused by lower power estimation for misidentified loads. In Case 3, although EpD results are not improved by LSTM-SSL compared to LSTM-ZSL, the improvements on the other two metrics are significant, while in Case 4, dominating performance is achieved by LSTM-SSL. Since SSL generally outperforms others in Cases 3 and 4, the results show the knowledge transferability between the REDD dataset and the REFIT dataset.

For the remaining experiments in Cases 5 and 6, data for self-supervised pretraining in SSL and testing in both ZSL and SSL are selected from the U.K.-DALE dataset. From Table IV, applying SSL to S2p generally achieves the best disaggregation performance in both cases. Similar experimental results can be found in Cases 5 and 6 on the LSTM framework, showing general superiority of LSTM-SSL. Slightly poor performance for M in Case 5 is because of waveform-wise and temporal pattern distinction between the datasets for pretraining and testing. However, improvement is tiny for Bi-GRU. Moreover, replacing the data collected from a single house by that from multiple houses while maintaining the data size seems not to have a certain impact on load disaggregation performance; e.g., the close performance for fridges in Cases 5 and 6 is due to the high similarity in power characteristics of their operational cycles. Significant performance improvement is achieved by SSL for M in Cases 5 and 6, due to less misidentification. Bi-GRU-SSL outperforms Bi-GRU-ZSL for DW in energy estimation; however, Bi-GRU-ZSL achieves better sample-by-sample disaggregation results.

It can be concluded from the whole experimental results that self-supervised pretraining helps NILM performance improvement against ZSL, especially for S2p. Furthermore, S2p-SSL generally performs better than Bi-GRU-SSL and LSTM-SSL across metrics. With respect to transferability between datasets, the overall results show that the U.K.-DALE and REFIT datasets, both collected from the U.K., have stronger transferable potential than datasets from distinct regions. Moreover, the increase in pretraining data size leads to performance improvement for SSL. In terms of disaggregation performance for each appliance, the poor results for M are achieved in various cases, as little information can be captured
due to its short duration and low usage frequency. For more complex WM loads with multiple operational states, their characteristics differ a lot across houses and datasets, resulting in underestimation for power ranges and overestimation for operation times.

Fig. 7. Energy consumption estimation by ZSL and SSL with GT. (a) Case 1 and Case 2. (b) Case 3 and Case 4. (c) Case 5 and Case 6.

B. Energy Consumption Estimation

The power consumption estimation results for each appliance are demonstrated in Fig. 7. In both Cases 1 and 2, power consumption close to the GT is estimated by the various methods for F, M, and K, with ON/OFF operational states. However, they differ in power consumption estimation for multistate DW and WM, respectively, where Bi-GRU and LSTM suffer from overestimation. Around 8% and 16% of the samples in the power signals for DW and WM in the fine-tuning sets are in stand-by states below 10 W, collected from both REFIT House 5 and House 7. The stand-by power samples are estimated by Bi-GRU and LSTM for DW and WM, while they cannot be observed in the testing set. Similar results can be observed in Cases 3 and 4. Moreover, the generally worse performance of Bi-GRU and LSTM than S2p in estimating power consumption for F in Cases 3 and 4 is also caused by the distinction between the power characteristics of F in the fine-tuning set and those in the testing set. The worse performance of Bi-GRU and LSTM as underestimation for K in Cases 5 and 6 can be explained by the same reason. Thus, we can conclude that Bi-GRU and LSTM are more sensitive to the distinction between the distributions of the training and testing sets. Note that unknown loads consume 73% of the total energy in Cases 1 and 2, 65% in Cases 3 and 4, and 56% in Cases 5 and 6; thus, overestimation is more serious in Cases 1 and 2 due to more misidentification.
Fig. 8. Disaggregated power signals for U.K.-DALE House 2 in Case 6. (a) Microwave (M). (b) Fridge (F). (c) Dishwasher (DW). (d) Washing machine (WM). (e) Kettle (K).
TABLE VII
EXECUTION TIME PER SAMPLE FOR ALL METHODS. TIME UNIT IS SECOND/MEGABYTE

SSL is longer than that of ZSL by about 49 s, while for LSTM, the mean execution time per sample of ZSL is longer, mainly contributed by F, as its power characteristics varying across houses in the training dataset result in a longer convergence duration. However, in SSL, the pretext task helps to accelerate convergence in the downstream task.

VI. CONCLUSION

In this article, SSL is applied to the state-of-the-art NILM solutions based on S2p, Bi-GRU, and LSTM. A pretext task is proposed to pretrain a general network to map aggregate power sequences to derived representatives, where labeled data for the target dataset are not required. By performing a supervised downstream task based on the labeled data from source datasets, the pretrained network is fine-tuned with feature transference and then applied to disaggregate loads for the target sites. The publicly accessible REDD, U.K.-DALE, and REFIT datasets are utilized for validation, with multiple experimental cases designed to evaluate the performance of SSL using data within the same set and across datasets. For a comprehensive NILM performance demonstration, various metrics are employed, showing that SSL generally outperforms the ZSL scheme. Specifically, the features learned via SSL help to improve underestimation, reduce misidentification, and mitigate the influence of wrong labels. Such conclusions are also supported by the disaggregated power signals for each appliance and their total power consumption estimation results. Thus, SSL contributes to network refinement and disaggregation performance improvement, with a limited increment in execution time per sample.

Future work includes investigating the influence of activation balancing for compensating the skew of training data, analyzing the impact of the sampling frequency of the data for fine-tuning in the source domain on NILM performance, and applying an autoencoder to carry out the pretext task.

REFERENCES

[1] Y. Yu, B. Liu, and W. Luan, "Nonintrusive residential load monitoring and decomposition technology," Southern Power Syst. Technol., vol. 7, no. 4, pp. 1–5, Aug. 2013.
[2] M. D'Incecco, S. Squartini, and M. Zhong, "Transfer learning for non-intrusive load monitoring," IEEE Trans. Smart Grid, vol. 11, no. 2, pp. 1419–1429, Mar. 2020.
[3] B. Liu and W. Luan, "Conceptual cloud solution architecture and application scenarios of power consumption data based on load disaggregation," Power Syst. Technol., vol. 40, no. 3, pp. 791–796, 2016.
[4] G. Cui, B. Liu, W. Luan, and Y. Yu, "Estimation of target appliance electricity consumption using background filtering," IEEE Trans. Smart Grid, vol. 10, no. 6, pp. 5920–5929, Nov. 2019.
[5] G. W. Hart, "Nonintrusive appliance load data acquisition method: Progress report," MIT Energy Lab., Cambridge, MA, USA, Tech. Rep., 1984.
[6] M. Zeifman and K. Roth, "Nonintrusive appliance load monitoring: Review and outlook," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2011, pp. 76–84.
[7] P. Huber, A. Calatroni, A. Rumsch, and A. Paice, "Review on deep neural networks applied to low-frequency NILM," Energies, vol. 14, no. 9, p. 2390, Apr. 2021.
[8] J. R. Herrero et al., "Non intrusive load monitoring (NILM): A state of the art," in Proc. Int. Conf. Practical Appl. Agents Multi-Agent Syst. Cham, Switzerland: Springer, 2017, pp. 125–138.
[9] C. Zhang, M. Zhong, Z. Wang, N. Goddard, and C. Sutton, "Sequence-to-point learning with neural networks for non-intrusive load monitoring," in Proc. AAAI Conf. Artif. Intell., 2018, vol. 32, no. 1, pp. 1–8.
[10] O. Krystalakos, C. Nalmpantis, and D. Vrakas, "Sliding window approach for online energy disaggregation using artificial neural networks," in Proc. 10th Hellenic Conf. Artif. Intell., Jul. 2018, pp. 1–6.
[11] B. Zhao, M. Ye, L. Stankovic, and V. Stankovic, "Non-intrusive load disaggregation solutions for very low-rate smart meter data," Appl. Energy, vol. 268, Jun. 2020, Art. no. 114949.
[12] M. Zhong, N. Goddard, and C. Sutton, "Signal aggregate constraints in additive factorial HMMs, with application to energy disaggregation," in Proc. 27th Int. Conf. Neural Inf. Process. Syst., vol. 27. Cambridge, MA, USA: MIT Press, 2014, pp. 3590–3598.
[13] J. Z. Kolter and T. Jaakkola, "Approximate inference in additive factorial HMMs with application to energy disaggregation," in Proc. 15th Int. Conf. Artif. Intell. Statist., 2012, pp. 1472–1482.
[14] M. Zhong, N. Goddard, and C. Sutton, "Interleaved factorial non-homogeneous hidden Markov models for energy disaggregation," 2014, arXiv:1406.7665.
[15] L. Mauch and B. Yang, "A new approach for supervised power disaggregation by using a deep recurrent LSTM network," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Orlando, FL, USA, Dec. 2015, pp. 63–67.
[16] L. Mauch and B. Yang, "A novel DNN-HMM-based approach for extracting single loads from aggregate power signals," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Shanghai, China, Mar. 2016, pp. 2384–2388.
[17] P. G. Donato, A. Hernandez, M. A. Funes, I. Carugati, R. Nieto, and J. Urena, "Review of NILM applications in smart grids: Power quality assessment and assisted independent living," in Proc. Argentine Conf. Autom. Control (AADECA), Oct. 2020, pp. 1–6.
[18] N. Henao, K. Agbossou, S. Kelouwani, Y. Dubé, and M. Fournier, "Approach in nonintrusive type I load monitoring using subtractive clustering," IEEE Trans. Smart Grid, vol. 8, no. 2, pp. 812–821, Mar. 2017.
[19] B. Zhao, L. Stankovic, and V. Stankovic, "Blind non-intrusive appliance load monitoring using graph-based signal processing," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Orlando, FL, USA, Dec. 2015, pp. 68–72.
[20] B. Zhao, L. Stankovic, and V. Stankovic, "On a training-less solution for non-intrusive appliance load monitoring using graph signal processing," IEEE Access, vol. 4, pp. 1784–1799, 2016.
[21] J. Kelly and W. Knottenbelt, "Neural NILM: Deep neural networks applied to energy disaggregation," in Proc. 2nd ACM Int. Conf. Embedded Syst. Energy-Efficient Built Environ., 2015, pp. 55–64.
[22] J. Zhang, J. Sun, J. Gan, Q. Liu, and X. Liu, "Improving domestic NILM using an attention-enabled Seq2Point learning approach," in Proc. IEEE Int. Conf. Dependable, Autonomic Secure Comput., Int. Conf. Pervasive Intell. Comput., Int. Conf. Cloud Big Data Comput., Int. Conf. Cyber Sci. Technol. Congr. (DASC/PiCom/CBDCom/CyberSciTech), AB, Canada, Oct. 2021, pp. 434–439.
[23] T.-T.-H. Le, J. Kim, and H. Kim, "Classification performance using gated recurrent unit recurrent neural network on energy disaggregation," in Proc. Int. Conf. Mach. Learn. Cybern. (ICMLC), Jul. 2016, pp. 105–110.
[24] R. Bonfigli, A. Felicetti, E. Principi, M. Fagiani, S. Squartini, and F. Piazza, "Denoising autoencoders for non-intrusive load monitoring: Improvements and comparative evaluation," Energy Buildings, vol. 158, pp. 1461–1474, Jan. 2018.
[25] G. Zhou, Z. Li, M. Fu, Y. Feng, X. Wang, and C. Huang, "Sequence-to-sequence load disaggregation using multiscale residual neural network," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, 2021.
[26] J. Chen, X. Wang, X. Zhang, and W. Zhang, "Temporal and spectral feature learning with two-stream convolutional neural networks for appliance recognition in NILM," IEEE Trans. Smart Grid, vol. 13, no. 1, pp. 762–772, Jan. 2022.
[27] D. Yang, X. Gao, L. Kong, Y. Pang, and B. Zhou, "An event-driven convolutional neural architecture for non-intrusive load monitoring of residential appliance," IEEE Trans. Consum. Electron., vol. 66, no. 2, pp. 173–182, May 2020.
[28] P. Paulo, "Applications of deep learning techniques on NILM," Ph.D. dissertation, Dept. Elect. Eng., Federal Univ. Rio de Janeiro, Rio de Janeiro, Brazil, 2016.
[29] D. Kelly, "Disaggregation of domestic smart meter energy data," Ph.D. dissertation, Dept. Comput., Imperial College London, London, U.K., 2016.
[30] L. Wang, S. Mao, and R. M. Nelms, "Transformer for nonintrusive load monitoring: Complexity reduction and transferability," IEEE Internet Things J., vol. 9, no. 19, pp. 18987–18997, Oct. 2022.
[31] J. Barber, H. Cuayáhuitl, M. Zhong, and W. Luan, "Lightweight non-intrusive load monitoring employing pruned sequence-to-point learning," in Proc. 5th Int. Workshop Non-Intrusive Load Monitor., Nov. 2020, pp. 11–15.
[32] A. M. A. Ahmed, Y. Zhang, and F. Eliassen, "Generative adversarial networks and transfer learning for non-intrusive load monitoring in smart grids," in Proc. IEEE Int. Conf. Commun., Control, Comput. Technol. Smart Grids (SmartGridComm), Tempe, AZ, USA, Nov. 2020, pp. 1–7.
[33] Z. Yue, C. R. Witzig, D. Jorde, and H. Jacobsen, "BERT4NILM: A bidirectional transformer model for non-intrusive load monitoring," in Proc. 5th Int. Workshop Non-Intrusive Load Monit., 2020, pp. 89–93.
[34] L. Wang, S. Mao, B. M. Wilamowski, and R. M. Nelms, "Pre-trained models for non-intrusive appliance load monitoring," IEEE Trans. Green Commun. Netw., vol. 6, no. 1, pp. 56–68, Mar. 2022.
[35] D. Murray, L. Stankovic, V. Stankovic, S. Lulic, and S. Sladojevic, "Transferability of neural network approaches for low-rate energy disaggregation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton, U.K., May 2019, pp. 8330–8334.
[36] D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song, "Using self-supervised learning can improve model robustness and uncertainty," 2019, arXiv:1906.12340.
[37] L. Jing and Y. Tian, "Self-supervised visual feature learning with deep neural networks: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 11, pp. 4037–4058, Nov. 2021.
[38] S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, and Y. Bengio, "Learning problem-agnostic speech representations from multiple self-supervised tasks," 2019, arXiv:1904.03416.
[39] J. Z. Kolter and M. J. Johnson, "REDD: A public data set for energy disaggregation research," in Proc. Workshop Data Mining Appl. Sustainability (SIGKDD), San Diego, CA, USA, 2011, pp. 59–62.
[40] J. Kelly and W. Knottenbelt, "The U.K.-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five U.K. homes," Sci. Data, vol. 2, no. 1, pp. 1–14, 2015.
[41] D. Murray, L. Stankovic, and V. Stankovic, "An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study," Sci. Data, vol. 4, no. 1, pp. 1–12, 2017.

Bochao Zhao (Member, IEEE) received the B.Eng. degree (Hons.) and the Ph.D. degree, both in electronic and electrical engineering, from the University of Strathclyde, Glasgow, U.K., in 2014 and 2020, respectively. He is currently a Lecturer with the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. His current research interests include demand response, energy trading in virtual power plants, energy disaggregation, and graph signal processing applications.

Mingjun Zhong (Member, IEEE) received the B.Sc. and Ph.D. degrees in applied mathematics from the Dalian University of Technology, Dalian, China, in July 1999 and July 2004, respectively. He was an Associate Professor with Dalian Nationalities University, Dalian. He was a Post-Doctoral Researcher with INRIA Rennes, Rennes, France, from December 2015 to December 2016. He was a Research Assistant with the School of Computing, University of Glasgow, Glasgow, U.K., from February 2008 to October 2011. He was a Research Associate with the School of Informatics, The University of Edinburgh, Edinburgh, U.K., from July 2013 to April 2017. He was an Associate Professor with the Dalian University of Technology. He was a Senior Lecturer with the University of Lincoln, Lincoln, U.K. He is currently a Lecturer with the Department of Computing Science, University of Aberdeen, Aberdeen, U.K. His current research interests include machine learning, computational statistics, and artificial intelligence.

Wenpeng Luan (Senior Member, IEEE) received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 1986, the M.S. degree in electrical engineering from Tianjin University, Tianjin, China, in 1989, and the Ph.D. degree in electrical engineering from Strathclyde University, Glasgow, U.K., in 1999. After long industry working experience, he joined Tianjin University as a Professor in 2019. His current research interests include smart metering data analytics, nonintrusive load monitoring, distribution system analysis, renewable energy resource integration, and utility advanced applications.