

Applied Energy xxx (xxxx) xxxx

Contents lists available at ScienceDirect

Applied Energy
journal homepage: www.elsevier.com/locate/apenergy

Real-time prediction and anomaly detection of electrical load in a residential community☆
Xinlin Wang (a), Sung-Hoon Ahn (a, b, c)
(a) Department of Mechanical and Aerospace Engineering, Seoul National University, South Korea
(b) Innovative Technology and Energy Center, Tanzania
(c) Institute of Advanced Machines and Design, Seoul National University, South Korea

HIGHLIGHTS

• The load predictor can reduce the influence of the over- or underfitting problem.
• The detector presents an independent detection process to identify anomalies.
• The data exploration method provides guidelines for the feature selection.
• The method realizes visual display and has low hardware requirements.
• The method is based on an off-grid power plant and is applied in a rural area of Tanzania.

ARTICLE INFO

Keywords: Electrical load anomaly detection; Demand side management; Load prediction; Rule-engine; Residential electricity usage

ABSTRACT

Regression model-based electrical load anomaly detection shows great potential to improve the quality of demand side management (DSM) because the load prediction and detection requirements can be satisfied by a single framework simultaneously. However, compared with other detection methods, both prediction and detection accuracy need improvement. To overcome this limitation, this work proposes a residential electrical load anomaly detection framework (RELAD) that includes a hybrid one-step-ahead load predictor (OSA-LP) and a rule-engine-based load anomaly detector (RE-AD). Considering that the diversity and randomness of residential electricity usage may render prediction difficult, the OSA-LP cascades an autoregressive integrated moving average (ARIMA) model and artificial neural networks (ANN) to achieve high precision in linear and nonlinear regression. Meanwhile, through employing the Bayesian information criterion (BIC), the OSA-LP efficiently reduces the influence of the over- or underfitting problem in real-time prediction and improves the prediction accuracy. To remedy the deficiency of overreliance on prediction outcomes in regression-model-based anomaly detection methods, the RE-AD integrates a support vector machine (SVM), the k-nearest neighbors (kNN) method and the cross-entropy loss function to develop an independent detection process to analyze the correctness of data. This method was applied to detect the load of the off-grid solar power plant in Ngurudoto, a rural area in Tanzania with 44 households and nearly 150 residents. The results of the practical application demonstrate that the proposed predictor and anomaly detector exhibit better predictive and detective accuracy than that achieved in previous work, which demonstrates the practicality of the proposed method.

1. Introduction

The electrical loads of residential communities, together with those of industrial and commercial sectors, are the major components of energy consumption. According to a survey of the Energy Information Administration in 2018, the residential load accounted for 38.5% (1.46


This research was supported by the International S&T Cooperation Program through the National Research Foundation of Korea (NRF) funded by the Ministry of
Science, ICT & Future Planning (MSIP) (NRF-2017K1A3A9A04013801) and the Basic Research Lab Program through the NRF funded by the MSIT
(2018R1A4A1059976). The authors gratefully acknowledge the help of all the members in Innovative Technology and Energy Center, Tanzania, for their kind
cooperation and help in this project. Additionally, the authors would like to express sincere thanks to Institute of Engineering Research at Seoul National University
and Dr. Zhuqing Mao.
E-mail addresses: [email protected] (X. Wang), [email protected] (S.-H. Ahn).
URL: http://www.fab.snu.ac.kr/ (S.-H. Ahn).

https://doi.org/10.1016/j.apenergy.2019.114145
Received 5 September 2019; Received in revised form 21 October 2019; Accepted 11 November 2019
0306-2619/ © 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: Xinlin Wang and Sung-Hoon Ahn, Applied Energy, https://doi.org/10.1016/j.apenergy.2019.114145

Nomenclature

DSM    demand side management
PACF    partial autocorrelation function
RELAD    residential electrical load anomaly detection framework
ADF    augmented Dickey-Fuller
OSA-LP    one-step-ahead load predictor
ARMA    autoregressive-moving average
RE-AD    rule-engine based load anomaly detector
$P_{real}$    SVM classification result of load
ARIMA    autoregressive integrated moving average
$P_{prediction}$    SVM classification result of prediction results
ANN    nonlinear artificial neural networks
$P_{kNN}$    kNN classification result of load
BIC    Bayesian information criterion
SVM-AD    SVM-based anomaly detector
SVM    support vector machine
ARIMA/ANN-AD    ARIMA/ANN-based anomaly detector
kNN    k-nearest neighbors
LSTM-AD    long short-term memory network-based anomaly detector
LSTM    long short-term memory network
C    regularization parameter
SC(k)    silhouette coefficient
F1    detection accuracy measure
ACF    autocorrelation function
$R^2$    goodness-of-fit regression measure
RMSD    root-mean-square deviation

trillion kWh) of the total annual electricity consumption in the United States [1]. Therefore, efficient detection of electrical load anomalies, such as electrical theft, leakage, and other nontechnical losses in residential communities, is critical to save energy and ensure the safety of a power supply system [2]. At the same time, with the development of demand side management (DSM), simply monitoring current power consumption is not sufficient to meet the second-by-second balance between power consumption and generation. Especially in some rural regions where the main grid is not available, establishing a comprehensive framework that can detect anomalies in residential power usage in real time while simultaneously forecasting the future load has become an important step in improving the efficiency of the smart grid [3]. To this end, previous studies have used regression model-based anomaly detectors to first predict the load and then use the difference between the forecasting results and the real load data to identify anomalies [4]. This method provides an effective solution to improve energy efficiency: the prediction and detection requirements can be satisfied by a single framework. However, some long-term issues exist in the research on regression-model-based anomaly detection. First, given the independence, diversity, and randomness of residential electricity use, it remains challenging to employ an efficient prediction model to forecast electrical loads in real time. Second, because the goal is to identify anomalies, such as electrical theft, leakage, and other nontechnical losses in residential communities, traditional regression model-based anomaly detectors rely on the outcomes of a prediction model and lack an independent anomaly detection design to analyze the correctness of load data. These characteristics create a vicious cycle that compromises the anomaly detection accuracy.

To address the gaps in previous research, a residential electrical load anomaly detection framework (RELAD) composed of a one-step-ahead load predictor (OSA-LP) and a rule-engine-based load anomaly detector (RE-AD) is presented in this study. Our research innovations/contributions are as follows:

• The proposed OSA-LP is used to forecast the electricity needs of residential areas. This framework combines the advantages afforded by the linear autoregressive integrated moving average (ARIMA) and nonlinear artificial neural networks (ANN). Then, considering the influence of human-induced random events on the prediction results, the Bayesian information criterion (BIC) is employed to reduce the influence of the over- or underfitting problem of the ARIMA model in real time. The introduction of BIC enables automatic model optimization in the real-time prediction process.
• Our RE-AD integrates a support vector machine (SVM), the k-nearest neighbors (kNN) method, and the cross-entropy loss function to create an independent detection approach to analyze the correctness of load data that is not affected by the prediction results. Anomalies are identified accurately by reference to the normal power consumption patterns of residents. Moreover, according to the features of time series data in different periods, the RE-AD selects the optimal detection approach, which not only ensures the detection accuracy but also reduces the computational cost.
• We employ a novel data processing method that derives guidelines for residential load data and other nonstationary time series data exploration. In this study, the results reflect residential load characteristics over different periods.
• RELAD is applied to detect the load of a standalone solar power plant in Ngurudoto (longitude: 36.906933, latitude −3.332608), a rural area in Tanzania with 44 households containing nearly 150 occupants. The households lack a sustainable energy source. The successful operation of RELAD addresses the need for providing uninterrupted electricity in remote areas through better load prediction and anomaly detection.

The rest of this paper is organized as follows. Section 2 presents related work, and Section 3 introduces our system overview and the data exploration. Sections 4 and 5 present our load predictor and anomaly detector. Section 6 discusses the experimental results and comparisons, and Section 7 provides a summary.

2. Related work

Based on previous research in the literature, the current electrical load anomaly detectors can be categorized into three main types: (1) regression-model-based anomaly detectors, (2) classification-model-based anomaly detectors, and (3) others.

The regression-model-based anomaly detectors first use regression methods to fit the historical electrical load data and predict the future load according to the results. Monitored data with large deviations from the predicted results are then identified as anomalies. Considering the effect of temperature on the power usage of residents, Zhang et al. developed a linear regression-model-based load anomaly detector [5]. The prediction results serve as the baseline: if the real consumption data are far below the baseline, they are considered to be anomalous. This method provided a simple regression model-based load anomaly detection approach that considers environmental factors; however, individual differences in temperature sensitivity were ignored. Moreover, if the annual temperature is constant, the model may not be useful. Considering the complexity of building electrical loads, Chou et al. proposed a hybrid prediction-model-based anomaly detector [6]. The hybrid prediction model cascades ARIMA and ANN (ARIMA/ANN). ANN are introduced to compensate for the prediction error of ARIMA in nonlinear regression and to enable the hybrid model to exploit the advantages of both linear and nonlinear models. However, although the prediction model has been improved, this work lacks a reasonable detection process. The two-sigma rule to detect anomalous data may be too simple to fully consider the impact of normal random


events on the prediction results. To address the detection error caused by the simple detection algorithm, Luo et al. proposed a dynamic regression-results-based anomaly detector [7]. Instead of using a fixed threshold for the difference between the prediction results and real load, an active adaptive threshold is calculated. The proposed dynamic detection rule improves the detection accuracy; however, this work lacks an independent detection algorithm to identify anomalies, and the outcomes of the prediction model are still the only reference for anomaly detection. Fenza et al. developed a drift-aware method to detect anomalies in smart grids [8]. Historical data are first used to train a long short-term memory (LSTM) network; then, the anomaly detection algorithm evaluates the prediction error trend obtained from the LSTM over time. Because the target of this work is to test the consumer profile rather than identify general nontechnical loss, identified anomalies are not based on the error between the prediction result and the real data at a specific time; instead, the error trend is utilized.

In addition to regression-model-based anomaly detectors, classification-model-based anomaly detectors are widely used anomaly detection methods. According to the classification model, these detectors can be further classified into supervised-classification-model-based anomaly detectors and unsupervised-classification-model-based anomaly detectors.

Jokar et al. developed a supervised-model-based anomaly detector of electricity theft [9]. During training, k-means clustering and the silhouette score are employed to determine the number of patterns in the dataset, and an SVM-based classifier is constructed to learn the normal and anomalous patterns. Pinceti et al. presented a comparison test to consider the application of different supervised learning models for detecting attacks in load redistribution [10]. A nearest-neighbor method, an SVM, and a replicator neural network are employed to identify anomalies in a realistic dataset, and the nearest-neighbor algorithm outperforms the other detectors. The supervised-classification-model-based anomaly detector is an efficient anomaly detection method with rational detection references and high detection accuracy. However, the limitations of this method are the training cost and applicability. First, the detection accuracy is directly affected by the quality of the training dataset: considerable time and resources are required to obtain a high-quality labeled training dataset. Second, with the development of DSM, detecting anomalies in the real-time load may not be sufficient to improve the energy efficiency and sustainability.

Fan et al. proposed an unsupervised-classification-based building power consumption anomaly detector to reduce the training cost of supervised-learning-based anomaly detection [11]. The dominant periods and influential exogenous variables of each appliance in a building are identified by spectral density estimation and a decision tree. Finally, an unsupervised learning model, an autoencoder, is employed to calculate the anomaly score for each observation. Scores that are higher than a preset threshold are identified as anomaly candidates. Pereira et al. developed an autoencoder-based unsupervised anomaly detector to detect anomalies in solar energy generation series [12]. Compared with the general autoencoder model, the proposed encoding-decoding process is improved by a variational self-attention mechanism. The unsupervised-classification-based system provides a new method for identifying anomalous data without additional training cost; however, due to the lack of references, this approach is limited by the fact that the detection results cannot be easily evaluated [13]. The consumer power load anomaly detection results should be strengthened with explanations of why the data are detected as abnormal.

Additional types of anomaly detection methods have been reported. Cabrera et al. presented a rule learning-based anomaly detector that targets the detection of power waste patterns in educational institutions [14]. Data binning is first employed to reduce the data size; then, different determination rules are used to identify energy waste patterns. Although this static analysis approach did not discuss the application details of the method, such as the detection speed and training cost, it proposes a new detection approach with full consideration of the surrounding environment.

Table 1 summarizes the advantages and limitations of regression-model-based and classification-model-based anomaly detection methods and the proposed RELAD.

Table 1
Summary of electrical load anomaly detection works.

| Type | Advantages | Limitations |
|---|---|---|
| Regression model | Anomaly detection and future power usage prediction | Low prediction and detection accuracy |
| Classification model (supervised) | High detection accuracy | High training cost; lack of prediction information |
| Classification model (unsupervised) | Low training cost | Weak interpretability; lack of prediction information |
| RELAD (this work) | Anomaly detection and future power usage prediction; high detection accuracy | High training cost |

3. System overview

The proposed RELAD is designed to realize real-time residential electrical load prediction and anomaly detection. The framework of RELAD is illustrated in Fig. 1. The input of RELAD is the real-time load data and the historical load data of the past 24 h. To avoid the over- or underfitting problem in real-time prediction and balance the requirements of prediction accuracy and calculation speed, the moving window is designed to update the prediction model in each step. In each step, after receiving the real-time data, the moving window selects the real-time data and the historical data of the past 24 h as the input to update the OSA-LP. After updating the OSA-LP and obtaining the prediction result, the RE-AD determines the correctness of the real-time data. The outputs of RELAD are composed of the prediction result for the next step and the detection result of the real-time data. Fig. 2 shows a diagram of the moving window.

Fig. 1. The proposed residential electrical load anomaly detection framework.

Fig. 2. Diagram of the moving window.

3.1. Data exploration

RELAD is applied to detect the load of an off-grid solar power plant (32 × 150 W solar panels; total 4.8 kW) built in Ngurudoto. Fig. 3(a) shows five days of raw power plant load for Ngurudoto: the x-axis is hours. The load is complex. The household electricity consumption data are influenced by two aspects: the living habits of residents and random events. A machine learning model can learn living habits through well-trained datasets; however, random events are unpredictable, which makes learning difficult. Some random activities, such as suddenly turning on a bathroom light or making coffee, have a random, irregular, and uncontrolled occurrence that reflects the diversity of modern people's lives and increases the difficulty of power consumption prediction and detection. Even more challenging is that the overall energy consumption of a residential community is the sum of all such random events in each household, further complicating the learning process. As shown in Fig. 3(a), some trends are observed in the change in daily load, and some parts of these trends show significant differences. During the day, the load data show high randomness; therefore, vibration analysis and autoclustering are implemented to capture the intrinsic characteristics of the residential load and to design a novel detection method to reduce the difficulty of anomaly detection.

3.1.1. Vibration analysis

Vibration analysis mines load series characteristics from the perspective of time series. Since the load series typically changes dynamically over time in a strongly nonlinear manner, the first step is to transform the nonstationary series into a stationary series [15]. The method used in this study is to calculate the x-order difference of the load series and use the augmented Dickey-Fuller (ADF) test to detect whether the x-order difference is stationary. x starts at 1 and is increased until a stationary difference is obtained [16]. According to the ADF results, the first-order difference of the load series (i.e., x = 1) is stationary, as shown in Fig. 3(b). During the period from 7 pm to 7 am, the first-order difference is near zero, which means that the power consumption of residents tends to be stable. However, during other periods, severe vibrations are evident, which illustrates the impact of random events in human life on electricity consumption. Therefore, vibration analysis identifies a vibration and a nonvibration period: the former from 7 am to 7 pm and the latter overnight. The results of the vibration analysis provide a guideline for selecting a suitable load prediction model according to the different periods and features of different models.

3.1.2. Autoclustering

Because the independence, diversity, and randomness of the power plant load inevitably affect the anomaly detection results, we classify load data into different clusters to reduce the difficulty of anomaly detection caused by random events. Compared with general supervised-learning-based classification techniques, the autoclustering proposed in this study does not rely on a training dataset and provides a simple data exploration method to process complex data. We use automated clustering featuring k-means and silhouette coefficients. K-means clustering is widely used because it is fast and simple [17]. As with other clustering models, when k-means is used to assign N data points to k disjoint clusters, k (the number of groups) must be preset. However, k is unknown for new data; therefore, the silhouette coefficient, which measures how similar an object is to other objects in its own cluster compared to objects in other clusters, is employed. The silhouette coefficient is [18]:

$$SC(k) = \frac{b(k) - a(k)}{\max(a(k),\, b(k))} \tag{1}$$

where $a(k)$ is the mean intracluster distance and $b(k)$ is the mean nearest-cluster distance for $k$. The silhouette coefficient ranges from −1

Fig. 3. The results of data exploration. Note: Ngurudoto is an underdeveloped rural village in Tanzania; the power consumption of this area is relatively low.
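The automated clustering of Section 3.1.2 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes scalar load samples, uses a simple quantile-based initialization for k-means, takes the standard per-sample silhouette averaged over all samples as a stand-in for Eq. (1), and tries every k from 2 up to a hypothetical limit `m`. The function names `kmeans_1d`, `silhouette`, and `auto_cluster` and the toy load values are assumptions made for illustration only.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: centers at evenly spaced quantiles (deterministic).
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; treated as worst case here (assumption)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(abs(v - w) for w in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def auto_cluster(values, m):
    """Try k = 2..m and keep the clustering with the highest silhouette coefficient."""
    best_k, best_score, best_labels = None, float("-inf"), None
    for k in range(2, m + 1):
        labels = kmeans_1d(values, k)
        score = silhouette(values, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_score, best_labels


# Toy load samples in kW (illustrative values, not data from the paper).
loads = [0.10, 0.12, 0.11, 0.50, 0.52, 0.51, 1.00, 1.02, 1.01]
k, score, labels = auto_cluster(loads, m=5)
print(k, labels)
```

On these three well-separated toy groups the search settles on three clusters. Real load data would normally use multi-dimensional features and a library implementation of k-means; the plain-Python version only keeps the sketch self-contained.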


to + 1, with + 1 being optimal. We assume that there are m clustering where L ̂ is the maximized value of the model likelihood function, m is
scenarios and label the clustering scenarios from 2 to m. For each sce- the sample size, and k is the number of model parameters. In this study,
nario, k-means is used to cluster the data, and the silhouette coefficient the sample size m is equal to n2 , and k is equal to 2 (p and q). n2 is set as
is used to evaluate the result. After m clustering epochs, we obtain m n2 = U × U T , where U is a vector [0, k ], i.e., [0, 2].
silhouette coefficients, and the maximum value of the silhouette coef- The introduction of BIC realizes automatic model optimization, re-
ficient is selected to present the best clustering result. ducing the influence of over- or underfitting in real-time prediction. In
After autoclustering, Fig. 3(c) exhibits five load patterns (labeled each prediction, the moving window selects the historical data of the
0–4). Patterns 3 and 4 occur during the vibration period; however, past 24 h to update the ARIMA model, and BIC is simultaneously em-
during the daytime, the patterns are random. By means of automatic ployed to select the optimal parameters.
clustering, the irregular load data are transformed into different power
consumption patterns. In the anomaly detection phase, the detector 4.2. Nonlinear compensation
needs to detect only whether the real-time load data belong to a benign
learned pattern. Nonlinear compensation is an important part of the prediction
model. Although ARIMA is improved by BIC, the model is still not
4. One-step-ahead load predictor sufficiently accurate to fit the electrical load. Here, we select ANN as
the nonlinear compensation model to compensate for the fitting error of
Previous work has shown that time series data may consist of two the improved ARIMA (ARIMA and BIC) in nonlinear regression. ANN is
types of characteristics: linear characteristics and nonlinear character- a nonlinear modeling technique with a structure similar to that of the
istics [4]. Therefore, a comprehensive predictor applied in time series brain and is suitable for modeling in a wide range of applications [4].
data should feature both linear regression and nonlinear compensation: The inputs of the ANN training dataset are the regression result series of
linear regression is used to fit the data, and nonlinear compensation is the improved ARIMA and the related prediction time. The fitting errors,
employed to reduce the regression error [6]. i.e., the differences between regression results and real load value, are
the outputs. The moving window is designed to record the inputs and
4.1. Linear regression outputs of the training dataset and to update the ANN model in each
step. Considering the nonlinear features of residential power usage
Of the various linear regression models, the autoregressive moving data, a hyperbolic function is selected as the ANN activation function.
average (ARMA) model may be optimal in this scenario, and its accu- After testing, to balance the computation speed and accuracy, the
racy and flexibility have been proven in many time series data pre- learning rate of the network is set to 0.01 and the structure is 10 × 10.
diction studies. An ARMA (p , q) model can be expressed as [4]: The final result of the OSA-LP is the sum of the linear prediction result
of the improved ARIMA model and the nonlinear compensation result
yt = a1 yt − 1 + a2 yt − 2 + …ap yt − p + nt + b1 nt − 1 + ⋯+bq yt − q (2) of the ANN.
where yt , the value at time t, is a function of the previous p values, i.e.,
yt − 1 , yt − 2 , …, yt − p . The errors at times t , t − 1, …, t − q are 5. A rule-engine-based load anomaly detector
nt − 1, nt − 2, …, nt − q . p is the number of autoregressive terms, a1, …, ap are
the autoregressive coefficients, q is the number of lagged forecast er- The difference between the proposed RE-LAD and earlier regression-
rors, and b1, …, bq are the moving average coefficients. model-based anomaly detectors is that the prediction results are not the
However, as the electrical load is nonstationary, the use of ARMA to only indicators of anomalies; a rule engine is employed to detect
directly fit the data is inappropriate [15]. Therefore, we generalized the anomalies in real time, even when the predictions are inaccurate.
ARMA model to an ARIMA (p , d, q) model. The only difference between According to the different load characteristics of vibration and non-
ARIMA and ARMA is that before fitting the series, ARIMA first trans- vibration periods, two types of rule engines are designed. In the de-
forms the nonstationary time series into a stationary d-order difference tection process, the current time determines the rule engine to be used.
series, where d is the order of the difference. Given that we use the
results of the vibration analysis in this study, d = 1. 5.1. Anomaly detection in the vibration period
A limitation of ARIMA when applied in real-time load prediction is
that the values of p and q must be preset according to the features of the Random events and predictive errors occur frequently during the
target data series. Generally, to avoid over- or underfitting caused by vibration period. Therefore, to identify the anomalous data accurately
inappropriate p and q values, the autocorrelation function (ACF) and and to reduce the influence of random events on detection, the rule
partial autocorrelation function (PACF) are employed to analyze the engine operates in the following order: 1. Results classification, 2.
target data series. The optimal values of p and q are selected and preset Comparison of classification results, 3. Classification error test, and 4.
according to the results of the ACF and PACF. Nevertheless, this ap- Evaluation of classification results.
proach ignores the problem that the nonstationary load series change
dramatically over time. The characteristics of the load series used for ex 5.1.1. Results classification
ante analysis and real-time monitoring may vary considerably. Fixed To avoid the effect of the predictive errors on anomaly detection,
parameters may not be sufficient to reflect the characteristics of the the prediction results and real load are not simply compared. We first
real-time data, which ultimately results in over- or underfitting. Thus, use an SVM to explore whether the results exhibit the same power
the BIC, a criterion for optimal model selection from a set of finite consumption pattern. An SVM is a widely used classifier that is
models, is introduced. The general idea of BIC is to use informative strengthened by the kernel method. By introducing the kernel, an SVM
priors to shrink the unrestricted model toward an accurate benchmark gains flexibility in the choice of the form of the threshold separating
to reduce parameter uncertainty and improve prediction accuracy [19]. anomalies from normal data. Since the kernel is nonparametric and
After scoring the fitting results of each model, the lowest BIC score is operates locally, it does not need to have the same functional form for
selected as an indicator of the best fitting model. For autoclustering, we all the data [21]. In this study, the SVM model employs a Gaussian
assume that the maximum values of both p and q are n, yielding n2 kernel. By locating the solution space in high-dimensional space, the
regression scenarios numbered from 1 to n2 . We obtain n2 BIC scores Gaussian kernel-based SVM can effectively solve the problem caused by
and then select the regression with the lowest BIC. The BIC is [20]: the nonlinear decision function, improving the model’s nonlinear clas-
sification performance [22]. The training datasets of an SVM consist of
BIC = k × ln (m) − 2 × ln (L )̂ (3) a benign dataset and a malicious dataset; the benign dataset used in this

5
X. Wang and S.-H. Ahn Applied Energy xxx (xxxx) xxxx

work is the autoclustering result shown in Fig. 2(c). The malicious dataset contains two types of
malicious events: electricity theft and leakage. Notably, because electricity theft and leakage are
infrequent, the number of anomalies in our historical dataset is limited compared with the quantity
of normal data. To prevent the classification error caused by imbalanced data, the benign
sample-based malicious sample creation method first proposed by Jokar et al. is employed [9]. n data
from the benign dataset are randomly selected to form a new benign dataset x, i.e.,
x ∈ {x_1, …, x_n}. Because the goal of electricity theft is to report a lower consumption than normal
use, the electricity theft dataset y can be represented as y = α × x, where α ∈ [0, 1] and
E(y) ⩽ E(x). Similarly, the electricity leakage dataset z can be denoted as z = β × x, where
β ∈ [1, ∞) and E(x) ⩽ E(z). Although defining all the malicious samples that are consistent with
E(y) ⩽ E(x) or E(x) ⩽ E(z) might not be practical, the malicious dataset can be generated by
considering the generalization property of the SVM [12]. The values of α and β can be considered as
the detection sensitivity criteria. In this study, α is set to 0.5, and β is set to 2. The prediction
results and the real load value are classified after training the SVM.

5.1.2. Comparison of classification results
According to the classification results, three cases are explored:

• If the prediction result and the real load exhibit the same pattern (i.e., P_prediction = P_real)
  and the pattern is benign, the current load value is normal.
• If P_prediction = P_real but the pattern is malicious, the load is anomalous, and a warning follows.
• If P_prediction and P_real exhibit different learned patterns, there are two potential reasons:
  (1) random events affect the load and make the real data vary greatly, resulting in different
  patterns; (2) there is a classification error. Therefore, the algorithm first performs a kNN-based
  classification error check to reassess the results. If no error is observed, then whether the
  classification result is in fact acceptable is determined.

5.1.3. Classification error test
The kNN method, which is simple, practical, and robust to noisy training data, is used to evaluate
the classification results. K, a positive integer, is the number of nearest neighbors to be
evaluated [23]. We use the simplest kNN model, i.e., 1NN. As our predictions and the real data
exhibit different patterns, we focused on the detection of real loads as follows:

• Use the kNN method to find the training dataset pattern closest to the real load, and label it
  P_kNN.
• If P_real = P_kNN, there is no classification error. Daily random events of residential power use
  are the cause of P_prediction and P_real exhibiting different patterns. If P_real is benign, a
  result follows.
• If P_real and P_kNN exhibit different patterns, there is a classification error. Although we
  introduced the advantages of the SVM in the first step, misclassification remains unavoidable. An
  SVM may misclassify when the training data quality is inadequate or the regularization parameter C
  is inappropriate [24]. Therefore, the kNN result is preferred in this scenario. If P_kNN is benign,
  it is tested in the next step.

The kNN result is defined as correct when it differs from the SVM result. Many previous studies have
found that SVM outperforms kNN in large feature spaces [25,26]; however, as mentioned above, the
accuracy of an SVM relies on the C value and the use of a high-quality training dataset. In
comparison, the kNN method relies on a limited number of neighbors rather than class domains.
Therefore, kNN is valuable for correction when classification errors occur [27]. Notably, we use the
kNN method only to check and correct classification errors: the kNN method cannot substitute for the
SVM in the first step of anomaly detection because of the features of the SVM, which are mentioned
in the results classification step.

5.1.4. Evaluation of classification results
In this step, the cross-entropy loss function is used to evaluate whether the classification result
is acceptable from the perspective of global optimization. The cross-entropy loss function is a
popular classification/clustering result evaluation method that is simple, accurate, and adaptable
to global optimization. The cross-entropy loss function is defined as [28]:

\[ \mathrm{Loss}(y, \hat{y}) = -\sum_{i=1}^{T} y_i \log(\hat{y}_i), \tag{4} \]

where y is the real label in the training dataset, ŷ is the classification of the SVM, and T is the
size of the training dataset. The lower the cross-entropy loss is, the better the classification.
The evaluation proceeds in the three steps listed below.

Fig. 4. Flowchart of the rule engine in the vibration period.
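
The decision flow summarized in Fig. 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: all function and parameter names are hypothetical, the pattern labels are assumed to come from an already trained SVM and a 1NN lookup, and the two loss values are assumed to be computed before and after the new load is added to the training dataset (Section 5.1.4). The sketch also assumes that a malicious pattern triggers a warning directly when the two patterns differ, and it adds the simple threshold comparison that Section 5.2 describes for the nonvibration period.

```python
import math


def cross_entropy(true_labels, predicted_probs):
    """Cross-entropy loss with one-hot targets: minus the sum of the log
    probability assigned to each true label. Each entry of `predicted_probs`
    maps a label to a probability, which must be strictly positive."""
    return -sum(math.log(probs[label])
                for label, probs in zip(true_labels, predicted_probs))


def nearest_pattern(load, training_samples):
    """1NN check: pattern of the training sample closest to `load`.
    `training_samples` is a list of (load_in_watts, pattern_label) pairs."""
    return min(training_samples, key=lambda s: abs(s[0] - load))[1]


def vibration_rule_engine(p_prediction, p_real, p_knn, benign_patterns,
                          loss_initial, loss_updated, threshold1):
    """Return "normal" or "anomalous" for the vibration period."""
    # Cases 1 and 2: prediction and real load share the same pattern.
    if p_prediction == p_real:
        return "normal" if p_real in benign_patterns else "anomalous"
    # Case 3: patterns differ. When kNN disagrees with the SVM, the kNN
    # result is preferred (Section 5.1.3).
    pattern = p_real if p_real == p_knn else p_knn
    if pattern not in benign_patterns:
        return "anomalous"  # assumption: malicious pattern -> warning
    # Evaluation step: accept only if the loss changes by less than threshold1.
    epsilon = abs(loss_updated - loss_initial)
    return "normal" if epsilon < threshold1 else "anomalous"


def nonvibration_rule(predicted_load, real_load, threshold2):
    """Nonvibration period: flag the load when it deviates from the
    prediction by more than threshold2 (same unit as the loads)."""
    deviation = abs(predicted_load - real_load)
    return "anomalous" if deviation > threshold2 else "normal"


if __name__ == "__main__":
    benign = {0, 1, 2, 3}  # hypothetical set of benign pattern labels
    samples = [(480, 1), (1150, 3), (3000, 6)]  # made-up training samples
    p_knn = nearest_pattern(1200, samples)
    print(vibration_rule_engine(1, 3, p_knn, benign, 10.0, 11.5, threshold1=3))
    print(nonvibration_rule(479, 490, threshold2=50))
```

Keeping the rule engine as a pure function of labels and loss values separates the decision logic from the classifiers, so the SVM, the kNN check, or the thresholds can be changed without touching the flow itself.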


• Use the cross-entropy loss function to calculate the loss of the initial training dataset.
• Place the load data and the matched pattern into the training dataset (to build a new dataset) and
  use the cross-entropy loss function to recalculate the loss of the new dataset.
• Compare the two results. If the difference between the two losses, ε, is smaller than threshold1,
  the classification result of kNN is acceptable, and the load is benign. Otherwise, even if the kNN
  method indicates that the load is benign, the algorithm considers the load to be abnormal.

Fig. 4 shows a flowchart of the rule engine in the vibration period.

5.2. Anomaly detection in the nonvibration period

The load in the nonvibration period tends to change linearly. Given the high performance of our
prediction model in this period, the rule engine operates as follows:
Compare the difference between the prediction result and the real load with respect to a preset
parameter, threshold2. If the difference is higher than threshold2, the data are abnormal.
The rule engine of the nonvibration period is straightforward but is applicable only in some special
periods when the possibility of random electricity use is low. For daytime power consumption anomaly
detection, this rule engine is not sufficiently comprehensive to achieve high detection accuracy.

6. Experimental results and comparison

The proposed method is applied to detect the load of the solar power plant built in Ngurudoto. The
wiring diagram of Ngurudoto is shown in Fig. 5. We previously designed a long range (LoRa)-based
wireless energy monitoring system to track the electrical load of the power plant and all households
in the village. Considering that the system needs to operate in a harsh environment with limited
skilled manpower for operation and maintenance, the wireless energy monitoring system contains two
parts: a local part in Tanzania and a remote control part in South Korea. RELAD is implemented in
South Korea. In each step, if RELAD identifies the real-time load as abnormal, a warning is issued
to the manager of the system, who can remotely control the power plant in Tanzania.

Fig. 5. Wiring diagram of Ngurudoto (from Google map). Note: Ngurudoto is a rural area in Tanzania
with 44 households and nearly 150 occupants. The power of this village is supplied by an off-grid
solar power plant (32 × 150 W solar panels; total 4.8 kW).

The prediction interval of the OSA-LP, i.e., the communication interval between Tanzania and South
Korea, is determined based on the prediction accuracy and early warning of battery capacity. If the
interval is long, there will be sufficient time to confirm the remaining power of the standalone
power supply system to prevent power outages. However, considering the unpredictable occurrence of
random events in residential power usage, a long interval will also reduce the prediction accuracy.
Conversely, if the interval is short, the prediction accuracy can be improved, but the significance
of an early warning is lost. Therefore, considering the prediction accuracy, the effectiveness of
early warning, and the number of households supported by the off-grid power plant, the prediction
interval in this work is set to 15 min.
RELAD has low computer hardware requirements and is implemented using a desktop computer with a
3.4 GHz Intel Core i5 processor, 4 GB RAM, and the Windows 10 operating system. To simplify the data
mining process, the proposed method is designed based on C# and Python; C# builds a visual interface
with several user options. Based on an investigation of the cross-entropy loss values under normal
power consumption, threshold1 is set to 3 in this study. According to the results of vibration
analysis, to balance the accuracy and sensitivity of detection, threshold2 is set to 50 W.
Three experimental results are presented to illustrate the implementation of RELAD in detail. To
further demonstrate the contributions of this study, RELAD is compared with previous works, and the
comparison results are discussed in terms of accuracy and running speed.

6.1. Experimental results

6.1.1. Test A
Fig. 6 shows the user interface (UI) of RELAD. The current monitoring time (t) is 10:20 am, during
the vibration period, and the current load is 490 W. In the previous step (t − 1), the predictor
forecasted the next load to be 479 W (blue point), i.e., 479 W at 10:20 am. Therefore, according to
the rule engine, the anomaly detector used the SVM to classify the current load of 490 W and the
prediction result of 479 W. The classification results showed that both the real load of 490 W and
the prediction result of 479 W belong to pattern 1, a benign pattern. Therefore, the result of the
anomaly detector was normal. Moreover, the moving window selected the historical load data of the
past 24 h to update the OSA-LP and predicted that the load at the next step (t + 1) would be 491 W.
The prediction results of this study can be used in two ways. First, the results provide a reference
for the next anomaly detection. Second, the results can be compared with the remaining capacity of
the battery and the charging speed of the solar panel to determine the remaining power supply time
of the off-grid power supply system.

Fig. 6. The UI of RELAD - test A. On the left side of the UI is the operation panel. Our method has
two modes of operation: auto-detection mode and manual detection mode. Because our method is
suitable not only for monitoring the load of Ngurudoto but also for detecting and forecasting the
power consumption of the smart manufacturing process, additional buttons are included on the
operation panel.

6.1.2. Test B
Fig. 7 shows another test result in the normal monitoring case. In this test, the prediction result
in the previous step (t − 1) was 667 W, but the current load (t) is 1200 W. The two results exhibit
two different patterns (the load data belong to pattern 3 and the prediction results belong to
pattern 1). Therefore, according to the rule engine, the RE-AD first uses kNN to assess the
classification error. The kNN result shows that the current load data belong to pattern 3, which is
the same as the result of the SVM. Then, the cross-entropy loss function is employed to evaluate the
classification result and indicates an acceptable result. Therefore, the final results show that the
current power consumption situation is normal, the load belongs to a benign learned pattern, and the
prediction result for the next step (t + 1) is 1040 W.

Fig. 7. The UI of the proposed method - test B.

6.1.3. Test C
Fig. 8 shows the change in load when the power supply system is abnormal. The prediction result in
the previous step (t − 1) was 318 W, pattern 0. However, the current load (t) is 3053 W, and the SVM
result is pattern 6, which belongs to the malicious dataset. Finally, RELAD issues a warning.
Verification showed that this warning was caused by an abnormal circuit in the power supply system.
Benefiting from the timely feedback of the proposed method, the manager can remotely control the
power supply system in time to reduce losses.

Fig. 8. The UI of the proposed method - test C.

6.2. Comparison results and evaluation

The results of RELAD were compared with the results of Chou et al. [6], Fenza et al. [8], and Jokar
et al. [9]. Chou et al. use an ARIMA/ANN-based anomaly detector (ARIMA/ANN-AD), Fenza et al. use an
LSTM network-based anomaly detector (LSTM-AD), and Jokar et al. use an SVM-based anomaly detector
(SVM-AD). The prediction model of ARIMA/ANN-AD is updated by the moving window in each step, and p,
d, and q are set to 1. The training datasets of SVM-AD, LSTM-AD, and our method are the same. SVM-AD
uses a multiclass SVM with a radial basis function (RBF) kernel. The structure of LSTM-AD consists
of a visible layer with 1 input, a hidden layer with 80 LSTM neurons, and an output layer. Network
training is performed for 90 epochs. After two months of uninterrupted, 24-h real-time detection of
the power plant load in Ngurudoto (from February 1, 2019, to April 1, 2019), the results of each
detector are evaluated and reported in Table 2.
To explain the prediction performance of each model more comprehensively, two goodness-of-fit
regression measures are introduced. The common R-squared (R²) goodness-of-fit regression measure is
used to provide an intuitive measure of how well observations are replicated by each model, based on
the proportion of the total variation of the outcomes that the model explains. R² ranges from 0 to
1: the higher the score is, the better the fit [29]. The root-mean-square deviation (RMSD) is
another frequently used measure of the differences between values predicted by a model and the
values observed. In general, a lower RMSD is better than a higher one [30].
The F1 score is a measure that provides a balanced evaluation of the overall performance of a
classifier. The F1 score consists of precision p and recall r, which were developed by the
information retrieval community. p, r, and the F1 score are defined as follows [23]:

\[ p := \frac{TP}{FP + TP}, \tag{5} \]

\[ r := \frac{TP}{FN + TP}, \tag{6} \]

The F1 score is the harmonic mean of the precision p and recall r as follows:

\[ F_1 := \frac{2 \times p \times r}{p + r}, \tag{7} \]

Table 2
Comparison of the results between the proposed method and previously reported methods.

Items                 R²       RMSD       F1
RELAD (this work)     0.6967   217.3001   0.9953
ARIMA/ANN-AD          0.5928   243.5128   0.8427
LSTM-AD               0.6361   217.6356   0.9837
SVM-AD                N/A      N/A        0.9829
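
The three measures reported in Table 2 can be reproduced with a short sketch such as the one below. It is illustrative only: the function names are hypothetical, the R² formula is the common coefficient of determination (1 minus the ratio of the residual to the total sum of squares), which is assumed here because the text does not spell out the formula, and the sample numbers are made up rather than taken from the study. Following the definitions that accompany Eqs. (5)-(7), normal data are treated as the positive class.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Eqs. (5)-(7): precision, recall, and their harmonic mean (F1)."""
    precision = tp / (fp + tp)
    recall = tp / (fn + tp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmsd(observed, predicted):
    """Root-mean-square deviation between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up loads in watts, purely for illustration (not data from the study).
observed = [480.0, 500.0, 650.0, 1200.0]
predicted = [470.0, 510.0, 640.0, 1100.0]
print("R2:", r_squared(observed, predicted))
print("RMSD:", rmsd(observed, predicted))
print("p, r, F1:", precision_recall_f1(tp=90, fp=5, fn=5))
```

Note that this form of R² can fall below 0 when a model fits worse than the mean of the observations, and that the precision and recall helpers assume at least one true positive.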


In Eqs. (5) and (6), TP represents the number of time points at which normal data were correctly
classified, FP represents the number of time points at which anomalous data were classified as
normal, TN represents the number of time points at which anomalous data were considered abnormal,
and FN represents the number of time points at which normal data were labeled abnormal.

6.2.1. Accuracy

1. Prediction results:
   The three methods other than SVM-AD, which has no prediction model, use different prediction
   models to forecast the current or future load of the power plant. The R² results are consistent
   with the RMSD results and show that the OSA-LP outperforms the ARIMA/ANN-AD. This indicates that
   the introduction of BIC enhances the ARIMA/ANN model and can effectively reduce the influence of
   the over- or underfitting problem of the ARIMA model, thereby improving the real-time prediction
   accuracy. Regarding the prediction results of the OSA-LP and the LSTM-AD, both R² and RMSD show
   that the OSA-LP outperforms the LSTM-AD, mainly because the OSA-LP is a real-time predictor and
   the moving window updates the model in each step. Considering the influence of human-induced
   random events in the residential load on the prediction results, prediction errors are
   unavoidable for each model. In this scenario, determining how to realize automatic prediction
   error compensation becomes the key to improving the accuracy of real-time prediction. For the
   OSA-LP, even if prediction errors occur, the best parameters can still be selected through BIC to
   fit the latest historical data selected by the moving window in real time. This design ensures
   less discrepancy between the observed values and the values expected under the model. By
   comparison, although the LSTM-AD uses a more sophisticated neural network, the model still lacks
   a feedback mechanism to reduce the prediction error. To exhibit the prediction results of each
   model more intuitively, Fig. 9 shows 24-h real-time prediction results of ARIMA/ANN-AD, LSTM-AD,
   and OSA-LP, and Table 3 exhibits the related prediction results. The results shown in Table 3 are
   consistent with those of Table 2, and both R² and RMSD show that the OSA-LP outperforms the other
   two methods.

Fig. 9. 24-h real-time prediction results of each prediction model.

2. Anomaly detection results:
   Among all the anomaly detectors, ARIMA/ANN-AD exhibits the lowest F1 score. General
   regression-model-based anomaly detection methods for identifying anomalies rely excessively on
   the results of the prediction model. Classifying all the points outside of 2 standard deviations
   (SDs) from the mean of the prediction results as anomalous data is not reasonable. The results of
   RELAD show that an independent anomaly detection process is necessary to improve the anomaly
   detection accuracy of the regression-model-based anomaly detector. Although the target of LSTM-AD
   is to detect anomalies in consumer behavior, such as family structure changes, rather than
   specific nontechnical loss, when anomalies occur, LSTM-AD can still capture the changes in data
   series. According to the F1 scores of LSTM-AD and ARIMA/ANN-AD, LSTM-AD has higher detection
   accuracy for two main reasons: (1) the prediction accuracy of the LSTM network is higher than
   that of ARIMA/ANN; (2) the detection process of LSTM-AD does not compare the error between the
   predicted and real loads at a specific time but instead analyzes the trend, which improves the
   rationality of detection. The SVM-AD also exhibits a high F1 score. However, in contrast to other
   methods, the SVM-AD cannot provide more information about the future load, which limits its
   application. In summary, RELAD shows the best prediction and anomaly detection performance among
   all anomaly detectors.

6.2.2. Running speed

The ARIMA/ANN-AD, SVM-AD, and RELAD have fast running speeds. After long-term verification, the mean
calculation time of the OSA-LP is approximately 400 ms, and the total calculation time of prediction
and detection is approximately one second, which approaches real-time monitoring. However, the
running speed of LSTM-AD is slower than that of the other methods. In actual testing, the
computation speed is too slow for real-time monitoring. The limitation of LSTM-AD is that the
method, which is based on a deep neural network, has its practicability limited by the hardware
configuration. By contrast, RELAD not only realizes real-time load forecasting and anomaly detection
on a computer with a simple hardware configuration but also provides visual display and function
selection, which improves practicability.

7. Conclusion

Among all the types of current power consumption anomaly detectors, regression-model-based anomaly
detectors, which detect anomalies and forecast the future load simultaneously, have great potential
to improve the quality of demand side management. However, the prediction and detection performance
of traditional regression-model-based anomaly detection works is unsatisfactory, which hinders the
application of such methods. In this study, we propose a novel anomaly detection framework
consisting of a one-step-ahead load predictor and a rule-engine-based load anomaly detector. On the
one hand, the proposed predictor simultaneously exploits the advantages of each model and reduces
the influence of the over- or underfitting problem in real-time prediction. On the other hand, the
proposed anomaly detector exhibits an independent anomaly detection process that can identify
anomalies such as electrical theft or leakage by referencing the normal power usage patterns of
residents. The proposed method provides critical guidance for reducing nontechnical loss and
improving the efficiency of smart grids.
Our experimental results demonstrate that (1) the data exploration method presented in this study is
an efficient feature selection method that can be used as a guideline for more nonstationary time
series data exploration research; (2) the proposed predictor and detector achieve better prediction
and detection accuracy than those in previous work; and (3) while realizing high-speed calculation
and accurate prediction and detection, our method provides a visual display and function selection
with low hardware requirements, expanding the application of machine learning technology. Our
results are based on the electrical load of a solar power plant in Ngurudoto, a rural area of
Tanzania. The power consumption pattern is relatively simple. For a more general application, future
work will apply this method in more residential areas to test different power consumption patterns.
Meanwhile, considering the influence of human-induced random events on the results of real-time
residential electrical load prediction and anomaly detection, further work will also employ more
advanced approaches to further improve the prediction and anomaly detection performance of this
method.
This work presents a standalone power source-based anomaly detection framework. The accuracy and
practicality of this method have been demonstrated in this study. We hope that through the
application of this method, an ever-increasing number of people using off-grid


power sources can enjoy uninterrupted electricity as soon as possible.

Table 3
Comparison of the 24-h real-time prediction results between the proposed method and previously
reported methods.

Items                 R²       RMSD
OSA-LP (this work)    0.7711   125.4662
ARIMA/ANN-AD          0.6099   167.7502
LSTM-AD               0.7384   150.0135

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

References

[1] EIA; 2017. https://www.eia.gov/energyexplained/index.php?page=electricity_use.
[2] Spagnuolo A, Petraglia A, Vetromile C, Formosi R, Lubritto C. Monitoring and optimization of
energy consumption of base transceiver stations. Energy 2015;81:286–93.
[3] Thiaux Y, Dang TT, Schmerber L, Multon B, Ahmed HB, Bacha S, et al. Demand-side management
strategy in stand-alone hybrid photovoltaic systems with real-time simulation of stochastic
electricity consumption behavior. Appl Energy 2019;253.
[4] Narendra C, Eswara Reddy BB. A moving-average filter based hybrid ARIMA-ANN model for
forecasting time series data. Appl Soft Comput 2014;23:27–38.
[5] Zhang Y, Chen WW, Black J. Anomaly detection in premise energy consumption data. In: 2011 IEEE
power and energy society general meeting; 2011.
[6] Chou JS, Telaga AS. Real-time detection of anomalous power consumption. Renew Sustain Energy Rev
2014;33:400–11.
[7] Luo J, Hong T, Yue M. Real-time anomaly detection for very short-term load forecasting. Clean
Energy 2018;6:235–43.
[8] Fenza G, Gallo M, Loia Vi. Drift-aware methodology for anomaly detection in smart grid. IEEE
Access 2019;7:9645–57.
[9] Jokar P, Arianpoo N, Leung VCM. Electricity theft detection in AMI using customers’ consumption
patterns. IEEE Trans Smart Grid 2016;7:216–26.
[10] Pinceti A, Sankar L, Kosut O. Load redistribution attack detection using machine learning: a
data-driven approach. In: 2018 IEEE power & energy society general meeting; 2018.
[11] Fan C, Xiao F, Zhao Y, Wang JY. Analytical investigation of autoencoder-based methods for
unsupervised anomaly detection in building energy data. Appl Energy 2018;211:1123–35.
[12] Pereira J, Margarida S. Unsupervised anomaly detection in energy time series data using
variational recurrent autoencoders with attention. In: 2018 17th IEEE international conference on
machine learning and applications; 2018.
[13] David Weinberg. Our machines now have knowledge we will never understand; 2017.
https://www.wired.com/story/our-machines-now-have-knowledge-well-never-understand/.
[14] Cabrera DFM, Zareipour H. Data association mining for identifying lighting energy waste
patterns in educational institutes. Energy Build 2013;62:210–6.
[15] Nie HZ, Liu GH, Liu XM, Wang Y. Hybrid of ARIMA and SVMs for short-term load forecasting. Energy
Procedia 2012;16:1455–60.
[16] Owoye Oluwole. The causal relationship between taxes and expenditures in the G7 countries:
cointegration and error-correction models. Appl Econ Lett 1995;2:19–22.
[17] Gaddam SR, Phoha VV, Balagani KS. K-Means+ID3: a novel method for supervised anomaly detection
by cascading K-means clustering and ID3 decision tree learning methods. IEEE Trans Knowl Data Eng
2007;19:345–54.
[18] Amorim RC, Hennig C. Recovering the number of clusters in data sets with noise features using
feature rescaling factors. Inf Sci 2015;324:126–45.
[19] Sune Karlsson. Forecasting with Bayesian vector autoregression. Handbook Econ Forecast
2015;2:791–897.
[20] Findley David F. Counterexamples to parsimony and BIC. Ann Inst Stat Math 1991;43:505–14.
[21] Auria L, Moro RA. Support vector machines (SVM) as a technique for solvency analysis; 2008.
[22] Ye Tian. Stock forecasting method based on wavelet analysis and ARIMA-SVR model. In: 2017 3rd
international conference on information management; 2017.
[23] Iwayemi A, Zhou C. SARAA: semi-supervised learning for automated residential appliance
annotation. IEEE Trans Smart Grid 2017;8:779–86.
[24] Zhou Hua. Machine learning. Tsinghua University Press; 2016.
[25] Hmeidi I, Hawashin B, Qawasmeh EE. Performance of KNN and SVM classifiers on full word Arabic
articles. Adv Eng Inform 2008;22:106–11.
[26] Colas F, Brazdil P. Comparison of SVM and some older classification algorithms in text
classification tasks. In: IFIP international conference on artificial intelligence in theory and
practice; 2016. p. 169–78.
[27] Palaniappan R, Sundaraj K, Sundaraj S. A comparative study of the SVM and kNN machine learning
algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals. BMC
Bioinformatics 2015;15.
[28] Kroese DP, Rubinstein RY, Taimre T. Application of the cross-entropy method to clustering and
vector quantization. J Global Optim 2007;37:137–57.
[29] Glantz SA, Slinker BK. Primer of applied regression and analysis of variance. McGraw-Hill; 1990.
[30] Robert P, Olufunmilayo T, Hao C. Components of information for multiple resolution comparison
between maps that share a real variable. Environ Ecol Stat 2008;15:111–42.
