Sensors: Multivariate-Time-Series-Driven Real-Time Anomaly Detection Based On Bayesian Network
Article
Multivariate-Time-Series-Driven Real-time Anomaly
Detection Based on Bayesian Network
Nan Ding *, Huanbo Gao, Hongyu Bu, Haoxuan Ma and Huaiwei Si
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China;
[email protected] (H.G.); [email protected] (H.B.);
[email protected] (H.M.); [email protected] (H.S.)
* Correspondence: [email protected]
Received: 25 August 2018; Accepted: 4 October 2018; Published: 9 October 2018
Abstract: Anomaly detection is an important research direction that takes the real-time information from different sensors and condition information sources into consideration, so that possible anomalies in devices and components can be detected. One of the challenges addressed in this paper is anomaly detection in multivariate-sensing time series. For this setting, we propose RADM, a real-time anomaly detection algorithm based on Hierarchical Temporal Memory (HTM) and Bayesian Network (BN). First, an HTM model evaluates the real-time anomalies of each univariate-sensing time series. Second, a model for detecting anomalous states in multivariate-sensing time series based on a naive Bayesian network is designed to analyze the validity of those series. Lastly, considering the real-time monitoring of the system states of terminal nodes in a cloud platform, the effectiveness of the methodology is demonstrated using a simulated example. Extensive simulation results show that applying RADM to multivariate-sensing time series detects more anomalies and thus remarkably improves the performance of real-time anomaly detection.
1. Introduction
System condition monitoring, which must cope with virus invasion, failed sensors, and improperly implemented controls, is a concern for many automated information systems, such as wireless sensor networks, vehicular networks, and industrial systems. Real-time anomaly detection for system condition monitoring has significant practical applications: it uses the information arriving in real time from different sensors and other condition information sources and tries to detect possible deviations from the normal condition and behavior expected of the system's devices or components. Anomalies can be spatial, known as spatial anomalies, meaning the values fall outside the typical range, like the first and third anomalies in Figure 1, which shows time series data collected from temperature sensors in a factory (the anomalies in Figure 1 are marked in red). Anomalies can also be temporal, known as temporal anomalies: the values are not out of the typical range, but the sequence they form is anomalous, like the second anomaly in Figure 1 [1].
At present, there is extensive work on anomaly detection techniques that look for individual objects differing from normal objects. Some of this work focuses on anomaly detection in univariate time series (UTS). Holt-Winters is an anomaly detection method that can detect spatial anomalies and is widely implemented in commercial applications [2]. The Skyline project provides an open-source implementation of a number of statistical techniques for anomaly detection in streaming data [3]. The Autoregressive Integrated Moving Average model (ARIMA) is a general technique for modeling temporal data with seasonality and can perform temporal anomaly detection in complex scenarios [4]. Bayesian change point methods are a natural way to segment time series and can be used for online anomaly detection [5,6]. Twitter released its own open-source anomaly detection algorithms for time series data; they are capable of detecting spatial and temporal anomalies and have achieved a relatively high score under the Numenta Anomaly Benchmark (NAB) scoring mechanism [7,8]. In addition, a number of model-based methods have been applied to specific fields; examples include detecting anomalous cloud data center temperatures [9], ATM fraud detection [10], anomaly detection in aircraft engine measurements [11], and work on accurate forecasting solutions with application to the water sector [12,13].
However, in a complex system, multivariate time series (MTS) captured from the sensors and condition information sources carry richer system information than UTS. Methods for anomaly detection in MTS fall into two main categories. The first detects anomalies after dimension reduction: existing approaches, such as PCA (principal component analysis) dimensionality reduction for MTS and linear dimensionality reduction for MTS based on common principal component analysis, process the principal time series with a UTS anomaly detection method [14]. The second uses a sliding window to divide the MTS and then examines the subsequences. For example, one method computes the covariance matrix of each subsequence based on Riemannian manifolds, takes the covariance matrix as the descriptor and the Riemannian distance as the similarity measure, and computes the distance between covariance matrices; the distribution of the covariance matrices and its visualization can reveal anomalies intuitively [15,16]. These methods can meet the basic requirements for anomaly detection in MTS, but because they neglect the inherent relevance within the MTS, their practical effectiveness and accuracy leave room for improvement. In addition, to improve the efficiency of the algorithm, this paper introduces the concept of a health factor α, which indicates whether the system is running well; introducing the health factor can greatly reduce the cost of RADM on a healthy system.
In this paper, we propose RADM, a real-time anomaly detection algorithm in MTS based on Hierarchical Temporal Memory (HTM) and Bayesian Network (BN). The remainder of the paper is structured as
follows. Section 2 provides a brief introduction to the performance problems and scenario of anomaly
detection in MTS, while Section 3 is dedicated to presenting the methodology of HTM. The proposed
RADM is discussed in Section 4. Then, in Section 5, to demonstrate the principles and efficacy, results
are presented comparing RADM with HTM. Finally, in Section 6, concluding remarks and possible
extensions of the work are discussed.
The contributions of this paper can be summarized as follows:
1. This paper proposes RADM, a real-time anomaly detection algorithm in MTS based on HTM and a Bayesian network.
2. This paper introduces the concept of a health factor α to describe the health of the system and, furthermore, greatly improves the detection efficiency on healthy systems.
3. RADM combines HTM with a naive Bayesian network to detect anomalies in multivariate-sensing time series and obtains better results than algorithms that work only on univariate-sensing time series.
At present, related works utilize system parameters to characterize the real-time state of the system and implement real-time detection of that state. To perform anomaly detection in MTS, based on the real-time monitoring of the system states of terminal nodes in a cloud platform, we virtualize the node system state into CPU, NET, and MEM parameters, as shown in Figure 2, and use them as anomaly detection sequences to characterize the system state. We describe
the system state as follows:

S(t) = [X(t), Y(t), Z(t)] (1)

where X(t) represents the time series data of CPU, Y(t) is the time series data of NET, and Z(t) is the time series data of MEM. Since NET accounts for a large proportion of the causes of system anomalies, we set NET as the principal time series; it is also the series used for the UTS anomaly detection that serves as a comparison in the experiment.
in the hierarchy. Information converges as it ascends the hierarchy and diverges as it descends. HTM uses a spatial pooler and a temporal pooler to learn from and predict the input data. An HTM region contains columns of cells; through the spatial pooler, the cells produce a sparse distributed representation that records which cells are active. The temporal pooler then discovers and learns patterns from the list of active columns computed by the spatial pooler and serializes them for prediction [19].
…, x_{t−2}, x_{t−1}, x_t, x_{t+1}, x_{t+2}, … (2)
In practical applications, the statistics of the system can change dynamically, so real-time learning and training are needed to detect new anomalies. HTM is a learning algorithm that matches these constraints and has been shown to work well for prediction tasks [20]; however, it does not directly output an anomaly score. To perform anomaly detection, two different internal representations are used in the HTM, as shown in Figure 3 [1]. Given an input x_t, the vector a(x_t) is a sparse binary code representing the sparse distributed representation of the current input. An internal state vector π(x_t) represents a prediction for a(x_{t+1}), that is, a prediction of the next input x_{t+1}. The prediction vector incorporates inferred information about the current sequence, depending on the sequence detected so far and the inferred position of the current input within it; different inputs therefore lead to different predictions.
However, a(x_t) and π(x_t) do not directly represent anomalies. To create a robust anomaly detection system, HTM introduces two additional steps. First, a raw anomaly score is computed from the two sparse vectors. Then HTM computes an anomaly likelihood value, which is thresholded to determine whether the system is anomalous.
s_t = 1 − (π(x_{t−1}) · a(x_t)) / |a(x_t)| (3)
The raw anomaly score will be 0 if the current input is perfectly predicted, 1 if it is completely
unpredicted, or somewhere between 0 and 1 according to the similarity between the input and
the prediction.
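Equation (3) can be sketched directly for toy sparse binary vectors; the vectors below are illustrative stand-ins, not SDRs produced by a real HTM network.

```python
import numpy as np

def raw_anomaly_score(prediction, actual):
    """Eq. (3): s_t = 1 - pi(x_{t-1}) . a(x_t) / |a(x_t)|
    for sparse binary vectors (predicted vs. current representation)."""
    prediction = np.asarray(prediction, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 1.0 - prediction.dot(actual) / actual.sum()

a_t = np.array([1, 0, 1, 0, 1, 0, 0, 1])      # current SDR: 4 active bits
pi_prev = np.array([1, 0, 1, 0, 0, 0, 0, 1])  # 3 of the 4 were predicted
print(raw_anomaly_score(pi_prev, a_t))        # -> 0.25
```

With 3 of 4 active bits predicted, the score is 0.25; a perfect prediction yields 0 and a complete miss yields 1, matching the description above.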
Because of the continuous learning nature of HTM, changes in the underlying system are handled gracefully. If there is a shift in the behavior of the system, the anomaly score will be high at the point of the shift; however, as the HTM model adapts to the "new normal", the anomaly score automatically degrades toward zero [1].
Sensors 2018, 18, 3367 5 of 13
μ_t = (Σ_{i=0}^{W−1} s_{t−i}) / W (4)

σ_t² = (Σ_{i=0}^{W−1} (s_{t−i} − μ_t)²) / (W − 1) (5)
Then HTM evaluates a recent short-term average of the anomaly scores, and thresholds the
Gaussian tail probability (Q-function [21]) to decide whether or not to declare an anomaly. The anomaly
likelihood is defined as the complement of the tail probability:
L_t = 1 − Q((μ̃_t − μ_t) / σ_t) (6)

where:

μ̃_t = (Σ_{i=0}^{W′−1} s_{t−i}) / W′ (7)
W′ is a window for a short-term moving average, where W′ << W. HTM thresholds L_t: if it is very close to 1, an anomaly is reported:

anomaly ≡ L_t ≥ 1 − ε (8)
HTM stores Q as an array of tail probabilities for the standard normal distribution, preserving the probability corresponding to the mean. To facilitate the calculation, HTM divides the range [σ, 3.5σ] of the normal distribution into 70 equidistant intervals, so the tail probability takes 71 values, which are stored in an array Q, as shown in Table 1.
When thresholding L_t, thresholding the tail probability imposes an inherent upper limit on the number of alerts. In addition, since ε is very close to 0, alerts are unlikely to occur with probability much higher than ε, which also imposes an upper limit on the number of false positives.
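Equations (4)–(8) can be sketched as a streaming computation. The window sizes below are illustrative assumptions, and Q is evaluated exactly via the complementary error function rather than the paper's 71-entry lookup table.

```python
import math
from collections import deque

def q_function(z):
    """Gaussian tail probability: Q(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def anomaly_likelihoods(scores, W=100, W_short=10):
    """Eqs. (4)-(7): L_t = 1 - Q((mu'_t - mu_t) / sigma_t) over sliding windows."""
    window = deque(maxlen=W)
    out = []
    for s in scores:
        window.append(s)
        n = len(window)
        mu = sum(window) / n                                       # Eq. (4)
        var = sum((x - mu) ** 2 for x in window) / max(n - 1, 1)   # Eq. (5)
        sigma = max(math.sqrt(var), 1e-9)   # floor to avoid division by ~0
        recent = list(window)[-W_short:]
        mu_short = sum(recent) / len(recent)                       # Eq. (7)
        out.append(1.0 - q_function((mu_short - mu) / sigma))      # Eq. (6)
    return out

# Eq. (8): report an anomaly when L_t >= 1 - epsilon.
L = anomaly_likelihoods([0.1] * 50 + [0.9] * 5)
print(L[-1] > L[25])  # the spike at the end raises the likelihood -> True
```

On a flat score stream the likelihood stays near 0.5; a burst of high raw scores pushes the short-term mean above the long-term mean and drives L_t toward 1.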
P(y = c_k | x) = P(x_1, x_2, …, x_i | y = c_k) P(y = c_k) / Σ_k P(y = c_k) P(x_1, x_2, …, x_i | y = c_k) (10)
where c_k represents the k-th class of y, i indexes the characteristic variables of x, and x_i is the value of the i-th characteristic variable. Let M be the characteristic-variable dimension of x (i.e., the total number of characteristic variables). If the characteristic variables in the sample x are independent of each other, the above formula can be simplified as:
P(y = c_k | x) = (Π_{i=1}^{M} P(x_i | y = c_k)) P(y = c_k) / Σ_k P(y = c_k) Π_{i=1}^{M} P(x_i | y = c_k) (11)
For the same group of state variables [X, Y, Z], the denominator of the formula is the same, so only the numerators need to be compared, and we get:
When using a set of [X, Y, Z] data as a training sample (training set), the following Bayesian
formula is used to calculate the posterior distribution of the target variable:
c(x) = argmax_c P(c) Π_{k=1}^{3} P(a_k | c) (13)

where a_k represents the three attribute variables [X, Y, Z], so k takes the values 1, 2, 3.
The following formula is used to calculate P(c):

P(c_j) = (Σ_{i=1}^{n} I(c_i = c_j)) / n (14)
where n is the number of training samples, c_j denotes the j-th category of c, c_i is the category of the i-th sample, and I(·) is the indicator function, equal to 1 when the condition in brackets holds and 0 otherwise. Then, assuming the k-th of the three characteristics takes l values, one of which is a_{kl}, the following formula is used to calculate P(a_k | c):

P(a_k = a_{kl} | c = c_j) = (Σ_{i=1}^{n} I(x_i^{(k)} = a_{kl}, c_i = c_j)) / (Σ_{i=1}^{n} I(c_i = c_j)) (15)

where x_i^{(k)} is the value of the k-th characteristic of the i-th sample.
After the above calculation, we can get the probability that a set of eigenvalues belongs to a class
and then perform the classification.
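A minimal frequency-count sketch of Eqs. (13)–(15) follows; the sample data is invented for illustration, whereas the paper's actual training set comes from the discretized [X, Y, Z] anomaly likelihoods.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Estimate P(c) (Eq. 14) and per-characteristic P(a_k | c) (Eq. 15)."""
    n = len(labels)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}      # Eq. (14)
    cond = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for k, v in enumerate(x):
            cond[c][k][v] += 1
    return prior, cond, class_count

def classify(x, prior, cond, class_count):
    """Eq. (13): c(x) = argmax_c P(c) * prod_k P(a_k | c)."""
    def score(c):
        p = prior[c]
        for k, v in enumerate(x):
            p *= cond[c][k][v] / class_count[c]                 # Eq. (15)
        return p
    return max(prior, key=score)

# Invented discretized [X, Y, Z] anomaly levels and their labels.
samples = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1)]
labels = ["normal", "normal", "normal", "anomaly", "anomaly", "anomaly"]
prior, cond, counts = train(samples, labels)
print(classify((1, 1, 1), prior, cond, counts))  # -> anomaly
```

Only the numerator of Eq. (11) is evaluated, since the denominator is shared by all classes for a given sample.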
4.2. RADM
Based on the above HTM algorithm and Naive Bayesian model, we design a real-time anomaly
detection algorithm in MTS, RADM. The anomaly detection flowchart of RADM is described in
Figure 5.
(1) Anomaly likelihood calculation
The ternary sequences X(t), Y(t), and Z(t) are detected separately: three HTM networks learn and predict the three sequences, each producing an anomaly likelihood via the HTM algorithm, and the three results form a list of anomaly likelihoods. The specific process can be found in Section 3.
(2) Discretization
Because the anomaly likelihood values are also divided into 70 intervals during their calculation, and in order to simplify the study and reduce the amount of training data and computation in parameter learning, we discretize the anomaly likelihood. Analysis of the likelihood data shows that the anomaly likelihood of each variable is related to the standard normal distribution, and that the low-likelihood range is wide but has a negligible effect on anomaly detection. After experimental tests, combined with the short-term sliding average, we use the equal-interval method to obtain the threshold intervals and the discrete values, as shown in Table 2.
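Since Table 2's exact cut points are specific to the paper's experiments, the sketch below uses hypothetical thresholds to show the shape of the discretization step.

```python
def discretize(likelihood, bounds=(0.9, 0.99, 0.999)):
    """Map a continuous anomaly likelihood to a discrete level.

    The bounds are hypothetical stand-ins for Table 2: level 0 covers the
    wide but uninformative low-likelihood range, higher levels the tail.
    """
    level = 0
    for b in bounds:
        if likelihood >= b:
            level += 1
    return level

print([discretize(l) for l in (0.5, 0.95, 0.995, 0.9999)])  # -> [0, 1, 2, 3]
```

The discrete levels, not the raw likelihoods, become the attribute values fed to the naive Bayesian model, which keeps its conditional probability tables small.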
(3) Naive Bayesian network construction and inference
We build the naive BN through parameter learning. Once the naive Bayesian structure is known, weights are assigned to each time series according to prior knowledge. Since NET is the principal time series in this experiment, we increase the weight of NET when training the BN and then build the training sample based on these weights. We implement the BN in Matlab; after discretization, the junction tree method is used to infer the anomaly classification, yielding the list of anomaly regions.
The relevant pseudo-code of RADM is shown in Algorithm 1:
Algorithm 1: RADM.
1: Input: S(t);
2: Output: list of anomaly regions;
3: while 1 do
4:    ALx(t) = HTM(S(t).X(t)); // the list of anomaly likelihoods in X(t) through HTM
5:    ALy(t) = HTM(S(t).Y(t)); // the list of anomaly likelihoods in Y(t) through HTM
6:    ALz(t) = HTM(S(t).Z(t)); // the list of anomaly likelihoods in Z(t) through HTM
7:    compute the health factor α from the anomaly scores;
8:    if α > θ then infer the anomaly regions from ALx(t), ALy(t), ALz(t) with the naive BN;
9: end while
Since each anomaly likelihood obtained by HTM corresponds to a single point in a time series, continuous runs of anomaly points occur in actual tests, and judging the number of anomalies by counting points would clearly be unreasonable. We therefore use an anomaly region to represent one anomaly. Anomaly regions are divided according to the continuity between anomaly points; the formal definition is as follows:
If the distance between an anomaly point A1 and another anomaly point A2 is less than the division distance S, then A1 and A2 belong to the same anomaly region; otherwise, they belong to different anomaly regions. The division distance S is also called the division window.
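The division rule above can be sketched as a single pass over the sorted anomaly points (the sample points and window size are invented):

```python
def divide_regions(points, S):
    """Group anomaly time points: points closer than the division
    window S fall into the same anomaly region."""
    regions = []
    for t in sorted(points):
        if regions and t - regions[-1][-1] < S:
            regions[-1].append(t)  # continues the current region
        else:
            regions.append([t])    # starts a new region
    return regions

print(divide_regions([3, 5, 6, 20, 22, 40], S=5))  # -> [[3, 5, 6], [20, 22], [40]]
```

Counting regions instead of points makes the anomaly count insensitive to how many consecutive samples a single incident spans.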
To improve the efficiency of the algorithm, this paper introduces the concept of the health factor α, defined in Formulas (16) and (17), where S_n is the anomaly score calculated by HTM. The health factor indicates whether the system is running well. It is introduced because the ratio of anomalies differs between systems: for a system with an extremely low probability of anomalies in daily operation, importing all data into the naive Bayesian model for joint detection would waste system resources and degrade the temporal performance of the algorithm. In this paper, data is imported into the naive Bayesian model for joint detection only when α exceeds the threshold θ, which effectively improves the temporal performance of the algorithm. The threshold θ can be selected dynamically between 0 and 1 depending on the system state. When θ = 0, all data is imported into the naive Bayesian model for joint determination, corresponding to an extremely unhealthy system state. When θ = 1, the system is considered completely normal, and no value of α triggers the naive Bayesian model.
H = (S_1, S_2, S_3, …, S_n) (16)

α = ||H|| = √((S_1 − S̄_1)² + (S_2 − S̄_2)² + ⋯ + (S_n − S̄_n)²) (17)
Figure 6. Analysis of the relevance between different dimensions: (a) relevance established based on MEM; (b) independent CPU and NET; (c) mixed situation.
Certainly, anomaly detection in UTS is the basis of RADM. However, as the dimension increases, MTS can play a greater role in anomaly detection in a complex system and can reflect the relevance between data, and this relevance is the embodiment of data validity.
Comparing the running times of the two experiments, the running time of RADM is about three times that of the anomaly detection algorithm based on HTM alone. Even so, RADM retains high computational efficiency, costing only about 65 s per 4000 time-series points, which meets the requirements of real-time anomaly detection.
6. Conclusions
Anomaly detection is an important research direction: even if only one terminal node behaves anomalously, its impact can gradually expand and threaten the entire system. The RADM algorithm proposed in this paper, which effectively combines the HTM algorithm and a BN, can be applied to anomaly detection in MTS in complex systems. Compared to UTS, analyzing MTS detects anomalies in the system more effectively. First, we use the HTM algorithm to detect anomalies in UTS and obtain good detection results as well as excellent time performance. Second, we combine the HTM algorithm and the BN to perform anomaly detection in MTS effectively without dimension reduction; some anomalies that would be missed in UTS are detected by this means, and detection accuracy is improved. Lastly, to improve the efficiency of the algorithm, we introduce the health factor α to describe whether the system is healthy, which greatly improves the performance of the algorithm on healthy systems. Extensive simulation results show that using the RADM algorithm to perform anomaly detection in MTS achieves better results than detection in UTS alone and can remarkably improve the performance of real-time anomaly detection in many domains. Our future work includes further optimizing the algorithm and improving detection accuracy; we also intend to build stronger correlations between multiple variables using other models and to apply our algorithms to other areas.
Author Contributions: Conceptualization, N.D.; Methodology, N.D.; Resources, N.D.; Analysis, N.D.; Writing-Original Draft, H.B.; Writing-Review and Editing, N.D.; Supervision, N.D.; Funding Acquisition, N.D.; Software, H.B. and H.G.; Validation, H.M.; Visualization, H.S.; Project Administration, N.D.; Preparation, N.D., H.G. and H.B.
Funding: This work was supported in part by the National Science Foundation of China No. 61471084 and the
Open Program of State Key Laboratory of Software Architecture No. SKLSA2016B-02.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ahmad, S.; Purdy, S. Real-Time Anomaly Detection for Streaming Analytics. arXiv 2016, arXiv:1607.02480 .
2. Szmit, M.; Szmit, A. Usage of modified holtwinters method in the anomaly detection of network traffic:
Case studies. J. Comput. Netw. Commun. 2012, 2012, 192913.
3. Stanway, A. Etsy Skyline. Available online: https://github.com/etsy/skyline (accessed on 9 October 2018).
4. Bianco, A.M.; Garcia Ben, M.; Martinez, E.J.; Yohai, V.J. Outlier detection in regression models with ARIMA
errors using robust estimates. J. Forecast. 2010, 20, 565–579.
5. Adams, R.P.; MacKay, D.J. Bayesian Online Changepoint Detection. arXiv 2007, arXiv:0710.3742v1.
6. Tartakovsky, A.G.; Polunchenko, A.S.; Sokolov, G. Efficient Computer Network Anomaly Detection by
Changepoint Detection Methods. IEEE J. Sel. Top. Sign. Process. 2013, 7, 4–11. [CrossRef]
7. Lavin, A.; Ahmad, S. Evaluating Real-time Anomaly Detection Algorithms–the Numenta Anomaly
Benchmark. In Proceedings of the IEEE 14th International Conference on Machine Learning and Applications
(ICMLA), Miami, FL, USA, 9–11 December 2015; pp. 38–44.
8. Kejariwal, A. Twitter Engineering: Introducing practical and robust anomaly detection in a time series.
Available online: http://bit.ly/1xBbX0Z (accessed on 6 January 2015).
9. Lee, E.K.; Viswanathan, H.; Pompili, D. Model-based thermal anomaly detection in cloud datacenters. In Proceedings of the International Conference on Distributed Computing in Sensor Systems, Cambridge, MA, USA, 20–23 May 2013.
10. Klerx, T.; Anderka, M.; Büning, H.K.; Priesterjahn, S. Model-Based Anomaly Detection for Discrete Event
Systems. In Proceedings of the IEEE 26th International Conference on Tools with Artificial Intelligence,
Limassol, Cyprus, 10–12 November 2014.
11. Simon, D.L.; Rinehart, A.W. A Model-Based Anomaly Detection Approach for Analyzing Streaming Aircraft
Engine Measurement Data; GT2014-27172; American Society of Mechanical Engineers: New York, NY,
USA, 2015.
12. Candelieri, A. Clustering and support vector regression for water demand forecasting and anomaly detection.
Water 2017, 9, 224. [CrossRef]
13. Candelieri, A.; Soldi, D.; Archetti, F. Short-term forecasting of hourly water consumption by using automatic
metering readers data. Procedia Eng. 2015, 119, 844–853. [CrossRef]
14. Li, Z.X.; Zhang, F.M.; Zhang, X.F.; Yang, S.M. Research on Feature Dimension Reduction Method for
Multivariate Time Series. J Chin. Comput. Syst. 2001, 20, 565–579. (In Chinese)
15. Xu, Y.; Hou, X.; Li, S.; Cui, J. Anomaly detection of multivariate time series based on Riemannian manifolds. Int. J. Biomed. Eng. 2015, 32, 542–547.
16. Qiu, T.; Zhao, A.; Xia, F.; Si, W.; Wu, D.O. ROSE: Robustness Strategy for Scale-Free Wireless Sensor Networks. IEEE/ACM Trans. Netw. 2017, 25, 2944–2959. [CrossRef]
17. Majhi, S.K.; Dhal, S.K. A Study on Security Vulnerability on Cloud Platforms. Procedia Comput. Sci. 2016, 78,
55–60. [CrossRef]
18. Qiu, T.; Zheng, K.; Han, M.; Chen, C.P.; Xu, M. A Data-Emergency-Aware Scheduling Scheme for Internet of
Things in Smart Cities. IEEE Trans. Ind. Inf. 2018, 14, 2042–2051. [CrossRef]
19. Hawkins, J.; Ahmad, S.; Dubinsky, D. HTM Cortical Learning Algorithms. Available online: https://numenta.
org/resources/HTM_CorticalLearningAlgorithms.pdf (accessed on 12 September 2011).
20. Padilla, D.E.; Brinkworth, R.; McDonnell, M.D. Performance of a hierarchical temporal memory network in
noisy sequence learning. In Proceedings of the IEEE International Conference on Computational Intelligence
and Cybernetics, Yogyakarta, Indonesia, 3–4 December 2013; pp. 45–51.
21. Karagiannidis, G.K.; Lioumpas, A.S. An improved approximation for the Gaussian Q-function.
IEEE Commun. Lett. 2007, 11, 644–646. [CrossRef]
22. Cocu, A.; Craciun, M.V.; Cocu, B. Learning the Structure of Bayesian Network from Small Amount of Data.
Ann. Dunarea de Jos Univ. Galati Fascicle III Electrotechn. Electron. Autom. Control Inform. 2009, 32, 12–16.
23. Taheri, S.; Mammadov, M. Learning the naive Bayes classifier with optimization models. Int. J. Appl. Math.
Comput. Sci. 2013, 23, 787–795. [CrossRef]
24. Htm.java. Available online: https://github.com/numenta/htm.java (accessed on 9 October 2018).
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).