15-A Novel Multi-Modal Incremental Tensor Decomposition For Anomaly Detection in Large-Scale Networks
Information Sciences
journal homepage: www.elsevier.com/locate/ins
ARTICLE INFO

Keywords:
Multi-modal incremental tensor
Tensor decomposition
Machine learning
Anomaly detection

ABSTRACT

Network traffic anomaly detection is a crucial task in today's network monitoring and maintenance. However, with the
rapid growth of network data volume, the data structure has become increasingly complex and exhibits multi-modal
characteristics, which makes traffic anomaly detection very challenging. Earlier anomaly detection methods have the
following deficiencies: (i) most of them are static methods, or dynamic methods in which the data grows only along the
temporal mode; (ii) they have a low detection rate or a high computational cost. To address these deficiencies, this
article proposes a traffic anomaly detection framework based on multi-modal incremental tensor decomposition, which has
three highlights: (i) traffic data is constructed as a tensor model to fully mine the correlation between data, and the
framework is applicable when traffic data grows dynamically along multiple modes; (ii) the multi-modal incremental tensor
decomposition method processes dynamically growing data without decomposing all the data, greatly reducing computational
cost and improving data quality; (iii) the XGBoost classification algorithm is used for anomaly detection to improve
detection accuracy. Finally, experiments on two real network traffic datasets, NSL-KDD and CICDDOS 2019, show that the
proposed framework achieves a high detection rate of 99.21% and has good scalability and fast detection speed.
1. Introduction
Abnormal traffic, such as port scanning, denial of service attacks (DoS), distributed denial of service attacks (DDoS), and the
spread of worms, etc., can lead to network congestion, network paralysis, and information leakage, resulting in extremely adverse
effects. In addition, with the continuous expansion of network scale and the increasing complexity of network structure, it is becoming
increasingly difficult to accurately and quickly detect and diagnose abnormal traffic in large-scale network data. In particular, the
wave of telecommuting and cloud migration caused by COVID-19 in recent years has led to a sharp increase in global cyber attacks
and frequent abnormal traffic. Therefore, it is urgent to innovate technologies and methods to improve the accuracy of anomaly
detection and reduce the damage of network attacks.
* Corresponding author.
E-mail address: [email protected] (P. Wang).
1 The two authors contributed equally to this work and are listed as co-first authors.
https://doi.org/10.1016/j.ins.2024.121210
Received 13 March 2024; Received in revised form 2 July 2024; Accepted 18 July 2024
Available online 23 July 2024
0020-0255/© 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
R. Fan, Q. Fan, X. Li et al. Information Sciences 681 (2024) 121210
Early literature on anomaly detection [1] usually constructed traffic data as a matrix model, and used matrix-based decomposition
algorithms such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) to process the data, and then
performed anomaly detection by setting thresholds. This method achieves better detection performance when the volume of data is
small and the structure of the data is relatively simple.
In recent years, machine learning (ML) algorithms have developed rapidly and have been widely used in various fields [2]. ML
algorithms have higher detection rates and faster speeds than earlier anomaly detection methods. However, when we perform anomaly
detection, data inevitably has redundancy and noise, which will reduce data quality and affect detection performance. Moreover, with
the rapid growth of data volume, the data structure becomes complex and the data shows multi-modal characteristics, which makes
data processing more and more difficult. For this situation, anomaly detection methods based on tensor decomposition have been
proposed and have achieved good detection performance. Modeling the data as a tensor retains the correlation between
the data, and tensor decomposition [3] effectively reduces the dimensionality of the data and extracts features, which
improves data quality and, in turn, detection accuracy. Therefore, some anomaly detection methods combining
tensor decomposition and ML algorithms have also been proposed.
However, most of the anomaly detection methods are static and do not meet the real-time requirements of real-world applications.
Some dynamic detection methods only consider the data growth along one mode. Therefore, when the data grows along multiple
modes, how to efficiently detect anomalies is still a serious challenge.
Aiming at solving the above problems, this paper proposes an anomaly detection framework based on multi-modal incremental
tensor decomposition, which combines the advantages of tensor decomposition and ML to detect abnormal traffic accurately and
quickly. The framework is mainly divided into three steps: data preprocessing, multi-modal incremental tensor decomposition, and
anomaly detection. The multi-modal incremental tensor decomposition method is an improvement of the framework proposed
in [4]. The anomaly detection step adopts the extreme gradient boosting (XGBoost) algorithm, an ML classification
algorithm that is fast and scalable.
The key contributions of this paper can be summarized as follows.
• This paper proposes an anomaly detection framework that is dynamic and applicable to large-scale network data that grows
along multiple modes.
• The proposed detection framework performs traffic anomaly detection at a faster speed and lower computational cost. The multi-
modal incremental tensor decomposition method calculates the Tucker decomposition results at the current time only based on
the decomposition results of the historical tensor data, and does not decompose all the tensor data or perform expensive SVD
calculations. For massive data in large-scale networks, this significantly reduces storage and computation costs.
• The dynamically growing data is processed using incremental tensor decomposition, which can reduce data redundancy and
noise and improve data quality. In addition, the XGBoost classification algorithm has better classification performance compared
to other ML methods. Therefore, overall, the detection framework proposed in this paper has better detection performance.
The remaining sections of the paper are organized according to the following structure. Section 2 describes related work. Section 3
introduces the preliminaries related to this paper. Section 4 introduces the framework for network traffic anomaly detection based
on multi-modal incremental tensor decomposition proposed in this paper. The performance evaluation and conclusion will be given
in Section 5 and Section 6, respectively.
2. Related work
Network traffic anomaly detection is an important barrier to prevent network attacks. It judges whether there is abnormal traffic
by detecting some characteristics of network traffic or changes in traffic size.
In this section, we mainly review some network traffic anomaly detection methods, which are divided into statistical-based,
machine learning-based, and tensor decomposition-based detection methods.
2.1. Statistical-based
The statistical-based detection method is relatively mature and has been widely used. It is used to identify normal and abnormal
traffic in the network by setting a threshold.
Huang et al.[5] and Lakhina et al. [6] use the PCA method to project the traffic data constructed as a matrix model into the normal
subspace and abnormal subspace, and then use Q statistical analysis method in the abnormal subspace to identify normal traffic and
abnormal traffic. Yeh et al.[7] and Lee et al. [8] proposed the method of oversampling PCA. This method obtains the principal direction
by using the PCA method on the data, and then uses the “Leave One Out” method to check the influence of each data point on the
change of the principal direction to identify whether the traffic is abnormal. Udhayan et al. [9] proposed a Statistical Segregation
Method (SSM) for DDoS attack detection. This method samples traffic data and compares it with attack status conditions, and then
performs correlation analysis to identify abnormal traffic. Fortunati et al. [10] proposed an improved covariance-based
anomaly detection method. This method constructs a covariance matrix based on network traffic data to obtain a norm distribution,
and then detects abnormal traffic by setting a threshold.
Although statistics-based anomaly detection methods are widely used and can detect unknown anomalous traffic, the thresholds
in this method are difficult to balance, resulting in low detection rates.
Table 1
Description of the notation.
Symbol Description
2.2. Machine learning-based
Anomaly detection based on ML has been a popular research direction in recent years. The detection performance is relatively
improved compared to statistical-based methods.
Han et al.[11] proposed a naive Bayesian network intrusion detection method based on PCA. The method extracts the main
features by PCA and calculates the contribution rate, and then uses the contribution rate as the weight to form a new Bayesian
classification model. Compared with the traditional method, this method reduces the data dimension and improves the detection
performance. Peng et al. [12] proposed a Software Defined Network (SDN) based DDoS attack detection method (DPTCM-KNN
algorithm), which combines KNN and double P-value for DDoS attack detection. Hwang et al. [13] proposed the D-PACK method,
which uses a convolutional neural network (CNN) to automatically learn features, and then uses an unsupervised deep learning (DL)
model to identify abnormal traffic. Li et al. [14] proposed a novel algorithm called Adaptive Label Propagation (ALP). ALP identifies
overlapping anomaly groups through label propagation and belonging coefficients, and deals with the particularity of different types of
nodes and edges in heterogeneous networks through an adaptive neighbor weighting mechanism. Wu et al. [15] proposed an intrusion
detection model based on CNN, which can automatically select features to solve the imbalanced data problem. The model can achieve
better detection performance on the NSL-KDD dataset. Garg et al.[16] proposed a hybrid anomaly detection method based on DL in
SDN. This method uses a restricted Boltzmann machine and a support vector machine for anomaly detection.
However, most ML-based methods do not consider the multi-modal characteristics of data, and do not improve data quality by
adopting more efficient algorithms to remove data redundancy and noise.
2.3. Tensor decomposition-based
Tensor decomposition is a method for processing large-scale data. This method does not destroy the spatial structure and internal
potential information of the original data, and is more robust to noise. Refs. [17–22] introduced many tensor decomposition methods,
all of which have achieved good experimental results in their respective application fields.
Sun et al.[23] applied the incremental tensor analysis (ITA) method to anomaly detection, which can effectively reveal hidden
correlations in high-dimensional data and improve the anomaly detection rate. Wang et al.[24] used tensor principal component
analysis (TPCA) to detect network attacks in SDNs and proposed a framework for big data-driven network attack detection. Li et
al.[25] proposed an online anomaly detection method based on tensor decomposition. The method uses incremental CP decomposition
for dynamically growing data, which reduces the computational and storage costs. Xie et al.[26] proposed the TensorDet method,
which applies two new techniques, sequential tensor truncation and two-phase anomaly detection, to improve detection accuracy and
speed. Huang et al.[27] proposed a Dynamic Sequence Tensor Recovery (DSTR) algorithm, which uses the incremental High Order
Singular Value Decomposition (HOSVD) method to process dynamic data to improve detection accuracy and reduce cost. Maranhão
et al.[28] combined Higher Order Orthogonal Iteration (HOOI) and Multiple Denoising (MuDe) methods to improve data quality, and
then used supervised machine learning algorithms for anomaly detection. Xu et al.[29] proposed a DDoS attack detection framework
that combines multi-modal denoising algorithms based on tensor SVD and ML classification algorithms. Compared with statistical
detection methods and traditional ML detection methods, the detection performance is improved.
Although tensor decomposition-based methods have achieved good detection performance in the field of anomaly detection, most
dynamic detection methods only consider the case when the tensor data grows along one mode, and there is a lack of research when
the data grows along multiple modes.
3. Preliminaries
This section introduces the concepts represented by the mathematical notation associated with the tensor. For brevity, the nota-
tional descriptions used in this article are presented in Table 1.
Definition 1. A tensor is an extension of vectors and matrices to higher dimensions. An $N$-order tensor is denoted as
$\mathcal{X} \in R^{I_1 \times I_2 \times \cdots \times I_N}$, where $N$ is the order (number of modes) of $\mathcal{X}$. The elements of the tensor are
denoted as $x_{i_1,\cdots,i_n,\cdots,i_N}$, where $i_n \in \{1, 2, \cdots, I_n\}$ and $1 \le n \le N$.
Definition 2. Slices are two-dimensional sections of a tensor, defined by fixing all but two indices. Fig. 1 shows the horizontal
slices $\mathbf{X}_{(i,:,:)}$, the lateral slices $\mathbf{X}_{(:,j,:)}$ and the frontal slices $\mathbf{X}_{(:,:,k)}$ of a 3-order tensor, respectively.
Definition 3. Unfolding is the process of transforming a tensor into a matrix. For a tensor $\mathcal{X} \in R^{I_1 \times I_2 \times \cdots \times I_N}$,
the mode-$k$ unfolding is denoted as $(\mathcal{X})_{(k)} \in R^{I_k \times \prod_{i \neq k} I_i}$. Fig. 2 shows the unfolding process of the 3-order tensor
$\mathcal{X} \in R^{I \times J \times K}$ into three matrices, where $(\mathcal{X})_{(1)} \in R^{I \times (JK)}$, $(\mathcal{X})_{(2)} \in R^{J \times (IK)}$ and $(\mathcal{X})_{(3)} \in R^{K \times (IJ)}$ are the
matrices of the tensor unfolded along the first, second and third modes.
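As an illustration of Definition 3, the following is a minimal NumPy sketch of mode-$k$ unfolding. The helper name `unfold` is hypothetical, modes are 0-based here, and the column ordering follows NumPy's row-major reshape, which is an assumption and may differ from the ordering drawn in Fig. 2; only the matrix shapes are guaranteed to match the definition.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding (0-based): rows index the chosen mode.

    Assumption: columns are ordered by NumPy's row-major reshape.
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A small 3-order tensor with I=2, J=3, K=4.
X = np.arange(24).reshape(2, 3, 4)
shapes = [unfold(X, k).shape for k in range(3)]
print(shapes)  # [(2, 12), (3, 8), (4, 6)]
```

The three shapes correspond to $I \times (JK)$, $J \times (IK)$ and $K \times (IJ)$.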
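Definition 4. The product of a tensor in the $k$-th mode with a matrix or vector is called the $k$-mode product. The $k$-mode product of
$\mathcal{X} \in R^{I_1 \times \cdots \times I_k \times \cdots \times I_N}$ and $\mathbf{A} \in R^{P \times I_k}$ is denoted as $\mathcal{X} \times_k \mathbf{A} \in R^{I_1 \times \cdots \times I_{k-1} \times P \times I_{k+1} \times \cdots \times I_N}$, with elements
$$(\mathcal{X} \times_k \mathbf{A})_{i_1,\cdots,i_{k-1},p,i_{k+1},\cdots,i_N} = \sum_{i_k=1}^{I_k} x_{i_1,\cdots,i_k,\cdots,i_N}\, a_{p,i_k}.$$
Fig. 3 shows the 1-mode product of a third-order tensor $\mathcal{X} \in R^{5 \times 3 \times 3}$ and a matrix $\mathbf{A} \in R^{3 \times 5}$, and the result is a tensor of size
$3 \times 3 \times 3$.
The $k$-mode product has the property that the order of multiplication does not matter when the modes are different. If
the modes are different (i.e., $k \neq k'$), then
$$\mathcal{X} \times_k \mathbf{A} \times_{k'} \mathbf{B} = \mathcal{X} \times_{k'} \mathbf{B} \times_k \mathbf{A}.$$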
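The $k$-mode product can be checked numerically. The sketch below is an illustration only: `mode_product` is a hypothetical helper built on `numpy.tensordot`, modes are 0-based, and the sizes are chosen to mirror the $5 \times 3 \times 3$ example of Fig. 3. It also verifies that products along different modes commute.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """k-mode product (0-based mode): contract tensor axis `mode` with the matrix columns."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # contracted axis is replaced, new axis comes first
    return np.moveaxis(out, 0, mode)                    # move the new axis back to position `mode`

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3, 3))
A = rng.standard_normal((3, 5))   # P x I_1
B = rng.standard_normal((4, 3))   # Q x I_2

Y = mode_product(X, A, 0)
print(Y.shape)  # (3, 3, 3)

# Products along different modes can be applied in either order.
left = mode_product(mode_product(X, A, 0), B, 1)
right = mode_product(mode_product(X, B, 1), A, 0)
print(np.allclose(left, right))  # True
```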
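Definition 6. The Tucker decomposition decomposes the tensor $\mathcal{X} \in R^{I_1 \times I_2 \times \cdots \times I_N}$ into the product of a core tensor
$\mathcal{G} \in R^{R_1 \times R_2 \times \cdots \times R_N}$ and $N$ factor matrices $\mathbf{A}_{(n)} \in R^{I_n \times R_n}$, which is defined as
$$\mathcal{X} = \mathcal{G} \times_1 \mathbf{A}_{(1)} \times_2 \mathbf{A}_{(2)} \cdots \times_N \mathbf{A}_{(N)}. \tag{6}$$
Eq. (6) can also be described as $\mathcal{X} = \mathcal{G} \times \{\mathbf{A}_{(n)}\}$, where the factor matrices $\{\mathbf{A}_{(n)}\}$ are usually orthogonal. When $R_n < I_n$, the core tensor
$\mathcal{G}$ can be regarded as a compression of the tensor $\mathcal{X}$. In Fig. 4, the 3-order tensor $\mathcal{X} \in R^{I \times J \times K}$ is decomposed into three factor matrices
$\mathbf{A} \in R^{I \times R_1}$, $\mathbf{B} \in R^{J \times R_2}$, $\mathbf{C} \in R^{K \times R_3}$ and a 3-order core tensor $\mathcal{G} \in R^{R_1 \times R_2 \times R_3}$.
The Tucker decomposition of the 3-order tensor can be constructed by the SVD of the matrix. Firstly, the SVD is performed on the
matrices $(\mathcal{X})_{(1)}$, $(\mathcal{X})_{(2)}$ and $(\mathcal{X})_{(3)}$. Then the three left singular matrices $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are obtained, and finally the core tensor is
calculated according to Eq. (7).
$$\mathcal{G} = \mathcal{X} \times_1 \mathbf{A}^T \times_2 \mathbf{B}^T \times_3 \mathbf{C}^T. \tag{7}$$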
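The SVD-based construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `unfold`, `mode_product`, `hosvd` and `reconstruct` are hypothetical helper names, modes are 0-based, and the truncation ranks are passed in by hand.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, m, mode):
    return np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)

def hosvd(X, ranks):
    """Tucker decomposition via SVD of each unfolding; the core follows Eq. (7)."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])              # leading left singular vectors
    core = X
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # multiply by the transposed factor
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and factor matrices, as in Eq. (6)."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
core, factors = hosvd(X, ranks=(4, 5, 6))
print(np.allclose(reconstruct(core, factors), X))  # True (no truncation)

core_t, factors_t = hosvd(X, ranks=(2, 3, 3))
print(core_t.shape)  # (2, 3, 3)
```

With full ranks the reconstruction is exact; with $R_n < I_n$ the core is a compressed representation and the reconstruction is only an approximation.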
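Definition 7. Divide the $N$-th order tensor $\mathcal{X}$ into $2^N$ sub-tensors $\mathcal{X}_{u_1,\cdots,u_N}$, $(u_1,\cdots,u_N) \in \Theta$, and divide each matrix $\mathbf{A}_{(n)} \in R^{I_n \times R_n}$ into
$\mathbf{A}_{(n),1} \in R^{I_{n,1} \times R_n}$ and $\mathbf{A}_{(n),2} \in R^{I_{n,2} \times R_n}$, where $I_{n,1} + I_{n,2} = I_n$ and $\mathbf{A}_{(n)}^T = [\mathbf{A}_{(n),1}^T \ \mathbf{A}_{(n),2}^T] \in R^{R_n \times (I_{n,1} + I_{n,2})}$. Then, the block tensor matrix
multiplication is defined as
$$\mathcal{X} \times \{\mathbf{A}_{(n)}\} = \sum_{(u_1,\cdots,u_N) \in \Theta} \mathcal{X}_{u_1,\cdots,u_N} \times \{\mathbf{A}_{(n),u_n}\}. \tag{8}$$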
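The block identity of Definition 7 can be verified numerically on a small example. In the sketch below the helper `multi_mode_product`, the tensor sizes and the split points are all assumptions made for illustration. The blocks are applied in transposed form so that the dimensions match, which is how the identity is used later in Eq. (23).

```python
import itertools
import numpy as np

def multi_mode_product(t, mats):
    """Apply one matrix per mode: t x_1 mats[0] x_2 mats[1] ... (0-based axes)."""
    for mode, m in enumerate(mats):
        t = np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)
    return t

rng = np.random.default_rng(0)
dims, ranks, split = (4, 5, 6), (2, 3, 2), (3, 3, 4)   # split[n] plays the role of I_{n,1}
X = rng.standard_normal(dims)
A = [rng.standard_normal((i, r)) for i, r in zip(dims, ranks)]

# Left-hand side: multiply the whole tensor by the whole (transposed) matrices.
full = multi_mode_product(X, [a.T for a in A])

# Right-hand side: sum over the 2^N sub-tensors with the matching row blocks.
total = np.zeros(ranks)
for u in itertools.product((0, 1), repeat=3):
    idx = tuple(slice(0, s) if b == 0 else slice(s, None) for b, s in zip(u, split))
    blocks = [a[sl].T for a, sl in zip(A, idx)]
    total += multi_mode_product(X[idx], blocks)

print(np.allclose(full, total))  # True
```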
This section introduces the framework for network traffic anomaly detection based on the multi-modal incremental tensor decom-
position proposed in this paper. In Fig. 5, the detection framework is mainly divided into three modules, which are data preprocessing,
multi-modal incremental tensor decomposition (MMITD), and anomaly detection.
Moreover, the existing dataset contains a large number of features, some of which have little effect on the results of anomaly
detection [30]. Therefore, these features can be removed and some meaningful features are retained to improve the detection accuracy.
The main idea of the MMITD method is to calculate the Tucker decomposition result of the tensor at time 𝑡 + 1 according to the
Tucker decomposition result at time 𝑡, and finally obtain low-rank and clean tensor data at time 𝑡 + 1.
Firstly, the truncated Tucker decomposition is performed on the tensor data $\mathcal{X}^{(t)} \in R^{I_1^{(t)} \times I_2^{(t)} \times \cdots \times I_{N-1}^{(t)} \times I_N^{(t)}}$ to obtain
$\mathcal{G}^{(t)} \in R^{R_1 \times R_2 \times \cdots \times R_N}$ and $\mathbf{A}^{(t)}_{(n)} \in R^{I_n^{(t)} \times R_n}$, and the truncated rank $(R_1, R_2, \cdots, R_N)$ is calculated by Eq. (9),
$$Q \ge \frac{\sum_{i_n=1}^{R_n} (\sigma^{(n)}_{i_n})^2}{\sum_{i_n=1}^{I_n} (\sigma^{(n)}_{i_n})^2}, \quad n \in \{1, 2, \cdots, N\}, \tag{9}$$
where $\sigma^{(n)}_{i_n}$ is the singular value of the matrix obtained by unfolding the tensor data along the $n$-th mode and $Q$ is the ratio. Compared with directly
fixing the rank of the tensor in [4], our method allows the tensor data to retain some variability to obtain the main features of the
data.
Fig. 7. Decomposing tensor into low-rank tensor (noise-free tensor) and sparse tensor (noise tensor).
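The rank rule of Eq. (9) can be sketched in a few lines. The code below takes the inequality literally and picks, for each mode, the largest $R_n$ whose retained-energy ratio does not exceed $Q$ (with a floor of 1). This reading is an assumption; the other common convention, the smallest rank whose ratio reaches $Q$, differs from it by at most one. The names `unfold` and `truncation_ranks` are hypothetical.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def truncation_ranks(X, Q, tol=1e-12):
    """Per-mode rank from the cumulative squared singular values of each unfolding."""
    ranks = []
    for mode in range(X.ndim):
        s = np.linalg.svd(unfold(X, mode), compute_uv=False)
        energy = np.cumsum(s**2) / np.sum(s**2)       # ratio in Eq. (9) for R_n = 1, 2, ...
        r = int(np.count_nonzero(energy <= Q + tol))  # assumption: largest R_n with ratio <= Q
        ranks.append(max(r, 1))                       # keep at least one component
    return tuple(ranks)

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
rank_one = np.einsum('i,j,k->ijk', a, b, c)           # all energy sits in one component
print(truncation_ranks(rank_one, Q=0.9))              # (1, 1, 1)

dense = rng.standard_normal((4, 5, 6))
print(truncation_ranks(dense, Q=1.0))                 # (4, 5, 6)
```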
Then, $\mathcal{G}^{(t+1)}$ and $\mathbf{A}^{(t+1)}_{(n)}$ are calculated based on the newly added tensor data at time $t + 1$ and the decomposition results $\mathcal{G}^{(t)}$ and
$\mathbf{A}^{(t)}_{(n)}$ at time $t$, which includes the partitioning of the incremental tensor and updating the factor matrix and the core tensor.
Finally, the tensor $\mathcal{X}^{(t+1)}_{new}$ is calculated according to Eq. (6) and used for anomaly detection.
Then, we divide $\mathcal{X}^{(t+1)}$ into $2^N$ sub-tensors, denoted as $\mathcal{X}^{(t+1)}_{u_1,\cdots,u_N}$, where $(u_1, \cdots, u_N) \in \Theta \triangleq \{0, 1\}^N$ is a binary $N$-tuple. When
every $u_n = 0$, the sub-tensor $\mathcal{X}^{(t+1)}_{0,\cdots,0} = \mathcal{X}^{(t)}$, and the remaining sub-tensors are the newly added data at time $t + 1$.
Finally, we divide the newly added $2^N - 1$ sub-tensors $(\mathcal{X}^{(t+1)}_{u_1,\cdots,u_N})_{(u_1,\cdots,u_N) \neq (0,\cdots,0)}$ into $N$ categories according to the number of
indices $u_n = 1$, denoted as $\mathbb{C}_n$.
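In Fig. 5, taking the 3-order tensor as an example, we divide the newly added $2^3 - 1$ sub-tensors into three categories: $\mathbb{C}_1$, $\mathbb{C}_2$ and
$\mathbb{C}_3$.
$$\begin{aligned}
\mathbb{C}_1 &= \{\mathcal{X}^{(t+1)}_{1,0,0}, \mathcal{X}^{(t+1)}_{0,1,0}, \mathcal{X}^{(t+1)}_{0,0,1}\}, \\
\mathbb{C}_2 &= \{\mathcal{X}^{(t+1)}_{1,1,0}, \mathcal{X}^{(t+1)}_{1,0,1}, \mathcal{X}^{(t+1)}_{0,1,1}\}, \\
\mathbb{C}_3 &= \{\mathcal{X}^{(t+1)}_{1,1,1}\}.
\end{aligned} \tag{10}$$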
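The partition into sub-tensors and categories can be sketched as below. The function names `partition` and `categories`, and the example sizes (an old $4 \times 6 \times 9$ block inside a $5 \times 8 \times 10$ tensor), are assumptions for illustration; index 0 selects the historical range of a mode and index 1 the newly added range.

```python
import itertools
import numpy as np

def partition(X_new, old_shape):
    """Split X_new into 2^N sub-tensors keyed by u in {0,1}^N."""
    subs = {}
    for u in itertools.product((0, 1), repeat=X_new.ndim):
        idx = tuple(slice(0, s) if b == 0 else slice(s, None)
                    for b, s in zip(u, old_shape))
        subs[u] = X_new[idx]
    return subs

def categories(subs):
    """Group the newly added sub-tensors by the number of indices equal to 1."""
    cats = {}
    for u in subs:
        if any(u):                      # skip (0,...,0), which is the historical tensor
            cats.setdefault(sum(u), []).append(u)
    return cats

X_new = np.arange(5 * 8 * 10).reshape(5, 8, 10)
subs = partition(X_new, old_shape=(4, 6, 9))
cats = categories(subs)
print(subs[(0, 0, 0)].shape)  # (4, 6, 9)
print(cats[1])                # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(cats[3])                # [(1, 1, 1)]
```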
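where in the right half of the equation, $\mathbf{A}_{(n)} = \mathbf{A}^{(t)}_{(n)} \in R^{I_n \times R_n}$ if $u_n = 0$ and $\mathbf{A}_{(n)} = \mathbf{A}'_{(n)} \in R^{d_n \times R_n}$ if $u_n = 1$.
Then, calculating the pseudo-inverse of the matrix of the unfolding of the tensor $\mathcal{Y}_{-u_n}$ along the $n$-th mode, the extension matrix
$\mathbf{A}'_{(n)}$ will be obtained according to Eq. (12),
$$\mathbf{A}'^{\,new}_{(n)} \leftarrow \alpha \mathbf{A}'^{\,old}_{(n)} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{u_1,\cdots,u_N})_{(n)} (\mathcal{Y}_{-u_n})^{\dagger}_{(n)}, \tag{12}$$
where $\alpha$ indicates the extent to which the information obtained in the previous step is retained, and $\alpha \in (0, 1)$.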
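A small numerical sketch may help to read Eq. (12). It treats the 3-order case and the first mode only: the auxiliary tensor is built from a core tensor and the factor matrices of the other two modes, and the new rows are generated from a known extension matrix so that the update can be checked. All names (`update_extension`, `A_ext_true`, the sizes) are hypothetical, and the data is noise-free by construction.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, m, mode):
    return np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)

def update_extension(A_old, X_sub, Y, mode, alpha):
    """One update in the style of Eq. (12): blend the old estimate with a pseudo-inverse fit."""
    fit = unfold(X_sub, mode) @ np.linalg.pinv(unfold(Y, mode))
    return alpha * A_old + (1 - alpha) * fit

rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))            # core, R1 x R2 x R3
B = rng.standard_normal((6, 3))               # I2 x R2
C = rng.standard_normal((7, 2))               # I3 x R3
A_ext_true = rng.standard_normal((2, 2))      # d1 x R1, used only to generate data

Y = mode_product(mode_product(G, B, 1), C, 2)  # auxiliary tensor, R1 x I2 x I3
X_100 = mode_product(Y, A_ext_true, 0)         # newly added rows along mode 1

A_ext = update_extension(np.zeros((2, 2)), X_100, Y, mode=0, alpha=0.6)
print(np.allclose(A_ext, 0.4 * A_ext_true))    # True
```

Starting from a zero matrix, one step moves a fraction $1 - \alpha$ of the way toward the least-squares fit, which shows how $\alpha$ controls the weight given to the previous estimate.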
An extension matrix obtained in this way is generally not orthogonal (it does not satisfy the properties of a unitary matrix), so it is
orthogonalized before being used in the next step. Compared with the method in [4], where only the final extension matrix is orthogonalized, our method
ensures that the matrix used in the next step to calculate the tensor $\mathcal{Y}_{-u_n}$ is orthogonalized, thus improving the accuracy of the tensor
decomposition and the anomaly detection rate.
The above update steps are repeated in each category and the final extension matrix $\mathbf{A}'_{(n)}$ will be obtained. Then, the matrix $\mathbf{A}'_{(n)}$
is concatenated with the matrix $\mathbf{A}^{(t)}_{(n)}$ along the second mode according to Eq. (13) to obtain $\mathbf{V}^{(t+1)}_{(n)}$.
$$(\mathbf{V}^{(t+1)}_{(n)})^T = [(\mathbf{A}^{(t)}_{(n)})^T \ (\mathbf{A}'_{(n)})^T]. \tag{13}$$
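The concatenation of Eq. (13) followed by orthogonalization can be sketched as follows. The text does not say which orthogonalization routine is used, so the reduced QR factorization below is an assumption, and the matrix sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
A_t = np.linalg.qr(rng.standard_normal((6, 3)))[0]   # factor matrix at time t, I_n x R_n
A_ext = rng.standard_normal((2, 3))                  # extension matrix, d_n x R_n

# Eq. (13): V^T = [A_t^T  A_ext^T], i.e. the rows of A_ext are appended below A_t.
V = np.vstack([A_t, A_ext])

# Assumption: orthogonalize with a reduced QR factorization.
A_next, _ = np.linalg.qr(V)
print(A_next.shape)                                  # (8, 3)
print(np.allclose(A_next.T @ A_next, np.eye(3)))     # True
```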
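where
$$\begin{aligned}
\mathcal{Y}_{0,\cdot,0} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}^{(t)} \times_3 \mathbf{C}^{(t)}, \\
\mathcal{Y}_{0,0,\cdot} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}^{(t)} \times_2 \mathbf{B}^{(t)}.
\end{aligned} \tag{16}$$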
Finally, 𝐀′ , 𝐁′ and 𝐂′ will be orthogonalized. Although we have updated all the extension matrices, the update process in ℂ2 and
ℂ3 is also closely related to the extension matrix. Therefore, the results of this update will be used in the next step.
(𝑖𝑖): Updating Extension Matrix with ℂ2 .
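There are three sub-tensors in $\mathbb{C}_2$, and each sub-tensor will be used for the update of two extension matrices. For example, the
sub-tensor $\mathcal{X}^{(t+1)}_{1,1,0}$ will be used to update $\mathbf{A}'$ and $\mathbf{B}'$ in Eq. (17),
$$\begin{aligned}
\mathbf{A}'^{\,new} &\leftarrow \alpha \mathbf{A}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,1,0})_{(1)} (\mathcal{Y}_{\cdot,1,0})^{\dagger}_{(1)}, \\
\mathbf{B}'^{\,new} &\leftarrow \alpha \mathbf{B}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,1,0})_{(2)} (\mathcal{Y}_{1,\cdot,0})^{\dagger}_{(2)},
\end{aligned} \tag{17}$$
where
$$\begin{aligned}
\mathcal{Y}_{\cdot,1,0} &= \mathcal{G}^{(t)} \times_2 \mathbf{B}' \times_3 \mathbf{C}^{(t)}, \\
\mathcal{Y}_{1,\cdot,0} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}' \times_3 \mathbf{C}^{(t)}.
\end{aligned} \tag{18}$$
Since the indices $u_1 = u_2 = 1$, we use the extension matrices $\mathbf{A}'$ and $\mathbf{B}'$ updated in $\mathbb{C}_1$ instead of the factor matrices $\mathbf{A}^{(t)}$ and $\mathbf{B}^{(t)}$ at time
$t$. Since $u_3 = 0$, we use the factor matrix $\mathbf{C}^{(t)}$ at time $t$.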
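Similarly, the update equations using sub-tensors $\mathcal{X}^{(t+1)}_{1,0,1}$ and $\mathcal{X}^{(t+1)}_{0,1,1}$ are shown in Eq. (19),
$$\begin{aligned}
\mathbf{A}'^{\,new} &\leftarrow \alpha \mathbf{A}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,0,1})_{(1)} (\mathcal{Y}_{\cdot,0,1})^{\dagger}_{(1)}, \\
\mathbf{C}'^{\,new} &\leftarrow \alpha \mathbf{C}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,0,1})_{(3)} (\mathcal{Y}_{1,0,\cdot})^{\dagger}_{(3)}, \\
\mathbf{B}'^{\,new} &\leftarrow \alpha \mathbf{B}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{0,1,1})_{(2)} (\mathcal{Y}_{0,\cdot,1})^{\dagger}_{(2)}, \\
\mathbf{C}'^{\,new} &\leftarrow \alpha \mathbf{C}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{0,1,1})_{(3)} (\mathcal{Y}_{0,1,\cdot})^{\dagger}_{(3)},
\end{aligned} \tag{19}$$
where
$$\begin{aligned}
\mathcal{Y}_{\cdot,0,1} &= \mathcal{G}^{(t)} \times_2 \mathbf{B}^{(t)} \times_3 \mathbf{C}', \\
\mathcal{Y}_{1,0,\cdot} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}' \times_2 \mathbf{B}^{(t)}, \\
\mathcal{Y}_{0,\cdot,1} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}^{(t)} \times_3 \mathbf{C}', \\
\mathcal{Y}_{0,1,\cdot} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}^{(t)} \times_2 \mathbf{B}'.
\end{aligned} \tag{20}$$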
Finally, similar to the previous steps, we orthogonalize the extension matrix obtained each time.
(𝑖𝑖𝑖): Updating Extension Matrix with ℂ3 .
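There is only one sub-tensor $\mathcal{X}^{(t+1)}_{1,1,1}$ in $\mathbb{C}_3$, and $u_1 = u_2 = u_3 = 1$. The extension matrix is updated according to Eq. (21),
$$\begin{aligned}
\mathbf{A}'^{\,new} &\leftarrow \alpha \mathbf{A}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,1,1})_{(1)} (\mathcal{Y}_{\cdot,1,1})^{\dagger}_{(1)}, \\
\mathbf{B}'^{\,new} &\leftarrow \alpha \mathbf{B}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,1,1})_{(2)} (\mathcal{Y}_{1,\cdot,1})^{\dagger}_{(2)}, \\
\mathbf{C}'^{\,new} &\leftarrow \alpha \mathbf{C}'^{\,old} + (1 - \alpha)(\mathcal{X}^{(t+1)}_{1,1,1})_{(3)} (\mathcal{Y}_{1,1,\cdot})^{\dagger}_{(3)},
\end{aligned} \tag{21}$$
where
$$\begin{aligned}
\mathcal{Y}_{\cdot,1,1} &= \mathcal{G}^{(t)} \times_2 \mathbf{B}' \times_3 \mathbf{C}', \\
\mathcal{Y}_{1,\cdot,1} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}' \times_3 \mathbf{C}', \\
\mathcal{Y}_{1,1,\cdot} &= \mathcal{G}^{(t)} \times_1 \mathbf{A}' \times_2 \mathbf{B}'.
\end{aligned} \tag{22}$$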
Similarly, after orthogonalizing the extension matrices, we obtain the final results $\mathbf{A}'$, $\mathbf{B}'$ and $\mathbf{C}'$. Then, $\mathbf{A}^{(t)}$, $\mathbf{B}^{(t)}$ and
$\mathbf{C}^{(t)}$ are concatenated with them along the second mode according to Eq. (13). Finally, the concatenated matrices are orthogonalized to obtain
the factor matrices $\mathbf{A}^{(t+1)}$, $\mathbf{B}^{(t+1)}$ and $\mathbf{C}^{(t+1)}$ at time $t + 1$.
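$$\begin{aligned}
\mathcal{G}^{(t+1)} &= \mathcal{X}^{(t+1)} \times \{(\mathbf{A}^{(t+1)}_{(n)})^T\} \\
&= \sum_{(u_1,\cdots,u_N) \in \Theta} \mathcal{X}^{(t+1)}_{u_1,\cdots,u_N} \times \{(\mathbf{A}^{(t+1)}_{(n),u_n})^T\} \\
&= \mathcal{X}^{(t+1)}_{0,0,0} \times \{(\mathbf{A}^{(t+1)}_{(n),0})^T\} + \sum_{(u_1,\cdots,u_N) \in \Theta'} \mathcal{X}^{(t+1)}_{u_1,\cdots,u_N} \times \{(\mathbf{A}^{(t+1)}_{(n),u_n})^T\} \\
&= \mathcal{G}^{(t)} \times \{(\mathbf{A}^{(t+1)}_{(n),0})^T \mathbf{A}^{(t)}_{(n)}\} + \sum_{(u_1,\cdots,u_N) \in \Theta'} \mathcal{X}^{(t+1)}_{u_1,\cdots,u_N} \times \{(\mathbf{A}^{(t+1)}_{(n),u_n})^T\}.
\end{aligned} \tag{23}$$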
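Next, the update process of the core tensor is illustrated using the third-order tensor as an example. Firstly, the factor matrices $\mathbf{A}^{(t+1)}$,
$\mathbf{B}^{(t+1)}$ and $\mathbf{C}^{(t+1)}$ are divided into $\mathbf{A}^{(t+1)}_0$, $\mathbf{A}^{(t+1)}_1$, $\mathbf{B}^{(t+1)}_0$, $\mathbf{B}^{(t+1)}_1$, $\mathbf{C}^{(t+1)}_0$, $\mathbf{C}^{(t+1)}_1$, respectively. Then the core tensor is calculated according
to Eq. (24)
$$\mathcal{G}^{(t+1)} = (\mathcal{X}^{(t+1)}_{0,0,0}) \times_1 (\mathbf{A}^{(t+1)}_0)^T \times_2 (\mathbf{B}^{(t+1)}_0)^T \times_3 (\mathbf{C}^{(t+1)}_0)^T + \sum_{(u_1,u_2,u_3) \in \Theta'} \mathcal{X}^{(t+1)}_{u_1,u_2,u_3} \times_1 (\mathbf{A}^{(t+1)}_{u_1})^T \times_2 (\mathbf{B}^{(t+1)}_{u_2})^T \times_3 (\mathbf{C}^{(t+1)}_{u_3})^T. \tag{24}$$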
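The point of the block-wise core update is that the historical block never has to be multiplied in full: its contribution can be computed from the old core and small $R_n \times R_n$ matrices, as in the last line of Eq. (23). The sketch below checks this on synthetic data. It assumes the historical block is exactly low rank, uses random orthonormal matrices in place of the updated factor matrices, and all names and sizes are hypothetical.

```python
import itertools
import numpy as np

def multi_mode_product(t, mats):
    for mode, m in enumerate(mats):
        t = np.moveaxis(np.tensordot(m, t, axes=(1, mode)), 0, mode)
    return t

rng = np.random.default_rng(0)
old, new, ranks = (4, 5, 6), (5, 7, 7), (2, 2, 3)
old_idx = tuple(slice(0, i) for i in old)

# Decomposition at time t and a grown tensor whose historical block equals G_t x {A_t}.
G_t = rng.standard_normal(ranks)
A_t = [np.linalg.qr(rng.standard_normal((i, r)))[0] for i, r in zip(old, ranks)]
X_new = rng.standard_normal(new)
X_new[old_idx] = multi_mode_product(G_t, A_t)

# Stand-ins for the factor matrices at time t+1, split into old rows (0) and new rows (1).
A_next = [np.linalg.qr(rng.standard_normal((i, r)))[0] for i, r in zip(new, ranks)]
A0 = [a[:i] for a, i in zip(A_next, old)]
A1 = [a[i:] for a, i in zip(A_next, old)]

# Historical term from the old core only, then the 2^N - 1 newly added blocks.
core = multi_mode_product(G_t, [a0.T @ a for a0, a in zip(A0, A_t)])
for u in itertools.product((0, 1), repeat=3):
    if not any(u):
        continue
    idx = tuple(slice(0, i) if b == 0 else slice(i, None) for b, i in zip(u, old))
    mats = [(A1[n] if b else A0[n]).T for n, b in enumerate(u)]
    core += multi_mode_product(X_new[idx], mats)

full = multi_mode_product(X_new, [a.T for a in A_next])
print(np.allclose(core, full))  # True
```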
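Algorithm 1: MMITD.
Input: $N$-order tensors $\mathcal{X}^{(t)}$ and $\mathcal{X}^{(t+1)}$
Output: Low-rank $\mathcal{X}^{(t+1)}_{new}$, $\mathcal{G}^{(t+1)}$, $\mathbf{A}^{(t+1)}_{(n)}$
Rank $(R_1, R_2, \cdots, R_N)$ of $\mathcal{X}^{(t)}$ ← according to Eq. (9)
$\mathcal{G}^{(t)}$, $\mathbf{A}^{(t)}_{(n)}$ ← truncated Tucker decomposition of $\mathcal{X}^{(t)}$
$\mathbb{C}_n$ ← partition the tensor $\mathcal{X}^{(t+1)}$
for $\mathbb{C}_n$, $n = \{1, 2, \cdots, N\}$ do
    for $\mathcal{X}^{(t+1)}_{u_1,\cdots,u_N} \in \mathbb{C}_n$ do
        for $u_n \in \{u_1, u_2, \cdots, u_N\}$ do
            if $u_n = 1$ then
                Calculate the extension matrix $\mathbf{A}'_{(n)}$ according to Eq. (12)
                Orthogonalize the matrix $\mathbf{A}'_{(n)}$
            end
        end
    end
end
for $\mathbf{A}'_{(n)}$, $n = \{1, 2, \cdots, N\}$ do
    $\mathbf{V}^{(t+1)}_{(n)}$ ← concatenate $\mathbf{A}'_{(n)}$ and $\mathbf{A}^{(t)}_{(n)}$
    $\mathbf{A}^{(t+1)}_{(n)}$ ← orthogonalize the matrix $\mathbf{V}^{(t+1)}_{(n)}$
end
Partition $\mathbf{A}^{(t+1)}_{(n)}$ into $\mathbf{A}^{(t+1)}_{(n),0}$ and $\mathbf{A}^{(t+1)}_{(n),1}$
$\mathcal{X}^{(t+1)}_{new}$, $\mathcal{G}^{(t+1)}$ ← according to Eq. (6) and Eq. (23)
return $\mathcal{X}^{(t+1)}_{new}$, $\mathcal{G}^{(t+1)}$, $\mathbf{A}^{(t+1)}_{(n)}$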
Table 2
Description of features in dataset NSL-KDD.
Table 3
Description of features in dataset CICDDOS2019.
Completing the above steps yields the results $\mathcal{G}^{(t+1)}$ and $\mathbf{A}^{(t+1)}_{(n)}$ of the Tucker decomposition at time $t + 1$, and thus
the approximate low-rank tensor $\mathcal{X}^{(t+1)}_{new}$.
Then it is transformed into a two-dimensional form for anomaly detection. Our detection
method uses XGBoost to classify the data into normal and abnormal traffic.
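The transformation back to a two-dimensional form can be sketched as below, assuming that the last mode of the tensor indexes traffic records and the first two modes hold the folded features. The helper name `to_samples` and the sizes are hypothetical. The resulting matrix has one row per record and is the kind of input a classifier such as XGBoost would be trained on; the classifier itself is not shown here.

```python
import numpy as np

def to_samples(X_low_rank):
    """(I, J, K) tensor -> (K, I*J) matrix with one row per traffic record.

    Assumption: the last mode indexes records.
    """
    K = X_low_rank.shape[-1]
    return np.moveaxis(X_low_rank, -1, 0).reshape(K, -1)

X_low_rank = np.arange(5 * 8 * 4, dtype=float).reshape(5, 8, 4)
features = to_samples(X_low_rank)
print(features.shape)  # (4, 40)
```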
XGBoost is a gradient-boosting-based ML algorithm with efficient, flexible, and scalable features. It can effectively handle large-
scale data, solve the problem of high-dimensional data, provide fast and accurate classification results, and avoid the phenomenon
of overfitting. Therefore, the XGBoost classification technique is widely used in various fields.
5. Performance evaluation
This section is the experimental part, which mainly evaluates the performance of anomaly detection. The experimental section
contains three subsections. The first subsection describes the two datasets used for the experiments, the second subsection mainly
introduces relevant evaluation metrics and the third subsection gives the experimental results and analysis.
5.1. Datasets
Datasets used for anomaly detection in the experimental part of this article are the commonly used NSL-KDD benchmark dataset
[31] and the recent CICDDOS 2019 dataset [32]. We will describe these two datasets in detail below.
5.1.1. NSL-KDD
There are 41 features in the NSL-KDD dataset. After numerical processing of some features and removing some features that do
not contribute much to the classification, the final data containing 40 features and class labels is obtained. Then, we change the label
of normal data to ‘0’ and the label of abnormal data to ‘1’. Part of the feature information is shown in Table 2.
5.1.2. CICDDOS2019
The CICDDOS2019 dataset contains normal traffic and the latest DDoS attack traffic, similar to real-world data. The dataset is very
large and each piece of data contains 87 features. In the experiments, we remove features that have no impact on anomaly detection
from the 87 features, leaving 64 features [28] and labels. Similarly, change the label of normal data to ‘0’ and the label of abnormal
data to ‘1’. Part of the feature information is shown in Table 3.
5.2. Evaluation metrics
This subsection describes the evaluation metrics, including CPU running time, Accuracy, Precision, Recall, False Alarm Rate, and
F1-Score.
CPU running time: Evaluates the running efficiency of the detection algorithm, including tensor decomposition time and
classification time.
Accuracy: Accuracy is a popular evaluation metric for classification models, which indicates the proportion of correctly classified
samples to the total number of samples, and is defined as
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}. \tag{25}$$
Precision: Precision uses the result of the prediction as a judgment criterion and indicates the proportion of samples with a positive
prediction that are correctly predicted, which is denoted as
$$Precision = \frac{TP}{TP + FP}. \tag{26}$$
Recall: Recall is judged by the actual sample and indicates the proportion of correctly predicted positive samples to the total
actual positive samples, which is denoted as
$$Recall = \frac{TP}{TP + FN}. \tag{27}$$
False Alarm Rate: The False Alarm Rate (FAR), also known as the False Positive Rate and False Detection Rate, has a lower value
indicating the better performance of the model. The FAR is described as
$$FAR = \frac{FP}{TN + FP}. \tag{28}$$
F1-Score: F1-Score uses both precision and recall to evaluate the model. The value is high, and the model is more robust, only
when both precision and recall perform well. The F1-Score is described as
$$F1\text{-}Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}. \tag{29}$$
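Eqs. (25) to (29) can be computed directly from the four confusion-matrix counts. The function below is a small sketch with a hypothetical name, and the counts in the example are made up for illustration and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, Precision, Recall, FAR and F1-Score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "far": fp / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Illustrative counts only.
m = detection_metrics(tp=90, fp=10, tn=80, fn=20)
print(m["accuracy"], m["precision"])  # 0.85 0.9
```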
5.3. Experimental results and analysis
For the NSL-KDD dataset in Table 2, the data matrix is denoted as $\mathbf{X} \in R^{N \times K}$. In the detection framework, $\mathbf{X}$ is modeled as a third-order
tensor $\mathcal{X} \in R^{I \times J \times K}$, where $I = 5$, $J = 8$, $I \times J = N$, and $K$ is the total data volume. The CICDDOS2019 dataset in Table 3 is folded
into a third-order tensor $\mathcal{X} \in R^{I \times J \times K}$, where $I = J = 8$. In the experimental process, (i) different proportions of data are randomly
selected to obtain $\mathcal{X} \in R^{I \times J \times K'}$, where $K' = K \times f$ and $f$ is the selected proportion; (ii) different initial tensors and incremental
tensors are set for the different proportions of data, and multi-modal incremental tensor decomposition is performed using Algorithm 1; (iii) 5-fold
cross-validation is used to generate training and testing sets. As shown in Fig. 8, the dataset is divided into 5 blocks. In the experiments,
each block is used once as the testing set and four times as part of the training set, leading to 5 sets of experimental results. The final
evaluation metrics are the averages over these 5 sets of results.
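The folding of the feature matrix into a third-order tensor and the 5-fold split can be sketched as follows. The helper names `fold` and `five_fold_indices`, the row-major ordering of the 40 features into a $5 \times 8$ grid, and the random shuffling are assumptions for illustration; the toy matrix has 10 records instead of a real dataset.

```python
import numpy as np

def fold(X, I, J):
    """Fold an (N, K) feature-by-record matrix into an (I, J, K) tensor, with I*J = N."""
    N, K = X.shape
    assert N == I * J
    return X.reshape(I, J, K)   # assumption: features fill the I x J grid row by row

def five_fold_indices(K, n_splits=5, seed=0):
    """Yield (train, test) index arrays; each block serves once as the testing set."""
    perm = np.random.default_rng(seed).permutation(K)
    folds = np.array_split(perm, n_splits)
    for i in range(n_splits):
        train = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train, folds[i]

X = np.arange(40 * 10, dtype=float).reshape(40, 10)   # 40 features, 10 records
T = fold(X, 5, 8)
print(T.shape)  # (5, 8, 10)

splits = list(five_fold_indices(10))
print([len(test) for _, test in splits])  # [2, 2, 2, 2, 2]
```

The metrics of the five runs would then be averaged, as described in the text.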
Then, we explore the effect of the impact factor 𝛼 on the detection performance in the incremental tensor decomposition part.
For the NSL-KDD dataset, the initial tensor data size is set to 4 × 6 × (𝐾 ′ × 0.98), and the total tensor size is 5 × 8 × 𝐾 ′ after the
dimensions are increased. For the CICDDOS 2019 dataset, the initial tensor data size is set to 6 × 6 × (𝐾 ′ × 0.98) and the total tensor
size is 8 × 8 × 𝐾 ′ . 𝑓 is chosen as 0.2 in the experiment, and the experimental results are shown in Fig. 9. It can be found that the
value of the impact factor 𝛼 has little effect on the accuracy. Therefore, we take 𝛼 = 0.6 in the following experiments.
In addition, in the detection framework of this paper, when updating the factor matrix, each extension matrix is orthogonalized
after it has been updated. Compared with orthogonalizing only the final extension matrix, our method can achieve better detection
performance in the NSL-KDD dataset and CICDDOS 2019 dataset, as shown in Fig. 10.
Finally, a series of contrast experiments will be introduced in the next subsection.
Fig. 10. Change in ACC when each extension matrix is orthogonalized and when only the last extension matrix is orthogonalized. (a) and (b) show the changes in
ACC on the NSL-KDD and CICDDOS2019 datasets, respectively.
• The detection framework proposed in this paper achieves better detection performance for network traffic data growing along
multiple modes.
• Compared with other methods, the framework in this paper has higher Accuracy, Precision, Recall, F1-Score, and lower False
Alarm Rate on two network datasets.
• In Fig. 11(f) and Fig. 12(f), where the running time is shown on a logarithmic scale, the detection framework of this paper
has a greatly reduced running time compared to other methods. As the volume of data increases, it can be dozens, hundreds,
or even thousands of times faster than the other methods.
Fig. 11. Comparing the detection performance of different tensor decomposition algorithms on the NSL-KDD dataset.
• Data growing along multiple modes have better detection performance on different classification algorithms and different dataset
sizes after being processed by Algorithm 1.
• Compared to other detection methods, the XGBoost classification technique has faster detection speed and better detection
performance on different proportions of data.
• The traffic anomaly detection framework proposed in this paper is highly robust. Better detection performance can be achieved
on different categories of datasets and different dataset sizes.
Fig. 12. Comparing the detection performance of different tensor decomposition algorithms on the CICDDOS2019 dataset.
6. Conclusion
In this paper, we propose an anomaly detection framework based on multi-modal incremental tensor decomposition for large-scale
networks. The framework is suited to dynamic anomaly detection systems and handles the case where network data grows along
multiple modes. For this type of data, Tucker decomposition of the multi-modal incremental tensor is used to reduce computational cost
and to remove data redundancy and noise, which improves data quality. Compared with other tensor decomposition-based methods, our
detection method uses only low-cost tensor-matrix multiplication and matrix pseudo-inversion, and does not perform a complete tensor
decomposition on large-scale data. Therefore, the framework proposed in this paper is better suited to large-scale data and achieves
higher detection speed and accuracy.
Fig. 13. Comparing the detection performance of different ML algorithms on the NSL-KDD dataset.
In addition, the anomaly detection module uses the XGBoost classification technique, which further improves the performance of our
detection framework. Building on these characteristics, we intend in future work to apply this detection framework to software-defined
networking (SDN) architectures to avoid serious harm from network attacks.
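The two low-cost operations named in the conclusion, tensor-matrix (mode-n) multiplication and matrix pseudo-inversion, can be sketched in a few lines of NumPy. This is not the paper's Algorithm 1 and performs no incremental update; it only shows, on a small synthetic tensor, how a Tucker core can be recovered from known factor matrices using these two operations. The helper `mode_n_product`, the tensor sizes, and the random data are all hypothetical.

```python
import numpy as np


def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the mode-n product)."""
    # Contract the matrix columns with the chosen tensor axis; the new axis comes out first.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    # Move the new axis back to position `mode`.
    return np.moveaxis(out, 0, mode)


rng = np.random.default_rng(0)

# Synthetic Tucker model: a small core and one factor matrix per mode.
core = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((dim, rank))
           for dim, rank in zip((6, 7, 5), core.shape)]

# Build the full tensor: x = core x_1 U1 x_2 U2 x_3 U3.
x = core
for mode, u in enumerate(factors):
    x = mode_n_product(x, u, mode)

# Recover the core using only pseudo-inverses and mode-n products.
recovered = x
for mode, u in enumerate(factors):
    recovered = mode_n_product(recovered, np.linalg.pinv(u), mode)
```

Because each factor matrix here has full column rank, its pseudo-inverse is a left inverse, so `recovered` matches `core` up to floating-point error without any full decomposition of `x`.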
Fig. 14. Comparing the detection performance of different ML algorithms on the CICDDOS2019 dataset.
CRediT authorship contribution statement
Rongqiao Fan: Writing – original draft, Software, Methodology. Qiyuan Fan: Writing – review & editing, Writing – original draft,
Supervision, Software, Resources, Project administration, Methodology, Formal analysis, Data curation, Conceptualization. Xue Li:
Resources, Investigation. Puming Wang: Writing – review & editing, Supervision, Conceptualization. Jing Xu: Validation, Formal
analysis. Xin Jin: Visualization. Shaowen Yao: Project administration. Peng Liu: Data curation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Project Nos. 62166047, 62101481, and 62002313) and the
15th Graduate Research Innovation Project of Yunnan University (No. KC-23234593).
References
[1] H. Ringberg, A. Soule, J. Rexford, et al., Sensitivity of PCA for traffic anomaly detection, in: Proceedings of the 2007 ACM SIGMETRICS International Conference
on Measurement and Modeling of Computer Systems, 2007, pp. 109–120.
[2] P. Mignone, R. Corizzo, M. Ceci, Distributed and explainable GHSOM for anomaly detection in sensor networks, Mach. Learn. (2024) 1–42.
[3] M. Wang, D. Hong, Z. Han, et al., Tensor decompositions for hyperspectral data processing in remote sensing: a comprehensive review, IEEE Geosci. Remote
Sens. Mag. 11 (1) (2023) 26–72.
[4] H. Xiao, F. Wang, F. Ma, et al., eOTD: an efficient online Tucker decomposition for higher order tensors, in: 2018 IEEE International Conference on Data Mining
(ICDM), IEEE, 2018, pp. 1326–1331.
[5] L. Huang, X.L. Nguyen, M. Garofalakis, et al., In-network PCA and anomaly detection, Adv. Neural Inf. Process. Syst. (2006) 19.
[6] A. Lakhina, M. Crovella, C. Diot, Diagnosing network-wide traffic anomalies, ACM SIGCOMM Comput. Commun. Rev. 34 (4) (2004) 219–230.
[7] Y.R. Yeh, Z.Y. Lee, Y.J. Lee, Anomaly detection via over-sampling principal component analysis, in: New Advances in Intelligent Decision Technologies, Springer,
Berlin, Heidelberg, 2009, pp. 449–458.
[8] Y.J. Lee, Y.R. Yeh, Y.C.F. Wang, Anomaly detection via online oversampling principal component analysis, IEEE Trans. Knowl. Data Eng. 25 (7) (2012) 1460–1470.
[9] J. Udhayan, T. Hamsapriya, Statistical segregation method to minimize the false detections during ddos attacks, Int. J. Netw. Secur. 13 (3) (2011) 152–160.
[10] S. Fortunati, F. Gini, M.S. Greco, et al., An improvement of the state-of-the-art covariance-based methods for statistical anomaly detection algorithms, in: Signal,
Image and Video Processing, vol. 10, 2016, pp. 687–694.
[11] X. Han, L. Xu, M. Ren, et al., A Naive Bayesian network intrusion detection algorithm based on Principal Component Analysis, in: 2015 7th International
Conference on Information Technology in Medicine and Education (ITME), IEEE, 2015, pp. 325–328.
[12] H. Peng, Z. Sun, X. Zhao, et al., A detection method for anomaly flow in software defined network, IEEE Access 6 (2018) 27809–27817.
[13] R.H. Hwang, M.C. Peng, C.W. Huang, et al., An unsupervised deep learning model for early network traffic anomaly detection, IEEE Access 8 (2020) 30387–30399.
[14] Z. Li, X. Chen, J. Song, et al., Adaptive label propagation for group anomaly detection in large-scale networks, IEEE Trans. Knowl. Data Eng. (2022).
[15] K. Wu, Z. Chen, W. Li, A novel intrusion detection model for a massive network using convolutional neural networks, IEEE Access 6 (2018) 50850–50859.
[16] S. Garg, K. Kaur, N. Kumar, et al., Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in SDN: a social multimedia perspective,
IEEE Trans. Multimed. 21 (3) (2019) 566–578.
[17] P. Wang, L.T. Yang, G. Qian, et al., HO-OTSVD: a novel tensor decomposition and its incremental decomposition for cyber–physical–social networks (CPSN),
IEEE Trans. Netw. Sci. Eng. 7 (2) (2019) 713–725.
[18] P. Wang, L.T. Yang, J. Li, et al., Data fusion in cyber-physical-social systems: state-of-the-art and perspectives, Inf. Fusion 51 (2019) 42–57.
[19] Q. Song, X. Huang, H. Ge, et al., Multi-aspect streaming tensor completion, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2017, pp. 435–443.
[20] P. Wang, L.T. Yang, J. Li, et al., MMDP: a mobile-IoT based multi-modal reinforcement learning service framework, IEEE Trans. Serv. Comput. 13 (4) (2020)
675–684.
[21] K. Yang, Y. Gao, Y. Shen, et al., Dismastd: an efficient distributed multi-aspect streaming tensor decomposition, in: 2021 IEEE 37th International Conference on
Data Engineering (ICDE), IEEE, 2021, pp. 1080–1091.
[22] C. Liu, T. Wu, Z. Li, et al., Robust online tensor completion for IoT streaming data recovery, IEEE Trans. Neural Netw. Learn. Syst. (2022).
[23] J. Sun, D. Tao, S. Papadimitriou, et al., Incremental tensor analysis: theory and applications, ACM Trans. Knowl. Discov. Data 2 (3) (2008) 1–37.
[24] P. Wang, L.T. Yang, X. Nie, et al., Data-driven software defined network attack detection: state-of-the-art and perspectives, Inf. Sci. 513 (2020) 65–83.
[25] X. Li, K. Xie, X. Wang, et al., Online Internet anomaly detection with high accuracy: a fast tensor factorization solution, in: IEEE INFOCOM 2019-IEEE Conference
on Computer Communications, IEEE, 2019, pp. 1900–1908.
[26] K. Xie, X. Li, X. Wang, et al., Fast tensor factorization for accurate internet anomaly detection, IEEE/ACM Trans. Netw. 25 (6) (2017) 3794–3807.
[27] W. Huang, K. Xie, J. Li, A novel sequence tensor recovery algorithm for quick and accurate anomaly detection, IEEE Trans. Netw. Sci. Eng. 9 (5) (2022) 3531–3545.
[28] J.P.A. Maranhão, J.P.C.L. da Costa, E. Javidi, et al., Tensor based framework for Distributed Denial of Service attack detection, J. Netw. Comput. Appl. 174
(2021) 102894.
[29] J. Xu, X. Li, P. Wang, et al., Multi-modal noise-robust DDoS attack detection architecture in large-scale networks based on tensor SVD, IEEE Trans. Netw. Sci.
Eng. 10 (1) (2022) 152–165.
[30] X. Sáez-de-Cámara, J.L. Flores, C. Arellano, et al., Clustered federated learning architecture for network anomaly detection in large scale heterogeneous IoT
networks, Comput. Secur. 131 (2023) 103299.
[31] M. Tavallaee, E. Bagheri, W. Lu, et al., A detailed analysis of the KDD CUP 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for Security and
Defense Applications, IEEE, 2009, pp. 1–6.
[32] I. Sharafaldin, A.H. Lashkari, S. Hakak, et al., Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy, in: 2019 International
Carnahan Conference on Security Technology (ICCST), IEEE, 2019, pp. 1–8.
17