
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 31, NO. 6, DECEMBER 2023

Graph-Tensor Neural Networks for Network Traffic Data Imputation

Lei Deng, Graduate Student Member, IEEE, Xiao-Yang Liu, Member, IEEE, Haifeng Zheng, Senior Member, IEEE, Xinxin Feng, Member, IEEE, and Zhizhang Chen, Fellow, IEEE

Abstract— It is important to estimate the global network traffic data from partial traffic measurements for many network management tasks, including status monitoring and fault detection. However, existing estimation approaches cannot well handle the topological correlations hidden in network traffic and suffer from limited imputation performance. This paper proposes a deep learning approach for network traffic imputation, which well exploits the topological structure of network traffic. We first model the network traffic as a novel graph-tensor and derive a theoretical recovery guarantee. Then we develop an iterative graph-tensor completion algorithm and propose a graph neural network for network traffic imputation by unfolding the iterative algorithm. The proposed graph neural network well captures the topological correlations of network traffic and achieves accurate imputation. Extensive experiments on real-world datasets show that the proposed graph neural network achieves about one-half lower relative square error and at least ten times faster imputation speed than the existing methods.

Index Terms— Graph-tensor, network traffic imputation, tensor completion, tensor neural network.

Manuscript received 29 October 2022; accepted 9 April 2023; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor S. K. Das. Date of publication 15 May 2023; date of current version 19 December 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 61971139 and Grant 62071125 and in part by the Natural Science Foundation of Fujian Province under Grant 2021J01576. (Corresponding authors: Haifeng Zheng; Zhizhang Chen.)

Lei Deng, Haifeng Zheng, and Xinxin Feng are with the Fujian Key Laboratory for Intelligent Processing and Wireless Transmission of Media Information, College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China (e-mail: [email protected]; [email protected]; [email protected]).

Xiao-Yang Liu is with the Department of Electrical Engineering, Columbia University, New York, NY 10027 USA (e-mail: [email protected]).

Zhizhang Chen is with the College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China, on leave from the Department of Electrical and Computer Engineering, Dalhousie University, Halifax, NS B3H 4R2, Canada (e-mail: [email protected]).

This article has supplementary downloadable material available at https://doi.org/10.1109/TNET.2023.3268982, provided by the authors.

Digital Object Identifier 10.1109/TNET.2023.3268982

I. INTRODUCTION

NETWORK traffic information is important for many network management tasks, such as network status monitoring, network fault detection, and network security management. In large-scale networks, it is costly to collect global traffic data by measuring all transmission pairs directly [1]. To reduce the sampling cost, a subset of origin and destination (OD) pairs is usually selected for measurement, and then the global traffic data can be estimated [1], [2], [3].

The major challenge for network traffic estimation is to capture the spatial/topological correlations hidden in network traffic. Topological structures naturally exist in computer networks and network traffic [4], [5], and have been proven to be helpful for network traffic estimation [6]. But unfortunately, they are implicit and unknown. Specifically, the traffic flows collected from different OD pairs can be strongly correlated [6], but they do not have explicit and clear quantitative relations. It is desirable to make full use of the implicit topological information for network traffic imputation.

Previous works on network traffic estimation have limitations in capturing the topological correlations of network traffic and result in unreliable imputation performance. Existing works mainly fall into three categories: matrix completion (MC), tensor completion (TC), and neural network methods. MC mainly utilizes the low-rank property for traffic matrix imputation [7], [8]. TC is an extension of MC to higher dimensions. It exploits the multilinear structure of traffic data [6], [9] through various tensor decomposition strategies such as CANDECOMP/PARAFAC (CP) [10] and Tucker [11]. However, existing MC and TC based algorithms, built on low-rank linear decompositions, may be less effective in exploiting the topological structure. Neural network based methods capture the nonlinear correlations of network traffic to improve imputation accuracy [12], [13]. These approaches may also ignore the topological information of OD flows and provide unsatisfactory imputation performance. Therefore, it is important to develop a new approach that handles the topological structure in network traffic to enhance the performance of imputation.

In this paper, we propose a deep learning based approach that well exploits the topological structure for network traffic imputation. In particular, since the topological information of network traffic is unknown, graph neural networks can be applied to extract these implicit topological correlations by their powerful learning capability. With this technique, the unknown topological structure can be automatically learned and utilized for efficient network traffic imputation.

Our contributions are summarized as follows.
• We model the network traffic data as a novel low-tubal-rank graph-tensor using the notion of Graph-Tensor SVD (GT-SVD) [14], [15], and then formulate the imputation of network traffic as a graph-tensor completion problem. We also provide a recovery guarantee to show that the unobserved data is theoretically recoverable under the GT-SVD framework.

• We propose a graph-tensor neural network (called GT-NET) for network traffic imputation by unfolding a well-designed iterative graph-tensor completion algorithm. GT-NET captures the topological correlations of OD flows and exploits the low-rankness of network traffic data in a learned graph spectral domain. It achieves nearly real-time imputation and enjoys a potential mathematical interpretation as a byproduct.
• We conduct extensive experiments on two real-world network traffic datasets. Experimental results show that the proposed GT-NET achieves about one-half lower relative square error (RSE) than the competitive algorithms. At the same time, GT-NET is at least ten times faster than the existing methods while keeping high imputation accuracy.

The remainder of this paper is organized as follows. In Section II, we review the related works on network traffic imputation. In Section III, we present the system model and the problem formulation. In Section IV, we develop the recovery guarantee and propose an iterative graph-tensor completion (TGTC) algorithm. In Section V, we describe how to unfold the iterative TGTC algorithm into a graph-tensor neural network. In Section VI, we evaluate the performances. Finally, we conclude our work in Section VII.

II. RELATED WORKS

We review existing works on network traffic imputation and point out the differences between our work and previous works. Existing works on network traffic imputation can be mainly classified into matrix completion based methods, tensor completion based methods, and neural network based methods.

A. Matrix Completion Based Methods

Matrix completion (MC) has been extensively exploited for network traffic imputation [7], [8], [16]. For example, Sparsity Regularized Matrix Factorization (SRMF) [7] applies K-nearest neighbors (KNN) to characterize the similarities of traffic volumes from different OD pairs, and constructs a spatial constraint matrix to improve the accuracy. But the correlations of traffic volumes from different OD pairs cannot be well exploited and the imputation performance is limited. Xie et al. [1] took samples of a data matrix adaptively to improve the sampling efficiency in network monitoring systems. Xiao et al. [17] developed an anomaly-tolerant imputation approach for simultaneously imputing network traffic and detecting network anomalies.

B. Tensor Completion Based Methods

Tensor completion (TC) based methods are considered to be more effective in capturing multi-dimensional correlations [18], [19], [20], and they are more applicable for network traffic imputation. TC based methods are developed mainly by using various tensor decomposition strategies, such as the CP [10] and Tucker [11] decompositions, which have also been widely used for network traffic imputation [6], [9], [20], [21]. For example, Zhou et al. [9] utilized CP decomposition to construct regularization for accurate Internet traffic data imputation. CP decomposition can also be exploited to recover skewed network monitoring data [22]. To reduce the computational complexity, the approach presented in [6] efficiently utilizes the factorization result of the previous data to accelerate the tensor factorization of the current traffic data. However, existing MC and TC based methods may be less effective in capturing the topological correlations of OD flows through low-rank linear decompositions, and they suffer from limited network traffic imputation performance.

C. Neural Network Based Methods

Neural network based methods have also been explored for network traffic analysis. Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) are demonstrated to be efficient for network traffic monitoring and analysis [23], [24]. But these approaches mostly focus on network traffic prediction instead of imputation.

There are a limited number of neural network based approaches for network traffic imputation [12], [13], [25], [26]. For example, Jiang et al. [12] decomposed the end-to-end traffic into low-frequency and high-frequency components. The low-frequency component is represented by an auto-regressive model, whereas the high-frequency component of the end-to-end traffic is reconstructed by a backpropagation neural network. Xie et al. [13] proposed a Neural Tensor Completion (NTC) scheme for network traffic imputation, which employs 3D CNNs to extract the hidden correlations of traffic data and utilizes the sampled entries of a tensor for training.

However, the above methods might ignore the underlying topological information of OD flows, and their imputation performance is limited. Besides, these methods lack potential mathematical interpretations.

D. Model-Based Neural Networks via Deep Unfolding

Model-based neural networks are developed by unfolding the iterations of an algorithm into deep neural networks [27] (referred to as deep unfolding). They incorporate the advantages of deep learning methods and model-based methods. Besides, deep unfolding enjoys natural mathematical interpretations. Model-based neural networks have been widely applied in signal processing [28] and computer vision [29], [30], [31], and achieve encouraging performance.

We attempt to introduce deep unfolding to network traffic monitoring. Specifically, we propose a graph neural network (GT-NET) for imputing the unknown/unmeasured network traffic, the structure of which is inferred by unfolding a well-designed iterative graph-tensor completion algorithm (TGTC). GT-NET learns the topological information of network traffic efficiently and is capable of achieving real-time traffic data imputation.

It is worth mentioning that although some emerging network telemetry technologies [32], [33] can also achieve effective network state monitoring, they are different from our approach. Our proposed approach is based on traditional flow-level traffic measurement frameworks like NetFlow [34] and OpenTM [35], while network telemetry does not rely on


Fig. 1. Illustrations of the Abilene network and the graph-tensor model. There are 12 principal nodes and 144 OD pairs. Each OD pair forms a data matrix
that represents its traffic measurements for a period of time. (a) Left: The network topology, wherein a subset of OD pairs are unmeasured. Correspondingly,
a subset of data matrices are unobserved/unknown. (b) Right: The traffic graph-tensor by stacking the data matrices.

these frameworks. Hence network telemetry based approaches are applicable in a different context from our approach and may also lead to high measurement costs [33]. The proposed GT-NET would be more applicable for network traffic monitoring under a limited sampling budget.

III. SYSTEM MODEL AND PROBLEM FORMULATION

We first introduce the basic notations and definitions. Then, we elaborate on the system model and the problem formulation.

A. Notations

Throughout the paper, calligraphic letters like A are used to represent third-order tensors, boldface capital letters like A for matrices, and boldface lowercase letters like a for vectors. For convenience, the k-th frontal slice of a tensor A is denoted as A^(k), i.e., A^(k) = A(:, :, k).

We begin by introducing graph-tensors, which have advantages in representing the topological structure of multi-dimensional data. A tensor A ∈ R^{n1×n2×n3} can be regarded as a graph-tensor if topological correlations exist in the third dimension. An illustration of the graph-tensor model can be seen in Fig. 2. The topological information is usually portrayed by a Laplacian matrix L = D − A ∈ R^{n3×n3}, where A is the adjacency matrix, D is the diagonal degree matrix, and D(i, i) = Σ_j A_ij for i ∈ [n3].

Ã is the tensor derived by performing the graph Fourier transform (GFT) along the third dimension of the graph-tensor A. Specifically,

    Ã = GFT(A) = A ×3 Ψ*,   (1)

and the inverse operation is

    A = IGFT(Ã) = Ã ×3 Ψ,   (2)

where ×3 is the mode-3 product [36] and Ψ* is the Hermitian of Ψ, which can be obtained from the eigendecomposition ΨΛΨ* = L. We now define the following operations:

    unfold(A) = [A^(1); A^(2); · · · ; A^(n3)] ∈ R^{n1 n3 × n2}   (3)

and fold(unfold(A)) = A. We also define the block-diagonal matrix Ā ∈ C^{n1 n3 × n2 n3} as

    Ā = bdiag(Ã) = diag(Ã^(1), Ã^(2), . . . , Ã^(n3)).   (4)
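To make the transforms above concrete, the following numpy sketch implements (1)-(3) as mode-3 products with the Laplacian eigenvector matrix Ψ. It is a minimal illustration under the assumption of a symmetric Laplacian (so that Ψ is real and orthonormal); the helper names are ours, not from the authors' released code.

    import numpy as np

    def gft_operators(L):
        """Return Psi from the eigendecomposition Psi Lambda Psi^* = L."""
        _, Psi = np.linalg.eigh(L)          # Hermitian eigendecomposition
        return Psi

    def gft(A, Psi):
        """GFT(A) = A x_3 Psi^*: transform along the third (graph) dimension."""
        # mode-3 product: contract the third index of A with the rows of Psi^*
        return np.einsum('ijk,lk->ijl', A, Psi.conj().T)

    def igft(A_tilde, Psi):
        """IGFT(A~) = A~ x_3 Psi, the inverse transform."""
        return np.einsum('ijk,lk->ijl', A_tilde, Psi)

    # toy check on a random graph-tensor
    n1, n2, n3 = 4, 5, 6
    W = np.random.rand(n3, n3); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
    L = np.diag(W.sum(axis=1)) - W          # Laplacian L = D - A
    Psi = gft_operators(L)
    A = np.random.rand(n1, n2, n3)
    assert np.allclose(igft(gft(A, Psi), Psi), A)   # Psi is orthonormal

The round-trip assertion holds because eigh returns an orthonormal eigenbasis for a symmetric L.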
B. System Model

Network traffic measurements considered in this work are collected by flow-level measurement techniques like NetFlow [34] and OpenTM [35]. With these techniques, the data packets from the same OD pair can be aggregated into one flow to obtain the traffic statistics. Assume that a network contains n3 OD pairs and the network traffic volumes of the OD pairs are collected n1 times per hour. Then the traffic volume data of all OD pairs within n2 hours can be formed as a traffic tensor X ∈ R^{n1×n2×n3}. Each frontal slice X(:, :, k) represents the traffic volumes of the k-th OD pair over a period of time. We take the Abilene network [25] as an example to illustrate the traffic tensor in Fig. 1.

There exist underlying topological correlations between the OD flows. In other words, the traffic volumes of an OD pair can be related to the traffic of other OD pairs [5], [6]. The traffic tensor X can therefore be regarded as a graph-tensor. Note that although the first dimension and the second dimension of the constructed graph-tensor are set to "time interval" and "hour" here, they can be reasonably changed (e.g., to "day" or "week") according to practical requirements.

The spatial correlations of traffic data can be caused by complicated user/application behaviors [37]. For example, due to a flash crowd effect, the traffic to a single location will be increased [5]. On the other hand, users from different areas may have similar daily Internet behaviours [38], which may result in a similar OD flow tendency. It is worth mentioning that the topological correlations of OD pairs naturally exist in the computer network. They may be implicit and unknown because the traffic volumes from different OD pairs do not have exact quantitative relations. We use a Laplacian matrix L to represent the underlying topological correlations of OD pairs.


C. Problem Formulation

1) Target: The aim of this work is to estimate the global network traffic data from partially observed traffic measurements. Since it is costly to collect global traffic data by measuring all transmission pairs directly, a subset of OD pairs is usually selected for network measurements to reduce the sampling cost, while the unobserved data can be estimated [1], [2]. Although one could select different OD pairs at each time interval, doing so may cause high configuration cost since the controller needs to frequently send control messages to the nodes in the network. Hence we simply assume that the traffic data of the selected OD pairs are recorded for a period of time, and refer to such a sampling pattern as the OD-pair sampling pattern (an example is illustrated in Fig. 1). We focus on imputing the unobserved traffic data accurately under such an OD-pair sampling pattern.

Suppose that the sampled OD pairs are selected randomly and define a graph-tensor X ∈ R^{n1×n2×n3} to model the network traffic data; then the traffic imputation task becomes a tensor completion problem from randomly measured/observed frontal slices. Fig. 1(a) illustrates a network under the OD-pair sampling pattern. Correspondingly, in Fig. 1(b), some unobserved frontal slices of the traffic tensor need to be imputed.

2) Limitations of traditional tensor decompositions: Traditional tensor decomposition frameworks like the Tucker [11] and CP [10] decompositions are inapplicable for solving the above completion problem. For example, suppose that the k-th frontal slice X^(k) is unobserved; the CP decomposition can be written as

    X = Σ_{i=1}^{R} a_i ∘ b_i ∘ c_i,   (5)
    X^(k) = Σ_{i=1}^{R} (a_i ∘ b_i) · c_ik,   (6)

where ∘ represents the outer product, R is the CP-rank, a_i ∈ R^{n1}, b_i ∈ R^{n2}, c_i ∈ R^{n3}, and c_ik is the k-th element of c_i. Then the solution for c_i would be non-unique (infinitely many) if the k-th frontal slice X^(k) is missing. Thus the CP decomposition model becomes inapplicable. Similarly, the Tucker decomposition is also inapplicable.

3) Graph-tensor SVD framework: We apply the graph-tensor SVD (GT-SVD) to solve the graph-tensor completion problem. Since the traditional tensor operations [39] are inapplicable for graph-tensors, we introduce the necessary definitions.

Definition 1 (Graph-Tensor Product [14], [36]): The graph-tensor product between A ∈ R^{n1×n2×n3} and B ∈ R^{n2×n4×n3} is defined as

    C = A ∗g B = (Ã △ B̃) ×3 Ψ,   (7)

where ∗g is the graph-tensor product and C ∈ R^{n1×n4×n3}. △ is the facewise product [36], which can be computed by multiplying the frontal slices of the two tensors independently.

The graph-tensor product can also be represented as

    unfold(A ∗g B) = Ă · unfold(B),   (8)

where Ă = (Ψ ⊗ I_{n1}) · Ā · (Ψ* ⊗ I_{n2}) ∈ R^{n1 n3 × n2 n3}, I_{n1} is the identity matrix of size n1 × n1, and ⊗ denotes the Kronecker product.

Definition 2 (GT-SVD [14], [36]): The GT-SVD of a graph-tensor A ∈ R^{n1×n2×n3} is given by

    A = U ∗g S ∗g V⊤,   (9)

where U ∈ R^{n1×n1×n3} and V ∈ R^{n2×n2×n3} are orthogonal tensors, and S ∈ R^{n1×n2×n3} is an f-diagonal tensor.

An illustration of GT-SVD can be found in Fig. 2. Under the GT-SVD framework, the graph-tensor nuclear norm (gTNN) [40] of A is defined as ∥A∥gTNN = ∥Ā∥∗.

Definition 3 (Graph-Tensor Tubal-Rank and Multi-Rank [14]): We use rankg(A) to represent the tubal-rank of the graph-tensor A, which is defined as the number of non-zero singular tubes of S. The multi-rank of the graph-tensor A is denoted as rankm(A) = r, where r is a vector and r_i = rank(Ã^(i)) for i ∈ [n3]. Then rankg(A) = max(r_1, · · · , r_{n3}).

Definition 4 (Graph-Tensor Spectral Norm): For a graph-tensor A, its spectral norm ∥A∥ is defined by

    ∥A∥ = ∥Ā∥ = max_i δ_i(Ā),   (10)

where δ_i(Ā) is the i-th largest singular value of Ā.

Let M ∈ R^{n1×n2×n3} be the incomplete traffic graph-tensor, Ω be the set of observed entries, and PΩ be the sampling operator; then PΩ(X) = PΩ(M). More specifically,

    M(i, j, k) = X(i, j, k) if (i, j, k) ∈ Ω, and 0 otherwise.   (11)

To impute the unobserved network traffic data, we formulate the traffic data imputation problem as a tensor completion problem by minimizing the graph-tensor tubal-rank, i.e.,

    min_X rankg(X), s.t. PΩ(X) = PΩ(M).   (12)

However, rank minimization is NP-hard, so one can consider its relaxation. Similar to the matrix case, where the matrix nuclear norm is the tightest convex relaxation of the matrix rank, the graph-tensor nuclear norm (gTNN) is the convex envelope of the ℓ1 norm of the graph-tensor multi-rank. The following theorem indicates that gTNN can be regarded as the convex relaxation of the graph-tensor tubal-rank. Then (12) can be relaxed to

    min_X ∥X∥gTNN, s.t. PΩ(X) = PΩ(M).   (13)

Theorem 1: Given a graph-tensor X ∈ R^{n1×n2×n3} with multi-rank rankm(X), the graph-tensor nuclear norm ∥X∥gTNN is the convex envelope of the ℓ1 norm of the graph-tensor multi-rank on the set {X | ∥X∥ ≤ 1}.

We present the proof of Theorem 1 in the supplementary material; it can be derived by following [19]. Our main idea is to recover the graph-tensor from randomly measured/observed OD pairs (frontal slices) by solving (13). For this purpose, we address the following critical problems: (i) Are the traffic data of the unobserved OD pairs (frontal slices) in a traffic graph-tensor theoretically recoverable? (ii) How many OD pairs (frontal slices) should be measured for approximate recovery? (iii) Given a subset of samples Ω, how can the traffic tensor be recovered accurately?


Fig. 2. Illustrations of the graph-tensor and GT-SVD.

IV. TEMPORALLY REGULARIZED GRAPH-TENSOR COMPLETION FOR TRAFFIC IMPUTATION

A. Recovery Guarantee

Before elaborating on the temporally regularized graph-tensor completion (TGTC) algorithm, we first discuss the recovery guarantee. Since this work mainly focuses on estimating the unobserved frontal slices in the traffic graph-tensor, let us consider the following theoretical problem: for a graph-tensor X ∈ R^{n1×n2×n3}, what conditions on the sampled frontal slices and the tubal-rank rankg(X) are required to guarantee exact tensor recovery with high probability? To find the answer, we first define some notations. Let ω be the set of indices of the randomly observed frontal slices. The number of observed frontal slices |ω| satisfies |Ω| = n1 n2 |ω|, where Ω is the set of all observed entries. For convenience of discussion, we assume that n1 = n2 = n.

We begin by introducing the notion of graph-tensor incoherence, a prerequisite for graph-tensor completion from randomly observed frontal slices.

Definition 5 (Graph-tensor incoherence condition): Suppose a graph-tensor X ∈ R^{n×n×n3} with tubal-rank r (r ≤ n) can be decomposed into X = A ∗g B, where A ∈ R^{n×r×n3} and B ∈ R^{r×n×n3}. Further, define F ∈ R^{n×n×n3}, where F^(k) = [A^(k), 0] ∈ R^{n×n} for k ∈ [n3]. Let F̆ = (Ψ* ⊗ I_n) · F̄ · (Ψ ⊗ I_n); then X satisfies the incoherence condition if there exists a µ0 > 0 such that

    max_{i,j} |F̆(i, j)| ≤ µ0.   (14)

The graph-tensor incoherence condition is the extension of vector incoherence [41]. For lower µ0, the sampling is more effective and fewer samples are required for recovery. Then we have the following result.

Theorem 2: Suppose that X ∈ R^{n×n×n3} is a graph-tensor with tubal-rank r, and ω is the set of randomly measured frontal slices of X. Suppose X satisfies the incoherence condition (14) with parameter µ0. If

    |ω| ≥ C0 (r n3 µ0² / n) log(n n3 / δ)   (15)

for some constants C0 and δ, then X can be recovered with high probability by solving the tightest convex relaxation of (12).

Theorem 2 is derived by following compressive sensing theory [42], [43], which establishes that a vector can be recovered from randomly selected entries under vector incoherence conditions. Theorem 2 shows that the unobserved frontal slices of a low-tubal-rank graph-tensor are theoretically recoverable if the number of observed frontal slices |ω| reaches the sampling lower bound. In other words, under the GT-SVD framework, imputing the traffic tensor under the OD-pair sampling pattern is theoretically feasible. The proof of Theorem 2 can be found in the supplementary material.

As mentioned in Section III-C, unobserved frontal slices are hard to recover under traditional tensor decomposition frameworks like the Tucker [11] and CP [10] decompositions. Hence Theorem 2 indicates that GT-SVD enjoys theoretical advantages for frontal slice recovery compared with the conventional tensor decomposition frameworks. Since the recovery guarantee under the GT-SVD framework can be provided, we focus on traffic graph-tensor completion.

B. Temporal Regularization

We develop temporal regularization for mining the underlying temporal correlations of network traffic data. In particular, inspired by [44], we construct two Laplacian tensors L1 ∈ R^{n1×n1×n3} and L2 ∈ R^{n2×n2×n3} to depict the correlations of traffic volumes at different time intervals and hours, respectively. Each frontal slice L1^(k) (L2^(k)) is the Laplacian matrix that connects different rows (columns) of X^(k). Then (13) can be rewritten by incorporating the temporal regularization:

    min_X ∥X∥gTNN + λ1 Σ_{k=1}^{n3} tr((X^(k))⊤ L1^(k) X^(k)) + λ2 Σ_{k=1}^{n3} tr(X^(k) L2^(k) (X^(k))⊤),
    s.t. PΩ(X) = PΩ(M),   (16)

where tr(·) denotes the trace of a matrix, and λ1 and λ2 are adjustment parameters. There are many ways to derive L1 and L2. For example, each frontal slice L1^(k) (L2^(k)) can be constructed by simply applying KNN and linear regression to the rows (columns) [7].
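The paper builds L1 and L2 with KNN and linear regression following [7]. As a simpler, hedged stand-in, the sketch below constructs a chain-graph Laplacian that links consecutive time steps, for which the regularizer tr((X^(k))⊤ L1^(k) X^(k)) reduces to the sum of squared differences between adjacent rows:

    import numpy as np

    def chain_laplacian(n):
        """Laplacian of a chain graph connecting step t with step t+1."""
        W = np.zeros((n, n))
        idx = np.arange(n - 1)
        W[idx, idx + 1] = W[idx + 1, idx] = 1.0
        return np.diag(W.sum(axis=1)) - W

    L1_slice = chain_laplacian(12)   # rows: time intervals within an hour
    L2_slice = chain_laplacian(24)   # columns: hours within a day
    # stacking one such slice per OD pair yields the Laplacian tensors L1 and L2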
It is worth mentioning that the Laplacian tensors L1 and L2 are designed for temporal regularization. L1 (L2) is significantly different from the Laplacian matrix L that represents the topological correlations of OD pairs. For clarity, TABLE I summarizes the key notations used in this paper.


TABLE I
SUMMARY OF KEY NOTATIONS USED IN THE PAPER

C. Optimization Solution

In order to solve (16), we introduce an intermediate variable Z; then (16) becomes

    min_X ∥Z∥gTNN + λ1 Σ_{k=1}^{n3} tr((X^(k))⊤ L1^(k) X^(k)) + λ2 Σ_{k=1}^{n3} tr(X^(k) L2^(k) (X^(k))⊤),
    s.t. X = Z, PΩ(X) = PΩ(M).   (17)

We apply the alternating direction method of multipliers (ADMM) to solve (17). Firstly, the Lagrangian function of (17) is given by

    L(X, Z, Y) = ∥Z∥gTNN + λ1 Σ_{k=1}^{n3} tr((X^(k))⊤ L1^(k) X^(k)) + λ2 Σ_{k=1}^{n3} tr(X^(k) L2^(k) (X^(k))⊤) + ⟨Y, X − Z⟩ + (ρ/2) ∥X − Z∥F²,
    s.t. PΩ(X) = PΩ(M),   (18)

where Y is the Lagrange multiplier.

The variables X, Z, and Y are updated alternately and iteratively. Then we obtain the following recursion.

1) Updating X: X can be updated by

    X_t = arg min_X { λ1 Σ_{k=1}^{n3} tr((X^(k))⊤ L1^(k) X^(k)) + λ2 Σ_{k=1}^{n3} tr(X^(k) L2^(k) (X^(k))⊤) + (ρ/2) ∥X − (Z_{t−1} − Y_{t−1}/ρ)∥F² },
    s.t. PΩ(X) = PΩ(M),   (19)

where ρ is the learning rate. We note that the n3 frontal slices of X can be updated independently and in parallel. Since problem (19) has a closed-form solution [44], for k ∈ [n3] we have

    X_t^(k) = (I + (2λ1/ρ) L1^(k))^{−1} (Z_{t−1}^(k) − Y_{t−1}^(k)/ρ) (I + (2λ2/ρ) L2^(k))^{−1},
    s.t. PΩ(X) = PΩ(M),   (20)

where I is an identity matrix of the proper size. We use PΩ⊥ to denote the complement of PΩ; then the above equation can be further written as

    X_t = PΩ⊥(A_t) + PΩ(M),   (21)

where A_t is an intermediate tensor, with, for k ∈ [n3],

    A_t^(k) = (I + (2λ1/ρ) L1^(k))^{−1} (Z_{t−1}^(k) − Y_{t−1}^(k)/ρ) (I + (2λ2/ρ) L2^(k))^{−1}.   (22)

2) Updating Z: Z can be updated by

    Z_t = arg min_Z { ∥Z∥gTNN + (ρ/2) ∥Z − (X_t + Y_{t−1}/ρ)∥F² },   (23)

and (23) has a closed-form solution obtained by graph-tensor singular value thresholding (as given in Lemma 1).

Definition 6: The element-wise soft-thresholding operation soft(·) is given by

    soft(Y_ij, γ) = Y_ij + γ if Y_ij ≤ −γ; 0 if |Y_ij| ≤ γ; Y_ij − γ if Y_ij ≥ γ.   (24)

Lemma 1: Suppose that the GT-SVD of a graph-tensor R ∈ R^{n1×n2×n3} is R = U ∗g S ∗g V⊤. The problem

    arg min_Z { τ ∥Z∥gTNN + (1/2) ∥Z − R∥F² }   (25)

has the closed-form solution

    Z̃^(k) = D_τ(R̃^(k)) = Ũ^(k) soft(S̃^(k), τ) (Ṽ^(k))⊤, k ∈ [n3],   (26)

where D_τ(·) is the graph-tensor singular value thresholding operation.

Then (23) can be solved by using Lemma 1, i.e., letting R_t = X_t + Y_{t−1}/ρ, we have

    Z̃_t^(k) = D_{1/ρ}(R̃_t^(k)), k ∈ [n3].   (27)

3) Updating Y: Y can be updated by

    Y_t = Y_{t−1} + ρ(X_t − Z_t).   (28)

Therefore, we obtain the temporally regularized graph-tensor completion (TGTC) algorithm given in Alg. 1. The proposed TGTC algorithm adopts ADMM to decompose the optimization problem (17) into several solvable sub-problems, where each sub-problem has a closed-form solution. Since the convergence of ADMM has been proven [45], TGTC converges as well.
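For concreteness, the following numpy sketch performs one full TGTC iteration: the X update (20)-(22), the Z update via singular value thresholding in the GFT domain (23)-(27), and the Y update (28). It reuses the gft/igft helpers sketched in Section III and is an illustration of the recursion, not the authors' implementation; L1s and L2s hold the frontal slices of the temporal Laplacian tensors.

    import numpy as np

    def soft(Y, gamma):
        """Element-wise soft-thresholding, Definition 6."""
        return np.sign(Y) * np.maximum(np.abs(Y) - gamma, 0.0)

    def tgtc_step(Z, Y, M, mask, L1s, L2s, Psi, lam1=0.01, lam2=0.01, rho=0.1):
        n1, n2, n3 = M.shape
        # --- X update, (20)-(22): per-slice closed form ---
        A = np.zeros_like(M)
        for k in range(n3):
            left = np.linalg.inv(np.eye(n1) + (2 * lam1 / rho) * L1s[k])
            right = np.linalg.inv(np.eye(n2) + (2 * lam2 / rho) * L2s[k])
            A[:, :, k] = left @ (Z[:, :, k] - Y[:, :, k] / rho) @ right
        X = np.where(mask, M, A)            # (21): keep the observed entries
        # --- Z update, (23)-(27): SVT in the graph spectral domain ---
        R_hat = gft(X + Y / rho, Psi)
        Z_hat = np.zeros_like(R_hat)
        for k in range(n3):
            U, s, Vt = np.linalg.svd(R_hat[:, :, k], full_matrices=False)
            Z_hat[:, :, k] = (U * soft(s, 1.0 / rho)) @ Vt
        Z_new = igft(Z_hat, Psi)
        # --- Y update, (28) ---
        return X, Z_new, Y + rho * (X - Z_new)

Iterating tgtc_step from Z = Y = 0 until ∥X_t − X_{t−1}∥∞ ≤ ϵ reproduces the loop of Alg. 1 below.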


Algorithm 1 Temporally Regularized Graph-Tensor Completion Algorithm (TGTC)
Input: PΩ(M): the partially observed tensor; L1, L2: temporal Laplacian tensors; L ∈ R^{n3×n3}: Laplacian matrix for OD pairs; λ1, λ2, ρ, ϵ: hyperparameters; Maxiter: maximum number of iterations.
Output: Xest: the recovered tensor.
1  Initialize: X0 = Z0 = Y0 = 0, t = 1.
2  while NOT converged and t ≤ Maxiter do
3    Update X by (21):
4    for k = 1 : n3 do
5      A_t^(k) = (I + (2λ1/ρ) L1^(k))^{−1} (Z_{t−1}^(k) − Y_{t−1}^(k)/ρ) (I + (2λ2/ρ) L2^(k))^{−1},
6    end
7    X_t = PΩ⊥(A_t) + PΩ(M).
8    Check the convergence condition: ∥X_t − X_{t−1}∥∞ ≤ ϵ.
9    Update Z by (27):
10   R_t = X_t + Y_{t−1}/ρ,
11   R̃_t = GFT(R_t),
12   for k = 1 : n3 do
13     [U, S, V] = svd(R̃_t^(k)),
14     H_t^(k) = U soft(S, 1/ρ) V⊤,
15   end
16   Z_t = IGFT(H_t).
17   Update Y by (28):
18   Y_t = Y_{t−1} + ρ(X_t − Z_t).
19   t = t + 1.
20 end
21 return Xest = X_t.

Algorithm 2 Training the Graph-Tensor Neural Network (GT-NET)
Input: PΩ(M): the partially observed tensor; X: the recovered tensor label; T: phase number; MaxEpoch: maximum number of epochs.
Output: θ: the trained parameters.
1  Initialize: Initialize trainable parameters θ, Z0 = Y0 = 0.
2  while NOT converged and epoch ≤ MaxEpoch do
3    for t = 1 : T do
4      Module Xt:
5        A_t = Conv2D(Conv2D(Z_{t−1} − Y_{t−1}/ρ)),
6        X_t = PΩ⊥(A_t) + PΩ(M).
7      Module Zt:
8        R_t = X_t + Y_{t−1}/ρ,
9        R̃_t = F(R_t) = Conv2D(ReLU(Conv2D(R_t))),
10       H_t = soft(R̃_t, 1/ρ),
11       Z_t = F^{−1}(H_t) = Conv2D(ReLU(Conv2D(H_t))).
12     Module Yt:
13       Y_t = Y_{t−1} + ρ(X_t − Z_t).
14   end
15   Update θ to minimize the Loss in (35) by the Adam optimizer.
16   epoch = epoch + 1.
17 end
18 return θ.

D. Computational Complexity

Now we discuss the computational complexity of the TGTC algorithm. It is easy to obtain that the cost of the GFT is O(n1 n2 n3²). The cost of the temporal regularization is O(n1² n2 n3 + n1 n2 n3²). Moreover, the cost of the SVD is O(min(n1² n2, n1 n2²)) for calculating the singular value soft-thresholding. The cost of each iteration is the sum of the costs of these parts, and the total cost is that multiplied by the total number of iterations.

E. Advantages and Limitations of the TGTC Model

The proposed TGTC algorithm enjoys two advantages. Firstly, TGTC models the network traffic as a graph-tensor and adopts the GFT to capture the topological correlations of OD pairs efficiently. Secondly, we develop temporal regularization to depict the temporal correlations of traffic data, which is expected to further improve the network traffic imputation accuracy.

Nevertheless, the parameter settings are the main pitfall we need to address in TGTC. For example, L is a key parameter that portrays the topological correlations of OD pairs. Unfortunately, these topological correlations are unknown. Hence it is difficult to artificially construct a parameter L that accurately depicts the topological connections of OD pairs, and an inaccurate L might degrade the performance of network traffic imputation. Besides, parameters like ρ, L1, and L2 generally need to be tuned for different applications, which is inflexible.

To solve the above problems, we propose a graph-tensor neural network to learn the correlations of OD pairs and the other hyperparameters. More specifically, a deep unfolding method can be considered as a manner of automating hyperparameter cross-validation [28], [30], in which the hyperparameters are learnable.

V. GRAPH-TENSOR NEURAL NETWORKS

We propose a graph-tensor neural network (GT-NET) for network traffic imputation. The basic idea is to map Alg. 1 into a deep neural network with a number of phases, wherein each phase corresponds to one iteration of Alg. 1. The structure of GT-NET is presented in Alg. 2, wherein each module in GT-NET corresponds to a variable update in the iteration of Alg. 1. An overview of GT-NET is shown in Fig. 3. GT-NET contains 3 modules in each phase: the modules Xt, Zt, and Yt.


Fig. 3. Overview of the proposed GT-NET. The whole neural network contains T phases, wherein the structure of one phase is presented on the bottom.
The arrows denote the data flow.

A. Design of Module Xt

This module maps lines 3-7 of Alg. 1 to lines 4-6 of Alg. 2. In line 5 of Alg. 1, the matrices (I + (2λ1/ρ)L1^(k))^{−1} and (I + (2λ2/ρ)L2^(k))^{−1} that contain the temporal correlations of the network traffic data are designed for regularization. One may use convolution kernels (layers) instead of matrix multiplications. In other words, the matrices (I + (2λ1/ρ)L1^(k))^{−1} and (I + (2λ2/ρ)L2^(k))^{−1} in line 5 of Alg. 1 can be replaced by convolution kernels C_t^k and D_t^k with appropriate sizes. Then the temporal correlations can be learned from the training data. The update of X can be written as

    A_t^(k) = C_t^k (Z_{t−1}^(k) − Y_{t−1}^(k)/ρ) D_t^k, k ∈ [n3],   (29)

and

    X_t = PΩ⊥(A_t) + PΩ(M).   (30)

For simplicity, (29) can be represented by

    A_t = C_t (Z_{t−1} − Y_{t−1}/ρ) D_t,   (31)

where C_t and D_t are 2D convolution layers with n3 channels, i.e., C_t^k and D_t^k correspond to the k-th channel of C_t and D_t, respectively. Then we rewrite A_t as

    A_t = Conv2D(Conv2D(Z_{t−1} − Y_{t−1}/ρ)).   (32)

Therefore, lines 3-7 of Alg. 1 become lines 4-6 of Alg. 2.

B. Design of Module Zt

This module maps lines 9-16 of Alg. 1 to lines 7-11 of Alg. 2. It mainly contains three submodules, i.e., the transform submodule, the soft-thresholding submodule, and the inverse transform submodule.

1) Transform submodule: In line 11 of Alg. 1, we use the graph Fourier transform to obtain a low-rank tensor defined in the graph spectral domain. Since the topological correlations of OD pairs are unknown, we deploy convolutional neural networks, which are demonstrated to be powerful in learning representations [31], [46], [47], to find a general transform domain. We design a general transform structure by using two 2D convolution layers with multiple channels and a ReLU activation layer, which can be represented by

    R̃_t = F(R_t) = Conv2D(ReLU(Conv2D(R_t))).   (33)

Hence line 11 of Alg. 1 becomes line 9 of Alg. 2. The ReLU activation is adopted for the purpose of learning a more general transform.

2) Soft-thresholding submodule: Considering that matrix factorization can be performed implicitly by a neural network [48], the SVD in line 13 of Alg. 1 is implemented by neural networks, and the component matrices U and V are implicitly contained in the transform submodule and the inverse transform submodule (the inverse transform submodule will be introduced next). Hence one can directly replace lines 12-15 of Alg. 1 with the element-wise soft-thresholding operator defined in Definition 6. The soft-thresholding parameter 1/ρ is set to be trainable rather than adopting


a pre-set value as in [49]. The output of the soft-thresholding submodule is denoted by H_t = soft(R̃_t, 1/ρ).

3) Inverse transform submodule: We design an inverse transform submodule F^{−1}(·), which corresponds to the operation in line 16 of Alg. 1. F^{−1}(·) acts as the inversion of the general transform, i.e., F^{−1}(F(X)) = X. The inverse transform submodule is designed with the structure symmetric to F(·). Then the output of the module Zt can be written as

    Z_t = F^{−1}(H_t) = Conv2D(ReLU(Conv2D(H_t))).   (34)

Therefore, lines 9-16 of Alg. 1 become lines 7-11 of Alg. 2.

C. Design of Module Yt

This module can be directly obtained by following line 18 of Alg. 1, but the learning rate ρ is set to be a learnable parameter.

D. Loss Function

Two constraints are considered in GT-NET:
• The fidelity of the recovered data.
• The accuracy of the inverse transform function.

Then the loss function is defined as

    Loss = α Lfidelity + β Linversion,   (35)

where

    Lfidelity = ∥PΩ⊥(X_T − X)∥F²,   (36)

and

    Linversion = (1/T) Σ_{t=1}^{T} ∥F^{−1}(F(R_t)) − R_t∥F².   (37)

Note that X_T is the recovered data in the last phase, X is the ground truth, and α and β are the trade-off parameters of the two loss terms. α and β are set to 1 and 0.1, respectively.

GT-NET is potentially interpretable since it is inferred by following the structure of the TGTC update steps in Alg. 1. All parameters in GT-NET are discriminatively learned instead of being hand-crafted or fixed. Each phase in GT-NET corresponds to one iteration of TGTC, and each module in GT-NET corresponds to a variable update in the iteration. Hence GT-NET enjoys the benefits of quick and accurate imputation with well-defined interpretability.
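Before turning to the experiments, the phase structure of Alg. 2 can be summarized in a few lines of Keras. The sketch below is a hedged illustration: the class name GTNetPhase, the kernel size, and the weight initializations are our assumptions rather than the released configuration. Inputs are batched tensors of shape (batch, n1, n2, n3) with OD pairs as channels, and mask holds 1 for observed entries.

    import tensorflow as tf

    class GTNetPhase(tf.keras.layers.Layer):
        """One unfolded TGTC iteration: the modules X_t, Z_t, and Y_t of Alg. 2."""
        def __init__(self, n3, kernel=3, **kwargs):
            super().__init__(**kwargs)
            conv = lambda: tf.keras.layers.Conv2D(n3, kernel, padding='same')
            self.c1, self.c2 = conv(), conv()   # module X_t: learned operators, (32)
            self.f1, self.f2 = conv(), conv()   # transform F(.), (33)
            self.g1, self.g2 = conv(), conv()   # inverse transform F^{-1}(.), (34)
            self.rho = self.add_weight(name='rho', shape=(),
                                       initializer=tf.keras.initializers.Constant(0.1))
            self.thr = self.add_weight(name='thr', shape=(),
                                       initializer=tf.keras.initializers.Constant(0.1))

        def call(self, Z, Y, M, mask):
            # module X_t: regularized estimate plus data consistency, (30)-(32)
            A = self.c2(self.c1(Z - Y / self.rho))
            X = mask * M + (1.0 - mask) * A
            # module Z_t: transform -> trainable soft-thresholding -> inverse transform
            R = X + Y / self.rho
            R_hat = self.f2(tf.nn.relu(self.f1(R)))
            H = tf.sign(R_hat) * tf.nn.relu(tf.abs(R_hat) - self.thr)   # soft(., thr)
            Z_new = self.g2(tf.nn.relu(self.g1(H)))
            # module Y_t: multiplier update, (28)
            return X, Z_new, Y + self.rho * (X - Z_new)

Stacking T such phases and minimizing (35) with the Adam optimizer reproduces the training loop of Alg. 2.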

VI. PERFORMANCE EVALUATION

We conduct experiments to evaluate the imputation performances of the proposed TGTC algorithm and GT-NET. Two cases are considered in the experiments: the topological structure is either known or unknown. Specifically, for datasets with known topological structures, we directly apply TGTC for imputation. For datasets with unknown topological structures, we apply GT-NET to learn the topological correlations and impute the unobserved network traffic data.

A. Performance Metric

To quantify the imputation performance, we adopt the relative square error (RSE) and the mean absolute error (MAE) as metrics. RSE is given by

    RSE = ∥X − Xest∥F / ∥X∥F,   (38)

where ∥·∥F is the tensor Frobenius norm, and X and Xest are the ground truth tensor and the estimated tensor, respectively. MAE is defined as

    MAE = (1/num) Σ_{i=1}^{num} |x̂_i − x_i|,   (39)

where x̂_i is the imputed value, x_i is the ground truth value, and num is the number of missing/unobserved values.
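Both metrics are one-liners in numpy; the missing_mask argument in the sketch below (our naming) marks the unobserved entries:

    import numpy as np

    def rse(X, X_est):
        """Relative square error, (38): ratio of Frobenius norms."""
        return np.linalg.norm(X - X_est) / np.linalg.norm(X)

    def mae(X, X_est, missing_mask):
        """Mean absolute error, (39), averaged over the unobserved entries only."""
        return np.abs(X_est - X)[missing_mask].mean()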
B. Simulations With a Synthetic Dataset

We evaluate the imputation performance of the TGTC algorithm on a synthetic graph-tensor with a given topological structure. We generate a low-tubal-rank graph-tensor C ∈ R^{L1×L2×L3} with tubal-rank r by (7), i.e., C = A ∗g B = (Ã △ B̃) ×3 Ψ. The topological structure of C is generated at random. Specifically, denote the corresponding Laplacian matrix for C by L = D − A, where A is generated by the MATLAB command rand(L3, L3) but with the diagonal elements set to 0, and D(i, i) = Σ_j A_ij for i ∈ [L3]. Then ΨΛΨ* = L. A is generated by the MATLAB command rand(L1, r, L3) and B is generated by rand(r, L2, L3). We set L1 = L2 = L3 = 20 and r = 15. We thus obtain a graph-tensor C ∈ R^{20×20×20} with tubal-rank r = 15.
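The construction can be reproduced in numpy as follows (a rendering of the MATLAB commands above, reusing the earlier gft/igft helpers; we symmetrize the random adjacency matrix so that the eigenbasis is orthonormal, which is our assumption rather than a step stated in the paper):

    import numpy as np

    L1, L2, L3, r = 20, 20, 20, 15
    W = np.random.rand(L3, L3)
    W = (W + W.T) / 2                       # symmetrize (our assumption)
    np.fill_diagonal(W, 0)
    Lap = np.diag(W.sum(axis=1)) - W        # random Laplacian for the graph mode
    Psi = gft_operators(Lap)
    A = np.random.rand(L1, r, L3)
    B = np.random.rand(r, L2, L3)
    A_hat, B_hat = gft(A, Psi), gft(B, Psi)
    C_hat = np.einsum('irk,rjk->ijk', A_hat, B_hat)   # facewise product, (7)
    C = igft(C_hat, Psi)     # a 20x20x20 graph-tensor with tubal-rank <= 15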
We randomly drop some frontal slices from C to evaluate the imputation performance of TGTC, and the sampling rate varies from 0.1 to 0.8. Since the temporal regularization of TGTC is unnecessary for the synthetic graph-tensor, we set λ1 = λ2 = 0 here. Three existing tensor completion approaches are considered for comparison: tensor-nuclear-norm minimization using the alternating direction method of multipliers (TNN-ADMM) [50], CP-ALS [51], [52], and Tucker-ALS [51], [52].

Fig. 4 plots the RSEs of the different algorithms. It can be seen that the traditional CP and Tucker decomposition models (i.e., CP-ALS and Tucker-ALS) achieve limited performance. In contrast, TGTC performs better than the other algorithms since the topological correlations of the graph-tensor can be well extracted. The experimental results demonstrate that TGTC is effective in imputing a graph-tensor with unobserved frontal slices.

C. Experiment Setup for Real Datasets

For real network traffic datasets, the topological structures might be unknown. We mainly focus on evaluating the imputation performance of GT-NET under the OD-pair sampling pattern.

1) Datasets: We conduct experiments to justify the effectiveness of the proposed GT-NET by using two large-scale IP network traffic datasets, i.e., the Abilene dataset [25] and the GÉANT dataset [53]. The Abilene dataset contains end-to-end traffic of 12 nodes (144 OD pairs) over 168 days, and the


TABLE II
RSE COMPARISONS OF TGTC AND GT-NET ON THE ABILENE DATASET

TABLE III
RSE COMPARISONS OF TGTC AND GT-NET ON THE GÉANT DATASET

Fig. 4. RSE comparisons for the synthetic graph-tensor.

traffic data is collected every 5 minutes. The GÉANT dataset collects end-to-end traffic volumes of 23 nodes from January 1st, 2005 to the end of April 2005, and the traffic data is collected every 15 minutes.

2) Data preprocessing: We regard the traffic data in one day as a unit, and each unit is modeled as a graph-tensor. We take the Abilene dataset as an example. The Abilene traffic data in one day can be constructed as a graph-tensor of size 12 × 24 × 144 (i.e., 12 time intervals × 24 hours × 144 OD pairs). Since there are 168 days to consider, one obtains 168 graph-tensors (units), where 80% of the graph-tensors are used for training and the rest for testing. We randomly drop some frontal slices from each graph-tensor (unit) to imitate the OD-pair sampling pattern. To make comprehensive comparisons, we vary the sampling rate from 0.1 to 0.8. The GÉANT dataset is preprocessed similarly. The Abilene and GÉANT datasets have been previously examined by [7] and [22].

We build the neural network under the TensorFlow framework and conduct experiments on a computer with an Intel(R) Core(TM) i7-8700 3.20GHz CPU. The Adam optimizer is adopted, while the learning rate, batch size, and number of running epochs are set to 10e-4, 1, and 800, respectively. Our source codes are available online at https://github.com/summerdenglei/graph-tensor-neural-network.
available online.1 F(·).
Figs. 5 and 6 present the RSEs of GT-NET for different
D. GT-NET With Different Neural Network Structures running epochs. There are several key observations: (i) at high
sampling rates (e.g., 0.7), GT-NET with different structures
In the proposed GT-NET model, we adopt T phases to can converge faster than at low sampling rates (e.g., 0.3),
recover the unobserved traffic data. Now we conduct experi- (ii) the transform submodule F(·) with two convolutional
ments to investigate the performance impacted by the phase layers is more effective than a single convolutional layer, (iii)
number T . TABLE II and TABLE III present the imputation a ReLU activation function further improves the imputation
RSE of GT-NET for choosing the different phase number T . performance, thus GT-NET with structure-1 achieves better
It can be seen that the larger the phase number T is, the better performance as the epoch number increases. Then structure-1
imputation performance can be achieved generally. But the is a great choice for GT-NET. Overall, for the Abilene dataset,
performance is improved very slightly when T is sufficiently GT-NET requires about 800 running epochs for convergence
large. We note that large T can increase the model complexity and about 15 minutes for training. For the GÉANT dataset,
and result in more computational time. For example, the GT-NET requires about 350 running epochs for convergence
computational time for T = 2, 4, 6 are 3.9, 8.9, 14.9 ms, and about 20 hours for training.
respectively. Our scheme with T = 4 is a balanced choice that
ensures both high imputation speed and accuracy. Meanwhile, E. Performance Comparison of TGTC and GT-NET
experimental evidence in TABLE II shows that GT-NET
We evaluate the performance of TGTC on two datasets and
can provide much more reliable imputation performance than
make a comparison with GT-NET. Hyperparameters in TGTC
TGTC.
like λ1 , λ2 , ρ, ϵ, L1 , L2 and L need to be predetermined.
1 https://github.com/summerdenglei/graph-tensor-neural-network λ1 , λ2 , ρ and ϵ are tuned to be 0.01, 0.01, 0.1 and 10e-3,


Fig. 5. Imputation accuracy and convergence of GT-NET with different network structures on the Abilene dataset. (a) Sampling rate = 0.7. (b) Sampling
rate = 0.3.

Fig. 6. Imputation accuracy and convergence of GT-NET with different network structures on the GÉANT dataset. (a) Sampling rate = 0.7. (b) Sampling
rate = 0.3.

respectively. Each frontal slice of the Laplacian tensor L1 (L2) can be constructed by following the strategy in [7], which simply applies KNN and linear regression to the rows (columns). The Laplacian matrix L portrays the topological correlations of the n3 OD pairs, which are unknown. But L can still be roughly computed. Specifically, we can define a directed graph G to represent these correlations. The edge weight from an OD pair a to another OD pair b on the graph is set to 1 if the destination of pair a is the origin of pair b. Given the connection weights on the graph, one can obtain an adjacency matrix and further derive L.
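A sketch of this rough construction is given below; pairs is a hypothetical list of (origin, destination) node ids, one per OD pair, and the symmetrization is our choice so that L admits the orthonormal eigenbasis used by the GFT:

    import numpy as np

    def od_laplacian(pairs):
        """Laplacian over OD pairs: pair a points to pair b when a's destination
        equals b's origin, as described above."""
        n = len(pairs)
        A = np.zeros((n, n))
        for a, (_, dst_a) in enumerate(pairs):
            for b, (org_b, _) in enumerate(pairs):
                if a != b and dst_a == org_b:
                    A[a, b] = 1.0            # edge weight 1 from pair a to pair b
        A = (A + A.T) / 2                    # symmetrize (our choice)
        return np.diag(A.sum(axis=1)) - A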
We also investigate the performance of TGTC without temporal regularization (by setting λ1 = λ2 = 0). The imputation results of TGTC and GT-NET at different sampling rates are presented in TABLE II and TABLE III. We can observe that temporal regularization brings a slight performance improvement. However, the overall imputation accuracy is still limited compared with GT-NET. This is generally because the topological correlations of OD pairs cannot be well exploited: although one can roughly construct L according to the network topology, such an L cannot precisely reflect the correlations of OD pairs. Since the parameter L in TGTC is hard to choose well artificially, GT-NET is more applicable. Although TGTC achieves excellent performance on the synthetic dataset, it is less effective on real datasets. Hence the imputation results of TGTC are not presented in the following experiments.

F. Performance Comparison With Other Algorithms

We compare GT-NET with seven baseline methods: NTC [13], Autoencoder [55], SRMF [7], TNN-ADMM [20], [50], the Graph-Laplacian Regularization Tensor Completion algorithm (GLRTC) [54], CP-ALS [51], [52], and Tucker-ALS [51], [52]. NTC is a deep neural network based tensor completion model for network traffic monitoring, which exploits 3D CNNs to capture the spatial-temporal features of traffic data. Autoencoder is also a deep learning based missing value imputation approach with an autoencoder architecture. SRMF is an MC model, which develops spatial and temporal constraints for traffic matrix imputation. We vectorize each frontal slice of the graph-tensor to form a traffic matrix so that SRMF can be applied. TNN-ADMM utilizes t-SVD [39] to capture the multi-dimensional correlations of tensors, which improves the network measurement completion


TABLE IV
MAE COMPARISONS ON THE ABILENE DATASET

Fig. 7. RSE comparisons on the Abilene dataset.

Fig. 8. RSE comparisons on the GÉANT dataset.

accuracy. GLRTC introduces spatial regularization to improve the network measurement recovery accuracy. CP-ALS and Tucker-ALS perform alternating least squares for tensor completion based on the CP model and the Tucker model, respectively. CP-ALS and Tucker-ALS have been examined by previous works [22]. Notice that existing works also model the network traffic data as an origin-destination-time (O-D-T) tensor, in which the OD-pair sampling corresponds to a tubal sampling. We adopt the better of the recovery results between the O-D-T tensor model and the graph-tensor model as the reported recovery results of the traditional tensor completion algorithms.

Fig. 7 presents the RSE of the different methods on the Abilene dataset, and TABLE IV presents more imputation results. We can observe that the imputation performances of the competitive algorithms are not satisfactory because they cannot capture the underlying correlations of OD pairs. In particular, SRMF is a matrix completion approach, and it cannot handle the topological correlations. TNN-ADMM and GLRTC are both t-SVD based approaches, which may be effective in capturing the temporal correlations of signals but weak in extracting topological correlations. Since the unobserved frontal slices seriously damage the low-rankness of the tensor, the imputation performances of CP-ALS and Tucker-ALS are limited. We also notice that the imputation performance of NTC is unsatisfactory because the topological correlations of OD flows cannot be well learned in its training process. In contrast, GT-NET exploits the topological correlations and the low-rankness of network traffic data in a learned transform domain, and it achieves more reliable imputation performance than the competitive algorithms.

Similarly, we also carry out experiments to verify the effectiveness of GT-NET on the GÉANT dataset. The imputation RSEs of the different methods are presented in Fig. 8. It can be easily observed that SRMF, GLRTC, TNN-ADMM, NTC, CP-ALS, and Tucker-ALS provide similar and limited imputation accuracy. GT-NET achieves much better imputation performance thanks to the well-learned topological correlations of OD flows. TABLE V reports the MAE comparison on the GÉANT dataset, which demonstrates the superiority of the proposed GT-NET. We note that the MAEs of the competitive approaches do not strictly decrease as the sampling rate rises. This is mainly because the end-to-end network traffic data is highly skewed with heavy tails [7], [22], while the conventional tensor completion methods cannot well handle the skewed traffic data. Nevertheless, GT-NET can still provide reliable performance.

Moreover, the computational time of GT-NET is also satisfactory. TABLE VI summarizes the computational times of the different methods. We note that the computational time of NTC is not presented here because the trained neural network of NTC might be inapplicable to a new unit (tensor), making a fair comparison difficult. The experimental results in TABLE VI show that the proposed GT-NET achieves excellent real-time performance compared with the other algorithms. For the Abilene dataset, the computational time of GT-NET can be kept within 10 ms, while GLRTC, TNN-ADMM, CP-ALS, SRMF, Tucker-ALS and Autoencoder require about 1.4, 1.2, 1, 0.9, 0.5 and 0.42 seconds, respectively. Since the GÉANT dataset contains more OD pairs than the Abilene dataset, the experiments on the GÉANT dataset generally


TABLE V
MAE C OMPARISONS ON THE G ÉANT DATASET

TABLE VI [5] M. Crovella and E. Kolaczyk, “Graph wavelets for spatial traffic
C OMPUTATIONAL T IME analysis,” in Proc. 22nd Annu. Joint Conf. IEEE Comput. Commun.
Societies, Mar. 2003, pp. 1848–1857.
[6] K. Xie et al., “Accurate recovery of internet traffic data: A sequential
tensor completion approach,” IEEE/ACM Trans. Netw., vol. 26, no. 2,
pp. 793–806, Apr. 2018.
[7] Y. Zhang, M. Roughan, W. Willinger, and L. Qiu, “Spatio-temporal com-
pressive sensing and internet traffic matrices,” in Proc. ACM SIGCOMM
Conf. Data Commun., Aug. 2009, pp. 267–278.
[8] P. Tune and M. Roughan, “Spatiotemporal traffic matrix synthesis,” in
Proc. ACM Conf. Special Interest Group Data Commun., Aug. 2015,
pp. 579–592.
[9] H. Zhou, D. Zhang, K. Xie, and Y. Chen, “Spatio-temporal tensor com-
pletion for imputing missing internet traffic data,” in Proc. IEEE 34th
Int. Perform. Comput. Commun. Conf. (IPCCC), Dec. 2015, pp. 1–7.
require more computational time than on the Abilene dataset [10] S. E. Leurgans, R. T. Ross, and R. B. Abel, “A decomposition for three-
(see TABLE VI). GLRTC, TNN-ADMM, CP-ALS, Tucker- way arrays,” SIAM J. Matrix Anal. Appl., vol. 14, no. 4, pp. 1064–1083,
ALS, SRMF, Autoencoder and our GT-NET require about 3.7, Oct. 1993.
[11] L. R. Tucker, “Implications of factor analysis of three-way matrices for
3.2, 3.3, 1.6, 1.4, 2.41 and 0.11 seconds, respectively. measurement of change,” Problems Measuring Change, vol. 15, p. 3,
Jan. 1963.
VII. CONCLUSION

This paper studies the network traffic imputation problem from a subset of OD flows. Specifically, we model the network traffic data as a low-tubal-rank graph-tensor using the notion of GT-SVD and develop a theoretical recovery guarantee. Then we present an iterative tensor completion algorithm (i.e., TGTC) based on GT-SVD for network traffic imputation. Since the topological correlations of OD flows are unknown, we propose a multi-phase deep neural network (i.e., GT-NET) by unfolding the iterative TGTC algorithm, which is potentially interpretable. The low-rankness of traffic data can then be exploited in a learned transform domain: we apply two CNNs with one ReLU to imitate the transform, and a soft-thresholding operation to implement the low-rankness. Performance evaluation on real-world datasets shows that GT-NET achieves better imputation accuracy and real-time performance for network traffic data than the state-of-the-art methods.
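To make the unfolded design concrete, here is a minimal sketch of one phase in the spirit of GT-NET, assuming a PyTorch-style implementation; the module name `PhaseBlock`, the channel width, and the learnable threshold `tau` are illustrative choices, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PhaseBlock(nn.Module):
    """One unfolded iteration (sketch): transform -> soft-threshold -> inverse.

    Hypothetical re-implementation of the idea described above. Input: traffic
    tensor x of shape (batch, 1, num_od_pairs, time); x_obs holds the observed
    measurements and mask marks the observed entries with 1.
    """

    def __init__(self, channels: int = 16):
        super().__init__()
        # Two CNNs with one ReLU imitate the learned transform.
        self.transform = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # A mirrored pair of CNNs approximates the inverse transform.
        self.inverse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )
        # Learnable threshold enforcing low-rankness in the transform domain.
        self.tau = nn.Parameter(torch.tensor(0.01))

    def forward(self, x, x_obs, mask):
        z = self.transform(x)
        # Soft-thresholding: sign(z) * max(|z| - tau, 0).
        z = torch.sign(z) * F.relu(z.abs() - self.tau)
        x = self.inverse(z)
        # Data-consistency step: keep the observed entries fixed.
        return mask * x_obs + (1.0 - mask) * x
```

Stacking several such blocks and training them end to end on masked traffic tensors reproduces the multi-phase unfolding pattern; in the paper's formulation, the thresholding enforces low-rankness of the graph-tensor in the learned transform domain (e.g., acting on singular values) rather than on raw feature maps, so this sketch conveys only the overall structure.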


Lei Deng (Graduate Student Member, IEEE) received the B.Eng. degree in communication engineering from Liaoning Technical University, China, in 2017. He is currently pursuing the Ph.D. degree with Fuzhou University, Fuzhou, China. His research interests include tensor theory, graph data analysis, graph neural networks, and the Internet of Things.

Xiao-Yang Liu (Member, IEEE) received the B.Eng. degree in computer science from the Huazhong University of Science and Technology, China, in 2010, the Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, in 2017, and the M.S. degree in electrical engineering from Columbia University, USA, in 2018, where he is currently pursuing the Ph.D. degree with the Department of Electrical Engineering. His research interests include tensor theory and high-performance tensor computation, deep learning, optimization algorithms, big data analysis, and data privacy.

Haifeng Zheng (Senior Member, IEEE) received the B.Eng. and M.S. degrees in communication engineering from Fuzhou University, Fuzhou, China, and the Ph.D. degree in communication and information system from Shanghai Jiao Tong University, Shanghai, China. He was a Visiting Scholar with The State University of New York at Buffalo from October 2015 to September 2016. He is currently a Professor with the College of Physics and Information Engineering, Fuzhou University. His research interests include tensor theory, machine learning, the Internet of Things, and wireless networks.


Xinxin Feng (Member, IEEE) received the B.S. and M.S. degrees in communication engineering from the Nanjing University of Science and Technology, China, in 2006 and 2008, respectively, and the Ph.D. degree in information and communication engineering from Shanghai Jiao Tong University, China, in 2015. Currently, she is an Associate Professor with the College of Physics and Information Engineering, Fuzhou University, Fuzhou, China. Her research interests include dynamic spectrum sharing, incentive mechanism design in vehicle networks, and network economies.

Zhizhang (David) Chen (Fellow, IEEE) received the B.Eng. degree in radio engineering from Fuzhou University, Fuzhou, Fujian, China, in 1982, the master's degree in radio engineering from Southeast University, Nanjing, China, in 1986, and the Ph.D. degree in electrical engineering from the University of Ottawa, Ottawa, ON, Canada, in 1992. He was an NSERC Post-Doctoral Fellow with McGill University, Montreal, Canada, in 1993. He is currently with the Department of Electrical and Computer Engineering, Dalhousie University, Halifax, NS, Canada, where he is a Professor and the former Head of the Department of Electrical and Computer Engineering. He has been an Adjunct or Visiting Professor with the University of Nottingham, U.K., École Nationale Supérieure des Télécommunications de Bretagne, France, Shanghai Jiao Tong University, Fuzhou University, The Hong Kong University of Science and Technology, and the University of Electronic Science and Technology of China. He has authored and coauthored over 450 journal and conference papers in computational electromagnetics, RF/microwave electronics, antennas, and wireless technologies. He was one of the originators of the unconditionally stable methods that have been highly cited and used, and he and his team also developed several nonlinear ultra-wideband receivers and planar wireless power transfer transmitting and receiving structures. His current research interests include time-domain electromagnetic modeling techniques, antennas, wideband wireless communication and sensing systems, and wireless power technology. He is a Fellow of the Canadian Academy of Engineering and the Engineering Institute of Canada. He received the 2005 Nova Scotia Engineering Award, the 2006 Dalhousie Graduate Teaching Award, the 2007 and 2015 Dalhousie Faculty of Engineering Research Award, the 2013 IEEE Canada Fessenden Medal, and a Dalhousie University Professorship. He has served as a Guest Editor or Track Editor for the IEEE Transactions on Microwave Theory and Techniques, IEEE Microwave Magazine, the IEEE Journal of Electromagnetics, RF and Microwave in Medicine and Biology, and the International Journal of Numerical Modeling (John Wiley), and as an Associate Editor for the IEEE Journal of Multiscale and Multiphysics Computational Techniques. He was also the Founding Chair of the joint Signal Processing and Microwave Theory & Techniques Chapter of IEEE Atlantic Canada, the Chair of the IEEE Canada Atlantic Section, and a member of the Board of Directors for IEEE Canada (2000–2001). He currently serves as a Topic Editor for the IEEE Journal of Microwaves and an Elected Member of the Ad-Com of the IEEE Antennas and Propagation Society.
