
ARTICLE

https://doi.org/10.1038/s44172-023-00081-4 OPEN

Long term 5G network traffic forecasting via modeling non-stationarity with deep learning
Yuguang Yang1,5, Shupeng Geng 1,5, Baochang Zhang 1,2 ✉, Juan Zhang 1,2 ✉, Zheng Wang3,
Yong Zhang3 & David Doermann4

5G cellular networks have recently fostered a wide range of emerging applications, but their popularity has led to traffic growth that far outpaces network expansion. This mismatch may decrease network quality and cause severe performance problems. To reduce the risk, operators need long term traffic prediction to perform network expansion schemes months ahead. However, the long term prediction horizon exposes the non-stationarity of series data, which deteriorates the performance of existing approaches. We deal with this problem by developing a deep learning model, Diviner, that incorporates stationary processes into a well-designed hierarchical structure and models non-stationary time series with multi-scale stable features. We demonstrate substantial performance improvement of Diviner over the current state of the art in 5G network traffic forecasting with detailed months-level forecasting for massive ports with complex flow patterns. Extensive experiments further present its applicability to various predictive scenarios without any modification, showing potential to address broader engineering problems.

1 Beihang University, 100191 Beijing, China. 2 Zhongguancun Laboratory, 100094 Beijing, China. 3 China Unicom, 100037 Beijing, China. 4 University at Buffalo, 14260 Buffalo, NY, USA. 5 These authors contributed equally: Yuguang Yang and Shupeng Geng. ✉email: [email protected]; [email protected]


5G technology has recently gained popularity worldwide for its faster transfer speed, broader bandwidth, reliability, and security. 5G technology can achieve a 20× faster theoretical peak speed over 4G with lower latency, promoting applications like online gaming, HD streaming services, and video conferences1–3. The development of 5G is changing the world at an incredible pace and fostering emerging industries such as telemedicine, autonomous driving, and extended reality4–6. These and other industries are estimated to bring a 1000-fold boost in network traffic, requiring additional capacity to accommodate these growing services and applications7. Nevertheless, 5G infrastructure, such as board cards and routers, must be deployed and managed with strict cost considerations8,9. Therefore, operators often adopt a distributed architecture to avoid massive back-to-back devices and links among fragmented networks10–13. As shown in Fig. 1a, the emerging metropolitan router is the hub that links urban access routers, where services can be accessed and integrated effectively. However, the construction cycle of 5G devices requires about three months to schedule, procure, and deploy. Planning new infrastructure requires accurate network traffic forecasts months ahead to anticipate the moment that capacity utilization surpasses the preset threshold, where the overloaded capacity utilization might ultimately lead to performance problems. Another issue concerns the resource excess caused by building coarse-grained 5G infrastructures. To mitigate these hazards, operators formulate network expansion schemes months ahead with long-term network traffic prediction, which can facilitate long-period planning for upgrading and scaling the network infrastructure and prepare it for the next planning period14–17.

In industry, a common practice is calculating network traffic's potential growth rate by analyzing the historical traffic data18. However, this approach cannot scale to predict the demand for new services and is less than satisfactory for long-term forecasting. Prediction-based methods have been introduced to solve this dilemma by exploring the potential dependencies involved in historical network traffic, which provides both a constraint and a source for assessing future traffic volume. Network planners can harness the dependencies to extrapolate long-enough traffic forecasts to develop sustainable expansion schemes and mitigation strategies. The key issue in this task is to obtain an accurate long-term network traffic prediction. However, directly extending the prediction horizon of existing methods is ineffective for long-term forecasting since these methods suffer a severe performance degeneration, where the long-term prediction horizon exposes the non-stationarity of time series. This inherent non-stationarity of real-world time series data is caused by multi-scale temporal variations, random perturbations, and outliers, which present various challenges. These are summarized as follows. (a) Multi-scale temporal variations. Multi-scale (daily/weekly/monthly/yearly) variations throughout long-term time series indicate multi-scale non-stationary latent patterns within the time series, which should be taken into account comprehensively in the model design. The seasonal component, for example, merely presents variations at particular scales. (b) Random factors.


Fig. 1 Schematic illustration for the workflow of Diviner. a We collect the data from MAR–MER links. The orange cylinder depicts the metropolitan emerging routers (MER), and the pale blue cylinder depicts metropolitan accessing routers (MAR). b The illustration of the introduced 2D → 3D transformation process. Specifically, given a time series of network traffic data spanning K days, we construct a time series matrix $\widetilde{X} = [\tilde{x}_1\ \tilde{x}_2\ \cdots\ \tilde{x}_K]$, where each $\tilde{x}_i$ represents the traffic data for a single day of length T. The resulting 3D plot displays time steps across each day, daily time steps, and inbits traffic along the x, y, and z axes, respectively, with the inbits traffic standardized. The blue line in the 2D plot and the side near the origin of the pale red plane in the 3D plot represent historical network traffic, while the yellowish line in the 2D plot and the side far from the origin of the pale red plane in the 3D plot represent the future network traffic to predict. c The overall workflow of the proposed Diviner. The blue solid line indicates the data stream direction. Both the encoder and decoder blocks of Diviner contain a smoothing filter attention mechanism (yellowish block), a difference attention module (pale purple block), a residual structure (pale green block), and a feed-forward layer (gray block). Finally, a one-step convolution generator (magenta block) is employed to convert the dynamic decoding into a sequence-generating procedure.
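For illustration, the 2D → 3D construction in Fig. 1b amounts to a simple reshape. The sketch below is not part of the original implementation; it assumes a univariate series sampled every 15 minutes, i.e. T = 96 steps per day, and the names are illustrative only.

```python
# Illustrative sketch: reshape a 1D traffic series into the matrix
# X~ = [x~_1 ... x~_K] of Fig. 1b (shape T x K), assuming 96 samples per day.
import numpy as np

def to_daily_matrix(x: np.ndarray, steps_per_day: int = 96) -> np.ndarray:
    """Column i of the result holds the traffic of day i (length T)."""
    n_days = len(x) // steps_per_day           # K: number of complete days
    x = x[: n_days * steps_per_day]            # drop a trailing partial day, if any
    return x.reshape(n_days, steps_per_day).T  # shape (T, K)

# Example: 30 days of synthetic inbits traffic (2880 steps) -> a 96 x 30 matrix.
series = np.random.rand(30 * 96)
X_tilde = to_daily_matrix(series)
print(X_tilde.shape)  # (96, 30)
```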


Random perturbations and outliers interfere with the discovery of stable regularities, which entails higher robustness in prediction models. (c) Data distribution shift. Non-stationarity of the time series inevitably results in a dataset shift problem with the input data distribution varying over time. Figure 1b illustrates these challenges.

Next, we review the shortcomings of existing methods concerning addressing non-stationarity issues. Existing time series prediction methods generally fall into two categories, conventional models and deep learning models. Most conventional models, such as ARIMA19,20 and HoltWinters21–25, are built with some insight into the time series but implemented linearly, causing problems for modeling non-stationary time series. Furthermore, these models rely on manually tuned parameters to fit the time series, which impedes their application in large-scale prediction scenarios. Although Prophet26 uses nonlinear modular and interpretive parameters to address these problems, its hand-crafted nonlinear modules struggle to model non-stationary time series, whose complex patterns make it inefficient to embed diverse factors in hand-crafted functions. This dilemma boosts the development of deep learning methods. Deep learning models can utilize multiple layers to represent latent features at a higher and more abstract level27, enabling us to recognize deep latent patterns in non-stationary time series. Recurrent neural networks (RNNs) and Transformer networks are two main deep learning forecasting frameworks. RNN-based models28–34 feature a feedback loop that allows models to memorize historical data and process variable-length sequences as inputs and outputs, which calculates the cumulative dependency between time steps. Nevertheless, such indirect modeling of temporal dependencies cannot disentangle information from different scales within historical data and thus fails to capture multi-scale variations within non-stationary time series. Transformer-based models35–37 solve this problem using a global self-attention mechanism rather than feedback loops. Doing so enhances the network's ability to capture longer dependencies and interactions within series data and thus brings exciting progress in various time series applications38. For more efficient long-term time series processing, some studies39–41 turn the self-attention mechanism into a sparse version. However, despite their promising long-term forecasting results, time series' specialization is not taken into account during their modeling process, where varying distributions of non-stationary time series deteriorate their predictive performances. Recent research attempts to incorporate time series decomposition into deep learning models42–47. Although their results are encouraging and bring more interpretive and reasonable predictions, their limited decomposition, e.g., trend-seasonal decomposition, reverses the correlation between components and merely presents the variation of time series at particular scales.

In this work, we incorporate deep stationary processes into neural networks to achieve precise long-term 5G network traffic forecasts, where stochastic process theories can guarantee the prediction of stationary events48–50. Specifically, as shown in Fig. 1c, we develop a deep learning model, Diviner, that incorporates stationary processes into a well-designed hierarchical structure and models non-stationary time series with multi-scale stable features. To validate the effectiveness, we collect an extensive network port traffic dataset (NPT) from the intelligent metropolitan network delivering 5G services of China Unicom and compare the proposed model with numerous current arts over multiple applications. We make two distinct research contributions to time series forecasting: (1) We explore an avenue to solve the challenges presented in long-term time series prediction by modeling non-stationarity in the deep learning paradigm. This line is much more universal and effective than the previous works incorporating temporal decomposition, whose limited decomposition merely presents the temporal variation at particular scales. (2) We develop a deep learning framework with a well-designed hierarchical structure to model the multi-scale stable regularities within non-stationary time series. In contrast to previous methods employing various modules in the same layer, we perform a dynamical scale transformation between different layers and model stable temporal dependencies in the corresponding layer. This hierarchical deep stationary process synchronizes with the cascading feature embedding of deep neural networks, which enables us to capture complex regularities contained in the long-term histories and achieve precise long-term network traffic forecasting. Our experiments demonstrate that the robustness and predictive accuracy significantly improve as we consider more factors concerning non-stationarity, which provides an avenue to improve the long-term forecast ability of deep learning methods. Besides, we also show that the modeling of non-stationarity can help discover nonlinear latent regularities within network traffic and achieve a quality long-term 5G network traffic forecast for up to three months. Furthermore, we expand our solution to climate, control, electricity, economic, energy, and transportation fields, which shows the applicability of this solution to multiple predictive scenarios, showing valuable potential to solve broader engineering problems.

Results
Diviner with deep stationary processes. In this section, we introduce our proposed deep learning model, Diviner, which tackles the non-stationarity of long-term time series prediction with deep stationary processes, capturing multi-scale stable features and modeling multi-scale stable regularities to achieve long-term time series prediction.

Smoothing filter attention mechanism as a scale converter. As shown in Fig. 2a, the smoothing filter attention mechanism adjusts the feature scale and enables Diviner to model time series from different scales and access the multi-scale variation features within non-stationary time series. We build this component based on Nadaraya-Watson regression51,52, a classical algorithm for non-parametric regression. Given the sample space $\Omega = \{(x_i, y_i) \mid 1 \le i \le n,\ x_i \in \mathbb{R},\ y_i \in \mathbb{R}\}$, window size h, and kernel function K( ⋅ ), the Nadaraya–Watson regression has the following expression:

$$\hat{y} = \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) y_i \Big/ \sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right), \qquad (1)$$

where the kernel function K( ⋅ ) is subject to $\int_{-\infty}^{\infty} K(x)\,dx = 1$ and n, x, y denote sample size, independent variable, and dependent variable, respectively.

The Nadaraya–Watson regression estimates the regression value $\hat{y}$ using a local weighted average method, where the weight of a sample $(x_i, y_i)$, $K\!\left(\frac{x - x_i}{h}\right) \big/ \sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right)$, decays with the distance of $x_i$ from x. Consequently, the regression value is primarily determined by samples in its vicinity. This process implies the basic notion of scale transformation, where adjacent samples get closer on a more significant visual scale. Inspired by this thought, we can reformulate the Nadaraya–Watson regression from the perspective of scale transformation. We incorporate it into the attention structure to design a learnable scale adjustment unit. Concretely, we introduce the smoothing filter attention mechanism with a learnable kernel function and self-masked operation, where the former shrinks (or magnifies) variations for adaptive feature-scale adjustment, and the latter eliminates outliers. To ease understanding, we consider the 1D time series case here, and the high-dimensional case can be easily extrapolated (shown mathematically in the Section "Methods").
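As a point of reference, the following is a minimal NumPy sketch of the estimator in Eq. (1), using a Gaussian kernel as one admissible choice of K( ⋅ ); the function names and data are illustrative only.

```python
# Minimal Nadaraya-Watson estimator of Eq. (1) with a Gaussian kernel.
import numpy as np

def gaussian_kernel(u: np.ndarray) -> np.ndarray:
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # integrates to 1

def nadaraya_watson(x_query: float, x: np.ndarray, y: np.ndarray, h: float) -> float:
    """Locally weighted average: weights decay with the distance of x_i from x."""
    w = gaussian_kernel((x_query - x) / h)
    return float(np.sum(w * y) / np.sum(w))

# Example: smooth a noisy sine curve at one query point.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 10.0, 200)
ys = np.sin(xs) + 0.1 * rng.standard_normal(xs.shape)
print(nadaraya_watson(5.0, xs, ys, h=0.5))  # close to sin(5.0)
```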



Fig. 2 Illustration of the structure of smoothing filter attention mechanism and difference attention module. a This panel displays the smoothing filter attention mechanism, which involves computing adaptive weights $K(\xi_i, \xi_j)$ (orange block) and employing a self-masked structure (gray block with dashed lines) to filter out the outliers, where $\xi_i$ denotes the ith embedded time series period (yellow block). The adaptive weights serve to adjust the feature scale of the input series and obtain the scale-transformed period embedding $h_i$ (pink block). b This diagram illustrates the difference attention module. The Matrix-Difference Transformation (pale blue block) subtracts adjacent columns of a matrix to obtain the shifted query, key, and value items (ΔQ, ΔK, and ΔV). Then, an autoregressive multi-head self-attention is performed (in the pale blue background) to capture the correlation of time series across different time steps, resulting in $\widetilde{V}_s^{(i)}$ for the ith attention head. Here, $Q_s^{(i)}$, $K_s^{(i)}$, $V_s^{(i)}$, and $\widetilde{V}_s^{(i)}$ represent the query, key, value, and result items, respectively. The SoftMax is applied to the scaled dot-product between the query and key vectors to obtain attention weights (the pale yellow block). The formula for the SoftMax function is $\mathrm{SoftMax}(k_i) = e^{k_i} / \sum_{j=1}^{n} e^{k_j}$, where $k_i$ is the ith element of the input vector, and n is the length of the input vector. Lastly, the Matrix-CumSum operation (light orange block) accumulates the shifted features using the ConCat operation, and $W_s$ denotes the learnable aggregation parameters.

Given the time step $t_i$, we estimate its regression value $\hat{y}_i$ with an adaptive-weighted average of the values $\{y_t \mid t \ne t_i\}$, $\hat{y}_i = \sum_{j \ne i} \alpha_j y_j$, where the adaptive weights α are obtained by a learnable kernel function f. The punctured window $\{t_j \mid t_j \ne t_i\}$ of size n − 1 denotes our self-masked operation, and $f(y_i, y)_{w_i} = \exp\!\left(w_i (y_i - y)^2\right)$, $\alpha_i = f(y_i, y)_{w_i} \big/ \sum_{j \ne i} f(y_j, y)_{w_j}$. Our adaptive weights vary with the inner variation $\{(y_i - y)^2 \mid t_i \ne t\}$ (decreased or increased), which adjusts (shrinking or magnifying) the distance of points across each time step and achieves an adaptive feature-scale transformation. Specifically, the minor variation gets further shrunk at a large feature scale, magnified at a small feature scale, and vice versa. Concerning random components, global attention can serve as an average smoothing method to help filter small perturbations. As for outliers, their large margin against regular items leads to minor weights, which eliminates the interference of outliers. Especially when the sample $(t_i, y_i)$ comes to be an outlier, this structure brushes itself aside. Thus, the smoothing filter attention mechanism filters out random components and dynamically adjusts feature scales. This way, we can dynamically transform non-stationary time series according to different scales, which accesses time series under comprehensive sights.

Difference attention module to discover stable regularities. The difference attention module calculates the internal connections among stable shifted features to discover stable regularities within the non-stationary time series and thereby overcomes the interference of uneven distributions. Concretely, as shown in Fig. 2b, this module includes the difference and CumSum operations at both ends of the self-attention mechanism35, which interconnects the shift across each time step to capture internal connections within non-stationary time series. The difference operation separates the shifts from the long-term trends, where the shift refers to the minor difference in the trends between adjacent time steps. Considering that trends lead the data distribution to change over time, the difference operation makes the time series stable and varying around a fixed mean level with minor distribution shifts. Subsequently, we use a self-attention mechanism to interconnect shifts, which captures the temporal dependencies within the time series variation. Last, we employ a CumSum operation to accumulate shifted features and generate a non-stationary time series conforming to the discovered regularities (a short numerical sketch of this difference/CumSum round trip is given below).

Modeling and generating non-stationary time series in the Diviner framework. The smoothing filter attention mechanism filters out random components and dynamically adjusts the feature scale. Subsequently, the difference attention module calculates internal connections and captures the stable regularity within the time series at the corresponding scale. Cascading these two modules, one Diviner block can discover stable regularities within non-stationary time series at one scale. Then, we stack Diviner blocks in a multilayer structure to achieve multi-scale transformation layers and capture multi-scale stable features from non-stationary time series. Such a multilayer structure is organized in an encoder-decoder architecture with asymmetric input lengths for efficient data utilization. The encoder takes a long historical series to embed trends, and the decoder receives a relatively short time series. With the cross-attention between the encoder and decoder, we can pair the latest time series with pertinent variation patterns from long historical series and make inferences about future trends, improving calculation efficiency and reducing redundant historical information. The point is that the latest time series is more conducive to anticipating the immediate future than the remote-past time series, where the correlation across time steps generally degrades with the length of the interval53–57. Additionally, we design a generator to obtain prediction results in one step to avoid dynamic cumulative error problems39. The generator is built with a ConvNet sharing parameters throughout each time step based on the linear projection generator39,58,59, which saves hardware resources. These techniques enable deep learning methods to model non-stationary time series with multi-scale stable features and produce forecasting results in a generative paradigm, which is an attempt to tackle long-term time series prediction problems.
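The following short NumPy illustration shows the rationale behind the difference/CumSum pair: differencing removes the drifting trend so that the attention operates on a roughly stationary shift series, and CumSum inverts the differencing without losing information. The synthetic series is illustrative only.

```python
# Differencing stabilizes a trending series; CumSum restores it exactly.
import numpy as np

rng = np.random.default_rng(1)
trend = np.linspace(0.0, 5.0, 400)                      # slowly drifting mean
season = np.sin(np.linspace(0.0, 40.0, 400))            # periodic component
x = trend + season + 0.05 * rng.standard_normal(400)    # non-stationary series

shifts = np.diff(x)                  # shifts vary around a fixed mean level
print(x.mean(), x.std())             # mean drifts with the trend
print(shifts.mean(), shifts.std())   # roughly zero-mean, stable spread

x_rebuilt = x[0] + np.cumsum(shifts)      # CumSum inverts the differencing
print(np.allclose(x_rebuilt, x[1:]))      # True: no information is lost
```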


Table 1 Time-series forecasting results on the 5G traffic network dataset.

Models Diviner Autoformer Informer Transformer NBeats

Metric MSE MAE MASE MSE MAE MASE MSE MAE MASE MSE MAE MASE MSE MAE MASE
NPT-1 96 0.256 0.340 1.391 0.456 0.511 2.090 0.264 0.349 1.427 0.259 0.333 1.362 0.491 0.509 2.082
288 0.277 0.379 1.598 0.431 0.499 2.104 0.611 0.590 2.488 0.376 0.445 1.876 0.624 0.694 2.927
672 0.263 0.367 1.601 0.446 0.522 2.278 1.680 0.885 3.862 0.365 0.437 1.907 0.680 0.615 2.684
1344 0.275 0.367 1.585 0.400 0.467 2.017 1.307 0.923 3.987 0.448 0.462 1.996 0.883 0.692 2.989
2880 0.318 0.390 1.613 0.674 0.629 2.601 1.590 1.050 4.343 0.811 0.652 2.697 1.257 0.844 3.491
NPT-2 96 0.370 0.405 1.800 0.605 0.603 2.681 0.760 0.646 2.870 0.458 0.470 2.088 0.539 0.476 2.116
288 0.394 0.431 1.977 0.579 0.607 2.786 1.131 0.826 3.788 0.415 0.454 2.082 0.589 0.541 2.481
672 0.484 0.462 2.074 0.541 0.525 2.357 1.149 0.861 3.864 0.548 0.546 2.453 0.734 0.598 2.685
1344 0.314 0.372 1.814 0.437 0.472 2.301 1.129 0.858 4.181 0.705 0.593 2.889 0.583 0.532 2.593
2880 0.378 0.390 1.861 0.750 0.644 3.072 1.342 0.935 4.457 0.458 0.470 2.240 0.934 0.725 3.459
NPT-3 96 0.177 0.323 1.672 0.272 0.401 2.076 0.664 0.656 3.397 0.300 0.415 2.150 0.227 0.347 1.797
288 0.193 0.301 1.558 0.579 0.607 3.144 0.880 0.721 3.736 0.458 0.478 2.478 0.486 0.498 2.579
672 0.187 0.305 1.599 0.541 0.525 2.753 0.931 0.771 4.044 0.327 0.409 2.147 0.455 0.488 2.558
1344 0.204 0.335 1.822 0.437 0.472 2.569 1.023 0.831 4.520 0.362 0.434 2.363 0.622 0.575 3.128
2880 0.240 0.350 1.756 0.750 0.644 3.228 1.196 0.922 4.622 0.362 0.434 2.177 0.816 0.673 3.374
The traffic forecast accuracy is assessed by MSE, MAE, and MASE: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$, $\mathrm{MASE} = \frac{\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|}{\frac{1}{n-1}\sum_{j=2}^{n}|y_j - y_{j-1}|}$, where $\hat{y} \in \mathbb{R}^n$ denotes the forecast and $y \in \mathbb{R}^n$ denotes the ground truth. All datasets were standardized using the mean and standard deviation values of the training set. The best predictive performance over the comparison is shown in bold.
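For reference, a short NumPy sketch of the three error metrics defined in the footnote of Table 1 (illustrative only; MASE uses the in-sample naive-forecast normalization given there):

```python
# MSE, MAE, and MASE as defined in the Table 1 footnote.
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def mase(y, y_hat):
    naive = np.mean(np.abs(np.diff(y)))   # 1/(n-1) * sum_j |y_j - y_{j-1}|
    return float(np.mean(np.abs(y - y_hat)) / naive)

y     = np.array([1.0, 1.2, 0.9, 1.1, 1.3])
y_hat = np.array([1.1, 1.1, 1.0, 1.0, 1.2])
print(mse(y, y_hat), mae(y, y_hat), mase(y, y_hat))
```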

Performance of the 5G network traffic forecasting. To validate the effectiveness of the proposed techniques, we collect extensive NPTs from China Unicom. The NPT datasets include data recorded every 15 minutes for the whole 2021 year from three groups of real-world metropolitan network traffic ports {NPT-1, NPT-2, NPT-3}, where each sub-dataset contains {18, 5, 5} ports, respectively. We split them chronologically with a 9:1 proportion for training and testing. In addition, we prepare 16 network ports for parameter-searching. The main difficulties lie in the explicit shift of the distribution and numerous outliers. This section elaborates on the comprehensive comparison of our model with prediction-based and growth-rate-based models in applying 5G network traffic forecasting.

Experiment 1: We first compare Diviner to other time series prediction-based methods; we note these baseline models as Baselines-T for clarity. Baselines-T include the traditional models ARIMA19,20 and Prophet26; the classic machine learning model LSTMa60; and the deep learning-based models Transformer35, Informer39, Autoformer42, and NBeats61. These models are required to predict the entire network traffic series {1, 3, 7, 14, 30} days ahead, aligned with the {96, 288, 672, 1344, 2880} prediction spans in Table 1, and inbits is the target feature. In terms of the evaluation, although the MAE, MSE, and MASE predictive accuracy generally decreases with prediction intervals, the degradation rate varies between models. Therefore, we introduce an exponential velocity indicator to measure the rate of accuracy degradation. Specifically, given time spans [t1, t2] and the corresponding MSE, MAE, and MASE errors, we have the following:

$$\mathrm{dMSE}_{t_1}^{t_2} = \left(\sqrt[t_2 - t_1]{\mathrm{MSE}_{t_2}/\mathrm{MSE}_{t_1}} - 1\right) \times 100\%, \qquad (2)$$

$$\mathrm{dMAE}_{t_1}^{t_2} = \left(\sqrt[t_2 - t_1]{\mathrm{MAE}_{t_2}/\mathrm{MAE}_{t_1}} - 1\right) \times 100\%, \qquad (3)$$

$$\mathrm{dMASE}_{t_1}^{t_2} = \left(\sqrt[t_2 - t_1]{\mathrm{MASE}_{t_2}/\mathrm{MASE}_{t_1}} - 1\right) \times 100\%, \qquad (4)$$

where $\mathrm{dMSE}_{t_1}^{t_2}, \mathrm{dMAE}_{t_1}^{t_2}, \mathrm{dMASE}_{t_1}^{t_2} \in \mathbb{R}$ (a small computational sketch of this indicator is given at the end of this subsection). Concerning the close experimental results between {NPT-1, NPT-2, and NPT-3}, we focus mainly on the result of the NPT-1 dataset, and the experimental results are summarized in Table 1. Although there exist quantities of outliers and frequent oscillations in the NPT dataset, Diviner achieves a 38.58% average MSE reduction (0.451 → 0.277) and a 20.86% average MAE reduction (0.465 → 0.368) over the prior art. In terms of the scalability to different prediction spans, Diviner has a much lower $\mathrm{dMSE}_{1}^{30}$ (4.014% → 0.750%) and $\mathrm{dMAE}_{1}^{30}$ (2.343% → 0.474%) than the prior art, which exhibits a slight performance degradation with a substantial improvement in predictive robustness when the prediction horizon becomes longer. The degradation rates and predictive performance of all baseline approaches are provided in Supplementary Table S1 owing to the space limitation.

The experiments on NPT-2 and NPT-3 shown in Supplementary Data 1 reproduce the above results, where Diviner can support accurate long-term network traffic prediction and exceed current art involving accuracy and robustness by a large margin. In addition, we have the following results by sorting the comprehensive performances (obtained by the average MASE errors) of the baselines established with the Transformer framework: Diviner > Autoformer > Transformer > Informer. This order aligns with the non-stationary factors considered in these models and verifies our proposal that incorporating non-stationarity promotes neural networks' adaptive abilities to model time series, and that modeling multi-scale non-stationarity further breaks through the ceiling of prediction abilities for deep learning models.

Experiment 2: The second experiment compares Diviner with two other industrial methods, which aim to predict the capacity utilization of inbits and outbits with historical growth rates. The experiment shares the same network port traffic data as in Experiment 1, while the split ratio is changed to 3:1 chronologically for a longer prediction horizon. Furthermore, we use a long construction cycle of {30, 60, 90} days (aligned with {2880, 5760, 8640} time steps) to ensure the validity of such growth-rate-based methods for the law of large numbers. We first define capacity utilization mathematically.
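For concreteness, the following is an illustrative implementation of the degradation-rate indicator in Eqs. (2)-(4); the function and variable names are ours. Applied to Diviner's NPT-1 MSE values at the 1-day and 30-day horizons from Table 1, it roughly reproduces the 0.750% figure quoted above.

```python
# Per-day geometric growth rate of an error metric between two prediction spans.
import numpy as np

def degradation_rate(err_t1: float, err_t2: float, t1: int, t2: int) -> float:
    """Return d<metric>_{t1}^{t2} in percent, e.g. t1 = 1 day, t2 = 30 days."""
    return ((err_t2 / err_t1) ** (1.0 / (t2 - t1)) - 1.0) * 100.0

# Diviner on NPT-1: MSE 0.256 at 1 day (96 steps) and 0.318 at 30 days (2880 steps).
print(round(degradation_rate(0.256, 0.318, t1=1, t2=30), 3))  # ~0.751 (% per day)
```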


Table 2 Long-term (1–3 months) capacity utilization forecasting results on the NPT dataset.

Models Diviner Baseline-A Baseline-M

MAE, MASE Inbits Outbits Inbits Outbits Inbits Outbits


NPT-1 2880 0.390, 2.388 0.552, 3.420 0.792, 4.851 0.874, 5.415 2.888, 17.690 5.693, 35.274
5760 0.705, 4.351 0.899, 6.494 0.870, 5.371 0.948, 6.848 31.490, 194.431 17.602, 127.16
8640 0.640, 3.425 0.694, 4.172 0.877, 4.693 1.012, 6.084 21.220, 113.567 14.487, 87.096
NPT-2 2880 0.352, 1.992 0.525, 2.574 0.911, 5.159 0.907, 4.442 2.644, 14.959 2.112, 10.341
5760 0.508, 2.817 0.605, 2.905 1.037, 5.751 1.037, 4.980 8.464, 46.936 5.515, 26.470
8640 0.814, 4.363 0.642, 3.113 1.169, 6.265 1.189, 5.766 8.323, 44.602 7.048, 34.165
NPT-3 2880 0.398, 2.232 0.413, 2.987 0.945, 5.291 0.719, 5.192 4.993, 27.947 5.222, 37.698
5760 0.769, 4.713 0.719, 4.911 1.048, 6.425 0.817, 5.580 6.895, 42.245 15.078, 102.973
8640 0.621, 3.611 0.604, 4.062 0.968, 5.622 0.829, 5.567 5.169, 30.023 29.547, 198.393

The long-term capacity utilization forecasting results on the NPT dataset were evaluated using MAE and MASE error metrics. All datasets were standardized using the mean and standard deviation
values of the training set. The best predictive performance over the comparison is shown in bold.

Given a fixed bandwidth $B \in \mathbb{R}$ and the traffic flow of the kth construction cycle $\widetilde{X}(k) = [\tilde{x}_{kC+1}\ \tilde{x}_{kC+2}\ \cdots\ \tilde{x}_{(k+1)C}]$, $\widetilde{X}(k) \in \mathbb{R}^{T \times C}$, where $\tilde{x}_i \in \mathbb{R}^T$ is a column vector of length T representing the time series per day and C denotes the number of days in one construction cycle, the capacity utilization (CU) of the kth construction cycle is defined as follows:

$$\mathrm{CU}(k) = \frac{\|\widetilde{X}(k)\|_{m1}}{BCT}, \qquad (5)$$

where $\mathrm{CU}(k) \in \mathbb{R}$. As shown in the definition, capacity utilization is directly related to network traffic, so a precise network traffic prediction leads to a quality prediction of capacity utilization. We compare the proposed predictive method with two commonly used moving average growth rate predictive methods in the industry, the additive and multiplicative moving average growth rate predictive methods. For clarity, we note the additive method as Baseline-A and the multiplicative method as Baseline-M. Baseline-A calculates an additive growth rate with the difference of adjacent construction cycles. Given the capacity utilization of the last two construction cycles CU(k − 1), CU(k − 2), we have the following:

$$\widehat{\mathrm{CU}}_A(k) = 2\,\mathrm{CU}(k-1) - \mathrm{CU}(k-2). \qquad (6)$$

Baseline-M calculates a multiplicative growth rate with the quotient of adjacent construction cycles. Given the capacity utilization of the last two construction cycles CU(k − 1), CU(k − 2), we have the following:

$$\widehat{\mathrm{CU}}_M(k) = \mathrm{CU}(k-1)\,\frac{\mathrm{CU}(k-1)}{\mathrm{CU}(k-2)}. \qquad (7)$$

Different from the above two baselines, we calculate the capacity utilization of the network with the network traffic forecast. Given the network traffic of the last K construction cycles $\widetilde{X} = [\tilde{x}_{(k-K)C+1}\ \cdots\ \tilde{x}_{(k-K+1)C}\ \cdots\ \tilde{x}_{(k-1)C}\ \cdots\ \tilde{x}_{kC}]$, we have the following:

$$\widetilde{X}(k) = \mathrm{Diviner}(\widetilde{X}), \qquad (8)$$

$$\widehat{\mathrm{CU}}_D(k) = \frac{\|\widetilde{X}(k)\|_{m1}}{BCT}. \qquad (9)$$
We summarize the experimental results in Table 2. Concerning the close experimental results between {NPT-1, NPT-2, and NPT-3}, we focus mainly on the result of the NPT-1 dataset, which has the most network traffic ports. Diviner achieves a substantial reduction of 31.67% MAE (0.846 → 0.578) on inbits and a reduction of 24.25% MAE (0.944 → 0.715) on outbits over Baseline-A. An intuitive explanation is that the growth-rate-based methods extract particular historical features but lack adaptability. We notice that Baseline-A has a much better performance of 0.045× average inbits-MAE and 0.074× average outbits-MAE over Baseline-M. This result suggests that network traffic tends to increase linearly rather than exponentially. Nevertheless, there remain inherent multi-scale variations in network traffic series, so Diviner still exceeds Baseline-A, suggesting the necessity of applying deep learning models such as Diviner to discover nonlinear latent regularities within network traffic.

When analyzing the results of these two experiments jointly, we present that Diviner possesses a relatively low degradation rate for a prediction of 90 days, $\mathrm{dMASE}_{1}^{90} = 1.034\%$. In contrast, the degradation rate of the prior art comes to $\mathrm{dMASE}_{1}^{30} = 2.343\%$ for a three-times shorter prediction horizon of 30 days. Furthermore, considering diverse network traffic patterns in the provided datasets (about 50 ports), the proposed method can deal with a wide range of non-stationary time series, validating its applicability without modification. These experiments witness Diviner's success in providing quality long-term network traffic forecasting and extending the effective prediction spans of deep learning models for up to three months.

Application on other real-world datasets. We validate our method on benchmark datasets for the weather (WTH), electricity transformer temperature (ETT), electricity (ECL), and exchange (Exchange). We summarize the experimental results in Table 3. We follow the standard protocol and divide them into training, validation, and test sets in chronological order with a proportion of 7:1:2 unless otherwise specified. Due to the space limitation, the complete experimental results are shown in Supplementary Data 2.

Weather temperature prediction. The WTH dataset42 records 21 meteorological indicators for Jena 2020, including air temperature and humidity, and WetBulbFarenheit is the target. This dataset is finely quantified to the 10-min level, which means that there are 144 steps for one day and 4320 steps for one month, thereby challenging the capacity of models to process long sequences. Among all baselines, NBeats and Informer have the lowest error in terms of MSE and MAE metrics, respectively. However, we notice a contrast between these two models when extending prediction spans. Informer degrades precipitously when the prediction spans increase from 2016 to 4032 (MAE: 0.417 → 0.853), but on the contrary, NBeats gains a performance improvement (MAE: 0.635 → 0.434). We attribute this to a trade-off of pursuing context and texture. Informer has an advantage in texture in the short-term case. Still, it struggles to capture the context dependency of the series, considering that the length of the input history series should extend in pace with prediction spans, and vice versa. As for Diviner, it achieves a remarkable 29.30% average MAE reduction (0.488 → 0.345) and 41.54% average MSE reduction (0.491 → 0.287) over both Informer and NBeats.



Table 3 Time-series forecasting results on other real-world datasets.

Models Diviner Autoformer Informer Transformer NBeats


Metric MSE MAE MASE MSE MAE MASE MSE MAE MASE MSE MAE MASE MSE MAE MASE
WTH 144 0.280 0.341 7.072 0.373 0.440 9.125 0.359 0.401 8.316 0.448 0.484 10.038 0.508 0.590 12.236
432 0.333 0.392 8.135 0.402 0.445 9.235 0.374 0.431 8.944 0.407 0.470 9.754 0.427 0.501 10.397
1008 0.273 0.328 6.806 0.663 0.613 12.720 0.344 0.387 8.030 0.535 0.514 10.666 0.406 0.490 10.167
2016 0.233 0.306 6.348 1.857 1.019 21.140 0.367 0.417 8.651 0.367 0.417 8.651 0.757 0.635 13.173
4032 0.318 0.358 5.832 1.016 0.853 13.897 1.251 0.806 13.131 0.876 0.616 10.035 0.361 0.434 7.070
ETTh1 24 0.058 0.183 4.174 0.093 0.234 5.338 0.098 0.247 5.634 0.468 0.599 13.664 0.157 0.269 6.136
48 0.071 0.203 4.629 0.089 0.229 5.222 0.158 0.319 7.274 0.369 0.524 11.950 0.146 0.292 6.659
168 0.119 0.262 5.977 0.148 0.280 6.387 0.183 0.346 7.893 0.478 0.618 14.099 0.494 0.536 12.228
336 0.114 0.268 6.031 0.183 0.344 7.742 0.222 0.387 8.710 0.235 0.417 9.385 0.411 0.494 11.118
720 0.157 0.322 7.195 0.201 0.364 8.133 0.269 0.435 9.720 0.261 0.445 9.943 1.257 0.844 18.859
ETTm1 24 0.157 0.322 7.195 0.201 0.364 8.133 0.269 0.435 9.720 0.261 0.445 9.943 1.257 0.844 18.859
48 0.023 0.119 5.159 0.092 0.250 10.838 0.069 0.230 9.971 0.288 0.481 20.853 1.584 1.220 52.892
96 0.044 0.162 6.957 0.063 0.198 8.503 0.194 0.372 15.975 0.264 0.450 19.325 1.352 1.106 47.497
288 0.078 0.209 9.375 0.096 0.245 10.990 0.401 0.554 24.850 0.230 0.410 18.391 0.628 0.621 27.856
672 0.071 0.211 9.323 0.117 0.276 12.195 0.512 0.644 28.456 0.379 0.540 23.861 0.361 0.480 21.210
ETTh2 24 0.072 0.203 4.281 0.131 0.281 5.927 0.093 0.240 5.062 0.608 0.653 13.773 0.167 0.318 6.707
48 0.109 0.252 5.209 0.143 0.284 5.871 0.155 0.314 6.491 0.758 0.740 15.298 0.264 0.392 8.104
168 0.206 0.352 6.070 0.254 0.399 6.881 0.232 0.389 6.708 0.425 0.528 9.105 0.525 0.548 9.450
336 0.220 0.373 5.792 0.262 0.403 6.258 0.263 0.417 6.475 0.324 0.461 7.158 0.750 0.655 10.171
720 0.202 0.368 5.435 0.579 0.621 9.172 0.277 0.431 6.365 0.270 0.423 6.247 0.816 0.682 10.073
ECL 168 0.265 0.361 2.315 0.385 0.458 2.937 0.447 0.503 3.225 0.587 0.561 3.597 0.225 0.363 2.327
336 0.295 0.395 2.602 0.462 0.496 3.267 0.489 0.528 3.478 0.683 0.64 4.215 0.237 0.359 2.364
720 0.303 0.409 2.544 1.349 0.907 5.643 0.54 0.571 3.552 0.482 0.527 3.278 0.367 0.482 2.998
960 0.427 0.489 3.849 1.263 0.920 7.242 0.582 0.608 4.786 0.644 0.597 4.699 0.457 0.540 4.250
Exchange 10 0.147 0.282 2.867 0.163 0.315 3.203 4.896 2.124 21.601 6.926 2.553 25.964 0.804 0.701 7.129
20 0.273 0.421 3.160 0.423 0.540 4.054 6.318 2.443 18.341 6.759 2.524 18.949 1.166 0.939 7.049
30 0.399 0.506 3.132 0.857 0.799 4.945 5.388 2.253 13.945 7.307 2.635 16.31 1.521 1.105 6.839
60 0.619 0.669 4.265 0.911 0.776 4.948 9.886 3.067 19.557 8.455 2.840 18.109 3.299 1.670 10.648

Solar 144 0.348 0.326 7.461 0.431 0.485 11.091 0.365 0.362 8.290 0.546 0.513 11.742 0.351 0.371 8.487
288 0.312 0.331 8.355 0.437 0.477 12.035 0.405 0.397 10.007 0.368 0.368 9.289 0.345 0.356 8.988
720 0.315 0.342 8.793 0.400 0.525 13.497 0.577 0.537 13.803 0.339 0.441 11.352 0.350 0.357 9.176
864 0.310 0.297 7.053 0.546 0.607 14.423 0.994 0.897 21.299 0.813 0.478 11.367 0.349 0.357 8.488
Traffic 168 0.156 0.259 0.835 0.431 0.485 1.561 1.814 1.159 3.729 0.750 0.644 2.071 0.509 0.528 1.700
336 0.158 0.261 0.847 0.437 0.477 1.548 1.799 1.153 3.738 0.629 0.573 1.857 0.517 0.529 1.714
720 0.318 0.437 1.457 0.400 0.525 1.751 1.817 1.150 3.836 0.671 0.604 2.014 0.526 0.533 1.779
960 0.277 0.397 1.299 0.546 0.607 1.986 1.821 1.165 3.809 1.950 1.116 3.649 0.523 0.532 1.740

The model's predictive accuracy is assessed by MSE, MAE, and MASE. All datasets were standardized using the mean and standard deviation values of the training set. The best and suboptimal predictive performance over the comparison is shown in bold and italics, respectively.

Additionally, Diviner gains a low degradation rate of $\mathrm{dMSE}_{1}^{30} = 0.439\%$ and $\mathrm{dMAE}_{1}^{30} = 0.167\%$, showing its ability to harness historical information within time series. The predictive performances and degradation rates of all baseline approaches have been provided in Supplementary Table S2. Our model can synthesize context and texture to balance both short-term and long-term cases, ensuring its accurate and robust long-term prediction.

Electricity transformer temperature prediction. The ETT dataset contains two-year data with six power load features from two counties in China, and oil temperature is our target. Its split ratio of training/validation/test set is 12/4/4 months39. The ETT dataset is divided into two separate datasets at the 1-hour level {ETTh1, ETTh2} and the 15-minute level ETTm1. Therefore, we can study the performance of the models under different granularities, where the prediction steps {96, 288, 672} of ETTm1 align with the prediction steps {24, 48, 168} of ETTh1. Our experiments show that Diviner achieves the best performance in both cases. In the hour-level case, Diviner outperforms the baselines with Autoformer the closest competitor (MSE: 0.110 → 0.082, MAE: 0.247 → 0.216). When the hour-level granularity turns to the minute-level case, Diviner outperforms Autoformer by a large margin (MSE: 0.092 → 0.064, MAE: 0.239 → 0.194). The predictive performances of all baseline approaches, and how they change when the hour-level granularity turns into the minute-level granularity, have been provided in Supplementary Table S3. These demonstrate the capacity of Diviner in processing time series of different granularity. Furthermore, granularity is also a manifestation of scale. These results demonstrate that modeling multi-scale features is conducive to dealing with time series of different granularity.

Consumer electricity consumption prediction. The ECL dataset records the two-year electricity consumption of 321 clients, which is converted into hour-level consumption owing to the missing data, and MT-320 is the target feature62. We predict different time horizons of {7, 14, 30, 40} days, aligned with {168, 336, 720, 960} prediction steps ahead. Next, we analyze the experimental results according to the prediction spans (≤360 as short-term prediction, ≥360 as long-term prediction). NBeats achieves the best forecasting performance for short-term electricity consumption prediction, while Diviner surpasses it in the long-term prediction case. The short-term and long-term performance of all approaches has been provided in Supplementary Table S4. Statistically, the proposed method outperforms the best baseline (NBeats) by decreasing 17.43% MSE (0.367 → 0.303) and 15.14% MAE (0.482 → 0.409) at 720 steps ahead, and 6.56% MSE (0.457 → 0.427) and 9.44% MAE (0.540 → 0.489) at 960 steps ahead. We attribute this to scalability, where different models converge to perform similarly in the short-term case, but their differences emerge when the prediction span becomes longer.

Gold price prediction. The Exchange dataset contains 5-year closing prices of a troy ounce of gold in the US recorded daily from 2016 to 2021. Due to the high-frequency fluctuation of the market price, the predictive goal is to predict its general trend reasonably (https://www.lbma.org.uk). To this end, we perform a long-term prediction of {10, 20, 30, 60} days. The experimental results clearly show apparent performance degradation for most baseline models. Given a history of 90 days, only Autoformer and Diviner can predict with MAE and MSE errors lower than 1 when the prediction span is 60 days. However, Diviner still outperforms other methods with a 38.94% average MSE reduction (0.588 → 0.359) and a 22.73% average MAE reduction (0.607 → 0.469) and achieves the best forecast performance. The predictive performance of all baseline approaches has been provided in Supplementary Table S5. These results indicate the adaptability of Diviner to the rapid evolution of financial markets and its reasonable extrapolation, considering that it is generally difficult to predict the financial system.

Solar energy production prediction. The Solar dataset contains the 10-minute level 1-year (2006) solar power production data of 137 PV plants in Alabama State, and PV-136 is the target feature (http://www.nrel.gov). Given that the amount of solar energy produced daily is generally stable, conducting a super long-term prediction is unnecessary. Therefore, we set the prediction horizon to {1, 2, 5, 6} days, aligned with {144, 288, 720, 864} prediction steps ahead. Furthermore, this characteristic of solar energy means that its production series tend to be stationary, and thereby the comparison of the predictive performances between different models on this dataset presents their basic series modeling abilities. Concretely, considering that the MASE error can be used to assess the model's performance on different series, we calculate and sort each model's average MASE error under different prediction horizon settings to measure the time series modeling ability (provided in Supplementary Table S6). The results are as follows: Diviner > NBeats > Transformer > Autoformer > Informer > LSTM, where Diviner surpasses all Transformer-based models in the selected baselines. Provided that the series data is not that non-stationary, the advantages of Autoformer's modeling of time series non-stationarity are not apparent. At the same time, capturing stable long- and short-term dependencies is still effective.

Road occupancy rate prediction. The Traffic dataset contains the hourly 2-year (2015–2016) road occupancy rate collected from 862 sensors on San Francisco Bay area freeways by the California Department of Transportation, where sensor-861 is the target feature (http://pems.dot.ca.gov). The prediction horizon is set to {7, 14, 30, 40} days, aligned with {168, 336, 720, 960} prediction steps ahead. Considering that the road occupancy rate tends to have a weekly cycle, we use this dataset to compare different networks' ability to model the temporal cycle. During the comparison, we mainly focus on the following two groups of deep learning models: group-1 takes the non-stationary specialization of time series into account (Diviner, Autoformer), and group-2 does not employ any time-series-specific components (Transformer, Informer, LSTMa). We find that group-1 gains a significant performance improvement over group-2, which suggests the necessity of modeling non-stationarity. As for the proposed Diviner model, it achieves a 27.64% MAE reduction (0.604 → 0.437) relative to the Transformer model when forecasting 30-day road occupancy rates. Subsequently, we conduct an intra-group comparison for group-1, where Diviner still gains an average 35.37% MAE reduction (0.523 → 0.338) relative to Autoformer. The predictive performance of all approaches has been provided in Supplementary Table S7. We attribute this to Diviner's multiple-scale modeling of non-stationarity, while the trend-seasonal decomposition of Autoformer merely reflects time series variation at particular scales. These experimental results demonstrate that Diviner is competent in predicting time series data with cycles.

Discussion
We study the long-term 5G network traffic prediction problem by modeling non-stationarity with deep learning techniques. Although some literature63–65 in the early stage argues that the probabilistic traffic forecast under uncertainty is more suitable for the varying network traffic than a concrete forecast produced by time series models, the probabilistic traffic forecast and the concrete traffic forecast share the same historical information in essence.


Moreover, the development of time series forecasting techniques in recent years has witnessed a series of works employing time series forecasting techniques for practical applications such as bandwidth management14,15, resource allocation16, and resource provisioning17, where the time series prediction-based methods can provide detailed network traffic forecasts. However, existing time series forecasting methods suffer a severe performance degeneration since the long-term prediction horizon exposes the non-stationarity of time series, which raises several challenges: (a) multi-scale temporal variations, (b) random factors, and (c) data distribution shift.

Therefore, this paper attempts to challenge the problem of achieving a precise long-term prediction for non-stationary time series. We start from the fundamental property of time series non-stationarity and introduce deep stationary processes into a neural network, which models multi-scale stable regularities within non-stationary time series. We argue that capturing the stable features is a recipe for generating non-stationary forecasts conforming to historical regularities. The stable features enable networks to restrict the latent space of time series, which deals with varying distribution problems. Extensive experiments on network traffic prediction and other real-world scenarios demonstrate its advances over existing prediction-based models. Its advantages are summarized as follows. (a) Diviner brings a salient improvement on both long- and short-term prediction and achieves state-of-the-art performance. (b) Diviner can perform robustly regardless of the selection of prediction span and granularity, showing great potential for long-term forecasting. (c) Diviner maintains a strong generalization in various fields. The performance of most baselines might degrade precipitously in some or other areas. In contrast, our model distinguishes itself for consistent performance on each benchmark.

This work explores an avenue to obtain detailed and precise long-term 5G network traffic forecasts, which can be used to calculate the time network traffic might overflow the capacity and helps operators formulate network construction schemes months in advance. Furthermore, Diviner generates long-term network traffic forecasts at the minute level, facilitating its broader applications for resource provisioning, allocating, and monitoring. Decision-makers can harness long-term predictions to allocate and optimize network resources. Another practical application is to achieve an automatic network status monitoring system, which automatically alarms when real network traffic exceeds a permitted range around predictions. This system supports targeted port-level early warning and assists workers in troubleshooting in time, which can bring substantial efficiency improvement considering the tens of millions of network ports running online. In addition to 5G networks, we have expanded our solution to broader engineering fields such as electricity, climate, control, economics, energy, and transportation. Predicting oil temperature can help prevent the transformer from overheating, which affects the insulation life of the transformer and ensures proper operation66,67. In addition, long-term meteorological prediction helps to select and seed crops in agriculture. As such, we can discover unnoticed regularities within historical series data, which might bring opportunities to traditional industries.

One limitation of our proposed model is that it suffers from critical transitions of data patterns. We attribute this to external factors, whose information is generally not included in the measured data53,55,68. Our method is helpful in the intrinsic regularity discovery within the time series but cannot predict patterns not previously recorded in the real world. Alternatively, we can use dynamic network methods69–71 to detect such critical transitions in the time series53. Furthermore, the performance of Diviner might be similar to other deep learning models if given few history series or in the short-term prediction case. The former contains insufficient information to be exploited, and the short-term prediction needs more problem scalability, whereas the advantages of our model become apparent in long-term forecasting scenarios.

Methods
Preliminaries. We denote the original form of the time-series data as $X = [x_1\ x_2\ \cdots\ x_n]$, $x_i \in \mathbb{R}$. The original time series data X is reshaped to a matrix form as $\widetilde{X} = [\tilde{x}_1\ \tilde{x}_2\ \cdots\ \tilde{x}_K]$, where $\tilde{x}_i$ is a vector of length T with the time series data per day/week/month/year, K denotes the number of days/weeks/months/years, and $\tilde{x}_i \in \mathbb{R}^T$. After that, we can represent the seasonal pattern as $\tilde{x}_i$ and use its variation between adjacent time steps to model trends, shown as the following:

$$\tilde{x}_{t_2} = \tilde{x}_{t_1} + \sum_{t=t_1}^{t_2-1} \Delta\tilde{s}_t, \qquad \Delta\tilde{s}_t = \tilde{x}_{t+1} - \tilde{x}_t, \qquad (10)$$

where $\Delta\tilde{s}_t$ denotes the change of the seasonal pattern, $\Delta\tilde{s}_t \in \mathbb{R}^T$. The shift reflects the variation between small time steps, but when such variation (shift) builds up over a rather long period, the trend comes out; it can be obtained as $\sum_{t=t_1}^{t_2-1} \Delta\tilde{s}_t$. Therefore, we can model trends by capturing the long- and short-range dependencies of shifts among different time steps.

Next, we introduce a smoothing filter attention mechanism to construct multi-scale transformation layers. A difference attention module is mounted to capture and interconnect shifts of the corresponding scale. These mechanisms make our Diviner capture multi-scale variations in non-stationary time series, and the mathematical description is listed below.

Diviner input layer. Given the time series data X, we transform X into $\widetilde{X} = [\tilde{x}_1\ \tilde{x}_2\ \cdots\ \tilde{x}_K]$, where $\tilde{x}_i$ is a vector of length T with the time series data per day (seasonal), and K denotes the number of days, $\tilde{x}_i \in \mathbb{R}^T$, $\widetilde{X} \in \mathbb{R}^{T \times K}$. Then we construct the dual input for Diviner. Noticing that Diviner adopts an encoder-decoder architecture, we construct $X^{in}_{en}$ for the encoder and $X^{in}_{de}$ for the decoder, where $X^{in}_{en} = [\tilde{x}_1\ \tilde{x}_2\ \cdots\ \tilde{x}_K]$, $X^{in}_{de} = [\tilde{x}_{K-K_{de}+1}\ \cdots\ \tilde{x}_K]$, $X^{in}_{en} \in \mathbb{R}^{T \times K}$, and $X^{in}_{de} \in \mathbb{R}^{T \times K_{de}}$. This means that $X^{in}_{en}$ takes all elements from $\widetilde{X}$ while $X^{in}_{de}$ takes only the latest $K_{de}$ elements. After that, a fully connected layer on $X^{in}_{en}$ and $X^{in}_{de}$ is used to obtain $E^{in}_{en}$ and $E^{in}_{de}$, where $E^{in}_{en} \in \mathbb{R}^{d_m \times K}$, $E^{in}_{de} \in \mathbb{R}^{d_m \times K_{de}}$, and $d_m$ denotes the model dimension.

Smoothing filter attention mechanism. Inspired by Nadaraya-Watson regression51,52 bringing the adjacent points closer together, we introduce the smoothing filter attention mechanism with a learnable kernel function and self-masked architecture, where the former brings similar items closer to filter out the random component and adjust the non-stationary data to stable features, and the latter reduces outliers. The smoothing filter attention mechanism is implemented based on the input $E = [\xi_1\ \xi_2\ \cdots\ \xi_{K_{in}}]$, where $\xi_i \in \mathbb{R}^{d_m}$, E is the general reference to the input of each layer, for the encoder $K_{in} = K$, and for the decoder $K_{in} = K_{de}$. Specifically, $E^{in}_{en}$ and $E^{in}_{de}$ are, respectively, the input of the first encoder and decoder layer. The calculation process is shown as follows:

$$\eta_i = \frac{\sum_{j \ne i} K(\xi_i, \xi_j) \odot \xi_j}{\sum_{j \ne i} K(\xi_i, \xi_j)}, \qquad (11)$$

$$K(\xi_i, \xi_j) = \exp\!\left(w_i \odot (\xi_i - \xi_j)^2\right), \qquad (12)$$

where $w_i \in \mathbb{R}^{d_m}$, $i \in [1, K_{in}]$, denotes the learnable parameters, ⊙ denotes the element-wise multiplication, and $(\cdot)^2$ denotes the element-wise square (the division in Eq. (11) is likewise element-wise). To simplify the representation, we denote the smoothing filter attention mechanism as Smoothing-Filter(E) and its output as $H_s$. Before introducing our difference attention module, we first define the difference of a matrix and its inverse operation, CumSum.

Difference and CumSum operation. Given a matrix $M \in \mathbb{R}^{m \times n}$, $M = [m_1\ m_2\ \cdots\ m_n]$, the difference of M is defined as:

$$\Delta M = [\Delta m_1\ \Delta m_2\ \cdots\ \Delta m_n], \qquad (13)$$

where $\Delta m_i = m_{i+1} - m_i$, $\Delta m_i \in \mathbb{R}^m$, $i \in [1, n)$, and we pad $\Delta m_n$ with $\Delta m_{n-1}$ to keep a fixed length before and after the difference operation. The CumSum operation Σ toward M is defined as:

$$\Sigma M = [\Sigma m_1\ \Sigma m_2\ \cdots\ \Sigma m_n], \qquad (14)$$

where $\Sigma m_i = \sum_{j=1}^{i} m_j$, $\Sigma m_i \in \mathbb{R}^m$.

COMMUNICATIONS ENGINEERING | (2023)2:33 | https://doi.org/10.1038/s44172-023-00081-4 | www.nature.com/commseng 9


ARTICLE COMMUNICATIONS ENGINEERING | https://doi.org/10.1038/s44172-023-00081-4

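The following sketch gives one way to read Eqs. (11)-(14) in code. It is a simplified illustration, not the released implementation: sequence positions are treated as rows, the sign of the kernel exponent is absorbed into the unconstrained learnable parameter w, and the class and function names are ours.

```python
import torch
import torch.nn as nn

class SmoothingFilterAttention(nn.Module):
    """Eqs. (11)-(12): self-masked, kernel-weighted average over the other
    positions, with a learnable per-position, per-dimension kernel weight w_i."""
    def __init__(self, k_in: int, d_m: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(k_in, d_m))   # w_i in R^{d_m}

    def forward(self, E: torch.Tensor) -> torch.Tensor:              # E: (K_in, d_m)
        diff2 = (E.unsqueeze(1) - E.unsqueeze(0)) ** 2                # (K_in, K_in, d_m)
        log_k = self.w.unsqueeze(1) * diff2                           # element-wise kernel logits
        mask = torch.eye(E.size(0), dtype=torch.bool, device=E.device)
        log_k = log_k.masked_fill(mask.unsqueeze(-1), float("-inf"))  # self-masked: exclude j == i
        weights = torch.softmax(log_k, dim=1)                         # normalise over j
        return (weights * E.unsqueeze(0)).sum(dim=1)                  # eta_i, shape (K_in, d_m)

def difference(M: torch.Tensor) -> torch.Tensor:
    """Eq. (13): first difference along the sequence, padded to keep the length."""
    d = M[1:] - M[:-1]
    return torch.cat([d, d[-1:]], dim=0)

def cumsum(M: torch.Tensor) -> torch.Tensor:
    """Eq. (14): the inverse CumSum operation."""
    return torch.cumsum(M, dim=0)

H_s = SmoothingFilterAttention(30, 512)(torch.randn(30, 512))   # stabilised features H_s
```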
Intuitively, the differential attention module can be seen as an attention mechanism plugged between these two operations; it is described mathematically as follows.

Differential attention module. The input of this module involves three elements: Q, K, V. The triplet (Q, K, V) varies between the encoder and decoder: it is $(H^{en}_s, H^{en}_s, H^{en}_s)$ for the encoder and $(H^{de}_s, E^{out}_{en}, E^{out}_{en})$ for the decoder, where $E^{out}_{en}$ is the embedded result of the final encoder block (assigned in the pseudo-code), $H^{en}_s \in \mathbb{R}^{d_m \times K}$, $H^{de}_s \in \mathbb{R}^{d_m \times K_{de}}$, and $E^{out}_{en} \in \mathbb{R}^{d_m \times K}$.

$$Q^{(i)}_s,\ K^{(i)}_s,\ V^{(i)}_s = W^{(i)}_q \Delta Q + b^{(i)}_q,\ W^{(i)}_k \Delta K + b^{(i)}_k,\ W^{(i)}_v \Delta V + b^{(i)}_v, \qquad (15)$$

$$\widetilde{V}^{(i)}_s = V^{(i)}_s \cdot \mathrm{SoftMax}\!\left(\frac{Q^{(i)\top}_s K^{(i)}_s}{\sqrt{d_m}}\right), \qquad (16)$$

$$D = \Sigma\!\left(W_s \left[\widetilde{V}^{(1)\top}_s\ \widetilde{V}^{(2)\top}_s\ \cdots\ \widetilde{V}^{(h)\top}_s\right]^{\top}\right), \qquad (17)$$

where $W^{(i)}_q, W^{(i)}_k, W^{(i)}_v \in \mathbb{R}^{d_a \times d_m}$, $W_s \in \mathbb{R}^{d_m \times h d_a}$, $D \in \mathbb{R}^{d_m \times K}$, $i \in [1, h]$, and $h$ denotes the number of parallel attention heads. $[\cdot]$ denotes matrix concatenation, $\widetilde{V}^{(i)}_s$ denotes the deep shift, and $D$ denotes the deep trend. We denote the differential attention module as Differential-attention(Q, K, V) to ease representation.

Convolution Generator. The final output of Diviner is calculated through convolutional layers, called the one-step generator, which takes the output of the final decoder layer $E^{out}_{de}$ as its input:

$$R_{predict} = \mathrm{ConvNet}\!\left(E^{out}_{de}\right), \qquad (18)$$

where $R_{predict} \in \mathbb{R}^{d_m \times K_r}$, $E^{out}_{de} \in \mathbb{R}^{d_m \times K_{de}}$, and ConvNet is a multilayer fully convolutional net whose input and output channels are the input length of the decoder, $K_{de}$, and the prediction length, $K_r$, respectively.

Pseudo-code of Diviner. For convenience of reproduction, we summarize the framework of Diviner in the following pseudo-code:
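As a rough, hedged illustration of that framework rather than the authors' own pseudo-code, the first sketch below implements the differential attention module of Eqs. (15)-(17) and the one-step convolution generator of Eq. (18). The head dimension, head count, and convolution depth are assumptions.

```python
import math
import torch
import torch.nn as nn

def difference(M: torch.Tensor) -> torch.Tensor:
    """Eq. (13): first difference along the sequence, padded to keep the length."""
    d = M[1:] - M[:-1]
    return torch.cat([d, d[-1:]], dim=0)

class DifferentialAttention(nn.Module):
    """Eqs. (15)-(17): multi-head attention applied to differenced Q/K/V (the
    'deep shift'), followed by a cumulative sum restoring the 'deep trend'."""
    def __init__(self, d_m: int, d_a: int = 64, h: int = 8):
        super().__init__()
        self.h, self.d_a = h, d_a
        self.proj_q = nn.Linear(d_m, h * d_a)           # stacks W_q^(i), b_q^(i) over heads
        self.proj_k = nn.Linear(d_m, h * d_a)
        self.proj_v = nn.Linear(d_m, h * d_a)
        self.w_s = nn.Linear(h * d_a, d_m, bias=False)  # W_s
        self.scale = math.sqrt(d_m)

    def forward(self, Q, K, V):                         # (L_q, d_m), (L_kv, d_m), (L_kv, d_m)
        q = self.proj_q(difference(Q)).view(-1, self.h, self.d_a).transpose(0, 1)
        k = self.proj_k(difference(K)).view(-1, self.h, self.d_a).transpose(0, 1)
        v = self.proj_v(difference(V)).view(-1, self.h, self.d_a).transpose(0, 1)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)
        shift = (attn @ v).transpose(0, 1).reshape(Q.size(0), self.h * self.d_a)
        return torch.cumsum(self.w_s(shift), dim=0)     # deep trend D, shape (L_q, d_m)

class ConvGenerator(nn.Module):
    """Eq. (18): fully convolutional one-step generator whose channels map the
    decoder length K_de to the prediction length K_r."""
    def __init__(self, K_de: int, K_r: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(K_de, K_r, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(K_r, K_r, kernel_size=3, padding=1),
        )

    def forward(self, E_de_out: torch.Tensor) -> torch.Tensor:   # (K_de, d_m)
        return self.net(E_de_out.unsqueeze(0)).squeeze(0)        # (K_r, d_m)
```

Under the same caveats, a second sketch wires the components together in the encoder-decoder arrangement described above. SmoothingFilterAttention refers to the earlier sketch; the layer counts, the shared embedding, the omission of residual and normalisation details, and the final output head are assumptions.

```python
class DivinerSketch(nn.Module):
    """Hedged end-to-end assembly: dual-input embedding, stacked
    smoothing-filter + differential-attention blocks, and the generator."""
    def __init__(self, T, K, K_de, K_r, d_m=512, n_enc=2, n_dec=1):
        super().__init__()
        self.embed = nn.Linear(T, d_m)
        self.enc = nn.ModuleList(
            [nn.ModuleList([SmoothingFilterAttention(K, d_m),
                            DifferentialAttention(d_m)]) for _ in range(n_enc)])
        self.dec = nn.ModuleList(
            [nn.ModuleList([SmoothingFilterAttention(K_de, d_m),
                            DifferentialAttention(d_m)]) for _ in range(n_dec)])
        self.generator = ConvGenerator(K_de, K_r)
        self.head = nn.Linear(d_m, T)                    # assumed map back to daily traffic values

    def forward(self, X_en, X_de):                       # (K, T), (K_de, T)
        e = self.embed(X_en)
        for smooth, attn in self.enc:                    # encoder uses (H_s, H_s, H_s)
            h = smooth(e)
            e = attn(h, h, h)
        d = self.embed(X_de)
        for smooth, attn in self.dec:                    # decoder attends to the encoder output
            h = smooth(d)
            d = attn(h, e, e)
        return self.head(self.generator(d))              # (K_r, T): K_r predicted days

# Toy sizes only, to keep the example light.
model = DivinerSketch(T=96, K=120, K_de=30, K_r=30, d_m=128)
forecast = model(torch.randn(120, 96), torch.randn(30, 96))   # -> (30, 96)
```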
Data availability
The datasets supporting our work have been deposited at https://doi.org/10.5281/zenodo.7827077. However, restrictions apply to the availability of NPT data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of China Information Technology Designing Consulting Institute.

Code availability
Codes are available at https://doi.org/10.5281/zenodo.7825740.

Received: 7 September 2022; Accepted: 10 May 2023;

References
1. Jovović, I., Husnjak, S., Forenbacher, I. & Maček, S. Innovative application of 5G and blockchain technology in industry 4.0. EAI Endorsed Trans. Ind. Netw. Intell. Syst. 6, e4 (2019).
2. Osseiran, A. et al. Scenarios for 5G mobile and wireless communications: the vision of the METIS project. IEEE Commun. Mag. 52, 26–35 (2014).
3. Wu, G., Yang, C., Li, S. & Li, G. Y. Recent advances in energy-efficient networks and their application in 5G systems. IEEE Wirel. Commun. 22, 145–151 (2015).
4. Hui, H., Ding, Y., Shi, Q., Li, F. & Yan, J. 5G network-based internet of things for demand response in smart grid: a survey on application potential. Appl. Energy 257, 113972 (2020).
5. Johansson, N. A., Wang, Y., Eriksson, E. & Hessler, M. Radio access for ultra-reliable and low-latency 5G communications. in Proceedings of IEEE International Conference on Communication Workshop, 1184–1189 (2015).
6. Yilmaz, O., Wang, Y., Johansson, N. A., Brahmi, N. & Sachs, J. Analysis of ultra-reliable and low-latency 5G communication for a factory automation use case. in Proceedings of IEEE International Conference on Communication Workshop (2015).
7. Fernández, M. L., Huertas, C. A., Gil, P. M., García, C. F. J. & Martínez, P. G. Dynamic management of a deep learning-based anomaly detection system for 5G networks. J. Ambient Intell. Hum. Comput. 10, 3083–3097 (2019).
8. O'Connell, E., Moore, D. & Newe, T. Challenges associated with implementing 5G in manufacturing. Telecom 1, 48–67 (2020).
9. Oughton, E. J., Frias, Z., van der Gaast, S. & van der Berg, R. Assessing the capacity, coverage and cost of 5G infrastructure strategies: analysis of the Netherlands. Telemat. Inform. 37, 50–69 (2019).
10. Gupta, A. & Jha, R. K. A survey of 5G network: architecture and emerging technologies. IEEE Access 3, 1206–1232 (2015).
11. Wang, C. et al. Cellular architecture and key technologies for 5G wireless communication networks. IEEE Commun. Mag. 52, 122–130 (2014).
12. Li, Q. C., Niu, H., Papathanassiou, A. T. & Wu, G. 5G network capacity: key elements and technologies. IEEE Vehicular Technol. Mag. 9, 71–78 (2014).
13. Liu, H. Research on resource allocation and optimization technology in 5G communication network. in Proceedings of International Conference on Consumer Electronics and Computer Engineering, 209–212 (2022).
14. Yoo, W. & Sim, A. Time-series forecast modeling on high-bandwidth network measurements. J. Grid Comput. 14, 463–476 (2016).
15. Wei, Y., Wang, J. & Wang, C. A traffic prediction based bandwidth management algorithm of a future internet architecture. in Proceedings of International Conference on Intelligent Networks and Intelligent Systems, 560–563 (2010).
16. Garroppo, R. G., Giordano, S., Pagano, M. & Procissi, G. On traffic prediction for resource allocation: a Chebyshev bound based allocation scheme. Comput. Commun. 31, 3741–3751 (2008).
17. Bega, D., Gramaglia, M., Fiore, M., Banchs, A. & Costa-Pérez, X. DeepCog: optimizing resource provisioning in network slicing with AI-based capacity forecasting. IEEE J. Sel. Areas Commun. 38, 361–376 (2019).
18. Hassidim, A., Raz, D., Segalov, M. & Shaqed, A. Network utilization: the flow view. in Proceedings of 2013 IEEE INFOCOM, 1429–1437 (2013).
19. Box, G., Jenkins, G., Reinsel, G. & Ljung, G. Time Series Analysis: Forecasting and Control (John Wiley & Sons, America, 2015).
20. Box, G. E. & Jenkins, G. M. Some recent advances in forecasting and control. J. R. Stat. Soc. C 17, 91–109 (1968).
21. Moayedi, H. & Masnadi-Shirazi, M. ARIMA model for network traffic prediction and anomaly detection. in Proceedings of International Symposium on Information Technology, vol. 4, 1–6 (2008).
22. Azari, A., Papapetrou, P., Denic, S. & Peters, G. Cellular traffic prediction and classification: a comparative evaluation of LSTM and ARIMA. in Proceedings of International Conference on Discovery Science, 129–144 (2019).
23. Tikunov, D. & Nishimura, T. Traffic prediction for mobile network using Holt-Winter's exponential smoothing. in Proceedings of International Conference on Software, Telecommunications and Computer Networks, 1–5 (2007).
24. Shu, Y., Yu, M., Yang, O., Liu, J. & Feng, H. Wireless traffic modeling and prediction using seasonal ARIMA models. IEICE Trans. Commun. 88, 3992–3999 (2005).
25. Rafsanjani, M. K., Rezaei, A., Shahraki, A. & Saeid, A. B. QARIMA: a new approach to prediction in queue theory. Appl. Math. Comput. 244, 514–525 (2014).
26. Taylor, S. & Letham, B. Forecasting at scale. Am. Stat. 72, 37–45 (2018).
27. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
28. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
29. Salinas, D., Flunkert, V., Gasthaus, J. & Januschowski, T. DeepAR: probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 36, 1181–1191 (2020).
30. Qin, Y. et al. A dual-stage attention-based recurrent neural network for time series prediction. in Proceedings of International Joint Conference on Artificial Intelligence, 2627–2633 (2017).
31. Mona, S., Mazin, E., Stefan, L. & Maja, R. Modeling irregular time series with continuous recurrent units. Proc. Int. Conf. Mach. Learn. 162, 19388–19405 (2022).
32. Kashif, R., Calvin, S., Ingmar, S. & Roland, V. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. in Proceedings of International Conference on Machine Learning, vol. 139, 8857–8868 (2021).
33. Alasdair, T., Alexander, P. M., Cheng, S. O. & Xie, L. Radflow: a recurrent, aggregated, and decomposable model for networks of time series. in Proceedings of International World Wide Web Conference, 730–742 (2021).
34. Ling, F. et al. Multi-task machine learning improves multi-seasonal prediction of the Indian Ocean Dipole. Nat. Commun. 13, 1–9 (2022).
35. Vaswani, A. et al. Attention is all you need. Proc. Annu. Conf. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
36. Alexandre, D., Étienne, M. & Nicolas, C. TACTiS: transformer-attentional copulas for time series. in Proceedings of International Conference on Machine Learning, vol. 162, 5447–5493 (2022).
37. Tung, N. & Aditya, G. Transformer neural processes: uncertainty-aware meta learning via sequence modeling. in Proceedings of International Conference on Machine Learning, vol. 162, 16569–16594 (2022).
38. Wen, Q. et al. Transformers in time series: a survey. CoRR (2022).
39. Zhou, H. et al. Informer: beyond efficient transformer for long sequence time-series forecasting. in Proceedings of AAAI Conference on Artificial Intelligence (2021).
40. Kitaev, N., Kaiser, L. & Levskaya, A. Reformer: the efficient transformer. in Proceedings of International Conference on Learning Representations (2019).
41. Li, S. et al. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. in Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, vol. 32, 5244–5254 (2019).
42. Wu, H., Xu, J., Wang, J. & Long, M. Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. in Proceedings of Annual Conference on Neural Information Processing Systems, vol. 34, 22419–22430 (2021).
43. Zhou, T. et al. FEDformer: frequency enhanced decomposed transformer for long-term series forecasting. in Proceedings of International Conference on Machine Learning, vol. 162, 27268–27286 (2022).
44. Liu, S. et al. Pyraformer: low-complexity pyramidal attention for long-range time series modeling and forecasting. in Proceedings of International Conference on Learning Representations (ICLR) (2021).
45. Liu, M. et al. SCINet: time series modeling and forecasting with sample convolution and interaction. in Proceedings of Annual Conference on Neural Information Processing Systems (2022).
46. Wang, Z. et al. Learning latent seasonal-trend representations for time series forecasting. in Proceedings of Annual Conference on Neural Information Processing Systems (2022).
47. Xie, C. et al. Trend analysis and forecast of daily reported incidence of hand, foot and mouth disease in Hubei, China by Prophet model. Sci. Rep. 11, 1–8 (2021).
48. Cox, D. R. & Miller, H. D. The Theory of Stochastic Processes (Routledge, London, 2017).
49. Dette, H. & Wu, W. Prediction in locally stationary time series. J. Bus. Econ. Stat. 40, 370–381 (2022).
50. Wold, H. O. On prediction in stationary time series. Ann. Math. Stat. 19, 558–567 (1948).
51. Watson, G. S. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 359–372 (1964).
52. Nadaraya, E. A. On estimating regression. Theory Probab. Appl. 9, 141–142 (1964).
53. Chen, P., Liu, R., Aihara, K. & Chen, L. Autoreservoir computing for multistep ahead prediction based on the spatiotemporal information transformation. Nat. Commun. 11, 1–15 (2020).
54. Lu, J., Wang, Z., Cao, J., Ho, D. W. & Kurths, J. Pinning impulsive stabilization of nonlinear dynamical networks with time-varying delay. Int. J. Bifurc. Chaos 22, 1250176 (2012).
55. Malik, N., Marwan, N., Zou, Y., Mucha, P. J. & Kurths, J. Fluctuation of similarity to detect transitions between distinct dynamical regimes in short time series. Phys. Rev. E 89, 062908 (2014).
56. Yang, R., Lai, Y. & Grebogi, C. Forecasting the future: is it possible for adiabatically time-varying nonlinear dynamical systems? Chaos 22, 033119 (2012).
57. Henkel, S. J., Martin, J. S. & Nardari, F. Time-varying short-horizon predictability. J. Financ. Econ. 99, 560–580 (2011).
58. Wu, N., Green, B., Ben, X. & O'Banion, S. Deep transformer models for time series forecasting: the influenza prevalence case. Preprint at arXiv https://doi.org/10.48550/arXiv.2001.08317 (2020).
59. Lea, C., Flynn, M. D., Vidal, R., Reiter, A. & Hager, G. D. Temporal convolutional networks for action segmentation and detection. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 156–165 (2017).
60. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. in Proceedings of International Conference on Learning Representations (2015).
61. Oreshkin, B. N., Carpov, D., Chapados, N. & Bengio, Y. N-BEATS: neural basis expansion analysis for interpretable time series forecasting. in Proceedings of International Conference on Learning Representations (2020).
62. Li, S. et al. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. in Proceedings of Annual Conference on Neural Information Processing Systems 32 (2019).
63. Geary, N., Antonopoulos, A., Drakopoulos, E., O'Reilly, J. & Mitchell, J. A framework for optical network planning under traffic uncertainty. in Proceedings of International Workshop on Design of Reliable Communication Networks, 50–56 (2001).
64. Laguna, M. Applying robust optimization to capacity expansion of one location in telecommunications with demand uncertainty. Manag. Sci. 44, S101–S110 (1998).
65. Bauschert, T. et al. Network planning under demand uncertainty with robust optimization. IEEE Commun. Mag. 52, 178–185 (2014).
66. Radakovic, Z. & Feser, K. A new method for the calculation of the hot-spot temperature in power transformers with ONAN cooling. IEEE Trans. Power Deliv. 18, 1284–1292 (2003).
67. Zhou, L. J., Wu, G. N., Tang, H., Su, C. & Wang, H. L. Heat circuit method for calculating temperature rise of Scott traction transformer. High Volt. Eng. 33, 136–139 (2007).
68. Jiang, J. et al. Predicting tipping points in mutualistic networks through dimension reduction. Proc. Natl Acad. Sci. USA 115, E639–E647 (2018).
69. Chen, L., Liu, R., Liu, Z., Li, M. & Aihara, K. Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers. Sci. Rep. 2, 1–8 (2012).
70. Yang, B. et al. Dynamic network biomarker indicates pulmonary metastasis at the tipping point of hepatocellular carcinoma. Nat. Commun. 9, 1–14 (2018).
71. Liu, R., Chen, P. & Chen, L. Single-sample landscape entropy reveals the imminent phase transition during disease progression. Bioinformatics 36, 1522–1532 (2020).

Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62076016 and 12201024, Beijing Natural Science Foundation L223024.

Author contributions
Y.Y., S.G., B.Z., J.Z., and D.D. conceived the research. All authors worked on the writing of the article. Y.Y. and S.G. contributed equally to this work by performing experiments and results analysis. Z.W. and Y.Z. collected the 5G network traffic data. All authors read and approved the final paper.

Competing interests
The authors declare no competing interests.

Inclusion and ethics
No 'ethics dumping' and 'helicopter research' cases occurred in our research.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s44172-023-00081-4.

Correspondence and requests for materials should be addressed to Baochang Zhang or Juan Zhang.

Peer review information Communications Engineering thanks Akhil Gupta, Erol Egrioglu, and the other anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Miranda Vinay and Rosamund Daw. A peer review file is available.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2023