
Applying Deep Learning Approaches for Network Traffic Prediction
Vinayakumar R∗ , Soman KP∗ and Prabaharan Poornachandran†
∗ Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore
† Center for Cyber Security Systems and Networks, Amrita School of Engineering, Amritapuri
Amrita Vishwa Vidyapeetham, Amrita University, India
Email: [email protected]

Abstract—Network traffic prediction aims at predicting the subsequent network traffic by using previous network traffic data. This can serve as a proactive approach for network management and planning tasks. The family of recurrent neural network (RNN) approaches is known for time series data modeling, which aims to predict the future time series based on past information with long time lags of unrevealed size. RNN covers different network architectures such as simple RNN, long short term memory (LSTM), gated recurrent unit (GRU) and identity recurrent unit (IRNN), which are capable of learning the temporal patterns and long-range dependencies in large sequences of arbitrary length. To leverage the efficacy of RNN approaches towards traffic matrix estimation in large networks, we use various RNN networks. The performance of the various RNN networks is evaluated on real data from the GÉANT backbone network. To identify the optimal network parameters and network structure of RNN, various experiments are done. All experiments are run up to 200 epochs with learning rates in the range [0.01-0.5]. LSTM has performed well in comparison to the other RNN and classical methods. Moreover, the performance of the other RNN methods is comparable to LSTM.

Index Terms—Network traffic matrix, Prediction, Deep Learning

I. INTRODUCTION

In modern society, the Internet and its applications have become a primary communication tool for all types of users to carry out daily activities. This creates a large volume of network traffic. Understanding the traffic matrix is essential for individual network service providers, mainly because its implications are enormous. The traffic matrix provides an abstract representation of the volume of traffic that flows from every possible origin point to every destination point in a network over a definite time interval. The source and destination points can be routers, points-of-presence, internet protocol (IP) prefixes or links [1]. The traffic matrix facilitates network service providers in making various network management decisions such as network maintenance, network optimization, routing policy design, load balancing, protocol design, anomaly detection and prediction of future traffic trends [2], [3]. A detailed study on network traffic matrices is reported in [4]. A network service provider has to know the future trends of network parameters, routers and other devices in order to cope with traffic variability. To deal with the problem of predicting the future trends of network parameters, routers and other devices in a real-time network, prediction approaches have been employed.

The most commonly used approaches to obtain a traffic matrix are tomogravity [1], principal component analysis (PCA) and route change [5]. These techniques have their own disadvantages [6]. Network tomography is the well-known technique used to obtain a traffic matrix; it relies on link counts and routing information, which makes estimating the traffic matrix an ill-posed problem [6]. Ideally, network traffic parameters are predicted from their statistical characteristics, mainly because strong association relationships exist between the sequentially ordered values. These statistical features have changed considerably in current complex and diverse network architectures and their applications [7]. Such statistical features are no longer the right basis for current network traffic information because they are inadequately modeled by Poisson and Gaussian models [5]. This paper is devoted to network traffic prediction using existing data from the GÉANT backbone network.

Various methods exist for network traffic forecasting. Generally, these can be classified into linear and non-linear prediction. The most commonly used linear mechanisms are the ARMA/ARIMA models [8], [9], [10] and the Holt-Winters algorithm [8]. The most commonly used non-linear mechanism is neural networks [3], [11], [12]. [13] claimed that non-linear traffic prediction approaches are the most appropriate. The performance of various linear approaches such as ARMA, ARAR and HW and of a non-linear approach, neural networks, was evaluated; in all experimental settings, the non-linear technique performed well in comparison to the linear approaches. Generally, the best forecasting approach is chosen by considering factors such as the characteristics of the traffic matrix, a low mean squared error and a low computational cost. [13] showed that a feed forward neural network (FFN) predictor with a multiresolution learning approach can be considered the best forecasting technique with careful consideration of precision and model complexity.

A recurrent neural network (RNN) is an enhanced model of the traditional FFN [14]. It contains a self-recurrent loop which facilitates carrying information from one time step to another. This characteristic makes RNN suitable for time series and sequence modelling tasks. LSTM is a type of RNN which was proposed to alleviate the vanishing and exploding gradient issue of the traditional RNN [15].

With the aim of reducing the computational cost of LSTM, [16] introduced GRU. [17] showed that an RNN with ReLU activations and with the recurrent weight matrix initialized to the identity matrix is capable of performing close to LSTM; this was shown by evaluating 4 standard experiments, and this type of RNN is called identity-RNN (IRNN). The family of RNN techniques has performed well in various long-standing tasks such as handwriting recognition, speech recognition and various tasks related to natural language processing (NLP) [18]. Towards transferring the efficacy of the family of RNN techniques to large scale traffic matrix prediction, in this paper we apply and evaluate the effectiveness of various RNN networks on the GÉANT backbone network.

The paper is structured as follows. Section II discusses the related work. Section III provides the background information on RNN networks and traffic matrix prediction. Section IV discusses the details of the data sets and the experiments related to network parameter and network structure identification. Section V provides the detailed evaluation results. Finally, the conclusion and future work directions are placed in Section VI.

II. RELATED WORK

Different methods exist for the prediction of network traffic. [13] discussed various linear models such as ARMA, ARAR and HW and a non-linear approach, a neural network with multi-resolution learning. In all configurations of experiments, the non-linear model performed well in comparison to all the other linear models. [19] proposed FARIMA predictors using RNN that are stable in capturing the non-Gaussian nature of self-similar traffic. [20] carried out a comparative study of various prediction methods such as mode prediction, key element with matrix prediction and principal component prediction. Principal component prediction resulted in a lower average prediction error for origin-destination (OD) flows, while TMP-KEC decreased the prediction error for the most important elements and for the total matrix.

III. BACKGROUND

This section discusses the idea behind traffic matrix prediction and the mathematical details of artificial neural networks (ANN).

A. Artificial neural network (ANN)

An artificial neural network (ANN) is a computational mechanism in machine learning that is influenced by the attributes of biological neural networks. It forms a directed graph in which nodes represent biological neurons and edges represent synapses. Feed forward networks (FFN) and recurrent neural networks (RNN) are two types of ANN.

Fig. 1: Architecture of a multi-layer perceptron with inputs x = x_1, x_2, ..., x_{p-1}, x_p and outputs ot = ot_1, ot_2; all connections, hidden layers and their units are not shown.

B. Feed forward network (FFN)

A feed-forward network (FFN) forms a directed graph without forming a cycle, as shown in Fig 1. The directed graph consists of a set of neurons, called units in mathematical terms, and these neurons are connected by edges. The multi-layer perceptron (MLP) is one of the well-known FFN algorithms. It contains an input layer, an output layer and a parameterized number of hidden layers. An FFN follows a unidirectional mechanism to pass information from one layer to the next, so the weights in the hidden layers are not shared. MLP is a parameterized function ot : R^i → R^j, where i and j are the sizes of the input vector x = x_1, x_2, ..., x_{i-1}, x_i and of the output vector ot = ot_1, ot_2, ..., ot_{j-1}, ot_j respectively. Each hidden layer (hl_i) computation is defined mathematically as follows:

hl_i(x) = f(w_i^T x + b_i)    (1)

where hl_i : R^{d_{i-1}} → R^{d_i}, f : R → R, w_i ∈ R^{d_i × d_{i-1}}, b ∈ R^{d_i}, and f is either the logistic sigmoid or the hyperbolic tangent non-linear activation function. The logistic sigmoid takes values in [0, 1] whereas the hyperbolic tangent takes values in [-1, 1]:

sigmoid: σ(z) = 1 / (1 + e^{-z})    (2)

hyperbolic tangent: tanh(z) = (e^{2z} − 1) / (e^{2z} + 1)    (3)

If a network consists of l layers, then their combined representation can generally be defined as

hl(x) = hl_l(hl_{l-1}(hl_{l-2}(··· (hl_1(x)))))    (4)

d_0 = i    (5)

MLP is a parameterized function, thus optimal parameters have to be selected to achieve an acceptable detection rate. These parameters are dynamic in nature, which means they change with the data; finding optimal parameters for a given data set is itself considered a research study. We use mean squared error as the loss function in MLP to achieve a better detection rate.
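To make Eqs. (1)-(5) concrete, the following is a minimal NumPy sketch of an MLP forward pass with sigmoid activations. It is an illustration only, not the paper's implementation; the layer sizes and variable names are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    # Eq. (2): squashes values into [0, 1]
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer(x, w, b):
    # Eq. (1): hl_i(x) = f(w_i^T x + b_i)
    return sigmoid(w @ x + b)

def mlp_forward(x, params):
    # Eq. (4): compose the hidden layers hl_l(... hl_1(x) ...)
    h = x
    for w, b in params:
        h = hidden_layer(h, w, b)
    return h

rng = np.random.default_rng(0)
d = [529, 500, 500, 529]            # illustrative layer sizes; d_0 = input size (Eq. (5))
params = [(rng.normal(scale=0.1, size=(d[i + 1], d[i])), np.zeros(d[i + 1]))
          for i in range(len(d) - 1)]

x = rng.random(d[0])                # one 529-dimensional input vector
print(mlp_forward(x, params).shape) # -> (529,)
```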

The loss function is a measure of the difference between the expected value (ep) and the predicted value (pr), and it is defined over a list of correct input-output pairs i_o = (s_1, t_1), (s_2, t_2), ..., (s_n, t_n) as follows:

loss(s, t) = (1/n) Σ_{i=1}^{n} d(t_i, f(s_i))    (6)

Given a loss function, Train_{i_o}(θ) ≡ J_{i_o}(θ) = (1/n) Σ_{i=1}^{n} d(t_i, f_θ(s_i)), where θ = (w_1, b_1, ..., w_k, b_k), and we need to find the appropriate value of θ ∈ R^d that minimizes the loss function J_{i_o}(θ). This includes the estimation of f_θ(s_i) and ∇f_θ(s_i) at cost |i_o|:

min_θ J(θ)    (7)

Gradient descent is one of the most commonly used techniques to minimize the loss function. It repeatedly estimates the gradient to update the parameters. A single update on the training data is mathematically defined as follows:

θ_new = θ_old − α ∇_θ J(θ)    (8)

Here α is the learning rate, which is comparatively very small. There is no standard approach to selecting a particular value for the learning rate; the optimal value is chosen by trial and error during the training phase. The learning rate has a direct impact on the speed of the optimization: the smaller the learning rate, the more time convergence takes. Traditional machine learning classifiers follow the gradient descent approach on a convex problem surface. In the case of neural networks the problem space is non-convex, and a detailed analysis of the non-convex training mechanism is discussed by [21]. They reported that training deep networks maps the data to a high dimensional space from one layer to another; thus the critical points might grow exponentially, and sometimes they end up in a saddle point or a local optimum. To handle this, we use the backpropagation (backward propagation of errors) algorithm. Stochastic gradient descent (SGD) is the mechanism used by many deeper networks. It estimates the gradient on a random selection of very few training samples that form a mini-batch. As a result, SGD reaches the minimum much faster than GD, although it is a less exact approximation than the standard gradient method. The deeper networks of this paper use SGD, and its update rule is given by

θ_new = θ_old − α ∇_θ J(θ; s^(i), t^(i))    (9)

where (s^(i), t^(i)) is an input-output pair of training samples. Let us take a three-layer network and formulate the problem mathematically:

hl_1 = g_1(x; θ_1)    (10)

hl_2 = g_2(hl_1; θ_2)    (11)

pr = g_3(hl_2; θ_3)    (12)

where i_o = (s_1, t_1), (s_2, t_2), ..., (s_n, t_n), and to minimize the error between the predicted value (pr) and the expected value (ep) we use the loss function. To compute the partial derivatives for the above case, we use the chain rule:

∂L/∂pr = 2 (pr − ep)    (13)

∂L/∂hl_2 = (∂L/∂pr) (∂pr/∂hl_2)    (14)

∂L/∂hl_1 = (∂L/∂hl_2) (∂hl_2/∂hl_1)    (15)

∂L/∂θ_3 = (∂L/∂pr) (∂pr/∂θ_3)    (16)

∂L/∂θ_2 = (∂L/∂hl_2) (∂hl_2/∂θ_2)    (17)

∂L/∂θ_1 = (∂L/∂hl_1) (∂hl_1/∂θ_1)    (18)

C. Recurrent neural network (RNN)

A recurrent neural network (RNN) is an enhanced model of the traditional FFN, as shown in Fig 2. An RNN unit contains a self-recurrent connection that helps to carry information across time-steps. This characteristic of the RNN facilitates learning temporal dependencies. The computational flow of an RNN is mathematically formulated as follows:

hl_t = SG(w_{xhl} x_t + w_{hlhl} hl_{t−1} + b_{hl})    (19)

ot_t = SF(w_{hlot} hl_t + b_{ot})    (20)

where the w terms denote weight matrices, the b terms denote bias vectors, SG and SF are the element-wise non-linear sigmoid and softmax activation functions, and hl acts as a short-term memory for the RNN network. The RNN network is typically unfolded across time-steps, and the process of applying backpropagation on the unfolded RNN is called backpropagation through time (BPTT). An RNN trained with BPTT suffers from the vanishing and exploding gradient issue when backpropagating error gradients across many time-steps [22]. To alleviate this, [15] introduced LSTM. LSTM contains a memory block, as shown in Fig 2, that helps to reduce the vanishing and exploding gradient issue. A memory block is a complex processing unit that contains an input gate, an output gate, a forget gate and one or more memory cells. A memory cell acts as a short-term memory that stores dependencies across time-steps under the control of the different gates, i.e., the input gate, forget gate and output gate.
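The following is a small NumPy sketch of the three-layer formulation above: a forward pass per Eqs. (10)-(12), gradients obtained with the chain rule of Eqs. (13)-(18), and a single-sample SGD update per Eq. (9). It is an illustration under assumed layer sizes, with sigmoid hidden activations and a linear output, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes only: 8 inputs -> 16 -> 16 -> 1 output
d_in, d_h, d_out = 8, 16, 1
w1, b1 = rng.normal(scale=0.1, size=(d_h, d_in)),  np.zeros(d_h)
w2, b2 = rng.normal(scale=0.1, size=(d_h, d_h)),   np.zeros(d_h)
w3, b3 = rng.normal(scale=0.1, size=(d_out, d_h)), np.zeros(d_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x     = rng.random(d_in)          # one training input s
ep    = np.array([0.5])           # expected value
alpha = 0.1                       # learning rate

for step in range(200):
    # Forward pass, Eqs. (10)-(12)
    hl1 = sigmoid(w1 @ x + b1)    # hl_1 = g1(x; theta_1)
    hl2 = sigmoid(w2 @ hl1 + b2)  # hl_2 = g2(hl_1; theta_2)
    pr  = w3 @ hl2 + b3           # pr  = g3(hl_2; theta_3), linear output

    # Backward pass via the chain rule, Eqs. (13)-(15)
    dL_dpr  = 2.0 * (pr - ep)               # Eq. (13)
    dL_dhl2 = w3.T @ dL_dpr                 # Eq. (14)
    delta2  = dL_dhl2 * hl2 * (1 - hl2)     # sigmoid derivative at layer 2
    dL_dhl1 = w2.T @ delta2                 # Eq. (15)
    delta1  = dL_dhl1 * hl1 * (1 - hl1)     # sigmoid derivative at layer 1

    # Parameter gradients and SGD updates, Eqs. (16)-(18) with Eq. (9)
    w3 -= alpha * np.outer(dL_dpr, hl2); b3 -= alpha * dL_dpr
    w2 -= alpha * np.outer(delta2, hl1); b2 -= alpha * delta2
    w1 -= alpha * np.outer(delta1, x);   b1 -= alpha * delta1

print(((pr - ep) ** 2).item())    # squared error after the updates
```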

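A minimal NumPy sketch of the RNN computation in Eqs. (19)-(20) follows; it unrolls a short sequence and uses a softmax output exactly as the equations state. The dimensions and the random input sequence are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes only: 4-dimensional inputs, 8 hidden units, 3 outputs
d_x, d_h, d_o = 4, 8, 3
rng = np.random.default_rng(2)
w_xhl  = rng.normal(scale=0.1, size=(d_h, d_x))   # input-to-hidden weights
w_hlhl = rng.normal(scale=0.1, size=(d_h, d_h))   # recurrent hidden-to-hidden weights
w_hlot = rng.normal(scale=0.1, size=(d_o, d_h))   # hidden-to-output weights
b_hl, b_ot = np.zeros(d_h), np.zeros(d_o)

xs = rng.random((10, d_x))       # a sequence of 10 time-steps
hl = np.zeros(d_h)               # initial short-term memory hl_0
outputs = []
for x_t in xs:
    # Eq. (19): the hidden state carries information across time-steps
    hl = sigmoid(w_xhl @ x_t + w_hlhl @ hl + b_hl)
    # Eq. (20): output at time t
    outputs.append(softmax(w_hlot @ hl + b_ot))

print(np.stack(outputs).shape)   # -> (10, 3)
```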
Fig. 2: Architecture of a multi-layer perceptron with inputs x = x_1, x_2, ..., x_{p-1}, x_p and outputs ot = ot_1, ot_2; all connections, hidden layers and their units are not shown.

IV. EXPERIMENTS

All experiments with the RNN and LSTM networks are trained using BPTT on the graphics processing unit (GPU) enabled TensorFlow [23] computational framework on Ubuntu 14.04. Experiments with shallow networks are done using Scikit-learn [24].

A. Description of traffic matrix data set

To evaluate the effectiveness of the various RNN networks, the FFN and other classical methods, we use the publicly available and well-known GÉANT backbone network data set [25]. The GÉANT network includes 23 peer nodes and 120 undirected links (http://www.geant.net/). 2004-timeslot traffic data is sampled from the GÉANT network at 15-minute intervals. From the 10,772 traffic matrices, we arbitrarily choose 1200 traffic matrices. Each traffic matrix TM is transformed into a vector TMV of size 23*23 = 529. These vectors are concatenated to form a new traffic matrix TM of size 1200*529. The traffic matrix TM is randomly divided into two matrices: training TM_train of size 900*529 and testing TM_test of size 300*529. Various ways exist for predicting the future traffic. One way is to feed the vectors TMV in TM to the prediction algorithms and predict one value of TMV at a time. This method is not correct because OD traffic depends on the other ODs [26]. Thus capturing the patterns that exist in previous traffic can enhance the performance of traffic matrix prediction. We use a sliding window to create the training data set, as shown in Fig 3. The number of gray boxes is the length SW of the sliding window over time-slots TS for an OD pair, and the value to the right of the sliding window is the value to be predicted by the prediction algorithms. We obtain TS − SW + 1 training records by sliding the window for an OD pair with TS time slots.

Fig. 3: Sliding window

Fig. 4: MSE across hidden layers in the range [100-700]

B. Hyperparameter selection

As the LSTM network is a parameterized function, finding optimal parameters has a direct impact on minimizing the mean squared error. Initially, we divide the training data of size 900*529 into two parts: 700*529 for training and 200*529 for validation. We run experiments for FFN, RNN, LSTM, GRU and IRNN over the number of units, the number of hidden layers and the learning rate. We train three trials of experiments for 100, 200, 300, 400, 500, 600 and 700 units with 1 layer. All experiments are run for 200 epochs. The performance of each trained model, for each number of units of the FFN, RNN, LSTM, GRU and IRNN networks, is evaluated on the validation data, as shown in Fig 4. The experiments with 600 units performed well for the FFN, RNN, LSTM, GRU and IRNN networks. The recurrent networks performed well in comparison to the FFN. Moreover, LSTM performed well in comparison to the RNN network. The performance of both GRU and IRNN is comparable to the LSTM networks. However, the computational complexity is higher with LSTM than with the GRU networks; the primary reason may be that LSTM stores and updates the time dependencies across various time steps. Next, we run two trials of experiments across learning rates in the range [0.01-0.5] for the FFN, RNN, LSTM, GRU and IRNN networks. The performance is shown in Fig 5. In most of the cases, LSTM performed well in comparison to the RNN networks, and the performance of both the GRU and IRNN networks is comparable to the LSTM.
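The sliding-window construction of Section IV-A can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's code; the window length SW = 10 is an assumption, and the exact record count may differ by one from the TS − SW + 1 bookkeeping above depending on whether the final window without a target is kept.

```python
import numpy as np

def sliding_window(tm, sw):
    """Build (past-window, next-value) training pairs from a traffic matrix.

    tm: array of shape (TS, 529), one row per time-slot (flattened 23x23 matrix).
    sw: sliding-window length; each input is the previous `sw` traffic vectors and
        the target is the traffic vector of the following time-slot.
    """
    xs, ys = [], []
    for start in range(tm.shape[0] - sw):
        xs.append(tm[start:start + sw])   # window of past traffic vectors
        ys.append(tm[start + sw])         # value to predict
    return np.stack(xs), np.stack(ys)

# Illustrative shapes: 900 training time-slots, 529 OD pairs, window of 10 slots
tm_train = np.random.rand(900, 529)
x_train, y_train = sliding_window(tm_train, sw=10)
print(x_train.shape, y_train.shape)       # -> (890, 10, 529) (890, 529)
```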

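The unit and learning-rate sweeps of Section IV-B can be expressed as a simple grid search. The sketch below compresses the two sweeps into one loop and assumes the tf.keras API on top of TensorFlow; it is not the authors' implementation, and the stand-in data, window length, batch size, epoch count and sampled learning rates are assumptions.

```python
import numpy as np
import tensorflow as tf

def build_rnn(units, layers, window, features=529):
    """One grid configuration: `layers` stacked LSTM layers of `units` each."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(window, features)))
    for i in range(layers):
        # Intermediate layers return full sequences so the next LSTM layer gets one
        model.add(tf.keras.layers.LSTM(units, return_sequences=(i < layers - 1)))
    model.add(tf.keras.layers.Dense(features, activation="linear"))
    return model

# Stand-in data with the shapes used in the paper (window of 10 slots, 529 OD pairs)
x_train = np.random.rand(890, 10, 529).astype("float32")
y_train = np.random.rand(890, 529).astype("float32")

best = None
for units in [100, 200, 300, 400, 500, 600, 700]:   # unit sweep of Section IV-B
    for lr in [0.01, 0.1, 0.5]:                      # samples from the [0.01-0.5] range
        model = build_rnn(units, layers=1, window=10)
        model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss="mse")
        hist = model.fit(x_train, y_train, validation_split=0.22,   # roughly 700/200
                         epochs=5, batch_size=32, verbose=0)        # paper: 200 epochs
        val = min(hist.history["val_loss"])
        if best is None or val < best[0]:
            best = (val, units, lr)

print(best)   # (validation MSE, units, learning rate) of the best configuration
```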
Fig. 5: MSE across hidden layers in the range [100-700]

To select an appropriate network structure for the IRNN/GRU/LSTM/RNN/FFN networks, we used the following network topologies:

1) FFN/RNN/LSTM/GRU/IRNN, 1 layer
2) FFN/RNN/LSTM/GRU/IRNN, 2 layers
3) FFN/RNN/LSTM/GRU/IRNN, 3 layers
4) FFN/RNN/LSTM/GRU/IRNN, 4 layers
5) FFN/RNN/LSTM/GRU/IRNN, 5 layers
6) FFN/RNN/LSTM/GRU/IRNN, 6 layers

Two trials of experiments are run for each network structure with 500 units and a learning rate of 0.1. All experiments are run for 100 epochs. The best performing model of each network structure over the epochs is shown in Fig 5. The experiments with recurrent networks performed well in comparison to the traditional FFN, and in all network structures LSTM performed well in comparison to the RNN network.

C. Proposed architecture

Fig. 6: Architecture of the proposed LSTM network; inner units and their connections are not shown.

The LSTM architecture for predicting the future traffic matrix is shown in Fig 6. In the input layer, we pass the traffic matrix, using the sliding window approach, to the hidden recurrent LSTM layers. A hidden recurrent LSTM layer contains one or more memory blocks. A memory block is a complex processing unit that consists of one or more memory cells and a set of multiplicative gating units such as the input and output gates. A memory block is the primary unit that houses the information across time-steps. It has a built-in self-connection, called the constant error carousel (CEC), with value 1; it is triggered when it does not receive any value from the outside signal. The adaptive multiplicative gating units control the states of a memory block across time-steps. The entry or denial of an input flow of cell activation to a memory cell is controlled by the input gate, and the output state of a memory cell to other nodes is controlled by the output gate. An additional component, called the forget gate [27], was added to the memory block instead of the CEC because the internal values of a memory cell could otherwise increase without any constraint. An LSTM network with a forget gate can forget or remember its previous state values, and this has become a standard component in recent LSTM architectures. Furthermore, peephole connections were added from the internal states of a memory cell to all of its gates to learn the precise timing of the outputs [28]. The LSTM network is trained using the aforementioned technique, BPTT. A memory cell has peephole connections to all of its gates that learn the precise timing of inputs with respect to the internal states. Later, many variants of LSTM were introduced. An LSTM network has the capability to decide what to learn and what to forget across many time steps; as a result, LSTM learns the long-range temporal context in sequence data modeling across many time-steps. The output of the hidden recurrent layer is passed to a feed forward network layer, which uses a linear activation function for prediction, with MSE for estimating the error. The stacked layer LSTM is shown in Fig 7. We use more than one hidden recurrent LSTM layer instead of increasing the number of memory blocks; the primary reason is that adding hidden recurrent LSTM layers maps the data to a high dimensional space and passes it through non-linear recurrent layers to capture the information related to the time domain.

Fig. 7: Stacked LSTM

V. EVALUATION RESULTS

Based on the aforementioned hyperparameter approach, we select 5 layers with 500 units and a learning rate of 0.1 for all networks (IRNN/GRU/LSTM/RNN) to evaluate the performance on the testing data set. We train 3 models for each of FFN, RNN, LSTM, GRU and IRNN using the training data of size 900*529. The performance of the trained models is evaluated on the test data of size 300*529. The detailed result is reported in Table I.

TABLE I: Prediction performance

Algorithm   MSE
FFN         0.091
RNN         0.067
LSTM        0.042
GRU         0.051
IRNN        0.059
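A minimal tf.keras sketch of the evaluated configuration (the stacked LSTM of Fig 7 with 5 recurrent layers, 500 units, a learning rate of 0.1, and a linear output layer trained with MSE) is given below. It is an illustrative reconstruction under assumed input shapes, not the authors' TensorFlow code; the window length and batch size are assumptions.

```python
import tensorflow as tf

SW, FEATURES = 10, 529            # assumed window length; 529 = 23*23 OD pairs

model = tf.keras.Sequential(name="stacked_lstm_tm_predictor")
model.add(tf.keras.Input(shape=(SW, FEATURES)))
for i in range(5):                                    # 5 stacked recurrent LSTM layers
    model.add(tf.keras.layers.LSTM(500, return_sequences=(i < 4)))
model.add(tf.keras.layers.Dense(FEATURES, activation="linear"))   # linear prediction layer

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.summary()

# x_train/y_train and x_test/y_test would come from the sliding-window construction
# of Section IV-A applied to TM_train (900*529) and TM_test (300*529):
# model.fit(x_train, y_train, epochs=200, batch_size=32)
# mse = model.evaluate(x_test, y_test)   # compare against Table I
```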

VI. CONCLUSION

This paper has discussed data preprocessing and feeding techniques for RNN, and it evaluated the effectiveness of various RNN architectures for network traffic matrix prediction using the GÉANT backbone network. On real network data sets, LSTM performed well in comparison to the other RNN, FFN and classical methods. However, the performance of both the GRU and IRNN techniques is comparable to LSTM. Moreover, the GRU network took less computational cost than the LSTM networks. Overall, the proposed method achieved the best performance by accurately predicting the traffic matrix.

Due to the computational cost involved, we were not able to train more complex architectures. The reported results can be further improved by using architectures with a higher computational cost. Such complex network architectures could be trained using advanced hardware and a distributed training approach, which we were not able to try.

The discussed RNN models are all the same except for the computational unit in the recurrent hidden layer. RNNs are fundamentally complex networks that consist of many interactions between the computational units in the recurrent hidden layers to carry out a certain task from one layer to the other. Although the effectiveness of various RNNs has been analyzed on traffic matrix prediction data sets under different experiments, the information about the internal procedure of the operation in the network is only partly demonstrated. This can be explored by transforming the state of the network to linearized dynamics and computing and analyzing the structure of the eigenvalues and eigenvectors across many time steps [29]. Such an analysis shows which eigenvector actually carries application-specific information and is orthogonal to the invariances of the network. Overall, the context of the eigenvectors associated with the linearized weight matrix enables understanding the provenance of the network's performance at each time step.

REFERENCES

[1] Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, "Fast accurate computation of large-scale IP traffic matrices from link loads," in ACM SIGMETRICS Performance Evaluation Review, vol. 31, no. 1. ACM, 2003, pp. 206-217.
[2] A. Soule, A. Nucci, R. Cruz, E. Leonardi, and N. Taft, "How to identify and estimate the largest traffic matrix elements in a dynamic environment," in ACM SIGMETRICS Performance Evaluation Review, vol. 32, no. 1. ACM, 2004, pp. 73-84.
[3] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot, "Traffic matrix estimation: Existing techniques and new directions," ACM SIGCOMM Computer Communication Review, vol. 32, no. 4, pp. 161-174, 2002.
[4] P. Tune and M. Roughan, "Internet traffic matrices: A primer," Recent Advances in Networking, vol. 1, pp. 1-56, 2013.
[5] A. Soule, A. Lakhina, N. Taft, K. Papagiannaki, K. Salamatian, A. Nucci, M. Crovella, and C. Diot, "Traffic matrices: balancing measurements, inference and modeling," in ACM SIGMETRICS Performance Evaluation Review, vol. 33, no. 1. ACM, 2005, pp. 362-373.
[6] M. Polverini, A. Iacovazzi, A. Cianfrani, A. Baiocchi, and M. Listanti, "Traffic matrix estimation enhanced by SDN nodes in real network topology," in Computer Communications Workshops (INFOCOM WKSHPS), 2015 IEEE Conference on. IEEE, 2015, pp. 300-305.
[7] W. E. Leland, W. Willinger, M. S. Taqqu, and D. V. Wilson, "On the self-similar nature of Ethernet traffic," ACM SIGCOMM Computer Communication Review, vol. 25, no. 1, pp. 202-213, 1995.
[8] P. Cortez, M. Rio, M. Rocha, and P. Sousa, "Internet traffic forecasting using neural networks," in Neural Networks, 2006. IJCNN'06. International Joint Conference on. IEEE, 2006, pp. 2635-2642.
[9] H. Feng and Y. Shu, "Study on network traffic prediction techniques," in Wireless Communications, Networking and Mobile Computing, 2005. Proceedings. 2005 International Conference on, vol. 2. IEEE, 2005, pp. 1041-1044.
[10] J. Dai and J. Li, "VBR MPEG video traffic dynamic prediction based on the modeling and forecast of time series," in INC, IMS and IDC, 2009. NCM'09. Fifth International Joint Conference on. IEEE, 2009, pp. 1752-1757.
[11] V. Dharmadhikari and J. Gavade, "An NN approach for MPEG video traffic prediction," in Software Technology and Engineering (ICSTE), 2010 2nd International Conference on, vol. 1. IEEE, 2010, pp. V1-57.
[12] A. Abdennour, "Evaluation of neural network architectures for MPEG-4 video traffic prediction," IEEE Transactions on Broadcasting, vol. 52, no. 2, pp. 184-192, 2006.
[13] M. Barabas, G. Boanea, A. B. Rus, V. Dobrota, and J. Domingo-Pascual, "Evaluation of network traffic prediction based on neural networks with multi-task learning and multiresolution decomposition," in Intelligent Computer Communication and Processing (ICCP), 2011 IEEE International Conference on. IEEE, 2011, pp. 95-102.
[14] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[15] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[16] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[17] Q. V. Le, N. Jaitly, and G. E. Hinton, "A simple way to initialize recurrent networks of rectified linear units," arXiv preprint arXiv:1504.00941, 2015.
[18] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[19] Y. Wen and G. Zhu, "Prediction for non-Gaussian self-similar traffic with neural network," in Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, vol. 1. IEEE, 2006, pp. 4224-4228.
[20] Y. Wen and G. Zhu, "Prediction for non-Gaussian self-similar traffic with neural network," in Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, vol. 1. IEEE, 2006, pp. 4224-4228.
[21] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems, 2007, pp. 153-160.
[22] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[23] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in OSDI, vol. 16, 2016, pp. 265-283.
[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825-2830, 2011.
[25] S. Uhlig, B. Quoitin, J. Lepropre, and S. Balon, "Providing public intradomain traffic matrices to the research community," ACM SIGCOMM Computer Communication Review, vol. 36, no. 1, pp. 83-86, 2006.
[26] R. Moazzezi, "Change-based population coding," Ph.D. dissertation, UCL (University College London), 2011.
[27] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," 1999.
[28] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," Journal of Machine Learning Research, vol. 3, no. Aug, pp. 115-143, 2002.

