
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 35 (2014) 290 – 298

18th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems - KES2014

Spatial internet traffic load forecasting with using estimation method


Anna Kamińska-Chuchmała∗
Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław 50-370, Poland

Abstract
Internet traffic is one of the most unpredictable and fluctuating phenomena to forecast, and accurate prediction
remains a difficult challenge. Many measurement studies are dedicated to predicting the performance of the Internet.
The issue has become especially important in recent years, as users increasingly demand reliable access to the
Internet. In this paper, spatial (temporal-area) Internet traffic load forecasting is proposed. The data come from an
active measurement experiment. The database covers three weeks of October 2013, with data collected every day at the
same times: 06:00 am, 12:00 pm, 06:00 pm and 12:00 am. The experiment relies on an agent in Wrocław downloading a
copy of the same resource from servers located in Europe. One of the most interesting variables obtained from this
experiment is the total download time of the indicated resource. On the basis of this experiment, Internet traffic
forecasts one week ahead are performed. Spatial forecasting is carried out with a geostatistical estimation method,
ordinary kriging. The paper contains a description of the ordinary kriging method and a preliminary analysis of the
measurement data. Next, the forecast model and a discussion of the results are given. A final view of the performance
of the considered Internet network in Europe ends the paper.

© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/3.0/).
Peer-review under responsibility of KES International.
Keywords: Internet traffic loads, network performance, spatial forecast, kriging method, active measurement

1. Introduction

For many years, Internet traffic has been studied, for example in terms of performance, scalability, and
changeability. There are organizations such as the Cooperative Association for Internet Data Analysis (CAIDA),
a collaborative undertaking among organizations in the commercial, government, and research sectors aimed
at promoting greater cooperation in the engineering and maintenance of a robust, scalable global Internet
infrastructure [1]. CAIDA researchers conduct regular measurements of Internet traffic in various networks, develop
analysis tools, analyze available traffic samples, and indicate future directions in traffic classification [2].
Another example is an integrated project funded by the European Union, in which the European Traffic Observatory
Measurement InfrastruCture (ETOMIC) was created [3]. ETOMIC is an active measurement infrastructure distributed
throughout Europe that allows users to infer network topology and discover its specific characteristics, such as
delays and available bandwidths [4]. Moreover, many researchers have long focused on active measurement, because it
is claimed that passive measurements could not give any conclusive answer [5], and they try to examine Internet
performance in different ways, for example by analyzing packet delay dynamics [6]. Additionally, analysis of
experiments leads to the conclusion that traffic associated with the monitoring of individual Web servers is
characterized by self-similarity, burstiness, long tails and seasonality [7].

∗ Corresponding author. Tel.: +48 71 320 40 20; fax: +48 71 321 10 18.
E-mail address: [email protected]

1877-0509 © 2014 The Authors. Published by Elsevier B.V.
doi:10.1016/j.procs.2014.08.109
In this paper the author uses an active measurement experiment as the database for forecasting research, which could
help to picture the current state of Internet traffic and network performance. The contribution of this paper is a
new approach to forecasting Internet traffic loads, not in 2D but in 3D spatial (temporal-area) form, using a
geostatistical estimation method that has not been used in this domain until now. The next section presents related
work on Internet traffic load prediction.

2. Related work

Over the past decade, researchers have produced many Internet traffic load predictions. For example, in [8] a
Generalized Autoregressive Conditional Heteroscedastic (GARCH) model was proposed to forecast traffic load, together
with practical techniques for model fitting. The proposed Markov Chain Monte Carlo (MCMC) simulation provides an
approach for simulating Internet data traffic patterns. Moreover, the authors compared their approach with the
Seasonal Autoregressive Integrated Moving Average (SARIMA) model.
In another paper [9], ARMA, ARIMA, FARIMA (Fractional ARIMA) models and Fractional Gaussian Noise (FGN)
were proposed for Web traffic modeling. The authors presented their approach in six steps. ARMA (Autoregressive
Moving Average) and FARIMA are very popular and useful models for network traffic prediction. Paper [10] uses
these two models in performance tests and compares them with a Gaussian predictor. The comparisons are based on
the mean packet delay, the variance of the packet delay, and the buffer requirements.
A different approach to network performance prediction is taken in [11], where the authors design and validate a
system that can be used by an autonomic manager to predict the response times of transaction-oriented applications.
However, the forecasting methods mentioned above cannot produce a temporal-area forecast; the forecast can only be
temporal. Moreover, additional parameters are required as input data to improve forecast accuracy. These
disadvantages motivate the search for a better forecasting method, because nowadays network operators would like to
know how network traffic will behave in the future over the whole considered area, including places where no
measurements are available.
Geostatistical methods could be an interesting response to this need, as values are estimated over the whole
considered area on a created 3D grid. Moreover, a minimum of input parameters is enough to prepare a forecast with
good accuracy. A geostatistical estimation method was used, for example, to forecast car traffic on the basis of
floating car speed data in Beijing [12]. In view of all these advantages, the author decided to apply a
geostatistical estimation method to spatial Internet traffic forecasting.
To the best of the author's knowledge, the spatial kriging approach to Internet load prediction presented in this
paper is unique, with no similar problem statement in the literature. So far there has been no research on spatial
forecasting of Internet network loads with geostatistical methods except the author's own works (for
example [13,14]).
The next section describes the geostatistical estimation method in detail.

3. Estimation method

One of the main geostatistical estimation methods is Ordinary Kriging (OK). The term kriging was coined by
Georges Matheron in 1963 in honor of Danie Krige. OK is used to estimate a value at a point of an area for which
a variogram is known, using data in the neighborhood of the estimation location. The kriging estimate, denoted
$Z^{*}$, can be defined as a linear combination of the neighboring data $Z$ with weights $\omega_\alpha$:

$$Z^{*}(x_0) = \sum_{\alpha=1}^{n} \omega_\alpha Z(x_\alpha), \qquad (1)$$

where:
$\omega_\alpha$ - weights attached to $Z(x_\alpha)$,
$Z(x_\alpha)$ - random variables at the $n$ data locations $x_\alpha$.
It is important to note that the weights must be constrained to sum to 1, because in the particular case when
all data values are equal to a constant, the estimated value should also be equal to this constant.
It is assumed that the data are part of a realization of an intrinsic random function $Z(x)$ with a variogram
$\gamma(h)$, where $h$ is a vector linking any two points $x$ and $x + h$.
The estimator at the target location $x_0$ is required to be unbiased (which assumes that the expectation of the
linear combination exists):
$$E[Z^{*}(x_0) - Z(x_0)] = 0, \qquad (2)$$
and the estimation variance is the variance of the linear combination:
$$\sigma_E^{2} = \mathrm{var}(Z^{*}(x_0) - Z(x_0)). \qquad (3)$$
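The resulting OK equations are obtained by minimizing the estimation variance with the constraint on the weights:
$$\begin{cases}
\sum_{\beta=1}^{n} \omega_\beta \, \gamma(x_\alpha - x_\beta) + \mu = \gamma(x_\alpha - x_0) & \text{for } \alpha = 1, \ldots, n \\
\sum_{\beta=1}^{n} \omega_\beta = 1
\end{cases} \qquad (4)$$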

Finally, the estimation variance of OK is:
$$\sigma^{2} = \mu - \gamma(x_0 - x_0) + \sum_{\alpha=1}^{n} \omega_\alpha \, \gamma(x_\alpha - x_0). \qquad (5)$$
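Equations (1), (4) and (5) can be illustrated with a small numerical example. The following is a minimal sketch, not the software used in the paper: the function name `ordinary_kriging`, the four sample points and the linear variogram are hypothetical and chosen only to keep the example short. It builds the OK system of equation (4), solves it for the weights and the Lagrange parameter, and evaluates the estimate (1) and the variance (5).

```python
import numpy as np

def ordinary_kriging(coords, values, target, variogram):
    """Solve the ordinary kriging system for one target location.

    `variogram` is any callable gamma(h) with gamma(0) = 0 (an assumption
    of this sketch; the paper fits its own model to the data).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Left-hand side of equation (4): variogram between data points,
    # bordered by ones for the constraint that the weights sum to 1.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0

    # Right-hand side: variogram between each data point and the target.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))

    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]

    estimate = float(weights @ values)                              # equation (1)
    variance = float(mu - variogram(0.0) + weights @ b[:n])         # equation (5)
    return estimate, variance, weights

# Hypothetical data: four points on a unit square and a linear variogram.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([2.0, 3.0, 4.0, 5.0])
linear = lambda h: np.asarray(h, dtype=float)

est, var, w = ordinary_kriging(coords, values, [0.5, 0.5], linear)
print(est, var, w)
```

Two properties stated in the text can be checked with this sketch: the weights sum to 1, and a data set of identical values returns that same value at any target.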

To make the OK method easier to follow, the workflows are presented as block diagrams: the preliminary and
structural data analysis in figure 1 and the OK method in figure 2. The first block diagram starts from preparing
basic statistics of the input data, such as minimum, maximum and average values, standard deviation, and the
variability, skewness and kurtosis coefficients. According to the statistical approach adopted here, a skewness
greater than 3 means that the data are characterized by asymmetry. Geostatistical methods aim for a Gaussian
(symmetrical) distribution, because the accuracy of the forecast is then much higher. One solution used by
statisticians (and in this paper) is to take the logarithm of the data to obtain more symmetric values. That is why
the block diagram checks the value of skewness. The next step is to prepare other graphical statistics, such as a
basemap, histogram or Quantile-Quantile plot, to better analyze the input data.
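The first steps of this workflow can be sketched as follows. This is an illustrative sketch only: the function names, the sample values and the `offset` parameter (a guard for zero download times) are assumptions, and the moment-based skewness and kurtosis formulas may differ from those of the software used by the author.

```python
import numpy as np

def basic_statistics(x):
    """Basic statistics of the kind listed in the text (moment-based coefficients)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)                  # sample standard deviation
    z = (x - mean) / x.std()             # standardized with the population std
    return {
        "min": x.min(),
        "max": x.max(),
        "mean": mean,
        "std": std,
        "variability_pct": 100.0 * std / mean,
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
    }

def log_transform_if_skewed(x, threshold=3.0, offset=0.0):
    """Return (data, transformed_flag); take logarithms when skewness >= threshold.

    `offset` is a hypothetical guard against zeros, not described in the paper.
    """
    x = np.asarray(x, dtype=float)
    if basic_statistics(x)["skewness"] >= threshold:
        return np.log(x + offset), True
    return x, False

# Hypothetical download times [s]: many short downloads and one long peak.
times = np.concatenate([np.linspace(2.0, 4.0, 19), [35.9]])
stats = basic_statistics(times)
logged, was_transformed = log_transform_if_skewed(times)
print(stats["skewness"], was_transformed, basic_statistics(logged)["skewness"])
```

If the data were transformed, the forecast has to be mapped back with the exponential function (minus any offset) before the results are interpreted.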
Afterward, structural analysis is performed. It consists of creating a directional variogram (in the direction of
the third axis, time, for forecasting) and then fitting a theoretical model to the variogram function. The best way
to verify the correctness of the variogram approximation is cross-validation. The resulting variogram model is then
used in the OK method.
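The idea of an experimental variogram along the time axis can be shown with a simplified sketch. It assumes a single measurement series indexed by time and uses the classical estimator (half the mean squared difference of all pairs in a lag class); a full directional variogram on a 3D data set would also need an angular tolerance, which is left out here. The function name and the test series are hypothetical.

```python
import numpy as np

def experimental_variogram(t, z, lag_width, n_lags):
    """Classical variogram estimator for one series z observed at times t."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)

    # All distinct pairs of observations.
    i, j = np.triu_indices(len(t), k=1)
    dt = np.abs(t[i] - t[j])
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2

    # Assign each pair to a lag class of width `lag_width`.
    lag_class = np.floor(dt / lag_width).astype(int)

    lags, gammas = [], []
    for k in range(n_lags):
        mask = lag_class == k
        if mask.any():                    # skip empty lag classes
            lags.append(dt[mask].mean())
            gammas.append(half_sq_diff[mask].mean())
    return np.array(lags), np.array(gammas)

# Hypothetical series: a pure linear trend, for which gamma(h) = h^2 / 2.
t = np.arange(10.0)
lags, gammas = experimental_variogram(t, t, lag_width=1.0, n_lags=4)
print(lags, gammas)
```

A theoretical model is then fitted to the points `(lags, gammas)`, and the fit is checked by cross-validation as described above.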
The second block diagram, in figure 2, describes the procedure of preparing the forecast model, calculating the OK
method, and then analyzing the results. First, the forecast model is loaded with the historical input data from the
database, the variogram model obtained from the previous block diagram, and a created elementary 3D grid. The next
step is to set parameters such as the estimation type and the kind of neighborhood. Subsequently, the OK method
is calculated according to the theory, and $Z^{*}$ is estimated according to equation 1. After the forecast, data
that were in logarithmic scale are transformed back to the original values. The last step is to analyze the results
using, for example, raster maps and statistical parameters.
More information about kriging estimation methods can be found in [15,16].

4. Preliminary and structural data analysis

The geostatistical estimation presented in the previous section can be useful for traffic forecasting, and this
paper shows that forecasts of good quality can be obtained with the geostatistical OK estimation method.
In this section, preliminary and structural data analysis is discussed following the scheme of the block diagrams.

Fig. 1. Block diagram of preliminary and structural data analysis.



Fig. 2. Block diagram of OK method.

The data needed for forecasting Internet traffic were obtained from the active experiment VMWING (Virtual
Multiagent Web pING). This experiment was performed in the author's Distributed Computer Systems Division at Wrocław
University of Technology. VMWING is an experiment distributed over European servers, from which an agent in Wrocław
downloads a copy of the same resource: a server install image for PC (Intel x86) computers (zsync metafile). The
size of this resource is 1.3 MB. The database includes the Internet load variable, which is the total download time
of the ubuntu-13.04-server-i386.iso.zsync file, the geographical location of the server targeted by the agent,
and the time stamp of the measurement. The experiment ran between 4th and 24th October 2013, every
day at the same times: 06:00 am, 12:00 pm, 06:00 pm and 12:00 am. A simplified scheme of the VMWING experiment is
presented in figure 3.
The elementary statistics of the historical database for the particular hours are presented in table 1. Internet
traffic during the first three weeks of October is characterized by changeable download times. The highest
maximum download time and the highest coefficients of variability, skewness and kurtosis occur at midnight, which
may point to intensified Internet traffic at midnight, or to some servers in the network being disabled at night
so that communication is slower. For each of the four considered hours of the day, the difference between the
minimum and maximum value is large, more than 35 seconds. Moreover, the average value is quite small relative to the
limit values, which indicates some unpredictable peaks. Additionally, the variability coefficient is about 100%,
which shows how varied this process is. Skewness is greater than 3 for 12:00 am and almost 3 for the other hours.

Fig. 3. Internet measurement infrastructure system VMWING.

Table 1. Fundamental statistics of download times from servers between 4 and 24 October 2013.

Statistical parameter | 6:00 am | 12:00 pm | 6:00 pm | 12:00 am
Minimum value Zmin [s] | 0.00 | 0.09 | 0.08 | 0.08
Maximum value Zmax [s] | 35.95 | 35.95 | 35.91 | 35.99
Average value Z [s] | 6.14 | 6.26 | 6.58 | 5.86
Standard deviation S [s] | 6.19 | 6.14 | 6.44 | 5.96
Variability coefficient V [%] | 100.81 | 98.08 | 97.87 | 101.71
Skewness coefficient G | 2.98 | 2.87 | 2.84 | 3.12
Kurtosis coefficient K | 12.57 | 11.88 | 11.38 | 13.89
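The variability coefficients in Table 1 can be checked against the other rows. The short sketch below assumes that V is the standard deviation divided by the average value, expressed in percent; the paper does not state the formula, but this definition reproduces the tabulated values to two decimals. The dictionary simply copies the averages and standard deviations from Table 1.

```python
# (average Z [s], standard deviation S [s]) copied from Table 1.
table1 = {
    "6:00 am": (6.14, 6.19),
    "12:00 pm": (6.26, 6.14),
    "6:00 pm": (6.58, 6.44),
    "12:00 am": (5.86, 5.96),
}

def variability_coefficient(mean, std):
    """Coefficient of variation in percent (assumed definition: 100 * S / Z)."""
    return 100.0 * std / mean

computed = {hour: round(variability_coefficient(m, s), 2)
            for hour, (m, s) in table1.items()}
for hour, v in computed.items():
    print(f"{hour}: V = {v:.2f}%")
```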

Fig. 4. Histograms of download time of resource from European servers at 12:00 pm in October 2013.

The histogram presented in figure 4 shows a strong right-sided asymmetry of the distribution. The highest
frequencies of download time fall in the interval between 2 and 4 seconds. In the other intervals the frequencies
are less than half as large. This means that download times above 4 seconds may occur as unpredictable peaks.
Because skewness is about 3 and the histogram shows asymmetry, the data have to be log-transformed
before further calculation with the OK method.
Before the forecast model is prepared, a directional variogram is modeled during structural data analysis. The
variogram function is approximated by the best-fitting theoretical models, nugget effect and cubic, as presented in
figure 5. The directional variogram is calculated along the time axis (for the 90° direction); the distance class
for this variogram equals 6.62° and the sill equals 0.08. The variogram function indicates a very gently rising
trend.

Fig. 5. Directional variogram of download time approximated by the theoretical model of the nugget effect and cubic.
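A nugget-plus-cubic variogram of the kind named above can be written down explicitly. The sketch below uses the standard cubic model, $\gamma(h) = c\,[7r^2 - 8.75r^3 + 3.5r^5 - 0.75r^7]$ with $r = h/a$ for $h \le a$ and $\gamma(h) = c$ beyond the range $a$. The paper reports only the sill (0.08); the split into a nugget of 0.02 and a cubic partial sill of 0.06, and the range of 10, are hypothetical values used purely for illustration.

```python
import numpy as np

def nugget_cubic_variogram(h, nugget, partial_sill, a):
    """Nugget effect plus cubic structure; gamma(0) = 0 by definition."""
    h = np.abs(np.asarray(h, dtype=float))
    r = np.minimum(h / a, 1.0)           # beyond the range the model stays at the sill
    cubic = partial_sill * (7.0 * r**2 - 8.75 * r**3 + 3.5 * r**5 - 0.75 * r**7)
    return np.where(h > 0.0, nugget + cubic, 0.0)

# Hypothetical parameters whose total sill matches the reported 0.08.
h = np.linspace(0.0, 20.0, 201)
gamma = nugget_cubic_variogram(h, nugget=0.02, partial_sill=0.06, a=10.0)
print(gamma[:3], gamma[-1])
```

The cubic model is smooth near the origin, which is consistent with a variogram that rises gently, as described for figure 5.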

5. Spatial internet traffic load forecasting with using ordinary kriging method

After the preliminary and structural data analysis, the forecast model is prepared. This model uses the directional
variogram presented in the previous section. Additionally, a moving neighborhood is used, with a search ellipsoid
of 15.32° in the X, Y and Z directions. In this model ordinary punctual (point) kriging is calculated. The
temporal-area forecast is performed one week ahead, i.e. it covers the period between 25th and 31st October 2013.
Table 2 presents global statistics of the average forecasted values of Internet traffic loads during that week.
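A moving neighborhood restricts each kriging system to the data inside a search ellipsoid centered on the target node. The following is a minimal sketch, assuming that 15.32 is the semi-axis length of the ellipsoid in each of the X, Y and Z directions (the paper does not say whether it is a radius or a diameter); the function name and the sample points are hypothetical.

```python
import numpy as np

def moving_neighborhood(coords, target, radii):
    """Return indices of the data points inside the search ellipsoid around `target`."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # A point is inside when its scaled offset has length at most 1.
    scaled = (coords - target) / radii
    inside = np.sum(scaled ** 2, axis=1) <= 1.0
    return np.flatnonzero(inside)

# Hypothetical data locations (X, Y, time) and a target grid node at the origin.
points = np.array([
    [0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [0.0, 16.0, 0.0],
    [9.0, 9.0, 9.0],
])
selected = moving_neighborhood(points, np.zeros(3), (15.32, 15.32, 15.32))
print(selected)
```

Only the selected points would then enter the OK system for that grid node, which keeps each system small.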

Table 2. Global statistics of forecasted download times of the resource one week ahead, calculated with the OK method.

Geostatistical parameter | Average value Z [s] | Minimum value Zmin [s] | Maximum value Zmax [s] | Variance S² [s²] | Standard deviation S [s] | Variability coefficient V [%]
Mean forecasted value Z for 6:00 am | 4.98 | 2.23 | 13.84 | 4.84 | 2.20 | 44.18
Mean forecasted value Z for 12:00 am | 4.99 | 2.23 | 13.86 | 4.80 | 2.19 | 43.89
Mean forecasted value Z for 6:00 pm | 4.98 | 2.23 | 13.38 | 4.63 | 2.15 | 43.17
Mean forecasted value Z for 12:00 pm | 4.99 | 2.04 | 13.81 | 4.88 | 2.21 | 44.29

The forecast results presented in table 2 are less varied than the input database, and the variability coefficient
is below 45%. On the other hand, the difference between the minimum and maximum value is still large, more than
10 seconds. For every considered hour the results are similar, with slightly higher Internet load at 12:00 pm, as in
the historical database presented in table 1. This high dispersion of the download time values
could be evidence of heavier traffic in Internet networks around midnight.

Geostatistical estimation methods can give information about loads not only at the considered servers (points) but
for the whole considered area. Thus, the results of spatial (temporal-area) forecasts can be illustrated as a raster
map. An example raster map for the last day of the forecast (31st October, 06:00 pm) is presented in figure 6. This
map shows the download time of the resource from the Internet. Each cross corresponds to the download time from a
given European server (the download time is given in seconds). As can be seen in this map, a few servers in Spain
are overloaded.

Fig. 6. Raster map of Internet traffic loads on 31st October, 06:00 pm.

For better analysis of the results, table 3 compares the forecasted and real download times of the file located, as
an example, on the server of Université de Nantes in France. The file ubuntu-13.04-server-i386.iso.zsync
was downloaded from the URL http://ubuntu.univ-nantes.fr/ubuntu-cd/. As can be seen in table 3, the best forecast
accuracy, an ex post error of 1.62%, is for 29th October at 12:00 pm. The highest ex post error, 25.58%, is for
28th October at 06:00 pm. This fluctuation of accuracy in the evening and night hours confirms
impediments in Internet traffic. The average ex post error for the whole one-week-ahead forecast is 23.61% for the
server on the academic campus in Nantes.

Table 3. Comparison of real and forecasted total download time of file from server in Nantes, France between 25th and 31st October 2013.

Day and hour of forecast | Real download time [s] | Forecasted download time [s] | Averaged forecasted error ex post [%]
25th October 2013 06:00 am | 4.82 | 4.21 | 12.59
26th October 2013 12:00 pm | 4.48 | 3.79 | 15.38
27th October 2013 12:00 am | 4.20 | 3.78 | 9.96
28th October 2013 06:00 pm | 4.46 | 3.32 | 25.58
29th October 2013 12:00 pm | 3.10 | 3.15 | 1.62
30th October 2013 12:00 am | 2.99 | 3.14 | 4.91
31st October 2013 06:00 pm | 2.59 | 2.95 | 13.95
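The paper does not give the formula behind the ex post error in Table 3. A minimal sketch, assuming it is the absolute forecast error relative to the real download time, is shown below. Under this assumption the two-decimal times from Table 3 reproduce the reported errors to within roughly 0.1 percentage point; the small remainder is plausibly due to rounding of the tabulated times. The row tuples copy Table 3; the function name is hypothetical.

```python
# (day and hour, real time [s], forecasted time [s], reported error [%]) from Table 3.
rows = [
    ("25 Oct 06:00 am", 4.82, 4.21, 12.59),
    ("26 Oct 12:00 pm", 4.48, 3.79, 15.38),
    ("27 Oct 12:00 am", 4.20, 3.78, 9.96),
    ("28 Oct 06:00 pm", 4.46, 3.32, 25.58),
    ("29 Oct 12:00 pm", 3.10, 3.15, 1.62),
    ("30 Oct 12:00 am", 2.99, 3.14, 4.91),
    ("31 Oct 06:00 pm", 2.59, 2.95, 13.95),
]

def ex_post_error(real, forecast):
    """Relative absolute error in percent (assumed definition, see lead-in)."""
    return 100.0 * abs(real - forecast) / real

for label, real, forecast, reported in rows:
    computed = ex_post_error(real, forecast)
    print(f"{label}: computed {computed:.2f}%, reported {reported:.2f}%")
```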

The spatial forecast achieves satisfactory accuracy while using a minimum of input data. Another advantage of the OK
method is that forecasts are also obtained at missing points on the 3D grid (area map). A disadvantage of
this geostatistical estimation method, however, is that it yields only one realization of the forecast, which
reflects reality a little less well. A geostatistical simulation method could represent the real behavior of
Internet traffic more accurately; the author compared the two on the basis of a different MWING experiment in [17].
The difference in accuracy was at the level of 1-4% in favor of simulation methods.

6. Conclusions

In this paper the active measurement experiment VMWING was analyzed. Next, spatial Internet traffic load forecasting
with the ordinary kriging method was described. Finally, the accuracy of prediction was discussed.
Active experiments are very valuable in research, because they make it possible to analyze the variability of
Internet traffic. Spatial forecasting with the OK method gives a picture of future demand for access to the Internet
with satisfactory accuracy. The author compared spatial forecasting using geostatistical methods with spatial
econometric methods and indicated the advantages and disadvantages of these methods in [18]. An advantage of
geostatistical methods is the additional information about estimated values over the whole considered area, not
only at the measurement points. However, spatial econometric methods can give better prediction accuracy.
Future research will include performing a new experiment with agents in new locations, creating new models, and
further work towards improving prediction accuracy.

References

1. http://www.caida.org/research/traffic-analysis/
2. Dainotti A, Pescapè A, Claffy K. Issues and Future Directions in Traffic Classification. IEEE Network; 2012, vol. 26, no. 1, p. 35-40.
3. http://www.etomic.org
4. Csabai I et al. ETOMIC Advanced Network Monitoring System for Future Internet Experimentation. Infocommunications Journal; 2010, vol. 65, p. 25-31.
5. Krashakov SA, Shchur LN. Active Measurements (Experiments) of the Internet Traffic Using Cache-mesh. Int. Journal of Modern Physics C; 2001, vol. 12(4), p. 549-562.
6. Wang K, Li Z-C, Yang F, Wu Q, Bi J-P. Experiment and Analysis of Active Measurement for Packet Delay Dynamics. Lecture Notes in Computer Science, Springer-Verlag Berlin Heidelberg; 2005, vol. 3619, p. 1063-1072.
7. Borzemski L. The Experimental Design for Data Mining to Discover Web Performance Issues in a Wide Area Network. Cybernetics and Systems: An International Journal; 2010, vol. 41, p. 31-45.
8. Syed AR, Saleem H, Syed H. MCMC simulation of GARCH model to forecast network traffic load. International Journal of Computer Science and Informatics; 2012, vol. 9, issue 3, no 2, p. 277-284.
9. Wang X, Goseva-Popstojanova K. Modeling Web Request and Session Level Arrivals. IEEE Computer Society, Advanced Information Networking and Applications, AINA '09. International Conference on; 2009, p. 24-32.
10. Cui W, Bassiouni M A. Virtual private network bandwidth management with traffic prediction. Elsevier, Computer Networks; 2003, vol. 42, p. 765-778.
11. Kirtane S, Martin J. Application Performance Prediction in Autonomic Systems. Proc. of the 44th Annual Southeast Regional Conference; 2006, p. 566-572.
12. Wang Y, Zhuang D, Liu H. Spatial Distribution of Floating Car Speed. Journal of Transportation Systems Engineering and Information Technology; 2012, vol. 12, issue 1, p. 36-41.
13. Borzemski L, Kamińska-Chuchmała A. Client-Perceived Web Performance Knowledge Discovery through Turning Bands Method. Cybernetics and Systems: An International Journal; 2012, vol. 43, issue 4, p. 354-368.
14. Borzemski L, Kamińska-Chuchmała A. Distributed Web Systems Performance Forecasting Using Turning Bands Method. IEEE Transactions on Industrial Informatics; 2013, vol. 9, issue 1, p. 254-261.
15. Wackernagel H. Multivariate Geostatistics: An Introduction with Applications. Springer-Verlag Berlin Heidelberg; 2003.
16. Chilès J-P, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. Wiley; 2012.
17. Borzemski L, Kamińska-Chuchmała A. Web performance forecasting with kriging method. Contemporary Challenges and Solutions in Applied Artificial Intelligence, Studies in Computational Intelligence, Springer; 2013, vol. 489, p. 149-154.
18. Borzemski L, Kamińska-Chuchmała A. Spatial Econometrics Models in Web Servers Performance. Communications in Computer and Information Science, Springer-Verlag Berlin; 2013, vol. 370, p. 45-54.
