Article info

Article history:
Received 8 July 2014
Received in revised form 15 March 2015
Accepted 2 June 2015

Keywords:
Infrastructure management
Cloud computing
IaaS
Resource auto-scaling
Workload prediction
Random forest

Abstract

Workload estimation and prediction have become a very relevant research area in the field of cloud computing. The reason lies in their many benefits, which include QoS (Quality of Service) satisfaction, automatic resource scaling, and job/task scheduling. It is very difficult to accurately predict the workload of cloud applications if it varies drastically. To address this issue, existing solutions use either statistical methods, which effectively detect repeating patterns but provide poor accuracy for long-term predictions, or learning methods, which develop a complex prediction model but are mostly unable to detect unusual patterns. Some solutions use a combination of both methods. However, none of them address the issue of gathering system-specific information in order to improve prediction accuracy. We propose an Advanced Model for Efficient Workload Prediction in the Cloud (AME-WPC), which combines statistical and learning methods, improves the accuracy of workload prediction for cloud computing applications and can be dynamically adapted to a particular system. The learning methods use an extended training dataset, which we define through the analysis of the system factors that have a strong influence on the application workload. We address the workload prediction problem with classification as well as regression and test our solution with the machine-learning method Random Forest on both basic and extended training data. To evaluate our proposed model, we compare empirical tests with the machine-learning method kNN (k-Nearest Neighbors). Experimental results demonstrate that combining statistical and learning methods makes sense and can significantly improve the prediction accuracy of workload over time.

© 2015 Elsevier Ltd. All rights reserved.
1. Introduction

The area of cloud computing is relatively new and has expanded considerably in the past few years. Usage of dynamically scalable and often virtualized computing resources that are available as services over the Internet has gained a lot of attention from both industry and academia.

The evolution of cloud computing IT services took a step forward in the efficient use of hardware resources through the use of virtualization. In traditional hosting services the user receives a static amount of hardware resources. In contrast, the cloud computing approach offers on-demand virtualized resources to its users (Buyya et al., 2009). Because virtual resources can be added or removed at any time during the lifetime of the application hosted in a cloud, the possibility of dynamic scaling arises, along with the need for more advanced resource management systems (Manvi and Shyam, 2014). For systems to work seamlessly, decisions about dynamic resource scaling should be carefully scrutinized, because they are influenced by many factors (e.g. the current state of the system, the number of users, and upcoming events such as software development projects or delivering a service to a customer). These decisions can be made on the basis of future workload predictions, which can be achieved through identification of historical usage patterns and analysis of historical data or the current state of the system. To enable more reliable decisions about current and future resource scaling, it is important to establish an effective workload prediction mechanism. Why can resources not be increased exactly when we need them? Why is it better to have this knowledge in advance? The answer to these questions lies in the fact that initializing additional virtual resources in a cloud is not instantaneous: cloud-hosting platforms introduce a delay of several minutes in hardware resource allocation (Islam et al., 2012), which can cause many inconveniences for the end users (interruptions, operational costs, underutilization of resources, loss of clients, etc.). Finally, appropriate workload prediction mechanisms would resolve not only the problem of a shortage of hardware resources, but also the problem of unused resources, which make the cloud costly and inefficient.

In this research paper we present an Advanced Model for Efficient Workload Prediction in the Cloud (AME-WPC), which combines statistical and learning methods. In order to improve the capabilities of

E-mail addresses: [email protected] (K. Cetinski), [email protected] (M.B. Jurič).

http://dx.doi.org/10.1016/j.jnca.2015.06.001
1084-8045/© 2015 Elsevier Ltd. All rights reserved.
Please cite this article as: Cetinski K, Jurič MB. AME-WPC: Advanced model for efficient workload prediction in the cloud. Journal of
Network and Computer Applications (2015), http://dx.doi.org/10.1016/j.jnca.2015.06.001i
the learning method, we propose domain-specific database extensions, which we define through analysis of the system factors that have a strong influence on the application workload (e.g. part of day, holidays and weekends). Additional database extensions are defined using a novel Two-phase Pattern Matching (TPM) method, which covers two phases: it recognizes similar patterns based on the workload value and similar patterns based on workload fluctuation. TPM can be repeatedly applied to the most recent historical workload data in order to regularly improve the prediction model. We address the workload prediction problem with classification as well as regression and test our solution with the machine-learning method Random Forest on both basic (containing only the attributes given in the source data) and extended (containing an additional set of attributes) training data. Finally, we demonstrate the capabilities of AME-WPC on the AuverGrid workload data series, obtained from the Grid Workloads Archive (http://gwa.ewi.tudelft.nl/). For evaluation purposes, we include the machine-learning method kNN. Experimental results demonstrate that the use of combined methods significantly improves the prediction accuracy of workload over time.

The rest of the paper is organized as follows: Section 2 discusses related work. Section 3 defines the problem domain and presents data collection and processing. The proposed model is presented in Section 4, which is followed by experimental results in Section 5. Section 6 concludes the work presented.

2. Related work

Methods for ensuring efficient workload prediction in the cloud have yet to be addressed in a way comparable to the approach proposed in this paper. The workload of infrastructure resources can be represented as a time series, which is a sequence of data points typically measured at successive points in time spaced at uniform intervals. Time series forecasting relies on a model to predict future values based on previously observed values. There are many existing research studies on this topic and researchers have addressed the problem by leveraging different approaches.

2.1. Statistical methods

One group of methods often used for predicting time series is statistical methods (Quiroz et al., 2009; Mentzer and Moon, 2004; Ganapathi et al., 2010). These cover the identification of similar past occurrences matching the current short-term workload history (i.e. pattern matching) (Caron et al., 2010a,b, 2011; Gmach et al., 2007; Liu et al., 2011), the autoregression (AR) model (Li et al., 2011; Li, 2005), Monte Carlo methods (Vercauteren and Aggarwal, 2007), the Moving Average (MA) model (Ardagna et al., 2012), Exponential Smoothing (ES) (Kalekar, 2004), the Autoregressive Integrated Moving Average (ARIMA) (Zhang et al., 2009; Roy et al., 2011; Doulamis et al., 2007; Kalantari and Akbari, 2009; Cortez et al., 2012), Linear Regression and Quadratic Regression (Sun et al., 2013; Yang et al., 2014) and the Hidden Markov Model (HMM) (Khan and Anerousis, 2012; Li and Cheng, 2010; Gong et al., 2010). For short-term predictions and estimation of predicted values, filters such as Kalman's (Kalantari and Akbari, 2009; Cortez et al., 2012) are often used. Furthermore, some researchers focus on extracting the small number of trends from historical data that will be most useful to a resource management system (Bacigalupo et al., 2010, 2004, 2005; Bacigalupo, 2006). Moreover, Sarikaya et al. (2010) propose a Statistical Metric Model (SMM) that is system- and metric-independent for predicting workload behavior. A different solution to the workload prediction problem was presented by Wu et al. (2010), who proposed a model for grid performance prediction. They applied a Savitzky–Golay filter to train a sequence of confidence windows and used Kalman filters to minimize prediction errors. Thus, statistical methods have been successfully used for short-term predictions. Furthermore, HMMs are not a good fit for time series predictions, since they are used mostly for predicting the labels (hidden states) of a fully observed sequence, not for completing a sequence. More reliable decisions about long-term future workloads are often made based on a complex prediction model, which uses machine-learning methods.

2.2. Learning methods

As previously mentioned, the problem with statistical methods is poor accuracy, particularly with long-term forecasting. This means that erratic fluctuations, which are typical for time series, are practically impossible to predict. This problem can be resolved with the use of machine-learning methods such as k-Nearest Neighbors (kNN), Regression Trees, variations of neural networks (Frank et al., 2001; Chen et al., 2005; Donate et al., 2013; Eddahecha et al., 2013; Chang et al., 2014), the Support Vector Machine (SVM) (Cao, 2003) and many others (Chen et al., 2005; Donate et al., 2013; Saadatfar et al., 2012). The advantage of these methods is that they learn from historical data (searching for connections among attributes) and build a model that is used for predicting future values. Variations of neural networks (NNs) have been used widely for time series predictions. As mentioned, NNs can be accurate prediction models, but they are time-consuming and complex. On the other hand, simple machine-learning methods such as Naïve Bayes and Linear Regression do not perform with sufficient accuracy on complex and non-linear problems such as time series prediction.

The machine-learning method kNN has been used for time series prediction by various researchers (Troncoso Lora et al., 2004; Imandoust and Bolandraftar, 2013; Ban et al., 2013). The main idea of the kNN technique for pattern classification is based on the similarity of individuals: it classifies objects based on the closest training examples in the feature space. In a similar way, our approach searches for patterns in existing training datasets, but with our own pattern-matching technique, TPM. Similar patterns are determined in two phases, by value and by fluctuation. In contrast to kNN, our approach extends the training dataset based on the results of TPM, which identifies the most similar time-points in the historical workload and produces additional attributes. Furthermore, our approach applies confidence factors to the Random Forest predictions, which improves confidence in the predicted values.

2.3. Hybrid methods

In order to achieve better workload prediction accuracy, the following researchers used a combination of statistical and machine-learning methods. Montes et al. (2011) propose an approach that combines machine-learning prediction techniques with a single-entity vision of the grid in order to improve the management of the whole system. Furthermore, Vercauteren and Aggarwal (2007) propose a solution to the web server load prediction problem based on a hierarchical framework. Li et al. (2011) present an integrated approach that employs three-layered resource controllers using different analytic techniques, including statistical machine learning. Moreover, Li (2005) proposes a hierarchical framework for modeling workload. Zhang (2003) proposes a hybrid methodology that combines ARIMA and ANN models. Moreover, Cortez et al. (2012) present three methods for traffic forecasting in TCP/IP based networks: a neural network ensemble method, ARIMA and Holt–Winters. Frank et al. (2001) use neural networks as time series predictors, employing a sliding window over the input sequence. Imam and Miskhat (2011) present time delay neural networks and regression methods for predicting future workloads on grid or cloud platforms. In a similar way, Islam et al. (2012) develop prediction-based resource measurement and provisioning strategies using neural networks and linear regression to satisfy upcoming resource demands.
Machine-learning methods are more reliable for long-term workload predictions and can, in combination with statistical methods, provide more accurate workload predictions. Furthermore, it is important that we provide as much information about the addressed system as possible and extract additional features that have an influence on the workload value, which can significantly improve the performance of both statistical and machine-learning methods.

Table 1 presents a comparison between our approach and existing hybrid solutions. The main drawback of existing solutions is the lack of information about the addressed system. This information would significantly improve the performance of either statistical or machine-learning methods. Moreover, we strongly believe that combining statistical and learning methods is reasonable, as it provides the advantages of both sides. Training data, which represents the most important part of prediction with machine learning, should be extended with appropriate features. For this purpose we developed our own Two-phase Pattern Matching (TPM) method, which extracts significant features out of historical data. AME-WPC can be dynamically adapted to a specific system. After a certain period of time, learning datasets can be redefined based on the most recent historical workload data in order to further improve the accuracy of the prediction model. Furthermore, we apply confidence factors to achieve more reliable predictions. We address the challenges of poor prediction accuracy by building an Advanced Model for Efficient Workload Prediction in the Cloud (AME-WPC).

Table 1
Comparison of existing approaches. Columns of the original table: statistical approach, learning approach, feature extraction, confidence factors. Rows:

AME-WPC
Caron et al. (2010a,b, 2011), Ardagna et al. (2012), Kalekar (2004), Zhang (2003), Roy et al. (2011), Bacigalupo et al. (2004, 2005, 2010, 2011), Bacigalupo (2006), Khan and Anerousis (2012), Li and Cheng (2010), Gong et al. (2010), Sarikaya et al. (2010), Sun et al. (2013), and Yang et al. (2014)
Gmach et al. (2007) and Doulamis et al. (2007)
Liu et al. (2011), Kalantari and Akbari (2009), and Wu et al. (2010)
Li et al. (2011), Li (2005), Zhang et al. (2009), Cortez et al. (2012), Frank et al. (2001), Imam and Miskhat (2011), and Islam et al. (2012)
Montes et al. (2011)
Vercauteren and Aggarwal (2007)
Chen et al. (2005), Donate et al. (2013), Eddahecha et al. (2013), Chang et al. (2014), Troncoso Lora et al. (2004), Imandoust and Bolandraftar (2013), and Ban et al. (2013)
Cao (2003) and Saadatfar et al. (2012)

3. Problem definition and data collection

3.1. Problem definition

Our goal is to improve the prediction accuracy of system workloads in order to improve automatic scaling of cloud resources. Automatic resource scaling enables the system to automatically increase or decrease the amount of infrastructure resources depending on past, current and future needs. This is especially useful when we need to react quickly in the case of a shortage of hardware resources. The delay between sending a request and the actual acquisition of resources is short but not negligible, and it can consequently lead to serious performance and capacity bottlenecks.

The problem of workload prediction can be resolved in several ways. The most common solution is the use of historical workload data for planning the future workload of the system. Maximum or average loads can be observed for specified time intervals. However, such methods are very general and do not give accurate predictions. If we consider maximum workloads, resources will be unused most of the time. On the other hand, if an average workload is taken into account, there will be a lack of resources (and performance will decrease) when the workload increases. These kinds of prediction methods are considered to be poor because they are not accurate enough and correspond to a very small number of cases (Roy et al., 2011).

The problem of workload prediction can be resolved more efficiently by using appropriate machine-learning methods, assuming that a historical record of the workload is available for a specified period of time in the past. A major focus of research in the field of machine learning is to automatically learn to recognize complex patterns and make intelligent decisions based on existing data. A prediction model can be built by mining the data in the training window (i.e. historical workload data) and used to predict the workload throughout a prediction window (i.e. testing data). Figure 1 shows training and prediction windows.

Fig. 1. Training and prediction windows.

3.2. Data collection and preprocessing

The first thing to consider when trying to build a prediction model is data. We obtain our training data from the Grid Workloads Archive (http://gwa.ewi.tudelft.nl/), which offers several different workload traces and is widely adopted in the academic research field. Because we need a historical workload trace in order to build a prediction model, we choose the AuverGrid trace (http://gwa.ewi.tudelft.nl/datasets/gwa-t-4-auvergrid), since it provides the most complete data for the given attributes. These traces were provided by the AuverGrid team, the owners of the AuverGrid system.

The AuverGrid trace contains records of jobs collected during 12 months, from January 1 to December 31, 2006 (Fig. 2). Although a total of 29 attributes is defined, some attributes are unavailable or only partially available due to limitations in the environment. The most useful attributes for us are the submission time, wait time, run time and the number of processors used for each job. Based on these attributes we transformed the original data into a
Fig. 2. AuverGrid job traces.
24 90
25 time series that contain workload values recorded for each second 91
26 in an entire time interval (12 months). We define workload as the 92
27 amount of processors needed at a certain time. In order to create a 93
28 time series, we developed a method in GNU Octave (Eaton, 2002), 94
29 which transforms all logged jobs into a time series. GNU Octave is 95
30 a high-level interpreted language primarily intended for numerical 96
31 computations. The data is sub-sampled so that workload values 97
32 are acquired only four times in an hour (every 15 min). The latter 98
33 is required in order to make our model less computationally 99
34 complex. The file is in a ‘.TAB’ format, which is required when 100
35 using machine-learning libraries such as Orange (Demšar et al., 101
36 2013). This presents our basic dataset, which we structure into 102
37 training and testing sets. Figure 3 presents a part of our training 103
38 dataset with basic attributes: date (month and day), time (hour and 104
39 minute) and workload. Extended datasets, which contain addi- 105
40 Q5 tional attributes, will be defined later on(Fig. 4). 106
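The transformation described above (job records to a 15-minute-sampled workload series) can be sketched as follows. This is a minimal illustration, not the authors' Octave code: the job tuples and field layout are hypothetical stand-ins for the actual Grid Workloads Archive schema.

```python
# Sketch: turn job records (submit time, wait time, run time, processors)
# into a workload time series sampled every 15 minutes, where workload is
# the number of processors in use at the sample instant.
jobs = [
    # (submit_s, wait_s, run_s, num_procs) -- illustrative values
    (0,    10, 3600, 4),
    (900,   0, 1800, 2),
    (4000, 50, 7200, 8),
]

STEP = 15 * 60        # one sample every 15 minutes
HORIZON = 3 * 3600    # first three hours of the trace

def workload_at(t):
    """Processors in use at second t: jobs whose [start, start+run) covers t."""
    total = 0
    for submit, wait, run, procs in jobs:
        start = submit + wait
        if start <= t < start + run:
            total += procs
    return total

series = [workload_at(t) for t in range(0, HORIZON, STEP)]
print(series[:4])   # [0, 6, 6, 4]
```

The original pipeline then writes such a series, together with its date and time attributes, to a '.TAB' file for Orange; any tabular format would serve for this sketch.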
In machine learning, problems can be approached either with classification or regression. The difference is in the output variable values: regression involves estimating or predicting a numeric response, while classification identifies group membership. We address the workload prediction problem with both regression and classification in order to determine which method is more appropriate for our problem domain. Because the number of distinct workload values in our training data is too large, we transform the workload attribute to a different scale in order to use the classification approach. For the purpose of testing, the data is transformed into a group of 11 classes and a group of 17 classes. We determine the number of classes based on the most appropriate division of the original data. In the next step, we divide our data into learning and testing sets. The learning data is divided into 3-, 5- and 7-month intervals and the testing data into 24-h intervals. We chose one-day interval predictions because in this case it is neither realistic nor useful to predict further into the future. If the system is unpredictable and the workload fluctuates regularly, the number of daily predictions should be adjusted accordingly.

Fig. 3. Basic training dataset.

4. The proposed model

In this section, we first present an overview of the AME-WPC. Our model consists of six steps (Fig. 5). In Step 1, we analyze and obtain historical workload records (time series). Step 2 performs feature extraction by leveraging our novel Two-phase Pattern Matching (TPM) method and attribute scoring. Additional features are extracted and scored from historical workload data (e.g. part of day, weekends) (Step 3). Then we divide our data into different training and testing sets (Step 4), which we use with the Random Forest method (Step 5). Finally, confidence factors are applied to the workload predictions (Step 6) in order to achieve more reliable results. The following subsections present each part of our model in detail.

4.1. Analysis of time series

We can determine the periodicity of the time series with the use of autocorrelation, which represents the cross-correlation of a signal with itself. Autocorrelation shows the similarity between observations as a function of the time lag between them.
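The periodicity analysis described above can be sketched with NumPy. The daily sine signal and the trough-then-peak rule for locating the dominant lag are illustrative assumptions, not the authors' exact procedure:

```python
# Sketch: estimate the dominant period of a workload series via
# autocorrelation. Synthetic signal: 10 days of 15-min samples
# (96 per day) following a daily sine pattern plus noise.
import numpy as np

rng = np.random.default_rng(0)
period = 96                      # 24 h at 15-min resolution
t = np.arange(10 * period)
workload = 50 + 30 * np.sin(2 * np.pi * t / period) + rng.normal(0, 2, t.size)

x = workload - workload.mean()
acf = np.correlate(x, x, mode="full")[x.size - 1:]   # lags 0..N-1
acf = acf / acf[0]                                   # correlation 1 at lag 0

# The first trough marks roughly half a period; the highest point after
# it marks the full period (a plain argmax would just pick lag 1).
trough = int(np.argmin(acf[: 2 * period]))
lag = trough + int(np.argmax(acf[trough: 2 * period]))
print(lag)   # ~96 samples, i.e. one day
```

A correlogram (Fig. 6) is simply this `acf` array plotted against the lag.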
Fig. 6. Correlogram of our time series.

Fig. 7. Lag plot of our time series.
there is a strong correlation between the current workload and the historical workload at four specific time intervals (att1, att2, att3, att4). These attributes are continuous and represent workload values at a certain time in the past: att1 represents the workload 1 day ago, att2 9 days ago, att3 11 days ago and att4 18 days ago. More distant attributes (greater than one month) were not considered because that would result in either a larger learning dataset or a smaller learning dataset with missing values. Aside from that, we have identified two discrete attributes, partofday and weekend. The first specifies the part of the day (1-night, 2-morning, 3-afternoon and 4-evening) and the second tells whether it is a weekend (1-weekend and 0-working day). Because a large number of attributes can negatively affect the behavior of machine-learning methods, the next step is to evaluate each identified attribute and choose only those that contribute the most (i.e. have the highest assessments).

5.1.3. Attribute scoring

For classification, the scoring method eliminated the weekend, part-of-day, att2 and att4 attributes due to their low classification scores. For regression, the scoring method eliminated part-of-day, att2, att4 and minute because they had the lowest scores of all regression attributes. As a result we obtained updated training datasets with additional attributes for both methods (classification and regression), which will be used for testing the machine-learning method.

5.1.4. Datasets

We label the learning datasets ti1, ti2 and ti3. The first dataset includes 8600 learning cases (approx. 3 months), the second includes 14,500 learning cases (approx. 5 months) and the third includes 20,000 cases (approx. 7 months). Workload prediction is performed one day ahead (24 h). Within each learning dataset (3, 5 and 7 months) we perform tests on 3 different testing datasets: day-1, day-2 and day-3. Day-1 represents the testing dataset for the day following the last day of the corresponding training dataset. Day-2 represents the testing data for the day after day-1, and day-3 for the day after day-2.

5.2. Evaluation of results

For the purpose of comparing prediction results of different datasets, we compute mean squared prediction errors. In statistics, the mean squared error (MSE) of an estimator measures the average of the squares of the differences between the estimator and what is estimated (predicted):

MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2    (1)

In our case, the random variable Y represents the true values and the random variable Ŷ represents the predicted values; the symbol n represents the number of observed instances. Figure 8 presents prediction errors, which were computed based on the aforementioned equation for MSE. Additionally, we also computed normalized MSEs, which
Fig. 8. Tables show the MSEs and NMSEs of the methods Clas1_A (classification with 17 classes on extended training data), Clas1_B (classification with 17 classes on basic training data), Clas2_A (classification with 11 classes on extended training data), Clas2_B (classification with 11 classes on basic training data), Reg_A (regression with values 0–170 on extended training data) and Reg_B (regression with values 0–170 on basic training data) on different training data: (a) 3-month time interval; (b) 5-month time interval; and (c) 7-month time interval.
49 were adjusted by MSE values measured on different scales due to To further improve our classification methods, we add confidence 115
50 different number of classes (regression and classification with differ- factors to predictions in order to make them more reliable. Confidence 116
51 ent number of classes), to a notionally common scale. The following factors are specified on the basis of an actual prediction value and 117
52 equation presents the formula for a normalized MSE: Random Forest classifier probabilities of individual classes. Highest 118
53 probabilities from Random Forest classifiers are summed with an 119
1 X ^
54 NMSE ¼ ðY i Y i Þ2 ð2Þ additional value added based on the highest probability class. These 120
nnm
55 factors are then multiplied with original values of Random Forest 121
56 where m stands for the number of classes, with either 11 or 17 in the predictions (Algorithm 3). 122
57 case of classification or 170 in the case of regression. 123
Algorithm 3. Computed highest probability class from the Ran-
58 We perform the following tests. For training data from 3, 5 and 124
dom Forest class probabilities.
59 7 months (ti1, ti2 and ti3) we perform tests for the one-day-ahead 125
60 prediction (24 h) for both basic and extended training datasets. If Require: train’training dataset 126
61 we look at the error numbers for 3 months of training data in Fig. 8 Require: test’testing dataset 127
62 we see that regression (marked as Reg in the table) did not set forest ¼Orange.ensemble.forest. RandomForestLearner 128
63 perform satisfactory enough for our model. Both classification (train) 129
64 methods outperformed our model, which is why we decided to 130
65 rule out the regression method and try to further improve both for all instance in test do 131
66 classification methods. set iclass ¼ instance:get_classðÞ:value 132
Fig. 9. Advanced Random Forest prediction with added confidence factors versus basic Random Forest prediction and advanced Random Forest prediction without confidence factors for the next 24 h on training data ti1 (day 3), ti2 (day 1) and ti3 (day 2). Clas1 represents classification with 17 classes and Clas2 classification with 11 classes: (a) 3 months, Clas1; (b) 5 months, Clas1; (c) 7 months, Clas1; (d) 3 months, Clas2; (e) 5 months, Clas2; and (f) 7 months, Clas2.
36 102
Experimental tests prove that the confidence factors additionally improve our model, which the visual presentations clearly show – the actual workload values are more similar to the Random Forest prediction values with added confidence factors than to the Random Forest prediction values without confidence factors. The MSE values show that a larger learning dataset does not considerably improve prediction accuracy, meaning that 3 months of training data is enough for making accurate predictions.

The graphs in Fig. 9 show the prediction results for the basic prediction method (blue dotted line). This is a Random Forest that uses the basic training data (only attributes from the original dataset). The red dotted line presents the extended prediction method – a Random Forest using the extended training data with confidence factors. The graphs also show the Random Forest prediction values without confidence factors (orange dotted line). Predicting workload with a Random Forest tested on the basic training dataset is significantly worse compared to our approach, which can be concluded from the error table as well as the graphs.

Thus, some methods indicate that the first few hours can be predicted fairly accurately, which can be seen from the visual presentations of the predictions. This is to be expected because workload values from a short time period in the past are the most helpful information for predicting future values.

To pursue this further, the classification method with 11 classes (marked as Clas2) gives fewer errors. If we look at the visual presentation (Fig. 9), where the actual prediction curve is shown, the classification method with 11 classes also gives more useful results as far as critical peak workloads are concerned. In this case, the fluctuations of the predicted workload are closer to those of the actual workload. We have to consider fluctuation as well because it is very important when building a model for preventing a lack of system resources. This means that the classification method with 11 classes would be more appropriate for our model.

Furthermore, we compare our prediction model with the machine-learning method kNN, which is already implemented in the Orange library (Demšar et al., 2013) for Python. We performed tests with different values of kNN's parameter k in the range between 2 and 30. Because it gave the most accurate prediction results, we set the parameter k to 10. The graphs in Fig. 10 show the prediction results for the extended prediction method – a Random Forest using extended training data (red dotted line) – and the kNN prediction method using basic training data (green dotted line). Tests were made on two different approaches (Clas1 and Clas2) and three different training datasets (3, 5 and 7 months of training data). Predictions made with kNN failed on all training and testing datasets. As can be seen from the graphs in Fig. 10, the kNN predictions are very unreliable and deviate significantly from the true values. We can therefore conclude that predicting workload with kNN is not suitable for our observed system.

With experimental results on different training and testing data, we have shown that the use of AME-WPC, which combines our own pattern matching method (TPM) and a Random Forest classifier with an extended training dataset and confidence factors, results in better
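The k sweep described above (k between 2 and 30, keeping the value that gave the most accurate predictions) can be sketched as follows. scikit-learn's KNeighborsClassifier stands in for the Orange kNN implementation the paper used; the helper name and the plain hold-out accuracy criterion are our own simplification.

```python
# Sketch of the kNN baseline sweep: try k = 2..30 and keep the k with the
# best validation accuracy (the paper settled on k = 10). scikit-learn's
# KNeighborsClassifier stands in for the Orange implementation; the helper
# name and the hold-out criterion are our own simplification.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def best_k(X_train, y_train, X_val, y_val, k_range=range(2, 31)):
    """Return (k, accuracy) for the k with the highest validation accuracy."""
    scores = {}
    for k in k_range:
        if k > len(X_train):                      # kNN needs at least k points
            break
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = knn.score(X_val, y_val)       # mean accuracy on hold-out
    k_opt = max(scores, key=scores.get)           # first k among the best
    return k_opt, scores[k_opt]
```

In the paper's setting the sweep would be run per training dataset (3, 5 and 7 months) with the same one-day-ahead test split used for the Random Forest experiments.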
Fig. 10. Advanced Random Forest prediction versus kNN prediction for the next 24 h on training data ti1 (day 3), ti2 (day 1) and ti3 (day 2). Clas1 represents classification with 17 classes and Clas2 classification with 11 classes: (a) 3 months, Clas1; (b) 5 months, Clas1; (c) 7 months, Clas1; (d) 3 months, Clas2; (e) 5 months, Clas2; and (f) 7 months, Clas2.
prediction accuracy compared to the use of the basic training datasets and the basic prediction methods. The results, presented in the form of MSEs and NMSEs (Fig. 8) and graphs (Figs. 9 and 10), show that the prediction accuracy of our proposed model significantly outperforms the basic Random Forest and kNN methods.

6. Conclusion and future work

We believe that the more knowledge we have about the cloud system, the more precisely we can predict its future behavior. It is therefore necessary to consider the historical workload, the type of data that describes system resources, the connections among this data and the current state of the system (workload), upcoming events, projects and other factors that influence the system workload.

In this research paper we presented an Advanced Model for Efficient Workload Prediction in the Cloud (AME-WPC), which combines statistical and learning methods. In order to improve the capabilities of the learning method, we proposed domain-specific database extensions, which we defined through analysis of the system factors that have a strong influence on the application workload (e.g. part of day, holidays and weekends). Additional database extensions were defined using a novel Two-phase Pattern Matching method (TPM). TPM covers two phases: it recognizes similar patterns based on workload value and similar patterns based on workload fluctuation. We addressed the workload prediction problem with classification as well as regression and tested our solution with the machine-learning method Random Forest on both – basic and extended – training data. Finally, we evaluated our proposed model's capabilities on the AuverGrid workload data series, which we obtained from the Grid Workloads Archive (http://gwa.ewi.tudelft.nl/). For evaluation purposes, we included the machine-learning method kNN. Experimental results demonstrated that the use of TPM and Random Forest with extended learning datasets significantly improves the prediction accuracy of workload over time. Our approach can therefore be considered efficient in terms of enabling better resource management and optimal service provisioning, which results in lower operational costs and a more stable environment.

As part of our future work, we intend to improve our prediction model while developing an advanced model for automatic resource scaling that will make scaling decisions based on the predicted workload. Additionally, we will include detection of events that are considered to influence workload fluctuations. AME-WPC can also be dynamically adapted after a certain period of time through a redefined learning dataset, based on the most recent historical workload data, in order to further improve the prediction accuracy of the model. As a result, we will implement a prototype system whose performance will be tested in real-world scenarios. We believe that the introduction of such systems can bring the automation of cloud management to a whole new level.

References

Ardagna D, Casolari S, Colajanni M, Panicucci B. Dual time-scale distributed capacity allocation and load redirect algorithms for cloud systems. J Parallel Distrib Comput 2012;72(6):796–808.
Bacigalupo D, Jarvis S, Nudd G. An investigation into the application of different performance prediction techniques to e-commerce applications. In: Proceedings of the 18th international parallel and distributed processing symposium (IPDPS). IEEE; 2004. p. 248–55.
Bacigalupo DA, Jarvis SA, He L, Spooner DP, Dillenberger DN, Nudd GR. An investigation into the application of different performance prediction methods to distributed enterprise applications. J Supercomput 2005;34(2):93–111.
Bacigalupo DA, van Hemert J, Usmani A, Dillenberger DN, Wills GB, Jarvis SA. Resource management of enterprise cloud systems using layered queuing and historical performance models. In: IEEE international symposium on parallel & distributed processing, workshops and PhD forum (IPDPSW). IEEE; 2010. p. 1–8.
Bacigalupo DA, van Hemert J, Chen X, Usmani A, Chester AP, He L, et al. Managing dynamic enterprise and urgent workloads on clouds using layered queuing and historical performance models. Simul Model Pract Theory 2011;19(6):1479–95.
Bacigalupo D. Performance prediction-enhanced resource management of distributed enterprise systems [Ph.D. thesis]. UK: University of Warwick, Department of Computer Science; 2006.
Ban T, Zhang R, Pang S, Sarrafzadeh A, Inoue D. Referential kNN regression for financial time series forecasting. Neural Inf Process 2013;8226:601–8.
Breiman L. Random forests. Mach Learn 2001;45:5–32.
Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst 2009;25(6):599–616.
Cao L. Support vector machines experts for time series forecasting. Neurocomputing 2003;51:321–39.
Caron E, Desprez F, Muresan A. Forecasting for grid and cloud computing on-demand resources based on pattern matching. In: Proceedings of the IEEE second international conference on cloud computing technology and science. IEEE; 2010a. p. 456–63.
Caron E, Desprez F, Muresan A. Forecasting for cloud computing on-demand resources based on pattern matching. Technical report; February, 2010b.
Caron E, Desprez F, Muresan A. Pattern matching based forecast of non-periodic repetitive behavior for cloud clients. J Grid Comput 2011;9(1):49–64.
Chang Y-C, Chang R-S, Chuang F-W. A predictive method for workload forecasting in the cloud environment. In: Advanced technologies, embedded and multimedia for human-centric computing; 2014. p. 577–85.
Chen Y, Yang B, Dong J, Abraham A. Time-series forecasting using flexible neural tree model. Inf Sci 2005;174(3–4):219–35.
Cortez P, Rio M, Rocha M, Sousa P. Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Syst 2012;29(2):143–55.
Delft TU. Gwa-t-4 auvergrid. URL 〈http://gwa.ewi.tudelft.nl/datasets/gwa-t-4-auvergrid〉.
Delft TU. The grid workloads archive. URL 〈http://gwa.ewi.tudelft.nl/〉.
Demšar J, Curk T, Erjavec A, Gorup Č, Hočevar T, Milutinovič M, et al. Orange: data mining toolbox in Python. J Mach Learn Res 2013;14:2349–53.
Donate JP, Li X, Sánchez GG, de Miguel AS. Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm. Neural Comput Appl 2013;22(1):11–20.
Doulamis N, Doulamis A, Litke A, Panagakis A, Varvarigou T, Varvarigos E. Adjusted fair scheduling and non-linear workload prediction for QoS guarantees in grid computing. Comput Commun 2007;30(3):499–515.
Eaton JW. GNU Octave manual. Network Theory Limited; 2002.
Eddahech A, Chtourou S, Chtourou M. Hierarchical neural networks based prediction and control of dynamic reconfiguration for multilevel embedded systems. J Syst Archit 2013;59(1):48–59.
Frank R, Davey N, Hunt S. Time series prediction and neural networks. J Intell Robot Syst 2001;31(1–3):91–103.
Ganapathi A, Yanpei C, Fox A, Katz R, Patterson D. Statistics-driven workload modeling for the cloud. In: 2010 IEEE 26th international conference on data engineering workshops (ICDEW). IEEE; 2010. p. 87–92.
Gmach D, Rolia J, Cherkasova L, Kemper A. Workload analysis and demand prediction of enterprise data center applications. In: Proceedings of the 2007 IEEE 10th international symposium on workload characterization. IEEE; 2007. p. 171–80.
Gong Z, Gu X, Wilkes J. PRESS: predictive elastic resource scaling for cloud systems; 2010. p. 9–16.
Imam M, Miskhat S. Neural network and regression based processor load prediction for efficient scaling of grid and cloud resources. In: Proceedings of the 14th international conference on computer and information technology (ICCIT); 2011. p. 333–8.
Imandoust SB, Bolandraftar M. Application of K-nearest neighbor (KNN) approach for predicting economic events: theoretical background. Int J Eng Res Appl 2013;3(5):605–10.
Islam S, Keung J, Lee K, Liu A. Empirical prediction models for adaptive resource provisioning in the cloud. Future Gener Comput Syst 2012;28(1):155–62.
Jones E, Oliphant T, Peterson P, et al. SciPy: open source scientific tools for Python; 2001.
Kalantari M, Akbari M. Grid performance prediction using state-space model. Concurr Comput: Pract Exp 2009;21(9):1109–30.
Kalekar PS. Time series forecasting using Holt–Winters exponential smoothing. Technical report; 2004.
Khan A, Anerousis N. Workload characterization and prediction in the cloud: a multiple time series approach. In: 2012 IEEE network operations and management symposium. IEEE; 2012. p. 1287–94.
Kononenko I. Estimating attributes: analysis and extensions of relief. In: Machine learning: ECML-94. Springer; 1994. p. 171–82.
Li S-T, Cheng Y-C. A stochastic HMM-based forecasting model for fuzzy time series. IEEE Trans Syst Man Cybern Part B: Cybern 2010;40(5):1255–66.
Li Q, Hao Q-F, Xiao L-M, Li Z-J. An integrated approach to automatic management of virtualized resources in cloud environments. Comput J 2011;54(6):905–19.
Li T. A hierarchical framework for modeling and forecasting web server workload. J Am Stat Assoc 2005;100(471):748–63.
Liu X, Ni Z, Yuan D, Jiang Y, Wu Z, Chen J, et al. A novel statistical time-series pattern based interval forecasting strategy for activity durations in workflow systems. J Syst Softw 2011;84(3):354–76.
Manvi SS, Shyam GK. Resource management for infrastructure as a service (IaaS) in cloud computing: a survey. J Netw Comput Appl 2014;41:424–40.
Mentzer J, Moon M. Sales forecasting management. Thousand Oaks; 2004. p. 73–112.
Montes J, Sánchez A, Pérez MS. Grid global behavior prediction. In: 2011 11th IEEE/ACM international symposium on cluster, cloud and grid computing. IEEE; 2011. p. 124–33.
Quiroz A, Kim H, Parashar M, Gnanasambandam N, Sharma N. Towards autonomic workload provisioning for enterprise grids and clouds. In: Proceedings of the 2009 10th IEEE/ACM international conference on grid computing. IEEE; 2009. p. 50–7.
Robnik-Šikonja M. Improving random forests. Lecture Notes in Computer Science, vol. 3201. 2004. p. 359–70.
Roy N, Dubey A, Gokhale A. Efficient autoscaling in the cloud using predictive models for workload forecasting. In: Proceedings of the 2011 IEEE 4th international conference on cloud computing. IEEE; 2011. p. 500–7.
Saadatfar H, Fadishei H, Deldari H. Predicting job failures in AuverGrid based on workload log analysis. New Gener Comput 2012;30(1):73–94.
Sarikaya R, Isci C, Buyuktosunoglu A. Runtime workload behavior prediction using statistical metric modeling with application to dynamic power management. In: 2010 IEEE international symposium on workload characterization (IISWC); 2010. p. 1–10.
Sun YS, Chen Y-F, Chen MC. A workload analysis of live event broadcast service in cloud. Procedia Comput Sci 2013;19:1028–33. [In: The 4th International Conference on Ambient Systems, Networks and Technologies (ANT 2013), the 3rd International Conference on Sustainable Energy Information Technology (SEIT-2013)].
Troncoso Lora A, Riquelme Santos JM, Riquelme JC, Gómez Expósito A, Martínez Ramos JL. Time-series prediction: application to the short-term electric energy demand. Curr Top Artif Intell 2004;3040:577–86.
Vercauteren T, Aggarwal P. Hierarchical forecasting of Web server workload using sequential Monte Carlo training. IEEE Trans Signal Process 2007;55(4):1286–97.
Wu Y, Hwang K, Yuan Y. Adaptive workload prediction of grid performance in confidence windows. IEEE Trans Parallel Distrib Syst 2010;21(7):925–38.
Yang J, Liu C, Shang Y, Cheng B, Mao Z, Liu C, et al. A cost-aware auto-scaling approach using the workload prediction in service clouds. Inf Syst Front 2014;16(1):7–18.
Zhang H, Jiang G, Yoshihira K, Chen H, Saxena A. Intelligent workload factoring for a hybrid cloud computing model. In: Proceedings of the 2009 congress on services — I. IEEE; 2009. p. 701–8.
Zhang GP. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003;50:159–75.