
Machine Learning Based Workload Prediction in Cloud Computing
Jiechao Gao, Haoyu Wang and Haiying Shen
Department of Computer Science
University of Virginia
Charlottesville, VA, USA
{jg5ycn, hw8c, hs6ms}@virginia.edu

Abstract—As cloud computing becomes a widely used IT service, more and more companies shift their services to cloud datacenters. It is important for cloud service providers (CSPs) to provide cloud resources with high elasticity and cost-effectiveness while achieving good quality of service (QoS) for their clients. However, meeting QoS with cost-effective resources is a challenging problem for CSPs because the workloads of Virtual Machines (VMs) vary over time. An accurate VM workload prediction method is therefore needed for resource provisioning, so that cloud resources can be managed efficiently. In this paper, we first compare the performance of representative state-of-the-art workload prediction methods. We suggest a method that conducts the prediction a certain time before the predicted time point in order to allow sufficient time for task scheduling based on the predicted workload. To further improve the prediction accuracy, we introduce a clustering based workload prediction method. It first clusters all the tasks into several categories and then trains a prediction model for each category. Trace-driven experiments based on the Google cluster trace demonstrate that our clustering based workload prediction methods outperform the comparison methods and improve the prediction accuracy to around 90% for both CPU and memory.

Index Terms—Cloud Computing, Machine learning, Workload Prediction

I. INTRODUCTION

Cloud computing is a widely used IT service that provides various services under one roof. Multiple types of services, such as storage, computing and web hosting, can now be provided by one cloud service provider. Many businesses move their services to clouds because of flexible service models such as the pay-as-you-go business model [1]. The elasticity of this service model brings cost savings for most businesses by eliminating the need to develop, maintain and scale a large private infrastructure [2].

Cloud computing plays an important role nowadays because it allows clients to use cloud resources in a pay-as-you-go fashion. It can satisfy the resource requirements of clients, so they need not worry about two risks. The first is overprovisioning a service whose resource utilization falls short of the predictions, which wastes costly resources. The second is underprovisioning a service that becomes popular in the future, which misses potential revenue [3].

Using hardware virtualization, cloud service providers let a physical machine (PM) run multiple virtual machines (VMs) (i.e., tasks) with different resource allocations. A cloud hosts multiple applications on the VMs. Since the load of each VM on a PM varies over time, a PM may become overloaded, i.e., the resource demand from its VMs exceeds the resources it possesses. Such load imbalance in a PM adversely affects the performance of all the VMs (and hence the applications) running on the PM. Insufficient resource provisioning for customer applications also violates the Service Level Agreement (SLA) [4]. An SLA is an agreement between a cloud customer and the cloud service provider that guarantees the application performance of the customer. To uphold the SLA, a cloud service provider must prevent PM overload and ensure that VMs receive their demanded resources. As shown in Figure 1, the prediction model obtains historical data from the resource manager and sends the predicted workload of each task back to the resource manager. The resource manager can then place each VM on a PM according to the predicted results.

As cloud data centers are often oversubscribed, resources such as CPU and bandwidth are stretched thin because they are shared across many tenants. In particular, when VMs with intense resource requirements are located on the same PM, they compete for scarce resources, which may lead to poor application performance. Much research effort has been devoted to developing strategies for resource provisioning in the initial VM allocation and VM migration phases. Recently, some methods [5]–[11] have been proposed to predict short-term VM resource demand for sufficient resource provisioning or load balancing. In proactive load balancing, a PM predicts whether it will be overloaded by predicting its VMs' resource demands and moves out VMs when necessary. In previous research, statistical approaches [12]–[15], machine learning (ML) approaches [16]–[20] and deep learning approaches [21]–[28] have been used for resource demand prediction. However, no previous work has conducted a comparison study of these prediction approaches.

To better understand these approaches, we compare representative methods from each of them using the Google cluster trace [29]. In previous prediction methods, there is no time gap between the input workload data points and the predicted workload data point (we call this 0-gap prediction). This may leave little time for task scheduling based on the predicted workload. We propose m-gap prediction, which keeps a gap of m time points between the input data points and the predicted data point in order to leave enough time for

Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE MINAS GERAIS. Downloaded on November 29,2023 at 00:01:05 UTC from IEEE Xplore. Restrictions apply.
978-1-7281-6607-0/20/$31.00 ©2020 IEEE
the task scheduling. Our experimental results show that m-gap prediction does not compromise the prediction accuracy of 0-gap prediction. Also, previous prediction methods build one prediction model for all the tasks, which may not capture the patterns of all the heterogeneous tasks well enough for accurate prediction. To improve the accuracy of the previous prediction methods, we propose a clustering-based prediction method. It groups tasks with similar workload patterns and trains a model on each group. It then predicts a task's workload with the model of that task's group.

[Fig. 1. Overview of workload prediction procedure: clients, PMs hosting VMs, the resource manager, and the prediction model, which receives historical data and returns the predicted workload.]

Our contributions in this paper are as follows:
(1) We conduct an experimental comparison of the prediction accuracy of state-of-the-art prediction methods from statistical, machine learning and deep learning approaches.
(2) We propose m-gap prediction, which keeps a gap of m time points between the input data points and the predicted data point in order to leave enough time for task scheduling based on the predicted workload.
(3) We propose a clustering-based prediction method for higher prediction accuracy, using two clustering algorithms. The method first clusters all the tasks into several categories and then generates a model for each task category. Since each model can capture the distinct features of its task category, the prediction accuracy is much better than that of prediction methods that have only one model for all the tasks.
(4) We implement our proposed methods and conduct extensive experiments. The experimental results show that m-gap prediction does not compromise the accuracy of the 0-gap prediction used in previous prediction methods. Also, our clustering-based prediction method achieves much better accuracy than all the previous methods.

The rest of the paper is organized as follows. Section II presents the related work. Section III presents the Google cluster trace preparation and the measurement results for state-of-the-art prediction methods. Section IV presents our clustering-based prediction methods and their performance evaluation. Section V concludes the paper with remarks on our future work.

II. RELATED WORK

We classify previous resource demand (i.e., workload) prediction works into three categories: statistical approaches, machine learning approaches and deep learning approaches.

Statistical approaches. Statistical approaches are popular for workload prediction. Khan et al. [12] discovered repeatable workload patterns of VMs and then introduced an approach based on Hidden Markov Modeling to characterize and predict workload patterns. Jiang et al. [13] presented an online temporal data mining system called ASAP, which models and predicts cloud VM demand using a Moving Average (MA) model. Morais et al. [14] proposed a framework for implementing auto-scaling services. It is based on several CPU utilization prediction methods, including Auto Correlation (AC), linear regression (LR), auto regression (AR) and Auto Regression Integrated Moving Average (ARIMA), among others. Gong et al. [15] developed PRESS, which uses a pattern matching and state-driven approach to predict workloads. It first employs signal processing techniques to check whether the CPU utilization in a VM exhibits repeating patterns. If so, the repeating patterns are used to predict future workloads. Otherwise, PRESS employs a statistical state-driven approach and uses a discrete-time Markov chain to predict the demand in the near future. However, many datasets are unstable (i.e., the variance between neighboring data points is large or some points are missing). These time series prediction approaches employ a linear prediction structure, which is more suitable for stable datasets.

Machine learning approaches. Machine learning approaches are widely used for VM workload prediction as well. Imam et al. [30] presented time delay neural network (NN) and regression methods to predict the workload of each VM. Farahnakian et al. [31] developed resource measurement and provisioning strategies using NN and linear regression to predict upcoming VM demands. Bankole et al. [17] developed a cloud client prediction model to predict the resource demand of each VM using three machine learning models: support vector regression, NN and linear regression. Islam et al. [16] proposed an approach using NN and Linear Regression algorithms to predict the future CPU load of a VM, and concluded that NN surpasses Linear Regression in accuracy. In addition, they showed that the accuracy of both algorithms depends on the input window size. Nikravesh et al. [19], [20] evaluated the Support Vector Machine (SVM), NN and Linear Regression machine learning prediction methods. They found that if the resource utilization of each VM changes periodically, SVM has better prediction accuracy than the other methods. However, when the dataset is large (such as the Google cluster trace), SVM cannot achieve high prediction accuracy, as indicated in [32]. Gopal et al. [33] proposed a Bayesian model for resource prediction of each VM and compared it with linear regression and support vector regression. They observed that with the Bayesian model, the workloads of approximately 75% of the servers in a datacenter could be
predicted with accuracies over 80%.

Deep learning approaches. Deep learning approaches have also been applied to workload prediction in recent years. Qiu et al. [22] presented a deep learning approach for VM workload prediction in the cloud. It consists of a Deep Belief Network (DBN) and a regression layer. Zhang et al. [25] presented an efficient deep learning model based on canonical polyadic decomposition to predict the workload of each VM. Their model achieves a high training speedup because it uses the canonical polyadic decomposition to compress the parameters significantly, with only a small drop in classification accuracy. Zhang et al. [24] proposed a DBN-based approach for predicting the cloud resource requests of each task. It can be used for long-term and short-term prediction, with improved accuracy over existing methods. Kumar et al. [26] developed prediction models based on Long Short Term Memory (LSTM) networks [34]. The proposed model was tested on three benchmark datasets of web server logs: the HTTP traces of the NASA, Calgary and Saskatchewan servers. Song et al. [27] applied an LSTM-based model to the Google cluster trace. It predicts the mean load over consecutive future time intervals and the actual load multiple steps ahead, and achieves high prediction accuracy in a traditional distributed system.

III. TRACE ANALYSIS AND MEASUREMENT

A. Google Cluster Trace

The Google cluster trace starts at 19:00 EDT on Sunday, May 1, 2011. It records 29 days of CPU and memory usage of each task on a Google cluster of about 12.5k machines. A job consists of one or more tasks, each of which is accompanied by a set of resource requirements. The trace contains 672,075 jobs and more than 48 million tasks over the 29 days. Each usage record is a randomly picked 1-second sample of CPU/memory usage from within the associated 5-minute usage-reporting period of each task. We use the entire trace to conduct the trace-driven experiments and the measurements of the comparison methods.

We notice that in the Google cluster trace, the CPU and memory usage of some tasks are zero for a period of time. As indicated in [29], this does not necessarily mean that the task is paused, and there could be many reasons for it. According to [29], when a measurement occurs while the monitoring system or the machine hosting it is overloaded, the memory and CPU usage of a task may not be collected and is then set to 0. In some cases, a task has no running process for an extended period of time. Also, measurement records may be missing. Both situations generate pathological data. These zero values make the data non-linear, so it may be difficult to predict resource usage with a statistical model. It is therefore important to find prediction algorithms that deal better with pathological data.

B. Statistical and Machine Learning Methods

We implement the whole experiment on a local machine based on TensorFlow. Recall that the workload prediction approaches can be classified into three groups: statistical approaches, machine learning approaches and deep learning approaches. We choose a representative for each group based on its reported performance. ARIMA [35] represents the statistical approaches because it achieves an average prediction accuracy over 70% compared with other statistical methods in [36]. Support Vector Regression (SVR) [37], [38] and Bayesian Ridge Regression [33], [39] represent the machine learning approaches because [19], [20] report them as the most effective prediction algorithms for cloud systems. LSTM [27], [34] represents the deep learning approaches. In host load prediction, [27] shows that LSTM achieves higher accuracy than the autoregressive method, an artificial neural network and ARIMA.

1) Auto Regression Integrated Moving Average (ARIMA): ARIMA is widely used in time series analysis. It is a generalization of the autoregressive moving average (ARMA) model. Both models are fitted to time series data, either to better understand the data or to predict future points in the series. ARIMA can be applied to unstable datasets via one or more differencing steps. Differencing is a data transformation applied to the time series to make it more stable (stationary).

2) Bayesian Ridge Regression (BRR): BRR performs better when dealing with pathological data. BRR uses a probabilistic model for regression problems. When pathological data occur, the prediction variances are large, so predictions may be far from the actual values. By adding a degree of bias to the regression, BRR reduces the standard errors and thus achieves better prediction accuracy.

3) Support Vector Regression (SVR): SVR computes a linear regression function in a multivariate feature space, into which the input data are mapped via a non-linear function [40]. The model produced by SVR depends only on a subset of the training data. This is because the cost function used to build the model ignores any training data that lie within a margin (ε) of the model prediction. In other words, during training SVR puts more weight on the data points farther from the predicted value. The model therefore pays more attention to those points and captures more of the possible patterns in a dataset.

4) Long Short Term Memory (LSTM): LSTM is a neural network model widely used in deep learning. It can be applied to fit time-series data. For the pathological data problem, it can suppress pathological data via its forget gate and input gate and thus achieve high prediction accuracy.

For each method above, we use the same experiment settings as the corresponding previous paper. We follow [36] for ARIMA, [38] for SVR, [33] for BRR and [27] for LSTM.
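All four models above are trained on fixed-length windows of a task's past usage. The pure-Python sketch below is illustrative only and is not the authors' implementation. It shows one way to:

- build (input window, target) pairs with a gap of m time points, where m = 0 gives the 0-gap setting defined in Section III-C;
- split those pairs chronologically;
- score each prediction with the accuracy of Eq. (1) in Section III-D;
- read off one point of the accuracy CDF.

The function names, the mean-of-window stand-in predictor and the choice to skip points whose real value is 0 (where Eq. (1) is undefined) are all assumptions.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[float], float]


def make_samples(series: Sequence[float], w: int, m: int = 0) -> List[Sample]:
    """Build (input window, target) pairs from one task's usage series.

    The w points ending just before time n - m are the inputs and the value
    at time n is the target, so m = 0 reproduces 0-gap prediction.
    """
    samples = []
    for start in range(len(series) - w - m):
        window = list(series[start:start + w])
        target = series[start + w + m]
        samples.append((window, target))
    return samples


def chronological_split(samples: List[Sample], train_frac: float):
    """Keep time order: the first train_frac of the samples train, the rest test."""
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]


def accuracy(predicted: float, real: float) -> Optional[float]:
    """Eq. (1): A_n = 1 - |P_n - R_n| / R_n; None when R_n == 0 (an assumption)."""
    if real == 0:
        return None
    return 1 - abs(predicted - real) / real


def empirical_cdf(values: Sequence[float], x: float) -> float:
    """Fraction of accuracy values <= x, i.e. one point on a CDF curve."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


def window_mean(window: List[float]) -> float:
    """Hypothetical stand-in predictor; ARIMA, BRR or LSTM would go here."""
    return sum(window) / len(window)


if __name__ == "__main__":
    cpu = [0.20, 0.22, 0.25, 0.24, 0.30, 0.28, 0.27, 0.31, 0.29, 0.33]
    samples = make_samples(cpu, w=3, m=3)            # m-gap with w = 3, m = 3
    train, test = chronological_split(samples, 0.6)  # 60/40 split, as for m-gap
    # A real model would be fit on `train`; the stand-in needs no fitting.
    accs = []
    for window, real in test:
        a = accuracy(window_mean(window), real)
        if a is not None:
            accs.append(a)
    print("per-prediction accuracy:", [round(a, 3) for a in accs])
    print("CDF at 0.9:", empirical_cdf(accs, 0.9))
```

With m = 0 and train_frac = 0.8, the same helpers give the 0-gap setting. On the series 1, 2, ..., 8 with w = 3 and m = 3, `make_samples` yields the two pairs drawn in Fig. 3 (points 1 to 3 predict point 7, and points 2 to 4 predict point 8). Returning None for zero real values is only one option, because this section does not say how such points are scored.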

Then, we compare these methods under 0-gap prediction and m-gap prediction as follows.

C. Prediction Methods

1) 0-gap Prediction: This setting is used by previous prediction methods. The w data points before the $n$th time point are used as inputs to predict the value at the $n$th time point, denoted by $V_n$, where w is the window size. For each task, we use the first 80 percent of the trace as the training set and the remaining 20 percent as the testing set. We call it 0-gap prediction because there is no time gap between the time of the inputs and the time of the output value.

[Fig. 2. An example of 0-gap prediction: inputs at time points 1 to 3 produce the output P4 (prediction value), which is compared with the real value at time point 4.]

Figure 2 shows a simple example of the training and testing procedure of 0-gap prediction. The numbers in the squares denote the time sequence. Take the testing procedure as an example. The red squares represent the input data of the testing, and the number of red squares is the window size w. In this example, w = 3. The blue square represents the real value of a time point, and the black square represents the predicted value of that time point. Three data points (from the 1st to the 3rd time points) are used as input data to obtain the predicted value of the 4th time point. This prediction is compared with the real value of the 4th point to compute the accuracy.

2) m-gap Prediction: In the above 0-gap prediction method, there is no time gap between the inputs and the output. The prediction model may therefore output the predicted value of $V_n$ only after the real workload of $V_n$ has already occurred. Even if the prediction is completed before the real occurrence, little time may be left for scheduling before the real workload of $V_n$ occurs. Therefore, we propose a method called m-gap prediction. The w data points before the $(n-m)$th time point are used as inputs to predict the value at the $n$th time point. For each task, we use the first 60 percent of the trace as the training set and the remaining 40 percent as the testing set.

[Fig. 3. An example of m-gap prediction: inputs at time points 1 to 3, followed by a gap, produce the output P7 (prediction value). The window then slides by one time point to predict P8.]

Figure 3 shows the training and testing procedure of m-gap prediction. Unlike 0-gap prediction, m-gap prediction has a time gap of m between the last time point of the input data and the output time point, as shown by the gray squares. In this example, the window size w = 3 and the gap m = 3. Three data points (from the 1st to the 3rd time points) are used as input data to obtain the predicted value of the 7th time point. This prediction is compared with the real value of the 7th point to compute the accuracy.

D. Metrics

To illustrate the performance of the above methods, we use the following metrics.

1) CDF for Accuracy: We use the cumulative distribution function (CDF) of the accuracy to show performance. As in [41], the prediction accuracy is calculated by:

$$A_n = 1 - \frac{|P_n - R_n|}{R_n} \qquad (1)$$

where $A_n$ is the prediction accuracy of the $n$th prediction, $P_n$ is the predicted value of the $n$th prediction and $R_n$ is the real value in the $n$th prediction.

2) CDF for Accuracy with Different Window Sizes: To show the influence of the window size on accuracy, we show the CDF of accuracy with different window sizes for the best models in accuracy.

E. Experimental Results

1) 0-gap Prediction Results: We first evaluate 0-gap prediction. Figure 4 shows the CDF of the prediction accuracy of the CPU usage for the four comparison methods. The ordering is SVR < ARIMA ≈ LSTM < BRR.

SVR performs worst. Because the Google cluster trace is large, SVR cannot achieve high prediction accuracy, as indicated in [32]. ARIMA and LSTM perform better than SVR but worse than BRR. For ARIMA, as mentioned previously, the Google cluster trace contains pathological data. Because ARIMA builds a linear prediction model, it does not handle such data well. For LSTM, the neural network model can fit the non-linearities of a dataset. However, it does not perform well for short-term tasks that have few data points for training, as indicated in [34], which makes it worse than BRR. BRR achieves the highest accuracy of all the methods. BRR uses the Levenberg-Marquardt algorithm [42], which is suited to non-linear datasets. Thus, BRR achieves the best performance among the four methods despite the pathological data and the tasks with few training data points. Since SVR performs worse than the other three methods, we do not discuss it in the rest of the experiments.

Figure 5 shows the CDF of the prediction accuracy of the CPU usage with window sizes of 1, 10 and 50. For these three cases, the ordering is window
size=1 < window size=10 ≈ window size=50. A larger window size introduces more input values into the model and thus achieves more accurate results. However, a larger window size leads to much longer training and testing time [43]. Window size=10 has similar prediction accuracy to window size=50 but lower overhead, so we use window size=10 unless otherwise specified.

[Fig. 4. CDF of CPU accuracy for 0-gap prediction (ARIMA, BRR, LSTM, SVR; x-axis: Accuracy, y-axis: CDF).]

[Fig. 5. CDF of CPU accuracy with different window sizes for 0-gap prediction: (a) ARIMA, (b) BRR, (c) LSTM; window size = 1, 10, 50.]

[Fig. 6. CDF of memory accuracy for 0-gap prediction (ARIMA, BRR, LSTM, SVR).]

[Fig. 7. CDF of memory accuracy with different window sizes for 0-gap prediction: (a) ARIMA, (b) BRR, (c) LSTM; window size = 1, 10, 50.]

Similar to Figures 4 and 5, Figures 6 and 7 show the results for memory prediction. For the same reasons, the results follow the same trend and order for all the comparison methods and window sizes.

2) m-gap Prediction Results: Figure 8 shows the CDF of the prediction accuracy of the CPU usage in m-gap prediction. As in 0-gap prediction, the ordering is ARIMA ≈ LSTM < BRR, for the same reasons. ARIMA does not handle pathological datasets well. LSTM's neural network model can fit the non-linearities of the dataset, but it performs poorly for short-term tasks with few training data points, which makes it worse than BRR. BRR achieves the highest accuracy of all the methods because it can handle pathological datasets and few training data points.

Figure 9 shows the CDF of the prediction accuracy of the CPU usage with window sizes of 1, 10 and 50 for m-gap prediction. For these three cases, even with the gap m = 10, the ordering is window size=1 < window size=10 ≈ window size=50, the same as in Figure 5.

Similar to Figures 6 and 7, Figures 10 and 11 show the results for memory prediction. For the same reasons, the results follow the same trend and order for all the comparison methods and window sizes.

IV. CLUSTERING BASED PREDICTION METHODS

Different tasks have different workload features, so it may be difficult for one model to capture all of them and predict resource utilization with high accuracy. In contrast, one prediction model can achieve higher accuracy for tasks with similar workload features. Therefore, to achieve higher prediction accuracy, we propose the clustering based prediction methods introduced below.

A. Clustering Methods

So far, we have built one model to predict the workloads of all tasks. Since different tasks have different workload features, we would like to see whether building one model for each set of similar tasks improves the prediction accuracy. We therefore first cluster similar tasks into groups and then build one model for each task cluster with the same machine learning algorithm. For a given task, we use the model of the task's category to predict its workload. We use the two clustering methods explained below.

1) Prototype-based Clustering Method: Prototype-based clustering (PBC) groups tasks around cluster centers so that the distance from every task to its center is minimized. With N clusters, the tasks (described by CPU and memory usage data) are divided into N parts. The division is chosen so that, for each part, the total distance between each task description (or data point) and the center is the smallest.
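The prototype-based grouping just described can be sketched in plain Python. The sketch also covers the zero-padding and K-means choice that the authors explain next. It is an illustrative Lloyd's-algorithm K-means, not the authors' code, and it makes several assumptions:

- Each task is a single usage series. How CPU and memory series are combined into one vector is not specified here.
- The helper names are hypothetical.
- The random initialization is an assumption.
- A new task is mapped to the group with the nearest center, which is one possible way of "choosing the model of the task's category".

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def pad_with_zeros(tasks: Sequence[Sequence[float]]) -> List[Vector]:
    """Append zeros so every task series has the length of the longest one."""
    length = max(len(t) for t in tasks)
    return [list(t) + [0.0] * (length - len(t)) for t in tasks]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], n_clusters: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each task joins the cluster with the nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its member tasks.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def nearest_cluster(task: Sequence[float], centers: List[Vector]) -> int:
    """Hypothetical helper: pad/truncate a task and pick the closest center."""
    length = len(centers[0])
    padded = list(task[:length]) + [0.0] * max(0, length - len(task))
    return min(range(len(centers)), key=lambda c: squared_distance(padded, centers[c]))


if __name__ == "__main__":
    cpu_series = [[0.10, 0.10, 0.10], [0.12, 0.10], [0.90, 0.95, 0.90], [0.85, 0.90, 0.92]]
    labels, centers = kmeans(pad_with_zeros(cpu_series), n_clusters=2)
    print("cluster of each task:", labels)
    # One model per cluster would then be trained; a new task uses its cluster's model.
    print("new task goes to cluster", nearest_cluster([0.88, 0.90], centers))
```

The text uses N = 5 clusters on the full trace. Two clusters are used here only because the toy data has two obvious groups.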

[Fig. 8. CDF of CPU accuracy for m-gap prediction (ARIMA, BRR, LSTM; x-axis: Accuracy, y-axis: CDF).]

[Fig. 9. CDF of different window sizes in CPU for m-gap prediction: (a) ARIMA, (b) BRR, (c) LSTM; window size = 1, 10, 50.]

[Fig. 10. CDF of memory accuracy for m-gap prediction (ARIMA, BRR, LSTM).]

[Fig. 11. CDF of memory accuracy with different window sizes for m-gap prediction: (a) ARIMA, (b) BRR, (c) LSTM; window size = 1, 10, 50.]

The K-means [44] and Gaussian Mixture Clustering [45] algorithms are the two main methods for PBC.

K-means is a distance-based iterative algorithm. It partitions all the observations into N clusters so that each observation is closer to the center of its own cluster than to any other cluster center. To minimize the squared error, K-means uses iterative optimization to approximate the target. Gaussian Mixture clustering instead uses a probability model to express the clustering prototype.

One problem is that the tasks have different lengths, while the PBC clustering methods require all data points to have the same length. To solve this problem, we append zeros to the end of each task so that all the tasks have the same length. In our experiments, K-means performs better than Gaussian Mixture Clustering, so we use K-means in our prediction methods.

The input of the PBC method is all the tasks in the dataset. We cluster the tasks into different subsets. For each subset, we use one of the three prediction algorithms (i.e., ARIMA, BRR and LSTM) to build a model. As a result, N subsets lead to N models. A larger N leads to higher computation overhead. A smaller N leads to lower prediction accuracy, since fewer subsets may not capture the workload features. We use N = 5 by default because our experiments show that it achieves a good tradeoff between prediction accuracy and computation overhead.

2) Density-based Clustering Method: The density-based clustering (DBC) method [46] is a non-parametric clustering method. It first finds the points within high-density areas and then groups together points that are close. Each remaining outlier point that lies alone in a low-density area is assigned to the nearest group, so that all the points are finally clustered. One difference between K-means and DBC is that the number of groups, N, is set manually in K-means but is determined by DBC itself. DBC clusters all the tasks into 5 groups.

Also, DBC discovers clusters by continuously connecting high-density points in a neighborhood. It therefore only needs the neighborhood size and density thresholds to be defined, and it can find clusters of different shapes and sizes. We use DBC in our prediction methods. The input of DBC is all the tasks in the dataset. We cluster the tasks into different subsets and directly apply the previous prediction algorithms to each subset. In this way, tasks with the same pattern are clustered together, and a model is trained on each subset for better prediction accuracy.

B. Prediction Procedure

Combining the different clustering methods and prediction methods gives PBC-ARIMA, DBC-ARIMA, PBC-BRR, DBC-BRR, PBC-LSTM and DBC-LSTM. The clustering method clusters all the tasks into several groups. Then, we use the data in each group for training to build a model. To predict a task's workload, we map the task to a task group based on its features and then use the model of the corresponding group to conduct the prediction.

C. Experimental Results

Now we evaluate the performance of the PBC and DBC based prediction methods. Since there is no big difference for predic-

Authorized licensed use limited to: UNIVERSIDADE FEDERAL DE MINAS GERAIS. Downloaded on November 29,2023 at 00:01:05 UTC from IEEE Xplore. Restrictions apply.
1 1 1 1
window size=1 window size=1 window size=1
0.8 PBC-ARIMA 0.8 0.8 0.8
window size=10 window size=10 window size=10
0.6 PBC-BRR 0.6 window size=50 0.6 window size=50 0.6 window size=50
CDF

CDF

CDF
CDF
0.4 0.4 0.4 0.4
PBC-LSTM
0.2
0.2 0.2 0.2
0
0 20 40 60 80 100 0 0 0
0 20 40 60 80 100 0 20 40 60 80 100 0 20 40 60 80 100
Accuracy Accuracy Accuracy Accuracy
(a) PBC-ARIMA (b) PBC-BRR (c) PBC-LSTM
Fig. 12. CDF of CPU accuracy in
PBC-based prediction. Fig. 13. CDF of CPU accuracy with different window sizes in PBC-based prediction.

1 1 1 1
window size=1 window size=1 window size=1
0.8 PBC-ARIMA 0.8 0.8 0.8
window size=10 window size=10 window size=10
0.6 PBC-BRR 0.6 window size=50 0.6 window size=50 0.6 window size=50
CDF

CDF

CDF

CDF
0.4 0.4 0.4 0.4
PBC-LSTM
0.2
0.2 0.2 0.2
0
50 60 70 80 90 100 0 0 0
50 60 70 80 90 100 50 60 70 80 90 100 50 60 70 80 90 100
Accuracy Accuracy Accuracy Accuracy
(a) PBC-ARIMA (b) PBC-BRR (c) PBC-LSTM
Fig. 14. CDF of memory accuracy
in PBC-based prediction. Fig. 15. CDF of memory accuracy with different window sizes in PBC-based prediction.

1 1 1 1
window size=1 window size=1 window size=1
0.8 DBC-ARIMA 0.8 0.8 0.8
window size=10 window size=10 window size=10
0.6 DBC-BRR 0.6 window size=50 0.6 window size=50 0.6 window size=50
CDF

CDF
CDF

CDF

0.4 0.4 0.4 0.4


DBC-LSTM
0.2
0.2 0.2 0.2
0
0 20 40 60 80 100 0 0 0
0 20 40 60 80 100 0 20 40 60 80 100 0 20 40 60 80 100
Accuracy Accuracy Accuracy Accuracy
(a) DBC-ARIMA (b) DBC-BRR (c) DBC-LSTM
Fig. 16. CDF of CPU accuracy in
DBC-based prediction. Fig. 17. CDF of CPU accuracy with different window sizes in DBC-based prediction.

tion accuracy between 0-gap prediction and m-gap prediction. points for training. Even after PBC, PBC-LSTM still performs
We use 0-gap prediction below for these clustering based worse than PBC-BRR. For PBC-BRR, it can handle non-linear
methods. datasets and few training datasets. After PBC, PBC-BRR can
achieve higher prediction accuracy than other two methods.
1) PBC-based Prediction: Figure 12 shows the CDF of the Thus, PBC-BRR still has the best performance among three
prediction accuracy for the CPU usage using the PBC-based methods.
prediction methods. The results follow PBC-ARIMA≈PBC-
Figure 13 show the CDF of the prediction accuracy for
LSTM<PBC-BRR. For PBC-ARIMA, as we discussed before,
the CPU usage using different window sizes for all the three
ARIMA cannot achieve better performance when the data
methods. The results follow the same trend and order as in
is not stable. Since the pathological data randomly exists
Figure 5 due to the same reasons.
in all the tasks [29], even after PBC that clusters similar
tasks for modeling, the performance of ARIMA is still worse Figures 14 and 15 show the CDF of the prediction accuracy
and similar to the performance of PBC-LSTM. PBC-LSTM’s in memory. These figures demonstrate that all the methods
advantage is to predict the workload for long-term tasks (with achieves the similar prediction accuracy performance with
many training data points) because of the neural network varied window sizes. The reason is that, after clustering, all the
involved in LSTM, but it is not good in predicting the methods can achieve similar prediction accuracy performance
workload of short-term tasks that do not have many data even with smaller window size. Comparing these results with

Fig. 18. CDF of memory accuracy in DBC-based prediction. [Plot: CDF vs. Accuracy (50 to 100) for DBC-ARIMA, DBC-BRR and DBC-LSTM.]
Fig. 19. CDF of memory accuracy with different window sizes in DBC-based prediction. [Panels: (a) DBC-ARIMA, (b) DBC-BRR, (c) DBC-LSTM; CDF vs. Accuracy (50 to 100) for window size = 1, 10 and 50.]
Fig. 20. CDF of CPU accuracy of enhanced prediction methods. [Panels: (a) ARIMA, (b) BRR, (c) LSTM; CDF vs. Accuracy (0 to 100); each panel compares the 0-gap, m-gap, PBC and DBC variants, e.g., 0-gap-ARIMA, m-gap-ARIMA, PBC-ARIMA and DBC-ARIMA.]
Fig. 21. CDF of memory accuracy of enhanced prediction methods. [Panels: (a) ARIMA, (b) BRR, (c) LSTM; CDF vs. Accuracy (50 to 100); each panel compares the 0-gap, m-gap, PBC and DBC variants.]

Figure 7, we notice that the accuracy with window size 1 is also greatly improved. Therefore, PBC can greatly improve the prediction accuracy.

2) DBC-based Prediction: Figure 16 shows the CDF of the prediction accuracy for the CPU usage using the DBC-based prediction methods. After DBC, the results follow the same trend and order as in Figure 12, for the same reasons.

Figure 17 shows the CDF of the prediction accuracy for the CPU usage using different window sizes for all three methods. The results follow the same trend and order as in Figure 13, for the same reasons.

Figures 18 and 19 show the CDF of the prediction accuracy for the memory usage. These figures follow the same trend and order as in Figures 16 and 17, for the same reasons.

3) Comparison Performance Evaluation: In this section, we compare our proposed clustering-based methods with the three state-of-the-art prediction methods in 0-gap prediction. We also include the results of the comparison methods in m-gap prediction for reference. Figures 20 and 21 show the CDF of the prediction accuracy for the CPU and memory usage. The results follow 0-gap prediction ≈ m-gap prediction < PBC ≈ DBC. The accuracy improves from 75% to over 90% in CPU usage prediction and from 92% to 95% in memory usage prediction.

In the trace, different tasks have very different patterns, so 0-gap prediction and m-gap prediction cannot achieve good prediction performance using only one model. PBC and DBC help the prediction methods achieve better prediction performance because the clustering methods group tasks with similar patterns for training. The model corresponding to each group of tasks can then capture the pattern easily and predict the workload more accurately. The result indicates that the clustering methods can improve the prediction accuracy by as much as 15 percentage points

compared with the previous prediction methods.

V. CONCLUSION

Accurate task workload prediction is crucial in cloud resource management. In this paper, we first measured and compared the state-of-the-art statistical and machine learning methods for task workload prediction using the Google cluster trace. Then, we proposed the m-gap prediction method, which predicts the workload a certain time before the predicted time point so that there is enough time for task scheduling based on the predicted workload.

We further proposed a clustering-based workload prediction method for higher prediction accuracy. This method clusters tasks with similar workload patterns, builds a workload prediction model for each cluster, and uses the corresponding model to predict the upcoming workload of a task. It achieves higher prediction accuracy than the traditional prediction methods. In future work, we will focus on improving the architecture of deep learning algorithms combined with the clustering-based model to achieve higher prediction accuracy.

ACKNOWLEDGEMENTS

This research was supported in part by U.S. NSF grants NSF-1827674, CCF-1822965, OAC-1724845, CNS-1733596, Microsoft Research Faculty Fellowship 8300751, and AWS Machine Learning Research Awards.

REFERENCES

[1] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definition,” in Proc. of SIGCOMM, 2008.
[2] H. Shen and L. Chen, “Distributed autonomous virtual resource management in datacenters using finite-markov decision process,” Trans. on TON, 2017.
[3] A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, and A. Rabkin, “A view of cloud computing,” Communications of the ACM, 2010.
[4] C. Qiu, H. Shen, and L. Chen, “Probabilistic demand allocation for cloud service brokerage,” in Proc. of INFOCOM, 2016.
[5] A. Beloglazov and R. Buyya, “Managing overloaded hosts for dynamic consolidation of virtual machines in cloud data centers under quality of service constraints,” Trans. on TPDS, 2013.
[6] H. Wang, H. Shen, and Z. Li, “Approaches for resilience against cascading failures in cloud datacenters,” in Proc. of ICDCS, 2018.
[7] W. Wei, X. Fan, H. Song, X. Fan, and J. Yang, “Imperfect information dynamic stackelberg game based resource allocation using hidden markov for cloud computing,” Trans. on SC, 2018.
[8] M. Xu and R. Buyya, “Brownout approach for adaptive management of resources and applications in cloud computing systems: A taxonomy and future directions,” ACM Computing Surveys (CSUR), 2019.
[9] Y. Yu, V. Jindal, F. Bastani, F. Li, and I. Yen, “Improving the smartness of cloud management via machine learning based workload prediction,” in Proc. of COMPSAC, 2018.
[10] M. Hassan, H. Chen, and Y. Liu, “Dears: A deep learning based elastic and automatic resource scheduling framework for cloud applications,” in Proc. of UBICOMP, 2018.
[11] H. Wang and H. Shen, “Proactive incast congestion control in a datacenter serving web applications,” in Proc. of INFOCOM, 2018.
[12] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: A multiple time series approach,” in Proc. of NOMS, 2012.
[13] Y. Jiang, C. Perng, T. Li, and R. Chang, “Asap: A self-adaptive prediction system for instant cloud resource demand provisioning,” in Proc. of ICDM, 2011.
[14] A. Morais, V. Brasileiro, V. Lopes, A. Santos, W. Satterfield, and L. Rosa, “Autoflex: Service agnostic auto-scaling framework for iaas deployment models,” in Proc. of CCGrid, 2013.
[15] Z. Gong, X. Gu, and J. Wilkes, “Press: Predictive elastic resource scaling for cloud systems,” in Proc. of ICNSM, 2010.
[16] S. Islam, J. Keung, K. Lee, and A. Liu, “Empirical prediction models for adaptive resource provisioning in the cloud,” Trans. on FGCS, 2012.
[17] A. Bankole and S. Ajila, “Cloud client prediction models for cloud resource provisioning in a multitier web application environment,” in Proc. of SOSE, 2013.
[18] N. Roy, A. Dubey, and A. Gokhale, “Efficient autoscaling in the cloud using predictive models for workload forecasting,” in Proc. of ICC, 2011.
[19] A. Nikravesh, S. Ajila, and C. Lung, “Measuring prediction sensitivity of a cloud auto-scaling system,” in Proc. of CSAC, 2014.
[20] Y. Nikravesh, S. Ajila, and C. Lung, “Towards an autonomic auto-scaling prediction system for cloud resource provisioning,” in Proc. of SEAS, 2015.
[21] L. Kang and H. Shen, “Preventing battery attacks on electrical vehicles based on data-driven behavior modeling,” in Proc. of ICCS, 2019.
[22] F. Qiu, B. Zhang, and J. Guo, “A deep learning approach for vm workload prediction in the cloud,” in Proc. of SNPD, 2016.
[23] J. Gao, H. Wang, and H. Shen, “Task failure prediction in cloud data centers using deep learning,” in Proc. of IEEE Bigdata, 2019.
[24] W. Zhang, P. Duan, L. Yang, F. Xia, Z. Li, Q. Lu, W. Gong, and S. Yang, “Resource requests prediction in the cloud computing environment with a deep belief network,” Software: Practice and Experience, 2017.
[25] Q. Zhang, L. Yang, Z. Yan, Z. Chen, and P. Li, “An efficient deep learning model to predict cloud workload for industry informatics,” Trans. on TII, 2018.
[26] J. Kumar, R. Goomer, and K. Singh, “Long short term memory recurrent neural network (lstm-rnn) based workload forecasting model for cloud datacenters,” Trans. on Computer Science, 2018.
[27] B. Song, Y. Yu, Y. Zhou, Z. Wang, and S. Du, “Host load prediction with long short-term memory in cloud computing,” Journal of Supercomputing, 2018.
[28] J. Gao, H. Wang, and H. Shen, “Smartly handling renewable energy instability in supporting a cloud datacenter,” in Proc. of IPDPS, 2020.
[29] C. Reiss, J. Wilkes, and J. Hellerstein, “Google cluster-usage traces: format+ schema,” Google Inc., White Paper, 2011.
[30] T. Imam, F. Miskhat, R. Rahman, and A. Amin, “Neural network and regression based processor load prediction for efficient scaling of grid and cloud resources,” in Proc. of ICCIT, 2011.
[31] F. Farahnakian, P. Liljeberg, and J. Plosila, “Lircup: Linear regression based cpu usage prediction algorithm for live migration of virtual machines in data centers,” in Proc. of EuroSEAA, 2013.
[32] T. Joachims, “Training linear svms in linear time,” in Proc. of SIGKDD, 2006.
[33] G. Shyam and S. Manvi, “Virtual resource prediction in cloud environment: a bayesian approach,” Journal of Network and Computer Applications, 2016.
[34] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, 1997.
[35] S. Das, Time series analysis. Princeton University Press, Princeton, NJ, 1994.
[36] R. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using arima model and its impact on cloud applications’ qos,” Trans. on CC, 2015.
[37] A. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and computing, 2004.
[38] L. Chen, H. Shen, and K. Sapra, “Rial: Resource intensity aware load balancing in clouds,” in Proc. of INFOCOM, 2014.
[39] T. Park and G. Casella, “The bayesian lasso,” Journal of the American Statistical Association, 2008.
[40] D. Basak, S. Pal, and D. Patranabis, “Support vector regression,” 2007.
[41] “http://www.acheronanalytics.com/acheron-blog/how-to-measure-the-accuracy-of-predictive-models, [Accessed in APR 2019].”
[42] J. Moré, “The levenberg-marquardt algorithm: implementation and theory,” in Numerical analysis, 1978.
[43] C. Richard, J. Bermudez, and P. Honeine, “Online prediction of time series data with kernels,” Trans. on SP, 2009.
[44] K. Krishna and N. Murty, “Genetic k-means algorithm,” Trans. on SMC, 1999.
[45] G. Celeux and G. Govaert, “Gaussian parsimonious clustering models,” 1995.
[46] J. Sander, M. Ester, H. Kriegel, and X. Xu, “Density-based clustering in spatial databases: The algorithm gdbscan and its applications,” 1998.
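The minimal Python sketch below illustrates the clustering-based prediction procedure described above in four steps:

- zero-pad the tasks;
- cluster them with K-means into N groups;
- train one model per group;
- route a task to the model of its group.

It is not the authors' implementation, and several details are assumptions:

- The helper names are hypothetical.
- K-means uses a deterministic farthest-point initialization.
- A task is routed by the distance between its zero-padded history and each cluster center.
- A pooled least-squares autoregressive model over a sliding window of `window` values stands in for the per-cluster ARIMA, BRR or LSTM model.

```python
import numpy as np


def pad_tasks(tasks):
    """Append zeros so every task series has the same length, as PBC requires."""
    max_len = max(len(t) for t in tasks)
    return np.array([list(t) + [0.0] * (max_len - len(t)) for t in tasks], dtype=float)


def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, n_clusters, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization (an assumption)."""
    centroids = [points[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(points, centroids), centroids


def fit_ar(series_list, window):
    """Pooled least-squares AR model y_t = c + a . y[t-window:t] for one cluster.

    Hypothetical stand-in for the per-cluster ARIMA/BRR/LSTM model; assumes the
    cluster has at least one series longer than `window`.
    """
    rows, targets = [], []
    for s in series_list:
        for t in range(window, len(s)):
            rows.append([1.0] + [float(v) for v in s[t - window:t]])
            targets.append(float(s[t]))
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef


def predict_next(coef, history):
    """One-step-ahead (0-gap) prediction from the last `window` observations."""
    window = len(coef) - 1
    return float(coef[0] + np.dot(coef[1:], np.asarray(history[-window:], dtype=float)))


def train_clustered_models(tasks, n_clusters=5, window=1):
    """Cluster zero-padded tasks with K-means, then fit one model per non-empty cluster."""
    labels, centroids = kmeans(pad_tasks(tasks), n_clusters)
    models = {}
    for k in sorted(set(labels.tolist())):
        members = [t for t, lab in zip(tasks, labels) if lab == k]
        models[k] = fit_ar(members, window)
    return labels, centroids, models


def predict_task(centroids, models, history):
    """Route a task to the nearest cluster that has a model, then predict with that model."""
    length = centroids.shape[1]
    feature = np.array([float(v) for v in history[:length]]
                       + [0.0] * max(0, length - len(history)))
    valid = sorted(models)
    k = valid[int(assign(feature[None, :], centroids[valid])[0])]
    return predict_next(models[k], history)
```

Consider three synthetic tasks alternating between 10 and 11, and three alternating between 80 and 82. With `n_clusters=2` and `window=1`, the two families fall into separate clusters. The history `[82, 80, 82, 80, 82, 80, 82]` is routed to the high-valued group, whose model predicts a next value of 80. Fitting one simple model per cluster lets each model specialize on a single workload pattern, which is the intuition the section gives for the accuracy gains of PBC and DBC.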

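The density-based clustering (DBC) step can be sketched in a DBSCAN-like form:

- A point with at least `min_pts` neighbors within radius `eps` is a core point.
- Groups grow by connecting density-reachable points.
- The remaining outliers that lie alone in low-density areas are then assigned to the group of their nearest clustered point, as the text describes.

This is an illustrative sketch only, not the GDBSCAN algorithm of [46]. The following are assumptions:

- the function name `dbc`;
- the parameter names `eps` (neighborhood size) and `min_pts` (density threshold);
- the input being fixed-length feature vectors, for example zero-padded task series.

```python
import numpy as np


def dbc(points, eps, min_pts):
    """DBSCAN-style clustering in which leftover outliers join the nearest group.

    Returns (labels, n_groups); the number of groups is found by the algorithm
    rather than fixed in advance as in K-means.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
    # Neighborhood of each point within radius eps (it includes the point itself).
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    n_groups = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Start a new group at an unlabeled core point and keep connecting
        # density-reachable neighbors until the group stops growing.
        labels[i] = n_groups
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a group but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = n_groups
                    stack.append(k)
        n_groups += 1
    # Points left in low-density areas are assigned to the group of their
    # nearest clustered point, so that every point ends up clustered.
    clustered = np.flatnonzero(labels != -1)
    if len(clustered) > 0:
        for i in np.flatnonzero(labels == -1):
            labels[i] = labels[clustered[dists[i, clustered].argmin()]]
    return labels, n_groups
```

For example, `dbc([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [2.0]], eps=0.15, min_pts=2)` forms two groups, and the isolated point 2.0 joins the group around 0. Only the neighborhood size and the density threshold have to be chosen, which matches the property of DBC noted in the text.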