
IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Analysis, Modeling and Simulation of Workload Patterns in a Large-Scale Utility Cloud

Ismael Solis Moreno, Peter Garraghan, Paul Townend, and Jie Xu, Member, IEEE

Abstract: Understanding the characteristics and patterns of workloads within a Cloud computing environment is critical in order to improve resource management and operational conditions while Quality of Service (QoS) guarantees are maintained. Simulation models based on realistic parameters are also urgently needed for investigating the impact of these workload characteristics on new system designs and operation policies. Unfortunately, there is a lack of analyses to support the development of workload models that capture the inherent diversity of users and tasks, largely due to the limited availability of Cloud tracelogs as well as the complexity of analyzing such systems. In this paper we present a comprehensive analysis of the workload characteristics derived from a production Cloud data center that features over 900 users submitting approximately 25 million tasks over a time period of a month. Our analysis focuses on exposing and quantifying the diversity of behavioral patterns for users and tasks, as well as identifying model parameters and their values for the simulation of the workload created by such components. Our derived model is implemented by extending the capabilities of the CloudSim framework and is further validated through empirical comparison and statistical hypothesis tests. We illustrate several examples of this work's practical applicability in the domain of resource management and energy-efficiency.

Index Terms: Cloud computing, workload characterization, cloud computing simulation, workload modeling

1 INTRODUCTION

Cloud computing environments are large-scale heterogeneous systems that are required to meet Quality of Service requirements demanded by consumers in order to fulfill diverse business objectives [1]. Such system characteristics result in a diversity of Cloud workload in terms of user behavior, task execution length and resource utilization patterns. In this context, workload is defined as "the amount of work assigned to, or done by, a client, workgroup, server, or system in a given time period" [12], and consists of two components: tasks and users. Tasks are defined as the basic unit of computation assigned or performed in the Cloud, and a user is defined as the actor responsible for creating and configuring the volume of tasks to be computed. In order to further enhance the effectiveness of managing Cloud computing environments there are two critical requirements. The first is that such environments require extensive and continuous analyses in order to understand and quantify the characteristics of system components. The second is the exploitation of the parameters derived from such analyses in order to develop simulation models which accurately reflect the operational conditions.

Analysis and simulation of Cloud tasks and users significantly benefits both providers and researchers, as it enables a more in-depth understanding of the entire system as well as offering a practical way to improve data center functionality. For providers, it enables a method to enhance resource management mechanisms to effectively leverage the diversity of users and tasks to increase the productivity and QoS of their systems; for example, exploiting task heterogeneity to reduce performance interference on physical servers, or analyzing the correlation of failures to resource consumption. For researchers, simulation of Cloud workload enables evaluation of theoretical mechanisms supported by the characteristics of Cloud data centers.

Ideally, such simulation parameters are derived from the empirical analysis of large-scale production Cloud data centers. Failure to do so results in misleading assumptions about the degree of workload diversity that exists within the Cloud and the creation of unrealistic simulation parameters, which consequently limits their usefulness and accuracy. However, deriving such analyses is challenging in two specific areas. The first and most critical problem is that there are few available data sources pertaining to large-scale production utility Clouds, due to business and confidentiality concerns. This is a particular challenge in academia, which relies on the very few publicly available Cloud tracelogs. The second problem is the analysis and simulation of realistic workloads; this is due to the massive size and complexity of data that a typical production Cloud can generate in terms of the sheer volume of users and server events, as well as recording the resource utilization of tasks.

The authors are with the School of Computing, University of Leeds, Leeds LS2 9JT, United Kingdom. E-mail: {scism, scpmg, p.m.townend, j.xu}@leeds.ac.uk.
Manuscript received 15 Sept. 2013; revised 3 Mar. 2014; accepted 3 Mar. 2014. Date of publication 1 Apr. 2014; date of current version 30 July 2014. Recommended for acceptance by I. Bojanova, R.C.H. Hua, O. Rana, and M. Parashar.
For information on obtaining reprints of this article, please send e-mail to [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2014.2314661
2168-7161 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Recently, there has been initial work on the analysis of limited Cloud traces from Google [2], [3] and Yahoo! [4] in an effort to provide mechanisms to analyze and characterize workload patterns. However, such efforts are predominantly constrained to traces of short observational periods [5] and coarse-grain statistics [6], which are not
sufficient to characterize the workload diversity of Cloud environments. In addition, there have been a number of approaches that analyze the diversity of workload by classifying tasks according to critical characteristics [7], [8], [9]. However, none of these provide a comprehensive study of the diversity of users and tasks, or provide a model containing sufficient details about the model parameters obtained from the analyses in order to be of practical use to researchers.

The objective of this paper is to present an in-depth empirical analysis of workload and its diversity in a large-scale production Cloud computing data center. Additionally, this work aims to provide a validated simulation model that includes parameters of tasks and users to be made available for other researchers to use. The analysis is conducted using the data from the second version of the Google Cloud tracelog [3], [10], which contains over 25 million tasks submitted by 930 users over the observational period of a month. There are three core contributions within this work:

- An in-depth statistical analysis of the characteristics of workload diversity within a large-scale production Cloud. The analysis was performed over the entire tracelog time span as well as a number of observational periods to investigate patterns of diversity for both users and tasks within the system.
- An extensive analysis of distribution parameters derived from the workload analysis that can be applied to simulation tools by other researchers.
- A comprehensive validation of the simulation model based on empirical and statistical methods. A significant contribution of the simulation model provided is that it does not just replay the data within the tracelog. Instead, it creates patterns that randomly fluctuate based on realistic parameters. This is important in order to emulate dynamic environments and to avoid just statically reproducing the behavior from a specific period of time.

A secondary contribution of this paper is presenting practical applications of the model obtained to identify sources of inefficiencies and enhance resource management and energy usage in virtualized Cloud environments.

This paper applies the methodology of analysis introduced in our previous approach [9], but is substantially different in a number of ways. First, this paper focuses specifically on a substantial analysis of Cloud diversity for tasks and users. Additionally, we analyze the entire tracelog time span and three additional observational periods, instead of just two days, which limited the original approach's applicability, as it could potentially omit crucial behavior within the overall Cloud environment. Furthermore, extensive analysis and parameter details are provided for user and task distributions.

The remainder of this paper is organized as follows: Section 2 presents the background; Section 3 discusses related work; Section 4 details the methodology used. Section 5 presents the cluster and distribution analysis of task and user diversity. Section 6 presents the validation of the model simulation. Section 7 describes the improvements to the model based on the validation results. Section 8 discusses practical applications of the results obtained within this paper. Sections 9 and 10 discuss the conclusions and further research directions of this work, respectively.

2 BACKGROUND

2.1 Diversity Patterns in Cloud
According to the NIST [11], the Cloud computing model has the following five essential characteristics: on-demand self-service, resource pooling, broad network access, rapid elasticity and measured service. These characteristics create highly dynamic environments where customers from different contexts co-exist, submitting workloads with diverse resource requirements at any time. Workloads by themselves have properties or attributes that describe their behavior. These attributes are normally expressed by the type and amount of resources consumed, and by other attributes that could dictate where a specific workload can or cannot be executed: for example, security requirements, geographical location, or specific hardware constraints such as processor architecture, number of cores or Ethernet speed, among others described in [13]. As discussed in [14], as more and more customers adopt Cloud platforms to fulfill their IT requirements, Cloud providers need to be prepared to manage highly heterogeneous workloads that are served on top of shared infrastructure. Workloads can be broadly classified according to the fundamental resources that they consume in terms of CPU-, memory- and storage-bound workloads [15]. Moreover, depending on the interaction with the end-users, they can also be classified as latency-sensitive and batch workloads [16]. Common examples of workloads running in multi-tenant Cloud data centers according to [17] include Business Intelligence, scientific high-performance computing, gaming and simulation.

2.2 Importance of Workload Models in Cloud
Models abstract reality to aid researchers and providers in understanding system environments in order to develop or enhance such systems. Workload models enable a way to actually study Cloud environments and the effect of workload variability on the performance and productivity of the overall system. Specifically, they support researchers and providers in further understanding the actual status and conditions of the Cloud system, and in identifying the Key Performance Indicators (KPIs) necessary to improve operational parameters. Such models can be used in a number of research domains including resource optimization, security, dependability and energy-efficiency. In order to produce realistic models, it is critical to derive their components and parameters from real-world production tracelogs. This leads to capturing the intrinsic diversity and dynamism of all co-existing components within the system as well as their interactions. Moreover, realistic workload models enable the simulation of Cloud environments whilst being able to control selected variables to study emergent system-wide behavior, as well as support the estimation of accurate forecasting under dynamic system conditions to improve the QoS offered to users. This supports the enhancement of Cloud Management Systems (CMSs) as it allows providers to experiment with hypothetical scenarios and assess their

decisions as a result of changes within the Cloud environment (i.e., capacity planning for increased system size, alteration of the workload scheduling algorithm, performance tradeoffs, and service pricing models).

3 RELATED WORK

The analysis of workload patterns for Cloud computing environments has been addressed previously [5], [6], [7], [8], [9], [18], [19], [20], [21], [22]. In this section, the most relevant approaches are described; their limitations and gaps are also discussed.

Wang et al. [22] present an approach to characterize the workloads of Cloud computing Hadoop ecosystems, based on an analysis of the first version of the Google tracelog [2]. The main objective of this work is to obtain coarse-grain statistical data about jobs and tasks to classify them by duration. This characteristic limits the work's application to the study of timing problems, and makes it unsuitable to analyze other Cloud computing issues related to resource usage patterns. Additionally, the analysis focuses on tasks and ignores the relationship with the users, a crucial component in Cloud workload as discussed previously.

Zhang et al. [5] present a study to evaluate whether the mean values for task waiting time, CPU, memory, and disk consumption are suitable to accurately represent the performance characteristics of real traces. The data used in their study is not publicly available and consists of the historical traces of six Google compute clusters spanning five days of operation. The evaluation conducted suggests that mean values of runtime task resource consumption are a promising way to describe overall task resource usage. However, it does not describe how the boundaries for task classification were made or how members behave.

Mishra et al. [7] describe an approach to develop Cloud computing workload classifications based on task resource consumption patterns. The analyzed data consist of records from five Google clusters over four days. The proposed approach identifies workload characteristics, constructs the task classification, identifies the qualitative boundaries of each cluster and then reduces the number of clusters by merging adjacent clusters. This approach is useful to create the classification of tasks, but does not perform an analysis of the characteristics of the formed clusters in order to derive a detailed workload model. Finally, it is entirely focused on task modeling, neglecting user patterns.

Kavulya et al. [6] present a statistical analysis of MapReduce traces. The analysis is based on ten months of MapReduce logs from the M45 supercomputing cluster [4]. Here, the authors present a set of coarse-grain statistical characteristics of the data related to resource utilization, job patterns, and sources of failures. This work provides a detailed description of the distributions followed by the job completion times, but only provides very general information about the resource consumption and user behavioral patterns. Similar to [22], this characteristic limits the proposed approach mainly to the study of timing problems.

Aggarwal et al. [8] describe an approach to characterize Hadoop jobs. The analysis is performed on a data set spanning 24 hours from one of Yahoo!'s production clusters, comprising 11,686 jobs. This data set features metrics generated by the Hadoop framework. The main objective of this work is to group jobs with similar characteristics using clustering to analyze the resulting centroids. This work only focuses on the usage of the storage system, neglecting other critical resources such as CPU and memory.

Our previous work [9] provides an approach for characterizing Cloud workload based on user and task patterns using the second version of the Google tracelog; it presents coarse-grain statistical properties of the tracelog, and classifies tasks and users using statistical mechanisms to select the number of clusters. A concise analysis of the clusters is performed, as well as best-fit distributions for each. Finally, the derived analysis parameters are simulated and compared against the empirical data for validation. This work has a number of limitations: the analysis performed is confined to only two days as opposed to the entire tracelog time span, resulting in the potential omission of crucial system environment behavior. Also, the cluster analysis and intra-cluster analysis do not contain sufficient detail to quantify the diversity of workload, instead presenting high-level observations. Furthermore, there is insufficient detail about the parameter distributions used; more detail is necessary in order for other researchers to simulate the workload obtained. Finally, the validation of the simulated model against the empirical data is based only on a visual match of the patterns from one single execution, and does not consider more rigorous statistical techniques.

From the analysis of the related work it is clear that there are few available production tracelogs for analyzing workload patterns in Cloud environments, and previous analyses present gaps that need to be addressed in order to achieve more realistic workload patterns. First, it is imperative to analyze large data samples, as performed by [5], [6], [9]; small operational time frames such as those used in [7], [8], [22] could lead to unrealistic models. Second, analyses need to explore more than coarse-grain statistics and cluster centroids. To capture the patterns of clustered individuals it is also necessary to conduct analysis of the parameters and study the trends of each cluster characteristic. Although previous approaches offer some insights about workload characteristics, they do not provide a structured model which can be used for conducting simulations. Finally, the workload is always driven by the users; therefore, realistic workload models must include user behavioral patterns linked to tasks. The approaches previously described focus entirely on tasks, neglecting the impact of user behavior on the overall environment workload. A summary of the main characteristics of the related work is presented in Table 1.

4 METHODOLOGY

The methodology, analysis and subsequent simulation within this paper was applied to the second version of the Google Cloud tracelog [3], [10], which contains over 12,000 servers, 25 million tasks and 930 users over the period of a month. The tracelog includes detailed data such as submission patterns, resource requests of users and resource consumption of tasks within the system.

The methodology is divided into two distinct steps. The first is defining the model that will be used for simulating the Cloud workload from the derived data set analysis. As stated

TABLE 1
Overview of Related Studies

previously, users are responsible for driving the volume and behavior of tasks in terms of requested resources and the volume of task submission. Therefore, three important characteristics that define this behavior within the tracelog, referred to as parameters, are fundamental to describe the user behavior: the submission rate α, and the requested amounts of CPU β and memory φ. The submission rate is the quotient of dividing the number of submissions by the tracelog time span, and is presented as task submissions per hour. Requested CPU and memory are represented as normalized resources requested by users, taken directly from the task events log within the tracelog.

Tasks are defined by the type and amount of work dictated by users, resulting in different execution length and resource utilization patterns. Consequently, the essential parameters that describe tasks are the length χ and the average resource utilization for CPU γ and memory π. While the length is defined as the total amount of work to be computed, the average resource utilization is the mean of all the consumption measurements recorded in the tracelog for each task.

The Cloud workload can be defined as a set of users with profiles U submitting tasks classified in profiles T, where each user profile u_i is defined by the probability functions of α, β and φ, and each task profile t_i by those of χ, γ and π, determined from the tracelog analysis. The expectation E(u_i) of a user profile is given by its probability P(u_i), and the expectation E(t_i) of a task profile is given by its probability P(t_i) conditioned on the probability P(u_j). The model components and their relationships are formalized in Equations (1) to (6):

U = {u_1, u_2, u_3, ..., u_i}            (1)
T = {t_1, t_2, t_3, ..., t_i}            (2)
u_i = {f(α), f(β), f(φ)}                 (3)
t_i = {f(χ), f(γ), f(π)}                 (4)
E(u_i) = u_i · P(u_i)                    (5)
E(t_i) = t_i · P(t_i | u_j) · P(u_j)     (6)

The second step of the methodology is to cluster the tasks and users composed by the parameters defined above, in order to analyze and create realistic workload models derived from empirical data. k-means clustering is a popular data-clustering algorithm to divide n observations into k clusters, in which analyzed data sets are partitioned in relation to the selected parameters and grouped around cluster centroids [23].

One critical factor in such an algorithm is determining the optimal number of clusters. For the analysis, we use the statistical method proposed by Pham et al. [24]. This method, shown in Equations (7) and (8), allows us to select the number of clusters based on quantitative metrics, avoiding qualitative techniques that introduce subjectivity. This clustering method considers the degree of variability among all the elements within the derived clusters in relation to the number of analyzed parameters. A number of clusters k is suggested when this variability, represented by f(k), is lower than or equal to 0.85, according to the observations presented by the authors. S_k is the sum of cluster distortions, N_d is the number of parameters within the population, and a_k is the weight factor based on the previous set of clusters:

f(k) = 1                         if k = 1
f(k) = S_k / (a_k · S_{k-1})     if S_{k-1} ≠ 0, for all k > 1     (7)
f(k) = 1                         if S_{k-1} = 0, for all k > 1

a_k = 1 − 3 / (4 · N_d)                  if k = 2 and N_d > 1
a_k = a_{k-1} + (1 − a_{k-1}) / 6        if k > 2 and N_d > 1      (8)

We run the k-means clustering algorithm for k ranging from 1 to 10. For each value of k we calculate f(k) using Equations (7) and (8). Based on the results we were able to formally determine the number of clusters for U and T (Equations (1) and (2)), respectively.
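The cluster-number selection of Equations (7) and (8) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the minimal k-means routine, the synthetic two-blob data set and all function names are our own assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    # Minimal Lloyd's k-means; returns the distortion S_k, i.e. the total
    # squared distance of the points to their assigned centroids.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):        # guard against empty clusters
                C[j] = X[labels == j].mean(axis=0)
    return ((X - C[labels]) ** 2).sum()

def weight_factor(k, n_d, prev):
    # a_k of Equation (8); n_d is the number of analyzed parameters.
    return 1.0 - 3.0 / (4.0 * n_d) if k == 2 else prev + (1.0 - prev) / 6.0

def f_of_k(X, k_max=10):
    # f(k) of Equation (7) for k = 1..k_max; values <= 0.85 suggest k clusters.
    S = [kmeans(X, k) for k in range(1, k_max + 1)]
    f, a = [1.0], None
    for k in range(2, k_max + 1):
        a = weight_factor(k, X.shape[1], a)
        f.append(S[k - 1] / (a * S[k - 2]) if S[k - 2] != 0 else 1.0)
    return f

# Two well-separated 2-D blobs: f(k) should drop sharply at k = 2.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(8.0, 0.3, (40, 2))])
fk = f_of_k(X, k_max=6)
```

On data with a clear group structure, f(k) dips well below the 0.85 threshold at the natural number of clusters, which is how the threshold acts as a quantitative selection rule.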

Fig. 1. Clusterization for users (a) entire month (b) entire month (omitting outliers) (c) Day 2, (d) Day 18, and (e) Day 26.
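To make the workload model of Equations (1) to (6) concrete before turning to the analysis, the sketch below samples a toy workload from two hypothetical user profiles. All profile names, probabilities and the exponential stand-in for the hourly task count are invented for illustration; the real parameter values come from the distribution analysis of Section 5.

```python
import random

# Hypothetical profiles; real values are derived from the tracelog analysis.
USER_PROFILES = {                     # U = {u1, u2} with probabilities P(u_i)
    "u1": {"p": 0.9, "rate": 10.0},   # many users with a low submission rate
    "u2": {"p": 0.1, "rate": 200.0},  # few users with a very high rate
}
TASK_PROFILES = {                     # P(t_i | u_j): task-profile mix per user profile
    "u1": {"t_short": 0.7, "t_long": 0.3},
    "u2": {"t_short": 0.95, "t_long": 0.05},
}

def sample_workload(n_users, seed=1):
    # Draw a user profile per user, then task profiles for its submissions,
    # mirroring E(t_i) = t_i * P(t_i | u_j) * P(u_j) in Equation (6).
    rng = random.Random(seed)
    names = list(USER_PROFILES)
    probs = [USER_PROFILES[u]["p"] for u in names]
    tasks = []
    for _ in range(n_users):
        u = rng.choices(names, weights=probs)[0]
        # Crude stand-in for the hourly submission count of this user.
        n_tasks = int(rng.expovariate(1.0 / USER_PROFILES[u]["rate"]))
        mix = TASK_PROFILES[u]
        for _ in range(n_tasks):
            tasks.append(rng.choices(list(mix), weights=list(mix.values()))[0])
    return tasks

tasks = sample_workload(100)
```

Because task profiles are drawn conditionally on the sampled user profile, the generated mix fluctuates randomly around the empirical proportions rather than replaying the tracelog, which is the behavior the simulation model aims for.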

5 ANALYSIS OF DIVERSITY

This section presents the analysis of user and task characteristics within the tracelog after performing the k-means clustering algorithm on the entire tracelog timespan, as described in Section 4. Specifically, we are interested in quantifying and characterizing the diversity of user and task behavior that exists within the system environment. The analysis is divided into two sections: cluster analysis and distribution analysis.

The cluster analysis discusses the characteristics and behavior of the k-clusters and studies the statistical properties of each parameter within the clusters for users and tasks, including the Mean, Standard Deviation and Coefficient of Variation (Cv). The distribution analysis consists of analyzing the inner data distributions for each of the components within each cluster parameter for tasks and users. This required fitting the data to the closest theoretical distribution using a Goodness of Fit (GoF) test to obtain the parameters of their Probability Distribution Functions (PDFs). The data of each cluster is fitted to a parametric distribution by using the Anderson-Darling (AD) GoF statistical test. The theoretical distribution with the lowest AD value is selected to represent the data distribution of each cluster. The objective is to use the PDFs of the parameters in the workload model described in Equations (3) and (4). A number of assumptions for the distribution analysis can be found in [9]. The main alteration to the methodology in order to improve the accuracy of the model is to consider the amount of CPU and memory requested by users instead of the proportions of overestimation and underestimation of resources. This is because the overestimation is an approximated value, whilst the amount of requested resources is a factual value, which produces more accurate results.

Moreover, for both the cluster and distribution analysis we have also investigated the variance of task and user clusters and parameters over a number of observational periods. The reason for this is to inspect patterns that exist within the data and to explore the degree of variance over the system lifespan. As a result, this analysis comprises four observational periods: the entire month trace, Day 2, Day 18 and Day 26. The latter three observational periods were selected for two reasons. First, they represent observational periods of low task length, high submission rate, and an average of these two parameters, respectively. Second, the periods are temporally far apart, and provide insight into system diversity at different system states.

5.1 Cluster Analysis
Fig. 1 illustrates the k-clusters partitioning that satisfies f(k) < 0.85 for users across observational periods. It can be observed from Fig. 1a that the majority of users across the entire month request similar portions of CPU and memory, and exhibit similar submission rates. Furthermore, there are three specific users that have a substantially high submission rate and request larger amounts of CPU and memory, as shown in clusters 2 (U2) and 3 (U3), respectively. When omitting these three users from the cluster analysis in Fig. 1b, it is clearer to observe that cluster characteristics are similar across additional observational periods, as demonstrated in Figs. 1c, 1d, and 1e, with a substantial amount of users exhibiting similar submission rates and resource request patterns.

Table 2 shows the statistical properties of each parameter for the defined clusters for the entire tracelog period. It is observable that users follow different resource utilization and submission patterns. For example, U2 contains 0.71 percent of the total user population and has an incredibly high submission rate in comparison to other clusters. Another example is that U3 has the highest average requested CPU and memory, but has the lowest submission rate, indicating that this type of user infrequently submits more resource-intensive tasks.

We observe that requested CPU and memory across most clusters exhibit low variance, with an average Cv of 0.42 and 0.79 respectively (U3 requested memory appears to have higher variance due to the strong influence of the three specific users discussed above).
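The distribution-selection step described above (fit each candidate distribution, keep the one with the lowest AD value) can be sketched as follows. This is our own stdlib-only illustration on synthetic data with two fully specified candidates, not the statistical tooling used in the paper, which also fits three-parameter distributions.

```python
import math
import random

def norm_cdf(x, mu, sigma):
    # CDF of the Normal distribution via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling(sample, cdf):
    # A^2 = -n - (1/n) * sum_i (2i-1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))];
    # a lower value indicates a better fit.
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_lo = min(max(cdf(xs[i - 1]), 1e-12), 1.0 - 1e-12)
        f_hi = min(max(cdf(xs[n - i]), 1e-12), 1.0 - 1e-12)
        total += (2 * i - 1) * (math.log(f_lo) + math.log(1.0 - f_hi))
    return -n - total / n

def mle_normal(data):
    # Maximum-likelihood mean and standard deviation.
    mu = sum(data) / len(data)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

random.seed(42)
data = [random.lognormvariate(0.0, 1.0) for _ in range(500)]  # right-skewed sample

mu_n, sd_n = mle_normal(data)                         # candidate 1: Normal
mu_l, sd_l = mle_normal([math.log(x) for x in data])  # candidate 2: Lognormal
ad_normal = anderson_darling(data, lambda x: norm_cdf(x, mu_n, sd_n))
ad_lognormal = anderson_darling(data, lambda x: norm_cdf(math.log(x), mu_l, sd_l))
best_fit = "Lognormal" if ad_lognormal < ad_normal else "Normal"
```

On a right-skewed sample such as this one, the Lognormal candidate yields the lower AD value and is therefore selected, which mirrors how the best-fit PDFs for each cluster parameter are chosen.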
MORENO ET AL.: ANALYSIS, MODELING AND SIMULATION OF WORKLOAD PATTERNS IN A LARGE-SCALE UTILITY CLOUD 213

TABLE 2
Statistical Properties of User Clusters for Entire System

Fig. 2. Clusterization for tasks (a) entire month, (b) Day 2, (c) Day 18, and (d) Day 26.

The parameter submission rate exhibits highly variant behavior across all user clusters, with an average Cv of 1.97. U2 is the only user cluster whose submission rate Cv is less than 1, which is most likely due to its cluster population size of 3.

There are three reasons for the above observations. First, as reported in previous works [9], the Cloud data center environment is naturally heterogeneous in workload due to user behavior. Second, the resources requested by users are possibly a reflection of the application and system domain boundaries; for example, applications deployed or invoked within the Cloud environment have pre-defined resource requests to meet the demands of user QoS. Third, the submission rate is outside the boundaries of the system and is entirely driven by users; such behavior is reflective of the definition of Cloud computing, which provides the illusion of infinite resources to users [25], allowing them to submit as many tasks as required without conscious thought about system limitations.

Figs. 2a, 2b, 2c, and 2d present the k-clusters for tasks across all observational periods, and demonstrate that it was possible to define three clusters for all observational periods where f(k) < 0.85. It is observable that the cluster shapes are visually similar across all observational periods, with cluster 3 (T3) containing the lowest values for CPU, memory and length, while T2 exhibits more variant behavior. Moreover, T2 composes less than 2 percent of the total task population and T3 contains over 70 percent of the task population across all time periods, as shown in Table 3. In addition, we observe that the proportions of tasks within the clusters stay relatively constant. In comparison to the heterogeneity of user clusters, task patterns appear to be more uniform across different observational periods.

Table 4 presents the statistical properties of the task parameters length, CPU and memory utilization for all clusters across the four observational periods. It is possible to make a more balanced comparison of task clusters over different time periods, in contrast to user clusters, due to the observed stability. Similar to the characteristic of user submission rate, we observe that task length is highly heterogeneous across all clusters and observational periods, with an average Cv of 2.36, indicating high variation between values. This is due to the same reasons as for the variability that exists for user submission rates: task length is a parameter that is outside the boundaries of the system environment and is entirely dependent on the demands of the user (i.e., users will execute tasks of different execution lengths to meet their QoS demands). CPU and memory are less variable due to application domain constraints imposed by the system environment, reflected by average Cv values of 0.93 and 0.83 for CPU and memory utilization respectively.

These results highlight two important findings. First, when quantifying the diversity of the Cloud environment, it appears that parameters that are outside the boundaries of the system environment introduce the highest level of heterogeneity. This is demonstrated by the parameters user submission rate and task execution length exhibiting highly variant behavior in comparison to CPU and memory requests and utilization for users and tasks, respectively. Second, the diversity of workload imposed by these two parameters introduces potential challenges to workload prediction; in this case, where the parameters are highly variable and dynamic, the expiration time of historical data seems to be considerably shorter. Therefore, there exists the need for adaptive and evolving mechanisms that allow providers to obtain more accurate predictions.

TABLE 3
Proportion of Task Clusters Population (Percent)

5.2 Distribution Analysis
This section studies the data distributions for each cluster parameter for tasks and users. Figs. 3 and 4 present the
214 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 2, NO. 2, APRIL-JUNE 2014

Cumulative Distribution Function (CDF) as an example of the similarity between the theoretical distribution and the empirical data for parameters in U1 and T1, obtained as a result of the fitting process with the AD test. Table 5 presents the probability of CPU and memory consumption equal to 0 for each task cluster, whilst Table 6 presents the best fit distributions with their corresponding AD value for task and user cluster parameters for the entire tracelog, as well as the parameters required for researchers to simulate the behavior of users and tasks.

TABLE 4
Statistical Properties of Task Clusters

Inspecting the different types of distributions and their respective parameter values, we see further statistical evidence of inherent workload diversity within the Cloud environment due to user behavior. For users, it is observable that the best-fit distribution for requested CPU varies between Logistic, three-Parameter Weibull, Loglogistic and Wakeby. Memory is equally heterogeneous, ranging across three-Parameter Lognormal, three-Parameter Loglogistic and Weibull. This gives us insight into the nature of how different users request different resources based on their requirements. For example, the Wakeby distribution used for U3 and U5 shows that a large portion of requested CPU is homogeneous for those types of users, while U4 requested CPU and memory is represented with the three-Parameter Weibull, signifying that a large portion of users in the analyzed environment request smaller portions of CPU and memory, with few users requesting large amounts.

Submission rate distributions predominantly best fit three-Parameter Weibull and three-Parameter Lognormal. In conjunction with the parameter values, we observe that this data distribution is right-skewed, as depicted in Fig. 3c. This indicates that the Cloud environment is composed of a majority of users that submit a small number of tasks and a few users that submit a large proportion of tasks (indeed, there exists one user that submits approximately 18 percent of the total tasks [9]).

For tasks, we observe that CPU and memory utilization across the three clusters follows a number of distributions including General Extreme Value, Weibull, three-Parameter Weibull and three-Parameter Lognormal. This indicates that a high proportion of tasks consume machine resources at lower rates, as shown in Figs. 4a and 4b for CPU and memory, respectively. The length of a task shares similar behavior to that of the submission rate of a user, in that it is right-skewed, which signifies that most tasks have a short to medium duration, as depicted in Fig. 4c.

Moreover, we contrast the best fit distributions for tasks across different observational periods as shown in Table 7. We present the distribution comparison for tasks, as every observational period shares the same number of task clusters that satisfies f(k) < 0.85. An observation of interest is that different days appear to best fit different distributions within that time frame. For example, Day 18 is composed of Loglogistic and Lognormal distributions for all parameters, while Day 26 is predominantly composed of three-Parameter Lognormal and Gamma distributions.

In addition, we observe that task length appears to exhibit the most consistent distribution characteristics within a selected time observation, predominantly following Lognormal and three-Parameter Lognormal. We also observe homogeneity of certain parameters across different observational periods, with the time periods following the same family of distributions for length (Lognormal and three-Parameter Lognormal).

Fig. 3. CDF of user cluster U1 (a) CPU requested, (b) memory requested, and (c) submission rate.

Fig. 4. CDF of task cluster T1 (a) CPU, (b) memory, and (c) length.

6 MODEL SIMULATION
In order to characterize and analyze the performance of similar large-scale Cloud data centers under a projected
set of operating conditions, we implemented the task and user model parameters described previously as an extension to the CloudSim framework [26], [27], [28], [29]. CloudSim is a Java-based framework that enables the simulation of complete Cloud computing environments [27]. It provides abstraction of all the elements within the Cloud computing model and the interaction among them. However, as with any other simulation software, the quality and accuracy of the results entirely depend on how accurately the introduced parameters reflect the analyzed system in reality. The following subsections describe the implemented workload generator and the conducted simulation validation.

6.1 Workload and Environment Generator
The workload and environment generator is composed of six modules: the Profile Manager, Data center Generator, Customer Generator, Task Generator and Environment Coordinator. The user and task profiles describe respectively the user and task types identified during the clustering process and encapsulate the outlined behavioral patterns derived during the cluster and distribution analysis. The server profiles describe the capacities and characteristics of the data center hosts according to the data within the tracelog. These characteristics, as well as the proportion of servers of each type, are listed in Table 8.

The Profile Manager loads each element description, making them available to the generators. The User Generator creates the CloudSim user instances and connects them with a specific profile determined by their associated probabilities as described in Equation (5). The Task Generator creates the CloudSim task instances and connects them with a specific task profile determined by the conditional probability in Equation (6). Each of the user and task characteristics defined in the model, such as submission rate, length and resource consumption, is obtained by sampling the inverse CDFs of the distributions in Equations (3) and (4). Finally, the Environment Coordinator controls the interactions between the three generators and the CloudSim framework that executes the simulation with the created instances.

6.2 Simulation Configuration
We have executed a model simulation of a data center composed of 12,000 servers with 160 customers submitting tasks during 24 hours, for a total of five iterations. The user and task profiles are configured using the statistical parameters derived for the entire month of analysis as described in Tables 5 and 6. The profiles of the simulated servers are outlined from the tracelog as presented in Table 8, where the values of CPU and memory are normalized. The normalization is a scaling relative to the largest capacity of the resource on any server in the trace, which is 1.0.

TABLE 5
Probability of 0 for Task Resource Utilization

TABLE 6
Best Fit Distribution Parameters of User and Task Clusters for Entire System

6.3 Simulation Validation
Model validation is defined as the substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model [30]. In the case of trace-driven models, where the analyst does not have access to the real system or to a different dataset sample from the same system, a common validation technique consists of using a portion of the available historical data to construct the model and the remaining data to determine whether the model behaves as the real system does. This is typically addressed by sampling the analyzed tracelog, where both the input and the actual system response must be collected from the same period of time [31]. According to Sargent [30], there are two basic approaches in comparing the simulation model to the behavior of the real system. The first consists of using graphs to empirically evaluate the outputs and the second involves the application of statistical hypothesis tests to make an objective decision.

To validate our model simulation we use both techniques; the proportions of categorical data such as task, user and server types, as well as task priorities, are contrasted empirically by plotting comparative charts and evaluating the absolute error between the average output from the simulations and the data in the real system. Additionally, we analyze the variability of results and their corresponding confidence intervals (CI). On the other hand, continuous data such as the user and task resource request and consumption patterns are compared statistically using the Wilcoxon-Mann-Whitney test
(WMW) [32], [33]. WMW is one of the most powerful non-parametric tests for comparing two populations. According to Mauger [34], it is based on the test of the null hypothesis that the distributions of two populations, although unspecified, are equal, against the alternative hypothesis that the distributions have the same shape but are shifted, so the outcomes of one population tend to be larger than the other. It is commonly applied instead of the two-sample t-test when the analyzed data does not follow a normal distribution, as is the case of the outlined user and task patterns. Additionally, in order to verify the consistency of the WMW test, we have applied Fisher's Method [35], a meta-analysis technique to combine p-values from different and independent tests which have the same null hypothesis. The objective is to verify whether the rejections are statistically significant given the variances reported, or are consistent with the results of the other simulations.

TABLE 7
Best Fit Distribution Comparison for Task Clusters

6.4 Validation Results
The results from our simulation experiments demonstrate the accuracy of the derived model to represent the operational characteristics of the workload within the Cloud computing data center for the analyzed scenario. Fig. 5 illustrates the proportion of components (users, tasks, task priorities and servers) created during the simulations, which are contrasted against the observations from the real system. Comparing the average simulation outputs with the real values, it is possible to observe that simulated proportions of fundamental elements consistently match the proportions of the elements in the actual system. From the detailed results presented in Table 9, it can be observed that while the proportions of tasks do not significantly fluctuate, the proportions of users and servers across different simulation executions present a higher variability. This is mainly produced by a very small population of specific clusters. For example, cluster U2 represents only 0.70 percent of
the customers' population but introduces a variability of 35.35 percent. The average Cv for tasks is estimated at 0.78 percent against 15.85 and 28.37 percent for customers and servers, respectively. Although the creation of tasks depends on the type of users created, the variability observed in the generation of users is not sufficiently statistically significant to affect the correct proportions of tasks generated during the different simulations. This can be confirmed by analyzing the absolute error between the means of the real and simulated populations. For the generation of users, the average absolute error is calculated at 0.39 percent, while for tasks and servers it is calculated as 0.62 and 0.04 percent, respectively. A breakdown of these statistics with their corresponding 95 percent CI for the obtained mean is presented in Table 9, where it is also observable that in all cases the difference between the simulated and real system proportions is lower than 1 percent.

In regards to the user and task patterns derived from the distribution analysis, we have plotted the empirical CDF of the real data for each cluster parameter and compared them against the empirical CDF of their corresponding simulation outputs. Due to space constraints, we exemplify this comparison in Figs. 6 and 7 with the parameters of U1 and T3 respectively, which represent the largest populations for each element in the tracelog. From these figures, it is noticeable that simulated component patterns are consistent with those observed in the real data. The most significant differences are found in task CPU consumption patterns. This is confirmed in Tables 10 and 11, where the significance values (p-values) obtained by applying the WMW test for each simulation output against the real system measurements are listed. Significance values between 0.30 and 0.99 suggest that the simulated parameters for user submission rate, task length and task memory utilization strongly follow the distributions of the real system. In the case of parameters such as CPU requested, memory requested and CPU utilization, 90 percent of the results have a moderate to strong significance value ranging from 0.05 to 0.99. However, there are instances (highlighted in grey) in which there is no statistical evidence to support the WMW null hypothesis.

The results of Fisher's p-value calculation for the clusters with at least one rejection are also presented in Tables 10 and 11. Fisher's p-values > 0.05 support the hypothesis that all separate WMW null hypotheses are true. On the other hand, p-values < 0.05 suggest that the WMW null hypothesis holds in some simulations but not in others. From the total of 120 evaluated cases there are six solid rejections, which represent an error of 5 percent, where the most affected parameter is CPU utilization for T2 and T3.

Regarding the tasks' execution times, it is observed that, like the actual system, the simulated tasks follow a lognormal distribution. That is, most of the tasks have a short to medium duration, while a small proportion of tasks have a considerable elapsed time, as illustrated in Fig. 8. Comparing the average location obtained during the simulations against the location for the data in the tracelog, we obtain a percentage of error of 1.27 percent for T1, 8.07 percent for T2 and 5.91 percent for T3. This is consistent with the results of the WMW test in Table 11: since Length and CPU utilization are more accurate for T1, the execution time is closer to that observed in the real system. Conversely, differences in CPU utilization for T2 and T3 increase the error in execution time for these two clusters.

Fig. 5. Comparison of proportions of real and simulated data for (a) users, (b) tasks, (c) task priority, and (d) servers.

Fig. 6. CDF of user patterns between real and simulated data for U1 (a) requested CPU, (b) requested memory, and (c) submission rate.

Fig. 7. CDF of task patterns between real and simulated data for T3 (a) CPU utilization, (b) memory utilization, and (c) length.

TABLE 8
Server Characteristics of Tracelog

TABLE 9
Simulation Results for Proportions of Cloud Data Center Components

TABLE 10
Wilcoxon-Mann-Whitney and Fisher's P-Value Test for User Clusters

TABLE 11
Wilcoxon-Mann-Whitney and Fisher's P-Value Test for Task Clusters

7 IMPROVEMENT OF CPU CONSUMPTION PATTERNS
Inaccurate CPU utilization patterns for T2 and T3 are the result of multimodal data distributions. This makes fitting such data sets with a single theoretical distribution unsuitable and creates significant gaps between the simulated and real data, as observed in Fig. 7b. To improve the accuracy of our model, we applied multi-peak histogram analysis for region splitting [38] and fitted the derived dataset sub-regions to new parametric distributions. Essentially, the data is ranked and presented in a histogram, which is split based on the lowest points of the different valleys created by the multimodal distribution. To identify the peaks and valleys of a given multimodal data set, we smooth the histogram by applying the LOWESS (Locally-Weighted Scatterplot Smoother) technique [36] using the Minitab statistical package [37]. Then, the derived sub-regions are fitted to new parametric distributions following the same process described in Section 5.2. Consequently, the CPU utilization patterns of the affected clusters comprise a combination of different distributions which are sampled by the model simulator based on the proportional size of the derived sub-regions. The distribution parameters and sizes of the obtained sub-regions are presented in Table 12.
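The smooth-then-split step above can be sketched in a few lines of stdlib Python. This is an illustrative reconstruction, not the authors' Minitab workflow: a simple moving average stands in for the LOWESS smoother, and the bimodal input data are synthetic.

```python
import random
from collections import Counter

def split_at_valley(samples, bins=20, window=3):
    """Split a 1-D sample set at the deepest valley of its smoothed
    histogram. A moving average stands in for LOWESS here; the
    smooth-then-split idea is the same."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in samples)
    hist = [counts.get(i, 0) for i in range(bins)]
    # Smooth to suppress spurious micro-valleys before searching.
    smoothed = []
    for i in range(bins):
        lo_i, hi_i = max(0, i - window), min(bins, i + window + 1)
        smoothed.append(sum(hist[lo_i:hi_i]) / (hi_i - lo_i))
    # The split point is the interior bin with the lowest smoothed count.
    valley = min(range(1, bins - 1), key=lambda i: smoothed[i])
    cut = lo + valley * width
    return cut, [x for x in samples if x < cut], [x for x in samples if x >= cut]

# Synthetic bimodal CPU-utilization data: many light tasks, a few heavy ones.
random.seed(1)
data = [random.gauss(0.10, 0.03) for _ in range(900)] + \
       [random.gauss(0.60, 0.05) for _ in range(100)]
cut, low_region, high_region = split_at_valley(data)
print(len(low_region) / len(data))  # 0.9: proportional size of the low sub-region
```

Each returned sub-region would then be fitted separately, and the two proportional sizes become the mixture weights used by the simulator.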
Fig. 8. CDF of task patterns between real and simulated data of task execution time (seconds) for (a) T1, (b) T2, and (c) T3.
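The WMW comparisons reported in Tables 10 and 11 can be reproduced in outline with a small rank-sum implementation. The sketch below uses the large-sample normal approximation and, unlike a full statistical package, applies no tie correction to the variance; the two input samples are synthetic.

```python
from statistics import NormalDist

def wmw_p_value(a, b):
    """Two-sided Wilcoxon-Mann-Whitney test via the large-sample normal
    approximation. Ties receive midranks, but no tie correction is
    applied to the variance, so this is a sketch rather than a package."""
    n1, n2 = len(a), len(b)
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + j + 1) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    r1 = sum(ranks[k] for k, (_, group) in enumerate(combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))

same = [0.1 * k for k in range(50)]
shifted = [x + 10.0 for x in same]
print(round(wmw_p_value(same, list(same)), 3))  # 1.0: no detectable shift
print(wmw_p_value(same, shifted) < 0.001)       # True: clearly shifted
```

A high p-value supports the null hypothesis that the simulated and real samples follow the same distribution, which is exactly how the tables above are read.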

The results of this process are illustrated in Fig. 9, where it can be observed that the split distributions improve the fitting between the simulated and real datasets. The p-values of the WMW test for both clusters are sufficiently statistically strong to support the equality of patterns. This reduces the error for execution time from 8.07 to 0.42 percent and from 5.91 to 0.13 percent for T2 and T3, respectively.

8 APPLICATION OF WORK
The workload model presented in this paper enables researchers to simulate request and consumption patterns considering parameters and patterns statistically close to those observed from a production environment. This is critical in order to improve resource utilization, reduce energy waste and, in general terms, support the design of accurate forecast mechanisms under dynamic conditions to improve the QoS offered to customers. Specifically, we use the proposed model to support the design and evaluation of two energy-aware mechanisms for Cloud computing environments.

The first is a resource overallocation mechanism that considers customers' resource request patterns and the actual resource utilization imposed by their submitted tasks. Taking into account these parameters from the proposed model, it is possible to estimate the resource overestimation patterns. The main idea is to exploit the resource overestimation patterns of each user type in order to smartly overallocate resources to the physical servers. This reduces the waste produced by frequent overestimations and increases data center availability. Consequently, it creates the opportunity to host additional Virtual Machines in the same computing infrastructure, improving its energy-efficiency [39].

The second mechanism considers the relationship between Virtual Machine interference, due to competition for resources, and energy-efficiency. The core idea is to co-allocate different types of workloads based on the level of interference that they create, to reduce the resultant overhead and thus improve the energy-efficiency of the data center. By considering the resource consumption patterns of each task type, we estimate the level of interference and energy-efficiency decrement when they are co-located in a physical server. We classify incoming tasks based on their resource usage patterns, pre-select the hosting servers based on resource constraints, and make the final allocation decision based on the current servers' performance interference level [40]. In both cases the proposed workload model and the parameters derived from the presented analysis are used to emulate the user and task patterns required by the energy-aware algorithms. The model integrates the relationship between user demand and the actual resource usage, which is essential in both scenarios, where the aim is to achieve a balance between resource request and utilization in order to reduce resource waste.

Another important benefit of our approach is that, as values of customer and task parameters are represented as proportions of resources requested or consumed, they are agnostic of underlying hardware characteristics. Therefore, the proposed model can be used to evaluate the performance of different data center configurations under the same workload.

Furthermore, the comprehensive analysis at cluster and intra-cluster level, the workload model that integrates user and task patterns, and the applicability of the model independently of the hardware characteristics represent unique advances in comparison with the related work previously discussed in Section 3. Additionally, the proposed model supports the assessment of resource management mechanisms such as those recently presented in [41], [42] and [43] with parameters from a large-scale production Cloud environment.
TABLE 12
Sub-Regions Distribution Fitting to Improve CPU Utilization for T2 and T3
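Sampling from the split distributions can be sketched as a weighted mixture: each sub-region contributes with probability proportional to its size, as described in Section 7. The component parameters below are illustrative placeholders, not the fitted values from Table 12.

```python
import math
import random
from statistics import NormalDist

def lognormal_sampler(mu, sigma, rng=random):
    """Inverse-CDF sampling: exp of a normal quantile is lognormal."""
    return lambda: math.exp(NormalDist(mu, sigma).inv_cdf(rng.random()))

def sample_mixture(regions, rng=random):
    """Draw one value from (weight, sampler) pairs, where the weights are
    the proportional sizes of the histogram sub-regions."""
    r = rng.random()
    acc = 0.0
    for weight, sampler in regions:
        acc += weight
        if r <= acc:
            return sampler()
    return regions[-1][1]()  # guard against floating-point round-off

random.seed(7)
# Placeholder parameters: 80% of utilization samples drawn from a low
# mode, 20% from a high mode.
regions = [(0.8, lognormal_sampler(-3.0, 0.4)),
           (0.2, lognormal_sampler(-1.0, 0.3))]
samples = [sample_mixture(regions) for _ in range(2000)]
print(round(sum(1 for s in samples if s < 0.15) / len(samples), 2))
```

The simulator applies the same idea with the fitted sub-region distributions, so the generated CPU utilization reproduces the multimodal shape of the real data.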
Fig. 9. CPU utilization pattern improvement for (a) T2 and (b) T3.

9 CONCLUSIONS
This paper presents an analysis that quantifies the diversity of Cloud workloads and derives a workload model from a large-scale production Cloud data center. The presented analysis and model capture the characteristics and behavioral patterns of user and task variability across the entire system as well as across different observational periods. The derived model is implemented using the CloudSim framework and extensively validated through empirical comparison and statistical tests.

From the observations presented within this work and the results obtained from the simulations, a number of conclusions can be made. These are as follows:

• Workload in Cloud data centers is driven not only by task characteristics but also by user behavioral patterns. Related approaches on workload analysis are focused on parameters such as the duration and the resources consumed by tasks. However, as observed from the presented analysis, in some scenarios specific types of users impose a strong influence on the overall Cloud workload. Therefore, comprehensive workload models must consider both tasks and users in order to reflect realistic conditions.

• User patterns tend to be significantly more diverse than task patterns across different observational periods. Depending on the type of service offered, providers can control the type of tasks and the environment in which they are running (i.e., SaaS and PaaS). This can create more stable task patterns over time. On the other hand, user patterns tend to change according to needs derived from their own business objectives, which are completely out of the boundaries of Cloud providers. This creates new challenges for workload prediction mechanisms, which need to evolve and adapt according to such dynamic characteristics.

• Describing Cloud analyses is an important first step, but providing the parameters and characteristics derived from these analyses is critical. This supports the development and validation of simulation models as presented in this work. Such simulations can support the evaluation of new operational policies and new system designs, and support the decision-making process as a result of changes in the Cloud environment.

• Workload models can be exploited to improve diverse and critical operational parameters. This paper has presented two examples of how the derived model can be used to improve performance and energy-efficiency by exploiting the diversity of users and tasks. In addition, the workload model can be used to improve parameters such as security, dependability, and economics.

10 FUTURE WORK
Future research directions include extending the model to include task constraints based on server characteristics; this will allow us to analyze the impact of hardware heterogeneity on workload behavior. Other extensions include analyzing the workload from the jobs' perspective, specifically modeling the behavior and relationship of users and submitted jobs, and accurately emulating and analyzing workload energy consumption and reliability, enabling further research into energy-efficiency, resource optimization and failure-analysis in the Cloud environment. Finally, it is important to enable a collaboration link with the CloudSim group in order to integrate the proposed workload generator as an add-in of the current framework implementation, allowing it to be made publicly available.

ACKNOWLEDGMENTS
The work was supported by CONACyT (No. 213247), the National Basic Research Program of China (973) (No. 2011CB302602), and the UK EPSRC WRG platform project (No. EP/F057644/1).

REFERENCES
[1] R. Buyya, R. Ranjan, and R. N. Calheiros, "InterCloud: Utility-oriented federation of cloud computing environments for scaling of application services," in Proc. 10th Int. Conf. Algorithms Archit. Parallel Process., 2010, pp. 13-31.
[2] Google. Google Cluster Data V1 (2010). [Online]. Available: http://code.google.com/p/googleclusterdata/wiki/TraceVersion1
[3] Google. Google Cluster Data V2 (2011). [Online]. Available: http://code.google.com/p/googleclusterdata/wiki/ClusterData2011_1
[4] Yahoo. Yahoo! M45 Supercomputing Project (2007). [Online]. Available: http://research.yahoo.com/node/1884
[5] Q. Zhang, J. Hellerstein, and R. Boutaba, "Characterizing task usage shapes in Google compute clusters," in Proc. 5th Int. Workshop Large Scale Distrib. Syst. Middleware, 2011, pp. 2-8.
[6] S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, "An analysis of traces from a production MapReduce cluster," in Proc. IEEE/ACM Int. Conf. Cluster, Cloud Grid Comput., 2010, pp. 94-103.
[7] A. K. Mishra, J. Hellerstein, W. Cirne, and C. R. Das, "Towards characterizing cloud backend workloads: Insights from Google compute clusters," ACM SIGMETRICS Perform. Eval. Rev., vol. 37, pp. 34-41, 2010.
[8] S. Aggarwal, S. Phadke, and M. Bhandarkar, "Characterization of Hadoop jobs using unsupervised learning," in Proc. 2nd Int. Conf. Cloud Comput. Technol. Sci., 2010, pp. 748-753.
[9] I. Solis Moreno, P. Garraghan, P. Townend, and J. Xu, "An approach for characterizing workloads in Google cloud to derive realistic resource utilization models," in Proc. IEEE Int. Symp. Serv. Oriented Syst. Eng., 2013, pp. 49-60.
[10] C. Reiss, J. Wilkes, and J. Hellerstein, "Google Cluster-Usage Traces: Format + Schema," Google Inc., Mountain View, CA, USA, White Paper, 2011.
[11] P. Mell and T. Grance, "The NIST definition of cloud computing," NIST Spec. Publication 800-145, 2011.
[12] M. A. El-Refaey and M. A. Rizkaa, "Virtual systems workload characterization: An overview," in Proc. IEEE Int. Workshops Enabling Technol. Infrastructures Collaborative Enterprises, 2009, pp. 72-77.
[13] B. Sharma, V. Chudnovsky, J. Hellerstein, R. Rifaat, and C. R. Das, "Modeling and synthesizing task placement constraints in Google compute clusters," in Proc. ACM Symp. Cloud Comput., 2011, pp. 1-14.
[14] J. Zhan, L. Wang, W. Shi, S. Gong, and X. Zang, "PhoenixCloud: Provisioning resources for heterogeneous workloads in cloud computing," arXiv preprint arXiv:1006.1401, 2010.
[15] V. Vasudevan, D. Andersen, M. Kaminsky, L. Tan, J. Franklin, and I. Moraru, "Energy-efficient cluster computing with FAWN: Workloads and implications," in Proc. Int. Conf. Energy-Efficient Comput. Netw., 2010, pp. 195-204.
[16] T. N. B. Doung, X. Li, R. S. M. Goh, X. Tang, and W. Cai, "QoS-aware revenue-cost optimization for latency-sensitive services in IaaS clouds," in Proc. IEEE/ACM Int. Symp. Distrib. Simul. Real Time Appl., 2012, pp. 11-18.
[17] IBM, "Get more out of cloud with a structured workload analysis," White Paper IAW03006-USEN-00, 2011.
[18] A. Bahga and V. K. Madisetti, "Synthetic workload generation for cloud computing applications," J. Softw. Eng. Appl., vol. 4, pp. 396-410, 2011.
[19] A. Beitch, B. Liu, T. Yung, R. Griffith, A. Fox, and D. A. Patterson, "Rain: A workload generation toolkit for cloud computing applications," Elect. Eng. Comput. Sci., Univ. California, Berkeley, CA, USA, White Paper UCB/EECS-2010-14, 2010.
[20] Y. Chen, A. S. Ganapathi, R. Griffith, and R. H. Katz, "Analysis and lessons from a publicly available Google cluster trace," EECS Dept., Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2010-95, Jun. 2010.
[21] J. W. Smith and I. Sommerville, "Workload classification & software energy measurement for efficient scheduling on private cloud platforms," presented at the ACM SOCC, Cascais, Portugal, 2011.
[22] G. Wang, A. R. Butt, H. Monti, and K. Gupta, "Towards synthesizing realistic workload traces for studying the Hadoop ecosystem," in Proc. IEEE Int. Symp. Modeling, Anal. Simul. Comput. Telecommun. Syst., 2011, pp. 400-408.
[23] R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Trans. Neural Netw., vol. 16, pp. 645-678, 2005.
[24] D. T. Pham, S. S. Dimov, and C. D. Nguyen, "Selection of K in K-means clustering," Proc. Inst. Mech. Eng., Part C: J. Mech. Eng. Sci., vol. 219, pp. 103-119, 2005.
[25] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley view of cloud computing," Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2009-28, Feb. 2009.
[26] R. Buyya, R. Ranjan, and R. N. Calheiros, "Modeling and simulation of scalable cloud computing environments and the CloudSim toolkit: Challenges and opportunities," in Proc. Int. Conf. High Perform. Comput. Simul., 2009, pp. 1-11.
[27] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Softw. Pract. Experience, vol. 41, pp. 23-50, 2010.
[33] A. Gold, "Understanding the Mann-Whitney test," J. Property Tax Assessment Admin., vol. 4, pp. 55-57, 2007.
[34] D. T. Mauger and G. L. Kauffman, Jr., "82 - Statistical analysis: Specific statistical tests: Indications for use," in Surgical Research, W. S. Wiley and W. W. Douglas, Eds. San Diego, CA, USA: Academic, 2001, pp. 1201-1215.
[35] D. A. S. Fraser, A. K. M. Saleh, and K. Ji, "Combining p-values: A definitive process," J. Statist. Res., vol. 44, pp. 15-29, 2010.
[36] D. Borcard and P. Legendre, "Exploratory data analysis," in Numerical Ecology. New York, NY, USA: Springer, 2011, pp. 9-30.
[37] Minitab statistical software, Release 16 (2010). [Online]. Available: http://www.minitab.com
[38] S. Pal and P. Bhattacharyya, "Multipeak histogram analysis in region splitting: A regularization problem," in Proc. IEEE Comput. Digit. Tech., 1991, vol. 138, pp. 285-288.
[39] I. Solis Moreno and J. Xu, "Neural network-based overallocation for improved energy-efficiency in real-time cloud environments," in Proc. IEEE Int. Symp. Object/Compon./Serv.-Oriented Real-Time Distrib. Comput., 2012, pp. 119-126.
[40] I. Solis Moreno, R. Yang, J. Xu, and T. Wo, "Improved energy-efficiency in cloud datacenters with interference-aware virtual machine placement," in Proc. IEEE Int. Symp. Auton. Decentralized Syst., 2013, pp. 1-8.
[41] X. Lu, H. Wang, J. Wang, J. Xu, and D. Li, "Internet-based virtual computing environment: Beyond the data center as a computer," Future Generation Comput. Syst., vol. 29, pp. 309-322, 2013.
[42] M. Kesavan, I. Ahmad, O. Krieger, R. Soundararajan, A. Gavrilovska, and K. Schwan, "Practical compute capacity management for virtualized datacenters," IEEE Trans. Cloud Comput., vol. 1, no. 1, pp. 88-100, Jan.-Jun. 2013.
[43] J. Doyle, R. Shorten, and D. O'Mahony, "Stratus: Load balancing the cloud for carbon emissions control," IEEE Trans. Cloud Comput., vol. 1, no. 1, pp. 116-128, Jan.-Jun. 2013.

Ismael Solis Moreno received the PhD degree from the University of Leeds, and the MSc degree from CENIDET, Mexico. He has worked as a researcher for the Mexican Electrical Research Institute. His current work on energy-efficient Cloud computing is funded by CONACyT. He has received best paper awards at IEEE SOSE-2013 and IEEE ISADS-2013.

Peter Garraghan received the BSc degree from Staffordshire University, United Kingdom, and is currently working toward the PhD degree in the Distributed Systems and Service Group at the University of Leeds. He has worked as an IT specialist at HP, Germany. His current research on Cloud computing and energy-aware dependability is funded by the UK EPSRC WRG platform project. He has received an award for best conference paper at IEEE SOSE-2013.

Paul Townend is a research fellow in the School of Computing, University of Leeds. He has been a lead researcher on major projects dealing with HPC, decision support, large-scale simulations, Cloud computing, and dependable and secure systems. He has extensive experience in collaborating with academia, local government, and industry, and has authored and coauthored more
[28] S. K. Garg and R. Buyya, NetworkCloudSim: Modelling parallel than 40 international publications.
applications in cloud simulations, in Proc. IEEE Intl. Conf. Utility
Cloud Comput., 2011, pp. 105113.
[29] B. Wickremasinghe, R. N. Calheiros, and R. Buyya, CloudAnalyst: Jie Xu is a chair of computing and head of the I-
A CloudSim-based visual Modeller for analysing cloud computing CSS at the University of Leeds. He is the director
environments and applications, in Proc. IEEE Intl. Conf. Adv. Inf. of the UK EPSRC WRG e-Science Centre. He is
Netw. Appl., 2010, pp. 446452. also a guest professor of Beihang University,
[30] R. G. Sargent, Verification and validation of simulation models, China. He has published more than 300 aca-
in Proc.Conf, Winter Simul., 2010, pp. 166183. demic papers in areas related to dependable dis-
[31] O. Balci and R. G. Sargent, Some examples of simulation model tributed systems and has industrial experience in
validation using hypothesis testing, Proc. Conf. Winter Simul., designing and implementing large-scale net-
vol. 2, pp. 621629, 1982. worked computer systems. He has led or coled
[32] D. Brown and P. Rothery, Models in biology: Mathematics, statis- many research projects to the value of more than
tics and computing, Proc. 14th Conf. Winter Simul, 1993. $30M. He is a member of the IEEE.