
IEEE TRANSACTIONS ON COMPUTERS, VOL. 64, NO. 6, JUNE 2015

Optimization of Composite Cloud Service Processing with Virtual Machines

Sheng Di, Member, IEEE, Derrick Kondo, Member, IEEE, and Cho-Li Wang, Member, IEEE

Abstract—By leveraging virtual machine (VM) technology, we optimize cloud system performance based on refined resource allocation in processing user requests with composite services. Our contribution is three-fold. (1) We devise a VM resource allocation scheme with a minimized processing overhead for task execution. (2) We comprehensively investigate the best-suited task scheduling policy with different design parameters. (3) We also explore the best-suited resource sharing scheme with adjusted divisible resource fractions on running tasks in terms of the proportional-share model (PSM), which can be split into an absolute mode (called AAPSM) and a relative mode (RAPSM). We implement a prototype system over a cluster environment deployed with 56 real VM instances, and summarize valuable experience from our evaluation. When the system runs in short supply, lightest workload first (LWF) is mostly recommended because it can minimize the overall response extension ratio (RER) for both sequential-mode tasks and parallel-mode tasks. In a competitive situation with over-commitment of resources, the best solution is combining LWF with both AAPSM and RAPSM. It outperforms the other solutions in the competitive situation by 16+% w.r.t. the worst-case response time and by 7.4+% w.r.t. the fairness.

Index Terms—Cloud resource allocation, task scheduling, virtual machine, minimization of overhead

1 INTRODUCTION

Cloud computing [1], [2] has emerged as a flexible platform allowing users to customize their on-demand services. Platform as a service (PaaS) is a classical paradigm, and a typical example is Google App Engine [3], which allows users to easily deploy and release their own services on the Internet.

Our cloud scenario is similar to the PaaS model, in which the users can submit complex requests each being composed of off-the-shelf web services. Each service is associated with a price, which is assigned by its creator. When a user submits a compute request (or a task) that calls other services, he/she needs to pay for the usage of these services, and the payment is determined by how much resource is consumed. On the other hand, virtual machine (VM) resource isolation technology [4], [5], [6], [7], [8], [9] can effectively isolate various types of resources for the VMs running on the same hardware. We leverage this feature to refine the resource allocation, which is completely transparent to users.

In cloud systems [10], over-commitment of physical resources is fairly common in order to achieve high resource utilization. According to a Google trace [11] with 10k+ hosts, for example, Reiss et al. [12] showed that the resource amounts requested are often greater than the total capacity of Google data centers, and that the requested amounts are usually twice the real resource amounts consumed by tasks. Such an over-commitment of resources may occasionally result in a relatively short-supply situation (a.k.a., competitive situation), degrading the overall quality of service (QoS) [11].

In our cloud model, each user request (or task) is made up of a set of subtasks (or web service instances), and in this paper, we aim to answer the four questions below.

•  How to optimize resource allocation for a task based on its budget, where the subtasks inside the task can be connected in parallel or in series.
•  How to split the physical resources according to tasks' various requirements in both competitive and non-competitive situations.
•  How to minimize the data transmission overhead and the operation cost of the virtual machine monitor (VMM).
•  How to schedule user requests with minimized task response time in a competitive situation.

Based on our characterization of the Google trace [11], [13], which contains 4,000 types of cloud applications, we find that there are only two types of Google tasks, the sequential-mode task and the parallel-mode task. The former contains multiple subtasks connected sequentially (like a sequential workflow) and the latter executes multiple subtasks in parallel (e.g., mapreduce). We try to optimize the performance for both of the two cases.

The cloud system may experience two different situations, either non-competitive status or competitive status.

1) For a non-competitive situation, the available resources are relatively adequate for user demands, so the optimality is mainly determined by the task's intrinsic structure (e.g., how its subtasks are connected) and budget. In particular, some subtask's output needs to be transferred to its succeeding subtask as input, and the data transmission delay cannot be overlooked if the data size is huge. On the other hand, since we take advantage of VM resource isolation, the cost of VMM operations (such as the time cost of performing a CPU-capacity changing command at runtime) is also supposed to be minimized.

2) For a competitive situation, how to keep each task's QoS at a high level and meanwhile maximize the overall fairness of treatment is quite challenging. On one hand, each task's execution is determined by a different structure that is made up of multiple subtasks corresponding to various services, and is also associated with a varied budget to restrict its total payment. On the other hand, a competitive situation with limited available resources may easily delay some particular responses, leading to unfairness of treatment.

In our experiment, we find that assigning different priorities to tasks in the task scheduling stage and in the resource allocation stage brings out significantly different effects on the overall performance and fairness. Hence, we investigate the best-suited queuing policies for maximizing the overall performance and fairness of QoS. As for the sequential-mode tasks, the candidate queuing policies include first-come-first-serve (FCFS), shortest-optimal-length-first (SOLF), lightest-workload-first (LWF), shortest-subtask-first (SSTF) (a.k.a., min-min), and slowest-progress-first (SPF). SOLF assigns higher priorities to the tasks with shorter theoretically optimal execution lengths estimated based on our convex-optimization model, which is similar to the heterogeneous earliest finish time (HEFT) approach [14]. LWF and SSTF can be considered shortest job first (SJF) and the min-min algorithm [15] respectively. The intuitive idea of SPF is similar to earliest deadline first (EDF) [16], wherein we adopt two criteria to evaluate the task execution progress. In addition, we also investigate the best-suited scheduling policy for the parallel-mode tasks. The candidate scheduling policies include FCFS, SSTF, LWF, longest subtask first (LSTF), and hybrid approaches with a mixture of queuing policies, e.g., LWF+LSTF.

We also explore a best-fit resource allocation scheme (called adjusted proportional-share model (PSM)) to adapt to the competitive situation. Specifically, we investigate how to coordinate the divisible resource allocation among the running tasks in terms of their structures, such as workload or varied estimated progress.

Based on the composite cloud service model, we implement a distributed prototype that is able to solve/calculate complex matrix problems submitted by users. We also explore the best choice of the involved parameters used in our algorithm, by running our experiments on a real-cluster environment with 56 VMs and 10 services with various execution types. Experiments show that for the sequential-mode tasks, the worst-case performance under LWF is higher than that under other policies by at least 16 percent when the overall resource amount requested is about twice the real resource amount that can be allocated. Another key lesson we learned is that in a competitive situation, short tasks (with short single-core execution lengths) are better assigned with more powerful resource amounts than the theoretically optimal values derived from the optimization theory. As for the parallel-mode tasks, LWF+LSTF leads to the best result, which is better than other solutions by 3.8 percent-51.6 percent.

In the remainder of the paper, we will use the terms host, machine, and node interchangeably. In Section 2, we describe the architecture of our cloud system. In Section 3, we formulate the research problem in our cloud environment, aiming to maximize each individual task's QoS and meanwhile the overall fairness of treatment. In Section 4, we discuss how to optimize the execution of each task with minimized overheads, and how to stabilize the QoS especially in a competitive situation. We present experimental results in Section 5. We discuss the related works in Section 6. Finally, we conclude the paper with a vision of the future work in Section 7.

S. Di is with the MCS division, Argonne National Laboratory. E-mail: [email protected].
D. Kondo is with INRIA, Grenoble, France. E-mail: [email protected].
C.-L. Wang is with the Department of Computer Science, The University of Hong Kong, Hong Kong, China. E-mail: [email protected].
Manuscript received 13 Apr. 2013; revised 5 Mar. 2014; accepted 18 May 2014. Date of publication 8 June 2014; date of current version 13 May 2015. Recommended for acceptance by K. Li. Digital Object Identifier no. 10.1109/TC.2014.2329685.

2 SYSTEM OVERVIEW

Fig. 1. System overview of cloud composite service system.

The system architecture of our composite cloud service system is shown in Fig. 1a. A user request (a.k.a., a task) is made up of multiple subtasks connected in parallel or sequentially. Each subtask is an instance of an off-the-shelf service that has a very convenient interface (such as an API) to be called. Each whole task is expected to be completed as soon as possible under the constraint of its budget. Task scheduling is a key layer to coordinate the task priorities. The resource allocation layer is responsible for calculating the optimal resource fraction for the subtasks. Each physical host runs multiple VMs, on each of which are deployed all of the off-the-shelf services (e.g., the libraries or programs that do the computation). Each subtask will be executed on a VM, with an amount of virtual resource fraction tuned by the substrate VM monitor (VMM, a.k.a., hypervisor). Fault tolerance is beyond the scope of the paper, and we discuss this issue in [17] in detail.

Each task is processed via a scheduling queue, as shown in Fig. 1b. Tasks are submitted continually over time, and each submitted task will be analyzed by a task parser (in the user interface module), in order to predict the subtask workloads based on input parameters. A subtask's workload can be characterized using {resource_processing_rate, subtask_execution_length} based on historical traces or workload prediction approaches like the polynomial regression method [18]. Our optimization model will compute the optimal resource vector of all subtasks for the task. And then, the unprocessed subtasks with no dependent preceding unprocessed subtasks will be put/registered in a queue, waiting for the scheduling notification. Upon being notified, the hypervisor of the selected physical machine will launch a VM and perform the resource isolation to match the optimization demand. The corresponding service on the VM will be called using specific
input parameters, and the output will be cached in the VM, waiting for the notification of the data transmission for its succeeding subtask.

We adopt XEN's credit scheduler [19] to perform the resource isolation among VMs on the same physical machine. With XEN [20], we can dynamically isolate some key resources (like CPU rate and network bandwidth) to suit the specific usage demands of different tasks. There are two key concepts in the credit scheduler, capacity and weight. Capacity specifies the upper limit on the CPU rate consumable by a particular VM, and weight means a VM's proportional-share credit. On a relatively free physical host, the CPU rate of a running VM is determined by its capacity. If there are over-many VMs running on a physical machine, the real CPU rates allocated for them are proportional to their weights. Both capacity and weight can be tuned at runtime.
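Both knobs can be driven at runtime from the hypervisor's command line. As a minimal illustration (our own wrapper, not code from the paper; the domain name and values are hypothetical), the capacity and weight of a VM can be tuned through Xen's xm sched-credit tool:

    import subprocess

    def set_credit(domain: str, weight: int, cap: int) -> None:
        # weight: the VM's proportional-share credit;
        # cap: upper limit on its CPU rate (100 = one full core, 0 = no cap).
        subprocess.run(["xm", "sched-credit", "-d", domain,
                        "-w", str(weight), "-c", str(cap)], check=True)

    # e.g., cap a VM at one core while doubling its proportional share:
    # set_credit("vm07", weight=512, cap=100)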
3 PROBLEM FORMULATION

Assume there are n tasks to be processed by the system, denoted t_i, where i = 1, 2, ..., n. Each task is made up of multiple subtasks connected in series or in parallel. We denote the subtasks of the task t_i as t_{i(1)}, t_{i(2)}, ..., t_{i(m_i)}, where m_i refers to the number of subtasks in t_i. Such a formulation is generic enough that any user request (or task) can be constructed from multiple nested composite services (or subtasks).

Task execution time is represented in different ways based on the different intra-structures of the subtask connection. For the sequential-mode task, its total execution time (or execution length) can be denoted as T(t_i) = \sum_{j=1}^{m_i} \frac{l_{i(j)}}{r_{i(j)}}, where l_{i(j)} and r_{i(j)} are referred to as the workload of subtask t_{i(j)} and the compute resource allocated to it respectively. The workload here is evaluated by the number of instructions or the amount of data to read/write from/to disk, and the compute resource here means the workload processing rate, like CPU rate or disk I/O bandwidth. As for a parallel-mode task (e.g., an embarrassingly parallel application), its total execution length is equal to the longest execution time of its subtasks (or makespan). We will use execution time, execution length, response length, and wall-clock time interchangeably in the following text.

Each subtask t_{i(j)} will call a particular service API, which is associated with a service price (denoted p_{i(j)}). The service prices ($/unit) are determined by the corresponding service makers in our model, since they are the ones who pay monthly resource leases to infrastructure-as-a-service (IaaS) providers (e.g., Amazon EC2 [21]). The total payment in executing a task t_i on top of the service layer is equal to \sum_{j=1}^{m_i} [r_{i(j)} \cdot p_{i(j)}]. Each task is associated with a budget (denoted B(t_i)) by its user in order to control its total payment. Hence, the problem of optimizing task t_i's execution can be formulated as Formulas (1) and (2) (a convex-optimization problem):

\min T(t_i) = \begin{cases} \sum_{j=1}^{m_i} \frac{l_{i(j)}}{r_{i(j)}}, & t_i \text{ is in sequential mode} \\ \max_{j=1 \ldots m_i} \frac{l_{i(j)}}{r_{i(j)}}, & t_i \text{ is in parallel mode} \end{cases}    (1)

s.t. \sum_{j=1}^{m_i} [r_{i(j)} \cdot p_{i(j)}] \le B(t_i).    (2)

There are two metrics to evaluate the system performance. One is the response extension ratio (RER) of each task, defined in Formula (3):

RER(t_i) = \frac{t_i\text{'s real response time}}{t_i\text{'s theoretically optimal length}}.    (3)

The RER is used to evaluate the execution performance of a particular task. The lower the RER is, the higher the execution efficiency with which the corresponding task is processed in reality. A sequential-mode task's theoretically optimal length (TOL) is the sum of the theoretical execution times of its subtasks based on the optimal resource allocation solution to the above problem (Formulas (1) and (2)), while a parallel-mode task's TOL is equal to the largest theoretical subtask execution time. The response time here indicates the whole wall-clock time from a task's submission to its final completion. In general, the response time of a task includes the subtask's waiting time, the overhead before subtask execution (e.g., on resource allocation or data transmission), the subtask's productive time, and the processing overhead after execution. We try our best to minimize the cost of each part.

The other metric is the fairness index of RER among all tasks, defined in Formula (4), which is used to evaluate the fairness of treatment in the system. Its value ranges in [0, 1], and the bigger its value is, the higher the fairness of treatment is. Based on Formula (3), the fairness is also related to the different types of execution overheads. How to effectively coordinate the overheads among different tasks is a very challenging issue, mainly due to the largely different task structures (i.e., the subtasks' workloads and their connection way), task budgets, and varied resource availability over time:

fairness(t_i) = \frac{\left(\sum_{i=1}^{n} RER(t_i)\right)^2}{n \sum_{i=1}^{n} RER^2(t_i)}.    (4)

Our final objective is to minimize the RER of each individual task (or minimize the maximum RER) and maximize the overall fairness, especially in a competitive situation with over-many submitted tasks.
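To make the two metrics concrete, the sketch below implements Formulas (3) and (4) directly (the function names are ours, not the paper's); Formula (4) is the classical Jain fairness index applied to the per-task RERs.

    def rer(response_time: float, tol: float) -> float:
        # Formula (3): response extension ratio of one task, where tol is
        # the task's theoretically optimal length.
        return response_time / tol

    def fairness(rers: list) -> float:
        # Formula (4): Jain's fairness index over all tasks' RERs; the
        # result lies in [0, 1], and 1 means every task is extended equally.
        n = len(rers)
        return sum(rers) ** 2 / (n * sum(x * x for x in rers))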
4 OPTIMIZATION OF SYSTEM PERFORMANCE

In order to optimize the entire QoS for each task, we need to minimize the time cost at each step in the course of its execution. We study the best-fit solution with respect to the three following facets: resource allocation, task scheduling, and minimization of overheads.

4.1 Optimized Resource Allocation with VMs

We first derive an optimal resource vector for each task (including parallel-mode tasks and sequential-mode tasks), subject to the task structure and budget, in both the non-competitive situation and the competitive situation. In a non-competitive situation, there are always available and adequate resources for task processing. As for an over-committed situation (or competitive situation), the overall resources are over-committed such that the requested resource amounts exceed the de-facto resource amounts in the system. In this situation, we design an adjustable resource allocation method for maintaining high performance and fairness.
4.1.1 Optimal Resource Allocation in Non-Competitive Situation

In a non-competitive situation (with unlimited available resource amounts), the resource fraction allocated to a task is mainly restricted by its user-set budget. Based on the target function (Formula (1)) and the constraint (Formula (2)), we analyze the two types of tasks (sequential-mode and parallel-mode) respectively.

•  Optimization of Sequential-Mode Task:

Theorem 1. If task t_i is constructed in sequential mode, t_i's optimal resource vector r^*(t_i) for minimizing T(t_i) subject to the constraint (2) is shown as Equation (5), where j = 1, 2, ..., m_i:

r^*_{i(j)} = \frac{\sqrt{l_{i(j)}/p_{i(j)}}}{\sum_{k=1}^{m_i} \sqrt{l_{i(k)} p_{i(k)}}} \cdot B(t_i).    (5)

Proof. Since \frac{\partial^2 T(t_i)}{\partial r_{i(j)}^2} = 2 \frac{l_{i(j)}}{r_{i(j)}^3} > 0, T(t_i) is convex with a minimum extreme point. By combining the constraint (2), we can get the Lagrangian function as Formula (6), where \lambda refers to the Lagrangian multiplier:

F(r_i) = \sum_{j=1}^{m_i} \frac{l_{i(j)}}{r_{i(j)}} + \lambda \left( B(t_i) - \sum_{j=1}^{m_i} r_{i(j)} p_{i(j)} \right).    (6)

We derive Equation (7) via the Lagrangian multiplier method:

r_{i(1)} : r_{i(2)} : \cdots : r_{i(m_i)} = \sqrt{\frac{l_{i(1)}}{p_{i(1)}}} : \sqrt{\frac{l_{i(2)}}{p_{i(2)}}} : \cdots : \sqrt{\frac{l_{i(m_i)}}{p_{i(m_i)}}}.    (7)

In order to minimize T(t_i), the optimal resource vector r^*_{i(j)} should use up the whole budget (i.e., let the total payment be equal to B(t_i)). Then, we can get Equation (5). □

In the following, we discuss the significance of Theorem 1 and how to split physical resources among different tasks based on VM resource isolation in practice. According to Theorem 1, we can easily compute the optimal resource vector for any task based on its budget constraint. Specifically, r^*_{i(j)} is the theoretically optimal resource amount (or processing rate) allocated to the subtask t_{i(j)}, such that the total wall-clock time of task t_i is minimized. That is, even if there were more available resources beyond the value r^*_{i(j)}, they would be useless for the task t_i due to its limited budget. In this situation, our resource allocator will set the theoretically optimal resource fraction (Formula (5)) as each subtask's resource capacity (such as its maximum CPU rate).
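Equation (5) admits a direct implementation. The sketch below (our own naming) allocates each subtask a rate proportional to \sqrt{l_{i(j)}/p_{i(j)}}, scaled so that the whole budget B(t_i) is spent; one can verify that the returned vector satisfies \sum_j r^*_j p_j = B(t_i).

    from math import sqrt

    def optimal_rates_sequential(l, p, budget):
        # l[j]: workload of subtask j; p[j]: price of its service.
        # Returns the optimal processing rates r*[j] of Equation (5).
        denom = sum(sqrt(lk * pk) for lk, pk in zip(l, p))
        return [sqrt(lj / pj) / denom * budget for lj, pj in zip(l, p)]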
•  Optimization of Parallel-Mode Task:

Theorem 2. If task t_i is constructed in parallel mode, t_i's optimal resource vector r^*(t_i) for minimizing T(t_i) subject to the constraint (2) is shown as Equation (8), where j = 1, 2, ..., m_i:

r^*_{i(j)} = \frac{l_{i(j)}}{\sum_{j=1}^{m_i} p_{i(j)} l_{i(j)}} \cdot B(t_i).    (8)

Proof. We just need to prove that the optimal situation occurs if and only if all subtask execution lengths are equal to each other. That is, the entire execution length of a parallel-mode task will be minimized if and only if Equation (9) holds:

\frac{l_{i(1)}}{r_{i(1)}} = \frac{l_{i(2)}}{r_{i(2)}} = \cdots = \frac{l_{i(m_i)}}{r_{i(m_i)}}.    (9)

In this situation, we can easily derive Equation (8) by using up the user-preset budget B(t_i), i.e., letting \sum_{j=1}^{m_i} [r_{i(j)} \cdot p_{i(j)}] = B(t_i) hold.

In the following, we prove by contradiction that Equation (9) is a necessary condition of the optimal situation. Let us suppose an optimal situation with minimized task wall-clock length occurs while Equation (9) does not hold. Without loss of generality, we denote by t_{i(k)} the subtask that has the longest execution time (i.e., \frac{l_{i(k)}}{r_{i(k)}}), that is, T(t_i) = \frac{l_{i(k)}}{r_{i(k)}}. Since Equation (9) does not hold, there must exist another subtask t_{i(j)} such that \frac{l_{i(j)}}{r_{i(j)}} < \frac{l_{i(k)}}{r_{i(k)}}. Obviously, we are able to add a small increment \Delta_k to r_{i(k)} and decrease r_{i(j)} by \Delta_j correspondingly, such that the total payment is unchanged and the two subtasks' wall-clock lengths become the same. That is, Equations (10) and (11) hold simultaneously:

r_{i(j)} p_{i(j)} + r_{i(k)} p_{i(k)} = (r_{i(j)} - \Delta_j) p_{i(j)} + (r_{i(k)} + \Delta_k) p_{i(k)},    (10)

\frac{l_{i(j)}}{r_{i(j)} - \Delta_j} = \frac{l_{i(k)}}{r_{i(k)} + \Delta_k}.    (11)

It is obvious that the new solution with \Delta_j and \Delta_k yields a further reduced task wall-clock length, which contradicts our assumption that the previous allocation is optimal. □
4.1.2 Adjusted Resource Allocation to Adapt to Competitive Situation

If the system runs in short supply, it is likely that the total sum of the tasks' optimal resources (i.e., r^*(t_i)) may exceed the total capacity of the physical machines. In such a competitive situation, it is necessary to coordinate the priorities of the tasks in the resource consumption, such that none of the tasks' real execution lengths would be extended noticeably compared to their theoretically optimal execution lengths (i.e., minimizing RER(t_i) for each task t_i). In our system, we improve the proportional-share model [22] with XEN's credit scheduler by further enhancing the resource fractions of short tasks.

Under XEN's credit scheduler, each guest VM on the same physical machine will get a CPU rate that is
proportional to its weight.¹ Suppose that on a physical host (denoted h_i), n_i scheduled subtasks are running on n_i stand-alone VMs separately (denoted v_j, where j = 1, 2, ..., n_i). We denote the host h_i's total compute capacity as c_i (e.g., eight cores), and the weights of the n_i subtasks as w(v_1), w(v_2), ..., w(v_{n_i}). Then, the real resource fraction (denoted r(v_j)) allocated to the VM v_j can be calculated by Formula (12):

r(v_j) = \frac{w(v_j)}{\sum_{k=1}^{n_i} w(v_k)} \cdot c_i.    (12)

Now, the key question becomes how to determine the weight value of each running subtask (or VM) on a physical machine, to adapt to the competitive situation. We devise a novel model, namely the adjusted proportional-share model (APSM), which further tunes the credits based on the task's workload (or execution length). The design of APSM is based on the definition of RER: a large value of RER tends to appear with a short task. This is mainly due to the fact that the overheads (such as the data transmission cost and the VMM operation cost) in the whole wall-clock time are often relatively constant regardless of the total task workload. That is, based on RER's definition, a short task's RER is more sensitive to the execution overheads than a long one's. Hence, we make short tasks tend to get more resource fractions than their theoretically optimal vector (i.e., r^*_{i(j)}). There are two alternative ways to realize this effect (a sketch of both modes follows the two items below).
•  Absolute mode. For this mode, we use a threshold (denoted τ) to split running tasks into two categories, short tasks (workload ≤ τ) and long tasks (workload > τ). Three values of τ are investigated in our experiments: 500, 1,000, or 2,000, which correspond to 5, 10, or 20 seconds when running a task on a single core. We assign as much resource as possible to short tasks, while keeping the long tasks' resource fractions unchanged. Task length is evaluated in terms of its workload to process. In practice, it can be estimated based on the workload characterization over history or a workload prediction method like [18]. In our design based on the absolute mode, short tasks' credits will be set to 800 (i.e., eight cores), implying the full computational power. For example, if there is only one short running task on a host, it will be assigned the full resources (eight cores) for its computation. If there are more running tasks, they will be allocated according to PSM, while short tasks will probably be assigned more resource fractions.

•  Relative mode. Our intuitive idea is to adopt a proportional-share model on most of the middle-sized tasks such that the resource fractions they receive are proportional to their theoretically optimal resource amounts (r^*_{i(j)}). Meanwhile, we enhance the credits of the subtasks whose corresponding tasks are relatively short and decrease the credits of the ones with long tasks. That is, we give some extra credits to short tasks to enhance their resource consumption priority. Suppose that d subtasks (belonging to different tasks) are running on a physical machine, denoted t_{1(x_1)}, t_{2(x_2)}, ..., t_{d(x_d)}, where x_i = 1, 2, ..., or m_i. Then, w(t_{i(j)}) will be determined by either Formula (13) or Formula (14), based on different proportional-share credits (either the task's workload or the task's TOL). Hence, the relative mode based APSM (abbreviated as RAPSM) has two different types, the workload-based APSM (abbreviated as RAPSM(W)) and the TOL-based APSM (abbreviated as RAPSM(T)):

w(t_{i(j)}) = \begin{cases} \eta \cdot r^*_{i(j)}, & l_i \le \alpha \\ r^*_{i(j)}, & \alpha < l_i \le \beta \\ \frac{1}{\eta} \cdot r^*_{i(j)}, & l_i > \beta \end{cases}    (13)

w(t_{i(j)}) = \begin{cases} \eta \cdot r^*_{i(j)}, & T^*(t_i) \le \alpha \\ r^*_{i(j)}, & \alpha < T^*(t_i) \le \beta \\ \frac{1}{\eta} \cdot r^*_{i(j)}, & T^*(t_i) > \beta \end{cases}    (14)

The weight values in our design (Formulas (13) and (14)) are determined by four parts: the extension coefficient (η), the theoretically optimal resource fraction (r^*_{i(j)}), the threshold value α that determines short tasks, and the threshold value β that determines long tasks. Obviously, the value of η is supposed to be always greater than 1. In reality, tuning η's value adjusts the extension degree for short/long tasks. Changing the values of α and β tunes the number of short/long tasks. That is, by adjusting these values dynamically, we could optimize the overall system performance to adapt to different contention states. Specific values suggested in practice will be discussed with our experimental results.

In practice, one could use either of the above two modes, or both of them, to adjust the resource allocation to adapt to the competitive situation.

1. The weight-setting command is "xm sched-credit -d VM -w weight".
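The following sketch condenses the two adjustment modes (our own naming, not the paper's code). The weights it returns would then be turned into actual CPU rates by the hypervisor's proportional sharing according to Formula (12).

    def aapsm_weight(r_opt, workload, tau, full_credit=800):
        # Absolute mode: a short task (workload <= tau) gets the host's full
        # credit (800, i.e., eight cores); others keep their optimal share.
        return full_credit if workload <= tau else r_opt

    def rapsm_weight(r_opt, length, eta, alpha, beta):
        # Relative mode, Formulas (13)/(14): "length" is the task's workload
        # for RAPSM(W) or its TOL for RAPSM(T); eta > 1.
        if length <= alpha:
            return eta * r_opt        # short task: extra credits
        elif length <= beta:
            return r_opt              # middle-sized task: plain PSM
        else:
            return r_opt / eta        # long task: reduced credits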
4.2 Best-Suited Task Scheduling Policy

In a competitive situation where over-many tasks are submitted to the system, it is necessary to queue the tasks that temporarily cannot find qualified resources. The queue will be checked as soon as some new resources are released or new tasks are submitted. When multiple hosts are available for a task (e.g., there are still available CPU rates not allocated on the host), the most powerful one with the largest availability will be selected as the execution host. A key question is how to select the waiting tasks based on their demands, such that the overall execution performance and the fairness can both be optimized.

Based on the two-fold objective that aims to minimize the RER and meanwhile maximize the fairness, we investigate the best-fit scheduling policy for both sequential-mode tasks and parallel-mode tasks. We propose that (1) the best-fit queuing policy for the sequential-mode tasks is the lightest-workload-first policy, which assigns the highest scheduling priority to the shortest task, i.e., the one with the least workload amount to process; and (2) the best-fit policy for parallel-mode tasks is adopting LWF and longest-subtask-first (LSTF) together.
In addition, we also evaluate many other queuing policies for comparison, including FCFS, SOLF, SPF, SSTF, and so on. We describe all the task-selection policies below (a sketch of the corresponding priority keys follows the list).

•  FCFS. FCFS schedules the subtasks based on their arrival order. The first-arrived one in the queue will be scheduled as long as there are available resources to use. This is the most basic policy and the easiest to implement. However, it does not take into account the variation of task features, such as task structure and task workload, so the performance and fairness will be significantly restricted.

•  Lightest-Workload-First. LWF schedules the subtasks based on the predicted workload of their corresponding tasks (a.k.a., jobs). A task's workload is defined as the execution length estimated based on a standard processing rate (such as the single-core CPU rate). In the waiting queue, the subtask whose corresponding task has the lighter workload will be scheduled with a higher priority. In our cloud system that aims to minimize the RER and meanwhile maximize the fairness, LWF obviously possesses a prominent advantage. Note that various tasks' TOLs are different due to their different budget constraints and workloads, while the tasks' execution overheads tend to be constant because of the usually stable memory size consumed over time. In addition, the tasks with lighter workloads tend to have smaller TOLs, based on the definition of T(t_i). Hence, according to the definition of RER, the tasks with lighter workloads (i.e., shorter jobs) are supposed to be more sensitive to their execution overheads, which means that they should be associated with higher priorities.

•  SOLF. SOLF is designed based on the following intuition: in order to minimize the RER of a task, we can only minimize the task's real execution length, because its theoretically optimal length (TOL) is a fixed constant determined by its intrinsic structure and budget. Since tasks' TOLs are different due to their heterogeneous structures, workloads, and budgets, the execution overheads will impact their RERs to different extents. Suppose there were two tasks whose TOLs are 30 and 300 seconds respectively and whose execution overheads are both 10 seconds. Even if the sums of their subtask execution lengths were exactly the optimal values (30 and 300 seconds), their RERs would be largely different: (30 + 10)/30 versus (300 + 10)/300. In other words, the tasks with shorter TOLs are supposed to be scheduled with higher priorities, for minimizing the discrepancy among tasks' RERs.

•  SPF. SPF is designed for sequential-mode tasks, based on a task's real execution progress compared to its overall workload or TOL. The tasks with the slowest progress will have the highest scheduling priorities. The execution progress can be defined based on either the workload processed or the wall-clock time passed. These are called workload progress (WP) and time progress (TP) respectively, and they are defined in Formulas (15) and (16). In the two formulas, d refers to the number of completed subtasks, l_i = \sum_{j=1}^{m_i} l_{i(j)}, and TOL(t_i) = \sum_{j=1}^{m_i} \frac{l_{i(j)}}{r^*_{i(j)}}. SPF means that the smaller the value of t_i's WP(t_i) or TP(t_i), the higher t_i's priority. For example, if t_i is a newly submitted task, its workload processed must be 0 (i.e., d = 0), so WP(t_i) would be equal to 0, indicating that t_i has the slowest progress:

WP(t_i) = \frac{\sum_{j=1}^{d} l_{i(j)}}{l_i},    (15)

TP(t_i) = \frac{\text{wall-clock time since } t_i\text{'s submission}}{TOL(t_i)}.    (16)

Based on the two different definitions, SPF can be split into two types, namely slowest-workload-progress-first (SWPF) and slowest-time-progress-first (STPF) respectively. We evaluated both of them in our experiment.

•  SSTF. SSTF selects the shortest subtask waiting in the queue. The shortest subtask is defined as the subtask (in the waiting queue) which has the minimal workload amount estimated based on single-core computation. As a subtask is completed, some new resources must be released for other tasks, which means that a new waiting subtask will then be scheduled if the queue is non-empty. Obviously, SSTF will result in the shortest waiting time for all the subtasks/tasks on average. In fact, since we select the "best" resource in the task scheduling, the eventual scheduling effect of SSTF will make the short subtasks be executed as soon as possible. Hence, this policy is exactly the same as the min-min policy [15], which has been effective in Grid workflow scheduling. However, our experiments validate that SSTF is not the best-suited scheduling policy in our cloud system.

•  LWF + LSTF. We can also combine different individual policies to generate a new scheduling policy. In our system, LWF + LSTF is devised for parallel-mode tasks, whose total execution length is determined by the longest subtask execution length (i.e., makespan); thus the subtasks with heavier workloads in the same task will have higher priority to be scheduled. On the other hand, in order to minimize the overall waiting time, all of the tasks will be scheduled based on lightest workload first. LWF + LSTF means that the subtasks whose task has the lightest workload will have the highest priority, and the subtasks belonging to the same task will be scheduled based on longest subtask first. In addition, we also implement LWF + SSTF for comparison.
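The policies above reduce to different sort keys over the waiting queue. Below is a compact sketch (the task record and field names are hypothetical, chosen only for illustration) covering LWF, SOLF, SPF in both variants (Formulas (15) and (16)), and the combined LWF+LSTF order:

    # Each task is assumed to carry: workload, tol, submit_time, done (number
    # of completed subtasks), and subtask_loads (per-subtask workloads).
    def lwf_key(task):
        return task["workload"]                      # lightest workload first

    def solf_key(task):
        return task["tol"]                           # shortest TOL first

    def swpf_key(task):
        # Formula (15): workload progress over the d completed subtasks.
        return sum(task["subtask_loads"][:task["done"]]) / task["workload"]

    def stpf_key(task, now):
        # Formula (16): elapsed wall-clock time relative to the TOL.
        return (now - task["submit_time"]) / task["tol"]

    def lwf_lstf_key(task, subtask_load):
        # LWF across tasks, longest subtask first within a task: ascending
        # sort on this tuple (the negated load yields "longest first").
        return (task["workload"], -subtask_load)

    # e.g., queue.sort(key=lwf_key) picks the next subtask under LWF.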
4.3 Minimization of Processing Overheads

In our system, in addition to the waiting time and execution time of subtasks, there are three more overheads which also need to be counted in the whole response time: the VM resource isolation cost, the data transmission cost between subtasks, and the VM's default restoring cost. Our cost-minimization strategy is to perform the data transmission and the VMM operations concurrently, based on the characterization of their costs. We also assign an extra amount of resources to super-short tasks (e.g., the tasks with TOL ≤ 2 seconds) in order to
mitigate the impact of the overhead on their executions. Specifically, we run them directly on VMs without any credit-tuning operation. Otherwise, the credit-tuning effect might work on another subtask instead of the current subtask, due to the inevitable delay (about 0.3 seconds) of the credit-tuning command. Details can be found in our corresponding conference paper [24].
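A minimal sketch of the concurrency strategy follows; transfer_output and tune_credit are hypothetical stand-ins for the system's data-staging and credit-tuning steps.

    from concurrent.futures import ThreadPoolExecutor

    def transfer_output(vm, data):    # hypothetical data-staging helper
        pass

    def tune_credit(vm):              # hypothetical credit-tuning helper
        pass

    def prepare_subtask(vm, data):
        # Overlap the two dominant overheads: stage the predecessor's output
        # onto the VM while the VMM credit-tuning command (whose effect is
        # delayed by ~0.3 s) runs in parallel.
        with ThreadPoolExecutor(max_workers=2) as pool:
            staging = pool.submit(transfer_output, vm, data)
            tuning = pool.submit(tune_credit, vm)
            staging.result()
            tuning.result()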
5 PERFORMANCE EVALUATION

5.1 Experimental Setting

We implement a cloud composite service prototype that can help solve complex matrix problems, each of which is allowed to contain a series of nested or parallel matrix computations. For an example of a nested matrix computation, a user may submit a request like Solve((A_{m×n} × A_{n×m})^k, B_{m×m}), which can be split into three steps (or subtasks): (1) matrix-product (a.k.a., matrix-matrix multiply): C_{m×m} = A_{m×n} × A_{n×m}; (2) matrix-power: D_{m×m} = C_{m×m}^k; (3) calculating the least-squares solution of DX = B: Solve(D_{m×m}, B_{m×m}).

In our experiment, we make use of ParallelColt [25] to perform the math computations, each consisting of a set of matrix computations. ParallelColt [25] is a library that can effectively calculate complex matrix computations, such as matrix-matrix multiply and matrix decomposition, in parallel (with multiple threads) based on the symmetric multiple processor (SMP) model.

There are in total 10 different matrix computations (such as matrix-product, matrix-decomposition, etc.), as shown in Table 1. We carefully characterize the single-core execution length (or workload) of each of them, and find that each matrix computation has its own execution type. For example, matrix-product and matrix-power are typical computation-intensive services, while the rank and two-norm computations turn out to be memory-intensive or I/O-bound ones when the matrix scale is large. Hence, each sequential-mode task that is made up of multiple different matrix computations in series can be considered a complex application whose execution type varies over time.

TABLE 1: Workloads (Single-Core Execution Length) of 10 Matrix Computations (Seconds)

Matrix Scale | M-M-Multi. | QR-Decom. | Matrix-Power | M-V-Multi. | Frob.-Norm | Rank | Solve | Solve-Tran. | V-V-Multi. | Two-Norm
500   | 0.7  | 2.6  | (m = 10) 2.1   | 0.001 | 0.010 | 1.6   | 0.175 | 0.94 | 0.014 | 1.7
1,000 | 11   | 12.7 | (m = 20) 55    | 0.003 | 0.011 | 8.9   | 1.25  | 7.25 | 0.021 | 9.55
1,500 | 38   | 35.7 | (m = 20) 193.3 | 0.005 | 0.03  | 29.9  | 4.43  | 24.6 | 0.047 | 29.4
2,000 | 99.3 | 78.8 | (m = 10) 396   | 0.006 | 0.043 | 67.8  | 10.2  | 57.2 | 0.097 | 68.2
2,500 | 201  | 99.5 | (m = 20) 1,015 | 0.017 | 0.111 | 132.6 | 18.7  | 109  | 0.141 | 136.6

In each test, we randomly generate a number of user requests, each of which is composed of 5~15 subtasks (or matrix computation services). Such a simulation is non-trivial, since each emulated matrix has to be compatible with each matrix computation (e.g., the two matrices in a matrix-product must be in the forms A_{m×n} and B_{n×p} respectively). Among the 10 matrix-computation services, three are implemented as multi-threaded programs, namely matrix-matrix multiply, QR-decomposition, and matrix-power; hence their computation can get an approximately linear speedup when allocated multiple processors. The other seven matrix operation services are implemented using a single thread, so they cannot get any speedup when allocated more than one processor. Hence, we set the capacity of any subtask performing a single-threaded service to the single-core rate, or less when its theoretically optimal resource to allocate is less than one core.

In our experiment, we are assigned eight physical nodes from the most powerful cluster in Hong Kong (namely Gideon-II [23]); each node owns two quad-core Xeon E5540 CPUs (i.e., eight processors per node) and 16 GB of memory. There are 56 VM images (CentOS 5.2) maintained via the Network File System (NFS), so 56 VMs (seven VMs per node) are generated at the bootstrap. XEN 4.0 [20] serves as the hypervisor on each node and dynamically allocates various CPU rates to the VMs at runtime using the credit scheduler.

We evaluate different queuing policies and resource allocation schemes under different competitive situations, with different numbers (4-24) of tasks submitted simultaneously. Table 2 lists the candidate key parameters investigated in our evaluation. Note that the measurement unit of α and β for RAPSM(T) is seconds, while the measurement unit for RAPSM(W) is seconds × 100, because a single core's processing ability is represented as 100 according to XEN's credit scheduler [19], [20].

TABLE 2: Candidate Key Parameters

Parameter | Value
threshold τ of short task length (seconds) | 5, 10, 20
η | 1.25, 1.5, 1.75, 2
α w.r.t. RAPSM(T) (seconds) | 5, 10, 20
β w.r.t. RAPSM(T) (seconds) | 100, 200, 300
α w.r.t. RAPSM(W) (seconds × 100) | 500, 1,000, 2,000
β w.r.t. RAPSM(W) (seconds × 100) | 10,000, 20,000, 30,000

5.2 Experimental Results

5.2.1 Demonstration of Resource Contention Degrees

We first characterize the various contention degrees with different numbers of tasks submitted. The contention degree is evaluated via two metrics, the allocate-request ratio (abbreviated as ARR) and the queue length (abbreviated as QL). The system's ARR at a time point is defined as the ratio of the total allocated resource amount to the total amount requested by subtasks at that moment. The QL at a time point is defined as the total number of subtasks in the waiting list at that moment. There are four test-cases, each of which uses a different number of tasks (4, 8, 16, and 24) submitted. The four test-cases correspond to different contention degrees.
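Both contention metrics are straightforward to compute from periodic samples of the system state; a small sketch under our own naming:

    def contention_trace(samples):
        # samples: list of (allocated_total, requested_total, n_waiting)
        # tuples, one per sampling instant. Returns the ARR and QL series.
        arr = [a / r if r > 0 else 1.0 for a, r, _ in samples]
        ql = [w for _, _, w in samples]
        return arr, ql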
Fig. 2. Allocation versus request with different contention degrees.

Fig. 2 shows the summed resource amount allocated and the summed amount requested over time under different competitive situations, with exactly the same experimental settings except for different scheduling policies. The numbers enclosed in parentheses indicate the number of tasks submitted.

We find that with the same number of submitted tasks, the ARR exhibits similar behavior under different scheduling policies. The resource amount allocated can always meet the resource amount requested (i.e., the ARR stays at one and the two curves overlap in the figure) when there is a small number (4 or 8) of tasks submitted, regardless of the scheduling policies. This confirms that our resource allocation scheme can guarantee the service level in the non-competitive situation. When the system runs with over-many tasks (such as 16 and 24) submitted, a prominent gap appears between the resource allocation curve and the resource request curve. This clearly indicates a competitive situation. For instance, when 24 tasks are submitted simultaneously, the ARR stays around 1/2 during the first 50 seconds. It is also worth noting that the longest task execution length under FCFS is remarkably longer than that under LWF (about 280 seconds versus about 240 seconds). This implies that the scheduling policy is essential to the performance of the cloud system.

Fig. 3. Queue lengths with different scheduling policies.

Fig. 3 presents that the QL increases with the number of tasks submitted. It is worth noticing that the QL under different scheduling policies behaves quite differently. In the duration with high competition (the first 50 seconds in the test), SSTF and LWF both lead to a small number of waiting tasks (about 5-6 and 6-7 respectively). By contrast, under SOLF, SWPF, or STPF, the QL is much longer (about 10-12 waiting tasks on average), implying a longer waiting time.

5.2.2 Investigation of Best-Suited Strategy

We explore the best-suited scheduling policy and resource allocation scheme in a competitive situation with 24 tasks (allocate-request ratio ≈ 1/2 for the first 50 seconds). The investigation covers sequential-mode tasks and parallel-mode tasks respectively.

A. Best-suited strategy for sequential-mode tasks. Fig. 4 shows the distribution (cumulative distribution function) of RER in the competitive situation, with different scheduling policies
used for the sequential-mode tasks. For each policy, we ran the experiments with all the possible combinations of the parameters shown in Table 2, and then computed the distribution. It is clearly observed that the RERs under LWF and SSTF are much smaller than those under the other policies. By contrast, the two worst policies are SWPF and SOLF, whose maximum RERs even reach up to 105 and 55 respectively. The main reason is that LWF and SSTF lead to much shorter waiting times than SWPF and SOLF, according to Fig. 3.

Fig. 4. Distribution of RER in a competitive situation.

In addition to the task scheduling policy, we also investigate the best-fit resource allocation scheme. In Table 3, we show the statistics of RER under various solutions, combining different scheduling policies and resource allocation schemes. We evaluate each solution with all the different combinations of parameters (including τ, η, α, and β) and compute the statistics (including the minimum, average, maximum, and fairness values).

TABLE 3: Statistics of RER in a Competitive Situation with Sequential-Mode Tasks

strategy | min. | avg. | max. | fairness
FCFS+PSM | 0.712 | 3.919 | 22.706 | 0.352
FCFS+RAPSM(T) | 0.718 | 4.042 | 23.763 | 0.351
FCFS+RAPSM(W) | 0.720 | 4.137 | 24.717 | 0.348
LWF+PSM | 0.720 | 2.106 | 8.202 | 0.628
LWF+RAPSM(T) | 0.719 | 2.172 | 8.659 | 0.603
LWF+RAPSM(W) | 0.723 | 2.122 | 7.937 | 0.630
SOLF+PSM | 0.736 | 2.979 | 13.473 | 0.506
SOLF+RAPSM(T) | 0.745 | 3.252 | 14.625 | 0.527
SOLF+RAPSM(W) | 0.738 | 3.230 | 14.380 | 0.526
SSTF+PSM | 0.791 | 2.068 | 8.263 | 0.591
SSTF+RAPSM(T) | 0.769 | 2.169 | 9.024 | 0.566
SSTF+RAPSM(W) | 0.770 | 2.126 | 8.768 | 0.579
SWPF+PSM | 0.713 | 6.167 | 58.691 | 0.209
SWPF+RAPSM(T) | 0.726 | 6.532 | 62.332 | 0.208
SWPF+RAPSM(W) | 0.718 | 6.477 | 61.794 | 0.208
STPF+PSM | 0.723 | 3.176 | 16.398 | 0.465
STPF+RAPSM(T) | 0.723 | 3.208 | 15.831 | 0.475
STPF+RAPSM(W) | 0.722 | 3.188 | 15.399 | 0.485

Through Table 3, we observe that LWF and SSTF result in the lowest RERs on average and in the worst case, which is consistent with the distribution of RER shown in Fig. 4. They improve the performance by 3.919/2.1 − 1 = 86.6%, as compared to the FCFS policy. It is also observed that the relative mode based adjusted PSM (RAPSM) may not further reduce the RER as expected. This means that RAPSM cannot directly improve the execution performance without the absolute mode based adjusted PSM (AAPSM). Later, in Section 5.2.3, we will show that the solutions with AAPSM can significantly reduce the RER, in comparison to RAPSM alone.

Fig. 5. Distribution of PBR and PPR in a competitive situation.

We also show the distributions of the payment budget ratio (PBR) and the performance payment ratio (PPR) in Fig. 5. Through Fig. 5a, we observe that all of the tasks' payments are guaranteed to stay below their budgets. This is due to the strict payment-by-reserve model (Formula (2) and Theorem 1) we always follow in our design. Through Fig. 5b, it is also observed that PPR behaves similarly to RER. For example, the two best scheduling policies are also LWF and SSTF. Their mean PPRs are 0.874 and 0.883 respectively; their maximum PPRs are 8.176 and 6.92 respectively. Apparently, if we take into account only the task scheduling policy and not the difference of the adjusted resource allocation scheme, SSTF outperforms the others prominently due to its shortest average waiting time.

B. Best-suited strategy for parallel-mode tasks. We also explore the best-suited scheduling strategy with respect to the parallel-mode tasks. Table 4 presents the different minimum/average/maximum/fairness values of RER when scheduling parallel-mode tasks by different policies, including FCFS, LWF, SSTF, etc. Note that SPF is not included, because all of the subtasks in a task will be executed in parallel, so that it is meaningless to characterize the processing progress, unlike the situation with sequential-mode tasks.

Each test is conducted with 24 tasks, and each task randomly contains 5-15 parallel matrix-power computation subtasks. Since there are only eight physical execution nodes, this is a competitive situation where some tasks have to be queued for scheduling. In this experiment, we also implement heaviest workload first (HWF) combined with longest (shortest) subtask first for comparison.

TABLE 4: Statistics of RER in a Competitive Situation with Parallel-Mode Tasks

strategy | min. | avg. | max. | fairness
FCFS | 2.75 | 1.50 | 4.97 | 0.89
LWF | 2.25 | 1.30 | 3.72 | 0.92
HWF | 3.04 | 1.45 | 6.25 | 0.86
SSTF | 3.37 | 2.09 | 4.94 | 0.96
LWF+LSTF | 2.34 | 1.57 | 3.58 | 0.96
LWF+SSTF | 2.27 | 1.35 | 3.79 | 0.94
HWF+LSTF | 3.20 | 1.43 | 6.88 | 0.81
HWF+SSTF | 3.20 | 1.38 | 7.39 | 0.82

Based on Table 4, we can observe that LWF always leads to a fairly high scheduling performance. For example, when only adopting LWF, the average RER is about 1.30, which is lower than that of FCFS by (1.5 − 1.3)/1.5 = 13.3%, and lower than SSTF by (2.09 − 1.3)/2.09 = 37.8%. Adopting LWF+LSTF (i.e., the combination of LWF and LSTF) minimizes the maximum RER down to 3.58, which is lower than the other strategies by 3.8 percent (LWF) to 51.6 percent (HWF+SSTF).

The key reason why LWF exhibits much better results is that LWF schedules the shortest tasks with the highest priority, suffering the least overall waiting time. In particular, not only can LWF+LSTF minimize the maximum RER, but it can also lead to the highest fairness (up to 0.96), which indicates a fairly high stability of the LWF+LSTF policy. This is because each task is a parallel-mode task, such that longest subtask first can effectively minimize the makespan of each task, optimizing the execution performance for a particular task.

5.2.3 Exploration of Best-Fit Parameters

In this section, we comprehensively compare various solutions with different scheduling policies and adjusted resource allocation schemes under the different parameters shown in Table 2, in both a competitive situation (i.e., ARR is about 1/2) and a non-competitive situation (i.e., ARR approaches 1). We find that the adjusted resource allocation scheme can effectively improve the execution performance (RER and PPR) only when it is combined with the absolute mode based adjusted PSM (AAPSM).

A. Evaluation in a competitive situation. Fig. 6 shows the RER of various solutions with different parameters, including η, α, and β. It is observed that various policies with different choices of the three parameters lead to different results. The smallest RER (best result) is 1.77, obtained when adopting SSTF+RAPSM(W) with η set to 1.75 and {α, β} set to {1,000, 30,000}. The largest RER (worst case) is 7.69, obtained when adopting SWPF+RAPSM(W) with η set to 1.5 and {α, β} set to {1,000, 20,000}. We also find that different selections of the three parameters may affect the performance
prominently for a few solutions like STPF+RAPSM(T) and STPF+RAPSM(W). However, they do not impact the RER much in most cases. From Figs. 6b, 6d, 6f, and 6h, it is observed that with different parameters, the RERs under both LWF and SSTF stay within [1.85, 2.31].

Fig. 6. Average RER of various solutions with different parameters.

TABLE 5: Mean RER under Various Solutions with Different τ and η

strategy | τ | η = 1.25 | η = 1.5 | η = 1.75 | η = 2
FCFS | 5 | 4.050 | 4.142 | 4.131 | 4.054
FCFS | 10 | 4.122 | 3.952 | 3.924 | 3.845
FCFS | 20 | 4.121 | 4.196 | 4.139 | 4.296
LWF | 5 | 2.071 | 2.090 | 2.169 | 2.138
LWF | 10 | 2.268 | 2.133 | 2.179 | 2.152
LWF | 20 | 2.194 | 1.935 | 2.194 | 2.218
SOLF | 5 | 3.316 | 3.321 | 3.552 | 3.102
SOLF | 10 | 3.241 | 3.382 | 2.989 | 2.783
SOLF | 20 | 3.375 | 3.324 | 3.305 | 3.039
SSTF | 5 | 2.111 | 2.072 | 2.275 | 2.147
SSTF | 10 | 2.202 | 2.171 | 1.980 | 2.172
SSTF | 20 | 2.322 | 1.968 | 2.092 | 2.205
STPF | 5 | 3.265 | 3.011 | 3.271 | 3.119
STPF | 10 | 3.296 | 3.024 | 3.152 | 3.132
STPF | 20 | 3.200 | 3.326 | 3.318 | 3.244
SWPF | 5 | 6.169 | 6.371 | 6.339 | 6.322
SWPF | 10 | 6.271 | 6.353 | 6.446 | 6.659
SWPF | 20 | 6.784 | 6.763 | 6.730 | 6.635

TABLE 6: Max. RER under Various Solutions with Different τ and η

strategy | τ | η = 1.25 | η = 1.5 | η = 1.75 | η = 2
FCFS | 5 | 23.392 | 25.733 | 24.742 | 24.470
FCFS | 10 | 24.184 | 22.397 | 22.633 | 23.192
FCFS | 20 | 24.204 | 25.258 | 24.391 | 25.313
LWF | 5 | 9.164 | 7.826 | 8.543 | 8.323
LWF | 10 | 10.323 | 7.681 | 9.136 | 9.104
LWF | 20 | 8.106 | 5.539 | 6.962 | 8.812
SOLF | 5 | 13.294 | 15.885 | 15.743 | 13.255
SOLF | 10 | 12.735 | 18.719 | 13.939 | 10.325
SOLF | 20 | 17.070 | 15.642 | 13.379 | 13.394
SSTF | 5 | 8.091 | 7.931 | 9.276 | 9.803
SSTF | 10 | 10.245 | 8.376 | 7.224 | 11.199
SSTF | 20 | 11.418 | 6.432 | 6.706 | 9.650
STPF | 5 | 16.456 | 15.545 | 17.642 | 15.232
STPF | 10 | 16.925 | 14.797 | 15.892 | 14.710
STPF | 20 | 13.978 | 15.596 | 16.250 | 14.853
SWPF | 5 | 58.467 | 61.647 | 60.266 | 59.044
SWPF | 10 | 59.351 | 60.199 | 61.241 | 64.286
SWPF | 20 | 65.142 | 64.887 | 64.769 | 63.326

In our experiments, the most interesting and valuable finding is that AAPSM with different short task length thresholds (τ) results in quite different outcomes, as shown in Tables 5, 6 and 7. These three tables present the three key indicators, the average RER, the maximum
DI ET AL.: OPTIMIZATION OF COMPOSITE CLOUD SERVICE PROCESSING WITH VIRTUAL MACHINES 1765

TABLE 7
Fairness of RER under Various Solutions with Different t&h

strategy t h ¼ 1.25 h ¼ 1.5 h ¼ 1.75 h¼2


FCFS 5 0.353 0.338 0.345 0.352
10 0.347 0.359 0.358 0.348
20 0.352 0.344 0.354 0.347
LWF 5 0.585 0.628 0.609 0.619
10 0.552 0.640 0.581 0.579
20 0.627 0.709 0.670 0.610
SOLF 5 0.552 0.506 0.540 0.526
10 0.566 0.459 0.497 0.573
20 0.484 0.520 0.538 0.541
SSTF 5 0.594 0.602 0.558 0.549
10 0.527 0.575 0.623 0.479
20 0.517 0.660 0.650 0.544
STPF 5 0.468 0.472 0.446 0.480
10 0.457 0.473 0.474 0.494
20 0.508 0.500 0.479 0.499
SWPF 5 0.210 0.205 0.209 0.215 Fig. 7. Investigation of best-fit parameters for LWF+AAPSM.
10 0.210 0.209 0.209 0.204
20 0.206 0.207 0.206 0.208
Table 7 shows the fairness of RER with different solu-
tions. We find that the best result is adopting LWF with t
and h being set to 20 and 1.5, and the expected fairness
RER, and fairness index of RER, when adopting various value is up to 0.709, which is better than the second best
solutions with different values of t and h. In our evalua- solution (SSTF, t ¼ 20, h ¼ 1.5) by about 0:7090:66 ¼ 7:4%.
0:66
tion, we compute the average value for each of the three From the three tables, we can conclude that LWF
indicators, by traversing all of the remaining parameters, (t ¼ 20,h ¼ 1.5) is the best choice in the competitive situa-
including a and b. Accordingly, the values shown in tion with AAR 2.
three tables can be deemed relatively stable mathematical In Fig. 7, we further investigate the performance (on
expectation. minimum value, average value, and maximum value of
Through Table 5, we clearly observe that LWF and SSTF RER) with more comprehensive combinations of parame-
significantly outperforms other solutions, w.r.t. the mean ters, by taking into account scheduling policy, AAPSM and
values of RER. The mean values of RER under the two solu- RAPSM together. We find that LWF and SSTF result in
tions can be restricted down to 1.935 and 1.968 respectively, best results when their short task length thresholds (t) are
when short task length threshold is set to 20 seconds. The set to 20 and 10 seconds respectively. So, we just make the
mean value of RER under FCFS is about 4, which is about comparison in these two situations. It is observed that the
twice as large as that of LWF or SSTF. The worst situation mean values of RER of LWFþAAPSMþRAPSM(T) and
occurs when adopting SWPFþRAPSM(W) and setting t to LWFþAAPSMþRAPSM(W) (t ¼ 20) can be down to 1.64
20. In this situation, the mean value of RER is even up and 1.67 respectively, and their values of {a, b} are {300,
to 6.784, which is worse than LWF(t ¼ 20) by 6:7841:935  1:935
¼ 20} and {10,000, 2,000} respectively. In addition, the maxi-
250.6%. The reason why t¼ 20 is often better than t ¼ 5 is mum value of RER under LWFþAAPSMþRAPSM(T) (t ¼
that the former assigns more resources to short tasks at run- 20) is about 3.9, which is the best result as we observed.
time, significantly reducing the waiting cost in the system. B. Evaluation in a non-competitive situation. For the non-
However, t¼ 20 is not always better than t ¼ 5 or t¼ 10, in competitive situation, there are eight tasks submitted and
that the resource allocation is also related to other parame- AAR is about 1 in the first 50 seconds. We compare the exe-
ters like h. That is, if h is set to 2, then t¼ 20 will lead to an cution performance (i.e., RER) when assigning different val-
over-adjusted resource allocation situation, which exhibits ues to h, a, and b in a non-competitive situation. For each
worst results than t = 10. set of parameters, we perform 144 tests, based on various
Through Table 6, it is observed that LWF and SSTF significantly outperform the other solutions w.r.t. the maximum values of RER (i.e., the worst case for each solution). In absolute terms, the expected value of the worst RER when adopting LWF with τ = 20 is about 5.539, and SSTF's is about 6.432, which is worse than LWF by 6.432/5.539 − 1 ≈ 16.1%. The worst case among all solutions happens when using SWPF+RAPSM(W), whose expected worst RER reaches 64.887, about 11.7 times as large as that of LWF (τ = 20). The expected value of the worst RER under FCFS is about 23, which is about four times as large as that of LWF.

Table 7 shows the fairness of RER with different solutions. We find that the best result is obtained by adopting LWF with τ and η set to 20 and 1.5, whose expected fairness value is up to 0.709, better than the second best solution (SSTF, τ = 20, η = 1.5) by about (0.709 − 0.66)/0.66 ≈ 7.4%. From the three tables, we can conclude that LWF (τ = 20, η = 1.5) is the best choice in the competitive situation with AAR ≈ 2.

In Fig. 7, we further investigate the performance (the minimum, average, and maximum values of RER) under more comprehensive combinations of parameters, by taking the scheduling policy, AAPSM, and RAPSM into account together. We find that LWF and SSTF deliver their best results when their short task length thresholds (τ) are set to 20 and 10 seconds respectively, so we make the comparison in these two situations only. It is observed that the mean values of RER of LWF+AAPSM+RAPSM(T) and LWF+AAPSM+RAPSM(W) (τ = 20) can be kept down to 1.64 and 1.67 respectively, with their values of {α, β} being {300, 20} and {10,000, 2,000} respectively. In addition, the maximum value of RER under LWF+AAPSM+RAPSM(T) (τ = 20) is about 3.9, which is the best result we observed.
[Fig. 8. Average RER with different parameters for the non-competitive situation.]
B. Evaluation in a non-competitive situation. For the non-competitive situation, there are eight tasks submitted and AAR is about 1 in the first 50 seconds. We compare the execution performance (i.e., RER) when assigning different values to η, α, and β. For each set of parameters, we perform 144 tests, based on the various task scheduling policies and the different values of the short task length threshold (i.e., τ), and then compute the CDF for each set of parameters, as shown in Fig. 8.

Through Fig. 8, it is observed that different assignments of the parameters result in different execution performance. For RAPSM(T) and RAPSM(W), {α = 200 s, β = 5 s} and {α = 20,000, β = 500} respectively often serve as the best parameter assignments, regardless of η's value. For example, Fig. 8d shows that when η is set to 2.0, {α = 200 seconds, β = 5 seconds} is better than the other choices by 3.8 to 7.1 percent. In addition, we can also observe a relatively high instability in the other assignments of parameters. For instance, with RAPSM(T), {α = 100 seconds, β = 20 seconds} exhibits good results in Fig. 8c (η = 1.75), but bad results in Fig. 8d (η = 2.0); using RAPSM(W) with {α = 10,000, β = 2,000}, about 93 percent of tasks' RERs are below 1 when η is set to 1.75, while the corresponding ratio is only 86 percent when η is set to 2.0.
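As a concrete illustration of how each CDF curve in Fig. 8 can be derived from the 144 tests of one parameter set {η, α, β}, consider the following minimal Java sketch; the sorting-based empirical CDF is standard, while the array contents here are placeholders rather than measured data.

import java.util.Arrays;

// Minimal sketch: empirical CDF of RER over the 144 tests of one
// parameter set. The rer array is a placeholder for measured values.
public class RerCdf {
    public static void main(String[] args) {
        double[] rer = new double[144];     // one value per test (placeholder)
        Arrays.fill(rer, 1.0);              // e.g., pretend all tests hit RER = 1

        Arrays.sort(rer);
        // F(x) = fraction of tests whose RER is <= x; print a few points.
        for (double x : new double[]{0.5, 1.0, 2.0, 4.0}) {
            int count = countLeq(rer, x);
            System.out.printf("P(RER <= %.1f) = %.2f%n",
                              x, count / (double) rer.length);
        }
    }

    // Count of elements <= x in a sorted array (linear scan for clarity).
    static int countLeq(double[] sorted, double x) {
        int count = 0;
        for (double v : sorted) {
            if (v <= x) count++; else break;
        }
        return count;
    }
}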
6 RELATED WORK

Although the job scheduling problem [26] in Grid computing [27] has been studied extensively for years, most of the resulting approaches (such as [28], [29]) are not suited to our cloud composite service processing environment. Grid jobs often have long execution lengths, while Cloud tasks are often short, based on [13]. Hence, a task's response time is more easily degraded by scheduling/execution overheads (such as waiting time and data transmission cost) in a Cloud environment than in a Grid environment. That is, the overheads in a Cloud environment must be treated more carefully.

Recently, many new scheduling methods have been proposed for different Cloud systems. Zaharia et al. [30] designed a task scheduling method to improve the performance of Hadoop [31] in a heterogeneous environment (such as a pool of VMs each customized with different abilities). Unlike the FCFS policy and speculative execution model originally used in Hadoop, they designed a longest approximate time to end (LATE) policy, which assigns higher priorities to the jobs with longer remaining execution lengths (see the sketch at the end of this section). Their intuition is to maximize the opportunity for a speculative copy to overtake the original and thus reduce the job's response time. Isard et al. [32] proposed a fair scheduling policy (namely Quincy) for a high-performance compute system with virtual machines, in order to maximize scheduling fairness while minimizing data transmission cost. Compared to these works, our Cloud system works with a strict payment model, under which the optimal resource allocation for each task can be computed based on convex optimization theory. Mao et al. [33], [34] proposed a solution combining dynamic scheduling with the EDF strategy, to minimize user payment while meeting application deadlines. However, they overlook the competitive situation by assuming that the resource pool is always adequate and users have unlimited budgets. Many other methods, such as Genetic algorithms [35] and the Simulated Annealing algorithm [36], often overlooked the execution overheads of VM operation or data transmission, and performed their evaluation through simulation.

In addition to the scheduling model, many Cloud management researchers focus on the optimization of resource assignment. Unlike Grid systems, whose compute nodes are exclusively consumed by jobs, the resource allocation in Cloud systems can be refined by leveraging VM resource isolation technology. Stillwell et al. [37] exploited how to optimize the resource allocation for service hosting on a heterogeneous distributed platform. Their research is formalized as a mixed integer linear program (MILP) problem and treated as a rational LP problem instead, also with fundamental theoretical analysis based on estimation errors. In comparison to their work, we intensively exploit the best-suited scheduling policy and resource allocation scheme for the competitive situation. We also take the user payment requirement into account, and evaluate our solution in a real-VM-deployment environment, which has to tackle more practical technical issues such as the minimization of various execution overheads. Meng et al. [38] analyzed VM pairs' compatibility in terms of the forecasted workload and estimated VM sizes. SnowFlock [39] is another interesting technology that allows any VM to be quickly cloned (similar to a UNIX process fork) such that the resource allocation can be automatically refined at runtime. Kuribayashi [40] also proposed a resource allocation method for Cloud computing environments, especially based on divisible resources. BlobCR [41] aims to optimize the performance of HPC applications on IaaS clouds at the system level, by improving the robustness of running virtual machines using virtual disk image snapshots. In comparison, our work focuses on the theoretical optimization of performance when the system runs in short supply, and on the corresponding implementation issues at the application level.
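As a rough illustration of the LATE-style prioritization described above, the following sketch orders jobs by their estimated remaining time; the Job record and the constant-rate progress estimate are illustrative assumptions of ours, not Hadoop's actual data structures or estimator.

import java.util.*;

// Rough sketch of LATE-style prioritization: jobs with the longest
// estimated remaining time come first, maximizing the chance that a
// speculative copy overtakes the original.
public class LatePriority {
    record Job(String id, double progress, double elapsedSec) {
        // Estimated time to end, assuming a constant progress rate.
        double estimatedRemaining() {
            double rate = progress / elapsedSec;   // progress per second
            return (1.0 - progress) / rate;
        }
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>(List.of(
            new Job("j1", 0.9, 90),     // ~10 s left
            new Job("j2", 0.2, 60),     // ~240 s left
            new Job("j3", 0.5, 100)));  // ~100 s left

        // Higher priority = longer approximate time to end.
        jobs.sort(Comparator.comparingDouble(Job::estimatedRemaining).reversed());
        jobs.forEach(j -> System.out.printf("%s: ~%.0f s remaining%n",
                                            j.id(), j.estimatedRemaining()));
    }
}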
7 CONCLUSION AND FUTURE WORK

In this paper, we designed and implemented a loosely-coupled Cloud system with web services deployed on multiple VMs, aiming to improve the QoS of each user request and maximize the fairness of treatment at runtime. Our contribution is three-fold: (1) we studied the best-suited task scheduling policy with VMs; (2) we explored an optimal resource allocation scheme and an adjusted strategy to suit the competitive situation; (3) the processing overhead is minimized in our design. Based on our experiments, we summarize the following lessons.

• We confirm that the best policy for scheduling sequential-mode tasks in the competitive situation is either lightest-workload-first (LWF) or SSTF. Each of them improves the performance by about 86 percent compared to FCFS. As for parallel-mode tasks, the best-fit policy is combining LWF with longest subtask first, whose average RER is lower than that of the other solutions by 3.8 to 51.6 percent.

• For a competitive situation, the best solution is combining lightest-workload-first with AAPSM and RAPSM (in absolute terms, LWF+AAPSM+RAPSM with the short task length threshold and the extension coefficient set to 20 seconds and 1.5 respectively). It outperforms the other solutions in the competitive situation by 16+% w.r.t. the worst-case response time. The fairness under this solution is about 0.709, which is higher than that of the second best solution (SSTF+AAPSM+RAPSM) by 7.4+%.

• For a non-competitive situation, {α = 200 seconds, β = 5 seconds} serves as the best assignment of the parameters, regardless of the value of the extension coefficient (η).

In the future, we plan to further exploit an adaptive solution that can dynamically optimize the performance in both competitive and non-competitive situations. We also plan to improve the fault tolerance and resilience of our cloud system.

ACKNOWLEDGMENTS

This work was supported by the ANR project Clouds@home (ANR-09-JCJC-0056-01), by the U.S. Department of Energy, Office of Science, under Contract DE-AC02-06CH11357, and in part by HKU 716712E.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the clouds: A Berkeley view of cloud computing," EECS, Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2009-28, Feb. 2009.
[2] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, "A break in the clouds: Towards a cloud definition," SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50–55, 2009.
[3] Google App Engine. (2008). [Online]. Available: http://code.google.com/appengine/
[4] J. E. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes. San Mateo, CA, USA: Morgan Kaufmann, 2005.
[5] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat, "Enforcing performance isolation across virtual machines in Xen," in Proc. 7th ACM/IFIP/USENIX Int. Conf. Middleware, 2006, pp. 342–362.
[6] J. N. Matthews, W. Hu, M. Hapuarachchi, T. Deshane, D. Dimatos, G. Hamilton, M. McCabe, and J. Owens, "Quantifying the performance isolation properties of virtualization systems," in Proc. ACM Workshop Exp. Comput. Sci., 2007, pp. 1–9.
[7] S. Chinni and R. Hiremane, "Virtual machine device queues," Virt. Technol. White Paper, 2008, pp. 1–22. [Online]. Available: http://www.intel.com/content/www/us/en/virtualization/vmdq-technology-paper.html
[8] T. Cucinotta, D. Giani, D. Faggioli, and F. Checconi, "Providing performance guarantees to virtual machines using real-time scheduling," in Proc. 5th ACM Workshop Virtualization High-Perform. Cloud Comput., 2010, pp. 657–664.
[9] R. Nathuji, A. Kansal, and A. Ghaffarkhah, "Q-clouds: Managing performance interference effects for QoS-aware clouds," in Proc. ACM Eur. Conf. Comput. Syst., 2010, pp. 237–250.
[10] R. Ghosh and V. K. Naik, "Biting off safely more than you can chew: Predictive analytics for resource over-commit in IaaS cloud," in Proc. IEEE 5th Int. Conf. Cloud Comput., 2012, pp. 25–32.
[11] Google cluster-usage traces. (2011). [Online]. Available: http://code.google.com/p/googleclusterdata
[12] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, "Towards understanding heterogeneous clouds at scale: Google trace analysis," Intel Sci. Technol. Center Cloud Comput., Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep. ISTC-CC-TR-12-101, Apr. 2012.
[13] S. Di, D. Kondo, and W. Cirne, "Characterization and comparison of cloud versus grid workloads," in Proc. IEEE Int. Conf. Cluster Comput., 2012, pp. 230–238.
[14] M. Rahman, S. Venugopal, and R. Buyya, "A dynamic critical path algorithm for scheduling scientific workflow applications on global grids," in Proc. 3rd IEEE Int. Conf. e-Sci. Grid Comput., 2007, pp. 35–42.
[15] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems," in Proc. 8th Heterogeneous Comput. Workshop, 1999, p. 30.
[16] EDF scheduling. (2008). [Online]. Available: http://en.wikipedia.org/wiki/earliest_deadline_first_scheduling
[17] S. Di, Y. Robert, F. Vivien, D. Kondo, C.-L. Wang, and F. Cappello, "Optimization of cloud task processing with checkpoint-restart mechanism," in Proc. IEEE/ACM Int. Conf. High Perform. Comput., Netw., Storage Anal., 2013, pp. 64:1–64:12.
[18] L. Huang, J. Jia, B. Yu, B. G. Chun, P. Maniatis, and M. Naik, "Predicting execution time of computer programs using sparse polynomial regression," in Proc. 24th Int. Conf. Neural Inf. Process. Syst., 2010, pp. 1–9.
[19] Xen credit scheduler. (2003). [Online]. Available: http://wiki.xensource.com/xenwiki/creditscheduler
[20] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proc. 19th ACM Symp. Operating Syst. Principles, 2003, pp. 164–177.
[21] Amazon Elastic Compute Cloud. (2006). [Online]. Available: http://aws.amazon.com/ec2/
[22] M. Feldman, K. Lai, and L. Zhang, "The proportional-share allocation market for computational resources," IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 8, pp. 1075–1088, Aug. 2009.
[23] Gideon-II Cluster. (2010). [Online]. Available: http://i.cs.hku.hk/clwang/Gideon-II
[24] S. Di, D. Kondo, and C.-L. Wang, "Optimization and stabilization of composite service processing in a cloud system," in Proc. IEEE/ACM 21st Int. Symp. Quality Serv., 2013, pp. 1–10.
[25] P. Wendykier and J. G. Nagy, "Parallel Colt: A high-performance Java library for scientific computing and image processing," ACM Trans. Math. Softw., vol. 37, pp. 31:1–31:22, Sep. 2010.
[26] C. Jiang, C. Wang, X. Liu, and Y. Zhao, "A survey of job scheduling in grids," in Proc. Joint 9th Asia-Pacific Web 8th Int. Conf. Web-Age Inf. Manage. Conf. Advances Data Web Manage., 2007, pp. 419–427.
[27] I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure (The Morgan Kaufmann Series in Computer Architecture and Design). San Mateo, CA, USA: Morgan Kaufmann, Nov. 2003.
[28] E. Imamagic, B. Radic, and D. Dobrenic, "An approach to grid scheduling by using Condor-G matchmaking mechanism," in Proc. 28th Int. Conf. Inf. Technol. Interfaces, 2006, pp. 625–632.
[29] Y. Gao, H. Rong, and J. Z. Huang, "Adaptive grid job scheduling with genetic algorithms," Future Generation Comput. Syst., vol. 21, pp. 151–161, Jan. 2005.
[30] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments," in Proc. 8th USENIX Conf. Operating Syst. Des. Implementation, 2008, pp. 29–42.
[31] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Proc. IEEE 26th Symp. Mass Storage Syst. Technol., 2010, pp. 1–10.
[32] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg, "Quincy: Fair scheduling for distributed computing clusters," in Proc. ACM SIGOPS 22nd Symp. Operating Syst. Principles, 2009, pp. 261–276.
[33] M. Mao, J. Li, and M. Humphrey, "Cloud auto-scaling with deadline and budget constraints," in Proc. 11th IEEE/ACM Int. Conf. Grid Comput., 2010, pp. 41–48.
[34] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proc. IEEE/ACM Int. Conf. High Perform. Comput., Netw., Storage Anal., 2011, pp. 49:1–49:12.
[35] S. Kaur and A. Verma, "An efficient approach to genetic algorithm for task scheduling in cloud computing," Int. J. Inf. Technol. Comput. Sci., vol. 10, pp. 74–79, 2012.
[36] S. Zhan and H. Huo, "Improved PSO-based task scheduling algorithm in cloud computing," J. Inf. Comput. Sci., vol. 9, no. 13, pp. 3821–3829, 2012.
[37] M. Stillwell, F. Vivien, and H. Casanova, "Virtual machine resource allocation for service hosting on heterogeneous distributed platforms," in Proc. IEEE Int. Parallel Distrib. Process. Symp., Shanghai, China, 2012, pp. 786–797.
[38] X. Meng et al., "Efficient resource provisioning in compute clouds via VM multiplexing," in Proc. 7th Int. Conf. Autonomic Comput., 2010, pp. 11–20.
[39] H. A. L. Cavilla, J. A. Whitney, A. M. Scannell, P. Patchin, S. M. Rumble, E. de Lara, M. Brudno, and M. Satyanarayanan, "SnowFlock: Rapid virtual machine cloning for cloud computing," in Proc. 4th ACM Eur. Conf. Comput. Syst., 2009, pp. 1–12.
[40] S.-i. Kuribayashi, "Optimal joint multiple resource allocation method for cloud computing environments," Int. J. Res. Rev. Comput. Sci., vol. 2, pp. 1–8, 2011.
[41] B. Nicolae and F. Cappello, "BlobCR: Efficient checkpoint-restart for HPC applications on IaaS clouds using virtual disk image snapshots," in Proc. IEEE/ACM Int. Conf. High Perform. Comput., Netw., Storage Anal., 2011, pp. 34:1–34:12.

Sheng Di received the Master (M.Phil.) degree from Huazhong University of Science and Technology in 2007 and the Ph.D. degree from The University of Hong Kong in November 2011, both in computer science. Dr. Di is currently a postdoctoral researcher at Argonne National Laboratory, Lemont, USA. His research interests involve the optimization of distributed resource allocation in large-scale cloud platforms, the characterization and prediction of workload at Cloud data centers, and fault tolerance on Cloud/HPC.

Derrick Kondo received the bachelor's degree from Stanford University in 1999, and the master's and PhD degrees from the University of California at San Diego in 2005, all in computer science. He is currently a tenured research scientist at INRIA Grenoble, Montbonnot-Saint-Martin, France. His current research interests include reliability, fault tolerance, statistical analysis, and job and resource management. He received the Young Researcher Award (similar to the US National Science Foundation's CAREER Award) in 2009, the Amazon Research Award in 2010, and the Google Research Award in 2011. He is a member of the IEEE.

Cho-Li Wang is currently a Professor in the Department of Computer Science at The University of Hong Kong. He graduated with a B.S. degree in Computer Science and Information Engineering from National Taiwan University in 1985 and received a Ph.D. degree in Computer Engineering from the University of Southern California in 1995. Prof. Wang's research is broadly in the areas of parallel architecture, software systems for Cluster computing, and virtualization techniques for Cloud computing. His recent research projects involve the development of parallel software systems for multicore/GPU computing and multi-kernel operating systems for future manycore processors. Prof. Wang has published more than 150 papers in various peer-reviewed journals and conference proceedings. He is/was on the editorial boards of several scholarly journals, including IEEE Transactions on Cloud Computing, IEEE Transactions on Computers, and Journal of Information Science and Engineering. He also serves as a coordinator (China) of the IEEE Technical Committee on Parallel Processing (TCPP).