Zhang 2016
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2015.2425403, IEEE Transactions on Parallel and Distributed Systems
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. –, NO. -, MONTH YEAR 1
Abstract—In computing clouds, burstiness of a virtual machine (VM) workload widely exists in real applications, where spikes usually
occur aperiodically with low frequency and short duration. This could be effectively handled through dynamically scaling up/down in a
virtualization-based computing cloud; however, to minimize energy consumption, VMs are often highly consolidated with the minimum
number of physical machines (PMs) used. In this case, to meet the dynamic runtime resource demands of VMs in a PM, some VMs have
to be migrated to some other PMs, which may cause potential performance degradation. In this paper, we investigate the burstiness-
aware server consolidation problem from the perspective of resource reservation, i.e., reserving a certain amount of extra resources on
each PM to avoid live migrations, and propose a novel server consolidation algorithm, QUEUE. We first model the resource requirement
pattern of each VM as a two-state Markov chain to capture burstiness, then we design a resource reservation strategy for each PM
based on the stationary distribution of a Markov chain. Finally, we present QUEUE, a complete server consolidation algorithm with
a reasonable time complexity. We also show how to cope with heterogenous spikes and provide remarks on several extensions.
Simulation and testbed results show that QUEUE improves the consolidation ratio by up to 45% with large spike size and around
30% with normal spike size compared with the strategy that provisions for peak workload, and achieves a better balance between
performance and energy consumption in comparison with other commonly-used consolidation algorithms.
Index Terms—Bursty workload, Markov chain, resource reservation, server consolidation, stationary distribution
1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
(1) To the best of our knowledge, we are the first to quantify the amount of reserved resources with
Fig. 2. Two-state Markov chain. The left panel shows a workload profile over time; the right panel shows the ON-OFF model. The "ON" state represents the spiky workload (Rp = Rb + Re) and the "OFF" state represents the normal workload (Rb); a VM switches from OFF to ON with probability pon (staying OFF with probability 1−pon) and from ON to OFF with probability poff (staying ON with probability 1−poff).
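As a concrete illustration of the ON-OFF model, the following Python sketch simulates one VM's workload trace. The function name and the values of Rb, Re, pon, and poff are ours, chosen for illustration; they are not taken from the paper's figures.

```python
import random

def simulate_on_off(T, r_b, r_e, p_on, p_off, seed=0):
    """Simulate one VM's workload W(t) under the two-state Markov chain:
    OFF -> ON with probability p_on, ON -> OFF with probability p_off.
    W(t) = Rb in the OFF state and W(t) = Rp = Rb + Re in the ON state;
    state switches happen at the end of each time interval."""
    rng = random.Random(seed)
    state_on = False          # start in the OFF (normal-workload) state
    trace = []
    for _ in range(T):
        trace.append(r_b + r_e if state_on else r_b)
        if state_on:
            state_on = rng.random() >= p_off  # leave ON with prob. p_off
        else:
            state_on = rng.random() < p_on    # enter ON with prob. p_on
    return trace

# With p_on = 0.01 and p_off = 0.09 (the bursty setting used in Section 7),
# the long-run fraction of ON slots is p_on / (p_on + p_off) = 0.1.
trace = simulate_on_off(T=100000, r_b=10, r_e=10, p_on=0.01, p_off=0.09)
frac_on = sum(1 for w in trace if w == 20) / len(trace)
```

Running this with the bursty parameters above yields spikes that are rare (about 10% of slots) and short (mean duration 1/poff ≈ 11 slots), matching the low-frequency, short-duration behavior the model is meant to capture.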
Symbol  Meaning
n       the number of VMs
Vi      the i-th VM
Rbi     the normal workload size of Vi
Rei     the spiky workload size of Vi
Rpi     the peak workload size of Vi, and Rpi = Rbi + Rei
pion    the probability of Vi switching from OFF to ON
pioff   the probability of Vi switching from ON to OFF
m       the number of PMs
Hj      the j-th PM
Cj      the physical capacity of Hj
xij     the variable indicating whether Vi is placed on Hj
Φj      the capacity overflow ratio of PM Hj
ρ       the threshold of the capacity overflow ratio

Fig. 4. Main notations for quick reference.

4 PROBLEM FORMULATION

We consider a computing cloud with a 1-dimensional resource; for scenarios with multi-dimensional resources, we provide a few remarks in Section 8. There are m physical machines in the computing cloud, and each PM is described by its physical capacity:

Hj = (Cj), ∀1 ≤ j ≤ m.    (2)

We use a binary matrix X = [xij]n×m to represent the results of placing n VMs on m PMs: xij = 1 if Vi is placed on Hj, and 0 otherwise. We assume that the workloads of VMs are mutually independent. Let Wi(t) be the resource requirement of Vi at time t. According to the Markov chain model, we have

Wi(t) = Rbi, if Vi is in the "OFF" state at time t;
Wi(t) = Rpi, if Vi is in the "ON" state at time t.

Then, the aggregated resource requirement of the VMs on PM Hj is Σ_{i=1}^{n} xij Wi(t).

Let COjt indicate whether capacity overflow happens on PM Hj at time t, i.e.,

COjt = 1, if Σ_{i=1}^{n} xij Wi(t) > Cj;
COjt = 0, otherwise.

Intuitively, the results of VM placement should guarantee that the capacity constraint is satisfied on each PM at the beginning of the time period of interest, i.e.,

COj0 = 0, ∀1 ≤ j ≤ m.

We can now define our metric for probabilistic performance guarantee, the capacity overflow ratio (COR), which is the fraction of time that the aggregated workloads of a PM exceed its physical capacity. Denoting the capacity overflow ratio of PM Hj as Φj, we have

Φj = (Σ_{1≤t≤T} COjt) / T,

where T is the length of the time period of interest. It is easy to see that a smaller Φj implies a better performance of Hj. The performance of each PM should be probabilistically guaranteed, so we have

Φj ≤ ρ, ∀1 ≤ j ≤ m.    (3)

Here, ρ is a predetermined value serving as the threshold of COR. Main notations are summarized in Fig. 4 for quick reference. Our problem can be stated as follows.

Problem 1: (Burstiness-Aware Server Consolidation, BASC) Given a set of n VMs and a set of m PMs, find a VM-to-PM mapping X to minimize the number of PMs used while making sure that (1) the initial placement satisfies the capacity constraint, and (2) the capacity overflow ratio of each PM is not larger than the threshold ρ. It can be formally formulated as follows:

min |{j | Σ_{i=1}^{n} xij > 0, 1 ≤ j ≤ m}|
s.t. COj0 = 0, ∀1 ≤ j ≤ m,
     Φj ≤ ρ, ∀1 ≤ j ≤ m.    (4)

Here, |S| denotes the cardinality of set S. In the following theorem, we prove that the BASC problem is NP-complete.

Theorem 1: The BASC problem is NP-complete.

Proof: We prove this theorem by reduction from the Bin Packing (BP) problem [31], which is NP-complete. The decision version of the BP problem is as follows: given n items with sizes s1, s2, ..., sn ∈ (0, 1], can we pack them in no more than k unit-sized bins?

Given an instance of the decision version of the BP problem, we can construct an instance of the decision version of our problem as follows: let Rbi = Rpi = si, ∀1 ≤ i ≤ n; let m = k; let Cj = 1, ∀1 ≤ j ≤ m; and let ρ = 0, i.e., capacity overflow is not allowed.

It is not hard to see that the construction can be finished in polynomial time; thus, we reduce solving the NP-complete BP problem to solving a special case of our problem, implying that our problem is NP-hard. It is easy to verify that the BASC problem is also in NP; the theorem follows immediately.

5 BURSTINESS-AWARE RESOURCE RESERVATION

In this section, we first present the main idea of our solution to the BASC problem, then we design a resource reservation strategy for a single PM, based on which we develop QUEUE, a complete server consolidation algorithm. In the end, we provide a concrete example to help readers better understand our algorithm.

5.1 Overview of QUEUE

We propose reserving a certain amount of physical resources on each PM to accommodate workload spikes. The main idea is to abstract the reserved spaces as blocks (i.e., serving windows in queueing theory). We give an informal illustration of the evolution process of our queueing system in Fig. 5.
Fig. 5. An illustration of the evolution process. (a) The original provisioning strategy for peak workload. (b) Gathering all the Re's together to form a queueing system. (c) Reducing the number of blocks while still satisfying Equ. (3).

Initially, all VMs are provisioned by Rb + Re, and each VM has its own block (denoted as Re in Fig. 5). A VM uses only its Rb part during periods of normal workload; however, when a workload spike occurs, the extra Re part is put into use. We note that the collected Re's altogether form a queueing system: when a workload spike occurs in a VM, the VM enters the queueing system and occupies one of the idle blocks; when the spike disappears, the corresponding VM leaves the queueing system and releases the block. It is worth noting that there is no waiting space in the queueing system; thus, the PM capacity constraint would be violated if a workload spike occurred while all the blocks were occupied, which never happens when the number of blocks equals the number of co-located VMs (as shown in Fig. 5(b)). However, we may find that a certain number of blocks are idle for the majority of the time in Fig. 5(b), so we can reduce the number of blocks while incurring only very few capacity violations (as shown in Fig. 5(c)). Therefore, our goal becomes reserving a minimal number of blocks on each PM while the performance constraint in Equ. (3) is still satisfied.

5.2 Resource Reservation Strategy for a Single PM

Consider a PM that hosts k VMs. Let θ(t) be the number of busy blocks (i.e., VMs in the "ON" state) at time t, and let O(t) and I(t) be the numbers of VMs that switch from ON to OFF (i.e., VMs that leave the queueing system) and from OFF to ON (i.e., VMs that enter the queueing system) at time t, respectively. We use C(x, y) to denote x!/(y!(x−y)!), i.e., x choose y. Since the workloads of VMs are mutually independent, we have

Pr{O(t) = x} = C(θ(t), x) · poff^x · (1 − poff)^(θ(t)−x),

and

Pr{I(t) = x} = C(k − θ(t), x) · pon^x · (1 − pon)^(k−θ(t)−x),

which suggest that both O(t) and I(t) follow the binomial distribution:

O(t) ∼ B(θ(t), poff),  I(t) ∼ B(k − θ(t), pon).    (5)

Without loss of generality, we assume that the switch between two consecutive states of all VMs happens at the end of each time interval. Then we have the recursive relation of θ(t):

θ(t + 1) = θ(t) − O(t) + I(t).    (6)

Combining Equs. (5) and (6), we see that the next state θ(t + 1) depends only on the current state θ(t) and not on the past sequence of states θ(t − 1), θ(t − 2), ..., θ(0). Therefore, the stochastic process θ(0), θ(1), ... with discrete time ({0, 1, 2, ...}) and discrete space ({0, 1, 2, ..., k}) is a Markov chain. The stochastic process is said to be in state i (0 ≤ i ≤ k) if the number of busy blocks is i. Fig. 6 shows the transition graph of the chain. Let pij be the transition probability from state i to state j; that is to say, if θ(t) = i, then the probability that θ(t + 1) = j is pij. For the sake of convenience, when y > x or y < 0, we let C(x, y) be 0. Then, pij can be derived as follows:

pij = Pr{θ(t + 1) = j | θ(t) = i}
    = Σ_{h=0}^{i} Pr{O(t) = h} · Pr{I(t) = j − i + h}
    = Σ_{h=0}^{i} C(i, h) poff^h (1 − poff)^(i−h) · C(k − i, j − i + h) pon^(j−i+h) (1 − pon)^(k−j−h).    (7)
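The transition probabilities follow mechanically from the binomial leave/join counts in Equs. (5) and (6): from state i, h of the i busy blocks are released and j − i + h new spikes arrive. The Python sketch below builds the (k+1)×(k+1) transition matrix; the function names are ours and the code is an illustration, not the paper's implementation.

```python
from math import comb

def transition_matrix(k, p_on, p_off):
    """Transition matrix of the busy-block chain: from state i,
    O ~ B(i, p_off) VMs leave and I ~ B(k - i, p_on) VMs enter, so
    p_ij sums over the number h of leavers paired with j - i + h joiners."""
    def binom_pmf(n, x, p):
        # C(n, x) is taken as 0 when x < 0 or x > n, as in the text.
        if x < 0 or x > n:
            return 0.0
        return comb(n, x) * p**x * (1 - p)**(n - x)

    P = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        for j in range(k + 1):
            P[i][j] = sum(
                binom_pmf(i, h, p_off) * binom_pmf(k - i, j - i + h, p_on)
                for h in range(i + 1)
            )
    return P

P = transition_matrix(k=5, p_on=0.01, p_off=0.09)
row_sums = [sum(row) for row in P]   # each row is a probability distribution
```

As a sanity check, every row sums to 1, and p00 = (1 − pon)^k, since staying in state 0 means none of the k idle VMs starts a spike.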
Fig. 6. The transition graph of the stochastic process {θ(0), θ(1), ..., θ(t), ...}. The stochastic process is said to be in state i (0 ≤ i ≤ k) if the number of busy blocks is i. pij is the transition probability from state i to state j.

Algorithm 1 Calculating Minimum Blocks (CalMinBlk)
Input: k, the number of co-located VMs on a PM;
       pon, the switch probability from "OFF" to "ON";
       poff, the switch probability from "ON" to "OFF";
       ρ, the capacity overflow ratio threshold
Output: K, the minimum number of blocks that should be reserved on a PM
1: Calculate the transition matrix P using Equ. (7)
2: Prepare the coefficient matrix of the homogeneous system of linear equations described in Equ. (9)
3: Solve the homogeneous system via Gaussian elimination and get the stationary distribution π
4: Calculate K from π using Equ. (8)
5: return K

Let π(t) = (π0(t), π1(t), ..., πk(t)) be the distribution of the chain at time t, i.e., πh(t) = Pr{θ(t) = h}, ∀0 ≤ h ≤ k. For our chain, which is finite, π(t) is a vector of k + 1 nonnegative entries such that Σ_{h=0}^{k} πh(t) = 1. In linear algebra, vectors of this type are called stochastic vectors. Then, it holds that

π(t+1) = π(t) P.

Suppose π is a distribution over the state space {0, 1, 2, ..., k} such that, if the chain starts with an initial distribution π(0) that is equal to π, then after a transition, the distribution of the chain is still π(1) = π. Then the chain will stay in the distribution π forever:

π −P→ π −P→ π −P→ ······

Such a π is called a stationary distribution. For our chain, we have the following theorem.

Theorem 2: For the Markov chain defined in Fig. 6 and Equ. (7), given an arbitrary initial distribution π(0), π(t) will converge to the same distribution π, which satisfies π = πP.

We then choose

K = min{ K′ | Σ_{h=0}^{K′} πh ≥ 1 − ρ },    (8)

which suggests that K is the minimum number that guarantees Σ_{h=0}^{K} πh ≥ 1 − ρ. This is because, when the number of reserved blocks on PM Hj is reduced from k to K, if the queueing system is in a state h that is larger than K, then capacity overflow occurs, i.e., COjt = 1, and vice versa. Thus we have

Φj = (Σ_{1≤t≤T} COjt) / T = Σ_{h=K+1}^{k} πh = 1 − Σ_{h=0}^{K} πh ≤ ρ.

We now show how to calculate the stationary distribution π. According to its definition, we have π = πP, which is equivalent to the following homogeneous system of linear equations that can be solved by Gaussian elimination:

Σ_{h=0}^{k} πh ph0 − π0 = 0,
Σ_{h=0}^{k} πh ph1 − π1 = 0,
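Algorithm 1 can be sketched end to end in a few lines of Python. For brevity, this sketch obtains the stationary distribution by fixed-point iteration π ← πP (which Theorem 2 guarantees converges) instead of the Gaussian elimination used in the paper; the function names are ours.

```python
from math import comb

def cal_min_blk(k, p_on, p_off, rho, iters=5000):
    """Sketch of CalMinBlk: build the transition matrix of the busy-block
    chain (Equ. (7)), find its stationary distribution pi, and return the
    smallest K with pi_0 + ... + pi_K >= 1 - rho (Equ. (8))."""
    def binom_pmf(n, x, p):
        return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

    # Transition matrix P: h of i busy blocks empty, j - i + h fill up.
    P = [[sum(binom_pmf(i, h, p_off) * binom_pmf(k - i, j - i + h, p_on)
              for h in range(i + 1))
          for j in range(k + 1)]
         for i in range(k + 1)]

    # Fixed-point iteration pi <- pi P from the uniform distribution.
    pi = [1.0 / (k + 1)] * (k + 1)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k + 1))
              for j in range(k + 1)]

    # Smallest K whose cumulative stationary mass reaches 1 - rho.
    acc = 0.0
    for K in range(k + 1):
        acc += pi[K]
        if acc >= 1 - rho:
            return K
    return k

K = cal_min_blk(k=10, p_on=0.01, p_off=0.09, rho=0.01)
```

With k = 10, pon = 0.01, poff = 0.09, and ρ = 0.01, the stationary distribution is Binomial(10, pon/(pon+poff)) = Binomial(10, 0.1) because the k per-VM chains are independent, and the cumulative mass first reaches 0.99 at K = 4: only 4 of the 10 peak-sized blocks need to be reserved.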
Algorithm 3 (excerpt):
4: for g = 2 to G do
8: r ← Σ_{i=1}^{g} MinN[x′i] × max{ Rej | Σ_{h=1}^{i−1} x′h < j ≤ Σ_{h=1}^{i} x′h }

would be time-consuming. So we use G to restrict the maximum number of groups.

We use S to record the best partition so far (line 1), and use rmin to record the amount of resources needed by that partition S (line 2). For each integer g (2 ≤ g ≤ G), we first generate all possible g-partitions of k using Equ. (11); then, for each permutation x′1, ..., x′g of a g-partition x1, ..., xg, we compute the amount of resources needed by this ordered partition (line 8) and compare it with rmin: if this ordered partition uses fewer resources than S, we update S and rmin (lines 9-11). Finally, the optimal ordered partition S is returned.

It takes O(p(k)) time to generate all possible partitions of an integer k [34]. Since pg(k) ∼ k^(g−1)/(g!(g−1)!), generating all possible g-partitions (1 ≤ g ≤ G) requires O(Σ_{g=1}^{G} k^(g−1)/(g!(g−1)!)) time. Taking G = 3 for example, since p1(k) = O(1), p2(k) = O(k), and p3(k) = O(k²), generating all possible g-partitions (1 ≤ g ≤ 3) requires O(k²) time. For each possible permutation of a partition, evaluating the amount of resources needed requires O(k) time; thus, the total time complexity of Alg. 3 is O(k · Σ_{g=1}^{G} k^(g−1)/(g−1)!). In practice, we can choose a proper G to achieve a balance between complexity and optimality.

We use PM H2 in Fig. 8 as an example, where the sizes of the four spikes are 13, 10, 10, and 9. Without loss of generality, we let Re1 = 13, Re2 = 10, Re3 = 10, and Re4 = 9. We consider all ordered 1-partitions, 2-partitions, and 3-partitions of k = 4, i.e., G = 3. Fig. 9 shows the results, where the optimal ordered partition is {2, 2}.

7 PERFORMANCE EVALUATION

In this section, we conduct extensive simulations and testbed experiments to evaluate the proposed algorithms under different settings and reveal insights into the performance of the proposed design.

7.1 Simulation Setup

Two commonly-used packing strategies are considered here, both of which use the First Fit Decreasing heuristic for VM placement. The first strategy is to provision VMs for peak workload (FFD by Rp), while the second is to provision VMs for normal workload (FFD by Rb). Provisioning for peak workload is usually applied for the initial VM placement [1], where cloud tenants choose the peak workload as the fixed capacity of the VM to guarantee application performance. On the other hand, provisioning for normal workload is usually applied in the consolidation process, since at runtime the majority of VMs are in the OFF state, i.e., most of the VMs only have normal workloads.

We consider the situations both without and with live migration, where different metrics are used to evaluate the runtime performance. For experiments without live migration, where only local resizing is allowed to dynamically provision resources, we use the capacity overflow ratio (COR) defined in Section 4 as the performance metric. Next, in our testbed experiments, we add live migration to our system to simulate a more realistic computing cluster, in which the number of migrations reflects the quality of performance, and the number of active PMs reflects the level of energy consumption.
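The enumeration over ordered partitions can be sketched as follows. The MinN table below is hypothetical (in the paper, MinN[x] would come from CalMinBlk for a group of x VMs); everything else follows the line-8 cost expression: group i of size x′i gets MinN[x′i] blocks, each sized to the largest spike in the group.

```python
from itertools import permutations

def partitions(n, max_parts):
    """All partitions of n into at most max_parts positive parts,
    each emitted in non-increasing order."""
    def gen(n, largest, parts):
        if n == 0:
            yield []
            return
        if parts == 0:
            return
        for first in range(min(n, largest), 0, -1):
            for rest in gen(n - first, first, parts - 1):
                yield [first] + rest
    yield from gen(n, n, max_parts)

def best_ordered_partition(spikes, min_blk, g_max):
    """Enumerate every ordered partition of the k spikes (sorted
    non-increasingly) into at most g_max consecutive groups and return
    (best ordered partition, resources needed), where group i of size x
    costs min_blk[x] * (largest spike in the group)."""
    spikes = sorted(spikes, reverse=True)
    best, r_min = None, float("inf")
    for part in partitions(len(spikes), g_max):
        for perm in set(permutations(part)):
            r, start = 0, 0
            for x in perm:
                r += min_blk[x] * max(spikes[start:start + x])
                start += x
            if r < r_min:
                best, r_min = list(perm), r
    return best, r_min

# Hypothetical MinN table: blocks reserved for a group of x spiky VMs.
min_blk = {1: 1, 2: 1, 3: 2, 4: 2}
best, r = best_ordered_partition([13, 10, 10, 9], min_blk, g_max=3)
```

With this made-up table, the search returns the ordered partition {2, 2} at cost 1·13 + 1·10 = 23, which happens to coincide with the optimum reported for the H2 example; the paper's actual MinN values are not given in this excerpt, so the numbers should be read as illustrative only.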
Fig. 11. Packing results. The common settings are: ρ = 0.01, d = 16, pon = 0.01, poff = 0.09, and Cj ∈ [80, 100]. (a) Rb = Re, Rb and Re ∈ [2, 20]. (b) Rb > Re, Rb ∈ [12, 20], Re ∈ [2, 10]. (c) Rb < Re, Rb ∈ [2, 10], Re ∈ [12, 20].

Fig. 12. Comparison results of QUEUE and RB with respect to the capacity overflow ratio (COR).
7.2 Simulation Results

We first evaluate the computation cost of our algorithm briefly, and then quantify the reduction of the number of running PMs, as well as compare the runtime performance with two commonly-used packing strategies.

To investigate the performance of our algorithm in various settings, three kinds of workload patterns are used for each experiment: Rb = Re, Rb > Re, and Rb < Re, which denote workloads with normal spike size, small spike size, and large spike size, respectively. It will be observed later that the workload pattern of VMs does affect the packing result, the number of active PMs, and the number of migrations.

According to the results in Section 5.3, the time complexity of QUEUE is O(d⁴ + n log n + mn). In Fig. 10, we present the experimental computation cost of QUEUE with reasonable d and n values. We see that our algorithm incurs very little overhead with moderate n and d values. The cost variation with respect to n is not even distinguishable at the millisecond level.

To evaluate the consolidation performance of QUEUE in different settings, we then choose Rb and Re uniformly and randomly from a certain range for each VM. We repeat the experiments multiple times for convergence. The capacity overflow ratio (COR) is used here as the metric of runtime performance. Since FFD by Rp never incurs capacity violations, it is not included in the performance assessment.

Figs. 11 and 12 show the packing results and COR results, respectively. The common settings for the three subfigures are as follows: ρ = 0.01, d = 16, pon = 0.01, poff = 0.09, and Cj ∈ [80, 100]. As we discussed in Section 3, pon indicates the frequency of spike occurrence. For a bursty workload, the spikes usually occur with low frequency and short duration; therefore, we choose pon = 0.01 and poff = 0.09. Workload patterns are distinguished via setting different ranges for Rb and Re. For Figs. 11(a) and 12(a), Rb = Re, Rb and Re ∈ [2, 20]; for Figs. 11(b) and 12(b), Rb > Re, Rb ∈ [12, 20], Re ∈ [2, 10]; and for Figs. 11(c) and 12(c), Rb < Re, Rb ∈ [2, 10], Re ∈ [12, 20].

We see that QUEUE significantly reduces the number of PMs used, as compared with FFD by Rp (denoted as RP). When Rb < Re, the number of PMs used by QUEUE is reduced by 45% compared with RP, while the ratios for Rb = Re and Rb > Re are 30% and 18%, respectively. FFD by Rb (denoted as RB) uses even fewer PMs, but its runtime performance is disastrous according to Fig. 12: the COR of RB is unacceptably high. With larger spike sizes (Rb < Re), the packing result of QUEUE is better, because more PMs are saved compared with RP, and fewer additional PMs (for live migrations) are used compared with RB (see Fig. 11(c)). Simultaneously, with larger spike sizes, the average COR of QUEUE is slightly higher, but it is still bounded by ρ (see Fig. 12(c)). The case of smaller spike sizes shows the opposite results.

We mention that there are very few PMs with CORs slightly higher than ρ in each experiment. This is because a Markov chain needs some time to enter its stationary distribution. Though we did not theoretically evaluate whether the chain constructed in Section 5 is rapid-mixing, in our experiments we find that the time period before the chain enters its stationary distribution is very short.
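The two baseline strategies of Section 7.1 can be sketched with a small First Fit Decreasing packer. The VM sizes and capacity below are illustrative numbers of ours, not the paper's experimental settings.

```python
def ffd(sizes, capacity):
    """First Fit Decreasing: place items in non-increasing order into the
    first bin (PM) with enough residual capacity; returns the bin loads."""
    bins = []
    for s in sorted(sizes, reverse=True):
        for i, load in enumerate(bins):
            if load + s <= capacity:
                bins[i] = load + s
                break
        else:
            bins.append(s)   # no existing PM fits: open a new one
    return bins

# Illustrative setting: 8 identical VMs with Rb = 10, Re = 10, C = 40.
r_b, r_e, cap = 10, 10, 40
peak = [r_b + r_e] * 8       # FFD by Rp: provision for peak workload
base = [r_b] * 8             # FFD by Rb: provision for normal workload
pms_rp = len(ffd(peak, cap))  # 2 VMs per PM
pms_rb = len(ffd(base, cap))  # 4 VMs per PM
```

In this toy setting FFD by Rp opens 4 PMs while FFD by Rb opens only 2, illustrating the gap that QUEUE targets: RB-level consolidation density, but with enough reserved blocks that spikes rarely overflow capacity.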
Fig. 13. The architecture of our testbed: Schedule Module, Consolidation Module, Local Resizing & Live Migration, Performance Monitor, and Logger.
Fig. 16. Comparison of QUEUE, RB, and RB-EX: (a) number of PMs used; (b) number of migrations. Bars show the average values, and the extended whiskers show the maximum and minimum values. (ρ = 0.01, pon = 0.01, poff = 0.09, σ = 30s, δ = 0.3 for RB-EX; the length of the evaluation period is 100σ, and VM configurations are set based on Fig. 15.)

Fig. 17. Comparison of time-order patterns of migration events during one of the experiments for Rb = Re. Similar results are observed for Rb > Re and Rb < Re.

RB-EX extends RB by reserving a fixed fraction δ of all resources on each PM, which is an applicable consolidation strategy when, in reality, nothing about the workload pattern (except the existence of burstiness) is known. In our experiments, we choose δ = 30%. The result is averaged over 10 executions for convergence.

Fig. 16 shows the comparison results. At the end of the evaluation period, on average, RB uses fewer PMs than QUEUE, but it incurs many more migrations than QUEUE; the performance of RB-EX is between those of RB and QUEUE. These performance gaps attenuate for Rb > Re and enlarge for Rb < Re.

We also investigate the time-order patterns of migration events. As shown in Fig. 17, in general, QUEUE incurs very few migrations throughout the evaluation period. At the beginning of the evaluation period, RB and RB-EX incur excessive migrations due to the over-tight initial VM placement, and the number of PMs used increases rapidly during this period. RB incurs an unacceptably large number of migrations throughout the evaluation period, while RB-EX either incurs a considerable number of migrations constantly and uses only slightly more PMs than RB, or incurs very few migrations, as QUEUE does, and uses more PMs than QUEUE (sometimes the same number of PMs as QUEUE).

To explain this phenomenon, we introduce the term idle deception to refer to the situation where a PM is falsely reckoned idle. In a highly-consolidated cloud, idle deception is very likely to happen, i.e., a busy PM is likely to be selected as a migration target. As a result, migrations are triggered again and again, while the number of PMs used keeps at a low level. We call this phenomenon cycle migration. The results of RB-EX are more subtle. As we have observed, two kinds of results are possible for RB-EX, depending on the experiment settings: (1) RB-EX uses slightly more PMs than RB, while cycle migration still exists, as in RB; and (2) in RB-EX, cycle migration disappears, but more PMs are used than in QUEUE. From this point of view, RB-EX performs less efficiently than QUEUE.

7.5 Summary

Key observations are summarized as follows.

(1) QUEUE reduces the number of active PMs by up to 45% with large spike size (Rb < Re) and up to 30% with normal spike size (Rb = Re) in comparison with provisioning for peak workload.
(2) QUEUE incurs very few migrations, while both RB and RB-EX incur excessive migrations at the beginning of each experiment due to the over-tight initial packing, and the number of PMs used in RB or RB-EX increases rapidly during this period.
(3) Due to falsely picking migration targets, i.e., idle deception, RB constantly incurs an unacceptably large number of migrations throughout the experiment, and the overall performance is seriously degraded.
(4) RB-EX performs less efficiently than QUEUE: either cycle migration exists, or cycle migration disappears but more PMs are used than in QUEUE.

8 DISCUSSIONS

Overhead of learning parameters. One limitation of the 2-state Markov chain model is that learning the parameters requires computing clouds to provide tentative deployments, which may incur additional overhead to clouds. However, this overhead can be drastically reduced if tenants have to reserve resources for the same type of VMs repeatedly and lastingly. For example, about 40% of applications are recurring in Bing's production data centers [36]. For the same type of VMs, the cloud provider
[11] A. Verma, G. Kumar, and R. Koller, "The cost of reconfiguration in a cloud," in Proc. of ACM Middleware 2010, pp. 11–16.
[12] W. Voorsluys, J. Broberg, S. Venugopal, and R. Buyya, "Cost of virtual machine live migration in clouds: A performance evaluation," in Proc. of IEEE CloudCom 2009, pp. 254–265.
[13] Z. Luo and Z. Qian, "Burstiness-aware server consolidation via queuing theory approach in a computing cloud," in Proc. of IEEE IPDPS 2013, pp. 332–341.
[14] S. M. Ross, Introduction to Probability Models, Tenth Edition. Orlando, FL, USA: Academic Press, Inc., 2011.
[15] X. Meng, V. Pappas, and L. Zhang, "Improving the scalability of data center networks with traffic-aware virtual machine placement," in Proc. of IEEE INFOCOM 2010, pp. 1–9.
[16] D. Jayasinghe, C. Pu, T. Eilam, M. Steinder, I. Whalley, and E. Snible, "Improving performance and availability of services hosted on IaaS clouds with structural constraint-aware virtual machine placement," in Proc. of IEEE SCC 2011, pp. 72–79.
[17] Q. Zhang, M. F. Zhani, R. Boutaba, and J. L. Hellerstein, "HARMONY: Dynamic heterogeneity-aware resource provisioning in the cloud," in Proc. of IEEE ICDCS 2013, pp. 510–519.
[18] S. Zhang, Z. Qian, J. Wu, and S. Lu, "SEA: Stable resource allocation in geographically distributed clouds," in Proc. of IEEE ICC 2014, pp. 2932–2937.
[19] M. Alicherry and T. Lakshman, "Network aware resource allocation in distributed clouds," in Proc. of IEEE INFOCOM 2012, pp. 963–971.
[20] C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang, "SecondNet: A data center network virtualization architecture with bandwidth guarantees," in Proc. of ACM CoNEXT 2010, pp. 15:1–15:12.
[21] U. Sharma, P. Shenoy, S. Sahu, and A. Shaikh, "A cost-aware elasticity provisioning system for the cloud," in Proc. of IEEE ICDCS 2011, pp. 559–570.
[22] J. Kleinberg, Y. Rabani, and É. Tardos, "Allocating bandwidth for bursty connections," SIAM Journal on Computing, vol. 30, no. 1, pp. 191–217, 2000.
[23] A. Goel and P. Indyk, "Stochastic load balancing and related problems," in Proc. of IEEE FOCS 1999, pp. 579–586.
[24] M. Chen, H. Zhang, Y.-Y. Su, X. Wang, G. Jiang, and K. Yoshihira, "Effective VM sizing in virtualized data centers," in Proc. of IFIP/IEEE IM 2011, pp. 594–601.
[25] D. Breitgand and A. Epstein, "Improving consolidation of virtual machines with risk-aware bandwidth oversubscription in compute clouds," in Proc. of IEEE INFOCOM 2012, pp. 2861–2865.
[26] Z. Gong, X. Gu, and J. Wilkes, "PRESS: Predictive elastic resource scaling for cloud systems," in Proc. of CNSM 2010, pp. 9–16.
[27] R. Calheiros, R. Ranjan, and R. Buyya, "Virtual machine provisioning based on analytical performance and QoS in cloud computing environments," in Proc. of IEEE ICPP 2011, pp. 295–304.
[28] G. Casale, N. Mi, and E. Smirni, "Model-driven system capacity planning under workload burstiness," IEEE Transactions on Computers, vol. 59, no. 1, pp. 66–80, 2010.
[29] G. Casale, N. Mi, L. Cherkasova, and E. Smirni, "Dealing with burstiness in multi-tier applications: Models and their parameterization," IEEE Transactions on Software Engineering, vol. 38, no. 5, pp. 1040–1053, 2012.
[30] Z. Huang and D. Tsang, "SLA guaranteed virtual machine consolidation for computing clouds," in Proc. of IEEE ICC 2012, pp. 1314–1319.
[31] V. V. Vazirani, Approximation Algorithms. Springer, 2003.
[32] G. H. Hardy, E. M. Wright, D. R. Heath-Brown, and J. H. Silverman, An Introduction to the Theory of Numbers. Oxford: Clarendon Press, 1979.
[33] G. E. Andrews, The Theory of Partitions. Cambridge University Press, 1976.
[34] I. Stojmenović and A. Zoghbi, "Fast algorithms for generating integer partitions," International Journal of Computer Mathematics, vol. 70, no. 2, pp. 319–332, 1998.
[35] "Xen cloud platform," http://www.xenproject.org/.
[36] S. Agarwal, S. Kandula, N. Bruno, M.-C. Wu, I. Stoica, and J. Zhou, "Re-optimizing data-parallel computing," in Proc. of USENIX NSDI 2012, pp. 281–294.
[37] R. Xu and D. Wunsch II, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645–678, May 2005.

Sheng Zhang received his BS and PhD degrees from Nanjing University in 2008 and 2014, respectively. Currently, he is an assistant professor in the Department of Computer Science and Technology, Nanjing University. He is also a member of the State Key Lab. for Novel Software Technology. His research interests include cloud computing and mobile networks. To date, he has published more than 15 papers, including those that appeared in IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, ACM MobiHoc, and IEEE INFOCOM. He received the Best Paper Runner-Up Award from IEEE MASS 2012.

Zhuzhong Qian is an associate professor at the Department of Computer Science and Technology, Nanjing University, P. R. China. He received his PhD degree in computer science in 2007. Currently, his research interests include cloud computing, distributed systems, and pervasive computing. He is the chief member of several national research projects on cloud computing and pervasive computing. He has published more than 30 research papers in related fields.

Zhaoyi Luo received his BS degree from Nanjing University in 2013. He is a Master's student (MMath) in the David Cheriton School of Computer Science, University of Waterloo, Canada. His research interests include computer-aided tools and techniques for analyzing software requirements and specifications.

Jie Wu (F'09) is the chair and a Laura H. Carnell Professor in the Department of Computer and Information Sciences at Temple University. He is also an Intellectual Ventures endowed visiting chair professor at the National Laboratory for Information Science and Technology, Tsinghua University. Prior to joining Temple University, he was a program director at the National Science Foundation and a Distinguished Professor at Florida Atlantic University. His current research interests include mobile computing and wireless networks, routing protocols, cloud and green computing, network trust and security, and social network applications. Dr. Wu regularly publishes in scholarly journals, conference proceedings, and books. He serves on several editorial boards, including IEEE Transactions on Service Computing and the Journal of Parallel and Distributed Computing. Dr. Wu was general co-chair/chair for IEEE MASS 2006, IEEE IPDPS 2008, IEEE ICDCS 2013, and ACM MobiHoc 2014, as well as program co-chair for IEEE INFOCOM 2011 and CCF CNCC 2013. He was an IEEE Computer Society Distinguished Visitor, an ACM Distinguished Speaker, and chair of the IEEE Technical Committee on Distributed Processing (TCDP). Dr. Wu is a CCF Distinguished Speaker and a Fellow of the IEEE. He is the recipient of the 2011 China Computer Federation (CCF) Overseas Outstanding Achievement Award.

Sanglu Lu received her BS, MS, and PhD degrees from Nanjing University in 1992, 1995, and 1997, respectively, all in computer science. She is currently a professor in the Department of Computer Science and Technology and the State Key Laboratory for Novel Software Technology. Her research interests include distributed computing, wireless networks, and pervasive computing. She has published over 80 papers in refereed journals and conferences in the above areas. She is a member of the IEEE.