Predicting The Performance of Virtual Machine Migration
Sherif Akoush, Ripduman Sohan, Andrew Rice, Andrew W. Moore and Andy Hopper
University of Cambridge Computer Laboratory
[email protected]
Abstract: With the ability to move virtual machines between physical hosts, live migration is a core feature of virtualisation. However, for migration to be a useful, deployable feature on a large (datacentre) scale, we need to predict migration times with accuracy. In this paper, we characterise the parameters affecting live migration, with particular emphasis on the Xen virtualisation platform. We discuss the relationships between the important parameters that affect migration and highlight how migration performance can vary considerably depending on workload. We further provide two simulation models that are able to predict migration times to within 90% accuracy for both synthetic and real-world benchmarks.
I. INTRODUCTION
Virtualisation has become a core facility in modern computing installations. It provides opportunities for improved
efficiency by increasing hardware utilisation and application
isolation [1] as well as simplifying resource allocation and
management. One key feature that makes virtualisation attractive is that of live migration.
Live migration platforms (e.g. XenMotion [2] and VMotion [3]) allow administrators to move running virtual machines (VMs) seamlessly between physical hosts. This is of
particular benefit for service providers hosting high availability
applications. The level of service to which a service provider
commits when hosting and running an application is described
by a Service Level Agreement (SLA) and is typically couched
in terms of application availability. Policies such as 99.999%
availability (common in the telecommunication industry) permit only 5 minutes of downtime a year. Routine activities
such as restarting a machine for hardware maintenance are extremely difficult under such a regime and so service providers
invest considerable resources in high-availability fault-tolerant
systems. Live migration mitigates this problem by allowing
administrators to move VMs with little interruption. This permits regular maintenance of the physical hardware; supports
dynamic reconfiguration enabling cluster compaction in times
of low demand; and shifts computing load to manage cooling
hotspots within a datacentre [4].
However, short interruptions of service are still unavoidable
during live migration due to the overheads of moving the
running VM. Previous studies have demonstrated that this
can vary considerably between applications due to different
memory usage patterns, ranging from 60 milliseconds when
migrating a Quake game server [5] to 3 seconds in the
case of High-Performance Computing benchmarks [6]. This
variation means that predicting the duration of any interruption is difficult.
Total migration time is the sum of all migration stages, while total downtime covers only the stages during which the VM is unavailable:

TotalMigrationTime = Pre-migration Overhead + Σ_{i=0}^{n−1} Pre-copy_i + Stop-and-copy + Commitment + Activation    (1)

TotalDowntime = Stop-and-copy + Commitment + Activation    (2)

where Commitment + Activation together constitute the post-migration overhead.
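As a concrete reading of these two definitions, the Python sketch below simply sums hypothetical stage durations; the numbers are illustrative placeholders, not measurements from our testbed.

```python
# Minimal sketch of Equations 1 and 2: total migration time sums every stage,
# whereas total downtime covers only the stages in which the VM is paused.
# All durations (seconds) are illustrative placeholders, not measurements.

pre_migration_overhead = 0.5          # setup and resource reservation
pre_copy_rounds = [8.2, 1.6, 0.4]     # per-iteration pre-copy transfer times
stop_and_copy = 0.05                  # final copy of remaining dirty pages
commitment = 0.01                     # destination commits to the new VM image
activation = 0.30                     # device re-attachment and VM resume

total_migration_time = (pre_migration_overhead + sum(pre_copy_rounds)
                        + stop_and_copy + commitment + activation)    # Eq. (1)
total_downtime = stop_and_copy + commitment + activation              # Eq. (2)

print(f"total migration time: {total_migration_time:.2f} s")
print(f"total downtime:       {total_downtime * 1000:.0f} ms")
```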
C. Migration Bounds
Given the stop conditions, it is possible to work out the
upper and lower migration performance bounds for a specific
migration algorithm. We will use a real-world case to characterise these boundaries.
While there exists a range of live migration platforms, for the remainder of this paper we base our analysis on the Xen migration platform. Xen is already used as the basis for large-scale cloud deployments [11] and thus this work would immediately benefit those deployments. Moreover, Xen is open source, allowing us to quickly and efficiently determine the migration sub-system design and implementation. Note, however, that our measurement techniques, methodology, and prediction model design are applicable to any virtualisation platform that employs the pre-copy migration mechanism.
The stop conditions used in the Xen migration algorithm are defined as follows:
1) Less than 50 pages were dirtied during the last pre-copy
iteration.
2) 29 pre-copy iterations have been carried out.
3) More than 3 times the total amount of RAM allocated
to the VM has been copied to the destination host.
The first condition guarantees a short downtime, as only a few pages remain to be transferred. The other two conditions simply force migration into the stop-and-copy stage, which may still have many modified pages to copy across, resulting in a large downtime.
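The control flow implied by these conditions can be summarised in the following simplified Python sketch; send_pages() and get_dirty_pages() are hypothetical placeholders for the actual page transfer and dirty-page tracking mechanisms, and the sketch omits details such as Xen's rate limiting and page batching.

```python
# Simplified sketch of the iterative pre-copy loop with the three Xen stop
# conditions listed above. send_pages() and get_dirty_pages() stand in for
# the real transfer and dirty-page tracking mechanisms.

MAX_ITERATIONS = 29      # stop condition 2
MIN_DIRTY_PAGES = 50     # stop condition 1
MAX_COPY_FACTOR = 3      # stop condition 3: more than 3x VM RAM already sent

def iterative_pre_copy(vm_ram_pages, send_pages, get_dirty_pages):
    """Run pre-copy rounds; return the dirty pages left for stop-and-copy."""
    pages_sent = 0
    to_send = set(range(vm_ram_pages))          # first round sends all of RAM
    for _ in range(MAX_ITERATIONS):
        send_pages(to_send)
        pages_sent += len(to_send)
        dirty = get_dirty_pages()               # pages modified during the round
        if len(dirty) < MIN_DIRTY_PAGES:        # condition 1: nearly converged
            break
        if pages_sent > MAX_COPY_FACTOR * vm_ram_pages:   # condition 3
            break
        to_send = dirty                         # resend pages dirtied this round
    return get_dirty_pages()                    # transferred in stop-and-copy
```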
1) Bounding Total Migration Time (Equation 3): Consider
the case of an idle VM running no applications. In this
case the iterative pre-copy stage will terminate after the first
iteration as there is no memory difference. Consequently, the
migration sub-system need only send the entire RAM in
the first round. The total migration lower bound is thus the
time required to send the entire RAM coupled with pre- and
post-migration overheads.
On the other hand, consider the case where all memory pages are being modified as fast as the link speed. In
this scenario, the iterative pre-copy stage will be forced to
terminate after copying more than 3 times the total amount
of RAM allocated to the VM. Migration then re-sends the
entire modified RAM during the stop-and-copy stage. The total
migration upper bound is thus defined as the time required to
send 5 times the VM size less 1 page, plus pre- and post-migration overheads.
Overheads + VMSize / LinkSpeed ≤ TotalMigrationTime ≤ Overheads + (5 · VMSize − 1 page) / LinkSpeed    (3)

The 5 VMSize − 1 page term decomposes as 1 VMSize + (3 VMSize − 1 page) + 1 VMSize: the first full copy of RAM, the further pre-copy traffic sent before the third stop condition forces termination, and the stop-and-copy resend of the entire RAM.
2) Bounding Total Downtime (Equation 4): Similarly, the
total downtime lower bound is defined as the time required
for the post-migration overhead, assuming that the final stop-and-copy stage does not transfer any pages. This occurs either
when the VM is idle or the link speed is fast enough to copy
all dirtied pages in the pre-copy stage. On the other hand, the
total downtime upper bound is defined as the time required to
copy the entire RAM in the stop-and-copy stage coupled with
the post-migration overhead.
Post-migration Overhead ≤ TotalDowntime ≤ Post-migration Overhead + VMSize / LinkSpeed    (4)
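Both pairs of bounds can be computed directly from Equations 3 and 4. The Python sketch below does so for arbitrary VM sizes and link speeds; the overhead constants are assumed placeholder values rather than the measured pre- and post-migration overheads behind Table I, so its output only approximates the figures quoted in the following paragraph.

```python
# Sketch of the bounds in Equations 3 and 4. The overhead constants are
# assumed placeholders, not the measured pre-/post-migration overheads used
# for Table I, so results only approximate the figures in the text.

PAGE_BITS = 4096 * 8                      # one 4 KB page, in bits

def migration_bounds(vm_size_mb, link_mbps,
                     overheads_s=5.0,          # assumed pre- + post-migration overhead
                     post_overhead_s=0.314):   # assumed post-migration overhead
    vm_bits = vm_size_mb * 2**20 * 8
    link_bps = link_mbps * 10**6

    mt_lb = overheads_s + vm_bits / link_bps                        # Eq. 3, lower
    mt_ub = overheads_s + (5 * vm_bits - PAGE_BITS) / link_bps      # Eq. 3, upper
    dt_lb = post_overhead_s                                         # Eq. 4, lower
    dt_ub = post_overhead_s + vm_bits / link_bps                    # Eq. 4, upper
    return mt_lb, mt_ub, dt_lb, dt_ub

# Example: a 1,024 MB VM migrated over a 1 Gbps link.
mt_lb, mt_ub, dt_lb, dt_ub = migration_bounds(1024, 1000)
print(f"total migration time: {mt_lb:.1f} s .. {mt_ub:.1f} s")
print(f"total downtime:       {dt_lb * 1000:.0f} ms .. {dt_ub * 1000:.0f} ms")
```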
3) Difference in Bounds: Modelling bounds is useful as it
enables us to reason about migration times provided that we
know the link speed and VM memory size. These bounds are
the limits within which the total migration time and total downtime are guaranteed to lie. Given a 1,024 MB VM and a 1 Gbps migration link, for example, the total migration time has a lower bound of 13 seconds and an upper bound of 50 seconds. Similarly, the downtime has a lower bound of 0.314 seconds and an upper bound of 9.497 seconds.
Table I illustrates the migration bounds for some common
link speeds. While the downtime lower limit is fixed (as it is
dependent purely on post-migration overhead) all other bounds
vary with both link speed and VM memory size. As the table indicates, the bounds vary significantly. For larger VM memory sizes (common in current installations [12]), the differences are even greater.
Thus, using bounds is at best an imprecise exercise and does
not allow for accurate prediction of migration times. Building
better predictions requires understanding the relationship between factors that impact migration performance. We address
this in the next section.
TABLE I
MIGRATION BOUNDS. MT: TOTAL MIGRATION TIME (SECONDS). DT: TOTAL DOWNTIME (MILLISECONDS). LB: LOWER BOUND. UB: UPPER BOUND. VM SIZE = 1,024 MB.

Speed     | MT LB  | MT UB   | DT LB  | DT UB
100 Mbps  | 95.3 s | 459.1 s | 314 ms | 91,494.5 ms
1 Gbps    | 13.3 s | 49.9 s  | 314 ms | 9,497.8 ms
10 Gbps   | 5.3 s  | 10.1 s  | 314 ms | 1,518.7 ms
Fig. 1. Effect of Link Bandwidth on Migration Performance. VM Size= 1,024 MB. Confidence intervals are omitted because of insufficient vertical resolution.
Fig. 2. Infrastructure.
as the pool master, while the others (Host A and B) are used
for live migration runs.
The storage area network is configured using an IBM eServer xSeries 336 with an Intel(R) Xeon(TM) X5470 3.00 GHz CPU, 2 GB of fully buffered DIMM modules, integrated dual Gigabit Ethernet, and an Ultra320 SCSI controller. iSCSI support is provided by the Linux target framework (tgt) [14] installed on an Ubuntu server running the 2.6.27-7 kernel.
A number of client machines generate load for SPECweb
and SPECsfs while a desktop machine is used for automating
the experiments and analysing measurements. Three links
running on separate subnets using a dedicated Netgear Gigabit
switch provide management, storage and guest networking
functions respectively. Migration is carried out over dedicated
back-to-back connections between Host A and B.
D. Optimising Migration For 10 Gbps Links
We evaluate our prediction models by comparing simulation
results with measurements obtained through actual migration
runs executed over a pair of directly connected SolarFlare
10 Gbps network interconnects. However, our initial results
showed that the default Xen migration platform is incapable
of providing a migration throughput higher than 3.2 Gbps.
Profiling the migration sub-system highlighted a high overhead associated with mapping the guest domain (DomU)
physical pages. The Xen migration sub-system carries out
migration by continually mapping and unmapping the physical
pages of the migrating DomU in the control domain (Dom0)
migration process. In doing this Dom0 is able to access and
send DomU page contents to the remote host.
The default Xen implementation maps and unmaps DomU
pages in 4 MB segments (for batch transfer). These operations
have a high temporal overhead due to the time required to
set up and tear down page table entries in the migration process page table. As a result, the Dom0 migration process spends
significant amount of time in page table management and is
therefore unable to keep the link fully utilised.
We modified the Xen migration sub-system to eliminate this
bottleneck by changing the design so that the Dom0 migration process maps the entire guest VM physical address space at the start of the migration. Although the overhead of mapping the entire address space is on the order of hundreds of milliseconds (as it is proportional to the VM size), this cost is amortised over the duration of the migration.
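To see why this trade-off pays off, the rough Python sketch below compares the accumulated cost of per-batch map/unmap calls against a single up-front mapping; both cost figures are hypothetical assumptions used purely for illustration, not profiled values from our modified migration sub-system.

```python
# Back-of-envelope comparison of per-batch map/unmap overhead versus a single
# up-front mapping of the guest address space. Both cost figures below are
# hypothetical assumptions for illustration, not profiled Xen numbers.

VM_SIZE_MB = 1024
BATCH_MB = 4                      # default behaviour: 4 MB map/unmap segments
COST_PER_BATCH_MS = 1.5           # assumed setup/teardown cost per 4 MB segment
RAM_COPIED_FACTOR = 3             # pre-copy may resend up to ~3x the VM's RAM

batch_calls = (VM_SIZE_MB // BATCH_MB) * RAM_COPIED_FACTOR
per_batch_total_ms = batch_calls * COST_PER_BATCH_MS

upfront_total_ms = 400            # assumed one-off cost of mapping the whole VM

print(f"per-batch mapping overhead: ~{per_batch_total_ms:.0f} ms accumulated")
print(f"up-front mapping overhead:  ~{upfront_total_ms} ms, paid once")
```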
TABLE II
PREDICTION MEAN ERROR FOR DIFFERENT LINK SPEEDS AND VM SIZES. MT: TOTAL MIGRATION TIME. DT: TOTAL DOWNTIME.

VM Size (MB) | Speed    | Model | MT err | DT err
1,024        | 100 Mbps | AVG   | 1.8%   | 7.5%
1,024        | 100 Mbps | HIST  | 3.5%   | 8.0%
1,024        | 1 Gbps   | AVG   | 1.6%   | 9.3%
1,024        | 1 Gbps   | HIST  | 2.5%   | 7.4%
1,024        | 10 Gbps  | AVG   | 2.6%   | 3.3%
1,024        | 10 Gbps  | HIST  | 3.3%   | 6.2%
512          | 10 Gbps  | AVG   | 3.2%   | 7.1%
512          | 10 Gbps  | HIST  | 3.8%   | 4.9%
Fig. 3. Evaluation for the AVG and HIST Models. Bandwidth = 10 Gbps. VM Size = 1,024 MB. MT: Total Migration Time. DT: Total Downtime. Confidence intervals are omitted because of insufficient vertical resolution.
TABLE III
INDUSTRY-STANDARD WORKLOADS. CPU: SPEC CPU. WEB: SPECweb. SFS: SPECsfs. MR: MAPREDUCE TASKS. MT: TOTAL MIGRATION TIME (IN SECONDS). DT: TOTAL DOWNTIME (IN MILLISECONDS). BANDWIDTH = 10 GBPS. VM SIZE = 1,024 MB (FOR CPU AND WEB) AND 4,096 MB (FOR SFS AND MR). A: ACTUAL MEASUREMENT. P: THE HIST MODEL'S PREDICTION.

Workload | MT A   | MT P    | Err  | DT A     | DT P     | Err
CPU      | 5.8 s  | 5.7 s   | 2.4% | 317.3 ms | 314.1 ms | 2.4%
WEB      | 7.5 s  | 7.4 s   | 2.0% | 449.5 ms | 420.4 ms | 6.4%
SFS      | 14.8 s | 14.9 s  | 1.5% | 217.6 ms | 217.7 ms | 0.1%
MR       | 14.9 s | 15.13 s | 1.4% | 348.9 ms | 348.1 ms | 0.2%

warmup time. We force a live migration every 2 minutes over the duration of the entire 10 runs. The average measured total migration time and downtime are 14.9 seconds and 348 milliseconds respectively. We also tracked the total number of dirty pages for the benchmark run, as shown in Fig. 5.
TABLE IV
MIGRATION ON 100 MBPS LINK FOR MAPREDUCE TASKS. MT: TOTAL MIGRATION TIME (IN SECONDS). DT: TOTAL DOWNTIME (IN MILLISECONDS). VM SIZE = 4,096 MB. A: ACTUAL MEASUREMENT. P: THE HIST MODEL'S PREDICTION.

    | MT A       | MT P       | DT A         | DT P
Min | 525.82 s   | 567.80 s   | 629.00 ms    | 499.61 ms
Max | 1,109.88 s | 1,219.37 s | 42,534.00 ms | 45,431.12 ms
Fig. 5. Tracking the Total Number of Dirtied Pages for MapReduce Workload