Improved Algorithms for Data Migration

Samir Khuller¹, Yoo-Ah Kim², and Azarakhsh Malekian¹

¹ Department of Computer Science, University of Maryland, College Park, MD 20742.
Research supported by NSF Award CCF-0430650.
E-mail: samir,[email protected].
² Department of Computer Science and Engineering, University of Connecticut,
Storrs, CT 06269.
E-mail: [email protected].

Abstract. Our work is motivated by the need to manage data on a collection of
storage devices to handle dynamically changing demand. As demand for data
changes, the system needs to automatically respond to changes in demand for
different data items. The problem of computing a migration plan among the
storage devices is called the data migration problem. This problem was shown to
be NP-hard, and an approximation algorithm achieving an approximation factor of
9.5 was presented for the half-duplex communication model in [Khuller, Kim and
Wan: Algorithms for Data Migration with Cloning, SIAM J. on Computing, Vol.
33(2):448–461 (2004)]. In this paper we develop an improved approximation
algorithm that gives a bound of 6.5 + o(1) using various new ideas. In addition,
we develop better algorithms using external disks and get an approximation
factor of 4.5. We also consider the full-duplex communication model and develop
an improved bound of 4 + o(1) for this model, with no external disks.

1 Introduction
To handle high demand, especially for multimedia data, a common approach is
to replicate data objects within the storage system. Typically, a large storage
server consists of several disks connected using a dedicated network, called a
Storage Area Network. Disks typically have constraints on storage as well as
the number of clients that can access data from a single disk simultaneously.
These systems are getting increasing attention since TV channels are moving to
systems where TV programs will be available for users to watch with full video
functionality (pause, fast forward, rewind, etc.). Such programs will require large
amounts of storage, in addition to bandwidth capacity to handle high demand.
Approximation algorithms have been developed [16, 17, 7, 11] to map known
demand for data to a specific data layout pattern to maximize utilization, where
the utilization is the total number of clients that can be assigned to a disk that
contains the data they want. In the layout, we compute not only how many copies
of each item we need, but also a layout pattern that specifies the precise subset
of items on each disk. The problem is NP-hard, but there are polynomial-time
approximation schemes [7, 16, 17, 11]. Given the relative demand for data, the
algorithm computes an almost optimal layout. Note that this problem is slightly
different from the data placement problem considered in [9, 15, 3]: since all the
disks are in the same location, it does not matter which disk a client is assigned
to. Even in this special case, the problem is NP-hard [7].
Over time as the demand for data changes, the system needs to create new
data layouts. The problem we are interested in is the problem of computing a
data migration plan for the set of disks to convert an initial layout to a target
layout. We assume that data objects have the same size (these could be data
blocks, or files) and that it takes the same amount of time to migrate any data
item from one disk to another disk. In this work we consider two models. In the
first model (half-duplex) the crucial constraint is that in each round each disk
can participate in the transfer of only one item, either as a sender or as a
receiver. In other words, the communication pattern in each round forms a
matching. Our goal is to find a migration schedule that minimizes the time taken
to complete the migration (makespan). To handle high demand for popular objects,
new copies will have to be dynamically created and stored on different disks. All
previous work on this problem deals with the half-duplex model. We also consider
the full-duplex model, where in each round each disk can act both as the sender
of one item and as the receiver of one item. Previously we did not consider this
natural extension of the half-duplex model since we did not completely understand
how to utilize its power to prove interesting approximation guarantees.
The formal description of the data migration problem is as follows: data item
i resides in a specified (source) subset Si of disks, and needs to be moved to a
(destination) subset Di . In other words, each data item that initially belongs
to a subset of disks, needs to be moved to another subset of disks. (We might
need to create new copies of this data item and store it on an additional set
of disks.) See Figure 1 for an example. If each disk had exactly one data item
and needed to copy this data item to every other disk, the problem would be
exactly the gossiping problem. The data migration problem in this form was first studied
by Khuller, Kim and Wan [4], and it was shown to be NP-hard. In addition, a
polynomial-time 9.5-approximation algorithm was developed for the half-duplex
communication model.
A slightly different formulation was considered by Hall et al. [10], in which a
particular transfer graph is specified. While they can solve the problem very
well, this approach is limited in the sense that it allows neither (a) cloning
(creation of several new copies) nor (b) optimization over the space of transfer
graphs. In [4] it was shown that the formulation in which source and destination
subsets are specified for each data item is more general. However, the main
focus in [10] is to do the transfers without violating space constraints. Another
formulation has been considered recently where one can optimize over the space
of possible target layouts [12]. The resulting problems are also NP-hard.
However, no significant progress on developing approximation algorithms was
made on this problem. A simple flow-based heuristic was presented for the
problem, and was demonstrated to be effective in finding good target layouts.

Job migration has also been considered recently in the scheduling context [2],
where a fixed number of jobs can be migrated to reduce the makespan as much as
possible. There is also a large body of work on data migration for minimizing
completion time for a fixed transfer graph (see [6, 14] for references).

Fig. 1. An initial and target layout, and their corresponding $S_i$'s and $D_i$'s. For example,
disk 1 initially has items {2, 4, 5} and in the target layout has items {1, 3, 4}.

1.1 Communication Model

Different communication models can be considered based on how the disks are
connected. In this paper we consider two models. The first model is the same
model as in previous work [10, 1, 4, 13], where the disks may communicate on
any matching; in other words, the underlying communication graph allows for
communication between any pair of devices via a matching (a switched storage
network with unbounded backplane bandwidth). Moreover, to model the
limited switching capacity of the network connecting the disks, one could allow
for choosing any matching of bounded size as the set of transfers that can be
done in each round. We call this the bounded-size matching model. It was shown
in [4] that an algorithm for the bounded matching model can be obtained by
a simple simulation of the algorithm for the unbounded matching model with
excellent performance guarantees.
In addition we consider the full-duplex model, where in each round each disk may
simultaneously send one item and receive one item. Note that we no longer require
the communication pattern to be a matching. For example, we may have cycles,
with disk 1 sending an item to disk 2, disk 2 to disk 3, and disk 3 to disk 1.
In earlier work we did not discuss this model, as we were unable to utilize its
power to prove non-trivial approximation guarantees. Note that a schedule in this
model no longer corresponds directly to an edge coloring of the transfer graph.
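As an illustration only (not part of the paper), the following Python sketch checks whether a single round of transfers is legal in each model. The `(sender, receiver, item)` triple format and the function name `is_valid_round` are our own assumptions. The three-disk cycle from the example above is legal in the full-duplex model but not in the half-duplex one.

```python
from collections import Counter


def is_valid_round(transfers, full_duplex=False):
    """Check whether one round of transfers is allowed.

    transfers: list of (sender, receiver, item) triples (hypothetical format).
    Half-duplex: each disk takes part in at most one transfer, so the round
    forms a matching. Full-duplex: each disk sends at most one item and
    receives at most one item.
    """
    if full_duplex:
        sends = Counter(sender for sender, _, _ in transfers)
        receives = Counter(receiver for _, receiver, _ in transfers)
        return (max(sends.values(), default=0) <= 1
                and max(receives.values(), default=0) <= 1)
    touched = Counter()
    for sender, receiver, _ in transfers:
        touched[sender] += 1
        touched[receiver] += 1
    return max(touched.values(), default=0) <= 1


cycle = [(1, 2, "a"), (2, 3, "b"), (3, 1, "c")]
print(is_valid_round(cycle), is_valid_round(cycle, full_duplex=True))  # False True
```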

1.2 Our Results

Our approach is based on the approach initially developed in [4]. Using various
new ideas, we reduce the approximation factor to 6.5 + o(1). The main technical
difficulty is simply that of “putting it all together” and making the analysis
work.
In addition we show two more results. If we are allowed to use “external
disks” (called bypass disks in [10]), we can improve the approximation guarantee
further to $3 + \frac{1}{2}\max(3, \gamma)$. This can be achieved by using at most
$\lceil \Delta/\gamma \rceil$ external disks, where $\Delta$ is the number of items
that need to be migrated. We assume that each external disk can hold $\gamma$
items. Setting $\gamma = 3$ gives an approximation factor of 4.5.

Finally, we also consider the full-duplex model, where in each round each disk can
simultaneously be the source of one transfer and the destination of another. In
this model we show that an approximation guarantee of 4 + o(1) can be achieved.
Earlier, we did not focus on this model specifically as we were unable to utilize
its power in any non-trivial manner.
The algorithm developed in [4] has been implemented, and we performed an
extensive set of experiments comparing its performance with the performance of
other heuristics [8]. Even though the worst-case approximation factor is 9.5, the
algorithm performed very well in practice, producing schedules within twice the
optimal solution in most cases.

2 The Data Migration Algorithm


We start this section by describing some theorems from the edge coloring and
scheduling literature and also the lower bounds that we will use in the following
sections for analysis. In the second part, we present our suggested data migration
algorithm.

2.1 Preliminaries
Our algorithms make use of known results on edge coloring of multigraphs.
Given a graph $G$ with maximum degree $\Delta_G$ and multiplicity $\mu$, the
following results are known (see Bondy and Murty [5], for example). Let $\chi'$ be
the edge chromatic number of $G$. Note that when $G$ is bipartite, $\chi' = \Delta_G$,
and such an edge coloring can be obtained in polynomial time [5].
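Since each color class of a proper edge coloring is a set of transfers that can run in the same round, a small sketch of the bipartite case may help. The Python function below is our own illustration of the textbook alternating-path proof of König's edge-coloring theorem, not code from the paper; the name `bipartite_edge_coloring` and the input format (a list of left/right vertex pairs, parallel edges allowed) are assumptions.

```python
from collections import defaultdict


def bipartite_edge_coloring(edges):
    """Properly edge-color a bipartite multigraph with Delta colors.

    edges: list of (left_vertex, right_vertex) pairs; parallel edges allowed.
    Returns one color in range(Delta) per edge such that edges sharing an
    endpoint get different colors (each color class = one round of transfers).
    """
    ends = [(("L", u), ("R", v)) for u, v in edges]
    degree = defaultdict(int)
    for x, y in ends:
        degree[x] += 1
        degree[y] += 1
    delta = max(degree.values(), default=0)
    at = defaultdict(dict)  # vertex -> {color: index of the edge using it}
    color = [None] * len(edges)

    def missing(vertex):
        # Fewer than delta colored edges touch `vertex`, so some color is free.
        return next(c for c in range(delta) if c not in at[vertex])

    for idx, (x, y) in enumerate(ends):
        a, b = missing(x), missing(y)
        if a in at[y]:
            # Follow the path from y whose edges alternate colors a, b, a, ...
            # In a bipartite graph it never reaches x, so swapping a and b on
            # it frees color a at y while a stays free at x.
            path, vertex, c = [], y, a
            while c in at[vertex]:
                e = at[vertex][c]
                path.append(e)
                left, right = ends[e]
                vertex = right if vertex == left else left
                c = b if c == a else a
            for e in path:
                for end in ends[e]:
                    del at[end][color[e]]
            for e in path:
                color[e] = b if color[e] == a else a
                for end in ends[e]:
                    at[end][color[e]] = e
        color[idx] = a
        at[x][a] = idx
        at[y][a] = idx
    return color
```

For instance, `bipartite_edge_coloring([(1, "a"), (1, "b"), (2, "a")])` uses two colors, which matches the maximum degree of that graph.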

Theorem 1. (Vizing [20]) If $G$ has no self-loops, then $\chi' \le \Delta_G + \mu$.

Theorem 2. (Shannon [18]) If $G$ has no self-loops, then $\chi' \le \lfloor \frac{3}{2}\Delta_G \rfloor$.
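For a general (non-bipartite) transfer multigraph, Theorems 1 and 2 give upper bounds on the number of rounds, while $\Delta_G$ is a lower bound. A hypothetical helper that computes $\Delta_G$, $\mu$ and the better of the two bounds might look as follows; it only evaluates the bounds and does not construct a coloring.

```python
from collections import Counter


def coloring_round_bounds(transfers):
    """Return (Delta, mu, bound) for a transfer multigraph without self-loops.

    transfers: list of (u, v) disk pairs; direction is irrelevant here because
    in the half-duplex model both endpoints are busy during a transfer.
    bound = min(Delta + mu, floor(3 * Delta / 2)) by Theorems 1 and 2.
    """
    degree = Counter()
    multiplicity = Counter()
    for u, v in transfers:
        if u == v:
            raise ValueError("self-loops are not allowed")
        degree[u] += 1
        degree[v] += 1
        multiplicity[frozenset((u, v))] += 1
    delta = max(degree.values(), default=0)
    mu = max(multiplicity.values(), default=0)
    return delta, mu, min(delta + mu, (3 * delta) // 2)


# Three disks, every pair exchanging two items.
print(coloring_round_bounds([(1, 2), (1, 2), (2, 3), (2, 3), (1, 3), (1, 3)]))  # (4, 2, 6)
```

In this example the bound is tight: any two of the six transfers share a disk, so six rounds are necessary.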

Another result that we will use (related to scheduling) is the following theorem
by Shmoys and Tardos:
Theorem 3. (Shmoys-Tardos [19]) We are given a collection of jobs $\mathcal{J}$, each
of which is to be assigned to exactly one machine among the set $\mathcal{M}$; if job
$j \in \mathcal{J}$ is assigned to machine $i \in \mathcal{M}$, then it requires $p_{ij}$
units of processing time and incurs a cost $c_{ij}$. Suppose that there exists a
fractional solution (that is, a job can be assigned fractionally to machines) with
makespan $P$ and total cost $C$. Then in polynomial time we can find a schedule
with makespan $P + \max_{i,j} p_{ij}$ and total cost $C$.
We use two main lower bounds in our analysis. As in [4], let
$\beta_j = |\{i : j \in D_i\}|$, i.e., the number of different sets $D_i$ to which disk
$j$ belongs. We then define $\beta = \max_{j=1,\dots,N} \beta_j$. In other words, $\beta$
is the maximum number of items that a single disk needs to receive. Note that
$\beta$ is a lower bound on the optimal number of rounds, since the disk $j$ that
attains the maximum needs at least $\beta$ rounds to receive all the items $i$ such
that $j \in D_i$, as it can receive at most one item in each round. Another lower
bound that we will use in the analysis is $\alpha$, defined as follows: for each item
$i$, choose a primary source $s_i \in S_i$ so that
$\alpha = \max_{j=1,\dots,N} (|\{i : j = s_i\}| + \beta_j)$ is minimized. In other
words, $\alpha$ is the maximum number of items for which a disk is either the
primary source ($s_i$) or a destination. $\alpha$ is also a lower bound on the
optimal number of rounds: since $D_i$ is non-empty and disjoint from $S_i$ (see
below), every item $i$ must be sent at least once by a disk of $S_i$; taking that
disk as $s_i$, disk $j$ has to send at least $|\{i : j = s_i\}|$ items and receive
$\beta_j$ items, each transfer in a separate round in the half-duplex model.
Moreover, we may assume that $D_i \ne \emptyset$ and $D_i \cap S_i = \emptyset$. This is
because we can define the destination set $D_i$ as the set of disks that need item
$i$ and do not currently have it, and drop the items for which this set is empty.
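The definitions of $\beta_j$, $\beta$ and $\alpha$, together with the normalization $D_i \cap S_i = \emptyset$, translate directly into code. The following Python sketch is our own illustration: items and disks are assumed to be hashable identifiers, `sources` and `dests` are assumed to map each item to a set of disks, and the hypothetical `alpha_for` evaluates the expression for one fixed choice of primary sources rather than minimizing over all choices (Lemma 1 below handles the minimization).

```python
from collections import Counter


def normalize(sources, dests):
    """Remove disks that already hold item i from D_i and drop items with empty D_i."""
    result = {}
    for item, dest in dests.items():
        remaining = set(dest) - set(sources[item])
        if remaining:
            result[item] = remaining
    return result


def beta_values(dests):
    """beta_j = |{i : j in D_i}| for every disk j, and beta = max_j beta_j."""
    beta_j = Counter(j for dest in dests.values() for j in dest)
    return beta_j, max(beta_j.values(), default=0)


def alpha_for(primary, dests, disks):
    """max_j (|{i : s_i = j}| + beta_j) for one fixed choice of primary sources."""
    beta_j, _ = beta_values(dests)
    served = Counter(primary[item] for item in dests)
    return max(served[j] + beta_j[j] for j in disks)


# Hypothetical instance: item "x" lives on disk 1, item "y" on disk 2.
sources = {"x": {1}, "y": {2}}
dests = normalize(sources, {"x": {1, 2, 3}, "y": {3}})
print(beta_values(dests)[1], alpha_for({"x": 1, "y": 2}, dests, [1, 2, 3]))  # 2 2
```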
Next, we present a high level description of our suggested data migration
algorithm.

2.2 Data Migration Algorithm: High Level Idea

The high level idea of the algorithm can be described as follows:

Algorithm 4

1. For each item $i$, find a disk $s_i \in S_i$, called the primary source, such that
$\max_{j=1,\dots,N} (|\{i : j = s_i\}| + \beta_j)$ is minimized. We show later how
to do this step in polynomial time.
2. For each item $i$, we define two different subgroups $G_i \subseteq D_i$ and
$R_i \subseteq D_i$ with the following properties:
– The $G_i$'s are disjoint from each other, and we first send item $i$ to these
disks.
– The $R_i$ sets are not necessarily disjoint from each other, but each disk
belongs to only a bounded number of different $R_i$ sets. In our algorithm,
we send data from $G_i$ to $R_i$ and then from $R_i$ to the rest of the disks
in $D_i$.
3. The $R_i$ sets are selected as follows:
(a) First partition $D_i$ into subgroups $D_{ik}$, $k = 1, \dots, \lceil |D_i|/q \rceil$,
of size at most $q$ ($q$ is a parameter that will be specified later). That
is, we partition $D_i$ into $\lfloor |D_i|/q \rfloor$ subgroups of size $q$ and
possibly one subgroup of size less than $q$ (if $|D_i|$ is not a multiple of $q$).
(b) Now select $R_i \subseteq D_i$ and assign each subgroup $D_{ik}$ to a disk in
$R_i$ such that for each disk in $R_i$ the total size of the subgroups assigned
to the disk is at most $\beta + q$. (We will see later that it is always
possible to select $R_i$ with this property.) Let $r_i$ be the disk in $R_i$ to
which the small subgroup (a subgroup of size strictly less than $q$) is
assigned. Note that if $|D_i|$ is a multiple of $q$, there is no disk $r_i$. We
define $\bar{R}_i = R_i \setminus \{r_i\}$.
4. Compute mutually disjoint sets $G_i \subseteq D_i$ such that $|G_i| = \lfloor |D_i|/\beta \rfloor$.
5. For each item $i$ for which $G_i = \emptyset$ but $\bar{R}_i \ne \emptyset$, we select
a disk $g_i$. Let $G'_i = G_i$ if $G_i$ is not empty, and $G'_i = \{g_i\}$ otherwise.
Note that a disk $g_i$ exists iff $q < |D_i| < \beta$.
6. Send data item $i$ from the primary source $s_i$ to $G'_i$.
7. Send item $i$ from $G'_i$ to $\bar{R}_i \setminus G'_i$ by setting up a transfer graph
and using an edge coloring to synchronize the transfers. Recall that
$\bar{R}_i = R_i \setminus \{r_i\}$, where $r_i$ is the disk in $R_i$ to which the
small subgroup is assigned.
8. Send item $i$ from $s_i$ to $r_i$ if $r_i$ has not yet received item $i$.
9. Finally, set up a transfer graph from $R_i$ to $D_i \setminus R_i$. We find an edge
coloring of the transfer graph; the number of colors used is an upper bound on
the number of rounds required to ensure that each disk in $D_i$ gets item $i$. In
Lemma 7 we derive an upper bound on the number of required colors.
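Steps 3(a) and 4, and the existence of $r_i$, depend only on $|D_i|$, $q$ and $\beta$. The sketch below is our own, with a hypothetical function name and disks assumed to be sortable integers. It computes the subgroup partition of Step 3(a), the target size $|G_i| = \lfloor |D_i|/\beta \rfloor$ of Step 4, and whether a small subgroup (and hence a disk $r_i$) exists. It does not choose $R_i$, which requires the rounding of Lemma 3.

```python
def subgroup_plan(dest, beta, q):
    """Quantities of Steps 3(a) and 4 for one item (a sketch).

    Returns the subgroups D_i1, D_i2, ... (floor(|D_i|/q) of size q, then at most
    one smaller one), the target size |G_i| = floor(|D_i| / beta), and whether a
    small subgroup exists, i.e. whether a disk r_i will be needed.
    """
    disks = sorted(dest)
    subgroups = [disks[k:k + q] for k in range(0, len(disks), q)]
    g_size = len(disks) // beta
    has_small_subgroup = len(disks) % q != 0
    return subgroups, g_size, has_small_subgroup


groups, g_size, has_r = subgroup_plan(set(range(10)), beta=3, q=4)
print([len(g) for g in groups], g_size, has_r)  # [4, 4, 2] 3 True
```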

In the previous approach given in [4], we first chose disjoint representative sets
in $D_i$ (equivalent to the $G_i$ sets here); items were first migrated to these
subsets and then from these subsets to the rest of the disks in need of each item.
Choosing disjoint sets makes broadcasting inside the subsets faster and easier,
but disjointness also limits the size of the subsets, which increases the number
of rounds in the following phase. Here, we add new representative sets ($R_i$)
that are not necessarily disjoint. We still use the disjoint subsets to broadcast
to these sets, and we then send data from the new representative sets ($R_i$) to
the remaining disks. In the following sections, we describe the details of
Algorithm 4.

3 Step Details
We present the details of the steps in the same order as in Algorithm 4.

Step 1: Selecting the primary source for each item. This is exactly the same as
Lemma 3.1 in [4].
Lemma 1. [4] We can find a source $s_i \in S_i$ for each item $i$ so that
$\max_{j=1,\dots,N} (|\{i : j = s_i\}| + \beta_j)$ is minimized, using a flow network.

Proof. We create a flow network with a source $s$ and a sink $t$ as shown in Figure
??. We have two sets of nodes, corresponding to disks and items. Add directed
edges from $s$ to the nodes for items, and directed edges from item $i$ to disk $j$
if $j \in S_i$. The capacities of all these edges are one. Finally, we add an edge
from the node corresponding to disk $j$ to $t$ with capacity $\alpha - \beta_j$. We
want to find the minimum $\alpha$ such that the maximum flow of the network is
$\Delta$. We can do this by checking whether there is a flow of value $\Delta$,
starting with $\alpha = \max_j \beta_j$ and increasing $\alpha$ by one until such a
flow exists. If there is flow from item $i$ to disk $j$, then we set $s_i = j$.
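The search in the proof of Lemma 1 can be sketched in Python as follows. This is our own illustration, assuming every $S_i$ is non-empty (otherwise the loop would never terminate); it uses a plain Edmonds-Karp max-flow routine on a dictionary of residual capacities, and the names `max_flow` and `choose_primary_sources` are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths.

    cap is a dict of dicts of residual capacities and is modified in place.
    """
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push


def choose_primary_sources(sources, beta_j, disks):
    """Smallest alpha (and the s_i it induces) admitting a flow of value Delta."""
    alpha = max(beta_j.get(j, 0) for j in disks)
    while True:
        cap = defaultdict(lambda: defaultdict(int))
        for i, holders in sources.items():
            cap["s"][("item", i)] = 1
            for j in holders:
                cap[("item", i)][("disk", j)] = 1
        for j in disks:
            cap[("disk", j)]["t"] = alpha - beta_j.get(j, 0)
        if max_flow(cap, "s", "t") == len(sources):
            # An item->disk edge carries flow exactly when its residual capacity is 0.
            primary = {i: j for i, holders in sources.items() for j in holders
                       if cap[("item", i)][("disk", j)] == 0}
            return alpha, primary
        alpha += 1


sources = {"x": {1, 2}, "y": {1}, "z": {1, 3}}
beta_j = {1: 1, 2: 1, 3: 2}
print(choose_primary_sources(sources, beta_j, [1, 2, 3])[0])  # 3
```

Here $\alpha = 2$ fails because items "y" and "z" would both need disk 1, whose capacity $\alpha - \beta_1$ is then 1, so the search stops at 3.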

Step 3: Selecting $R_i$ for each item $i$. Let $D_{ik}$ ($k = 1, \dots, \lceil |D_i|/q \rceil$)
be the $k$-th subgroup of $D_i$. The size of $D_{ik}$ is $q$ for
$k = 1, \dots, \lfloor |D_i|/q \rfloor$, and $D_{ik}$ for $k = \lfloor |D_i|/q \rfloor + 1$
contains the remaining $|D_i| - q \lfloor |D_i|/q \rfloor$ disks of $D_i$ (it may be
empty). To show how we assign the subgroups $D_{ik}$ to $R_i$, we use Theorem 3.
In our problem, we can think of each subgroup $D_{ik}$ as a job and each disk as a
machine. If disk $j$ belongs to $D_i$, then we can assign job $D_{ik}$ to disk $j$
with zero cost. The processing time is the size of $D_{ik}$, which is at most $q$.
If disk $j$ does not belong to $D_i$, then the cost of assigning $D_{ik}$ to $j$ is
$\infty$ (disk $j$ cannot be in $R_i$). First we can show that:
Lemma 2. There exists a fractional assignment such that the max load of each
disk is at most β.

Proof. We can assign a $1/|D_i|$ fraction of subgroup $D_{ik}$ to each disk $j \in D_i$.
It is easy to check that every subgroup $D_{ik}$ is completely assigned. The load on
disk $j$ is given by
$$\sum_{i : j \in D_i} \sum_k \frac{|D_{ik}|}{|D_i|} = \sum_{i : j \in D_i} \frac{1}{|D_i|} \sum_k |D_{ik}| = \sum_{i : j \in D_i} 1 = \beta_j \le \beta.$$
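The fractional assignment in Lemma 2 can be checked numerically. In this hypothetical Python sketch, which uses exact fractions and assumes disks are sortable identifiers, every subgroup is split evenly over the disks of its $D_i$, and the resulting load on disk $j$ comes out to $\beta_j$, as the derivation above predicts.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_loads(dests, q):
    """Lemma 2: give every disk of D_i a 1/|D_i| share of each subgroup D_ik.

    Returns the resulting fractional load per disk; by the derivation above it
    equals beta_j, hence is at most beta.
    """
    load = defaultdict(Fraction)
    for dest in dests.values():
        disks = sorted(dest)
        subgroups = [disks[k:k + q] for k in range(0, len(disks), q)]
        for group in subgroups:
            for j in disks:
                load[j] += Fraction(len(group), len(disks))
    return dict(load)


# Disks 3 and 4 lie in both destination sets, so their load is 2 = beta_3 = beta_4.
loads = fractional_loads({"x": {1, 2, 3, 4, 5}, "y": {3, 4}}, q=2)
```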

Now we can show that:

Lemma 3. There is a way to choose $R_i$ sets for each $i = 1, \dots, \Delta$ and assign
subgroups $D_{ik}$ such that for each disk in $R_i$ the total size of the subgroups
$D_{ik}$ assigned to the disk is at most $\beta + q$.

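Proof. By Theorem 3, we can convert the fractional solution obtained in Lemma 2
to an integral solution such that each subgroup is completely assigned to one
disk and the maximum load on a disk is at most $\beta + q$ (since the maximum size
of a subgroup $D_{ik}$ is $q$). Since the total cost stays zero, every subgroup of
$D_i$ is assigned to a disk of $D_i$, and we let $R_i$ be the set of these disks.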

Considering this assignment, we can directly conclude the following.

Fact. For each disk $j$, at most $\beta/q + 1$ different large subgroups $D_{ik}$ (of
size exactly $q$) can be assigned to disk $j$.
The reason is that the total size of the subgroups assigned to a disk is at most
$\beta + q$, and the size of each large subgroup is $q$. We will use this fact later.

Step 4: Selecting $G'_i \subseteq D_i$. We can find disjoint sets $G_i \subseteq D_i$
using the same algorithm as in [4]. For completeness, we include the method here.
Lemma 4. There is a way to choose disjoint sets $G_i$ for each $i = 1, \dots, \Delta$
such that $|G_i| = \lfloor |D_i|/\beta \rfloor$ and $G_i \subseteq D_i$.

Proof. Create a flow network with a source $s$ and a sink $t$. In addition, we have
two sets of vertices $U$ and $W$. The first set $U$ has $\Delta$ nodes, each
corresponding to an item. The set $W$ has $N$ nodes, each corresponding to a disk
in the system. We add directed edges from $s$ to each node in $U$ such that the
edge $(s, i)$ has capacity $\lfloor |D_i|/\beta \rfloor$. We also add directed edges
with infinite capacity from node $i \in U$ to $j \in W$ if $j \in D_i$. Finally, we add
unit-capacity edges from each node in $W$ to $t$. The min-cut in this network is
obtained by simply selecting the outgoing edges from $s$. We can find a fractional
flow of this value as follows: saturate all the outgoing edges from $s$. From each
node $i$ there are $|D_i|$ edges to nodes in $W$. Let $\Gamma_i = \lfloor |D_i|/\beta \rfloor$.
Send $1/\beta$ units of flow along $\Gamma_i \beta$ of these edges; since
$\Gamma_i \beta \le |D_i|$, this can be done. Observe that the total incoming flow to
a vertex in $W$ is at most 1, since there are at most $\beta$ incoming edges, each
carrying at most $1/\beta$ units of flow. An integral max flow in this network will
correspond to $\Gamma_i$ units of flow going from $s$ to $i$, and from $i$ to a subset
of vertices in $D_i$ before reaching $t$. The vertices to which $i$ sends non-zero
flow form the set $G_i$.
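Because all capacities into $t$ are 1, an integral max flow in the network of Lemma 4 is the same as a bipartite matching in which item $i$ has $\Gamma_i$ "slots" and each disk fills at most one slot. The Python sketch below uses simple augmenting paths (Kuhn's algorithm) instead of a general max-flow routine; this substitution, the names, and the input format are our own assumptions. It raises an error if some slot cannot be filled, which Lemma 4 rules out when `beta` equals $\max_j \beta_j$.

```python
def choose_disjoint_groups(dests, beta):
    """Disjoint G_i subset of D_i with |G_i| = floor(|D_i| / beta) (Lemma 4 sketch).

    Item i gets floor(|D_i|/beta) slots; a slot of item i may be filled by any disk
    of D_i, and every disk fills at most one slot. This is the paper's flow network
    with unit capacities into t, solved as a bipartite matching.
    """
    slots = [item for item, dest in dests.items() for _ in range(len(dest) // beta)]
    filled_by = {}  # disk -> index of the slot it fills

    def augment(slot, visited):
        # Try to give `slot` a disk, re-routing previously matched slots if needed.
        for disk in dests[slots[slot]]:
            if disk in visited:
                continue
            visited.add(disk)
            if disk not in filled_by or augment(filled_by[disk], visited):
                filled_by[disk] = slot
                return True
        return False

    for slot in range(len(slots)):
        if not augment(slot, set()):
            raise ValueError("could not fill every slot; is beta = max_j beta_j?")
    groups = {item: set() for item in dests}
    for disk, slot in filled_by.items():
        groups[slots[slot]].add(disk)
    return groups


dests = {"x": {1, 2, 3, 4}, "y": {3, 4, 5, 6}, "z": {1, 5}}
groups = choose_disjoint_groups(dests, beta=2)  # here max_j beta_j = 2
```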

The above approach finds the $G_i$ sets. However, if $G_i = \emptyset$ but
$\bar{R}_i \ne \emptyset$, we need to select another disk $g_i$ as well. Note that if
$|G_i| = 0$ then $|D_i| < \beta$, and therefore $|\bar{R}_i| \le \lfloor |D_i|/q \rfloor < \beta/q$.
We define $G'_i = G_i$ if $G_i \ne \emptyset$, and $G'_i = \{g_i\}$ otherwise. Next, we
show how to select $g_i$.

Lemma 5. For each item $i$ for which $G_i = \emptyset$ but $\bar{R}_i \ne \emptyset$, we can
find $g_i$ so that for every disk $j$, $\sum_{i : j = g_i} |\bar{R}_i| \le 2\beta/q + 1$.

In other words, each disk serving as $g_i$ is responsible for at most $2\beta/q + 1$ disks.

Proof. We again use Theorem 3. Reduce the problem to the following scheduling
problem: consider each disk as a machine. For each item $i$ such that $|G_i| = 0$,
create a job of size $|\bar{R}_i|$. The cost of assigning job $i$ to disk $j$ is 1 if
$j \in \bar{R}_i$; otherwise it is infinite. Note that there is a fractional assignment
such that the load on each disk (machine) is at most $\beta/q + 1$: assign a
$1/|\bar{R}_i|$ fraction of each job to each machine (disk) in its $\bar{R}_i$ set. The
load due to this job on such a machine (disk) is 1. Since a disk is in at most
$\beta/q + 1$ different $\bar{R}_i$ sets (by the Fact in the previous section), the
fractional load on each machine (disk) is at most $\beta/q + 1$. By applying the
Shmoys-Tardos [19] scheduling algorithm (Theorem 3), we can find an assignment of
jobs (items) to machines (disks) such that the total cost is at most the number of
items and the load on each machine (disk) is at most $2\beta/q + 1$ (note that the
size of each job is at most $\beta/q$). Then $g_i$ is the disk (machine) to which item
$i$ is assigned.

Step 6: Sending items from $S_i$ to $G'_i$. First we show how to send data from $S_i$
to $G'_i$ and give the number of rounds these transfers take. We claim that this
can be done in $2\,\mathrm{OPT} + O(\beta/q)$ rounds. We develop a lower bound on the
optimal solution by solving the following linear program $L(m)$ for a given $m$:
$$
\begin{aligned}
L(m):\quad & \sum_{j} \sum_{k=1}^{m} n_{ijk}\, x_{ijk} \ge |G'_i| \quad \text{for all } i, && (1)\\
& \sum_{i} x_{ijk} \le 1 \quad \text{for all } j, k, && (2)\\
& 0 \le x_{ijk} \le 1, && (3)
\end{aligned}
$$
where $n_{ijk} = \min(2^{m-k}, |G'_i|)$ if disk $j$ belongs to $S_i$, and $n_{ijk} = 0$ otherwise.


Intuitively, $x_{ijk}$ indicates that at time $k$, disk $j$ sends item $i$ to some disk
in $G'_i$. Let $M$ be the minimum $m$ such that $L(m)$ has a feasible solution. Note
that $M$ is a lower bound on the optimal solution: consider an optimal migration
schedule and set $x_{ijk} = 1$ if disk $j$ sends item $i$ to a disk in $G'_i$ in round
$k$, and $x_{ijk} = 0$ otherwise. One can easily verify that this gives a feasible
solution for the linear program $L(\mathrm{OPT})$. Since $M$ is the smallest $m$ for which
$L(m)$ has a feasible solution, $M \le \mathrm{OPT}$. Now we show:
Lemma 6. We can perform migrations from $S_i$ to $G'_i$ in $2 \cdot M + O(\beta/q)$ rounds.

Proof. Given a fractional solution $x^*$ to $L(M)$, we can obtain an integral solution
$x^{**}$ such that for all $i$,
$\sum_j \sum_k x^{**}_{ijk} \ge \lfloor \sum_j \sum_k x^*_{ijk} \rfloor$ (using Lemma 3.4
from [4]). For each item $i$, we arbitrarily select
$\min(\sum_j \sum_k x^{**}_{ijk}, |G'_i|)$ disks from $G'_i$. Let $H_i$ denote this
subset. We create the following transfer graph from $S_i$ to $H_i$: create an edge
from a disk $j \in S_i$ to a disk in $H_i$ if $x^{**}_{ijk} = 1$, making sure that every
disk in $H_i$ has an incoming edge from a disk in $S_i$. Note that the indegree of a
disk in this transfer graph is at most $2 + \beta/q$, since a disk can belong to $H_i$
for at most $2 + \beta/q$ different $i$'s (a disk can be $g_i$ for at most $\beta/q + 1$
different items and may also belong to one $G_i$). The outdegree is at most $M$ and
the multiplicity is at most $2\beta/q + 4$. Therefore, by Theorem 1, we can perform
the migration from $S_i$ to $H_i$ in $M + O(\beta/q)$ rounds. For $i$ with
$|G'_i| = 1$, the transfer is complete. For the remaining items, since the sets
$G'_i$ ($= G_i$) are disjoint, we can double the number of copies in each round
until the number of copies reaches $|G_i|$. After $M$ rounds, the number of copies
we can make for item $i$ is at least
$$
\begin{aligned}
2^M |H_i| &= 2^M \min\Big(\sum_j \sum_k x^{**}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \cdot 2 \sum_j \sum_k x^{**}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \Big(\sum_j \sum_k x^{**}_{ijk} + 1\Big),\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \sum_j \sum_k x^{*}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(\sum_j \sum_k n_{ijk}\, x^{*}_{ijk},\ |G_i|\Big) \ge |G_i|.
\end{aligned}
$$
The second inequality follows from $\sum_j \sum_k x^{**}_{ijk} \ge 1$, the third from
$\sum_j \sum_k x^{**}_{ijk} \ge \lfloor \sum_j \sum_k x^*_{ijk} \rfloor > \sum_j \sum_k x^*_{ijk} - 1$,
the fourth from $n_{ijk} \le 2^{M-k} \le 2^{M-1}$, and the last from constraint (1).
Therefore, we can finish the whole transfer from $S_i$ to $G'_i$ in $2 \cdot M + O(\beta/q)$ rounds.
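The doubling phase in the proof of Lemma 6 can be illustrated with a small Python sketch (ours, not the paper's): starting from the disks of a set that already hold the item, every holder sends to one new disk per round, so the number of copies doubles until the set is covered. Each round is a matching, and because the $G_i$ sets are disjoint, the rounds for different items can run in parallel. The function name and list-based input are assumptions.

```python
def doubling_schedule(holders, group):
    """Rounds that spread an item over `group` by doubling (sketch).

    holders: disks of `group` that already have the item (must be non-empty).
    In each round every holder sends to at most one disk that still lacks the
    item, so each round is a matching and the number of copies at most doubles.
    """
    have = list(holders)
    if not have:
        raise ValueError("at least one disk must already hold the item")
    holder_set = set(have)
    need = [d for d in group if d not in holder_set]
    rounds = []
    while need:
        batch = list(zip(have, need))  # pair holders with disks still missing it
        rounds.append(batch)
        need = need[len(batch):]
        have.extend(receiver for _, receiver in batch)
    return rounds


rounds = doubling_schedule([1], range(1, 9))
print(len(rounds))  # 3: copies go 1 -> 2 -> 4 -> 8
```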

Step 7: Sending item $i$ from $G'_i$ to $\bar{R}_i$. We now focus on sending item $i$
from the disks in $G'_i$ to the disks in $\bar{R}_i$. We construct a transfer graph so
that each disk in $\bar{R}_i \setminus G'_i$ receives item $i$ from one disk in $G'_i$.
We create the transfer graph as follows. First, add directed edges from disks in
$G_i$ to disks in $\bar{R}_i$. Recall that $|G_i| = \lfloor |D_i|/\beta \rfloor$ and
$|\bar{R}_i| \le \lfloor |D_i|/q \rfloor$. Since $\lfloor x \rfloor \ge x/2$ for $x \ge 1$,
whenever $G_i \ne \emptyset$ the ratio $|\bar{R}_i|/|G_i|$ is at most $2\beta/q$; as the
$G_i$ sets are disjoint, there is a transfer graph in which each disk in $G_i$ has
$O(\beta/q)$ outgoing edges. For items with $G_i = \emptyset$, we put edges from $g_i$
to all disks in $\bar{R}_i$. By Lemma 5, this increases the outdegree of each disk by
at most $2\beta/q + 1$. The indegree of a disk in $\bar{R}_i$ is at most $\beta/q + 1$
(by the Fact), and the multiplicity is at most $2\beta/q + 2$. Therefore, this step
can be done in $O(\beta/q)$ rounds.
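For items with $G_i \ne \emptyset$, the construction only needs each disk of $G_i$ to serve about $|\bar{R}_i|/|G_i|$ receivers. A round-robin assignment achieves $\lceil |\bar{R}_i|/|G_i| \rceil$ outgoing edges per sender; the Python sketch below is our own illustration with a hypothetical name, not necessarily the paper's exact construction.

```python
def spread_transfers(senders, receivers):
    """Round-robin: each receiver that lacks the item gets it from one sender.

    Every sender ends up with at most ceil(len(targets) / len(senders)) outgoing
    transfers, where targets are the receivers not already among the senders.
    """
    senders = list(senders)
    sender_set = set(senders)
    targets = [r for r in receivers if r not in sender_set]
    return [(senders[k % len(senders)], r) for k, r in enumerate(targets)]


edges = spread_transfers([1, 2], [1, 3, 4, 5, 6])
print(edges)  # [(1, 3), (2, 4), (1, 5), (2, 6)]
```

As a numeric check of the bound above: with $\beta = 100$, $q = 10$ and $|D_i| = 150$, we get $|G_i| = 1$ and $|\bar{R}_i| \le 15$, so the single sender serves at most 15 receivers, within $2\beta/q = 20$.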

Step 8: Sending item $i$ from $s_i$ to $r_i$. Again we create a transfer graph, in which
there is an edge from $s_i$ to $r_i$ if $r_i$ has not received item $i$ in the previous
steps. The indegree of a disk $j$ is at most $\beta_j$, since a disk $j$ is selected as
$r_i$ only if $j \in D_i$, and the outdegree of disk $j$ is at most $\alpha - \beta_j$.
Hence every disk has degree at most $\alpha$, and by Theorem 2 this step can be done
in $\lfloor 3\alpha/2 \rfloor$ rounds.
Step 9: Sending item $i$ from $R_i$ to $D_i \setminus (R_i \cup G'_i)$. We now create a
transfer graph from $R_i$ to $D_i \setminus (R_i \cup G'_i)$ such that there is an edge
from disk $a \in R_i$ to disk $b$ if the subgroup that $b$ belongs to is assigned to $a$
in Lemma 3. We find an edge coloring of the transfer graph. The following lemma
gives an upper bound on the number of rounds required to ensure that each disk
in $D_i$ gets item $i$.
Lemma 7. The number of colors we need to color the transfer graph is at most
$3\beta + q$.

Proof. First, we compute the maximum indegree and outdegree of each node.
The outdegree of a node is at most $\beta + q$ due to the way we choose $R_i$ (see
Lemma 3). The indegree of each node is at most $\beta$, since in the transfer graph
we send items only to the disks in their corresponding destination sets. Hence the
maximum degree is at most $2\beta + q$. The multiplicity of the graph is also at most
$\beta$, since we send data item $i$ from disk $j$ to disk $k$ (or vice versa) only if
both disks $j$ and $k$ belong to $D_i$. By Theorem 1, the number of colors needed is
at most $(2\beta + q) + \beta = 3\beta + q$.

To wrap up, in the next theorem we show that the total number of rounds
in this algorithm is bounded by 6.5+o(1) times the optimal solution.

Theorem 1. The total number of rounds required for the data migration is at
most $(6.5 + o(1))\,\mathrm{OPT}$.

Proof. The total number of rounds we need is $2M + 3\alpha/2 + 3\beta + O(\beta/q) + q$.
Since $M$, $\alpha$, and $\beta$ are lower bounds on the optimal solution, choosing
$q = \Theta(\sqrt{\beta})$ gives the desired result.
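To see where the $o(1)$ term comes from, the bound in the proof can be written out as follows, using $M, \alpha, \beta \le \mathrm{OPT}$ and the fact that $\beta/q + q = O(\sqrt{\beta})$ when $q = \Theta(\sqrt{\beta})$:

```latex
\begin{aligned}
2M + \tfrac{3}{2}\alpha + 3\beta + O(\beta/q) + q
  &\le \left(2 + \tfrac{3}{2} + 3\right)\mathrm{OPT} + O\!\left(\frac{\beta}{q} + q\right) \\
  &= 6.5\,\mathrm{OPT} + O(\sqrt{\beta}) \qquad \text{for } q = \Theta(\sqrt{\beta}) \\
  &\le 6.5\,\mathrm{OPT} + O(\sqrt{\mathrm{OPT}}) = (6.5 + o(1))\,\mathrm{OPT}.
\end{aligned}
```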

4 External Disks

Until now we assumed that we had N disks, and the source and destination sets
were chosen from this set of disks and only essential transfers are performed. In
other words, if an item i is sent to disk j, then it must be that j ∈ Di (disk j
was in the destination set for item i), hence the total number of transfers done
is the least possible. In several situations, we may have access to idle disks with
available storage that we can use as temporary devices to complete the transfers
we are trying to schedule faster. In addition, we exploit the fact that by
performing a small number of non-essential transfers (as was also done in
[13, 10]), we can further reduce the total number of rounds required. We show that
such techniques can indeed considerably reduce the total number of rounds
required for performing the transfers from the $S_i$ sets to the $D_i$ sets.
We assume that each external disk has enough space to hold $\gamma$ items. If we
are allowed to use $\lceil \Delta/\gamma \rceil$ external disks, the approximation ratio
can be improved to $3 + \max(1.5, \gamma/2)$. For example, choosing $\gamma = 3$ gives a
bound of 4.5.
Define $\bar{\beta} = \frac{1}{N}\sum_{i=1}^{\Delta} |D_i|$. We can see that $2\bar{\beta}$ is a
lower bound on the optimal number of rounds, since $N\bar{\beta}$ transfers are needed
in total and in each round at most $\lfloor N/2 \rfloor$ data items can be transferred.
The high level description of the algorithm is as follows:

1. Assign $\gamma$ items to each external disk. Send items to their assigned external
disks.
2. For each item $i$, choose disjoint sets $G_i$ of size $\lfloor |D_i|/\bar{\beta} \rfloor$.
3. Send item $i$ to all disks in the set $G_i$.
4. Send item $i$ from the set $G_i$ to all the disks in $D_i$. We also make use of
the copy of item $i$ on the external disk.
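Steps 1 and 2 of this outline can be sketched in Python. This is our own illustration under stated assumptions: internal disks are numbered $0, \dots, N-1$, items are sortable, every $D_i$ is non-empty, external disks are identified by hypothetical indices $0, 1, \dots$, and the greedy choice of $G_i$ simply hands out unused internal disks in order (allowed here because non-essential transfers are permitted). Exact integer arithmetic is used for $\lfloor |D_i|/\bar{\beta} \rfloor = \lfloor |D_i| N / \sum_i |D_i| \rfloor$.

```python
from fractions import Fraction


def external_disk_setup(dests, num_disks, gamma):
    """Steps 1-2 of the external-disk outline (a sketch with hypothetical names).

    Returns (external, beta_bar, groups): `external` maps each item to the index of
    the external disk holding it (gamma items per disk), and `groups` maps each
    item to a set G_i of floor(|D_i| / beta_bar) internal disks, chosen greedily.
    """
    items = sorted(dests)
    external = {item: k // gamma for k, item in enumerate(items)}
    total = sum(len(dest) for dest in dests.values())
    beta_bar = Fraction(total, num_disks)
    unused = iter(range(num_disks))  # internal disks 0, ..., N-1
    groups = {}
    for item in items:
        # floor(|D_i| / beta_bar) computed exactly; these sizes sum to at most N,
        # so `unused` is never exhausted.
        size = len(dests[item]) * num_disks // total
        groups[item] = {next(unused) for _ in range(size)}
    return external, beta_bar, groups


external, beta_bar, groups = external_disk_setup(
    {"x": {0, 1, 2, 3}, "y": {1, 2}, "z": {3}}, num_disks=4, gamma=2)
```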

We now discuss the steps in detail.


The first step can be done in at most $\max(\alpha, \gamma)$ rounds by sending the
items from their primary sources to the external disks (for this step we compute
$\alpha$ as before, except that we can ignore the $\beta_j$ term). The maximum degree
of each disk is at most $\max(\alpha, \gamma)$. Since the graph is bipartite,
transferring the items to their assigned external disks can be finished in
$\max(\alpha, \gamma)$ rounds.
We can easily choose disjoint sets $G_i$, as we are allowed to perform non-essential
transfers (i.e., a disk $j$ can belong to $G_i$ even if $j$ is not in $D_i$). Since
$\sum_i \lfloor |D_i|/\bar{\beta} \rfloor \le \sum_i |D_i|/\bar{\beta} = N$, there are enough
disks, and we can use a simple greedy method to choose the $G_i$'s. Broadcasting
items inside the $G_i$'s can be done in $2M$ rounds as described in Section 2.
The next step is to send the item to all the remaining disks in the $D_i$ sets. We
make a transfer graph as follows: assign to each disk in $G_i$ at most $\bar{\beta}$
disks in $D_i$ so that each disk in $D_i$ is assigned to at most one disk in the set
$G_i$. The number of unassigned disks from each $D_i$ set is at most $\bar{\beta}$.
Assign all of the remaining disks from $D_i$ to the external disk containing that
item. The outdegree of the internal disks is at most $\bar{\beta}$, since each disk
belongs to at most one $G_i$ set. The indegree of each internal disk is at most
$\beta$, since a disk receives an item only if it is in that item's destination set.
The multiplicity between two internal disks is at most 2 (since each disk can
belong to at most one $G_i$ set). So the total degree of each internal disk is at
most $\beta + \bar{\beta}$. Each external disk holds at most $\gamma$ items, and the
number of remaining disks for each item is at most $\bar{\beta}$. So the outdegree of
each external disk is at most $\gamma\bar{\beta} \le \frac{\gamma}{2}\mathrm{OPT}$.
So the maximum degree of each node in the whole graph is at most
$\max(\beta + \bar{\beta}, \gamma\bar{\beta})$, and the maximum number of colors needed to
color this graph is $\frac{1}{2}\max(3, \gamma)\,\mathrm{OPT} + \max(2, \gamma)$. Adding up
all these values, the complete transfer can be done in
$\alpha + 2m' + 3 + \frac{1}{2}\max(3, \gamma)\,\mathrm{OPT} + \max(2, \gamma) \le (3 + \frac{1}{2}\max(3, \gamma))\,\mathrm{OPT} + 2\gamma + O(1)$ rounds.

5 Full Duplex Model
In this section we consider the full duplex communication model. In this model,
we assume that each disk can send and receive at most one item in each round.
In the half-duplex model, we assumed that at each round, a disk can either send
or receive one item (but not both at the same time). In the full duplex model
the communication pattern does not have to induce a matching since directed
cycles are allowed (the direction indicates the data transfer direction).
We develop a 4 + o(1) approximation algorithm for this model. In this model,
given a transfer graph G, we find an optimal migration schedule for G as follows:
Construct a bipartite graph by putting one copy of each disk in each partition.
We call the copy of vertex u in the first partition uA , and in the other partition
uB . We add an edge from uA to vB in the bipartite graph if and only if there
is a directed edge in the transfer graph from u to v. The bipartite graph can be
colored optimally in polynomial time and the number of colors is equal to the
maximum degree of the bipartite graph.
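The reduction above can be sketched in a few lines of Python. In this illustration (our own; the function name and the `(u, v)` pair format are assumptions) we only build the bipartite graph with vertices $u_A$ and $v_B$ and return its maximum degree, which equals the optimal number of rounds for the given transfer graph. Producing the actual schedule additionally requires a bipartite edge-coloring routine.

```python
from collections import Counter


def full_duplex_rounds(transfers):
    """Optimal number of full-duplex rounds for a directed transfer multigraph.

    transfers: list of (u, v) pairs meaning "disk u sends one item to disk v".
    Each transfer u -> v becomes the bipartite edge (u_A, v_B); a bipartite
    multigraph can be edge-colored with max-degree colors, so the answer is the
    maximum degree, i.e. max(max out-degree, max in-degree).
    """
    degree = Counter()
    for u, v in transfers:
        degree[("A", u)] += 1  # u_A: sending copy of disk u
        degree[("B", v)] += 1  # v_B: receiving copy of disk v
    return max(degree.values(), default=0)


print(full_duplex_rounds([(1, 2), (2, 3), (3, 1)]))  # 1: the cycle fits in one round
```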
Note that $\beta$ and $M$ are still lower bounds on the optimal solution in the
full-duplex model. The algorithm is the same as in Section 2 except for the
procedure used to select the primary sources $s_i$:

– For each item $i$, choose a primary source $s_i$ so that
$\alpha' = \max_{j=1,\dots,N} \max(|\{i : j = s_i\}|, \beta_j)$ is minimized. Note that
$\alpha'$ is also a lower bound on the optimal solution. We can find these primary
sources as shown in Lemma 8, by adapting the method used in [4].

We now show how to find the primary sources $s_i$.

Lemma 8. Using network flow, we can choose primary sources that minimize
$\max_{j=1,\dots,N} \max(|\{i : j = s_i\}|, \beta_j)$.

Fig. 2. Computing $\alpha'$.

Proof. Create two vertices s and t (see Figure 2 for an example), one node for each item, and one node for each disk. Add an edge of unit capacity from s to each item node. Add a directed edge of infinite capacity from item j to disk i if $i \in S_j$. Add an edge of capacity $\alpha'$ from each disk node to t. Find the minimum $\alpha'$ (starting from $\alpha' = \beta$, since $\alpha' \ge \beta$ by definition) for which there is a feasible flow of value ∆, i.e., a flow that saturates every edge leaving s. Since all capacities are integral, this flow can be chosen to be integral. For each item j, choose as its primary source $s_j$ the disk to which it sends its one unit of flow.
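The flow computation in this proof can be sketched in Python. This is a minimal illustration, not the authors' implementation; the function name `choose_primary_sources` and the input format (one set of source disks $S_j$ per item, and one value $\beta_d$ per disk, with disks numbered 0 to N−1) are assumptions. Because the edges leaving s have unit capacity and the item-to-disk edges are uncapacitated, augmenting paths alternate between items and disks, so the sketch runs Ford–Fulkerson augmentation directly on that structure instead of building an explicit graph. Like the proof, it starts at $\alpha' = \beta$ and raises $\alpha'$ one step at a time; a flow that is feasible for $\alpha'$ stays feasible for $\alpha' + 1$, so earlier augmentations are kept.

```python
from collections import defaultdict


def choose_primary_sources(sources, beta):
    """Pick a primary source for every item so that
    alpha' = max over disks d of max(#items whose primary source is d, beta[d])
    is as small as possible.

    sources[j] -- set of disks holding item j (the set S_j)
    beta[d]    -- number of items disk d must receive
    Returns (alpha', s) where s[j] is the primary source chosen for item j.
    """
    if any(not S for S in sources):
        raise ValueError("every item needs at least one source disk")
    s = [None] * len(sources)     # s[j]: disk receiving item j's unit of flow
    load = defaultdict(list)      # disk d -> items routed through edge d -> t
    alpha = max(beta, default=0)  # alpha' >= beta, so start the search there

    def assign(j, d):
        if s[j] is not None:
            load[s[j]].remove(j)
        s[j] = d
        load[d].append(j)

    def augment(j, seen):
        # DFS for an augmenting path from item j in the residual network.
        for d in sorted(sources[j]):
            if d in seen:
                continue
            seen.add(d)
            if len(load[d]) < alpha:      # edge d -> t has residual capacity
                assign(j, d)
                return True
            for k in list(load[d]):       # backward edge d -> k: try to reroute k
                if augment(k, seen):      # k moved elsewhere, freeing a slot at d
                    assign(j, d)
                    return True
        return False

    while True:
        progress = True
        while progress:               # augment until no s-t path remains
            progress = False
            for j in range(len(sources)):
                if s[j] is None and augment(j, set()):
                    progress = True
        if None not in s:             # every edge leaving s is saturated
            return alpha, s
        alpha += 1                    # the current flow stays feasible
```

For example, with three items stored only on disk 0, a fourth item stored on disks 0 and 1, and $\beta_0 = \beta_1 = 1$, the sketch returns $\alpha' = 3$ with the fourth item sourced from disk 1. A binary search over $\alpha'$ would also work; the incremental search is kept here because it mirrors the proof.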

Theorem 1. There is a 4 + o(1) approximation algorithm for data migration in the full-duplex model.

Proof. Step 1 (from Si to G0i ) and Step 2 (from G0i to Ri ) still take 2M + O(β/q)
rounds and O(β/q) rounds, respectively. For Step 3, if we construct a bipartite
graph, then the max degree is at most max(α0 , β), which is the number of rounds
required for this step. For Step 4, the maximum degree of the bipartite graph is
0
β + q. Therefore, the total number √ of rounds we need is 2M + max(α , β) + β +
O(β/q) + q. By choosing q = Θ( β), we can obtain a 4 + o(1)-approximation
algorithm.

References
1. E. Anderson, J. Hall, J. Hartline, M. Hobbes, A. Karlin, J. Saia, R. Swaminathan
and J. Wilkes. An Experimental Study of Data Migration Algorithms. Workshop
on Algorithm Engineering, pages 145–158, London, UK, 2001. Springer-Verlag.
2. G. Aggarwal, R. Motwani and A. Zhu. The load rebalancing problem. Symp. on
Parallel Algorithms and Architectures, pages 258–265, 2003.
3. I. D. Baev and R. Rajaraman. Approximation algorithms for data placement in
arbitrary networks. Proc. of ACM-SIAM SODA, pp. 661–670, 2001.
4. S. Khuller, Y.A. Kim and Y.C. Wan. Algorithms for Data Migration with
Cloning, SIAM J. on Computing, Vol. 33, No. 2, pp. 448–461, Feb. 2004.
5. J. A. Bondy and U. S. R. Murty. Graph Theory with Applications. American
Elsevier, New York, 1977.
6. R. Gandhi and J. Mestre. Combinatorial algorithms for Data Migration to minimize
the average completion time. APPROX (2006) (to appear).
7. L. Golubchik, S. Khanna, S. Khuller, R. Thurimella and A. Zhu. Approximation
Algorithms for Data Placement on Parallel Disks. Proc. of ACM-SIAM SODA,
pages 661–670, Washington, D.C., USA, 2000. Society for Industrial and Applied
Mathematics.
8. L. Golubchik, S. Khuller, Y. Kim, S. Shargorodskaya and Y. C. Wan. Data migration
on parallel disks. Proc. of European Symp. on Algorithms (2004). LNCS
3221, pages 689–701. Springer. To appear in Special Issue of Algorithmica from
ESA 2004.
9. S. Guha and K. Munagala. Improved algorithms for the data placement problem.
Proc. of ACM-SIAM SODA, pages 106–107, San Francisco, CA, USA, 2002.
Society for Industrial and Applied Mathematics.
10. J. Hall, J. Hartline, A. Karlin, J. Saia and J. Wilkes. On Algorithms for Efficient
Data Migration. Proc. of ACM-SIAM SODA, pp. 620–629, 2001.
11. S. Kashyap and S. Khuller. Algorithms for Non-Uniform Size Data Placement on
Parallel Disks. FST&TCS Conference, LNCS 2914, pp. 265–276,
2003. Full version to appear in Journal of Algorithms (2006).
12. S. Kashyap, S. Khuller, Y. C. Wan and L. Golubchik. Fast reconfiguration of
data placement in parallel disks. 2006 ALENEX Conference, Jan 2006.
13. S. Khuller, Y. Kim and Y. C. Wan. On Generalized Gossiping and Broadcasting.
ESA Conference, pages 373–384, Budapest, Hungary, 2003. Springer.
14. Y. Kim. Data Migration to minimize the average completion time. Proc. of ACM-
SIAM SODA, pp. 97–98, 2003.
15. A. Meyerson, K. Munagala, and S. A. Plotkin. Web caching using access statis-
tics. In Symposium on Discrete Algorithms, pages 354–363, 2001.
16. H. Shachnai and T. Tamir. On Two Class-constrained Versions of the Multiple
Knapsack Problem. Algorithmica, 29:442–467, 2001.

17. H. Shachnai and T. Tamir. Polynomial Time Approximation Schemes for Class-
constrained Packing Problems. Workshop on Approximation Algorithms, LNCS
1913, pp. 238–249, 2000.
18. C.E. Shannon. A Theorem on Colouring Lines of a Network. J. Math. Phys.,
28:148–151, 1949.
19. D.B. Shmoys and E. Tardos. An Approximation Algorithm for the Generalized
Assignment Problem. Mathematical Programming, A 62, pp. 461–474, 1993.
20. V. G. Vizing. On an Estimate of the Chromatic Class of a p-graph (Russian).
Diskret. Analiz. 3:25–30, 1964.
