


Dominant Resource Fairness: Fair Allocation of Multiple Resource Types

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, Ion Stoica
University of California, Berkeley
{alig,matei,benh,andyk,shenker,istoica}@cs.berkeley.edu

Abstract

We consider the problem of fair resource allocation in a system containing different resource types, where each user may have different demands for each resource. To address this problem, we propose Dominant Resource Fairness (DRF), a generalization of max-min fairness to multiple resource types. We show that DRF, unlike other possible policies, satisfies several highly desirable properties. First, DRF incentivizes users to share resources, by ensuring that no user is better off if resources are equally partitioned among them. Second, DRF is strategy-proof, as a user cannot increase her allocation by lying about her requirements. Third, DRF is envy-free, as no user would want to trade her allocation with that of another user. Finally, DRF allocations are Pareto efficient, as it is not possible to improve the allocation of a user without decreasing the allocation of another user. We have implemented DRF in the Mesos cluster resource manager, and show that it leads to better throughput and fairness than the slot-based fair sharing schemes in current cluster schedulers.

1 Introduction

Resource allocation is a key building block of any shared computer system. One of the most popular allocation policies proposed so far has been max-min fairness, which maximizes the minimum allocation received by a user in the system. Assuming each user has enough demand, this policy gives each user an equal share of the resources. Max-min fairness has been generalized to include the concept of weight, where each user receives a share of the resources proportional to its weight.

The attractiveness of weighted max-min fairness stems from its generality and its ability to provide performance isolation. The weighted max-min fairness model can support a variety of other resource allocation policies, including priority, reservation, and deadline based allocation [31]. In addition, weighted max-min fairness ensures isolation, in that a user is guaranteed to receive her share irrespective of the demand of the other users.
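To make the single-resource baseline concrete, the following is a minimal Python sketch (ours, not from the paper) of weighted max-min fairness computed by progressive filling: every unsatisfied user's allocation grows in proportion to her weight, and capacity freed by satisfied users is redistributed among the rest. The function name and the example numbers are ours.

    def weighted_max_min(capacity, demands, weights):
        """Weighted max-min fair allocation of a single divisible resource.

        capacity: total amount of the resource.
        demands:  per-user demands (caps on how much each user wants).
        weights:  per-user positive weights.
        Returns a list of allocations, one per user.
        """
        n = len(demands)
        alloc = [0.0] * n
        active = set(range(n))            # users whose demand is not yet met
        remaining = float(capacity)
        while active and remaining > 1e-12:
            total_w = sum(weights[i] for i in active)
            # Tentatively give each active user a weight-proportional slice.
            share = {i: remaining * weights[i] / total_w for i in active}
            satisfied = {i for i in active if alloc[i] + share[i] >= demands[i]}
            if not satisfied:
                for i in active:
                    alloc[i] += share[i]
                remaining = 0.0
            else:
                # Cap satisfied users at their demand, then redistribute the rest.
                for i in satisfied:
                    remaining -= demands[i] - alloc[i]
                    alloc[i] = demands[i]
                active -= satisfied
        return alloc

    # Example: 10 units, equal weights; the user asking for 1 unit is fully
    # served and the other two split the remainder equally (4.5 each).
    print(weighted_max_min(10, [1, 8, 8], [1, 1, 1]))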
Given these features, it should come as no surprise that a large number of algorithms have been proposed to implement (weighted) max-min fairness with various degrees of accuracy, such as round-robin, proportional resource sharing [32], and weighted fair queueing [12]. These algorithms have been applied to a variety of resources, including link bandwidth [8, 12, 15, 24, 27, 29], CPU [11, 28, 31], memory [4, 31], and storage [5].

Despite the vast amount of work on fair allocation, the focus has so far been primarily on a single resource type. Even in multi-resource environments, where users have heterogeneous resource demands, allocation is typically done using a single resource abstraction. For example, fair schedulers for Hadoop and Dryad [1, 18, 34], two widely used cluster computing frameworks, allocate resources at the level of fixed-size partitions of the nodes, called slots. This is despite the fact that different jobs in these clusters can have widely different demands for CPU, memory, and I/O resources.

In this paper, we address the problem of fair allocation of multiple types of resources to users with heterogeneous demands. In particular, we propose Dominant Resource Fairness (DRF), a generalization of max-min fairness for multiple resources. The intuition behind DRF is that in a multi-resource environment, the allocation of a user should be determined by the user's dominant share, which is the maximum share that the user has been allocated of any resource. In a nutshell, DRF seeks to maximize the minimum dominant share across all users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF attempts to equalize user A's share of CPUs with user B's share of memory. In the single resource case, DRF reduces to max-min fairness for that resource.

The strength of DRF lies in the properties it satisfies. These properties are trivially satisfied by max-min fairness for a single resource, but are non-trivial in the case of multiple resources. Four such properties are sharing incentive, strategy-proofness, Pareto efficiency, and envy-freeness. DRF provides incentives for users to share resources by guaranteeing that no user is better off in a system in which resources are statically and equally partitioned among users. Furthermore, DRF is strategy-proof, as a user cannot get a better allocation by lying about her resource demands. DRF is Pareto-efficient as it allocates all available resources subject to satisfying the other properties, and without preempting existing allocations. Finally, DRF is envy-free, as no user prefers the allocation of another user. Other solutions violate at least one of the above properties. For example, the preferred [3, 22, 33] fair division mechanism in microeconomic theory, Competitive Equilibrium from Equal Incomes [30], is not strategy-proof.

We have implemented and evaluated DRF in Mesos [16], a resource manager over which multiple cluster computing frameworks, such as Hadoop and MPI, can run. We compare DRF with the slot-based fair sharing scheme used in Hadoop and Dryad and show that slot-based fair sharing can lead to poorer performance, unfairly punishing certain workloads, while providing weaker isolation guarantees.

While this paper focuses on resource allocation in datacenters, we believe that DRF is generally applicable to other multi-resource environments where users have heterogeneous demands, such as in multi-core machines.

The rest of this paper is organized as follows. Section 2 motivates the problem of multi-resource fairness. Section 3 lists fairness properties that we will consider in this paper. Section 4 introduces DRF. Section 5 presents alternative notions of fairness, while Section 6 analyzes the properties of DRF and other policies. Section 7 provides experimental results based on traces from a Facebook Hadoop cluster. We survey related work in Section 8 and conclude in Section 9.

2 Motivation

While previous work on weighted max-min fairness has focused on single resources, the advent of cloud computing and multi-core processors has increased the need for allocation policies for environments with multiple resources and heterogeneous user demands. By multiple resources we mean resources of different types, instead of multiple instances of the same interchangeable resource.

To motivate the need for multi-resource allocation, we plot the resource usage profiles of tasks in a 2000-node Hadoop cluster at Facebook over one month (October 2010) in Figure 1. The placement of a circle in Figure 1 indicates the memory and CPU resources consumed by tasks. The size of a circle is logarithmic to the number of tasks in the region of the circle. Though the majority of tasks are CPU-heavy, there exist tasks that are memory-heavy as well, especially for reduce operations.

[Figure 1 (bubble plot of per-task CPU demand in cores vs. per-task memory demand in GB, for map and reduce tasks): CPU and memory demands of tasks in a 2000-node Hadoop cluster at Facebook over one month (October 2010). Each bubble's size is logarithmic in the number of tasks in its region.]

Existing fair schedulers for clusters, such as Quincy [18] and the Hadoop Fair Scheduler [2, 34], ignore the heterogeneity of user demands, and allocate resources at the granularity of slots, where a slot is a fixed fraction of a node. This leads to inefficient allocation as a slot is more often than not a poor match for the task demands.

Figure 2 quantifies the level of fairness and isolation provided by the Hadoop MapReduce fair scheduler [2, 34]. The figure shows the CDFs of the ratio between the task CPU demand and the slot CPU share, and of the ratio between the task memory demand and the slot memory share. We compute the slot memory and CPU shares by simply dividing the total amount of memory and CPUs by the number of slots. A ratio of 1 corresponds to a perfect match between the task demands and slot resources, a ratio below 1 corresponds to tasks underutilizing their slot resources, and a ratio above 1 corresponds to tasks over-utilizing their slot resources, which may lead to thrashing. Figure 2 shows that most of the tasks either underutilize or overutilize some of their slot resources. Modifying the number of slots per machine will not solve the problem as this may result either in a lower overall utilization or more tasks experiencing poor performance due to over-utilization (see Section 7).

[Figure 2 (CDF of tasks vs. ratio of task demand to resource per slot, for CPU demand and memory demand): CDF of demand to slot ratio in a 2000-node cluster at Facebook over a one month period (October 2010). A demand to slot ratio of 2.0 represents a task that requires twice as much CPU (or memory) than the slot CPU (or memory) size.]
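The demand-to-slot ratios plotted in Figure 2 are straightforward to compute. The sketch below illustrates the calculation for one machine configuration; the machine and task numbers are made up for illustration and are not taken from the Facebook trace.

    # Hypothetical numbers for illustration only.
    machine_cpus, machine_mem_gb, slots_per_machine = 16, 32, 12

    slot_cpu = machine_cpus / slots_per_machine     # CPU share of one slot
    slot_mem = machine_mem_gb / slots_per_machine   # memory share of one slot

    tasks = [(1, 4), (3, 1), (2, 2)]                # (CPUs, GB) per task, hypothetical

    for cpu, mem in tasks:
        # Ratios above 1 mean the task overcommits its slot; below 1, it wastes it.
        print(f"task ⟨{cpu} CPU, {mem} GB⟩: "
              f"cpu ratio {cpu / slot_cpu:.2f}, mem ratio {mem / slot_mem:.2f}")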
3 Allocation Properties

We now turn our attention to designing a max-min fair allocation policy for multiple resources and heterogeneous requests. To illustrate the problem, consider a system consisting of 9 CPUs and 18 GB RAM, and two users: user A runs tasks that require ⟨1 CPU, 4 GB⟩ each, and user B runs tasks that require ⟨3 CPUs, 1 GB⟩ each.

What constitutes a fair allocation policy for this case? One possibility would be to allocate each user half of every resource. Another possibility would be to equalize the aggregate (i.e., CPU plus memory) allocations of each user. While it is relatively easy to come up with a variety of possible "fair" allocations, it is unclear how to evaluate and compare these allocations.

To address this challenge, we start with a set of desirable properties that we believe any resource allocation policy for multiple resources and heterogeneous demands should satisfy. We then let these properties guide the development of a fair allocation policy. We have found the following four properties to be important:

1. Sharing incentive: Each user should be better off sharing the cluster, than exclusively using her own partition of the cluster. Consider a cluster with identical nodes and n users. Then a user should not be able to allocate more tasks in a cluster partition consisting of 1/n of all resources.

2. Strategy-proofness: Users should not be able to benefit by lying about their resource demands. This provides incentive compatibility, as a user cannot improve her allocation by lying.

3. Envy-freeness: A user should not prefer the allocation of another user. This property embodies the notion of fairness [13, 30].

4. Pareto efficiency: It should not be possible to increase the allocation of a user without decreasing the allocation of at least another user. This property is important as it leads to maximizing system utilization subject to satisfying the other properties.

We briefly comment on the strategy-proofness and sharing incentive properties, which we believe are of special importance in datacenter environments. Anecdotal evidence from cloud operators that we have talked with indicates that strategy-proofness is important, as it is common for users to attempt to manipulate schedulers. For example, one of Yahoo!'s Hadoop MapReduce datacenters has different numbers of slots for map and reduce tasks. A user discovered that the map slots were contended, and therefore launched all his jobs as long reduce phases, which would manually do the work that MapReduce does in its map phase. Another big search company provided dedicated machines for jobs only if the users could guarantee high utilization. The company soon found that users would sprinkle their code with infinite loops to artificially inflate utilization levels.

Furthermore, any policy that satisfies the sharing incentive property also provides performance isolation, as it guarantees a minimum allocation to each user (i.e., a user cannot do worse than owning 1/n of the cluster) irrespective of the demands of the other users.

It can be easily shown that in the case of a single resource, max-min fairness satisfies all the above properties. However, achieving these properties in the case of multiple resources and heterogeneous user demands is not trivial. For example, the preferred fair division mechanism in microeconomic theory, Competitive Equilibrium from Equal Incomes [22, 30, 33], is not strategy-proof (see Section 6.1.2).

In addition to the above properties, we consider four other nice-to-have properties:

• Single resource fairness: For a single resource, the solution should reduce to max-min fairness.

• Bottleneck fairness: If there is one resource that is percent-wise demanded most of by every user, then the solution should reduce to max-min fairness for that resource.

• Population monotonicity: When a user leaves the system and relinquishes her resources, none of the allocations of the remaining users should decrease.

• Resource monotonicity: If more resources are added to the system, none of the allocations of the existing users should decrease.

4 Dominant Resource Fairness (DRF)

We propose Dominant Resource Fairness (DRF), a new allocation policy for multiple resources that meets all four of the required properties in the previous section. For every user, DRF computes the share of each resource allocated to that user. The maximum among all shares of a user is called that user's dominant share, and the resource corresponding to the dominant share is called the dominant resource. Different users may have different dominant resources.
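A user's dominant share follows directly from her allocation vector and the cluster's capacity vector. The small helper below is our own illustration of the definition; the function name is ours.

    def dominant_share(allocation, capacity):
        """Return (dominant share, index of the dominant resource).

        allocation: resources currently held by the user, e.g. [CPUs, GB].
        capacity:   total cluster resources in the same order.
        """
        shares = [a / c for a, c in zip(allocation, capacity)]
        s = max(shares)
        return s, shares.index(s)

    # User A from Section 4.1 after three tasks of ⟨1 CPU, 4 GB⟩:
    print(dominant_share([3, 12], [9, 18]))   # -> (0.666..., 1): memory is dominant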
For example, the dominant resource of a user running a computation-bound job is CPU, while the dominant resource of a user running an I/O-bound job is bandwidth.¹ DRF simply applies max-min fairness across users' dominant shares. That is, DRF seeks to maximize the smallest dominant share in the system, then the second-smallest, and so on.

We start by illustrating DRF with an example (§4.1), then present an algorithm for DRF (§4.2) and a definition of weighted DRF (§4.3). In Section 5, we present two other allocation policies: asset fairness, a straightforward policy that aims to equalize the aggregate resources allocated to each user, and competitive equilibrium from equal incomes (CEEI), a popular fair allocation policy preferred in the micro-economic domain [22, 30, 33].

In this section, we consider a computation model with n users and m resources. Each user runs individual tasks, and each task is characterized by a demand vector, which specifies the amount of resources required by the task, e.g., ⟨1 CPU, 4 GB⟩. In general, tasks (even the ones belonging to the same user) may have different demands.

4.1 An Example

Consider a system with 9 CPUs, 18 GB RAM, and two users, where user A runs tasks with demand vector ⟨1 CPU, 4 GB⟩, and user B runs tasks with demand vector ⟨3 CPUs, 1 GB⟩ each.

In the above scenario, each task from user A consumes 1/9 of the total CPUs and 2/9 of the total memory, so user A's dominant resource is memory. Each task from user B consumes 1/3 of the total CPUs and 1/18 of the total memory, so user B's dominant resource is CPU. DRF will equalize users' dominant shares, giving the allocation in Figure 3: three tasks for user A, with a total of ⟨3 CPUs, 12 GB⟩, and two tasks for user B, with a total of ⟨6 CPUs, 2 GB⟩. With this allocation, each user ends up with the same dominant share, i.e., user A gets 2/3 of RAM, while user B gets 2/3 of the CPUs.

[Figure 3 (stacked bars over 9 CPUs and 18 GB: user A holds 3 CPUs and 12 GB, user B holds 6 CPUs and 2 GB): DRF allocation for the example in Section 4.1.]

This allocation can be computed mathematically as follows. Let x and y be the number of tasks allocated by DRF to users A and B, respectively. Then user A receives ⟨x CPUs, 4x GB⟩, while user B gets ⟨3y CPUs, y GB⟩. The total amount of resources allocated to both users is (x + 3y) CPUs and (4x + y) GB. Also, the dominant shares of users A and B are 4x/18 = 2x/9 and 3y/9 = y/3, respectively (their corresponding shares of memory and CPU). The DRF allocation is then given by the solution to the following optimization problem:

    max (x, y)                 (Maximize allocations)
    subject to
        x + 3y ≤ 9             (CPU constraint)
        4x + y ≤ 18            (Memory constraint)
        2x/9 = y/3             (Equalize dominant shares)

Solving this problem yields² x = 3 and y = 2. Thus, user A gets ⟨3 CPUs, 12 GB⟩ and B gets ⟨6 CPUs, 2 GB⟩.

Note that DRF need not always equalize users' dominant shares. When a user's total demand is met, that user will not need more tasks, so the excess resources will be split among the other users, much like in max-min fairness. In addition, if a resource gets exhausted, users that do not need that resource can still continue receiving higher shares of the other resources. We present an algorithm for DRF allocation in the next section.

4.2 DRF Scheduling Algorithm

Algorithm 1 shows pseudo-code for DRF scheduling.

    Algorithm 1 DRF pseudo-code

        R = ⟨r_1, ..., r_m⟩                          ▷ total resource capacities
        C = ⟨c_1, ..., c_m⟩                          ▷ consumed resources, initially 0
        s_i (i = 1..n)                               ▷ user i's dominant share, initially 0
        U_i = ⟨u_{i,1}, ..., u_{i,m}⟩ (i = 1..n)     ▷ resources given to user i, initially 0

        pick user i with lowest dominant share s_i
        D_i ← demand of user i's next task
        if C + D_i ≤ R then
            C = C + D_i                              ▷ update consumed vector
            U_i = U_i + D_i                          ▷ update i's allocation vector
            s_i = max_{j=1..m} { u_{i,j} / r_j }
        else
            return                                   ▷ the cluster is full
        end if

The algorithm tracks the total resources allocated to each user as well as the user's dominant share, s_i. At each step, DRF picks the user with the lowest dominant share among those with tasks ready to run. If that user's task demand can be satisfied, i.e., there are enough resources available in the system, one of her tasks is launched.

¹ A user may have the same share on multiple resources, and might therefore have multiple dominant resources.
² Note that given the last constraint (i.e., 2x/9 = y/3), allocations x and y are simultaneously maximized.
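Algorithm 1 is compact enough to run directly. The following is a small Python transcription of it (our own sketch, not the Mesos implementation). Replaying the two-user example from Section 4.1 ends in the same allocation as Table 1 (three tasks for user A and two for user B), though the order of tie-broken picks may differ.

    def drf_schedule(capacity, demands, max_steps=1000):
        """Greedy DRF scheduling (Algorithm 1) for users with fixed demand vectors.

        capacity: total resources, e.g. [9, 18] for 9 CPUs and 18 GB RAM.
        demands:  one demand vector per user, e.g. [[1, 4], [3, 1]].
        Returns (tasks launched per user, resources held per user).
        """
        n, m = len(demands), len(capacity)
        consumed = [0.0] * m
        alloc = [[0.0] * m for _ in range(n)]
        tasks = [0] * n

        def dominant_share(u):
            return max(alloc[u][j] / capacity[j] for j in range(m))

        for _ in range(max_steps):
            i = min(range(n), key=dominant_share)    # lowest dominant share first
            d = demands[i]
            if any(consumed[j] + d[j] > capacity[j] for j in range(m)):
                break                                # her next task no longer fits
            for j in range(m):
                consumed[j] += d[j]
                alloc[i][j] += d[j]
            tasks[i] += 1
        return tasks, alloc

    tasks, alloc = drf_schedule([9, 18], [[1, 4], [3, 1]])
    print(tasks)   # [3, 2]: three tasks for user A, two for user B
    print(alloc)   # [[3.0, 12.0], [6.0, 2.0]], i.e. ⟨3 CPUs, 12 GB⟩ and ⟨6 CPUs, 2 GB⟩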
    Schedule | User A res. shares | dom. share | User B res. shares | dom. share | CPU total alloc. | RAM total alloc.
    User B   | ⟨0, 0⟩             | 0          | ⟨3/9, 1/18⟩        | 1/3        | 3/9              | 1/18
    User A   | ⟨1/9, 4/18⟩        | 2/9        | ⟨3/9, 1/18⟩        | 1/3        | 4/9              | 5/18
    User A   | ⟨2/9, 8/18⟩        | 4/9        | ⟨3/9, 1/18⟩        | 1/3        | 5/9              | 9/18
    User B   | ⟨2/9, 8/18⟩        | 4/9        | ⟨6/9, 2/18⟩        | 2/3        | 8/9              | 10/18
    User A   | ⟨3/9, 12/18⟩       | 2/3        | ⟨6/9, 2/18⟩        | 2/3        | 1                | 14/18

Table 1: Example of DRF allocating resources in a system with 9 CPUs and 18 GB RAM to two users running tasks that require ⟨1 CPU, 4 GB⟩ and ⟨3 CPUs, 1 GB⟩, respectively. Each row corresponds to DRF making a scheduling decision. A row shows the shares of each user for each resource, the user's dominant share, and the fraction of each resource allocated so far. DRF repeatedly selects the user with the lowest dominant share (shown in the Schedule column) to launch a task, until no more tasks can be allocated.

We consider the general case in which a user can have tasks with different demand vectors, and we use variable D_i to denote the demand vector of the next task user i wants to launch. For simplicity, the pseudo-code does not capture the event of a task finishing. In this case, the user releases the task's resources and DRF again selects the user with the smallest dominant share to run her task.

Consider the two-user example in Section 4.1. Table 1 illustrates the DRF allocation process for this example. DRF first picks B to run a task. As a result, the shares of B become ⟨3/9, 1/18⟩, and the dominant share becomes max(3/9, 1/18) = 1/3. Next, DRF picks A, as her dominant share is 0. The process continues until it is no longer possible to run new tasks. In this case, this happens as soon as CPU has been saturated.

At the end of the above allocation, user A gets ⟨3 CPUs, 12 GB⟩, while user B gets ⟨6 CPUs, 2 GB⟩, i.e., each user gets 2/3 of its dominant resource.

Note that in this example the allocation stops as soon as any resource is saturated. However, in the general case, it may be possible to continue to allocate tasks even after some resource has been saturated, as some tasks might not have any demand on the saturated resource.

The above algorithm can be implemented using a binary heap that stores each user's dominant share. Each scheduling decision then takes O(log n) time for n users.

4.3 Weighted DRF

In practice, there are many cases in which allocating resources equally across users is not the desirable policy. Instead, we may want to allocate more resources to users running more important jobs, or to users that have contributed more resources to the cluster. To achieve this goal, we propose Weighted DRF, a generalization of both DRF and weighted max-min fairness.

With Weighted DRF, each user i is associated with a weight vector W_i = ⟨w_{i,1}, ..., w_{i,m}⟩, where w_{i,j} represents the weight of user i for resource j. The definition of the dominant share for user i changes to s_i = max_j {u_{i,j} / w_{i,j}}, where u_{i,j} is user i's share of resource j. A particular case of interest is when all the weights of user i are equal, i.e., w_{i,j} = w_i (1 ≤ j ≤ m). In this case, the ratio between the dominant shares of users i and j will be simply w_i / w_j. If the weights of all users are set to 1, Weighted DRF reduces trivially to DRF.
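As a small illustration of the definition above, the weighted dominant share only changes the normalization used when comparing users. The helper below is our own sketch; the function name is ours.

    def weighted_dominant_share(allocation, capacity, weights):
        """Dominant share under Weighted DRF.

        allocation: resources held by the user, e.g. [CPUs, GB].
        capacity:   total cluster resources.
        weights:    the user's per-resource weights w_{i,j}.
        """
        return max((a / c) / w for a, c, w in zip(allocation, capacity, weights))

    # With unit weights this is the ordinary dominant share; doubling a user's
    # weights halves her weighted dominant share, so the scheduler will keep
    # offering her resources for roughly twice as long before moving on.
    print(weighted_dominant_share([3, 12], [9, 18], [1, 1]))   # 0.666...
    print(weighted_dominant_share([3, 12], [9, 18], [2, 2]))   # 0.333...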
5 Alternative Fair Allocation Policies

Defining a fair allocation in a multi-resource system is not an easy question, as the notion of "fairness" is itself open to discussion. In our efforts, we considered numerous allocation policies before settling on DRF as the only one that satisfies all four of the required properties in Section 3: sharing incentive, strategy-proofness, Pareto efficiency, and envy-freeness. In this section, we consider two of the alternatives we have investigated: Asset Fairness, a simple and intuitive policy that aims to equalize the aggregate resources allocated to each user, and Competitive Equilibrium from Equal Incomes (CEEI), the policy of choice for fairly allocating resources in the microeconomic domain [22, 30, 33]. We compare these policies with DRF in Section 5.3.

5.1 Asset Fairness

The idea behind Asset Fairness is that equal shares of different resources are worth the same, i.e., that 1% of all CPUs is worth the same as 1% of memory and 1% of I/O bandwidth. Asset Fairness then tries to equalize the aggregate resource value allocated to each user. In particular, Asset Fairness computes for each user i the aggregate share x_i = Σ_j s_{i,j}, where s_{i,j} is the share of resource j given to user i. It then applies max-min across users' aggregate shares, i.e., it repeatedly launches tasks for the user with the minimum aggregate share.

Consider the example in Section 4.1. Since there are twice as many GB of RAM as CPUs (i.e., 9 CPUs and 18 GB RAM), one CPU is worth twice as much as one GB of RAM. Supposing that one GB is worth $1 and one CPU is worth $2, it follows that user A spends $6 for each task, while user B spends $7. Let x and y be the number of tasks allocated by Asset Fairness to users A and B, respectively. Then the asset-fair allocation is given by the solution to the following optimization problem:

    max (x, y)                 (Maximize allocations)
    subject to
        x + 3y ≤ 9             (CPU constraint)
        4x + y ≤ 18            (Memory constraint)
        6x = 7y                (Every user spends the same)

Solving the above problem yields x = 2.52 and y = 2.16. Thus, user A gets ⟨2.5 CPUs, 10.1 GB⟩, while user B gets ⟨6.5 CPUs, 2.2 GB⟩.

While this allocation policy seems compelling in its simplicity, it has a significant drawback: it violates the sharing incentive property. As we show in Section 6.1.1, asset fairness can result in one user getting less than 1/n of all resources, where n is the total number of users.

5.2 Competitive Equilibrium from Equal Incomes

In microeconomic theory, the preferred method to fairly divide resources is Competitive Equilibrium from Equal Incomes (CEEI) [22, 30, 33]. With CEEI, each user receives initially 1/n of every resource, and subsequently, each user trades her resources with other users in a perfectly competitive market.³ The outcome of CEEI is both envy-free and Pareto efficient [30].

More precisely, the CEEI allocation is given by the Nash bargaining solution⁴ [22, 23]. The Nash bargaining solution picks the feasible allocation that maximizes Π_i u_i(a_i), where u_i(a_i) is the utility that user i gets from her allocation a_i. To simplify the comparison, we assume that the utility that a user gets from her allocation is simply her dominant share, s_i.

Consider again the two-user example in Section 4.1. Recall that the dominant share of user A is 4x/18 = 2x/9 while the dominant share of user B is 3y/9 = y/3, where x is the number of tasks given to A and y is the number of tasks given to B. Maximizing the product of the dominant shares is equivalent to maximizing the product x · y. Thus, CEEI aims to solve the following optimization problem:

    max (x · y)                (Maximize Nash product)
    subject to
        x + 3y ≤ 9             (CPU constraint)
        4x + y ≤ 18            (Memory constraint)

Solving the above problem yields x = 45/11 and y = 18/11. Thus, user A gets ⟨4.1 CPUs, 16.4 GB⟩, while user B gets ⟨4.9 CPUs, 1.6 GB⟩.

Unfortunately, while CEEI is envy-free and Pareto efficient, it turns out that it is not strategy-proof, as we will show in Section 6.1.2. Thus, users can increase their allocations by lying about their resource demands.

³ A perfect market satisfies the price-taking (i.e., no single user affects prices) and market-clearance (i.e., matching supply and demand via price adjustment) assumptions.
⁴ For this to hold, utilities have to be homogeneous, i.e., u(α x) = α u(x) for α > 0, which is true in our case.

5.3 Comparison with DRF

To give the reader an intuitive understanding of Asset Fairness and CEEI, we compare their allocations for the example in Section 4.1 to that of DRF in Figure 4.

[Figure 4 (stacked bars of CPU and memory shares for users A and B under a) DRF, b) Asset Fairness, and c) CEEI): Allocations given by DRF, Asset Fairness and CEEI in the example scenario in Section 4.1.]

We see that DRF equalizes the dominant shares of the users, i.e., user A's memory share and user B's CPU share. In contrast, Asset Fairness equalizes the total fraction of resources allocated to each user, i.e., the areas of the rectangles for each user in the figure. Finally, because CEEI assumes a perfectly competitive market, it finds a solution satisfying market clearance, where every resource has been allocated. Unfortunately, this exact property makes it possible to cheat CEEI: a user can claim she needs more of some underutilized resource even when she does not, leading CEEI to give more tasks overall to this user to achieve market clearance.

6 Analysis

In this section, we discuss which of the properties presented in Section 3 are satisfied by Asset Fairness, CEEI, and DRF. We also evaluate the accuracy of DRF when task sizes do not match the available resources exactly.

6.1 Fairness Properties

Table 2 summarizes the fairness properties that are satisfied by Asset Fairness, CEEI, and DRF. The Appendix contains the proofs of the main properties of DRF, while our technical report [14] contains a more complete list of results for DRF and CEEI. In the remainder of this section, we discuss some of the interesting missing entries in the table, i.e., properties violated by each of these disciplines. In particular, we show through examples why Asset Fairness and CEEI lack the properties that they do, and we prove that no policy can provide resource monotonicity without violating either sharing incentive or Pareto efficiency, to explain why DRF lacks resource monotonicity.
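The three optimization problems above are small enough to check by brute force. The sketch below (ours) scans over the number of user-A tasks x for the 9-CPU / 18-GB example and recovers the allocations reported in Sections 4.1, 5.1, and 5.2.

    xs = [i / 10000.0 for i in range(45001)]        # x from 0 to 4.5 in steps of 1e-4

    def feasible_y(x):
        # Largest y satisfying the constraints of Section 4.1:
        #   x + 3y <= 9 (CPU)  and  4x + y <= 18 (memory)
        return min((9 - x) / 3.0, 18 - 4 * x)

    # DRF: equalize dominant shares, 2x/9 = y/3  =>  y = 2x/3; maximize x.
    drf_x = max(x for x in xs if 2 * x / 3 <= feasible_y(x) + 1e-9)
    # Asset Fairness: equal spending, 6x = 7y  =>  y = 6x/7; maximize x.
    asset_x = max(x for x in xs if 6 * x / 7 <= feasible_y(x) + 1e-9)
    # CEEI: maximize the Nash product x * y over the feasible region.
    ceei_x = max(xs, key=lambda x: x * max(feasible_y(x), 0.0))

    print("DRF   x, y =", drf_x, round(2 * drf_x / 3, 2))                  # ~3.0, 2.0
    print("Asset x, y =", asset_x, round(6 * asset_x / 7, 2))              # ~2.52, 2.16
    print("CEEI  x, y =", round(ceei_x, 2), round(feasible_y(ceei_x), 2))  # ~4.09, 1.64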
    Property                  | Asset | CEEI | DRF
    Sharing Incentive         |       |  ✓   |  ✓
    Strategy-proofness        |  ✓    |      |  ✓
    Envy-freeness             |  ✓    |  ✓   |  ✓
    Pareto efficiency         |  ✓    |  ✓   |  ✓
    Single Resource Fairness  |  ✓    |  ✓   |  ✓
    Bottleneck Fairness       |       |  ✓   |  ✓
    Population Monotonicity   |  ✓    |      |  ✓
    Resource Monotonicity     |       |      |

Table 2: Properties of Asset Fairness, CEEI and DRF.

6.1.1 Properties Violated by Asset Fairness

While being the simplest policy, Asset Fairness violates several important properties: sharing incentive, bottleneck fairness, and resource monotonicity. Next, we use examples to show the violation of these properties.

Theorem 1 Asset Fairness violates the sharing incentive property.

Proof Consider the following example, illustrated in Figure 5: two users in a system with ⟨30, 30⟩ total resources have demand vectors D_1 = ⟨1, 3⟩ and D_2 = ⟨1, 1⟩. Asset fairness will allocate the first user 6 tasks and the second user 12 tasks. The first user will receive ⟨6, 18⟩ resources, while the second will use ⟨12, 12⟩. While each user gets an equal aggregate share of 24/60, the second user gets less than half (15) of both resources. This violates the sharing incentive property, as the second user would be better off to statically partition the cluster and own half of the nodes. □

[Figure 5 (stacked bars of resource 1 and resource 2 for users 1 and 2): Example showing that Asset Fairness can fail to meet the sharing incentive property. Asset Fairness gives user 2 less than half of both resources.]

Theorem 2 Asset Fairness violates the bottleneck fairness property.

Proof Consider a scenario with a total resource vector of ⟨21, 21⟩ and two users with demand vectors D_1 = ⟨3, 2⟩ and D_2 = ⟨4, 1⟩, making resource 1 the bottleneck resource. Asset fairness will give each user 3 tasks, equalizing their aggregate usage to 15. However, this only gives the first user 3/7 of resource 1 (the contended bottleneck resource), violating bottleneck fairness. □

Theorem 3 Asset fairness does not satisfy resource monotonicity.

Proof Consider two users A and B with demands ⟨4, 2⟩ and ⟨1, 1⟩ and 77 units of two resources. Asset fairness allocates A a total of ⟨44, 22⟩ and B ⟨33, 33⟩, equalizing their sum of shares to 66/77. If resource two is doubled, both users' share of the second resource is halved, while the first resource is saturated. Asset fairness now decreases A's allocation to ⟨42, 21⟩ and increases B's to ⟨35, 35⟩, equalizing their shares to 42/77 + 21/154 = 35/77 + 35/154 = 105/154. Thus resource monotonicity is violated. □
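The example in the proof of Theorem 1 can be checked mechanically. The sketch below (ours) greedily launches tasks for the user with the smallest aggregate share, which is one straightforward way to realize Asset Fairness for this example; it ends with the ⟨6, 18⟩ / ⟨12, 12⟩ split described above, leaving user 2 with less than half of both resources.

    def asset_fair(capacity, demands):
        """Greedy Asset Fairness: repeatedly run a task for the user whose
        aggregate share (sum of her per-resource shares) is smallest."""
        n, m = len(demands), len(capacity)
        used = [0.0] * m
        alloc = [[0.0] * m for _ in range(n)]
        while True:
            order = sorted(range(n),
                           key=lambda u: sum(alloc[u][j] / capacity[j]
                                             for j in range(m)))
            # Launch a task for the poorest user whose task still fits.
            for i in order:
                if all(used[j] + demands[i][j] <= capacity[j] for j in range(m)):
                    for j in range(m):
                        used[j] += demands[i][j]
                        alloc[i][j] += demands[i][j]
                    break
            else:
                return alloc    # no task fits any more

    print(asset_fair([30, 30], [[1, 3], [1, 1]]))
    # -> [[6.0, 18.0], [12.0, 12.0]]: user 2 holds only 12/30 < 1/2 of each
    # resource, so she would be better off with a static half of the cluster.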
6.1.2 Properties Violated by CEEI

While CEEI is envy-free and Pareto efficient, it turns out that it is not strategy-proof. Intuitively, this is because CEEI assumes a perfectly competitive market that achieves market clearance, i.e., matching of supply and demand and allocation of all the available resources. This can lead to CEEI giving much higher shares to users that use more of a less-contended resource in order to fully utilize that resource. Thus, a user can claim that she needs more of some underutilized resource to increase her overall share of resources. We illustrate this below.

Theorem 4 CEEI is not strategy-proof.

Proof Consider the following example, shown in Figure 6. Assume a total resource vector of ⟨100, 100⟩, and two users with demands ⟨16, 1⟩ and ⟨1, 2⟩. In this case, CEEI allocates 100/31 and 1500/31 tasks to each user respectively (approximately 3.2 and 48.8 tasks). If user 1 changes her demand vector to ⟨16, 8⟩, asking for more of resource 2 than she actually needs, CEEI gives the users 25/6 and 100/3 tasks respectively (approximately 4.2 and 33.3 tasks). Thus, user 1 improves her number of tasks from 3.2 to 4.2 by lying about her demand vector. User 2 suffers because of this, as her task allocation decreases. □

[Figure 6 (stacked bars of resource 1 and resource 2 for users 1 and 2, a) with truthful demands and b) with user 1 lying): Example showing how CEEI violates strategy-proofness. User 1 can increase her share by claiming that she needs more of resource 2 than she actually does.]

In addition, for the same intuitive reason (market clearance), we have the following result:

Theorem 5 CEEI violates population monotonicity.

Proof Consider the total resource vector ⟨100, 100⟩ and three users with the following demand vectors D_1 = ⟨4, 1⟩, D_2 = ⟨1, 16⟩, and D_3 = ⟨16, 1⟩ (see Figure 7). CEEI will yield the allocation A_1 = ⟨11.3, 5.4, 3.1⟩, where the numbers represent the number of tasks allocated to each user. If user 3 leaves the system and relinquishes her resources, CEEI gives the new allocation A_2 = ⟨23.8, 4.8⟩, which has made user 2 worse off than in A_1. □

[Figure 7 (stacked bars of resource 1 and resource 2, a) with 3 users and b) after user 3 leaves): Example showing that CEEI violates population monotonicity. When user 3 leaves, CEEI changes the allocation from a) to b), lowering the share of user 2.]

6.1.3 Resource Monotonicity vs. Sharing Incentive and Pareto Efficiency

As shown in Table 2, DRF achieves all the properties except resource monotonicity. Rather than being a limitation of DRF, this is a consequence of the fact that sharing incentive, Pareto efficiency, and resource monotonicity cannot be achieved simultaneously. Since we consider the first two of these properties to be more important (see Section 3) and since adding new resources to a system is a relatively rare event, we chose to satisfy sharing incentive and Pareto efficiency, and give up resource monotonicity. In particular, we have the following result.

Theorem 6 No allocation policy that satisfies the sharing incentive and Pareto efficiency properties can also satisfy resource monotonicity.

Proof We use a simple example to prove this property. Consider two users A and B with symmetric demands ⟨2, 1⟩ and ⟨1, 2⟩, respectively, and assume equal amounts of both resources. Sharing incentive requires that user A gets at least half of resource 1 and user B gets half of resource 2. By Pareto efficiency, we know that at least one of the two users must be allocated more resources. Without loss of generality, assume that user A is given more than half of resource 1 (a symmetric argument holds if user B is given more than half of resource 2). If the total amount of resource 2 is now increased by a factor of 4, user B is no longer getting her guaranteed share of half of resource 2. Now, the only feasible allocation that satisfies the sharing incentive is to give both users half of resource 1, which would require decreasing user A's share of resource 1, thus violating resource monotonicity. □

This theorem explains why both DRF and CEEI violate resource monotonicity.

6.2 Discrete Resource Allocation

So far, we have implicitly assumed one big resource pool whose resources can be allocated in arbitrarily small amounts. Of course, this is often not the case in practice. For example, clusters consist of many small machines, where resources are allocated to tasks in discrete amounts. In the remainder of this section, we refer to these two scenarios as the continuous and the discrete scenario, respectively. We now turn our attention to how fairness is affected in the discrete scenario.

Assume a cluster consisting of K machines. Let max-task denote the maximum demand vector across all demand vectors, i.e., max-task = ⟨max_i {d_{i,1}}, max_i {d_{i,2}}, ..., max_i {d_{i,m}}⟩. Assume further that any task can be scheduled on every machine, i.e., the total amount of resources on each machine is at least max-task. We only consider the case when each user has strictly positive demands. Given these assumptions, we have the following result.

Theorem 7 In the discrete scenario, it is possible to allocate resources such that the difference between the allocations of any two users is bounded by one max-task compared to the continuous allocation scenario.

Proof Assume we start allocating resources on one machine at a time, and that we always allocate a task to the user with the lowest dominant share. As long as there is at least a max-task available on the first machine, we continue to allocate a task to the next user with the least dominant share. Once the available resources on the first machine become less than a max-task size, we move to the next machine and repeat the process. When the allocation completes, the difference between two users' allocations of their dominant resources compared to the continuous scenario is at most max-task. If this were not the case, then some user A would have more than a max-task discrepancy with respect to another user B. However, this cannot be the case, because the last time A was allocated a task, B should have been allocated a task instead. □
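The constructive argument in the proof of Theorem 7 translates directly into a machine-by-machine scheduler. The sketch below (ours) fills one machine at a time, always launching a task for the user with the lowest cluster-wide dominant share, and moves on once less than a max-task remains on the current machine. The machine sizes in the example are hypothetical.

    def discrete_drf(machines, demands):
        """Machine-by-machine DRF for discrete tasks (the construction used in
        the proof of Theorem 7).

        machines: list of per-machine capacity vectors, e.g. [[16, 32], [16, 32]].
        demands:  one fixed demand vector per user.
        """
        m = len(machines[0])
        total = [sum(mc[j] for mc in machines) for j in range(m)]
        max_task = [max(d[j] for d in demands) for j in range(m)]
        alloc = [[0.0] * m for _ in demands]

        def dom(u):
            return max(alloc[u][j] / total[j] for j in range(m))

        for cap in machines:
            free = list(cap)
            # Keep packing this machine while a full max-task still fits on it.
            while all(free[j] >= max_task[j] for j in range(m)):
                i = min(range(len(demands)), key=dom)
                for j in range(m):
                    free[j] -= demands[i][j]
                    alloc[i][j] += demands[i][j]
        return alloc

    # Two machines of ⟨16 cores, 32 GB⟩ and the two users of Section 4.1:
    print(discrete_drf([[16, 32], [16, 32]], [[1, 4], [3, 1]]))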
7 Experimental Results

This section evaluates DRF through micro- and macro-benchmarks. The former is done through experiments running an implementation of DRF in the Mesos cluster resource manager [16]. The latter is done using trace-driven simulations.

We start by showing how DRF dynamically adjusts the shares of jobs with different resource demands in Section 7.1. In Section 7.2, we compare DRF against slot-level fair sharing (as implemented by the Hadoop Fair Scheduler [34] and Quincy [18]), and against CPU-only fair sharing. Finally, in Section 7.3, we use Facebook traces to compare DRF and the Hadoop Fair Scheduler in terms of utilization and job completion time.

7.1 Dynamic Resource Sharing

In our first experiment, we show how DRF dynamically shares resources between jobs with different demands. We ran two jobs on a 48-node Mesos cluster on Amazon EC2, using "extra large" instances with 4 CPU cores and 15 GB of RAM. We configured Mesos to allocate up to 4 CPUs and 14 GB of RAM on each node, leaving 1 GB for the OS. We submitted two jobs that launched tasks with different resource demands at different times during a 6-minute interval.

[Figure 8 (three time series over 300 s: (a) job 1's CPU share, memory share, and dominant share; (b) the same for job 2; (c) both jobs' dominant shares): CPU, memory and dominant share for two jobs.]

Figures 8 (a) and 8 (b) show the CPU and memory allocations given to each job as a function of time, while Figure 8 (c) shows their dominant shares. In the first 2 minutes, job 1 uses ⟨1 CPU, 10 GB RAM⟩ per task and job 2 uses ⟨1 CPU, 1 GB RAM⟩ per task. Job 1's dominant resource is RAM, while job 2's dominant resource is CPU. Note that DRF equalizes the jobs' shares of their dominant resources. In addition, because the jobs have different dominant resources, their dominant shares exceed 50%, i.e., job 1 uses around 70% of the RAM while job 2 uses around 75% of the CPUs. Thus, the jobs benefit from running in a shared cluster as opposed to taking half the nodes each. This captures the essence of the sharing incentive property.

After 2 minutes, the task sizes of both jobs change, to ⟨2 CPUs, 4 GB⟩ for job 1 and ⟨1 CPU, 3 GB⟩ for job 2. Now, both jobs' dominant resource is CPU, so DRF equalizes their CPU shares. Note that DRF switches allocations dynamically by having Mesos offer resources to the job with the smallest dominant share as tasks finish.

Finally, after 2 more minutes, the task sizes of both jobs change again: ⟨1 CPU, 7 GB⟩ for job 1 and ⟨1 CPU, 4 GB⟩ for job 2. Both jobs' dominant resource is now memory, so DRF tries to equalize their memory shares. The reason the shares are not exactly equal is resource fragmentation (see Section 6.2).
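The dynamic behaviour described above only requires re-running the "pick the lowest dominant share" step whenever resources are released. The following toy event-driven loop is our own sketch of that idea, not the Mesos implementation; the per-user task duration, the horizon, and the single-node example are all hypothetical.

    import heapq

    def drf_with_completions(capacity, demands, durations, horizon):
        """Toy event-driven DRF: tasks run for a fixed duration, and freed
        resources are immediately re-offered to the user with the lowest
        dominant share. Returns the number of tasks launched per user."""
        n, m = len(demands), len(capacity)
        used = [0.0] * m
        alloc = [[0.0] * m for _ in range(n)]
        launched = [0] * n
        finish = []                       # (finish_time, user) heap
        t = 0.0
        while t < horizon:
            # Launch tasks while the lowest-dominant-share user's task fits.
            while True:
                i = min(range(n), key=lambda u: max(alloc[u][j] / capacity[j]
                                                    for j in range(m)))
                if any(used[j] + demands[i][j] > capacity[j] for j in range(m)):
                    break
                for j in range(m):
                    used[j] += demands[i][j]
                    alloc[i][j] += demands[i][j]
                launched[i] += 1
                heapq.heappush(finish, (t + durations[i], i))
            if not finish:
                break
            # Advance to the next completion and release its resources.
            t, i = heapq.heappop(finish)
            for j in range(m):
                used[j] -= demands[i][j]
                alloc[i][j] -= demands[i][j]
        return launched

    # Two jobs with the first-phase demands of Section 7.1 on one ⟨4 CPU, 14 GB⟩ node:
    print(drf_with_completions([4, 14], [[1, 10], [1, 1]], [10.0, 10.0], 60.0))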
7.2 DRF vs. Alternative Allocation Policies

We next evaluate DRF with respect to two alternative schemes: slot-based fair scheduling (a common policy in current systems, such as the Hadoop Fair Scheduler [34] and Quincy [18]) and (max-min) fair sharing applied only to a single resource (CPU). For the experiment, we ran a 48-node Mesos cluster on EC2 instances with 8 CPU cores and 7 GB RAM each. We configured Mesos to allocate 8 CPUs and 6 GB RAM on each node, leaving 1 GB free for the OS. We implemented these three scheduling policies as Mesos allocation modules.

We ran a workload with two classes of users, representing two organizational entities with different workloads. One of the entities had four users submitting small jobs with task demands ⟨1 CPU, 0.5 GB⟩. The other entity had four users submitting large jobs with task demands ⟨2 CPUs, 2 GB⟩. Each job consisted of 80 tasks. As soon as a job finished, the user would launch another job with similar demands. Each experiment ran for ten minutes. At the end, we computed the number of completed jobs of each type, as well as their response times.

For the slot-based allocation scheme, we varied the number of slots per machine from 3 to 6 to see how it affected performance. Figures 9 through 12 show our results. In Figures 9 and 10, we compare the number of jobs of each type completed for each scheduling scheme in ten minutes. In Figures 11 and 12, we compare average response times.

[Figure 9 (bar chart over DRF, 3-6 slots, and CPU-fair): Number of large jobs completed for each allocation scheme in our comparison of DRF against slot-based fair sharing and CPU-only fair sharing.]
[Figure 10 (bar chart over DRF, 3-6 slots, and CPU-fair): Number of small jobs completed for each allocation scheme in our comparison of DRF against slot-based fair sharing and CPU-only fair sharing.]
[Figure 11 (bar chart over DRF, 3-6 slots, and CPU-fair): Average response time (in seconds) of large jobs for each allocation scheme in our comparison of DRF against slot-based fair sharing and CPU-only fair sharing.]
[Figure 12 (bar chart over DRF, 3-6 slots, and CPU-fair): Average response time (in seconds) of small jobs for each allocation scheme in our comparison of DRF against slot-based fair sharing and CPU-only fair sharing.]

Several trends are apparent from the data. First, with slot-based scheduling, both the throughput and job response times are worse than with DRF, regardless of the number of slots. This is because with a low slot count, the scheduler can undersubscribe nodes (e.g., launch only 3 small tasks on a node), while with a large slot count, it can oversubscribe them (e.g., launch 4 large tasks on a node and cause swapping because each task needs 2 GB and the node only has 6 GB). Second, with fair sharing at the level of CPUs, the number of small jobs executed is similar to DRF, but there are far fewer large jobs executed, because memory is overcommitted on some machines and leads to poor performance for all the high-memory tasks running there. Overall, the DRF-based scheduler that is aware of both resources has the lowest response times and highest overall throughput.

7.3 Simulations using Facebook Traces

Next we use log traces from a 2000-node cluster at Facebook, containing data for a one week period (October 2010). The data consists of Hadoop MapReduce jobs. We assume task duration, CPU usage, and memory consumption are identical to the original trace. The traces are simulated on a smaller cluster of 400 nodes to reach higher utilization levels, such that fairness becomes relevant. Each node in the cluster consists of 12 slots, 16 cores, and 32 GB memory. Figure 13 shows a short 300-second sub-sample to visualize how CPU and memory utilization look for the same workload when using DRF compared to Hadoop's fair scheduler (slots). As shown in the figure, DRF provides higher utilization, as it is able to better match resource allocations with task demands.

Figure 14 shows the reduction of the average job completion times for DRF as compared to the Hadoop fair scheduler. The workload is quite heavy on small jobs, which experience no improvement (i.e., -3%). This is because small jobs typically consist of a single execution phase, and the completion time is dominated by the longest task. Thus completion time is hard to improve for such small jobs. In contrast, the completion times of the larger jobs reduce by as much as 66%. This is because these jobs consist of many phases, and thus they can benefit from the higher utilization achieved by DRF.

8 Related Work

We briefly review related work in computer science and economics.

While many papers in computer science focus on multi-resource fairness, they only consider multiple instances of the same interchangeable resource, e.g., CPU [6, 7, 35] and bandwidth [10, 20, 21]. Unlike these approaches, we focus on the allocation of resources of different types.
[Figure 13 (time series of memory utilization and CPU utilization under DRF and under slot-based scheduling): CPU and memory utilization for DRF and slot fairness for a trace from a Facebook Hadoop cluster.]

[Figure 14 (bar chart of completion time reduction by job size in tasks, ranging from -3% for the smallest jobs to 66% for the largest): Average reduction of the completion times for different job sizes for a trace from a Facebook Hadoop cluster.]

Quincy [18] is a scheduler developed in the context of the Dryad cluster computing framework [17]. Quincy achieves fairness by modeling the fair scheduling problem as a min-cost flow problem. Quincy does not currently support multi-resource fairness. In fact, as mentioned in the discussion section of the paper [18, pg. 17], it appears difficult to incorporate multi-resource requirements into the min-cost flow formulation.

Hadoop currently provides two fair sharing schedulers [1, 2, 34]. Both these schedulers allocate resources at the slot granularity, where a slot is a fixed fraction of the resources on a machine. As a result, these schedulers cannot always match the resource allocations with the tasks' demands, especially when these demands are widely heterogeneous. As we have shown in Section 7, this mismatch may lead to either low cluster utilization or poor performance due to resource oversubscription.

In the microeconomic literature, the problem of equity has been studied within and outside of the framework of game theory. The books by Young [33] and Moulin [22] are entirely dedicated to these topics and provide good introductions. The preferred method of fair division in microeconomics is CEEI [3, 33, 22], as introduced by Varian [30]. We have therefore devoted considerable attention to it in Section 5.2. CEEI's main drawback compared to DRF is that it is not strategy-proof. As a result, users can manipulate the scheduler by lying about their demands.

Many of the fair division policies proposed in the microeconomics literature are based on the notion of utility and, hence, focus on the single metric of utility. In the economics literature, max-min fairness is known as the lexicographic ordering [26, 25] (leximin) of utilities. The question is what the user utilities are in the multi-resource setting, and how to compare such utilities. One natural way is to define utility as the number of tasks allocated to a user. But modeling utilities this way, together with leximin, violates many of the fairness properties we proposed. Viewed in this light, DRF makes two contributions. First, it suggests using the dominant share as a proxy for utility, which is equalized using the standard leximin ordering. Second, we prove that this scheme is strategy-proof for such utility functions. Note that the leximin ordering is a lexicographic version of the Kalai-Smorodinsky (KS) solution [19]. Thus, our result shows that KS is strategy-proof for such utilities.

9 Conclusion and Future Work

We have introduced Dominant Resource Fairness (DRF), a fair sharing model that generalizes max-min fairness to multiple resource types. DRF allows cluster schedulers to take into account the heterogeneous demands of datacenter applications, leading to both fairer allocation of resources and higher utilization than existing solutions that allocate identical resource slices (slots) to all tasks. DRF satisfies a number of desirable properties. In particular, DRF is strategy-proof, so that users are incentivized to report their demands accurately. DRF also incentivizes users to share resources by ensuring that users perform at least as well in a shared cluster as they would in smaller, separate clusters. Other schedulers that we investigated, as well as alternative notions of fairness from the microeconomic literature, fail to satisfy all of these properties.

We have evaluated DRF by implementing it in the Mesos resource manager, and shown that it can lead to better overall performance than the slot-based fair schedulers that are commonly in use today.

9.1 Future Work

There are several interesting directions for future research. First, in cluster environments with discrete tasks, one interesting problem is to minimize resource fragmentation without compromising fairness. This problem is similar to bin-packing, but where one must pack as many items (tasks) as possible subject to meeting DRF. A second direction involves defining fairness when tasks have placement constraints, such as machine preferences. Given the current trend of multi-core machines,
a third interesting research direction is to explore the use of DRF as an operating system scheduler. Finally, from a microeconomic perspective, a natural direction is to investigate whether DRF is the only possible strategy-proof policy for multi-resource fairness, given other desirable properties such as Pareto efficiency.

10 Acknowledgements

We thank Eric J. Friedman, Hervé Moulin, John Wilkes, and the anonymous reviewers for their invaluable feedback. We thank Facebook for making available their traces. This research was supported by California MICRO, California Discovery, the Swedish Research Council, the Natural Sciences and Engineering Research Council of Canada, a National Science Foundation Graduate Research Fellowship,⁵ and the RAD Lab sponsors: Google, Microsoft, Oracle, Amazon, Cisco, Cloudera, eBay, Facebook, Fujitsu, HP, Intel, NetApp, SAP, VMware, and Yahoo!.

⁵ Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.

References

[1] Hadoop Capacity Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html.
[2] Hadoop Fair Scheduler. http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html.
[3] Personal communication with Hervé Moulin.
[4] A. K. Agrawala and R. M. Bryant. Models of memory scheduling. In SOSP '75, 1975.
[5] J. Axboe. Linux Block IO - Present and Future (Completely Fair Queueing). In Ottawa Linux Symposium 2004, pages 51-61, 2004.
[6] S. K. Baruah, N. K. Cohen, C. G. Plaxton, and D. A. Varvel. Proportionate progress: A notion of fairness in resource allocation. Algorithmica, 15(6):600-625, 1996.
[7] S. K. Baruah, J. Gehrke, and C. G. Plaxton. Fast scheduling of periodic tasks on multiple resources. In IPPS '95, 1995.
[8] J. Bennett and H. Zhang. WF²Q: Worst-case fair weighted fair queueing. In INFOCOM, 1996.
[9] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall, second edition, 1992.
[10] J. M. Blanquer and B. Özden. Fair queuing for aggregated multiple links. SIGCOMM '01, 31(4):189-197, 2001.
[11] B. Caprita, W. C. Chan, J. Nieh, C. Stein, and H. Zheng. Group ratio round-robin: O(1) proportional share scheduling for uniprocessor and multiprocessor systems. In USENIX Annual Technical Conference, 2005.
[12] A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. In SIGCOMM '89, pages 1-12, New York, NY, USA, 1989. ACM.
[13] D. Foley. Resource allocation and the public sector. Yale Economic Essays, 7(1):73-76, 1967.
[14] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. Technical Report UCB/EECS-2011-18, EECS Department, University of California, Berkeley, Mar 2011.
[15] P. Goyal, H. Vin, and H. Cheng. Start-time fair queuing: A scheduling algorithm for integrated services packet switching networks. IEEE/ACM Transactions on Networking, 5(5):690-704, Oct. 1997.
[16] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI, 2011.
[17] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, 2007.
[18] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In SOSP '09, 2009.
[19] E. Kalai and M. Smorodinsky. Other Solutions to Nash's Bargaining Problem. Econometrica, 43(3):513-518, 1975.
[20] J. M. Kleinberg, Y. Rabani, and É. Tardos. Fairness in routing and load balancing. J. Comput. Syst. Sci., 63(1):2-20, 2001.
[21] Y. Liu and E. W. Knightly. Opportunistic fair scheduling over multiple wireless channels. In INFOCOM, 2003.
[22] H. Moulin. Fair Division and Collective Welfare. The MIT Press, 2004.
[23] J. Nash. The Bargaining Problem. Econometrica, 18(2):155-162, April 1950.
[24] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control - the single node case. ACM/IEEE Transactions on Networking, 1(3):344-357, June 1993.
[25] E. A. Pazner and D. Schmeidler. Egalitarian equivalent allocations: A new concept of economic equity. Quarterly Journal of Economics, 92:671-687, 1978.
[26] A. Sen. Rawls Versus Bentham: An Axiomatic Examination of the Pure Distribution Problem. Theory and Decision, 4(1):301-309, 1974.
[27] M. Shreedhar and G. Varghese. Efficient fair queuing using deficit round robin. IEEE Trans. Net, 1996.
[28] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. Baruah, J. Gehrke, and G. Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In IEEE RTSS 96, 1996.
[29] I. Stoica, S. Shenker, and H. Zhang. Core-stateless fair queueing: Achieving approximately fair bandwidth allocations in high speed networks. In SIGCOMM, 1998.
[30] H. Varian. Equity, envy, and efficiency. Journal of Economic Theory, 9(1):63-91, 1974.
[31] C. A. Waldspurger. Lottery and Stride Scheduling: Flexible Proportional Share Resource Management. PhD thesis, MIT, Laboratory of Computer Science, Sept. 1995. MIT/LCS/TR-667.
[32] C. A. Waldspurger and W. E. Weihl. Lottery scheduling: flexible proportional-share resource management. In OSDI '94, 1994.
[33] H. P. Young. Equity: in theory and practice. Princeton University Press, 1994.
[34] M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In EuroSys '10, 2010.
[35] D. Zhu, D. Mossé, and R. G. Melhem. Multiple-Resource Periodic Scheduling Problem: how much fairness is necessary? In IEEE RTSS, 2003.

5 Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.

A Appendix: DRF Properties

In this appendix, we present the main properties of DRF. The technical report [14] contains a more complete list of results for DRF and CEEI. For context, the following table summarizes the properties satisfied by Asset Fairness, CEEI, and DRF, respectively.

In this section, we assume that all users have an unbounded number of tasks. In addition, we assume that all tasks of a user have the same demand vector, and we will refer to this vector as the user's demand vector.

Next, we present progressive filling [9], a simple technique to achieve a DRF allocation when all resources are arbitrarily divisible. This technique is instrumental in proving our results.

A.1 Progressive Filling for DRF

Progressive filling is an idealized algorithm to achieve max-min fairness in a system in which resources can be allocated in arbitrarily small amounts [9, pg. 450]. It was originally used in a networking context, but we now adapt it to our problem domain. In the case of DRF, progressive filling increases all users' dominant shares at the same rate, while increasing their other resource allocations proportionally to their task demand vectors, until at least one resource is saturated. At this point, the allocations of all users using the saturated resource are frozen, and progressive filling continues recursively after eliminating these users. The process terminates when there are no longer any users whose dominant shares can be increased.

Progressive filling for DRF is equivalent to the scheduling algorithm presented in Figure 1 after appropriately scaling the users' demand vectors. In particular, each user's demand vector is scaled such that allocating resources to a user according to her scaled demand vector will increase her dominant share by a fixed $\epsilon$, which is the same for all users. Let $D_i = \langle d_{i,1}, d_{i,2}, \ldots, d_{i,m} \rangle$ be the demand vector of user $i$, let resource $k$ be her dominant resource6 with total capacity $r_k$, and let $s_i = d_{i,k}/r_k$ be her dominant share. We then scale the demand vector of user $i$ by $\epsilon/s_i$, i.e., $D_i' = (\epsilon/s_i) \cdot D_i = (\epsilon/s_i) \cdot \langle d_{i,1}, d_{i,2}, \ldots, d_{i,m} \rangle$. Thus, every time a task of user $i$ is selected, she is allocated an amount $(\epsilon/s_i) \cdot d_{i,k} = \epsilon \cdot r_k$ of her dominant resource. This means that the share of the dominant resource of user $i$ increases by $(\epsilon \cdot r_k)/r_k = \epsilon$, as expected.

6 Recall that in this section we assume that all tasks of a user have the same demand vector.
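To make the mechanics concrete, the following is a minimal sketch of progressive filling for divisible resources. It is an illustration added here, not the scheduler of Figure 1; the name drf_progressive_filling and its interface are our own. The sketch grows the dominant shares of all unfrozen users at the same rate, freezes every user who demands a newly saturated resource, and repeats until no user can grow further.

    # Sketch of progressive filling for DRF with arbitrarily divisible resources.
    # The function name and interface are illustrative, not taken from the paper.
    def drf_progressive_filling(capacities, demands, eps=1e-12):
        """capacities: total amount r_k of each resource.
        demands: one per-task demand vector D_i per user; every user is assumed
        to demand a positive amount of at least one resource.
        Returns the users' dominant shares and their allocations, expressed as
        fractions of each resource."""
        m, n = len(capacities), len(demands)
        # norm[i][k]: fraction of resource k consumed by user i per unit of
        # dominant share (equals 1 for her dominant resource).
        norm = []
        for d in demands:
            frac = [d[k] / capacities[k] for k in range(m)]
            dom = max(frac)
            norm.append([f / dom for f in frac])
        s = [0.0] * n             # current dominant share of each user
        used = [0.0] * m          # fraction of each resource already allocated
        active = set(range(n))    # users whose dominant shares can still grow
        while active:
            # Rate at which each resource fills while active users grow together.
            rate = [sum(norm[i][k] for i in active) for k in range(m)]
            # Largest common increment of dominant shares before some resource saturates.
            delta = min((1.0 - used[k]) / rate[k] for k in range(m) if rate[k] > eps)
            for i in active:
                s[i] += delta
            for k in range(m):
                used[k] += delta * rate[k]
            saturated = [k for k in range(m) if used[k] >= 1.0 - eps]
            # Freeze every active user who demands a saturated resource.
            active = {i for i in active if all(norm[i][k] < eps for k in saturated)}
        shares = [[s[i] * norm[i][k] for k in range(m)] for i in range(n)]
        return s, shares

Freezing only the users that demand the saturated resource is what lets the remaining users continue to grow, matching the recursive description above.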
A.2 Allocation Properties

We start with a preliminary result.

Lemma 8 Every user in a DRF allocation has at least one saturated resource.

Proof Assume this is not the case, i.e., none of the resources used by user $i$ is saturated. However, this contradicts the assumption that progressive filling has completed the computation of the DRF allocation. Indeed, as long as none of the resources of user $i$ is saturated, progressive filling will continue to increase the allocation of user $i$ (and of all the other users sharing only non-saturated resources). □

Recall that progressive filling always allocates the resources to a user proportionally to the user's demand vector. More precisely, let $D_i = \langle d_{i,1}, d_{i,2}, \ldots, d_{i,m} \rangle$ be the demand vector of user $i$. Then, at any time $t$ during the progressive filling process, the allocation of user $i$ is proportional to the demand vector,

$$A_i(t) = \alpha_i(t) \cdot D_i = \alpha_i(t) \cdot \langle d_{i,1}, d_{i,2}, \ldots, d_{i,m} \rangle, \qquad (1)$$

where $\alpha_i(t)$ is a positive scalar.

Now, we are in a position to prove the DRF properties.

Theorem 9 DRF is Pareto efficient.

Proof Assume user $i$ can increase her dominant share, $s_i$, without decreasing the dominant share of anyone else. According to Lemma 8, user $i$ has at least one saturated resource. If no other user is using the saturated resource, then we are done, as it would be impossible to increase $i$'s share of the saturated resource. If other users are using the saturated resource, then increasing the allocation of $i$ would result in decreasing the allocation of at least another user $j$ sharing the same saturated resource. Since, under progressive filling, the resources allocated to any user are proportional to her demand vector (see Eq. 1), decreasing the allocation of any resource used by user $j$ will also decrease $j$'s dominant share. This contradicts our hypothesis, and therefore proves the result. □

Theorem 10 DRF satisfies the sharing incentive and bottleneck fairness properties.

Proof Consider a system consisting of $n$ users. Assume resource $k$ is the first one to be saturated by progressive filling. Let $i$ be the user allocated the largest share of resource $k$, and let $t_{i,k}$ denote her share of $k$. Since resource $k$ is saturated, we trivially have $t_{i,k} \ge \frac{1}{n}$.
Furthermore, by the definition of the dominant share, we have $s_i \ge t_{i,k} \ge \frac{1}{n}$. Since progressive filling increases the allocation of each user's dominant resource at the same rate, it follows that each user gets at least $\frac{1}{n}$ of her dominant resource. Thus, DRF satisfies the sharing incentive property. If all users have the same dominant resource, each user gets exactly $\frac{1}{n}$ of that resource. As a result, DRF satisfies the bottleneck fairness property as well. □
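As a concrete illustration (the capacities and demands below are chosen by us for the example and do not appear in the proof), consider a system with $\langle 9 \text{ CPUs}, 18 \text{ GB} \rangle$ and two users with per-task demands $\langle 1 \text{ CPU}, 4 \text{ GB} \rangle$ and $\langle 3 \text{ CPUs}, 1 \text{ GB} \rangle$. Progressive filling equalizes the dominant shares at $2/3$: user 1 receives $\langle 3 \text{ CPUs}, 12 \text{ GB} \rangle$ (3 tasks) and user 2 receives $\langle 6 \text{ CPUs}, 2 \text{ GB} \rangle$ (2 tasks). Each user's dominant share, $2/3$, is at least $1/n = 1/2$, as the sharing incentive property requires.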
Theorem 11 Every DRF allocation is envy-free.

Proof Assume by contradiction that user $i$ envies another user $j$. For user $i$ to envy user $j$, user $j$ must have a strictly higher share of every resource that $i$ wants; otherwise $i$ cannot run more tasks under $j$'s allocation. This means that user $j$'s dominant share is strictly larger than user $i$'s dominant share. Since every resource allocated to user $i$ is also allocated to user $j$, this means that user $j$ cannot reach her saturated resource after user $i$, i.e., $t_j \le t_i$, where $t_k$ is the time at which user $k$'s allocation gets frozen due to saturation. However, if $t_j \le t_i$, then under progressive filling the dominant shares of users $j$ and $i$ are equal at time $t_j$, after which the dominant share of user $i$ can only increase, violating the hypothesis. □

Theorem 12 (Strategy-proofness) A user cannot increase her dominant share in DRF by altering her true demand vector.

Proof Assume user $i$ can increase her dominant share by using a demand vector $\hat{d}_i \neq d_i$. Let $a_{i,j}$ and $\hat{a}_{i,j}$ denote the amount of resource $j$ that user $i$ is allocated by progressive filling when she uses the vector $d_i$ and $\hat{d}_i$, respectively. For user $i$ to be better off using $\hat{d}_i$, we need $\hat{a}_{i,k} > a_{i,k}$ for every resource $k$ where $d_{i,k} > 0$. Let $r$ denote the first resource that becomes saturated for user $i$ when she uses the demand vector $d_i$. If no other user is allocated resource $r$ ($a_{j,r} = 0$ for all $j \neq i$), this contradicts the hypothesis, as user $i$ is already allocated the entire resource $r$ and thus cannot increase her allocation of $r$ using another demand vector $\hat{d}_i$. Thus, assume there are other users that have been allocated $r$ ($a_{j,r} > 0$ for some $j \neq i$). In this case, progressive filling will eventually saturate $r$ at time $t$ when using $d_i$, and at time $t'$ when using demand $\hat{d}_i$. Recall that the dominant share is the maximum of a user's shares; thus $i$ must have a higher dominant share in the allocation $\hat{a}$ than in $a$. Hence $t' > t$, as progressive filling increases the dominant share at a constant rate. This implies that $i$, when using $\hat{d}_i$, does not saturate any resource before time $t'$, and hence does not affect other users' allocations before time $t'$. Thus, when $i$ uses $\hat{d}_i$, any user $m$ using resource $r$ has allocation $a_{m,r}$ at time $t$. Therefore, at time $t$, there is only $a_{i,r}$ amount of $r$ left for user $i$, which contradicts the assumption that $\hat{a}_{i,r} > a_{i,r}$. □

The strategy-proofness of DRF shows that a user will not be better off by demanding resources that she does not need. The following example shows that excess demand can in fact hurt a user's allocation, leading to a lower dominant share. Consider a cluster with two resources and 10 users, the first with demand vector $\langle 1, 0 \rangle$ and the rest with demand vectors $\langle 0, 1 \rangle$. The first user gets the entire first resource, while the rest of the users each get $\frac{1}{9}$ of the second resource. If user 1 instead changes her demand vector to $\langle 1, 1 \rangle$, she can only be allocated $\frac{1}{10}$ of each resource, and the rest of the users get $\frac{1}{10}$ of the second resource.
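The numbers in this example can be reproduced with the drf_progressive_filling sketch from Section A.1 (again, an illustration we added, assumed to be in scope here; both resources are given capacity 1):

    caps = [1.0, 1.0]
    truthful = [[1, 0]] + [[0, 1]] * 9
    inflated = [[1, 1]] + [[0, 1]] * 9
    s_true, _ = drf_progressive_filling(caps, truthful)
    s_lie, _ = drf_progressive_filling(caps, inflated)
    print(s_true[0], s_true[1])  # 1.0 and 0.111...: user 1 gets all of resource 1,
                                 # every other user gets 1/9 of resource 2
    print(s_lie[0], s_lie[1])    # 0.1 and 0.1: inflating the demand to <1, 1>
                                 # drops user 1's dominant share to 1/10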
In practice, the situation can be exacerbated, as resources in datacenters are typically partitioned across different physical machines, leading to fragmentation. Artificially increasing one's demand might lead to a situation in which, while there are enough resources on the whole, there are not enough on any single machine to satisfy the new demand. See Section 6.2 for more information.

Next, for simplicity, we assume strictly positive demand vectors, i.e., the demand of every user for every resource is non-zero.

Theorem 13 Given strictly positive demand vectors, DRF guarantees that every user gets the same dominant share, i.e., every DRF allocation ensures $s_i = s_j$ for all users $i$ and $j$.

Proof Progressive filling increases every user's dominant resource allocation at the same rate until one of the resources becomes saturated. At this point, no more resources can be allocated to any user, as every user demands a positive amount of the saturated resource. □
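As a quick numerical check of Theorem 13, again using the drf_progressive_filling sketch from Section A.1, with capacities and strictly positive demands chosen arbitrarily for illustration:

    caps = [10.0, 20.0, 5.0]
    demands = [[1, 4, 0.5], [2, 1, 0.25], [0.5, 3, 1]]
    s, _ = drf_progressive_filling(caps, demands)
    print(s)  # [0.5, 0.5, 0.5]: the first saturated resource freezes everyone,
              # so all dominant shares are equal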
Theorem 14 Given strictly positive demands, DRF satisfies population monotonicity.

Proof Consider any DRF allocation. Non-zero demands imply that all users have the same saturated resource(s). Consider removing a user and relinquishing her currently allocated resources, which is some amount of every resource. Since all users have the same dominant share $\alpha$, any new allocation which decreases any user $i$'s dominant share below $\alpha$ would, due to Pareto efficiency, have to allocate another user $j$ a dominant share of more than $\alpha$. The resulting allocation would violate max-min fairness, as it would be possible to increase $i$'s dominant share by decreasing the allocation of $j$, who already has a higher dominant share than $i$. □

However, we note that in the absence of strictly positive demand vectors, DRF no longer satisfies the population monotonicity property [14].