0% found this document useful (0 votes)

18 views15 pages

A Policy Evaluation Tool For Multisite Resource Management

Uploaded by

Junliang Chen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views15 pages

A Policy Evaluation Tool For Multisite Resource Management

Uploaded by

Junliang Chen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

1352 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO.

10, OCTOBER 2008

A Policy Evaluation Tool for

Multisite Resource Management
Mudhakar Srivatsa, Member, IEEE, Nithya Rajamani, and Murthy Devarakonda, Senior Member, IEEE

Abstract—Enterprises typically operate multiple data center sites, each handling workloads according to an enterprise-level strategy.
Sharing resources across multiple sites (or enterprises) brings up several important problems. Each site may have its own policies that
govern its interactions with other remote sites. Different policies impact the system performance in different ways, and site administrators
and system designers need to understand the effects of a given set of policies on potential workloads. In this paper, we describe an
analysis methodology that determines the impact of policies on workloads, and we present results and validation for a prototypical
multisite resource sharing system. Our analytical tool is capable of evaluating complex policies on a large-scale system and permits
independent policies for each site so that policy makers can quickly evaluate several alternatives and their effects on the workloads
before deploying them.

Index Terms—Modeling and prediction, performance of systems, policy impact analysis, resource sharing, distributed systems.

1 INTRODUCTION

E NTERPRISES typically operate multiple data center sites [2],

[35], each handling workloads according to an enter-
prise-level strategy. The individual sites are federated and
Sharing of resources across multiple sites (or enterprises)
brings up several important problems. In order to preserve
the autonomy of each site, it is necessary that all decisions
autonomous. Each workload has a Service-Level Agreement about the local resources should be made by the site resource
(SLA) [36], written in terms of priority and performance manager. Hence, each site may have its own policies that
requirements, and it caters to a specific customer base. govern its interactions with other remote sites, e.g., when,
Workloads are managed by workload managers such as the what, and how resources can be shared with the other sites.
IBM WebSphere Extended Deployment (XD Edition) [7], Different policies impact the SLA violations experienced by
which are responsible for distributing individual requests of local and remote workloads in different ways. The site
a workload among resources currently assigned to this administrators and system designers need to understand the
workload. In order to facilitate resource sharing among the effects of a given set of policies on different workloads. We
workloads at a site, resource managers such as the IBM describe here an analysis tool for studying fast and scalable
Tivoli Intelligent Orchestrator (TIO) [5] arbitrates resources policy impact analysis, and we present results and valida-
among the workload managers at that site and are tion for a prototypical multisite resource sharing system.
responsible for performing resource allocation across multi- Our results are useful for system designers in building
ple workload managers at its local site. effective solution strategies, and our methodology can be
The next level of resource sharing happens across different incorporated into a planning and system management tool
sites of an enterprise. The resource manager at one site may that would permit policy makers to understand, fine-tune,
coordinate with its counterparts at remote sites for remote and analyze the effect of policies on the system. Our tool is
resource sharing. Enterprises can benefit from such multisite capable of evaluating complex policies on a large-scale
resource sharing, as underutilized resources at one site can be system and permits independent policies for each site so
used to handle peak load at remote sites, which would that policy makers can quickly evaluate several alternatives
otherwise require overprovisioning of resources. and their effects on workloads before deploying them.
A variation of the multisite enterprise scenario is the We have developed our methodology based on extensive
service provider scenario, where one site provides resources work in the field of performance analysis [28], [37], [32] and
for a set of client sites. Yet, another variation is where load sharing in distributed systems [25], [13], [18]. The main
multiple sites operate on a peer-to-peer basis but do not solution methodology used by our tool is a combination of
belong to a single enterprise. Although the work described closed queuing network and finite-state automaton. We
here is in the context of the multisite single enterprise model the system’s state transitions as a nondeterministic
scenario, it is equally applicable to other variations. finite-state automaton. A site’s policies are used to identify
the set of valid system states and set of valid state transitions.
. The authors are with the IBM T.J. Watson Research Center, Yorktown These transitions are annotated with a rate at which the
Heights, NY 10598. E-mail: {msrivats, rnithya, mdev}@us.ibm.com. transition occurs to construct a queuing network. The
Manuscript received 22 June 2007; revised 4 Oct. 2007; accepted 22 Jan. 2008; solution to the queuing network gives the steady-state
published online 15 Feb. 2008. probability that the system remains in any particular state.
Recommended for acceptance by M. Parashar. The steady-state solution is directly used to study the effect of
For information on obtaining reprints of this article, please send e-mail to:
[email protected], and reference IEEECS Log Number TPDS-2007-06-0206. these policies on workloads (e.g., average-case profit, worst
Digital Object Identifier no. 10.1109/TPDS.2008.29. case loss, etc.).
1045-9219/08/$25.00 ß 2008 IEEE Published by the IEEE Computer Society
Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
SRIVATSA ET AL.: A POLICY EVALUATION TOOL FOR MULTISITE RESOURCE MANAGEMENT 1353

Fig. 1. Multisite resource management.

However, queuing-network-based analysis is faced with notion of site, workload, event, policy, and cost in Section 4.
a state-space explosion problem, that is, the number of We present algorithms for policy evaluation and combating
states in the queuing network may explode to large state-space explosion in Section 5. Section 6 describes
numbers, making fast and scalable policy analysis a experiments that quantify the scalability of our approach,
challenging problem. In this paper, we introduce the notion followed by a detailed collection of case studies. Section 7
of near-equivalent states and present a tuneable state-space validates our policy evaluation tool against a real imple-
compression algorithm that simultaneously achieves the mentation. Finally, we conclude in Section 8.
following goals: 1) reduce the number of states in the
queuing network by several orders of magnitude and 2 RELATED WORK
thereby facilitate fast and scalable policy analysis, 2) retain
the accuracy of steady-state solution to the queuing network Although load sharing in clusters and processor sharing [34],
and thus preserve the integrity of the overall cost analysis, [29] are widely studied, a methodology for analyzing
and 3) identify bottleneck resources in the system and thus multisite resource sharing under policy constraints is
facilitate the system administrator to perform more intelli- relatively new. The most closely related work is in the field
of grid computing [1], [23], [22]. An open grid services
gent capacity planning and policy tuning. We used our
architecture (OGSA) [24], [21] develops a framework (for
methodology to analyze several multisite resource usage
both commercial and scientific grids) to support distributed
policies. The tool was useful not only in understanding the
system integration, virtualization, and management ser-
effects of various policies but also in characterizing the
vices. A grid resource allocation manager (GRAM) [27]
conditions under which a policy would be useful. The key
supports locating, submitting, monitoring, and canceling
results from our analysis are listed as follows: jobs on a grid.
. Multisite resource allocation is effective when Our work falls under the category of a commercial grid
individual sites are moderately loaded, they experi- that is built on top of the IBM TIO [5]. Platform Load Sharing
ence high variance in their loads, and only a small Facility (LSF) [10] offers several infrastructural features
subset of sites are heavily loaded. similar to the IBM TIO [5]. Platform LSF supports multi-
. Greedy strategies work well when individual sites cluster capabilities allowing a cluster to span across multiple
experience moderate loads and high load variance. sites across various geographical locations. Similar to TIO,
LSF attempts to provide a single computing machinery
. Allowing low-priority workloads to borrow remote
image across multiple connected hardware clusters. LSF also
servers is useful, especially when the resource
supports rich multisite resource allocation policies including
manager uses priority-based preemption to allocate
job priorities and preserves local ownership and control. In
local resources.
this paper, we assume that the infrastructure required to
. Long lease times hurt high-priority workloads.
support multisite clusters is available (say, using the IBM
We validated our analysis methodology with a real TIO or Platform LSF). Our goal is to develop fast and
implementation of the multisite resource management scalable algorithms for performance-based policy impact
system (see Fig. 1). Our prototype comprises a simple analysis that answers questions such as: What if we use policy
two-site scenario. It uses a benchmark J2EE application P instead of policy Q? What if we change the threshold parameter
Trade2 [8] as the workload, Cayuga [35] as the workload in policy P from thr1 to thr2 ? What if site A prefers sharing its
manager, and TIO [5] as the resource manager. We observed resources with site B to site C? What if the sites have different
that our analysis results match the measurements obtained (and possibly conflicting) policies? While Platform LSF sup-
from our implementation to within a 5 percent error. ports a rich set of resource sharing policies and tools for
The following sections of this paper are organized as monitoring resource utilization, it is not evident that it
follows: We describe related work in Section 2, followed by supports policy impact analysis. The algorithms described
an overview of the multisite resource allocation problem in in this paper for fast and scalable policy evaluation (e.g., cost
Section 3. We present a concrete model that captures the analysis and what-if analysis) have been implemented using

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1354 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

the IBM TIO, but they may be applicable to several other autonomy of each site, all decisions on local resources
multisite resource management systems. available at that site should be made by the site’s resource
Several authors have addressed policy-related issues in manager. Having a centralized resource manager that
virtual organizations (VOs) [18], [30], [31], [19], which may coordinates the resource managers at each local site might
span multiple autonomous organizations. In [18] and [19], solve this problem in a very similar way that a site’s resource
the authors introduce usage policies for resources in a grid manager coordinates various local application managers.
and evaluate these policies by using simulations and However, the resource managers at each site would lose
measured validation. The authors model VOs that generate their autonomy over local resources. The resource managers
batch jobs to be executed at various resource provider sites. at each site coordinate with their counterparts in a peer-to-
The resource providers accept jobs based on usage policies. peer manner. Fundamental primitives that are used by
The study finds that a policy called commitment-limit is resource managers to interact with their counterparts
most effective in minimizing the response times. The include resource borrowing and resource donating.
commitment-limit policy specifies that a site should accept
a job if a VO’s resource usage is below a threshold or if there 3.1 Resource Borrowing and Donation
are idle nodes and the VO’s use of resources is below another In our framework, each site is capable of borrowing and
threshold. In [31] and [19], the authors extend policy-based donating resources. When a resource is donated from sites
resource allocation techniques for hierarchical VOs. Our S1 to S2 , site S2 can use that resource for time duration
work is motivated by the need to study policy-based work specified by a lease time. At the end of the lease time, site S2
offloading from one site to another in a federated environ- may request a renewal of the lease. Site S1 may renew the
ment by using different system and workload models and lease, depending on its current state. Our resource borrow-
considers a broad range of policies. Therefore, we studied a ing and donating differs radically from traditional resource
different aspect of policy-based resource sharing and models along two dimensions: resource granularity and
provide a complimentary set of results. We believe that time granularity. We assume that the resource granularity is
our methodology is general enough such that various usage one computing node in a server. There are two primary
policies can be modeled and analyzed using our tool. reasons for choosing this granularity level in a commercial
A significant amount of work has been done in the field grid: configuring one node to host multiple applications is
of economic models for resource allocation in peer-to-peer challenging, and more importantly, for security reasons
networks. In [16], the authors have proposed trading as a (accidental information leakage), it may be unsafe to host
mechanism, wherein a site acquires remote resources by two applications (especially if they are from different clients)
trading away its own local resources. Several authors have on the same node. Having said that, as isolation and
also researched on optimal resource allocation in peer-to- virtualization techniques [9], [11] improve, it would be
peer networks [40], [17]. Our architecture for multisite possible to use finer grained resources. Nonetheless, our
resource management is peer to peer; however, unlike policy evaluation tool can be easily extended to accommo-
peer-to-peer systems wherein the peers may be greedy and date such fine-grained resources.
competing with each other, our focus is on cooperative The time granularity at which a resource is leased could
multisite resource management. Our work shares a similar vary from a couple of minutes to hours. This large resource
spirit as that of several studies on commercial sharing of granularity can be attributed to the fact that data centers
IT resources [14], [15]. (sites) typically own tens or even hundreds of nodes. The
Many papers describe system architectures required to coarseness in time granularity was chosen, because it takes
support cluster-based resource sharing and load balancing a long time to switch nodes among different applications:
for Web servers and application servers. Fox et al. [25] e.g., time is required to set up the software stack, provision,
describe the idea of using a separate overflow pool of and enable the application [6].
servers (that are not usually a part of the cluster) for For the sake of simplicity, we assume that all updates to
handling bursts in the Web traffic for one application. A the logical state of a resource manager are serialized. This
report on giant-scale services [13] presents an extensive logical state update changes the in-memory state model and
discussion on Internet-based systems that are primarily adds log entries. These operations are typically fast, and
single owner and comprise well-connected clusters. The thus, this keeps the information lag small. Following the
basic model of the giant-scale services implementation is synchronized logical state update, a workflow that executes
similar to our site resource manager that attempts to hide the actual borrow or donate operation is kicked off. If
node failures and balances traffic. However, the emphasis necessary, this workflow also installs the required software
of this paper is on lessons learned from the infrastructural stack, installs the application, and sets appropriate environ-
facilities available for effective load balancing and on ment variables. The actual state change (trigged by work-
enhancing the performance of application services such as flows) may take several seconds or a couple of minutes.
round-robin DNS and layer-4 and layer-7 Web switches.
Emphasis is on the evaluation of multisite resource sharing 3.2 Policies
policies in a system model of a few small enterprise data In view of the resource borrowing and donating model
centers to the giant-scale services. discussed above, a resource manager needs to make the
following decisions:
3 MULTISITE RESOURCE ALLOCATION . When can a node be borrowed?
In this section, we present the outline of our framework, . Where can a node be borrowed?
which can be employed by a resource manager to share . When can a node be donated?
resources with its counterparts. In order to preserver the . How can lease renewal requests be handled?

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
SRIVATSA ET AL.: A POLICY EVALUATION TOOL FOR MULTISITE RESOURCE MANAGEMENT 1355

These policy-based decisions could be performance related measurements are primarily used to build a workload
(for example, never borrow a node from a remote site generator that best simulates the dynamics on a commercial
when there are free nodes available at the local site) or grid.1 The workload transits between levels, as specified
administrative (for example, site A should never donate/ by a transition probability matrix tpm: nlevels nlevels.
borrow a resource to/from site B). Our framework permits The entry tpmði; jÞ specifies the probability that a workload
a wide variety of such policies to be deployed by the transits from levels i to j ð1 i; j nlevelsÞ.
system. In order to preserve site autonomy, we permit sites
to specify their own policies. Our analytical tool is capable 4.4 Workload-State Model
of studying the effect of these (possibly) heterogeneous A workload’s state is characterized by hlv; nl; nbi, where lv
policies on different workloads. denotes the current workload level (as described in the
workload model above), nl denotes the amount of local
resources that are currently serving this workload, and nb
4 MODEL is an array (one element per remote site) that denotes the
In this section, we present a concrete model that describes amount of resources borrowed on behalf of this work-
various entities (physical and logical) in multisite resource load. These resources (local and remote) belong to the
allocation, including machines, sites, workloads, policies, pool type required by the workload. A collection of
events, and cost models. We also describe a model for resources (local and remote) is quantified by the three
capturing the state of physical entities such as site and tuple hnnodes; mm; pti, where nnodes denotes the number
workload. of nodes of machine model mm and pool type pt.

4.1 Machine Model 4.5 Site-State Model

A machine or a server is presented as a five tuple In contrast to the site model that describes static parameters
hnnodes; cpu; mem; disk; nwi, where nnodes denotes the associated with a site, the site’s state model captures the
number of homogeneous nodes (blades) in a server. For time-varying parameters associated with the site. A site’s
each node, cpu denotes the amount of computing power, state is represented as hws; as; ndi, where ws is an array of
mem denotes the amount of main memory, disk denotes local workload states, as is an array of active resources that
the disk access bandwidth, and nw denotes the networking are not in maintenance mode, and nd denotes an array
bandwidth available. For scalability reasons, all resources (one element per remote workload) that denotes the amount
are discretized using a least count. For example, the cpu of resources donated to some remote workload.
resource is discretized using a least count of 100 million
instructions per second (MIPS), and the network resource 4.6 Policy Model
nw is discretized using a least count of 1 Mbps. In our framework, policies direct a resource manager in
4.2 Site Model responding to events. Let S denote the current state of the
system. Let P be a collection of policies. Our policy engine
A site is represented by an array of hns; mm; pti, where ns
denotes the number of nodes with machine model mm and is event driven; that is, it is invoked only when some
pool type pt. Each pool represents the software stack external event e occurs. The policy set P is used to guide the
deployed on the node, including the operating system, system to a collection of plausible new states that conform
network stack, application stack, etc. Each site has a policy to the policy set. More concretely, when the policy set P is
set P that describes site-specific policies. Policies conform invoked and the current system state is S on some event e, it
to the policy model, which will be described later in this returns a collection fðS1 ; p1 Þ; ðS2 ; p2 Þ; ; ðSn ; pn Þg, where Si
section. A site is associated with a list of workloads (or is a valid next state, and pi is the probability with which the
applications) that are hosted by the site that conforms to the system is recommended to transit to state Si such that
workload model described as follows: Pn
i¼1 pi ¼ 1. Note that if n is equal to one, then there is only
4.3 Workload Model one next state that conforms to the policy set P . If n > 1, the
Workloads denote applications hosted by a site. Each system would choose its next state probabilistically from
workload is associated with an SLA [36] that explicitly the set fSi : 1 i ng.
specifies the guarantees provided by the service provider More formally, a policy P1 can be a hpredicatei or a
(site) and the cost of violating those guarantees. We assume three tuple hpredicate; e; actioni, where predicate is a
that each workload draws its resources (nodes) from predicate on the system state S. If P1 ¼ hpredicatei, then
exactly one server pool type. We use a capacity planner all states S such that predicateðSÞ is false is precluded. If
[4] to determine the number of nodes required for P1 ¼ hpredicate; e; actioni, then for all states S such that
satisfying the workload’s SLA for different values of mean predicateðSÞ is true and when an event e occurs, the policy
workload request arrival rates. The capacity planner requires the system to perform an action action. In
essentially permits us to discretize the resource require- general, our policy specification requires a method
ment for a workload at any instance of time into nlevel nextStateðS; actionÞ (could be any arbitrary function; a
levels. Each level is, in turn, mapped to the resources Java method) that returns the next state S 0 when the
required for handling the workload satisfactorily (accord- system performs an action action from state S. An action
ing to its SLA). Resource requirement is represented as an could involve borrowing/donating resources, reallocating
array of hnnodes; mm; pti, where nnodes denotes the local resources among workloads, etc.
number of nodes of machine model mm and pool type pt.
We use empirical measurements to determine the distribu- 1. Popular choices for such distributions include the exponential
tion of the amount of time a workload stays at level i. These distribution and the Pareto distribution.

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1356 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

4.7 Event Model whether the reallocated node was idle or running some
Events represent those external changes that would require workload, whether the reallocated node was local or
the system to reallocate or redistribute its resources among remote with respect to the workload, and what the cost
workloads in order to meet its business objectives. We of provisioning and setting up the workload is. First, the
focus on two major types of events in our framework: 1) a RC would be higher if the node was previously running
workload event occurs when a workload requirement some workload (say, due to the time expended in simply
moves from one level to another and 2) a node event shutting down that workload on the node); also, such a cost
occurs when a node fails (or is moved into the maintenance might depend on the workload (if any) that was previously
mode) or when a node is reinstated into the system upon running on the node. Second, the RC for a local node is
recovering from failure (or upon maintenance completion). likely to be much smaller than that for a remote node. This
A node event also occurs when a node is borrowed or is primarily because a local reallocation does not have to go
donated and when the lease on a borrowed node expires. through agreements between different sites [4]. Support for
Given a state S, there is a set of events E that could automated negotiations across sites is provided by PANDA
potentially occur when the system is in state S. For each [26] and WS agreements [12]. Third, the most important
event e 2 E, the probability distribution of the time for component of reallocation is the cost of provisioning and
event e to fire is assumed to be known. For workload setting up the environment required for the workload on
events, the probability distribution of the time to the next the reallocated node. This might involve additions to the
event is obtained from the workload model. We assume that node’s software stack, starting the workload’s runtime
nodes fail (and recover) independently and the time of environment, setting up database connections, etc.
failure (and recover) is assumed to follow an exponential
4.9 Sample Policies
distribution or the Weibull (bathtub) distribution.
In this section, we discuss samples policies that are
4.8 Cost Model permissible in our model. We present policies that aid the
We have developed a cost model for evaluating the effect of system in responding to the following questions: When can
various policies on the system. In general, our cost model a node be borrowed? Where can a node be borrowed?
permits cost functions that are arbitrary functions of the When can a node be donated? For the sake of simplicity, in
the following discussion, we assume that workloads are
system’s state and state transitions. In this section, we present
prioritized in decreasing order of their VCs (VCs are
a sample cost model that considers three important costs:
typically specified in dollars).
1. Violation cost (VC). This cost represents the cost of
violating a workload’s SLA. 4.9.1 When to Borrow a Node?
2. Remote node cost (RSC). This cost represents the cost Let an event e denote the fact that a workload w
of using a remote resource. requirement has increased. Then, a borrow policy could
3. Reallocation cost (RC). This cost represents the initial be defined as a collection of policies shown as follows: With
A denoting the site to which workload w belongs
setup and provisioning cost for a workload.
Additionally, one could include the cost of operating a node . P1 . Always allocate idle nodes in preference to
measured in terms of power (typically dominated by cooling other nodes.
costs), physical space, and man-hours for maintenance. . P2 . Always use local nodes in preference to remote
We use ðS; wÞ to denote the VC for workload w when nodes.
the system is in state S. A popular choice for estimating the . P3 . If site A has currently donated nodes for some
VC would be to make the VC proportional to ndsðS; wÞ, remote workload rw, then preempt remote workload
where ndsðS; wÞ denotes the difference between the number rw if its priority is smaller than workload w.
of nodes required for workload w and the number of nodes . P4 . If site A had allocated nodes to some local
actually allocated to workload w (inclusive of local and workload lw, then preempt local workload lw if its
remote nodes) in state S. In this case, the VC would be priority is smaller than workload w.
expressed as a penalty per deficit node per unit of time in . P5 . A workload w is eligible for using a remote node
the workload’s SLA. only if its priority is greater than a minimum
We use ðS; wÞ to denote the cost of using a remote threshold.
node for workload w when the system is in state S. A
4.9.2 From Whom to Borrow a Node?
common choice for estimating the RSC would be to make
it proportional to the nbsðS; wÞ, where nbsðS; wÞ denotes Let us suppose that a workload w has qualified to borrow a
the number of remote nodes borrowed for workload w in remote node. Then, the site from which it borrows a node
could be determined by the following policy. Let nfX
state S. In this case, the RSC would be expressed as a
denote the number of free nodes currently at site X:
penalty per remote node per unit of time in the site’s
policy set. Additionally, one might choose to distinguish . P6 . Borrow a node from the site that has the
between nodes borrowed from different sites. In that case, maximum number of free nodes.
the RSC for different nodes would depend on the site
from which those nodes were borrowed. 4.9.3 When to Donate a Node?
We use ðS; S 0 ; wÞ to denote the RC for workload w Let us suppose that site X has received a request for
when the system makes a transition from states S to S 0 . The donating a node. Then, the site grants the request based on
RC would typically depend on at least three factors: the following policies. Let nfX denote the number of free

nodes currently at site X. Let the number of requested

nodes be r:

. P7 . Donate for r0 nodes for workload w, where

r0 ¼ minðnfX F ; rÞ, with F denoting the minimum
number of free nodes required after granting a
donation request.
. P8 . Workload w is eligible for borrowing a node only
if its priority is greater than a minimum threshold.

5 ANALYSIS
In this section, we present techniques for analyzing a
complex policy-driven system. We incorporate policies into
the system by modeling the system dynamics as a finite- Fig. 2. Policy-guided state-space exploration.
state automaton. Then, we superimpose a queuing net-
work model on the automaton that labels state transitions nfX . Policies like p3 permit policy makers to add
in the automaton with probability distribution functions. randomization techniques that are popularly used as
We then solve the model by using numerical techniques or heuristics for performance enhancement.
simulations and use this solution to estimate workload We encode policies into the function P ðS; eÞ, as discussed in
costs (VC, RSC, and RC). the policy model. Recall that P ðS; eÞ determines how the
system responds to event e when it is in state S. We now
5.1 Finite-State Automaton present a simple technique for constructing the finite-state
We model the dynamics of a complex policy-driven automaton from the site models and their policies. We use
system as a nondeterministic finite-state automaton. The the algorithm in Fig. 2, which uses a policy-driven
automaton is constructed automatically from the indivi- technique for constructing the automaton. The algorithm
dual site’s policy set that is defined according to the policy starts with some valid (or legal) state S0 and explores the
model. States in the automaton correspond to the system state space in a policy-driven manner. This technique
states. State transitions are triggered by events. We use P allows us to automatically generate the automaton from
to denote the collection of policy sets from all sites in the the site model and their policies. By construction, every
system. The collective policy set P determines how the state S in the automaton is legal, and every transition T in
system responds to events. More specifically, policies the automaton is permissible. The probability of transition
serve three critical purposes: 1) to determine whether a is handled in the queuing network model that is juxtaposed
given state S is legally permissible or not (we call a state S on this finite-state automaton. We observed that this policy-
illegal if it does not conform to the policy set P ), 2) to guided state-space exploration technique allows us to speed
determine whether a transition T between two states S up automaton construction drastically. This is because
and S 0 is permissible or not (we term a transition, that is, a among all the combinatorial choices that could potentially
response to an event, illegal if the event is not handled in represent a state, very few (< 0.1 percent) of them actually
a way that conforms to policy set P ), and 3) to determine conformed to our policy set. For instance, simple policies
the probability that the system makes a transition T from like “do not starve a workload whenever an idle node (of
states S to S 0 in response to a given event e. We illustrate the required pool type) is locally available,” “use local node
the role of policies in generating the finite-state automaton in preference to remote nodes,” and “use priority-based
description for the system using three examples: preemptive scheduling to manage local nodes,” tend to
drastically limit the number of states. Hence, pruning the
. Policy p1 : site A never borrows a remote node for state space by using a policy-guided technique turned out
workload w. Given a state S, its legality can be to be highly beneficial.
tested (with respect to policy p1 ) by ensuring that
the workload state for w should indicate that nb 5.2 Queuing Network Model
(the number of nodes borrowed by workload w) is We superimpose a queuing network model on the finite-
equal to zero. state model to annotate state transitions with their prob-
. Policy p2 : site A borrows a remote node for ability distribution functions. A transition T : S !e S 0 is
workload w from a remote site that currently has labeled with a tuple hfe ; pri. The function fe describes the
the maximum number of idle nodes (of the same probability distribution of the event e that causes the system
server pool type as that required for workload w). to transit from states S to S 0 . The probability pr denotes the
Given a transition T from states S to S 0 , its legality probability that the system transits from states S to S 0 in
can be tested (with respect to policy p2 ) as follows: response to event e. For node events, the function fe is an
If workload w has borrowed a node in transition T , exponential distribution. For workload events, the function
then the state S 0 should indicate that the borrowed could either be an exponential distribution or a Pareto
node belongs to the site that has the maximum distribution. Exponential distribution is amenable to nu-
number of idle nodes in state S. merical analysis and thus provides fast (though crude)
. Policy p3 : suppose that site A has to borrow a remote results, while Pareto distribution captures a more realistic
node for workload w and let nfX denote the current bursty workload characteristics [37].
number of idle nodes at site X; then, site A borrows A solution to the above model gives us prðSÞ for all
a node from site X with a probability proportional to states S and rateðT Þ for all transitions T , where prðSÞ

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1358 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

denotes the probability (over an extended period of time)

that the system is in state S, and rateðT Þ denotes the rate
(over an extended period of time) at which the system
makes a transition T . We use values to estimate workload
costs as follows (in terms of mean cost units per unit time):
XX
VC ¼ ðS; wÞ prðSÞ;
S w
XX
RSC ¼ ðS; wÞ prðSÞ; Fig. 3. Coalesce and solve.
S w
X X
RC ¼ ðS; S 0 ; wÞ rateðT Þ: computation resource and the memory resource with
T :S!S 0 w
weights wcpu and wmem wnw and wdisk .
We compute prðSÞ and rateðT Þ from the model as follows: If However, the fact that distanceðS1 ; S2 Þ is very small does
the workload events are exponentially distributed, then we not necessarily imply that the states can be coalesced. For
instance, consider a state S1 wherein the system utilization
compute the stationary probability distribution by using
is 100 percent and a state S2 that is very close to S1 ,
standard techniques to solve a Markov chain. The stationary wherein the system is overloaded (and thus violating the
probability distribution directly gives us prðSÞ for all states S. SLA for some workloads). Although distanceðS1 ; S2 Þ is very
For every transition T : S !e S 0 , rateðT Þ ¼ prðSÞ rateðfe Þ, small, the two states are not equivalent. So, we add the
where rateðfe Þ denotes the rate of the exponential distribu- second cost constraint that requires that the cost function at
tion fe . We use a discrete event simulation [20] to solve the both states S1 and S2 evaluate to nearly the same value.
The cost of a state S P for our sample cost P model is
model when workload events follow a heavy-tailed Pareto
determined P P as costðSÞ ¼ w ðS; wÞ prðSÞ þ w ðS; wÞ
distribution. We run the simulation for a substantially long prðSÞ þ S0 w ðS; S; wÞ rateðS ! S 0 Þ.
period of time tsim . In the course of this run, we measure the Note that in general, a policy may specify arbitrary
amount of time expended by the system in any state S by tðSÞ actions to events. Hence, strictly speaking, the policy may
and use this information to compute prðSÞ ¼ tðSÞ require the system to react in entirely different ways on the
tsim . Similarly,
same event e, from states S1 and S2 . Hence, one would have
we measure the number of times that the system transits
to compare all the transitions in and out of states S1 and S2
from states S to S 0 by using a transition T by nðT Þ and before we conclude that the states are equivalent. However,
Þ
estimate rateðT Þ ¼ nðTtsim . Given prðSÞ for all states S and we believe that a consistent set of policies would not make
rateðT Þ for all transitions T , the workload costs can be radically different decisions on the same event e from
estimated as previously discussed. two states S1 and S2 with a very small distance and
comparable costs. Hence, our policy evaluation tool does
5.3 State-Space Explosion Problem not consider the transitions in and out of a state when
Theoretically, solving the queuing network for the steady- determining the equivalence between two states. However,
state model permits us to evaluate a set of policies. However, one can optionally turn checks on transition; then, the
even for a reasonably small system, the number of possible evaluation tool would report possible inconsistencies in the
system states tends to be exponentially large. This greatly policy set P . When two states S1 and S2 are coalesced, we
limits the scalability of the policy evaluation tool. However, replace them with a state S12 , with prðS12 Þ ¼ prðS1 Þ þ prðS2 Þ.
we recognize that most of these system states may not be For every transition T1 : S1 !e S and T2 : S2 !e S (for some
important. In this section, we propose techniques to coalesce state S), we replace with a transition T : S12 !e S, with
near-equivalent states. There are two primary advantages in prðT Þ ¼ prðT1 Þ þ prðT2 Þ. The same holds for transitions T1 :
resorting to state coalition: 1) the number of states in the S !e S1 and T2 : S !e S2 to be replaced with a transition
system significantly reduces and 2) at the end of state T : S !e S12 , with prðT Þ ¼ prðT1 Þ þ prðT2 Þ.
coalition, we are left with important states. However, observe that costðSÞ depends on prðSÞ and
We say that two states S1 and S2 are nearly equivalent if rateðT Þ (for all T : S ! S 0 and T : S 0 ! S), which can be
1Þ obtained only by solving the queuing network model for its
distanceðS1 ; S2 Þ < disthr and j costðS
costðS2 Þ 1j < costthr. The
steady-state solution. We merge the process of coalescing
distance distanceðS1 ; S2 Þ is defined as the Cartesian distance
states and solving the queuing network model as follows:
between the two state vectors S1 and S2 . When we compute Let the set of linear equations obtained from the steady-state
the distance, we ensure that different elements of the state solution to the queuing network
P P model be represented as
vectors are normalized. For example, let us suppose that all j aij prðSj Þ ¼ 0 (for all i) and j prðSj Þ ¼ 1. We start with
processing speeds are between 100 and 1,000 MIPS. Then, an initial guess for prðSj Þ by using simulations (for faster
we normalize all processing speeds to a range (0, 1) by convergence). Then, based on the distance threshold and the
cost threshold, we iteratively coalesce the states and solve for
mapping X MIPS to X100 900 . Finally, when we compute the
the steady-state equation, as shown in the algorithm in Fig. 3.
Cartesian distance between two normalized state vectors,
In general, one can use many stopping conditions for the
we permit different weights to be associated with different algorithm. We have used a precision-based stopping
resources. For example, when the application is highly condition in the algorithm. Alternatively, one can use a
computation and memory intensive, we could weight the condition on the number of states ðjSSj < thrÞ, the runtime

Fig. 4. Unified policy evaluation.

Fig. 5. Site types.

of the policy evaluation tool ðrunT ime < thrÞ, or the total
The unified policy evaluation algorithm also permits the
number of iterations ðnumItr < thrÞ.
When the policy evaluation tool terminates, we are left administrator to carry out sensitivity analysis effectively.
with important states and only relevant components in the We study techniques to perform sensitivity analysis with
state vector. For example, consider a computation-intensive respect to a system model, a workload model, and a cost
application where disk I/O is irrelevant. Let us suppose that model. Let us suppose that we have evaluated a system
a normalized network resource initially had four classes model SM, a workload model W M, and a cost model CM
(0, 32 kilobits per second (Kbps)), (32 Kbps, 64 Kbps), under the policy set P to obtain CSS (coalesced states). Let
(64 Kbps, 96 Kbps), and (96 Kbps, 128 Kbps). At the end of us now suppose that we make a small change 4SM in the
the algorithm, we will be left with states wherein the system model. Recall that SM is a state vector and 4SM
network resource has only one class (0, 128 Kbps). When any denotes the change in the system model and is also
resource spans its full range, the resource becomes irrelevant represented as a huge state vector such that the new
(irrespective of the quantity of that resource, the system system model SM 0 ¼ SM þ 4SM (vector addition). We use
behaves identically). Hence, we replace such resources with 4SM and update every state S in CSS to obtain
a special class “,” meaning that the resource is not S 0 ¼ S þ 4SMS, where 4SMS denotes a projection of
important for the given application, the policy set, and the 4SM on S. For example, if S has an on the disk I/O
cost function. In general, we start with an arbitrarily large component in its vector, then the disk I/O component in
number (millions) of highly fine-grained states. Our algo- 4SMS is also replaced by . If 4SM involved a machine
rithm not only provides a steady-state solution to the with 300 MIPS replaced by a machine with 600 MIPS and
queuing network model but also reduces the number of the state S had its CPU resource vector marked with the
states to only a few important ones (a few tens, depending on range (0, 1,000) MIPS, then 4SMS is (0, 1,000) MIPS.
the distance and cost thresholds). Clearly, a project of 4SM on S eliminates all changes that
5.4 Unified Policy Evaluation Algorithm are irrelevant to S. Finally, we compute important changes
in the system model
P with respect to the policy set S as
In this section, we present our complete algorithm for policy 0
imp change ¼ SS2CSS distanceðS; S Þ. If imp change is
evaluation. The algorithm takes a system model SM, a
workload model W M, a cost model CM, and a policy set P larger than a threshold, we solve the system afresh by
as its input. The output of the algorithm is a probability using the algorithm in Fig. 4; else, we simply construct the
distribution function of the overall system cost, namely, new set of coalesced states CSS 0 by replacing every state S
P rðcost ¼ xÞ for all x, where P rðcost ¼ xÞ denotes the in CSS by S 0 . We use CSS 0 to obtain the cost distribution.
probability that the overall system cost is equal to x. Note We use the same technique described above for sensitivity
that, given the overall cost distribution, one could easily analysis toward the workload model. However, one cannot
measure the average cost and its higher order moments (like use the same technique for the cost model, since states may
standard deviation). The unified policy evaluation algo- be coalesced using a cost threshold. We therefore start with
rithm is shown in the algorithm in Fig. 4. We use a policy- the uncoalesced state space SS and evaluate the costðSÞ for
guided state-space generation technique to generate the every S in SS by using the cost model CM and cost model
state space (the algorithm in Fig. 2). We then coalesce the CM 0 . If the mean difference SS 2 CSS jcostCM ðSÞcost
jSSj
CM 0 ðSÞj
is
states, depending on the distance and cost thresholds, and lesser than a threshold, then we assume that this change in
solve the coalesced state space CSS for a steady-state the cost model does not change the set of coalesced states.
solution. The steady-state solution gives prðSÞ, the prob- In this case, we use the same CSS to revaluate the cost
ability that the system is in some state S for all S 2 CSS. distribution (step 3 of the algorithm in Fig. 4); else, we rerun
Now, we obtain the cost of each state S by using a cost model the algorithm from step 2.
and translate the probability distribution over the state space
to a probability distribution over the overall system cost. The
cost distribution allows the administrator to perform worst 6 RESULTS
case analysis. For example, there could be two policy sets P In this section, we present several results obtained using
and Q such that the average cost of P is lower than that of Q. our analytical tool to study various policies for multisite
However, P might have a state S, where prðSÞ > 0, and resource management. For every experiment, we use a
costðSÞ is greater than the cost of all states in the state space different scenario that best highlights the inferences that
of policy Q. The administrator can make such an observation we made from them. A scenario is described using the site
by looking at the probability distribution plots for the and workload models used to perform the experiment. We
overall system for policy sets P and Q. This enables the use a small set of site types and workload types in all our
administrator to make much sounder decisions about which experiments. We first describe them in Figs. 5 and 6. For
policy set should be chosen for the system. example, Fig. 5 shows that a site of type ST1 runs a

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1360 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

Fig. 8. Accuracy of the policy evaluation tool.

Fig. 6. Workload model: sample transition probability transition matrix.

Fig. 9. Runtime.

local reallocation, and nrtsðS; S 0 ; wÞ denotes the number of

nodes involved in a remote reallocation (for workload w
Fig. 7. Fraction of remaining states. when the system transits from states S to S 0 ).

workload of type W T1 and has three local nodes of pool- 6.1 Scalability Experiments
type zero, a site of type ST2 runs a workload of type W T2 In this section, we present performance and scalability
and has one local node of pool-type zero, and a site of results on our policy evaluation tool. We first study the
type ST3 runs two workloads, both of type W T2 , and has effect of distance and cost threshold on the performance of
two local nodes of pool-type zero. the policy evaluation tool and then show the ability of our
We now describe the first workload type W T1 . A evaluation tool to scale with the number of sites. Fig. 7
workload of type W T1 has five levels numbered 0, 1, 2, 3, shows the fraction of remaining states for different values of
and 4. At level i, the workload requires i nodes. At each distance and cost threshold. Note that only important states
level i, the workload spends an exponentially distributed are left behind when our policy evaluation tool terminates.
amount of time, with a mean of 100 time units. The default Recall that the higher the distance and cost threshold, the
transition probability matrix of the workloads is shown in more the probability that the states would be coalesced.
Fig. 6. In our experiments, we vary this matrix to change the Note that the number of important states drops steeply as
mean load and load variance. In this example, the mean the threshold values are increased. This is primarily because
load is about two nodes. When averaged over a long period most of the system states are indeed equivalent to one
of time, the workload is in level 2 for about 66 percent of the another and can be coalesced. For example, in a transac-
time, in levels 1 and 3 each for about 13 percent of the time, tional grid, the cost model is independent of the disk I/O
and in levels 0 and 4 each for about 4 percent of the time. A utilization. Hence, in all the final states, the disk I/O part of
workload of type W T2 has three levels numbered 0, 1, and the state vector would be eliminated (replaced by “”).
2. At level i, the workload requires i nodes. At each level i, Fig. 8 shows the accuracy of the evaluation tool for
the workload spends a Pareto-distributed amount of time, different values of distance and cost threshold. Accuracy is
with a mean of 100 time units and infinite variance. In our measured as the ratio of the estimated system cost with
experiments, we use different transition probability ma- state coalition and without coalition. In Fig. 7, as the
trices to achieve different mean loads and load variances. threshold is increased, the number of remaining states
For all workloads, we used the following simplified cost decreases sharply. However, the accuracy of our policy
model. We assume that every workload w has a priority evaluation tool decreases very marginally with the distance
denoted by priorityðwÞ. The VC parameter ðS; wÞ ¼ and the cost threshold.
ndsðS; wÞ priorityðwÞ, where ndsðS; wÞ denotes the differ- Fig. 9 shows the time taken for policy evaluation for
ence between the number of nodes required for workload w different values of distance and cost threshold. The time
and the number of nodes actually allocated to workload w that it takes for the policy evaluation tool to terminate falls
(local and remote nodes inclusive) in state S. The RSC very sharply with the threshold values. This is primarily
parameter ðS; wÞ ¼ 0, that is, there is no penalty for using because of the reduced number of states. Note that the
a remote node. The RC ðS; S; wÞ is defined to be equal to number of equations to be solved in order to obtain a
the VC experienced by the workload during the reallocation steady-state solution for the queuing network model is
process. Based on our measurements on our prototype (see proportional to the number of states. Note that the cost of
Section 6.1), we observed that a reallocation involving a solving a system of n linear equations is proportional to n2 .
local node took two time units and that involving a remote Fig. 10 shows the scalability of the system with the
node took 12 time units. Hence, ðS; S 0 ; wÞ ¼ nltsðS; S 0 ; wÞ number of sites for certain values of distance and cost
priorityðwÞ 2 þ nrtsðS; S 0 ; wÞ priorityðwÞ 12, w h e r e threshold. Note when the threshold values are zero, the
nltsðS; S 0 ; wÞ denotes the number of nodes involved in time that it takes for the policy evaluation tool to terminate

Fig. 10. Scalability with the number of sites. Fig. 12. VC versus load variance.

almost increases exponentially with the number of sites. 1. Light loads. In this zone, there is not much need to
This is because the number of possible system states borrow resources, and hence, cooperating multiple
increase exponentially with the number of sites, thereby sites do not succeed in reducing the aggregate VC
severely limiting the scalability of the policy evaluation tool. significantly.
However, as we raise the threshold, the policy evaluation 2. Heavy loads. In this zone, the system (the collective
tool is much better equipped to handle a system with a resources available at all sites) is insufficient, and
larger number of states. hence, cooperation does not yield significant gains
(unless the workloads vary largely in terms of their
6.2 Case Study priorities).
In this section, we present a collection of case studies on 3. Moderate loads. In this zone, sites can offload their
multisite resource allocation. Even though our policy peak demands to free nodes available at remote
evaluation tool allows different sites to use different cost sites, thereby achieving much lower VCs.
models, in this section, we assume that all sites use the same Fig. 12 shows the aggregate VC as the load variance
cost model. changes (at fixed mean load ¼ 4). As the variance increases,
the workloads spend most of their time at a state where
6.2.1 When Is Multisite Resource Allocation Useful? they need four nodes or at a state where they need just
In this experiment, we identify the workload characteristics zero or one node. At lower load variances, the workloads
spend a significant portion of their time close to their
that would make multisite resource allocation a better
mean, that is, when the workloads each need two nodes.
choice compared to independent noncooperating sites. This
As the variance increases, cooperating multisites can
comparison is achieved by explicitly comparing the handle peak demands at one site by borrowing resources
aggregate VC of cooperating versus noncooperating sites. from the other site. Although the variance is high, the peak
The workload characteristics that are of primary interest to demands at the two sites are likely to be uncorrelated.
us are the mean load and the load variance. Furthermore, as the variance increases, the workloads
Scenario 1. Two sites S1 and S2 are both of type ST1 . Site S1 has spend much more time units at a state where they require
one workload W1 of type W T1 . Site S2 has one workload W2 four nodes and at a state where they require zero node.
Hence, as the variance increases, cooperating multisites
of type W T1 . Both workloads W1 and W2 have the same
become a much better choice for resource allocation
priority. compared to noncooperating sites.
Fig. 11 shows the VC as the total mean load of Scenario 2. Two sites S1 and S2 are both of type ST5 . Site S1 has
workloads W1 and W2 varies ðunder fixed variance ¼ 1Þ. one workload W1 of type W T1 . Site S2 has one workload W2 of
“lvc” denotes the VCs when the sites operate without type W T1 . Both workloads W1 and W2 have the same priority.
cooperating with each other (they optimize resource
In this experiment, we demonstrate the usefulness of
allocations locally). “vc” denotes the VCs when the sites
multisite resource allocation when the sites are unevenly
cooperate with another and borrow/donate nodes to
loaded. Fig. 13 shows the aggregate VCs when the sites are
handle peak loads. Fig. 11 can be divided into three zones: unevenly loaded. We fix the mean load on site S1 to be

Fig. 11. VC versus mean load. Fig. 13. Unevenly loaded sites.

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1362 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

Fig. 14. Sensitivity analysis. Fig. 16. Worst-case analysis.

1.5 nodes and vary the mean load on site S2 . When the load indicates that only workloads fWi : i jg can borrow
on site S2 is very low, cooperating multisites do not have remote nodes. The values shown in the figure are normal-
any advantage. However, as the load on S2 increases, site S2 ized by “pr1 ,” which indicates the VC when all workloads
can offload some its load to site S1 . But, when site S1 gets can borrow remote nodes. At very low loads, there is no
loaded to its maximum capacity, it can no longer accept need to borrow nodes, and hence, the threshold has no effect
load from site S2 . Hence, the difference between “lvc” and on the VC. As the mean load increases, there is opportunity
“vc” does not diverge when the workload on site S2 soaks to offload the peak demand by borrowing remote nodes.
up all the resources available in site S2 and the unused Setting a threshold inhibits the system from exploiting the
resources in site S1 . free nodes available at remote sites for lower priority
Fig. 14 shows how our evaluation tool can be used to workloads. However, at very high loads, using a borrow
perform sensitivity analysis. Fig. 14 shows a simple threshold is very useful. Since lower priority workloads are
sensitivity analysis on the overall system cost as we vary not allowed to borrow nodes, there is not much time wasted
the workload model parameters. We vary the mean load and in reallocating resources among different workloads.
the load variance (keeping the mean load a constant) and Fig. 16 shows the cost distribution of two policies P1 and
study its effect on the system cost. On the x-axis, we show the P2 under a heavy mean load ¼ 8. P1 uses a borrow thresh-
factor by which a workload parameter is changed, and the old of 4, and P2 uses a borrow threshold of 2. Observe in
y-axis shows the corresponding factor by which the overall Fig. 15 that the average cost of P1 is lower than the average
system cost changes. Fig. 14 shows that by keeping the mean cost of P2 . However, the cost distribution in Fig. 16 shows
load a constant and increasing the variance by 20 percent that the worst case cost for P1 is higher than the worst case
(a factor of 1.2), the overall cost goes up to the same extent cost of P2 . This is primarily because under policy P1 ,
as increasing the mean load by 12 percent to 14 percent. three workloads are never permitted to borrow resources,
and thus, it incurs a higher cost when the load due to
6.2.2 Borrowing Remote Nodes workload W4 is low, and the rest (W1 , W2 , and W3 ) are
In this section, we study the effect of borrow threshold thr br high. If the system administrator is interested in worst case
on the aggregate VC. Note that when a borrow threshold is costs, then the administrator can graphically view the cost
enforced, only workloads w with priorityðwÞ thr br are distributions before deciding on the appropriate policy.
permitted to borrow remote nodes. The first experiment on The second experiment on borrow threshold shows the
borrow threshold shows that borrow threshold is useful only effect of a local resource allocation strategy. In this scenario,
when the system (inclusive of all sites) is operating at a high the resource manager uses a priority-based preemptive
mean load. The second experiment shows how our resource allocation strategy for managing local resources.
analytical tool can be used to perform worst case analysis.
Fig. 17 shows the VC versus the mean load for different
Scenario 3. Four sites S1 , S2 , S3 , and S4 are all of type ST2 . values of borrow threshold. “prj ” indicates that only
Site Si has one workload Wi of type W T2 for 1 i 4. workloads fWi : i jg can borrow remote nodes. The
The priority of workload Wi is i ð1 i 4Þ. values shown in the figure are normalized by “pr1 ,” which
Fig. 15 shows the aggregate VC versus the aggregate indicates the VC when all workloads can borrow remote
mean load for different values of borrow threshold. “prj ” nodes. The main emphasis in this experiment is that one

Fig. 15. Effect of borrow threshold. Fig. 17. Effect of the local optimization strategy.

Fig. 18. Adaptive borrow threshold. Fig. 19. Cycle-breaking rule.

needs to be careful in choosing borrow thresholds. For some nodes from site B. Let the set wpA denote the priority
instance, “pr3 ” and “pr4 ” in the figure behave much worse of a workload that belong to site A and are currently
than “pr1 ” under all values of mean loads (even under high running on nodes in site B (and similarly for wpB ). Then,
loads, as compared to Fig. 15). This is because, in this the cycle-breaking rule requires that wpA > wpB . Clearly, if
scenario, the workloads W3 and W4 never need to borrow the cycle-breaking rule is strictly implemented, then there
nodes. The local optimization cycle always grabs a node would be no cycles in the resource borrowing graph;
from a local lower priority workload and transfers it to a that is, site A will not be using the resources at site B, and
higher priority workload. Since W3 and W4 require not site B will not be using the resources at site A at the same
more than two nodes, they are always guaranteed to be time instant. On the contrary, if we assume that there
allocated local nodes. Hence, “pr3 ” and “pr4 ” are equivalent exists a cycle A ! B ! ! A, then it would mean that
to the case where the sites are noncooperating (W3 and W4 wpA > wpB > > wpA , which is an obvious contradiction.
never need to borrow nodes, whereas W1 and W2 are not Fig. 19 compares the RSC (“rsc”) and the VC (“vc”) with
permitted to borrow nodes). and without the cycle-breaking rule for different values of
The third experiment on borrow threshold shows the the mean load. The figure shows the ratio of each cost with
effect of threshold adaptation. In threshold adaptation, the the cycle-breaking rule to that without the rule. At very low
threshold value is decayed by a constant decay factor on loads, no borrow operations are required, as the sites are
every remote optimization cycle. However, it is reset to its
self sufficient. As the mean load increases, much more
original (default) value whenever a borrow operation fails
nodes may be borrowed. The cycle-breaking rule decreases
to obtain a remote node. When the system is heavily loaded,
the number of borrow operations and thus ensures that the
the borrow requests for lower priority workloads is very
likely to fail, and hence, the borrow threshold would stay RSC and the RC are substantially smaller. At high loads,
close to its default value. On the other hand, if the system is there are no free nodes available that could be borrowed at
lightly loaded, most borrow requests would succeed in either site. Hence, at very high loads, the number of borrow
fetching a free remote node. Therefore, at low loads, the operations come down, and consequently, the cost differ-
borrow threshold would be very low, and thus, most ence between a policy with and without the cycle-breaking
workloads would be permitted to borrow nodes. We use the rule decreases.
same scenario as in Scenario 3, which is described earlier in
this section. Fig. 18 shows the VC versus the mean load for 6.2.4 Lease Time
different values of decay factor. At very low loads, the Scenario 5. Two sites S1 and S2 are each of type ST4 . Site S1
decay factor has no influence on the VC, since the has one workload W1 of type W T2 , and site S2 has
individual sites have sufficient resources to handle their one workload W2 of type W T2 . The priority of workload Wi
peak demands. When the decay factor is very close to one, is i ð1 i 2Þ.
we are being highly conservative in permitting lower This section studies the effect of lease time on the
priority workloads in borrowing nodes; thus, higher decay workload VCs. Lease-time-based policies allow resources
factors tend to perform poorly at moderate loads (under- to be borrowed or donated for a fixed period of time,
utilized resources). When the decay factor is low, we namely, the lease time. Leases are nonpreemptable; that is,
encourage lower priority workloads to borrow nodes; thus, once a resource is leased, the donor cannot withdraw that
low decay factors tend to perform poorly at high loads resource before the specified lease time. On the other
(thrashing due to frequent reallocation). hand, the site that borrowed a resource could return the
resource before its lease terminates.
6.2.3 Cycle Breaking Rule
Fig. 20 shows the VC of the two workloads as the lease
Scenario 4. Four sites S1 , S2 , S3 , and S4 are all of type ST2 . time increases. Note that leases are nonpreemptable, but a
Site Si has one workload Wi of type W T2 for 1 i 4. borrow resource can be returned before a lease expires (if
The priority of workload Wi is i ð1 i 4Þ. the borrowing site decides that the borrowed node is no
A cycle-breaking rule is a policy added to improve the longer required). Consider a scenario wherein the lower
system’s stability, that is, “no local workload with a higher priority workload W1 has borrowed a node. Now, if the
priority is executed on a remote node when a remote higher priority workload W2 needs a node, it has to wait till
workload with a lower priority is being run on a local the lease expires. When the lease expires, W1 is denied a
node.” The key motivation behind a cycle-breaking rule is lease extension, and the node is assigned to W2 . Hence, as
given as follows: Let A ! B denote that site A is using the lease duration increases, the VC for higher priority

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1364 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

Fig. 20. Lease time.

Fig. 22. Exponentially distributed events.

workloads increases. Beyond a certain value, an increase in
lease duration does not affect the VC, because the borrowing
site would return the node (because its peak demand is 7.1 Scenario A
over) almost always before the lease actually expires. Scenario A was used for experimental measurement as
follows: We have two sites: S1 and S2 . Site S1 has three
6.2.5 Reallocation Time nodes: s11 , s12 , and s13 . Site S2 has one node s21 . All the nodes
Scenario 6. Two sites S1 and S2 are each of type ST4 . Site S1 has in sites S1 and S2 belong to the same pool type: Intel Pentium
one workload W1 of type W T2 , and site S2 has one workload 3, 1-GHz processors with a 256-Mbyte RAM running RedHat
W2 of type W T2 . The priority of workload Wi is i ð1 1 2Þ. Linux release 7.1. Site S1 hosts the Trade2 application
(Trade2 is a J2EE application publicly available from IBM
In this section, we measure the effect of node reallocation
[8]). Site S2 has no workloads. Site S1 runs the workload
duration with the VC. If the reallocation time is very high,
manager [35] for Trade2 and the site resource manager TIO
then one would have to be very conservative about
[5]. Site S2 runs only its site resource manager TIO [5]. We
borrowing/donating nodes. We assume that reallocation
use a clientside HTTP load generator program that
is nonpreemptable; that is, once a node reallocation
dynamically adjusts the load according to the workload
operation has begun, it cannot be aborted. Also, we assume
model. More specifically, we used a workload of type W T1
that when a node is being reallocated, it is unusable. Fig. 21
for our experimental run. The workload manager uses the
shows the VC as the amount of time taken to perform a
Network Dispatcher [3] to distribute this load among all the
reallocation increases. Reallocation is shown as a percentage
nodes assigned to workload Trade2. We use a simple
of the mean burst time. Note that the values shown in the
figure have been normalized by the “lvc,” the VC when registry wherein all site resource managers register them-
sites are noncooperating. When the reallocation time is very selves upon start-up so that the resource managers can
high, the VC for cooperating multisites becomes larger than identify their counterparts. All interactions between the
that of noncooperating sites (normalized value > 1). When resource managers are implemented as Web services [39]
the lease time exceeds about 50 percent of the mean burst that use Simple Object Access Protocol (SOAP) [38]
time, the VCs of cooperating multisites surpass those of messages on HTTP.
noncooperating sites. Fig. 22 validates our analytical model against our
prototype by comparing estimated costs from our model
and the measurements obtained from our prototype. The
7 VALIDATION costs compared are VC (“vc”), RSC (“rsc”), and RC (“rc”).
In this section, we present a validation of our analytical Fig. 22 validates the numerical solution obtained for our
model against a real implementation. We present two analytical model when the workload events are exponen-
scenarios: A and B. Scenario A is used to quantitatively tially distributed. Fig. 23 validates the simulation results
compare the results obtained from our policy evaluation when the workload events are Pareto distributed. We ran
tool to a real implementation. Scenario B illustrates how this experiment for 8,000 seconds (a little over 2 hours).
we could use our policy evaluation tool to compare and
refine policies.

Fig. 21. Reallocation time. Fig. 23. Pareto-distributed events.

at high loads. Fig. 24 shows the total cost incurred by policy

set P S3 from our real implementation. Note that the costs
for policy sets P S1 and P S2 in Fig. 24 are obtained from our
policy evaluation tool.
In summary, our initial experiments show that our tool is
capable of estimating these costs within a precision of about
5 percent. We found that a real experiment has to be run for
at least 1 hour for the workload costs (per unit time) to
converge to some steady value. On the other hand, our
numerical solution can be computed in a couple of seconds,
and our simulation model requires only a few tens of
Fig. 24. Policy refinement. seconds before the cost values converge to a steady-state
value. This makes it possible for a policy maker to
From the experimental run, we measured the workload VC, interactively use this tool for defining and fine-tuning a
the RSC, and the RC. These costs are expressed as the site’s policies.
fraction of time (over this duration of 8,000 seconds)
wherein the workload’s SLA was not met (VC), a remote
node was used (RSC), and the remote node was being 8 CONCLUSION
reallocated (RC). Fig. 22 shows the results obtained from This paper presents an analysis methodology for studying
our analytical model and the real measurements when the the effect of resource sharing policies on a multisite
workload events follow an exponential distribution. Fig. 23 resource management system. We demonstrated that our
shows the results obtained from our simulation model and tool could be very useful for system designers in building
the real measurements when the workload events follow a effective solution strategies and that the methodology can
heavy-tailed Pareto distribution. be incorporated into a planning and system management
tool that would permit policy makers to understand, fine-
7.2 Scenario B
tune, and analyze the effect of policies on the system. We
Scenario B used in our experimental evaluation shows how have presented a validation of our model against a
our policy evaluation tool can be used to compare one or prototype implementation of a multisite resource manage-
more policies and refine them. In this scenario, we have ment system. A summary of the key results obtained by
two sites S1 and S2 , each having three nodes. Each site has using our analysis methodology on multisite resource
the same set of three workloads W1 , W2 and W3 , with the management include the following: 1) moderate mean load
priority of workload Wi being i. For each of the workloads, and high load variance are best handled by cooperating
we used the Trade2 application. The system has two policy sites, 2) a borrow threshold policy is useful only under high
sets: P S1 and P S2 . Policy set P S1 has the following policies: loads; at lower loads, a high borrow threshold tends to
thr br ¼ 1, thr dn ¼ 90 percent, lease time ¼ 600 seconds, increase the aggregate workload costs, and 3) short lease
and preempt remote workload ¼ true. We use thr dn ¼ x times are good for high-priority workloads. In conclusion,
to denote that a site donates nodes to remote workloads our tool is capable of evaluating complex policies on a
only if it its average node utilization is lower than x percent. multisite resource management system and permits inde-
When the preempt remote workload is true, the site can pendent policies for each site so that policy makers can
prematurely break a lease on a node that it has donated to a quickly evaluate several alternatives and their effects on the
remote workload. Policy set P S2 has the following policies: workloads before deploying them.
thr br ¼ 2, thr dn ¼ 80 percent, lease time ¼ 600 seconds, As a part of our future work, we are exploring three
directions. First, we are exploring techniques to extend our
and preempt remote workload ¼ false. When the preempt
policy analysis tool to operate with general distributions for
remote workload is false, a node leased to a remote
node failure and recovery and workload models. We plan
workload cannot be prematurely broken. However, a
to use G/M/1 queuing networks (using embedded Markov
request to extend a lease (for the next 600 seconds) can be
chains [33]) and G/G/1 queuing networks (using well-
declined. known approximations [33]). Second, we plan to explore
Fig. 24 shows the total cost VC þ RC þ RSC for policy the economics of multisite resource allocation wherein each
sets P S1 and P S2 versus the mean load. While P S1 site is operated by competing agents by using game
performs better at lower loads, at higher loads, policy set theoretic techniques. Third, we intend to apply our
P S2 performs better. This is because of the following techniques to perform policy evaluation in other applica-
reasons: 1) at high loads, the number of borrows could be tion domains, in particular policy evaluation for storage
high, thereby increasing the RC, 2) the conservative nature area network file system.
of policy set P S2 reduces the RC (higher borrow threshold
and lower donation utilization threshold), and 3) further-
more, we observed that policy set P S1 donates nodes too ACKNOWLEDGMENTS
often and breaks the lease too often (at high loads), thereby This work was motivated by a larger effort in multisite
increasing its RCs significantly. resource allocation. The authors would like to thank their
The policy evaluation tool helps us identify the crossover colleagues Asit Dan, Daniela Rosu, Vijay Naik, and
point between policy sets P S1 and P S2 . It also helps the Sameh Fakhouri for their specific advice that helped this
administrator construct a policy set P S3 from P S1 and P S2 , work greatly, Mark Squillante for reviewing the work and
which imitates P S1 when the load is low and imitates P S2 giving the authors valuable feedback, Daniel Dias for his

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1366 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008

skillful management support for this effort and for the [28] A.K. Iyengar, M.S. Squillante, and L. Zhang, “Analysis and
Characterization of Large-Scale Web Server Access Patterns and
multisite resource allocation project, and Norbert Vogl for Performance,” World Wide Web J., 1999.
proofreading this paper. [29] J. Kay and P. Lauder, A Fair Share Scheduler. Univ. of Sydney and
AT&T Labs, 1988.
[30] K.H. Kim and R. Buyya, “Policy-Based Resource Allocation in
Hierarchical Virtual Organizations for Global Grids,” Proc. 18th
REFERENCES Int’l Symp. Computer Architecture and High-Performance Computing
[1] Globus Toolkit, http://www.globus.org/, 2008. (SBAC-PAD), 2006.
[2] IBM Data Center Solution, http://www-1.ibm.com/servers/ [31] K.H. Kim and R. Buyya, “Fair Resource Sharing in Hierarchical
eserver/xseries/windows/datacenter.html, 2008. Virtual Organizations for Global Grids,” Proc. Eighth IEEE/ACM
[3] IBM Network Dispatcher Features, http://www-3.ibm.com/ Int’l Workshop Grid Computing (GRID), 2007.
software/network/about/features/keyfeatures.html, 2008. [32] A.J. King and M.S. Squillante, “Optimal Control of Web Hosting
[4] IBM Tivoli Decision Support for OS/390 Capacity Planner Feature Systems under Service-Level Agreements,” IBM Research Report
Guide and Reference, http://publib.boulder.ibm.com/tividd/td/ RC23094 (W0401-145), 2004.
TDS390/SH19-4021-05/en_US/HTML/drlz9mst.htm, 2008. [33] L. Kleinrock, Queuing Systems. Wiley Interscience, 1975.
[5] IBM Tivoli Intelligent Orchestrator, http://www-306.ibm.com/ [34] O. Kremien and J. Kramer, “Methodical Analysis of Adaptive
software/tivoli/products/intell-orch, 2008. Load Sharing Algorithms,” IEEE Trans. Parallel and Distributed
[6] IBM Tivoli Provisioning Manager, http://www-306.ibm.com/ Systems, vol. 3, no. 6, Nov. 1992.
software/tivoli/products/prov-mgr, 2008. [35] A. Leff, J. Rayfield, and D. Dias, “Meeting Service Level
[7] IBM WebSphere Extended Deployment Edition (WebSphere XD), Agreements in a Commercial Grid,” IEEE Internet Computing,
http://www-6.ibm.com/jp/software/websphere/ft/was/xd/ 2003.
pdf/whitepaper.pdf, 2008. [36] H. Ludwig, A. Keller, A. Dan, R. King, and R. Frank, “A Service-
[8] IBM WebSphere Performance Benchmark Sample (Trade2), Level Agreement Language for Dynamic Electronic Services,”
http://www-4.ibm.com/software/webservers/appserv/ Electronic Commerce Research 3, 2003.
wpbs_download.html, 2008. [37] M.S. Squillante, D.D. Yao, and L. Zhang, “Web Traffic Modeling
[9] LPAR: Dynamic Logical Partitioning, http://www-03.ibm.com/ and Server Performance Analysis,” Proc. 38th IEEE Conf. Decision
servers/eserver/iseries/lpar/, 2008. and Control (CDC), 1999.
[10] Platform LSF, http://www.platform.com/, 2008. [38] Simple Object Access Protocol (SOAP), World Wide Web Con-
[11] VMWare ESX Server, http://www.vmware.com/, 2008. sortium (W3C) note, http://www.w3.org/TR/soap, 2008.
[12] WS Agreement Specification, http://www.gridforum.org/, 2008. [39] Web Services, World Wide Web Consortium (W3C) note, http://
[13] E.A. Brewer, “Lessons from Giant-Scale Services,” IEEE Internet www.w3.org/2002/ws, 2008.
Computing, 2001. [40] Y. Yan, A. El-Atawy, and E. Al-Shaer, “Ranking-Based Optimal
[14] G. Cheliotis and C. Kenyon, “Autonomic Economics: Why Self- Resource Allocation in Peer-to-Peer Networks,” Proc. IEEE
Managed E-Business Systems Will Talk Money?” Proc. IEEE Conf. INFOCOM, 2007.
e-Commerce Technology (CEC), 2003.
[15] G. Cheliotisyx, C. Kenyony, and R. Buyya, “10 Lessons from Mudhakar Srivatsa received the BTech degree
Finance for Commercial Sharing of Its Resources,” Peer to Peer in computer science and engineering from the
Computing: The Evolution of Disruptive Technology, IRM Press, Indian Institute of Technology Madras, Chennai,
2004. India, in 2002 and the PhD degree in computer
[16] B. Cooper and H. Garcia-Molina, “Peer-to-Peer Resource Trading science from Georgia Institute of Technology,
in a Reliable Distributed System,” Proc. First Int’l Workshop Peer-to- Atlanta, in 2006. He is currently a research
Peer Systems (IPTPS), 2002. scientist at the IBM T.J. Watson Research
[17] Y. Drougas and V. Kalogeraki, “A Fair Resource Allocation Center, conducting research at the intersection
Algorithm for Peer-to-Peer Overlays,” Global Internet, 2005. of networking and security. His research inter-
[18] C. Dumitrescu and I. Foster, “Usage-Policy-Based CPU Sharing in ests include security and reliability of large-scale
Virtual Organizations,” Proc. Fifth IEEE/ACM Int’l Workshop Grid networks, secure information flow, and risk management. He is a
Computing (GRID), 2004. member of the IEEE.
[19] C. Dumitrescu, M. Wilde, and I. Foster, “A Model for Usage
Policy-Based Resource Allocation in Grids,” Proc. Sixth IEEE Int’l Nithya Rajamani received the BE degree from
Workshop Policies for Distributed Systems and Networks (POLICY), Anna University, Chennai, India and the mas-
2005. ter’s degree in computer science from the
[20] G.S. Fishman, “Discrete Event Simulation,” Springer Series in University of Illinois, Urbana-Champaign. She
Operations Research, 2001. is currently a research developer and a senior
[21] I. Foster, D. Gannon, H. Kishimoto, and J.V. Reich, “Open Grid software engineer at the IBM T.J. Watson
Services Architecture Use Cases,” Global Grid Forum (GGF), Research Center. Her interests include distrib-
information document, Oct. 2004. uted systems, Web services, and, more re-
[22] I. Foster, C. Kesselman, J. Nick, and S. Tuescke, “The Physiology cently, information management and services
of the Grid: An Open Grid Services Architecture for Distributed science.
Systems Integration,” Open Grid Service Infrastructure WG, Global
Grid Forum, 2002.
[23] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid:
Enabling Scalable Virtual Organizations,” Int’l J. Supercomputer Murthy Devarakonda received the PhD degree
Applications, vol. 15, no. 3, pp. 200-222, 2001. in computer science from the University of
[24] I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Djaoui, Illinois, Urbana-Champaign, in 1988. He is
A. Grimshaw, B. Horn, F. Maciel, F. Siebenlist, R. Subramaniam, currently a senior manager and a research staff
J. Treadwell, and J.V. Reich, “The Open Grid Services member in the Services Research Department,
Architecture,” Global Grid Forum (GGF), informational docu- IBM T.J. Watson Research Center, where he
ment, 2005. has worked on distributed file systems, Web
[25] A. Fox, S.D. Gribble, Y. Chawathe, E.A. Brewer, and P. Gauthier, technologies, policy-based systems manage-
“Cluster-Based Scalable Network Services,” Proc. 16th ACM Symp. ment, and services computing. He received
Operating Systems Principles (SOSP), 1997. IBM Divisional Awards for his work on distributed
[26] H. Gimpel, H. Ludwig, A. Dan, and R. Kearney, “PANDA: file systems and global technology outlook development. He is a senior
Specifying Policies for Automated Negotiations of Service Con- member of the IEEE and the ACM.
tracts,” Proc. First Int’l Conf. Service-Oriented Computing (ICSOC),
2003.
[27] Grid Resource Allocation Management (GRAM) Service. Globus,
http://www.globus.org/toolkit/docs/3.2/gram/, 2008.

Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.

Cisco Commands Cheat Sheet
100% (2)
Cisco Commands Cheat Sheet
11 pages
Integrated Software Application 1
67% (3)
Integrated Software Application 1
5 pages
NSE 4 7.0 Sample Questions - Attempt Review2
No ratings yet
NSE 4 7.0 Sample Questions - Attempt Review2
10 pages
QP MEP Q0202 Office Assistant
No ratings yet
QP MEP Q0202 Office Assistant
28 pages
KINGPAK
No ratings yet
KINGPAK
59 pages
CMSC 427: Computer Graphics: David Jacobs
No ratings yet
CMSC 427: Computer Graphics: David Jacobs
40 pages
Sample Deployment Plan
No ratings yet
Sample Deployment Plan
2 pages
EWM HU Label PPF Action and Conditions Technique
50% (2)
EWM HU Label PPF Action and Conditions Technique
3 pages
Apostol Sol
No ratings yet
Apostol Sol
94 pages
Graphic Engines 3
No ratings yet
Graphic Engines 3
8 pages
Workflow Scheduling Algorithms For Grid Computing
No ratings yet
Workflow Scheduling Algorithms For Grid Computing
43 pages
Wimax 8512
No ratings yet
Wimax 8512
5 pages
Designing Conventions For Automated Negotiation: Jeffrey S. Rosenschein and Gilad Zlotkin
No ratings yet
Designing Conventions For Automated Negotiation: Jeffrey S. Rosenschein and Gilad Zlotkin
18 pages
Contractor Tool and Equipment Register
No ratings yet
Contractor Tool and Equipment Register
2 pages
CWNA Guide To Wireless LAN's Second Edition - Chapter 2
No ratings yet
CWNA Guide To Wireless LAN's Second Edition - Chapter 2
46 pages
Introduction To Database Assignment Part1 Group 7
No ratings yet
Introduction To Database Assignment Part1 Group 7
11 pages
AD7730 - EngineerZone PDF
No ratings yet
AD7730 - EngineerZone PDF
23 pages
Mentor Product Description (Including Insight)
No ratings yet
Mentor Product Description (Including Insight)
39 pages
Coolsat Testing
No ratings yet
Coolsat Testing
21 pages
Printer Configuration Demo Manual
No ratings yet
Printer Configuration Demo Manual
6 pages
JD For SCM Functional
No ratings yet
JD For SCM Functional
2 pages
Comp Sci - IJCSEITR - Analyzing Large Scaled Applications - DeEPTI SINGH
No ratings yet
Comp Sci - IJCSEITR - Analyzing Large Scaled Applications - DeEPTI SINGH
8 pages
HMD Global Nokia 2.3 Datasheet
No ratings yet
HMD Global Nokia 2.3 Datasheet
1 page
Css Preprocessor Mixins Notes
No ratings yet
Css Preprocessor Mixins Notes
5 pages
Service Time Distribution of Tasks Using Adapt - Policy: Ramyadevi K, Anitha B
No ratings yet
Service Time Distribution of Tasks Using Adapt - Policy: Ramyadevi K, Anitha B
3 pages
GRID SCHEDULING USING VARIOUS PERFORMANCE MEASURES - A COMPARATIVE STUDY - Ubiquitous Computing and Communication Journal
No ratings yet
GRID SCHEDULING USING VARIOUS PERFORMANCE MEASURES - A COMPARATIVE STUDY - Ubiquitous Computing and Communication Journal
12 pages
Iftikhar Ahmad CV..
No ratings yet
Iftikhar Ahmad CV..
1 page
PMO Delotte Consulting Presentation
100% (1)
PMO Delotte Consulting Presentation
14 pages
Ijctt V3i4p103
No ratings yet
Ijctt V3i4p103
6 pages
How To Block HTTP DDoS Attack With Cisco ASA Firewall
No ratings yet
How To Block HTTP DDoS Attack With Cisco ASA Firewall
4 pages
How To Run Android in Ubuntu
No ratings yet
How To Run Android in Ubuntu
2 pages
SE&PM Module 5
No ratings yet
SE&PM Module 5
50 pages
Economical Job Scheduling in Wireless Grid: M. N. Birje, Sunilkumar S. Manvi, Chetan Bulla
No ratings yet
Economical Job Scheduling in Wireless Grid: M. N. Birje, Sunilkumar S. Manvi, Chetan Bulla
5 pages
Drawing Bathroom
No ratings yet
Drawing Bathroom
3 pages
A Novel Application of Option Pricing To Distributed Resources Management
No ratings yet
A Novel Application of Option Pricing To Distributed Resources Management
8 pages
Measuring The Robustness of Resource Allocations in A Stochastic Dynamic Environment
No ratings yet
Measuring The Robustness of Resource Allocations in A Stochastic Dynamic Environment
10 pages
A Distributed Resource Management Architecture That Supports Advance Reservations and Co-Allocation
No ratings yet
A Distributed Resource Management Architecture That Supports Advance Reservations and Co-Allocation
10 pages
MIP Formulation For Robust Resource Allocation in Dynamic Real-Time Systems
No ratings yet
MIP Formulation For Robust Resource Allocation in Dynamic Real-Time Systems
8 pages
NimrodG An Architecture For A Resource Management and Scheduling System in A Global Computational Grid
No ratings yet
NimrodG An Architecture For A Resource Management and Scheduling System in A Global Computational Grid
7 pages
Matchmaking Distributed Resource Management For High Throughput Computing
No ratings yet
Matchmaking Distributed Resource Management For High Throughput Computing
7 pages
New Sequential and Parallel Algorithm For Dynamic Resource Constrained Project Scheduling Problem
No ratings yet
New Sequential and Parallel Algorithm For Dynamic Resource Constrained Project Scheduling Problem
7 pages
Energy Aware Consolidation For Cloud Computing
No ratings yet
Energy Aware Consolidation For Cloud Computing
5 pages
Devesh Kumar YMA Resume
No ratings yet
Devesh Kumar YMA Resume
3 pages
Google Search Operators
No ratings yet
Google Search Operators
18 pages
Efficient Resource Allocation Algorithm With
No ratings yet
Efficient Resource Allocation Algorithm With
9 pages
IEEE PDGC OnlineVersion
No ratings yet
IEEE PDGC OnlineVersion
7 pages
Architecting High-Scale Metrics with Thanos: The Complete Guide for Developers and Engineers
From Everand
Architecting High-Scale Metrics with Thanos: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Lenses.io for Data Streaming Platforms: The Complete Guide for Developers and Engineers
From Everand
Lenses.io for Data Streaming Platforms: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Apache Superset Essentials: The Complete Guide for Developers and Engineers
From Everand
Apache Superset Essentials: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Moab Cluster Scheduling and Resource Management: Definitive Reference for Developers and Engineers
From Everand
Moab Cluster Scheduling and Resource Management: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Liftbridge Message Streams for Distributed Systems: The Complete Guide for Developers and Engineers
From Everand
Liftbridge Message Streams for Distributed Systems: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Keiko: Design and Algorithms for Distributed Scheduling: The Complete Guide for Developers and Engineers
From Everand
Keiko: Design and Algorithms for Distributed Scheduling: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Comprehensive LSF Administration: Definitive Reference for Developers and Engineers
From Everand
Comprehensive LSF Administration: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
DENT Network Operating System in Practice: The Complete Guide for Developers and Engineers
From Everand
DENT Network Operating System in Practice: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Memphis.dev Essentials: The Complete Guide for Developers and Engineers
From Everand
Memphis.dev Essentials: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Practical Apache Mesos: Definitive Reference for Developers and Engineers
From Everand
Practical Apache Mesos: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Elastic Essentials: Definitive Reference for Developers and Engineers
From Everand
Elastic Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Advanced Resilient Distributed Datasets in Distributed Computing: Definitive Reference for Developers and Engineers
From Everand
Advanced Resilient Distributed Datasets in Distributed Computing: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Practical Redux Engineering: Definitive Reference for Developers and Engineers
From Everand
Practical Redux Engineering: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Mesosphere Architecture and Deployment: Definitive Reference for Developers and Engineers
From Everand
Mesosphere Architecture and Deployment: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Designing Resilient Distributed Systems with CAP: Definitive Reference for Developers and Engineers
From Everand
Designing Resilient Distributed Systems with CAP: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Efficient Workload Management with SGE: Definitive Reference for Developers and Engineers
From Everand
Efficient Workload Management with SGE: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Kestra Pipeline Orchestration Essentials: The Complete Guide for Developers and Engineers
From Everand
Kestra Pipeline Orchestration Essentials: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Resoto for Cloud Resource Automation: The Complete Guide for Developers and Engineers
From Everand
Resoto for Cloud Resource Automation: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Architecting Distributed Applications with Macrometa: The Complete Guide for Developers and Engineers
From Everand
Architecting Distributed Applications with Macrometa: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Cortex for Scalable Multi-Tenant Metrics: The Complete Guide for Developers and Engineers
From Everand
Cortex for Scalable Multi-Tenant Metrics: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Presto in Practice: Definitive Reference for Developers and Engineers
From Everand
Presto in Practice: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
StreamSets Data Integration Architecture and Design: The Complete Guide for Developers and Engineers
From Everand
StreamSets Data Integration Architecture and Design: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
ZooKeeper Systems and Techniques: Definitive Reference for Developers and Engineers
From Everand
ZooKeeper Systems and Techniques: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Trino Distributed SQL Query Engine Essentials: Definitive Reference for Developers and Engineers
From Everand
Trino Distributed SQL Query Engine Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Comprehensive Guide to Mattermost Administration: Definitive Reference for Developers and Engineers
From Everand
Comprehensive Guide to Mattermost Administration: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Polly in Action: Definitive Reference for Developers and Engineers
From Everand
Polly in Action: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
RisingWave for Real-Time Data Processing: The Complete Guide for Developers and Engineers
From Everand
RisingWave for Real-Time Data Processing: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Rocket.Chat Administration and Deployment Guide: Definitive Reference for Developers and Engineers
From Everand
Rocket.Chat Administration and Deployment Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Striim Platform Essentials: Definitive Reference for Developers and Engineers
From Everand
Striim Platform Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Redshift Essentials: Definitive Reference for Developers and Engineers
From Everand
Redshift Essentials: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Distributed Cluster Operations with DC/OS: Definitive Reference for Developers and Engineers
From Everand
Distributed Cluster Operations with DC/OS: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
OpenTracing in Distributed Systems: Definitive Reference for Developers and Engineers
From Everand
OpenTracing in Distributed Systems: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Spacelift Automation and Workflow Design: Definitive Reference for Developers and Engineers
From Everand
Spacelift Automation and Workflow Design: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Debezium in Action: Definitive Reference for Developers and Engineers
From Everand
Debezium in Action: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Commvault Administration and Best Practices: Definitive Reference for Developers and Engineers
From Everand
Commvault Administration and Best Practices: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Practical Observability Engineering with Relic: Definitive Reference for Developers and Engineers
From Everand
Practical Observability Engineering with Relic: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Cohesity Architecture and Administration: Definitive Reference for Developers and Engineers
From Everand
Cohesity Architecture and Administration: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Essential Guide to DataStage Systems: Definitive Reference for Developers and Engineers
From Everand
Essential Guide to DataStage Systems: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Istio in Production Environments: Definitive Reference for Developers and Engineers
From Everand
Istio in Production Environments: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Blue-Green Deployment Engineering: Definitive Reference for Developers and Engineers
From Everand
Blue-Green Deployment Engineering: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
ELT Architecture and Implementation: Definitive Reference for Developers and Engineers
From Everand
ELT Architecture and Implementation: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Superset Data Exploration and Analysis Framework: Definitive Reference for Developers and Engineers
From Everand
Superset Data Exploration and Analysis Framework: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Lustre Administration and Optimization: Definitive Reference for Developers and Engineers
From Everand
Lustre Administration and Optimization: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
DataDog Operations and Monitoring Guide: Definitive Reference for Developers and Engineers
From Everand
DataDog Operations and Monitoring Guide: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Litmus Chaos Experiments in Practice: The Complete Guide for Developers and Engineers
From Everand
Litmus Chaos Experiments in Practice: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Mastering OpenTelemetry: Building Scalable Observability Systems for Cloud-Native Applications
From Everand
Mastering OpenTelemetry: Building Scalable Observability Systems for Cloud-Native Applications
Robert Johnson
No ratings yet
Study Guide Cisco Certified Design Expert (CCDE 400-007) Exam
From Everand
Study Guide Cisco Certified Design Expert (CCDE 400-007) Exam
Anand Vemula
No ratings yet
Edge Cloud Operations: A Systems Approach
From Everand
Edge Cloud Operations: A Systems Approach
Larry L Peterson
No ratings yet
Scalability By Design
From Everand
Scalability By Design
Chukwunonso Offor
No ratings yet
Essays on Infrastructure-as-code
From Everand
Essays on Infrastructure-as-code
Ravi Rajamani
No ratings yet
Dancing on a Cloud: A Framework for Increasing Business Agility
From Everand
Dancing on a Cloud: A Framework for Increasing Business Agility
David Sterling
No ratings yet
Model-Driven Online Capacity Management for Component-Based Software Systems
From Everand
Model-Driven Online Capacity Management for Component-Based Software Systems
André van Hoorn
No ratings yet

A Policy Evaluation Tool For Multisite Resource Management

Uploaded by

A Policy Evaluation Tool For Multisite Resource Management

Uploaded by

1352 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO.

10, OCTOBER 2008

A Policy Evaluation Tool for

E NTERPRISES typically operate multiple data center sites [2],

Fig. 1. Multisite resource management.

4.1 Machine Model 4.5 Site-State Model

nodes currently at site X. Let the number of requested

. P7 . Donate for r0 nodes for workload w, where

denotes the probability (over an extended period of time)

Fig. 4. Unified policy evaluation.

Fig. 8. Accuracy of the policy evaluation tool.

Fig. 6. Workload model: sample transition probability transition matrix.

local reallocation, and nrtsðS; S 0 ; wÞ denotes the number of

Fig. 14. Sensitivity analysis. Fig. 16. Worst-case analysis.

Fig. 18. Adaptive borrow threshold. Fig. 19. Cycle-breaking rule.

Fig. 20. Lease time.

Fig. 22. Exponentially distributed events.

Fig. 21. Reallocation time. Fig. 23. Pareto-distributed events.

at high loads. Fig. 24 shows the total cost incurred by policy

You might also like