A Policy Evaluation Tool For Multisite Resource Management
Abstract—Enterprises typically operate multiple data center sites, each handling workloads according to an enterprise-level strategy.
Sharing resources across multiple sites (or enterprises) brings up several important problems. Each site may have its own policies that
govern its interactions with other remote sites. Different policies impact the system performance in different ways, and site administrators
and system designers need to understand the effects of a given set of policies on potential workloads. In this paper, we describe an
analysis methodology that determines the impact of policies on workloads, and we present results and validation for a prototypical
multisite resource sharing system. Our analytical tool is capable of evaluating complex policies on a large-scale system and permits
independent policies for each site so that policy makers can quickly evaluate several alternatives and their effects on the workloads
before deploying them.
Index Terms—Modeling and prediction, performance of systems, policy impact analysis, resource sharing, distributed systems.
1 INTRODUCTION
However, queuing-network-based analysis faces a state-space explosion problem: the number of states in the queuing network may explode to large numbers, making fast and scalable policy analysis a challenging problem. In this paper, we introduce the notion of near-equivalent states and present a tunable state-space compression algorithm that simultaneously achieves the following goals: 1) reduce the number of states in the queuing network by several orders of magnitude and thereby facilitate fast and scalable policy analysis, 2) retain the accuracy of the steady-state solution to the queuing network and thus preserve the integrity of the overall cost analysis, and 3) identify bottleneck resources in the system and thus help the system administrator perform more intelligent capacity planning and policy tuning. We used our methodology to analyze several multisite resource usage policies. The tool was useful not only in understanding the effects of various policies but also in characterizing the conditions under which a policy would be useful. The key results from our analysis are as follows:

• Multisite resource allocation is effective when individual sites are moderately loaded, they experience high variance in their loads, and only a small subset of sites are heavily loaded.
• Greedy strategies work well when individual sites experience moderate loads and high load variance.
• Allowing low-priority workloads to borrow remote servers is useful, especially when the resource manager uses priority-based preemption to allocate local resources.
• Long lease times hurt high-priority workloads.

We validated our analysis methodology against a real implementation of the multisite resource management system (see Fig. 1). Our prototype comprises a simple two-site scenario. It uses the benchmark J2EE application Trade2 [8] as the workload, Cayuga [35] as the workload manager, and TIO [5] as the resource manager. We observed that our analysis results match the measurements obtained from our implementation to within a 5 percent error.

The rest of this paper is organized as follows: We describe related work in Section 2, followed by an overview of the multisite resource allocation problem in Section 3. We present a concrete model that captures the notion of site, workload, event, policy, and cost in Section 4. We present algorithms for policy evaluation and for combating state-space explosion in Section 5. Section 6 describes experiments that quantify the scalability of our approach, followed by a detailed collection of case studies. Section 7 validates our policy evaluation tool against a real implementation. Finally, we conclude in Section 8.

2 RELATED WORK

Although load sharing in clusters and processor sharing [34], [29] are widely studied, a methodology for analyzing multisite resource sharing under policy constraints is relatively new. The most closely related work is in the field of grid computing [1], [23], [22]. The open grid services architecture (OGSA) [24], [21] develops a framework (for both commercial and scientific grids) to support distributed system integration, virtualization, and management services. A grid resource allocation manager (GRAM) [27] supports locating, submitting, monitoring, and canceling jobs on a grid.

Our work falls under the category of a commercial grid built on top of the IBM TIO [5]. Platform Load Sharing Facility (LSF) [10] offers several infrastructural features similar to the IBM TIO [5]. Platform LSF supports multicluster capabilities, allowing a cluster to span multiple sites in various geographical locations. Similar to TIO, LSF attempts to provide a single computing machinery image across multiple connected hardware clusters. LSF also supports rich multisite resource allocation policies, including job priorities, and preserves local ownership and control. In this paper, we assume that the infrastructure required to support multisite clusters is available (say, using the IBM TIO or Platform LSF). Our goal is to develop fast and scalable algorithms for performance-based policy impact analysis that answer questions such as: What if we use policy P instead of policy Q? What if we change the threshold parameter in policy P from thr1 to thr2? What if site A prefers sharing its resources with site B to site C? What if the sites have different (and possibly conflicting) policies? While Platform LSF supports a rich set of resource sharing policies and tools for monitoring resource utilization, it is not evident that it supports policy impact analysis. The algorithms described in this paper for fast and scalable policy evaluation (e.g., cost analysis and what-if analysis) have been implemented using
Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1354 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008
the IBM TIO, but they may be applicable to several other multisite resource management systems.

Several authors have addressed policy-related issues in virtual organizations (VOs) [18], [30], [31], [19], which may span multiple autonomous organizations. In [18] and [19], the authors introduce usage policies for resources in a grid and evaluate these policies by using simulations and measured validation. The authors model VOs that generate batch jobs to be executed at various resource provider sites. The resource providers accept jobs based on usage policies. The study finds that a policy called commitment-limit is most effective in minimizing response times. The commitment-limit policy specifies that a site should accept a job if a VO's resource usage is below a threshold or if there are idle nodes and the VO's use of resources is below another threshold. In [31] and [19], the authors extend policy-based resource allocation techniques to hierarchical VOs. Our work is motivated by the need to study policy-based work offloading from one site to another in a federated environment by using different system and workload models, and it considers a broad range of policies. We therefore study a different aspect of policy-based resource sharing and provide a complementary set of results. We believe that our methodology is general enough that various usage policies can be modeled and analyzed using our tool.

A significant amount of work has been done in the field of economic models for resource allocation in peer-to-peer networks. In [16], the authors propose trading as a mechanism wherein a site acquires remote resources by trading away its own local resources. Several authors have also studied optimal resource allocation in peer-to-peer networks [40], [17]. Our architecture for multisite resource management is peer to peer; however, unlike peer-to-peer systems wherein the peers may be greedy and compete with each other, our focus is on cooperative multisite resource management. Our work shares a similar spirit with several studies on commercial sharing of IT resources [14], [15].

Many papers describe system architectures required to support cluster-based resource sharing and load balancing for Web servers and application servers. Fox et al. [25] describe the idea of using a separate overflow pool of servers (that are not usually a part of the cluster) for handling bursts in the Web traffic for one application. A report on giant-scale services [13] presents an extensive discussion of Internet-based systems that are primarily single owner and comprise well-connected clusters. The basic model of the giant-scale services implementation is similar to our site resource manager, which attempts to hide node failures and balance traffic. However, the emphasis of that work is on lessons learned from the infrastructural facilities available for effective load balancing and on enhancing the performance of application services such as round-robin DNS and layer-4 and layer-7 Web switches. Our emphasis is on the evaluation of multisite resource sharing policies in a system model ranging from a few small enterprise data centers to giant-scale services.

3 MULTISITE RESOURCE ALLOCATION

In this section, we present the outline of our framework, which can be employed by a resource manager to share resources with its counterparts. In order to preserve the autonomy of each site, all decisions on local resources available at that site should be made by the site's resource manager. Having a centralized resource manager that coordinates the resource managers at each local site might solve this problem in a very similar way that a site's resource manager coordinates various local application managers. However, the resource managers at each site would then lose their autonomy over local resources. Instead, the resource managers at each site coordinate with their counterparts in a peer-to-peer manner. Fundamental primitives that are used by resource managers to interact with their counterparts include resource borrowing and resource donating.

3.1 Resource Borrowing and Donation

In our framework, each site is capable of borrowing and donating resources. When a resource is donated from site S1 to site S2, site S2 can use that resource for a time duration specified by a lease time. At the end of the lease time, site S2 may request a renewal of the lease. Site S1 may renew the lease, depending on its current state. Our resource borrowing and donating differs radically from traditional resource models along two dimensions: resource granularity and time granularity. We assume that the resource granularity is one computing node in a server. There are two primary reasons for choosing this granularity level in a commercial grid: configuring one node to host multiple applications is challenging, and more importantly, for security reasons (accidental information leakage), it may be unsafe to host two applications (especially if they are from different clients) on the same node. Having said that, as isolation and virtualization techniques [9], [11] improve, it would be possible to use finer grained resources. Nonetheless, our policy evaluation tool can be easily extended to accommodate such fine-grained resources.

The time granularity at which a resource is leased could vary from a couple of minutes to hours. This large resource granularity can be attributed to the fact that data centers (sites) typically own tens or even hundreds of nodes. The coarseness in time granularity was chosen because it takes a long time to switch nodes among different applications: e.g., time is required to set up the software stack, provision, and enable the application [6].

For the sake of simplicity, we assume that all updates to the logical state of a resource manager are serialized. This logical state update changes the in-memory state model and adds log entries. These operations are typically fast and thus keep the information lag small. Following the synchronized logical state update, a workflow that executes the actual borrow or donate operation is kicked off. If necessary, this workflow also installs the required software stack, installs the application, and sets appropriate environment variables. The actual state change (triggered by workflows) may take several seconds or a couple of minutes.

3.2 Policies

In view of the resource borrowing and donating model discussed above, a resource manager needs to make the following decisions:

• When can a node be borrowed?
• Where can a node be borrowed?
• When can a node be donated?
• How can lease renewal requests be handled?
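The borrow/donate primitives and the lease-renewal decision described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation; the `Site` and `Lease` classes and their grant/renew rules are hypothetical stand-ins for site-specific policies:

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_nodes: int

    def request_donation(self, borrower: "Site", lease_secs: int) -> "Lease | None":
        # Hypothetical donate policy: grant only if a free node is available.
        if self.free_nodes > 0:
            self.free_nodes -= 1
            return Lease(owner=self, borrower=borrower, lease_secs=lease_secs)
        return None


@dataclass
class Lease:
    owner: Site
    borrower: Site
    lease_secs: int

    def renew(self) -> bool:
        # Hypothetical renewal policy: the owner renews the lease only while
        # it still has spare capacity of its own.
        return self.owner.free_nodes > 0


s1 = Site("S1", free_nodes=2)
s2 = Site("S2", free_nodes=0)
lease = s1.request_donation(s2, lease_secs=3600)  # S1 donates one node to S2
```

After the call, `s1.free_nodes` drops to 1, and the lease can later be renewed or allowed to expire depending on S1's state, mirroring the renewal behavior described above.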
SRIVATSA ET AL.: A POLICY EVALUATION TOOL FOR MULTISITE RESOURCE MANAGEMENT 1355
These policy-based decisions could be performance related (for example, never borrow a node from a remote site when there are free nodes available at the local site) or administrative (for example, site A should never donate/borrow a resource to/from site B). Our framework permits a wide variety of such policies to be deployed by the system. In order to preserve site autonomy, we permit sites to specify their own policies. Our analytical tool is capable of studying the effect of these (possibly) heterogeneous policies on different workloads.

4 MODEL

In this section, we present a concrete model that describes various entities (physical and logical) in multisite resource allocation, including machines, sites, workloads, policies, events, and cost models. We also describe a model for capturing the state of physical entities such as site and workload.

measurements are primarily used to build a workload generator that best simulates the dynamics on a commercial grid.¹ The workload transits between levels, as specified by a transition probability matrix tpm: nlevels × nlevels. The entry tpm(i, j) specifies the probability that a workload transits from level i to level j (1 ≤ i, j ≤ nlevels).

4.4 Workload-State Model

A workload's state is characterized by ⟨lv, nl, nb⟩, where lv denotes the current workload level (as described in the workload model above), nl denotes the amount of local resources that are currently serving this workload, and nb is an array (one element per remote site) that denotes the amount of resources borrowed on behalf of this workload. These resources (local and remote) belong to the pool type required by the workload. A collection of resources (local and remote) is quantified by the three-tuple ⟨nnodes, mm, pt⟩, where nnodes denotes the number of nodes of machine model mm and pool type pt.
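The level-based workload model above is a discrete-state Markov chain driven by tpm. A small sketch of sampling level transitions follows; the 3-level tpm values here are illustrative placeholders, not values from the paper:

```python
import random

# Illustrative 3-level transition probability matrix: tpm[i][j] is the
# probability that a workload at level i moves to level j.
tpm = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
]


def next_level(level: int, rng: random.Random) -> int:
    """Sample the workload's next level from row `level` of the tpm."""
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(tpm[level]):
        cumulative += p
        if r < cumulative:
            return j
    return len(tpm[level]) - 1  # guard against floating-point round-off


rng = random.Random(42)           # fixed seed for reproducibility
trace = [0]                       # start at level 0
for _ in range(5):
    trace.append(next_level(trace[-1], rng))
```

Repeatedly sampling `next_level` yields a workload-level trace whose long-run behavior is governed by the tpm, which is how such a matrix (fitted from measurements) can drive a workload generator.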
4.7 Event Model

Events represent those external changes that would require the system to reallocate or redistribute its resources among workloads in order to meet its business objectives. We focus on two major types of events in our framework: 1) a workload event occurs when a workload requirement moves from one level to another, and 2) a node event occurs when a node fails (or is moved into maintenance mode) or when a node is reinstated into the system upon recovering from failure (or upon maintenance completion). A node event also occurs when a node is borrowed or donated and when the lease on a borrowed node expires.

Given a state S, there is a set of events E that could potentially occur when the system is in state S. For each event e ∈ E, the probability distribution of the time for event e to fire is assumed to be known. For workload events, the probability distribution of the time to the next event is obtained from the workload model. We assume that nodes fail (and recover) independently and that the time to failure (and recovery) follows an exponential distribution or the Weibull (bathtub) distribution.

4.8 Cost Model

We have developed a cost model for evaluating the effect of various policies on the system. In general, our cost model permits cost functions that are arbitrary functions of the system's state and state transitions. In this section, we present a sample cost model that considers three important costs:

1. Violation cost (VC). This cost represents the cost of violating a workload's SLA.
2. Remote node cost (RSC). This cost represents the cost of using a remote resource.
3. Reallocation cost (RC). This cost represents the initial setup and provisioning cost for a workload.

Additionally, one could include the cost of operating a node measured in terms of power (typically dominated by cooling costs), physical space, and man-hours for maintenance.

We use VC(S, w) to denote the VC for workload w when the system is in state S. A popular choice for estimating the VC would be to make it proportional to nds(S, w), where nds(S, w) denotes the difference between the number of nodes required for workload w and the number of nodes actually allocated to workload w (inclusive of local and remote nodes) in state S. In this case, the VC would be expressed as a penalty per deficit node per unit of time in the workload's SLA.

We use RSC(S, w) to denote the cost of using a remote node for workload w when the system is in state S. A common choice for estimating the RSC would be to make it proportional to nbs(S, w), where nbs(S, w) denotes the number of remote nodes borrowed for workload w in state S. In this case, the RSC would be expressed as a penalty per remote node per unit of time in the site's policy set. Additionally, one might choose to distinguish between nodes borrowed from different sites. In that case, the RSC for different nodes would depend on the site from which those nodes were borrowed.

We use RC(S, S′, w) to denote the RC for workload w when the system makes a transition from state S to state S′. The RC would typically depend on at least three factors: whether the reallocated node was idle or running some workload, whether the reallocated node was local or remote with respect to the workload, and what the cost of provisioning and setting up the workload is. First, the RC would be higher if the node was previously running some workload (say, due to the time expended in simply shutting down that workload on the node); also, such a cost might depend on the workload (if any) that was previously running on the node. Second, the RC for a local node is likely to be much smaller than that for a remote node. This is primarily because a local reallocation does not have to go through agreements between different sites [4]. Support for automated negotiations across sites is provided by PANDA [26] and WS agreements [12]. Third, the most important component of reallocation is the cost of provisioning and setting up the environment required for the workload on the reallocated node. This might involve additions to the node's software stack, starting the workload's runtime environment, setting up database connections, etc.

4.9 Sample Policies

In this section, we discuss sample policies that are permissible in our model. We present policies that aid the system in responding to the following questions: When can a node be borrowed? Where can a node be borrowed? When can a node be donated? For the sake of simplicity, in the following discussion, we assume that workloads are prioritized in decreasing order of their VCs (VCs are typically specified in dollars).

4.9.1 When to Borrow a Node?

Let an event e denote the fact that a workload w's requirement has increased. Then, a borrow policy could be defined as a collection of policies shown as follows, with A denoting the site to which workload w belongs:

• P1. Always allocate idle nodes in preference to other nodes.
• P2. Always use local nodes in preference to remote nodes.
• P3. If site A has currently donated nodes to some remote workload rw, then preempt remote workload rw if its priority is smaller than that of workload w.
• P4. If site A had allocated nodes to some local workload lw, then preempt local workload lw if its priority is smaller than that of workload w.
• P5. A workload w is eligible for using a remote node only if its priority is greater than a minimum threshold.

4.9.2 From Whom to Borrow a Node?

Let us suppose that a workload w has qualified to borrow a remote node. Then, the site from which it borrows a node could be determined by the following policy. Let nfX denote the number of free nodes currently at site X:

• P6. Borrow a node from the site that has the maximum number of free nodes.

4.9.3 When to Donate a Node?

Let us suppose that site X has received a request for donating a node. Then, the site grants the request based on the following policies. Let nfX denote the number of free
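The borrow policies above compose into a single decision procedure. The following is a minimal sketch under simplifying assumptions; the `SiteView` structure and the composition order are hypothetical, and the preemption policies P3/P4 are omitted for brevity:

```python
from dataclasses import dataclass


@dataclass
class SiteView:
    name: str
    free_nodes: int   # idle nodes of the required pool type
    is_local: bool    # local to the requesting workload's site


def pick_node_source(sites: list[SiteView], priority: int, min_priority: int):
    """Choose where to allocate a node for a workload, following P1/P2/P5/P6.

    P1: prefer idle nodes; P2: prefer local nodes over remote ones;
    P5: a workload may go remote only if its priority exceeds a threshold;
    P6: among remote sites, borrow from the one with the most free nodes.
    """
    local = [s for s in sites if s.is_local and s.free_nodes > 0]
    if local:                      # P1 + P2: an idle local node wins
        return local[0]
    if priority <= min_priority:   # P5: not eligible to borrow remotely
        return None
    remote = [s for s in sites if not s.is_local and s.free_nodes > 0]
    if not remote:
        return None
    return max(remote, key=lambda s: s.free_nodes)  # P6


sites = [SiteView("A", 0, True), SiteView("B", 3, False), SiteView("C", 5, False)]
choice = pick_node_source(sites, priority=7, min_priority=5)
```

With no free local nodes and a sufficiently high workload priority, P6 selects the remote site with the most free nodes (site "C" in this toy input).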
5 ANALYSIS

In this section, we present techniques for analyzing a complex policy-driven system. We incorporate policies into the system by modeling the system dynamics as a finite-state automaton. Then, we superimpose a queuing network model on the automaton that labels state transitions in the automaton with probability distribution functions. We then solve the model by using numerical techniques or simulations and use this solution to estimate workload costs (VC, RSC, and RC).

5.1 Finite-State Automaton

We model the dynamics of a complex policy-driven system as a nondeterministic finite-state automaton. The automaton is constructed automatically from the individual sites' policy sets, which are defined according to the policy model. States in the automaton correspond to the system states. State transitions are triggered by events. We use P to denote the collection of policy sets from all sites in the system. The collective policy set P determines how the system responds to events. More specifically, policies serve three critical purposes: 1) to determine whether a given state S is legally permissible or not (we call a state S illegal if it does not conform to the policy set P), 2) to determine whether a transition T between two states S and S′ is permissible or not (we term a transition, that is, a response to an event, illegal if the event is not handled in a way that conforms to policy set P), and 3) to determine the probability that the system makes a transition T from state S to state S′ in response to a given event e. We illustrate the role of policies in generating the finite-state automaton description for the system using three examples:

• Policy p1: site A never borrows a remote node for workload w. Given a state S, its legality can be tested (with respect to policy p1) by ensuring that the workload state for w indicates that nb (the number of nodes borrowed by workload w) is equal to zero.
• Policy p2: site A borrows a remote node for workload w from the remote site that currently has the maximum number of idle nodes (of the same server pool type as that required for workload w). Given a transition T from state S to state S′, its legality can be tested (with respect to policy p2) as follows: if workload w has borrowed a node in transition T, then the state S′ should indicate that the borrowed node belongs to the site that has the maximum number of idle nodes in state S.
• Policy p3: suppose that site A has to borrow a remote node for workload w, and let nfX denote the current number of idle nodes at site X; then, site A borrows a node from site X with a probability proportional to nfX. Policies like p3 permit policy makers to add randomization techniques that are popularly used as heuristics for performance enhancement.

Fig. 2. Policy-guided state-space exploration.

We encode policies into the function P(S, e), as discussed in the policy model. Recall that P(S, e) determines how the system responds to event e when it is in state S. We now present a simple technique for constructing the finite-state automaton from the site models and their policies. We use the algorithm in Fig. 2, which uses a policy-driven technique for constructing the automaton. The algorithm starts with some valid (or legal) state S0 and explores the state space in a policy-driven manner. This technique allows us to automatically generate the automaton from the site models and their policies. By construction, every state S in the automaton is legal, and every transition T in the automaton is permissible. The probability of a transition is handled in the queuing network model that is juxtaposed on this finite-state automaton. We observed that this policy-guided state-space exploration technique speeds up automaton construction drastically. This is because, among all the combinatorial choices that could potentially represent a state, very few (< 0.1 percent) actually conformed to our policy set. For instance, simple policies like "do not starve a workload whenever an idle node (of the required pool type) is locally available," "use local nodes in preference to remote nodes," and "use priority-based preemptive scheduling to manage local nodes" tend to drastically limit the number of states. Hence, pruning the state space by using a policy-guided technique turned out to be highly beneficial.

5.2 Queuing Network Model

We superimpose a queuing network model on the finite-state model to annotate state transitions with their probability distribution functions. A transition T : S → S′ on an event e is labeled with a tuple ⟨fe, pr⟩. The function fe describes the probability distribution of the event e that causes the system to transit from state S to state S′. The probability pr denotes the probability that the system transits from state S to state S′ in response to event e. For node events, the function fe is an exponential distribution. For workload events, the function could be either an exponential distribution or a Pareto distribution. The exponential distribution is amenable to numerical analysis and thus provides fast (though crude) results, while the Pareto distribution captures more realistic bursty workload characteristics [37].

A solution to the above model gives us pr(S) for all states S and rate(T) for all transitions T, where pr(S)
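The policy-guided state-space exploration of Section 5.1 (Fig. 2) is essentially a breadth-first search that expands only policy-conforming successors. A generic sketch follows, with the `events` and `apply_policy` callbacks as hypothetical stand-ins for the event set and the policy function P(S, e):

```python
from collections import deque


def explore(s0, events, apply_policy):
    """Policy-guided state-space exploration, starting from a legal state s0.

    events(s)          -> iterable of events that can fire in state s
    apply_policy(s, e) -> successor state chosen by the policy set P(S, e),
                          or None if the event cannot be handled legally
    Returns the set of reachable legal states and permissible transitions.
    """
    states = {s0}
    transitions = set()
    frontier = deque([s0])
    while frontier:
        s = frontier.popleft()
        for e in events(s):
            t = apply_policy(s, e)
            if t is None:          # transition would violate the policy set
                continue
            transitions.add((s, e, t))
            if t not in states:    # expand each legal state exactly once
                states.add(t)
                frontier.append(t)
    return states, transitions


# Toy model: state = number of borrowed nodes, capped at 2 by policy.
states, trans = explore(
    0,
    events=lambda s: ["borrow", "release"],
    apply_policy=lambda s, e: (s + 1 if e == "borrow" and s < 2
                               else s - 1 if e == "release" and s > 0
                               else None),
)
```

Because illegal states and transitions are never generated, only the tiny policy-conforming fraction of the combinatorial state space is ever materialized, which is the source of the speedup reported above.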
of the policy evaluation tool ðrunT ime < thrÞ, or the total
The unified policy evaluation algorithm also permits the
number of iterations ðnumItr < thrÞ.
When the policy evaluation tool terminates, we are left administrator to carry out sensitivity analysis effectively.
with important states and only relevant components in the We study techniques to perform sensitivity analysis with
state vector. For example, consider a computation-intensive respect to a system model, a workload model, and a cost
application where disk I/O is irrelevant. Let us suppose that model. Let us suppose that we have evaluated a system
a normalized network resource initially had four classes model SM, a workload model W M, and a cost model CM
(0, 32 kilobits per second (Kbps)), (32 Kbps, 64 Kbps), under the policy set P to obtain CSS (coalesced states). Let
(64 Kbps, 96 Kbps), and (96 Kbps, 128 Kbps). At the end of us now suppose that we make a small change 4SM in the
the algorithm, we will be left with states wherein the system model. Recall that SM is a state vector and 4SM
network resource has only one class (0, 128 Kbps). When any denotes the change in the system model and is also
resource spans its full range, the resource becomes irrelevant represented as a huge state vector such that the new
(irrespective of the quantity of that resource, the system system model SM 0 ¼ SM þ 4SM (vector addition). We use
behaves identically). Hence, we replace such resources with 4SM and update every state S in CSS to obtain
a special class “,” meaning that the resource is not S 0 ¼ S þ 4SMS, where 4SMS denotes a projection of
important for the given application, the policy set, and the 4SM on S. For example, if S has an on the disk I/O
cost function. In general, we start with an arbitrarily large component in its vector, then the disk I/O component in
number (millions) of highly fine-grained states. Our algo- 4SMS is also replaced by . If 4SM involved a machine
rithm not only provides a steady-state solution to the with 300 MIPS replaced by a machine with 600 MIPS and
queuing network model but also reduces the number of the state S had its CPU resource vector marked with the
states to only a few important ones (a few tens, depending on range (0, 1,000) MIPS, then 4SMS is (0, 1,000) MIPS.
the distance and cost thresholds). Clearly, a project of 4SM on S eliminates all changes that
5.4 Unified Policy Evaluation Algorithm are irrelevant to S. Finally, we compute important changes
in the system model
P with respect to the policy set S as
In this section, we present our complete algorithm for policy 0
imp change ¼ SS2CSS distanceðS; S Þ. If imp change is
evaluation. The algorithm takes a system model SM, a
workload model W M, a cost model CM, and a policy set P larger than a threshold, we solve the system afresh by
as its input. The output of the algorithm is a probability distribution function of the overall system cost, namely, Pr(cost = x) for all x, where Pr(cost = x) denotes the probability that the overall system cost is equal to x. Note that, given the overall cost distribution, one could easily measure the average cost and its higher order moments (such as the standard deviation). The unified policy evaluation algorithm is shown in Fig. 4. We use a policy-guided state-space generation technique to generate the state space (the algorithm in Fig. 2). We then coalesce the states, depending on the distance and cost thresholds, and solve the coalesced state space CSS for a steady-state solution. The steady-state solution gives pr(S), the probability that the system is in state S, for all S ∈ CSS. Now, we obtain the cost of each state S by using a cost model and translate the probability distribution over the state space into a probability distribution over the overall system cost.

The cost distribution allows the administrator to perform worst-case analysis. For example, there could be two policy sets P and Q such that the average cost of P is lower than that of Q. However, P might have a state S with pr(S) > 0 whose cost(S) is greater than the cost of every state in the state space of policy Q. The administrator can make such an observation by looking at the probability distribution plots of the overall system cost for policy sets P and Q. This enables the administrator to make much sounder decisions about which policy set should be chosen for the system.

using the algorithm in Fig. 4; else, we simply construct the new set of coalesced states CSS′ by replacing every state S in CSS by S′. We use CSS′ to obtain the cost distribution. We use the same technique described above for sensitivity analysis toward the workload model. However, one cannot use the same technique for the cost model, since states may be coalesced using a cost threshold. We therefore start with the uncoalesced state space SS and evaluate cost(S) for every S in SS by using the cost models CM and CM′. If the mean difference

    (1 / |SS|) Σ_{S ∈ SS} |cost_CM(S) − cost_CM′(S)|

is smaller than a threshold, then we assume that this change in the cost model does not change the set of coalesced states. In this case, we use the same CSS to reevaluate the cost distribution (step 3 of the algorithm in Fig. 4); else, we rerun the algorithm from step 2.

6 RESULTS

In this section, we present several results obtained using our analytical tool to study various policies for multisite resource management. For every experiment, we use a different scenario that best highlights the inferences that we draw from it. A scenario is described using the site and workload models used to perform the experiment. We use a small set of site types and workload types in all our experiments; we first describe them in Figs. 5 and 6. For example, Fig. 5 shows that a site of type ST1 runs a
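The translation from the steady-state probabilities pr(S) to the overall cost distribution Pr(cost = x) described above can be sketched in a few lines of Python. This is a minimal illustration under our own naming (the states list, pr map, and cost function are hypothetical stand-ins for the tool's internal data structures), not the tool's actual implementation:

```python
from collections import defaultdict

def cost_distribution(states, pr, cost):
    """Fold the steady-state probabilities pr(S) into a distribution
    over the overall system cost: Pr(cost = x) is the sum of pr(S)
    over all states S with cost(S) = x."""
    dist = defaultdict(float)
    for s in states:
        dist[cost(s)] += pr[s]
    return dict(dist)

def mean_cost(dist):
    """Average cost under the distribution."""
    return sum(x * p for x, p in dist.items())

def worst_case(dist, eps=1e-12):
    """Largest cost value that occurs with nonzero probability."""
    return max(x for x, p in dist.items() if p > eps)
```

With two toy policies, this reproduces the worst-case situation discussed above: a policy whose average cost is lower but whose worst-case cost is higher than a competitor's.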
Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on April 20,2010 at 01:00:27 UTC from IEEE Xplore. Restrictions apply.
1360 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 19, NO. 10, OCTOBER 2008
workload of type WT1 and has three local nodes of pool-type zero, a site of type ST2 runs a workload of type WT2 and has one local node of pool-type zero, and a site of type ST3 runs two workloads, both of type WT2, and has two local nodes of pool-type zero.

We now describe the first workload type, WT1. A workload of type WT1 has five levels, numbered 0, 1, 2, 3, and 4. At level i, the workload requires i nodes. At each level i, the workload spends an exponentially distributed amount of time, with a mean of 100 time units. The default transition probability matrix of the workloads is shown in Fig. 6. In our experiments, we vary this matrix to change the mean load and the load variance. In this example, the mean load is about two nodes. When averaged over a long period of time, the workload is in level 2 for about 66 percent of the time, in levels 1 and 3 each for about 13 percent of the time, and in levels 0 and 4 each for about 4 percent of the time. A workload of type WT2 has three levels, numbered 0, 1, and 2. At level i, the workload requires i nodes. At each level i, the workload spends a Pareto-distributed amount of time, with a mean of 100 time units and infinite variance. In our experiments, we use different transition probability matrices to achieve different mean loads and load variances.

For all workloads, we used the following simplified cost model. We assume that every workload w has a priority denoted by priority(w). The VC parameter VC(S, w) = nds(S, w) × priority(w), where nds(S, w) denotes the difference between the number of nodes required by workload w and the number of nodes actually allocated to workload w (local and remote nodes inclusive) in state S. The RSC parameter RSC(S, w) = 0, that is, there is no penalty for using a remote node. The RC(S, S′, w) is defined to be equal to the VC experienced by the workload during the reallocation process. Based on our measurements on our prototype (see Section 6.1), we observed that a reallocation involving a local node took two time units and one involving a remote node took 12 time units. Hence, RC(S, S′, w) = nlts(S, S′, w) × priority(w) × 2 + nrts(S, S′, w) × priority(w) × 12, where nlts(S, S′, w) denotes the number of nodes involved in local transfers and nrts(S, S′, w) the number of nodes involved in remote transfers.

6.1 Scalability Experiments

In this section, we present performance and scalability results for our policy evaluation tool. We first study the effect of the distance and cost thresholds on the performance of the policy evaluation tool and then show the ability of our evaluation tool to scale with the number of sites. Fig. 7 shows the fraction of remaining states for different values of the distance and cost thresholds. Note that only important states are left behind when our policy evaluation tool terminates. Recall that the higher the distance and cost thresholds, the higher the probability that states are coalesced. Note that the number of important states drops steeply as the threshold values are increased. This is primarily because most of the system states are indeed equivalent to one another and can be coalesced. For example, in a transactional grid, the cost model is independent of the disk I/O utilization. Hence, in all the final states, the disk I/O part of the state vector would be eliminated (replaced by a "don't care" entry).

Fig. 8 shows the accuracy of the evaluation tool for different values of the distance and cost thresholds. Accuracy is measured as the ratio of the estimated system cost with state coalescing to that without coalescing. In Fig. 7, as the threshold is increased, the number of remaining states decreases sharply. However, the accuracy of our policy evaluation tool decreases only marginally with the distance and cost thresholds.

Fig. 9. Runtime.

Fig. 9 shows the time taken for policy evaluation for different values of the distance and cost thresholds. The time that the policy evaluation tool takes to terminate falls very sharply with the threshold values. This is primarily because of the reduced number of states. Note that the number of equations to be solved in order to obtain a steady-state solution for the queuing network model is proportional to the number of states, and the cost of solving a system of n linear equations is proportional to n^2.

Fig. 10 shows the scalability of the system with the number of sites for certain values of the distance and cost thresholds. Note that when the threshold values are zero, the time that it takes for the policy evaluation tool to terminate
SRIVATSA ET AL.: A POLICY EVALUATION TOOL FOR MULTISITE RESOURCE MANAGEMENT 1361
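The simplified cost model used in the experiments lends itself to a direct sketch. The Alloc record below is our own hypothetical encoding of the per-workload slice of a state S (it is not the tool's actual state representation); the constants 2 and 12 are the measured local and remote reallocation times quoted above:

```python
from dataclasses import dataclass

@dataclass
class Alloc:
    """Per-workload view of a state S (hypothetical representation):
    nodes required, nodes allocated locally and remotely, priority."""
    required: int
    local: int
    remote: int
    priority: int

def vc(a: Alloc) -> int:
    """VC(S, w) = nds(S, w) * priority(w), where nds is the shortfall
    between the nodes required and the nodes actually allocated
    (local and remote inclusive)."""
    nds = max(a.required - (a.local + a.remote), 0)
    return nds * a.priority

def rc(nlts: int, nrts: int, priority: int) -> int:
    """RC(S, S', w) = nlts * priority * 2 + nrts * priority * 12:
    a local transfer costs 2 time units, a remote transfer 12."""
    return nlts * priority * 2 + nrts * priority * 12
```

For instance, a workload of priority 3 that needs four nodes but holds only three (two local, one remote) incurs a VC of 3 per unit time until the shortfall is resolved.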
Fig. 10. Scalability with the number of sites.

almost increases exponentially with the number of sites. This is because the number of possible system states increases exponentially with the number of sites, thereby severely limiting the scalability of the policy evaluation tool. However, as we raise the threshold, the policy evaluation tool is much better equipped to handle a system with a larger number of states.

6.2 Case Study

In this section, we present a collection of case studies on multisite resource allocation. Even though our policy evaluation tool allows different sites to use different cost models, in this section, we assume that all sites use the same cost model.

6.2.1 When Is Multisite Resource Allocation Useful?

In this experiment, we identify the workload characteristics that make multisite resource allocation a better choice than independent, noncooperating sites. This comparison is achieved by explicitly comparing the aggregate VC of cooperating versus noncooperating sites. The workload characteristics of primary interest to us are the mean load and the load variance.

Scenario 1. Two sites S1 and S2 are both of type ST1. Site S1 has one workload W1 of type WT1. Site S2 has one workload W2 of type WT1. Both workloads W1 and W2 have the same priority.

Fig. 11. VC versus mean load.

Fig. 11 shows the VC as the total mean load of workloads W1 and W2 varies (under a fixed variance = 1). "lvc" denotes the VCs when the sites operate without cooperating with each other (they optimize resource allocations locally). "vc" denotes the VCs when the sites cooperate with one another and borrow/donate nodes to handle peak loads. Fig. 11 can be divided into three zones:

1. Light loads. In this zone, there is not much need to borrow resources, and hence, cooperating multiple sites does not reduce the aggregate VC significantly.
2. Heavy loads. In this zone, the system (the collective resources available at all sites) is insufficient, and hence, cooperation does not yield significant gains (unless the workloads vary largely in terms of their priorities).
3. Moderate loads. In this zone, sites can offload their peak demands to free nodes available at remote sites, thereby achieving much lower VCs.

Fig. 12. VC versus load variance.

Fig. 12 shows the aggregate VC as the load variance changes (at a fixed mean load = 4). As the variance increases, the workloads spend most of their time in a state where they need four nodes or in a state where they need just zero or one node. At lower load variances, the workloads spend a significant portion of their time close to their mean, that is, where each workload needs two nodes. As the variance increases, cooperating multisites can handle the peak demands at one site by borrowing resources from the other site: although the variance is high, the peak demands at the two sites are likely to be uncorrelated. Furthermore, as the variance increases, the workloads spend much more time in a state where they require four nodes and in a state where they require zero nodes. Hence, as the variance increases, cooperating multisites become a much better choice for resource allocation than noncooperating sites.

Scenario 2. Two sites S1 and S2 are both of type ST5. Site S1 has one workload W1 of type WT1. Site S2 has one workload W2 of type WT1. Both workloads W1 and W2 have the same priority.

Fig. 13. Unevenly loaded sites.

In this experiment, we demonstrate the usefulness of multisite resource allocation when the sites are unevenly loaded. Fig. 13 shows the aggregate VCs when the sites are unevenly loaded. We fix the mean load on site S1 to be
1.5 nodes and vary the mean load on site S2. When the load on site S2 is very low, cooperating multisites do not have any advantage. However, as the load on S2 increases, site S2 can offload some of its load to site S1. But, when site S1 is loaded to its maximum capacity, it can no longer accept load from site S2. Hence, the difference between "lvc" and "vc" stops diverging once the workload on site S2 soaks up all the resources available at site S2 and the unused resources at site S1.

Fig. 14 shows how our evaluation tool can be used to perform a simple sensitivity analysis of the overall system cost as we vary the workload model parameters. We vary the mean load and the load variance (keeping the mean load constant) and study the effect on the system cost. On the x-axis, we show the factor by which a workload parameter is changed, and the y-axis shows the corresponding factor by which the overall system cost changes. Fig. 14 shows that keeping the mean load constant and increasing the variance by 20 percent (a factor of 1.2) increases the overall cost to the same extent as increasing the mean load by 12 percent to 14 percent.

6.2.2 Borrowing Remote Nodes

In this section, we study the effect of the borrow threshold thr_br on the aggregate VC. Note that when a borrow threshold is enforced, only workloads w with priority(w) ≥ thr_br are permitted to borrow remote nodes. The first experiment on the borrow threshold shows that a borrow threshold is useful only when the system (inclusive of all sites) is operating at a high mean load. The second experiment shows how our analytical tool can be used to perform worst-case analysis.

Scenario 3. Four sites S1, S2, S3, and S4 are all of type ST2. Site Si has one workload Wi of type WT2 for 1 ≤ i ≤ 4. The priority of workload Wi is i (1 ≤ i ≤ 4).

Fig. 15. Effect of borrow threshold.

Fig. 15 shows the aggregate VC versus the aggregate mean load for different values of the borrow threshold. "prj" indicates that only workloads {Wi : i ≥ j} can borrow remote nodes. The values shown in the figure are normalized by "pr1," which indicates the VC when all workloads can borrow remote nodes. At very low loads, there is no need to borrow nodes, and hence, the threshold has no effect on the VC. As the mean load increases, there is an opportunity to offload the peak demand by borrowing remote nodes; setting a threshold inhibits the system from exploiting the free nodes available at remote sites for lower priority workloads. However, at very high loads, using a borrow threshold is very useful: since lower priority workloads are not allowed to borrow nodes, little time is wasted in reallocating resources among different workloads.

Fig. 16 shows the cost distribution of two policies P1 and P2 under a heavy mean load of 8. P1 uses a borrow threshold of 4, and P2 uses a borrow threshold of 2. Observe in Fig. 15 that the average cost of P1 is lower than the average cost of P2. However, the cost distribution in Fig. 16 shows that the worst-case cost of P1 is higher than the worst-case cost of P2. This is primarily because, under policy P1, three workloads are never permitted to borrow resources, and thus, P1 incurs a higher cost when the load due to workload W4 is low and the loads of the rest (W1, W2, and W3) are high. If the system administrator is interested in worst-case costs, then the administrator can graphically view the cost distributions before deciding on the appropriate policy.

The second experiment on the borrow threshold shows the effect of a local resource allocation strategy. In this scenario, the resource manager uses a priority-based preemptive resource allocation strategy for managing local resources.

Fig. 17. Effect of the local optimization strategy.

Fig. 17 shows the VC versus the mean load for different values of the borrow threshold. "prj" indicates that only workloads {Wi : i ≥ j} can borrow remote nodes. The values shown in the figure are normalized by "pr1," which indicates the VC when all workloads can borrow remote nodes. The main emphasis of this experiment is that one
needs to be careful in choosing borrow thresholds. For instance, "pr3" and "pr4" in the figure behave much worse than "pr1" under all values of the mean load (even under high loads, as compared to Fig. 15). This is because, in this scenario, the workloads W3 and W4 never need to borrow nodes. The local optimization cycle always grabs a node from a local lower priority workload and transfers it to a higher priority workload. Since W3 and W4 require no more than two nodes, they are always guaranteed to be allocated local nodes. Hence, "pr3" and "pr4" are equivalent to the case where the sites are noncooperating (W3 and W4 never need to borrow nodes, whereas W1 and W2 are not permitted to borrow nodes).

The third experiment on the borrow threshold shows the effect of threshold adaptation. In threshold adaptation, the threshold value is decayed by a constant decay factor on every remote optimization cycle. However, it is reset to its original (default) value whenever a borrow operation fails to obtain a remote node. When the system is heavily loaded, the borrow requests of lower priority workloads are very likely to fail, and hence, the borrow threshold stays close to its default value. On the other hand, if the system is lightly loaded, most borrow requests succeed in fetching a free remote node. Therefore, at low loads, the borrow threshold would be very low, and thus, most workloads would be permitted to borrow nodes. We use the same scenario as in Scenario 3, described earlier in this section. Fig. 18 shows the VC versus the mean load for different values of the decay factor. At very low loads, the decay factor has no influence on the VC, since the individual sites have sufficient resources to handle their peak demands. When the decay factor is very close to one, we are highly conservative in permitting lower priority workloads to borrow nodes; thus, high decay factors tend to perform poorly at moderate loads (underutilized resources). When the decay factor is low, we encourage lower priority workloads to borrow nodes; thus, low decay factors tend to perform poorly at high loads (thrashing due to frequent reallocation).

6.2.3 Cycle-Breaking Rule

Scenario 4. Four sites S1, S2, S3, and S4 are all of type ST2. Site Si has one workload Wi of type WT2 for 1 ≤ i ≤ 4. The priority of workload Wi is i (1 ≤ i ≤ 4).

A cycle-breaking rule is a policy added to improve the system's stability, that is, "no local workload with a higher priority is executed on a remote node while a remote workload with a lower priority is being run on a local node." The key motivation behind a cycle-breaking rule is as follows: Let A → B denote that site A is using some nodes from site B. Let wpA denote the priority of the workloads that belong to site A and are currently running on nodes in site B (and similarly for wpB). Then, the cycle-breaking rule requires that wpA > wpB. Clearly, if the cycle-breaking rule is strictly implemented, then there can be no cycles in the resource-borrowing graph; that is, site A will not be using the resources at site B while site B is using the resources at site A at the same time instant. On the contrary, if we assume that there exists a cycle A → B → ⋯ → A, then it would mean that wpA > wpB > ⋯ > wpA, which is an obvious contradiction.

Fig. 19 compares the RSC ("rsc") and the VC ("vc") with and without the cycle-breaking rule for different values of the mean load. The figure shows the ratio of each cost with the cycle-breaking rule to that without the rule. At very low loads, no borrow operations are required, as the sites are self-sufficient. As the mean load increases, many more nodes may be borrowed. The cycle-breaking rule decreases the number of borrow operations and thus ensures that the RSC and the RC are substantially smaller. At high loads, there are no free nodes available that could be borrowed at either site. Hence, at very high loads, the number of borrow operations comes down, and consequently, the cost difference between a policy with and without the cycle-breaking rule decreases.

6.2.4 Lease Time

Scenario 5. Two sites S1 and S2 are each of type ST4. Site S1 has one workload W1 of type WT2, and site S2 has one workload W2 of type WT2. The priority of workload Wi is i (1 ≤ i ≤ 2).

This section studies the effect of the lease time on the workload VCs. Lease-time-based policies allow resources to be borrowed or donated for a fixed period of time, namely, the lease time. Leases are nonpreemptable; that is, once a resource is leased, the donor cannot withdraw that resource before the specified lease time. On the other hand, the site that borrowed a resource may return the resource before its lease terminates.

Fig. 20 shows the VC of the two workloads as the lease time increases. Note that leases are nonpreemptable, but a borrowed resource can be returned before a lease expires (if the borrowing site decides that the borrowed node is no longer required). Consider a scenario wherein the lower priority workload W1 has borrowed a node. Now, if the higher priority workload W2 needs a node, it has to wait till the lease expires. When the lease expires, W1 is denied a lease extension, and the node is assigned to W2. Hence, as the lease duration increases, the VC for higher priority
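The no-cycle property that the cycle-breaking rule guarantees can also be checked directly on the resource-borrowing graph. The sketch below is our own illustration (the site identifiers and the borrows map are hypothetical, not part of the tool): adding the edge borrower → donor closes a cycle exactly when the donor can already reach the borrower along existing borrow edges.

```python
def would_create_cycle(borrows, donor, borrower):
    """borrows maps each site to the set of sites it currently borrows
    nodes from (an edge A -> B means site A is using nodes of site B).
    Granting the request adds the edge borrower -> donor, which closes
    a cycle iff donor already reaches borrower; check by depth-first
    search over the existing borrow edges."""
    stack, seen = [donor], set()
    while stack:
        site = stack.pop()
        if site == borrower:
            return True
        if site in seen:
            continue
        seen.add(site)
        stack.extend(borrows.get(site, ()))
    return False
```

For example, if site A already borrows from site B, a request by B to borrow from A would be rejected, mirroring the wpA > wpB contradiction argument above.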
skillful management support for this effort and for the multisite resource allocation project, and Norbert Vogl for proofreading this paper.
Mudhakar Srivatsa received the BTech degree in computer science and engineering from the Indian Institute of Technology Madras, Chennai, India, in 2002 and the PhD degree in computer science from the Georgia Institute of Technology, Atlanta, in 2006. He is currently a research scientist at the IBM T.J. Watson Research Center, conducting research at the intersection of networking and security. His research interests include security and reliability of large-scale networks, secure information flow, and risk management. He is a member of the IEEE.

Nithya Rajamani received the BE degree from Anna University, Chennai, India, and the master's degree in computer science from the University of Illinois, Urbana-Champaign. She is currently a research developer and a senior software engineer at the IBM T.J. Watson Research Center. Her interests include distributed systems, Web services, and, more recently, information management and services science.

Murthy Devarakonda received the PhD degree in computer science from the University of Illinois, Urbana-Champaign, in 1988. He is currently a senior manager and a research staff member in the Services Research Department, IBM T.J. Watson Research Center, where he has worked on distributed file systems, Web technologies, policy-based systems management, and services computing. He received IBM Divisional Awards for his work on distributed file systems and global technology outlook development. He is a senior member of the IEEE and the ACM.