Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid
School of Computer Science and Software Engineering, Monash University, Caulfield Campus, Melbourne, Australia
CRC for Enterprise Distributed Systems Technology, General Purpose South Building, University of Queensland, St. Lucia, Brisbane, Australia
0-7695-0589-2/00 $10.00 © 2000 IEEE
Authorized licensed use limited to: UNIVERSITY OF SYDNEY. Downloaded on March 31,2010 at 20:19:03 EDT from IEEE Xplore. Restrictions apply.
computational resources to run, the user can hire or rent resources on the fly and pay for what he uses. The resource price may vary from time to time and from one user to another. At runtime, the user can even enter into bidding and negotiate for the best possible resources for low-cost access from computational service providers. In order to get the best value for money, the user can reserve resources in advance.

The use of a computational economy can be made simpler by providing a layer that allows the user to select a "deadline", the period within which an application execution must be completed, and a "price", the amount that the user is willing to pay for the completion of the application. This layer is then responsible for agglomerating individual resources to satisfy these constraints (user requirements) using resource reservation and bidding methods.

In order to address the complexities associated with parametric computing on clusters of distributed systems, we have devised a system called Nimrod [1][2][3][4]. Nimrod provides a simple declarative parametric modeling language for expressing a parametric experiment. Domain experts can easily create a plan for a parametric computation (task farming) and use the Nimrod runtime system to submit, run, and collect the results from multiple computers (cluster nodes). Nimrod has been used to run applications ranging from bio-informatics and operations research to the simulation of business processes [12][21]. A reengineered version of Nimrod, called Clustor, has been commercialized by Active Tools [5].

The Nimrod system has been used successfully with a static set of computational resources, but as implemented it is unsuitable for the large-scale dynamic context of computational grids, where resources are scattered across several administrative domains, each with its own user policies, queuing system, access costs, and computational power. These shortcomings are addressed by our new system, called Nimrod/G, which uses the Globus [17] middleware services for dynamic resource discovery and for dispatching jobs over computational grids.

The preliminary monolithic version of Nimrod/G has been discussed in [4]. In this paper we mainly discuss the architecture of a new, highly modularized, portable, and extensible version of Nimrod/G. It takes advantage of features supported in the latest version (v1.1) of Globus [20], such as automatic discovery of allowed resources. Furthermore, we introduce the concept of computational economy as part of the Nimrod/G scheduler. The architecture is extensible enough to use other grid-middleware services such as NetSolve [8]. The rest of the paper focuses on the Nimrod/G architecture and its interactions with grid components, scheduling, computational economy, and related work.

2. System Architecture

The architecture of Nimrod/G is shown in Figure 1 and its key components are:
• Client or User Station
• Parametric Engine
• Scheduler
• Dispatcher
• Job-Wrapper

The interaction between the above components and grid resources is shown in Figure 2.

Client or User Station

This component acts as a user interface for controlling and supervising an experiment under consideration. The user can vary parameters related to time and cost that influence the direction the scheduler takes while selecting resources. It also serves as a monitoring console and lists the status of all jobs, which the user can view and control. Another feature of the Nimrod/G client is that it is possible to run multiple instances of the same client at different locations. That means the experiment can be started on one machine, monitored on another machine by the same or a different user, and controlled from yet another location. We have used this feature to monitor and control an experiment from Monash University in Australia and Argonne National Laboratory in the USA simultaneously. It is also possible to have alternative clients, such as a purely text-based client, or another application (e.g., Active Sheets [16], an extended Microsoft Excel spreadsheet that submits cell functions for execution on the computational grid).

Parametric Engine

The parametric engine acts as a persistent job control agent and is the central component from which the whole experiment is managed and maintained. It is responsible for the parameterization of the experiment, the actual creation of jobs, the maintenance of job status, and interaction with the clients, schedule advisor, and dispatcher. The parametric engine takes as input the experiment plan, described using our declarative parametric modeling language (the plan can also be created using the Clustor GUI [13]), and manages the experiment under the direction of the schedule advisor. It then informs the dispatcher to map an application task to the selected resource.

The parametric engine maintains the state of the whole experiment and ensures that the state is recorded in persistent storage. This allows the experiment to be restarted if the node running Nimrod goes down. The parametric engine exposes the Clustor network interface [14] to the other components and allows new components to be "plugged in" to the central engine.
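The parameterization step just described — expanding the value lists of an experiment plan into the cross product of concrete jobs — can be sketched as follows. This is a hypothetical illustration in Python, not Nimrod's actual implementation (which is driven by its own declarative plan language); the plan and parameter names are invented.

```python
from itertools import product

def expand_plan(parameters):
    """Expand a parametric plan (name -> list of values) into the
    cross product of concrete jobs, one per parameter combination."""
    names = sorted(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[n] for n in names))]

# A toy plan with two parameters yields 3 x 2 = 6 jobs.
plan = {"angle": [0, 15, 30], "pressure": [1.0, 2.0]}
jobs = expand_plan(plan)
print(len(jobs))   # 6
print(jobs[0])     # {'angle': 0, 'pressure': 1.0}
```

Each generated dictionary corresponds to one job that the dispatcher would then map onto a selected grid resource.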
[Figure 1. The Nimrod/G architecture (Nimrod/G clients and key components).]
renegotiate either by changing the deadline and/or the cost. The advantage of this approach is that the user knows before the experiment is started whether the system can deliver the results and what the cost will be.

To some extent our earlier prototype of Nimrod/G has followed the first method and was able to select resources based on artificial costs [4]. This system tries to find sufficient resources to meet the user's deadline, and adapts the list of machines it is using depending on competition for them. However, the cost changes as other competing experiments are put on the grid. The implementation of the second method of computational economy as part of our Nimrod/G system is rather complex: it needs grid middleware services for resource reservation, broker services for negotiating cost, and an underlying system with management and accounting infrastructure in place.

The scheduling system can use various kinds of parameters in order to arrive at a scheduling policy that optimally completes an application execution. The parameters to be considered include:
• Resource Architecture and Configuration
• Resource Capability (clock speed, memory size)
• Resource State (such as CPU load, memory available, disk storage free)
• Resource Requirements of an Application
• Access Speed (such as disk access speed)
• Free or Available Nodes
• Priority (that the user has)
• Queue Type and Length
• Network Bandwidth, Load, and Latency (if jobs need to communicate)
• Reliability of Resource and Connection
• User Preference
• Application Deadline
• User Capacity/Willingness to Pay for Resource Usage
• Resource Cost (in terms of dollars that the user needs to pay to the resource owner)
• Resource Cost Variation in terms of Time-scale (e.g., high during daytime and low at night)
• Historical Information, including Job Consumption Rate

The important parameters of the computational economy that can influence the way resource scheduling is done are:
• Resource Cost (set by its owner)
• Price (that the user is willing to pay)
• Deadline (the period by which an application execution needs to be completed)

The scheduler can use all sorts of information gathered by a resource discoverer and can also negotiate with resource owners to get the best "value for money". The resource that offers the best price and meets the resource requirements can eventually be selected. This can be achieved by resource reservation and bidding. If the user's deadline is relaxed, the chances of obtaining low-cost access to resources are high. The cost of resources can vary dynamically from time to time, and the resource owner has full control over deciding access cost. Further, the cost can vary from one user to another. The scheduler can even solicit bids or tenders from computational resource providers in an open market, and select a feasible service provider. It is a real challenge for resource sellers to decide costing in order to make a profit and attract more customers.
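The interplay of deadline, price, and resource cost described above can be illustrated with a toy greedy selector. All resource names, throughput figures, and costs below are invented for illustration; the actual Nimrod/G scheduler discovers and negotiates with real grid resources rather than consulting a static table.

```python
# Hypothetical resource records: (name, jobs_per_hour, cost_per_job).
RESOURCES = [
    ("linux-cluster", 40, 5),
    ("workstation-pool", 10, 1),
    ("supercomputer", 100, 12),
]

def select_resources(n_jobs, deadline_hours):
    """Greedily add resources, cheapest per job first, until the
    aggregate throughput can finish n_jobs within the deadline."""
    needed_rate = n_jobs / deadline_hours
    chosen, rate = [], 0.0
    for name, jobs_per_hour, cost in sorted(RESOURCES, key=lambda r: r[2]):
        if rate >= needed_rate:
            break
        chosen.append(name)
        rate += jobs_per_hour
    return chosen if rate >= needed_rate else None  # None: deadline infeasible

# A relaxed deadline is met by cheap resources alone; tightening it
# forces the scheduler to add progressively more expensive ones.
print(select_resources(200, 25))  # ['workstation-pool']
print(select_resources(200, 2))   # ['workstation-pool', 'linux-cluster', 'supercomputer']
```

This mirrors the behavior discussed above: with a loose deadline only low-cost resources are engaged, while a tight deadline pulls in expensive resources so the deadline can still be met at the lowest feasible cost.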
[Figure 2. Interaction between Nimrod/G components, grid middleware, and resources (resource allocation). Additional services used implicitly: GSI (authentication and authorization) and Nexus (communication).]
4. Implementation

The interaction between the various components of Nimrod/G and grid resources (see Figure 2) has been discussed in the sections above, and from this it is clear that they need dedicated protocols for communication. Nimrod/G components use TCP/IP sockets for exchanging commands and information between them. The implementation of the latest version of Nimrod/G follows the Clustor network protocols [14] as much as possible. In order to avoid the need for every user to understand the low-level protocols, we are developing a library of higher-level APIs that can be accessed by any other extension components inter-operating with the Nimrod/G job control agent. A user could build an alternative scheduler by using these APIs.

The earlier version of Nimrod/G [4] was developed using the Globus Toolkit (version 1.0). The components of Globus used in its implementation are: GRAM (Globus Resource Allocation Manager), MDS (Metacomputing Directory Service), GSI (Globus Security Infrastructure), and GASS (Global Access to Secondary Storage). The current version also uses these services, along with the new features (such as Grid Directory Information Services) supported by the latest version of the Globus Toolkit (version 1.1) [20].

In the future, the Globus toolkit is expected to support resource reservation services [19]. Our scheduler will use those services in order to support the market-based computational-economy model discussed earlier. However, currently we plan to build a simulated model for investigation purposes and build a real model when the middleware services are made available.

In many dedicated clusters of computers (such as Beowulf-class Linux clusters) it is common for only the master node to be able to communicate with the external world (Internet), while all other nodes are interconnected using private (high-speed) networks. Accordingly, these private nodes of the cluster can only be accessed via the master node. In order to address this problem, we have developed a proxy server that integrates closed cluster nodes into computational grids. The proxy, deployed on the cluster master node, acts as a mediator between external Nimrod components and the cluster's private nodes for accessing storage. When a client running on a cluster private node makes an I/O call to access data available on an external system, the proxy uses Globus GASS services to fetch or stage the required data.

5. Evaluation

The purpose of this paper is to present a new component-based architectural design for Nimrod/G that will allow it to be extended and implemented using different middleware services. The new architecture will also support experimentation with a computational economy in a way that has not been possible to date. In order to illustrate the potential for such a system, we show the results produced using the previous monolithic version of Nimrod/G [4] on a real science case study: simulation of an ionization chamber calibration on the GUSTO testbed resources. In this experiment, we ran the code across different design parameters and specified different real-time deadlines. For the sake of completeness, the results of trials conducted during April/May 1999 are shown in Figure 3. There were about 70 machines available to us during the trial. We expect to repeat this trial using the new architecture in the near future, and then experimentation will begin with a new computational economy as discussed in Section 3.

Figure 3 shows the effect of varying the deadline for the experiment on the number of processors used. Not surprisingly, as the deadline is tightened, the scheduler needs to find more resources until the deadline can be met. The GUSTO test-bed resources selected change from one deadline to another and also from time to time due to the variation in the availability and status of resources. When the deadline is tight, the scheduler selects a large number of resources (even though they are likely to be expensive) in order to complete the experiment within the deadline. In the above case, the scheduler has selected resources that keep the cost of the experiment as low as possible while still meeting the deadline. This clearly demonstrates the ability and the scalability of Nimrod/G to schedule tasks according to time and cost constraints over grid resources. The new architecture will allow more varied experimentation.

6. Related Work

A number of projects are investigating scheduling on computational grids. They include AppLeS [6], NetSolve [8], and DISCWorld [15], but these do not employ the concept of computational economy in scheduling. REXEC [10] supports the concept of computational economy, but it is limited to a department- or campus-wide network of workstations.

The AppLeS (Application-Level Scheduling) project builds agents for each application (case-by-case), responsible for offering a scheduling mechanism [6]. It uses the NWS (Network Weather Service) [22] to monitor the varying loads on resources and networks in order to select viable resource configurations. In contrast, Nimrod/G offers a tool-level solution that applies to all applications; users are not required to build scheduling agents for each of their applications as in AppLeS.
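As noted in Section 4, Nimrod/G components exchange commands and information over TCP/IP sockets, following the Clustor network protocols where possible. The sketch below illustrates only that general style of interaction; the command vocabulary (`STATUS`) and the reply format are invented for illustration and are not the actual Clustor protocol.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 0  # port 0: let the OS pick a free port

def job_control_agent(server_sock):
    """Toy stand-in for the parametric engine: answer one status query."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode().strip()
        if request == "STATUS":
            conn.sendall(b"jobs=6 running=2 done=4\n")
        else:
            conn.sendall(b"ERROR unknown command\n")

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, PORT))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=job_control_agent, args=(server,), daemon=True).start()

# A client (e.g. a monitoring console) asks the agent for job status.
with socket.create_connection((HOST, port)) as client:
    client.sendall(b"STATUS\n")
    reply = client.recv(1024).decode().strip()
print(reply)  # jobs=6 running=2 done=4
```

In Nimrod/G, this request/reply pattern is hidden behind the higher-level APIs mentioned in Section 4, so extension components such as an alternative scheduler do not need to speak the wire protocol directly.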
Figure 3. GUSTO resource usage for 10, 15, and 20 hour deadlines [4].
NetSolve is a client-agent-server system which enables the user to solve complex scientific problems remotely [8]. The NetSolve agent does the scheduling by searching for those resources that offer the best performance in a network. Applications need to be built using one of the APIs provided by NetSolve to perform RPC-like computations. NetSolve also provides an API for creating task farms [9], which means that farming applications need to be developed on a case-by-case basis using the NetSolve APIs. In the case of Nimrod, non computer-scientists can easily use the GUI to create task farms without modifying the application. However, it is interesting to note that the NetSolve middleware services could be used to build a system like the Nimrod/G parametric engine. But the concept of computational economy cannot be supported unless NetSolve offers the ability to build scheduling agents or allows users to plug in their own scheduling policies.

DISCWorld (Distributed Information Systems Control World) is a service-oriented metacomputing environment based on the client-server-server model [15]. Remote users can log in to this environment over the Internet and request access to data, and also invoke services or operations on the available data. DISCWorld aims for remote information access, whereas Nimrod/G focuses on providing an easy and transparent mechanism for accessing computational resources.

Another related tool that uses the concept of computational economy is REXEC, a remote execution environment [10] for a campus-wide network of workstations, which is part of the Berkeley Millennium project. At the command line, the user can specify the maximum rate (credits per minute) that he is willing to pay for CPU time. The REXEC client selects a node that fits the user requirements and executes the application on it. REXEC provides an extended shell for remote execution of applications on clusters. That is, it offers a generic user interface for computational economy on clusters, whereas Nimrod/G aims at offering a comprehensive and specialized environment for parametric computing in computational grids. Another difference is that REXEC executes jobs on the node computer directly, whereas Nimrod/G has the capability to submit jobs to queues on a remote system that, in turn, manages compute resources.

7. Conclusions and Future Work

The evolution of Nimrod from scheduling for a local computing environment to scheduling for the global
computational grid has been discussed. In particular, we focused on the Nimrod/G architecture for resource management and scheduling on a computational grid. The various parameters that influence scheduling on computational grids with a computational economy were presented. Some preliminary results from an earlier experiment were presented to demonstrate the scalability of Nimrod/G and its ability to make good scheduling decisions. Finally, we showed how Nimrod/G relates to other projects.

Future work focuses on the use of economic theories in grid resource management and scheduling. The components that make up the Grid Architecture for Computational Economy (GRACE) include a global scheduler (broker), bid-manager, directory server, and bid-server, working closely with grid middleware and fabrics. The GRACE infrastructure also offers generic interfaces (APIs) that grid tools and application programmers can use to develop software supporting the computational economy.

Acknowledgments

The Nimrod project is supported by the Distributed Systems Technology Centre (DSTC) under the Australian Government CRC program. The award of the Australian Government International Postgraduate Research Scholarship (IPRS), the Monash University Graduate Scholarship (MGS), the DSTC Monash Scholarship, and the IEEE Computer Society Richard E. Merwin Scholarship is acknowledged.

We thank Jack Dongarra (University of Tennessee, Knoxville), Francine Berman (University of California, San Diego), Toni Cortes (Universitat Politecnica de Catalunya, Barcelona), Hai Jin (University of Southern California, Los Angeles), Savithri S (Motorola India Electronics Ltd., Bangalore), and the anonymous reviewers for their comments on this work.

References

[1] Abramson, D., Sosic, R., Giddy, J., and Cope, M., The Laboratory Bench: Distributed Computing for Parametised Simulations, 1994 Parallel Computing and Transputers Conference, Wollongong, 1994.
[2] Abramson, D., Sosic, R., Giddy, J., and Hall, B., Nimrod: A Tool for Performing Parametised Simulations using Distributed Workstations, The 4th IEEE Symposium on High Performance Distributed Computing, Virginia, August 1995.
[3] Abramson, D., Foster, I., Giddy, J., Lewis, A., Sosic, R., Sutherst, R., and White, N., Nimrod Computational Workbench: A Case Study in Desktop Metacomputing, Australian Computer Science Conference (ACSC 97), Macquarie University, Sydney, Feb. 1997.
[4] Abramson, D., Giddy, J., and Kotler, L., High Performance Parametric Modeling with Nimrod/G: Killer Application for the Global Grid?, International Parallel and Distributed Processing Symposium (IPDPS), Mexico, 2000.
[5] Active Tools Corporation, http://www.activetools.com, Nov. 6, 1999.
[6] Berman, F. and Wolski, R., The AppLeS Project: A Status Report, Proceedings of the 8th NEC Research Symposium, Germany, May 1997.
[7] Buyya, R. (ed.), High Performance Cluster Computing: Architectures and Systems, Volume 1, Prentice Hall PTR, NJ, USA, 1999.
[8] Casanova, H. and Dongarra, J., NetSolve: A Network Server for Solving Computational Science Problems, Intl. Journal of Supercomputing Applications and HPC, Vol. 11, No. 3, 1997.
[9] Casanova, H., Kim, M., Plank, J., and Dongarra, J., Adaptive Scheduling for Task Farming with Grid Middleware, Intl. Journal of Supercomputer Applications and High Performance Computing, 1999.
[10] Chun, B. and Culler, D., REXEC: A Decentralized, Secure Remote Execution Environment for Parallel and Sequential Programs, Special Issue on Cluster Computing using High-Speed Networks, Journal of Supercomputing (in review).
[11] Clustor, Active Tools, Nov. 8, 1999, http://www.activetools.com/products.html
[12] Clustor Case Studies, Nov. 6, 1999, http://hathor.cs.monash.edu.au/RJW
[13] Clustor Manual, Writing Job Plans, Chap. 4, 1999, http://www.activetools.com/manhtml20/plans.htm
[14] Clustor Manual, Using Application Programming Interface, Chap. 9, Oct. 1999, http://www.activetools.com/manhtml20/plans.htm
[15] Hawick, K., James, H., et al., DISCWorld: An Environment for Service-Based Metacomputing, Future Generation Computing Systems (FGCS), Vol. 15, 1999.
[16] DSTC Project, ActiveSheets: Active Spreadsheets using an Enterprise-Wide Computational Grid, Distributed Systems Technology Centre, 1999.
[17] Foster, I. and Kesselman, C., Globus: A Metacomputing Infrastructure Toolkit, International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[18] Foster, I. and Kesselman, C. (editors), Computational Grids, in The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann Publishers, USA, 1999.
[19] Foster, I., Advance Reservations and Co-Scheduling Workshop, Workshop Results, NASA Ames Research Center and Argonne National Laboratory, 1999.
[20] Globus Grid Programming Toolkit, Version 1.1, http://www.globus.org/v1.1/
[21] Lewis, A., Abramson, D., Sosic, R., and Giddy, J., Tool-based Parameterisation: An Application Perspective, Computational Techniques and Applications Conference, Melbourne, July 1995.
[22] Wolski, R., Spring, N., and Hayes, J., The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing, Journal of Future Generation Computing Systems, Vol. 15, October 1999.
[23] Leinberger, W. and Kumar, V., Information Power Grid: The New Frontier in Parallel Computing?, IEEE Concurrency, Vol. 7, No. 4, Oct.-Dec. 1999.