Grid Computing
Definitions
Today there are many definitions of Grid computing:
• A frequently cited definition of a Grid is provided by Ian Foster in his article "What is
the Grid? A Three Point Checklist". The three points of this checklist are:
o Computing resources are not administered centrally.
o Open standards are used.
o Non-trivial quality of service is achieved.
• IBM defines Grid Computing as "the ability, using a set of open standards and
protocols, to gain access to applications and data, processing power, storage
capacity and a vast array of other computing resources over the Internet. A Grid is
a type of parallel and distributed system that enables the sharing, selection, and
aggregation of resources distributed across 'multiple' administrative domains
based on their (resources) availability, capacity, performance, cost and users'
quality-of-service requirements".
Origins
Like the Internet, grid computing evolved from the computational needs of "big science".
The Internet was developed to meet the need for a common communication medium
between large, federally funded computing centers. These communication links led to
resource and information sharing between the centers and, eventually, to access to them
for additional users. Ad hoc resource-sharing procedures among these original
groups pointed the way toward standardization of the protocols needed to communicate
across administrative domains. Current grid technology can be viewed as an
extension or application of this framework to create a more generic resource-sharing
context.
Fully functional proto-grid systems date back to the early 1970s with the Distributed
Computing System[1] (DCS) project at the University of California, Irvine, whose chief
architect was David Farber. A caption in the project literature read, "The ring acts as a
single, highly flexible machine in which individual units can bid for jobs". In modern
terminology the ring is a network and the units are computers, which closely mirrors how
computational capacity is used on a grid. The technology was largely abandoned in the
1980s, when the administrative and security issues of having machines outside one's
control perform one's computation were seen as insurmountable.
The ideas of the grid were brought together by Ian Foster, Carl Kesselman and Steve
Tuecke, the so called "fathers of the grid." They led the effort to create the Globus
Toolkit incorporating not just CPU management (examples: cluster management and
cycle scavenging) but also storage management, security provisioning, data movement,
monitoring and a toolkit for developing additional services based on the same
infrastructure including agreement negotiation, notification mechanisms, trigger services
and information aggregation. In short, the term grid carries far broader implications
than is generally appreciated. While the Globus Toolkit remains the de facto
standard for building grid solutions, a number of other tools have been built that address
some subset of the services needed to create an enterprise grid.
Features
Grid computing offers a model for solving massive computational problems by making
use of the unused resources (CPU cycles and/or disk storage) of large numbers of
disparate computers, often desktop computers, treated as a virtual cluster embedded in a
distributed telecommunications infrastructure. Grid computing's focus on the ability to
support computation across administrative domains sets it apart from traditional computer
clusters or traditional distributed computing.
Grids offer a way to solve Grand Challenge problems such as protein folding, financial
modelling, earthquake simulation, and climate/weather modelling. Grids offer a way of
using the information technology resources optimally inside an organization. They also
provide a means for offering information technology as a utility bureau for commercial
and non-commercial clients, with those clients paying only for what they use, as with
electricity or water.
Grid computing has the design goal of solving problems too big for any single
supercomputer, whilst retaining the flexibility to work on multiple smaller problems.
Thus Grid computing provides a multi-user environment. Its secondary aims are better
exploitation of available computing power and catering for the intermittent demands of
large computational exercises.
This approach implies the use of secure authorization techniques to allow remote users to
control computing resources.
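To make the authorization requirement concrete, the sketch below shows the kind of check a grid node might perform before accepting a remote user's job. It is purely illustrative: the access-control list, user names, and quota fields are hypothetical, not taken from any real grid middleware.

```python
# Illustrative sketch of a per-resource authorization check.
# All names (users, resources, quota fields) are hypothetical.

# Access-control list: which user may use which resource, and how much.
ACL = {
    ("alice", "cluster-a"): {"max_cpu_hours": 100},
    ("bob", "cluster-a"): {"max_cpu_hours": 10},
}

def authorize(user: str, resource: str, cpu_hours: float) -> bool:
    """Return True only if `user` has a grant on `resource` large
    enough to cover the requested CPU hours."""
    grant = ACL.get((user, resource))
    return grant is not None and cpu_hours <= grant["max_cpu_hours"]

print(authorize("alice", "cluster-a", 50))  # within alice's quota
print(authorize("bob", "cluster-a", 50))    # exceeds bob's quota
```

In a real grid, the grant would be established via certificates and delegation rather than a static table, but the gatekeeping step is the same: every remote request is checked against a policy before any resource is consumed.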
Grid computing is often confused with cluster computing. The key difference is that a
cluster is a single set of nodes sitting in one location, while a Grid is composed of many
clusters and other kinds of resources (e.g. networks, storage facilities).
Grids can be categorized with a three-stage model of departmental Grids, enterprise Grids
and global Grids. These correspond to a firm initially utilising resources within a single
group, e.g. an engineering department connecting desktop machines, clusters and
equipment. This progresses to enterprise Grids, where the computing resources of
non-technical staff can also be used for cycle-stealing and storage. A global Grid is a
connection of enterprise and departmental Grids which can be used in a commercial or
collaborative manner.
Conceptual framework
Grid computing reflects a conceptual framework rather than a physical resource. The
Grid approach is utilized to provision a computational task with administratively-distant
resources. The focus of Grid technology is associated with the issues and requirements of
flexible computational provisioning beyond the local (home) administrative domain.
Resources
One characteristic that currently distinguishes Grid computing from distributed
computing is the abstraction of a 'distributed resource' into a Grid resource. One result
of abstraction is that it allows resource substitution to be more easily accomplished. Some
of the overhead associated with this flexibility is reflected in the middleware layer and
the temporal latency associated with the access of a Grid (or any distributed) resource.
This overhead, especially the temporal latency, must be evaluated in terms of the impact
on computational performance when a Grid resource is employed.
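The abstraction and substitution described above can be sketched in a few lines: a job requests a resource by capability rather than by naming a specific machine, so the middleware is free to substitute any suitable resource, at the cost of the access latency the text mentions. The class and function names below are illustrative, not drawn from any real middleware.

```python
# Sketch of resource abstraction: jobs ask for *a* resource by
# capability, so substitution is straightforward. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class GridResource:
    name: str
    cpus: int
    available: bool
    latency_ms: float  # the temporal overhead paid for the abstraction

def select_resource(pool, min_cpus):
    """Pick any available resource meeting the CPU requirement,
    preferring the one with the lowest access latency."""
    candidates = [r for r in pool if r.available and r.cpus >= min_cpus]
    return min(candidates, key=lambda r: r.latency_ms) if candidates else None

pool = [GridResource("site-a", 64, False, 5.0),
        GridResource("site-b", 32, True, 40.0),
        GridResource("site-c", 128, True, 120.0)]

# site-a is down, so site-b is substituted transparently.
chosen = select_resource(pool, 16)
```

The latency field is what makes the trade-off explicit: a local resource at 5 ms may be unavailable, and the substitute costs 40 ms per access, which must be weighed against the computation's own running time.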
Web-based resources, or Web-based resource access, are an appealing approach to Grid
resource provisioning. A recent GGF (Global Grid Forum) Grid middleware evolutionary
development "re-factored" the architecture/design of the Grid resource concept to use
the W3C WSDL (Web Service Description Language) to implement the concept of
a WS-Resource. The stateless nature of the Web, while enhancing the ability to scale, can
be a concern for applications that migrate from a stateful protocol for accessing resources
to a Web-based stateless protocol. The GGF WS-Resource concept includes
discussion of accommodating the statelessness associated with Web resource access.
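The core of the WS-Resource idea can be sketched simply: the protocol itself stays stateless, so each request carries an explicit resource identifier, and the state lives on the server keyed by that identifier. The sketch below is a toy illustration of that pattern, not the actual WSRF interfaces; all names are hypothetical.

```python
# Toy sketch of stateful resources behind a stateless protocol:
# every request names the resource explicitly via an identifier.
# Function and field names are illustrative only.

_resources = {}   # server-side state, keyed by resource id
_next_id = 0

def create_resource(initial_state):
    """Create a stateful resource; return its id for use in later,
    otherwise stateless, requests."""
    global _next_id
    _next_id += 1
    rid = f"res-{_next_id}"
    _resources[rid] = dict(initial_state)
    return rid

def get_property(rid, key):
    """A stateless request: no session is assumed; the resource id
    passed with each call is what locates the state."""
    return _resources[rid][key]

rid = create_resource({"jobs_queued": 3})
print(get_property(rid, "jobs_queued"))
```

This is the accommodation the GGF discussion refers to: rather than keeping state in the connection, as a stateful protocol would, the state is factored out into an addressable resource.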
State-of-the-art, 2005
The conceptual framework and ancillary infrastructure are evolving at a fast pace and
include international participation. The business sector is actively involved in
commercialization of the Grid framework. The "big science" sector is actively addressing
the development environment and resource (aka performance) monitoring aspects.
Activity is also observed in providing Grid-enabled versions of HPC (High Performance
Computing) tools. Activity in the domains of "little science" appears to be scant at this
time. The treatment in the GGF documentation series reflects the HPC roots of the Grid
conceptual framework; this bias should not be interpreted as restricting the framework's
application to other research domains or other computational contexts.
To address these issues of scale, and to enable the formation of large-scale organisations
for scientific research in the life sciences, we are actively working on building grid
infrastructures to support these virtual scientific communities. We are currently focusing
our work on three main topics.
Grid-Enabled Workflows
Although current grid infrastructures like Globus show promise in addressing many of
the problems scalable virtual organizations are confronted with, the area of grid-enabled
workflows is still characterized by a lack of implementations and by conflicting
approaches. While there are myriad competing approaches to modelling workflows, we
are currently migrating our own proprietary workflow engine, mine-it, to a more
standards-based approach; the standard of choice here is BPEL. First experiments with this
approach showed that this standard is a viable way to connect existing grid resources
and to use off-the-shelf tools to model bioinformatics workflows that run distributed on a
grid infrastructure.
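The essence of such a workflow is a sequence of steps, each of which could be dispatched to a different grid resource, with each step consuming the previous step's output. BPEL itself expresses this in XML; the Python sketch below only illustrates the chaining idea, and the step names (sequence fetching, alignment, tree building) are hypothetical examples of bioinformatics stages, not part of any real engine.

```python
# Toy sketch of a chained workflow, analogous to a BPEL <sequence>.
# Each step stands in for a task that could run on a separate grid node;
# the bodies are placeholders, not real bioinformatics code.

def fetch_sequences(accession):
    return f"sequences for {accession}"

def align(sequences):
    return f"alignment of ({sequences})"

def build_tree(alignment):
    return f"tree from ({alignment})"

def run_workflow(seed, steps):
    """Feed each step's output into the next step's input."""
    data = seed
    for step in steps:
        data = step(data)
    return data

result = run_workflow("AB123", [fetch_sequences, align, build_tree])
```

A workflow engine adds what this sketch omits: fault handling, compensation, and the binding of each step to a concrete grid service, which is precisely where a standard like BPEL earns its keep.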
Grid Imaging
Recent lab techniques like RNAi, together with advances in both live-cell microscopy and
image-processing algorithms, enable new approaches to automating the often tedious and
labour-intensive process of mapping genetic information to phenotypes. Although complex
processing chains, ranging from image acquisition through pre-processing and
reconstruction up to the automated classification of live-cell images, can be constructed
right now, the infrastructural problems that arise from the sheer amount of imaging data
produced are not yet solved. Using grid infrastructure to address these resource-
and processing-heavy issues seems like a natural fit. We are currently building an
imaging environment that extends the Open Microscopy Environment (OME) to include
storage and processing resources from our existing grid infrastructure. The long-term
vision of this project is an open environment that allows bioinformaticians to
cooperate on semi-automated or automated imaging workflows without having to take
care of the underlying storage and processing infrastructure.
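The processing chain named above (acquisition, pre-processing, reconstruction, classification) can be sketched as a pipeline mapped over many images, which is exactly the shape of computation a grid distributes well. The stage functions below are placeholders with hypothetical names; real stages would run image-processing code and a trained classifier.

```python
# Sketch of the imaging processing chain as a pipeline over many cells.
# Stage names follow the text; bodies are placeholders, not real
# image processing.

def acquire(cell_id):
    return {"cell": cell_id, "raw": True}

def preprocess(img):
    return {**img, "denoised": True}

def reconstruct(img):
    return {**img, "reconstructed": True}

def classify(img):
    # A real classifier would assign a phenotype label from the image;
    # this placeholder only checks that the chain completed.
    return "phenotype-A" if img.get("reconstructed") else "unknown"

# On a grid, this map over thousands of images would be split across
# nodes; here it runs locally over three toy cells.
phenotypes = [classify(reconstruct(preprocess(acquire(i)))) for i in range(3)]
```

The infrastructural problem the text raises lives between the stages: each arrow in the chain implies moving or addressing large image data, which is why coupling OME's data model to grid storage is the natural extension.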