Module 3 Compressed
Introduction
In today's world of pervasive, anytime-anywhere access to information, Grid
Computing environments have proven so significant that they are often referred to as
the world's single most powerful computer. With the many benefits of Grid Computing,
however, comes a complicated and complex global environment, one that leverages a
multitude of open standards and technologies in a wide variety of implementation
schemes. The complexity and dynamic nature of today's industrial problems are simply
too demanding to be satisfied by traditional, single-platform computational approaches.
Grid Computing equates to the world's largest computer …
The Grid Computing discipline involves the networking services and connections of a potentially unlimited number of
ubiquitous computing devices within a "grid." This innovative approach to computing can be thought of most simply as a
massive power "utility" grid, such as the one that delivers power to our homes and businesses every day. This
delivery of utility-based power has become second nature to many of us, worldwide. We know that by simply walking into a room
and turning on the lights, power will be directed to the devices of our choice for that moment in time. In this same
utility fashion, Grid Computing can openly add a virtually unlimited number of computing devices into any grid
environment, increasing the computing capability and problem-resolution capacity of the operational grid.
The full problem-resolution potential of Grid Computing remains unknown as we forge ahead into
this new era of massively powerful grid-based problem-solving.
This "Introduction" section of the book begins to present many of the Grid Computing topics
discussed throughout this book. The discussions in Chapter 1 are intended only to
provide a high-level examination of Grid Computing. Later sections of the book provide a full
treatment of the topics addressed by the many worldwide communities utilizing and continuing to
develop Grid Computing.
Worldwide business demand for intense problem-solving capabilities, applied to incredibly
complex problems, has driven the need across all global industry segments for dynamic
collaboration among many ubiquitous computing resources. These difficult computational
problem-solving needs have introduced complexity into virtually all computing technologies,
while driving up the costs and operational burdens of the technology environments. Nevertheless,
this advanced collaboration capability is required in almost all areas of industrial and business
problem solving, from scientific studies to commercial solutions to academic endeavors.
Achieving the level of resource collaboration needed to solve these complex and dynamic
problems, within the bounds of the end user's quality requirements, is a difficult challenge
across all the technical communities.
To further illustrate this environment and its oftentimes very complex set of technology
challenges, let us consider some common use case scenarios one might have already encountered,
which begin to demonstrate the value of a Grid Computing solution environment. These simple use
cases, offered as an introduction to the concepts of Grid Computing, are as follows:
A group of scientists studying the atmospheric ozone layer will collect huge amounts of
experimental data, each and every day. These scientists need efficient and complex data
storage capabilities across wide and geographically dispersed storage facilities, and they need
to access this data in an efficient manner based on the processing needs. This ultimately
results in a more effective and efficient means of performing important scientific research.
Massive multiplayer online game scenarios for a wide community of international gaming
participants require a large number of gaming servers rather than a single dedicated
game server. This allows international game players to interact among themselves as
a group in real time. It involves on-demand allocation and provisioning of computing
resources, provisioning and self-management of complex networks, and complicated data
storage resources. The on-demand need is highly dynamic from moment to moment, and it is
always based upon the workload in the system at any given time. The result is larger
gaming communities, more complex infrastructures to sustain the traffic loads, more
profit for the bottom lines of gaming corporations, and higher degrees of customer
satisfaction for the gaming participants.
A government organization studying a natural disaster such as a chemical spill may need to
collaborate immediately with different departments in order to plan for and best manage the
disaster. These organizations may need to run many computational models related to the
spill in order to calculate its spread, the effect of weather on it, or its impact on
human health. The result is better protection of public safety, wildlife, and
ecosystems: needless to say, all very key concerns.
Today, Grid Computing offers many solutions that already address and resolve the above problems.
Grid Computing solutions are constructed using a variety of technologies and open standards. Grid
Computing, in turn, provides highly scalable, highly secure, and extremely high-performance
mechanisms for discovering and negotiating access to remote computing resources in a seamless
manner. This makes it possible to share computing resources, on an unprecedented scale,
among a virtually unlimited number of geographically distributed groups. It serves as a significant
transformation agent for individual and corporate computing practices, moving them
toward a general-purpose utility approach very similar in concept to the provision of electricity
or water. Much like Grid Computing utilities, these electrical and water utilities are available
"on demand" and provide an always-available facility negotiated for individual or corporate
utilization.
In this new and intriguing book, we begin our discussion of the core concepts of Grid
Computing systems with an early definition of the grid. Back in 1998, it was defined as
follows: "A computational grid is a hardware and software infrastructure that provides
dependable, consistent, pervasive, and inexpensive access to high-end computational
capabilities" (Foster & Kesselman, 1998).
The preceding definition centers on the computational aspects of Grid Computing, while
later iterations broaden it with more focus on coordinated resource sharing and problem
solving in multi-institutional virtual organizations (Foster & Kesselman, 1998). In addition to
coordinated resource sharing and the formation of dynamic virtual organizations,
open standards become a key underpinning. It is important that open standards be used
throughout the grid implementation, and that they accommodate a variety of other open
standards-based protocols and frameworks, in order to provide interoperable and extensible
infrastructure environments.
Grid Computing environments must be constructed upon the following foundations:
Coordinated resources. We should avoid building grid systems with centralized control;
instead, we must provide the necessary infrastructure for coordination among the resources,
based on respective policies and service-level agreements.
Open standard protocols and frameworks. The use of open standards provides interoperability
and integration facilities. These standards must be applied for resource discovery, resource
access, and resource coordination.
Another basic requirement of a Grid Computing system is the ability to provide the quality-of-service
(QoS) levels necessary for the end-user community. These QoS validations must be a basic
feature of any grid system, performed in congruence with the available resource metrics.
QoS features may include, for example, response time measures, aggregated performance, security
fulfillment, resource scalability, availability, autonomic features such as event correlation and
configuration management, and partial failover mechanisms.
There have been a number of activities addressing the above definitions of Grid Computing and the
requirements for a grid system. The most notable effort is the standardization of the interfaces
and protocols for Grid Computing infrastructure implementations. We will cover these details later
in this book. Let us now explore some early and current Grid Computing systems and their
differences in terms of benefits.
Early Grid Activities
Over the past several years, there has been a great deal of worldwide interest in computational
Grid Computing. We also note a number of derivatives of Grid Computing, including compute grids,
data grids, science grids, access grids, knowledge grids, cluster grids, tera grids, and
commodity grids. Upon careful examination of these grids, we can see that they all share some
form of resources; however, these grids may have differing architectures.
One key value of a grid, whether it is a commodity utility grid or a computational grid, is often
evaluated based on its business merits and the respective user satisfaction. User satisfaction is
measured based on the QoS provided by the grid, such as the availability, performance, simplicity of
access, management aspects, business values, and flexibility in pricing. The business merits most
often relate to the problem being solved by the grid: for instance, job execution,
management aspects, simulation workflows, and other key technology-based
foundations.
Earlier Grid Computing efforts were aligned with the overlapping functional areas of data,
computation, and their respective access mechanisms. Let us further explore the details of these
areas to better understand their utilization and functional requirements.
Data
The data aspects of any Grid Computing environment must be able to effectively manage all aspects
of data, including data location, data transfer, data access, and critical aspects of security. The core
functional data requirements for Grid Computing applications are:
The ability to integrate multiple distributed, heterogeneous, and independently managed data
sources.
The ability to provide efficient data transfer mechanisms and to provide data where the
computation will take place for better scalability and efficiency.
The ability to provide data caching and/or replication mechanisms to minimize network traffic.
The ability to provide necessary data discovery mechanisms, which allow the user to find data
based on characteristics of the data.
The capability to implement data encryption and integrity checks to ensure that data is
transported across the network in a secure fashion.
The ability to provide the backup/restore mechanisms and policies necessary to prevent data
loss and minimize unplanned downtime across the grid.
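The caching/replication requirement above can be made concrete with a small sketch of a replica catalog: logical file names map to physical replicas, and the replica "closest" to the compute site is chosen to minimize network traffic. The class, the site names, and the hop-count table are all invented for illustration.

```python
# Illustrative sketch of a replica catalog for a data grid. All names
# (sites, files, distances) are hypothetical.

class ReplicaCatalog:
    def __init__(self):
        self.replicas = {}  # logical file name -> list of sites holding a copy

    def register(self, logical_name, site):
        self.replicas.setdefault(logical_name, []).append(site)

    def best_replica(self, logical_name, compute_site, distance):
        """Pick the replica with the smallest distance to the compute site."""
        sites = self.replicas.get(logical_name, [])
        if not sites:
            raise KeyError(f"no replica registered for {logical_name}")
        return min(sites, key=lambda s: distance(compute_site, s))

# Toy hop-count table between two sites.
HOPS = {("eu", "eu"): 0, ("eu", "us"): 5, ("us", "us"): 0, ("us", "eu"): 5}
dist = lambda a, b: HOPS[(a, b)]

catalog = ReplicaCatalog()
catalog.register("ozone-2003.dat", "us")
catalog.register("ozone-2003.dat", "eu")
print(catalog.best_replica("ozone-2003.dat", "eu", dist))  # eu
```

Production data grids (e.g., GridFTP-based systems) layer transfer protocols, integrity checks, and security on top of this basic replica-selection idea.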
Computation
The core functional computational requirements for grid applications are:
The ability to provide mechanisms that can intelligently and transparently select computing
resources capable of running a user's job
The understanding of the current and predicted loads on grid resources, resource availability,
dynamic resource configuration, and provisioning
The ability to ensure appropriate security mechanisms for secure resource management, access, and integrity
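The first computational requirement, transparently selecting a capable resource, can be sketched as follows. This is a minimal illustration, not a real grid scheduler: the resource attributes and the least-loaded policy are assumptions for the example.

```python
# Hedged sketch: selecting a computing resource capable of running a
# user's job, preferring the least-loaded machine that meets the job's
# needs. Attribute names are invented for illustration.

def select_resource(job, resources):
    """Return the least-loaded resource that satisfies the job, or None."""
    capable = [r for r in resources
               if r["cpus"] >= job["cpus"] and r["mem_gb"] >= job["mem_gb"]]
    if not capable:
        return None
    return min(capable, key=lambda r: r["load"])

resources = [
    {"name": "nodeA", "cpus": 16, "mem_gb": 64,  "load": 0.80},
    {"name": "nodeB", "cpus": 32, "mem_gb": 128, "load": 0.25},
    {"name": "nodeC", "cpus": 8,  "mem_gb": 32,  "load": 0.05},
]
job = {"cpus": 16, "mem_gb": 48}
print(select_resource(job, resources)["name"])  # nodeB
```

Real selection mechanisms also weigh predicted load, dynamic reconfiguration, and security constraints, as the requirements above indicate.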
Let us further explore some details of the computational and data grids as they exist today. In
addition to the core requirements listed above, data grids call for:
The ability to access databases, utilizing metadata and other attributes of the data
The capability to support flexible data access and data filtering capabilities
As one begins to appreciate the importance of extreme high performance in a Grid
Computing environment, it is recommended to store (or cache) data near the computation, and to
provide a common interface for data access and management.
It is interesting to note that, upon careful examination of existing Grid Computing systems,
readers will find that many are being applied to important scientific research and collaboration
projects; however, this does not preclude the importance of Grid Computing in business,
academic, and industrial fields. The commercialization of Grid Computing invites a key
architectural alignment with several existing commercial frameworks for improved
interoperability and integration.
As we will describe in this book, many current trends in Grid Computing are toward service-based
architectures for grid environments. This architecture is built for interoperability and is
(again) based upon open standard protocols. We provide a full treatment of the details of this
architecture throughout subsequent sections of this book.
Current Grid Activities
As described earlier, Grid Computing activities initially focused on computing power, data
access, and storage resources.
The definition of Grid Computing resource sharing has since evolved, based upon experience, with
more focus now applied to a sophisticated form of coordinated resource sharing distributed
among the participants in a virtual organization. This concept of coordinated resource sharing
includes any resources available within a virtual organization: computing power, data, hardware,
software and applications, networking services, and any other forms of computing resource.
This concept of coordinated resource sharing is depicted in Figure 1.1.
As depicted in the previous illustration, a number of sharable resources, hardware and
software applications, firmware implementations, and networking services are all available
within an enterprise or service provider environment. Rather than keeping these resources
isolated within an atomic organization, users can acquire them on a "demand" basis. By
implementing this type of Grid Computing environment, these resources become immediately
available to authenticated users for resolving specific problems. These problems may be
software capability problems (e.g., modeling, simulation, word processing, etc.) or hardware
availability and/or computing capacity shortage problems (e.g., processor computing resources,
data storage/access needs, etc.). On another level, these problems may be related to networking
bandwidth availability, the need for immediate circuit provisioning of a network, a security
event or other event correlation issue, and many other types of critical environmental needs.
Based upon the specific problem dimension, any given problem may have one or more resolution
issues to address. For example, in the above case there are two sets of users, each needing to
solve a different type of problem: one must resolve a weather prediction problem, while the
other must build a financial modeling case. The problem domains noted by each user group imply
two types of virtual organizations. These distinct virtual organizations are formulated,
sustained, and managed, from a computing resource viewpoint, according to the ability to access
the available resources. Let us further explore this concept of "virtualization" by describing
in more detail the usage patterns found within each of the virtual organizations.
A virtual organization for weather prediction. For example, this virtual organization requires
resources such as weather prediction software applications to perform the mandatory
environmental simulations associated with predicting weather. Likewise, they will require very
specific hardware resources to run the respective software, as well as high-speed data storage
facilities to maintain the data generated from performing the simulations.
A virtual organization for financial modeling. For example, this virtual organization requires
resources such as software modeling tools for performing a multitude of financial analytics,
virtualized blades [1] to run the above software, and access to data storage facilities for storing
and accessing data.
These virtual organizations manage their resources and typically provision additional resources
on an "as-needed" basis. This on-demand approach provides tremendous value in scalability, in
addition to enhanced reusability, and is typical of any "on-demand" environment. The capability
is based upon a utility infrastructure, where resources are allocated as, and when, they are
required. Likewise, utility pricing scenarios are always based upon the capture of usage
metrics.
The following discussion introduces a number of requirements for the Grid Computing
architectures utilized by virtual organizations. We classify these architectural requirements
into three categories. These resource categories must be capable of providing facilities for the
following scenarios:
The need for dynamic discovery of computing resources, based on their capabilities and
functions.
The immediate allocation and provisioning of these resources, based on their availability and
the user demands or requirements.
The management of these resources to meet the required service level agreements (SLAs).
The provisioning of multiple autonomic features for the resources, such as self-diagnosis, self-
healing, self-configuring, and self-management.
The provisioning of secure access methods to the resources, and bindings with the local
security mechanisms based upon the autonomic control policies.
The formation of virtual task forces, or groups, to solve specific problems associated with the
virtual organization.
The dynamic collection of resources from heterogeneous providers based upon users' needs and
the sophistication levels of the problems.
The dynamic identification and automatic problem resolution of a wide variety of troubles, with
automation of event correlation, linking the specific problems to the required resource and/or
service providers.
The dynamic provisioning and management capabilities of the resources required to meet the
SLAs.
The formation of a secured federation (or governance model) and common management model
for all of the resources respective to the virtual organization.
The secure delegation of user credentials and identity mapping to the local domain(s).
The management of resources, including utilization and allocation, to meet a budget and other
economic criteria.
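The dynamic-discovery requirement at the top of the list above can be illustrated with a minimal registry sketch: providers advertise their capabilities, and a virtual organization queries for resources that cover its needs. The `Registry` class and all resource names are invented for this example.

```python
# Illustrative sketch of dynamic resource discovery. Providers register
# capabilities with a registry; a virtual organization discovers matches.
# All names are hypothetical.

class Registry:
    def __init__(self):
        self.entries = []

    def advertise(self, name, capabilities):
        self.entries.append({"name": name, "capabilities": set(capabilities)})

    def discover(self, needed):
        """Return names of all resources offering every needed capability."""
        needed = set(needed)
        return [e["name"] for e in self.entries
                if needed <= e["capabilities"]]

registry = Registry()
registry.advertise("hpc-cluster", {"compute", "mpi"})
registry.advertise("tape-store", {"storage", "backup"})
registry.advertise("viz-node", {"compute", "gpu"})
print(registry.discover({"compute"}))  # ['hpc-cluster', 'viz-node']
```

In a real grid, discovery is decentralized and continuous, with entries expiring and security policies filtering who may see what; this sketch shows only the matching step.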
Users/applications typically found in Grid Computing environments must exhibit the following
characteristics:
The identification and mapping of the resources required to solve the problem
The ability to sustain the required levels of QoS, while adhering to the anticipated and
necessary SLAs
The capability to collect feedback regarding resource status, including updates for the
environment's respective applications
The above discussion helps us now to better understand the common requirements for grid systems.
In the subsequent chapters in this section, and moreover throughout this book, we discuss the many
specific details on the Grid Computing architecture models and emerging Grid Computing software
systems that have proven valuable in supporting the above requirements.
The following section provides treatment of some of the more common Grid Computing
business areas that exist today, and those areas that will typically benefit from the above
concepts of Grid Computing. It is worth mentioning that these business areas are most often
broadly classified based upon the industry sector in which they reside.
An Overview of Grid Business Areas
One of the most valuable aspects of any Grid Computing system is that it attracts the business
it is intended to address. In an "on-demand" scenario, these Grid Computing environments result
from autonomic provisioning of a multitude of resources and capabilities, typically
demonstrating increased computing resource utilization, access to specialized computer systems,
cost sharing, and improved management capabilities.
IBM Business On Demand Initiative
Business On Demand (in the rest of the book we will refer to this as On Demand) is not just about utility computing; it comprises a
much broader set of ideas about the transformation of business practices, process transformation, and technology
implementations. Companies striving to achieve the Business On Demand operational model will have the capacity to sense and
respond to fluctuating market conditions in real time, while providing products and services to customers in that same
operational model. The essential characteristics of on-demand businesses are responsiveness to the dynamics of
business, adaptation to variable cost structures, focus on core business competency, and resiliency for consistent availability.
This is achieved through seamless integration of customers and partners, virtualization of resources, autonomic/dependable
resources, and open standards.
There have been a significant number of commercialization efforts supporting Grid Computing in
every sector of the marketplace. In general terms, the utilization of Grid Computing in business
environments provides a rich and extensible set of business benefits, including (but not
limited to):
Acceleration of implementation time frames in order to intersect with the anticipated business
end results.
Improved productivity and collaboration of virtual organizations and respective computing and
data resources.
Allowing widely dispersed departments and businesses to create virtual organizations to share
data and resources.
Many organizations have started identifying the major business areas for Grid Computing business
applications. Some examples of major business areas include (but are not limited to):
Life sciences, for analyzing and decoding strings of biological and chemical information
Financial services, for running long, complex financial models and arriving at more accurate
decisions
Engineering services, including automotive and aerospace, for collaborative design and data-
intensive testing
Government, for enabling seamless collaboration and agility in both civil and military
departments and other agencies
Collaborative games for replacing the existing single-server online games with more highly
parallel, massively multiplayer online games
Let us now introduce and explore the analytics of each of these industry sectors by identifying some
of the high-level business-area requirements for Grid Computing systems. In doing so, we will look
at the facilities necessary for grid systems in order to meet these requirements.
Life Sciences
This sector has seen many dramatic advances, which have in turn produced rapid changes in the
way drug treatment and drug discovery efforts are conducted. The analytic and systems efforts
surrounding genomics, proteomics, and molecular biology provide the basis for many of the Grid
Computing advancements in this sector. These advances have presented a number of technical
challenges to the information technology sector, and especially to the Grid Computing
disciplines.
Grid Computing efforts have found that these challenges include huge amounts of data analysis,
data movement, data caching, and data mining. In addition to the complexity of processing data,
there are further requirements surrounding data security, secure data access, secure storage,
privacy, and highly flexible integration. Another area requiring attention is the querying of
nonstandard data formats and access to data assets across complex global networks.
The above requirements from the life sciences call for a Grid Computing infrastructure that can
properly manage data storage and provide access to the data, all while performing complex
analysis on it. Grid Computing systems can provide a common infrastructure for data access and,
at the same time, secure data access mechanisms while processing the data. Today, the life
sciences sector utilizes Grid Computing systems to execute sequence comparison algorithms and to
enable molecular modeling using the above-collected secured data. This provides the life
sciences sector with world-class information analysis, faster response times, and far more
accurate results.
These virtual organizations engaged in research collaboration activities generate petabytes [2] of
data and require tremendous amounts of storage space and thousands of computing processors.
Researchers in these fields must share data, computational processors, and hardware
instrumentation such as telescopes and advanced testing equipment. Most of these resources are
used for data-intensive processing, and they are widely dispersed over a large geographical area.
The Grid Computing discipline provides mechanisms for resource sharing by forming one or more
virtual organizations with specific sharing capabilities. Such virtual organizations are
constituted to resolve specific research problems, with a wide range of participants from
different regions of the world. This formation of dynamic virtual organizations provides the
capability to dynamically add and delete virtual organization participants, to manage the
"on-demand" sharing of resources, and to provision a common, integrated, and secure framework
for data interchange and access.
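The dynamic membership just described can be sketched as a small data structure: participants join or leave a virtual organization, contributing resources to a shared pool. The class and all participant and resource names are hypothetical.

```python
# Hedged sketch of a dynamic virtual organization: participants and the
# resources they share can be added or removed as a collaboration evolves.
# All names are invented for illustration.

class VirtualOrganization:
    def __init__(self, purpose):
        self.purpose = purpose
        self.members = {}  # participant -> list of resources they share

    def join(self, participant, shared_resources):
        self.members[participant] = list(shared_resources)

    def leave(self, participant):
        self.members.pop(participant, None)

    def pooled_resources(self):
        """All resources currently shared across the VO."""
        return [r for res in self.members.values() for r in res]

vo = VirtualOrganization("ozone-layer-study")
vo.join("lab-tokyo", ["telescope-feed", "40TB-storage"])
vo.join("lab-geneva", ["512-cpu-cluster"])
vo.leave("lab-tokyo")
print(vo.pooled_resources())  # ['512-cpu-cluster']
```

A real VO implementation would, of course, add authentication, credential delegation, and policy enforcement around each of these operations.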
Grid Computing systems provide a wide range of capabilities that address the above kinds of
analysis and modeling activities. These advanced types of solutions also provide complex job
schedulers and resource managers to deal with computing power requirements. This enables
automobile manufacturers (as an example) to shorten analysis and design times, all while
minimizing both capital expenditures and operational expenditures.
Collaborative Games
Collaborative Grid Computing disciplines are emerging to support online games, utilizing
on-demand provisioning of computation-intensive resources such as computers and storage
networks. These resources are selected based on requirements, often involving aspects such as
the volume of traffic and the number of players, rather than on centralized servers and other
fixed resources.
These on-demand-driven games provide a flexible approach with reduced up-front cost in hardware
and software resources. We can imagine such games using an increasing number of computing
resources as the number of concurrent players grows, and decreasing resource usage when there
are fewer players. Grid Computing gaming environments are capable of supporting such virtualized
environments for collaborative gaming.
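The scaling behavior described above can be sketched in a few lines: the number of allocated game servers grows and shrinks with the concurrent player count. The capacity figure of 500 players per server is an invented assumption, not a real benchmark.

```python
# Illustrative sketch of on-demand provisioning for a multiplayer game:
# server count tracks the player count. 500 players/server is assumed.

import math

PLAYERS_PER_SERVER = 500

def servers_needed(concurrent_players, minimum=1):
    """Scale server count with load, never dropping below a floor."""
    return max(minimum, math.ceil(concurrent_players / PLAYERS_PER_SERVER))

print(servers_needed(120))   # 1  (quiet period)
print(servers_needed(4800))  # 10 (peak load)
```

A production environment would smooth this with hysteresis and provisioning lead times so servers are not churned on every small fluctuation.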
Government
Grid Computing environments in government focus on providing coordinated access to the massive
amounts of data held across a government's various agencies. This provides faster access for
solving critical problems, such as emergency situations, as well as for normal activities. These
key environments enable more efficient decision making with less turnaround time.
Grid Computing enables the creation of virtual organizations, including many participants from
various governmental agencies (e.g., state and federal, local or country, etc.). This is necessary in
order to provide the data needed for government functions, in a real-time manner, while performing
the analysis on the data to detect the solution aspects of the specific problems being addressed.
The formation of virtual organizations, and the respective elements of security, is most challenging
due to the high levels of security in government and the very complex requirements.
Grid Applications
Based on our earlier discussion, we can see that Grid Computing applications share common needs,
such as those described in (but not limited to) the following items:
Application partitioning that involves breaking the problem into discrete pieces
Data communications distributing the problem data where and when it is required
Let us now explore some of these Grid applications and their usage patterns. We start with
schedulers, which form the core component of most computational grids.
Schedulers
Schedulers are applications responsible for the management of jobs: allocating the resources
needed for a specific job, partitioning jobs to schedule parallel execution of tasks, data
management, event correlation, and service-level management capabilities. These schedulers form
a hierarchical structure, with meta-schedulers at the root and lower-level schedulers, providing
specific scheduling capabilities, at the leaves. A scheduler may be constructed with a local
scheduler implementation for specific job execution, or with another meta-scheduler or a cluster
scheduler for parallel execution. Figure 1.2 shows this concept.
Figure 1.2. The scheduler hierarchy embodies local, meta-level, and cluster
schedulers.
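The hierarchy in Figure 1.2 can be sketched as follows: a meta-scheduler at the root routes each job to one of its children, which may be local schedulers (leaves) or further meta-schedulers. The classes and the "route to the child with most free capacity" policy are assumptions made for this illustration.

```python
# Hedged sketch of a scheduler hierarchy: meta-schedulers route jobs
# down to local schedulers. Names and the routing policy are invented.

class LocalScheduler:
    def __init__(self, name, free_slots):
        self.name, self.free_slots = name, free_slots

    def capacity(self):
        return self.free_slots

    def submit(self, job):
        self.free_slots -= 1           # claim one execution slot
        return f"{job} -> {self.name}"

class MetaScheduler:
    def __init__(self, children):
        self.children = children

    def capacity(self):
        return sum(c.capacity() for c in self.children)

    def submit(self, job):
        # Route the job to the child subtree with the most free capacity.
        best = max(self.children, key=lambda c: c.capacity())
        return best.submit(job)

root = MetaScheduler([
    MetaScheduler([LocalScheduler("clusterA", 2), LocalScheduler("clusterB", 5)]),
    LocalScheduler("workstationC", 1),
])
print(root.submit("job-1"))  # job-1 -> clusterB
```

Because both classes expose the same `capacity`/`submit` interface, the tree can nest to any depth, which is the essence of the meta-scheduler design.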
The jobs submitted to Grid Computing schedulers are evaluated based on their service-level
requirements, and then allocated to the respective resources for execution. This involves
complex workflow management and data movement activities occurring on a regular basis.
Schedulers must provide capabilities for areas such as (but not limited to):
Job and resource policy management and enforcement for best turnaround times within the
allowable budget constraints
Later in this book, full treatment is provided for many of the most notable scheduler and meta-
scheduler implementations.
Resource Broker
The resource broker provides pairing services between the service requester and the service
provider. This pairing enables the selection of best available resources from the service provider for
the execution of a specific task. These resource brokers collect information (e.g., resource
availability, usage models, capabilities, and pricing information) from the respective resources, and
use this information source in the pairing process.
Figure 1.3 illustrates the use of a resource broker for purposes of this discussion. This
particular resource broker provides feedback to the users on the available resources. In the
general case, the resource broker may select a suitable scheduler for the resource execution
task, and collaborate with the scheduler to execute the task(s).
Figure 1.3. The resource broker collects information from the respective
resources, and utilizes this information source in the pairing process.
The pairing process in a resource broker involves allocation and support functions such as:
Allocating the appropriate resource or a combination of resources for the task execution
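The pairing process can be illustrated with a minimal sketch: the broker filters providers by availability and capability, then picks the best match (here, cheapest). The provider records, the capability/price fields, and the cost-based policy are invented for this example.

```python
# Illustrative sketch of a resource broker pairing a service request
# with the best available provider. All names and fields are assumed.

def pair(request, providers):
    """Return the cheapest available provider that covers the request."""
    suitable = [p for p in providers
                if p["available"]
                and set(request["needs"]) <= set(p["capabilities"])]
    if not suitable:
        return None
    return min(suitable, key=lambda p: p["price_per_hour"])

providers = [
    {"name": "gridX", "capabilities": ["compute", "storage"],
     "available": True,  "price_per_hour": 4.0},
    {"name": "gridY", "capabilities": ["compute"],
     "available": True,  "price_per_hour": 2.5},
    {"name": "gridZ", "capabilities": ["compute", "storage"],
     "available": False, "price_per_hour": 1.0},
]
request = {"needs": ["compute", "storage"]}
print(pair(request, providers)["name"])  # gridX
```

Note that gridZ is cheaper but unavailable and gridY lacks storage, so the broker settles on gridX; a real broker would fold usage models and SLA terms into the same scoring step.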
Load Balancing
Load balancing in a Grid Computing infrastructure concerns the traditional distribution of
workload among the resources in a Grid Computing environment. This load-balancing feature must
always be integrated into any system in order to avoid processing delays and overcommitment of
resources. These kinds of applications can be built in connection with schedulers and resource
managers.
Workload can be pushed outbound to the resources based on their availability state, or the
resources can pull jobs from the schedulers depending on their availability. This level of load
balancing involves partitioning of jobs, identifying the resources, and queuing of the jobs. In
some cases resource reservations may be required, as well as running multiple jobs in parallel.
Another feature that might be of interest for load balancing is support for failure detection and
management. These load distributors can redistribute the jobs to other resources if needed.
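The push model and the failure-redistribution feature can be sketched as follows. This is a toy illustration under assumed inputs (job costs and resource names are invented): jobs are pushed to the least-loaded resource, and jobs stranded on a failed resource are re-balanced over the survivors.

```python
import heapq

def balance(jobs, resources):
    """Push each (job, cost) to the currently least-loaded resource.
    `resources` maps name -> current load; returns job -> resource name."""
    heap = [(load, name) for name, load in resources.items()]
    heapq.heapify(heap)
    placement = {}
    for job, cost in jobs:
        load, name = heapq.heappop(heap)   # least-loaded resource
        placement[job] = name
        heapq.heappush(heap, (load + cost, name))
    return placement

def redistribute(placement, jobs, failed, resources):
    """Failure management: re-balance jobs stranded on a failed resource
    over the surviving resources only."""
    survivors = {n: l for n, l in resources.items() if n != failed}
    stranded = [(j, c) for j, c in jobs if placement[j] == failed]
    placement.update(balance(stranded, survivors))
    return placement

jobs = [("j1", 4), ("j2", 2), ("j3", 1)]
placement = balance(jobs, {"a": 0, "b": 0})
placement = redistribute(placement, jobs, "a", {"a": 0, "b": 0})
# every job now runs on the surviving resource "b"
```

A production load distributor would track live load reported by monitoring rather than estimated costs, but the partition-place-requeue shape is the same.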
Grid Portals
Grid portals are similar to Web portals, in the sense that they provide uniform access to the grid resources. For example, grid portals provide capabilities for Grid Computing resource authentication, remote resource access, scheduling capabilities, and monitoring status information. These kinds of portals help to alleviate the complexity of task management through customizable and personalized graphical interfaces for the users. This, in turn, lets end users concentrate on their own domain knowledge rather than on the specific details of grid resource management.
Some examples of these grid portal capabilities are noted in the following list:
File transfer facilities such as file upload, download, integration with custom software, and so
on
Security management
In short, these grid portals help free end users from the complexity of job management and resource
allocation so they can concentrate more on their domain of expertise. There are a number of
standards and software development toolkits available to develop custom portals. The emerging Web services and Web service portal standards will play a more significant role in portal development.
Integrated Solutions
Many of the global industry sectors have witnessed the emergence of a number of integrated grid
application solutions in the last few years. This book focuses on this success factor.
These integrated solutions are a combination of the existing advanced middleware and application
functionalities, combined to provide more coherent and high performance results across the Grid
Computing environment.
Integrated Grid Computing solutions will have more enhanced features to support more complex
utilization of grids such as coordinated and optimized resource sharing, enhanced security
management, cost optimizations, and areas yet to be explored. It is straightforward to see that these integrated solutions, in both the commercial and noncommercial worlds, deliver high value and significant cost reductions. Grid applications can achieve high levels of flexibility by utilizing the infrastructures provided by application and middleware frameworks.
In the next section we introduce and explain the grid infrastructure. Today, the most notable
integrated solutions in the commercial and industry sectors are utility computing, on-demand
solutions, and resource virtualizations infrastructures. Let us briefly explore aspects of some of
these infrastructure solutions. We will provide an additional, more focused treatment in subsequent
chapters of this book.
Grid Infrastructure
The grid infrastructure forms the core foundation for successful grid applications. This infrastructure
is a complex combination of a number of capabilities and resources identified for the specific
problem and environment being addressed.
In initial stages of delivering any Grid Computing application infrastructure, the developers/service
providers must consider the following questions in order to identify the core infrastructure support
required for that environment:
1. What problem(s) are we trying to solve for the user? How do we make grid enablement simpler, while also making the user's application simpler? How does the developer (programmatically) help the user quickly gain access to and utilize the application to best fit their problem-resolution needs?
2. How difficult is it to use the grid tool? Are grid developers providing a flexible environment for the intended user community?
3. Is there anything not yet considered that would make it easier for grid service providers to create tools for the grid, suitable for the problem domain?
4. What are the open standards, environments, and regulations grid service providers must address?
In the early development stages of grid applications, numerous vertical "towers" and middleware
solutions were often developed to solve Grid Computing problems. These various middleware and
solution approaches were developed for fairly narrow and limited problem-solving domains, such as
middleware to deal with numerical analysis, customized data access grids, and other narrow
problems. Today, grid service-oriented technologies [3] and interoperable XML-based [4] solutions are converging and becoming ever more present, and industry providers offer a number of reusable grid middleware solutions facilitating the following requirement areas, so it is becoming simpler to quickly deploy valuable solutions. Figure 1.4 shows this topology of middleware topics.
Figure 1.4. Grid middleware topic areas are becoming more sophisticated at an
aggressive rate.
In general, a Grid Computing infrastructure component must address several potentially complicated
areas in many stages of the implementation. These areas are:
Security
Resource management
Information services
Data management
Security
The heterogeneous nature of resources and their differing security policies complicate the security schemes of a Grid Computing environment. These computing resources are hosted in differing security domains and on heterogeneous platforms. Simply speaking, our middleware
solutions must address local security integration, secure identity mapping, secure
access/authentication, secure federation, and trust management.
The other security requirements are often centered on the topics of data integrity, confidentiality,
and information privacy. The Grid Computing data exchange must be protected using secure
communication channels, including SSL/TLS and oftentimes in combination with secure message
exchange mechanisms such as WS-Security. The most notable security infrastructure used for securing grids is the Grid Security Infrastructure (GSI). In most cases, GSI provides capabilities for single sign-on, heterogeneous platform integration, and secure resource access/authentication.
The latest and most notable security solution is the use of WS-Security standards. This mechanism
provides message-level, end-to-end security needed for complex and interoperable secure solutions.
In the coming years we will see a number of secure grid environments using a combination of GSI and WS-Security mechanisms for secure message exchanges. We will discuss the details of security
mechanisms provided by these standards later in this book.
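At the transport level, the secure-channel requirement above amounts to mutually authenticated TLS: each side presents a certificate issued by a CA the grid trusts. The sketch below uses Python's standard `ssl` module to show the shape of such a server-side configuration; it is not GSI itself, and the credential file names are placeholders (the load calls are commented out so the sketch stands alone).

```python
import ssl

def make_server_context(ca_file="ca.pem", cert="host.pem", key="host.key"):
    """Server-side TLS context that demands a client certificate, as
    transport-level grid security schemes require. Paths are placeholders
    for credentials issued by the grid's certificate authority."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED          # reject unauthenticated peers
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    # ctx.load_cert_chain(cert, key)             # this host's credential
    # ctx.load_verify_locations(ca_file)         # the trusted grid CA
    return ctx

ctx = make_server_context()
```

Message-level mechanisms such as WS-Security then protect the payload end-to-end across intermediaries, which a point-to-point TLS channel alone cannot do.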
Resource Management
The tremendously large number and heterogeneous nature of Grid Computing resources make resource management a significant challenge in Grid Computing
environments. These resource management scenarios often include resource discovery, resource
inventories, fault isolation, resource provisioning, resource monitoring, a variety of autonomic
capabilities, [5] and service-level management activities. The most interesting aspect of the resource
management area is the selection of the correct resource from the grid resource pool, based on the
service-level requirements, and then to efficiently provision them to facilitate user needs.
Let us explore an example of a job management system, where the resource management feature
identifies the job, allocates the suitable resources for the execution of the job, partitions the job if
necessary, and provides feedback to the user on job status. This job scheduling process includes
moving the data needed for various computations to the appropriate Grid Computing resources, and
mechanisms for dispatching the job results.
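The job-management lifecycle just described (identify, stage data, execute, report status, return results) can be sketched as a small state machine. This is a hypothetical illustration, not GRAM's API; the class names and the dict-based "resource" are invented for the example.

```python
from enum import Enum

class State(Enum):
    QUEUED = 1
    STAGED = 2    # input data moved near the computation
    RUNNING = 3
    DONE = 4

class Job:
    def __init__(self, job_id, input_files, work):
        self.id, self.inputs, self.work = job_id, input_files, work
        self.state, self.result = State.QUEUED, None

def run_job(job, resource):
    """Minimal lifecycle: stage inputs to the resource, execute, then
    dispatch the result back; the state changes are the user's feedback."""
    resource["staged"] = list(job.inputs)   # move data to the resource
    job.state = State.STAGED
    job.state = State.RUNNING
    job.result = job.work(resource["staged"])
    job.state = State.DONE
    return job.result

job = Job("j-1", ["in.dat", "params.txt"], lambda staged: len(staged))
out = run_job(job, {})   # job.state ends as State.DONE
```

A real job manager would additionally partition the job, handle failures mid-run, and stage results back over a secure transfer, but the QUEUED-to-DONE progression is the core contract.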
It is important to understand that multiple service providers can host Grid Computing resources across
many domains, such as security, management, networking services, and application functionalities.
Operational and application resources may also be hosted on different hardware and software
platforms. In addition to this complexity, Grid Computing middleware must provide efficient
monitoring of resources to collect the required metrics on utilization, availability, and other
information.
One consequence of this fact is (as an example) the security challenge and the ability for a grid service provider to reach out and probe into other service providers' domains in order to obtain and reason about key operational information (i.e., to reach across a service provider environment to ascertain firewall and router volume-related specifics, networking switch status, or application server status). This oftentimes becomes complicated across several dimensions, and has to be resolved by a meeting of the minds among all service providers, such as delivering necessary information to all providers when and where it is required.
Another valuable and very critical feature across the Grid Computing infrastructure is found in the
area of provisioning; that is, to provide autonomic capabilities for self-management, self-diagnosis,
self-healing, and self-configuring. The most notable resource management middleware solution is
the Grid Resource Allocation Manager (GRAM). This provides a robust job management service for users, including job allocation, status management, data distribution, and starting/stopping jobs.
Information Services
Information services fundamentally concentrate on providing valuable information about the Grid Computing infrastructure resources. These services leverage and entirely depend on the
providers of information such as resource availability, capacity, and utilization, just to name a few.
This information is valuable and mandatory feedback to the resource managers discussed earlier in this chapter. These information services enable service providers to most
efficiently allocate resources for the variety of very specific tasks related to the Grid Computing
infrastructure solution.
In addition, developers and providers can also construct grid solutions as portals, and utilize meta-schedulers and meta-resource managers. These metrics are helpful for service-level agreement (SLA) management in conjunction with the resource policies. This information is resource specific and is provided based on the schema pertaining to that resource. We may need higher level
indexing services or data aggregators and transformers to convert these resource-specific data into
valuable information sources for the end user.
For example, one resource may provide operating system information, while another might provide information on hardware configuration; we can then group this resource information, reason with it, and suggest a "best" price combination for running a selected operating system on certain hardware. This combinatorial approach to reasoning is straightforward in a Grid Computing infrastructure, simply because all key resources are shared, as is the information correlated with those resources.
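The aggregation step above can be made concrete with a toy example. Everything here is invented for illustration (the offers, prices, and compatibility table are hypothetical): two resource-specific schemas are joined by an aggregator, which then picks the cheapest compatible combination.

```python
from itertools import product

# Hypothetical per-resource advertisements: one provider describes
# operating systems, another describes hardware; neither alone answers
# the user's question.
os_offers = [{"os": "Linux", "price": 2.0}, {"os": "AIX", "price": 5.0}]
hw_offers = [{"cpu": "x86", "price": 1.0}, {"cpu": "POWER", "price": 4.0}]

# Assumed compatibility knowledge held by the aggregator.
compatible = {("Linux", "x86"), ("Linux", "POWER"), ("AIX", "POWER")}

def best_combination():
    """Enumerate OS/hardware pairs, keep compatible ones, return the
    cheapest by combined price."""
    combos = [(o, h) for o, h in product(os_offers, hw_offers)
              if (o["os"], h["cpu"]) in compatible]
    return min(combos, key=lambda oh: oh[0]["price"] + oh[1]["price"])

o, h = best_combination()   # cheapest compatible pairing
```

Higher-level indexing services play exactly this role: transforming resource-specific schemas into information an end user (or broker) can reason over.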
Data Management
Data forms the single most important asset in a Grid Computing system. This data may be the input to a resource, or the results produced by a resource upon executing a specific task. If the
infrastructure is not designed properly, the data movement in a geographically distributed system
can quickly cause scalability problems. It is well understood that the data must be near to the
computation where it is used. This data movement in any Grid Computing environment requires
absolutely secure data transfers, both to and from the respective resources. The current advances
surrounding data management are tightly focusing on virtualized data storage mechanisms, such as
storage area networks (SAN), network file systems, dedicated storage servers, and virtual
databases. These virtualization mechanisms in data storage solutions and common access
mechanisms (e.g., relational SQL, Web services, etc.) help developers and providers to design data
management concepts into the Grid Computing infrastructure with much more flexibility than
traditional approaches.
Developers and providers must factor a number of considerations into selecting the most appropriate data management mechanism for Grid Computing infrastructures. These include the size of the data repositories, resource geographical distribution, security
requirements, schemes for replication and caching facilities, and the underlying technologies utilized
for storage and data access.
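The "keep the data near the computation" principle often reduces to a replica-selection decision. The sketch below is a deliberately simplified model (site names, sizes, and bandwidths are invented; real systems would also weigh security domains and replication policy): pick the replica with the smallest estimated transfer time to the compute site.

```python
def transfer_time(size_gb, src, dst, bandwidth_gbps):
    """Estimated seconds to move the data; zero if it is already local."""
    if src == dst:
        return 0.0
    return size_gb / bandwidth_gbps[(src, dst)]

def pick_replica(size_gb, replica_sites, compute_site, bandwidth_gbps):
    """Choose the replica whose movement to the computation is cheapest."""
    return min(replica_sites,
               key=lambda s: transfer_time(size_gb, s, compute_site,
                                           bandwidth_gbps))

# Hypothetical topology: siteB has a fast link to the compute site.
bw = {("siteA", "siteC"): 1.0, ("siteB", "siteC"): 10.0}
chosen = pick_replica(500, ["siteA", "siteB"], "siteC", bw)
```

Inverting the decision, i.e., moving the computation to the data when the data is larger than the job, uses the same cost comparison with the roles swapped.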
So far in this introductory chapter we have been discussing the details surrounding many aspects of
the middleware framework requirements, specifically the emergence of service provider-oriented
architectures [6] and, hence, the open and extremely powerful utility value of XML-based interoperable messages. These, combined, provide a wide range of capabilities that deal with interoperability problems and yield a solution suitable for dynamic virtual organizational grids. The most important activity noted today in this area is the Open Grid Services Architecture (OGSA) and its surrounding standards initiatives. Significant detail is recorded on this architecture, and it will be given full treatment in subsequent chapters of this book. The OGSA
provides a common interface solution to grid services, and all the information has been conveniently
encoded using XML as the standard. This provides a common approach to information services and
resource management for Grid Computing infrastructures.
This introductory chapter has discussed many of the chapters and some of their detail that will be
presented throughout this book. This introductory discussion has been presented at a high level,
and more detailed discussions with simple-to-understand graphics will follow.
Conclusion
So far we have been describing and walking through overview discussion topics on the Grid
Computing discipline that will be discussed further throughout this book, including the Grid
Computing evolution, the applications, and the infrastructure requirements for any grid environment.
In addition to this, we have discussed when one should use Grid Computing disciplines, and the
factors developers and providers must consider in the implementation phases. With this introduction
we can now explore deeper into the various aspects of a Grid Computing system, its evolution
across the industries, and the current architectural efforts underway throughout the world.
The succeeding chapters in this book introduce the reader to this new, evolutionary era of Grid Computing, in a concise, hard-hitting, and easy-to-understand manner.
Parametric Computational Experiments
Parametric computational experiments are becoming increasingly important in science and engineering as a means of exploring
the behavior of complex systems. For example, a flight engineer may explore the behavior of a wing by running a computational
model of the airfoil multiple times while varying key parameters such as angle of attack, air speed, and so on.
The results of these multiple experiments yield a picture of how the wing behaves in different parts of parametric space.
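A parametric experiment like the wing study maps naturally onto a grid because each run is independent. The sketch below uses a stand-in formula (the `lift_coefficient` function and its parameter values are purely illustrative; a real experiment would invoke a CFD code) to show the sweep over parametric space.

```python
from itertools import product

def lift_coefficient(angle_deg, speed_ms):
    """Toy stand-in for the airfoil model; not a real aerodynamic formula."""
    return 0.1 * angle_deg * (speed_ms / 100.0)

angles = [0, 5, 10]        # angle of attack, degrees
speeds = [50, 100, 200]    # air speed, m/s

# One run per point in parametric space. On a grid, each run could be
# dispatched to a different resource, since no run depends on another.
results = {(a, s): lift_coefficient(a, s)
           for a, s in product(angles, speeds)}
```

Tools such as Nimrod-G schedule exactly these sweeps, adding economic criteria (deadline, budget) to decide where each point of parametric space is evaluated.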
Many practitioners of Grid Computing believe that economic policy/criteria-driven Grid Computing, as depicted by Nimrod-G, is of major interest to the utility computing world.
Grid Research Integration Deployment and Support (GRIDS) Center. The GRIDS[11] center
is responsible for defining, developing, deploying, and supporting an integrated and stable
middleware infrastructure created from a number of open source grid and other distributed
computing technology frameworks. It intends to support 21st-century science and engineering
applications by working closely with a number of universities and research organizations.
Some of the open source packages included in this middleware are Globus Toolkit, Condor-G,
GSI-OpenSSH, Network Weather Service, Grid Packaging Tools, GridConfig, MPICH-G2, MyProxy,
and so on.
Enterprise and Desktop Integration Technologies (EDIT) Consortium. EDIT [12] develops
tools, practices, and architectures to leverage campus infrastructures to facilitate multi-
institutional collaboration.
EDIT provides software to support a wider variety of desktop security, video, and enterprise
uses with a directory schema. This facilitates the federated model of directory-enabled
interrealm authentication and authorization. In addition, they are responsible for conventions and best-practice guidelines, architecture documents, and policies, and for providing services to manage the middleware. Some of the open source packages included in this middleware are: LDAP Operational ORCA Kollector (LOOK), Privilege and Role Management Infrastructure Standards Validation (PERMIS), OpenSAML, and others.
The latest release (Release 3) of the NMI middleware consists of 16 software packages. The above two teams of the NMI are creating production-quality middleware using open-source and open-standards approaches. They continue to refine processes for team-based software development,
documentation, and technical support. The software packages included in the NMI solution have
been tested and debugged by NMI team members, so that various users, campuses, and institutions
can easily deploy them. In addition, it helps to facilitate directory-enabled (LDAP) sharing and
exchanging of information to support authentication and authorization among campuses and
institutions.
The aforementioned best practices and policy deliverables have been reviewed and deployed by
leading campuses and institutions. Some of the major initiatives using this middleware suite include
NEESgrid (Network for Earthquake Engineering Simulation), GriPhyN, and the iVDGL.
Organizations Building and Using Grid-Based Solutions to Solve
Computing, Data, and Network Requirements
These organizations and individuals are the real users of Grid Computing. They are benefiting from
resource sharing and virtualization. As of now these projects are mostly in the scientific areas. We
will be discussing some of the major grid projects and infrastructures around the world. In general,
these grid users need:
On-demand construction of virtual computing systems with the capabilities to solve the problems at hand, including scarcity of computing power, data storage, and real-time processing
Most of the DOE projects are widely distributed among collaborators and non-collaborators. They require a cyberinfrastructure that supports the process of distributed science with sharable resources, including expensive and complex scientific instruments.
All of the science areas need high-speed networks and advanced middleware to discover,
manage, and access computing and storage systems.
The DOE Science Grid is an integrated and advanced infrastructure that delivers:
Data capacity sufficient for scientific tasks with location independence and manageability
Software services with rich environments that let scientists focus on the science simulation and
analysis aspects rather than on management of computing, data, and communication resources
The construction of grids across five major DOE facilities provides the computing and data resources.
To date major accomplishments include the following:
Design and deployment of a grid security infrastructure for collaboration with U.S. and European
High Energy Physics projects, helping to create a single-sign-on solution within the grid
environment
The following work is used by the DOE's Particle Physics Data Grid, Earth Systems Grid, and Fusion
Grid projects:
A resource monitoring and debugging infrastructure for managing these widely distributed
resources
Several DOE applications use this grid infrastructure including computational chemistry, ground
water transport, climate modeling, bioinformatics, and so on.
To establish a European GRID network of leading high performance computing centers from
different European countries
To develop important GRID software components and to integrate them into EUROGRID (fast
file transfer, resource broker, interface for coupled applications, and interactive access)
To contribute to the international GRID development and work with the leading international
GRID projects
The application-specific work packages identified for the EUROGRID project are described in the
following areas:
Bio Grid. The BioGRID project develops interfaces to enable chemists and biologists to submit
work to high performance center facilities via a uniform interface from their workstations,
without having to worry about the details of how to run particular packages on different
architectures.
Metro Grid. The main goal of the Metro Grid project is the development of an application
service provider (ASP) solution, which allows anyone to run a high resolution numerical weather
prediction model on demand.
Computer-Aided Engineering (CAE) Grid. This work project focuses on industrial CAE
applications including automobile and aerospace industries. It aims at providing services to
high performance computing (HPC) customers who require huge computing power to solve their
engineering problems.
The major partners in this work package are Debis SystemHaus and EADS Corporate Research
Center. They are working to exploit the CAE features like code coupling (to improve system
design by reducing the prototyping and testing costs) and ASP-type services (designing
application-specific user interfaces for job submission).
High Performance Computing (HPC) Research Grid. This HPC research grid is used as a test-bed for the development of distributed applications, and as an arena for cooperative work on major scientific challenges, using computational resources distributed on a European scale. The major partners in this work package are the HPC centers.
The EUROGRID software is based on the UNICORE system developed and used by the leading
German HPC centers.
Earth Observations
One of the main challenges for High Energy Physics is to answer longstanding questions about the fundamental particles of matter and the forces acting between them. In particular, the goal is to explain why some particles are much heavier than others, and why particles have mass at all. To that end, CERN is building the Large Hadron Collider (LHC), one of the most powerful particle accelerators.
The experiments on the LHC will generate huge amounts of data. The DataGrid Project is providing the
solution for storing and processing this data. A multitiered, hierarchical computing model will be
adopted to share data and computing power among multiple institutions. The Tier-0 center is
located at CERN and is linked by high-speed networks to approximately 10 major Tier-1 data-
processing centers. These will fan out the data to a large number of smaller ones (Tier-2).
The storage and exploitation of genomes and the huge flux of data coming from post-genomics puts
growing pressure on computing and storage resources within existing physical laboratories. Medical
images are currently distributed over medical image production sites (radiology departments,
hospitals).
Although there is a need today, as there is no standard for sharing data between sites, there is an