GridComputing-An Introduction MAIN

Uploaded by

Goldi Ladani

GRID COMPUTING

Outline
• Introduction to Grid Computing
• Methods of Grid Computing
• Grid Middleware
• Grid Architecture
• Grid Applications
• Related Topics on Grid
Grid Computing
Grid computing is a form of distributed computing whereby a "super and
virtual computer" is composed of a cluster of networked, loosely coupled
computers, acting in concert to perform very large tasks.

Grid computing (Foster and Kesselman, 1999) is a growing technology
that facilitates the execution of large-scale, resource-intensive
applications on geographically distributed computing resources.

It facilitates flexible, secure, coordinated, large-scale resource sharing
among dynamic collections of individuals, institutions, and resources.

It enables communities (“virtual organizations”) to share geographically
distributed resources as they pursue common goals.
Ian Foster and Carl Kesselman
Introduction

• Since its introduction, the concept of grid computing has acquired
great popularity, even greater than the Web itself had at its beginning.
• The concept has not only found its place within numerous science
projects (e.g., in medicine), but is also being used for various
commercial applications.
What is Grid Computing?
• Grid computing is a type of data management and computer
infrastructure, designed primarily to support scientific research but,
as noted in the introduction, also used in commercial settings,
business research, entertainment, and by governments of
different countries.
Criteria for a Grid:
Coordinates resources that are not subject to
centralized control.
Uses standard, open, general-purpose protocols and
interfaces.
Delivers nontrivial qualities of service.

Benefits
Exploit Underutilized resources
Resource load Balancing
Virtualize resources across an enterprise
Data Grids, Compute Grids
Enable collaboration for virtual organizations
Who can use grid computing
• Governments and international organizations
• The military
• Teachers and educators
• Businesses
Grid Applications
Data and computationally intensive applications:
This technology has been applied to computationally intensive scientific,
mathematical, and academic problems such as drug discovery, economic
forecasting, and seismic analysis, as well as back-office data processing
in support of e-commerce.
• A chemist may utilize hundreds of processors to screen thousands of
compounds per hour.
• Teams of engineers worldwide pool resources to analyze terabytes of
structural data.
• Meteorologists seek to visualize and analyze petabytes of climate data
with enormous computational demands.
Resource sharing
• Computers, storage, sensors, networks, …
• Sharing is always conditional: issues of trust, policy, negotiation,
payment, …
Coordinated problem solving
• Distributed data analysis, computation, collaboration, …
Grid Computing Applications
• One of the most tantalizing applications of radio astronomy is the
observation of radio signals as part of the Search for
Extraterrestrial Intelligence (SETI).
• The vast amount of computing capacity required for SETI radio
signal processing has led to a unique grid computing concept that
has now been expanded to many applications.
Grid Topologies

• Intragrid
– Local grid within an organisation
– Trust based on personal contracts
• Extragrid
– Resources of a consortium of organisations
connected through a (Virtual) Private Network
– Trust based on Business to Business contracts
• Intergrid
– Global sharing of resources through the internet
– Trust based on certification
TYPES OF GRID

• Computational Grid
• Scavenging Grid
• Data Grid
Computational Grid

• A computational grid is focused on setting aside resources
specifically for computing power.
• In this type of grid, most of the machines are
high-performance servers.
Computational Grid

“A computational grid is a hardware and software infrastructure
that provides dependable, consistent, pervasive, and
inexpensive access to high-end computational capabilities.”

“The Grid: Blueprint for a New Computing Infrastructure”,
Kesselman & Foster

Example: Science Grid (US Department of Energy)


Scavenging Grid

• A scavenging grid is most commonly used with large numbers of
desktop machines.
• Machines are scavenged for available CPU cycles and other resources.
• Owners of the desktop machines are usually given control over when
their resources are available to participate in the grid.
Data Grid

• A data grid is responsible for housing and providing access to data
across multiple organizations.
• Users are not concerned with where this data is located as long as
they have access to the data.
Data Grid
• A data grid is a grid computing system that deals with data: the
controlled sharing and management of large amounts of distributed data.
• The data grid is the storage component of a grid environment.
Scientific and engineering applications require access to large
amounts of data, and often this data is widely distributed. A data
grid provides seamless access to the local or remote data required
to complete compute-intensive calculations.
Examples:
the Biomedical Informatics Research Network (BIRN) and
the Southern California Earthquake Center (SCEC).
Methods of Grid Computing

• Distributed Supercomputing
• High-Throughput Computing
• On-Demand Computing
• Data-Intensive Computing
• Collaborative Computing
• Logistical Networking
Distributed Supercomputing

• Combining multiple high-capacity resources on a computational grid
into a single, virtual distributed supercomputer.
• Tackles problems that cannot be solved on a single system.
High-Throughput Computing
• Uses the grid to schedule large numbers of loosely coupled or
independent tasks, with the goal of putting unused processor
cycles to work.
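The high-throughput idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: local threads stand in for grid nodes, and `score_compound` with its scoring formula is a made-up placeholder for one independent work unit. It is not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def score_compound(compound_id):
    """Stand-in for one independent work unit (the scoring formula is made up)."""
    return compound_id, (compound_id * 37) % 101


def run_independent_tasks(task_ids, workers=4):
    """Farm out tasks that share no state, so they can run in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields (id, score) pairs; dict() collects them by id.
        return dict(pool.map(score_compound, task_ids))


results = run_independent_tasks(range(10))
best = max(results, key=results.get)
print(best, results[best])
```

Because the tasks never communicate, adding workers changes only the elapsed time, not the results, which is what makes this style of workload a good fit for otherwise idle processors.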

On-Demand Computing
• Uses grid capabilities to meet short-term requirements for
resources that are not locally accessible.
• Models real-time computing demands.
Collaborative Computing
• Concerned primarily with enabling and enhancing
human-to-human interactions.
• Applications are often structured in terms of a virtual shared space.
Data-Intensive Computing
• The focus is on synthesizing new information from data that is
maintained in geographically distributed repositories, digital
libraries, and databases.
• Particularly useful for distributed data mining.
Logistical Networking
• Logistical networks focus on exposing storage resources inside
networks by optimizing the global scheduling of data transport
and data storage.
• Contrasts with traditional networking, which does not explicitly
model storage resources in the network.
• Provides high-level services for Grid applications.
• Called "logistical" because of the analogy it bears with the
systems of warehouses, depots, and distribution channels.
P2P Computing vs Grid Computing

• Differ in target communities
• A Grid system deals with a more complex, more powerful, more
diverse, and more highly interconnected set of resources than P2P.
• Virtual organizations (VO)
A typical view of a Grid environment
[Figure: a User, a Resource Broker, a Grid Information Service, and Grid Resources, connected by arrows numbered 1-4 and labeled "Details of Grid resources", "Grid application", "Computational jobs", "Processed jobs", and "Computation result".]
• User: A user sends a computation- or data-intensive application to
Global Grids in order to speed up the execution of the application.
• Grid Information Service: The Grid information service system
collects the details of the available Grid resources and passes the
information to the resource broker.
• Resource Broker: A resource broker distributes the jobs in an
application to the Grid resources, based on the user’s QoS requirements
and the details of available Grid resources, for further execution.
• Grid Resources: Grid resources (cluster, PC, supercomputer, database,
instruments, etc.) in the Global Grid execute the user jobs.
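The broker's role in this picture can be illustrated with a small Python sketch. Everything here is hypothetical: the `Resource` and `Job` fields, the `max_load` threshold, and the "least-loaded resource with enough free CPUs" rule are assumptions chosen for illustration, not the policy of any real resource broker.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One entry as an information service might report it (fields are illustrative)."""
    name: str
    free_cpus: int
    load: float  # 0.0 (idle) to 1.0 (saturated)


@dataclass
class Job:
    name: str
    cpus_needed: int


def broker(jobs, resources, max_load=0.8):
    """Assign each job to the least-loaded resource that still has enough free CPUs."""
    free = {r.name: r.free_cpus for r in resources}
    plan = {}
    for job in jobs:
        candidates = [
            r for r in resources
            if r.load <= max_load and free[r.name] >= job.cpus_needed
        ]
        if not candidates:
            plan[job.name] = None  # no resource currently meets the requirement
            continue
        best = min(candidates, key=lambda r: r.load)
        free[best.name] -= job.cpus_needed  # reserve the CPUs for this job
        plan[job.name] = best.name
    return plan


resources = [
    Resource("cluster-a", free_cpus=8, load=0.30),
    Resource("desktop-b", free_cpus=2, load=0.10),
    Resource("super-c", free_cpus=64, load=0.95),
]
jobs = [Job("j1", 2), Job("j2", 4), Job("j3", 16)]
plan = broker(jobs, resources)
print(plan)
```

In this toy run the heavily loaded supercomputer is skipped, so the 16-CPU job stays unassigned until the reported resource details change.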
Grid Middleware
• Grids are typically managed by gridware, a special type of middleware
that enables sharing and manages grid components based on user
requirements and resource attributes (e.g., capacity, performance).
• Software that connects other software components or applications to
provide the following functions:
Run applications on suitable available resources
– Brokering, scheduling
Provide uniform, high-level access to resources
– Semantic interfaces
– Web Services, Service Oriented Architectures
Address inter-domain issues of security, policy, etc.
– Federated identities
Provide application-level status monitoring and control
Middlewares
• Globus – University of Chicago
• Condor – University of Wisconsin – high-throughput computing
• Legion – University of Virginia – virtual workspaces,
collaborative computing
• IBP (Internet Backplane Protocol) – University of Tennessee –
logistical networking
• NetSolve – solving scientific problems in heterogeneous
environments – high-throughput and data-intensive
Two Key Grid Computing Groups
The Globus Alliance (www.globus.org)
• Composed of people from Argonne National Labs, University of
Chicago, University of Southern California Information Sciences
Institute, University of Edinburgh, and others.
• OGSA/I standards initially proposed by the Globus Group

The Global Grid Forum (www.ggf.org)
• Heavy involvement of academic groups and industry (e.g., IBM Grid
Computing, HP, United Devices, Oracle, UK e-Science Programme,
US DOE, US NSF, Indiana University, and many others)
• Process
– Meets three times annually
– Solicits involvement from industry, research groups, and academics
Some of the Major Grid Projects
Name | URL / Sponsor | Focus
EuroGrid, Grid Interoperability (GRIP) | eurogrid.org; European Union | Create tech for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus Toolkit™
Fusion Collaboratory | fusiongrid.org; DOE Off. Science | Create a national computational collaboratory for fusion research
Globus Project™ | globus.org; DARPA, DOE, NSF, NASA, Msoft | Research on Grid technologies; development and support of Globus Toolkit™; application and deployment
GridLab | gridlab.org; European Union | Grid technologies and applications
GridPP | gridpp.ac.uk; U.K. eScience | Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Dev. & Support Center | grids-center.org; NSF | Integration, deployment, support of the NSF Middleware Infrastructure for research & education
Grid Architecture
The Hourglass Model
• Focus on architecture issues
– Propose set of core services as basic infrastructure
– Used to construct high-level, domain-specific solutions (diverse)
• Design principles
– Keep participation cost low
– Enable local control
– Support for adaptation
– “IP hourglass” model
[Figure: hourglass with Applications at the top, then diverse global services, core services at the narrow waist, and local OS at the bottom.]
Grid architecture

• Fabric layer: Provides the resources to which shared access is
mediated by Grid protocols.
• Connectivity layer: Defines the core communication and
authentication protocols required for grid-specific network functions.
• Resource layer: Defines protocols, APIs, and SDKs for secure
negotiation, initiation, monitoring, control, accounting, and
payment of sharing operations on individual resources.
• Collective layer: Contains protocols and services that capture
interactions among a collection of resources.
• Application layer: User applications that operate within a
VO environment.
Layered Grid Architecture
(By Analogy to Internet Architecture)

Grid layer | Role | Internet Protocol Architecture layer
Application | | Application
Collective | “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services | Application
Resource | “Sharing single resources”: negotiating access, controlling use | Application
Connectivity | “Talking to things”: communication (Internet protocols) & security | Transport, Internet
Fabric | “Controlling things locally”: access to, & control of, resources | Link
Example:
Data Grid Architecture
App | Discipline-specific data grid application
Collective (App) | Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Collective (Generic) | Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs
Resource | Access to data, access to computers, access to network performance data, …
Connect | Communication, service discovery (DNS), authentication, authorization, delegation
Fabric | Storage systems, clusters, networks, network caches, …


Advantages
• Increased user productivity: By providing transparent access to
resources, work can be completed more quickly.
• Scalability: Grids can grow seamlessly over time, allowing many
thousands of processors to be integrated into one cluster.
• Flexibility: Grid computing provides computing power where it is
needed most, helping to better meet dynamically changing workloads.
Disadvantages
1) Memory-hungry applications that can't take advantage
of MPI may force you to run on a large SMP.
2) You may need a fast interconnect between compute
resources (gigabit Ethernet at a minimum; InfiniBand for
MPI-intensive applications).
3) Some applications may need to be tweaked to take full
advantage of the new model.
Disadvantages…
4) Licensing across many servers may make it prohibitive for
some apps. Vendors are starting to be more flexible with
environments like this.

Areas that already are taking good advantage of grid computing
include bioinformatics, cheminformatics, oil & drilling, and
financial applications.
Grid Computing Software
Interface
•Brief introduction to Globus

•Executing a simple job on command line

•Executing program through a Grid portal

Slides for Grid Computing: Techniques and Applications by Barry Wilkinson, Chapman & Hall/CRC press, © 2009.
Chapter 1, pp 19-28. For educational use only. All rights reserved. Aug 24, 2009
Grid computing infrastructure
(middleware) software

Primary objective:

To provide a seamless environment for users
to access distributed resources.
Grid computing infrastructure software
Key aspects include:
• Secure envelope over all transactions
• Single sign-on: being able to access all available resources and
run jobs without having to supply additional passwords or
account information
• Data management tools
• Information services providing characteristics of resources and
their status (including dynamic load)
• APIs and services that enable applications themselves to take
advantage of the Grid platform
• Convenient user interface
Globus Project
• Open source software toolkit developed for Grid computing.
• One of the most influential projects.
• Roots in the I-WAY experiment.
• Work started in 1996.
• Four versions developed to present time.
• Reference implementations of Grid computing standards.
• De facto standard for Grid computing.
Globus
• A “toolkit” of services and packages for creating the basic grid
computing infrastructure
• Higher-level tools added to this infrastructure
• Version 4 is web-services based
• Some non-web-services code exists from earlier versions (legacy)
or where web services are not appropriate (for efficiency, etc.)

Some Globus toolkit versions
(approximate time line)

Fig. 1.5


Globus Toolkit
Five major parts:
• Common run time
- Libraries and services
• Security
- Components to provide secure access
• Execution management
- Executing, monitoring, and management of jobs
• Data management
- Discovery, access, and transfer of data
• Information
- Discovery and monitoring of resources and services
Some basic Globus components
• GSI (Grid Security Infrastructure)
– Provides a security envelope around Grid resources
– Uses public key cryptography
• GRAM (Globus/Grid Resource Allocation Management)
– Globus’ basic execution management component
– Used to issue and manage jobs
• MDS (Monitoring and Discovery Service)
– To discover resources and their status
• GridFTP
– For transferring files between resources
Security

Has to cross administrative domains.
Need agreed mechanisms and standards.
Focus on Internet security mechanisms, modified to handle
the special needs of Grid computing.
Security
• Distributed resources must be protected from unauthorized access.
• GSI (Grid Security Infrastructure): Globus components for creating
a security envelope.
• Requires each user to be authenticated (their identity proved).
• Uses public key cryptography (the basis of Internet security).
• Each user must possess a (digital) certificate, signed by a trusted
certificate authority.
• Users will also need to be able to give their authority to Grid
components to act on their behalf.
• Users generally will also need accounts on the resources they
intend to use (authorization).
Resource Discovery

Still primitive and in research, but the ideal is to be able to
submit a job and have the system find the best grid resources
for that job across the whole grid.
Resource Discovery
The basic Globus component is called MDS (Monitoring
and Discovery System).
Users might access MDS to discover the status of compute
resources. In practice, users often know what resources are
there but not their dynamic load.
MDS might also be used by other Grid components
such as schedulers.
Executing a Job
Next, the user typically would want to submit a job.

The basic Globus component for running a job is GRAM
(Globus or Grid Resource Allocation Management).
Command-line interface
Grid computing environments are mostly Linux-based and were originally
accessed through a command line.
Once you have established your security credentials, to run a
simple job you might issue the GRAM command:

globusrun-ws -submit -c prog1

where prog1 is the executable of the job.

The executable needs to be present on the compute resource that is to
execute it.
The above command does not specify a compute resource, and hence the
computer executing the globusrun-ws command will execute prog1.
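For scripted submissions, the same command could be assembled programmatically. The Python sketch below is a hedged illustration: `gram_submit_command` is a hypothetical helper, and it uses only the flags that appear in the command above (-submit and -c). It builds the argument list without running anything.

```python
import shlex


def gram_submit_command(executable):
    """Build the argument list for the GRAM submission shown above.

    Only the flags that appear in the slide (-submit, -c) are used.
    """
    if not executable:
        raise ValueError("an executable name is required")
    return ["globusrun-ws", "-submit", "-c", executable]


argv = gram_submit_command("prog1")
print(shlex.join(argv))
# On a host with the Globus Toolkit installed and valid credentials,
# argv could be handed to subprocess.run(argv, check=True) to submit the job.
```

Keeping the command as a list rather than a single string avoids shell quoting problems if the executable name contains spaces.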
Executing a Job (continued)

It may be necessary beforehand to transfer files to resources
and afterwards to transfer files to other locations, including
back to the user.

The user might use the data management component
called GridFTP for that.
GridFTP command to transfer files
globus-url-copy \
gsiftp://www.coit-grid02.uncc.edu/~abw/prog1out \
file:///home/abw/

The first argument is the source location and the second argument is
the destination location.

In the above case, the file
www.coit-grid02.uncc.edu/~abw/prog1out
is transferred to
/home/abw/
on the local computer.
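To make the roles of the two arguments concrete, the sketch below splits each URL into scheme, host, and path using Python's standard `urllib.parse`. The helper name `describe_transfer` is hypothetical, and the code only inspects the URLs; it performs no transfer.

```python
from urllib.parse import urlsplit


def describe_transfer(source_url, dest_url):
    """Split the two globus-url-copy arguments into scheme, host, and path."""
    parts = {}
    for role, url in (("source", source_url), ("destination", dest_url)):
        u = urlsplit(url)
        parts[role] = {"scheme": u.scheme, "host": u.netloc, "path": u.path}
    return parts


info = describe_transfer(
    "gsiftp://www.coit-grid02.uncc.edu/~abw/prog1out",
    "file:///home/abw/",
)
for role, fields in info.items():
    print(role, fields)
```

The gsiftp scheme names a remote GridFTP server, while the file scheme has an empty host and refers to a path on the local machine.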
User employing Globus services and facilities

Fig. 1.6


Grid Portal
The command-line interface is a very primitive way of interacting with
Grid resources. A web-based interface called a Grid portal is more
desirable. The UNC–Charlotte Grid course portal is based upon the
GridSphere Grid portal toolkit.

Fig. 1.7
Before users can log on, they need a user name and
password for the portal.
They must have user “credentials” and accounts on
the resources they wish to access.
In the UNC–Charlotte course portal, the PURSe (Portal-based
User Registration Service) portlet is used to
facilitate setup procedures.
It is reached by selecting the “Register” tab.
The user enters the required information (name, email
address, institution, etc.), which is forwarded to the Grid
system administrator to set up accounts and
credentials.
PURSe registration portlet

Fig. 1.8
Registration activities

Fig. 1.9


Once logged into the Grid portal, the user will see a number
of tabs across the top, which enable the user to perform
many basic tasks.
Grid information tab
Proxies
To use many services, you are required to have a
proxy certificate (a proxy).
Proxies are part of the Grid security infrastructure,
discussed later in the course.
A proxy is an electronic document that enables
resources to be accessed on the user’s behalf.
It is very convenient to use a credential management
service called MyProxy to hold proxies.
Usually, GridSphere automatically obtains a proxy
from the MyProxy server for you when you log in.
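The idea of a proxy as a short-lived credential that acts on the user's behalf can be modeled in a few lines of Python. This is a toy model only: no cryptography is performed, `ProxyCredential` and `issue_proxy` are hypothetical names, and the 12-hour lifetime is an assumption picked for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProxyCredential:
    """Toy stand-in for a proxy certificate: who it speaks for and until when."""
    subject: str
    issued_by: str
    expires_at: datetime

    def is_valid(self, now):
        return now < self.expires_at


def issue_proxy(user, now, lifetime_hours=12):
    """Create a short-lived proxy that lets services act on the user's behalf."""
    return ProxyCredential(
        subject=f"{user}/proxy",
        issued_by=user,
        expires_at=now + timedelta(hours=lifetime_hours),
    )


start = datetime(2009, 8, 24, 9, 0)
proxy = issue_proxy("abw", start)
print(proxy.is_valid(start + timedelta(hours=1)))
print(proxy.is_valid(start + timedelta(hours=13)))
```

A limited lifetime bounds the damage if a proxy is exposed, which is one reason a service that holds and reissues proxies is convenient.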

Proxy management tab

File management tab

Batch job submission tab
CONCLUSION
• Grid computing introduces a new concept to IT
infrastructures because it supports distributed
computing over a network of heterogeneous
resources and is enabled by open standards.
