Copyright: Attribution Non-Commercial (BY-NC)

Data Mining and Grid

Ian Foster

Computation Institute
Argonne National Lab & University of Chicago
http://ianfoster.typepad.com
www.ci.uchicago.edu www.ci.anl.gov
Data Mining

Grid

In the Next 50 Years, We Must …
● Increase energy production by a factor of 5, while reducing greenhouse gas (GHG) emissions by a factor of 2 or more

● Mitigate and adapt to climate change

● Address increasingly drug-resistant diseases

● Provide meaningful livelihoods for 9B people

⇒ Innovation

Innovation as a Systems Problem
● Quasi-ubiquitous Internet …
● … connects many potential innovators
◆ Millions of scientists, billions of people
● Who need to leverage
◆ Enormous data of tremendous complexity
◆ Immensely powerful computing
◆ Experimental apparatus of great power

⇒ We must address problem solving as a distributed, end-to-end systems problem
Grid: A Unifying Concept & Technology
Grid enables the federation of resources
• Distributed computers, storage, data, people, …
• Networks provide connectivity
• Software & standards provide the “glue”
• Infrastructure services facilitate operation

[Figure: research applications linked through the grid to data acquisition, advanced visualization, analysis, computational resources, imaging instruments, and large-scale databases. Credit: Mark Ellisman]


Grid Infrastructure
● Massive computing and storage
● Service interfaces facilitate access and use

[Figures: TeraGrid; Open Science Grid]
Software and Standards
[Diagram: user tools (Weka Tool, 4WS Tool, Angle; Domenico Talia, Bob Grossman) and user services in hosting environments rely on Globus components (GRAM, GridFTP, file transfer, registry, DAI) that provide uniform interfaces, security mechanisms, Web service transport, and monitoring over databases, computers, file systems, and specialized resources (IBM systems shown)]
Globus Downloads
[Maps: downloads in the last 24 hours; downloads in the last month]
First Generation Grids: On-Demand/Batch Computing
Focus on aggregation of many resources for
massively (data-)parallel applications

EGEE

Globus
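The first-generation pattern, aggregating many resources to run one large data-parallel job as many independent tasks, can be sketched in miniature. This is a minimal sketch, assuming a local thread pool stands in for grid resources and a toy shared-k-mer count stands in for a real sequence search; the names `kmers`, `score`, `chunks`, `run_chunk`, and `run_bag_of_tasks` are hypothetical and are not Globus, EGEE, or BLAST APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple


def kmers(seq: str, k: int = 3) -> set:
    """Distinct substrings of length k (a toy stand-in for a real sequence search)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score(query: str, target: str, k: int = 3) -> int:
    """Number of distinct k-mers shared by query and target."""
    return len(kmers(query, k) & kmers(target, k))


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split the input into fixed-size batches, one per independent task."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def run_chunk(query: str, chunk: Sequence[str]) -> List[Tuple[str, int]]:
    # Each task is independent: it never communicates with other tasks.
    return [(target, score(query, target)) for target in chunk]


def run_bag_of_tasks(query: str, targets: Sequence[str],
                     chunk_size: int = 2, workers: int = 4) -> List[Tuple[str, int]]:
    """Fan tasks out to a worker pool (standing in for the grid) and gather results."""
    results: List[Tuple[str, int]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_chunk, query, c) for c in chunks(targets, chunk_size)]
        for f in futures:  # iterating in submission order preserves input order
            results.extend(f.result())
    return results


if __name__ == "__main__":
    print(run_bag_of_tasks("ACGTACGT", ["ACGTTT", "GGGGGG", "TACGTA"]))
    # [('ACGTTT', 2), ('GGGGGG', 0), ('TACGTA', 4)]
```

On a real grid each chunk would be a separately scheduled batch job; the chunk size trades per-job scheduling overhead against load balance.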
Applications: High Energy Physics

Globus

Integrating Data and Computing, on Demand
Public PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences

Back-office analysis on Grid: millions of BLAST, BLOCKS, etc., runs on OSG and TeraGrid

Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
Second Generation Grids: Service-Oriented Science
● Empower many more users by enabling
on-demand access to services
● Grids become an enabling technology for
service-oriented science (or business)
◆ Grid infrastructures host services
◆ Grid technologies used to build services

[Figure: Science Gateways]

“Service-Oriented Science”, Science, 2005


Service-Oriented Science
People create services (data or functions) …
which I discover (& decide whether to use) …
& compose to create a new function ...
& then publish as a new service.

⇒ I find “someone else” to host services, so I don’t have to become an expert in operating services & computers!

⇒ I hope that this “someone else” can manage security, reliability, scalability, …
“Service-Oriented Science”, Science, 2005
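The discover, compose, and publish cycle described here can be illustrated in a few lines. This is a minimal in-process sketch, assuming a hypothetical dictionary-backed `Registry` and plain Python callables in place of remotely hosted services; `Registry`, `compose`, and the service names are illustrative only, and the sketch does not model hosting, security, or reliability, which the slide hands off to “someone else”.

```python
from typing import Any, Callable, Dict, List

Service = Callable[[Any], Any]


class Registry:
    """Hypothetical in-memory registry; a stand-in for a real grid service registry."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def publish(self, name: str, service: Service) -> None:
        if name in self._services:
            raise ValueError(f"service {name!r} is already published")
        self._services[name] = service

    def find(self, keyword: str) -> List[str]:
        """Discovery: list published service names containing a keyword."""
        return sorted(n for n in self._services if keyword in n)

    def lookup(self, name: str) -> Service:
        return self._services[name]


def compose(*services: Service) -> Service:
    """Chain services so each one's output becomes the next one's input."""
    def pipeline(value: Any) -> Any:
        for service in services:
            value = service(value)
        return value
    return pipeline


registry = Registry()
# Hypothetical data service: returns fixed readings for any station id.
registry.publish("station_temps", lambda station_id: [12.0, 15.0, 9.0])
# Hypothetical function service.
registry.publish("mean", lambda xs: sum(xs) / len(xs))

# Discover, compose, then publish the composition as a new service.
print(registry.find("temp"))  # ['station_temps']
mean_temp = compose(registry.lookup("station_temps"), registry.lookup("mean"))
registry.publish("station_mean_temp", mean_temp)
print(registry.lookup("station_mean_temp")("station-42"))  # 12.0
```

Once published, the composed service is discoverable like any other, so someone else can build on it in turn.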
Earth System Grid
● On-demand access to
climate simulation data
◆ Multiple archives
◆ Interactive query
◆ Per-collection control
◆ Server-side processing
● Major scientific impact
◆ >5000 users
◆ >200 TB downloaded
◆ >300 scientific papers
Globus
www.earthsystemgrid.org (DOE OASCR)
Cancer Biomedical Informatics Grid (caBIG)
caBIG: sharing of infrastructure, applications, and data.

Data Integration!

Globus

caBIG Under the Covers
[Diagram: a grid portal, grid-enabled clients, and analytical services and tools (Tools 1 to 4) at research centers and NCICB reach caArray, gene, protein, image, and microarray databases through a Grid Data Service and the Grid Services Infrastructure (metadata, registry, query, invocation, security, etc.)]
Globus
LIGO Data Grid
LIGO Gravitational Wave Observatory

[Map: LIGO Data Grid sites, including Birmingham, Cardiff, and AEI/Golm]

Globus
Replicating >1 Terabyte/day to 8 sites
>150 million replicas so far
MTBF = 1 month
www.globus.org/solutions
The Angle Project

Globus
Social Informatics Data Grid

Globus
Bennett Bertenthal et al., www.sidgrid.org
A Few Example Research Themes
● Service discovery, composition, provisioning
◆ SOA, virtualization, cloud computing, …
● Large-scale (distributed) computation
◆ E.g., Swift, Kepler, Taverna
● Provenance
◆ E.g., “Provenance Challenge”
● “Virtual organizations”
◆ E.g., attribute-based authorization, trust
● Integration of physical systems
◆ Optimization of end-to-end workflows
Security Services for Virtual Organization Policy
● Attribute Authority (ATA)
◆ Issue signed attribute assertions (incl. identity,
delegation & mapping)
● Authorization Authority (AZA)
◆ Decisions based on assertions & policy

[Diagram: ATAs issue a resource-admin attribute, a VO-member attribute, a VO delegation assertion (“User B can use Service A”), and a mapping from VO-A to VO-B attributes; an AZA evaluates these for User A and User B accessing VO A and VO B services]
Globus
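The split between an Attribute Authority that signs assertions and an Authorization Authority that decides from assertions plus policy can be sketched as follows. This is a minimal sketch under loud assumptions: all class, key, attribute, and service names are hypothetical, and HMAC with a shared key stands in for signatures only to keep the example standard-library-only. With a shared key the AZA could also mint assertions; real VO deployments typically use public-key signatures (e.g., X.509 certificates or SAML assertions) so that verifiers cannot forge.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Assertion:
    issuer: str
    subject: str
    attribute: str
    signature: str


def _sign(key: bytes, issuer: str, subject: str, attribute: str) -> str:
    # Encode fields as a JSON list so field boundaries cannot be confused.
    message = json.dumps([issuer, subject, attribute]).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class AttributeAuthority:
    """Hypothetical ATA: issues signed (subject, attribute) assertions."""

    def __init__(self, name: str, key: bytes) -> None:
        self.name = name
        self._key = key

    def issue(self, subject: str, attribute: str) -> Assertion:
        return Assertion(self.name, subject, attribute,
                         _sign(self._key, self.name, subject, attribute))


class AuthorizationAuthority:
    """Hypothetical AZA: decides from verified assertions plus local policy."""

    def __init__(self, trusted: Dict[str, bytes], policy: Dict[str, str],
                 mapping: Optional[Dict[str, str]] = None) -> None:
        self.trusted = trusted        # issuer name -> verification key
        self.policy = policy          # service -> required attribute
        self.mapping = mapping or {}  # foreign VO attribute -> local attribute

    def verify(self, a: Assertion) -> bool:
        key = self.trusted.get(a.issuer)
        if key is None:
            return False  # unknown issuer: ignore the assertion
        expected = _sign(key, a.issuer, a.subject, a.attribute)
        return hmac.compare_digest(expected, a.signature)

    def decide(self, subject: str, service: str,
               assertions: Iterable[Assertion]) -> bool:
        required = self.policy.get(service)
        if required is None:
            return False  # default deny for services without a policy
        for a in assertions:
            if a.subject == subject and self.verify(a):
                if self.mapping.get(a.attribute, a.attribute) == required:
                    return True
        return False


ata = AttributeAuthority("VO-A ATA", b"demo-key-a")
aza = AuthorizationAuthority(
    trusted={"VO-A ATA": b"demo-key-a"},
    policy={"VO B Service": "VO-B:member"},
    mapping={"VO-A:member": "VO-B:member"},  # VO-A attribute mapped into VO-B
)
membership = ata.issue("User B", "VO-A:member")
print(aza.decide("User B", "VO B Service", [membership]))  # True
forged = replace(membership, subject="User A")              # signature no longer matches
print(aza.decide("User A", "VO B Service", [forged]))       # False
```

Keeping issuance and decision in separate authorities lets each VO manage its own attributes while resource owners keep control of policy.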
Swift (www.ci.uchicago.edu/swift)
An Integrated View of Modeling, Simulation, Experiment, & Informatics
[Diagram: two pipelines, problem specification → modeling and simulation → analysis & visualization, and experimental design → high-throughput experiments → analysis & visualization, connected through bioinformatics analysis tools and integrated biological databases]
Robot Scientist
“The robot scientist project aims to develop a computer system capable of originating its own experiments, physically doing them, interpreting the results, & then repeating the cycle.”
[Image: Biomek 200]
[Diagram: background knowledge and machine learning yield consistent hypotheses; experiment selection picks experiment(s) for the robot to run; analysis of the results feeds back into the cycle, ending in a final theory]
Stephen Muggleton, Ross King et al., UK
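The closed loop quoted here (originate a hypothesis, choose and run an experiment, interpret the result, repeat) can be illustrated with a simple hypothesis-elimination sketch. It assumes a finite set of candidate hypotheses that each predict a yes/no outcome, a greedy rule that picks the experiment minimizing the worst-case number of surviving hypotheses, and a simulated lab in place of the robot. This is a swapped-in textbook technique, not the Robot Scientist project's actual machine-learning method; `select_experiment`, `discovery_loop`, and the dose-threshold hypotheses are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Callable[[int], bool]  # predicts the yes/no outcome of experiment e


def select_experiment(hyps: Dict[str, Hypothesis],
                      experiments: Sequence[int]) -> Tuple[int, int]:
    """Pick the experiment that minimizes the worst-case number of surviving hypotheses."""
    def worst_case(e: int) -> int:
        yes = sum(1 for h in hyps.values() if h(e))
        return max(yes, len(hyps) - yes)
    best = min(experiments, key=worst_case)
    return best, worst_case(best)


def discovery_loop(hyps: Dict[str, Hypothesis], experiments: Sequence[int],
                   run_experiment: Callable[[int], bool],
                   max_rounds: int = 20) -> Tuple[List[str], List[Tuple[int, bool]]]:
    """Select an experiment, run it, discard hypotheses inconsistent with the result, repeat."""
    consistent = dict(hyps)
    log: List[Tuple[int, bool]] = []
    for _ in range(max_rounds):
        if len(consistent) <= 1:
            break
        e, worst = select_experiment(consistent, experiments)
        if worst == len(consistent):
            break  # no remaining experiment can distinguish the hypotheses
        outcome = run_experiment(e)  # the "robot" step
        log.append((e, outcome))
        consistent = {n: h for n, h in consistent.items() if h(e) == outcome}
    return sorted(consistent), log


# Hypothetical example: each hypothesis says "the effect appears at dose >= t".
hypotheses = {f"t={t}": (lambda e, t=t: e >= t) for t in range(1, 9)}
hidden_truth = 5  # known only to the simulated laboratory
survivors, log = discovery_loop(hypotheses, range(10), lambda e: e >= hidden_truth)
print(survivors)  # ['t=5']
print(log)        # [(4, False), (6, True), (5, True)]
```

The greedy rule roughly halves the candidate set each round here; a real system would also weigh experiment cost and noisy outcomes.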
Team Science meets Data Deluge
