Sem A Tic Microsoft
Digital Repositories
Fabrizio Gagliardi
EMEA & LATAM Director
Technical Computing
MSR External Research
Microsoft Corporation
Microsoft Research’s Commitment to Science
• Advancement of Science
• Global Collaboration
• Technology Excellence
• Interoperability
myGrid
Goals
• A platform for building services and tools for research output repositories
• Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Relationships between stored entities
• Enable a tools and services ecosystem for “research output” repositories
• Utilizing OAI-ORE, SWORD, and other community protocols
• In development, deployment within MSR in early Q4
• Beta release to the community in late Q4
• Built on SQL Server 2008 + Entity Framework
• Using WPF and Silverlight for UI
[Diagram: "Research output repository platform" at the center, surrounded by UIs, Desktop, Search, Execution, Tools, Interop, Syndication]
Research Output Repository Platform
Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Become the “research output” repository for MSR (RMCr project)
– Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open source route)
• Build an easy-to-install collection of basic services and tools
Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions
[Diagram: architecture stack, top to bottom]
Services/tools
Microsoft.Famulus.Framework
Microsoft.Famulus.Core (Based on the Entity Framework Model + extensions)
SQL Server 2008, MS data storage technologies, Entity Framework runtime
Research Output Repository Platform
Triple stores
- Evolution friendly
- Poor performance
- No need to model everything in advance
- Semantic interpretation at the application level
Relational schema
- Evolution not so easy
- Great opportunities for optimization
- Model everything in advance
pub1.Cites.Add(pub2);
pub1.Authors.Add(tony);
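The two calls above show typed relationships between stored entities. As a rough illustration only, the following Python sketch mirrors them with a small typed model and then flattens the same facts into triples, which is the trade-off the triple store versus relational schema comparison describes. The class names, field names, and the `to_triples` helper are assumptions made for this sketch and are not the platform's actual API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Person:
    """Hypothetical entity type; not taken from the platform's real model."""
    name: str


@dataclass(eq=False)
class Publication:
    """Hypothetical entity type with typed relationship collections."""
    title: str
    authors: list = field(default_factory=list)
    cites: list = field(default_factory=list)


def to_triples(pub):
    """Flatten one publication's relationships into (subject, predicate, object) triples."""
    triples = [(pub.title, "authored by", author.name) for author in pub.authors]
    triples += [(pub.title, "cites", cited.title) for cited in pub.cites]
    return triples


tony = Person("tony")
pub1 = Publication("pub1")
pub2 = Publication("pub2")

# Python analogue of pub1.Cites.Add(pub2) and pub1.Authors.Add(tony)
pub1.cites.append(pub2)
pub1.authors.append(tony)

triples = to_triples(pub1)
for triple in triples:
    print(triple)
```

The typed model has to be defined in advance but makes relationships explicit and checkable; the triple form accepts any new predicate without a schema change, leaving interpretation to the application.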
[Diagram: entity-relationship graph. Nodes: PDF file; PowerPoint presentation; Lecture on 2/19/2008; tony; Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy. Edge labels: is representation of, contains, authored by, organized by, presented by]
An Ecosystem of Research Repositories
Support of harvesting & federation
to/from Institutional Repositories
- arXiv.org
- DSpace
- ePrints
- Fedora
- etc.
A smart cyberinfrastructure
• Collective intelligence
– If last.fm can recommend what song to broadcast to me
based on what my friends are listening to, why cannot the
cyberinfrastructure of the future recommend articles of
potential interest based on what the experts in the field
that I respect are reading?
– Examples are already emerging, but the process is manual
(Connotea, BioMedCentral Faculty of 1000 ...)
• Automatic correlation of scientific data
• Smart composition of services and functionality
• Cloud computing to aggregate, process, analyze and
visualize data
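The collective-intelligence idea above (recommend articles based on what respected experts are reading) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any existing service: the `recommend` function, the expert names, and the article identifiers are all hypothetical.

```python
from collections import Counter


def recommend(reading_lists, trusted, already_read, top_n=3):
    """Rank unread articles by how many trusted experts are reading them."""
    counts = Counter()
    for expert in trusted:
        # set() so an article listed twice by one expert counts once
        for article in set(reading_lists.get(expert, ())):
            if article not in already_read:
                counts[article] += 1
    # Most-read first; ties broken alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical data: who is reading what
reading_lists = {
    "expert_a": ["paper-1", "paper-2"],
    "expert_b": ["paper-2", "paper-3"],
    "expert_c": ["paper-4"],
}

suggestions = recommend(
    reading_lists,
    trusted=["expert_a", "expert_b"],
    already_read={"paper-1"},
)
print(suggestions)
```

Only experts the reader has chosen to trust contribute votes, so `expert_c`'s reading list is ignored here.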
A world where all data is linked…
• Data/information is interconnected through
machine-interpretable information (e.g.
paper X is about star Y)
• Social networks are a special case
of ‘data networks’
• Important/key considerations
– Formats or “well-known” representations
of data/information
– Pervasive access protocols are key (e.g.
HTTP)
– Data/information is uniquely identified
(e.g. URIs)
– Links/associations between
data/information
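The considerations above (unique identifiers, well-known representations, links between items) can be illustrated with a tiny in-memory link store. This is only a sketch: every URI below is a made-up `example.org` placeholder, and the `objects` and `subjects` helpers are hypothetical names, not part of any standard library or protocol.

```python
# Hypothetical URIs; each item of data/information is uniquely identified
PAPER_X = "http://example.org/paper/X"
PAPER_Z = "http://example.org/paper/Z"
STAR_Y = "http://example.org/star/Y"
ABOUT = "http://example.org/vocab/about"
CITES = "http://example.org/vocab/cites"

# Links/associations stored as (subject, predicate, object)
links = [
    (PAPER_X, ABOUT, STAR_Y),   # "paper X is about star Y"
    (PAPER_Z, ABOUT, STAR_Y),
    (PAPER_X, CITES, PAPER_Z),
]


def objects(subject, predicate):
    """Follow links outward: what does `subject` point to via `predicate`?"""
    return [o for s, p, o in links if s == subject and p == predicate]


def subjects(predicate, obj):
    """Follow links backward: what points to `obj` via `predicate`?"""
    return [s for s, p, o in links if p == predicate and o == obj]


papers_about_y = subjects(ABOUT, STAR_Y)
print(papers_about_y)
print(objects(PAPER_X, CITES))
```

Because the links are machine-interpretable, a question such as "which papers are about star Y?" becomes a simple lookup rather than a text search.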
[Diagram: services: reference management, instant messaging, project management, identity, mail, notification, document store, storage/data services, knowledge management, compute services, knowledge discovery, virtualization]
Added slides
eScience
Emergence of a New Research Paradigm?
• Thousand years ago – Experimental Science
– Description of natural phenomena
• Last few hundred years – Theoretical Science
– Newton’s Laws, Maxwell’s Equations…
• Last few decades – Computational Science
– Simulation of complex phenomena
• Today – eScience or Data-centric Science
– Unify theory, experiment, and simulation
[Equation shown on slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]
– Using data exploration and data mining
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
– Scientists overwhelmed with data
– Computer Science and IT companies
have technologies that will help
http://ecrystals.chem.soton.ac.uk
Thanks to Jeremy Frey
Data and services can be easily composed
Taverna Workflow
Compose services from the Web
SensorMap
Functionality: Map navigation
Data: sensor-generated temperature, video camera feed,
traffic feeds, etc.
Data is easily accessible
With thanks to
Catharine van Ingen
Data is easily shareable
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
For example, Google and Microsoft both have copies of the Web
for indexing purposes
Tomorrow…
Computers will still be great tools for storing, computing, managing, and indexing huge amounts of data
We would like computers to also help with the automatic acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, and inference of the world’s information
Semantic Computing
What is Semantic Computing?
Current technologies