Slides
Introduction
… an intermediate one …
… or this one
… on flickr …
… on Google …
… Dopplr,
… Twine,
… LinkedIn,
A “mashup” example:
was not…
It is that simple…
Of course, the devil is in the details
a common model has to be provided for machines to
describe, query, etc, the data and their connections
the “classification” of the terms can become very complex
for specific knowledge areas: this is where ontologies,
thesauri, etc, enter the game…
In what follows…
We will use a simplistic example to introduce the
main technical concepts
The details will be for later during the course
6   ID                   Auteur
7   ISBN-0-00-651409-X   A12
11  Nom
12  Ghosh, Amitav
13  Besse, Christianne
Is that surprising?
It may look like it but, in fact, it should not be…
What happened via automatic means is done every
day by Web users!
The difference: a bit of extra rigour so that
machines could do this, too
RDF triples
Let us begin to formalize what we did!
we “connected” the data…
but a simple connection is not enough… data should be
named somehow
hence the RDF Triples: a labelled connection between two
resources
(<http://…isbn…6682>, <http://…/original>, <http://…isbn…409X>)
<rdf:Description rdf:about="http://…/isbn/2020386682">
    <f:titre xml:lang="fr">Le palais des mirroirs</f:titre>
    <f:original rdf:resource="http://…/isbn/000651409X"/>
</rdf:Description>
<http://…/isbn/2020386682>
    f:titre "Le palais des mirroirs"@fr ;
    f:original <http://…/isbn/000651409X> .
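The idea of a triple as a labelled connection can be sketched in a few lines of Python. This is only an illustration, not how any RDF library stores data: the example.org URIs and the namespace behind the f: prefix are hypothetical stand-ins for the elided URIs on the slide.

```python
# Sketch only: a triple is a (subject, predicate, object) tuple, a graph is a set of triples.
# The example.org URIs are hypothetical stand-ins for the elided ones on the slide.
BOOK_FR = "http://example.org/isbn/2020386682"
BOOK_EN = "http://example.org/isbn/000651409X"
F = "http://example.org/french#"  # assumed namespace behind the f: prefix

graph = {
    (BOOK_FR, F + "titre", ("Le palais des mirroirs", "fr")),  # literal plus language tag
    (BOOK_FR, F + "original", BOOK_EN),                        # labelled link to a resource
}


def objects(graph, subject, predicate):
    """Return the objects connected to `subject` by the label `predicate`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


print(objects(graph, BOOK_FR, F + "original"))
```

Following the f:original label from the French edition leads to the URI of the English original, which is the "connection" the slides describe.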
“Internal” nodes
Consider the following statement:
“the publisher is a «thing» that has a name and an address”
Until now, nodes were identified with a URI. But…
…what is the URI of «thing»?
<rdf:Description rdf:about="http://…/isbn/000651409X">
    <a:publisher rdf:nodeID="A234"/>
</rdf:Description>
<rdf:Description rdf:nodeID="A234">
    <a:p_name>HarpersCollins</a:p_name>
    <a:city>HarpersCollins</a:city>
</rdf:Description>
<http://…/isbn/2020386682> a:publisher _:A234.
_:A234 a:p_name "HarpersCollins".
Same in Turtle
<http://…/isbn/000651409X> a:publisher [
    a:p_name "HarpersCollins";
    …
].
Jena example
// Model, ModelFactory, Resource, Property, RDFNode, Statement and
// StmtIterator come from Jena's rdf.model package
// create a model
Model model = ModelFactory.createDefaultModel();
Resource subject = model.createResource("URI_of_Subject");
// 'in' refers to the input file (an InputStream); null means no base URI
model.read(in, null);
// list all statements that have the given subject
StmtIterator iter = model.listStatements(subject, null, (RDFNode) null);
while (iter.hasNext()) {
    Statement st = iter.nextStatement();
    Property p = st.getPredicate();
    RDFNode o = st.getObject();
    do_something(p, o);
}
Merge in practice
Environments merge graphs automatically
e.g., in Jena, the Model can load several files
the load merges the new statements automatically
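Why merging is automatic can be seen in a small Python sketch that treats each graph as a set of triples, so that merging is a set union. This is an assumption-laden illustration, not Jena's implementation; the prefixed names are hypothetical, and a real merge must also keep blank nodes from different files apart.

```python
def merge(*graphs):
    """Merge graphs by set union: identical triples collapse into one."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


# Hypothetical data in the spirit of the bookshop example.
english = {
    ("ex:isbn/000651409X", "a:title", "The Glass Palace"),
    ("ex:isbn/000651409X", "a:author", "ex:Ghosh"),
}
french = {
    ("ex:isbn/2020386682", "f:original", "ex:isbn/000651409X"),
    ("ex:isbn/000651409X", "a:author", "ex:Ghosh"),  # also stated in the first graph
}

merged = merge(english, french)
print(len(merged))
```

Because both graphs use the same URI for the book, the statements from the two files end up attached to the same node without any extra work.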
Courtesy of Nigel Wilkinson, Lee Harland, Pfizer Ltd, Melliyal Annamalai, Oracle (SWEO Case Study)
Classes, resources, …
Think of well known traditional ontologies or
taxonomies:
use the term “novel”
“every novel is a fiction”
“«The Glass Palace» is a novel”
etc.
RDFS defines resources and classes:
everything in RDF is a “resource”
“classes” are also resources, but…
…they are also a collection of possible resources (i.e.,
“individuals”)
“fiction”, “novel”, …
<rdf:Description rdf:about="http://…/isbn/000651409X">
    <rdf:type rdf:resource="http://…/bookSchema.rdf#Novel"/>
</rdf:Description>
Inferred properties
(<http://…/isbn/000651409X> rdf:type #Fiction)
If:
    uuu rdfs:subClassOf xxx .
    vvv rdf:type uuu .
Then add:
    vvv rdf:type xxx .
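The rule above can be sketched as a small forward-chaining loop in Python. It is a minimal illustration that assumes triples are plain tuples of strings; the prefixed names :Novel, :Fiction and isbn:000651409X are shorthand chosen here, and a real RDFS reasoner applies many more rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(graph):
    """Apply 'vvv rdf:type uuu' + 'uuu rdfs:subClassOf xxx' => 'vvv rdf:type xxx'
    repeatedly until no new triple appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == SUBCLASS]
        for (v, p, u) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                if sub == u and (v, RDF_TYPE, sup) not in inferred:
                    inferred.add((v, RDF_TYPE, sup))
                    changed = True
    return inferred


graph = {
    (":Novel", SUBCLASS, ":Fiction"),
    ("isbn:000651409X", RDF_TYPE, ":Novel"),
}
result = infer_types(graph)
print(("isbn:000651409X", RDF_TYPE, ":Fiction") in result)
```

The loop repeats so that chains of subclasses (a novel is a fiction, a fiction is a literary work, and so on) are followed all the way up.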
Properties
Property is a special class (rdf:Property)
properties are also resources identified by URI-s
There is also a possibility for a “sub-property”
all resources bound by the “sub” are also bound by the other
Range and domain of properties can be specified
i.e., what type of resources serve as object and subject
In Turtle:
:title
    rdf:type rdf:Property;
    rdfs:domain :Fiction;
    rdfs:range rdfs:Literal.
<http://…/isbn/000651409X> rdf:type :Fiction .
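A domain declaration lets a reasoner conclude the type of any subject that uses the property. The sketch below shows that single rule in Python, under the assumption that the book carries a :title statement; the subject name isbn:000651409X and the title literal are illustrative.

```python
def infer_from_domain(graph):
    """If 'p rdfs:domain C' and 's p o' both hold, add 's rdf:type C'."""
    domains = [(s, o) for (s, p, o) in graph if p == "rdfs:domain"]
    inferred = set(graph)
    for (s, p, o) in graph:
        for (prop, cls) in domains:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred


graph = {
    (":title", "rdf:type", "rdf:Property"),
    (":title", "rdfs:domain", ":Fiction"),
    (":title", "rdfs:range", "rdfs:Literal"),
    ("isbn:000651409X", ":title", "The Glass Palace"),  # assumed statement
}
result = infer_from_domain(graph)
print(("isbn:000651409X", "rdf:type", ":Fiction") in result)
```

Note that the domain is not a constraint that rejects data: it is a source of new statements.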
Literals
Literals may have a data type
floats, integers, booleans, etc, defined in XML Schemas
full XML fragments
(Natural) language can also be specified
<http://…/isbn/000651409X>
    :page_number "543"^^xsd:integer ;
    :publ_date "2000"^^xsd:gYear ;
    :price "6.99"^^xsd:float .
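How an application might turn such typed literals into native values can be sketched as follows. This is a simplified illustration: the converter table covers only a few XML Schema datatypes, and the parsing ignores escapes and language tags that a real Turtle parser handles.

```python
# Assumed, deliberately small mapping from datatype names to Python converters.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:float": float,
    "xsd:boolean": lambda text: text in ("true", "1"),
}


def parse_literal(text):
    """Split '"lexical"^^datatype' and convert the lexical form when possible."""
    lexical, sep, datatype = text.rpartition("^^")
    if not sep:  # no datatype marker: plain literal
        return text.strip('"'), None
    value = lexical.strip('"')
    convert = CONVERTERS.get(datatype, str)  # unknown datatypes stay strings
    return convert(value), datatype


print(parse_literal('"543"^^xsd:integer'))
print(parse_literal('"2000"^^xsd:gYear'))
```

Datatypes without a converter, such as xsd:gYear here, keep their lexical form, which is a safe default when the application does not know the type.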
Michael Grove, Clark & Parsia, LLC, and Andrew Schain, NASA, (SWEO Case Study)
Simple approach
Write RDF/XML or Turtle “manually”
In some cases that is necessary, but it really does
not scale…
Extract RDF
Use intelligent “scrapers” or “wrappers” to extract a
structure (hence RDF) from Web pages or XML
files…
… and then generate RDF automatically (e.g., via
an XSLT script)
GRDDL
The transformation itself has to be provided for
each set of conventions
A more general syntax is defined for XML formats
in general (e.g., via the namespace document)
a method to get data in other formats to RDF (e.g., XBRL)
RDFa
RDFa extends (X)HTML a bit by:
defining general attributes to add metadata to any element
providing an almost complete “serialization” of RDF in
XHTML
It is a bit like the microformats/GRDDL approach
but fully generic
RDFa example
For example:
<div about="http://uri.to.newsitem">
    <span property="dc:date">March 23, 2004</span>
    <span property="dc:title">Rollers hit casino for £1.3m</span>
    By <span property="dc:creator">Steve Bird</span>. See
    <a href="http://www.a.b.c/d.avi" rel="dcmtype:MovingImage">
    also video footage</a>…
</div>
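To give a feel for how triples come out of such markup, here is a rough Python sketch built on the standard html.parser module. It is not an RDFa processor: it ignores prefix declarations, nesting of subjects, and most of the RDFa rules, and only handles the attributes used in the example above.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, predicate, object) from about/property/rel/href attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent 'about' attribute
        self.current = None   # 'property' of the element whose text is awaited
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.current = attrs["property"]
        if "rel" in attrs and "href" in attrs:
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))

    def handle_data(self, data):
        if self.current is not None:
            self.triples.append((self.subject, self.current, data.strip()))
            self.current = None

    def handle_endtag(self, tag):
        self.current = None  # an empty element yields no literal


html = """<div about="http://uri.to.newsitem">
<span property="dc:date">March 23, 2004</span>
<span property="dc:title">Rollers hit casino for £1.3m</span>
By <span property="dc:creator">Steve Bird</span>. See
<a href="http://www.a.b.c/d.avi" rel="dcmtype:MovingImage">also video footage</a>
</div>"""

parser = RDFaSketch()
parser.feed(html)
for triple in parser.triples:
    print(triple)
```

The same page therefore serves both the human reader (the visible text) and the machine (the extracted statements).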
London Gazette
Linking Data
dbpedia:Amsterdam
    dbterm:officialName "Amsterdam" ;
    dbterm:longd "4" ;
    dbterm:longm "53" ;
    dbterm:longs "32" ;
    ...
    dbterm:leaderTitle "Mayor" ;
    dbterm:leaderName dbpedia:Job_Cohen ;
    ...
    dbterm:areaTotalKm "219" ;
    ...
dbpedia:ABN_AMRO
    dbterm:location dbpedia:Amsterdam ;
    ...
<http://sws.geonames.org/2759793>
    owl:sameAs <http://dbpedia.org/resource/Amsterdam> ;
    wgs84_pos:lat "52.3666667" ;
    wgs84_pos:long "4.8833333" ;
    geo:inCountry <http://www.geonames.org/countries/#NL> ;
    ...
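What an application gains from the owl:sameAs link can be sketched in Python: once the two URIs are known to denote the same thing, statements from both datasets can be gathered under one name. The helper names are hypothetical and the grouping is a minimal union-find, not a full OWL treatment of identity.

```python
def same_as_lookup(graph):
    """Return a function mapping every URI to one representative of its sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for (s, p, o) in graph:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
    return find


def smush(graph):
    """Rewrite subjects and objects to their representatives, dropping the sameAs links."""
    find = same_as_lookup(graph)
    return {(find(s), p, find(o)) for (s, p, o) in graph if p != "owl:sameAs"}


GEO = "http://sws.geonames.org/2759793"
DBP = "http://dbpedia.org/resource/Amsterdam"
graph = {
    (GEO, "owl:sameAs", DBP),
    (GEO, "wgs84_pos:lat", "52.3666667"),
    (DBP, "dbterm:officialName", "Amsterdam"),
}
merged = smush(graph)
for triple in sorted(merged):
    print(triple)
```

After this step the latitude from Geonames and the official name from DBpedia describe a single node.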
Returns:
[[<..49X>,33,£], [<..49X>,50,€], [<..6682>,60,€],
[<..6682>,78,$]]
Pattern constraints
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x. ?x rdf:value ?price. ?x p:currency ?currency.
        FILTER(?currency = € ) }
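How the graph pattern and the filter select rows can be imitated in plain Python. The sketch below is not a SPARQL engine: the intermediate price nodes (_:p1 and so on) are assumed, and the data is shaped so that the unfiltered query gives the four rows listed under "Returns" above.

```python
graph = {
    ("<..49X>", "a:price", "_:p1"), ("_:p1", "rdf:value", 33), ("_:p1", "p:currency", "£"),
    ("<..49X>", "a:price", "_:p2"), ("_:p2", "rdf:value", 50), ("_:p2", "p:currency", "€"),
    ("<..6682>", "a:price", "_:p3"), ("_:p3", "rdf:value", 60), ("_:p3", "p:currency", "€"),
    ("<..6682>", "a:price", "_:p4"), ("_:p4", "rdf:value", 78), ("_:p4", "p:currency", "$"),
}


def select_prices(graph, currency=None):
    """Match ?isbn a:price ?x . ?x rdf:value ?price . ?x p:currency ?currency,
    optionally keeping only rows whose ?currency equals `currency`."""
    rows = []
    for (isbn, p1, x) in graph:
        if p1 != "a:price":
            continue
        for (s2, p2, price) in graph:
            if s2 != x or p2 != "rdf:value":
                continue
            for (s3, p3, cur) in graph:
                if s3 != x or p3 != "p:currency":
                    continue
                if currency is None or cur == currency:
                    rows.append((isbn, price, cur))  # ?x is matched but not returned
    return sorted(rows)


print(select_prices(graph))
print(select_prices(graph, currency="€"))
```

The variable ?x ties the three patterns together but does not appear in the result, which is the point of the "not ?x!" remark.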
Ontologies
(OWL)
Ontologies
RDFS is useful, but does not solve all possible
requirements
Complex applications may want more possibilities:
characterization of properties
identification of objects with different URI-s
disjointness or equivalence of classes
construct classes, not only name them
can a program reason about some terms? E.g.:
“if «Person» resources «A» and «B» have the same
«foaf:email» property, then «A» and «B» are identical”
etc.
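The foaf:email example above can be sketched as a small rule in Python. This is an illustration under stated assumptions: the resources ex:A, ex:B, ex:C and the mailbox values are invented, and a real OWL reasoner would derive this from foaf:email being declared inverse functional rather than from hand-written code.

```python
from collections import defaultdict
from itertools import combinations


def identify_by_email(graph):
    """Person resources sharing a foaf:email value are declared owl:sameAs."""
    persons = {s for (s, p, o) in graph if p == "rdf:type" and o == "foaf:Person"}
    holders = defaultdict(set)
    for (s, p, o) in graph:
        if p == "foaf:email" and s in persons:
            holders[o].add(s)
    return {
        (a, "owl:sameAs", b)
        for group in holders.values()
        for a, b in combinations(sorted(group), 2)
    }


graph = {
    ("ex:A", "rdf:type", "foaf:Person"),
    ("ex:B", "rdf:type", "foaf:Person"),
    ("ex:C", "rdf:type", "foaf:Person"),
    ("ex:A", "foaf:email", "mailto:a@example.org"),
    ("ex:B", "foaf:email", "mailto:a@example.org"),
    ("ex:C", "foaf:email", "mailto:c@example.org"),
}
print(identify_by_email(graph))
```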
Ontologies (cont.)
The term ontologies is used in this respect:
“defines the concepts and relationships used to describe
and represent an area of knowledge”
OWL is complex…
OWL is a large set of additional terms
We will not cover the whole thing here…
Term equivalences
For classes:
owl:equivalentClass: two classes have the same
individuals
owl:disjointWith: no individuals in common
For properties:
owl:equivalentProperty
remember the a:author vs. f:auteur
owl:propertyDisjointWith
For individuals:
owl:sameAs: two URIs refer to the same concept
(“individual”)
owl:differentFrom: negation of owl:sameAs
Connecting to French…
<http://dbpedia.org/resource/Amsterdam>
    owl:sameAs <http://sws.geonames.org/2759793>;
Property characterization
In OWL, one can characterize the behaviour of
properties (symmetric, transitive, functional, inverse
functional…)
One property may be the inverse of another
OWL also separates data and object properties
“datatype property” means that its range consists of typed literals
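What two of these characterizations mean for inference can be sketched in Python. The properties ex:adjacentTo (symmetric) and ex:partOf (transitive) are hypothetical examples, and the function is a naive fixed-point loop rather than an OWL reasoner.

```python
def apply_characteristics(graph, symmetric=(), transitive=()):
    """Add the triples implied by symmetric and transitive properties."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in inferred:
            if p in symmetric:
                new.add((o, p, s))  # s p o  =>  o p s
            if p in transitive:
                for (s2, p2, o2) in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # s p o, o p o2  =>  s p o2
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


graph = {
    ("ex:a", "ex:partOf", "ex:b"),
    ("ex:b", "ex:partOf", "ex:c"),
    ("ex:x", "ex:adjacentTo", "ex:y"),
}
result = apply_characteristics(
    graph, symmetric={"ex:adjacentTo"}, transitive={"ex:partOf"}
)
for triple in sorted(result):
    print(triple)
```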
Classes in OWL
In RDFS, you can subclass existing classes…
that’s all
In OWL, you can construct classes from existing
ones:
enumerate its content
through intersection, union, complement
Etc
ex:Person rdf:type owl:Class.
<uri-for-Amitav-Ghosh>
    rdf:type owl:Thing;
    rdf:type ex:Person .
:£ rdf:type owl:Thing.
:€ rdf:type owl:Thing.
:$ rdf:type owl:Thing.
:Currency
    rdf:type owl:Class;
    owl:oneOf (:€ :£ :$).
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
    owl:unionOf (:Novel :Short_Story :Poetry).
For example…
If:
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
    owl:unionOf (:Novel :Short_Story :Poetry).
<myWork> rdf:type :Novel .
Then:
<myWork> rdf:type :Literature .
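The step from :Novel to :Literature can be sketched in Python as follows. The dictionary of unions is an assumed, simplified stand-in for the owl:unionOf list, and only the "member of a part is a member of the union" direction is shown.

```python
def infer_union_membership(graph, unions):
    """An instance of any class in a union is also an instance of the union class."""
    inferred = set(graph)
    for (s, p, o) in graph:
        if p != "rdf:type":
            continue
        for union_class, members in unions.items():
            if o in members:
                inferred.add((s, "rdf:type", union_class))
    return inferred


unions = {":Literature": [":Novel", ":Short_Story", ":Poetry"]}
graph = {("<myWork>", "rdf:type", ":Novel")}
result = infer_union_membership(graph, unions)
print(("<myWork>", "rdf:type", ":Literature") in result)
```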
Restrictions formally
Defines a class of type owl:Restriction with a
reference to the property that is constrained
definition of the constraint itself
One can, e.g., subclass from this node when
defining a particular class
:Listed_Price rdfs:subClassOf [
    rdf:type owl:Restriction;
    owl:onProperty p:currency;
    owl:allValuesFrom :Currency
].
Possible usage…
If:
:Listed_Price rdfs:subClassOf [
    rdf:type owl:Restriction;
    owl:onProperty p:currency;
    owl:allValuesFrom :Currency
].
:price rdf:type :Listed_Price .
:price p:currency <something> .
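Under the usual reading of owl:allValuesFrom, a reasoner given these statements would conclude that <something> is of type :Currency. The Python sketch below shows that single step; the tuple form of the restriction is an assumption made for brevity, not how OWL represents it.

```python
def infer_all_values_from(graph, restrictions):
    """restrictions: iterable of (restricted_class, property, value_class).
    For every member of restricted_class, each value of property gets value_class."""
    inferred = set(graph)
    for (cls, prop, value_cls) in restrictions:
        members = {s for (s, p, o) in graph if p == "rdf:type" and o == cls}
        for (s, p, o) in graph:
            if s in members and p == prop:
                inferred.add((o, "rdf:type", value_cls))
    return inferred


restrictions = [(":Listed_Price", "p:currency", ":Currency")]
graph = {
    (":price", "rdf:type", ":Listed_Price"),
    (":price", "p:currency", "<something>"),
}
result = infer_all_values_from(graph, restrictions)
print(("<something>", "rdf:type", ":Currency") in result)
```

As with rdfs:domain, the restriction is used here to derive new facts, not to reject the data.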
Other restrictions
OWL “species”
OWL species come to the fore:
restricting which terms can be used and under what
circumstances (restrictions)
if one abides by those restrictions, then simpler inference
engines can be used
They reflect compromises: expressibility vs.
implementability
OWL Full
No constraints on any of the constructs
owl:Class is just syntactic sugar for rdfs:Class
owl:Thing is equivalent to rdfs:Resource
this means that:
Class can also be an individual, a URI can denote a property
as well as a Class
e.g., it is possible to talk about class of classes, apply properties
on them
etc
etc.
Extension of RDFS in all respects
But: no system may exist that infers everything one
might expect
OWL DL
A number of restrictions are defined
classes, individuals, object and datatype properties, etc, are
fairly strictly separated
object properties must be used with individuals
i.e., properties are really used to create relationships between
individuals
no characterization of datatype properties
…
But: well known inference algorithms exist!
<q> rdf:type <A>.        # A is a class, q is an individual
<r> rdf:type <q>.        # error: q cannot be used for a class, too
<A> ex:something <B>.    # error: properties are for individuals only
<q> ex:something <s>.    # error: same property cannot be used as
<p> ex:something "54".   # object and datatype property
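The kinds of mistakes listed above can be detected mechanically. The Python sketch below runs three crude checks over the same five statements; it is an illustration only and far from a complete OWL DL validator, and the convention that URIs are written in angle brackets is an assumption of this sketch.

```python
def is_uri(term):
    """In this sketch, URIs are written <like-this>; anything else is a literal."""
    return term.startswith("<") and term.endswith(">")


def dl_violations(graph):
    classes = {o for (s, p, o) in graph if p == "rdf:type"}
    problems = []
    for (s, p, o) in sorted(graph):
        if p == "rdf:type":
            if s in classes:
                problems.append(f"{s} is used both as a class and as an individual")
        elif s in classes:
            problems.append(f"{s} is a class but is the subject of {p}")
    object_uses = {p for (s, p, o) in graph if p != "rdf:type" and is_uri(o)}
    literal_uses = {p for (s, p, o) in graph if p != "rdf:type" and not is_uri(o)}
    for p in sorted(object_uses & literal_uses):
        problems.append(f"{p} is used as both an object and a datatype property")
    return problems


graph = {
    ("<q>", "rdf:type", "<A>"),
    ("<r>", "rdf:type", "<q>"),
    ("<A>", "ex:something", "<B>"),
    ("<q>", "ex:something", "<s>"),
    ("<p>", "ex:something", '"54"'),
}
for problem in dl_violations(graph):
    print(problem)
```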
OWL DL usage
Abiding by the restrictions means that very large
ontologies can be developed that require precise
procedures
e.g., in the medical domain, biological research, energy
industry, financial services (e.g., XBRL), etc.
the number of classes and properties described this way
can go up to the many thousands
OWL DL has become a language of choice to
define and manage formal ontologies in general
even if their usage is not necessarily on the Web
Ontology development
The hard work is to create the ontologies
requires a good knowledge of the area to be described
some communities have good expertise already (e.g.,
librarians)
OWL is just a tool to formalize ontologies
large scale ontologies are often developed in a community
process
Ontologies should be shared and reused
can be via the simple namespace mechanisms…
…or via explicit import
Ontologies examples
eClassOwl: eBusiness ontology for products and
services, 75,000 classes and 5,500 properties
National Cancer Institute’s ontology: about 58,000
classes
Open Biomedical Ontologies Foundry: a collection
of ontologies, including the Gene Ontology to
describe gene and gene product attributes in any
organism or protein sequence and annotation
terminology and data (UniProt)
BioPAX: for biological pathway data
Other SW technologies
There are other technologies that we do not have
time for here
find RDF data associated with general URI-s: POWDER
bridge to thesauri, glossaries, etc: SKOS
use Rule engines on RDF data
Integration of
relevant data in
Zaragoza (using
RDF and ontologies)
Use rules on the
RDF data to provide
a proper itinerary
Courtesy of Jesús Fernández, Mun. of Zaragoza, and Antonio Campos, CTIC (SWEO Use Case)
“Core” vocabularies
There are also a number of widely used “core
vocabularies”
Dublin Core: about information resources, digital libraries,
with extensions for rights, permissions, digital rights
management
FOAF: about people and their organizations
DOAP: on the descriptions of software projects
SIOC: Semantically-Interlinked Online Communities
vCard in RDF
…
One should never forget: ontologies/vocabularies
must be shared and reused!
Some books
G. Antoniou and F. van Harmelen: Semantic Web
Primer, 2nd edition in 2008
D. Allemang and J. Hendler: Semantic Web for the
Working Ontologist, 2008
Jeffrey Pollock: Semantic Web for Dummies, 2009
…
Further information
Planet RDF aggregates a number of SW blogs:
http://planetrdf.com/
Semantic Web Interest Group
a forum for developers with an archived (and public) mailing list,
and a constant IRC presence on freenode.net#swig
anybody can sign up on the list:
http://www.w3.org/2001/sw/interest/
Conclusions
The Semantic Web is about creating a Web of
Data
There is a great and very active user and
developer community, with new applications
witness the size and diversity of this event
http://www.w3.org/2009/Talks/0615-SanJose-tutorial-IH/