
IRSW - Semantic Web Introduction

The document discusses organizing a trip to Budapest by finding flights, hotels, and local information from various websites. It then describes how the semantic web could standardize how data is published and linked online to allow applications to directly access and combine information from different sources more easily.

Uploaded by

aastha garg
Copyright
© All Rights Reserved

1

Introduction to Semantic Web


2

Let’s organize a trip to Budapest using the Web!
3

You try to find a proper flight with …


4

… a big, reputable airline, or …


5

… or a low-cost one


6

You have to find a hotel, so you look for…


7

… a really cheap accommodation, or …


8

… or a really luxurious one, or …


9

… an intermediate one …
10

Oops, that is no good: the page is in Hungarian, which almost nobody understands, but…
11

… this one could work


12

Of course, you could decide to trust a specialized site…
13

… like this one, or…


14

… or this one
15

You may want to know something about Budapest; look for some photographs…
16

… on flickr …
17

… on Google …
18

… or a (social) travel site


19

What happened here?

• You had to consult a large number of sites, all different in style, purpose, and possibly language…
• You had to mentally integrate all that information to achieve your goals
• We all know that, sometimes, this is a long and tedious process!
20

• All those pages are only the tips of their respective icebergs:
  - the real data is hidden somewhere in databases, XML files, Excel sheets, …
  - you only have access to what the Web page designers allow you to see
• Specialized sites (Expedia, TripAdvisor) do a bit more:
  - they gather and combine data from other sources (usually with the approval of the data owners)
  - but they still control how you see those sources
• But sometimes you want to personalize: access the original data and combine it yourself!
21

Put it another way…

• We would like to extend the current Web to a “Web of data”:
  - allow applications to exploit the data directly
22

But wait! Isn’t that what mashup sites are already doing?
23

A “mashup” example:
24

• In some ways, yes, and that shows the huge power of what such a Web of data provides
• But mashup sites are forced to do very ad-hoc jobs:
  - various data sources expose their data via Web Services
  - each with a different API, a different logic, a different structure
  - these sites are forced to reinvent the wheel many times because there is no standard way of doing things
25

Put it another way (again)…

• We would like to extend the current Web with a standard way to build a “Web of data”
• What makes the current (document) Web work?
  - people create different documents
  - they give each one an address (i.e., a URI) and make it accessible to others on the Web
26

Let us put it together

• What we need for a Web of Data:
  - use URI-s to publish data, not only full documents
  - allow the data to link to other data
  - characterize/classify the data and the links (the “terms”) to convey some extra meaning
  - and use standards for all these!
27

So What is the Semantic Web?


28

It is a collection of standard technologies to realize a Web of Data.

The Semantic Web formalizes knowledge in a way that improves decision making and can form the basis for autonomous reasoning in the future.

The Semantic Web is an effort to make the content of the WWW accessible to, and readable by, machines.
29

Web 3.0: the Semantic Web

• Web 3.0, also referred to as the Semantic Web or the read-write-execute Web, is the era (2010 onward) that refers to the future of the Web.
• In this era, computers are expected to interpret information much as humans do, via Artificial Intelligence and Machine Learning.
• It helps to intelligently generate and distribute useful content tailored to the particular needs of a user.
30

Semantic Web
The Semantic Web is a set of standards and best practices for sharing data, and the semantics of that data, over the Web for use by applications.
A set of standards:
• The RDF data model
• The SPARQL query language
• The OWL standards for storing vocabularies and ontologies
Best practices for sharing data over the Web:
• The use of URIs to name things
• The use of standards such as RDF and SPARQL
31

In what follows…
• We will use a simplistic example to introduce the main technical concepts
• The details will come later in the course
32

The rough structure of data integration

1. Map the various data onto an abstract data representation
   - make the data independent of its internal representation…
2. Merge the resulting representations
3. Start making queries on the whole!
   - queries that could not have been done on the individual data sets
33

A simplified bookstore data

English books database (dataset “A”)

ID                | Author | Title            | Publisher | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr    | 2000

ID     | Name          | Home Page
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

ID     | Publ. Name     | City
id_qpr | Harper Collins | London
34

1st: export your data as a set of relations
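A minimal Python sketch of this first step, assuming hypothetical namespaces `http://example.org/a/` and `http://example.org/isbn/` (the slides elide the real URIs) and plain tuples as the abstract representation. Each row becomes triples whose subject is the row's ID, and foreign-key columns become links to other resources. For brevity, literals and URIs are both plain strings here, and the ISBN normalization is an assumption tailored to this data.

```python
import re

# Hypothetical namespaces: the slides elide the real URIs ("http://…").
A = "http://example.org/a/"        # local IDs and column names of dataset "A"
ISBN = "http://example.org/isbn/"  # books are named by their ISBN


def isbn_uri(raw):
    """'ISBN0-00-651409-X' -> 'http://example.org/isbn/000651409X' (ISBN-10 assumed)."""
    return ISBN + re.sub(r"[^0-9X]", "", raw.upper())[-10:]


# The three tables of dataset "A", one dict per row.
books = [{"ID": "ISBN0-00-651409-X", "author": "id_xyz", "title": "The Glass Palace",
          "publisher": "id_qpr", "year": "2000"}]
authors = [{"ID": "id_xyz", "name": "Ghosh, Amitav",
            "homepage": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "publ_name": "Harper Collins", "city": "London"}]

LINKS = {"author", "publisher"}  # foreign-key columns become links to other resources


def export(rows, subject_of):
    """Turn each row into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        s = subject_of(row["ID"])
        for column, value in row.items():
            if column != "ID":
                o = A + value if column in LINKS else value
                triples.add((s, A + column, o))
    return triples


graph_a = (export(books, isbn_uri)
           | export(authors, lambda i: A + i)
           | export(publishers, lambda i: A + i))
for t in sorted(graph_a):
    print(t)
```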


35

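French books database (dataset “F”), laid out as a spreadsheet (columns A, B, D, E; row numbers on the left)

Row | A                  | B                     | D          | E
1   | ID                 | Titre                 | Traducteur | Original
2   | ISBN0 2020386682   | Le Palais des miroirs | A13        | ISBN-0-00-651409-X
6   | ID                 | Auteur
7   | ISBN-0-00-651409-X | A12
11  | Nom
12  | Ghosh, Amitav
13  | Besse, Christianne

(The cell references A12 and A13 point to the names in rows 12 and 13 of column A.)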
36

2nd: export your second set of data


37

3rd: start merging your data


38

3rd: start merging your data (cont.)


39

3rd: merge identical resources
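The merge itself can be sketched in Python as a set union of triples. The namespaces are hypothetical, and turning the spreadsheet cells A12 and A13 into resources is an assumption about how dataset “F” was exported. The point is that the two book URIs built from the same ISBN are identical, so they collapse into one node automatically.

```python
# Hypothetical namespaces: the slides elide the real URIs.
A = "http://example.org/a/"
F = "http://example.org/f/"
ISBN = "http://example.org/isbn/"

# Dataset "A" after export (a subset of its triples).
graph_a = {
    (ISBN + "000651409X", A + "title", "The Glass Palace"),
    (ISBN + "000651409X", A + "author", A + "id_xyz"),
    (A + "id_xyz", A + "name", "Ghosh, Amitav"),
    (A + "id_xyz", A + "homepage", "http://www.amitavghosh.com"),
}

# Dataset "F" after export; cells A12 and A13 are assumed to become resources.
graph_f = {
    (ISBN + "2020386682", F + "titre", "Le Palais des miroirs"),
    (ISBN + "2020386682", F + "traducteur", F + "A13"),
    (ISBN + "2020386682", F + "original", ISBN + "000651409X"),
    (ISBN + "000651409X", F + "auteur", F + "A12"),
    (F + "A12", F + "nom", "Ghosh, Amitav"),
    (F + "A13", F + "nom", "Besse, Christianne"),
}

# Merging is a set union: identical URIs are automatically the same node.
merged = graph_a | graph_f


def resources(graph):
    """Subjects plus objects that are URIs in one of our namespaces."""
    objs = {o for _, _, o in graph if o.startswith((A, F, ISBN))}
    return {s for s, _, _ in graph} | objs


shared = resources(graph_a) & resources(graph_f)
print(len(merged), "triples; shared node(s):", shared)
```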


40

Start making queries…

• A user of dataset “F” can now ask queries like:
  - “give me the title of the original”
  - well, … « donne-moi le titre de l’original »
• This information is not in dataset “F”…
• …but can be retrieved by merging with dataset “A”!
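A sketch of this query in Python, with hypothetical namespaces and only the three triples the query touches: follow `f:original` from the French book, then `a:title` from the English original. The helper names `objects` and `title_of_original` are illustrative.

```python
# Hypothetical namespaces: the slides elide the real URIs.
A = "http://example.org/a/"
F = "http://example.org/f/"
ISBN = "http://example.org/isbn/"

graph_f = {
    (ISBN + "2020386682", F + "titre", "Le Palais des miroirs"),
    (ISBN + "2020386682", F + "original", ISBN + "000651409X"),
}
graph_a = {
    (ISBN + "000651409X", A + "title", "The Glass Palace"),
}
merged = graph_a | graph_f


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def title_of_original(graph, book):
    """'Give me the title of the original': follow f:original, then a:title."""
    return {title
            for original in objects(graph, book, F + "original")
            for title in objects(graph, original, A + "title")}


print(title_of_original(graph_f, ISBN + "2020386682"))  # dataset "F" alone
print(title_of_original(merged, ISBN + "2020386682"))   # after merging with "A"
```

On dataset “F” alone the answer is empty; on the merged graph it is the English title.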
41

However, more can be achieved…

• We “feel” that a:author and f:auteur should be the same
• But an automatic merge does not know that!
• Let us add some extra information to the merged data:
  - a:author is the same as f:auteur
  - both identify a “Person”
  - a term that a community may have already defined:
    - a “Person” is uniquely identified by his/her name and, say, homepage
    - it can be used as a “category” for certain types of resources
42

3rd revisited: use the extra knowledge

FOAF (Friend of a Friend) is a machine-readable ontology describing persons, their activities, and their relations to other people and objects. Anyone can use FOAF to describe themselves. FOAF allows groups of people to describe social networks without the need for a centralised database.
43

Start making richer queries!

• A user of dataset “F” can now query:
  - « donne-moi la page d’accueil de l’auteur de l’original »
  - well… “give me the home page of the original’s ‘auteur’”
• The information is in neither dataset “F” nor dataset “A” alone…
• …but was made available by:
  - merging datasets “A” and “F”
  - adding three simple extra statements as an extra “glue”
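The glue can be approximated in a few lines of Python. This is a simplified stand-in for the OWL/FOAF reasoning the slides allude to, not an implementation of it: statements made with equivalent properties are copied in both directions, and person nodes that share a name are merged into one, which assumes the name alone identifies a person. All names and namespaces are hypothetical.

```python
# Hypothetical namespaces: the slides elide the real URIs.
A = "http://example.org/a/"
F = "http://example.org/f/"
ISBN = "http://example.org/isbn/"

merged = {
    # from dataset "F"
    (ISBN + "2020386682", F + "original", ISBN + "000651409X"),
    (ISBN + "000651409X", F + "auteur", F + "A12"),
    (F + "A12", F + "nom", "Ghosh, Amitav"),
    # from dataset "A"
    (ISBN + "000651409X", A + "author", A + "id_xyz"),
    (A + "id_xyz", A + "name", "Ghosh, Amitav"),
    (A + "id_xyz", A + "homepage", "http://www.amitavghosh.com"),
}

# Simplified glue: pairs of equivalent properties, and the name used as a Person key.
EQUIVALENT = [(A + "author", F + "auteur"), (A + "name", F + "nom")]
PERSON_KEY = A + "name"


def apply_glue(graph):
    g = set(graph)
    # 1. A statement made with one property also holds for its equivalent.
    for p1, p2 in EQUIVALENT:
        g |= {(s, p2, o) for s, p, o in graph if p == p1}
        g |= {(s, p1, o) for s, p, o in graph if p == p2}
    # 2. Nodes with the same key value denote the same Person: map all of them
    #    onto one representative (the first in sorted order).
    representative = {}
    for s, p, o in sorted(g):
        if p == PERSON_KEY:
            representative.setdefault(o, s)
    alias = {s: representative[o] for s, p, o in g if p == PERSON_KEY}
    return {(alias.get(s, s), p, alias.get(o, o)) for s, p, o in g}


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


def home_page_of_original_author(graph, book):
    """Follow f:original, then f:auteur, then a:homepage."""
    return {page
            for original in objects(graph, book, F + "original")
            for author in objects(graph, original, F + "auteur")
            for page in objects(graph, author, A + "homepage")}


book = ISBN + "2020386682"
print(home_page_of_original_author(merged, book))              # set(): no answer yet
print(home_page_of_original_author(apply_glue(merged), book))  # the home page
```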
44

Combine with different datasets

• Using, e.g., the “Person”, the dataset can be combined with other sources
• For example, data in Wikipedia can be extracted using dedicated tools
  - e.g., the “DBpedia” project can already extract the “infobox” information from Wikipedia…
45

Merge with Wikipedia data


46

Merge with Wikipedia data


47

Merge with Wikipedia data


48

Is that surprising?
• It may look like it but, in fact, it should not be…
• What happened via automatic means is done every day by Web users!
• The difference: a bit of extra rigour so that machines could do this, too
49

What did we do?

• We combined different datasets that
  - are somewhere on the web
  - are of different formats (MySQL, Excel sheet, XHTML, etc.)
  - have different names for relations
• We could combine the data because some URI-s were identical (the ISBN-s in this case)
• We could add some simple additional information (the “glue”), possibly using common terminologies that a community has produced
• As a result, new relations could be found and retrieved
50

It could become even more powerful

• We could add extra knowledge to the merged datasets
  - e.g., a full classification of various types of library data
  - geographical information
  - etc.
• This is where ontologies, extra rules, etc., come in
  - ontologies/rule sets can be relatively simple and small, or huge, or anything in between…
• Even more powerful queries can be asked as a result
51

What did we do? (cont.)


52

The Basis: RDF


53

RDF HISTORY
54

Resource Description Framework


55

Views of RDF
56

RDF triples (cont.)

• An RDF triple (s,p,o) is such that:
  - “s” and “p” are URI-s, i.e., resources on the Web; “o” is a URI or a literal
  - “s”, “p”, and “o” stand for “subject”, “property”, and “object”
  - here is the complete triple:
• RDF is a general model for such triples (with machine-readable formats like RDF/XML, Turtle, N3, RXR, …)
57

RDF triples (cont.)

• Resources can use any URI, e.g.:
  - http://www.example.org/file.xml#element(home)
  - http://www.example.org/file.html#home
  - http://www.example.org/file2.xml#xpath1(//q[@a=b])
• URI-s can also denote non-Web entities:
  - http://www.ivan-herman.net/me is me
  - not my home page, not my publication list, but me
• RDF triples form a directed, labelled graph
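As a small illustration of the graph view, assuming hypothetical URIs in place of the elided `http://…` ones, each triple can be stored as a directed edge from subject to object, labelled with the predicate:

```python
from collections import defaultdict

# Hypothetical URIs standing in for the elided "http://…" ones on the slides.
BOOK = "http://example.org/isbn/2020386682"
ORIGINAL = "http://example.org/isbn/000651409X"
F = "http://example.org/f/"

triples = [
    (BOOK, F + "titre", "Le palais des miroirs"),  # object is a literal
    (BOOK, F + "original", ORIGINAL),              # object is a URI
]


def to_graph(triples):
    """Each triple becomes a directed edge subject -> object labelled by the predicate."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    return edges


graph = to_graph(triples)
for label, target in graph[BOOK]:
    print(f"{BOOK} --{label}--> {target}")
```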


58

A simple RDF example (in RDF/XML)

<rdf:Description rdf:about="http://…/isbn/2020386682">
    <f:titre xml:lang="fr">Le palais des miroirs</f:titre>
    <f:original rdf:resource="http://…/isbn/000651409X"/>
</rdf:Description>

(Note: namespaces are used to simplify the URI-s)
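To see that the RDF/XML above encodes exactly two triples, here is a sketch that reads it with Python's standard `xml.etree.ElementTree`. It handles only plain `rdf:Description` elements with `rdf:about`, `rdf:resource`, and literal children, a tiny subset of RDF/XML; the `f:` namespace and the book URIs are hypothetical because the slide elides them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The slide's snippet with namespace declarations added; the f: namespace and the
# book URIs are hypothetical.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:f="http://example.org/f/">
  <rdf:Description rdf:about="http://example.org/isbn/2020386682">
    <f:titre xml:lang="fr">Le palais des miroirs</f:titre>
    <f:original rdf:resource="http://example.org/isbn/000651409X"/>
  </rdf:Description>
</rdf:RDF>"""


def qname_to_uri(tag):
    """ElementTree writes '{ns}local'; RDF concatenates ns + local."""
    ns, local = tag[1:].split("}")
    return ns + local


def description_triples(xml_text):
    """Extract triples from plain rdf:Description elements."""
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = resource                          # object is a resource
            else:
                obj = (prop.text, prop.get(XML_LANG))   # literal with language tag
            triples.append((subject, qname_to_uri(prop.tag), obj))
    return triples


for t in description_triples(DOC):
    print(t)
```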


59

Resource description framework (RDF)

It is a data model used to represent resources
A triple (a statement) is the basic building block used to describe a resource
Universal machine-readable exchange format
RDF has an XML syntax
RDF also has other notations (Turtle, N-Triples, N3, and JSON-based formats such as JSON-LD)
60

RDF Graph (e.g., the Beatles band)


61

This graph shows several nodes that represent entities such as the Beatles band and one of their studio albums.

Each edge has an identifier that tells us what relationship holds between those nodes. For example, the :member edge links a band to its members. The rdf:type edge represents a special kind of relationship.

These edges are sometimes called “attributes” of the node and are often used to represent the characteristics of the nodes.
This simple, flexible data model has a lot of expressive power to represent complex situations, relationships, and other things of interest, while also being appropriately abstract.
62

RDF in Turtle
63

RDF

RDF has a special nomenclature for naming nodes and edges in a graph. An edge is called a triple, the source node is called a subject, the edge name is called a predicate, and the target node is called an object.
64

RDF NODES

There are three different kinds of RDF nodes:


• IRI (Internationalized Resource Identifier): an IRI is a Unicode string for identifying nodes and edges in an unambiguous way. IRIs are internationalized versions of URIs, which are generalizations of URLs.

• Blank node: nodes without a user-visible identifier are called blank nodes (“bnodes” for short). A blank node is appropriate when the node does not need to be referenced directly. Blank nodes can be reached by following their incident edges from other nodes.

• Literal: literals are concrete values used to represent datatypes like strings, numbers, and dates.

In the example Beatles graph, we have several IRIs: :The_Beatles and :John_Lennon are two examples. Literals in our example include the string "The Beatles", the date "1963-03-22", and the integer 125.
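The three kinds of nodes, and where each may appear in a triple, can be captured with Python dataclasses. The class names and the `http://example.org/` namespace are hypothetical, as are the `:name` and `:date` predicates. The constraint follows RDF 1.1: a subject is an IRI or a blank node, a predicate is always an IRI, and an object can be any of the three kinds.

```python
from dataclasses import dataclass
from typing import Union

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace standing in for the ':' prefix


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str  # only locally meaningful


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD + "string"


@dataclass(frozen=True)
class Triple:
    subject: Union[IRI, BNode]
    predicate: IRI
    obj: Union[IRI, BNode, Literal]

    def __post_init__(self):
        # RDF 1.1: subject is an IRI or blank node; predicate is always an IRI.
        if not isinstance(self.subject, (IRI, BNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.obj, (IRI, BNode, Literal)):
            raise TypeError("object must be an IRI, a blank node, or a literal")


# Illustrative triples using nodes named on the slide (':name' and ':date' are assumed).
name = Triple(IRI(EX + "The_Beatles"), IRI(EX + "name"), Literal("The Beatles"))
released = Triple(IRI(EX + "Please_Please_Me"), IRI(EX + "date"),
                  Literal("1963-03-22", XSD + "date"))
print(name)
print(released)
```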
65

IRI

IRIs, just like the URIs and URLs they generalize, are long strings that are not
easy to read or write. A full IRI can be serialized by simply enclosing it in
angle brackets:

A prefix is a short name that is mapped to a long namespace. A prefixed name is the sequence of the prefix and the local name separated by a colon (:). The empty string is a valid prefix and is called the default namespace for a graph.
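A sketch of prefix expansion in Python, assuming a hypothetical prefix table in which the empty prefix (the default namespace) maps to `http://example.org/`; the `rdf:` and `xsd:` namespaces are the standard ones. The function name `expand` is illustrative.

```python
# Hypothetical prefix table; '' is the default namespace.
PREFIXES = {
    "": "http://example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full IRI written in angle brackets."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return "<" + prefixes[prefix] + local + ">"


print(expand(":The_Beatles"))  # <http://example.org/The_Beatles>
print(expand("rdf:type"))      # <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
```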
66

Blank Nodes

Suppose we extend our example. We have a new use case that requires us to capture, for all the tracks in an album, which side they belong to (when applicable, for albums released on media with “sides”, e.g., vinyl or cassettes) and the order of the song on the album/side. We can introduce a blank node between the album and the song to attach this data:

Blank nodes do not have globally unique identifiers, so when they are serialized a locally unique, non-persistent label is used. These blank node names are serialized after the special prefix _:.
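A sketch of how such blank nodes could be generated and written as Turtle, with fresh `_:b0`, `_:b1`, … labels. The album `:Some_Album`, the placeholder songs, and the property names `:side`, `:order`, and `:song` are hypothetical, since the slide's figure is not reproduced here.

```python
from itertools import count


def tracks_with_blank_nodes(album, tracks):
    """Write Turtle triples linking album -> blank node -> song, with side/order."""
    labels = count()
    lines = []
    for song, side, order in tracks:
        b = f"_:b{next(labels)}"  # locally unique, non-persistent label
        lines.append(f"{album} :track {b} .")
        lines.append(f'{b} :side "{side}" .')
        lines.append(f"{b} :order {order} .")
        lines.append(f"{b} :song {song} .")
    return lines


# Placeholder album, songs, sides, and positions (not real album data).
lines = tracks_with_blank_nodes(":Some_Album", [(":Song_1", "A", 1), (":Song_2", "A", 2)])
for line in lines:
    print(line)
```

Because the labels are only locally unique, a different serializer could emit `_:x7` instead of `_:b0` for the same graph.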
67

Literals
Literals are serialized as their lexical value in double quotes followed by the datatype
after double carets (^^). The datatype is typically a built-in datatype from
XML Schema Datatypes (XSD) that defines many commonly used datatypes but
custom datatype IRIs can also be used. Some of the XSD datatypes can be
serialized without the explicit datatype or the quotes. The following table shows
examples of serializing different datatypes:

Serialization          | Datatype    | Description
"The Beatles"          | xsd:string  | Datatype can be omitted for string values
"1963-03-22"^^xsd:date | xsd:date    | Date value
125                    | xsd:integer | The datatype and double quotes can be omitted for integer values
3.0                    | xsd:decimal | Arbitrary-precision decimals can be written without the datatype and quotes too. The presence of . in the number makes it a decimal.
3.2E4                  | xsd:double  | Double-precision floating point values can be written in scientific notation with the symbol E separating the mantissa from the exponent
true                   | xsd:boolean | Lowercase strings true and false can be used for boolean values
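The rules in the table can be sketched as a Python function that picks the serialization from the value's type. It covers only the datatypes listed and assumes the `xsd:` prefix is declared; the function name `to_turtle_literal` is hypothetical.

```python
import datetime
from decimal import Decimal


def to_turtle_literal(value):
    """Serialize a Python value following the shortcuts in the table."""
    if isinstance(value, bool):                 # check before int: bool subclasses int
        return "true" if value else "false"     # xsd:boolean
    if isinstance(value, int):
        return str(value)                       # xsd:integer, no quotes needed
    if isinstance(value, Decimal):
        text = format(value, "f")
        return text if "." in text else text + ".0"  # the '.' marks xsd:decimal
    if isinstance(value, float):
        return format(value, "E")               # scientific notation -> xsd:double
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        return f'"{value.isoformat()}"^^xsd:date'
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'                   # xsd:string, datatype omitted
    raise TypeError(f"no literal form for {type(value).__name__}")


for v in ["The Beatles", datetime.date(1963, 3, 22), 125, Decimal("3.0"), 3.2e4, True]:
    print(to_turtle_literal(v))
```

Note that `format(3.2e4, "E")` yields `3.200000E+04`, which is a different spelling of the same xsd:double value as `3.2E4`.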
68

RDF Syntax

Node-and-link visualization of graphs is convenient and easy to understand on a small scale, but it is not very useful for exchanging data between systems.
There are several syntaxes to serialize an RDF graph as text, including syntaxes based on XML and JSON. In this tutorial, we will use the Turtle syntax, which is also the basis of the SPARQL query language.
69

RDF in Turtle

The serialization of the Beatles graph in Turtle syntax looks like this:
70

Turtle introduces some syntactic sugar:


• Multiple predicates: if two triples share the same subject, then the first triple can be terminated with ; and the subject of the second triple can be omitted.

• Multiple objects: if two triples share the same subject and the same predicate, the objects can be separated with , without repeating the subject or the predicate.

• Types: the letter a can be used in place of rdf:type; you can read this as “is a”, basically. For example, “Love Me Do is a song.”
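A sketch of a serializer that applies all three shortcuts, assuming terms are already written as prefixed names or quoted literals. The `:name` and `:writer` properties are illustrative, not taken from the slides.

```python
from collections import defaultdict


def to_turtle(triples):
    """Serialize (s, p, o) triples, grouping by subject (;) and predicate (,)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    statements = []
    for s, predicates in grouped.items():
        parts = []
        for p, objs in predicates.items():
            verb = "a" if p == "rdf:type" else p  # 'a' is shorthand for rdf:type
            parts.append(f"{verb} {', '.join(objs)}")
        statements.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(statements)


triples = [
    (":Love_Me_Do", "rdf:type", ":Song"),
    (":Love_Me_Do", ":name", '"Love Me Do"'),
    (":Love_Me_Do", ":writer", ":John_Lennon"),
    (":Love_Me_Do", ":writer", ":Paul_McCartney"),
]
print(to_turtle(triples))
```

This prints:

```
:Love_Me_Do a :Song ;
    :name "Love Me Do" ;
    :writer :John_Lennon, :Paul_McCartney .
```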
71
72

Named Graph
Sometimes it is useful to assign a name to an RDF
graph for the purposes of sane data management,
access control, or to attach metadata to the overall
graph rather than to individual nodes. The notion of
named graphs in RDF allows us to do that.
73

Note that it is the triples, not the nodes, that are separated into named graphs, and different named graphs can share some common nodes; e.g., the :Please_Please_Me node appears both in the :Artist graph and in the :Album graph.

It is possible to traverse the edges starting from one named graph and continue into another named graph via these shared nodes.

It is through this sharing of nodes across named graphs that the collection of named graphs (conceptually) constitutes a larger unified graph.
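A Python sketch of this idea: each named graph is a set of triples, their union is the conceptual larger graph, and a traversal starting at :Please_Please_Me only reaches :Love_Me_Do once both graphs are combined. Which triples sit in which graph, and the properties used, are assumptions for illustration; only the graph names and the shared node come from the slide.

```python
# Hypothetical contents of the two named graphs.
named_graphs = {
    ":Artist": {
        (":The_Beatles", ":member", ":John_Lennon"),
        (":Please_Please_Me", ":artist", ":The_Beatles"),
    },
    ":Album": {
        (":Please_Please_Me", ":track", ":Love_Me_Do"),
        (":Love_Me_Do", ":length", "125"),
    },
}


def union(graphs):
    """Conceptually, the named graphs together form one larger graph."""
    return set().union(*graphs.values())


def reachable(graph, start):
    """Follow outgoing edges from start (iterative depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for s, _, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


print(sorted(reachable(named_graphs[":Artist"], ":Please_Please_Me")))
print(sorted(reachable(union(named_graphs), ":Please_Please_Me")))
```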
74

RDF Class

Classes represent categories of nodes with similar characteristics. Nodes that belong to this category are linked to the class using the rdf:type (shorthand: a) property. Classes themselves are instances of the meta-class rdfs:Class.

:Band a rdfs:Class .          # declaration of a class

:The_Beatles a :Band .        # declaring an instance of a class
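Since `a` is only shorthand for `rdf:type`, finding classes and their instances amounts to matching `rdf:type` triples. A minimal sketch over the two statements above; the helper names `classes` and `instances` are hypothetical.

```python
RDF_TYPE = "rdf:type"  # what the Turtle shorthand 'a' stands for

triples = {
    (":Band", RDF_TYPE, "rdfs:Class"),    # :Band a rdfs:Class .
    (":The_Beatles", RDF_TYPE, ":Band"),  # :The_Beatles a :Band .
}


def classes(graph):
    """Nodes declared to be instances of rdfs:Class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == "rdfs:Class"}


def instances(graph, cls):
    """Nodes linked to cls by rdf:type."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == cls}


for c in sorted(classes(triples)):
    print(c, "->", sorted(instances(triples, c)))
```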
75

Properties

Property is a relation between subjects and objects. We have already seen many
examples of properties; :album, for example. We can use the rdf:Property class to
declare properties:

:track a rdf:Property .
:length a rdf:Property .

should
The range of the track property is defined to be a Song class, so the objects be resources th
are instances of this class.
The range of the length property, on the other hand, is defined to be the built-in datatype xsd:integer
so the objects should be integer literals.
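In RDFS, a range declaration is used for inference rather than validation: the entailment rule known as rdfs3 says that from `p rdfs:range C` and `s p o` one may conclude `o rdf:type C`. A sketch of that rule in Python; the range triples are written out as an assumption, since the slide describes them without showing them, and the data triple is illustrative.

```python
# Assumed schema triples matching the ranges described above.
schema = {
    (":track", "rdfs:range", ":Song"),
    (":length", "rdfs:range", "xsd:integer"),
}
# Illustrative data.
data = {
    (":Please_Please_Me", ":track", ":Love_Me_Do"),
}


def infer_range_types(schema, data):
    """Rule rdfs3: from (p rdfs:range C) and (s p o), infer (o rdf:type C)."""
    ranges = [(p, c) for p, rel, c in schema if rel == "rdfs:range"]
    return {(o, "rdf:type", c) for s, p, o in data for rp, c in ranges if p == rp}


print(infer_range_types(schema, data))
```

So the “should” above is, strictly, something RDFS infers rather than checks; rejecting non-conforming data would need a separate validation step.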
76

RDF Metadata

RDFS also provides two properties that can be used to provide metadata about nodes, classes, and properties:
• rdfs:label: provides a human-readable name for a resource
• rdfs:comment: provides a human-readable description of a resource

:length rdfs:label "length (in seconds)" ;
    rdfs:comment "The length of a song expressed in seconds" .
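A common use of these properties is choosing what to display for a node. A sketch, assuming a fallback to the local name when no `rdfs:label` is present; the fallback and the function name `label` are conventions chosen here, not part of RDFS.

```python
graph = {
    (":length", "rdfs:label", '"length (in seconds)"'),
    (":length", "rdfs:comment", '"The length of a song expressed in seconds"'),
}


def label(graph, node):
    """Human-readable name: rdfs:label if present, else the local name."""
    for s, p, o in graph:
        if s == node and p == "rdfs:label":
            return o.strip('"')
    return node.split(":", 1)[-1]


print(label(graph, ":length"))  # length (in seconds)
print(label(graph, ":track"))   # track
```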
