SEMANTIC WEB
UNIT - I
The Semantic Web Vision
1. Introduction
1. Motivation for the Semantic Web
2. Design Decisions for the Semantic Web
3. Basic Technology for the Semantic Web
4. From Data to Knowledge
5. The Web Architecture of the Semantic Web
6. How to Get There from Here
2. Semantic Web Technologies
1. Explicit Metadata
2. Ontologies
3. Logic
4. The Semantic Web versus Artificial Intelligence
3. A Layered Approach
1. Introduction
Motivation for the Semantic Web
• The idea of the “Semantic Web” can be summarized in a single phrase:
• to make the web more accessible to computers.
• The current web is a web of text and pictures.
• Such media are very useful for people, but computers play a very
limited role on the current web:
• they index keywords, and they ship information from servers to clients, but
that is all.
• All the intelligent work (selecting, combining, aggregating, etc.) has to
be done by the human reader.
• What if we could make the web richer for machines, so that it would
be full of machine readable, machine “understandable” data?
•Such a web would facilitate many things that are
impossible on the current web:
•Search would no longer be limited to simply looking for
keywords, but could become more semantic, which would
include
• looking for synonyms,
• being aware of homonyms, and
• taking into account the context and purpose of the search query.
•Websites could become more personalized if personal
browsing agents were able to understand the contents of a
web page and tailor it to personal interest profiles.
•Linking could become more semantic by deciding
dynamically
• which pages would be useful destinations
• based on the current user’s activities
• instead of having to hardwire the same links for all users
ahead of time.
•It would be possible to integrate information across
websites, instead of users currently having to do a
“mental copy-paste” whenever they find some
information on one site that they want to combine with
information from another.
Design Decisions for the Semantic Web
• There are many ways of going about building a more “semantic” web.
• One way would be to build a “Giga Google,” relying on “the unreasonable
effectiveness of data” to find the right correlations among words, between
terms and context, etc.
• The plateau in search engine performance that we have been witnessing
over the past few years seems to suggest that there are limitations to this
approach:
• none of the search giants has been able to go beyond simply returning flat lists of
disconnected pages.
•The Semantic Web follows different design principles,
which can be summarized as follows:
1. make structured and semi-structured data available in
standardized formats on the web;
2. make not just the datasets, but also the individual
data-elements and their relations accessible on the web;
3. describe the intended semantics of such data in a
formalism, so that this intended semantics can be
processed by machines.
• The decision to exploit structured and semi-structured data is based on a
key observation, namely that underlying the current unstructured “web
of text and pictures” is actually a very large amount of structured and
semi-structured data.
• The vast majority of web content is being generated from databases and
content management systems containing carefully structured datasets.
• However, the often rich structure that is available in such datasets is
almost completely lost in the process of publishing such structured data
as human-readable Hypertext Markup Language (HTML) pages.
[Figure: Structured and unstructured data on the web]
Basic Technology for the Semantic Web
• The aforementioned three design principles have been translated into actual
technology:
1. use labeled graphs as the data model for objects and their relations, with objects
as nodes in the graph, and the edges in the graph depicting the relations
between these objects.
• The unfortunately named Resource Description Framework (RDF) is used as the
formalism to represent such graphs.
2. use web identifiers (Uniform Resource Identifiers - URI) to identify the
individual data-items and their relations that appear in the datasets.
• Again, this is reflected in the design of RDF.
3. use ontologies (briefly: hierarchical vocabularies of types and relations) as the
data model to formally represent the intended semantics of the data.
• Formalisms such as RDF Schema and the Web Ontology Language (OWL) are used for
this purpose, again using URIs to represent the types and their properties.
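To make the first two principles concrete, here is a minimal Python sketch (standard library only, not an RDF library) that stores a labeled graph as a set of (subject, predicate, object) triples in which every node and every edge label is a URI. The URIs reuse the semanticwebprimer.org apartment example that appears later in these notes; the helper names `nodes` and `edges_from` are hypothetical, chosen only for illustration.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) triples.
# Every node and every edge label is a URI (a web identifier).
SWP = "http://www.semanticwebprimer.org/ontology/apartments.ttl#"
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

graph = {
    # subject node               edge label (relation)  object node
    (SWP + "BaronWayApartment", SWP + "isPartOf", SWP + "BaronWayBuilding"),
    (SWP + "BaronWayBuilding", DBO + "location", DBR + "Amsterdam"),
}


def nodes(g):
    """Every subject and every object is a node of the graph."""
    return {s for s, _, _ in g} | {o for _, _, o in g}


def edges_from(g, node):
    """Outgoing labeled edges of a node, as (relation, target) pairs."""
    return {(p, o) for s, p, o in g if s == node}


print(len(nodes(graph)))  # 3
print(edges_from(graph, SWP + "BaronWayApartment"))
```

A real application would normally use an RDF library (rdflib is a common choice in Python) rather than bare tuples; the tuple form just shows that nothing more than URIs and labeled edges is needed.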
From Data to Knowledge
• It is important to realize that in order to really capture the intended
semantics of the data, formalisms such as RDF Schema and OWL are not
just data-description languages, but are actually lightweight knowledge
representation languages.
• They are “logics” that allow the inference of additional information from
the explicitly stated information.
• RDF Schema is a very low expressivity logic that allows some very simple
inferences, such as property inheritance over a hierarchy of types and
type-inference of domain and range restrictions.
• Similarly, OWL is a somewhat richer (but still relatively lightweight) logic
that allows additional inferences such as
• equality and
• inequality,
• number restrictions,
• existence of objects and others.
• Such inferences in RDF Schema and OWL give publishers of information
the possibility to create a minimal lower bound of facts that readers must
believe about published data.
• Additionally, OWL gives information publishers the possibility to forbid
readers of information to believe certain things about the published data
(at least as long as everybody intends to stay consistent with the published
ontology).
•Together, performing such inferences over these
logics amounts to imposing both a lower bound and
an upper bound on the intended semantics of the
published data.
•By increasingly refining the ontologies, these lower and upper bounds can be
moved arbitrarily close together, thereby pinning down ever more precisely
the intended semantics of the data, to the extent required by the use cases
at hand.
The Web Architecture of the Semantic Web
•A key aspect of the traditional web is the fact that its content is
distributed, both in location and in ownership:
• web pages that link to each other often live on different web servers,
and these servers are in different physical locations and owned by
different parties.
•A crucial contributor to the growth of the web is the fact that
“anybody can say anything about anything,” or more precisely:
• anybody can refer to anybody’s web page without having to negotiate
first about permissions or inquire about the right address or identifier
to use.
• A similar mechanism is at work in the Semantic Web:
• a first party can publish a dataset on the web (left side of the diagram),
• a second party can independently publish a vocabulary of terms (right side
of the diagram),
• a third party may decide to annotate the object of the first party with a term
published by the second party, without asking for permission from either of
them, and in fact without either of these two having to even know about it.
• It is this decoupling that is the essence of the weblike nature of the Semantic
Web.
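The three-party scenario above can be sketched in a few lines of Python, assuming each party's publication is modeled as a set of URI triples. All example.org URIs and the `objects` helper are hypothetical. Because every name is a global URI, combining the three sources needs no coordination: it is a plain set union. (Graphs containing blank nodes would need those nodes renamed apart before a union; this sketch has none.)

```python
# Sketch of the "anybody can say anything about anything" decoupling.
# All example.org URIs are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
DATA = "http://example.org/party1/data#"
VOCAB = "http://example.org/party2/vocab#"

# Party 1 publishes a dataset.
dataset = {(DATA + "apt7", DATA + "locatedIn", "http://dbpedia.org/resource/Amsterdam")}

# Party 2 independently publishes a vocabulary term.
vocabulary = {(VOCAB + "HolidayRental", RDF_TYPE, RDFS_CLASS)}

# Party 3 annotates party 1's object with party 2's term, asking neither of them.
annotation = {(DATA + "apt7", RDF_TYPE, VOCAB + "HolidayRental")}

# Global URIs make merging a set union.
merged = dataset | vocabulary | annotation


def objects(g, subject, predicate):
    """All objects o such that (subject, predicate, o) is in g."""
    return {o for s, p, o in g if s == subject and p == predicate}


print(objects(merged, DATA + "apt7", RDF_TYPE))
```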
How to Get There from Here
• To turn these architectural principles into an implemented reality, three things are needed:
1. We must agree on standard syntax to represent data and metadata.
2. We must have sufficient agreement on the metadata vocabularies in
order to share intended semantics of the data.
3. We must publish large volumes of data in the formats of step 1,
using the vocabularies of step 2.
Semantic Web Technologies
1. Explicit Metadata
2. Ontologies
3. Logic
4. The Semantic Web versus Artificial Intelligence
Explicit Metadata
• Currently, web content is formatted for human readers rather than
programs.
• HTML is the predominant language in which web pages are written
directly or using tools.
• A portion of a typical web page listing apartments for rent might look
like this:
<html>
<head>
<title>Apartments for Rent</title>
</head>
<body>
<ol>
<li> Studio apartment on Florida Ave.
<li> 3 bedroom Apartment on Baron Way
</ol>
</body>
</html>
• A representation that tags the data items themselves (for example, in XML) is far more easily processable by machines.
• In particular, such a representation is useful for exchanging information on the web,
which is one of the most prominent application areas of XML
technology.
• However, XML still remains at the syntactic level, as it describes
the structure of information, but not its meaning.
• The basic language of the Semantic Web is RDF, which is a
language for making statements about pieces of information.
• In our example, such statements include that the Baron Way apartment is an apartment, that it has three bedrooms, and that it is located on Baron Way.
• The syntax of HTML is text with tags (e.g. <title>) written using angle
brackets.
• The HTML standard also specifies how tags may be combined: for example, <head> should come before <body>, and <li> elements
should appear within <ol> elements.
• The syntax, data model, and semantics are all defined within the HTML standard.
• HTML is designed to communicate information about the structure of documents for human
consumption.
• We need a data model that can be used by multiple applications, not just for describing
documents for people but for describing application-specific information.
• This data model needs to be domain independent so that applications
ranging from real estate to social networks can leverage it.
• Finally, like HTML, we need a way to write down all this information
– a syntax
• RDF (Resource Description Framework) provides just such a flexible
domain independent data model.
• RDFS allows users to precisely define how their vocabulary (i.e. their
terminology) should be interpreted.
•Combined, these technologies define the components of a
standard language for exchanging arbitrary data between
machines:
• RDF – data model
• RDFS – semantics
• Turtle / RDFa / RDF/XML – syntax
1. Turtle
2. Other Syntaxes
1. RDF/XML
2. RDFa
Turtle
• The graphical syntax for RDF that we have already seen is neither machine-interpretable nor
standardized.
• Here we introduce a standard, machine-interpretable syntax called Turtle (Terse
RDF Triple Language), a text-based syntax for RDF.
• The file extension used for Turtle text files is “.ttl”.
• We have already seen how to write a statement in Turtle.
• Here’s an example:
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayApartment>
<http://www.semanticwebprimer.org/ontology/apartments.ttl#isPartOf>
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayBuilding>.
• URLs are enclosed in angle brackets.
• The subject, property, and object of a statement appear in order, followed
by a period.
• Indeed, we can write a whole RDF graph just using this approach.
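As a small illustration of that rule, here is a Python sketch that writes URI-only triples in this long form, one "<subject> <property> <object> ." statement per triple. It assumes every term is a URI; literals (the next topic) would need quoting instead of angle brackets. The function name `to_turtle` is hypothetical.

```python
# Sketch: write URI-only triples in Turtle's long form.
SWP = "http://www.semanticwebprimer.org/ontology/apartments.ttl#"


def to_turtle(triples):
    """One '<s> <p> <o> .' line per triple, in list order (all terms assumed to be URIs)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


doc = to_turtle([
    (SWP + "BaronWayApartment", SWP + "isPartOf", SWP + "BaronWayBuilding"),
    (SWP + "BaronWayBuilding", "http://dbpedia.org/ontology/location",
     "http://dbpedia.org/resource/Amsterdam"),
])
print(doc)
```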
1. Literals
• So far we have defined statements that link together resources.
• We can also use literals, that is, atomic values, within RDF.
• In Turtle, we write a literal by enclosing the value in quotes
and appending ^^ followed by the data type of the value.
• A data type tells us whether we should interpret a value as a string, a
date, an integer or some other type.
• Data types are again expressed as URLs.
• It is recommended practice to use the data types defined by XML
Schema.
• When using these data types the values must conform to the XML
Schema definition.
• If no data type is specified after a literal, it is assumed to be a string.
Here are some common data types and how they look in Turtle:
string - "Baron Way"
integers - "1"^^<http://www.w3.org/2001/XMLSchema#integer>
decimals - "1.23"^^<http://www.w3.org/2001/XMLSchema#decimal>
dates - "1982-08-30"^^<http://www.w3.org/2001/XMLSchema#date>
time - "11:24:00"^^<http://www.w3.org/2001/XMLSchema#time>
date with a time - "1982-08-30T11:24:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>
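The following Python sketch shows how a program might read such typed literals: it splits off the quoted value and the ^^<datatype> URL, defaults to a string when no data type is given, and converts the value with a Python parser. The `CONVERTERS` table is an assumption of this sketch: Python's parsers only approximate the XML Schema lexical rules, and escaped quotes, language tags and Turtle's shorthand numbers are not handled.

```python
import datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from a few XML Schema datatypes to Python parsers.
CONVERTERS = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "date": datetime.date.fromisoformat,
    XSD + "time": datetime.time.fromisoformat,
    XSD + "dateTime": datetime.datetime.fromisoformat,
}


def parse_literal(text):
    """Parse '"lexical"' or '"lexical"^^<datatype-URL>' into a Python value."""
    if not text.startswith('"'):
        raise ValueError(f"not a quoted literal: {text!r}")
    end = text.index('"', 1)           # sketch: escaped quotes are not handled
    lexical, suffix = text[1:end], text[end + 1:]
    if suffix == "":
        datatype = XSD + "string"      # no data type given: treat as a string
    elif suffix.startswith("^^<") and suffix.endswith(">"):
        datatype = suffix[3:-1]
    else:
        raise ValueError(f"unsupported literal suffix: {suffix!r}")
    return CONVERTERS[datatype](lexical)  # raises if the value does not conform


print(parse_literal('"1"^^<http://www.w3.org/2001/XMLSchema#integer>') + 1)  # 2
print(parse_literal('"1982-08-30"^^<http://www.w3.org/2001/XMLSchema#date>').year)  # 1982
try:
    parse_literal('"three"^^<http://www.w3.org/2001/XMLSchema#integer>')
except ValueError as err:
    print("not a valid xsd:integer:", err)
```

The failing "three" case illustrates the requirement that values conform to their declared XML Schema type.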
Suppose that we want to add to our graph that the Baron Way Apartment has three bedrooms. We
would add the first statement below to our graph in Turtle; the other two state that the apartment is part of the Baron Way Building and that the building is located in Amsterdam.
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayApartment>
<http://www.semanticwebprimer.org/ontology/apartments.ttl#hasNumberOfBedrooms>
"3"^^<http://www.w3.org/2001/XMLSchema#integer>.
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayApartment>
<http://www.semanticwebprimer.org/ontology/apartments.ttl#isPartOf>
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayBuilding>.
<http://www.semanticwebprimer.org/ontology/apartments.ttl#BaronWayBuilding>
<http://dbpedia.org/ontology/location>
<http://dbpedia.org/resource/Amsterdam>.
2. Abbreviations
• Often when we define vocabularies, we do so at the same URI.
• In our example, the resources Baron Way Apartment and Baron Way Building are both
defined at the URL http://www.semanticwebprimer.org/ontology/apartments.ttl.
• This URL defines what is termed the namespace of those resources.
• Turtle takes advantage of this convention to allow URLs to be abbreviated.
• It introduces the @prefix syntax to define short stand-ins for particular namespaces.
• For example, we can say that swp should be the stand-in for
http://www.semanticwebprimer.org/ontology/apartments.ttl#.
• A URL abbreviated with such a prefix (for example, swp:BaronWayApartment) is termed a qualified name.
Example using prefixes
@prefix swp: <http://www.semanticwebprimer.org/ontology/apartments.ttl#>.
@prefix dbpedia: <http://dbpedia.org/resource/>.
@prefix dbpedia-owl: <http://dbpedia.org/ontology/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
swp:BaronWayApartment swp:hasNumberOfBedrooms "3"^^xsd:integer.
swp:BaronWayApartment swp:isPartOf swp:BaronWayBuilding.
swp:BaronWayBuilding dbpedia-owl:location dbpedia:Amsterdam.
• Note that the angle brackets are dropped from around resources that are referred to
using a qualified name.
• Secondly, we can mix and match regular URLs with these qualified names.
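A parser has to turn qualified names back into full URLs. This Python sketch does that with a prefix table mirroring the @prefix lines of the example above. It also accepts a regular URL in angle brackets, since the two forms can be mixed. The function name `expand` and the error handling are assumptions of the sketch, not part of Turtle.

```python
# Assumed prefix table, mirroring the @prefix declarations in the example above.
PREFIXES = {
    "swp": "http://www.semanticwebprimer.org/ontology/apartments.ttl#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(term, prefixes=PREFIXES):
    """Return the full URL for a qualified name or for a bracketed <URL>."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]                  # a regular URL: just drop the brackets
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no @prefix declared for {term!r}")
    return prefixes[prefix] + local        # namespace + local name


print(expand("swp:BaronWayApartment"))
print(expand("<http://dbpedia.org/resource/Amsterdam>"))
```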
• Turtle also allows us to avoid repeating a subject that is used in several
statements.
• In the example above, swp:BaronWayApartment is used as the subject of two
triples.
• This can be written more compactly by using a semicolon at the end of a
statement.
• For example:
@prefix swp: <http://www.semanticwebprimer.org/ontology/apartments.ttl#>.
@prefix dbpedia: <http://dbpedia.org/resource/>.
@prefix dbpedia-owl: <http://dbpedia.org/ontology/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
swp:BaronWayApartment swp:hasNumberOfBedrooms "3"^^xsd:integer;
swp:isPartOf swp:BaronWayBuilding.
swp:BaronWayBuilding dbpedia-owl:location dbpedia:Amsterdam.
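A serializer can produce this compact form automatically by grouping triples by subject. The Python sketch below assumes every term is already written as a qualified name or literal. It joins the predicates of one subject with semicolons and, when a subject and predicate both repeat, joins their objects with commas (Turtle's comma abbreviation). The function name `compact_turtle` and the sample triples are illustrative.

```python
from collections import defaultdict

# Terms are assumed to be pre-written Turtle (qualified names or literals).
TRIPLES = [
    ("swp:BaronWayApartment", "swp:hasNumberOfBedrooms", '"3"^^xsd:integer'),
    ("swp:BaronWayApartment", "swp:isPartOf", "swp:BaronWayBuilding"),
    ("swp:BaronWayBuilding", "dbpedia-owl:location", "dbpedia:Amsterdam"),
    ("swp:BaronWayBuilding", "dbpedia-owl:location", "dbpedia:Netherlands"),
]


def compact_turtle(triples):
    """Group by subject (';' between predicates) and by predicate (',' between objects)."""
    by_subject = defaultdict(lambda: defaultdict(list))  # keeps first-seen order
    for s, p, o in triples:
        by_subject[s][p].append(o)
    statements = []
    for s, preds in by_subject.items():
        parts = [f"{p} {', '.join(objs)}" for p, objs in preds.items()]
        statements.append(f"{s} " + ";\n    ".join(parts) + ".")
    return "\n".join(statements)


print(compact_turtle(TRIPLES))
```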
• If both a subject and a predicate are used repeatedly, we can list their objects
separated by commas.
• For instance, if we want to extend our example to say that Baron Way Building
is not only located in Amsterdam but also in the Netherlands,
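• we can write the following Turtle:
@prefix swp: <http://www.semanticwebprimer.org/ontology/apartments.ttl#>.
@prefix dbpedia: <http://dbpedia.org/resource/>.
@prefix dbpedia-owl: <http://dbpedia.org/ontology/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
swp:BaronWayApartment swp:hasNumberOfBedrooms "3"^^xsd:integer;
swp:isPartOf swp:BaronWayBuilding.
swp:BaronWayBuilding dbpedia-owl:location dbpedia:Amsterdam,
dbpedia:Netherlands.
RDFa
• RDFa embeds RDF statements in the attributes of HTML elements. Consider the following web page: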
<html>
<body>
<H1> Baron Way Apartment for Sale</H1>
The Baron Way Apartment has three bedrooms and is located in the family
friendly Baron Way Building. The Apartment is located in the north of Amsterdam.
</body>
</html>
• This page does not contain any machine readable description. We can mark up the page using RDFa as follows:
<html xmlns:dbpedia="http://dbpedia.org/resource/"
xmlns:dbpediaowl="http://dbpedia.org/ontology/"
xmlns:swp="http://www.semanticwebprimer.org/ontology/apartments.ttl#"
xmlns:geo="http://www.geonames.org/ontology#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<body>
<H1> Baron Way Apartment for Sale</H1>
<div about="[swp:BaronWayApartment]">
The Baron Way Apartment has <span property="swp:hasNumberOfBedrooms" datatype="xsd:integer">3</span> bedrooms
and is located in the family friendly <span rel="swp:isPartOf" resource="[swp:BaronWayBuilding]">Baron Way Building</span>
<div about="[swp:BaronWayBuilding]">
The building is located in the north of Amsterdam.
<span rel="dbpediaowl:location" resource="[dbpedia:Amsterdam]"></span>
<span rel="dbpediaowl:location" resource="[dbpedia:Netherlands]"></span>
</div>
</div>
</body>
</html>
• This markup will produce the same RDF expressed above in Turtle.
• Since the RDF is encoded in the attributes of tags such as spans, paragraphs, and links, the
RDF will not be rendered by browsers when displaying the HTML page.
• Similar to RDF/XML, namespaces are encoded using the xmlns
declaration.
• In some cases, we must use square brackets to inform the parser that we are using
prefixes.
• Subjects are identified by the about attribute.
• Properties are identified by either a rel or property attribute.
• Rel attributes are used when the object of the statement is a resource (given by the resource attribute),
whereas a property attribute is used when the object of a statement is a
literal (the text content of the element).
•Properties are associated with subjects through the use of the
hierarchical structure of HTML.
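The rules just listed (xmlns prefixes, about for subjects, rel plus resource for resource objects, property for literal objects, subjects inherited through nesting) are enough to extract triples from the example page. The Python sketch below is a deliberately tiny extractor built on the standard `html.parser` module and a trimmed copy of the page. It is an illustration under those assumptions, not a conforming RDFa processor: it ignores @typeof and vocabulary defaults, and void elements such as <br> would unbalance its element stack. The class name `TinyRDFaParser` is hypothetical.

```python
from html.parser import HTMLParser

SWP = "http://www.semanticwebprimer.org/ontology/apartments.ttl#"

PAGE = """<html xmlns:swp="http://www.semanticwebprimer.org/ontology/apartments.ttl#"
      xmlns:dbpediaowl="http://dbpedia.org/ontology/"
      xmlns:dbpedia="http://dbpedia.org/resource/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<body>
<div about="[swp:BaronWayApartment]">
  The Baron Way Apartment has
  <span property="swp:hasNumberOfBedrooms" datatype="xsd:integer">3</span> bedrooms
  and is part of the
  <span rel="swp:isPartOf" resource="[swp:BaronWayBuilding]">Baron Way Building</span>.
  <div about="[swp:BaronWayBuilding]">
    <span rel="dbpediaowl:location" resource="[dbpedia:Amsterdam]"></span>
    <span rel="dbpediaowl:location" resource="[dbpedia:Netherlands]"></span>
  </div>
</div>
</body>
</html>"""


class TinyRDFaParser(HTMLParser):
    """Toy extractor for xmlns, about, rel/resource and property/datatype."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}      # prefix -> namespace URL, from xmlns:* attributes
        self.subjects = [None]  # stack: current subject for each open element
        self.pending = None     # literal being collected for a @property element
        self.triples = []

    def expand(self, curie):
        """Expand 'prefix:local' or the bracketed form '[prefix:local]'."""
        curie = curie.strip()
        if curie.startswith("[") and curie.endswith("]"):
            curie = curie[1:-1].strip()
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in a.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # @about starts a new subject; otherwise inherit the parent's subject.
        subject = self.expand(a["about"]) if "about" in a else self.subjects[-1]
        if "rel" in a and "resource" in a:   # object is a resource
            self.triples.append((subject, self.expand(a["rel"]), self.expand(a["resource"])))
        if "property" in a:                  # object is this element's text
            self.pending = {"s": subject, "p": self.expand(a["property"]),
                            "dt": a.get("datatype"), "text": [],
                            "depth": len(self.subjects)}
        self.subjects.append(subject)

    def handle_data(self, data):
        if self.pending is not None:
            self.pending["text"].append(data)

    def handle_endtag(self, tag):
        self.subjects.pop()
        # Emit the literal once the @property element itself has closed.
        if self.pending is not None and len(self.subjects) == self.pending["depth"]:
            value = "".join(self.pending["text"])
            dt = self.pending["dt"]
            obj = f'"{value}"^^<{self.expand(dt)}>' if dt else f'"{value}"'
            self.triples.append((self.pending["s"], self.pending["p"], obj))
            self.pending = None


parser = TinyRDFaParser()
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

The four extracted triples match the Turtle version of the example, which is the point that all RDF syntaxes share one data model.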
•Each syntax for RDF presented above is useful for different
situations.
•However, it is important to realize that even though different
syntaxes may be used, they all share the same underlying data
model and semantics.
•We have discussed how to write down statements about things identified by URLs.
•But what do those statements mean?
•How should a computer go about interpreting the statements
made?
RDFS: Adding Semantics
1. Classes and Properties
2. Class Hierarchies and Inheritance
3. Property Hierarchies
4. RDF versus RDFS Layers
•RDF is a universal language that lets users describe
resources using their own vocabularies.
•RDF does not make assumptions about any particular
application domain, nor does it define the semantics of
any domain.
•In order to specify these semantics, a developer or user of
RDF needs to define what those vocabularies mean in
terms of a set of basic domain independent structures
defined by RDF Schema.
1. Classes and Properties
• How do we describe a particular domain?
• Let us consider our domain of apartment rentals.
• First we have to specify the “things” we want to talk about.
• Here we make a first, fundamental distinction.
• On one hand, we want to talk about particular apartments, such as the
BaronWay Apartment, and particular locations, such as Amsterdam; we
have already done so in RDF.
• But we also want to talk about apartments, buildings, countries, cities, and
so on.
• What is the difference?
• In the first case we talk about individual objects (resources), and in the second about
classes that define types of objects.
• A class can be thought of as a set of elements.
• Individual objects that belong to a class are referred to as instances of that
class.
• RDF provides us a way to define the relationship between instances and
classes using a special property rdf:type.
• An important use of classes is to impose restrictions on what can be stated
in an RDF document using the schema.
• In programming languages, typing is used to prevent nonsense from being
written, such as A+1 where A is an array (we lay down that the arguments
of + must be numbers).
4. rdfs:subPropertyOf,
• which relates a property to one of its superproperties.
• Here is an example stating that all apartments are residential units:
swp:Apartment rdfs:subClassOf swp:ResidentialUnit.
• Note that rdfs:subClassOf and rdfs:subPropertyOf are transitive, by
definition.
• Also, it is interesting that rdfs:Class is a subclass of rdfs:Resource
(every class is a resource),
• and rdfs:Resource is an instance of rdfs:Class (rdfs:Resource is the class
of all resources, so it is a class!).
• For the same reason, every class is an instance of rdfs:Class.
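Because rdfs:subClassOf is transitive, an instance of swp:Apartment is also an instance of every class above it. This Python sketch computes that closure with a simple graph walk over a dictionary of direct subclass links taken from the housing ontology in these notes. The dictionary layout and the function names are assumptions of the sketch.

```python
# Direct rdfs:subClassOf links from the housing ontology in these notes.
SUBCLASS_OF = {
    "swp:Apartment": {"swp:ResidentialUnit"},
    "swp:House": {"swp:ResidentialUnit"},
    "swp:ResidentialUnit": {"swp:Unit"},
}


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of cls (subClassOf is transitive)."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for sup in subclass_of.get(current, ()):
            if sup not in seen:          # `seen` also guards against cycles
                seen.add(sup)
                todo.append(sup)
    return seen


def all_types(direct_types, subclass_of=SUBCLASS_OF):
    """An instance of a class is also an instance of each of its superclasses."""
    result = set(direct_types)
    for t in direct_types:
        result |= superclasses(t, subclass_of)
    return result


print(sorted(all_types({"swp:Apartment"})))
# ['swp:Apartment', 'swp:ResidentialUnit', 'swp:Unit']
```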
3. Core Properties for Restricting Properties
• The core properties for restricting properties are
• rdfs:domain, which specifies the domain of a property P and states that any
resource that has a given property is an instance of the domain classes.
• rdfs:range, which specifies the range of a property P and states that the
values of a property are instances of the range classes.
• Here is an example stating that whenever any resource has an
address, it is (by inference) a unit and that its value is a literal:
• swp:address rdfs:domain swp:Unit.
• swp:address rdfs:range rdfs:Literal.
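The phrase "by inference" is the key point: domain and range statements let a program derive new type facts. This Python sketch applies one pass of the two rules to the address example. The data triple with the value "Baron Way" is hypothetical, and terms are kept as plain qualified-name strings. One simplification is flagged in a comment: RDF does not allow a literal as a subject, so a real reasoner records the range conclusion for literals differently.

```python
# The schema from the example above plus one hypothetical data triple.
GRAPH = {
    ("swp:address", "rdfs:domain", "swp:Unit"),
    ("swp:address", "rdfs:range", "rdfs:Literal"),
    ("swp:BaronWayApartment", "swp:address", '"Baron Way"'),
}


def infer_domain_range(graph):
    """One pass of:  (x p y) + (p rdfs:domain C) => (x rdf:type C)
                     (x p y) + (p rdfs:range  C) => (y rdf:type C)"""
    domains = {(p, c) for p, rel, c in graph if rel == "rdfs:domain"}
    ranges = {(p, c) for p, rel, c in graph if rel == "rdfs:range"}
    inferred = set()
    for x, p, y in graph:
        inferred |= {(x, "rdf:type", c) for q, c in domains if q == p}
        # Simplification: RDF forbids literals as subjects, so a real reasoner
        # would not emit this triple verbatim for a literal value.
        inferred |= {(y, "rdf:type", c) for q, c in ranges if q == p}
    return inferred - graph


for triple in sorted(infer_domain_range(GRAPH)):
    print(triple)
```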
4. Useful Properties for Reification
•The following are some useful properties for
reification:
•rdf:subject, which relates a reified statement to its subject
•rdf:predicate, which relates a reified statement to its
predicate
•rdf:object, which relates a reified statement to its object
Reification:
The key idea behind reification is to introduce an auxiliary object, say, LocationStatement, and relate
it to each of the three parts of the original statement through the properties rdf:subject, rdf:predicate, and
rdf:object.
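Here is a Python sketch of that idea, applied to the statement that the Baron Way Building is located in Amsterdam. It assumes the auxiliary resource is a URI in the swp namespace (swp:LocationStatement is a hypothetical name). The sketch also types the auxiliary object as rdf:Statement, the RDF class for reified statements. Note that the reified graph does not contain the original triple itself: describing a statement is not the same as asserting it.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SWP = "http://www.semanticwebprimer.org/ontology/apartments.ttl#"


def reify(statement, triple):
    """Describe `triple` through the auxiliary resource `statement`."""
    s, p, o = triple
    return {
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", s),
        (statement, RDF + "predicate", p),
        (statement, RDF + "object", o),
    }


def unreify(graph, statement):
    """Recover the (subject, predicate, object) described by `statement`."""
    def part(name):
        return next(o for s, p, o in graph if s == statement and p == RDF + name)
    return part("subject"), part("predicate"), part("object")


location = (SWP + "BaronWayBuilding", "http://dbpedia.org/ontology/location",
            "http://dbpedia.org/resource/Amsterdam")
reified = reify(SWP + "LocationStatement", location)
print(len(reified), unreify(reified, SWP + "LocationStatement") == location)  # 4 True
```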
5. Container Classes
• RDF also allows for containers to be represented in a standard way.
• One can represent bags, sequences, or alternatives (i.e., choice).
• rdf:Bag, the class of bags,
• rdf:Seq, the class of sequences,
• rdf:Alt, the class of alternatives,
• rdfs:Container, a superclass of all container classes, including the three
preceding ones.
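In RDF, a container is an ordinary resource typed as rdf:Bag, rdf:Seq or rdf:Alt, whose members are attached with the numbered membership properties rdf:_1, rdf:_2, and so on. This Python sketch builds such triples and reads the members back in index order (the order matters for a sequence). The resource names swp:ShortList and swp:FloridaAveStudio are hypothetical, written as qualified-name strings for brevity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def make_container(node, kind, members):
    """Encode an rdf:Bag, rdf:Seq or rdf:Alt: one rdf:type triple plus one
    rdf:_1, rdf:_2, ... membership triple per member."""
    if kind not in ("Bag", "Seq", "Alt"):
        raise ValueError(f"not an RDF container class: {kind}")
    triples = {(node, RDF + "type", RDF + kind)}
    for i, member in enumerate(members, start=1):
        triples.add((node, f"{RDF}_{i}", member))
    return triples


def members_in_order(graph, node):
    """Read the members back, ordered by their rdf:_n index."""
    prefix = RDF + "_"
    indexed = [(int(p[len(prefix):]), o) for s, p, o in graph
               if s == node and p.startswith(prefix)]
    return [o for _, o in sorted(indexed)]


shortlist = make_container("swp:ShortList", "Seq",
                           ["swp:BaronWayApartment", "swp:FloridaAveStudio"])
print(members_in_order(shortlist, "swp:ShortList"))
```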
6. Utility Properties
•A resource may be defined and described in many places on
the web.
•The following properties allow us to define links to those
addresses:
• rdfs:seeAlso relates a resource to another resource that explains it.
• rdfs:isDefinedBy is a subproperty of rdfs:seeAlso and relates a
resource to the place where its definition, typically an RDF schema,
is found.
Often it is useful to provide more information intended for human
readers. This can be done with the following properties:
• rdfs:comment.
• Comments, typically longer text, can be associated with a resource.
• rdfs:label.
• A human-friendly label (name) is associated with a resource.
• Among other purposes, it may serve as the name of a node in a graphic
representation of the RDF document.
7. Example: Housing
We refer to the housing example, and provide a conceptual model of the domain, that is, an ontology.
@prefix swp: <http://www.semanticwebprimer.org/ontology/apartments.ttl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
swp:Person rdf:type rdfs:Class.
swp:Person rdfs:comment "The class of people".
swp:Unit rdf:type rdfs:Class.
swp:Unit rdfs:comment "A self-contained section of accommodations in a larger building or group of buildings.".
swp:ResidentialUnit rdf:type rdfs:Class.
swp:ResidentialUnit rdfs:subClassOf swp:Unit.
swp:ResidentialUnit rdfs:comment "The class of all units or places where people live.".
swp:Apartment rdf:type rdfs:Class.
swp:Apartment rdfs:subClassOf swp:ResidentialUnit.
swp:Apartment rdfs:comment "The class of apartments".
swp:House rdf:type rdfs:Class.
swp:House rdfs:subClassOf swp:ResidentialUnit.
swp:House rdfs:comment "The class of houses".
swp:residesAt rdf:type rdf:Property.
swp:residesAt rdfs:comment "Relates persons to their residence".
swp:residesAt rdfs:domain swp:Person.
swp:residesAt rdfs:range swp:ResidentialUnit.
swp:rents rdf:type rdf:Property.
swp:rents rdfs:comment "It inherits its domain (swp:Person) and range (swp:ResidentialUnit) from its superproperty (swp:residesAt)".
swp:rents rdfs:subPropertyOf swp:residesAt.
swp:address rdf:type rdf:Property.
swp:address rdfs:comment "Is a property of units and takes literals as its value".
swp:address rdfs:domain swp:Unit.
swp:address rdfs:range rdfs:Literal.
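To see what the comment on swp:rents means in practice, this Python sketch forward-chains four RDFS rules (subproperty, domain, range and subclass typing) over the housing schema until no new triples appear. The individual swp:john and the fact that he rents the Baron Way Apartment are hypothetical data. The naive nested loop is chosen for readability, not efficiency.

```python
# Schema triples from the housing ontology above (qualified names as strings).
SCHEMA = {
    ("swp:ResidentialUnit", "rdfs:subClassOf", "swp:Unit"),
    ("swp:Apartment", "rdfs:subClassOf", "swp:ResidentialUnit"),
    ("swp:House", "rdfs:subClassOf", "swp:ResidentialUnit"),
    ("swp:residesAt", "rdfs:domain", "swp:Person"),
    ("swp:residesAt", "rdfs:range", "swp:ResidentialUnit"),
    ("swp:rents", "rdfs:subPropertyOf", "swp:residesAt"),
}
# Hypothetical data: swp:john is an invented individual.
DATA = {("swp:john", "swp:rents", "swp:BaronWayApartment")}


def rdfs_closure(graph):
    """Naively apply four RDFS rules until nothing new is added (a fixpoint)."""
    g = set(graph)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                if p2 == "rdfs:subPropertyOf" and s2 == p:
                    new.add((s, o2, o))            # p subPropertyOf q: x p y => x q y
                elif p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))   # p domain C: x p y => x type C
                elif p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))   # p range C: x p y => y type C
                elif p2 == "rdfs:subClassOf" and p == "rdf:type" and s2 == o:
                    new.add((s, "rdf:type", o2))   # C subClassOf D: x type C => x type D
        if new <= g:
            return g
        g |= new


closure = rdfs_closure(SCHEMA | DATA)
for triple in sorted(closure - SCHEMA - DATA):
    print(triple)
```

Although swp:rents has no domain or range of its own, the sketch concludes that swp:john is a swp:Person and that the apartment is a swp:ResidentialUnit (and hence a swp:Unit), via its superproperty swp:residesAt.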
8. Example: Motor Vehicles
Here we present a simple ontology of motor vehicles
• A final example often comes as a surprise to people first looking at RDF Schema:
IF E contains the triples (?x, ?p, ?y)
and (?p, rdfs:range, ?u)
THEN E also contains the triple (?y, rdf:type, ?u)
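A likely reason this rule surprises newcomers is that rdfs:range behaves as an inference rule, not as a type check: a value that does not "fit" the range is not rejected, it is simply concluded to belong to the range class. This Python sketch implements the rule literally. The statement that swp:john resides at swp:TrainStation is a hypothetical, deliberately odd example.

```python
def apply_range_rule(graph):
    """IF (?x, ?p, ?y) and (?p, rdfs:range, ?u) THEN add (?y, rdf:type, ?u)."""
    ranges = {(p, u) for p, rel, u in graph if rel == "rdfs:range"}
    return graph | {(y, "rdf:type", u) for _, p, y in graph for q, u in ranges if p == q}


graph = {
    ("swp:residesAt", "rdfs:range", "swp:ResidentialUnit"),
    # Hypothetical, arguably mistaken data: someone "resides at" a train station.
    ("swp:john", "swp:residesAt", "swp:TrainStation"),
}
result = apply_range_rule(graph)
# No error is raised: the station is simply inferred to be a residential unit.
print(("swp:TrainStation", "rdf:type", "swp:ResidentialUnit") in result)  # True
```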