
Semantic Web and Ontologies

The document discusses semantic analysis in language technology and provides an overview of ontologies and the semantic web. It defines the semantic web as an evolution of the existing web where metadata provides machine-readable meaning. Ontologies are described as rich conceptual schemas that provide formal definitions and allow knowledge to be annotated and linked on the web in a machine-understandable way. The goal of the semantic web is to allow information and services on the web to be more effectively used by humans and automated tools.


Semantic Analysis in Language Technology

http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm

Ontologies and the Semantic Web

Marina Santini
[email protected]fil.uu.se

Department of Linguistics and Philology


Uppsala University, Uppsala, Sweden

Spring 2016
Acknowledgements
• Most slides are based on Horrocks (2008).

The Semantic Web & Ontologies 2


Outline
• The Semantic Web

• Ontologies



Chronology
http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web
• On August 6, 1991, Berners-Lee posted a short summary of the World Wide
Web project on the alt.hypertext newsgroup, inviting collaborators. This date
also marked the debut of the Web as a publicly available service on the
Internet, although new users could only access it after August 23.

• Beginning in 2002, new ideas for sharing and exchanging content ad hoc,
such as Weblogs and RSS, rapidly gained acceptance on the Web. This new
model for information exchange, primarily featuring user-generated and
user-edited websites, was dubbed Web 2.0.

• Popularized by Berners-Lee's book Weaving the Web (2000) and a Scientific
American article by Berners-Lee, James Hendler, and Ora Lassila, the term
Semantic Web describes an evolution of the existing Web in
which the network of hyperlinked human-readable web
pages is extended by machine-readable metadata about
documents and how they are related to each other,
enabling automated agents to access the Web more
intelligently and perform tasks on behalf of users.

• In 2006, Berners-Lee and colleagues stated that
the idea "remains largely unrealized".
Web 1.0
• Web 1.0 is a retronym referring to an early stage of the
World Wide Web's evolution.
• Some design elements of a Web 1.0 site include:

– Personal web pages were common, consisting mainly of
static pages
– Static pages instead of dynamic HTML.
– The use of HTML 3.2-era elements such as frames and tables
to position and align elements on a page (today CSS is used
for layout, and frames are deprecated)
– GIF buttons...


Web 2.0
• Web 2.0 describes World Wide Web sites that use technology
beyond the static pages of earlier Web sites.
• The key features of Web 2.0 include:

– Tagging: allows users to collectively classify and find information
– Rich User Experience: dynamic content; responsive to user input
– User Participation: information flows two ways between site owner and
site user by means of evaluation, review, and commenting.
– Site users add content for others to see
– Mass Participation: universal web access leads to differentiation of
concerns from the traditional internet userbase.
– etc.


Web 3.0
• “Web 3.0, a phrase coined by John Markoff of the New York Times in 2006,
refers to a supposed third generation of Internet-based services that
collectively comprise what might be called ‘the intelligent Web’, such as
those using semantic web, microformats, natural language search,
data-mining, machine learning, recommendation agents, and artificial
intelligence technologies, which emphasize machine-facilitated
understanding of information in order to provide a more productive and
intuitive user experience.”

• Web 3.0 will be more connected, open, and intelligent, with semantic Web
technologies, distributed databases, natural language processing, machine
learning, machine reasoning, and autonomous agents.
– http://lifeboat.com/ex/web.3.0

This has yet to happen.
The web: present and future
• "The Web was designed as an information space, with the
goal that it should be useful not only for human-human
communication, but also that machines would be able to
participate and help.

• One of the major obstacles to this has been the fact that
most information on the Web is designed for human
consumption, and even if it was derived from a database
with well defined meanings (in at least some terms) for its
columns, that the structure of the data is not evident to a
robot browsing the Web.

• Leaving aside the artificial intelligence problem of training
machines to behave like people, the Semantic Web
approach instead develops languages for expressing
information in a machine processable form."
– Tim Berners-Lee, The Semantic Web Roadmap, 1998
– http://www.w3.org/DesignIssues/Semantic.html


Today…
• The web is relatively simple:
– Hypertexts and hypermedia
– Access is engineered via a combination of
keyword-based search and link navigation.

This simplicity has been one of the great
strengths of the web, and has been an
important factor in its popularity and growth:
even naive users quickly learn to use it and
create their own content.


Shortcomings
Examples:
• Finding information about people with very
common names can be a frustrating experience.

• Answering more complex queries, along with
more general information retrieval, integration,
sharing and processing, can be difficult… We
have seen that…
Some solutions
• Software glue: Mashups
– Location information from one source might be combined with
map information from another source in order to show the
location of and provide directions to points of interest such as
hotels and restaurants.

• Tagging via social networks (Web 2.0)
– Harness the power of user communities in order to share and
annotate information.
• Examples include image and video sharing sites such as Flickr and
YouTube, and auction sites such as eBay.
– In these applications, annotations usually take the form of simple tags, such as
"beach", "birthday", "family" and "friends". The meaning of tags is, however,
typically not well defined, and may be impenetrable even to human users:
typical examples (from Flickr) include "sasquatchmusicfestival",
"celebritylookalikes", and "twab08".


The ”travel agent”
• The classic example of a semantic web application is an
automated travel agent that, given various constraints
and preferences, would offer the user suitable travel or
vacation suggestions.

• A key feature of such a "software agent" is that it
would not simply exploit a predetermined set of
information sources, but would search the web for
relevant information in much the same way that a
human user might do when planning a vacation.


The goal
• The goal of the Semantic Web is to allow web
information and services to be more
effectively exploited by humans and
automated tools.



Semantic Web
• The focus of the semantic web is to share data
instead of documents.

• In other words, it is a project that should provide
a common framework that allows data to be
shared and reused across application, enterprise,
and community boundaries.

• It is a collaborative effort led by the World Wide Web
Consortium (W3C).


Semantic Web & Ontologies
• How are we going to represent meaning and knowledge on the web?

• A key idea behind the semantic web is to address this problem by giving
web content machine-accessible semantics via annotation.

• Knowledge is represented in the form of rich conceptual schemas called
ontologies.

• Ontologies are the backbone of the Semantic Web.

• Ontologies are rich conceptual schemas that give formally defined
meanings to the terms used in annotations, transforming them into
semantic annotations.

• They provide the knowledge that is required for semantic
applications of all kinds.
Main Difficulty
• Current web content is intended for humans
(HTML markup with layout, images and other
presentational features).

• Humans understand this content, but
machines can’t.


Basically...
• Ontologies provide a shared understanding of a domain.

• They provide background knowledge to automatize certain tasks.

• By the process of annotation, knowledge can be linked to
ontologies.
– Example: “Angelina Jolie” (text) linked to the concept Actress
– In our ontology we also know that an actress is always female and a
person.

• Ontologies allow the creation of annotations, i.e. machine-readable and
machine-understandable content.

• If machines can understand content, they can also perform more
meaningful and intelligent queries.
– Distinction between Jaguar the animal and Jaguar the car.
– Combination of information that is distributed on the Web.


Old and New Issues
Old ones:
• Knowledge representation
• Reasoning
• Harnessing the idiosyncrasies of natural languages
• …

New ones:
• Integrating different ontologies may prove to be at least as
hard as integrating the resources that they describe
• Creation of suitable annotations
• …


Regardless of these issues…
• … considerable progress has been made in the
development of the infrastructure needed to
support the semantic web.

• In particular, there has been impressive
progress in the development of languages and
tools for content annotation and for the
design and deployment of ontologies.


Semantic Annotation
• To facilitate the process of semantic
annotation, RDF and OWL have been
developed as standard formats for the sharing
and integration of data and knowledge.
• RDF and OWL are standards:
– RDF (Resource Description Framework)
– OWL (Web Ontology Language)


Ontologies (Metaphysics)

[Figure: Tree of Porphyry]

• Ontology, in its original philosophical
sense, is a fundamental branch of
metaphysics focusing on the study of
existence.

• Its objective is to determine what
entities and types of entities actually
exist, and thus to study the
structure of the world.

• The study of ontology can be traced
back to the work of Plato and
Aristotle, and includes the
development of hierarchical
categorisations of different kinds of
entities and the features that
distinguish them.


Tree of Porphyry, III AD
• The Porphyrian tree, Tree of Porphyry or
Arbor Porphyriana is a classic device for
illustrating what is also called a "scale of
being". It was suggested by the 3rd century
AD Greek neoplatonist philosopher and
logician Porphyry.


Ontology (Computer Science, AI, LT, IR…)

• Engineering artefact, usually a model of some
aspect of the world.

• It introduces vocabulary describing various
aspects of the domain being modelled, and
provides an explicit specification of the intended
meaning of the vocabulary.

• This specification often includes classification-based
information, not unlike that in Porphyry's
tree.
What is an ontology (i)?
“An ontology is a formal, explicit specification of a shared conceptualization”
– formal: machine-readable
– explicit: concepts, properties, relations, functions, constraints and axioms are explicitly defined
– shared: consensual knowledge
– conceptualization: abstract model and simplified view of some phenomenon in the world that we want to represent
Studer, Benjamins, Fensel. Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering. 25 (1998) 161-197

An ontology is an explicit specification of a conceptualization
Gruber, T. A translation approach to portable ontology specifications. Knowledge Acquisition. Vol. 5. 1993. 199-220
What is an ontology (ii)?
• An ontology is a hierarchically structured set of terms for describing a
domain that can be used as a skeletal foundation for a
knowledge base
B. Swartout; R. Patil; K. Knight; T. Russ. Toward Distributed Use of Large-Scale Ontologies.
Ontological Engineering. AAAI-97 Spring Symposium Series. 1997. 138-148

• An ontology defines the basic terms and relations comprising
the vocabulary of a topic area, as well as the rules for
combining terms and relations to define extensions to the
vocabulary
Neches, R.; Fikes, R.; Finin, T.; Gruber, T.; Patil, R.; Senator, T.; Swartout, W.R. Enabling Technology
for Knowledge Sharing. AI Magazine. Winter 1991. 36-56

• An ontology provides the means for describing explicitly the
conceptualization behind the knowledge represented in a knowledge
base
A. Bernaras; I. Laresgoiti; J. Corera. Building and Reusing Ontologies for Electrical Network
Applications. ECAI96. 12th European Conference on Artificial Intelligence. Ed. John Wiley & Sons, Ltd.
298-302


Examples
• Top level ontology: Standard Upper Ontology
– In information science, an upper ontology (also known as a
top-level ontology or foundation ontology) is an ontology
which describes very general concepts that are the same
across all knowledge domains.

• Linguistic ontology: WordNet
• General ontology: Cyc, UNSPSC, eCl@ss
• Domain ontology: MeSH (Medical Subject Headings),
CHEMICALS, UMLS
• Research ontology: KA2 (Knowledge Acquisition
Community Ontology)


Resource Description Framework (i)

• A language that has been developed in order to
provide an extensible mechanism for describing web
resources and relationships between them.

• A key feature of RDF is the use of Internationalized
Resource Identifiers (IRIs, /ˈaɪˌɑːˌraɪ/), which are a
generalisation of Uniform Resource Locators (URLs),
to refer to resources.

• RDF is a very simple language: its underlying data
structure is a labelled directed graph, and its only
syntactic construct is the triple.
– A directed graph is a set of nodes connected by
edges, where the edges have a direction associated
with them.

• A triple consists of three components, referred to as
the subject, predicate and object.
RDF (ii)
• More formally, a triple represents a single edge
(labelled with the predicate) connecting two nodes
(labelled with the subject and object); it describes a
binary relationship between the subject and object via
the predicate.

• The predicate of a triple is always an IRI, and an IRI that is
used in the predicate position of a triple is called a
property.

• A set of triples is called an RDF graph.

• In order to facilitate the sharing and exchanging of graphs
on the web, an XML serialisation has also been defined.
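The triple model described on the two RDF slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace http://example.org/hogwarts# and the names Triple, properties and outgoing_edges are made up for the example, and no RDF library is used. Only the rdf:type IRI is the standard one.

```python
from typing import List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One labelled edge of the graph: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


EX = "http://example.org/hogwarts#"  # hypothetical namespace for the example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# An RDF graph is simply a set of triples.
graph: Set[Triple] = {
    Triple(EX + "HarryPotter", EX + "hasPet", EX + "Hedwig"),
    Triple(EX + "Hedwig", RDF_TYPE, EX + "Owl"),
}


def properties(g: Set[Triple]) -> Set[str]:
    """IRIs used in predicate position, i.e. the properties of the graph."""
    return {t.predicate for t in g}


def outgoing_edges(g: Set[Triple], node: str) -> List[Tuple[str, str]]:
    """(predicate, object) pairs for every edge that starts at `node`."""
    return sorted((t.predicate, t.object) for t in g if t.subject == node)


print(properties(graph))
print(outgoing_edges(graph, EX + "HarryPotter"))
```

Storing the graph as a plain set mirrors the definition "a set of triples is called an RDF graph": adding the same triple twice changes nothing.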
“Harry Potter has a pet called Hedwig…”
[Figures: the statement as an RDF graph and in RDF/XML]
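Since the slide's figures did not survive, here is a hedged sketch of what an RDF/XML serialisation of "HarryPotter hasPet Hedwig" could look like, generated with Python's standard xml.etree.ElementTree module. The hp namespace http://example.org/hogwarts# and the function name triple_to_rdfxml are assumptions for illustration, so the output is not necessarily the markup shown on the original slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HP_NS = "http://example.org/hogwarts#"  # hypothetical namespace for the example

# Ask ElementTree to use readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("hp", HP_NS)


def triple_to_rdfxml(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple whose three parts all live in the hp namespace."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The subject becomes an rdf:Description element identified by rdf:about.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": HP_NS + subject}
    )
    # The predicate becomes a child element pointing at the object IRI.
    ET.SubElement(
        description, f"{{{HP_NS}}}{predicate}", {f"{{{RDF_NS}}}resource": HP_NS + obj}
    )
    return ET.tostring(root, encoding="unicode")


xml_text = triple_to_rdfxml("HarryPotter", "hasPet", "Hedwig")
print(xml_text)
```

The subject, predicate and object of the triple map to rdf:about, the child element name and rdf:resource respectively.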


Lect 09: Relation Extraction
DBpedia: relation database that draws from Wikipedia

• Resource Description Framework (RDF) triples

subject                    predicate             object
Golden Gate Park           location              San Francisco
dbpedia:Golden_Gate_Park   dbpedia-owl:location  dbpedia:San_Francisco

• DBpedia: the DBpedia project uses the Resource
Description Framework (RDF) to represent the extracted
information and consists of 3 billion RDF triples, 580 million
extracted from the English edition of Wikipedia and 2.46
billion from other language editions (Wikipedia, March
2016).
… but … not enough…
• RDF is a very simple language, and its capabilities as an
ontology language are limited
– No cardinality constraints
– Not possible to describe conjunction of classes
– …

The cardinality of a set is a measure
of the "number of elements of
the set". For example, the set A
= {2, 4, 6} contains 3 elements,
and therefore A has a
cardinality of 3.


Need for a more expressive ontology language:
OWL (Web Ontology Language)
• Since the architecture of the web depends on agreed
standards, the World Wide Web Consortium (W3C) set
up a standardisation working group to develop a
standard for a web ontology language.

• The result of this activity was the OWL
ontology language standard.

• The integration of OWL with RDF has the advantage of
making OWL ontologies directly accessible to web-based
applications.
Back Story:
http://ileriseviye.wordpress.com/2011/11/01/why-web-ontology-language-is-abbreviated-as-owl-and-not-wol/


Description Logics (DLs)
• A key feature of OWL is its basis in Description
Logics, a family of logic-based knowledge
representation formalisms that have a formal
semantics based on first-order logic (FOL).


Description Logics
• We can use DLs to model an application domain. The focus is
then on:
– Representation of knowledge about categories
– The set of categories in an application domain is called a
terminology
– The terminology is arranged in a hierarchical organization called an
ontology, which captures superset and subset relations among
categories/concepts.
– In order to specify a hierarchical structure, we can use
subsumption relations between the appropriate concepts in a
terminology
– Subsumption is a form of inference: it determines whether a
superset/subset relation (based on the facts asserted in a
terminology) exists between two concepts.
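Subsumption over asserted subclass links can be sketched as follows. This is a deliberately minimal illustration: it only follows asserted links transitively, whereas real DL reasoners also derive subsumptions from class descriptions. The concept names and the functions superclasses and is_subsumed_by are assumptions made for the example.

```python
from typing import Dict, Set

# Asserted terminology: each concept maps to its direct superconcepts.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Owl": {"Bird"},
    "Phoenix": {"Bird"},
    "Bird": {"Animal"},
    "Animal": set(),
}


def superclasses(concept: str, hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable by following subclass links upwards."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(sub: str, sup: str, hierarchy: Dict[str, Set[str]]) -> bool:
    """True if every instance of `sub` must also be an instance of `sup`."""
    return sub == sup or sup in superclasses(sub, hierarchy)


print(is_subsumed_by("Owl", "Animal", SUBCLASS_OF))   # inferred, not asserted
print(is_subsumed_by("Animal", "Owl", SUBCLASS_OF))
```

"Owl is an Animal" is never stated directly; it follows from the two asserted links, which is the sense in which subsumption is a form of inference.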


In short, DLs are…
• … formalisms based on object-oriented
modelling, in which the domain is described in
terms of individuals (instances), concepts
(classes), and roles (properties/predicates):

– individuals, e.g., "Hedwig", are the basic elements of
the domain;
– concepts, e.g., "Owl", describe sets of individuals
having similar characteristics;
– roles, e.g., "hasPet", describe relationships between
pairs of individuals, such as "HarryPotter hasPet
Hedwig".


Axioms
• An OWL ontology consists of a set of axioms

• Example:
– Given the axiom C equivalentClass D, an individual is an instance of C if and
only if it is an instance of D.
– Combining axioms with class descriptions allows for easy extension of the
vocabulary by introducing new names as abbreviations for descriptions.

See the following axiom:

Class: HogwartsStudent
    EquivalentTo: Student and attendsSchool value Hogwarts

It introduces the class name HogwartsStudent, and asserts that its instances are
just those Students who attend Hogwarts.
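The effect of the HogwartsStudent axiom can be imitated with plain Python sets. This is a sketch under assumptions: the individuals, the attends_school pairs and the function hogwarts_students are invented for the example, and a real OWL reasoner works on the axiom itself rather than on hand-written code.

```python
from typing import Set, Tuple

# Hypothetical data: who is a Student, and who attends which school.
students: Set[str] = {"Harry", "Hermione", "Peter"}
attends_school: Set[Tuple[str, str]] = {
    ("Harry", "Hogwarts"),
    ("Hermione", "Hogwarts"),
    ("Peter", "Uppsala"),
    ("Hedwig", "Hogwarts"),  # attends Hogwarts but is not a Student
}


def hogwarts_students(
    students: Set[str], attends_school: Set[Tuple[str, str]]
) -> Set[str]:
    """Instances of 'Student and attendsSchool value Hogwarts'."""
    return {s for s in students if (s, "Hogwarts") in attends_school}


print(sorted(hogwarts_students(students, attends_school)))
```

Because the axiom is an equivalence, membership works in both directions: anyone matching the description is a HogwartsStudent, and anyone declared a HogwartsStudent is known to be a Student who attends Hogwarts.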


TBox & ABox
• Axioms describe constraints on the structure of the
domain:
– in DLs such a set of axioms is called a TBox (Terminology
Box).

• OWL also allows for axioms asserting facts about some
concrete situation, similar to data in a database setting:
– in DLs such a set of axioms is called an ABox (Assertion Box).


Decidability (i)
• Description Logics are fully-fledged logics and
so have a formal semantics.
• DLs can be seen as decidable subsets of
FOL with:
– individuals being equivalent to constants,
– concepts to unary predicates,
– roles to binary predicates.
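As a worked illustration of this correspondence, the Hedwig example can be written in first-order notation. The last line is an assumption-labelled extra: it shows how a DL axiom of the form "Phoenix is a subclass of: isPetOf only Wizard" would standardly translate.

```latex
\begin{align*}
\text{individual (constant):}\quad & \mathit{Hedwig} \\
\text{concept (unary predicate):}\quad & \mathit{Owl}(\mathit{Hedwig}) \\
\text{role (binary predicate):}\quad & \mathit{hasPet}(\mathit{HarryPotter}, \mathit{Hedwig}) \\
\mathit{Phoenix} \sqsubseteq \forall \mathit{isPetOf}.\mathit{Wizard}:\quad
  & \forall x\,\bigl(\mathit{Phoenix}(x) \rightarrow
    \forall y\,(\mathit{isPetOf}(x, y) \rightarrow \mathit{Wizard}(y))\bigr)
\end{align*}
```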


FOL … undecidable (sometimes)

• Church and Turing showed independently in 1936
that first-order logic is in general undecidable.
• That means there is no algorithm that, for every
statement in this logic, decides in finite time
whether it is valid or not.
• Ex: an algorithm deciding FOL validity could be
used to solve the Halting Problem


Halting Problem
• In 1936 Alan Turing proved that it's not possible to decide whether
an arbitrary program will eventually halt, or run forever.

• Stated as a task, the problem is to write a program (actually,
a Turing Machine*) that accepts as input another program and that
program's input, and decides, in finite time, whether the given
program will ever halt when run on that input.

• The halting problem is a cornerstone problem in computer science.
It is used mainly as a way to prove a given task is impossible, by
showing that solving that task would allow one to solve the halting
problem.

*A Turing machine is a hypothetical device that manipulates symbols
according to a table of rules. Despite its simplicity, a Turing machine
can be adapted to simulate the logic of any computer algorithm.
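The idea behind Turing's argument can be hinted at with a small sketch. It is not a proof: build_contrarian and decider_is_wrong are hypothetical names, and the "loop forever" branch is replaced by a label so that the demo itself terminates. Whatever a candidate decider answers about its own contrarian program, the program does the opposite.

```python
from typing import Callable

Decider = Callable[[Callable[[], str]], bool]  # True means "this program halts"


def build_contrarian(claims_to_halt: Decider) -> Callable[[], str]:
    """Build a program that does the opposite of what the decider predicts."""

    def contrarian() -> str:
        if claims_to_halt(contrarian):
            # A real contrarian would run `while True: pass` here; a label is
            # returned instead so that this demonstration terminates.
            return "loops forever"
        return "halts"

    return contrarian


def decider_is_wrong(claims_to_halt: Decider) -> bool:
    """Compare the decider's prediction with the contrarian's behaviour."""
    contrarian = build_contrarian(claims_to_halt)
    predicted_halts = claims_to_halt(contrarian)
    actually_halts = contrarian() == "halts"
    return predicted_halts != actually_halts


candidates = {
    "always answers 'halts'": lambda program: True,
    "always answers 'loops'": lambda program: False,
}
for name, decider in candidates.items():
    print(name, "-> wrong about its contrarian:", decider_is_wrong(decider))
```

Both toy deciders are caught out, and the same construction applies to any deterministic decider that always returns an answer.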


Decidability (ii)
• DLs give a precise and unambiguous meaning
to descriptions of the domain

• This also allows for the development of
reasoning algorithms that can provide correct
answers to arbitrarily complex queries
about the domain.


Reasoning:
OWL vs Databases
OWL axioms behave like inference rules rather than database constraints.

Class: Phoenix
SubClassOf: isPetOf only Wizard

Individual: Fawkes
Types: Phoenix
Facts: isPetOf Dumbledore

• Fawkes is said to be a Phoenix and to be the pet of Dumbledore, and it is also stated that only a
Wizard can have a pet Phoenix.

• In OWL, this leads to the implication that Dumbledore is a Wizard. That is, if we were to query the
ontology for instances of Wizard, then Dumbledore would be part of the answer.

• In a database setting the schema could include a similar statement about the Phoenix class, but in
this case it would be interpreted as a constraint on the data: adding the fact that Fawkes isPetOf
Dumbledore without Dumbledore being already known to be a Wizard would lead to an invalid
database state, and such an update would therefore be rejected by a database management
system as a constraint violation.
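The contrast between the two readings can be sketched in Python. This is an illustration only: owl_style, database_style and ConstraintViolation are hypothetical names, and the single rule is hard-coded rather than derived from the axiom by a reasoner.

```python
from typing import Dict, Set, Tuple

Types = Dict[str, Set[str]]   # individual -> classes it is known to belong to
Fact = Tuple[str, str]        # (pet, owner), read as "pet isPetOf owner"


class ConstraintViolation(Exception):
    """Raised by the database-style check when an update breaks the schema."""


def owl_style(types: Types, facts: Set[Fact]) -> Types:
    """Read 'Phoenix SubClassOf isPetOf only Wizard' as an inference rule."""
    inferred = {name: set(classes) for name, classes in types.items()}
    for pet, owner in facts:
        if "Phoenix" in inferred.get(pet, set()):
            inferred.setdefault(owner, set()).add("Wizard")
    return inferred


def database_style(types: Types, facts: Set[Fact]) -> None:
    """Read the same statement as a constraint that the data must satisfy."""
    for pet, owner in facts:
        if "Phoenix" in types.get(pet, set()) and "Wizard" not in types.get(owner, set()):
            raise ConstraintViolation(
                f"{owner} owns the Phoenix {pet} but is not a known Wizard"
            )


types: Types = {"Fawkes": {"Phoenix"}}
facts: Set[Fact] = {("Fawkes", "Dumbledore")}

print(owl_style(types, facts)["Dumbledore"])
try:
    database_style(types, facts)
except ConstraintViolation as error:
    print("rejected:", error)
```

The same input yields new knowledge in the first reading (Dumbledore is a Wizard) and a rejected update in the second.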



Ontology Development Tools
• State of the art ontology development tools, such
as SWOOP, Protégé, and TopBraid Composer, use
DL reasoners to provide feedback to the user about
the logical implications of their design:
– i.e. warnings about inconsistencies and synonyms.



WebProtégé
http://protege.stanford.edu/products.php#web-protege


VOWL:
Visual Notation for OWL Ontologies
http://vowl.visualdataweb.org/v2/


Domain-specific ontologies
• The availability of tools has contributed to the
increasingly widespread use of OWL, and it has
become the de facto standard for ontology
development in fields as diverse as
– Biology
– Medicine
– Geography
– Geology
– Agriculture
– Defence
– etc



Complex Queries
• The use of DL reasoners allows OWL ontology
applications to answer complex queries and to
provide guarantees about the correctness of the
result.
• Reliability and correctness are clearly important
features of any information system.

• They are particularly important if ontology-based
systems are to be used in safety-critical
applications such as medicine, where incorrect
reasoning could adversely impact patient care.
Standard Query Language
• It has long been recognised that the semantic
web, and semantic web knowledge
representation languages such as RDF and
OWL, would also benefit from the
availability of a standardised query language,
comparable to SQL for relational databases.
• A W3C standardisation working group was set
up, and has completed its work on the
SPARQL query language standard.
SPARQL Protocol and RDF Query Language

• … is an RDF query language, i.e. a query
language that can retrieve and manipulate
data stored in RDF format (i.e. triples).

• SPARQL allows for a query to consist of triple
patterns, conjunctions, disjunctions, and
optional patterns.
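How a conjunction of triple patterns is matched against a graph can be sketched without any SPARQL engine. This is a toy matcher, not SPARQL itself: the graph, the shortened names (no IRIs) and the functions match_pattern and query are assumptions for the example, and disjunctions and optional patterns are left out.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # variable name (starting with "?") -> bound value

GRAPH: Set[Triple] = {
    ("HarryPotter", "hasPet", "Hedwig"),
    ("Hedwig", "type", "Owl"),
    ("Dumbledore", "hasPet", "Fawkes"),
    ("Fawkes", "type", "Phoenix"),
}


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Conjunction of triple patterns: every pattern must match."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?owner ?pet WHERE { ?owner hasPet ?pet . ?pet type Owl }
owl_owners = query([("?owner", "hasPet", "?pet"), ("?pet", "type", "Owl")], GRAPH)
print(owl_owners)
```

The shared variable ?pet is what joins the two patterns: only bindings that satisfy both survive.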


Tags & Ontologies
• Tagging facilities within Web 2.0 applications
have shown how it might be possible for user
communities to collaboratively annotate
web content, and create simple forms of
ontology via the development of
hierarchically organised sets of tags, often
called folksonomies.


Challenges
• Currently hard to combine:
– Increased expressive power (by using more
sophisticated logics) with scalability (large ontologies)



Ontology Learning
• Ontology learning (ontology extraction, ontology generation, or ontology
acquisition) is the automatic or semi-automatic creation of ontologies,
including extracting the corresponding domain's terms and the
relationships between those concepts from a corpus of natural language
text, and encoding them with an ontology language for easy retrieval.

• As building ontologies manually is extremely labor-intensive and
time-consuming, there is great motivation to automate the process.

• Typically, the process starts by extracting terms and concepts or noun
phrases from plain text using linguistic processors such as part-of-speech
tagging and phrase chunking. Then statistical techniques, often based on
Machine Learning, are used to extract relations.
– http://en.wikipedia.org/wiki/Ontology_learning
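A very rough sketch of the two steps (term extraction, then relation extraction) is given below. It swaps in a much simpler technique than the one described: a single regular expression for "X is a Y" sentences stands in for part-of-speech tagging, chunking and statistical relation extraction. The corpus and the names IS_A, extract_is_a and term_frequencies are invented for the example.

```python
import re
from collections import Counter
from typing import List, Tuple

# Tiny hypothetical corpus of definitional sentences.
CORPUS = [
    "An owl is a bird.",
    "A phoenix is a bird.",
    "A wizard is a person.",
    "Hedwig lives at Hogwarts.",  # no is-a pattern, so it is skipped
]

# Stand-in for linguistic processing: one hand-written lexical pattern.
IS_A = re.compile(r"^(?:An?|The)\s+(\w+)\s+is\s+an?\s+(\w+)\.$", re.IGNORECASE)


def extract_is_a(sentences: List[str]) -> List[Tuple[str, str, str]]:
    """Return (term, 'is-a', broader term) for every matching sentence."""
    relations = []
    for sentence in sentences:
        match = IS_A.match(sentence.strip())
        if match:
            relations.append((match.group(1).lower(), "is-a", match.group(2).lower()))
    return relations


def term_frequencies(relations: List[Tuple[str, str, str]]) -> Counter:
    """Count how often each extracted term occurs in the relations."""
    return Counter(term for child, _, parent in relations for term in (child, parent))


relations = extract_is_a(CORPUS)
print(relations)
print(term_frequencies(relations).most_common(1))
```

The extracted is-a pairs could then be encoded as subclass statements in an ontology language, which is the final step mentioned on the slide.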


In summary…
Why build an ontology?

• To share common understanding of the
structure of information among people or
software agents
• To enable reuse of domain knowledge
• To make domain assumptions explicit
• To analyze domain knowledge
How to build an ontology
Generally speaking (and roughly put), when
designing an ontology, four main components
are used:
1. Classes
2. Relations
3. Axioms
4. Instances


Classes
• Concepts of the domain or tasks, which are
usually organized in taxonomies
Ex: in a university ontology, student and
professor are two classes


Relations
A type of interaction between concepts of the
domain
Ex: subclass-of or is-a are relations


Axioms
Assertions that are always true for the domain
of interest

Ex: if a student attends both “Math” and “Basic
text processing” courses, then he or she must
be a 1st year student.


Instances
Represent specific elements
Ex: a student called Peter is an instance of the
Student class
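The four components can be put together in one small sketch of the university example. It is an illustration under assumptions: the class UniversityOntology, the relation name "attends" and the class name FirstYearStudent are invented, and a real ontology would be written in OWL rather than Python.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class UniversityOntology:
    # 1. Classes, organised in a small taxonomy (class -> parent class).
    subclass_of: Dict[str, str] = field(default_factory=lambda: {
        "Student": "Person",
        "Professor": "Person",
        "FirstYearStudent": "Student",
    })
    # 2. Relations between elements, stored as (subject, relation, object).
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)
    # 4. Instances: individual -> classes it is known to belong to.
    instances: Dict[str, Set[str]] = field(default_factory=dict)

    def apply_axiom(self) -> None:
        """3. Axiom: a student attending both courses is a first-year student."""
        required = {"Math", "Basic text processing"}
        for individual, classes in self.instances.items():
            if "Student" not in classes:
                continue
            attended = {
                obj for subj, rel, obj in self.relations
                if subj == individual and rel == "attends"
            }
            if required <= attended:
                classes.add("FirstYearStudent")


ontology = UniversityOntology()
ontology.instances["Peter"] = {"Student"}
ontology.relations |= {
    ("Peter", "attends", "Math"),
    ("Peter", "attends", "Basic text processing"),
}
ontology.apply_axiom()
print(sorted(ontology.instances["Peter"]))  # ['FirstYearStudent', 'Student']
```

Peter is only asserted to be a Student; the axiom adds the more specific class once both course relations are present.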


Important!
• There is no single correct class hierarchy
for any given domain.
• The hierarchy depends on the possible uses of
the ontology.
• The level of detail depends on the applications
and purposes.


The end
