Lecture_5

This lecture discusses the differences between XML and HTML, highlighting XML's advantages in structural information and machine readability. It covers the role of XML in the Semantic Web, its standard structure, and the importance of well-formed documents and namespaces. Additionally, it introduces addressing and querying XML documents using path expressions and XPath examples.


LECTURE 5

Knowledge Representation: AI 303

By
Dr. Ashraf Hendam
OUTLINE
• XML vs HTML
• Problems with Automated Interpretation of HTML Documents
• HTML vs XML: Structural Information
• HTML vs XML: Different Use of Tags
• Role of XML in the Semantic Web
• XML Standard structure
• Addressing & Querying XML Documents
• Types of Path Expressions
Semantic Web layers
XML vs HTML
• HTML
<h2>Nonmonotonic Reasoning: Context-Dependent Reasoning</h2>
<i>by <b>V. Marek</b> and
<b>M. Truszczynski</b>
</i>
<br>Springer 1993
<br> ISBN 0387976892
• XML
<book>
<title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
<author>V. Marek</author> <author>M. Truszczynski</author>
<publisher>Springer</publisher> <year>1993</year>
<ISBN>0387976892</ISBN>
</book>
XML vs HTML
• Both use tags (e.g. <h2> and <year>)
• Tags may be nested (tags within tags)
• Human users can read and interpret
both HTML and XML representations
quite easily
• … But how about machines?
Problems with Automated Interpretation of
HTML Documents
• An intelligent agent trying to retrieve the names of
the authors of the book
• Authors’ names could appear immediately after
the title or immediately after the word by
• Are there two authors?
• Or just one, called “V. Marek and M. Truszczynski”?
HTML vs XML: Structural Information
• HTML documents do not contain structural information,
i.e., the pieces of the document and their relationships
• HTML describes only presentation
• XML is more easily accessible to machines because
– every piece of information is described
– relations are also defined through the nesting structure
– e.g., the <author> tags appear within the <book> tags,
so they describe properties of that particular book
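The point about nesting can be sketched in a few lines of Python. This is a minimal illustration using the standard-library xml.etree.ElementTree module and the book record from the earlier slide; the helper name author_names is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book>
  <title>Nonmonotonic Reasoning: Context-Dependent Reasoning</title>
  <author>V. Marek</author>
  <author>M. Truszczynski</author>
  <publisher>Springer</publisher>
  <year>1993</year>
  <ISBN>0387976892</ISBN>
</book>
"""

def author_names(xml_text):
    """Return the text of every <author> child of the root <book> element."""
    book = ET.fromstring(xml_text)
    # The nesting tells us these authors belong to this book;
    # no guessing from position or from the word "by" is needed.
    return [author.text for author in book.findall("author")]

print(author_names(BOOK_XML))
```

Because each author has its own element, the question "one author or two?" does not arise for the program.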
HTML vs XML: Structural Information
• A machine processing the XML document can deduce that
the author element refers to the enclosing book
element from the nesting itself, rather than from proximity
• XML allows the definition of constraints on values
– e.g., a year must be a four-digit number
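In practice such constraints are declared in a DTD or schema rather than written by hand. As a rough sketch of what the four-digit rule means, the following Python check applies it directly; the function name year_is_valid and the choice of a regular expression are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

def year_is_valid(book_xml):
    """True if the <year> child of the root element holds exactly four digits."""
    year = ET.fromstring(book_xml).findtext("year", default="")
    return re.fullmatch(r"[0-9]{4}", year.strip()) is not None

print(year_is_valid("<book><year>1993</year></book>"))
print(year_is_valid("<book><year>93</year></book>"))
```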
HTML vs XML: Another Example
In HTML
<h2>Relationship force-mass</h2>
<i> F = M × a </i>
In XML
<equation>
<meaning>Relationship force-mass</meaning>
<leftside> F </leftside>
<rightside> M × a </rightside>
</equation>
HTML vs XML: Different Use of Tags
• HTML documents all use the same fixed set of tags
• XML documents use completely different tags for different meanings
• HTML tags define display: color, lists …
• XML tags are not fixed: users define their own tags
• XML is a meta markup language:
– a language for defining markup languages
Role of XML in the Semantic Web

• The Semantic Web involves ideas and languages
at a fairly abstract level, e.g., for defining
ontologies and publishing data using them
• We also need a practical way of encoding the
abstract languages
• Today’s Web technology is (still) largely based on
XML standards
Role of XML in the Semantic Web
• XML is a
(1) source of many key SW concepts and technology
(2) potential alternative the SW must improve on
(3) common serialization for SW data.
• XML: Meant for Serialization
• A serialization format is a way to encode information so that
when it’s passed between machines it can be parsed.
• In fact, the popularity of XML is due to its addressing the
problem of too many file formats.
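A minimal sketch of serialization as a round trip, assuming Python's standard-library xml.etree.ElementTree module: data is built in memory, encoded as an XML string, and parsed again on the receiving side. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Build a small in-memory structure (values taken from the book example).
book = ET.Element("book")
ET.SubElement(book, "title").text = "Nonmonotonic Reasoning: Context-Dependent Reasoning"
ET.SubElement(book, "year").text = "1993"

wire_format = ET.tostring(book, encoding="unicode")  # serialize to a string
received = ET.fromstring(wire_format)                # parse on the receiving side

print(wire_format)
print(received.findtext("year"))
```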
XML Standard structure
• An XML document consists of
1. a prolog
2. a number of elements
Prolog of an XML Document
The prolog contains the XML declaration and, optionally, a document
type declaration that references a DTD (Document Type Definition).
If the prolog is present, it must come at the very beginning of the
document.
<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE book SYSTEM "book.dtd">
XML Elements

Elements are the “things” the XML document talks about
– E.g., books, authors, publishers
An element consists of:
– an opening tag
– the content
– a closing tag
<lecturer> David Billington </lecturer>
Content of XML Elements
• Content is what’s between the tags
• It can be text, or other elements, or nothing
<lecturer>
<name>David Billington</name>
<phone>+61-7-3875 507</phone>
</lecturer>
• If there is no content, then element is called empty; it can be
abbreviated as follows:
• <lecturer/> = <lecturer></lecturer>
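The equivalence of the two notations can be checked with a short sketch, assuming Python's standard-library parser; the helper is_empty is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<lecturer/>")
long_form = ET.fromstring("<lecturer></lecturer>")

def is_empty(element):
    """An element is empty if it has no text content and no child elements."""
    return not (element.text or "").strip() and len(element) == 0

print(is_empty(short_form), is_empty(long_form))
```

Both forms parse to the same in-memory element, so a program cannot tell which notation was used.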
XML Attributes
An empty element isn’t necessarily meaningless
– It may have properties expressed as attributes
An attribute is a name-value pair inside the opening tag of an
element
<order orderNo="23456"
customer="John Smith"
date="October 15, 2017">
<item itemNo="a528" quantity="1"/>
<item itemNo="c817" quantity="3"/>
</order>
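As a sketch of how a program reads attributes, the order above can be parsed with Python's standard library. Attribute values always arrive as strings, so the quantities are converted explicitly; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order orderNo="23456" customer="John Smith" date="October 15, 2017">
  <item itemNo="a528" quantity="1"/>
  <item itemNo="c817" quantity="3"/>
</order>
"""

order = ET.fromstring(ORDER_XML)
customer = order.get("customer")  # attribute lookup by name

# Map each item number to its quantity (attribute values are strings).
quantities = {item.get("itemNo"): int(item.get("quantity"))
              for item in order.findall("item")}
total_items = sum(quantities.values())

print(customer, quantities, total_items)
```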
Well-Formed XML Documents
Well-formedness constraints define syntactically correct documents:
– Only one outermost element (root element)
– Each element has an opening tag and a corresponding closing tag
(except self-closing tags like <foo/>)
– Tags may not overlap; the following is not allowed:
<author><name>Lee Hong</author></name>
– Attributes within an element have unique names
– Element and tag names must be permissible
e.g.: names can’t begin with a digit, as in “2ndbest”
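One way to test these rules is to hand a document to a parser and see whether it is rejected. The sketch below assumes Python's standard-library parser, which raises ParseError for documents that are not well-formed; the function name is_well_formed is introduced for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested correctly": "<author><name>Lee Hong</name></author>",
    "overlapping tags": "<author><name>Lee Hong</author></name>",
    "two root elements": "<a/><b/>",
    "name starts with digit": "<2ndbest/>",
    "duplicate attribute": '<a x="1" x="2"/>',
}
for label, text in samples.items():
    print(label, is_well_formed(text))
```

Note that this only checks well-formedness; it says nothing about whether a document follows a particular DTD or schema.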
The Tree Model of XML Docs
The tree representation of an XML document is an
ordered labeled tree:
– There is exactly one root
– There are no cycles
– Each non-root node has exactly one parent
– Each node has a label.
– The order of elements is important
– … but the order of attributes is not
The Tree Model of XML Docs
<email>
<head>
<from name="Michael Maher" address="[email protected]" />
<to name="Grigoris Antoniou" address="[email protected]" />
<subject>Where is your draft?</subject>
</head>
<body>
Grigoris, where is the draft of the paper you promised me last week?
</body>
</email>
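The tree view of this document can be sketched as a pre-order traversal in Python. The address attributes are left out of the sample because they are not legible in the slide; the function name outline is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """
<email>
  <head>
    <from name="Michael Maher"/>
    <to name="Grigoris Antoniou"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>Grigoris, where is the draft of the paper you promised me last week?</body>
</email>
"""

def outline(element, depth=0):
    """Yield (depth, tag) pairs in document order (pre-order traversal)."""
    yield depth, element.tag
    for child in element:  # children are kept in document order
        yield from outline(child, depth + 1)

root = ET.fromstring(EMAIL_XML)
for depth, tag in outline(root):
    print("  " * depth + tag)

# Attribute order is not significant: both elements carry the same attributes.
same_attributes = (ET.fromstring('<a x="1" y="2"/>').attrib
                   == ET.fromstring('<a y="2" x="1"/>').attrib)
```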
Structuring XML Documents
An XML document is valid if
– it is well-formed XML
– it respects the structuring information it uses
Ways to define the structure of XML documents:
– DTDs (Document Type Definitions) came first and are
based on SGML’s approach
– XML Schema (aka XML Schema Definition, XSD) is
more recent and more expressive
– RELAX NG and DSDs are two alternatives
Namespaces
• XML namespaces provide uniquely named
elements & attributes in an XML document
• An XML document may use more than one DTD or schema
– Since each was developed independently,
name collisions can occur
– Solution: use a different prefix for each DTD
or schema
prefix:name
Namespaces even more important in RDF
Namespace Declarations
• Namespaces are declared within an element for use
in it and in its children (elements and attributes)
• A namespace declaration has the form:
– xmlns:prefix="location"
– location is the URL of the DTD or XML
schema
• If no prefix is specified, as in xmlns="location", then
the location is used as the default namespace
We’ll see this same idea used in RDF
Namespace Declarations
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
xmlns:gu="http://www.gu.au/empDTD"
xmlns:uky="http://www.uky.edu/empDTD">
<uky:faculty uky:title="assistant professor" uky:name="John Smith"
uky:department="Computer Science"/>
<gu:academicStaff gu:title="lecturer" gu:name="Mate Jones"
gu:school="Information Technology"/>
</vu:instructors>
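A sketch of how a parser treats these prefixes, assuming Python's standard library and plain http:// namespace URIs: each prefixed name is expanded to the form {namespace}name, so the prefix itself is only shorthand. The NS dictionary is a helper introduced for this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
                xmlns:gu="http://www.gu.au/empDTD"
                xmlns:uky="http://www.uky.edu/empDTD">
  <uky:faculty uky:title="assistant professor" uky:name="John Smith"
               uky:department="Computer Science"/>
  <gu:academicStaff gu:title="lecturer" gu:name="Mate Jones"
                    gu:school="Information Technology"/>
</vu:instructors>
"""

# Prefix-to-namespace map used only for lookups in this script.
NS = {
    "vu": "http://www.vu.com/empDTD",
    "gu": "http://www.gu.au/empDTD",
    "uky": "http://www.uky.edu/empDTD",
}

root = ET.fromstring(DOC)
print(root.tag)  # the prefix is replaced by the full namespace in braces

faculty = root.find("uky:faculty", NS)
faculty_name = faculty.get("{http://www.uky.edu/empDTD}name")

staff = root.find("gu:academicStaff", NS)
staff_school = staff.get("{http://www.gu.au/empDTD}school")
print(faculty_name, staff_school)
```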
Addressing & Querying XML Documents
• In relational databases, parts of a database can be selected
and retrieved using SQL
– Also very useful for XML documents
– Query languages: XQuery, XQL, XML-QL
• The central concept of XML query languages is
a path expression
– Specifies how a node or set of nodes, in the
tree representation, can be reached
• Useful for extracting data from XML
Types of Path Expressions
• Absolute (starting at the root of the tree)
– Syntactically they begin with the symbol /
– It refers to the root of the document (one
level above document’s root element)
• Relative to a context node
An XML Example
<library location="Bremen">
<author name="Henry Wise">
<book title="Artificial Intelligence"/>
<book title="Modern Web Services"/>
<book title="Theory of Computation"/>
</author>
<author name="William Smart">
<book title="Artificial Intelligence"/>
</author>
<author name="Cynthia Singleton">
<book title="The Semantic Web"/>
<book title="Browser Technology Revised"/>
</author>
</library>
Converting XML file to tree representation
To convert an XML file to a tree representation, the following rules
are used:
• Elements are represented by ovals
• Attributes are also represented by ovals
• Solid lines connect elements to their child elements and
elements to their attributes
• Attribute values are represented by rectangles
• Dotted lines connect attributes to their values
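The drawing rules can be approximated in text form. In this sketch, which assumes Python's standard library, parentheses stand for ovals, square brackets for rectangles, and "..." for a dotted line; these text conventions and the function name tree_lines are choices made for the example, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def tree_lines(element, depth=0):
    """Return indented lines: each element, then its attributes with their values."""
    pad = "  " * depth
    lines = [f"{pad}({element.tag})"]                    # oval: element node
    for name, value in element.attrib.items():
        lines.append(f"{pad}  ({name}) ... [{value}]")   # oval, dotted line, rectangle
    for child in element:                                # solid lines to child elements
        lines.extend(tree_lines(child, depth + 1))
    return lines

DOC = ('<library location="Bremen"><author name="Henry Wise">'
       '<book title="Artificial Intelligence"/></author></library>')
print("\n".join(tree_lines(ET.fromstring(DOC))))
```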
Converting tree representation to XML file assignment
bookstore
  book
    category
      cooking
    title
      lang
        en
    author
      name
        Giada
    price
      currency
        dollar
      amount
        30
    edition
      order
        first
      year
        2005


Well-Formed XML Documents Example

Not well-formed (name starting with a digit, duplicate attribute, overlapping tags):
<bookstore>
<book>
<7title lang="en" lang="ar">Harry Potter</title>
<price>29.99</book></price>
</book>
<book>
<title lang="en">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>

Well-formed:
<bookstore>
<book>
<title lang="en">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="en">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
Examples of Path Expressions in XPath
Q1: /library/author
– Addresses all author elements that are children of
the library element node immediately below root
– /t_1/.../t_n, where each t_{i+1} is a child node of t_i, is a
path through the tree representation
Q2: //author
– Consider all elements in document and check
whether they are of type author
– Path expression addresses all author elements
anywhere in the document
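Q1 and Q2 can be tried out with Python's standard-library ElementTree, which supports a limited subset of XPath. As an assumption of this sketch, the paths are written relative to the parsed root element (the library element), so /library/author becomes "author" and //author becomes ".//author".

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)   # root is the <library> element

q1 = root.findall("author")         # Q1: author children of library
q2 = root.findall(".//author")      # Q2: author elements at any depth

q1_names = [a.get("name") for a in q1]
q2_names = [a.get("name") for a in q2]
print(q1_names)
print(q2_names)
```

In this document both queries return the same three authors, because every author element happens to be a direct child of library.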
Examples of Path Expressions in XPath
Q3: /library/@location
– Addresses location attribute nodes within library
element nodes
– The symbol @ is used to denote attribute nodes
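Q4: //book/@title="Artificial Intelligence"
– Intended to address all title attribute nodes within book
elements anywhere in the document that have the
value “Artificial Intelligence”
– Evaluated strictly, XPath 1.0 treats this expression as a
comparison that returns true or false; the attribute nodes
themselves are selected by //book/@title[.="Artificial Intelligence"]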
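A sketch of Q3 and Q4 in Python's standard-library ElementTree. Its XPath subset returns elements rather than attribute nodes, so as an assumption of this example the attribute is read with get(), and Q4 is approximated by selecting the book elements whose title has the given value.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

# Q3: the location attribute of the library element.
location = root.get("location")

# Q4 (approximation): book elements anywhere whose title attribute matches.
ai_books = root.findall(".//book[@title='Artificial Intelligence']")
ai_titles = [book.get("title") for book in ai_books]

print(location)
print(ai_titles)
```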
Examples of Path Expressions in XPath
Q6: Address first author element node in the XML
document
//author[1]
Q7: Address last book element within the first
author element node in the document
//author[1]/book[last()]
Q8: Address all book element nodes without a title
attribute
//book[not(@title)]
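Q6 to Q8 can be sketched with Python's standard-library ElementTree. Its XPath subset handles position predicates such as [1] and [last()] but has no not() function, so Q8 is written here as an ordinary Python filter; the helper name books_without_title is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)

first_author = root.find(".//author[1]")            # Q6
last_book = root.find(".//author[1]/book[last()]")  # Q7

def books_without_title(tree_root):
    """Q8: book elements that carry no title attribute."""
    return [book for book in tree_root.iter("book") if "title" not in book.attrib]

print(first_author.get("name"))
print(last_book.get("title"))
print(len(books_without_title(root)))
```

Every book in the sample library has a title, so Q8 selects nothing here.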
Examples of Path Expressions in XPath
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
<title lang="en">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang="en">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
Examples of Path Expressions in XPath
Expression         Description
nodename           Selects all nodes with the name "nodename"
/                  Selects from the root node
//                 Selects nodes in the document from the current node that match the selection, no matter where they are
.                  Selects the current node
..                 Selects the parent of the current node
@                  Selects attributes

/bookstore         Selects the root element bookstore
bookstore/book     Selects all book elements that are children of bookstore
//book             Selects all book elements no matter where they are in the document
bookstore//book    Selects all book elements that are descendants of the bookstore element
//@lang            Selects all attributes that are named lang
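Several rows of the table can be tried on the bookstore document with Python's standard-library ElementTree. As assumptions of this sketch, paths are written relative to the parsed bookstore element, and //@lang is approximated by reading the lang attribute of every title that has one, since ElementTree returns elements rather than attribute nodes.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)           # the bookstore element

children = root.findall("book")                # bookstore/book
descendants = root.findall(".//book")          # //book
titles = [t.text for t in root.findall("book/title")]
parents_of_price = [p.tag for p in root.findall(".//price/..")]   # .. selects the parent
langs = [t.get("lang") for t in root.findall(".//title[@lang]")]  # stands in for //@lang

print(len(children), len(descendants))
print(titles, parents_of_price, langs)
```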
Examples of Path Expressions in XPath
Expression                       Description
[1]                              Selects the first matching child of the parent element
[last()]                         Selects the last matching child of the parent element
[position()<3]                   Selects the first two matching children of the parent element
//element[@attribute]            Selects all elements that have an attribute with the given name

/bookstore/book[1]               Selects the first book element that is a child of the bookstore element
/bookstore/book[last()]          Selects the last book element that is a child of the bookstore element
/bookstore/book[position()<3]    Selects the first two book elements that are children of the bookstore element
//title[@lang]                   Selects all the title elements that have an attribute named lang
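The predicates can be sketched on the same bookstore document with Python's standard-library ElementTree. It understands [1], [last()] and [@lang], but not position()<3, so as an assumption of this example the "first two" case is written as a list slice.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

first_book = root.find("book[1]")             # /bookstore/book[1]
last_book = root.find("book[last()]")         # /bookstore/book[last()]
first_two = root.findall("book")[:2]          # stands in for book[position()<3]
with_lang = root.findall(".//title[@lang]")   # //title[@lang]

print(first_book.findtext("title"))
print(last_book.findtext("title"))
print(len(first_two), len(with_lang))
```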
