Siam6 PDF

This document discusses integrating XML and databases. It provides an overview of XML, including its origins, usages, and key specifications like XML Schema. It describes the structure of XML documents, including elements, attributes, IDs, and DTDs. It also covers XML Schema which adds data typing and constraints compared to DTDs. The document is intended to provide background on XML and how its structure can be used to represent and integrate data.

Uploaded by

Cherry Mea Bahoy

Integrating

XML and Database


XML Origin and Usages
■ Defined by the WWW Consortium (W3C)
■ Originally intended as a document markup language, not a database
language
■ Documents have tags giving extra information about sections of the document
■ For example:
■ <title> XML </title>
■ <slide> XML Origin and Usages </slide>
■ Meta-language: used to define arbitrary XML languages/vocabularies (e.g. XHTML)
■ Derived from SGML (Standard Generalized Markup Language)
■ standard for document description
■ enables document interchange in publishing, office, engineering, …
■ main idea: separate form from structure
■ XML is simpler to use than SGML
■ roughly 20% complexity achieves 80% functionality
XML Origin and Usages (cont.)
■ XML documents are to some extent self-describing
■ Tags represent metadata
■ Metadata and data are combined in the same document
■ semi-structured data modeling
■ Example
<bank>
<account>
<account-number> A-101 </account-number>
<branch-name> Downtown </branch-name>
<balance> 500 </balance>
</account>
<depositor>
<account-number> A-101
</account-number>
<customer-name> Johnson </customer-name>
</depositor>
</bank>
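A minimal sketch of what "self-describing" means in practice, assuming Python's standard xml.etree.ElementTree module (Python is not used in the original slides; the helper name to_records is hypothetical). The tag names alone are enough to turn each child of the bank element into a labelled record, without any external schema:

```python
import xml.etree.ElementTree as ET

BANK_XML = """
<bank>
  <account>
    <account-number> A-101 </account-number>
    <branch-name> Downtown </branch-name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account-number> A-101 </account-number>
    <customer-name> Johnson </customer-name>
  </depositor>
</bank>
"""

def to_records(xml_text):
    """Turn each child of the root into (tag, {field tag: text}), driven only by the tags."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, {field.tag: field.text.strip() for field in child})
        for child in root
    ]

records = to_records(BANK_XML)
for kind, fields in records:
    print(kind, fields)
```

The field names come from the document itself (metadata and data travel together), which is the semi-structured modeling idea named on the slide.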
Forces Driving XML
■ Document Processing
■ Goal: use document in various, evolving systems
■ structure – content – layout
■ grammar: markup vocabulary for mixed content
■ Data Bases and Data Exchange
■ Goal: data independence
■ structured, typed data – schema-driven – integrity constraints
■ Semi-structured Data and Information Integration
■ Goal: integrate autonomous data sources
■ data source schema not known in detail – schemata are dynamic
■ schema might be revealed through analysis only after data processing
XML Language Specifications
[Figure: map of XML-related specifications. Labels in the figure: XSL (XSLT, XSL-FO), XML Link, XML Pointer, XPath, XQuery, XML Schema, XML Namespace, XML Metadata Interchange, XHTML, eXtensible Markup Language, Cascading Style Sheets, Unified Modeling Language, Standard Generalized Markup Language, Meta Object Facility, Unicode, Document Type Definition]
XML Documents
■ XML documents are text (Unicode)
■ markup (always starts with '<' or '&')
■ start/end tags
■ references (e.g., &lt;, &amp;, …)
■ declarations, comments, processing instructions, …
■ data (character data)
■ characters '<' and '&' need to be indicated using references (e.g., &lt;) or using the
character code
■ alternative syntax: <![CDATA[ (a<b)&(c<d) ]]>
■ XML documents are well-formed
■ logical structure
■ (optional) prolog (XML version, …)
■ (optional) schema
■ root element (possibly nested)
■ comments, …
■ correct sequence of start/end tags (nesting)
■ uniqueness of attribute names
■ …
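The rules above can be checked with any XML parser. The following is a small sketch, assuming Python's standard xml.etree.ElementTree and xml.sax.saxutils modules (the helper is_well_formed is a hypothetical name). It shows the two ways of writing '<' and '&' as character data, and three documents that a parser rejects as not well-formed:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

raw = "(a<b)&(c<d)"
with_references = "<expr>" + escape(raw) + "</expr>"    # uses &lt; and &amp;
with_cdata = "<expr><![CDATA[" + raw + "]]></expr>"     # alternative CDATA syntax
unescaped = "<expr>" + raw + "</expr>"                  # raw '<' and '&' in data
bad_nesting = "<a><b></a></b>"                          # wrong sequence of end tags
duplicate_attribute = '<a x="1" x="2"/>'                # attribute names not unique

for candidate in (with_references, with_cdata, unescaped, bad_nesting, duplicate_attribute):
    print(is_well_formed(candidate), candidate)
```

Both accepted forms yield the same character data after parsing; the choice between references and CDATA is purely syntactic.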
XML Documents: Elements
■ Element: section of data beginning with <tagname> and ending
with matching </tagname>
■ Elements must be properly nested
■ Formally: every start tag must have a unique matching end tag, that is in the
context of the same parent element.
■ Mixture of text with sub-elements is legal in XML
■ Example:
<account>
This account is seldom used any more.
<account-number> A-102</account-number>
<balance> 400 </balance>
<branch-name> Perryridge</branch-name>
</account>
■ Useful for document markup, but discouraged for data representation
XML Documents: Attributes
■ Attributes: can be used to describe elements
■ Attributes are specified by name=value pairs inside the starting
tag of an element
■ Example
<account acct-type = "checking" >
<account-number> A-102 </account-number>
<branch-name> Perryridge </branch-name>
<balance> 400 </balance>
</account>
■ Attribute names must be unique within the element
<account acct-type = "checking" monthly-fee="5">
XML Documents: IDs and IDREFs
■ An element can have at most one attribute of type ID
■ The ID attribute value of each element in an XML document must be distinct
■ The ID attribute (value) is an 'object identifier'
■ An attribute of type IDREF must contain the ID value of an element in the
same document
■ An attribute of type IDREFS contains a set of (0 or more) ID values. Each
value must be the ID value of an element in the same document
■ IDs and IDREFs are untyped, unfortunately
■ Example below: The owners attribute of an account may contain a reference
to another account, which is meaningless;
owners attribute should ideally be constrained to refer to customer elements
XML data with ID and IDREF attributes
<bank-2>
<account account-number="A-401" owners="C100 C102">
<branch-name> Downtown </branch-name>
<balance>500 </balance>
</account>
...
<customer customer-id="C100" accounts="A-401">
<customer-name>Joe</customer-name>
<customer-street>Monroe</customer-street>
<customer-city>Madison</customer-city>
</customer>
<customer customer-id="C102" accounts="A-401 A-402">
<customer-name> Mary</customer-name>
<customer-street> Erin</customer-street>
<customer-city> Newark </customer-city>
</customer>
</bank-2>
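A hedged sketch of how an application might dereference IDREFS values, assuming Python's standard xml.etree.ElementTree. The parser used here reads no DTD, so the mapping ID_ATTRIBUTE (which attribute plays the ID role for which element) is an assumption supplied by hand, and the data is a trimmed copy of the bank-2 document with the reference to the elided account A-402 left out. The helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

BANK2_XML = """
<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name> Downtown </branch-name>
    <balance>500 </balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>
"""

# Assumption: which attribute is of type ID, per element (normally stated in the DTD).
ID_ATTRIBUTE = {"account": "account-number", "customer": "customer-id"}

def build_id_index(root):
    """Map every ID value to its element, enforcing document-wide uniqueness."""
    index = {}
    for element in root.iter():
        id_attr = ID_ATTRIBUTE.get(element.tag)
        if id_attr is None:
            continue
        value = element.get(id_attr)
        if value in index:
            raise ValueError(f"duplicate ID value: {value}")
        index[value] = element
    return index

def dereference(index, idrefs):
    """Resolve a whitespace-separated IDREFS value; raises KeyError on a dangling reference."""
    return [index[ref] for ref in idrefs.split()]

root = ET.fromstring(BANK2_XML)
index = build_id_index(root)
owners = dereference(index, index["A-401"].get("owners"))
owner_names = [owner.findtext("customer-name") for owner in owners]
print(owner_names)

# IDREFs are untyped: nothing prevents an "owners" value from naming an account.
untyped_target = dereference(index, "A-401")[0].tag
print(untyped_target)
```

The last two lines illustrate the weakness noted on the slide: the index cannot tell that an owner reference ought to resolve to a customer rather than to an account.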
XML Document Schema
■ XML documents may optionally have a schema
■ standardized data exchange, …
■ Schema restricts the structures and data types allowed in a document
■ document is valid, if it follows the restrictions defined by the schema
■ Two important mechanisms for specifying an XML schema
■ Document Type Definition (DTD)
■ contained in the document, or
■ stored separately, referenced in the document
■ XML Schema
Document Type Definition - DTD
■ Original mechanism to specify type and structure of an XML document
■ What elements can occur
■ What attributes can/must an element have
■ What subelements can/must occur inside each element, and how many times.
■ DTD does not constrain data types
■ All values represented as strings in XML
■ Special DTD syntax
■ <!ELEMENT element (subelements-specification) >
■ <!ATTLIST element (attributes) >
Element Specification in DTD
■ Subelements can be specified as
■ names of elements, or
■ #PCDATA (parsed character data), i.e., character strings
■ EMPTY (no subelements) or ANY (anything can be a subelement)
■ Structure is defined using regular expressions
■ sequence (subel, subel, …), alternative (subel | subel | …)
■ number of occurrences
■ “?” - 0 or 1 occurrence
■ “+” - 1 or more occurrences
■ “*” - 0 or more occurrences
■ Example
<!ELEMENT depositor (customer-name, account-number)>
<!ELEMENT customer-name (#PCDATA)>
<!ELEMENT account-number (#PCDATA)>
<!ELEMENT bank ( ( account | customer | depositor)+)>
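Because content models are regular expressions over element names, they can be checked with an ordinary regex engine. The sketch below is an illustration only, assuming Python's standard re and xml.etree.ElementTree modules. It handles content models made of element names, ",", "|", "?", "+", "*" and parentheses, and does not handle #PCDATA, EMPTY or ANY. The function names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model):
    """Translate a DTD content model over element names into a compiled regex.

    Each element name becomes a token that ends with a space, so the regex is
    matched against the space-terminated sequence of child tag names.
    """
    compact = re.sub(r"\s+", "", model)                  # whitespace is insignificant
    tokens = re.sub(r"[A-Za-z_][\w.-]*",
                    lambda m: "(?:" + re.escape(m.group()) + " )",
                    compact)
    return re.compile(tokens.replace(",", ""))           # "," is plain concatenation

def children_match(element, model):
    """True if the element's child sequence is accepted by the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None

ok = ET.fromstring("<depositor><customer-name/><account-number/></depositor>")
swapped = ET.fromstring("<depositor><account-number/><customer-name/></depositor>")
print(children_match(ok, "(customer-name, account-number)"))
print(children_match(swapped, "(customer-name, account-number)"))

bank = ET.fromstring("<bank><account/><depositor/><account/></bank>")
print(children_match(bank, "((account | customer | depositor)+)"))
```

A sequence fixes the order of subelements, so the swapped depositor is rejected, while the bank model accepts any non-empty mix of the three element kinds.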
Example: Bank DTD
<!DOCTYPE bank-2[
<!ELEMENT account (branch-name, balance)>
<!ATTLIST account
account-number ID #REQUIRED
owners IDREFS #REQUIRED>
<!ELEMENT customer (customer-name, customer-street,
customer-city)>
<!ATTLIST customer
customer-id ID #REQUIRED
accounts IDREFS #REQUIRED>
… declarations for branch, balance, customer-name,
customer-street and
customer-city
]>
Describing XML Data: XML Schema
■ XML Schema is closer to the general understanding of a (database) schema
■ XML Schema supports
■ Typing of values
■ E.g. integer, string, etc
■ Constraints on min/max values
■ Typed references
■ User defined types
■ Specified in XML syntax (unlike DTDs)
■ Integrated with namespaces
■ Many more features
■ List types, uniqueness and foreign key constraints, inheritance ..
■ BUT: significantly more complicated than DTDs
XML Schema Structures
■ Datatypes (Part 2)
Describes Types of scalar (leaf) values
■ Structures (Part 1)
Describes types of complex values (attributes, elements)
■ Regular tree grammars
repetition, optionality, choice, recursion
■ Integrity constraints
Functional (keys) & inclusion dependencies (foreign keys)
■ Subtyping (similar to OO models)
Describes inheritance relationships between types
■ Supports schema reuse
XML Schema Structures (cont.)
■ Elements : tag name & simple or complex type
<xs:element name="sponsor" type="xs:string"/>
<xs:element name="action" type="Action"/>
■ Attributes : attribute name & simple type
<xs:attribute name="date" type="xs:date"/>
■ Complex types
<xs:complexType name="Action">
<xs:sequence>
<xs:element ref="action-date"/>
<xs:element ref="action-desc"/>
</xs:sequence>
</xs:complexType>
XML Schema Structures (cont.)
■ Sequence
<xs:sequence>
<xs:element name="congress" type="xs:string"/>
<xs:element name="session" type="xs:string"/>
</xs:sequence>
■ Choice
<xs:choice>
<xs:element name="author" type="PersonName"/>
<xs:element name="editor" type="PersonName"/>
</xs:choice>
■ Repetition
<xs:element name="section"
type="Section"
minOccurs="1"
maxOccurs="unbounded"/>
Namespaces
■ A single XML document may contain elements and attributes defined for and
used by multiple software modules
■ Motivated by modularization considerations, for example
■ Name collisions have to be avoided
■ Example:
■ A Book XSD contains a Title element for the title of a book
■ A Person XSD contains a Title element for an honorary title of a person
■ A BookOrder XSD references both XSDs
■ Namespaces specify how to construct universally unique names
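A short sketch of the Title collision, assuming Python's standard xml.etree.ElementTree. The namespace URIs and the order document are invented for illustration. The parser expands each prefixed name into a {URI}local-name pair, which is what makes the two Title elements distinct:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique URIs would serve.
ORDER_XML = """
<order xmlns:bk="http://example.org/book" xmlns:ps="http://example.org/person">
  <bk:Title>XML and Databases</bk:Title>
  <ps:Title>Dr.</ps:Title>
</order>
"""

root = ET.fromstring(ORDER_XML)

# Universally unique names: the prefix is replaced by the namespace URI.
expanded_names = [child.tag for child in root]
print(expanded_names)

# The prefixes used in a query are local; only the URIs have to match.
ns = {"b": "http://example.org/book", "p": "http://example.org/person"}
book_title = root.findtext("b:Title", namespaces=ns)
honorific = root.findtext("p:Title", namespaces=ns)
print(book_title, "/", honorific)
```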
XML Schema Version of Bank DTD
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.banks.org"
xmlns="http://www.banks.org" >
<xsd:element name="bank" type="BankType"/>
<xsd:element name="account">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="account-number" type="xsd:string"/>
<xsd:element name="branch-name" type="xsd:string"/>
<xsd:element name="balance" type="xsd:decimal"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
….. definitions of customer and depositor ….
<xsd:complexType name="BankType">
<xsd:choice minOccurs="1" maxOccurs="unbounded">
<xsd:element ref="account"/>
<xsd:element ref="customer"/>
<xsd:element ref="depositor"/>
</xsd:choice>
</xsd:complexType>
</xsd:schema>
XML Document Using Bank Schema
<bank xmlns="http://www.banks.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.banks.org Bank.xsd">
<account>
<account-number> … </account-number>
<branch-name> … </branch-name>
<balance> … </balance>
</account>

</bank>
Application Programming with XML
■ Application needs to work with XML data/document
■ Parsing XML to extract relevant information
■ Produce XML
■ Write character data
■ Build internal XML document representation and Serialize it
■ Generic XML Parsing
■ Simple API for XML (SAX)
■ “Push” parsing (event-based parsing)
■ Parser sends notifications to application about the type of document pieces it encounters
■ Notifications are sent in “reading order” as they appear in the document
■ Preferred for large documents (high memory efficiency)
■ Document Object Model (DOM) – W3C recommendation
■ “One-step” parsing
■ Generates in-memory representation of the document (parse tree)
■ DOM specifies the types of parse tree objects, their properties and operations
■ Independent of programming language (uses IDL)
■ Bindings available to specific programming languages (e.g., Java)
■ Parsing includes
■ checking for well-formedness
■ optionally checking for validity (often used for debugging only)
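The contrast between the two generic parsing styles can be sketched as follows, assuming Python's standard xml.sax and xml.dom.minidom modules as stand-ins for the SAX and DOM bindings (the slides mention Java bindings; the class name EventLog is hypothetical). With SAX the parser pushes events to a handler in reading order; with DOM the whole parse tree is built first and then navigated:

```python
import xml.sax
from xml.dom import minidom

ACCOUNT_XML = ("<account><account-number>A-102</account-number>"
               "<balance>400</balance></account>")

class EventLog(xml.sax.ContentHandler):
    """Records the notifications the SAX parser sends, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

# "Push" parsing: nothing is kept in memory except what the handler stores.
handler = EventLog()
xml.sax.parseString(ACCOUNT_XML.encode("utf-8"), handler)
for event in handler.events:
    print(event)

# "One-step" parsing: the full tree exists before the application touches it.
dom = minidom.parseString(ACCOUNT_XML)
balance = dom.getElementsByTagName("balance")[0].firstChild.data
print(balance)
```

Both parsers reject input that is not well-formed; the difference lies in memory use and in whether random access to the document is available.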
Transforming and Querying XML Data
■ XPath
■ path expressions for selecting document parts
■ not originally designed as a stand-alone language
■ XSLT
■ transformations from XML to XML and XML to HTML
■ primarily designed for style transformations
■ recursive pattern-matching paradigm
■ difficult to optimize in a DBMS context
■ XQuery
■ XML query language with a rich set of features
■ XQuery builds on experience with existing query languages:
XPath, Quilt, XQL, XML-QL, Lorel, YATL, SQL, OQL, …
XML Data Model
■ There is no uniform XML data model
■ different approaches with different goals
■ XML Information Set, DOM Structure Model, XPath 1.0 data model, XQuery data model
■ Common denominator: an XML document is modeled as a tree, with nodes of
different node types
■ Document, Element, Attribute, Text, Namespace, Comment, Processing Instruction
■ XQuery data model builds on a tree-based model, but extends it to support
■ sequences of items
■ nodes of different types (see above) as well as atomic values
■ can contain heterogeneous values, are ordered, can be empty
■ typed values and type annotations
■ result of schema validation
■ type may be unknown
■ Closure property
■ XQuery expressions operate on/produce instances of the XQuery Data Model
Example
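<?xml version = "1.0"?>
<!-- Requires one trained person -->
<procedure title = "Removing a light bulb">
<time unit = "sec">15</time>
<step>Grip bulb.</step>
<step>
Rotate it
<warning>slowly</warning>
counterclockwise.
</step>
</procedure>
[Figure: one possible instance of the XQuery data model for this document. A document node (D) has a comment node (C) and the element node (E) procedure as children; procedure carries the attribute node (A) title="Removing a light bulb" and has the element children time, step and step; time carries the attribute node unit="sec" and the text node (T) "15"; the first step has the text node "Grip bulb."; the second step has the text nodes "Rotate it" and "counterclockwise." around the element warning, whose text node is "slowly"]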


Processing XML Data: XPath
■ XPath is used to address (select) parts of documents using path expressions
■ A path expression consists of one or more steps separated by “/”
■ Each step in an XPath expression maps a node (the context node) into a set of
nodes
■ Result of path expression: set of values
that along with their containing
elements/attributes match the specified path
■ E.g.: /bank-2/customer/customer-name
evaluated on the bank-2 data returns
■ <customer-name> Joe </customer-name>
■ <customer-name> Mary </customer-name>
■ E.g.: /bank-2/customer/customer-name/text()
returns the same names, but without the
enclosing tags
XPath
■ The initial “/” denotes root of the document (above the top-level tag)
■ In general, a step has three parts:
■ The axis (direction of movement: child, descendant, parent, ancestor, following, preceding,
attribute, … - 13 axes in all - )
■ A node test (type and/or name of qualifying nodes)
■ Some predicates (refine the set of qualifying nodes)
■ Path expressions are evaluated left to right
■ Each step operates on the set of instances produced by the previous step
■ Selection predicates may follow any step in a path, in [ ]
■ E.g. /bank-2/account[balance > 400]
■ returns account elements with a balance value greater than 400
■ /bank-2/account[balance] returns account elements containing a balance subelement
■ Attributes are accessed using “@”
■ E.g. /bank-2/account[balance > 400]/@account-number
■ returns the account numbers of those accounts with balance > 400
■ IDREF attributes are not dereferenced automatically (more on this later)
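The path expressions above can be tried out with Python's standard xml.etree.ElementTree, which implements only a subset of XPath. This is a sketch under two assumptions: the accounts A-402 and A-403 and all balances other than 500 are invented test data, and the numeric predicate balance > 400 is evaluated in Python because that subset has no comparison operators:

```python
import xml.etree.ElementTree as ET

BANK2_XML = """<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name><balance>300</balance></account>
  <account account-number="A-403"><branch-name>Mianus</branch-name></account>
  <customer customer-id="C100"><customer-name>Joe</customer-name></customer>
  <customer customer-id="C102"><customer-name>Mary</customer-name></customer>
</bank-2>"""

# fromstring returns the bank-2 element itself, so the paths below start one step lower.
root = ET.fromstring(BANK2_XML)

# /bank-2/customer/customer-name/text()
names = [element.text for element in root.findall("./customer/customer-name")]
print(names)

# /bank-2/account[balance]  (accounts that contain a balance subelement)
with_balance = root.findall("./account[balance]")
print(len(with_balance))

# /bank-2/account[balance > 400]/@account-number  (comparison done in Python)
rich = [account.get("account-number")
        for account in with_balance
        if float(account.findtext("balance")) > 400]
print(rich)
```

Each findall call corresponds to evaluating the steps left to right: the child step narrows the node set, the predicate in [ ] filters it, and the attribute access replaces the final @ step.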
XPath Summary
■ Strengths:
■ Compact and powerful syntax for navigating a tree,
but not as powerful as a regular-expression language
■ Recognized and accepted in XML community
■ Used in other XML processors/specifications such as XPointer, XSLT, XQuery

■ Limitations:
■ Operates on one document (no joins)
■ No grouping or aggregation
■ No facility for generating new output structures
XQuery
■ XQuery is a general purpose query language for XML data
■ Standardized by the World Wide Web Consortium (W3C)
■ XQuery is derived from
■ the Quilt (“Quilt” refers both to the origin of the language and to its use in “knitting ” together heterogeneous
data sources) query language, which itself borrows from

■ XPath: a concise language for navigating in trees


■ XML-QL: a powerful language for generating new structures
■ SQL: a database language based on a series of keyword-clauses: SELECT - FROM
– WHERE
■ OQL: a functional language in which many kinds of expressions can be nested
with full generality
XQuery – Main Constituents
■ Path expressions
■ Inherited from XPath
■ An XPath expression maps a node (the context node) into a set of nodes
■ Element constructors
■ To construct an element with a known name and content, use XML-like syntax:
<book isbn = "12345">
<title>Huckleberry Finn</title>
</book>
■ If the content of an element or attribute must be computed, use a nested
expression enclosed in { }
<book isbn = "{$x}">
{$b/title }
</book>
■ FLWOR - Expressions
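XQuery: The General Syntax of FLWOR Expressions
[Syntax diagram: a FLWOR expression consists of one or more FOR_clause or LET_clause, followed by an optional WHERE_clause, an optional ORDER_BY_clause, and a RETURN_clause]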

■ FOR clause, LET clause generate list of tuples of bound variables (order preserving) by
■ iterating over a set of nodes (possibly specified by an XPath expression), or
■ binding a variable to the result of an expression
■ WHERE clause applies a predicate to filter the tuples produced by FOR/LET
■ ORDER BY clause imposes order on the surviving tuples
■ RETURN clause is executed for each surviving tuple, generates ordered list of outputs
■ Associations to SQL query expressions
for ⬄ SQL from
where ⬄ SQL where
order by ⬄ SQL order by
return ⬄ SQL select
let allows temporary variables, and has no equivalent in SQL
Evaluating FLWOR Expressions
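[Figure: the input sequence is processed by FOR $X,$Y .. and LET $Z .., producing a tuple stream of ($x, $y, $z) bindings; WHERE .. keeps the tuples that pass the predicate ("ok!") and drops the others (X); ORDER BY .. reorders the surviving tuples; RETURN .. is evaluated per tuple and produces the output sequence]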

FLWOR - Examples
■ Simple FLWR expression in XQuery
■ Find all accounts with balance > 400, with each result enclosed in an
<account-number> .. </account-number> tag
for $x in /bank-2/account
let $acctno := $x/@account-number
where $x/balance > 400
return <account-number> {$acctno} </account-number>
■ Let and Where clause not really needed in this query, and selection can be
done in XPath.
■ Query can be written as:
for $x in /bank-2/account[balance>400] return
<account-number> {$x/@account-number}
</account-number>
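The clause-by-clause evaluation of the first query can be mimicked as a pipeline over a stream of variable bindings. This is only an analogy in Python (standard xml.etree.ElementTree), not an XQuery engine; the function name flwor, the accounts A-402 and A-403 and their balances are assumptions, and the order by step is added purely to show where it would sit:

```python
import xml.etree.ElementTree as ET

BANK2_XML = """<bank-2>
  <account account-number="A-403"><balance>900</balance></account>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
</bank-2>"""

def flwor(accounts):
    """Evaluate the for/let/where/order by/return steps over a tuple stream."""
    # for $x in /bank-2/account
    tuples = ({"x": x} for x in accounts)
    # let $acctno := $x/@account-number
    tuples = ({**t, "acctno": t["x"].get("account-number")} for t in tuples)
    # where $x/balance > 400
    tuples = (t for t in tuples if float(t["x"].findtext("balance")) > 400)
    # order by $acctno  (not in the original query; added for illustration)
    tuples = sorted(tuples, key=lambda t: t["acctno"])
    # return <account-number> {$acctno} </account-number>
    results = []
    for t in tuples:
        out = ET.Element("account-number")
        out.text = t["acctno"]
        results.append(out)
    return results

root = ET.fromstring(BANK2_XML)
output = [ET.tostring(e, encoding="unicode") for e in flwor(root.findall("account"))]
print(output)
```

Each dictionary plays the role of one tuple of bound variables; filtering and sorting act on whole tuples, and the return step runs once per surviving tuple.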
Nesting of Expressions
■ Here: nesting inside the return clause
■ Example: inversion of a hierarchy

Input hierarchy (books listing their authors):
<book>
<title>
<author>
<author>
</book>
<book>
<title>
<author>
<author>
</book>
Query:
for $a in fn:distinct-values(//author)
order by $a
return
<author>
<name> { $a } </name>
{ for $b in //book[author = $a]
return $b/title }
</author>
Output hierarchy (authors listing their titles):
<author>
<name>
<title>
<title>
</author>
<author>
<name>
<title>
<title>
</author>
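The same inversion can be sketched imperatively, assuming Python's standard xml.etree.ElementTree. The bib wrapper element, the authors result element and the sample titles are invented for illustration. The outer loop corresponds to the outer for over distinct author values, the inner loop to the for nested inside the return clause:

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<bib>
  <book><title>T1</title><author>Ann</author><author>Bob</author></book>
  <book><title>T2</title><author>Ann</author></book>
</bib>"""

def invert(bib):
    """Turn book/author nesting into author/title nesting."""
    result = ET.Element("authors")
    # distinct-values(//author) combined with order by
    names = sorted({author.text for author in bib.iter("author")})
    for name in names:
        author = ET.SubElement(result, "author")
        ET.SubElement(author, "name").text = name
        # nested for: all books that have this author
        for book in bib.iter("book"):
            if any(a.text == name for a in book.findall("author")):
                ET.SubElement(author, "title").text = book.findtext("title")
    return result

inverted = invert(ET.fromstring(BOOKS_XML))
summary = [(a.findtext("name"), [t.text for t in a.findall("title")]) for a in inverted]
print(summary)
```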
XQuery: Joins
■ Joins are specified in a manner very similar to SQL
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor
where $a/account-number = $d/account-number
and $c/customer-name = $d/customer-name
return <cust-acct>{ $c, $a }</cust-acct>
■ The same query can be expressed with the selections specified as XPath
selections:
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor[
account-number = $a/account-number and
customer-name = $c/customer-name]
return <cust-acct>{ $c, $a }</cust-acct>
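Read operationally, the multi-variable for clause is a nested loop over the three element sets, with the where clause as the join condition. The following is a naive sketch in Python (standard xml.etree.ElementTree); the sample data and the function name cust_accts are assumptions, and a real query processor would be free to choose a better join strategy:

```python
import xml.etree.ElementTree as ET

BANK_XML = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>"""

def cust_accts(bank):
    """Nested-loop join of account, customer and depositor elements."""
    pairs = []
    for a in bank.findall("account"):            # for $a in /bank/account,
        for c in bank.findall("customer"):       #     $c in /bank/customer,
            for d in bank.findall("depositor"):  #     $d in /bank/depositor
                same_account = a.findtext("account-number") == d.findtext("account-number")
                same_customer = c.findtext("customer-name") == d.findtext("customer-name")
                if same_account and same_customer:   # where ... and ...
                    pairs.append((c.findtext("customer-name"), a.findtext("account-number")))
    return pairs

bank = ET.fromstring(BANK_XML)
joined = cust_accts(bank)
print(joined)
```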
XQJ – Main Concepts
■ Similar to JDBC, but for XQuery statements
■ data source, connection, (prepared) XQuery expression (statement)
■ XQuery variable identifier instead of parameter markers ("?")
■ Query result is a sequence (XQSequence)
■ iterate through sequence items using XQSequence.next()
■ retrieve Java DOM objects using XQSequence.getObject()
■ retrieve atomic values as character string or mapped to Java data types
■ individual items or the complete stream can be "written" to the SAX API
■ Support for "serializing" an XQuery result
■ to file, Java writer, string
■ as (X)HTML
XQuery - Status
■ XQuery 1.0 has been a W3C recommendation since January 2007
■ XQuery API for Java™ (XQJ) has been final (JSR) since 2009
■ XQuery Update Facility 1.0 is a candidate recommendation
■ XQuery 1.1 is in the making (working draft), work items include
■ value-based and positional grouping
■ outer join support
■ windowing
■ date and numeric value formatting
■ Additional ongoing work
■ XQuery and XPath Full Text 1.0 (candidate recommendation)
■ adds support for text retrieval in XQuery
■ XQuery Scripting Extensions 1.0 (working draft)
■ adds procedural features
Transforming XML Data: XSLT
■ A stylesheet stores formatting options for a document, usually separately
from document
■ E.g. HTML style sheet may specify font colors and sizes for headings, etc.
■ The XML Stylesheet Language (XSL) was originally designed for
generating HTML from XML
■ XSLT is a general-purpose transformation language
■ Can translate XML to XML, and XML to HTML
■ XSLT transformations are expressed using rules called templates
■ Templates combine selection using XPath with construction of results
Understanding A Template
■ Most templates have the following form:
<xsl:template match="emphasis">
<i><xsl:apply-templates/></i>
</xsl:template>
■ The whole <xsl:template> element is a template
■ The match pattern determines where this template applies
■ XPath pattern
■ Literal result element(s) come from non-XSL namespace(s)
■ XSLT elements come from the XSL namespace
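The recursive pattern-matching paradigm behind templates can be imitated in a few lines. This is an analogy in Python (standard xml.etree.ElementTree), not an XSLT processor: match patterns are reduced to plain tag names, output text is not escaped, and the para element and the function names are invented for illustration:

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Process the text and children of node in document order, like xsl:apply-templates."""
    out = node.text or ""
    for child in node:
        # Elements without a matching template fall back to a built-in rule: recurse, copy text.
        rule = templates.get(child.tag, apply_templates)
        out += rule(child, templates)
        out += child.tail or ""
    return out

def emphasis(node, templates):
    """Counterpart of the template with match="emphasis": wrap the processed content in <i>."""
    return "<i>" + apply_templates(node, templates) + "</i>"

TEMPLATES = {"emphasis": emphasis}

doc = ET.fromstring("<para>Rotate it <emphasis>slowly</emphasis> counterclockwise.</para>")
html = apply_templates(doc, TEMPLATES)
print(html)
```

The literal "<i>" and "</i>" strings stand in for the literal result elements, while the call to apply_templates inside emphasis corresponds to the nested xsl:apply-templates instruction.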
SQL and XML
■ Use existing (object-)relational technology?
■ Large Objects: granularity understood by DBMS may be too coarse!
■ search/retrieval of subsets, update of documents
■ Decompose into tables: often complex, inefficient
■ mapping complexity, especially for highly "denormalized" documents
■ Useful, but not sufficient
■ should be standardized as part of SQL
■ but needs further enhancement to provide "native" XML support in SQL
■ Enable "hybrid" XML/relational data management
■ supports both relational and XML data
■ storage, access
■ query language
■ programming interfaces
■ ability to view/access relational as XML, and XML as relational
■ all major relational DBMS vendors are moving in this direction
SQL/XML Big Picture
[Figure: SQL/XML big picture. An XQuery client, an SQL client and an XML-enhanced SQL client access the database through SQL/XML; <order> documents containing <item> elements are exchanged with the clients, exposed through a view, and held in storage]
SQL: Parts and Packages
■ Part 14 of the SQL standard: XML
■ Two major goals:
■ "Publish" SQL query results as XML documents
■ Ability to store and retrieve XML documents
■ Rules for mapping SQL types, SQL identifiers and SQL data values to and from
corresponding XML concepts
■ A new built-in type XML
■ A number of built-in operators that produce values of type XML
■ Recent additions for SQL200n:
■ Integration of the XQuery Data Model
■ Additional XML Constructor Functions
■ Querying XML values
XML Publishing Functions- Example
CREATE VIEW XMLDept (DeptDoc XML) AS (
SELECT XMLELEMENT ( NAME "Department",
XMLATTRIBUTES ( e.dept AS "name",
COUNT(*) AS "count" ),
XMLAGG ( XMLELEMENT ( NAME "emp",
XMLELEMENT ( NAME "name", e.lname ),
XMLELEMENT ( NAME "hire", e.hire ) ) )
) AS "dept_doc"
FROM employees e GROUP BY e.dept ) ;

dept_doc ==>

<Department name="Accounting" count="2">


<emp><name>Yates</name><hire>2005-11-01</hire></emp>
<emp><name>Smith</name><hire>2005-01-01</hire></emp>
</Department>
<Department name="Shipping" count="2">
<emp><name>Oppenheimer</name><hire>2002-10-01</hire></emp>
<emp><name>Martin</name><hire>2005-05-01</hire></emp>
</Department>
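For comparison, the same publishing step can be done on the client side. The sketch below assumes Python's standard sqlite3 and xml.etree.ElementTree modules and an in-memory employees table filled with the four rows shown in the output above. SQLite has no SQL/XML publishing functions, so the grouping and element construction that XMLAGG and XMLELEMENT perform inside the DBMS are written out by hand; the function name publish_departments is hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (lname TEXT, dept TEXT, hire TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Yates", "Accounting", "2005-11-01"),
    ("Smith", "Accounting", "2005-01-01"),
    ("Oppenheimer", "Shipping", "2002-10-01"),
    ("Martin", "Shipping", "2005-05-01"),
])

def publish_departments(conn):
    """Build one <Department> document per dept value, one <emp> per employee."""
    rows = conn.execute(
        "SELECT dept, lname, hire FROM employees ORDER BY dept, rowid").fetchall()
    docs = []
    for dept, members in groupby(rows, key=lambda row: row[0]):   # GROUP BY dept
        members = list(members)
        department = ET.Element("Department",
                                {"name": dept, "count": str(len(members))})
        for _, lname, hire in members:                            # XMLAGG over the group
            emp = ET.SubElement(department, "emp")
            ET.SubElement(emp, "name").text = lname
            ET.SubElement(emp, "hire").text = hire
        docs.append(ET.tostring(department, encoding="unicode"))
    return docs

dept_docs = publish_departments(conn)
for doc in dept_docs:
    print(doc)
```

Doing this in the client moves all rows out of the database first; the point of the publishing functions is that the DBMS can build the XML value as part of query evaluation.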
Manipulating XML Data
■ Constructor functions
■ focus on publishing SQL data as XML
■ no further manipulation of XML
■ More requirements
■ how do we select or extract portions of XML data (e.g., from stored XML)?
■ how can we decompose XML into relational data?
■ both require a language to identify, extract and possibly combine parts of XML
values

SQL/XML utilizes the XQuery standard for this!


XMLQUERY
■ Evaluates an XQuery or XPath expression
■ returns a sequence of XQuery nodes
■ XMLQUERY – Example
SELECT XMLQUERY('for $e in $dept[@count > 1]/emp
where $e/hire > "2004-12-31" return $e/name'
PASSING BY REF DeptDoc AS "dept"
RETURNING SEQUENCE) AS "Name_elements"
FROM XMLDept
=>
Name_elements

<name>Yates</name>
<name>Smith</name>
<name>Martin</name>
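What the embedded query computes per row can be traced with a small emulation, assuming Python's standard xml.etree.ElementTree and the two Department documents from the publishing example as input. This is not how a DBMS evaluates XMLQUERY; the function name name_elements is hypothetical, and the hire dates are compared as strings, which works here because they are in ISO year-month-day form:

```python
import xml.etree.ElementTree as ET

DEPT_DOCS = [
    '<Department name="Accounting" count="2">'
    '<emp><name>Yates</name><hire>2005-11-01</hire></emp>'
    '<emp><name>Smith</name><hire>2005-01-01</hire></emp>'
    '</Department>',
    '<Department name="Shipping" count="2">'
    '<emp><name>Oppenheimer</name><hire>2002-10-01</hire></emp>'
    '<emp><name>Martin</name><hire>2005-05-01</hire></emp>'
    '</Department>',
]

def name_elements(dept_doc):
    """Emulate the query for one row: $dept is bound to the row's Department element."""
    dept = ET.fromstring(dept_doc)
    if not int(dept.get("count")) > 1:          # $dept[@count > 1]
        return []
    return [emp.find("name")                    # return $e/name
            for emp in dept.findall("emp")      # for $e in .../emp
            if emp.findtext("hire") > "2004-12-31"]   # where $e/hire > ...

results = [[name.text for name in name_elements(doc)] for doc in DEPT_DOCS]
print(results)
```

The result has one sequence per row of XMLDept, which matches the output above: two name elements for Accounting and one for Shipping.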
JDBC-Support for SQLXML
■ New methods to create and retrieve SQLXML
■ Connection.createSQLXML()
■ ResultSet.getSQLXML()
■ PreparedStatement.setSQLXML()
■ SQLXML interface supports methods for accessing its XML content
■ getString()
■ getBinaryStream(), getCharacterStream()
■ obtain a Java stream/reader that can be passed directly to an XML parser
■ getSource()
■ obtain a source object suitable for XML parsers and XSLT transformers
■ corresponding setXXX() methods to initialize newly created SQLXML objects
QUESTIONS?????
