A complex XML element called <projects>
Slide 27- 1
The basic object in XML is the XML document.
There are two main structuring concepts that
are used to construct an XML document:
◦ Elements
◦ Attributes
Attributes in XML provide additional
information that describes elements.
Slide 27- 2
As in HTML, elements are identified in a document
by their start tag and end tag.
◦ The tag names are enclosed between angle brackets
<…>, and end tags are further identified by a slash
</…>.
Complex elements are constructed from other
elements hierarchically, whereas simple elements
contain data values.
It is straightforward to see the correspondence
between the XML textual representation and the
tree structure.
◦ In the tree representation, internal nodes represent
complex elements, whereas leaf nodes represent simple
elements.
◦ That is why the XML model is called a tree model or a
hierarchical model.
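The correspondence between the textual form and the tree can be sketched in a few lines of Python. This is only an illustration: the fragment below, including the id attribute and the name and number elements, is a hypothetical stand-in and is not taken from the slides' figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element and attribute names are illustrative only.
PROJECTS_XML = """<projects>
  <project id="p1">
    <name>ProductX</name>
    <number>1</number>
  </project>
  <project id="p2">
    <name>ProductY</name>
    <number>2</number>
  </project>
</projects>"""


def classify(element):
    """Internal nodes (with child elements) are complex; leaf nodes are simple."""
    return "complex" if len(element) > 0 else "simple"


root = ET.fromstring(PROJECTS_XML)

# Walk the whole tree and record whether each tag is complex or simple.
kinds = {}
for node in root.iter():
    kinds.setdefault(node.tag, classify(node))

# Attributes carry additional information that describes an element.
project_ids = [project.get("id") for project in root.findall("project")]

for tag, kind in kinds.items():
    print(tag, kind)
print(project_ids)
```

Here projects and project are internal nodes, whereas name and number are leaves that hold data values.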
Slide 27- 3
It is possible to characterize three main types of
XML documents:
1. Data-centric XML documents:
These documents have many small data items that
follow a specific structure, and hence may be
extracted from a structured database. They are
formatted as XML documents in order to exchange
them or display them over the Web.
2. Document-centric XML documents:
These are documents with large amounts of text,
such as news articles or books. There are few or no
structured data elements in these documents.
3. Hybrid XML documents:
These documents may have parts that contain
structured data and other parts that are
predominantly textual or unstructured.
Slide 27- 4
Two types of XML
◦ Well-Formed XML
◦ Valid XML
Slide 27- 6
Well-Formed XML
◦ It must start with an XML declaration to indicate the
version of XML being used—as well as any other
relevant attributes.
◦ It must follow the syntactic guidelines of the tree
model.
This means that there should be a single root element,
and every element must include a matching pair of
start tag and end tag within the start and end tags of
the parent element.
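A generic parser can be used to test these conditions. The sketch below assumes Python's standard xml.etree.ElementTree parser, which rejects input that is not well-formed; the three sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Single root, every start tag closed inside its parent.
GOOD = '<?xml version="1.0"?><projects><project><name>ProductX</name></project></projects>'

# The end tags of name and project are interleaved, so nesting is broken.
BAD_NESTING = '<?xml version="1.0"?><projects><project><name>ProductX</project></name></projects>'

# Two top-level elements, so there is no single root element.
TWO_ROOTS = '<?xml version="1.0"?><project/><project/>'

for label, text in [("good", GOOD), ("bad nesting", BAD_NESTING), ("two roots", TWO_ROOTS)]:
    print(label, is_well_formed(text))
```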
Slide 27- 7
Well-Formed XML (contd.)
◦ A well-formed XML document is syntactically
correct
This allows it to be processed by generic processors
that traverse the document and create an internal tree
representation.
DOM (Document Object Model) - Allows programs to
manipulate the resulting tree representation
corresponding to a well-formed XML document. The
whole document must be parsed beforehand when using
DOM.
SAX (Simple API for XML) - Allows processing of XML
documents on the fly by notifying the processing program
whenever a start or end tag is encountered.
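The difference between the two styles can be sketched with Python's standard library, assuming xml.dom.minidom as the DOM implementation and xml.sax as the SAX implementation. The sample document and the TagLogger class are hypothetical.

```python
import xml.dom.minidom
import xml.sax

DOC = (
    "<projects>"
    "<project><name>ProductX</name></project>"
    "<project><name>ProductY</name></project>"
    "</projects>"
)

# DOM: the whole document is parsed first, then the tree is queried.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]


class TagLogger(xml.sax.ContentHandler):
    """SAX handler that is notified at every start tag and end tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


# SAX: events arrive while the document is being read; no tree is built.
handler = TagLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_names)
print(handler.events[:3])
```

DOM is convenient when the program needs to move around the tree, whereas SAX keeps memory use low because nothing is stored unless the handler stores it.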
Slide 27- 8
Valid XML
◦ A stronger criterion is for an XML document to be
valid.
◦ In this case, the document must be well-formed,
and in addition the element names used in the start
and end tag pairs must follow the structure
specified in a separate XML DTD (Document Type
Definition) file or XML schema file.
Slide 27- 9
FIGURE 27.4 An XML DTD file called projects
Slide 27- 10
XML DTD Notation
◦ A * following the element name means that the
element can be repeated zero or more times in the
document. This can be called an optional multivalued
(repeating) element.
◦ A + following the element name means that the
element can be repeated one or more times in the
document. This can be called a required multivalued
(repeating) element.
◦ A ? following the element name means that the
element can appear zero or one time. This can
be called an optional single-valued (non-repeating)
element.
◦ An element appearing without any of the preceding
three symbols must appear exactly once in the
document. This can be called a required single-valued
(non-repeating) element.
Slide 27- 11
XML DTD Notation (contd.)
◦ The type of the element is specified via parentheses
following the element.
If the parentheses include names of other elements, these
would be the children of the element in the tree structure.
If the parentheses include the keyword #PCDATA or one
of the other data types available in XML DTD, the element
is a leaf node. PCDATA stands for parsed character data,
which is roughly similar to a string data type.
◦ Parentheses can be nested when specifying elements.
◦ A bar symbol ( e1 | e2 ) specifies that either e1 or e2
can appear in the document.
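The notation can be explored by letting a parser report each element declaration. The sketch below assumes Python's xml.parsers.expat module, which reads DTD declarations without validating against them. The DTD is a hypothetical fragment written for this illustration and is not the contents of Figure 27.4.

```python
from xml.parsers import expat
from xml.parsers.expat import model

# Hypothetical DTD, embedded as an internal subset for the illustration.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Projects [
  <!ELEMENT Projects (Project+)>
  <!ELEMENT Project (Name, Number?, Worker*)>
  <!ELEMENT Contact (Phone | Email)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Worker (#PCDATA)>
  <!ELEMENT Phone (#PCDATA)>
  <!ELEMENT Email (#PCDATA)>
]>
<Projects><Project><Name>ProductX</Name></Project></Projects>
"""

# Map expat's quantifier and content-type codes back to the DTD notation.
SUFFIX = {
    model.XML_CQUANT_NONE: "",   # exactly once
    model.XML_CQUANT_OPT: "?",   # zero or one
    model.XML_CQUANT_REP: "*",   # zero or more
    model.XML_CQUANT_PLUS: "+",  # one or more
}
KIND = {
    model.XML_CTYPE_SEQ: "sequence",
    model.XML_CTYPE_CHOICE: "choice",
    model.XML_CTYPE_MIXED: "pcdata",
}

declared = {}


def on_element_decl(name, content_model):
    """Record the kind of content model and the (child, suffix) pairs."""
    ctype, _quant, _name, children = content_model
    declared[name] = (
        KIND.get(ctype, "other"),
        [(child[2], SUFFIX[child[1]]) for child in children],
    )


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

for name, (kind, children) in declared.items():
    print(name, kind, children)
```

Elements whose parentheses list other elements come back with children, whereas #PCDATA elements come back as leaves with no children.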
Slide 27- 12
Limitations of XML DTD
◦ First, the data types in DTD are not very general.
◦ Second, DTD has its own special syntax and so it
requires specialized processors.
It would be advantageous to specify XML schema
documents using the syntax rules of XML itself so that
the same processors for XML documents can process
XML schema descriptions.
◦ Third, all DTD elements are always forced to follow
the specified ordering in the document, so unordered
elements are not permitted.
Slide 27- 13
FIGURE 27.5 An XML schema file called company
Slide 27- 14
Slide 27- 17
XML Schema
◦ Schema Descriptions and XML Namespaces
It is necessary to identify the specific set of XML
schema language elements (tags) by a file stored at a
Web site location.
The second line in our example specifies the file used in
this example, which is:
"http://www.w3.org/2001/XMLSchema".
Each such definition is called an XML namespace.
The file name is assigned to the variable xsd using the
attribute xmlns (XML namespace), and this variable is
used as a prefix to all XML schema tags.
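The role of the prefix can be sketched in Python, assuming the standard xml.etree.ElementTree parser. The two-line schema below is a hypothetical fragment. The parser replaces the xsd prefix with the namespace name itself, which shows that the prefix is only a local shorthand.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical minimal schema; xmlns:xsd binds the prefix xsd to the namespace.
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="company"/>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# ElementTree stores tags in expanded form: {namespace}localname.
print(root.tag)

# When searching, a prefix-to-namespace mapping lets us keep the short form.
prefixes = {"xsd": XSD_NS}
element_names = [e.get("name") for e in root.findall("xsd:element", prefixes)]
print(element_names)
```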
Slide 27- 18
XML Schema (contd.)
◦ Annotations, documentation, and language used:
The xsd:annotation and xsd:documentation are used
for providing comments and other descriptions in the
XML document.
The attribute xml:lang of the xsd:documentation
element specifies the language being used, e.g., "en".
Slide 27- 19
XML Schema (contd.)
◦ Elements and types:
We specify the root element of our XML schema. In
XML schema, the name attribute of the xsd:element
tag specifies the element name, which is called
company for the root element in our example.
The structure of the company root element is an
xsd:complexType.
Slide 27- 20
XML Schema (contd.)
◦ First-level elements in the company database:
These elements are named employee, department, and
project, and each is specified in an xsd:element tag. If
a tag has only attributes and no further sub-elements
or data within it, it can be ended with the slash
symbol (/>) and is termed an empty element.
Slide 27- 21
XML Schema (contd.)
◦ Specifying element type and minimum and
maximum occurrences:
If we specify a type attribute in an xsd:element, this
means that the structure of the element will be
described separately, typically using the
xsd:complexType element. The minOccurs and
maxOccurs attributes are used for specifying lower and
upper bounds on the number of occurrences of an
element. The default is exactly one occurrence.
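The type attribute and the occurrence bounds can be read back from a schema with a short Python sketch. The schema fragment is a hypothetical reduction written for this illustration, not a copy of Figure 27.5, and occurrence_bounds is a helper name introduced here.

```python
import xml.etree.ElementTree as ET

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical fragment: three first-level elements written as empty elements.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="Department" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="employee" type="Employee" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="project" type="Project"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def occurrence_bounds(element_decl):
    """Return (min, max) for an xsd:element; None stands for 'unbounded'."""
    lower = int(element_decl.get("minOccurs", "1"))   # default is 1
    upper_text = element_decl.get("maxOccurs", "1")   # default is 1
    upper = None if upper_text == "unbounded" else int(upper_text)
    return lower, upper


root = ET.fromstring(SCHEMA)
decls = root.findall("xsd:element/xsd:complexType/xsd:sequence/xsd:element", NS)
bounds = {d.get("name"): (d.get("type"), occurrence_bounds(d)) for d in decls}

for name, (type_name, (lower, upper)) in bounds.items():
    print(name, type_name, lower, upper)
```

The project declaration carries neither attribute, so it falls back to the default of exactly one occurrence.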
Slide 27- 22
XML Schema (contd.)
◦ Specifying Keys:
For specifying primary keys, the tag xsd:key is used.
For specifying foreign keys, the tag xsd:keyref is used.
When specifying a foreign key, the attribute refer of the
xsd:keyref tag specifies the referenced primary key
whereas the tags xsd:selector and xsd:field specify the
referencing element type and foreign key.
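What these constraints require of a document can be sketched by checking them by hand in Python. This is not how a schema processor is implemented; it only mirrors the idea of a selector (which elements) and a field (which value inside each). The element names and data below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the second department has a dangling reference.
COMPANY = """<company>
  <employee><employeeSSN>123456789</employeeSSN></employee>
  <employee><employeeSSN>333445555</employeeSSN></employee>
  <department>
    <departmentName>Research</departmentName>
    <departmentManagerSSN>333445555</departmentManagerSSN>
  </department>
  <department>
    <departmentName>Ghost</departmentName>
    <departmentManagerSSN>000000000</departmentManagerSSN>
  </department>
</company>"""


def field_values(root, selector, field):
    """Collect the field value of every element matched by the selector."""
    return [node.findtext(field) for node in root.findall(selector)]


def is_valid_key(values):
    """A key must be present in every selected element and must be unique."""
    return None not in values and len(values) == len(set(values))


def dangling_references(key_values, referencing_values):
    """Foreign-key values that do not match any key value."""
    known = set(key_values)
    return [v for v in referencing_values if v is not None and v not in known]


root = ET.fromstring(COMPANY)
ssn_key = field_values(root, "employee", "employeeSSN")
manager_refs = field_values(root, "department", "departmentManagerSSN")

print(is_valid_key(ssn_key))
print(dangling_references(ssn_key, manager_refs))
```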
Slide 27- 23
XML Schema (contd.)
◦ Specifying the structures of complex elements via
complex types:
Complex elements in our example are Department,
Employee, Project, and Dependent, which use the tag
xsd:complexType. We specify each of these as a sequence
of subelements corresponding to the database attributes
of each entity type by using the xsd:sequence and
xsd:element tags of XML schema. Each element is given a
name and type via the attributes name and type of
xsd:element.
We can also specify minOccurs and maxOccurs attributes
if we need to change the default of exactly one
occurrence. For (optional) database attributes where null
is allowed, we need to specify minOccurs = 0, whereas
for multivalued database attributes we need to specify
maxOccurs = “unbounded” on the corresponding
element.
Slide 27- 24
XML Schema (contd.)
◦ Composite (compound) attributes:
Composite attributes from the ER schema are also
specified as complex types in the XML schema, as
illustrated by the Address, Name, Worker, and
WorksOn complex types. These could have been
directly embedded within their parent elements.
Slide 27- 25
XPath
◦ An XPath expression returns a collection of element
nodes that satisfy certain patterns specified in the
expression.
◦ The names in the XPath expression are node names
in the XML document tree that are either tag
(element) names or attribute names, possibly with
additional qualifier conditions to further restrict the
nodes that satisfy the pattern.
Slide 27- 26
XPath (contd.)
◦ There are two main separators when specifying a
path:
single slash (/) and double slash (//)
A single slash before a tag specifies that the tag must appear
as a direct child of the previous (parent) tag, whereas a
double slash specifies that the tag can appear as a
descendant of the previous tag at any level.
◦ It is customary to include the file name in any XPath
query, which lets us specify any local file name or
path name that locates the document.
◦ doc(www.company.com/info.XML)/company =>
COMPANY XML doc
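The two separators can be tried out in Python. The sketch assumes xml.etree.ElementTree, which supports only a limited subset of XPath and evaluates paths relative to an element rather than to a file, so the expressions below are approximations of full XPath. The document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; employeeName is a grandchild of company.
COMPANY = """<company>
  <department><departmentName>Research</departmentName></department>
  <employee><employeeName>Smith</employeeName></employee>
  <employee><employeeName>Wong</employeeName></employee>
</company>"""

root = ET.fromstring(COMPANY)  # root is the company element

# Single slash: only direct children of the current node are considered.
departments = root.findall("department")        # roughly /company/department
direct_names = root.findall("employeeName")     # roughly /company/employeeName

# Double slash: descendants at any level are considered.
names_anywhere = [e.text for e in root.findall(".//employeeName")]

print(len(departments), len(direct_names), names_anywhere)
```

The direct-child search for employeeName finds nothing because that element sits one level deeper, whereas the descendant search finds both names.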
Slide 27- 27
1. Returns the COMPANY root node and all its descendant
nodes, which means that it returns the whole XML document.
2. Returns all department nodes (elements) and their
descendant subtrees.
3. Returns all employeeName nodes that are direct children of
an employee node, such that the employee node has another
child element employeeSalary whose value is greater than
70000.
4. This returns the same result as the previous one except that
we specified the full path name in this example.
5. This returns all projectWorker nodes and their descendant
nodes that are children under a path /company/project and
that have a child node hours with value greater than 20.0
hours.
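The qualifier conditions described in items 3 and 5 can be imitated in Python. Since xml.etree.ElementTree does not evaluate numeric comparisons inside a path, the sketch applies the condition in ordinary code; it illustrates what the expressions return and is not XPath itself. The data and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data that follows the element names used in the descriptions.
COMPANY = """<company>
  <employee><employeeName>Smith</employeeName><employeeSalary>30000</employeeSalary></employee>
  <employee><employeeName>Borg</employeeName><employeeSalary>85000</employeeSalary></employee>
  <project>
    <projectWorker><ssn>123456789</ssn><hours>32.5</hours></projectWorker>
    <projectWorker><ssn>453453453</ssn><hours>7.5</hours></projectWorker>
  </project>
</company>"""


def names_with_salary_above(root, threshold):
    """employeeName children of employee nodes whose employeeSalary exceeds threshold."""
    result = []
    for employee in root.iter("employee"):  # employee nodes at any level
        salary = employee.findtext("employeeSalary")
        if salary is not None and float(salary) > threshold:
            result.extend(employee.findall("employeeName"))
    return result


def workers_with_hours_above(root, threshold):
    """projectWorker nodes under company/project whose hours child exceeds threshold."""
    return [
        worker
        for worker in root.findall("project/projectWorker")
        if worker.findtext("hours") is not None
        and float(worker.findtext("hours")) > threshold
    ]


root = ET.fromstring(COMPANY)
high_earners = [n.text for n in names_with_salary_above(root, 70000)]
busy_workers = [w.findtext("ssn") for w in workers_with_hours_above(root, 20.0)]
print(high_earners, busy_workers)
```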
Slide 27- 28
FIGURE 27.14
Some examples of XPath expressions on XML
documents that follow the XML schema file
COMPANY in FIGURE 27.5.
Slide 27- 29
XQuery
◦ XQuery uses XPath expressions, but has
additional constructs.
◦ XQuery permits the specification of more general
queries on one or more XML documents.
◦ The typical form of a query in XQuery is known as a
FLWR expression, which stands for the four main
clauses of XQuery and has the following form:
FOR <variable bindings to individual nodes (elements)>
LET <variable bindings to collections of nodes
(elements)>
WHERE <qualifier conditions>
RETURN <query result specification>
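The roles of the four clauses can be sketched with an analogy in Python. This is not XQuery; it only shows that FOR binds one node at a time, LET binds a whole collection, WHERE filters, and RETURN builds the result. The document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for the illustration.
COMPANY = """<company>
  <project><projectName>ProductX</projectName>
    <projectWorker><ssn>123456789</ssn></projectWorker>
    <projectWorker><ssn>453453453</ssn></projectWorker>
  </project>
  <project><projectName>ProductY</projectName>
    <projectWorker><ssn>333445555</ssn></projectWorker>
  </project>
</company>"""


def projects_with_min_workers(root, minimum):
    """Return (project name, worker count) for projects with enough workers."""
    results = []
    for project in root.findall("project"):          # FOR: one node per iteration
        workers = project.findall("projectWorker")   # LET: a collection of nodes
        if len(workers) >= minimum:                  # WHERE: qualifier condition
            results.append(                          # RETURN: result specification
                (project.findtext("projectName"), len(workers))
            )
    return results


root = ET.fromstring(COMPANY)
busy = projects_with_min_workers(root, 2)
print(busy)
```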
Slide 27- 30
1. This query retrieves the first and last names of employees
who earn more than 70000. The variable $x is bound to each
employeeName element that is a child of an employee
element, but only for employee elements that satisfy the
qualifier that their employeeSalary is greater than 70000.
2. This is an alternative way of retrieving the same elements
retrieved by the first query.
3. This query illustrates how a join operation can be performed
by having more than one variable. Here, the $x variable is
bound to each projectWorker element that is a child of
project number 5, whereas the $y variable is bound to each
employee element. The join condition matches SSN values in
order to retrieve the employee names.
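The join described in item 3 can be imitated in Python with two loop variables. Again this is an analogy rather than XQuery, and the element names (projectNumber, ssn, employeeSSN, firstName, lastName) and data are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only one employee works on project number 5.
COMPANY = """<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSSN>123456789</employeeSSN>
  </employee>
  <employee>
    <employeeName><firstName>Franklin</firstName><lastName>Wong</lastName></employeeName>
    <employeeSSN>333445555</employeeSSN>
  </employee>
  <project>
    <projectNumber>5</projectNumber>
    <projectWorker><ssn>123456789</ssn><hours>32.5</hours></projectWorker>
  </project>
  <project>
    <projectNumber>2</projectNumber>
    <projectWorker><ssn>333445555</ssn><hours>10.0</hours></projectWorker>
  </project>
</company>"""


def workers_on_project(root, number):
    """Join projectWorker nodes of one project with employee nodes on SSN."""
    return [
        (
            y.findtext("employeeName/firstName"),
            y.findtext("employeeName/lastName"),
            x.findtext("hours"),
        )
        for project in root.findall("project")
        if project.findtext("projectNumber") == number
        for x in project.findall("projectWorker")     # plays the role of $x
        for y in root.findall("employee")             # plays the role of $y
        if x.findtext("ssn") == y.findtext("employeeSSN")  # join condition
    ]


root = ET.fromstring(COMPANY)
joined = workers_on_project(root, "5")
print(joined)
```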
Slide 27- 31
Some examples of XQuery queries on XML
Slide 27- 32