XML Concepts Overview
By PenchalaRaju.Yanamala
You can import XML definitions into PowerCenter from the following file types:
XML file. An XML file contains data and metadata. An XML file can reference a
Document Type Definition file (DTD) or an XML schema definition (XSD) for
validation.
DTD file. A DTD file defines the element types, attributes, and entities in an
XML file. A DTD file places some constraints on the XML file structure, but it
does not contain any data.
XML schema. An XML schema defines elements, attributes, and type
definitions. Schemas contain simple and complex types. A simple type is an
XML element or attribute that contains text. A complex type is an XML element
that contains other elements and attributes.
Schemas support element, attribute, and substitution groups that you can
reference throughout a schema. Use substitution groups to substitute one
element with another in an XML instance document. Schemas also support
inheritance for elements, complex types, and element and attribute groups.
XML Files
XML files contain tags that identify data in the XML file, but not the format of the
data. The basic component of an XML file is an element. An XML element
includes an element start tag, element content, and element end tag. Every XML
file must have exactly one root element, with its start tag at the top of the
file and its end tag at the bottom. The root element encloses all the other
elements in the file. For example, the following XML file describes a book:
<book>
<title>Fun with XML</title>
<chapter>
<heading>Understanding XML</heading>
<heading>Using XML</heading>
</chapter>
<chapter>
<heading>Using DTD Files</heading>
<heading>Fun with Schemas</heading>
</chapter>
</book>
Book is the root element and it contains the title and chapter elements. Book is
the parent element of title and chapter, and chapter is the parent of heading. Title
and chapter are sibling elements because they have the same parent.
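As a minimal sketch of these relationships, the following Python snippet parses the book example with the standard library's xml.etree.ElementTree module and reads off the root element, its children, and the headings under each chapter. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The book example from above, indented for readability.
BOOK_XML = """
<book>
  <title>Fun with XML</title>
  <chapter>
    <heading>Understanding XML</heading>
    <heading>Using XML</heading>
  </chapter>
  <chapter>
    <heading>Using DTD Files</heading>
    <heading>Fun with Schemas</heading>
  </chapter>
</book>
"""

root = ET.fromstring(BOOK_XML.strip())

# The root element encloses every other element.
root_tag = root.tag

# title and chapter are siblings: they share the parent book.
child_tags = [child.tag for child in root]

# chapter is the parent of heading.
headings_by_chapter = [
    [heading.text for heading in chapter.findall("heading")]
    for chapter in root.findall("chapter")
]

print(root_tag)
print(child_tags)
print(headings_by_chapter)
```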
An element can have attributes that provide additional information about the
element. In the following example, the attribute graphic_type describes the
content of picture:
<picture graphic_type="gif">computer.gif</picture>
An XML file can contain the following kinds of elements:
Enclosure element. An element that contains other elements but does not
contain data. An enclosure element can include other enclosure elements.
Global element. An element that is a direct child of the root element. You can
reference global elements throughout an XML schema.
Leaf element. An element that does not contain other elements. A leaf element
is the lowest level element in the XML hierarchy.
Local element. An element that is nested in another element. You can
reference local elements only within the context of the parent element.
Multiple-occurring element. An element that occurs more than once within its
parent element. Enclosure elements can be multiple-occurring elements.
Parent chain. The succession of child-parent elements that traces the path
from an element to the root.
Parent element. An element that contains other elements.
Single-occurring element. An element that occurs once within its parent.
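The element kinds above can be approximated from a sample XML file. The sketch below labels each element path in the book example as leaf, parent, enclosure, single-occurring, or multiple-occurring. It inspects one instance document only, so it is an illustration of the definitions rather than how PowerCenter classifies elements; the function name describe_elements is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def describe_elements(root):
    """Map each element path to a set of labels based on one XML instance."""
    labels = {}

    def visit(elem, path):
        kinds = labels.setdefault(path, set())
        children = list(elem)
        has_text = bool((elem.text or "").strip())
        if not children:
            kinds.add("leaf")            # contains no other elements
        else:
            kinds.add("parent")          # contains other elements
            if not has_text:
                kinds.add("enclosure")   # contains elements but no data
        # Count how often each child tag occurs within this parent.
        counts = Counter(child.tag for child in children)
        for child in children:
            child_path = f"{path}/{child.tag}"
            occurrence = (
                "multiple-occurring" if counts[child.tag] > 1 else "single-occurring"
            )
            labels.setdefault(child_path, set()).add(occurrence)
            visit(child, child_path)

    visit(root, f"/{root.tag}")
    # If a path repeats under any parent instance, treat it as multiple-occurring.
    for kinds in labels.values():
        if "multiple-occurring" in kinds:
            kinds.discard("single-occurring")
    return labels

BOOK_XML = (
    "<book><title>Fun with XML</title>"
    "<chapter><heading>Understanding XML</heading><heading>Using XML</heading></chapter>"
    "<chapter><heading>Using DTD Files</heading><heading>Fun with Schemas</heading></chapter>"
    "</book>"
)

labels = describe_elements(ET.fromstring(BOOK_XML))
for path, kinds in labels.items():
    print(path, sorted(kinds))
```

In this sample, chapter is both an enclosure element and a multiple-occurring element, which matches the note above that enclosure elements can occur more than once.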
Validating XML Files with a DTD or Schema
A valid XML file conforms to the structure of an associated DTD or schema file.
To reference the location and name of a DTD file, use the DOCTYPE declaration
in an XML file. The DOCTYPE declaration also names the root element for the
XML file.
For example, the following XML file references the location of the note.dtd file:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://www.w3schools.com/dtd/note.dtd">
<note>
<body>XML Data</body>
</note>
The following XML file references the note.xsd schema in an external location:
<?xml version="1.0"?>
</note>
Unicode Encoding
The XML declaration of an XML file can contain an encoding attribute that
indicates the code page of the file. The most common encodings are UTF-8 and
UTF-16. UTF-8 represents a character with one to four bytes, depending on the
Unicode symbol. UTF-16 represents a character with one or two 16-bit code units.
The following example shows a UTF-8 encoding attribute:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<body>XML Data</body>
</note>
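The size difference between the two encodings can be checked directly. The following sketch encodes four sample characters, chosen here only for illustration, and records how many bytes each one needs in UTF-8 and in UTF-16.

```python
# Sample characters: ASCII letter, Latin-1 letter, euro sign, emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

byte_lengths = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-be" omits the byte order mark so only the character is counted.
    utf16_len = len(ch.encode("utf-16-be"))
    byte_lengths[ch] = (utf8_len, utf16_len)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Characters outside the Basic Multilingual Plane, such as the last sample, take two 16-bit code units (four bytes) in UTF-16.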
DTD Attributes
The element name is product. The attribute is product_name. The attribute has a
default value, vacuum.
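A declaration matching this description might look like the ATTLIST line embedded in the sketch below. The exact declaration, the CDATA attribute type, and the sample element content are assumptions made for illustration. The sketch uses Python's xml.etree.ElementTree, whose underlying parser reads the internal DTD subset and supplies declared default values but does not validate the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD is embedded as an internal subset so the
# parser can see the attribute declaration.
DOC = """<?xml version="1.0"?>
<!DOCTYPE product [
  <!ELEMENT product (#PCDATA)>
  <!ATTLIST product product_name CDATA "vacuum">
]>
<product>model 100</product>
"""

root = ET.fromstring(DOC)

# The product element omits product_name, so the declared default applies.
default_name = root.get("product_name")
print(default_name)
```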
XML Schema Files
An XML schema is a document that defines the valid content of XML files. An
XML schema file, like a DTD file, contains only metadata. An XML schema
defines the structure and type of elements and attributes for an associated XML
file. When you use a schema to define an XML file, you can restrict data, define
data formats, and convert data between datatypes. XML schemas support
complex types and inheritance between types. They also provide a way to
specify element and attribute groups, ANY content, and circular references.
Cardinality
Absolute Cardinality
The absolute cardinality of an element is the number of times the element
occurs within its parent element. For example, an element has an absolute
cardinality of once (1) if the element occurs once within its parent element.
However, the element might occur many times within an XML hierarchy if the
parent element has a cardinality of one or more (+).
Table 1-2 describes how DTD and XML schema files represent cardinality:
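For orientation, the standard DTD occurrence indicators and the XML schema minOccurs and maxOccurs attributes correspond as sketched below. The mapping follows the DTD and XML Schema specifications rather than the contents of Table 1-2, and the names DTD_TO_XSD and xsd_occurs are hypothetical.

```python
# DTD occurrence indicator -> (minOccurs, maxOccurs) in an XML schema.
DTD_TO_XSD = {
    "": ("1", "1"),            # no indicator: exactly once
    "?": ("0", "1"),           # zero or once
    "*": ("0", "unbounded"),   # zero or more
    "+": ("1", "unbounded"),   # one or more
}

def xsd_occurs(dtd_indicator):
    """Return the schema occurrence attributes for a DTD indicator."""
    min_occurs, max_occurs = DTD_TO_XSD[dtd_indicator]
    return f'minOccurs="{min_occurs}" maxOccurs="{max_occurs}"'

for indicator in ["", "?", "*", "+"]:
    print(repr(indicator), "->", xsd_occurs(indicator))
```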
Relative Cardinality
XPath uses a slash (/) to distinguish between elements in the hierarchy. XML
attributes are preceded by “@” in the XPath.
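To make the notation concrete, the sketch below walks a small XML document and writes out a path for every element and attribute, using a slash between hierarchy levels and "@" before attribute names. The sample document combines the title and picture elements from earlier examples under one root, which is an assumption for illustration, and list_xpaths is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def list_xpaths(elem, prefix=""):
    """Return XPath-style paths for an element, its attributes, and its descendants."""
    path = f"{prefix}/{elem.tag}"
    paths = [path]
    # Attributes are written as @name under the owning element's path.
    for name in elem.attrib:
        paths.append(f"{path}/@{name}")
    for child in elem:
        paths.extend(list_xpaths(child, path))
    return paths

DOC = (
    "<book>"
    "<title>Fun with XML</title>"
    '<picture graphic_type="gif">computer.gif</picture>'
    "</book>"
)

paths = list_xpaths(ET.fromstring(DOC))
for p in paths:
    print(p)
```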
Using XML with PowerCenter Overview
You can create an XML definition in PowerCenter from an XML file, DTD file,
XML schema, flat file definition, or relational table definition. When you create an
XML definition, the Designer extracts XML metadata and creates a schema in the
repository. The schema provides the structure from which you edit and validate
the XML definition.
An XML definition can contain multiple groups. In an XML definition, groups are
called views. The relationship between elements in the XML hierarchy defines
the relationship between the views. When you create an XML definition, the
Designer creates views for multiple-occurring elements and complex types in a
schema by default. The relative cardinality of elements in an XML hierarchy
affects how PowerCenter creates views in an XML definition. Relative cardinality
determines if elements can be part of the same view.
When you create an XML definition, you can create a hierarchical model or an
entity relationship model of the XML data. When you create a hierarchical model,
you create a normalized or denormalized hierarchy. A normalized hierarchy
contains separate views for multiple-occurring elements. A denormalized
hierarchy has one view with duplicate data for multiple-occurring elements.
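The difference between the two hierarchies can be sketched with a small sample. The XML data, the tuple layout, and the generated integer keys below are all assumptions for illustration; they show the shape of the views only and do not reflect how PowerCenter generates keys or columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: Phone is a multiple-occurring element.
EMPLOYEES_XML = """
<Employees>
  <Employee>
    <LastName>Smith</LastName>
    <Phone>555-0100</Phone>
    <Phone>555-0101</Phone>
  </Employee>
  <Employee>
    <LastName>Jones</LastName>
    <Phone>555-0102</Phone>
  </Employee>
</Employees>
"""

root = ET.fromstring(EMPLOYEES_XML.strip())

# Denormalized: one view, with LastName repeated for every Phone.
denormalized = [
    (emp.findtext("LastName"), phone.text)
    for emp in root.findall("Employee")
    for phone in emp.findall("Phone")
]

# Normalized: a separate view for the multiple-occurring Phone element,
# linked back to the Employee view by an illustrative key.
employee_view = []
phone_view = []
for key, emp in enumerate(root.findall("Employee"), start=1):
    employee_view.append((key, emp.findtext("LastName")))
    for phone in emp.findall("Phone"):
        phone_view.append((key, phone.text))

print(denormalized)
print(employee_view)
print(phone_view)
```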
If you create an entity model, the Designer creates views for complex types and
multiple-occurring elements. The Designer creates an XML definition that models
the inheritance and circular relationships the schema provides.
Importing XML Metadata
When you import an XML definition, the Designer creates a schema in the
repository for the definition. The repository schema provides the structure from
which you edit and validate the XML definition.
You can import XML metadata from the following sources:
XML files
DTD files
XML schema files
Relational tables
Flat files
In an XML file, a pair of tags marks the beginning and end of each data element.
These tags are the basis for the metadata that PowerCenter extracts from the
XML file. If you import an XML file without an associated DTD or XML schema,
the Designer reads the XML tags to determine the elements, their possible
occurrences, and their position in the hierarchy. The Designer checks the data
within the element tags and assigns a datatype depending on the data
representation. You can change the datatypes for these elements in the XML
definition.
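The idea of assigning a datatype from the data representation can be illustrated with a simple guesser. This is a sketch only: the rules, the datatype names, the sample document, and the function guess_datatype are assumptions, not the Designer's actual inference logic.

```python
import xml.etree.ElementTree as ET

def guess_datatype(text):
    """Guess a datatype name from the text content of an element."""
    value = (text or "").strip()
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"

# Hypothetical sample with no associated DTD or schema.
DOC = "<item><id>42</id><price>19.99</price><name>vacuum</name></item>"

guessed = {child.tag: guess_datatype(child.text) for child in ET.fromstring(DOC)}
print(guessed)
```

A guess made from one sample file can be wrong for other data, which is why being able to change the datatype in the XML definition matters.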
Figure 2-1 shows a sample XML file. The root element is Employees. Employee
is a multiple-occurring element. The Employee element contains the LastName,
FirstName, and Address elements. The Employee element also contains the
multiple-occurring elements Phone and Email.