XML Integration
About SQL Server 2005's support for XML features, including unified storage for XML and
relational data, a new XML data type that supports native XML queries as well as strong data
typing by associating the XML data type to an XSD, and more.
XML (eXtensible Markup Language) has emerged as one of the most important Internet
technologies. XML’s flexible text-based structure enables it to be used for an incredibly wide array
of network tasks, including data/document transfer, web page rendering, and even as a transport
for interapplication Remote Procedure Calls (RPC) via SOAP (Simple Object Access Protocol).
XML has truly become the lingua franca of computer languages.
Microsoft first added support for XML to SQL Server 2000, starting with support for the FOR XML
clause as part of the SELECT statement and the OpenXML function. As XML continued to grow
rapidly in enterprise acceptance and usage, Microsoft quickly provided additional functionality by
producing a series of web releases. SQL for XML 1.0 added support for UpdateGrams,
Templates, and BulkLoad to the base SQL Server 2000 release. The next two web releases,
SQLXML 2.0 and SQLXML 3.0, further enhanced the SQL Server 2000 product by adding
support for XML Views and SOAP in addition to several other new capabilities. While SQL Server
2000’s support for XML provided a great starting point for integrating hierarchical XML documents
with SQL Server’s relational data, it had some limitations. Once the XML data was stored in a
SQL Server database using either the Text or Image data type, there was little that you could do
with it. SQL Server 2000 was unable to natively query the hierarchical data that made up the XML
document without using complex T-SQL or client-side code.
SQL Server 2005 builds on this starting point by adding support for many new XML features. At a
high level, SQL Server 2005 provides a new level of unified storage for XML and relational data.
SQL Server 2005 adds a new XML data type that provides support for both native XML queries
and strong data typing by associating the XML data type with an XSD (XML Schema
Definition). In addition, it provides bidirectional mapping between relational data and XML data.
The XML support is well integrated into the SQL Server 2005 relational database engine, as it
provides support for triggers on XML, replication of XML data, and bulk load of XML data, as well
as enhanced support for data access via SOAP and many other enhancements. In this chapter
you’ll get an introduction to the most important new XML features provided by SQL Server 2005.
TCS Public
<DocumentText>Text</DocumentText>
</MyXMLDoc>')
NOTE: One important point to notice here is that because the XML data is untyped, any valid XML
document can be inserted into the XML data type.
The following listing shows a sample XSD schema for the simple XML document that was used in
the preceding example:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" targetNamespace="MyXMLDocSchema"
xmlns="MyXMLDocSchema">
<xs:element name="MyXMLDoc">
<xs:complexType>
<xs:sequence>
<xs:element name="DocumentID" type="xs:string" />
<xs:element name="DocumentBody" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
This XSD schema uses the namespace of MyXMLDocSchema and defines an XML document
that has a complex element named MyXMLDoc. The MyXMLDoc complex element contains two
simple elements. The first simple element must be named DocumentID, and a second simple
element is named DocumentBody. Both elements contain XML string-type data.
To create a strongly typed XML column or variable, you first need to register the XSD schema
with SQL Server using the CREATE XML SCHEMA COLLECTION T-SQL DDL statement. The following
listing shows how you combine the CREATE XML SCHEMA COLLECTION statement with the sample
MyXMLDocSchema to register the schema with the SQL Server 2005 database:
CREATE XML SCHEMA COLLECTION MyXMLDocSchema AS
N'<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" targetNamespace="http://MyXMLDocSchema">
<xs:element name="MyXMLDoc">
<xs:complexType>
<xs:sequence>
<xs:element name="DocumentID" type="xs:string" />
<xs:element name="DocumentBody" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>'
The CREATE XML SCHEMA COLLECTION DDL statement takes a single argument that names
the collection. Next, after the AS clause it expects a valid XSD schema enclosed in single quotes.
If the schema is not valid, an error will be issued when the statement is executed. The CREATE
XML SCHEMA COLLECTION statement is database specific, and the schema that is registered
can be accessed only in the database for which the schema is registered.
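One way to confirm that the registration succeeded is to look the collection up in the catalog. The following is a minimal sketch, assuming the collection was created as MyXMLDocSchema in the dbo schema of the current database; it relies on the sys.xml_schema_collections catalog view and the xml_schema_namespace function as they appear in the released SQL Server 2005 product, which may differ from a prerelease build.

```sql
-- List the registered collection (returns no rows if it was not created).
SELECT name, create_date
FROM sys.xml_schema_collections
WHERE name = N'MyXMLDocSchema';

-- Reconstruct the XSD held in the collection as an xml value.
-- Assumption: the collection lives in the dbo relational schema.
SELECT xml_schema_namespace(N'dbo', N'MyXMLDocSchema');
```

Because the collection is scoped to a single database, running the same queries from another database should return no rows for this name.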
Once you’ve registered the XML schema with SQL Server 2005, you can go ahead and associate
XML variables and columns with that schema. Doing so ensures that any XML documents that
are contained in those variables or columns will adhere to the definition provided by the
associated schema. The following example illustrates how you can create a table that uses a
strongly typed XML column:
CREATE TABLE MyXMLDocs
(DocID INT PRIMARY KEY IDENTITY,
MyXmlDoc XML(MyXMLDocSchema))
Here you can see that the MyXMLDocs table is created using the CREATE TABLE statement
much as in the preceding example. In this case, however, the MyXMLDoc column is created
using an argument that specifies the name of the registered XSD schema definition. If you refer
to the earlier listing, you can see that the schema was registered using the name
MyXMLDocSchema. After the MyXMLDoc column has been associated with the schema that was
registered, any data that’s inserted into this column will be strongly typed according to the
schema definition and any attempt to insert data that doesn’t match the schema definition will be
rejected. The following listing illustrates an INSERT statement that can add data to the strongly
typed MyXMLDoc column:
INSERT INTO MyXMLDocs Values
('<MyXMLDoc xmlns="http://MyXMLDocSchema">
<DocumentID>1</DocumentID>
<DocumentBody>"My text"</DocumentBody>
</MyXMLDoc>')
NOTE: Because this example uses a typed XML data type, the data must conform to the definition
provided by the associated XSD schema.
In this case, the XML document must reference the associated XML namespace
http://MyXMLDocSchema, and it must contain a complex element named
MyXMLDoc, which in turn contains the DocumentID and DocumentBody elements. The SQL
Server engine will reject any attempt to insert any other XML document into the MyXMLDoc
column. If the data does not conform to the supplied XSD schema, SQL Server will return an
error message like the one shown in the following listing:
Msg 6965, Level 16, State 1, Line 1
XML Validation: Invalid content,expected
element(s):MyXMLDocSchema:DocumentID where element 'MyXMLDocSchema:Do' was
specified
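A rejected document can also be handled in code rather than left to abort the batch. The sketch below is illustrative only: it assumes the MyXMLDocs table and the MyXMLDocSchema collection from the earlier listings, assumes the target namespace http://MyXMLDocSchema, and uses the TRY...CATCH construct that is new in SQL Server 2005. The variable name @bad is hypothetical.

```sql
-- Assumes MyXMLDocs and the MyXMLDocSchema collection already exist.
DECLARE @bad nvarchar(max);

-- DocumentID is missing, so the document violates the sequence
-- defined in the schema.
SET @bad = N'<MyXMLDoc xmlns="http://MyXMLDocSchema">
  <DocumentBody>No DocumentID element</DocumentBody>
</MyXMLDoc>';

BEGIN TRY
    -- The string is validated when it is converted to the typed XML column.
    INSERT INTO MyXMLDocs (MyXmlDoc) VALUES (@bad);
END TRY
BEGIN CATCH
    -- Report the validation failure instead of ending the batch.
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```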
NOTE: As you might expect from their dependent relationship, if you assign a schema to a column
in a table, that table must be altered or dropped before that schema definition can be updated.
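The dependency described in the note suggests an order of operations when a schema has to change. The following is a rough sketch rather than a recommended procedure: it assumes no XML indexes exist on the column, assumes the namespace http://MyXMLDocSchema, and introduces a hypothetical optional DocumentAuthor element purely to have something to change.

```sql
-- 1. Detach the column from the collection by making it untyped.
ALTER TABLE MyXMLDocs ALTER COLUMN MyXmlDoc XML;

-- 2. Nothing references the collection now, so it can be dropped.
DROP XML SCHEMA COLLECTION MyXMLDocSchema;

-- 3. Register the revised schema. DocumentAuthor is a hypothetical,
--    optional element, so documents stored earlier remain valid.
CREATE XML SCHEMA COLLECTION MyXMLDocSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified" targetNamespace="http://MyXMLDocSchema">
  <xs:element name="MyXMLDoc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="DocumentID" type="xs:string" />
        <xs:element name="DocumentBody" type="xs:string" />
        <xs:element name="DocumentAuthor" type="xs:string" minOccurs="0" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- 4. Re-type the column; existing rows are validated against the new schema.
ALTER TABLE MyXMLDocs ALTER COLUMN MyXmlDoc XML(MyXMLDocSchema);
```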
<xsd:complexType><xsd:complexContent><xsd:restriction base="xsd:any
(1 row(s) affected)
The elementname attribute lists the columns that use the typed XML data type.
NOTE: While this example illustrates performing a replace operation, the Modify method also
supports insert and delete operations.
(2 row(s) affected)
Value(XQuery, SQLType)
The Value method enables the extraction of scalar values from an XML data type. You can see
an example of how the XML data type’s Value method is used in the following listing:
SELECT MyXMLDoc.value('declare namespace xd="http://MyXMLDocSchema";
(/xd:MyXMLDoc/xd:DocumentID)[1]', 'int') AS ID
FROM MyXMLDocs
Unlike the other XML data type methods, the XML Value method requires two parameters. The
first parameter is an XQuery expression, and the second parameter specifies the SQL data type
that will hold the scalar value returned by the Value method. This example returns all of the
values contained in the DocumentID element and converts them to the int data type, as shown in
the following results:
ID
-----------
1
2
(2 row(s) affected)
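The Value method works the same way on XML variables, which makes it easy to experiment without a table. The sketch below assumes an untyped variable named @doc (a hypothetical name) and the namespace http://MyXMLDocSchema; it extracts two scalar values with two different SQL data types.

```sql
DECLARE @doc xml;
SET @doc = N'<MyXMLDoc xmlns="http://MyXMLDocSchema">
  <DocumentID>7</DocumentID>
  <DocumentBody>Sample body</DocumentBody>
</MyXMLDoc>';

-- Each call takes an XQuery expression that yields a single node
-- (hence the [1] predicate) and the SQL type to convert it to.
SELECT
    @doc.value('declare namespace xd="http://MyXMLDocSchema";
                (/xd:MyXMLDoc/xd:DocumentID)[1]', 'int') AS ID,
    @doc.value('declare namespace xd="http://MyXMLDocSchema";
                (/xd:MyXMLDoc/xd:DocumentBody)[1]', 'varchar(50)') AS Body;
```

The [1] predicate matters: the method expects the expression to return at most one value, so a path that could match several nodes has to be narrowed explicitly.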
XQUERY SUPPORT
In the previous section you saw how XQuery is used in the new XML data type’s methods.
XQuery is based on the XPath language created by the W3C (www.w3c.org) for querying XML
data. XQuery extends the XPath language by adding the ability to update data as well as support
for better iteration and sorting of results. At the time of this writing, the XQuery language used by
SQL Server 2005 is an early implementation based on a working draft of the XQuery standard
submitted to the W3C, so it’s possible that some implementation details could change before SQL
Server 2005 is officially released. A description of the XQuery language is beyond the scope of
this book, but for more details about the W3C XQuery standard you can refer to
http://www.w3.org/XML/Query. The SQL Server 2005 Books Online also has an introduction to
the XQuery language.
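To give a flavor of the iteration that XQuery adds on top of XPath, here is a small sketch using the for, where, and return clauses inside the XML data type's query method. The sample document and the Match element name are made up for illustration, and because the implementation was based on a working draft, the exact syntax accepted by a given build may differ.

```sql
DECLARE @docs xml;
SET @docs = N'<Docs>
  <Doc id="1"><Title>First</Title></Doc>
  <Doc id="2"><Title>Second</Title></Doc>
</Docs>';

-- Iterate over the Doc elements, keep those whose id attribute
-- equals "2", and construct a new element from each match.
SELECT @docs.query('
  for $d in /Docs/Doc
  where $d/@id = "2"
  return <Match>{ data($d/Title) }</Match>');
```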
XML INDEXES
The XML data type supports a maximum of 2GB of storage, which is quite large. The size of the
XML data and its usage can have a big impact on the performance the system can achieve while
querying the XML data. To improve the performance of XML queries, SQL Server 2005 provides
the ability to create indexes over the columns that have the XML data type.
In addition to the primary index, you can also build secondary XML indexes. Secondary indexes
are built on one of the following document attributes:
Path The document path is used to build the index.
Value The document values are used to build the index.
Property The document’s properties are used to build the index.
Secondary indexes are always partitioned in the same way as the primary XML index. The
following listing shows the creation of a secondary-path XML index:
CREATE XML INDEX My2ndXMLDocsIdx ON MyXMLDocs(MyXMLDoc)
USING XML INDEX MyXMLDocsIdx FOR PATH
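The other two secondary index types follow the same pattern. The sketch below assumes the primary XML index is named MyXMLDocsIdx, as in the listing above, and creates it only if it is missing; the names MyXMLDocsValueIdx and MyXMLDocsPropertyIdx are hypothetical. A primary XML index requires the table to have a clustered primary key, which MyXMLDocs has on DocID.

```sql
-- Create the primary XML index only if it does not exist yet.
IF NOT EXISTS (SELECT * FROM sys.xml_indexes WHERE name = N'MyXMLDocsIdx')
    CREATE PRIMARY XML INDEX MyXMLDocsIdx ON MyXMLDocs(MyXMLDoc);

-- Secondary index built on the document values.
CREATE XML INDEX MyXMLDocsValueIdx ON MyXMLDocs(MyXMLDoc)
    USING XML INDEX MyXMLDocsIdx FOR VALUE;

-- Secondary index built on the document properties.
CREATE XML INDEX MyXMLDocsPropertyIdx ON MyXMLDocs(MyXMLDoc)
    USING XML INDEX MyXMLDocsIdx FOR PROPERTY;

-- List the XML indexes on the table and the type of each secondary index.
SELECT name, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID(N'MyXMLDocs');
```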
Type Directive
When XML data types are returned using the FOR XML clause’s Type directive, they are not
serialized. Instead the results are returned as an XML data type. You can see an example of
using the FOR XML clause with the XML Type directive here:
SELECT DocID, MyXMLDoc FROM MyXMLDocs
WHERE DocID=1 FOR XML AUTO, TYPE
This query returns the relational DocID column along with the MyXMLDoc XML data type column.
It uses the FOR XML AUTO clause to return the results as XML. The TYPE directive specifies
that the results will be returned as an XML data type. You can see the results of using the Type
directive here:
<MyXMLDocs DocID="1">
<MyXMLDoc>
<MyXMLDoc xmlns="MyXMLDocSchema">
<DocumentID>1</DocumentID>
<DocumentBody>My New Body</DocumentBody>
</MyXMLDoc>
</MyXMLDoc>
</MyXMLDocs>
NOTE: The Type directive returns the XML data type as a continuous stream. I added the
formatting to the previous listing to make it more readable.
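Because the Type directive yields a value of the XML data type, the result can be assigned to an XML variable and queried further on the server. The following is a minimal sketch, assuming the MyXMLDocs table from the earlier listings and a hypothetical variable named @result.

```sql
DECLARE @result xml;

-- Without TYPE the result would be text; with TYPE it is an xml
-- value that can be stored in a variable.
SET @result = (SELECT DocID
               FROM MyXMLDocs
               WHERE DocID = 1
               FOR XML AUTO, TYPE);

-- Apply an XML data type method to the FOR XML result.
SELECT @result.value('(/MyXMLDocs/@DocID)[1]', 'int') AS DocID;
```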
SQL Server 2000 was limited to using the FOR XML clause in the top level of a query.
Subqueries couldn’t make use of the FOR XML clause. SQL Server 2005 adds the ability to use
nested FOR XML queries. Nested queries are useful for returning multiple items where there is a
parent-child relationship. One example of this type of relationship might be order header and
order details records; another might be product categories and subcategories. You can see an
example of using a nested FOR XML clause in the following listing:
SELECT DocID, MyXMLDoc,
(SELECT MyXMLDoc
FROM MyXMLDocs2
WHERE MyXMLDocs2.DocID = MyXMLDocs.DocID
FOR XML AUTO, TYPE)
FROM MyXMLDocs Where DocID = 2 FOR XML AUTO, TYPE
In this example the outer query on table MyXMLDocs is combined with a subquery on the table
MyXMLDocs2 (for this example, a simple duplicate of the MyXMLDocs table). The important thing
to notice in this listing is SQL Server 2005’s ability to use the FOR XML clause in the subquery. In
this case the subquery is using the Type directive to return the results as a native XML data type.
If the Type directive were not used, then the results would be returned as an nvarchar data type
and the XML data would be entitized. You can see the results of the nested FOR XML query
shown in the listing that follows:
<MyXMLDocs DocID="2">
<MyXMLDoc>
<MyXMLDoc xmlns="MyXMLDocSchema">
<DocumentID>1</DocumentID>
<DocumentBody>"My text"</DocumentBody>
</MyXMLDoc>
</MyXMLDoc>
</MyXMLDocs>
NOTE: I added the formatting to the previous listing to make it more readable.
Another new feature in SQL Server 2005’s FOR XML support is the ability to generate an XSD
schema by adding the XMLSCHEMA directive to the FOR XML clause. You can see an example
of using the new XMLSCHEMA directive in the following listing:
SELECT MyXMLDoc FROM MyXMLDocs WHERE DocID=1 FOR XML AUTO, XMLSCHEMA
In this case, because the XMLSCHEMA directive has been added to the FOR XML clause the
query will generate and return the schema that defines the specific XML column along with the
XML result from the selected column. The XMLSCHEMA directive works only with the FOR XML
AUTO and FOR XML RAW modes. It cannot be used with the FOR XML EXPLICIT mode. If the
XMLSCHEMA directive is used with a nested query, it can be used only at the top level of the
query. The XSD schema that’s generated from this query is shown in the following listing:
<xsd:import namespace="http://MyXMLDocSchema" />
<xsd:element name="MyXMLDocs">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="MyXMLDoc" minOccurs="0">
<xsd:complexType sqltypes:xmlSchemaCollection="[tecadb].[dbo].
[MyXMLDocSchema]">
<xsd:complexContent>
<xsd:restriction base="sqltypes:xml">
<xsd:sequence>
<xsd:any processContents="strict"
namespace="http://MyXMLDocSchema" />
</xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<MyXMLDocs xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
<MyXMLDoc>
<MyXMLDoc xmlns="http://MyXMLDocSchema">
<DocumentID>1</DocumentID>
<DocumentBody>My New Body</DocumentBody>
</MyXMLDoc>
</MyXMLDoc>
</MyXMLDocs>
The XMLSCHEMA directive can return multiple schemas, but it always returns at least
two: one schema is returned for the SqlTypes namespace, and a second schema is
returned that describes the results of the FOR XML query. In the preceding listing
you can see the schema description of the XML data type column beginning at
<xsd:element name="MyXMLDocs">. Next, the XML results can be seen at the line
starting with <MyXMLDocs xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">.
NOTE: You can also generate an XDR (XML Data Reduced) schema by using the XMLDATA
directive in combination with the FOR XML clause.
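The two schema directives can be compared side by side with a pair of short queries. This is only a sketch: it selects just the relational DocID column to keep the output small, and the target namespace urn:example-com:docs passed to XMLSCHEMA is a made-up value used for illustration.

```sql
-- XSD schema with FOR XML RAW; the optional argument supplies the
-- target namespace used in place of the generated SqlRowSet name.
SELECT DocID
FROM MyXMLDocs
WHERE DocID = 1
FOR XML RAW, XMLSCHEMA('urn:example-com:docs');

-- XDR schema for the same rows, using the XMLDATA directive.
SELECT DocID
FROM MyXMLDocs
WHERE DocID = 1
FOR XML AUTO, XMLDATA;
```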
OPENXML ENHANCEMENTS
The FOR XML clause is great for retrieving XML from the SQL Server 2005 database. The FOR
XML clause essentially creates an XML document from relational data. The OPENXML keyword
is the counterpart to the FOR XML clause. The OPENXML function provides a relational rowset
over an XML document. To use SQL Server’s OPENXML functionality, you must first call the
sp_xml_preparedocument stored procedure, which parses the XML document using the XML
Document Object Model (DOM) and returns a handle to OPENXML. OPENXML then provides a
rowset view of the parsed XML document. When you are finished working with the document, you
then call the sp_xml_removedocument stored procedure to release the system resources
consumed by OPENXML and the XML DOM. You can see an overview of this process in Figure
7-1.
With SQL Server 2005 the OPENXML support has been extended to include support for the new
XML data type and the new user-defined data types. The following example shows how you can
use OPENXML in conjunction with a WITH clause:
DECLARE @hdocument int
DECLARE @doc varchar(1000)
SET @doc ='<MyXMLDoc>
<DocumentID>1</DocumentID>
<DocumentBody>"OPENXML Example"</DocumentBody>
</MyXMLDoc>'
EXEC sp_xml_preparedocument @hdocument OUTPUT, @doc
SELECT * FROM OPENXML (@hdocument, '/MyXMLDoc', 10)
WITH (DocumentID varchar(4),
DocumentBody varchar(50))
EXEC sp_xml_removedocument @hdocument
At the top of this listing you can see where two variables are declared. The @hdocument variable
will be used to store the XML document handle returned by the sp_xml_preparedocument stored
procedure, while the @doc variable will contain the sample XML document itself. Next, the
sp_xml_preparedocument stored procedure is executed and passed the two variables. The
sp_xml_preparedocument stored procedure uses XML DOM to parse the XML document and
then returns a handle to the parsed document in the @hdocument variable. That document
handle is then passed to the OPENXML keyword used in the SELECT statement.
The first parameter used by OPENXML is the document handle contained in the @hdocument
variable. The second parameter is an XPath pattern that specifies the nodes in the XML
document that will construct the relational rowset. The third parameter specifies the type of
XML-to-relational mapping that will be performed. A value of 2 indicates that element-centric
mapping will be used, while a value of 1 would indicate attribute-centric mapping. The value of
10 used in this example combines 2 (element-centric mapping) with 8, a flag indicating that
consumed data should not be copied to the overflow property. The WITH clause provides the
format of the rowset that’s returned. In this example,
the WITH clause specifies that the returned rowset will consist of two varchar columns named
DocumentID and DocumentBody. While this example shows the rowset names matching the XML
elements, that’s not a requirement. Finally, the sp_xml_removedocument stored procedure is
executed to release the system resources.
This SELECT statement using the OPENXML feature will return a rowset that consists of the
element values from the XML document. You can see the results of using OPENXML in the
following listing:
DocumentID DocumentBody
---------- --------------------------------------------------
1 "OPENXML Example"
(1 row(s) affected)
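Two points from the discussion above can be shown in a short variation: the document can be held in a variable of the new XML data type, and the rowset column names do not have to match the element names. The sketch below assumes hypothetical column names ID and Body, each mapped to an element through an optional column pattern in the WITH clause.

```sql
DECLARE @hdocument int;
DECLARE @doc xml;   -- the new XML data type instead of varchar

SET @doc = N'<MyXMLDoc>
  <DocumentID>1</DocumentID>
  <DocumentBody>OPENXML with an xml variable</DocumentBody>
</MyXMLDoc>';

-- Parse the document and obtain a handle to it.
EXEC sp_xml_preparedocument @hdocument OUTPUT, @doc;

-- Flag 2 requests element-centric mapping. The quoted patterns map
-- each rowset column to a differently named element.
SELECT *
FROM OPENXML(@hdocument, '/MyXMLDoc', 2)
WITH (ID   int         'DocumentID',
      Body varchar(50) 'DocumentBody');

-- Release the memory held by the parsed document.
EXEC sp_xml_removedocument @hdocument;
```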
by web services without requiring an IIS system to act as an intermediary. Using the native HTTP
SOAP support, you can create web services that are capable of executing T-SQL batches, stored
procedures, and user-defined scalar functions. To ensure a high level of default security, native
HTTP access is turned off by default. However, you can enable HTTP support by first creating an
HTTP endpoint. You can see an example of the code to create an HTTP endpoint in the following
listing:
CREATE ENDPOINT MyHTTPEndpoint
STATE = STARTED
AS HTTP(
PATH = '/sql',
AUTHENTICATION = (INTEGRATED ),
PORTS = ( CLEAR ),
SITE = 'server'
)
FOR SOAP (
WEBMETHOD 'http://tempUri.org/'.'GetProductName'
(name='AdventureWorks.dbo.GetProductName',
schema=STANDARD ),
BATCHES = ENABLED,
WSDL = DEFAULT,
DATABASE = 'AdventureWorks',
NAMESPACE = 'http://AdventureWorks/Products'
)
This example illustrates creating an HTTP endpoint named MyHTTPEndpoint for the stored
procedure named GetProductName in the sample AdventureWorks database. Once the HTTP
endpoint is created, it can be accessed via a SOAP request issued by an application. You can
use the ALTER ENDPOINT and DROP ENDPOINT DDL statements to manage SQL Server’s
HTTP endpoints. The new HTTP endpoints are also able to provide data stream encryption using
SSL. More information about SQL Server’s new HTTP support can be found in Chapter 2.
The following command shows how to list the HTTP endpoints that have been created:
select * from sys.http_endpoints
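The ALTER ENDPOINT and DROP ENDPOINT statements mentioned above can be sketched as follows, assuming the MyHTTPEndpoint endpoint created in the earlier listing. The statements are shown together for brevity; in practice you would stop or drop an endpoint only when it is no longer needed.

```sql
-- Check the current state of the endpoint.
SELECT name, state_desc
FROM sys.http_endpoints
WHERE name = N'MyHTTPEndpoint';

-- Stop the endpoint so that it no longer listens for requests.
ALTER ENDPOINT MyHTTPEndpoint STATE = STOPPED;

-- Remove the endpoint definition entirely.
DROP ENDPOINT MyHTTPEndpoint;
```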
XML for Analysis Services provides two publicly accessible methods: the Discover method and
the Execute method. As its name implies, the Discover method gets information about a data
source. The Discover method can list information about the available data sources, the data
source providers, and the metadata that is available. The Execute method enables an application
to run commands against XML for Analysis Services data sources.
Table 7-1 SQL Server 2005 XML Catalog Views
Catalog View                  Description
sys.xml_attributes            This view provides a row for each stored XML attribute.
sys.xml_components            This view provides a row for each component of an XML schema.
sys.xml_elements              This view provides a row for each XML component that is an XML element.
sys.xml_model_groups          This view provides a list of all the XML components that are part of a Model-Group.
sys.xml_types                 This view provides a row for each XML component that is an XML type.
sys.xml_wildcard_namespaces   This view provides a row for each XML wildcard namespace.
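As a rough illustration of how these views can be used, the query below lists the components of a registered schema collection. It rests on an assumption worth stating clearly: in the released SQL Server 2005 product the views carry a schema_ infix (for example, sys.xml_schema_components rather than sys.xml_components), so the names used here differ from those in Table 7-1. The collection name MyXMLDocSchema is taken from the earlier listings.

```sql
-- List each schema component stored for the MyXMLDocSchema collection,
-- together with a description of what kind of component it is.
SELECT c.name, c.kind_desc
FROM sys.xml_schema_collections AS sc
JOIN sys.xml_schema_components AS c
    ON c.xml_collection_id = sc.xml_collection_id
WHERE sc.name = N'MyXMLDocSchema';
```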