2.7 - db2 Purexml

Uploaded by

Suneet Singh
Copyright
© Attribution Non-Commercial (BY-NC)

Information Management Ecosystem Partnerships

IBM Canada Lab

Summer/Fall 2010

DB2® pureXML

Information Management

© 2010 IBM Corporation


Agenda
■ Overview of XML
■ pureXML in DB2
■ XML Data Movement in DB2
■ XQuery and SQL/XML
■ XML Indexes in DB2
■ Application Development


What is XML?
■ eXtensible Markup Language
– XML is a language designed to describe data
■ A hierarchical data model

<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
</book>

Characteristics of XML
■ Flexible
■ Easy to share
■ Describes itself
■ Easy to extend
■ Vendor independent
■ Platform independent

Who Uses XML?


■ Banking: IFX, OFX, SWIFT, SPARCS, MISMO +++
■ Life Sciences: MIAME, MAGE, LSID, HL7, DICOM, CDIS, LAB, ADaM +++
■ Retail: IXRetail, UCCNET, EAN-UCC ePC Network +++
■ Electronics: PIPs, RNIF, Business Directory, Open Access Standards +++
■ Healthcare: HL7, DICOM, SNOMED, LOINC, SCRIPT +++
■ Insurance: ACORD XML for P&C, Life +++
■ Telecommunications: eTOM, NGOSS, etc., Parlay Specification +++
■ Financial Markets: FIX Protocol, FIXML, MDDL, RIXML, FpML +++
■ Automotive: ebXML, other B2B Stds.
■ Energy & Utilities: IEC Working Group 14, Multiple Standards, CIM, Multispeak
■ Cross Industry: PDES/STEPml, SMPI Standards, RFID, DOD XML +++
■ Chemical & Petroleum: Chemical eStandards, CyberSecurity, PDX Standard +++

XML Document Components

<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>

■ Root element: <book>
■ Element: for example <price>
■ Attribute: for example id="47"
■ Text node (Data): for example 29

The XML Data Model: Node Types


<book>
  <authors>
    <author id="47">John Doe</author>
    <author id="58">Peter Pan</author>
  </authors>
  <title>Database systems</title>
  <price>29</price>
  <keywords>
    <keyword>SQL</keyword>
    <keyword>relational</keyword>
  </keywords>
</book>

Node Types
■ Document node
■ Element nodes
■ Attribute nodes
■ Text nodes

The same document as a tree:

book
  authors
    author (id=47): John Doe
    author (id=58): Peter Pan
  title: Database systems
  price: 29
  keywords
    keyword: SQL
    keyword: relational
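
The four node types can be counted for the book document with a small sketch. It uses Python's standard xml.dom.minidom rather than DB2, and the helper name count_node_types is an assumption made for illustration only.

```python
from xml.dom import minidom, Node

# The book document from the slide, written without whitespace between tags
# so that no whitespace-only text nodes are created.
BOOK = (
    '<book><authors>'
    '<author id="47">John Doe</author>'
    '<author id="58">Peter Pan</author>'
    '</authors><title>Database systems</title><price>29</price>'
    '<keywords><keyword>SQL</keyword><keyword>relational</keyword></keywords>'
    '</book>'
)

def count_node_types(node, counts=None):
    """Walk the DOM tree and tally document, element, attribute and text nodes."""
    if counts is None:
        counts = {"document": 0, "element": 0, "attribute": 0, "text": 0}
    if node.nodeType == Node.DOCUMENT_NODE:
        counts["document"] += 1
    elif node.nodeType == Node.ELEMENT_NODE:
        counts["element"] += 1
        # Attributes hang off their element; they are not child nodes.
        counts["attribute"] += len(node.attributes)
    elif node.nodeType == Node.TEXT_NODE:
        counts["text"] += 1
    for child in node.childNodes:
        count_node_types(child, counts)
    return counts

counts = count_node_types(minidom.parseString(BOOK))
print(counts)
```

With indented input the text count would be higher, because the whitespace between tags also becomes text nodes.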



Well-Formed Versus Valid XML Documents


■ A well-formed XML document is a document that follows
basic rules:
1) It must have one and only one root element
2) Each element begins with a start tag and ends with an end tag
3) An element can contain other elements, attributes, or text nodes
4) Attribute values must be enclosed in quotes (single or double). Text
nodes, on the other hand, should not.
(i.e. it can be parsed by an XML parser without error)

■ A valid XML document is BOTH:


1) A well-formed XML document
2) A document compliant with the rules defined in an XML schema
document or a Document Type Definition (DTD) document.

XML Parsers can optionally perform “validation”
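
A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser instead of DB2. The helper name is_well_formed and the sample strings are illustrative. The check covers well-formedness only; validation against a schema or DTD needs a validating parser, which is not shown.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative samples, one per rule on the slide.
samples = {
    "one root, quoted attribute": '<book id="1"><title>Database systems</title></book>',
    "single-quoted attribute": "<book id='1'/>",
    "two root elements": "<book/><book/>",
    "missing end tag": "<book><title>Database systems</book>",
    "unquoted attribute": "<book id=1/>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```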


Native XML Storage


■ Documents are stored in parsed representation

Serialized document:

<customerInfo>
  <customer id="1">
    <name>Victor</name>
    <sex>M</sex>
    <phone type="work">739-1274</phone>
  </customer>
  <customer id="2">
    <name>April</name>
    <sex>F</sex>
    <phone type="home">983-2179</phone>
  </customer>
</customerInfo>

XML Parsing turns the serialized document into the Document Object Model below; Serialization goes the other way.

customerInfo
  customer (id="1")
    name: Victor
    sex: M
    phone (type="work"): 739-1274
  customer (id="2")
    name: April
    sex: F
    phone (type="home"): 983-2179

Relational Versus Hierarchical (XML) Model

Relational | Hierarchical (XML)
Relational data is flat. | XML data is nested.
Relational model is set oriented. Sets are unordered. | XML retrieves sequences (the order matters).
Relational data is structured. | XML data is semi-structured.
Relational data has a strong schema, unlikely to change often. | XML data has a flexible schema, appropriate for constant changes.
Use NULL for an unknown state. | NULLs don't exist. Don't add any XML element.
Based on the ANSI/ISO | Based on the W3C industry

PureXML Storage in DB2: XML Data Type


CREATE TABLE dept (deptID VARCHAR(30), ..., custDoc XML)

deptID | ... | custDoc
A001   | ... | (XML document)
...    | ... | ...

DB2 storage holds each custDoc value as a tree:

customerInfo
  customer (id="1")
    name: Victor
    sex: M
    phone (type="work"): 739-1274
  customer (id="2")
    name: April
    sex: F
    phone (type="home"): 983-2179

XML Node Storage Layout


■ Node hierarchy of an XML document is stored on DB2 pages
■ Documents that don't fit on 1 page are split into pages/regions
■ No architectural limit for size of XML documents
■ NodeIDs are used to identify individual nodes

Example: document split into 3 regions, stored on 3 pages.
Node IDs shown in the diagram: 1.1, 1.2, 1.3, 1.3.2, 1.3.1.3, 1.2.1.1.5.3

XML Data: As Trees on DB2 Pages


All benefits of DB2 tablespaces
– Buffered in Buffer Pools
– Prefetching
– Logging & Recovery

(Diagram: XML trees laid out across pages within a table space.)

XML Storage: Internal Objects and Their Relationship


■ DAT Object: the base table rows

deptID | ... | custDoc
A001   | ... |
A002   | ... |
...    | ... |

■ INX Object: Regions Index
■ XDA Object: Like LOBs, XML data is stored separately from the base table (unless inlined)

How to Get Data In?


■ Implicit XML parsing:
– Inserting data of XML data type into a column

INSERT INTO dept VALUES
('PR27', …, '<dept>…<emp>…</emp>…</dept>')

■ Explicit XMLPARSE
– Transform XML value from serialized (text) form into internal
representation.
– Tell system how to treat whitespaces (strip/preserve)
• Default is 'Strip WHITESPACE'

INSERT INTO dept VALUES ('PR27', xmlparse(document
'<a>...</a>'));
INSERT INTO dept VALUES ('PR27',
xmlparse(document '<a>...</a>' preserve whitespace));

Deleting XML Data


■ DELETE
– Deleting a row also deletes the XML documents stored in that row

DELETE FROM dept WHERE deptID='A001'

– You can also delete based on the XML content

DELETE FROM dept WHERE
XMLEXISTS ('$d//phone[@type="home"]'
passing custDoc as "d")

■ Note: Setting an XML column to NULL deletes the XML document

UPDATE dept SET custDoc = NULL WHERE deptID='A001'

Import

import from /data/dept.del of del
XML from /data/xmlfiles
insert into dept

DEL file to import: /data/dept.del
1000,"<XDS FIL='C1.xml' />"
1001,"<XDS FIL='C2.xml' />"
1002,"<XDS FIL='C3.xml' />"
1003,"<XDS FIL='C4.xml' />"
1004,"<XDS FIL='C5.xml' />"

Directory that includes the XML files that are referenced in the DEL file: /data/xmlfiles
/data/xmlfiles/C1.xml
/data/xmlfiles/C2.xml
/data/xmlfiles/C3.xml
/data/xmlfiles/C4.xml
/data/xmlfiles/C5.xml

Resulting dept table:
1000 <dept><employee><name>John Doe</name>
<address><street>555 Bailey Ave</street><city>…</city><zip>95141</zip>
</address>…</employee></dept>
1001 <dept><employee><name>Kathy Smith</name> …
1002 <dept><employee><name>Jim Noodle ….
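
The link between the DEL file and the XML directory can be sketched as follows. This is not how DB2 implements IMPORT; it is a Python illustration of the file layout only, and the helper name resolve_xds is an assumption. Each DEL row carries a key and an XDS element whose FIL attribute names a file in the XML directory.

```python
import csv
import io
import posixpath
import xml.etree.ElementTree as ET

# Two rows in the shape of /data/dept.del from the slide.
DEL_TEXT = '''1000,"<XDS FIL='C1.xml' />"
1001,"<XDS FIL='C2.xml' />"
'''

def resolve_xds(del_text, xml_dir):
    """Return (key, path) pairs: the first DEL column and the file named by FIL."""
    rows = []
    for key, xds in csv.reader(io.StringIO(del_text)):
        # The XDS value is itself a small XML element; FIL holds the file name.
        fil = ET.fromstring(xds).get("FIL")
        rows.append((key, posixpath.join(xml_dir, fil)))
    return rows

resolved = resolve_xds(DEL_TEXT, "/data/xmlfiles")
for key, path in resolved:
    print(key, path)
```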


Export

EXPORT TO /data/dept.del of DEL
XML TO /data/xmlfiles
XMLFILE deptdoc
MODIFIED BY XMLINSEPFILES
SELECT * FROM dept

■ EXPORT TO /data/dept.del of DEL: DEL file to output
■ XML TO /data/xmlfiles: Directory to place XML files
■ XMLFILE deptdoc: Base name for exported XML files
■ MODIFIED BY XMLINSEPFILES: Store each XML document in separate file (Optionally: Concatenate all XML documents in one large file.)
■ SELECT * FROM dept: What to export

dept
1000 <dept><employee><name>John Doe</name>
<address><street>555 Bailey Ave</street><city>…</city><zip>95141</zip>
</address>…</employee></dept>
1001 <dept><employee><name>Kathy Smith</name> …
1002 <dept><employee><name>Jim Noodle ….

/data/dept.del
1000,"<XDS FIL='C1.xml' />"
1001,"<XDS FIL='C2.xml' />"
1002,"<XDS FIL='C3.xml' />"
1003,"<XDS FIL='C4.xml' />"
1004,"<XDS FIL='C5.xml' />"

/data/xmlfiles
/data/xmlfiles/C1.xml
/data/xmlfiles/C2.xml
/data/xmlfiles/C3.xml
/data/xmlfiles/C4.xml
/data/xmlfiles/C5.xml

SQL/XML and XQuery


■ DB2 Supports two query languages:
– XQuery
– SQL/XML
■ XPath
– Cornerstone for both XQuery and SQL/XML standard
– Provides ability to navigate within XML documents
■ XQuery
– Two important functions to access the database:
• db2-fn:sqlquery
• db2-fn:xmlcolumn
– Results returned as a sequence of items
■ SQL/XML
– Provides functions to work with both XML and relational data at the
same time.


XPath
<customerInfo>
  <customer id="1">
    <name>Victor</name>
    <sex>M</sex>
    <phone type="work">739-1274</phone>
  </customer>
  <customer id="2">
    <name>April</name>
    <sex>F</sex>
    <phone type="home">983-2179</phone>
  </customer>
</customerInfo>

Parsing the document produces this tree:

customerInfo
  customer (id="1")
    name: Victor
    sex: M
    phone (type="work"): 739-1274
  customer (id="2")
    name: April
    sex: F
    phone (type="home"): 983-2179

Path Table
/
/customerInfo
/customerInfo/customer/@id
/customerInfo/customer/name
/customerInfo/customer/sex
/customerInfo/customer/phone
/customerInfo/customer/phone/@type

Some Common XPath Expressions


<customerInfo>
  <customer id="1">
    <name>Victor</name>
    <sex>M</sex>
    <phone type="work">739-1274</phone>
  </customer>
  <customer id="2">
    <name>April</name>
    <sex>F</sex>
    <phone type="home">983-2179</phone>
  </customer>
</customerInfo>

/       Selects from the root node.
//      Selects nodes in the document from the current node that match the selection.
text()  Specifies the text node under an element.
@       Specifies an attribute.
*       Matches any element node.
@*      Matches any attribute node.
[ … ]   Predicates

XPath Expression | Description | Result
/customerInfo/*/phone/text() | Selects the text node under the phone element of customerInfo | 739-1274, 983-2179
/customerInfo//phone/@type | Selects the type attribute under the phone element of customerInfo | work, home
/customerInfo/customer[1]/phone/text() | Selects the phone element text node under the first customer of customerInfo | 739-1274
/customerInfo//phone[@type='home'] | Selects all phone elements under customerInfo which have an attribute named type with a value of 'home' | <phone type="home">983-2179</phone>
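
The four example expressions can be tried outside DB2 with a short Python sketch. It assumes the standard xml.etree.ElementTree module, which supports only a subset of XPath: paths are written relative to the root element, and text() and trailing @attribute steps are replaced by reading .text and .get() on the selected elements. The customer tag is spelled consistently here so the document parses.

```python
import xml.etree.ElementTree as ET

DOC = """
<customerInfo>
  <customer id="1">
    <name>Victor</name>
    <sex>M</sex>
    <phone type="work">739-1274</phone>
  </customer>
  <customer id="2">
    <name>April</name>
    <sex>F</sex>
    <phone type="home">983-2179</phone>
  </customer>
</customerInfo>
"""

root = ET.fromstring(DOC)  # root is the customerInfo element

# /customerInfo/*/phone/text()
phone_texts = [p.text for p in root.findall("./*/phone")]

# /customerInfo//phone/@type
phone_types = [p.get("type") for p in root.findall(".//phone")]

# /customerInfo/customer[1]/phone/text()
first_customer_phone = [p.text for p in root.findall("./customer[1]/phone")]

# /customerInfo//phone[@type='home']
home_phones = [
    ET.tostring(p, encoding="unicode").strip()
    for p in root.findall(".//phone[@type='home']")
]

print(phone_texts, phone_types, first_customer_phone, home_phones)
```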

Introduction to XQuery
■ Unlike relational data (which is predictable and has a regular
structure), XML data is:
– Often unpredictable
– Highly variable
– Sparse
– Self-describing

■ You may need XML queries to perform the following operations:


– Search XML data for objects that are at unknown levels of the
hierarchy
– Perform structural transformations on the data
– Return results that have mixed types

DB2 XQuery Functions


■ To obtain XML data from a DB2 database with XQuery
– db2-fn:xmlcolumn ( xml-column-name )
• Input argument is a string literal that identifies an XML column in a
table, case sensitive
• Retrieves an entire XML column as a sequence of XML values
xquery
db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo
– db2-fn:sqlquery ( full-select-sql-statement )
• Input argument is interpreted as an SQL statement and parsed by
the SQL parser
• SQL statement needs to return a single XML column
• Returns an XML sequence that results from the full select
xquery
db2-fn:sqlquery('SELECT INFO
FROM CUSTOMER WHERE CID=6')/customerinfo

XQuery: Retrieving XML Data From a Column


■ db2-fn:xmlcolumn
– Retrieve all XML documents from an XML column, then process
them with an XQuery expression.

XMLCUSTOMER table: columns CID and INFO, rows 1001, 1002, 1003

xquery
db2-fn:xmlcolumn("XMLCUSTOMER.INFO");

Result: every document in the INFO column

xquery
db2-fn:xmlcolumn("XMLCUSTOMER.INFO")/customerinfo/name;

Result: the name element of each document

XQuery: Retrieving XML Based on a SQL Query


■ db2-fn:sqlquery
– Retrieve an XML document using SQL, then process it with an
XQuery expression
– Allows filtering based on relational data

XMLCUSTOMER table: columns CID and INFO, rows 1001, 1002, 1003

xquery
db2-fn:sqlquery(
"SELECT INFO
FROM XMLCUSTOMER
WHERE CID=1001");

Result: the INFO document of row 1001

xquery
db2-fn:sqlquery("SELECT INFO FROM
XMLCUSTOMER
WHERE CID=1001")/customerinfo/name;

Result: the name element of that document

SQL/XML Functions
■ XQuery can be invoked from SQL
– XMLQUERY()
– XMLTABLE()
– XMLEXISTS()

■ By executing XQuery expressions from within the SQL context, you can:
– Operate on parts of stored XML documents instead of entire XML
documents
– Enable XML data to participate in SQL queries
– Operate on both relational and XML data
– Apply further SQL processing to the returned XML values

SQL/XQuery: XML Data for SQL Developers


■ XMLQUERY
– Scalar function, applied once to each qualifying document
– Evaluates an XPath (or XQuery) expression
– Input arguments can be passed into the XQuery
(e.g. column names, constants, parameter markers)
– Returns a sequence of 0, 1 or multiple items from each document

XMLCUSTOMER table: columns CID and INFO, rows 1001, 1002, 1003

SELECT
XMLQUERY('$i/customerinfo/name'
PASSING INFO AS "i")
FROM CUSTOMER

Result:
1
<name>...</name>
<name>...</name>
...

SQL/XQuery: XML Data for SQL Developers

SELECT
XMLQUERY('$i/customerinfo/name'
PASSING INFO AS "i")
FROM CUSTOMER

■ SELECT iterates over all rows in the customer table
■ For each row, "XMLQUERY" is invoked
– The "passing" clause binds the variable "$i" to the value of
the "INFO" column of the current row
– The XQuery expression is executed
– XMLQUERY returns the result of the XQuery expression,
a value of type XML
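
The per-row evaluation described above can be mimicked in a few lines of Python. This is a conceptual sketch, not DB2 behavior: the rows, the pairing of CID values with documents, and the helper name xmlquery_name are hypothetical. It shows that the expression runs once per row and may return zero, one or several items.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for a table with CID and INFO (XML) columns.
ROWS = [
    (1001, "<customerinfo><name>Amir Malik</name></customerinfo>"),
    (1002, "<customerinfo><name>John Smith</name></customerinfo>"),
    (1003, "<customerinfo><phone>963-289-4136</phone></customerinfo>"),
]

def xmlquery_name(info):
    """Evaluate the equivalent of $i/customerinfo/name for one INFO value."""
    i = ET.fromstring(info)  # plays the role of the bound variable $i
    if i.tag != "customerinfo":
        return []
    return [ET.tostring(n, encoding="unicode") for n in i.findall("name")]

# One result per row, like a scalar function in the SELECT list.
results = [xmlquery_name(info) for _cid, info in ROWS]
for (cid, _info), items in zip(ROWS, results):
    print(cid, items)
```

The third row has no name element, so its result is an empty sequence rather than a missing row.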


SQL/XQuery: XML Data for SQL Developers


■ XMLTABLE
– Creates a temporary SQL table using XML data

SELECT T.*
FROM XMLTABLE(
'db2-fn:xmlcolumn("XMLCUSTOMER.INFO")/customerinfo'
COLUMNS "NAME" VARCHAR (20) PATH 'name',
"STREET" VARCHAR (20) PATH 'addr/street',
"CITY" VARCHAR (20) PATH 'addr/city'
) AS T

XMLCUSTOMER table: columns CID and INFO, rows 1001, 1002, 1003

Sample INFO document:

<customerinfo>
  <name>John Smith</name>
  <addr country="Canada">
    <street>Fourth</street>
    <city>Calgary</city>
    <prov-state>Alberta</prov-state>
    <pcode-zip>M1T 2A9</pcode-zip>
  </addr>
  <phone type="work">
    963-289-4136
  </phone>
</customerinfo>

Result:

NAME       | STREET | CITY
Amir Malik | Young  | Toronto
John Smith | Fourth | Calgary
…          | …      | …
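
The row-and-column mapping that XMLTABLE performs can be sketched in Python. This illustrates the idea only, not DB2's implementation: the helper name shred is an assumption, and the first document is reconstructed from the result row shown on the slide. Each customerinfo element yields one row, and each column reads its value from a path relative to that element.

```python
import xml.etree.ElementTree as ET

DOCS = [
    # Reconstructed from the "Amir Malik" result row (assumption).
    "<customerinfo><name>Amir Malik</name>"
    "<addr><street>Young</street><city>Toronto</city></addr></customerinfo>",
    # Abbreviated form of the sample document on the slide.
    "<customerinfo><name>John Smith</name>"
    '<addr country="Canada"><street>Fourth</street><city>Calgary</city></addr>'
    "</customerinfo>",
]

# Column name -> path relative to the row-generating element.
COLUMNS = {"NAME": "name", "STREET": "addr/street", "CITY": "addr/city"}

def shred(docs, columns):
    """Turn each document into one row; a missing path yields None."""
    rows = []
    for text in docs:
        row_node = ET.fromstring(text)
        rows.append({col: row_node.findtext(path) for col, path in columns.items()})
    return rows

table = shred(DOCS, COLUMNS)
for row in table:
    print(row)
```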

SQL/XQuery: XML Data for SQL Developers


■ XMLEXISTS
– Predicate that tests whether an XQuery expression returns a non-empty sequence

SELECT CID, INFO
FROM XMLCUSTOMER WHERE
XMLEXISTS(
'$d/customerinfo[name = "John Smith"]'
passing INFO as "d")

XMLCUSTOMER table: columns CID and INFO, rows 1001, 1002, 1003

Sample INFO document:

<customerinfo>
  <name>John Smith</name>
  <addr country="Canada">
    <street>Fourth</street>
    <city>Calgary</city>
    <prov-state>Alberta</prov-state>
    <pcode-zip>M1T 2A9</pcode-zip>
  </addr>
  <phone type="work">
    963-289-4136
  </phone>
</customerinfo>

Result: the row with CID 1003 and its INFO document
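
As a rough Python analogue of the XMLEXISTS predicate, the sketch below keeps only the rows whose document contains a matching name. The rows and the helper name exists_customer_named are hypothetical, and only the CID of the matching row follows the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical (CID, INFO) rows; only row 1003 matches, as on the slide.
ROWS = [
    (1001, "<customerinfo><name>Amir Malik</name></customerinfo>"),
    (1003, "<customerinfo><name>John Smith</name></customerinfo>"),
]

def exists_customer_named(info, wanted):
    """True if the equivalent of $d/customerinfo[name = wanted] is non-empty."""
    root = ET.fromstring(info)
    return root.tag == "customerinfo" and any(
        n.text == wanted for n in root.findall("name")
    )

matching = [cid for cid, info in ROWS if exists_customer_named(info, "John Smith")]
print(matching)
```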

XML Indexes
■ An index over XML data can be used to improve the
efficiency of queries on XML documents.
–Index entries will provide access to nodes within the
document by creating index keys based on XML pattern
expressions.
■ Like relational indexes, XML indexes come at a cost:
– Performance for INSERT, UPDATE and DELETE
– Space needed to store the indexes

Regular Indexes | Indexes for XML
Based on columns | Based on XML pattern expressions
1 or more columns | Only 1 XML column
1 row → 1 index key | All nodes that satisfy the XML pattern: 1 document → 0, 1 or more index keys
B-Tree | B-Tree

CREATE INDEX IDX1 ON TB1(XMLDOC)
GENERATE KEY USING XMLPATTERN
'/company/emp/salary'
AS SQL DOUBLE;

CREATE INDEX IDX2 ON TB1(XMLDOC)
GENERATE KEY USING XMLPATTERN
'//@id' AS SQL VARCHAR(20);

XML Indexes: Under the Covers


(PathID, keyvalue) → (DocID, NodeID, RowID)

■ XML Index contains Path/Value pairs
■ Path encoded as PathID
■ docID points to region containing doc root node
■ Direct sub-doc level access

(Diagram: a table row with Integer, char and xml columns; XML Values Index entries of the form PathID, Value, DocID, NodeID, ..., RID; the Regions Index; and the XDA pages they point to.)
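
The path/value idea can be illustrated with a toy index in Python. This is a conceptual sketch and not DB2's on-disk format: the documents, the node ordinals standing in for NodeIDs, and the helper name build_index are assumptions, and the pattern is assumed to be a plain path with at least two steps. Each node that matches the pattern contributes one key, so a document can yield zero, one or several entries.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents keyed by a document ID.
DOCS = {
    1: "<company><emp id='31'><salary>50000</salary></emp>"
       "<emp id='32'><salary>62000</salary></emp></company>",
    2: "<company><emp id='40'/></company>",
}

def build_index(docs, pattern_path):
    """Map (path, value) -> [(doc_id, node_ordinal)] for nodes on pattern_path."""
    index = defaultdict(list)
    steps = pattern_path.strip("/").split("/")
    for doc_id, text in docs.items():
        root = ET.fromstring(text)
        if root.tag != steps[0]:
            continue  # the root element does not match the first step
        matches = root.findall("/".join(steps[1:]))
        for ordinal, node in enumerate(matches, start=1):
            # Values are stored as floats, echoing AS SQL DOUBLE.
            index[(pattern_path, float(node.text))].append((doc_id, ordinal))
    return dict(index)

idx = build_index(DOCS, "/company/emp/salary")
for key, entries in idx.items():
    print(key, entries)
```

Document 2 has no salary element, so it adds no keys.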


Development Support for XML Data

pureXML is supported from:
■ C or C++
■ SQL Procedures
■ COBOL
■ Ruby
■ Java
■ C# and Visual Basic
■ Perl
■ PHP

XML – Conclusion
■ Native XML hierarchical storage
–No shredding, no CLOBs, no BLOBs required
–Optimized for XPATH and XQuery (LUW Only) processing
■ High performance
–Superior indexing technology
–No parsing of XML data at query runtime
■ Fully integrated XML and relational processing
–Seamlessly query various types of data at once
–No internal translation of XQuery into SQL

Questions?

E-mail: [email protected]
Subject: “DB2 Academic Workshop”
