XML Technologies and Applications: Rajshekhar Sunderraman
Rajshekhar Sunderraman
Department of Computer Science
Georgia State University
Atlanta, GA 30302
December 2005
Outline
Introduction
XML Basics
XML Structural Constraint Specification
Document Type Definitions (DTDs)
XML Schema
XML/Database Mappings
XML Parsing APIs
Simple API for XML (SAX)
Document Object Model (DOM)
XML Querying and Transformation
XPath
XQuery
XSLT
XML Applications
Parsers
What is a parser?
Sample Data
Orders Data in XML:
several orders, each with several items
each item has a part number and a quantity
<orders>
  <order>
    <onum>1020</onum>
    <takenBy>1000</takenBy>
    <customer>1111</customer>
    <recDate>10-DEC-94</recDate>
    <items>
      <item>
        <pnum>10506</pnum>
        <quantity>1</quantity>
      </item>
      <item>
        <pnum>10507</pnum>
        <quantity>1</quantity>
      </item>
      <item>
        <pnum>10508</pnum>
        <quantity>2</quantity>
      </item>
      <item>
        <pnum>10509</pnum>
        <quantity>3</quantity>
      </item>
    </items>
  </order>
  ...
</orders>
Parsing Event
As the parser reads the orders data serially, it reports events to the application:
startElement: an opening tag such as <item> has been read
endElement: a closing tag such as </item> has been read
characters: text between tags, such as 10506, has been read
SAX Parsers
XML document (<?xml version="1.0"?> ...) -> SAX Parser
The application tells the SAX parser what to do for each event:
When you see the start of the document do …
When you see the start of an element do …
When you see the end of an element do …
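The callbacks described on this slide can be sketched as a small event trace. This is a minimal illustration, assuming the standard JDK SAX 2 classes (SAXParserFactory, DefaultHandler) rather than the Oracle parser used elsewhere in these slides; the class name SaxEventTrace and the method trace are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every SAX event in the order it is reported.
class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; whitespace-only chunks are skipped here.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters " + text);
        }
    }

    // Parses an XML string and returns the events the parser reported.
    static List<String> trace(String xml) {
        SaxEventTrace handler = new SaxEventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<item><pnum>10506</pnum><quantity>1</quantity></item>")
                .forEach(System.out::println);
    }
}
```

Running main prints one line per event, which makes the serial, one-pass nature of SAX parsing visible.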
Components of a SAX parser:
XML-Reader Factory: used to create a SAX parser
XML Reader: reads the XML input and calls the registered handlers
Content Handler: handles document events (start tag, end tag, etc.)
Error Handler: handles parser errors
DTD Handler: handles DTD events
Entity Resolver: handles entities
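The roles of these components can be illustrated by registering handlers on an XMLReader. This is a sketch assuming the standard JDK SAX 2 API; the class ReaderWiring, the nested CollectingErrorHandler, and the method parseAndReport are hypothetical names introduced only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ReaderWiring {
    // Hypothetical error handler: collects problems instead of printing them.
    static class CollectingErrorHandler implements ErrorHandler {
        final StringBuilder log = new StringBuilder();

        public void warning(SAXParseException e) { log.append("warning;"); }

        public void error(SAXParseException e) { log.append("error;"); }

        public void fatalError(SAXParseException e) throws SAXException {
            log.append("fatal;");
            throw e; // a fatal error ends the parse
        }
    }

    // Returns "ok" for a well-formed document, otherwise the collected error log.
    static String parseAndReport(String xml) {
        CollectingErrorHandler errors = new CollectingErrorHandler();
        try {
            // The factory creates the parser; the XMLReader drives the handlers.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler(); // no-op callbacks
            reader.setContentHandler(handler);  // start tag, end tag, characters
            reader.setDTDHandler(handler);      // notations, unparsed entities
            reader.setEntityResolver(handler);  // external entities
            reader.setErrorHandler(errors);     // warnings and errors
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXException e) {
            return errors.log.toString();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndReport("<a><b/></a>"));
        System.out.println(parseAndReport("<a><b></a>")); // mismatched tags
    }
}
```

DefaultHandler is used for three of the roles because it provides empty implementations of the content, DTD, and entity callbacks; a real application would override the ones it needs.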
SAX API
public SAXParser()
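import java.io.IOException;
import org.xml.sax.*;
import oracle.xml.parser.v2.SAXParser;

public class SampleApp extends HandlerBase {
  // Global variables declared here

  public static void main(String[] args) {
    Parser parser = new SAXParser();
    SampleApp app = new SampleApp();
    parser.setDocumentHandler(app);  // app receives startElement, endElement, characters
    parser.setErrorHandler(app);     // app receives warnings and errors
    try {
      // createURL is a helper (not shown) that turns the file name into a URL
      parser.parse(createURL(args[0]).toString());
    } catch (SAXParseException e) {
      System.out.println(e.getMessage());
    } catch (SAXException | IOException e) {
      // parse() also declares these checked exceptions
      System.out.println(e.getMessage());
    }
  }
}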
SAX API – Sample Application
Write a SAX application that reads the orders.xml file, extracts the
data items of each order, and generates SQL insert statements that load
the data into relational database tables.
//Global Variables
Vector itemNum = new Vector();
int numberOfRows, numberOfItems;
String elementEncountered, orderNumber, takenBy,
customer, receivedDate, partNumber, quantity;
//elementEncountered holds most recent element name
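One possible shape for this application is sketched here. It is not the original solution: it assumes the standard JDK SAX 2 classes instead of HandlerBase and the Oracle parser, it buffers element text rather than tracking elementEncountered, and the table names orders and odetails (with their column order) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class OrdersToSql extends DefaultHandler {
    final List<String> statements = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String orderNumber, takenBy, customer, receivedDate, partNumber, quantity;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting the text of this element
        if (qName.equals("items")) {
            // All header fields of the order have been seen by now.
            statements.add("INSERT INTO orders VALUES (" + orderNumber + ", " + takenBy
                    + ", " + customer + ", '" + receivedDate + "')");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "onum": orderNumber = value; break;
            case "takenBy": takenBy = value; break;
            case "customer": customer = value; break;
            case "recDate": receivedDate = value; break;
            case "pnum": partNumber = value; break;
            case "quantity": quantity = value; break;
            case "item":
                statements.add("INSERT INTO odetails VALUES (" + orderNumber + ", "
                        + partNumber + ", " + quantity + ")");
                break;
            default: break;
        }
        text.setLength(0);
    }

    // Parses the orders document and returns the generated SQL statements.
    static List<String> toSql(String xml) {
        OrdersToSql handler = new OrdersToSql();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.statements;
    }

    public static void main(String[] args) {
        String xml = "<orders><order><onum>1020</onum><takenBy>1000</takenBy>"
                + "<customer>1111</customer><recDate>10-DEC-94</recDate><items>"
                + "<item><pnum>10506</pnum><quantity>1</quantity></item>"
                + "</items></order></orders>";
        toSql(xml).forEach(System.out::println);
    }
}
```

The statements are built by string concatenation only to keep the sketch short; code that actually executes them against a database would more safely use JDBC PreparedStatement parameters.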
Figure: DOM tree for a countries document
countries
  country (continent: Asia)
    name: Israel
    population (year: 2001): 6,199,008
    city (capital: yes)
      name: Jerusalem
    city (capital: no)
      name: Ashdod
  country (continent: Europe)
    name: France
    population (year: 2004): 60,424,213
Using a DOM Tree
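XML File -> DOM Parser -> DOM Tree <- API -> Application
The DOM parser reads the XML file and builds the DOM tree; the application works with the tree through the DOM API.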
The Node Interface
getParentNode()
getChildNodes()
getLastChild()
getNextSibling()
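A short sketch of how these navigation methods fit together, assuming the standard JDK DOM parser (DocumentBuilderFactory) rather than the Oracle parser. The class NodeNavigation and its methods are hypothetical; getFirstChild() is part of the same org.w3c.dom.Node interface even though it is not listed on the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeNavigation {
    // Builds a DOM tree from an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Walks the children of a node with getFirstChild() and getNextSibling().
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Node item = parse("<item><pnum>10506</pnum><quantity>1</quantity></item>")
                .getDocumentElement();
        System.out.println(childNames(item));
        System.out.println(item.getChildNodes().getLength());
        System.out.println(item.getLastChild().getNodeName());
        System.out.println(item.getLastChild().getParentNode().getNodeName());
    }
}
```

Unlike a SAX handler, this code can move from a child back to its parent, because the whole tree is held in memory.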
DOM Parsing Example
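import org.w3c.dom.*;
import org.w3c.dom.Node;
import oracle.xml.parser.v2.*;
// doc is the Document that the DOM parser built from the XML file
// all <state> elements in the document, in document order
NodeList sl = doc.getElementsByTagName("state");
// all <city> elements in the document, in document order
NodeList cl = doc.getElementsByTagName("city");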
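To show the getElementsByTagName calls in a complete program, here is a sketch that assumes the standard JDK DOM parser in place of the Oracle one. The class CityLister, the method textOf, and the layout of the sample document (state elements containing city elements) are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CityLister {
    // Returns the text content of every element with the given tag name.
    static List<String> textOf(String xml, String tag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        // Matches are returned in document order, at any depth.
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<geography><state name=\"Georgia\">"
                + "<city>Atlanta</city><city>Savannah</city>"
                + "</state></geography>";
        System.out.println(textOf(xml, "city"));
    }
}
```

Access by element name works here because the DOM tree is fully built before the query runs, which is the capability a SAX parser does not offer.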
SAX parsers store only local information that is encountered during the
serial traversal
In particular,
They do not read backwards
They do not enable access to elements by ID or name
You can save time and effort if you send and receive DOM objects
instead of XML files, because the receiver does not have to parse the document again
But DOM objects are generally larger than the source document
• If your document is very large and you need to extract only a
few elements – use SAX
• If you need to access the XML many times – use DOM (assuming
the file is not too large)