07 Java API for XML Processing (JAXP)
Introduction
XML, DTD, XSD
XPath, XQuery
XSLT
Parsing XML document
Using SAX
StAX
Using DOM
XML transformation
2
XML Introduction
• Features
o SGML lets the author represent the data in its own way.
o HTML lets the user work with any text editor
3
XML Features
4
Sample XML Document
5
Benefits of XML
6
XML Namespace
7
XML Namespace example
8
Well Formed XML Documents
9
Document Type Definition (DTD)
2. ELEMENT declaration
o Specifies the name of the element and the content that the element can contain
3. ATTRIBUTE (ATTLIST) declaration
o Specifies the element that owns the attributes, the attribute name, its type
and its default value (if any)
4. ENTITY declaration
o Specifies the name of the entity and either its value or the location of its value.
11
Example of Internal DTD Declaration
12
Example of External DTD Declaration
13
DTD limitations and XML Schema
15
Example of XML Schema
16
Referencing a Schema in an XML Document
17
• Read more:
http://www.w3schools.com/xml/default.asp
• …
18
JAXP
Java API for XML Processing
19
Parsing and Parser
20
JAXP – Java API for XML Processing
• JAXP is a collection of APIs that you can use in your Java
applications to process and transform XML documents
• Consists of three APIs:
o Simple API for XML (SAX): Allows you to use a SAX parser to process
XML documents serially, as a stream of events.
o Document Object Model (DOM): Allows you to use a DOM parser to
process XML documents as an in-memory tree of objects.
o Extensible Stylesheet Language Transformations (XSLT): Allows you to
transform XML documents into other formats, such as HyperText Markup
Language (HTML)
21
XML API Styles
22
SAX
Simple API for XML
23
Parsing an XML Document
[Figure: the XML document is the input to the parser, which reports each event by invoking a callback method on the Content/Default handler]
24
Parsing an XML Document (cont)
• The steps for parsing in code are:
1. An instance of SAXParserFactory is obtained; it creates and configures the
SAX parser.
2. The SAXParser wraps an XMLReader object.
3. The parse() method of the SAXParser class is invoked.
4. The XMLReader invokes the callback methods of the registered handler
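The steps above can be sketched as follows. This is a minimal illustration, not the example from the original slides: the class name `SaxElementCounter` and the sample XML are assumptions made up for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for illustration.
class SaxElementCounter {
    /** Counts the start-element events reported while parsing the given XML text. */
    static int countElements(String xml) throws Exception {
        // Steps 1 and 2: obtain a factory and let it create the parser.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++; // Step 4: callback invoked once per start tag
            }
        };

        // Step 3: parse() drives the underlying reader, which calls the handler.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<books><book id=\"1\"/><book id=\"2\"/></books>"));
    }
}
```

The handler extends DefaultHandler so that only the callbacks of interest need to be overridden.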
25
Callback interface
• SAX follows the Observer pattern: XMLReader is the Subject and org.xml.sax.ContentHandler/
org.xml.sax.helpers.DefaultHandler is the Observer (similar to registering an event listener on a
subject such as a button in AWT)
[Figure: callbacks in a SAX parser, ending with endDocument()]
26
Parsing example
27
Content parsing API (1)
• Used for parsing the XML content and reporting it to the application as
SAX events
• The org.xml.sax.ContentHandler interface:
o Receives notifications of the logical content of the document being
parsed
o An application can implement this interface and register an instance with
the SAX parser using the setContentHandler() method.
o The parser uses this instance to report events
o The order of events in this interface is very important: it mirrors
the order of information in the document itself.
o An implementing class must provide all of the interface's methods (or
extend DefaultHandler and override only the ones it needs)
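To illustrate how the order of events mirrors the document, here is a small sketch that registers a handler with setContentHandler() and records each callback. The class name `EventOrderRecorder` and the event labels are assumptions chosen for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical recorder; DefaultHandler supplies empty versions of the other callbacks.
class EventOrderRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; this sketch records each non-blank chunk.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("text:" + text);
    }

    /** Parses the XML text and returns the callbacks in the order they were received. */
    static List<String> record(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EventOrderRecorder recorder = new EventOrderRecorder();
        reader.setContentHandler(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.events;
    }
}
```

For a document such as `<a><b/></a>`, the recorded events start with startDocument, follow the nesting of the elements, and finish with endDocument.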
28
Content parsing API (2)
29
Content parsing API (3)
31
Attributes interface
Methods Descriptions
33
Errors in processing with XML
• Fatal Errors
o Are reported by the SAX parser when the XML document is not well-formed
o The XML document cannot be processed any further, so parsing terminates
• Non-Fatal Errors
o Validation errors in the SAX parser are termed non-fatal errors
o Are reported when an XML document is not valid.
o Are also reported when the XML declaration specifies a version that cannot be handled by the
parser
• Warnings
o Are generated, for example, when a DTD contains duplicate definitions
o Are not errors, but the user needs to be informed about them
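The three categories correspond to the three methods of org.xml.sax.ErrorHandler. The sketch below collects which method was called; the class name `CollectingErrorHandler` and the sample documents are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler that records the category of every reported problem.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> reports = new ArrayList<>();

    @Override public void warning(SAXParseException e) { reports.add("warning"); }

    @Override public void error(SAXParseException e) { reports.add("error"); }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        reports.add("fatal");
        throw e; // parsing cannot continue after a well-formedness error
    }

    /** Parses the text, optionally with DTD validation, and returns the reported categories. */
    static List<String> check(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // The fatal error rethrown above ends up here; it is already recorded.
        }
        return handler.reports;
    }
}
```

A document that is not well-formed, such as `<a><b></a>`, should be reported through fatalError(), while a well-formed document that violates its DTD should be reported through error() when validation is switched on.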
34
Read more
• http://www.cafeconleche.org/slides/sd2000east/sax/
35
StAX
Streaming API for XML
36
Streaming API for XML
• Goals:
o Develop APIs and conventions that allow a user to programmatically pull parse
events from an XML input stream.
o Develop APIs that allow a user to write events to an XML output stream.
o Develop a set of objects and interfaces that encapsulate the information
contained in an XML stream.
37
Major Classes and Interfaces
• XMLStreamReader:
o an interface that represents the (pull) parser
• XMLInputFactory:
o the factory class that instantiates an implementation-specific
XMLStreamReader
• XMLStreamException:
o the generic exception class for everything other than an IOException that might
go wrong when parsing an XML document, particularly well-formedness
errors
• XMLStreamWriter:
o an interface for creating XML documents by writing them piece by piece
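A minimal sketch of how these types fit together, pulling events with XMLStreamReader and producing output with XMLStreamWriter. The class name `StaxSketch`, the method names, and the element names are assumptions invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, used only for illustration.
class StaxSketch {
    /** Pulls events from the reader and collects the local name of every start tag. */
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    /** Writes a one-element document; the writer escapes the text content. */
    static String writeGreeting(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("greeting");
        writer.writeAttribute("lang", "en");
        writer.writeCharacters(text);
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }
}
```

Well-formedness problems in the input surface as XMLStreamException from next().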
38
Reading document example
39
StAX methods on states (1/2)
Event Type Valid Methods
40
StAX methods on states (2/2)
Event Type Valid Methods
41
Creating document example
42
Read more
• http://www.cafeconleche.org/slides/sd2004west/stax/index.html
43
DOM
Document Object Model
44
DOM
45
DOM components
• DOM is an API for HTML and XML documents
• A Java implementation of the DOM model defines the logical structure
of XML documents and the way a document is accessed and manipulated
• DOM acts as a language-independent interface through which programs
access and modify the data in a document. It gives the data retrieved
from an XML document a logical structure and style of representation. XML
presents data in a tree format, so DOM presents the XML data in
the same way
• DOM creates a tree structure of the XML document with each XML
element represented as a node
• DOM is an effective interface between an application
written in Java and XML data because both Java and DOM are platform
independent
46
Working of DOM
• The DOM works with the following steps:
o The methods of DocumentBuilder class in the DOM API create a tree structure of
the XML document and represent each element as a node.
o The methods contained in various interfaces in the DOM API provide access to the
document and its nodes to add, modify, or delete nodes or elements in the
document.
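The two steps can be sketched as below: DocumentBuilder builds the tree, and the DOM interfaces are then used to add a node. The class name `DomSketch` and the element name `book` are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class DomSketch {
    /** Step 1: the DocumentBuilder turns the XML text into a tree of nodes. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Step 2: the DOM interfaces are used to add a new element under the root. */
    static Document addBook(Document doc, String title) {
        Element book = doc.createElement("book");
        book.setTextContent(title);
        doc.getDocumentElement().appendChild(book);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = addBook(parse("<books><book>A</book></books>"), "B");
        System.out.println(doc.getElementsByTagName("book").getLength());
    }
}
```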
47
DocumentBuilderFactory
48
DocumentBuilder
49
Working of DOM example
50
Node interface
• Acts as the primary data type for the whole DOM model
• Contains various methods to access and manipulate the nodes in a DOM
document
Methods Descriptions
52
Node interface example (2)
53
Document interface
• Represents the entire XML document
• Is the root of the DOM tree, which provides access to the data in the XML document
• Contains factory methods to create the elements, text nodes, comments, and
processing instructions
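As a rough sketch of the factory methods mentioned above, the following builds a small document from scratch. The class name `DocumentFactoryMethods` and the element, attribute, and text values are assumptions for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, used only for illustration.
class DocumentFactoryMethods {
    /** Builds <library><book id="b1">XML Basics</book></library> with the Document factory methods. */
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("library");
        doc.appendChild(root); // the single root element of the document

        Element book = doc.createElement("book");
        Attr id = doc.createAttribute("id");
        id.setValue("b1");
        book.setAttributeNode(id);
        book.appendChild(doc.createTextNode("XML Basics"));
        root.appendChild(book);
        return doc;
    }

    /** Returns true if createElement rejects a name containing an illegal character. */
    static boolean rejectsIllegalName(Document doc) {
        try {
            doc.createElement("not a name"); // a space is not allowed in an element name
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.INVALID_CHARACTER_ERR;
        }
    }
}
```

A document created this way has no document type, so getDoctype() returns null for it.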
Methods                Descriptions
getDoctype             - Returns the document type associated with the document.
                       - Returns null if the XML document is without a document type.
getDocumentElement     - Returns the attribute that allows direct access to the root element of the XML
                         document
createElement          - Creates an element of the specified type.
                       - Throws a DOMException with the INVALID_CHARACTER_ERR code when an
                         illegal character is encountered in the specified name
createTextNode         - Creates a Text node with the string specified as the argument
createAttribute        - Creates an Attr with the given name. The attribute instance can then be set on an
                         element using the setAttributeNode() method.
                       - Throws a DOMException with the INVALID_CHARACTER_ERR code when an
                         illegal character is encountered in the specified name.
getElementsByTagName   - Returns a NodeList of all the elements with a given tag
                         name, in the order in which they are encountered in a preorder traversal of the
                         document tree
54
NodeList & Element interface
• NodeList Interface
o Represents an ordered collection of nodes
o Defines the item() method, which returns the node at the given index, and getLength(),
which returns the number of nodes. If the specified index is greater than or equal to the
number of nodes in the node list, item() returns null
o public Node item(int index)
o The NodeList object is a live, abstract view of Node objects in
a document, so any change to those nodes in the document is reflected in the list
o This collection of ordered nodes facilitates indexed access to individual nodes. As
the list is ordered, iterative traversal through the list is possible
• Element Interface
o Represents an element in the document
o The Element object is a type of node encountered in a document tree
o Provides several useful methods to handle the properties of the elements
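A short sketch of indexed traversal over a NodeList, casting each item to Element. The class name `NodeListSketch` and the `book`/`id` names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class NodeListSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Walks the list by index and collects the id attribute of every book element. */
    static List<String> ids(Document doc) {
        NodeList books = doc.getElementsByTagName("book");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i); // every match is an Element node
            ids.add(book.getAttribute("id"));
        }
        return ids;
    }
}
```

An out-of-range index passed to item() yields null rather than an exception, so the loop bound should come from getLength().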
55
NodeList example
56
Attr interface
57
Text interface
59
Manipulating DOM
60
Node
61
Nodes in DOM tree
• Document
o Represents the entire DOM document. Is the root node of the XML document.
o The Document interface manipulates the Document node through the methods defined in it. This
type of node can contain only a single element child (the root element). Its other child nodes can be
processing instructions, a document type node, or comments.
• Document Fragment
o Holds a portion of a complete document.
o Is created by a method of the Document interface.
o Can have element, processing instruction, comment, text, CDATA section, and entity reference nodes as its children.
• Document Type
o Each document has a doctype attribute. Its value is either null or an object of the DocumentType
interface.
o Provides an interface to the entities defined for the document.
• Processing Instruction
o Is a processor-specific instruction kept in the XML document.
o Is created through the Document interface.
62
Types of Node
• Entity
• Entity Reference
• Element
• Attribute
• Text
• CDATA Section
• Comment
• Notation
A CDATA section starts with "<![CDATA[" and ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b){
    if (a < b && a < 0) {
        return 1;
    }
    else {
        return 0;
    }
}
]]>
</script>
68
Deleting Nodes
• Remove an Element
element.getParentNode().removeChild(element);
• Remove an Attribute
element.removeAttribute("attribute name");
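The two calls above can be wrapped into small helpers as in this sketch. The class name `DeleteNodesSketch` and its method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class DeleteNodesSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Removes the first element with the given tag name; returns false if none exists. */
    static boolean removeFirst(Document doc, String tagName) {
        Element element = (Element) doc.getElementsByTagName(tagName).item(0);
        if (element == null) {
            return false;
        }
        // A node is always removed through its parent.
        element.getParentNode().removeChild(element);
        return true;
    }

    /** Removes the named attribute from every element with the given tag name. */
    static void dropAttribute(Document doc, String tagName, String attributeName) {
        NodeList matches = doc.getElementsByTagName(tagName);
        for (int i = 0; i < matches.getLength(); i++) {
            ((Element) matches.item(i)).removeAttribute(attributeName);
        }
    }
}
```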
69
Appending Nodes
70
Seeking Nodes
• getParentNode
• getChildNodes
• getFirstChild
• getLastChild
• getNodeName
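A small sketch of how these navigation methods combine. The class name `NodeNavigationSketch` and the sample element names are assumptions; the sample input deliberately contains no whitespace between tags, so every child is an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class NodeNavigationSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Summarizes the root element's children; assumes the root has at least one child. */
    static String describe(Document doc) {
        Node root = doc.getDocumentElement();
        Node first = root.getFirstChild();
        Node last = root.getLastChild();
        return root.getNodeName() + " has " + root.getChildNodes().getLength()
                + " children; first=" + first.getNodeName()
                + ", last=" + last.getNodeName()
                + ", parent of first=" + first.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // prints "r has 3 children; first=a, last=c, parent of first=r"
        System.out.println(describe(parse("<r><a/><b/><c/></r>")));
    }
}
```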
71
Seeking Elements
• getElementsByTagName
• getElementsByTagNameNS
• getTagName
• getAttributeNode
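The sketch below contrasts lookup by qualified tag name with lookup by namespace. The class name `ElementLookupSketch` and the namespace URI `urn:example:books` are made up for the example; note that the factory has to be namespace-aware for getElementsByTagNameNS to find anything.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class ElementLookupSketch {
    static final String NS = "urn:example:books"; // invented namespace URI

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookup methods
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Counts elements with the given local name inside the example namespace. */
    static int countInNamespace(Document doc, String localName) {
        return doc.getElementsByTagNameNS(NS, localName).getLength();
    }

    /** Describes the first element whose qualified tag name matches; assumes one exists. */
    static String describeFirst(Document doc, String tagName) {
        Element element = (Element) doc.getElementsByTagName(tagName).item(0);
        Attr id = element.getAttributeNode("id");
        String suffix = (id == null) ? "" : "[" + id.getName() + "=" + id.getValue() + "]";
        return element.getTagName() + suffix;
    }
}
```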
72
Modifying a document
Methods                  Descriptions
createDocumentFragment   public DocumentFragment createDocumentFragment()
                         - Creates an empty DocumentFragment object and
                           returns it.
                         - You may then add elements, nodes, and so on to this
                           fragment just the way you create a tree under the root
                           node
createCDATASection       public CDATASection createCDATASection(String data) throws
                         DOMException
                         - Creates a CDATA Section node whose value is the
                           string passed as an argument.
                         - Throws a DOMException with the
                           NOT_SUPPORTED_ERR code when the
                           document is an HTML document.
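As a sketch of these two methods, the following builds a fragment, moves its contents under the root, and appends a CDATA section. The class name `FragmentSketch` and the element names are assumptions for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical helper class, used only for illustration.
class FragmentSketch {
    /** Builds <notes> with two <note> children taken from a fragment, plus a CDATA section. */
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("notes");
        doc.appendChild(root);

        // Assemble a small subtree off to the side.
        DocumentFragment fragment = doc.createDocumentFragment();
        fragment.appendChild(doc.createElement("note"));
        fragment.appendChild(doc.createElement("note"));

        // Appending a fragment moves its children; the fragment itself is not inserted.
        root.appendChild(fragment);

        // CDATA content may contain characters such as < and & without escaping.
        root.appendChild(doc.createCDATASection("a < b && a < 0"));
        return doc;
    }
}
```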
73
Modifying an Attribute
• removeAttribute
• removeAttributeNode
• setAttributeNode
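A brief sketch of the three methods in use. The class name `AttributeEditSketch` and the attribute names and values are invented for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, used only for illustration.
class AttributeEditSketch {
    /** Returns a <book> element that ends up with the single attribute isbn="222". */
    static Element demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element book = doc.createElement("book");

        Attr isbn = doc.createAttribute("isbn");
        isbn.setValue("111");
        book.setAttributeNode(isbn);

        // Setting a node with the same name replaces the existing attribute.
        Attr replacement = doc.createAttribute("isbn");
        replacement.setValue("222");
        book.setAttributeNode(replacement);

        book.setAttribute("draft", "true");
        book.setAttribute("lang", "en");
        book.removeAttribute("draft");                          // remove by name
        book.removeAttributeNode(book.getAttributeNode("lang")); // remove by node
        return book;
    }
}
```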
74
DOM Level 2 Modules
75
The DOM Level 2 (DOM2)
• DOM 2 modules:
o Core
o Views
o Style
o Event
o Traversal
o Range
o HTML
o CSS
76
Core Module
• Is the fundamental specification or
module.
• Defines a set of objects and
interfaces to access and manipulate
parsed XML content.
• Has incorporated new ways to
traverse and manipulate the XML
documents through other optional
modules.
• Facilitates creating and populating a
Document object through the DOM
API calls.
• Extends the functionality of the
DOM Level 1 Core with some added
features.
77
Range Module(1)
79
80
Event Module (1)
• Design a general event system
that allows registration of event
handlers, defines event flow
through a tree structure, and
provides basic contextual
information for each event.
• Develop compatibility between
the current event systems used in
DOM Level 0 browsers and DOM
Level 2 browsers.
81
Event Module (2)
• Example:
82
Error event
83
Traversal Module (1)
• Allows programs and scripts to traverse
through a DOM tree and identify a range of
content in the document dynamically.
• Allows traversing the DOM tree to access
the content in it.
• Contains the TreeWalker, NodeIterator, and
NodeFilter interfaces to facilitate easy traversal
through the document content.
84
Traversal Module (2)
• Using TreeWalker
85
Traversal Module(3)
• Using NodeIterator
86
Using NodeFilter
• Facilitates the creation of an object that filters out specific nodes
presented by a NodeIterator or TreeWalker.
• The filter object has a user-defined function that decides whether or not a node
should be part of the traversal's logical view of the document.
• Override the acceptNode() method so that it returns one of these constants:
o FILTER_ACCEPT: indicates that the node will be a part of the logical view of the
sub-tree.
o FILTER_SKIP:
• Indicates that the node is not a part of the logical view of the sub-tree.
• In this case, the current node is considered as absent in the logical
view, but its child nodes can be part of the logical view.
o FILTER_REJECT: indicates that the node and its descendants cannot be present in
the logical view of the sub-tree
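The difference between FILTER_SKIP and FILTER_REJECT can be seen with a TreeWalker, as in this sketch. The class name `TraversalSketch` is hypothetical, and the code assumes that the parser's Document implementation also implements DocumentTraversal (the JDK's built-in parser does), which is why the cast is guarded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class TraversalSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Walks the elements below the root, applying the verdict to elements named filteredTag. */
    static List<String> visit(Document doc, final String filteredTag, final short verdict) {
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM Traversal is not supported");
        }
        NodeFilter filter = new NodeFilter() {
            @Override
            public short acceptNode(Node node) {
                return node.getNodeName().equals(filteredTag) ? verdict : NodeFilter.FILTER_ACCEPT;
            }
        };
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, filter, false);
        List<String> names = new ArrayList<>();
        // The walker starts on the root; nextNode() moves through the filtered view.
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }
}
```

For `<r><a><b/></a><c/></r>`, skipping `a` still visits its child `b` and then `c`, while rejecting `a` visits only `c`.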
87
Traversal Module(4)
• Using NodeFilter
88
CSS Module
• Is an optional module in the DOM. Its implementation requires the implementation of the Core
module.
• To test whether it is supported, call the hasFeature(feature, version) method with the feature
"CSS" and the version "2.0".
• It defines interfaces that provide a mechanism to access and manipulate CSS documents
dynamically
Interfaces Descriptions
• View Module
o Provides interfaces to facilitate the presentation of XML documents.
o Is optional in nature. Its implementation requires the implementation of the DOM 2
Core module.
• Style Module
o Provides interfaces to enable programmers to dynamically access and manipulate style
sheets.
o Is optional in nature. Its implementation requires the implementation of the DOM 2
Core module.
• HTML Module
o Allows programs and scripts to access and modify the content and structure of HTML
documents dynamically.
o Extends the interfaces defined in the DOM 1 HTML module.
91
XSLT
92
Intro to XML Transformations
93
TrAX API (1)
94
TrAX API (2)
Packages                      Descriptions
javax.xml.transform           - Implements the generic APIs for transformation
                                instructions and executing the transformation,
                                starting from source to destination.
                              - The interfaces present in this package are
                                ErrorListener, Result, Source, SourceLocator,
                                Templates, URIResolver.
                              - The classes present in this package are
                                OutputKeys, Transformer, TransformerFactory
javax.xml.transform.stream    - Implements streams and URI specific
                                transformation APIs.
                              - The classes present in this package are
                                StreamResult, StreamSource
javax.xml.transform.dom       - Implements DOM-related transformation APIs.
                              - The interface present in this package is
                                DOMLocator
javax.xml.transform.sax       - Implements SAX-related transformation APIs.
95
XSLT Stylesheet (1)
97
Transformer
98
TransformerFactory
99
Source and Result
100
Template
101
Transforming XML Document
102
Example
You can display the output on the console by using System.out instead of
outfile.
Use transformer.setOutputProperty(OutputKeys.INDENT, "yes") to
display the result with indentation
103
Transform with parameter
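A minimal sketch of a transformation that passes a stylesheet parameter, not the example from the original slide. The class name `XsltParamSketch`, the inline stylesheet, and the parameter name `greeting` are all assumptions invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, used only for illustration.
class XsltParamSketch {
    // Invented stylesheet: wraps the text of <name> in a <message> using a parameter.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='greeting'>Hi</xsl:param>"
        + "<xsl:template match='/name'>"
        + "<message><xsl:value-of select='$greeting'/>, <xsl:value-of select='.'/>!</message>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms the XML text, overriding the stylesheet's default greeting. */
    static String transform(String xml, String greeting) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        transformer.setParameter("greeting", greeting);          // value for <xsl:param>
        transformer.setOutputProperty(OutputKeys.INDENT, "yes"); // request indented output
        StringWriter out = new StringWriter();
        // new StreamResult(System.out) would print to the console instead.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString().trim();
    }
}
```

Calling `transform("<name>Ada</name>", "Hello")` should produce a message element whose text is "Hello, Ada!".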
104
JAXP API
For XPath Processing
105
JAXP API For XPath Processing(1)
106
JAXP API For XPath Processing(2)
• The XPath interface evaluates XPath expressions, whose syntax is used for
traversing the nodes in an XML document.
• The XPathExpression interface represents a compiled XPath expression
(location paths and predicates).
• The XPathFactory class is used for creating XPath objects.
• The XPathConstants class defines the result types, such as BOOLEAN,
NODESET, NUMBER, and STRING, for working with nodes in an XML
document
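A short sketch of how these four types work together. The class name `XPathSketch`, its method names, and the catalog document used in the usage note are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
class XPathSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression as a node set and returns the text of each selected node. */
    static List<String> selectText(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression compiled = xpath.compile(expression);
        NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    /** Evaluates the expression as a number, for example count(...) or sum(...). */
    static double number(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
    }
}
```

With a catalog of two books priced 10 and 40, the expression `/catalog/book[price < 30]/title` selects only the title of the cheaper book, and `count(//book)` evaluates to 2.0.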
107
XPath Example
108
Namespace Context
110
Namespace Context Example
111
Processing Namespace Context Example
112
Schema Validation Framework
113
Definition and Purpose
114
Validation API
o Enables a schema to be parsed on its own and checked for correct syntax and
semantics according to the rules of its schema language.
o The javax.xml.validation API helps to validate the XML
document.
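A minimal sketch of the javax.xml.validation API: the schema is compiled once and a Validator checks a document against it. The class name `SchemaValidationSketch` and the one-element schema are assumptions invented for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for illustration.
class SchemaValidationSketch {
    // Invented schema: a single <age> element holding a positive integer.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='age' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    /** Returns true if the XML text is valid against the schema above. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // newSchema() parses and checks the schema itself.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Without a custom ErrorHandler, validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<age>30</age>"));
        System.out.println(isValid("<age>-5</age>"));
    }
}
```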
115
XML Validation against a DTD
116
DOM validation against DTD Ex
117
Schema Compilation
118
119
XML Validation against a Compiled Schema
120
Validating SAX & DOM Source
121
122
Validating DOM Source Example
123
Validating SAX Source Example
124
XML Validation After Transformation
125
Validating SAX Stream
126
Validating Transformed DOM
• The transformed result from Transformation APIs can be
obtained as a DOM object.
• Schema can be used to validate this DOM object in the memory.
Since there is no parsing involved when validating a transformed
XML document, this approach boosts performance
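The approach can be sketched as below: the transformation writes into a DOMResult, and the resulting node is validated through a DOMSource without being serialized and parsed again. The class name `ValidateTransformedDom`, the inline schema, and the use of an identity transformation in place of a real stylesheet are assumptions made to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for illustration.
class ValidateTransformedDom {
    // Invented schema: a single <age> element holding a positive integer.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='age' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    /** Transforms the input into a DOM tree and validates that tree in memory. */
    static boolean transformThenValidate(String xml) throws Exception {
        // Identity transformation, standing in for a stylesheet-based one.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DOMResult transformed = new DOMResult();
        transformer.transform(new StreamSource(new StringReader(xml)), transformed);

        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // The DOM tree is validated directly.
            schema.newValidator().validate(new DOMSource(transformed.getNode()));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```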
127
Data Security
128
XML schema to java type mapping
129
JAXB
Java Architecture for XML Binding
130
JAXB
132
JAXB Architecture
133
JAXB Architecture
• First of all, JAXB can map a set of Java classes to an XML schema
by using a schema generator.
• It also enables the reverse action, allowing you to generate a
collection of Java classes from a given XML schema through the
schema compiler.
• The schema compiler takes XML schemas as input and generates a
package of Java classes and interfaces that reflect the rules
defined in the source schema.
• These classes are annotated to provide the runtime framework
with a customized Java-XML mapping.
134
JAXB Architecture
135
JAXB Architecture
136
demo
137
JAXB Demo: Object to XML
138
JAXB Demo: XML to Object
139
Questions
140
That’s all for this session!
141