XML Parsers: When A Software Program Reads An XML Document and Takes Actions

An XML parser is a software component that reads XML documents and provides access to their content. There are two main types of XML parsers: DOM parsers, which load the entire XML document into memory as a tree structure, and SAX parsers, which read the XML document sequentially and trigger events as elements are encountered.

XML Parsers

An XML parser is a software library or package that provides an interface for client applications to work with XML documents.
It checks that the XML document is well formed and may also validate it against a DTD or schema.
Modern browsers have built-in XML parsers.
The goal of a parser is to transform XML text into a form that program code can read and use.

• When a software program reads an XML document and takes actions accordingly, this is called processing the XML.
• Any program that can read and process XML documents is known as an XML processor.
• An XML processor reads the XML file and turns it into in-memory structures that the rest of the program can access.
• The most fundamental XML processor reads an XML document and converts it into an internal representation for other programs or subroutines to use. This is called a parser, and it is an important component of every XML processing program.
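
The well-formedness check described above can be sketched in a few lines. This is only an illustration, assuming Python's standard-library xml.dom.minidom parser (a non-validating parser); the helper name is_well_formed and the two sample strings are made up for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if xml_text parses without a well-formedness error."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        # Raised by the underlying expat parser, e.g. for mismatched tags.
        return False


# Hypothetical sample inputs: the second closes <lastname> with </firstname>.
good = "<author><firstname>Dennis</firstname><lastname>Ritchie</lastname></author>"
bad = "<author><firstname>Dennis</firstname><lastname>Ritchie</firstname></author>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects the second string because the closing tag does not match the opening tag, which is exactly the kind of format error a parser is expected to catch before the application sees any data.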

XML Processor Types

Validating parsers: check XML documents for validity as well as well-formedness.
E.g.: MSXML (Microsoft, in Java), Java Project X (Sun, in Java), xmlproc (Python).
Non-validating parsers: check only that the document is well formed, not that it is valid.
E.g.: OpenXML (Java), xmllib (Python).
A parser is a program or software library that checks whether a given document is well formed and, in the case of a validating parser, valid.
If the check succeeds, the parser makes the document content available to the application, typically as an in-memory representation.
XML parsers fall into two classes:
1. DOM parsers (Document Object Model parsers)
2. SAX parsers (Simple API for XML parsers)

1. DOM Parser

• Upon successfully parsing a document, a DOM parser stores the document data as a tree structure in memory.
• A DOM parser loads the full XML file into memory and creates a tree representation of the XML document.
• If the server has sufficient memory, DOM is a good choice: once the entire document is loaded, the tree can be accessed quickly and in any order.
• For small and medium-sized XML documents, DOM is generally fast, because repeated lookups work on the in-memory tree instead of re-reading the document.

<?xml version="1.0" ?>
<article>
   <title> C Programming </title>
   <date> July 4, 2001 </date>
   <author>
      <firstname> Dennis </firstname>
      <lastname> Ritchie </lastname>
   </author>
   <summary> C is Basic Programming </summary>
   <content> This book presents variables, functions, loops </content>
</article>

The elements of the document article.xml form a tree whose root is the article element.
• This hierarchical tree structure is called a Document Object Model (DOM) tree, and an XML parser that creates this type of structure is known as a DOM parser.
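
As a rough sketch of how such a tree can be built and inspected, the example below parses a lightly cleaned copy of the article document with Python's xml.dom.minidom and prints the element names indented by depth. The helper element_outline is a name invented for this illustration, not part of any DOM API.

```python
from xml.dom import minidom

# A cleaned copy of article.xml, embedded as a string for the example.
ARTICLE_XML = """<?xml version="1.0"?>
<article>
  <title>C Programming</title>
  <date>July 4, 2001</date>
  <author>
    <firstname>Dennis</firstname>
    <lastname>Ritchie</lastname>
  </author>
  <summary>C is Basic Programming</summary>
  <content>This book presents variables, functions, loops</content>
</article>"""


def element_outline(node, depth=0):
    """Return the element names under node, indented by depth, in document order."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
        for child in node.childNodes:
            # Text nodes (including whitespace) are skipped by the type check.
            lines.extend(element_outline(child, depth + 1))
    return lines


doc = minidom.parseString(ARTICLE_XML)
outline = element_outline(doc.documentElement)
print("\n".join(outline))
```

The printed outline shows article at the top, with title, date, author, summary and content one level below it, and firstname and lastname nested under author.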

In DOM Parser

• The DOM tree has a single root node, which contains all the other nodes in the document.
• Each element (e.g., article, date, firstname) is represented by a node.
• A node that contains other nodes (called child nodes or children) is called a parent node (e.g., author).
• A parent node can have many children, but a child node can have only one parent node.
• Nodes that are peers (e.g., firstname and lastname) are called sibling nodes.
• Applications use the XML DOM API to display and manipulate the document's element names and values.
• The DOM tree contains various kinds of nodes, such as the root node, element nodes, attribute nodes and text (value) nodes.
• The tree structure allows traversal from top to bottom and from bottom to top.
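
The parent, child and sibling relationships listed above can be checked directly on a parsed tree. The following is a minimal sketch using Python's xml.dom.minidom; the input is a shortened, whitespace-free version of the article document so that no whitespace text nodes sit between the elements.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<article><title>C Programming</title>"
    "<author><firstname>Dennis</firstname><lastname>Ritchie</lastname></author>"
    "</article>"
)

author = doc.getElementsByTagName("author")[0]
first = author.firstChild   # the firstname element
last = author.lastChild     # the lastname element

# Bottom-to-top: every child has exactly one parent.
print(first.parentNode.nodeName)    # parent of firstname
print(author.parentNode.nodeName)   # parent of author

# Sideways: firstname and lastname are siblings.
print(first.nextSibling.nodeName)

# Top-to-bottom: the text of an element is itself a child (text) node.
print(first.firstChild.nodeValue)
```

In a document that contains line breaks and indentation between elements, firstChild and nextSibling may return whitespace text nodes, so real code usually filters on nodeType.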
The API used by DOM for handling an XML document is:

Property/Method         Description

Node (XML element)
nodeName                The name of the node.
nodeValue               A string or null, depending on the node type.
parentNode              The parent node.
childNodes              A NodeList with all the children of the node.
firstChild              The first child in the node's NodeList.
lastChild               The last child in the node's NodeList.
attributes              A collection of Attr objects containing the attributes
                        for this node.
insertBefore            Inserts the node (passed as the first argument) before
                        the existing node (passed as the second argument). If
                        the new node is already in the tree, it is removed
                        before insertion.

NodeList
item(index)             Method that receives an index number and returns the
                        node at that index. Indices range from 0 to length - 1.
length                  The total number of nodes in the list.

Document (XML file)
documentElement         The root node of the document.
createElement           Creates and returns an element node with the specified
                        tag name.
createAttribute         Creates and returns an Attr node with the specified
                        name.
getElementsByTagName    Returns a NodeList of all elements with the specified
                        tag name.

Element
tagName                 The name of the element.
getAttribute            Returns the value of the specified attribute.
setAttribute            Changes the value of the attribute passed as the first
                        argument to the value passed as the second argument.
removeAttribute         Removes the specified attribute.

Attribute
value                   The specified attribute's value.
name                    The name of the attribute.
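
A short sketch of several of these properties and methods in use, assuming Python's xml.dom.minidom, which exposes them under the same names. The lang attribute is a hypothetical addition for the example (the article document has no attributes), and createTextNode is a standard DOM method that is not listed in the table above.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<article><title>C Programming</title>"
    "<summary>C is Basic Programming</summary></article>"
)
root = doc.documentElement

# createElement + insertBefore: add a <date> element in front of <summary>.
date = doc.createElement("date")
date.appendChild(doc.createTextNode("July 4, 2001"))
summary = root.getElementsByTagName("summary")[0]
root.insertBefore(date, summary)

# setAttribute / getAttribute / removeAttribute on the root element.
root.setAttribute("lang", "en")   # "lang" is a made-up attribute
lang = root.getAttribute("lang")
root.removeAttribute("lang")

# NodeList: length and item(index).
names = [root.childNodes.item(i).nodeName for i in range(root.childNodes.length)]
print(names)
print(root.toxml())
```

Because the whole document is held in memory as a tree, the new element can be inserted at any position and the result written back out, which is what makes DOM suitable for updating documents.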

2. SAX Parser

• SAX (Simple API for XML) is an event-based parser API for XML documents.
• Unlike a DOM parser, a SAX parser creates no parse tree. SAX is a streaming interface for XML, which means that applications using SAX receive event notifications about the XML document being processed, one element and attribute at a time, in sequential order, starting at the top of the document and ending with the closing of the root element.

In SAX Parser

• The parser reads an XML document from top to bottom, recognizing the tokens that make up a well-formed XML document.
• Tokens are processed in the same order that they appear in the document.
• The application program provides an "event" handler that must be registered with the parser.
• As the tokens are identified, callback methods in the handler are invoked with the relevant information.
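
The handler-and-callback flow can be sketched with Python's standard xml.sax module. The class name EventLogger and the event labels are invented for the illustration; note that Python's callbacks take fewer arguments than the Java signatures, for example startElement(name, attrs).

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every callback the parser makes, in the order it makes them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)

    def characters(self, content):
        if content.strip():
            self.events.append("text:" + content.strip())


handler = EventLogger()
# Registering the handler and parsing happen in one call here.
xml.sax.parseString(
    b"<author><firstname>Dennis</firstname><lastname>Ritchie</lastname></author>",
    handler,
)
print("\n".join(handler.events))
```

The recorded events follow the document order exactly: the start of author, then firstname with its text, then lastname with its text, then the end of author. No tree is built at any point.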

SAX Parser is used when

• The XML document can be processed in a linear fashion from top to bottom.
• The document is not deeply nested.
• The XML document is very large and its DOM tree would consume too much memory.
• The problem to be solved involves only a part of the XML document.
• The document arrives over a stream: data is available as soon as it is seen by the parser.
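
Two of these cases, needing only part of the document and receiving it over a stream, can be illustrated together. The sketch below assumes Python's xml.sax with its default expat-based parser, which accepts input piece by piece through feed(); the TitleGrabber class and the chunk boundaries are made up to simulate data arriving in fragments.

```python
import xml.sax


class TitleGrabber(xml.sax.ContentHandler):
    """Collect only the text of <title>; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True

    def endElement(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, content):
        # Text may be delivered in several pieces, so accumulate it.
        if self.in_title:
            self.parts.append(content)

    @property
    def title(self):
        return "".join(self.parts).strip()


# Simulated network chunks; the splits deliberately fall mid-word and mid-tag.
chunks = [
    "<article><title>C Progr",
    "amming</title><date>July 4, 2001</da",
    "te><summary>C is Basic Programming</summary></article>",
]

handler = TitleGrabber()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
for chunk in chunks:
    parser.feed(chunk)   # hand over each fragment as it arrives
parser.close()           # signal the end of the document

print(handler.title)
```

Only the title text is kept in memory; the rest of the document passes through the parser and is discarded.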

Disadvantages of SAX

• There is no random access to the XML document, since it is processed in a forward-only manner.
• If we need to keep track of data that the parser has already seen, or to change the order of items, we must write the code and store the data on our own.
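
What "storing the data on our own" looks like in practice can be sketched as follows. This is one possible approach, assuming Python's xml.sax: the handler keeps a stack of open elements so that each piece of text can be filed under its path. The class name PathTracker and the path format are choices made for the example.

```python
import xml.sax


class PathTracker(xml.sax.ContentHandler):
    """Remember text by element path, since SAX itself keeps no history."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the elements currently open
        self.values = {}   # path -> accumulated text

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        path = "/".join(self.stack)
        self.values[path] = self.values.get(path, "") + content

    def result(self):
        # Drop entries that hold only whitespace.
        return {p: t.strip() for p, t in self.values.items() if t.strip()}


handler = PathTracker()
xml.sax.parseString(
    b"<article><title>C Programming</title><author>"
    b"<firstname>Dennis</firstname><lastname>Ritchie</lastname>"
    b"</author></article>",
    handler,
)
data = handler.result()

# Reordering is only possible because the handler saved both values.
full_name = data["article/author/lastname"] + ", " + data["article/author/firstname"]
print(full_name)
```

With DOM the same lookup would be a matter of navigating the tree; with SAX the bookkeeping is the application's job.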

The API used by SAX for handling an XML document is (Java signatures from the org.xml.sax ContentHandler interface):

void startDocument() - Called at the beginning of a document.
void endDocument() - Called at the end of a document.
void startElement(String uri, String localName, String qName, Attributes atts) - Called at the beginning of an element.
void endElement(String uri, String localName, String qName) - Called at the end of an element.
void characters(char[] ch, int start, int length) - Called when character data is encountered.
void ignorableWhitespace(char[] ch, int start, int length) - Called when a DTD is present and ignorable whitespace is encountered.

Attributes Interface
This interface specifies methods for processing the attributes connected to an element.
int getLength() - Returns the number of attributes.
String getQName(int index) - Returns the name of the attribute specified by index.
String getValue(int index) - Returns the value of the attribute specified by index.
String getValue(String qName) - Returns the value of the attribute specified by name.
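
A small sketch of attribute handling, written against Python's xml.sax rather than the Java interface listed above. The Python attributes object offers getLength(), getNames() and getValue(name), but has no index-based getQName(int) or getValue(int). The id and lang attributes are hypothetical, since the article document in these notes has none.

```python
import xml.sax


class AttributeDump(xml.sax.ContentHandler):
    """Record the attributes of every element that has any."""

    def __init__(self):
        super().__init__()
        self.seen = {}

    def startElement(self, name, attrs):
        if attrs.getLength() > 0:
            self.seen[name] = {key: attrs.getValue(key) for key in attrs.getNames()}


handler = AttributeDump()
xml.sax.parseString(
    b'<article id="a1" lang="en"><title>C Programming</title></article>',
    handler,
)
print(handler.seen)
```

Attributes are delivered together with the start of their element, so they have to be read inside startElement; they are not available later unless the handler stores them.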

Differences between DOM and SAX parsers

DOM Parser                                        SAX Parser

DOM stands for Document Object Model.             SAX stands for Simple API for XML.

Tree-based parser.                                Event-based parser.

Loads the entire document into memory as a        Does not load the entire document.
DOM tree.

Requires more memory, to store the entire         Requires less memory.
tree structure.

Suited to smaller documents that fit              Suited to large XML files.
comfortably in memory; not suited to large
XML files.

Provides an API for traversing the tree from      Delivers the document in top-to-bottom
top to bottom and bottom to top, with random      order only.
access.

Allows reading, writing and updating the XML      Allows reading only; cannot update the
document.                                         document directly.

The DOM API is a language-neutral W3C             The SAX API originated in Java; equivalent
specification, implemented by XML parsers in      interfaces exist in other languages such as
many languages.                                   PHP and Python.
