XML DB_U3
A SEMINAR REPORT
Submitted By
ATUL KUMAR
of
BACHELOR OF TECHNOLOGY
in
SCHOOL OF ENGINEERING
KOCHI-682022
SEPTEMBER 2010
Division of Computer Engineering
School of Engineering
Cochin University of Science &
Technology
Kochi-682022
_____________________________________________________
CERTIFICATE
Xml Databases
Done by
ATUL KUMAR
ACKNOWLEDGEMENT
I express my sincere thanks to Ms. Anu M., my seminar guide, for her valuable
suggestions and sincere vigilance; to Mr. Sudheep P. Eliydom (staff in charge) for providing the right
guidance and co-operation; and to Dr. David Peter S. (Head of Division) for allowing us to use the
facilities. Also I would like to extend my sincere thanks to all other members of the faculty of
Computer Science and Engineering Department. Last but not least I want to thank my friends for
ATUL KUMAR
ABSTRACT
1. INTRODUCTION
2. SEMI-STRUCTURED DATA
3. XML
4. XML FOR SEMI-STRUCTURED DATA
5. XML DTD: DOCUMENT TYPE DEFINITION
6. XML SCHEMA
7. XPATH
8. XQUERY
9. XSLT
10. XML PARSER
11. XML DATABASE
12. SOAP
13. CONCLUSION
14. REFERENCE
1. INTRODUCTION
For three decades, application developers have relied on relational databases as the bedrock
of the persistent data storage layer. The technology is mature, but today's requirements are
becoming more complex, and a relational database may not be the right tool for the job in hand.
What else can a designer or developer pick, though, if they know of nothing better? Relational
databases were developed in the days of procedural programming languages (e.g. C, COBOL and
RPG). Programming techniques have evolved in many ways over the past 30 years, most notably
with the introduction of the object-oriented approach, but the persistent storage model has stayed
the same. This article asks whether developers have, unknowingly, been dumbing down and creating
more work for themselves for many years, and it introduces a different approach to storing and
retrieving data.
Today, data structures are often modelled as a hierarchy of objects. Imagine
a simple invoice in terms of an object hierarchy:
Invoice = {
    date : "2008-05-24"
    invoiceNumber : 421
    InvoiceItems : {
        Item : {
            quantity : 1
            unitPrice : 105.00
        }
        Item : {
            quantity : 1
            unitPrice : 75.00
        }
        Item : {
            quantity : 2
            unitPrice : 67.50
        }
    }
}
Table Invoices
date invoiceId
2008-05-24 421
Table InvoiceItems
Representing this simple, single Invoice object in a relational database can be done, but
even something this simple immediately needs more than one table, joins based on
keys, and an object that is spread over multiple tables. This leaves room for
human error. When inserting and updating data, it is up to the developer to ensure that keys
match correctly. When rebuilding the object from the persistent layer, an
SQL query must select data from multiple tables; by nature, the query returns the data as
a result set of flat, one-dimensional rows, and it is then up to the developer to rebuild
the hierarchical object from scratch.
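The round trip described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the InvoiceItems columns (itemId, quantity, unitPrice) and the helper name load_invoice are hypothetical, chosen to mirror the invoice object shown earlier.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoices (invoiceId INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE InvoiceItems (
        itemId INTEGER PRIMARY KEY,
        invoiceId INTEGER REFERENCES Invoices(invoiceId),
        quantity INTEGER,
        unitPrice REAL
    );
    INSERT INTO Invoices VALUES (421, '2008-05-24');
    INSERT INTO InvoiceItems (invoiceId, quantity, unitPrice) VALUES
        (421, 1, 105.00), (421, 1, 75.00), (421, 2, 67.50);
""")

def load_invoice(conn, invoice_id):
    """Join both tables, then rebuild the nested object from the flat rows."""
    rows = conn.execute(
        """SELECT i.invoiceId, i.date, it.quantity, it.unitPrice
           FROM Invoices i
           JOIN InvoiceItems it ON it.invoiceId = i.invoiceId
           WHERE i.invoiceId = ?
           ORDER BY it.itemId""",
        (invoice_id,),
    ).fetchall()
    if not rows:
        return None
    # Every row repeats the invoice columns; only the item columns differ.
    invoice = {"invoiceNumber": rows[0][0], "date": rows[0][1], "items": []}
    for _, _, quantity, unit_price in rows:
        invoice["items"].append({"quantity": quantity, "unitPrice": unit_price})
    return invoice

invoice = load_invoice(conn, 421)
```

The join, the key matching and the loop that regroups rows into items are all the developer's responsibility here, which is the overhead the text refers to.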
To a programmer who has been developing with relational databases for some time this may
seem like second nature but for a new developer that has just learned the concepts of Object
Oriented programming this may seem a little alien.
Leaving aside the programmer's responsibility for the mapping between object and
relational structures, the data types in SQL databases are quite simplistic, so all
validation must be performed within the business logic layer of an application before any data
can be inserted or updated in the database.
SQL "CREATE TABLE" and the SQL data types a developer can bind to each column
are too simplistic to be used as a means of validating data taken directly from a user's input.
Often the business logic layer in today's applications performs additional validation, e.g.
it checks that a field is a valid phone number or a valid e-mail address, or even that the
field, when inserted into an SQL INSERT or UPDATE statement, will not break the
syntax or cause a security breach.
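A minimal sketch of that business-logic validation, using Python's sqlite3 module. The Customers table, the insert_customer helper and the deliberately loose e-mail pattern are all assumptions made for illustration.

```python
import re
import sqlite3

# Deliberately loose pattern: an assumption for illustration, not a full e-mail validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def insert_customer(conn, name, email):
    """Validate in application code, then let the driver bind the values."""
    if not EMAIL_RE.match(email):
        raise ValueError(f"invalid e-mail address: {email!r}")
    # '?' placeholders keep user input out of the SQL text itself.
    conn.execute("INSERT INTO Customers (name, email) VALUES (?, ?)", (name, email))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, email TEXT)")

# Input that would break a hand-concatenated statement is stored as plain text.
insert_customer(conn, "Robert'); DROP TABLE Customers;--", "bob@example.com")
stored = conn.execute("SELECT name FROM Customers").fetchall()
```

The column type TEXT accepts any string, so both the format check and the protection against broken syntax live in application code rather than in the table definition.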
Object Relational Mapping (O/R Mapping) has definitely eased these problems with relational
databases, because it allows a relational database to become a "virtual object database", but it
has brought some problems of its own. O/R Mapping techniques and frameworks
can be difficult to learn, and it is by no means simple to map complex Java classes with multiple
descendant classes to a relational structure. Validating user input is still cumbersome and
still needs to be written in full in the business logic layer. The mapping also adds
performance overhead, because the O/R mapping process essentially attempts to
emulate the natural functionality of an object-oriented database.
Object oriented databases are designed to work well with object oriented programming
languages such as Java, C# and C++. Object Databases use the same model as today's
programming languages as they store and index theoretical objects. Object databases are
generally recommended when there is a business need for high performance processing on
complex data.
Two things have held object databases back over the years: (a) the industry's resistance to
change, and (b) the reluctance of most developers in the industry to investigate new or
alternative technologies to the ones that are commonplace.
Thankfully, change does happen. Today we are living in the information age, and
businesses are talking to each other via complex XML data structures, with SOAP and RESTful
Web Services becoming ever more popular means of information exchange between
disparate applications and systems.
The XML messages exchanged are by nature hierarchical and deeply tree-structured.
Sometimes the data is unpredictable, and sometimes the structure is prone to change at
any time. Developers trying to map this data to a relational structure may find their lives
becoming more and more difficult.
XML databases offer the same functionality as object databases: data is structured in a
hierarchical manner, except that XML databases store XML documents instead of theoretical
objects. While in principle this is the same concept of data storage, XML databases have the
added benefit of being able to exchange the data in its native format, which suits today's
requirements.
Where object databases have Object Query Language (OQL), XML databases have XQuery,
which is a W3C standard. XQuery covers the major functionality of earlier language
proposals such as XML-QL, XQL, OQL and the SQL standard.
Returning to the Invoice object and the persistent layer: a developer working with an XML
database would just need to place an XML representation of the object into a collection.
The following is the invoice data stored in XML format:
<invoice>
<number>421</number>
<date>2008-05-24</date>
<items>
<item>
<quantity>1</quantity>
<unitPrice>105.00</unitPrice>
</item>
<item>
<quantity>1</quantity>
<unitPrice>75.00</unitPrice>
</item>
<item>
<quantity>2</quantity>
<unitPrice>67.50</unitPrice>
</item>
</items>
</invoice>
Pulling up the full invoice from the XML database requires no long-winded table joins; it is
as simple as:
XQuery
collection("invoices")/invoice[number=421]
This is simple compared with the equivalent SQL for relational databases:
Equivalent SQL
XML databases can accept structured as well as unstructured data. XML documents do not
have to conform to any set schema, so a developer can store anything they wish in the database
without modifying tables and columns. On the other hand, XML may conform to an XML
Schema.
XML Schema allows one to define both the node structure of an XML document (e.g.
elements and attributes) and the data types contained within these nodes. Data types can be
defined in very explicit detail, e.g. a float with additional constraints such as
maximum value, minimum value, total digits and fraction digits. Strings can also be
given many additional constraints, including minimum and maximum lengths and
a user-defined regular expression that the value must match; this is perhaps the most effective
constraint.
Because XML Schema is so explicit about the constraints that can
be placed on XML data, the large amounts of validation that would normally be
performed in the business logic layer of an application can be reduced dramatically or even
eliminated.
A useful tool for Java/J2EE developers is the Java Architecture for XML Binding (JAXB), which
allows a developer to generate simple Java Bean classes that represent the structure of an
underlying XML document. The classes can be generated from an existing XML Schema:
Object/XML mapping, if you like.
JAXB allows a developer to convert XML documents into in-memory Java Bean objects
that act as an interface to the underlying XML, and it can serialize these in-memory
Java objects back into XML documents. Validation of the in-memory data is
performed against the original XML Schema from which the classes were generated, which
means far less validation code, or none, needs to be written in the business logic layer of the
application.
JAXB also allows the developer to generate an XML Schema from existing Java code, so
a developer can use an XML database much like an object database without ever getting into
the detail of using XML, XQuery or SOAP / RESTful Web Services.
Conclusion
For a new project that deals with XML and/or unpredictable data, choosing a relational
database will not stop the project in its tracks, but a great deal of time will be wasted on
trivial matters that could easily be solved by using an XML database instead.
2. Semi-structured data
Data that is inherently self-describing and does not conform to any explicit, fixed
schema is known as semi-structured data. An example of such data is an XML document.
The structure is implicit in the data: for example, XML tags define the structure of the data in
an XML document. The information that would normally be associated with a schema is
contained within the data itself. Semi-structured data is usually formalized as labeled graphs.
Some examples of semi-structured data are letters, documents, web information systems,
digital libraries, and heterogeneous data integration. A letter has a limited structure, as every
letter starts with ‘to’ and ends with ‘from’, but in between the structure of a letter
changes from person to person, from place to place and from one situation to another. With
the advent of the web, the flow of semi-structured data increased manyfold.
Irregularity in structure:
NOTICE
To : the students
Here, in this example, a notice has a certain structure from ‘to’ to ‘heading’, but after that its
structure can change significantly depending on what kind of notice it is.
3. XML
Extensible Markup Language (XML) is a set of rules for encoding documents in machine-
readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several
other related specifications, all gratis open standards.
XML's design goals emphasize simplicity, generality, and usability over the Internet. It is a
textual data format with strong support via Unicode for the languages of the world. Although
the design of XML focuses on documents, it is widely used for the representation of arbitrary
data structures, for example in web services.
Many application programming interfaces (APIs) have been developed that software
developers use to process XML data, and several schema systems exist to aid in the definition
of XML-based languages.
As of 2009, hundreds of XML-based languages have been developed, including RSS, Atom,
SOAP, and XHTML. XML-based formats have become the default for most office-
productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org and
Apple's iWork.
Key terminology
The material in this section is based on the XML Specification. This is not an exhaustive list
of all the constructs which appear in XML; it provides an introduction to the key constructs
most often encountered in day-to-day use.
(Unicode) Character
By definition, an XML document is a string of characters.
Processor and Application
The processor analyzes the markup and passes structured information to an application.
The specification places requirements on what an XML processor must do and not do, but the
application is outside its scope. The processor (as the specification calls it) is often referred to
colloquially as an XML parser.
Markup and Content
The characters which make up an XML document are divided into markup and content.
Markup and content may be distinguished by the application of simple syntactic rules. All
strings which constitute markup either begin with the character "<" and end with a ">", or
begin with the character "&" and end with a ";". Strings of characters which are not markup
are content.
Tag
A markup construct that begins with "<" and ends with ">". Tags come in three flavors:
start-tags, for example <section>, end-tags, for example </section>, and empty-element tags,
for example <line-break/>.
Element
A logical component of a document which either begins with a start-tag and ends with a
matching end-tag, or consists only of an empty-element tag. The characters between the start-
and end-tags, if any, are the element's content, and may contain markup, including other
elements, which are called child elements. An example of an element is <Greeting>Hello,
world.</Greeting> (see hello world). Another is <line-break/>.
Attribute
A markup construct consisting of a name/value pair that exists within a start-tag or empty-
element tag. In the example (below) the element img has two attributes, src and alt: <img
src="madonna.jpg" alt='by Raphael'/>. Another example would be <step
number="3">Connect A to B.</step> where the name of the attribute is "number" and the
value is "3".
XML Declaration
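XML documents may begin by declaring some information about themselves, as in the
following example.
<?xml version="1.0" encoding="UTF-8" ?>
Example:
Here is a small, complete XML document, which uses all of these constructs and concepts.
<?xml version="1.0" encoding="UTF-8" ?>
<painting>
<img src="madonna.jpg" alt='by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
<date>1511</date>–<date>1512</date>.
</caption>
</painting>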
There are five elements in this example document: painting, img, caption, and two dates. The
date elements are children of caption, which is a child of the root element painting. img has
two attributes, src and alt.
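The element and attribute counts above can be checked with a short sketch using Python's xml.etree.ElementTree. The caption text in this copy of the document is a shortened stand-in, which is an assumption; only the element structure matters here.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the painting document; the caption wording is an assumption.
doc = """<painting>
  <img src="madonna.jpg" alt="by Raphael"/>
  <caption>Painted in <date>1511</date>-<date>1512</date>.</caption>
</painting>"""

root = ET.fromstring(doc)

# iter() walks the tree in document order, starting with the root itself.
names = [element.tag for element in root.iter()]

# Attributes are exposed as a dictionary on the element that carries them.
img = root.find("img")
attributes = dict(img.attrib)
```

Here names holds the five element names in document order, and attributes holds the two name/value pairs of img.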
4. XML for semi-structured data
<NOTICE>
<body> air conditioner will be installed in all rooms by this week </body>
</NOTICE>
<imdb>
<show year="2010">
<review>
<suntimes>
</suntimes>
</review>
<review>
………………………
</review>
</show>
<show year="2010">
………………
………………
</show>
………………
</imdb>
Thus XML can store anything from a very small piece of semi-structured data, such as a notice,
to the data behind a huge website such as IMDb.
5. XML DTD: Document Type Definition
DTDs use a terse formal syntax that declares precisely which elements and references may
appear where in the document of the particular type, and what the elements’ contents and
attributes are. DTDs also declare entities which may be used in the instance document.
Markup declarations
DTDs describe the structure of a class of documents via element and attribute-list
declarations. Element declarations name the allowable set of elements within the
document, and specify whether and how declared elements and runs of character data may
be contained within each element. Attribute-list declarations name the allowable set of
attributes for each declared element, including the type of each attribute value, if not an
explicit set of valid value(s).
DTD markup declarations declare which element types, attribute lists, entities and notations
are allowed in the structure of the corresponding class of XML documents.
An element type declaration defines an element and its possible content. A valid XML
document contains only elements that are defined in the DTD.
Various keywords and characters specify an element's content:
* EMPTY specifies that the defined element allows no content, i.e. it cannot have any
  child elements, not even text elements (if there is whitespace, it is ignored);
* ANY specifies that the defined element allows any content, without restriction, i.e.
  it may have any number (including none) and type of child elements (including text
  elements);
* or an expression, specifying the only elements allowed as direct children in the content
  of the defined element; this content can be either:
    o mixed content, which means that the content may include text as well as named
      elements; it is written in one of two forms:
        + ( #PCDATA ): historically meaning parsed character data, this means that only one
          text element is allowed in the content (no quantifier is allowed);
        + ( #PCDATA | element name | ... )*: a limited choice (in an exclusive list between
          parentheses, separated by "|" pipe characters and terminated by the required "*"
          quantifier) of two or more child elements (including only text elements or the
          specified named elements) that may be used in any order and number of occurrences
          in the content.
    o element content, which means that there must be no text elements among the
      child elements of the content (all whitespace between child elements is then
      ignored, just like comments). Such element content is specified as a content particle in a
      variant of Backus-Naur Form without terminal symbols and with element names as
      non-terminal symbols. Element content consists of:
        + a content particle, which can be either the name of an element declared in the DTD, or
          a sequence list or choice list. It may be followed by an optional quantifier.
        + a sequence list, which is an ordered list (specified between parentheses and separated
          by "," comma characters) of one or more content particles: all of them must appear
          successively, in the given order, in the content of the defined element.
        + a choice list, which is a mutually exclusive list (specified between parentheses and
          separated by "|" pipe characters) of two or more content particles: only one of these
          content particles may appear in the content of the defined element at the same position.
        + a quantifier, which is a single character that immediately follows the item it applies
          to and restricts the number of successive occurrences of that item:
            # + specifies that there must be one or more occurrences of the item;
              the effective content of each occurrence may be different;
            # * specifies that any number (zero or more) of occurrences is allowed;
            # ? specifies that there must not be more than one occurrence; the item
              is optional;
            # with no quantifier, the specified item must occur exactly once at the
              specified position in the content of the element.
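For example, a very simple external XML DTD describing the schema of a list of persons
might consist of:
<!ELEMENT people_list (person)*>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
Taking this line by line: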
1. people_list is a valid element name, and an instance of such an element contains any
number of person elements. The * denotes there can be 0 or more person elements within
the people_list element.
2. person is a valid element name, and an instance of such an element contains one
element named name, followed by one named birthdate (optional), then gender (also
optional) and socialsecuritynumber (also optional). The ? indicates that an element is
optional. The reference to the name element has no ?, so a person element must
contain a name element.
3. name is a valid element name, and an instance of such an element contains "parsed
character data" (#PCDATA).
4. birthdate is a valid element name, and an instance of such an element contains parsed
character data.
5. gender is a valid element name, and an instance of such an element contains parsed
character data.
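Because a DTD content model is essentially a regular expression over child element names, the rule for person in item 2 can be sketched as one. Python's standard library does not validate against DTDs, so this is a hand-written check of that single rule only; PERSON_MODEL and person_is_valid are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <person>: (name, birthdate?, gender?, socialsecuritynumber?)
# Each child name is followed by a comma so names cannot run into each other.
PERSON_MODEL = re.compile(r"^name,(birthdate,)?(gender,)?(socialsecuritynumber,)?$")

def person_is_valid(person):
    """Check only the order and presence of child elements, not their text."""
    children = "".join(child.tag + "," for child in person)
    return bool(PERSON_MODEL.match(children))

ok = ET.fromstring("<person><name>Fred Bloggs</name><gender>Male</gender></person>")
wrong_order = ET.fromstring("<person><gender>Male</gender><name>Fred Bloggs</name></person>")
no_name = ET.fromstring("<person><birthdate>2008-11-27</birthdate></person>")
```

The first person passes because birthdate and socialsecuritynumber are optional; the other two fail because name is required and must come first.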
An example of an XML file which makes use of and conforms to this DTD follows. The DTD is
referenced here as an external subset, via the SYSTEM specifier and a URI. It assumes that
we can identify the DTD with the relative URI reference "example.dtd"; the "people_list"
after "!DOCTYPE" tells us that the root element, the first element defined in the DTD, is called
"people_list":
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
<person>
<name>Fred Bloggs</name>
<birthdate>2008-11-27</birthdate>
<gender>Male</gender>
</person>
</people_list>
The same DTD can also be embedded directly in the XML document itself as an internal
subset, by surrounding it within [square brackets] in the document type declaration, in
which case the document may no longer depend on other external entities and could be
processed as standalone, like this:
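<!DOCTYPE people_list [
<!ELEMENT people_list (person)*>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
]>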
<people_list>
<person>
<name>Fred Bloggs</name>
<birthdate>2008-11-27</birthdate>
<gender>Male</gender>
</person>
</people_list>
6. XML Schema
XML Schema, published as a W3C recommendation in May 2001, is one of several XML
schema languages. It was the first separate schema language for XML to achieve
Recommendation status by the W3C.
Schema documents are organized by namespace: all the named schema components belong
to a target namespace, and the target namespace is a property of the schema document as
a whole. A schema document may include other schema documents for the same
namespace, and may import schema documents for a different namespace.
XML Schema Documents usually have the filename extension ".xsd". A unique Internet
Media Type is not yet registered for XSDs, so "application/xml" or "text/xml" should be
used, as per RFC 3023.
Example
<xs:schema elementFormDefault="qualified"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="House" type="xs:string"/>
<xs:element name="Street" type="xs:string"/>
<xs:element name="Town" type="xs:string"/>
<xs:element name="PostCode" type="xs:string"/>
<xs:element name="Country">
<xs:simpleType>
<xs:restriction base="xs:string">
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="SimpleAddress.xsd">
<House>49</House>
<Street>Featherstone Street</Street>
<Town>LONDON</Town>
<PostCode>EC1Y 8SY</PostCode>
<Country>UK</Country>
</Address>
7. XPath
XPath 2.0 is the current version of the XPath language defined by the World Wide Web
Consortium, W3C. It became a recommendation on 23 January 2007.
XPath is used primarily for selecting parts of an XML document. For this purpose the XML
document is modelled as a tree of nodes. XPath allows nodes to be selected by means of a
hierarchic navigation path through the document tree.
XPath 2.0 is used as a sublanguage of XSLT 2.0, and it is also a subset of XQuery 1.0. All three
languages share the same data model, type system, and function library, and were
developed together and published on the same day.
Data model
Every value in XPath 2.0 is a sequence of items. The items may be nodes or atomic values.
An individual node or atomic value is considered to be a sequence of length one. Sequences
may not be nested.
Nodes are of seven kinds, corresponding to different constructs in the syntax of XML:
elements, attributes, text nodes, comments, processing instructions, namespace nodes, and
document nodes.
Type system
The type system of XPath 2.0 is noteworthy for the fact that it mixes strong typing and weak
typing within a single language.
Operations such as arithmetic and boolean comparison require atomic values as their
operands. If an operand returns a node (for example, @price * 1.2), then the node is
automatically atomized to extract the atomic value. If the input document has been
validated against a schema, then the node will typically have a type annotation, and this
determines the type of the resulting atomic value (in this example, the price attribute might
have the type decimal). If no schema is in use, the node will be untyped, and the type of the
resulting atomic value will be untypedAtomic.
Path expressions
The location paths of XPath 1.0 are referred to in XPath 2.0 as path expressions. Informally,
a path expression is a sequence of steps separated by the "/" operator, for example a/b/c
(which is short for child::a/child::b/child::c). More formally, however, "/" is simply a binary
operator that applies the expression on its right-hand side to each item in turn selected by
the expression on the left hand side. So in this example, the expression a selects all the
element children of the context node that are named <a>; the expression child::b is then
applied to each of these nodes, selecting all the <b> children of the <a> elements; and the
expression child::c is then applied to each node in this sequence, which selects all the <c>
children of these <b> elements.
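The step-by-step reading of a/b/c can be sketched in Python. ElementTree implements only a limited XPath subset, but that subset includes child steps, which is all this example needs. The sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample tree; the context node is <root>.
root = ET.fromstring(
    "<root><a><b><c>1</c><c>2</c></b><b><d/></b></a><a><b><c>3</c></b></a></root>"
)

# The whole path evaluated at once.
direct = [c.text for c in root.findall("a/b/c")]

# The same selection as three explicit steps, each applied to every node
# selected by the step before it.
step_a = root.findall("a")
step_b = [b for a in step_a for b in a.findall("b")]
step_c = [c.text for b in step_b for c in b.findall("c")]
```

Both approaches select the same three c elements in document order, which is the sense in which "/" applies its right-hand side to each item produced by its left-hand side.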
The "/" operator is generalized in XPath 2.0 to allow any kind of expression to be used as an
operand. For example, a function call can be used on the right-hand side. The typing rules
for the operator require that the result of the first operand is a sequence of nodes. The
right-hand operand can return either nodes or atomic values (but not a mixture). If the result
consists of nodes, then duplicates are eliminated and the nodes are returned in document
order, an ordering defined in terms of the relative positions of the nodes in the original
XML tree.
Operators                   Effect
+, -, *, div, mod, idiv     Arithmetic on numbers, dates, and durations
=, !=, <, >, <=, >=         General comparison: compare arbitrary sequences. The result is true
                            if any pair of items, one from each sequence, satisfies the comparison
eq, ne, lt, gt, le, ge      Value comparison: compare single items
is                          Compare node identity: true if both operands are the same node
<<, >>                      Compare node position, based on document order
union, intersect, except    Compare sequences of nodes, treating them as sets, returning the set
                            union, intersection, or difference
and, or                     Boolean conjunction and disjunction. Negation is achieved using the
                            not() function.
to                          Defines an integer range, for example 1 to 10
instance of                 Determines whether a value is an instance of a given type
cast as                     Converts a value to a given type
castable as                 Tests whether a value is convertible to a given type
XPath 2.0 also offers a for expression, which is a small subset of the FLWOR expression from
XQuery. The expression for $x in X return Y evaluates the expression Y for each value in the
result of expression X in turn, referring to that value using the variable reference $x.
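Two of these constructs have close Python analogues, sketched here as an illustration rather than as an XPath implementation. The helper name general_eq is hypothetical.

```python
def general_eq(left, right):
    """XPath '=' on sequences: true if any pair of items is equal."""
    return any(x == y for x in left for y in right)

# General comparison is existential, so (1, 2) = (2, 3) is true in XPath 2.0,
# and (1, 2) != (2, 3) is true as well, because some pair differs.
equal_pair_exists = general_eq([1, 2], [2, 3])
unequal_pair_exists = any(x != y for x in [1, 2] for y in [2, 3])

# for $x in 1 to 5 return $x * $x
# The 'to' operator includes both endpoints, hence range(1, 6).
squares = [x * x for x in range(1, 6)]
```

The value comparison operators (eq, ne and so on) have no such existential behaviour: they compare exactly one item with exactly one item.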
8. XQuery
XQuery is a query and functional programming language that is designed to query
collections of XML data. The mission of the XML Query project is to provide flexible query
facilities to extract data from real and virtual documents on the World Wide Web, therefore
finally providing the needed interaction between the Web world and the database world.
Ultimately, collections of XML files will be accessed like databases.
XQuery provides the means to extract and manipulate data from XML documents or any
data source that can be viewed as XML, such as relational databases or office documents.
XQuery uses XPath expression syntax to address specific parts of an XML document. It
supplements this with a SQL-like "FLWOR expression" for performing joins. A FLWOR
expression is constructed from the five clauses after which it is named: FOR, LET, WHERE,
ORDER BY, RETURN.
XQuery 1.0 does not include features for updating XML documents or
databases; it also lacks full text search capability. These features are both under active
development for a subsequent version of the language.
XQuery is a programming language that can express arbitrary XML to XML data
transformations with the following features:
2. Declarative
3. High level
4. Side-effect free
5. Strongly typed.
Examples
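The sample XQuery code below lists the unique speakers in each act of Shakespeare's play
Hamlet, encoded in hamlet.xml
(: assumes hamlet.xml uses ACT, TITLE and SPEAKER elements :)
<html><head/><body>
{
  for $act in doc("hamlet.xml")//ACT
  let $speakers := distinct-values($act//SPEAKER)
  return
    <div>
      <h1>{ string($act/TITLE) }</h1>
      <ul>
      { for $speaker in $speakers return <li>{ $speaker }</li> }
      </ul>
    </div>
}
</body></html>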
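For readers without an XQuery processor to hand, the query described above (the unique speakers in each act) can be approximated in Python. The small inline document stands in for hamlet.xml, and its ACT, TITLE, SPEECH and SPEAKER element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for hamlet.xml; the element names are assumptions.
play = ET.fromstring("""
<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SPEECH><SPEAKER>BERNARDO</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>FRANCISCO</SPEAKER></SPEECH>
    <SPEECH><SPEAKER>BERNARDO</SPEAKER></SPEECH>
  </ACT>
  <ACT><TITLE>ACT II</TITLE>
    <SPEECH><SPEAKER>POLONIUS</SPEAKER></SPEECH>
  </ACT>
</PLAY>""")

speakers_by_act = {}
for act in play.iter("ACT"):                       # for each ACT in the document
    names = [s.text for s in act.iter("SPEAKER")]  # every SPEAKER below this act
    # dict.fromkeys drops duplicates while keeping first-seen order.
    speakers_by_act[act.findtext("TITLE")] = list(dict.fromkeys(names))
```

The loop over acts plays the role of the FOR clause, the names list that of a LET binding, and the dictionary entry that of the RETURN clause.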
9. XSLT
XSLT (Extensible Stylesheet Language Transformations) is a declarative, XML-based language
used for the transformation of XML documents into other XML documents. The original
document is not changed; rather, a new document is created based on the content of an
existing one.[2] The new document may be serialized (output) by the processor in standard
XML syntax or in another format, such as HTML or plain text.[3] XSLT is often used to
convert XML data into HTML or XHTML documents for display as a web page: the
transformation may happen dynamically either on the client or on the server, or it may be
done as part of the publishing process. It is also used to create output for printing or direct
video display, typically by transforming the original XML into XSL Formatting Objects to
create formatted output which can then be converted to a variety of formats, a few of
which are PDF, PostScript, AWT and PNG. XSLT is also used to translate XML messages
between different XML schemas, or to make changes to documents within the scope of a
single schema, for example by removing the parts of a message that are not needed.
XSLT examples
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/persons">
<root>
<xsl:apply-templates select="person"/>
</root>
</xsl:template>
<xsl:template match="person">
<name username="{@username}">
<xsl:value-of select="name" />
</name>
</xsl:template>
</xsl:stylesheet>
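<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">
<xsl:template match="/persons">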
<html>
<head> <title>Testing XML Example</title> </head>
<body>
<h1>Persons</h1>
<ul>
<xsl:apply-templates select="person">
<xsl:sort select="family-name" />
</xsl:apply-templates>
</ul>
</body>
</html>
</xsl:template>
<xsl:template match="person">
<li>
<xsl:value-of select="family-name"/><xsl:text>, </xsl:text>
<xsl:value-of select="name"/>
</li>
</xsl:template>
</xsl:stylesheet>
Applying this stylesheet to an XML input file of persons produces an XHTML page that lists
each person as a family name followed by a given name, sorted by family name.
[Figure: Rendered XHTML generated from an XML input file and an XSLT transformation.]
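Python's standard library has no XSLT processor, so the sketch below is not XSLT; it re-expresses the sorted-list transformation in plain Python to show what the templates do. The input document is hypothetical, with element names taken from the templates' select expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names follow the templates' select expressions.
persons = ET.fromstring("""
<persons>
  <person username="JS1"><name>John</name><family-name>Smith</family-name></person>
  <person username="MI1"><name>Morka</name><family-name>Ismincius</family-name></person>
</persons>""")

ul = ET.Element("ul")
# <xsl:sort select="family-name"/> becomes an ordinary sort key.
for person in sorted(persons.findall("person"), key=lambda p: p.findtext("family-name")):
    # The template for "person" emits one <li> per matched element.
    li = ET.SubElement(ul, "li")
    li.text = f'{person.findtext("family-name")}, {person.findtext("name")}'

html = ET.tostring(ul, encoding="unicode")
```

As with XSLT, the input tree is left unchanged and a new tree is built from it.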
10. XML Parser
A parser is a piece of a program that takes a physical representation of some data and
converts it into an in-memory form for the program as a whole to use. Parsers are used
everywhere in software. An XML parser is a parser that is designed to read XML and create a
way for programs to use XML. There are different types, and each has its advantages. Unless
a program simply and blindly copies the whole XML file as a unit, every program must
implement or call on an XML parser.
The main types of parsers are known by their acronyms: SAX and DOM.
What is SAX?
SAX stands for Simple API for XML. Its main characteristic is that as it reads each unit of
XML, it creates an event that the calling program can use. This allows the calling program to
ignore the bits it doesn't care about, and just keep or use what it likes. The disadvantage is
that the calling program must keep track of everything it might ever need. SAX is often used
in certain high-performance applications or areas where the size of the XML might exceed
the memory available to the running program.
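A minimal sketch of the event-driven style, using Python's xml.sax module. The handler class NameCollector is a hypothetical name, and the input is a cut-down version of the people_list document used earlier.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Keeps only the text of <name> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, so accumulate.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False

handler = NameCollector()
xml.sax.parseString(
    b"<people_list><person><name>Fred Bloggs</name><gender>Male</gender></person></people_list>",
    handler,
)
```

The handler has to track its own state (the _in_name flag and the buffer), which is the bookkeeping cost mentioned above; in return, nothing except the collected names is kept in memory.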
What's a DOM?
DOM stands for Document Object Model. It differs from SAX in that it builds the entire XML
document representation in memory and then hands the calling program the whole chunk
of memory. DOM can be very memory intensive; by the time you figure in the overhead for
managing the relationships of the nodes, you might be talking 4× to 8× the size of the
original document in memory usage.
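The contrast with SAX can be sketched using Python's xml.dom.minidom, a lightweight DOM implementation in the standard library. The input is again a cut-down people_list document.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<people_list><person><name>Fred Bloggs</name><gender>Male</gender></person></people_list>"
)

# The whole tree is in memory, so nodes can be visited in any order:
# start at <gender>, walk up to its parent, then back down to <name>.
gender = dom.getElementsByTagName("gender")[0]
person = gender.parentNode
name_text = person.getElementsByTagName("name")[0].firstChild.data
```

Navigating upwards from gender to person is possible only because every node and its relationships are held in memory, which is also where the extra memory cost comes from.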
DOM has been widely criticized for being too complicated: it has tried to maintain the same
programming interface in whatever language it is implemented in, even where that violates some
of the conventions of that language. This has led to some DOM-like implementations that
are more in keeping with the philosophy of the host language.
11. XML Database
An XML database is a data persistence software system that allows data to be stored in XML
format. This data can then be queried, exported and serialized into the desired format.
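Two major classes of XML database exist:
1. XML-enabled: these map XML to a traditional database (such as a relational database),
accepting XML as input and rendering XML as output.
2. Native XML (NXD): the internal model of such databases depends on XML and uses XML
documents as the fundamental unit of storage, which are, however, not necessarily stored
in the form of text files.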
O'Connell gives one reason for the use of XML in databases: the increasingly common use of
XML for data transport, which has meant that "data is extracted from databases and put
into XML documents and vice-versa". It may prove more efficient (in terms of conversion
costs) and easier to store the data in XML format.
The term "native XML database" (NXD) can lead to confusion. Many NXDs do not function as
standalone databases at all, and do not really store the native (text) form.
The formal definition from the XML:DB Initiative (which appears to have been inactive since
2003) states that a native XML database:
* Defines a (logical) model for an XML document (as opposed to the data in that
document) and stores and retrieves documents according to that model. At a minimum,
the model must include elements, attributes, PCDATA, and document order. Examples of
such models include the XPath data model, the XML Infoset, and the models implied by the
DOM and the events in SAX 1.0.
* Has an XML document as its fundamental unit of (logical) storage, just as a relational
database has a row in a table as its fundamental unit of (logical) storage.
* Need not have any particular underlying physical storage model. For example, NXDs can
use relational, hierarchical, or object-oriented database structures, or use a proprietary
storage format (such as indexed, compressed files).
Additionally, many XML databases provide a logical model of grouping documents, called
"collections". Databases can set up and manage many collections at one time. In some
implementations, a hierarchy of collections can exist, much in the same way that an
operating system's directory-structure works.
All XML databases now support at least one form of querying syntax. Minimally, just about
all of them support XPath for performing queries against documents or collections of
documents. XPath provides a simple pathing system that allows users to identify nodes that
match a particular set of criteria.
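The idea of running one path expression against a collection of documents can be sketched as follows. This is only an illustration: the in-memory dictionary standing in for a collection, the document names and the query helper are all assumptions, and Python's xml.etree.ElementTree supports only a subset of XPath, not the full language a real XML database would offer.

```python
import xml.etree.ElementTree as ET

# A hypothetical "collection": document name -> XML text.
collection = {
    "shelf1.xml": "<library><book year='2008'><title>XML Basics</title></book></library>",
    "shelf2.xml": (
        "<library>"
        "<book year='2010'><title>Databases</title></book>"
        "<book year='2010'><title>SOAP in Practice</title></book>"
        "</library>"
    ),
}


def query(collection, path):
    """Apply one path expression to every document in the collection.

    Returns (document name, node text) pairs for each matching node.
    """
    results = []
    for name, text in collection.items():
        root = ET.fromstring(text)
        for node in root.findall(path):
            results.append((name, node.text))
    return results


# Titles of all books whose 'year' attribute equals 2010.
titles_2010 = query(collection, ".//book[@year='2010']/title")
print(titles_2010)
```

A real database would normally answer such a query from indexes instead of parsing every document each time.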
Many XML databases also support XQuery to perform querying. XQuery includes XPath as a
node-selection method, but extends XPath to provide transformational capabilities. Users
sometimes refer to its syntax as "FLWOR" (pronounced "flower") because a query may
include the following clauses: 'for', 'let', 'where', 'order by' and 'return'. Traditional RDBMS
vendors, whose engines historically supported only SQL, now ship hybrid SQL and
XQuery engines. These hybrid engines can query XML data alongside relational data
in the same query expression, which helps in combining the two.
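To show what the five FLWOR clauses do, the sketch below walks through them one by one. It is a Python analogue written with xml.etree.ElementTree, not XQuery itself; the sample data and element names are assumptions made up for illustration, and the XQuery text in the comment is only an approximate equivalent.

```python
import xml.etree.ElementTree as ET

# Roughly the XQuery being imitated (for reference only):
#   for $b in doc("books.xml")//book
#   let $p := xs:decimal($b/price)
#   where $p > 25
#   order by $p descending
#   return <expensive>{$b/title/text()}</expensive>

# Hypothetical sample data.
BOOKS = """<library>
  <book><title>XML Basics</title><price>20</price></book>
  <book><title>Databases</title><price>35</price></book>
  <book><title>SOAP in Practice</title><price>50</price></book>
</library>"""

root = ET.fromstring(BOOKS)

rows = []
for book in root.findall(".//book"):          # for: iterate over selected nodes
    price = float(book.findtext("price"))     # let: bind a derived value
    if price > 25:                            # where: filter
        rows.append((price, book))

rows.sort(key=lambda row: row[0], reverse=True)  # order by ... descending

# return: construct new XML from the surviving items.
result = ET.Element("expensive-books")
for price, book in rows:
    item = ET.SubElement(result, "expensive")
    item.text = book.findtext("title")

print(ET.tostring(result, encoding="unicode"))
```

The last step is what separates XQuery from plain XPath: the query does not just select existing nodes but builds a new document from them.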
Some XML databases support an API called the XML:DB API (or XAPI) as a form of
implementation-independent access to the XML data store. In XML databases, XAPI
plays a role similar to that of ODBC and JDBC in relational databases. On 24 June 2009, the
Java Community Process released the final version of the XQuery API for Java specification
(XQJ), "a common API that allows an application to submit queries conforming to the W3C
XQuery 1.0 specification and to process the results of such queries".
Examples of XML databases and the languages they are implemented in:
* eXist: Java
* MonetDB/XQuery: C++
12. SOAP
SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for
exchanging structured information in the implementation of Web Services in computer
networks. It relies on Extensible Markup Language (XML) for its message format, and usually
relies on other Application Layer protocols, most notably Remote Procedure Call (RPC) and
Hypertext Transfer Protocol (HTTP), for message negotiation and transmission. SOAP can
form the foundation layer of a web services protocol stack, providing a basic messaging
framework upon which web services can be built. This XML-based protocol consists of three
parts: an envelope, which defines what is in the message and how to process it, a set of
encoding rules for expressing instances of application-defined datatypes, and a convention
for representing procedure calls and responses.
As a layman's example of how SOAP procedures can be used, a SOAP message could be sent
to a web-service-enabled web site, for example, a real-estate price database, with the
parameters needed for a search. The site would then return an XML-formatted document
with the resulting data, e.g., prices, location, features. Because the data is returned in a
standardized machine-parseable format, it could then be integrated directly into a third-
party web site or application.
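The real-estate example can be sketched as follows. This is only an illustration under stated assumptions: the service namespace, the FindListings operation and all element names inside the body are hypothetical, and no network call is made. Only the envelope namespace is the standard one for SOAP 1.1. In practice the request text would typically be sent to the service in an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/realestate"               # hypothetical service namespace


def build_request(city, max_price):
    """Build a SOAP envelope carrying the search parameters in its body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}FindListings")  # hypothetical operation
    ET.SubElement(call, f"{{{APP_NS}}}City").text = city
    ET.SubElement(call, f"{{{APP_NS}}}MaxPrice").text = str(max_price)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Extract (location, price) pairs from a response envelope."""
    ns = {"soap": SOAP_NS, "re": APP_NS}
    root = ET.fromstring(xml_text)
    listings = root.findall("soap:Body/re:FindListingsResponse/re:Listing", ns)
    return [
        (item.findtext("re:Location", namespaces=ns),
         int(item.findtext("re:Price", namespaces=ns)))
        for item in listings
    ]


# A made-up response of the kind such a service might return.
SAMPLE_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:re="{APP_NS}">
  <soap:Body>
    <re:FindListingsResponse>
      <re:Listing><re:Location>Kochi</re:Location><re:Price>90000</re:Price></re:Listing>
    </re:FindListingsResponse>
  </soap:Body>
</soap:Envelope>"""

request_xml = build_request("Kochi", 100000)
print(request_xml)
print(parse_response(SAMPLE_RESPONSE))
```

Because both messages are ordinary XML, any XML parser on the receiving side can read them, which is what makes the result easy to integrate into a third-party application.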
Advantages
* SOAP is versatile enough to allow for the use of different transport protocols. The
standard stacks use HTTP as a transport protocol, but other protocols are also usable (e.g.,
JMS, SMTP).
* Since the SOAP model fits well into the HTTP request/response model, it can pass easily
through existing firewalls and proxies, without modifications to the SOAP protocol, and can
use the existing infrastructure.
Disadvantages
* Because of the verbose XML format, SOAP can be considerably slower than competing
middleware technologies such as CORBA. This may not be an issue when only small
messages are sent. To improve performance for the special case of XML with embedded
binary objects, the Message Transmission Optimization Mechanism (MTOM) was introduced.
* When relying on HTTP as a transport protocol and not using WS-Addressing or an ESB,
the roles of the interacting parties are fixed. Only one party (the client) can use the services
of the other. Developers must use polling instead of notification in these common cases.
13. CONCLUSION
XML is rapidly becoming the de facto standard for transferring data over the Internet.
XML databases, because of their ability to store heterogeneous data, are used in a wide
variety of fields such as e-publishing, digital libraries and the finance industry.
The XML standard continues to evolve, and new XML-based standards are being created for
almost every field. Its open standard and simplicity have made it one of the best tools for
passing information. We live in an era of information technology, and XML databases are
among the best vehicles we can ride on.
14. REFERENCES
Books:
Websites:
http://en.wikipedia.org/wiki/XML_database
http://www.cfoster.net/articles/xmldb-business-case/
http://www.25hoursaday.com/StoringAndQueryingXML.html
http://www.stylusstudio.com/db_to_xml_mapper.html
http://www.wisegeek.com/what-is-an-xml-database.htm
http://en.wikipedia.org/wiki/SOAP