WT - Unit II
What is XML?
● XML stands for eXtensible Markup Language
● XML is a markup language much like HTML
● XML was designed to store and transport data
● XML was designed to be self-descriptive
● XML is a W3C Recommendation
The tags in an XML document such as note.xml (like <to> and <from>) are not defined in any
XML standard. These tags are "invented" by the author of the XML document.
HTML works with predefined tags like <p>, <h1>, <table>, etc.
With XML, the author must define both the tags and the document structure.
XML is Extensible
Most XML applications will work as expected even if new data is added (or
removed).
Imagine an application designed to display the original version of note.xml
(<to> <from> <heading> <body>).
Then imagine a newer version of note.xml with added <date> and <hour>
elements, and a removed <heading>.
Because of the way XML is constructed, an older version of the application can still work:
<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Tove</to>
<from>Jani</from>
</note>
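The tolerance described above can be illustrated with a small Java sketch. It assumes an application that only reads <to> and <from>; the class name NoteReaderSketch and the helper firstText are hypothetical and not part of the original notes. Because elements are looked up by name, the added <date> and <hour> elements are simply ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NoteReaderSketch {
    // Newer version of note.xml from the text: <date> and <hour> added, <heading> removed.
    static final String NEW_NOTE =
        "<note><date>2015-09-01</date><hour>08:30</hour>"
        + "<to>Tove</to><from>Jani</from></note>";

    // Returns the text of the first element with the given name, or null if it is absent.
    static String firstText(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The "old" application only asks for the elements it knows about.
        System.out.println("to: " + firstText(NEW_NOTE, "to"));
        System.out.println("from: " + firstText(NEW_NOTE, "from"));
        System.out.println("heading: " + firstText(NEW_NOTE, "heading"));
    }
}
```

In this sketch the removed <heading> comes back as null, so an application that still needs the heading has to handle that case itself.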
XML stores data in plain text format. This provides a software- and
hardware-independent way of storing, transporting, and sharing data.
XML also makes it easier to expand or upgrade to new operating systems, new
applications, or new browsers, without losing data.
With XML, data can be available to all kinds of "reading machines" like people,
computers, voice machines, news feeds, etc.
XML is a W3C Recommendation
XML became a W3C Recommendation in February 1998.
XML - Tags
XML tags form the foundation of XML. They define the scope of an element in XML.
They can also be used to insert comments, declare settings required by the parsing
environment, and insert special instructions.
Start Tag
The beginning of every non-empty XML element is marked by a start-tag. Following is
an example of start-tag −
<address>
End Tag
Every element that has a start-tag must end with an end-tag. The following is an example
of an end-tag:
</address>
Empty Tag
The text that appears between start-tag and end-tag is called content. An element
which has no content is termed as empty. An empty element can be represented in two
ways as follows −
A start-tag immediately followed by an end-tag as shown below −
<hr></hr>
A complete empty-element tag is as shown below −
<hr />
Rule 1
XML tags are case-sensitive. The following line is an example of wrong syntax,
<address></Address>, because the case difference between the two tags is treated as
erroneous syntax in XML.
The following line shows a correct way, where we use the same case to name the start
and the end tag:
<address></address>
Rule 2
XML tags must be closed in an appropriate order, i.e., an XML tag opened inside
another element must be closed before the outer element is closed. For example −
<outer_element>
<internal_element>
</internal_element>
</outer_element>
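Both rules can be checked mechanically, because any conforming XML parser rejects a document that breaks them. The following is a minimal Java sketch using the JDK's built-in DOM parser; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness error was reported
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Rule 1: start-tag and end-tag must use the same case.
        System.out.println(isWellFormed("<address></Address>"));
        System.out.println(isWellFormed("<address></address>"));
        // Rule 2: inner elements must be closed before the outer element.
        System.out.println(isWellFormed(
            "<outer_element><internal_element></outer_element></internal_element>"));
        System.out.println(isWellFormed(
            "<outer_element><internal_element></internal_element></outer_element>"));
    }
}
```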
XML - Attributes
Attributes are part of XML elements. An element can have multiple unique attributes.
Attributes give more information about XML elements. To be more precise, they define
properties of elements. An XML attribute is always a name-value pair.
Syntax
An XML attribute has the following syntax:
<element-name attribute1 attribute2>
....content..
</element-name>
where attribute1 and attribute2 have the following form:
name = "value"
The value has to be in double (" ") or single (' ') quotes. Here, attribute1 and attribute2 are
unique attribute labels.
Attributes are used to add a unique label to an element, place the label in a category,
add a Boolean flag, or otherwise associate it with some string of data. Following
example demonstrates the use of attributes −
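<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>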
Attributes are used to distinguish among elements of the same name, when you do not
want to create a new element for every situation. Hence, the use of an attribute can
add a little more detail in differentiating two or more similar elements.
In the above example, we have categorized the plants by including attribute category
and assigning different values to each of the elements. Hence, we have two categories
of plants, one flowers and other shrubs. Thus, we have two plant elements with
different attributes.
You can also observe that we have declared this attribute at the beginning of XML.
Attribute Types
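The attribute types (CDATA, NMTOKEN, NMTOKENS, ID, IDREF, IDREFS, enumerated, ENTITY and
ENTITIES) are described under Attributes in the DTD section.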
XML Values
An XML value represents well-formed XML in the form of an XML document, XML
content, or an XML sequence.
An XML value that is stored in a table as a value of a column defined with the XML data
type must be a well-formed XML document. XML values are processed in an internal
representation that is not comparable to any string value including another XML value.
The only predicate that can be applied to the XML data type is the IS NULL predicate.
An XML value can be transformed into a serialized string value representing an XML
document using the XMLSERIALIZE function. Similarly, a string value that represents
an XML document can be transformed into an XML value using the XMLPARSE
function. An XML value can be implicitly parsed or serialized when exchanged with
application string and binary data types.
The XML data type has no defined maximum length. It does have an effective maximum
length when treated as a serialized string value that represents XML which is the same
as the limit for LOB data values. Like LOBs, there are also XML locators and XML file
reference variables.
Restrictions when using XML values: With a few exceptions, you can use XML
values in the same contexts in which you can use other data types. XML values are
valid in:
XML values cannot be used directly in the following places. Where expressions are
allowed, an XML value can be used, for example, as the argument of XMLSERIALIZE:
No host languages have a built-in data type for the XML data type.
For information on the XML data model and XML values, see SQL XML programming.
XML Document
An XML document is a document designed following XML and one of the XML markup language standards.
To perform the first 2 operations we can use a DTD or an XML Schema. DTDs are part of the XML specification; XML Schema is a separate W3C Recommendation.
To develop an XML application we can use XML parsers,
for which standard APIs exist (for example the W3C DOM specification).
An XML application is an application that uses XML documents; it can be developed in any
programming language such as Java, JavaScript, C, C++ or C#.
UNIT-3
Lecture-20
A DTD (Document Type Definition) is used to declare the elements and give their type definitions; an XML document can then be designed
based on the type definitions given by the DTD. A DTD can declare:
I. Elements
II. Attributes
III. Entities
IV. Notations
i)Element
Definition:
Types of Elements:
i)child only
ii)Text only
iii)Empty
iv)Mixed
v)ANY(is a special type)
i)Child only:
This type of element contains one or more child elements as its content.
Syntax:
<!ELEMENT element_name (list of child element names)>
Example:
<account>
<name> </name>
<bal> </bal>
</account>
<!ELEMENT account (name,bal)>
Example2:
<bank>
<account> </account>
<account> </account>
</bank>
<!ELEMENT bank (account*)>
Occurrence specifiers:
* indicates 0 or more
+ indicates 1 or more
? indicates 0 or 1
Example:
<emps>
<emp>
<name> </name>
<sal> </sal>
</emp>
<emp>
<name> </name>
<wages> </wages>
</emp>
</emps>
<!ELEMENT emps (emp+)>
<!ELEMENT emp (name,(sal|wages))>
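The effect of the occurrence specifiers and of the choice (sal|wages) can be tried out with a validating parser. The sketch below embeds the two declarations above in an internal DTD subset; the #PCDATA declarations for name, sal and wages are assumptions added so that the DTD is complete, and the class name EmpsDtdValidation is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class EmpsDtdValidation {
    // Internal DTD subset; the three #PCDATA declarations are assumed, not from the notes.
    static final String DTD =
        "<!DOCTYPE emps [\n"
        + "<!ELEMENT emps (emp+)>\n"
        + "<!ELEMENT emp (name,(sal|wages))>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT sal (#PCDATA)>\n"
        + "<!ELEMENT wages (#PCDATA)>\n"
        + "]>\n";

    // Returns true if the document body is valid against the DTD above.
    static boolean isValid(String body) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are ignored by default, so rethrow them
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<emps><emp><name>a</name><sal>1</sal></emp>"
            + "<emp><name>b</name><wages>2</wages></emp></emps>"));
        System.out.println(isValid("<emps></emps>")); // emp+ needs at least one emp
        System.out.println(isValid(
            "<emps><emp><name>a</name><sal>1</sal><wages>2</wages></emp></emps>"));
    }
}
```

The second and third documents are well-formed but not valid: the second has no emp at all, and the third gives both sal and wages where the DTD allows only one of them.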
ii)Text only:
This type of element can take only text as content. Values such as char, string, int, float, double and boolean are all
considered text and are referred to with the type PCDATA.
PCDATA: Parsed Character DATA
Syntax:
<!ELEMENT element_name (#PCDATA)>
Example:
<name>cmrcet</name>
<!ELEMENT name (#PCDATA)>
<sal>1000</sal>
<!ELEMENT sal (#PCDATA)>
PCDATA allows all the characters of the document's encoding except markup characters such as < and &, which must be escaped.
iii)Empty:
This type of element does not take any content.
Syntax:
<!ELEMENT element_name EMPTY>
Example:
<br></br>
or
<br/>
<!ELEMENT br EMPTY>
iv)Mixed:
This type of element can contain text, child elements, or both, and it can even be
empty.
Syntax:
<!ELEMENT element_name (#PCDATA|list of child elements with | as a separator)*>
Example:
<p>Welcome,<b>to CMRCET</b> and <i>B.Tech(CSE)</i><br/>Hello
</p>
<!ELEMENT p (#PCDATA|b|i|br)*>
v)ANY
This type of element can take any type of content, i.e. text, any element declared in
the document, or nothing at all.
Syntax:
<!ELEMENT element_name ANY>
Example:
<!ELEMENT MyElement ANY>
The above declaration states that the element MyElement can hold text and any element declared
in the document, and that it can also be empty.
2. Attributes:
Attributes are used to give extra meaning to the content described by an element.
An attribute resides in the opening tag of the element.
One element can be declared with any number of attributes, where the element name and each of
these attributes are separated by a space character.
Each attribute consists of a name and a value separated by the '='
character, and the value must be in quotes (single ' or double ").
An attribute name cannot contain a space character.
Example:
<emp empno="e101">
Syntax to declare an attribute:
<!ATTLIST element_name attribute_name type specifier [default value]>
Types:
1. CDATA (character data):
This type allows all characters, including digits and the space character.
2. NMTOKEN:
It is like CDATA but accepts only name characters (letters, digits, '.', '-', '_' and ':'), so it cannot contain a space character.
3. NMTOKENS:
It accepts one or more tokens (where one token is a sequence of characters without a space
character); a space is taken as the separator between tokens.
4. ID: The value of an ID type attribute must be unique within the document.
It must not start with a digit, but it may contain digits.
5. IDREF: It takes the value of one ID type attribute.
6. IDREFS: It can take one or more ID type attribute values, where a space is the separator.
7. Enumerated: While declaring the attribute we specify the list of allowed values, and the attribute may use
any one of the specified values.
8. ENTITY: It takes one entity name, where this entity must be an unparsed entity.
9. ENTITIES: It takes one or more entity names, where a space is the separator.
Example:
<!ATTLIST emp working (yes|no) 'yes'>
Specifiers:
#REQUIRED --------------- Mandatory
#IMPLIED --------------- Optional
#FIXED --------------- Optional, but if the attribute is used it must be given the value
specified while declaring the attribute (i.e. its value is fixed), similar to final in Java
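How a default value and #REQUIRED behave can be observed through a validating parser. The sketch below is only an illustration: it assumes an emp element with text content, an empno attribute of type ID marked #REQUIRED, and a working attribute with the default 'yes'. The class name AttributeDefaultSketch and the method attributeOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class AttributeDefaultSketch {
    // Assumed declarations: empno is mandatory, working defaults to 'yes'.
    static final String DTD =
        "<!DOCTYPE emp [\n"
        + "<!ELEMENT emp (#PCDATA)>\n"
        + "<!ATTLIST emp empno ID #REQUIRED>\n"
        + "<!ATTLIST emp working (yes|no) 'yes'>\n"
        + "]>\n";

    // Returns the attribute value seen by the application, or null if the document is invalid.
    static String attributeOf(String empElement, String attribute) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat validity errors as failures
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(DTD + empElement)));
            return doc.getDocumentElement().getAttribute(attribute);
        } catch (SAXException e) {
            return null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // working is not written in the document, so the declared default is supplied.
        System.out.println(attributeOf("<emp empno=\"e101\">a</emp>", "working"));
        // An explicit value overrides the default.
        System.out.println(attributeOf("<emp empno=\"e101\" working=\"no\">a</emp>", "working"));
        // Leaving out the #REQUIRED attribute makes the document invalid.
        System.out.println(attributeOf("<emp>a</emp>", "working"));
    }
}
```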
3)Entity:
An entity is a reference to some content, i.e. it is used to represent reusable content. Some content may need
to be used many times within XML documents, and in some cases content is repeated in the DTD document as well.
Based on where they are used, entities are classified into 2 types:
1. General Entities
2. Parameter Entities
Entities are also classified as:
Parsed Entities
Unparsed Entities
General Entities:
These are declared in the DTD and used in XML documents.
Internal Entity:
In this case the content that replaces the entity wherever it is referred to is placed directly in the
declaration of the entity, i.e. in the DTD document itself.
Syntax:
<!ENTITY entity_name "replacement content">
Example:
<!ENTITY copyrights "copyrights Myshop 2013-2014">
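A parser replaces a reference such as &copyrights; with the declared content before the application sees the text. The following Java sketch shows this with the declaration above placed in an internal DTD subset; the surrounding <shop> and <copy-rights> elements and the class name EntityExpansionSketch are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityExpansionSketch {
    // The entity is declared once in the DTD and referenced in the document body.
    static final String XML =
        "<!DOCTYPE shop [\n"
        + "<!ENTITY copyrights \"copyrights Myshop 2013-2014\">\n"
        + "]>\n"
        + "<shop><copy-rights>&copyrights;</copy-rights></shop>";

    // Returns the text of <copy-rights> as the application sees it after parsing.
    static String copyrightText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("copy-rights").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyrightText()); // prints "copyrights Myshop 2013-2014"
    }
}
```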
External Entity:
Here the content to be substituted is placed in a separate file, and in the declaration of the
entity, instead of specifying the content, we provide the filename with its path.
Syntax:
<!ENTITY entity_name SYSTEM "filename with path">
Example:
<!ENTITY mylogo SYSTEM "shoplogo.gif">
Parameter Entity:
These entities are declared and used in the DTD itself.
Internal entity:
Syntax:
External Entities:
Syntax:
<!ENTITY % entity_name SYSTEM "filename">
To use:
%entity_name;
Example:
Unparsed Entities:
To refer to content that is in a different (non-XML) format, we use unparsed entities.
Syntax:
Notations:
These are used to refer to content that provides additional description, such as the
MIME/content type.
Syntax:
Example:
Example:
XML Document Structure
UNIT-3
Lecture-21
XML Schema:
An XML Schema is used to declare the elements of a markup language and its grammar rules, i.e. it is an alternative to DTD.
An XML Schema describes the structure of an XML document. The XML Schema language is also referred
to as XML Schema Definition (XSD).
XML Schema compared with DTD:
DTD uses a small language of its own to define the rules, whereas an XML Schema is itself an XML document. XML
Schema documents are more descriptive than DTDs.
Both DTD and XML Schema let us declare complex types, but with DTD the type
name and the element name must be the same, which is not required in XML Schema.
With DTD we cannot specify an exact number of occurrences for an element, i.e. MIN and
MAX occurrence (we can only give MIN as 0 or 1 and MAX as 1 or more), whereas with
XML Schema we can specify the required MIN and MAX occurrences.
DTD does not support the common data types (i.e. it treats numbers and everything else as text, #PCDATA),
whereas with XML Schema we can specify types such as string, integer, double, float and boolean.
XML Schema supports namespaces. Since an XML Schema document is also an XML document, it can
be generated or written using any tool that supports XML.
We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs.
One of the greatest strengths of XML Schema is its support for data types.
When data is sent from a sender to a receiver, it is essential that both parties have the same “expectations”
about the content.
With XML Schemas, the sender can describe the data in a way that the receiver will understand.
Well-Formed is not enough
A well-formed XML document is a document that conforms to the XML syntax rules.
Even if documents are well-formed they can still contain errors, and those errors can have serious
consequences. Think of this situation: you order 5 gross of laser printers, instead of 5 laser printers. With
XML Schema most of these errors can be caught by your validating software.
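The ordering mistake above can be caught by typing the quantity in a schema. The sketch below uses the JDK's javax.xml.validation API with a hypothetical order schema: the element names order, item and quantity, and the upper limit of 100, are assumptions invented for this illustration, as is the class name OrderSchemaCheck.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderSchemaCheck {
    // Hypothetical schema: quantity must be a positive integer not larger than 100.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" type=\"xs:string\"/>"
        + "<xs:element name=\"quantity\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:positiveInteger\">"
        + "<xs:maxInclusive value=\"100\"/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the document is valid against the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, validate() throws on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String order = "<order><item>laser printer</item><quantity>%s</quantity></order>";
        System.out.println(isValid(String.format(order, "5")));       // 5 printers
        System.out.println(isValid(String.format(order, "720")));     // 5 gross, above the limit
        System.out.println(isValid(String.format(order, "5 gross"))); // not a number at all
    }
}
```

All three documents are well-formed; only the schema distinguishes the plausible order from the two mistaken ones.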
“note.xml”
<?xml version="1.0"?>
<note>
<to>Srinandhan</to>
<from>shashank</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
A simple DTD
This is a simple DTD file called “Note.dtd” that defines the elements of the XML document
above (“note.xml”):
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Syntax:
<?xml version="1.0"?>
<xs:schema>
----
----
</xs:schema>
The <schema> Element may contain some attributes. A schema declaration often looks something like
this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
--
---
</xs:schema>
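A schema for “note.xml” (“note.xsd”):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com"
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>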
A reference to an XML Schema:
<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<to>
Srinandhan
</to>
<from>Shashank</from>
<heading>Reminder</heading>
<body>Test</body>
</note>
Namespace:
A namespace is most needed when elements from multiple markup languages are used in one document. In such a
case, if an element name is the same in both markup languages, a small prefix can identify an
element uniquely by stating which markup language the element belongs to.
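The idea can be sketched in Java with a namespace-aware DOM parser. The two vocabularies below, with the prefixes h and f and the example.com namespace URIs, are made up for this illustration, as is the class name NamespaceLookup; both vocabularies define an element called table.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookup {
    // Hypothetical namespace URIs for two markup languages that both define <table>.
    static final String HTML_NS = "http://example.com/html";
    static final String FURNITURE_NS = "http://example.com/furniture";

    static final String XML =
        "<root xmlns:h=\"" + HTML_NS + "\" xmlns:f=\"" + FURNITURE_NS + "\">"
        + "<h:table>cells</h:table><f:table>oak</f:table></root>";

    // Returns the text of the <table> element that belongs to the given namespace.
    static String tableText(String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookup methods
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagNameNS(namespaceUri, "table").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tableText(HTML_NS));      // prints "cells"
        System.out.println(tableText(FURNITURE_NS)); // prints "oak"
    }
}
```

The lookup uses the namespace URI rather than the prefix; the prefix is only a short stand-in for the URI inside the document.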
org.xml.sax:
Defines the basic SAX APIs
javax.xml.transform:
Defines the XSLT APIs that let you transform XML into other forms
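As a rough illustration of javax.xml.transform, the sketch below applies a small XSLT stylesheet to a note document and produces plain text. The stylesheet, the sample note and the class name NoteToText are assumptions written for this example; only the JDK's standard transformer API is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NoteToText {
    // Hypothetical stylesheet: writes "<from> -> <to>" as plain text.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/note\">"
        + "<xsl:value-of select=\"from\"/><xsl:text> -> </xsl:text><xsl:value-of select=\"to\"/>"
        + "</xsl:template></xsl:stylesheet>";

    // Transforms the given note document with the stylesheet above.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<note><to>Tove</to><from>Jani</from></note>"));
    }
}
```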
You can use the DocumentBuilder newDocument() method to create an empty
Document that implements the org.w3c.dom.Document interface. Alternatively, you can use one of the
builder's parse methods to create a Document from existing XML data. The result is a DOM tree.
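Both routes to a Document can be sketched as follows. The note content and the names DomBuilderSketch, buildNote and parseNote are hypothetical; newDocument() and parse() are the DocumentBuilder methods mentioned above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomBuilderSketch {
    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Route 1: start from an empty Document and build the tree in code.
    static Document buildNote() {
        Document doc = newBuilder().newDocument();
        Element note = doc.createElement("note");
        doc.appendChild(note);
        Element to = doc.createElement("to");
        to.setTextContent("Tove");
        note.appendChild(to);
        return doc;
    }

    // Route 2: let the builder create the tree from existing XML data.
    static Document parseNote(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document built = buildNote();
        Document parsed = parseNote("<note><to>Tove</to></note>");
        // Both documents expose the same DOM structure.
        System.out.println(built.getDocumentElement().getNodeName());
        System.out.println(parsed.getElementsByTagName("to").item(0).getTextContent());
    }
}
```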
Example:
Shop.dtd
<shop logo="mylogo">
<item item_no="i101" type="books">
<name>item1</name>
<price units="one" type="rs">400</price>
<available_qtys>20</available_qtys>
</item>
<selected_items item_no="i101">
<discount units="percentage">10</discount>
</selected_items>
<selected_items item_no="i102">
<gift item="i101"/>
</selected_items>
<copy-rights>&copyrights;</copy-rights>
</shop>
ReadShopXMLFile.java
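import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadShopXMLFile {
public static void main(String[] args) {
try {
// Assumption: the file name and the elements read below are not shown in the
// original listing. The file must declare its DTD so that &copyrights; resolves.
File xmlFile = new File("shop.xml");
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
// Walk over every <item> element of the shop
NodeList nList = doc.getElementsByTagName("item");
for (int i = 0; i < nList.getLength(); i++) {
Node nNode = nList.item(i);
System.out.println("----------------------------");
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Item no: " + eElement.getAttribute("item_no"));
System.out.println("Name: " + eElement.getElementsByTagName("name").item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}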
XML Parsers
An XML parser is a software library or package that provides interfaces for client applications to
work with an XML document. The XML Parser is designed to read the XML and create a way for
programs to use XML.
An XML parser checks that the document is well formed; a validating parser also checks it against a DTD or schema.
The two main types of XML parsers are:
1. DOM
2. SAX
DOM parser
Advantages
1) It supports both read and write operations and the API is very simple to use.
Disadvantages
1) It is memory inefficient (it consumes more memory because the whole XML document
needs to be loaded into memory).
SAX parser
Clients do not know which methods to call; they just override the methods of the API
and place their own code inside those methods.
Advantages
1) It is memory efficient, because the document is not loaded into memory as a whole.
Disadvantages
1) It is event-based, so its API is less intuitive.
2) Clients never know the full information because the data is broken into pieces.
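The event-based style can be seen in a short Java sketch. The client overrides startElement of DefaultHandler and the parser calls it once per start-tag; nothing is kept in memory except what the handler chooses to store. The class name SaxElementNames and the sample note are assumptions for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxElementNames {
    // Collects element names in the order in which the parser reports their start-tags.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                names.add(qName); // called by the parser, not by the client
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<note><to>Tove</to><from>Jani</from></note>"));
    }
}
```

Each callback sees only one piece of the document, which is why the handler has to keep its own state (here, the list of names) if it needs context.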