WT Unit - 2
XML
INTRODUCTION TO XML
What is XML:
XML stands for eXtensible Markup Language. Its development at the W3C began in
1996, and XML 1.0 was officially adopted as a W3C Recommendation in 1998.
* XML tags are not predefined. You must define your own tags.
XML was designed to carry data, not to display data. XML is designed to be
self-descriptive. XML is a subset of SGML in which you define your own tags. It is
a meta-language, and its tags describe the content. XML works with CSS, XSL and the DOM.
HTML vs XML
HTML: HTML stands for HyperText Markup Language.
XML:  XML stands for eXtensible Markup Language.

HTML: HTML tags are predefined tags.
XML:  XML tags are user-defined tags.

HTML: Some of the tools used for HTML are Visual Studio Code, Notepad++, Notepad and many more.
XML:  Some of the tools used for XML are Oxygen XML, XML Notepad, Liquid Studio and many more.
HTML: EXAMPLE
<html>
<body>
<p>student1</p>
</body>
</html>
XML: EXAMPLE
<college>
<class1>
<student>sukanya</student>
</class1>
</college>
Advantages of XML:-
1. Easy to read and write: XML documents are human-readable and can be easily
written and edited.
2. Extensible: XML allows you to define your own tags and attributes, making it
highly extensible.
3. Open standard: XML is an open standard, maintained by the World Wide Web
Consortium (W3C).
4. Wide adoption: XML is widely adopted and supported by most programming
languages, tools, and technologies.
5. Data exchange: XML is ideal for exchanging data between different systems,
applications, and organizations.
6. Searching and querying: XML documents can be easily searched and queried
using technologies like XPath and XQuery.
7. Transformability: XML documents can be transformed into other formats, such
as HTML, CSV, or JSON, using XSLT.
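The searching and querying point can be illustrated with a short Python sketch. It uses the standard-library ElementTree module, which supports only a limited subset of XPath (not full XPath or XQuery). The second student name and the id attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the college/student example in these notes.
# The second student and the "id" attributes are invented for this sketch.
XML_TEXT = """
<college>
  <class1>
    <student id="1">sukanya</student>
    <student id="2">ravi</student>
  </class1>
</college>
"""

root = ET.fromstring(XML_TEXT)

# ".//student" selects every <student> element at any depth below the root.
names = [student.text for student in root.findall(".//student")]

# "[@id='2']" is an attribute predicate: keep only the student whose id is 2.
second = root.find(".//student[@id='2']")

print(names)
print(second.text)
```

A full XPath or XQuery engine offers far more than this subset, but the idea is the same: a path expression selects nodes from the document tree.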
Goals of XML:-
The user must be able to define and use their own tags.
XML allows the user to build their own tag library, based on their web requirements.
XML allows the user to define the formatting rules for the user-defined tags.
XML must support storage or transport of data.
It should be easy to write programs that process XML documents.
An XML file is structured by several XML elements, also called XML nodes or
XML tags. XML element names are enclosed in angle brackets <>, as shown
below:
<element/>
Element Syntax: Each XML element needs to be closed, either with a start tag and a
matching end tag as shown below, or, for an empty element, with the self-closing form above:
<element>....</element>
Nesting of elements:
<?xml version="1.0"?>
<contact_info>
<company>TCS</company>
</contact_info>
Let us learn about one of the most important parts of XML. XML tags form the foundation of
XML. They define the scope of an element in the XML. They can also be used to insert comments,
declare settings required for parsing the environment, and insert special instructions.
Start tag:
The beginning of every non-empty XML element is marked by a start tag. An example of a
start tag is
<address>
End tag:
Every element that has a start tag should end with an end tag. An example of an end tag
is
</address>
* Note that the end tag includes a solidus ("/") before the name of the element.
Empty tag:
The text that appears between the start tag and the end tag is called content. An element which
has no content is termed empty. An empty element can be represented in two ways, as below:
(1) A start tag immediately followed by an end tag, as shown below:
<hr></hr>
(2) A complete empty-element tag, as shown below:
<hr/>
Empty-element tags may be used for any element which has no content.
Following are the rules that need to be followed to use XML tags:
Rule 1:
XML tags are case-sensitive. The following line of code is an example of wrong syntax:
<address>This is wrong syntax</Address>
Because of the case difference between the two tags, this is treated as erroneous syntax in XML.
The following code shows a correct way, where we use the same case to name the start and the
end tag:
<address>This is correct syntax</address>
Rule 2:
XML tags must be closed in an appropriate order, i.e., an XML tag opened inside another
element must be closed before the outer element is closed.
<outer_element>
<internal_element>
This tag is closed before the outer element
</internal_element>
</outer_element>
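The two rules above can be checked mechanically with any XML parser. The following is a minimal Python sketch using the standard-library ElementTree parser; the helper name is_well_formed is hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Rule 1: start and end tag names must match, including case.
print(is_well_formed("<address>text</address>"))
print(is_well_formed("<address>text</Address>"))

# Rule 2: an inner element must be closed before its outer element.
good = "<outer_element><internal_element>x</internal_element></outer_element>"
bad = "<outer_element><internal_element>x</outer_element></internal_element>"
print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects both the case mismatch and the wrongly ordered closing tags, because each of them breaks well-formedness.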
XML elements:
XML elements can be defined as the building blocks of an XML document. Elements can behave as
containers to hold text, elements, attributes, media objects or all of these.
* Each XML document contains one or more elements, the scope of which is either
delimited by start and end tags or, for empty elements, by an empty-element tag.
Syntax:
<element-name attribute1 attribute2>....content</element-name>
Where,
element-name is the name of the element. The name and its case in the start and end tags must
match.
attribute1, attribute2 are attributes of the element, separated by white space. An attribute
defines a property of the element. It associates a name with a value, which is a string of
characters. An attribute is written as:
name="value"
The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.
Empty element:-
An empty element (an element with no content) has the following syntax:
<name attribute1 attribute2 ... />
The following is an example of an XML document using various XML elements:
<?xml version="1.0"?>
<contact-info>
<address category="residence">
<name>xyz</name>
<company>tutorial point</company>
<phone>123456789</phone>
</address>
</contact-info>
An element name can contain any alphanumeric character. The only punctuation
marks allowed in names are the hyphen (-), underscore (_) and period (.).
Names are case-sensitive. For example, Address, address and ADDRESS are different
names.
An element which is a container can contain text or elements, as seen in the above
example.
Root element:
An XML document can have only one root element. For example, the following is not a correct
XML document, because both the x and y elements occur at the top level without a root
element:
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document:
<root>
<x>...</x>
<y>...</y>
</root>
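The single-root rule can also be observed with a parser. This is a small Python sketch using the standard-library xml.dom.minidom module; the helper name root_name is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def root_name(xml_text):
    """Return the root element's name, or None if the text is not well-formed."""
    try:
        return minidom.parseString(xml_text).documentElement.tagName
    except ExpatError:
        return None


# Two top-level elements and no common root: the parser rejects the text.
print(root_name("<x>...</x><y>...</y>"))

# The same elements wrapped in a single root element are accepted.
print(root_name("<root><x>...</x><y>...</y></root>"))
```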
Seen from a DTD point of view, all XML documents are made up of the
following building blocks:
a) Elements
b) Attributes
c) Entities
d) PCDATA
e) CDATA
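A) Elements
Declaring elements
In a DTD, XML elements are declared with the following syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
Elements with one or more children are declared with the names of the child elements
inside parentheses:
<!ELEMENT element-name (child)>
or
<!ELEMENT element-name (child1,child2)>
Example:-
<!ELEMENT name (#PCDATA)>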
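Attributes:-
In a DTD, attributes are declared with an ATTLIST declaration:
<!ATTLIST element-name attribute-name attribute-type attribute-value>
#IMPLIED:- the attribute is optional.
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
#FIXED:- the attribute value is fixed and cannot be changed.
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">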
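Example
//abc.dtd
<!ELEMENT student (branch)>
<!ELEMENT branch (rno)>
<!ATTLIST branch dept CDATA #IMPLIED>
<!ELEMENT rno (#PCDATA)>
//abc.xml
<?xml version="1.0"?>
<!DOCTYPE student SYSTEM "abc.dtd">
<student>
<branch dept="mca">
<rno>101</rno>
</branch>
</student>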
Internal DTD: you can write the rules inside the XML document using a <!DOCTYPE ...>
declaration. The scope of this DTD is limited to this document. The advantage is that the
document can be validated by itself, without an external reference.
External DTD: you can write the rules in a separate file (with a .dtd extension) and later
link this file to an XML document. This way, several XML
documents can refer to the same DTD rules.
Internal DTD:-
You can declare an internal DTD inside your XML file. Use a <!DOCTYPE ...>
declaration at the top of the XML file to declare the DTD:
<!DOCTYPE root_element [
.........................
.......
]>
The following internal DTD example defines the root element <student> and the other,
second-level elements, along with an attribute.
DTD rules must be placed at the top of the document, before the root element.
<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (branch)>
<!ELEMENT branch (rno)>
<!ATTLIST branch dept CDATA #IMPLIED>
<!ELEMENT rno (#PCDATA)>
]>
<student>
<branch dept="mca">
<rno>101</rno>
</branch>
</student>
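As a rough illustration of how a parser sees an internal DTD, the Python sketch below parses a student document with xml.dom.minidom. The ELEMENT and ATTLIST declarations are assumptions written to match the student/branch/rno example. Note that the standard-library parser is non-validating: it reads the DOCTYPE but does not enforce the rules, so real validation needs a validating parser.

```python
from xml.dom import minidom

# Student document with an internal DTD. The declarations are assumptions
# chosen to match the student/branch/rno example in these notes.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (branch)>
<!ELEMENT branch (rno)>
<!ATTLIST branch dept CDATA #IMPLIED>
<!ELEMENT rno (#PCDATA)>
]>
<student>
<branch dept="mca">
<rno>101</rno>
</branch>
</student>
"""

doc = minidom.parseString(DOCUMENT)

# The DOCTYPE declaration is exposed as a DocumentType node.
doctype = doc.doctype
print(doctype.name)                    # root element named in <!DOCTYPE ...>
print(doc.documentElement.tagName)     # actual root element of the document

# Attribute declared in the ATTLIST above.
branch = doc.getElementsByTagName("branch")[0]
print(branch.getAttribute("dept"))
```

The name in the DOCTYPE and the actual root element agree here, which is what a validating parser would require.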
External DTD:-
External DTDs are shared between multiple XML documents. Any change or update
in the DTD document affects all the XML documents that refer to it.
Private DTD:-
A private DTD is identified by the SYSTEM keyword and is accessed by a single user or a group of users.
You specify the rules in an external DTD file with a .dtd extension. Later, in the XML file,
a <!DOCTYPE...> declaration links the DTD file.
Syntax:
<!DOCTYPE root_element SYSTEM "dtd-file-location">
Example:-
<!DOCTYPE student SYSTEM "abc.dtd">
Public DTD:-
A public DTD is identified by the PUBLIC keyword and can be accessed by any user. XML editors
are expected to know the DTD.
Syntax:-
<!DOCTYPE root_element PUBLIC "dtd-name" "dtd-file-location">
Example:-
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XML SCHEMA:-
XML Schema is commonly known as XML Schema Definition (XSD). It
is used to describe and validate the structure and the content of XML data. An XML
schema defines the elements, attributes and data types, and the schema element supports
namespaces. It is similar to a database schema, which describes the data in a
database.
Syntax:-
A schema is declared with the xs:schema element as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
Example:-
The following example shows how to use a schema:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="contact">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="company" type="xs:string"/>
<xs:element name="phone" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The basic idea behind XML schemas is that they describe the legitimate format that an
XML document can take.
Elements
As we saw in the XML elements chapter, elements are the building blocks of an XML
document. An element can be defined within an XSD as follows:
<xs:element name="x" type="y"/>
Definition types
You can define XML schema elements in the following ways
Simple type
Complex type
Global Type
Simple type:-
A simple type element is used only in the context of text. Some of the predefined
simple types are xs:integer, xs:boolean, xs:string and xs:date.
For example:-
<xs:element name="phone_number" type="xs:int"/>
Complex type:-
A complex type is a container for other element definitions. This allows you to specify
which child elements an element can contain and to provide some structure within your XML
documents.
<xs:element name="Address">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="company" type="xs:string"/>
<xs:element name="phone" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
In the above example, the Address element consists of child elements. It is a container
for other <xs:element> definitions, which allows you to build a simple hierarchy of elements in
the XML document.
Global type:-
With the global type, you can define a single type in your document, which can be
used by all other references. For example, suppose you want to generalize the person and
company for different addresses of the company. In such a case, you can define a
general type as follows:
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="company" type="xs:string"/>
</xs:sequence>
</xs:complexType>
This type can now be used as follows:
<xs:element name="Address1">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType"/>
<xs:element name="phone1" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Address2">
<xs:complexType>
<xs:sequence>
<xs:element name="address" type="AddressType"/>
<xs:element name="phone2" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Instead of having to define the name and the company twice (once for Address1
and once for Address2), we now have a single definition. This makes maintenance
simpler, i.e., if you decide to add a "postcode" element to the address, you need to add
it at just one place.
Attributes:-
Attributes in XSD provide extra information within an element. Attributes have name
and type properties, as shown below:
<xs:attribute name="x" type="y"/>
The HTML DOM defines a standard way for accessing and manipulating HTML
documents. It presents an HTML document as a tree structure.
The XML DOM is a standard for how to get, change, add or delete XML elements.
XML DOM properties:
1) nodeName
2) nodeValue
3) parentNode
4) childNodes
5) attributes
Methods:-
1) getElementsByTagName(name)
2) appendChild(node)
3) removeChild(node)
<student>
<branch Br="CSE">
<title>xyz</title>
</branch>
<branch Br="ECE">
</branch>
</student>
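To insert a new element, first create it with createElement and then use appendChild, which
inserts a child node into a specified element:
x = xmlDoc.createElement("marks");
xmlDoc.getElementsByTagName("branch")[0].appendChild(x);
Example:-
To change the text of the second faculty element in the program below:
xmlDoc.getElementsByTagName("faculty")[1].childNodes[0].nodeValue = "240";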
program:-
<college>
<dept Branch='CSE'>
<faculty> 30 </faculty>
</dept>
<dept Branch='ECE'>
<faculty> 30 </faculty>
</dept>
</college>
Create an element:-
newElement = xmlDoc.createElement("lab");
Create an attribute:-
// setAttribute creates the attribute if it is absent, otherwise it overwrites the value
xmlDoc.getElementsByTagName("dept")[1].setAttribute("Branch", "MECH");
To remove child:-
dept = xmlDoc.getElementsByTagName("dept")[0];
dept.removeChild(dept.getElementsByTagName("faculty")[0]);
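The same DOM operations can be tried end to end in Python, whose xml.dom.minidom module follows the W3C DOM method names. The sketch below assumes the college document from the program above, written without whitespace between tags so that no extra whitespace text nodes appear.

```python
from xml.dom import minidom

# The college document, written without whitespace between tags so that
# childNodes contains only the nodes we care about.
doc = minidom.parseString(
    "<college>"
    "<dept Branch='CSE'><faculty>30</faculty></dept>"
    "<dept Branch='ECE'><faculty>30</faculty></dept>"
    "</college>"
)

depts = doc.getElementsByTagName("dept")

# Change the text of the second <faculty> element (a text node holds it).
doc.getElementsByTagName("faculty")[1].childNodes[0].nodeValue = "240"

# Create a new <lab> element and append it to the first <dept>.
lab = doc.createElement("lab")
depts[0].appendChild(lab)

# Overwrite the Branch attribute on the second <dept>.
depts[1].setAttribute("Branch", "MECH")

# Remove the <faculty> child of the first <dept>.
first_faculty = depts[0].getElementsByTagName("faculty")[0]
depts[0].removeChild(first_faculty)

result = doc.documentElement.toxml()
print(result)
```

Note that removeChild is called on the parent node and takes the child to remove as its argument.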
XHTML
XHTML is HTML written as XML.
What is XHTML?
XHTML stands for Extensible HyperText Markup Language.
XHTML is almost identical to HTML.
XHTML is supported by all major browsers.
XHTML elements:-
XHTML elements must be properly nested.
XHTML attributes:-
Attribute names must be in lower case.
<!DOCTYPE....> is mandatory:-
An XHTML document must have an XHTML DOCTYPE declaration.
The <html>, <head>, <title> and <body> elements must also be present, and the
xmlns attribute in <html> must specify the XML namespace for the document.
XHTML elements must always be closed.
Empty elements must always be closed.
XHTML element names must be in lower case.
XHTML attribute names must be in lower case.
Attribute values must be quoted.
XML namespace:-
An XML namespace provides a method to avoid element name conflicts.
To avoid such conflicts, we use a name prefix.
Example:-
Instead of <table>, we write <f:table>. All children of the table should also start
and end with the same prefix 'f'.
<f:table>
<f:tr>
<f:td> </f:td>
<f:td> </f:td>
</f:tr>
</f:table>
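To see how prefixes keep two kinds of table apart, here is a minimal Python sketch using the standard-library ElementTree module. The namespace URIs, the second prefix h, and the cell contents are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both use <table>. The namespace URIs, the "h" prefix
# and the text content are placeholders made up for this sketch.
XML_TEXT = """
<root xmlns:h="http://example.com/html" xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

root = ET.fromstring(XML_TEXT)

# Map each prefix to its namespace URI so path expressions can use prefixes.
ns = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

html_tables = root.findall("h:table", ns)
furniture_tables = root.findall("f:table", ns)

print(len(html_tables), len(furniture_tables))

# Internally the parser stores the full namespace URI, not the prefix.
print(furniture_tables[0].tag)
```

Each prefix must be bound to a namespace URI with an xmlns:prefix attribute before it is used. The parser then identifies elements by URI plus local name, so the two table elements never clash.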
Note:-
The xmlns attribute in <html> specifies the XML namespace for a document. This attribute
is added to the first (root) element of your XML document.
Structure of XHTML:-
1. Doctype
2. Header
3. Body
//Create a student table which has two columns, roll number and name, with three
rows.
Student.xhtml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title>Student</title> </head>
<body>
<table>
<tr>
<th>Roll number</th> <th>Name</th>
</tr>
<tr>
<td>...</td> <td>...</td>
</tr>
<tr>
<td>502</td> <td>...</td>
</tr>
<tr>
<td>...</td> <td>...</td>
</tr>
</table>
</body>
</html>
* Parsing the XML document using DOM methods and properties is called the tree-based
approach, whereas using SAX (Simple API for XML) methods and properties is
called the event-based approach.
XML parsing
   Object based (e.g. DOM)
   Event based
      Push parsing (e.g. SAX)
      Pull parsing (e.g. StAX)
DOM vs SAX
1) DOM: Tree data structure.
   SAX: Event-based model.
2) DOM: Random access.
   SAX: Serial access.
3) DOM: High memory usage.
   SAX: Low memory usage.
4) DOM: Used to process the document multiple times (the document is loaded in memory).
   SAX: Used to process the document only once.
5) DOM: Used to edit the document.
   SAX: Used to process parts of the document.
6) DOM: Stores the entire XML document in memory before processing.
   SAX: Parses node by node.
7) DOM: Occupies more memory.
   SAX: Doesn't store the XML in memory.
8) DOM: We can insert or delete nodes.
   SAX: We can't insert or delete nodes.
9) DOM: Traverses in any direction.
   SAX: Traverses from top to bottom.
10) DOM: Document Object Model (DOM).
    SAX: SAX is the Simple API for XML.
11) DOM: Packages required to import:
         import javax.xml.parsers.*;
         import javax.xml.parsers.DocumentBuilder;
         import javax.xml.parsers.DocumentBuilderFactory;
    SAX: Packages required to import:
         import javax.xml.parsers.*;
         import org.xml.sax.*;
12) DOM: DOM is slower than SAX.
    SAX: SAX generally runs a little faster than DOM.
The Document Object Model defines a standard for accessing and
manipulating XML documents. The XML DOM is used for
loading the XML document,
accessing the XML document,
deleting the elements of the XML document,
changing the elements of the XML document.
According to the DOM, everything in an XML document is a node:
The entire document is a document node.
Every XML element is an element node.
The text in an XML element is a text node.
Every attribute is an attribute node.
Comments are comment nodes.
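The node kinds listed above can be made visible by walking a small document. This Python sketch uses xml.dom.minidom. The sample document (including its comment) and the helper name describe are made up for illustration.

```python
from xml.dom import minidom, Node

# Small sample document; the comment text is made up for this sketch.
doc = minidom.parseString(
    '<student><!-- first year --><branch Br="CSE"><title>xyz</title></branch></student>'
)

NODE_KINDS = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.ATTRIBUTE_NODE: "attribute",
}


def describe(node, found):
    """Walk the tree depth-first, recording (kind, name) for every node."""
    found.append((NODE_KINDS.get(node.nodeType, "other"), node.nodeName))
    # Attributes are not child nodes; they are reached through .attributes.
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            found.append((NODE_KINDS.get(attr.nodeType, "other"), attr.nodeName))
    for child in node.childNodes:
        describe(child, found)
    return found


nodes = describe(doc, [])
for kind, name in nodes:
    print(kind, name)
```

The walk starts at the document node, and attribute nodes have to be fetched separately because they are not part of childNodes.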
The DOM parser parses the entire XML document and loads it into memory. It then models it in a
"tree" structure for easy traversal or manipulation.
In short, it turns an XML file into a DOM or tree structure, and you have to traverse it
node by node to get what you want.
[Figure: XML data -> DocumentBuilderFactory object -> DocumentBuilder object -> Document (DOM) object]
In this approach, the Document Object Model implementation used to access the XML document
is defined in the following packages:
javax.xml.parsers.* (including DocumentBuilder and DocumentBuilderFactory)
The following methods and properties are necessary to process the XML document:
Property                      Meaning
nodeName                      Finds the name of the node
Method                        Meaning
getElementsByTagName(name)    Accesses elements by specifying their name
import java.io.*;
rootele.appendChild (studentele );
Student.xml:-
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<student-details>
<student>
</student>
</student-details>
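A document like Student.xml can be built node by node with DOM methods. The following is a minimal Python sketch using xml.dom.minidom (Python is used here for brevity; it is not the Java program these notes refer to). The name child element and its text are made up for illustration.

```python
from xml.dom import minidom

# Create an empty document whose root element is <student-details>.
impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "student-details", None)
root = doc.documentElement

# Build <student><name>...</name></student>. The <name> child and its
# text are made up for this sketch.
student = doc.createElement("student")
name = doc.createElement("name")
name.appendChild(doc.createTextNode("sukanya"))
student.appendChild(name)

# Attach the finished subtree to the root, as rootele.appendChild(studentele) does.
root.appendChild(student)

built = root.toxml()
print(built)
```

Elements are always created through the document object and only become part of the tree once appendChild attaches them to a parent.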
The SAX parser is different from the DOM parser: it doesn't load the complete XML
into memory. Instead, it parses the XML sequentially, triggering different events as and when
it encounters different items such as an opening tag, a closing tag, character data, comments
and so on. This is the reason why the SAX parser is called an event-based parser.
Along with the XML source file, we also register a handler which extends the DefaultHandler
class. The DefaultHandler class provides different callbacks, out of which we would be
interested in:
startElement():- Triggered when the start of a tag is encountered
endElement():- Triggered when the end of a tag is encountered
characters():- Triggered when some text data is encountered
[Figure: XML document -> SAX parser -> Content handler callbacks: startDocument(), startElement(), characters(), endElement(), endDocument()]
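The order in which these callbacks fire can be observed with a small handler. The sketch below uses Python's standard-library xml.sax module, whose ContentHandler has the same callback names. The class name EventLogger and the sample text are made up for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement " + name)

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append("characters " + content.strip())

    def endElement(self, name):
        self.events.append("endElement " + name)

    def endDocument(self):
        self.events.append("endDocument")


handler = EventLogger()
xml.sax.parseString(
    b"<student-details><student>sukanya</student></student-details>", handler
)

for event in handler.events:
    print(event)
```

The handler never sees the whole document at once. It only receives one event at a time, which is why SAX needs little memory but cannot move backwards or modify the document.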
Let's create a demo program that reads an XML file with the SAX parser to understand this fully.
Student.xml
<student-details>
<student>
</student>
<student>
</student>
</student-details>