Unit 1
XML stands for Extensible Markup Language. It is a text-based markup language derived from
Standard Generalized Markup Language (SGML). Markup is information added to a document
that identifies its parts and how they relate to
each other. More specifically, a markup language is a set of symbols that can be placed in the
text of a document to demarcate and label the parts of that document.
XML tags identify the data and are used to store and organize it, whereas HTML tags specify
how to display it. XML is not going to replace HTML in the near future, but it introduces new
possibilities by adopting many successful features of HTML.
XML is designed to carry data, not to display data. XML tags are not predefined. You must
define your own tags. XML is platform independent and language independent. The main
benefit of XML is that you can take data from a program such as Microsoft SQL Server, convert
it into XML, and then share that XML with other programs and platforms. This lets two
platforms communicate, which is otherwise generally very difficult.
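As a minimal sketch of this kind of exchange, the Python snippet below turns a record into XML text and reads it back, as a second program might. The record, the tag names (employee, name, city) and the helper names are hypothetical; only the standard-library xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

def record_to_xml(record):
    """Serialize a flat dict into an <employee> element (hypothetical tag names)."""
    root = ET.Element("employee")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text):
    """Read the XML text back into a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

record = {"name": "Asha", "city": "Pune"}  # made-up sample data
xml_text = record_to_xml(record)
restored = xml_to_record(xml_text)
```

Because the intermediate form is plain text, the receiving side needs only an XML parser, not the program that produced the data.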
XML was designed to carry data, with a focus on what the data is. HTML was designed to display
data, with a focus on how the data looks.
XML documents form a tree structure that starts at "the root" and branches to "the leaves".
With XML, data can be stored in separate XML files. This way you can focus on using HTML/CSS
for display and layout, and be sure that changes in the underlying data will not require any
changes to the HTML.
With a few lines of JavaScript code, you can read an external XML file and update the data
content of your web page.
XML data is stored in plain text format. This provides a software- and hardware-independent
way of storing data.
This makes it much easier to create data that can be shared by different applications.
One of the most time-consuming challenges is to exchange data between incompatible systems
over the Internet.
Exchanging data as XML greatly reduces this complexity, since the data can be read by different
incompatible applications.
Upgrading to new systems (hardware or software platforms) is always time-consuming. Large
amounts of data must be converted, and incompatible data is often lost.
XML data is stored in text format. This makes it easier to expand or upgrade to new operating
systems, new applications, or new browsers, without losing data.
Different applications can access your data, not only in HTML pages, but also from XML data
sources.
With XML, your data can be available to all kinds of "reading machines" (handheld computers,
voice machines, news feeds, etc.), which also makes it more accessible to blind people and
people with other disabilities.
There are three important characteristics of XML that make it useful in a variety of systems and
solutions:
XML is extensible:
XML allows you to create your own self-descriptive tags, or language, that suits your
application.
XML carries the data, does not present it:
XML allows you to store the data irrespective of how it will be presented.
XML is a public standard:
XML was developed by an organization called the World Wide Web Consortium (W3C)
and is available as an open standard.
XML Syntax
The following diagram depicts the syntax rules to write different types of markup and text in an
XML document.
XML Declaration
In the XML declaration, version is the XML version and encoding specifies the character
encoding used in the document.
The XML declaration is case sensitive and must begin with "<?xml", where "xml" is
written in lower case.
If the document contains an XML declaration, it must be the first statement of
the XML document.
The HTTP protocol can override the value of encoding that you put in the XML
declaration.
If the XML declaration is included, it must contain the version number attribute.
The parameter names and values are case-sensitive.
The names are always in lower case.
The order of the parameters is important. The correct order is: version, encoding
and standalone.
Either single or double quotes may be used.
The XML declaration has no closing tag, i.e. there is no </?xml>.
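Several of the rules above can be checked with a parser. The sketch below assumes Python's standard-library xml.etree.ElementTree (which uses the Expat parser); the helper name parses and the <note/> element are hypothetical and serve only as a test document.

```python
import xml.etree.ElementTree as ET

def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Accepted: declaration comes first, version precedes encoding
valid = b'<?xml version="1.0" encoding="UTF-8"?><note/>'
# Accepted: single quotes are allowed around parameter values
single_quotes = b"<?xml version='1.0' encoding='UTF-8'?><note/>"
# Rejected: a leading space means the declaration is no longer first
not_first = b' <?xml version="1.0" encoding="UTF-8"?><note/>'
# Rejected: encoding placed before version
wrong_order = b'<?xml encoding="UTF-8" version="1.0"?><note/>'
# Rejected: version is missing
no_version = b'<?xml encoding="UTF-8"?><note/>'

results = {
    "valid": parses(valid),
    "single_quotes": parses(single_quotes),
    "not_first": parses(not_first),
    "wrong_order": parses(wrong_order),
    "no_version": parses(no_version),
}
```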
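An XML file is structured as a set of XML elements, also called XML nodes or XML tags. The
names of XML elements are enclosed in angle brackets < > as shown below −
<element>
Each XML element must be closed, either with a matching end tag or, for an empty element,
with the self-closing form −
<element>....</element>
<element/>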
Nesting of Elements –
An XML element can contain multiple XML elements as its children, but the child elements
must not overlap, i.e. the end tag of an element must have the same name as the most
recent unmatched start tag.
Root Element –
An XML document can have only one root element. For example, the following is not a correct XML
document, because both the x and y elements occur at the top level without a root element −
<x>...</x>
<y>...</y>
The following is a correctly formed XML document, because x and y are enclosed in a single
root element −
<root>
<x>...</x>
<y>...</y>
</root>
Case Sensitivity – The names of XML elements are case-sensitive. That means the names in the
start and end tags must be in exactly the same case.
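The nesting, root element and case-sensitivity rules can be tried out with a small well-formedness check. This is a sketch using Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested_ok = "<a><b>text</b></a>"             # children properly nested
overlapping = "<a><b>text</a></b>"           # end tags overlap
two_roots = "<x>1</x><y>2</y>"               # no single root element
wrapped = "<root><x>1</x><y>2</y></root>"    # same content under one root
case_mismatch = "<Name>text</name>"          # start and end tags differ in case

checks = {
    "nested_ok": is_well_formed(nested_ok),
    "overlapping": is_well_formed(overlapping),
    "two_roots": is_well_formed(two_roots),
    "wrapped": is_well_formed(wrapped),
    "case_mismatch": is_well_formed(case_mismatch),
}
```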
XML Attributes
An attribute specifies a single property for the element, using a name/value pair. An XML
element can have one or more attributes. For example, an element may carry an href attribute,
where href is the attribute name and http://www.relianceinfo.com/ is the attribute value.
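A short sketch of reading an attribute, assuming Python's standard-library xml.etree.ElementTree. The element name a and the link text are hypothetical; only the href name/value pair comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element carrying the href attribute described above
link = ET.fromstring('<a href="http://www.relianceinfo.com/">Reliance Info</a>')

href = link.get("href")              # value of a single attribute
all_attributes = dict(link.attrib)   # every name/value pair on the element
```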
XML References
References usually allow you to add or include additional text or markup in an XML document.
References always begin with the symbol "&" which is a reserved character and end with the
symbol ";". XML has two types of references −
Entity References − An entity reference contains a name between the start and the end
delimiters. For example, &amp; where amp is the name. The name refers to a predefined
string of text and/or markup.
Character References − These contain a hash mark ("#") followed by a number, such as
&#65;. The number always refers to the Unicode code point of a character. In this case,
65 refers to the letter "A".
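Both kinds of reference are resolved by the parser before the application sees the text. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree and a hypothetical <text> element:

```python
import xml.etree.ElementTree as ET

# &amp; is an entity reference, &#65; is a character reference
element = ET.fromstring("<text>Tom &amp; Jerry, grade &#65;</text>")
resolved = element.text
```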
XML Text
Whitespace characters such as blanks, tabs and line breaks between XML elements and between
XML attributes are ignored.
Some characters are reserved by the XML syntax itself, so they cannot be used directly. To
use them, the following replacement entities are used −
Character    Replacement entity    Description
<            &lt;                  less than
>            &gt;                  greater than
&            &amp;                 ampersand
'            &apos;                apostrophe
"            &quot;                quotation mark
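In practice the replacement is usually done by a library rather than by hand. The sketch below assumes Python's standard-library xml.sax.saxutils.escape, which replaces &, < and > by default; the <rule> element and the sample string are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "price < 100 & stock > 0"   # contains reserved characters
escaped = escape(raw)             # &, < and > become entity references

# The escaped text can be embedded safely and parses back to the original
parsed_text = ET.fromstring(f"<rule>{escaped}</rule>").text

# Embedding the raw text directly is rejected by the parser
try:
    ET.fromstring(f"<rule>{raw}</rule>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```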
An XML document is always descriptive. The tree structure is often referred to as the XML tree.
The tree structure contains root (parent) elements, child elements and so on. By using the tree
structure, you can get to know all succeeding branches and sub-branches starting from the
root. Parsing starts at the root, then moves down the first branch to an element, takes the
first branch from there, and so on to the leaf nodes.
For example, consider a document with a root element named <company>. Inside it, there is one
more element, <Employee>. Inside the employee element, there are five branches named
<FirstName>, <LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address> element,
there are three sub-branches, named <City>, <State> and <Zip>.
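The structure just described can be written out and walked from the root to the leaves. The sketch below assumes Python's standard-library xml.etree.ElementTree; the element names follow the description above, while the text values and the helper name walk are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Element names follow the description above; all values are placeholders
COMPANY_XML = """
<company>
  <Employee>
    <FirstName>Asha</FirstName>
    <LastName>Rao</LastName>
    <ContactNo>0000000000</ContactNo>
    <Email>asha@example.com</Email>
    <Address>
      <City>Pune</City>
      <State>Maharashtra</State>
      <Zip>411001</Zip>
    </Address>
  </Employee>
</company>
"""

def walk(element, depth=0):
    """Yield (depth, tag) pairs depth-first: root, then each branch in turn."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(COMPANY_XML)
outline = list(walk(root))
for depth, tag in outline:
    print("  " * depth + tag)   # indent each tag by its depth in the tree

city = root.find("Employee/Address/City").text
```

The traversal visits the root first, then follows the first branch down to its leaves before moving to the next branch, which matches the parsing order described above.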
XML documents are formed as element trees. An XML tree starts at a root element and branches
from the root to child elements.
Document Prolog comes at the top of the document, before the root element. This section
contains −
XML declaration
Document type declaration
Document Elements are the building blocks of XML. These divide the document into a
hierarchy of sections, each serving a specific purpose. You can separate a document into
multiple sections so that they can be rendered differently, or used by a search engine. The
elements can be containers, with a combination of text and other elements.
XML declaration contains details that prepare an XML processor to parse the XML document. It
is optional, but when used, it must appear in the first line of the XML document.
<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>
Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside
quotes.
Rules
If the XML declaration is present in the XML, it must be placed on the first line of the
XML document.
If the XML declaration is included, it must contain the version number attribute.
The parameter names and values are case-sensitive.
The names are always in lower case.
<?xml >