IT g12 Unit 4 Note 1
Web development
4.1 Introduction to XML
A markup language is a computer language that uses tags enclosed with less than (<) and greater
than (>) symbols to define elements within a document.
When the file is processed by a suitable application, the tags are used to control the structure
or presentation of the data contained in the file.
Any text that appears within one of these tags is considered part of the markup language.
Markup files contain standard words, rather than typical programming syntax.
XML (Extensible Markup Language) is a way of applying structure to a document, such as a web page.
XML provides a standard open format and mechanisms for structuring a document so that it
can be exchanged and manipulated.
XML is used for storing structured data, rather than formatting information on a page.
While HTML documents use predefined tags (such as “<head>” and “<title>”), XML files use custom
tags to define elements.
Microsoft Rich Text Format (RTF), Adobe Portable Document Format (PDF), and HTML are
examples of formats that provide presentational markup.
They are powerful solutions to the problem of displaying information.
Their common limitation is that they describe how the data looks, but they do not give any
information about what it is.
XML's custom tags, by contrast, make your data easier to organize and more searchable.
For example, a student might describe a book she read during vacation like this:
<books>
  <book>
    <title>An Introduction to XML and Web Technologies</title>
    <author>Anders Moller, Michael Schwartzbach</author>
    <type>Programming Languages</type>
  </book>
</books>
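To illustrate why custom tags make data searchable, here is a minimal sketch in Python using the standard-library module xml.etree.ElementTree. The function name titles_of_type and the variable BOOKS_XML are hypothetical names chosen for this illustration; the XML is a lightly cleaned copy of the example above.

```python
import xml.etree.ElementTree as ET

# The <books> example from the text, held in a string for illustration.
BOOKS_XML = """
<books>
  <book>
    <title>An Introduction to XML and Web Technologies</title>
    <author>Anders Moller, Michael Schwartzbach</author>
    <type>Programming Languages</type>
  </book>
</books>
"""

def titles_of_type(xml_text, wanted_type):
    """Return the titles of all <book> elements whose <type> matches."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("title", default="").strip()
        for book in root.findall("book")
        if book.findtext("type", default="").strip() == wanted_type
    ]

print(titles_of_type(BOOKS_XML, "Programming Languages"))
```

Because each piece of data is labelled with a tag that says what it is, the program can ask for "the title of every book of a given type" without guessing from the layout.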
XML does not change the way your web pages look; instead, it changes the way the documents
are read, filed, and stored.
Therefore, XML is used to describe the structure of a document rather than the way it is
presented.
The two areas in which XML is useful are structuring data for storage where a relational
database (see Unit Three on databases) is inappropriate, and the presentation of web pages.
4.1.1 Elements of XML Documents
An XML document must contain exactly one root element that is the parent of all other elements
(for example, “<people>” in Figure 4.1).
The numbers shown in Figure 4.2 are interpreted as follows:
1. XML declaration: describes the general characteristics of the document, such as the fact that it is
an XML document, the version of XML, and the character encoding it uses. XML documents usually begin
with the XML declaration, which is written like a processing instruction and provides information on
how the XML file should be processed.
The declaration uses the encoding attribute to specify the encoding scheme used to create the XML
file. Encoding is the process of converting Unicode characters into their equivalent binary
representation, depending on the type of encoding (‘UTF-8’ or ‘UTF-16’).
E.g. <?xml version="1.0" encoding="UTF-8"?>
2. Document Type Declaration (DOCTYPE): names the root element and introduces the Document Type
Definition (DTD), which describes the structure of the document in terms of which elements it may
contain, along with any restrictions on them. In other words, it describes the document starting from
its root. The example above is about people: the document ‘people’ is described with five elements,
which are declared in item 3 below.
E.g. <!DOCTYPE people […]>
3. Internal DTD subset: a DTD is internal if the elements are declared within the same XML file. The
following example uses internal declarations that are local to the XML document.
<!ELEMENT people (person+)>
<!ELEMENT person (name)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
Notes
PCDATA: stands for Parsed Character DATA. Character data is the text found between the
start tag and the end tag of an XML element. E.g. Abebech in Figure 4.2.
4. XML information set or Content: this is the XML document's content, that is, the information
represented by the elements of the XML document. See the example in Figure 4.3.
<people>
  <person>
    <name>
      <first>Abebech</first>
      <last>Kebede</last>
    </name>
  </person>
  <person>
    <name>
      <first>Jemal</first>
      <last>Ahmed</last>
    </name>
  </person>
</people>
Figure 4.3 XML document content.
5. Root element: This encloses all of the information. An XML document can have only one root
element. Therefore, “<people>” is the root of this XML document.
6. Start tag: XML elements have a start tag and an end tag. The start tag provides the name of the XML
element. E.g. <first>
7. End tag: The name of the end tag must exactly match the name of the start tag.
E.g. <people> with </people>, <person> with </person>, <first> with </first>
8. XML element: The start tag, the end tag, and everything between them are collectively referred to
as an XML element.
Elements are the basic units that are used to identify and describe the data in XML. They are the
building blocks of an XML document.
E.g. <last>Kebede</last>
9. Data: XML elements can contain data between the start and the end tags. An XML document
represents information using a hierarchy. That is, it begins with a root element (e.g. people), which
contains sub-elements (e.g. person), which in turn can contain other sub-elements (e.g. name), data
(e.g. Kebede), or both. E.g. Abebech, Kebede, Jemal, and Ahmed are data.
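Items 1 to 9 can be tied together in a short sketch. The Python code below (standard-library xml.etree.ElementTree) is an illustration under stated assumptions, not part of the original figures: the complete document is assembled from the declaration, DOCTYPE, internal DTD subset, and content described above, and the names PEOPLE_XML and is_well_formed are hypothetical. Note that ElementTree only checks that the document is well formed; it does not validate the content against the DTD.

```python
import xml.etree.ElementTree as ET

# Declaration, DOCTYPE with internal DTD subset, and content in one document.
# Bytes are used so that the parser reads the declared encoding itself.
PEOPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
  <!ELEMENT people (person+)>
  <!ELEMENT person (name)>
  <!ELEMENT name (first, last)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<people>
  <person><name><first>Abebech</first><last>Kebede</last></name></person>
  <person><name><first>Jemal</first><last>Ahmed</last></name></person>
</people>
"""

root = ET.fromstring(PEOPLE_XML)   # raises ET.ParseError if not well formed
print(root.tag)                    # name of the root element

# Data sits between start and end tags, reached by walking the hierarchy.
full_names = [
    f"{p.findtext('name/first')} {p.findtext('name/last')}"
    for p in root.findall("person")
]
print(full_names)

# Encoding turns characters into bytes; the byte count depends on the scheme.
utf8_length = len("Abebech".encode("utf-8"))
utf16_length = len("Abebech".encode("utf-16-le"))
print(utf8_length, utf16_length)

def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

# An end tag with a missing '>' makes the document not well formed.
print(is_well_formed(b"<name><first>Abebech</first></name"))
```

The last call shows why every start tag needs a complete, matching end tag: the parser rejects the whole document rather than guessing what was meant.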
Besides the above elements, attributes and comments are also part of XML documents.
Attribute: As in HTML, XML elements can contain attributes. An attribute provides additional
information about the element for which it is declared. It consists of a name-value pair. In the
following example, the attribute name is personid and the value is "101". The attribute value must be
quoted: either single or double quotes can be used.
E.g. <name personid="101">Ubang</name>
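As a small sketch of how a program might read an attribute, the Python code below parses the element from the example with both quoting styles. It uses the standard-library xml.etree.ElementTree module; the variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Attribute values must be quoted; single and double quotes both work.
double_quoted = ET.fromstring('<name personid="101">Ubang</name>')
single_quoted = ET.fromstring("<name personid='101'>Ubang</name>")

print(double_quoted.get("personid"))   # attribute value looked up by name
print(single_quoted.get("personid"))   # same value, different quoting style
print(double_quoted.attrib)            # all attributes as a dictionary
print(double_quoted.text)              # element data, kept separate from attributes
```

The attribute value is always read as text, so a program that needs the number 101 would have to convert it itself.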
Comment: This is a note that describes the XML code. Comments can provide documentation about the
XML file or the application to which the file belongs. A comment is ignored by the XML parser (a
program that interprets XML) when the document is processed.
The syntax for a comment is: <!-- This is a comment -->
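The following sketch shows one parser ignoring a comment. It assumes Python's standard-library xml.etree.ElementTree with its default settings, which discard comments while building the element tree; other parsers may offer an option to keep them. COMMENTED_XML is a hypothetical sample built from the people example.

```python
import xml.etree.ElementTree as ET

COMMENTED_XML = """
<people>
  <!-- This is a comment -->
  <person><name><first>Abebech</first><last>Kebede</last></name></person>
</people>
"""

root = ET.fromstring(COMMENTED_XML)

# Only real elements appear as children; the comment left no trace.
child_tags = [child.tag for child in root]
print(child_tags)
```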
We can see the whole hierarchy of the above markup in an upside-down tree structure, as shown in
Figure 4.4.
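The same hierarchy can be printed by a short program. The sketch below, in Python with the standard-library xml.etree.ElementTree module, walks the tree from the root and indents each element by its depth; the function name outline is hypothetical, and the XML is the people example from this section.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """
<people>
  <person><name><first>Abebech</first><last>Kebede</last></name></person>
  <person><name><first>Jemal</first><last>Ahmed</last></name></person>
</people>
"""

def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    text = (element.text or "").strip()
    label = f"{element.tag}: {text}" if text else element.tag
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(PEOPLE_XML))
print("\n".join(tree_lines))
```

The root element appears first with no indentation, and each level of nesting adds two spaces, so the data values end up at the leaves of the tree.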
4.1.2 Creating XML Documents
There are several ways to open an XML file. You can open and edit XML files with any text
editor, view them with any web browser, or use a website that lets you view, edit, and even convert
them to other formats. You can also use applications such as “Oxygen” or “XML Notepad” to see your
files' structures. In this section, we use “XML Notepad” for our demonstration.
For example, after saving the XML code given above with the .xml extension in any text editor, you
can open the file with XML Notepad, where it looks as shown in Figure 4.5. The file does nothing by
itself because XML is just information wrapped in tags. A program must be written to send, receive,
store, or display it.
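As a minimal sketch of such a program, the Python code below builds the people document, turns it into XML text that could be stored or sent, and then reads it back to display the names. It uses the standard-library xml.etree.ElementTree module; the function name build_people is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_people(names):
    """Build a <people> tree from (first, last) pairs."""
    people = ET.Element("people")
    for first, last in names:
        name = ET.SubElement(ET.SubElement(people, "person"), "name")
        ET.SubElement(name, "first").text = first
        ET.SubElement(name, "last").text = last
    return people

root = build_people([("Abebech", "Kebede"), ("Jemal", "Ahmed")])

# Store: serialize the tree to XML text (it could equally be written to a .xml file).
xml_text = ET.tostring(root, encoding="unicode")

# Display: read the text back and print one line per person.
for person in ET.fromstring(xml_text).findall("person"):
    print(person.findtext("name/first"), person.findtext("name/last"))
```

The XML text itself stays passive throughout; it is the surrounding program that decides what to do with the information.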
When you view your XML document in a browser, most browsers display it with color-coded elements
(see Figure 4.5 below, which compares the document in the Mozilla and Internet Explorer browsers).
Often a plus (+) or minus (-) sign to the left of an element can be clicked to expand or collapse the
element structure. To view the XML source code, select “View Page Source” or “View Source” from the
menu of the browser you use.