Introduction To XML: A Universal Data Format
Module Introduction
This module describes the drawbacks of earlier markup languages that led to the development of XML.
It also explains the structure and lifecycle of an XML document, and covers XML syntax and the various parts of an XML document. In this module, you will learn about:
1. Introduction to XML
2. Exploring XML
3. Working with XML
4. XML Syntax
#1 - Introduction to XML
Outline the features of markup languages and list their drawbacks. Define and describe XML. State the benefits and scope of XML.
Evolution of markup languages: GML, SGML, HTML.
Features: SGML lets each document represent its data in its own way. HTML can be written in any text editor.
Drawbacks: GML and SGML were not suited for data interchange over the web. HTML tags carry instructions on how to display the content rather than describing the content they enclose.
Evolution of XML
The Extensible Markup Language (XML) was created to address the issues raised by earlier markup languages. XML is a W3C Recommendation. XML is a set of rules for defining semantic tags that break a document into parts and identify the different parts of the document. XML was developed to overcome the limitations of HTML.
Features of XML
XML stands for Extensible Markup Language. XML is a markup language much like HTML. XML was designed to describe data. XML tags are not predefined; you must define your own tags. XML uses a Document Type Definition (DTD) or an XML Schema to describe the data.
XML Markup
XML markup defines the physical and logical layout of the document. XML's markup divides a document into separate information containers called elements.
A document consists of one outermost element, called the root element, that contains all the other elements, plus some optional administrative information at the top, known as the XML declaration.
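As a minimal sketch of the points above, the following Python snippet parses a small document with the standard library and inspects its root element. The tag names and values (student, name, course) are invented for illustration and are not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an XML declaration followed by one root element.
# The tags are self-defined, since XML tags are not predefined.
DOCUMENT = """<?xml version="1.0"?>
<student>
  <name>Ann</name>
  <course>XML Basics</course>
</student>"""

root = ET.fromstring(DOCUMENT)

# The root element contains all the other elements.
child_tags = [child.tag for child in root]

print(root.tag)
print(child_tags)
```

Each child of the root is itself an element, that is, a separate information container.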
Benefits of XML
Data independence: XML separates the content from its presentation. Easier to parse: XML provides a framework for data exchange. Reduced server load: the data can be manipulated using the DOM.
Describe the structure of an XML document. Explain the lifecycle of an XML document. State the functions of editors for XML and list the popularly used editors. State the functions of parsers for XML and list names of commonly used parsers. State the functions of browsers for XML and list the commonly used browsers.
XML documents are commonly stored in text files with the extension .xml. The two sections of an XML document are the document prolog and the root element.
1 - Document Prolog
Helps the XML parser get information about the content of the document.
2 - Root Element
Also called the document element. It must contain all the other elements and content in the document. An XML element has a start tag and an end tag.
Logical Structure
Gives information about the elements and the order in which they are to be included in the document. It shows how a document is constructed rather than what it contains.
XML Editors
The main functions that editors provide are as follows:
Add opening and closing tags to the code
Check the validity of the XML
Verify the XML against a DTD/Schema
Perform a series of transforms over a document
Color the XML syntax
Display the line numbers
Present the content and hide the code
Complete words as you type
The popularly used editors are: XMLwriter, XML Spy, XML Pro, XMLmind, XMetal
Parsers
An XML parser (also called an XML processor) reads the document and verifies that it is well-formed. After the document is verified, the processor converts it into a tree of elements or another data structure. Speed and performance are the criteria against which XML parsers are selected. Commonly used parsers are: Crimson, Oracle XML Parser, JAXP (Java API for XML Processing), MSXML.
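The two jobs of a parser described above can be sketched with Python's built-in ElementTree parser, used here only as a stand-in for the parsers listed. The library and book tags and the describe helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def describe(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


# A well-formed document is converted into a tree of elements.
root = ET.fromstring("<library><book><title>XML</title></book></library>")
outline = describe(root)
print("\n".join(outline))

# A document that is not well-formed is rejected with a parse error.
try:
    ET.fromstring("<library><book></library>")
    error = None
except ET.ParseError as exc:
    error = exc
print("rejected:", error is not None)
```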
Browsers
After the XML document is read, the parser passes the data structure to the client application (a web browser). The browser then formats the data and displays it to the user. Other programs, such as a database, a MIDI program or a spreadsheet program, may also receive the data and present it accordingly. Commonly used web browsers are as follows: Netscape, Mozilla, Internet Explorer, Firefox, Opera.
Explain the steps towards building an XML document. Define what is meant by well-formed XML.
An XML document has three main components: tags (markup) and text (content); a DTD or Schema; and formatting or display specifications.
The steps to build an XML document are as follows: Create an XML document in an editor. Save the XML document. Load the XML document in a browser.
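The same create, save and load cycle can be sketched in Python, with a script standing in for the editor and the browser. The file name student.xml and the element names are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Step 1: create the document (built in code here instead of an editor).
root = ET.Element("student")
ET.SubElement(root, "name").text = "Ann"

# Step 2: save it to a text file with the .xml extension.
path = os.path.join(tempfile.mkdtemp(), "student.xml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Step 3: load it again, as a browser or another application would.
loaded = ET.parse(path).getroot()
print(loaded.tag, loaded.find("name").text)
```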
An XML document is made up of the following parts:
1. XML version declaration
2. Document Type Definition (DTD)
3. Document instance, in which the content is defined by the markup
<?xml indicates that the document is an XML document.
version="1.0" specifies the version of XML.
encoding="iso-8859-1" specifies the encoding used for the characters.
standalone="yes" indicates whether external markup declarations are present: "yes" indicates that there are no external markup declarations; "no" indicates that external markup declarations might exist.
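To see how a parser reads these three attributes, here is a small sketch using the XmlDeclHandler callback of Python's expat bindings. The one-element document and the on_xml_declaration function name are assumptions made for this example.

```python
import xml.parsers.expat

# Hypothetical document carrying a full XML declaration.
DOCUMENT = b'<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><student/>'

declaration = {}


def on_xml_declaration(version, encoding, standalone):
    # expat reports standalone as 1 for "yes", 0 for "no", -1 if omitted.
    declaration["version"] = version
    declaration["encoding"] = encoding
    declaration["standalone"] = standalone


parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = on_xml_declaration
parser.Parse(DOCUMENT, True)

print(declaration)
```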
<!DOCTYPE student declares and defines the elements used in the document. The definitions can be supplied:
Externally: <!DOCTYPE student SYSTEM "studatabase.dtd">
Internally: inside the DOCTYPE declaration itself
3 - Document instance
<student> begins the document instance. This part defines the content of the XML document, expressed as markup. It describes the purpose and function of each element.
Meaning in Markup
Structure: describes the form of the document by specifying the relationships between the different elements in the document. It requires a single, non-empty root element that contains the other elements and the content.
Semantics: describes how each element is interpreted outside the document. For example, a web browser assigns the meaning "paragraph" to the tags <P> and </P>.
Well-formedness refers to the standards that XML documents must follow. Rules:
A minimum of one element is required.
XML tags are case sensitive.
Every start tag must have a matching end tag.
XML tags must be nested properly.
XML tags must be valid.
Length of markup names.
XML attributes must be valid.
XML documents should be verified.
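Several of these rules can be checked by handing small samples to a parser. The sketch below uses Python's ElementTree. The sample strings and the is_well_formed helper are invented for illustration, and the unquoted attribute sample is only one possible reading of the "valid attributes" rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, one per rule.
samples = {
    "well-formed": "<note><to>Ann</to></note>",
    "no element": "just text",
    "case mismatch": "<Note></note>",
    "missing end tag": "<note><to>Ann</note>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note id=1></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```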
State and describe the use of comments and processing instructions in XML. Classify character data that is written between tags. Describe entities, DOCTYPE declarations and attributes.
Comments
Comments give information about the code. They can appear in the document prolog, in the DTD or in the textual content. They cannot appear inside tags or attribute values. Syntax: <!-- comment text -->
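A short sketch of both points, using Python's minidom, which keeps comments as nodes in the tree. The document content is hypothetical.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

# A comment placed in the textual content is accepted and kept as a node.
doc = minidom.parseString(
    "<student><!-- enrolled this year --><name>Ann</name></student>"
)
comments = [
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.COMMENT_NODE
]
print(comments)

# A comment placed inside a tag makes the document not well-formed.
try:
    minidom.parseString("<student <!-- not allowed here -->></student>")
    rejected = False
except ExpatError:
    rejected = True
print("comment inside a tag rejected:", rejected)
```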
XML Elements
An XML element is everything from (including) the element's start tag to (including) the element's end tag. An element can contain other elements, simple text or a mixture of both. Elements can also have attributes.
XML Naming Rules
Names can contain letters, numbers, and other characters.
Names must not start with a number or punctuation character.
Names cannot contain spaces.
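One rough way to try out the naming rules is to ask a parser whether it accepts an empty element with a given name. The is_valid_name helper below is a hypothetical convenience for experimenting, not a full implementation of the XML name grammar (it also rejects names containing a colon, because of namespace handling).

```python
import xml.etree.ElementTree as ET


def is_valid_name(name):
    """Check a candidate element name by parsing <name/> with it."""
    try:
        element = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guard against inputs that parse as a shorter name plus attributes.
    return element.tag == name


checks = {
    name: is_valid_name(name)
    for name in ["first_name", "name2", "2nd", ".name", "first name"]
}
for name, ok in checks.items():
    print(repr(name), "->", "valid" if ok else "invalid")
```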
Processing Instructions
Processing instructions carry application-specific information. They do not follow XML rules or internal syntax. The parser passes these instructions on to the application. The main objective of a processing instruction is to give special instructions to the application. Syntax: <?target instructions?>
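The sketch below shows a parser handing a processing instruction to the application as a target plus an uninterpreted data string. The xml-stylesheet instruction is a commonly used one. The file name style.css is hypothetical.

```python
from xml.dom import Node, minidom

DOCUMENT = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<student/>"
)

doc = minidom.parseString(DOCUMENT)

# Each processing instruction appears as a node with a target and raw data.
# The XML declaration on the first line is not a processing instruction.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The parser does not interpret the data part; what to do with it is left to the application.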
An XML document is divided into markup and character data. Character data is the document's actual content, including white space. Depending on its type, character data may or may not be processed by the parser. Character data can be classified into CDATA and PCDATA.
PCDATA
PCDATA (parsed character data) is data that is parsed by the parser. The PCDATA keyword is used in an element declaration to specify that the element contains parsed character data. A markup character such as "<" used in the XML document will make the parser interpret it as the start of a new element.
CDATA
The text inside a CDATA section is not parsed by the XML parser. Text is typically placed in a CDATA section if it contains '<' or '&' characters. The syntax for a CDATA section is: <![CDATA[ text ]]>
CDATA sections cannot be nested, and the closing "]]>" string cannot contain line breaks or spaces.
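The difference between parsed character data and a CDATA section can be sketched as follows. The code element and its content are invented for this example.

```python
import xml.etree.ElementTree as ET

EXPECTED = "if (a < b && b < c) return;"

# Inside a CDATA section, '<' and '&' are taken literally.
with_cdata = ET.fromstring(
    "<code><![CDATA[if (a < b && b < c) return;]]></code>"
)

# In parsed character data, the same characters must be escaped.
escaped = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; b &lt; c) return;</code>"
)

# Left unescaped, '<' is read as the start of markup and parsing fails.
try:
    ET.fromstring("<code>if (a < b) return;</code>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(with_cdata.text == EXPECTED, escaped.text == EXPECTED, unescaped_ok)
```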
Entities
Entities are constructs that are referenced in the document. Every entity consists of a name and a value. As the XML document is parsed, the parser checks for entity references.
For every entity reference, the parser looks up the entity in memory and replaces the reference with the corresponding text or markup.
Syntax for an entity reference: &entity_name;. All entities must be declared before they are used in the document. An entity can be declared either in the document prolog or in a DTD.
Predefined entities
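XML predefines five entities that need no declaration: &lt; for <, &gt; for >, &amp; for &, &apos; for ' and &quot; for ". The sketch below shows a parser replacing them, and the standard library escape helper producing them. The text element is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each predefined entity reference with its character.
element = ET.fromstring("<text>&lt; &gt; &amp; &apos; &quot;</text>")
print(element.text)

# Going the other way: escape() replaces &, < and > before text is embedded.
safe = escape("a < b & c")
print(safe)
```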
Entity Categories
Entities are used as shortcuts to refer to the data pages. The two types of entities are general entities and parameter entities.
General Entity: used within the document content. It refers to the content of a named entity. Reference syntax: &entity_name;
Parameter Entity: declared and used only in the DTD. Reference syntax: %entity_name;
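A general entity can be sketched with an internal DTD and Python's ElementTree, which expands internally declared entities while parsing. The entity name school and its value are invented for this example. Parameter entities are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "school" is declared in the internal DTD
# and then referenced in the document content as &school;.
DOCUMENT = """<!DOCTYPE student [
  <!ENTITY school "Example Academy">
]>
<student>&school;</student>"""

root = ET.fromstring(DOCUMENT)
print(root.text)

# Referencing an entity that was never declared is an error.
try:
    ET.fromstring("<student>&unknown;</student>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print("undeclared entity accepted:", undeclared_ok)
```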
DOCTYPE declarations
A DOCTYPE declaration defines the elements to be used in the document and indicates which DTD the document adheres to. It can be declared either in the XML document itself (internal) or by reference to an external document (external).
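As a sketch, Python's minidom exposes the DOCTYPE declaration of a parsed document. Note that this parser only records the declaration: it does not fetch the external DTD or validate against it. The internal declaration shown is a hypothetical example.

```python
from xml.dom import minidom

# External: the DTD lives in a separate file named by the system identifier.
external = minidom.parseString(
    '<!DOCTYPE student SYSTEM "studatabase.dtd"><student/>'
)
print(external.doctype.name, external.doctype.systemId)

# Internal: the declarations sit inside the DOCTYPE declaration itself.
internal = minidom.parseString(
    "<!DOCTYPE student [<!ELEMENT student (#PCDATA)>]><student>Ann</student>"
)
print(internal.doctype.name)
```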
XML file
Attributes
Additional information about the elements can be given in the form of attributes. Attributes are declared in the DTD along with the elements. Every attribute within an element is a name-value pair. Attributes can be used to distinguish between elements of the same name. Attributes occur in the start tag, after the element name.
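A brief sketch of attributes distinguishing elements of the same name, using Python's ElementTree. The id attribute and its values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two elements share the name "student"; the id attribute tells them apart.
root = ET.fromstring(
    '<class>'
    '<student id="s1">Ann</student>'
    '<student id="s2">Ben</student>'
    '</class>'
)

# Each attribute is a name-value pair on the start tag.
by_id = {student.get("id"): student.text for student in root.findall("student")}
print(by_id)
```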