Introduction To XML: A Universal Data Format

The document provides an overview of XML including its introduction, structure, editing, parsing and syntax. It discusses the drawbacks of earlier markup languages that led to XML's development. It describes the components of an XML document including the prolog, root element, and logical structure. It also explains XML editing, parsing, browsing and the steps to create a well-formed XML document. Finally, it covers XML syntax elements like comments, processing instructions, character data classification and entities.

Uploaded by

Phuong Le
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to XML

A Universal data format

Module Introduction

Welcome to the module, Introduction to XML.

The module describes the drawbacks of earlier markup languages that led to the development of XML.
The module also explains the structure and lifecycle of an XML document, and covers XML syntax and the various parts of an XML document. In this module, you will learn about:

1. Introduction to XML
2. Exploring XML
3. Working with XML
4. XML Syntax

#1 - Introduction to XML

Outline the features of markup languages and list their drawbacks. Define and describe XML. State the benefits and scope of XML.

Features and Drawbacks of Markup Languages

Evolution of markup languages: GML, then SGML, then HTML.

Features: SGML lets a document represent its data in its own way. HTML allows the user to use any text editor.

Drawbacks: GML and SGML were not suited for data interchange over the web. HTML carries instructions on how to display the content rather than describing the content itself.

Evolution of XML

The Extensible Markup Language (XML) was created to address the issues raised by earlier markup languages. XML is a W3C recommendation. XML is a set of rules for defining semantic tags that break a document into parts and identify the different parts of the document. XML was developed to overcome the limitations of HTML.

Features of XML

XML stands for Extensible Markup Language.
XML is a markup language much like HTML.
XML was designed to describe data.
XML tags are not predefined. You must define your own tags.
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data.
XML with a DTD or XML Schema is designed to be self-descriptive.

XML Markup

XML markup defines the physical and logical layout of the document. XML's markup divides a document into separate information containers called elements.

A document consists of one outermost element, called the root element, that contains all the other elements, plus some optional administrative information at the top, known as the XML declaration.
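A minimal sketch of this layout, using Python's standard `xml.etree.ElementTree` module. The `library`, `book` and `title` names are made up for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an optional XML declaration at the top,
# followed by exactly one root element that contains everything else.
document = """<?xml version="1.0"?>
<library>
  <book><title>Learning XML</title></book>
  <book><title>XML in Practice</title></book>
</library>"""

root = ET.fromstring(document)

root_name = root.tag                                # name of the outermost element
child_count = len(root)                             # elements directly inside the root
titles = [t.text for t in root.iter("title")]       # every <title> below the root
```

Every other element is reached by starting from `root`, which is what "contains all the other elements" means in practice.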

Benefits of XML

Data independence: XML separates the content from its presentation.
Easier to parse: XML provides a framework for data exchange.
Reduced server load: the data can be manipulated using the DOM.
Easier to create: XML is text-based.

Scope of XML

Web site content: XML is transformed to HTML using XSLT and CSS.
Remote procedure calls: XML allows distributed computing.
E-commerce: XML is used to send data from one company to another.

#2 - Exploring XML Lesson Overview

Describe the structure of an XML document. Explain the lifecycle of an XML document. State the functions of editors for XML and list the popularly used editors. State the functions of parsers for XML and list names of commonly used parsers. State the functions of browsers for XML and list the commonly used browsers.

XML Document Structure

XML documents are commonly stored in text files with the extension .xml. An XML document has two sections: the document prolog and the root element.

$1 - Document Prolog

The document prolog helps the XML parser get information about the content of the document.

The document prolog contains metadata and consists of two parts:

1. XML declaration: specifies the version of XML being used.
2. Document type declaration: defines the values of entities or attributes, and checks the grammar and vocabulary of the markup.

$2 - Root Element

Also called the document element. It must contain all the other elements and content in the document. An XML element has a start tag and an end tag.

Logical Structure

Gives information about the elements and the order in which they are to be included in the document. It shows how a document is constructed rather than what it contains.

Life cycle of an XML document

XML Editors

The main functions that editors provide are as follows:

Add opening and closing tags to the code
Check the validity of the XML
Verify the XML against a DTD or Schema
Perform a series of transforms over a document
Color the XML syntax
Display line numbers
Present the content and hide the code
Complete words

The popularly used editors are: XMLwriter, XML Spy, XML Pro, XMLmind, and XMetal.

Parsers

An XML parser (also called an XML processor) reads the document and checks that it is well-formed. After the document is verified, the processor converts it into a tree of elements or another data structure. Speed and performance are the criteria against which XML parsers are selected. Commonly used parsers are: Crimson, Oracle XML Parser, JAXP (Java API for XML Processing), and MSXML.
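As a rough illustration of "document in, tree of elements out", the sketch below uses the parser behind Python's standard `xml.etree.ElementTree` module rather than any of the parsers listed above. The `outline` helper and the `student` document are hypothetical.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

# The parser checks well-formedness while reading and hands back the root of the tree.
root = ET.fromstring(
    "<student><name>Ann</name><courses><course>XML</course></courses></student>"
)
tree = outline(root)
```

The nesting of the tags in the text becomes the parent and child relationships of the tree, which is the data structure the client application receives.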

Browsers

After the XML document is read, the parser passes the data structure to the client application (for example, a web browser). The browser then formats the data and displays it to the user. Other programs, such as a database, a MIDI program or a spreadsheet program, may also receive the data and present it accordingly. Commonly used web browsers are: Netscape, Mozilla, Internet Explorer, Firefox, and Opera.

#3 - Working with XML

Explain the steps for building an XML document. Define what is meant by well-formed XML.

Creating an XML document

An XML document has three main components:

1. Tags (markup) and text (content)
2. DTD or Schema
3. Formatting or display specifications

The steps to build an XML document are as follows:

1. Create an XML document in an editor.
2. Save the XML document.
3. Load the XML document in a browser.
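The same create, save and load cycle can be sketched in code instead of an editor and a browser. This is only an analogy under stated assumptions: the `student.xml` file name and its content are invented, and the file is written to a temporary directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Step 1: create the document (built in code here, rather than typed in an editor).
root = ET.Element("student")
ET.SubElement(root, "name").text = "Ann"
ET.SubElement(root, "course").text = "XML"

# Step 2: save it as a text file with the .xml extension.
path = os.path.join(tempfile.mkdtemp(), "student.xml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

# Step 3: load it again, as a browser or any other client application would.
loaded = ET.parse(path).getroot()
loaded_name = loaded.find("name").text
```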

Exploring the XML document

The various building blocks of an XML document are:

1. XML version declaration
2. Document Type Definition (DTD)
3. Document instance, in which the content is defined by the markup

$1 - XML Version Declaration

<?xml indicates that the document is an XML document.
version="1.0" specifies the version of XML.
encoding="iso-8859-1" names the encoding used for the characters in the document.
standalone="yes" indicates whether external markup declarations are present: "yes" indicates that there are no external markup declarations, and "no" indicates that external markup declarations might exist.
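One way to see the three parts of the declaration is to parse a document and read them back. The sketch below uses Python's standard `xml.dom.minidom`; the empty `student` element is a placeholder.

```python
from xml.dom import minidom

# Bytes are used so that the declared encoding applies to the input as written.
source = b'<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><student/>'
doc = minidom.parseString(source)

version = doc.version                  # the version attribute of the declaration
encoding = doc.encoding                # the encoding attribute of the declaration
is_standalone = bool(doc.standalone)   # true when the declaration says standalone="yes"
```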

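$2 - Document Type Definition (DTD)

<!DOCTYPE student begins the declaration that declares and defines the elements used in the document. The DTD can be provided in two ways:

Externally: <!DOCTYPE student SYSTEM "studatabase.dtd">
Internally: the declarations are written inside the XML document itself.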

$3 - Document instance

<student> This part holds the content of the XML document, described by the markup. The markup describes the purpose and function of each element.

Meaning in Markup

Markup can be divided into the following three parts:

1. Structure: describes the form of the document by specifying the relationship between the different elements in the document. It requires a single, non-empty root element that contains the other elements and the content.
2. Semantics: describes how each element is interpreted by the world outside the document. For example, a web browser assigns the meaning "paragraph" to the tags <P> and </P>.
3. Style: specifies how the content of the tag or element is displayed.

Well-formed XML document

Well-formedness refers to the standards that XML documents must follow. Rules:

A minimum of one element is required.
XML tags are case sensitive.
Every start tag must have a matching end tag.
XML tags must be nested properly.
XML tags must be valid.
Length of markup names
XML attributes must be valid.
XML documents should be verified.
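Several of these rules can be tried out by handing small snippets to a parser and seeing which ones it rejects. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the snippets are illustrative only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "one root, properly nested": is_well_formed("<a><b>text</b></a>"),
    "no element at all": is_well_formed(""),
    "tag case mismatch": is_well_formed("<Name>Ann</name>"),
    "missing end tag": is_well_formed("<a><b>text</a>"),
    "improper nesting": is_well_formed("<a><b><c></b></c></a>"),
    "unquoted attribute value": is_well_formed("<a id=1/>"),
}
```

Only the first snippet passes. This checks well-formedness only; it says nothing about validity against a DTD or Schema.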

#4- XML Syntax

State and describe the use of comments and processing instructions in XML. Classify character data that is written between tags. Describe entities, DOCTYPE declarations and attributes.

Comments

Comments give information about the code. They can appear in the document prolog, in the DTD or in the textual content. They cannot appear inside tags or attribute values. Syntax: <!-- comment text -->
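The placement rules can be checked with a parser that keeps comments, such as Python's standard `xml.dom.minidom`. The `note` document and the `comment_texts` helper below are assumptions made for the sketch.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def comment_texts(node):
    """Collect the text of every comment node under node, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.COMMENT_NODE:
            found.append(child.data.strip())
        found.extend(comment_texts(child))
    return found

# Comments are allowed before the root element and inside element content.
doc = minidom.parseString(
    "<!-- comment in the prolog -->"
    "<note><!-- comment in the content -->Hello</note>"
)
comments = comment_texts(doc)

# A comment placed inside a tag is rejected by the parser.
try:
    minidom.parseString("<note <!-- not allowed here -->>Hello</note>")
    comment_inside_tag_accepted = True
except ExpatError:
    comment_inside_tag_accepted = False
```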

XML Elements

An XML element is everything from (and including) the element's start tag to (and including) the element's end tag. An element can contain other elements, simple text or a mixture of both. Elements can also have attributes.

XML Naming Rules

Names can contain letters, numbers, and other characters.
Names must not start with a number or punctuation character.
Names cannot contain spaces.
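A short sketch of the three kinds of element content and of the naming rules, using Python's standard `xml.etree.ElementTree`. The `order` document and the candidate names are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<order>"
    "<item>pen</item>"                            # simple text
    "<customer><name>Ann</name></customer>"       # other elements
    "<note>Deliver <b>today</b> please</note>"    # a mixture of both
    "</order>"
)
note = root.find("note")
mixed = (note.text, note[0].text, note[0].tail)   # text before, inside and after <b>

def name_accepted(name):
    """Return True if the parser accepts name as an element name."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

names = {n: name_accepted(n) for n in ["item_1", "_item", "1item", "-item", "my item"]}
```

The names starting with a digit or a hyphen, and the name containing a space, are rejected; the other two are accepted.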

Processing Instructions

Processing instructions carry application-specific information. These instructions do not follow XML rules or internal syntax. The parser passes these instructions on to the application. The main objective of a processing instruction is to present some special instructions to the application. Syntax: <?target instruction?>
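To illustrate how a parser hands a processing instruction to the application, the sketch below reads one back with Python's standard `xml.dom.minidom`. The `xml-stylesheet` instruction is a common real-world example; the `style.css` file name is made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<note>Hello</note>"
)

# The XML declaration is not a node, so the first child is the processing instruction.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target   # which application the instruction is meant for
data = pi.data       # passed through as plain text, not parsed as XML
```

The parser does not interpret `data`; deciding what it means is left to the application named by `target`.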

Classification of character data

An XML document is divided into markup and character data. Character data is the document's actual content, including white space. Depending on its type, character data is either parsed by the parser or passed through without being parsed. Character data can be classified into: CDATA and PCDATA.

PCDATA (parsed character data)

PCDATA is the data that is parsed by the parser. The #PCDATA keyword, used in an element declaration, specifies that the element has parsed character data. A markup character such as "<" inside PCDATA makes the parser interpret it as the start of a new element, so it has to be escaped.

CDATA

The text inside a CDATA section is not parsed by the XML parser. Text is placed in a CDATA section when it contains characters such as '<' or '&'. The syntax for a CDATA section is: <![CDATA[ text ]]>

CDATA sections cannot be nested, and the closing "]]>" string cannot contain line breaks or spaces.
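The difference between parsed character data and a CDATA section can be seen by writing the same text both ways. A minimal sketch with Python's standard `xml.etree.ElementTree`; the `code` element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# Parsed character data: markup characters have to be escaped.
escaped = ET.fromstring("<code>if (a &lt; b &amp;&amp; b &lt; c)</code>")

# CDATA section: the same characters are written literally and are not parsed.
cdata = ET.fromstring("<code><![CDATA[if (a < b && b < c)]]></code>")

same_text = escaped.text == cdata.text

# Writing "<" literally outside a CDATA section is rejected by the parser.
try:
    ET.fromstring("<code>if (a < b)</code>")
    raw_less_than_accepted = True
except ET.ParseError:
    raw_less_than_accepted = False
```

Both forms give the application the same text; CDATA is mainly a convenience when the content has many such characters.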

Entities

Entities are constructs that are referenced in the document. Every entity consists of a name and a value. As the XML document is parsed, the parser checks for entity references.

For every entity reference, the parser looks up the entity in memory and replaces the reference with text or markup.
Syntax for an entity reference: &<entity name>;. All entities must be declared before they are used in the document. An entity can be declared either in the document prolog or in a DTD.
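A small sketch of declaring an entity and referencing it, using Python's standard `xml.etree.ElementTree`. The entity name `school` and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the prolog (internal DTD) and referenced in the content.
document = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY school "Hypothetical Academy">
]>
<note>Welcome to &school;!</note>"""

note = ET.fromstring(document)
expanded = note.text   # the reference has been replaced by the entity's value

# Referencing an entity that was never declared is an error.
try:
    ET.fromstring("<note>&missing;</note>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```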

Predefined entities
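XML predefines five entities that can be used without any declaration: `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`. The sketch below, using Python's standard `xml.etree.ElementTree` and a placeholder element `t`, shows what each one expands to.

```python
import xml.etree.ElementTree as ET

# No DTD is needed: these five references are built into XML itself.
element = ET.fromstring("<t>&lt; &gt; &amp; &apos; &quot;</t>")
predefined_text = element.text
```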

Entity Categories

Entities are used as shortcuts to refer to the data pages. The two types of entities are as follows:

General entity: these entities are used within the document content. They refer to the content of a named entity. References to these entities: &<entity_name>;

Parameter entity: these entities are declared and used only in the DTD. References to these entities: %<entity_name>;

DOCTYPE declarations

A DOCTYPE declaration defines the elements to be used in the document and indicates which DTD the document adheres to. The DTD can be declared either in the XML document itself (internal) or in a separate document that is referenced (external).
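A sketch of the two forms, read back with Python's standard `xml.dom.minidom`. The `note` element declarations are assumptions made for illustration, and `note.dtd` is only referenced, never opened. The parser used here reads the declaration but does not validate the document against the DTD.

```python
from xml.dom import minidom

# Internal: the element declarations sit inside the document itself.
internal = minidom.parseString("""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ann</to><body>Hello</body></note>""")

# External: the document only points at a separate DTD file.
external = minidom.parseString("""<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Ann</to><body>Hello</body></note>""")

internal_root = internal.doctype.name       # root element named by the DOCTYPE
internal_file = internal.doctype.systemId   # None: no external file is referenced
external_file = external.doctype.systemId   # the referenced DTD file
```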

Example of DOCTYPE declarations - Internal

Example of DOCTYPE declarations - External

DTD file (note.dtd)

XML file

Attributes

Additional information about elements can be given in the form of attributes. Attributes are declared in the DTD along with the elements. Every attribute within an element is a name-value pair. Attributes can be used to distinguish between elements of the same name. Attributes occur in start tags, after the element name.

Attribute values are always enclosed in single or double quotes.

Attribute names are case sensitive and must start with a letter or underscore.
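A brief sketch of attributes as name-value pairs that tell apart elements of the same name, using Python's standard `xml.etree.ElementTree`. The `class` and `student` elements and their `id` and `status` attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are both accepted around attribute values.
root = ET.fromstring(
    "<class>"
    '<student id="s1" status="active">Ann</student>'
    "<student id='s2' status='inactive'>Ben</student>"
    "</class>"
)

# Two elements share the name "student"; the id attribute distinguishes them.
by_id = {s.get("id"): s.text for s in root.findall("student")}
active = [s.text for s in root.findall("student[@status='active']")]

# Attribute names are case sensitive: "ID" is not the same attribute as "id".
upper_case_lookup = root[0].get("ID")
```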

That's all for today!

Introduction to XML, Exploring XML, Working with XML, XML Syntax

Thank you all for your attention and patience!
