School year 2024-2025/ Second semester/ FET / Computer Engineering

XML and Document Content Validation

Chapter 1: General Introduction

I. Introduction
XML, or Extensible Markup Language, is a markup language that you can use to create your
own tags. It was created by the World Wide Web Consortium (W3C) to overcome the
limitations of HTML, the Hypertext Markup Language that is the basis for all Web pages.
Like HTML, XML is based on SGML (Standard Generalized Markup Language), an ISO
standard (ISO 8879) defined in 1986 in the field of electronic document management.
SGML's complexity, however, intimidated many people who might otherwise have used it.
XML was designed with the Web in mind.

Why do we need XML?

HTML is the most successful markup language of all time. You can view the simplest HTML
tags on virtually any device, from palmtops to mainframes, and you can even convert HTML
markup into voice and other formats with the right tools. Given the success of HTML, why
did the W3C create XML? To answer that question, take a look at this document:

<p><b>Mrs. Mary McGoon</b>
<br>
1401 Main Street
<br>
Anytown, NC 34829</p>

The trouble with HTML is that it was designed with humans in mind. Even without viewing
the above HTML document in a browser, you and I can figure out that it is someone's postal
address. (Specifically, it's a postal address for someone in the United States; even if you're not
familiar with all the components of U.S. postal addresses, you could probably guess what this
represents.)

As humans, you and I have the intelligence to understand the meaning and intent of most
documents. A machine, unfortunately, can't do that. While the tags in this document tell a
browser how to display this information, the tags don't tell the browser what the information
is. You and I know it's an address, but a machine doesn't.

Proposed by Dr. SOP DEFFO Lionel Landry

II. Understanding the Concept

a) Rendering HTML

To render HTML, the browser merely follows the instructions in the HTML document. The
paragraph tag tells the browser to start rendering on a new line, typically with a blank line
beforehand, while the two break tags tell the browser to advance to the next line without a
blank line in between. While the browser formats the document beautifully, the machine still
doesn't know this is an address.

Figure 1. HTML address
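
The rendering step described above can be imitated with a toy renderer. This is only a rough sketch using Python's standard html.parser module, not how a real browser works; the class name PlainTextRenderer and the function render are hypothetical names chosen for illustration. It follows the paragraph and break tags to lay out lines, and at no point does it learn that the text is an address.

```python
from html.parser import HTMLParser


class PlainTextRenderer(HTMLParser):
    """Toy renderer: <p> starts a new block, <br> starts a new line."""

    def __init__(self):
        super().__init__()
        self.lines = [""]

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A paragraph starts on a new line with a blank line before it,
            # unless nothing has been rendered yet.
            if self.lines != [""]:
                self.lines.extend(["", ""])
        elif tag == "br":
            # A break advances to the next line with no blank line.
            self.lines.append("")
        # Other tags (such as <b>) only change styling, which plain text ignores.

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.lines[-1] = (self.lines[-1] + " " + text).strip()


def render(html):
    """Return the plain-text layout produced by following the tags."""
    renderer = PlainTextRenderer()
    renderer.feed(html)
    renderer.close()
    return "\n".join(renderer.lines)


ADDRESS_HTML = """<p><b>Mrs. Mary McGoon</b>
<br>
1401 Main Street
<br>
Anytown, NC 34829</p>"""

print(render(ADDRESS_HTML))
```

The output is three lines of text. The renderer has followed formatting instructions only, which is exactly the limitation discussed in this section.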

b) Processing HTML

To wrap up this discussion of the sample HTML document, consider the task of extracting the
postal code from this address. Here's an (intentionally brittle) algorithm for finding the postal
code in HTML markup:

If you find a paragraph with two <br> tags, the postal code is the second word after the first
comma in the text that follows the second break tag.

Although this algorithm works with this example, there are any number of perfectly valid
addresses worldwide for which this simply wouldn't work. Even if you could write an
algorithm that found the postal code for any address written in HTML, there are any number
of paragraphs with two break tags that don't contain addresses at all. Writing an algorithm that
looks at any HTML paragraph and finds any postal codes inside it would be extremely
difficult, if not impossible.
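
To make the brittleness concrete, here is a minimal Python sketch of the rule stated above. The function name brittle_postal_code and the two extra inputs (a UK-style address and a recipe paragraph) are hypothetical and invented for illustration; only the first address comes from this chapter.

```python
import re


def brittle_postal_code(html):
    """Apply the brittle rule: in a paragraph with two <br> tags, return the
    second word after the first comma in the text after the second <br>."""
    match = re.search(r"<p>(.*?)</p>", html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    parts = re.split(r"<br\s*/?>", match.group(1), flags=re.IGNORECASE)
    if len(parts) != 3:  # exactly two <br> tags give three pieces
        return None
    _, comma, after_comma = parts[2].partition(",")
    words = after_comma.split()
    if not comma or len(words) < 2:
        return None
    return words[1]


# The address from this chapter.
us_address = """<p><b>Mrs. Mary McGoon</b>
<br>
1401 Main Street
<br>
Anytown, NC 34829</p>"""

# Hypothetical address with no comma on the last line.
uk_address = """<p><b>Mr. John Smith</b>
<br>
10 High Street
<br>
London SW1A 1AA</p>"""

# Hypothetical paragraph that is not an address at all.
recipe = "<p>First, mix well<br>then rest<br>Finally, bake for 30 minutes</p>"

print(brittle_postal_code(us_address))  # the rule works here
print(brittle_postal_code(uk_address))  # a valid address, but nothing is found
print(brittle_postal_code(recipe))      # not an address, yet a "postal code" is found
```

The first call returns the postal code, the second returns nothing for a perfectly valid address, and the third returns a meaningless word from a paragraph that is not an address. Both failure modes are the ones described in the text.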

c) A sample XML document

Now let's look at a sample XML document. With XML, you can assign some meaning to the
tags in the document. More importantly, it's easy for a machine to process the information as
well. You can extract the postal code from this document by simply locating the content
surrounded by the <postal-code> and </postal-code> tags, technically known as the
<postal-code> element.

<address>
  <name>
    <title>Mrs.</title>
    <first-name>
      Mary
    </first-name>
    <last-name>
      McGoon
    </last-name>
  </name>
  <street>
    1401 Main Street
  </street>
  <city>Anytown</city>
  <state>NC</state>
  <postal-code>
    34829
  </postal-code>
</address>
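
As a hedged illustration of how simple the extraction becomes, the sketch below reads the postal code from the document above with Python's standard xml.etree.ElementTree module. The choice of Python and the function name postal_code are assumptions made for this example; any XML parser would serve.

```python
import xml.etree.ElementTree as ET

ADDRESS_XML = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>
      Mary
    </first-name>
    <last-name>
      McGoon
    </last-name>
  </name>
  <street>
    1401 Main Street
  </street>
  <city>Anytown</city>
  <state>NC</state>
  <postal-code>
    34829
  </postal-code>
</address>"""


def postal_code(xml_text):
    """Return the content of the <postal-code> element, without padding."""
    root = ET.fromstring(xml_text)
    # findtext returns the text inside the first matching child element;
    # strip() removes the line breaks and indentation around the value.
    return root.findtext("postal-code", default="").strip()


print(postal_code(ADDRESS_XML))
```

No guessing about commas or line breaks is needed: the tag name itself says which piece of data is the postal code.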

d) Tags, elements, and attributes

There are three common terms used to describe parts of an XML document: tags, elements,
and attributes. Here is a sample document that illustrates the terms:

<address>
  <name>
    <title>Mrs.</title>
    <first-name>
      Mary
    </first-name>
    <last-name>
      McGoon
    </last-name>
  </name>
  <street>
    1401 Main Street
  </street>
  <city state="NC">Anytown</city>
  <postal-code>
    34829
  </postal-code>
</address>

• A tag is the text between the left angle bracket (<) and the right angle bracket (>).
There are starting tags (such as <name>) and ending tags (such as </name>).
• An element is the starting tag, the ending tag, and everything in between. In the
sample above, the <name> element contains three child elements: <title>,
<first-name>, and <last-name>.
• An attribute is a name-value pair inside the starting tag of an element. In this example,
state is an attribute of the <city> element; in the earlier example, <state> was an element.

III. How XML is changing the Web

Now that you've seen how developers can use XML to create documents with self-describing
data, let's look at how people are using those documents to improve the Web. Here are a few
key areas:

• XML simplifies data interchange. Because different organizations (or even different
parts of the same organization) rarely standardize on a single set of tools, it can take a
significant amount of work for applications to communicate. Using XML, each group
creates a single utility that transforms their internal data formats into XML and vice
versa. Best of all, there's a good chance that their software vendors already provide
tools to transform their database records (purchase orders, and so forth) to and from
XML.
• XML enables smart code. Because XML documents can be structured to identify
every important piece of information (as well as the relationships between the pieces),
it's possible to write code that can process those XML documents without human
intervention. The fact that software vendors have spent massive amounts of time and
money building XML development tools means writing that code is a relatively simple
process.
• XML enables smart searches. Although search engines have improved steadily over
the years, it's still quite common to get erroneous results from a search. If you're
searching HTML pages for someone named "Chip," you might also find pages on
chocolate chips, computer chips, wood chips, and lots of other useless matches.
Searching XML documents for <first-name> elements that contain the text Chip
would give you a much better set of results.
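
The interchange and search points above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a real interchange tool: the records, the element names people, person, and note, and all function names are hypothetical. It uses Python's standard xml.etree.ElementTree module to convert internal records to XML and back, then compares a plain text search with a search restricted to <first-name> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal records, invented for this example.
RECORDS = [
    {"first-name": "Chip", "last-name": "Jones", "note": "Sells wood products"},
    {"first-name": "Mary", "last-name": "McGoon", "note": "Likes chocolate chip cookies"},
]


def records_to_xml(records):
    """Transform the internal format (a list of dicts) into an XML string."""
    root = ET.Element("people")
    for record in records:
        person = ET.SubElement(root, "person")
        for field, value in record.items():
            ET.SubElement(person, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text):
    """Transform the XML string back into the internal format."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in person}
            for person in root.findall("person")]


def text_search(records, word):
    """Plain-text style search: match the word anywhere, in any field."""
    return [r["last-name"] for r in records
            if any(word.lower() in value.lower() for value in r.values())]


def first_name_search(xml_text, name):
    """Structured search: match only the content of <first-name> elements."""
    root = ET.fromstring(xml_text)
    return [person.findtext("last-name")
            for person in root.findall("person")
            if person.findtext("first-name", default="").strip() == name]


xml_text = records_to_xml(RECORDS)
print(xml_to_records(xml_text) == RECORDS)  # round trip keeps the data
print(text_search(RECORDS, "chip"))         # also matches the cookie note
print(first_name_search(xml_text, "Chip"))  # matches only the person named Chip
```

The plain text search returns both people because the word appears in a note about cookies, while the search on <first-name> returns only the person actually named Chip.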

Exercise: Give the differences between XML and HTML
