Chapter1 General Introduction

The document provides an overview of XML, highlighting its purpose as a markup language designed to create custom tags and improve upon HTML's limitations. It explains how XML allows machines to understand the meaning of data through self-describing tags, making data interchange and processing more efficient. Additionally, the document discusses the advantages of XML in enabling smart code and searches, ultimately enhancing web functionality.

Uploaded by

sop lionnel

School year 2024-2025/ Second semester/ FET / Computer Engineering

XML and Document Content Validation

Chapter 1: General Introduction

I. Introduction
XML, or Extensible Markup Language, is a markup language that you can use to create your
own tags. It was created by the World Wide Web Consortium (W3C) to overcome the
limitations of HTML, the Hypertext Markup Language that is the basis for all Web pages.
Like HTML, XML is based on SGML, the Standard Generalized Markup Language. SGML is an
ISO standard (ISO 8879) defined in 1986 in the field of electronic document management.
However, its complexity intimidated many people who might otherwise have used it. XML was
designed with the Web in mind.

Why do we need XML?

HTML is the most successful markup language of all time. You can view the simplest HTML
tags on virtually any device, from palmtops to mainframes, and you can even convert HTML
markup into voice and other formats with the right tools. Given the success of HTML, why
did the W3C create XML? To answer that question, take a look at this document:

<p><b>Mrs. Mary McGoon</b>
<br>
1401 Main Street
<br>
Anytown, NC 34829</p>

The trouble with HTML is that it was designed with humans in mind. Even without viewing
the above HTML document in a browser, you and I can figure out that it is someone's postal
address. (Specifically, it's a postal address for someone in the United States; even if you're not
familiar with all the components of U.S. postal addresses, you could probably guess what this
represents.)

As humans, you and I have the intelligence to understand the meaning and intent of most
documents. A machine, unfortunately, can't do that. While the tags in this document tell a
browser how to display this information, the tags don't tell the browser what the information
is. You and I know it's an address, but a machine doesn't.

Proposed by Dr. SOP DEFFO Lionel Landry

II. Understanding the Concept

a) Rendering HTML

To render HTML, the browser merely follows the instructions in the HTML document. The
paragraph tag tells the browser to start rendering on a new line, typically with a blank line
beforehand, while the two break tags tell the browser to advance to the next line without a
blank line in between. While the browser formats the document beautifully, the machine still
doesn't know this is an address.

Figure 1. HTML address

b) Processing HTML

To wrap up this discussion of the sample HTML document, consider the task of extracting the
postal code from this address. Here's an (intentionally brittle) algorithm for finding the postal
code in HTML markup:

If you find a paragraph with two <br> tags, the postal code is the second word after the first
comma in the text that follows the second break tag.
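
The brittle rule above can be sketched in Python as follows. This is only an illustration, not part of the original course material: the function name `brittle_postal_code` is hypothetical, and the HTML string assumes the sample address is marked up with one paragraph tag, one bold tag, and two break tags, as the text describes.

```python
import re

# Assumed markup for the sample address (one <p>, two <br> tags).
HTML_ADDRESS = """<p><b>Mrs. Mary McGoon</b>
<br>
1401 Main Street
<br>
Anytown, NC 34829</p>"""


def brittle_postal_code(html):
    """Apply the rule from the text: in a paragraph with exactly two
    <br> tags, return the second word after the first comma that
    follows the second <br>. Return None if the rule does not apply."""
    match = re.search(r"<p>(.*?)</p>", html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    # Splitting on <br> gives three parts when there are exactly two breaks.
    parts = re.split(r"<br\s*/?>", match.group(1), flags=re.IGNORECASE)
    if len(parts) != 3:
        return None
    _, comma, after_comma = parts[2].partition(",")
    if not comma:
        return None
    words = after_comma.split()
    return words[1] if len(words) >= 2 else None


address_result = brittle_postal_code(HTML_ADDRESS)

# A paragraph that is not an address at all still "matches" the rule.
poem_result = brittle_postal_code("<p>Roses<br>are red<br>Violets, are blue</p>")

print(address_result)
print(poem_result)
```

The second call shows the weakness: the rule happily returns a word from a paragraph that has nothing to do with addresses, because the HTML tags carry no meaning about the content.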

Although this algorithm works with this example, there are any number of perfectly valid
addresses worldwide for which this simply wouldn't work. Even if you could write an
algorithm that found the postal code for any address written in HTML, there are any number
of paragraphs with two break tags that don't contain addresses at all. Writing an algorithm that
looks at any HTML paragraph and finds any postal codes inside it would be extremely
difficult, if not impossible.

c) A sample XML document

Now let's look at a sample XML document. With XML, you can assign some meaning to the
tags in the document. More importantly, it's easy for a machine to process the information as
well. You can extract the postal code from this document by simply locating the content
surrounded by the <postal-code> and </postal-code> tags, technically known as the
<postal-code> element.

<address>
  <name>
    <title>Mrs.</title>
    <first-name>
      Mary
    </first-name>
    <last-name>
      McGoon
    </last-name>
  </name>
  <street>
    1401 Main Street
  </street>
  <city>Anytown</city>
  <state>NC</state>
  <postal-code>
    34829
  </postal-code>
</address>
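
As a minimal sketch of how a program might pull the postal code out of this document, the following Python snippet uses the standard library module `xml.etree.ElementTree`. The variable names are illustrative assumptions, and the document text is embedded as a string only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# The sample address document, embedded as a string for this sketch.
ADDRESS_XML = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Anytown</city>
  <state>NC</state>
  <postal-code>
    34829
  </postal-code>
</address>"""

root = ET.fromstring(ADDRESS_XML)

# Locate the <postal-code> element by name and read its text content.
# strip() removes the line breaks and indentation around the value.
postal_code = root.findtext("postal-code").strip()

print(postal_code)
```

Unlike the HTML case, no guessing about commas or line breaks is needed: the element name itself says which piece of data is the postal code.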

d) Tags, elements, and attributes

There are three common terms used to describe parts of an XML document: tags, elements,
and attributes. Here is a sample document that illustrates the terms:

<address>
  <name>
    <title>Mrs.</title>
    <first-name>
      Mary
    </first-name>
    <last-name>
      McGoon
    </last-name>
  </name>
  <street>
    1401 Main Street
  </street>
  <city state="NC">Anytown</city>
  <postal-code>
    34829
  </postal-code>
</address>

• A tag is the text between the left angle bracket (<) and the right angle bracket (>).
There are starting tags (such as <name>) and ending tags (such as </name>).
• An element is the starting tag, the ending tag, and everything in between. In the
sample above, the <name> element contains three child elements: <title>, <first-name>,
and <last-name>.
• An attribute is a name-value pair inside the starting tag of an element. In this example,
state is an attribute of the <city> element; in earlier examples, <state> was an element.
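
To make the three terms concrete, here is a small Python sketch (again using the standard `xml.etree.ElementTree` module; the variable names are assumptions made for illustration). It lists the child elements of <name> and reads the state attribute of <city>.

```python
import xml.etree.ElementTree as ET

# The sample document from this section, embedded as a string.
DOC = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>"""

root = ET.fromstring(DOC)

# Element: <name> together with everything between its start and end tags.
name = root.find("name")

# Tags: the names of the three child elements of <name>.
child_tags = [child.tag for child in name]

# Attribute: the name-value pair inside the <city> starting tag.
city = root.find("city")
state = city.get("state")

print(child_tags)
print(city.text, state)
```

Note that the attribute value is read with `get`, while the content between the tags is read from `text`; the two are stored separately.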

III. How XML is changing the Web

Now that you've seen how developers can use XML to create documents with self-describing
data, let's look at how people are using those documents to improve the Web. Here are a few
key areas:

• XML simplifies data interchange. Because different organizations (or even different
parts of the same organization) rarely standardize on a single set of tools, it can take a
significant amount of work for applications to communicate. Using XML, each group
creates a single utility that transforms their internal data formats into XML and vice
versa. Best of all, there's a good chance that their software vendors already provide
tools to transform their database records (purchase orders, and so forth) to and from
XML.
• XML enables smart code. Because XML documents can be structured to identify
every important piece of information (as well as the relationships between the pieces),
it's possible to write code that can process those XML documents without human
intervention. The fact that software vendors have spent massive amounts of time and
money building XML development tools means writing that code is a relatively simple
process.
• XML enables smart searches. Although search engines have improved steadily over
the years, it's still quite common to get erroneous results from a search. If you're
searching HTML pages for someone named "Chip," you might also find pages on
chocolate chips, computer chips, wood chips, and lots of other useless matches.
Searching XML documents for <first-name> elements that contain the text Chip
would give you a much better set of results.
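
The "smart search" idea can be sketched as follows. The <people> document, its element names other than <first-name>, and the two sample records are all invented for this illustration; they do not come from the course text. The sketch compares a plain text search for "chip" with a search restricted to <first-name> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, made up for this example only.
PEOPLE_XML = """<people>
  <person>
    <first-name>Chip</first-name>
    <last-name>Jones</last-name>
    <note>Likes chocolate chip cookies</note>
  </person>
  <person>
    <first-name>Dana</first-name>
    <last-name>Smith</last-name>
    <note>Designs computer chips</note>
  </person>
</people>"""

root = ET.fromstring(PEOPLE_XML)

# Plain text search: any record that mentions "chip" anywhere.
text_hits = [
    person for person in root.iter("person")
    if "chip" in "".join(person.itertext()).lower()
]

# Structured search: only records whose <first-name> is exactly "Chip".
element_hits = [
    person for person in root.iter("person")
    if (person.findtext("first-name") or "").strip() == "Chip"
]

print(len(text_hits), len(element_hits))
```

The text search matches both records because of the notes about cookies and computer chips, while the element-based search returns only the person actually named Chip.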

Exercise: Give the differences between XML and HTML
