1 - XML 2020 Lab.01 - XML Standard (v3.0)
Content
1.1. XML definition, purpose and history
1.2. XML versus HTML (comparison)
1.3. XML document components
1.4. XML document syntax rules: “well-formed”
1.5. XML Namespaces
1.6. XML Document processing - Use Cases
1.7. How to create and view XML Documents
1.8. Working environments
1.9. Things to try
1.10. References
HTML | XML
Created for formatting and presenting data | Created for structuring and storing data
Indicates how the data in the document will be displayed | Describes data; focuses on what the data is
Tags are defined by the HTML Standard (specification) | Tags are user-defined
Fixed number of tags (a limitation) | Unlimited number of tags
Note:
• HTML 5.1 was released as a W3C Recommendation, on 01.11.2016
• HTML 5.2 was released as a W3C Recommendation, on 14.12.2017
• HTML 5.3 has been a W3C Working Draft since 18.10.2018
3. Comments
4. Elements
5. Attributes
6. Entities
7. DOCTYPE sections
8. CDATA sections
9. XML Namespaces
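To see several of these components side by side, the sketch below parses a small hypothetical document (the element names and the namespace URI `http://example.com/ns/book` are invented for illustration) with Python's standard `xml.etree.ElementTree` module. A DOCTYPE section is left out because DTDs are covered in Lab 03. Note that the parser drops the comment, replaces the predefined entity `&amp;` with `&`, returns CDATA content as plain text, and reports namespaced names in `{URI}local` form.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: XML declaration, comment, elements, attributes,
# a predefined entity (&amp;), a CDATA section and a namespace prefix (b:).
DOC = """<?xml version="1.0"?>
<!-- a comment: not part of the element tree -->
<catalog xmlns:b="http://example.com/ns/book">
  <b:book id="bk101" lang="en">
    <title>Tom &amp; Jerry</title>
    <code><![CDATA[if (a < b && c > d) { }]]></code>
  </b:book>
</catalog>"""

root = ET.fromstring(DOC)

# Namespaced names are looked up with the {URI}local notation.
book = root.find("{http://example.com/ns/book}book")

print(root.tag)                  # catalog
print(book.tag)                  # {http://example.com/ns/book}book
print(book.attrib)               # {'id': 'bk101', 'lang': 'en'}
print(book.find("title").text)   # Tom & Jerry
print(book.find("code").text)    # if (a < b && c > d) { }
```

The `<title>` element has no prefix and there is no default namespace, so it is found simply as `"title"`.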
Notes:
Please take your time to read more about them in the References (section 1.10: online tutorials and book).
DOCTYPE sections are used to define or reference a Document Type Definition (DTD) for XML validation
purposes (DTDs will be presented in Lab 03).
The XML document components preceding the root element (see below) are known as the prolog of
the XML document.
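As a rough illustration of the prolog, the sketch below uses Python's `xml.dom.minidom`, which keeps comments and processing instructions that appear before the root element as children of the document node. The document content and the stylesheet name `note.css` are hypothetical. The XML declaration itself is not a node, and when present it must be the very first thing in the document.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

# Hypothetical document: everything before <note> is the prolog.
DOC = """<?xml version="1.0"?>
<!-- prolog comment -->
<?xml-stylesheet type="text/css" href="note.css"?>
<note>Hello</note>"""

doc = minidom.parseString(DOC)

# Readable names for the node types expected at document level.
KIND = {
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.ELEMENT_NODE: "element",
}
top_level = [KIND[n.nodeType] for n in doc.childNodes if n.nodeType in KIND]
print(top_level)                    # ['comment', 'processing instruction', 'element']
print(doc.documentElement.tagName)  # note

# The XML declaration must come first: even one leading space is rejected.
try:
    minidom.parseString(" " + DOC)
    leading_space_rejected = False
except ExpatError:
    leading_space_rejected = True
print(leading_space_rejected)       # True
```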
Note:
• Exception for Rule 3: empty elements.
An XML document observing these rules is considered “well-formed” (in W3C terminology!), that is,
“syntactically correct”.
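A quick way to check well-formedness is to hand the document to a conforming parser, which must reject anything that breaks a well-formedness rule. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "proper nesting":     "<a><b>text</b></a>",
    "empty element":      "<a><br/></a>",
    "missing end tag":    "<a><b>text</a>",
    "overlapping tags":   "<a><b>text</a></b>",
    "case mismatch":      "<Note>hi</note>",
    "unquoted attribute": "<a id=1/>",
    "two root elements":  "<a/><b/>",
}

# Only the first two samples are accepted; all the others raise ParseError.
for label, snippet in samples.items():
    print(f"{label:20} -> {is_well_formed(snippet)}")
```

The empty element `<br/>` needs no separate end tag, which is the exception mentioned in the note above.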
1.4.2. XML Naming Rules
XML names have to observe the following rules:
• XML Names are case-sensitive
• XML Names cannot start with a number or punctuation character (except “:” and “_”)
• XML Names should not start with the letters xml (or XML, or Xml, etc.): such names are reserved for standardization by the W3C XML specification
• XML Names can contain letters, digits, and other characters (“:”, “_”, “-”, “.”, …)
• XML Names cannot contain spaces
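The naming rules can be tried out by wrapping a candidate name in an empty element and letting a parser judge it. This is a rough sketch: the helper `is_valid_element_name` is hypothetical and only meant for simple names (input containing `=` or quotes could fool it). Because `xml.etree.ElementTree` processes namespaces, a name with a colon is only accepted when its prefix is declared, so colons are left out here.

```python
import xml.etree.ElementTree as ET

def is_valid_element_name(name: str) -> bool:
    """Check a candidate element name by parsing a tiny empty element."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

# Accepted: first-name, _id, item.2.  Rejected: 2nd_item, -item, "first name".
for name in ["first-name", "_id", "item.2", "2nd_item", "-item", "first name"]:
    print(f"{name!r:14} -> {is_valid_element_name(name)}")

# Names are case-sensitive: <Item/> and <item/> are different elements.
print(ET.fromstring("<Item/>").tag == ET.fromstring("<item/>").tag)  # False
```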
Note:
• An XML document observing these rules (1.4.1. & 1.4.2.) is considered “well-formed” (in W3C
terminology!), that is, “syntactically correct”.
1.4.3. The content of an Element
The information located between the start tag of an element and its end tag is called the content of the
element. It is described by the so-called content model, of which there are four kinds:
- empty content (i.e. there is nothing between the tags of the element)
- simple text (i.e. only data is stored in the element)
- child elements (i.e. describing the hierarchical structure of the element)
- mixed content (i.e. both data and structure)
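The four content models can be told apart mechanically. The sketch below is a hypothetical helper, `content_model`, built on `xml.etree.ElementTree`, where an element's character data sits in `.text` and in the `.tail` of each child. It assumes that whitespace-only text (indentation) is insignificant, which is a simplification: strictly speaking, whitespace is character data too.

```python
import xml.etree.ElementTree as ET

def content_model(elem: ET.Element) -> str:
    """Classify an element's content; whitespace-only text is ignored."""
    has_children = len(elem) > 0
    # Character data sits in elem.text and in the .tail of each child element.
    texts = [elem.text or ""] + [child.tail or "" for child in elem]
    has_text = any(t.strip() for t in texts)
    if not has_children:
        return "simple text" if has_text else "empty"
    return "mixed" if has_text else "child elements"

# Hypothetical document with one child element of each kind.
DOC = """<book>
  <cover/>
  <title>XML Basics</title>
  <authors><author>Ann</author><author>Bob</author></authors>
  <blurb>Covers <em>well-formed</em> documents.</blurb>
</book>"""

root = ET.fromstring(DOC)
for child in root:
    print(f"{child.tag:8} -> {content_model(child)}")
```

Here `cover` is empty, `title` holds simple text, `authors` holds only child elements, and `blurb` has mixed content; the root `book` itself counts as child elements because its only text is indentation.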
7. View it again in various browsers (Google Chrome, Mozilla Firefox, Microsoft IE/Edge, Opera, …) and
maybe in Microsoft Word. Notice the differences (if any).
1.10. References
1. W3C XML 1.0 Recommendation: Extensible Markup Language (XML) 1.0 (Fifth Edition),
W3C Recommendation 26 November 2008, https://www.w3.org/TR/2008/REC-xml-20081126/
2. W3C Namespaces: Namespaces in XML 1.0 (Third Edition),
W3C Recommendation 8 December 2009, https://www.w3.org/TR/2009/REC-xml-names-20091208/
3. XML Tutorial (from Introduction to Display),
https://www.w3schools.com/xml/
4. XML Tutorial (from Overview to White-spaces),
https://www.tutorialspoint.com/xml/index.htm
5. Beginning XML, 5th Edition, Joe Fawcett, Liam Quin, Danny Ayers, John Wiley & Sons, Inc., 2012
(Chapters 1, 2, 3, but without Advanced Parsing, XML Infoset, XML Schema & Common Namespaces)
Note: There are plenty of free books on the Lab topics to be found on the Internet. If the proposed References do
not satisfy your needs, you are encouraged to search for the one that best suits your way of learning!