A Quick Introduction To XML
This document provides a quick introduction to some of the terms and concepts used in the analysis of
the XML documents in the Tutorial section of the CellML website. The terms are taken from the original XML specification [http://www.w3.org/TR/1998/REC-xml-19980210] published in February 1998 by
the World Wide Web Consortium.
The following online resources provide more thorough documentation on XML:
The following list of terms is by no means exhaustive, and the definitions are in some cases incomplete:
XML
XML stands for eXtensible Markup Language, and it is a standard
for structured text documents developed by the World Wide Web
Consortium [http://www.w3.org/] (W3C). The W3C represents
about 500 paying member companies and is responsible for many of
the standards relating to the World Wide Web, including HTML. XML can be
used to structure text in such a way that it is readable by both humans and machines, and it presents a simple format for the exchange
of information across the internet between computers. As such, electronic commerce is one of the principal application areas for XML.
XML is a simplification (or subset) of the Standard Generalized
Markup Language (SGML), which was developed in the 1970s for
the large-scale storage of structured text documents.
XML document
An XML document contains a prolog and a body. The prolog consists of an XML declaration, possibly followed by a document type
declaration. The body is made up of a single root element, possibly
with some comments and/or processing instructions. An XML document is typically a computer file whose contents meet the requirements laid out in the XML specification. However, XML documents
may also be generated "on the fly" by a computer responding to a request from another computer. For instance, an XML document may
be dynamically compiled from information contained in a database.
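The single-root-element rule can be checked with a short Python sketch using the standard library's xml.etree.ElementTree module. The element names (notes, note) and the helper root_tag are hypothetical and chosen only for illustration; this is one way to observe the rule, not a full conformance check.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an XML declaration (the prolog) followed by a
# body consisting of exactly one root element, <notes>.
WELL_FORMED = b"""<?xml version="1.0" encoding="UTF-8"?>
<notes>
  <note>first</note>
  <note>second</note>
</notes>"""

# A second top-level element violates the single-root-element rule.
TWO_ROOTS = b"""<?xml version="1.0" encoding="UTF-8"?>
<notes/>
<notes/>"""


def root_tag(document):
    """Return the root element's name, or None if the parser rejects the input."""
    try:
        return ET.fromstring(document).tag
    except ET.ParseError:
        return None


print(root_tag(WELL_FORMED))  # name of the single root element
print(root_tag(TWO_ROOTS))    # None: the parser rejects a second root
```

The same function works on a document generated "on the fly", since the parser only sees the resulting bytes and not where they came from.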
XML Declaration
An XML document should begin with an XML declaration; when one is
present, it must be the very first thing in the document. The declaration is used by the processing software to work out how to deal with the subsequent XML content. A
typical XML declaration is shown below. The encoding of a document is particularly important, as XML processors will default to
UTF-8 when reading an 8-bit-per-character document that does not declare its encoding. This will
cause characters to be rendered incorrectly if the document uses the Latin-1 encoding (ISO-8859-1). XML processing applications are required
The Uniform Resource Identifier (URI) in a document type declaration can point to a document known as a document type definition
(DTD). The format for a DTD is defined in the XML Specification
and is not the same as for an XML document. A DTD may contain a
set of rules that specify how the different tags in an XML document
can be used together and the attributes that may belong to each tag.
Most XML processors provide checking of XML documents against
a DTD, allowing applications to quickly and painlessly check that
the structure of an XML document is roughly correct.
DTDs do not allow the specification of constraints on element and
attribute content, such as "the value of the att_1 attribute must be a
number". This kind of validation can be handled by using XML
Schema [http://www.w3.org/XML/Schema], the successor to DTDs,
which defines an XML-based file format.
Comment
A document author can place comments in XML documents to add
annotations intended for other humans reading the document. The
contents of a comment are not regarded as part of the document's
data. A comment is started with a less-than sign, exclamation mark,
and two hyphens, and is ended with two hyphens and a greater-than
sign, as shown below. Comments may not be placed inside start- or
end-tags.
<my_tag> content <!-- comment on content --></my_tag>
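The following Python sketch, again using the standard library's xml.etree.ElementTree, illustrates both points: a comment in element content is not treated as data, and a comment inside a start-tag is rejected. The helper name parses is hypothetical, and discarding comments is the default behaviour of this particular parser rather than something every XML tool does.

```python
import xml.etree.ElementTree as ET

# The example from the text: a comment placed inside element content.
root = ET.fromstring("<my_tag> content <!-- comment on content --></my_tag>")

# The default ElementTree parser discards comments, so only the
# character data of <my_tag> remains and no child nodes are created.
print(repr(root.text))
print(len(root))


def parses(document):
    """Return True if the string is accepted by the parser, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A comment placed inside a start-tag makes the document ill-formed.
print(parses("<my_tag <!-- not allowed here -->> content </my_tag>"))
```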
XML Namespace
Namespaces in XML [http://www.w3.org/TR/REC-xml-names/] is a
companion specification to the main XML specification. It provides
a facility for associating the elements and/or attributes in all or part
of a document with a particular schema, as indicated by a URI. The
key aspect of the URI is that it is unique. The value of the URI need
not have anything to do with the XML document that uses it, although typically it would be a good location for the XML Schema or
DTD that defines the rules for the document type. The URI may be
mapped to a prefix which may then be used in front of tag and attribute names, separated by a colon. If not mapped to a prefix, the
URI sets the default schema for the current element and all of its
children.
A namespace declaration looks like an attribute on a start tag, and
can be identified by the keyword xmlns. In the following example,
the default namespace is set to the CellML namespace, and the
MathML namespace is declared and mapped to the mathml prefix,
which is then used on a <math> element. Note that the <model>
element and any child elements with no default namespace declaration or namespace prefix of their own (such as the <component> element)
will be in the CellML namespace.
<model
    xmlns="http://www.cellml.org/cellml/1.1#"
    xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <component> ... </component>
  <mathml:math> ... math goes here ... </mathml:math>
</model>
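How a namespace-aware parser resolves these declarations can be seen with a small Python sketch based on the standard library's xml.etree.ElementTree. It assumes the CellML 1.1 and MathML namespace URIs http://www.cellml.org/cellml/1.1# and http://www.w3.org/1998/Math/MathML, and uses a simplified copy of the example with empty elements; the constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for this sketch.
CELLML_NS = "http://www.cellml.org/cellml/1.1#"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Simplified version of the example, with the element content left empty.
DOCUMENT = (
    '<model xmlns="' + CELLML_NS + '" xmlns:mathml="' + MATHML_NS + '">'
    "<component/>"
    "<mathml:math/>"
    "</model>"
)

root = ET.fromstring(DOCUMENT)

# ElementTree reports each element name as "{namespace-URI}local-name",
# so the prefix (or its absence) is replaced by the URI it maps to.
tags = [root.tag] + [child.tag for child in root]
for tag in tags:
    print(tag)
```

The unprefixed <model> and <component> elements are reported with the CellML URI, while <mathml:math> is reported with the MathML URI; the mathml prefix itself does not appear in the parsed names.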