XML Simplified References

Aptech Ltd Version 1.1 Page 1 of 10



Table of Contents

1. Session 2: Namespaces
   Principles to Declare Namespaces
   XML 1.1 and Namespaces 1.1 revealed

2. Session 4: XML Schema
   Named and Anonymous Complex Types


Session 2: Namespaces
Principles to Declare Namespaces

Source: http://www-128.ibm.com/developerworks/xml/library/x-namcar.html
Date of Retrieval: 19/2/2009

Basic principles
The mechanism of XML namespaces has several moving parts: local names, namespace URIs, prefixes,
and declarations. The most important step in using namespaces effectively is to learn how to keep these
straight.

Local names

The point of namespaces is that you can use the best concise name for each element or attribute within
each context and then put these names in a namespace that distinguishes the context. The concise part
of the name, which need only be unique within its own context, is the local name. Be sure to take
advantage of the distinguishing context and don't repeat in local names information that's already
inherent in the namespace itself. For example, you don't need to name the linking element in the
XHTML namespace xhtml-link. Since it is already local to the XHTML namespace, just link will do.
For historical reasons, the XHTML specifications themselves go against this guideline when naming
the root element html; it could just as well have been named document.

Namespace URIs

A namespace is a string with the syntax of a URI (often redundantly called the "namespace URI"). The
namespace is an integral part of the element's or attribute's name. The combination of a local name and
a namespace is called a universal name. In order to highlight the namespace's importance, XML pioneer
James Clark developed a notation for universal names that emphasizes how fundamentally namespace
and local name are bound. For example, the universal name with local part customer and namespace
http://uche.ogbuji.net/eg/ns is written in Clark's notation as
{http://uche.ogbuji.net/eg/ns}customer.
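
As a rough illustration, Python's standard xml.etree.ElementTree module happens to report element names in Clark's notation. The sketch below parses a one-element document and splits the resulting universal name back into its two parts. The prefix eg, the text content, and the helper split_universal_name are assumptions made for this example, not part of the original article.

```python
import xml.etree.ElementTree as ET

# Sample document; the prefix "eg" and the text content are invented for illustration.
DOC = '<eg:customer xmlns:eg="http://uche.ogbuji.net/eg/ns">Acme</eg:customer>'

def split_universal_name(tag):
    """Split a Clark-notation name into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag  # the name is in no namespace

root = ET.fromstring(DOC)
universal_name = root.tag  # ElementTree stores names in Clark notation
namespace, local_name = split_universal_name(universal_name)
print(universal_name)
```

Note that the prefix does not appear anywhere in the parsed name; only the namespace and the local name survive.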

Choosing the namespace URI is important. Whether it's better to use URLs or URNs is the source of
some debate. The former have the advantage of familiarity, but people often create namespace URLs
that do not have any corresponding resource -- that is, if you browse to the equivalent URL you get a
404 "not found" error. URNs have the advantage that they don't encourage people to try to look them
up in browsers. Use URLs for namespaces if you are careful to place some sort of document at the URL
that would be useful for a reader. I recommend placing an RDDL 1.0 document at URLs that correspond
to namespaces, unless more specialized conventions apply. For example, in RDF/XML documents,
namespaces often lead to RDF schema documents when resolved as URLs. URNs have many classes
(classes of URNs are formally called "namespaces", not to be confused with XML namespaces). If you
don't wish to use URLs, use URNs if your organization has a means of managing and resolving a
suitable class of URN. Examples of URN namespaces include oid (an ISO-sanctioned system for assigning
numerically coded identifiers to network nodes) and publicid (formal public identifier entities as defined
in SGML and XML).

Prefixes

When specifying a universal name in an XML document, you use an abbreviation based on an optional
prefix that's attached to the local name. This abbreviation is called the qualified name or qname. The
prefix is optional because a special syntactical form allows you to specify a default namespace which is
associated with qnames that have no prefix. The prefix is strictly a syntactic convenience; in general, it is
not really a matter of XML language design but rather a matter of author or tool preference. I call such
issues instance details and I only cover them in these articles on design when in my experience the
designer has no choice but to consider them. I recommend that you publish well-known prefixes for
namespaces but never make any prefix mandatory. Choose well-known prefixes for a namespace when
creating documents but accept any chosen prefix for a namespace when reading documents.
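
To make the advice about accepting any prefix concrete, here is a minimal sketch using Python's standard ElementTree parser. It reads three spellings of the same element, two with arbitrary prefixes and one using the default namespace, and checks that all of them resolve to one universal name. The prefixes eg and c are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://uche.ogbuji.net/eg/ns"

# Three spellings of the same universal name; the prefixes are arbitrary.
docs = [
    f'<eg:customer xmlns:eg="{NS}"/>',
    f'<c:customer xmlns:c="{NS}"/>',
    f'<customer xmlns="{NS}"/>',  # default namespace, no prefix
]

# A namespace-aware reader compares universal names, never prefixes.
tags = [ET.fromstring(doc).tag for doc in docs]
same_name = len(set(tags)) == 1
print(tags[0], same_name)
```

Code that matches on the qname text (for example the string "eg:customer") would accept only the first document, which is exactly the mandatory-prefix trap described above.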

Namespace declarations

The namespace declaration is the syntactic device through which prefixes are assigned to namespaces in
an XML document. This is technically an instance detail, but important enough that I devote a section to
guidelines for namespace declarations.

Use and evolution of namespaces


Some designers start out not using namespaces and later on adopt namespaces as they feel the need to
mix vocabularies. Such a cautious approach can seem sensible considering how tricky namespaces can
be. The problem is that since namespaces are a fundamental part of XML names, this change is more
significant than you might realize. It requires extensive changes in tools and other related materials. You
can deal with name clashes in other ways. Other than namespaces, the leading approaches are ideas
based on SGML architectural forms, in which names are directly declared and modified by tools in case
of clashes. Try to think as hard as possible about future developments for your XML design and be
decisive about whether to deal with name clashes, and how to do so. I have come to agree with many of
the criticisms of XML namespaces and dearly wish for a cleaner mechanism that was well established in
tools. For practical reasons based on my experience, these days I use namespaces in almost all of my
XML designs.

It is also difficult to decide when to evolve or differentiate a namespace. A namespace can be used for
versioning, or to differentiate concepts within a domain. The key to best deciding when to do so is to
remember that the namespace is a basic part of the name. Change or differentiate the namespace only
when you want to make a real, fundamental distinction that defines each element and attribute. If a
version change significantly alters the meaning of names in an XML vocabulary, then a namespace
change is probably in order. Otherwise, use other versioning mechanisms such as adding a version
attribute to top-level elements.
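
A small sketch of the version-attribute approach, under the assumption of a hypothetical orders vocabulary (the element names and version values below are invented for illustration). The namespace, and therefore every universal name, stays the same across versions; only the attribute on the top-level element changes.

```python
import xml.etree.ElementTree as ET

NS = "http://uche.ogbuji.net/eg/ns"  # example namespace taken from the article text

# Hypothetical documents: same namespace, different "version" attribute.
V1 = f'<orders xmlns="{NS}" version="1.0"><customer/></orders>'
V2 = f'<orders xmlns="{NS}" version="1.1"><customer/><priority/></orders>'

def describe(doc):
    """Return the root's universal name and its version attribute."""
    root = ET.fromstring(doc)
    return root.tag, root.get("version")

name1, version1 = describe(V1)
name2, version2 = describe(V2)
print(name1 == name2, version1, version2)
```

Tools written against the first version keep matching the same names, and can branch on the version value where behavior really differs.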


The pitfalls of using namespaces to make distinctions within a domain are best illustrated by example. In
1999 XHTML 1.0 became a finalized proposal. It was really just an XML variation on HTML 4.01, which
has three separate DTDs: strict, transitional, and frameset. The XHTML working group decided to use
three separate namespaces for the corresponding XHTML DTDs. This decision was met with an uproar in
the XML community. The main problem was that even though three separate DTDs existed, the meaning
of each element didn't change significantly from one to another; a code element in the XHTML
transitional DTD essentially means the same thing as a code element in the XHTML strict DTD. By
changing the names in each case, the XHTML design was working against this fact. In the end, the
XHTML working group corrected things by issuing new specifications that used a single namespace
across the XHTML 1.0 domain. You should heed this lesson well. Make distinctions in XML namespaces
only when there are truly distinctions between the things being named.

Unfortunately, things are rarely black and white. A common situation is when a new version of a
vocabulary adds new elements. The meaning of the carried over elements may not have changed and so
a namespace change may seem improper. But if you use the same old namespace, it may also seem
improper to place the elements added in the new vocabulary in the original namespace. Using a
different namespace for only the new elements is rarely a sensible option. In the end, you have to use
your judgment to decide whether or not to evolve the namespace with the vocabulary. Some tricks with
namespaces may give you other options, but you should use even these with care.

~~~ End of Article ~~~


XML 1.1 and Namespaces 1.1 revealed

Source: http://www-128.ibm.com/developerworks/xml/library/x-xmlns11.html
Date of Retrieval: 19/2/2009

Why did the W3C define XML 1.1?


When the W3C created XML 1.0 in 1998, it chose to base its definition on Unicode 2.0, the then-current
version of the Unicode Standard. The Unicode Standard is meant to provide a unique number -- a code --
for every character in the world, so that all characters can be represented and correctly processed by
computers. Of course, assigning numbers to every character in the world is a task that takes time. For
this reason, the Unicode Consortium -- the standards body that defines Unicode -- has been working on
it for several years; it releases a new version of its standard every other year or so, with each version
including a whole new set of characters. What this means, however, is that systems that depend on the
Unicode Standard need to be either designed in a forward-compatible manner or updated to
accommodate new versions.

Unfortunately, XML 1.0 was not designed to fully accommodate new versions of Unicode. While
characters that were not present in Unicode 2.0 can be used in XML 1.0 character data, they are not
allowed in important parts of XML such as element and attribute names, or enumerated attribute
values.

The reason for this is that the designers of XML 1.0 chose to limit these constructs to a range of
characters that were defined (assigned numbers) at that time. Understandably, they felt that allowing
character codes not yet assigned to any character made no sense and was risky. Unfortunately, this also
means that when new characters are defined, they cannot be used without a change in the definition of
XML.

As subsequent versions of Unicode were released, the lack of support for the new characters they
brought created the need to revise XML. This, plus the discovery of a few flaws inherent in any first
version, inspired the W3C to charter its XML Core Working Group to do just that.

What are the main differences between XML 1.1 and XML 1.0?
In the early days of its work on XML 1.1, the XML Core Working Group discussed the possibility of
simply changing the base of XML from Unicode 2.0 to the latest available version of Unicode, which was
then 3.0, by simply adding the new characters to the existing constructs. However, this would only have
been a temporary solution and a few versions of Unicode later, the Working Group would have found
itself in a similar situation. Therefore, they considered a more radical approach: forward compatibility.


You are no doubt already familiar with backward compatibility: A system is said to be backward
compatible when it can deal with something that is older than what it is developed for. Forward
compatibility is the capability of dealing with future versions. Note that these two characteristics are not
exclusive -- something can be both backward and forward compatible.

Unlike XML 1.0, XML 1.1 is forward compatible with the Unicode Standard. This means that it is defined
in such a way that an XML 1.1 processor developed today is able to process documents that use
characters only assigned in future versions of the Unicode Standard.

How is this done? In essence, XML 1.0 defines constructs such as element names by explicitly allowing
certain characters and excluding all others. This excludes any character that is not yet assigned. XML 1.1
takes the opposite approach: It allows every possible character except certain characters. These
characters typically are characters that have special meaning for XML processors, such as the opening
angle bracket (<) or the space character, and characters that might cause problems, such as the null
character. This approach means that characters that will be added to Unicode in the future are in fact
already allowed in element names and other similar constructs.
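
The following sketch shows what this looks like for the first character of a name. The ranges are transcribed from the NameStartChar production of the XML 1.1 Recommendation; treat them as an illustration and check them against the specification before relying on them. The function name is_name_start_char is a hypothetical helper for this example.

```python
# Ranges of the NameStartChar production from the XML 1.1 Recommendation.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

def is_name_start_char(ch):
    """Return True if ch may begin an XML 1.1 name."""
    code = ord(ch)
    return any(low <= code <= high for low, high in NAME_START_RANGES)

print(is_name_start_char("a"), is_name_start_char("<"), is_name_start_char(chr(0x50000)))
```

The last range covers whole supplementary planes, including code points that have no character assigned to them, which is how names remain open to future versions of Unicode.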

This approach has one small drawback, though. If an XML 1.1 file contains a code that is not yet
assigned in Unicode -- meaning it does not correspond to any actual character -- your XML 1.1
processor processes it as if it were assigned, without issuing so much as a warning. In the end,
however, the benefits were considered to outweigh this drawback -- especially since you would have to
go out of your way to generate such characters in the first place, because most authoring tools do not
even allow you to do so.

Other differences
Since the XML Core Working Group was in the process of defining a new version of XML, it seemed
appropriate to fix some other shortcomings that plagued XML 1.0 at the same time. The first of these is
a misalignment between the definition of what marks the end of a line in XML and what Unicode defines
this to be. This particularly affects IBM and IBM-compatible mainframes, as well as any system that
communicates with them. On these mainframes, tools mark the end of a line with a character (NEL) that
is not recognized as such by XML 1.0. What this means is that when you generate an XML 1.0 document
with even a simple text editor on these systems and feed it to an XML 1.0 compliant processor,
your document is rejected as not well-formed. XML 1.1 addresses this problem by adding NEL (#x85) to
the list of characters that mark the end of a line. For completeness, it also adds the Unicode line
separator character (#x2028) to this list.
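
As a sketch of the resulting rule, the function below mimics the line-end handling that an XML 1.1 processor applies on input: each recognized line-end sequence is translated to a single line feed (#xA). The name normalize_line_ends is hypothetical, and the list of sequences follows my reading of the end-of-line section of the XML 1.1 Recommendation.

```python
import re

# Line-end sequences that XML 1.1 treats as a single line feed (#xA):
# CR LF, CR NEL, NEL, LINE SEPARATOR, and a lone CR.
# The two-character sequences come first so they are matched as a unit.
_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

def normalize_line_ends(text):
    """Translate every XML 1.1 line-end sequence to a single #xA."""
    return _LINE_END.sub("\n", text)

sample = "a\r\nb\x85c\u2028d\re\r\x85f"
normalized = normalize_line_ends(sample)
print(repr(normalized))
```

An XML 1.0 processor performs the same translation only for CR LF and a lone CR, which is why NEL-terminated lines cause trouble there.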

In addition, XML 1.1 allows you to have control characters in your documents through the use of
character references. This concerns the control characters #x1 through #x1F, most of which are
forbidden in XML 1.0. This means that your document can now include the bell character, like this:
&#07;. However, you still cannot have these characters appear directly in your documents; doing so
would violate the definition of the MIME type used for XML (text/xml), and might cause problems with
tools that expect XML files to contain only textual characters and that treat control characters in a
special way.
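
The XML 1.0 side of this can be observed with Python's standard ElementTree module, which is built on the expat parser, an XML 1.0 processor. In the sketch below, a reference to the bell character is rejected while a reference to the tab character is accepted. The helper parses_ok and the element name alert are invented for this example, and the sketch does not demonstrate XML 1.1 behavior.

```python
import xml.etree.ElementTree as ET

def parses_ok(document):
    """Return True if the XML 1.0 parser behind ElementTree accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

bell_reference = parses_ok("<alert>&#7;</alert>")  # reference to the bell character
tab_reference = parses_ok("<alert>&#9;</alert>")   # tab is one of the allowed controls
print(bell_reference, tab_reference)
```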


The last addition to XML 1.1 is character normalization checking. Even though the original intent of
Unicode was to provide a unique number for every character, certain characters -- or what users think of
as characters -- can actually be represented in more than one way. For instance, an "e" with an acute
accent (the "é" in "résumé") can be represented either as the single code assigned to that character
(#xE9) or as an equivalent sequence of multiple codes (#x65 for the "e" and #x301 for the acute accent).
Also, some characters don't have any unique code at all, like an "e" with a cedilla (the cedilla is the
mark below the "c" in "façade"). Instead, they can only be represented by combining several codes
together (in this case: #x65 "e", followed by #x327 cedilla). This is because there is an unlimited number of
possible combinations. Where there are multiple equivalent representations, simple string comparisons
may fail to recognize equivalent strings as equal. To solve this problem, Unicode defines several ways to
normalize strings before they are processed. XML 1.1 provides for XML 1.1 processors to verify whether
a document is in a normal form or not; in the absence of this information, application programmers may
need to perform normalization or make sure that their code does not rely on a particular form of text.
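
The comparison problem is easy to reproduce. This sketch uses Python's standard unicodedata module to compare the two representations of the accented "e" described above, first naively and then after normalizing both strings to NFC, one of the normalization forms Unicode defines. The sample word is an assumption chosen for the example.

```python
import unicodedata

precomposed = "r\u00e9sum\u00e9"   # e-acute as the single code #xE9
decomposed = "re\u0301sume\u0301"  # "e" (#x65) followed by combining acute (#x301)

# A plain string comparison sees two different code sequences.
naive_equal = precomposed == decomposed

# After normalization to the same form, the strings compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
print(naive_equal, nfc_equal)
```

Application code that compares element content or attribute values may want to normalize both sides this way when it cannot assume the document was checked.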

Where's all the noise about XML 1.1?


So why haven't you heard more about XML 1.1? In short, to avoid chaos. The success of XML is largely
due to its stability and universality. You can trust that any XML 1.0 compliant processor is able to
process your well-formed XML 1.0 data. Introducing a new version of XML is basically like introducing a
new format -- it leads to having two sets of tools out there, the 1.0s and the 1.1s. Even if XML 1.1
processors are required to also support 1.0 (and therefore grok both 1.0 and 1.1 documents), the huge
collection of existing 1.0 tools will choke on XML 1.1 documents. For this reason, it was important for
XML 1.1 to be introduced carefully. The way the W3C has chosen to approach this difficult problem is by
recommending that applications that produce XML documents keep using XML 1.0 as much as possible,
and only use XML 1.1 when necessary. In practice, this means that unless you have a reason to change
anything, you shouldn't. This is why most people haven't seen any XML 1.1 yet. Tools like Xerces have
been supporting XML 1.1 for several months and few people have noticed. This strategy allows the
deployment of XML 1.1 processors without creating a mess that might be detrimental to the computer
industry as a whole.

In practice, though, the W3C recommendation can be hard to follow, because it requires you to know
whether your data actually needs XML 1.1. Unless you get this information along with the data, this
can be costly to find out. Obviously, it would be much easier to simply always generate XML 1.1
documents. Ideally, the time when that is practical will come before too long.

But even then, you need to be aware of one special case. You'll recall that earlier I mentioned forward
and backward compatibility -- well, unfortunately XML 1.1 isn't fully backward compatible with XML 1.0.
Indeed, a few XML 1.0 characters are not allowed in XML 1.1. These are the control characters #x7F
through #x9F which are now restricted to appear as character references to improve the robustness of
character encoding detection. This may seem odd in a version that is meant to allow for more characters
to be directly contained in an XML document, but the benefits on the encoding detection front were
considered to outweigh this inconsistency and to be significant enough to justify the small
incompatibility. In practice, this still means that you have to look for these characters in your data when
you generate XML 1.1 documents.
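
One way to look for these characters when generating XML 1.1 is sketched below: every restricted control character in a piece of character data is replaced with a numeric character reference. The character class follows the RestrictedChar production of the XML 1.1 Recommendation as I understand it, and the function name escape_restricted is hypothetical. The sketch does not escape markup characters such as & and <, and it does nothing about the null character, which remains forbidden.

```python
import re

# RestrictedChar production of XML 1.1: control characters that may appear
# only as character references (tab, LF, CR and NEL are not restricted).
_RESTRICTED = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")

def escape_restricted(text):
    """Replace each restricted character with a hexadecimal character reference."""
    return _RESTRICTED.sub(lambda m: "&#x%X;" % ord(m.group()), text)

escaped = escape_restricted("bell:\x07 c1:\x80 nel:\x85")
print(escaped)
```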


Sharing external entities between 1.0 and 1.1 documents


As people start generating XML 1.1 documents, more and more of them will want to share external
entities between the 1.0 and 1.1 documents. One of the features of XML is to allow reuse of content by
providing a way to store content in separate files, and to include them in one another. Such pieces of
XML are called external entities. The introduction of XML 1.1 raises the question of how these are
handled in a mixed environment where XML 1.0 entities are included in XML 1.1 documents. For
simplicity, the XML 1.1 specification says that entities are treated according to the document in which
they are used. In practice, this means that you can use your old XML 1.0 entities in your new XML 1.1
documents; you don't need to convert or duplicate them to have them labeled as XML 1.1. The only
possible problem is that if you add an XML 1.1 only character to an XML 1.0 entity, the processor would
not detect it and would treat it as XML 1.1 input. However, this is only a problem if you then try to use
that entity as part of an XML 1.0 document again.

The Namespaces 1.1 specification


At the same time that the W3C released the XML 1.1 Recommendation, it released its companion
"Namespaces in XML 1.1" specification. This new version of the so-called XML Namespaces added little
to the previous version. For the most part it exists because "Namespaces in XML 1.0" is, by the way it's
defined, limited to XML 1.0, and cannot strictly speaking be used with XML 1.1. The new version
addresses that problem. However, that is not all: This new version brings one additional feature that is
worth mentioning. You may have been wondering for a long time why you're allowed to undeclare the
default namespace but you can't undeclare a specific namespace prefix. This was deemed unnecessary
by the original designers, but it has been bugging many people. It makes the model irregular, and that
gets reflected in the Infoset. This new version addresses this shortcoming by allowing you to do the
obvious -- undeclare a prefix by associating it to the empty namespace, like this: xmlns:foo="".
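
The irregularity in the 1.0 rules can be seen with Python's standard ElementTree module, whose underlying parser follows Namespaces in XML 1.0. In this sketch, undeclaring the default namespace is accepted, while undeclaring a prefix is reported as an error. The helper parses_ok, the element names, and the urn:example:ns namespace are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def parses_ok(document):
    """Return True if the namespace-aware XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Undeclaring the default namespace is allowed by Namespaces in XML 1.0.
default_undeclared = parses_ok('<a xmlns="urn:example:ns"><b xmlns=""/></a>')

# Undeclaring a prefix is only allowed by Namespaces in XML 1.1.
prefix_undeclared = parses_ok('<a xmlns:foo="urn:example:ns"><b xmlns:foo=""/></a>')
print(default_undeclared, prefix_undeclared)
```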

Does the XML Infoset specification have a 1.1 version?


The nature of the changes brought by XML 1.1 and Namespaces 1.1 does not necessitate a 1.1 version of
the Infoset specification. When the W3C released the other two recommendations, it also released a
new edition of the XML Information Set Recommendation that describes the impact of these specs, but
that impact is basically limited to what content one can find in the Infoset. No structural change was
made to the data model, so there was no need to define new information items or modify
existing ones. This means that you, the developer, do not have to worry much on that front either; if you
already handle Unicode characters in your programs, you should be able to deal with the new characters
introduced in XML 1.1 without changing anything.
~~~ End of Article ~~~


Session 4: XML Schema


Named and Anonymous Complex Types
Source: http://csharpcomputing.com/XMLTutorial/Lesson11.htm
Date of Retrieval: 19/2/2009

Just as with Simple Types, elements can use either named or unnamed (anonymous) complex types.
The following two boxes show both kinds of declaration for an element:

<!-- Named complex type, referenced by the element through its type attribute -->
<xs:complexType name="weightType">
    <xs:sequence>
        <xs:element name="unit"/>
        <xs:element name="conversion"/>
    </xs:sequence>
</xs:complexType>
<xs:element name="weight" type="weightType"/>

<!-- Anonymous complex type, nested directly inside the element -->
<xs:element name="weight">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="unit"/>
            <xs:element name="conversion"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

The first box declares a Complex Type named weightType, which is then used to type the element
weight. In the second box, however, the element weight directly uses an anonymous Complex Type
within itself. Note that the definitions of the Complex Types in both boxes are exactly the same.
Only the method of referencing is different.
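
To check that claim mechanically, the sketch below wraps each declaration in a minimal xs:schema element and resolves the complex type of weight either by name or by looking inside the element. It uses Python's standard ElementTree module, which only parses the schema as XML and does not validate anything. The function child_names is a hypothetical helper, and it ignores details such as prefixed type names.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

NAMED = f"""<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="weightType">
    <xs:sequence>
      <xs:element name="unit"/>
      <xs:element name="conversion"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="weight" type="weightType"/>
</xs:schema>"""

ANONYMOUS = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="weight">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="unit"/>
        <xs:element name="conversion"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def child_names(schema_text, element_name):
    """List the child element names of a top-level element's complex type."""
    schema = ET.fromstring(schema_text)
    element = schema.find(f"{{{XS}}}element[@name='{element_name}']")
    type_name = element.get("type")
    if type_name is None:
        # Anonymous type: the definition is nested inside the element.
        complex_type = element.find(f"{{{XS}}}complexType")
    else:
        # Named type: look the definition up by name at the top level.
        complex_type = schema.find(f"{{{XS}}}complexType[@name='{type_name}']")
    return [e.get("name") for e in complex_type.iter(f"{{{XS}}}element")]

named_children = child_names(NAMED, "weight")
anonymous_children = child_names(ANONYMOUS, "weight")
print(named_children == anonymous_children)
```

A practical difference is that the named type can be reused by other elements, whereas the anonymous type belongs to the single element that contains it.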
DTDs and Schemas Conclude
Over the last four chapters we have seen in fair depth how the syntax and structure of XML
documents are specified. The earlier Document Type Definition and the later Schema have been
the basis for many an XML document specification, and Schemas are now gradually replacing
DTDs as the default method.

With that, we have reached Camp II and the end of XML proper! Now we embark on a very
interesting trail: the transformation of XML. We will see some directly useful things that
will explain why XML is surely taking centre stage when it comes to data interchange and
display across sources, channels and endpoints.

~~~ End of Article ~~~
