What Is XML?
Some markup languages (eg those used in word processors) only describe
appearances (‘this is italics’, ‘this is bold’),
but this method can only be used for display, and is not normally
re-usable for anything else.
3. Where should I use XML?
Its goal is to enable generic SGML to be served, received, and
processed on the Web in the way that is now possible with HTML.
XML has been designed for ease of implementation and for interoperability
with both SGML and HTML.
Despite early attempts, browsers never allowed other SGML, only
HTML (although there were plugins), and they allowed it (even encouraged
it) to be corrupted or broken, which held development back for over
a decade by making it impossible to program for it reliably. XML
fixes that by making it compulsory to stick to the rules, and by
making the rules much simpler than SGML.
But XML is not just for Web pages: in fact it’s very rarely used
for Web pages on its own because browsers still don’t provide reliable
support for formatting and transforming it. Common uses for XML
include:
* Information identification: because you can define your own markup,
you can define meaningful names for all your information items.
* Information storage: because XML is portable and non-proprietary,
it can be used to store textual information across any platform.
Because it is backed by an international standard, it will remain
accessible and processable as a data format.
* Information structure: XML can therefore be used to store and identify
any kind of (hierarchical) information structure, especially for long,
deep, or complex document sets or data sources, making it ideal for
an information-management back-end to serving the Web. This is its
most common Web application, with a transformation system to serve
it as HTML until such time as browsers are able to handle XML consistently.
* Publishing: the original goal of XML as defined in the quotation
at the start of this section. Combining the three previous topics
(identity, storage, structure) means it is possible to get all the
benefits of robust document management and control (with XML) and
publish to the Web (as HTML) as well as to paper (as PDF) and to
other formats (eg Braille, Audio, etc) from a single source document
by using the appropriate stylesheets.
* Messaging and data transfer: XML is also very heavily used for
enclosing or encapsulating information in order to pass it between
different computing systems which would otherwise be unable to communicate.
By providing a lingua franca for data identity and structure, it
provides a common envelope for inter-process communication (messaging).
* Web services: building on all of these, as well as its use in browsers,
machine-processable data can be exchanged between consenting systems,
where before it was only comprehensible by humans (HTML). Weather
services, e-commerce sites, blog newsfeeds, AJaX sites, and thousands
of other data-exchange services use XML for data management and
transmission, and the web browser for display and interaction.
XML:
* User-definable tags
* Content driven
* End tags required for well-formed documents
* Quotes required around attribute values
* Slash required in empty tags
HTML:
* Defined set of tags designed for web display
* Format driven
* End tags not required
* Quotes not required
* Slash not required
7. What is SGML?
SGML is the Standard Generalized Markup Language (ISO 8879:1986),
the international standard for defining descriptions of the structure
of different types of electronic document. There is an SGML FAQ
from David Megginson at http://math.albany.edu:8800/hm/sgml/cts-faq.html;
and Robin Cover’s SGML Web pages are at http://www.oasis-open.org/cover/general.html.
For a little light relief, try Joe English’s ‘Not the SGML
FAQ’ at http://www.flightlab.com/~joe/sgml/faq-not.txt.
* XML allows documents which are all the same type to be created
consistently and without structural errors, because it provides
a standardized way of describing, controlling, or allowing/disallowing
particular types of document structure. [Note that this has absolutely
nothing whatever to do with formatting, appearance, or the actual
text content of your documents, only the structure of them.]
* XML provides a robust and durable format for information storage
and transmission. Robust because it is based on a proven standard,
and can thus be tested and verified; durable because it uses plain-text
file formats which will outlast proprietary binary ones.
* XML provides a common syntax for messaging systems for the exchange
of information between applications. Previously, each messaging
system had its own format and all were different, which made inter-system
messaging unnecessarily messy, complex, and expensive. If everyone
uses the same syntax it makes writing these systems much faster
and more reliable.
* XML is free. Not just free of charge (free as in beer) but free
of legal encumbrances (free as in speech). It doesn’t belong to
anyone, so it can’t be hijacked or pirated. And you don’t have to
pay a fee to use it (you can of course choose to use commercial
software to deal with it, for lots of good reasons, but you don’t
pay for XML itself).
* XML information can be manipulated programmatically (under machine
control), so XML documents can be pieced together from disparate
sources, or taken apart and re-used in different ways. They can
be converted into almost any other format with no loss of information.
* XML lets you separate form from content. Your XML file contains
your document information (text, data) and identifies its structure:
your formatting and other processing needs are identified separately
in a style sheet or processing system. The two are combined at output
time to apply the required formatting to the text or data identified
by its structure (location, position, rank, order, or whatever).
</xsl:template>
20. How would you build a search engine for large volumes
of XML data?
The way candidates answer this question may provide insight into
their view of XML data. For those who view XML primarily as a way
to denote structure for text files, a common answer is to build
a full-text search and handle the data similarly to the way Internet
portals handle HTML pages. Others consider XML as a standard way
of transferring structured data between disparate systems. These
candidates often describe some scheme of importing XML into a relational
or object database and relying on the database’s engine for searching.
Lastly, candidates who have worked with vendors specializing in
this area often say that the best way to handle this situation
is to use a third-party software package optimized for XML data.
update="2001-11-22">
<name>Camshaft end bearing retention circlip</name>
<image drawing="RR98-dh37" type="SVG" x="476"
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get
off!</response>
</conversation>
Or they can be more complicated, with a Schema or Document Type
Definition (DTD; see question C.11) or internal subset (local DTD changes
in [square brackets]), and an arbitrarily complex nested structure:
type="URI" alignment="centered"/>
<white-space type="vertical" amount="24"/>
<author font="Baskerville" size="18/22"
style="italic">Vitam capias</author>
<white-space type="vertical" role="filler"/>
</titlepage>
Or they can be anywhere between: a lot will depend on how you want
to define your document type (or whose you use) and what it will
be used for. Database-generated or program-generated XML documents
used in e-commerce are usually unformatted (not for human reading)
and may use very long names or values, with multiple redundancy
and sometimes no character data content at all, just values in attributes:
<?xml version="1.0"?>
<ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"
ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
<ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-ID="BAC352437484">
<ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
ORDER-UPDATE-QUANTITY="2000"/>
</ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
</ORDER-UPDATE>
The parser must inform the application that white-space has occurred
in element content, if it can detect it. (Users of SGML will recognize
that this information is not in the ESIS, but it is in the Grove.)
<chapter>
<title>
My title for
Chapter 1.
</title>
<para>
text
</para>
</chapter>
In the example above, the application will receive all the pretty-printing
linebreaks, TABs, and spaces between the elements as well as those
embedded in the chapter title. It is the function of the application,
not the parser, to decide which type of white-space to discard and
which to retain. Many XML applications have configurable options
to allow programmers or users to control how such white-space is
handled.
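As a rough illustration of the point above, the following Python sketch parses the chapter example with the standard-library xml.etree.ElementTree module and leaves the white-space decision to the application. The two-space indentation in the sample string and the choice to collapse runs of white-space in the title are assumptions made for this example, not something the FAQ prescribes.

```python
import xml.etree.ElementTree as ET

# The chapter example, with assumed pretty-printing indentation.
doc = """<chapter>
  <title>
    My title for
    Chapter 1.
  </title>
  <para>
    text
  </para>
</chapter>"""

chapter = ET.fromstring(doc)
title = chapter.find("title")

# The parser hands over the text exactly as written, line breaks included.
raw_title = title.text

# The application decides what to keep: here, collapse all white-space runs.
clean_title = " ".join(raw_title.split())

# White-space between <chapter> and <title> is reported too.
between = chapter.text

print(repr(raw_title))
print(repr(clean_title))
print(repr(between))
```

Whether to collapse, strip, or preserve such white-space is the application's call; a different document type might need the line breaks kept.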
<List>
<Item>Chocolate</Item>
<Item>Music</Item>
<Item>Surfing</Item>
</List>
(assuming you put the DTD in that file). Now your editor will let
you create files according to the pattern:
<Shopping-List>
<Item>Chocolate</Item>
<Item>Sugar</Item>
<Item>Butter</Item>
</Shopping-List>
<invoice id="abc123"
xmlns="http://example.org/ns/books/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://acme.wilycoyote.org/xsd/invoice.xsd">
…
</invoice>
http://xml.silmaril.ie/faq.xml#ID(hypertext)
.child(1,#element,'answer')
.child(2,#element,'para')
.child(1,#element,'link')
This means the first link element within the second paragraph within
the answer in the element whose ID is hypertext (this question).
Count the objects from the start of this question (which has the
ID hypertext) in the XML source:
1. the first child object is the element containing the question;
2. the second child object is the answer (the answer element);
3. within this element go to the second paragraph;
4. find the first link element.
Eve Maler explained the relationship of XLink and XPointer as follows:
XLink governs how you insert links into your XML document, where
the link might point to anything (eg a GIF file); XPointer governs
the fragment identifier that can go on a URL when you’re linking
to an XML document, from anywhere (eg from an HTML file).
[Or indeed from an XML file, a URI in a mail message, etc…Ed.]
David Megginson has produced an xpointer function for Emacs/psgml
which will deduce an XPointer for any location in an XML document.
XML Spy has a similar function.
</header>
&chap1;
&chap2;
&chap3;
&chap4;
&chap5;
</novel>
The difference between this method and the one used for including
a DTD fragment (see question D.15, ‘How do I include one DTD
(or fragment) in another?’) is that this uses an external
general (file) entity which is referenced in the same way as for
a character entity (with an ampersand).
The one thing to make sure of is that the included file must not
have an XML or DOCTYPE Declaration on it. If you’ve been using one
for editing the fragment, remove it before using the file in this
way. Yes, this is a pain in the butt, but if you have lots of inclusions
like this, write a script to strip off the declaration (and paste
it back on again for editing).
The example above parses as:
1. Element person identified with Attribute corpid containing abc123
and Attribute birth containing 1960-02-31 and Attribute gender
containing female containing …
2. Element name containing …
3. Element forename containing text ‘Judy’ followed by …
4. Element surname containing text ‘O’Grady’
(and lots of other stuff too).
As well as built-in parsers, there are also stand-alone parser-validators,
which read an XML file and tell you if they find an error (like
missing angle-brackets or quotes, or misplaced markup). This is
essential for testing files in isolation before doing something
else with them, especially if they have been created by hand without
an XML editor, or by an API which may be too deeply embedded elsewhere
to allow easy testing.
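A stand-alone check of this kind can be sketched in a few lines of Python using the standard-library xml.etree.ElementTree parser. The function name check_well_formed and the two sample strings are made up for the illustration, and the sketch only tests well-formedness: it does not validate against a DTD or Schema.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return (True, None) if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

# One correct document and one with a missing </Item> end tag.
good = "<List><Item>Chocolate</Item><Item>Music</Item></List>"
bad = "<List><Item>Chocolate</Item><Item>Music</List>"

ok_good, _ = check_well_formed(good)
ok_bad, message = check_well_formed(bad)

print(ok_good)
print(ok_bad, message)
```

The parser's message includes a line and column, which is usually enough to locate misplaced markup in a hand-edited file.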
If your keyboard will not allow you to type the characters you want,
or if you want to use characters outside the limits of the encoding
scheme you have chosen, you can use a symbolic notation called ‘entity
referencing’. Entity references can either be numeric, using
the decimal or hexadecimal Unicode code point for the character
(eg if your keyboard has no Euro symbol (€) you can type &#8364;);
or they can be character, using an established name which you declare
in your DTD (eg <!ENTITY euro "&#8364;">) and then use as &euro; in
your document. If you
are using a Schema, you must use the numeric form for all except
the five below because Schemas have no way to make character entity
declarations. If you use XML with no DTD, then these five character
entities are assumed to be predeclared, and you can use them without
declaring them: &lt;, &gt;, &amp;, &quot;, and &apos;.
&apos;
The apostrophe or single-quote character (’) can be symbolised with
this character entity reference when you need to embed a single-quote
or apostrophe inside a string which is already single-quoted.
If you are using a DTD then you must declare all the character entities
you need to use (if any), including any of the five above that you
plan on using (they cease to be predeclared if you use a DTD). If
you are using a Schema, you must use the numeric form for all except
the five above because Schemas have no way to make character entity
declarations.
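The following Python sketch shows how a parser resolves numeric references and the predeclared entities when no DTD is present. It uses the standard-library xml.etree.ElementTree module; the element and attribute names (price, note, person, surname) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same code point (U+20AC, the Euro sign).
price = ET.fromstring("<price>&#8364;10 or &#x20AC;10</price>")

# Two of the five predeclared entities, used without any DTD.
note = ET.fromstring("<note>Fish &amp; chips &lt; 5</note>")

# &apos; inside an attribute value that is itself single-quoted.
person = ET.fromstring("<person surname='O&apos;Grady'/>")

print(price.text)
print(note.text)
print(person.get("surname"))
```

By the time the application sees the text, the references have already been replaced by the characters they stand for.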
If not, all that is needed is to edit the mime-types file (or its
equivalent: as a server operator you already know where to do this,
right?) and add or edit the relevant lines for the right media types.
In some servers (eg Apache), individual content providers or directory
owners may also be able to change the MIME types for specific file
types from within their own directories by using directives in a
.htaccess file. The media types required are:
* text/xml for XML documents which are ‘readable by casual
users’;
* application/xml for XML documents which are ‘unreadable
by casual users’;
* text/xml-external-parsed-entity for external parsed entities such
as document fragments (eg separate chapters which make up a book)
subject to the readability distinction of text/xml;
* application/xml-external-parsed-entity for external parsed entities
subject to the readability distinction of application/xml;
* application/xml-dtd for DTD files and modules, including character
entity sets.
The RFC has further suggestions for the use of the +xml media type
suffix for identifying ancillary files such as XSLT (application/xslt+xml).
If you run scripts generating XHTML which you wish to be treated
as XML rather than HTML, they may need to be modified to produce
the relevant Document Type Declaration as well as the right media
type if your application requires them to be validated.
In the Web development area, the biggest thing that XML offers is
fixing what is wrong with HTML:
* browsers allow non-compliant HTML to be presented;
* HTML is restricted to a single set of markup (‘tagset’).
If you let broken HTML work (be presented), then there is no motivation
to fix it. Web pages are therefore tag soup that is useless for
further processing. XML specifies that processing must not continue
if the XML is non-compliant, so you keep working at it until it
complies. This is more work up front, but the result is not a dead-end.
If you wanted to mark up the names of things: people, places, companies,
etc in HTML, you don’t have many choices that allow you to distinguish
among them. XML allows you to name things as what they are:
<person>Charles Goldfarb</person> worked
at <company>IBM</company>
gives you a flexibility that you don’t have with HTML:
<B>Charles Goldfarb</B> worked at <B>IBM</B>
With XML you don’t have to shoe-horn your data into markup that
restricts your options.
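To make the difference concrete, here is a small Python sketch that pulls people and companies out of the XML version. With the bold-tag HTML version the two kinds of name cannot be told apart. The enclosing para element is an assumption added so that the fragment has a single root; it is not part of the original example.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, wrapped in an assumed <para> root element.
xml_text = (
    "<para><person>Charles Goldfarb</person> worked at "
    "<company>IBM</company></para>"
)

para = ET.fromstring(xml_text)

# Each kind of thing can be selected by the name it was given.
people = [e.text for e in para.iter("person")]
companies = [e.text for e in para.iter("company")]

print(people)
print(companies)
```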
<State>State</State>
<Country>Country</Country>
<PostalCode>H98d69</PostalCode>
</Address>
and:
<?xml version="1.0" ?>
<Server>
<Name>OurWebServer</Name>
<Address>888.90.67.8</Address>
</Server>
Each document uses a different XML language and each language defines
an Address element type. Each of these Address element types is
different — that is, each has a different content model, a different
meaning, and is interpreted by an application in a different way.
This is not a problem as long as these element types exist only
in separate documents. But what if they are combined in the same
document, such as a list of departments, their addresses, and their
Web servers? How does an application know which Address element
type it is processing?
One solution is to simply rename one of the Address element types
– for example, we could rename the second element type IPAddress.
However, this is not a useful long term solution. One of the hopes
of XML is that people will standardize XML languages for various
subject areas and write modular code to process those languages.
By reusing existing languages and code, people can quickly define
new languages and write applications that process them. If we rename
the second Address element type to IPAddress, we will break any
code that expects the old name.
A better answer is to assign each language (including its Address
element type) to a different namespace. This allows us to continue
using the Address name in each language, but to distinguish between
the two different element types. The mechanism by which we do this
is XML namespaces.
(Note that by assigning each Address name to an XML namespace, we
actually change the name to a two-part name consisting of the name
of the XML namespace plus the name Address. This means that any
code that recognizes just the name Address will need to be changed
to recognize the new two-part name. However, this only needs to
be done once, as the two-part name is universally unique.)
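A minimal sketch of the idea, using Python's standard-library xml.etree.ElementTree: two Address element types live in one document and are told apart by their two-part names. The namespace URIs are the illustrative ones used in this FAQ's namespace examples; the Department, Country, and Server elements and the serv prefix are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ADDR = "http://www.google.com/ito/addresses"   # postal-address language
SERV = "http://www.google.com/ito/servers"     # server-description language

doc = f"""<Department xmlns:addr="{ADDR}" xmlns:serv="{SERV}">
  <addr:Address><addr:Country>Country</addr:Country></addr:Address>
  <serv:Server><serv:Address>127.66.67.8</serv:Address></serv:Server>
</Department>"""

root = ET.fromstring(doc)

# ElementTree exposes each name as {namespace-name}local-name.
postal = root.findall(f".//{{{ADDR}}}Address")
network = root.findall(f".//{{{SERV}}}Address")

print(postal[0].tag)
print(network[0].tag, network[0].text)
```

Code that looks only for the bare name Address would match neither element here, which is the one-off change described in the note above.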
</google:A>
(In practice, most people that create XML namespaces also describe
the element types and attributes whose names are in it — their
content models and types, their semantics, and so on. However, this
is not part of the process of creating an XML namespace, nor does
the XML namespace include or provide a way to discover such information.)
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Address">
<!-- The Addresses element type is not part of the XSLT namespace.
-->
<Addresses>
<xsl:apply-templates/>
</Addresses>
</xsl:template>
</xsl:stylesheet>
<google:A xmlns:google="http://www.google.org/">
<google:B google:C="bar"/>
</google:A>
xmlns:prefix
--OR--
xmlns
These attributes are often called xmlns attributes and their value
is the name of the XML namespace being declared; this is a URI.
The first form of the attribute (xmlns:prefix) declares a prefix
to be associated with the XML namespace. The second form (xmlns)
declares that the specified namespace is the default XML namespace.
For example, the following declares two XML namespaces, named http://www.google.com/ito/addresses
and http://www.google.com/ito/servers. The first declaration associates
the addr prefix with the http://www.google.com/ito/addresses namespace
and the second declaration states that the http://www.google.com/ito/servers
namespace is the default XML namespace.
<Department
xmlns:addr="http://www.google.com/ito/addresses"
xmlns="http://www.google.com/ito/servers">
NOTE: Technically, xmlns attributes are not attributes at all —
they are XML namespace declarations that just happen to look like
attributes. Unfortunately, they are not treated consistently by
the various XML recommendations, which means that you must be careful
when writing an XML application.
For example, in the XML Information Set (http://www.w3.org/TR/xml-infoset),
xmlns “attributes” do not appear as attribute information
items. Instead, they appear as namespace declaration information
items. On the other hand, both DOM level 2 and SAX 2.0 treat namespace
attributes somewhat ambiguously. In SAX 2.0, an application can
instruct the parser to return xmlns “attributes” along
with other attributes, or omit them from the list of attributes.
Similarly, while DOM level 2 sets namespace information based on
xmlns “attributes”, it also forces applications to manually
add namespace declarations using the same mechanism the application
would use to set any other attributes.
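The inconsistency is easy to see in practice. As one more data point, not covered in the text above, Python's standard-library xml.etree.ElementTree behaves like the Infoset: namespace declarations are consumed by the parser and do not show up in the attribute list. The id attribute below is an assumption added so that there is one ordinary attribute to compare against.

```python
import xml.etree.ElementTree as ET

doc = (
    '<Department xmlns:addr="http://www.google.com/ito/addresses" '
    'xmlns="http://www.google.com/ito/servers" id="d1"/>'
)

root = ET.fromstring(doc)

# The default namespace has been folded into the element's name...
print(root.tag)

# ...and only the ordinary attribute is listed; the xmlns declarations are not.
print(root.attrib)
```

An application ported between APIs therefore cannot assume that xmlns declarations will, or will not, be visible as attributes.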
</google:A>
<google:A>
<google:B>abc</google:B>
</google:A>
For more information, see question 7.2. (Note that an earlier version
of MSXML (the parser used by Internet Explorer) did use fixed xmlns
attribute declarations as XML namespace declarations, but that this
was removed in MSXML 4.)
<google:A xmlns:google="http://www.google.org/">
<google:B>
<google:C xmlns:google="http://www.bar.org/">
<google:D>abcd</google:D>
</google:C>
</google:B>
</google:A>
<A xmlns="http://www.google.org/">
<B>
<C xmlns="http://www.bar.org/">
<D>abcd</D>
</C>
</B>
</A>
<google:A xmlns:google="http://www.google.org/">
<google:B>
<google:C xmlns:google=""> <==== This is an error
in v1.0, legal in v1.1.
<google:D>abcd</google:D>
</google:C>
</google:B>
</google:A>
<A xmlns="http://www.google.org/">
<B>
<C xmlns="">
<D>abcd</D>
</C>
</B>
</A>
<google:A xmlns:google="http://www.google.org/">
<google:B>
<google:C>bar</google:C>
</google:B>
</google:A>
Simply using a text editor to cut the fragment headed by the <B>
element from one document and paste it into another document results
in the loss of namespace information because the namespace declaration
is not part of the fragment: it is on the parent element (<A>)
and isn’t moved.
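The problem can be reproduced with a short Python sketch using the standard-library xml.etree.ElementTree module. The fragment copied as plain text no longer parses, whereas serialising the subtree with a namespace-aware tool carries a declaration along with it. This is one possible way round the problem, not the only one, and the serialiser is free to choose its own prefix.

```python
import xml.etree.ElementTree as ET

whole = (
    '<google:A xmlns:google="http://www.google.org/">'
    '<google:B><google:C>bar</google:C></google:B>'
    '</google:A>'
)
# What a text editor would cut out: the declaration stays behind on google:A.
fragment = '<google:B><google:C>bar</google:C></google:B>'

reason = None
try:
    ET.fromstring(fragment)
    fragment_ok = True
except ET.ParseError as err:
    fragment_ok = False
    reason = str(err)   # the parser reports the undeclared prefix

# A namespace-aware copy: serialise the subtree, declaration included.
b = ET.fromstring(whole)[0]
copied = ET.tostring(b, encoding="unicode")

print(fragment_ok, reason)
print(copied)
```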
For example, suppose you have associated the serv prefix with the
http://www.our.com/ito/servers namespace and that the declaration
is still in scope. In the following, serv:Address refers to the
Address name in the http://www.our.com/ito/servers namespace. (Note
that the prefix is used on both the start and end tags.)
<!-- serv refers to the http://www.our.com/ito/servers namespace.
-->
<serv:Address>127.66.67.8</serv:Address>
Now suppose you have associated the xslt prefix with the http://www.w3.org/1999/XSL/Transform
namespace. In the following, xslt:version refers to the version
name in the http://www.w3.org/1999/XSL/Transform namespace:
<!-- xslt refers to the http://www.w3.org/1999/XSL/Transform
namespace. -->
<html xslt:version="1.0">
<C>efgh</C>
<!-- D, E, and F are in the http://www.bar.org/ namespace. -->
<D xmlns="http://www.bar.org/">
<E>1234</E>
<F>5678</F>
</D>
<!-- Remember! G is in the http://www.google.org/ namespace.
-->
<G>ijkl</G>
</A>
When elements whose names are in multiple XML namespaces are interspersed,
default XML namespaces definitely make a document more difficult
to read and prefixes should be used instead. For example:
<A xmlns="http://www.google.org/">
<B xmlns="http://www.bar.org/">abcd</B>
<C xmlns="http://www.google.org/">efgh</C>
<D xmlns="http://www.bar.org/">
<E xmlns="http://www.google.org/">1234</E>
<F xmlns="http://www.bar.org/">5678</F>
</D>
<G xmlns="http://www.google.org/">ijkl</G>
</A>
</google:B>
</google:A>
<A xmlns="http://www.google.org/">
<B xmlns="http://www.bar.org/">
<C>abcd</C>
</B>
</A>
</google:B>
</google:A>
<google:A xmlns:google="http://www.google.org/">
<google:B google:C="google" />
<bar:D bar:E="bar" />
</google:A>
</A>
<C>efgh</C>
</A>
<google:B>abcd</google:B>
<bar:C>efgh</bar:C>
</A>
One consequence of this is that you can place all XML namespace
declarations on the root element and they will be in scope for all
elements. This is the simplest way to use XML namespaces.
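A quick way to check this is to parse such a document and look at the names the parser reports. The Python sketch below (standard-library xml.etree.ElementTree) declares both prefixes once, on the root element A, and relies on them for the children; the document itself is a made-up instance in the style of the fragments above.

```python
import xml.etree.ElementTree as ET

# Both declarations sit on the root and are in scope for every descendant.
doc = (
    '<A xmlns:google="http://www.google.org/" xmlns:bar="http://www.bar.org/">'
    '<google:B>abcd</google:B>'
    '<bar:C>efgh</bar:C>'
    '</A>'
)

root = ET.fromstring(doc)
tags = [child.tag for child in root]

print(tags)
```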
The reason qualified names are allowed in the DTD is so that validation
will continue to work.
<!DOCTYPE A [
<!ELEMENT A EMPTY>
]>
<A xmlns="http://www.google.org/" />
<!DOCTYPE google:A [
<!ELEMENT google:A (bar:A)>
<!ATTLIST google:A
xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT bar:A (#PCDATA)>
<!ATTLIST bar:A
<!ATTLIST google:A
xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT bar:B EMPTY>
<!ATTLIST bar:B
xmlns:bar CDATA #FIXED "http://www.google.org/">
]>
<google:A>
<bar:B />
</google:A>
However, documents that use multiple prefixes for the same XML
namespace or the same prefix for multiple XML namespaces are confusing
to read and thus prone to error. They also allow abuses such as
defining an element type or attribute with a given universal name
more than once, as was seen earlier. Therefore, a better set of
guidelines for writing documents that are both valid and conform
to the XML namespaces recommendation is:
* Declare all xmlns attributes in the DTD.
* Use the same qualified names in the DTD and the body of the document.
<!ENTITY % p "" >
<!ENTITY % s "" >
<!ENTITY % nsdecl "xmlns%s;" >
Now use the p entity to define parameter entities for each of the
names in your namespace. For example, suppose element type names
A, B, and C and attribute name D are in your namespace.
<!ENTITY % A "%p;A">
<!ENTITY % B "%p;B">
<!ENTITY % C "%p;C">
<!ENTITY % D "%p;D">
Next, declare your element types and attributes using the “name”
entities, not the actual names. For example:
E CDATA #REQUIRED>
<!ELEMENT %C; (#PCDATA)>
There are several things to notice here.
* Attribute D is in a namespace, so it is declared with a “name”
entity. Attribute E is not in a namespace, so no entity is used.
* The nsdecl entity is used to declare the xmlns attribute. (xmlns
attributes must be declared on every element type on which they
can occur.) Note that a default value is given for the xmlns attribute.
* The reference to element type B in the content model of A is placed
inside parentheses. The reason for this is that a modifier — *
in this case — is applied to it. Using parentheses is necessary
because the replacement values of parameter entities are padded
with spaces; directly applying the modifier to the parameter entity
reference would result in illegal syntax in the content model.
Now let’s see how this all works. Suppose our XML document won’t
use prefixes, but instead wants the default namespace to be the
http://www.google.org/ namespace. In this case, no entity declarations
are needed in the document. For example, our document might be:
<C>bizbuz</C>
</A>
In this case, the internal DTD is read before the external DTD,
so the values of the p and s entities from the document are used.
Thus, after replacing the p, s, and nsdecl parameter entities, the
DTD is as follows. Notice that both the DTD and document use the
element type names google:A, google:B, and google:C and the attribute
names google:D and E.
E CDATA #REQUIRED>
<!ELEMENT google:C (#PCDATA)>
93. How can I validate an XML document that uses XML namespaces?
When people ask this question, they usually assume that validity
is different for documents that use XML namespaces and documents
that don’t. In fact, it isn’t — it’s the same for both. Thus, there
is no difference between validating a document that uses XML namespaces
and validating one that doesn’t. In either case, you simply use
a validating parser or other software that performs validation.
If your DTD contains element type and attribute names from a single
XML namespace, the easiest thing to do is to use your XML namespace
as the default XML namespace. To do this, declare the attribute
xmlns (no prefix) for each possible root element type. If you can
guarantee that the DTD is always read, set the default value in
each xmlns attribute declaration to the URI used as your namespace
name. Otherwise, declare your XML namespace as the default XML namespace
on the root element of each instance document.
If your DTD contains element type and attribute names from multiple
XML namespaces, you need to choose a single prefix for each XML
namespace and use these consistently in qualified names in both
the DTD and the body of each document. You also need to declare
your xmlns attributes in the DTD and declare your XML namespaces.
As in the single XML namespace case, the easiest way to do this
is to add xmlns attributes to each possible root element type and use
default values if possible.
The only thing you need to be careful about when using the same
document with both namespace-aware and namespace-unaware applications
is when the namespace-unaware application requires the document
to be valid. In this case, you must be careful to construct your
document in a way that is both valid and conforms to the XML namespaces
recommendation. (It is possible to construct documents that conform
to the XML namespaces recommendation but are not valid and vice
versa.)
<!ATTLIST google:A
xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT google:B (#PCDATA)>
<!ATTLIST google:B
xmlns:google CDATA #FIXED "http://www.google.org/">
MSXML returned an error for the following because the second google
prefix was not “declared”:
The reason for this restriction was so that MSXML could use universal
names to match element type and attribute declarations to elements
and attributes during validation. Although this would have simplified
many of the problems of writing documents that are both valid and
conform to the XML namespaces recommendation, some users complained
about it because it was not part of the XML namespaces recommendation.
In response to these complaints, Microsoft removed this restriction
in later versions, which are now shipping. Ironically, the idea
was later independently derived as a way to resolve the problems
of validity and namespaces. However, it has not been implemented
by anyone.
For example:
http://www.google.com/ito/sales^SalesOrder
Your application can then base its processing on these longer names.
For example, the code:
public void startElement(String elementName, AttributeList attrs)
throws SAXException
{
…
if (elementName.equals("SalesOrder"))
{
// Add new database record.
}
…
}
might become:
public void startElement(String elementName, AttributeList attrs)
throws SAXException
{
…
if (elementName.equals("http://www.google.com/ito/sales^SalesOrder"))
{
// Add new database record.
}
…
}
or:
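public void startElement(String elementName, AttributeList attrs)
throws SAXException
{
…
// getURI() and getLocalName() are utility functions that split a
// universal name into its namespace name and its local name.
if (getURI(elementName).equals("http://www.google.com/ito/sales") &&
getLocalName(elementName).equals("SalesOrder"))
{
// Add new database record.
}
…
}
With DOM level 2, the equivalent test on an element node is:
if ("http://www.google.com/ito/sales".equals(elementNode.getNamespaceURI()) &&
"SalesOrder".equals(elementNode.getLocalName()))
{
// Add new database record.
}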
Note that, unlike SAX 2.0, DOM level 2 treats xmlns attributes
as normal attributes.
For example, both of the following are qualified names. The first
name has a prefix of serv; the second name does not have a prefix.
For both names, the local part (local name) is Address.
serv:Address
Address
<!DOCTYPE foo:A [
<!ELEMENT foo:A (foo:B)>
<!ATTLIST foo:A
foo:C CDATA #IMPLIED>
<!ELEMENT foo:B (#PCDATA)>
]>
<foo:A xmlns:foo="http://www.foo.org/" foo:C="bar">
<foo:B>abcd</foo:B>
</foo:A>
There are two potential problems with this. First, the application
must be able to retrieve the prefix mappings currently in effect.
Fortunately, both SAX 2.0 and DOM level 2 support this capability.
Second, any general purpose transformation tool, such as one that
writes an XML document in canonical form and changes namespace prefixes
in the process, will not recognize qualified names in attribute
values and therefore not transform them correctly. Although this
may be solved in the future by the introduction of the QName (qualified
name) data type in XML Schemas, it is a problem today.
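The first problem, retrieving the prefix mappings in effect, can be sketched in Python with the start-ns and end-ns events of the standard-library xml.etree.ElementTree.iterparse function. The ref attribute holding a qualified name is hypothetical, as is the sample document; the sketch assumes the attribute value always contains a prefix.

```python
import io
import xml.etree.ElementTree as ET

# 'ref' is a made-up attribute whose value is a qualified name.
doc = (
    b'<foo:A xmlns:foo="http://www.foo.org/" ref="foo:B">'
    b'<foo:B>abcd</foo:B></foo:A>'
)

in_scope = []    # stack of (prefix, uri) pairs currently in effect
resolved = None

events = ("start-ns", "end-ns", "start")
for event, item in ET.iterparse(io.BytesIO(doc), events=events):
    if event == "start-ns":
        in_scope.append(item)      # item is a (prefix, uri) tuple
    elif event == "end-ns":
        in_scope.pop()
    elif event == "start" and "ref" in item.attrib:
        prefix, local = item.attrib["ref"].split(":")
        uri = dict(in_scope)[prefix]   # innermost declaration wins
        resolved = "{%s}%s" % (uri, local)

print(resolved)
```

The second problem remains: a tool that rewrites prefixes has no way of knowing that the value of ref needs rewriting too.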
<foo:A>
<A>
<A>
http://www.foo.com/ito/servers^Address
The third representation places the XML namespace name (URI) in
braces and concatenates this with the local name. This notation
is suggested only for documentation and I am aware of no code that
uses it. For example, the above name would be represented as:
{http://www.foo.com/ito/servers}Address
* One or both people use a URI that is not under their control,
such as somebody outside Netscape using the URI http://www.netscape.com/,
or
* Both people have control over a URI and both use it.
<serv:Addresses xmlns:serv="http://www.foo.com/ito/addresses">
116. Can I use the same prefix for more than one XML namespace?
Yes.
118. What does the URI used as an XML namespace name point
to?
The URI used as an XML namespace name is simply an identifier.
It is not guaranteed to point to anything and, in general, it is
a bad idea to assume that it does. This point causes a lot of confusion,
so we’ll repeat it here:
URIs USED AS XML NAMESPACE NAMES ARE JUST IDENTIFIERS. THEY ARE
NOT GUARANTEED TO POINT TO ANYTHING.
While this might be confusing when URLs are used as namespace names,
it is obvious when other types of URIs are used as namespace names.
For example, the following namespace declaration uses an ISBN URN:
xmlns:xbe="urn:ISBN:0-7897-2504-5"
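A short Python sketch makes the same point: a parser treats the namespace name as an opaque string and never tries to retrieve anything from it. The book and title element names are invented for the example; only the ISBN URN comes from the declaration above.

```python
import xml.etree.ElementTree as ET

# No network access is involved: the URN is only ever compared as a string.
doc = '<xbe:book xmlns:xbe="urn:ISBN:0-7897-2504-5"><xbe:title/></xbe:book>'

root = ET.fromstring(doc)

# ElementTree reports names as {namespace-name}local-name.
namespace_name = root.tag[1:].split("}")[0]

print(namespace_name)
print(root[0].tag)
```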
Accept: application/rdf+xml
Response:
HTTP/1.1 200 Ok
Content-Type: application/rdf+xml
<rdf:RDF />
This request asks for the entire triple store, serialized as RDF/XML.
In that case the HTTP request, including a copy of the RDQL query
wrapped up as an XPointer expression, looks as follows. Note that
we have added a range-unit whose value is xpointer to indicate that
the value of the Range header should be interpreted by an XPointer
processor. Also note the use of the XPointer xmlns() scheme to
bind the namespace URI for the rdql() XPointer scheme. This is necessary
since this scheme has not been standardized by the W3C.
The response looks as follows. The HTTP 206 (Partial Content) status
code is used to indicate that the server recognized and processed
the Range header and that the response entity includes only the
identified logical range of the addressed resource.
HTTP/1.1 206 Partial Content
Content-Type: application/rdf+xml
* element()
* xpath() - This is not a W3C-defined XPointer scheme, since the W3C
has not published an XPointer scheme for XPath. The namespace URI
for this scheme is http://www.cogweb.org/xml/namespace/xpointer.
It provides for addressing XML subresources using XPath 1.0
expressions.
133. What are the valid values for xlink:actuate and xlink:show?
Don’t blame me for putting such a simple question here: I saw a famous
exam simulator give the wrong answer on this one, and typing the values
out also helps me to remember them.
xlink:actuate: onRequest, onLoad, other, none
xlink:show: replace, new, embed, other, none
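For anyone who would rather check than memorise, the two value lists can be turned into a small validator. This is only a sketch in Python using the standard-library xml.etree.ElementTree module; the check_link function and the link element are made up for the example, while http://www.w3.org/1999/xlink is the XLink namespace name.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ACTUATE_VALUES = {"onRequest", "onLoad", "other", "none"}
SHOW_VALUES = {"replace", "new", "embed", "other", "none"}

def check_link(element):
    """Return a list of problems found in the element's XLink attributes."""
    problems = []
    actuate = element.get("{%s}actuate" % XLINK_NS)
    show = element.get("{%s}show" % XLINK_NS)
    # The values are case-sensitive, so the comparison is exact.
    if actuate is not None and actuate not in ACTUATE_VALUES:
        problems.append("bad xlink:actuate: " + actuate)
    if show is not None and show not in SHOW_VALUES:
        problems.append("bad xlink:show: " + show)
    return problems

good = ET.fromstring(
    '<link xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:actuate="onLoad" xlink:show="embed"/>'
)
bad = ET.fromstring(
    '<link xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:actuate="onClick" xlink:show="popup"/>'
)

print(check_link(good))
print(check_link(bad))
```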
4. HTML links are activated when the user clicks on them. XLink has
the option of activating automatically when the XML document is processed.