XHTML by Example
Ann Navarro
XHTML by Example
Copyright 2001 by Que
Acquisitions Editor
Todd Green
Development Editor
Sean Dixon
Managing Editor
Thomas F. Hayes
Project Editor
Karen S. Shields
Copy Editor
Sossity Smith
Indexers
Kevin Fulcher
Larry Sweazy
Proofreaders
Jeanne Clark
Megan Wade
Technical Editors
Shane McCarron
Kynn Bartlett
Benoît Marchal
Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Que
cannot attest to the accuracy of this information. Use of a term
in this book should not be regarded as affecting the validity of
any trademark or service mark.
Team Coordinator
Cindy Teeters
Interior Designer
Karen Ruggles
Cover Designer
Rader Design
Production
Darin Crone
Contents at a Glance
Introduction
Part I: Learning XHTML
1 XHTML Fundamentals
2 Adding Semantics to Structure
3 Working with Images
4 Collecting Data with Forms
5 Working with Tables
6 Using Frames
7 Universal Accessibility on the Web
8 Validating XHTML Documents
9 Implementing XHTML Today
Part II: XHTML Style and Structure
10 XHTML as the Bridge to XML
11 Using Cascading Style Sheets with XHTML
12 XSL: Style the XML Way
13 Document Type Definitions: The Syntax Rulebook
Table of Contents
Introduction
Part I: Learning XHTML
1 XHTML Fundamentals
    XHTML Document Well-Formedness and Validity
    Choosing an XHTML Document Type
        XHTML 1.0 Strict
        XHTML 1.0 Transitional
        XHTML 1.0 Frameset
    Meta Information: The Document Head
        The Doctype Declaration
        Head, Title, and Meta Tags
    Building Blocks of XHTML Documents
        Block-Level Elements
        Inline Elements
    What's Next
2 Adding Semantics to Structure
    The Semantics of Semantics
    Organizing Documents with Headings
    Grouping and Ordering Data with Lists
        Unordered Lists
        Ordered Lists
        Definition Lists
    Emphasizing Important Content
        Inline Emphasis
        Block-Level Emphasis
    What's Next
3 Working with Images
    Image Formats for the Web
        GIF Images
        JPEG Images
        PNG Images: The Web's Newest Format
    Web Graphics Editors
    Adding Graphics Using the Image Element
        Image and Text Alignment
        Using Images as Links
    Image Maps
        Creating an Image Map with CuteMAP
    What's Next
Dedication
For Dave: Want to go for a swim?
Acknowledgments
The process of writing a book involves a great many people. As always, I am
indebted to many for their assistance, advice, and support throughout the creation of this manuscript. At Macmillan USA I'd like to thank acquisitions editor
Todd Green, development editor Sean Dixon, as well as Sossity Smith, Karen
Shields, Jeanne Clark, Kevin Fulcher, Larry Sweazy, and the Macmillan production team.
Thanks to my agents Neil Salkind, David Rogelberg, and all of the staff at
Studio B Productions, who so deftly handle the business aspects of my writing
career.
Special thanks go to Shane McCarron, my technical editor, friend, and colleague on the HTML Working Group. Without his sharp eye, quick wit, and
occasional firm nudge, this book would not have been nearly as accurate or complete.
And finally to my husband Dave, who watched over WebGeek while my attention was focused firmly on my laptop, writing this book at some of the oddest
moments.
317-581-4666
Email:
Mail:
Associate Publisher
Que
201 West 103rd Street
Indianapolis, IN 46290 USA
Introduction
From HTML to XHTML
If you've been on the Web for more than just a few minutes, you've probably
heard about HTML. It's so pervasive that it's discussed frequently on television,
and not just as an answer in a quiz show! Story lines in comedies include characters having their own Web sites. You hear "www-dot-something-dot-com" in
nearly every radio advertisement. HTML is showing up in résumés outside the
realm of tech workers; it's become mainstream. But you know that the Web is
continuing to grow: XML, the Extensible Markup Language, is the buzzword in
the halls of business today. XHTML, the Extensible Hypertext Markup
Language, bridges the two worlds: HTML to XML.
Overview of Chapters
This book is divided into logical parts and chapters to help you find the lessons
that are most appropriate to your knowledge level. What follows is a description
of each part of this book, including a look at each chapter.
Meaning
Italic
Bold
Computer type
NOTE
Notes provide additional information related to a particular topic.
TIP
Tips provide quick and helpful information to assist you along the way.
Part I
Learning XHTML
1 XHTML Fundamentals
2 Adding Semantics to Structure
3 Working with Images
4 Collecting Data with Forms
5 Working with Tables
6 Using Frames
7 Universal Accessibility on the Web
8 Validating XHTML Documents
9 Implementing XHTML Today
1
XHTML Fundamentals
At first glance, XHTML documents are very similar to HTML documents.
Much of what has changed is in the background. Instead of being based on
SGML (the Standard Generalized Markup Language) as HTML is,
XHTML is an application of XML, the Extensible Markup Language.
Because of this, several fundamental changes have been made to the way
you write XHTML content as compared to how you would author content in
HTML.
The W3C was actually quite concerned about the transition that needed to
take place between HTML and XML. In May of 1998 they held a public
workshop on the topic near San Francisco, California. During the two-day
event, Web developers, software vendors, and authors of both the HTML
and XML recommendations discussed how HTML could be brought into the
XML world while minimizing the learning curve. The solution suggested
was a transitional language that would provide a bridge between HTML
and XML. That language is what we now know as XHTML.
XHTML conforms to the XML concept of well-formedness, which restricts
the author to a complete and ordered syntax. This really isn't anything
that wasn't already in HTML; it's just that completeness, which was at times optional before, is now being enforced. In addition to well-formedness, XML
also introduces several new attributes that appear on elements in the document head section. We'll explore each of those and get a quick overview of
the structural concepts for all XHTML documents.
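As a rough illustration of what enforced completeness means in practice, the fragment below contrasts markup that HTML browsers tolerated with a well-formed XHTML equivalent. The list text and the class name are invented for this sketch.

```html
<!-- Tolerated in HTML: uppercase names, omitted end tags, an unquoted value -->
<!--
<UL>
<LI>first item
<LI>second item
</UL>
<P CLASS=note>A paragraph<BR>with a line break
-->

<!-- Well-formed XHTML: lowercase names, quoted values, every element closed -->
<ul>
<li>first item</li>
<li>second item</li>
</ul>
<p class="note">A paragraph<br />with a line break</p>
```

The content is the same in both versions; only the completeness of the syntax differs.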
This chapter teaches you:
The three document types in XHTML 1.0
XML namespaces
How to handle language defaults
How to add meta information
The differences between block-level and inline elements
These basic questions can point you in the right direction. Let's take a
closer look at each of the three document types, so that your decisions will
indeed be the correct ones.
Figure 1.1: You can use this decision tree to determine the proper XHTML
document type for your needs. (The tree asks two yes/no questions about the
document: Does it use presentational markup? Is all the presentational
markup in a stylesheet?)
EXAMPLE
The following memo has several instances where words and phrases need
to be offset from the rest of the text. In the headers, the To:, From:, and Re:
fields are traditionally boldface. Elsewhere, when citing the name of a magazine publication, the name is italicized.
OUTPUT
Figure 1.2: This memo has markup that includes bold and italics tags.
NOTE
You might have noticed that the filename in Listing 1.1 is memo.html. XHTML has not
defined its own MIME-type, and retains the document naming conventions of HTML.
Therefore, XHTML documents will have the .html file extension.
This sample uses the presentational tags <i> and <b> to produce the italic
and boldface effects. However, remember that, with XHTML 1.0 Strict, presentation isn't allowed in the elements or attributes. We can change this
passage to be compliant with Strict (as shown in Figure 1.3) by replacing
<i> and <b> with <em> and <strong>, respectively:
<p><strong>To:</strong> - Joe Cline<br />
<strong>From:</strong> - Marshall Jansen<br />
<strong>Re:</strong> - <em>Business Week</em> article</p>
<p>Joe,</p>
<p>Attached is a copy of a recent <em>Business Week</em> article focusing on the
success of e-commerce in our industry, with a mention of our award winning Web
site! Please circulate amongst your staff.</p>
Figure 1.3: A conforming XHTML 1.0 Strict document uses emphasis and
strong emphasis rather than bold and italic elements.
Notice in Figures 1.2 and 1.3 that both of these paragraphs look the same
in Netscape Navigator. The paragraphs would look the same when viewed
in Internet Explorer as well. This is because the browser programmers
have adopted the conventional rendering of the emphasis and strong
emphasis elements as italic and boldface, respectively.
I say "conventional rendering" here because the XHTML specification doesn't
require that those elements be presented in italic or boldface. Instead, it
only requires that they be given some form of emphasis that is distinct
from the main body of text and from each other. So a user agent (that is,
the browser or other display software) could render emphasis as purple
text, or a bigger font size, or a combination of both and still be compliant
with the specifications.
handles things like color, alignment, width, and size. Consider this one
line:
<p align="center">This would render centered in the browser</p>
The DOCTYPE declaration can be broken down as follows: The tag is opened with
the string <!DOCTYPE. Next is the root element for this document type, in
our case, html (note that html here is in lowercase). PUBLIC declares that the
next portion is a public identifier for this document type, rather than a
local name, which would be noted using the SYSTEM keyword. For example,
when working with the Transitional DOCTYPE declaration, the string
"-//W3C//DTD XHTML 1.0 Transitional//EN" is known as the formal public
identifier, or FPI. Each version has its own FPI, as does any XML-based
language. Finally, the URI provided is the location of the document type
definition file.
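For reference, the pieces described above fit together as follows for each of the three XHTML 1.0 document types. The identifiers and URIs are the ones given in the W3C XHTML 1.0 recommendation; the comments are added here only as labels, and a real document carries exactly one of these declarations, placed before the root element.

```html
<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- XHTML 1.0 Frameset -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```

In each case the order is the same: the root element name, the PUBLIC keyword, the quoted FPI, and the quoted URI of the DTD.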
CAUTION
The DOCTYPE declaration is unique in XHTML in that it has mixed-case components.
Doctypes must be written exactly as they are shown here: capitalization, spacing, punctuation, and all. The browser's or parser's ability to recognize them (and act upon them
appropriately) depends on it.
XHTML diverges from HTML at this point, adding several attributes to the
root element that you might not have seen before. These include an XML
namespace, the XML language attribute, and the language attribute. These
XML-based attributes are important in providing the additional detail
required that allows XHTML to act as that bridge between HTML and
XML that I described previously.
THE NAMESPACE
When the W3C Working Group responsible for creating the Namespaces in
XML recommendation did so, they were likely unaware of the firestorm
they were about to unleash on the development community. For a concept
with such a seemingly simple definition, the debates about what it truly
means have been both staggering and unending.
For more on namespaces and how they impact XHTML documents, see "The
Freedom of XML: Defining It All Yourself," p. 176.
xml:lang AND lang
The two language attributes hold the same information: an identifier for
which language is used when writing the document. The first version, the
xml:lang attribute, is the XML-conforming attribute that identifies the primary natural language that the document is written in. XHTML 1.0 also
includes the HTML-based lang attribute for backward compatibility. If
you include these attributes, you should use both for full interoperability.
Although the attributes are optional, a good argument can be made for
never leaving them out, for the same reason (interoperability) that
you'd use both if either were present. It's a small consideration toward broader acceptance and usability of your documents.
The values of the xml:lang and lang attributes are two-letter language
codes. The code en refers to English, fr to French, and so on. A list of
country codes can be found online at
http://www.ics.uci.edu/pub/websoft/wwwstat/country-codes.txt. Do note, of course, that not all country
codes correspond to a language code for purposes of the lang attribute when
a country doesn't have its own unique language.
Finally, then, we have a complete opening tag for the root element of any of
our XHTML documents:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Next within the head content are any required meta elements. Meta is used
to hold descriptive information about the document itself. Quite often you'll
see author credit, creation date, copyright information, and other details
that you might normally see in the front matter of a book, or in the masthead area of a magazine.
The meta tag has two primary attributes, name and content:
<meta name="author" content="Ann Navarro"></meta>
The value of the name attribute can be any string of characters, provided
that there are no spaces. That would mean
name="author-name"
is valid, whereas
name="author name"
is not.
The content attribute, on the other hand, doesn't have that restriction, so
phrases like "my name," a list of comma-delimited values, and other entries
are all possible values.
A full head element for this book, then, might read:
EXAMPLE
<head>
<title>XHTML by Example</title>
<meta name="author" content="Ann Navarro"></meta>
<meta name="publisher" content="Que"></meta>
<meta name="ISBN" content="0789723859"></meta>
</head>
TIP
Many document authors find it useful to practice version control within meta tags.
Document version numbers, publication dates, and other authoring notes can be stored
in these elements for reference by future editors.
Block-Level Elements
Think of block-level elements as the large concrete blocks used to build a
house or other building. They provide the main structure of the document,
where the inline elements provide the finishing details.
The definition of what can or can't be included within an element is known
as the content model. The content model for each element is defined in the
document type definition, or DTD. That is the document that is referenced
by the URI in the doctype declaration. It gives the required syntax of all
XHTML elements, attributes, and character entities in a rather cryptic
notation, similar to Extended Backus-Naur Form, the notation used to
describe computer languages.
The declaration for the body element in the XHTML 1.0 Strict DTD is seen
here:
<!ELEMENT body %Block;>
This says that body is the name of the element, and it can contain what is
defined in the parameter entity Block. Looking further to the definition of
Block we find
<!ENTITY % Block "(%block; | form | %misc;)*">
This says that the entity Block is defined as a choice of the elements in the
block entity (lowercase, differentiating it from Block), the form element, or
the elements defined in the misc entity; the trailing asterisk allows that
choice to be repeated any number of times.
The block entity is defined as
<!ENTITY % block
"p | %heading; | div | %lists; | %blocktext; | fieldset | table">
parsing these formal definitions. There are tools available to you, such as
the validator discussed in Chapter 8, "Validating XHTML Documents," that
will help check the compliance of your document to the DTD.
Although not all do, block-level elements are the only elements that also
might contain other block-level elements. In addition, they all might contain inline elements and character data.
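A small sketch of these containment rules follows; the heading and paragraph text are invented for illustration. The first fragment follows from the body declaration quoted earlier: in Strict, body holds only Block content, so running text is wrapped in a block-level element such as p.

```html
<!-- Strict: text sits inside a block-level element, not directly in body -->
<body>
<p>Loose text is wrapped in a paragraph.</p>
</body>

<!-- A block-level div containing other block-level elements,
     which in turn contain inline elements and character data -->
<div>
<h1>Quarterly Report</h1>
<p>Sales rose in the <em>third</em> quarter.</p>
</div>

<!-- Not valid: p is block-level, but it is one of the block-level
     elements that cannot contain other block-level elements -->
<!-- <p><div>Sales rose.</div></p> -->
```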
Inline Elements
Inline elements are those that can only contain other inline elements or
character data. They can never contain block-level elements. For instance,
the emphasis element is an inline element. It can contain character data
(the text contained within the start and end tags) and other elements, such
as strong emphasis.
EXAMPLE
It's not uncommon to see text that has been both italicized and presented in
boldface. To accomplish this, two inline elements are nested. You can wrap
the magazine name in our previous memo within the strong element to
increase its emphasis:
<p>Attached is a copy of a recent <strong><em>Business Week</em></strong>
article focusing on the success of e-commerce in our industry, with a mention
of our award winning Web site! Please circulate among your staff.</p>
The output seen in Figure 1.4 shows both emphasis styles applied to the
data within those tags.
OUTPUT
Figure 1.4: This document uses nested inline elements: boldface and italic.
It doesn't matter in which order most elements are nested, but once nested,
they must be closed in the reverse of the order in which they were opened.
In the previous example, the
<strong> tag came first, then the <em> tag. Therefore the </em> tag must be
the first closing tag, as it completes the innermost element, then the outer
</strong> tag is closed. If they are closed in improper order, the document
is not well-formed.
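The closing-order rule can be seen side by side in the sketch below, which reuses the magazine name from the memo. The third variant is shown inside a comment because it is not well-formed.

```html
<!-- Well-formed: em was opened last, so it is closed first -->
<p><strong><em>Business Week</em></strong></p>

<!-- Also well-formed: the nesting order is swapped, and so is the closing order -->
<p><em><strong>Business Week</strong></em></p>

<!-- Not well-formed: the tags overlap instead of nesting -->
<!-- <p><strong><em>Business Week</strong></em></p> -->
```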
What's Next
In this chapter, you've learned about the basic structure of an XHTML document. Each document must follow one of three document types, the choice
of which is based on the type of content the document will contain.
Every XHTML document must contain at least four out of five elements
before any visible content is included. These elements include the doctype
declaration, root, head, and title elements. Meta tags are optional, though
are contained within the head element when present.
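Put together, the pieces covered in this chapter give a skeleton along the following lines. This is a minimal sketch using the Strict document type; the title and author values are borrowed from the earlier head example, and the body text is a placeholder.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML by Example</title>
<meta name="author" content="Ann Navarro"></meta>
</head>
<body>
<p>Visible content begins here.</p>
</body>
</html>
```

The doctype declaration, root element, head, and title all appear before the first visible content; the meta element is the optional fifth item.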
Block-level elements are those that can, at times, contain other block-level
elements, and can always contain inline elements and data. Block-level elements tend to make up the larger structures of the document. Inline elements may only contain other inline elements and data. They tend to
provide the finishing touches or minor details in the overall structure of the
document.
In Chapter 2, "Adding Semantics to Structure," we'll talk about adding
semantics to the structure of your document. You'll learn how to provide
meaning in your organization by choosing the proper heading levels, organizing data using lists, and emphasizing important content.
2
Adding Semantics to Structure
In the first chapter you spent a considerable amount of time learning about
the structure of HTML, XHTML, and your documents. The purpose of this
was more than just to provide you with a nice visual of building blocks, or
nesting Tupperware containers. Instead, we've been leading up to the
discussion in this chapter. In this chapter, you learn how the structural
components of XHTML differ from each other, why those distinctions are
important, and what the characteristics of each component are meant to
tell us.
This chapter teaches you:
How semantics define XHTML behavior
Document organization using headings
How to order and group data using lists
How to emphasize data using structure
That could be converted to a more standard outline form using the following notation:
I. From HTML to XHTML
A.
B.
C.
EXAMPLE
Many HTML authors are tempted to choose their heading levels by the size
or appearance of the text when it's rendered by the browser. As I've often
repeated in these first chapters, avoid the temptation to do so, as it
removes the structural information from the document and applies presentational semantics to elements that weren't meant to have them.
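As a sketch of headings chosen by structure rather than appearance, the Chapter 1 outline from this book's table of contents could be marked up as shown here. Each level reflects the section's position in the outline, whatever size the browser happens to render it.

```html
<h1>XHTML Fundamentals</h1>
<h2>Choosing an XHTML Document Type</h2>
<h3>XHTML 1.0 Strict</h3>
<h3>XHTML 1.0 Transitional</h3>
<h3>XHTML 1.0 Frameset</h3>
<h2>Building Blocks of XHTML Documents</h2>
<h3>Block-Level Elements</h3>
<h3>Inline Elements</h3>
```

Picking h3 for a top-level section because its default rendering looks smaller would break this outline, even though the page might look acceptable.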
This structure doesn't require specific ordering, but it does depend on having a collection of individual data points that form a group. Data points
within groups, then, are generally presented in list format, be that delimited by bullet points, numbers, letters, commas, spaces, or any other identifiable character or presentation.
EXAMPLE
It's important to remember here that a list has both semantics and structure. A list is not just about its delimiting characters. Its functionality is
defined by both the bounding of the data into a group (the structure), and
then by the presentation of the individual units within the group (the
semantics).
In XHTML, we have the same three major list types as found in HTML 4:
unordered lists, ordered lists, and definition lists.
Unordered Lists
An unordered list is a representation of a data collection where the individual members have no relative rank or position. For instance, the coins in
my wallet have no specific order to them, and they could be described using
an unordered list. My coins include five quarters, two dimes, a nickel, and
seven pennies.
The basic unordered list syntax remains as
<ul>
<li>five quarters</li>
<li>two dimes</li>
<li>a nickel</li>
<li>seven pennies</li>
</ul>
Notice that in this example, each of the opening list item tags has a corresponding closing tag, meeting the well-formedness requirement of XHTML.
The only rendering action required of the XHTML-compliant browser is
that the list items be delimited in an unordered manner. The most traditional rendering is to use a solid bullet point (disc), as shown in Figure 2.1.
CHANGING LIST ITEM DELIMITERS
When working with the Transitional or Frameset document types, document authors have the additional freedom to select presentation instructions for delimiters using the type attribute. Table 2.1 shows possible
values for type.
Figure 2.1: The typical rendering of an unordered list uses solid discs.
Table 2.1: Available Bullet Types for Unordered Lists
Attribute Value      Representation
disc                 A filled circle
square               A small filled square
circle               An unfilled circle
(no type attribute)  A filled circle
The following example makes use of the default, circle, and square delimiter types:
EXAMPLE
<body>
<p>Default list</p>
<ul>
<li>item one</li>
<li>item two</li>
</ul>
<p>List Two - Circles</p>
<ul type="circle">
<li>item one</li>
<li>item two</li>
</ul>
<p>List Three - Squares</p>
<ul type="square">
<li>item one</li>
<li>item two</li>
</ul>
</body>
OUTPUT
<ul compact>
XHTML does not allow attributes to be expressed using the Boolean syntax,
because every attribute must have a value, which is a part of well-formedness. Therefore the second example,
<ul compact="compact">
is required in XHTML.
NOTE
At this time, neither Netscape nor Internet Explorer supports the compact attribute.
However, the capability of another browser or XML-based parser to support it should not
be discounted. Remember, most of the semantics associated with these elements and
attributes are defined by tradition rather than dictate.
Ordered Lists
Ordered lists, contrasted with unordered lists, attach priority and relative
position to their members. What that order means isn't directly provided by
the semantic of having order. Instead, the surrounding prose will provide
appropriate context.
EXAMPLE
For instance, a numbered list could be used to express the members of my
family. I could choose to order them as
1. Ann
2. Dave
3. Linda
The meaning of the ordering isn't apparent simply because the list is
ordered. The reader could assume that I placed myself in the number one
position just as a self-referential starting point. Judged by the end result,
they could be correct, but only coincidentally; their reasoning would not be
correct. The fact that I ordered this list by age would only be apparent if I
expressed that in my text before or after the list.
TIP
Don't make your readers guess at the internal semantics of your ordered lists. Be sure
to identify in the surrounding prose what meaning should be inferred from the ordering
of your list. If no meaning should be inferred, change to an unordered list.
The rendering options for ordered lists, as with unordered lists, are set
using the type attribute. The potential values for type are shown in
Table 2.2.
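Table 2.2: Number Sets Used in Ordered Lists
Attribute Value    Representation
1                  Arabic numerals (1, 2, 3)
a                  Lowercase letters (a, b, c)
A                  Uppercase letters (A, B, C)
i                  Lowercase Roman numerals (i, ii, iii)
I                  Uppercase Roman numerals (I, II, III)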
SETTING THE LIST START VALUE
A unique aspect of ordered lists is the starting value used for the list delimiter. By default, the delimiter starts at the beginning of the chosen set, that
is the number 1, the letter A, or the Roman numeral I.
However, document authors are free to choose a different starting point.
This is accomplished using the start attribute, placed on the ol element.
EXAMPLE
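A minimal sketch of the start attribute, assuming a hypothetical list of checks whose numbering begins at 1041 (the check number and the payees are invented for illustration). Like the other presentational list attributes, start is part of the Transitional and Frameset document types rather than Strict.

```html
<p>Recent checks written:</p>
<ol start="1041">
<li>Electric company</li>
<li>Grocery store</li>
<li>Bookstore</li>
</ol>
```

A browser that supports the attribute numbers the items 1041, 1042, and 1043.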
Setting a unique starting point isn't confined to Arabic numerals. The same
approach can be taken with any ordered list type. We could recast my list of
recent checks using Roman numerals, for a little fun (see Figure 2.4):
EXAMPLE
Note that although I chose a non-numeric list delimiter in the second example, I still manipulated the start value using a numerical value. Despite the
fact that some list representations use letters, they are indeed still a numbering system. Therefore the value of a start attribute can, if only for simplicity's sake, be represented in a single system: the Arabic numerals.
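The combination of a non-Arabic type with an Arabic start value might look like the following sketch. The list items are hypothetical; the point is only that start stays a plain number even when type selects Roman numerals.

```html
<!-- type="I" selects uppercase Roman numerals; start="4" is still written in Arabic numerals -->
<ol type="I" start="4">
<li>Electric company</li>
<li>Grocery store</li>
<li>Bookstore</li>
</ol>
```

With both attributes supported, the three items are labeled IV, V, and VI.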
OUTPUT
MANIPULATING AN INDIVIDUAL LIST ITEM VALUE
EXAMPLE
When looking at athlete placement in sports score results, you often see
rankings skip one or more numbers after two or
more persons tie in score. To do this, you'd set a specific value on the
individual that comes after the tie. Say Nancy and Sue tied for third
place. Rather than listing a fourth place result, the next athlete is listed as
finishing in fifth place:
<body>
<p>Top six finishers in the Ladies' 100 meter Freestyle</p>
<ol>
<li>Janet Davis</li>
<li>Stephanie Lindstrom</li>
<li>Nancy Cruz<br />Sue Clayton</li>
<li value="5">Linda Nelson</li>
<li>Suriya Khan</li>
</ol>
</body>
MIXING LIST TYPES IN NESTED LISTS
Nested ordered lists are frequently used to present information in a traditional outline form. In developing the table of contents for this book, we
could retrieve the chapter title and individual section headings, and come
up with a list that looks like this:
II. Adding Semantics to Structure
    1. Organizing Documents with Headings
    2. Grouping and Ordering Data with Lists
        A. Unordered Lists
            1. Changing the List Item Delimiter
            2. Compacting Your List
        B. Ordered Lists
            1. Setting the List Start Value
            2. Manipulating an Individual List Item Value
            3. Mixing List Types in Nested Lists
        C. Definition Lists
            1. Nesting Ordered or Unordered Lists within Definition Lists
    3. Emphasizing Important Content
        A. Inline Emphasis
        B. Block-level Emphasis
    4. What's Next?
To produce this in XHTML, we'd need lists nested four levels deep.
The first list is an uppercase Roman numeral list, containing the chapter
numbers and titles. Because we're looking only at Chapter 2, we need to set
the start value of this list to 2:
EXAMPLE
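The opening of that first list, as it also appears at the top of Listing 2.1, would read as follows. The trailing comment is added here for orientation only.

```html
<ol type="I" start="2">
<li>Adding Semantics to Structure
<!-- the li and ol stay open for now; the nested lists come next -->
```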
NOTE
The list item tag has not yet been closed. This is not a violation of the well-formedness
requirement; instead this is because a nested list actually occurs within the preceding
list item. The list item tag will be closed only after the nested list is closed.
Next is the list of major headings within the chapter. This list uses Arabic
numerals, and begins with 1. Because 1 is the default start value, it does
not need to be declared. The Arabic numeral choice is also the default for
an ordered list, so we don't need to declare that either. The markup will
appear as
<ol>
<li>Organizing Documents with Headings</li>
<li>Grouping and Ordering Data with Lists
Again, we'll be nesting a list within the second list item, "Grouping and
Ordering Data with Lists," so this second list item remains unclosed for
now.
Next are the lettered items for the chapter subheadings about individual
list types:
<ol type="A">
<li>Unordered Lists
The first list item, A, has further subheadings, requiring its own nested
list. After this list is closed, we can close the list item for Unordered Lists,
so we have
<ol>
<li>Changing the List Item Delimiter</li>
<li>Compacting Your List</li>
</ol>
</li>
Having closed the list item for Unordered Lists, we're now back within the
uppercase letter list. By simply opening a new list item, the list
will continue on at the proper value:
<li>Ordered Lists
A new nested list is now needed for the subheadings under Ordered Lists,
so we open and complete a new one here:
<ol>
<li>Setting a List Start Value</li>
<li>Manipulating an individual list item value</li>
<li>Mixing list types in nested lists</li>
</ol>
</li>
The next list item is the third item C in the subheading list, Definition
Lists:
<li>Definition Lists
<ol>
<li>Nesting Ordered or Unordered Lists within Definition Lists</li>
</ol>
</li>
Now that each of the subheadings for Grouping and Ordering Data with
Lists has been presented, that nested list and that list item can be closed:
</ol>
</li>
The third major heading, Emphasizing Important Content, is added in the same
way, with its own nested list of subheadings.
The final entry, then, is the fourth major heading, What's Next?, after
which the list of major headings, the sole item of the main Roman numeral
list, and that list itself must be closed:
<li>What's Next?</li>
</ol>
</li>
</ol>
The entire sequence then comes together as shown in Listing 2.1. Figure
2.6 shows the result.
Listing 2.1: A Set of Nested Lists
<ol type=I start=2>
<li>Adding Semantics to Structure
<ol>
<li>Organizing Documents with Headings</li>
<li>Grouping and Ordering Data with Lists
<ol type=A>
<li>Unordered Lists
<ol>
<li>Changing the List Item Delimiter</li>
<li>Compacting Your List</li>
</ol>
</li>
<li>Ordered Lists
<ol>
<li>Setting a List Start Value</li>
<li>Manipulating an individual list item value</li>
<li>Mixing list types in nested lists</li>
</ol>
</li>
<li>Definition Lists
<ol>
<li>Nesting Ordered or Unordered Lists within Definition Lists</li>
</ol>
</li>
</ol>
<li>Emphasizing Important Content
<ol>
Definition Lists
Definition lists have a rather unfortunate name, in that they probably
aren't used truly for definitions even half of the time they occur in
documents. The idea behind them is that you have a term that is highlighted in
some manner, and a corresponding definition for that term. Here, we might
express that as
XHTML - Extensible Hypertext Markup Language
XHTML doesn't prescribe the presentational aspects of a definition list, so
authors can't count on any specific look, more so here than with other list
formats. The basic syntax occurs as
<dl>
<dt>XHTML</dt>
<dd>Extensible Hypertext Markup Language</dd>
<dt>XML</dt>
It is possible to nest other list types within a definition list. The means to
do so is a little different from the other lists. The rendering of the following
example is shown in Figure 2.8:
<dl>
<dt>XHTML</dt>
<dd>Extensible Hypertext Markup Language
<ol>
<li>XHTML 1.0 Strict</li>
<li>XHTML 1.0 Transitional</li>
<li>XHTML 1.0 Frameset</li>
</ol>
</dd>
</dl>
Inline Emphasis
Two of the most common means of providing emphasis in traditional print
media are the use of boldface or italic. Both of these choices have strict
presentational semantics attached to them: the change in font weight and
style.
Emphasis in HTML first began with the <em> element for emphasis, and
the <strong> element for strong emphasis. By tradition, the major browsers
would render <em> as italic and <strong> as boldface. But many developers
desired some assurance that their intentions for these presentations
wouldn't be tampered with. As a result, the <i> and <b> elements were
introduced for italic and boldface, respectively.
However, in XHTML we're again moving away from the presentational
aspects of HTML, and retaining the structural markup instead. The <i>
and <b> elements remain available to you in XHTML 1.0, but the structural
versions <em> and <strong> are preferred. Specific font weights and styles
are then handled in CSS or another stylesheet language.
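As a minimal sketch of that division of labor, the fragment below marks emphasis structurally and leaves the look to a stylesheet. The style rules and the sample sentence are illustrative assumptions, not taken from this chapter; in a complete document the style element would sit inside the head.

```html
<!-- Hypothetical rules: presentation is assigned in CSS, not in the markup -->
<style type="text/css">
  em     { font-style: italic; }
  strong { font-weight: bold; }
</style>

<!-- The markup itself only says "emphasis" and "strong emphasis" -->
<p>Validate <em>every</em> page, and <strong>never</strong> skip the doctype.</p>
```

Changing either rule restyles every emphasized phrase at once, without touching the document's markup.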
Block-Level Emphasis
At times, document authors will be faced with the need to offset large
blocks of text, rather than just a word or phrase within a sentence. XHTML
continues to provide the mechanism to do this that you're familiar with in
HTML 4, namely the <blockquote> element.
blockquote is intended to function as its name implies, a specific block of
data offset as a quotation. Because blockquote is also a block-level element,
you have the freedom to include additional styling at the inline level.
Suppose you were writing a term paper about Shakespeare, and wanted to
quote a famous passage from Julius Caesar. You could do so using the
blockquote element:
EXAMPLE
<blockquote>
Remember March, the ides of March remember:
Did not great Julius bleed for justice' sake?
What villain touch'd his body, that did stab,
and not for justice? What, shall one of us
that struck the foremost man of all this world
but for supporting robbers, shall we now
contaminate our fingers with base bribes,
and sell the mighty space of our large honours
for so much trash as may be grasped thus?
I had rather be a dog, and bay the moon,
than such a Roman.
</blockquote>
The actual presentation of a block quote passage isn't decreed in the HTML
or XHTML specs. Instead, implementation is left to the browser programmer.
Tradition has been that the text is indented on both margins (see
Figure 2.9). Some browsers might go further and italicize the passage, or
provide other font and color changes.
CAUTION
Avoid the temptation to use blockquote to produce the double-sided indent look for
general layout purposes. As we discussed, this rendering is not guaranteed by the
XHTML specification. The user agent can do anything to offset the text and still conform.
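If the goal is simply an indented block, a stylesheet rule states that intent directly. The sketch below assumes a hypothetical class name, indented, and arbitrary margin values; neither comes from this chapter.

```html
<!-- Hypothetical class: the indent is requested explicitly in CSS -->
<style type="text/css">
  .indented { margin-left: 3em; margin-right: 3em; }
</style>

<!-- An ordinary paragraph, so blockquote stays reserved for real quotations -->
<p class="indented">This paragraph is indented on both sides by the
stylesheet rather than by quotation markup.</p>
```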
Figure 2.9: A block quote passage rendered with indented margins between
nonsense text.
What's Next
In this chapter we've examined major structural elements provided by
XHTML. You've learned how to properly organize a document using
headings and create basic and nested lists that conform to the well-formedness
requirements of XHTML, and we've reviewed how to emphasize words,
phrases, and entire blocks of text using the appropriate emphasis elements.
Next up in Chapter 3, "Working with Images," you'll bring graphical
elements into your documents using images and review the popular means of
interacting with your readers: XHTML forms.
3
Working with Images
If the average Web user were asked to describe the World Wide Web, he
might say "pictures and things on the Internet." Only people introduced to
the online world before early 1995 were very familiar with the Net as a
text-based medium. Desktop publishing was a major new activity and found
many non-artists manipulating digital images for the first time.
Beyond the graphical skills gained in the DTP environment, digital imaging
for the Web brings new concerns and constraints. The number of image
formats supported by Web browsers is significantly smaller than the number
you can work with in MS Publisher or FrameMaker. Additionally, the
resolution and color depth available on a computer monitor change the way we
see those images compared to the output you'd find in print media.
This chapter teaches you:
What image formats work online
Which format works best with what type of graphic
How to incorporate images into your Web page
How to create links using images
How to draw hot spots on images for an interactive map
GIF Images
The most common format used for non-photographic images is GIF
(Graphics Interchange Format). GIF is a bit-map image format, meaning
that the image is mapped pixel by pixel. The information in the bit-map
can be compressed when neighboring pixels have the same color values,
using what's essentially a form of digital shorthand. This allows the
resulting image file to be considerably smaller when stored, saving space on the
Web server and bandwidth when the image is delivered to the site visitor.
GIF images do have some limitations, the most notable being the maximum
of 256 unique colors within the image. Thankfully, this 256-color palette is
not a static set of colors, but might be any 256 colors that work best within
the image. This limited, though flexible, palette makes GIF the ideal candidate for graphics with large blocks of color, as often seen in logos, buttons,
and banners.
Controversy over the GIF format arose in the early days of the Web, when
the company that owned the patent on the LZW compression algorithm
used in GIF asserted that it was owed significant licensing fees from any
software publisher that produced tools that could publish GIF images. The
end creator of an image generally doesn't owe any licensing fees, but in part
because of the furor over this issue, the W3C began working on an
open-source (royalty-free) image format known as PNG (Portable Network
Graphics). Support for this image format is increasing, but PNG isn't
nearly as universally supported as GIF.
NOTE
To learn more about PNG, visit the W3C Web site at http://www.w3.org/Graphics/PNG/,
and see the "PNG Images: the Web's Newest Format" section later in this
chapter.
JPEG Images
The JPEG format, named for the Joint Photographic Experts Group, was
specifically designed for digital storage of photographic images. JPEGs
(sometimes referred to as JPGs) can use up to 16.7 million colors instead of GIF's
relatively paltry 256. The compression algorithm used by JPEG is known
as a lossy technique, meaning that information is literally thrown away in
the process of compressing the data. A low compression rate preserves the
highest quality, while a higher compression rate removes more information.
With a photograph, this doesn't have much impact on image quality when
using a low to moderate compression level, especially when viewed on a
computer screen with the monitor's relatively low pixel-per-inch resolution.
Practically, at least two additional attributes are needed: height and width.
These attributes hold the vertical and horizontal measurements of the
image, expressed in pixels. When present, these values allow the browser to
reserve the space required for the image as it renders the rest of the page,
which results in the images appearing to fill in after the text has been
displayed. The end result is a page that gives the appearance of loading
faster, which can result in fewer impatient visitors leaving your site before
the content can be displayed.
The following example puts a single image inline within a paragraph, with
the image space reserved using the height and width attributes (see
Figure 3.1):
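A minimal sketch of such a paragraph, assuming a hypothetical image file, alternative text, and dimensions:

```html
<p>
  Our newest exhibit opened this spring.
  <!-- width and height are given in pixels so the browser can reserve the space -->
  <img src="exhibit.jpg" alt="Visitors at the new exhibit" width="300" height="200" />
  The surrounding text continues on the same line as the image.
</p>
```

Because the dimensions are stated up front, the text can be laid out before the image file has finished downloading.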
Figure 3.2: Changing the image alignment to middle only adjusts its
position relative to the baseline.
For smaller images, this might be okay, but with something this large, the
placement is awkward to say the least.
What helps in this situation is to align the image relative to the margins
using align="left" or align="right". Then, the image is solidly docked at
the margin, and the text will flow around it (see Figure 3.3).
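A sketch of the right-aligned variant follows. The image file and dimensions are hypothetical, and because align is a presentational attribute, the fragment assumes the XHTML 1.0 Transitional doctype.

```html
<p>
  <!-- align="right" docks the image at the right margin (Transitional doctype assumed) -->
  <img src="exhibit.jpg" alt="Visitors at the new exhibit" width="300" height="200" align="right" />
  The image sits at the right margin, and this text flows down its left side
  until it clears the bottom edge of the picture.
</p>
```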
Figure 3.3: The image is aligned to the right, with the text then flowing
around it on the left side.
Figure 3.4: This site uses two types of buttons to guide users to different
functions of the bank.
Figure 3.5: The file folder tabs are a popular navigation metaphor.
The process of linking graphics is as simple as linking text; the image element is placed inside the anchor element:
<a href="foo.html"><img src="bar.gif" alt="bar" /></a>
EXAMPLE
To simulate a navigation menu, I've created three buttons with the labels
"Option 1," "Option 2," and "Option 3."
TIP
These buttons were created using the buttonize effect available in Paint Shop Pro 6
(shareware). Most current graphics editors have a similar effect available.
Each button will be placed inside its own anchor, linked to a corresponding
option page:
<a href="option1.html"><img src="opt1.gif" alt="option 1" width="100"
height="60" /></a>
<a href="option2.html"><img src="opt2.gif" alt="option 2" width="100"
height="60" /></a>
<a href="option3.html"><img src="opt3.gif" alt="option 3" width="100"
height="60" /></a>
The initial results, shown in Figure 3.6, have two issues that might need to
be addressed: There's a bright blue border around the images, and white
space between each of them.
Next, to address the spacing between images, we can move each of the
anchors onto the same line of text in the source HTML document. Browsers
properly interpret the new line or carriage return/new line characters that
many Windows- or Mac-based text editors produce as white space.
Removing those by writing the links on a single line will then remove the
white space (see Figure 3.7).
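Both adjustments together might look like the following sketch. It reuses the three button files from the example above and assumes the link border is suppressed with border="0", an attribute available in the Transitional doctype.

```html
<!-- All three anchors share one source line, so no white space is rendered between the buttons -->
<a href="option1.html"><img src="opt1.gif" alt="option 1" width="100" height="60" border="0" /></a><a href="option2.html"><img src="opt2.gif" alt="option 2" width="100" height="60" border="0" /></a><a href="option3.html"><img src="opt3.gif" alt="option 3" width="100" height="60" border="0" /></a>
```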
Figure 3.7: The linked buttons with white space and bordering removed.
Image Maps
The Web site of one of my favorite public places, the Monterey Bay
Aquarium, features three separate image maps on its home page; two traditional navigation bars, and a third based on a composite collage image
(see Figure 3.8). Both versions use the same techniques; the collage image
simply lets the designer break out of the grid-like effect that using individual images as links can produce.
The basic idea behind an image map is that specified regions of the image
are identified as hot spots by mapping their coordinates to a linked URL
using the <area> element. The browser captures the exact coordinates of
the spot where the user mouse-clicks within the image, and activates the
corresponding link.
Hot spots can be drawn in one of four ways:
Rect (rectangle): two coordinate pairs, the upper-left corner and the
lower-right corner, define the rectangle.
Circle: three values are used, namely the x/y coordinates of the circle's
center point and the radius length.
Poly (polygon): at least three sets of x/y coordinates mark the corners
of the polygon; repeating the first set as the last set closes the shape.
Default: not a shape so much as the set of all coordinates on the
image that are not otherwise defined in a hot spot.
EXAMPLE
Begin by launching the map editor software. Start a new map definition by
choosing File, New Map and locating the image you'll be working on within
the Open dialog box. CuteMAP's interface consists of three panes (see
Figure 3.9). The image is displayed in the largest pane, with a source view
beneath it. To the left is a data pane, where you'll enter the URL,
alternative text, and other data associated with each hot spot within the map.
Figure 3.11: Entering data for the newly created hot spot.
The corresponding HTML is seen in the code pane, beneath the image. The
program allows you to copy the code to the Clipboard for import into your
text editor. The end result appears in Listing 3.1.
CAUTION
Most of the image mapping tools available, including CuteMAP, aren't accustomed to the
case sensitivity and well-formedness requirements of XHTML. If you import the results
directly into your text editor, you'll have some cleaning up to do, as you'll instantly have
invalid XHTML due to the case issues. I tend to write the definitions by hand, copying
over the coordinates. This situation should improve with the next versions of these
tools.
Listing 3.1: An Image Map of the State of Hawaii
<img src="hawaii.gif" usemap="#hawaii.gif" width="504" height="386"
border="0" alt="Image map for the state of Hawaii" />
<map name="hawaii.gif">
<area shape="rect" coords="2,14,66,69" href="kauai.html"
alt="Island of Kauai" />
<area shape="circle" coords="178,104,43" href="oahu.html"
alt="Island of Oahu" />
<area shape="rect" coords="238,122,303,147" href="molokai.html"
alt="Island of Molokai" />
<area shape="poly" coords="297,157,315,138,398,173,336,214"
href="maui.html" alt="Island of Maui" />
<area shape="rect" coords="354,223,501,374" href="hawaii.html"
alt="Island of Hawaii" />
</map>
What's Next
In this chapter you've learned how to incorporate images into your Web
documents, varying alignment and placement within the page. You can
create links and maps using images to define visually pleasing navigation
systems.
Next, in Chapter 4, "Collecting Data with Forms," you'll learn how to collect
data from your site visitors using forms. A basic Perl-based CGI script for
processing the form submission is supplied.
4
Collecting Data with Forms
As witnessed by today's booming dot-com economy, nothing has been more
vital to the success of the Web as a commercial medium than the ability to
collect data from Web site visitors. Sales of information, services, and
products of all kinds are contributing to the billions of dollars changing
hands each year online, and each transaction begins with a simple form.
Forms combine not just XHTML elements and content, but additional
scripts and programmatic actions in order to collect the data from the user
and deliver it to the site owner by email, store it on the Web server, or perhaps insert it directly into a database.
This chapter teaches you:
How to use each of the 10 form controls
How form content is processed
How to implement a simple CGI script written in Perl
The method attribute takes one of two values: get or post. These values
determine how the data collected in your form is sent to the server. You'll
learn about the options in more detail in the "Form Processing Options"
section later in this chapter. For now, you should know that post is the
most commonly used method.
The script or program used to actually process the form data is referenced
in the action attribute. This value is the full URL for the program or script.
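A minimal sketch of a form element carrying both attributes follows; the script URL and the control names are placeholder assumptions.

```html
<!-- method chooses how the data travels; action names the processing script -->
<form method="post" action="http://www.example.com/cgi-bin/form.cgi">
  <p>Your name: <input type="text" name="visitor" size="20" /></p>
  <p><input type="submit" /></p>
</form>
```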
Buttons
Image controls
We'll look at each of these later in the chapter. Each of the 10 input control
types might use the same collection of attributes. These are defined in
Table 4.1.
Table 4.1: Attributes Available for the Input Element
Attribute
Use
type
name
value
checked
disabled
readonly
size
maxlength
src
alt
usemap
tabindex
accesskey
onfocus, onblur, onselect, onchange
accept
align
Typically you'll use the type and name attributes, and perhaps size and
maxlength. As I introduce each form control type, you'll have an opportunity
to use the attributes commonly associated with that control.
TEXT BOXES
Probably the most commonly used input control is the text box (see Figure
4.1). You see these all the time when asked to enter your name or your
email address.
EXAMPLE
Figure 4.1: This figure shows a simple text box input control.
Of the attributes available to this element, only type and name are required.
For practical use, however, you'll also be using the size attribute, especially
when working with multiple inputs, to provide a uniform appearance while
at the same time allowing the user to see his or her entire entry in most
instances. The text box in Figure 4.1 was created using the following
XHTML. Notice that input is an empty element, using the shorthand slash
before the closing bracket to complete the tag set:
<p>Please enter your name: <input type="text" name="foo" size="20" /></p>
With a size of 20, most browsers will make the box as wide as is necessary
to fit 20 characters of the default fixed-width font being used. If that's
8-point Courier, the box will be narrower than 20 characters of text set
elsewhere in the document in 14-point Arial Bold.
If you'd like to control the number of characters the user can enter, the
maxlength attribute can be added.
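As a sketch, the text box from Figure 4.1 could be limited as follows; the maxlength value of 40 is an arbitrary assumption.

```html
<!-- Shows about 20 characters at a time but accepts no more than 40 -->
<p>Please enter your name: <input type="text" name="foo" size="20" maxlength="40" /></p>
```

Here size governs only the visible width, while maxlength caps what the user can type.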
PASSWORD CONTROLS
The password control is nearly identical to the text box control. The only
functional difference is that the user's input is hidden behind a masking
character, such as an asterisk (*), as they type. Password entry is the most
common need for visual security, but you also can use password controls for
credit card number entry or the collection of other sensitive information when the user
might not be in a private location, such as at a Web kiosk or cyber café.
Password controls are created using the type value of password instead of
text:
<input type="password" name="foo" size="20" />
EXAMPLE
CAUTION
Though the user input into the password control is visually masked, the data doesn't
receive additional encoding when it's sent for processing with the rest of the form
input. Information that needs to be secure during that transmission will need more
handling (such as a connection through SSL) than just a password control.
EXAMPLE
The option text gives the user the cue to check or uncheck the box, as
shown in Figure 4.2. Notice, though, that the option text is not contained
within the input element. Instead, it is placed after the element is closed.
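A minimal check box along these lines might be written as follows; the option text is a hypothetical example.

```html
<!-- The option text follows the empty input element; it is not contained inside it -->
<p><input type="checkbox" name="checkbox_name" /> Send me the monthly newsletter</p>
```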
Using this configuration, if the check box is selected, the form data would
be returned as
checkbox_name=on
Many times, though, you'll want a value more descriptive than "on." If you
add the value attribute to the input element, that predefined text will be
sent instead. For example:
<input type="checkbox" name="manufacturer" value="ford" /> Ford
would pass
manufacturer=ford
which gives the data recipient a bit more context as to the original question. You can group check boxes together by giving each of them the same
name, but varying the value.
EXAMPLE
Paolo's Pizza lets local customers order their pizza online. They need to
allow the user to select multiple toppings to be added to the pizza, yet they
want to limit the input to the toppings they have on hand. A text box
wouldn't be appropriate in this case, because that would allow free-form
entry that could bring in requests they couldn't fulfill. Grouping a set of
check boxes handles this task nicely:
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
<input type="checkbox" name="toppings" ...
Figure 4.3 illustrates how the check boxes are placed one after the other if
no additional formatting is used between them.
Based on the user's selections, the script would return something like
toppings=pepperoni&toppings=extra%20cheese&toppings=onions
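A group that could produce this result might be sketched as follows. Only the three values shown in the returned data come from the example; the visible labels and the line breaks between the boxes are assumptions.

```html
<p>
  <!-- Same name, different value: each checked box contributes one toppings=... pair -->
  <input type="checkbox" name="toppings" value="pepperoni" /> Pepperoni<br />
  <input type="checkbox" name="toppings" value="extra cheese" /> Extra cheese<br />
  <input type="checkbox" name="toppings" value="onions" /> Onions
</p>
```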
Whichever button is selected will determine the value paired with shipping.
Figure 4.3: A group of check boxes using the same name value helps
logically group the information returned.
TIP
You can provide your users a gentle nudge in a certain direction if you tell the browser
to preselect one option. To do so, use the checked="checked" attribute.
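A sketch of a preselected option in a radio button group follows. The name shipping matches the pairing mentioned above, while the values and labels are hypothetical.

```html
<p>
  <!-- Radio buttons sharing a name are mutually exclusive; checked preselects one of them -->
  <input type="radio" name="shipping" value="ground" checked="checked" /> Ground<br />
  <input type="radio" name="shipping" value="overnight" /> Overnight
</p>
```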
<select name="size">
<option value="s">small</option>
<option value="m">medium</option>
<option value="l">large</option>
<option value="xl">extra large</option>
</select>
Figure 4.4 shows the traditional rendering of this control (this time in
Netscape Navigator 4.72), both unactivated and after user activation.
The list box doesn't necessarily have to have a size attribute value that's
equal to the number of option elements. If, for some reason, you wanted to
create a list box with 20 elements, you could begin with 5 of them showing
using a size attribute value of 5. A scrollbar would then appear allowing
the user to see the remaining options (see Figure 4.5).
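A shortened sketch of such a list box is shown here, with six hypothetical options instead of twenty; the control name and option values are assumptions.

```html
<!-- size="5" shows five rows; the sixth option is reached with the scrollbar -->
<select name="paper" size="5">
  <option value="white">White bond</option>
  <option value="ivory">Ivory linen</option>
  <option value="gray">Gray laid</option>
  <option value="cream">Cream wove</option>
  <option value="blue">Blue parchment</option>
  <option value="recycled">Recycled natural</option>
</select>
```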
A final option for the select control is to allow the user to choose multiple
options from within the list. This function is controlled with the multiple
attribute, written as
<select name="foo" multiple="multiple">
option elements
</select>
Figure 4.5: A list box-style select control allows users to see more choices
without activating the control.
CAUTION
Although the select control has the ability to use multiple selections, take care in using
this feature. The act of holding the Ctrl key when making multiple selections tends not
to be intuitive to many users. Consider a set of checkboxes for multiple selections
instead.
OF TEXT
There will be times when it is desirable to collect more than the few words
or short sentence that can be entered into a reasonably sized text box. The
text area control was designed to help solve that problem.
Text areas are formed a little differently from most other controls in that
they are not empty elements and can accept plain text between the opening
and closing tags. Additionally, they have two attributes that control the size
of the control instead of one. The rows attribute manages the number of rows
that are visible in the control (the height measurement), and the cols
attribute determines how many characters are visible horizontally (the
width measurement).
EXAMPLE
The Webmaster creates a text area control that contains the instruction
"Please describe your problem in detail here." between the <textarea> tags.
<textarea name="details" rows="6" cols="50">Please describe your problem in
detail here.</textarea>
The rows attribute value of 6 meets the height requirement for the control,
and the cols attribute value of 50 provides a comfortable width for most
users. Figure 4.6 shows the results.
OUTPUT
Figure 4.6: A text area with associated instructions for the user provides a
hint as to what you expect.
TIP
Remember that the width of form controls is based on the default font in use for the
browser. Fifty characters in width might appear small to a designer using a monitor set
to 1024×768 (or even higher) resolution, but to a user on a 640×480 resolution
monitor, it might fill the entire screen. Be sure not to set a cols value so high as to force
horizontal scrolling for those users on the lower resolution machines.
TO THE SCRIPT
In almost any data exchange with your visitors, there is likely to be some
information that is static, that is, it doesn't change with each user. Perhaps
you have several different forms on your site, and you want to be able to
determine at a glance which form generated the response. The hidden form
control handles this for you nicely.
The hidden control will always have three attributes: type, name, and value:
<input type="hidden" name="foo" value="My value here" />
Size or other presentational attributes aren't an issue, as the control isn't
visible to the end user. Although they aren't displayed, the data do still pass
back and forth from the server to the user and back to the processor, so use
hidden controls sparingly for important information.
FILE INPUT CONTROL
EXAMPLE
Figure 4.7 illustrates what the field might look like when the dialog box is
generated after the button is activated. The actual implementation does
vary from browser to browser.
Figure 4.7: The browse action of the file control allows users to locate the
file desired.
NOTE
If you do include a file upload option in your form, you'll need to add an additional
attribute to the form element: enctype. The enctype attribute controls the encoding type, or
Internet Media Type, of the data that is going to be passed. When a file is included, there will be
two parts: the form data and the file. Therefore, the enctype value must be set to
multipart/form-data to transmit both pieces successfully.
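A sketch of a form that accepts a file upload follows. The script URL and the control name are placeholder assumptions; the enctype value is the one named in the note above.

```html
<!-- enctype must be multipart/form-data when a file control is present -->
<form method="post" action="http://www.example.com/cgi-bin/upload.cgi"
      enctype="multipart/form-data">
  <p>Artwork file: <input type="file" name="artwork" /></p>
  <p><input type="submit" /></p>
</form>
```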
BUTTONS
Nearly everyone is familiar with the ubiquitous "Submit" and "Reset" buttons
seen on many forms. Both of these input types have special properties.
The minimal syntax for each is shown here:
<input type="submit" />
<input type="reset" />
Figure 4.8: Customized text on submit and reset buttons allows for more
intuitive instructions.
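The text shown on each button is set with the value attribute. A sketch with hypothetical labels:

```html
<!-- value replaces the browser's default button text -->
<input type="submit" value="Send My Order" />
<input type="reset" value="Start Over" />
```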
A third button type, this one without any special properties, is also available. It uses the generic type value of button. Designed for use with
scripting, the button type is supported only in some of the newer browsers,
and should be used with caution.
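A minimal sketch of a generic button wired to a script is shown below. The label and the alert message are hypothetical, and the inline onclick handler is just one assumed way of attaching the script.

```html
<!-- A generic button does nothing on its own; here a script handler is attached to it -->
<input type="button" value="Show Shipping Help"
       onclick="window.alert('Orders normally ship within 48 hours.');" />
```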
IMAGE CONTROLS
The final input type is the image control. Used as a replacement for the
"Submit" button, many designers favor it because it allows them to keep
each element of their form page within the desired look and feel. Because
you're working with a separate file for the image, the input element needs
an additional attribute of src. The value is the URL for the image file:
<input type="image" src="myimage.gif" alt="Order Now!" />
EXAMPLE
Presto Printers wants to begin taking in business over the Internet. If customers can upload camera-ready artwork, Presto can turn around their
business card or letterhead order in 48 hours. Aside from the file, the customer will need to provide paper selections, quantities, shipping and billing
address, payment information, and any special instructions. The manager
also would like to offer customers an account registration option, so that
they can save information from the initial order for one-step re-ordering.
The complete form is developed as follows in Listing 4.1.
Listing 4.1: form.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Presto Printers Job Spec Sheet</title>
</head>
<body>
<h1>Presto Printers Job Spec Sheet</h1>
<form method="post" action="http://www.webgeek.com/cgi-bin/forms.cgi">
<input type="hidden" name="form" value="Job Spec Sheet" />
<h2>Shipping Address</h2>
Customer Name: <input type="text" name="name" size="30" /><br />
Address Line 1: <input type="text" name="add1" size="30" /><br />
Address Line 2: <input type="text" name="add2" size="30" /> <br />
City: <input type="text" name="city" size="30" /> <br />
State/Province: <input type="text" name="state" size="15" />
Zip/Postal Code: <input type="text" name="zip" size="10" /><br />
Daytime Telephone: <input type="text" name="phone" size="12" /><br />
Now it's time to actually do something with the data collected. To
accomplish that, you'll need a script or program to process the information.
conjunction with the Web server software. The use of these systems for
forms processing goes beyond the scope of this text, so I won't go into them
any further.
CAUTION
Avoid the temptation to add additional parameters to the mailto: value. A popular,
though flawed, trend on the Web is to try to insert subject lines and even full text
sections into email messages generated by forms by adding specific strings to the mailto:
data. The use of these techniques violates accepted Internet mail protocols. Because
of this, you'll likely have further difficulties in getting a mailto:-based form to process
the data as desired.
If you do manage to get it to work, you'll get the data back in what's known
as URL-encoded form. This is one long string of characters composed of the
name=value pairs from each form element, with the spaces, carriage
returns, and line feeds between and within those pairs converted to their
equivalent ASCII character codes.
The instructions are passed through the gateway in the form of scripts,
which are essentially miniature programs that perform limited functions.
CGI scripts can be written in quite a few different programming languages,
but the most prevalent remains Perl.
Perl is an interpreted language, meaning that a program known as a Perl
interpreter on the server processes the script. If the script were executable
on its own, it would be known as a compiled program, or executable program. Perl interpreters are available for most computer platforms, including all variations of UNIX/Linux, 32-bit Windows (Windows NT, 95/98),
Solaris, and almost any other popular platform that also is capable of
running a Web server.
To run CGI scripts, you must have a directory within your Web site that is
set up with the proper permissions. The system administrator normally
does this. On a UNIX- or Linux-based system, the directory is usually
named cgi-bin, and is often found at or just below the root directory of the
Web site. If you have control over your own Web server, you can set this up
yourself. If you use Web hosting space on an ISP or IPP, you'll need to ask
your provider if they allow CGI scripts, and if so, which directories you
should be using.
The following script (Listing 4.2) is a generic form handling script written
in Perl. It's not important for you to know what each of the lines means, but
you will need to edit the first line of the script, known as the path to Perl,
and several of the variables defined in lines two through six. You'll come
back to those details in a moment.
Listing 4.2: form.cgi
#!/usr/bin/perl
# Line 1 (above) The path to Perl. Edit it as necessary for your server.
# Line 2 Enter the email address where you wish the response
# to be delivered. The @ character must be escaped, as in \@
# Line 3 Edit the subject line between the quote characters.
# Line 4 Enter a URL to be displayed after the form has been
# processed. Users will click this link to continue viewing
# your site.
# Line 5 The title of the page referenced in the URL in Line 4
# Line 6 Ask your system administrator for the path to sendmail on your
# system. Edit this line as necessary.
# --
$email = "you\@yourcompany.com";           # Line 2
$subject = "Your Subject Here";            # Line 3
$nextpage = "http://www.yoururl.com";      # Line 4
$title = "Page title for URL in Line 4";   # Line 5
$SENDMAIL = "/usr/sbin/sendmail -t";       # Line 6
TO PERL AND THE SIX VARIABLES
The information stored in the six variables should be pretty
self-explanatory. What you need to be careful with is how you enter them
within the script.
Path to Perl (line 1): Edit this line as necessary, using the information
provided by your system administrator for the path to Perl. This
line will always begin with the #! characters.
Email Address (line 2): This is the address to which all form
responses will be delivered. It should be an address that someone will
be checking frequently, and that does not have any artificial limit on
the number of messages that might be stored there. Perl can only
properly process the email address if the @ character is escaped.
Escaping a character means alerting the Perl interpreter that the
character should simply be passed on as text, rather than interpreted
as a programming instruction (in Perl, the @ character indicates the
beginning of an array name). To escape the character, a backslash
must be placed in front of it. Therefore the email address I use for this
book would be xhtml\@webgeek.com.
Next Page URL (line 4): This variable holds the URL that you will
make available to the user after she submits your form. It should lead
back to a logical point: your site's home page, the beginning of your
online catalog for an ecommerce site, or some other major division
within the site. The URL must be fully qualified, that is, it must include
the protocol and any fragment identifiers required. For example:
http://www.mycompany.com/catalog/
Title (line 5): What is the title of the page you just sent your visitors
to in line four? That information goes here. For example:
Widgets, Inc. Online Catalog
WORKING WITH NON-UNIX SERVERS
Although it's most popular on UNIX-based Web servers, Perl isn't limited
to that platform. Ports of the language are available in both Windows and
Macintosh versions. Small variations in syntax might be required when
working on one of these other platforms, but the root language remains the
same.
Of course, not all CGI scripts must be written in Perl. Many are written in
C, C++, various forms of BASIC, and a whole host of other programming
languages. Consult your system administrator to find out which languages
are available to you.
To learn more about Perl, or to download a copy of the Perl interpreter for
any of these platforms, visit http://www.perl.com.
What's Next
In this chapter you have learned how to create XHTML pages that allow you
to collect information from your site visitors through XHTML forms. You've
been introduced to two methods of processing form data, mailto: and CGI
scripts, and you can edit a simple Perl-based script for use on many Web sites.
Next up in Chapter 5, "Working with Tables," we'll look at creating XHTML
tables, both for tabular content and to format content in an easier-to-read
manner, such as the Presto Printing form just created in this chapter.
5
Working with Tables
In the early days of HTML, one of the most challenging page design problems dealt with the inclusion of tabular content. How could you readily
structure data that was more than just a simple list? Authors needed to
create grids, spreadsheets, and other information that rightfully belonged
in a table if it were to be printed on paper, yet HTML didn't always have
this facility. Thankfully, tables were included in HTML 3.2, and are again
available in enhanced form in XHTML 1.0.
After their introduction, Web designers quickly realized that tables could be
used for more than just truly tabular content. By organizing the content of
a page into columns and rows, enhanced visual designs could be achieved.
Some early WYSIWYG design tools went overboard with this idea, trying to
give the designer "pixel-perfect" placement control over page content,
resulting in dozens and dozens of tables per page. I won't go so far as to tell
you that you can't use tables for visual layout, but when you do, you should
understand the ramifications of doing so when your pages are viewed by
alternative devices.
This chapter teaches you:
How to create a basic table
How to merge columns and rows
How to create tables within tables
How to improve the usability of tabular data
<tr>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
</tr>
NOTE
The non-breaking space character is used as filler in these table cells to help you
visualize the outcome when viewed in the browser. Completely empty table cells are
often collapsed by the browser and display as simply blank space.
The process is repeated for the second and third rows, before the table is
closed:
<tr>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
</tr>
<tr>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
<td width="100" height="100">&nbsp;</td>
</tr>
</table>
Populate the cells with X's and O's as you please. Figure 5.1 shows my
choices. When first entered, the letters are centered vertically in the cell,
but are sitting at the left border of each, which looks a little funny. To fix
this, we'll need to add horizontal alignment for the cell contents.
Alignment can be added at the cell level, or for an entire row. Horizontal
alignment is set using the align attribute, whereas vertical alignment is
set using valign.
Because each cell's alignment within a row is the same, I can place the
attributes on the table row element. I've used center for the horizontal
alignment, and middle for vertical alignment:
<tr align="center" valign="middle">
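As a sketch of how row-level and cell-level alignment interact, the fragment below sets align and valign on the tr element and then overrides the horizontal alignment on a single cell. The X and O entries and the border value are illustrative assumptions, not the choices shown in Figure 5.1.

```html
<!-- One row of the tic-tac-toe board; cell contents are example values -->
<table border="1">
  <tr align="center" valign="middle">
    <td width="100" height="100">X</td>
    <td width="100" height="100">O</td>
    <!-- An align attribute on the cell takes precedence over the row setting -->
    <td width="100" height="100" align="right">X</td>
  </tr>
</table>
```

Setting the attributes once on the row keeps the markup short; the cell-level attribute is only needed for the exceptions.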
Much more can be done with tables than the simple alignments that we've
done in this first example. Next, we'll prepare a monthly-view calendar that
will utilize background color for cells, headers, a caption, and multi-column
cells.
The calendar begins with a seven-column row of cells containing the days of
the week. These cells act as headings for the rest of the cells in the column,
so they're defined using the table header (th) element. The characteristics of
each cell will be the same, with the sole exception of the actual content.
Therefore the attributes can be placed on the TR element, impacting the
entire row, instead of repeated seven times, once per cell.
You've used the alignment attributes already in the previous example, but
the background color attribute, at least as it pertains to tables, is new. The
attribute is bgcolor, just as it is with the body element. The value is a color
name or hexadecimal value. I've chosen a pale yellow, with a hex value of
#FFFF99:
<table border="1">
<tr align="center" valign="top" bgcolor="#FFFF99" height="35" width="100">
<th>Sunday</th>
<th>Monday</th>
<th>Tuesday</th>
<th>Wednesday</th>
<th>Thursday</th>
<th>Friday</th>
<th>Saturday</th>
</tr>
In the next row, the first row of date cells, the first five cells don't have any
data. We could simply place a non-breaking space in them as you did when
creating the tic-tac-toe board. However, we want those cells to mimic print
calendars, in that this leading blank area will be a single solid space, not
five empty boxes. To do that, a single cell needs to occupy the first five
columns. This is achieved using the colspan attribute on the td element. As
you might guess, the value is the number of columns to be spanned by the
cell in question. The alignment attributes are set to left for horizontal, and
top for vertical, placing the date numbers in the upper-left corner of each
cell. There is a corresponding rowspan attribute that can be used to span
more than one row per cell. In this example, we only need to use colspan:
<tr align="left" valign="top">
<td colspan="5" bgcolor="#CCCCCC" width="100" height="100">&nbsp;</td>
<td width="80" height="80">1</td>
<td width="80" height="80">2</td>
</tr>
NOTE
Even though the first cell in this row spans five columns, the height and width
attributes are set as if the cell occupied a single column. With a colspan value of 5,
the cell then occupies 500 pixels of space (100 times 5), yet the same height space,
because the rowspan hasn't been manipulated.
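The rowspan attribute is mentioned above but not used in the calendar. A minimal sketch, separate from the calendar and using made-up cell contents, might look like this: the first cell spans two rows, so the second row supplies only the remaining cell.

```html
<table border="1">
  <tr>
    <!-- This cell extends down through both rows -->
    <td rowspan="2">Spans two rows</td>
    <td>Row 1, column 2</td>
  </tr>
  <tr>
    <!-- Column 1 is already occupied by the spanning cell above -->
    <td>Row 2, column 2</td>
  </tr>
</table>
```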
Each of the next four rows will look the same, excepting the numeric date
content of the cell:
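<tr align="left" valign="top">
<td width="80" height="80">3</td>
<td width="80" height="80">4</td>
<td width="80" height="80">5</td>
<td width="80" height="80">6</td>
<td width="80" height="80">7</td>
<td width="80" height="80">8</td>
<td width="80" height="80">9</td>
</tr>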
The final product is shown both in Listing 5.2 and Figure 5.3.
Listing 5.2: Monthly Calendar View
<table border="1">
<tr align="center" valign="top" bgcolor="#FFFF99">
<th width="80" height="25">Sunday</th>
<th width="80" height="25">Monday</th>
<th width="80" height="25">Tuesday</th>
<th width="80" height="25">Wednesday</th>
<th width="80" height="25">Thursday</th>
<th width="80" height="25">Friday</th>
<th width="80" height="25">Saturday</th>
</tr>
<tr align="left" valign="top">
<td colspan="5" bgcolor="#CCCCCC" width="80" height="80">&nbsp;</td>
<td width="80" height="80">1</td>
<td width="80" height="80">2</td>
</tr>
<tr align="left" valign="top">
<td width="80" height="80">3</td>
<td width="80" height="80">4</td>
<td width="80" height="80">5</td>
<td width="80" height="80">6</td>
<td width="80" height="80">7</td>
<td width="80" height="80">8</td>
<td width="80" height="80">9</td>
</tr>
<tr align="left" valign="top">
<td width="80" height="80">10</td>
<td width="80" height="80">11</td>
<td width="80" height="80">12</td>
<td width="80" height="80">13</td>
<td width="80" height="80">14</td>
<td width="80" height="80">15</td>
<td width="80" height="80">16</td>
</tr>
<tr align="left" valign="top">
<td width="80" height="80">17</td>
<td width="80" height="80">18</td>
<td width="80" height="80">19</td>
<td width="80" height="80">20</td>
<td width="80" height="80">21</td>
<td width="80" height="80">22</td>
Figure 5.3: This monthly calendar view uses merged cells and background
coloring.
Figure 5.4: Our previously created form needs some visual alignment.
Looking over the form, it's clear that we can nicely line up the input fields
and the labels in two distinct columns. We'll start by placing the first section (shipping address information) into our table:
<table border="1">
<tr>
<td align="right">Customer Name:</td>
<td><input type="text" name="name" size="30" /></td>
</tr>
<tr>
<td align="right">Address Line 1:</td>
<td><input type="text" name="add1" size="30" /></td>
</tr>
<tr>
<td align="right">Address Line 2:</td>
<td><input type="text" name="add2" size="30" /></td>
</tr>
<tr>
<td align="right">City:</td>
<td><input type="text" name="city" size="30" /></td>
</tr>
<tr>
<td align="right">State/Province:</td>
<td><input type="text" name="state" size="15" /></td>
</tr>
<tr>
<td align="right">Zip/Postal Code:</td>
<td><input type="text" name="zip" size="10" /></td>
</tr>
<tr>
<td align="right">Daytime Telephone:</td>
<td><input type="text" name="phone" size="12" /></td>
</tr>
<tr>
<td align="right">Fax:</td>
<td><input type="text" name="fax" size="12" /></td>
</tr>
</table>
I've set the border attribute on the table element to a value of 1 to emphasize the changes brought about by the table. For the final version, we'll
take it out. Figure 5.5 shows what we've done so far.
Figure 5.5: The first section of the form is enclosed in a table for better
visual alignment.
The heading and check box for the billing address section can remain as
they originally were. The actual address fields will go in another table,
structured as with the shipping address (see Figure 5.6):
<h2>Billing Address</h2>
<input type="checkbox" name="bill-same" value="yes" /> Check here if same as
shipping address
<table border="1">
<tr>
<td align="right">Address Line 1:</td>
<td><input type="text" name="badd1" size="30" /></td>
</tr>
<tr>
<td align="right">Address Line 2:</td>
<td><input type="text" name="badd2" size="30" /></td>
</tr>
<tr>
<td align="right">City:</td>
<td><input type="text" name="bcity" size="30" /></td>
</tr>
<tr>
<td align="right">State/Province:</td>
<td><input type="text" name="bstate" size="15" /></td>
</tr>
<tr>
<td align="right">Zip/Postal Code:</td>
<td><input type="text" name="bzip" size="10" /></td>
</tr>
</table>
TIP
The two tables in Figure 5.6 are of different sizes. To make the presentation even more
visually uniform, size each of the tables to the same value using the width attribute on
the table element.
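To illustrate the tip, both tables could be given the same width attribute. The value of 400 pixels below is an arbitrary assumption, and each table is cut down to a single row for brevity.

```html
<!-- Shipping address table -->
<table border="1" width="400">
  <tr>
    <td align="right">Customer Name:</td>
    <td><input type="text" name="name" size="30" /></td>
  </tr>
</table>

<!-- Billing address table, given the same width as the one above -->
<table border="1" width="400">
  <tr>
    <td align="right">Address Line 1:</td>
    <td><input type="text" name="badd1" size="30" /></td>
  </tr>
</table>
```

Equal table widths do not by themselves force the label columns to match; setting a width on the first cell of each table is one way to line those up as well.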
Placed all together, with the borders removed and a standard submit button, the final result is shown in Listing 5.3 and Figure 5.7. Much nicer,
isn't it?
Listing 5.3: The Revised Form
<form method="post" action="http://www.webgeek.com/cgi-bin/forms.cgi">
<input type="hidden" name="form" value="Job Spec Sheet" />
<h2>Shipping Address</h2>
<table border="0">
<tr>
<td align="right">Customer Name:</td>
<td><input type="text" name="name" size="30" /></td>
</tr>
<tr>
<td align="right">Address Line 1:</td>
<td><input type="text" name="add1" size="30" /></td>
</tr>
<tr>
<td align="right">Address Line 2:</td>
<td><input type="text" name="add2" size="30" /></td>
</tr>
<tr>
Figure 5.7: The newly aligned table as seen in the Web browser.
Nesting Tables
An alternative to complicated spanned layouts is available by creating
entire tables within a table, a technique known as nesting. A common situation where you'll find nested tables is when the site layout is one large
table, usually with a navigation column down the left, and the main content in a large cell on the right. Then, any tabular information in the main
content area will need to be inside a table within that cell.
I've created a page on my personal Web site that details current and
upcoming appearances at trade shows and other industry events. The page
uses just this type of layout, beginning with a single-row, two-column table:
<table border="0">
<tr>
<td width="200" bgcolor="#FFFF99">
<p><strong>Getting Around</strong></p>
<ul>
<li>Upcoming Events</li>
<li><a href="books.html">Books</a></li>
<li><a href="musings.html">Musings</a></li>
</ul>
</td>
<td align="center">content goes here</td>
</tr>
</table>
Next we'll go back and fill in the content on the second cell, which consists
of a two-row by three-column table with headers and a caption:
<table border="1">
<caption>Upcoming Events</caption>
<tr bgcolor="#9999FF">
<th>Event Name</th>
<th>Location</th>
<th>Dates</th>
</tr>
<tr>
<td>Y2K Pan-Pacific Conference</td>
<td>Waikiki Beach, Hawaii</td>
<td>October 19-21, 2000</td>
</tr>
<tr>
<td>Builder.com Live!</td>
<td>New Orleans, Louisiana</td>
<td>December, 2000</td>
</tr>
</table>
Figure 5.8: A set of nested tables is shown here in a popular form: one large
table for layout, the other has tabular content.
<td>Y2K Pan-Pacific Conference</td>
<td>Waikiki Beach, Hawaii</td>
<td>October 19-21, 2000</td>
<td>new content here</td>
</tr>
<tr>
<td>Builder.com Live!</td>
<td>New Orleans, Louisiana</td>
<td>December, 2000</td>
<td>To Be Announced</td>
</tr>
</table>
The new content in the last cell for the Pan-Pacific Conference will be a
three-column, three-row table with headers:
<table border="1">
<tr bgcolor="#99FFFF">
<th>Session Type</th>
<th>Session Name</th>
<th>Date/Time</th>
</tr>
<tr>
<td>Workshop</td>
<td>Basic XML</td>
<td>TBD</td>
</tr>
<tr>
<td>Workshop</td>
<td>Intermediate XML</td>
<td>TBD</td>
</tr>
<tr>
<td>Presentation</td>
<td>Introduction to XML</td>
<td>TBD</td>
</tr>
</table>
Figure 5.9: A third table is nested inside the original nested table.
you be able to match the data with the header? XHTML provides several
attributes that can be added to TH and TD elements to provide this identifying information.
To illustrate, let's build a small table that sorts what my family ordered the
last time we went out to dinner. Each of our choices for entrée, beverage,
and dessert is listed.
The table begins as all others do, with the table element and a caption:
<table border="1" summary="What the Navarro Family had for dinner at Los Amigos
Restaurant">
<caption>Dinner at Los Amigos</caption>
The next segment is the first table row, with the header cells. Each TH
element is given an id attribute with a unique value, as all IDs are
required to be unique:
<tr>
<th id="c1">Name</th>
<th id="c2">Entrée</th>
<th id="c3">Beverage</th>
<th id="c4">Dessert</th>
</tr>
In a traditional browser, the results are as they would be without the inclusion of the id and headers attributes (see Figure 5.10).
Where the real change occurs is when a speech synthesizer or other adaptive technology reads this table for users with sight difficulties. Without the
additional data, a listener might hear:
Dinner at Los Amigos. Name, Entrée, Beverage, Dessert, Dave,
Chimichanga, Margarita, Flan, Ann, Enchiladas, Sangria, Sopapillas,
Linda, Tostada, Pepsi, Fried Ice Cream.
After only a few iterations, it can become very difficult to keep track of
which items apply to what header. With the addition of the id and headers
attributes, the listener would hear something like this:
Dinner at Los Amigos. Name: Dave. Entrée: Chimichanga. Beverage:
Margarita. Dessert: Flan. Name: Ann. Entrée: Enchiladas. Beverage: Sangria. Dessert:
Sopapillas. Name: Linda. Entrée: Tostada. Beverage: Pepsi. Dessert:
Fried Ice Cream.
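A complete version of the dinner table, with each data cell bound to its header, could be sketched as follows. The header row and the food items come from the text above; the exact markup of the data rows is an assumption based on the id and headers attributes as described, and the &eacute; entity is used here only to sidestep character-encoding questions.

```html
<table border="1" summary="What the Navarro Family had for dinner at Los Amigos Restaurant">
  <caption>Dinner at Los Amigos</caption>
  <tr>
    <th id="c1">Name</th>
    <th id="c2">Entr&eacute;e</th>
    <th id="c3">Beverage</th>
    <th id="c4">Dessert</th>
  </tr>
  <!-- Each td names the id of the header cell it belongs to -->
  <tr>
    <td headers="c1">Dave</td>
    <td headers="c2">Chimichanga</td>
    <td headers="c3">Margarita</td>
    <td headers="c4">Flan</td>
  </tr>
  <tr>
    <td headers="c1">Ann</td>
    <td headers="c2">Enchiladas</td>
    <td headers="c3">Sangria</td>
    <td headers="c4">Sopapillas</td>
  </tr>
  <tr>
    <td headers="c1">Linda</td>
    <td headers="c2">Tostada</td>
    <td headers="c3">Pepsi</td>
    <td headers="c4">Fried Ice Cream</td>
  </tr>
</table>
```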
TIP
If you have a particularly long value for a table header, use the abbr attribute to provide
a shorter alternative for just this type of rendering. For instance, a header of "City of
Departure" could take the attribute abbr="Departing".
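In markup, the tip might be applied as in the sketch below. Only the "City of Departure" header and its abbreviation come from the tip; the second column, the id values, and the city names are hypothetical.

```html
<table border="1" summary="Example of abbreviated header cells">
  <tr>
    <!-- A speech browser may read "Departing" in place of the full header text -->
    <th id="dep" abbr="Departing">City of Departure</th>
    <th id="arr" abbr="Arriving">City of Arrival</th>
  </tr>
  <tr>
    <td headers="dep">Orlando</td>
    <td headers="arr">Charleston</td>
  </tr>
</table>
```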
With the example used here, you might be able to keep up, because it's
fairly obvious that a Margarita belongs in the Beverage category, and not
Entrée. But if you're dealing with categories that aren't quite so obvious,
such as numbers, having the headings read in place makes the difference
between useful data and a stream of gibberish!
With more complex tables, it's sometimes necessary to determine the meaning of a given cell, in addition to the cell's contents, without having to listen
to the entire table. A common example would be a spreadsheet dealing with
various budget categories, or an expense report for a multi-stop trip, especially when subtotals or totals are present.
Figure 5.11 shows a simple expense report for a three-city business trip.
Expenditures are broken down by date, as well as by category (meals, hotel,
local transport, samples given). The table begins in the traditional manner,
using a summary and caption:
<table border="1"
summary="A summary of expenses for Marshall Jansen's sales trip for dates
September 20, 2000 through September 26, 2000.">
<caption>Expense Report: 09/20/00 through 09/26/00</caption>
<tr>
<th>&nbsp;</th>
<th id="c1" axis="expense">Meals</th>
<th id="c2" axis="expense">Hotel</th>
<th id="c3" axis="expense" abbr="transport">Local Transport</th>
<th id="c4" axis="expense" abbr="samples">Samples Given</th>
<td>Subtotals</td>
</tr>
Each new row is for a new date and city, the city being labeled as such with
an axis value of city:
<tr>
<th id="city1" axis="city">Orlando</th>
<th colspan="5">&nbsp;</th>
</tr>
The first full data row, the first date Marshall was in Orlando, introduces
the headers attribute on the TD element. For instance, the amount spent on
meals that first day needs to be bound to the meals category, the city of
Orlando, and the date the expense was incurred. The binding is accomplished by listing each id value for those categories in a space-delimited list
for the value of the headers attribute. Meals is represented by id c1,
Orlando by id city1, and September 20 by id d1. The headers value for
this cell then becomes "c1 city1 d1". The process is repeated for each column in the row:
<tr>
<td id="d1" axis="date">20 September 2000</td>
<td headers="c1 city1 d1">52.91</td>
<td headers="c2 city1 d1">198.43</td>
<td headers="c3 city1 d1">4.50</td>
<td headers="c4 city1 d1">15</td>
<td>&nbsp;</td>
</tr>
<tr>
<td id="d2" axis="date">21 September 2000</td>
<td headers="c1 city1 d2">42.90</td>
<td headers="c2 city1 d2">198.43</td>
<td headers="c3 city1 d2">9.40</td>
<td headers="c4 city1 d2">12</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>subtotals</td>
<td>95.81</td>
<td>396.86</td>
<td>13.90</td>
<td>27</td>
<td>506.57</td>
</tr>
This entire process is then repeated for the two additional cities:
<tr>
<th id="city2" axis="city">Charleston</th>
<th colspan="5">&nbsp;</th>
</tr>
<tr>
<td id="d3" axis="date">22 September 2000</td>
<td headers="c1 city2 d3">35.78</td>
<td headers="c2 city2 d3">173.55</td>
<td headers="c3 city2 d3">6.00</td>
<td headers="c4 city2 d3">25</td>
<td>&nbsp;</td>
</tr>
<tr>
<td id="d4" axis="date">23 September 2000</td>
<td headers="c1 city2 d4">51.15</td>
<td headers="c2 city2 d4">173.55</td>
<td headers="c3 city2 d4">11.30</td>
<td headers="c4 city2 d4">19</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>subtotals</td>
<td>86.93</td>
<td>347.10</td>
<td>17.30</td>
<td>44</td>
<td>451.33</td>
</tr>
<tr>
<th id="city3" axis="city">Mobile</th>
<th colspan="5">&nbsp;</th>
</tr>
<tr>
<td id="d5" axis="date">24 September 2000</td>
<td headers="c1 city3 d5">57.14</td>
<td headers="c2 city3 d5">149.44</td>
<td headers="c3 city3 d5">8.00</td>
<td headers="c4 city3 d5">13</td>
<td>&nbsp;</td>
</tr>
<tr>
<td id="d6" axis="date">25 September 2000</td>
<td headers="c1 city3 d6">52.99</td>
<td headers="c2 city3 d6">149.44</td>
<td headers="c3 city3 d6">6.70</td>
<td headers="c4 city3 d6">22</td>
What's Next
In this chapter you have learned how to create a wide variety of tables,
both for tabular content and basic document structure. Usability of tables
can be greatly enhanced by the use of visual and informational cues as to
cell content.
Next up in Chapter 6, "Using Frames," you will learn additional techniques
for page layout and division by using XHTML frames.
6
Using Frames
Mention the topic of frames to any gathering of Web developers, no matter
how small, and you're bound to trigger a lively debate on the merits (and
demerits!) of this controversial technology.
Frames, first introduced to the Web world in Netscape Navigator in their
simplest form back in version 1.1, evolved into what we know as frames
today starting with version 3.05. Frames were formally adopted into the
HTML specifications with HTML 4.0, and are carried over into XHTML 1.0
with the XHTML 1.0 Frameset DTD.
This chapter teaches you:
How to create a basic frameset
Three methods of dividing available screen space into frames
How to change the look of frames through attributes
How to nest frames within frames
How to target content into specific frames
Building a Frameset
Frames divide the browser window space into two dimensions: horizontally
in rows and vertically in columns. Each portion of the window space, the
frame, is defined by its size. Sizes are set in fixed measures of pixels, by a
percentage of the available space, or by relative sizing.
One of the simplest framesets to create is to divide the browser window into
four equal quadrants. The basic syntax looks like this:
<frameset rows="50%, 50%" cols="50%, 50%">
</frameset>
The frameset element, then, provides the spatial relationship between each
frame. However, you still need to populate each frame with content documents. To do this, each defined space will need its own frame element:
<frameset rows="50%, 50%" cols="50%, 50%">
<frame src="frame1.html" name="f1" />
<frame src="frame2.html" name="f2" />
<frame src="frame3.html" name="f3" />
<frame src="frame4.html" name="f4" />
</frameset>
Notice that each of the frame elements has a name attribute. The value of
this attribute must be unique within the frameset, and must follow two
important naming rules: The name must start with an alphabetic character, and cannot be _blank, _self, _parent, or _top. Those names are
reserved for specific behavioral semantics you'll learn about later in this
chapter in the "Linking Between Frames: The target Attribute" section.
Each of the four XHTML documents referenced in the frame elements is
created using either the XHTML 1.0 Strict or Transitional document type.
Figure 6.1 shows this frameset with a simple identifying heading in each of
the framed documents.
You can re-create each of the framed documents by editing this basic
document:
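A minimal sketch of such a document, assuming the Transitional document type and placeholder title and heading text, might look like this. Saving it as frame1.html through frame4.html, with the heading changed in each copy, would populate the four-quadrant frameset.

```html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Frame 1</title>
</head>
<body>
<!-- Change this heading (and the title) to identify each frame -->
<h1>Frame 1</h1>
</body>
</html>
```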
SIZING BY PIXELS
The only method that allows a frame to be given an absolute size that will
be the same no matter what type of system or monitor resolution is in use
is to size it in pixels. The tricky part about working in pixels is that even
though it's an absolute size, meaning the size doesn't change based on the
user's monitor settings or other factors, its apparent size does vary.
Consider this: Many computer systems ship with a default monitor resolution setting of 640×480 pixels. This means that if you create a frameset that divides the height of the screen into two rows, and use a size of 200
pixels for the first frame, you've taken up nearly half of the vertical screen
space. However, if you're looking at a system with screen resolution set to
1,024×768, a 200-pixel frame will only take about one-quarter of the available vertical space.
Accordingly, fixed-width frames work best when the size of the contents is
known in advance. For instance, some sites keep a small frame at the top of
the screen to hold banner advertising or other images of a known size.
SIZE AS A PERCENTAGE OF AVAILABLE SPACE
Probably the most popular way to size frames is to divide the space by percentages. This method allows the designer to specify things like "half the
screen," or "one quarter of the left-hand column." The only restriction
placed on you is that for each dimension, horizontal and vertical, the figures must add up to 100%.
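As an illustration of percentage sizing, the sketch below gives one quarter of the window width to a left-hand frame and the rest to a right-hand frame. The file names and frame names are hypothetical.

```html
<!-- 25% + 75% = 100% of the available width -->
<frameset cols="25%, 75%">
  <frame src="left.html" name="left" />
  <frame src="right.html" name="right" />
</frameset>
```

Because the values are percentages, the one-to-three proportion holds whether the window is 640 or 1,024 pixels wide.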
RELATIVE FRAME SIZING
This type of design is seen on the Geek Cruises Web site (see Figure 6.2).
This can be taken even further, expressing values as a ratio. If you wanted
three rows instead of two, you could write
<frameset rows="100, 4*, 1*">
which creates the initial 100-pixel-high frame, and then divides the remaining space into two additional frames, the first having four pixels for every
one that goes to the last frame.
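The arithmetic behind the ratio can be worked through for one case. The sketch below assumes a window 600 pixels high, an example figure only; the file names and frame names are likewise hypothetical.

```html
<!--
  Assumed window height: 600 pixels.
  Fixed first row:  100 pixels
  Remaining space:  600 - 100 = 500 pixels, split into 4 + 1 = 5 shares of 100
  Second row (4*):  4 x 100 = 400 pixels
  Third row (1*):   1 x 100 = 100 pixels
-->
<frameset rows="100, 4*, 1*">
  <frame src="top.html" name="top" />
  <frame src="middle.html" name="middle" />
  <frame src="bottom.html" name="bottom" />
</frameset>
```

In a taller or shorter window the first row stays at 100 pixels, and the other two keep their four-to-one proportion.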
Figure 6.2: A fixed size frame holds navigation at the top of this site, with
content displayed in lower frames.
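Attribute      Values          Effect
noresize       noresize        Prevents the user from resizing the frame
scrolling      yes, no, auto   Controls whether the frame displays scrollbars
frameborder    0, 1            Hides (0) or draws (1) a border around the frame
marginwidth    pixels          Sets the left and right margins inside the frame
marginheight   pixels          Sets the top and bottom margins inside the frame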
Figure 6.3: This is how the frameset looks before introducing the nested
frame.
Next you'll take the document in frame 1 (upper-left corner) and split it
into two rows:
<frameset rows="50%, 50%">
<frame src="frame1a.html" name="f1a" />
<frame src="frame1b.html" name="f1b" />
</frameset>
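One way to apply this, assuming the nested frameset is saved as frame1.html in place of the original content document, is a complete Frameset document like the sketch below. The title text is a placeholder, and frame1a.html and frame1b.html would be ordinary Strict or Transitional content pages.

```html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Frame 1 (nested frameset)</title>
</head>
<!-- This document is loaded into frame f1 and divides it into two rows -->
<frameset rows="50%, 50%">
  <frame src="frame1a.html" name="f1a" />
  <frame src="frame1b.html" name="f1b" />
</frameset>
</html>
```

The outer frameset does not change: it still points frame f1 at frame1.html, which now happens to be a frameset itself.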
Figure 6.4: One frame within a larger frameset also defines additional
frames.
TIP
You can nest frames to nearly unlimited levels. Think twice, however, before doing it
more than once. The visible space for each frame will become increasingly smaller, and
managing the hierarchy of frames becomes correspondingly more difficult.
You've noticed that each frame created so far has its own name. As noted
previously, these names must be unique within a browser instance. That
uniqueness is the key to success in cross-frame linking. Consider again the
initial four-quadrant frameset created at the beginning of this chapter:
<frameset rows="50%, 50%" cols="50%, 50%">
<frame src="frame1.html" name="f1" />
<frame src="frame2.html" name="f2" />
<frame src="frame3.html" name="f3" />
<frame src="frame4.html" name="f4" />
</frameset>
After following this link, the results look like Figure 6.5.
Figure 6.5: A new document is loaded into a separate frame using the
target attribute.
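A link of the kind this section describes, placed in one frame and aimed at another through the target attribute, could be sketched as follows. The file name newdoc.html and the link text are placeholders; f4 is the frame name from the four-quadrant frameset. Because the target attribute is not part of the Strict document type, the sketch uses Transitional.

```html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Frame 1</title>
</head>
<body>
<h1>Frame 1</h1>
<!-- The target value must match the name attribute of the destination frame -->
<p><a href="newdoc.html" target="f4">Load a new document in frame 4</a></p>
</body>
</html>
```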
To begin, we'll divide the screen into the fixed banner frame and the lower
portion. The banner frame should be 100 pixels in size, should not be resizable, nor should it have a scrollbar. The second row should assume all
remaining available space:
Figure 6.6: This is the basic framed design for our recipe site.
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Navarro Family Recipes</title>
</head>
<frameset rows="100, *">
<frame src="banner.html" name="banner" noresize="noresize" scrolling="no" />
<frame src="main.html" name="main" />
</frameset>
Remember that the first frameset document must contain not only the
frameset and frame definitions, but also the noframes content that provides
alternative access for those browsers that don't support frames. To that
end, you need to provide the content that will be initially placed in each of
the three frames in a single noframes section:
<noframes>
<body>
<div align="center"><img src="banner.gif" height="90" width="500" alt="Navarro
Family Recipes" />
</div>
<h1>Welcome!</h1>
<p>This Web site was created as a convenient way to share my family's favorite
recipes.
They have been arranged by category in the list below. Feel free to use as many
as you wish, but if you send them along, please do give the original author
(noted in each file) attribution. </p>
<p>Bon Appetit!</p>
<ul>
<li><a href="soups.html">Soups</a></li>
<li><a href="bread.html">Bread</a></li>
<li><a href="veggies.html">Vegetables</a></li>
<li><a href="beef.html">Beef</a></li>
<li><a href="poultry.html">Poultry</a></li>
<li><a href="desserts.html">Desserts</a></li>
</ul>
</body>
</noframes>
</html>
Next we need to divide the lower portion of the screen, referred to in the
first file as main.html, into its two columns: the navigation frame on the
left, and the main content frame on the right. These will be sized at 20%
and 80%, respectively. Listing 6.2 shows the code used.
Listing 6.2: main.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Navigation and Content: Navarro Family Recipes</title>
</head>
<frameset cols="20%, 80%">
<frame src="nav.html" name="nav" />
<frame src="content.html" name="content" />
</frameset>
</html>
Now that the shell is finished, we need to populate the initial frames. That
means we still need to create the following files:
banner.html
nav.html
content.html
The banner file consists of a single image centered within the page, and is
found in Listing 6.3.
TIP
Remember that the content pages in a framed site take either the Strict or Transitional
doctypes, not Frameset.
Listing 6.3: banner.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
The links that will be provided in the navigation frame are connected to
documents that will be displayed in the content frame. Accordingly, the
target attribute must be set to focus the link on that frame (see Listing
6.4).
Listing 6.4: nav.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Navarro Family Recipes</title>
</head>
<body>
<p><b>Recipe Categories</b></p>
<ul>
<li><a href="soups.html" target="content">Soups</a></li>
<li><a href="bread.html" target="content">Bread</a></li>
<li><a href="veggies.html" target="content">Vegetables</a></li>
<li><a href="beef.html" target="content">Beef</a></li>
<li><a href="poultry.html" target="content">Poultry</a></li>
<li><a href="desserts.html" target="content">Desserts</a></li>
</ul>
</body>
</html>
Finally, the initial page for the content frame must be supplied, which is
content.html (see Listing 6.5).
Listing 6.5: content.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Navarro Family Recipes</title>
</head>
<body>
<h2>Welcome!</h2>
<p>This Web site was created as a convenient way to share my family's favorite
recipes.
They have been arranged by category in the list to your left. Feel free to use
as many as you wish, but if you send them along, please do give the original
author (noted in each file) attribution. </p>
<p>Bon Appetit!</p>
</body>
</html>
Now you're ready to view the initial Web page, as seen in Figure 6.7.
Figure 6.7: This is the opening view of the recipe site.
Each of the subdocuments, the ones linked from nav.html, is created in
the same manner, using the XHTML 1.0 Transitional document type. These
are found in Listings 6.6 through 6.11.
Listing 6.6: soups.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Soup Recipes</title>
</head>
<body>
<h2 align="center">Soups</h2>
Listing 6.9: continued
<title>Beef Recipes</title>
</head>
<body>
<h2 align="center">Beef</h2>
<ul><li><a href="tritip.html">Monterey Tri-Tip</a></li>
<li><a href="fajitas.html">Steak Fajitas</a></li>
<li><a href="ribroast.html">Standing Rib Roast</a></li>
</ul>
</body>
</html>
Listing 6.10: poultry.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Poultry Recipes</title>
</head>
<body>
<h2 align="center">Poultry</h2>
<ul><li><a href="hmchicken.html">Ann's Honey Mustard Chicken</a></li>
<li><a href="tchicken.html">Teriyaki Chicken</a></li>
<li><a href="picatta.html">Chicken Picatta</a></li>
</ul>
</body>
</html>
Listing 6.11: desserts.html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Dessert Recipes</title>
</head>
<body>
<h2 align="center">Desserts</h2>
<ul><li><a href="chocmouse.html">Chocolate Mousse</a></li>
<li><a href="peanutpie.html">Peanut Butter Pie</a></li>
<li><a href="peaches.html">Peaches and Cream</a></li>
</ul>
</body>
</html>
NOTE
In the interest of brevity, the actual recipe files have not been provided here. The functionality of the site, loading documents in the content frame when linked in the nav
After you have each of these files written, your site is complete. Test your
implementation of the frames by clicking any of the categories in the navigation
menu, and the new page should load in the right-hand content
frame (see Figure 6.8).
Figure 6.8: Clicking a link in the navigation frame loads the document in
the content frame.
Interoperability
Nearly by definition, frames suffer from a lack of interoperability. Support
is limited to the major browsers designed for desktop computer use.
Limitations in screen size, processing power, or rendering methods (in the
case of screen readers and Braille devices) make much of framed content
unreachable by a considerable portion of the Web audience.
Additionally, some authors tend to find it cumbersome to maintain alternative access methods, or find the noframes syntax confusing or difficult to
implement.
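As a rough sketch of the noframes fallback mentioned above, a frameset document can carry ordinary body content for user agents that cannot render frames. The frame names, the title text, the column widths, and the welcome.html and beef.html file names are assumptions made for illustration; nav.html and the other category files come from this chapter's example.

```html
<frameset cols="25%,75%">
  <frame src="nav.html" name="nav" title="Recipe categories" />
  <frame src="welcome.html" name="content" title="Recipe content" />
  <noframes>
    <body>
      <p>This site uses frames. The recipe categories are also
      available as ordinary links:</p>
      <ul>
        <li><a href="soups.html">Soups</a></li>
        <li><a href="beef.html">Beef</a></li>
        <li><a href="poultry.html">Poultry</a></li>
        <li><a href="desserts.html">Desserts</a></li>
      </ul>
    </body>
  </noframes>
</frameset>
```

Because the fallback repeats the navigation links, a visitor without frame support can still reach every category page.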
User Manipulation
One of the highest compliments a Web page can get is to find itself on a
user's bookmark or favorites list. After all, that's what you're after, isn't it:
visitors who find your information to be a valuable resource?
But what happens when they try to bookmark a framed site? The bookmark is made to the main site, not the individual page the user has navigated to. In a knowledge base or other system where users
can find themselves many levels into a site, being unable to locate the document again
using the bookmark can be extremely frustrating.
A second problem comes up when users try to print framed material. Most
browsers do not support the printing of the entire window, instead printing
only the frame that currently has the system focus. Those browsers that do
attempt to print the entire window (such as Internet Explorer 4 or 5) often
generate inconsistent results when content in any of the frames is out of
view. Sometimes content will be cropped to the visible area, and other times
users find themselves with page after page of partially blank paper as the
printer tries to recreate the framed look on the screen although it's only
scrolling through a single frame.
Users can, to some extent, work around this problem by clicking within the
frame they want to print, and most browsers will acknowledge the change
in focus and print the desired area. However, without the remaining context provided by the other frames, some data might be lost when the
printed version is read at a later date.
Other size issues can include frames that scroll horizontally (perhaps
because of an overly large image or table that forces the scroll) that don't
necessarily have to be that way, or frames so small that they result in a
column of text just a word or two wide.
Figure 6.9: An improperly sized fixed frame can result in content being
forced out of view.
What's Next
In this chapter you've explored the possibilities available when you frame
several documents within a single browser window. You can create fixed
frames with content that doesn't change, and other frames that can load
new documents within their own borders or manipulate the documents
within other frames. You've also learned some of the hazards of working
with frames and can watch out for accessibility and interoperability problems.
Next up in Chapter 7, "Universal Accessibility on the Web," you'll take a
close look at how Web designers can promote interoperability and accessibility in their Web sites by authoring valid XHTML. Additional techniques
that enhance accessibility for users with disabilities or those using today's
7
Universal Accessibility on the Web
When the topic of access to the Web for people with disabilities comes up,
the discussion can often take on a very moral and righteous tone. This
chapter is not going to lecture you on morality, or how to make the world a
better place by adding a few attributes to your XHTML code. What we will
discuss in this chapter are the realities that face many Web users, and how,
by keeping a few simple issues in mind while developing your pages, you
can improve your chances of disseminating the information you desire to as
many people as possible.
This chapter teaches you:
How the Web can be inaccessible
That accessibility can help traditional users, too
About the Web Accessibility Initiative
How to implement the Web Content Accessibility Guidelines
Advances in technology originally intended to serve a disabled audience frequently benefit the general public as much or, as a percentage of use, even
more than the intended recipient of aid.
Two of my favorite examples are the sidewalk curbcut and closed-captioned
television broadcasts. Not that many years ago, perhaps 10 to 15 years in
most places, finding the curbs cut down to street level on corners throughout a city was rare, as were ramps used to bypass outdoor stairways.
Both curbcuts and ramps were originally incorporated into public works
projects to assist wheelchair-bound citizens in crossing streets and gaining
access to buildings previously only reachable by stairs. But look around the
next time you take a walk outside or go for a drive. Who is using those
features? You'll undoubtedly see several grateful parents pushing baby
strollers, a postal worker pushing a delivery cart, even roller-skaters
using them for a smooth transition between sidewalk and roadway. These
concrete features are easily used 9 out of 10 times (or even more) for
other-than-intended, but very convenient, purposes.
Closed-captioning can often be seen in sports bars or busy airport lounges:
places where there's a television broadcasting the news or a game, but the
ambient noise level is high enough that hearing the audio track is difficult
to impossible. These establishments turn on the closed-captioning text
track available on most television broadcast streams, and hearing users can
happily sit across the room and still "hear" the information that accompanies the pictures, no lip-reading required.
Now I'm not intending to suggest that all of the accessibility design techniques you'll learn will be life enhancing or even convenient to the traditional user. Instead, I remind you of these situations to encourage you to
remember that enhancements have a way of finding usefulness in the
mainstream world as they also serve the needs of those with disabilities.
principles. This has been somewhat helpful, although the result still has a
tendency to be more academic than practical. Another tool is a checklist of
checkpoints that can be printed and used as an evaluation and record-keeping tool in the post-design phase. The checklist can be accessed at
http://www.w3.org/TR/WAI-WEBCONTENT/full-checklist.html.
Second, the conformance levels provide no leeway in implementation. If a
site developer cannot meet a single Priority Two checkpoint, the site cannot
be labeled Double-A conformant, no matter if every other Priority Two
checkpoint is met. Many feel this results in discouraging implementation of
the checkpoints sites can meet, because the effort will not be rewarded
with the more desirable conformance ranking. The cost of implementation
is something every company or individual will have to evaluate when planning their design techniques.
EXAMPLE
Think for a moment about how this chapter is presented. On the first page,
it has a chapter number, then on the next line a chapter title. There's a
paragraph or so of text, and then a bulleted list of things you'll be reading
about. On the following page, there's a heading written in white letters on a
black background. Each of these items represents a specific structural portion of the chapter. In the initial manuscript I submitted to the publisher, it
didn't just say "write the number 7, and then Universal Accessibility on
the Web," leaving the reader to interpret the structural significance of those
items. Instead, both were clearly labeled as heading level A (the topmost)
and heading level B, respectively.
If we were to mark up the chapter using XHTML, those structures would
be mapped to the h1 and h2 elements. By using the elements for their hierarchical structure, I can give the content of the elements the proper semantic importance. If I were to begin with the chapter number in an h2
element, versus an h1 element, simply because I preferred the default size
and font weight, I'd lose the structural cue as to its place in the document.
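A minimal sketch of that mapping, assuming the chapter number and title are marked up exactly as described, might look like this:

```html
<h1>7</h1>
<h2>Universal Accessibility on the Web</h2>
```

If the default size of the h1 is not to your taste, a style sheet can change its appearance without disturbing the heading level.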
Defining Languages
Language definition is one of the types of meta information most frequently overlooked by Web designers. Many authors will assume their site visitors will
read and speak the same language they do (an affliction, I'm sad to say,
that is more common among Americans than residents of other countries).
Begin by indicating the base language for the document using the xml:lang
and lang attributes on the html element. Screen readers or other aural presentations can select the appropriate pronunciation, perhaps even distinguishing between U.S. English traditions and British practices.
Additionally, to name just a few advantages, indexing systems such as
search engines can provide the additional data of language when cataloging
sites, tactile readers (for example, Braille devices) can insert the right
accent characters, or a user can automatically run a document through a
translation utility if the base language isn't his preferred tongue.
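As a small sketch, the base language can be declared once on the html element. The en-US value is an assumption chosen to show the U.S./British distinction; the listings elsewhere in this book use plain en, and en-GB would mark British English.

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
<head>
<title>Navarro Family Recipes</title>
</head>
<body>
<p>Bon Appetit!</p>
</body>
</html>
```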
Just as important as setting the base language is bounding the presence of
secondary languages embedded within the document. Place names,
addresses, and phrases are frequent candidates for language annotation:
EXAMPLE
<p>Our hotel was located on the Avenida Rafael E. Melgar Sur, just off the
dazzling white beaches of Cozumel. As we stepped out of the cab we were met
with a friendly greeting, "Holas, Amigas!"</p>
Both the street name and the greeting can be marked as Spanish, using the
span element and the lang and xml:lang attributes:
<p>Our hotel was located on the <span lang="es" xml:lang="es">Avenida Rafael E.
Melgar Sur</span> just off the dazzling white beaches of Cozumel. As we stepped
out of the cab we were met with a friendly greeting, <span lang="es"
xml:lang="es">"Holas, Amigas!"</span></p>
they'd write
<b>This sentence is important.</b>
For the visual browser, the intended result generally came through. Sighted
readers have learned to associate boldface text with something important.
But what about the user who can't see the boldface treatment? By using the
structural approach of strong, screen readers or tactile devices have the
opportunity to give strong emphasis to the contained text in their own
unique way.
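Side by side, the two approaches can be sketched as follows. Both usually render as boldface in a visual browser; only the second tells a non-visual user agent why.

```html
<!-- Presentational: says only how the text should look -->
<p><b>This sentence is important.</b></p>

<!-- Structural: says that the text carries strong emphasis -->
<p><strong>This sentence is important.</strong></p>
```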
To create the most accessible pages, tabular content is all that tables
should be used for. But even the most evangelistic supporter of accessible,
structural design will often concede that tables do have their uses for basic
layout purposes. One of the most frequently cited cases is that of the left
column of navigation information, with the main document presented in a
wider column on the right.
When using this form of table, you can take several steps to ensure usability through alternative clients. The contents of each cell should make sense,
whether read cell to cell from left to right, top to bottom, or some other
combination. Readability can be enhanced by enclosing the content within
headings, paragraphs, and other structural elements, so that the cellular
content can stand on its own. This specifically precludes using tables to create a newspaper-column effect for text.
Links
Hyperlinks provide dual functionality within a Web page. They create the
mechanical function of linking one document to another. But when written
correctly, they also provide contextual information about the link. Where
will the link take me? Is it something I'm interested in seeing? Sadly, there
are thousands of Web pages online that only tell you to "click here!":
<p>I live in Port Charlotte, Florida. <a
href="http://www.charlottecountyfl.com">Click here</a> to visit resources for
visitors, residents, or potential residents.</p>
This type of link makes two assumptions that won't always hold true. First,
it assumes that the link is activated by clicking, which means a mouse or a
trackball is in use. Laptop users might tap their fingers on a touch-pad.
PDA owners press a stylus against the screen. Listeners using a screen
reader might press a single letter or number their program has associated
with the link, or perhaps they'll even speak a number to follow the link.
Second, not all users will see where "here" is, and if they don't click, what
do they do with "here"?
Instead, you can rewrite the hyperlink in the previous example to create a
link that gives context to the hyperlink:
EXAMPLE
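One possible rewrite is sketched below. The link text is an assumption chosen for illustration; what matters is that the linked words themselves describe the destination, so the link still makes sense when read out of context.

```html
<p>I live in Port Charlotte, Florida. The
<a href="http://www.charlottecountyfl.com">Charlotte County Web site</a>
offers resources for visitors, residents, or potential residents.</p>
```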
TIP
An empty ALT attribute is actually preferable to one holding an asterisk, space, or other
filler character. Screen readers will speak each asterisk if used that way. Far better to
leave it empty than to have a lot of garbage read to you.
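A short sketch of the difference, using a hypothetical decorative image named bullet.gif:

```html
<!-- Decorative graphic: empty alt, so there is nothing to speak -->
<img src="bullet.gif" alt="" />

<!-- Avoid: a filler character that a screen reader may read aloud -->
<img src="bullet.gif" alt="*" />
```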
If the user agent supports MPEG movies, the content of the outermost
object element, the movie, will be played. If it doesn't, the still image will
be offered. If a still image cannot be rendered, the text content will be
given.
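The fallback chain described here can be sketched with nested object elements. The file names and the text description are hypothetical stand-ins, not the listing this passage originally referred to.

```html
<object data="cozumel.mpeg" type="video/mpeg">
  <!-- Offered only if the movie cannot be played -->
  <object data="cozumel.gif" type="image/gif">
    <!-- Offered only if the still image cannot be rendered -->
    A short film of the beach at Cozumel at sunrise.
  </object>
</object>
```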
This form can now be navigated by pointing and clicking inside each control, by using the Tab key to cycle through each control, or by activating a
single keyboard entry to directly access a specific control.
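A form with those three means of access might be sketched like this. The action URL, field names, and access keys are assumptions for illustration; tabindex sets the Tab order, and accesskey assigns the single keyboard entry.

```html
<form action="/subscribe" method="post">
  <p>
    <label for="name" accesskey="n">Name</label>
    <input type="text" id="name" name="name" tabindex="1" />
  </p>
  <p>
    <label for="email" accesskey="e">E-mail</label>
    <input type="text" id="email" name="email" tabindex="2" />
  </p>
  <p><input type="submit" value="Subscribe" tabindex="3" /></p>
</form>
```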
The Checkpoints
Each of the checkpoints defined in the WCAG document is presented here
in one of three tables; Tables 7.1 through 7.3 show the checkpoints for each
priority level. The tables are organized by priority and include the feature
that the checkpoint covers (general issues, images, and so on) and the
instruction.
Checkpoint
General
  Mark up lists and list items properly.
  Mark up quotations. Do not use quotation markup to produce indentation or other visual effect.
  Ensure that dynamic content is accessible or provide accessible alternatives.
  Allow the user to turn off blinking (until user agents provide a user-adjustable setting).
  Do not create auto-refreshing pages (until user agents allow user-control of this feature).
  Do not use markup to provide auto-redirects. Use server controls instead.
  Do not create pop-up windows or cause the current window to change without informing the user.
  Use W3C technologies when available, and the latest version that is supported.
  Avoid deprecated features in W3C technologies.
  Divide large blocks of information into more manageable units where appropriate.
  Clearly identify the target of each link.
  Provide metadata as necessary to add semantic information to pages and sites.
  Provide information about the general layout of a site (site map, table of contents, and so on).
  Use consistent navigation schemes.
Tables
  Do not use tables for layout unless the content can be sufficiently linearized.
  If you do use tables for layout, do not then use structural information for visual rendering.
Frames
  Describe the purpose of a frame and how it relates to other frames if this is not obvious in the frame titles.
Forms
  Position labels clearly in relationship to their input controls, until all user agents allow explicit binding.
  Associate labels explicitly with their controls when possible.
Applets and scripts
  Use event handlers that are device independent.
  Avoid movement in pages until user agents allow the user to freeze content.
  Programmatic applets or scripts should be directly accessible or compatible with assistive technologies. (If functionality is critical, this becomes a Priority One checkpoint.)
  Any element that has its own interface must be operated in a device-independent manner.
  For scripts, specify logical event handlers rather than device-dependent handlers.
What's Next
In this chapter, you have learned how people with disabilities might face
barriers when accessing online content. You've been introduced to the Web
Content Accessibility Guidelines and techniques for creating accessible Web
pages.
Next up in Chapter 8, "Validating XHTML Documents," we'll take a look at
what validation means, how to use the W3C HTML Validation Service, and
how to interpret and correct the errors it reports.
8
Validating XHTML Documents
Before making your Web pages accessible to the public, you'll want to take
several steps to make sure they're in the best shape possible. They should
be run through a spell-checker; a proofreader can help with grammar; a
copy editor might be employed to help out; and your pages might need to go
through several levels of managerial approval if you're working on a company site. All of these tasks are necessary in many cases, but one more
check should be made before going live: validation.
This chapter teaches you:
What validation means
How to use the W3C HTML Validation Service
How to interpret and correct errors
Why Validate?
Because XHTML requires that documents be both well-formed and valid,
and validity itself requires well-formedness, one check can ascertain that a
document conforms to both. Practically, validation helps even
the experienced Web author find unintentional errors, such as typos or a forgotten closing tag.
Typo Control
One of the most rudimentary uses of validation is to catch some of the simple problems, such as typographical errors. Any developer who's not using
an authoring tool that creates tags for him is bound to make a few typos in
any given document. Rather than having to painstakingly read each page
for problems, a quick pass through the validator can point them out in seconds (see Figure 8.1).
TIP
Web authors from outside the United States should be aware that XHTML has adopted
U.S. English standards of spelling. Terms such as "colour" and "centre" are spelled
as "color" and "center." Validators will consider the Queen's English versions as
erroneous.
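As a brief sketch, here is the same heading with each spelling. The align attribute accepts only a fixed set of values in the XHTML 1.0 Transitional DTD, so a validator flags the second line.

```html
<!-- Valid in XHTML 1.0 Transitional -->
<h2 align="center">Soups</h2>

<!-- Reported as an error: "centre" is not an allowed value for align -->
<h2 align="centre">Soups</h2>
```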
Beyond the spelling of words used as element and attribute names, the validator can check the accuracy of your typing when quoting values, entering
slashes and brackets, and any other characters necessary to make tags
complete.
If your system is like mine, you'll see the table as you intended in Internet
Explorer, which helped you and assumed you did close your table (see
Figure 8.3). However, in Navigator, the page is blank! A blank page is certainly not what you intended. If you only spot checked your work in
Internet Explorer, you might never have known there was a problem until
someone pointed it out to you.
Figure 8.2: Well-formedness errors can be easy to spot on your own; in this
case, a missing </a> tag.
Figure 8.3: A single missing tag can have drastically different results!
TIP
Feedback on the Web tends to mimic that of any service industry. You're far more likely
to have dissatisfied visitors just go away and never return, rather than complain and
tell you what's wrong. Therefore, relying on user feedback to know if there's a problem
can be a dangerous practice.
Interoperability
One issue that's becoming more and more important as the Web is expanding beyond the desktop computer is that of interoperability. What works on
your desktop, a computer with a Pentium III processor and 128MB of RAM,
is not necessarily going to work on your 8MB Palm Vx, or on your cell-phone with an even smaller memory allotment. In fact, we can almost guarantee that much of it won't work.
Now of course I'm not suggesting that a cell-phone should be able to visit
all Web sites. We know that's simply not possible, nor is it even desirable.
However, making sure that your Web pages are well-formed and valid can
increase the likelihood that less powerful devices can make use of the sites
you do produce.
Smaller devices simply don't have the computing power to handle the complex process required to anticipate what a document author might have
intended to write. Browsers that don't attempt to do this are smaller and
can run on less powerful machines, which means that your Web pages,
when well-formed and valid, have a higher likelihood of success on those
same devices.
Figure 8.4: The W3C HTML Validation Service Web site is used to check
conformance to the XHTML specifications.
The CGI script activated by your form submission will retrieve your document from the URI specified, and then parse the file looking for errors. The
validator uses the DTD specified in your doctype declaration to determine
just which grammar it's validating against. As it parses the document, it
makes note of any errors found and where they are, and compiles them for
the report back to you when it's finished.
EXAMPLE
Although it's certainly nice to get a "no errors" report on the first pass
through the validator, not every document will succeed. Your next task,
then, is learning how to interpret the error reports given by the validator.
Listing 8.1 is a simple XHTML document that contains a few well-formedness errors.
Listing 8.1: trip.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html>
<head>
Type up and upload this page to your Web site, and you can run it through
the validator and see the results yourself. We'll also step through them
here.
The first two bullet items in the error report (see Figure 8.6) refer to the
same problem.
second half of this sentence, "but OMITTAG NO was specified," isn't anything you really need to pay attention to. It simply means that the validation service script was told that XHTML 1.0 Strict doesn't allow end tags to
be omitted (no XML end tags can be). To fix this error, simply add the closing
title tag.
Point three (see Figure 8.7) tells us that there is an end tag for the h2 element, but there isn't an open h2 to go with it. If you look carefully at the
line of markup quoted to you, you'll see that the phrase started out being
opened with an h1 element. I apparently mismatched my tags on this one.
Changing the closing </h2> to an </h1> fixes this problem.
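In sketch form, with a hypothetical heading text standing in for the line from trip.html, the error and its correction look like this:

```html
<!-- Reported as an error: opened as h1, closed as h2 -->
<h1>Our Trip to Cozumel</h2>

<!-- Corrected: the end tag matches the start tag -->
<h1>Our Trip to Cozumel</h1>
```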
the position of the start tag for the li element in question. Adding the closing tag will fix the error. What the validator did was note that the li tag
was open, and continued along in parsing the document until it hit a tag
that would have been a problem occurring inside an open li tag, and that
was the closing body tag. At that point it stopped and said "oops, that li
element back there never got closed. Here's where I stopped, and this is
why (the closing body tag, and because the end tag for li was omitted),
and I'll be helpful and show you where it started."
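A simplified sketch of the same kind of error follows. The list items are hypothetical, and in this version the first tag that cannot appear inside the open li is the closing ul tag rather than the closing body tag, so that is where a validator would stop and report.

```html
<!-- Error: the second li is never closed -->
<ul>
<li>Snorkeling at the reef</li>
<li>Shopping in town
</ul>

<!-- Corrected -->
<ul>
<li>Snorkeling at the reef</li>
<li>Shopping in town</li>
</ul>
```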
The next two points, shown in Figure 8.8, have already been corrected.
you might want to revalidate your documents after fixing each error. It will
cut down on the number of remaining issues, and can help fight any feelings of being overwhelmed!
Just looking at the file in your Web browser should give you a clue that
there's something wrong; here's how it rendered for me using Netscape
Navigator 4.72 (see Figure 8.10).
Figure 8.10: This is not the presentation we expected for this table.
The first pass the validator takes at the document looks like Figure 8.11.
The first item that should catch your eye in this report is that the document type is listed as HTML 4.0 Transitional! We're writing XHTML; what
happened? The clue is in the first bulleted error: the doctype declaration
is missing. This document needs the XHTML 1.0 Transitional doctype, so
we'll add that in:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"DTD/xhtml1-transitional.dtd">
Before revalidating, let's look at the next two errors. The first one is a little obscure, saying "literal is missing closing delimiter," with the caret
pointing at my name in the meta tag. The second error says that the
CAPTION element isn't allowed to be where it is, and that the validator has
assumed that the opening table tag is missing. Let's take a look at the section of code between these two elements:
<head>
<title>September 2000</title>
<meta name="author" content="Ann Navarro
</head>
<body>
<p>
<table border="1">
<caption><b>September 2000</b></caption>
have forgotten the table tag! Well, no, clearly we didn't. What happened is
that the validator ignored the data between the real end of the meta tag
and the caption element because it thought it was processing attribute
value information. If we close up the meta tag, this error will go away.
Revalidate the file, and let's look at any remaining errors (see Figure 8.12).
What's Next
In this chapter you have learned how validation serves as a grammar
checker for Web documents. You've learned how to use a popular and
trusted validation service provided by the W3C. Errors can be simple
things to locate and fix, or you might need to read between the lines in
what the validator is reporting to find the real problem.
Next up in Chapter 9, "Implementing XHTML Today," we'll take a look at
implementing XHTML in today's Web sites, including issues surrounding
updating existing sites, remaining compatible with older browsers, and
finding authoring tools that support XHTML.
9
Implementing XHTML Today
One of the most delightful aspects of XHTML is that you can begin using it
immediately with near perfect results. The well-formedness and validity
requirements of XHTML, when followed, create highly interoperable documents that can be viewed on nearly any device that accesses the Web.
You might have noticed that I said "near perfect" results. In the transition
between HTML and XML, there are several areas that deserve special
attention from authors. The W3C has outlined these for us in the XHTML
1.0 Recommendation, and we'll be reviewing each of them here.
This chapter teaches you:
How to manage XML-based techniques in an HTML-browser world
How to position your XHTML documents for success in XML parsers
The tools for converting HTML to XHTML
Automated systems for applying features in the Compatibility
Guidelines
The encoding attribute value can be modified when necessary to incorporate extended character sets. Most documents will do fine with the UTF-8
and UTF-16 character sets, which are accepted defaults on most systems.
A backward compatibility issue arises in how user agents deal with markup
they don't understand. The recommendation has always been that the
browser should pass through the content of the markup as raw data (essentially plain text). Most browsers interpret the processing instruction as an
empty element, and therefore don't pass anything through to the rendered page
because, by definition, empty elements have no content.
Testing has shown a few browsers either mistakenly pass through unknown
empty elements (the processing instruction) to the document, or simply
don't recognize PIs as elements at all, and pass them through as PCDATA.
These browsers include Netscape Navigator 3.0 and earlier, HotJava 1.1.5
and 3.0, Opera 3.6, Internet Explorer 3.x and 4.x/Mac. Luckily, all these
have newer versions available and the number of visitors seeing such a declaration should be negligible. Still, to avoid passing it through completely,
forgo the XML declaration when serving the documents as text/html.
The rules found in the XML 1.0 Recommendation don't allow an element to go unclosed, as HTML's empty elements do, so a change in syntax is required moving from HTML to XHTML.
Two options for managing empty elements are available to the author. The
more straightforward solution is to simply add a closing tag:
<img src="graphic.gif" alt="My Graphic"></img>
has a corresponding closing tag. The second text segment was written to be
broken using the XML shorthand <br/>, as seen here:
<h1>Segment 1</h1>
<p>Mary had a little lamb,<br></br>with fleece as white as snow</p>
<h1>Segment 2</h1>
<p>Everywhere that Mary went,<br/>the lamb was sure to go.</p>
Figure 9.1: The two XML-based methods of handling empty elements have
uneven results in Navigator 4.73.
In Figure 9.1 you see that the results are uneven. The closing-tag method
has the results we were after, but the slash-shorthand doesn't.
The authors of XHTML explored possible alternatives to these methods, looking for something that would be both XML compliant and backward compatible, producing the results that we've come to expect from these elements. Luckily,
a solution was found. It is recommended that authors use the slash-shorthand version, but only after inserting a space character between the
end of the element name and the slash, so that
<br>
becomes
<br />
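The same space-before-the-slash convention applies to the other empty elements. A short sketch, reusing the sample text and image from this chapter:

```html
<p>Mary had a little lamb,<br />with fleece as white as snow</p>
<hr />
<img src="graphic.gif" alt="My Graphic" />
```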
content empty. For instance, when working with tables, authors will often
need to denote empty table cells by writing:
<td></td>
EXAMPLE
It has become customary for Web authors to hide JavaScript (or any scripting segment for that matter) embedded in a document using the HTML
commenting facility. A quick way some authors indicate the date a file was
last modified is to use a small JavaScript snippet like this one:
<script type="text/javascript" language="JavaScript">
<!--
document.write(" on " + document.lastModified + "");
// -->
</script>
The actual script code is hidden between the HTML comments <!-- and -->
with a JavaScript comment indicator, //, thrown in for good measure before
the closing HTML comment.
For HTML browsers, this technique has become a tried and true method.
No current browser has any substantial problem with the technique; the
script gets passed to the scripting engine appropriately, and the raw script
code is hidden from view in the rendered document.
However, XML parsers handle comments a bit differently from an HTML-based Web browser. Instead of passing on the code to the scripting engine,
an XML parser ignores the contents of the comment completely, in essence
just throwing it into the proverbial bit bucket. To avoid this problem, scripting code needs to be wrapped in a CDATA section that the XML parser will
recognize as something it's supposed to ignore, yet retain for use by another
program. The CDATA wrapper is written as follows:
<![CDATA[ script here ]]>
The cue for the XML parser to stop ignoring the code is the string ]]>.
Although it's possible for that string to appear in JavaScript code, it's generally rare and certainly avoidable.
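Applied to the last-modified snippet, the wrapper might be sketched as follows. The // in front of each CDATA marker is a widely used convention that is an assumption here, not something this chapter prescribes: it makes an HTML browser's script engine treat the marker lines as JavaScript comments, while an XML parser still sees an ordinary CDATA section.

```html
<script type="text/javascript">
//<![CDATA[
document.write(" on " + document.lastModified);
//]]>
</script>
```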
TIP
A happy byproduct of using an external script file is that you can comment the code to
your heart's content without needing to worry about nested or dropped comments. This
makes reuse of your script code far more accessible.
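A sketch of the external approach, assuming the script has been saved in a hypothetical file named lastmod.js:

```html
<script type="text/javascript" src="lastmod.js"></script>
```

The script element is written with an explicit closing tag rather than the slash-shorthand, because it is not an empty element by definition and older HTML browsers handle the explicit form more reliably.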
can provide the additional context and, in some cases, even pronunciation
hints to certain user agents.
XML uses the special xml:lang attribute for this language identification
task. It has been recommended that authors use both versions of the
attribute in XHTML documents to cover processing by either an HTML-based browser or an XML parser. Such usage on an element would look like
<p>Jack had a strange sense of <span xml:lang="fr" lang="fr">deja vu</span> the
moment he entered the room.</p>
TIP
If there is a discrepancy between the values of the xml:lang and lang attributes, the
value of the xml:lang attribute takes precedence.
Notice that the value of the name and id attributes are identical in this
example.
Some additional restrictions come into play for the value of these attributes. The XML requirement of using an attribute type ID is in conflict with
the HTML name attribute, which allows any CDATA in the attribute value.
To help normalize these differences, the type of the XHTML name attribute
has been changed to NMTOKEN. Essentially, this requires the value to be a
single string of characters composed of any letters; numbers; and the characters ., -, _, and :.
TIP
XML further restricts NMTOKENs from beginning with the characters XML in any variation of case. Those are reserved for special tokens developed in the XML
Recommendation itself.
Finally, the value of each id attribute needs to be unique within the document (note that this does not forbid the matching name attribute value).
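These two rules are easy to check mechanically. The sketch below is an assumption-laden simplification in Python: it accepts only ASCII letters and digits plus the four punctuation characters listed above, whereas the XML definition of a name token also allows many non-ASCII characters. The function names are hypothetical.

```python
import re
from collections import Counter

# ASCII-only approximation of NMTOKEN: letters, digits, '.', '-', '_' and ':'.
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken(value):
    """True if the whole value is a single run of permitted characters."""
    return NMTOKEN.fullmatch(value) is not None


def duplicate_ids(ids):
    """Return the id values that appear more than once in a document."""
    return sorted(value for value, count in Counter(ids).items() if count > 1)
```

A value such as section-2.1 passes the check, a value containing a space does not, and duplicate_ids(["top", "nav", "top"]) reports top as the offending value.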
XML, on the other hand, uses the encoding attribute on the XML declaration to store this information. Previously, in the "XML Processing
Instructions" section of this chapter, we discussed the fact that the XML
declaration must be present if the encoding is to be anything other than
UTF-8 or UTF-16. So in the case here of a Japanese encoding, the XML
declaration must be present, and would look like
<?xml version="1.0" encoding="EUC-JP"?>
This technique must be used for all Boolean attributes, including checked,
compact, declare, defer, disabled, ismap, multiple, noresize, noshade,
nowrap, readonly, and selected.
Current Web browsers, those written to be aware of and compliant with the
HTML 4 Recommendation, shouldn't have a problem understanding these
expanded attributes. Earlier browsers, notably of the era of Netscape
Navigator 3 and Internet Explorer 3, might ignore the attribute as they would
other unknown markup.
This prescription also includes attribute values that hold URI values. Some
Web authors mistakenly believe that ampersands included in URIs cannot
be written using the &amp; character entity because the link or source reference will no longer function. This actually isn't the case. The URI will contain &amp; in the source document; when the document is processed by the browser the
character entity is resolved, and the URI will look and function just as if it
were written using the & character directly.
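You can see the entity being resolved by running a link through an XML parser. This is a minimal sketch using Python's standard library; the search.cgi URI and its query parameters are invented for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Hypothetical link; the ampersand in the URI is written as &amp; in the source.
anchor = ET.fromstring('<a href="search.cgi?term=xhtml&amp;page=2">Next page</a>')

# The parser resolves the entity, so the attribute value holds a plain &.
href = anchor.get("href")
query = parse_qs(urlsplit(href).query)


def parses(markup):
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The same link written with a raw ampersand is rejected by the parser.
raw_ok = parses('<a href="search.cgi?term=xhtml&page=2">Next page</a>')
```

The value handed to the application is search.cgi?term=xhtml&page=2, with both query parameters intact, while the version with the raw ampersand never gets past the parser.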
HTML Tidy
The W3C has published a tool that helps authors clean up HTML files and
format them for XHTML (as well as XML). It's a very powerful utility, yet
one that can be somewhat opaque when you're first trying to learn how to
use it. Written by engineers, in the command-line format that so many of
them love, it's not the most user-friendly of programs for those who view
DOS as something out of the history books and who rarely if ever encounter
a Unix system. In other words, quite a few of us!
NOTE
The supporting Web site for HTML Tidy, including links for downloading the various ports
of the program, can be found at http://www.w3.org/People/Raggett/tidy/.
EXAMPLE
A fairly standard usage has Tidy modifying the source document in place as
necessary, and recording the changes and errors present in a separate text
output file. To initiate this operation, the following instruction is given at
the command line:
tidy -f errors.txt -m file.html
Taking this command apart piece by piece, we start by invoking the program itself with tidy. The flag or switch -f before errors.txt tells Tidy
that it needs to create a file named errors.txt, to which the output of the
program will be written. The second flag -m tells Tidy to modify the input
file, represented above by file.html. A successful run through Tidy will
result in the report seen in Figure 9.2.
To do anything more than basic tidying of your source files, you'll need to
create a configuration file. This text file consists of a list of properties, each separated from its corresponding value by a colon. Listing 9.1 shows a basic configuration file that could be used for converting HTML source documents to
XHTML.
Listing 9.1: config.txt, A Configuration File to Be Used with HTML Tidy
wrap: 72
markup: yes
output-xhtml: yes
uppercase-tags: no
uppercase-attributes: no
char-encoding: utf8
error-file: errors.txt
show-warnings: yes
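The property-and-value layout of the file is simple enough to read with a few lines of code. The sketch below, in Python, only illustrates the format shown in Listing 9.1; it is not how Tidy itself reads the file, and the parse_tidy_config name is hypothetical.

```python
# The contents of Listing 9.1, reproduced as a string for the example.
CONFIG_TEXT = """\
wrap: 72
markup: yes
output-xhtml: yes
uppercase-tags: no
uppercase-attributes: no
char-encoding: utf8
error-file: errors.txt
show-warnings: yes
"""


def parse_tidy_config(text):
    """Split each 'property: value' line at the first colon."""
    options = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition(":")
        options[name.strip()] = value.strip()
    return options


options = parse_tidy_config(CONFIG_TEXT)
```

Each property name becomes a key, so options["output-xhtml"] holds the string yes and options["wrap"] holds the string 72.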
TIP
Notice that the Errors Report and Modify In Place flags aren't necessary when a configuration file is in use, provided those options are specified in that file.
TidyGUI
For those of us more comfortable in a graphical user interface than the
command line, Andre Blavier has written a GUI overlay for HTML Tidy. An
external configuration file is generated to control Tidy's options, just as
with the command-line version, but a familiar set of tabbed dialog boxes
is used to produce it, saving the user from having to remember property
names and the syntax used to store them (see Figure 9.3).
Links to obtain TidyGUI can be found on the Tidy home page noted previously, or directly from the author's site at
http://perso.wanadoo.fr/ablavier/TidyGUI/.
Figure 9.3: The TidyGUI configuration dialog box provides easy access to
all of Tidy's options.
EXAMPLE
To run TidyGUI, locate the source file using the Browse dialog box (see
Figure 9.4), or type the path directly into the source file box.
Figure 9.4: Locate the source file to be tidied using the Browse dialog box.
All that's left to do is select the Tidy! button and watch the program go to
work. After processing you'll see a list of warnings and errors along with
the location of their occurrences in the lower pane of the main TidyGUI
window, and a single description of the situation per warning type in the
upper pane (see Figure 9.5).
The rewritten output is accessible using the Output button, which opens a new window where the markup is displayed (see Figure 9.6). I've
found it a bit inconvenient that you can't simply select and copy the output
directly from this window. Instead, you must use the Save dialog box to
save the file locally before opening it again with the text editor of your
choice.
Figure 9.5: Warnings listed in the lower pane are explained in the upper
pane.
Figure 9.6: Output is accessed and then saved from a new window.
HTML-Kit
EXAMPLE
HTML-Kit isn't an XHTML editor per se, but is a fully functional HTML
authoring environment that has integrated the HTML Tidy features into
the program. Of the available tools for using Tidy, HTML-Kit is my favorite
in that it provides immediate access to the source and tidied document via
a split-file view, options to perform specific Tidy tasks one at a time without
need for detailed configuration files, and a "convert to XHTML" feature (see
Figure 9.7).
Figure 9.7: HTML-Kit provides single-operation choices for using Tidy on
your documents, including convert to XHTML.
HTML-Kit is available and supported through the Chami.com Web site,
located at http://www.chami.com/html-kit/.
What's Next
In this chapter you have reviewed each of the guidelines published by the
W3C for retaining compatibility in XHTML documents for both HTML- and
XML-based processors. You've used one of several tools available for tidying
your source files and converting HTML-based source to XHTML.
Next, in Chapter 10, "XHTML as the Bridge to XML," we'll explore how
XHTML serves as the bridge between HTML and XML, adding the
freedom to define new elements and attributes to the familiar and trusted
structure and semantics of HTML.
Part II
XHTML Style and Structure
10 XHTML as the Bridge to XML
11 Using Cascading Style Sheets with XHTML
12 XSL: Style the XML Way
13 Document Type Definitions: The Syntax Rulebook
10
XHTML as the Bridge to XML
Back at the beginning of this book, in Chapter 1, "XHTML Fundamentals,"
we talked about how the W3C recognized it needed some sort of transition
between the HTML most Web authors were used to working with, and
XML, the new language that provided a framework for customized language definitions and new markup vocabularies. The job of making that
transition was given to XHTML. XHTML 1.0 takes on many of the traits of
XML, while retaining backward compatibility with HTML 4, acting as that
metaphoric bridge between the two technologies.
This chapter teaches you:
How XML provides more freedom for document authors
What constitutes well-formedness
How languages can be defined using document type definitions
How a new XML-based technology, schemas, can also define
XML-based vocabularies
EXAMPLE
We can force existing elements to do what we want by using them in unconventional or even invalid ways, but doing so also compromises interoperability. Back in Chapter 1, you were introduced to one of my favorite
examples of forcing meaning upon vague structure in an XHTML-based
memorandum. The Web page author essentially has two choices when creating the document: use heading elements for each addressing component
(see Listing 10.1), or simply use a paragraph and some line breaks with a
liberal dose of emphasis (see Listing 10.2).
Listing 10.1: A Memo Using Heading Elements for Address Structures
<html>
<head>
<title>Memorandum</title>
</head>
<body>
<h1>Memorandum</h1>
<h2>To: Stacey Baker</h2>
<h2>From: David Angeles</h2>
<h2>Subject: Vacation Request</h2>
<h3>Date: September 22, 2000</h3>
<p>Stacey,</p>
<p>This note confirms my requested vacation dates, in order of preference, for
the 2001 calendar year:</p>
<ol>
<li>March 19, 2001 through March 30, 2001</li>
<li>October 8, 2001 through October 19, 2001</li>
<li>December 17, 2001 through December 28, 2001</li>
</ol>
<p>Regards,</p>
<p>David</p>
</body>
</html>
Listing 10.2: A Memorandum Using Paragraphs and Strong Emphasis for Addressing Components
<html>
<head>
<title>Memorandum</title>
</head>
<body>
<h1>Memorandum</h1>
<p><strong>To:</strong> Stacey Baker<br />
...more content...
</body>
</html>
NOTE
In the formal vocabulary of markup languages, you'll often hear elements that can be
nested referred to as child elements. The element that contains the nested element is
the parent element. This relationship becomes more important when you begin working
with scripting and the Document Object Model.
In HTML, the quotes were optional if you were working with attribute
values other than URIs or non-alphanumeric characters, such as the #
symbol in hex-based color codes. In XHTML (as well as XML), all
attribute values must be quoted for the document to be considered
well-formed.
Boolean attributes must be expanded. Stated another way, this
says every attribute must have a value. A Boolean attribute is one
that is "on" just by its presence. In HTML 4 an example could be the
checked attribute on the input element of type checkbox:
<input type="checkbox" name="mybox" checked>
or
<img src="myphoto.gif" alt="a picture of the author"/>
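The quoting and expansion rules can be confirmed with an XML parser. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the is_well_formed helper is a hypothetical name, and the expanded form checked="checked" is the convention XHTML uses for Boolean attributes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """True if an XML parser accepts the fragment as a complete element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML 4 style: unquoted values, minimized Boolean attribute, no end tag.
html4_style = "<input type=checkbox name=mybox checked>"

# Quoted values, but the Boolean attribute still has no value of its own.
minimized = '<input type="checkbox" name="mybox" checked />'

# XHTML style: every value quoted, Boolean attribute expanded, element closed.
xhtml_style = '<input type="checkbox" name="mybox" checked="checked" />'

results = [is_well_formed(f) for f in (html4_style, minimized, xhtml_style)]
```

Only the last form gets through the parser; the first two are rejected before any question of validity comes up.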
EXAMPLE
For instance, the ordered list element (ol) used previously in the memo
found in Listings 10.1 through 10.3 is designed to contain only one or more list item
elements (li). That constraint can't be checked, nor can it be enforced,
through the state of being well-formed.
What's needed, then, is a second set of rules that a document can be judged
against. Historically, beginning with SGML and moving on through HTML,
this rule set is provided by the document type definition. This document
provides a formal description of the elements, attributes, and allowed content for each segment of a document that might be written in the language
being defined.
Beginning in 1998, the W3C has been working on a second method of
describing and defining languages. This effort is known as XML Schemas.
The desire is to end up with a definition mechanism that is both
more flexible than a document type definition and more expressive.
The expressiveness comes with the ability to constrain element content not
only down to the level of simple text data, but to say the content must be a
10-digit whole number, or a string of five lowercase letters. The additional constraints possible with schemas are viewed as a significant
advancement in Internet commerce and automated data exchange. Just
how well these efforts work remains to be seen, as the XML Schema specifications are solidified and ratified by the W3C.
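To get a feel for the kind of constraint being described, the two examples above can be written as patterns. The sketch below uses Python regular expressions as a stand-in; it is not XML Schema syntax, and the names are made up for illustration.

```python
import re

# Stand-ins for the two content constraints mentioned in the text.
TEN_DIGIT_NUMBER = re.compile(r"[0-9]{10}")
FIVE_LOWERCASE_LETTERS = re.compile(r"[a-z]{5}")


def conforms(pattern, content):
    """True only if the entire element content matches the pattern."""
    return pattern.fullmatch(content) is not None
```

A document type definition can say only that an element holds text; a check like conforms(TEN_DIGIT_NUMBER, "9415551234") goes a step further and looks at what that text actually is.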
A valid document, then, is one that fully conforms to all the rules in the
document type definition. You've been exposed to this at a basic level
already in Chapter 8, "Validating XHTML Documents," when you checked
your documents using the W3C Validation Service.
For most documents that Web authors will work with, inserting the DOCTYPE
declaration at the beginning of your pages and running them through the
validator is the closest you'll come to the document type definition. But
some of you, at least a small part of the time, will begin working with
DTDs to customize portions of the language and use XHTML
Modularization to create new languages.
For more information on creating custom components for DTDs, see "Creating the Module
Using a DTD," p. 266.
Overview of Schemas
Over the years document authors voiced many complaints about the nature
and limitations of the document type definition as the basis of a markup
language. Many people find the notation and formatting used to be overly
formal and, primarily, too confusing for the non-expert to be able to readily
grasp.
Additionally, document type definitions are limited in the constraints that
can be placed on data content within elements, and the type of string data
that can be found in attribute values.
For instance, documents that contain phone numbers could make use of a
<telephone> element. Nothing other than the number (and its dividing
dashes) should be present:
EXAMPLE
<telephone>941-555-1234</telephone>
their working draft phase, and were about to move into the final review
process. You'll walk through a very basic primer on XML Schemas in
Chapter 15.
NOTE
Any material changes to the schema drafts that occur after publication will be covered on the book's supporting Web site.
The hope stored in much of this effort is that schemas will be more user-friendly. Additionally, they should offer sufficient facets for constraining
information so that document authors will be able to move easily between a
document-based information storage system and a data-driven system (and
of course other scenarios we probably haven't thought of yet). To address
the user-friendly aspect, XML Schemas are written in an XML-style format instead of the more formal (and cryptic) notation used in document
type definitions.
What's Next
In this chapter you have learned how XML can provide document authors
with the freedom to give meaning to their document structures by giving
them names that correspond to their purpose or contents. A parser or other
user agent won't know what these mean or even particularly care about
them, but data management systems can manage content much more easily
searching for <foo>, rather than a generic <p> structure that has the text
string "foo" inside of it.
We've also taken a close look at the definition of well-formedness, both
from an XML viewpoint and specifically the requirements for XHTML.
Conformance to a language can go further than well-formedness, by validating against a DTD or a schema.
Next, in Chapter 11, "Using Cascading Style Sheets with XHTML," we'll
take an in-depth look at XHTML 1.1, a version that takes a closer step
toward XML by removing deprecated elements and syntactical conventions.
11
Using Cascading Style Sheets with
XHTML
The last chapter introduced you to XHTML 1.1, where all presentational
elements and attributes have been removed from the language. Of course
this doesn't mean that Web pages created using XHTML 1.1 will be dry,
text-only documents. Instead, it means that presentation and style must be
applied to the document in some manner other than with XHTML. That job
now falls to a document known as a style sheet.
Style sheets can be written in a number of languages. In this chapter, we'll
review one that might be familiar to you: Cascading Style Sheets (CSS).
CSS first became popular as an adjunct to HTML. Today, it can still be used
with XHTML and has the benefit of familiarity among many Web developers. This chapter is not intended as a comprehensive tutorial on developing
style sheets. Such an endeavor could fill at least one book all by itself.
Instead, you will review the syntax used to create style rules, review the
different selectors available to bind styles to elements or instances of elements, and take a quick tour through some commonly used style properties.
This chapter teaches you:
The syntax of a basic style rule
How to create non-element selectors (class and ID)
How to use text styles
How to apply style within blocks
The file is then saved using the .css file extension. The style sheet is then
linked to the document using the link element inside the head element.
The XHTML document begins as it normally would:
EXAMPLE
Before the head element is closed, the new link element will be inserted.
This element takes a minimum of two attributes: rel and href. The rel
attribute describes the relationship between the current document and the
document being linked. The href attribute works as it does with an anchor
(a) element; it provides the URI of the linked document. Our link element,
then, is written as follows:
<link rel="stylesheet" href="calendar.css" />
The head element can be closed after this, and the remainder of the document put in place:
</head>
<body>
<h1>Scheduled Meeting</h1>
<p><b>With:</b> Shane McCarron<br />
<b>From:</b> ApTest<br />
<b>Time:</b> 10:00 am</p>
<h2>Details</h2>
<p>Discuss parameters of test suite for conformance of Widgets 1.0 to the W3C
standards.</p>
</body>
</html>
The results are shown in Figure 11.1, where you'll find underlined text
within the paragraphs.
OUTPUT
Figure 11.1: This XHTML page was created using an external style sheet.
NOTE
The bold portions of the text in Figure 11.1 also are underlined. This is because the
bold element inherited the underlining from the paragraph element. Inheritance is one
of the main features of working with CSS, in that an element contained within another
element inherits the properties of its parent element. For example, the bold element is
contained within the paragraph element, sometimes referred to as child (bold) and parent (paragraph).
The possibilities available to designers working with style sheets are quite
complex, allowing for some very stunning results. One of my favorite implementations, even if it does seem a bit biased, is the HTML Writers Guild
site (see Figure 11.2).
For example, take a look at the navigation bar down the left side of the
screen in Figure 11.2. The links look like graphical buttons; they're raised,
with beveled edges. The larger blocks of color, across the top and behind the
Recent Announcements section, are set using a CSS rule, as are the colored borders around text in the main (white) portion of the page.
To generate looks like this, the Webmaster made extensive use of classes,
grouping similar selectors together to write a single rule.
Identifying Selectors
A selector is the element, or group of elements, to which a style will be
applied. Selectors can be individual elements, lists of like elements, or specific instances of elements identified by class or ID attributes. In this next
section, well look at selectors of each type.
Elements as Selectors
Using an element as a selector is pretty straightforward. The style rule is
written by giving the element (selector), followed by the property or properties and values bound within braces:
selector {property:value}
Multiple elements that will have the same properties can be grouped to
save redundancy in the style sheet document. For instance, you might want
all headings to be colored red. Instead of writing
h1 {color:red}
h2 {color:red}
h3 {color:red}
h4 {color:red}
h5 {color:red}
h6 {color:red}
the selectors can be grouped, separated by commas, in a single rule:
h1, h2, h3, h4, h5, h6 {color:red}
There will inevitably be times, however, when you'll want the majority of
instances of a given element to be treated in one manner, but one or more
instances need to be given a different look. To handle this a new type of
selector, one not based on element names, needs to be used. These selectors
are known as classes.
Creating Classes
The syntax within the style sheet using a class as a selector is nearly identical to that used when the element is the selector. The only difference is
that classes use other-than-element names and begin with the period character; for example
.myclass {text-align:right; color:blue}
The <p> tag on the second paragraph needs to be adjusted to take the new
class assignment:
<p class="newp">
Note that the class attribute value is the name of the class without the .
character. Figure 11.3 shows the results; quite a difference between them,
isn't there?
NOTE
Although our example had a base style rule for the p element, it is not necessary for
there to be an existing style rule before a class rule can be applied.
The selector name is the value of the id attribute, prepended with the hash
character (#). A rule created for the element
<h1 id="a1">Generic Heading</h1>
would be written as
#a1 {property:value}
In most design cases, class selectors will be used rather than ID selectors,
unless you're fitting an existing document with the specialized ID structure
already in place.
NOTE
The style attribute, to be used for inline styling in documents, is being deprecated.
Deprecation, you'll recall, means that the W3C is indicating that the method may not
appear in future versions of the language, and generally, an alternative method will be
provided.
Applying Style
Now with the structure of style rules under your belt, let's take a quick tour
of popular styles. The full spectrum of style rules and possible values can
be found in the CSS recommendations on the W3C Web site. CSS Level 1,
the first version of CSS, is located at http://www.w3.org/TR/REC-CSS1. CSS
Level 2, which builds upon Level 1, is found at http://www.w3.org/TR/REC-CSS2/.
CSS1 covered the basics: fonts, list formatting, and box properties
such as margins and padding. CSS2 continued the work by adding control
of positioning, the ability to float boxes in a given area, and what's known
as the z-index, for a pseudo-three-dimensional presentation.
At present not all browsers fully implement CSS1. CSS2 support in
browsers is far from complete. Internet Explorer 5.5 and the previews of
Netscape 6 come close to fully implementing CSS1.
Values
font-family
font-style
font-size
font-variant
font-weight
Figure 11.4: This shows how a document layout can be visualized as a set
of nested boxes.
The large outer box is the html element, containing everything else in the
document. The biggest box inside is the body element, smaller boxes in that
are paragraphs, tables, lists, headings, and so on.
XHTML elements are easily divided up into block-level elements and the
rest, which are known as inline elements. All block elements are rendered
beginning on a new line in the browser. You'll have noticed that headings,
paragraphs, and div behave in this manner. Inline elements, on the other
hand, are rendered within the current line. Elements such as em, strong, and span are inline elements.
CAUTION
It is possible to change the display model for elements using the CSS display property. However, unless there's a particularly compelling reason to do so (and I'm hard
pressed to think of any), avoid changing these expected semantics. It's far better to
find an element with the correct semantics and use that.
Block elements have three distinct areas of space surrounding them: the
margin, the padding, and the border. Each of these areas is shown in the
drawing in Figure 11.5.
Figure 11.5: This shows margin, padding, and border spaces surrounding
a box.
These spaces are typically modified in terms of width and height to provide
visual white space between box elements and, when using the border space,
to provide a colored or other distinctive frame.
The document in Listing 11.2 has a heading and two simple paragraphs. To
demonstrate each of the box spacing properties, we've defined two style
rules using the h1 and p selectors.
EXAMPLE
The heading is given borders 10 pixels wide on the top and bottom only, colored in green. The paragraphs have 2 pixel-wide borders on all sides, colored purple. Additionally, the paragraphs have a left margin of 4 em spaces
and a padding of 1 em space. The style sheet is shown in Listing 11.3.
Listing 11.2 is a simple XHTML document to which we will apply the
box.css CSS style sheet.
Listing 11.2: A Simple XHTML Document Where You Can Apply the box.css CSS Style Sheet
<html>
<head>
<link rel="stylesheet" href="box.css" />
</head>
<body>
<h1>Did you know?</h1>
<p>There's a special paragraph that contains every letter in
the English language. Students who have taken a typing class
will surely be familiar with it.</p>
<p>The quick brown fox jumps over the lazy dog.</p>
</body>
</html>
Listing 11.3 applies box spacing properties to the XHTML document listed
in Listing 11.2.
Listing 11.3: The XHTML Document After Applying Box Spacing Properties
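h1 {border-top-width:10px;
    border-bottom-width:10px;
    border-style:solid none; /* solid on top and bottom, none on left and right */
    border-color:green}
p {border-width:2px;
    border-style:solid; /* no border is drawn unless a style is set */
    border-color:purple;
    padding:1em;
    margin-left:4em}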
The resulting page (shown in Figure 11.6) clearly shows the results of the
padding, with a four-sided gutter of space between the text and the purple
border, as well as the deeper left-side margin for the paragraphs.
word or letter spacing. These properties and their potential values are
listed in Table 11.2.
Table 11.2: Intra-block Spacing Properties
Style Property
Potential Values
text-align
text-indent
line-height
word-spacing
letter-spacing
TEXT ALIGNMENT
Aligning text is a process you're already familiar with using the align
attribute. The property used in CSS is the text-align property. The default
placement in XHTML is aligned to the left margin. Other possibilities are
aligned to the center, to the right margin, or a justified alignment, where it
appears that the text is aligned to both the left and right margins.
Justification is actually achieved by stretching the spaces within the line to
allow the ends to stay docked at the margin.
INDENTATION
For many years the ability, or perhaps better put the inability, to indent the
beginning of paragraphs was a seemingly never-ending quest. Designers
would try to force spaces using the &nbsp; character entity, only to have a
browser collapse the white space. They'd use small blank .gif images, yet
struggle with the changes in font sizes and line height.
When CSS was developed, a new property was defined: text-indent. The
value is a measure, expressed either in units such as pixels or em spaces,
or as a percentage of the available space within the block. Using the standard three-em indent so long sought after, the rule is expressed like this:
p {text-indent:3em}
Provided your browser supports this property, the result would look like
Figure 11.7.
LINE HEIGHT
Line height is a measurement from the baseline of one line of text to the
baseline of the line above or below it. The baseline is the invisible line on
which the bottoms of most letters rest (letters like g and q project below the
baseline with their tails). Potential values for the line-height property
include specific units (pixels, centimeters, and so on), and relative values. If
a font size is set to 10 pt, a line-height value of 1.4 results in the same
space allocated to a 14-pt font.
Figure 11.7: This page shows paragraphs with indented first lines, thanks
to the text-indent CSS property.
TIP
When working with line heights, keep in mind that the excess space not occupied by the font in use, known as the leading, is divided equally above and below the space
occupied by the font.
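The arithmetic behind a relative line-height value and the split of the leading can be worked through in a few lines. This is a small sketch in Python, with a hypothetical line_box function, using the 10-pt font and the 1.4 factor from the discussion above.

```python
def line_box(font_size_pt, line_height_factor):
    """Return the line height and the leading placed on each side of the text."""
    line_height = font_size_pt * line_height_factor
    leading = line_height - font_size_pt  # space not occupied by the font
    return line_height, leading / 2       # half the leading goes above, half below


line_height, half_leading = line_box(10, 1.4)
```

For a 10-pt font with a line-height of 1.4, the line occupies 14 points, leaving 4 points of leading: 2 above the text and 2 below it.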
WORD AND LETTER SPACING
If you've used justified text alignment, you've already seen word spacing in
action. The even margins are achieved by stretching the space between
words to fill the available distance between the first and last word on the
line.
You can change this distance manually using the word-spacing property.
The default value of this property is normal: what the browser will
choose based on the font designer's choices. Manual adjustments use a CSS
length value (pixels, em spaces, and so on).
Although word spacing operates on the space between whole words, the
letter-spacing property, also valued in CSS lengths, adjusts the space
between individual letters.
Care should be taken when modifying this property for large blocks of text.
Font designers have developed significant expertise in the readability of
their fonts and provide default settings for the space between individual
letter pairs. For instance, the space between a V and an A when next to
each other, "VA," is often different from the space between a P and an O,
"PO," because of how neatly the V and A letters fit together based on their
complementary angles. Font designers have taken these issues into consideration when setting the defaults. Adjusting them can lead the reader to
notice that something is slightly "off" in the presentation, though they'll
often not be able to discern just what that is.
What's Next
In this chapter we've reviewed the basic syntax for creating and referencing
external CSS style sheets. You've developed style rules based on single elements, groups of elements, classes, and id selectors.
Next up, in Chapter 12, "XSL: Style the XML Way," we'll take a look at the
XML-based style language XSL: the Extensible Stylesheet Language.
12
XSL: Style the XML Way
In the last chapter, you learned about the separation of structure and presentation in XHTML 1.1 and saw one way to apply style information to
XHTML 1.1 pages using Cascading Style Sheets.
Such complete separation of structure (content) from presentation in
XHTML 1.1 implements one of the basic principles of the Extensible
Markup Language (XML).
XHTML, whether version 1.0 or 1.1, complies with the syntax requirements of XML 1.0, for example the need to have an end tag for each
start tag. But XHTML 1.0 only went part way to making HTML fully
XML compliant. XHTML 1.0 included many presentation elements within
it, thus mixing content and presentation. To move XHTML closer to full
compliance with XML principles, those presentation-related elements had
to go. XHTML 1.1 removes the presentation elements (tags) from XHTML,
bringing XHTML more fully into line with the requirements and principles
of XML 1.0.
Cascading Style Sheets can be used with XML or XHTML, but CSS has
limitations when applied to documents where there is no presentation
information at all, for example, with respect to reordering elements in the
output document. In this chapter you will learn about another solution to
the need to apply style to XML documents, Extensible Stylesheet Language
Transformations (XSLT), when content and presentation are completely
separated. Because XHTML 1.1 documents are XML documents, the principles you learn will apply fully to XHTML documents, from XHTML 1.1
onwards.
This chapter teaches you:
Extensible Stylesheet Language Transformations (XSLT)
XML Path Language (XPath)
Extensible Stylesheet Language Formatting Objects (XSL-FO)
How to create an XSLT style sheet
Understanding XSLT
It's possible to use XHTML 1.0 pretty much as if it were simply HTML. But
to obtain full benefit from XHTML's compliance with XML, you will need to
begin learning about aspects of XML that have no direct equivalent in
HTML or XHTML 1.0.
One of the most powerful technologies in the XML family is Extensible
Stylesheet Language Transformations, usually abbreviated to XSLT.
XML 101
Before diving straight into XSLT, you will need to understand a few points
about XML that are relevant to how XSLT works.
The XML family of technologies is very flexible: there are often several
ways of achieving much the same result. That flexibility is useful, but it
also can be confusing until you get a clear picture of the advantages and
disadvantages of the different approaches.
The element is a foundational part of XML. Each element that has content
has a start tag and an end tag.
For example, if you wanted to use a particular piece of text in an XSLT
style sheet, you might use this element:
<xsl:text>To be or not to be?</xsl:text>
The start-tag is <xsl:text>. The end-tag is </xsl:text>. And, not surprisingly, the content is "To be or not to be?"
When an element is nested within another element, the outer element is said to be the parent and the contained element is said to be the child. For example, in this simple XML document, the <chapter> element is the parent of the
<paragraph> element:
<?xml version="1.0"?>
<chapter>
<paragraph>The paragraph element is a child of the chapter element.
</paragraph>
</chapter>
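The parent and child relationship can also be inspected from a program. The following is a minimal sketch, assuming Python and its standard library module xml.etree.ElementTree (a choice made here for illustration, not something XML itself requires). The variable names are hypothetical, and the sample document is repeated with its version value quoted, as XML requires:

```python
import xml.etree.ElementTree as ET

# The sample document from the text, with the attribute value quoted.
DOCUMENT = """<?xml version="1.0"?>
<chapter>
<paragraph>The paragraph element is a child of the chapter element.
</paragraph>
</chapter>"""

chapter = ET.fromstring(DOCUMENT)            # the outermost element
children = [child.tag for child in chapter]  # names of its child elements

print(chapter.tag)   # chapter
print(children)      # ['paragraph']
```

Iterating over an element visits only its direct child elements, which mirrors the parent and child terminology used above.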
Each XML document is allowed to have only one root element: an element
within which every other element in the document is properly nested. In
the example you have just seen, the root element is the <chapter>
element.
But the root element (or element root) is not the root of the whole document.
Outside of the root element but still within the document is, for example,
the prolog of the XML document, consisting of the XML declaration (where
present), processing instructions, the DOCTYPE declaration (where present),
and any comments you might choose to include.
All the content of an XML document is viewed, from the perspective of XML
1.0, as being contained within the document entity. The root element is a
child of the document entity.
Sometimes the term root is used in an unqualified way to refer to either the
document entity or the root element. Because, as you will see later in the
chapter, it is important in navigating through XML documents to know
exactly where you are, you need to be sure where any particular element is
located relative to the document entity and the root element.
The prolog of an XML document precedes the root element but is contained
within the document entity. The prolog might contain an XML declaration,
processing instructions, comments, and the DOCTYPE declaration. It is in the
prolog that you can use a processing instruction to attach an XSLT style
sheet to an XML or XHTML document, as you will see later.
There is much more to XML 1.0 than has been mentioned here, but
this basic information is needed for you to understand the discussion of
XSLT, XPath, and XSL-FO that follows.
The full text of the Extensible Markup Language (XML) 1.0
Recommendation can be viewed at
http://www.w3.org/TR/1998/REC-xml-19980210
Part of the work on XSL transformations took account of the need to
navigate around XML documents, and another W3C activity (XPointer) had
similar navigational needs. Rather than produce two standards, the W3C
focused work on those needs into a single initiative, from which the XML
Path Language (XPath) emerged as a W3C Recommendation on the same day that
XSLT was released.
The full XPath Recommendation, XML Path Language (XPath) Version 1.0,
can be viewed at
http://www.w3.org/TR/1999/REC-xpath-19991116
What Is XSL?
It might seem odd to ask what XSL is in a chapter like this, but different people use the term to refer to different things.
To avoid confusion, and wasted time working out why something fails, you
need to be aware of that difference in usage. The different types of
XSL will work together in some circumstances but not in others. Knowing
which meaning of XSL applies in a particular context therefore will help
you to avoid problems, both in your thinking and in practice, when one version of XSL does not work at all or fails to work as you expect it to.
There are basically four different ways in which the term XSL is used:
As a combination of XSLT and XSL-FO
As XSL-FO without XSLT
As Microsoft proprietary XSL, that is, non-W3C-compliant XSL
Ambiguously or indiscriminately to refer to any one or combination of
the first three
Part of the potential for confusion arises from variations in terminology by
the W3C over time. For example, one W3C document implies XSLT can be
used separately from XSL, whereas others state that XSLT is part of XSL.
Part of the problem in unambiguously describing this area arises because
XSLT and XSL-FO will, when XSL-FO is finalized, operate very closely
together.
To complicate things further, there are several versions of Microsoft XSL
available currently, each with varying degrees of compliance with the W3C
standards. The Microsoft XSL originally supplied with Internet Explorer
5.0 is now obsolete. At the time of this writing, the July 2000 Preview
Release was current and was significantly closer to W3C standards than
previous flavors of Microsoft XSL had been. Microsoft have stated that their
XSL will be made fully compliant with W3C standards, but that has not yet
been achieved.
In this chapter, the terms XSLT and XSL-FO are used to refer, respectively,
to the standards expressed in the W3C Extensible Stylesheet Language
Transformations Recommendation and the Extensible Stylesheet Language
Working Draft. XSLT transforms the source XML document into an output
document (for example, in another dialect of XML, such as XHTML). XSL-FO
applies the formatting information to the transformed document, controlling how it is displayed, whether in a desktop browser, in a mobile
browser, or on paper.
Because of the ambiguity surrounding the term XSL, it won't be used
(without explanation) in the remainder of this chapter.
XSLT Processors
An XSLT processor is a piece of software that can transform an XML source
document into a result or an output document. The result or output document can be another XML document (to which formatting objects might be
applied, but need not be), an XHTML or HTML document, or a plain text
document.
XSLT processors are routinely combined with an XML parser (sometimes
called an XML processor), but increasingly XSLT processors also are combined with other software modules (for example, an XML editor) in development environments that are more or less well integrated.
In some circumstances the result document might be saved to disc, such as
when a static XHTML page is created from an XML source file. On other
occasions the result document will be created dynamically and displayed to
a user but never exist other than in memory and onscreen.
Several XSLT processors are freely available for download. If you are using
a Windows 9x PC, an XSLT processor called Instant Saxon is particularly
easy to download and install. A full Java-based version of Saxon that can be
installed on most PC platforms also is available for download without
charge.
Instant Saxon can be downloaded from
http://users.iclway.co.uk/mhkay/saxon/instant.html
XSLT processors differ in how they associate a style sheet with an XML
file. One way is to include an <?xml-stylesheet?> processing instruction
within the prolog of the XML document. If you do this using Internet
Explorer 5.0, the style sheet will be associated with the XML document. In
addition (assuming the style sheet is situated in the location indicated by
the processing instruction and is a legitimate style sheet), the source XML
document will be transformed according to the transformations contained
within the style sheet. Early Preview Releases of Netscape 6 have not contained XSLT functionality.
To associate a CSS style sheet with an XML document, you might use a
processing instruction something like
<?xml-stylesheet href="myfirst.css" type="text/css"?>
To associate an XSLT style sheet with an XML file, you would use a processing instruction something like this:
<?xml-stylesheet href="myfirst.xsl" type="text/xml"?>
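To see how software might read such a processing instruction from the prolog, here is a minimal sketch, assuming Python's standard xml.dom.minidom module. The function name stylesheet_links and the sample document are hypothetical, and the regular expression is a simplification that only handles double-quoted values:

```python
import re
from xml.dom import minidom

# Assumed sample document: a prolog with one style sheet instruction.
DOCUMENT = """<?xml version="1.0"?>
<?xml-stylesheet href="myfirst.xsl" type="text/xml"?>
<chapter/>"""

def stylesheet_links(xml_text):
    """Return the href/type pairs of each xml-stylesheet instruction."""
    dom = minidom.parseString(xml_text)
    links = []
    for node in dom.childNodes:  # prolog nodes plus the root element
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # node.data is the raw text that follows the target name.
            pairs = re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data)
            links.append(dict(pairs))
    return links

print(stylesheet_links(DOCUMENT))
# [{'href': 'myfirst.xsl', 'type': 'text/xml'}]
```

Note that href and type are not true XML attributes; they are text inside the processing instruction, which is why the sketch has to pick them apart itself.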
NOTE
Further details of how to use the <?xml-stylesheet?> processing instruction are
found in the W3C Recommendation titled Associating Style Sheets with XML
Documents Version 1.0, which can be accessed at
http://www.w3.org/1999/06/REC-xml-stylesheet-19990629
Other XSLT processors can associate a style sheet with an XML file from
the command line. For example, using Instant Saxon to associate an XSLT
style sheet myfirst.xsl with an XML document called myfirst.xml and produce an XHTML file called myfirst.html, you would issue the following
command:
saxon myfirst.xml myfirst.xsl > myfirst.html
If you are using a different XSLT processor, the syntax might differ.
To extract or make use of the logical structure contained within a well-formed or valid XML document, it is necessary for the XSLT processor,
which will typically include an XML parser, to create a tree-like structure
of nodes in memory which captures the logical content of the XML document. Remember that the XSLT style sheet is itself an XML document so it,
too, is transformed into a hierarchical tree-like structure in memory.
CAUTION
If you are familiar with the Document Object Model, you might notice that the XSLT
object model is very similar. However, you should be aware that the DOM and XSLT data
models are not identical. Detailed consideration of the differences is beyond the scope
of this chapter.
Further information on the DOM Level 1 can be viewed at
http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001.
DOM Level 2 is currently being drafted by the W3C. Further details can be found at
http://www.w3.org/TR/2000/CR-DOM-Level-2-20000510.
The XML parser creates from the stream of characters in each XML document (the source XML document and the XSLT style sheet) a tree-like
structure in memory. The tree-like branching structure contains a
Namespaces in XML
Increasingly, the XML family of technologies are interdependent and work
together. One of the necessary technologies to make XSLT work involves
using XML namespaces. We need to take a quick look at what namespaces
are and how they work.
One of the innovations in XHTML 1.1 is the use of modules of elements and
the ability to create new elements for use in XHTML documents. You will
learn more about this in Chapters 14 through 16. This creates the potential
for confusion if two providers of modules happen to use the same element
name.
XML, and therefore XHTML, solves this problem by the use of namespaces.
Perhaps the easiest way to think of this is to think how human beings solve
a similar problem.
Depending where you live in the world, you might know several people
named Andrew. In most Western countries the way to distinguish one
Andrew from another is very simple: you can use the surname to distinguish Andrew Smith from Andrew Patterson from Andrew Watt. For most
practical day-to-day purposes that removes the likelihood of confusing one
person named Andrew with another.
XML uses a similar mechanism to avoid confusing one XHTML or XML element with another, similarly named element that
comes from a different module but is used within the same XML or
XHTML document.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
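The effect of a namespace declaration like this one can be observed with a namespace-aware parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The prefix t in the second document is a hypothetical alternative chosen only to show that the prefix itself carries no meaning:

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Two documents that bind different prefixes to the same namespace URI.
doc_a = ET.fromstring(f'<xsl:stylesheet xmlns:xsl="{XSLT_NS}" version="1.0"/>')
doc_b = ET.fromstring(f'<t:stylesheet xmlns:t="{XSLT_NS}" version="1.0"/>')

# ElementTree reports names as {namespace-URI}local-name, so the prefix
# chosen by the author does not affect the element's identity.
print(doc_a.tag)               # {http://www.w3.org/1999/XSL/Transform}stylesheet
print(doc_a.tag == doc_b.tag)  # True
```

In the surname analogy, the namespace URI plays the part of the surname and the local name stylesheet plays the part of the given name.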
XPath Nodes
As mentioned in the discussion earlier in this chapter, an XML document is
converted when loaded into memory and parsed into a tree-like structure of
nodes.
XPath allows for seven different types of node within a tree:
Root nodes: There is only one of these in an individual document,
which corresponds to the document entity of the XML 1.0 specification. A root node can have as its children element nodes, processing
instruction nodes, and comment nodes.
Element nodes: There is an element node for each element in the
source XML document. One element node, the element root, is a child
of the root node in a well-formed XML document.
Text nodes: Character data is grouped into text nodes. As much
character data as possible is grouped into each text node.
Attribute nodes: Each element node has an associated set of
attribute nodes; the element is the parent of each of these attribute
nodes. However, an attribute node is not a child of its parent element.
Namespace nodes: These can be envisaged as a representation of the
namespace declaration. An element node will have a namespace node
for each namespace declaration which is in scope.
Processing instruction nodes: There is a processing instruction
node for every processing instruction in a source document, except for
those processing instructions which may be contained in a document
type declaration.
Comment nodes: There is a comment node for every comment in a
source document, except for those comments which may be contained in a
document type declaration.
When processed in memory, this very simple XML document would have a
root node (representing the invisible document entity), which would have
two child nodes: a comment node representing the comment that forms the
second line of the code and an element node representing the <chapter> element. In addition, a text node representing the content of the <chapter>
element would be present as a child of the element node, which represents
the <chapter> element.
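The node structure just described can be approximated with a DOM parser. This is a minimal sketch, assuming Python's standard xml.dom.minidom module and an assumed stand-in for the sample document (a comment on the second line, followed by a <chapter> element that holds text). Bear in mind that the DOM and XPath data models are similar but not identical, so the DOM Document node only plays the role of the XPath root node here:

```python
from xml.dom import minidom

# Assumed stand-in for the very simple document described in the text.
DOCUMENT = """<?xml version="1.0"?>
<!-- A very simple document -->
<chapter>This is the chapter text.</chapter>"""

dom = minidom.parseString(DOCUMENT)  # Document node: stands in for the root node

top_level = [node.nodeName for node in dom.childNodes]
chapter = dom.documentElement
chapter_children = [node.nodeName for node in chapter.childNodes]

print(top_level)         # ['#comment', 'chapter']
print(chapter_children)  # ['#text']
```

The XML declaration on the first line does not appear among the children; it is not a node in the tree.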
Location Paths
The XPath term for the equivalent of street directions is location path. A
location path can be of one of two forms: an absolute location path, which
takes its starting point as the root node of the tree, or a relative location
path, which takes the context node as its starting point. These will be compared in more detail a little later in this chapter.
A location path consists of one or more location steps separated by a slash
(/). A location step has three parts:
An axis: Specifies the tree relationship between the nodes selected
by the location step and the context node. Example axes are parent
and child.
A node test: Specifies the node type and expanded-name of the
nodes selected by the location step.
Zero or more predicates: Use arbitrary expressions to further
refine the set of nodes selected by the location step.
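The three parts of a location step can be made concrete by pulling a step apart in code. The following is a rough sketch in Python, not a real XPath parser: the function name split_step is hypothetical, it only understands the unabbreviated axis::node-test[predicate] form, and it does not cope with nested brackets:

```python
import re

# axis "::" node-test, then zero or more [predicate] groups.
STEP = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def split_step(step):
    """Split an unabbreviated location step into (axis, node test, predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return match.group("axis"), match.group("test"), predicates

print(split_step("child::chapter"))
# ('child', 'chapter', [])
print(split_step('child::section[attribute::title="XPath - XML Path Language"]'))
# ('child', 'section', ['attribute::title="XPath - XML Path Language"'])
```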
A location path is a special type of XPath expression, and it always returns the node set that it selects. Other XPath expressions can
return Boolean values, numbers, or strings.
Detailed consideration of location steps, XPath functions, and so on is
beyond the scope of this chapter.
<?xml version="1.0"?>
<book title="XHTML By Example">
<!-- Some other chapters would be listed here -->
<chapter title="XSL: Style the XML Way">
<section title="XSLT - Transformation"></section>
<section title="XPath - XML Path Language"></section>
<section title="XSL-FO - XSL Formatting Objects"></section>
<section title="Creating an XSL Stylesheet"></section>
</chapter>
<!-- Some more chapters would be listed here -->
</book>
If the current context node was <book> then to find a <chapter> element,
which is a child of the <book> element, we could use the following XPath
expression:
child::chapter
Because in our simplified document all children of the <book> element are,
in fact, <chapter> elements, if the context node was the node corresponding
to the <book> element, we could achieve the same thing by writing
child::*
in unabbreviated syntax, or
*
in abbreviated syntax. Yes, that's right; a simple asterisk will choose all
child elements of the context node. Note that the * does not select the children of such child nodes.
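These relative location paths can be tried out directly. The sketch below assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of the abbreviated XPath syntax (it does not accept the unabbreviated child:: form), so the abbreviated equivalents are used. The variable names are hypothetical, and the sample document is repeated with its attribute values quoted:

```python
import xml.etree.ElementTree as ET

BOOK = """<?xml version="1.0"?>
<book title="XHTML By Example">
<!-- Some other chapters would be listed here -->
<chapter title="XSL: Style the XML Way">
<section title="XSLT - Transformation"></section>
<section title="XPath - XML Path Language"></section>
<section title="XSL-FO - XSL Formatting Objects"></section>
<section title="Creating an XSL Stylesheet"></section>
</chapter>
<!-- Some more chapters would be listed here -->
</book>"""

book = ET.fromstring(BOOK)          # the context node: <book>

by_name = book.findall("chapter")   # abbreviated form of child::chapter
by_star = book.findall("*")         # abbreviated form of child::*

print([e.tag for e in by_name])     # ['chapter']
print([e.tag for e in by_star])     # ['chapter']
```

Neither expression returns the <section> elements, because they are children of <chapter>, not of the context node.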
Similar to the XPath expressions for selecting elements, there are corresponding expressions to select attributes, which will not be detailed here.
So far in this chapter you have seen relative location paths, that is, relative
to a context node. However, XPath expressions also allow for absolute
addressing. Another way to look at absolute location paths is that they, too,
are relative, but relative to the root node of the XSLT/XPath tree. Here are
some examples of XPath expressions that use absolute location paths,
applied to the previous example:
<?xml version="1.0"?>
<book title="XHTML By Example">
<!-- Some other chapters would be listed here -->
<chapter title="XSL: Style the XML Way">
<section title="XSLT - Transformation"></section>
<section title="XPath - XML Path Language"></section>
<section title="XSL-FO - XSL Formatting Objects"></section>
<section title="Creating an XSL Stylesheet"></section>
</chapter>
<!-- Some more chapters would be listed here -->
</book>
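To select every <section> element with an absolute location path, we could write
/book/chapter/section
or
/child::book/child::chapter/child::section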
Both the abbreviated and unabbreviated syntax would select all four of the
<section> elements that are present in our document.
To make a more precise selection of just one of the <section> elements, we
could use an XPath expression to make a selection by specifying the title
attribute of a particular <section> element:
/book/chapter/section[@title="XPath - XML Path Language"]
or
/child::book/child::chapter/child::section[attribute::title="XPath - XML Path Language"]
Note that selecting an element on the basis of the value of one of its attributes is not the same as selecting the attribute.
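The difference between selecting every <section> element and selecting one by the value of its title attribute can be sketched as follows, again assuming Python's standard xml.etree.ElementTree module. That module does not accept absolute paths on an element, so the paths below are written relative to the <book> element. The variable names are hypothetical:

```python
import xml.etree.ElementTree as ET

BOOK = """<?xml version="1.0"?>
<book title="XHTML By Example">
<chapter title="XSL: Style the XML Way">
<section title="XSLT - Transformation"></section>
<section title="XPath - XML Path Language"></section>
<section title="XSL-FO - XSL Formatting Objects"></section>
<section title="Creating an XSL Stylesheet"></section>
</chapter>
</book>"""

book = ET.fromstring(BOOK)

# All four sections, then only the one whose title attribute matches.
all_sections = book.findall("chapter/section")
picked = book.findall("chapter/section[@title='XPath - XML Path Language']")

print(len(all_sections))       # 4
print(len(picked))             # 1
print(picked[0].tag)           # section (the element, not the attribute)
print(picked[0].get("title"))  # XPath - XML Path Language
```

The predicate filters on the attribute value, but what comes back is still the <section> element itself.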
Some straightforward usage of XPath will be illustrated later in the chapter when you see some example XSLT style sheets.
XSL-FO: Formatting Objects
The Extensible Stylesheet Language Transformations Recommendation was
issued by W3C in November 1999. Currently, W3C is working on an
Extensible Stylesheet Language Working Draft, which focuses largely on
XSL Formatting Objects (XSL-FO).
The W3C work on XSL-FO aims to capture all the functionality of
Cascading Style Sheets for use with XML documents on the Web but also
intends to add many layout facilities to allow use of XML documents not
only on the Web but also in print. By separating content from presentation
this becomes possible.
When an XSLT processor is applied, with an appropriate XSLT style sheet,
to an XML document a transformation of the source document occurs, producing an output or result tree.
That same process is carried out when using XSL Formatting Objects by
including such formatting objects in the tree output by the XSL
Transformation.
A second step then takes place called formatting which, not surprisingly, is
carried out by an XSL formatter. In this context XSL refers to a combination of XSLT and XSL-FO included in the output tree.
The most recent W3C Working Draft summarizes the process as follows:
Formatting is enabled by including formatting semantics in the result
tree. Formatting semantics are expressed in terms of a catalog of
classes of formatting objects. The nodes of the result tree are formatting objects. The classes of formatting objects denote typographic
abstractions such as page, paragraph, table, and so forth. Finer control over the presentation of these abstractions is provided by a set of
formatting properties, such as those controlling indents, word- and
letter-spacing, and widow, orphan, and hyphenation control. In XSL,
the classes of formatting objects and formatting properties provide the
vocabulary for expressing presentation intent.
What that means is that a formatter will obtain information from nodes in
the result tree that are formatting objects to provide the framework for the
presentation of the output XML document and that smaller scale, fine control of detail in the output document is provided by formatting properties.
In principle, a formatter will be able to create output for traditional desktop
Web browsers, mobile browsers, and output on paper.
Typically, many of the nodes in the result tree will be in the formatting
objects namespace.
Each formatting object represents part of the specification of the output, for
example, layout and style.
The Working Draft summarizes the formatting process as follows:
Formatting consists of the generation of a tree of geometric areas,
called the area tree. The geometric areas are positioned on a sequence
of one or more pages (a browser typically uses a single page). Each
geometric area has a position on the page, a specification of what to
display in that area and may have a background, padding, and borders. For example, formatting a single character generates an area
sufficiently large enough to hold the glyph that is used to present the
character visually and the glyph is what is displayed in this area.
These areas may be nested. For example, the glyph may be positioned
within a line, within a block, within a page.
If you had difficulty understanding the ideas involved in an XSLT output
tree of nodes, you might find the idea of a tree of geometric areas a little
overwhelming. Broadly, it is helpful to visualize the geometric areas nodes
as reserved areas on the output page that are set aside for particular parts
of the output tree to be displayed.
The current Working Draft describing XSL-FO is many times larger than,
for example, the XSLT 1.0 Recommendation, and detailed consideration of
its contents is beyond the scope of this chapter. I estimate that the current
Working Draft is equivalent to a printed document of perhaps 500 pages!
Thus, you will realize that this section is a very compressed account of a
topic that would require a substantial book simply to describe it.
If you want to explore some of the detail of the current Working Draft, you
can find it at
http://www.w3.org/TR/2000/WD-xsl-20000327/
At the present time Web browsers do not support XSL-FO, and are unlikely
to do so for some time after the XSL-FO Recommendation is finalized.
CAUTION
If you find an XSL style sheet that includes the text
xmlns:xsl="http://www.w3.org/TR/WD-xsl" in the namespace declaration attribute
of an <xsl:stylesheet> or <xsl:transform> element, the style sheet is using an outdated proprietary form of Microsoft XSL, which deviates from the W3C standard.
Updates to a more conformant version of MSXML can be downloaded from the
www.microsoft.com Web site.
The current form of the namespace declaration is
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
So, the basic skeleton of an XSLT style sheet looks like this:
<?xml version="1.0" ?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!-- The meat of the stylesheet will go here -->
</xsl:stylesheet>
This style sheet is well-formed XML. Unfortunately, it does absolutely nothing. But the positive side of that is that it produces no errors either. So,
now you can give thought to putting useful content into your first style
sheet.
Figure 12.1: The XML document viewed in Internet Explorer 5.0 before any
transformation (other than IE 5.0's default style sheet).
Nor do we really want to see the start- and end-tags of the <famous_quote>
element.
Instead of simply allowing Internet Explorer to display an XML file, we will
be able to produce a neater output by using an XSL Transformation to create an XHTML file that Internet Explorer can display.
We want the output file to look something like this:
<html>
<head>
<title>
Shakespeare Quotation
</title>
</head>
<body>
<h1>To be or not to be?</h1>
</body>
</html>
NOTE
The output file might appear without any whitespace and so be considerably less readable than the XHTML that is shown here. To avoid the complexities involved in handling
whitespace, no attempt will be made here to control whitespace either in the style
sheet or the output document.
When viewed in Internet Explorer 5.0 the output, although still very simple, is a little tidier with a title in the browser window and a little basic
layout.
Figure 12.2: The XHTML output in Internet Explorer following transformation of the XML source document.
Here is the XSLT style sheet which produced that output:
<?xml version="1.0" ?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<html>
<head>
<title>
Shakespeare Quotation
</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="famous_quote">
<h1><xsl:value-of select="."/></h1>
</xsl:template>
</xsl:stylesheet>
To help you understand the style sheet and how it works, we will go
through key parts of it line by line.
<?xml version="1.0" ?>
The start tag of the <xsl:stylesheet> element declares the XSL namespace
using the official W3C URI for XSLT.
<xsl:template match="/">
The <xsl:template> element is key to the operation of many XSLT transformation sheets. The "/" is an XPath expression which matches the document
root. Because our source XML document, in common with all source trees,
has a document root, this XSLT template is applied. Most of what the
<xsl:template> does is to output a series of lines that form an XHTML file.
<html>
<head>
<title>
Shakespeare Quotation
</title>
</head>
<body>
This completes the initial series of lines of XHTML that are output as literal text.
<xsl:apply-templates/>
The <xsl:apply-templates> element is another commonly used and important XSLT element. Essentially, what it means here is to process the children of the current node (here, the document root) and apply any template that matches
each of those nodes.
</body>
</html>
The closing </body> and </html> tags are output after the <xsl:apply-templates> element is processed.
NOTE
Strictly speaking, many operations carried out by XSLT processors need not be in strict
order. However, it is useful when attempting to grasp how an XSLT transformation sheet
works to think of these processes as being sequential. Discussion of when it is significant and when it is immaterial what order processing is carried out in is beyond the
scope of this chapter.
The literal output stops here, and you see a second <xsl:template>
element, which will be applied as a result of being called by the
<xsl:apply-templates> element you saw a little earlier.
</xsl:template>
<xsl:template match="famous_quote">
<h1><xsl:value-of select="."/></h1>
These are simply closing tags for the <xsl:template> element (in this case
the one which matches the <famous_quote> element and outputs its value
using the XPath expression ".") and for the <xsl:stylesheet> element.
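The way the two templates cooperate can be imitated in ordinary code. The following sketch is written in Python and is not an XSLT processor: it hard-codes the two template rules as functions to show the order in which output is produced. The source document is assumed to be a single <famous_quote> element, and all function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumed source document, based on the <famous_quote> element in the text.
SOURCE = "<famous_quote>To be or not to be?</famous_quote>"

def quote_rule(node):
    # Stands in for <xsl:template match="famous_quote">; joining itertext()
    # approximates <xsl:value-of select="."/> (the node's string value).
    return "<h1>" + "".join(node.itertext()) + "</h1>"

RULES = {"famous_quote": quote_rule}  # match pattern -> template function

def apply_templates(nodes):
    # Stands in for <xsl:apply-templates/>: run the matching rule for each
    # node; for unmatched elements, descend into their child elements.
    # (Simplification: XSLT's built-in rules would also copy text nodes.)
    parts = []
    for node in nodes:
        rule = RULES.get(node.tag)
        parts.append(rule(node) if rule else apply_templates(list(node)))
    return "".join(parts)

def root_rule(document_element):
    # Stands in for <xsl:template match="/">: literal output wrapped around
    # whatever the templates applied to the root's children produce.
    return ("<html><head><title>Shakespeare Quotation</title></head><body>"
            + apply_templates([document_element])
            + "</body></html>")

result = root_rule(ET.fromstring(SOURCE))
print(result)
```

The literal XHTML comes from the root rule, and the <h1> line appears at the point where the root rule hands control to the other rule, just as in the style sheet.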
All it does is to make the color of the output text in <h1> elements red, to
align it to the right, and to underline the text.
As you can see from Figure 12.3, the Cascading Style Sheet was successfully applied.
What's Next
In this chapter we've looked at some of the basic concepts that you need to
understand to use XSLT, and you have seen some simple examples of
putting those concepts into practice.
Next up, in Chapter 13, we'll take a look at Document Type Definitions,
which define what is allowable content for XHTML documents.
13
Document Type Definitions: The Syntax Rulebook
Every markup language has a set of rules that must be followed for a document to conform to the language. The XHTML language is defined in a set
of three documents known as Document Type Definitions, or DTDs. You've
used DTDs already by incorporating the document type declaration in your
XHTML pages, and you've checked your authoring work against the DTD
using the validation service.
As XHTML begins to incorporate advanced features such as
Modularization, Web authors will hold an advantage if they are comfortable
working with DTDs. At a minimum, you should be able to read a DTD, if
not actually author your own.
This chapter teaches you:
The syntax used to write DTDs
How to declare an element
How to create an attribute-list declaration
Using Parameter Entities
How to resolve parameter entities to fully understand an element and
attribute-list declaration
The characters ::= should be read "is defined as." A common example used to
demonstrate this is the set of letters known as vowels:
vowels ::= [aeiou]
This rule would be read as "The symbol vowels is defined as the lowercase
letters a, e, i, o, and u." If you didn't care which case was used, you would include both versions in the expression:
vowels ::= [aeiouAEIOU]
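If you have used regular expressions, the bracket notation will look familiar. As a rough analogy only (EBNF and regular expressions are different notations), the two rules can be mirrored with Python's standard re module. The constant names are hypothetical:

```python
import re

VOWELS = re.compile(r"[aeiou]")                # vowels ::= [aeiou]
VOWELS_ANY_CASE = re.compile(r"[aeiouAEIOU]")  # vowels ::= [aeiouAEIOU]

print(bool(VOWELS.fullmatch("e")))           # True
print(bool(VOWELS.fullmatch("E")))           # False
print(bool(VOWELS_ANY_CASE.fullmatch("E")))  # True
```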
The first complex expression we'll look at is "or." Consider that the symbol
Today can be defined as "Monday," "Tuesday," "Wednesday," "Thursday,"
"Friday," "Saturday," or "Sunday," depending on the day of the week. To
write that in EBNF, we'd say:
Today ::= ("Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | "Sunday")
The values shown are the hexadecimal numbers representing the characters' places in the ASCII charts (hex being a popular machine-read representation). The + character outside the parentheses indicates one or more of
the enclosed values. You would read this notation as follows:
"White space is defined as one or more space, tab, newline, or linefeed
characters."
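The rule being read here appears to be the white space production from the XML 1.0 Recommendation, S ::= (#x20 | #x9 | #xD | #xA)+. As a rough analogy, assuming Python's standard re module and a hypothetical constant name, the same set of characters and the "one or more" indicator can be written like this:

```python
import re

# One or more of: space (20), tab (09), carriage return (0D), line feed (0A).
WHITE_SPACE = re.compile(r"[\x20\x09\x0D\x0A]+")

print(bool(WHITE_SPACE.fullmatch(" \t\r\n")))  # True
print(bool(WHITE_SPACE.fullmatch("a b")))      # False
print(bool(WHITE_SPACE.fullmatch("")))         # False (+ needs at least one)
```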
There will certainly be times when it's more succinct to define a symbol by
what it doesn't contain than by what it does. For instance, Weekdays is
defined in a less verbose fashion by saying "All days except Saturday and
Sunday" instead of "Monday, Tuesday, Wednesday, Thursday, Friday."
EBNF provides for these exceptions using an exclusion set. The weekdays
example would be written as follows:
weekdays ::= ( [^Saturday Sunday] | Week)
The exclusion set is bound by the bracket characters [ and ], and the caret
indicates the beginning of the set. The entire expression is predicated on
the subject (Week) having been previously defined.
Finally, we can express constraints such as "one, but not the other" and
"either/or."
Defining Elements
Elements are the basic building blocks of the document. Before an element
can be used in a markup language, it must be declared (defined) in the
DTD. The basic syntax for the declaration looks like this:
<!ELEMENT name ContentModel>
Name, in this expression, is the name of the element being defined. The
content model is an expression of what the element can contain. With your
knowledge of HTML, and the XHTML you've learned so far, you know that
some elements can contain other elements, whereas others can contain just
text, and some can't contain anything at all. These constraints on content
are known as content models, and every element has one.
XML, and therefore XHTML, has four basic content model types, shown
here in Table 13.1.
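Table 13.1: Content Models Found in XHTML
Content Model    Allowable Content
Empty            No content at all
Any              Any mix of character data and declared elements
Element          Only child elements, as listed in the declaration
Mixed            Character data, optionally mixed with child elements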
The Empty content model says that no content is allowed in the element.
Probably the most familiar example is the img element. It's also the easiest
content model to declare. The actual img element declaration in the
XHTML 1.0 DTDs looks like this:
<!ELEMENT img EMPTY>
This one was pretty easy to read, although declarations can get a little
more complex. Let's look at an element that has been given the Element
content model: the unordered list element (ul).
<!ELEMENT ul (li)+>
The content model expression should be familiar to you from our previous
discussion of EBNF. This declaration says that the element ul must contain
one or more instances of the li element.
The + symbol used in the ul element declaration is known as an occurrence
indicator. There are three indicators commonly used in DTDs, as seen in
Table 13.2.
Table 13.2: Occurrence Indicators Used in DTDs
Occurrence Indicator    Definition
?                       Zero or one occurrence
*                       Zero or more occurrences
+                       One or more occurrences
More than one element can be found in an Element-type content model. For
instance, the table row element, tr, can contain one or more of the th and
td elements. The element declaration for tr then, would be written as
follows:
<!ELEMENT tr (th|td)+>
The vertical bar character between the two elements indicates that the document author has a choice of which to use, and the + occurrence indicator
says that at least one of the elements must occur, though you can have
more.
NOTE
The occurrence indicator operates on the whole of the expression bound by the parentheses. So when evaluating (th|td)+, it can be easier to work from the outside in. We
could read that the tr element must contain one or more elements, and those elements might be th or td.
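One way to get a feel for a content model is to treat it as a pattern over the names of an element's children. The sketch below does this for (th|td)+ in Python, using the standard re and xml.etree.ElementTree modules. It is an illustration only, not how a validating parser works, and the helper child_names is hypothetical:

```python
import re
import xml.etree.ElementTree as ET

def child_names(element):
    """Child element names joined as 'a,b,c,' so a regex can test them."""
    return "".join(child.tag + "," for child in element)

# (th|td)+  ->  one or more names, each of which is either th or td
TR_MODEL = re.compile(r"(?:(?:th|td),)+")

good = ET.fromstring("<tr><th>Day</th><td>Monday</td></tr>")
empty = ET.fromstring("<tr></tr>")
wrong = ET.fromstring("<tr><li>Monday</li></tr>")

print(bool(TR_MODEL.fullmatch(child_names(good))))   # True
print(bool(TR_MODEL.fullmatch(child_names(empty))))  # False
print(bool(TR_MODEL.fullmatch(child_names(wrong))))  # False
```

The empty row fails because the + indicator demands at least one th or td.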
Content models can contain more than one of the indicators found in Table
13.2 when the mix of elements becomes more complex. Consider the
XHTML 1.0 Transitional declaration for the table element:
<!ELEMENT table
(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
There's quite a lot going on here, but we can take it apart piece by piece to
fully understand the expression.
First, the comma-delimited list of elements is known as a sequence. Each of
the elements in the sequence must appear within the element being defined
(except when modified by an occurrence indicator with a zero option).
Reading left to right, the content model for table becomes:
An optional caption element, then zero or more col elements or zero or more colgroup elements, a thead element if desired, a tfoot element if
desired, and finally at least one tbody element or at least one tr
element.
Looking closely, the only required content here is at least one tbody or tr
element; all others are optional.
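As a rough illustration of that sequence, the fragment below uses the optional caption and thead, skips col, colgroup, and tfoot, and then supplies the required tbody. It is a sketch meant to sit inside the body of an XHTML 1.0 Transitional document; the cell text is invented.

```xml
<table>
  <!-- caption? : optional, and first if present -->
  <caption>Pantry staples</caption>
  <!-- thead? : optional header group -->
  <thead>
    <tr><th>Item</th><th>Quantity</th></tr>
  </thead>
  <!-- (tbody+|tr+) : the only required part of the sequence -->
  <tbody>
    <tr><td>Flour</td><td>2 cups</td></tr>
  </tbody>
</table>
```

Moving the caption after the thead would break the sequence, because the comma-delimited order is significant.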
CAUTION
Before worrying about where the td elements are in this expression, think about what
we're defining: the elements that might be or are required to be contained within the
table element. The td element isn't contained within the table element itself;
instead, it's contained within the tr element (which is the element contained within
table). Because td is contained in something else, it's not mentioned here. Only the
first-level sub-elements are defined.
Creating Attributes
Elements alone aren't expressive enough to fill the needs of document
authors. A secondary data structure, the attribute, is used to provide additional information about the individual element instance. A common
EXAMPLE
After the declaration begins (with the <!ATTLIST string), the element where
the attributes will be used is noted, followed by the attribute name, the
attribute value type, and then any necessary keywords. We'll take this
apart piece by piece.
Attributes, like elements, each have a content model, although in this case
they're known as attribute value types. There are four major types used in
XHTML, which we'll define next (see Table 13.3).
Table 13.3: Major Attribute Value Types in XHTML
Attribute Value Type    Sample
String Types            A string of characters
Tokenized Types         NMTOKEN, ID, or IDREF
Enumerated Types        A list of fixed values
Entities                An entity defined elsewhere (for example, a URI)
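The first three rows of the table can be seen side by side in a short sketch. The item element and its attribute names (label, code, status) are invented for illustration and are not part of XHTML.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element used only to contrast attribute value types. -->
<!DOCTYPE item [
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    label   CDATA          #IMPLIED
    code    NMTOKEN        #IMPLIED
    status  (draft|final)  "draft"
  >
]>
<!-- label: string type; code: tokenized type; status: enumerated type -->
<item label="All-purpose flour" code="flour-01" status="final">Flour</item>
```

The string value may contain spaces, the token may not, and status accepts only one of the two listed values.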
TOKENIZED TYPES
A token is another name for a label. In XHTML, instances of certain elements need to be identified, so a label is required. Consider the input elements in forms.
NMTOKEN
The name attribute is used to identify and bind the value of the input into a
name=value pair for processing by the CGI script. The value of the name
attribute is a label, or token. When used in this manner, the attribute value
is referred to as an NMTOKEN, or name token.
The strings that make up a name token are restricted to letters, digits, and
the characters - (dash), _ (underscore), : (colon), and . (period). XML names,
such as element names and ID values, follow two further rules:
1. A name must begin with a letter, an underscore, or a colon.
2. The first three characters should not be the series xml, in any case variant (for example, not xml, XML, xMl, and so on), because such names are reserved.
ID AND IDREF
There are two additional tokenized types, which can identify individual
elements or work in concert with each other: ID and IDREF. Whereas the
NMTOKEN type can provide a name for an element, there is no requirement
that the name be unique. Uniqueness isn't desirable when you're trying to
match up a set of check boxes on that form.
When you do need a unique identifier, the ID attribute type can provide that.
An ID value uses the same characters as an NMTOKEN, but it must begin with a
letter, an underscore, or a colon, and it must be unique within the document.
The ID and IDREF types can be used together to provide a logical connection
to two portions of a document. In a report, if you were to provide footnotes
or bibliographic references at the end of the document, on paper you'd simply use a superscript number at the point of reference, and then the number again when writing out the note. If you were to create your own
elements for this, you might choose an info element and corresponding
note element. Attributes for each would need to hold the unique identifier.
The markup might look like this:
EXAMPLE
Although in this example I've chosen to use a number for the value of the
reference attribute, the attribute is defined as type ID, so any legal ID
string would be possible there.
NOTE
You might have noticed that the attributes that hold the ID and IDREF values have different names; that's okay! What's important is the matching of the attribute values
when the first has the type ID and the second IDREF.
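One way to sketch the info and note pairing is shown here. The report and para wrappers, the target attribute name, and the value n1 are assumptions made for this illustration; the value begins with a letter because an ID must be a legal XML name.

```xml
<?xml version="1.0"?>
<!-- Hypothetical footnote markup pairing an IDREF with an ID. -->
<!DOCTYPE report [
  <!ELEMENT report (para | note)+>
  <!ELEMENT para (#PCDATA | info)*>
  <!ELEMENT info EMPTY>
  <!ELEMENT note (#PCDATA)>
  <!-- The two attributes have different names but matching values. -->
  <!ATTLIST info target    IDREF #REQUIRED>
  <!ATTLIST note reference ID    #REQUIRED>
]>
<report>
  <para>XHTML reformulates HTML as XML.<info target="n1"/></para>
  <note reference="n1">See the XHTML 1.0 Recommendation.</note>
</report>
```

A validating parser should complain if target pointed at a value that no ID attribute in the document carries, or if two notes shared the same reference value.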
ENTITIES
We'll talk about entities in more detail later in this chapter (see "Parameter
Entities" and "Planning for Global Entities and Attributes"). At this point,
you need to know that general entities represent additional content or
objects. For instance, a URI is an entity, so the src attribute on the img element is an entity attribute type.
EXAMPLE
Figure 13.1: The first elements and attributes defined are for major
structural components.
The root element for XHTML is defined as follows:
<!ELEMENT html (head, body)>
<!ATTLIST html
  %i18n;
  xmlns       %URI;          #FIXED 'http://www.w3.org/1999/xhtml'
  >
The html element must contain a single head element and a body element.
It can take attributes defined by the i18n entity (i18n stands for internationalization, an i with 18 additional letters before the final n), and the
xmlns attribute.
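Put together, a minimal document that satisfies this declaration might look like the sketch below, which assumes the XHTML 1.0 Strict DTD and uses invented title and paragraph text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- xmlns carries the fixed value; lang and xml:lang come from %i18n; -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Minimal document</title>
  </head>
  <body>
    <p>Hello, XHTML.</p>
  </body>
</html>
```

Because xmlns is declared #FIXED, supplying any other value for it would make the document invalid.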
Each additional section is well commented, both with dividing comments
blocking out related sections (as seen in Figure 13.1), and with additional
explanation or rationale provided to aid the human readability of the DTD
document.
Entities, as used in the declarations shown here, are an important part of
large DTDs. They deserve close inspection, and we'll walk through the resolution of an entire set of entities used in defining an element and attribute
list in this next section.
Parameter Entities
Parameter entities are entities that are used only within the DTD that
defines them. Most often, they're used as a form of shorthand. Instead of
writing out attribute definitions repeatedly in the various elements that
might contain them, a parameter entity is defined and then used for those
definitions.
For example, the align attribute is used in many text-containing elements:
headings, paragraphs, and so on. A parameter entity TextAlign is defined,
acting as a replacement for the full att-list declaration:
<!ENTITY % TextAlign "align (left|center|right) #IMPLIED">
Then any time the align attribute is needed (for text alignment issues), the
entity is used, prefaced by the % symbol, instead of the actual declaration,
as is done here with the ATTLIST for the h1 element:
<!ELEMENT h1 %Inline;>
<!ATTLIST h1
%attrs;
%TextAlign;
>
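The shorthand is easy to try in a DTD of your own. The following sketch is a hypothetical external DTD (the heading and para elements are invented) that declares the alignment attribute once and reuses it twice. Note that parameter entity references inside a declaration are permitted only in an external DTD file, not in a document's internal subset.

```xml
<!-- sketch.dtd: a hypothetical external DTD, not part of XHTML -->

<!-- Declare the attribute definition once... -->
<!ENTITY % TextAlign "align (left|center|right) #IMPLIED">

<!-- ...and reference it wherever the attribute is allowed. -->
<!ELEMENT heading (#PCDATA)>
<!ATTLIST heading
  %TextAlign;
>

<!ELEMENT para (#PCDATA)>
<!ATTLIST para
  %TextAlign;
>
```

Changing the list of allowed values in the entity updates both elements at once, which is the main payoff of the technique.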
As you can see, there are two other parameter entities in use in this element and att-list declaration: Inline and attrs. Parameter entities are
used throughout XHTML to improve the readability of the DTD. At the
same time, however, you need to understand all that the PEs represent
when they're used in this manner. Let's expand each of the entities to see
what the declarations would have looked like without them.
The first entity is Inline, which represents the content model for the h1 element. Inline was defined as follows:
<!ENTITY % Inline "(#PCDATA | %inline; | %misc;)*">
The Inline entity provides a content model only: zero or more items, each of
which is either #PCDATA or one of the elements named by the inline or misc entities.
Replacing Inline with its expansion, the declaration for h1 becomes
<!ELEMENT h1 (#PCDATA | %inline; | %misc;)*>
The phrase, inline.forms, and misc entities also are found in this location:
<!ENTITY % phrase "em | strong | dfn | code | q | sub | sup |
samp | kbd | var | cite | abbr | acronym">
<!ENTITY % inline.forms "input | select | textarea | label | button">
<!ENTITY % misc "ins | del | script | noscript">
Going through the same exercise as with the element declaration, we'll
expand each of these entities, beginning with coreattrs:
<!ENTITY % coreattrs
 "id          ID             #IMPLIED
  class       CDATA          #IMPLIED
  style       %StyleSheet;   #IMPLIED
  title       %Text;         #IMPLIED"
  >
TIP
Though DTDs don't have the power to constrain NMTOKEN any further than its initial
naming rules, a comment in the DTD indicates that the token should conform to the list
of language codes, which are defined in RFC1766.
<!ATTLIST h1
  id          ID             #IMPLIED
  class       CDATA          #IMPLIED
  style       CDATA          #IMPLIED
  title       CDATA          #IMPLIED
  lang        NMTOKEN        #IMPLIED
  xml:lang    NMTOKEN        #IMPLIED
  dir         (ltr|rtl)      #IMPLIED
  %events;
  %TextAlign;
  >
Finally, the TextAlign entity is resolved and then incorporated into the
ATTLIST declaration:
<!ENTITY % TextAlign "align (left|center|right) #IMPLIED">
The final, fully resolved ELEMENT and ATTLIST declarations for the h1 element
become
<!ELEMENT h1 (#PCDATA | a | br | span | bdo | object | applet | img | map |
iframe | tt | i | b | big | small | u | s | strike | font | basefont | em |
strong | dfn | code | q | sub | sup | samp | kbd | var | cite | abbr | acronym |
input | select | textarea | label | button | ins | del | script | noscript )*>
<!ATTLIST h1
  id           ID                   #IMPLIED
  class        CDATA                #IMPLIED
  style        CDATA                #IMPLIED
  title        CDATA                #IMPLIED
  lang         NMTOKEN              #IMPLIED
  xml:lang     NMTOKEN              #IMPLIED
  dir          (ltr|rtl)            #IMPLIED
  onclick      CDATA                #IMPLIED
  ondblclick   CDATA                #IMPLIED
  onmousedown  CDATA                #IMPLIED
  onmouseup    CDATA                #IMPLIED
  onmouseover  CDATA                #IMPLIED
  onmousemove  CDATA                #IMPLIED
  onmouseout   CDATA                #IMPLIED
  onkeypress   CDATA                #IMPLIED
  onkeydown    CDATA                #IMPLIED
  onkeyup      CDATA                #IMPLIED
  align        (left|center|right)  #IMPLIED
  >
The same process that we've stepped through here with the h1 element can
be undertaken for any element or attribute list found in the XHTML DTDs,
or any DTD for that matter.
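With the declarations fully resolved, it becomes straightforward to check a piece of markup against them by eye. The fragment below is a sketch meant for the body of an XHTML 1.0 Transitional document; every attribute it uses appears in the resolved ATTLIST, and the values are invented.

```xml
<h1 id="chapter-13" class="chapter" title="Chapter title"
    lang="en" xml:lang="en" dir="ltr" align="center"
    onclick="return true;">
  <!-- Content is #PCDATA mixed with em, one of the listed inline elements. -->
  Document Type Definitions: <em>The Syntax Rulebook</em>
</h1>
```

An attribute such as align="justify" would fail validation, because the enumerated list for align on h1 allows only left, center, and right.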
What's Next
In this chapter you have learned how to read DTDs that use the Extended
Backus-Naur Form notation. You know how to write declarations for both
elements and attribute lists. Entities are used as a powerful means to organize and shorten element and attribute declarations.
Next, in Chapter 14, "XHTML Modularization," we'll take a look at another
means of expressing a language's rules: XML schemas.
Part III
Modularization
14 XHTML Modularization
15 Creating a Custom XHTML Module
16 Combining Custom Modules with XHTML
14
XHTML Modularization
There's no doubt that HTML has been embraced as a powerful means of
producing documents for the World Wide Web. Its success was fed by the
myriad of new features introduced over each successive version. Even so, it
was still very much limited by the elements and attributes defined in the
HTML Recommendations. Web authors clearly saw the need for more freedom in defining structures and behaviors for those structures beyond the
bounds of HTML.
This chapter teaches you:
How modularization works
How to group elements into abstract modules
How to understand the DTD implementation of an abstract module
How to combine predefined modules into a new DTD
TIP
A module is not required to have an abstract definition to conform to the XHTML
Modularization Recommendation. However, module authors are strongly encouraged to
provide one to lower the learning curve required to adopt the module and to provide a
quick reference for document authors who don't need to delve into the DTD or schema
definition of the module.
Element    Attributes    Minimal Content Model
dl         Common        (dt|dd)+
dt         Common        (PCDATA|Inline)*
dd         Common        (PCDATA|Inline)*
ol         Common        li+
ul         Common        li+
li         Common        (PCDATA|Inline)*
This module also defines the content set list with the minimal content
model (dl|ol|ul)+ and adds this set to the Flow content set of the
Text Module.
The prose at the beginning of this abstract is very clear: The module contains the elements in XHTML used to create lists, including definition,
ordered, and unordered lists. The table contains each of the element members of the module, the attributes allowed on each, and the minimal content
model for the element.
READING
THE
Datatype        Description
Character
Charset
Charsets
Color
ContentType
ContentTypes
Datetime
FrameTarget     A name of an XHTML Frame, used as the destination for loading
                a document or other actions to be performed within that frame.
LanguageCode    A code representing a language, as defined in RFC1766 (for
                example, en for English, fr for French).
Length          A measure, either in pixels or a percentage of available space.
LinkTypes       One of, or a list of, names from a defined set of link types.
                See Appendix A for the complete list and their conventional
                meanings.
MediaDesc       A comma-delimited list of media descriptors (see Appendix A
                for the list of recognized descriptors).
MultiLength     A length as previously defined, or a relative length. Relative
                lengths are expressed as i*, where i is an integer and the
                * character allocates an amount of space proportional to i
                when evaluated against the other integers present. For
                example, 1*, 3* sets up a 1 to 3 ratio of space.
Number          One or more digits.
Pixels          A numeric value representing a number of pixels.
Script          Data that must be passed on to a script engine and not
                parsed as XHTML.
Text            Any text-based data.
URI             A Uniform Resource Identifier as defined in RFC1738.
URIs            A space-delimited list of Uniform Resource Identifiers as
                defined previously.
Looking back at the List Module definition, you'll notice that each of the
allowable attribute entries references an attribute collection: Common.
Common is actually a collection of collections, comprised of the Core,
Events, I18N, and Style collections. These four collections are defined,
along with their data types, in Table 14.2.
Table 14.2: The Attribute Collections That Make Up the Common Collection
Collection    Attributes
Core
I18N          xml:lang (NMTOKEN)
Events        onclick (Script), ondblclick (Script), onmousedown (Script),
              onmouseup (Script), onmouseover (Script), onmousemove (Script),
              onmouseout (Script), onkeypress (Script), onkeydown (Script),
              onkeyup (Script)
Style         style (CDATA)
For a complete review of DTD syntax, see "EBNF: The Syntax of DTDs," p. 226.
Taking the first element of the List Module, dl, the minimal content model
is defined as (dt|dd)+. This says that the dl element is required to contain
at least a dt element and/or a dd element.
The + occurrence indicator shows that one or more sets of dt and dd elements might be contained within dl.
Minimal content models also might reference collections of elements known
as content sets, just as the set of allowable attributes is often defined by a
collection. The minimal content model for the dt element in the List Module uses just such a collection when it is defined as (PCDATA|Inline)*. This model says that the dt
element can contain parsed character data and any elements contained
within the Inline content set, either of which might be found zero or more
times within the dt element.
The Inline content set is defined in the prose of the Text Module as the following elements: abbr, acronym, br, cite, code, dfn, em, kbd, q, samp, span,
strong, and var (see Appendix A or
http://www.w3.org/TR/xhtml1/xhtml-modularization-20000705.html#s_textmodule).
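To tie the two minimal content models together, here is a sketch of a definition list whose dt and dd elements hold text plus members of the Inline content set (abbr, em, and code). The wording of the entries is invented.

```xml
<!-- dl satisfies (dt|dd)+ ; each dt and dd satisfies (PCDATA|Inline)* -->
<dl>
  <dt><abbr title="Document Type Definition">DTD</abbr></dt>
  <dd>The rulebook that declares a language's elements and attributes.</dd>
  <dt><em>Content model</em></dt>
  <dd>The part of a declaration that lists what an element may contain,
      such as <code>(dt|dd)+</code>.</dd>
</dl>
```

Keep in mind that a minimal content model is a floor rather than a ceiling: other modules can add to what these elements are allowed to contain.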
A link to the DTD implementation of each abstract module is found at the
end of the prose description. We'll continue to explore the List Module by
reviewing its implementation DTD.
NOTE
It is the intention of the W3C to explore defining modules in XML Schema, but until that
work has reached Recommendation status, the XHTML Modularization documents are
written using DTDs.
THE
LIST MODULE
Next are the PUBLIC and SYSTEM identifiers used for the module, along with
any revisions to those identifiers:
This DTD module is identified by the PUBLIC and SYSTEM identifiers:
PUBLIC "-//W3C//ELEMENTS XHTML Lists 1.0//EN"
SYSTEM "xhtml-list-1.mod"
Revisions:
(none)
....................................................................... -->
Next, the qualified names for each element presented in the module are
defined in a parameter entity named after the element name suffixed by
.qname, for example, dl.qname for the parameter entity representing the element name dl:
<!ENTITY % dl.qname "dl" >
<!ENTITY % dt.qname "dt" >
<!ENTITY % dd.qname "dd" >
<!ENTITY % ol.qname "ol" >
<!ENTITY % ul.qname "ul" >
<!ENTITY % li.qname "li" >
The next section reduces the declaration for the dl element to a single parameter entity:
<!-- dl: Definition List ............................... -->
<!ENTITY % dl.element "INCLUDE" >
<![%dl.element;[
<!ENTITY % dl.content "( %dt.qname; | %dd.qname; )+" >
<!ELEMENT %dl.qname; %dl.content; >
<!-- end of dl.element -->]]>
These entities can be read fairly easily if you approach them much like an
algebra problem and work your way out from the inside. First, consider the
dl.content parameter entity. Its definition holds two .qname parameter
entities, for the dt and dd elements. The syntax surrounding those entities
should be familiar to you. If expanded, it reads:
<!ENTITY % dl.content "(dt|dd)+" >
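The INCLUDE wrapper is what makes each element switchable. Because the first declaration of an entity is the one that binds, a driver file can declare dl.element as IGNORE before the module is read, and the module's own declaration then has no effect. The following is a sketch of that idea in a hypothetical external DTD; the placement of the first line in a driver file is an assumption.

```xml
<!-- Hypothetical driver line, read before the module's declarations. -->
<!ENTITY % dl.element "IGNORE" >

<!-- Qualified names, as the module declares them. -->
<!ENTITY % dl.qname "dl" >
<!ENTITY % dt.qname "dt" >
<!ENTITY % dd.qname "dd" >

<!-- The module's own declaration of dl.element comes second, so it is
     ignored and the marked section below is skipped entirely. -->
<!ENTITY % dl.element "INCLUDE" >
<![%dl.element;[
<!ENTITY % dl.content "( %dt.qname; | %dd.qname; )+" >
<!ELEMENT %dl.qname; %dl.content; >
]]>
```

Removing the first line restores the default behavior, and the dl element is declared as usual.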
EXAMPLE
<!-- ....................................................................... -->
<!-- SKELETAL DTD ......................................................... -->
<!-- file: TEMPLATE.dtd
-->
<!-- SKELETAL DTD
     This is a skeletal driver file. Modify it however you want, paying
     careful attention to the embedded comments about order.
     Please use this formal public identifier to identify it:
         "-//W3C//DTD XHTML-MYDTD//EN"
-->
<!ENTITY % XHTML.version "-//W3C//DTD XHTML-MYDTD//EN" >
<!-- Reserved for use with the XLink namespace:
-->
<!ENTITY % XLINK.ns "" >
<!ENTITY % XLinkns.attrib "" >
By taking this skeletal DTD file apart, we can explore how this driver file
glues together the various module implementations to create the new
XHTML Family Markup Language. We'll take the core modules required of
all XHTML Host Languages, and add the W3C-defined Basic Tables
Module to come up with the Tables Markup Language.
The first edits to be made include the name of the file, the formal public
identifier used to identify your new language, and the namespace that corresponds to it:
<!-- ....................................................................... -->
<!-- SKELETAL DTD ......................................................... -->
<!-- file: TableML.dtd
-->
<!-- SKELETAL DTD
-->
<!-- This is a skeletal driver file. Modify it however you want, paying
careful attention to the embedded comments about order.
Please use this formal public identifier to identify it:
"-//WEBGEEK//DTD XHTML-TABLEML//EN"
-->
<!ENTITY % XHTML.version "-//WEBGEEK//DTD XHTML-TABLEML//EN" >
We're not going to be defining any new modules in this markup language,
so the section on modifying the document model can just be left alone.
Finally, it's time to add the Basic Tables module into the DTD. We do so
just after the comment that says "Your modules can be included here":
<!-- Your modules can be included here. Use the basic form defined above, and
be sure to include the public FPI definition in your catalog file for
each module that you define. You may also include W3C-defined modules at
this point.
-->
The entry for the Basic Tables module follows the same form as the
others. The pertinent details can be found in the implementation of
that module, which is found in the Modularization document at
http://www.w3.org/TR/2000/PR-xhtml-modularization-20000705/dtd_module_defs.html#sec_F.3.5:
<!-- Basic Tables Module .......................................... -->
<!ENTITY % xhtml-basic-table-1.module "INCLUDE" >
<![ %xhtml-basic-table-1.module;[
<!ENTITY % xhtml-basic-table-1.mod
    PUBLIC "-//W3C//ELEMENTS XHTML Basic Tables 1.0//EN"
           "xhtml-basic-table-1.mod" >
%xhtml-basic-table-1.mod;]]>
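Once the driver is complete, a document written against it might look something like the sketch below. The relative system identifier (TableML.dtd beside the document), the use of the XHTML namespace, and the table contents are all assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumes TableML.dtd and the module files it references are reachable
     from this document's location. -->
<!DOCTYPE html PUBLIC "-//WEBGEEK//DTD XHTML-TABLEML//EN" "TableML.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Pantry inventory</title>
  </head>
  <body>
    <!-- Basic Tables: an optional caption followed by one or more rows. -->
    <table>
      <caption>Pantry inventory</caption>
      <tr><th>Item</th><th>Quantity</th></tr>
      <tr><td>Flour</td><td>2 cups</td></tr>
    </table>
  </body>
</html>
```

The Basic Tables Module leaves out row groups such as thead and tbody, so the rows sit directly inside table.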
EXAMPLE
What's Next
In this chapter you have learned how the features of XHTML were divided
into logical groups of elements that shared the same semantics. Each
atomic group forms an abstract module, defined by a brief prose section
outlining its use, and a table of elements, their allowable attributes, and
minimal content models. Each abstract module has a corresponding DTD
implementation, which is combined with other module DTDs to create the
final DTD used to represent the new language.
Next, in Chapter 15, "Creating a Custom XHTML Module," you'll dive deeper
into the world of Modularization and create your own abstract module and
its corresponding DTD implementation. You will plan the content model and
define qnames, parameter entities, and the element and attribute-list declarations.
15
Creating a Custom XHTML Module
The power of the Extensible Hypertext Markup Language certainly lies in
its extensibility. XML offers document authors the ultimate extensibility in
allowing them to create entirely new markup languages using whichever
design principles they find the most compelling. For those of us who just
need to get some work done, however, re-inventing the wheel (even if it
meant doing so in a way that we liked better than how the originators did
it) uses an awful lot of time and energy.
So, we'll use the power of extensibility and the comfort of HTML to create
new document types for our own purposes. An online recipe archive presents just this sort of challenge. HTML provides the basic text features
we'll want to use, yet we need the additional structure of XML for segments
of the recipes themselves to allow for easy searching, storage, and other
data management activities that might be performed on a large archive. To
handle this, we'll spend this chapter creating a customized recipe module
that will be combined with a basic set of XHTML features to create a new
DTD.
This chapter teaches you:
How to organize your data storage needs
How to write an abstract module definition
How to use parameter entities to manage namespaces
How to define elements and attributes in a Declaration sub-module
[Figure: structure of a recipe, showing ingredients (ingredient 1, I-2, I-3) and preparation (equipment, temperature, directions)]
store it. Early assumptions about how you'll handle data can result in needing to rework your content model and/or DTD much later in the game when
it's more difficult to do so.
2 cups all-purpose flour
The ingredient can be divided into three parts: quantity (2), a unit of measure (cups), and the item itself. All three of these pieces will need to be
recorded in a potential ingredient element.
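Recorded as attributes on an empty element, which is one reasonable design among several, that ingredient might be marked up as in this sketch. The attribute names quantity, unit, and item are the ones used in the abstract module table later in this chapter.

```xml
<!-- One ingredient: the three parts become three attributes. -->
<ingredient quantity="2" unit="cups" item="all-purpose flour"/>
```

Keeping the three parts separate is what later makes it practical to search an archive by item or to convert units.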
The next major segment of a recipe is the preparation instructions. What
kind of information might appear there? Certainly you'll have text, so we'll
need to allow for paragraphs, perhaps lists, and other text structures. This
suggests we'd want to allow Block XHTML elements to appear within a
prep element.
Next comes the table of elements, attributes, and minimal content models:
Element        Attributes                                 Minimal Content Model
recipe         Common, title (CDATA), category (CDATA)    ingredients, prep
ingredients    Common                                     ingredient+
ingredient     Common, quantity, unit, item               EMPTY
prep           Common                                     (PCDATA|Flow)+
When this module is selected, it adds the recipe element to the Block content set, and it defines the content set Recipe with a minimal content model
of ingredients, prep, and adds it to the Inline content set as these are
defined in the Text Module.
The QName sub-module begins with a commented section that details the
module name, its filename, the public and system identifiers used to reference it, and the namespace declaration. Putting each of these in place, our
QName sub-module begins like this:
<!-- ...................................................................... -->
<!-- Recipe Qname Module ............................................... -->
<!-- file: recipe-qname-1.mod
PUBLIC "-//WebGeek//ELEMENTS XHTML Recipe Qnames 1.0//"
SYSTEM "http://www.webgeek.com/DTDs/recipe-qname-1.mod"
xmlns:recipe="http://www.webgeek.com/xmlns/recipe"
....................................................................... -->
Several conventions have been used in this section. First, the sub-module
file is named with the suffix .mod, which will correlate to the parameter
entity suffix used to identify it later when pulling the module components
together. Next, the Formal Public Identifier (FPI) must take a specific form.
The string will always begin with the -// characters. The next portion identifies the company or individual that owns the module; in this case, my
company, WebGeek, Inc. Another set of // characters follows, then the word
ELEMENTS in capital letters, followed by the title and version of the module.
A final set of // characters closes out the string, for a complete FPI of
-//WebGeek//ELEMENTS XHTML Recipe Qnames 1.0//
The system identifier is a URI where the file entity for the module might be
found. It might be expressed in fully qualified form, or as a relative URI
when working locally. In this case, we've chosen the former:
SYSTEM "http://www.webgeek.com/DTDs/recipe-qname-1.mod"
Finally, the XML Namespace is always declared in two parts: the xmlns:
prefix followed by a short string that will be used throughout as a token
representing the full URI given in the second part of the declaration, for
example:
xmlns:recipe="http://www.webgeek.com/xmlns/recipe"
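In a document, that declaration appears as an attribute, and the short token then prefixes every element from the module. The sketch below uses the element and attribute names from the abstract module table; the recipe content itself is invented.

```xml
<!-- The recipe: prefix stands in for the full namespace URI. -->
<recipe:recipe xmlns:recipe="http://www.webgeek.com/xmlns/recipe"
               title="Pancakes">
  <recipe:ingredients>
    <recipe:ingredient quantity="2" unit="cups" item="all-purpose flour"/>
  </recipe:ingredients>
  <recipe:prep>Whisk the batter and cook on a hot griddle.</recipe:prep>
</recipe:recipe>
```

The prefix itself carries no meaning; only the URI it is bound to identifies the namespace.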
Namespaces and still have confidence that his document is valid. The way
the namespace hack accomplishes these two tasks is complicated. A simple
explanation of it is
Each element in a module has a name that is defined symbolically.
If the document type author or the document author chooses, the element names can be prefixed with their namespace identifier.
This namespace identifier can be set to any value. It is mapped to the
namespace's URI, a world-unique identifier.
Whether or not a document author chooses to use namespaces, if the
document has the correct structure, it is a valid XHTML family
document.
XML Namespaces are a somewhat futuristic application within XML; they
are not in wide use in the Web document authoring community. However,
as more and more collections of elements are developed into XHTML
Modules, these modules will need to rely upon XML Namespaces to a) help
imply the semantics associated with an element or attribute and b) help
ensure that there are no name collisions.
So to get back to our module, you'll remember that this information noted
at the beginning of the sub-module file was in comments, meant for human
consumption, not machine. This entity takes the following form:
<!ENTITY % Recipe.xmlns "http://www.webgeek.com/xmlns/recipe" >
The prefix to be used when the namespace-qualified names are used must
be declared in its own PE, identified by the .prefix suffix in the PE name:
<!ENTITY % Recipe.prefix "recipe" >
<![ %Recipe.prefixed;[
<!ENTITY % Recipe.pfx "%Recipe.prefix;:" >
<!ENTITY % Recipe.xmlns.extra.attrib
    "xmlns:%Recipe.prefix;  %URI.datatype;  #FIXED  '%Recipe.xmlns;'" >
]]>
<!ENTITY % Recipe.pfx "" >
<!ENTITY % Recipe.xmlns.extra.attrib "" >
Now, each of the elements that will be defined in the module needs to have a
parameter entity that represents its qualified name. These PEs are
always named with the .qname suffix. The value of these entities is always
%Module.pfx;name, where the identifying string for the module is substituted for Module, and name holds the place of the element name (for
example, %Recipe.pfx;ingredient):
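<!ENTITY % Recipe.recipe.qname      "%Recipe.pfx;recipe" >
<!ENTITY % Recipe.ingredients.qname "%Recipe.pfx;ingredients" >
<!ENTITY % Recipe.ingredient.qname  "%Recipe.pfx;ingredient" >
<!ENTITY % Recipe.prep.qname        "%Recipe.pfx;prep" >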
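It can help to trace a single qualified name through both settings of the prefixing switch. The sketch below is a hypothetical external DTD fragment; declaring Recipe.prefixed directly as INCLUDE is an assumption about how a driver might switch prefixing on.

```xml
<!-- Hypothetical switch: "INCLUDE" turns prefixing on, "IGNORE" turns it off. -->
<!ENTITY % Recipe.prefixed "INCLUDE" >
<!ENTITY % Recipe.prefix "recipe" >

<!-- When the section is included, Recipe.pfx is bound to "recipe:" here... -->
<![ %Recipe.prefixed;[
<!ENTITY % Recipe.pfx "%Recipe.prefix;:" >
]]>
<!-- ...so this later declaration is ignored. When the section is skipped,
     this is the first declaration and Recipe.pfx is empty. -->
<!ENTITY % Recipe.pfx "" >

<!ENTITY % Recipe.ingredient.qname "%Recipe.pfx;ingredient" >
<!-- INCLUDE: Recipe.ingredient.qname resolves to recipe:ingredient
     IGNORE:  Recipe.ingredient.qname resolves to ingredient -->
```

The trick relies on the XML rule that the first declaration of an entity is the binding one.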
EXAMPLE
Now it's time to actually declare the elements and attributes that make up
this module using the Declaration sub-module.
The sub-module begins with the same type of commented section as the
QName sub-module. Appropriate public and system identifiers are
declared, and the same namespace URI is given:
<!-- ...................................................................... -->
<!-- WebGeek Recipe Module
............................................. -->
<!-- file: recipe-elements-1.mod
PUBLIC "-//WEBGEEK//ELEMENTS XHTML-WebGeek Recipe 1.0//EN"
SYSTEM "http://www.webgeek.com/DTDs/recipe-1.mod"
xmlns:recipe="http://www.webgeek.com/xmlns/recipe"
...................................................................... -->
Next is another commented section that names the module, lists the elements to be declared, and provides a basic description of the module's
purpose:
<!-- WebGeek Recipe Module
recipe:recipe
recipe:ingredients
recipe:ingredient
recipe:prep
This module defines structural components of a cooking recipe.
-->
The first declarative section invokes and defines a parameter entity that is
used within the ATTLIST of each element being declared. Its purpose is to
manage the prefixing of the attributes when the module is used to create a
standalone DTD. When combined with the XHTML Framework Module, the
value of these PEs is overridden by the global NS.attrib:
<![ %Recipe.prefixed;[
<!ENTITY % Recipe.xmlns.attrib
    "%NS.prefixed.attrib;"
>
]]>
<!ENTITY % Recipe.xmlns.attrib
    "%NS.prefixed.attrib;
     xmlns  %URI.datatype;  #FIXED  '%Recipe.xmlns;'"
>
Now, each element is defined using the parameter entities created in the
QNames sub-module, along with any ATTLIST definitions required:
<!ELEMENT %Recipe.recipe.qname;
    ( %Recipe.ingredients.qname;, %Recipe.prep.qname; ) >
<!ATTLIST %Recipe.recipe.qname;
    %Common;
    title        CDATA        #REQUIRED
    category     CDATA        #IMPLIED
    %Recipe.ns.attrib;
>
<!ELEMENT %Recipe.ingredients.qname;
    ( %Recipe.ingredient.qname; )+ >
<!ATTLIST %Recipe.ingredients.qname;
    %Common;
    %Recipe.ns.attrib;
>
<!ELEMENT %Recipe.ingredient.qname; EMPTY >
<!ATTLIST %Recipe.ingredient.qname;
    %Common;
    quantity     CDATA        #REQUIRED
    unit         CDATA        #REQUIRED
    item         CDATA        #REQUIRED
    %Recipe.ns.attrib;
>
<!ELEMENT %Recipe.prep.qname;
    ( #PCDATA | %Flow; )* >
<!ATTLIST %Recipe.prep.qname;
    %Common;
    %Recipe.ns.attrib;
>
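Before wiring the module into a full DTD, the structure can be checked on its own. The sketch below expands the declarations by hand, with prefixing turned off, the Common attributes omitted, and prep simplified to text only; all three simplifications are assumptions made so the example stands alone.

```xml
<?xml version="1.0"?>
<!-- Hand-expanded, simplified version of the recipe declarations. -->
<!DOCTYPE recipe [
  <!ELEMENT recipe (ingredients, prep)>
  <!ATTLIST recipe
    title     CDATA  #REQUIRED
    category  CDATA  #IMPLIED
  >
  <!ELEMENT ingredients (ingredient)+>
  <!ELEMENT ingredient EMPTY>
  <!ATTLIST ingredient
    quantity  CDATA  #REQUIRED
    unit      CDATA  #REQUIRED
    item      CDATA  #REQUIRED
  >
  <!-- Assumption: text only, in place of the Flow content set. -->
  <!ELEMENT prep (#PCDATA)>
]>
<recipe title="Pancakes" category="breakfast">
  <ingredients>
    <ingredient quantity="2" unit="cups" item="all-purpose flour"/>
    <ingredient quantity="1" unit="tablespoon" item="sugar"/>
  </ingredients>
  <prep>Whisk the dry ingredients together, then stir in the milk.</prep>
</recipe>
```

Leaving out the title attribute, or placing prep before ingredients, should produce a validation error.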
Whats Next
In this chapter we've completed the process of gathering the type of information we need to store in new elements, determining how that information will be used within the document, and planning the content model of
elements that should be created. We then took that abstract definition and
constructed both a QNames and a Declaration sub-module that parameterize and define elements and attributes to be used to hold this data.
Next, in Chapter 16, "Combining Custom Modules with XHTML," we'll take
the QNames and Declaration sub-modules we created here and incorporate
them into a larger DTD that governs a new XHTML Family Language.
Using that DTD, we'll create new Web pages that are fully interoperable on
today's Web.
16
Combining Custom
Modules with XHTML
So far in your studies of XHTML Modularization, you've seen how the
Modularization process is designed, how abstract modules are written, and
how content models are defined. You've also stepped through the process of
creating a customized module to be combined with existing W3C-provided
modules to create a new XHTML-based markup language.
The final step in the process of creating such a language is the creation of
the final DTD. This is where each of the modules defined is brought
together in a single document, the language's DTD against which all documents will be authored.
This chapter teaches you:
How the Modular Framework Module incorporates necessary basic
structures
How to plug your custom module into the Modularized DTD template
How to edit the template as necessary based on your design decisions
How to create a new document based on your completed DTD
EXAMPLE
notations
datatypes
namespace-qualified names
common attributes
document model
character entities
Next is the description of the module's purpose, namely to provide the DTD
components that will be required in all XHTML-compliant languages:
Notations: Conventions used in other languages and some defined in
XHTML, including CDATA, FPI, and others.
Data types: Definitions of terms such as length, number, pixels, and
so on.
Namespace-qualified names: Add the ability to use namespaces to
qualify names for differentiation between XHTML names and names
from other markup languages.
Common attributes: Define the attribute sets referenced by existing
and extended XHTML modules.
The document model: Instantiated by the Document Model module
declared by the DTD driver.
Character entities: Provide the ability to include the Latin 1, Symbol,
and Special Character collections in your documents.
Support for intrinsic events: Turned off by default.
Its not necessary to edit any of these components, and its not even really
necessary to understand how they work. Just know that this Modular
Framework Module provides the basic components required for a complete
markup language in the XHTML Family.
Now that you know where each piece of the DTD puzzle is coming from, its
time to create a DTD for our new language.
and is reproduced here in Listing 16.2. We'll edit each segment along the
way to create the DTD for our new language.
Listing 16.2: The New DTD Template
This entity references a content model definition module, which now needs
to be prepared. The content model defined here for our recipe module is
fairly generic. It is the DTD representation of the minimal content models
described in the abstract module definitions, plus our extensions needed to
incorporate the recipe root element as its appropriate content type.
As the recipe element is intended to act as an addition to the block element
set, its QName needs to be added to the %Block.extra parameter entity definition:
<!ENTITY % Block.extra
     "| %Recipe.recipe.qname;" >
Other than this, the only changes that need to be made in this generic content model module are the public and system identifiers, the filename, and
the namespace, as is always done in the first section of the .mod file (see
Listing 16.3).
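To see why appending to %Block.extra is enough to admit the new element wherever block content is allowed, it can help to expand the parameter entities by hand. The following is only a rough Python sketch of that substitution, not how a real DTD processor is written. The names Block.extra and Recipe.recipe.qname come from this chapter; the QName value recipe:recipe and the tiny Block.mix content model are assumptions made up for the illustration.

```python
import re

# Hypothetical, simplified entity table. Only Block.extra and
# Recipe.recipe.qname come from the chapter; the value given for the
# QName and the Block.mix content model are assumptions for illustration.
ENTITIES = {
    "Recipe.recipe.qname": "recipe:recipe",
    "Block.extra": "| %Recipe.recipe.qname;",
    "Block.mix": "p | div %Block.extra;",
}

# A parameter entity reference looks like %Name;
REFERENCE = re.compile(r"%([A-Za-z_][A-Za-z0-9_.-]*);")


def expand(text, entities, depth=0):
    """Replace each %name; reference with its replacement text, recursively."""
    if depth > 20:
        raise ValueError("entity references nested too deeply (possible loop)")
    return REFERENCE.sub(
        lambda match: expand(entities[match.group(1)], entities, depth + 1),
        text,
    )


print(expand("%Block.mix;", ENTITIES))  # prints: p | div | recipe:recipe
```

With %Block.extra left empty, the same call yields only the original alternatives, which is why the framework modules can reference the entity whether or not you extend it.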
Turning our attention back to the Recipe DTD itself, the next item to be
addressed is document profiles. We aren't working with document profiles
at this stage, so the entity for that remains empty:
<!-- reserved for future use with document profiles -->
<!ENTITY % XHTML.profile "" >
Now the Framework and required modules are defined and brought into
place. No post-framework redeclarations were needed, so that segment has
been left out here. That is the only edit you need to make in this section:
<!-- Modular Framework Module ................................... -->
<!ENTITY % xhtml-framework.module "INCLUDE" >
<![%xhtml-framework.module;[
<!ENTITY % xhtml-framework.mod
     PUBLIC "-//W3C//ENTITIES XHTML 1.1 Modular Framework 1.0//EN"
            "xhtml11-framework-1.mod" >
%xhtml-framework.mod;]]>
<!-- Text Module (required) ............................... -->
<!ENTITY % xhtml-text.module "INCLUDE" >
<![%xhtml-text.module;[
<!ENTITY % xhtml-text.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Text 1.0//EN"
            "xhtml11-text-1.mod" >
%xhtml-text.mod;]]>
<!-- Hypertext Module (required) ................................. -->
<!ENTITY % xhtml-hypertext.module "INCLUDE" >
<![%xhtml-hypertext.module;[
<!ENTITY % xhtml-hypertext.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Hypertext 1.0//EN"
            "xhtml11-hypertext-1.mod" >
%xhtml-hypertext.mod;]]>
<!-- Lists Module (required) .................................... -->
<!ENTITY % xhtml-list.module "INCLUDE" >
<![%xhtml-list.module;[
<!ENTITY % xhtml-list.mod
     PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Lists 1.0//EN"
            "xhtml11-list-1.mod" >
%xhtml-list.mod;]]>
Figure 16.1 shows this file when saved as recipe.xml and rendered by the
Internet Explorer XML processor.
techniques you learned in Chapter 12, "XSL: Style the XML Way," to see
how you might style these new elements.
What's Next
In this chapter you've taken the custom XHTML module defined in Chapter
15 and incorporated it into a new markup language with its own DTD.
We've also reviewed a document written using this markup language.
Next, in Chapter 17, "Subsetting XHTML: XHTML Basic," we'll take a look
at additional applications of XHTML Modularization, specifically the idea
of an XHTML subset for smaller devices in XHTML Basic.
Part IV
The Future of XHTML
17 Subsetting XHTML: XHTML Basic
18 XHTML Document Profiling
19 Next Steps for XHTML
17
Subsetting XHTML: XHTML Basic
When people think about the flexibility that an extensible language like
XHTML offers, it's easy to forget that you don't necessarily have to add to
the language, that you can instead subtract from it, or make it more
restrictive.
XHTML Basic is the first XHTML Family Member vocabulary to be published by the W3C that takes this approach. Its intent is to act as a basic
platform for further development in vocabularies used on small footprint
devices.
This chapter teaches you:
How small footprint devices are being used to access the Internet
What kinds of devices are gaining in popularity
How to convert an existing HTML document to XHTML Basic
How XHTML Basic compares to other XHTML document types
Miniature Computers
The form factor of many laptop and portable devices is certainly getting
smaller by the year. However, the systems worthy of mention in this section
aren't just smaller devices, but those that are both small and run on operating systems optimized to perform under the tight constraints present in
these devices. 3Com and now Palm Computing offer the PalmOS found in
the wildly popular Palm Pilot line of PDAs, and Microsoft has joined the
fray with the Windows CE operating system. Both palm-sized devices operate primarily with a stylus, and miniature keyboard-based systems known
as hand-helds can take advantage of WinCE.
These machines typically have just 2MB to 16MB of memory in which to
operate. That's not RAM, as in our desktop machines, but the entire storage
system!
Understandably, the software used on these machines must have a significantly smaller footprint than that used on traditional computers. Because
of the limited space for software to operate, the features available within
programs can be limited. For instance, a Web browser might not have a
Java Virtual Machine installed, or it might not have the capability to
process scripts or style sheets. It's also unlikely that these browsers will
have plug-ins available to support advanced Web features such as video,
audio, or animations such as Shockwave and SMIL.
Nontraditional Appliances
A nontraditional appliance is one where, at least until very recently, you
wouldn't normally think about a computer being present, let alone one with
access to the Internet!
Wireless Access
Wireless access to the Internet is one of the fastest growing user segments.
Cellular telephones are used by as much as 60% to 70% of the population in
Sweden and Finland, followed quite closely by the Japanese.
Elements
Structure Module
Text Module
Hypertext Module
List Module
Let's take a look at an existing Web site and see what would need to be
done if it were to be updated for XHTML Basic. Orion's Domain is the personal Web site of a good friend of mine, and he has graciously allowed me to
use his pages here in this book (see Figure 17.1). Note that his home page
is currently a valid HTML 4.0 Transitional document. The page source is
found in Listing 17.1.
Figure 17.1: Orion's Domain: a valid HTML 4.0 Transitional Web site that
we'll transform to XHTML Basic.
Listing 17.1: orion.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Orion's Domain</TITLE>
<META NAME="description"
CONTENT="Orion's pages. Warhammer. Apollo Smile. Angst.
Street Fighter II.
These are just a few of my favorite things.">
<META NAME="keywords"
CONTENT="orion, marshall, jansen, angst, poetry, anime, space,
madness, deep, seas, dino, mush, mud, evil!, xibo, apollo, smile,
warhammer, 40k, fantasy, tyranid, chaos, gw, eldar, miniatures,
painting, conversion, crimson, fists, dark, eldar, elves, hello,
kitty, slaanesh, tzeentch, khorne, nurgle">
<LINK REL="STYLESHEET" TYPE="text/css" HREF="styles/main.css">
</HEAD>
<BODY>
<TABLE WIDTH="90%">
<TR>
<TD><H1 CLASS="head">Orion's Domain</H1></TD>
<TD ALIGN="RIGHT"><IMG SRC="pics/orion.jpg"
WIDTH="152"
Tidy converts all the elements and attributes to lowercase for me, and adds
the required slashes to empty elements (see Figure 17.2).
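The kind of cleanup Tidy performs here can be approximated in a few lines. The sketch below is not Tidy and makes no claim to match its behavior in detail; it is a minimal Python illustration, built on the standard html.parser module, of the three mechanical changes involved: lowercase element and attribute names, quoted attribute values, and a closing slash on empty elements. The class name XhtmlNormalizer and the function normalize are invented for this example, and attribute values such as RIGHT are left in their original case.

```python
from html import escape
from html.parser import HTMLParser

# Elements HTML 4 defines as empty; in XHTML each needs a trailing slash.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                  "input", "link", "meta", "param"}


class XhtmlNormalizer(HTMLParser):
    """Re-serialize HTML with lowercase names, quoted values, closed empties."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _attrs(self, attrs):
        out = []
        for name, value in attrs:
            # A minimized attribute (NOSHADE) has no value; XHTML wants name="name".
            value = name if value is None else value
            out.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased tag and attribute names.
        closer = " /" if tag in EMPTY_ELEMENTS else ""
        self.parts.append(f"<{tag}{self._attrs(attrs)}{closer}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def normalize(html_text):
    parser = XhtmlNormalizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


print(normalize("<TD ALIGN=RIGHT><IMG SRC=pics/orion.jpg WIDTH=152></TD>"))
```

A real conversion still needs a validator afterward; this sketch does not repair nesting errors, drop unsupported elements, or touch the DOCTYPE.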
to
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/2000/WD-xhtml-basic-20000210/xhtml-basic10.dtd">
The first trouble spot we run into is the width, alignment, and cellpadding
attributes on each of the tables and table cells. As presentational attributes, they've been removed from the XHTML Basic DTD.
TIP
Remember, although these presentational attributes aren't supported in XHTML Basic,
that doesn't mean the effect cannot be rendered in the document. CSS provides for
table manipulation properties, allowing the same effect to be transferred from the
XHTML source document to the style sheet. With the class identifiers already in place
within the table, there's already a hook available.
Horizontal rules aren't supported, so the divider between the content of this
page and the address and other information below the rule will need to be
removed. Also, the JavaScript date insertion needs to be converted to plain
text as seen here:
<table width="100%">
<tr>
Now the document is in order, and you can see the markup itself in Listing
17.2 and a view from a traditional browser in Figure 17.3.
</tr>
</table>
</body>
</html>
What's Next
In this chapter, you have learned about the wide variety of computing
devices that now have access to the Internet. You've become familiar with
the unique requirements that many of these devices present, both in terms
of the software they can run and the type of documents they are able to
process.
You've gone through the steps of converting an existing HTML document
over to XHTML Basic and compared the results to both the original presentation and XHTML 1.0 Strict.
Next, in Chapter 18, "XHTML Document Profiling," we'll take a look at how
XHTML documents can be described and managed using document profiling techniques.
18
XHTML Document Profiling
Up until this point we've discussed the technologies behind creating
XHTML documents, as well as defining and validating their grammars. The
next step in managing a set of Web documents is to work with them as a
collection. To manage, sort, or otherwise document the properties of a collection, you must first know what each individual member of that set represents. To that end, methods have been developed to describe and categorize
documents as a larger part of information management systems.
XHTML has facilities for describing documents that are compatible with
these efforts, and the future holds promise for several more techniques that
will further enhance the abilities for XHTML.
This chapter teaches you:
What metadata is
How metadata is included in XHTML documents
How to build descriptive
How to use
meta
meta
elements
Meta Information
Before we can talk too much about what meta information provides for us,
it's essential that you fully understand what metadata, as it pertains
to Web documents, actually is. The simplest definition of metadata is
"data about data." Practically, that means descriptive information about a
document or collection of data such as a database or other concentrated
store of data. In XHTML, the meta element provides document authors with
an opportunity to include this descriptive information.
The interesting parts of the meta element are the attributes, because meta is
an empty element. The presence of the i18n parameter entity indicates that
meta can take any of the attributes represented by i18n (to review what's
provided in the internationalization attribute set, see Chapter 13,
"Document Type Definitions: The Syntax Rulebook").
The two most common attributes used are name and content. The name
attribute provides a label for the meta information being supplied in the
element. The content attribute value holds the actual metadata. For example, to identify the author of this book, if it were written as an XHTML file,
the meta element would appear as
<meta name="author" content="Ann Navarro" />
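Because each named meta element is just a label paired with a value, software can collect them into a simple lookup table. Here is a minimal Python sketch, using the standard xml.etree.ElementTree module, that does so for a small XHTML document containing the author example. The function name named_meta and the sample document are illustrative assumptions, not part of any XHTML tooling.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML elements live in.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def named_meta(xhtml_text):
    """Return {name: content} for every meta element that has a name attribute."""
    root = ET.fromstring(xhtml_text)
    result = {}
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        name = meta.get("name")
        if name is not None:
            result[name] = meta.get("content", "")
    return result


sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>XHTML by Example</title>
    <meta name="author" content="Ann Navarro" />
  </head>
  <body><p>Sample</p></body>
</html>"""

print(named_meta(sample))  # prints: {'author': 'Ann Navarro'}
```

This works only because the document is well-formed XML; the same function would reject an HTML 4 page with unquoted attributes or unclosed meta elements.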
The http-equiv attribute is used in place of the name attribute when the
information is intended for the benefit of HTTP servers gathering information to produce compatible HTTP response headers. We'll come back to the
http-equiv attribute later in this chapter.
In practice, the scheme attribute is rarely used. Instead, the name attribute
would take ISBN and the context would be understood by any human readers.
NOTE
The collection and processing of metadata is a science unto itself. The Dublin Core
Metadata Initiative is one project that's specific to the world of electronic resources.
Visit its Web site at http://purl.org/DC/.
The remainder of the meta elements commonly used are for the benefit of
search engines, browsers, or other automated systems, such as the last two
elements seen in this example. The http-equiv attribute with the value
Expires tells the browser when it should retrieve a fresh copy of the page
from the server, rather than relying on a copy stored in cache. The last meta
element seen is a content ratings system that can be used by proxies or
other content-filtering systems to disallow access to objectionable material.
The rating service used to develop this meta element was the Internet
Content Rating Association, which can be found online at
http://www.rsac.org/.
This element tells spidering agents, or robots, that they should not index a
document when this metadata appears with the noindex instruction.
Additionally, nofollow tells the spider not to look for or follow any links
found within the document.
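From the spider's side, honoring these instructions comes down to splitting the content value on commas and checking for the two directives. The following Python sketch shows one plausible way to do that; the function names are invented for the example, and real spiders may interpret the value differently or recognize additional directives.

```python
def robots_directives(content):
    """Split a robots meta content value into a set of lowercase directives."""
    return {token.strip().lower() for token in content.split(",") if token.strip()}


def may_index(content):
    """True unless the page asks not to be indexed."""
    return "noindex" not in robots_directives(content)


def may_follow(content):
    """True unless the page asks that its links not be followed."""
    return "nofollow" not in robots_directives(content)


value = "noindex, nofollow"
print(may_index(value), may_follow(value))  # prints: False False
```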
TIP
Not all search engine spiders recognize or obey the preferences stated in the robots
meta element. However, many more follow the Robots Exclusion Protocol that was developed in 1994 as a cooperative project between the programmers of robots and spiders
and site administrators. Information on using the Robots Exclusion Protocol and the
resulting robots.txt file can be found at
http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html.
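To get a feel for how a well-behaved robot reads robots.txt, you can experiment with the parser that ships in Python's standard library, urllib.robotparser. The sketch below feeds it a two-line rule set directly rather than fetching a file; the rules, the robot name ExampleBot, and the example.com addresses are all made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps every robot out of /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A cooperative spider asks before requesting each address.
print(parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html"))  # prints: False
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))          # prints: True
```

As the tip says, this is a cooperative scheme: nothing prevents a robot from skipping the check.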
Most search engines determine when they'll return a given Web page by
comparing the search term provided by the user with the information they have
stored about a given site or page. Frequently that store of data is a list of
keywords found in the document. Authors can provide keywords that they
want the search engines to match by using the keywords meta element.
Just how many keywords can be read, or whether you need to use every
form of a word pertaining to your site, has always been a hotly debated
topic. The reality is there will never be one set answer for all purposes.
Some search engines don't record your keywords and instead index the
entire page, whereas others will read the keywords but limit themselves to
the first few. Most sources suggest using no more than about 25 words, or
perhaps 250 characters of keyword data, and ordering them by relevance.
That is, the keywords that best describe your site should come first in the
list.
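If you generate your keywords meta element from a longer list, the rule of thumb above is easy to enforce in code. This Python sketch assumes the list is already ordered by relevance and simply stops adding words once either limit would be exceeded. The limits of 25 words and 250 characters are the suggestions quoted above, not a requirement of any search engine, and the function name build_keywords is invented here.

```python
def build_keywords(words, max_words=25, max_chars=250):
    """Join keywords, most relevant first, within the suggested limits."""
    kept = []
    for word in words:
        word = word.strip()
        if not word:
            continue
        candidate = ", ".join(kept + [word])
        # Stop as soon as one more keyword would break either limit.
        if len(kept) >= max_words or len(candidate) > max_chars:
            break
        kept.append(word)
    return ", ".join(kept)


content = build_keywords(["games", "free games", "prizes", "cash"])
print(f'<meta name="keywords" content="{content}" />')
```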
Let's take a look at the use of keywords on a popular Web site. The Lycos
Network has a site known as Gamesville, where visitors can play free
games for the opportunity to win small, and many not so small, cash prizes
(see Figure 18.1).
The description meta element (edited here to conform to XHTML 1.0 requirements by putting element and attribute names in lowercase and
adding the closing / that is now required of empty elements) is
short and to the point:
<meta name="description" content="Play free games, win cash prizes!" />
The
keywords meta
Considerable information on the behavior of search engine spiders and various features of search engines can be found at the Search Engine Watch
site, located at http://searchenginewatch.internet.com/webmasters/features.html
(this article is publicly available, though some articles on the site are only
available via paid subscription).
Some machine-usable meta elements are written not with the name
attribute but with the http-equiv attribute. To digress for
a moment: Web pages are requested and delivered over the
Internet using the Hypertext Transfer Protocol, or HTTP. The client that
makes the request and the server that responds share additional information with each other by way of HTTP headers, essentially extra bits of
data prepended to the exchange. The property name used as the http-equiv
attribute value in a meta element can be used to create a header in the
HTTP response between the client and server. Table 18.1 outlines a number
of popular http-equiv-based properties. Listing 18.1 puts them to use in the
head of an XHTML document.
Table 18.1: Sample http-equiv
Property
Content Values
Usage
expires
Date/time stamp
Content-type
Content-Script-Type
Content-Style-Type
Pragma
<head>
<title>Metadata Samples</title>
<meta name="author" content="Ann Navarro" />
<meta name="copyright" content="Copyright 2000, WebGeek, Inc." />
<meta name="description" content="A document used to provide examples of meta elements" />
<meta name="keywords" content="meta, metadata, descriptions" />
<meta http-equiv="Expires" content="Sun, 31 Dec 2000 23:59:59 GMT" />
<meta http-equiv="Content-Type" content="text/html" />
<meta http-equiv="Content-Script-Type" content="text/JavaScript" />
<meta http-equiv="Content-Style-Type" content="text/CSS" />
<meta http-equiv="Pragma" content="no-cache" />
</head>
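To make the link between http-equiv and HTTP headers concrete, here is a small Python sketch of what a server or cache might do with such a head: gather the http-equiv pairs as header names and values, then interpret the Expires date. It is an illustration under stated assumptions rather than a description of any particular server. The function names are invented, and the sample head omits the XHTML namespace to keep the lookup short.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def http_equiv_headers(head_xml):
    """Collect http-equiv/content pairs as a {header-name: value} mapping."""
    head = ET.fromstring(head_xml)
    return {
        meta.get("http-equiv"): meta.get("content", "")
        for meta in head.iter("meta")
        if meta.get("http-equiv")
    }


def is_expired(expires_value, now):
    """True once 'now' (a timezone-aware datetime) has reached the Expires time."""
    return now >= parsedate_to_datetime(expires_value)


head = """<head>
<title>Metadata Samples</title>
<meta name="author" content="Ann Navarro" />
<meta http-equiv="Expires" content="Sun, 31 Dec 2000 23:59:59 GMT" />
<meta http-equiv="Content-Type" content="text/html" />
</head>"""

headers = http_equiv_headers(head)
for name, value in headers.items():
    print(f"{name}: {value}")

# A cached copy checked on New Year's Day 2001 would be considered stale.
print(is_expired(headers["Expires"], datetime(2001, 1, 1, tzinfo=timezone.utc)))  # prints: True
```

Note that the meta elements using the name attribute are skipped; only http-equiv properties are candidates for response headers.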
meta
Elements
Site
Details
Meta Builder 2
http://vancouver-webpages.com/META/mk-metas.html
One of my favorite sites is the Reggie Metadata editor. This tool allows the
user to select which metadata schema to use, and what syntax the output
should be formatted with, such as HTML 4.0, RDF, or others. Once these
options are selected, the user is prompted to fill in the details for commonly
used metadata properties. Then the applet compiles the final markup for
you (see Figure 18.2).
What's Missing?
So far we've explored ways to describe the contents of a document; to identify specific properties such as the author of the document, copyright statements, and creation or modification dates; as well as methods used to
enhance the machine-processability of the page. So what's missing?
A frequently heard complaint in some Web design circles is the inability to
exert precise control over the presentation of their Web sites. These authors
might not want a browser to attempt to render the document if it can't handle some portions of it, preferring to miss out on that visitor rather than
change the display of the page content. Others might have preferences as to
the path of substitution a browser takes if it can't render the document as
first defined.
Today, XHTML doesn't provide a mechanism for exerting that much control
over the delivery of Web content. A request is made from the client, and the
server happily pushes all that is there back out. It's the client's responsibility to determine how to display that content to the user, based on policies or
decision trees set by the client's programmers.
Similarly, it's not possible for a device to let the server know that it can
accept GIF images but not PNG, or that the browser in use doesn't understand JavaScript. Capabilities and preferences also don't necessarily have
to be device or client related. A user with less-than-perfect eyesight might
have chosen to increase the default font size of his display and would like
that preference to carry over when his browsing software evaluates any
font sizes set in a Web document. Many of these issues are being addressed
in work getting underway at the W3C, as well as other standards and
research bodies.
What's Next
In this chapter we've talked about systems for describing documents,
including syntax for providing details for originator use such as creation
and editing dates. The meta element can be used with the name attribute to
provide any number of descriptive properties, and the http-equiv attribute is
used to suggest HTTP response headers.
Next, in Chapter 19, "Next Steps for XHTML," we'll take
a look at several exciting coming attractions for XHTML. We've already
touched briefly on the capabilities and profiling work to be done by CC/PP,
and we'll take a closer look at the developments occurring in that field.
Work is also underway in combining television and the Web, both from a
"broadcast over the Internet" perspective as well as an "embedded Web documents in a TV broadcast" viewpoint. Finally, the ubiquitous form is in for
an overhaul with extended capabilities being reviewed by the W3C XForms
Working Group. We expect to see efforts to provide state management
between form pages, data binding, and stronger form-field validation tools.
19
Next Steps for XHTML
In this chapter you will learn a little about some of the exciting likely
future developments in Web standards that will impact XHTML or draw
from recent or forthcoming XHTML standards.
The new possibilities being opened up are exciting, but you should be aware
that all the material discussed in this chapter is at an early stage of development, and so is particularly liable to change. So please read this chapter
with a view to grasping the big picture, the general concepts. By all means
think about the preliminary detail provided but make sure to check the current versions of the documents cited to identify where major changes might
have occurred.
It isn't possible to cover every emerging technology here, but the ones
which are covered should give you a flavor of some of the substantial
changes that are likely to further change the ways in which you can make
use of the World Wide Web.
This chapter teaches you:
A methodology that will permit Web servers to tailor output of documents to the type of browser that you choose to use at any one time. This embryonic technology goes by the name Composite Capabilities/Preferences Profiles, abbreviated to CC/PP.
Possible ways in which television and the World Wide Web can benefit from the strengths of the partner technology.
A new generation of XML-based Web forms, called XForms, which will improve on existing HTML/XHTML forms.
Figure 19.1: This shows a simple JavaServer page, which displays some
information extracted from an HTTP header. Note the information about the
browser in the final line of the page.
RDF is also the foundation of CC/PP, the semantics of which are overlaid on
RDF.
CC/PP Terminology
For a Web server to be able to identify the nature and capabilities of the
variety of browsers that might access it, new concepts are raised; not surprisingly, CC/PP involves quite a bit of new jargon.
Some terms might be familiar to you, but others are likely to be unfamiliar
to at least some readers.
Not all the terminology relating to CC/PP is described here. Further details
can be obtained by consulting the W3C documents listed later in this
section.
Here are some of the terms that apply to the browser end of things:
User Agent Profile: A description of the capabilities and preferences of a client device or user agent (in other words, a browser).
Attributes: The characteristics or capabilities of a user agent that together constitute the user agent profile (for example, screen resolution).
Hint: The expression of a preference. This will carry weight with the origin server (see the following) but is unlikely to be mandatory. For example, the browser might indicate that it prefers to be served XHTML rather than HTML.
At the Web server end, these terms are in use:
Origin Server: A Web server that originates or supplies Web content to a user agent.
Authentication: Confirmation of the identity of a user/browser.
Content Selection: The selection of a document to serve that is appropriate to the CC/PP profile of the user agent.
Some terms involve both the client and the server:
Capability: The attributes of, usually, the receiver of a message. It can, however, also apply to the capabilities of a server with regard to the types of messages it is capable of serving.
Content Negotiation: A process that involves exchange of information between an origin server and a user agent allowing a user agent to make a choice about which of a range of available content formats is most acceptable.
A CC/PP profile describes client (or user agent) capabilities in terms of a
number of CC/PP attributes, or features. Each of these features is identified
by a name in the form of a URI (Uniform Resource Identifier). A collection
of such names used to describe a client is called a vocabulary.
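The idea of a profile as a set of URI-named attributes, and of content selection as matching documents against it, can be sketched without any of the RDF machinery that CC/PP actually uses. In the Python sketch below, everything specific is an assumption made for illustration: the vocabulary URI at example.org, the attribute name screenWidth, the file names, and the simple "profile value must be at least the required value" rule. The real drafts define their own vocabulary and matching behavior, and both may change.

```python
# Hypothetical vocabulary: a URI prefix plus one attribute name.
VOCAB = "http://example.org/ccpp-vocabulary#"
SCREEN_WIDTH = VOCAB + "screenWidth"

# Documents an origin server could serve, richest first, each with the
# minimum attribute values it assumes (all invented for this sketch).
VARIANTS = [
    {"name": "desktop.html", "requires": {SCREEN_WIDTH: 640}},
    {"name": "handheld.html", "requires": {SCREEN_WIDTH: 120}},
    {"name": "plain.html", "requires": {}},
]


def select_content(profile, variants):
    """Return the first variant whose every required attribute is met."""
    for variant in variants:
        if all(profile.get(uri, 0) >= minimum
               for uri, minimum in variant["requires"].items()):
            return variant["name"]
    return None


pda_profile = {SCREEN_WIDTH: 160}
print(select_content(pda_profile, VARIANTS))  # prints: handheld.html
```

A client that sends no profile at all falls through to the variant with no requirements, which mirrors how servers behave today when they know nothing about the browser.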
CC/PP describes a small, core set of features that are widely applicable and
constitute the core vocabulary of CC/PP. In addition, it is anticipated that
extension vocabularies also will be supported. You might recognize similarities here to the extensibility enabled by modularization of XHTML, which
was described in Chapter 14, "XHTML Modularization," and Chapter 16,
"Combining Custom Modules with XHTML." It is anticipated that
extension vocabularies might be standardized for, say, imaging
devices, voice messaging devices, and wireless access devices.
Please note that, because the current documents are only a draft, the
applicable URI might change.
CC/PP identifies the capabilities of the user agent together with how the
user prefers to use them. XHTML Document Profiles express the required
functionality for what the document author perceives as optimal rendering
of the document. Hence, CC/PP and XHTML document profiles might, in
certain settings, be compared when an origin server is considering the optimal type of content to be delivered to a user agent.
Background information on XHTML document profiles can be viewed at
http://www.w3.org/TR/1999/WD-xhtml-prof-req-19990906
If you are not familiar with the typical process through which documents
pass during their development at W3C, this is the hierarchy, in ascending
order of importance:
Working Drafts: These are draft documents that are subject to perhaps radical change or redrafting. Concepts or solutions can be added or removed. They are a snapshot of steps towards a hoped-for solution. When a Working Draft is nearing what is hoped to be its final form, a so-called "Last Call" Working Draft is issued. When a technology is particularly complex or the merits of a proposed solution are still being highly debated, it can happen that one Last Call draft is succeeded by another.
Candidate Recommendation: Promotion of a document to Candidate Recommendation status is an indication that a document has successfully completed its Last Call process. Advancement of a document to Candidate Recommendation status is an explicit call to those outside of the related Working Groups or the W3C itself for implementation and technical feedback.
Proposed Recommendation: If the feedback on the Candidate Recommendation is satisfactory, promotion to Proposed Recommendation is the next step. It is still possible for issues to be raised which, in a worst-case scenario, might result in the document being returned to Working Draft status for further development.
Recommendation: For all practical purposes, this is a fixed standard, at least until a subsequent version is formally approved and issued. Because W3C is not an inter-governmental body, it cannot call its documents "standards" and therefore uses the term "Recommendations." Recommendations are stable documents that might be cited as source material.
The W3C documents on CC/PP are, at the time of writing, at the earliest
stage: Working Draft.
Security
The use of a browser that might identify you personally and your location
raises a number of security issues.
In at least some settings, it will be advisable or possible to conceal your
identity (anonymization).
One issue that raises specific security concerns is mentioned in relation to
position-dependent information.
Position-Dependent Information
Let's take a brief look at the pros and cons of the availability of systems
that depend on knowing where you are; in the jargon, systems that operate
using position-dependent information.
Imagine that you have a voice-enabled browser built into or plugged into
your car that has capabilities (perhaps in conjunction with the electronic
systems of the car) to determine your position and give you advice about
suitable routes to your desired destination, perhaps with the ability to
choose a scenic route, the fastest route, a route free of traffic jams, and so
on. In that context the ability of your browser to transmit your exact position to others, and receive travel advice in return, has obvious benefits.
Similar benefits might arise if you are travelling and want to locate a hotel
or other facility with criteria you want to define.
In practice, many of these processes, once they are fully developed, are
likely not to involve direct human participation. However, it is quite possible that staff at the Web server company or others might be able to gain
access to information about your location.
Because significant costs will be incurred in creating and maintaining such
a system, it is unlikely to be free and thereby provide anonymous access.
The fact that others might be aware of your geographical location might be
viewed by you as an invasion of your personal privacy.
At first glance, others might say that there is no issue here. If you are acting lawfully and have nothing to hide, why should you object to others
knowing your exact location?
Thinking beyond the principles of what are or are not the boundaries of
legitimate personal privacy, some very practical issues arise. Suppose you
are on a highway 300 miles from home, you transmit your position (for
example, to find out traffic flows ahead), and that information is intercepted by an unauthorized person or is used unscrupulously by a member
of the staff at a Web server company. Your position is known. You are 300
miles from home. A burglar has, at a minimum, four safe hours in which to
invade your home and steal from it. The service that depends on knowing
your geographical location has potentially exposed your property to crime.
Not surprisingly, issues relating to privacy and security are under active
consideration in relation to position-dependent information.
Technical issues also are receiving consideration. For example, there is a
need to develop an XML-compliant format for encapsulating position-dependent information.
As mentioned earlier in the chapter, the possible needs of television are one
item being considered within the scope of establishing a CC/PP standard.
XForms
HTML forms were introduced to the World Wide Web in 1993. They and,
more recently, XHTML forms, perform a pivotal role on many Web sites in
</model>
<instance>
<purchaseOrder>
<shipTo>
<name>Alice Smith</name>
<street>123 Maple Street</street>
<city>Mill Valley</city>
<state>CA</state>
<zip>90952</zip>
</shipTo>
</purchaseOrder>
</instance>
</xform>
</head>
<body>
<h1>Shipping Information</h1>
<form name="po_xform">
Name: <input name="purchaseOrder.shipTo.name"/><br/>
Street: <input name="purchaseOrder.shipTo.street"/><br/>
City: <input name="purchaseOrder.shipTo.city"/><br/>
State: <input name="purchaseOrder.shipTo.state"/><br/>
Zip: <input name="purchaseOrder.shipTo.zip"/><br/>
<button onclick="submit(po_xform)">Submit</button>
</form>
</form>
</body>
</html>
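One way to read this listing is that each input's dotted name is a path into the instance data: purchaseOrder.shipTo.city points at the city element inside shipTo inside purchaseOrder. That reading is an interpretation of the draft example, not something the XForms drafts are guaranteed to keep. Under that assumption, the lookup can be sketched in Python with the standard xml.etree.ElementTree module; the function name lookup is invented here.

```python
import xml.etree.ElementTree as ET

# The instance data from the listing, on its own.
INSTANCE = """<purchaseOrder>
  <shipTo>
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
</purchaseOrder>"""


def lookup(instance_root, dotted_name):
    """Follow a dotted control name down the instance tree; None if no match."""
    parts = dotted_name.split(".")
    if parts[0] != instance_root.tag:
        return None
    node = instance_root
    for part in parts[1:]:
        node = node.find(part)
        if node is None:
            return None
    return node.text


root = ET.fromstring(INSTANCE)
print(lookup(root, "purchaseOrder.shipTo.city"))  # prints: Mill Valley
```

Seen this way, the form controls carry no data of their own; the instance holds the values and the controls merely point at them, which is the separation of content and presentation the working group is after.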
The XForms data model will include definitions of data types, data type
facets, data model structures, and an XForms expression language. Because
there are interdependencies between these aspects, and certain parts of the
XForms data model are far from complete, there is little advantage in discussing here a draft that in all likelihood will change substantially in the
coming months.
However, many of the more general issues relating to XForms are likely to
be of more enduring relevance.
XForms relate, in part, to the two other initiatives mentioned earlier in this
chapter because the requirements of handheld and television-based
browsers are an explicit consideration in the development of XForms.
The W3C working group developing XForms has identified a number of key
requirements for the next generation of Web forms, including the following:
Ease of migration
Improved interoperability and accessibility
Enhanced client/server interaction
Advanced forms logic
Support for internationalization
Greater flexibility in presentation
In the current draft, the working group expresses the view that it will not
be possible to maintain full backward compatibility with previous generations of Web browsers. Currently, it is not entirely clear to what extent and
in what situations problems will arise because of such lack of backward
compatibility.
In line with the principles of XML, which are increasingly being implemented in
XHTML, the aim is to separate content from presentation within XForms.
The increasingly global nature of e-commerce and e-business demands that
internationalization of forms be achieved to a degree that would have been
of specialized interest only a few years ago.
It is likely that XForms will include capabilities for advanced forms logic,
which are absent from the current generation of forms.
Because XForms will be based on XML and XHTML, the possibility exists
of sophisticated use with other XML-based technologies such as the
Scalable Vector Graphics (SVG) specification currently in Last Call
Working Draft and the Synchronized Multimedia Integration Language
(SMIL) Recommendation. Graphics-rich and multimedia-capable forms
might provide user experiences that are currently difficult to envisage.
Because all these technologies are XML-based, it also will be possible
to use Cascading Style Sheets and the Extensible Stylesheet Language (XSLT
and XSL-FO) to control aspects of presentation.
The facilities that XForms provides will differ greatly from platform to
platform. On a mobile browser with a tiny screen, the priority might be to get an XForm to work at all, whereas on a large monitor
attached to a computer with a high-bandwidth connection to the Web, a very
rich graphical and multimedia user experience is likely to be possible.
Part V: Appendix
A XHTML Modularization Abstract Module Definitions
This appendix contains the Abstract Module Definitions found in the W3C
document Modularization of XHTML. It is provided here as a means of
quick reference.
Some preliminary information is needed before the modules can be defined;
specifically, syntactic conventions, attribute types, and attribute collection
names.
Syntactic Conventions
The first thing that should be noted is that abstract module definitions are
not written in a formal grammar. However, the W3C has borrowed conventions that should be familiar to the authors of DTDs.
Each module can be viewed as a table with three columns. In each row,
you'll find an element, the attributes allowed within the element, and then
the minimal content model for the element. A minimal content model might
contain a list of elements, a content set (which is itself a named list of
elements), or a datatype such as PCDATA for the title element.
Occurrence indicators and other expressions will be found in the minimal
content models as follows:
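Expression     Meaning
Foo ?          Zero or one instances of Foo are permitted.
Foo +          One or more instances of Foo are required.
Foo *          Zero or more instances of Foo are permitted.
A, B           A is required, followed by B.
A|B            Either A or B is required.
A-B            A is permitted, omitting the elements in B.
Parentheses    Expressions inside parentheses are evaluated before
               expressions outside them.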
Attribute Types
Each attribute listed in the module definitions will have an accompanying
type definition in parentheses.
This first table holds attribute types as defined in the XML 1.0
Recommendation:
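Attribute Type    Definition
CDATA             Character Data
ID                A document-unique identifier
IDREF             A reference to a document-unique identifier
IDREFS            A space-separated list of references to document-unique
                  identifiers
NAME              A name with the same character constraints as ID
NMTOKEN           A name composed only of name tokens as defined in XML 1.0
NMTOKENS          One or more white space-separated NMTOKEN values
PCDATA            Processed character data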
The following table contains data types and their semantics as defined by
XHTML Modularization. Links to ISO, RFC, and other references can be
found in the Modularization of XHTML document on the W3C Web site:
Data Type
Description
Character
Charset
Charsets
Color
Fuchsia = #FF00FF
Green = #008000
Lime = #00FF00
Olive = #808000
Yellow = #FFFF00
Navy = #000080
Blue = #0000FF
Teal = #008080
Aqua = #00FFFF
ContentType
ContentTypes
Datetime
FrameTarget
LanguageCode
Length
LinkTypes
Prev: Refers to the previous document in an ordered
series of documents. Synonymous with Previous where supported.
Contents: Refers to a document that serves as a table
of contents. Synonymous with ToC where supported.
Index: Refers to a document providing an index for the
current document.
Glossary: Refers to a document providing a glossary of
terms used in the current document.
Copyright: Refers to a document containing a copyright
statement for the current document.
Chapter: Refers to a document serving as a chapter in a
collection of documents.
Section: Refers to a document serving as a section in a
collection of documents.
Subsection: Refers to a document serving as a subsection in a collection of documents.
Appendix: Refers to a document serving as an appendix
in a collection of documents.
Help: Refers to a document offering help (more information, links to other sources of information, and so on).
Bookmark: Refers to a bookmark. A bookmark is a link
to a key entry point within an extended document. The
title attribute could be used to label the bookmark.
Also, more than one bookmark might be defined in each
document.
MediaDesc
Print: Intended for paged, opaque material, and for documents viewed onscreen in print preview mode
Braille: Intended for Braille tactile feedback devices
Aural: Intended for speech synthesizers
All: Suitable for all devices
MultiLength
The value might be a Length or a relative length. A relative length takes the form "i*", where "i" is an integer.
When allotting space among elements competing for that
space, user agents allot stated pixel and percentage
lengths first, and then divide up the remaining available
space among relative lengths. Each relative length
receives a portion of the available space, in proportion to
the integer preceding the "*". The value "*" is equivalent
to "1*".
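The allocation rule for MultiLength values can be sketched in a few lines of Python. This is an illustration of the rule as described above, not code from any user agent: the function name allot is hypothetical, and the sketch assumes well-formed values such as "100", "25%", "2*", or "*".

```python
def allot(total, lengths):
    """Divide `total` pixels among MultiLength values.

    Pixel and percentage lengths are satisfied first; whatever space
    remains is shared among relative lengths in proportion to their
    integers.
    """
    sizes = [0.0] * len(lengths)
    shares = {}  # position -> relative weight
    for i, value in enumerate(lengths):
        if value.endswith("*"):
            # "*" alone is equivalent to "1*".
            shares[i] = int(value[:-1] or 1)
        elif value.endswith("%"):
            sizes[i] = total * int(value[:-1]) / 100
        else:
            sizes[i] = float(value)
    remaining = max(total - sum(sizes), 0)
    weight = sum(shares.values())
    if weight:
        for i, share in shares.items():
            sizes[i] = remaining * share / weight
    return sizes

print(allot(600, ["100", "25%", "*", "3*"]))  # [100.0, 150.0, 87.5, 262.5]
```

In this run, 250 of the 600 pixels go to the fixed and percentage lengths, and the remaining 350 are split 1:3 between the two relative lengths.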
Number
Pixels
The value is an integer that represents the number of pixels on the canvas (screen, paper, and so on).
Script
Text
URI
URIs
Attribute Collections
Five groups of attributes are used together frequently enough that they are
placed into sets according to purpose, known as collections. The following
table lists the collection name and each member attribute. Each attribute
in the collection is further noted with its corresponding data type. The last
collection, Common, is a superset of the other collections:
Collection name
Attributes
Core
I18N (Internationalization)
Style**
style (CDATA)
Common
* The Events collection is only defined when the Intrinsic Events Module is in use. Otherwise, the
collection is empty.
** The Style collection is only defined when the Style Attribute Module is selected. Otherwise, the
collection is empty.
Structure Module
The Structure Module defines the major structural components of all
XHTML documents. The root element, html, is defined here:
Element   Attributes                            Minimal Content Model
body      Common                                (Heading|Block|List)*
head      I18N, profile (URI)                   title
html      I18N, version (CDATA), xmlns (URI)    head, body
title     I18N                                  PCDATA
Text Module
As its name implies, the Text Module defines the elements used to structure text.
Four new content model sets are defined using the elements found in this
module. The following table defines the content model sets and corresponding member elements. The fourth set is a superset of the previous three.
Content Model Set   Elements
Heading             h1|h2|h3|h4|h5|h6
Block               address|blockquote|div|p|pre
Inline              abbr|acronym|br|cite|code|dfn|em|kbd|q|samp|span|strong|var
Flow                Heading|Block|Inline
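One way to picture the content model sets is as ordinary sets that modules can extend. The Python sketch below models the four sets from the table above and shows how Flow picks up an element that another module adds to Inline. The variable and function names are hypothetical and chosen only for illustration.

```python
# Content model sets from the Text Module table.
HEADING = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK = {"address", "blockquote", "div", "p", "pre"}
INLINE = {"abbr", "acronym", "br", "cite", "code", "dfn", "em",
          "kbd", "q", "samp", "span", "strong", "var"}

def flow():
    """Flow is recomputed on demand so that later additions are picked up."""
    return HEADING | BLOCK | INLINE

def allowed_in_flow(element):
    """Return True if the element belongs to the current Flow set."""
    return element in flow()

print(allowed_in_flow("p"))   # True
print(allowed_in_flow("a"))   # False: only the Text Module is loaded

# The Hypertext Module adds the a element to the Inline set.
INLINE.add("a")
print(allowed_in_flow("a"))   # True
```

Computing Flow on demand, rather than storing it once, reflects the fact that the superset changes whenever a module extends one of its member sets.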
Element      Attributes           Minimal Content Model
abbr         Common               (PCDATA|Inline)*
acronym      Common               (PCDATA|Inline)*
address      Common               (PCDATA|Inline)*
blockquote   Common, cite (URI)   (PCDATA|Heading|Block)*
br           Core                 EMPTY
cite         Common               (PCDATA|Inline)*
code         Common               (PCDATA|Inline)*
dfn          Common               (PCDATA|Inline)*
div          Common               (Heading|Block|List)*
em           Common               (PCDATA|Inline)*
h1           Common               (PCDATA|Inline)*
h2           Common               (PCDATA|Inline)*
h3           Common               (PCDATA|Inline)*
h4           Common               (PCDATA|Inline)*
h5           Common               (PCDATA|Inline)*
h6           Common               (PCDATA|Inline)*
kbd          Common               (PCDATA|Inline)*
p            Common               (PCDATA|Inline)*
pre          Common               (PCDATA|Inline)*
q            Common, cite (URI)   (PCDATA|Inline)*
samp         Common               (PCDATA|Inline)*
span         Common               (PCDATA|Inline)*
strong       Common               (PCDATA|Inline)*
var          Common               (PCDATA|Inline)*
Hypertext Module
Defining just a single element, a, the Hypertext Module allows links to
other resources to be created. This module must be present in all XHTML
Family conformant document type definitions. When it is, the a element is
added to the Inline content set defined in the Text Module.
Element
Attributes
(PCDATA|Inline - a)*
NOTE
a elements cannot contain other a elements. Because the a element is added to the
Inline content set when the Hypertext Module is present, the minimal content model for
a must be defined as Inline minus a.
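The restriction in the note can be checked with a short script. The following Python sketch reports whether a fragment contains an a element inside another a element. It is an illustration only: the function name is hypothetical, the sample markup carries no namespace, and the check looks at any depth of nesting, which goes further than the single level a content model describes.

```python
import xml.etree.ElementTree as ET

def has_nested_anchor(xml_text):
    """Return True if any a element contains another a element."""
    root = ET.fromstring(xml_text)
    for anchor in root.iter("a"):
        # iter() yields the element itself first, so skip it.
        for inner in anchor.iter("a"):
            if inner is not anchor:
                return True
    return False

good = '<p><a href="one.html">one</a> <a href="two.html">two</a></p>'
bad = '<p><a href="one.html">one <em><a href="two.html">two</a></em></a></p>'

print(has_nested_anchor(good))  # False
print(has_nested_anchor(bad))   # True
```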
List Module
The List Module defines the list containers and the corresponding item elements.
A new content model set, List, is defined in this module. The List set is
then added to the Flow set defined in the basic Text Module.
Content Model   Elements
List            (dl|ol|ul)+

Element   Attributes   Minimal Content Model
dl        Common       (dt|dd)+
dt        Common       (PCDATA|Inline)*
dd        Common       (PCDATA|Inline)*
ol        Common       li+
ul        Common       li+
li        Common       (PCDATA|Inline)*
Optional Modules
The remaining 23 modules defined here are optional. They might be added
in any combination as desired to the Core Modules set to create XHTML
Family conformant document types.
Applet Module
NOTE
It should be noted that the Applet module, though provided by the W3C to describe
behavior of these elements, has been deprecated.
The Applet Module provides a means for referencing and accessing external
applications. When this module is present in a document type definition,
the Inline content set is modified to include the applet element:
Element
Attributes
applet
param?
param
EMPTY
PRESENTATION MODULE
Content Model   Elements Added
Block           hr
Inline          b|big|i|small|sub|sup|tt

Element   Attributes   Minimal Content Model
b         Common       (PCDATA|Inline)*
big       Common       (PCDATA|Inline)*
hr        Common       EMPTY
i         Common       (PCDATA|Inline)*
small     Common       (PCDATA|Inline)*
sub       Common       (PCDATA|Inline)*
sup       Common       (PCDATA|Inline)*
tt        Common       (PCDATA|Inline)*
EDIT MODULE
This module provides two elements used to show editorial changes in documents. When the module is in use, both elements are added to the Inline
content set:
Minimal Content
Model
Element
Attributes
del
(PCDATA|Inline)*
ins
(PCDATA|Inline)*
Attributes
bdo
Common
(PCDATA|Inline)*
Forms Modules
Two modules are defined in this section: a Basic Forms Module, which
incorporates form features found in HTML 3.2, and the broader Forms
Module, which provides the additional forms features introduced in
HTML 4.0.
BASIC FORMS MODULE
When in use, this module defines two new content model sets, as follows:
Content Model   Elements Added
Form            form
Formctrl        input|select|textarea
Additionally, two existing content model sets are modified to include the
new model sets:
Content Model   Set Added
Block           Form
Inline          Formctrl
Element    Attributes                                   Minimal Content Model
form                                                    Heading|Block - form
input      Common, checked (checked), maxlength         EMPTY
           (Number), name (CDATA), size (Number),
           src (URI), type (text, password, checkbox,
           radio, submit, reset, file, hidden),
           value (CDATA)
select     Common, multiple (multiple), name (CDATA),   option+
           size (Number)
option     Common, selected (selected), value (CDATA)   Inline*
textarea   Common, cols (Number), name (CDATA),         PCDATA*
           rows (Number)
FORMS MODULE
The Forms Module is a superset of the Basic Forms Module. Accordingly,
the new content model sets defined by the Forms Module are supersets of
the sets defined earlier.
NOTE
The content model sets use the same name as the Basic Forms Module definitions in
part because they are supersets, but also to allow easy substitution of forms modules
in document type definitions. Any additional forms modules defined by the W3C will
make use of these same content model set names, enhancing interoperability.
Content Model   Elements Added
Form            form|fieldset
Formctrl        input|select|textarea|label|button
As in the Basic Forms Module, the Form content model set is added to the
Block content set, and the Formctrl set is added to the Inline set.
The Forms Module itself is defined here:
Element    Attributes                                   Minimal Content Model
form       Common, accept (ContentTypes),               (Heading|Block - form|fieldset)+
           accept-charset (Charsets), action (URI),
           method (get|post), enctype (ContentType)
input      Common, accept (ContentTypes), accesskey     EMPTY
           (Character), alt (CDATA), checked (checked),
           disabled (disabled), maxlength (Number),
           name (CDATA), readonly (readonly),
           size (Number), src (URI), tabindex (Number),
           type (text, password, checkbox, radio,
           submit, reset, file, hidden, image),
           value (CDATA)
select     Common, disabled (disabled), multiple        (optgroup|option)+
           (multiple), name (CDATA), size (Number),
           tabindex (Number)
option     Common, disabled (disabled), label (Text),   PCDATA
           selected (selected), value (CDATA)
textarea   Common, accesskey (Character),               PCDATA
           cols (Number), disabled (disabled),
           name (CDATA), readonly (readonly),
           rows (Number), tabindex (Number)
button     Common, accesskey (Character), disabled      (PCDATA|Heading|List|
           (disabled), name (CDATA), tabindex           Block - Form|Inline - Formctrl)*
           (Number), type (button|submit|reset),
           value (CDATA)
fieldset   Common                                       (PCDATA|legend|Flow)*
label      Common, accesskey (Character), for (IDREF)   (PCDATA|Inline - label)*
legend     Common, accesskey (Character)                (PCDATA|Inline)+
optgroup   Common, disabled (disabled), label (Text)    option+
Table Modules
As with forms, two modules are defined in this section, a Basic Tables
Module, providing the most basic table functionality, and the broader
Tables Module, which has many more features and is a superset of the
Basic Tables Module.
Whenever either module is present, the Block content set is modified to
include the table element.
BASIC TABLES MODULE
Element
Attributes
caption
Common
(PCDATA|Inline)*
table
caption?,tr+
td
(PCDATA|Flow)*
th
(PCDATA|Flow)*
tr
(th|td)+
TABLES MODULE
Element
Attributes
caption
Common
(PCDATA|Inline)*
table
caption?,(col*|
colgroup*),((thead?,
tfoot?,tbody+)|(tr+))
td
(PCDATA|Flow)*
th
(PCDATA|Flow)*
tr
(th|td)+
col
EMPTY
colgroup
col*
tbody
tr+
thead
tr+
tfoot
tr+
Image Module
The Image Module provides the ability to incorporate basic images in the
document. Selecting the image module does not imply support for client-side image maps. Those are defined in a separate selectable module.
When the module is selected, the img element is added to the Inline
content set:
Element
Attributes
img
EMPTY
Element
Attributes
a&
n/a
area
Common, accesskey
(Character), alt* (Text),
coords (CDATA), href
(URI), nohref (nohref),
shape (rect*|circle|
poly|default),
tabindex (Number)
EMPTY
img&
usemap (IDREF)
map
((Heading|Block)|
area)+
object&
usemap (IDREF)
img&
ismap (ismap)
Object Module
This module is used for generic object inclusion, making no assertions
about what type of object might be incorporated.
When the module is in use, the object element is added to the Inline content set.
Element
Attributes
object
(PCDATA|Flow|param)*
param
EMPTY
Frames Module
This module provides frame-related elements and modifies several previously defined elements with new attributes.
NOTE
When selected, this module makes a major change to the content model of the html
element, which becomes (head, frameset).
Element    Attributes                                   Minimal Content Model
frameset                                                frame+, noframes?
frame      Core, frameborder (1|0), longdesc (URI),     EMPTY
           marginheight (Pixels), marginwidth (Pixels),
           noresize (noresize), scrolling
           (yes|no|auto*), src (URI)
noframes   Common                                       body
a&         target (CDATA)                               n/a
area&      target (CDATA)
base&      target (CDATA)
link&      target (CDATA)
form&      target (CDATA)
Iframe Module
This module provides the definition for inline frames. It should be noted
that it is not dependent on the presence of the Frames Module.
When this module is in use, the iframe element is added to the Inline content set:
Element   Attributes                                    Minimal Content Model
iframe    Core, frameborder (1|0), height (Pixels),     Flow
          longdesc (URI), marginheight (Pixels),
          marginwidth (Pixels), scrolling
          (yes|no|auto*), src (URI), width (Length)
Intrinsic Events
Intrinsic events are things that might occur when a user performs specific
actions. This module defines attributes that are added to the attribute set
for the listed elements only when the modules that define those elements
are present in the DTD. This module makes use of the Events attribute collection defined at the beginning of this appendix.
Element     Attributes                             Notes
a&          onblur, onfocus
area&       onblur, onfocus
form&       onreset, onsubmit
body&       onload, onunload
label&      onblur, onfocus
input&      onblur, onchange, onfocus, onselect
select&
textarea&   onblur, onchange, onfocus, onselect
button&     onblur, onfocus
Metainformation Module
The single element in this module, meta, provides descriptive information in
the declarative portion of the document.
When this module is present, the content model for the head element (as
defined in the Structure Module) is changed to include the meta element.
Element
Attributes
meta
EMPTY
Scripting Module
This module defines elements used to invoke or provide alternative content
for executable scripts.
When this module is present, both elements are added to the Block and
Inline content sets, and the script element is added to the content model of
the head element in the Structure Module.
Element
Attributes
Minimal Content
Model
noscript
Common
(Heading|List|Block)+
script
PCDATA
xml:space=preserve
NOTE
The language attribute is deprecated, and will not appear in future versions of the
XHTML language.
Attributes
style
PCDATA
preserve
Style Attribute Module
This module defines a single attribute, style. When the module is selected,
it activates the Style attribute collection previously defined as part of the
Common attribute collection. No other information is added.
Link Module
This module defines a single element, link, that can be used to define links
to external resources (such as external style sheets or scripts).
When this module is used, it adds the link element to the content model of
the head element in the Structure Module.
Element
Attributes
link
EMPTY
Base Module
This module defines the element base, which can be used to define a base
URI used to resolve all relative URIs in the document.
When this module is selected, the base element is added to the content
model of the head element in the Structure Module.
Element
Attributes
base
href* (URI)
EMPTY
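The effect of a base URI on relative references can be seen with Python's standard urljoin function, which applies the usual URI resolution rules. The base address and file names below are hypothetical values chosen for illustration; in a document, the base would come from the href attribute of the base element.

```python
from urllib.parse import urljoin

# Hypothetical value of the href attribute on the base element.
BASE = "http://www.example.com/recipes/soups/index.html"

def resolve(relative):
    """Resolve a relative URI against the document's base URI."""
    return urljoin(BASE, relative)

for relative in ("bread.html", "../images/logo.gif", "/contact.html"):
    print(relative, "->", resolve(relative))
```

The three references resolve to http://www.example.com/recipes/soups/bread.html, http://www.example.com/recipes/images/logo.gif, and http://www.example.com/contact.html, respectively.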
Attributes
a&
name (CDATA)
Notes
applet&
name (CDATA)
form&
name (CDATA)
frame&
name (CDATA)
iframe&
name (CDATA)
img&
name (CDATA)
map&
name (CDATA)
Legacy Module
This module defines elements and attributes that have previously been
deprecated in earlier versions of HTML and XHTML. This module is
Attributes
basefont
EMPTY
center
Common
(PCDATA|Flow)*
font
(PCDATA|Inline)*
Common
(PCDATA|Inline)*
strike
Common
(PCDATA|Inline)*
Common
(PCDATA|Inline)*
Element
Attributes
Notes
body&
br&
clear (left|all|
right|none*)
caption&
align (left|center|
right|justify)
div&
align (left|center|
right|justify)
h1-h6&
align (left|center|
right|justify)
hr&
align (left|center|
right|justify)
img&
align (left|center|
right|justify), border
(Pixels), hspace (Pixels),
vspace (Pixels)
input&
align (left|center|
right|justify)
legend&
align (left|center|
right|justify)
li&
ol&
p&
align (left|center|
right|justify)
pre&
width (Number)
script&
language (CDATA)
table&
align (left|center|
right|justify),
bgcolor (Color)
tr&
bgcolor (Color)
th&
td&
ul&
(CDATA)
Index
Symbols
& (ampersand), 166
* (asterisk)
passwords, 62
relative sizing frames, 110
A
abbreviated syntax (XPath),
211-212
absolute location paths
(XPath), 212-213
absolute sizes, frames, 110
abstract modules, 244-248, 265
implementing, 248-254
syntactic conventions, 336
accept attribute (Input tag), 61
access
DOM (Document Object Model),
165
wireless, 297
accessibility for disabled
people, 127, 129
W3C (World Wide Web
Consortium), 129-130
WCAG (Web Content Accessibility
Guidelines), 130-131
applets, 135
checkpoints, 130, 136-139
designing documents, 131
emphasizing text, 132
forms, 136
graphics, 134-135
hyperlinks, 134
language definition, 132
lists, 133
objects, 135
scripts, 136
tables, 133
accesskey attribute, 136
Input tag, 61
action attribute, form tag, 60
Adobe Photoshop, 46
agents, 310-312
align attribute, 15, 82, 195, 230
Input tag, 61
aligning
images/text, 47-48
table data, 95-104
text, 195
attributes
content, 18
creating, 229-232
deprecated, 14
Disabled (Input tag), 61
encoding, 164
frameborder (Frame tag), 111
headers, 96-98
height, 83
href, 186
http-equiv, 164, 308, 312
id, 96-98, 190
inserting images, 46-51
language, 16-18, 132, 162
marginheight/marginwidth
(Frame tag), 111
maxlength
Input tag, 61
text boxes, 62
meta element, 308
method (form tag), 60
name, 18, 308
Input tag, 61
text boxes, 62
namespace, 16-17
noresize (Frame tag), 111
onblur (Input tag), 61
onchange (Input tag), 61
onfocus (Input tag), 61
onselect (Input tag), 61
planning, 239
Readonly (Input tag), 61
rel, 186
rows, text area controls, 67-68
rowspan, 83
scheme, 309
scrolling (Frame tag), 111
size
Input tag, 61
select controls, 66
text boxes, 62
src, 162
Input tag, 61
tabindex, 136
Input tag, 61
target
frames navigation, 118
linking frames, 113-122
TextAlign, 238
type
Input tag, 61
text boxes, 62
unordered lists, 26-28
types, 337-340
usemap (Input tag), 61
valign, 82
value
button controls, 70
check box controls, 64
Input tag, 61
radio buttons, 64
value types, 230
width, 83, 89
xml:lang, 132
B
b (bold) tag, 13, 38-39, 132
banner files, frames, 117-118
banner.html listing, 117-118
Base Module, 356
Basic Forms Module, 346-347
Basic Tables Module, 257, 349
beef.html listing, 120-121
bgcolor attribute, 83
Bi-directional Text Module,
346
bit-maps, 44
blank space, tables, 81
blind accessibility, 128
block level elements, 19-20
emphasis, 39
form elements, 60
formatting, 191-194
blockquote tag, 39
block spacing, 194-196
BML (Broadcast Markup
Language), 327
BODY element, 19
boldface effects, 13-14, 38-39
b (bold) tag, 13, 38-39, 132
WCAG (Web Content Accessibility
Guidelines), 132
bookmarks, frame sites, 123
Boolean attributes, 28
expanding, 165
borders
border attribute, 80
deleting, 51
box models, 191-194
bread.html listing, 120
Broadcast Markup Language
(BML), 327
Browse dialog box, 169
browsers
compact attribute support, 29
compatibility guidelines, 158-166
rendering documents, 14
buttons
browse button, form input controls, 69
data collection forms, 70-71
C
calendar tables, 82-85
calendar.html listing, W3C
HTML Validation Service,
151-153
Cascading Style Sheets (CSS),
166, 185
creating, 186-187
rules, 191-196
selectors, 188-190
compatibility guidelines,
158-166
complex expressions, 226-227
Composite
Capabilities/Preference
Profiles (CC/PP), 318-326
content
empty content, 160
models, 247-248
planning, 264-265
sets, 248
content attribute, 18, 308
content.html listing, 118-119
controls (forms), 60-61
buttons, 70-71
check boxes, 63-64
coding, 71-73
form input, 69-70
hidden, 68-69
image controls, 71
password controls, 62-63
radio buttons, 64-65
select controls, 65-67
text areas, 67-68
text boxes, 61-62
converting XHTML
Transitional to XHTML
Basic, 301
core modules, 341-343
country codes language
attribute, 17
CSS (Cascading Style Sheets),
166, 185
creating, 186-187
rules, 191-196
selectors, 188-190
XML documents, 205
XSLT, 220-221
CuteMAP, 53-54
D
data collection forms, 59
controls, 60-61
buttons, 70-71
check boxes, 63-64
coding, 71-73
form input, 69-70
hidden, 68-69
image controls, 71
password controls, 62-63
radio buttons, 64-65
select controls, 65-67
text areas, 67-68
text boxes, 61-62
form tag, 60
processing information, 73-74
mailto: function, 74
Perl CGI scripting, 74-77
data entry, table alignment,
95-104
E
EBNF (Extended Backus Naur
Form), 226-227
ecommerce, 206
Edit Module, 345
editing meta elements, 313-314
editors
CuteMAP, 53-54
graphics, 45-46
XML, 204
elements. See also tags
block-level, 19-20
BODY, 19
creating attributes, 229-232
deprecated, 14
description meta, 311
Extensible Stylesheet
Language. See XSL
Extensible Stylesheet
Language Transformations.
See XSLT
external style sheets, 186-187
F
Favorites, frame sites, 123
File menu, New Map command, 53
file uploads, data collection
forms, 69-70
Fireworks, 45
fixed-size frames, 123
font styles, 191
form tag, 60
form.cgi listing, 75-76
form.html listing, 71-72
formal public identifiers
(FPIs), 16, 266
formats
GIF, 44
JPEG, 44
PNG, 45
formatting
attributes (DTDs), 229-232
block level, 191-194
classes, 189-190
graphical navigation systems,
48-51
image maps, 52-54
meta elements, 313-314
style sheets, 186-187
XSL style sheets, 215-221
XSL-FO, 213-214
forms
controls, 60-61
buttons, 70-71
check boxes, 63-64
coding, 71-73
form input, 69-70
hidden, 68-69
image controls, 71
password controls, 62-63
radio buttons, 64-65
select controls, 65-67
text areas, 67-68
text boxes, 61-62
data collection, 59
form tag, 60
processing information, 73-74
mailto: function, 74
Perl CGI scripting, 74-77
WCAG (Web Content Accessibility
Guidelines), 136
XForms, 327-331
Forms Module, 347-349
FPIs (formal public identifiers), 16, 266
fragment identifiers, 163-164
G
GIF (Graphics Interchange
Format), 44
global entities, 239
grammar checking, 142
graphical navigation systems,
48-51
graphics
aligning, 47-48
editors, 45-46
formats
GIF, 44
JPEG, 44
PNG, 45
hyperlinks, 48-51
image maps, 52-54
WCAG (Web Content Accessibility
Guidelines), 134-135
grouping items, 226-227
H
h1 (heading) tag, 25, 238
handheld devices, 296
head element, 18-19, 164, 186
head tag (XHTML), 15, 18-19
headers
headers attribute, 96-98
HTTP, 319
http-response, 308
headings, 24-25
height
height attribute, 46, 83
line measurement, 195-196
hidden form controls, 68-69
hiding JavaScript, 161-162
horizontally scrolling frames,
123
Host Language, 279-286
Host Language Conformant,
341-343
hot spots, 52
href attribute, 186
HTML (Hypertext Markup
Language)
character encoding, 164
commenting facilities, 161-162
DOM (Document Object Model),
165
empty elements, 159-160
writing XHTML, 171
versus XHTML, 9
html tag, 15
HTML Tidy, 167-168
HTML-Kit, 170
HTTP headers, 319
http-equiv attribute, 164, 308,
312
http-response headers, 308
hyperlinks
images, 48, 50-51
WCAG (Web Content Accessibility
Guidelines), 134
Hypertext Module, 343
Hypertext Markup Language.
See HTML
I
i (italic) tag, 10, 13, 38-39
id attribute, 96-98, 190
id element, 164
ID token, 231-232
identifying natural language
of documents, 162
IDREF token, 231-232
Iframes Module, 354
Image element, inserting
images, 46-51
Image Module, 351-352
images
aligning, 47-48
editors, 45-46
GIF, 44
hyperlinks, 48-51
image controls, forms, 71
inserting, 46-51
JPEG, 44
maps, 52-54
PNG, 45
WCAG (Web Content Accessibility
Guidelines), 134-135
img element, 228
implementation
abstract modules, 248-254
QName Module, 266-272
indenting text, 195
information collection forms.
See data collection forms
Inline content set, 248
inline elements, 20-21, 192
Image, 46-51
inline emphasis, 38-39
J
JASC Paint Shop Pro, 45
JavaScript, hiding, 161-162
JavaServer, 319
JPEG (Joint Photographic
Experts Group), 44
justifying text, 195
K-L
keywords
search engines, 311
SYSTEM, 16
language attribute, 16-18, 132,
162
language definition (WCAG
guidelines), 132
LanguageCode entity, 236
languages
Host, 279-286
Recipe Markup Language DTD,
287-288
syntax (EBNF), 226-227
lists, 25-26
definition lists, 36-37
nesting, 37-38
ordered lists, 29-30
changing individual values,
32
nested, 33-36
start values, 30
reading allowable attributes,
245-247
unordered lists, 26
compact attribute, 28-29
type attribute, 26, 28
WCAG guidelines, 133
location paths (XPath), 210
absolute/relative, 212-213
M
machine instructions, applying metadata, 310-312
Macintosh Perl CGI scripting,
77
Macromedia Fireworks, 45
mailto: function, 74
main.html listing, 117
managing fragment identifiers, 163-164
maps (image), 52-54
margins
marginheight attribute (Frame
tag), 111
marginwidth attribute (Frame
tag), 111
XML (Extensible Markup
Language), 162
masking password characters,
62
maxlength attribute
Input tag, 61
text boxes, 62
mechanics
XPath, 208
XSL style sheets, 215-221
XSLT, 203
memo.html listing, 12
merging cells, 85-91
meta information, 308-314
meta tag, 15, 18, 164
metadata, 15, 320
applying machine instructions,
310-312
DOCTYPE declarations, 15-16
head elements, 18-19
meta elements, 18
root elements, 16
language attribute, 17-18
namespace attribute, 16-17
title elements, 18
Metainformation Module, 355
method attribute (form
tag), 60
Microsoft Paint, 45
MIME-types, 13
miniature computers, 296
minimal content model,
247-248, 336
models
box, 191-194
Empty content, 228
minimal content, 247-248, 336
Modular Framework, 276-278
integrating, 279-28
modifying images, 47-48
Modular Framework Model,
276-278
integrating, 279-286
modularization, 244
abstract module definitions,
244-248
DTDs, 248-254, 287-288
drivers, 254-260
Web use, 260-261
modules
abstract, 265
Applet, 344
Base, 356
Basic Forms, 346-347
Basic Table, 257
Bi-directional Text, 346
Client-side Image Map, 352
core, 341-343
Edit, 345
Forms, 347-349
Hypertext, 343
Iframes, 354
Image, 351-352
Legacy, 357-358
Link, 356
List, 249-254, 343
Metainformation, 355
Name Identification, 357
Object, 353
Presentation, 345
QName, 266-272
Scripting, 355
Server-side Image Map, 352
Style Attribute, 356
Style Sheet, 356
Tables, 349-351
Text, 342-343
text extension, 345
XHTML Basic, 297
Monthly Calendar View
listing, 84-85
moving
images, 47-48
text, 47-48, 195
N
name attribute, 18, 231, 308
Input tag, 61
text boxes, 62
O
Object Module, 353
object tag, 135
objects
controls, 60-61
buttons, 70-71
check boxes, 63-64
coding, 71-73
form input, 69-70
hidden, 68-69
image controls, 71
password controls, 62-63
radio buttons, 64-65
select controls, 65-67
text areas, 67-68
text boxes, 61-62
WCAG (Web Content Accessibility
Guidelines), 135
XSL-FO, 213-214
occurrence indicators, 228, 336
ol (ordered list) element, 133,
181
P
p (paragraph) elements, 15,
190
Paint (Microsoft), 45
Paint Shop Pro, 45
palm-sized devices, 296
parameter entities (PEs),
234-238, 266-267
parsers, XML, 204-205
comments, 161
password controls, forms,
62-63
paths
Perl scripting, 76
XPath, 210
PEs (parameter entities),
234-238, 266-267
Perl CGI scripting, processing
data collection forms, 74-77
Photoshop, 46
pixels, sizing frames, 110
planning
attributes/global entities, 239
content model, 264-265
PNG (Portable Network
Graphics), 44-45
poly (polygon), 53
position-dependent information, CC/PP, 325-326
poultry.html listing, 121
PPs (parameter entities), 266
prefixes, distinguishing
elements, 207
prep element, 265
presentation attributes
(Frame tag), 111
Presentation Module, 345
printing frame sites, 123
Q-R
QName Module, 266-272
radio buttons, forms, 64-65
RDF (Resource Description
Framework), 320
reading
allowable attributes list, 245-247
DTDs, 232-239
Readonly attribute (Input
tag), 61
Recipe Markup Language
DTD, 287-288
recipes.html listing 6.1,
116-117
rect (rectangle), 52
Reggie Metadata Editor, 314
rel attribute, 186
relative location paths
(XPath), 212-213
relative sizing, frames, 110
rendering documents, 14
revised form listing 5.3
revisions, meta elements,
313-314
robots, 310-312
Robots Exclusion Protocol,
310
root elements, 16, 200
language attribute, 17-18
namespace attribute, 16-17
rows, 80-81
merging cells, 85-91
rows attribute, text area
controls, 67-68
rowspan attribute, 83
S
Scalable Vector Graphics
(SVG), 330
schemas, 182-183, 248-254
well-formedness, 181
scheme attribute, 309
scope (CC/PP), 322
scripts
embedding, 161-162
processing data collection forms,
74-77
Script entity, 237
Scripting Module, 355
WCAG (Web Content Accessibility
Guidelines), 136
scrolling frames, 111, 123
search engines
metadata, 310-312
Search Engine Watch Web site,
312
sections (CDATA), 162
security (CC/PP), 324
select controls, forms, 65-67
select tag, 65
selectors
creating classes, 189-190
style sheets, 188-190
semantics, elemental, 24
sequences, 229
Server-side Image Map
Module, 352
Set of Nested Lists listing,
35-36
SGML (Standard Generalized
Markup Language), 9
shorthand, XML (Extensible
Markup Language), 159
single-occurrence styles, 190
size attribute
Input tag, 61
select controls, 66
text boxes, 62
sizing frames, 109-110, 123
small-footprint devices,
296-297
SMIL (Synchronized
Multimedia Integration
Language), 330
soups.html listing, 119-120
source code listings. See
listings
spacing
letters/words, 196-197
within block,
194-196
speech synthesizers, 97
spidering agents, 310-312
src attribute, 162
Input tag, 61
Standard Generalized Markup
Language (SGML), 9
start values, ordered lists,
30-32
storage, content models,
264-265
T
tabindex attribute, 136
Input tag, 61
Tables Module, 350-351
tables, 79-84
alignment, 95-102, 104
align attribute, 82
axis attribute, 98-101
bgcolor attribute, 83
blank space, 81
border attribute, 80
calendar, 82-85
cells, 80-81
denoting empty cells, 160
colspan attribute, 83
Expense Report File listing,
101-104
height attribute, 83
merging cells, 85-91
nesting, 92-95
reading with speech synthesizers, 97
rows, 80-81
rowspan attribute, 83
WCAG (Web Content Accessibility
Guidelines), 133
width attribute, 83, 89
tags. See also elements
applet, 135
b (bold), 13, 38-39, 132
blockquote, 39
dl (definition lists), 36-37
nesting lists, 37-38
!DOCTYPE, 15-16
em (emphasis), 10, 13, 38
empty elements, 159-160
enctype, 70
form, 60
Frame, presentation attributes,
111
h1 (heading), 25
head, 15, 18-19
html, 15
i (italic), 10, 13, 38-39
Input, 60-61
li (ordered lists), 29-30
changing individual values,
32
nested lists, 33-36
start values, 30
meta, 15, 18
noframes, 115-116
noscript, 136
object, 135
ol, 133
p (paragraph), 15
select, 65
strong, 13, 38
syntax checking, 143-144
TD, headers attribute, 96-98
TH
axis attribute, 98-101
id attribute, 96-98
title, 15, 18
ul (unordered lists), 26, 133
compact attribute, 28-29
type attribute, 26-28
target attribute, frames,
113-122
TD tag, 96-98
Techniques for WCAG document Web site, 131
text
aligning, 47-48, 195
forms
text area controls, 67-68
text boxes, 61-62
indenting, 195
styles, 191
XSLT, 200-201
U
ul (unordered lists) tag, 26
compact attribute, 28-29
type attribute, 26, 28
ul tag (XHTML), 133
unabbreviated syntax, XPath,
211-212
Uniform Resource Identifiers
(URIs), 17, 207
universal accessibility, 129-130
unordered lists, 26, 228
compact attribute, 28-29
type attribute, 26-28
updating to XHTML Basic,
297, 300-303
uploading files, data collection
forms, 69-70
URIs (Uniform Resource
Identifiers), 17, 207
URLs (Uniform Resource
Locators), 207
processing data collection forms, 77
XSLT
usemap attribute (Input
tag), 61
utilities, 166
HTML Tidy, 167-168
HTML-Kit, 170
meta elements, 313-314
TidyGUI, 168-169
V
validating documents
grammar checking, 142
interoperability, 144-145
syntax errors, 143-144
typographical errors, 142
W3C HTML Validation Service,
145-147
error reports, 147-155
validity, 10
valign attribute, 82
value attribute
button controls, 70
check box controls, 64
Input tag, 61
radio buttons, 64
value types, attributes, 230
http-equiv, 312
veggies.html listing, 120
vertical alignment (valign)
attribute, 82
W
W3C (World Wide Web
Consortium)
CC/PP, 323-324
HTML Validation Service,
145-147
error reports, 147-155
Web site, 145
WAI program, 129-130
Web
browsers, rendering documents,
14
disabled accessibility, 127-129
W3C (World Wide Web
Consortium), 129-130
WCAG (Web Content
Accessibility Guidelines),
130-139
graphics editors, 45-46
modularized DTD (Document
Type Definition), 260-261
sites
checklist of checkpoints, 131
DOM (Document Object
Model), 205
Internet Content Rating
Association, 310
PNG, 44
Search Engine Watch, 312
Techniques for WCAG
document, 131
TidyGUI, 168
W3C HTML Validation
Service, 145
WCAG document, 130
XForms, 327-331
XHTML Basic, 297, 300-303
Web Content Accessibility
Guidelines (WCAG), 130-131
applets, 135
checkpoints, 130, 136-139
designing documents, 131
emphasizing text, 132
forms, 136
graphics, 134-135
hyperlinks, 134
language definition, 132
lists, 133
objects, 135
scripts, 136
tables, 133
WCAG document Web site, 130
WebTV, 297
well-formedness, 10, 178-181
white space, 206, 227
deleting, 51
XML (Extensible Markup
Language), 162
width attribute, 46, 83, 89
wireless access, 297
CC/PP (Composite
Capabilities/Preference
Profiles), 318-326
word spacing, 196-197
World Wide Web. See Web
Working Drafts, 214
writing
documents, well-formedness,
178-181
XHTML with HTML tools, 171
X-Z
XForms, 327-331
XHTML (Extensible Hypertext
Markup Language)
CC/PP, 322-323
HTML comparison, 9
Modular Framework Module,
276-278
integrating, 279-286
structure requirements, 10
tags. See elements; tags
validity, 10
XHTML Basic, 297
evaluating results, 305