Guide Htmldoc
Table of Contents
Introduction
    History
    Organization of This Manual
    Support
    Encryption Support
    Copyright, Trademark, and License Information
Chapter 1 - Installing HTMLDOC
    Requirements
    Installing HTMLDOC
        Installing HTMLDOC on Microsoft Windows
        Installing HTMLDOC on MacOS X
        Installing HTMLDOC on Linux
        Installing HTMLDOC on Solaris
    Licensing HTMLDOC
    Uninstalling HTMLDOC
        Uninstalling HTMLDOC on Microsoft Windows
        Uninstalling HTMLDOC on MacOS X
        Uninstalling HTMLDOC on Linux
        Uninstalling HTMLDOC on Solaris
Chapter 2 - Getting Started
    Starting HTMLDOC
    Choosing a HTML File
    Setting the Output File
    Generating the Document
Chapter 3 - Generating Books
    Overview
    Choosing HTML Files
    Selecting a Title File
    Setting the Output Format
    Setting the Output File
    Generating the Document
    Saving Your Book
Chapter 4 - HTMLDOC from the Command-Line
    Getting to the Command-Line on Windows
    The Basics of Command-Line Access
        What Are All These Commands?
    Converting Multiple HTML Files
    Generating Books
        What are all these commands?
    Setting the Title File
        Putting It All Together
Chapter 5 - Using HTMLDOC on a Web Server
    The Basics
    Using HTMLDOC as a CGI Program
        Server-Side Preferences
        Configuring HTMLDOC with Apache
        Configuring HTMLDOC with Microsoft IIS
        Additional Configuration for IIS 6.0
    Using HTMLDOC From Server-Side Scripts and Programs
        Calling HTMLDOC from a Shell Script
        Calling HTMLDOC from Perl
        Calling HTMLDOC from PHP
        Calling HTMLDOC from C
        Calling HTMLDOC from Java
Chapter 6 - HTML Reference
    General Usage
    Elements
    Comments
        Header/Footer Strings
    FONT Attributes
    Headings
        Numbered Headings
    Images
    Links
    META Attributes
    Page Breaks
    Tables
Chapter 7 - GUI Reference
    The HTMLDOC GUI
        Document File Operations
        New
        Open
        Save
        Save As
        Generate
        Close
    The Input Tab
        Document Type
        Input Files
        Add Files
        Edit Files
        Delete Files
        Move Up
        Move Down
        Logo Image
        Title File/Image
    The Output Tab
        Output To
        Output Path
        Output Format
        Output Options
        Compression
        JPEG Quality
    The Page Tab
        Page Size
        2-Sided
        Landscape
        Top, Left, Right, and Bottom
        Header and Footer
    The TOC Tab
        Table of Contents
        Numbered Headings
        Header and Footer
        Title
    The Colors Tab
        Body Color
        Body Image
        Text Color
        Link Color
        Link Style
    The Fonts Tab
        Base Font Size
        Line Spacing
        Body Typeface
        Heading Typeface
        Header/Footer Size
        Header/Footer Font
        Character Set
        Options
    The PS Tab
        PostScript Level
        Send Printer Commands
        Include Xerox Job Comments
    The PDF Tab
        PDF Version
        Page Mode
        Page Layout
        First Page
        Page Effect
        Page Duration
        Effect Duration
    The Security Tab
        Encryption
        Permissions
        Owner Password
        Options
        User Password
    The Options Tab
        HTML Editor
        Browser Width
        Search Path
        Proxy URL
        Tooltips
        Modern Look
        Strict HTML
        Save Options and Defaults
    The File Chooser
        Show
        Favorites
        File List
        Filename
        Dialog Buttons
Chapter 8 - Command-Line Reference
    Basic Usage
    Options
        -d directory
        -f filename
        -t format
        -v
        --batch filename.book
        --bodycolor color
        --bodyfont typeface
        --bodyimage filename
        --book
        --bottom margin
        --browserwidth pixels
        --charset charset
        --color
        --compression[=level]
        --continuous
        --cookies 'name=\"value with space\"; name=value'
        --datadir directory
        --duplex
        --effectduration seconds
        --embedfonts
        --encryption
        --firstpage page
        --fontsize size
        --fontspacing spacing
        --footer lcr
        --format format
        --gray
        --header lcr
        --headfootfont font
        --headfootsize size
        --headingfont typeface
        --help
        --helpdir directory
        --jpeg[=quality]
        --landscape
        --left margin
        --linkcolor color
        --links
        --linkstyle style
        --logoimage filename
        --no-compression
        --no-duplex
        --no-embedfonts
        --no-encryption
        --no-jpeg
        --no-links
        --no-localfiles
        --no-numbered
        --no-pscommands
        --no-strict
        --no-title
        --no-toc
        --no-xrxcomments
        --numbered
        --nup pages
        --outdir directory
        --outfile filename
        --owner-password password
        --pageduration seconds
        --pageeffect effect
        --pagelayout layout
        --pagemode mode
        --path dir1;dir2;dir3;...;dirN
        --permissions permission[,permission,...]
        --portrait
        --pscommands
        --quiet
        --referer url
        --right margin
        --size size
        --strict
        --textcolor color
        --textfont typeface
        --title
        --titlefile filename
        --titleimage filename
        --tocfooter lcr
        --tocheader lcr
        --toclevels levels
--toctitle string..............................................................................................................................8-19
--top margin..................................................................................................................................8-19
--user-password password............................................................................................................8-19
--verbose.......................................................................................................................................8-19
--version.......................................................................................................................................8-19
--webpage.....................................................................................................................................8-19
--xrxcomments.............................................................................................................................8-19
Environment Variables.......................................................................................................................8-20
HTMLDOC_DATA.....................................................................................................................8-20
HTMLDOC_DEBUG..................................................................................................................8-20
HTMLDOC_HELP......................................................................................................................8-20
HTMLDOC_NOCGI...................................................................................................................8-20
Messages.............................................................................................................................................8-21
BYTES: Message.........................................................................................................................8-21
DEBUG: Messages......................................................................................................................8-21
ERRnnn: Messages......................................................................................................................8-21
INFO: Messages...........................................................................................................................8-21
PAGES: Message.........................................................................................................................8-21
REMOTEBYTES: Message........................................................................................................8-21
TIMING: Message.......................................................................................................................8-22
Appendix A - License Agreement..................................................................................................................A-1
Introduction..........................................................................................................................................A-1
Source Code and the GNU GPL...................................................................................................A-1
Trademarks....................................................................................................................................A-2
Binary Distribution Rights............................................................................................................A-2
Binaries and Support.....................................................................................................................A-2
End-User License Agreement..............................................................................................................A-3
TERMS AND CONDITIONS OF SOFTWARE LICENSE........................................................A-3
LIMITED WARRANTY AND DISCLAIMER OF WARRANTY; LIMITATION OF
LIABILITY...........................................................................................................................A-4
Appendix B - Book File Format.....................................................................................................................B-1
Introduction..........................................................................................................................................B-1
The Header...........................................................................................................................................B-1
The Options..........................................................................................................................................B-2
The Files..............................................................................................................................................B-2
Putting It All Together.........................................................................................................................B-2
Older Book Files..................................................................................................................................B-3
Appendix C - Release Notes...........................................................................................................................C-1
Changes in HTMLDOC v1.8.27..........................................................................................................C-1
Changes in HTMLDOC v1.8.26..........................................................................................................C-1
Changes in HTMLDOC v1.8.25..........................................................................................................C-1
Changes in HTMLDOC v1.8.24..........................................................................................................C-1
Changes in HTMLDOC v1.8.23..........................................................................................................C-2
Changes in HTMLDOC v1.8.22..........................................................................................................C-2
Changes in HTMLDOC v1.8.21..........................................................................................................C-2
Changes in HTMLDOC v1.8.20..........................................................................................................C-2
Changes in HTMLDOC v1.8.19..........................................................................................................C-3
Changes in HTMLDOC v1.8.18..........................................................................................................C-3
Changes in HTMLDOC v1.8.17..........................................................................................................C-3
Changes in HTMLDOC v1.8.16..........................................................................................................C-4
Changes in HTMLDOC v1.8.15..........................................................................................................C-4
Changes in HTMLDOC v1.8.14..........................................................................................................C-4
Changes in HTMLDOC v1.8.13..........................................................................................................C-4
Changes in HTMLDOC v1.8.12..........................................................................................................C-4
Changes in HTMLDOC v1.8.8............................................................................................................C-5
Changes in HTMLDOC v1.8.7............................................................................................................C-5
Changes in HTMLDOC v1.8.6............................................................................................................C-5
Changes in HTMLDOC v1.8.5............................................................................................................C-5
Changes in HTMLDOC v1.8.4............................................................................................................C-5
Changes in HTMLDOC v1.8.3............................................................................................................C-5
Changes in HTMLDOC v1.8.2............................................................................................................C-5
Changes in HTMLDOC v1.8.1............................................................................................................C-6
Changes in HTMLDOC v1.8...............................................................................................................C-6
Appendix D - Compiling HTMLDOC from Source....................................................................................C-6
Requirements.......................................................................................................................................C-6
Compiling under UNIX/Linux.............................................................................................................C-6
Compiling on Windows Using Visual C++.........................................................................................C-6
Installing with Visual C++...................................................................................................................C-6
Introduction
This document describes how to use the HTMLDOC software, version 1.8.27. HTMLDOC converts
Hyper-Text Markup Language ("HTML") input files into indexed HTML, Adobe PostScript, or Adobe
Portable Document Format ("PDF") files.
HTMLDOC supports most HTML 3.2 elements and some HTML 4.0 elements, and can generate title and table of
contents pages. It does not currently support stylesheets.
HTMLDOC can be used as a standalone application, in a batch document processing environment, or as a
web-based report generation application.
No restrictions are placed upon the output produced by HTMLDOC.
HTMLDOC is available both as open source software under the terms of the GNU General Public License
and as commercial software under the terms of a traditional commercial End-User License Agreement.
History
Like many programs, HTMLDOC was developed in response to a need our company had for generating
high-quality documentation in printed and electronic forms. For a while we used FrameMaker and a
package from sgi that generated "compiled" Standard Generalized Markup Language ("SGML") files that
could be used by the Electronic Book Technologies ("EBT") documentation products; EBT was bought by
INSO, which was bought by Stellent™, which apparently has dropped the whole product line. When sgi stopped
supporting these tools we turned to INSO, but the cost of their tools is prohibitive to small businesses.
In the end we decided to write our own program to generate our documentation. HTML seemed to be the
source format of choice since WYSIWYG HTML editors are widely (and freely) available and at worst you
Support
Commercial support is available from Easy Software Products when you purchase the HTMLDOC
Professional Membership. More information is available at the HTMLDOC web page at the following URL:
http://www.easysw.com/htmldoc/
Encryption Support
HTMLDOC includes code to encrypt PDF document files using the RC4 algorithm with up to a 128-bit key.
While this software and code may be freely used and exported under current US laws, other countries may
restrict your use and possession of this code and software.
This chapter describes the steps needed to install the commercial version of HTMLDOC on your system. If
you are installing HTMLDOC from source code, please see Appendix D, Compiling HTMLDOC from
Source.
Requirements
HTMLDOC requires approximately 4MB of disk space and one of the following environments:
Microsoft Windows 2000 or higher
MacOS X 10.2 or higher
Linux 2.4 or higher
Solaris 7 or higher
HTMLDOC may run on other platforms; however, we do not provide packages for platforms other than those
listed.
Installing HTMLDOC
The following instructions describe how to install the HTMLDOC software on your system.
Licensing HTMLDOC
Before you can use HTMLDOC, you must license it. When you first run HTMLDOC, the license dialog
(Figure 1-3) will appear.
Uninstalling HTMLDOC
The following instructions describe how to remove the HTMLDOC software from your system.
This chapter describes how to start HTMLDOC and convert HTML files into PostScript and PDF files.
Note:
HTMLDOC currently does not support HTML 4.0 features such as stylesheets or the
STYLE, TBODY, THEAD, or TFOOT elements. For more information, please consult
Chapter 6 - HTML Reference.
Starting HTMLDOC
For Windows click:
Start Menu->All Programs->HTMLDOC->HTMLDOC
or type:
htmldoc ENTER
or type:
htmldoc ENTER
Overview
While HTMLDOC can convert web pages into PostScript and PDF files, its real strength is generating
indexed HTML, PostScript, or PDF books.
HTMLDOC uses HTML heading elements to delineate chapters and headings in a book. The H1 element is
used for chapters:
<HTML>
<HEAD>
<TITLE>The Little Computer that Could</TITLE>
</HEAD>
<BODY>
<H1>Chapter 1 - The Little Computer is Born</H1>
...
<H1>Chapter 2 - Little Computer's First Task</H1>
...
</BODY>
</HTML>
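As a rough illustration of how H1 elements map to chapters, the following Python sketch lists the chapter titles found in an HTML string. It is not part of HTMLDOC; the names ChapterLister and list_chapters are hypothetical, and the sketch only looks at H1 elements, ignoring everything else in the file.

```python
from html.parser import HTMLParser


class ChapterLister(HTMLParser):
    """Collect the text of every H1 element, in document order."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <H1> arrives as "h1".
        if tag == "h1":
            self._in_h1 = True
            self.chapters.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the open heading.
        if self._in_h1:
            self.chapters[-1] += data


def list_chapters(html_text):
    """Return the chapter titles (H1 text) found in html_text."""
    parser = ChapterLister()
    parser.feed(html_text)
    parser.close()
    return [title.strip() for title in parser.chapters]
```

Running list_chapters() on the example above would return one title per H1 element, which is a quick way to check that a file is structured the way a book requires.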
This chapter describes how to use HTMLDOC from the command-line to convert web pages and generate
books.
You now have a list of available files and directories that you can use. To access a different directory simply
type cd and the name of the new directory. For example, type the following if you want to access a directory
called Steve:
cd Steve ENTER
All we are doing is adding another file. In this example we are converting two files: file1.html and file2.html.
Try this example: Convert one.html and two.html into a PDF file named 12pdf.pdf. Again, the answer is on
the next line.
Your command line should look like this:
htmldoc --webpage -f 12pdf.pdf one.html two.html ENTER
We've been using HTML files, but you can also use URLs. For example:
htmldoc --webpage -f output.pdf http://slashdot.org/ ENTER
Generating Books
Type one of the following commands to generate a book from one or more HTML files:
htmldoc --book -f output.html file1.html file2.html ENTER
htmldoc --book -f output.pdf file1.html file2.html ENTER
htmldoc --book -f output.ps file1.html file2.html ENTER
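When these commands are issued from a script rather than typed by hand, it can help to build the argument list in one place. The Python sketch below assembles the same commands shown above; htmldoc_command is a hypothetical helper, and it only uses the --book, --webpage, and -f options that appear in this chapter.

```python
MODE_OPTIONS = {"book": "--book", "webpage": "--webpage"}


def htmldoc_command(mode, output, inputs):
    """Build an htmldoc argument list for the given mode, output file, and inputs.

    mode is "book" or "webpage"; inputs is a list of HTML files or URLs.
    """
    if mode not in MODE_OPTIONS:
        raise ValueError("mode must be 'book' or 'webpage'")
    if not inputs:
        raise ValueError("at least one input file or URL is required")
    return ["htmldoc", MODE_OPTIONS[mode], "-f", output, *inputs]


# Equivalent to: htmldoc --book -f output.pdf file1.html file2.html
command = htmldoc_command("book", "output.pdf", ["file1.html", "file2.html"])
```

The resulting list can be passed to subprocess.run(command) on a system where htmldoc is installed; passing a list avoids shell quoting problems with unusual filenames.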
--titlefile
HTMLDOC supports BMP, GIF, JPEG, and PNG images, as well as generic HTML text you supply for the
title page(s).
Take a look at the entire command line. Dissect the information. Can you see what the new filename is? What
are the names of the files being converted? Do you see the titlepage file? What kind of file is your titlefile?
Figure it out? The new file is 12book.pdf. The files converted were 1book.html and 2book.html. A title
page was created using the JPEG image file bookcover.jpg.
Chapter 8 - Command-Line Reference digs deeper into what you can do at the command-line prompt.
This chapter describes how to interface HTMLDOC to your web server using CGI and your own server-side
scripts and programs.
The Basics
HTMLDOC can be used in a variety of ways to generate formatted reports on a web server. The most
common way is to use HTMLDOC as a CGI program with your web server to provide PDF-formatted output
of a web page. Examples are provided for Microsoft IIS and the Apache web servers.
HTMLDOC can also be called from your own server-side scripts and programs. Examples are provided for
PHP and Java.
WARNING:
Passing information directly from the web browser to HTMLDOC can potentially
expose your system to security risks. Always be sure to "sanitize" any input from the
web browser so that filenames, URLs, and options passed to HTMLDOC are not
acted on by the shell program or other processes.
and if you installed HTMLDOC in your server's cgi-bin directory, you would direct your clients to the
following URL:
http://servername/cgi-bin/htmldoc/superproducts.html
The /cgi-bin/htmldoc portion of the URL represents the location of the HTMLDOC executable on the web
server. You simply place that path before the page you want to convert.
Form data using the GET method can be passed at the end of the URL, for example:
http://servername/cgi-bin/htmldoc/superproducts.html?name=value
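The URL layout described above can be sketched in Python as follows. The function name cgi_url and its parameters are hypothetical; the default CGI path assumes HTMLDOC was installed in the server's cgi-bin directory as in the example.

```python
from urllib.parse import quote, urlencode


def cgi_url(server, page_path, params=None, cgi_path="/cgi-bin/htmldoc"):
    """Return the URL that asks the HTMLDOC CGI to convert page_path.

    The path of the HTMLDOC executable is placed before the page path,
    and any GET form data is appended as a query string.
    """
    url = "http://" + server + cgi_path + quote(page_path)
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, cgi_url("servername", "/superproducts.html", {"name": "value"}) produces the second URL shown above.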
Server-Side Preferences
When run as a CGI program, HTMLDOC will try to read a book file to set any preferences for the conversion
to PDF. For the superproducts.html file described previously, HTMLDOC will look at the following URLs
for a book file:
http://servername/superproducts.html.book
http://servername/.book
http://servername/cgi-bin/.book
If you are using Apache 2.0.30 or higher, you will also need to enable PATH_INFO support by adding the
following line to your httpd.conf file:
AcceptPathInfo On
Apache also allows you to associate CGI programs with a specific extension. If you add the following line to
your httpd.conf file:
AddHandler cgi-script .cgi
and enable CGI execution with the Options directive for a directory:
Options +ExecCGI
then you can copy or symlink the htmldoc executable to an alternate location. For example, if you have a web
directory called /var/www/htdocs/products, you can install HTMLDOC in this directory with the following
command:
ln -s /usr/bin/htmldoc /var/www/htdocs/products/htmldoc.cgi ENTER
The boldface portion represents the location of the HTMLDOC program on the web server.
Users of this CGI would reference the URL "http://www.domain.com/topdf.cgi/index.html" to generate a PDF
file of the site's home page.
The options variable in the script can be set to use any supported command-line option for HTMLDOC; for a
complete list see Chapter 8 - Command-Line Reference.
The function accepts a filename and an optional "options" string for specifying the header, footer, fonts, etc.
To prevent malicious users from passing in unauthorized characters into this function, the following function
can be used to verify that the URL/filename does not contain any characters that might be interpreted by the
shell:
Another method is to use the escapeshellarg() function provided with PHP 4.0.3 and higher to
generate a quoted shell argument for HTMLDOC.
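For readers scripting HTMLDOC from Python rather than PHP, the two approaches described here (rejecting suspicious input, and quoting the argument) might be sketched as follows. This is an illustrative analogue, not the PHP function from this chapter: the allowlist of characters is an assumption chosen to be conservative, and shlex.quote() plays the role that escapeshellarg() plays in PHP.

```python
import re
import shlex

# Assumed allowlist: letters, digits, and a few URL/path characters.
# Anything else (spaces, quotes, ;, |, &, $, backticks, ...) is rejected.
_SAFE_CHARS = re.compile(r"[A-Za-z0-9_./:%~-]+")


def bad_url(url):
    """Return True if url should not be passed on to HTMLDOC."""
    if _SAFE_CHARS.fullmatch(url) is None:
        return True
    # A leading "-" could be mistaken for a command-line option.
    return url.startswith("-")


def quoted_argument(value):
    """Return value quoted so that a POSIX shell treats it as one argument."""
    return shlex.quote(value)
```

Note that this allowlist also rejects URLs containing a query string; such URLs would need the quoting approach, or the allowlist would need to be extended with care.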
To make a "portal" script, add the following code to complete the example:
global $SERVER_NAME;
global $SERVER_PORT;
global $PATH_INFO;
global $QUERY_STRING;

if ($QUERY_STRING != "") {
  $url = "http://${SERVER_NAME}:${SERVER_PORT}${PATH_INFO}?${QUERY_STRING}";
} else {
  $url = "http://${SERVER_NAME}:${SERVER_PORT}$PATH_INFO";
}

if (bad_url($url)) {
  print("<html><head><title>Bad URL</title></head>\n"
       ."<body><h1>Bad URL</h1>\n"
       ."<p>The URL <b><tt>$url</tt></b> is bad.</p>\n"
       ."</body></html>\n");
} else {
  topdf($url);
}
  /*
   * Tell HTMLDOC not to run in CGI mode...
   */

  putenv("HTMLDOC_NOCGI=1");

  /*
   * Write the content type to the client...
   */

  puts("Content-Type: application/pdf\n");

  /*
   * Run HTMLDOC to provide the PDF file to the user...
   */

  sprintf(command, "htmldoc --quiet -t pdf --webpage %s", filename);

  return (popen(command, "w"));
}
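The same three steps (set HTMLDOC_NOCGI, write the content type, run HTMLDOC) can be sketched in Python. The function names below are hypothetical, and the sketch assumes, as the C code above does, that htmldoc writes the PDF to standard output when no output file is given. Passing an argument list instead of a formatted command string avoids involving the shell.

```python
import os
import subprocess
import sys


def pdf_cgi_invocation(filename):
    """Return (header, argv, env) for serving filename as PDF from a CGI script."""
    # Tell HTMLDOC not to run in CGI mode.
    env = dict(os.environ, HTMLDOC_NOCGI="1")
    # Content type followed by the blank line that ends the CGI header.
    header = "Content-Type: application/pdf\n\n"
    # Same options as the C example above.
    argv = ["htmldoc", "--quiet", "-t", "pdf", "--webpage", filename]
    return header, argv, env


def serve_pdf(filename):
    """Write the header, then let HTMLDOC send the PDF to the client."""
    header, argv, env = pdf_cgi_invocation(filename)
    sys.stdout.write(header)
    sys.stdout.flush()  # the header must reach the client before the PDF data
    return subprocess.run(argv, env=env, check=False).returncode
```

The filename should still be validated before it is passed to serve_pdf(), for the reasons given in the warning at the start of this chapter.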
This chapter defines all of the HTML elements and attributes that are recognized and supported by
HTMLDOC.
General Usage
There are two types of HTML files - structured documents using headings (H1, H2, etc.) which HTMLDOC
calls "books", and unstructured documents that do not use headings which HTMLDOC calls "web pages".
A very common mistake is to try converting a web page using:
htmldoc -f filename.pdf filename.html
which will likely produce a PDF file with no pages. To convert web page files you must use the --webpage
option at the command-line or choose Web Page in the input tab of the GUI.
Note:
HTMLDOC does not support HTML 4.0 elements, attributes, stylesheets, or scripting.
6-1
Elements
The following HTML elements are recognized by HTMLDOC:
Element      Version  Supported?  Notes
!DOCTYPE     3.0      Yes         DTD is ignored
A            1.0      Yes         See Below
ACRONYM      2.0      Yes         No font change
ADDRESS      2.0      Yes
AREA         2.0      No
B            1.0      Yes
BASE         2.0      No
BASEFONT     1.0      No
BIG          2.0      Yes
BLINK        2.0      No
BLOCKQUOTE   2.0      Yes
BODY         1.0      Yes
BR           2.0      Yes
CAPTION      2.0      Yes
CENTER       2.0      Yes
CITE         2.0      Yes         Italic/Oblique
CODE         2.0      Yes         Courier
DD           2.0      Yes
DEL          2.0      Yes         Strikethrough
DFN          2.0      Yes         Helvetica
DIR          2.0      Yes
DIV          3.2      Yes
DL           2.0      Yes
DT           2.0      Yes         Italic/Oblique
EM           2.0      Yes         Italic/Oblique
EMBED        2.0      Yes         HTML Only
FONT         2.0      Yes         See Below
FORM         2.0      No
FRAME        3.2      No
FRAMESET     3.2      No
H1           1.0      Yes
H2           1.0      Yes
H3           1.0      Yes
H4           1.0      Yes
H5           1.0      Yes
H6           1.0      Yes
HEAD         1.0      Yes
HR           1.0      Yes         See Below
HTML         1.0      Yes
I            1.0      Yes
IMG          1.0      Yes         See Below
INPUT        2.0      No
INS          2.0      Yes         Underline
ISINDEX      2.0      No
KBD          2.0      Yes         Courier Bold
LI           2.0      Yes
LINK         2.0      No
MAP          2.0      No
MENU         2.0      Yes
META         2.0      Yes         See Below
MULTICOL     N3.0     No
NOBR         1.0      No
NOFRAMES     3.2      No
OL           2.0      Yes
OPTION       2.0      No
P            1.0      Yes
PRE          1.0      Yes
S            2.0      Yes         Strikethrough
SAMP         2.0      Yes         Courier
SCRIPT       2.0      No
SELECT       2.0      No
SMALL        2.0      Yes
SPACER       N3.0     Yes
STRIKE       2.0      Yes
STRONG       2.0      Yes         Boldface Italic/Oblique
SUB          2.0      Yes         Reduced Fontsize
SUP          2.0      Yes         Reduced Fontsize
TABLE        2.0      Yes         See Below
TD           2.0      Yes
TEXTAREA     2.0      No
TH           2.0      Yes         Boldface Center
TITLE        2.0      Yes
TR           2.0      Yes
TT           2.0      Yes         Courier
U            1.0      Yes
UL           2.0      Yes
VAR          2.0      Yes         Helvetica Oblique
WBR          1.0      No
Comments
HTMLDOC supports many special HTML comments to initiate page breaks, set the header and footer text,
and control the current media options:
<!-- FOOTER LEFT "foo" -->
Sets the left footer text; the text is applied to the current page if the page is empty, or to the next page otherwise.
<!-- FOOTER CENTER "foo" -->
Sets the center footer text; the text is applied to the current page if the page is empty, or to the next page otherwise.
<!-- FOOTER RIGHT "foo" -->
Sets the right footer text; the text is applied to the current page if the page is empty, or to the next page otherwise.
<!-- HALF PAGE -->
Break to the next half page.
<!-- HEADER LEFT "foo" -->
Sets the left header text; the text is applied to the current page if the page is empty, or to the next page otherwise.
<!-- HEADER CENTER "foo" -->
Sets the center header text; the text is applied to the current page if the page is empty, or to the next page otherwise.
Header/Footer Strings
The HEADER and FOOTER comments allow you to set an arbitrary string of text for the left, center, and right
headers and footers. Each string consists of plain text; special values or strings can be inserted using the dollar
sign ($):
$$
Inserts a single dollar sign in the header.
$CHAPTER
Inserts the current chapter heading.
$CHAPTERPAGE
$CHAPTERPAGE(format)
Inserts the current page number within a chapter or file. When a format is specified, uses that numeric
format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase
ascii, A = uppercase ascii) for the page numbers.
$CHAPTERPAGES
$CHAPTERPAGES(format)
Inserts the total page count within a chapter or file. When a format is specified, uses that numeric
format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase
ascii, A = uppercase ascii) for the page count.
$DATE
Inserts the current date.
$HEADING
Inserts the current heading.
$LOGOIMAGE
Inserts the logo image; all other text in the string will be ignored.
$PAGE
$PAGE(format)
Inserts the current page number. When a format is specified, uses that numeric format (1 = decimal, i
= lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii)
for the page numbers.
$PAGES
$PAGES(format)
Inserts the total page count. When a format is specified, uses that numeric format (1 = decimal, i =
lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii)
for the page count.
$TIME
Inserts the current time.
$TITLE
Inserts the document title.
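To make the substitution rules concrete, here is a small Python sketch that expands a header or footer string. It is an illustration only, not HTMLDOC's implementation: the function names are hypothetical, only numeric and plain-text values supplied by the caller are handled ($DATE, $TIME, and $LOGOIMAGE are left out), and the letter sequence after "z" (aa, ab, ...) is an assumption.

```python
import re


def to_roman(n):
    """Convert a positive integer to uppercase roman numerals."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n):
    """Convert 1 -> A ... 26 -> Z; assumes 27 -> AA, 28 -> AB, and so on."""
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("A") + rem))
    return "".join(reversed(out))


def format_number(n, fmt="1"):
    """Format n using one of the format codes 1, i, I, a, A."""
    if fmt == "1":
        return str(n)
    if fmt in ("i", "I"):
        roman = to_roman(n)
        return roman if fmt == "I" else roman.lower()
    if fmt in ("a", "A"):
        letters = to_letters(n)
        return letters if fmt == "A" else letters.lower()
    raise ValueError("unknown format: " + fmt)


# "$$", or "$NAME" with an optional "(format)" suffix.
_TOKEN = re.compile(r"\$\$|\$([A-Z]+)(?:\(([1iIaA])\))?")


def expand_header(template, numbers, strings):
    """Expand $NAME tokens using numbers (e.g. PAGE) and strings (e.g. TITLE)."""
    def replace(match):
        name, fmt = match.group(1), match.group(2)
        if name is None:  # the "$$" case
            return "$"
        if name in numbers:
            return format_number(numbers[name], fmt or "1")
        if name in strings:
            return strings[name]
        return match.group(0)  # unknown names are left untouched
    return _TOKEN.sub(replace, template)
```

With numbers {"PAGE": 4, "PAGES": 12}, the string "Page $PAGE(i) of $PAGES(i)" expands to "Page iv of xii".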
FONT Attributes
Limited typeface specification is currently supported to ensure portability across platforms and for older
PostScript printers:
Requested Font   Actual Font
Arial            Helvetica
Courier          Courier
Dingbats         Dingbats
Helvetica        Helvetica
Monospace
Sans             DejaVu Sans
Serif            DejaVu Serif
Symbol           Symbol
Times            Times
Headings
Currently HTMLDOC supports a maximum of 1000 chapters (H1 headings). This limit can be increased by
changing the MAX_CHAPTERS constant in the config.h file included with the source code.
All chapters start with a top-level heading (H1) markup. Any headings within a chapter must be of a lower
level (H2 to H15). Each chapter starts a new page or the next odd-numbered page if duplexing is selected.
Note:
Heading levels 7 to 15 are not standard HTML and will not likely be recognized by
most web browsers.
The headings you use within a chapter must start at level 2 (H2). If you skip levels the heading will be shown
under the last level that was known. For example, if you use the following hierarchy of headings:
<H1>Chapter Heading</H1>
...
<H2>Section Heading 1</H2>
...
<H2>Section Heading 2</H2>
...
<H3>Sub-Section Heading 1</H3>
...
<H4>Sub-Sub-Section Heading 1</H4>
...
<H4>Sub-Sub-Section Heading 2</H4>
...
<H3>Sub-Section Heading 2</H3>
...
<H2>Section Heading 3</H2>
Numbered Headings
When the numbered headings option is enabled, HTMLDOC recognizes the following additional attributes for
all heading elements:
VALUE="#"
Specifies the starting value for this heading level (default is "1" for all new levels).
TYPE="1"
Specifies that decimal numbers should be generated for this heading level.
TYPE="a"
Specifies that lowercase letters should be generated for this heading level.
TYPE="A"
Specifies that uppercase letters should be generated for this heading level.
TYPE="i"
Specifies that lowercase roman numerals should be generated for this heading level.
TYPE="I"
Specifies that uppercase roman numerals should be generated for this heading level.
Images
HTMLDOC supports loading of BMP, GIF, JPEG, and PNG image files. EPS and other types of image files
are not supported at this time.
Links
External URL and internal (#target and filename.html) links are fully supported for HTML and PDF
output.
When generating PDF files, local PDF file links will be converted to external file links for the PDF viewer
instead of URL links. That is, you can directly link to another local PDF file from your HTML document
with:
<A HREF="filename.pdf">...</A>
META Attributes
HTMLDOC supports the following META attributes for the title page and document information:
<META NAME="AUTHOR" CONTENT="...">
Specifies the document author.
<META NAME="COPYRIGHT" CONTENT="...">
Specifies the document copyright.
<META NAME="DOCNUMBER" CONTENT="...">
Specifies the document number.
<META NAME="GENERATOR" CONTENT="...">
Specifies the application that generated the HTML file.
<META NAME="KEYWORDS" CONTENT="...">
Specifies document search keywords.
<META NAME="SUBJECT" CONTENT="...">
Specifies the document subject.
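To check which of these values a given HTML file supplies before converting it, a short Python sketch such as the following can be used. It is not part of HTMLDOC; MetaCollector and document_meta are hypothetical names, and the sketch simply reports the six META names listed above.

```python
from html.parser import HTMLParser

DOCUMENT_META = {"AUTHOR", "COPYRIGHT", "DOCNUMBER",
                 "GENERATOR", "KEYWORDS", "SUBJECT"}


class MetaCollector(HTMLParser):
    """Collect the META NAME/CONTENT pairs listed in DOCUMENT_META."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)  # attribute names arrive in lowercase
        name = (fields.get("name") or "").upper()
        if name in DOCUMENT_META:
            self.meta[name] = fields.get("content") or ""


def document_meta(html_text):
    """Return a dict mapping recognized META names to their CONTENT values."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta
```

META elements with other names, such as a viewport declaration, are ignored by this sketch.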
Page Breaks
HTMLDOC supports four new page comments to specify page breaks. In addition, the older BREAK attribute
is still supported by the HR element:
<HR BREAK>
Support for the BREAK attribute is deprecated and will be removed in a future release of HTMLDOC.
Tables
Currently HTMLDOC supports a maximum of 200 columns within a single table. This limit can be increased
by changing the MAX_COLUMNS constant in the config.h file included with the source code.
HTMLDOC does not support HTML 4.0 table elements or attributes, such as TBODY, THEAD, TFOOT,
or RULES.
New
The New button starts a new document. A confirmation dialog will appear if you have not saved the changes
to the existing document.
Open...
The Open... button retrieves a document that you have saved previously. A file chooser dialog is displayed
that allows you to pick an existing book file.
Save
The Save button saves the current document. A file chooser dialog is displayed if there is no filename
assigned to the current document.
Note: Saving a document is not the same as generating a document. The book files saved to disk by the Save
and Save As... buttons are not the final HTML, PDF, or PostScript output files. You generate those files by
clicking on the Generate button.
Save As...
The Save As... button saves the current document to a new file. A file chooser dialog is displayed to allow
you to specify the new document filename.
Note: Saving a document is not the same as generating a document. The book files saved to disk by the Save
and Save As... buttons are not the final HTML, PDF, or PostScript output files. You generate those files by
clicking on the Generate button.
Generate
The Generate button generates the current document, creating the specified HTML, PDF, or PostScript
file(s) as needed. The progress meter at the bottom of the window will show the progress as each page or file
is formatted and written.
Note: Generating a document is not the same as saving a document. To save the current HTML files and
settings in the HTMLDOC GUI, click on the Save or Save As... buttons instead.
Close
The Close button closes the HTMLDOC window.
Document Type
The Book radio button specifies that the input files are structured with headings. The Continuous radio
button specifies unstructured files without page breaks between each file. The Web Page radio button
specifies unstructured files with page breaks between each file.
Input Files
The Input Files list shows all of the HTML input files that will be used to produce the document.
Double-click on files to edit them.
Add Files...
The Add Files... button displays the file chooser dialog, allowing you to select one or more HTML files to
include in the document.
Edit Files...
The Edit Files... button starts the specified editor program to edit the files selected in the Input Files list.
Select one or more files in the Input Files list to enable the Edit Files... button.
Delete Files
The Delete Files button removes the selected files from the Input Files list. Select one or more files in the
Input Files list to enable the Delete Files button.
The Delete Files button only removes the files from the Input Files list. The files are not removed from
disk.
Move Up
The Move Up button moves the selected files in the Input Files list up one line in the list. To enable the
Move Up button, select one or more files in the Input Files list.
Move Down
The Move Down button moves the selected files in the Input Files list down one line in the list. To enable
the Move Down button, select one or more files in the Input Files list.
Logo Image
The Logo Image field contains the filename for an image to be shown in the header or footer of pages, and in
the navigation bar of HTML files.
Click on the Browse... button to select a logo image file using the file chooser dialog.
Title File/Image
The Title File/Image field contains the filename for an image to be shown on the title page, or for a HTML
file to be used for the title page(s).
Click on the Browse... button to select a title file using the file chooser dialog.
Output To
The File radio button selects output to a single file. The Directory radio button selects output to multiple files
in the named directory.
Directory output is not available when generating PDF files.
Output Path
The Output Path field contains the output directory or filename. Click on the Browse... button to choose an
output file using the file chooser dialog.
Output Format
The HTML radio button selects HTML output, the Separated HTML radio button selects HTML output that
is split into a separate file for each heading in the table-of-contents, the PS radio button selects
PostScript output, and the PDF radio button selects PDF output.
Output Options
The Grayscale check box selects grayscale output for PostScript and PDF files. The Title Page check box
specifies that a title page should be generated for the document. The JPEG Big Images check box specifies
that JPEG compression should be applied to continuous-tone images.
Compression
The Compression slider controls the amount of compression that is used when writing PDF or Level 3
PostScript output.
Note: HTMLDOC uses Flate compression, which is not encumbered by patents and is also used by the
popular PKZIP and gzip programs. Flate is a lossless compression algorithm (that is, you get back exactly
what you put in) that performs very well on indexed images and text.
JPEG Quality
The JPEG Quality slider controls the quality level used when writing continuous-tone images with JPEG
compression.
Page Size
The Page Size field contains the current page size. Click on the arrow button to choose a standard page size.
HTMLDOC supports the following standard page size names:
Letter - 8.5x11in (216x279mm)
A4 - 8.27x11.69in (210x297mm)
Universal - 8.27x11in (210x279mm)
Click in the Page Size field and enter the page width and length separated by the letter "x" to select a custom
page size. Append the letters "in" for inches, "mm" for millimeters, or "cm" for centimeters.
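The custom page size syntax described above can be illustrated with a short Python sketch. The function name parse_page_size is hypothetical and is not part of HTMLDOC; the sketch only assumes the width, the letter "x", the length, and one of the three unit suffixes named above, and converts the result to points (1 point = 1/72nd inch).

```python
import re

# Points per unit for the three suffixes named in the text.
POINTS_PER_UNIT = {"in": 72.0, "mm": 72.0 / 25.4, "cm": 72.0 / 2.54}


def parse_page_size(text):
    """Return (width, length) in points for a string such as '8.5x11in'.

    Hypothetical helper for illustration; it accepts only the
    WIDTHxLENGTH form followed by an in, mm, or cm suffix.
    """
    match = re.fullmatch(r"\s*([\d.]+)x([\d.]+)(in|mm|cm)\s*", text)
    if match is None:
        raise ValueError(f"not a WIDTHxLENGTH size with a unit suffix: {text!r}")
    width, length, unit = match.groups()
    factor = POINTS_PER_UNIT[unit]
    return float(width) * factor, float(length) * factor


width, length = parse_page_size("210x297mm")
print(f"{width:.1f} x {length:.1f} points")
```

The same function accepts "8.5x11in" for the Letter size listed above.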
2-Sided
Click in the 2-Sided check box to select 2-sided (duplexed) output.
Landscape
Click in the Landscape check box to select landscape output.
Blank
Title
Chapter Title - The field should contain the current chapter title.
Heading
Logo
1,2,3,... - The field should contain the current page number in decimal format (1, 2, 3, ...)
i,ii,iii,... - The field should contain the current page number in lowercase roman numerals (i, ii, iii, ...)
I,II,III,... - The field should contain the current page number in uppercase roman numerals (I, II, III, ...)
a,b,c,... - The field should contain the current page number using lowercase letters.
A,B,C,... - The field should contain the current page number using UPPERCASE letters.
Chapter Page
1/N,2/N,... - The field should contain the current and total number of pages (n/N).
1/C,2/C,... - The field should contain the current and total number of pages in the chapter (n/N).
Date - The field should contain the current date (formatted for the current locale).
Time - The field should contain the current time (formatted for the current locale).
Date + Time - The field should contain the current date and time (formatted for the current locale).
Table of Contents
Select the desired number of levels from the Table of Contents option button.
Numbered Headings
Click in the Numbered Headings check box to automatically number the headings in the document.
Title
Enter the desired title for the table-of-contents in the Title field.
Body Color
The Body Color field specifies the default background color. It can be a standard HTML color name or a
hexadecimal RGB color of the form #RRGGBB. Click on the Lookup... button to pick the color graphically.
Body Image
The Body Image field specifies the default background image. Click on the Browse... button to pick the
background image using the file chooser.
Text Color
The Text Color field specifies the default text color. It can be a standard HTML color name or a hexadecimal
RGB color of the form #RRGGBB. Click on the Lookup... button to pick the color graphically.
Link Color
The Link Color field specifies the default link color. It can be a standard HTML color name or a hexadecimal
RGB color of the form #RRGGBB. Click on the Lookup... button to pick the color graphically.
Link Style
The Link Style chooser specifies the default link decoration.
Line Spacing
The Line Spacing field specifies the spacing between lines as a multiple of the base font size. Click on the
single arrow buttons to decrease or increase the size by 10ths or on the double arrow buttons to decrease or
increase the size by whole numbers.
Body Typeface
The Body Typeface option button specifies the typeface to use for normal text. Click on the option button to
select a typeface.
Heading Typeface
The Heading Typeface option button specifies the typeface to use for headings. Click on the option button
to select a typeface.
Header/Footer Size
The Header/Footer Size field specifies the size of header and footer text in the document in points (1 point
= 1/72nd inch). Click on the single arrow buttons to decrease or increase the size by 1/10th point or on the
double arrow buttons to decrease or increase the size by whole points.
Header/Footer Font
The Header/Footer Font option button specifies the typeface and style to use for header and footer text.
Click on the option button to select a typeface and style.
Character Set
The Character Set option button specifies the encoding of characters in the document. Click on the option
button to select a character set.
Options
The Embed Fonts check box controls whether or not fonts are embedded in PostScript and PDF output.
The PS Tab
The PS tab (Figure 7-7) contains options specific to PostScript output.
PostScript Level
Click on one of the Level radio buttons to select the language level to generate. PostScript Level 1 is
compatible with all PostScript printers and will produce the largest output files.
PDF Version
The PDF Version radio buttons control what version of PDF is generated. PDF 1.3 is the most commonly
supported version. Click on the corresponding radio button to set the version.
Page Mode
The Page Mode option button controls the initial viewing mode for the document. Click on the option button
to set the page mode.
The Document page mode displays only the document pages. The Outline page mode displays the
table-of-contents outline as well as the document pages. The Full-Screen page mode displays the document
pages on the whole screen; this mode is used primarily for presentations.
Page Layout
The Page Layout option button controls the initial layout of document pages on the screen. Click on the
option button to set the page layout.
The Single page layout displays a single page at a time. The One Column page layout displays a single
column of pages at a time. The Two Column Left and Two Column Right page layouts display two columns
of pages at a time; the first page is displayed in the left or right column as selected.
First Page
The First Page option button controls the initial page that is displayed. Click on the option button to choose
the first page.
Page Effect
The Page Effect option button controls the page effect that is displayed in Full-Screen mode. Click on the
option button to select a page effect.
Page Duration
The Page Duration slider controls the number of seconds that each page will be visible in Full-Screen
mode. Drag the slider to adjust the number of seconds.
Effect Duration
The Effect Duration slider controls the number of seconds that the page effect will last when changing pages.
Drag the slider to adjust the number of seconds.
Encryption
The Encryption buttons control whether or not encryption is performed on the PDF file. Encrypted
documents can be password protected and also provide user permissions.
Permissions
The Permissions buttons control what operations are allowed by the PDF viewer.
Owner Password
The Owner Password field contains the document owner password, a string that is used by Adobe Acrobat
to control who can change document permissions, etc.
If this field is left blank, a random 32-character password is generated so that no one can change the document
using the Adobe tools.
Options
The Include Links option controls whether or not the internal links in a document are included in the PDF
output. The document outline (shown to the left of the document in Acrobat Reader) is unaffected by this
setting.
User Password
The User Password field contains the document user password, a string that is used by Adobe Acrobat to
restrict viewing permissions on the file.
If this field is left blank, any user may view the document without entering a password.
HTML Editor
The HTML Editor field contains the name of the HTML editor to run when you double-click on an input file
or click on the Edit Files... button. Enter the program name in the field or click on the Browse... button to
select the editor using the file chooser.
The %s is added automatically to the end of the command name to insert the name of the file to be edited. If
you are using Netscape Composer to edit your HTML files you should put "-edit" before the %s to tell
Netscape to edit the file and not display it.
Browser Width
The Browser Width slider specifies the width of the browser in pixels that is used to scale images and other
pixel measurements to the printable page width. You can adjust this value to more closely match the
formatting on the screen.
Search Path
The Search Path field specifies a search path for files that are loaded by HTMLDOC. It is usually used so
that images referenced by absolute server paths can be found and loaded.
Directories are separated by the semicolon (;) so that drive letters (and eventually URLs) can be specified.
Proxy URL
The Proxy URL field specifies a URL for a HTTP proxy server.
Tooltips
The Tooltips check button controls the appearance of tooltip windows over GUI controls.
Modern Look
The Modern Look check button controls the appearance of the GUI controls.
Strict HTML
The Strict HTML check button controls strict HTML conformance checking. When checked, HTML elements
that are improperly nested and dangling close elements will produce error messages.
Show
The Show option button (1) selects which files are displayed in the file list (3). Click on the option button to
choose a different type of file.
Favorites
The Favorites button (2) allows you to view a specific directory or add the current directory to your list of
favorites.
File List
The file list (3) lists the files and directories in the current directory or folder. Double-click on a file or
directory to select that file or directory. Drag the mouse or hold the CTRL key down while clicking to select
multiple files.
Filename
The Filename field contains the currently selected filename. Type a name in the field to select a file or
directory. As you type, any matching filenames will be highlighted; press the TAB key to accept the matches.
Dialog Buttons
The dialog buttons (5) close the file chooser dialog window. Click on the OK button to accept your selections
or the Cancel button to reject your selections and cancel the file operation.
Basic Usage
The basic command-line usage for HTMLDOC is:
% htmldoc options filename1.html ... filenameN.html ENTER
% htmldoc options filename.book ENTER
The first form converts the named HTML files to the specified output format immediately. The second form
loads the specified .book file and displays the HTMLDOC window, allowing a user to make changes and/or
generate the document interactively.
If no output file or directory is specified, then all output is sent to the standard output file.
On completion, HTMLDOC returns an exit code of 0 if it was successful and non-zero if there were errors.
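When HTMLDOC is started from another program, the exit code is the natural way to detect failures. The following Python sketch assembles a command line in the first form shown above and checks the exit code. The helper names build_command and run_htmldoc and the file names are hypothetical; the sketch assumes that the htmldoc program is on the search path and uses only the -t and -f options described in this chapter.

```python
import shlex
import subprocess


def build_command(inputs, output_file=None, output_format=None, options=()):
    """Assemble 'htmldoc options filename1.html ... filenameN.html' as a list."""
    command = ["htmldoc", *options]
    if output_format:
        command += ["-t", output_format]
    if output_file:
        command += ["-f", output_file]
    return command + list(inputs)


def run_htmldoc(command):
    """Run the command and return True when the exit code is 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Hypothetical file names; the command is printed here, not executed.
command = build_command(["intro.html", "usage.html"], "manual.pdf", "pdf", ["--book"])
print(shlex.join(command))
```

Call run_htmldoc(command) to perform the conversion; a False result means that HTMLDOC reported errors.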
Options
The following command-line options are recognized by HTMLDOC.
-d directory
The -d option specifies an output directory for the document files.
This option is not compatible with the PDF output format.
Chapter 8 - Command-Line Reference
-f filename
The -f option specifies an output file for the document.
-t format
The -t option specifies the output format for the document and can be one of the following:
Format   Description
html     Generate one or more indexed HTML files.
htmlsep  Generate separate HTML files for each heading in the table-of-contents.
pdf
pdf11
pdf12
pdf13
pdf14
ps
ps1
ps2
ps3
-v
The -v option specifies that progress information should be sent/displayed to the standard error file.
--batch filename.book
The --batch option specifies a book file that you would like to generate without the GUI popping up. This
option can be combined with other options to generate the same book in different formats and sizes:
% htmldoc --batch filename.book -f filename.ps ENTER
% htmldoc --batch filename.book -f filename.pdf ENTER
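The same idea extends to several formats and page sizes at once. The Python sketch below builds one --batch command per combination; the function name batch_commands and the output file naming scheme are hypothetical, and the sketch assumes that the format name ("ps" or "pdf") is also an acceptable file extension.

```python
from itertools import product


def batch_commands(book_file, formats, sizes):
    """Return one htmldoc --batch command per (format, size) combination."""
    stem = book_file.rsplit(".", 1)[0]
    commands = []
    for fmt, size in product(formats, sizes):
        # Hypothetical naming scheme: manual-letter.ps, manual-a4.pdf, ...
        output = f"{stem}-{size.lower()}.{fmt}"
        commands.append(
            ["htmldoc", "--batch", book_file, "--size", size, "-t", fmt, "-f", output]
        )
    return commands


for command in batch_commands("manual.book", ["ps", "pdf"], ["Letter", "A4"]):
    print(" ".join(command))
```

Each list can be passed to a process launcher as-is; nothing is executed by the sketch itself.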
--bodycolor color
The --bodycolor option specifies the background color for all pages in the document. The color can be
specified by a standard HTML color name or as a 6-digit hexadecimal number of the form #RRGGBB.
--bodyfont typeface
The --bodyfont option specifies the default text font used for text in the document body. The typeface
parameter can be one of the following:
typeface   Actual Font
Arial      Helvetica
Courier    Courier
Helvetica  Helvetica
           DejaVu Sans
Serif      DejaVu Serif
Times      Times
--bodyimage filename
The --bodyimage option specifies the background image for all pages in the document. The supported
formats are BMP, GIF, JPEG, and PNG.
--book
The --book option specifies that the input files comprise a book with chapters and headings.
--bottom margin
The --bottom option specifies the bottom margin. The default units are points (1 point = 1/72nd inch); the
suffixes "in", "cm", and "mm" specify inches, centimeters, and millimeters, respectively.
This option is only available when generating PostScript or PDF files.
--browserwidth pixels
The --browserwidth option specifies the browser width in pixels. The browser width is used to scale
images and pixel measurements when generating PostScript and PDF files. It does not affect the font size of
text.
The default browser width is 680 pixels, which corresponds roughly to a 96 DPI display. Make sure that your
image and table sizes are equal to or smaller than the browser width; otherwise, your output will overlap or
be truncated in places.
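The effect of the browser width can be sketched with a little arithmetic. The Python fragment below assumes a simple linear scaling in which the full browser width maps to the full printable page width; the function names are hypothetical, and the 468 point printable width (a Letter page with 1 inch left and right margins) is an example value, not a documented default.

```python
def pixels_to_points(pixels, browser_width=680, printable_width=468.0):
    """Scale a pixel measurement to points, assuming linear scaling."""
    return pixels * printable_width / browser_width


def fits_on_page(pixels, browser_width=680):
    """True when an image or table is no wider than the browser width."""
    return pixels <= browser_width


print(pixels_to_points(340))  # half the browser width -> half the printable width
print(fits_on_page(800))      # wider than 680 pixels, so it would not fit
```

Under this assumption, raising the browser width makes every pixel measurement smaller on the printed page, which is why wide images fit once the value is increased.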
--charset charset
The --charset option specifies the 8-bit character set encoding to use for the entire document. HTMLDOC
comes with the following character set files:
charset      Character Set
cp-874
cp-1250
cp-1251
cp-1252
cp-1253
cp-1254
cp-1255
cp-1256
cp-1257
cp-1258
iso-8859-1   ISO-8859-1
iso-8859-2   ISO-8859-2
iso-8859-3   ISO-8859-3
iso-8859-4   ISO-8859-4
iso-8859-5   ISO-8859-5
iso-8859-6   ISO-8859-6
iso-8859-7   ISO-8859-7
iso-8859-8   ISO-8859-8
iso-8859-9   ISO-8859-9
iso-8859-14  ISO-8859-14
iso-8859-15  ISO-8859-15
koi8-r       KOI8-R
--color
The --color option specifies that color output is desired.
This option is only available when generating PostScript or PDF files.
--compression[=level]
The --compression option specifies that Flate compression should be performed on the output file(s). The
--continuous
The --continuous option specifies that the input files comprise a web page (or site) and that no title page
or table-of-contents should be generated. Unlike the --webpage option described later in this chapter, page
breaks are not inserted between each input file.
This option is only available when generating PostScript or PDF files.
--datadir directory
The --datadir option specifies the location of data files used by HTMLDOC.
--duplex
The --duplex option specifies that the output should be formatted for two sided printing.
This option is only available when generating PostScript or PDF files. Use the --pscommands option to
generate PostScript duplex mode commands.
--effectduration seconds
The --effectduration option specifies the duration of a page transition effect in seconds.
This option is only available when generating PDF files.
--embedfonts
The --embedfonts option specifies that fonts should be embedded in PostScript and PDF output. This is
especially useful when generating documents in character sets other than ISO-8859-1.
--encryption
The --encryption option enables encryption and security features for PDF output.
This option is only available when generating PDF files.
--firstpage page
The --firstpage option specifies the first page that will be displayed in a PDF file. The page parameter
can be one of the following:
page  Description
p1
toc
c1
--fontsize size
The --fontsize option specifies the base font size for the entire document in points (1 point = 1/72nd
inch).
--fontspacing spacing
The --fontspacing option specifies the line spacing for the entire document as a multiplier of the base
font size. A spacing value of 1 makes each line of text the same height as the font.
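The relationship between --fontsize and --fontspacing is a simple multiplication, sketched below in Python. The function names and the sample values (11 point text, a spacing of 1.2, a 648 point text area) are illustrative assumptions rather than HTMLDOC defaults.

```python
def line_height(font_size, spacing):
    """Height of one line of text in points: base font size times the multiplier."""
    return font_size * spacing


def lines_per_page(printable_height, font_size, spacing):
    """Whole lines of body text that fit in the given height (all values in points)."""
    return int(printable_height // line_height(font_size, spacing))


print(f"{line_height(11, 1.2):.1f} points per line")
print(lines_per_page(648, 11, 1.2), "lines in a 9-inch text area")
```

A spacing of 1 gives a line height equal to the font size, as described above.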
--footer lcr
The --footer option specifies the contents of the page footer. The lcr parameter is a three-character
string representing the left, center, and right footer fields. Each character can be one of the following:
lcr  Description
:    A colon indicates that the field should contain the current and total number of pages in the chapter (n/N).
/    A slash indicates that the field should contain the current and total number of pages (n/N).
1    The number 1 indicates that the field should contain the current page number in decimal format (1, 2, 3, ...)
a    A lowercase "a" indicates that the field should contain the current page number using lowercase letters.
A    An uppercase "A" indicates that the field should contain the current page number using UPPERCASE letters.
c    A lowercase "c" indicates that the field should contain the current chapter title.
C    An uppercase "C" indicates that the field should contain the current chapter page number.
d    A lowercase "d" indicates that the field should contain the current date.
D    An uppercase "D" indicates that the field should contain the current date and time.
h    An "h" indicates that the field should contain the current heading.
i    A lowercase "i" indicates that the field should contain the current page number in lowercase roman numerals (i, ii, iii, ...)
I    An uppercase "I" indicates that the field should contain the current page number in uppercase roman numerals (I, II, III, ...)
l    A lowercase "l" indicates that the field should contain the logo image.
t    A lowercase "t" indicates that the field should contain the document title.
T    An uppercase "T" indicates that the field should contain the current time.
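A three-character lcr string can be checked before it is passed to HTMLDOC. The Python sketch below maps each character from the table above to a short description; the function name describe_lcr is hypothetical. The "." entry is an assumption that is not listed in the table: it is inferred from the use of "..." to disable a header or footer elsewhere in this chapter.

```python
FIELD_CODES = {
    ":": "current and total number of pages in the chapter",
    "/": "current and total number of pages",
    "1": "page number in decimal format",
    "a": "page number using lowercase letters",
    "A": "page number using uppercase letters",
    "c": "current chapter title",
    "C": "current chapter page number",
    "d": "current date",
    "D": "current date and time",
    "h": "current heading",
    "i": "page number in lowercase roman numerals",
    "I": "page number in uppercase roman numerals",
    "l": "logo image",
    "t": "document title",
    "T": "current time",
    # Assumption: "." leaves the field blank (inferred from "..." disabling a header).
    ".": "blank",
}


def describe_lcr(lcr):
    """Return the left, center, and right field descriptions for an lcr string."""
    if len(lcr) != 3:
        raise ValueError("an lcr string must be exactly three characters long")
    unknown = [ch for ch in lcr if ch not in FIELD_CODES]
    if unknown:
        raise ValueError(f"unknown formatting characters: {unknown}")
    return dict(zip(("left", "center", "right"), (FIELD_CODES[ch] for ch in lcr)))


print(describe_lcr("h.1"))
```

With "h.1", the current heading appears on the left, the center is empty, and the decimal page number appears on the right.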
--format format
The --format option specifies the output format for the document and can be one of the following:
Format   Description
html     Generate one or more indexed HTML files.
htmlsep  Generate separate HTML files for each heading in the table-of-contents.
pdf
pdf11
pdf12
pdf13
pdf14
ps
ps1
ps2
ps3
--gray
The --gray option specifies that grayscale output is desired.
This option is only available when generating PostScript or PDF files.
--header lcr
The --header option specifies the contents of the page header. The lcr parameter is a three-character
string representing the left, center, and right header fields. See the --footer option for the list of formatting
characters.
Setting the header to "..." disables the header entirely.
--headfootfont font
The --headfootfont option specifies the font that is used for the header and footer text. The font
parameter can be one of the following:
Courier
Courier-Bold
Courier-Oblique
Courier-BoldOblique
Helvetica
Helvetica-Bold
Helvetica-Oblique
Helvetica-BoldOblique
Monospace
Monospace-Bold
Monospace-Oblique
Monospace-BoldOblique
Sans
Sans-Bold
Sans-Oblique
Sans-BoldOblique
Serif
Serif-Roman
Serif-Bold
Serif-Italic
Serif-BoldItalic
Times
Times-Roman
Times-Bold
Times-Italic
Times-BoldItalic
This option is only available when generating PostScript or PDF files.
--headfootsize size
The --headfootsize option sets the size of the header and footer text in points (1 point = 1/72nd inch).
This option is only available when generating PostScript or PDF files.
--headingfont typeface
The --headingfont option sets the typeface that is used for headings in the document. The typeface
parameter can be one of the following:
typeface   Actual Font
Arial      Helvetica
Courier    Courier
Helvetica  Helvetica
           DejaVu Sans
Serif      DejaVu Serif
Times      Times
--help
The --help option displays all of the available options to the standard output file.
--helpdir directory
The --helpdir option specifies the location of the on-line help files.
--jpeg[=quality]
The --jpeg option enables JPEG compression of continuous-tone images. The optional quality
parameter specifies the output quality from 0 (worst) to 100 (best).
This option is only available when generating PDF or Level 2 and Level 3 PostScript files.
--landscape
The --landscape option specifies that the output should be in landscape orientation (long edge on top).
This option is only available when generating PostScript or PDF files.
--left margin
The --left option specifies the left margin. The default units are points (1 point = 1/72nd inch); the suffixes
"in", "cm", and "mm" specify inches, centimeters, and millimeters, respectively.
This option is only available when generating PostScript or PDF files.
--linkcolor color
The --linkcolor option specifies the color of links in HTML and PDF output. The color can be specified
by name or as a 6-digit hexadecimal number of the form #RRGGBB.
--links
The --links option specifies that PDF output should contain hyperlinks.
--linkstyle style
The --linkstyle option specifies the style of links in HTML and PDF output. The style can be "plain" for
no decoration or "underline" to underline links.
--logoimage filename
The --logoimage option specifies the logo image for the HTML navigation bar and page headers and
footers for PostScript and PDF files. The supported formats are BMP, GIF, JPEG, and PNG.
Note:
You need to use the --header and/or --footer options with the l parameter or
use the corresponding HTML page comments to display the logo image in the header
or footer.
The following example uses the --header option:
htmldoc --logoimage image.png --header lt. -f file.pdf file.html
--no-compression
The --no-compression option specifies that Flate compression should not be performed on the output
files.
--no-duplex
The --no-duplex option specifies that the output should be formatted for one sided printing.
This option is only available when generating PostScript or PDF files. Use the --pscommands option to
generate PostScript duplex mode commands.
--no-embedfonts
The --no-embedfonts option specifies that fonts should not be embedded in PostScript and PDF output.
--no-encryption
The --no-encryption option specifies that no encryption/security features should be enabled in PDF
output.
This option is only available when generating PDF files.
--no-jpeg
The --no-jpeg option specifies that JPEG compression should not be performed on large images.
--no-links
The --no-links option specifies that PDF output should not contain hyperlinks.
--no-localfiles
The --no-localfiles option disables access to local files on the system. This option should be used
when providing remote document conversion services.
--no-numbered
The --no-numbered option specifies that headings should not be numbered.
--no-pscommands
The --no-pscommands option specifies that PostScript device commands should not be written to the
output files.
--no-strict
The --no-strict option turns off strict HTML conformance checking.
--no-title
The --no-title option specifies that the title page should not be generated.
--no-toc
The --no-toc option specifies that the table-of-contents pages should not be generated.
--no-xrxcomments
The --no-xrxcomments option specifies that Xerox PostScript job comments should not be written to the
output files.
This option is only available when generating PostScript files.
--numbered
The --numbered option specifies that headings should be numbered.
--nup pages
The --nup option sets the number of pages that are placed on each output page. Valid values for the pages
parameter are 1, 2, 4, 6, 9, and 16.
--outdir directory
The --outdir option specifies an output directory for the document files.
This option is not compatible with the PDF output format.
--outfile filename
The --outfile option specifies an output file for the document.
--owner-password password
The --owner-password option specifies the owner password for a PDF file. If not specified or the empty
string (""), a random password is generated.
This option is only available when generating PDF files.
--pageduration seconds
The --pageduration option specifies the number of seconds that each page will be displayed in the
document.
This option is only available when generating PDF files.
--pageeffect effect
The --pageeffect option specifies the page effect to use in PDF files. The effect parameter can be one
of the following:
effect  Description
none    No effect is generated.
bi      Box Inward
bo      Box Outward
        Dissolve
gd      Glitter Down
gdr
gr      Glitter Right
hb      Horizontal Blinds
hsi
hso
vb      Vertical Blinds
vsi
vso
wd      Wipe Down
wl      Wipe Left
wr      Wipe Right
wu      Wipe Up
--pagelayout layout
The --pagelayout option specifies the initial page layout in the PDF viewer. The layout parameter can
be one of the following:
layout    Description
single
one
twoleft   Two columns are displayed with the first page on the left.
tworight  Two columns are displayed with the first page on the right.
This option is only available when generating PDF files.
--pagemode mode
The --pagemode option specifies the initial viewing mode in the PDF viewer. The mode parameter can be
one of the following:
mode        Description
fullscreen  The document pages are displayed on the entire screen in "slideshow" mode.
This option is only available when generating PDF files.
--path dir1;dir2;dir3;...;dirN
The --path option specifies a search path for files that are loaded by HTMLDOC. It is usually used so that
images referenced by absolute server paths can be found and loaded.
Directories are separated by the semicolon (;) so that drive letters and URLs can be specified. Quotes around
the directory parameter are optional. They are usually used when the directory string contains spaces.
--path "dir1;dir2;dir3;...;dirN"
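When the directory list is built by a program, the semicolon-separated string can be assembled as in the following Python sketch. The function name path_option and the directory names are hypothetical. Passing the arguments as a list avoids the need for manual quoting; shlex.join is used only to show how a shell would need the string to be quoted.

```python
import shlex


def path_option(directories):
    """Return ['--path', 'dir1;dir2;...'] for a list of search directories."""
    for directory in directories:
        if ";" in directory:
            raise ValueError(f"directory name contains the separator: {directory!r}")
    return ["--path", ";".join(directories)]


# Hypothetical directories; the second one contains a space.
option = path_option(["/var/www/html", "/srv/shared images"])
print(shlex.join(option))  # shlex adds the quotes that the embedded space requires
```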
--permissions permission[,permission,...]
The --permissions option specifies the document permissions. The available permission parameters are
listed below:
Permission  Description
all         All permissions
annotate
copy
modify
no-modify
no-print
none        No permissions
The --encryption option must be used in conjunction with the --permissions parameter.
--permissions no-print --encryption
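Because --permissions has no effect without --encryption, a program that builds HTMLDOC command lines can emit the two options together. The Python sketch below does this and optionally adds the password options described elsewhere in this chapter; the function name encryption_options and the sample password are hypothetical.

```python
def encryption_options(permissions, owner_password=None, user_password=None):
    """Return the PDF security options as a list of command-line arguments."""
    if not permissions:
        raise ValueError("at least one permission keyword is required")
    # --encryption always accompanies --permissions, as the text requires.
    options = ["--encryption", "--permissions", ",".join(permissions)]
    if owner_password:
        options += ["--owner-password", owner_password]
    if user_password:
        options += ["--user-password", user_password]
    return options


print(encryption_options(["no-print", "no-modify"], owner_password="example-secret"))
```

The returned list can be placed ahead of the input filenames in a complete htmldoc command.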
--portrait
The --portrait option specifies that the output should be in portrait orientation (short edge on top).
This option is only available when generating PostScript or PDF files.
--pscommands
The --pscommands option specifies that PostScript device commands should be written to the output files.
This option is only available when generating Level 2 and Level 3 PostScript files.
--quiet
The --quiet option prevents error messages from being sent to stderr.
--referer url
The --referer option sets the URL that is passed in the Referer: field of HTTP requests.
--right margin
The --right option specifies the right margin. The default units are points (1 point = 1/72nd inch); the
suffixes "in", "cm", and "mm" specify inches, centimeters, and millimeters, respectively.
This option is only available when generating PostScript or PDF files.
--size size
The --size option specifies the page size. The size parameter can be one of the following standard sizes:
size    Description
Letter  8.5x11in (216x279mm)
A4      8.27x11.69in (210x297mm)
--strict
The --strict option turns on strict HTML conformance checking. When enabled, HTML elements that are
improperly nested and dangling close elements will produce error messages.
--textcolor color
The --textcolor option specifies the default text color for all pages in the document. The color can be
specified by a standard HTML color name or as a 6-digit hexadecimal number of the form #RRGGBB.
--textfont typeface
The --textfont option sets the typeface that is used for text in the document. The typeface parameter
can be one of the following:
typeface   Actual Font
Arial      Helvetica
Courier    Courier
Helvetica  Helvetica
           DejaVu Sans
Serif      DejaVu Serif
Times      Times
--title
The --title option specifies that a title page should be generated.
--titlefile filename
The --titlefile option specifies a HTML file to use for the title page.
--titleimage filename
The --titleimage option specifies the title image for the title page. The supported formats are BMP, GIF,
JPEG, and PNG.
--tocfooter lcr
The --tocfooter option specifies the contents of the table-of-contents footer. The lcr parameter is a
three-character string representing the left, center, and right footer fields. See the --footer option for the
list of formatting characters.
Setting the TOC footer to "..." disables the TOC footer entirely.
--tocheader lcr
The --tocheader option specifies the contents of the table-of-contents header. The lcr parameter is a
three-character string representing the left, center, and right header fields. See the --footer option for the
list of formatting characters.
Setting the TOC header to "..." disables the TOC header entirely.
--toclevels levels
The --toclevels option specifies the number of heading levels to include in the table-of-contents pages.
The levels parameter is a number from 1 to 6.
--toctitle string
The --toctitle option specifies the string to display at the top of the table-of-contents; the default string
is "Table of Contents".
--top margin
The --top option specifies the top margin. The default units are points (1 point = 1/72nd inch); the suffixes
"in", "cm", and "mm" specify inches, centimeters, and millimeters, respectively.
This option is only available when generating PostScript or PDF files.
--user-password password
The --user-password option specifies the user password for a PDF file. If not specified or the empty
string (""), no password will be required to view the document.
This option is only available when generating PDF files.
--verbose
The --verbose option specifies that progress information should be sent/displayed to the standard error
file.
--version
The --version option displays the HTMLDOC version number.
--webpage
The --webpage option specifies that the input files comprise a web page (or site) and that no title page or
table-of-contents should be generated. HTMLDOC will insert a page break between each input file.
This option is only available when generating PostScript or PDF files.
--xrxcomments
The --xrxcomments option specifies that Xerox PostScript job comments should be written to the output
files.
This option is only available when generating PostScript files.
Environment Variables
HTMLDOC looks for several environment variables which can override the default directories, display
additional debugging information, and disable CGI mode.
HTMLDOC_DATA
This environment variable specifies the location of HTMLDOC's data and fonts directories, normally
/usr/share/htmldoc or C:\Program Files\Easy Software Products\HTMLDOC.
HTMLDOC_DEBUG
This environment variable enables debugging information that is sent to stderr. The value is a list of keywords
separated by spaces:
keyword       Information Shown
links
memory
remotebytes   Shows the number of bytes that were transferred via HTTP
table
tempfiles     Shows the temporary files that were created, and preserves them for debugging
timing
all
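Because the value is a space-separated keyword list, it is easy to build programmatically. The following Python sketch validates keywords against the list above and joins them; the names DEBUG_KEYWORDS and debug_value are hypothetical and exist only for this example.

```python
# Keywords taken from the HTMLDOC_DEBUG keyword list above.
DEBUG_KEYWORDS = ("links", "memory", "remotebytes", "table",
                  "tempfiles", "timing", "all")


def debug_value(*keywords):
    """Return a space-separated HTMLDOC_DEBUG value, rejecting unknown keywords."""
    unknown = [word for word in keywords if word not in DEBUG_KEYWORDS]
    if unknown:
        raise ValueError("unknown HTMLDOC_DEBUG keyword(s): " + ", ".join(unknown))
    return " ".join(keywords)


# Environment entry that requests the REMOTEBYTES: and TIMING: messages.
print({"HTMLDOC_DEBUG": debug_value("remotebytes", "timing")})
```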
HTMLDOC_HELP
This environment variable specifies the location of HTMLDOC's documentation directory, normally
/usr/share/doc/htmldoc or C:\Program Files\Easy Software Products\HTMLDOC\doc.
HTMLDOC_NOCGI
This environment variable, when set (the value doesn't matter), disables CGI mode. It is most useful when
HTMLDOC is invoked on a web server from a scripting language or another program.
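A script running under a web server might prepare the environment for HTMLDOC along these lines. This is a minimal sketch: the helper name htmldoc_environment and its parameters are hypothetical, and the value "1" is arbitrary since only the presence of HTMLDOC_NOCGI matters.

```python
import os


def htmldoc_environment(base=None, data_dir=None):
    """Return a copy of the environment with CGI mode disabled for HTMLDOC."""
    env = dict(os.environ if base is None else base)
    env["HTMLDOC_NOCGI"] = "1"  # any value works; the variable only has to be set
    if data_dir is not None:
        env["HTMLDOC_DATA"] = data_dir  # optional override of the data directory
    return env


# Typical use (not executed here, and the arguments are placeholders):
#   subprocess.run(["htmldoc", ...], env=htmldoc_environment())
print(htmldoc_environment({"PATH": "/usr/bin"}))
```

Copying the environment instead of modifying os.environ keeps the setting local to the child process.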
Messages
HTMLDOC sends error and status messages to stderr unless the --quiet option is provided on the
command-line. Applications can capture these messages to relay errors or statistics to the user.
BYTES: Message
The BYTES: message specifies the number of bytes that were written to an output file. If the output is
directed at a directory then multiple BYTES: messages will be sent.
DEBUG: Messages
The DEBUG: messages contain debugging information based on the value of the HTMLDOC_DEBUG
environment variable. Normally, no DEBUG: messages are sent by HTMLDOC.
ERRnnn: Messages
The ERRnnn: messages specify an error condition. Error numbers 1 to 16 map to the following errors:
1. No files were found or loadable.
2. No pages were generated.
3. The document contains too many files or chapters.
4. HTMLDOC ran out of memory.
5. The specified file could not be found.
6. The comment contains a bad HTMLDOC formatting command.
7. The image file is not in a known format.
8. HTMLDOC was unable to remove a temporary file.
9. HTMLDOC had an unspecified internal error.
10. HTMLDOC encountered a networking error when retrieving a file via a URL.
11. HTMLDOC was unable to read a file.
12. HTMLDOC was unable to write a file.
13. An HTML error was found in a source file.
14. A table, image, or text fragment was too large to fit in the space provided.
15. A hyperlink in the source files was unresolved.
16. A header/footer string in the document contains a bad $ command.
Error numbers 100 to 505 correspond directly to an HTTP status code.
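An application that captures stderr could translate error numbers with a lookup like the one sketched below in Python. The table is copied from the list above; the function name describe_error, the regular expression, and the sample message text used afterwards are assumptions for illustration, since the exact text that follows the "ERRnnn:" prefix is not specified here.

```python
import re

# Error numbers 1 to 16, as listed above.
ERRORS = {
    1: "No files were found or loadable.",
    2: "No pages were generated.",
    3: "The document contains too many files or chapters.",
    4: "HTMLDOC ran out of memory.",
    5: "The specified file could not be found.",
    6: "The comment contains a bad HTMLDOC formatting command.",
    7: "The image file is not in a known format.",
    8: "HTMLDOC was unable to remove a temporary file.",
    9: "HTMLDOC had an unspecified internal error.",
    10: "HTMLDOC encountered a networking error when retrieving a file via a URL.",
    11: "HTMLDOC was unable to read a file.",
    12: "HTMLDOC was unable to write a file.",
    13: "An HTML error was found in a source file.",
    14: "A table, image, or text fragment was too large to fit in the space provided.",
    15: "A hyperlink in the source files was unresolved.",
    16: "A header/footer string in the document contains a bad $ command.",
}

ERR_PATTERN = re.compile(r"^ERR(\d+):\s*(.*)$")


def describe_error(line):
    """Return (number, meaning, text) for an ERRnnn: line, or None otherwise."""
    match = ERR_PATTERN.match(line)
    if match is None:
        return None
    number = int(match.group(1))
    if number in ERRORS:
        meaning = ERRORS[number]
    elif 100 <= number <= 505:
        meaning = f"HTTP status {number}"
    else:
        meaning = "unknown error number"
    return number, meaning, match.group(2)


print(describe_error("ERR005: sample text"))  # sample line is hypothetical
```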
INFO: Messages
The INFO: messages contain general information that is logged when HTMLDOC is running in CGI mode
or when you use the --verbose option.
PAGES: Message
The PAGES: message specifies the number of pages that were written to an output file. If the output is
directed at a directory then multiple PAGES: messages will be sent. No PAGES: messages are sent when
generating HTML output.
REMOTEBYTES: Message
The REMOTEBYTES: message specifies the number of bytes that were transferred using HTTP. This message
is only displayed if the HTMLDOC_DEBUG environment variable has the keyword remotebytes or all.
TIMING: Message
The TIMING: message specifies the load, render, and total time in seconds for the current command. This
message is only displayed if the HTMLDOC_DEBUG environment variable has the keyword timing or all.
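To relay errors or statistics to a user, an application can sort captured stderr lines by their prefix. The Python sketch below assumes each message occupies one line that begins with one of the prefixes described in this section, and that BYTES: and PAGES: are followed by a plain integer. Both are assumptions about the output layout, and the function name summarize is hypothetical.

```python
import re

MESSAGE = re.compile(
    r"^(BYTES|DEBUG|ERR\d+|INFO|PAGES|REMOTEBYTES|TIMING):\s*(.*)$"
)


def summarize(stderr_text):
    """Total the PAGES:/BYTES: counts and collect errors from captured stderr."""
    summary = {"pages": 0, "bytes": 0, "errors": [], "other": []}
    for line in stderr_text.splitlines():
        match = MESSAGE.match(line)
        if match is None:
            continue  # not one of the documented message types
        kind, text = match.group(1), match.group(2).strip()
        if kind in ("PAGES", "BYTES") and text.isdigit():
            # Several messages may arrive when the output is a directory.
            summary[kind.lower()] += int(text)
        elif kind.startswith("ERR"):
            summary["errors"].append((int(kind[3:]), text))
        else:
            summary["other"].append((kind, text))
    return summary


sample = "PAGES: 12\nBYTES: 3456\nERR005: sample text\nINFO: done\nBYTES: 100"
print(summarize(sample))
```

Counts are summed rather than overwritten because multiple BYTES: and PAGES: messages are sent when the output is directed at a directory.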
Introduction
HTMLDOC is distributed in both source code and binary (executable) forms. The source code is provided
under the terms of the GNU General Public License ("GPL") with a license exception for the OpenSSL
toolkit. A copy of the source code license can be found in the file COPYING.txt in the source code
distribution.
The binaries are provided under a typical commercial software end-user license agreement which is more
restrictive than the GNU GPL.
Trademarks
Easy Software Products has trademarked the HTMLDOC name. You may use the name in any direct port or
binary distribution of HTMLDOC. Please contact Easy Software Products for written permission to use the
name in derivative products. Our intention is to protect the value of this trademark and ensure that any
derivative product meets the same high-quality standards as the original.
Introduction
The HTMLDOC .book file format is a simple text format that provides the command-line options and files
that are part of the document. These files can be used from the GUI interface or from the command-line using
the --batch option:
htmldoc filename.book
htmldoc --batch filename.book
The first form will load the book and display the GUI interface, if configured. Windows users should use the
ghtmldoc.exe executable to show the GUI and htmldoc.exe for batch mode:
ghtmldoc.exe filename.book
htmldoc.exe --batch filename.book
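When batch conversion is driven from a script, the command can be assembled per platform. The Python sketch below only builds the argument list from the forms shown above; the function name batch_command is hypothetical, and it assumes the executable is on the search path.

```python
import sys


def batch_command(book_path, platform=sys.platform):
    """Return the argument list that converts a .book file in batch mode."""
    # Windows uses htmldoc.exe for batch mode; other systems use htmldoc.
    program = "htmldoc.exe" if platform.startswith("win") else "htmldoc"
    return [program, "--batch", book_path]


print(batch_command("filename.book"))
```

The list can be handed to subprocess.run() when HTMLDOC is installed.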
The Header
Each .book file starts with a line reading:
#HTMLDOC 1.8.17
The Options
Following the header is a line containing the options for the book. You can use any valid command-line
option on this line:
-f htmldoc.pdf --titleimage htmldoc.png --duplex --compression=9 --jpeg=90
Long option lines can be broken using a trailing backslash (\) on the end of each continuation line:
-f htmldoc.pdf --titleimage htmldoc.png --duplex \
--compression=9 --jpeg=90
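A tool that reads .book files needs to rejoin continued option lines before splitting them into arguments. The Python sketch below shows one way to do that. The function name join_option_lines is hypothetical, and the use of shlex.split for tokenizing is an assumption of this example, not a description of how HTMLDOC itself parses the line.

```python
import shlex


def join_option_lines(lines):
    """Join backslash-continued option lines and split them into arguments."""
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("\\"):
            parts.append(line[:-1].strip())  # drop the trailing backslash
        else:
            parts.append(line.strip())
            break  # the first line without a backslash ends the options
    return shlex.split(" ".join(parts))


options = join_option_lines([
    "-f htmldoc.pdf --titleimage htmldoc.png --duplex \\",
    "--compression=9 --jpeg=90",
])
print(options)
```

Note that shlex.split treats any remaining backslashes as escapes, so Windows-style paths would need different handling.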
The Files
Following the options is a list of files or URLs to include in the document:
intro.html
1-install.html
2-starting.html
3-books.html
4-cmdline.html
5-cgi.html
6-htmlref.html
7-guiref.html
8-cmdref.html
a-license.html
b-book.html
c-relnotes.html
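Putting the three parts together, a script could generate a .book file as sketched below in Python. The function name book_text is hypothetical; the header string is the one shown in this appendix, and the options and file names are taken from the examples above.

```python
def book_text(options, files, version="1.8.17"):
    """Return the contents of a .book file: header, options line, then files."""
    lines = ["#HTMLDOC " + version, " ".join(options)]
    lines.extend(files)
    return "\n".join(lines) + "\n"


text = book_text(
    ["-f", "htmldoc.pdf", "--titleimage", "htmldoc.png", "--duplex"],
    ["intro.html", "1-install.html", "2-starting.html"],
)
print(text, end="")
```

Writing the result to a file such as filename.book gives a book that can be used with the --batch option.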
While HTMLDOC still supports reading this format, we do not recommend using it for new books. In
particular, when generating a document using the --batch option, some options may not be applied
correctly since the files are loaded prior to setting the output options in the old format.
This appendix provides the release notes for each version of HTMLDOC.
Changes
On Windows, HTMLDOC now logs CGI mode errors to a file called "htmldoc.log" in the Windows
temporary directory.
HTMLDOC no longer uses Base-85 encoding for image data when producing Level 2 and 3
PostScript output. It appears that many printers and PostScript interpreters cannot properly decode
this data when the original image data is not a multiple of 8 bits.
HTMLDOC now renders STRONG elements in boldface instead of bold-italic to match the W3C
recommendations.
HTMLDOC now automatically inserts a TR element before a TD or TH element as needed to
improve web site compatibility; this also triggers an HTML error in --strict mode.
Bug Fixes
"$HFIMAGEn" didn't work in a header/footer string.
HTMLDOC could crash when rendering a table.
Book files were not used in CGI mode (STR #69)
Cookies were not sent in HTTP requests (STR #71)
Table cells were not aligned properly when the ROWSPAN attribute was set to 1 (STR #73)
HTMLDOC crashed when rendering unresolved hyperlinks in aligned images (STR #62)
Documented the HTMLDOC_NOCGI environment variable (STR #63)
HTMLDOC sometimes crashed when rendering tables with background colors (STR #65)
HTMLDOC would crash when writing encrypted strings longer than 1024 bytes (STR #66)
HTMLDOC didn't set the data directory when running in CGI mode on Windows.
HTMLDOC could crash when loading the Symbol.afm file (STR #93)
HTMLDOC did not always honor HEIGHT attributes in table rows.
Tables with a mix of colspan and rowspan sometimes caused cells to be moved vertically outside the
cell.
Changes
The command-line now allows --fontsize values from 4 to 26 to match the GUI.
Now use a 0.001 point tolerance when checking for content that overflows the page/cell.
HTMLDOC no longer enables interpolation of 2-color images.
The default vertical alignment of images is "BOTTOM" to match the HTML specification.
Paragraph spacing is only applied to the first table after a paragraph.
The tabloid media size was 10 points too short in length.
The table formatter now subtracts the outside border and padding widths for percentage-based widths.
This helps to eliminate "truncation or overlapping" errors.
Dropped support for FLTK 1.0.x when building the GUI.
The default vertical alignment is now "bottom" inside paragraphs to correctly align different sized text
and images to the baseline.
Indexed images are now written as PDF image objects when encryption is enabled; this works around
a serious bug in Acrobat 6 which tries to decrypt the colormap of in-line images twice, causing some
very strange colors!
Table captions can now be bottom aligned.
Blocks now break at the bottom of a page if the current line height + standard line height goes below
the bottom of the page; this prevents images with captions from getting erroneously moved to the top
of the next page.
Character entities are now supported in HTML attributes and unknown or invalid character entities
are left as plain text.
Changed handling of NOWRAP for some tables.
The --permissions option now supports multiple permission keywords in a single invocation.
Dropped support for MacOS 9 and earlier.
HTMLDOC now breaks between images that are too large to fit on a single line, to match the
behavior of Mozilla/Netscape (STR #7).
HTMLDOC now handles XHTML input more cleanly.
HTMLDOC no longer specifies an interpolation preference for images in PostScript or PDF output
(STR #8)
The DT element no longer applies an italic style (PR #5178)
HTMLDOC now ignores content inside a STYLE element (PR #5183)
Bug Fixes
Switching between landscape and portrait orientations would cause margin creepage.
Images did not default to align=bottom, and the align=bottom line spacing calculation was incorrect.
Whitespace before a link was underlined.
Fixed a table column sizing bug.
HTMLDOC didn't read back the HTTP response properly in all situations.
Fixed some more PNG transparency cases.
The PageBoundingBox comments in PostScript output did not account for the back page when
duplexing was enabled.
HTMLDOC generated an incorrect image mask for some images.
The first page of each chapter did not use the custom page number if it was placed inside the heading.
HTMLDOC did not reset the rendering cache before each page when producing N-up output; this
caused font errors in some cases that prevented the document from printing or displaying properly.
Eliminated a common cause of "table too wide" formatting errors.
Fixed a bug when applying a table background color to a cell without a border that crosses a page
boundary.
Fixed some calls to strcpy with overlapping arguments.
The names object was never set when the name objects were written.
Character entities were not decoded/encoded inside HTML comments.
The current heading was not always correctly substituted when used in the page header or footer.
When converting web pages from the GUI, the table-of-contents page number preferences were
incorrectly used.
PDF page effects/transitions were not put in the right part of the page dictionary, causing them not to
be used by the PDF reader application.
The _HD_OMIT_TOC attribute was not being honored for HTML output.
HTMLDOC now handles "open" messages from the MacOS X Finder (STR #3)
The GUI did not load or save the "strict HTML" setting (STR #6)
The HTML version of the title page did not set the ALT attribute for the title image (STR #10)
The HTML version of the table of contents did not correctly nest the lists in the parent items (STR
#10)
Borders around left and right-aligned images were not drawn properly (PR #5112)
Grayscale PDF output was not truly grayscale (STR #32)
Fixed a table-of-contents bug introduced in 1.8.24rc1 which caused the PDF document outline and
actual TOC pages to be rendered improperly (STR #37)
Links were not rendered due to a bug that was introduced in 1.8.24rc2 (STR #41)
Changes
The NEW SHEET page comment now breaks on N-up boundaries when N is greater than 1.
Bug Fixes
HTMLDOC tried to format tables with no rows or columns. While the HTML is technically not in
error, it is not exactly something you'd expect someone to do.
HTMLDOC didn't report an error when it could not find the specified title page file.
HTMLDOC could crash if it was unable to create its output files.
HTMLDOC could crash when writing HTML output containing unknown HTML elements.
Bug Fixes
Changes
HTMLDOC now calculates the resolution of the body image using the printable width instead of the
page width.
HTMLDOC should now compile out-of-the-box using the Cygwin tools.
HTMLDOC no longer inserts whitespace between text inside DIV elements.
HTMLDOC now supports quoted usernames and passwords in URLs.
HTMLDOC now defaults unknown colors to white for background colors and black for foreground
colors. This should make documents that use non-standard color names still appear readable.
Changes
The HTML parser now allows BODY to auto-close HEAD and vice versa.
Bug Fixes
HTMLDOC wouldn't compile using GCC under HP-UX due to a badly "fixed" system header file
(vmtypes.h).
Generating a book without a table-of-contents would produce a bad PDF file.
The Xerox XRX comments used the wrong units for the media size, points instead of millimeters.
IMG elements with links that use the ALIGN attribute didn't get the links.
Header and footer comments would interfere with the top and bottom margin settings.
Fixed a bug in the htmlReadFile() function which caused user-provided title pages not to be displayed
in PS or PDF output.
The table-of-contents would inherit the last media settings in the document, but use the initial settings
when formatting.
Changes
Updated the HTML parser to use HTML 4.0 rules for embedding elements inside a LI.
Now check for a TYPE attribute on EMBED elements, so that embedded Flash files do not get treated
as HTML.
Now put the COPYRIGHT meta data in the Author field in a PDF file along with the AUTHOR meta
data (if any).
No longer embed the prolog.ps command header when PostScript commands are not being embedded
in the output.
HTMLDOC now properly ignores the HTML 4.0 COL element.
Bug Fixes
Squeezed tables were not centered or right-aligned properly.
Cells didn't align properly if they were the first things on the page, or if there were several intervening
empty cells.
The preferred cell width handling didn't account for the minimum cell width, which could cause some
tables to become too large.
Remote URLs didn't always resolve properly (like the images from the Google web page...)
The font width loading code didn't force the non-breaking space to have the same width as a regular
space.
PRE text didn't adjust the line height for the tallest fragment in the line.
HTMLDOC tried to seek backwards when reading HTML from the standard input.
Changes
Now accept all JPEG files, even if they don't start with an APPn marker.
Now only start a new page for a chapter/file if we aren't already at the top of a page.
Bug Fixes
ROWSPAN handling in tables has been updated to match the MSIE behavior, where the current
rowspan is reduced by the minimum rowspan in the table; that is, if you use "ROWSPAN=17" for all
cells in a row, HTMLDOC now treats this as if you did not use ROWSPAN at all. It is unclear if this
is what the W3C intends.
The "--webpage" option didn't force toc levels to 0, which caused a bad page object reference to be
inserted in the PDF output file.
Background colors in nested tables didn't always get drawn in the right order, resulting in the wrong
colors showing through.
The HEADER page comment didn't set the correct top position in landscape orientation.
Changes
Made some changes in how COLSPAN and ROWSPAN are handled to better match how Netscape
and MSIE format things.
HTMLDOC now handles .book files with CR, LF, or CR LF line endings.
Changed the TOC numbering to use 32-bit integers instead of 8-bit integers...
Now handle local links with quoted (%HH) characters.
The command-line interface no longer sets PDF output mode when using --continuous or --webpage.
HTMLDOC now opens HTML output files in binary mode to prevent extra CR's under Windows, and
strips incoming CR's from PRE text.
Now support inserting the current chapter and heading in the table-of-contents headers and footers.
Bug Fixes
The table cell border and background were offset by the cellpadding when they should only be offset
by the cellspacing.
Bug Fixes
HR elements didn't render properly.
Background images didn't render properly and could lock up HTMLDOC.
The "HALF PAGE" comment would lock up HTMLDOC - HTMLDOC would keep adding pages
until it ran out of memory.
SUP and SUB used a fixed (reduced) size instead of using a smaller size from the current one.
Empty cells could cause unnecessary vertical alignment on the same row.
Changes
Now load images into memory only as needed, and unload them when no longer needed. This
provides a dramatic reduction in memory usage with files that contain a lot of in-line images.
Now use the long names for the Flate and DCT filters in all non-inline PDF streams. This avoids a
stupid bug in Acrobat Reader when printing to PostScript printers.
HTMLDOC now strips any trailing GET query information when saving the start of files (target) in a
document.
Unqualified URLs (no leading scheme name, e.g. http:) now default to the HTTP port (80) instead of
the IPP port (631).
Optimized the image writing code to do more efficient color searching. This provides a significant
speed improvement when including images.
Now hide all text inside SCRIPT, SELECT, and TEXTAREA elements.
Bug Fixes
If a document started with a heading greater than H1, HTMLDOC would crash.
Full justification would incorrectly be applied to text ending with a break.
Images using ALIGN="MIDDLE" were not centered properly on the baseline.
Table cells that used both ROWSPAN and COLSPAN did not format properly (the colspan was lost
after the first row.)
Tables that used cells that exclusively used COLSPAN did not format properly.
When writing HTML output, image references would incorrectly be mapped using the current path.
Images with a width or height of 0 should not be written to PS or PDF output.
The CreationDate comment in PostScript output contained a bad timezone offset (+-0500, for
example, instead of -0500).
The PHP portal example now verifies that the URL passed to it contains no illegal characters.
Changes
Most output generation limits have been removed; HTMLDOC now dynamically allocates memory as
needed for pages, images, headings, and links. This has the happy side-effect of reducing the initial
memory footprint significantly.
Now call setlocale() when it is available to localize the date and time in the output.
The table parsing code now checks to see that a ROWSPAN attribute fits in the table; e.g., a
ROWSPAN of 10 for a table that has only 6 rows remaining needs to be reduced to 6...
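The ROWSPAN check described in the last item amounts to clamping the attribute to the rows that remain. The Python sketch below illustrates the arithmetic only; it is not HTMLDOC's actual table code, and the function name clamp_rowspan is hypothetical.

```python
def clamp_rowspan(rowspan, rows_remaining):
    """Reduce a ROWSPAN value so that it fits in the rows left in the table."""
    return min(rowspan, rows_remaining)


# The example from the release note: ROWSPAN of 10 with 6 rows remaining.
print(clamp_rowspan(10, 6))
```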
Bug Fixes
Tables with a lot of COLSPANs could cause a divide-by-zero error or bad pages (NAN instead of a
number.)
Table cells with a single render element would not be vertically aligned.
The --quiet option would enable progress messages on the command-line.
Table cell widths could be computed incorrectly, causing unnecessary wrapping.
Changes
External file references to non-PDF files now use the "Launch" action so they can be
opened/executed/saved as allowed by the OS and PDF viewer.
Changed the indexed/JPEG'd transition point to 256 colors when using Flate compression. This makes
PDF files much smaller in general.
Changed the in-line image size limit to 64k.
Now allow "<" followed by whitespace, "=", or "<". This violates the HTML specification, but we're
sick of people complaining about it.
Preferences are now stored in a user-specific file under Windows, just like UNIX. This provides
user-specific preferences and allows preferences to be kept when upgrading to new versions of
HTMLDOC.
The book loading code now allows for blank lines, even though these are not a part of the format.
(added to support some scripted apps that include extra newlines...)
Changed the leading space handling of blocks to more closely match the standard browser behavior.
Bug Fixes
The table formatting code added the border width to the cell width, while Netscape and MSIE don't.
This caused some interesting formatting glitches...
The table formatting code didn't account for the preferred width of colspan'd cells.
The table formatting code tried to enforce the minimum cell width when squeezing a table to fit on the
page; this caused the table to still exceed the width of the page.
The PDF catalog object could contain a reference to a /Names object of "0 0 R", which is invalid.
This would happen when the "--no-links" option was used.
Several HTML elements were incorrectly written with closing tags.
When piping PDF output, the temporary file that is created needed to be open for reading and writing,
but HTMLDOC only opened the file for writing.
Image links did not work.
The JPEG image loading code did not correctly handle grayscale JPEG images.
JPEG images were not encrypted when writing a document with encryption enabled.
The user password was not properly encrypted.
The colormap of indexed images was not encrypted when writing a document with encryption
enabled.
The temporary file creation and cleanup functions did not use the same template under Windows,
causing multiple conversions to fail when temporary files were used.
Paragraphs could end up with one extra text fragment, causing the line to be too long.
The command-line program would clear the error count after reading all the files/URLs on the
command-line, but before generating the document. If there were problems reading the files/URLs,
HTMLDOC would return a 0 exit status instead of 1.
Image objects that were both JPEG and Flate compressed would not display (filters specified in the
wrong order.)
Images with more than 256 colors would cause a segfault on some systems.
Background images would generate the error message "XObject 'Innn' is unknown".
Changes
Consolidated temporary file management into new file_temp() function. The new function also makes
use of the Windows "short lived" open option which may improve performance with small temporary
files.
Updated book file format and added an appendix describing the format.
Now default to PDF 1.3 (Acrobat 4.0) output format.
Now output length of PDF streams with the stream object; this offers a modest reduction in file size.
The HTTP file cache now keeps track of previous URLs that were downloaded.
The HTTP code now supports redirections (status codes 301 to 303) to alternate URLs.
Limit the height check for table rows to 1/8th of the page length; this seems to provide fairly
consistent wrapping of tables without leaving huge expanses of blank space at the bottom of pages.
The HTML output now also includes a font-family style for PRE text; otherwise the body font would
override the PRE font with some browsers.
The snprintf/vsnprintf emulation functions were not included in the HTMLDOC makefile.
RGB hex colors are now recognized with or without the leading #. This breaks HTML standards
compliance but should reduce the number of problem reports from buggy HTML.
The stylesheet generated with the HTML output no longer contains absolute font sizes, just the
typefaces and a relative size for SUB/SUP.
The title image is no longer scaled to 100% in the HTML output.
Bug Fixes
The web page output was not divided into chapters for each input file.
The "make install" target did a clean.
The configure script would remove the image libraries if you did not have FLTK installed.
The fix_filename() function didn't handle relative URLs for images (e.g.
SRC="../images/filename.gif")
Comments in the source document were being closed by a ".
The command-line and GUI interfaces looked for "outlines" instead of "outline" for the page mode.
The HTML output code didn't output closing tags for empty elements.
The GUI interface started with the compression slider enabled, even for HTML output.
The beginnings of some lines could start with whitespace.
Wasn't aligning images and text on lines based on the line height.
The compression slider was enabled in the GUI even though HTML output was selected.
The Perl example code was incorrect.
Fixed the check for whether or not pages were generated.
htmlSetCharSet() wasn't reloading the character set data if the data directory changed.
The GUI did not reset the default background color.
The 'C' page number style (chapter page numbers) started at 3 instead of 1.
The chapter links were off by 1 or 2 pages when no title page was included.
Changes
Added missing casts in htmllib.cxx that were causing a compile warning with some compilers.
No longer draw borders around empty cells in tables...
Now disable the TOC tab when using webpage mode.
Now scale title image to 100% in HTML output.
Now handle comments with missing whitespace after the "<!--".
Bug Fixes
Nested tables didn't take into account the table border width, spacing, or padding values.
HTMLDOC crashed under Solaris when reading HTML files from the standard input.
<ELEM>text</ELEM> <MELE>text</MELE> was rendered without an intervening space.
Changes
Increased default MAX_PAGES to 10000 (was 5000.)
File links in book files now point to the top of the next page.
<TABLE ALIGN=xyz> now aligns the table (previously it just set the default alignment of cells.)
Transparent GIFs now use the body color instead of white for the transparent color.
Updated to LIBPNG 1.0.6 in source distribution.
Updated the default cellpadding to be 1 pixel to match Netscape output.
Updated line and block spacing to match Netscape.
DL/DT/DD output now matches browsers (was indented from browser output.)
Now only output link (A) style if it is set to "none". Otherwise Netscape would underline all targets as
well as links.
Increased the MAX_COLUMNS constant to 200, and dropped MAX_ROWS to 200. Note that the
new table code now allocates rows in increments of MAX_ROWS rows, so the actual maximum
number of rows depends on available memory.
Bug Fixes
Now ignore illegal HTML in tables.
The VALIGN code didn't handle empty cells properly.
Changes
Minor source changes for OS/2 compilation.
SUP and SUB now raise/lower text more to be consistent with browser look-n-feel.
Non-breaking space by itself was being output. Now check for that and ignore strings that consist
entirely of whitespace.
New progress bar.
Bug Fixes
Didn't add whitespace after a table caption.
Nested tables caused formatting problems (flatten_tree() didn't insert breaks for new rows)
A cell whose minimum width exceeded the available width for the table would cause the table to go
off the page.
Cells that spanned more than two pages were drawn with boxes around them rather than just the sides.
The stylesheet info in the HTML output specified the H1 size for all headings.
The title page was incorrectly formatted when an image was specified - the text start position was
computed using the pixel height of the title image and not the formatted height.
1 color images didn't come out right; the "fix" to work around an Acrobat Reader bug was being done
too soon, so the color lookups were wrong.
HTML file links now work properly.
Now limit all HTML input to the maximum size of input buffers to avoid potential buffer overflow
problems in CGIs.
If a row had a predefined height, HTMLDOC wasn't making sure that the row would fit on the current
page.
THEAD, TFOOT, and TBODY caused problems when formatting tables. Note: THEAD and TFOOT
are *still* not supported, however the code now properly ignores them and parses the rows in the
TBODY group.
The VALIGN code introduced in the 1.8.5 release didn't check for NULL pointers in all cases.
New Features
New "--titlefile" option to include an HTML file for the title page(s).
New 'C' header/footer option to show current page number within chapter or HTML file.
Allow adding of .book files to import all HTML files in the book.
New "HALF PAGE" page comment to feed 1/2 page.
Added VALIGN and HEIGHT support in tables.
Changes
Now optimize link objects in PDF files (provides a 40k reduction in file size for the HTMLDOC
manual alone)
Table rows that cross page boundaries are now rendered more like Netscape and MSIE.
Now support HTMLDOC_DATA and HTMLDOC_HELP environment variables under UNIX (for
alternate install directory)
Now show error messages when HTMLDOC can't open the AFM, character set, or PostScript glyph
files.
The logo image is now scaled to its "natural" size (as it would appear in a web browser)
Now recognize VALIGN="MIDDLE" or VALIGN="CENTER".
Bug Fixes
Generation of PDF files to the standard output (i.e. to the web server + browser) didn't work on some
versions of UNIX. HTMLDOC now writes the PDF output to a temporary file and then copies it to
the standard output as needed.
PDF links were missing the first 5 characters in the filename; the code was trying to skip over the
"file:" prefix, but that prefix was already skipped elsewhere.
Nested descriptive lists (DL) did not get rendered properly.
Tables had extra whitespace before and after them.
Multiple aligned images confused parse_paragraph(); the images would overlap instead of stack on
the sides.
Bug Fixes
The Fonts and Colors tab groups did not extend to the full width of the tab area, which prevented the
Browse button from working when clicked on the right side.
The help dialog window did not scroll all the way to the bottom of the text.
The chapter ("c") header/footer string did not work.
The heading ("h") header/footer string did not always match the first heading on a page.
The header and footer fonts were not used when computing the widths of the header and footer
strings.
The Windows distribution did not create the right shortcut for the Users Manual in the Start menu.
The command-line code did not accept "--grayscale", only "--gray"
Multi-file HTML output did not use the right link for the table-of-contents file if no title page was
being generated.
Extra whitespace before and after tables has been eliminated.
Changes
The configure script now looks for the OpenGL library (required if you use a shared FLTK library
with OpenGL support.)
Increased the max number of chapters to 1000.
Bug Fixes
Page break comments didn't force a paragraph break.
--no-toc prevented chapters from being output in PS and PDF files.
Filenames didn't always get updated properly when doing a "save as"...
Fixed some more leading/trailing whitespace problems.
Wasn't freeing page headings after the document was generated.
Wasn't range checking the current chapter number; now limits the number of chapters to
MAX_CHAPTERS and issues an error message whenever the limit is exceeded.
Changes
Documentation updated for new UNIX "setup" program and "..." usage for headers and footers.
Changed margins to floating point (instead of integer) to improve table column accuracy.
Bug Fixes
HTMLDOC could crash under Microsoft Windows with some types of HTML files. This was caused
by a stack overflow, usually when processing nested tables.
Multiple HTML files weren't being converted properly in web page mode - only the last file would be
generated for PostScript output, and no file for PDF output.
Wasn't preserving the whitespace between "one" and "two" in the HTML code "one<I> two</I>
three".
Paragraph spacing was inconsistent.
<TABLE WIDTH="xx"> wasn't formatted properly.
The command-line code wasn't opening HTML files in binary mode. This caused problems under
Microsoft Windows.
Bug Fixes
Wasn't using TOC title string in PDF document outline.
Preformatted text in tables didn't force the column width.
Cells using COLSPAN > 1 didn't contribute to the width of columns.
The table code didn't enforce the per-column minimums under certain circumstances, causing
"scrambled" columns.
The configure script and makefiles didn't work when FLTK was not available. They now only build
the "gui" library when it is available.
The Windows distribution was installing files under PROGRAMDIR instead of TARGETDIR. This
prevented users from customizing the installation directory.
The configure script overrode the LDFLAGS environment variable, preventing FLTK from being
located in a non-default directory.
Changes
Lots of documentation changes.
Much better table formatting.
Changed HTML output to use less invasive navigation bars at the top and bottom of each file. This
also means that the "--barcolor" option is no longer supported!
Changes
Bug Fixes
Wasn't escaping &, <, or > in HTML output.
Wasn't preserving
Links in multi-file HTML output were off-by-one.
BLOCKQUOTE needed to be like CENTER and DIV.
Needed to use existing link name if present for headings to avoid nested link name bug in Netscape
and MSIE.
Extremely long link names could cause TOC generation to fail and HTMLDOC to crash.
PDF output was not compatible with Ghostscript/Ghostview because Ghostscript does not support
inherited page resources or the "Fl" abbreviation for the "FlateDecode" compression filter.
PostScript DSC comments didn't have unique page numbers. This caused Ghostview (among others)
to get confused.
Some functions didn't handle empty text fragments.
Images couldn't be scaled both horizontally and vertically.
<LI> didn't support the VALUE attribute (but <OL> did...)
Fixed whitespace problems before and after some markups that were caused by intervening links.
The indexed image output code could generate an image with only 1 color index used, which upset
Acrobat Reader.
Fixed a bug in table-of-contents handling - HTMLDOC would crash on some systems if you
converted a web page on the command-line.
Wasn't setting the font size and spacing soon enough when generating files on the command-line.
Didn't hide EMBED elements when generating indexed HTML files.
Didn't always set the current drawing position before drawing a box or line.
Base85 encoding of image data was broken for PostScript output.
JPEG compression was broken for PostScript output.
Didn't set binary mode for the standard output under Windows and OS/2 as needed.
Changes
Bug Fixes
This chapter describes the steps needed to install HTMLDOC on your system from the source distributions.
Requirements
HTMLDOC requires ANSI C and C++ compilers - recent versions of GCC/EGCS work fine. To build the
GUI you'll also need:
Fast Light Tool Kit ("FLTK"), version 1.1 or higher.
X11 libraries, R5 or higher (needed to build under UNIX and OS/2 only.)
Secure (https) URL support can be enabled via the OpenSSL library. You should use at least version 0.9.6l.
[C Shell]
[Bourne/Korn Shell]
Similarly, if your C++ compiler is not called CC, gcc, c++, or g++, set the CXX environment variable to the
name and path of your C++ compiler:
% setenv CXX /path/to/compiler ENTER [C Shell]
% CXX=/path/to/compiler; export CXX ENTER [Bourne/Korn Shell]
The default configuration will install HTMLDOC in the /usr/bin directory with the data files under
/usr/share/htmldoc and the documentation and on-line help under /usr/share/doc/htmldoc. Use the
--prefix option to change the installation prefix to a different directory such as /usr/local:
% ./configure --prefix=/usr/local ENTER
If the OpenSSL library is not installed in a standard location for your compilers, use the
--with-openssl-includes and --with-openssl-libs options to point to the OpenSSL library:
% ./configure --with-openssl-libs=/path/to/openssl/lib \
--with-openssl-includes=/path/to/openssl ENTER
HTMLDOC is built from a Makefile in the distribution's main directory. Simply run the "make" command to
build HTMLDOC:
% make ENTER
If you get any fatal errors, please report them on the htmldoc.general newsgroup at:
http://www.easysw.com/newsgroups.php
Please note the version of HTMLDOC that you are using as well as any pertinent system information such as
the operating system, OS version, compiler, and so forth. Omitting this information may delay or prevent a
solution to your problem.
Once you have compiled the software successfully, you may install HTMLDOC by running the following
command:
% make install ENTER
If you are installing in a restricted directory like /usr then you'll need to be logged in as root.
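The configure, build, and install steps above can be collected into one small Bourne shell script. The following is only a sketch: the PREFIX and DRY_RUN variables and the run and build_htmldoc functions are illustrative names and are not part of the HTMLDOC distribution. By default the script only prints the commands it would run; set DRY_RUN to 0 to execute them from the distribution's main directory.

```sh
#!/bin/sh
# Sketch of the build steps described in this chapter.
# PREFIX, DRY_RUN, run, and build_htmldoc are illustrative names only.

PREFIX="${PREFIX:-/usr/local}"   # installation prefix passed to configure
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = run them

run() {
    # Show the command, then execute it unless this is a dry run.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1           # stop at the first failing step
    fi
}

build_htmldoc() {
    run ./configure --prefix="$PREFIX"
    run make
    run make install             # needs root for restricted prefixes such as /usr
}

build_htmldoc
```

Stopping at the first failure keeps a broken configure or make step from being followed by a partial install.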