Better Literate Programming

Better ways to do literate programming according to the original author, Norman Ramsey...

Literate-programming tools let you arrange the parts of a program in
any order and extract documentation and code from the same source
file. The author argues that language dependence and feature
complexity have hampered acceptance of these tools, then offers a
simpler alternative.
NORMAN RAMSEY
Bellcore
LITERATE PROGRAMMING SIMPLIFIED
In 1983, Donald Knuth introduced
literate programming in the form of
Web, his tool for writing literate Pascal
programs. Web lets authors interleave
source code and descriptive text in a
single document. It also frees authors
to arrange the parts of a program in an
order that helps explain how the pro-
gram functions, not necessarily the
order required by the compiler.
In the mid-80s, word spread about
this new programming method as sev-
eral literate programs were published.
In 1987, Communications of the ACM
created a special forum to discuss liter-
ate programming.2 Web was adapted
to programming languages other than
Pascal, including C, Modula-2,
Fortran, Ada, and others.3-6 With expe-
rience, however, many Web users
became dissatisfied. Continued inter-
est in literate programming led to a
frenzy of tool building. In the resulting
confusion, the literate-programming
forum was dropped, on the grounds
that literate programming had become
the province of those who could build
their own tools.
The proliferation of literate-pro-
gramming tools made it hard for liter-
ate programming to enter the main-
stream, but it led to a better under-
standing of what such tools should do.
Today the field is more mature, and
there is an emerging demand for tools
that are simple, easy to learn, and not
tied to a particular programming language.
My own literate-programming tool,
noweb, fills this niche. Freely available
IEEE SOFTWARE 0740-7459/94 © 1994 IEEE
AN EXAMPLE OF NOWEB: COUNTING WORDS
This example, based on a program by Klaus Guntermann and Joachim Schrod and a program by Silvio Levy and D. E. Knuth, presents the "word count" program from Unix, rewritten in noweb to demonstrate literate programming using noweb. The level of detail in this document is intentionally high, for didactic purposes; many of the things spelled out here don't need to be explained in other programs.
The purpose of wc is to count lines, characters, and/or words in a list of files. The number of lines in a file is the number of newline characters it contains. The number of characters is the file length in bytes. A "word" is a maximal sequence of consecutive characters other than newline, space, or tab, containing at least one visible ASCII code. (We assume that the standard ASCII code is in use.)
Most literate C programs share a common structure. It's probably a good idea to state the overall structure explicitly at the outset, even though the various parts could all be introduced in chunks named <*> if we wanted to add them piecemeal.
Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
<*>= 98a
Root chunk (not used in this document).
We must include the standard I/O definitions because we want to send formatted output to stdout and stderr.
<Header files to include 98b>= 98b
    #include <stdio.h>
This code is used in chunk 98a.
The status variable will tell the operating system if the run was successful or not, and prog_name is used in case there's an error message to be printed.
<Definitions 98c>= 98c
    #define OK 0
    /* status code for successful run */
    #define usage_error 1
    /* status code for improper syntax */
    #define cannot_open_file 2
    /* status code for file access error */
Defines:
cannot_open_file, used in chunk 100c.
OK, used in chunk 98d.
usage_error, used in chunk 102d.
Uses status 98d.
This definition is continued in chunks 100b, 100e, and 102c.
This code is used in chunk 98a.
<Global variables 98d>= 98d
on the Internet since 1989, noweb strips literate programming to its essentials. Programs are composed of named chunks of code, written in any order, with documentation interleaved.
To facilitate comparison of Web and noweb, a sample noweb program appears in the shaded box that runs throughout this article. I took the text, code, and presentation for this sample from Knuth's Literate Programming.
Noweb was developed on Unix and can be ported to non-Unix platforms provided they can simulate pipelines and support both ANSI C and either awk or Icon. For example, Kean College's Lee Wittenberg ported noweb to MS-DOS. Noweb is unique among literate-programming tools in its pipelined, extensible implementation, which makes it easy for experimenters to create new features without writing their own tools.
WEB'S COMPLEXITIES
Web's complexities make it difficult to explore the idea of literate programming because too much effort is required to master the tool. To compound the difficulty, different programming languages are served by different versions of Web, each with its own idiosyncrasies.
The classic Web expands three kinds of macros, prettyprints code for typeset output, evaluates some constant expressions, adds string support to Pascal, and implements a simple form of version control. The manual documents 27 control sequences. Versions for languages other than Pascal offer slightly different functions and different sets of control sequences.
Web uses its Tangle tool to produce source code and its Weave tool to produce documentation. Web's original Tangle removed white space and folded lines to fill each line with
Figure 1. A noweb source fragment from the example program.
SEPTEMBER 1994
This code is used in chunk 98a.
Now we come to the general layout of the main function.
<The main program 99a>= 99a
    main(argc, argv)
    int argc;
    /* # arguments on Unix command line */
    char **argv;
tokens, making its output unreadable. Later adaptations preserved line breaks but removed other white space. Web's Weave divides a program into numbered sections, and its index and cross-reference information refer to section numbers, not page numbers. Web works poorly with LaTex: LaTex constructs cannot be used in Web source, and getting Weave output to work in LaTex documents requires tedious adjustments by hand. Weave's source (written in Web) is several thousand lines long, and the formatting code is not isolated.
NOWEB'S FEATURES
Noweb's simplicity derives from a simple model of files, which are marked up using a simple syntax. Figure 1 shows a fragment of the noweb source used to generate the boxed sample program. It shows examples of chunk definitions and uses, quoted code, and lists of defined identifiers, which together cover all of noweb's syntax except escaped angle brackets.
File structure. A noweb file is a sequence of chunks. A chunk may contain code, in which case it is named, or documentation, in which case it is unnamed. Chunks may appear in any order. Each code chunk begins with <<chunk name>>= on a line by itself. The double-left angle bracket must be in the first column. Each documentation chunk begins with a line that starts with an @ symbol followed by a space or newline. Chunks are terminated implicitly by the beginning of another chunk or by the end of the file. If the first line in the file does not mark the beginning of a chunk, noweb assumes it is the first line of a documentation chunk.
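The file-structure rules above can be sketched in a few lines of Python. This is an illustration only, not noweb's own markup stage: the names split_chunks and SAMPLE are hypothetical, and the sketch ignores quoted code, @ %def lines, and escaped angle brackets.

```python
import re

# A code chunk starts with <<name>>= on a line by itself, "<<" in column one.
CODE_START = re.compile(r"^<<(.*)>>=\s*$")

def split_chunks(source):
    """Split noweb source into (kind, name, lines) tuples.

    kind is "code" or "docs"; name is None for documentation chunks.
    """
    chunks = []
    kind, name, body = "docs", None, []   # a file opens in documentation
    for line in source.splitlines():
        code = CODE_START.match(line)
        starts_docs = line == "@" or line.startswith("@ ")
        if code or starts_docs:
            # A new chunk implicitly terminates the previous one.
            if body or kind == "code":
                chunks.append((kind, name, body))
            if code:
                kind, name, body = "code", code.group(1).strip(), []
            else:
                kind, name, body = "docs", None, []
                if line[2:]:
                    body.append(line[2:])   # text after "@ " is documentation
        else:
            body.append(line)
    if body or kind == "code":
        chunks.append((kind, name, body))
    return chunks

SAMPLE = """\
This text is documentation.
<<*>>=
<<greeting>>
@ More documentation.
<<greeting>>=
print("hello")
"""
```

Calling split_chunks(SAMPLE) yields four chunks in source order: documentation, the code chunk named *, documentation, and the code chunk named greeting.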
As Figure 2 shows, noweb uses its notangle and noweave tools to extract code and documentation, respectively. When notangle is given a noweb file, it writes the program on standard output. When noweave is given a noweb file, it reads it and produces, on standard output, Tex source for typeset documentation.
    /* the arguments, an array of strings */
    {
        <Variables local to main 99b>
        prog_name = argv[0];
        <Set up option selection 99c>
        <Process all the files 99d>
        <Print the grand totals if there were multiple files 102b>
        exit(status);
    }
Defines:
argc, used in chunks 99c and 99d.
argv, used in chunks 99c, 100c, and 101e.
main, never used.
Uses prog_name 98d and status 98d.
This code is used in chunk 98a.
<Variables local to main 99b>= 99b
    int file_count;
    /* how many files there are */
    char *which;
    /* which counts to print */
Defines:
file_count, used in chunks 99c, 100c, 101e, and 102b.
which, used in chunks 99c, 101e, 102b, and 102d.
This definition is continued in chunks 100a and 100f.
This code is used in chunk 99a.
If the first argument begins with a -, the user is choosing the desired counts and specifying the order in which they should be displayed. Each selection is given by the initial character (lines, words, or characters). For example, -cl would cause just the number of characters and the number of lines to be printed, in that order. We do not process this string now; we simply remember where it is. It will be used to control the formatting at output time.
<Set up option selection 99c>= 99c
    which = "lwc";
    /* if no option is given print 3 values */
    if (argc > 1 && *argv[1] == '-') {
        which = argv[1] + 1;
        argc--;
        argv++;
    }
    file_count = argc - 1;
Uses argc 99a, argv 99a, file_count 99b, and which 99b.
This code is used in chunk 99a.
Now we scan the remaining arguments and try to open a file if possible. The file is processed and its statistics are given. We use a do ... while loop because we should read from the standard input if no file name is given.
<Process all the files 99d>= 99d
    argc--;
    do {
        <If a file is given, try to open *(++argv); continue if unsuccessful 100c>
        <Initialize pointers and counters 101a>
        <Scan file 101c>
        <Write statistics for file 101e>
        <Close file 100d>
        <Update grand totals 102a>
        /* even if there is only one file */
    } while (--argc > 0);
Uses argc 99a.
This code is used in chunk 99a.
Here's the code to open the file. A special trick allows us to handle input from stdin when no name is given. Recall that the file descriptor to stdin is 0; that's what we use as the default initial value.
<Variables local to main 99b>+= 100a
    int fd = 0;
    /* file descriptor, initialized to stdin */
Defines:
fd, used in chunks 100c, 100d, and 101d.
<Definitions 98c>+= 100b
    #define READ_ONLY 0
    /* read access code for system open */
Defines:
READ_ONLY, used in chunk 100c.
<If a file is given, try to open *(++argv); continue if unsuccessful 100c>= 100c
    if (file_count > 0
        && (fd = open(*(++argv), READ_ONLY)) < 0) {
        fprintf(stderr,
                "%s: cannot open file %s\n",
                prog_name, *argv);
        status |= cannot_open_file;
        file_count--;
        continue;
    }
Uses argv 99a, cannot_open_file 98c, fd 100a, file_count 99b, prog_name 98d, READ_ONLY 100b, and status 98d.
This code is used in chunk 99d.
<Close file 100d>= 100d
    close(fd);
Uses fd 100a.
This code is used in chunk 99d.
We will do some homemade buffering in order to speed things up: Characters will be read into the buffer array before we process them. To do this we set up appropriate pointers and counters.
<Definitions 98c>+= 100e
    #define buf_size BUFSIZ
    /* stdio.h BUFSIZ chosen for efficiency */
Defines:
buf_size, used in chunks 100f and 101d.
Figure 2. Using noweb to build code and documentation.
Code chunks. Code chunks contain program source code and references to other code chunks. Several code chunks may have the same name; notangle concatenates their definitions to produce a single chunk. Code-chunk definitions are like macro definitions: Notangle extracts a program by expanding one chunk (by default the chunk named <<*>>). The definition of that chunk contains references to other chunks, which are themselves expanded, and so on. Figure 3 shows part of the boxed sample program as extracted by notangle. Notangle's output is readable; it preserves white space and maintains the indentation of expanded chunks with respect to the chunks in which they appear. This behavior allows noweb to be used with languages like Miranda and Haskell, in which indentation is significant.
When double-left and -right angle brackets are not paired, they are treated as literals. Users can force any such brackets, even paired brackets, to be treated as literal by using a preceding @ sign.
Any line beginning with @ and a space terminates a code chunk. If such a line has the form @ %def identifiers it also means that the preceding chunk defines the identifiers listed in identifiers. This notation provides a way of marking definitions manually when no automatic marking is available.
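The macro-like expansion of code chunks can be sketched as follows. This is a simplified illustration in Python, not notangle itself: the function names and the sample are hypothetical, at most one chunk reference per line is expanded, and the @ escape for literal angle brackets is not handled. It does show the two behaviors the text singles out: definitions with the same name are concatenated, and expanded chunks keep the indentation of the reference.

```python
import re

DEFN = re.compile(r"^<<(.*)>>=\s*$")   # chunk definition line
USE = re.compile(r"<<(.*?)>>")         # chunk reference inside code

def collect(source):
    """Map each chunk name to its code lines, concatenating repeats."""
    defs, name = {}, None
    for line in source.splitlines():
        m = DEFN.match(line)
        if m:
            name = m.group(1).strip()
            defs.setdefault(name, [])
        elif line == "@" or line.startswith("@ "):
            name = None                 # documentation: not part of any code
        elif name is not None:
            defs[name].append(line)
    return defs

def expand(defs, name="*", seen=()):
    """Expand one chunk recursively, keeping the reference's indentation."""
    if name not in defs:
        raise KeyError(f"undefined chunk: {name}")
    if name in seen:
        raise ValueError(f"chunk {name} refers to itself")
    out = []
    for line in defs[name]:
        m = USE.search(line)
        if m is None:
            out.append(line)
            continue
        prefix, suffix = line[:m.start()], line[m.end():]
        inner = expand(defs, m.group(1).strip(), seen + (name,))
        pad = " " * len(prefix)
        for i, sub in enumerate(inner):
            out.append((prefix if i == 0 else pad) + sub)
        if inner:
            out[-1] += suffix
        else:
            out.append(prefix + suffix)
    return out

SAMPLE = """\
@ The main loop.
<<*>>=
while True:
    <<body>>
@ The body is defined in two pieces.
<<body>>=
step()
<<body>>=
if done():
    break
"""
```

Expanding the root chunk of SAMPLE with expand(collect(SAMPLE)) gives a four-line loop in which both pieces of body appear, in order, indented under while True:.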
Documentation chunks. Documentation chunks contain text that is ignored by notangle and copied verbatim to standard output by noweave (except for quoted code). Code may be quoted within documentation chunks by placing double square brackets around it. These brackets are ignored by notangle but are used by noweave to give the quoted code special typographic treatment. For example, in the sample program, quoted code is set in the Courier font.
Noweave can work with LaTex, or it can use a plain Tex macro package, supplied with noweb, that defines commands like \chapter and \section. Noweave can also work with HTML, the hypertext markup language for Mosaic and the World-Wide Web. The example simulates the results after processing by noweave and LaTex.
Noweave adds no newline characters to its output, making it easy to find the sources of Tex or LaTex errors. For example, an error on line 634 of a generated Tex file is caused by a problem on line 634 of the corresponding noweb file.
Index and cross-reference features. Cross-referencing of chunks and identifiers makes large programs easier to understand. The sample program accompanying this article shows full cross-reference information.
Unlike Web, noweb does not introduce numbered sections for cross-referencing. Noweb uses page numbers. If two or more chunks appear on a page, say page 24, they are distinguished by appending a letter to the page number: 24a or 24b, for example. Readers of large literate programs will appreciate the use of a single numbering system.
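The page-based labeling rule can be illustrated with a short Python sketch. It is not noweb's LaTex code: the function name page_labels is hypothetical, and it assumes that a chunk alone on its page is labeled by the bare page number and that no page holds more than 26 chunks.

```python
from collections import Counter
from string import ascii_lowercase

def page_labels(pages):
    """Label chunks given the page number of each chunk, in document order."""
    per_page = Counter(pages)   # how many chunks land on each page
    seen = Counter()            # how many chunks on a page are labeled so far
    labels = []
    for page in pages:
        if per_page[page] == 1:
            labels.append(str(page))             # a lone chunk needs no letter
        else:
            labels.append(f"{page}{ascii_lowercase[seen[page]]}")
            seen[page] += 1
    return labels
```

For example, page_labels([24, 24, 25]) labels the first two chunks 24a and 24b and the third 25.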
Like Web, noweb writes chunk-cross-reference information in a footnote font below each code chunk. Noweb also includes cross-reference information for identifiers, for example,
Defines file_count, used in chunks 7, 11, 19, and 21.
Noweb generates this by using the @ %def markings in its source code, or by recognizing definitions automatically. Although noweb can automatically recognize definitions in C programs, I used @ %def to mark the definitions in the sample program. This choice not only illustrates the use of @ %def but it also ensures results compatible with the CWeb version of this program. Automatically generated indices would differ because CWeb and noweb use different recognition heuristics. Because noweb uses a language-independent heuristic to find identifier uses, it can be fooled into finding false uses in comments or string literals, like the use of status in chunk 3.
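The article does not spell out the heuristic, so the following Python sketch is an assumption about what a language-independent search might look like: it reports a use wherever a defined identifier occurs as a whole word, with no knowledge of comments or strings. The function name find_uses is hypothetical.

```python
import re

def find_uses(identifiers, code):
    """Return the defined identifiers that occur as whole words in code."""
    found = []
    for ident in identifiers:
        # Whole-word match only: no identifier character on either side.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(ident) + r"(?![A-Za-z0-9_])"
        if re.search(pattern, code):
            found.append(ident)
    return sorted(found)

# The word "status" appears only inside a C comment, yet it is reported.
DEFINITION_LINE = "#define OK 0 /* status code for successful run */"
```

Here find_uses(["status", "prog_name"], DEFINITION_LINE) reports status, which is the kind of false use the text describes; avoiding it would require language-specific knowledge of comment syntax.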
Compiler and debugger support. On a large project, it is essential that compilers and other tools refer to locations in the noweb source, even though they work with notangle's output. Giving notangle the -L option makes it emit pragmas that inform compilers of the placement of lines in the noweb source. It also preserves the columns in which tokens appear, so that line-and-column error messages are accurate. If you do not give notangle the -L option, it respects the indentation of its input, making its output easy to read.
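As a rough illustration of the idea, the Python sketch below inserts a line marker whenever the extracted code jumps to a different place in the source. It assumes the pragma is the C preprocessor's #line directive; the exact text notangle emits is not given in this article, and the function name with_line_directives is hypothetical.

```python
def with_line_directives(lines, filename):
    """Insert '#line N "file"' wherever the source position jumps.

    lines: (source_line_number, text) pairs in the order they are emitted.
    """
    out = []
    expected = None                 # source line the compiler would assume next
    for number, text in lines:
        if number != expected:
            out.append(f'#line {number} "{filename}"')
        out.append(text)
        expected = number + 1
    return out
```

Consecutive source lines need no marker; a marker appears only at the start and after each jump, such as when an expanded chunk begins or ends.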
Formatting features. Noweave depends on text formatters in
two ways: in the source of noweave itself and in the supporting
macros. Noweave's dependence on its formatter is small and
isolated, instead of being distributed throughout a large imple-
mentation. Noweb uses 250 lines of source for Tex and LaTex
combined, and another 250 for HTML. It uses about 200 lines
of supporting macros for plain Tex and another 300 lines to
support LaTex, primarily because the page-based cross-refer-
ence mechanism is complex. LaTex support without cross-ref-
    main(argc, argv)
    int argc;
    /* the number of arguments on the UNIX
       command line */
    char **argv;
    /* the arguments themselves, an array
       of strings */
    {
        int file_count;
        /* how many files there are */
        char *which;
        /* which counts to print */
        int fd = 0;
        /* file descriptor, initialized to stdin */
        char buffer[buf_size];
        /* we read the input into this array */
        register char *ptr;
        /* the first unprocessed character in buffer */
        register char *buf_end;
        /* the first unused position in buffer */
        register int c;
        /* current character, or number of characters
           just read */
        int in_word;
        /* are we within a word? */
        long word_count, line_count, char_count;
        /* number of words, lines, and characters
           found in file so far */
        prog_name = argv[0];
        which = "lwc";
Figure 3. Part of the example program after extraction by notangle.
<Initialize pointers and counters 101a>= 101a
    ptr = buf_end = buffer;
    line_count = word_count = char_count = 0;
    in_word = 0;
Uses buf_end 100f, buffer 100f, char_count 100f, in_word 100f, line_count 100f, ptr 100f, and word_count 100f.
This code is used in chunk 99d.
The grand totals must be initialized to zero at the beginning of the program. If we made these variables local to main, we would have to do this initialization explicitly; however, C's globals are automatically zeroed. (Or rather, "statically zeroed.") (Get it?)
<Global variables 98d>+= 101b
    long tot_word_count, tot_line_count,
         tot_char_count;
    /* total number of words, lines, chars */
The present chunk, which does the counting that is wc's raison d'être, was actually one of the simplest to write. We look at each character and change state if it begins or ends a word.
<Scan file 101c>= 101c
    while (1) {
        <Fill buffer if it is empty; break at end of file 101d>
        c = *ptr++;
        if (c > ' ' && c < 0177) {
            /* visible ASCII codes */
            if (!in_word) {
                word_count++;
                in_word = 1;
            }
            continue;
        }
        if (c == '\n') line_count++;
        else if (c != ' ' && c != '\t') continue;
        in_word = 0;
        /* c is newline, space, or tab */
    }
Uses in_word 100f, line_count 100f, ptr 100f, and word_count 100f.
This code is used in chunk 99d.
Buffered I/O allows us to count the number of characters
almost for free.
        printf(" %s\n", *argv); /* not stdin */
    else
        printf("\n"); /* stdin */
Uses argv 99a, char_count 100f, file_count 99b, line_count 100f, wc_print 102d, which 99b, and word_count 100f.
This code is used in chunk 99d.
<Update grand totals 102a>= 102a
    tot_line_count += line_count;
    tot_word_count += word_count;
    tot_char_count += char_count;
Uses char_count 100f, line_count 100f, and word_count 100f.
This code is used in chunk 99d.
erencing requires only 34 lines of source and no supporting
macros. HTML requires no supporting macros.
We might as well improve a bit on Unix's wc by displaying the number of files too.
<Print the grand totals if there were multiple files 102b>= 102b
    if (file_count > 1) {
        wc_print(which, tot_char_count,
                 tot_word_count, tot_line_count);
        printf(" total in %d files\n", file_count);
    }
Uses file_count 99b, wc_print 102d, and which 99b.
This code is used in chunk 99a.
The function below prints the values according to the specified options. The calling routine should supply a newline. If an invalid option character is found we inform the user about proper use of the command. Counts are printed in eight-digit fields so they will line up in columns.
<Definitions 98c>+= 102c
    #define print_count(n) printf("%8ld", n)
Defines:
print_count, used in chunk 102d.
<Functions 102d>= 102d
    wc_print(which, char_count, word_count,
             line_count)
    char *which; /* which counts to print */
    long char_count, word_count, line_count;
    /* given totals */
    {
        while (*which)
            switch (*which++) {
            case 'l': print_count(line_count);
                break;
            case 'w': print_count(word_count);
                break;
            case 'c': print_count(char_count);
                break;
            default:
                if ((status & usage_error) == 0) {
                    fprintf(stderr,
                            "\nUsage: %s [-lwc] [filename ...]\n",
                            prog_name);
                    status |= usage_error;
                }
            }
    }
Defines:
wc_print, used in chunks 101e and 102b.
Uses char_count 100f, line_count 100f, print_count 102c, prog_name 98d, status 98d, usage_error 98c, which 99b, and word_count 100f.
This code is used in chunk 98a.
A test of this program against the system wc command on a SparcStation showed the "official" wc was slightly slower. Although that wc gave an appropriate error message for the options -abc, it made no complaints about the options -labc! Dare we suggest the system routine might have been better had its programmer used a more literate approach?
Uncoupling files and programs. The mapping between noweb files and programs is many-to-many; the mapping between files and documents is many-to-one. You combine source files by listing their names on notangle's or noweave's command line. Notangle can extract more than one program from a single source file by using the -R command-line option to identify the root chunks of the different programs.
The simplest example of one-to-many program mapping is that of putting a C header and program in a single noweb file. The header comes from the root chunk <header>, and the program from the default root chunk, <*>. The following Unix commands extract files wc.h and wc.c from noweb file wc.nw.
notangle -L wc.nw > wc.c
notangle -Rheader wc.nw | cpif -ne wc.h
The > in the first command directs notangle's output to the file wc.c. The | in the second command directs notangle's output to the cpif program, which is distributed with noweb. cpif -ne wc.h compares its input to the contents of file wc.h; if they differ, the input replaces wc.h. This trick avoids touching the file wc.h when its contents have not changed, which avoids triggering unnecessary recompilations.
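The compare-then-replace behavior described for cpif can be sketched in Python. This is an illustration of the technique, not cpif's implementation; the function name copy_if_changed is hypothetical, and the sketch treats a missing file as differing from its input.

```python
from pathlib import Path

def copy_if_changed(new_text, path):
    """Write new_text to path only if the file is missing or differs.

    Returns True when the file was written and False when it was left
    untouched, so its modification time stays as it was.
    """
    path = Path(path)
    if path.exists() and path.read_text() == new_text:
        return False                # same contents: do not touch the file
    path.write_text(new_text)
    return True
```

Because an unchanged header keeps its old modification time, a timestamp-based build tool such as make has no reason to recompile the files that include it.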
Because it is language-independent, noweb can combine dif-
ferent programming languages in a single literate program.
This ability makes it possible to explain all of a project's source
in a single document, including not just ordinary code but also
things like make files, test scripts, and test inputs. Using literate
programming to describe tests as well as source code provides a
lasting, written explanation of the thinking needed to create the
tests, and it does so with little overhead. If not documented at
the time, the rationale behind complex tests can easily be lost.
IMPLEMENTING NOWEB
Until now we have discussed noweb from a user's point of view, showing that it is simple and easy to use. Noweb's implementation is also worth discussing, because noweb's extensible implementation makes it unique among literate-programming tools. Noweb tools are implemented as pipelines. Each pipeline begins with the noweb source file. Successive stages of the pipeline implement simple transformations of the source, until the desired result emerges from the end of the pipeline.
Users change or extend noweb not by recompiling but by inserting or removing pipeline stages; for example, noweave switches from LaTex to HTML by changing just the last pipeline stage. Noweb's extensibility enables its users to create new literate-programming features without having to write their own tools.
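The pipeline idea can be sketched with two Python functions standing in for stages. Both are hypothetical simplifications rather than noweb's tools: markup produces a reduced form of the line-oriented representation (every line starts with @ and a keyword, using the @begin, @end, @defn, @use, @text, and @nl keywords, with argument formats that are assumptions), and squeeze_names is a filter in the spirit of the white-space example discussed under EXTENDING NOWEB.

```python
import re

DEFN = re.compile(r"^<<(.*)>>=\s*$")
USE = re.compile(r"<<(.*?)>>")

def markup(source):
    """First stage: noweb source -> list of '@keyword ...' lines."""
    out, kind = [], None
    for line in source.splitlines():
        defn = DEFN.match(line)
        starts_docs = line == "@" or line.startswith("@ ")
        if defn or starts_docs or kind is None:
            if kind is not None:
                out.append(f"@end {kind}")
            kind = "code" if defn else "docs"
            out.append(f"@begin {kind}")
        if defn:
            out.append(f"@defn {defn.group(1)}")
            continue
        if starts_docs:
            line = line[2:]
        pos = 0
        if kind == "code":
            for use in USE.finditer(line):      # split text around chunk uses
                if use.start() > pos:
                    out.append(f"@text {line[pos:use.start()]}")
                out.append(f"@use {use.group(1)}")
                pos = use.end()
        if line[pos:]:
            out.append(f"@text {line[pos:]}")
        out.append("@nl")
    if kind is not None:
        out.append(f"@end {kind}")
    return out

def squeeze_names(lines):
    """A filter: treat chunk names that differ only in white space alike."""
    out = []
    for line in lines:
        keyword, _, rest = line.partition(" ")
        if keyword in ("@defn", "@use"):
            line = f"{keyword} {' '.join(rest.split())}"
        out.append(line)
    return out

SAMPLE = """\
Count words.
<<*>>=
total = <<count   words>>
<<count words>>=
len(text.split())
"""
```

Composing the stages as squeeze_names(markup(SAMPLE)) mirrors inserting a filter into a pipeline: the filter reads and writes the same representation, so stages before and after it need no changes.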
Noweb's syntax is easy to read, write, and edit, but it is not easily manipulated by programs. Markup, which is the first stage in every pipeline, converts noweb source to a representation easily manipulated by common Unix tools like sed and awk, greatly simplifying the construction of later pipeline stages. Middle stages add information to the representation. Notangle's final stage converts to code; noweave's final stages convert to Tex, LaTex, or HTML.
In the pipeline representation, every line begins with @ and a keyword. The most important possibilities appear in Table 1. Markup brackets chunks by @begin . . . @end, and it uses the noweb source to identify text and newlines, definitions and uses of chunks, and quoted code, which can all appear inside chunks. It also preserves information about file names and defined identifiers. Other index and cross-reference information is inserted automatically by later pipeline stages. The details of noweb's pipeline representation are described in the Noweb Hacker's Guide, which is distributed with noweb.
Table 1
@begin              Start a chunk
@end                End a chunk
@text string        string appeared in a chunk
@nl                 A newline appeared in a chunk
@defn name          The code chunk named name is being defined
@use name           A reference to code chunk named name
@quote              Start of quoted code in a documentation chunk
@endquote           End of quoted code in a documentation chunk
@file filename      Name of the file from which the chunks came
@index defn ident   The current chunk contains a definition of ident
@index . . .        Automatically generated index information
@xref . . .         Automatically generated cross-reference information
EXTENDING NOWEB
Noweb lets users insert stages into the notangle and noweave pipelines, so that they can change a tool's existing behavior or add new features without recompiling. Even language-dependent features like formatted output and automatic index generation have been added to noweb without recompiling.
Stages inserted in the middle of a pipeline both read and write noweb's pipeline representation; they are called filters, by analogy with Unix filters, which are used in the Unix implementation.
Filters can be used to change the way noweb works; for example, a one-line sed script makes noweb treat two chunk names as identical if they differ only in their representation of white space, as in Web. A 55-line Icon program makes it possible to abbreviate chunk names using a trailing ellipsis. To share programs with colleagues who don't enjoy literate programming, I use a filter that places each line of documentation in a comment and moves it to the succeeding code chunk. With this filter, notangle transforms a literate program into a traditional commented program, without loss of information and with only a modest penalty in readability.
Filters can be used to add significant features. Noweave's cross-reference and indexing features use two filters, one that finds uses of defined identifiers and one that inserts cross-reference information. In most cases, programmers must mark identifier definitions by hand, using @ %def, but in some cases a third, language-dependent filter can be used to mark identifier definitions, making index generation completely automatic.
Kostas Oikonomou of AT&T Bell Labs, Kaelin Colclasure of Bridge Information Systems, and Conrado Martinez-Parra of the Universidad Politecnica de Catalunya in Barcelona have written noweb filters that add prettyprinting for Icon, C++, and Dijkstra's language of guarded commands, respectively.
Noweb turns a World-Wide-Web browser like Mosaic into a hypertext browser for literate programs. For example, you can click on an identifier or chunk name to jump to the definition of that identifier or chunk. You can find a hypertext version of the boxed sample program at ftp://bellcore.com/pub/norman/noweb/wc.html.
EVALUATING NOWEB
Reviewers have had many expectations of literate-programming tools.10 We expect to be able to write code chunks in any order. We expect to develop code and documentation in one place. Finally, we expect automatically generated cross-reference and index information. Like the original Web, noweb provides all these features, but in simpler form.
Web does provide features that noweb lacks, but existing Unix tools can substitute for most of these. Although noweb contains no internal support for macros, Unix supplies two macro processors that can work with noweb: the C preprocessor and the m4 macro processor. The xstr program extracts string literals, and the patch program provides a form of version control similar to Web's change files.
Indexing and cross-referencing make noweb less simple than it could be. I need complex LaTex code to compute page numbers for use in cross-reference lists and in the index. The ability to use page numbers justifies this complexity, especially since it can be hidden from most users. You do need to understand the LaTex code if you want to customize the appearance of your noweb documents while retaining noweb's use of page numbers for cross-reference. Most literate-programming tools forbid customization, but not all users will accept such a restriction. I have compromised between simplicity and customizability by adding LaTex options for a dozen of the most commonly requested customizations. Users can choose from among these options without understanding noweb's LaTex code.
Experimenting with noweb is easy because the tools are simple. If the experiment is unsatisfying, it is easy to abandon, because notangle's output is readable, and documentation can be preserved as embedded comments. Noweb is simpler than Web and easier to use and understand, but it does less. I argue, however, that the benefit of Web's extra features is outweighed by the cost of the extra complexity, making noweb better for writing literate programs. Few of Web's remaining features will be missed; for example, many compilers evaluate constant expressions at compile time. Noweb users are most likely to miss pretty-printing, but it may be more trouble than it is worth.
In my own work, I have used noweb for code written in various languages, including assembly language, awk, Bourne shell, C, Icon, Modula-3, Promela, Standard ML, and Tex. These projects have ranged in size from a few hundred to twenty thousand lines of code. Information about other programs written using noweb is hard to find. Noweb is provided free of charge, generating no sales records. Programs created with noweb may be delivered in the form of ordinary source code, leaving no clue that noweb was used. The only way for me to find out about uses of noweb is to appeal for information on the Internet. In this way I have learned about significant noweb projects in C++, Modula-2, Occam, parallel C, Perl, Prolog, and Scheme.
LANGUAGE-INDEPENDENT TOOLS LIKE NOWEB ARE SIMPLER AND EASIER TO USE THAN TRADITIONAL COMPLEX TOOLS.
David Hanson and Chris Fraser are using noweb to write a book describing the design and implementation of a retargetable C compiler. Tipton Cole & Company use noweb in their consulting business, which focuses on writing database applications on DOS platforms. They find that noweb helps compensate for some of the deficiencies in DOS database tools, and that literate programming helps when a customer requests a change in a program that hasn't been touched in a year. A customer-support group at Sun Microsystems is using noweb to help teach their customers how to work with aspects of the Solaris operating system like threads and device drivers. The literate-programming paradigm makes it possible to extract working code from the same source used to create technical reports and newsletters.
OTHER TOOLS
A survey of literate-programming
tools is beyond the scope of this article,
but we can still sketch noweb's
place in the context of other tools.
Most literate-programming tools are
language-dependent and complex.
You must change tools when chang-
ing programming languages, repeat-
ing effort spent mastering a tool.
Newer tools, like noweb, are lan-
guage-independent. The three most
prominent are noweb, nuweb, and
Funnelweb.
To users, noweb and nuweb look very similar. There are minor syntactic
differences, and nuweb uses markup within the source file instead of
command-line options to show things like the names of output files,
but both are simple and easy to master. FunnelWeb is a complex tool
that includes its own rudimentary typesetting language and command
shell.
Many of the similarities between noweb and nuweb arise by design.
Nuweb's initial design borrowed from noweb, and later versions of each
tool have incorporated ideas from the other.
Noweb and nuweb differ substantively in implementation. Nuweb is not
pipelined; it is a single, monolithic C program. This structure makes
nuweb easy to port, since only a C compiler is needed, and it makes it
faster, since no parts are interpreted and the overhead of creating a
pipeline is eliminated, but it also makes nuweb hard to extend.
Noweb's pipeline makes it easy to extend, and different stages of the
pipeline can be implemented in different programming languages,
depending on which language is best for which job. Extensibility is
particularly valuable to those interested in pushing the frontiers of
literate programming, who would otherwise have to write their own
tools from scratch.
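The extensibility argument can be illustrated with a small sketch in Python. It is only an analogy: the (kind, text) records and the stage names `markup`, `number_chunks`, and `render` are hypothetical and do not reproduce noweb's actual intermediate representation or its tools. What it shows is the design point made above, that a new stage can be added between existing ones without modifying them.

```python
from functools import reduce


def markup(lines):
    """Front end: turn source lines into (kind, text) records."""
    in_code = False
    for line in lines:
        if line.startswith("<<") and line.endswith(">>="):
            in_code = True
            yield ("defn", line[2:-3])
        elif line == "@" or line.startswith("@ "):
            in_code = False
            yield ("docs", line[2:])
        else:
            yield ("code" if in_code else "docs", line)


def number_chunks(records):
    """Optional filter: append a running number to each chunk definition."""
    count = 0
    for kind, text in records:
        if kind == "defn":
            count += 1
            text = f"{text} [{count}]"
        yield (kind, text)


def render(records):
    """Back end: produce plain-text output lines from the records."""
    for kind, text in records:
        if kind == "defn":
            yield f"== {text} =="
        elif kind == "code":
            yield "    " + text
        else:
            yield text


def pipeline(*stages):
    """Compose stages left to right, much as a shell pipeline would."""
    return lambda stream: reduce(lambda s, stage: stage(s), stages, stream)


if __name__ == "__main__":
    source = ["@ Intro.", "<<main>>=", "print(1)"]
    # The numbering filter is slotted in without touching markup or render.
    for line in pipeline(markup, number_chunks, render)(source):
        print(line)
```

In a real pipeline each stage would be a separate program reading standard input and writing standard output, which is what allows stages to be written in different languages.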
I advocate language-independent tools for two reasons. First, after
mastering one such tool, you can write almost anything as a literate
program, including things like shell and perl scripts, which often
benefit disproportionately from a literate treatment. Second, two of
these tools, noweb and nuweb, are much simpler, and therefore much
easier to master, than any of the language-dependent tools. Those who
use one language exclusively may, however, prefer a language-dependent
tool, since it provides pretty-printing, which when done well can make
the printed literate program easier to read.
Noweb probably culminates one kind of evolution in literate
programming: the trend toward greatest simplicity. No significantly
simpler tool could do much. Noweb also begins another kind of
evolution, toward greater extensibility and flexibility. Further
evolution might involve replacing Unix shell scripts and pipelines
with an embedded language having special data types to represent
pipelines, chunks, and literate programs. This step would make it
easier to port noweb to non-Unix platforms, and it could make noweb
run much faster. Other developments might include constructing new
pipeline stages to support language-dependent operations like macro
processing, pretty-printing, and automatic identifier cross-reference.
These changes would extend noweb's capabilities, but noweb is already
quite capable of supporting complex programs and documents. It and
related tools are less capable of supporting a modern word-processing
style. The word processors noweb currently supports, TeX, LaTeX, and
HTML, all use the old batch model of word processing. Today, many
authors prefer WYSIWYG word processors like FrameMaker, WordPerfect,
or Microsoft Word. Kean College's Wittenberg has developed a
noweb-like system called WinWordWeb based on Word. Because of Word's
limitations, including its secret proprietary data format, he could
not reuse any of noweb's implementation, but the design is the same.
The challenge for literate programming today is getting it into use.
Noweb helps by eliminating clutter and complexity. Supporting modern
word processors would eliminate another barrier, making it possible to
write literate programs without first learning a new word-processing
language like LaTeX or HTML.
More must be learned about suitable ways of structuring literate
programs, about whether hypertext is a useful alternative, and about
what other kinds of documents literate programs should resemble. What
place does literate programming have for the majority of programmers,
who are not writing for publication? In the near term, I suspect the
best use for literate programming will be to support rapid
prototyping, providing a simple and reliable way of documenting the
design decisions made in, and the lessons learned from, the prototype.
In the long term, I hope that simple, extensible tools like noweb will
lead everyone to appreciate the benefits of literate programming.
ACKNOWLEDGEMENTS
Mark Weiser's invaluable encouragement provided the impetus for me to write this
paper, which I did while visiting the Computer Science Laboratory of the Xerox Palo Alto
Research Center. David Hanson suggested and provided the cpif program. Preston Briggs
developed many of the ideas used in noweb's indexing, and he contributed code used in one
of the pipeline stages. Bill Trost wrote the first HTML pipeline stage. Dave Love provided
much-needed LaTeX expertise. Comments from Hanson and from the anonymous referees
stimulated me to improve the paper. The development of noweb was supported by a
Fannie and John Hertz Foundation Fellowship.
REFERENCES
1. D.E. Knuth, Literate Programming, Stanford University, Stanford, Calif., 1992.
2. P.J. Denning, Announcing Literate Programming, Comm. ACM, July 1987, p. 593.
3. K. Guntermann and J. Schrod, Web Adapted to C, TUGBoat, Oct. 1986, pp. 134-137.
4. S. Levy, Web Adapted to C, Another Approach, TUGBoat, April 1987, pp. 12-13.
5. N. Ramsey, Literate Programming: Weaving a Language-Independent Web, Comm. ACM, Sept.
1989, pp. 1051-1055.
6. H. Thimbleby, Experiences of Literate Programming Using CWeb (a Variant of Knuth's Web),
Computer Journal, 1986, pp. 201-211.
7. N. Ramsey and C. Marceau, Literate Programming on a Team Project, Software: Practice &
Experience, July 1991, pp. 677-683.
8. C.J. Van Wyk, Literate Programming: An Assessment, Comm. ACM, Mar. 1990, pp. 361-365.
9. D.E. Knuth, The Web System of Structured Documentation, Tech. Report 980, Computer
Science Dept., Stanford Univ., Stanford, Calif., 1983.
Norman Ramsey is a research scientist at Bellcore. His research
interests are the construction of software that is easy to understand
and to retarget to different machines. His recent work includes a
retargetable debugger and a toolkit that helps build debuggers and
other programs that manipulate machine code.
Ramsey received a PhD in computer science from Princeton University.
He is a member of ACM.
Address questions about this article to Ramsey at
Bellcore, 445 South Street, Morristown, NJ 07960;
[email protected]. Noweb can be obtained by
anonymous ftp from CTAN, the Comprehensive
TeX Archive Network, in directory web/noweb.
CTAN replicas appear on hosts ftp.shsu.edu,
ftp.tex.ac.uk, and ftp.uni-stuttgart.de. Noweb's
World-Wide-Web page is located at
ftp://bellcore.com/pub/norman/noweb.
IEEE SOFTWARE 105
