Roberto Busa, S.J., and The Invention of The Machine-Generated Concordance
January 1999
Winter, Thomas Nelson, "Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance" (1999). Faculty Publications,
Classics and Religious Studies Department. 70.
http://digitalcommons.unl.edu/classicsfacpub/70
ROBERTO BUSA, S.J., AND THE INVENTION
OF THE MACHINE-GENERATED CONCORDANCE1
Here in 1999, Classicists, both as teachers and as scholars, are
equipped to appreciate that Tasman's 1957 prophecy was, if
anything, an understatement.
Classicists, both as teachers and as scholars, are enjoying the fruits
To start with the 1951 entry is essentially to start with the beginning of
humanities computing. For further setting and contrast, the U.S. Bureau of
the Census installed the first UNIVAC in 1951, and as noted above, IBM
delivered its first computer, the 701, in 1953.3 With regard to the above
setting, how could you do a machine-generated concordance in 1951?
To begin to answer, Tasman's 1957 prophecy was no shot in the dark.
His view of the future was a projection from his recent past. Thomas J.
Watson, Sr. had assigned him in 1949 to be IBM liaison and support
person for a young Jesuit's daring project to produce an index to the
complete writings of St. Thomas Aquinas [Ref. 176, p. 84].
First, Tasman's thesis, as subsequent history turned out, was a huge
understatement, and second, it essentially defines the first large invention
of Father Roberto Busa, S.J., namely, to look at "tools developed primarily
for science and commerce" and to see other uses for them. As will be seen,
this was a case of fortune favoring the prepared mind. Redirecting
3Fisher, Franklin, et al. IBM and the US Data Processing Industry: an Economic
History, New York, 1983, p. 8 and 12.
Like all good projects, this one began with a question: What is the
metaphysics of presence in St. Thomas Aquinas? Combing for praesens
and praesentia, he realized that such words were peripheral, and, however
unfortunately, Saint Thomas's doctrine of presence is linked with the
preposition in!
Inquiring what St. Thomas meant by "presence," the young Roberto
Busa realized that we must also study the way function-words affect
meaning-words. To study the significant phrase "in the presence" he
needed the shades of "in." His dissertation, defended in 1946, was
essentially founded on a handmade Thomistic Concordance, essentially
complete, but with one entry.
He had made 10,000 hand-written cards.
This project had results quite beyond the theological and
philosophical value of his findings published in the first entry of the select
bibliography below. Deeming it necessary to learn what significance
6Rogers, William, THINK, a biography of the Watsons and IBM, New York, 1969, p. 69.
7Rogers, pp. 203-208.
The situation of the then-contemporary data-processing industry has implications.
That Varia Specimina is the world's first machine-generated concordance approaches
the certainty of a theorem in geometry.
(a) Suppose the contrary, i.e., that there was a machine-generated concordance
before 1951.
(b) If it was done before 1951, it was done on punch-card manipulating machines.
(c) If it was done on punch-card manipulating machines of that era, it was produced
with the full cooperation of IBM.
(d) But this is the project done with that cooperation.
Item (b) above is a certainty. Item (c) above, is not fully as certain: Remington
Rand equipment is an outside possibility. The US Bureau of the Census lodged an
antitrust suit against IBM before the war for monopolizing the punch-card
Father Busa recalled the meeting as follows:
I knew, the day I was to meet Thomas J. Watson, Sr., that he had
on his desk a report which said IBM machines could never do
what I wanted. I had seen in the waiting room a small poster
imprinted with the words:
"The difficult we do right away; the impossible takes a
little longer."
(IBM always loved slogans.) I took it in with me into Mr.
Watson's office. Sitting down in front of him and sensing the
tremendous power of his mind, I was inspired to say: "It is not
right to say 'no' before you have tried." I took out the poster and
showed him his own slogan. He agreed that IBM would
cooperate..."provided that you do not change IBM into
International Busa Machines." [Ref. 176, p. 84]
The first product of this alliance was formed partly by the limitations
of the then current IBM equipment. It would manage only eighty
characters on a card. Eighty characters? Father Busa realized this would
allow nothing longer than, say, a line of hendecasyllabic poetry. And a
lemma
So he did the poetry of St. Thomas Aquinas. His 1951 machine-generated
and machine-printed concordance served as a proof-of-concept exercise.
How was it done?
In the preface to the Varia Specimina, Father Busa summarizes the
essential stages of work for generating concordances:
data-processing industry. In that suit, it was revealed that IBM had a no-competition
agreement with Remington Rand, whose card-machines used mechanical pins instead
of electric brushes to read the punch-outs. Remington in 1935 held the 15% of the
market left over from IBM (Rogers, 129-130). The question becomes, was there a
Remington Rand-associated concordance project? Not with punch-cards. The first
Remington Rand-associated concordance project appears to be John W. Ellison's
Bible Concordance of 1957. It was done with the new magnetic tapes.
At the time Father Busa entered the scene, only the second of these
concordance-tasks, the multiplication of cards, was being done mechanically.
The TLL was using the services of a copying bureau; Professor Roy
Deferrari, Catholic University of Washington, was using electrical
typewriters which could make many copies; Professor P. O'Reilly of Notre
Dame University, South Bend, Indiana, had each side of the text-page
repeated as many times as there were words thereon. [Ref. 4, p. 22]
Something was needed which could do all five. Promising was the
Rapid Selector, invented by Vannevar Bush and developed by Ralph
Shaw. Father Busa saw it operating in 1949 at the Library of the
Department of Agriculture in Washington, D. C. It simultaneously ran two
microfilm reels, one with text, the other with coded symbols for the words
of that text, and photographed the pages exhibiting the targeted
keywords. It examined 10,000 microfilmed pages per minute, instantaneously
rephotographing the found pages on another microfilm! [Ref. 4, p. 22]
Though impressive, this looked like a blind end: all the coding had to be
done by hand! And it could not be adapted to automated printing.
One cannot but note that Father Busa knew the nature of the task and
knew what he was looking for. In the array of business machines in the
post-war IBM inventory, the punchcard electro-mechanical accounting
machines looked more promising than anything he had seen at the
American libraries or in other indexing projects.
What were these machines? How did they work? What was Father
Busa looking at, and later dealing with? No one has set this forth better
than Dr. Cuthbert Hurd, testifying in US vs IBM, the antitrust suit brought
by the Justice Department in 1969.
This is the array in which Father Busa managed to see the Index
Thomisticus. Computer? Not yet. Obviously applicable to Humanities
research? Not exactly. But to continue Hurd's testimony:
In Fisher, p. 13.
These devices were controlled by control panels, or "plug
boards...." Such a control panel might measure three feet by two
feet and contain perhaps a thousand holes. Each machine had a
different control panel...manual intervention was the key and
because of manual intervention and because of the mechanical
nature of the devices, the results were slow and unreliable.
Consequently, there was a sharp limit on the size and kind of
application or tasks that could be performed.
Varia Specimina
This was an abiding theme for the entire project. In their write-ups of the time,
both Busa in Europe and Tasman in New York used as a facing card illustration
Thomas's words (reminiscent of Aristotle): QUIA PARVUS ERROR IN PRINCIPIO
MAGNUS EST IN FINE.... It made a good sample punch card.
the text. In other words, each line was multiplied as many times as the
words it contained. At this point, alphabetizing was a matter of feeding the
Sorter Machine [Ref. 4, p. 26], and if a researcher were to drop the cards,
getting them in correct array would be (ideally) a matter of simply putting
them in the Sorter again.
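
The multiply-then-sort procedure just described can be sketched in a few lines of modern code. This is only an illustrative analogy, not a reconstruction of the 1951 plugboard wiring: the names WordCard, multiply_line, and sort_cards and the reference label "sample-1" are hypothetical, and lower-casing and punctuation stripping are assumptions added so that the sort behaves predictably. The sample line is the Thomistic sentence used on the project's own sample punch card.

```python
from typing import NamedTuple

CARD_WIDTH = 80  # characters per punch card, as stated in the text


class WordCard(NamedTuple):
    keyword: str    # the word this copy of the line is filed under
    reference: str  # hypothetical locator for the line (label format assumed)
    line: str       # the whole line, carried along as context


def multiply_line(reference: str, line: str) -> list[WordCard]:
    """Make one card per word in the line; every card carries the full line."""
    if len(line) > CARD_WIDTH:
        raise ValueError(f"line exceeds {CARD_WIDTH} columns: {reference}")
    cards = []
    for word in line.split():
        # Assumption: normalize case and trim punctuation before filing.
        keyword = word.strip(".,;:!?").lower()
        if keyword:
            cards.append(WordCard(keyword, reference, line))
    return cards


def sort_cards(cards: list[WordCard]) -> list[WordCard]:
    """Stand-in for the Sorter: alphabetical by keyword, then by reference."""
    return sorted(cards, key=lambda card: (card.keyword, card.reference))


if __name__ == "__main__":
    sample = "QUIA PARVUS ERROR IN PRINCIPIO MAGNUS EST IN FINE"
    for card in sort_cards(multiply_line("sample-1", sample)):
        print(f"{card.keyword:<12}{card.reference:<10}{card.line}")
```

Because the ordering depends only on what is punched on each card, a shuffled (or dropped) deck returns to the same sequence when sorted again, which is the point made above about putting the cards through the Sorter a second time.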
Most of the manual intervention occurred here, as the business
machine did not begin the second word in the same column on all cards, or
the third, or the fourth.... The language itself was trouble-causing. E.g. in
the Italian o, ebbi, avrei, are all forms of the same word; andiamocene is
three words presented as one; in Latin mortuus est is (functionally) one
word presented as two. When the necessary manual intervention was
done, the last stop was the Alphanumeric Accounting Machine, or
Tabulator. This retranscribed the words in the holes in the cards into the
letters and numbers. A print-out, in sum. As Father Busa put it in his
preface: "The concordance which I am presenting as an example is
precisely an off-set reproduction of tabulated sheets turned out by the
accounting machine." [Ref. 4, p. 28]
The project was done in IBM offices in Milan, where Father Busa
started his own punching and verifying department.
Wood replied they were too slow. "Ultimately these machines, which
are now electro-mechanical, since they operate on electricity, are going to
operate at the speed of light, 10,000 times the speed at which they function
now." Watson's biographer reports
11THINK, p. 175.
Only the tiniest fraction of the machine time per card was spent
reading its punches. Almost all was in the physical manipulation of the
card. The now-familiar mark-sense machine scoring was not possible.
Carbon from pencil marks had an electrical resistance, 500-5,000 ohms,
varying by a factor of ten. A Michigan school teacher, Reynold B.
Johnson, invented a scoring machine with such high resistance that the
pencil-mark resistance was not consequential. IBM bought it.13
12THINK, p. 138.
13THINK, p. 139.
The definition of what he meant by "Large-scale data-processing method"
glimmers in the next sentence: "This, of course, is exclusive of preparation
and programming time." [emphasis added] Programming is now in civilian
use, but it is still in card (or card-analog) management.15
The cards are now, figuratively speaking, two-edged. Machine
punched for the one, mark-sense capable for the other. The same machine
can now read both holes and pencil-lead. That is, the cards enable built-in,
machine-readable human intervention, for which, as we will see, there is
always great need.
Tasman gives the background to Father Busa's Dead Sea Scrolls project:
Much of the actual "programming" was still in the design of the card, and
the essential over-all program was still visibly the process for producing a
concordance laid out by Father Busa in Varia Specimina. I venture to
summarize the 1957 algorithm, from Tasman's section "Specific phases of
automation in the literary analysis of the Summa Theologica of St.
Thomas Aquinas":
14Paul Tasman, July, 1957 IBM Journal of Research and Development, p. 256.
15The first FORTRAN system was released in 1957, for the IBM 704. This was the
year of Tasman's report/prospectus. FORTRAN was, of course, for numerical
computation. See S. Ramsden, University of Manchester, http://www.man.ac.uk
hpctec/courses/Fortran9O/Fortran90~4.htrn1
16Tasman, p. 256.
3. From the phrase-cards the machine [the IBM 705?] does two
jobs: a) produces the word cards, and b) produces a complete
copy of the text, phrase by phrase. In various zones of the
card, there are encoded 1) the reference, 2) the first letter of the
preceding word and the first letter of the following word, 3) the
ordinal number denoting the word's position in the text, 4) a
characterizing mark.17
4. Form cards: the machine counts and eliminates all word
duplicates. In this stack, there is left only one card for every
graphically different word, and a record of its total
occurrences. At this stage, sum and est, for example, are still
different entries.
5. The scholar converts the form-cards into the entry cards.
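
Steps 3 through 5 above can be mirrored in a short modern sketch. It is offered only as an analogy to the card procedure, under stated assumptions: the names WordCard, word_cards, form_cards, and entry_cards are hypothetical, the "characterizing mark" is omitted because the summary does not say what it encodes, the neighboring-word letters are taken from within the same phrase only, and the scholar's judgment in step 5 is represented by a hand-supplied lookup table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCard:
    form: str          # the word exactly as written (lower-cased here)
    reference: str     # locator of the phrase (hypothetical label format)
    prev_initial: str  # first letter of the preceding word, "" if none
    next_initial: str  # first letter of the following word, "" if none
    position: int      # ordinal number of the word in the whole text


def word_cards(phrases: list[tuple[str, str]]) -> list[WordCard]:
    """Step 3: one word card per word, from (reference, phrase) pairs."""
    cards = []
    position = 0
    for reference, phrase in phrases:
        words = phrase.lower().split()
        for i, word in enumerate(words):
            position += 1
            # Assumption: neighbors are looked up inside the same phrase only.
            prev_initial = words[i - 1][0] if i > 0 else ""
            next_initial = words[i + 1][0] if i + 1 < len(words) else ""
            cards.append(
                WordCard(word, reference, prev_initial, next_initial, position)
            )
    return cards


def form_cards(cards: list[WordCard]) -> Counter:
    """Step 4: one entry per graphically different word, with its total."""
    return Counter(card.form for card in cards)


def entry_cards(forms: Counter, lemma_of: dict[str, str]) -> Counter:
    """Step 5: merge forms under entries chosen by the scholar (lemma_of)."""
    entries = Counter()
    for form, count in forms.items():
        entries[lemma_of.get(form, form)] += count
    return entries


if __name__ == "__main__":
    text = [("ref-1", "parvus error in principio magnus est in fine")]
    forms = form_cards(word_cards(text))
    # The mapping below stands for the scholar's manual decision.
    print(entry_cards(forms, {"est": "sum"}))
```

The division of labor is the same as in the summary: counting and de-duplication are mechanical, while grouping est under sum remains a human decision supplied from outside.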
But, in the words of Father Busa himself in 1990, "Our generation has
not done everything: for the young people there are still immense open
spaces" [Ref. 243, p. 339]. He posed eight challenges/desiderata:
17Tasman, p. 256.
20Some may be sampled in Paolo Guietti, "Hermeneutic of Aquinas's Texts: Notes
on the Index Thomisticus," The Thomist, 57, (1993) pp. 667-686, where one may
also see a knowledgeable view of the Index.
21I take "tacit words" to mean words which are sufficiently implied but not present,
as, for instance, in the Greek geometric texts, where the word for "line" can be
represented by simply the feminine article or feminine adjective; "point" by the neuter.
4. The creation of a general conspectus, within the confines of
a precise universe, of all words which are potentially
homographic.
22I have shortened or paraphrased some of these desiderata. They are in Ref. 243,
p. 341. Father Busa credits Peter Luhn of IBM with formulating challenge number 8
in the early 1950s, and also credits him as the introducer of the KWIC Index. Finally,
a personal note: working towards desideratum number five has, coincidentally, been
the core of my career as a researching Latinist since 1980, which is the vintage of
the IBM 360 mainframe in the background of Father Busa's picture.
7. "Mechanisierung der philologischen Analyse," in Nachrichten für
Dokumentation, vol. 3, n. 1 (Mar. 1952), pp. 14-19.