Bootstrapping
A compiler is a complex enough program that we would like to write it in a friendlier language than assembly language. In the UNIX programming environment, compilers are usually written in C. Even C compilers are written in C. Using the facilities offered by a language to compile itself is the essence of bootstrapping. Here we shall look at the use of bootstrapping to create compilers and to move them from one machine to another by modifying the back end. The basic ideas of bootstrapping have been known since the mid-1950's (Strong et al. [1958]).
Bootstrapping may raise the question, "How was the first compiler compiled?" which sounds like, "What came first, the chicken or the egg?" but is easier to answer. For an answer we consider how Lisp became a programming language. McCarthy [1981] notes that in late 1958 Lisp was used as a notation for writing functions; they were then hand-translated into assembly language and run. The implementation of an interpreter for Lisp occurred unexpectedly. McCarthy wanted to show that Lisp was a notation for describing functions "much neater than Turing machines or the general recursive definitions used in recursive function theory," so he wrote a function eval[e, a] in Lisp that took a Lisp expression e as an argument. S. R. Russell noticed that eval could serve as an interpreter for Lisp, hand-coded it, and thus created a programming language with an interpreter. As mentioned in Section 1.1, rather than generating target code, an interpreter actually performs the operations of the source program.
For bootstrapping purposes, a compiler is characterized by three languages: the source language S that it compiles, the target language T that it generates code for, and the implementation language I that it is written in. We represent the three languages using the following diagram, called a T-diagram, because of its shape (Bratman [1961]).

Within text, we abbreviate the above T-diagram as SIT. The three languages S, I, and T may all be quite different. For example, a compiler may run on one machine and produce target code for another machine. Such a compiler is often called a cross-compiler.
Suppose we write a cross-compiler for a new language L in implementation language S to generate code for machine N; that is, we create LSN. If an existing compiler for S runs on machine M and generates code for M, it is characterized by SMM. If LSN is run through SMM, we get a compiler LMN, that is, a compiler from L to N that runs on M. This process is illustrated in Fig. 11.1 by putting together the T-diagrams for these compilers.
When T-diagrams are put together as in Fig. 11.1, note that the implementation language S of the compiler LSN must be the same as the source language of the existing compiler SMM, and that the target language M of the existing compiler must be the same as the implementation language of the translated form LMN. A trio of T-diagrams such as Fig. 11.1 can be thought of as an equation:

LSN + SMM = LMN
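The composition rule behind this equation can be sketched in code. The following Python model is not from the text; it is a hypothetical illustration that represents a compiler by its three T-diagram languages and checks the matching conditions described above:

```python
from typing import NamedTuple

class Compiler(NamedTuple):
    source: str          # S: the language it compiles
    implementation: str  # I: the language it is written in
    target: str          # T: the language it generates code for

def run_through(new: Compiler, existing: Compiler) -> Compiler:
    """Compile `new` with `existing`, mirroring LSN + SMM = LMN:
    the result keeps new's source and target languages, but its
    implementation language becomes the existing compiler's target."""
    if new.implementation != existing.source:
        raise ValueError("existing compiler cannot read this implementation language")
    return Compiler(new.source, existing.target, new.target)

LSN = Compiler("L", "S", "N")  # cross-compiler for L, written in S, targeting N
SMM = Compiler("S", "M", "M")  # existing S compiler running on machine M
LMN = run_through(LSN, SMM)
print(LMN)  # Compiler(source='L', implementation='M', target='N')
```

Running LSN through SMM yields LMN, a compiler from L to N that runs on M, exactly as in the equation.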
Example 11.1. The first version of the EQN compiler (see Section 12.1) had C as the implementation language and generated commands for the text formatter TROFF. As shown in the following diagram, a cross-compiler for EQN, running on a PDP-11, was obtained by running EQN C TROFF (the EQN compiler written in C, generating TROFF) through the C compiler C 11 11 on the PDP-11.
One form of bootstrapping builds up a compiler for larger and larger subsets of a language. Suppose a new language L is to be implemented on machine M. As a first step we might write a small compiler that translates a subset S of L into the target code for M; that is, a compiler SMM. We then use the subset S to write a compiler LSM for L. When LSM is run through SMM, we obtain an implementation of L, namely LMM. Neliac was one of the first languages to be implemented in its own language (Huskey, Halstead, and McArthur [1960]).
Wirth [1971] notes that Pascal was first implemented by writing a compiler in Pascal itself. The compiler was then translated "by hand" into an available low-level language without any attempt at optimization. The compiler was for a subset "(>60 per cent)" of Pascal; several bootstrapping stages later a compiler for all of Pascal was obtained. Lecarme and Peyrolle-Thomas [1978] summarize methods that have been used to bootstrap Pascal compilers.
For the advantages of bootstrapping to be realized fully, a compiler has to be written in the language it compiles. Suppose we write a compiler LLN for language L in L to generate code for machine N. Development takes place on a machine M, where an existing compiler LMM for L runs and generates code for M. By first compiling LLN with LMM, we obtain a cross-compiler LMN that runs on M, but produces code for N.

The compiler LLN can be compiled a second time, this time using the generated cross-compiler:
LLN + LMN = LNN
The result of the second compilation is a compiler LNN that runs on N and generates code for N. There are a number of useful applications of this two-step process, so we shall write it as in Fig. 11.2.
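The two compilation steps above can be traced with a small Python sketch. As before, this is an illustrative model (not from the text) that treats a compiler as a (source, implementation, target) triple:

```python
from typing import NamedTuple

class Compiler(NamedTuple):
    source: str          # language it compiles
    implementation: str  # language it is written in
    target: str          # language it generates code for

def run_through(new: Compiler, existing: Compiler) -> Compiler:
    """Compiling `new` with `existing` keeps new's source and target
    but re-expresses it in the existing compiler's target language."""
    if new.implementation != existing.source:
        raise ValueError("implementation language mismatch")
    return Compiler(new.source, existing.target, new.target)

LLN = Compiler("L", "L", "N")  # compiler for L, written in L itself, targeting N
LMM = Compiler("L", "M", "M")  # existing L compiler on the development machine M

LMN = run_through(LLN, LMM)    # step 1: cross-compiler that runs on M, emits N code
LNN = run_through(LLN, LMN)    # step 2: compiler that runs on N, emits N code
print(LNN)  # Compiler(source='L', implementation='N', target='N')
```

The second step consumes the cross-compiler produced by the first, so the final compiler is hosted on the new machine N with no further help from M.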
Example 11.2. This example is motivated by the development of the Fortran H compiler (see Section 12.4). "The compiler was itself written in Fortran and bootstrapped three times. The first time was to convert from running on the IBM 7094 to System/360, an arduous procedure. The second time was to optimize itself, which reduced the size of the compiler from about 500K to about 400K bytes" (Lowry and Medlock [1969]).
Fig. 11.2. Bootstrapping a compiler
Using bootstrapping techniques, an optimizing compiler can optimize itself. Suppose all development is done on machine M. We have SSM, a good optimizing compiler for a language S written in S, and we want SMM, a good optimizing compiler for S written in M.

We can create SM#M#, a quick-and-dirty compiler for S on M that not only generates poor code, but also takes a long time to do so. (M# indicates a poor implementation in M. SM#M# is a poor implementation of a compiler that generates poor code.) However, we can use the indifferent compiler SM#M# to obtain a good compiler for S in two steps. First, the optimizing compiler SSM is translated by the quick-and-dirty compiler to produce SM#M, a poor implementation of the optimizing compiler, but one that does produce good code. The good optimizing compiler SMM is then obtained by recompiling SSM through SM#M.
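These two steps can be made concrete by tracking the two qualities separately: how well a compiler binary itself runs, and how good the code it emits is. The Python model below is a hypothetical sketch (the class names and boolean flags are illustrative, not from the text); the key assumption it encodes is that a binary's speed comes from whichever compiler built it, while the quality of the code it emits is a property of its own source algorithm:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompilerSource:
    name: str
    emits_good_code: bool  # property of the algorithm in the source text

@dataclass(frozen=True)
class CompilerBinary:
    name: str
    runs_fast: bool         # was this binary built by an optimizing compiler?
    emits_good_code: bool   # does this binary emit optimized code?

def compile_with(src: CompilerSource, using: CompilerBinary) -> CompilerBinary:
    # The binary's own speed is inherited from the compiler that built it;
    # what it emits is determined by its source algorithm.
    return CompilerBinary(src.name,
                          runs_fast=using.emits_good_code,
                          emits_good_code=src.emits_good_code)

ssm = CompilerSource("SSM", emits_good_code=True)              # optimizer, written in S
quick_and_dirty = CompilerBinary("SM#M#", runs_fast=False,
                                 emits_good_code=False)         # poor in both respects

step1 = compile_with(ssm, quick_and_dirty)  # SM#M: slow binary, emits good code
step2 = compile_with(ssm, step1)            # SMM: fast binary, emits good code
print(step1.runs_fast, step1.emits_good_code)  # False True
print(step2.runs_fast, step2.emits_good_code)  # True True
```

Step 1 yields a slow compiler that nevertheless optimizes well; step 2 uses that good output code to rebuild the optimizer as a fast binary, which is exactly the SMM we wanted.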