Chilton - ACL - Compiler Compiler 1966
Chilton - ACL - Compiler Compiler 1966
Contents
1. Introduction
1.1 Early Autcodes
1.2 Fortran
1.3 Algol 58
1.4 Some Terminology
2. Early Translator Writing Systems
2.1 Syntax machine
2.2 PSYCO
2.3 BMCC
3. Second Generation Translator Writing Systems
3.1 COGENT
3.2 CGS
3.3 TMG
3.4 Meta II
4. Third Generation Translator Writing Systems
4.1 FSL
4.2 TGS
4.3 Tree-Meta
5. The Demise of Translator Writing Systems
6. References
1. Introduction
Prior to 1950, most computer programs were written in assembly code; that is a low-level programming language where a one-to-
one relationship exists between instructions in the language and the computer's machine code instructions. By 1950, people were
considering whether it would be possible for the machine to generate the assembly code rather than humans.
In 1950, William Schmitt implemented Short Code for the Univac 1 In 1952, Alick Glennie produced an autocode programming
language for the Machester Mark 1 computer (the compiler was 750 instructions). Laning and Zierler produced an autocode for the
the Whirlwind at about the same time. (Knuth lists 20 autocodes that predate Fortran and states that Glennie's autocode was the
first one completed.)
1.2 Fortran
Work on a more ambitious autocode for the IBM 704 started around 1952. The proposal for Fortran I went to IBM Management in
1953, was specified by 1954 and compiler produced by 1957. The major deficiency of Fortran I was the absence of user-written
subroutines and functions. These were added in Fortran II in 1958.
The need for a system to automatically write compilers had not really appeared so far as many computers still ran programs in
assembler language or a single autocode. Implementing a single language on many different computers or implementing many
languages on a single computer were still not major requirements.
1.3 Algol 58
By 1958, there was concern that programming languages would proliferate. There were already signs of this. In both the USA and
Europe there was support towards defining a single programming language. A meeting was held in 1958 in Zurich which defined the
language called IAL (International Algebraic Language) in the USA and Algol58 in Europe. This was followed quite closely by
Algol60.
In the USA, languages such as JOVIAL, MAD and and NELIAC were defined as subsets of IAL. In Europe, there was enthusiasm
for Algol 60 which had a number of unfortunate features:
No reserved words. How an implementation defined goto was up to it. Thus, allez, go to, goto and many other variants
appeared.
No input/output statements. This ensured that every single Algol 60 implementation was different. In the UK, there were three
quite different input/output systems on the Elliott, English Electric and Ferranti computers.
Features that were either difficult or impossible to implement on a range of computers either due to size or speed..
The main result of the attempt at a single universal programming language was a proliferation of languages and dialects that meant
it was impossible to implement the compilers needed by a specific customer without some automation of the production of
compilers.
There was a need for a lexical analysis subsystem to resolve the problems associated with the variety of dialects and languages. It
was also needed to cover up the deficiencies in the way some languages had been defined, notably Fortran.
For Fortran, syntax analysis is relatively straightforward once the lexical problems are resolved. Algol is significantly different. In the
Algol 60 Report the language is defined in BNF, a method for defining legal programs in the language. It also had statements in
English that also described the language. The block structure facilities in Algol 60 made it more difficult to do the syntax analysis.
The semantic description of Algol 60 is also quite difficult. Features were defined that were difficult to compiler. Ambiguities existed.
Meanwhile the architectures of computers were changing with different storage systems, radically different instruction codes, and
speed based on pipelining specific operations. In consequence, generating efficient code across a range of computers was also
becoming difficult.
The need for some automation in the production of compilers was needed.
In describing translator writing systems, it is useful to define a set of terms that can be used irrespective of the system being
described.
Backus-Naur Form (BNF)
A set of rewrite rules that define a programming language. A rule has the form:
The non-terminal address can be replaced by a non-terminal name followed by a comma followed by a non-terminal street
followed by a comma followed by a non-terminal zip. Rewrite rules are applied starting from a single non-terminal, program say,
until a string is obtained that contains no non-terminal symbols. This is a program in the language.
BNF language
A language defined using BNF
parse tree
Given a program in the BNF languge, it is possible to describe all the productions applied to create the program as a tree starting
from the non-terminal program. This is called a parse tree.
syntax-directed compilation
Any system that can take the definition of a BNF language and generates a parse tree for an arbitrary program in that language.
(Manchester Mark 1 Autocodes: Alick Glennie, Tony Brooker 1954)
Val Schorre et al
Meta II : Spring 1963 S C Johnson
YACC 1970
O'Neil
Meta PI : 1968
J F Rulifson
Tree-Meta : 1968
The Syntax Machine defines a top-down parser for the language to be compiled having possible rules of the form:
The first parses an A as a B followed by a C ..etc. The second parses an A as a B or a C or a D ... The third parses an A as a B
followed by a C .. by one or more X. Having this iteration, saves on recursion.
It is up to the person writing the translator for a specific language to ensure that no backup is required. However, the rules described
above are converted into a pseudo flow chart that allows variations on this basic parsing. The Syntax Machine defines a machine-
independent instruction set which defines the semantics of the program. It is possible to optimise the pseudo code before
transforming it into the actual order code of a specific computer. The whole system was simulated on an IBM 650 and used to
generate compiled code for both the IBM704 and IBM709. Many of the ideas that appear in the Syntax Machine were used in the
S1, S2 and S3 Fortran compilers developed at AWRE Aldermaston for the IBM 7030 (Stretch) and later ported to the Ferranti Atlas
II. However, by then, an operator precedence parser was used for syntax analysis.
2.2 PSYCO
Irons started his work on a syntax-directed compiler in 1959 while at Hanover for the summer.
PSYCO uses a bounded-context bottom-up parser with backup. That is at any stage in the recognition of a program, a finite set of
previously recognised non-terminals and the next finite set of terminal systems will always define the action that the parser needs to
take. PSYCO was mainly aimed at writing translators for Algol-like languages. The system started with a symbol table containing
basic information and a parse of the Algol-like program collected declarative information about the variables used in the program. A
second pass used this information to compile the executable part of the program.
Irons also recognised the need for handling errors in programs presented to the compiler.
2.3 Compiler-Compiler
The Brooker-Morris compiler-compiler (BMCC) has a diferent notation from either Glennie or Irons. The parser is a top-down
recursive-descent parser with statements of the form:
It indicates that a source statement (SS) in the languge consists of a variable (V) followed by '=' followed by a signed arithmetic
expression (SAE).
Associated with each FORMAT statement is a ROUTINE that defines the semantics of the statement:
The appropriate ROUTINE statement is passed the parse tree of the statement recognised. It can inspect the parse tree and decide
what code should be generated.
The Compiler-Compiler effectively adds the definition of the language being defined to that of the Compiler-Compiler itself. Before
releasing the final compiler, it is usual to remove the Compiler-Compiler's own statements from that of the language being compiled.
The Compiler-Compiler was used to produce Fortan II, Fortran IV, Algol, Extended Mercury Autode, Atlas Autocode, SOL, ACL,
CPL, Elliott Autocode Mk III compilers for the Ferranti Atlas computer.
3. Early Translator Writing Systems
The early translator writing systems influenced a number of systems that followed soon after. We shall look at three of these and
one other, TMG, that also comes later but appears to have not been influenced by the earlier systems.
3.1 COGENT
John Reynolds' COGENT system was developed at the Argonne national Laboratory around 1962. COGENT stand for COmpiler
and GENeralised Translator. The objective was to unify the concept of syntax-directed compilation with the more general but
primitive concept of recursive list-processing. It is similar to BMCC in that it has two main constructs productions and generator
definitions which correspond somewhat to FORMAT and ROUTINE in BMCC.
An example of a set of productions for a simple language is:
(LETTER) = A,B,C,D,E.
(STRING) = (LETTER),(STRING)(LETTER).
(VARIABLE) = (STRING).
(FACTOR) = (VARIABLE).
(FACTOR) = (()(POLYNOMIAL)()).
(TERM) = (FACTOR).
(TERM = (TERM)*(FACTOR).
(POLYNOMIAL) = (TERM),+(TERM),-(TERM).
(POLYNOMIAL) = (POLYNOMIAL)+(TERM)
(POLYNOMIAL) = (POLYNOMIAL)-(TERM)
(TERM) = (FACTOR),(TERM)*(FACTOR)
COGENT uses the production statements to construct a parser, similar to the Irons parser, that takes the input and generates a
parse tree. By giving each recognised production a unique internal number, the parse tree can be very compactly represented as a
list structure. This list structure can be manipulated by the generator definitions and eventually output the compiled code. COGENT
was implemented on the Control Data 3600. A novelty of the implementation was that backup spawned a set of processes that
performed the various alternatives in parallel.
3.2 CGS
Computer Associates of Massachusetts was in the business of writing compilers. Compiler Generator System (CGS) was their first
attempt at a general purpose translator writing system aimed at producing efficient object code with optimisation provided at several
places in the system. The basic approach was a table-driven top down syntax analyser. It ran on an IBM 7090, a Burroughs D-825,
and a CDC 1604.
The generated compiler had five phases:
1. Syntactic Analyzer: converts input string into a tree-representation of its syntax
2. Generator: transforms the tree into a sequence of macro-instructions
3. Optimizer: recognises and eliminates redundant computation (eg common subexpressions, invariant computation out of loops,
etc)
4. Code Selector: assembles code fragments
5. Assembler: binds the code fragments in form required by the compiler environment.
The Syntactic Analyzer, Generator, and Code Selector are driven by tables defined for a specific language and machine code using
a BNF-like definition of each. The Syntactic Analyzer is used to generate the relevant tables.
3.3 TMG
TMG (stands for TransMoGrifier) was developed for the PDP-7 by R. M. McClure at Texas Instruments in 1975 and appears not to
have been influenced by the earlier systems. The system was ported to the IBM 7040, IBM 7090, and CDC 1604. It was used at
Bell Labs and also by the Multics Project. At Bell Labs, Dennis Ritchie used TMG to produce the compiler for B that later became C.
McIlroy and Morris used TMG to write the EPL compiler for Multics. As an aside, Stephen Johnson named his system YACC
because they already had TMG at Bell Labs! TMG was used to implement a subset of PL/I by Cal Tech.
TMG was aimed at constructing simple one-pass translators for some specialized language with almost no optimisation of code
generated. It was defined in a language TMGL with a compiler for TMGL on the target machine. Later, the system was defined in
itself.
The system is somewhat similar to the Glennie Syntax Machine in that the syntax specification of a program consists of sets of
statements with a label, a syntactic specification, and a semantic action that takes place if the syntactic entity is recognised. The
syntactic part consists of a set of actions with possibly two exit labels which give where to go next in the case of the syntactic entity
being recognised or not. If the syntactic part is recognised, the semantic part is executed.
In the first 18 months TMG was used to define two different FORTRAN compilers, a logic simulation system, several data format
converters, a geometric description language translator, and the TMGL compiler itself.
The system is not really aimed at handling structured languages like Algol 60.
3.4 Meta II
The SIG/PLAN Working Group 1 on Syntax Directed Compilers met monthly in the conference room at the UCLA computing facility.
John Backus attended one meeting to see how BNF was being used as a compiler writing language. Several separate attempts at
defining a translator writing system came out of that working group and the ideas finally came together initially as a system called
META but finally as META II.
META II was a simple translator writing system written by V Schorre et al in 1962.
In META II, the language for which a translator is required must be specified as a set of syntax rules. These very much resemble the
BNF notation except that they contain code generation commands as part of the syntax definition. There is limited memory in META
II so there is a need to output information soon after it has been recognised.
A restrictive top-down method of syntax analysis is used in the META II system. The restriction is that it does not allow back tracking
or left recursion. The main goal is to match the complete program with the Syntax Rule whose name follows .META at the head of
the program. The name of the Syntax Rule appears at the start of the rule followed by the = symbol. The right-hand side of this
Syntax Rule indicates which entities have to be recognised to achieve this goal. The recogniser achieves its main goal by looking
for these smaller entities, from left to right, as individual subgoals. These subgoals are themselves defined in terms of other entities,
and so the process of recognising a complete program becomes a matter of looking for progressively smaller entities, right down to
the level at which basic symbols such as characters and numbers are recognised.
A typical Meta II rule would be:
This recognises any string consisting of the letter C followed by any sequence involving the letters D and E.
Meta II rules can include .OUT statements that generate output code. Recognised basic symbols (.ID, .STRING, .NUMBER) are
stored and can be output by .OUT(*). Also two unique labels can be output for each statement by writing *1 or *2.
Meta II is a very simple system that can be defined in itself. This is the complete definition:
.SYNTAX PROGRAM
OUT1 = '*1' .OUT('GN1') / '*2' .OUT('GN2') / '*' .OUT('CI')/ .STRING .OUT('CL ' * ) .,
OUTPUT = ('.OUT' '(' $OUT1')' / '.LABEL' .OUT('LB') OUT1 ) .OUT('OUT') .,
EX3 = .ID .OUT('CLL' * ) / .STRING .OUT('TST' *) /
'.ID' .OUT('ID') / '.NUMBER' .OUT('NUM') / '.STRING' .OUT('SR') /
'(' EX1 ')' / '.EMPTY' .OUT('SET') /
'$' .LABEL *1' EX3 .OUT('BT ' *1 ) .OUT( 'SET') .,
EX2 = (EX3 .OUT('BF ' *1 ) / OUTPUT ) $(EX3 .OUT('BE') / OUTPUT) .LABEL *1 .,
EX1 = EX2 $( '/' .OUT('BT ' *1) EX2) .LABEL *1 .,
ST = .ID .LABEL * '=' EX1 '.,' .OUT('R') .,
PROGRAM = '.SYNTAX' .ID .OUT('ADR' * ) $ ST '.END' .OUT('END') .,
.END
4.1 FSL
In the period 1963-67, Bob Floyd from Computer Associates had moved to Carnegie-Mellon University. Jay Earley, a doctoral
student, was working on a parsing algorithm, the most efficient general context-free algorithm known. Alan Perlis et al were defining
an enhanced version of Algol, Formula Algol, which would incorporate the manipulation of mathematical formulae within the
language Algol. There was a need for a Formula Algol compiler. Jerome Feldman had completed his thesis, A Formal Semantics for
Computer Oriented Languages, in 1964. This defined a compiler-compiler, FSL, that was capable of translating into machine
language most of the existing programming languages. It was, therefore, used to produce the Formaula Algol Compiler for the
Bendix G-21. FSL had two main subsections. A Syntax Loader that builds tables which will control the recognition and parsing of
programs in the required language. A Semantic Loader builds a table describing the semantics of the instructions in the required
language. The Syntax Loader generated a Production Language (PL), based on the work of Bob Floyd, that scanned the current
input stream against the PL rules and decided on the action to take. The Semantic Loader defined the meaning of each language
statement in a machine-independent code.
FSL was a major input t0 the CABAL Compiler-Compiler system produced at Carnegie-Mellon University in the period 1967
onwards.
4.2 TGS
TGS (sometimes called TRANGEN), Translator Generator System, was produced at Computer Associates as a follow on to CGS. It
ran on an IBM 7094, CDC-1604, Univac M460 and GE-635 and was used to write translators for PL/I, ALGOL, and FORTRAN IV.
Each compiler is driven by a set of tables, TRANTAB, described in a language called TRANDIR. Synatx Analysis was done by a
system based on Floyd Productions. The code generation system was also used in later versions of the META systems.
4.3 Tree-Meta
After a number of intermediate systems, Rulifson defined Tree-Meta as a major advance over the earlier META systems. The
TREE-META program is in two parts. The first part is a definition of the syntax for the source language, L, for which a translator is
required and this is largely inherited from Meta II. The major difference is that instead of outputting code, the output is a tree built up
as the syntax proceeds. The second part defines a set of code generation rules that generate the object code to be produced from
this tree. The tree itself is quite flexible in format and is defined by the TREE-META user as he defines the syntax analyser for the
language. The code generator can take note of any peculiarities in the form of the tree, if that will lead to better code being
produced.
An example of the syntax recognise is:
.META EXPRESSION
EXPRESSION = TERM $ ( '+' TERM :ADD[2] / '-' TERM :SUB[2] ) ;
TERM = FACTOR $ ( '*' FACTOR :MULT[2] / '/' FACTOR :DIV[2] );
FACTOR = '+' PRIMARY / '-' PRIMARY :MIN[1] / PRIMARY ;
PRIMARY = .ID / .NUM / '(' EXPRESSION ')' ;
.END
The construct ADD[2] generates an ADD tree node from the top two items on the stack. The code generator instruction would be of
the form:
Separating code generation from the building of the parse tree means that quite efficient code can be generated.
5. The Demise of Translator Writing Systems
Interest in Translator Writing Systems continued after the mid-1960s but the interest in producing a universal system capable of
production-quality compilers decreased. This occurred for a number of reasons:
The number of new languages decreased and the requirements for new languages that did appear tended to be significantly
different from the previous generation.
Phases of the compilation process were specified in independent modules like LEX and YACC so that a compiler writer could
select a relevant set of modules to put together to create a specific compiler.
The speed of computers increased so that the need for the fastest compilation possible and the most efficient object code
decreased.
6. References
© Chilton Computing and UKRI Science and Technology Facilities Council [email protected]
Our thanks to UKRI Science and Technology Facilities Council for hosting this site
Build: 23/12/2022 14:15:25