6 Example: BASIC Compiler
Program file for this chapter:
The BASIC programming language was designed by John Kemeny and Thomas Kurtz in
the mid-1960s. (The name is an acronym for Beginner's All-purpose Symbolic Instruction
Code.) It was first implemented on a large, central computer facility at Dartmouth; the
designers' goal was to have a language that all students could use for simple problems, in
contrast to the arcane programming languages used by most experts at that time.
A decade later, when the microcomputer was invented, BASIC took on a new
importance. Kemeny and Kurtz designed a simple language for the sake of the users,
but that simplicity also made the language easy for the computer! Every programming
language requires a computer program to translate it into instructions that the computer
can carry out. For example, the Logo programs you write are translated by a Logo
interpreter. But Logo is a relatively complex language, and a Logo interpreter is a
pretty big program. The first microcomputers had only a few thousand bytes of memory.
(Today's home computers, by contrast, have several million bytes.) Those early personal
computers couldn't handle Logo, but it was possible to write a BASIC interpreter that
would fit them. As a result, BASIC became the near-universal language for amateur
computer enthusiasts in the late 1970s and early 1980s.
Today's personal computers come with translators for a wide variety of programming
languages, and also with software packages that enable many people to accomplish their
computing tasks without writing programs of their own at all. BASIC is much less widely
used today, although it has served as the core for Microsoft's Visual Basic language.
In this chapter, I want to show how Logo's define command can be used in
a program-writing program. My program will translate BASIC programs into Logo
programs. I chose BASIC for the same reason the early microcomputers used it: It's
a small language and the translator is relatively easy to write. (Kemeny and Kurtz, the
designers of BASIC, have criticized the microcomputer implementations as too simple
and as unfaithful to their original goals. My implementation will share that defect, to
make the project easier. Don't use this version as a basis on which to judge the language!
For that you should investigate True Basic, the version that Kemeny and Kurtz wrote
themselves for personal computers.)

A Short Course in BASIC

Here's a typical short BASIC program:

10 print "Table of Squares"
20 print
30 print "How many values would you like?"
40 input num
50 for i=1 to num
60 print i, i*i
70 next i
80 end

And here's what happens when we run it:

Table of Squares

How many values would you like?
5
1 1
2 4
3 9
4 16
5 25

Each line in the sample BASIC program begins with a line number. These numbers
are used for program editing. Instead of the modern screen editors with which you're
familiar, the early versions of BASIC had a very primitive editing facility; you could replace
a line by typing a new line with the same number. There was no way to replace less than
an entire line. To delete a line completely, you'd enter a line containing just the number.
The reason the line numbers in this program are multiples of ten is to leave room for
inserting new lines. For example, I could say

75 print "Have a nice day."

to insert a new line between lines 70 and 80. (By the way, the earliest versions of Logo
used a similar line numbering system, except that each Logo procedure was separately
numbered. The editing technique isn't really part of the language design; early systems
used line editors because they had typewriter-like paper terminals instead of today's
display screens. I'm using a line editor in this project because it's easy to implement!)

The BASIC language consists of one or two dozen commands, depending on the
version used. My BASIC dialect understands only these ten commands:

LET variable = value
PRINT values
INPUT variables
FOR variable = value TO value
NEXT variable
IF value THEN command
GOTO linenumber
GOSUB linenumber
RETURN
END

Unlike Logo procedure calls, which consist of the procedure name followed by inputs
in a uniform format, each BASIC command has its own format, sometimes including
internal separators such as the equal sign and the word to in the for command format,
or the word then in the if command format.

In some versions of BASIC, including this one, a single line can contain more than
one command, if the commands are separated with colons. Thus the same program
shown earlier could also be written this way:

10 print "Table of Squares":print
30 print "How many values would you like?":input num
50 for i=1 to num : print i, i*i : next i
80 end

The let command assigns a value to a variable, like Logo's make procedure. Unlike
Logo, BASIC does not have the rule that all inputs are evaluated before applying the
command. In particular, the word after let must be the name of the variable, not an
expression whose value is the name. Therefore the name is not quoted. Also, a variable
can't have the same name as a procedure, so there is no need for anything like Logo's
use of the colon to indicate a variable value. (This restricted version of BASIC doesn't
have named procedures at all, like some early microcomputer versions.)

make "x :y + 3     (Logo)
let x = y + 3      (BASIC)
In my subset of BASIC, the value of a variable must be a number. More complete BASIC
dialects include string variables (like words in Logo) and arrays (like Logo's arrays).

The value to be assigned to a variable can be computed using an arithmetic expression
made up of variables, numbers, the arithmetic operators +, -, *, and /, and parentheses
for grouping.

The print command is similar to Logo's print procedure in that it prints a line on
the screen. That line can include any number of values. Here is an example print
command:

print "x = "; x, "y = "; y, "sum = "; x+y

In this example two kinds of values are printed: arithmetic values (as in the let command)
and strings. A string is any sequence of characters surrounded by quotation marks.
Notice that the values in this example are separated by punctuation marks, either
commas or semicolons. When a semicolon is used, the two values are printed right next to
each other, with no space between them. (That's why each of the strings in this example
ends with a space.) When a comma is used, BASIC prints a tab character between the
two values, so that values on different lines will line up to form columns. (Look again at
the table of squares example at the beginning of this chapter.)

The input command is the opposite of print; it reads values from the keyboard
and assigns them to variables. There is nothing in Logo exactly like input. Instead,
Logo has operations readword and readlist that output the contents of a line; those
values can be assigned to variables using make or can be used in some other way. The
Logo approach is more flexible, but the early versions of BASIC didn't have anything like
Logo's operations. The input command will also accept a string in quotation marks
before its list of variables; that string is printed as a prompt before BASIC reads from the
keyboard. (BASIC does not start a new line after printing the prompt, so the effect is like
Logo's type command rather than like print.) Here's an example:

input "Please enter x and y: " x,y

The user can type the values for x and y on the same line, separated by spaces, or on
separate lines. BASIC keeps reading lines until it has collected enough numbers for
the listed variables. Notice that the variable names in the input command must be
separated by commas, not by semicolons.
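
The line-gathering behavior of input can be illustrated outside BASIC. The following Python function is only a sketch of that behavior, not part of this chapter's program; the name read_numbers, the use of an iterator of strings in place of the keyboard, and the decision to discard extra values are all assumptions made for the example.

```python
def read_numbers(count, lines):
    """Collect `count` numbers, consuming as many input lines as needed.

    `lines` is any iterator of strings standing in for the keyboard.
    """
    values = []
    while len(values) < count:
        line = next(lines)              # read one more line
        for word in line.split():       # values on a line are separated by spaces
            values.append(float(word))
    return values[:count]               # assumption: extras on the last line are discarded


# Two values typed on one line, or on separate lines, give the same result.
same_line = read_numbers(2, iter(["3 4"]))
separate_lines = read_numbers(2, iter(["3", "4"]))
print(same_line, separate_lines)
```

The loop condition is the whole point: the function asks for another line only while it is still short of values, which is why either way of typing the numbers works.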
The for and next commands work together to provide a numeric iteration
capability like Berkeley Logo's for procedure. The for command format includes a
variable name, a starting value, and an ending value. (The step value is always 1.) The
named variable is given the specified starting value. If that value is not greater than the
ending value, then all of the commands between the for command and the matching next
command (the one with the same named variable) are carried out. Then the variable
is increased by 1, and the process continues until the ending value is reached. For and
next pairs with different variables can be nested:

10 input "Input size: " num
20 for i = 1 to num
30 for j = i to num
40 print i;" ";j
50 next j:next i
60 end

Input size: 4
1 1
1 2
1 3
1 4
2 2
2 3
2 4
3 3
3 4
4 4

Notice that the next j must come before the next i so that the for/next pairs are
properly nested.

The if command allows conditional execution, much like Logo's if command, but
with a different notation. Instead of taking an instruction list as an input, BASIC's if
uses the keyword then to introduce a single conditional command. (If you want to make
more than one command conditional, you must combine if with goto, described next.)
The value that controls the if must be computed using one of the operators =, <, or >
for numeric comparison.*

* Notice that the equal sign has two meanings in BASIC. In the let command, it's like Logo's
make; in the if command, it's like Logo's equalp. In the early 1980s, Logo enthusiasts had
fierce arguments with BASIC fans, and this sort of notational inconsistency was one of the things
that drove us crazy! (More serious concerns were the lack of operations and of recursion in the
microcomputer versions of BASIC.)
The goto command transfers control to the beginning of a command line specified
by its line number. It can be used with if to make a sequence of commands conditional:

10 input x
20 if x > 0 then goto 100
30 print "x is negative."
40 print "x = "; x
50 goto 200
100 print "x is positive."
200 end

The gosub and return commands provide a rudimentary procedure calling
mechanism. I call it "rudimentary" because the procedures have no inputs, and can only
be commands, not operations. Moreover, the command lines that make up the procedure are
also part of the main program, so you generally need a goto in the main program to skip
over them:

10 let x=7
20 gosub 100
30 let x=9
40 gosub 100
50 goto 200
100 print x, x*x
110 return
200 end

Finally, the end command ends the program. There must be an end at the end of a
BASIC program, and there should not be one anywhere else. (In this implementation of
BASIC, an end stops the BASIC program even if there are more lines after it. It's roughly
equivalent to a throw to toplevel in Logo.)

Using the BASIC Translator

To start the translator, run the Logo procedure basic with no inputs. You will then see
the BASIC prompt, which is the word READY on a line by itself.

At the prompt you can do either of two things. If you type a line starting with a line
number, that line will be entered into your BASIC program. It is inserted in order by
line number. Any previous line with the same number will be deleted. If the line you
type contains only a line number, then the line in the program with that number will be
deleted.
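
The three editing rules just described (insert in numeric order, replace a line with the same number, delete when only a number is typed) fit in a few lines of code. The following Python sketch is an illustration only; it is not how the Logo program in this chapter stores its lines, and the names program, enter_line, and listing are invented for the example.

```python
program = {}   # maps line number -> text of that line (hypothetical storage)

def enter_line(program, text):
    """Apply one typed line that starts with a line number."""
    number, _, rest = text.strip().partition(" ")
    number = int(number)
    if rest.strip():
        program[number] = rest.strip()   # insert, or replace a line with the same number
    else:
        program.pop(number, None)        # a bare number deletes that line

def listing(program):
    """Return the program's lines in numeric order."""
    return [f"{n} {program[n]}" for n in sorted(program)]

enter_line(program, '10 print "Table of Squares"')
enter_line(program, "80 end")
enter_line(program, '75 print "Have a nice day."')   # lands between 10 and 80
enter_line(program, "10 print")                       # replaces line 10
enter_line(program, "75")                             # deletes line 75
print(listing(program))
```

Keying the storage by line number makes replacement and deletion trivial; the ordering is recovered by sorting whenever the lines are needed in sequence.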
If your line does not start with a number, then it is taken as an immediate command,
not as part of the program. This version of BASIC recognizes only three immediate
commands: The word run means to run your program, starting from the smallest line
number. The word list means to print out a listing of the program's lines, in numeric
order. The word exit returns to the Logo prompt.

Overview of the Implementation

There are two kinds of translators for programming languages: compilers and interpreters.
The difference is that a compiler translates one language (the source language) into
another (the target language), leaving the result around so that it can be run repeatedly
without being translated again. An interpreter translates each little piece of source
language into one action in the target language and runs the result, but does not
preserve a complete translated program in the target language.

Ordinarily, the target language for both compilers and interpreters is the native
language of the particular computer you're using, the language that is wired into the
computer hardware. This machine language is the only form in which a program can
actually be run. The BASIC compiler in this chapter is quite unrealistic in that it uses
Logo as the target language, which means that the program must go through another
translation, from Logo to machine language, before it can actually be run. For our
purposes, there are three advantages to using Logo as the target language. First, every
kind of computer has its own machine language, so I'd have to write several versions of
the compiler to satisfy everyone if I compiled BASIC into machine language. Second, I
know you know Logo, so you can understand the resulting program, whereas you might
not be familiar with any machine language. Third, this approach allows me to cheat by
leaving out a lot of the complexity of a real compiler. Logo is a high level language,
which means that it takes care of many details for us, such as the allocation of specific
locations in the computer's memory to hold each piece of information used by the
program. In order to compile into machine language, I'd have to pay attention to those
details.

Why would anyone want an interpreter, if the compiler translates the program once
and for all, while the interpreter requires retranslation every time a command is carried
out? One reason is that an interpreter is easier to write, because (just as in the case
of a compiler with Logo as the target language) many of the details can be left out.
Another reason is that traditional compilers work using a batch method, which means
that you must first write the entire program with a text editor, then run the compiler to
translate the program into machine language, and finally run the program. This is okay
for a working program that is used often, but not recompiled often. But when you're
creating a program in the first place, there is a debugging process that requires frequent
modifications to the source language program. If each modification requires a complete
recompilation, the debugging is slow and frustrating. That's why interpreted languages
are often used for teaching; when you're learning to program, you spend much more
time debugging a program than running the final version.

The best of both worlds is an incremental compiler, a compiler that can recompile only
the changed part when a small change is made to a large program. For example, Object
Logo is a commercial version of Logo for the Macintosh in which each procedure is
compiled when it is defined. Modifying a procedure requires recompiling that procedure,
but not recompiling the others. Object Logo behaves like an interpreter, because the
user doesn't have to ask explicitly for a procedure to be compiled, but programs run
faster in Object Logo than in most other versions because each procedure is translated
only once, rather than on every invocation.

The BASIC translator in this chapter is an incremental compiler. Each numbered
line is compiled into a Logo procedure as soon as it is typed in. If the line number is
40 then the resulting procedure will be named basic%40. The last step in each of these
procedures is to invoke the procedure for the next line. The compiler maintains a list of
all the currently existing line numbers, in order, so the run command is implemented
by saying

run (list (word "basic% first :linenumbers))

Actually, what I just said about each procedure ending with an invocation of the next
one is slightly simplified. Suppose the BASIC program starts

10 let x=3
20 let y=9
30 ...

and we translate that into

to basic%10
make "%x 3
basic%20
end
to basic%20
make "%y 9
basic%30
end

Then what happens if the user adds a new line numbered 15? We would have to recompile
line 10 to invoke basic%15 instead of basic%20. To avoid that, each line is compiled
in a way that defers the choice of the next line until the program is actually run:

to basic%10
make "%x 3
nextline 10
end

to basic%20
make "%y 9
nextline 20
end

This solution depends on a procedure nextline that finds the next available line
number after its argument:

to nextline :num
make "target member :num :linenumbers
if not emptyp :target [make "target butfirst :target]
if not emptyp :target [run (list (word "basic% first :target))]
end

Nextline uses the Berkeley Logo primitive member, which is like the predicate memberp
except that if the first input is found as a member of the second, instead of giving
true as its output, it gives the portion of the second input starting with the first input:

? show member "the [when in the course of human events]
[the course of human events]

If the first input is not a member of the second, member outputs an empty word or list,
depending on the type of the second input.
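
For readers who want to experiment with the lookup outside Logo, here is a rough Python rendering of the same idea. It is a sketch under assumptions: the function names are invented, the list-only version of member is a simplification of the Berkeley Logo primitive, and the function returns the next line number instead of running the corresponding procedure.

```python
def member(item, items):
    """Like Berkeley Logo's member for lists: the tail starting at item, or []."""
    if item in items:
        return items[items.index(item):]
    return []

def next_line_number(num, linenumbers):
    """Return the line number that follows num, or None if there is none."""
    target = member(num, linenumbers)
    if target:
        target = target[1:]              # butfirst: drop num itself
    if target:
        return target[0]
    return None

linenumbers = [10, 20, 30]
print(next_line_number(10, linenumbers))   # 20
linenumbers = [10, 15, 20, 30]             # the user adds line 15
print(next_line_number(10, linenumbers))   # 15, with no recompilation of line 10
```

Because the successor is looked up in the current list each time, adding line 15 changes what follows line 10 without touching anything already compiled for line 10.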
The two separate emptyp tests are used instead of a single if because the desired
line number might not be in the list at all, or it might be the last one in the list, in which
case the butfirst invocation will output an empty list. (Neither of these cases should
arise. The first means that we're running a line that doesn't exist, and the second means
that the BASIC program doesn't end with an end line. But the procedure tries to avoid
disaster even in these cases.)

Look again at the definition of basic%10. You'll see that the variable named x in the
BASIC program is named %x in the Logo translation. The compiler uses this renaming
technique to ensure that the names of variables and procedures in the compiled program
don't conflict with names used in the compiler itself. For example, the compiler uses a
variable named linenumbers whose value is the list of line numbers. What if someone
writes a BASIC program that says

10 let linenumbers = 100

This won't be a problem because in the Logo translation, that variable will be named
%linenumbers.

The compiler can be divided conceptually into four parts:

The reader divides the characters that the user types into meaningful units. For
example, it recognizes that let is a single word, but x+1 should be understood as
three separate words.

The parser recognizes the form of each of the ten BASIC commands that this dialect
understands. For example, if a command starts with if, the parser expects an
expression followed by the word then and another command.

The code generator constructs the actual translation of each BASIC command into one
or more Logo instructions.

The runtime library contains procedures that are used while the translated program
is running, rather than during the compilation process. The nextline procedure
discussed earlier is an example.

Real compilers have the same structure, except of course that the code generator produces
machine language instructions rather than Logo instructions. Also, a professional
compiler will include an optimizer that looks for ways to make the compiled program as
efficient as possible.

The Reader

A reader is a program that reads a bunch of characters (typically one line, although not in
every language) and divides those characters into meaningful units. For example, every
Logo implementation includes a reader that interprets square brackets as indications
of list grouping. But some of the rules followed by the Logo reader differ among
implementations. For example, can the hyphen character (-) be part of a larger word,
or is it always a word by itself? In a context in which it means subtraction, we'd like it to
be a word by itself. For example, when you say

print :x-3

as a Logo instruction, you mean to print three less than the value of the variable named
x, not to print the value of a variable whose name is the three-letter word x-3! On the
other hand, if you have a list of telephone numbers like this:

make "phones [555-2368 555-9827 555-8311]

you'd like the first of that list to be an entire phone number, the word 555-2368,
not just 555. Some Logo implementations treat every hyphen as a word by itself; some
treat every hyphen just like a letter, and require that you put spaces around a minus sign
if you mean subtraction. Other implementations, including Berkeley Logo, use a more
complicated rule in which the status of the hyphen depends on the context in which it
appears, so that both of the examples in this paragraph work as desired.

In any case, Logo's reader follows rules that are not appropriate for BASIC. For
example, the colon (:) is a delimiter in BASIC, so it should be treated as a word by itself;
in Logo, the colon is paired with the variable name that follows it. In both languages,
the quotation mark (") is used to mark quoted text, but in Logo it comes only at the
beginning of a word, and the quoted text ends at the next space character, whereas in
BASIC the quoted text continues until a second, matching quotation mark. For these
and other reasons, it's desirable to have a BASIC-specific reader for use in this project.

The rules of the BASIC reader are pretty simple. Each invocation of basicread
reads one line from the keyboard, ending with the Return or Enter character. Within
that line, space characters separate words but are not part of any word. A quotation mark
begins a quoted word that includes everything up to and including the next matching
quotation mark. Certain characters form words by themselves:

+ - * / = < > ( ) , ; :

All other characters are treated like letters; that is, they can be part of multi-character
words.
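
These rules are simple enough to restate as a short program. The Python function below is a sketch written from the rules as stated here, not a translation of the chapter's Logo procedures; the name basic_read and the treatment of an unclosed quotation mark are assumptions.

```python
DELIMITERS = set("+-*/=<>(),;:")

def basic_read(line):
    """Split one line of BASIC text into words, following the rules above."""
    words = []
    current = ""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            # A quoted word runs through the matching quotation mark.
            end = line.find('"', i + 1)
            if end == -1:
                end = len(line) - 1      # assumption: an unclosed quote runs to end of line
            if current:
                words.append(current)
                current = ""
            words.append(line[i:end + 1])
            i = end + 1
            continue
        if ch == " " or ch in DELIMITERS:
            if current:
                words.append(current)
                current = ""
            if ch != " ":
                words.append(ch)         # delimiters are one-character words
        else:
            current += ch                # everything else acts like a letter
        i += 1
    if current:
        words.append(current)
    return words

print(basic_read('40 print "a, b";x+1:goto 10'))
```

Note that the comma and space inside the quoted text stay inside one word, while the semicolon, plus sign, and colon outside it each become words of their own.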
? show basicread
30 print x;y;"foo,baz",z:print hello+4
[30 print x ; y ; "foo,baz" , z : print hello + 4]

Notice that the comma inside the quotation marks is not made into a separate word by
basicread. The other punctuation characters, however, appear in the output sentence
as one-character words.

Basicread uses the Logo primitive readword to read a line. Readword can be
thought of as a reader with one trivial rule: The only special character is the one that
ends a line. Everything else is considered as part of a single long word. Basicread
examines that long word character by character, looking for delimiters, and accumulating
a sentence of words separated according to the BASIC rules. The implementation of
basicread is straightforward; you can read the procedures at the end of this chapter
if you're interested. For now, I'll just take it for granted and go on to discuss the more
interesting parts of the BASIC compiler.

The Parser

The parser is the part of a compiler that figures out the structure of each piece of the
source program. For example, if the BASIC compiler sees the command

let x = ( 3 * y ) + 7

it must recognize that this is a let command, which must follow the pattern

LET variable = value

and therefore x must be the name of a variable, while ( 3 * y ) + 7 must be an
expression representing a value. The expression must be further parsed into its
component pieces. Both the variable name and the expression must be translated into
the form they will take in the compiled (Logo) program, but that's the job of the code
generator.

In practice, the parser and the code generator are combined into one step; as each
piece of the source program is recognized, it is translated into a corresponding piece
of the object program. So we'll see that most of the procedures in the BASIC compiler
include parsing instructions and code generation instructions. For example, here is the
procedure that compiles a let command:
to compile.let :command
make "command butfirst :command
make "var pop "command
make "delimiter pop "command
if not equalp :delimiter "= [(throw "error [Need = in let.])]
make "exp expression
queue "definition (sentence "make (word ""% :var) :exp)
end

In this procedure, all but the last instruction (the line starting with queue) are parsing
the source command. The last line, which we'll come back to later, is generating a Logo
make instruction, the translation of the BASIC let in the object program.

BASIC was designed to be very easy to parse. The parser can read a command from
left to right, one word at a time; at every moment, it knows exactly what to expect. The
command must begin with one of the small number of command names that make up
the BASIC language. What comes next depends on that command name; in the case
of let, what comes next is one word (the variable name), then an equal sign, then
an expression. Each instruction in the compile.let procedure handles one of these
pieces. First we skip over the word let by removing it from the front of the command:

make "command butfirst :command

Then we read and remember one word, the variable name:

make "var pop "command

(Remember that the pop operation removes one member from the beginning of a list,
returning that member. In this case we are removing the variable name from the entire
let command.) Then we make sure there's an equal sign:

make "delimiter pop "command
if not equalp :delimiter "= [(throw "error [Need = in let.])]

And finally we call a subprocedure to read the expression; as we'll see later, that procedure
also translates the expression to the form it will take in the object program:

make "exp expression
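
The same left-to-right pattern (skip the command name, take one word, insist on a delimiter, hand the rest to an expression reader) can be written compactly in other languages. The Python sketch below mirrors only the parsing steps of the let compiler; the names parse_let and BasicSyntaxError are invented for illustration, and the expression is returned untranslated rather than processed.

```python
class BasicSyntaxError(Exception):
    """Raised when a command does not match its expected pattern."""

def parse_let(command):
    """Parse the word list of a `let` command into (variable, expression words)."""
    words = list(command)        # work on a copy, consuming from the left
    words.pop(0)                 # skip over the word "let"
    var = words.pop(0)           # one word: the variable name
    delimiter = words.pop(0)     # the next word must be an equal sign
    if delimiter != "=":
        raise BasicSyntaxError("Need = in let.")
    return var, words            # whatever remains is the expression

print(parse_let(["let", "x", "=", "(", "3", "*", "y", ")", "+", "7"]))
```

At no point does the parser need to look ahead or back up; each step consumes exactly the word it expects, which is what makes BASIC so easy to parse.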
The parsers for other BASIC commands have essentially the same structure as
this example. They repeatedly invoke pop to read one word from the command or
expression to read and translate an expression. (The if command is a little more
complicated because it contains another command as a component, but that inner
command is just compiled as if it occurred by itself. We'll look at that process in more
detail when we get to the code generation part of the compiler.)

Each compilation procedure expects a single BASIC command as its input. Remember
that a line in a BASIC program can include more than one command. The compiler
uses a procedure named split to break up each line into a list of commands:

? show split [30 print x ; y ; "foo,baz" , z : print hello + 4]
[30 [print x ; y ; "foo,baz" , z] [print hello + 4]]

Split outputs a list whose first member is a line number; the remaining members are
lists, each containing one BASIC command. Split works by looking for colons within
the command line.

Here is the overall structure of the compiler, but with only the instructions related
to parsing included:

to basic
forever [basicprompt]
end

to basicprompt
print "READY
make "line basicread
if emptyp :line [stop]
ifelse numberp first :line [compile split :line] [immediate :line]
end

to compile :commands
make "number first :commands
ifelse emptyp butfirst :commands ~
[eraseline :number] ~
[makedef (word "basic% :number) butfirst :commands]
end

to makedef :name :commands
...
foreach :commands [run list (word "compile. first ?) ?]
...
end
Basic does some initialization (not shown) and then invokes basicprompt repeatedly.
Basicprompt calls the BASIC reader to read a line; if that line starts with a number,
then split is used to transform the line into a list of commands, and compile is
invoked with that list as input. Compile remembers the line number for later use, and
then invokes makedef with the list of commands as an input. I've left out most of
the instructions in makedef because they're concerned with code generation, but the
important part right now is that for each command in the list, it invokes a procedure
whose name is compile. followed by the first word of the command (compile.let, for
example). That first word must be one of the command names in the BASIC language.

The Code Generator

Each line of the BASIC source program is going to be compiled into one Logo procedure.
(We'll see shortly that the BASIC if and for commands are exceptions.) For example,
the line

10 let x = 3 : let y = 4 : print x,y+6

will be compiled into the Logo procedure

to basic%10
make "%x 3
make "%y 4
type :%x
type char 9
type :%y + 6
print []
nextline 10
end

Each of the three BASIC commands within the source line contributes one or more
instructions to the object procedure. Each let command is translated into a make
instruction; the print command is translated into three type instructions and a print
instruction. (The last instruction line in the procedure, the invocation of nextline,
does not come from any of the BASIC commands, but is automatically part of the
translation of every BASIC command line.)

To generate this object procedure, the BASIC compiler is going to have to invoke
Logo's define primitive, this way:

define "basic%10 [[] [make "%x 3] [make "%y 4] ... [nextline 10]]
Of course, these actual inputs do not appear explicitly in the compiler! Rather, the inputs
to define are variables that have the desired values:

define :name :definition

The variable name is an input to makedef, as we've seen earlier. The variable
definition is created within makedef. It starts out as a list containing just the empty
list, because the first sublist of the input to define is the list of the names of the desired
inputs to basic%10, but it has no inputs. The procedures within the compiler that parse
each of the commands on the source line will also generate object code (that is, Logo
instructions) by appending those instructions to the value of definition using Logo's
queue command. Queue takes two inputs: the name of a variable whose value is a list,
and a new member to be added at the end of the list. Its effect is to change the value of
the variable to be the extended list.

Look back at the definition of compile.let above. Earlier we considered the
parsing instructions within that procedure, but deferred discussion of the last instruction:

queue "definition (sentence "make (word ""% :var) :exp)

Now we can understand what this does: It generates a Logo make instruction and
appends that instruction to the object procedure definition in progress.

We can now also think about the output from the expression procedure. Its job is
to parse a BASIC expression and to translate it into the corresponding Logo expression.
This part of the compiler is one of the least realistic. A real compiler would have to think
about such issues as the precedence of arithmetic operations; for example, an expression
like 3+x*4 must be translated into two machine language instructions, first one that
multiplies x by 4, and then one that adds the result of that multiplication to 3. But the
Logo interpreter already handles that aspect of arithmetic for us, so all expression has
to do is to translate variable references like x into the Logo form :%x.

? show expression [3 + x * 4]
[3 + :%x * 4]

(We'll take a closer look at translating arithmetic expressions in the Pascal compiler
found in the third volume of this series, Beyond Programming.)
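
The division of labor described here, where the expression translator only renames variables and the let compiler appends one make instruction to a growing definition, can be mimicked in a few lines of Python. This is a sketch and not the chapter's code: the function names are invented, instructions are represented as lists of strings, and treating every word that starts with a letter as a variable is an assumption that suits this restricted dialect.

```python
def translate_expression(words):
    """Rewrite BASIC variable references as Logo ones; leave the rest alone."""
    result = []
    for word in words:
        if word[0].isalpha():
            result.append(":%" + word)   # x becomes :%x (assumption: letters mean a variable)
        else:
            result.append(word)          # numbers, operators, parentheses
    return result

def generate_let(definition, var, expression_words):
    """Append the Logo instruction for `let var = expression` to definition."""
    instruction = ["make", '"%' + var] + translate_expression(expression_words)
    definition.append(instruction)       # plays the role of Logo's queue

definition = [[]]                        # first sublist: the (empty) list of inputs
generate_let(definition, "x", ["3"])
generate_let(definition, "y", ["3", "+", "x", "*", "4"])
for instruction in definition[1:]:
    print(" ".join(instruction))
```

The translated text never reorders the arithmetic; operator precedence is left entirely to whatever later evaluates the generated instruction, just as the chapter's compiler leaves it to Logo.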
We are now ready to look at the complete version of makedef:
The Code Generator 97
nextline
nextline
define
goto gosub
goto goto gosub
goto
goto
nextline
stop
to makedef :name :commands
make "definition [[]]
foreach :commands [run list (word "compile. first ?) ?]
queue "definition (list "nextline :number)
define :name :definition
make "linenumbers insert :number :linenumbers
end
goto 40
basic%40 stop
stop
basic%40
* In fact, the Berkeley Logo interpreter is clever enough to notice that there is a instruction
after the invocation of , and it arranges things so that there is no return from that
procedure. This makes things a little more efcient, but doesnt change the meaning of the
program.
I hope youll nd this straightforward. First we create an empty denition. Then, for
each BASIC command on the line, we append to that denition whatever instructions
are generated by the code generating instructions for that command. After all the BASIC
commands have been compiled, we add an invocation of to the denition.
Now we can actually dene the Logo procedure whose text weve been accumulating.
The last instruction updates the list of line numbers that uses to nd the next
BASIC command line when the compiled program is running.
In a sense, this is the end of the story. My purpose in this chapter was to illustrate
how can be used in a signicant project, and Ive done that. But there are a
few more points I should explain about the code generation for some specic BASIC
commands, to complete your understanding of the compiler.
One such point is about the difference between and . Logo doesnt
have anything like a mechanism; both and must be implemented by
invoking the procedure corresponding to the given line number. The difference is that
in the case of , we want to invoke that procedure and not come back! The solution
is to compile the BASIC command
into the Logo instructions
In effect, we are calling line 40 as a subprocedure, but when it returns, were nished.
Any additional Logo instructions generated for the same line after the (including
the invocation of thats generated automatically for every source line) will be
ignored because of the .*
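The effect of the goto translation can be tried without the compiler. The sketch below builds a procedure for a made-up BASIC line 10 in the same way makedef does, by queueing instruction lines onto a definition and handing the result to define. The line contents are invented for illustration, and a print instruction stands in for the nextline invocation that makedef would normally add last. It assumes a fresh Logo session in which basic%10 and basic%40 are not already defined.

```logo
; Start with an empty input list, then append one instruction line
; at a time, as makedef does.
make "definition [[]]
queue "definition [print "ten]
queue "definition [basic%40 stop]
queue "definition [print "unreachable]
define "basic%10 :definition

; Hand-written stand-in for the compiled line 40.
to basic%40
print "forty
end

basic%10
```

Running basic%10 prints ten and then forty. The word unreachable is never printed, because the stop that follows the invocation of basic%40 ends basic%10 as soon as line 40 returns.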
The next tricky part of the compiler has to do with the for and next commands.
Think first about next. It must increment the value of the given variable, test that
value against a remembered limit, and, if the limit has not been reached, go to... where?
The for loop continues with the BASIC command just after the for command itself.
That might be in the middle of a line, so next can't just remember a line number and
invoke basic%N for line number N. To solve this problem, the line containing the for
command is split into two Logo procedures, one containing everything up to and
including the for, and one for the rest of the line. For example, the line

30 let x = 3 : for i = 1 to 5 : print i,x : next i

is translated into

to basic%30
make "%x 3
make "%i 1
make "let%i 5
make "next%i [%g1]
%g1
end

to %g1
type :%i
type char 9
type :%x
print []
make "%i :%i + 1
if not greaterp :%i :let%i [run :next%i stop]
nextline 30
end

The first make instruction in basic%30 is the translation of the let command. The
remaining four lines are the translation of the for command; it must give an initial value
to the variable i, remember the limit value 5, and remember the Logo procedure to be
used for looping. That latter procedure is named %g1 in this example. The percent
sign is used for the usual reason, to ensure that the names created by the compiler don't
conflict with names in the compiler itself. The g1 part is a generated symbol, created by
invoking the Berkeley Logo primitive operation gensym. Each invocation of gensym
outputs a new symbol, first g1, then g2, and so on.

The first four instructions in procedure %g1 (three types and a print) are the
translation of the BASIC print command. The next two instructions are the translation
of the next command; the make instruction increments i, and the if instruction tests
whether the limit has been passed, and if not, invokes the looping procedure %g1 again.
(Why does this say run :next%i instead of just %g1? Remember that the name %g1 was
created during the compilation of the for command. When we get around to compiling
the next command, the code generator has no way to remember which generated
symbol was used by the corresponding for. Instead it makes reference to a variable
next%i, named after the variable given in the next command itself, whose value is the
name of the procedure to run. Why not just call that procedure itself next%i instead of
using a generated symbol? The trouble is that there might be more than one pair of
for and next commands in the same BASIC program using the same variable, and each of
them must have its own looping procedure name.)

There is a slight complication in the print and input commands to deal with
quoted character strings. The trouble is that Logo's idea of a word ends with a space, so
it's not easy to translate

20 print "hi there"

into a Logo instruction in which the string is explicitly present in the instruction. Instead,
the BASIC compiler creates a Logo global variable with a generated name, and uses that
variable in the compiled Logo instructions.

The trickiest compilation problem comes from the if command, because it includes
another command as part of itself. That included command might be translated into
several Logo instructions, all of which should be made to depend on the condition that
the if is testing. The solution is to put the translation of the inner command into a
separate procedure, so that the BASIC command line

50 if x<6 then print x, x*x

is translated into the two Logo procedures

to basic%50
if :%x < 6 [%g2]
nextline 50
end

to %g2
type :%x
type char 9
type :%x * :%x
print []
end
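To watch the condition guard the whole inner command, the two procedures for line 50 can be typed in by hand and run. This is only a sketch: it assumes a fresh Logo session in which basic%50 and %g2 are not already defined, and it leaves out the nextline invocation so that it runs without the rest of the compiler.

```logo
; Hand-typed copy of the translation of line 50.
; The nextline invocation is omitted so that this runs on its own.
to basic%50
if :%x < 6 [%g2]
end

; Translation of the inner print command.
to %g2
type :%x
type char 9
type :%x * :%x
print []
end

make "%x 3
basic%50
make "%x 7
basic%50
```

With the BASIC variable x set to 3, the first invocation prints 3, a tab, and 9 on one line. With x set to 7 the condition fails, so none of the four instructions in %g2 is carried out and nothing is printed.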
Unfortunately, this doesn't quite work if the inner command is a goto. If we were
to translate

60 if foo < 10 then goto 200

into

to basic%60
if :%foo < 10 [%g3]
nextline 60
end

to %g3
basic%200 stop
end

then the stop inside %g3 would stop only %g3 itself, not basic%60 as desired. So the
code generator for if checks to see whether the result of compiling the inner command
is a single Logo instruction line; if so, that line is used directly in the compiled Logo
if rather than diverted into a subprocedure:

to basic%60
if :%foo < 10 [basic%200 stop]
nextline 60
end

How does the code generator for if divert the result of compiling the inner
command away from the definition of the overall BASIC command line? Here is the
relevant part of the compiler:

to compile.if :command
make "command butfirst :command
make "exp expression
make "delimiter pop "command
if not equalp :delimiter "then [(throw "error [Need then after if.])]
queue "definition (sentence "if :exp (list c.if1))
end

to c.if1
local "definition
make "definition [[]]
run list (word "compile. first :command) :command
ifelse (count :definition) = 2 ~
[output last :definition] ~
[make "newname word "% gensym
define :newname :definition
output (list :newname)]
end

The first few lines of compile.if straightforwardly parse the part of the BASIC if command
up to the word then. What happens next is a little tricky; a subprocedure c.if1 is
invoked to parse and translate the inner command. It has to be a subprocedure because
it creates a local variable named definition; when the inner command is compiled,
this local variable "steals" the generated code. If there is only one line of generated code,
then c.if1 outputs that line; if more than one, then c.if1 creates a subprocedure and
outputs an instruction to invoke that subprocedure. This technique depends on Logo's
dynamic scope, so that references to the variable named definition in other parts of
the compiler (such as, for example, compile.print or compile.goto) will refer to
this local version.

The Runtime Library

We've already seen the most important part of the runtime library: the procedure
nextline that gets the compiled program from one line to the next.
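The lookup that nextline performs can be tried on its own. In this sketch the procedure demo.successor and its inputs are hypothetical; it repeats the steps nextline takes with member and butfirst, but it outputs the name of the procedure that would be invoked instead of running it, and it takes the list of line numbers as an input instead of reading the global variable linenumbers.

```logo
; Hypothetical helper, not part of the compiler.
; Outputs the name of the procedure for the line after :num,
; or the word none if :num is the last line or is not in the list.
to demo.successor :num :lines
local "target
make "target member :num :lines
if not emptyp :target [make "target butfirst :target]
if emptyp :target [output "none]
output word "basic% first :target
end

print demo.successor 10 [10 20 40]
print demo.successor 40 [10 20 40]
```

The first instruction prints basic%20 and the second prints none. In the compiler, reaching the end of the list simply means that nextline does nothing further, so the compiled program finishes.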
There is only one more procedure needed as runtime support; it's called readvalue
and it's used by the BASIC input command. In BASIC, data input is independent of
lines. If a single input command includes two variables, the user can type the two
desired values on separate lines or on a single line. Furthermore, two separate input
commands can read values from a single line, if there are still values left on the line after
the first input has been satisfied. Readvalue uses a global variable readline whose
value is whatever's still available from the last data input line, if any. If there is nothing
available, it reads a new line of input.
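The buffering idea can be seen in a small sketch that takes its input lines from a list instead of the keyboard. The names demo.lines, demo.buffer, and demo.readvalue are invented for this example; the compiler's readvalue keeps its leftover values in the variable readline and gets new lines from basicread.

```logo
; Two pretend input lines: the user typed 10 20 and then 30.
make "demo.lines [[10 20] [30]]
make "demo.buffer []

; Hypothetical stand-in for readvalue.
; Refill the buffer only when it is empty, then hand out one value.
to demo.readvalue
while [emptyp :demo.buffer] [make "demo.buffer pop "demo.lines]
output pop "demo.buffer
end

print demo.readvalue
print demo.readvalue
print demo.readvalue
```

The three instructions print 10, 20, and 30. The second value comes from what was left of the first line, and a new line is fetched only for the third. Asking for a fourth value would fail in this sketch because the list of pretend lines is empty, whereas the real procedure would wait for the user to type another line.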
A more realistic BASIC implementation would include runtime library procedures
to compute built-in functions (the equivalent to Logo's primitive operations) such as
absolute value or the trigonometric functions.
Further Explorations

This BASIC compiler leaves out many features of a complete implementation. In a real
BASIC, a string can be the value of a variable, and there are string operations such
as concatenation and substring extraction analogous to the arithmetic operations for
numbers. The BASIC programmer can create an array of numbers, or an array of strings.
In some versions of BASIC, the programmer can define named subprocedures, just as
in Logo. For the purposes of this chapter, I wanted to make the compiler as simple as
possible and still have a usable language. If you want to extend the compiler, get a BASIC
textbook and start implementing features.

It's also possible to expand the immediate command capabilities of the compiler.
In most BASIC implementations, for example, you can say list 100-200 to list only a
specified range of lines within the source program.

A much harder project would be to replace the code generator in this compiler with
one that generates machine language for your computer. Instead of using define to
create Logo procedures, your compiler would then write machine language instructions
into a data file. To do this, you must learn quite a lot about how machine language
programs are run on your computer!

Program Listing

I haven't discussed every detail of the program. For example, you may want to trace
through what happens when you ask to delete a line from the BASIC source program.
Here is the complete compiler.

to basic
make "linenumbers []
make "readline []
forever [basicprompt]
end

to basicprompt
print []
print "READY
print []
make "line basicread
if emptyp :line [stop]
ifelse numberp first :line [compile split :line] [immediate :line]
end
to compile :commands
make "number first :commands
make :number :line
ifelse emptyp butfirst :commands ~
[eraseline :number] ~
[makedef (word "basic% :number) butfirst :commands]
end
to makedef :name :commands
make "definition [[]]
foreach :commands [run list (word "compile. first ?) ?]
queue "definition (list "nextline :number)
define :name :definition
make "linenumbers insert :number :linenumbers
end
to insert :num :list
if emptyp :list [output (list :num)]
if :num = first :list [output :list]
if :num < first :list [output fput :num :list]
output fput first :list (insert :num butfirst :list)
end
to eraseline :num
make "linenumbers remove :num :linenumbers
end
to immediate :line
if equalp :line [list] [foreach :linenumbers [print thing ?] stop]
if equalp :line [run] [run (list (word "basic% first :linenumbers))
stop]
if equalp :line [exit] [throw "toplevel]
print sentence [Invalid command:] :line
end
;; Compiling each BASIC command
to compile.end :command
queue "definition [stop]
end
to compile.goto :command
queue "definition (list (word "basic% last :command) "stop)
end
to compile.gosub :command
queue "definition (list (word "basic% last :command))
end
to compile.return :command
queue "definition [stop]
end
to compile.print :command
make "command butfirst :command
while [not emptyp :command] [c.print1]
queue "definition [print []]
end
to c.print1
make "exp expression
ifelse equalp first first :exp "" ~
[make "sym gensym
make word "%% :sym butfirst butlast first :exp
queue "definition list "type word ":%% :sym] ~
[queue "definition fput "type :exp]
if emptyp :command [stop]
make "delimiter pop "command
if equalp :delimiter ", [queue "definition [type char 9] stop]
if equalp :delimiter "\; [stop]
(throw "error [Comma or semicolon needed in print.])
end
to compile.input :command
make "command butfirst :command
if equalp first first :command "" ~
[make "sym gensym
make "prompt pop "command
make word "%% :sym butfirst butlast :prompt
queue "definition list "type word ":%% :sym]
while [not emptyp :command] [c.input1]
end
to c.input1
make "var pop "command
queue "definition (list "make (word ""% :var) "readvalue)
if emptyp :command [stop]
make "delimiter pop "command
if not equalp :delimiter ", [(throw "error [Comma needed in input.])]
end
to compile.let :command
make "command butfirst :command
make "var pop "command
make "delimiter pop "command
if not equalp :delimiter "= [(throw "error [Need = in let.])]
make "exp expression
queue "definition (sentence "make (word ""% :var) :exp)
end
to compile.for :command
make "command butfirst :command
make "var pop "command
make "delimiter pop "command
if not equalp :delimiter "= [(throw "error [Need = after for.])]
make "start expression
make "delimiter pop "command
if not equalp :delimiter "to [(throw "error [Need to after for.])]
make "end expression
queue "definition (sentence "make (word ""% :var) :start)
queue "definition (sentence "make (word ""let% :var) :end)
make "newname word "% gensym
queue "definition (sentence "make (word ""next% :var)
(list (list :newname)))
queue "definition (list :newname)
define :name :definition
make "name :newname
make "definition [[]]
end
to compile.next :command
make "command butfirst :command
make "var pop "command
queue "definition (sentence "make (word ""% :var) (word ":% :var) [+ 1])
queue "definition (sentence [if not greaterp]
(word ":% :var) (word ":let% :var)
(list (list "run (word ":next% :var)
"stop)))
end
to compile.if :command
make "command butfirst :command
make "exp expression
make "delimiter pop "command
if not equalp :delimiter "then [(throw "error [Need then after if.])]
queue "definition (sentence "if :exp (list c.if1))
end
to c.if1
local "definition
make "definition [[]]
run list (word "compile. first :command) :command
ifelse (count :definition) = 2 ~
[output last :definition] ~
[make "newname word "% gensym
define :newname :definition
output (list :newname)]
end
;; Compile an expression for LET, IF, PRINT, or FOR
to expression
make "expr []
make "token expr1
while [not emptyp :token] [queue "expr :token
make "token expr1]
output :expr
end
to expr1
if emptyp :command [output []]
make "token pop "command
if memberp :token [+ - * / = < > ( )] [output :token]
if memberp :token [, \; : then to] [push "command :token output []]
if numberp :token [output :token]
if equalp first :token "" [output :token]
output word ":% :token
end
;; reading input
to basicread
output basicread1 readword [] "
end
to basicread1 :input :output :token
if emptyp :input [if not emptyp :token [push "output :token]
output reverse :output]
if equalp first :input "| | [if not emptyp :token [push "output :token]
output basicread1 (butfirst :input)
:output "]
if equalp first :input "" [if not emptyp :token [push "output :token]
output breadstring butfirst :input
:output "]
if memberp first :input [+ - * / = < > ( ) , \; :] ~
[if not emptyp :token [push "output :token]
output basicread1 (butfirst :input) (fput first :input :output) "]
output basicread1 (butfirst :input) :output (word :token first :input)
end
to breadstring :input :output :string
if emptyp :input [(throw "error [String needs ending quote.])]
if equalp first :input "" ~
[output basicread1 (butfirst :input)
(fput (word "" :string "") :output)
"]
output breadstring (butfirst :input) :output (word :string first :input)
end
to split :line
output fput first :line split1 (butfirst :line) [] []
end
to split1 :input :output :command
if emptyp :input [if not emptyp :command [push "output reverse :command]
output reverse :output]
if equalp first :input ": [if not emptyp :command
[push "output reverse :command]
output split1 (butfirst :input) :output []]
output split1 (butfirst :input) :output (fput first :input :command)
end
;; Runtime library
to nextline :num
make "target member :num :linenumbers
if not emptyp :target [make "target butfirst :target]
if not emptyp :target [run (list (word "basic% first :target))]
end
to readvalue
while [emptyp :readline] [make "readline basicread]
output pop "readline
end