Programming Language
Morgan McGuire∗
Williams College
Contents
1 Introduction
2 Computability
3 Life of a Program
4 Interpreters and Compilers
5 Syntax
6 Semantics
7 The λ Calculus
8 Macros
9 Type Systems
10 Memory Management
1 Introduction
A well-written program is a poem. Both are powerful because their content is condensed without being inscrutable and because the form is carefully chosen to give insight into the topic. For a program, the topic is
an algorithm and the implementation should emphasize the key steps while minimizing the details. The most
elegant implementation is not always the most efficient, although it often is within a constant factor of optimal.
The choice of programming language most closely corresponds to the choice of poem structure, e.g., sonnet or
villanelle, not the choice of natural language, e.g., English or French. Structure enforces certain patterns and
ways of thinking on the author and reader, thus aiding certain kinds of expression and inhibiting others.
To author elegant programs, one must master a set of languages and language features. Then, one must sub-
jectively but precisely choose among them to express specific algorithms. Languages are themselves designed.
A well-designed language is a meta-poem. A language designer crafts a set of expressive tools suited to the
safety, performance, and expressive demands of a problem domain. As with literature, the difficult creative
choice is often not what to include, but what to omit.
A programming language is a mathematical calculus, or formal language. Its goal is to express algorithms
in a manner that is unambiguous to people and machines. Like any calculus, a language defines both syntax
and semantics. Syntax is the grammar of the language; the notation. Semantics is the meaning of that notation.
Since syntax can easily be translated, the semantics are more fundamental.
Church and Turing (and Kleene) showed that the minimal semantics of the λ calculus and Turing machine
are sufficient to emulate the semantics of any more complicated programming language or machine. However,
reducing a particular language to the λ calculus may require holistic restructuring of programs in that language.
∗ morgan@cs.williams.edu, http://graphics.cs.williams.edu
We say that a particular language feature (e.g., continuations, macros, garbage collection) is expressive if it
cannot be emulated without restructuring programs that use it.
In these notes, features are our aperture on programming languages. These features can increase and de-
crease the expressiveness of the language for certain domains. Why would we want to decrease expressiveness?
The primary reason is to make it easier to automatically reject invalid programs, thus aiding the debugging and
verification process. Compilers print error messages when they automatically detect (potential) errors in your program. You might previously have considered such errors to be bad. But they are good! When this happens, the compiler saves you the trouble of finding the errors by testing (as well as the risk of not finding them at all). A
major challenge and theme of programming language design is simultaneously pushing the boundaries of what
is checkable and expressive in a language.
1.1 Types
Every language makes some programs easy to express and others difficult. When a language is well-suited to
a problem domain, the programs it makes easy to express are correct solutions to problems in that domain. A
well-suited language furthermore makes it hard to express programs that are incorrect. This is desirable! One
way to design a language is to selectively add restrictions until it is hard to express incorrect programs for the
target domain. The cost of a language design is that some correct and potentially useful programs also become
hard to express in the language.
The type system is one tool for restricting a language. A type system associates metadata with values and the
variables that can be bound to them. A well-typed program is one where constraints on the metadata imposed
by the language and program itself are satisfied. When these are violated, e.g., by assigning a “String” value
to an “int” variable in Java, the program is incorrect. Some kinds of program errors can be detected by static
analysis, which means examining the program without executing it. Some kinds of errors cannot be detected
efficiently through static analysis, or are statically undecidable. Many of these can be detected by dynamic
analysis, which means executing type checks at run time, i.e., while the program is executing.
We say that a language exhibits type soundness if well-typed programs in that language cannot “go
wrong” [Mil78]. That is, if well-typed programs cannot reach stuck states [WF94] from which further exe-
cution rules are undefined. Another view of this is that “A language is type-safe if the only operations that can
be performed on data in the language are those sanctioned by the type of the data.” [Sar97]
By declaring undesirable behaviors–such as dereferencing a null pointer, accessing a private member of
another class, or reading from the filesystem–to be type errors and thus unsanctioned, the language designer
can leverage type soundness to enforce safety and security.
All languages (even assembly languages) assign a type to a value at least before it is operated on, since operations are not well-defined without an interpretation of the data. Most languages also assign types to values that are simply stored in memory. Assembly languages do not: values in memory (including registers) are just bytes, and the programmer must keep track of their interpretation implicitly. Statically typed languages contain explicit declarations that limit the types of values to which a variable may be bound. C++ and Java are statically typed languages. Dynamically typed languages such as Scheme and Python allow a variable to be bound to any type of value. Some languages, like ML, are statically typed but require few explicit declarations: the compiler uses type inference to assign static types autonomously where possible.
2 Computability
evaluated at n = i. This statement is a formal equivalent of the Liar’s Paradox, which in natural language is
the statement, “This sentence is not true.” Sn(n) creates an inconsistency. As a paradox, it can be neither proved (true) nor disproved (false).
As a result of the Incompleteness Theorem, we know that there exist functions whose results cannot be
computed. These non-computable functions (also called undecidable) are interesting for computer science
because they indicate that there are mathematical statements whose validity cannot be determined mechanically.
For computer science, we define computability as:
A function f is computable if there exists a program P that computes f , i.e., for any input x, the
computation P(x) halts with output f (x).
Unfortunately, many of these undecidable statements are properties of programs that we would like a compiler to
check. A constant challenge in programming language development is that it is mathematically impossible to
prove certain properties about arbitrary programs, such as whether a program does not contain an infinite loop.
1 “program” as in plan of action, not code.
2 ...and answered Hilbert’s “second problem”: prove that arithmetic is self-consistent. Whitehead and Russell’s Principia Mathematica had previously attempted to derive all mathematics from a set of axioms.
Proof. Assume program Q(P, x) computes H (somehow). Construct another program D(P) such that

D(P):
    if Q(P, P) = “halts” then loop forever
    else halt

Now consider D(D). If Q(D, D) answers “halts,” then D(D) loops forever; if it answers anything else, D(D) halts. Either way, Q is wrong about D(D), contradicting the assumption that Q computes H.
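The construction can be transcribed into Python, with the caveat that the decider Q is hypothetical; by this very argument, it cannot be written:

def D(P):
    if Q(P, P) == "halts":   # Q is the assumed (impossible) halting decider
        while True:
            pass             # then loop forever
    else:
        return               # else halt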
The proof only holds when H must determine the status of every program and every input. It is possible to
prove that a specific program with a specific input halts. For a sufficiently limited language, it is possible to
solve the Halting Problem. For example, every finite program in a language without recursion or iteration must
halt.
The theorem and proof can be extended to most observable properties of programs. For example, within the same structure one can prove that it is undecidable whether a program prints output or reaches a specific line during execution. Note that it is critical to the proof that Q(P, x) does not actually run P; instead, it must decide what behavior P would exhibit, were it to be run, presumably by examining the source code of P. See http://www.cgl.uwaterloo.ca/~csk/halt/ for a nice explanation of the Halting Problem using the C programming language.
Two straightforward ways to prove that a property is undecidable are:
• Show that the Halting Problem reduces to this property. That is, if you can solve static checking of the
property, then you can solve the Halting Problem, therefore the property is at least as hard as the Halting
Problem and is undecidable.
• Rewrite a Halting Problem proof, substituting the property for halting. Note that this is not the same as
reducing the property to the Halting Problem.
Such restricted languages exist, e.g., early versions of the 3D graphics DirectX HLSL and OpenGL GLSL languages. One can guarantee that programs in such languages always halt, for example, by simply not providing the mechanisms for function calls or variable-length loops. Of course, programmers often find those restrictions burdensome, and most practical languages eventually evolve features that expand their expressive power and defeat their static checkability. This has already occurred in the C++ type system with the template feature and in both the aforementioned HLSL and GLSL languages.
3 Life of a Program
A program goes through three major stages: Source, Expressions, and Values. Formal specifications describe the syntax of the source and the set of expressions using a grammar, typically in BNF; this is called the expression domain of the language. The value domain is described in set notation or as a BNF grammar. Expressions are also called terms. Expressions that do not reduce to a value are sometimes called statements.
An analogy to a person reading a book helps to make the three stages clear. The physical ink on the printed page is source. The reader scans the page, distinguishing tokens of individual letters and symbols from clumps of ink. In the reader's mind, these are assigned the semantics of words, i.e., expressions. When those expressions are evaluated, the value (meaning) of the words arises in the reader's mind. This distinction is subtle in the case of literals. Consider a number written on the page, such as “32”. The curvy pattern of ink is the source. The set of two digits next to each other is the expression. The interpretation of those digits in the reader's mind is the number value. The number value is not something that can be written, because the act of writing it down converts it back into an expression. Plato might say that the literal expression is a shadow on the cave wall of the true value, which we can understand but not directly observe.
3.2 Expressions
A parser converts the token stream into a parse tree of expressions. The legal expressions are described by the
expression domain of the language, which is often specified in BNF. The nodes of a parse tree are instances
of expressions (e.g., a FOR node, a CLASS-DEFINITION node) and their children are the sub-expressions. The
structure of the parse tree visually resembles the indenting in the source code. Figure 3.2 shows a parse tree for
the expressions found in the token stream from figure 3.1.
The Scheme language contains the QUOTE special form for conveniently specifying parse trees directly as
literal values, omitting the need for a tokenizer and parser when writing simple interpreters for languages that
have an abstract syntax. The drawback of this approach is that simply quoting the factorial code in figure 3.1
would not produce the tree in figure 3.2. Instead, the result would be a tree of symbols and numbers without
appropriate expression types labeling the nodes.
3.3 Values
When the program executes (if compiled, or when it is evaluated by an interpreter if not), expressions are
reduced to values. The set of legal values that can exist during execution is called the value domain. The value
domain typically contains all of the first-class values, although some languages have multiple value domains
and restrict what can be done to them. In general, a value is first-class in a language if all of the following hold:
1. The value can be returned from a function
2. The value can be an argument to a function
3. A variable can be bound to the value
4. The value can be stored in a data structure
Java generics (a polymorphism mechanism) do not support primitive types like int, so in some sense those
primitives are second-class in Java and should be specified in a separate domain from Object and its subclasses,
which are first-class. In Scheme and C++, procedures (functions) and methods are first-class because all of the
above properties hold. Java methods are not first-class, so that language contains a Method class that describes
a method and acts as a proxy for it.
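All four properties are easy to demonstrate for procedures in a language where they are first-class; here is a sketch in Python (names invented for illustration):

def compose(f, g):               # 2. a procedure can be an argument (two, even)
    def h(x):
        return f(g(x))
    return h                     # 1. a procedure can be returned

inc = lambda n: n + 1            # 3. a variable can be bound to a procedure
ops = {"double": lambda n: 2 * n}    # 4. a procedure can be stored in a data structure

print(compose(inc, ops["double"])(10))   # prints 21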
5 Syntax
Although we largely focus on semantics, some points about syntax are worth noting. Syntax is classically specified in Backus-Naur Form (BNF), in which nonterminals appear in angle brackets and terminals are quoted. For example, strings of decimal digits are:
⟨digit⟩ ::= ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
⟨digits⟩ ::= ⟨digit⟩ | ⟨digit⟩⟨digits⟩
In this document, these are typeset using an unofficial (but common) variation that distinguishes terminals from nonterminals typographically rather than with quotes and angle brackets. This improves readability for dense expressions. With this convention, digits are:

digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digits ::= digit | digit digits
It is common to extend BNF with regular expression patterns to avoid the need for helper productions. These include the following notation:

( x ) = x; parentheses are for grouping only
[ x ] = zero or one instances of x (i.e., x is optional)
x∗ = zero or more instances of x
x+ = one or more instances of x
boolean ::= #t | #f
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
integer ::= [ + | - ] digit+
rational ::= integer / digit+
decimal ::= [ + | - ] digit∗ . digit+
real ::= integer | rational | decimal
BNF can be applied at both the character level (e.g., to describe a lexer/tokenizer) and the token level (e.g.,
to describe a parser). The preceding example operates on individual characters within a string and is useful to
a tokenizer. An example of a subset of Scheme’s expression domain represented in BNF at the token level is:
variable ::= id
let ::= ( let ( [ id exp ]∗ ) exp )
A form like LET can then be given meaning by a rewrite rule such as

( let ( [ id exp ] ) expbody ) ⇒ ( ( λ ( id ) expbody ) exp )

Here the pattern on the left may be reduced to the simpler form on the right during evaluation. This is an example of a general mechanism for ascribing formal semantics to syntax that is described further in the following section.
6 Semantics
6.1 Operational Semantics
An operational semantics is a mathematical representation of the semantics of a language. It defines how
expressions are reduced to other expressions by applying a set of rules. Equivalently, it gives a set of progress
rules for progressing from complex expressions to simpler ones, and eventually to values. Each rule has
preconditions: to be applied, the rule must match the pattern of the expression. Any rule whose preconditions
are met can be applied. When no rule applies the program halts. If it halts with a value, that is the result of
the computation. If the semantics are self-consistent and rule expansion halts with an expression that is not a
value, that indicates an error in the target program. The nature and location of the error are determined by the
remaining expression and the last rule applied.
Note that the semantics influence but ultimately do not dictate the implementation. For example, an operation that requires O(n²) rule applications may require only O(n) operations on an actual computer. Likewise, eager and lazy substitution are often interchangeable at the semantic level. We say that two implementations are semantically equivalent if they always reduce identical expressions to identical values, even if they use different underlying algorithms to implement the semantics.
The rules are expressed using a notation for reductions, conditional reductions, and substitutions. The most
general form of a rule is:
x ⇒ y (5)
“Expressions matching x reduce to y”
where x is a placeholder in this notation, not a variable in the programming language. These placeholders will
be filled by terms, or by term variables such as expsubscript and valsubscript . Here, the name of the variable
indicates its domain (often, expression or value), and the subscript is a tag to distinguish multiple variables
in the same statement (admittedly a suboptimal notation, since the “name” that carries the meaning for the reader is buried in the subscript). Sometimes rules are written more casually, using the subscripts as the variables and ignoring the domains when the domain is irrelevant.
A specific example of a reduction rule is the additive identity in arithmetic:

numx + 0 ⇒ numx (6)

numx + numy ⇒ numx +̂ numy (7)

The addition sign on the right of reduction (7) indicates actual addition of values; the one on the left denotes syntax for an expression. Variables numx and numy are expressions; the hat on the right side indicates that we mean the value corresponding to that operation.
To make this more concrete, we could give a very specific reduction:
1 + 2 ⇒ 3̂ (8)
Here, 1 and 2 on the left of the arrow are syntax for literals, i.e., they are numerals. The hat on the 3 to the
right of the arrow indicates that it is a value, i.e., it represents an actual number and not just the syntax for a
number. See [Kri07, 231] for further discussion of this hat notation. We will see some cases in which the line
between the expression domain and the value domain is blurry. In those, the hat notation is unnecessary.
Sometimes we have complicated conditions on a rule that constrain the pattern on the left side of ⇒. These are notated with a conditional statement of the form:

a
(9)
b

“If mathematical statement a is true, then statement b is true (applicable).”

In general, both a and b are reductions. Furthermore, there may be multiple conditions in a, notationally separated by spaces, that must all be true for the reduction(s) in b to be applied. These rules are useful for making progress
when no other rule directly applies. For example, to evaluate the mathematical expression 1 + (7 + 2), we must
first resolve (7 + 2). The rule for making progress on addition with a nested subexpression on the right of the
plus sign is:
exp2 ⇒ num2
(10)
exp1 + exp2 ⇒ exp1 + num2
which reads, “if expression #2 can be reduced to some number (by some other rules), then the entire sum can
be reduced to the sum of expression #1 and that number.” We of course need some way of reducing expressions
on the left of the plus sign as well:
exp1 ⇒ num1
(11)
exp1 + exp2 ⇒ num1 + exp2
Applying combinations of these rules allows us to simplify arbitrarily nested additions to simple number addi-
tions.
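To make rules 10 and 11 concrete, here is a small substitution-style reducer, a Python sketch with invented representations (not part of the notes): an expression is either a number, which is a value, or a tuple ('+', exp1, exp2).

def step(exp):
    # Apply one reduction rule to a '+' expression and return the result.
    op, e1, e2 = exp
    if isinstance(e1, tuple):
        return (op, step(e1), e2)     # rule 11: make progress on the left
    if isinstance(e2, tuple):
        return (op, e1, step(e2))     # rule 10: make progress on the right
    return e1 + e2                    # both sides are numbers: actually add

def reduce_to_value(exp):
    while isinstance(exp, tuple):     # halt when no rule applies
        exp = step(exp)
    return exp

print(reduce_to_value(('+', 1, ('+', 7, 2))))   # prints 10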
There are two major interpreter designs: substitution and evaluation interpreters. Substitution interpreters
transform expressions to expressions and terminate when the final expression is a literal. Their value domain is
their literal domain. Evaluation/environment interpreters are implemented with an EVAL procedure that reduces
expressions directly to values. Such a procedure recursively invokes itself, but program execution involves a
single top-level EVAL call. These two models of implementation correspond to two styles of semantics: small-step operational semantics, which reduce expressions to expressions, and big-step operational semantics, which reduce expressions directly to values.
The following two subsections demonstrate how semantics are assigned in each style. We use the context of a simple language that contains only single-argument procedure definition, variables, conditionals, application, and booleans. Let these have the semantics of the equivalent forms in the Scheme language:
exp ::= ( λ ( id ) exp ) | id | ( if exp exp exp ) | ( exp exp ) | true | false (12)
E-IfTrue: ( if ok expthen expelse ) ⇒ expthen (13)
E-IfFalse: ( if false expthen expelse ) ⇒ expelse (14)

Here ok stands for any value other than false. Had Scheme's semantics dictated that the test expression must be a boolean, the first rule would have replaced ok with true, which is likely what you first expected to see there. However, recall that Scheme treats any non-false value as true for the purpose of an IF expression.
A perhaps less obvious rule, E-IfProgress, is required as well to complete the semantics of IF. When the test
expression for the conditional is not in the value domain, we need some way of making progress on reducing
the expression.
exptest ⇒ valtest
E-IfProgress: (16)
( if exptest expthen expelse ) ⇒ ( if valtest expthen expelse )
Application requires a notation for expressing variable substitution:

[ id ← v ] expbody (17)

expbody is the expression in which all instances of the variable named id are to be replaced with the expression v. In an eager language, the v expression must be in the value domain. In a lazy language, v can be any expression. For semantic purposes the distinction between eager and lazy is irrelevant in a language without mutation, and small-step semantics are almost always restricted to languages without mutation.
Using this notation we can express application as a reduction. If we choose lazy evaluation, it is:
E-Applazy : ( ( λ ( id ) expbody ) exparg ) ⇒ [ id ← exparg ] expbody (18)
and introduce another progress rule (we still require rule E-AppProgress1):
exp2 ⇒ val2
E-AppProgress2eager : (21)
( exp1 exp2 ) ⇒ ( exp1 val2 )
6.2.2 A Proof
Because each rule is a mathematical statement, we can prove that a complex expression reduces to a simple
value by listing the rules that apply. That is, by giving a list of true statements that reduce the expression to
a specific value. Operational semantics are used this way, but they are more often used as a rigorous way of
specifying what an interpreter should do. The reason for exploring this proof structure is that we will later use
the same structure on type judgements (another kind of rule set) to prove that an expression has some interesting
property, rather than a specific value.
We list the statements in the order they are applied. When reaching a conditional we must prove its antecedents. To do this, we replace each antecedent with its own proof; that is, we start nesting the conditional statements until all are satisfied.
Proof.

((λ (x) x) true) ⇒ [ x ← true ] x ⇒ true (E-App, then substitution)
(E-IfTrue)
(if ((λ (x) x) true) false true) ⇒ false
6.3 Big-Step Operational Semantics
Big-step semantics reduce expressions directly to values, so the value domain is now distinct from the expression domain:

val ::= tˆ | fˆ | ⟨ id, exp, E ⟩ (22)
The angle brackets in the procedure notation ⟨ id, exp, E ⟩ denote a tuple, in this case a 3-tuple. That is simply a mathematical vector of values, or equivalently, a list in an interpreter implementation. The specific tuple we're defining here is a 1-argument closure value, which has the expected three parts: formal parameter identifier, body expression, and captured environment. Big-step operational semantics use environments in the same way that interpreters do for representing deferred substitutions. The environment is placed on the left side of the progress arrow and separated from the expression by a comma.
Now that the value domain is disjoint from the expression domain, we need some trivial rules for reducing
literal and LAMBDA expressions to their corresponding values.
Forget all of the previous small-step rules. We begin big step with:
E-True: true , E ⇒ tˆ
E-False: false , E ⇒ fˆ
E-Lambda: ( λ ( id ) exp ) , E ⇒ ⟨ id, exp, E ⟩

Note that LAMBDA captures the identifier and environment, and saves and delays the expression, and that the literals ignore the environment in which they are reduced.
The rules for evaluating IF expressions are largely the same as for small step, but we can ignore the progress
rules because they are implicit in the big steps:
¬ expt , E ⇒ fˆ expc , E ⇒ valc
E-IfOk:
( if expt expc expa ) , E ⇒ valc
Each reduction, whether a conditional or a simple a ⇒ b, is a mathematical statement. Statements are either
true or false7 in the mathematical sense. Each progress rule is really a step within a proof whose theorem is
7 ...although the function to evaluate the truth value of a statement may be non-computable, as in the case of the Halting
Problem.
“this program reduces to (whatever the final value is).” The ¬ operator negates the truth value of a statement.
Thus the E-IfOk rule has as an antecedent “the test expression does not reduce to fˆ ”. We need to express
it this way because our desired semantics follow Scheme’s, which allow any value other than fˆ to act like
“true” for the purpose of an IF expression.
To express the semantics of APP we need a notation for extending an environment with a new binding. Since
environments and substitutions are not used simultaneously, the substitution notation is repurposed to denote
extending an environment:

E [ id ↦ val ]
Note that the arrow inside the brackets points in the opposite direction as for substitution, following the con-
vention of [Kri07, 223]. The application rule for our toy language under big step semantics is:
expp , E1 ⇒ ⟨ id, expb , Eb ⟩    expa , E1 ⇒ vala    expb , Eb [ id ↦ vala ] ⇒ valb
E-App: (26)
( expp expa ) , E1 ⇒ valb
This reads, “if expression expp reduces to a procedure in the current environment (E1), and argument expression expa reduces to a value in the current environment, and the body of that procedure, evaluated in the procedure's stored environment (Eb) extended with id bound to the argument's value, reduces to value valb, then the application of the procedure expression to the argument expression in the current environment is valb.”
Observe that the body expression is evaluated in the stored environment extended with the new binding,
creating lexical scoping. Were we to accidentally use the current environment there we would have created
dynamic scope. We conclude with the trivial variable rule:
E-Var: id , E [ id ↦ val ] ⇒ val (27)
Examining our big-step rules, we see that they map one-to-one to the implementation of an interpreter for the language. Each rule is one case inside the EVAL procedure, or inside the parser for forms that we choose to rewrite at parse time. Within a rule, each antecedent corresponds to one recursive call to EVAL. For example, in the E-App rule there are three antecedents. These correspond to the recursive calls that evaluate the first subexpression (which should evaluate to a procedure), the second (i.e., the argument), and then the body. That last call to evaluate the body is usually buried inside APPLY. The environments specified on the left sides of the antecedents tell us which environments to pass to EVAL.
See [Kri07] chapter 23 for an excellent set of examples of progress rules for big-step operational semantics.
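To see the correspondence concretely, here is a minimal EVAL in Python for the toy language. This is a sketch with invented representations: expressions are nested tuples, identifiers are strings, environments are dictionaries, and closures are the ⟨id, exp, E⟩ triples.

def EVAL(exp, env):
    if exp is True or exp is False:        # E-True and E-False
        return exp
    if isinstance(exp, str):               # E-Var: look the id up in E
        return env[exp]
    if exp[0] == 'lambda':                 # E-Lambda: capture id, body, and E
        _, ident, body = exp
        return (ident, body, env)
    if exp[0] == 'if':                     # E-IfOk: any non-false test value
        _, test, conseq, alt = exp
        return EVAL(conseq if EVAL(test, env) is not False else alt, env)
    proc, arg = exp                        # E-App: three recursive calls...
    ident, body, closure_env = EVAL(proc, env)       # ...the procedure,
    val = EVAL(arg, env)                             # ...the argument,
    return EVAL(body, dict(closure_env, **{ident: val}))  # ...and the body,
                                           # in the stored E extended with id

# The expression from the proofs: (if ((λ (x) x) true) false true)
print(EVAL(('if', (('lambda', 'x', 'x'), True), False, True), {}))  # False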
6.3.2 A Proof
This is a proof of the same statement from the small-step example, now proven with the big-step semantics.
Because the nesting gets too deep to fit the page width, I created a separate lemma for the conditional portion
of the proof.
Lemma 1 (by E-Lambda, E-True, E-Var, and E-App):

(λ (x) x) , E ⇒ ⟨ x, x, E ⟩    true , E ⇒ tˆ    x , E [ x ↦ tˆ ] ⇒ tˆ
(E-App)
((λ (x) x) true) , E ⇒ tˆ

Proof.
1. Because ((λ (x) x) true) , E ⇒ tˆ (Lemma 1), and tˆ ≠ fˆ, the negation, ¬( ((λ (x) x) true) , E ⇒ fˆ ), is also true.
2.
¬( ((λ (x) x) true) , E ⇒ fˆ )    false , E ⇒ fˆ
(E-IfOk)
(if ((λ (x) x) true) false true) , E ⇒ fˆ
7 The λ Calculus
The λ calculus is Church’s [Chu32] minimalist functional model of computation. Church showed that all
other programming constructs can be eliminated by reducing them to single-argument procedure definition
(i.e., abstraction; lambda), variables, and procedure application. Variations of λ calculus are heavily used in
programming language research as a vehicle for proofs. Outside research, there are several motivations for
studying λ calculus and reductions to it from more complex languages.
Philosophically, λ calculus is the foundation for our understanding of computation and highlights the power
of abstraction. Practically, understanding the language and how to reduce others to it changes the way that one
thinks about (and applies) constructs in other languages. This leads the way to emulating constructs that are
missing in a language at hand, which makes for a better programmer. For example, Java lacks lambda. The
Java API designers quickly learned to use anonymous classes to create anonymous closures, enabling the use
of first-class function-like objects in a language that does not support functions. C++ programmers discovered
a way to use the polymorphic mechanism of templates as a complete macro language.
On learning a new language, the sophisticated programmer does not learn the specific forms of that language
blindly but instead asks, “which forms create closures, recursive bindings, iteration, etc. in this language?”.
If any of the desired features are missing, that programmer then emulates them, using techniques learned by
emulating complex features in the minimalist λ calculus. So, although implementing Church Booleans is just
an academic puzzle for most programmers, that kind of thought process is valuable in implementing practical
applications.
André van Meulebrouck describes an alternative motivation:
“Perhaps you might think of Alonzo Church’s λ -calculus (and numerals) as impractical mental
gymnastics, but consider: many times in the past, seemingly impractical theories became the under-
pinnings of future technologies (for instance: Boolean Algebra [i.e., today’s computers that operate
in binary build massive abstractions using only Boole’s theoretical logic!]).
Perhaps the reader can imagine a future much brighter and more enlightened than today. For in-
stance, imagine computer architectures that run combinators or λ -calculus as their machine instruc-
tion sets.”
7.1 Syntax
The λ calculus is a language with surprisingly few primitives in the expression domain10 :
var ::= id
abs ::= λ id . exp
app ::= exp exp
exp ::= var | abs | app | ( exp )
The last expression on the right simply states that parentheses may be used for grouping.
The language contains a single value type, the single-argument procedure, in the value domain. In set notation this is:

val = proc = var × exp

and in BNF:

val ::= λ id . exp
The abbreviated names used here and in the following discussions are mnemonics for: ‘id’ = ‘identifier’,
‘abs’ = ‘abstraction’ (since λ creates a procedure, which is an abstraction of computation), ‘app’ = ‘procedure
application’, ‘exp’ = ‘expression’, ‘proc’ = ‘procedure’, and ‘val’ = ‘value’.
7.2 Semantics
The formal semantics are simply those of substitution [Pie02, 72]:
App-Part 1: (reduce the procedure expression towards a value)
expp ⇒ exp′p
(28)
expp expa ⇒ exp′p expa
App-Part 2: (reduce the argument expression towards a value)

expa ⇒ exp′a
(29)
expp expa ⇒ expp exp′a

App-Abs: (apply the procedure by substituting the argument into the body)

( λ id . expbody ) vala ⇒ [ id ← vala ] expbody (30)

The App-Abs rule relies on the same syntax for the val value and the abs expression, which is fine in λ calculus because we're using pure textual substitution. In the context of a true value domain that is distinct from the expression domain, we could express it as an abs evaluation rule for reducing a procedure expression to a procedure value and an application rule written something like:
valp = λ id . expbody
(31)
valp vala ⇒ [ id ← vala ] expbody
7.3 Examples
For the sake of giving simple examples, momentarily expand λ calculus with the usual infix arithmetic opera-
tions and integers. Consider the evaluation of the following expression:
( λ x . (x + 3) ) 7 (32)
10 This is specifically a definition of the untyped λ -calculus.
This is an app expression. The left sub-expression is an abs; the right sub-expression is an integer (7). Rule App-Abs is the only rule that applies here, since neither the procedure nor the integer can be reduced further. App-Abs replaces all instances of x in the body expression (x + 3) with the right argument expression, 7:
( λ x . (x + 3) ) 7 (33)
⇒ [ x ← 7 ] (x + 3) (34)
= (7 + 3) (35)
= 10 (36)

Arithmetic then reduces this to 10.
Now consider using a procedure as an argument:
(λ f . ( f 3)) (λ x.(1 + x)) (37)
This is again an app. In this case, both the left and right are abs expressions. Applying rule App-Abs substitutes
the right expression for f in the body of the left procedure, and then we apply App-Abs again:
(λ f . ( f 3)) (λ x . (1 + x)) (38)
⇒ [ f ← (λ x . (1 + x)) ] ( f 3) (39)
= ( (λ x . (1 + x)) 3 ) (40)
⇒ [ x ← 3 ] (1 + x) (41)
= (1 + 3) (42)
= 4 (43)
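Both reductions can be checked directly in any language with first-class procedures; for example, in Python:

print((lambda x: x + 3)(7))                  # expression (32): prints 10
print((lambda f: f(3))(lambda x: 1 + x))     # expression (37): prints 4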
λ f . λ n.(if (iszero n)
1
(n ∗ ( f (n − 1))))
This just says that if we already had the factorial function that operated on values less than n, we could imple-
ment the factorial function for n. That’s close to the idea of recursion, but is not fully recursive because we’ve
only implemented one inductive step. We need to handle all larger values. To let this inductive step run further,
say that f is the function that we’re defining, which means that the inner call requires two arguments: f and
n − 1:
λ f .λ n.(if (iszero n)
1
(n ∗ ( f f (n − 1))))
Call this entire function g. It is a function that, given a factorial function, creates the factorial function. At first
this does not sound useful: if we had the factorial function, we wouldn't need to write it! However, consider that we have defined g such that (g f ) = f ... in other words, the factorial function is the fixed point of g.
For this particular function, we can find the fixed point by binding it and then calling it on itself. Binding and
applying values are accomplished using abs and app expressions. An expression of the form:
(λ z.z z) g ⇒ g g (52)
applies g to itself. Wrapping our entire definition with this:
(λ z.z z)
(λ f .λ n.(if (iszero n)
1
(n ∗ ( f f (n − 1)))))
produces a function that is indeed factorial, albeit written in a strange manner. Convince yourself of this by
running it in Scheme, using the following translation and application to 4:
(
((lambda (z) (z z))
(lambda (f)
(lambda (n)
(if (zero? n)
1
(* n ((f f) (- n 1)))))))
4)
Y = λ f.
((λ z . z z)
(λ x . f (λ y . x x y))) (53)
When applied to a generator function, Y finds its fixed point and produces that recursive function. The Y
combinator is formulated so that the generator need not apply its argument to itself. That is, the step where we
rewrote ( f (n − 1)) as ( f f (n − 1)) in our derivation is no longer necessary.
A Scheme implementation of Y and its use to compute factorial are below. The use of the DEFINE state-
ment is merely to make the implementation more readable. Those can be reduced to LAMBDA and application
expressions.
; Creates the fixed point of its argument
(define Y
(lambda (f)
((lambda (z) (z z))
(lambda (x) (f (lambda (y) ((x x) y)))))))
; The generator from which Y creates factorial
(define generator
  (lambda (f)
    (lambda (n)
      (if (zero? n)
          1
          (* n (f (- n 1)))))))

; Example: prints 24
((Y generator) 4)
8 Macros
We've seen that a parser can reduce macro-expressive forms to other forms to minimize the number of cases that need to be handled in the compiler/interpreter. For example, a short-circuiting OR expression like the one in Scheme can be reduced within the parser by the small-step rule:

( or exp1 exp2 ) ⇒ ( let ( [ tmp exp1 ] ) ( if tmp tmp exp2 ) )

A macro system exposes this rewriting mechanism to the programmer, operating at a metalevel: if the core language is at level 1, macros are at level 2, and so on. Not all macro systems are hygienic. While Scheme's macro system is (since R5RS) and is generally considered both clean and powerful, the most frequently used macro system, that of C/C++, is not. This does not mean that C macros are useless, just that extreme care must be taken when using them.
8.1 C Macros
The C macro system contains two kinds of statements: #if and #define. Without defining its semantics here,
an example12 illustrates how they are typically employed:
#include <stdio.h>

#if defined(_MSC_VER)
    // Windows
#   define BREAK ::DebugBreak();
#elif defined(__i386__) && defined(__GNUC__)
    // gcc on some Intel processor
#   define BREAK __asm__ __volatile__ ("int $3");
#else
    // Hopefully, some other gcc
#   define BREAK ::abort()
#endif

#define ASSERT(testexpr, message) \
    if (!(testexpr)) { \
        printf("%s\nAssertion \"%s\" failed in %s at line %d.\n", \
               message, #testexpr, __FILE__, __LINE__); \
        BREAK; \
    }

int main(int argc, char** argv) {
    int x = 3;
    ASSERT(x > 4, "Something bad");
    return 0;
}
The #if statements are used to determine, based on expressions including variables such as _MSC_VER that are defined at compile time, what compiler and operating system the code is being compiled for. They allow the
program to function differently on different machines without the expense of a run-time check. Furthermore,
certain statements that are not legal on one compiler can be avoided entirely, such as the inline assembly syntax
used for gcc. The #define statement creates a new macro. By convention, macros are given all uppercase
names in C. Here, two macros are defined: BREAK, which halts execution of the program when the code it emits
is invoked, and ASSERT, which conditionally halts execution if a test expression returns false. ASSERT cannot
be a procedure for two reasons: first, it would be nice to define it so that in an optimized build the assertions
are removed (not shown here), and second, because we do not want to evaluate the message expression if the
test passes.
Within the body of the ASSERT definition we see several techniques that are typical of macro usage. The special variables __FILE__ and __LINE__ indicate the location at which the macro was invoked. Unlike procedures in most languages, macros have access to the source code context from which they were invoked. This allows them to customize error reporting behavior. The expression #testexpr applies the # operator to the testexpr macro variable. This operator quotes the source code, converting it from code into a string that may then be printed. Procedures have no way of accessing the expressions that produced their arguments, let alone the source code for those expressions. Note that where it is used as the conditional for if, testexpr is wrapped in parentheses. This is necessary because C macros operate at a pre-parse (in fact, pre-tokenizer!)
is wrapped in parentheses. This is necessary because C macros operate at a pre-parse (in fact, pre-tokenizer!)
level, unlike Scheme macros. Without these extra parentheses, the application of the not operator (!) might
be parsed differently depending on the operator precedence of other operators inside the expression. This is
12 Adapted from the G3D source code, http://g3d-cpp.sf.net.
generally considered a poor design decision of the C language, not a feature, although it can be exploited in
useful ways to create tokens at compile time.
C’s macro system is practical, though ugly. The C++ language addresses many of its shortcomings by intro-
ducing two other language features for creating new syntax: templates and operator overloading. Templates
were originally introduced as a straightforward polymorphic type mechanism, but have since been exploited by
programmers as a general metaprogramming mechanism that has Turing-equivalent computational power.
9 Type Systems
9.1 Types
A type is any property of a program that we might like to check, regardless of whether it is computable.
Examples of such properties are “Function f returns an even integer”, “Program P halts on input x”, and
“Program P never dereferences a NULL pointer”. We will restrict ourselves to properties of values, which
is what is commonly meant by programmers when they refer to “types”. In this context, a type is any of the
following equivalent representations:
• Set of values: x = {0, 1, 2, ...}
• Specific subset of the value domain: uint = { x ∈ num ⊆ val | x ≥ 0 }
• Condition: x ∈ num and x ≥ 0
• Predicate: (define (uint? x) (and (number? x) (>= x 0)))
• Precise description: “non-negative integers”
Values have types in all programming languages. This is necessary to define the semantics of operations on
the values. In some implementations of some languages, types are explicitly stored with the values at runtime.
This is the case for Java Objects and Python and Scheme values, as well as C++ classes under most compilers.
In other implementations and languages, types are implicit in the code produced by a compiler. This is the
case in C. For an implementation to avoid storing explicit type values at runtime, it must ascribe static types
to variables, meaning that each variable can be bound to values of a single type that is determined at compile
time. Languages that retain run-time type information may have either static or dynamic types for variables.
Dynamically typed variables can be bound to any type of value. Few languages allow the type of a value
to change at runtime. One exception is C++, where the reinterpret_cast operator allows the program to reinterpret the bits stored in memory for a value as if they were a value of another type.
A type system restricts the language by restricting the kinds of expressions that are legal, beyond the restrictions imposed by the grammar itself. This is desirable because a well-defined type system generally prohibits programs that would have been incorrect anyway. Of course, we always run the risk that some correct programs will be excluded by this choice. Furthermore, when we limit the type system in order to make type checking decidable, we lose the ability to eliminate all incorrect programs. Therefore, a decidable type system can only guarantee that a program is valid, not that it is correct. As a running example, consider a small language of boolean and numeric literals and a few compound forms:
bool-lit ::= true | false
lit ::= num-lit | bool-lit
and ::= ( and exp exp )
bool ::= tˆ | fˆ
val ::= num | bool (55)
We could use the operational semantics notation to express type judgements, e.g.,
exp1 , E ⇒ bool exp2 , E ⇒ bool
T-and: (56)
( and exp1 exp2 ) , E ⇒ bool
However, there would be two drawbacks to that approach. First, it looks like an operational semantics, so we
might be confused as to what we were seeing. Second, environments don’t exist during static evaluation (i.e.,
at “compile time”), so E is not useful. That is, environments store values that we’ve already discovered, but for
type checking we aren’t discovering values. We need a different kind of environment for type-checking: one
that stores types we’ve already discovered. Therefore, type judgements use a slightly different notation than
operational semantics. The above statement is written as:
Γ ⊢ exp1 : bool    Γ ⊢ exp2 : bool
T-and: (57)
Γ ⊢ ( and exp1 exp2 ) : bool
This reads, “if gamma proves that expression 1 has type bool and gamma proves that expression 2 has
type bool, then gamma proves that (and expression 1 expression 2) has type bool”. Gamma (Γ) is a variable
representing a type environment. Just as the root value environment contains the library functions, the root
type environment contains the types of literals and the library functions. It is then extended with other types
that are discovered. The turnstile13 operator “⊢” is the part that reads “proves”. It replaces the comma from operational semantics notation. The colon reads “has type”; it replaces the arrow from operational semantics (except for the colon right next to “T-and”; that's just a natural-language colon indicating the name of the judgement). The T-plus and T-equal judgements are similar to T-and, but with the appropriate types inserted.
We need a set of trivial judgements for the literals:
T-num: num-lit : num
T-bool: bool-lit : bool (58)
13 \vdash in LaTeX
All that remains is IF. The type of a conditional depends on what the types of the consequent and alternate
expressions are. Since we don’t statically know which branch will be taken, we are unable to ascribe a single
type to an IF expression if it allows those expressions to have different types as Scheme does. This will limit
the number of invalid programs that we can detect. So let us choose to define our language in the same manner
as C and Java, where both branches of the IF must have the same type, but there is no restriction on what that
type must be:
Γ ⊢ expt : bool    Γ ⊢ expc : τ    Γ ⊢ expa : τ
T-if: (59)
Γ ⊢ ( if expt expc expa ) : τ
Given type judgements for all of the expressions in our language, we can now discover/prove the type of any expression. For example, the following is a proof that (if (and true false) 1 4) has type num:

1. Γ ⊢ true : bool and Γ ⊢ false : bool (T-bool), so Γ ⊢ ( and true false ) : bool (T-and).
2. Γ ⊢ 1 : num and Γ ⊢ 4 : num (T-num).
3. Therefore Γ ⊢ ( if ( and true false ) 1 4 ) : num (T-if).
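A checker that generates such proofs mechanically is a few lines of Python. This is a sketch with invented representations (booleans and ints as literals, other expressions as tuples); each case implements one judgement:

def typecheck(exp):
    if isinstance(exp, bool):           # T-bool (checked before int, since
        return 'bool'                   # Python bools are also ints)
    if isinstance(exp, int):            # T-num
        return 'num'
    if exp[0] == 'and':                 # T-and
        _, e1, e2 = exp
        if typecheck(e1) == 'bool' and typecheck(e2) == 'bool':
            return 'bool'
        raise TypeError('and requires bool operands')
    if exp[0] == 'if':                  # T-if: both branches must agree
        _, test, conseq, alt = exp
        t = typecheck(conseq)
        if typecheck(test) == 'bool' and typecheck(alt) == t:
            return t
        raise TypeError('ill-typed if')
    raise TypeError('unknown expression')

print(typecheck(('if', ('and', True, False), 1, 4)))   # prints num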
A procedure type is written ( a1 × . . . × an → r ). Here, “×” denotes the Cartesian set product, the set of all possible tuples of values drawn from its operand sets, and the arrow indicates that this is a procedure mapping the argument types to the return type.
The application rule is straightforward. For one argument it is:
Γ ⊢ expp : ( a → r )    Γ ⊢ expa : a
T-app: (62)
Γ ⊢ ( expp expa ) : r
It is difficult to write a type judgement for LAMBDA without additional information. Some languages, like
ML, tackle that challenge using a process called type inference. For the moment, let us avoid the problem and
require the programmer to specify additional information in the form of an annotation. This is the strategy
taken by languages like Java and C++. For our annotations, change the syntax of LAMBDA to be:

( λ ( id1 : a1 . . . idn : an ) : r expbody )
Note that this is essentially the same as C++/Java syntax, except that we write types after identifiers instead of before them. Now we can express a type judgement for procedure creation:

Γ [ id1 ↦ a1 ] . . . [ idn ↦ an ] ⊢ expbody : r
T-lambda:
Γ ⊢ ( λ ( id1 : a1 . . . idn : an ) : r expbody ) : ( a1 × . . . × an → r )

This reads, “If gamma extended such that id1 has type a1, and so on for all other identifiers, proves that the body expression has type r, then gamma proves that the procedure has type (a1 × . . . × an → r)”.
T-app and T-lambda each check one half of the type contract, and together ensure that procedures are cor-
rectly typed:
• “When typing the function declaration, we assume the argument will have the right type and
guarantee that the body, or result, will have the promised type.
• When typing a function application, we guarantee the argument has the type that the function
demands, and assume the result will have the type the function promises.” [Kri07, 245]
Observe that, because we already have a mechanism for determining the type of an arbitrary expression, the return type annotations are spurious.
From a theoretical perspective, type annotations are merely a crutch for the type checker and would not
be required in any language. From the perspective of a practical programmer, they are a feature and not a
drawback. The type annotations are a form of documentation. Many programmers feel that they are likely to
get the annotations right and instead make mistakes in the implementation of a procedure. If this is true, then
a system that infers procedure types will tend to generate misleading error messages that blame the application
and not the procedure when the argument types do not match due to an error in the procedure.
9.5 Inference
Type inference solves the problem of typing LAMBDA and APP without requiring explicit annotations on
LAMBDA. A Hindley-Milner-Damas type system operates by discovering constraints on the types of expressions
and then unifying those constraints. This process is used by languages such as ML and Haskell.
The unification process corresponds to running a substitution interpreter in a metalanguage where the values
are types. Say we have the following language for our programs:
bool-lit ::= true | false
num-lit ::= . . . | -1 | 0 | 1 | . . .
lit ::= num-lit | bool-lit
app ::= ( exp exp )
bool ::= tˆ | fˆ
tlit ::= numˆ | boolˆ
tvar ::= exp
tproc ::= texp → texp
texp ::= tlit | tvar | tproc
Note that the metalanguage isn’t quite so clear about distinguishing its expression and value domains. That’s
because we’re going to write a substitution interpreter (i.e., no mutation!) for the metalanguage, and for a
substitution interpreter the expression/value distinction is less significant. tval is essentially the same as texp,
except it requires that there be no variables nested anywhere in a type.
A type inference system assigns types to all bound program variables. In the process of doing so, it also finds
the types of all other expressions in the same way as a type checker. Thus it checks types while discovering
them.
Within the inference system, the type variables (tvar) are program expressions (exp), because the type vari-
ables are what we want to find the type values of, and those are expressions. For example, we might describe
the program:
((λ (x) x) 7) (70)

as having the following type variables:

t1 = [[ ((λ (x) x) 7) ]] (71)
t2 = [[ (λ (x) x) ]] (72)
t3 = [[ 7 ]] (73)
t4 = [[ x ]] (74)

where [[ ... ]] means “the type of” the enclosed expression. In solving for these variables, we encounter the following constraints, from the application, the lambda, and the literal expressions:
[[ (λ (x) x) ]] = [[ 7 ]] → [[ ((λ (x) x) 7) ]] (75)
[[ (λ (x) x) ]] = [[ x ]] → [[ x ]] (76)
[[ 7 ]] = numˆ (77)

Writing these in terms of the type variables, we have:

t2 = t3 → t1
t2 = t4 → t4
t3 = numˆ (78)
This is a set of simultaneous constraints, which we can solve by substitution (much like solving simultaneous
equations in algebra!) A substitution type interpreter solves for their values. This type interpreter maintains
a stack of constraints and a type environment (a.k.a. a substitution). These are analogous to the stack of
expressions and the environments in a program interpreter. The type environment initially contains the types of
the built-in procedures. The stack initially contains the constraints obtained directly from the program. While
the stack is not empty, the interpreter pops a constraint of the form X = Y off the top, where X and Y are both
texp, and processes that constraint as follows (a code sketch appears after the list):
• If X ∈ tvar and X = Y, do nothing. This constraint just said a type variable was equal to itself.
• If X ∈ tvar, replace all occurrences of X with Y in the stack and the environment, and extend the environment with [ X ↦ Y ] (N.B. that arrow denotes the binding of a type variable, not a procedure type). We have found an expansion of X, so we want to immediately expand it everywhere we encountered it and save the expansion for future use in the environment.
• If Y ∈ tvar, replace all occurrences of Y with X in the stack and the environment, and extend the environment with [ Y ↦ X ].
• If X and Y are both in tproc, such that X = Xa → Xr and Y = Ya → Yr, push Xa = Ya and Xr = Yr onto the stack. These are both procedure type expressions, so their argument and return types must match each other. This is the type inference system's analog of APPLY.
• In any other case, the constraint X = Y is unsatisfiable, indicating an error in the target program. Report
this error and abort unification.
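Here is that loop as a Python sketch, with representations invented for illustration: literal types are the strings 'num' and 'bool', a type variable is ('var', n), and a procedure type is ('proc', arg, ret).

def subst(t, var, rep):
    # Replace every occurrence of type variable var in t with rep.
    if t == var:
        return rep
    if isinstance(t, tuple) and t[0] == 'proc':
        return ('proc', subst(t[1], var, rep), subst(t[2], var, rep))
    return t

def unify(stack):
    env = {}                                   # the type environment
    while stack:
        X, Y = stack.pop()
        if X == Y:
            continue                           # e.g., a variable equals itself
        if isinstance(X, tuple) and X[0] == 'var':
            stack = [(subst(a, X, Y), subst(b, X, Y)) for a, b in stack]
            env = {k: subst(v, X, Y) for k, v in env.items()}
            env[X] = Y                         # record the expansion [X -> Y]
        elif isinstance(Y, tuple) and Y[0] == 'var':
            stack.append((Y, X))               # handle symmetrically by swapping
        elif isinstance(X, tuple) and isinstance(Y, tuple) \
                and X[0] == 'proc' and Y[0] == 'proc':
            stack.append((X[1], Y[1]))         # argument types must match
            stack.append((X[2], Y[2]))         # return types must match
        else:
            raise TypeError('cannot unify %r with %r' % (X, Y))
    return env

# The constraints for ((lambda (x) x) 7): t2 = t3 -> t1, t2 = t4 -> t4, t3 = num
t1, t2, t3, t4 = ('var', 1), ('var', 2), ('var', 3), ('var', 4)
print(unify([(t2, ('proc', t3, t1)), (t2, ('proc', t4, t4)), (t3, 'num')]))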
Constraint solving is a general problem in computer science. See the union-find algorithm for a generalized discussion of the problem and a popular approach.
9.6 Subtypes
There are many ways of defining types; therefore there are many ways of defining a subtype t′ of t, which is written t′ <: t or t′ ⊆ t. Here are a few:
• Informally, t′ <: t if you can use an instance of t′ anywhere that you can use an instance of t.
• Considering types as sets of values, the subset notation holds literally: t′ is a subtype of t if it is also a subset of t.
• By the Liskov substitution principle, if q(x) is a property provable about objects x of type t, then q(y) should be true for objects y of type t′, where t′ <: t [Lis87].
• If t is defined by a predicate function q(x) that returns true for x ∈ t, then q(y) must also return true for every y ∈ t′.
• t′ <: t if t′ is a sort of t.
Whichever definition is used, the subsumption rule allows an expression of a subtype to be used at the supertype:

Γ ⊢ exp : t′    t′ <: t
T-sub: (79)
Γ ⊢ exp : t
10 Memory Management
Early languages had no memory management, e.g., FORTRAN required all arrays to have static size and pro-
hibited recursive procedure calls. Later languages (notably first Algol, and soon thereafter, LISP) split memory
into a dynamically controlled stack and heap. The stack grows and shrinks (...like a “stack” data structure)
primarily in response to procedure calls. Values on the stack are no longer available after that stack frame is
popped, e.g., when the procedure that created the frame returns. The heap is a giant block of random access
memory (...unfortunately, unlike a “heap” data structure). Values on the heap are available until deallocated.
With the introduction of the heap, a language needs a process for managing heap memory to ensure that
programs don’t ever allocate the same block of memory twice, and reclaim memory once it is no longer in
use. Let a block of memory be a contiguous heap region, which can be described by a starting address and
a size (e.g., in bytes). Let the free list and the allocated list each be lists of block descriptors that have been, respectively, not allocated (i.e., are free) or allocated. Assume an implementation of these as doubly-linked lists.
A manual memory management scheme provides the programmer with explicit procedures for allocating and
deallocating heap memory. These might look like:
def alloc(numBytes):
    global freeList, allocList
    cur = freeList.firstNode()
    while cur != None:
        if cur.size >= numBytes:
            block = BlockDescriptor(cur.start, numBytes)
            if cur.size > numBytes:
                # Shrink the remaining block
                cur.size -= numBytes
                cur.start += numBytes
            else:
                # We used up the whole block
                freeList.remove(cur)
            allocList.insert(block)
            # Return the new block's descriptor
            return block
        # Advance to the next free block (implicit in the original sketch)
        cur = cur.next()
    # We're out of memory!
    return None

def free(block):
    global freeList, allocList
    allocList.remove(block)
    freeList.insert(block)
Many languages don’t actually give you a block descriptor back from alloc; they return only the start address.
In order to recover the block size, they often stash it (by convention) in the bytes right before that start address.
In C, malloc and free implement manual memory management, e.g.:
MyObj* ptr = (MyObj*)malloc(sizeof(MyObj));
...
free(ptr);
You can see why the implementation of memory allocation might be somewhat slow–the functions have to
walk a linked list that initially contains one element, but increasingly becomes filled with lots of small block
descriptors, and the most straightforward algorithm will not guarantee that sequentially allocated blocks are
near enough each other in memory for hardware caches to function effectively.
Some problems with manual memory management:
1. Object initialization: type safety cannot be maintained if the program is free to treat objects and blocks of memory as interchangeable.
2. Dangling pointers: if you hang onto an address of a block of freed memory and later try to use it, the result is undefined! That memory might have been allocated to some other object in the meantime.
3. Memory leaks: if you forget to free blocks of memory, eventually the program will run out of memory and be unable to continue (in practice, it will start swapping to disk and become unusably slow before this occurs).
4. Memory management is boilerplate that clutters the program without adding algorithmic insight.
4. Memory management is boilerplate that clutters the program, without adding algorithmic insight.
In C++, new and delete allow for proper initialization of heap-allocated objects. new allocates memory for an object and then invokes its constructor; delete invokes the destructor and then frees the memory. The destructor typically invokes delete recursively on all objects referenced by the one undergoing destruction. As a result, most C++ programmers will try to allocate objects on the stack whenever possible and create chains of destructors that automate much of the heap memory management. This addresses the first problem. But can we do better, building this type-safety automation right into the language and addressing the other drawbacks of manual memory management?
The allocation part is easy–values have to be explicitly created, so their allocation is necessarily part of the
program. But when is the right time to free a block of memory? When the values in it will never be used again
by the program. Unfortunately,
the function that determines whether a value will be used in the future is noncomputable
...it is as hard as evaluating the rest of the program and equivalent to the halting function. So we can at best
conservatively approximate this function. One good approximation is that if no other variable is pointing at a
value in the heap, then we know it can never be used again and can be freed. Algorithms for implementing this
strategy are called garbage collection algorithms.
Reference counting is one garbage collection algorithm. In it, each object maintains a count of the number
of incoming references (pointers). When this hits zero, the object automatically frees its own memory. This
is used by many implementations of the Python language, by operating systems for managing shared libraries,
and can be implemented in C++ using operator overloading to implement faux-pointers. Reference counting
is very efficient and predictable because the program's action of setting a pointer to null triggers collection of exactly the objects that action allows to be freed. Reference counting has a key weakness: cycles of pointers
ensure that even a completely disconnected component appears to always be in use. This is like two people
each floating in space by holding the other one up...really the entire structure should crash (in our case, be
deallocated).
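The failure is easy to reproduce in a sketch (CPython pairs its reference counting with a separate cycle detector, so this illustrates pure reference counting only):

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b     # b's count is now 2: the variable b and the field a.ref
b.ref = a     # a's count is now 2
del a, b      # each count falls to 1, never to 0, so pure reference
              # counting would leak the disconnected cycle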
One algorithm that can handle cycles is called mark-sweep:
def alloc2(size):
    x = alloc(size)
    if x == None:
        # Out of memory!  Time to collect.
        gc()
        x = alloc(size)
        if x == None:
            # After collecting, still not enough memory.
            growHeap()
            x = alloc(size)
    return x

def gc():
    for x in programStack:
        mark(x)
    sweep()

def mark(block):
    if not block.marked:
        block.marked = True
        for x in block.pointers:
            mark(x)

def sweep():
    for x in allocList:
        if not x.marked:
            allocList.remove(x)
            freeList.add(x)
        else:
            x.marked = False
This is much more accurate than reference counting for identifying dead objects, but some problems remain.
The first is that it has made the timing of garbage collection unpredictable. Some calls to alloc may take a
very long time to return as they sweep the allocated part of the heap. The resulting pauses at runtime make this
kind of garbage collection unacceptable for many realtime systems. In addition:
• Memory becomes incoherent–structures allocated closely in time may be far apart in space, leading to
poor cache behavior.
• How big should the heap initially be? If it is all of memory, that isn’t fair to other programs in a multi-
tasking OS. If it is too small, then we’ll garbage collect too frequently in the beginning.
There are other algorithms that address some of these problems, like the stop-and-copy collector algorithm,
which compacts all marked memory by moving objects around during collection, and generational collector
algorithms that predict the lifetime of objects to minimize the size of the traced heap.
One interesting side note is conservative collection, which allows gc to operate on an implementation that
does not provide run-time types. It works by treating every four bytes of allocated memory as if they were a legal pointer. Assuming block sizes are stashed right in front of the first byte of each block and that we have an allocated list, we can quickly reject many false pointers and, for the rest, trace them through
the system with mark-sweep. This is even more conservative than regular mark-sweep because sometimes four
bytes will happen to match a valid block address when they are not in fact a pointer. Recall that mark-sweep
could not determine whether a block will actually be used in the future, only whether it is referenced. So,
the conservative collector is more conservative than regular mark-sweep, but neither is perfect. One popular
implementation is the Boehm-Demers-Weiser collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/), which is used to add garbage collection to C/C++.
References

[Chu32] Alonzo Church. A set of postulates for the foundation of logic. Annals of Mathematics, 33:346–366, 1932.
[Göd31] Kurt Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38:173–198, 1931.
[Kri07] Shriram Krishnamurthi. Programming Languages: Application and Interpretation. April 2007.
[Lis87] Barbara Liskov. Keynote address - data abstraction and hierarchy. In OOPSLA '87: Addendum to the Proceedings on Object-Oriented Programming Systems, Languages and Applications, pages 17–34, New York, NY, USA, 1987. ACM.
[Mil78] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, 1978.
[Pie02] Benjamin C. Pierce. Types and Programming Languages. MIT Press, Cambridge, MA, USA, 2002.
[Sar97] Vijay Saraswat. Java is not type-safe. Technical report, AT&T Research, 1997. http://www.research.att.com/~vj/bug.html.
[Tur36] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42:230–265, 1936. http://www.turingarchive.org/browse.php/B/12.
[vH67] Jean van Heijenoort, editor. From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. Harvard University Press, 1967.
[WF94] Andrew K. Wright and Matthias Felleisen. A syntactic approach to type soundness. Information and Computation, 115:38–94, 1994.