Type Inference
[Diagram: a program, together with its input, runs in the runtime environment to produce the program's output. The example program shown computes the GCD of a and b by repeated subtraction:]
while b ≠ 0:
if a > b:
a := a - b
else:
b := b - a
return a
A compiler checks if the program is meaningful as per the rules of the source language. This is done during the compiler’s semantic analysis, and it includes type checking. This will probably make intuitive sense, since we introduced types to lambda calculus in order to produce “meaningful” programs with the appropriate invariants (e.g., λn. 1/n, which is only meaningful when n is not 0).
Type checking typically uses a symbol table, which maps names to their types. This table defines an environment for the
program. When the program enters a new scope, this table is extended with new bindings (each name-type entry is a key-value
pair in the table/map).
• Please keep in mind the concept of shadowing: when a new binding is added, it may shadow an older binding; when the program exits the scope, the scope’s bindings are removed and the shadowed bindings become directly accessible again.
• You can think of the symbol table as a standard map/table/dictionary data structure, but with additional stack-like properties: you can only directly access the names and types of the “top” scope (see the sketch below).
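Here is a minimal sketch of such a scoped symbol table in OCaml. All names below (symbol_table, enter_scope, exit_scope, add, lookup) are hypothetical helpers chosen for illustration, and types are represented simply as strings:

(* A symbol table as a stack of scopes; each scope maps names to types.
   Lookup searches from the innermost scope outward, so a newer binding
   shadows an older one; popping a scope makes shadowed bindings visible again. *)
type symbol_table = (string * string) list list

let enter_scope (tbl : symbol_table) : symbol_table = [] :: tbl
let exit_scope  (tbl : symbol_table) : symbol_table = List.tl tbl

let add name ty (tbl : symbol_table) : symbol_table =
  match tbl with
  | scope :: rest -> ((name, ty) :: scope) :: rest
  | [] -> [ [ (name, ty) ] ]

let rec lookup name (tbl : symbol_table) : string option =
  match tbl with
  | [] -> None
  | scope :: rest ->
      (match List.assoc_opt name scope with
       | Some ty -> Some ty
       | None -> lookup name rest)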
Semantic analysis also performs other checks. For example: if pattern matching is exhaustive (OCaml), if a Java attribute
marked as private is being accessed outside the class, if a Java attribute marked as final has been initialized in the
constructor, etc.
Note: There is some overlap between parsing (syntax analysis) and semantic analysis. Parsing does just enough work to
produce an AST for a program. Everything else is done by semantic analysis. But, strictly speaking, producing the AST may
also require checking whether the program is meaningful (e.g., the English sentence “I saw the woman duck with a telescope” cannot be parsed into a single tree without deciding what it means).
Programs may require semantic analysis to determine the correct AST. For example, consider this Java/C code snippet:
(expr_a) - token_b
• Is the above line evaluating a numeric expression expr_a and then subtracting another number token_b from it? This is the case if expr_a is, say, a float variable.
• Or is it performing unary negation of the number token_b, and then explicitly casting the result to the data type expr_a? This is the case if expr_a is a type name, as in (float) - token_b.
This is not immediately obvious! Additional semantic analysis is needed to determine if expr_a is a variable name or a type
name before the compiler can correctly choose one of the above options.
Typically, the parser and semantic analysis work together: the parser produces an AST with some “ambiguous” nodes, then the semantic analysis stage modifies the tree so that the ambiguity is resolved. (Equivalently, you can think of the parser as producing multiple candidate ASTs, with semantic analysis accepting the one that is meaningful.)
Usually, the compiler does not translate the AST directly to the target language (even though, in theory, it could). This is because, in practice, the target language is machine code, and thus, machine-dependent (e.g., x86 or ARM). Instead of translating the AST of every source language (C++, Java, F#, etc.) directly to every target, the process is broken up: the compiler first translates the AST to an intermediate representation (IR), which is an abstraction of several assembly languages.
• Java bytecode is an example of such an intermediate representation. Many languages use C as an IR, because C is essentially an abstraction of assembly (and it is the de facto intermediate language); this IR can then be translated to the corresponding target.
The final stage is target code generation from the IR. Here, concrete machine instructions (no longer oblivious to the physical CPU architecture) are selected, and machine-dependent optimizations are performed.
Various resource allocation and storage decisions are made while translating the IR to the target (native machine language).
It is common to group the compiler’s stages into its
• front end (lexing, parsing, and semantic analysis, producing the AST and symbol tables);
• middle (CPU-agnostic optimizations on the IR); and
• back end (target code generation and machine-dependent optimizations).
An interpreter works like the front end of a full-fledged compiler: it does lexing, parsing, and semantic analysis. After that, it
can either execute the AST, or transform the AST into an IR and then execute the IR.
It is very unlikely that you will ever need to build a lexer or parser from scratch. Most languages have built-in tools that
automatically generate them from the formal syntax descriptions of the language: lex (or some variant of it) to generate a lexer,
and yacc (or some variant of it) to generate a parser (e.g., ocamllex and ocamlyacc for OCaml).
Lexers are generated using deterministic finite automata (DFAs), which are abstract machines that accept regular languages (and for our purposes, we can think of them as regular expressions).
• The input to a lexer generator is a set of regular expressions. These describe the tokens of the language. For example, if
we build an interpreter for arithmetic expressions, we will need a regular expression to capture all valid numeric values.
• The output of a lexer generator is a DFA implemented in a higher-level language (C, if we use lex; OCaml, if we use
ocamllex; etc.). This automaton takes strings as its input, i.e., each character in the source program text now becomes a
character input to the automaton. In the end, the automaton either accepts the input string as a valid token in the source
language, or rejects the input string as an invalid token.
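To make this concrete, here is a minimal sketch of what an ocamllex input might look like for our toy arithmetic language. The token constructors INT, PLUS, TIMES, LPAREN, RPAREN, and EOF are assumed to be defined elsewhere (e.g., by the parser generator); this is an illustration, not a complete lexer:

(* tokens.mll: regular expressions in, a DFA (as OCaml code) out *)
rule token = parse
  | [' ' '\t' '\n']+   { token lexbuf }            (* skip whitespace *)
  | ['0'-'9']+ as n    { INT (int_of_string n) }   (* numeric literals *)
  | '+'                { PLUS }
  | '*'                { TIMES }
  | '('                { LPAREN }
  | ')'                { RPAREN }
  | eof                { EOF }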
Recall how we started our journey into syntax and semantics with the Backus-Naur Form and a “toy” language of arithmetic expressions. If we only care about addition and multiplication, our BNF becomes
e ::= i | e op e | (e)        (here, op is an operator: + or *)
i ::= <integers>
Note: the name “integers” is yet undefined.
How would you define such an expression in OCaml? Probably something like this:
type expr =
  | Int of int
  | Operator of expr * op * expr
and op = Add | Mult;;
We don’t need to model the third component from our BNF, (e): it gets abstracted out because the tree structure already captures the meaning of the parentheses tokens.
If you connect that material to the second programming assignment, you may quickly realize that (a) we were able to work with expressions that have whitespace, parentheses, and multiple operators, and (b) those expressions were represented as trees that capture how the tokens relate to each other, e.g., (5 + 7) + 2 * 3 or 6 * 2 + 7. In general, you will see a very close correspondence between the formal specification of a language’s syntax (as expressed using BNF) and the abstract syntax of that language (as expressed using algebraic data types).
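For example, under the type sketched above, the expression 6 * 2 + 7 would be represented by the following OCaml value (assuming the usual precedence, so the multiplication is the left child of the addition):

let example : expr =
  Operator (Operator (Int 6, Mult, Int 2), Add, Int 7);;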
After lexing and parsing, semantic analysis yields the abstract syntax tree (AST) of a program.
• Next, a compiler rewrites the AST into an intermediate representation (IR). This is done partly for efficiency, and partly to simplify the AST by expressing some language features in terms of other, more basic ones.
• An interpreter has two options: it may also rewrite the AST into an IR, or it may directly evaluate the AST.
Why rewrite the AST?
To simplify it. Often, certain language features can be rewritten (i.e., implemented) in terms of other features. It makes sense to
simplify the core language so that we don’t have to worry about too many distinct features. This keeps the core of the
compiler/interpreter smaller.
• One obvious example of such simplification in programming languages is what’s called “syntactic sugar” (and eliminating it from our set of features is de-sugaring the language). Make sure you understand this concept in detail; a small sketch follows.
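As an illustration (a sketch, not the official core of any particular compiler), a let expression can be treated as sugar for an immediately applied function:

(* "let" as syntactic sugar: the two OCaml expressions below are equivalent (both give 9). *)
let sugared   = let x = 1 + 2 in x * x;;    (* surface syntax             *)
let desugared = (fun x -> x * x) (1 + 2);;  (* de-sugared core equivalent *)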
To illustrate operational semantics using step-by-step evaluation, let us augment our toy arithmetic language in two ways: (i)
include let and if-then-else; and (ii) include the inequality <= as an operator.
In this language, a terminal irreducible value is either an integer or a Boolean constant. The BNF of our language is now
e ::= x | int | bool | e1 op e2
| if e1 then e2 else e3
| let x = e1 in e2
op ::= + | * | <=
v ::= int | bool
bool ::= True | False
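In OCaml, this extended BNF might be modeled with an algebraic data type along the following lines (a sketch; the constructor names are our own choice, and the same type is reused in the later sketches):

type op = Add | Mult | Leq

type expr =
  | Var of string                 (* x                      *)
  | Int of int                    (* int                    *)
  | Bool of bool                  (* bool                   *)
  | Binop of expr * op * expr     (* e1 op e2               *)
  | If of expr * expr * expr      (* if e1 then e2 else e3  *)
  | Let of string * expr * expr   (* let x = e1 in e2       *)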
Note: We are still using some built-in underlying notions like “what is an integer” and “what are +, *, and <=”.
The operational semantics of such a language can be defined in terms of the reduction (->) relation. Additionally, it helps to
introduce the negated relation (!->) as well.
For example, we can specify that a terminal value cannot be reduced with the rules int !-> and bool !->
Earlier in this course, we saw the stepwise evaluation of our toy arithmetic language. For example,
e1 -> e3 => e1 op e2 -> e3 op e2
This is an example of inductive reasoning: two expressions are related by -> if two other expressions (at least one of which is
simpler) are also related by ->.
This rule reduces the first argument of a binary operator. Once that is complete (i.e., the first argument has been reduced step-
by-step all the way to a terminal value), the evaluation of the second argument can begin:
e2 -> e3 => v1 op e2 -> v1 op e3
And finally, when both arguments are values, we can use the built-in definition of the binary operator to evaluate
v1 op v2 -> v
Previously, it was pointed out that the evaluation bears resemblance to the 𝛽-reduction we saw in lambda calculus. For the “let” binding, this is literally true:
let x = v in e2 -> e2 with v substituted for every occurrence of x
This is exactly what 𝛽-reduction did! The lambda calculus syntax for the same substitution was [x -> v] (e2).
Note: please look out for the square brackets [ ] surrounding a reduction. Unfortunately, the syntax for lambda calculus’
substitution and the general operational semantics step both use the -> arrow. If we use the square brackets, the context
should tell you that we are using the arrow to indicate 𝛽-reduction in lambda calculus.
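Putting the rules together, here is a minimal sketch of a substitution-based small-step evaluator for the toy language, using the expr and op types sketched earlier (subst, is_value, and step are hypothetical helper names):

exception Stuck

(* [x -> v] e : substitute value v for every free occurrence of x in e *)
let rec subst x v e =
  match e with
  | Var y -> if y = x then v else e
  | Int _ | Bool _ -> e
  | Binop (e1, op, e2) -> Binop (subst x v e1, op, subst x v e2)
  | If (e1, e2, e3) -> If (subst x v e1, subst x v e2, subst x v e3)
  | Let (y, e1, e2) ->
      (* an inner binding of the same name shadows x *)
      Let (y, subst x v e1, if y = x then e2 else subst x v e2)

let is_value = function Int _ | Bool _ -> true | _ -> false

(* one small step of the -> relation *)
let rec step e =
  match e with
  | Binop (Int a, Add, Int b) -> Int (a + b)
  | Binop (Int a, Mult, Int b) -> Int (a * b)
  | Binop (Int a, Leq, Int b) -> Bool (a <= b)
  | Binop (e1, op, e2) when not (is_value e1) -> Binop (step e1, op, e2)
  | Binop (v1, op, e2) -> Binop (v1, op, step e2)
  | If (Bool true, e2, _) -> e2
  | If (Bool false, _, e3) -> e3
  | If (e1, e2, e3) -> If (step e1, e2, e3)
  | Let (x, v1, e2) when is_value v1 -> subst x v1 e2   (* the "let" rule above *)
  | Let (x, e1, e2) -> Let (x, step e1, e2)
  | _ -> raise Stuck                                    (* values and free variables don't step *)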
In operational semantics, there are actually two (not one) relations that define the meaning of a program/expression.
The first, which we denoted by ->, is called small-step semantics. It formally describes how the individual steps of a
computation take place. This is useful to describe and understand specific details and features of a programming language.
There is also the big-step semantics, sometimes called natural semantics, which describes the final result of a program. This
is a faithful abstraction of the small-step semantics, and it is quite similar to how interpreters are often implemented. If we
denote the big-step semantics by the relation ↦, we can represent it as follows:
For all expressions e and values v, e ↦ v if and only if there exists a sequence e₀, e₁, …, eₙ (with n ≥ 0) such that e = e₀ → e₁ → ⋯ → eₙ = v.
In other words, if an expression reaches a value through a sequence of well-defined small-step evaluations, then and only then
does it reach that value in a big step.
This is why we call ↦ an abstraction of -> (it just “forgets” the intermediate low-level details and jumps to the final result).
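In code, using the step and is_value functions sketched above, the big-step result can be obtained by simply iterating small steps until a value is reached — exactly the “abstraction” just described:

(* big step = repeated small steps, stopping at a value *)
let rec eval_big e = if is_value e then e else eval_big (step e)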
What we have seen so far today is the use of substitution rules to perform step-by-step reductions and arrive at the final
“meaning” of a program/expression.
We also concluded (using the BNF for lambda calculus) that this definition of a program’s semantics aligns with call-by-value
semantics. This approach is not very smart, when it comes to practical implementations. It’s “too eager”!
• For instance, we could have a program of the form let x = 5 in e (or equivalently, calling a function with argument 5,
where the function body is the expression e). This evaluation technique requires substituting every occurrence of x in e
(if e is very large, this means parsing a huge amount of text looking for x). What if x never even occurs in e or it occurs
only in a specific branch that never gets evaluated?
• In many scenarios, it is better to be “lazy” and substitute only when the value of a variable is needed for the next step of
computation. Otherwise, the interpreter may be working a lot for nothing.
For this lazy evaluation approach, a data structure called the dynamic environment is used.
• You can think of it as a dictionary (or map), mapping variable names to values. Instead of eager substitutions, the
interpreter looks up the value in this dictionary when the value is needed.
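Here is a minimal sketch of such an environment-based evaluator for the toy language (again reusing the expr and op types from before; the dynamic environment is just an association list from names to values):

type value = VInt of int | VBool of bool

let rec eval env e =
  match e with
  | Int n  -> VInt n
  | Bool b -> VBool b
  | Var x  -> List.assoc x env                 (* look up only when needed *)
  | Let (x, e1, e2) ->
      let v1 = eval env e1 in
      eval ((x, v1) :: env) e2                 (* extend the environment; no substitution *)
  | If (e1, e2, e3) ->
      (match eval env e1 with
       | VBool true  -> eval env e2
       | VBool false -> eval env e3
       | _ -> failwith "if-condition must be a Boolean")
  | Binop (e1, op, e2) ->
      (match op, eval env e1, eval env e2 with
       | Add,  VInt a, VInt b -> VInt (a + b)
       | Mult, VInt a, VInt b -> VInt (a * b)
       | Leq,  VInt a, VInt b -> VBool (a <= b)
       | _ -> failwith "ill-typed operands")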
After lexing and parsing, we jumped into the program evaluation. However, one
extremely important component of the semantic analysis that happens before
evaluation is type checking. This is a major (in fact, it is *the* major) task
within the semantic analysis phase of a compiler/interpreter’s job.
If an expression is well-typed, the type system also determines the type of that
expression. For example, the type system of OCaml determines that the type of
fun x -> 1 + x is int -> int.
“Prevention is better than cure”: The goal of type checking is to prevent runtime type errors. These errors are detectable at
compile time, so we should never allow them to happen at runtime.
A type checker is a program that implements a type system. It analyzes a program and rejects it if there are any type errors. In
other words, if an error is detectable at compile time, the type checker will simply not allow that program to run.
And to achieve this goal, a type checker uses a static (compile-time) environment, which maps names (in scope) to types, as
opposed to the dynamic (runtime) environment, which maps names to values:
Let’s revisit our toy language, where I now use “i” and “b” for integers and Booleans in the BNF to avoid confusion with the data
types:
e ::= x | i | b | e1 op e2
| if e1 then e2 else e3
| let x = e1 in e2
op ::= + | * | <=
We want to define a type system E ⊢ 𝑒: 𝑡. The only data types are integers and Booleans:
t ::= int | bool
The ternary relation will be inductively defined. That is, the type of an expression is based on the type of its sub-expressions.
let x = 5 in x + x;;
The constant 5 has type int (this was one of our three base rules). That is, E ⊢ 5 : int.
E ⊢ e1 : t1 and E, x : t1 ⊢ e2 : t2 => E ⊢ (let x = e1 in e2) : t2
In other words, if an expression e1 has type t1 in the environment E, and an expression e2 has type t2 in an environment that is E plus the variable x being bound to the type t1, then “let x = e1 in e2” has type t2.
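A minimal sketch of a type checker implementing just the rules derived so far (constants, variables, and let), with the static environment as an association list from names to types; the remaining cases are left for the exercise below:

type typ = TInt | TBool

let rec typeof (env : (string * typ) list) (e : expr) : typ =
  match e with
  | Int _  -> TInt                        (* E |- i : int  *)
  | Bool _ -> TBool                       (* E |- b : bool *)
  | Var x  -> List.assoc x env            (* E |- x : E(x) *)
  | Let (x, e1, e2) ->
      let t1 = typeof env e1 in
      typeof ((x, t1) :: env) e2          (* E, x:t1 |- e2 : t2 *)
  | Binop _ | If _ -> failwith "exercise: binary operators and if-then-else"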
Exercise
Similarly, write down the type system rules for (i) the binary operators, and (ii) the if construct.
OCaml and Java are both statically typed languages. Thus, type checking is a compile-time process that either accepts or
rejects a program (unlike dynamically typed languages like Python or JavaScript).
But unlike Java (and other languages like C, C++, etc.), OCaml is implicitly typed. A programmer does not usually need to
specify the data types, as the types are inferred. This is possible due to the sophistication of the type checker, which can figure
out what the types would have been, if the programmer had correctly specified them in the program text.
• Such specifications are often called type annotations.
Type inference and type checking are usually rolled into a single process called type reconstruction. At a very high level, it
works as follows:
1. Determine the types of later definitions using the types of earlier definitions. E.g., determine the type of fun x -> 1 + x to
be int -> int by using the fact that the type of 1 + x is int.
2. For each “let” definition, use the definition to determine the constraints about its type. E.g., determine the type of x to be
int, based on the constraint imposed by the expression 1 + x. The set of all constraints (from if-then-else, pattern
matches, etc.) form a system of equations.
3. Use the system of equations to solve for the type of the name being defined.
The system just described is called the Hindley-Milner type system. We will not go deeper into this, and we will also not
formally explore how type reconstruction handles mutable data types.
But we will use the intuition of this algorithmic approach to reconstruct/infer the types of certain programs and expressions.
Here are some example exercises (the goal is to figure out the type of the name being defined):
let double x = 2 * x;;
let square x = x * x;;
let twice f x = f (f x);;
let quad = twice double;;
let fourth = twice square;;
let rec f x = g x and g x = f x