Type Inference

The document discusses the concepts of compilers, interpreters, and domain-specific languages (DSLs), emphasizing the stages of compilation including lexing, parsing, semantic analysis, and code generation. It explains the roles of abstract syntax trees (ASTs) and intermediate representations (IRs) in the compilation process, as well as the importance of type inference and semantic checks. Additionally, it highlights the use of lexer and parser generators and the significance of context-free grammars in syntax analysis.


Type Inference

Spring 2025 © 2025 Ritwik Banerjee 1


Compilers and Interpreters

While you may never need to design a complete programming language, it is much more likely that at some point in your career, you will have to design a special-purpose language for a highly specific scenario. Such a language is called a domain-specific language (DSL).

We saw an example of a DSL entity being created using algebraic data types.

Our next topic – interpreters, syntax trees, and type inference – is meant to prepare you for such a scenario.

Spring 2025 © 2025 Ritwik Banerjee 2


Compilers and Interpreters

A compiler is a program that translates code written in a source language into another target language. Usually, the term "compiler" is used for programs where the target language is a low-level language (e.g., assembly language or machine code), in order to create an executable program.
• After producing the machine code, the compiler is no longer needed. The operating system (OS) loads and executes the target code.

[Figure: Source Program → Compiler → Machine Code; the machine code then runs in a Runtime Environment, consuming Program Input and producing Program Output. The compiler's intermediate steps (object files, linker, executable, loader) are not important for this course.]
Spring 2025 © 2025 Ritwik Banerjee 3


Compilers and Interpreters

An interpreter directly executes code written in a source language, without producing any code in a target language. Some interpreters directly execute the source code itself, while others execute an intermediate form known as bytecode (Java bytecode is one example).
• The OS loads and executes the interpreter, and the interpreter is responsible for executing the source (or intermediate) code.

[Figure: Program Text → Interpreter, with Program Input feeding the interpreter and Program Output coming out.]

Spring 2025 © 2025 Ritwik Banerjee 4


Stages of a Compiler

A lexer is responsible for transforming the source code text from a sequence of characters to a sequence of tokens. A token is a sequence of adjacent characters that forms a meaning when put together in that order (analogous to a "word" in a natural language). The process is called lexing.
• Tokens include reserved names like if, let, or match (in OCaml); constants like 42 or "hello"; variable names like h, t, or lst; and punctuation and other symbols like ;, ;;, or ->.
A parser transforms the sequence of tokens into an abstract syntax tree (AST), which is a tree structure used to represent the structure of a program. The syntax is abstract because it does not represent all the details of the real syntax, but rather just the structural or content-related information.
Here are two simple examples of why such trees are considered to be abstractions of the concrete syntax of the program.

Program: 1 + (2 + 3)

      +
     / \
    1   +
       / \
      2   3

The parentheses are implicit in the tree structure, so they are not represented in the tree as separate nodes.

Program: [1; 2; 3]

     list
    / | \
   1  2  3

The square brackets and semicolons are not represented in the tree. The parent node already provides the necessary details of the data type that stores the three values.

Spring 2025 © 2025 Ritwik Banerjee 5


Stages of a Compiler

Program (Euclid’s algorithm to compute GCD):

while b ≠ 0:
if a > b:
a := a - b
else:
b := b - a
return a

[Figure: abstract syntax tree for the program above. By Dcoetzee - Own work, CC0 | source]


Spring 2025 © 2025 Ritwik Banerjee 6
Stages of a Compiler

A compiler checks if the program is meaningful as per the rules of the source language. This is done during the compiler's
semantic analysis, and it includes type checking. This will probably make intuitive sense, since we introduced types to
lambda calculus in order to produce "meaningful" programs with the appropriate invariants (for example, λn. 1/n is not meaningful when n is 0).
Type checking typically uses a symbol table, which maps names to their types. This table defines an environment for the
program. When the program enters a new scope, this table is extended with new bindings (each name-type entry is a key-value
pair in the table/map).
• Please keep in mind the concept of shadowing: when a new binding is added, it may shadow an older binding; when the
program exits the scope, the scope's bindings are removed and the shadowed bindings become directly accessible again.
• You can think of the symbol table as a standard map/table/dictionary data structure, but with additional stack-like
properties: you can only directly access the names and types of the “top” scope.
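
To make these stack-like properties concrete, here is a minimal OCaml sketch of a symbol table as a list of scopes, innermost scope first. The type and function names are illustrative, not the course's actual implementation:

type ty = TInt | TBool
type scope = (string * ty) list
type symtab = scope list                  (* innermost scope first *)

let enter_scope (tab : symtab) : symtab = [] :: tab

let exit_scope (tab : symtab) : symtab =
  match tab with
  | _ :: rest -> rest                     (* dropping a scope un-shadows older bindings *)
  | [] -> []

let add_binding name t (tab : symtab) : symtab =
  match tab with
  | scope :: rest -> ((name, t) :: scope) :: rest
  | [] -> [ [ (name, t) ] ]

(* Lookup searches the innermost scope first, so newer bindings shadow older ones. *)
let rec lookup name (tab : symtab) : ty option =
  match tab with
  | [] -> None
  | scope :: rest ->
      (match List.assoc_opt name scope with
       | Some t -> Some t
       | None -> lookup name rest)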

Spring 2025 © 2025 Ritwik Banerjee 7


Stages of a Compiler

Semantic analysis also performs other checks. For example: whether pattern matching is exhaustive (OCaml), whether a Java attribute
marked as private is being accessed outside the class, whether a Java attribute marked as final has been initialized in the
constructor, etc.
Note: There is some overlap between parsing (syntax analysis) and semantic analysis. Parsing does just enough work to
produce an AST for a program. Everything else is done by semantic analysis. But, strictly speaking, producing the AST may
also require checking whether the program is meaningful (compare the ambiguity in a natural-language sentence like "I saw the woman with a telescope").
Programs may require semantic analysis to determine the correct AST. For example, consider this Java/C code snippet:
(expr_a) - token_b
• Is the above line evaluating a numeric expression expr_a and then subtracting another number token_b from it?
• Or is it performing unary negation of the number token_b, and then explicitly casting it to the data type expr_a (e.g., if expr_a is the type float, the line reads as (float) -token_b)?
This is not immediately obvious! Additional semantic analysis is needed to determine whether expr_a is a variable name or a type
name before the compiler can correctly choose one of the above options.
Typically, the parser and semantic analysis work together: the parser produces an AST with some "ambiguous" nodes, and the
semantic analysis stage then modifies the tree so that the ambiguity is resolved into the meaningful interpretation.

Spring 2025 © 2025 Ritwik Banerjee 8


Stages of a Compiler

Usually, the compiler does not translate the AST directly to the target language (even though, in theory, it could).
This is because, in practice, the target language is machine code, and thus machine-dependent (e.g., x86 or ARM). Instead of translating the AST to each target, the process is broken up: the compiler first translates the AST to an intermediate representation (IR), which is an abstraction of several assembly languages.
• Java bytecode is an example of such an intermediate representation. Many languages use C as an IR, because C is essentially an abstraction of assembly (and it is the de facto language of Unix-like operating systems).
• Multiple source languages (e.g., C, C++, Java, OCaml) can be translated to the same IR. For example, Java bytecode is used as the IR by compilers of Java, Kotlin, Scala, Clojure, Groovy, and several other languages; the Common Intermediate Language (CIL) is used as the IR by all compilers for the .NET framework.
Then, many target-language outputs can be produced from that IR: the IR is translated to machine code for each target architecture.
An IR is an abstract machine language, meant for conceptually simple tasks like loading from memory, storing to memory, calling and returning, or jumping to other instructions.
Note: An IR is an abstraction because it remains unaware of constraints like the number of registers available in the machine, or the specific physical CPU architecture.

[Figure: several source languages (C++, Java, F#) compile to a single Intermediate Representation, which is then translated to machine code for x86, ARM, or RISC-V.]
Spring 2025 © 2025 Ritwik Banerjee 9


Stages of a Compiler

The final stage is target code generation from the IR. Here, concrete machine instructions (not oblivious to the physical CPU
architecture any more) are selected for machine-dependent optimizations.
Various resource allocation and storage decisions are made while translating the IR to the target (native machine language).
It is common to group the compiler’s stages into its
• front end (lexing, parsing, and semantic analysis, producing the AST and symbol tables);
• middle (CPU-agnostic optimizations on the IR); and
• back end (target code generation and machine-dependent optimizations).

An interpreter works like the front end of a full-fledged compiler: it does lexing, parsing, and semantic analysis. After that, it
can either execute the AST directly, or transform the AST into an IR and then execute the IR.

Spring 2025 © 2025 Ritwik Banerjee 10


Syntax Analysis: Lexers

It is very unlikely that you will ever need to build a lexer or parser from scratch. Most languages have built-in tools that
automatically generate them from the formal syntax descriptions of the language: lex (or some variant of it, e.g., ocamllex) to generate a lexer,
and yacc (or some variant of it) to generate a parser.

Lexers are generated using deterministic finite automata (DFA), which are abstract machines that accept regular languages (and
for our purpose, we can think of them as regular expressions).
• The input to a lexer generator is a set of regular expressions. These describe the tokens of the language. For example, if
we build an interpreter for arithmetic expressions, we will need a regular expression to capture all valid numeric values.
• The output of a lexer generator is a DFA implemented in a higher-level language (C, if we use lex; OCaml, if we use
ocamllex; etc.). This automaton takes strings as its input, i.e., each character in the source program text now becomes a
character input to the automaton. In the end, the automaton either accepts the input string as a valid token in the source
language, or rejects the input string as an invalid token.
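
To make the character-to-token transformation concrete, here is a minimal hand-written lexer sketch in OCaml for arithmetic expressions; in practice a tool like ocamllex would generate the automaton from regular expressions. The token type and function names are illustrative:

type token = INT of int | PLUS | TIMES | LPAREN | RPAREN

(* Turn a string of characters into a list of tokens. *)
let tokenize (src : string) : token list =
  let n = String.length src in
  let rec go i acc =
    if i >= n then List.rev acc
    else match src.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc          (* skip whitespace *)
      | '+' -> go (i + 1) (PLUS :: acc)
      | '*' -> go (i + 1) (TIMES :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | '0' .. '9' ->
          (* greedily consume a maximal run of digits: one INT token *)
          let j = ref i in
          while !j < n && src.[!j] >= '0' && src.[!j] <= '9' do incr j done;
          go !j (INT (int_of_string (String.sub src i (!j - i))) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

(* tokenize "1 + (2 + 3)" = [INT 1; PLUS; LPAREN; INT 2; PLUS; INT 3; RPAREN] *)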

Spring 2025 © 2025 Ritwik Banerjee 11


Syntax Analysis: Parsers

Parsers are also built on the theory of automata. They use what are known as pushdown automata, which can accept a much larger class of languages, known as context-free languages (CFLs).
• CFLs are a strict superset of regular languages, and patterns that lie in this "difference" play an important role in programming language syntax.
• For example, the language defined as ℒ = { aⁿbⁿ : n ≥ 1 } is a CFL, but it is not regular.
• This is the language of balanced delimiters (e.g., put a = '{' and b = '}')! The ability to detect whether delimiters are balanced is crucial to any syntax-checking process.
Just like a regular language can be expressed by a regular expression, a CFL can be expressed by a context-free grammar (CFG).

[Figure: Venn diagram. Regular languages (accepted by deterministic finite automata) are strictly contained in context-free languages (accepted by pushdown automata); balanced parentheses lie in the difference.]

Spring 2025 © 2025 Ritwik Banerjee 12


Syntax Analysis: Parsers
A context-free grammar defines a set of terminal and non-terminal symbols, and a set of production rules describing how
non-terminals can be replaced. Let us consider the language of balanced delimiters again. The valid strings in this language
are of the form {}, {{}}, {}{}, {{}{}{{}}}, etc., while invalid strings include } and {}}}{.
The production rules for this language are
• S → { S }
• S → S S
• S → ε
where ε denotes the empty string, S is the only non-terminal, and the left/right braces are the terminal symbols.
The standard notation for CFGs is something you have already seen before: the Backus-Naur Form (BNF).
• The input to a parser generator is the language syntax described in BNF. (The generated parser itself takes the lexer's output, i.e., the token sequence, as its input.)
• The output of a parser generator is a program that recognizes/accepts the language of the grammar.
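
As a concrete illustration (not the generated parser itself), here is a small OCaml sketch that recognizes this balanced-brace language. Because the grammar only nests one kind of delimiter, the pushdown automaton's stack degenerates to a depth counter:

(* Recognize strings of '{' and '}' generated by the grammar above. *)
let balanced (s : string) : bool =
  let rec go i depth =
    if depth < 0 then false                       (* a '}' appeared with no open '{' *)
    else if i = String.length s then depth = 0    (* accept only if everything is closed *)
    else match s.[i] with
      | '{' -> go (i + 1) (depth + 1)
      | '}' -> go (i + 1) (depth - 1)
      | _ -> false                                (* any other character is invalid *)
  in
  go 0 0

(* balanced "{{}{}}" = true;  balanced "{}}}{" = false *)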

Spring 2025 © 2025 Ritwik Banerjee 13


Syntax Analysis and Abstract Syntax Trees

Recall how we started our journey into syntax and semantics with the Backus-Naur Form and a "toy" language of arithmetic expressions. If we only care about addition and multiplication, our BNF becomes

e ::= i | e op e | (e)
i ::= <integers>

where op is an operator (+ or *). Note: the name "integers" is yet undefined.

If you connect that material to the second programming assignment, you may quickly realize that (a) we were able to work with expressions that have whitespace, parentheses, and multiple operators, and (b) those expressions were represented as trees that captured how the tokens relate to each other. For example, (5 + 7) + 2 * 3 and 6 * 2 + 7:

(5 + 7) + 2 * 3          6 * 2 + 7

      +                      +
     / \                    / \
    +   *                  *   7
   / \ / \                / \
  5  7 2  3              6   2

Spring 2025 © 2025 Ritwik Banerjee 14


Syntax Analysis and Abstract Syntax Trees

Recall the BNF of our "toy" language of arithmetic expressions with only addition and multiplication:

e ::= i | e op e | (e)
i ::= <integers>

How would you define such an expression in OCaml? Probably something like this:

type expr =
  | Int of int
  | Operator of expr * op * expr
and op = Add | Mult;;

We don't need to model the third component from our BNF, (e): it gets abstracted out because the tree structure already captures the meaning of the parentheses tokens.

In general, you will see a very close correspondence between the formal specifications of a language's syntax (as expressed using BNF) and the abstract syntax of that language (as expressed using algebraic data types).
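
As a sketch of where this AST is headed, here is a minimal evaluator for the type above; the eval function and the example value are illustrative, not part of the assignment:

type expr =
  | Int of int
  | Operator of expr * op * expr
and op = Add | Mult

(* Evaluate the AST bottom-up: children first, then the operator node. *)
let rec eval (e : expr) : int =
  match e with
  | Int i -> i
  | Operator (l, Add, r) -> eval l + eval r
  | Operator (l, Mult, r) -> eval l * eval r

(* (5 + 7) + 2 * 3, written as a tree; note there is no node for parentheses. *)
let example =
  Operator (Operator (Int 5, Add, Int 7), Add,
            Operator (Int 2, Mult, Int 3))

let () = assert (eval example = 18)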

Spring 2025 © 2025 Ritwik Banerjee 15


Program Evaluation ► Simplification

After lexing and parsing, semantic analysis yields the abstract syntax tree (AST) of a program.
• Next, a compiler rewrites the AST into an intermediate representation (IR).
• An interpreter has two options: it may also rewrite the AST into an IR, or it may directly evaluate the AST.
Why rewrite the AST?
To simplify it. Often, certain language features can be rewritten (i.e., implemented) in terms of other features. It makes sense to
simplify the core language so that we don't have to worry about too many distinct features. This keeps the core of the
compiler/interpreter smaller.
• One obvious example of such simplification in programming languages is what's called "syntactic sugar"
(and eliminating it from our set of features is de-sugaring the language).

Spring 2025 © 2025 Ritwik Banerjee 16


Program Evaluation ► Simplification

Examples of syntactic sugar:
• Array access in C: a[i] ► *(a+i)
• Augmented assignment in C, C++, Java, C#, and others: a += b ► a = a + b
• Extension method in OOP languages like Java, C#, Ruby, VB.Net: myObject.myMethod(arg1, arg2) ► myMethod(myObject, arg1, arg2)
• Ternary conditional operator in C, C++, C#, Java, and JavaScript: condition ? val1 : val2 ► if (condition) then val1 else val2
• Enhanced for-loop in Java ► iterator implementation
• Method reference in Java streams (see the Java example below)

public class SomeClass {
    public static String twice(String s) { return s + s; }
}

List<String> names = Arrays.asList("Alice", "Bob", "Charlie");

// (1) Using lambda expressions
names.stream().map(name -> name.toUpperCase())
     .forEach(s -> System.out.println(s));

// (2) Using a static method reference
// SomeClass::twice is identical to the lambda expression s -> SomeClass.twice(s)
names.stream().map(SomeClass::twice)
     .forEach(s -> System.out.println(s));

// (3) Using instance method references; this is identical to (1)
names.stream().map(String::toUpperCase)
     .forEach(System.out::println);

Spring 2025 © 2025 Ritwik Banerjee 17


Program Evaluation ► Simplification

We have seen and used syntactic sugar in OCaml as well!


let x = e1 in e2 is equivalent to (fun x -> e2) e1
Suppose we had a language with this BNF (corresponding to its AST):
e ::= x | fun x -> e | e1 e2 | let x = e1 in e2
This can be simplified by de-sugaring, so that the interpreter only has to worry about
e ::= x | fun x -> e | e1 e2
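
A minimal OCaml sketch of what such a de-sugaring pass might look like over an AST for this BNF; the constructor and function names are illustrative, not part of the course code:

type exp =
  | Var of string
  | Fun of string * exp          (* fun x -> e *)
  | App of exp * exp             (* e1 e2 *)
  | Let of string * exp * exp    (* let x = e1 in e2 *)

(* Rewrite every Let node as an immediately applied function, so the rest of
   the interpreter never has to know that Let existed. *)
let rec desugar (e : exp) : exp =
  match e with
  | Var _ -> e
  | Fun (x, body) -> Fun (x, desugar body)
  | App (e1, e2) -> App (desugar e1, desugar e2)
  | Let (x, e1, e2) -> App (Fun (x, desugar e2), desugar e1)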

Spring 2025 © 2025 Ritwik Banerjee 18


Program Evaluation ► Operational Semantics

Once we have the de-sugared AST, which is the first step of simplification, it is time to evaluate the program.
Evaluation is the process of simplifying the AST until no further reduction is possible. In other words, no further computation remains to be done because we have arrived at a value. Recall from the various BNFs we have seen that the values form a strict subset of the expressions allowed by a language's grammar, and a value is an irreducible (i.e., terminal) symbol.
Evaluation is defined in terms of a mathematical relation. In fact, this relation bears resemblance to the single-step β-reduction you saw in the evaluation of lambda calculus expressions/programs.
Unlike denotational semantics, the meaning of a program is defined in terms of what the program evaluates to. In this way, the sequence of single-step computations **is** the meaning of the program. This way of defining the meaning of a program is called operational semantics.

[Figure: an example expression tree reduced step by step to the value 11.]

Spring 2025 © 2025 Ritwik Banerjee 19


Program Evaluation ► Operational Semantics

To illustrate operational semantics using step-by-step evaluation, let us augment our toy arithmetic language in two ways: (i)
include let and if-then-else; and (ii) include the inequality <= as an operator.
In this language, a terminal irreducible value is either an integer or a Boolean constant. The BNF of our language is now
e ::= x | int | bool | e1 op e2
| if e1 then e2 else e3
| let x = e1 in e2
op ::= + | * | <=
v ::= int | bool
bool ::= True | False
Note: We are still using some built-in underlying notions like "what is an integer" and "what are +, *, and <=".
The operational semantics of such a language can be defined in terms of the reduction (->) relation. Additionally, it helps to
introduce the negated relation (!->) as well.
For example, we can specify that a terminal value cannot be reduced with the rules int !-> and bool !->

Spring 2025 © 2025 Ritwik Banerjee 20


Program Evaluation ► Operational Semantics

Earlier in this course, we saw the stepwise evaluation of our toy arithmetic language. For example,
e1 -> e3 => e1 op e2 -> e3 op e2
This is an example of inductive reasoning: two expressions are related by -> if two other expressions (at least one of which is
simpler) are also related by ->.
This rule reduces the first argument of a binary operator. Once that is complete (i.e., the first argument has been reduced step-
by-step all the way to a terminal value), the evaluation of the second argument can begin:
e2 -> e3 => v1 op e2 -> v1 op e3
And finally, when both arguments are values, we can use the built-in definition of the binary operator to evaluate
v1 op v2 -> v
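
Here is a small OCaml sketch of these three rules for the toy language (type and function names are illustrative). The step function implements one use of ->, and repeating it until a value is reached gives the full evaluation:

type op = Add | Mult | Leq
type exp =
  | Int of int
  | Bool of bool
  | Binop of exp * op * exp

let is_value = function Int _ | Bool _ -> true | _ -> false

(* One application of the small-step relation ->. *)
let rec step (e : exp) : exp =
  match e with
  | Int _ | Bool _ -> failwith "values do not step"            (* int !->  and  bool !-> *)
  | Binop (Int a, Add, Int b) -> Int (a + b)                   (* v1 op v2 -> v *)
  | Binop (Int a, Mult, Int b) -> Int (a * b)
  | Binop (Int a, Leq, Int b) -> Bool (a <= b)
  | Binop (v1, _, v2) when is_value v1 && is_value v2 ->
      failwith "stuck: no rule applies"
  | Binop (v1, op, e2) when is_value v1 -> Binop (v1, op, step e2)   (* e2 -> e3 => v1 op e2 -> v1 op e3 *)
  | Binop (e1, op, e2) -> Binop (step e1, op, e2)                    (* e1 -> e3 => e1 op e2 -> e3 op e2 *)

(* Keep stepping until no computation remains. *)
let rec eval (e : exp) : exp = if is_value e then e else eval (step e)

(* eval (Binop (Int 1, Add, Binop (Int 2, Add, Int 3))) = Int 6 *)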

Spring 2025 © 2025 Ritwik Banerjee 21


Program Evaluation ► Operational Semantics

Previously, it was pointed out that the evaluation bears resemblance to the 𝛽-reduction we saw in lambda calculus. For the
“let” binding, this is literally true:

e1 -> e3 => let x = e1 in e2 -> let x = e3 in e2

and when a terminal value is reached,

let x = v in e2 -> e2 with v substituted for x

This is exactly what 𝛽-reduction did! The lambda calculus syntax for the same substitution was [x -> v] (e2).

Note: please look out for the square brackets [ ] surrounding a reduction. Unfortunately, the syntax for lambda calculus’
substitution and the general operational semantics step both use the -> arrow. If we use the square brackets, the context
should tell you that we are using the arrow to indicate 𝛽-reduction in lambda calculus.
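
A sketch of the substitution step in OCaml for a small fragment with variables, integers, addition, and let; the constructor names are illustrative, and capture issues do not arise here because the substituted values are closed:

type exp =
  | Var of string
  | Int of int
  | Add of exp * exp
  | Let of string * exp * exp

(* [x -> v] e : replace free occurrences of x in e with v. *)
let rec subst (x : string) (v : exp) (e : exp) : exp =
  match e with
  | Var y -> if y = x then v else e
  | Int _ -> e
  | Add (a, b) -> Add (subst x v a, subst x v b)
  | Let (y, e1, e2) ->
      let e1' = subst x v e1 in
      (* an inner binding of the same name shadows x, so leave e2 alone *)
      if y = x then Let (y, e1', e2) else Let (y, e1', subst x v e2)

(* let x = 5 in x + x  steps to  5 + 5 *)
let () =
  assert (subst "x" (Int 5) (Add (Var "x", Var "x")) = Add (Int 5, Int 5))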

Spring 2025 © 2025 Ritwik Banerjee 22


Program Evaluation ► Operational Semantics

In operational semantics, there are actually two (not one) relations that define the meaning of a program/expression.
The first, which we denoted by ->, is called small-step semantics. It formally describes how the individual steps of a
computation take place. This is useful to describe and understand specific details and features of a programming language.
There is also the big-step semantics, sometimes called natural semantics, which describes the final result of a program. This
is a faithful abstraction of the small-step semantics, and it is quite similar to how interpreters are often implemented. If we
denote the big-step semantics by the relation ↦, we can represent it as follows:
For all expressions e and values v, e ↦ v if and only if there exists a sequence e₀, e₁, …, eₙ (n ≥ 0) such that e = e₀ → e₁ → ⋯ → eₙ = v.
In other words, if an expression reaches a value through a sequence of well-defined small-step evaluations, then and only then
does it reach that value in a big step.
This is why we call ↦ an abstraction of -> (it just “forgets” the intermediate low-level details and jumps to the final result).
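
Continuing the small-step sketch from the previous slide, a direct big-step evaluator (a sketch of ↦, with illustrative names) jumps straight to the final value without materializing the intermediate expressions:

type op = Add | Mult | Leq
type exp = Int of int | Bool of bool | Binop of exp * op * exp

(* e ↦ v : evaluate the sub-expressions, then apply the built-in operator. *)
let rec bigstep (e : exp) : exp =
  match e with
  | Int _ | Bool _ -> e
  | Binop (e1, op, e2) ->
      (match bigstep e1, op, bigstep e2 with
       | Int a, Add,  Int b -> Int (a + b)
       | Int a, Mult, Int b -> Int (a * b)
       | Int a, Leq,  Int b -> Bool (a <= b)
       | _ -> failwith "stuck: ill-typed expression")

(* bigstep (Binop (Int 1, Add, Binop (Int 2, Add, Int 3))) = Int 6 *)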

Spring 2025 © 2025 Ritwik Banerjee 23


Program Evaluation ► Operational Semantics

Let's now tie this to something you have studied earlier. Consider this small language, defined using BNF:

e ::= x | e1 e2 | fun x -> e
v ::= fun x -> e

The functions are anonymous, and the metavariable x is a placeholder for other variables (so, technically, we could add another line to this BNF, stating x ::= <variables>).

What you see above is lambda calculus.

Using big-step evaluation, we can define its semantics as

e1 ↦ fun x -> e   and   e2 ↦ v2   and   [x -> v2](e) ↦ v   =>   e1 e2 ↦ v

The arguments are reduced to a terminal value before a function is applied. You know this as call-by-value semantics, which is used in many languages (including OCaml).

Exercise: Using the syntax we have been using in this topic, how would you similarly define call-by-name semantics for lambda calculus?
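
A sketch of the call-by-value rule above as an OCaml interpreter for this lambda-calculus BNF (naive substitution, ignoring variable capture for brevity); adapting it to call-by-name is essentially the exercise:

type exp =
  | Var of string
  | Fun of string * exp     (* fun x -> e *)
  | App of exp * exp        (* e1 e2 *)

(* [x -> v] e, naive: we assume bound names are distinct so capture is not an issue. *)
let rec subst x v = function
  | Var y -> if y = x then v else Var y
  | Fun (y, body) -> if y = x then Fun (y, body) else Fun (y, subst x v body)
  | App (e1, e2) -> App (subst x v e1, subst x v e2)

(* Call-by-value big step: evaluate the argument before applying the function. *)
let rec eval = function
  | Fun _ as v -> v
  | Var x -> failwith ("unbound variable " ^ x)
  | App (e1, e2) ->
      (match eval e1 with
       | Fun (x, body) -> eval (subst x (eval e2) body)
       | _ -> failwith "applied a non-function")

(* (fun x -> x) (fun y -> y)  evaluates to  fun y -> y *)
let () = assert (eval (App (Fun ("x", Var "x"), Fun ("y", Var "y"))) = Fun ("y", Var "y"))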

Spring 2025 © 2025 Ritwik Banerjee 24


Evaluation models

What we have seen so far today is the use of substitution rules to perform step-by-step reductions and arrive at the final
“meaning” of a program/expression.
We also concluded (using the BNF for lambda calculus) that this definition of a program’s semantics aligns with call-by-value
semantics. This approach is not very smart, when it comes to practical implementations. It’s “too eager”!
• For instance, we could have a program of the form let x = 5 in e (or equivalently, calling a function with argument 5,
where the function body is the expression e). This evaluation technique requires substituting every occurrence of x in e
(if e is very large, this means traversing a huge expression looking for x). What if x never even occurs in e, or it occurs
only in a specific branch that never gets evaluated?
• In many scenarios, it is better to be “lazy” and substitute only when the value of a variable is needed for the next step of
computation. Otherwise, the interpreter may be working a lot for nothing.
For this lazy evaluation approach, a data structure called the dynamic environment is used.
• You can think of it as a dictionary (or map), mapping variable names to values. Instead of eager substitutions, the
interpreter looks up the value in this dictionary when the value is needed.
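
A sketch of an environment-based evaluator in OCaml (a map from names to values, looked up on demand; the names are illustrative). Note that this particular sketch still computes the bound value eagerly; a fully lazy evaluator would instead store an unevaluated thunk:

module Env = Map.Make (String)

type exp =
  | Var of string
  | Int of int
  | Add of exp * exp
  | Let of string * exp * exp

(* No substitution pass over e2: names are looked up only when reached. *)
let rec eval (env : int Env.t) (e : exp) : int =
  match e with
  | Int i -> i
  | Var x ->
      (try Env.find x env
       with Not_found -> failwith ("unbound variable " ^ x))
  | Add (a, b) -> eval env a + eval env b
  | Let (x, e1, e2) -> eval (Env.add x (eval env e1) env) e2

(* let x = 5 in x + x *)
let () = assert (eval Env.empty (Let ("x", Int 5, Add (Var "x", Var "x"))) = 10)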

Spring 2025 © 2025 Ritwik Banerjee 25


Type Checking

After lexing and parsing, we jumped into the program evaluation. However, one
extremely important component of the semantic analysis that happens before
evaluation is type checking. This is a major (in fact, it is *the* major) task
within the semantic analysis phase of a compiler/interpreter’s job.

Formally, a type system is a mathematical description of how to determine


whether an expression is well-typed or not. Recall that many programs are
well-defined only for certain data types (which serve as important invariants for
the program). For example, the program fun x -> 1 + x is well-defined in a
strongly typed language only when the type of x is numeric.

If an expression is well-typed, the type system also determines the type of that
expression. For example, the type system of OCaml determines that the type of
fun x -> 1 + x is int -> int.

Spring 2025 © 2025 Ritwik Banerjee 26


Type Checking & Environments

“Prevention is better than cure”: The goal of type checking is to prevent runtime type errors. These errors are detectable at
compile time, so we should never allow them to happen at runtime.
A type checker is a program that implements a type system. It analyzes a program and rejects it if there are any type errors. In
other words, if an error is detectable at compile time, the type checker will simply not allow that program to run.
And to achieve this goal, a type checker uses a static (compile-time) environment, which maps names (in scope) to types, as
opposed to the dynamic (runtime) environment, which maps names to values:

A dynamic environment maps names to values, e.g., x ↦ 5, y ↦ 7, c ↦ false.
A static environment maps names to types, e.g., x ↦ int, y ↦ int, c ↦ bool.
So, we can think of the static environment as an abstraction of the dynamic environment: we know that x and y are going to be some int values, we just don't know which ones.
Formally, a type checker is usually formulated using a ternary (i.e., 3-argument) relation E ⊢ e : t, which means that the expression e has type t in a static environment E. Usually, we read it as "the environment E shows that e has type t".

Spring 2025 © 2025 Ritwik Banerjee 27


Type Checking

The evaluation of an expression e is stuck if


1. e is not a value, i.e., computation has not yet terminated, and
2. e is irreducible, i.e., there is no way to move forward with any computation.
A type system’s purpose is to make sure that no expression gets stuck. This is what we call type safety, and type-safe
expressions are what we call ‘well typed’. Let’s define this term precisely, though.
We can denote the static environment as a map, and write x:t to denote that x is bound to t. So, {foo:int, bar:float}
denotes the static environment where foo has type int and bar has type float.
An expression e is well-typed in a static environment E if there exists a type t such that E ⊢ e : t. The type checker's goal is to find
such a type 𝑡, starting with some initial static environment.
Note: For the sake of convenience, we sometimes show analyses with an initial environment that’s empty. In practice, this
is rarely the case. A language will almost always use some built-in names and their types that are in scope. For example,
OCaml will have the names defined in its Stdlib module.

Spring 2025 © 2025 Ritwik Banerjee 28


A Simple Type System

Let’s revisit our toy language, where I now use “i” and “b” for integers and Booleans in the BNF to avoid confusion with the data
types:

e ::= x | i | b | e1 op e2
| if e1 then e2 else e3
| let x = e1 in e2
op ::= + | * | <=

We want to define a type system E ⊢ 𝑒: 𝑡. The only data types are integers and Boolean constants:

t ::= int | bool

The ternary relation will be inductively defined. That is, the type of an expression is based on the type of its sub-expressions.

Spring 2025 © 2025 Ritwik Banerjee 29


A Simple Type System

Three things are fixed, though:

1. E ⊢ 𝑖: int, an integer constant always has type int;

2. E ⊢ 𝑏: bool, a Boolean constant always has type bool; and

3. {x : t, …} ⊢ x : t, a variable has whatever type its static environment dictates.

Everything else follows inductively from these three.

Spring 2025 © 2025 Ritwik Banerjee 30


A Simple Type System

Consider the let binding.

let x = 5 in x + x;;

The constant 5 has type int (this was one of our three base rules). That is, E ⊢ 5 : int.

And, E ∪ {x : int} shows that the expression x + x has type int.

We can formalize this as the following rule:

E ⊢ e1 : t1  ∧  E ∪ {x : t1} ⊢ e2 : t2  ⇒  E ⊢ let x = e1 in e2 : t2

In other words, if an expression e1 has type t1 in the environment E, and an expression e2 has type t2 in an environment that is
E plus the variable x being bound to the type t1, then “let x = e1 in e2” has type t2.

Exercise
Similarly, write down the type system rules for (i) the binary operators, and (ii) the if construct.
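
A partial OCaml sketch of a type checker for these rules (all names are illustrative); the binary-operator and if cases are deliberately left unfinished, since they are the exercise above:

module Env = Map.Make (String)

type op = Add | Mult | Leq
type exp =
  | Var of string
  | Int of int
  | Bool of bool
  | Binop of exp * op * exp
  | If of exp * exp * exp
  | Let of string * exp * exp

type ty = TInt | TBool

(* typeof env e  implements  E ⊢ e : t, rejecting ill-typed programs. *)
let rec typeof (env : ty Env.t) (e : exp) : ty =
  match e with
  | Int _ -> TInt                                    (* rule 1 *)
  | Bool _ -> TBool                                  (* rule 2 *)
  | Var x ->                                         (* rule 3 *)
      (try Env.find x env
       with Not_found -> failwith ("unbound variable " ^ x))
  | Let (x, e1, e2) ->                               (* the let rule above *)
      typeof (Env.add x (typeof env e1) env) e2
  | Binop (_, _, _) | If (_, _, _) ->
      failwith "exercise: write the rules for binary operators and if"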

Spring 2025 © 2025 Ritwik Banerjee 31


Type Inference

OCaml and Java are both statically typed languages. Thus, type checking is a compile-time process that either accepts or
rejects a program (unlike dynamically typed languages like Python or JavaScript).
But unlike Java (and other languages like C, C++, etc.), OCaml is implicitly typed. A programmer does not usually need to
specify the data types, as the types are inferred. This is possible due to the sophistication of the type checker, which can figure
out what the types would have been, if the programmer had correctly specified them in the program text.
• Such specifications are often called type annotations.
Type inference and type checking are usually rolled into a single process called type reconstruction. At a very high level, it
works as follows:
1. Determine the types of later definitions using the types of earlier definitions. E.g., determine the type of fun x -> 1 + x to
be int -> int by using the fact that the type of 1 + x is int.
2. For each “let” definition, use the definition to determine the constraints about its type. E.g., determine the type of x to be
int, based on the constraint imposed by the expression 1 + x. The set of all constraints (from if-then-else, pattern
matches, etc.) forms a system of equations.
3. Use the system of equations to solve for the type of the name being defined.
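
To make steps 2 and 3 concrete, here is a toy flavour of constraint solving in OCaml. This is not the full Hindley-Milner algorithm described on the next slide: the occurs check and let-polymorphism are omitted, and all names are illustrative:

(* Toy type terms: type variables, int, and function types. *)
type ty =
  | TVar of string
  | TInt
  | TArrow of ty * ty

(* Replace type variable v with t inside u. *)
let rec subst_ty v t u =
  match u with
  | TVar w when w = v -> t
  | TArrow (a, b) -> TArrow (subst_ty v t a, subst_ty v t b)
  | _ -> u

(* Solve a list of type equations, returning variable bindings. *)
let rec unify (constraints : (ty * ty) list) : (string * ty) list =
  match constraints with
  | [] -> []
  | (a, b) :: rest when a = b -> unify rest
  | (TVar v, t) :: rest | (t, TVar v) :: rest ->
      let rest' = List.map (fun (x, y) -> (subst_ty v t x, subst_ty v t y)) rest in
      (v, t) :: unify rest'
  | (TArrow (a1, b1), TArrow (a2, b2)) :: rest ->
      unify ((a1, a2) :: (b1, b2) :: rest)
  | _ -> failwith "type error: constraints cannot be solved"

(* fun x -> 1 + x : the body forces x to be int, so the function is int -> int. *)
let () =
  let solution = unify [ (TVar "x", TInt) ] in
  assert (List.assoc "x" solution = TInt);
  assert (subst_ty "x" TInt (TArrow (TVar "x", TInt)) = TArrow (TInt, TInt))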

Spring 2025 © 2025 Ritwik Banerjee 32


Type Inference

The system just described is called the Hindley-Milner type system. We will not go deeper into this, and we will also not
formally explore how type reconstruction handles mutable data types.
But we will use the intuition of this algorithmic approach to reconstruct/infer the types of certain programs and expressions.
Here are some example exercises (the goal is to figure out the type of the name being defined on the left-hand side):
let double x = 2 * x;;
let square x = x * x;;
let twice f x = f (f x);;
let quad = twice double;;
let fourth = twice square;;
let rec f x = g x and g x = f x

Spring 2025 © 2025 Ritwik Banerjee 33


Type Inference

let thrice f x = f(f(f(x)));;


let composition f g x = f(g(x));;
let triple3 = thrice tripleFloat;;
let f list =
let rec aux acc = function
| [] -> acc
| h::t -> aux (h::acc) t in aux [] list;;
let rec f g lst = if (lst = []) then []
else (let h = List.hd lst in
let t = List.tl lst in
(if (g h) then (g h)::(f g t) else f g t));;

Spring 2025 © 2025 Ritwik Banerjee 34
