Compiler Construction
USMAN ASGHAR
Policies and Guidelines
Attendance policy: marking at start
Plagiarism policy: as per outline
Do’s
◦ Be interactive, ask questions
◦ Participate in the lecture
◦ Relax and learn
Don’ts
◦ Use of cell phones
◦ Discussion with fellows during class (unless told otherwise)
◦ Leaving early (may be marked as absent)
Compilers
A compiler is a program that reads a program in one language (the source language)
and translates it into an equivalent program in another language (the target language).
An important role of the compiler is to report any errors in the source program that it
detects during the translation process.
If the target program is an executable machine-language program, it can then be called
by the user to process inputs and produce outputs.
Interpreters
An interpreter does not produce a target program. It translates and executes the source
program statement by statement while the program is running.
Compiler vs Interpreter
A compiler analyzes the entire program before translating it, so analysis takes longer up
front, whereas an interpreter analyzes one statement at a time, so each step is quick.
A compiler generates intermediate object code, whereas an interpreter does not.
A compiler needs more memory because it creates intermediate object code, whereas
an interpreter needs less because it does not create intermediate object code.
A compiler reports all the errors it finds after analyzing the whole program, and code with
errors does not compile. An interpreter reports errors one at a time, as it reaches each
faulty line.
Compiler vs Interpreter
Language      Compiler
C++           GCC (g++), e.g. g++ your_program.cpp -o output_executable
C#            C# compiler
Java          javac, part of the JDK (Java Development Kit)
Fortran       gfortran, part of GCC
Swift         swiftc

Language      Interpreter
Python        CPython
bash          Bash (Bourne Again SHell) interpreter
JavaScript    SpiderMonkey (Mozilla Firefox), V8 (Google Chrome)
Example of Java Compilation Process
Java language processors combine compilation and interpretation.
A Java source program may first be compiled into an intermediate form called
bytecodes. The bytecodes are then interpreted by a virtual machine.
A benefit of this arrangement is that bytecodes
compiled on one machine can be interpreted on
another machine.
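The compile-then-interpret arrangement can be illustrated with a toy example. The sketch below is not real Java bytecode: the instruction names PUSH, ADD and MUL and the functions compile_expr and run are made up for illustration. It compiles an arithmetic expression into a small stack-machine instruction list once, and a separate "virtual machine" loop then interprets that list.

```python
def compile_expr(node):
    """Compile a number or an (op, left, right) tuple into toy stack bytecode."""
    if isinstance(node, (int, float)):
        return [("PUSH", node)]
    op, left, right = node
    opcode = {"+": "ADD", "*": "MUL"}[op]  # hypothetical instruction names
    return compile_expr(left) + compile_expr(right) + [(opcode,)]


def run(bytecode):
    """Interpret the toy bytecode on a stack-based virtual machine."""
    stack = []
    for instr in bytecode:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


bytecode = compile_expr(("+", 2, ("*", 3, 4)))  # represents 2 + 3 * 4
print(bytecode)
print(run(bytecode))  # prints 14
```

The instruction list is plain data, so it could be produced on one machine and handed to the run loop on another, which is the portability benefit described above.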
Why study Compiler Construction?
1. Understand how programming languages work
2. Optimize code
3. Career opportunities
4. Academics and research
Cousins of Compiler / Language Processing System
In addition to a compiler, several other programs may be required to create an
executable target program
Preprocessor
A preprocessor is a tool that produces input for compilers
A source program may be divided into modules stored in separate files. The task of collecting
the source program is sometimes entrusted to a separate program, called a preprocessor.
File Inclusion: A preprocessor may also include header files into the program text, for example
#include <iostream>
Macro Processing: The preprocessor may also expand shorthand called macros into source
language statements like #define PI 3.14
The modified source program is then fed to a compiler.
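File inclusion and macro processing can be sketched with a toy preprocessor. This is a simplified illustration, not how a real C or C++ preprocessor is implemented: the function preprocess and the headers dictionary are hypothetical, only #include <name> and simple #define NAME value directives are handled, and header contents are stand-in strings.

```python
import re


def preprocess(source, headers):
    """Toy preprocessor: expands #include <name> and simple #define macros."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        include = re.fullmatch(r"#include\s*<(\w+)>", stripped)
        define = re.fullmatch(r"#define\s+(\w+)\s+(.+)", stripped)
        if include:
            # File inclusion: paste the header text in place of the directive.
            output.append(headers[include.group(1)])
        elif define:
            # Macro definition: remember the replacement text, emit nothing.
            macros[define.group(1)] = define.group(2)
        else:
            # Macro expansion: replace whole-word uses of each macro name.
            for name, body in macros.items():
                line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
            output.append(line)
    return "\n".join(output)


headers = {"iostream": "// (contents of iostream would appear here)"}
source = """#include <iostream>
#define PI 3.14
double area = PI * r * r;"""
print(preprocess(source, headers))
```

The printed text is what would be handed to the compiler: the directive lines are gone, the header text is in place, and PI has been replaced by 3.14.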
Compiler
The compiler may produce an assembly-language program as its output, because assembly
language is easier to produce as output and easier to debug
Assembler
The assembly language is then processed by a program called an assembler that produces
relocatable machine code as its output
The term "relocatable" comes from the fact that this type of code can be relocated, or
moved, to different memory addresses without requiring extensive modifications to the
code itself.
Linker/Loader
The linker is responsible for combining multiple object files (containing relocatable code) into
a single executable program.
It resolves external references. If one part of the code refers to a function or variable in
another part, the linker makes sure those references are correctly connected.
The loader's primary task is to place the combined program (executable code) into memory
for execution.
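The linker's two jobs, combining relocatable modules and resolving external references, can be sketched as follows. The object-file layout used here (the size, defines and uses fields) is a hypothetical simplification chosen for illustration and does not correspond to any real object-file format.

```python
def link(object_files):
    """Toy linker: place modules one after another and resolve symbol references.

    Each object file is a dict with hypothetical fields:
      size    - number of memory cells the module occupies
      defines - symbol name -> offset inside the module
      uses    - names of external symbols the module refers to
    """
    symbol_table = {}
    base = 0
    for obj in object_files:
        # Relocation: final address = base address of the module + offset.
        for name, offset in obj["defines"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = base + offset
        base += obj["size"]

    # Resolution: every external reference must match some definition.
    resolved = {}
    for obj in object_files:
        for name in obj["uses"]:
            if name not in symbol_table:
                raise ValueError(f"undefined reference: {name}")
            resolved[name] = symbol_table[name]
    return symbol_table, resolved


main_o = {"size": 10, "defines": {"main": 0}, "uses": ["square"]}
math_o = {"size": 6, "defines": {"square": 2}, "uses": []}
symbols, resolved = link([main_o, math_o])
print(symbols)   # {'main': 0, 'square': 12}
print(resolved)  # {'square': 12}
```

Because math_o is placed after the 10 cells of main_o, its symbol square moves from offset 2 to address 12, which is the kind of adjustment that relocatable code allows.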
The Structure of a Compiler (Lexical Analysis)
The first phase of a compiler is called lexical analysis or scanning
The lexical analyzer reads the stream of characters making up the source program and
groups the characters into meaningful sequences called lexemes
For each lexeme, the lexical analyzer produces as output a token of the form
<token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis
In the token, the first component token-name is an abstract symbol that is used during
syntax analysis, and the second component attribute-value points to an entry in the
symbol table for this token.
Information from the symbol-table entry is needed for semantic analysis and code
generation
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and
mapped into the following tokens passed on to the syntax analyzer
position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract
symbol standing for identifier and 1 points to the symbol-table entry for position
The assignment symbol = is a lexeme that is mapped into the token <=>. Since this
token needs no attribute value, we have omitted the second component.
initial is a lexeme that is mapped into the token <id, 2>, where 2 points to the
symbol-table entry for initial.
+ is a lexeme that is mapped into the token <+>
rate is a lexeme that is mapped into the token <id, 3>, where 3 points to the symbol-
table entry for rate
* is a lexeme that is mapped into the token <*>
60 is a lexeme that is mapped into the token <60>
Blanks separating the lexemes would be discarded by the lexical analyzer.
<id,1> <=> <id,2> <+> <id,3> <*> <60>
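The mapping from characters to tokens described above can be sketched with a small scanner. This is a minimal illustration that only recognizes identifiers, integer literals and the operators =, + and *; the names TOKEN_PATTERN and tokenize are made up for this example.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped and discarded.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+*]))"
)


def tokenize(text):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []  # entry i (counting from 1) holds the i-th distinct identifier
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # Identifier token: <id, index into the symbol table>
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append((match.group("number"),))
        else:
            # Operator tokens need no attribute value.
            tokens.append((match.group("op"),))
    return tokens, symbol_table


tokens, symbol_table = tokenize("position = initial + rate * 60")
print(" ".join("<" + ",".join(map(str, t)) + ">" for t in tokens))
# prints: <id,1> <=> <id,2> <+> <id,3> <*> <60>
print(symbol_table)  # ['position', 'initial', 'rate']
```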
The Structure of a Compiler (Syntax Analysis)
The second phase of the compiler is syntax analysis or parsing
The parser uses the first components of the tokens produced by the lexical analyzer to
create a tree-like intermediate representation that depicts the grammatical structure
of the token stream.
<id,1> <=> <id,2> <+> <id,3> <*> <60>
Syntax tree
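As a rough sketch of how a parser might build the syntax tree for this token stream, the recursive-descent parser below assumes the usual convention that * binds more tightly than +. The tuple-based tree representation and the function name parse_assignment are choices made for this illustration only.

```python
def parse_assignment(tokens):
    """Parse 'id = expr', where expr uses + and * and * binds tighter than +."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token[0] == "id" or token[0].isdigit():
            return token  # leaves are the tokens themselves
        raise SyntaxError(f"unexpected token {token}")

    def term():
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or advance() != ("=",):
        raise SyntaxError("expected: id = expression")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree


tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]
tree = parse_assignment(tokens)
print(tree)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('60',))))
```

Each interior node is an (operator, left, right) tuple, so the multiplication of <id,3> and 60 sits below the addition, which in turn sits below the assignment at the root.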
The Structure of a Compiler (Semantic Analysis)
The semantic analysis phase checks the source program for semantic errors and gathers
type information for the code-generation phase
An important part of semantic analysis is type checking
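Type checking can be sketched on the syntax tree of position = initial + rate * 60. The following is an illustration under stated assumptions, none of which come from the slides: the three identifiers are assumed to be declared as float, only the types int and float exist, and an int operand mixed with a float is wrapped in a hypothetical inttofloat conversion node.

```python
def check(node, types):
    """Return (typed_tree, type) for a leaf token or an (op, left, right) node."""
    if node[0] == "id":
        return node, types[node[1]]  # look up the declared type
    if len(node) == 1:
        return node, "int"           # integer literal such as ('60',)
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if op == "=":
        if left_type == "int" and right_type == "float":
            # Assumed rule: narrowing float to int is reported as an error.
            raise TypeError("cannot assign a float value to an int variable")
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)
        return (op, left, right), left_type
    if left_type != right_type:
        # Mixed int/float arithmetic: widen the int operand.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        left_type = "float"
    return (op, left, right), left_type


# Assumed declarations: symbol-table entries 1, 2 and 3 are all float.
types = {1: "float", 2: "float", 3: "float"}
tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("60",))))
typed_tree, result_type = check(tree, types)
print(typed_tree)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('inttofloat', ('60',)))))
```

The checker leaves well-typed parts of the tree unchanged and records the one place where an integer is used in floating-point arithmetic, which is the kind of type information later phases rely on.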
The Structure of a Compiler (Intermediate Code Generator)
After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation.
This intermediate representation should have two important properties:
◦ It should be easy to produce
◦ It should be easy to translate into the target machine
One common intermediate form is three-address code, which consists of a sequence of
assembly-like instructions with at most three operands per instruction.
The output of the intermediate code generator
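Three-address code generation can be sketched by walking the syntax tree of position = initial + rate * 60 bottom-up and giving each intermediate result a temporary name. This is a minimal illustration: the tuple-based tree, the temporary names t1, t2 and the function gen_three_address are assumptions made for this example, and type conversions are left out.

```python
def gen_three_address(tree):
    """Translate an ('=', target, expr) tree into three-address instructions."""
    code = []
    counter = 0

    def operand(node):
        nonlocal counter
        if node[0] == "id":
            return f"id{node[1]}"  # refer to the symbol-table entry
        if len(node) == 1:
            return node[0]         # literal such as '60'
        op, left, right = node
        left_name = operand(left)
        right_name = operand(right)
        counter += 1
        temp = f"t{counter}"       # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    value_name = operand(value)
    code.append(f"{operand(target)} = {value_name}")
    return code


tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("60",))))
for instruction in gen_three_address(tree):
    print(instruction)
# t1 = id3 * 60
# t2 = id2 + t1
# id1 = t2
```

Each instruction has one operator on the right-hand side and names at most three operands, and the order of the instructions fixes the order in which the operations are carried out.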