Chapter 1 - Introduction
Chapter 1 - Introduction
(CoSc-3112)
Chapter One
Introduction
Compiler is an executable program that can read a program in one high-level language
usually a high-level
A target language
program/code .
is a program/code written Source Target
program Compiler
program
in the target language, which often is a machine e.g. C++ e.g.
Assembly
language or an intermediate code (Object Code). Error
message
Cousins of Compilers
tree.
Breaks up source program into constituent pieces
During synthesis, the operations involved in producing translated code.
1. Lexical Analysis Imposes a grammatical structure on these pieces
2. Syntax Analysis Analysis
End
Fron
5. Optimization Synthesis
Bac
Takes the tree structure and translates the operations into the target
k
program
Various phases in a compilation
Analysis
1. Linear/Lexical analysis (Scanning)
The stream of characters is read from left to right and grouped into tokens.
A token is a sequence of characters having a collective meaning.
Token is the basic lexical unit.
Examples:
• Identifiers are variables
• Keywords
• Symbols (+, -, …)
• Numbers
• Etc…
Blanks, new lines, tabulation marks will be removed during lexical analysis.
Example
DIST1 = DIST2 + 5 * 4
<IDENT,1> <ASSIGN> <IDENT,2> <PLUS> <NUMB,3> <MULT> <NUMB,4>
2. Hierarchical/Syntax analysis (Parsing)
Tokens are grouped hierarchically into nested collections with collective
meaning.
The result is generally a parse tree.
Most syntactic errors in the source program are caught in this phase.
Syntactic rules of the source language are given via a Grammar.
Consider the previous example: DIST1
= DIST2 + 5 * 4
IDENT ASSIGN IDENT PLUS NUMB MULT NUMB
3. Semantic
analysis
Certain checks are performed to make sure that the components of the program fit
together meaningfully.
Unlike parsing, this phase checks for semantic errors in the source program (e.g.
type mismatch)
Semantic analysis uses the symbol table.
Symbol table:- is a data structure with a record for each identifier and its
attributes
• Attributes include storage allocation, type, scope, etc
• All the compiler phases insert and modify the symbol table
The result of semantic analysis is Intermediate Code (IC).
The IC can be represented using either abstract tree or Three address
code
The previous example in three address
code: TEMP1 = 5 * 4
TEMP2 = inttoreal( TEMP1 )
TEMP3 = IDENT2 + TEMP2
IDENT1 = TEMP3
Synthesis
Synthesis is composed of two phases:
1. Code optimization
2. Code generation
3. Code optimization
This phase changes the IC so that the code generator produces faster and less
memory
consuming program.
The optimized code does the same thing as the original (non-optimized) code but with less cost
in terms of CPU time and memory space.
There are several techniques of optimizing code and they will be discussed in the last chapter.
Example
Unnecessary lines of code in loops (i.e. code that could be executed outside of the loop) are moved
out of thefor(i=1;
loop. i<10; i++) x = y+1;
{ x = y+1; for(i=1; i<10; i+
z = x+i; +)
} z = x+i;
2. Code generation
The final phase of the compiler.
Generates the target code in the target language (e.g. Assembly)
The instructions in the IC are translated into a sequence of machine instructions
that
perform the same task.
Phase I: Lexical
Analysis
Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical analyzer.
To design a lexical analyzer: input a description (regular expressions) of the tokens in the language, and
output a lexical analyzer (a program).
Phase II: Parsing (Syntax
Analysis)
A parser gets a stream of tokens from the scanner, and determines if the syntax (structure) of the
program is correct according to the (context-free) grammar of the source language.
Then, it produces a data structure, called a parse tree or an abstract syntax tree, which describes the
if if (x > 3) then x = x + 1
Context-free grammars will be used (as the input) by the parser generator to describe the syntax of
the compiling language
Most compilers do not generate a parse tree explicitly but rather go to intermediate code directly as
Parse Tree
Is output of parsing that shows the Top-down description of program syntax
Root node is entire program and leaves are tokens that were identified during
lexical
analysis
It gets the parse tree from the parser together with information about some syntactic elements
It determines if the semantics (meanings) of the program is correct.
It detects errors of the program, such as using variables before they are declared, assign an
integer value to a Boolean variable, …
In some compilers, a source program is translated into an intermediate code first and then the
Compiler makes a second pass over the parse tree to produce the translated code
If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into the
The abstract assembly tree will be passed to the code optimization and assembly code generation
phase
Contd.
Using intermediate code is beneficial when compilers which translates a single source
Code generator coverts the abstract assembly tree into the actual assembly code
To do code generation
The generator covers the abstract assembly tree with tiles (each tile represents a small portion of
Output the actual assembly code associated with the tiles that we used to cover the tree
The final phase of compilation coverts the assembly code into machine code and links (by a linker) in
The compiler looks at a very small block of instructions and tries to determine how it
can improve the efficiency of this local code block
1. Constant evaluation
2. Strength reduction
The compiler looks at large segments of the program to decide how to improve
performance
Much more difficult; usually omitted from all but the
most sophisticated and expensive production- level “optimizing compilers”
Optimization cannot make an inefficient algorithm efficient
Compiler construction tools
More specialized tools have been created to help implement various phases of a compiler.