Compiler Construction Complete Notes
Compiler Construction
1: Compiler Techniques and Methodology
What is a Compiler?
A compiler is a computer program that transforms source code written
in a high-level language into low-level machine language. It translates
code written in one programming language into another language without
changing the meaning of the code.
Features of Compilers:
1. Correctness
2. Speed of compilation
3. The speed of the target code
4. Code debugging help
Types of Compiler:
Following are the different types of Compiler:
1) Single Pass Compilers
2) Two Pass Compilers
3) Multipass Compilers
Single Pass Compiler:
A single pass compiler reads the source code once and transforms it directly
into machine code. For example, Pascal was designed to allow single pass
compilation.
Multipass Compilers:
The multipass compiler processes the source code or syntax tree of a program
several times. It divides a large program into multiple small programs and
processes them, producing multiple intermediate codes along the way. Each
pass takes the output of the previous pass as its input, so the compiler
requires less memory at any one time. It is also known as a ‘Wide Compiler’.
2: Organization of Compilers
A compiler is organized by breaking it down into several distinct phases
or components, each responsible for a specific task in the process of
translating a high-level programming language into machine code or an
intermediate representation. The traditional organization splits the
compiler into a "compiler front end" and a "compiler back end."
Structure of a compiler:
Any large piece of software is easier to understand and implement if it
is divided into well-defined modules.
Front End:
Optimization:
Back End:
Additional Considerations:
2. Components:
Lexer/Tokenizer: This is the component responsible for
scanning the source code and identifying the tokens.
Regular Expressions: These rules define the patterns for
different types of tokens.
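As a rough illustration of how regular expressions drive a lexer, here is a
minimal Python sketch. The token names (NUMBER, ID, OP, SKIP), the patterns,
and the function name tokenize are assumptions made for this example; a real
language would define many more token types.

```python
import re

# Hypothetical token patterns, tried in this order at each position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan source left to right and return a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # A character that starts no valid token is a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":      # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For the input "z = x + 20" this sketch yields the tokens ID z, OP =, ID x,
OP + and NUMBER 20.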
Syntax Analysis:
1. Purpose:
Grammar Verification: Syntax analysis checks whether the
sequence of tokens generated by the lexical analysis
conforms to the grammatical structure of the programming
language.
AST Construction: It builds a hierarchical structure called
the Abstract Syntax Tree (AST) that represents the syntactic
structure of the program.
2. Components:
Parser: The parser is responsible for analyzing the
arrangement of tokens and ensuring that it follows the syntax
rules of the language.
Context-Free Grammar (CFG): Syntax rules are often
specified using CFG, which describes the syntactic structure
of the language.
Error Handling: The syntax analysis phase detects and
reports syntax errors.
4. Output:
The output of syntax analysis is the AST, which serves as the
basis for subsequent phases like semantic analysis,
optimization, and code generation.
Example:
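E → E + E
E → E – E
E → id
For the string id + id – id, the above grammar generates two parse trees,
one grouping the input as (id + id) – id and the other as id + (id – id),
so the grammar is ambiguous.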
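The number of parse trees for a string under this grammar can be checked
mechanically. The following Python sketch is only an illustration: the
function name count_parse_trees is hypothetical, and the tokens are written
with ASCII "+" and "-".

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E+E | E-E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

Calling count_parse_trees(["id", "+", "id", "-", "id"]) returns 2, while a
string with a single operator such as id + id has exactly one tree.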
Special Symbols:
Most of the high-level languages contain some special symbols, as
shown below:
Name                 Symbols
Punctuation          Comma (,), Semicolon (;)
Assignment           =
Special Assignment   +=, -=, *=, /=
Comparison           ==, ≠, <, >, ≤, ≥
Preprocessor         #
Location Specifier   &
Logical              &&, |, ||, !
Shift Operator       >>, <<, >>>, <<<
4: Parsing Techniques
Parsing is the process of analyzing a linear sequence of text (the tokens)
and organizing it into a structure according to a set of defined rules,
which is known as a grammar. This process is carried out by the parser,
a component of the translator.
Types of Parsing:
Top-down Parsing:
Top-down parsing builds the parse from the start symbol downward,
expanding nonterminals so that it traces the leftmost derivation of the
input. It starts with the start symbol and ends at the terminals.
Predictive parsing is a common form of top-down parsing that needs no
backtracking.
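A minimal recursive-descent sketch of top-down parsing in Python follows.
It assumes a small grammar without left recursion, chosen for this example
only (E → T E', E' → + T E' | - T E' | empty, T → id), and the function
name parse_expression and the nested-tuple tree format are hypothetical.

```python
def parse_expression(tokens):
    """Top-down parser for E -> T E', E' -> + T E' | - T E' | empty, T -> id.
    Returns the syntax tree as nested tuples (operator, left, right)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {peek()!r} at {pos}")
        pos += 1

    def term():                      # T -> id
        expect("id")
        return "id"

    def expression():                # E -> T E', with E' written as a loop
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            expect(op)
            node = (op, node, term())
        return node

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at {pos}")
    return tree
```

Each nonterminal becomes one function, and the next token alone decides
which rule to apply. For id + id - id the sketch returns
("-", ("+", "id", "id"), "id").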
Bottom-up Parsing:
Bottom-up parsing works in the reverse direction of top-down parsing.
It starts from the input and traces the rightmost derivation in reverse
until it reaches the start symbol.
Shift-Reduce Parsing:
Shift-reduce parsing works in two steps: the shift step and the reduce
step.
a. Shift step:
The shift step advances the input pointer to the next input symbol,
which is shifted onto the stack.
b. Reduce step:
When the top of the stack holds the complete right-hand side (RHS) of a
grammar rule, the parser replaces it with the left-hand side (LHS) of
that rule.
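The two steps can be traced with a small Python sketch. It assumes the
grammar E → E + E | E - E | id and a simple strategy of reducing whenever
the top of the stack matches a right-hand side; real LR parsers decide
between shift and reduce with a parsing table instead. The names RULES and
shift_reduce are hypothetical.

```python
# Assumed grammar for illustration: E -> E + E | E - E | id
RULES = [
    (("E", "+", "E"), "E"),
    (("E", "-", "E"), "E"),
    (("id",), "E"),
]

def shift_reduce(tokens):
    """Return the list of parser actions, or raise SyntaxError."""
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        for rhs, lhs in RULES:
            if tuple(stack[-len(rhs):]) == rhs:
                # Reduce: replace the matched right-hand side with the LHS.
                del stack[-len(rhs):]
                stack.append(lhs)
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if remaining:
                # Shift: move the next input symbol onto the stack.
                stack.append(remaining.pop(0))
                actions.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                actions.append("accept")
                return actions
            else:
                raise SyntaxError(f"cannot parse, stack = {stack}")
```

For the input id + id the actions are: shift id, reduce E -> id, shift +,
shift id, reduce E -> id, reduce E -> E + E, accept.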
LR Parsing:
The LR parser is one of the most efficient syntax analysis techniques,
and it works with a large class of context-free grammars. In LR parsing,
L stands for scanning the input from left to right, and R stands for
constructing a rightmost derivation in reverse.
// C program
int x = 10;
int y = 20;
int z = x + y;
(i) A = 2*(22.0/7.0)*r
Perform the constant part 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluate x/2.3 as 12.4/2.3 at compile time.
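Example (i) can be sketched with a small constant folder. This sketch uses
Python's standard ast module (Python 3.9 or later for ast.unparse) and
Python expression syntax as a stand-in for the source language; the names
ConstantFolder and fold are hypothetical.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)          # fold the operands first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in _OPS):
            try:
                value = _OPS[type(node.op)](node.left.value, node.right.value)
            except (ZeroDivisionError, TypeError):
                return node               # leave it for run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold(source):
    """Return source with constant subexpressions evaluated."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

For example, fold("x = 2 * 3 + y") returns "x = 6 + y". Applied to
"A = 2 * (22.0 / 7.0) * r", it leaves a single constant multiplied by r,
since r is not known until run time.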
2. Variable Propagation:
//Before Optimization
c=a*b
x=a
till
d=x*b+4
//After Optimization
c=a*b
x=a
till
d=a*b+4
3. Constant Propagation:
If the value of a
variable is a constant, then replace the variable with the constant. The
variable may not always be a constant.
Example:
(i) A = 2*(22.0/7.0)*r
Performs the constant part 2*(22.0/7.0) at compile time.
(ii) x = 12.4
y = x/2.3
Evaluates x/2.3 as 12.4/2.3 at compile time.
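Example (ii) can be sketched in Python for straight-line code. The
representation is an assumption made for this example: each statement is a
(target, operands) pair, variable names and operators are strings, and
constants are numbers. The function name propagate_constants is
hypothetical.

```python
def propagate_constants(statements):
    """Replace variables whose value is a known constant.
    statements: list of (target, operands), e.g. ("y", ["x", "/", 2.3])."""
    known = {}      # variable name -> constant value currently known
    result = []
    for target, operands in statements:
        new_operands = [known.get(tok, tok) if isinstance(tok, str) else tok
                        for tok in operands]
        result.append((target, new_operands))
        if len(new_operands) == 1 and isinstance(new_operands[0], (int, float)):
            known[target] = new_operands[0]   # target now holds a constant
        else:
            known.pop(target, None)           # target is no longer constant
    return result
```

With the statements x = 12.4 and y = x/2.3, the second statement becomes
y = 12.4/2.3, which a later folding step can evaluate at compile time.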
4. Copy Propagation:
It is an extension of constant propagation: after a copy such as x=a,
later uses of x are replaced with a. It helps in reducing the run time,
as it reduces copying.
Example:
//Before Optimization
c=a*b
x=a
till
d=x*b+4
//After Optimization
c=a*b
x=a
till
d=a*b+4
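The copy propagation example above can be sketched in Python for
straight-line code. The (target, operands) representation with all tokens
as strings and the function name propagate_copies are assumptions made
for this example.

```python
def propagate_copies(statements):
    """Replace uses of x with a after a copy statement x = a.
    statements: list of (target, operands), all tokens given as strings."""
    copies = {}     # x -> a for every copy x = a that is still valid
    result = []
    for target, operands in statements:
        new_operands = [copies.get(tok, tok) for tok in operands]
        result.append((target, new_operands))
        # Assigning to target invalidates copies that read or write it.
        copies = {x: a for x, a in copies.items() if target not in (x, a)}
        if (len(new_operands) == 1 and new_operands[0].isidentifier()
                and new_operands[0] != target):
            copies[target] = new_operands[0]
    return result
```

Given c=a*b, x=a and d=x*b+4, the last statement is rewritten to d=a*b+4,
matching the optimized code above.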
Example:
//Before Optimization
c=a*b
x=a
till
d=a*b+4
//After elimination:
c=a*b
till
d=a*b+4
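In the elimination above, x=a is removed because x is never read
afterwards. A minimal Python sketch of this kind of removal follows. It
assumes straight-line code without side effects, and the (target, operands)
representation, the live_out parameter, and the function name
eliminate_dead_code are hypothetical.

```python
def eliminate_dead_code(statements, live_out):
    """Drop assignments whose target is never read afterwards.
    statements: list of (target, operands), all tokens given as strings.
    live_out: names of variables still needed after this code."""
    live = set(live_out)
    kept = []
    # Walk backwards so that later reads are known at each assignment.
    for target, operands in reversed(statements):
        if target in live:
            kept.append((target, operands))
            live.discard(target)
            live.update(tok for tok in operands if tok.isidentifier())
        # Otherwise the assignment is dead and is dropped.
    kept.reverse()
    return kept
```

With c and d needed afterwards, the statements c=a*b, x=a, d=a*b+4 reduce
to c=a*b and d=a*b+4.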
7. Function Cloning:
Here, specialized codes
for a function are created for different calling parameters.
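As a hand-written illustration of the idea, the Python sketch below shows
a general function next to a clone specialized for one calling parameter.
Both function names are hypothetical, and a compiler would generate such a
clone automatically rather than rely on the programmer.

```python
def power(base, exponent):
    """General version: multiplies base by itself exponent times."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result

def power_exponent_2(base):
    """Clone specialized for call sites where exponent is the constant 2."""
    return base * base

# A call such as power(x, 2) could be redirected to power_exponent_2(x),
# which avoids the loop entirely.
```

The clone computes the same result as the general version for the
specialized argument, so power(7, 2) and power_exponent_2(7) both give 49.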
1. Error Detection:
Lexical Errors:
Definition: Lexical errors involve invalid characters or
token sequences.
Detection: Lexical analyzers (scanners) examine the source
code and identify errors by recognizing characters that do not
form valid tokens or violate lexical rules.
Syntax Errors:
Definition: Syntax errors occur when the input source code
violates the grammar rules of the programming language.
Detection: Syntax analyzers (parsers) detect these errors
during the parsing phase by analyzing the structure of the
code.
Semantic Errors:
Definition: Semantic errors involve violations of the
language's semantics, such as using a variable before it is
declared.
Detection: Semantic analysis identifies these errors during
the semantic analysis phase.
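The semantic error mentioned above, using a variable before it is declared,
can be detected with a very small check. The Python sketch below assumes
the program has already been reduced to a list of ("declare", name) and
("use", name) events in program order; that representation and the function
name check_declared_before_use are hypothetical.

```python
def check_declared_before_use(statements):
    """Return error messages for variables used before declaration.
    statements: list of ("declare", name) or ("use", name) pairs."""
    declared = set()
    errors = []
    for line, (kind, name) in enumerate(statements, start=1):
        if kind == "declare":
            declared.add(name)
        elif name not in declared:
            errors.append(f"line {line}: '{name}' used before declaration")
    return errors
```

For the events declare x, use x, use y, the check reports one error, for y
on line 3.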
4. User-Defined Errors:
Definition: Compilers may allow programmers to define custom
error-handling routines or specify error-handling behavior.
Purpose: Provides flexibility in handling errors based on the
specific requirements of a programming project.
Compiler:
The compiler is a translator that takes a high-level language as input
and produces a low-level language, i.e. machine or assembly language, as
output. The work of a compiler is to transform code written in a
programming language into machine code (a format of 0s and 1s) so that
computers can understand it.
A compiler is more intelligent than an assembler: it checks all kinds
of limits, ranges, errors, etc.
However, compilation takes more time, and the compiler occupies a larger
part of memory.
Advantages of Compiler:
Compiled code runs faster in comparison to Interpreted code.
Compilers help in improving the security of Applications.
Disadvantages of Compiler:
The compiler can catch only syntax errors and some semantic
errors.
Compilation can take more time in the case of bulky code.
Interpreter:
An interpreter is a program that translates a programming language
into a form the machine can execute, running the program as it goes. The
interpreter converts the high-level language to an intermediate language.
It contains pre-compiled code, source code, etc.
It translates only one statement of the program at a time.
Interpreters are, more often than not, smaller than compilers.
Advantages of Interpreter:
Programs written in an Interpreted language are easier to debug.
Interpreted Language is more flexible than a Compiled language.
Disadvantages of Interpreter:
The interpreter can run only the corresponding Interpreted
program.
Interpreted code runs slower in comparison to Compiled code.