Compiler Construction Complete Notes

A compiler is a program that translates high-level source code into low-level machine language, ensuring correctness and speed. It consists of various types, including single pass, two pass, and multipass compilers, and follows stages such as lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. Error detection and recovery are crucial in compiler construction, with mechanisms to handle lexical, syntax, and semantic errors.

Uploaded by

Johny Webs


Compiler Construction
1: Compiler Techniques and Methodology

What is a Compiler?
A compiler is a computer program that translates source code written in a
high-level language into low-level machine language. It translates code
written in one programming language into another language without changing
its meaning.

Features of Compilers:
1. Correctness
2. Speed of compilation
3. The speed of the target code
4. Code debugging help

Types of Compiler:
Following are the different types of Compiler:
1) Single Pass Compilers
2) Two Pass Compilers
3) Multipass Compilers
Single Pass Compiler:

A single pass compiler transforms source code directly into machine code in
one pass over the program. Pascal, for example, was designed so that it could
be compiled in a single pass.

Two Pass Compiler:


Two pass Compiler is divided into two sections, viz.
1. Front end: It maps legal code into Intermediate Representation (IR).
2. Back end: It maps IR onto the target machine

Multipass Compilers:

A multipass compiler processes the source code or syntax tree of a program
several times. It divides the work of translation into multiple passes and
produces multiple intermediate representations. Each pass takes the output of
the previous pass as its input, so the compiler needs less memory at any one
time. It is also known as a 'wide compiler'.

Steps for Language Processing Systems:

Before studying compilers in depth, you first need to understand a few other
tools that work with compilers, such as preprocessors, assemblers, linkers,
and loaders.

Advantages of Compiler Design:

1. Efficiency: Compiled code generally runs faster than interpreted code.
2. Error Checking: Many errors are reported before the program ever runs.
3. Optimizations: The compiler can automatically improve the generated code.

Disadvantages of Compiler Design:
1. Longer Development Time: The edit-compile-run cycle is slower than
running code directly in an interpreter.
2. Debugging Difficulties: Errors are reported for the whole program, which
can make them harder to trace to their source.
3. Platform-Specific Code: The generated machine code is tied to a
particular target architecture.

Compiler Techniques and Methodology:


Compiler techniques and
methodology are the principles and practices that guide the design and
implementation of compilers.

Stages of Compiler Techniques and Methodology:


 Scanning and parsing: These are the processes of analyzing
the syntax and structure of the source code and building an intermediate
representation, such as an abstract syntax tree, that captures its
meaning.
 Semantic analysis: This is the process of checking the validity
and consistency of the source code, such as type checking, scope
checking, and name resolution.
 Code generation: This is the process of translating the
intermediate representation into executable code for the target machine
or platform, such as assembly language or bytecode.
 Optimization: This is the process of improving the quality and
performance of the executable code by applying various techniques,
such as data flow analysis, register allocation, instruction scheduling,
and loop transformation.

2: Organization of Compilers
The organization of
compilers in compiler construction involves breaking down the compiler
into several distinct phases or components, each responsible for specific
tasks in the process of translating a high-level programming language
into machine code or an intermediate representation. The traditional
organization of compilers follows a structure known as the "compiler
front end" and "compiler back end."

Structure of a compiler:
Any large software is easier
to understand and implement if it is divided into well-defined modules.

Front End:

 Lexical Analysis (Scanner): This is the first phase, where the
source code is broken down into a sequence of tokens.
 Syntax Analysis (Parser): This phase checks whether the
sequence of tokens adheres to the grammatical structure of the
programming language.
 Semantic Analysis: This phase checks the meaning of the
statements and expressions in the program. It ensures that the
program follows the language's semantics and performs tasks like
type checking.

Intermediate Code Generation:

 After the front end, the compiler may generate an intermediate
representation (IR) of the program. The IR is an abstraction that
simplifies the source code while preserving its essential meaning.

Optimization:

 The compiler performs various optimizations on the intermediate
code to improve the efficiency of the generated machine code.

Back End:

 Code Generation: In this phase, the compiler generates the
target machine code or assembly code from the optimized
intermediate code.
 Code Optimization (Machine-Dependent): This phase
optimizes the generated machine code for the specific target
architecture. It may include instruction scheduling, register
allocation, and other architecture-specific optimizations.
 Code Emission: The final step involves emitting the machine
code or generating an executable file from the optimized code.

Additional Considerations:

 Error Handling: Throughout the compilation process, compilers
must handle errors gracefully, providing meaningful error
messages.
 Debugging Information: Compilers often include information
in the executable to aid in debugging, such as source code line
numbers or variable names.
 Cross-Compilation: Some compilers support generating code
for a different target architecture than the one on which the
compiler itself runs.

3: Lexical and Syntax Analysis


Lexical analysis and
syntax analysis are two crucial phases in the process of compiler
construction. They are responsible for analyzing the source code of a
programming language and converting it into a form that can be further
processed by the compiler.
Lexical Analysis:
1. Purpose:
 Tokenization: The main goal of lexical analysis is to break
down the source code into a sequence of tokens. Tokens are
the smallest units of meaning in a programming language,
such as keywords, identifiers, literals, and operators.

2. Components:
 Lexer/Tokenizer: This is the component responsible for
scanning the source code and identifying the tokens.
 Regular Expressions: These rules define the patterns for
different types of tokens.

3. Steps in Lexical Analysis:


 Scanning: The lexer scans the source code character by
character.
 Token Recognition: It recognizes and categorizes sequences
of characters into tokens based on predefined rules.
 Error Handling: Lexical analysis also involves detecting
and reporting lexical errors, such as invalid characters or
tokens.
4. Output:
 The output of lexical analysis is a stream of tokens that
serves as input for the subsequent phases of the compiler.

Syntax Analysis:
1. Purpose:
 Grammar Verification: Syntax analysis checks whether the
sequence of tokens generated by the lexical analysis
conforms to the grammatical structure of the programming
language.
 AST Construction: It builds a hierarchical structure called
the Abstract Syntax Tree (AST) that represents the syntactic
structure of the program.

2. Components:
 Parser: The parser is responsible for analyzing the
arrangement of tokens and ensuring that it follows the syntax
rules of the language.
 Context-Free Grammar (CFG): Syntax rules are often
specified using CFG, which describes the syntactic structure
of the language.
 Error Handling: The syntax analysis phase detects and
reports syntax errors.

3. Steps in Syntax Analysis:


 Parsing: The parser processes the stream of tokens generated
by the lexer and checks whether it conforms to the language's
syntax rules.
 Error Reporting: Syntax analysis also involves reporting
detailed error messages when syntax errors are encountered.

4. Output:
 The output of syntax analysis is the AST, which serves as the
basis for subsequent phases like semantic analysis,
optimization, and code generation.

Example:
E → E + E
E → E - E
E → id
For the string id + id - id, this grammar generates two distinct parse trees,
which shows that the grammar is ambiguous: one tree groups the expression as
(id + id) - id, and the other as id + (id - id).

Special Symbols:
Most high-level languages use special symbols such as the following:

Name                  Symbols
Punctuation           Comma (,), Semicolon (;)
Assignment            =
Special Assignment    +=, -=, *=, /=
Comparison            ==, ≠, <, >, ≤, ≥
Preprocessor          #
Location Specifier    &
Logical               &&, ||, !
Shift Operator        >>, <<, >>>

Now let us look at a small piece of C++ code:

#include <iostream>

// Return the larger of two integers
int maximum(int x, int y) {
    if (y > x)
        return y;
    else
        return x;
}

4: Parsing Techniques
The process of transforming data from one format to another is called
parsing, and it is carried out by a parser. The parser is the component of
the translator that organizes linear text according to a set of defined rules
known as a grammar.

The process of Parsing:



Types of Parsing:

There are two types of parsing:
1) Top-down parsing
2) Bottom-up parsing

Top-down Parsing:
When the parser builds the parse tree from the top (the start
symbol) down to the leaves, following the left-most derivation of the input,
it is called top-down parsing. Top-down parsing starts with the start symbol
and ends at the terminals. Such parsing is also known as predictive parsing.

 Recursive Descent Parsing: Recursive descent parsing is a
top-down technique in which each terminal and non-terminal of the
grammar is handled by its own procedure. It reads the input from
left to right and constructs the parse tree from the top down.
 Back-tracking: A back-tracking parser starts from the initial
pointer, the root node. If a derivation fails, it restarts the
process with different production rules.

Bottom-up Parsing:
Bottom-up parsing works in the reverse direction of top-down
parsing: it starts from the input symbols and traces the right-most
derivation of the input in reverse, reducing until it reaches the start
symbol.

Shift-Reduce Parsing:
Shift-reduce parsing works in two steps: a shift step and a reduce step.
a. Shift step:
The shift step advances the input pointer to the next input symbol,
which is pushed (shifted) onto the parser's stack.
b. Reduce step:
When the parser finds a complete right-hand side of a grammar rule
on top of the stack, it replaces it with the corresponding
left-hand-side non-terminal.

LR Parsing:
The LR parser is one of the
most efficient syntax analysis techniques, and it works with a large class of
context-free grammars. In LR parsing, L stands for scanning the input from
left to right, and R stands for constructing a right-most derivation in
reverse.

Why is parsing useful in compiler designing?


In the world of
software, every entity has its own criteria for the data it processes.
Parsing is the process that transforms data so that it can be understood by a
specific piece of software.

Technologies that use parsers:

 Programming languages like Java.
 Database languages like SQL.
 Protocols like HTTP.
 Markup languages like XML and HTML.
5: Object Code Generation and Optimization
Object code
generation and optimization are crucial phases in the process of compiler
construction. These phases are responsible for translating high-level
programming languages into machine code that can be executed by a
computer's hardware efficiently.

Code generation and optimization involve several stages:
1. Intermediate Code Generation: The front-end of the compiler
generates an intermediate representation of the source code.
2. Intermediate Code Optimization: Some compilers perform
initial optimization on the intermediate code before generating the
object code.
3. Object Code Generation: The optimized intermediate code is
translated into machine code or assembly language.
4. Final Code Optimization: Further optimizations are applied to
the generated object code to improve performance.
Example of object code generation and optimization for a C
program:

// C program
int x = 10;
int y = 20;
int z = x + y;

// Intermediate code (three-address code)
t1 = 10
t2 = 20
t3 = t1 + t2
x = t1
y = t2
z = t3

// Object code (x86 assembly)
mov eax, 10      ; t1 = 10
mov ebx, 20      ; t2 = 20
mov [x], eax     ; x = t1
mov [y], ebx     ; y = t2
add eax, ebx     ; t3 = t1 + t2
mov [z], eax     ; z = t3

// Optimized object code (x86 assembly): the compiler folds the
// constants, so z = 30 is computed at compile time
mov dword [x], 10    ; x = 10
mov dword [y], 20    ; y = 20
mov dword [z], 30    ; z = x + y, folded to 30

Code optimization is done in the following different ways:

1. Compile Time Evaluation:

(i) A = 2*(22.0/7.0)*r
Evaluate the constant part 2*(22.0/7.0) at compile time, so only the
multiplication by r remains at run time.
(ii) x = 12.4
y = x/2.3
Evaluate x/2.3 as 12.4/2.3 at compile time.

2. Variable Propagation:

//Before Optimization
c = a * b
x = a
... // statements that do not change a, b, or x
d = x * b + 4
//After Optimization
c = a * b
x = a
... // statements that do not change a, b, or x
d = a * b + 4

3. Constant Propagation:
If the value of a
variable is known to be a constant, then replace each use of the variable with
that constant. This applies only where the variable is guaranteed to still
hold the constant value.

Example:
//Before Optimization
pi = 3.14
area = pi * r * r
//After Optimization
pi = 3.14
area = 3.14 * r * r
(The now-unused assignment to pi can later be removed by dead code
elimination.)

4. Copy Propagation:
It is an extension of
constant propagation: a use of a variable that was assigned a copy of another
variable (x = a) is replaced by the original (a). It helps reduce copying at
run time.

Example:
//Before Optimization
c = a * b
x = a
... // statements that do not change a, b, or x
d = x * b + 4

//After Optimization
c = a * b
x = a
... // statements that do not change a, b, or x
d = a * b + 4

5. Common Sub-Expression Elimination:

In the example above, after copy propagation, a*b appears in both c = a*b and
d = a*b + 4. It is a common sub-expression: it can be computed once and the
result reused.

6. Dead Code Elimination:

Copy propagation often turns assignment statements (such as x = a above) into
dead code, which can then be removed.

Example:
//Before Optimization
c = a * b
x = a
... // statements that do not use x
d = a * b + 4
//After Elimination
c = a * b
... // statements that do not use x
d = a * b + 4

7. Function Cloning:
Here, specialized versions
of a function are created for different calling parameters.

Example: Function Overloading

7: Detection and Recovery from Errors


In compiler
construction, error detection and recovery mechanisms play a crucial
role in ensuring that a compiler can handle erroneous input and produce
meaningful output. Errors can occur at various stages of the compilation
process, such as lexical analysis, syntax analysis, semantic analysis, and
code generation.

Error Detection and Recovery in Compiler Construction:

1. Error Detection:

 Lexical Errors:
 Definition: Lexical errors involve invalid characters or
token sequences.
 Detection: Lexical analyzers (scanners) examine the source
code and identify errors by recognizing characters that do not
form valid tokens or violate lexical rules.

 Syntax Errors:
 Definition: Syntax errors occur when the input source code
violates the grammar rules of the programming language.
 Detection: Syntax analyzers (parsers) detect these errors
during the parsing phase by analyzing the structure of the
code.

 Semantic Errors:
 Definition: Semantic errors involve violations of the
language's semantics, such as using a variable before it is
declared.
 Detection: Semantic analysis identifies these errors during
the semantic analysis phase.

2. Panic Mode Recovery:


 Definition: Panic mode recovery involves discarding tokens until
a synchronizing token is found.
 Purpose: It helps the compiler recover from a syntax error and
continue parsing the source code.

3. Code Generation and Optimization Errors:


 Definition: Errors in later stages may involve incorrect
translations or inefficient code generation.
 Handling: The compiler detects and reports these errors to ensure
the generation of correct and optimized machine code.

4. User-Defined Errors:
 Definition: Compilers may allow programmers to define custom
error-handling routines or specify error-handling behavior.
 Purpose: Provides flexibility in handling errors based on the
specific requirements of a programming project.

8: Contrast between Compilers and Interpreters



Compiler:
A compiler is a translator that takes a high-level language as input and
produces low-level output, i.e., machine or assembly language. The work of a
compiler is to transform code written in a programming language into machine
code (a format of 0s and 1s) so that computers can understand it.
 A compiler is more intelligent than an assembler: it checks all kinds
of limits, ranges, errors, etc.
 However, compilation takes more time, and the compiler occupies a
larger part of memory.

Advantages of Compiler:
 Compiled code runs faster in comparison to Interpreted code.
 Compilers help in improving the security of Applications.

Disadvantages of Compiler:
 The compiler can catch only syntax errors and some semantic
errors.
 Compilation can take more time in the case of bulky code.

Interpreter:
An interpreter is a program that translates and executes a programming
language statement by statement. An interpreter may work on pre-compiled
code, an intermediate representation, or the source code itself.
 It translates and executes only one statement of the program at a time.
 Interpreters are, more often than not, smaller than compilers.

Advantages of Interpreter:
 Programs written in an Interpreted language are easier to debug.
 Interpreted Language is more flexible than a Compiled language.

Disadvantages of Interpreter:
 The interpreter can run only the corresponding Interpreted
program.
 Interpreted code runs slower in comparison to Compiled code.