Compiler Design Notes
Introduction to Compilers
What is a Compiler?
A compiler is a program that translates an entire source program written in a high-level language into machine code (or another target form) before the program is run.
What is an Interpreter?
An interpreter is a program that executes code line by line without translating the entire program into machine code first. It reads, translates, and runs each line one by one.
Error Detection: A compiler shows all errors after compiling the whole program; an interpreter shows errors line by line as it runs.
Phases of a Compiler
A compiler works in several steps, called phases, to convert source code into executable
code:
1. Lexical Analysis: Breaks the code into small pieces called tokens (like keywords,
symbols).
2. Syntax Analysis: Checks if tokens follow the grammar rules of the language.
3. Semantic Analysis: Ensures the code makes sense (e.g., variables are declared
before use).
4. Intermediate Code Generation: Creates a middle-level code that is easier to optimize.
5. Code Optimization: Improves the intermediate code so it runs faster or uses less memory.
6. Code Generation: Produces the final target (machine) code.
7. Symbol Table Management: Keeps track of variables, functions, and their details.
The lexical analyzer (or scanner) is the first phase of a compiler. It:
• Groups characters into meaningful units called tokens (e.g., int, +, variable_name).
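The scanner's job of grouping characters into tokens can be sketched in C. This is a minimal, illustrative tokenizer; the token names and the 32-character text limit are assumptions for the sketch, not part of any real compiler:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for a toy language. */
typedef enum { TOK_NUM, TOK_IDENT, TOK_PLUS, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];
} Token;

/* next_token: reads one token from *src, advancing the pointer. */
Token next_token(const char **src) {
    Token t = { TOK_EOF, "" };
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;               /* skip whitespace */
    if (*p == '\0') { *src = p; return t; }
    if (isdigit((unsigned char)*p)) {                     /* number: [0-9]+ */
        int i = 0;
        while (isdigit((unsigned char)*p) && i < 31) t.text[i++] = *p++;
        t.text[i] = '\0'; t.kind = TOK_NUM;
    } else if (isalpha((unsigned char)*p) || *p == '_') { /* identifier */
        int i = 0;
        while ((isalnum((unsigned char)*p) || *p == '_') && i < 31) t.text[i++] = *p++;
        t.text[i] = '\0'; t.kind = TOK_IDENT;
    } else if (*p == '+') {
        t.kind = TOK_PLUS; t.text[0] = '+'; t.text[1] = '\0'; p++;
    } else {
        p++;  /* skip an unrecognized character */
    }
    *src = p;
    return t;
}
```

Calling next_token repeatedly on "count + 42" yields an identifier, a plus sign, and a number, which is exactly the character-grouping step described above.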
Regular Expressions
Regular expressions are patterns that describe sets of strings. The lexical analyzer uses them to specify what each token looks like (e.g., [0-9]+ for numbers).
Finite Automata
A finite automaton (FA) is a simple machine that recognizes patterns in text. It has:
• States: A finite set of conditions the machine can be in, including one start state.
• Transitions: Rules for moving from one state to another on each input character.
• Accepting State: If the machine reaches this state, the input matches the pattern.
There are two types:
1. Deterministic Finite Automata (DFA): Only one possible transition for each input.
2. Non-deterministic Finite Automata (NFA): May have several possible transitions (or ε-moves) for the same input.
• The NFA can then be converted into a DFA for faster processing.
• This process helps the lexical analyzer recognize tokens based on regex patterns.
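A hand-coded DFA makes this concrete. The sketch below recognizes identifiers matching the regex [a-zA-Z_][a-zA-Z0-9_]*; the state names are illustrative:

```c
#include <ctype.h>

/* DFA states for the regex [a-zA-Z_][a-zA-Z0-9_]* */
enum { S_START, S_IDENT, S_DEAD };

/* step: the DFA transition function - one next state per (state, input). */
int step(int state, char c) {
    switch (state) {
    case S_START:
        return (isalpha((unsigned char)c) || c == '_') ? S_IDENT : S_DEAD;
    case S_IDENT:
        return (isalnum((unsigned char)c) || c == '_') ? S_IDENT : S_DEAD;
    default:
        return S_DEAD;  /* dead state: no way back */
    }
}

/* Run the DFA; accept only if we end in the accepting state S_IDENT. */
int matches_identifier(const char *s) {
    int state = S_START;
    for (; *s; s++) state = step(state, *s);
    return state == S_IDENT;
}
```

Because each (state, character) pair has exactly one next state, this is deterministic; an NFA for the same pattern would be converted to a table like this before use.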
Passes
• A pass is one complete run through the source code by the compiler.
• A compiler may need multiple passes to complete all phases (e.g., one pass for
lexical analysis, another for code generation).
Bootstrapping
Bootstrapping is the process of writing a compiler for a language using the same language.
For example:
• Writing a C compiler in C.
LEX
LEX is a tool that automatically creates a lexical analyzer. You give it:
• A specification file with a regular expression for each token.
• Actions (C code) to run when each pattern matches.
Parsing
What is Parsing?
Parsing is the process of analyzing the structure of the source code to check if it follows
the grammar rules of the programming language. It’s done by the parser in the syntax
analysis phase.
Role of Parser
The parser:
• Takes tokens from the lexical analyzer.
• Checks if the tokens form valid sentences according to the language’s grammar.
Context-Free Grammars
A grammar has four parts:
• Terminals: The basic symbols (tokens) of the language.
• Non-terminals: Symbols that stand for groups of strings (e.g., expr, term).
• Productions: Rules that describe how non-terminals are formed (e.g., expr → expr + term).
• Start Symbol: The main non-terminal that represents the entire program.
Derivations
A derivation is the process of applying grammar rules to create a valid sentence. For
example:
• Grammar: S → aS | b
• Derivation of aab: S ⇒ aS ⇒ aaS ⇒ aab
Parse Trees
A parse tree is a tree-like diagram that shows how a sentence is derived from the grammar. The root is the start symbol, internal nodes are non-terminals, and the leaves are the tokens of the sentence.
Ambiguity
A grammar is ambiguous if a single sentence can have multiple parse trees. For example:
• Grammar: E → E + E | num
• Sentence: 2 + 3 + 4
• Possible parse trees: (2 + 3) + 4 or 2 + (3 + 4).
Ambiguity causes confusion, so it must be eliminated.
Left Recursion
Left recursion happens when a grammar rule starts with the same non-terminal (e.g., A → Aα | β). This makes top-down parsers loop forever. To eliminate it, rewrite A → Aα | β as A → βA', A' → αA' | ε.
• Example:
o Original: E → E + T | T
o After: E → T E', E' → + T E' | ε
Left Factoring
Left factoring removes common prefixes from grammar rules to make parsing easier. For
example:
• Original: A → aB | aC
• After: A → aD, D → B | C
The Dangling-Else Problem
The dangling-else problem occurs in grammars for if-else statements, where it’s unclear which if an else belongs to. For example, in if A then if B then S1 else S2, the else could attach to either if. To fix it:
• Rewrite the grammar to enforce that else binds to the nearest if.
• Example grammar:
o stmt → matched | unmatched
o matched → if expr then matched else matched | other
o unmatched → if expr then stmt | if expr then matched else unmatched
Classes of Parsing
1. Top-Down Parsing: Starts from the start symbol and builds the parse tree downward.
2. Bottom-Up Parsing: Starts from the tokens and builds the parse tree upward.
Top-Down Parsing
In top-down parsing, the parser starts with the start symbol and tries to derive the input
sentence.
Backtracking
• The parser tries different grammar rules and backtracks if a choice leads to a dead
end.
Recursive Descent Parsing
• A top-down method with one function per non-terminal of the grammar.
• Example: For grammar E → T + E | T, there’s a function for E that calls functions for T and E.
Predictive Parsers
• A predictive parser guesses the next rule to apply based on the current token.
• Fast and efficient but requires the grammar to be suitable (e.g., LL(1)).
LL(1) Grammars
• The parser reads the input from left to right, builds a leftmost derivation, and looks
at one token ahead to make decisions.
• Requirements:
o No ambiguity.
o No left recursion.
o Rules must be left-factored.
• A grammar is LL(1) if the parser can always choose the correct rule by looking at the
next token.
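The LL(1) requirements above can be seen in a small recursive-descent predictive parser. This is a minimal sketch for the left-factored, non-left-recursive grammar E → T E', E' → + T E' | ε, T → digit; the function names are illustrative:

```c
#include <ctype.h>

/* Predictive (LL(1)) parser for:
 *   E  -> T E'
 *   E' -> + T E' | epsilon
 *   T  -> digit
 * One function per non-terminal; each decides using one token of lookahead. */
static const char *in;

static int T(void) {
    if (isdigit((unsigned char)*in)) { in++; return 1; }
    return 0;                        /* expected a digit */
}

static int Eprime(void) {
    if (*in == '+') {                /* lookahead '+' selects E' -> + T E' */
        in++;
        return T() && Eprime();
    }
    return 1;                        /* otherwise E' -> epsilon */
}

static int E(void) { return T() && Eprime(); }

/* parse: returns 1 if s is a valid sentence like "1+2+3". */
int parse(const char *s) {
    in = s;
    return E() && *in == '\0';
}
```

Note that the parser never backtracks: because the grammar is LL(1), looking at the single next character is always enough to pick the right rule.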
Unit-2: Bottom-Up Parsing
Bottom-up parsing is a method of parsing where the parser starts with the input tokens
(the "bottom") and builds the parse tree upward until it reaches the start symbol of the
grammar (the "top"). It tries to construct the parse tree by combining tokens into larger
structures based on the grammar rules.
• Key Idea: It works by reducing tokens into non-terminals using grammar rules,
moving from the input to the start symbol.
• Example: For a grammar S → aB and input aB, the parser starts with aB and reduces
it to S.
Handles
A handle is a part of the input string that matches the right-hand side of a grammar rule
and can be reduced to a non-terminal.
• Example: For grammar S → aB, if the input is aB, the handle is aB because it can be
reduced to S.
• The parser identifies handles and replaces them with the corresponding non-
terminal.
Handle Pruning
Handle pruning is the process of repeatedly finding and reducing handles in the input
string until the start symbol is reached.
• Steps:
o Find the handle in the current string.
o Reduce it to the non-terminal on the rule’s left-hand side.
o Repeat until only the start symbol remains.
• This process "prunes" the parse tree from the bottom up.
Shift-Reduce Parsing
Shift-reduce parsing is a bottom-up technique that uses a stack and two main actions:
1. Shift: Push the next input token onto the stack.
2. Reduce: If the top of the stack matches the right-hand side of a grammar rule (a handle), pop those symbols and push the corresponding non-terminal.
How It Works:
• It shifts tokens from the input to the stack until it finds a handle.
• This continues until the input is empty and the stack contains only the start symbol.
Example:
• Grammar: S → aB, B → b
• Input: ab
• Steps:
o Shift a (stack: a), then shift b (stack: ab).
o Reduce b to B using B → b (stack: aB).
o Reduce aB to S using S → aB (stack: S). Accept.
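A minimal shift-reduce loop for this grammar can be sketched in C. Symbols are single characters; this is an illustration for S → aB, B → b only, not a general parser:

```c
#include <string.h>

/* Shift-reduce sketch for the grammar  S -> aB,  B -> b.
 * The stack holds grammar symbols; we shift input characters and
 * reduce whenever the top of the stack matches a rule's right side. */
int shift_reduce(const char *input) {
    char stack[64];
    int top = 0;
    size_t i = 0, n = strlen(input);
    while (i < n || top > 0) {
        if (top >= 1 && stack[top - 1] == 'b') {
            stack[top - 1] = 'B';            /* reduce: b => B */
        } else if (top >= 2 && stack[top - 2] == 'a' && stack[top - 1] == 'B') {
            top -= 2;
            stack[top++] = 'S';              /* reduce: aB => S */
        } else if (i < n && top < 63) {
            stack[top++] = input[i++];       /* shift the next token */
        } else {
            break;                           /* no shift or reduce possible */
        }
        if (top == 1 && stack[0] == 'S' && i == n) return 1;  /* accept */
    }
    return 0;
}
```

Running it on "ab" reproduces exactly the shift, shift, reduce, reduce sequence of the example, while "a", "b", and "abb" are rejected.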
Conflicts in Shift-Reduce Parsing
Sometimes, the parser cannot decide whether to shift or reduce. These situations are called conflicts:
1. Shift-Reduce Conflict:
o The parser can either shift the next token or reduce the stack.
o Example: For grammar S → aS | a, if the stack has a and the next token is a,
the parser doesn’t know whether to shift a or reduce a to S.
2. Reduce-Reduce Conflict:
o The parser can reduce the stack using two or more different grammar rules.
o Example: For grammar S → aB, B → c, C → c, if the stack has ac, the parser
doesn’t know whether to reduce c to B or C.
LR Grammars
LR grammars are a class of grammars that can be parsed using an efficient bottom-up parser. The name LR stands for Left-to-right scan, Rightmost derivation (constructed in reverse).
• LR parsers are powerful and can handle a wide range of grammars; LR-based tools can even work with some ambiguous grammars by using precedence and associativity rules.
• They use a parsing table to decide whether to shift or reduce based on the current
state and the next token.
Types of LR Parsers
1. Simple LR (SLR):
o The simplest and cheapest to build; uses FOLLOW sets to decide reductions.
o Handles fewer grammars than the other types.
2. Canonical LR (CLR):
o The most powerful type; keeps full lookahead information in every state.
o Can handle more complex grammars but requires more memory and time.
3. Look-Ahead LR (LALR):
o A middle ground between SLR and CLR.
o Uses a smaller parsing table than CLR but can handle more grammars than
SLR.
Comparison: In parsing power, SLR < LALR < CLR; in table size, SLR ≈ LALR < CLR. LALR is the most widely used in practice (e.g., by YACC).
When a parser finds a syntax error (e.g., a missing semicolon), it needs to recover so it can
continue parsing. Common error recovery techniques:
1. Panic Mode:
o Skip input tokens until a synchronizing token (like ; or }) is found.
o Example: If the parser expects a ) but sees a +, it skips tokens until it finds a valid one.
2. Phrase-Level Recovery:
o Insert or delete a token to fix the error (e.g., add a missing semicolon).
3. Error Productions:
o Add special grammar rules for common mistakes so the parser can recognize and report them.
4. Global Recovery:
o Find the smallest set of changes to the whole input that makes it valid (rarely used because it is expensive).
Good error recovery helps the compiler report multiple errors instead of stopping at the
first one.
Handling Ambiguity
• Grammar: E → E + E | num
• Input: 2 + 3 + 4
• This grammar is ambiguous: the input can be grouped as (2 + 3) + 4 or 2 + (3 + 4). Ways to handle it:
• Modifying the grammar to remove ambiguity (e.g., rewriting the grammar to enforce precedence).
• Declaring operator precedence and associativity, as parser generators like YACC allow.
YACC
YACC (Yet Another Compiler Compiler) is a tool that automatically generates a parser from a grammar.
• You provide:
o Grammar rules for the language.
o Actions to perform when a rule is reduced (e.g., create a parse tree node).
• YACC generates:
o A C parsing function (yyparse) that parses the input.
• Features:
o Resolves conflicts using precedence and associativity declarations.
• Example skeleton (a minimal sketch; the rule shown is illustrative):
%{
#include <stdio.h>
%}
%token NUM
%%
expr : expr '+' NUM
     | NUM
     ;
%%
Unit-3: Syntax Directed Translation
Syntax Directed Translation (SDT) is a method used by compilers to translate source code
into another form (like intermediate code or machine code) while parsing the code. It
attaches rules or actions to the grammar rules to perform translations during parsing.
• Key Idea: Each grammar rule has associated actions (called semantic actions) that
describe what to do when the rule is applied.
• Example: For a grammar rule E → E1 + E2, an SDT might generate code to add E1 and
E2.
A Syntax Directed Definition (SDD) is a formal way to define the translation process. It
consists of:
• Semantic Rules: Instructions that calculate attribute values for each production
rule.
Example:
• Grammar: E → E1 + E2
• Semantic rule: E.val = E1.val + E2.val (the value of E is computed from the values of its parts).
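For this grammar, an SDD typically attaches a rule like E.val = E1.val + E2.val. A tiny C evaluator can illustrate computing such a synthesized val attribute while parsing chains like digit + digit; the function names are illustrative:

```c
#include <ctype.h>

/* Evaluates single-digit sums like "1+2+3", computing the synthesized
 * attribute 'val' as each grammar rule is applied. */
static const char *p;

/* T -> digit,  T.val = numeric value of the digit */
static int T_val(void) {
    int v = *p - '0';
    p++;
    return v;
}

/* E -> E1 + T,  E.val = E1.val + T.val */
static int E_val(void) {
    int v = T_val();
    while (*p == '+') { p++; v += T_val(); }
    return v;
}

int eval(const char *s) { p = s; return E_val(); }
```

Each return value here plays the role of a synthesized attribute: it is computed from the children and passed up to the parent.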
A syntax tree (or parse tree) is a tree that represents the structure of the source code
according to the grammar. SDT can be used to build syntax trees by:
• Attaching semantic actions that create a tree node whenever a rule is applied.
• Example:
o Grammar: E → E1 + E2
o Action: build a node labeled + whose children are the trees for E1 and E2.
S-Attributed Definitions
An S-attributed definition uses only synthesized attributes (values computed from a node’s children).
• Characteristics:
o Attributes flow bottom-up, so S-attributed definitions suit bottom-up (LR) parsing.
• Example:
o Grammar: E → E1 + E2, with rule E.val = E1.val + E2.val
L-Attributed Definitions
An L-attributed definition uses both synthesized and inherited attributes, but inherited attributes are computed in a left-to-right order.
• Characteristics:
o Attributes can depend on the parent or left siblings but not on right siblings.
• Example:
o Grammar: D → T id, with inherited rule id.type = T.type (the type flows from the left sibling T to id).
Translation Schemes
A translation scheme is an SDT where semantic actions are embedded within the
grammar rules to specify when the actions should be executed during parsing.
• Key Idea: Actions (enclosed in {}) are placed in the production to indicate the order
of execution.
• Example:
o Grammar: E → E1 + E2
o Translation Scheme: E → E1 + E2 { print('+') } (the action runs after both subexpressions are parsed, producing postfix output).
Emitting a Translation
Emitting a translation means producing the output of the translation process (e.g.,
intermediate code, machine code). The translation scheme controls when and how the
output is generated.
• Example: For the expression a + b, the emitted three-address instruction is:
o t1 = a + b
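Emitting can be sketched as code that appends each generated instruction to an output buffer as it is produced. This is a minimal illustration; the helper names are assumptions:

```c
#include <stdio.h>
#include <string.h>

static int temp_count = 0;

/* new_temp: writes the next temporary name t1, t2, ... into 'name'. */
static void new_temp(char *name) { sprintf(name, "t%d", ++temp_count); }

/* emit_add: emits "tN = x + y" into 'out' and returns the new
 * temporary's name via 'result'. */
void emit_add(const char *x, const char *y, char *result, char *out) {
    new_temp(result);
    sprintf(out + strlen(out), "%s = %s + %s\n", result, x, y);
}
```

Emitting a + b and then (a + b) + c appends t1 = a + b followed by t2 = t1 + c, the same three-address form shown above.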
Unit-4
Code Optimization
Code optimization is the process of improving the intermediate code (or machine code) to
make it run faster, use less memory, or consume less power, while producing the same
output. It’s an optional phase in a compiler but very important for performance.
• Goal: Make the program more efficient without changing its behavior.
A code optimizer is a part of the compiler that applies optimization techniques. It typically
works on intermediate code and is organized as:
• Front End: Analyzes the intermediate code and builds data structures (like flow
graphs).
• Back End: Outputs the optimized intermediate code for code generation.
• Basic Block:
o A sequence of instructions with only one entry point (the first instruction) and
one exit point (the last instruction).
o Example:
o t1 = a + b
o t2 = t1 * c
• Flow Graph:
o A graph where each node is a basic block, and edges represent control flow
(jumps or branches) between blocks.
o Example:
o B1: t1 = a + b
o goto B2
o B2: t2 = t1 * c
Here, B1 and B2 are basic blocks, and there’s an edge from B1 to B2.
Optimizations within a basic block are called local optimizations. Common techniques
include:
1. Constant Folding:
o Evaluate constant expressions at compile time (e.g., replace x = 3 * 4 with x = 12).
2. Constant Propagation:
o Replace a variable known to hold a constant with that constant (e.g., if a = 5, replace b = a + 1 with b = 5 + 1).
3. Dead Code Elimination:
o Remove instructions whose results are never used.
4. Common Subexpression Elimination:
o Compute a repeated expression once and reuse the result.
5. Strength Reduction:
o Replace expensive operations with cheaper ones (e.g., x * 2 with x + x).
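Constant folding, the first technique above, can be sketched in C. This is a minimal illustration; the Expr struct and function names are assumptions for the sketch:

```c
/* A tiny expression value: either a compile-time constant or unknown. */
typedef struct {
    int is_const;   /* 1 if 'value' holds a compile-time constant */
    int value;
} Expr;

Expr constant(int v) { Expr e = { 1, v }; return e; }
Expr variable(void)  { Expr e = { 0, 0 }; return e; }

/* fold_add: if both operands are constants, the optimizer computes the
 * sum at compile time; otherwise the addition is left for runtime. */
Expr fold_add(Expr a, Expr b) {
    if (a.is_const && b.is_const) return constant(a.value + b.value);
    return variable();
}
```

Folding 3 + 4 yields the constant 7 at compile time, while adding a constant to an unknown variable stays a runtime operation.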
Other common optimizations (often across basic blocks):
1. Redundant Computations:
o Remove computations whose result is already available.
2. Unreachable Code:
o Delete code that can never execute (e.g., statements after a return).
3. Loop Optimizations:
o Move loop-invariant computations outside the loop (code motion).
4. Algebraic Simplifications:
o Simplify expressions using algebraic identities (e.g., x * 1 → x, x + 0 → x).
5. Function Inlining:
o Replace function calls with the function’s body to avoid call overhead.
A Directed Acyclic Graph (DAG) is a graphical representation of a basic block used for
optimization.
• Uses:
o Detects common subexpressions: a repeated expression becomes a single shared node.
o Helps eliminate dead code within the block.
o Code: `t1 = a