Compiler Design - Complete Study Notes
Lexical Analysis
Syntax Analysis
Semantic Analysis
Detailed Explanation:
int x = 10;
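A rough, simplified sketch of what the three phases above produce for this statement:
Lexical analysis:   int (keyword)  x (identifier)  = (assign)  10 (number)  ; (semicolon)
Syntax analysis:    builds a parse tree for  declaration → type identifier = expression ;
Semantic analysis:  checks that the constant 10 is compatible with the declared type int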
2. Compiler vs Interpreter
Aspect        Compiler                               Interpreter
Translation   Translates the whole program at once   Translates one statement at a time
Execution     Faster (runs generated object code)    Slower (translates while running)
Errors        Reported after the whole compilation   Reported as each statement executes
Easy Definition:
Lexeme: The actual string in source code
Token: The category (type) that the lexeme belongs to
Pattern: The rule describing which strings form lexemes of a token
Example:
Lexeme   Token    Pattern
=        ASSIGN   =
42       NUMBER   [0-9]+
Memory Trick: Lexeme = Literal string, Token = Type, Pattern = Production rule
LEX Structure:
%{
/* C declarations */
%}
/* LEX definitions */
%%
/* LEX rules */
%%
/* User functions */
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("NUMBER: %s\n", yytext); }
[a-zA-Z]+ { printf("IDENTIFIER: %s\n", yytext); }
"+" { printf("PLUS\n"); }
"=" { printf("ASSIGN\n"); }
[ \t\n] { /* ignore whitespace */ }
%%
int main() {
    yylex();
    return 0;
}
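To try it (assuming flex and cc are available; the file name scanner.l is just an example):
flex scanner.l && cc lex.yy.c -lfl && ./a.out
(-lfl supplies the default yywrap(); alternatively add %option noyywrap in the definitions section.)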
Why Buffer?
Reading character by character is expensive
Need lookahead for token recognition
Techniques:
1. Single Buffer
2. Buffer Pairs (Double Buffering)
Buffer 1: |----token----|
Buffer 2: |--continues--|
3. Sentinels (a sentinel character at the end of each buffer half removes one bounds check per character; sketched below)
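A minimal C sketch of the buffer-pair-with-sentinel idea (not from the notes; BUF_SIZE, fill_half, and reading from stdin are illustrative assumptions):

#include <stdio.h>

#define BUF_SIZE 16              /* tiny, only for illustration */
#define SENTINEL '\0'            /* marks end of a buffer half or end of input */

static char buf[2][BUF_SIZE + 1];
static char *forward;            /* lookahead pointer */

/* Fill one half from stdin and terminate it with the sentinel. */
static void fill_half(char *half) {
    size_t n = fread(half, 1, BUF_SIZE, stdin);
    half[n] = SENTINEL;
}

/* Return the next character; the sentinel means either "switch halves" or real end of input,
   so the common case costs only one comparison per character. */
static char next_char(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return c;
    if (forward == buf[0] + BUF_SIZE + 1) {        /* exhausted first half */
        fill_half(buf[1]);
        forward = buf[1];
        return next_char();
    }
    if (forward == buf[1] + BUF_SIZE + 1) {        /* exhausted second half */
        fill_half(buf[0]);
        forward = buf[0];
        return next_char();
    }
    return SENTINEL;                               /* real end of input */
}

int main(void) {
    fill_half(buf[0]);
    forward = buf[0];
    for (char c = next_char(); c != SENTINEL; c = next_char())
        putchar(c);
    return 0;
}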
Definition:
A CFG is a 4-tuple: G = (V, T, P, S)
V: Variables (Non-terminals)
T: Terminals
P: Productions
S: Start symbol
Example:
E → E + T | T
T → T * F | F
F → (E) | id
Top-Down Parsing
Strategy: Start from start symbol, derive input
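A minimal recursive-descent sketch in C (illustrative, not from the notes) for the expression grammar above, using the left-recursion-free form derived in the Left Recursion section below; a single letter stands in for id:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Grammar with left recursion removed:
     E  -> T E'      E' -> + T E' | epsilon
     T  -> F T'      T' -> * F T' | epsilon
     F  -> ( E ) | id                                  */

static const char *p;                    /* current position in the input */

static void E(void);

static void reject(void) { puts("reject"); exit(1); }

static void F(void) {
    if (*p == '(') { p++; E(); if (*p == ')') p++; else reject(); }
    else if (isalpha((unsigned char)*p)) p++;   /* one letter = id, for simplicity */
    else reject();
}

static void Tprime(void) { if (*p == '*') { p++; F(); Tprime(); } }
static void T(void)      { F(); Tprime(); }
static void Eprime(void) { if (*p == '+') { p++; T(); Eprime(); } }
static void E(void)      { T(); Eprime(); }

int main(void) {
    p = "a+b*c";                         /* stands for id + id * id */
    E();
    puts(*p == '\0' ? "accept" : "reject");
    return 0;
}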
Bottom-Up Parsing
Strategy: Start from input, reduce to start symbol
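For example, a bottom-up parse of id + id * id (with the grammar above) reduces a handle at each step:
id + id * id
F  + id * id     (F → id)
T  + id * id     (T → F)
E  + id * id     (E → T)
E  + F  * id     (F → id)
E  + T  * id     (T → F)
E  + T  * F      (F → id)
E  + T           (T → T * F)
E                (E → E + T)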
Memory Trick: Top-Down = start from the Top (start symbol), Bottom-Up = start from the Bottom (input tokens)
S-Attributed Definitions
Rule: Use only Synthesized attributes
L-Attributed Definitions
Rule: Use synthesized + Limited inherited attributes
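Standard textbook examples of each kind of rule (using the left-recursion-free expression grammar):
S-attributed:  E → E1 + T    { E.val = E1.val + T.val }      (value synthesized from children)
L-attributed:  T → F T'      { T'.inh = F.val }              (inherited attribute passed left to right)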
Memory Trick: S = Synthesized only, L = Left-to-right (synthesized + limited inherited)
Regular (right-linear) grammar productions have the form: A → aB or A → a
Grammar Properties:
Ambiguous Grammar
Example: E → E + E | E * E | id
String "id + id * id" has 2 parse trees
Left Recursion
Direct: A → Aα
Indirect: A → Bα, B → Aβ
Problem: Infinite loop in top-down parsing
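The standard fix rewrites A → Aα | β as:
A  → βA'
A' → αA' | ε
Applied to the expression grammar:
E  → T E'      E' → + T E' | ε
T  → F T'      T' → * F T' | ε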
Left Factoring
Problem: A → αβ | αγ (the common prefix α makes the choice of production ambiguous with one-token lookahead)
Solution: A → αA',  A' → β | γ
Memory Trick: Left Factoring = Factor out the common left prefix
Polymorphic Functions
Definition:
Functions that work with multiple types
Types:
1. Static Allocation
When: Compile time
Where: Global and static variables
2. Stack Allocation
When: Function calls
Where: Local variables and activation records
3. Heap Allocation
When: Runtime (dynamic)
Where: malloc(), new operator
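A small C sketch (illustrative only) showing where each strategy appears in ordinary code:

#include <stdio.h>
#include <stdlib.h>

int counter = 0;                     /* static allocation: one fixed location for the whole run */

void f(void) {
    int local = 42;                  /* stack allocation: created on each call, freed on return */
    int *dyn = malloc(sizeof *dyn);  /* heap allocation: lives until free() is called */
    *dyn = local + counter++;
    printf("%d\n", *dyn);
    free(dyn);
}

int main(void) {
    f();
    f();
    return 0;
}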
Primary Goals:
1. Correctness: Generated code must be semantically equivalent to the source program
Specific Goals:
Register Allocation: Minimize memory access
Enable optimizations
Example:
a = b + c
d = b + c
e = d + a
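Building a DAG for this block shows that b + c is computed twice; a sketch of the block after common-subexpression elimination:
t1 = b + c
a  = t1
d  = t1
e  = d + a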
Benefits:
Common Subexpression Detection: b + c is computed only once
Dead Code Elimination: Unused computations can be removed
Problem:
The target addresses of forward jumps are not yet known during one-pass code generation
Example:
if (condition) goto L1
stmt1
goto L2
L1: stmt2
L2: next_stmt
Memory Trick: "BackPatch = Blank first, Patch later"
Types:
Constant Folding: 3 + 4 → 7
Register Allocation
Instruction Scheduling
Peephole Optimization
Levels:
Local: Within a single basic block
Global: Across basic blocks (whole procedure, using data-flow analysis)
Definition:
Optimize small "window" of instructions (usually 3-5)
Types:
1. Redundant Load/Store Elimination
2. Constant Folding
Before: MOV R1, #3
ADD R1, #4
After: MOV R1, #7
3. Strength Reduction: replace a costly operation with a cheaper one (e.g., x * 2 → x << 1)
4. Algebraic Simplification: e.g., x + 0 → x, x * 1 → x
Purpose:
Determine how data flows through program
Enable optimizations
Safety analysis
Key Concepts:
Reaching Definitions
Available Expressions
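For reaching definitions, the standard per-block data-flow equations are:
in[B]  = ∪ out[P]        (union over all predecessors P of B)
out[B] = gen[B] ∪ (in[B] − kill[B])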
Definition:
Compiler that runs on one machine but generates code for another
Example:
Host: x86 PC
Target: a different architecture (e.g., an ARM embedded board)
Why Needed?
The target machine is too small to run the compiler itself
Development convenience
Different architectures
Essential Properties:
1. Correctness Preservation
2. Efficiency Improvement
3. Compile-Time Efficiency
Optimization shouldn't take too long
Trade-off between compile time and runtime benefit
4. Debugging Support
5. Predictability
Exam Tips 💡
1. Draw diagrams for phases, parse trees, DAGs