
Compiler Design - Complete Study Notes 📚

1. Phases of Compiler (WITH MEMORY TRICK! 🧠)

Memory Trick: "Lexy Sarah Sees Incredible Code Optimization"

Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Object Code Generation

Detailed Explanation:

Phase 1: Lexical Analysis (Scanner)

What it does: Converts source code into tokens
Input: Character stream
Output: Token stream

Example:

int x = 10;
Tokens: <INT>, <IDENTIFIER,"x">, <ASSIGN>, <NUMBER,"10">, <SEMICOLON>

Phase 2: Syntax Analysis (Parser)

What it does: Checks grammatical structure
Input: Token stream
Output: Parse tree/Abstract Syntax Tree

Example: Verifies int x = 10; follows grammar rules

Phase 3: Semantic Analysis

What it does: Type checking, scope resolution
Input: Parse tree
Output: Annotated parse tree

Example: Ensures you can't do int x = "hello";

Phase 4: Intermediate Code Generation

What it does: Creates platform-independent code
Input: Annotated parse tree
Output: Three-address code

Example: t1 = 10; x = t1;

Phase 5: Code Optimization

What it does: Improves code efficiency
Input: Intermediate code
Output: Optimized intermediate code

Example: Removes dead code, constant folding

Phase 6: Code Generation

What it does: Generates target machine code
Input: Optimized intermediate code
Output: Assembly/Machine code

2. Compiler vs Interpreter
| Aspect | Compiler | Interpreter |
|---|---|---|
| Translation | Entire program at once | Line by line |
| Execution | After compilation | During translation |
| Speed | Faster execution | Slower execution |
| Memory | More memory needed | Less memory |
| Error Detection | All errors at once | Stops at first error |
| Examples | C, C++, Java | Python, JavaScript |

Memory Trick: Compiler = Complete translation first, Interpreter = Immediate execution

3. Tokens, Patterns, and Lexemes

Easy Definition:
Lexeme: The actual string in source code

Token: Category/type of the lexeme

Pattern: Rule that describes the token

Example:

int count = 42;

| Lexeme | Token | Pattern |
|---|---|---|
| int | KEYWORD | Reserved word |
| count | IDENTIFIER | [a-zA-Z][a-zA-Z0-9]* |
| = | ASSIGN | = |
| 42 | NUMBER | [0-9]+ |

Memory Trick: Lexeme = Literal string, Token = Type, Pattern = Production rule

4. LEX - Lexical Analyzer Generator

LEX Structure:

```lex
%{
/* C declarations */
%}
/* LEX definitions */
%%
/* LEX rules */
%%
/* User functions */
```

Example LEX Program:

```lex
%{
#include <stdio.h>
%}

%%
[0-9]+      { printf("NUMBER: %s\n", yytext); }
[a-zA-Z]+   { printf("IDENTIFIER: %s\n", yytext); }
"+"         { printf("PLUS\n"); }
"="         { printf("ASSIGN\n"); }
[ \t\n]     { /* ignore whitespace */ }
%%

int yywrap() { return 1; }  /* signal end of input, so no -lfl is needed */

int main() {
    yylex();
    return 0;
}
```
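To try it (assuming the program is saved as scanner.l and flex is installed): run flex scanner.l and then cc lex.yy.c -o scanner; the resulting scanner tokenizes whatever you type on standard input.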

5. Input Buffering Techniques

Why Buffer?
Reading character by character is expensive
Need lookahead for token recognition

Techniques:

1. Single Buffer

Problem: What if token spans buffer boundary?

2. Double Buffer (Better!)

Buffer 1: |----token----|
Buffer 2: |--continues--|

3. Sentinels

Use special EOF character to avoid boundary checks

Trick: Place EOF at end of each buffer half

Memory Trick: "Double Sentinel Buffers Work Better"


6. Context-Free Grammar (CFG)

Definition:
A CFG is a 4-tuple: G = (V, T, P, S)

V: Variables (Non-terminals)

T: Terminals

P: Productions

S: Start symbol

Example:

E → E + T | T
T → T * F | F
F → (E) | id

This generates expressions like: id + id * id
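For instance, a leftmost derivation of id + id * id:

```
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T
  ⇒ id + T * F ⇒ id + F * F ⇒ id + id * F ⇒ id + id * id
```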

Memory Trick: "Very Talented Programmers Start here"

7. Top-Down vs Bottom-Up Parsing

Top-Down Parsing
Strategy: Start from start symbol, derive input

Types: Recursive Descent, LL(1)

Think: Root to leaves (like reading a book)

Bottom-Up Parsing
Strategy: Start from input, reduce to start symbol

Types: LR(0), SLR(1), LALR(1), LR(1)

Think: Leaves to root (like building a pyramid)

Memory Trick:

Top-Down = Tree from Top

Bottom-Up = Build from Base
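A minimal recursive-descent sketch in C for the expression grammar from Section 6, after removing its left recursion (the encoding of id as the single character 'i' and the hard-coded test input are illustrative assumptions):

```c
#include <stdio.h>
#include <stdlib.h>

static const char *input;   /* e.g. "i+i*i", with 'i' standing for id */

static void error(void) { printf("syntax error\n"); exit(1); }
static void match(char c) { if (*input == c) input++; else error(); }

static void E(void);        /* forward declaration: F calls E for "(E)" */

static void F(void) {       /* F -> ( E ) | id */
    if (*input == '(') { match('('); E(); match(')'); }
    else match('i');
}
static void T(void) {       /* T -> F { * F }   (left recursion removed) */
    F();
    while (*input == '*') { match('*'); F(); }
}
static void E(void) {       /* E -> T { + T } */
    T();
    while (*input == '+') { match('+'); T(); }
}

int main(void) {
    input = "i+i*i";
    E();
    if (*input == '\0') printf("accepted\n"); else error();
    return 0;
}
```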


8. Syntax-Directed Translation

S-Attributed Definitions
Rule: Use only Synthesized attributes

Direction: Information flows UP the parse tree

Example: Expression evaluation

E → E1 + T { E.val = E1.val + T.val }
T → F      { T.val = F.val }
F → num    { F.val = num.lexval }

L-Attributed Definitions
Rule: Use synthesized + Limited inherited attributes

Direction: Information flows UP and Left-to-right

Restriction: Inherited attributes can only depend on left siblings

Memory Trick:

S = Synthesized = Simple (only up)

L = Limited = Left-to-right allowed
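A minimal sketch of S-attributed evaluation in C: a post-order walk over a hypothetical AST node type, so each production's rule fires only after its children's attributes are known:

```c
#include <stdio.h>

typedef struct Node {
    char op;                    /* '+', '*', or 'n' for a number leaf */
    int val;                    /* synthesized attribute */
    struct Node *left, *right;
} Node;

/* Post-order walk: children first, then the parent's semantic rule,
   exactly the evaluation order an S-attributed definition requires. */
static int eval(Node *n) {
    if (n->op == 'n') return n->val;            /* F -> num */
    int l = eval(n->left), r = eval(n->right);
    n->val = (n->op == '+') ? l + r : l * r;    /* E -> E1 + T, T -> T1 * F */
    return n->val;
}

int main(void) {
    /* AST for 2 + 3 * 4 */
    Node four = {'n', 4, 0, 0}, three = {'n', 3, 0, 0}, two = {'n', 2, 0, 0};
    Node mul = {'*', 0, &three, &four};
    Node add = {'+', 0, &two, &mul};
    printf("%d\n", eval(&add));                 /* prints 14 */
    return 0;
}
```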

9. Types of Grammar (COMPLETE KNOWLEDGE)

Chomsky Hierarchy (Remember: "Uncles Can Count Regularly")

Type 0: Unrestricted Grammar

Production: α → β (any string to any string)
Automaton: Turing Machine
Example: AB → B (the right side may even be shorter than the left, which Type 1 forbids)

Type 1: Context-Sensitive Grammar

Production: αAβ → αγβ (|γ| ≥ 1)
Automaton: Linear Bounded Automaton
Rule: The right side is at least as long as the left side (non-contracting)

Type 2: Context-Free Grammar

Production: A → α (single non-terminal on left)
Automaton: Pushdown Automaton
Most important for compilers!

Type 3: Regular Grammar

Production: A → aB or A → a
Automaton: Finite Automaton
Used in lexical analysis

Grammar Properties:

Ambiguous Grammar

Multiple parse trees for same string

Example: E → E + E | E * E | id
String "id + id * id" has 2 parse trees

Left Recursion

Direct: A → Aα
Indirect: A → Bα, B → Aβ
Problem: Infinite loop in top-down parsing
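The standard fix rewrites A → Aα | β as A → βA', A' → αA' | ε; applied to the expression grammar from Section 6:

```
E → E + T | T      becomes      E  → T E'
                                E' → + T E' | ε
T → T * F | F      becomes      T  → F T'
                                T' → * F T' | ε
```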

Left Factoring

Problem: A → αβ | αγ

Solution: A → αA', A' → β | γ

10. Type Checking

Static Type Checking

When: Compile time
Advantage: Catches errors early, faster execution
Languages: C, C++, Java
Example: int x = "hello"; ← Error caught at compile time

Dynamic Type Checking

When: Runtime
Advantage: More flexible
Languages: Python, JavaScript
Example: Variable can change type during execution

Memory Trick:

Static = Strictly checked at Start

Dynamic = Determined During execution

11. Polymorphic Functions

Definition:
Functions that work with multiple types

Types:

1. Parametric Polymorphism (Generics)

```java
// Generic method: one definition works for any reference type T.
static <T> void swap(T[] arr, int i, int j) {
    T tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
}
```

2. Ad-hoc Polymorphism (Overloading)

```java
int add(int a, int b) { return a + b; }
double add(double a, double b) { return a + b; }
```

3. Subtype Polymorphism (Inheritance)

```java
Animal a = new Dog(); // Dog inherits from Animal
```

Memory Trick: "Polymorphic = People Adding Similar functions"

12. Storage Allocation Strategies

1. Static Allocation
When: Compile time
Where: Global variables, static variables
Lifetime: Entire program execution

2. Stack Allocation
When: Function calls
Where: Local variables, parameters
Lifetime: Function scope
Structure: LIFO (Last In, First Out)

3. Heap Allocation
When: Runtime (dynamic)
Where: malloc(), new operator
Lifetime: Until explicitly freed
Management: Garbage collection or manual

Memory Trick: "Static Stays, Stack Shrinks, Heap Hangs around"

13. Goals of Code Generation

Primary Goals:
1. Correctness: Generated code must be semantically equivalent

2. Efficiency: Optimize for speed and space

3. Target Independence: Easy to retarget

Specific Goals:
Register Allocation: Minimize memory access
Instruction Selection: Choose best instructions
Instruction Scheduling: Avoid pipeline stalls

Memory Trick: "Correct Efficient Targeted code"

14. DAG (Directed Acyclic Graph) Representation

Purpose:
Represent basic blocks efficiently
Identify common subexpressions
Enable optimizations

Example:

a = b + c
d = b + c
e = d + a

DAG shows b + c computed once, used twice!
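After building the DAG, the block can be re-emitted with the shared subexpression computed only once:

```
t1 = b + c
a = t1
d = t1
e = t1 + t1
```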

Benefits:
Dead Code Elimination: Unused computations

Common Subexpression: Avoid recomputation

Constant Folding: Compute constants at compile time

15. Back Patching in Code Generation

Problem:
Jump addresses unknown during code generation

Forward references need to be "patched"

Solution - Back Patching:

1. Generate jump with placeholder address
2. Keep list of locations to patch
3. When target known, patch all locations

Example:

if (condition) goto L1
stmt1
goto L2
L1: stmt2
L2: next_stmt
Memory Trick: "BackPatch = Blank first, Patch later"

16. Code Optimization

Types:

Machine Independent Optimizations:
Constant Folding: 3 + 4 → 7
Constant Propagation: Replace variables with constants
Dead Code Elimination: Remove unreachable code
Common Subexpression Elimination
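For example, on a hypothetical three-address snippet:

```
Before:              After:
x = 3 + 4            x = 7         (constant folding)
y = x * 2            y = 14        (constant propagation, then folding)
if 0 goto L1         (removed)     (dead code elimination: branch never taken)
```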

Machine Dependent Optimizations:
Register Allocation
Instruction Scheduling
Peephole Optimization

Levels:
Local: Within basic block
Global: Within function
Interprocedural: Across functions

17. Peephole Optimization

Definition:
Optimize small "window" of instructions (usually 3-5)

Types:

1. Redundant Instruction Elimination

Before: MOV R1, R2
        MOV R2, R1
After:  MOV R1, R2

2. Constant Folding

Before: MOV R1, #3
        ADD R1, #4
After:  MOV R1, #7

3. Strength Reduction

Before: MUL R1, #2
After:  SHL R1, #1 (shift left = multiply by 2)

4. Algebraic Simplification

Before: ADD R1, #0
After:  (remove instruction)

Memory Trick: "Peep through small hole, optimize locally, efficiently"

18. Data Flow Analysis

Purpose:
Determine how data flows through program
Enable optimizations
Safety analysis

Key Concepts:

Reaching Definitions

Which definitions reach which uses?

Use: Dead code elimination

Live Variable Analysis

Which variables are "live" (will be used later)?

Use: Register allocation

Available Expressions

Which expressions are available (already computed)?

Use: Common subexpression elimination


Direction:
Forward: Information flows with program execution
Backward: Information flows against program execution
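A worked liveness computation on a tiny hypothetical block, going backward from the last statement (at each statement, live-before = (live-after − defined) ∪ used):

```
1: a = b + c     live before: {b, c}   (a defined; b, c used)
2: d = a * 2     live before: {a}      (d defined; a used)
3: return d      live before: {d}
```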

19. Cross Compiler

Definition:
Compiler that runs on one machine but generates code for another

Example:
Host: x86 PC
Target: ARM processor (mobile phones)
Use Case: Embedded systems development

Why Needed?
Target machine too small for compiler

Development convenience
Different architectures
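For instance, with a GNU cross toolchain installed, a command like arm-linux-gnueabihf-gcc -o app main.c runs on the x86 host but produces an ARM executable (the exact toolchain name depends on the target).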

Memory Trick: "Cross Compiler Compiles Crosswise"

20. Properties of Optimizing Compilers

Essential Properties:

1. Correctness Preservation
Optimized code must behave identically to original
Most Important Property!

2. Efficiency Improvement
Time: Faster execution
Space: Smaller code size
Energy: Lower power consumption

3. Compile-Time Efficiency
Optimization shouldn't take too long
Trade-off between compile time and runtime benefit

4. Debugging Support
Maintain correlation between source and optimized code
Support for debugging optimized code

5. Predictability
Similar code patterns should be optimized similarly
Developers can reason about performance

Memory Trick: "Correct Efficient Compilation Debugs Predictably"

Quick Review Mnemonics 🎯

1. Compiler Phases: "Lexy Sarah Sees Incredible Code Optimization"
2. Grammar Types: "Uncles Can Count Regularly"
3. Storage Types: "Static Stays, Stack Shrinks, Heap Hangs around"
4. Parsing Types: "Top-Down from Top, Bottom-Up from Base"
5. Attributes: "S = Simple UP, L = Limited left-right"

Exam Tips 💡
1. Draw diagrams for phases, parse trees, DAGs
2. Give examples for every concept
3. Compare and contrast (compiler vs interpreter, static vs dynamic)
4. Remember the "why" behind each technique
5. Practice code examples for LEX, grammar, optimizations

Good Luck! You've got this! 🚀
