compiler notes
Answer:
Shift–Reduce Parsing [10M]
Shift–reduce parsing is a bottom-up technique a compiler uses to recognize the structure of your code by gradually building larger phrases from
small tokens. It works like this:
1. Shift: Read the next token from the input and push it onto a stack.
2. Reduce: Whenever the top items on the stack match the right side of a grammar rule, pop those items off and replace them with the rule’s
left-hand symbol.
You repeat these two steps—shifting tokens and reducing patterns—until:
• All input tokens have been read, and
• The stack contains only the start symbol of the grammar.
At that point, the parser has successfully recognized the entire input as a legal sentence under the language’s grammar.
Example Walkthrough
Grammar:
S → E
E → E + T | T
T → T * F | F
F → ( E ) | id
Input: id + id * id
Step  Stack               Remaining Input   Action
1     []                  id + id * id      Shift id → [id]
2     [id]                + id * id         Reduce id → F (by F → id)
3     [F]                 + id * id         Reduce F → T (by T → F)
4     [T]                 + id * id         Reduce T → E (by E → T)
5     [E]                 + id * id         Shift + → [E, +]
6     [E, +]              id * id           Shift id → [E, +, id]
7     [E, +, id]          * id              Reduce id → F
8     [E, +, F]           * id              Reduce F → T
9     [E, +, T]           * id              Shift * → [E, +, T, *]
10    [E, +, T, *]        id                Shift id → [E, +, T, *, id]
11    [E, +, T, *, id]    (none)            Reduce id → F
12    [E, +, T, *, F]     (none)            Reduce T * F → T
13    [E, +, T]           (none)            Reduce E + T → E
14    [E]                 (none)            Reduce E → S
Now the stack has [S] and input is empty, so parsing succeeds.
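The two-action loop above can be sketched in Python. This is an illustrative hand-rolled parser for this one grammar, not a real LR parser: the lookahead tests (deciding not to reduce while a higher-precedence * is pending) are chosen by hand here, where a table-driven parser would read them from generated tables.

```python
def shift_reduce_parse(tokens):
    """Shift-reduce loop for: S -> E; E -> E + T | T; T -> T * F | F; F -> ( E ) | id."""
    stack, i = [], 0
    la = lambda: tokens[i] if i < len(tokens) else '$'   # one-token lookahead
    while True:
        if stack[-1:] == ['id']:
            stack[-1:] = ['F']                  # F -> id
        elif stack[-3:] == ['(', 'E', ')']:
            stack[-3:] = ['F']                  # F -> ( E )
        elif stack[-3:] == ['T', '*', 'F']:
            stack[-3:] = ['T']                  # T -> T * F
        elif stack[-1:] == ['F']:
            stack[-1:] = ['T']                  # T -> F
        elif stack[-3:] == ['E', '+', 'T'] and la() != '*':
            stack[-3:] = ['E']                  # E -> E + T (unless * follows)
        elif stack[-1:] == ['T'] and la() != '*':
            stack[-1:] = ['E']                  # E -> T (unless * follows)
        elif stack == ['E'] and la() == '$':
            return True                         # accept: reduce E -> S, input done
        elif i < len(tokens):
            stack.append(tokens[i]); i += 1     # shift the next token
        else:
            return False                        # no action possible: reject

print(shift_reduce_parse(['id', '+', 'id', '*', 'id']))   # True
```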
Key Points
• Shift–reduce parsing is powerful and handles most programming-language grammars.
• It uses a stack to keep track of partial results.
• Shift actions push tokens; reduce actions apply grammar rules.
• Conflicts (shift/reduce or reduce/reduce) arise in ambiguous grammars and must be resolved by rewriting the grammar or using parser tables
with lookahead.
End of 10-mark answer for Question 1
Three-Address Code (TAC): How it looks
Suppose you have the high-level statement:
a = b + c * d;
A normal statement performs the multiplication and the addition in one step. In TAC, we split it:
t1 = c * d // multiply c and d, store in temporary t1
t2 = b + t1 // add b and t1, store in temporary t2
a = t2 // finally, assign t2 to a
• t1 and t2 are temporary variables the compiler invents.
• Each line has the form (result) = (operand1) (op) (operand2).
Key takeaways
1. Three-address: at most two inputs, one output per instruction.
2. Temporaries break down complex expressions into simple steps.
3. Ideal for later optimization (constant folding, dead-code elimination) and for turning into real
machine or assembly code.
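The same splitting can be sketched in code. gen_tac below is a hypothetical helper (not from the notes) that walks an expression given as nested tuples and emits one three-address instruction per operator:

```python
from itertools import count

def gen_tac(expr, code, tmp):
    """Return the name holding expr's value; append TAC lines to code."""
    if isinstance(expr, str):            # leaf: a variable or constant name
        return expr
    op, left, right = expr               # interior node: (op, lhs, rhs)
    l = gen_tac(left, code, tmp)         # at most two inputs per instruction
    r = gen_tac(right, code, tmp)
    t = f"t{next(tmp)}"                  # fresh compiler-invented temporary
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = gen_tac(('+', 'b', ('*', 'c', 'd')), code, count(1))
code.append(f"a = {result}")
print(code)   # ['t1 = c * d', 't2 = b + t1', 'a = t2']
```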
Summary:
• Static for fixed-size, lifetime-long data.
• Stack for short-lived, well-nested data (function locals).
• Heap for dynamic, flexible data whose size and lifetime vary at run time.
S-Attributed Definitions
• Flow direction: Bottom-up only.
• How it works: Every attribute is synthesized, meaning it’s computed from a symbol’s children in
the parse tree and passed up to its parent.
• Use case: Ideal for bottom-up parsers (like LR) because you naturally build subtrees first, then
combine their attributes.
Example:
For an expression E → E1 + T, you might compute E.value = E1.value + T.value. Both E1.value and T.value
come from child nodes and synthesize into E.value.
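A minimal sketch of that bottom-up flow, with the parse tree as nested tuples and every attribute synthesized from the children:

```python
# S-attributed evaluation: a node's value is computed only from its
# children's values, so one bottom-up pass over the tree suffices.
def val(node):
    if isinstance(node, int):                  # leaf: value is intrinsic
        return node
    op, left, right = node                     # e.g. E -> E1 + T
    l, r = val(left), val(right)               # children first (bottom-up)
    return l + r if op == '+' else l * r       # E.value = E1.value op T.value

print(val(('+', 2, ('*', 3, 4))))   # 14
```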
L-Attributed Definitions
• Flow direction: Left-to-right across each production.
• How it works: You can have inherited attributes from the left siblings or parent, and synthesized
attributes to the parent.
• Use case: Works for top-down parsers (like recursive-descent) because you can pass information
(inherited attributes) as you expand the grammar from left to right.
Example:
For a declaration list D → type L, you might pass D.type down to L so each variable in L knows its type
(L.inheritedType = D.type), and then L synthesizes attribute information back up if needed.
Key Differences
• S-Attributed: Only synthesized attributes (children → parent).
• L-Attributed: Allows inherited (parent/left → child) and synthesized (child → parent).
Both help the compiler check types, compute values, or manage symbol-table entries as it parses code.
How It Works
1. Nodes represent either:
○ Operands (variables or constants), or
○ Operators (like +, *).
2. Edges point from operands into the operator that uses them.
3. If two parts of the code compute the same operator on the same operands, they share the same
DAG node.
Simple Example
Consider the expression:
a + b + a + b
A naïve parse tree would compute a + b twice. A DAG merges the two occurrences into a single shared node.
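In place of a drawing, the sharing can be sketched with hash-consing: node is a hypothetical helper that reuses an existing DAG node whenever the same (operator, operands) combination recurs. Here the expression is grouped as (a + b) + (a + b):

```python
# Build a DAG by memoizing nodes on their (op, operands) key, so any
# repeated subexpression maps onto one shared node.
nodes = {}

def node(op, *kids):
    key = (op, kids)
    if key not in nodes:            # create a node only on first sight
        nodes[key] = len(nodes)     # small integer as the node id
    return nodes[key]

a, b = node('a'), node('b')
s1 = node('+', a, b)                # first a + b
s2 = node('+', a, b)                # second a + b reuses the same node
root = node('+', s1, s2)
print(s1 == s2, len(nodes))         # True 4  (a plain tree needs 7 nodes)
```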
Benefits
• Eliminates duplicate work by finding common subexpressions.
• Simplifies later optimization (like constant folding).
• Reduces the number of temporaries and instructions in generated code.
Key Takeaway: A DAG is a compact, shared representation of expressions that lets the compiler spot
and eliminate redundant calculations—making your resulting program faster and leaner.
Que.6. Differentiate Top-Down Parsing (TDP) and Bottom-Up Parsing (BUP) parsers with examples.
Ans:
Top-Down vs. Bottom-Up Parsing [10M]
Parsers turn a flat list of tokens into a structured representation (a parse tree) of your program. There
are two main approaches:
Side-by-Side Comparison
Aspect                Top-Down (LL)                            Bottom-Up (LR)
Start point           Grammar's start symbol                   Input tokens
Core actions          Predict and expand rules                 Shift tokens, reduce by rules
Grammar restrictions  No left recursion; must be left-factored Handles left recursion; more general
Implementation        Often handwritten recursive code         Usually table-driven (tool-generated)
Error detection       Catches errors early (predictive)        Good at finding where a reduction fails
Tool support          ANTLR, recursive-descent by hand         Yacc, GNU Bison
Key Takeaway:
Top-down parsers predict structure from the top, while bottom-up parsers construct structure from the
tokens up. Bottom-up covers more grammars but is more complex; top-down is simpler but less
powerful.
Que. 7. What is left recursion and left factoring? Explain with suitable examples.
Ans
Left Recursion & Left Factoring [5M]
1. Left Recursion
• What it is:
A grammar rule is left-recursive if a nonterminal refers to itself as its very first symbol. In formal
terms, a production like
A → A α | β
is left-recursive because A appears on the left side and also at the start of the right side.
• Why it’s a problem:
Top-down parsers (like recursive-descent) expand rules by calling functions or pushing symbols. If
you try to expand A → A α, you end up in an infinite loop—A calls itself without consuming any
input.
• How to remove:
Rewrite the rules so that A no longer appears first. Given
A → A α | β
rewrite as
A → β A'
A' → α A' | ε
Here, A' is a new helper nonterminal, and ε means “empty.” You first match β, then zero or more
repetitions of α.
• Example:
Original:
Expr → Expr + Term | Term
Rewritten:
Expr → Term Expr'
Expr' → + Term Expr' | ε
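The rewritten rule translates directly into recursive-descent code: the helper nonterminal Expr' becomes a loop instead of a self-call, so there is no infinite recursion. A sketch, with a hypothetical 'num' token standing in for Term:

```python
# Expr  -> Term Expr'
# Expr' -> + Term Expr' | ε    (realized below as a while-loop)
def parse_expr(tokens, i=0):
    i = parse_term(tokens, i)                       # Expr -> Term ...
    while i < len(tokens) and tokens[i] == '+':     # zero or more "+ Term"
        i = parse_term(tokens, i + 1)
    return i                                        # index after consumed input

def parse_term(tokens, i):
    if i < len(tokens) and tokens[i] == 'num':
        return i + 1
    raise SyntaxError(f"expected num at position {i}")

print(parse_expr(['num', '+', 'num', '+', 'num']))   # 5 (all tokens consumed)
```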
2. Left Factoring
• What it is:
Left factoring restructures grammar rules that share a common beginning so that a parser can
decide which path to take by looking at the next token.
• Why it’s needed:
In a predictive (LL) parser, if you have
A → α β1 | α β2
both alternatives start with the same prefix α. A one-token lookahead can’t tell whether to choose
β1 or β2.
• How to apply:
Factor out the common prefix α into a single rule, and use a new helper nonterminal:
A → α A'
A' → β1 | β2
• Example:
Original:
Stmt → if Expr then Stmt else Stmt
| if Expr then Stmt
Left-factor:
Stmt → if Expr then Stmt Stmt'
Stmt'→ else Stmt | ε
Key Takeaways:
• Left Recursion must be eliminated for top-down parsers by introducing a new nonterminal and
rewriting the rules.
• Left Factoring groups common prefixes so that a parser can make a decision with limited
lookahead.
Both techniques help build simple, efficient predictive parsers without infinite loops or choice conflicts.
Que. 8 Verify whether the following grammar is LL(1) or not? E -> E + T | T; T -> T * F | F; F -> (E) | id
Ans: LL(1) Check for the Expression Grammar [5M]
Grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
How to Fix It
To make this grammar LL(1), we must remove left recursion and left-factor:
1. Remove Left Recursion for E:
E → T E'
E' → + T E' | ε
2. Remove Left Recursion for T:
T → F T'
T' → * F T' | ε
3. F is already fine:
F → ( E ) | id
Now each nonterminal’s alternatives start with distinct tokens (or ε), so a single-lookahead LL(1) parser
can unambiguously pick the right production:
• When you see an id or (, you choose T (for E → T E') or F (in T → F T').
• When you see +, you know to expand E' → + T E'.
• When you see *, you expand T' → * F T'.
• When you see ) or the end-of-input marker, you pick the ε-rules.
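Those choices are exactly the entries of an LL(1) parsing table. A sketch of the table-driven parse (primes written E1/T1; '$' marks end-of-input):

```python
# Predictive parse table for the rewritten grammar:
# E -> T E'; E' -> + T E' | ε; T -> F T'; T' -> * F T' | ε; F -> ( E ) | id
TABLE = {
    ('E',  'id'): ['T', 'E1'],      ('E',  '('): ['T', 'E1'],
    ('E1', '+'):  ['+', 'T', 'E1'], ('E1', ')'): [], ('E1', '$'): [],
    ('T',  'id'): ['F', 'T1'],      ('T',  '('): ['F', 'T1'],
    ('T1', '*'):  ['*', 'F', 'T1'], ('T1', '+'): [], ('T1', ')'): [], ('T1', '$'): [],
    ('F',  'id'): ['id'],           ('F',  '('): ['(', 'E', ')'],
}

def ll1_parse(tokens):
    stack, tokens, i = ['$', 'E'], tokens + ['$'], 0
    while stack:
        top = stack.pop()
        if top == tokens[i]:                    # terminal: match and advance
            i += 1
        elif (top, tokens[i]) in TABLE:         # nonterminal: expand via table
            stack.extend(reversed(TABLE[(top, tokens[i])]))
        else:
            return False                        # no table entry: syntax error
    return i == len(tokens)

print(ll1_parse(['id', '+', 'id', '*', 'id']))   # True
```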
Conclusion:
The original grammar is not LL(1) because of left recursion and FIRST/FIRST conflicts. After rewriting
(removing left recursion and factoring), it becomes suitable for a single-lookahead top-down parser.
Que.9 Differentiate between S-attributed and L-attributed definitions with suitable examples
Ans:
9. Difference Between S-Attributed and L-Attributed Definitions [5M]
Both S-attributed and L-attributed schemes describe how a parser carries extra information (attributes)
through a parse tree. Here’s how they differ:
S-Attributed Definitions
• Attribute Direction: Only synthesized—data flows up from child nodes to their parent.
• When to Use: Ideal for bottom-up parsers (LR family), since children are processed before
parents.
• Example:
In an expression E → E1 + T, you compute
E.val = E1.val + T.val
Both E1.val and T.val already exist, so you simply combine them.
L-Attributed Definitions
• Attribute Direction: Allows inherited attributes (from parent or left siblings) and synthesized
attributes.
• When to Use: Works for top-down parsers (LL family), because you can pass information down as
you expand nonterminals left-to-right.
• Example:
For a declaration D → type L, you pass the type down (L.in = D.type).
Then each variable in L knows its type before you process it. You may also synthesize additional
info back up if needed.
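A minimal sketch of the inherited flow, with a hypothetical declare helper and the declaration given as a (type, names) pair:

```python
# L-attributed flow for D -> type L: the type is inherited (passed down)
# left-to-right, so each name knows its type as soon as it is reached.
def declare(decl, symtab):
    dtype, names = decl                  # e.g. ('int', ['x', 'y'])
    for name in names:                   # each name inherits L.in = D.type
        symtab[name] = dtype
    return symtab

print(declare(('int', ['x', 'y']), {}))   # {'x': 'int', 'y': 'int'}
```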
Summary of Differences
Feature               S-Attributed        L-Attributed
Attribute kinds       Synthesized only    Inherited and synthesized
Information flow      Child → parent      Left-to-right; parent/left → child and child → parent
Parser type           Bottom-up (LR)      Top-down (LL)
Grammar restrictions  None beyond CFG     Inherited attributes must respect left-to-right flow
Both schemes help a compiler check types, compute values, and manage symbol tables systematically as
it parses code.
1. Preprocessor
• Job: Handle directives like #include and #define, expand macros, and strip out comments.
• Result: A “clean” source file (often with an .i extension) that only contains the raw code the
compiler needs.
2. Compiler Proper
This core phase itself has six steps:
1. Lexical Analysis
○ What it does: Reads the cleaned source text and chops it into tokens (like int, x, =, 42, ;).
○ Also: Builds a symbol table entry for each new identifier.
2. Syntax Analysis (Parsing)
○ What it does: Takes the tokens and checks they follow the language’s grammar.
○ Result: A parse tree that shows the nesting of statements and expressions.
3. Semantic Analysis
○ What it does: Walks the parse tree to ensure the code “makes sense”:
▪ Types match (no adding an integer to a string).
▪ Variables were declared before use.
▪ Scopes are respected.
○ Result: An annotated tree with type and scope information.
4. Intermediate Code Generation
○ What it does: Converts the annotated tree into a simple, machine-independent form, like
three-address code.
○ Why: This makes later optimizations and target-specific code generation easier.
5. Code Optimization
○ What it does: Improves the intermediate code without changing what it does:
▪ Constant folding: Compute 3 + 4 at compile time.
▪ Dead-code elimination: Remove calculations whose results aren’t used.
▪ Loop simplifications, etc.
6. Code Generation
○ What it does: Transforms optimized intermediate code into actual assembly instructions for
the target CPU (x86, ARM, etc.).
○ Tasks: Assign registers, choose instructions, and lay out memory addresses.
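As a rough illustration of step 5, a hypothetical fold helper performing constant folding over three-address-style tuples, where ('t1', '+', 3, 4) means t1 = 3 + 4:

```python
# Constant folding: instructions whose operands are all known constants
# are evaluated at "compile time" and removed from the output stream.
def fold(code):
    consts, out = {}, []
    for dst, op, a, b in code:
        a = consts.get(a, a)                 # substitute already-known constants
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dst] = a + b if op == '+' else a * b   # fold now
        else:
            out.append((dst, op, a, b))      # keep, with simplified operands
    return out, consts

out, consts = fold([('t1', '+', 3, 4), ('t2', '*', 't1', 'x')])
print(out, consts)   # t1 folded to 7; t2 keeps its runtime operand x
```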
3. Assembler
• Job: Take the assembly file (e.g. program.s) and convert it into an object file (program.o) of machine code.
4. Linker
• Job: Combine object files and any needed libraries into one executable, resolving references between them.
5. Loader
• Job: Bring the executable into memory and start it running.
Example Flow
Compiling hello.c that prints “Hello” typically does:
1. Preprocessor → hello.i
2. Compiler Proper → hello.s
3. Assembler → hello.o
4. Linker → hello.exe (or a.out)
5. Loader → Executes and shows Hello
Key Takeaway:
Although you usually run one command (like gcc hello.c), under the hood your code moves through
these clear stages—preprocessing, tokenizing, parsing, semantic checks, intermediate-code,
optimizations, assembly, linking, and finally execution. Each step plays a vital role in turning high-level
code into a working program.
Because of these limitations, top-down parsing is best suited for simple, well-structured grammars or
educational purposes. For full-scale language compilers, bottom-up parsers are often preferred.
Que 12. Explain the following terms: LEX, Dangling else Grammar.
A. LEX
• What it is:
LEX is a tool that automatically builds a lexical analyzer (also called a scanner) from a high-level
specification of token patterns.
• How you use it:
1. You write a file (e.g. scanner.l) containing regular-expression patterns and actions (usually
snippets of C code).
2. Running lex scanner.l generates a C source file (lex.yy.c).
3. You compile that (gcc -o scanner lex.yy.c -lfl), producing an executable that reads your
program text and outputs a stream of tokens.
• Why it’s useful:
○ Saves you from hand-coding messy character-reading loops.
○ Lets you express token rules cleanly (keywords, identifiers, numbers, operators).
○ Integrates easily into a compiler pipeline (feeding tokens into a parser).
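LEX's pattern-action model can be imitated in a few lines of Python (an illustrative sketch only: real LEX compiles the patterns into one DFA and prefers the longest match, while this sketch simply tries the rules in order):

```python
import re

# Ordered (name, pattern) rules, in the spirit of a LEX specification.
RULES = [(name, re.compile(pat)) for name, pat in
         [('NUM', r'\d+'), ('ID', r'[A-Za-z_]\w*'), ('OP', r'[+*=();]')]]

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():            # skip whitespace between tokens
            i += 1
            continue
        for name, pat in RULES:          # first rule that matches wins
            m = pat.match(text, i)
            if m:
                tokens.append((name, m.group()))
                i = m.end()
                break
        else:
            raise SyntaxError(f"bad character {text[i]!r}")
    return tokens

print(tokenize("x = 42;"))   # [('ID', 'x'), ('OP', '='), ('NUM', '42'), ('OP', ';')]
```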
B. Dangling-Else Grammar
• What the problem is:
Consider nested if statements without braces:
if (cond1)
if (cond2)
stmt1;
else
stmt2;
It’s ambiguous whether the else pairs with the inner if or the outer one.
• Why it matters:
Ambiguity in the grammar means a parser might build two different parse trees for the same
code, leading to inconsistent behavior.
• Classic solution:
Modify the grammar so that each else always attaches to the nearest unmatched if. In BNF form:
Stmt → MatchedStmt | UnmatchedStmt
MatchedStmt → if Expr then MatchedStmt else MatchedStmt | OtherStmt
UnmatchedStmt → if Expr then Stmt
| if Expr then MatchedStmt else UnmatchedStmt
Here, MatchedStmt always has its else, while UnmatchedStmt may not. This ensures the else
binds to the closest if.
Que 13. Remove Left Recursion from the grammar: S -> Aa | b, A -> Ac | Sd | e.
Ans
Removing Left Recursion [10M]
Left recursion occurs when a nonterminal calls itself as the first symbol on its right side. In a top-down
parser, this causes infinite loops. We must rewrite the grammar to eliminate that.
Original Grammar
S → A a | b
A → A c | S d | ε    (reading the question’s “e” as the empty string ε)
A is directly left-recursive (A → A c). Eliminate that first, with the non-recursive alternatives S d and ε:
A → S d A′
| A′
A′ → c A′
| ε
Substitute A into S → A a and distribute:
S → S d A′ a    (still left-recursive)
| A′ a
| b
We still have S → S …, so remove S’s direct left recursion using the same trick:
Let Y = the non-recursive alternatives: A′ a and b.
So:
S → Y S′
S′ → d A′ a S′ | ε
Putting back Y, the final grammar is:
S → A′ a S′
| b S′
S′ → d A′ a S′
| ε
A′ → c A′
| ε
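As a sanity check on the final grammar, we can enumerate every sentence it generates up to a recursion bound and compare with strings derived by hand from the original (an illustrative sketch; primes written A1/S1, and the question's e read as ε):

```python
from itertools import product

# The rewritten grammar: S -> A' a S' | b S'; S' -> d A' a S' | ε; A' -> c A' | ε
G = {
    'S':  [['A1', 'a', 'S1'], ['b', 'S1']],
    'S1': [['d', 'A1', 'a', 'S1'], []],
    'A1': [['c', 'A1'], []],
}

def sentences(sym, depth):
    """All terminal strings derivable from sym within a recursion bound."""
    if sym not in G:
        return {sym}                     # terminal: the symbol itself
    if depth == 0:
        return set()                     # bound hit: stop expanding
    out = set()
    for rhs in G[sym]:
        parts = [sentences(s, depth - 1) for s in rhs]
        for combo in product(*parts):    # every way to combine the parts
            out.add(''.join(combo))
    return out

# Short sentences: each is also derivable from S -> Aa | b, A -> Ac | Sd | ε
print(sorted(s for s in sentences('S', 5) if len(s) <= 4))
```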