Parser and CFG
CONSTRUCTION
USMAN ASGHAR
Introduction to Parsing
•Definition: Parsing is the process of analyzing a sequence of symbols to determine its grammatical structure with respect to a given formal grammar.
•Role in Compilers: The parser takes the tokens of the input code and converts them into a structure (such as a parse tree or syntax tree) that represents the syntactic structure of the source code.
•Phases Leading to a Parse:
• Lexical Analysis (handled by a lexer): Breaks the source code into tokens.
• Syntax Analysis (handled by a parser): Checks whether the token sequence conforms to the grammar rules.
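The lexical analysis step can be sketched in a few lines of Python. This is only an illustration: the token kinds (ID, NUM, OP) and the name tokenize are assumptions made for this example, not part of the slides.

```python
import re

# Hypothetical token kinds for a tiny expression language (an assumption for this sketch).
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),   # identifiers
    ("NUM", r"\d+"),           # integer literals
    ("OP", r"[+\-*/()]"),      # operators and parentheses
    ("SKIP", r"\s+"),          # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Lexical analysis: turn a character string into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

The parser then works on this token list rather than on raw characters, for example the result of tokenize("a + b * (c)").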
Top-Down Parsing Overview
•Top-Down Parsing: Begins at the root (start symbol) and attempts to construct a parse tree by
expanding each node based on production rules, aiming to match the input from left to right.
•Recursive Descent Parsing: Uses a set of recursive functions to process the input. Each function
corresponds to a grammar rule. Suitable for grammars without left recursion.
•Predictive Parsing: A form of recursive descent that needs no backtracking because it uses lookahead to decide which rule to apply. Works well with grammars that are LL(1), meaning they can be parsed with one token of lookahead.
Example of a Grammar for an Expression
E → T + E | T
T → F * T | F
F → (E) | id
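A recursive descent parser for this grammar might look like the sketch below, with one function per non-terminal. The class name Parser, the helper parse, and the choice to represent tokens as plain strings ("id", "+", "*", "(", ")") are assumptions made for the example. It returns a simplified nested-tuple syntax tree rather than a full parse tree.

```python
class Parser:
    """Recursive descent parser for E -> T + E | T, T -> F * T | F, F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead; None marks the end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r} at token {self.pos}")
        self.pos += 1

    def parse_E(self):
        # Both alternatives start with T, so parse T first and then look ahead.
        left = self.parse_T()
        if self.peek() == "+":
            self.expect("+")
            return ("+", left, self.parse_E())
        return left

    def parse_T(self):
        left = self.parse_F()
        if self.peek() == "*":
            self.expect("*")
            return ("*", left, self.parse_T())
        return left

    def parse_F(self):
        if self.peek() == "(":
            self.expect("(")
            inner = self.parse_E()
            self.expect(")")
            return inner
        self.expect("id")
        return "id"


def parse(tokens):
    """Parse a complete token list and reject trailing input."""
    parser = Parser(tokens)
    tree = parser.parse_E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r} at token {parser.pos}")
    return tree
```

For the input id + id * id, parse returns ("+", "id", ("*", "id", "id")), so * binds tighter than +. Because the rules are right-recursive, repeated + or * operators group to the right.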
Syntax Analysis
Role of the Parser
• Syntax Validation: Ensures that the input code follows the grammar of the language.
• Error Detection and Reporting: Identifies and reports syntax errors for the programmer to correct.
• Parse Tree Creation: Builds a hierarchical structure of the code, often as a syntax tree.
Context-Free Grammars (CFG)
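• Definition: A CFG is a set of (possibly recursive) production rules that generate the strings of a language. Each rule rewrites a single non-terminal, regardless of the context in which it appears.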
• Components of CFG:
• Terminals: Actual symbols of the language (e.g., keywords, operators).
• Non-terminals: Syntactic variables that define sets of strings.
• Production Rules: Define how terminals and non-terminals can be combined.
• Start Symbol: The non-terminal from which every derivation (and therefore parsing) begins.
Example CFG for Arithmetic Expressions
E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
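The four components of this CFG can be written down as plain data. The sketch below is one possible encoding; the names GRAMMAR, START, non_terminals and terminals are assumptions for the example. Non-terminals are the dictionary keys, and every other symbol is treated as a terminal.

```python
# Production rules: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols.
GRAMMAR = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
START = "E"  # start symbol

def non_terminals(grammar):
    """Non-terminals are exactly the symbols that have production rules."""
    return set(grammar)

def terminals(grammar):
    """Terminals are the symbols that appear in rules but have no rules of their own."""
    return {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if symbol not in grammar
    }
```

Here terminals(GRAMMAR) yields the set containing +, -, *, /, (, ) and id.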
Writing a Grammar
•Consider Precedence and Associativity: Design grammar rules that respect operator precedence and associativity.
•Avoid Ambiguity: Craft rules to ensure each string has only one valid parse tree.
Example Grammar for Mathematical Operations
E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
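To see how this grammar encodes precedence and associativity, the sketch below parses a token list into nested tuples. It is an illustration under assumptions: the function name parse_expression is hypothetical, any non-operator token stands for id, and the rules E → E + T and T → T * F are implemented as loops, which produce the same left-leaning trees as the grammar.

```python
OPERATORS = {"+", "-", "*", "/", "(", ")"}

def parse_expression(tokens):
    """Return a nested tuple (op, left, right) that mirrors the E/T/F grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # F -> (E) | id
        if peek() == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if peek() is None or peek() in OPERATORS:
            raise SyntaxError(f"expected an operand, found {peek()!r}")
        return advance()

    def term():
        # T -> T * F | T / F | F : the lower level, so * and / bind tighter
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())   # left operand is the tree built so far
        return node

    def expr():
        # E -> E + T | E - T | T
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With tokens a - b - c the result is ("-", ("-", "a", "b"), "c"), which is left associativity. With a + b * c the result is ("+", "a", ("*", "b", "c")), which shows * taking precedence over +.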
Ambiguity in Grammar
•Ambiguous Grammar: A grammar is ambiguous if there exists a string with multiple valid parse
trees.
•Why Avoid Ambiguity? Ambiguous grammars make parsing inefficient and error-prone, as the
parser cannot uniquely determine the syntax tree structure.
Example of Ambiguous Grammar
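S → aSb | SS | ε
For example, the string ab has more than one parse tree: S ⇒ aSb ⇒ ab, and also S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab.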
Writing Grammar
Eliminating Ambiguity
• Method: Adjust grammar rules to enforce operator precedence, left or right associativity, and clarity in
structure.
• Example of Removing Ambiguity: Refine the grammar for expressions to respect operator precedence.
• Ambiguous Grammar:
E → E + E | E * E | id
• Non-Ambiguous Version:
E → E + T | T
T → T * F | F
F → id
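The ambiguity of E → E + E | E * E | id can be checked by counting parse trees. The sketch below is a small dynamic-programming count written for that one grammar; the function name count_parse_trees is an assumption, and tokens are the strings "id", "+" and "*". Every tree for a longer string has exactly one operator at its root, so the count sums over each possible root operator.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator strictly inside the span as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For id + id * id the count is 2: one tree groups it as (id + id) * id and the other as id + (id * id). Under the non-ambiguous version only the second grouping is possible.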
Elimination of Left Recursion
Definition: Left recursion occurs when a non-terminal appears as the first symbol on the right-hand side of one of its own productions (e.g., A → Aα | β).
Problem with Left Recursion: A top-down parser that expands A → Aα must expand A again before consuming any input, so it can recurse forever.
Solution: Rewrite the grammar to eliminate left recursion. In general, A → Aα | β becomes A → βA' and A' → αA' | ε.
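The rewrite for immediate left recursion can be sketched as a small function. The name eliminate_immediate_left_recursion and the list-of-symbols encoding are assumptions for this example; an empty list stands for ε. Indirect left recursion (A → Bα, B → Aβ) is not handled here.

```python
def eliminate_immediate_left_recursion(name, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | ε.

    `alternatives` is a list of productions, each a list of symbols.
    """
    # Left-recursive alternatives, with the leading A stripped (the alphas).
    # A production A -> A on its own is useless and is dropped.
    recursive = [alt[1:] for alt in alternatives if len(alt) > 1 and alt[0] == name]
    # Alternatives that do not start with A (the betas).
    others = [alt for alt in alternatives if not alt or alt[0] != name]

    if not recursive:
        return {name: alternatives}          # nothing to do
    if not others:
        raise ValueError(f"{name} has no non-recursive alternative")

    new_name = name + "'"
    return {
        name: [beta + [new_name] for beta in others],
        new_name: [alpha + [new_name] for alpha in recursive] + [[]],  # [] is ε
    }
```

Calling it with "E" and the alternatives [["E", "+", "T"], ["T"]] gives E → T E' and E' → + T E' | ε.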
Example of Removing Left Recursion
Original Left-Recursive Grammar
E → E + T | T
T → T * F | F
Transformed Grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
Left Factoring
• Definition: Left factoring is a technique to factor out common prefixes in productions. This helps the parser decide which production to use based on lookahead symbols.
• When to Apply: Use left factoring when two or more productions for a non-terminal start with the same sequence of symbols.
Example of Left Factoring
Before Left Factoring
A → αβ | αγ
After Left Factoring
A → αA'
A' → β | γ
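One pass of left factoring for a single non-terminal can be sketched as follows. The function names common_prefix and left_factor and the list-of-symbols encoding are assumptions for the example; an empty list stands for ε. The new non-terminals may themselves need another pass if their alternatives still share a prefix.

```python
def common_prefix(sequences):
    """Longest sequence of symbols shared at the start of every sequence."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(name, alternatives):
    """Factor out common prefixes among the alternatives of one non-terminal."""
    # Group alternatives by their first symbol (None for the empty alternative).
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {name: []}
    primes = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[name].extend(group)        # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_name = name + "'" * primes        # A', A'', ...
        result[name].append(prefix + [new_name])
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result
```

With "A" and the alternatives [["alpha", "beta"], ["alpha", "gamma"]], the result is A → alpha A' and A' → beta | gamma, matching the slide.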