Chapter 3
Syntax Analysis
• Syntax analysis, also known as parsing, is the second
phase of a compiler.
• It takes the tokens generated by the lexical analysis
phase and organizes them into a hierarchical
structure, usually represented as a parse tree or
syntax tree, based on the grammar of the
programming language.
Role of the Parser
• Validate Syntax: Ensure the program's structure
conforms to the language's grammar rules.
• Generate Parse Tree: Represent the syntactic structure
of the source code.
• Error Detection and Recovery: Identify syntax errors and
recover from them so that parsing can continue and report further errors.
• Prepare for Semantic Analysis: Provide a structure for
semantic analysis and intermediate code generation.
Context-Free Grammar (CFG)
• A Context-Free Grammar (CFG) is a formalism used in syntax
analysis to define the syntax of a programming language.
• Its rules describe the structure of the valid strings
(programs) of the language.
• A CFG consists of four key components:
Terminals
Non-terminals
Production rules
Start symbol
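To make the four components concrete, here is a minimal Python sketch that stores a CFG as a 4-tuple. It uses the arithmetic terminals that appear as examples in this chapter (+, *, (, ), id). The class name `Grammar` and the encoding of each production as a pair (A, α), with α a tuple of symbols, are illustrative choices and not a standard API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    """A CFG as the 4-tuple (terminals, non-terminals, productions, start)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (A, alpha), where alpha is a tuple of symbols
    start: str

    def __post_init__(self):
        # A symbol cannot be both a terminal and a non-terminal.
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals overlap")
        # Derivations begin from the start symbol, so it must be a non-terminal.
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")

g = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    nonterminals=frozenset({"Expr", "Term", "Factor"}),
    productions=(
        ("Expr", ("Expr", "+", "Term")),
        ("Expr", ("Term",)),
        ("Term", ("Term", "*", "Factor")),
        ("Term", ("Factor",)),
        ("Factor", ("(", "Expr", ")")),
        ("Factor", ("id",)),
    ),
    start="Expr",
)
print(len(g.productions))  # 6
```

Making the class frozen reflects the fact that a grammar is a fixed definition; the checks in `__post_init__` reject inconsistent combinations of components as soon as the grammar is built.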
Terminals
• Definition: The basic symbols of the language,
which cannot be broken down further.
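• Conventionally written in lowercase (e.g., id) or as the
symbol itself (e.g., +); terminals correspond to the tokens
produced by the lexical analyzer.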
• Example
• In arithmetic expressions: +, *, (, ), id (identifier).
• In programming languages: if, else, while.
Non-Terminals
• Symbols that represent groups of strings or structures in
the language. They are conventionally written with capital
letters (or capitalized names) and can be further derived,
i.e., expanded using production rules.
• They are placeholders for patterns defined in terms of
terminals and other non-terminals.
• Example:
• Expr (Expression), Stmt (Statement), Term, Factor
Start Symbol
• A designated non-terminal from which every derivation
(and therefore parsing) begins.
• Example:
For arithmetic expressions, the start symbol might be Expr
Production Rules
• Rules that describe how a non-terminal can be replaced by
terminals and/or other non-terminals.
• The right-hand side of a rule combines terminals and
non-terminals into a string.
• Form: A production rule has the format:
• A → α
Where:
• A is a non-terminal.
• α is a sequence of terminals and/or non-terminals.
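The A → α form is what makes a grammar context-free: the left side is a single non-terminal, so it can be replaced no matter which symbols surround it. A small Python sketch, using a hypothetical helper `is_cfg_production`, checks whether a rule has that form. The symbol sets below borrow the chapter's examples (if, Stmt, Expr) and are assumptions for illustration.

```python
def is_cfg_production(lhs, rhs, nonterminals, terminals):
    """Return True if the rule lhs -> rhs has the CFG form A -> alpha.

    lhs and rhs are tuples of grammar symbols. The left side must be
    exactly one non-terminal; the right side may mix terminals and
    non-terminals (an empty right side stands for epsilon).
    """
    symbols = nonterminals | terminals
    return (
        len(lhs) == 1
        and lhs[0] in nonterminals
        and all(sym in symbols for sym in rhs)
    )

N = {"Stmt", "Expr"}
T = {"if", "(", ")", "id"}

# Stmt -> if ( Expr ) Stmt : a single non-terminal on the left
print(is_cfg_production(("Stmt",), ("if", "(", "Expr", ")", "Stmt"), N, T))  # True
# if Stmt -> id : two symbols on the left, so this is not a CFG rule
print(is_cfg_production(("if", "Stmt"), ("id",), N, T))  # False
```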
Example 1
A CFG for simple arithmetic expressions such as 3 + 5,
10 * (2 + 3), or 5 - (3 / 2).
Key Components of the CFG:
Variables (Non-terminal symbols): These represent
syntactic categories.
Expr: Represents an arithmetic expression.
Term: Represents a term (part of an expression that can be
multiplied or divided).
Factor: Represents a factor (the smallest unit, such as a
number or a parenthesized sub-expression).
Terminals: These are the actual symbols in the
language (operators, numbers, parentheses).
Operators: +, -, *, /
Parentheses: (, )
NUMBER: Represents any numeric value, e.g., 3, 5,
or 10.
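Terminals such as NUMBER are the tokens that the lexical analyzer hands to the parser. The following is a minimal Python sketch of such a lexer for this grammar. It assumes that NUMBER covers unsigned integers and decimals like 3.5, and it uses a hypothetical `tokenize` function that returns (kind, lexeme) pairs, where the kind is either NUMBER or the operator/parenthesis itself.

```python
import re

# NUMBER (integer or decimal) or one of the single-character terminals.
TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)|([-+*/()])")

def tokenize(text):
    """Split text into (kind, lexeme) pairs; kind is NUMBER or the symbol itself."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace separates tokens but is not one
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        number, symbol = m.groups()
        tokens.append(("NUMBER", number) if number is not None else (symbol, symbol))
        pos = m.end()
    return tokens

print([kind for kind, _ in tokenize("10 * (2 + 3)")])
# ['NUMBER', '*', '(', 'NUMBER', '+', 'NUMBER', ')']
```

The parser only looks at the kinds. The lexemes are kept so that later phases can recover the actual values.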
Production Rules: These define how variables (non-terminals)
can be expanded into terminals or other non-terminals.
Start Symbol: This is the symbol from which the
parsing starts.
In this case, the start symbol is Expr.
Example 2
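For example, the expression 3 + 5 * (2 - 1) can be generated by the following grammar.
Variables (Non-terminals): Expr, Term, Factor
Terminals: +, -, *, /, (, ), NUMBER
Production Rules:
1. Expr → Expr + Term
2. Expr → Expr - Term
3. Expr → Term
4. Term → Term * Factor
5. Term → Term / Factor
6. Term → Factor
7. Factor → ( Expr )
8. Factor → NUMBER
Start Symbol: Expr
Leftmost derivation of 3 + 5 * (2 - 1), with the rule applied at each step (each NUMBER is shown as its value):
Expr ⇒ Expr + Term (rule 1)
⇒ Term + Term (rule 3)
⇒ Factor + Term (rule 6)
⇒ 3 + Term (rule 8)
⇒ 3 + Term * Factor (rule 4)
⇒ 3 + Factor * Factor (rule 6)
⇒ 3 + 5 * Factor (rule 8)
⇒ 3 + 5 * ( Expr ) (rule 7)
⇒ 3 + 5 * ( Expr - Term ) (rule 2)
⇒ 3 + 5 * ( Term - Term ) (rule 3)
⇒ 3 + 5 * ( Factor - Term ) (rule 6)
⇒ 3 + 5 * ( 2 - Term ) (rule 8)
⇒ 3 + 5 * ( 2 - Factor ) (rule 6)
⇒ 3 + 5 * ( 2 - 1 ) (rule 8)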
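A derivation like this can be checked mechanically. The Python sketch below uses a hypothetical helper, `leftmost_derivation`. It replays a list of rule numbers from Example 2, always expanding the leftmost non-terminal, and returns every sentential form. NUMBER is kept as a token name rather than replaced by its value.

```python
# Example 2's production rules, keyed by their rule numbers.
RULES = {
    1: ("Expr", ["Expr", "+", "Term"]),
    2: ("Expr", ["Expr", "-", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["Term", "*", "Factor"]),
    5: ("Term", ["Term", "/", "Factor"]),
    6: ("Term", ["Factor"]),
    7: ("Factor", ["(", "Expr", ")"]),
    8: ("Factor", ["NUMBER"]),
}
NONTERMINALS = {"Expr", "Term", "Factor"}

def leftmost_derivation(rule_numbers, start="Expr"):
    """Expand the leftmost non-terminal with each rule in turn; return all sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        # Position of the leftmost non-terminal, or None if the form is all terminals.
        i = next((k for k, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule {n} cannot expand the leftmost non-terminal")
        form = form[:i] + rhs + form[i + 1:]  # replace A by alpha
        forms.append(" ".join(form))
    return forms

# Rule sequence for 3 + 5 * (2 - 1)
for line in leftmost_derivation([1, 3, 6, 8, 4, 6, 8, 7, 2, 3, 6, 8, 6, 8]):
    print(line)
# last line printed: NUMBER + NUMBER * ( NUMBER - NUMBER )
```

If a rule number does not match the leftmost non-terminal (for example, applying rule 4 directly to Expr), the helper raises an error. That error is exactly the check a reader performs by hand when verifying a derivation.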
Parsing Techniques
Top-Down Parsing:
Begins at the start symbol (the root) and builds the
parse tree downward toward the leaves.
Examples: Recursive Descent, LL(1).
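The following is a minimal Python sketch of a recursive-descent parser for the Example 2 grammar. A left-recursive rule such as Expr → Expr + Term would make the Expr procedure call itself without consuming input. The sketch therefore uses the common rewrite into repetition loops, shown in the docstring. It also assumes that NUMBER is a non-negative integer, and it builds a compact syntax tree of nested tuples rather than a full parse tree. The names `Parser`, `tokenize`, `peek`, and `eat` are illustrative. At each step, the parser chooses what to do by looking at one token of lookahead, which is the idea behind LL(1) parsing.

```python
import re

def tokenize(text):
    """Return (kind, lexeme) pairs ending with the end marker ("$", "")."""
    tokens = []
    for number, symbol, bad in re.findall(r"\s*(?:(\d+)|([-+*/()])|(\S))", text):
        if bad:
            raise SyntaxError(f"unexpected character {bad!r}")
        tokens.append(("NUMBER", number) if number else (symbol, symbol))
    tokens.append(("$", ""))
    return tokens

class Parser:
    """One method per non-terminal. Left recursion is replaced by loops:
        Expr   -> Term { (+ | -) Term }
        Term   -> Factor { (* | /) Factor }
        Factor -> ( Expr ) | NUMBER
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]  # kind of the lookahead token

    def eat(self, kind):
        found, lexeme = self.tokens[self.pos]
        if found != kind:
            raise SyntaxError(f"expected {kind!r} but found {found!r}")
        self.pos += 1
        return lexeme

    def parse(self):
        tree = self.expr()
        self.eat("$")  # reject trailing tokens such as "3 4"
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat(self.peek())
            node = (op, node, self.term())  # loop keeps operators left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat(self.peek())
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return int(self.eat("NUMBER"))

print(Parser("3 + 5 * (2 - 1)").parse())
# ('+', 3, ('*', 5, ('-', 2, 1)))
```

The grammar puts * and / in Term, below Expr, so `term` groups 5 * (2 - 1) before `expr` applies the addition. A malformed input such as "3 +" raises a `SyntaxError` at the first token that no rule can accept.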
Bottom-Up Parsing:
Builds the parse tree from leaves to root.
Examples: LR(0), SLR(1), LALR(1), Canonical LR.
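For contrast, here is a bottom-up sketch in Python. It is a shift-reduce parser for the same Example 2 grammar: it shifts tokens onto a stack and reduces a handle (the right side of a rule on top of the stack) to that rule's left side. The shift/reduce choices are hand-coded for this one grammar using one token of lookahead, which is an assumption made for brevity. LR(0), SLR(1), LALR(1), and canonical LR parsers instead look these decisions up in a table built from the grammar. The function names `tokenize` and `shift_reduce` are illustrative.

```python
import re

def tokenize(text):
    """Token kinds only (NUMBER, operators, parentheses), ending with '$'."""
    kinds = []
    for number, symbol, bad in re.findall(r"\s*(?:(\d+)|([-+*/()])|(\S))", text):
        if bad:
            raise SyntaxError(f"unexpected character {bad!r}")
        kinds.append("NUMBER" if number else symbol)
    return kinds + ["$"]

def shift_reduce(text):
    """Bottom-up parse; return the reductions in the order they were made."""
    tokens = tokenize(text)
    stack, i, reductions = [], 0, []

    def reduce(n, lhs, rule):
        del stack[len(stack) - n:]  # pop the handle (right side) ...
        stack.append(lhs)           # ... and push the rule's left side
        reductions.append(rule)

    while True:
        top, la = (stack[-1] if stack else None), tokens[i]
        if top == "NUMBER":
            reduce(1, "Factor", "Factor -> NUMBER")
        elif top == ")" and stack[-3:] == ["(", "Expr", ")"]:
            reduce(3, "Factor", "Factor -> ( Expr )")
        elif top == "Factor":
            if stack[-3:-1] in (["Term", "*"], ["Term", "/"]):
                reduce(3, "Term", f"Term -> Term {stack[-2]} Factor")
            else:
                reduce(1, "Term", "Term -> Factor")
        elif top == "Term" and la not in ("*", "/"):  # * and / bind tighter: shift them
            if stack[-3:-1] in (["Expr", "+"], ["Expr", "-"]):
                reduce(3, "Expr", f"Expr -> Expr {stack[-2]} Term")
            else:
                reduce(1, "Expr", "Expr -> Term")
        elif top == "Expr" and la == "$" and stack == ["Expr"]:
            return reductions  # accept: input consumed, only the start symbol left
        elif la != "$":
            stack.append(la)  # shift the lookahead token
            i += 1
        else:
            raise SyntaxError(f"cannot shift or reduce; stack = {stack}")

for rule in shift_reduce("3 + 5 * (2 - 1)"):
    print(rule)
# last line printed: Expr -> Expr + Term
```

The list of reductions is a rightmost derivation in reverse: the final reduction, Expr → Expr + Term, is the first step of the derivation from the start symbol. This matches how bottom-up parsing builds the tree from the leaves up to the root.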