Syntax Analysis Parsing (1)
Syntax Analysis Parsing (1)
3.1 Parsing
Types of Parsing:
1. Top-down Parsing: Constructs the parse tree from root to leaves (e.g., Recursive Descent
Parsing).
2. Bottom-up Parsing: Constructs the parse tree from leaves to root (e.g., LR Parsing).
Example:
Grammar:
E→E+T|T
T→T*F|F
F → (E) | id
Input: id + id * id
Top-down parsing starts at the root of the parse tree and expands non-terminals using
production rules to match the input string from left to right.
Example:
Grammar:
S → aA
A→b
Input: ab
Steps:
1. Start with S.
2. Expand S → aA.
3. Match a.
4. Expand A → b and match b.
Parse Tree:
S
/\
a A
|
b
The Grammar
To define the structure of our expression language, we'll use a context-free
grammar:
Term:
A term can be a term multiplied by a factor.
A term can be a term divided by a factor.
A term can be a single factor.
Factor:
A factor can be an expression enclosed in parentheses.
A factor can be a number.
Parsing Process
To parse an expression, we can use a parsing algorithm like recursive descent or
shift-reduce. These algorithms analyze the input string token by token, building a
parse tree that represents the syntactic structure of the expression.
Example:
expression
/ \
term expression
/ \ / \
factor * factor + factor
| | | | |
NUMBER NUMBER NUMBER NUMBER NUMBER
Implementation
We can implement a parser using a programming language like C++, Java, or Python.
Parser generators like Yacc or Bison can automate the process of generating a
parser from a grammar specification.
Key Points:
- Tokenization: The input string is broken down into tokens (e.g., numbers,
operators, parentheses).
- Parsing: The tokens are analyzed to determine the syntactic structure of the
expression.
- Semantic Analysis: The meaning of the expression is checked, and type checking is
performed.
- Code Generation: The parsed expression is translated into machine code or
intermediate code.
Conclusion
Parsing is a fundamental technique in compiler design and language processing. By
understanding the grammar of a language and using appropriate parsing
algorithms, we can analyze and interpret input strings. This case study provides a
basic introduction to parsing and demonstrates how to construct a parse tree for a
simple expression language.
Predictive parsing uses lookahead tokens to decide which production to use. It avoids
backtracking and requires the grammar to be LL(1).
Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id
Input: id + id
Steps:
1. Start with E.
2. Expand E → T E'.
3. Match T → id, expand E' → + T E', and match tokens.
Principles:
- Always expand the leftmost non-terminal (Leftmost Derivation).
- Eliminate Left Recursion for top-down parsing.
Example:
Original Grammar: A → Aα | β
After Elimination: A → βA', A' → αA' | ε
Example:
Regex: [a-zA-Z_][a-zA-Z_0-9]*
CFG: E → T + E | T
Example Grammar:
E→T+E|T
T → id
Pseudo-code:
void E() {
T();
if (lookahead == '+') {
match('+'); E(); }}
3.7 Bottom-Up Parsing
Input: id + id
Steps:
1. Shift id, reduce to T.
2. Shift +, id, reduce to T.
3. Reduce T + T to E.
Example : Bottom-up parsing is a parsing technique that starts from the input tokens and
gradually reduces them to the start symbol of the grammar. It's like building a house from
the ground up, starting with the foundation and working your way to the roof.
Parsing Steps:
1. **Shift**:
- The parser reads the first input symbol 'a' and shifts it onto the stack.
- Stack: a
- Input: bc
2. **Reduce**:
- The top of the stack 'a' matches the right-hand side of the production A -> a.
- The parser reduces 'a' to 'A'.
- Stack: A
- Input: bc
3. **Shift**:
- The parser reads the next input symbol 'b' and shifts it onto the stack.
- Stack: Ab
- Input: c
4. **Reduce**:
- The top of the stack 'b' matches the right-hand side of the production B -> b.
- The parser reduces 'b' to 'B'.
- Stack: AB
- Input: c
5. **Shift**:
- The parser reads the next input symbol 'c' and shifts it onto the stack.
- Stack: ABC
- Input: (empty)
6. **Reduce**:
- The top three symbols on the stack 'ABC' match the right-hand side of the production S ->
ABC.
- The parser reduces 'ABC' to 'S'.
- Stack: S
- Input: (empty)
Now, the stack contains only the start symbol 'S', which means the input string has been
successfully parsed.
Key Points:
- **Shift-Reduce Actions**: The parser alternates between shifting input symbols onto the
stack and reducing sequences of symbols to non-terminals.
- **Handle Identification**: The parser must correctly identify the handle (the rightmost
substring that can be reduced) at each step.
- **Parse Table**: A parse table is used to determine the appropriate action (shift or reduce)
based on the current state and input symbol.