0% found this document useful (0 votes)
2 views

Syntax Analysis Parsing (1)

Wollo university

Uploaded by

milliyanmuhe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Syntax Analysis Parsing (1)

Wollo university

Uploaded by

milliyanmuhe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Syntax Analysis

3.1 Parsing

Parsing involves analyzing a string of tokens to determine its grammatical structure. It


ensures the source code conforms to the language's syntax rules.

Types of Parsing:
1. Top-down Parsing: Constructs the parse tree from root to leaves (e.g., Recursive Descent
Parsing).
2. Bottom-up Parsing: Constructs the parse tree from leaves to root (e.g., LR Parsing).

Example:
Grammar:
E→E+T|T
T→T*F|F
F → (E) | id

Input: id + id * id

Parsing determines if the input matches the start symbol E.

3.2 Top-down Parsing

Top-down parsing starts at the root of the parse tree and expands non-terminals using
production rules to match the input string from left to right.

Example:
Grammar:
S → aA
A→b
Input: ab
Steps:
1. Start with S.
2. Expand S → aA.
3. Match a.
4. Expand A → b and match b.

Parse Tree:
S
/\
a A
|
b

Case study on parse tree construction (top-down)


In this case study, we'll explore the process of parsing a simple expression language
that supports basic arithmetic operations and parentheses. Parsing is a fundamental
concept in compiler design and language processing, and it involves breaking down
a string of symbols into its syntactic structure.

The Grammar
To define the structure of our expression language, we'll use a context-free
grammar:

expression -> expression '+' term | expression '-' term | term


term -> term '*' factor | term '/' factor | factor
factor -> '(' expression ')' | NUMBER
This grammar defines the following rules:
Expression:
An expression can be an expression added to a term.
An expression can be an expression subtracted from a term.
An expression can be a single term.

Term:
A term can be a term multiplied by a factor.
A term can be a term divided by a factor.
A term can be a single factor.

Factor:
A factor can be an expression enclosed in parentheses.
A factor can be a number.

Parsing Process
To parse an expression, we can use a parsing algorithm like recursive descent or
shift-reduce. These algorithms analyze the input string token by token, building a
parse tree that represents the syntactic structure of the expression.

Example:

Consider the expression 2 * (3 + 4). Here's how it would be parsed:

Match 2 *: This matches the term -> term * factor rule.


Match 2: This matches the factor -> NUMBER rule.
Match (3 + 4): This matches the factor -> ( expression ) rule.
Match 3 + 4: This matches the expression -> expression + term rule.
Match 3 and 4: Both match the factor -> NUMBER rule.
The resulting parse tree would look like this:

expression
/ \
term expression
/ \ / \
factor * factor + factor
| | | | |
NUMBER NUMBER NUMBER NUMBER NUMBER

Implementation
We can implement a parser using a programming language like C++, Java, or Python.
Parser generators like Yacc or Bison can automate the process of generating a
parser from a grammar specification.

Key Points:
- Tokenization: The input string is broken down into tokens (e.g., numbers,
operators, parentheses).
- Parsing: The tokens are analyzed to determine the syntactic structure of the
expression.
- Semantic Analysis: The meaning of the expression is checked, and type checking is
performed.
- Code Generation: The parsed expression is translated into machine code or
intermediate code.

Conclusion
Parsing is a fundamental technique in compiler design and language processing. By
understanding the grammar of a language and using appropriate parsing
algorithms, we can analyze and interpret input strings. This case study provides a
basic introduction to parsing and demonstrates how to construct a parse tree for a
simple expression language.

3.3.1 Predictive Parsing

Predictive parsing uses lookahead tokens to decide which production to use. It avoids
backtracking and requires the grammar to be LL(1).

Example:
Grammar:
E → T E'
E' → + T E' | ε
T → id

Input: id + id

Steps:
1. Start with E.
2. Expand E → T E'.
3. Match T → id, expand E' → + T E', and match tokens.

3.4.1 Top-down Parsing Principles of CFG

Context-Free Grammar (CFG) consists of terminals, non-terminals, a start symbol, and


production rules.

Principles:
- Always expand the leftmost non-terminal (Leftmost Derivation).
- Eliminate Left Recursion for top-down parsing.

Example:
Original Grammar: A → Aα | β
After Elimination: A → βA', A' → αA' | ε

3.5 Regular Expressions vs CFG

| Aspect | Regular Expressions | Context-Free Grammar (CFG) |


|-----------------|----------------------------|----------------------------|
| Expressiveness | Regular languages | Context-free languages |
| Applications | Token patterns | Programming constructs |
| Parsing | Finite automata | Parsers (Top-down/Bottom-up)|

Example:
Regex: [a-zA-Z_][a-zA-Z_0-9]*
CFG: E → T + E | T

3.6 Recursive Descent Parsing

Recursive Descent Parsing uses recursive functions corresponding to grammar non-


terminals.

Example Grammar:
E→T+E|T
T → id

Pseudo-code:
void E() {
T();
if (lookahead == '+') {
match('+'); E(); }}
3.7 Bottom-Up Parsing

Bottom-up parsing reduces input symbols to the start symbol.

Shift-Reduce Parsing Example:


Grammar:
E→E+T|T
T → id

Input: id + id
Steps:
1. Shift id, reduce to T.
2. Shift +, id, reduce to T.
3. Reduce T + T to E.

Example : Bottom-up parsing is a parsing technique that starts from the input tokens and
gradually reduces them to the start symbol of the grammar. It's like building a house from
the ground up, starting with the foundation and working your way to the roof.

Consider the following grammar:


S -> A B C
A -> a
B -> b
C -> c

And the input string: abc

Parsing Steps:

1. **Shift**:
- The parser reads the first input symbol 'a' and shifts it onto the stack.
- Stack: a
- Input: bc

2. **Reduce**:
- The top of the stack 'a' matches the right-hand side of the production A -> a.
- The parser reduces 'a' to 'A'.
- Stack: A
- Input: bc

3. **Shift**:
- The parser reads the next input symbol 'b' and shifts it onto the stack.
- Stack: Ab
- Input: c

4. **Reduce**:
- The top of the stack 'b' matches the right-hand side of the production B -> b.
- The parser reduces 'b' to 'B'.
- Stack: AB
- Input: c

5. **Shift**:
- The parser reads the next input symbol 'c' and shifts it onto the stack.
- Stack: ABC
- Input: (empty)

6. **Reduce**:
- The top three symbols on the stack 'ABC' match the right-hand side of the production S ->
ABC.
- The parser reduces 'ABC' to 'S'.
- Stack: S
- Input: (empty)

Now, the stack contains only the start symbol 'S', which means the input string has been
successfully parsed.

Key Points:
- **Shift-Reduce Actions**: The parser alternates between shifting input symbols onto the
stack and reducing sequences of symbols to non-terminals.
- **Handle Identification**: The parser must correctly identify the handle (the rightmost
substring that can be reduced) at each step.
- **Parse Table**: A parse table is used to determine the appropriate action (shift or reduce)
based on the current state and input symbol.

Common Bottom-Up Parsing Algorithms:


- **LR(0)**: Simple but limited in its ability to handle ambiguous grammars.
- **SLR(1)**: More powerful than LR(0), but still has limitations.
- **LR(1)**: More powerful than SLR(1), but can be complex to implement.
- **LALR(1)**: A compromise between SLR(1) and LR(1), providing a good balance of
power and simplicity.

You might also like