
Document From Aditya Tripathi

Syntax analysis, or parsing, is the second phase of a compiler that checks if a stream of tokens adheres to the grammatical rules of a programming language, producing a parse tree or abstract syntax tree if correct. It includes various parsing techniques such as top-down, bottom-up, predictive, and LR parsing, each with specific characteristics and applications. Ambiguous grammars pose challenges in parsing, which can be addressed through grammar rewriting or using parser generator tools like YACC and ANTLR.

Uploaded by

Aditya Tripathi

UNIT-II: Syntax Analysis

Syntax analysis, also known as parsing, is the second phase of a compiler. It takes the stream of tokens
produced by the lexical analyzer and checks if the sequence of tokens conforms to the grammatical rules
(syntax) of the programming language. If it does, it typically produces a parse tree or an abstract
syntax tree (AST) as output, which is then used by subsequent phases.

1. Working of Parser
The parser’s primary role is to determine if the input token stream can be generated by the language’s
grammar. It verifies the syntactic structure of the code. If the syntax is correct, it constructs a tree
representation (parse tree or AST) that reflects the hierarchical structure of the program. If syntax
errors are found, the parser reports them and attempts error recovery to continue parsing.

• Input: Stream of tokens from the lexical analyzer.

• Output: Parse tree / Abstract Syntax Tree (AST) or syntax error messages.

Reference Links:
• GeeksforGeeks - Syntax Analysis in Compiler Design
• TutorialsPoint - Compiler Design - Syntax Analysis

2. Top-down Parsing
Top-down parsing attempts to construct a parse tree by starting from the root (the start symbol of
the grammar) and working downwards to the input string. It predicts the next production rule to apply
based on the current non-terminal and the lookahead token.

• Characteristics:
– Starts with the start symbol and tries to match the input.
– Leftmost derivation.
– Can be implemented using recursive descent or predictive parsing.
– Grammars must be free of left recursion and must be left-factored (no two alternatives sharing a common prefix) for simple top-down parsers.
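
As an illustration of the characteristics above, here is a minimal recursive-descent sketch in Python. The grammar, the token format (integers and one-character strings), and the class name Parser are assumptions made for this example; they do not come from the notes. Each non-terminal becomes one method, and one lookahead token selects the production.

```python
# Recursive-descent sketch for a hypothetical grammar (an assumption, not from the notes):
#   Expr     -> Term ExprTail
#   ExprTail -> '+' Term ExprTail | epsilon
#   Term     -> NUM | '(' Expr ')'
# The grammar has no left recursion, so each method terminates.

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, token):
        # Consume one specific token or report a syntax error.
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()          # start from the start symbol
        self.expect("$")
        return tree

    def expr(self):
        left = self.term()
        return self.expr_tail(left)

    def expr_tail(self, left):
        if self.peek() == "+":      # lookahead selects ExprTail -> '+' Term ExprTail
            self.expect("+")
            right = self.term()
            return self.expr_tail(("+", left, right))
        return left                 # otherwise ExprTail -> epsilon

    def term(self):
        tok = self.peek()
        if tok == "(":
            self.expect("(")
            node = self.expr()
            self.expect(")")
            return node
        if isinstance(tok, int):
            self.pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")
```

Calling Parser([1, "+", 2, "+", 3]).parse() builds a nested tuple tree; an input such as [1, "+"] raises SyntaxError.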

Reference Links:
• GeeksforGeeks - Top-Down Parsing
• TutorialsPoint - Compiler Design - Top Down Parser

3. Bottom-up Parsing
Bottom-up parsing attempts to construct a parse tree by starting from the input symbols (leaves)
and working upwards towards the root (the start symbol). It tries to reduce the input string to the start
symbol by applying grammar rules in reverse.

• Characteristics:

– Starts with the input string and tries to reduce it to the start symbol.
– Rightmost derivation in reverse.
– Commonly uses a shift-reduce approach.
– Handles a wider class of grammars than top-down parsing: every LL(k) grammar is also LR(k), but not vice versa.
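
The shift-reduce idea can be sketched in a few lines of Python. The grammar E -> E + n | n and the function name shift_reduce are assumptions for this example. The handle detection is deliberately naive (reduce whenever the top of the stack matches a right-hand side, longest first); that happens to work for this tiny grammar but is not a general technique. Real bottom-up parsers use a table, as in the LR sections.

```python
# Hypothetical grammar (an assumption for illustration): E -> E + n | n
# The longer right-hand side is listed first so it is tried first.
PRODUCTIONS = [
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]

def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, trace = [], []
    remaining = list(tokens)
    while True:
        for head, body in PRODUCTIONS:
            # Naive handle check: does the top of the stack equal this body?
            if tuple(stack[-len(body):]) == body:
                del stack[-len(body):]
                stack.append(head)
                trace.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:
            # No reduction applies: shift the next token, accept, or fail.
            if remaining:
                stack.append(remaining.pop(0))
                trace.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                trace.append("accept")
                return trace
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")
```

Read backwards, the reductions recorded for ["n", "+", "n"] spell out the rightmost derivation E => E + n => n + n.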

Reference Links:
• GeeksforGeeks - Bottom-Up Parsing

• TutorialsPoint - Compiler Design - Bottom Up Parser

4. Operator Precedence Parsing
Operator precedence parsing is a specific type of bottom-up parsing that is suitable for expression
grammars. It works by defining precedence and associativity relations between operators. It doesn’t
require constructing a full parse tree immediately but uses precedence rules to guide reductions.

• Key Idea: It identifies handles (substrings that match the right-hand side of a production) based
on operator precedence.
• Limitations: Only applicable to operator grammars (no ε-productions and no two adjacent
non-terminals on any right-hand side); it cannot handle arbitrary context-free grammars.
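
The following Python sketch shows precedence-guided reduction in a simplified form. It is an assumption-laden illustration: it compares numeric precedence levels instead of consulting a full precedence-relation table, treats all four operators as left-associative, and the names PRECEDENCE and parse_expression are made up for the example.

```python
# Hypothetical precedence levels (assumed); all operators are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expression(tokens):
    """Build a tuple tree from a flat list of operands and operators."""
    operands, operators = [], []

    def reduce_top():
        # Reduce the handle "left op right" found on top of the stacks.
        if len(operands) < 2:
            raise SyntaxError("operator is missing an operand")
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append((op, left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # An operator on the stack with equal or higher precedence
            # must be reduced before the incoming operator is shifted.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce_top()
    if len(operands) != 1:
        raise SyntaxError("malformed expression")
    return operands[0]
```

With these levels, [1, "+", 2, "*", 3] groups the multiplication first, and [8, "-", 3, "-", 2] groups to the left.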

Reference Links:
• GeeksforGeeks - Operator Precedence Parser
• TutorialsPoint - Compiler Design - Operator Precedence Parser

5. Predictive Parsers
Predictive parsers are a type of top-down parser that does not use backtracking. They predict the
next production rule to apply based on the current non-terminal on the stack and the next input token
(lookahead).
• LL(1) Parsers: A common form of predictive parser. “LL(1)” stands for:
– First ‘L’: Scans the input from Left to right.
– Second ‘L’: Produces a Leftmost derivation.
– ‘(1)’: Uses 1 lookahead token to make parsing decisions.
• They rely on a parsing table, which indicates which production to use for each combination of
non-terminal and lookahead token. Grammars for predictive parsers must be free of left recursion
and must be left-factored, so that no two alternatives of a non-terminal share a common prefix.
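
A table-driven LL(1) parser can be sketched as follows. The grammar (E -> T E', E' -> + T E' | ε, T -> id | ( E )), the hand-written table, and the function name ll1_parse are assumptions for this example. The table maps a (non-terminal, lookahead) pair to the production body to expand.

```python
# Hand-built LL(1) table for a hypothetical grammar (assumed, not from the notes):
#   E  -> T E'
#   E' -> + T E' | epsilon
#   T  -> id | ( E )
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],            # E' -> epsilon
    ("E'", "$"): [],            # E' -> epsilon
    ("T", "id"): ["id"],
    ("T", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T"}

def ll1_parse(tokens):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]          # start symbol on top of the end marker
    pos, applied = 0, []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:    # empty table entry means a syntax error
                raise SyntaxError(f"no rule for {top} on {look!r}")
            applied.append((top, tuple(body)))
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1            # terminal matched, advance the input
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return applied
```

The returned list is the sequence of productions of a leftmost derivation; no backtracking occurs because each table cell holds at most one production.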
Reference Links:
• GeeksforGeeks - Predictive Parsing (LL(1) Parser)
• TutorialsPoint - Compiler Design - Predictive Parser

6. LR Parsers (SLR, Canonical LR, LALR)


LR parsers are a class of powerful, non-backtracking, bottom-up parsers. The name stands for “Left-to-right
scan, Rightmost derivation in reverse”. They can parse virtually all programming-language constructs for
which a context-free grammar can be written. LR parsers use a stack and a parsing table to make shift/reduce
decisions.

• General Working:
1. The parser uses a stack to store states and grammar symbols.
2. It reads input tokens one by one.
3. Based on the current state (top of stack) and the current lookahead token, it consults a
parsing table.
4. The table specifies one of four actions: shift, reduce, accept, or error.
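
The four steps above can be sketched as a small driver loop in Python. The grammar (E -> E + n | n), the hand-built ACTION and GOTO tables, and the name lr_parse are assumptions for this example. For brevity the stack holds only states, since the grammar symbols are implied by them.

```python
# Hypothetical grammar (assumed):  (1) E -> E + n   (2) E -> n
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}   # number -> (head, body length)

# Hand-built SLR table for that grammar; missing entries mean "error".
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}

def lr_parse(tokens):
    """Return the shift/reduce trace, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]                 # stack of states, initial state 0
    pos, trace = 0, []
    while True:
        state, look = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, look), ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
            trace.append(f"shift {look}")
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]                     # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])   # then follow GOTO on the head
            trace.append(f"reduce by production {arg}")
        elif kind == "accept":
            return trace
        else:
            raise SyntaxError(f"unexpected {look!r} in state {state}")
```

The driver is the same for SLR, canonical LR, and LALR parsers; only the contents of the tables differ.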

Reference Links:
• GeeksforGeeks - LR Parser in Compiler Design
• TutorialsPoint - Compiler Design - LR Parser

7. SLR (Simple LR)


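SLR is the simplest and least powerful of the three LR variants covered here. It places a reduce
action for A -> α only under the terminals in FOLLOW(A), which resolves some of the conflicts
that a plain LR(0) table would contain.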

Constructing SLR Parsing Tables:
1. Augment the grammar with a new start symbol and production (S′ → S).

2. Construct the Canonical Collection of LR(0) items. An LR(0) item is a production rule with a
dot ‘.‘ indicating the current parsing position.
3. Compute CLOSURE and GOTO functions for these items to build the DFA of states.
4. Fill the ACTION and GOTO table:

• Shift actions: If [A -> α.aβ] is in state I and GOTO(I, a) = J, then ACTION[I, a] =
shift J.
• Reduce actions: If [A -> α.] is in state I (where A != S’), then for all b in FOLLOW(A),
ACTION[I, b] = reduce A -> α.
• Accept action: If [S’ -> S.] is in state I, then ACTION[I, $] = accept.
• Error: All undefined entries are error.
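
The construction steps above can be sketched in Python as follows. The grammar (E' -> E, E -> E + n | n), the hand-computed FOLLOW set, and all function names are assumptions for this example. An item is a (production index, dot position) pair. The sketch fills only the ACTION part and does not detect conflicts.

```python
# Hypothetical augmented grammar (assumed): E' -> E,  E -> E + n | n
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
FOLLOW = {"E": {"+", "$"}}      # worked out by hand for this grammar

def closure(items):
    """Add [B -> .γ] for every item with the dot in front of B."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` and close the result."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({(0, 0)})
    states, work = [start], [start]
    symbols = {sym for _, body in GRAMMAR for sym in body}
    while work:
        state = work.pop()
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
                work.append(target)
    return states

def slr_action_table():
    states = canonical_collection()
    action = {}
    for i, state in enumerate(states):
        for prod, dot in state:
            head, body = GRAMMAR[prod]
            if dot < len(body):
                if body[dot] not in NONTERMINALS:          # shift on a terminal
                    j = states.index(goto(state, body[dot]))
                    action[(i, body[dot])] = ("shift", j)
            elif prod == 0:
                action[(i, "$")] = ("accept", None)        # [S' -> S.]
            else:
                for look in FOLLOW[head]:                  # reduce on FOLLOW(A)
                    action[(i, look)] = ("reduce", prod)
    return action
```

For this grammar the collection has five states, and every entry missing from the returned dictionary is an error entry.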

Reference Links:

• GeeksforGeeks - SLR Parser


• Javatpoint - SLR Parser

8. Canonical LR (CLR)
CLR (Canonical LR) parsers are the most powerful type of LR parser. They are capable of parsing the
largest class of grammars among LR types. CLR parsers distinguish between states based on lookahead
symbols, making their parsing tables larger and more complex.

Constructing Canonical LR Parsing Tables:


1. Augment the grammar (S′ → S).

2. Construct the Canonical Collection of LR(1) items. An LR(1) item is [A -> α.β, a], where a
is a lookahead terminal.
3. Compute CLOSURE and GOTO functions for LR(1) items. For each item [A -> α.Bβ, a] in a state,
CLOSURE adds [B -> .γ, b] for every production B -> γ and every terminal b in FIRST(βa).

4. Fill the ACTION and GOTO table:


• Shift actions: If [A -> α.aβ, b] is in state I and GOTO(I, a) = J, then ACTION[I, a]
= shift J.
• Reduce actions: If [A -> α., a] is in state I (where A != S’), then ACTION[I, a] =
reduce A -> α. (Only for the specified lookahead a).
• Accept action: If [S’ -> S., $] is in state I, then ACTION[I, $] = accept.
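
The LR(1) CLOSURE step can be sketched in Python as follows. The grammar (S' -> S, S -> C C, C -> c C | d) and the function names are assumptions for this example. The FIRST computation is simplified and only valid because this grammar has no ε-productions, so FIRST(βa) is just FIRST of the first symbol of βa.

```python
# Hypothetical augmented grammar without epsilon-productions (assumed):
#   S' -> S,  S -> C C,  C -> c C | d
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first(symbol, seen=None):
    """FIRST of one symbol; valid only because no production derives epsilon."""
    if symbol not in NONTERMINALS:
        return {symbol}
    seen = seen or set()
    if symbol in seen:              # guard against left-recursive loops
        return set()
    seen = seen | {symbol}
    result = set()
    for head, body in GRAMMAR:
        if head == symbol:
            result |= first(body[0], seen)
    return result

def closure_lr1(items):
    """Items are (production index, dot position, lookahead) triples."""
    result, work = set(items), list(items)
    while work:
        prod, dot, look = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta_a = body[dot + 1:] + (look,)       # the string βa
            for index, (head, _) in enumerate(GRAMMAR):
                if head != body[dot]:
                    continue
                for b in first(beta_a[0]):          # each b in FIRST(βa)
                    item = (index, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)
```

Starting from the single item [S' -> .S, $], the closure contains six items: the C-productions appear with lookaheads c and d because FIRST(C $) = {c, d}.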

Reference Links:

• GeeksforGeeks - CLR (Canonical LR) Parser


• Javatpoint - CLR Parser

9. LALR (Look-Ahead LR)


LALR parsers are a compromise between SLR and CLR. They have the same number of states as SLR
parsers but can handle more grammars than SLR (though fewer than CLR). LALR parsing tables are
often much smaller than CLR tables, making them practical for real-world compilers.

Constructing LALR Parsing Tables:
1. Construct the Canonical Collection of LR(1) items.
2. Identify LR(1) states that are identical if their lookahead sets are ignored (i.e., their LR(0) cores
are the same).
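3. Merge these states. When merging, the lookahead sets of their LR(1) items are combined. This
can introduce reduce-reduce conflicts that were not present in the CLR table, but it cannot
introduce new shift-reduce conflicts, because shift actions depend only on the LR(0) core.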
4. Fill the ACTION and GOTO table similar to CLR, using the merged states and combined lookaheads
for reduce actions.
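
Steps 2 and 3 can be sketched in Python as below. The item representation (production text, dot position, lookahead) and the function name merge_by_core are assumptions for this example. The sketch only merges item sets; renumbering GOTO targets and checking for new reduce-reduce conflicts are left out.

```python
from collections import defaultdict

def merge_by_core(lr1_states):
    """Merge LR(1) states whose LR(0) cores are identical.

    Each state is a set of (production, dot, lookahead) triples; the core
    of a state is the same set with the lookaheads dropped.
    """
    groups = defaultdict(set)
    for state in lr1_states:
        core = frozenset((prod, dot) for prod, dot, _ in state)
        groups[core] |= set(state)          # union of the lookahead-tagged items
    return [frozenset(items) for items in groups.values()]

# Hypothetical example states (assumed): two states sharing the core
# {[C -> d.]} and one unrelated state.
state_a = {("C -> d", 1, "c"), ("C -> d", 1, "d")}
state_b = {("C -> d", 1, "$")}
state_c = {("S -> C C", 2, "$")}
merged = merge_by_core([state_a, state_b, state_c])
```

Here state_a and state_b collapse into one state carrying the lookaheads c, d, and $, while state_c is left unchanged, so three LR(1) states become two LALR states.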

Reference Links:
• GeeksforGeeks - LALR Parser
• Javatpoint - LALR Parser

10. Using Ambiguous Grammars


An ambiguous grammar is a context-free grammar for which there exists at least one string that has
more than one leftmost derivation or more than one parse tree.

• Problems with Ambiguous Grammars in Parsing:


– Deterministic Parsing: Deterministic parsers (such as LL and LR parsers) require a single,
unambiguous decision at each step. An ambiguous grammar produces conflicting table entries
(shift-reduce or reduce-reduce conflicts in an LR table, multiply defined entries in an LL
table), so the parser cannot decide which rule to apply.
– Meaning: Different parse trees imply different semantic interpretations of the same code,
leading to unpredictable program behavior.
• Handling Ambiguity:
– Rewrite the Grammar: The primary way is to rewrite the grammar to remove ambiguity,
typically by incorporating precedence and associativity rules directly into the grammar.
– Parser Generator Directives: Tools like YACC/Bison allow specifying precedence and
associativity rules for operators, which helps resolve conflicts in the parsing table without
explicitly rewriting the grammar.
– Semantic Actions: In some cases, ambiguity might be resolved at the semantic analysis
stage if it doesn’t cause a syntactic conflict.
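
To make the effect of ambiguity concrete, the following Python sketch enumerates every parse tree of a token list under the ambiguous grammar E -> E - E | n. The grammar and the function names are assumptions for this example, and the brute-force enumeration is for illustration only, not a practical parsing method.

```python
def all_parses(tokens):
    """Every parse tree of `tokens` under the ambiguous grammar E -> E - E | n."""
    if len(tokens) == 1 and isinstance(tokens[0], int):
        return [tokens[0]]                  # E -> n
    trees = []
    for i, tok in enumerate(tokens):
        if tok == "-":                      # try each '-' as the root of E -> E - E
            for left in all_parses(tokens[:i]):
                for right in all_parses(tokens[i + 1:]):
                    trees.append(("-", left, right))
    return trees

def evaluate(tree):
    """Compute the value that a given parse tree assigns to the expression."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)
```

For the input [8, "-", 3, "-", 2] there are two trees: (8 - 3) - 2 evaluates to 3 and 8 - (3 - 2) evaluates to 7. Declaring the operator left-associative, either in a rewritten grammar or through a parser-generator directive, keeps only the first tree.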

Reference Links:
• GeeksforGeeks - Ambiguous Grammar
• TutorialsPoint - Compiler Design - Ambiguity in Grammar

11. An Automatic Parser Generator


An automatic parser generator is a tool that takes a grammar specification (usually in a specialized
format like BNF or EBNF) as input and automatically generates source code for a parser in a target
programming language (e.g., C, Java). These tools significantly simplify the process of building parsers.

• Examples:
– YACC (Yet Another Compiler Compiler) / Bison: Generates LR parsers (LALR by
default) in C. It’s often used with Lex/Flex (a lexical analyzer generator).
– ANTLR: A powerful parser generator that supports various target languages (Java, C#, Python,
JavaScript, etc.). ANTLR 3 generates LL(*) parsers; ANTLR 4 uses the adaptive variant, ALL(*).
– JavaCC: A parser generator for Java that generates LL(k) parsers.

Reference Links:
• GeeksforGeeks - YACC Tutorial
• TutorialsPoint - Lex & Yacc Tutorial

• ANTLR Website - https://www.antlr.org/
