Document From Aditya Tripathi
Syntax analysis, also known as parsing, is the second phase of a compiler. It takes the stream of tokens
produced by the lexical analyzer and checks if the sequence of tokens conforms to the grammatical rules
(syntax) of the programming language. If it does, it typically produces a parse tree or an abstract
syntax tree (AST) as output, which is then used by subsequent phases.
1. Working of Parser
The parser’s primary role is to determine if the input token stream can be generated by the language’s
grammar. It verifies the syntactic structure of the code. If the syntax is correct, it constructs a tree
representation (parse tree or AST) that reflects the hierarchical structure of the program. If syntax
errors are found, the parser reports them and attempts error recovery to continue parsing.
• Output: Parse tree / Abstract Syntax Tree (AST) or syntax error messages.
Reference Links:
• GeeksforGeeks - Syntax Analysis in Compiler Design
• TutorialsPoint - Compiler Design - Syntax Analysis
2. Top-down Parsing
Top-down parsing attempts to construct a parse tree by starting from the root (the start symbol of
the grammar) and working downwards to the input string. It predicts the next production rule to apply
based on the current non-terminal and the lookahead token.
• Characteristics:
– Starts with the start symbol and tries to match the input.
– Leftmost derivation.
– Can be implemented using recursive descent or predictive parsing.
– For simple (backtrack-free) top-down parsers, the grammar must be free of left recursion and should be left-factored.
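As an illustration of the recursive descent approach, here is a minimal sketch for a small arithmetic grammar. The grammar, the tuple-based tree format, and the names `tokenize`, `Parser`, and `parse` are assumptions made for this example, not part of the original notes. Each non-terminal becomes one method, and the grammar is written without left recursion so that the methods terminate.

```python
import re

def tokenize(text):
    # Keep integer literals and the symbols + * ( ); anything else is silently
    # skipped, which is a simplification for this sketch.
    return re.findall(r"\d+|[+*()]", text)

class Parser:
    """Recursive descent parser for the assumed grammar:
         E -> T ('+' T)*
         T -> F ('*' F)*
         F -> NUM | '(' E ')'
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.parse_term())
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.parse_factor())
        return node

    def parse_factor(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.pos += 1
            node = self.parse_expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return tree
```

With these definitions, `parse("1+2*3")` returns `('+', 1, ('*', 2, 3))`, so the tree reflects the higher precedence of `*`.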
Reference Links:
• GeeksforGeeks - Top-Down Parsing
• TutorialsPoint - Compiler Design - Top Down Parser
3. Bottom-up Parsing
Bottom-up parsing attempts to construct a parse tree by starting from the input symbols (leaves)
and working upwards towards the root (the start symbol). It tries to reduce the input string to the start
symbol by applying grammar rules in reverse.
• Characteristics:
– Starts with the input string and tries to reduce it to the start symbol.
– Rightmost derivation in reverse.
– Commonly uses a shift-reduce approach.
– Handles a larger class of grammars than top-down parsing (for example, left-recursive grammars).
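The shift-reduce idea can be illustrated with a deliberately naive recognizer. This is only a sketch under stated assumptions: the toy grammar and the function name `shift_reduce` are invented for the example, and the parser reduces greedily whenever the top of the stack matches a right-hand side, whereas real bottom-up parsers consult a table to decide between shifting and reducing.

```python
# Hypothetical toy grammar: E -> E + T | T,  T -> id
# Rules are tried in order, so E -> E + T is listed before E -> T to make the
# greedy reducer prefer the longer handle.
RULES = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["id"]),
]

def shift_reduce(tokens):
    """Return (accepted, trace) for a list of terminal names."""
    stack, trace = [], []
    remaining = list(tokens)
    while True:
        for head, body in RULES:
            if stack[-len(body):] == body:
                # Reduce: replace the matched right-hand side by its head.
                del stack[-len(body):]
                stack.append(head)
                trace.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:
            # No reduction applies: shift the next token, or stop at end of input.
            if not remaining:
                break
            stack.append(remaining.pop(0))
            trace.append(f"shift {stack[-1]}")
    return stack == ["E"], trace
```

For the input `["id", "+", "id"]` the reductions occur in the order T -> id, E -> T, T -> id, E -> E + T. Read backwards, this is the rightmost derivation E ⇒ E + T ⇒ E + id ⇒ T + id ⇒ id + id.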
Reference Links:
• GeeksforGeeks - Bottom-Up Parsing
4. Operator Precedence Parsing
Operator precedence parsing is a specific type of bottom-up parsing that is suitable for expression
grammars. It works by defining precedence and associativity relations between operators. It doesn’t
require constructing a full parse tree immediately but uses precedence rules to guide reductions.
• Key Idea: It identifies handles (substrings that match the right-hand side of a production) based
on operator precedence.
• Limitations: Only applicable to operator grammars (grammars with no ε-productions and no two adjacent non-terminals on any right-hand side), so it cannot handle arbitrary context-free grammars.
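A minimal sketch of an operator precedence parser follows. The grammar (E -> E + E | E * E | id), the relation table `PREC`, and all function names are assumptions chosen for this example. The table encodes "*" binding tighter than "+", with both operators left-associative. Stack entries are tagged `"t"` (terminal) or `"n"` (non-terminal carrying a subtree).

```python
# Precedence relations (a, b) between the topmost stack terminal a and the
# lookahead b: "<" means shift, ">" means reduce. Missing pairs are errors.
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def top_terminal(stack):
    # Non-terminals are transparent when comparing precedence.
    return next(sym for kind, sym in reversed(stack) if kind == "t")

def reduce_handle(stack):
    # Pop until the terminal below the handle is "<" the last terminal popped.
    handle = []
    while True:
        kind, sym = stack.pop()
        handle.insert(0, (kind, sym))
        if kind == "t" and PREC.get((top_terminal(stack), sym)) == "<":
            break
    # A non-terminal directly below belongs to the handle as a left operand.
    if stack[-1][0] == "n":
        handle.insert(0, stack.pop())
    kinds = [kind for kind, _ in handle]
    if kinds == ["t"] and handle[0][1] == "id":
        return ("n", "id")                                    # E -> id
    if kinds == ["n", "t", "n"]:
        return ("n", (handle[1][1], handle[0][1], handle[2][1]))  # E -> E op E
    raise SyntaxError(f"no production matches handle {handle}")

def parse(tokens):
    stack = [("t", "$")]
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        a, b = top_terminal(stack), tokens[i]
        if a == "$" and b == "$":
            if len(stack) == 2 and stack[1][0] == "n":
                return stack[1][1]
            raise SyntaxError("empty or incomplete expression")
        relation = PREC.get((a, b))
        if relation == "<":
            stack.append(("t", b))
            i += 1
        elif relation == ">":
            node = reduce_handle(stack)
            stack.append(node)
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
```

For example, `parse(["id", "+", "id", "*", "id"])` returns `('+', 'id', ('*', 'id', 'id'))`: the handle `id * id` is reduced before the addition because "+" yields precedence to "*".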
Reference Links:
• GeeksforGeeks - Operator Precedence Parser
• TutorialsPoint - Compiler Design - Operator Precedence Parser
5. Predictive Parsers
Predictive parsers are a type of top-down parser that does not use backtracking. They predict the
next production rule to apply based on the current non-terminal on the stack and the next input token
(lookahead).
• LL(1) Parsers: A common form of predictive parser. "LL(1)" stands for:
– First "L": Scans the input from Left to right.
– Second "L": Produces a Leftmost derivation.
– "(1)": Uses 1 lookahead token to make parsing decisions.
• They rely on a parsing table, indexed by the non-terminal on top of the stack and the lookahead token, which indicates which production to use. Grammars for predictive parsers must be free of left recursion and must be left-factored.
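The table-driven behaviour can be sketched as follows. The grammar (E -> T E', E' -> + T E' | ε, T -> id), the hand-built `TABLE`, and the function name `ll1_parse` are assumptions made for this example. A real tool would derive the table from FIRST and FOLLOW sets.

```python
# LL(1) table for the assumed grammar: (non-terminal, lookahead) -> production body.
# An empty list stands for the ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T"}

def ll1_parse(tokens):
    """Return the productions applied, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]      # end-of-input marker
    stack = ["$", "E"]                 # start symbol on top of the end marker
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            applied.append((top, body))
            stack.extend(reversed(body))   # leftmost body symbol ends up on top
        elif top == lookahead:
            pos += 1                       # match a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
    return applied
```

Calling `ll1_parse(["id", "+", "id"])` applies productions for E, T, E', T, E' in that order, without any backtracking.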
Reference Links:
• GeeksforGeeks - Predictive Parsing (LL(1) Parser)
• TutorialsPoint - Compiler Design - Predictive Parser
• General Working (LR parsers):
1. The parser uses a stack to store states and grammar symbols.
2. It reads input tokens one by one.
3. Based on the current state (top of stack) and the current lookahead token, it consults a
parsing table.
4. The table specifies one of four actions: shift, reduce, accept, or error.
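The four steps above can be sketched as a small table-driven driver. Everything specific here is an assumption for illustration: the grammar (S -> ( S ) | x), the ACTION and GOTO tables worked out by hand for it, and the name `lr_parse`. For brevity the stack holds only state numbers, since each state already determines the grammar symbol that led to it.

```python
# Productions of the assumed grammar, numbered as: 1: S -> ( S )   2: S -> x
# Each entry stores (head, length of the right-hand side).
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """Return the production numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    states = [0]
    pos = 0
    reductions = []
    while True:
        # Missing table entries are treated as the error action.
        kind, arg = ACTION.get((states[-1], tokens[pos]), ("error", None))
        if kind == "shift":
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del states[-length:]                       # pop one state per body symbol
            states.append(GOTO[(states[-1], head)])    # then follow GOTO on the head
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {states[-1]}")
```

For the input `["(", "x", ")"]` the driver reduces by production 2 and then production 1 before accepting.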
Reference Links:
• GeeksforGeeks - LR Parser in Compiler Design
• TutorialsPoint - Compiler Design - LR Parser
Constructing SLR Parsing Tables:
1. Augment the grammar with a new start symbol and production (S′ → S).
2. Construct the Canonical Collection of LR(0) items. An LR(0) item is a production rule with a
dot (.) marking the current parsing position.
3. Compute CLOSURE and GOTO functions for these items to build the DFA of states.
4. Fill the ACTION and GOTO table:
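Steps 1 to 3 can be sketched in a few lines. The grammar below (S' -> S, S -> ( S ) | x, already augmented) and the function names are assumptions for this example. An LR(0) item is represented as a tuple `(head, body, dot)`, where `dot` is the position of the dot within `body`.

```python
# Assumed augmented grammar: S' -> S,  S -> ( S ) | x
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("(", "S", ")")),
    ("S", ("x",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    # Whenever the dot stands before a non-terminal B, add B -> .γ for every
    # production of B, until nothing new can be added.
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)

def goto(items, symbol):
    # Move the dot over `symbol` in every item that allows it, then close.
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({GRAMMAR[0] + (0,)})
    states = [start]
    transitions = {}
    for state in states:               # the list grows while it is traversed
        symbols = {b[d] for _, b, d in state if d < len(b)}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions
```

For this grammar, `canonical_collection()` yields six states. The `transitions` dictionary is the DFA from which the shift entries and the GOTO entries of the table are read off.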
8. Canonical LR (CLR)
CLR (Canonical LR) parsers are the most powerful type of LR parser: they can parse the largest class of
grammars among the LR variants. Because CLR parsers keep states with different lookahead symbols
separate, their parsing tables are larger and more complex than those of the other variants.
2. Construct the Canonical Collection of LR(1) items. An LR(1) item is [A -> α.β, a], where a
is a lookahead terminal.
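3. Compute CLOSURE and GOTO functions for LR(1) items. For an item [A -> α.Bβ, a], CLOSURE adds an
item [B -> .γ, b] for every production B -> γ and every terminal b in FIRST(βa), so the new lookaheads
come from the symbols that follow B together with the inherited lookahead a.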
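The LR(1) CLOSURE step can be sketched as below. The grammar (S' -> S, S -> C C, C -> c C | d) and all function names are assumptions for this example. To keep the FIRST computation short, the sketch also assumes the grammar has no ε-productions, so the FIRST set of a string is determined by its first symbol alone. Items are tuples `(head, body, dot, lookahead)`.

```python
# Assumed augmented grammar: S' -> S,  S -> C C,  C -> c C | d
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def first_of_symbol(symbol, seen=frozenset()):
    # FIRST of a single symbol, assuming no ε-productions.
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:                 # guard against left-recursive loops
        return set()
    result = set()
    for head, body in GRAMMAR:
        if head == symbol:
            result |= first_of_symbol(body[0], seen | {symbol})
    return result

def first_of_string(beta, lookahead):
    # FIRST(βa): the first symbol of β decides; if β is empty, the result is {a}.
    return first_of_symbol(beta[0]) if beta else {lookahead}

def closure_lr1(items):
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, lookahead = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta = body[dot + 1:]
            for b in first_of_string(beta, lookahead):
                for h, gamma in GRAMMAR:
                    if h == body[dot]:
                        item = (h, gamma, 0, b)
                        if item not in result:
                            result.add(item)
                            work.append(item)
    return frozenset(result)
```

Starting from the single item `("S'", ("S",), 0, "$")`, the closure contains six items: the productions of C appear with lookaheads c and d (from FIRST(C)), but not with $.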
Constructing LALR Parsing Tables:
1. Construct the Canonical Collection of LR(1) items.
2. Identify LR(1) states that are identical if their lookahead sets are ignored (i.e., their LR(0) cores
are the same).
3. Merge these states. When merging, the lookahead sets of their LR(1) items are combined. This
can introduce reduce-reduce conflicts that were not present in the CLR table.
4. Fill the ACTION and GOTO table similar to CLR, using the merged states and combined lookaheads
for reduce actions.
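Steps 2 and 3 (finding states with the same core and merging them) can be sketched as follows. The item representation `(head, body, dot, lookahead)`, the function names, and the example states are assumptions for illustration. The example states are written out by hand rather than generated.

```python
from collections import defaultdict

def core(state):
    # The LR(0) core of an LR(1) state: its items with the lookaheads dropped.
    return frozenset((head, body, dot) for head, body, dot, _ in state)

def merge_by_core(lr1_states):
    # States with the same core collapse into one state whose item set is the
    # union of the originals, so their lookahead sets are combined.
    groups = defaultdict(set)
    for state in lr1_states:
        groups[core(state)] |= set(state)
    return [frozenset(items) for items in groups.values()]

# Hand-written example: two LR(1) states that differ only in lookaheads,
# plus one unrelated state.
state_a = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
state_b = frozenset({("C", ("d",), 1, "$")})
state_c = frozenset({("S'", ("S",), 1, "$")})

merged = merge_by_core([state_a, state_b, state_c])
```

Here `merged` contains two states: the first combines `state_a` and `state_b` into a single state for C -> d. with lookaheads c, d, and $, and the second is `state_c` unchanged. In a full generator the transitions must also be redirected to the merged states, which this sketch omits.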
Reference Links:
• GeeksforGeeks - LALR Parser
• Javatpoint - LALR Parser
Reference Links:
• GeeksforGeeks - Ambiguous Grammar
• TutorialsPoint - Compiler Design - Ambiguity in Grammar
• Examples:
– YACC (Yet Another Compiler Compiler) / Bison: Generates LR parsers (LALR by
default) in C. It’s often used with Lex/Flex (a lexical analyzer generator).
– ANTLR: A parser generator with targets for several languages (Java, C#, Python,
JavaScript, etc.) that generates LL(*) parsers (adaptive LL(*), or ALL(*), in ANTLR 4).
– JavaCC: A parser generator for Java that generates LL(k) parsers.
Reference Links:
• GeeksforGeeks - YACC Tutorial
• TutorialsPoint - Lex & Yacc Tutorial