Unit-2 CD
To generate string “cad” it uses the rules as shown in the given diagram:
Advantages :
Advantages of using syntax analysis in compiler design include:
Structural validation: Syntax analysis allows the compiler to check if the source code
follows the grammatical rules of the programming language, which helps to detect and
report errors in the source code.
Improved code generation: Syntax analysis can generate a parse tree or abstract syntax
tree (AST) of the source code, which can be used in the code generation phase of the
compiler design to generate more efficient and optimized code.
Easier semantic analysis: Once the parse tree or AST is constructed, the compiler can
perform semantic analysis more easily, as it can rely on the structural information
provided by the parse tree or AST.
Disadvantages :
Disadvantages of using syntax analysis in compiler design include:
Complexity: Parsing is a complex process, and the quality of the parser can greatly
impact the performance of the resulting code. Implementing a parser for a complex
programming language can be a challenging task, especially for languages with
ambiguous grammars.
Reduced performance: Syntax analysis can add overhead to the compilation process,
which can reduce the performance of the compiler.
Limited error recovery: Syntax analysis algorithms may not be able to recover from
errors in the source code, which can lead to incomplete or incorrect parse trees and
make it difficult for the compiler to continue the compilation process.
Inability to handle all languages: Not all languages have formal grammars, and some
languages may not be easily parseable.
Grammar :
A grammar is a finite set of formal rules for generating the syntactically correct sentences
of a language.
Constituents Of Grammar :
A grammar is composed of two basic kinds of symbols –
1. Terminal Symbols –
Terminal symbols are the symbols that make up the sentences generated by a grammar.
They are represented by lowercase letters such as a, b, c, etc.
2. Non-Terminal Symbols –
Non-terminal symbols take part in the generation of a sentence but do not appear in
the finished sentence. They are also called auxiliary symbols or variables, and are
represented by capital letters such as A, B, C, etc.
Formal Definition of Grammar :
Any grammar can be represented by a 4-tuple – <N, T, P, S>
N – Finite Non-Empty Set of Non-Terminal Symbols.
T – Finite Set of Terminal Symbols.
P – Finite Non-Empty Set of Production Rules.
S – Start Symbol (Symbol from where we start producing our sentences or strings).
Production Rules :
A production (or production rule) is a rewrite rule specifying a symbol substitution that
can be applied recursively to generate new symbol sequences. It is of the form α -> β,
where α is a non-terminal symbol that can be replaced by β, a (possibly empty) string of
terminal and non-terminal symbols.
Example-1 :
Consider Grammar G1 = <N, T, P, S>
N = {A} #Set of non-terminal symbols
T = {a,b} #Set of terminal symbols
P = {A->Aa,A->Ab,A->a,A->b,A->ε} #Set of all production rules
S = A #Start Symbol
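The way G1 generates strings can be sketched in Python. This is only an illustration under stated assumptions, not part of the original notes: the function name generate, the dictionary encoding of P and the length bound max_len are all hypothetical, and the last production, whose right-hand side is empty, is encoded as the empty string.

```python
from collections import deque

# Productions of G1; "" encodes the production with an empty right-hand side.
G1 = {"A": ["Aa", "Ab", "a", "b", ""]}

def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal in the sentential form, if any.
        idx = next((i for i, ch in enumerate(form) if ch in productions), None)
        if idx is None:
            sentences.add(form)  # only terminals are left, so this is a sentence
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new if ch not in productions)
            # Bound the search so that it always terminates.
            if terminals <= max_len and len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(sentences, key=lambda s: (len(s), s))

print(generate(G1, "A", 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Each step replaces the leftmost non-terminal by one of its right-hand sides, so every printed string has a leftmost derivation from A. The length bound is a convenience of the sketch; for grammars with several nullable non-terminals it may cut off some derivations.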
Derivation Of Strings :
Example:
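Consider the Grammar
S → (L) | a        (productions 1 and 2)
L → SL’            (production 3)
L’ → ε | SL’       (productions 4 and 5)
The parsing table M, where each entry is a production number:
M     (    )    a    $
S     1         2
L     3         3
L’    5    4    5    4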
For any grammar, if any cell of the parsing table M has multiple entries, then the grammar is
not LL(1).
Example:
S→iEtSS’/a
S’→eS/ε
E→b
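The multiple-entry check for this grammar can be sketched in Python. The code below is a minimal illustration, not a reference implementation: it assumes the standard FIRST/FOLLOW construction of the LL(1) table, and the names GRAMMAR, first_of, compute_first, compute_follow and build_table are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "S":  [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], [EPS]],
    "E":  [["b"]],
}
START = "S"

def first_of(seq, first):
    """FIRST of a symbol sequence, given the FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol in the sequence can vanish
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table():
    """Map (non-terminal, terminal) to the list of productions placed in that cell."""
    first = compute_first()
    follow = compute_follow(first)
    table = {}
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            alt_first = first_of(alt, first)
            targets = alt_first - {EPS}
            if EPS in alt_first:
                targets |= follow[nt]
            for t in targets:
                table.setdefault((nt, t), []).append(" ".join(alt))
    return table

conflicts = {cell: prods for cell, prods in build_table().items() if len(prods) > 1}
print(conflicts)
```

Under these assumptions the only cell with more than one entry is M[S’, e], which receives both S’→eS and S’→ε, so the grammar is not LL(1).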
Important Notes
If a grammar needs left factoring (two alternatives of the same non-terminal share a common
prefix) then it can not be LL(1).
Eg - S -> aS | a
---- both productions go into the same cell M[S, a]
If a grammar contains left recursion it can not be LL(1).
Eg - S -> Sa | b
S -> Sa goes to FIRST(Sa) = FIRST(S) = b
S -> b goes to b, thus M[S, b] has 2 entries, hence not LL(1)
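Both notes can be checked mechanically. The following Python sketch is an illustration only: the function names are hypothetical, each alternative is assumed to be given as a list of symbols, and only the direct forms are detected (immediate left recursion, and alternatives that start with the same symbol). Passing both checks does not prove that a grammar is LL(1).

```python
def immediate_left_recursion(grammar):
    """Non-terminals with an alternative that starts with the non-terminal itself."""
    return [nt for nt, alts in grammar.items()
            if any(alt and alt[0] == nt for alt in alts)]

def common_prefix(grammar):
    """Non-terminals with two alternatives that start with the same symbol."""
    result = []
    for nt, alts in grammar.items():
        firsts = [alt[0] for alt in alts if alt]
        if len(firsts) != len(set(firsts)):
            result.append(nt)
    return result

needs_factoring = {"S": [["a", "S"], ["a"]]}   # S -> aS | a
left_recursive = {"S": [["S", "a"], ["b"]]}    # S -> Sa | b

print(common_prefix(needs_factoring))             # ['S']
print(immediate_left_recursion(left_recursive))   # ['S']
```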
Advantages:
Shift-reduce parsing is efficient and can handle a wide range of context-free grammars.
It can parse a large variety of programming languages and is widely used in practice.
It is capable of handling both left- and right-recursive grammars, which can be
important in parsing certain programming languages.
The parse table generated for shift-reduce parsing is typically small, which makes the
parser efficient in terms of memory usage.
Disadvantages:
Shift-reduce parsing has a limited lookahead, which means that it may miss some syntax
errors that require a larger lookahead.
It may also generate false-positive shift-reduce conflicts, which can require additional
manual intervention to resolve.
Shift-reduce parsers may have difficulty in parsing ambiguous grammars, where there
are multiple possible parse trees for a given input sequence.
In some cases, the parse tree generated by shift-reduce parsing may be more complex
than other parsing techniques.
LR Parser
An LR parser is a bottom-up parser for context-free grammars that is widely used by
programming language compilers and other associated tools. An LR parser reads its input
from left to right and produces a rightmost derivation in reverse. It is called a bottom-up
parser because it builds the parse tree up from the leaves, reducing towards the top-level
grammar productions. LR parsers are the most powerful of the deterministic parsers used in
practice.
Description Of LR Parser :
In the term LR(k) parser, L refers to left-to-right scanning of the input, R refers
to a rightmost derivation in reverse, and k refers to the number of unconsumed “look
ahead” input symbols that are used in making parsing decisions. Typically, k is 1 and is
often omitted. A context-free grammar is called LR(k) if an LR(k) parser exists for it.
The parser reduces the leftmost part of the token sequence first. Read in reverse order,
however, the sequence of reductions is a derivation that always expands the rightmost
non-terminal first.
1. Initially the stack is empty, and the goal is to reduce by the rule S’→S$.
2. A “.” in a rule marks how much of the rule’s right-hand side is already on the stack.
3. A dotted item, or simply an item, is a production rule with a dot indicating how much of
the RHS has been recognized so far. The closure of an item is used to see which production
rules can be used to expand the current structure. It is calculated as follows :
Rules for LR parser
The rules for computing the closure are as follows.
1. The initial item from the given grammar rules is itself added to the closure set.
2. If the closure contains an item of the form A→ α.Bγ, where the symbol B immediately after
the dot is a non-terminal, add the production rules of B as items with the dot placed before
the first symbol of the right-hand side.
3. Repeat step 2 for the new items added under step 2, until no more items can be added.
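The closure computation can be sketched in Python. This is a minimal illustration under stated assumptions: the small grammar S’→S, S→(S) | a is hypothetical and not taken from the notes, and an item is encoded as a tuple (left-hand side, right-hand side, dot position).

```python
# Hypothetical example grammar: S' -> S, S -> ( S ) | a
GRAMMAR = {
    "S'": [("S",)],
    "S": [("(", "S", ")"), ("a",)],
}

def closure(items):
    """Closure of a set of LR(0) items (lhs, rhs, dot)."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        # If the symbol right after the dot is a non-terminal, add its productions
        # as new items with the dot at the very beginning.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)  # new items are processed in turn
    return result

start_state = closure({("S'", ("S",), 0)})
for lhs, rhs, dot in sorted(start_state):
    print(lhs, "->", " ".join(rhs[:dot]) + " . " + " ".join(rhs[dot:]))
```

Starting from the item S’→.S, the closure also contains S→.(S) and S→.a, because the dot stands before the non-terminal S.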
LR parser algorithm :
The LR parsing algorithm is the same for all LR parsers; only the parsing table differs
from one parser to another. The parser consists of the following components.
Input Buffer:
It contains the given string, and it ends with a $ symbol.
Stack :
The stack holds the sequence of states. The combination of the state on top of the stack
and the current input symbol is used to index the parsing table in order to take the
parsing decisions.
Parsing Table:
The parsing table is divided into two parts - the Action table and the Go-To table. The
action table gives the action to perform for the given current state and current terminal
in the input stream. There are four cases used in the action table as follows.
1. Shift Action (shift n) - The present terminal is removed from the input stream and the
state n is pushed onto the stack, where it becomes the new present state.
2. Reduce Action (reduce m) - The number m is written to the output stream. For every
symbol on the right-hand side of rule m, a state is removed from the stack. Then, given the
state that is now on top of the stack and the left-hand side of rule m, a new state is looked
up in the goto table and made the new current state by pushing it onto the stack.
3. Accept - the string is accepted.
4. No action - a syntax error is reported.
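The four cases can be sketched as a table-driven loop in Python. Everything specific here is an assumption made for illustration: the grammar (rule 1: S→(S), rule 2: S→a), the hand-built SLR(1) action and goto tables, and the function name lr_parse do not come from the notes.

```python
# Hypothetical grammar: rule 1 is S -> ( S ), rule 2 is S -> a.
RULES = {1: ("S", 3), 2: ("S", 1)}  # rule number -> (left-hand side, length of RHS)

# Hand-built SLR(1) tables for this grammar.
ACTION = {
    (0, "("): ("shift", 2), (0, "a"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "a"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(text):
    """Return the list of rule numbers used, in the order of the reductions."""
    tokens = list(text) + ["$"]   # the input buffer ends with $
    stack = [0]                   # stack of states, starting in state 0
    output = []
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:                       # no action: syntax error
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, arg = entry
        if kind == "shift":                     # consume the terminal, push state n
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = RULES[arg]
            output.append(arg)                  # write the rule number m
            del stack[-rhs_len:]                # pop one state per RHS symbol
            stack.append(GOTO[(stack[-1], lhs)])  # look up the new state
        else:                                   # accept
            return output

print(lr_parse("((a))"))  # [2, 1, 1]
```

The output lists the reductions in the order they are performed; read backwards, it is the rightmost derivation S ⇒ (S) ⇒ ((S)) ⇒ ((a)).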
LR parser diagram :
LALR Parser :
LALR stands for lookahead LR. An LALR parser can handle a large class of grammars, although
a smaller one than a CLR parser. The CLR parsing table is quite large compared to the other
parsing tables, and LALR reduces the size of this table. LALR works similarly to CLR. The
only difference is that it combines the CLR states that contain the same LR(0) items and
differ only in their lookaheads into one single state.
The general form of an LR(1) item is [A->α.B, a]
where A->α.B is a production and a is a terminal or the right end marker $
LR(1) items = LR(0) items + look ahead
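The merging of CLR states that share the same LR(0) items can be sketched in Python. This is only an illustration: the function names core and merge_by_core are hypothetical, an LR(1) item is assumed to be a tuple (lhs, rhs, dot, lookahead), and the two sample states are made up for the example.

```python
def core(state):
    """The LR(0) part of a state: its items with the lookaheads stripped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_by_core(states):
    """Union all LR(1) states that have the same core into one state."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return [frozenset(items) for items in merged.values()]

# Two hypothetical CLR states holding the item C -> d. with different lookaheads.
state_x = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
state_y = frozenset({("C", ("d",), 1, "$")})

lalr_states = merge_by_core([state_x, state_y])
print(len(lalr_states))                                       # 1
print(sorted(lookahead for *_, lookahead in lalr_states[0]))  # ['$', 'c', 'd']
```

A full LALR construction also has to redirect the shift and goto transitions to the merged states and check that merging introduces no reduce-reduce conflicts; the sketch leaves both out.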
How to add lookahead with the production?
CASE 1 –
A->α.BC, a
Suppose this is the 0th production. Since ‘.’ precedes B, we have to write B’s
productions as well.
B->.D [1st production]
Suppose this is B’s production. The lookahead of this production is found from the
previous production, i.e. the 0th production: we take whatever follows B and compute its
FIRST set, and that is the lookahead of the 1st production. Here, in the 0th production, C
follows B. Assuming FIRST(C) = {d}, the 1st production becomes
B->.D, d
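The lookahead computation of this case can be sketched in Python. The sketch assumes the standard LR(1) rule that the lookaheads of B’s items are FIRST of whatever follows B, followed by the lookahead a of the 0th production. The function names and the dictionary of FIRST sets are hypothetical, and the FIRST sets are taken as given rather than computed.

```python
EPS = "ε"

def first_of_sequence(seq, first):
    """FIRST of a symbol sequence; first maps each non-terminal to its FIRST set."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal is its own FIRST set
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def lookaheads(after_b, a, first):
    """Lookaheads of B's items in the closure of [A -> α.B after_b, a]."""
    return first_of_sequence(list(after_b) + [a], first)

# The case from the text: A -> α.BC, a with FIRST(C) = {d}
print(lookaheads(["C"], "a", {"C": {"d"}}))  # {'d'}
```

If C could derive the empty string, for example FIRST(C) = {d, ε}, the same function would return both d and a, because the lookahead a of the 0th production then shows through.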