Week 5

Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

Syntax Analysis Context-Free Grammar: A context-free grammar G is a quadruple (V, , R, S), where V is an alphabet, (the set of terminals) is a subset

t of V, R (the set of rules) is a finite subset of (V-) V*, and S (the start symbol) is an element of V-. The members of V- are called nonterminals. For any A V- and u V*, we write A u whenever (A, u) R. For any strings u, v V*, we write u => v if and only if there are strings s, y, v V* and A V- such that u = xAy, v=xvy, and A v. The relation =>* is the reflexive, transitive closure of =>. Finally, L(G), the language generated by G, is {w *: S =>*w}; we also say that G generates each string in L(G). A language L is said to be a context- free language if it is equal to L(G) for some contextfree grammar G. Not all context- free languages are regular. Every regular language is context- free. A context-free grammar G = (V, , R, S) is regular if and only if R (V-) *((V-) {e}). That is, the right-hand side of every rule contains at most one non-terminal, which, if present, must be the last symbol in the string. A language is regular if and only if it is generated by a regular grammar. A program that performs syntax analysis is called a parser. A syntax analyzer takes tokens as input and output error message if the program syntax is wrong. Top-down Parsing The parser uses the single-symbol look-ahead method and an approach called top-down parsing without backtracking. Rule 1. For every BNF rule D = R, the parser defines a procedure of the same name: proc D() begin a(R); end; where a(R) checks if input tokens form a sentence described by expression R.

If a valid sentence is found, the parse reads out the whole sentence, otherwise a syntax error is reported. Rule 2. A sequence of sentences of the forms F1, F2, , Fn is recognized by scanning the individual sentences one at a time in the order written: a(F1 F2 Fn) = a(F1); a(F2);a(Fn); Rule 3. A single token is recognized by the procedure expect(). Rule 4. To recognize a sentence described by a BNF rule named N the parser calls the corresponding procedure named N: a(N) = N Rule 5. The parser uses the following algorithm to recognize option rule [E]: a([E]) = if currToken in First(E) then a(E). Rule 6. The parse uses the following algorithm to recognize repetition rule {E}: a({E}) = while currToken in First(E) do a(E). Rule 7. If the Ois are non-empty, the following algorithm recognizes a sentence of the form O1 | O2 | O3 || On: a(O1 | O2 | O3 || On ) = if currToken in First(O1) then a(O1); else if currToken in First(O2) then a(O2); else if currToken in First(On) then a(On); else syntaxError(); First Symbols : 1) First(Empty) = Symbols[] 2) First(s) = Symbols[s], given a single token s 3) First(XY) = First(X), given X is non-empty 4) First(XY) = First(X) + First(Y) if X may be empty 5) First(X|Y) = First(X) + First(Y) 6) First([X]) = First({X}) = First(X) Follow symbols: The set of symbols that may follow after a non-terminal symbol T is denoted Follow(T). 1) T of the forms N=STU N = S [T] U N = S {T} U If U is non-empty, Follow(T) includes First(U). If U may be empty, Follow(T) includes Follow(N). 2) T of the forms N=ST N = S [T]

N = S {T} Follow(T) includes Follow(N). 3) If T occurs in the form {T}, Follow(T) includes First(T). Grammar Restrictions Restriction 1. Consider the sentences of the forms N = E | F. To decide whether the next symbol is the beginning of an E or F sentence, the parse must assume that the two kinds of sentences always begin with different symbols. So the BNF grammar must satisfy the following condition: First(E) * First(F) = Symbols[]. Ambiguity A grammar that produces more than one parse three for some sentence is said to be ambiguous. Restriction 2. If some of the sentences described by a BNF rule N may be empty, the grammar must be constrained as follows: First(N) * Follow(N) = Symbols[]. Any rule that contains [E] or {E} must be considered. Recursive-Descent Parsing: A set of recursive procedures are used to analyze a sentence from the top down in more and more detain until individual tokens are recognized. Elimination of Left Recursion: The rule A Aa | b can be replace by the non- left recursive productions A bA A aA | e without changing the set of strings derivable form A. This will eliminate immediate leftrecursion but not the following case: S Aa | b A Ac | Sd | e where S => Aa => Sda is also left-recursive.

You might also like