Summary: LR (0) Parsing: 1.1 Derivations
1.1 Derivations
A derivation is a sequence of rewriting steps leading from the START symbol to the input tokens. At each step, one nonterminal is replaced by the RHS of a production rule that has that nonterminal on its LHS.
The leftmost derivation is obtained by starting with the START symbol and always expanding the leftmost nonterminal first. E.g., for the input ( id + id ) * id:
E ⇒ E * E ⇒ ( E ) * E ⇒ ( E + E ) * E ⇒ ( id + E ) * E ⇒ ( id + id ) * E ⇒ ( id + id ) * id
The rightmost derivation is obtained by starting with the START symbol and always expanding the rightmost nonterminal first:
E ⇒ E * E ⇒ E * id ⇒ ( E ) * id ⇒ ( E + E ) * id ⇒ ( E + id ) * id ⇒ ( id + id ) * id
LR parsing produces a rightmost derivation in reverse. Because it works bottom-up, it reduces the leftmost part of the token sequence first; but when its sequence of reductions is read backwards, from the START symbol down, each step expands the rightmost nonterminal.
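As a minimal sketch, assuming the example uses the expression rules E :- E * E | E + E | ( E ) | id (read off the two derivations above), the Python below replays both derivations. The function name `derive` and the representation of a sentential form as a list of symbols are illustrative choices, not taken from the notes.

```python
# Assumed grammar (inferred from the example): E :- E * E | E + E | ( E ) | id
NONTERMINALS = {"E"}


def derive(steps, leftmost=True, start="E"):
    """Apply each RHS in `steps` to the leftmost (or rightmost) nonterminal.

    Returns every sentential form, from the start symbol to the final string.
    """
    form = [start]
    history = [" ".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        # Replace the chosen nonterminal by the symbols of the RHS.
        form = form[:i] + rhs.split() + form[i + 1:]
        history.append(" ".join(form))
    return history


lm_forms = derive(["E * E", "( E )", "E + E", "id", "id", "id"], leftmost=True)
rm_forms = derive(["E * E", "id", "( E )", "E + E", "id", "id"], leftmost=False)
print(" => ".join(lm_forms))
print(" => ".join(rm_forms))
```

Both derivations apply the same four rules and end at the same token string; only the order of expansion differs.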
Example: Given the grammar:
1. E :- E $
2. E :- T
3. E :- E + T
4. T :- id
5. T :- ( E )
the closure of E :- . E $ is:
E :- . E $ (by rule a)
E :- . T (by rule b)
E :- . E + T (by rule b)
T :- . id (by rules c, b)
T :- . ( E ) (by rules c, b)
2 LR(0) Parsing
For this discussion, assume the following grammar:
1. E :- T
2. E :- E + T
3. T :- id
4. T :- ( E )
3) For each unique symbol that follows a dot in closure(S0):
a. Create a new state representing the recognition of the symbol.
b. Link the original state to the new state with an arc labelled with the recognised symbol.
c. Add a DFA node for this state.
[Figure: state S0 = { E :- . E $, E :- . T, E :- . E + T, T :- . id, T :- . ( E ) }, with arcs E → S1, T → S2, id → S3 and ( → S4.]
d. The closure of each new state is calculated as follows:
i) Take only those items from the previous state in which the recognised symbol was the symbol after the dot.
ii) In each of these items, move the dot AFTER the recognised symbol.
iii) Where the new next symbol (after the dot) is a nonterminal, add the closure of these items to the state. (Note that state S4 started with just T :- ( . E ), but since the next symbol was E, the closure of this item added four other items:)
[Figure: the states reached from S0 after step d:
S1, reached on E: { E :- E . $, E :- E . + T }
S2, reached on T: { E :- T . }
S3, reached on id: { T :- id . }
S4, reached on '(': { T :- ( . E ), E :- . T, E :- . E + T, T :- . id, T :- . ( E ) }]
e. Where the closure includes an item with the dot at the end, draw a double circle around the state (some actions from this state cause a REDUCE).
[Figure: the same DFA (S0 to S4), redrawn with double circles around the states that contain an item with the dot at the end.]
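The three operations of step 3 (finding the symbols after a dot, computing a new state's closure in d, and the reduce test in e) can be sketched in Python. This is an illustrative sketch, not the notes' own code: the item encoding, the start symbol S (standing in for E :- E $) and the function names `symbols_after_dot`, `goto` and `is_reduce_state` are assumptions; `goto` here computes the successor state along an arc.

```python
# Hypothetical item encoding: (lhs, rhs, dot); start rule written S :- E $.
GRAMMAR = [("S", ("E", "$")), ("E", ("T",)), ("E", ("E", "+", "T")),
           ("T", ("id",)), ("T", ("(", "E", ")"))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Worklist closure: add X :- . rhs for each nonterminal X after a dot."""
    result = list(items)
    i = 0
    while i < len(result):
        _, rhs, dot = result[i]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, prod in GRAMMAR:
                if lhs == rhs[dot] and (lhs, prod, 0) not in result:
                    result.append((lhs, prod, 0))
        i += 1
    return result


def symbols_after_dot(state):
    """Step 3: each unique symbol that follows a dot, in order of first appearance."""
    seen = []
    for _, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] not in seen:
            seen.append(rhs[dot])
    return seen


def goto(state, symbol):
    """Step 3d: keep items with `symbol` after the dot, move the dot past it, take the closure."""
    moved = [(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved)


def is_reduce_state(state):
    """Step 3e: a state is double-circled if some item has its dot at the end."""
    return any(dot == len(rhs) for _, rhs, dot in state)


S0 = closure([("S", ("E", "$"), 0)])
for number, symbol in enumerate(symbols_after_dot(S0), start=1):
    target = goto(S0, symbol)
    items = ["{} :- {}".format(lhs, " ".join(rhs[:dot] + (".",) + rhs[dot:]))
             for lhs, rhs, dot in target]
    flag = " (double circle: reduce)" if is_reduce_state(target) else ""
    print(f"S0 --{symbol}--> S{number}{flag}: {items}")
```

Numbering the successors in order of first appearance happens to reproduce the notes' S1 to S4, with S2 and S3 flagged as reduce states.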
4) Repeat (3) for each of the new states, except that where a new state has the same closure as an existing one, link to the existing state instead of creating a new one.
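Step 4 turns step 3 into a worklist over all states. The sketch below is again hypothetical (same item encoding, start symbol S in place of E :- E $, and an assumed helper name `build_states`); it compares closures as sets so that a state reached a second time is linked to rather than duplicated. For the section 2 grammar it should find ten states, with S0 to S4 numbered as in the notes.

```python
# Hypothetical item encoding: (lhs, rhs, dot); start rule written S :- E $.
GRAMMAR = [("S", ("E", "$")), ("E", ("T",)), ("E", ("E", "+", "T")),
           ("T", ("id",)), ("T", ("(", "E", ")"))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    result = list(items)
    i = 0
    while i < len(result):
        _, rhs, dot = result[i]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, prod in GRAMMAR:
                if lhs == rhs[dot] and (lhs, prod, 0) not in result:
                    result.append((lhs, prod, 0))
        i += 1
    return result


def symbols_after_dot(state):
    seen = []
    for _, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] not in seen:
            seen.append(rhs[dot])
    return seen


def goto(state, symbol):
    moved = [(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol]
    return closure(moved)


def build_states():
    """Steps 3-4: build every state, reusing an existing state whose closure matches."""
    states = [closure([("S", ("E", "$"), 0)])]  # S0
    arcs = {}  # (from_state, symbol) -> to_state
    current = 0
    while current < len(states):  # states appended below are processed in turn
        for symbol in symbols_after_dot(states[current]):
            target = goto(states[current], symbol)
            # Compare as sets: the same closure may be listed in a different order.
            match = next((j for j, s in enumerate(states) if set(s) == set(target)), None)
            if match is None:
                states.append(target)
                match = len(states) - 1
            arcs[(current, symbol)] = match
        current += 1
    return states, arcs


states, arcs = build_states()
for (src, symbol), dst in arcs.items():
    print(f"S{src} --{symbol}--> S{dst}")
```

Arcs such as S4 on '(' leading back to S4 itself are exactly the "link to that state instead" case of step 4.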
a. If shift, move the next token onto the top of the stack, and advance the input pointer to the next token.
b. If reduce, look up the rule given in the action and remove n*2 items from the stack (where n is the number of symbols on the RHS of the rule; each symbol sits on the stack together with a state). Then push the LHS of the production onto the stack.
c. If accept, stop: the input has been parsed successfully.
2. Goto State: The top of the stack now holds a state with a symbol (terminal or nonterminal) above it. In the Goto table, look up the row for that state and the column for that symbol.
a. If there is a number there, push this number onto the stack (it is the new state number).
b. If there is no entry, return ERROR (the input cannot be parsed by the grammar).
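The Action/Goto loop above can be sketched in Python. This is a minimal sketch under stated assumptions: the tables were derived by hand from the LR(0) states of the section 2 grammar (rules 1 to 4), with the start rule written S :- E $ and acceptance placed in the state containing S :- E $ . ; states 0 to 4 correspond to S0 to S4, while the numbers 5 to 9 and the names `ACTION`, `GOTO`, `RULES` and `parse` are hypothetical choices.

```python
# Hand-derived LR(0) tables (assumption). States 0-4 match S0-S4 in the notes.
RULES = {
    1: ("E", ["T"]),
    2: ("E", ["E", "+", "T"]),
    3: ("T", ["id"]),
    4: ("T", ["(", "E", ")"]),
}
# In LR(0) the action depends only on the state on top of the stack.
ACTION = {0: "shift", 1: "shift", 2: ("reduce", 1), 3: ("reduce", 3), 4: "shift",
          5: "accept", 6: "shift", 7: "shift", 8: ("reduce", 2), 9: ("reduce", 4)}
# Goto columns cover terminals (after a shift) and nonterminals (after a reduce).
GOTO = {
    0: {"E": 1, "T": 2, "id": 3, "(": 4},
    1: {"$": 5, "+": 6},
    4: {"E": 7, "T": 2, "id": 3, "(": 4},
    6: {"T": 8, "id": 3, "(": 4},
    7: {")": 9, "+": 6},
}


def parse(tokens):
    """Return the rule numbers in the order they are reduced, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # alternates state, symbol, state, symbol, ...
    pos = 0
    reductions = []
    while True:
        # 1. Action: chosen by the state on top of the stack.
        action = ACTION[stack[-1]]
        if action == "accept":
            return reductions
        if action == "shift":
            if pos == len(tokens):
                raise SyntaxError("unexpected end of input")
            stack.append(tokens[pos])
            pos += 1
        else:
            _, rule = action
            lhs, rhs = RULES[rule]
            del stack[len(stack) - 2 * len(rhs):]  # n symbols plus their n states
            stack.append(lhs)
            reductions.append(rule)
        # 2. Goto: the top of the stack is now a state followed by a symbol.
        state, symbol = stack[-2], stack[-1]
        next_state = GOTO.get(state, {}).get(symbol)
        if next_state is None:
            raise SyntaxError(f"no Goto for state {state} on {symbol!r}")
        stack.append(next_state)


print(parse(["(", "id", "+", "id", ")"]))
```

For the input ( id + id ) the sketch prints [3, 1, 3, 2, 4, 1]; read backwards, this is the rule sequence of a rightmost derivation, which is the point made in 1.1.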