
Parsing: Fall 2005 Costas Buch - RPI 1

The document discusses parsing and parsers. A parser knows the grammar of a programming language and finds the derivation of an input program by applying grammar rules. The exhaustive search parser examines all possible derivations in phases, starting with derivations of length 1, until it finds a derivation of the input string. This takes time exponential in the length of the string. Faster algorithms exist: S-grammars can be parsed in a number of steps linear in the string length, and the CYK algorithm parses any context-free grammar in Chomsky Normal Form in cubic time.


Parsing

Fall 2005 Costas Buch - RPI 1


Program (input to the Compiler):
    v = 5;
    if (v>5)
      x = 12 + v;
    while (x !=3) {
      x = x - 3;
      v = 10;
    }
    ......

Machine Code (output of the Compiler):
    Add v,v,0
    cmp v,5
    jmplt ELSE
    THEN:
    add x, 12,v
    ELSE:
    WHILE:
    cmp x,3
    ...
Compiler

input program -> [ Lexical analyzer -> parser ] -> output machine code
A parser knows the grammar
of the programming language



Parser
PROGRAM -> STMT_LIST
STMT_LIST -> STMT; STMT_LIST | STMT;
STMT -> EXPR | IF_STMT | WHILE_STMT
      | { STMT_LIST }

EXPR -> EXPR + EXPR | EXPR - EXPR | ID

IF_STMT -> if (EXPR) then STMT
         | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT


The parser finds the derivation
of a particular input

Parser
  grammar:    E -> E + E | E * E | INT
  input:      10 + 2 * 5
  derivation: E => E + E
                => E + E * E
                => 10 + E*E
                => 10 + 2 * E
                => 10 + 2 * 5


derivation:
  E => E + E
    => E + E * E
    => 10 + E*E
    => 10 + 2 * E
    => 10 + 2 * 5

derivation tree:
      E
    / | \
   E  +  E
   |   / | \
  10  E  *  E
      |     |
      2     5

machine code:
  mult a, 2, 5
  add b, 10, a
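The step from derivation tree to machine code can be sketched as a post-order walk of the tree. This is only an illustration under assumptions not in the slides: the tree is written as nested tuples `(operator, left, right)`, the helper name `emit` is hypothetical, and temporaries are named a, b, ... in the order they are created.

```python
from itertools import count

def emit(tree):
    """Post-order walk of an expression tree; returns three-address instructions."""
    code = []
    temps = (chr(ord("a") + i) for i in count())  # a, b, c, ... (assumed naming scheme)
    opcodes = {"+": "add", "*": "mult"}           # mnemonics as written on the slide

    def walk(node):
        if not isinstance(node, tuple):           # leaf: an integer literal
            return str(node)
        op, left, right = node
        l = walk(left)                            # operands are evaluated first
        r = walk(right)
        dest = next(temps)                        # fresh temporary for this result
        code.append(f"{opcodes[op]} {dest}, {l}, {r}")
        return dest

    walk(tree)
    return code

# Tree for 10 + 2 * 5, with the multiplication below the addition.
for line in emit(("+", 10, ("*", 2, 5))):
    print(line)
```

The multiplication is emitted first because it sits deeper in the tree, which is why the shape of the derivation tree matters to the compiler.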
A simple parser



We will build an exhaustive search parser
that examines all possible derivations

Exhaustive Parser
  input:  grammar, string
  output: derivation


Example:

Find derivation of string aabb

Exhaustive Parser
  grammar:    S -> SS
              S -> aSb
              S -> bSa
              S -> λ
  input:      aabb
  derivation: ?


Exhaustive Search

S -> SS | aSb | bSa | λ        Find derivation of aabb

Phase 1: all possible derivations of length 1
  S => SS
  S => aSb
  S => bSa
  S => λ

S => bSa and S => λ cannot lead to aabb, so only these two continue:
  S => SS
  S => aSb


Phase 2        S -> SS | aSb | bSa | λ        Find derivation of aabb

From Phase 1 derivation S => SS:
  S => SS => SSS
  S => SS => aSbS
  S => SS => bSaS
  S => SS => S
From Phase 1 derivation S => aSb:
  S => aSb => aSSb
  S => aSb => aaSbb
  S => aSb => abSab
  S => aSb => ab

Phase 2 derivations that can still lead to aabb:
  S => SS => SSS
  S => SS => aSbS
  S => SS => S
  S => aSb => aSSb
  S => aSb => aaSbb

Phase 3
  S => aSb => aaSbb => aabb
Final result of exhaustive search

Exhaustive Parser
  grammar:    S -> SS
              S -> aSb
              S -> bSa
              S -> λ
  input:      aabb
  derivation: S => aSb => aaSbb => aabb
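The phase-by-phase search can be sketched in Python as a breadth-first search over leftmost derivations. This is a minimal sketch, not the slides' own code, under stated assumptions: variables are single uppercase letters, terminals are lowercase, λ is written as the empty string, and the names `exhaustive_parse`, `viable` and `max_phases` are hypothetical. Because this grammar has a λ-production, the caller supplies an explicit bound on the number of phases.

```python
def exhaustive_parse(productions, start, w, max_phases):
    """Return a derivation of w as a list of sentential forms, or None."""

    def viable(form):
        # Drop forms that already contain more terminals than w.
        if sum(1 for c in form if not c.isupper()) > len(w):
            return False
        # The terminals before the first variable must be a prefix of w.
        for i, c in enumerate(form):
            if c.isupper():
                return True
            if i >= len(w) or c != w[i]:
                return False
        return form == w  # no variables left: must be w itself

    frontier = [[start]]                    # derivations kept from the last phase
    for _ in range(max_phases):             # one iteration per phase
        next_frontier = []
        for path in frontier:
            form = path[-1]
            i = next((j for j, c in enumerate(form) if c.isupper()), None)
            if i is None:
                continue                    # nothing left to expand
            for rhs in productions[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                if new == w:
                    return path + [new]
                if viable(new):
                    next_frontier.append(path + [new])
        frontier = next_frontier
    return None

grammar = {"S": ["SS", "aSb", "bSa", ""]}   # "" stands for λ
print(" => ".join(exhaustive_parse(grammar, "S", "aabb", max_phases=4)))
```

The pruning in `viable` plays the role of the derivations dropped between phases on the slides; without it the frontier would contain every derivation of the given length.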
The time complexity of exhaustive search

Suppose there are no productions of the form

  A -> λ   (λ-productions)
  A -> B   (unit productions)

Number of phases for string w: at most 2|w|

Since in every phase: either a new variable is inserted
or a variable changes to terminal
For grammar with k productions

Derivation steps for phase 1: at most k

since there are at most k possible derivations



Steps for phase 2: at most $k \times k = k^2$
  (derivations from phase 1) x (number of productions)

In General
Steps for phase i: at most $k^{i-1} \times k = k^i$
  (derivations from phase i-1) x (number of productions)
Total steps needed for string w:

$k + k^2 + \cdots + k^{2|w|}$
(phase 1, phase 2, ..., phase 2|w|)

Exponential in the string length

Extremely bad!!!
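To see how fast this total grows, it can be written in closed form as a geometric sum. The values k = 4 and |w| = 4 below are illustrative choices only, not figures from the slides.

```latex
\[
\sum_{i=1}^{2|w|} k^{i} \;=\; \frac{k^{2|w|+1} - k}{k - 1} \;=\; O\!\left(k^{2|w|}\right)
\qquad (k \ge 2)
\]
\[
k = 4,\ |w| = 4:\qquad \frac{4^{9} - 4}{3} \;=\; \frac{262140}{3} \;=\; 87380 \ \text{steps at most}
\]
```

Each extra input symbol adds two phases, so it multiplies the bound by roughly $k^2$.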
Faster Parsers



There exist faster parsing algorithms
for specialized grammars

S-grammar: A -> av
  a: a terminal symbol
  v: a string of variables

Each pair of variable, terminal (X, a)
appears in at most one production X -> av

(a restricted version of Greibach Normal Form)


S-grammar example:
  S -> aS
  S -> bSS
  S -> c

Each string has a unique derivation

S => aS => abSS => abcS => abcc


For S-grammars:

In the exhaustive search parsing
there is only one choice in each phase

Steps for a phase: 1

Total steps for parsing string w: |w|
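Because each (variable, terminal) pair selects at most one production, the parser never has to guess. A minimal sketch of that idea follows, assuming the grammar is stored as a dictionary keyed by (variable, terminal) whose value is the string of variables v; the function name `s_grammar_parse` is hypothetical.

```python
def s_grammar_parse(rules, start, w):
    """Return the leftmost derivation of w as a list of sentential forms, or None."""
    stack = [start]                # variables still to expand, leftmost on top
    consumed = ""                  # terminals matched so far
    derivation = [start]
    for ch in w:                   # exactly one derivation step per input symbol
        if not stack:
            return None            # input left over but nothing to expand
        var = stack.pop()
        body = rules.get((var, ch))
        if body is None:
            return None            # no production for this (variable, terminal) pair
        stack.extend(reversed(body))
        consumed += ch
        derivation.append(consumed + "".join(reversed(stack)))
    return derivation if not stack else None

# S -> aS, S -> bSS, S -> c
rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(" => ".join(s_grammar_parse(rules, "S", "abcc")))
```

The loop body runs once per symbol of w, which matches the |w| total steps stated above.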


For general context-free grammars:

There exists a parsing algorithm
that parses a string w in time $O(|w|^3)$

(CYK parser, described next)


The CYK Parsing Algorithm

Input:  • Arbitrary Grammar G in Chomsky Normal Form
        • String w

Output: Determine if $w \in L(G)$

Number of Steps: $O(|w|^3)$

Can be easily converted to a Parser
Basic Idea
Consider a grammar G
in Chomsky Normal Form

Denote by F(w) the set of variables
that generate a string w

$X \in F(w)$ if $X \Rightarrow^{*} w$
F(w) can be computed recursively:

Write $w = uv$ (u: prefix, v: suffix)

If $X \in F(u)$ and $Y \in F(v)$   ($X \Rightarrow^{*} u$, $Y \Rightarrow^{*} v$)

and there is a production Z -> XY

Then $Z \in F(w)$   ($Z \Rightarrow XY \Rightarrow^{*} uY \Rightarrow^{*} uv = w$)
Compute F(w) by taking the union
over all possible decompositions of w

Length of prefix    Decomposition                  Set of variables that generates w
1                   $w = u_1 v_1$                  $H_1$
2                   $w = u_2 v_2$                  $H_2$
...
|w|-1               $w = u_{|w|-1} v_{|w|-1}$      $H_{|w|-1}$

Result: $F(w) = H_1 \cup H_2 \cup \cdots \cup H_{|w|-1}$
At the basis of the recursion
we have strings of length 1

$F(\sigma) = \{\text{variables that generate symbol } \sigma\}$

that is, the variables X with a production X -> σ


In order to determine if $w \in L(G)$
we only need to examine if $S \in F(w)$
(S is the start variable; $S \Rightarrow^{*} w$)

Remark:
The whole algorithm can be implemented
with dynamic programming
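The recursive definition of F can be transcribed almost literally, with memoization standing in for the dynamic programming mentioned in the remark. This is a sketch under assumptions: productions are given as (head, body) pairs, variables and terminals are single characters, and `make_F` is a hypothetical helper name. The grammar used is the one from the example that follows in the slides.

```python
from functools import lru_cache

def make_F(productions):
    """Build F for a grammar in Chomsky Normal Form given as (head, body) pairs."""

    @lru_cache(maxsize=None)        # each distinct substring is computed only once
    def F(w):
        if len(w) == 1:             # basis: variables with a production X -> w
            return frozenset(h for h, b in productions if b == w)
        result = set()
        for i in range(1, len(w)):  # every prefix/suffix decomposition w = uv
            left, right = F(w[:i]), F(w[i:])
            for head, body in productions:
                if len(body) == 2 and body[0] in left and body[1] in right:
                    result.add(head)
        return frozenset(result)

    return F

# S -> AB, A -> BB | a, B -> AB | b
G = [("S", "AB"), ("A", "BB"), ("A", "a"), ("B", "AB"), ("B", "b")]
F = make_F(G)
print("S" in F("aabbb"))            # membership test: is the start variable in F(w)?
```

Without the cache the same substrings would be recomputed many times; with it, the work is bounded by the number of distinct substrings times the number of decompositions of each.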


Example:
• Grammar G:  S -> AB
              A -> BB | a
              B -> AB | b

• Determine if $w = aabbb \in L(G)$
Decompose the string aabbb
into all possible substrings

Length
1    a      a      b      b      b
2    aa     ab     bb     bb
3    aab    abb    bbb
4    aabb   abbb
5    aabbb
S -> AB, A -> BB | a, B -> AB | b

Length 1:  a      a      b      b      b
   F:      {A}    {A}    {B}    {B}    {B}

Length 2:  aa     ab     bb     bb
   F:      {}     {S,B}  {A}    {A}

Length 3:  aab    abb    bbb

Length 4:  aabb   abbb

Length 5:  aabbb
S -> AB, A -> BB | a, B -> AB | b

F(aa): prefix a, suffix a
  F(a) = {A}, F(a) = {A}
  There is no production of form X -> AA
  Thus, F(aa) = {}

F(ab): prefix a, suffix b
  F(a) = {A}, F(b) = {B}
  There are two productions of form X -> AB:
  S -> AB, B -> AB
  Thus, F(ab) = {S, B}
S -> AB, A -> BB | a, B -> AB | b

Length 1:  a      a      b      b      b
   F:      {A}    {A}    {B}    {B}    {B}

Length 2:  aa     ab     bb     bb
   F:      {}     {S,B}  {A}    {A}

Length 3:  aab    abb    bbb
   F:      {S,B}  {A}    {S,B}

Length 4:  aabb   abbb
   F:      {A}    {S,B}

Length 5:  aabbb
   F:      {S,B}

F(aabbb) = {S,B}

Since $S \in F(w)$, $aabbb \in L(G)$
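The table above can be filled bottom-up, shortest substrings first. The sketch below is one possible implementation, with assumptions: cells are indexed by (start position, length), productions are (head, body) pairs with single-character variables and terminals, and `cyk_table` is a hypothetical name.

```python
def cyk_table(productions, w):
    """Fill the CYK table; table[i, n] is F of the substring of length n starting at i."""
    table = {}
    for i, symbol in enumerate(w):                    # row for length 1
        table[i, 1] = {h for h, b in productions if b == symbol}
    for length in range(2, len(w) + 1):               # rows for lengths 2 .. |w|
        for i in range(len(w) - length + 1):
            cell = set()
            for k in range(1, length):                # prefix of length k, suffix the rest
                left = table[i, k]
                right = table[i + k, length - k]
                cell |= {h for h, b in productions
                         if len(b) == 2 and b[0] in left and b[1] in right}
            table[i, length] = cell
    return table

# S -> AB, A -> BB | a, B -> AB | b
G = [("S", "AB"), ("A", "BB"), ("A", "a"), ("B", "AB"), ("B", "b")]
w = "aabbb"
table = cyk_table(G, w)
for length in range(1, len(w) + 1):
    row = [sorted(table[i, length]) for i in range(len(w) - length + 1)]
    print(length, row)
print("in L(G):", "S" in table[0, len(w)])
```

Every cell only reads cells from shorter rows, so filling the rows in order of increasing length guarantees that each lookup is already available.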
Approximate number of steps:

$O(|w|^2 \cdot |w|) = O(|w|^3)$

  $|w|^2$: number of strings (substrings of w)
  $|w|$: number of prefix-suffix decompositions for a string
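The estimate can be checked with an exact count. Writing $n = |w|$, there are $n - \ell + 1$ substrings of length $\ell$, each with $\ell - 1$ prefix-suffix decompositions; the check for $n = 5$ corresponds to the string aabbb of the example.

```latex
\[
\sum_{\ell=2}^{n} (n-\ell+1)(\ell-1) \;=\; \binom{n+1}{3} \;=\; \frac{n^{3}-n}{6} \;=\; O(n^{3})
\]
\[
n = 5:\qquad 4\cdot 1 + 3\cdot 2 + 2\cdot 3 + 1\cdot 4 \;=\; 20 \;=\; \frac{5^{3}-5}{6}
\]
```

Each decomposition is checked against the productions of G, so for a fixed grammar the total stays within a constant factor of this count.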
