0% found this document useful (0 votes)
13 views

CYK Parsing Notes

The CYK algorithm is a parsing technique for context-free grammar that checks if a string can be derived from a given grammar, requiring the grammar to be in Chomsky Normal Form (CNF). It utilizes dynamic programming to construct a parsing table, allowing for efficient parsing with a complexity of O(n³) and the ability to handle ambiguous grammars. The document outlines the steps to convert a grammar to CNF, construct a CYK parsing table, and highlights the advantages of the CYK algorithm in various applications.

Uploaded by

bhowmikpinki59
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views

CYK Parsing Notes

The CYK algorithm is a parsing technique for context-free grammar that checks if a string can be derived from a given grammar, requiring the grammar to be in Chomsky Normal Form (CNF). It utilizes dynamic programming to construct a parsing table, allowing for efficient parsing with a complexity of O(n³) and the ability to handle ambiguous grammars. The document outlines the steps to convert a grammar to CNF, construct a CYK parsing table, and highlights the advantages of the CYK algorithm in various applications.

Uploaded by

bhowmikpinki59
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

CYK PARSING

• CYK algorithm is a parsing algorithm for context free grammar.


• It was independently developed by three Russian scientists
named Cocke, Younger, and Kasami, hence the name CYK!
• t is used to check if a particular string can be derived from the language
generated by a given grammar. It is also called the membership algorithm as it
tells whether the given string is a member of the given grammar or not.
• To apply CYK algorithm to a grammar, it must be in Chomsky Normal Form.
• It uses a dynamic programming algorithm to tell whether a string is in the
language of a grammar.

Basics

In a CYK algorithm, the structure of grammar should be in Chomsky normal form.

In addition, the CYK algorithm uses the dynamic programming or table filling algorithm.

CNF

• Chomsky Normal Form (CNF) is a way to simplify context-free grammars (CFGs)


so that all production rules follow specific patterns.
• In CNF, each rule either produces two non-terminal symbols, a single terminal
symbol, or, in some cases, the empty string.
• Converting a CFG to CNF is an important step in many parsing algorithms, like
the CYK algorithm, and helps in understanding the structure of languages.

CNF Rules

• It has starting symbol


• It has LHS and RHS
• LHS will have non-terminal
• RHS can contain:
▪ Single terminal
▪ Two non-terminal
• RHS should not contain:
▪ Combination of terminal and non-terminal
▪ Single non-terminal

The grammar will be in CNF if each rule has one of the following forms:

• A→BC (at most two variables on the right-hand side)

• A→ a (a single terminal on the right-hand side)

• S→Ø (null string)


If the given Grammar is not in the CNF form, convert it to the CNF form before applying
the CYK Algorithm.

Step 1:

Original Grammar

S → NP VP

NP → Det N | N

VP → V NP

Det → 'the' | 'a'

N → 'cat' | 'dog'

V → 'chased'

Step 2: Convert to CNF

S → NP VP

NP → Det N

VP → V NP

Det → 'the' | 'a'

N → 'cat' | 'dog'

V → 'chased'

The above is already in CNF form

CYK Parsing Table Construction

We construct a CYK table where row i, column j represents the non-terminals that
can derive the substring from position iii to jjj.

Sentence: "The cat chased a dog"

Count the number of words in the sentence and construct a matrix accordingly.

i.e., 5 x 5 matrix (rows from 1 to 5 and columns from 0 to 4)


0 the 1 cat 2 chased 3 a 4 dog 5

1 2 3 4 5
[0,1] [1,2]
0 Det
1 N
2
3
4

Check the CNF rule and find if there are any non-terminal contains Det and V;

Yes, we have :

NP → Det N

1 2 3 4 5
[0,1] [1,2]
0 Det NP

1
N
2

Next, we have ‘chased’ comes in between 2 and 3

1 2 3 4 5
[0,1] [1,2] [2,3]
0 Det NP

1
N
2 V

4
Next , we have ‘a’ which occurs between 3 and 4 and similarly ‘dog’ occurs between 4
and 5:

1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]

0 Det NP

1
N
2 V

3 Det NP

4
N

Finally go up one by one, V and NP is there in the rule, so we can combine

1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]

0 Det NP

1
N

2 V VP

3 Det NP

4
N
Now, check VP and NP in the rule, which is there in S→ NP VP

1 2 3 4 5
[0,1] [1,2] [2,3] [3,4] [4,5]

0 Det NP S

1
N

2 V VP

3 Det NP

4
N

Finally S is obtained, so can end here.

Advantages:

• Guaranteed Completeness – Ensures all valid parses are found.


• Efficient (O(n³) Complexity) – Uses dynamic programming for faster parsing.
• Handles Ambiguous Grammars – Finds all possible parses for a sentence.
• Supports Probabilistic Parsing (PCFGs) – Determines the most probable
parse.
• Foundation for Modern Parsers – Used in NLP and machine learning models.
• Useful in Compiler Design – Helps in syntax analysis and syntax tree
generation.
• Works Well for Large Sentences – More efficient than top-down approaches.

You might also like