CYK Algorithm & Tree Based Language Models
Key Concepts:
• Context-Free Grammar (CFG): A grammar where each production rule has a single
non-terminal on the left-hand side.
• Chomsky Normal Form (CNF): A restricted form of CFG where every production is
either:
o A → BC (two non-terminals)
o A → a (a terminal)
Given:
• A CFG in CNF
• An input string w = w[1...n]
Step-by-Step Process:
1. Initialize a table T[n][n], where each cell T[i][j] holds the set of non-terminals that can
generate the substring w[i...j].
2. Base Case (Length = 1): For each position i, add every non-terminal A with a rule
A → w[i] to T[i][i].
3. Recursive Step (Length > 1): For each substring length l = 2 to n, and each starting
position i, compute the possible non-terminals for substring w[i...i+l-1] by:
o Trying every split point k: if B ∈ T[i][k], C ∈ T[k+1][i+l-1], and A → BC is a
production rule, add A to T[i][i+l-1].
4. Accept: The string is in the language if and only if the start symbol S ∈ T[1][n].
Time Complexity: O(n³ · |G|), where n is the length of the input string and |G| is the
number of production rules in the grammar.
Example:
Grammar in CNF:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
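The steps above can be sketched as a small recognizer in Python. The test string "baaba" is an assumed example (it does not appear in the notes), chosen because this grammar derives it from S:

```python
# Minimal CYK recognizer for a grammar in CNF.
# terminal_rules: terminal -> non-terminals; binary_rules: (B, C) -> non-terminals.

def cyk(word, terminal_rules, binary_rules, start="S"):
    n = len(word)
    # T[i][j] = set of non-terminals deriving word[i..j] (0-indexed, inclusive)
    T = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):                  # base case: substrings of length 1
        T[i][i] = set(terminal_rules.get(ch, []))
    for length in range(2, n + 1):                 # substring lengths 2..n
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                  # every split point
                for B in T[i][k]:
                    for C in T[k + 1][j]:
                        T[i][j] |= set(binary_rules.get((B, C), []))
    return start in T[0][n - 1]

# The example grammar from the notes.
terminal_rules = {"a": ["A", "C"], "b": ["B"]}
binary_rules = {
    ("A", "B"): ["S", "C"],
    ("B", "C"): ["S"],
    ("B", "A"): ["A"],
    ("C", "C"): ["B"],
}

print(cyk("baaba", terminal_rules, binary_rules))  # True: "baaba" is derivable from S
```

The three nested loops over length, start position, and split point give the n³ factor of the complexity bound; the rule lookups account for the |G| factor.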
Applications in NLP:
Tree-based language models aim to improve upon traditional n-gram or sequential neural
models (like RNNs or LSTMs) by explicitly modeling the grammatical structure of a sentence
using constituency trees or dependency trees.
Probabilistic Context-Free Grammars (PCFGs):
• How it works: Each rule (e.g., NP → DT NN) is assigned a probability, typically estimated
from rule frequencies in a treebank.
• Used in: Parsing and modeling of languages with complex syntax (such as German or Hindi).
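As a minimal sketch of this estimation (the rule counts below are made up, not taken from a real treebank): the probability of each rule A → β is its count divided by the total count for A, and the probability of a parse is the product of its rule probabilities.

```python
from collections import Counter

# Toy rule counts (assumed, not from a real treebank).
rule_counts = Counter({
    ("NP", ("DT", "NN")): 30,
    ("NP", ("NNP",)): 10,
    ("VP", ("VB", "NP")): 25,
    ("VP", ("VB",)): 5,
})

# Maximum-likelihood estimate: P(A -> beta) = count(A -> beta) / count(A).
lhs_totals = Counter()
for (lhs, _), c in rule_counts.items():
    lhs_totals[lhs] += c
rule_prob = {r: c / lhs_totals[r[0]] for r, c in rule_counts.items()}

# Probability of a parse = product of the probabilities of the rules it uses.
parse_rules = [("NP", ("DT", "NN")), ("VP", ("VB", "NP"))]
p = 1.0
for r in parse_rules:
    p *= rule_prob[r]
print(p)  # 0.75 * (25/30) = 0.625
```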
Recursive Neural Networks (Tree-RNNs):
• Structure: A neural model that recursively combines child node vectors to form
parent node vectors.
• Parse Tree Usage: Applies the same composition function at each node of a syntax tree.
• Limitation: Shallow structure and hard to train; replaced in many areas by Tree-LSTMs.
Tree-LSTMs:
• Introduced by: Kai Sheng Tai, Richard Socher, and Christopher Manning (2015).
• Used in:
o Sentiment analysis
o Sentence similarity
o Syntax-aware classification
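A toy sketch of the recursive composition (random vectors stand in for trained embeddings and weights; the parse tree and dimension are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # embedding dimension (assumed)
W = rng.standard_normal((d, 2 * d)) * 0.1    # one composition matrix, shared by all nodes
b = np.zeros(d)

def compose(left, right):
    """Parent vector from two child vectors: the same function at every tree node."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Word vectors for "the movie was great" (random stand-ins for embeddings).
the, movie, was, great = (rng.standard_normal(d) for _ in range(4))

# Follow an assumed parse tree: ((the movie) (was great))
np_vec = compose(the, movie)       # NP node
vp_vec = compose(was, great)       # VP node
s_vec = compose(np_vec, vp_vec)    # S node: the sentence representation

print(s_vec.shape)  # (4,)
```

A Tree-LSTM replaces this single tanh composition with an LSTM-style cell that has one forget gate per child, which makes deep trees easier to train.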
Dependency-Based Language Models:
• Focus: Dependency parse trees, where each word is connected to other words through
grammatical relations (e.g., subject, object).
• Model: Predicts words conditioned on their syntactic heads and dependents rather than
on the left-to-right sequence.
• Examples: Eisner's Dependency Model (1996); the Structured Language Model of
Chelba & Jelinek (1998).
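A minimal sketch of the conditioning idea (the dependency arcs below are made up for illustration and are not a real model of the cited papers): a word is scored given its head, estimated from arc counts rather than from its left context.

```python
from collections import Counter

# Toy dependency arcs (head, dependent), assumed for illustration:
# e.g., in "the dog chased the cat", "chased" heads "dog" and "cat".
arcs = [
    ("chased", "dog"), ("dog", "the"),
    ("chased", "cat"), ("cat", "the"),
    ("chased", "dog"), ("dog", "a"),
]

# P(dependent | head): estimated from arc counts, not from left-to-right context.
pair_counts = Counter(arcs)
head_counts = Counter(h for h, _ in arcs)

def p_dep(dep, head):
    return pair_counts[(head, dep)] / head_counts[head]

print(p_dep("dog", "chased"))  # 2/3: two of the three "chased" arcs point to "dog"
```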
Benefits:
• Captures long-distance dependencies between syntactically related words that
n-gram models miss.
Tree Transformers:
• What it is: Transformer models that integrate syntactic trees (parse trees) into the
attention mechanism.
• How: Tree structure is injected into self-attention, for example by constraining or
biasing attention weights according to the parse tree.
Examples:
• Syntax-Aware Transformers
• TreeFormer
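One illustrative way to inject a tree into attention (a sketch of the general idea, not the actual mechanism of the models listed above) is a constituent mask: a token may attend to another token only if some constituent span contains both.

```python
import numpy as np

# Tokens of "the dog barked" with assumed constituent spans:
# NP = tokens 0..1, VP = token 2.
n, d = 3, 4
spans = [(0, 1), (2, 2)]

# Additive mask: -inf blocks attention across constituents, 0 allows it.
mask = np.full((n, n), -np.inf)
for lo, hi in spans:
    mask[lo:hi + 1, lo:hi + 1] = 0.0

rng = np.random.default_rng(1)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

scores = Q @ K.T / np.sqrt(d) + mask               # tree mask biases the attention logits
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
out = weights @ V

print(out.shape)  # (3, 4); token 2 attends only to itself under this mask
```

Real syntax-aware transformers typically soften this, e.g., biasing logits by tree distance instead of hard-masking, or applying the mask only in some heads or layers.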
Summary:
The CYK algorithm decides membership for any CFG in Chomsky Normal Form via dynamic
programming in O(n³ · |G|) time. Tree-based language models (PCFGs, Recursive Neural
Networks, Tree-LSTMs, dependency-based models, and Tree Transformers) share the idea of
modeling explicit syntactic structure rather than treating sentences as flat sequences.