NLP Unit-4
Probabilistic parsing
Probabilistic parsing uses dynamic programming algorithms to compute the most likely
parse(s) of a given sentence, given a statistical model of the syntactic structure of a language.
Research at Stanford has focused on improving both the statistical models and the
algorithms.
A probabilistic context-free grammar consists of terminal and non-terminal symbols. Each
production rule is assigned a probability, estimated from a training corpus of parsed
sentences (a treebank). Production rules are applied recursively until only terminal
symbols are left.
Probabilistic Language Processing
Probabilistic language processing presupposes a probabilistic model of the language and uses
that model to infer, for example, how sentences should be parsed or how ambiguous words
should be interpreted.
Example:
Typical settings include weather forecasting and mail delivery. A probabilistic model is meant
to give a distribution over possible outcomes, i.e. it describes all possible outcomes and gives
some measure of how likely each is to occur.
PCFG
Probabilistic Context-Free Grammar (PCFG): a PCFG is a probabilistic version of a CFG
in which each production has a probability. The probabilities of all productions rewriting a
given non-terminal must sum to 1, defining a probability distribution for each non-terminal.
The PCFG defines the prior probability distribution over structures, posterior probabilities
are estimated by the inside-outside algorithm, and the most likely structure is found
by the CYK algorithm.
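To make this concrete, below is a minimal probabilistic CYK sketch in Python. The toy grammar, its probabilities, and the function name cyk_best_parse_prob are illustrative assumptions, not part of these notes; the sketch assumes the grammar is in Chomsky Normal Form (binary rules plus lexical rules) and returns the probability of the most likely parse.

# Minimal probabilistic CYK sketch (toy grammar assumed for illustration).
# Assumes Chomsky Normal Form: A -> B C (binary) or A -> 'word' (lexical).
lexical = {
    'the': [('Det', 1.0)],
    'dog': [('Noun', 0.5)],
    'cat': [('Noun', 0.5)],
    'saw': [('Verb', 1.0)],
}
binary = {
    ('Det', 'Noun'): [('NP', 1.0)],
    ('Verb', 'NP'): [('VP', 1.0)],
    ('NP', 'VP'): [('S', 1.0)],
}

def cyk_best_parse_prob(words, start='S'):
    """Return the probability of the most likely parse of `words` rooted at `start`."""
    n = len(words)
    # table[i][j] maps a non-terminal to the best probability of deriving words[i:j]
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # length-1 spans come from the lexical rules
    for i, word in enumerate(words):
        for nt, p in lexical.get(word, []):
            table[i][i + 1][nt] = max(table[i][i + 1].get(nt, 0.0), p)
    # longer spans are built bottom-up by trying every split point k
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, pb in table[i][k].items():
                    for c, pc in table[k][j].items():
                        for a, pr in binary.get((b, c), []):
                            p = pr * pb * pc
                            if p > table[i][j].get(a, 0.0):
                                table[i][j][a] = p
    return table[0][n].get(start, 0.0)

print(cyk_best_parse_prob(['the', 'dog', 'saw', 'the', 'cat']))   # 0.25 for this toy grammar

The full inside-outside algorithm sums (rather than maximizes) over analyses in order to re-estimate rule probabilities; the max version above corresponds to the CYK step that recovers the single most likely structure.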
Advantages:
PCFGs are good for grammar induction, i.e. learning a grammar from text, whereas plain
CFGs require negative data in order to be learned. PCFGs also tend to be robust to
disfluencies and grammatical mistakes, because such constructions simply receive low probabilities.
• They provide a precise mathematical definition that clearly rules out certain types of
language.
• The formal definition means that context-free grammars are computationally
tractable: it is possible to write a computer program which determines whether
sentences are grammatical or not.
Limitations of Probabilistic Context-Free Grammar:
• Lexical rules are difficult to express in a context-free grammar.
• The notation of context-free grammars is quite complex.
• It is very difficult to construct a recognizer directly from a context-free grammar.
Probabilistic Context-Free Grammar (PCFG) is an extension of Context-Free Grammar (CFG)
in which each production rule carries a probability. Ambiguity is the reason for using the
probabilistic version of CFG: some sentences have more than one underlying derivation,
i.e. the sentence can be parsed in more than one way, so the parse of the sentence becomes
ambiguous. To resolve this ambiguity, we can use a PCFG to find the probability of each
parse of the given sentence.
A PCFG is made up of a CFG and a probability for each production rule of the CFG. A
PCFG can be formally defined as follows:
A probabilistic context-free grammar G is a quintuple G = (N, T, S, R, P) where
• N is a finite set of non-terminal symbols,
• T is a finite set of terminal symbols,
• S ∈ N is the start symbol,
• R is a finite set of production rules, and
• P is the set of probabilities attached to the rules in R, such that the probabilities of all
rules with the same left-hand side sum to 1.
Example PCFG:
Probabilistic Context Free Grammar G = (N, T, S, R, P)
• N = {S, NP, VP, PP, Det, Noun, Verb, Pre}
• T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
• S=S
• R = { S → NP VP
NP → Det Noun | NP PP
PP → Pre NP
VP → Verb NP
Det → ‘a’ | ‘the’
Noun → ‘cake’ | ‘child’ | ‘fork’
Pre → ‘with’
Verb → ‘ate’ }
• P = R with associated probability as in the table below;
Rule Probability
S → NP VP 1.0
NP → NP PP 0.6
NP → Det Noun 0.4
PP → Pre NP 1.0
VP → Verb NP 1.0
Det → ‘a’ 0.5
Det → ‘the’ 0.5
Noun → ‘cake’ 0.4
Noun → ‘child’ 0.3
Noun → ‘fork’ 0.3
Pre → ‘with’ 1.0
Verb → ‘ate’ 1.0
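The same grammar can be run in code. Below is a minimal sketch using NLTK's PCFG and ViterbiParser classes (this assumes the nltk package is installed; it is not part of the original notes). It encodes exactly the rules and probabilities from the table above and prints the most probable parse of an example sentence.

from nltk import PCFG
from nltk.parse import ViterbiParser

# The grammar and probabilities from the table above, in NLTK's PCFG notation.
grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> NP PP [0.6]
NP -> Det Noun [0.4]
PP -> Pre NP [1.0]
VP -> Verb NP [1.0]
Det -> 'a' [0.5]
Det -> 'the' [0.5]
Noun -> 'cake' [0.4]
Noun -> 'child' [0.3]
Noun -> 'fork' [0.3]
Pre -> 'with' [1.0]
Verb -> 'ate' [1.0]
""")

parser = ViterbiParser(grammar)
sentence = 'the child ate a cake with a fork'.split()

# ViterbiParser yields the most probable parse tree(s) together with their probability.
for tree in parser.parse(sentence):
    print(tree)
    print('P(t) =', tree.prob())

Note that NLTK checks that the probabilities of the rules for each non-terminal sum to 1, which is exactly the PCFG condition stated earlier.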
We have formally defined PCFG. The next question is how to use a PCFG to derive the
probability of a parse tree (derivation tree). As discussed, a sentence can be parsed in more
than one way; that is, due to ambiguity we may have more than one parse tree for the
sentence as per the CFG.
Given a parse tree t built with the production rules α1 → β1, α2 → β2, …, αn → βn from R
(i.e., αi → βi ∈ R), we can find the probability of tree t using the PCFG as follows:
P(t) = P(α1 → β1) × P(α2 → β2) × … × P(αn → βn)
As per the equation, the probability P(t) of a parse tree is the product of the probabilities of
the production rules used in the tree t.
Example:
Find the probability of the parse tree t given below;
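The tree figure from the original notes is not reproduced here, so as a sketch assume t is the parse of 'the child ate a cake with a fork' under the grammar above, with the PP 'with a fork' attached to the object NP (the only attachment this grammar allows). Multiplying the probabilities of the rules used in that tree gives P(t):

# Sketch: P(t) as the product of the rule probabilities used in the assumed tree.
rules_used = [
    ('S -> NP VP', 1.0),
    ('NP -> Det Noun', 0.4),   # 'the child'
    ("Det -> 'the'", 0.5),
    ("Noun -> 'child'", 0.3),
    ('VP -> Verb NP', 1.0),
    ("Verb -> 'ate'", 1.0),
    ('NP -> NP PP', 0.6),      # 'a cake with a fork'
    ('NP -> Det Noun', 0.4),   # 'a cake'
    ("Det -> 'a'", 0.5),
    ("Noun -> 'cake'", 0.4),
    ('PP -> Pre NP', 1.0),
    ("Pre -> 'with'", 1.0),
    ('NP -> Det Noun', 0.4),   # 'a fork'
    ("Det -> 'a'", 0.5),
    ("Noun -> 'fork'", 0.3),
]
p = 1.0
for rule, prob in rules_used:
    p *= prob
print(p)   # approximately 0.0001728, i.e. about 1.7e-4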
The Semantics of Programming Languages. Semantics, roughly, are the meanings given to groups
of symbols: ab+c, "ab"+"c", mult(5,4). For example, to express the syntax of adding 5 and 4,
we can say: put a "+" sign between the 5 and the 4, yielding "5 + 4". However, we must also
define the semantics of 5 + 4.
Text classification is a machine learning technique that assigns a set of predefined categories
to open-ended text. Text classifiers can be used to organize, structure, and categorize almost
any kind of text, from documents, medical studies, and files to content from all over the web.
With text classification, there are two main deep learning models that are widely
used: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). A CNN is
a type of neural network that consists of an input layer, an output layer, and multiple hidden
layers made up of convolutional layers.
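As a minimal sketch of such an architecture (assuming TensorFlow/Keras is installed; the vocabulary size, embedding size, and layer sizes below are illustrative assumptions), a 1-D convolutional text classifier can be assembled as follows:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

model = Sequential([
    Embedding(input_dim=20000, output_dim=128),              # token ids -> dense vectors
    Conv1D(filters=128, kernel_size=5, activation='relu'),   # convolutional hidden layer
    GlobalMaxPooling1D(),                                     # pool over the whole sequence
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),                           # binary output, e.g. relevant / not relevant
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=3, validation_data=(X_val, y_val))
# where X_train / X_val would be padded arrays of token ids (assumed, not shown here)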
This kernel aims to give a brief overview of performing text classification using Naive Bayes,
Logistic Regression, Support Vector Machines, and a Decision Tree classifier. We will be using
a dataset called "Economic news article tone and relevance", which consists of approximately
8,000 news articles tagged as relevant or not relevant to the US economy. Our
goal in this kernel is to explore the process of training and testing text classifiers for this dataset.
Import Required Libraries
In [1]:
import numpy as np
import pandas as pd
import string
import re
import warnings
warnings.filterwarnings("ignore")
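A plausible continuation of the kernel (a sketch only; the CSV file name and the column names used below are assumptions about the dataset, not confirmed from the original) is to load the articles, extract bag-of-words features, and compare the four classifiers mentioned above.
In [2]:
# Sketch of a possible next step: vectorize the articles and compare the four
# classifiers named above. The CSV file name and the column names 'text' and
# 'relevance' are assumptions about the dataset.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("economic_news.csv")      # assumed file name; pandas was imported above
X = df["text"]                             # assumed column holding the article text
y = df["relevance"]                        # assumed column holding the relevant / not-relevant tag

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Bag-of-words features shared by all classifiers
vectorizer = CountVectorizer(lowercase=True, stop_words="english", max_features=10000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train_vec, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test_vec)))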