NLP Unit-4
Probabilistic parsing
Probabilistic parsing uses dynamic programming algorithms to compute the most likely
parse(s) of a given sentence, given a statistical model of the syntactic structure of a language.
Research at Stanford has focused on improving both the statistical models and the
algorithms.
A probabilistic context-free grammar consists of terminal and non-terminal symbols. Each
production rule is assigned a probability, estimated from a training corpus of parsed
sentences (a treebank). Production rules are applied recursively until only terminal
symbols are left.
Probabilistic Language Processing
Probabilistic language processing presupposes a probabilistic model of the language and uses
that model to infer, for example, how sentences should be parsed or how ambiguous words
should be interpreted.
Example:
Typical settings include weather forecasting and mail delivery. A probabilistic model is meant
to give a distribution over possible outcomes, i.e. it describes all possible outcomes and gives
some measure of how likely each is to occur.
PCFG
Probabilistic Context-Free Grammar (PCFG): a PCFG is a probabilistic version of a CFG
in which each production has a probability. The probabilities of all productions rewriting a
given non-terminal must sum to 1, defining a probability distribution for each non-terminal.
The PCFG defines the prior probability distribution over structures, posterior probabilities
are estimated by the inside-outside algorithm, and the most likely structure is found
by the CYK algorithm.
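To make this concrete, below is a minimal probabilistic CYK sketch in Python. The toy grammar, its probabilities, and the function name cyk_best_parse_prob are illustrative assumptions, not part of these notes; the sketch assumes the grammar is in Chomsky Normal Form (binary rules plus lexical rules) and returns the probability of the most likely parse.

# Minimal probabilistic CYK sketch (toy grammar assumed for illustration).
# Assumes Chomsky Normal Form: A -> B C (binary) or A -> 'word' (lexical).
lexical = {
    'the': [('Det', 1.0)],
    'dog': [('Noun', 0.5)],
    'cat': [('Noun', 0.5)],
    'saw': [('Verb', 1.0)],
}
binary = {
    ('Det', 'Noun'): [('NP', 1.0)],
    ('Verb', 'NP'): [('VP', 1.0)],
    ('NP', 'VP'): [('S', 1.0)],
}

def cyk_best_parse_prob(words, start='S'):
    """Return the probability of the most likely parse of `words` rooted at `start`."""
    n = len(words)
    # table[i][j] maps a non-terminal to the best probability of deriving words[i:j]
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # length-1 spans come from the lexical rules
    for i, word in enumerate(words):
        for nt, p in lexical.get(word, []):
            table[i][i + 1][nt] = max(table[i][i + 1].get(nt, 0.0), p)
    # longer spans are built bottom-up by trying every split point k
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, pb in table[i][k].items():
                    for c, pc in table[k][j].items():
                        for a, pr in binary.get((b, c), []):
                            p = pr * pb * pc
                            if p > table[i][j].get(a, 0.0):
                                table[i][j][a] = p
    return table[0][n].get(start, 0.0)

print(cyk_best_parse_prob(['the', 'dog', 'saw', 'the', 'cat']))   # 0.25 for this toy grammar

The full inside-outside algorithm sums (rather than maximizes) over analyses in order to re-estimate rule probabilities; the max version above corresponds to the CYK step that recovers the single most likely structure.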
Advantages:
PCFGs are good for grammar induction, i.e. learning a grammar from text, whereas plain
CFGs require negative data in order to be learned. PCFGs also tend to be robust to
disfluencies and grammatical mistakes, because such constructions simply receive low probabilities.
• They provide a precise mathematical definition that clearly rules out certain types of
language.
• The formal definition means that context-free grammars are computationally
tractable: it is possible to write a computer program which determines whether
sentences are grammatical or not.
Limitations of Probabilistic Context-Free Grammar:
• Lexical rules are difficult to express in a context-free grammar.
• The notation of context-free grammars is quite complex.
• It is very difficult to construct a recognizer directly from a context-free grammar.
Probabilistic Context-Free Grammar (PCFG) is an extension of Context-Free Grammar (CFG)
in which each production rule carries a probability. Ambiguity is the reason for using the
probabilistic version of CFG: some sentences have more than one underlying derivation,
i.e. the sentence can be parsed in more than one way, so the parse of the sentence becomes
ambiguous. To resolve this ambiguity, we can use a PCFG to find the probability of each
parse of the given sentence.
A PCFG is made up of a CFG and a probability for each production rule of the CFG. A
PCFG can be formally defined as follows:
A probabilistic context-free grammar G is a quintuple G = (N, T, S, R, P) where
• N is a finite set of non-terminal symbols,
• T is a finite set of terminal symbols,
• S ∈ N is the start symbol,
• R is a finite set of production rules, and
• P is the set of probabilities attached to the rules in R, such that the probabilities of all
rules with the same left-hand side sum to 1.
Example PCFG:
Probabilistic Context Free Grammar G = (N, T, S, R, P)
• N = {S, NP, VP, PP, Det, Noun, Verb, Pre}
• T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
• S=S
• R = { S → NP VP
NP → Det Noun | NP PP
PP → Pre NP
VP → Verb NP
Det → ‘a’ | ‘the’
Noun → ‘cake’ | ‘child’ | ‘fork’
Pre → ‘with’
Verb → ‘ate’ }
• P = R with associated probability as in the table below;
Rule Probability
S → NP VP 1.0
NP → NP PP 0.6
NP → Det Noun 0.4
PP → Pre NP 1.0
VP → Verb NP 1.0
Det → ‘a’ 0.5
Det → ‘the’ 0.5
Noun → ‘cake’ 0.4
Noun → ‘child’ 0.3
Noun → ‘fork’ 0.3
Pre → ‘with’ 1.0
Verb → ‘ate’ 1.0
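The same grammar can be run in code. Below is a minimal sketch using NLTK's PCFG and ViterbiParser classes (this assumes the nltk package is installed; it is not part of the original notes). It encodes exactly the rules and probabilities from the table above and prints the most probable parse of an example sentence.

from nltk import PCFG
from nltk.parse import ViterbiParser

# The grammar and probabilities from the table above, in NLTK's PCFG notation.
grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> NP PP [0.6]
NP -> Det Noun [0.4]
PP -> Pre NP [1.0]
VP -> Verb NP [1.0]
Det -> 'a' [0.5]
Det -> 'the' [0.5]
Noun -> 'cake' [0.4]
Noun -> 'child' [0.3]
Noun -> 'fork' [0.3]
Pre -> 'with' [1.0]
Verb -> 'ate' [1.0]
""")

parser = ViterbiParser(grammar)
sentence = 'the child ate a cake with a fork'.split()

# ViterbiParser yields the most probable parse tree(s) together with their probability.
for tree in parser.parse(sentence):
    print(tree)
    print('P(t) =', tree.prob())

Note that NLTK checks that the probabilities of the rules for each non-terminal sum to 1, which is exactly the PCFG condition stated earlier.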
We have formally defined PCFG. The next question is how to use a PCFG to derive the
probability of a parse tree (derivation tree). As discussed, a sentence can be parsed in more
than one way; that is, due to ambiguity we may have more than one parse tree for the
sentence as per the CFG.
Given a parse tree t built with the production rules α1 → β1, α2 → β2, …, αn → βn from R
(i.e., αi → βi ∈ R), we can find the probability of tree t using the PCFG as follows:
P(t) = P(α1 → β1) × P(α2 → β2) × … × P(αn → βn)
As per the equation, the probability P(t) of a parse tree is the product of the probabilities of
the production rules used in the tree t.
Example:
Find the probability of the parse tree t given below;
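The tree figure from the original notes is not reproduced here, so as a sketch assume t is the parse of 'the child ate a cake with a fork' under the grammar above, with the PP 'with a fork' attached to the object NP (the only attachment this grammar allows). Multiplying the probabilities of the rules used in that tree gives P(t):

# Sketch: P(t) as the product of the rule probabilities used in the assumed tree.
rules_used = [
    ('S -> NP VP', 1.0),
    ('NP -> Det Noun', 0.4),   # 'the child'
    ("Det -> 'the'", 0.5),
    ("Noun -> 'child'", 0.3),
    ('VP -> Verb NP', 1.0),
    ("Verb -> 'ate'", 1.0),
    ('NP -> NP PP', 0.6),      # 'a cake with a fork'
    ('NP -> Det Noun', 0.4),   # 'a cake'
    ("Det -> 'a'", 0.5),
    ("Noun -> 'cake'", 0.4),
    ('PP -> Pre NP', 1.0),
    ("Pre -> 'with'", 1.0),
    ('NP -> Det Noun', 0.4),   # 'a fork'
    ("Det -> 'a'", 0.5),
    ("Noun -> 'fork'", 0.3),
]
p = 1.0
for rule, prob in rules_used:
    p *= prob
print(p)   # approximately 0.0001728, i.e. about 1.7e-4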
The Semantics of Programming Languages. Semantics, roughly, are the meanings given to groups
of symbols: ab+c, "ab"+"c", mult(5,4). For example, to express the syntax of adding 5 and 4,
we can say: put a "+" sign between the 5 and the 4, yielding "5 + 4". However, we must also
define the semantics of 5 + 4.
Text classification is a machine learning technique that assigns a set of predefined categories
to open-ended text. Text classifiers can be used to organize, structure, and categorize almost
any kind of text, from documents, medical studies, and files to content from all over the web.
With text classification, there are two main deep learning models that are widely
used: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). A CNN is
a type of neural network that consists of an input layer, an output layer, and multiple hidden
layers made up of convolutional layers.
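As a minimal sketch of such an architecture (assuming TensorFlow/Keras is installed; the vocabulary size, embedding size, and layer sizes below are illustrative assumptions), a 1-D convolutional text classifier can be assembled as follows:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

model = Sequential([
    Embedding(input_dim=20000, output_dim=128),              # token ids -> dense vectors
    Conv1D(filters=128, kernel_size=5, activation='relu'),   # convolutional hidden layer
    GlobalMaxPooling1D(),                                     # pool over the whole sequence
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),                           # binary output, e.g. relevant / not relevant
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=3, validation_data=(X_val, y_val))
# where X_train / X_val would be padded arrays of token ids (assumed, not shown here)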
This kernel aims to give a brief overview of performing text classification using Naive Bayes,
Logistic Regression, Support Vector Machines, and a Decision Tree classifier. We will be using
a dataset called "Economic news article tone and relevance", which consists of approximately
8,000 news articles tagged as relevant or not relevant to the US economy. Our
goal in this kernel is to explore the process of training and testing text classifiers for this dataset.
Import Required Libraries
In [1]:
import numpy as np
import pandas as pd
import string
import re
import warnings
warnings.filterwarnings("ignore")
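A plausible continuation of the kernel (a sketch only; the CSV file name and the column names used below are assumptions about the dataset, not confirmed from the original) is to load the articles, extract bag-of-words features, and compare the four classifiers mentioned above.
In [2]:
# Sketch of a possible next step: vectorize the articles and compare the four
# classifiers named above. The CSV file name and the column names 'text' and
# 'relevance' are assumptions about the dataset.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("economic_news.csv")      # assumed file name; pandas was imported above
X = df["text"]                             # assumed column holding the article text
y = df["relevance"]                        # assumed column holding the relevant / not-relevant tag

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Bag-of-words features shared by all classifiers
vectorizer = CountVectorizer(lowercase=True, stop_words="english", max_features=10000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train_vec, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test_vec)))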