0% found this document useful (0 votes)

248 views14 pages

Lecture Notes On Syntactic Processing

Lecture notes on Syntactic Processing

Uploaded by

Akila

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

248 views14 pages

Lecture Notes On Syntactic Processing

Lecture notes on Syntactic Processing

Uploaded by

Akila

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Lecture Notes

Syntactic Processing
Syntactic processing is widely used in applications such as question answering systems, information extraction,
sentiment analysis, grammar checking etc. In this module, you learnt that there are three broad levels of syntactic
processing - (Parts-of-Speech) POS tagging, constituency parsing, and dependency parsing. And, POS tagging is a
crucial task in syntactic processing and is used as a preprocessing step in many NLP applications.

Parts-of-Speech Tags

You learnt about most commonly used tags in Penn Treebank of NLTK. Here is the list of the commonly
used tags:

Class Tag Part-of-Speech Definition Examples

Determiner DT Determiner Describes a reference to a noun This, that, the,

NN Noun, singular or tag of singular forms of common nouns Cat, box,

mass animal

Noun
NNS Noun, singular or tag of plural forms of common nouns Cats buses,
mass animals

NNP Proper noun, Tag of noun names, places and things India, Rahul,
singular Eric

NNPS Proper noun, plural Names of nations and nationalities, Personal Germans,
names Indians, three
Billys

Conjunction CC Coordinating Connects words, phrases, and clauses And but, or

Conjunctions

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

VB Verb, base form Verbs in the base form Learn, eat,
study

VBD Verb, past form Past tense verbs to express action/state of Learnt, ate,
the past studied

VBG Verb, gerund or Verbs ending with ‘-ing’ Dying, lying,

present participle travelling
Verb

VBN Verb, past Verbs in the past form when used with - Have died
participle has/have/had/modal verb. Common - Had waited
scenarios are: -Could have
Have + ‘verb’ written
Might + have + ‘verb’ - were excited
were/is + ‘verb’

VBP Verb, non-3rd Non 3rd person verb to express I like this
person singular routines/habits, facts/truth and thoughts and game.
present feelings (used without word ‘to’) I go for a walk
daily

VBZ Verb, 3rd person Verbs that are used for 3rd person singular Argues,
singular present entities catches,
-ends with ‘s’, ‘es’ replies

Adjective JJ Adjective Words describing noun Tall, large,

generous

Adverb RB Adverb Words modifying a verb, adjective or other Now, first,

adverb Slowly went

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Preposition IN Preposition Links nouns, pronouns and phrases to other Above, across,
words in a sentence in, near of

Pronoun PRP Personal Pronoun Substitutes of noun I, he, she, they

Different Approaches to POS Tagging

Next, you learnt about the four main techniques used for POS tagging:
• Lexicon-based approach uses the following simple statistical algorithm: for each word, it assigns the
POS tag that most frequently occurs for that word in some training corpus.
For example, it will assign the tag "verb" to any occurrence of the word "run" if "run" is used as a
verb more often than any other tag.
• Rule-based taggers first assign the tag using the lexicon method and then apply predefined rules.
Some examples of rules are: Change the tag to VBG for words ending with ‘-ing’, Changes the tag to
VBD for words ending with ‘-ed’, etc.
You also learnt to implement the lexicon and rule-based tagger on the Treebank corpus of NLTK. Next, you
learnt about:
• Probabilistic (or stochastic) techniques don't naively assign the highest frequency tag to each word,
instead, they look at slightly longer parts of the sequence and often use the tag(s) and the word(s)
appearing before the target word to be tagged. You learnt about the commonly used probabilistic
algorithm for POS tagging - the Hidden Markov Model (HMM)
Deep-learning based POS tagging: Recurrent Neural Networks (RNNs) are used for sequential modeling
processes. In this module, you got a basic overview on how RNNs are used for POS tagging.

Hidden Markov Models

In probabilistic method, you learnt about Markov processes and HMMs. Markov processes are commonly
used to model sequential data, such as text and speech. You learnt that the first-order Markov assumption
states that the probability of an event (or state) depends only on the previous state.

The Hidden Markov Model, an extension to the Markov process, is used to model phenomena where the
states are hidden and they emit observations. The transition and the emission probabilities specify the
probabilities of transition between states and emission of observations from states, respectively. In POS
tagging, the states are the POS tags while the words are the observations. To summarise, a Hidden Markov
Model is defined by the initial state, emission, and the transition probabilities. Refer to following image to
revise transition, emission probabilities:

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

You learnt how to calculate the probability of a tag sequence for a given sequence of words. The POS tag
T for given word W depends on two things: POS tag of the previous word and the word itself.
i i

P(T | W ) = P(W |T ) * P(T |T )

i i i i i-1 i

So, the probability of a tag sequence (T , T , T , …, T ) for a given the word sequence (W , W , W , …, W ) can
1 2 3 n 1 2 3 n

be defined as:
P(T|W) = (P(W |T ) * P(T |start)) * (P(W |T ) * P(T |T )) * ...* (P(W |T ) * P(T |T ))
1 1 1 2 2 2 1 n n n n-1

You learnt that for a sequence of n words and t tags, a total of t tag sequences are possible. The Penn
n

Treebank dataset in NLTK itself has 36 POS tags, so for a sentence of length say 10, there are 36 possible 10

tag sequences.

Viterbi Heuristic

Next, you studied how Viterbi Heuristic can deal with this problem by taking a greedy approach. The basic
idea of the Viterbi algorithm is as follows - given a list of observations (words) O1,O2....On to be tagged,
rather than computing the probabilities of all possible tag sequences, you assign tags sequentially, i.e.
assign the most likely tag to each word using the previous tag.

More formally, you assign the tag T to each word W such that it maximises the likelihood:
i i

P(T | W ) = P(W |T ) * P(T |T ) ,

i i i i i-1 i

where T is the tag assigned to the previous word. The probability of a tag T is assumed to be dependent
i-1 i

only on the previous tag T , and hence the term P(T |T ) - Markov Assumption.
i-1 i i-1

Next you learnt that the Viterbi algorithm is an example of a dynamic programming algorithm. In general,
algorithms which break down a complex problem into subproblems and solve each subproblem optimally
are called dynamic programming algorithms.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Learning HMM Model Parameters

Next, you learnt to compute the emission & transition probabilities from a tagged corpus. This process of
learning the probabilities from a tagged corpus is called training an HMM model. The emission and the
transition probabilities can be learnt as follows:

Emission Probability of a word 'w' for tag 't':

P(w|t) = Number of times w has been tagged t/Number of times t appears
Example: P(‘cat’|N) = Number of times ‘cat’ appears as Noun/ Number of times Noun is appearing

Transition Probability of tag t1 followed by tag t2:

P(t2|t1) = Number of times t1 is followed by tag t2/ Number of times t1 appears
Example: P(Noun|Adj) = number of times adjective is followed by Noun/ Number of times Adjective is
appearing

HMM & Viterbi Implementation in Python

You learnt how to build a POS tagger using Viterbi Heuristic. For training the HMM, i.e., for learning the
model parameters, you used the NLTK Treebank corpus. After learning the model parameters, you find the
best possible state (tag) sequence for each given sentence. For that, you used the Viterbi algorithm - for
every word w in the sentence, a tag t is assigned to w such that it maximises the likelihood of the
occurrence of P(tag|word).

P(tag|word) = P(word|tag) * P(tag|previous tag)

= Emission probability * Transition probability

In other words, the tag t is assigned to the word w which has the max P(tag|word).
The assigned tags and words are then stored as a list of tuples. As you move to the next word in the list,
each tag to be assigned will use the tag of the previous word.

You saw that the Viterbi algorithm gave ~87% accuracy. The 13% loss of accuracy was majorly because of
the fact that when the algorithm hit an unknown word (i.e. not present in the training set), it naively
assigned the first tag in the list of tags that we have created.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Deep-learning based POS Tagging

Next, you got a brief overview of how you can build POS taggers using RNNs. Recurrent Neural Networks (RNNs)
have empirically proven to outperform many conventional sequence models for tasks such as POS tagging, entity
recognition, dependency parsing etc. You’ll learn RNNs in detail later in the Neural Network course.

Constituency Parsing

Next, you studied why shallow parsing is not sufficient. Shallow parsing, as the name suggests, refers to
fairly shallow levels of parsing such as POS tagging, chunking, etc. But such techniques would not be able
to check the grammatical structure of the sentence, i.e. whether a sentence is grammatically correct, or
understand the dependencies between words in a sentence.
So, you learnt the two most commonly used paradigms of parsing - constituency parsing and dependency
parsing, which would help to check the grammatical structure of the sentence.
In constituency parsing, you learnt the basic idea of constituents as grammatically meaningful groups of
words, or phrases, such as noun phrase, verb phrase etc. You also learnt the idea of context-free grammars
or CFGs which specify a set of production rules. Any production rule can be written as A -> B C, where A is a
non-terminal symbol (NP, VP, N etc.) and B and C are either non-terminals or terminal symbols (i.e. words
in vocabulary such as flight, man etc.). Example a CFG is:
S -> NP VP
NP -> DT N| N| N PP
VP -> V| V NP
N -> ‘man’| ‘bear’
V -> ‘ate’
DT -> ‘the’| ‘a’

Then, you learnt two broad approaches to constituency parsing:

• Top-down parsing: starts with the start symbol S at the top and uses the production rules to parse
each word one by one. And, you continue to parse until all the words have been allocated to some
production rule. Top-down parsers have a specific limitation- Left Recursion. Example of a left
recursion: VP -> VP NP. Whenever a top-down parser encounters such a rule, it runs into an infinite
loop, thus no parse tree is obtained. Following is the illustration of top-down parse:

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

• Bottom-up parsing: reduces each terminal word to a production rule, i.e. reduces the right-hand-
side of the grammar to the left-hand-side. It continues the reduction process until the entire
sentence has been reduced to the start symbol S. You learnt about Shift-Reduce Parser algorithm,
which parses the words of the sentence one-by-one either by shifting a word to the stack or
reducing the stack by using the production rules. Below is an example of bottom-up parse tree.

You also learnt how to build both these types of parsed structures in Python.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Probabilistic CFG

Since natural languages are inherently ambiguous (at least for computers to understand), there are often
cases where multiple parse trees are possible. In such cases, we need a way to make the algorithms figure
out the most likely parse tree. Probabilistic Context-Free Grammars (PCFGs) are used when we want to
find the most probable parsed structure of the sentence. PCFGs are grammar rules, similar to what you
have seen before, along with probabilities associated with each production rule. An example production
rule is as follows:
NP -> Det N (0.5) | N (0.3) |N PP (0.2)
It means that the probability of an NP breaking down to a ‘Det N’ is 0.50, to an 'N' is 0.30 and to an ‘N PP’ is
0.20. Note that the sum of probabilities is 1.00.
Overall probability for a parsed structure of the sentence is probabilities of all rules used in that parsed
structure. The parsed tree with maximum probability is best possible interpretation of the sentence. You
also learnt to implement PCFG in Python.

Chomsky Normal Form

The Chomsky Normal Form (CNF), proposed by the linguist Noam Chomsky, is a normalized version of the
CFG with a standard set of rules defining how production rule must be written. The three forms of CNF
rules can be written:
• A -> B C
• A -> a
• S -> ε
A, B, C are non-terminals (POS tags), a is a terminal (term), S is the start symbol of the grammar and ε is the
null string. The table below shows some examples for converting CFGs to the CNF:

CFG VP -> V NP PP VP -> V

VP -> V (NP1) VP -> V (VP1)

CNF
NP1 -> NP PP VP1 -> ε

Dependency Parsing

After constituency parsing, you learnt about Dependency Parsing. In dependency grammar, constituencies
(such as NP, VP etc.) do not form the basic elements of grammar, but rather dependencies are established
between the words themselves.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

You learnt about free and fixed word order languages. Free word order languages such as Hindi are
difficult to parse using constituency parsing techniques. This is because, in such free-word-order languages,
the order of words/constituents may change significantly while keeping the meaning exactly the same. It is
thus difficult to fit the sentences into the finite set of production rules that CFGs offer.
Next, you learnt how dependencies in a sentence are defined using the elements Subject-Verb-Object
(SVO). The following table shows SVO dependencies in three types of sentences - declarative, interrogative,
and imperative:

Declarative Shyam complimented Suraj

Subject Verb Object

Interrogative Will the teacher take the class today?

Aux Subject Object
(Aux: auxiliary verbs such as will, be, can)

Imperative Stop the car!

Verb Object

Next, you learnt about universal dependencies. Apart from dependencies defined in the form of subject-
verb-object, there's a non-exhaustive list of dependency relationships, which are called universal
dependencies.
Dependencies are represented as labelled arcs of the form h → d (l) where 'h' is called the “head” of the
dependency, 'd' is the “dependent” and l is the “label” assigned to the arc. In a dependency parse, we start
from the root of the sentence, which is often a verb. And then start to establish dependencies between
root and other words.

Information Extraction

In this session, you learnt to build an information extraction (IE) system which can extract entities relevant
for booking flights (such as source and destination cities, time, date, budget constraints etc.) in a
structured format from unstructured user-generated queries. IE is used in many applications such as
conversational chatbots, extracting information from encyclopedias (such as Wikipedia), etc. In this
session, you learnt to use the ATIS dataset for IE.
A generic IE pipeline is as follows:
1. Preprocessing
1. Sentence Tokenization: sequence segmentation of text.

2. Word Tokenization: breaks down sentences into tokens
3. POS tagging - assigning POS tags to the tokens. The POS tags can be helpful in defining what
words could form an entity.
2. Entity Recognition
1. Rule-based models
2. Probabilistic models
Most IE pipelines start with the usual text preprocessing steps - sentence segmentation, word tokenisation
and POS tagging. After preprocessing, the common tasks are Named Entity Recognition (NER), and
optionally relation recognition and record linkage. NER is arguably the most important and non-trivial task
in the pipeline.
You learnt various techniques and models for building Named Entity Recognition (NER) system, which is a
key component in information extraction systems:
• Rule-based techniques
o Regular expression based techniques
o Chunking
• Probabilistic models
o Unigram & Bigram models
o Naive Bayes Classifier
o Decision trees
o Conditional Random Fields (CRFs)

You learnt about IOB labeling. IOB (or BIO) method tags each token in the sentence with one of the three
labels: I - inside (the entity), O- outside (the entity) and B - beginning (of entity). You saw that IOB labeling
is especially helpful if the entities contain multiple words. For example: words like ‘Air India’, ‘New Delhi’,
etc, are single entities.

Rule-based method for NER

Next, you learnt in detail about Rule-based method: Chunking. Chunking is a commonly used shallow
parsing technique used to chunk words that constitute some meaningful phrase in the sentence. A noun
phrase chunk (NP chunk) is commonly used in NER tasks to identify groups of words that correspond to
some 'entity'.
Sentence: He bought a new car from the Maruti Suzuki showroom.
Noun phrase chunks - a new car, the Maruti Suzuki showroom
The idea of chunking in the context of entity recognition is simple - most entities are nouns and noun
phrases, so rules can be written to extract these noun phrases and hopefully extract a large number of
named entities. Example of chunking done using regular expressions:
Sentence: Ram booked the flight.
Noun phrase chunks: 'Ram', 'the flight'
Grammar: 'NP_chunk: {<DT>?<NN>}'
© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved
Probabilistic method for NER

Next, you learnt the following two probabilistic models to get the most probable IOB tags for word:
1. Unigram chunker computes the unigram probabilities P(IOB label | pos) for each word and assigns
the label that is most likely for the POS tag.
1. Bigram chunker works similar to a unigram chunker, the only difference being that now the
probability of a POS tag having an IOB label is computed using the current and the previous POS
tags, i.e. P(label | pos, prev_pos).
Gazetteer Lookup
Another way to identify named entities (like cities and states) is to look up a dictionary or a gazetteer. A
gazetteer is a geographical directory which stores data regarding the names of geographical entities (cities,
states, countries) and some other features related to the geographies.

Machine Learning Classifiers for NER

You studied that just like machine learning classification models, you can have features for sequence labelling task.
Features could be the morphology (or shape) of the word such as whether the word is upper/lowercase, POS tags of
the words in the neighbourhood, whether the word is present in the gazetteer (i.e. word_is_city, word_is_state),
etc. And using these features, you learnt to build a Naive Bayes classifier and Decision Tree classifier.

Conditional Random Fields

HMMs can be used for any sequence classification task, such as NER. However, many NER tasks and
datasets are far more complex than tasks such as POS tagging, and therefore, more sophisticated sequence
models have been developed and widely accepted in the NLP community. One of these models is
Conditional Random Fields (CRFs).

CRFs are used in a wide variety of sequence labelling tasks across various domains - POS tagging, speech
recognition, NER, and even in computational biology for modelling genetic patterns etc.

Next, you studied the architecture of CRFs. CRFs model the conditional probability P(Y|X), where Y is the
vector of output sequence (IOB labels here) and X is the input sequence (words to be tagged), which are
similar to Logistic Regression classifier. Broadly, there are two types of classifiers in ML:
1. Discriminative classifiers learn the boundary between classes by modelling the conditional
probability distribution P(y|x), where y is the vector of class labels and x represents the input
features. Examples are Logistic Regression, SVMs etc.
1. Generative classifiers model the joint probability distribution P(x,y). Examples of generative
classifiers are Naive Bayes, HMMs etc.

CRFs are discriminative probabilistic classifiers (often represented as undirected graphical models in some
texts).

Next, you learnt about CRFs’ feature functions. CRFs use ‘feature functions’ rather than the input word
sequence x itself. The idea is similar to how features are extracted for building the naive Bayes and
decision tree classifiers in a previous section. Some example ‘word-features’ (each word has these
features) are:
• Word and POS tag based features: word_is_city, word_is_digit, pos, previous_pos, etc.
• Label-based features: previous_label

A feature function takes the following four inputs:

1. The input sequence of words: x
2. The position of a word in the sentence (whose features are to be extracted)
3. The label yi of the current word (the target label)
4. The label yi-1 of the previous word

Example of a feature function:

A feature function f1 which returns 1 if the word x is a city and the corresponding label yi is ‘I-location’, else
i

0. This can be represented as:

The feature function returns 1 only if both the conditions are satisfied, i.e. when the word is a city name
and is tagged as ‘I-location’ (e.g. Chicago/I-location).

Every feature function fi has a weight wi associated with it, which represents the ‘importance’ of that
feature function. This is almost exactly the same as logistic regression where coefficients of features
represent their importance. Training a CRF means to compute the optimal weight vector w which best
represents the observed sequences y for the given word sequences x. In other words, we want to find the
set of weights w which maximises P(y|x,w).

In CRFs, the conditional probabilities P(y|x,w) are modeled using a scoring function. If there are k feature
functions (and thus k weights), for each word i in the sequence x, a scoring function for a word is defined
as follows:

and overall sequence score for the sentence can be defined as:

The probability of observing the label sequence y given the input sequence x is given by:

where, Z(x) is sum of scores of all possible tag sequences N:

Training a CRF model means to compute the optimal set of weights w which best represents the observed
sequences y for the given word sequences x. In other words, we want to find the set of weights w which
maximises the conditional probability P(y|x,w) for all the observed sequences (x,y):

By taking log and simplifying the equations, the final equation comes out as:

The final equation after taking the gradient of the log-likelihood function is:

Prediction using CRF: the inference task to assign the label sequence y to x which maximises the score of
*

the sequence, i.e.

The naive way to get y* is by calculating w.f(x,y) for every possible label sequence , and then choose the
label sequence that has maximum (w.f(x,y)) value. However, there are an exponential number of possible
labels (tn for a tag set of size t and a sentence of length n), and this task is computationally heavy. You
learnt how to derive the best possible path using Viterbi algorithm.

You also learnt the Python implementation of CRF. CRFs outperformed the rule-based and ML-classification
algorithms.

Disclaimer: All content and material on the UpGrad website is copyrighted material, either belonging to UpGrad or
its bonafide contributors and is purely for the dissemination of education. You are permitted to access print and
download extracts from this site purely for your own education only and on the following basis:

• You can download this document from the website for self-use only.
• Any copies of this document, in part or full, saved to disc or to any other storage medium may only be used
for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial personal
use only.
• Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites or use of content for any other commercial/unauthorized purposes in
any way which could infringe the intellectual property rights of UpGrad or its contributors, is strictly
prohibited.
• No graphics, images or photographs from any accompanying text in this document will be used separately
for unauthorised purposes.
• No material in this document will be modified, adapted or altered in any way.
• No part of this document or UpGrad content may be reproduced or stored in any other web site or included
in any public or private electronic retrieval system or service without UpGrad’s prior written permission.
• Any rights not expressly granted in these terms are reserved.

19CSE453 - Natural Language Processing: Part of Speech Tagging
No ratings yet
19CSE453 - Natural Language Processing: Part of Speech Tagging
59 pages
(Ebook) Speech and Language Processing: An Introduction To Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky, James H. Martin Download
100% (1)
(Ebook) Speech and Language Processing: An Introduction To Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky, James H. Martin Download
80 pages
Natural Language Processing
No ratings yet
Natural Language Processing
6 pages
Batch Normalization Separate
No ratings yet
Batch Normalization Separate
20 pages
Glove
100% (1)
Glove
10 pages
Lecture Notes: IV B. Tech I Semester (JNTUH-R13)
No ratings yet
Lecture Notes: IV B. Tech I Semester (JNTUH-R13)
18 pages
NLP Assignment 2
No ratings yet
NLP Assignment 2
2 pages
10pos Tagging PDF
No ratings yet
10pos Tagging PDF
76 pages
NLP Question Bank Answers (Jagmeet)
No ratings yet
NLP Question Bank Answers (Jagmeet)
31 pages
Natural Language Processing Exam Questions
No ratings yet
Natural Language Processing Exam Questions
2 pages
CM3060 NLP Mock Exam Oct2021
No ratings yet
CM3060 NLP Mock Exam Oct2021
4 pages
Language Models & N-Gram Analysis
No ratings yet
Language Models & N-Gram Analysis
41 pages
Word Embeddings in NLP
No ratings yet
Word Embeddings in NLP
42 pages
Be Computer Engineering Semester 7 2023 May Dloc III Natural Language Processing Rev 2019 C Scheme
0% (1)
Be Computer Engineering Semester 7 2023 May Dloc III Natural Language Processing Rev 2019 C Scheme
2 pages
NLP End Sem Paper - Evaluation Scheme
No ratings yet
NLP End Sem Paper - Evaluation Scheme
14 pages
Al3501 - NLP Iat Set 2
No ratings yet
Al3501 - NLP Iat Set 2
2 pages
Semantics, Pragmatics, and Logic
No ratings yet
Semantics, Pragmatics, and Logic
105 pages
NLP Comprehensive Study Guide Pokhara University Fall 2025
No ratings yet
NLP Comprehensive Study Guide Pokhara University Fall 2025
50 pages
Unit 3
No ratings yet
Unit 3
14 pages
Unit 2 DL
No ratings yet
Unit 2 DL
43 pages
A Guide To Hidden Markov Model and Its Applications in NLP
No ratings yet
A Guide To Hidden Markov Model and Its Applications in NLP
11 pages
NLP Lab Manual
No ratings yet
NLP Lab Manual
17 pages
Early Detection of Lung Cancer Using AI and ML
No ratings yet
Early Detection of Lung Cancer Using AI and ML
6 pages
Intro to Topic Modeling
No ratings yet
Intro to Topic Modeling
120 pages
Parameter Estimation
No ratings yet
Parameter Estimation
12 pages
NLP & Word Vectors: SVD and Word2Vec
No ratings yet
NLP & Word Vectors: SVD and Word2Vec
14 pages
NLP Preprocessing Steps
No ratings yet
NLP Preprocessing Steps
20 pages
CV Lab Manual PDF
No ratings yet
CV Lab Manual PDF
67 pages
NLP Text Representation Guide
No ratings yet
NLP Text Representation Guide
131 pages
Deep Learning with RBMs and DBNs
No ratings yet
Deep Learning with RBMs and DBNs
79 pages
Deep Learning Interview Questions Guide
No ratings yet
Deep Learning Interview Questions Guide
46 pages
Assignment 2
No ratings yet
Assignment 2
7 pages
Table of Content
No ratings yet
Table of Content
13 pages
NLP Worksheet: Text Processing, Bag of Words, Tf-Idf Activity
No ratings yet
NLP Worksheet: Text Processing, Bag of Words, Tf-Idf Activity
6 pages
Natural Deduction and Knowledge Representation in AI
No ratings yet
Natural Deduction and Knowledge Representation in AI
3 pages
Text Processing Basics: Tokenization Guide
No ratings yet
Text Processing Basics: Tokenization Guide
42 pages
ANN-unit 4
No ratings yet
ANN-unit 4
25 pages
Encoder-Decoder Models Overview
No ratings yet
Encoder-Decoder Models Overview
63 pages
AI Beyond Classical Search &CSPs
No ratings yet
AI Beyond Classical Search &CSPs
116 pages
Flower Identification Using Machine Learning This Report Conferred To The Department of CSE of Daffodil International
No ratings yet
Flower Identification Using Machine Learning This Report Conferred To The Department of CSE of Daffodil International
42 pages
Speech Recognition Systems Guide
No ratings yet
Speech Recognition Systems Guide
13 pages
Swe1017 NLP Syllabus
No ratings yet
Swe1017 NLP Syllabus
2 pages
ResNet & VGGNet Deep Learning Guide
No ratings yet
ResNet & VGGNet Deep Learning Guide
44 pages
Chapter V - Working With Text Data
No ratings yet
Chapter V - Working With Text Data
30 pages
Building and Booting Operating Systems
No ratings yet
Building and Booting Operating Systems
4 pages
RNN and LSTM Architectures Explained
No ratings yet
RNN and LSTM Architectures Explained
42 pages
What Is Computer Vision?
No ratings yet
What Is Computer Vision?
120 pages
A Fuzzy Ontology and Its Application To News Summarization
100% (1)
A Fuzzy Ontology and Its Application To News Summarization
22 pages
RNN LSTM
No ratings yet
RNN LSTM
49 pages
Text Classification PDF
No ratings yet
Text Classification PDF
56 pages
Computational Graphs in Deep Learning Unit v4 Deep Leaerning
No ratings yet
Computational Graphs in Deep Learning Unit v4 Deep Leaerning
3 pages
NLTK Tokenization & Stemming Guide
No ratings yet
NLTK Tokenization & Stemming Guide
8 pages
NLP Word Vectors: Intro & Methods
No ratings yet
NLP Word Vectors: Intro & Methods
128 pages
Speech and Language Processing - J&M
No ratings yet
Speech and Language Processing - J&M
599 pages
Word Vectors in NLP: Skip-Gram Model
No ratings yet
Word Vectors in NLP: Skip-Gram Model
11 pages
CV Lab Manual
No ratings yet
CV Lab Manual
126 pages
1 Autoencoders
No ratings yet
1 Autoencoders
22 pages
Vehicle Counting with AI: A Study
No ratings yet
Vehicle Counting with AI: A Study
11 pages
5 Sequence Learning
No ratings yet
5 Sequence Learning
50 pages
Techniques for POS Tagging Explained
No ratings yet
Techniques for POS Tagging Explained
12 pages
Addition, Subtraction, Multiplication, Division Part - 1 Grade 1 & 2
No ratings yet
Addition, Subtraction, Multiplication, Division Part - 1 Grade 1 & 2
19 pages
Modeling and Simulation of Complex Power Systems-IET
100% (1)
Modeling and Simulation of Complex Power Systems-IET
322 pages
Eton - KS - MathsBPaper - 2012
No ratings yet
Eton - KS - MathsBPaper - 2012
4 pages
Pre-Med Calculus Tutorial Sheet
No ratings yet
Pre-Med Calculus Tutorial Sheet
9 pages
Simple Profit Model Analysis
No ratings yet
Simple Profit Model Analysis
18 pages
Laplace Transforms & Transfer Functions
No ratings yet
Laplace Transforms & Transfer Functions
14 pages
Ground Water Flow and Contaminant Transport Models
No ratings yet
Ground Water Flow and Contaminant Transport Models
12 pages
Permanent Magnet Synchronous Motor Design Using Grey Wolf Optimizer Algorithm
No ratings yet
Permanent Magnet Synchronous Motor Design Using Grey Wolf Optimizer Algorithm
11 pages
Do We Really Need Graph Neural Networks For Traffic Forecasting
No ratings yet
Do We Really Need Graph Neural Networks For Traffic Forecasting
12 pages
Ec-303 DSD Lab Manual
No ratings yet
Ec-303 DSD Lab Manual
21 pages
A Comparative Study of Texture Measures (1996-Cited1797)
No ratings yet
A Comparative Study of Texture Measures (1996-Cited1797)
9 pages
Basic Proportionality Theorem
No ratings yet
Basic Proportionality Theorem
19 pages
Java Game Programming Basics
No ratings yet
Java Game Programming Basics
31 pages
Single-Camera 3D Gaze Estimation
No ratings yet
Single-Camera 3D Gaze Estimation
4 pages
DBMS Syllabus
No ratings yet
DBMS Syllabus
3 pages
Dsa Pattern
No ratings yet
Dsa Pattern
58 pages
Algebra Success in 20 Minutes A Day 2nd Edition Skill Builders Learningexpress Editors Newest Edition 2025
No ratings yet
Algebra Success in 20 Minutes A Day 2nd Edition Skill Builders Learningexpress Editors Newest Edition 2025
175 pages
Syllabus High Voltage
No ratings yet
Syllabus High Voltage
37 pages
Set 10 Pure Math 2025
No ratings yet
Set 10 Pure Math 2025
11 pages
CSE 134: Data Structure Lecture #1: Mohammad Reduanul Haque
No ratings yet
CSE 134: Data Structure Lecture #1: Mohammad Reduanul Haque
26 pages
Combinatorial Games Analysis
No ratings yet
Combinatorial Games Analysis
13 pages
Neural Networks for Zip Code Recognition
No ratings yet
Neural Networks for Zip Code Recognition
11 pages
Chap-05 System of Linear Equations
No ratings yet
Chap-05 System of Linear Equations
40 pages
Grade 8 Math Module 1 Mathematics Learning Material
No ratings yet
Grade 8 Math Module 1 Mathematics Learning Material
26 pages
3D Geometry: Parallel Lines & Planes
No ratings yet
3D Geometry: Parallel Lines & Planes
6 pages
SIOC Presentation
No ratings yet
SIOC Presentation
67 pages
Structural Analysis II Course Overview
No ratings yet
Structural Analysis II Course Overview
13 pages
Arnold, V.I. - Ordinary Differential Equations - Red
100% (1)
Arnold, V.I. - Ordinary Differential Equations - Red
302 pages
ECON 233-Lec 4,5 Summer
No ratings yet
ECON 233-Lec 4,5 Summer
43 pages

Lecture Notes On Syntactic Processing

Uploaded by

Lecture Notes On Syntactic Processing

Uploaded by

Lecture Notes

Class Tag Part-of-Speech Definition Examples

Determiner DT Determiner Describes a reference to a noun This, that, the,

NN Noun, singular or tag of singular forms of common nouns Cat, box,

Conjunction CC Coordinating Connects words, phrases, and clauses And but, or

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

VBG Verb, gerund or Verbs ending with ‘-ing’ Dying, lying,

Adjective JJ Adjective Words describing noun Tall, large,

Adverb RB Adverb Words modifying a verb, adjective or other Now, first,

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Pronoun PRP Personal Pronoun Substitutes of noun I, he, she, they

Different Approaches to POS Tagging

Hidden Markov Models

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

P(T | W ) = P(W |T ) * P(T |T )

P(T | W ) = P(W |T ) * P(T |T ) ,

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Emission Probability of a word 'w' for tag 't':

Transition Probability of tag t1 followed by tag t2:

HMM & Viterbi Implementation in Python

P(tag|word) = P(word|tag) * P(tag|previous tag)

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Then, you learnt two broad approaches to constituency parsing:

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Chomsky Normal Form

CFG VP -> V NP PP VP -> V

VP -> V (NP1) VP -> V (VP1)

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Declarative Shyam complimented Suraj

Interrogative Will the teacher take the class today?

Imperative Stop the car!

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

Rule-based method for NER

Machine Learning Classifiers for NER

Conditional Random Fields

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

A feature function takes the following four inputs:

Example of a feature function:

0. This can be represented as:

where, Z(x) is sum of scores of all possible tag sequences N:

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

the sequence, i.e.

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

© Copyright 2018. UpGrad Education Pvt. Ltd. All rights reserved

You might also like