Natural Language Processing: Dr. G. Bharadwaja Kumar
PARTS OF SPEECH
Nouns
This part of speech refers to words that are used to name persons, things,
animals, places, ideas, or events.
It is my birthday.
The word “birthday” is a noun which refers to an event.
Noun – Subcategories
Proper– proper nouns always start with a capital letter and refer to specific names of
persons, places, or things.
Common– common nouns are the opposite of proper nouns. These are just generic
names of persons, things, or places.
Concrete– this kind refers to nouns which you can perceive through your five senses.
Abstract- unlike concrete nouns, abstract nouns are those which you can’t perceive
through your five senses.
Pronouns
This part of speech is used in place of a noun. Some examples of pronouns are: I, it,
he, she, mine, his, hers, we, they, theirs, and ours.
Sample Sentences:
•Janice is a very stubborn child. She just stared at me when I told her
to stop.
•The largest slice is mine.
•We are number one.
Adjectives
This part of speech describes a noun or a pronoun, telling which one, what kind,
or how many.
Verbs
This is the most important part of speech, for without a verb, a sentence would not
exist. Simply put, a verb is a word that shows an action (physical or mental) or the
state of being of the subject in a sentence.
Examples of “State of Being Verbs” : am, is, was, are, and were
Sample Sentences:
As usual, the Stormtroopers missed their shot.
The verb “missed” expresses the action of the subject “Stormtroopers.”
Adverbs
Just like adjectives, adverbs are also used to describe words, but the difference is
that adverbs describe verbs, adjectives, or other adverbs.
The different types of adverbs are:
Adverb of Manner– this refers to how something happens or how an action is done.
Example: Annie danced gracefully.
The word “gracefully” tells how Annie danced.
Adverb of Time- this states “when” something happens or “when” it is done.
Example: She came yesterday.
The word “yesterday” tells when she “came.”
Adverb of Place– this tells something about “where” something happens or ”where”
something is done.
Example: Of course, I looked everywhere!
The adverb “everywhere” tells where I “looked.”
Adverb of Degree– this states the intensity or the degree to which a specific thing
happens or is done.
Example: The child is very talented.
The adverb “very” answers the question, “To what degree is the child talented?”
Interjections
This part of speech refers to words which express emotions. Since interjections
are commonly used to convey strong emotions, they are usually followed by an
exclamation point.
N-IP rule:
A tag N (noun) cannot be followed by a tag IP (interrogative
pronoun)
... man who …
man: {N}
who: {RP, IP} --> {RP} relative pronoun
ART-V rule:
A tag ART (article) cannot be followed by a tag V (verb)
...the book…
the: {ART}
book: {N, V} --> {N}
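The two rules above can be sketched as a tiny rule-based disambiguator. This is a minimal illustration, assuming a toy lexicon and a simple rule format of the form (previous tag, forbidden tag); it is not a real tagger.

```python
# Each word maps to its set of possible tags (toy lexicon from the examples).
LEXICON = {
    "man":  {"N"},
    "who":  {"RP", "IP"},   # relative pronoun or interrogative pronoun
    "the":  {"ART"},
    "book": {"N", "V"},
}

# Each rule forbids a tag from following another tag: (previous, forbidden).
RULES = [
    ("N", "IP"),    # N-IP rule: a noun cannot be followed by an interrogative pronoun
    ("ART", "V"),   # ART-V rule: an article cannot be followed by a verb
]

def disambiguate(words):
    """Remove tags that a rule forbids, given the previous word's tags."""
    tags = [set(LEXICON[w]) for w in words]
    for i in range(1, len(words)):
        for prev, forbidden in RULES:
            # If the previous word is unambiguously `prev`, the forbidden
            # tag is impossible here (keep at least one tag).
            if tags[i - 1] == {prev} and len(tags[i]) > 1:
                tags[i].discard(forbidden)
    return tags

print(disambiguate(["man", "who"]))    # "who" reduces to {'RP'}
print(disambiguate(["the", "book"]))   # "book" reduces to {'N'}
```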
How TBL Rules are Learned
For Markov chains, the output symbols are the same as the states.
See sunny weather: we’re in state sunny
So we need an extension!
A Hidden Markov Model is an extension of a Markov chain in which the output
symbols are not the same as the states.
This means we don’t know which state we are in.
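The distinction can be made concrete with a toy example (the tags, words, and probabilities below are illustrative assumptions): in a Markov chain the observed symbol names the state, while in an HMM several hidden states can emit the same observed word.

```python
# Markov chain: states ARE the outputs. Seeing "sunny" means we are
# in state "sunny". (Transition probabilities are made up.)
chain = {"sunny": {"sunny": 0.8, "rainy": 0.2},
         "rainy": {"sunny": 0.4, "rainy": 0.6}}

# HMM: hidden states (tags) emit observable symbols (words).
# Observing "flies" does NOT identify the state: both N and V can emit it.
emissions = {"N": {"flies": 0.05, "flower": 0.1},
             "V": {"flies": 0.1, "like": 0.2}}

possible_states = [tag for tag, dist in emissions.items() if "flies" in dist]
print(possible_states)  # more than one hidden state could have produced "flies"
```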
P(word|tag)
word/lexical likelihood
the probability that, given this tag, we have this word
NOT the probability that this word has this tag
modeled through the lexical model (word-tag matrix)
P(tag|previous n tags)
tag sequence likelihood
the probability that this tag follows these previous tags
modeled through the language model (tag-tag matrix)
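Both matrices can be estimated by counting over a hand-tagged corpus. A minimal sketch, assuming a tiny made-up corpus and a `<s>` pseudo-tag for sentence starts:

```python
from collections import Counter, defaultdict

# Toy tagged corpus (illustrative, not real data).
corpus = [
    [("the", "ART"), ("flies", "N"), ("like", "V"), ("a", "ART"), ("flower", "N")],
    [("flies", "V"), ("like", "V"), ("the", "ART"), ("flower", "N")],
]

emit = defaultdict(Counter)   # word-tag matrix: counts for P(word | tag)
trans = defaultdict(Counter)  # tag-tag matrix: counts for P(tag | previous tag)

for sent in corpus:
    prev = "<s>"              # sentence-start pseudo-tag
    for word, tag in sent:
        emit[tag][word] += 1
        trans[prev][tag] += 1
        prev = tag

def p_word_given_tag(word, tag):
    return emit[tag][word] / sum(emit[tag].values())

def p_tag_given_prev(tag, prev):
    return trans[prev][tag] / sum(trans[prev].values())

print(p_word_given_tag("flies", "N"))  # relative frequency of "flies" among N tokens
print(p_tag_given_prev("N", "ART"))    # relative frequency of N after ART
```

In practice both estimates would be smoothed so that unseen word-tag and tag-tag pairs do not get zero probability.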
Efficient Tagging
For "Flies like a flower", there are four words and four
possible tags for each, giving 4^4 = 256 candidate tag sequences.
In a brute-force method, all of them would be examined.
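The size of the brute-force search space can be checked directly. A short sketch, assuming an illustrative four-tag set:

```python
from itertools import product

words = ["flies", "like", "a", "flower"]
tags = ["N", "V", "ART", "P"]   # assumed candidate tag set

# Enumerate every assignment of one tag per word: |tags|^len(words) sequences.
sequences = list(product(tags, repeat=len(words)))
print(len(sequences))  # 4^4 = 256
```

The Viterbi algorithm avoids this exponential enumeration by keeping only the best-scoring path into each tag state at each word position.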
Viterbi Notation
Viterbi Algorithm
Given the word sequence w1,n, the lexical tags t1,N, the
lexical probabilities P(wi|ti), and the bigram probabilities
P(ti|tj), find the most likely sequence of lexical tags for
the word sequence.
Initialization Step:
For i = 1 to N do // For all tag states t1,N
  δ1(ti) = P(w1|ti) × P(ti|ø)
  ψ1(ti) = 0 // Starting point: no backpointer yet
Iteration Step:
For f = 2 to n do // next word index
  For i = 1 to N do // tag states t1,N
    δf(ti) = maxj=1,N (δf-1(tj) × P(ti|tj)) × P(wf|ti)
    ψf(ti) = argmaxj=1,N (δf-1(tj) × P(ti|tj)) // index that gave the max
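The initialization and iteration steps above can be sketched as runnable code. This is a minimal illustration in the δ/ψ notation; the tag set and all probabilities below are toy assumptions, not estimates from a real corpus, and unseen events get a small floor value as a stand-in for smoothing.

```python
TAGS = ["N", "V", "ART", "P"]   # assumed tag set
START = "<s>"                   # sentence-start pseudo-tag (plays the role of ø)

# P(tag | previous tag) — made-up bigram probabilities.
TRANS = {("<s>", "N"): 0.3, ("<s>", "V"): 0.1, ("<s>", "ART"): 0.5, ("<s>", "P"): 0.1,
         ("N", "V"): 0.5, ("N", "P"): 0.3, ("V", "ART"): 0.6, ("ART", "N"): 0.8,
         ("P", "ART"): 0.7}

# P(word | tag) — made-up lexical probabilities.
EMIT = {("flies", "N"): 0.03, ("flies", "V"): 0.05,
        ("like", "V"): 0.1, ("like", "P"): 0.2,
        ("a", "ART"): 0.3, ("flower", "N"): 0.02}

FLOOR = 1e-6  # stand-in probability for unseen events

def viterbi(words):
    n = len(words)
    delta = [{} for _ in range(n)]  # delta[f][t]: best score ending in tag t at word f
    psi = [{} for _ in range(n)]    # psi[f][t]: best previous tag (backpointer)
    # Initialization: delta_1(ti) = P(w1|ti) * P(ti|<s>), psi_1(ti) = none.
    for t in TAGS:
        delta[0][t] = EMIT.get((words[0], t), FLOOR) * TRANS.get((START, t), FLOOR)
        psi[0][t] = None
    # Iteration: extend the best path into each tag state, word by word.
    for f in range(1, n):
        for t in TAGS:
            best_prev = max(TAGS, key=lambda tj: delta[f-1][tj] * TRANS.get((tj, t), FLOOR))
            delta[f][t] = (delta[f-1][best_prev] * TRANS.get((best_prev, t), FLOOR)
                           * EMIT.get((words[f], t), FLOOR))
            psi[f][t] = best_prev
    # Backtrace from the best final tag through the psi pointers.
    last = max(TAGS, key=lambda t: delta[n-1][t])
    path = [last]
    for f in range(n - 1, 0, -1):
        path.append(psi[f][path[-1]])
    return list(reversed(path))

print(viterbi(["flies", "like", "a", "flower"]))
```

Each of the n columns stores only N scores and N backpointers, so the cost is O(n × N²) rather than the N^n sequences a brute-force search would score.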
Second Iteration Step
Final Iteration