Natural Language Processing: Dr. G. Bharadwaja Kumar

This document discusses parts of speech in natural language processing. It defines parts of speech as categories of words based on their usage and functionality in sentences. The major parts of speech are nouns, verbs, adjectives, and adverbs, which contribute most to a sentence's meaning. Other function words like pronouns and prepositions help create grammatical relationships. The document then provides examples and definitions of each part of speech. It also discusses how part-of-speech tagging is used and two approaches to part-of-speech tagging: rule-based and stochastic tagging using hidden Markov models.


NATURAL LANGUAGE PROCESSING
Dr. G. Bharadwaja Kumar
PARTS OF SPEECH

 The parts of speech explain how a word is used in a sentence.
 Based on their usage and functionality, words are categorized into several types or parts of speech.
 Words belonging to the major parts of speech contribute the most to a sentence's meaning, and hence are sometimes called content words.
 Nouns, verbs, adjectives, and adverbs are content parts of speech.
 Function words are words that exist to express the grammatical or structural relationships into which the content words fit.
 Pronouns, prepositions, conjunctions, determiners, qualifiers/intensifiers, and interrogatives are some function parts of speech.
Use of POS Tagging
Nouns

 This part of speech refers to words that are used to name persons, things, animals, places, ideas, or events.
 Tom Hanks is very versatile.
 The italicized noun refers to the name of a person.
 Dogs can be extremely cute.
 In this example, the italicized word is considered a noun because it names an animal.
 It is my birthday.
 The word "birthday" is a noun which refers to an event.
Noun – Subcategories

 Proper – proper nouns always start with a capital letter and refer to specific names of persons, places, or things.
 Examples: Volkswagen Beetle, Shakey's Pizza, Game of Thrones
 Common – common nouns are the opposite of proper nouns. These are just generic names of persons, things, or places.
 Examples: car, pizza parlor, TV series
 Concrete – this kind refers to nouns which you can perceive through your five senses.
 Examples: folder, sand, board
 Abstract – unlike concrete nouns, abstract nouns are those which you can't perceive through your five senses.
 Examples: happiness, grudge, bravery
 Count – this refers to anything that is countable, and has a singular and plural form.
 Examples: kitten, video, ball
 Mass – this is the opposite of count nouns. Mass nouns are also called non-countable nouns, and they need "counters" to quantify them.
 Examples of Counters: kilo, cup, meter
 Examples of Mass Nouns: rice, flour, garter
 Collective – refers to a group of persons, animals, or things.
 Examples: faculty (group of teachers), class (group of students), pride (group of lions)
Pronoun

 A pronoun is a part of speech which functions as a replacement for a noun.
 Some examples of pronouns are: I, it, he, she, mine, his, hers, we, they, theirs, and ours.
 Sample Sentences:
 Janice is a very stubborn child. She just stared at me when I told her to stop.
 The largest slice is mine.
 We are number one.
Adjectives

 This part of speech is used to describe a noun or a pronoun. Adjectives can specify the quality, the size, and the number of nouns or pronouns.
 The carvings are intricate.
 The italicized word describes the appearance of the noun "carvings."
 I have two hamsters.
 The italicized word "two" is an adjective which describes the number of the noun "hamsters."
 Wow! That doughnut is huge!
 The italicized word is an adjective which describes the size of the noun "doughnut."
Conjunctions

 A conjunction is a part of speech which joins words, phrases, or clauses together.
 Examples of Conjunctions: and, yet, but, for, nor, or, and so
 Sample Sentences:
 This cup of tea is delicious and very soothing.
 Kiyoko has to start all over again because she didn't follow the professor's instructions.
 Homer always wanted to join the play, but he didn't have the guts to audition.
 The italicized words in the sentences above are some examples of conjunctions.
Verbs

 This is the most important part of speech, for without a verb a sentence cannot exist. Simply put, a verb is a word that shows an action (physical or mental) or the state of being of the subject in a sentence.
 Examples of "State of Being Verbs": am, is, was, are, and were
 Sample Sentences:
 As usual, the Stormtroopers missed their shot.
 The italicized word expresses the action of the subject "Stormtroopers."
 They are always prepared in emergencies.
 The verb "are" refers to the state of being of the pronoun "they," which is the subject of the sentence.
Adverb

 Just like adjectives, adverbs are used to describe words, but the difference is that adverbs describe adjectives, verbs, or other adverbs.
 The different types of adverbs are:
 Adverb of Manner – this refers to how something happens or how an action is done.
 Example: Annie danced gracefully.
 The word "gracefully" tells how Annie danced.
 Adverb of Time – this states "when" something happens or "when" it is done.
 Example: She came yesterday.
 The italicized word tells when she "came."
 Adverb of Place – this tells "where" something happens or "where" something is done.
 Example: Of course, I looked everywhere!
 The adverb "everywhere" tells where I "looked."
 Adverb of Degree – this states the intensity or the degree to which a specific thing happens or is done.
 Example: The child is very talented.
 The italicized adverb answers the question, "To what degree is the child talented?"
Prepositions

 This part of speech refers to words that specify a location in space or a location in time.
 Examples of Prepositions: above, below, throughout, outside, before, near, and since
 Sample Sentences:
 Micah is hiding under the bed.
 The italicized preposition introduces the prepositional phrase "under the bed," and tells where Micah is hiding.
 During the game, the audience never stopped cheering for their team.
 The italicized preposition introduces the prepositional phrase "during the game," and tells when the audience cheered.
Interjections

 This part of speech refers to words which express emotions. Since interjections are commonly used to convey strong emotions, they are usually followed by an exclamation point.
 Sample Sentences:
 Ouch! That must have hurt.
 Hurray, we won!
 Hey! I said enough!
Corpus Alignment
Sample rules

N-IP rule: a tag N (noun) cannot be followed by a tag IP (interrogative pronoun).
 ... man who ...
 man: {N}
 who: {RP, IP} --> {RP} (relative pronoun)

ART-V rule: a tag ART (article) cannot be followed by a tag V (verb).
 ... the book ...
 the: {ART}
 book: {N, V} --> {N}
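The two rules above can be sketched as constraint-based pruning: start from a lexicon listing every possible tag for each word, then discard a candidate tag when every possible previous tag forbids it. The lexicon, the `forbidden` set, and the function name `disambiguate` below are illustrative assumptions, not from the slides.

```python
# Sketch of rule-based tag pruning; lexicon and rules are illustrative.
lexicon = {"man": {"N"}, "who": {"RP", "IP"}, "the": {"ART"}, "book": {"N", "V"}}

# Each rule forbids a (previous_tag, current_tag) pair.
forbidden = {("N", "IP"),    # N-IP rule
             ("ART", "V")}   # ART-V rule

def disambiguate(words):
    """Prune each word's candidate tag set using the forbidden bigram rules."""
    tags = [set(lexicon[w]) for w in words]
    for i in range(1, len(tags)):
        # Drop a tag only if every remaining previous tag forbids it.
        tags[i] = {t for t in tags[i]
                   if not all((p, t) in forbidden for p in tags[i - 1])}
    return tags
```

With this sketch, `disambiguate(["man", "who"])` prunes "who" to {RP}, and `disambiguate(["the", "book"])` prunes "book" to {N}, matching the two worked examples.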
How TBL Rules are Learned

 We will assume that we have a tagged corpus.

 Brill’s TBL algorithm has three major steps.


 Tag the corpus with the most likely tag for each word (unigram model)
 Choose a transformation that deterministically replaces an existing tag with a
new tag such that the resulting tagged training corpus has the lowest error rate
out of all transformations.
 Apply the transformation to the training corpus.

 These steps are repeated until a stopping criterion is reached.

 The result (which will be our tagger) will be:
 First tag each word with its most-likely tag
 Then apply the learned transformations in order
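The three steps above can be sketched as a greedy loop over a single transformation template ("change tag A to B when the previous tag is C"). This is an illustrative toy, not Brill's full learner: the real algorithm uses many templates and the corpus, tag names, and `learn_tbl` helper here are made up.

```python
# Toy sketch of transformation-based learning with one rule template:
# "change tag A to B when the previous tag is C".
def learn_tbl(words, gold_tags, unigram_tag, n_rounds=3):
    tags = [unigram_tag[w] for w in words]          # step 1: most-likely tags
    rules = []
    for _ in range(n_rounds):
        best_rule, best_gain = None, 0
        # Candidate transformations drawn from current tagging errors.
        candidates = {(tags[i], gold_tags[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold_tags[i]}
        for a, b, c in candidates:                   # step 2: score each candidate
            gain = 0
            for i in range(1, len(tags)):
                if tags[i] == a and tags[i - 1] == c:
                    gain += (1 if gold_tags[i] == b else 0) \
                          - (1 if gold_tags[i] == a else 0)
            if gain > best_gain:
                best_rule, best_gain = (a, b, c), gain
        if best_rule is None:                        # stopping criterion
            break
        a, b, c = best_rule
        for i in range(1, len(tags)):                # step 3: apply to the corpus
            if tags[i] == a and tags[i - 1] == c:
                tags[i] = b
        rules.append(best_rule)
    return rules, tags

# Toy corpus: "can" is most often a modal (MD), but after an article it is a noun.
unigram = {"the": "ART", "can": "MD", "rusted": "V"}
rules, tags = learn_tbl(["the", "can", "rusted"], ["ART", "N", "V"], unigram)
```

Here the learner discovers the single rule "MD -> N when the previous tag is ART" and the corrected tagging becomes ART N V.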
Strengths of transformation-based tagging

 exploits a wider range of lexical and syntactic regularities
 can look at a wider context:
 conditions the tags on preceding/following words, not just preceding tags
 can use more context than a bigram or trigram model
 transformation rules are easier to understand

Stochastic POS tagging

 Stochastic (= probabilistic) tagging:
 Assume that a word's tag depends only on the previous tags (not the following ones)
 Use a training set (manually tagged corpus) to:
 learn the regularities of tag sequences
 learn the possible tags for a word
 model this information through a Markov process
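Under this Markov assumption, the probability of a whole tag sequence factors into bigram transition probabilities: P(t1…tn) ≈ ∏ P(ti | ti−1). A tiny sketch; the transition table and the sentence-start pseudo-tag `<s>` are illustrative assumptions.

```python
# Bigram tag-sequence probability under the Markov assumption.
# The transition probabilities below are made-up illustrative values.
trans = {("<s>", "ART"): 0.4, ("ART", "N"): 0.7, ("N", "V"): 0.4}

def tag_sequence_prob(tags, trans):
    """P(t1..tn) approximated as the product of P(t_i | t_{i-1})."""
    prob, prev = 1.0, "<s>"
    for t in tags:
        prob *= trans.get((prev, t), 0.0)   # unseen transition -> probability 0
        prev = t
    return prob
```

For example, `tag_sequence_prob(["ART", "N", "V"], trans)` is 0.4 × 0.7 × 0.4 = 0.112.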
Hidden Markov Model

 For Markov chains, the output symbols are the same as the states.
 If we see sunny weather, we are in the state "sunny".
 But in part-of-speech tagging (and other tasks):
 The output symbols are words
 The hidden states are part-of-speech tags
 So we need an extension!
 A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
 This means we don't know which state we are in.

Lecture 1, 7/21/2005 Natural Language Processing


Hidden Markov Model (HMM) Taggers
 Goal: maximize P(word|tag) × P(tag|previous n tags)
 P(word|tag) is the lexical information; P(tag|previous n tags) is the syntagmatic information.

 P(word|tag)
 word/lexical likelihood
 probability that given this tag, we have this word
 NOT probability that this word has this tag
 modeled through language model (word-tag matrix)

 P(tag|previous n tags)
 tag sequence likelihood
 probability that this tag follows these previous tags
 modeled through language model (tag-tag matrix)
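Both matrices can be estimated from a hand-tagged corpus by relative-frequency counts. A minimal sketch, where the toy corpus and the helper names are illustrative assumptions (a real tagger would also smooth these counts for unseen events):

```python
# Estimating the word-tag and tag-tag matrices by counting.
from collections import Counter

# Toy hand-tagged corpus of (word, tag) pairs; purely illustrative.
tagged = [("the", "ART"), ("dog", "N"), ("barks", "V"),
          ("the", "ART"), ("cat", "N"), ("sleeps", "V")]

tag_count = Counter(tag for _, tag in tagged)          # C(tag)
emit_count = Counter(tagged)                           # C(word, tag)
trans_count = Counter((tagged[i][1], tagged[i + 1][1]) # C(tag_i, tag_i+1)
                      for i in range(len(tagged) - 1))

def p_word_given_tag(word, tag):
    """Lexical likelihood: C(word, tag) / C(tag)."""
    return emit_count[(word, tag)] / tag_count[tag]

def p_tag_given_prev(tag, prev):
    """Tag-sequence likelihood: C(prev, tag) / C(prev)."""
    return trans_count[(prev, tag)] / tag_count[prev]
```

For instance, "dog" is one of the two nouns in the toy corpus, so `p_word_given_tag("dog", "N")` is 0.5, and every article is followed by a noun, so `p_tag_given_prev("N", "ART")` is 1.0.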
Efficient Tagging

 How do we find the most likely sequence of tags for a sequence of words?
 Given the contextual and lexical estimates, we can use the Viterbi algorithm to avoid the brute-force method, which for N tags and T words examines N^T sequences.

For "Flies like a flower", there are four words and four possible tags, giving 4^4 = 256 sequences. In a brute-force method, all of them would be examined.
Viterbi Notation

 To track the probability of the best sequence leading to each possible tag at each position, the algorithm uses δ, an N×n array, where N is the number of tags and n is the number of words in the sentence. δt(ti) records the probability of the best sequence up to position t that ends with the tag ti.
 To record the actual best sequence, it suffices to record only the one preceding tag for each tag and position. Hence, another N×n array, ψ, is used. ψt(ti) indicates, for the tag ti in position t, which tag at position t−1 is in the best sequence.
Viterbi Algorithm

 Given the word sequence w1,n, the lexical tags t1,N, the lexical probabilities P(wi|ti), and the bigram probabilities P(ti|tj), find the most likely sequence of lexical tags for the word sequence.

Initialization Step:
For i = 1 to N do              // For all tag states t1,N
    δ1(ti) = P(w1|ti) × P(ti|ø)
    ψ1(ti) = 0                 // Starting point
Iteration Step:
For f = 2 to n do              // next word index
    For i = 1 to N do          // tag states t1,N
        δf(ti) = maxj=1,N (δf-1(tj) × P(ti|tj)) × P(wf|ti)
        ψf(ti) = argmaxj=1,N (δf-1(tj) × P(ti|tj))   // index that gave the max

Sequence Identification Step:
Xn = argmaxj=1,N δn(tj)        // Get the best ending tag state for wn
For i = n-1 to 1 do            // Get the rest
    Xi = ψi+1(Xi+1)            // Use the back pointer from the subsequent state
P(X1,…,Xn) = maxj=1,N δn(tj)
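The three steps above can be sketched directly in Python, with dictionaries standing in for the δ and ψ arrays. The tag set and all probabilities below are made-up illustrative values (with a tiny smoothing constant for unseen transitions), chosen so that the toy sentence resolves the same way as the worked example:

```python
# Sketch of the Viterbi tagger; tag set and probabilities are illustrative.
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence via the delta/psi recurrences."""
    TINY = 1e-4                       # smoothing for unseen transitions
    delta = [{}]                      # delta[t][tag]: best-path prob ending in tag at t
    psi = [{}]                        # psi[t][tag]: previous tag on that best path
    for tag in tags:                  # initialization step
        delta[0][tag] = start_p.get(tag, TINY) * emit_p.get((words[0], tag), 0.0)
        psi[0][tag] = None
    for t in range(1, len(words)):    # iteration step
        delta.append({})
        psi.append({})
        for tag in tags:
            best_prev = max(tags,
                            key=lambda p: delta[t - 1][p] * trans_p.get((p, tag), TINY))
            score = delta[t - 1][best_prev] * trans_p.get((best_prev, tag), TINY)
            delta[t][tag] = score * emit_p.get((words[t], tag), 0.0)
            psi[t][tag] = best_prev
    last = max(tags, key=lambda tag: delta[-1][tag])   # sequence identification step
    seq = [last]
    for t in range(len(words) - 1, 0, -1):             # follow the back pointers
        seq.append(psi[t][seq[-1]])
    return list(reversed(seq))

tagset = ["N", "V", "ART", "P"]
start_p = {"N": 0.3, "V": 0.1, "ART": 0.3, "P": 0.1}          # P(tag|ø)
trans_p = {("N", "V"): 0.4, ("N", "P"): 0.1, ("V", "ART"): 0.4,
           ("P", "ART"): 0.5, ("ART", "N"): 0.7, ("ART", "V"): 0.05}
emit_p = {("flies", "N"): 0.02, ("flies", "V"): 0.05,
          ("like", "V"): 0.1, ("like", "P"): 0.05,
          ("a", "ART"): 0.3,
          ("flower", "N"): 0.05, ("flower", "V"): 0.01}

best = viterbi(["flies", "like", "a", "flower"], tagset, start_p, trans_p, emit_p)
```

With these toy numbers the backtracking recovers the tag sequence N V ART N for "flies like a flower".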


Example

Second Iteration Step

Final Iteration

Now we have to backtrack to get the best sequence:
"Flies/N like/V a/ART flower/N"
HMM Training
HMM Learning: Supervised
