
Institute of Aeronautical Engineering

Natural Language
Processing
Module-5
Dr. Akella S Narasimha Raju,
Assistant Professor,
Institute of Aeronautical Engineering,
Dundigal,
Hyderabad
MARKOV MODEL AND POS TAGGING (MODULE-V)
Introduction to Markov Models and POS Tagging
Part-of-Speech (POS) Tagging
 Part-of-Speech (POS) tagging is a fundamental task in Natural Language Processing (NLP) that involves assigning a grammatical category, such as noun, verb, or adjective, to each word in a sentence.
 POS tagging is crucial because it provides the syntactic structure of sentences, which is essential for higher-level NLP tasks like parsing, machine translation, and information retrieval.
Part of Speech (POS) Tagging
 Part of Speech (POS) tagging is the process of assigning a part of speech to each word in a sentence. Parts of speech are grammatical categories that describe the function of words in a sentence.
 POS tagging is a crucial task in Natural Language Processing (NLP) as it provides syntactic information necessary for higher-level tasks like parsing, machine translation, and information retrieval.
Common Parts of Speech
 Noun (NN): A word that represents a person, place, thing, or idea.
 Examples: dog, city, happiness
 Verb (VB): A word that describes an action, state, or occurrence.
 Examples: run, is, become
 Adjective (JJ): A word that modifies a noun or pronoun, describing
a quality or characteristic.
 Examples: quick, blue, happy
 Adverb (RB): A word that modifies a verb, adjective, or another
adverb, often describing how, when, or where something happens.
 Examples: quickly, very, well
Common Parts of Speech
 Pronoun (PRP): A word that takes the place of a noun.
 Examples: he, they, it
 Preposition (IN): A word that shows the relationship between a
noun (or pronoun) and other words in a sentence.
 Examples: in, on, at
 Conjunction (CC): A word that connects words, phrases, or
clauses.
 Examples: and, but, or
 Determiner (DT): A word that introduces a noun and specifies it as
known or unknown.
 Examples: the, a, this
Common Parts of Speech
 Interjection (UH): A word or phrase that expresses emotion or
exclamation.
 Examples: oh, wow, ouch
 Auxiliary Verb (AUX): A verb that adds functional or
grammatical meaning to the main verb.
 Examples: is, have, will
 Particle (RP): A word that does not belong to any of the
traditional parts of speech but often forms part of a verb
phrase.
 Examples: up, off, out (as in "look up", "take off", "run out")
 Numeral (CD): A word or phrase that denotes a number.
 Examples: one, two, first, second
Common Parts of Speech
•Modal (MD): A type of auxiliary verb that expresses
necessity or possibility.
•Examples: can, could, will, should
•Proper Noun (NNP): A specific name of a particular person,
place, or organization, typically capitalized.
•Examples: John, London, Google
•Possessive (POS): A word that shows ownership.
•Examples: John's, its
•Existential there (EX): The word "there" when used to assert the existence of something.
•Examples: there (as in "There is")
Importance of POS Tagging
•Syntactic Parsing: Helps in understanding the grammatical
structure of sentences.
•Machine Translation: Ensures that words are translated
correctly based on their grammatical role.
•Information Retrieval: Enhances search engines by
understanding the context and grammatical function of query
terms.
•Text-to-Speech: Improves the naturalness of generated speech by providing syntactic context.
POS Tagging Techniques
1. Rule-Based Tagging: Uses a set of hand-written rules
to assign tags based on the word and its context.
2. Statistical Tagging: Uses probabilistic models, such as Hidden Markov Models (HMMs), to assign tags based on the likelihood of sequences of tags.
3. Machine Learning-Based Tagging: Uses algorithms
like Conditional Random Fields (CRFs) or neural
networks to learn from annotated corpora and
predict tags.
Example of POS Tagging
 Consider the sentence: "The quick brown fox jumps over the
lazy dog."
• The: Determiner (DT)
• quick: Adjective (JJ)
• brown: Adjective (JJ)
• fox: Noun (NN)
• jumps: Verb (VBZ)
• over: Preposition (IN)
• the: Determiner (DT)
• lazy: Adjective (JJ)
• dog: Noun (NN)
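
As a quick illustration, most NLP toolkits ship a pre-trained tagger. A minimal sketch using NLTK (assuming the punkt tokenizer and averaged_perceptron_tagger models have been downloaded); a statistical tagger's output may differ slightly from the hand-assigned tags above:

```python
# Minimal sketch: tagging with NLTK's pre-trained perceptron tagger.
import nltk
# One-time downloads (uncomment on first run):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog.")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ...]
```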
Markov Models
 A Markov model is a statistical model that predicts the probability of a sequence of events, where the probability of each event depends only on the state attained in the previous event.
 Markov models are particularly effective for sequence prediction tasks, making them well-suited for POS tagging.
Markov Models
 Markov Chain: A simple type of Markov model that deals with states and transitions between states.
 Each state represents a possible part of speech, and transitions represent the likelihood of moving from one part of speech to another.
Markov Models
 Hidden Markov Model (HMM): An extension of Markov chains,
where the states (POS tags) are hidden and only observable
through emitted symbols (words). HMMs are widely used for
POS tagging due to their ability to model the sequential nature
of language.
 States: Represent different POS tags (e.g., noun, verb, adjective).
 Observations: Words in the sentence that are being tagged.
 Transition Probabilities: The likelihood of moving from one POS tag
to another.
 Emission Probabilities: The likelihood of a word being associated
with a particular POS tag.
 Initial Probabilities: The probability of a POS tag starting a sentence.
Importance of POS Tagging and Markov
Models
•Syntactic Parsing: POS tagging provides the
grammatical structure needed for syntactic parsing,
helping in understanding sentence structure.
•Machine Translation: Accurate POS tagging ensures
better syntactic alignment in translations.
•Information Retrieval: Improves the relevance of
search results by understanding the context and
grammatical role of query terms.
•Text-to-Speech: Helps in generating natural and fluent
speech by providing syntactic context.
Applications of Markov Models in POS
Tagging
 Markov models, particularly Hidden Markov Models
(HMMs), are used to predict the most likely
sequence of POS tags for a given sentence. The
process involves:
 Training: Using a labeled corpus to estimate transition
and emission probabilities.
 Decoding: Applying algorithms like the Viterbi
algorithm to find the most probable sequence of POS
tags for a new sentence.
Overview
 In this module, we will explore how Markov models and POS tagging work together to improve NLP tasks.
 We will delve into the theoretical foundations of Markov
models, understand how Hidden Markov Models (HMMs) are
applied to POS tagging, and examine advanced techniques
and practical implementations.
 By the end of this module, you will have a comprehensive
understanding of the role of Markov models in POS tagging
and be able to apply these concepts to real-world NLP
problems.
MARKOV MODEL AND POS TAGGING (MODULE-V)
Markov Model: Hidden Markov model,
Fundamentals,
Markov Model Concepts
 A Markov chain is a stochastic model that describes a sequence of events in which the probability of each event depends only on the state attained in the previous event.
 This property is known as the Markov property.
Key Characteristics:
•States: The possible conditions or positions in which
the system can be.
•Transitions: The probabilities of moving from one
state to another.
•Transition Matrix: A matrix that represents the
probabilities of transitioning from each state to every
other state.
Mathematical Representation:
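The equation from the original slide is not reproduced here; in the standard formulation, for states s_1, ..., s_N the Markov property and the transition matrix are:

P(X_{t+1} = s_j | X_t = s_i, X_{t-1}, ..., X_0) = P(X_{t+1} = s_j | X_t = s_i) = a_{ij}

A = [a_{ij}], where a_{ij} ≥ 0 and Σ_j a_{ij} = 1 for every state s_i.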
Example of a Simple Markov Chain
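The example figure from this slide is not reproduced here; the following is a hypothetical two-state weather chain (the states and transition probabilities are illustrative, not taken from the slides):

```python
# Hypothetical two-state Markov chain: each next state is sampled
# from the transition-matrix row of the current state.
import numpy as np

states = ["Sunny", "Rainy"]
A = np.array([[0.8, 0.2],   # transitions out of Sunny
              [0.4, 0.6]])  # transitions out of Rainy

rng = np.random.default_rng(seed=0)
state = 0                   # start in Sunny
walk = [states[state]]
for _ in range(6):
    state = rng.choice(2, p=A[state])
    walk.append(states[state])
print(" -> ".join(walk))
```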
Hidden Markov Model (HMM)
 A Hidden Markov Model is an extension of the Markov chain where the states are not directly observable (hidden) but can be inferred through observable data.
 HMMs are particularly useful in sequence prediction tasks, such as POS tagging, where the underlying states (POS tags) are hidden and need to be inferred from the observed words.
Key Components:
1. Hidden States: The actual states of the system,
which are not directly observable.
2. Observations: The visible data points generated by
the hidden states.
3. Transition Probabilities: The probabilities of
transitioning from one hidden state to another.
4. Emission Probabilities: The probabilities of
observing a specific observation given a hidden
state.
5. Initial Probabilities: The probabilities of starting in
each hidden state.
Mathematical Representation:
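The equations from the original slide are not reproduced here; in the standard (Rabiner) notation an HMM is the triple λ = (A, B, π) over hidden states s_1, ..., s_N:

• Transition probabilities: a_{ij} = P(q_{t+1} = s_j | q_t = s_i)
• Emission probabilities: b_j(o) = P(o_t = o | q_t = s_j)
• Initial probabilities: π_i = P(q_1 = s_i)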
Example of HMM for POS Tagging
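The worked example from this slide is not reproduced; in outline, for POS tagging the hidden states are the tags and the observations are the words. For the sentence "The dog runs fast." the hidden states would be {DT, NN, VB, JJ}, the observations (The, dog, runs, fast), and the parameters quantities such as a_{DT,NN} = P(NN follows DT), b_{NN}(dog) = P("dog" | NN), and π_{DT} = P(a sentence starts with DT).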
Core Algorithms for HMMs
1. Forward Algorithm: Used to compute the probability
of an observation sequence given the model.
2. Viterbi Algorithm: Used to find the most probable
sequence of hidden states for a given observation
sequence.
3. Baum-Welch Algorithm: Used to train the HMM by
adjusting the model parameters to maximize the
likelihood of the observed data.
Practical Example in POS Tagging
 Consider the sentence: "The dog runs fast."
POS Tags
 We'll use the following POS tags:
• Determiner (DT)
• Noun (NN)
• Verb (VB)
• Adjective (JJ)
Step 1: Define the HMM Parameters
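The parameter tables from this slide are not reproduced; the dictionaries below are a hypothetical parameterization (all numbers are illustrative, chosen only so each distribution sums to 1) that the sketches in the next steps reuse:

```python
# Hypothetical HMM parameters for "The dog runs fast."
# All probabilities are illustrative, not estimated from a corpus.
states = ["DT", "NN", "VB", "JJ"]

pi = {"DT": 0.7, "NN": 0.1, "VB": 0.1, "JJ": 0.1}  # P(tag starts sentence)

A = {  # transition probabilities P(next tag | current tag)
    "DT": {"DT": 0.0, "NN": 0.9, "VB": 0.0, "JJ": 0.1},
    "NN": {"DT": 0.0, "NN": 0.1, "VB": 0.8, "JJ": 0.1},
    "VB": {"DT": 0.1, "NN": 0.1, "VB": 0.1, "JJ": 0.7},
    "JJ": {"DT": 0.1, "NN": 0.6, "VB": 0.2, "JJ": 0.1},
}

B = {  # emission probabilities P(word | tag)
    "DT": {"The": 0.9, "dog": 0.0, "runs": 0.0, "fast": 0.1},
    "NN": {"The": 0.0, "dog": 0.8, "runs": 0.1, "fast": 0.1},
    "VB": {"The": 0.0, "dog": 0.1, "runs": 0.8, "fast": 0.1},
    "JJ": {"The": 0.0, "dog": 0.0, "runs": 0.1, "fast": 0.9},
}
```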
Step 2: Forward Algorithm (Evaluation)
 The Forward Algorithm calculates the probability of the observation sequence given the model.
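A minimal sketch of the forward recursion over the dict-based parameters from Step 1 (states, pi, A, B are assumed to be in scope):

```python
# Forward algorithm: alpha_t(j) = P(o_1..o_t, q_t = s_j | model).
def forward(obs, states, pi, A, B):
    # Initialization: alpha_1(i) = pi(i) * b_i(o_1)
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    # Recursion: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in states) * B[j][o]
                 for j in states}
    # Termination: P(obs | model) = sum_i alpha_T(i)
    return sum(alpha.values())

print(forward(["The", "dog", "runs", "fast"], states, pi, A, B))
```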
Step 3: Viterbi Algorithm (Decoding)
 The Viterbi Algorithm finds the most probable sequence of hidden states.
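A matching Viterbi sketch over the same parameters; it differs from the forward algorithm only in replacing the sum with a max and keeping backpointers:

```python
# Viterbi algorithm: most probable tag sequence for the observations.
def viterbi(obs, states, pi, A, B):
    # Initialization: delta_1(i) = pi(i) * b_i(o_1)
    delta = {s: pi[s] * B[s][obs[0]] for s in states}
    paths = {s: [s] for s in states}
    # Recursion: keep the best predecessor for each state at each step
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for j in states:
            best = max(states, key=lambda i: delta[i] * A[i][j])
            new_delta[j] = delta[best] * A[best][j] * B[j][o]
            new_paths[j] = paths[best] + [j]
        delta, paths = new_delta, new_paths
    # Termination and backtrace: return the best final state's path
    last = max(states, key=lambda s: delta[s])
    return paths[last], delta[last]

tags, p = viterbi(["The", "dog", "runs", "fast"], states, pi, A, B)
print(tags)  # with the illustrative parameters: ['DT', 'NN', 'VB', 'JJ']
```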
Step 4: Baum-Welch Algorithm (Training)
 The Baum-Welch Algorithm is used to train the HMM by adjusting the model parameters to
maximize the likelihood of the observed data.
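The update equations from the slides are not reproduced; the following is a compact, unscaled teaching sketch over matrix-valued parameters (float arrays, with observations as integer indices — a representation chosen here, not from the slides). A practical implementation would rescale alpha and beta, or work in log space, to avoid underflow on long sequences:

```python
# Baum-Welch (EM for HMMs): re-estimate pi, A, B from one observation
# sequence. pi: (N,), A: (N, N), B: (N, M) float arrays; obs: indices.
import numpy as np

def baum_welch(obs, pi, A, B, n_iter=10):
    N, T = len(pi), len(obs)
    obs = np.asarray(obs)
    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) probabilities
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        # Expected counts: gamma[t, i] = P(q_t = i | obs),
        # xi[t, i, j] = P(q_t = i, q_{t+1} = j | obs)
        gamma = alpha * beta / likelihood
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: normalized expected counts become the new parameters
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```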
Conclusion
• Markov Model: A probabilistic model where the future state
depends only on the current state.
• Hidden Markov Model (HMM): An extension where states
are hidden and only observable through emissions.
• Applications: Widely used in POS tagging, speech
recognition, and bioinformatics.
 Understanding the fundamentals of Markov models and
HMMs provides a robust framework for tackling sequence
prediction tasks in NLP and beyond.
MARKOV MODEL AND POS
TAGGING(MODULE- V)
Probability of properties, Parameter estimation, Variants,
Multiple input observation. The Information Sources in
Tagging: Markov model taggers,
Probability of Properties in Hidden
Markov Models (HMMs)
 Transition Probabilities: Transition probabilities represent the
likelihood of transitioning from one state to another in an
HMM. These probabilities are crucial for understanding how the
hidden states evolve over time.
Probability of Properties in Hidden
Markov Models (HMMs)
 Emission Probabilities: Emission probabilities represent the
likelihood of observing a particular observation given a
hidden state. These probabilities link the hidden states to
the observable data.
Probability of Properties in Hidden
Markov Models (HMMs)
 Initial Probabilities: Initial probabilities
represent the likelihood of the HMM
starting in each state.
Parameter Estimation
 Parameter estimation involves determining
the values of transition, emission, and
initial probabilities from the training data.
 The Baum-Welch algorithm, a form of the
Expectation-Maximization (EM) algorithm, is
commonly used for this purpose.
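When a tagged corpus is available, supervised relative-frequency estimation is a simpler alternative to Baum-Welch: each probability is just a normalized count. A minimal sketch over a hypothetical toy corpus of (word, tag) pairs:

```python
# Supervised parameter estimation by counting (toy corpus is hypothetical).
from collections import Counter, defaultdict

corpus = [
    [("The", "DT"), ("dog", "NN"), ("runs", "VB"), ("fast", "JJ")],
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VB")],
]

init = Counter()               # counts of sentence-initial tags
trans = defaultdict(Counter)   # counts of tag -> next tag
emit = defaultdict(Counter)    # counts of tag -> word
for sent in corpus:
    init[sent[0][1]] += 1
    for word, tag in sent:
        emit[tag][word] += 1
    for (_, t1), (_, t2) in zip(sent, sent[1:]):
        trans[t1][t2] += 1

# Normalizing a row of counts gives a probability, e.g. P(NN | DT):
print(trans["DT"]["NN"] / sum(trans["DT"].values()))  # 1.0 here
```

In practice these counts are smoothed so that unseen words and unseen tag transitions do not receive zero probability.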
Baum-Welch Algorithm Steps:
1. E-step: Use the forward and backward algorithms to compute the expected state occupancies and expected state transitions under the current parameters.
2. M-step: Re-estimate the initial, transition, and emission probabilities from these expected counts.
3. Repeat until the likelihood of the training data converges.
Variants of HMMs
 There are several variants of HMMs tailored for different applications and more complex modeling:
1. Factorial HMMs: Multiple hidden states interact to
generate observations.
2. Hierarchical HMMs: Hidden states are organized in a
hierarchy, allowing for multi-level modeling.
3. Input-Output HMMs: Include external inputs that
influence state transitions and emissions.
Multiple Input Observations
 In some applications, observations consist of multiple
features or modalities. For example, in speech recognition,
each observation could include both audio and visual
features.
 Handling Multiple Observations
 Concatenation: Combine all features into a single observation
vector.
 Separate Models: Train separate HMMs for each feature set and
combine their outputs.
 Joint Modeling: Use an HMM that simultaneously models multiple
observation sequences, accounting for dependencies between
them.
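A minimal sketch of the concatenation strategy, with hypothetical per-frame audio and visual feature vectors stacked into one observation vector per time step:

```python
# Concatenation: fuse two feature streams into one observation sequence.
import numpy as np

T = 10
audio = np.random.rand(T, 13)   # e.g. 13 acoustic features per frame
visual = np.random.rand(T, 4)   # e.g. 4 visual features per frame
obs = np.concatenate([audio, visual], axis=1)
print(obs.shape)                # (10, 17): one fused vector per step
```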
Information Sources in Tagging: Markov
Model Taggers
 Markov Model taggers utilize the principles of HMMs
to perform POS tagging by leveraging various sources
of information:
1. Lexical Information: Word-level information such as word forms and their frequencies in different parts of speech.
2. Contextual Information: Surrounding words and their POS
tags to predict the current word’s POS tag.
3. Syntactic Information: Grammatical rules and structures
that influence the likelihood of POS sequences.
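As a sketch of combining these information sources, NLTK's n-gram taggers can be chained with backoff: a unigram tagger captures lexical information (each word's most frequent tag) and a bigram tagger adds contextual information (the previous tag). This is not an HMM tagger itself, but it illustrates the same sources of evidence (assumes the treebank corpus has been downloaded):

```python
import nltk
from nltk.corpus import treebank
# nltk.download("treebank")  # one-time download

train = treebank.tagged_sents()[:3000]
lexical = nltk.UnigramTagger(train)                     # word -> tag
contextual = nltk.BigramTagger(train, backoff=lexical)  # adds previous-tag context
print(contextual.tag("The dog runs fast .".split()))
```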
Markov Model Taggers
 Markov Model taggers use transition and emission probabilities to assign POS tags to words in a sequence. They can be built in a supervised way, by estimating these probabilities from a tagged corpus, or in an unsupervised way, by training with the Baum-Welch algorithm on untagged text.
Example of HMM Tagging
 Consider the sentence: "The dog runs fast."
 Initial Probabilities (π): Probability of each POS tag starting the sentence.
 Transition Probabilities (A): Likelihood of transitioning from one POS tag to another.
 Emission Probabilities (B): Likelihood of a word being associated with a POS tag.
Viterbi Algorithm Example
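The worked trellis from this slide is not reproduced; in outline, with parameters like those sketched in the earlier steps, the computation for "The dog runs fast." proceeds as:

• Initialization: δ₁(i) = π_i · b_i(The), which is dominated by DT.
• Recursion: δ₂(NN) = δ₁(DT) · a_{DT,NN} · b_{NN}(dog), then δ₃(VB) and δ₄(JJ) analogously.
• Backtracing the stored maxima recovers DT → NN → VB → JJ, the expected tag sequence for this sentence.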
Conclusion
 Understanding the probability properties, parameter estimation, and different variants of HMMs, as well as how to handle multiple input observations and utilize information sources, is essential for effectively applying Markov Model taggers to POS tagging and other sequence prediction tasks in NLP.
MARKOV MODEL AND POS TAGGING (MODULE-V)
Viterbi algorithm, Applying HMMs to POS tagging,
Applications of Tagging.
Viterbi Algorithm
 The Viterbi algorithm is a dynamic programming algorithm used to find the most probable sequence of hidden states (POS tags) in an HMM, given a sequence of observed events (words).
 It is widely used in POS tagging, speech recognition, and other sequence prediction tasks.
Steps of the Viterbi Algorithm:
1. Initialization: δ₁(i) = π_i · b_i(o₁) for each state i.
2. Recursion: δ_t(j) = max_i [δ_{t-1}(i) · a_ij] · b_j(o_t), recording the best predecessor of each state as a backpointer.
3. Termination: select the final state with the highest δ_T.
4. Backtrace: follow the backpointers to recover the most probable state sequence.
Example
 Consider the sentence: "The dog runs fast."
• States: S = {DT, NN, VB, JJ}
• Observations: O = {The, dog, runs, fast}
Applying HMMs to POS Tagging
 Hidden Markov Models (HMMs) are a powerful tool for POS tagging due to their ability to model the probabilistic relationships between sequences of hidden states (POS tags) and observed data (words).
Steps to Apply HMMs to POS Tagging:
1. Define the tag set (hidden states) and the vocabulary (observations).
2. Estimate the initial, transition, and emission probabilities from a tagged corpus (or with Baum-Welch on untagged text).
3. Decode: run the Viterbi algorithm on each new sentence to obtain its most probable tag sequence.
Example
 Given the sentence "The dog runs fast" and the trained HMM parameters, the Viterbi algorithm will calculate the most likely sequence of POS tags as:
 The: Determiner (DT)
 dog: Noun (NN)
 runs: Verb (VB)
 fast: Adjective (JJ)
Applications of POS Tagging
 POS tagging is a fundamental task in NLP with various applications:
1. Syntactic Parsing:
 POS tags provide the grammatical structure needed for parsing sentences,
identifying phrases, clauses, and sentence components.
2. Machine Translation:
 Accurate POS tagging ensures that words are correctly translated according to their
grammatical roles, improving translation quality.
3. Information Retrieval:
 POS tags help in understanding the context and relevance of search queries,
leading to more accurate retrieval of information.
Applications of POS Tagging
4. Text-to-Speech (TTS):
 POS tags aid in generating natural and fluent speech by providing syntactic context.
5. Named Entity Recognition (NER):
 POS tagging helps identify entities like names, dates, and locations by distinguishing
between different parts of speech.
6. Sentiment Analysis:
 POS tags assist in identifying opinion words (adjectives, adverbs) and their targets
(nouns), improving sentiment classification.
7. Question Answering:
 POS tags help in understanding the structure of questions and extracting relevant
answers from text.
8. Linguistic Research:
 POS tagging is used in corpus linguistics to study language usage, frequency of different
parts of speech, and syntactic patterns.
Conclusion
 The Viterbi algorithm, when applied to HMMs for POS
tagging, provides an efficient way to find the most
probable sequence of POS tags for a given sentence.
 POS tagging itself is a crucial NLP task with wide-
ranging applications, from syntactic parsing and
machine translation to information retrieval and
sentiment analysis.
 Understanding and implementing these techniques is
essential for developing robust and accurate NLP
systems.
