NLP Solutions

2 Mark Questions

1. Define Natural Language Processing.


Natural language processing (NLP) is a field of artificial intelligence and computational linguistics that focuses on the interaction between computers and human (natural) languages. It enables machines to understand, interpret, and generate human language in a way that is both meaningful and useful. Language is difficult to describe exhaustively: even if you manage to document all the words and rules of the standard version of a given language, there are complications such as dialects, slang, sarcasm, context, and the way these change over time.
2. List out the Applications of NLP.
Ans.
1) Machine Translation
2) Speech Recognition
3) Speech Synthesis
4) Information Retrieval
5) Information Extraction
6) Question Answering
7) Text Summarization
8) Sentiment Analysis
3. Difference Between Stemming and Lemmatization.

Stemming:
● Stemming is a process that removes the last few characters from a word, often leading to incorrect meanings and spellings.
● For instance, stemming the word 'Caring' would return 'Car'.
● Stemming is used on large datasets where performance is an issue.

Lemmatization:
● Lemmatization considers the context and converts the word to its meaningful base form, which is called the lemma.
● For instance, lemmatizing the word 'Caring' would return 'Care'.
● Lemmatization is computationally more expensive since it involves dictionary look-ups and morphological analysis.
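A minimal sketch of the difference in practice, using NLTK's PorterStemmer and WordNetLemmatizer (assuming NLTK and its WordNet data are available; the example words are illustrative):

# Requires: pip install nltk, plus nltk.download('wordnet') for the lemmatizer data.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for w in ["studies", "running", "better"]:
    # Stemming: rule-based suffix stripping, no dictionary look-up.
    print(w, "-> stem:", stemmer.stem(w))

# Lemmatization: uses WordNet plus a part-of-speech hint to return a real base word.
print(lemmatizer.lemmatize("studies", pos="n"))   # study
print(lemmatizer.lemmatize("running", pos="v"))   # run
print(lemmatizer.lemmatize("better", pos="a"))    # good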
4. Explain in Detail N-Grams and their Types.
N-grams are contiguous sequences of words, symbols, or tokens in a document. In technical terms, they can be defined as neighboring sequences of items in a document. They come into play when we deal with text data in NLP (Natural Language Processing) tasks. They have a wide range of applications, such as language models, semantic features, spelling correction, machine translation, and text mining.
Types of N-Grams:
1. Unigram (n = 1)
2. Bigram (n = 2)
3. Trigram (n = 3)
4. n-gram (a sequence of n items, for general n)
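For illustration, a small hypothetical helper that extracts n-grams from a token list in plain Python (NLTK's nltk.util.ngrams offers the same functionality):

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing is fun".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams, e.g. ('natural', 'language')
print(ngrams(tokens, 3))  # trigrams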
5. What are the Difficulties/Challenges in POS Tagging?
Ans. 1. Contextual words & Phrases & Homonyms
2. Synonyms
3. Irony & Sarcasm
4. Ambiguity
5. Errors in text or speech
6. Idioms & Slang
7. Domain Specific Language.
8. Low-Resource Languages.
6. Explain Stochastic Based Tagging.
Stochastic-based tagging refers to a method in natural language processing (NLP) for
assigning tags or labels to text, often used in tasks like part-of-speech tagging, named
entity recognition, or other forms of text classification. The approach relies on
probabilistic models that use statistical methods to predict tags based on patterns
observed in training data.
Probabilistic Models: Stochastic-based tagging uses models that incorporate probability
to make predictions. These models are trained on a corpus of text where each word or
token is annotated with a tag. The model learns the likelihood of a particular tag
occurring given the context of surrounding words or tokens.
Hidden Markov Models (HMMs): One common type of stochastic model used for
tagging is the Hidden Markov Model. In HMMs, the process of tagging is modeled as a
sequence of states (tags) that generate observable outcomes (words) according to certain
probabilities. The model estimates the probability of a sequence of tags given a sequence
of words.
Training and Inference: During training, the model calculates the probabilities of
transitions between tags and the probabilities of observing specific words given particular
tags. During inference (i.e., when tagging new text), the model uses these probabilities to
predict the most likely sequence of tags for the input text.
Viterbi Algorithm: For HMMs, the Viterbi algorithm is often used to find the most
probable sequence of tags for a given sequence of words. This dynamic programming
algorithm efficiently computes the best tag sequence by considering all possible tag
sequences and their associated probabilities.
Conditional Random Fields (CRFs): Another popular stochastic model for tagging is
Conditional Random Fields. CRFs are used for labeling and segmenting sequential data
and are particularly useful when considering the context of an entire sequence rather than
individual elements in isolation. CRFs model the conditional probability of a label
sequence given an observation sequence, allowing for more flexible and accurate tagging.
Advantages and Limitations: Stochastic-based tagging models can handle a variety of
linguistic phenomena and capture complex patterns in data. However, they require
substantial amounts of annotated training data and can be computationally intensive.
Modern approaches, such as deep learning models, have largely supplanted or
replaced traditional stochastic models in many NLP tasks due to their ability to learn
more complex representations from data.
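To make the training step concrete, here is a minimal sketch (with a made-up two-sentence corpus) of how transition and emission probabilities can be estimated by simple counting, i.e., maximum-likelihood estimation without smoothing:

from collections import defaultdict, Counter

# Toy tagged corpus: a list of sentences, each a list of (word, tag) pairs.
corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
]

transitions = defaultdict(Counter)   # counts of tag -> next tag
emissions = defaultdict(Counter)     # counts of tag -> word

for sentence in corpus:
    prev_tag = "<s>"                 # sentence-start pseudo-tag
    for word, tag in sentence:
        transitions[prev_tag][tag] += 1
        emissions[tag][word.lower()] += 1
        prev_tag = tag

def prob(counter, key):
    # Maximum-likelihood estimate: count / total (no smoothing).
    total = sum(counter.values())
    return counter[key] / total if total else 0.0

print(prob(transitions["DET"], "NOUN"))   # P(NOUN | DET) = 1.0 in this toy corpus
print(prob(emissions["NOUN"], "cat"))     # P(cat | NOUN) = 0.5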
7 Define Language and Knowledge.
Natural language processing (NLP) is a field of artificial intelligence and computational linguistics that focuses on the interaction between computers and human (natural) languages. It enables machines to understand, interpret, and generate human language in a way that is both meaningful and useful. Language is difficult to describe exhaustively: even if you manage to document all the words and rules of the standard version of a given language, there are complications such as dialects, slang, sarcasm, context, and the way these change over time.
8 Explain Ambiguities and its Types.
Ambiguity in language refers to situations where a word, phrase, or sentence can
be interpreted in more than one way. This can create challenges in understanding
and processing language because the intended meaning isn't clear.
Ambiguities can arise in various forms, and they can be broadly categorized into
several types:
1. Lexical Ambiguity
2. Syntactic Ambiguity
3. Semantic Ambiguity
4. Pragmatic Ambiguity
5. Anaphoric Ambiguity
6. Quantifier Ambiguity
7. Semantic Role Ambiguity
9. Explain in Detail Tokenization and List its Types.
Tokenization is a fundamental preprocessing step in natural language processing
(NLP) that involves breaking down text into smaller units, called tokens. Tokens
are typically words, but they can also be phrases, characters, or other meaningful
elements depending on the context and application. Tokenization is crucial for
various NLP tasks, such as text analysis, machine translation, and information
retrieval.
Types of Tokenization
1. Word Tokenization
2. Subword Tokenization
3. Character Tokenization
4. Sentence Tokenization
5. Phrase Tokenization
6. Tokenization with Special Rules
7. Morphological Tokenization
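A short sketch of word, sentence, and character tokenization using NLTK (assuming NLTK and its 'punkt' tokenizer models are installed; the sample text is illustrative):

# Requires: pip install nltk, plus nltk.download('punkt') for the sentence/word tokenizers.
from nltk.tokenize import word_tokenize, sent_tokenize

text = "Tokenization is a key step. It splits text into smaller units!"

print(sent_tokenize(text))   # sentence tokenization: two sentences
print(word_tokenize(text))   # word tokenization: words and punctuation as separate tokens
print(list(text[:12]))       # character tokenization: a list of single characters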
10. Explain Parser and List Out its Types
In computational linguistics and natural language processing (NLP), a parser is a
tool or algorithm used to analyze and understand the structure of sentences or
other text units. Parsing involves breaking down and analyzing the syntactic
structure of text based on grammatical rules or patterns. The primary goal of
parsing is to determine how words or tokens are combined to form meaningful
sentences according to a specified grammar.
Types of Parsers
1. Top-Down Parsers: These parsers start from the root of the parse tree and attempt to derive the input sentence by expanding grammar rules.
2. Bottom-Up Parsers: These parsers start from the input tokens and attempt to construct the parse tree by combining tokens into larger constituents until they match the start symbol of the grammar.
3. Dependency Parsers: Focus on identifying the relationships between words in a sentence, where words are connected by directed edges that represent syntactic dependencies.
4. Chart Parsers: Use a chart or table to keep track of partially parsed constituents and ensure that all possible parses are considered.
5. Dependency Grammar Parsers: Focus on analyzing the syntactic structure based on dependency grammar, where the emphasis is on the relationships between words rather than phrase structure.
6. Constituency Parsers: Analyze sentences based on phrase structure grammar, representing the syntactic structure in terms of nested constituents or phrases.
7. Machine Learning-Based Parsers: Utilize machine learning models, such as neural networks, to parse text by learning patterns from large annotated corpora.
5 Marks Questions
1. Explain Ambiguities and its Types in Detail.

Ambiguity in language refers to situations where a word, phrase, or sentence can be interpreted in more than one way. This can create challenges in understanding and processing language because the intended meaning isn't clear. Ambiguities can arise in various forms, and they can be broadly categorized into several types:
1. Lexical Ambiguity

Lexical ambiguity occurs when a word has multiple meanings. This type of ambiguity
arises at the level of individual words.

● Polysemy: When a single word has multiple related meanings. For example, the
word "bank" can refer to the side of a river or a financial institution.
● Homonymy: When a word has multiple unrelated meanings. For instance, "bat"
can refer to a flying mammal or a piece of sports equipment.

2. Syntactic Ambiguity

Syntactic ambiguity, also known as structural ambiguity, arises when a sentence can be
parsed in more than one way due to its structure.

● Ambiguous Phrase Structure: For example, "I saw the man with the telescope" can
be interpreted as either the man had the telescope or the speaker used a telescope
to see the man.
● Attachment Ambiguity: When it’s unclear which part of the sentence a modifying phrase or clause attaches to. For example, "She hit the man with the umbrella" is ambiguous about whether "with the umbrella" attaches to "hit" (the umbrella was the instrument) or to "the man" (the man was carrying the umbrella).

3. Semantic Ambiguity

Semantic ambiguity occurs when the meaning of a sentence or phrase is unclear because
of the multiple meanings of words or phrases within it.

● Ambiguous Reference: For instance, in the sentence "John told Steve he needed
help," it is ambiguous whether "he" refers to John or Steve.
● Ambiguous Scope: When the scope of quantifiers or modifiers is unclear. For example, "Every student read a book" can mean that there is a single book that every student read, or that each student read some (possibly different) book.

4. Pragmatic Ambiguity

Pragmatic ambiguity involves the interpretation of language based on context and world
knowledge. This type of ambiguity arises when the intended meaning depends on the
broader conversational or situational context.

● Implicature: When the speaker implies something that is not explicitly stated. For
example, "Can you pass the salt?" might pragmatically imply that the speaker
wants the salt passed, even though it is framed as a question.
● Speech Acts: Different intentions behind the same utterance. For instance, "Can
you close the window?" might be interpreted as a request rather than a question
about capability.
5. Anaphoric Ambiguity

Anaphoric ambiguity occurs when it's unclear what a pronoun or other referential
expression refers to.

● Pronoun Reference: For example, "Sarah told Emily that she would call her later"
is ambiguous regarding whether "she" refers to Sarah or Emily, and whether "her"
refers to Emily or someone else.

6. Quantifier Ambiguity

Quantifier ambiguity arises when the scope or extent of quantifiers (like "all," "some,"
"many") is unclear.

● Scope Ambiguity: For example, "Every student answered some question" can mean that each student answered at least one (possibly different) question, or that there is one particular question that every student answered.

7. Semantic Role Ambiguity

This type occurs when it’s unclear which role an entity is playing in a sentence.

● Agent/Patient Ambiguity: For instance, in "The chicken is ready to eat," it is ambiguous whether the chicken is the agent (the one who will do the eating) or the patient (the thing that will be eaten).
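Lexical ambiguity is the easiest type to experiment with programmatically. The sketch below uses NLTK's implementation of the Lesk word-sense disambiguation algorithm (assuming NLTK with the WordNet and punkt data is installed) to pick a sense of the ambiguous word "bank" from its context; Lesk is a simple heuristic, so the chosen senses are not guaranteed to be correct:

# Requires: pip install nltk, plus nltk.download('wordnet') and nltk.download('punkt').
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent1 = word_tokenize("I deposited money at the bank yesterday")
sent2 = word_tokenize("We sat on the bank of the river and fished")

# lesk() returns the WordNet synset whose definition overlaps most with the context words.
for sent in (sent1, sent2):
    sense = lesk(sent, "bank")
    print(sense, "->", sense.definition())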

Q2 Explain Lexicon Free Porter Stemmer Algorithm.

The Lexicon-Free Porter Stemmer Algorithm is a specific version of the Porter stemming
algorithm that does not rely on an external lexicon or predefined dictionary of word
stems. The Porter stemmer itself is a well-known algorithm used to reduce words to their
root or base forms, which is useful in various natural language processing (NLP) tasks
such as text indexing and information retrieval.

Stemming is the process of reducing inflected or derived words to their base or root form.
For example, "running," "runner," and "runs" might all be reduced to "run." The goal is to
normalize words so that they can be treated as the same term in text analysis.

The Porter Stemmer Algorithm

The Porter stemmer algorithm, developed by Martin Porter in 1980, is a widely used stemming algorithm that applies a series of rules to strip suffixes from words in English. It involves several steps and rules designed to handle common word endings and morphological variations.
Lexicon-Free Porter Stemmer Algorithm

Lexicon-Free refers to the characteristic of the algorithm where it operates purely based
on rules and patterns rather than relying on a predefined list of root words or stems. This
means that the algorithm does not need to consult an external lexicon or dictionary to
perform stemming.

Key Features of the Lexicon-Free Porter Stemmer Algorithm:

1. Rule-Based Approach: The algorithm applies a sequence of transformation rules to words. These rules are designed to remove common suffixes and normalize different forms of words. The transformation process is deterministic and does not require external lexical information.
2. No External Lexicon: Unlike some stemming approaches that match words against a lexicon of known stems, the lexicon-free Porter stemmer relies solely on predefined rules to determine the stem of a word. This makes it simpler and easier to implement.
3. Porter Stemming Steps: The Porter algorithm is divided into several steps, each handling specific types of suffixes and conditions. The steps roughly include:
○ Step 1: Removal of plural and inflectional suffixes (e.g., -s, -es, -ed, -ing).
○ Step 2: Mapping of double suffixes to single ones (e.g., -ization → -ize, -ational → -ate).
○ Step 3: Removal of suffixes such as -ness, -ful, -icate.
○ Step 4: Removal of remaining suffixes (e.g., -ment, -tion, -ity, -ance) when the stem is long enough.
○ Step 5: Special handling of word endings (e.g., removal of a final -e, reduction of a final -ll to -l).
4. Examples of Rule Application:
○ Word: "running" – Step 1 removes the suffix "-ing" and adjusts the stem, resulting in "run".
○ Word: "happiness" – Step 3 removes the suffix "-ness", resulting in the stem "happi".
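A short sketch using NLTK's PorterStemmer (assuming NLTK is installed) to see these rules in action on a few illustrative words:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

for word in ["running", "caring", "happiness", "cats"]:
    print(word, "->", stemmer.stem(word))

# Typical outputs: running -> run, caring -> care, happiness -> happi, cats -> cat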

Advantages of the Lexicon-Free Approach

● Simplicity: Since it does not rely on an external lexicon, the algorithm is straightforward to implement.
● Portability: It can be used in various applications without needing additional resources or data.
● Consistency: The deterministic nature of the rule-based approach ensures consistent results across different texts.

Limitations
● Over-Stemming: The algorithm may sometimes reduce words too aggressively,
leading to cases where different words with distinct meanings are reduced to the
same stem.
● Language-Specific: The rules are tailored to English and may not be suitable for
other languages without modification.

Q3 Explain in Detail Regular Expression with Types.


Regular expressions (regex or regexp) are sequences of characters used to define
search patterns. They are a powerful tool for pattern matching and text
manipulation, commonly used in programming, data processing, and text editing.
A regular expression is a string of text that describes a search pattern. This pattern
can be used to search, match, and manipulate text based on certain rules and
constraints.

Components of Regular Expressions:

● Literals: Characters that represent themselves. For example, a matches the letter
'a'.
● Metacharacters: Special characters that define patterns or rules. For example, .
matches any character except a newline.
● Quantifiers: Specify the number of times a character or group should appear. For
example, * means zero or more times.
● Character Classes: Define a set of characters. For example, [abc] matches any
one of the characters 'a', 'b', or 'c'.
● Groups and Ranges: Allow for the grouping and ordering of patterns. For
example, (abc) groups 'abc' together, and [a-z] defines a range of characters
from 'a' to 'z'.
● Anchors: Define positions in the text. For example, ^ matches the start of a line,
and $ matches the end of a line.

Types of Regular Expressions

Regular expressions can be categorized into different types based on their syntax and
usage:

1. Basic Regular Expressions (BRE)

● Definition: The original and simplest form of regular expressions. They are used in tools like grep in Unix.
● Syntax: Basic syntax includes literals, character classes, and basic quantifiers.
● Example: a.b matches 'a', followed by any single character, followed by 'b'.

2. Extended Regular Expressions (ERE)

● Definition: An extension of basic regular expressions that includes additional features like more advanced quantifiers and grouping.
● Syntax: Supports additional metacharacters, such as + for one or more occurrences and ? for zero or one occurrence.
● Example: a+b? matches one or more 'a's, optionally followed by a 'b'.

3. Perl-Compatible Regular Expressions (PCRE)

● Definition: A widely used regular expression syntax that provides advanced features and is compatible with Perl’s regular expressions.
● Syntax: Includes additional features such as lookahead, lookbehind, and non-greedy matching.
● Example: (?<=\d)\w+ matches word characters that are preceded by a digit (a lookbehind assertion).

4. JavaScript Regular Expressions

● Definition: Regular expressions used in JavaScript, with syntax similar to Perl but with some differences in functionality.
● Syntax: Includes features like global search with the g flag and case-insensitive search with the i flag.
● Example: /\d{2,}/ matches a sequence of two or more digits.

5. Python Regular Expressions

● Definition: Regular expressions used in Python programming, provided by the re module.
● Syntax: Similar to Perl but with Python-specific functions and options.
● Example: r'\b\w+\b' matches whole words using word boundaries.

6. POSIX Regular Expressions

● Definition: A set of regular expression standards defined by the POSIX specification. They are used in various Unix tools.
● Syntax: Includes Basic and Extended POSIX regular expressions with some unique syntax rules.
● Example: ^[0-9]{3}-[0-9]{2}-[0-9]{4}$ matches a Social Security number in the format 123-45-6789.

Common Regular Expression Patterns

1. Literals
● abc matches the exact sequence 'abc'.

2. Character Classes
● [abc] matches any one of 'a', 'b', or 'c'.
● [a-z] matches any lowercase letter.

3. Quantifiers
● a* matches zero or more 'a's.
● a+ matches one or more 'a's.
● a? matches zero or one 'a'.

4. Anchors
● ^abc matches 'abc' at the start of a line.
● abc$ matches 'abc' at the end of a line.

5. Groups and Ranges
● (abc) groups 'abc' together.
● a{2,4} matches 'a' repeated 2 to 4 times.

6. Assertions
● (?=abc) is a positive lookahead that matches a position before 'abc'.
● (?!abc) is a negative lookahead that matches a position not followed by 'abc'.

7. Escaped Characters
● \. matches a literal dot (.) character.
● \d matches any digit (equivalent to [0-9]).
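A few of these patterns demonstrated with Python's built-in re module; the sample text and patterns below are illustrative:

import re

text = "Call 123-45-6789 before 9am or 10pm."

# Quantifier: runs of two or more digits.
print(re.findall(r"\d{2,}", text))                                  # ['123', '45', '6789', '10']

# Anchors (via fullmatch): validate a Social Security number format.
print(bool(re.fullmatch(r"[0-9]{3}-[0-9]{2}-[0-9]{4}", "123-45-6789")))   # True

# Lookbehind assertion: letters immediately preceded by a digit.
print(re.findall(r"(?<=\d)[a-z]+", text))                           # ['am', 'pm']

# Grouping: capture the three digit fields separately.
m = re.search(r"(\d{3})-(\d{2})-(\d{4})", text)
print(m.groups())                                                   # ('123', '45', '6789')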

Q4 Define Parser and Explain Top Down and Bottom up Parser.

A parser is a component in natural language processing (NLP) and compiler design that
analyzes the syntactic structure of input text based on a formal grammar. Its primary
function is to decompose text into its constituent parts according to grammatical rules and
to produce a structured representation, such as a parse tree or abstract syntax tree, that
reflects the syntactic structure of the text.
● Syntax Analysis: Determines whether the input text conforms to the rules of the
grammar.
● Error Detection: Identifies syntax errors or inconsistencies in the text.
● Structure Representation: Produces a parse tree or abstract syntax tree that
represents the grammatical structure of the input.

Types of Parsers: Top-Down and Bottom-Up

1. Top-Down Parsers

Definition: Top-down parsers start from the top of the parse tree (the start symbol of the grammar) and attempt to construct the parse tree by breaking it down into its constituent parts based on the grammar rules. They attempt to match the input text with the expected structure by expanding grammar rules recursively.

Characteristics:

● Recursive Approach: Uses recursion to expand non-terminals into their constituent rules.
● Predictive Parsing: Often uses lookahead tokens to make decisions about which
grammar rule to apply next.

Types of Top-Down Parsers:

● Recursive Descent Parser: Implements each non-terminal of the grammar as a separate recursive function. Each function tries to match its corresponding part of the input text.
○ Example: To parse a sentence, the function for a noun phrase would try to
match the noun phrase structure against the input tokens.
● Predictive Parser: A type of recursive descent parser that uses a parsing table to
make decisions based on the current input token and the expected rules. It is
designed to handle deterministic grammars without backtracking.
○ Example: A parsing table guides the parser to choose the correct
production rule based on the next token in the input.

Advantages:

● Simplicity: The recursive descent parser is relatively easy to implement.
● Readable: The structure of the parser often mirrors the grammar directly, making it easier to understand and debug.

Disadvantages:

● Limited Grammar: Top-down parsers may struggle with left-recursive grammars (where a non-terminal can eventually derive itself) or ambiguous grammars.
● Inefficiency: Recursive descent parsers can be less efficient if backtracking is
required.

2. Bottom-Up Parsers

Definition: Bottom-up parsers start from the input tokens and attempt to build the parse
tree by combining tokens into larger constituents until they match the start symbol of the
grammar. They work by reducing the input text to the start symbol based on the grammar
rules.

Characteristics:

● Shift-Reduce: Uses shift actions to move tokens onto a stack and reduce actions
to replace patterns on the stack with non-terminals based on grammar rules.
● Handles Ambiguity: More effective at handling certain types of ambiguities and
complex grammars.

Types of Bottom-Up Parsers:

● Shift-Reduce Parser: Implements a stack-based approach where tokens are pushed onto the stack (shift) and replaced with non-terminals when the stack matches a pattern defined by a rule (reduce).
○ Example: In a sentence "The cat sat," tokens are shifted onto the stack until
a valid pattern is recognized, and reductions are applied to form
higher-level structures.
● Earley Parser: A more general bottom-up parser that can handle any context-free
grammar, including ambiguous and complex grammars. It uses a chart to keep
track of possible parses.
○ Example: The Earley parser maintains a set of possible states as it
processes each token and uses dynamic programming to combine these
states.

Advantages:

● Versatility: Can handle a wider range of grammars, including ambiguous and left-recursive ones.
● Efficiency: Shift-reduce parsers are often more efficient for certain types of
grammars.

Disadvantages:

● Complexity: Bottom-up parsers can be more complex to implement and understand compared to top-down parsers.
● Memory Usage: Some bottom-up parsers, like the Earley parser, may use more
memory due to the chart or state management.
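As a sketch of both strategies, NLTK ships textbook implementations of a recursive-descent (top-down) parser and a shift-reduce (bottom-up) parser that can be run over a tiny hand-written grammar (assuming NLTK is installed; the grammar and sentence below are illustrative):

import nltk

# A toy context-free grammar for a very small fragment of English.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP | V
Det -> 'the'
N -> 'cat' | 'mat'
V -> 'sat'
""")

sentence = "the cat sat".split()

# Top-down: expands S and tries to derive the input, backtracking when needed.
top_down = nltk.RecursiveDescentParser(grammar)
for tree in top_down.parse(sentence):
    print(tree)

# Bottom-up: shifts tokens onto a stack and reduces them to larger constituents.
bottom_up = nltk.ShiftReduceParser(grammar)
for tree in bottom_up.parse(sentence):
    print(tree)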
Q5 Explain in Detail History and Origin of Natural Language Processing.

Natural Language Processing (NLP) is a multidisciplinary field that intersects computer science,
artificial intelligence (AI), and linguistics. Its goal is to enable computers to understand,
interpret, and generate human language in a meaningful way. The history and origin of NLP is a
rich tapestry that reflects the evolution of computing and language understanding technologies.

Early Foundations

1. Ancient and Classical Contributions:

● Ancient Linguistics: The study of language has roots in ancient civilizations. For
example, Panini's grammar of Sanskrit in ancient India (around 5th century BCE) was an
early example of formal language rules.
● Classical Logic: Philosophers like Aristotle and later logicians developed formal systems
of logic that laid the groundwork for computational language analysis.

2. 1950s - The Birth of Computational Linguistics:

● Alan Turing: Alan Turing's work on the concept of a machine that could perform tasks
requiring intelligence led to the development of the Turing Test, which assesses a
machine's ability to exhibit intelligent behavior equivalent to or indistinguishable from
that of a human.
● Early Machine Translation: The first significant NLP application was machine
translation. In 1954, the Georgetown-IBM experiment demonstrated the potential of
machine translation by translating 60 Russian sentences into English.

Development of Algorithms and Methods

1. 1960s - Rule-Based Approaches and Early Models:

● Semantic Networks: Early work focused on rule-based systems and semantic networks.
For example, the work of Joseph Weizenbaum on ELIZA, a program simulating a
Rogerian psychotherapist, demonstrated that computers could engage in simple
conversations with humans.
● Syntax and Parsing: Research on syntactic parsing led to the development of formal
grammars, such as context-free grammar (CFG), which became foundational in parsing
algorithms.

2. 1970s - Formal Grammars and Knowledge Representation:

● Transformational Grammar: Noam Chomsky's theories on transformational-generative grammar provided a formal framework for understanding syntax, influencing NLP by emphasizing the importance of grammatical rules.
● Semantic Processing: Research expanded to include semantic processing, focusing on
how meaning could be extracted from text using logical and probabilistic approaches.
Statistical Methods and Machine Learning

1. 1980s - The Rise of Statistical Approaches:

● Statistical Models: The shift from rule-based to statistical methods began in the 1980s.
Researchers like Frederick Jelinek applied statistical models to speech recognition and
natural language processing, marking a departure from purely symbolic methods.
● Hidden Markov Models (HMMs): HMMs became popular for tasks such as
part-of-speech tagging and speech recognition, allowing for probabilistic modeling of
language sequences.

2. 1990s - Data-Driven Techniques and Machine Learning:

● Corpora and Annotation: The availability of large text corpora and annotated data
enabled the development of more sophisticated statistical models. The Penn Treebank, for
example, provided annotated corpora for syntactic parsing and other NLP tasks.
● Support Vector Machines (SVMs): SVMs and other machine learning techniques were
introduced for text classification and named entity recognition, further advancing the
field.

Modern NLP and Deep Learning

1. 2000s - Advancements in Machine Learning:

● Boosted Performance: Machine learning methods continued to advance, with algorithms like Conditional Random Fields (CRFs) improving sequence labeling tasks.
● Language Modeling: The development of more sophisticated language models, such as
n-grams, improved text generation and understanding.

2. 2010s - The Era of Deep Learning:

● Neural Networks: The rise of deep learning and neural networks revolutionized NLP.
Models like Word2Vec, developed by Tomas Mikolov and his team at Google, introduced
word embeddings that captured semantic relationships between words.
● Transformer Models: The introduction of the Transformer architecture by Vaswani et al.
in 2017 marked a significant breakthrough. Transformers, and models based on them like
BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative
Pre-trained Transformer), achieved state-of-the-art performance on a wide range of NLP
tasks.

3. 2020s - Advanced Language Models and Applications:

● Large Pre-trained Models: The development of large pre-trained models such as GPT-3
by OpenAI and T5 by Google pushed the boundaries of what is possible in NLP, enabling
more advanced text generation, comprehension, and interaction capabilities.
● Ethical and Societal Implications: The field has also seen increasing focus on the ethical
implications of NLP technologies, including issues related to bias, fairness, and the
responsible use of AI.
Q6 Hidden Markov Model (HMM Viterbi) for POS Tagging.

The Hidden Markov Model (HMM) is a statistical model used for various sequence analysis
tasks, including Part-of-Speech (POS) tagging. POS tagging involves assigning parts of speech
(such as nouns, verbs, adjectives) to each word in a sentence based on the context.

The Viterbi algorithm is a dynamic programming algorithm used to find the most likely sequence
of hidden states in an HMM given a sequence of observed events. In the context of POS tagging,
the Viterbi algorithm helps determine the most likely sequence of POS tags for a given sentence.

Hidden Markov Model (HMM) Overview

An HMM is characterized by:

1. States: These are the hidden states that represent POS tags in POS tagging. For example,
states might be "Noun," "Verb," "Adjective," etc.
2. Observations: These are the observed symbols, which are the words in a sentence.
3. Transition Probabilities: These represent the probability of moving from one state (POS
tag) to another. For example, the probability of a noun being followed by a verb.
4. Emission Probabilities: These represent the probability of a particular observation (word)
being emitted by a state (POS tag). For example, the probability of the word "run" being
a verb.
5. Initial Probabilities: These represent the probability of the sequence starting with a
particular state (POS tag).

The Viterbi Algorithm

The Viterbi algorithm is used to find the most probable sequence of states given a sequence of
observations. It works by using dynamic programming to efficiently compute this sequence.
Here’s how the Viterbi algorithm is applied to POS tagging:

1. Define the Problem:

● Input: A sequence of words (observations), e.g., ["The", "cat", "sat", "on", "the", "mat"].
● Output: The most probable sequence of POS tags for the words.

2. Initialize the Variables:

● Viterbi Table: A table where V[i][j] represents the highest probability of the most
likely sequence of POS tags that ends in state j (POS tag j) at position i (the i-th word
in the sequence).
● Backpointer Table: A table where B[i][j] records the state (POS tag) that maximized
the probability at V[i][j].

3. Algorithm Steps:

Initialization:

● Set initial probabilities for the first word:
  V[0][j] = InitialProbability[j] × EmissionProbability[j][word_0]
● Initialize the backpointer table for the first word.

Recursion:

● For each word position i from 1 to N−1, and for each state j (POS tag), compute:
  V[i][j] = max over k of ( V[i−1][k] × TransitionProbability[k][j] × EmissionProbability[j][word_i] )
  where k ranges over all possible states. Update the backpointer table to record the state k that gave the maximum probability.

Termination:

● The final step finds the most probable state sequence ending in any state j at position N−1:
  MostLikelyEndingState = argmax over j of V[N−1][j]
● Trace back through the backpointer table to reconstruct the most probable sequence of states.

Example

Let’s go through a simplified example with three POS tags (Noun, Verb, Adjective) and a
sentence "The cat sat."

1. Define Transition and Emission Probabilities:

● Transition probabilities such as P(Noun | Noun), P(Verb | Noun), P(Adj | Noun).
● Emission probabilities such as P(The | Noun), P(cat | Noun), P(sat | Verb).

2. Initialize the Viterbi and Backpointer Tables:

● For the first word ("The"):
  V[0][Noun] = InitialProbability[Noun] × EmissionProbability[Noun][The]
  V[0][Verb] = InitialProbability[Verb] × EmissionProbability[Verb][The]
● The backpointer table is initialized.

3. Recursion:

● For the second word ("cat"):
  V[1][Noun] = max( V[0][Noun] × TransitionProbability[Noun][Noun] × EmissionProbability[Noun][cat],
                    V[0][Verb] × TransitionProbability[Verb][Noun] × EmissionProbability[Noun][cat] )
● Update the backpointer table.

4. Termination and Backtracking:

● Find the highest probability in the last column and trace back through the backpointer table to reconstruct the most probable tag sequence.
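The following is a self-contained sketch of the Viterbi algorithm for this kind of toy setting; the probability tables are made up for illustration, whereas a real tagger would estimate them from an annotated corpus:

# Toy HMM for POS tagging; all probabilities below are illustrative, not estimated from data.
states = ["Noun", "Verb", "Det"]

initial = {"Noun": 0.2, "Verb": 0.1, "Det": 0.7}
transition = {
    "Det":  {"Noun": 0.9, "Verb": 0.05, "Det": 0.05},
    "Noun": {"Noun": 0.2, "Verb": 0.7,  "Det": 0.1},
    "Verb": {"Noun": 0.3, "Verb": 0.1,  "Det": 0.6},
}
emission = {
    "Det":  {"the": 0.9, "cat": 0.0, "sat": 0.0},
    "Noun": {"the": 0.0, "cat": 0.8, "sat": 0.1},
    "Verb": {"the": 0.0, "cat": 0.1, "sat": 0.8},
}

def viterbi(words):
    V = [{}]      # V[i][tag] = best probability of a tag sequence ending in tag at word i
    back = [{}]   # back[i][tag] = previous tag on that best path
    for tag in states:
        V[0][tag] = initial[tag] * emission[tag].get(words[0], 0.0)
        back[0][tag] = None
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for tag in states:
            # Choose the previous tag that maximizes the path probability.
            best_prev, best_p = max(
                ((prev, V[i - 1][prev] * transition[prev][tag] * emission[tag].get(words[i], 0.0))
                 for prev in states),
                key=lambda x: x[1],
            )
            V[i][tag] = best_p
            back[i][tag] = best_prev
    # Termination: pick the best final tag and trace back through the backpointers.
    last = max(V[-1], key=V[-1].get)
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "cat", "sat"]))   # expected: ['Det', 'Noun', 'Verb']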

OR

Ans: The HMM as a sequence model

A sequence model or sequence classifier is a model whose job is to assign a label or class to each unit in a sequence, thus mapping a sequence of observations to a sequence of labels.
An HMM is a probabilistic sequence model: given a sequence of units (words, letters, morphemes, sentences, or whatever), it computes a probability distribution over possible sequences of labels and chooses the best label sequence.

The Viterbi Algorithm

The decoding algorithm for HMMs is the Viterbi algorithm. As an instance of dynamic programming, Viterbi resembles the dynamic programming minimum edit distance algorithm.
- The Viterbi algorithm finds the optimal sequence of tags. Given an observation sequence and an HMM λ = (A, B), the algorithm returns the state path through the HMM that assigns maximum likelihood to the observation sequence.
- The Viterbi algorithm first sets up a probability matrix, or lattice, with one column for each observation o_t and one row for each state in the state graph. Each column thus has a cell for each state q_i in the single combined automaton.
- For the sentence "Janet will back the bill," such a lattice shows the possible tags for each word and highlights the path corresponding to the correct tag sequence through the hidden states. States (parts of speech) that have a zero probability of generating a particular word according to the B matrix (such as the probability that a determiner DT will be realized as "Janet") are greyed out.
