Important Notes: Problems Faced in NLP
A Hidden Markov Model (HMM) is a statistical model of a system whose hidden states evolve over time in a probabilistic way. It is used in many fields such as speech recognition, natural language
processing, and biology.
1. States:
In HMM, the system being studied can be in one of several states at any time. These states are
"hidden" (we can't see them directly), but we can observe certain things (called observations) that
give us clues about which state the system might be in.
2. Transitions:
The system moves from one state to another over time. The probability of moving from one state
to another is called the transition probability.
3. Observations:
While we cannot directly see the state, we can observe some data that is related to the state. These
observations are linked to the states through emission probabilities.
4. Markov Property:
HMM assumes that the next state depends only on the current state, not on the sequence of events
that preceded it. This is called the Markov property.
5. Goal:
The goal of using HMM is to find the most likely sequence of states (hidden states) given a series
of observations.
Example:
Imagine you are trying to predict the weather (states: sunny, rainy) based on observations (like whether people
are carrying umbrellas or wearing jackets). The weather today depends on the weather yesterday (this is the
Markov property). But, you can't directly observe the weather; instead, you can only see people’s actions,
which give you clues about the weather.
In Summary:
Transition Probabilities: The chance of moving from one state to another (e.g., from sunny to rainy).
Emission Probabilities: The chance of seeing an observation from a state (e.g., the chance of seeing an
umbrella when it’s rainy).
HMM helps to model situations where we have some hidden factors (states) influencing visible data
(observations) and we want to understand or predict the system's behavior over time.
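As a concrete illustration of these ideas, here is a minimal Viterbi decoder for the weather example. The probability values are made up for illustration (they are not part of the notes); the structure (states, transition probabilities, emission probabilities) follows the definitions above.

```python
# Toy weather HMM: hidden states (Sunny/Rainy) and visible observations
# (whether people carry umbrellas). All probabilities are illustrative.
states = ["Sunny", "Rainy"]
start_p = {"Sunny": 0.6, "Rainy": 0.4}                # initial state probabilities
trans_p = {                                            # transition probabilities
    "Sunny": {"Sunny": 0.7, "Rainy": 0.3},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}
emit_p = {                                             # emission probabilities
    "Sunny": {"umbrella": 0.1, "no_umbrella": 0.9},
    "Rainy": {"umbrella": 0.8, "no_umbrella": 0.2},
}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t (Markov property:
            # only the previous state matters)
            prev, prob = max(
                ((p, V[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return path[::-1]

print(viterbi(["umbrella", "umbrella", "no_umbrella"]))
# ['Rainy', 'Rainy', 'Sunny']
```

Seeing umbrellas twice makes "Rainy" the most likely hidden state on those days, even though the weather itself was never observed directly.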
The CYK (Cocke-Younger-Kasami) algorithm is a parsing algorithm used for context-free grammars (CFGs).
It helps to determine whether a given string can be generated by a particular CFG, and if so, it provides a
possible parse tree for that string.
This algorithm is commonly used in Natural Language Processing (NLP) and compilers to process sentences
or code.
Key Concepts:
1. Chomsky Normal Form (CNF): CYK works on grammars whose rules are all of the form A → B C or A → word.
2. Dynamic programming: the algorithm fills a triangular table in which each cell holds the non-terminals that can generate a particular substring of the input.
Example:
1. S → NP VP
2. NP → Det N
3. VP → V NP
4. Det → the
5. N → cat | dog
6. V → chases
CYK table for the sentence "the cat chases the dog" (each cell lists the non-terminals that can generate the substring of the given length starting at that word position):

Length | 1: the | 2: cat | 3: chases | 4: the | 5: dog
   1   | Det    | N      | V         | Det    | N
   2   | NP     | -      | -         | NP     |
   3   | -      | -      | VP        |        |
   4   | -      | -      |           |        |
   5   | S      |        |           |        |
Explanation:
The top row lists the words of the sentence in order.
In the length-1 row, we fill in the non-terminal that generates each word, based on the grammar rules. For example, "the" corresponds to Det and "cat" corresponds to N.
We continue filling in the table by combining non-terminals for adjacent substrings. For example, "the cat" can
be generated by the rule NP → Det N, so in the length-2 row we fill NP for the substring "the cat".
We continue filling the table and checking combinations of non-terminals that can generate longer
substrings.
In the final step, we check the cell that spans the entire sentence (length 5, starting at word 1). If it contains the start symbol (S), then the sentence is grammatically correct according to the grammar.
The completed table shows that S appears in that cell, meaning that the sentence "the cat chases the
dog" can be generated by the given grammar.
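The procedure described above can be sketched in Python. The grammar is the toy grammar from the example (already in CNF), and the table is indexed by substring length and start position.

```python
from itertools import product

# CYK recognizer for the toy grammar: binary rules (A -> B C) and
# terminal rules (A -> word), both stored as reverse lookups.
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
terminal = {"the": {"Det"}, "cat": {"N"}, "dog": {"N"}, "chases": {"V"}}

def cyk(words):
    """Return the CYK table: table[length-1][start] is the set of
    non-terminals generating words[start : start+length]."""
    n = len(words)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Length-1 substrings come straight from the terminal rules
    for i, w in enumerate(words):
        table[0][i] = set(terminal.get(w, set()))
    # Longer substrings: try every split point and every rule A -> B C
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            for split in range(1, span):
                left = table[split - 1][start]
                right = table[span - split - 1][start + split]
                for b, c in product(left, right):
                    table[span - 1][start] |= binary.get((b, c), set())
    return table

words = "the cat chases the dog".split()
table = cyk(words)
print("S" in table[len(words) - 1][0])  # True: the sentence is derivable
```

A full parser would additionally store back-pointers in each cell so that a parse tree can be reconstructed, but the recognition logic is the same.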
Advantages of CYK:
Parsing Context-Free Grammars: CYK is useful when you need to check whether a string can be generated
by a given grammar, especially for complex grammars.
Ambiguity Detection: CYK can also be used to identify multiple ways to parse a sentence, which
helps detect ambiguity in the grammar.
Disadvantages of CYK:
CYK requires the grammar to be in Chomsky Normal Form (CNF), which may require some
preprocessing of the grammar.
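The CNF preprocessing mentioned above mainly involves binarizing rules with long right-hand sides (full conversion also removes unit and empty productions, which are omitted here). A minimal sketch of the binarization step, using a hypothetical (lhs, rhs) rule representation:

```python
# Binarize rules with more than two symbols on the right-hand side by
# introducing fresh intermediate non-terminals (X1, X2, ...). This is
# only the binarization step of CNF conversion, not the whole process.
def binarize(rules):
    out = []
    fresh = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new = f"X{fresh}"
            out.append((lhs, [rhs[0], new]))  # A -> B X1
            lhs, rhs = new, rhs[1:]           # X1 covers the rest
        out.append((lhs, rhs))
    return out

print(binarize([("S", ["NP", "V", "NP"])]))
# [('S', ['NP', 'X1']), ('X1', ['V', 'NP'])]
```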
Challenges of NLP
Ambiguity: Natural language is inherently ambiguous, with words having multiple meanings and sentences
having different interpretations.
Variability: Language use varies greatly between individuals, contexts, regions, and cultures.
Context Sensitivity: The meaning of words and sentences depends heavily on context, making it challenging for
computers to interpret accurately.
Lack of Formal Rules: Unlike programming languages, natural languages lack strict syntax and semantics rules.
Implicit Knowledge: Much of human communication involves implicit knowledge and context, which is difficult
to model computationally.
History of NLP
2000s-Present: Deep learning revolutionized NLP with neural networks, leading to significant advances in tasks like
language translation, sentiment analysis, and more.
Advantages of NLP
Automation: Enables automation of tasks like translation, summarization, and sentiment analysis.
Insights: Helps extract insights and patterns from large volumes of textual data.
Disadvantages of NLP
Ambiguity and Complexity: Dealing with language ambiguity and complex structures.
Data Dependency: NLP models heavily rely on large datasets for training, which may not always be available or
representative.
Bias: Models can inherit biases present in training data, leading to unfair or inaccurate results.
Computational Cost: Deep learning models used in NLP can be computationally expensive and resource-intensive.
Components of NLP
Named Entity Recognition (NER): Identifying entities like names, dates, and locations.
Applications of NLP
Common applications include machine translation, sentiment analysis, text summarization, speech recognition, and chatbots or virtual assistants.
Ambiguity in NLP
Ambiguity in natural language arises from multiple possible interpretations of words, phrases, or sentences due to
context, tone, and cultural references. Resolving ambiguity is a key challenge in NLP, requiring advanced models
that can understand context and infer meaning accurately.
Phases of NLP
1. Preprocessing: Cleaning and preparing text data (e.g., removing punctuation, tokenization).
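A minimal sketch of the preprocessing phase, using only the standard library (real pipelines typically use libraries such as NLTK or spaCy for tokenization):

```python
import re

# Basic text preprocessing: lowercase, strip punctuation, tokenize.
def preprocess(text):
    text = text.lower()                  # normalize case
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation
    return text.split()                  # whitespace tokenization

print(preprocess("The cat, unsurprisingly, chases the dog!"))
# ['the', 'cat', 'unsurprisingly', 'chases', 'the', 'dog']
```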
NLP APIs: Offer prebuilt services (like Google Cloud NLP, IBM Watson) for tasks such as sentiment analysis, entity
recognition, and translation.
NLP Libraries: Provide frameworks (like NLTK, spaCy, Transformers) for developing custom NLP applications,
offering tools for various tasks and models.
Natural Language vs. Computer Language
Natural Language: Evolves naturally among humans, is context-dependent, ambiguous, and varies widely.
Computer Language: Formal, structured languages with clear syntax and semantics, designed for programming
computers to perform specific tasks.