Unit 1 and 2
NLP helps users ask questions about any subject and get a direct response within seconds. NLP offers exact answers to a question, meaning it does not return unnecessary or unwanted information.

"It is celebrated on the 15th of August each year ever since India got independence from British rule."
"This day celebrates independence in the true sense."

Stemming is used to normalize words into their base or root form. For example, celebrates, celebrated, and celebrating all originate from the single root word "celebrate." The big problem with stemming is that it sometimes produces a root word that has no meaning.

For example, intelligence, intelligent, and intelligently all reduce to the single root "intelligen." In English, the word "intelligen" has no meaning.

Lexical Ambiguity:
Lexical ambiguity exists in the presence of two or more possible meanings within a single word.

Example (translation): the English word "bank" becomes French Banque (not rive, which means river bank).

Information Retrieval:
Disambiguating user queries improves search accuracy.
Example:
Query: Apple.
Sense: A fruit vs. a tech company.

Chatbots and Virtual Assistants:
Resolving word senses enhances user interaction.
Example:
User: What is the capital of the bank?
Word Senses and Word Embeddings:
Example:
Question: What does a bat eat?
Correct sense: Bat as an animal.
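The sense-resolution examples above can be sketched with a simplified Lesk-style disambiguator: each candidate sense carries a short gloss, and the sense whose gloss overlaps most with the context words wins. The glosses, sense names, and stopword list below are invented for illustration:

```python
# Words too common to be informative for gloss overlap (illustrative list).
STOPWORDS = {"a", "an", "the", "what", "does", "is", "to", "in", "or", "and", "at"}

# Hypothetical sense inventory with hand-written glosses.
SENSES = {
    "bat": {
        "animal": "flying nocturnal mammal that likes to eat insects and fruit",
        "sports": "wooden club used to hit a ball in cricket or baseball",
    },
}

def disambiguate(word, context):
    """Return the sense whose gloss shares the most content words with the context."""
    context_words = set(context.lower().split()) - STOPWORDS

    def overlap(sense):
        gloss_words = set(SENSES[word][sense].split()) - STOPWORDS
        return len(context_words & gloss_words)

    return max(SENSES[word], key=overlap)

print(disambiguate("bat", "What does a bat eat"))  # -> animal
```

The same call with a sports context (for example "he swung the bat at the ball in cricket") picks the other sense, because the overlap is computed against each gloss separately.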
RNN (Recurrent Neural Network):
Recurrent Neural Networks (RNNs) are a class of neural networks specifically designed to handle sequential data, making them particularly well suited for tasks in Natural Language Processing (NLP). Unlike traditional feedforward neural networks, RNNs have loops that allow information to be passed from one step of the network to the next, enabling the network to maintain a memory of previous inputs.

Key Features of RNNs:
Hidden Layer:
Contains a recurrent connection that allows the network to remember past states.
Output Layer:
Generates the output at each time step or after processing the entire sequence.

Challenges with RNNs:
Vanishing Gradient Problem:
Gradients diminish over long sequences, making it difficult for the network to learn long-term dependencies.
Exploding Gradient Problem:
Gradients can grow excessively large during backpropagation, causing instability.
Limited Memory:
Difficulty in handling very long sequences due to reliance on hidden states.
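Both gradient problems can be illustrated with a scalar RNN, h_t = tanh(w_h * h_(t-1) + w_x * x_t): backpropagating through T steps multiplies T factors of w_h * (1 - h_t^2), so the product shrinks toward zero when |w_h| < 1 and can blow up when |w_h| > 1. The weight values below are made up for illustration, not trained parameters:

```python
import math

def run(w_h, w_x, xs):
    """Forward pass of a scalar RNN; also returns d h_T / d h_0."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        # Each step contributes one factor w_h * tanh'(pre-activation)
        # = w_h * (1 - h^2) to the gradient product.
        grad *= w_h * (1.0 - h * h)
    return h, grad

# The hidden state carries a (decaying) trace of the first input.
h, _ = run(w_h=0.5, w_x=1.0, xs=[1.0, 0.0, 0.0])

# With zero input, h stays at 0 and each factor is exactly w_h,
# isolating the effect of the recurrent weight over 50 steps.
_, g_small = run(w_h=0.5, w_x=1.0, xs=[0.0] * 50)  # ~8.9e-16: vanishing
_, g_large = run(w_h=3.0, w_x=1.0, xs=[0.0] * 50)  # ~7.2e23: exploding
print(g_small, g_large)
```

In a full vector RNN the factors are Jacobian matrices rather than scalars, but the same repeated multiplication drives the gradient toward zero or toward overflow, which is where the "NaN" losses come from.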
Vanishing Gradients and Exploding Gradients are two common problems encountered during the training of deep neural networks, particularly in Recurrent Neural Networks (RNNs) and other deep architectures. These issues arise during the backpropagation process, which is used to update the model's weights by calculating gradients.

Vanishing Gradients:
The Vanishing Gradient problem occurs when the gradients of the loss function with respect to the model's parameters become very small as they are propagated backward through the network. This leads to very small updates to the model's weights, effectively stalling learning, particularly in the early layers of the network. This problem is especially prevalent in deep networks or in RNNs when trying to capture long-term dependencies.

In RNNs: When processing long sequences, the contributions from earlier inputs diminish exponentially, making it difficult for the network to learn relationships between distant inputs in the sequence.

Impact: The model struggles to learn and represent long-range dependencies in the data, leading to poor performance on tasks that require understanding of context over long sequences (e.g., in NLP tasks like language modeling or translation).

Exploding Gradients:
The Exploding Gradient problem occurs when the gradients become excessively large during backpropagation. This can cause the model's weights to grow exponentially, leading to numerical instability. The model may diverge during training, making it impossible to learn anything meaningful.

In RNNs: This typically happens when there are large weight values or when trying to model very complex sequences, where the error gradients multiply and grow rapidly as they propagate backward through time.

Impact: The training process becomes unstable, with the model's loss function often resulting in "NaN" (Not a Number) values, and the model's performance deteriorates.

HIDDEN MARKOV MODELS:
Markov chain: A Markov chain is a way to model a system where what happens next depends only on what is happening right now, not on what happened before. In simple terms, it is a chain of events where each event depends only on the one right before it.

A Hidden Markov Model (HMM) in NLP is a tool used to predict things that we cannot directly see (hidden states) based on things we can observe.

Example: Imagine you are trying to guess the weather based on how people are dressed. You cannot see the weather directly (that is the hidden part), but you can see whether people are wearing coats, hats, or sunglasses (these are the observable parts). The model helps you predict the weather (hidden state) from the outfits (observations) and the probabilities of transitioning between different weather types (such as sunny, rainy, etc.).

The model defines probabilities for transitioning between hidden states and for generating observable symbols, allowing for the modeling of dynamic systems with uncertainty.

Because of this flexibility, HMMs are useful for modelling dynamic systems and forecasting future states based on previously observed sequences.

In NLP, HMMs are used for tasks like part-of-speech tagging, where the hidden states are the grammatical categories (noun, verb, etc.) and the observations are the words of the sentence. The model guesses the hidden tags based on the words and the patterns in the text.

States:
Hidden States: The actual states of the system, which are not directly observable. Instead, they are inferred from the observable outputs.
Observable States: The outputs or observations that can be directly seen or measured.

Markov Property:
The Markov property assumes that the probability of transitioning to the next state depends only on the current state, not on the sequence of previous states. This is known as the first-order Markov property.
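The weather-and-clothing story above can be written down directly as the probability tables an HMM needs. All of the probability values here are invented for illustration; a real model would estimate them from data:

```python
# Hidden states: the weather. Observations: what people wear.
states = ["Sunny", "Rainy"]

# P(next weather | current weather): the Markov chain over hidden states.
transition = {
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}

# P(outfit | weather): how each hidden state generates observable symbols.
emission = {
    "Sunny": {"sunglasses": 0.9, "coat": 0.1},
    "Rainy": {"sunglasses": 0.2, "coat": 0.8},
}

# P(weather on day 1).
initial = {"Sunny": 0.5, "Rainy": 0.5}

# Probability of one fully specified story: it is Sunny then Rainy,
# and we observe sunglasses then a coat.
p = (initial["Sunny"] * emission["Sunny"]["sunglasses"]
     * transition["Sunny"]["Rainy"] * emission["Rainy"]["coat"])
print(p)  # 0.5 * 0.9 * 0.2 * 0.8 = 0.072
```

Note the first-order Markov property in the `transition` table: the next state's distribution is conditioned only on the current state, never on earlier ones.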
An HMM consists of two types of variables: hidden states and observations.
The hidden states are the underlying variables that generate the observed data, but they are not directly observable.
The observations are the variables that are measured and observed.

Given the observed data, the Viterbi algorithm is used to compute the most likely sequence of hidden states. This can be used to predict future observations, classify sequences, or detect patterns in sequential data.

Step 7: Evaluate the model
The performance of the HMM can be evaluated using various metrics, such as accuracy, precision, and recall.
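The Viterbi step can be sketched as follows, reusing a small invented weather HMM (the probabilities are made up for illustration; a real model would estimate them from tagged data):

```python
def viterbi(obs, states, initial, transition, emission):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               backpointer to the previous state on that path)
    best = [{s: (initial[s] * emission[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            prev, p = max(
                ((r, best[t - 1][r][0] * transition[r][s]) for r in states),
                key=lambda rp: rp[1],
            )
            layer[s] = (p * emission[s][obs[t]], prev)
        best.append(layer)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))

# Invented weather model (same shape as the tables described above).
states = ["Sunny", "Rainy"]
initial = {"Sunny": 0.5, "Rainy": 0.5}
transition = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},
              "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emission = {"Sunny": {"sunglasses": 0.9, "coat": 0.1},
            "Rainy": {"sunglasses": 0.2, "coat": 0.8}}

print(viterbi(["sunglasses", "coat", "coat"], states, initial, transition, emission))
# -> ['Sunny', 'Rainy', 'Rainy']
```

For Step 7, the accuracy of such a decoder could then be measured by comparing the predicted state sequence against a gold-standard sequence, position by position.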