NLP JNTUH Unit 1
Irregularity:
● Irregularity means having forms or structures in language that don't fit the usual
patterns or rules. These are exceptions to the standard ways of forming words or
sentences.
● Irregular verbs are verbs that don't follow the standard rule of adding -d, -ed, or -ied to
form their past simple or past participle forms (e.g., "go" becomes "went" instead of
"goed").
● Examples of irregular verbs include "run" (past: "ran"), "buy" (past: "bought"), and
"take" (past: "took").
● Irregularities can also appear in other parts of language, like noun plurals (e.g., "child"
becomes "children" instead of "childs") or comparative adjectives (e.g., "good" becomes
"better" instead of "gooder").
● Understanding irregularities is important because they often appear in everyday
language, and using them correctly makes speech and writing more natural. In NLP,
irregular forms are typically handled as exceptions in a lookup table, as in the sketch
below.
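A minimal sketch of the exception-list idea in Python. The verb lists and fallback rules below are illustrative, not a complete model of English:

```python
# Irregular forms are stored as exceptions; anything not listed falls
# through to the regular -d / -ed / -ied rules. Toy data for illustration.
IRREGULAR_PAST = {
    "go": "went",
    "run": "ran",
    "buy": "bought",
    "take": "took",
}

def past_tense(verb: str) -> str:
    """Return the simple past form of a verb (toy coverage)."""
    if verb in IRREGULAR_PAST:                  # exception list first
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):                      # bake -> baked
        return verb + "d"
    if verb.endswith("y") and verb[-2] not in "aeiou":
        return verb[:-1] + "ied"                # try -> tried
    return verb + "ed"                          # walk -> walked

print(past_tense("go"))    # went
print(past_tense("walk"))  # walked
print(past_tense("try"))   # tried
```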
Ambiguity:
● Ambiguity means that something, like a sentence or word, can have two or more
possible meanings. In both speaking and writing, there are two main types of ambiguity:
1. Lexical Ambiguity:
● Lexical ambiguity occurs when a word has more than one meaning, and it's
unclear which meaning is intended in a given context.
● This happens because some words have multiple definitions or uses.
○ Example: "Bank"
■ Could mean a financial institution (where you store money).
■ Or it could refer to the side of a river (a riverbank).
2. Syntactic Ambiguity: Presence of two or more possible meanings within a single
sentence or sequence of words.
● This type of ambiguity happens when the way words are arranged allows for
multiple possible meanings.
○ Example: "The chicken is ready to eat."
■ This could mean that the chicken is prepared and ready for
someone to eat it (the chicken is the food).
■ Or it could mean that the chicken itself is hungry and ready to eat
something.
Homonyms: These are words that sound the same (and sometimes look the same) but have
different meanings or functions.
○ Examples:
■ "Bore" (to drill a hole) and "boar" (a wild pig).
■ "Two" (the number) and "too" (meaning also).
Productivity:
● Productivity in language refers to our unlimited ability to create new sentences and
expressions. This means we can use any language to say things that have never
been said before. It’s also called open-endedness or creativity.
● The term can also refer to specific parts of language, like prefixes or suffixes, that
help us create new words of the same type (e.g., adding "-ness" to "happy" to make
"happiness").
● Productivity is most often talked about in relation to word-formation, which is how we
create new words.
● Humans constantly come up with new ways to express ideas and describe new
things by using their language creatively. This ability, called productivity, allows us to
create an infinite number of sentences.
● Other animals don't have this kind of flexibility in communication. For example, cicadas
have only four signals, and vervet monkeys have 36 vocal calls. They can’t create new
signals to talk about new experiences or events.
● The limitless ability to create and understand completely new sentences is known as
open-endedness.
● Another important part of human creativity is the freedom to respond in any way we
choose. People can say whatever they want in any situation, or they can choose to
say nothing at all.
Morphological Models
● Dictionary Lookup
● Finite-State Morphology
● Unification-Based Morphology
● Functional Morphology
Dictionary Lookup:
● In this approach, morphological analysis is precompiled: every analyzable word form is
listed in a dictionary together with its analyses, and analyzing a word is a simple table
lookup. Lookup is fast, but word forms that are not listed cannot be analyzed at all. A
toy sketch is given below.
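A toy illustration of dictionary lookup, with an invented lexicon and invented tag names; real systems use large precompiled dictionaries:

```python
# Every analyzable word form is precompiled into a table that maps
# form -> list of (lemma, tags) analyses. Entries here are invented.
LEXICON = {
    "children": [("child", "N+Pl")],
    "ran":      [("run", "V+Past")],
    "banks":    [("bank", "N+Pl"), ("bank", "V+3Sg")],  # ambiguous form
}

def analyze(form: str):
    # Lookup either returns all stored analyses or fails outright;
    # unseen forms cannot be analyzed, which is the main limitation.
    return LEXICON.get(form, [])

print(analyze("banks"))    # [('bank', 'N+Pl'), ('bank', 'V+3Sg')]
print(analyze("running"))  # [] -- not in the dictionary
```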
Finite-State Morphology:
● In finite-state morphology, the relation between surface word forms and their lexical
analyses (lemma plus morphological tags) is described as a regular relation and
implemented with finite-state devices, as in Koskenniemi's classic two-level morphology.
Finite-State Transducers:
● A finite-state transducer is a finite automaton that reads symbols from an input tape and
writes symbols to an output tape, so it computes a mapping between two sets of strings.
Because transducers can be run in either direction, the same machine supports both
analysis (surface form → analysis) and generation (analysis → surface form). A minimal
hand-built transducer is sketched below.
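A minimal hand-built transducer, assuming a toy rule that inserts "e" before the plural "s" after a sibilant (as in "fox" → "foxes"); real systems compile such rules from high-level notations rather than writing transitions by hand:

```python
# Toy transducer: maps a lexical string like "fox^s" (stem + morpheme
# boundary + plural s) to its surface form, inserting 'e' after a sibilant.
from typing import Dict, Tuple

# transitions: (state, input_symbol) -> (output_string, next_state)
TRANS: Dict[Tuple[str, str], Tuple[str, str]] = {}
for ch in "abcdefghijklmnopqrstuvwxyz":
    for state in ("start", "sibilant"):
        TRANS[(state, ch)] = (ch, "sibilant" if ch in "sxz" else "start")
TRANS[("start", "^")] = ("", "start")      # boundary after non-sibilant: delete
TRANS[("sibilant", "^")] = ("e", "start")  # boundary after sibilant: insert 'e'

def transduce(lexical: str) -> str:
    state, out = "start", []
    for sym in lexical:
        output, state = TRANS[(state, sym)]
        out.append(output)
    return "".join(out)

print(transduce("fox^s"))  # foxes
print(transduce("cat^s"))  # cats
```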
Unification-Based Morphology:
● Here morphological information is represented as feature structures (sets of
attribute-value pairs), and pieces of information are combined by unification: two
structures merge when their shared attributes agree and fail to combine otherwise. A
tiny unification sketch follows.
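A tiny sketch of unification over flat feature structures, using Python dicts; the feature names and values are invented for illustration:

```python
# Two flat feature structures unify if they agree on every shared
# attribute; the result merges the information from both.
def unify(fs1: dict, fs2: dict):
    result = dict(fs1)
    for attr, value in fs2.items():
        if attr in result and result[attr] != value:
            return None  # clash: unification fails
        result[attr] = value
    return result

stem   = {"lemma": "child", "cat": "N"}
suffix = {"cat": "N", "num": "pl"}
print(unify(stem, suffix))                  # {'lemma': 'child', 'cat': 'N', 'num': 'pl'}
print(unify({"num": "sg"}, {"num": "pl"}))  # None (number clash)
```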
Functional Morphology:
● Functional morphology defines morphology in the style of functional programming
(classically in Haskell): a paradigm is a function from a stem to its complete table of
inflected forms, and the lexicon assigns each lexeme a paradigm. A small sketch follows.
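A small Python sketch mimicking the paradigm-as-function idea (the actual formalism is usually embedded in Haskell); the tags and paradigms here are invented:

```python
# A paradigm is a pure function: stem -> {morphosyntactic tag: form}.
def regular_noun(stem: str) -> dict:
    return {"Sg": stem, "Pl": stem + "s"}

def irregular_noun(sg: str, pl: str) -> dict:
    # irregular lexemes simply get their own paradigm function
    return {"Sg": sg, "Pl": pl}

lexicon = {
    "cat": regular_noun("cat"),
    "child": irregular_noun("child", "children"),
}

print(lexicon["cat"]["Pl"])    # cats
print(lexicon["child"]["Pl"])  # children
```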
● Generative Approach: This method focuses on learning about each class (or
category) by understanding how data is generated for that class.
○ Learning Process: It learns the joint probability distribution p(x,y), which
means it tries to model how both the features (x) and the classes (y) are
related.
○ Data Modeling: It models the distribution of data within each class
separately. For instance, it learns what a lion and an elephant look like based on
images from the zoo.
○ Reconstruction: It can generate new samples that are similar to those from the
classes it has learned about. For example, it can generate images of lions and
elephants that resemble the ones seen before.
○ Understanding: It has a deeper understanding of the overall structure of the
data and the relationships between different features.
○ Applications: Useful in scenarios where you need to generate new data,
simulate scenarios, or understand the underlying distribution of the data.
Examples include generative adversarial networks (GANs), hidden Markov
models (HMMs), and Naive Bayes classifiers (a Naive Bayes sketch follows
this list).
○ Flexibility: Can be used for tasks like data imputation, anomaly detection, and
more because it understands the data generation process.
○ Advantages:
■ Can handle missing data by generating it.
■ Useful for scenarios where understanding the data generation process is
crucial.
○ Disadvantages:
■ Can be more complex to train due to the need to model the entire
distribution.
■ May require more data and computation.
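A minimal generative sketch using a Naive Bayes classifier built from scratch on an invented four-document spam/ham corpus. Because the model estimates p(y) and p(x|y), it can both classify and sample class-typical words:

```python
# Naive Bayes models the joint distribution p(x, y) via p(y) and p(x|y).
from collections import Counter, defaultdict
import math, random

train = [
    ("buy cheap meds now", "spam"),
    ("cheap offer buy now", "spam"),
    ("meeting agenda attached", "ham"),
    ("see you at the meeting", "ham"),
]

prior = Counter(y for _, y in train)          # class frequencies p(y)
word_counts = defaultdict(Counter)            # per-class word counts for p(x|y)
for text, y in train:
    word_counts[y].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_joint(text, y):
    # log p(y) + sum over words of log p(w|y), with add-one smoothing
    lp = math.log(prior[y] / sum(prior.values()))
    total = sum(word_counts[y].values())
    for w in text.split():
        lp += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
    return lp

def classify(text):
    return max(prior, key=lambda y: log_joint(text, y))

print(classify("cheap meds"))          # spam
print(classify("agenda for meeting"))  # ham

# Because the model is generative, it can also sample class-typical words:
def sample_word(y):
    words, counts = zip(*word_counts[y].items())
    return random.choices(words, weights=counts, k=1)[0]

print(sample_word("spam"))
```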
● Discriminative Approach:
○ This method focuses on distinguishing between different classes based on
the features provided.
○ Learning Process: It learns the conditional probability distribution p(y∣x),
which means it tries to model the probability of a class given the features.
○ Feature Differences: It focuses on learning the differences between classes
by directly analyzing features and their relationships. For instance, it
identifies specific features that differentiate a lion from an elephant.
○ Classification: It is primarily used for making classifications or predictions.
For example, it can classify an unknown animal as a lion or elephant based on its
features.
○ Efficiency: Often requires less data to achieve high accuracy in
classification because it focuses on distinguishing features rather than
understanding the entire data distribution.
○ Applications: Commonly used in tasks like image recognition, spam
detection, and speech recognition.
○ Examples include logistic regression and support vector machines (SVMs); a
logistic-regression sketch follows this list.
○ Performance: Typically performs better in classification tasks where the primary
goal is to distinguish between categories, rather than understanding how each
category is generated.
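A minimal discriminative sketch: logistic regression trained with stochastic gradient descent on invented two-feature data. Note that it models only p(y|x), the decision boundary, and says nothing about how features are distributed within each class:

```python
# Logistic regression models p(y|x) directly: sigmoid(w . x + b).
import math

# toy features, e.g. (count of "cheap", count of "meeting") per document
X = [(2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.0, 1.0)]
y = [1, 1, 0, 0]  # 1 = spam, 0 = ham

w, b, lr = [0.0, 0.0], 0.0, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# stochastic gradient descent on the log loss
for _ in range(200):
    for (x1, x2), target in zip(X, y):
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = p - target           # gradient of the log loss w.r.t. the logit
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def predict(x1, x2):
    return sigmoid(w[0] * x1 + w[1] * x2 + b)

print(predict(3.0, 0.0))  # close to 1 (spam)
print(predict(0.0, 3.0))  # close to 0 (ham)
```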
Summary:
● Generative Models aim to understand and model how data is generated, providing a
deeper insight into the data distribution and enabling data generation.
● Discriminative Models aim to focus on distinguishing between different categories
based on features, often leading to more accurate and efficient classification in practice.
Complexity of Approaches
Evaluation Metrics:
1. Error Rate: Measures the ratio of errors to the total number of examples. Lower
error rates indicate better performance.
2. F1 Measure: The harmonic mean of precision and recall,
F1 = 2 × (precision × recall) / (precision + recall), which balances both metrics in a
single score. Higher F1 scores reflect better performance. A short computation sketch
follows.
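A short, self-contained computation of both metrics on invented gold and predicted labels (class 1 is treated as positive):

```python
# Error rate = wrong predictions / total; F1 = 2PR / (P + R).
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 0, 1, 1, 0]

errors = sum(g != p for g, p in zip(gold, pred))
error_rate = errors / len(gold)

tp = sum(g == p == 1 for g, p in zip(gold, pred))          # true positives
fp = sum(p == 1 and g == 0 for g, p in zip(gold, pred))    # false positives
fn = sum(p == 0 and g == 1 for g, p in zip(gold, pred))    # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"error rate = {error_rate:.2f}")  # 0.25
print(f"F1 = {f1:.2f}")                  # 0.75
```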
Performance Results: