Syntactic Ambiguity
1. Definition: Happens when a sentence can be interpreted in multiple ways because of its structure.
2. Example: "I saw the man with the telescope." (Did I use the telescope to see the man or did the man have
the telescope?)
3. Resolution: Solved by understanding the sentence structure.
4. Focus: On how words are arranged in a sentence.
5. Source: Complex sentence structures with multiple possible meanings.
6. Tools: Sentence parsers help figure out the correct structure.
7. Effect: Changes the overall meaning based on sentence structure.
8. Example: Different possible syntax trees for the same sentence (see the sketch below).
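As an illustration of point 8, the two readings of the telescope sentence correspond to two different parse trees. The following minimal sketch (the grammar labels are my own toy choices, and NLTK is assumed to be installed) draws both with NLTK's Tree class:

```python
from nltk import Tree

# Instrument reading: the PP "with the telescope" attaches to the verb phrase
# (I used the telescope to see the man).
instrument = Tree.fromstring(
    "(S (NP I) (VP (V saw) (NP (Det the) (N man)) (PP with the telescope)))")

# Possession reading: the PP attaches inside the noun phrase
# (the man I saw had the telescope).
possession = Tree.fromstring(
    "(S (NP I) (VP (V saw) (NP (NP (Det the) (N man)) (PP with the telescope))))")

instrument.pretty_print()
possession.pretty_print()
```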
Lexical Ambiguity
1. Definition: Happens when a word has more than one meaning.
2. Example: "Bank" (Could mean a place to store money or the side of a river.)
3. Resolution: Solved by using the surrounding words to figure out which meaning is correct.
4. Focus: On the different meanings of individual words.
5. Source: Words with multiple definitions.
6. Tools: Contextual clues and word sense disambiguation help identify the right meaning (see the sketch after this list).
7. Effect: Changes how we understand individual words in context.
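As a concrete illustration of point 6, the following minimal sketch uses NLTK's implementation of the Lesk word sense disambiguation algorithm (assuming NLTK and its WordNet data are installed; the sentence is just an example):

```python
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

# One-time setup (assumption; package names may vary by NLTK version):
# nltk.download("punkt"); nltk.download("wordnet")
context = word_tokenize("I deposited my salary at the bank yesterday")

# Lesk picks the WordNet sense whose dictionary definition overlaps most
# with the surrounding context words; it is a simple heuristic and can misfire.
sense = lesk(context, "bank")
print(sense, "-", sense.definition())
```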
1. Phonology
• What It Is: Phonology is about the sounds of language and how they are used to convey
meaning.
• Key Ideas:
o Phonemes: The smallest sound units that can change meaning (e.g., "p" vs. "b" in "pat"
and "bat").
o Prosody: The rhythm and tone of speech that add emotion and meaning.
• Use in NLP: Important for making systems that recognize speech or convert text to speech
accurately.
2. Morphology
• What It Is: Morphology studies how words are formed from smaller parts (morphemes).
• Key Ideas:
o Morphemes: The smallest units of meaning (e.g., "un-" + "happy" = "unhappy").
o Inflectional Morphology: Changing a word's form to show tense or number (e.g.,
"run" → "running").
o Derivational Morphology: Creating new words by adding prefixes or suffixes (e.g.,
"happy" → "happiness").
• Use in NLP: Helps with breaking down words and understanding their meanings for tasks like
word stemming and lemmatization.
3. Syntax
• What It Is: Syntax is the study of how words are arranged to form sentences.
• Key Ideas:
o Grammar: The rules for how words combine to form sentences.
o Parsing: Breaking down a sentence to understand its grammatical structure.
• Use in NLP: Essential for tasks like sentence parsing, machine translation, and generating
correct sentences.
4. Semantics
• What It Is: Semantics is the study of the meaning of words, phrases, and sentences.
• Key Ideas:
o Word Meaning: The senses a word can have and the relations between them (e.g., synonyms, antonyms).
o Compositional Meaning: How the meanings of words combine to give the meaning of a whole sentence.
• Use in NLP: Important for word sense disambiguation, semantic search, and question answering.
5. Reasoning
• What It Is: Reasoning in NLP involves making inferences and conclusions from language.
• Key Ideas:
o Logical Reasoning: Using logic to draw conclusions (e.g., if A implies B, and A is
true, then B is true).
o Common-Sense Reasoning: Using general knowledge to understand implied
meanings.
• Use in NLP: Used in advanced applications like virtual assistants and AI systems that need to
make decisions based on language inputs.
The stages of Natural Language Processing (NLP) involve a series of steps that transform raw text into
a form that machines can understand and analyze. Here’s a simple breakdown:
1. Lexical and Morphological Analysis:
• Purpose: Break down text into its basic components like words and morphemes (smallest units of meaning).
• Processes:
o Tokenization: Splitting text into words or sentences.
o Stemming/Lemmatization: Reducing words to their root forms.
2. Syntactic Analysis:
• Purpose: Analyze the structure of sentences to understand how words are arranged and related.
• Processes:
o Grammar Checking: Ensuring sentences are grammatically correct.
o Sentence Parsing: Identifying the grammatical structure (e.g., subject, verb, object).
3. Semantic Analysis:
• Purpose: Extract the literal meaning of sentences and resolve ambiguity in word meanings.
• Processes:
o Word Sense Disambiguation: Choosing the correct meaning of each word in context.
o Meaning Representation: Mapping the sentence to a representation of what it says.
4. Discourse Analysis:
• Purpose: Interpret each sentence in relation to the sentences around it.
• Processes:
o Anaphora Resolution: Working out what pronouns like "it" or "she" refer to.
o Coherence: Checking that successive sentences connect sensibly.
5. Pragmatic Analysis:
• Purpose: Interpret the intended meaning behind the text based on context.
• Processes:
o Speech Act Recognition: Understanding the intention (e.g., a request, command, or
question).
o Contextual Interpretation: Interpreting what is meant based on the situation (e.g.,
understanding sarcasm or indirect requests).
1. Ambiguity:
• Lexical Ambiguity: A word can have multiple meanings (e.g., "bat" can mean an animal or
sports equipment).
• Syntactic Ambiguity: A sentence structure can lead to multiple interpretations (e.g., "Visiting
relatives can be tiring" could mean either the act of visiting is tiring or the relatives themselves
are tiring).
• Semantic Ambiguity: The meaning of a sentence can be unclear (e.g., "He saw her duck"
could mean he saw her pet duck or saw her lower her head).
2. Contextual Understanding:
• Sarcasm and irony are difficult to detect because the literal meaning is often opposite to the
intended meaning. It requires a deeper understanding beyond just the text.
• Idioms and metaphors (e.g., "kick the bucket" means "to die") are not meant to be taken
literally, making them hard for NLP systems to understand.
5. Domain-Specific Knowledge:
• Language differs across domains (e.g., medical, legal), and NLP systems trained in one area
might not perform well in another due to lack of specialized knowledge.
6. Low-Resource Languages:
• Some languages lack large datasets for training, making it harder to develop effective NLP
systems for them.
7. Language Evolution:
• Language changes over time with new words and slang, making it difficult for NLP systems to
stay current.
8. Handling Negation:
• Understanding sentences with negation (e.g., "I don't like pizza" vs. "I like pizza") is tricky, as
it requires recognizing subtle shifts in meaning.
9. Sentiment Analysis:
• Accurately determining sentiment (positive, negative, etc.) is hard, especially when emotions
are mixed, sarcasm is used, or the sentiment depends on context.
10. Privacy and Ethical Concerns:
• Using language data raises privacy and ethical issues, like ensuring models don’t reinforce biases or misuse sensitive information.
1. Phonological Level
• Knowledge: Understanding the sound system of a language and how sounds distinguish meaning.
• Use: Needed for speech recognition and text-to-speech systems.
2. Morphological Level
• Knowledge: Understanding how words are constructed from smaller units like prefixes and
roots.
• Use: Helps break down words into their base forms (e.g., "running" → "run").
3. Syntactic Level
• Knowledge: Knowing how words are arranged in sentences according to grammar rules.
• Use: Essential for figuring out sentence structure and ensuring grammatical correctness.
4. Semantic Level
• Knowledge: Understanding the meaning of words and sentences.
• Use: Helps interpret the actual meaning of text and resolve ambiguities.
5. Pragmatic Level
• Knowledge: Understanding how context and speaker intent shape meaning.
• Use: Helps interpret indirect requests, politeness, sarcasm, and implied meaning.
6. Discourse Level
• Knowledge: Understanding how sentences and parts of a text relate to each other.
• Use: Helps in making sense of the overall flow of a conversation or text.
7. World Knowledge Level
• Knowledge: Using general knowledge about the world to understand references and implied meanings in text.
• Use: Important for answering questions, making inferences, and understanding context beyond
the text itself.
1. Diverse Languages
• Languages: India has 22 major languages and many dialects, each with unique scripts and
grammar.
• Challenges: The diversity makes it difficult to create universal NLP tools that work for all
languages.
2. Resources and Tools
• Data: Large, well-annotated datasets are not always available for every Indian language, making it tough to train models.
• Tools: Translators and text processors are being developed, especially for major languages like
Hindi, Tamil, and Bengali.
3. Applications
• Uses: Machine translation, speech interfaces, and text processing in Indian languages support education, governance, and accessibility.
4. Challenges
• Code-Mixing: Speakers often mix languages (e.g., Hindi and English) in one sentence, which is hard for models to handle.
• Script and Morphology: Varied scripts and rich word structure complicate processing.
5. Progress
• Research: There is increasing effort to improve tools and technology for Indian languages.
• Initiatives: Government and organizations are actively working to develop better language
resources.
8. Explain Tokenization, Stemming, and Lemmatization? OR Write the difference between Stemming and Lemmatization
1. Tokenization
• Description: The process of splitting text into smaller units called tokens. Tokens are usually words or
phrases.
• Purpose: Converts a text string into manageable pieces for further analysis, such as words or sentences.
• Example:
o Text: "NLP is fun." → Tokens: ["NLP", "is", "fun", "."]
2. Stemming
• Description: Reduces words to their base or root form by removing suffixes or prefixes.
• Purpose: Simplifies words to their root form to standardize them and group similar words.
• Approach: Uses heuristics and algorithms to strip suffixes (e.g., "running" → "run").
• Example:
o Word: "running" → Stem: "run"
3. Lemmatization
• Description: Reduces words to their base or dictionary form (lemma) considering the context and part of
speech.
• Purpose: Provides a more accurate base form of a word, taking into account its meaning and grammatical
role.
• Approach: Uses dictionaries and linguistic rules to return the proper base form (e.g., "running" → "run").
• Example:
o Word: "running" → Lemma: "run"
Difference between Stemming and Lemmatization:
• Definition: Stemming cuts off prefixes or suffixes; lemmatization reduces words to their base or dictionary form.
• Method: Stemming uses rules and algorithms; lemmatization uses dictionaries and grammatical rules.
• Usage: Stemming is often used in search engines and information retrieval; lemmatization is often used in natural language processing for precise results.
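A minimal sketch of all three steps using NLTK (assuming NLTK is installed and its tokenizer and WordNet data are downloaded; the sentence is just an example):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time setup (assumption; package names may vary by NLTK version):
# nltk.download("punkt"); nltk.download("wordnet")
text = "The children were running quickly."

tokens = nltk.word_tokenize(text)            # tokenization
print(tokens)                                # ['The', 'children', 'were', 'running', 'quickly', '.']

stemmer = PorterStemmer()                    # stemming: rule-based suffix stripping
print([stemmer.stem(t) for t in tokens])     # e.g. 'running' -> 'run', but 'quickly' -> 'quickli'

lemmatizer = WordNetLemmatizer()             # lemmatization: dictionary + part of speech
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'
print(lemmatizer.lemmatize("children"))          # 'child'
```

Note how stemming can produce non-words ("quickli") while lemmatization returns valid dictionary forms, which matches the comparison above.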
9. What is morphology parsing? & Write short note on Survey of English Morphology
Morphology Parsing
Definition:
• Analyzing the structure of words by breaking them into their smallest meaningful units, known as
morphemes (roots, prefixes, suffixes, inflections).
Purpose:
• Helps understand word formation and how different parts contribute to meaning and grammatical
function.
Applications in NLP:
• Morphological parsing underlies stemming, lemmatization, part-of-speech tagging, and spell checking.
Survey of English Morphology
English words are formed in a few main ways:
1. Inflection:
• Description: Modifications to words to express grammatical features like tense, number, or case.
• Example: "play" (base form) → "played" (past tense), "cat" (singular) → "cats" (plural).
2. Derivation:
• Description: Creating new words by adding prefixes or suffixes to base words.
• Example: "happy" (base form) → "unhappy" (with prefix), "happiness" (with suffix).
3. Compounding:
• Description: Combining two or more words to form a new word with a specific meaning.
• Example: "tooth" + "brush" → "toothbrush".
4. Cliticization:
• Description: Attaching a clitic (a reduced word form) to another word.
• Example: "I" + "have" → "I've".
5. Challenges:
• Irregular Forms: English has exceptions and irregular forms that do not follow standard patterns (e.g.,
"go" → "went").
• Ambiguity: Words can have multiple meanings or forms depending on context (e.g., "bark" as tree's
outer layer vs. a dog's sound).
6. Applications:
• Morphological knowledge supports spell checkers, search engines, and machine translation, where words must be matched across their inflected and derived forms (a naive parsing sketch follows).
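A naive morphological parsing sketch in Python (the suffix list and tags are my own illustrative choices; real parsers use lexicons and finite-state transducers to handle irregular forms like "went"):

```python
# Split a word into a stem plus one known suffix morpheme.
SUFFIXES = {"ness": "+NOM", "ing": "+PROG", "ed": "+PAST", "es": "+PL", "s": "+PL"}

def parse(word):
    # Check longer suffixes first (dict preserves insertion order).
    for suffix, tag in SUFFIXES.items():
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + tag
    return word  # no recognized suffix

print(parse("happiness"))  # 'happi+NOM' (crude: a real parser restores 'happy')
print(parse("played"))     # 'play+PAST'
print(parse("cats"))       # 'cat+PL'
```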
10. Write short note on Regular Expressions
Definition:
• Patterns used to match sequences of characters in text, useful for searching, matching, and
manipulating text.
Basic Components:
1. Literal Characters:
o Definition: Match exact characters.
o Example: hello matches "hello".
2. Metacharacters:
o Definition: Special characters that define patterns.
o Examples: . (any character), ^ (start of a string), $ (end of a string), * (zero or more), + (one or
more), ? (zero or one), | (or), [] (character set), () (grouping), \ (escape).
Applications in NLP:
• Tokenization: Splitting text into words using character patterns.
• Pattern Extraction: Finding dates, phone numbers, or email addresses.
• Text Cleaning: Removing unwanted characters or markup (see the sketch below).
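A short sketch with Python's re module (the patterns and text are illustrative, not exhaustive):

```python
import re

text = "Call me at 555-1234 or email nlp@example.com."

# Crude tokenization: runs of word characters, or single punctuation marks.
print(re.findall(r"\w+|[^\w\s]", text))

# Pattern extraction: a simple phone-number-like pattern.
print(re.findall(r"\d{3}-\d{4}", text))   # ['555-1234']

# Text cleaning: strip everything except letters and spaces.
print(re.sub(r"[^A-Za-z ]", "", text))
```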
11. Write short note on a) finite automata b) N-gram model 3) Finite transducer
a) Finite Automata
Definition:
• A mathematical model that represents regular languages using states, transitions, an initial state, and
accepting states. It processes input symbols and transitions between states based on rules.
Types:
1. Deterministic Finite Automaton (DFA): Each state and input symbol pair has exactly one transition.
2. Nondeterministic Finite Automaton (NFA): Allows multiple transitions for a state and input symbol,
including epsilon (empty string) transitions.
Applications in NLP:
• Pattern Matching and Tokenization: Regular expressions are typically compiled into finite automata for efficient matching.
• Morphological Recognition: Checking whether a string is a valid word form (see the sketch below).
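A minimal DFA sketch in Python (a toy language of my own choosing, not from the notes): it accepts strings over {a, b} that end in "ab", with exactly one transition per (state, symbol) pair, as the DFA definition requires:

```python
# Transition table: (current state, input symbol) -> next state.
transitions = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}
start, accepting = "q0", {"q2"}

def accepts(s):
    state = start
    for ch in s:
        state = transitions[(state, ch)]  # deterministic: exactly one move
    return state in accepting

print(accepts("aab"))   # True: the string ends in "ab"
print(accepts("aba"))   # False
```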
b) N-gram Model
Definition:
• A statistical language model that predicts the next word in a sequence based on the preceding N-1
words. It builds probabilistic models using sequences of N words.
Types:
1. Unigram Model: Considers each word independently (N = 1).
2. Bigram Model: Predicts a word from the single preceding word (N = 2).
3. Trigram Model: Predicts a word from the two preceding words (N = 3).
Applications in NLP:
• Next-word prediction, speech recognition, spelling correction, and machine translation (see the sketch below).
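A short sketch of extracting n-grams of different orders from a tokenized sentence (the sentence is just an example):

```python
# Return all contiguous n-word sequences from a list of tokens.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 1))  # unigrams: ('the',), ('cat',), ...
print(ngrams(tokens, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...
print(ngrams(tokens, 3))  # trigrams: ('the', 'cat', 'sat'), ...
```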
c) Finite Transducer
Definition:
• A computational model that transforms input sequences into output sequences. It extends finite
automata by including output symbols.
Types:
1. Deterministic Finite Transducer (DFT): Each state and input pair has exactly one transition with a
corresponding output.
2. Nondeterministic Finite Transducer (NFT): Allows multiple transitions with possible outputs for each
state and input.
Applications in NLP:
• Text Processing: Used in tasks like morphological analysis to transform words into their root forms.
• Machine Translation: Translates text from one language to another by mapping input text to target
language output.
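A toy deterministic finite transducer in Python (entirely illustrative): each (state, input) pair yields exactly one next state and one output symbol, so the machine maps an input string to an output string of the same length. Here it emits 1 whenever a symbol repeats the previous one:

```python
# (state, input) -> (next state, output symbol); a Mealy-style DFT.
transitions = {
    ("start", "a"): ("sa", "0"), ("start", "b"): ("sb", "0"),
    ("sa", "a"): ("sa", "1"),    ("sa", "b"): ("sb", "0"),
    ("sb", "a"): ("sa", "0"),    ("sb", "b"): ("sb", "1"),
}

def transduce(s):
    state, out = "start", []
    for ch in s:
        state, symbol = transitions[(state, ch)]
        out.append(symbol)
    return "".join(out)

print(transduce("aabba"))  # '01010': 1 marks a repeated symbol
```

Morphological analyzers use the same idea at a larger scale: states track spelling context while outputs emit the stem and grammatical features.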
12. How is the N-gram model used for spelling correction? & also explain the variations of the N-gram model
• Non-Word Errors: Detected by checking each word against a dictionary; strings that are not valid words are flagged.
• Real-Word Errors: Identified by contextual analysis of actual words that are incorrect in context (e.g., "their" used where "there" is intended).
• Chain Rule Formula: Determines the probability of a word sequence based on the probabilities of
preceding words.
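In standard notation, the chain rule and the bigram (N = 2) approximation the notes refer to:

$$P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$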
Variations of the N-gram Model:
1. Unigram Model:
o Description: Considers single words independently.
o Application: Useful for simple frequency counts and basic predictions.
o Limitation: Lacks contextual information, which limits its predictive power.
2. Bigram Model:
o Description: Considers pairs of consecutive words.
o Application: Captures some context and dependencies between words.
o Example: Predicts the next word based on the preceding word.
o Limitation: Still limited in capturing longer context dependencies.
3. Trigram Model:
o Description: Considers triples of consecutive words.
o Application: Provides a better understanding of context by incorporating more previous
words.
o Example: Predicts the next word based on the previous two words.
o Limitation: Requires more data and memory, and may not handle very long contexts well.
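A minimal sketch of bigram-based ranking for spelling correction (the corpus, candidate list, and add-one smoothing are my own toy choices; candidate generation by edit distance is assumed to have already produced the list):

```python
from collections import Counter

# Toy corpus; a real system would train on a large text collection.
corpus = "i like pizza i like pasta they like pizza".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) with add-one smoothing over the vocabulary.
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

# Rank correction candidates for a typo like "pizzza" seen after "like".
candidates = ["pizza", "pasta"]  # e.g., dictionary words near the typo
best = max(candidates, key=lambda w: bigram_prob("like", w))
print(best)  # 'pizza': it follows 'like' more often in the corpus
```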
Effect of the Training Corpus:
• Coverage: A larger corpus covers more of the word sequences the model may encounter.
• Prediction Accuracy: More data gives more reliable probability estimates for each n-gram.
• Model Complexity:
o Large Corpus: Supports more complex models (higher-order n-grams) that understand longer sequences.
o Small Corpus: Limits the model to simpler forms, capturing less context.
• Domain Adaptation:
o Domain-Specific Corpus: Helps the model understand specific terms and contexts (like medical or legal terms).
o General Corpus: May not perform as well in specialized fields.
• Corpus Balance:
o Balanced Corpus: Includes a mix of common and rare sequences, making the model more robust.
o Biased Corpus: Skewed towards certain topics, which can limit the model’s performance in other areas.