ML QBF

1. Differentiate between Syntactic ambiguity and lexical ambiguity.

Syntactic Ambiguity
1. Definition: Happens when a sentence can be interpreted in multiple ways because of its structure.
2. Example: "I saw the man with the telescope." (Did I use the telescope to see the man or did the man have
the telescope?)
3. Resolution: Solved by understanding the sentence structure.
4. Focus: On how words are arranged in a sentence.
5. Source: Complex sentence structures with multiple possible meanings.
6. Tools: Sentence parsers help figure out the correct structure.
7. Effect: Changes the overall meaning based on sentence structure.
8. Representation: The same sentence can yield different possible syntax trees.

Lexical Ambiguity
1. Definition: Happens when a word has more than one meaning.
2. Example: "Bank" (Could mean a place to store money or the side of a river.)
3. Resolution: Solved by using the surrounding words to figure out which meaning is correct.
4. Focus: On the different meanings of individual words.
5. Source: Words with multiple definitions.
6. Tools: Contextual clues and word sense disambiguation help identify the right meaning.
7. Effect: Changes how we understand individual words in context.

2. What is NLP? What are the applications of NLP?


Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses
on the interaction between computers and humans using natural language. The goal of NLP is
to enable machines to understand, interpret, and generate human language in a way that is both
meaningful and useful. NLP combines computational linguistics, machine learning, and deep
learning techniques to process and analyze large amounts of natural language data.
NLP encompasses various tasks, including but not limited to text analysis, speech recognition,
language translation, sentiment analysis, and more. It deals with both the syntactic (structure)
and semantic (meaning) aspects of language, allowing computers to understand not just the
words, but also the context in which they are used.
Key applications of NLP include:
• Machine Translation: Automatically translates text from one language to another.
• Database Access: Enables natural language queries to retrieve data from databases.
• Information Retrieval: Selects and retrieves relevant documents based on a user's query.
• Text Categorization: Sorts and classifies text into predefined topic categories.
• Extracting Data from Text: Converts unstructured text into structured data for analysis.
• Spoken Language Control Systems: Allows voice commands to control devices and
systems.
• Spelling and Grammar Checkers: Detects and corrects spelling and grammatical errors in
text.

3. What are the levels of NLP?

1. Phonology

• What It Is: Phonology is about the sounds of language and how they are used to convey
meaning.
• Key Ideas:
o Phonemes: The smallest sound units that can change meaning (e.g., "p" vs. "b" in "pat"
and "bat").
o Prosody: The rhythm and tone of speech that add emotion and meaning.
• Use in NLP: Important for making systems that recognize speech or convert text to speech
accurately.

2. Morphology

• What It Is: Morphology studies how words are formed from smaller parts (morphemes).
• Key Ideas:
o Morphemes: The smallest units of meaning (e.g., "un-" + "happy" = "unhappy").
o Inflectional Morphology: Changing a word's form to show tense or number (e.g.,
"run" → "running").
o Derivational Morphology: Creating new words by adding prefixes or suffixes (e.g.,
"happy" → "happiness").
• Use in NLP: Helps with breaking down words and understanding their meanings for tasks like
word stemming and lemmatization.

3. Syntax

• What It Is: Syntax is the study of how words are arranged to form sentences.
• Key Ideas:
o Grammar: The rules for how words combine to form sentences.
o Parsing: Breaking down a sentence to understand its grammatical structure.
• Use in NLP: Essential for tasks like sentence parsing, machine translation, and generating
correct sentences.

4. Semantics

• What It Is: Semantics is about the meaning of words and sentences.
• Key Ideas:
o Word Sense Disambiguation: Figuring out the correct meaning of a word based on
context.
o Semantic Roles: Understanding who did what to whom in a sentence.
• Use in NLP: Critical for understanding and processing meaning in tasks like sentiment
analysis and information retrieval.

5. Reasoning

• What It Is: Reasoning in NLP involves making inferences and conclusions from language.
• Key Ideas:
o Logical Reasoning: Using logic to draw conclusions (e.g., if A implies B, and A is
true, then B is true).
o Common-Sense Reasoning: Using general knowledge to understand implied
meanings.
• Use in NLP: Used in advanced applications like virtual assistants and AI systems that need to
make decisions based on language inputs.

4. Explain the stages of NLP?

The stages of Natural Language Processing (NLP) involve a series of steps that transform raw text into
a form that machines can understand and analyze. Here’s a simple breakdown:

1. Morphological and Lexical Analysis:

• Purpose: Break down text into its basic components like words and morphemes (smallest units
of meaning).
• Processes:
o Tokenization: Splitting text into words or sentences.
o Stemming/Lemmatization: Reducing words to their root forms.

2. Syntactic Analysis (Parsing):

• Purpose: Analyze the structure of sentences to understand how words are arranged and related.
• Processes:
o Grammar Checking: Ensuring sentences are grammatically correct.
o Sentence Parsing: Identifying the grammatical structure (e.g., subject, verb, object).

3. Semantic Analysis:

• Purpose: Understand the meaning of words and sentences.
• Processes:
o Word Sense Disambiguation: Figuring out the correct meaning of a word based on
context.
o Semantic Role Labeling: Identifying the role of entities in the sentence (e.g., who did
what to whom).
4. Discourse Integration:

• Purpose: Understand the context and flow of multiple sentences or dialogue.
• Processes:
o Context Tracking: Keeping track of references (e.g., "it" in a sentence referring to
something mentioned earlier).
o Coherence and Cohesion: Ensuring that sentences connect logically.

5. Pragmatic Analysis:

• Purpose: Interpret the intended meaning behind the text based on context.
• Processes:
o Speech Act Recognition: Understanding the intention (e.g., a request, command, or
question).
o Contextual Interpretation: Interpreting what is meant based on the situation (e.g.,
understanding sarcasm or indirect requests).

5. Explain the challenges in NLP?

1. Ambiguity:

• Lexical Ambiguity: A word can have multiple meanings (e.g., "bat" can mean an animal or
sports equipment).
• Syntactic Ambiguity: A sentence structure can lead to multiple interpretations (e.g., "Visiting
relatives can be tiring" could mean either the act of visiting is tiring or the relatives themselves
are tiring).
• Semantic Ambiguity: The meaning of a sentence can be unclear (e.g., "He saw her duck"
could mean he saw her pet duck or saw her lower her head).

2. Contextual Understanding:

• Understanding the context of language, including previous sentences, cultural knowledge, or situations, is hard for machines, leading to misunderstandings.

3. Sarcasm and Irony:

• Sarcasm and irony are difficult to detect because the literal meaning is often opposite to the
intended meaning. It requires a deeper understanding beyond just the text.

4. Idioms and Figurative Language:

• Idioms and metaphors (e.g., "kick the bucket" means "to die") are not meant to be taken
literally, making them hard for NLP systems to understand.

5. Domain-Specific Knowledge:
• Language differs across domains (e.g., medical, legal), and NLP systems trained in one area
might not perform well in another due to lack of specialized knowledge.

6. Low-Resource Languages:

• Some languages lack large datasets for training, making it harder to develop effective NLP
systems for them.

7. Language Evolution:

• Language changes over time with new words and slang, making it difficult for NLP systems to
stay current.

8. Handling Negation:

• Understanding sentences with negation (e.g., "I don't like pizza" vs. "I like pizza") is tricky, as
it requires recognizing subtle shifts in meaning.

9. Sentiment Analysis:

• Accurately determining sentiment (positive, negative, etc.) is hard, especially when emotions
are mixed, sarcasm is used, or the sentiment depends on context.

10. Data Privacy and Ethics:

• Using language data raises privacy and ethical issues, like ensuring models don’t reinforce
biases or misuse sensitive information.

6. Explain the knowledge levels in NLP?

1. Phonological Level

• Knowledge: Understanding the sounds of language, including pronunciation and rhythm.
• Use: Important for speech recognition and generating speech.

2. Morphological Level

• Knowledge: Understanding how words are constructed from smaller units like prefixes and
roots.
• Use: Helps break down words into their base forms (e.g., "running" → "run").

3. Syntactic Level

• Knowledge: Knowing how words are arranged in sentences according to grammar rules.
• Use: Essential for figuring out sentence structure and ensuring grammatical correctness.

4. Semantic Level
• Knowledge: Understanding the meaning of words and sentences.
• Use: Helps interpret the actual meaning of text and resolve ambiguities.

5. Pragmatic Level

• Knowledge: Interpreting language based on context and the speaker's intentions.
• Use: Crucial for understanding implied meanings, such as sarcasm or indirect requests.

6. Discourse Level

• Knowledge: Understanding how sentences and parts of a text relate to each other.
• Use: Helps in making sense of the overall flow of a conversation or text.

7. World Knowledge Level

• Knowledge: Using general knowledge about the world to understand references and implied
meanings in text.
• Use: Important for answering questions, making inferences, and understanding context beyond
the text itself.

7. Write a short note on Indian language processing?

1. Diverse Languages

• Languages: India has 22 major languages and many dialects, each with unique scripts and
grammar.
• Challenges: The diversity makes it difficult to create universal NLP tools that work for all
languages.

2. Resources and Tools

• Data: Large, well-annotated datasets are not always available for every Indian language,
making it tough to train models.
• Tools: Translators and text processors are being developed, especially for major languages like
Hindi, Tamil, and Bengali.

3. Applications

• Translation: Translating text between Indian languages and English.
• Speech Recognition: Converting spoken Indian languages into text.
• Sentiment Analysis: Analyzing opinions expressed in Indian languages on social media and
other platforms.

4. Challenges

• Data Scarcity: Many Indian languages lack sufficient digital resources.
• Script Differences: Different scripts and spelling variations add complexity.
• Dialects: Numerous regional dialects make it difficult to create a single solution for all.

5. Progress

• Research: There is increasing effort to improve tools and technology for Indian languages.
• Initiatives: Government and organizations are actively working to develop better language
resources.

8. Explain Tokenization, Stemming, and Lemmatization? OR Write the difference between Stemming and Lemmatization.

1. Tokenization

• Description: The process of splitting text into smaller units called tokens. Tokens are usually words or
phrases.

• Purpose: Converts a text string into manageable pieces for further analysis, such as words or sentences.

• Example:

o Text: "The quick brown fox."

o Tokens: ["The", "quick", "brown", "fox"]

2. Stemming

• Description: Reduces words to their base or root form by removing suffixes or prefixes.

• Purpose: Simplifies words to their root form to standardize them and group similar words.

• Approach: Uses heuristics and algorithms to strip suffixes (e.g., "running" → "run").

• Example:

o Word: "running"

o Stemmed Word: "run"

3. Lemmatization

• Description: Reduces words to their base or dictionary form (lemma) considering the context and part of
speech.

• Purpose: Provides a more accurate base form of a word, taking into account its meaning and grammatical
role.

• Approach: Uses dictionaries and linguistic rules to return the proper base form (e.g., "running" → "run").

• Example:

o Word: "running"

o Lemma: "run" (contextually correct base form)
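To make the three operations concrete, here is a minimal sketch using Python's NLTK library (assuming nltk is installed and its 'punkt' tokenizer and 'wordnet' data have been downloaded; other toolkits would work similarly):

# Minimal sketch with NLTK; assumes nltk is installed and that
# nltk.download('punkt') and nltk.download('wordnet') have been run.
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

tokens = word_tokenize("The children are running")   # 1. Tokenization
print(tokens)                                        # ['The', 'children', 'are', 'running']

stemmer = PorterStemmer()                            # 2. Stemming
print(stemmer.stem("studies"))                       # 'studi' (may produce a non-word)

lemmatizer = WordNetLemmatizer()                     # 3. Lemmatization
print(lemmatizer.lemmatize("studies", pos="v"))      # 'study' (a real dictionary word)

Note how the stemmer can output a non-word ("studi") while the lemmatizer returns a valid dictionary form ("study"), which is exactly the contrast summarized in the table below.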


Feature     | Stemming                                                | Lemmatization
Definition  | Cuts off prefixes or suffixes                           | Reduces words to their base or dictionary form
Method      | Uses rules and algorithms                               | Uses dictionaries and grammatical rules
Example     | "Running" → "run"                                       | "Running" → "run"
Output      | May produce non-words                                   | Produces real, valid words
Speed       | Generally faster                                        | Generally slower
Accuracy    | Less accurate; may not consider context                 | More accurate; considers context and part of speech
Complexity  | Simple process                                          | More complex process
Usage       | Often used in search engines and information retrieval | Often used in NLP tasks needing precise results

9. What is morphology parsing? & Write a short note on the Survey of English Morphology

Morphology Parsing

Definition:

• Analyzing the structure of words by breaking them into their smallest meaningful units, known as
morphemes (roots, prefixes, suffixes, inflections).

Purpose:

• Helps understand word formation and how different parts contribute to meaning and grammatical
function.

Applications in NLP:

• Part-of-Speech Tagging: Identifying the grammatical roles of words in a sentence.
• Named Entity Recognition: Detecting and classifying entities such as names and locations.
• Machine Translation: Translating text by understanding word structures in both the source and target languages.
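As a toy illustration of morphology parsing, the sketch below strips one known prefix and one known suffix to expose a rough root. The rule lists are hypothetical, and a real analyzer would also normalize spelling changes (e.g., "happi" → "happy"):

# Toy morphological parser with made-up rule lists (not a full analyzer).
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ed", "s"]

def parse(word):
    parts = []
    for p in PREFIXES:
        if word.startswith(p):            # strip one recognized prefix
            parts.append(p + "-")
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s):              # strip one recognized suffix
            suffix = "-" + s
            word = word[: -len(s)]
            break
    parts.append(word)                    # what remains approximates the root
    if suffix:
        parts.append(suffix)
    return parts

print(parse("unhappiness"))  # ['un-', 'happi', '-ness']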

Survey of English Morphology in NLP

1. Inflection:

• Description: Modifications to words to express grammatical features like tense, number, or case.
• Example: "play" (base form) → "played" (past tense), "cat" (singular) → "cats" (plural).

2. Derivation:
• Description: Creating new words by adding prefixes or suffixes to base words.
• Example: "happy" (base form) → "unhappy" (with prefix), "happiness" (with suffix).

3. Compounding:

• Description: Combining two or more words to form a new word with a specific meaning.
• Example: "tooth" + "brush" → "toothbrush".

4. Morphological Analysis Tools:

• Tokenizers: Break text into words and morphemes.
• Stemmers: Reduce words to their base form (e.g., "running" → "run").
• Lemmatizers: Convert words to their dictionary form based on context (e.g., "running" → "run").

5. Challenges:

• Irregular Forms: English has exceptions and irregular forms that do not follow standard patterns (e.g.,
"go" → "went").
• Ambiguity: Words can have multiple meanings or forms depending on context (e.g., "bark" as tree's
outer layer vs. a dog's sound).

6. Applications:

• Search Engines: Improve search accuracy by understanding word variations.
• Text Analysis: Enhance understanding and processing of text in applications like sentiment analysis
and machine translation.

10. Explain Regular expressions with their types?

Regular Expressions in NLP

Definition:

• Patterns used to match sequences of characters in text, useful for searching, matching, and
manipulating text.

Basic Components:

1. Literal Characters:
o Definition: Match exact characters.
o Example: hello matches "hello".
2. Metacharacters:
o Definition: Special characters that define patterns.
o Examples: . (any character), ^ (start of a string), $ (end of a string), * (zero or more), + (one or
more), ? (zero or one), | (or), [] (character set), () (grouping), \ (escape).

Types of Regular Expressions:

1. Basic Character Matching:
o Literal Matching: Matches specific characters.
▪ Example: cat matches "cat".
o Character Classes: Matches any one of a set of characters.
▪ Example: [abc] matches "a", "b", or "c".
2. Quantifiers:
o Asterisk *: Matches zero or more occurrences.
▪ Example: a* matches "", "a", "aa", etc.
o Plus +: Matches one or more occurrences.
▪ Example: a+ matches "a", "aa", "aaa", etc.
o Question Mark ?: Matches zero or one occurrence.
▪ Example: a? matches "" or "a".
3. Anchors:
o Caret ^: Matches the start of a string.
▪ Example: ^cat matches "cat" at the beginning.
o Dollar $: Matches the end of a string.
▪ Example: cat$ matches "cat" at the end.
4. Groups and Ranges:
o Parentheses (): Groups parts of a pattern.
▪ Example: (abc)+ matches "abc", "abcabc", etc.
o Brackets []: Defines a set of characters.
▪ Example: [a-z] matches any lowercase letter.
5. Special Characters:
o Dot .: Matches any single character except newline.
▪ Example: a.c matches "abc", "a-c", etc.
o Backslash \: Escapes special characters or denotes special sequences.
▪ Example: \d matches any digit (0-9).
6. Alternation:
o Pipe |: Matches either the pattern before or after the pipe.
▪ Example: cat|dog matches "cat" or "dog".
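The components above combine naturally in practice. A short Python sketch using the standard re module (the sample text and patterns are illustrative assumptions):

import re

text = "Contact alice@example.com or bob@example.org by 2024-12-31."

# Character classes, quantifiers, and literals combined in one pattern:
print(re.findall(r"[\w.]+@[\w.]+", text))        # ['alice@example.com', 'bob@example.org']

# Grouping with () and the \d escape for digits:
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
print(date.group(1))                             # '2024' (first captured group)

# Anchors and alternation:
print(bool(re.match(r"^Contact", text)))         # True: string starts with 'Contact'
print(re.sub(r"alice|bob", "[NAME]", text))      # replaces either name with '[NAME]'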

Applications in NLP:

• Text Preprocessing: Cleaning and preparing text data.
• Tokenization: Splitting text into tokens based on patterns.
• Data Extraction: Extracting specific information from text based on patterns.

11. Write a short note on a) Finite Automata b) N-gram Model c) Finite Transducer

a) Finite Automata

Definition:

• A mathematical model that represents regular languages using states, transitions, an initial state, and
accepting states. It processes input symbols and transitions between states based on rules.

Types:

1. Deterministic Finite Automaton (DFA): Each state and input symbol pair has exactly one transition.
2. Nondeterministic Finite Automaton (NFA): Allows multiple transitions for a state and input symbol,
including epsilon (empty string) transitions.

Applications in NLP:

• Pattern Matching: Identifying and matching patterns in text.
• Tokenization: Segmenting text into words or tokens.
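A DFA is naturally expressed as a transition table. The sketch below is a hypothetical example: a DFA that accepts binary strings containing an even number of 1s.

# Toy DFA: accepts binary strings with an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

def accepts(string, start="even", accepting=("even",)):
    state = start
    for symbol in string:
        # Deterministic: exactly one transition per (state, symbol) pair.
        state = TRANSITIONS[(state, symbol)]
    return state in accepting

print(accepts("1010"))  # True  (two 1s)
print(accepts("0100"))  # False (one 1)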

b) N-gram Model

Definition:

• A statistical language model that predicts the next word in a sequence based on the preceding N-1
words. It builds probabilistic models using sequences of N words.

Types:

1. Unigram: Considers individual words (N=1).
2. Bigram: Considers pairs of consecutive words (N=2).
3. Trigram: Considers triples of consecutive words (N=3), and so on.

Applications in NLP:

• Text Prediction: Suggests the next word in text input.
• Speech Recognition: Improves accuracy by predicting likely word sequences.
• Text Generation: Creates coherent text sequences based on learned patterns.
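A minimal bigram sketch over a toy corpus (the corpus and numbers are illustrative assumptions): the probability of a next word is estimated as count(previous, next) / count(previous).

from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of consecutive word pairs
unigrams = Counter(corpus)                   # counts of single words

def bigram_prob(prev, nxt):
    # Maximum-likelihood estimate: C(prev, nxt) / C(prev)
    return bigrams[(prev, nxt)] / unigrams[prev]

print(bigram_prob("the", "cat"))  # 0.667: 'the' occurs 3 times, followed by 'cat' twice
print(bigram_prob("cat", "sat"))  # 0.5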

c) Finite Transducer

Definition:

• A computational model that transforms input sequences into output sequences. It extends finite
automata by including output symbols.

Types:

1. Deterministic Finite Transducer (DFT): Each state and input pair has exactly one transition with a
corresponding output.
2. Nondeterministic Finite Transducer (NFT): Allows multiple transitions with possible outputs for each
state and input.

Applications in NLP:

• Text Processing: Used in tasks like morphological analysis to transform words into their root forms.
• Machine Translation: Translates text from one language to another by mapping input text to target
language output.
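A tiny hypothetical transducer sketch: unlike a plain automaton, each transition also emits an output symbol, so the machine maps input strings to output strings. Here the output depends on the current state, alternating letter case.

# Toy deterministic finite transducer: two states, each transition
# consumes one input symbol and emits one output symbol.
def transduce(string):
    state = "UP"
    out = []
    for ch in string:
        if state == "UP":
            out.append(ch.upper())   # output attached to this transition
            state = "LOW"
        else:
            out.append(ch.lower())
            state = "UP"
    return "".join(out)

print(transduce("banana"))  # 'BaNaNa'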

12. How is the N-gram model used for spelling correction? Also explain the variations of the N-gram model.

Types of Spelling Errors:

• Non-Word Errors: Strings that are not valid words of the language (e.g., "graffe" for "giraffe"), typically detected by dictionary lookup.
• Real-Word Errors: Valid words that are incorrect in their context (e.g., "their" used where "there" is intended), identified by contextual analysis.

N-gram Model for Error Detection:

• Usage: Helps detect both non-word and real-word errors.
• Method: Analyzes letter-combination frequencies in a large corpus to identify unlikely or rare sequences.
• Example: Flagging non-word errors that contain rare or unseen letter bigrams or trigrams, as in the sketch below.
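A sketch of this detection idea at the letter level (the mini word list is an illustrative assumption; a real system would compile the table from a large corpus or dictionary):

from collections import Counter

lexicon = ["hello", "help", "held", "the", "then", "there"]
letter_bigrams = Counter(b for w in lexicon for b in zip(w, w[1:]))

def unseen_bigrams(word):
    # Letter pairs never observed in the lexicon suggest a non-word error.
    return [a + b for a, b in zip(word, word[1:]) if letter_bigrams[(a, b)] == 0]

print(unseen_bigrams("helq"))  # ['lq']: unseen pair, likely a typo
print(unseen_bigrams("help"))  # []: every pair is attested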

Training Data for N-gram Models:

• Requirement: Requires a large corpus or dictionary.
• Purpose: Helps compile an N-gram table of possible letter combinations for accurate detection and correction.

N-gram Probability Calculation:

• Chain Rule Formula: Determines the probability of a word sequence based on the probabilities of
preceding words.
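Written out for a word sequence w1 … wn, with the N-gram (Markov) approximation applied in the second step:

P(w1 … wn) = P(w1) · P(w2 | w1) · P(w3 | w1 w2) · … · P(wn | w1 … wn-1)
           ≈ ∏i P(wi | wi-N+1 … wi-1)

That is, each word is conditioned only on the N-1 words before it.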

Variations of N-gram Models

1. Unigram Model:
o Description: Considers single words independently.
o Application: Useful for simple frequency counts and basic predictions.
o Limitation: Lacks contextual information, which limits its predictive power.
2. Bigram Model:
o Description: Considers pairs of consecutive words.
o Application: Captures some context and dependencies between words.
o Example: Predicts the next word based on the preceding word.
o Limitation: Still limited in capturing longer context dependencies.
3. Trigram Model:
o Description: Considers triples of consecutive words.
o Application: Provides a better understanding of context by incorporating more previous
words.
o Example: Predicts the next word based on the previous two words.
o Limitation: Requires more data and memory, and may not handle very long contexts well.

13. Explain N-gram Sensitivity to the Training Corpus

1. Coverage:

• Large Corpus: Includes many word sequences, improving predictions.
• Small Corpus: Might miss many sequences, leading to less accurate predictions.

2. Handling Unseen Sequences:

• Small Corpus: May not cover all possible sequences, making it hard to handle new or rare combinations.
• Solution: Use smoothing techniques (e.g., add-one/Laplace smoothing, backoff, or interpolation) to estimate probabilities for sequences not seen in the training data, as sketched below.
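A minimal add-one (Laplace) smoothing sketch for bigram probabilities; the counts and vocabulary size below are hypothetical:

# Add-one (Laplace) smoothing: every bigram gets a pseudo-count of 1,
# so bigrams unseen in training no longer get zero probability.
def laplace_bigram_prob(bigram_count, prev_word_count, vocab_size):
    return (bigram_count + 1) / (prev_word_count + vocab_size)

print(laplace_bigram_prob(0, 3, 10))  # unseen bigram: 1/13 ≈ 0.077
print(laplace_bigram_prob(2, 3, 10))  # seen bigram:   3/13 ≈ 0.231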

3. Prediction Accuracy:

• Well-Represented Corpus: Provides better predictions by including a wide range of contexts.
• Underrepresented Corpus: Results in less accurate predictions due to missing information.

4. Model Complexity:

• Large Corpus: Supports more complex models that understand longer sequences.
• Small Corpus: Limits the model to simpler forms, capturing less context.

5. Domain Adaptation:

• Domain-Specific Corpus: Helps the model understand specific terms and contexts (like medical or legal terms).
• General Corpus: May not perform as well in specialized fields.

6. Frequency and Distribution:

• Balanced Corpus: Includes a mix of common and rare sequences, making the model more robust.
• Biased Corpus: Skewed towards certain topics, which can limit the model's performance in other areas.
