NLP - Sem

Unit – 2

1. Differentiate between inflectional morphology and derivational morphology

• Definition – Inflectional: adds inflectional morphemes to a word to indicate grammatical features like tense, number, gender, or mood, without changing the word's category. Derivational: adds derivational morphemes to create a new word, which may change the word's meaning or grammatical category.
• Purpose – Inflectional: primarily for grammatical variation (e.g., tense, number, case). Derivational: creates new words or changes the word's meaning and/or grammatical category.
• Effect on Word Meaning – Inflectional: does not usually change the core meaning of the word, only adds grammatical distinctions. Derivational: often changes the core meaning or the syntactic category of the word.
• Grammatical Category – Inflectional: does not change the grammatical category of the word (a verb stays a verb, a noun stays a noun). Derivational: can change the grammatical category of the word (e.g., noun to verb, adjective to noun).
• Examples – Inflectional: Cat → Cats (plural), Walk → Walked (past tense), Happy → Happier (comparative). Derivational: Help → Helper (noun), Logic → Logical (adjective), Pure → Impure (meaning change).
• Position in Sentence – Inflectional: does not change how a word functions within a sentence (e.g., subject, object). Derivational: can affect the function of the word in the sentence (e.g., noun, verb, adjective).
• Morphological Affixes – Inflectional: includes suffixes such as "-s," "-ed," "-ing," "-est." Derivational: includes affixes like "-er," "-ly," "-ness," "-ful."
• Entry in Dictionary – Inflectional: inflected forms of a word are typically not listed separately in the dictionary. Derivational: derived forms of a word are usually listed as separate entries in the dictionary.
• Examples of Suffixes – Inflectional: -s (plural), -ed (past tense), -ing (present participle), -est (superlative). Derivational: -er (worker), -ly (quickly), -ness (happiness), -ful (careful).

2. Explain the concept of lemmatization with examples.


Lemmatization is a text preprocessing technique in Natural Language Processing (NLP) where words are
reduced to their base or dictionary form (lemma) while considering their meaning and grammatical role. It
ensures the root form is a valid dictionary word.
Key Features:
• Uses a dictionary and part-of-speech (POS) information to determine the correct root form.
• More accurate than stemming because it considers the meaning of the word.
• Always produces valid dictionary words.
• Slower as it requires a dictionary lookup and contextual understanding.
• Used in applications that need precise text processing, such as machine translation and sentiment
analysis.
How Lemmatization Works:
1. Identifies the word’s part of speech (POS) (e.g., noun, verb, adjective).
2. Looks up the dictionary form (lemma) based on POS and meaning.
3. Returns the correct root word, ensuring it exists in the dictionary.
Examples:
• "running" → "run" (verb)
• "better" → "good" (adjective)
• "mice" → "mouse" (noun)

3. What is the role of regular expressions in NLP? Provide examples.


Regular Expressions (RegEx) are a powerful tool in Natural Language Processing (NLP) used for pattern
matching, text extraction, and text preprocessing. They help in identifying and manipulating specific text
patterns efficiently.
Key Roles of Regular Expressions in NLP:
1. Tokenization – Splitting text into words or sentences.
o Example: re.split(r'\s+', "Natural Language Processing is amazing")
Output: ['Natural', 'Language', 'Processing', 'is', 'amazing']
2. Text Cleaning: Removing special characters, extra spaces, or unwanted symbols.
o Example: re.sub(r'[^a-zA-Z0-9\s]', '', "Hello! How's NLP?")
Output: "Hello Hows NLP"
3. Pattern matching – Extracting useful information like emails, dates, phone numbers.
o Example (Extracting emails):
o re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', "Contact me at user@example.com")
Output: ['user@example.com']
4. Text Normalization: Standardizing formats, such as converting all text to lowercase
o Example (Converting multiple spaces into a single space):
o re.sub(r'\s+', ' ', "This is NLP ")
Output: "This is NLP"
5. Extracting Numeric Data – Useful for financial and statistical NLP tasks.
o Example: Extracting numbers from text:
o re.findall(r'\d+', "The price is 250 dollars and the discount is 20%")
Output: ['250', '20']
Common Functions in NLP using Regular Expressions:
1. findall()
• findall() searches for all occurrences of a pattern in a given string and returns them as a list.
• It is useful for extracting specific patterns from text.
Syntax: re.findall(pattern, string)
• pattern: The regular expression pattern to search for.
• string: The input string where the search is performed.
• Returns: A list of matched substrings.
2. sub()
• sub() replaces all occurrences of a pattern in a string with a specified replacement text.
• It is useful for text cleaning and normalization.
Syntax: re.sub(pattern, replacement, string)
• pattern: The regular expression pattern to replace.
• replacement: The string that will replace the matched pattern.
• string: The input string where the replacements are applied.
• Returns: A new string with replacements.
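A small runnable sketch tying these two functions together (the sample text and address are illustrative):

import re

text = "Contact me at user@example.com about the 250 dollar order."
print(re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text))
# ['user@example.com']
print(re.findall(r'\d+', text))                        # ['250']
print(re.sub(r'\s+', ' ', "This   is   NLP").strip())  # 'This is NLP'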

4. Define finite automata and its use in NLP.


A Finite Automaton (FA) is a mathematical model of computation used to recognize patterns in input data.
It consists of states, transitions, an initial state, and one or more final states. FA processes input symbols
one by one and determines whether the input belongs to a specific pattern or language.
Types of Finite Automata:
1. Deterministic Finite Automaton (DFA) – Each input leads to exactly one next state.
2. Non-Deterministic Finite Automaton (NFA) – Each input can lead to multiple possible states.
Use of Finite Automata in NLP:
Finite Automata are widely used in NLP for tasks involving pattern recognition and text processing:
• Lexical Analysis: Used in tokenization to recognize keywords, identifiers, and operators.
• Regular Expressions Processing: Helps implement regex for text searching and matching.
• Text Parsing: Used in syntax analysis and structured text processing.
• Spell Checking: Helps in recognizing valid and invalid words.
Thus, Finite Automata help in efficient text and language processing by recognizing structured patterns,
making them crucial in NLP applications like parsing, lexical analysis, and information retrieval.
Example of "baa+" in Finite Automata
The regular expression "baa+" represents strings that:
• Start with "b"
• Have exactly one "a" after "b"
• Followed by one or more occurrences of "a"
Examples of valid strings matching "baa+":
1. "baa" (Starts with "b", followed by two "a"s)
2. "baaa" (Starts with "b", followed by three "a"s)
3. "baaaaa" (Starts with "b", followed by multiple "a"s)
Examples of invalid strings:
1. "b" (Missing "a+")
2. "ba" (Needs at least two "a"s)
3. "bbbaaa" (Does not start exactly with "b")
4. "aaa" (Does not start with "b")
Daw state transition diagram: q0 → (b) → q1 → (a) → q2 → (a)+ → q2 (Final State)
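The same automaton can be written directly as a transition function. The sketch below is a minimal hand-coded DFA for "baa+"; the state names follow the diagram above:

def accepts_baa_plus(s):
    state = "q0"
    for ch in s:
        if state == "q0" and ch == "b":
            state = "q1"                 # consumed the leading 'b'
        elif state == "q1" and ch == "a":
            state = "q2"                 # first mandatory 'a'
        elif state in ("q2", "q3") and ch == "a":
            state = "q3"                 # one or more further 'a's
        else:
            return False                 # no valid transition: reject
    return state == "q3"                 # q3 is the only accepting state

print(accepts_baa_plus("baa"))   # True
print(accepts_baa_plus("ba"))    # False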

5. How does a finite state transducer (FST) help in morphological parsing?


A Finite State Transducer (FST) is an extension of a Finite Automaton that processes input sequences and
produces corresponding output sequences. Unlike Finite State Automata (FSA), which only recognize
patterns, FST maps input to output, making it useful for language processing tasks such as morphological
parsing.
Role of FST in Morphological Parsing:
Morphological parsing is the process of analyzing the internal structure of words by breaking them into
root words, prefixes, suffixes, and grammatical features. FST helps in this by mapping surface forms (words
as they appear) to their lexical forms (root word + morphological features).
How FST Works in Morphological Parsing:
1. Segmentation: Breaks words into root morphemes and affixes.
2. Mapping: Translates the surface form (actual word) into the lexical representation.
3. Two-Level Representation:
o Input: The actual word (e.g., "running").
o Output: Root + morphological features (e.g., "run + ING").
Examples of FST in Morphological Parsing:
Surface Form (Input) → Lexical Form (Output)
• "cats" → "cat + PLURAL"
• "running" → "run + ING"
• "happier" → "happy + ER"

Applications of FST in NLP:


• Lemmatization: Reducing words to their base forms.
• Part-of-Speech Tagging: Identifying grammatical roles.
• Machine Translation: Mapping words between languages.
• Speech Recognition: Handling phoneme variations.
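A production FST would be built with a toolkit such as OpenFST or HFST; the toy sketch below only imitates the surface-to-lexical mapping with a few hand-written rules, so the rule logic and the irregular-form table are illustrative assumptions:

IRREGULAR = {"happier": "happy + ER", "ran": "run + PAST"}

def parse(word):
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing"):
        root = word[:-3]
        if len(root) > 2 and root[-1] == root[-2]:  # undo consonant doubling
            root = root[:-1]
        return root + " + ING"
    if word.endswith("s"):
        return word[:-1] + " + PLURAL"
    return word

print(parse("cats"))     # cat + PLURAL
print(parse("running"))  # run + ING
print(parse("happier"))  # happy + ER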
6. How does finite state morphology assist in word-level analysis?
Finite State Morphology (FSM) refers to the use of Finite State Automata (FSA) and Finite State
Transducers (FST) to analyze and generate word forms in Natural Language Processing (NLP). It helps in
word-level analysis by identifying the structure, root form, and grammatical features of words.
How Finite State Morphology Assists in Word-Level Analysis:
1. Morphological Parsing
FSM helps analyze word structure by breaking words into their root forms and morphemes (prefixes,
suffixes, and inflections).
• Example: "unhappiness" → "un" + "happy" + "ness"
• Helps in understanding meaning and word formation rules.
2. Lemmatization
FSM reduces words to their base or dictionary form while considering grammatical context.
• Example: "running" → "run"
• Used in information retrieval, text processing, and NLP applications.
3. Inflectional and Derivational Morphology
• Inflectional Morphology: Identifies grammatical changes like tense, number, or case.
o Example: "dogs" → "dog + PLURAL"
• Derivational Morphology: Identifies word formation changes that alter meaning or part of speech.
o Example: "happiness" → "happy + NESS"
4. Word Generation (Morphological Generation)
FSM can generate words from their root form by adding appropriate affixes.
• Example: "run + PAST" → "ran"
• Used in language modelling and speech synthesis.
5. Spell Checking and Correction
FSM helps validate words by matching them against a finite state lexicon and suggests corrections.
• Example: "teh" → "the"

Applications of Finite State Morphology in NLP:


• Machine Translation – Helps in word transformations across languages.
• Text-to-Speech Systems – Assists in pronouncing words correctly.
• Named Entity Recognition (NER) – Identifies proper nouns and word categories.
• Search Engines & Information Retrieval – Improves search accuracy by considering word forms.
Thus, Finite State Morphology assists in word-level analysis by efficiently handling word structure,
lemmatization, inflection, generation, and validation, making it a fundamental tool in NLP.
Unit – 3
1. What is Part-Of-Speech (POS) tagging, and why is it important?
Part-of-Speech (POS) Tagging is the process of assigning grammatical categories (such as noun, verb,
adjective, etc.) to words in a given text based on context and definition. It helps in understanding the
syntactic structure of a sentence.
Why is POS Tagging Important?
1. Improves NLP Understanding: Helps machines understand sentence structure and word relationships.
2. Aids in Syntactic Parsing: Helps in parsing sentences for grammar correction and linguistic analysis.
3. Useful in Named Entity Recognition (NER): Distinguishes proper nouns (names, places, organizations,
etc.) from common words.
4. Enhances Information Retrieval: Improves search engines by recognizing the contextual meaning of
words.
5. Supports Text-to-Speech Applications: Determines the correct pronunciation of words based on
context.
6. Essential for Machine Translation: Helps in accurately translating words based on their grammatical
roles.
Example of POS Tagging:
Sentence: "The quick brown fox jumps over the lazy dog."
POS Tags:
• The → Determiner (DT)
• quick → Adjective (JJ)
• brown → Adjective (JJ)
• fox → Noun (NN)
• jumps → Verb (VBZ)
• over → Preposition (IN)
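For reference, the same sentence can be tagged in one line with NLTK (assuming the "punkt" and "averaged_perceptron_tagger" resources are downloaded); the exact tags depend on the tagger model:

from nltk import pos_tag, word_tokenize

print(pos_tag(word_tokenize("The quick brown fox jumps over the lazy dog.")))
# e.g. [('The', 'DT'), ('quick', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ...]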

2. Describe the Penn Treebank tag set and its role in NLP.
The Penn Treebank (PTB) Tag Set is a widely used Part-of-Speech (POS) tagging system developed as part
of the Penn Treebank Project. It is used in NLP to label words with their corresponding grammatical roles,
enabling accurate syntactic and semantic analysis. It consists of 45 POS tags, categorizing words into
grammatical types such as nouns, verbs, adjectives, adverbs, prepositions, pronouns, determiners, and
conjunctions.

Role of the Penn Treebank Tag Set in NLP


1. Standardized POS Tagging:
o The PTB tag set provides a uniform labeling system for linguistic annotation, making it a benchmark for
NLP research and applications.
o It ensures consistency in text analysis, making it easier to compare results across different models and
datasets.
2. Improves NLP Applications:
o Used in syntactic parsing, named entity recognition (NER), sentiment analysis, machine translation, and
text summarization.
o Helps disambiguate word meanings by providing grammatical context.
3. Enhances Text Preprocessing:
o Supports lemmatization, stemming, and dependency parsing by providing contextual information
about words.
o Helps in speech recognition by identifying parts of speech and their relationships in sentences.
4. Supports NLP Model Training:
o Used to train POS taggers such as Hidden Markov Models (HMM), Conditional Random Fields (CRF),
and deep learning-based models.
o Helps improve the accuracy of NLP models by providing high-quality labeled data for machine learning
tasks.

Example:
• The → Determiner (DT)
• quick → Adjective (JJ)
• brown → Adjective (JJ)
• fox → Noun (NN)
• jumps → Verb (VBZ)
• over → Preposition (IN)

3. Compare rule-based and stochastic POS tagging.

• Approach – Rule-based: uses predefined rules and dictionaries. Stochastic: uses probabilistic/statistical models.
• Dependency – Rule-based: relies on linguistic knowledge. Stochastic: relies on annotated training data.
• Accuracy – Rule-based: high for structured languages. Stochastic: higher for complex and ambiguous sentences.
• Handling Ambiguity – Rule-based: struggles with ambiguity (requires manual rule handling). Stochastic: handles ambiguity better using probabilities.
• Handling New Words – Rule-based: poor (needs new rules for unknown words). Stochastic: better (can learn from data patterns).
• Interpretability – Rule-based: high (rules are transparent). Stochastic: low (probability-based, often a "black box").
• Training Data Requirement – Rule-based: no training data needed. Stochastic: needs a large annotated corpus.
• Speed – Rule-based: faster for small datasets. Stochastic: slower due to probability calculations.
• Adaptability – Rule-based: limited (new rules must be manually added). Stochastic: more flexible (adapts to different domains and languages).
4. Explain the challenges of multiple tags and unknown words in POS tagging.
1. Challenges of Multiple Tags (Ambiguity in POS Tagging)
In POS tagging, a single word can have multiple possible tags depending on the context. This issue, known
as ambiguity, occurs when a word can function as more than one part of speech.
Types of Ambiguity:
• Lexical Ambiguity: A word can belong to multiple POS categories.
o Example: "Can"
▪ Noun: I bought a can of soda.
▪ Verb: Can you help me?
• Syntactic Ambiguity: The sentence structure leads to multiple interpretations.
o Example: "He saw the man with the telescope."
▪ Did he have the telescope, or did the man have it?
Solutions to Handle Multiple Tags:
• Context-Based Disambiguation: Using neighboring words to determine the correct tag.
• Statistical Models: Hidden Markov Models (HMM) or Conditional Random Fields (CRF) predict the
most likely tag.
• Deep Learning: Neural networks learn from large annotated datasets to resolve ambiguity.

2. Challenges of Unknown Words (Out-of-Vocabulary or OOV Words)


Unknown words are those not present in the training data. These include:
• New Words: New words introduced into the language (e.g., "selfie," "cryptocurrency")
• Named Entities: Proper nouns (e.g., Elon Musk, Netflix)
• Slang or Informal Words: (e.g., "gonna," "LOL")
• Misspellings or Variants: (e.g., "color" vs. "colour")
Solutions to Handle Unknown Words:
• Morphological Analysis: Breaking down words into root and affixes (e.g., unhappiness → happy).
• Word Embeddings: Neural models like Word2Vec or BERT learn representations for unseen words.
• Suffix-Based Guessing: Assigning POS tags based on common suffixes (e.g., -ly → adverb).
• Backoff Strategies: Assigning the most frequent POS tag from similar words.

5. What is a Context-Free Grammar (CFG)? Provide an example.


A Context-Free Grammar (CFG) is a formal grammar used to define the syntactic structure of languages,
particularly in NLP and compiler design. It consists of a set of production rules that describe how sentences
in a language are generated from a starting symbol.
Components of CFG
A CFG consists of four key elements:
1. Non-Terminals (N) → Variables that represent different syntactic structures (e.g., Sentence, Noun
Phrase).
2. Terminals (Σ) → Actual words or symbols in the language (e.g., "cat", "runs", "a").
3. Production Rules (P) → Rules that define how non-terminals can be replaced by other non-terminals or
terminals.
4. Start Symbol (S) → The initial non-terminal from which parsing starts.
Importance of CFG in NLP
• Used in syntactic parsing to analyze sentence structure.
• Forms the basis of natural language understanding (NLU) in AI applications.
• Helps in programming language compilers to define valid syntax.

Example of a Simple CFG
Sentence: "The cat chased a dog."
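The grammar below is an illustrative sketch sufficient to generate the example sentence, written with NLTK's CFG class:

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog'
V -> 'chased'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat chased a dog".split()):
    print(tree)
# (S (NP (Det the) (N cat)) (VP (V chased) (NP (Det a) (N dog))))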

6. Discuss the limitations of rule-based POS tagging.


Rule-based Part-of-Speech (POS) tagging relies on manually crafted linguistic rules and dictionaries to
assign POS tags to words. While effective in certain cases, it has several limitations:
1. Ambiguity Handling is Challenging
• Many words have multiple possible tags depending on context (e.g., "book" as a noun vs. verb in "I
will book a ticket.").
• Rule-based systems struggle with disambiguation, requiring complex handcrafted rules.

2. Requires Extensive Linguistic Knowledge


• Rules must be manually designed by linguists, which is time-consuming and labor-intensive.
• The system depends heavily on human expertise, making it difficult to scale across different
languages.

3. Hard to Maintain and Expand


• As language evolves, new words and structures emerge (e.g., internet slang, domain-specific
terms).
• Adding new rules may create conflicts with existing rules, making maintenance complex.

4. Struggles with Unknown Words (OOV – Out-of-Vocabulary)


• Rule-based taggers rely on predefined dictionaries, so unknown words (e.g., new slang, technical
jargon) remain unrecognized or are tagged incorrectly.
• Example: "Cryptocurrency" may not have a predefined rule and could be misclassified.
5. Computational Inefficiency
• Rule-based taggers check multiple rules sequentially, which can slow down processing.
• As rules increase, performance may degrade, making the system inefficient for large-scale text
processing.

6. Less Adaptability to Different Domains & Languages


• Rule-based systems struggle to adapt to domain-specific texts (e.g., medical, legal, technical fields)
without extensive manual modifications.
• Language-specific grammar differences require entirely new sets of rules for different languages.

7. How does sequence labeling help in NLP tasks?


Sequence labeling is a fundamental NLP technique where each word or token in a sentence is assigned a
label. It is useful in many language processing tasks that require understanding the role of each word
within a sequence.
1. Named Entity Recognition (NER)
Goal: Identify entities such as names, locations, organizations, and dates.
• Example: "Apple was founded in California in 1976."
o Apple → ORG
o California → LOC
o 1976 → DATE
• Importance: Helps in information extraction, search engines, and chatbots.

2. Part-of-Speech (POS) Tagging


Goal: Assign grammatical categories (noun, verb, adjective, etc.) to each word.
• Example: "The cat sits on the mat."
o The → DET
o cat → NOUN
o sits → VERB
• Importance: Improves syntactic parsing, grammar checking, and machine translation.

3. Chunking (Shallow Parsing)


Goal: Identify phrases such as noun phrases (NP) or verb phrases (VP).
• Example: "The quick brown fox jumps over the lazy dog."
o [The quick brown fox] → NP
o [jumps over] → VP
• Importance: Helps in syntactic analysis and machine translation.

4. Sentiment Analysis
Goal: Identify emotion or opinion in text by labeling words as positive, negative, or neutral.
• Example: "The movie was absolutely amazing!"
o amazing → POSITIVE
• Importance: Used in customer feedback analysis, social media monitoring, and product reviews.

5. Machine Translation (MT)


Goal: Identify words and their grammatical structure for better translation.
• Example: English → Spanish
o "She is going to school." → "Ella va a la escuela."
• Importance: Helps improve Google Translate, DeepL, and other AI-driven translations.
6. Coreference Resolution
Goal: Identify which words refer to the same entity in a sentence.
• Example: "John said he will come tomorrow."
o he → John
• Importance: Improves chatbots, document summarization, and question-answering systems.

Unit – 4
1. What is lexical semantics? Provide examples.
Lexical semantics is a subfield of linguistics and natural language processing (NLP) that focuses on the
meaning of words and their relationships with each other in a given language. It helps in understanding
how words convey meaning based on their context, structure, and usage.
Key Aspects of Lexical Semantics
1. Word Meaning and Sense
o Words can have multiple senses or meanings depending on context.
o Example:
▪ "She wore a diamond ring." (ring = jewelry)
▪ "I heard the phone ring." (ring = sound)
2. Synonymy (Similar Meaning Words)
o Words with similar meanings but different forms.
o Example: big ≈ large, buy ≈ purchase
3. Antonymy (Opposite Meaning Words)
o Words that have opposite meanings.
o Example: hot ↔ cold, happy ↔ sad
4. Hyponymy and Hypernymy (Word Hierarchies)
o Hyponym: A specific word under a broader category.
o Hypernym: A general category that includes hyponyms.
o Example:
▪ Dog (hyponym) → Animal (hypernym)
▪ Rose (hyponym) → Flower (hypernym)
5. Polysemy (Multiple Related Meanings of a Word)
o A single word with multiple related meanings.
o Example:
▪ Bank (financial institution)
▪ Bank (side of a river)
6. Homonymy (Same Word, Unrelated Meaning)
o Words that sound or look alike but have different meanings.
o Example:
▪ Bat (flying mammal)
▪ Bat (sports equipment)
7. Collocations (Commonly Co-Occurring Words)
o Words that naturally appear together.
o Example:
▪ Fast food (not quick food)
▪ Heavy rain (not big rain)
8. Lexical Ambiguity
o A word with multiple possible interpretations.
▪ Example: "He saw the bat." (bat = animal or sports equipment?)

2. Explain attachment ambiguities in English sentences.


Attachment ambiguity occurs when a phrase or clause in a sentence can be attached to more than one
part of the sentence, leading to multiple possible interpretations. It is a common issue in syntactic parsing
and natural language processing (NLP).
Types of Attachment Ambiguity
1. Prepositional Phrase (PP) Attachment Ambiguity
o A prepositional phrase (PP) can be attached to different parts of a sentence, causing
multiple meanings.
o Example:
▪ "I saw the man with the telescope."
▪ Meaning 1: I used a telescope to see the man. (PP modifies "saw")
▪ Meaning 2: The man had a telescope. (PP modifies "man")
2. Relative Clause Attachment Ambiguity
o A relative clause (who, which, that, etc.) can modify different parts of a sentence.
o Example:
▪ "She met the sister of the actor who lives in Paris."
▪ Meaning 1: The actor lives in Paris. (clause modifies "actor")
▪ Meaning 2: The sister lives in Paris. (clause modifies "sister")
3. Adverb Attachment Ambiguity
o An adverb can modify different verbs in the sentence.
o Example:
▪ "He said he saw her yesterday."
▪ Meaning 1: The saying happened yesterday. ("yesterday" modifies "said")
▪ Meaning 2: The seeing happened yesterday. ("yesterday" modifies "saw")
4. Coordination Ambiguity (Conjunction Attachment)
o When "and" or "or" is used, it may be unclear which parts of the sentence are connected.
o Example:
▪ "Old men and women gathered at the park."
▪ Meaning 1: (Old men) and (women) gathered.
▪ Meaning 2: (Old men and old women) gathered.

3. How do relations among lexemes help in semantic analysis?


Lexemes are the basic units of meaning in a language. The relationships among lexemes help in semantic
analysis, which involves understanding meaning in text. These relations assist in disambiguation,
knowledge representation, and natural language understanding (NLU) in NLP applications.
Types of Lexeme Relations and Their Role in Semantic Analysis
1. Synonymy (Similarity in Meaning)
o Definition: Words with similar meanings.
o Example: happy ↔ joyful, big ↔ large
o Role in Semantic Analysis:
▪ Helps in query expansion in search engines.
▪ Used in paraphrase detection and text summarization.
2. Antonymy (Opposites in Meaning)
o Definition: Words with opposite meanings.
o Example: hot ↔ cold, fast ↔ slow
o Role in Semantic Analysis:
▪ Used in sentiment analysis to determine polarity.
▪ Helps in word sense disambiguation (WSD).
3. Hyponymy and Hypernymy (Hierarchical Relations)
o Hyponymy (Subtype Relation): A specific term under a broader category.
o Hypernymy (Superclass Relation): A broader category encompassing specific terms.
o Example:
▪ Dog (hyponym) → Animal (hypernym)
▪ Rose (hyponym) → Flower (hypernym)
o Role in Semantic Analysis:
▪ Used in taxonomy creation (e.g., WordNet).
▪ Helps in ontology development and text classification.
4. Polysemy (One Word, Multiple Meanings)
o Definition: A word with multiple meanings.
o Example:
▪ Bank (financial institution) vs. Bank (riverbank)
▪ Light (not heavy) vs. Light (illumination)
o Role in Semantic Analysis:
▪ Important in word sense disambiguation (WSD).
▪ Reduces ambiguity in NLP tasks like machine translation.
5. Homonymy (Same Spelling & Pronunciation, Different Meaning)
o Definition: Words that look and sound the same but have different meanings.
▪ Example: Bat (animal) vs. Bat (sports equipment)
o Role in Semantic Analysis:
▪ Helps in contextual interpretation of text.
▪ Improves speech recognition system

4. Differentiate between homonymy, polysemy, and synonymy.


Homonymy, polysemy, and synonymy are essential aspects of lexical semantics, which deals with the
relationships between words (lexemes) and their meanings. These phenomena help in understanding how
words are used in different contexts and how their meanings evolve.

• Definition – Homonymy: words with the same spelling and pronunciation but different, unrelated meanings. Polysemy: a single word with multiple related meanings. Synonymy: different words with similar or identical meanings.
• Meaning Relationship – Homonymy: no semantic connection between meanings. Polysemy: meanings are related in some way. Synonymy: meanings are closely related or identical.
• Example Words – Homonymy: Bat (animal) vs. Bat (sports equipment). Polysemy: Head (of a person) vs. Head (leader of a company). Synonymy: Big ↔ Large, Happy ↔ Joyful.
• Example Sentences – Homonymy: "He hit the ball with a bat." (sports) / "A bat was flying in the cave." (animal). Polysemy: "She is the head of the department." (leader) / "He was hit on the head." (body part). Synonymy: "He owns a big house." / "He owns a large house."
• Ambiguity in NLP – Homonymy: causes high ambiguity in machine translation and speech recognition. Polysemy: requires context understanding for correct interpretation. Synonymy: helps in text simplification and search expansion.
• Use in NLP – Homonymy: needs word sense disambiguation (WSD) to determine the correct meaning. Polysemy: requires semantic analysis to understand context. Synonymy: used in query expansion and paraphrasing.

5. Define hyponymy and explain its significance in NLP.


Hyponymy is a semantic relationship between words where the meaning of one word (the hyponym) is
included within the meaning of another word (the hypernym or superordinate). In simpler terms, a
hyponym is a more specific term under a broader category (hypernym).
Examples:
• Rose (hyponym) → Flower (hypernym)
• Apple (hyponym) → Fruit (hypernym)
• Sparrow (hyponym) → Bird (hypernym)
Significance of Hyponymy in NLP
1. Text Classification & Ontologies
o Helps in organizing words into hierarchical structures like WordNet, which aids in semantic search and
knowledge representation.
2. Question Answering Systems
o If a system knows that a sparrow is a bird, it can answer questions like "Is a sparrow an animal?"
correctly.
3. Semantic Search & Information Retrieval
o Allows search engines to return more relevant results by recognizing related concepts (e.g., searching
for “fruit” retrieves apples, bananas, etc.).
4. Machine Translation
o Helps in selecting contextually appropriate translations by understanding word hierarchies.
5. Named Entity Recognition (NER)
o Identifies relationships between entities (e.g., "Amazon" as a company rather than a river).
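These hierarchies can be queried directly from WordNet via NLTK (assuming the "wordnet" corpus is downloaded):

from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
print(dog.hypernyms())                          # broader categories, e.g. canine.n.02
print(wn.synset("flower.n.01").hyponyms()[:3])  # a few specific kinds of flower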

6. Explain the concept of Word Sense Disambiguation (WSD).


Word Sense Disambiguation (WSD) is the process of determining the correct meaning of a word in a given
context when the word has multiple meanings (i.e., is ambiguous). It is a crucial task in Natural Language
Processing (NLP) to improve text understanding, machine translation, and information retrieval.
Without WSD, an NLP system may misinterpret the meaning, leading to errors in translation,
summarization, or search results.
Methods of WSD
1. Knowledge-Based Approaches
o Uses dictionaries, lexical databases (like WordNet), and ontologies to determine the meaning.
o Example: Lesk Algorithm, which finds the best sense based on word definitions and overlaps.
2. Supervised Machine Learning
o Uses labeled datasets where words are annotated with their correct meanings.
o Example: Naïve Bayes, Decision Trees, Neural Networks trained on contextual word usage.
3. Unsupervised Learning
o Clusters words based on usage patterns without labeled data.
o Example: Word Embeddings, Clustering Algorithms (K-means, LDA).
4. Hybrid Approaches
o Combines knowledge-based and statistical methods to improve accuracy.
Example:
Consider the word "bank", which has multiple meanings:
1. He deposited money in the bank. → (bank = financial institution )
2. She sat by the bank of the river. → (bank = riverbank )
A WSD algorithm analyzes the context (e.g., words like money, deposited vs. river, sat) to correctly
determine the meaning of "bank" in each sentence.
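NLTK ships a simplified Lesk implementation; here is a minimal sketch (simplified Lesk is approximate, so the chosen sense may differ from human judgment):

from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

context = word_tokenize("He deposited money in the bank.")
sense = lesk(context, "bank", "n")   # pick the best noun synset for "bank"
print(sense)                         # the chosen WordNet Synset
if sense:
    print(sense.definition())        # its dictionary gloss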

7. What are the dictionary-based approaches to WSD?


Dictionary-based approaches use lexical resources like dictionaries, thesauruses, and semantic networks
(e.g., WordNet) to determine the correct sense of an ambiguous word in a given context.

Key Dictionary-Based WSD Methods


1. Lesk Algorithm
o Compares the overlap between the definitions (glosses) of an ambiguous word and its surrounding words.
o The sense with the most overlapping words is chosen.
o Example: For "bank", the algorithm compares dictionary definitions and selects the one with the most
shared words with the context (e.g., "money, deposit" → financial sense).
2. Extended Lesk Algorithm
o Expands on the Lesk Algorithm by considering the glosses of neighboring words, improving accuracy.
3. Semantic Similarity-Based Methods
o Uses semantic relationships (e.g., synonyms, hypernyms, hyponyms) from resources like WordNet to find
the best sense.
o Example: If “bat” appears in a sentence with “wings” and “fly,” the algorithm selects the animal sense
instead of the sports equipment sense.
Advantages of Dictionary-Based WSD
• Does not require training data (unlike machine learning methods).
• Works well with rich lexical resources like WordNet.
• Good interpretability, since definitions provide clear meaning.
Limitations
• Depends on dictionary quality – missing word senses can reduce accuracy.
• Computationally expensive – searching gloss overlaps can be slow.
• Limited handling of context – less effective for long or complex sentences.

8. Describe the structure and importance of WordNet.


WordNet is a lexical database for the English language, developed at Princeton University, that groups
words into sets of synonyms (synsets) and captures their semantic relationships. It serves as a valuable
resource in Natural Language Processing (NLP) for word sense disambiguation (WSD), machine
translation, information retrieval, and text analysis.
Structure of WordNet
WordNet organizes words into a hierarchical structure based on their meanings and relationships.
1. Synsets (Synonym Sets)
• A synset is a group of words with similar meanings.
• Example: {"car", "automobile"} belong to the same synset.
2. Lexical Relations
WordNet defines relationships between words and synsets, including:

• Synonymy – Words with similar meanings (fast – quick)
• Antonymy – Words with opposite meanings (hot – cold)
• Hyponymy – A specific word under a broader one (rose → flower)
• Polysemy – A word with multiple meanings (bank → financial institution or riverbank)

Importance of WordNet in NLP
• Word Sense Disambiguation (WSD) – Helps identify the correct meaning of ambiguous words.
• Semantic Similarity & Text Understanding – Used for comparing word meanings in chatbots, search engines, and text analysis.
• Machine Translation & Information Retrieval – Improves word mapping across languages.
• Question Answering & Knowledge Graphs – Enhances semantic search and AI comprehension.
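A quick look at a synset through NLTK's interface (assuming the "wordnet" corpus is downloaded):

from nltk.corpus import wordnet as wn

car = wn.synsets("car")[0]
print(car.name())         # car.n.01
print(car.lemma_names())  # ['car', 'auto', 'automobile', ...]
print(car.definition())   # the gloss describing this sense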

9. How do noun phrases and verb phrases contribute to sentence meaning?


Noun Phrases (NP) and Verb Phrases (VP) are essential components of sentence structure in syntax and
semantics, helping convey clear and meaningful communication.
1. Noun Phrases (NP)
A Noun Phrase consists of a noun (or pronoun) as its head, along with modifiers such as adjectives,
determiners, or prepositional phrases.
How do NPs contribute to meaning?
• Identify subjects or objects in a sentence.
• Provide specificity (e.g., "a cat" vs. "the black cat on the sofa").
• Introduce entities, people, places, or things in a discourse.
Example:
• "The small dog barked loudly." (NP = The small dog) → Identifies the subject.
• "She adopted a cute puppy." (NP = a cute puppy) → Identifies the object.
2. Verb Phrases (VP)
A Verb Phrase contains a main verb and its auxiliaries, objects, and complements, providing information
about actions, states, or events.
How do VPs contribute to meaning?
• Describe actions or states (e.g., running, is sleeping).
• Convey tense, mood, and aspect (e.g., has finished, was going).
• Indicate relationships between subject and object (e.g., She gave him a book).
Example:
• "She is reading a novel." (VP = is reading a novel) → Describes an ongoing action.
• "The baby cried loudly." (VP = cried loudly) → Expresses an event.
3. NP + VP = Complete Sentence Meaning
A sentence is structured as:
Sentence (S) → NP + VP
Example:
• "The old man told an interesting story."
o NP: The old man (who is doing the action?)
o VP: told an interesting story (what did he do?)

10. Explain the importance of prepositional phrase attachment in NLP.


Prepositional Phrase (PP) attachment is a crucial problem in Natural Language Processing (NLP) that
involves determining the correct syntactic and semantic relationship of a prepositional phrase (PP) in a
sentence. The correct attachment of a PP can significantly impact sentence meaning and disambiguation.
Why is PP Attachment Important?
1. Ambiguity Resolution
o Incorrect attachment can lead to different meanings of a sentence.
▪ Example: "She saw the man with a telescope."
▪ (1) Did she use a telescope to see the man?
▪ (2) Did she see a man who had a telescope?
o NLP models need to disambiguate such sentences correctly.
2. Improved Parsing Accuracy
o Correct PP attachment improves the performance of parsing algorithms, ensuring accurate
syntactic structures.
3. Enhancing Machine Translation (MT) & Text Generation
o Incorrect PP attachment can lead to grammatically incorrect or misinterpreted translations.
4. Better Named Entity Recognition (NER) & Information Extraction
o Helps in extracting correct relationships in knowledge graphs and databases.
5. Critical for Question Answering Systems & Chatbots
o Helps NLP models understand contextual meanings in user queries.

Example of PP Attachment Ambiguity


• Sentence: "He put the cake on the table with candles."
• Possible interpretations:
1. The cake already had candles on it before being placed on the table.
2. He placed both the cake and candles on the table.

Unit – 5
1. What is text summarization, and how does LexRank work?
Text summarization is the process of condensing a large text into a shorter, meaningful version while
preserving its key information. It helps in quick information retrieval and is widely used in news
aggregation, search engines, and document summarization.
There are two main types of text summarization:
1. Extractive Summarization
o Selects important sentences/phrases from the original text.
o Example: LexRank, TextRank.
2. Abstractive Summarization
o Generates a summary by paraphrasing and reinterpreting the text.
o Example: Transformer-based models like BERTSUM, T5, GPT.

How Does LexRank Work?


LexRank is an extractive summarization algorithm based on graph-based ranking. It identifies the most important sentences in a document using a concept similar to PageRank.
Steps in LexRank:
1. Sentence Representation:
o Convert text into individual sentences.
o Compute sentence similarity using cosine similarity or TF-IDF.
2. Graph Construction:
o Represent sentences as nodes in a graph.
o Create edges between sentences based on content similarity.
3. Sentence Importance Ranking:
o Use random walk algorithms (like PageRank) to rank sentences based on how frequently
they are referenced.
o Higher-ranked sentences are considered more important.
4. Summary Generation:
o Select top-ranked sentences to form the summary.
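A compact sketch of the LexRank idea using scikit-learn and NumPy; the damping factor and iteration count below are conventional choices, not part of any fixed specification:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexrank(sentences, top_k=2, d=0.85, iters=50):
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    np.fill_diagonal(sim, 0.0)                  # ignore self-similarity
    rows = sim.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                       # avoid division by zero
    P = sim / rows                              # row-stochastic transition matrix
    n = len(sentences)
    scores = np.ones(n) / n
    for _ in range(iters):                      # PageRank-style power iteration
        scores = (1 - d) / n + d * (P.T @ scores)
    best = sorted(np.argsort(-scores)[:top_k])  # keep original sentence order
    return [sentences[i] for i in best]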

2. Explain optimization-based approaches for summarization.


Optimization-based summarization aims to select the best subset of sentences from a document by
maximizing relevance and minimizing redundancy using mathematical optimization techniques. These
approaches formulate summarization as an optimization problem where an objective function is defined
and solved using linear programming, integer programming, or deep learning-based optimization methods.

Key Components of Optimization-Based Summarization


1. Objective Function
o Defines what makes a summary optimal (e.g., maximizing informativeness, coherence, and
diversity while minimizing redundancy).
2. Constraints
o Ensures the generated summary meets specific requirements, such as:
▪ Length constraint (e.g., 100 words max).
▪ Coverage constraint (important topics must be included).
▪ Redundancy constraint (avoid repeating similar information).
3. Optimization Techniques Used
Linear Programming (LP):
o Defines the summarization problem as an LP model where sentences are weighted based on
importance scores.
Integer Linear Programming (ILP):
o Similar to LP but with discrete (binary) decision variables (i.e., select or not select a
sentence).
o Used in systems like SUMBasic and ILP-based Text Summarization.
Submodular Optimization:
o Defines a function that ensures the selection of diverse and informative sentences.
o Used in multi-document summarization to prevent redundancy.
Neural Network-Based Optimization:
o Deep learning models use reinforcement learning (e.g., ROUGE reward optimization) to
generate high-quality summaries.
o Example: Pointer-Generator Networks, Reinforcement Learning-based Summarization (RL-
SUM).
Example of Optimization-Based Summarization
Problem: Summarize a news article while ensuring that the summary is concise, diverse, and
informative.
Solution:
• Define an objective function that maximizes sentence importance while minimizing redundancy.
• Use ILP to find the best subset of sentences that meet the word limit and coverage constraints.
Result: A concise, non-redundant, and informative summary that effectively captures the main ideas.
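A toy ILP formulation of this selection problem using the PuLP library; the importance scores, lengths, and budget are made-up illustrative numbers:

import pulp

scores  = [0.9, 0.4, 0.7]      # hypothetical sentence importance
lengths = [12, 8, 15]          # hypothetical sentence lengths (words)
budget  = 25                   # maximum summary length (words)

prob = pulp.LpProblem("summary_selection", pulp.LpMaximize)
pick = [pulp.LpVariable(f"pick_{i}", cat="Binary") for i in range(3)]
prob += pulp.lpSum(scores[i] * pick[i] for i in range(3))             # maximize importance
prob += pulp.lpSum(lengths[i] * pick[i] for i in range(3)) <= budget  # length constraint
prob.solve()
print([i for i in range(3) if pick[i].value() == 1])  # indices of chosen sentences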
Advantages of Optimization-Based Approaches
• Ensures High Relevance & Diversity – Uses formal constraints to improve summary quality.
• Flexible & Customizable – Can adjust parameters for different summarization needs.
• Handles Multi-Document Summarization Well – Helps in selecting diverse information from multiple sources.
Limitations
• Computationally Expensive – Solving optimization problems can be slow for large datasets.
• Requires Fine-Tuning – Choosing the right constraints and weights is challenging.
• Not as Context-Aware as Deep Learning Models – Often relies on predefined scoring rather than true understanding.

3. How is summarization evaluation performed?


Evaluating text summarization is crucial to ensure quality, coherence, informativeness, and readability.
Summarization evaluation is broadly classified into intrinsic (content-focused) and extrinsic (task-focused)
evaluation methods.
1. Intrinsic Evaluation (Quality & Content Assessment)
These methods directly measure how well the summary represents the original text.
A. Automatic Evaluation Metrics (Fast & Scalable)
Used for comparing summaries against reference summaries using algorithms.

• BLEU (Bilingual Evaluation Understudy) – Measures precision of n-gram matches between system and reference summaries. Strengths: useful for machine-translation-based summarization. Weaknesses: prefers short summaries and does not capture meaning variations well.
• BERTScore – Uses pre-trained BERT embeddings to compare sentence similarity. Strengths: captures semantic meaning well; good for abstractive summarization. Weaknesses: computationally expensive.

B. Human Evaluation (Detailed & Reliable)

Used when quality & coherence need subjective judgment.

• Fluency – Is the summary grammatically correct and readable?
• Coherence – Are the ideas logically connected?
• Informativeness – Does it capture the most important content?
• Relevance – Does it avoid unnecessary or off-topic information?
• Conciseness – Is the summary brief yet meaningful?
• Redundancy – Does it avoid repeating information?

2. Extrinsic Evaluation (Task-Based Performance)

Measures how well the summary helps in real-world tasks.

• Information Retrieval – Example task: Does the summary help find relevant documents faster? Purpose: measures usefulness for search engines.
• Question-Answering – Example task: Can users answer questions based on the summary? Purpose: tests informativeness.
• Reading Comprehension – Example task: Does it help users understand the original document? Purpose: measures clarity and relevance.

4. What is text classification, and why is it important in NLP?


Text classification is the process of assigning predefined categories or labels to a given text based on its
content. It is a fundamental Natural Language Processing (NLP) task used for organizing, filtering, and
analyzing text data automatically.
Why is Text Classification Important in NLP?
1. Automates Information Processing
Helps businesses and researchers handle large volumes of text data efficiently.
2. Enhances Search & Retrieval
Improves search engines, recommendation systems, and content filtering.
3. Supports Decision-Making
Enables sentiment analysis for brand monitoring and customer feedback analysis.
4. Powers AI Applications
Essential for chatbots, virtual assistants, and fraud detection.
5. Helps in Content Moderation
Automatically detects toxic comments, hate speech, or inappropriate content.

How is Text Classification Done?


There are two main approaches:
1. Rule-Based Approaches: Use manually created rules and keyword-based patterns.
2. Machine Learning Approaches: Train supervised models (SVM, Naïve Bayes, Random Forest) or deep learning models (LSTMs, BERT, Transformers) on labeled text data.

Example:
• Spam Detection: Classifying emails as spam or not spam
• Sentiment Analysis: Categorizing customer reviews as positive, negative, or neutral
• Topic Categorization: Classifying news articles into sports, politics, entertainment, etc.
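A minimal supervised sketch with scikit-learn; the four training examples are hypothetical:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["win a free prize now", "meeting at 10 tomorrow",
          "free lottery winner", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["claim your free prize"]))  # likely ['spam']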

5. Explain the role of NLTK (Natural Language Toolkit) in NLP tasks.


NLTK (Natural Language Toolkit) is a powerful Python library for Natural Language Processing (NLP). It
provides pre-built tools and datasets to help with text analysis, preprocessing, and linguistic computation.

Why is NLTK Important?


• Simplifies text preprocessing (tokenization, stemming, lemmatization).
• Provides corpora and lexical resources (WordNet, stopwords).
• Supports POS tagging, parsing, and named entity recognition (NER).
• Enables text classification and sentiment analysis.
• Works well for academic research and prototyping NLP models.

Key NLP Tasks Using NLTK


1. Tokenization – Splitting text into words or sentences.
2. Stemming & Lemmatization – Reducing words to root forms.
3. POS Tagging – Assigning Part-of-Speech tags to words.
4. Named Entity Recognition (NER) – Identifying people, places, etc.
5. Text Classification – Categorizing text into labels (spam, sentiment, etc.).
6. Sentiment Analysis – Detecting emotions in text.
7. Parsing & Syntax Analysis – Understanding sentence structure.
Important Classes and Functions in NLTK
1. Tokenization
• word_tokenize() → Splits text into words.
• sent_tokenize() → Splits text into sentences.
2. Stopwords Removal
• stopwords.words() → Provides common stopwords in different languages.
3. Stemming & Lemmatization
• PorterStemmer() → Applies stemming to words.
• WordNetLemmatizer() → Performs lemmatization using WordNet.
4. Part-of-Speech (POS) Tagging
• pos_tag() → Tags words with their grammatical roles.
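Putting a few of these together in a short sketch (it assumes the punkt, stopwords, wordnet, and averaged_perceptron_tagger resources are downloaded):

from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

tokens = word_tokenize("The children are running quickly")
content = [t for t in tokens if t.lower() not in stopwords.words("english")]
print(content)                                    # ['children', 'running', 'quickly']
print(PorterStemmer().stem("running"))            # run
print(WordNetLemmatizer().lemmatize("children"))  # child
print(pos_tag(content))                           # tags depend on the tagger model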

6. Define sentiment analysis and its applications.


Sentiment Analysis (also called Opinion Mining) is a Natural Language Processing (NLP) technique used to
determine the emotional tone behind a piece of text. It classifies text as positive, negative, or neutral and
can also detect more nuanced emotions like joy, anger, sadness, or excitement.
How Does Sentiment Analysis Work?
• Uses machine learning (ML), deep learning (DL), and lexicon-based approaches.
• Analyzes text from reviews, social media, feedback, and customer support.
• Helps businesses understand public opinion, brand perception, and customer satisfaction.
Applications of Sentiment Analysis
1. Social Media Monitoring
• Analyzes public sentiment towards brands, products, or events.
• Tracks customer reactions on platforms like Twitter, Facebook, and Instagram.
2. Customer Feedback & Reviews Analysis
• Helps businesses analyze product/service reviews (e.g., Amazon, Yelp, Google Reviews).
• Identifies customer pain points and improves service quality.
3. Brand Reputation Management
• Monitors brand mentions and user sentiment in real-time.
• Helps companies respond to negative feedback proactively.
4. Financial Market Prediction
• Analyzes news articles, earnings reports, and social media sentiment.
• Assists in predicting stock trends based on public sentiment.
5. Healthcare and Patient Feedback Analysis
• Analyzes patient reviews to improve healthcare services.
• Detects emotional distress in mental health applications.
Example of Sentiment Analysis
Input Review: "I love the new iPhone! The camera is amazing, and the battery lasts all day."
Sentiment Analysis Output:
Sentiment: Positive
Confidence Score: 92%

7. What are affective lexicons, and how are they used in sentiment analysis?
Affective lexicons are specialized dictionaries of words and phrases that are assigned emotional or
sentiment scores based on their meaning and intensity. These lexicons help determine the affective
(emotional) state conveyed in a text by mapping words to emotions such as happiness, anger, sadness,
fear, surprise, and disgust.
Key Features of Affective Lexicons:
• Contain words labeled with sentiment polarity (positive, negative, neutral).
• Assign emotion intensity scores to words.
• Used in rule-based and hybrid sentiment analysis approaches.
How Are Affective Lexicons Used in Sentiment Analysis?
1. Word-Level Sentiment Scoring
• Each word in a text is matched with its sentiment score from the lexicon.
• Example: “happy” → Positive (+0.9), “terrible” → Negative (-0.8).
2. Sentence-Level Sentiment Analysis
• Aggregates word sentiment scores to determine overall sentence sentiment.
• Example:
o "The movie was fantastic, but the ending was sad."
o Words: fantastic (+0.9), sad (-0.5) → Overall sentiment: Neutral/Positive.
3. Emotion Detection
• Lexicons help classify text into specific emotions like joy, anger, or fear.
• Example: "I am thrilled about the new project!" → Emotion: Joy.
4. Aspect-Based Sentiment Analysis (ABSA)
• Helps identify sentiment towards specific aspects of a product/service.
• Example: "The battery life is excellent, but the camera is poor."
o Battery life → Positive, Camera → Negative.
Example of Sentiment Analysis Using Affective Lexicons
Sentence: "I love this new laptop! The screen is amazing, but the battery drains fast."
Lexicon-Based Sentiment Scores:
• love (+0.9) → Positive
• amazing (+0.8) → Positive
• drains (-0.6) → Negative
• fast (neutral) → No sentiment
Overall Sentiment: Positive
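NLTK's VADER is a widely used affective lexicon; a minimal sketch (it assumes the "vader_lexicon" resource has been downloaded):

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this new laptop! The screen is amazing."))
# e.g. {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}; the compound
# score aggregates the per-word lexicon values into an overall sentiment.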
8. Explain the concept of aspect-based sentiment analysis.
Aspect-Based Sentiment Analysis (ABSA) is an advanced form of sentiment analysis that focuses on
identifying the sentiment polarity (positive, negative, or neutral) towards specific aspects or features of an
entity, rather than analyzing the overall sentiment of a text.
Key Idea: Instead of classifying an entire review as positive or negative, ABSA determines which part of
the entity (product, service, etc.) is being praised or criticized.

How Does ABSA Work?


1. Aspect Extraction
• Identifies the specific aspects or features mentioned in the text.
• Example: In "The camera is amazing, but the battery life is terrible.", the aspects are camera and
battery life.
2. Sentiment Classification
• Determines whether the sentiment towards each aspect is positive, negative, or neutral.
3. Opinion Target Association
• Connects opinion words (e.g., "amazing", "terrible") with their respective aspects.
• Helps in identifying which aspect the sentiment belongs to.

Example of ABSA
Customer Review:
"The hotel room was spacious and clean, but the WiFi was slow."
ABSA Breakdown:

• Room – Positive (spacious, clean)
• WiFi – Negative (slow)

Overall Sentiment: Mixed (Room is good, WiFi is bad).


Applications of ABSA
Product Reviews Analysis
• Helps e-commerce platforms (Amazon, Flipkart) analyze customer opinions on specific product
features.
Social Media Monitoring
• Brands use ABSA to track customer feedback on platforms like Twitter, Instagram.
Market Research & Competitor Analysis
• Helps companies compare products and understand strengths & weaknesses.
9. What is Named Entity Recognition (NER), and why is it important in NLP?
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that involves identifying
and classifying named entities in text into predefined categories such as persons, organizations, locations,
dates, numerical values, and more.

Why is NER Important?


Information Extraction
• Extracts key information from unstructured text (e.g., news articles, legal documents, research papers).
Improving Search Engines
• Helps search engines understand entities, making search results more relevant.
Chatbots & Virtual Assistants
• Enhances AI assistants (e.g., Siri, Alexa) by recognizing names, dates, locations in queries.
Finance & Business Intelligence
• Identifies company names, stock symbols, monetary values, useful for risk analysis.
Healthcare & Biomedical Text Mining
• Extracts diseases, drug names, symptoms from medical literature.

Types of Named Entities in NER

• PERSON – Barack Obama, Elon Musk
• ORGANIZATION – Google, NASA, Tesla
• LOCATION – New York, France, Asia
• DATE – January 1, 2025, Monday
• TIME – 5:00 PM, Midnight

NER Techniques in NLP

1. Rule-Based Methods
• Uses patterns & dictionaries (e.g., regex for identifying dates).

2. Machine Learning-Based NER


• Uses models like CRF (Conditional Random Fields), HMM (Hidden Markov Models).

3. Deep Learning-Based NER


• Uses LSTMs, Transformers (BERT, SpaCy, Stanford NER) for advanced entity recognition.
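A minimal NER sketch with spaCy (it assumes the en_core_web_sm model is installed via python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
for ent in nlp("Apple was founded in California in 1976.").ents:
    print(ent.text, ent.label_)
# e.g. Apple ORG, California GPE, 1976 DATE (spaCy uses GPE for geopolitical entities)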
Unit – 1
2. Describe the generic architecture of an NLP system.
A Natural Language Processing (NLP) system follows a structured pipeline to process and understand
human language. The architecture consists of several components that transform raw text into meaningful
insights. Below is a generic architecture of an NLP system, along with its key components:

1. Text Input (Data Acquisition)


• Raw text is obtained from documents, web pages, social media, chatbots, or speech-to-text
conversions.
• Data sources include structured (databases) and unstructured (news articles, books, emails, etc.)
formats.

2. Text Preprocessing (Data Cleaning)


Before analysis, raw text must be cleaned and transformed into a structured format. Common steps
include:

✔ Tokenization – Splitting text into words or sentences.


✔ Stopword Removal – Removing common words like "the", "is", "and".
✔ Lowercasing – Converting text to lowercase for uniformity.
✔ Lemmatization & Stemming – Reducing words to their root forms (running → run).
✔ Punctuation & Special Character Removal – Removing unnecessary symbols.
✔ Part-of-Speech (POS) Tagging – Assigning grammatical roles (noun, verb, adjective).

3. Syntactic Processing (Parsing)


This step focuses on sentence structure and grammar analysis:

✔ Dependency Parsing – Identifying relationships between words.
✔ Constituency Parsing – Breaking sentences into hierarchical substructures.
Example:
"The cat sat on the mat."
Dependency Parsing Output:
• "cat" → Subject of "sat"
• "on the mat" → Prepositional phrase modifying "sat"

4. Semantic Analysis (Meaning Extraction)


Understanding the meaning of words and sentences:

✔ Named Entity Recognition (NER) – Identifies names, dates, locations, organizations.


✔ Word Sense Disambiguation (WSD) – Resolves ambiguity in word meanings.
✔ Semantic Role Labeling (SRL) – Determines who did what to whom in a sentence.
Example:
"Apple is planning to release a new iPhone."
NER Output:
• "Apple" → ORGANIZATION
• "iPhone" → PRODUCT

5. Pragmatic Analysis (Context Understanding)


This step considers real-world knowledge, tone, and intent:

✔ Sentiment Analysis – Determines positive, negative, or neutral emotions.
✔ Coreference Resolution – Identifies references to the same entity ("John" = "he" in later sentences).
✔ Speech Act Recognition – Detects whether a sentence is a statement, question, request, or command.
Example:
"John bought a Tesla. He loves it!"
✔ Coreference Resolution Output:
• "He" → John
• "it" → Tesla

6. Output Generation (Final Processing)


The system produces output based on NLP tasks such as:

✔ Text Classification – Categorizing emails as spam or ham.


✔ Machine Translation – Converting text between languages (e.g., Google Translate).
✔ Text Summarization – Extracting key points from documents.
✔ Chatbot Responses – Generating meaningful replies.
✔ Information Retrieval – Extracting relevant data from databases.

4. How does knowledge play a role in language processing?


Knowledge plays a crucial role in Natural Language Processing (NLP) as it helps machines understand, interpret, and generate human language more effectively. NLP systems rely on several types of knowledge to process language accurately, including phonological, morphological, syntactic, semantic, pragmatic, discourse, and world knowledge.

5. Explain the concept of ambiguity in natural language with examples.


Ambiguity in natural language refers to situations where a word, phrase, or sentence has multiple possible
interpretations. This creates challenges for both humans and machines in understanding the intended
meaning correctly. Ambiguity is a fundamental problem in Natural Language Processing (NLP) because
human languages are often context-dependent.

10. Why is ambiguity a major challenge in NLP? Provide examples.


Ambiguity is one of the biggest challenges in Natural Language Processing (NLP) because human languages
are inherently complex, context-dependent, and open to multiple interpretations. Ambiguity affects
various NLP tasks like machine translation, sentiment analysis, information retrieval, and chatbot
responses, leading to inaccurate or unintended results.
Key Reasons Why Ambiguity is Challenging in NLP

1. Multiple Interpretations
• NLP models often struggle to determine the intended meaning of words, phrases, or sentences.
• Machines lack human intuition and world knowledge to resolve ambiguity effectively.
Example (Lexical Ambiguity):
• "The bank is closed today."
o Bank (financial institution)
o Bank (riverbank)
Challenge: Without context, NLP models may misinterpret "bank" incorrectly.
2. Complexity in Sentence Structure
• Syntactic ambiguity arises when a sentence can be parsed in multiple ways, leading to different
meanings.
Example (Syntactic Ambiguity):
• "I saw the man with a telescope."
o Did I use a telescope to see the man?
o Or did the man have a telescope?
Challenge: Parsing algorithms may struggle to determine the correct sentence structure.
3. Meaning Changes Based on Context
• Semantic ambiguity occurs when words or phrases have different meanings depending on the
context.
Example (Semantic Ambiguity):
• "He is looking for a match."
o Match (for lighting a fire)
o Match (sports event)
o Match (romantic partner)
Challenge: NLP models need context-aware mechanisms (e.g., transformer-based models like BERT) to
infer the correct meaning.
4. Pronoun Reference Issues
• Anaphoric ambiguity occurs when pronouns can refer to multiple entities, making coreference
resolution difficult.
Example (Anaphoric Ambiguity):
• "John told Mike that he won the game."
o Who won? John or Mike?
Challenge: NLP systems must accurately link pronouns to the correct nouns using coreference
resolution techniques.
5. Real-World Knowledge & Pragmatics
• Pragmatic ambiguity arises when sentences require external world knowledge to interpret
correctly.
Example (Pragmatic Ambiguity):
• "Can you pass the salt?"
o Literal Meaning: Are you physically capable of passing the salt?
o Intended Meaning: Please pass me the salt.
Challenge: NLP models must infer speaker intentions using context and common sense reasoning.
Impact of Ambiguity in NLP Applications

• Machine Translation – Words with multiple meanings can lead to incorrect translations.
• Sentiment Analysis – Sarcasm or ambiguous words may cause misclassification.
• Chatbots & Virtual Assistants – Ambiguous inputs may result in irrelevant responses.
• Search Engines – Wrong interpretation can return inaccurate search results.
