
Compositional Semantics & Word Sense Disambiguation (WSD)

Your discussion highlights key challenges in compositional semantics, where the meaning of a
sentence is built from the meanings of its parts (words). A major hurdle is word sense ambiguity:
the same word (nail) can evoke different meanings depending on context. Below is a structured
breakdown of the issues and approaches you mentioned.

1. The Problem of Word Sense

 Is there a finite set of senses per word?

o Lexicographers (e.g., in dictionaries) attempt to list senses, but real-world usage is fluid.

o Some theories (e.g., Prototype Theory) argue that word meanings are fuzzy, with central and peripheral senses.

o Distributional Semantics (e.g., word embeddings) suggests meanings are continuous rather than discrete.

 Does a word evoke a single sense or a distribution?

o Traditional WSD assumes one correct sense per context (discrete).

o Probabilistic models (e.g., Bayesian WSD) assign probabilities to possible senses.

o Neural approaches (e.g., BERT) generate contextualized embeddings where meaning is a gradient.

2. Approaches to Word Sense Disambiguation (WSD)

(A) Rule-Based & Knowledge-Based Methods

 Early systems relied on:

o Dictionaries (e.g., WordNet) listing predefined senses.

o Selectional restrictions (e.g., hammer → tool sense of nail).

 Example: Lesk Algorithm (1986) compares dictionary glosses of nearby words.
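As a rough illustration of the gloss-overlap idea, here is a minimal sketch using NLTK's simplified Lesk implementation (assumes NLTK and its WordNet data are installed; simplified Lesk is a weak baseline, so the chosen synsets may be imperfect):

```python
# Sketch: simplified Lesk via NLTK (assumes `pip install nltk` plus
# nltk.download('wordnet'), nltk.download('omw-1.4'), nltk.download('punkt')).
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent_tool = word_tokenize("He drove a nail into the wall with a hammer.")
sent_body = word_tokenize("She went to the salon to get her nails clipped.")

# lesk() picks the WordNet synset whose gloss overlaps most with the context words.
for context in (sent_tool, sent_body):
    synset = lesk(context, "nail", pos=wn.NOUN)
    print(synset, "->", synset.definition() if synset else "no sense found")
```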

(B) Unsupervised Methods (Word Sense Induction)

 Clustering (e.g., k-means) groups similar word usages into "induced senses."

 Topic Modeling (e.g., LDA) treats senses as latent topics.

 Advantage: Adaptable to new domains without labeled data.

(C) Supervised & Semi-Supervised Learning

 Classic ML (e.g., SVM, Decision Trees) trained on sense-annotated corpora (e.g., SemCor).

 Neural WSD:
o Uses contextual embeddings (e.g., BERT) to predict the most likely sense.

o Can rank senses by probability (soft disambiguation).

 Limitation: Requires large labeled datasets; struggles with rare senses.

3. Example: Disambiguating "Nail"

Your examples show how co-occurring words disambiguate nail:

Sentence Trigger Words Sense of "Nail"

1. ...with a hammer hammer Metallic fastener

2. ...from the hardware store hardware store Metallic fastener

3. ...to get his nails clipped beauty salon, clipped Fingernail

4. ...to get a manicure manicure Fingernail

Here, selectional preferences (e.g., hammer only fits the tool sense) resolve ambiguity.

4. Challenges in WSD

 Granularity of senses: Should nail (body part) and nail (metal spike) be separate senses, or
sub-senses?

 Domain dependence: Nail in construction vs. beauty contexts.

 Dynamic meanings: New senses emerge (e.g., "nail it" = succeed).

5. Integration with Compositional Semantics

For full semantic interpretation, WSD interacts with:

1. Syntax: Parse trees help rule out implausible meanings.

2. Pragmatics: World knowledge (e.g., salons don’t sell metal nails).

3. Discourse: Prior mentions (e.g., "He bought nails" primes the hardware sense).

Key Takeaways

 Word sense is context-dependent and may not be strictly discrete.

 Hybrid approaches (knowledge + machine learning) often work best.

 Modern NLP leans toward distributional semantics (embeddings) over rigid sense
inventories.

Types of Word Sense Ambiguity in Compositional Semantics


Your discussion highlights three primary types of word sense ambiguity: homonymy, polysemy,
and categorial ambiguity. Below is a structured breakdown of each type, along with how they
impact semantic interpretation.

1. Homonymy

Definition:

 Two or more unrelated meanings share the same spelling and pronunciation.

 The senses are orthogonal (no semantic connection).

Examples:

Word Sense 1 Sense 2

bank Financial institution Side of a river

bat Flying mammal Sports equipment

nail Metallic fastener Fingernail

Key Properties:

 Discrete meanings: No overlap between senses.

 Requires strong contextual cues for disambiguation (e.g., "withdraw money from the
bank" vs. "swim near the river bank").

2. Polysemy

Definition:

 A word has multiple related senses, often derived from a core meaning.

 Unlike homonymy, the senses are connected metaphorically or metonymically.

Examples:

Word Core Sense Extended Senses

bank Financial institution Bank of clouds (collection)

head Body part Head of a company (leader)

mouse Small rodent Computer mouse (shape analogy)

Key Properties:

 Graded meaning shift: Some senses are closer to the core meaning than others.

 Often requires world knowledge (e.g., "bank of clouds" implies a stacked formation, similar
to how money is stored in a bank).
 More challenging for machines than homonymy because boundaries between senses are
fuzzy.

3. Categorial Ambiguity

Definition:

 A word belongs to different grammatical categories (e.g., noun vs. verb) with distinct
meanings.

 Unlike homonymy/polysemy, syntactic role alone can resolve the ambiguity.

Examples:

Word Noun Sense Verb Sense

book A readable volume To book a ticket (reserve)

fish Aquatic animal To fish for compliments (seek)

light Illumination source To light a candle (ignite)

Key Properties:

 Resolvable via part-of-speech (POS) tagging (e.g., if book is tagged as a verb, it cannot mean
a physical book).

 Easier for NLP systems because syntax provides clear disambiguation cues.

How These Ambiguities Impact Semantic Interpretation

1. Disambiguation Strategies

Type | Resolution Method | Example

Homonymy | Strong contextual cues (lexical, world knowledge) | "River bank" vs. "Bank account"

Polysemy | Semantic similarity, metaphor detection | "Bank of data" (storage metaphor)

Categorial Ambiguity | Syntactic parsing (POS tags, dependency trees) | "Book a flight" (verb) vs. "Read a book" (noun)

2. Challenges in NLP

 Homonymy & Polysemy:

o Rule-based systems (e.g., WordNet) struggle with fine-grained sense distinctions.

o Neural models (e.g., BERT) handle polysemy better by generating contextualized


embeddings.

 Categorial Ambiguity:

o Solved early in the NLP pipeline (POS tagging), but errors propagate if misclassified.
3. Theoretical Implications

 Lexical Semantics: Should word senses be discrete (homonymy-style) or continuous


(polysemy-style)?

 Cognitive Linguistics: Do humans store all senses separately, or derive them dynamically?

Conclusion

 Homonymy: Unrelated meanings (hard disambiguation).

 Polysemy: Related, extended meanings (graded disambiguation).

 Categorial Ambiguity: Resolved via syntax (easiest for machines).

Modern NLP systems (e.g., transformer models) blend distributional semantics (embeddings)
with symbolic knowledge (WordNet) to handle these ambiguities. However, polysemy remains the
hardest due to its fluid, context-dependent nature.

Would you like a deeper dive into how BERT or FrameNet handles these ambiguities?

N-Gram Models and Their Limitations: A Deep Dive

1. What is an N-Gram Model?

An N-gram model is a probabilistic language model that predicts the next word in a sequence based
on the previous N−1 words. It relies on the Markov assumption, which simplifies language by
assuming that the probability of a word depends only on a fixed window of prior words.

Types of N-Gram Models

Model Context Window Example

Unigram N=1 (no context) P("cat")

Bigram N=2 (1 previous word) P("sat" ∣ "cat")

Trigram N=3 (2 previous words) P("on" ∣ "cat sat")

Higher N N>3 (more context) Rarely used due to sparsity.

Probability Calculation

For a bigram model:

P(w_n \mid w_{n-1}) = \frac{\text{Count}(w_{n-1}, w_n)}{\text{Count}(w_{n-1})}

 Example: If "cat sat" appears 100 times and "cat" appears 200
times, P("sat"∣"cat")=0.5P("sat"∣"cat")=0.5.
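A minimal sketch of this maximum-likelihood bigram estimate in plain Python (the toy corpus is an assumption for illustration):

```python
from collections import Counter

# Toy corpus; in practice counts come from a large training corpus.
tokens = "the cat sat on the mat the cat sat on the hat".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(prev_word, word):
    """Maximum-likelihood estimate P(word | prev_word) = Count(prev, word) / Count(prev)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # unseen history; smoothing (discussed later) addresses this
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("cat", "sat"))  # 1.0 in this toy corpus: "cat" is always followed by "sat"
```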

2. Key Limitations of N-Gram Models

(1) Data Sparsity


 Problem: Many possible word combinations (n-grams) never appear in training data, leading to zero probabilities.

o Example: The phrase "purple unicorn dances" may never occur, making P("dances" ∣ "purple unicorn") = 0.

 Solutions:

o Smoothing (e.g., Laplace, Good-Turing): Assign small probabilities to unseen n-grams.

o Backoff: Use lower-order n-grams (e.g., bigrams if trigrams are missing).

(2) Computational and Memory Constraints

 Problem: Storing all possible n-grams requires exponential memory (e.g., a 5-gram model over a 50k-word vocabulary has 50,000^5 possible entries).

 Solutions:

o Pruning: Discard rare n-grams.

o Hashing: Compress n-gram tables.

(3) No Long-Range Dependencies

 Problem: N-grams only consider local context (e.g., bigrams ignore words beyond the
immediate predecessor).

o Example: In "The cat that I adopted yesterday sat on the mat," the relationship
between "cat" and "sat" is lost in a bigram model.

 Solution: Use RNNs/LSTMs or Transformers (e.g., BERT) to capture long-range context.

(4) Out-of-Vocabulary (OOV) Words

 Problem: New/unseen words break the model (e.g., slang, rare terms).

 Solutions:

o Subword tokenization (e.g., Byte-Pair Encoding in BERT).

o Unknown word tokens (e.g., <UNK>).
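As an illustrative sketch of subword tokenization (assumes the HuggingFace transformers library and the bert-base-uncased vocabulary; the exact subword pieces depend on the learned vocabulary):

```python
# Sketch: WordPiece subword tokenization (assumes `pip install transformers`
# and network access to fetch the bert-base-uncased vocabulary).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A rare word is split into known subword pieces instead of becoming a single
# unknown-word token; the exact pieces depend on the vocabulary.
print(tokenizer.tokenize("unbelievability"))
print(tokenizer.tokenize("ChatGPT"))
```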

(5) Lack of Semantic Understanding

 Problem: N-grams treat words as discrete symbols, ignoring meaning.

o Example: "Bank" (financial vs. river) cannot be disambiguated without deeper


context.

 Solutions:

o Word embeddings (Word2Vec, GloVe) for semantic similarity.

o Contextual embeddings (BERT, GPT) for dynamic word sense disambiguation.


3. Why Modern NLP Moved Beyond N-Grams

Aspect N-Gram Models Neural Models (e.g., BERT)

Context Handling Fixed window (local) Full-sentence attention (global)

Sparsity Prone to zero probabilities Generalizes via embeddings

Memory Grows exponentially with N Compact (shared embeddings)

Semantics No word meaning Captures polysemy, synonyms, and syntax

4. When Are N-Grams Still Useful?

Despite limitations, n-grams are still used for:

1. Simple tasks (e.g., spell-checking, autocomplete).

2. Baseline models in NLP research.

3. Feature extraction in hybrid systems (e.g., n-grams + ML classifiers).

5. Example: N-Gram vs. BERT for "Bank" Disambiguation

 N-Gram: Fails to distinguish "bank" in:

o "Deposit money at the bank." (financial)

o "Fish near the river bank." (geographical)

 BERT: Uses contextual attention to infer the correct sense.
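A hedged sketch of the contrast (assumes PyTorch and HuggingFace transformers are installed; this only compares contextual vectors of "bank" and is not a full WSD system):

```python
# Sketch: compare contextual embeddings of "bank" in different sentences
# (assumes `pip install torch transformers` and network access for the model).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence):
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_money = bank_vector("deposit money at the bank .")
v_river = bank_vector("fish near the river bank .")
v_loan = bank_vector("the bank approved my loan .")

cos = torch.nn.functional.cosine_similarity
# Expect the two financial uses to be more similar to each other than to the river use.
print("money vs loan :", cos(v_money, v_loan, dim=0).item())
print("money vs river:", cos(v_money, v_river, dim=0).item())
```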

Conclusion

N-gram models are simple but limited by sparsity, fixed context, and lack of semantics. Modern NLP
leverages neural networks (Transformers, embeddings) to overcome these issues, though n-grams
remain useful for lightweight applications.

Next Steps: Explore Word2Vec, BERT, or smoothing techniques in detail? Let me know!

reference resolution techniques

Reference Resolution Techniques in Natural Language Processing (NLP)

Reference resolution is the task of identifying what entities or concepts pronouns, demonstratives,
or other referring expressions point to in a discourse. It is crucial for tasks like coreference
resolution, anaphora resolution, and discourse understanding. Below are key techniques and their
applications.

1. Types of Reference Resolution


(A) Anaphora Resolution

 Resolves pronouns (e.g., he, she, it) to their antecedents.

o "John left early because he was tired." → "he" = John

(B) Cataphora Resolution

 Resolves references where the pronoun appears before the antecedent.

o "When he arrived, John sat down." → "he" = John

(C) Coreference Resolution

 Links all mentions of the same entity in a text.

o "Mary called her friend. She was happy." → "her", "she" = Mary

(D) Bridging References

 Resolves references where an entity is indirectly related to another.

o "I entered the room. The ceiling was high." → "the ceiling" belongs to "the room"

2. Key Techniques for Reference Resolution

(1) Rule-Based & Syntax-Driven Methods

 Morphological & Syntactic Constraints

o Pronouns must agree in gender, number, and grammatical role.

 "The doctor finished her shift." → "her" must refer to a female doctor.

o Uses parse trees to check grammatical relationships.

 Hobbs' Algorithm (1978)

o A classic rule-based method that traverses a syntactic tree to find the closest
compatible antecedent.

(2) Machine Learning & Statistical Approaches

 Feature-Based Models

o Uses supervised learning (e.g., SVM, decision trees) with features like:

 Distance (how far the pronoun is from the candidate antecedent).

 Grammatical role (subject/object preference).

 Semantic compatibility (e.g., "The car… it" vs. "The car… she").

 Mention-Pair Models

o Treats coreference as a binary classification problem:

 Does "he" refer to "John"? (Yes/No)


(3) Neural & Deep Learning Approaches

 End-to-End Neural Coreference Models

o Lee et al. (2017) – Uses span representations to predict coreference clusters.

o BERT-based models (e.g., CorefBERT) leverage contextual embeddings for better disambiguation.

 Transformer-Based Models (e.g., SpanBERT)

o Improves performance by masking contiguous word spans during pretraining.

(4) Hybrid & Knowledge-Based Methods

 Combines rules + ML for robustness.

 Uses ontologies (WordNet, FrameNet) to resolve semantic ambiguities.

o "The bank collapsed. It was near the river." → WordNet helps distinguish financial
vs. river bank.

3. Challenges in Reference Resolution

Challenge | Example | Solution

Ambiguous Antecedents | "John told Bill he won." (Who won?) | BERT-style contextual disambiguation

Zero Anaphora | "She left and Ø closed the door." (Implied subject) | Discourse modeling

Split Antecedents | "John and Mary met. They left." (Plural reference) | Entity clustering

World Knowledge Needed | "The iPhone crashed. It needed a reboot." (What is "it"?) | Knowledge graphs (e.g., Wikidata)

4. Evaluation Metrics

 MUC (1995) – Measures link-based coreference chains.

 B³ (1998) – Evaluates mention-based clustering.

 CEAF (2005) – Uses entity alignment metrics.

 BLANC (2012) – Focuses on link prediction.

5. Applications

1. Machine Translation (Resolving pronouns across languages).

2. Summarization (Tracking entities in long documents).


3. Dialogue Systems (Understanding user references in chatbots).

4. Question Answering (Resolving "Who does 'he' refer to?").

6. Tools & Libraries

Tool Description

Stanford CoreNLP Rule-based + statistical coreference.

spaCy (neuralcoref) Neural coreference resolution.

AllenNLP (CorefModel) State-of-the-art BERT-based coreference.

HuggingFace (CorefBERT) Pretrained transformer models.
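As a usage sketch for one of these tools, here is the (now archived) neuralcoref extension for spaCy 2.x; the model name and pipeline setup are assumptions and may not work with newer spaCy versions:

```python
# Sketch: coreference resolution with spaCy 2.x + neuralcoref
# (assumes `pip install spacy==2.3.* neuralcoref` and the en_core_web_sm model;
# neuralcoref is archived and does not support spaCy 3.x).
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)  # registers the 'neuralcoref' pipeline component

doc = nlp("Mary called her friend. She was happy.")
if doc._.has_coref:
    for cluster in doc._.coref_clusters:
        # Each cluster groups mentions that the model thinks co-refer.
        print(cluster.main.text, "<-", [m.text for m in cluster.mentions])
```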

7. Future Directions

 Multimodal Coreference (Resolving references in text + images/video).

 Cross-Lingual Coreference (Handling references in multilingual texts).

 Explainable AI (Why did the model choose an antecedent?).

Conclusion

Reference resolution is essential for coherent NLP systems, evolving from rule-based to neural
methods. While transformers (BERT, GPT) now dominate, challenges like ambiguity and world
knowledge integration remain open problems.

Would you like a deeper dive into a specific technique (e.g., Hobbs' algorithm or SpanBERT)? Let
me know!

cohesion vs coherence in discourse

Cohesion vs. Coherence in Discourse: Key Differences

In discourse analysis, cohesion and coherence are fundamental concepts that determine how well a
text holds together. While they are related, they address different aspects of textual unity.

1. Cohesion: Surface-Level Connections

Definition:
Cohesion refers to the grammatical and lexical devices that link sentences and clauses explicitly,
making the text structurally connected.

Types of Cohesive Devices


Type Example Function

Reference "John arrived. He was late." Pronouns (he, she, it) refer back to nouns.

Substitution "I prefer tea. Do you want some?" Replaces a word (some = some tea).

Ellipsis "Who took the book? Mary [did]." Omits redundant words.

Conjunctions "It rained, so we stayed home." Links clauses (so, because, however).

Lexical Cohesion "The cat purred. The animal slept." Synonyms/repetition tie ideas together.

Key Point:
Cohesion is explicit—it relies on visible linguistic markers that create formal connections between
sentences.

2. Coherence: Meaningful Logical Flow

Definition:
Coherence refers to the underlying logical consistency and meaningful organization of ideas in a
text, making it understandable to the reader.

Elements of Coherence

Aspect | Example | Function

Topic Continuity | "Global warming is a crisis. It threatens ecosystems." | Maintains a central theme.

Logical Order | "First, boil water. Next, add tea leaves." | Sequences ideas chronologically.

Cause-Effect | "She forgot her keys, so she was locked out." | Explains relationships between events.

World Knowledge | "The CEO resigned. Stocks plummeted." | Readers infer causation via context.

Key Point:
Coherence is implicit—it depends on the reader’s ability to infer connections, even if explicit
markers (like conjunctions) are missing.

3. Key Differences

Feature Cohesion Coherence

Nature Surface-level (linguistic links) Deep-level (logical meaning)

Visibility Explicit (grammar/lexical ties) Implicit (reader inference)

Dependence Needs cohesive markers Can exist without cohesive devices

Example "She left. Then, he cried." "She left. He was heartbroken."


Analogy:

 Cohesion = The bricks and mortar holding a wall together.

 Coherence = The architectural design ensuring the wall stands logically.

4. Why Both Matter

 A text can be cohesive but not coherent:

o "The sun is hot. Coffee is bitter. She runs fast." (Grammatically linked but illogical).

 A text can be coherent but not cohesive:

o "Rain. Floods. Evacuations." (No conjunctions/pronouns, but the meaning is clear).

Best Practices:

 Use cohesive devices to guide readers (e.g., "However, in contrast").

 Ensure coherence through logical flow (e.g., problem → solution).

5. Applications in NLP

 Machine Translation: Must preserve cohesion (pronoun resolution) and coherence (logical
order).

 Text Summarization: Cohesive chains (entity repetition) help retain key ideas.

 Chatbots: Need coherence to maintain conversational logic.

Tools for Analysis:

 Cohesion: Use Coreference Resolution (e.g., spaCy’s neuralcoref).

 Coherence: Measure topic modeling (e.g., LDA) or entity grid models.

Conclusion

 Cohesion = "How" ideas are linked (grammar/lexis).

 Coherence = "Why" ideas make sense (logic/meaning).


Strong discourse requires both.

Want to explore how to measure coherence computationally? Let me know!

what is perplexity in language modelling how is it used to evaluate language models

Perplexity in Language Modeling: Definition and Evaluation

Perplexity is a key metric used to evaluate how well a language model predicts a sequence of
words. It measures the uncertainty or "confusion" of the model when assigning probabilities to
unseen text. Lower perplexity indicates better performance.
1. Mathematical Definition

Perplexity (PP) is defined as the exponential of the cross-entropy loss:

\text{Perplexity} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_{<i})\right)

where:

 N = number of words in the test corpus,

 P(w_i ∣ w_{<i}) = probability assigned by the model to the i-th word given its context.

Intuition Behind the Formula

 If the model assigns high probability to the correct next word, perplexity decreases.

 If the model is uncertain (low probability for correct words), perplexity increases.

Example:

 A perplexity of 20 means the model was as "confused" as if it had to choose uniformly among 20 possible words at each step.

2. How Perplexity Evaluates Language Models

(A) Lower Perplexity = Better Model

 A lower perplexity means the model is more confident in its predictions.

 Example:

o Model A (Perplexity = 50) is worse than Model B (Perplexity = 30).

(B) Comparing Different Models

 Used to compare:

o N-gram models vs. neural models (e.g., LSTM vs. Transformer).

o Different training datasets (e.g., Wikipedia vs. Twitter data).

o Hyperparameter tuning (e.g., model size, learning rate).

(C) Limitations of Perplexity

1. Does Not Measure Fluency or Coherence

o A model may achieve low perplexity but generate nonsensical text (e.g., GPT-2
memorizing training data).

2. Corpus-Dependent

o Perplexity values are only meaningful when comparing models on the same test set.

3. Biased Towards Frequent Words


o Models may optimize for predicting common words (e.g., "the," "a") rather than
rare but meaningful terms.

3. Practical Example

Suppose a bigram language model is tested on the sentence:

"The cat sat on the mat."

Word (w_i) Previous Word (w_{i−1}) P(w_i ∣ w_{i−1}) −log P(w_i ∣ w_{i−1})

"cat" "The" 0.4 0.916

"sat" "cat" 0.3 1.204

"on" "sat" 0.6 0.511

"the" "on" 0.5 0.693

"mat" "the" 0.2 1.609

Perplexity Calculation:

Cross-entropy = (0.916 + 1.204 + 0.511 + 0.693 + 1.609) / 5 = 0.9866

Perplexity = e^0.9866 ≈ 2.68

This means the model is roughly as uncertain as choosing between ~2.7 words at each step.
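A minimal sketch reproducing the numbers above in plain Python (probabilities taken from the table):

```python
import math

# Bigram probabilities from the worked example above.
probs = [0.4, 0.3, 0.6, 0.5, 0.2]  # P("cat"|"The"), P("sat"|"cat"), ...

cross_entropy = -sum(math.log(p) for p in probs) / len(probs)
perplexity = math.exp(cross_entropy)

print(round(cross_entropy, 4))  # ~0.9866
print(round(perplexity, 2))     # ~2.68
```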

4. Perplexity in Modern NLP

 Transformers (GPT, BERT): Achieve far lower perplexity than N-gram models on benchmark datasets (e.g., WikiText-2, Penn Treebank).

 N-gram Models: Typically have much higher perplexity (e.g., 100+ for trigram models).

 Human Baseline: Estimated to be ~10-20 (humans also face ambiguity in language).

5. When to Use Perplexity?

Use Case Recommended Metric(s)

Pre-training evaluation Perplexity

Fine-tuning tasks BLEU, ROUGE (for translation/summarization)

Human-like text generation Human evaluation (e.g., fluency, coherence)

6. Key Takeaways

1. Perplexity measures how "surprised" a model is by unseen text.


2. Lower values = better predictive performance.

3. Useful for comparing models but doesn’t guarantee good text generation.

4. Works best alongside other metrics (e.g., BLEU, human judgment).

Smoothing Techniques in N-Gram Models

Smoothing is a crucial technique in N-gram language modeling to handle unseen word sequences (zero-probability issues) and improve generalization. Below are the key methods, ranked from simplest to most advanced:

1. Why Smoothing is Needed?

Problem:

 N-gram models assign zero probability to unseen N-grams (due to data sparsity).

 Example: If "cat sat" never appeared in training, P("sat"∣"cat")=0P("sat"∣"cat")=0.

Solution:
Smoothing redistributes probability mass to unseen events while discounting seen ones.

2. Types of Smoothing Techniques

(1) Laplace (Add-One) Smoothing

 Idea: Add 1 to all N-gram counts (including unseen ones).

 Formula:

P(w_i \mid w_{i-1}) = \frac{\text{Count}(w_{i-1}, w_i) + 1}{\text{Count}(w_{i-1}) + V}

o V = Vocabulary size (total unique words).

 Pros: Simple.

 Cons: Over-smoothes (assigns too much mass to rare/unseen N-grams).

Example:

 If "cat sat" occurred 15 times and "cat" occurred 50 times, with V=1000V=1000:

PLaplace("sat"∣"cat")=15+150+1000≈0.015PLaplace("sat"∣"cat")=50+100015+1≈0.015

(2) Add-K Smoothing (Lidstone Smoothing)

 Improvement: Add a fractional k (e.g., 0.5) instead of 1.

 Formula:

P(w_i \mid w_{i-1}) = \frac{\text{Count}(w_{i-1}, w_i) + k}{\text{Count}(w_{i-1}) + k \cdot V}

 Pros: More flexible than Laplace.

 Cons: Requires tuning k.

(3) Good-Turing Smoothing

 Idea: Estimate the probability of unseen N-grams using the frequency of N-grams seen
once ("singletons").

 Formula:

P_{GT} = \frac{N_1}{N} \quad \text{(for unseen N-grams)}

o N_1 = Number of N-grams seen once.

o N = Total N-grams.

 Pros: Theoretically grounded.

 Cons: Fails for high-frequency N-grams.

Example:

 If 10 bigrams appeared once in a corpus of 1000 bigrams:

P_GT(unseen) = 10 / 1000 = 0.01
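A tiny sketch of the basic Good-Turing estimate of the total unseen mass, checked against the toy numbers above:

```python
from collections import Counter

def good_turing_unseen_mass(ngram_counts):
    """Basic Good-Turing: total probability mass reserved for unseen n-grams
    is N1 / N, where N1 = number of n-grams seen exactly once."""
    n_total = sum(ngram_counts.values())
    n_once = sum(1 for c in ngram_counts.values() if c == 1)
    return n_once / n_total

# Toy check matching the example: 10 singleton bigrams out of 1000 bigram tokens.
counts = Counter({("a", "b"): 990})
counts.update({("x", str(i)): 1 for i in range(10)})
print(good_turing_unseen_mass(counts))  # 10 / 1000 = 0.01
```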

(4) Jelinek-Mercer (Linear Interpolation)

 Idea: Combine higher-order and lower-order N-grams (e.g., trigram + bigram + unigram).

 Formula:

P(w_i \mid w_{i-2}, w_{i-1}) = \lambda_1 P_{\text{trigram}} + \lambda_2 P_{\text{bigram}} + \lambda_3 P_{\text{unigram}}

o The λ's are interpolation weights with ∑λ = 1.

 Pros: Balances context specificity and data sparsity.

 Cons: Requires tuning the λ's (often via the EM algorithm).
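A minimal sketch of linear interpolation (the λ weights are illustrative; in practice they are tuned on held-out data, e.g., via EM):

```python
def interpolated_prob(p_trigram, p_bigram, p_unigram, lambdas=(0.6, 0.3, 0.1)):
    """Jelinek-Mercer linear interpolation: weighted mix of n-gram orders.
    The weights must sum to 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_trigram + l2 * p_bigram + l3 * p_unigram

# Toy example: the trigram estimate is zero (unseen), but lower orders rescue it.
print(interpolated_prob(p_trigram=0.0, p_bigram=0.3, p_unigram=0.01))  # 0.091
```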

(5) Kneser-Ney Smoothing

 State-of-the-art: Used in modern N-gram models (e.g., Google N-gram Corpus).

 Key Idea:

o Absolute Discounting: Subtract a fixed discount D from seen N-gram counts.

o Continuation Probability: Estimate P(w_i) based on how many contexts it appears in (not raw frequency).

 Formula:

P_{KN}(w_i \mid w_{i-1}) = \frac{\max(\text{Count}(w_{i-1}, w_i) - D, 0)}{\text{Count}(w_{i-1})} + \lambda(w_{i-1}) \, P_{\text{continuation}}(w_i)

o P_{\text{continuation}}(w_i) = \frac{|\{w_{i-1} : \text{Count}(w_{i-1}, w_i) > 0\}|}{|\text{All bigrams}|}

 Pros: Handles rare words better (e.g., "San Francisco" vs. "Francisco" alone).

3. Comparison of Smoothing Methods

Method Handles Unseen Handles Rare Complexity Used In

Laplace Yes Poorly Low Basic NLP

Good-Turing Yes Moderately Medium Small corpora

Jelinek-Mercer Yes Well High Speech recognition

Kneser-Ney Yes Best Highest SOTA N-gram models

4. Practical Example

Task: Compute P("sat" ∣ "cat") with Kneser-Ney smoothing.

 Training Corpus:

o Bigram "cat sat" appeared 15 times.

o Word "cat" appeared 50 times.

o "sat" appeared in 3 distinct contexts (e.g., "cat sat", "dog sat", "bird sat").

o Total unique bigrams = 1000.

o Discount D = 0.75.

Calculation:

1. Discount seen bigrams:

max(15 − 0.75, 0) / 50 = 14.25 / 50 = 0.285

2. Continuation probability:

P_continuation("sat") = 3 / 1000 = 0.003

3. Normalization factor λ:

λ("cat") = (0.75 × |{w : Count("cat", w) > 0}|) / 50

If "cat" precedes 5 unique words, λ = (0.75 × 5) / 50 = 0.075.

4. Final probability:

P_KN("sat" ∣ "cat") = 0.285 + 0.075 × 0.003 ≈ 0.285
5. Which One to Use?

 Small datasets: Good-Turing or Laplace.

 Medium datasets: Jelinek-Mercer.

 Large datasets: Kneser-Ney (industry standard).

Modern NLP Note:


While smoothing is vital for N-grams, neural LMs (e.g., BERT, GPT) use subword
tokenization and embedding dropout to handle sparsity implicitly.

Summary

Smoothing ensures N-gram models generalize to unseen data. Kneser-Ney is the gold standard, but
simpler methods (Laplace, Good-Turing) are useful for understanding the fundamentals.

Need code implementations? Ask for Python examples!

Explain the architecture of NLG systems with a block diagram.

Architecture of Natural Language Generation (NLG) Systems

Natural Language Generation (NLG) converts structured data into human-readable text. Below is
a block diagram and explanation of its key components:

Block Diagram of NLG System

+------------------+     +------------------+     +------------------+     +------------------+
|    Data Input    | --> |     Content      | --> |       Text       | --> |      Output      |
| (Structured Data)|     |     Planning     |     |   Realization    |     |(Natural Language |
|                  |     |  (What to Say)   |     | (How to Say It)  |     |   Text/Speech)   |
+------------------+     +------------------+     +------------------+     +------------------+
                                  |                        |
                                  v                        v
                         +------------------+     +------------------+
                         |    Discourse     |     |    Lexical &     |
                         |   Structuring    |     |Syntactic Choices |
                         +------------------+     +------------------+
Key Components Explained

1. Data Input (Structured Data)

 Input: Tables, databases, JSON, or knowledge graphs.

 Example: Weather data ({temp: 25°C, humidity: 80%}).

2. Content Planning (What to Say)

 Task: Decides what information to include.

 Methods:

o Rule-based: Templates (If rain → mention umbrella).

o ML-based: Entity importance ranking.

 Output: Message plan (["temp=25", "humidity=high"]).

3. Discourse Structuring (Logical Flow)

 Task: Organizes content into a coherent structure.

 Methods:

o Rhetorical Structure Theory (RST): Creates logical relations (cause-effect, contrast).

o Sequencing: Time/importance order ("High humidity (80%) with 25°C").

4. Text Realization (How to Say It)

 Task: Converts structured messages into grammatical sentences.

 Sub-components:

o Lexical Choice: Picks words ("high" vs "elevated" humidity).

o Syntax Generation: Applies grammar rules ("The temperature is 25°C").

o Referring Expressions: Handles pronouns ("It will rain").

5. Output (Natural Language Text/Speech)

 Final Output:

o Text: "Today’s temperature is 25°C with high humidity (80%)."

o Speech: Converted via TTS (e.g., Alexa, Siri).

Types of NLG Systems

Type Approach Example

Template-Based Fixed placeholders "The {temp} is {value}°C."


Pipeline Sequential (as above) Classic NLG (e.g., WEATHERGEN)

End-to-End Neural networks (Seq2Seq) GPT-3, T5

Example: Weather Report Generation

1. Input: {temp: 25, humidity: 80, rain: false}

2. Content Planning: Include temp & humidity; skip rain.

3. Discourse: Order by importance → Temp → Humidity.

4. Realization:

o Lexical: "high" for humidity.

o Syntax: "The temperature is 25°C with high humidity (80%)."

5. Output: Final text.
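As a minimal sketch of the template-based pipeline just described (the input keys, the humidity threshold, and the wording are assumptions for illustration, not a production NLG system):

```python
def generate_weather_report(data):
    """Tiny template-based NLG pipeline: content selection -> ordering -> realization."""
    # 1. Content planning: decide what to mention.
    messages = [("temp", data["temp"])]
    if data["humidity"] >= 70:          # assumed threshold for calling humidity "high"
        messages.append(("humidity", data["humidity"]))
    if data.get("rain"):
        messages.append(("rain", True))

    # 2. Discourse structuring: fixed order of importance (temp before humidity).
    order = {"temp": 0, "humidity": 1, "rain": 2}
    messages.sort(key=lambda m: order[m[0]])

    # 3. Surface realization: fill lexicalized templates.
    parts = []
    for kind, value in messages:
        if kind == "temp":
            parts.append(f"The temperature is {value}°C")
        elif kind == "humidity":
            parts.append(f"with high humidity ({value}%)")
        elif kind == "rain":
            parts.append("and rain is expected")
    return " ".join(parts) + "."

print(generate_weather_report({"temp": 25, "humidity": 80, "rain": False}))
# -> "The temperature is 25°C with high humidity (80%)."
```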

Modern NLG (Neural Approaches)

 Architecture:

+-------------------+     +-------------------+     +-------------------+
|      Encoder      | --> |      Decoder      | --> |       Text        |
| (Data → Vectors)  |     | (Vectors → Text)  |     |      Output       |
+-------------------+     +-------------------+     +-------------------+

 Models: GPT-3, BART, T5 (no explicit planning; end-to-end learning).

Key Challenges

1. Coherence: Maintaining logical flow in long texts.

2. Diversity: Avoiding repetitive outputs.

3. Control: Ensuring factual correctness (e.g., in medical reports).

Summary

NLG systems transform structured data → natural language through:

1. Content Planning (What to say),


2. Discourse Structuring (Ordering),

3. Text Realization (Grammar + word choice).

Neural models (GPT, T5) merge these steps into one end-to-end system.

Would you like a code example for a simple NLG pipeline? Let me know!

Compare Statistical Machine Translation (SMT) vs. Neural Machine Translation (NMT)

Here’s a detailed comparison between Statistical Machine Translation (SMT) and Neural Machine
Translation (NMT):

Statistical Machine Translation (SMT) vs. Neural Machine Translation (NMT)

Feature | Statistical Machine Translation (SMT) | Neural Machine Translation (NMT)

Approach | Based on probabilistic models (e.g., phrase/word alignments). | Uses deep neural networks (e.g., Seq2Seq, Transformers).

Core Idea | Breaks sentences into phrases, translates them, and reorders. | Learns an end-to-end mapping from source to target language.

Key Components | Phrase tables, language model, reordering model | Encoder-decoder architecture, attention mechanism

Training Data | Relies on parallel corpora (source ↔ target sentences). | Requires large parallel corpora but learns features automatically.

Translation Process | 1. Segment source text into phrases. 2. Look up translations. 3. Reorder and glue phrases. | 1. Encoder converts the source into a vector. 2. Decoder generates the target sequentially.

Handling Ambiguity | Uses context from n-gram language models (limited). | Captures long-range dependencies via attention (better context).

Fluency & Coherence | Often produces rigid, less fluent output due to phrase stitching. | Generates more natural, human-like translations.

Out-of-Vocabulary (OOV) | Struggles with rare/unknown words (requires backoff methods). | Handles OOV better via subword units (BPE, WordPiece).

Domain Adaptation | Needs manual feature engineering for new domains. | Adapts better with fine-tuning (transfer learning).

Performance | Good for short, simple sentences; struggles with long sentences. | Superior for complex sentences and long-range dependencies.

Speed | Faster at inference (pre-computed phrase tables). | Slower due to sequential decoding, but optimized with GPUs.

Examples | Moses (phrase-based SMT), IBM Models (word-based) | Google Translate (GNMT), OpenAI GPT-3, Transformer models

Key Differences Explained

1. Translation Approach

 SMT:

o Uses statistical models (e.g., noisy channel model).

o Splits sentences into phrases, translates them independently, and reorders.

o Relies on handcrafted features (e.g., phrase alignment, distortion penalties).

 NMT:

o Treats translation as a sequence-to-sequence (Seq2Seq) problem.

o Encoder converts source sentence into a dense vector.

o Decoder generates target words one-by-one (autoregressive).

o Uses attention mechanisms (e.g., Transformer) to focus on relevant words.

2. Context Handling

 SMT: Limited to local context (n-grams).

o Example: "bank" might be mistranslated if the sentence is long.

 NMT: Captures global context via attention.

o Example: Correctly translates "bank" as financial or river based on full sentence.

3. Output Quality

 SMT:

o Pros: Fast, interpretable (phrase tables can be inspected).

o Cons: Less fluent, struggles with idiomatic expressions.

 NMT:

o Pros: More fluent, handles idioms and long sentences better.

o Cons: Black-box nature (hard to debug errors).

4. Training & Scalability

 SMT: Requires pipeline tuning (language model, reordering model).

 NMT: End-to-end training (single model optimizes all steps).


Example Translation

Source: "The cat sat on the mat."

 SMT: Looks up "the cat" → "le chat", "sat on" → "assis sur", etc., and combines them.

 NMT: Encodes the entire sentence, then generates French words sequentially: "Le chat s'est
assis sur le tapis."

When to Use Which?

 SMT:

o Low-resource languages (small parallel corpora).

o Applications needing speed (e.g., real-time chat).

 NMT:

o High-resource languages (e.g., EN↔FR).

o Quality-critical tasks (e.g., publishing, customer support).

Conclusion

 SMT was dominant pre-2014 (e.g., Google Translate used it until 2016).

 NMT (post-2016) now dominates due to better fluency and context handling.

 Hybrid models (SMT + NMT) are sometimes used for low-resource languages.

Would you like a deep dive into Transformer-based NMT? Let me know!

What are the challenges in Machine Translation and how can they be solved?

Challenges in Machine Translation (MT) & Their Solutions

Machine Translation (MT) faces several challenges due to linguistic complexity, cultural differences,
and technical limitations. Below are the key challenges and potential solutions:

1. Ambiguity in Language

Problem:

 Words/phrases can have multiple meanings (lexical/syntactic ambiguity).

o Example: "He saw the bank" → Financial bank or river bank?

Solutions:

 Contextual Embeddings (BERT, GPT) to capture word sense from surrounding text.

 Disambiguation Rules (e.g., "bank" + "money" → financial institution).


2. Idioms & Cultural Nuances

Problem:

 Literal translation fails for idioms (e.g., "Kick the bucket" → "Die").

Solutions:

 Parallel Corpora with Idioms (train on phrase pairs like "It’s raining cats and dogs" → "Il
pleut des cordes").

 Post-Editing by Humans for culturally sensitive content.

3. Low-Resource Languages

Problem:

 Limited training data for rare languages (e.g., Swahili ↔ Navajo).

Solutions:

 Transfer Learning: Pretrain on high-resource languages (e.g., English), then fine-tune.

 Backtranslation: Generate synthetic parallel data (e.g., translate monolingual text).

 Multilingual Models (e.g., mBERT, NLLB) that share parameters across languages.

4. Word Order Differences

Problem:

 Languages have different syntax (e.g., English: Subject-Verb-Object vs. Japanese: Subject-
Object-Verb).

Solutions:

 Attention Mechanisms (Transformers) to align words dynamically.

 Reordering Rules (SMT) or syntactic preprocessing.

5. Rare/Unknown Words (OOV)

Problem:

 Proper nouns (e.g., "ChatGPT"), technical terms, or slang may be missing.

Solutions:

 Subword Tokenization (Byte-Pair Encoding, WordPiece) to split rare words.

 Copy Mechanism: Directly copy OOV words from source to target.

6. Domain Adaptation
Problem:

 Medical/legal MT models perform poorly on casual text (and vice versa).

Solutions:

 Fine-Tuning: Train base models on domain-specific data.

 Hybrid Models: Combine general MT with domain-specific rules.

7. Gender & Bias

Problem:

 Models reinforce stereotypes (e.g., "doctor → he, nurse → she").

Solutions:

 Debiasing Datasets (balance gender references).

 Neutral Rewriting: Use gender-neutral terms (e.g., "they").

8. Evaluation Challenges

Problem:

 BLEU/ROUGE scores don’t always match human judgment.

Solutions:

 Human Evaluation + automated metrics (e.g., COMET, BERTScore).

 Adversarial Testing: Check robustness to perturbations.

Key Takeaways

Challenge Solution Example Tools/Models

Ambiguity Contextual embeddings BERT, GPT-3

Idioms Curated parallel corpora OPUS, ParaCrawl

Low-Resource Languages Transfer learning, backtranslation mBERT, NLLB-200

Word Order Attention mechanisms Transformer, MarianMT

OOV Words Subword tokenization SentencePiece, WordPiece

Domain Adaptation Fine-tuning BioBERT, Legal-BERT

Bias Debiasing datasets HuggingFace’s unbiased models


Future Directions

 Zero-Shot Translation: Translate between language pairs never seen during training.

 Explainable AI: Debug why MT models make certain errors.

 Real-Time Adaptive MT: Learn user preferences on the fly.

Conclusion: While NMT (e.g., Google Translate, DeepL) has improved translation quality, challenges
like ambiguity, bias, and low-resource languages persist. Combining neural models with linguistic
rules and human-in-the-loop validation offers the best path forward.

Would you like a case study on how Google Translate handles these challenges? Let me know!

Discuss the applications of NLG in real-world scenarios.

Applications of Natural Language Generation (NLG) in Real-World Scenarios

Natural Language Generation (NLG) transforms structured data into human-like text, enabling
automation and enhanced communication across industries. Below are key real-world applications:

1. Business Intelligence & Analytics

Use Case: Automating reports from data (sales, finance, operations).

 Examples:

o Financial Reports: Generate earnings summaries from spreadsheets.

o Dashboards: Convert KPIs into narratives (e.g., "Q3 sales grew by 12% due to
demand in Europe").

 Tools:

o Narrative Science’s Quill: Creates data-driven business reports.

o Power BI + NLG: Adds textual insights to visualizations.

2. E-Commerce & Retail

Use Case: Product descriptions, personalized recommendations.

 Examples:

o Amazon: Auto-generates product descriptions for millions of items.

o Chatbots: Suggest products in natural language (e.g., "You might like this red dress
based on your browsing history").

 Tools:

o GPT-3 for Shopify: Writes SEO-friendly product descriptions.

3. Healthcare
Use Case: Clinical documentation, patient summaries.

 Examples:

o Radiology Reports: Convert imaging data into text (e.g., "The X-ray shows a 2cm
nodule in the left lung").

o Discharge Summaries: Automate post-visit instructions for patients.

 Tools:

o IBM Watson Health: Generates oncology reports.

4. Journalism & Content Creation

Use Case: News generation, sports recaps, weather forecasts.

 Examples:

o Associated Press (AP): Uses NLG to publish thousands of earnings reports quarterly.

o Weather.com: Converts data into forecasts (e.g., "Expect rain after 3 PM with 80%
humidity").

 Tools:

o Automated Insights’ Wordsmith: Powers AP’s automated stories.

5. Customer Service & Chatbots

Use Case: Dynamic responses in chatbots, email automation.

 Examples:

o Banking: Explain transaction disputes in plain language.

o Zendesk: Auto-generates ticket responses.

 Tools:

o Google’s Dialogflow: Creates context-aware chatbot replies.

6. Education & E-Learning

Use Case: Personalized feedback, quiz generation.

 Examples:

o Duolingo: Generates language exercises dynamically.

o Automated Grading: Provides essay feedback (e.g., "Your argument needs more
evidence").

 Tools:
o OpenAI’s Codex: Explains programming errors in natural language.

7. Legal & Compliance

Use Case: Drafting contracts, summarizing case law.

 Examples:

o Legal Documents: Auto-generate clauses for NDAs.

o LexisNexis: Summarizes court rulings.

 Tools:

o ROSS Intelligence: Uses AI for legal research.

8. Gaming & Interactive Stories

Use Case: Dynamic NPC dialogues, procedural storytelling.

 Examples:

o AI Dungeon: Generates interactive fantasy stories.

o ChatGPT for RPGs: Creates quest narratives on the fly.

9. Social Media & Marketing

Use Case: Ad copy, personalized emails.

 Examples:

o Phrasee: Optimizes email subject lines using NLG.

o ChatGPT: Writes LinkedIn posts or tweet threads.

Key Benefits of NLG

 Scalability: Generate millions of texts instantly.

 Consistency: Avoids human errors in repetitive tasks.

 Personalization: Tailors content to user data.

Challenges

 Quality Control: May produce biased or inaccurate text.

 Context Limits: Struggles with nuanced humor/sarcasm.

Future Trends
 Multimodal NLG: Combine text with images/video (e.g., AI-generated video scripts).

 Real-Time NLG: Live sports commentary, stock market alerts.

Conclusion: NLG is revolutionizing industries by automating text generation, but human oversight
remains critical for quality. From healthcare to gaming, its applications are expanding with advances
in AI like GPT-4 and Google’s PaLM.

Want a deep dive into NLG for a specific industry? Ask away!

What are the problems in translating Indian languages (including Hindi and Telugu)?

Challenges in Translating Indian Languages (Hindi, Telugu, and Others)

Translating between Indian languages (e.g., Hindi to Telugu or vice versa) poses unique difficulties
due to linguistic diversity, script variations, and cultural nuances. Below are the key challenges and
potential solutions:

1. Linguistic Diversity & Grammar Differences

Problems:

 Word Order Variations:

o Hindi follows Subject-Object-Verb (SOV) order ("मैं सेब खाता हूँ " → "I apple eat").

o Telugu also uses SOV ("నేను ఆపిల్ తింటాను"), but other Dravidian languages like
Tamil may differ.

 Gender & Verb Agreement:

o Hindi verbs change based on gender ("खाता हूँ " for male vs. "खाती हूँ " for female).

o Telugu verbs mark gender in the third person ("తింటాడు" for "he eats" vs. "తింటుంది" for "she eats"), though not in the first person.

 Lack of Standardization:

o Dialectal variations (e.g., Hyderabadi Hindi vs. Delhi Hindi).

Solutions:

 Rule-based reordering in Statistical Machine Translation (SMT).

 Neural Machine Translation (NMT) models with attention mechanisms (e.g., Google’s
IndicTrans).

2. Script & Orthography Challenges

Problems:

 Different Scripts:
o Hindi: Devanagari (अक्षर)

o Telugu: Telugu script (అక్షరం)

o Romanized Hindi/Telugu (e.g., "Akshar") adds complexity.

 Schwa Deletion in Hindi:

o Hindi often drops the schwa sound (e.g., "राम" is pronounced "Ram" but written
with an inherent 'a').

Solutions:

 Unicode normalization for script conversion.

 Transliteration tools (e.g., Google’s Indic Transliteration API).

3. Lack of Parallel Corpora

Problems:

 Scarce Training Data:

o Limited Hindi-Telugu parallel datasets compared to English-Hindi.

 Domain-Specific Gaps:

o Medical/legal translations are rare.

Solutions:

 Backtranslation (generate synthetic data).

 Multilingual NMT models (e.g., Facebook’s NLLB-200).

4. Morphological Richness

Problems:

 Agglutination in Telugu:

o Words combine multiple morphemes (e.g., "చదివించింది" = "made someone read").

 Compound Words in Hindi:

o "राष्ट्रपतत" (Rashtrapati = President) is a single word.

Solutions:

 Morphological analyzers (e.g., IIIT-Hyderabad’s tools for Telugu).

 Subword tokenization (Byte-Pair Encoding).


5. Idioms & Cultural Nuances

Problems:

 Untranslatable Phrases:

o Hindi: "जान छु ड़ाना" (to save one’s life) has no direct Telugu equivalent.

o Telugu: "గు డ్డి గుర్రిం" (blind horse = stubborn person).

 Religious/Cultural Terms:

o "Prasadam" (Hindi/Telugu) has no English equivalent.

Solutions:

 Bilingual dictionaries with explanations.

 Human-in-the-loop post-editing.

6. Code-Mixing & Loanwords

Problems:

 English Words in Hindi/Telugu:

o "मैं ने उसको call तकया" (Hindi)

o "నేను అతనికి call చేశాను" (Telugu)

 Persian/Arabic Influence in Hindi:

o "मु श्किल" (difficulty) vs. Telugu’s native "క ష్ిం


ట ".
Solutions:

 Code-mixed language models (e.g., Hinglish/Telugu-English corpora).

7. Low-Resource Language Pairs

Problems:

 Hindi-Telugu translation tools are less developed than Hindi-English.

 Fewer NLP Researchers focus on Dravidian languages.

Solutions:

 Government initiatives (e.g., AI4Bharat’s IndicNLP Suite).

 Community-driven datasets (e.g., Anuvaad for Hindi-Telugu).

Comparison: Hindi vs. Telugu Translation Challenges


Challenge Hindi Telugu

Script Devanagari Telugu script

Verb Agreement Gender-based (M/F) Gender & plurality-based

Word Order SOV SOV

Loanwords Persian/Arabic/English Sanskrit/English

Existing Tools for Hindi-Telugu Translation

1. Google Translate (Neural MT, decent but error-prone).

2. Microsoft Translator (Supports code-mixed input).

3. IndicTrans (IIIT-Hyderabad’s open-source NMT model).

4. Shakti MT (IIT-led initiative for Indian languages).

Future Directions

 Better Pretrained Models (e.g., IndicBERT).

 Speech-to-Speech Translation (e.g., Hindi-Telugu real-time apps).

Conclusion: While Hindi-Telugu translation is improving with NMT, challenges like data scarcity,
script differences, and idioms persist. Collaborative efforts between linguists and AI researchers are
key to progress.

Need examples of Hindi-Telugu translation errors? Ask for a case study!

Explain Rule-Based vs. Corpus-Based Machine Translation.

Rule-Based vs. Corpus-Based Machine Translation: Key Differences

Machine Translation (MT) systems can be broadly categorized into Rule-Based (RBMT) and Corpus-
Based (CBMT) approaches. Here's a detailed comparison:

1. Rule-Based Machine Translation (RBMT)

Definition:
Uses linguistic rules (grammar, syntax, semantics) and bilingual dictionaries to translate text.

How It Works:

1. Analysis: Parses source text into grammatical structures.

2. Transfer: Applies language-pair rules to convert structures.

3. Generation: Produces target-language text using dictionaries.

Types of RBMT:
 Direct MT: Word-by-word substitution (e.g., early systems like SYSTRAN).

 Transfer-Based MT: Uses intermediate structural representations.

 Interlingua MT: Translates via a universal abstract language.

Pros:

✅ Transparent: Rules are human-readable and editable.
✅ No training data needed: Works for low-resource languages.
✅ Consistent outputs: Follows predefined rules strictly.

Cons:

❌ Labor-intensive: Requires manual rule creation by linguists.
❌ Inflexible: Fails with idioms, slang, or complex sentences.
❌ Poor scalability: Adding new languages requires new rules.

Example:

 Input (Hindi): "वह तकताब पढ़ रहा है ।"

 RBMT Steps:

1. Parse: [वह] [किताब] [पढ़ रहा है] → Pronoun + Noun + Verb.

2. Transfer: Hindi SOV → English SVO.

3. Generate: "He is reading the book."

2. Corpus-Based Machine Translation (CBMT)

Definition:
Relies on statistical patterns or neural networks learned from large bilingual text corpora.

Subtypes:

1. Statistical MT (SMT):

o Uses probability models (e.g., IBM Models, phrase-based SMT).

o Example: Google Translate (pre-2016).

2. Neural MT (NMT):

o Uses deep learning (e.g., Seq2Seq, Transformers).

o Example: Modern Google Translate, DeepL.

How It Works:

 SMT:

1. Aligns parallel sentences (e.g., Hindi-English).

2. Calculates translation probabilities (e.g., P("book" ∣ "किताब")).


 NMT:

1. Encodes source text into vectors.

2. Decodes vectors into target language.

Pros:

✅ Adaptive: Learns from data (handles idioms, slang).
✅ Scalable: Improves with more data.
✅ Fluency: NMT produces more natural outputs.

Cons:

❌ Data-hungry: Requires massive parallel corpora.
❌ Black-box: Hard to debug errors.
❌ Biases: Inherits biases from training data.

Example:

 Input (Hindi): "वह तकताब पढ़ रहा है ।"

 SMT/NMT Steps:

1. Match with similar training examples.

2. Output: "He is reading the book."

Key Differences

Feature Rule-Based (RBMT) Corpus-Based (CBMT)

Basis Linguistic rules Statistical/neural patterns

Data Needed None (only dictionaries/rules) Large parallel corpora

Flexibility Rigid Adapts to context

Development Cost High (manual labor) Lower (automated training)

Output Quality Grammatical but unnatural Fluent but may hallucinate

Best For Low-resource languages High-resource languages

Hybrid Approaches

Modern systems (e.g., Google Translate) combine:

 RBMT: For rare languages (e.g., Sanskrit).

 NMT: For high-resource pairs (e.g., Hindi-English).


Conclusion

 Use RBMT when:

o You need control over translations (e.g., legal texts).

o Working with low-resource languages (e.g., Bhojpuri).

 Use CBMT (NMT) when:

o You have large datasets.

o Prioritize fluency (e.g., customer support chatbots).

NLG Architecture

Natural Language Generation (NLG) Architectures

Adapted from the RAGS reference architecture (Mellish et al., 2004) and other seminal works

1. Common Tasks in NLG Systems

All NLG systems perform three core tasks, with increasing specificity:

Task | Function | Examples

Discourse Planner | Plans overall structure (order, hierarchy); outputs a tree-like schema. | DRAFTER, DRAFTER-2

Text Planner | Decides sentence structure: aggregation, lexicalization, referring expressions. | HEALTHDOC

Surface Realizer | Generates grammatically correct sentences from the plan. | KUNET

Example of Specificity:
 Vague: "He danced with Jaya and she got angry."

 Specific: "He forced Jaya to dance, and she got angry."

o Choice of verb ("forced") changes the referent of "she."

2. NLG Architecture Types

(A) Pipelined Architecture

 Design: Strict sequential flow (Discourse → Text → Surface).

 Constraints: Each stage must complete before the next begins.

 Pros: Modular, easy to debug.

 Cons: Inflexible; errors propagate downstream.

 Example:

o Input: Data → Discourse Plan → Text Plan → "He danced with Jaya and she got
angry."

(B) Interleaved Architecture

 Design: Merges stages into two phases:

1. Planning (Discourse + Text).

2. Realization (Surface).

 Pros: More flexible than pipelined.

 Cons: Harder to maintain.

 Examples: PAULINE, MUMBLE, IPG, PROTEUS.

(C) Integrated (Blackboard) Architecture

 Design: All tasks interact dynamically on a shared "blackboard."

 Pros: Highly adaptable; handles complex constraints.

 Cons: Computationally intensive.

 Example: GLINDA.

3. Key Differences

Feature Pipelined Interleaved Integrated

Flow Linear Two-phase Dynamic

Flexibility Low Moderate High

Use Case Simple, deterministic Balanced complexity Highly context-dependent


4. Challenges & Trade-offs

 Pipelined: Struggles with backtracking (e.g., realizing a sentence may reveal planning errors).

 Interleaved/Integrated: Better for nuanced texts but require more computational power.

Example of Constraint Propagation:

 Pipelined: Surface realizer cannot revise the text planner’s choice of "danced" vs. "forced."

 Integrated: Blackboard allows real-time negotiation between modules.

5. Practical Applications

 Pipelined: Weather reports, templated medical summaries.

 Interleaved: Personalized narratives (e.g., patient education).

 Integrated: Creative storytelling, dialogue systems.

Summary

 Pipelined: "Assembly line" approach.

 Interleaved: Balances structure and flexibility.

 Integrated: Mimics human-like adaptability.

Next Steps: Explore how modern neural NLG (GPT-3, T5) blends these architectures.
