
Compositional Semantics & Word Sense Disambiguation (WSD)

Your discussion highlights key challenges in compositional semantics, where the meaning of a
sentence is built from the meanings of its parts (words). A major hurdle is word sense ambiguity:
the same word (nail) can evoke different meanings depending on context. Below is a structured
breakdown of the issues and approaches you mentioned.

1. The Problem of Word Sense

 Is there a finite set of senses per word?

o Lexicographers (e.g., in dictionaries) attempt to list senses, but real-world usage is fluid.

o Some theories (e.g., Prototype Theory) argue that word meanings are fuzzy, with central and peripheral senses.

o Distributional Semantics (e.g., word embeddings) suggests meanings are continuous rather than discrete.

 Does a word evoke a single sense or a distribution?

o Traditional WSD assumes one correct sense per context (discrete).

o Probabilistic models (e.g., Bayesian WSD) assign probabilities to possible senses.

o Neural approaches (e.g., BERT) generate contextualized embeddings where meaning is a gradient.

2. Approaches to Word Sense Disambiguation (WSD)

(A) Rule-Based & Knowledge-Based Methods

 Early systems relied on:

o Dictionaries (e.g., WordNet) listing predefined senses.

o Selectional restrictions (e.g., hammer → tool sense of nail).

 Example: Lesk Algorithm (1986) compares dictionary glosses of nearby words.
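As a rough illustration of the gloss-overlap idea, here is a minimal sketch using NLTK's simplified Lesk implementation (assumes NLTK and its WordNet data are installed; simplified Lesk is a weak baseline, so the chosen synsets may be imperfect):

```python
# Sketch: simplified Lesk via NLTK (assumes `pip install nltk` plus
# nltk.download('wordnet'), nltk.download('omw-1.4'), nltk.download('punkt')).
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent_tool = word_tokenize("He drove a nail into the wall with a hammer.")
sent_body = word_tokenize("She went to the salon to get her nails clipped.")

# lesk() picks the WordNet synset whose gloss overlaps most with the context words.
for context in (sent_tool, sent_body):
    synset = lesk(context, "nail", pos=wn.NOUN)
    print(synset, "->", synset.definition() if synset else "no sense found")
```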

(B) Unsupervised Methods (Word Sense Induction)

 Clustering (e.g., k-means) groups similar word usages into "induced senses."

 Topic Modeling (e.g., LDA) treats senses as latent topics.

 Advantage: Adaptable to new domains without labeled data.

(C) Supervised & Semi-Supervised Learning

 Classic ML (e.g., SVM, Decision Trees) trained on sense-annotated corpora (e.g., SemCor).

 Neural WSD:
o Uses contextual embeddings (e.g., BERT) to predict the most likely sense.

o Can rank senses by probability (soft disambiguation).

 Limitation: Requires large labeled datasets; struggles with rare senses.

3. Example: Disambiguating "Nail"

Your examples show how co-occurring words disambiguate nail:

Sentence Trigger Words Sense of "Nail"

1. ...with a hammer hammer Metallic fastener

2. ...from the hardware store hardware store Metallic fastener

3. ...to get his nails clipped beauty salon, clipped Fingernail

4. ...to get a manicure manicure Fingernail

Here, selectional preferences (e.g., hammer only fits the tool sense) resolve ambiguity.

4. Challenges in WSD

 Granularity of senses: Should nail (body part) and nail (metal spike) be separate senses, or
sub-senses?

 Domain dependence: Nail in construction vs. beauty contexts.

 Dynamic meanings: New senses emerge (e.g., "nail it" = succeed).

5. Integration with Compositional Semantics

For full semantic interpretation, WSD interacts with:

1. Syntax: Parse trees help rule out implausible meanings.

2. Pragmatics: World knowledge (e.g., salons don’t sell metal nails).

3. Discourse: Prior mentions (e.g., "He bought nails" primes the hardware sense).

Key Takeaways

 Word sense is context-dependent and may not be strictly discrete.

 Hybrid approaches (knowledge + machine learning) often work best.

 Modern NLP leans toward distributional semantics (embeddings) over rigid sense
inventories.

Types of Word Sense Ambiguity in Compositional Semantics


Your discussion highlights three primary types of word sense ambiguity: homonymy, polysemy,
and categorial ambiguity. Below is a structured breakdown of each type, along with how they
impact semantic interpretation.

1. Homonymy

Definition:

 Two or more unrelated meanings share the same spelling and pronunciation.

 The senses are orthogonal (no semantic connection).

Examples:

Word Sense 1 Sense 2

bank Financial institution Side of a river

bat Flying mammal Sports equipment

nail Metallic fastener Fingernail

Key Properties:

 Discrete meanings: No overlap between senses.

 Requires strong contextual cues for disambiguation (e.g., "withdraw money from the
bank" vs. "swim near the river bank").

2. Polysemy

Definition:

 A word has multiple related senses, often derived from a core meaning.

 Unlike homonymy, the senses are connected metaphorically or metonymically.

Examples:

Word Core Sense Extended Senses

bank Financial institution Bank of clouds (collection)

head Body part Head of a company (leader)

mouse Small rodent Computer mouse (shape analogy)

Key Properties:

 Graded meaning shift: Some senses are closer to the core meaning than others.

 Often requires world knowledge (e.g., "bank of clouds" implies a stacked formation, similar
to how money is stored in a bank).
 More challenging for machines than homonymy because boundaries between senses are
fuzzy.

3. Categorial Ambiguity

Definition:

 A word belongs to different grammatical categories (e.g., noun vs. verb) with distinct
meanings.

 Unlike homonymy/polysemy, syntactic role alone can resolve the ambiguity.

Examples:

Word Noun Sense Verb Sense

book A readable volume To book a ticket (reserve)

fish Aquatic animal To fish for compliments (seek)

light Illumination source To light a candle (ignite)

Key Properties:

 Resolvable via part-of-speech (POS) tagging (e.g., if book is tagged as a verb, it cannot mean
a physical book).

 Easier for NLP systems because syntax provides clear disambiguation cues.

How These Ambiguities Impact Semantic Interpretation

1. Disambiguation Strategies

Type | Resolution Method | Example

Homonymy | Strong contextual cues (lexical, world knowledge) | "River bank" vs. "Bank account"

Polysemy | Semantic similarity, metaphor detection | "Bank of data" (storage metaphor)

Categorial Ambiguity | Syntactic parsing (POS tags, dependency trees) | "Book a flight" (verb) vs. "Read a book" (noun)

2. Challenges in NLP

 Homonymy & Polysemy:

o Rule-based systems (e.g., WordNet) struggle with fine-grained sense distinctions.

o Neural models (e.g., BERT) handle polysemy better by generating contextualized


embeddings.

 Categorial Ambiguity:

o Solved early in the NLP pipeline (POS tagging), but errors propagate if misclassified.
3. Theoretical Implications

 Lexical Semantics: Should word senses be discrete (homonymy-style) or continuous


(polysemy-style)?

 Cognitive Linguistics: Do humans store all senses separately, or derive them dynamically?

Conclusion

 Homonymy: Unrelated meanings (hard disambiguation).

 Polysemy: Related, extended meanings (graded disambiguation).

 Categorial Ambiguity: Resolved via syntax (easiest for machines).

Modern NLP systems (e.g., transformer models) blend distributional semantics (embeddings)
with symbolic knowledge (WordNet) to handle these ambiguities. However, polysemy remains the
hardest due to its fluid, context-dependent nature.

Would you like a deeper dive into how BERT or FrameNet handles these ambiguities?

N-Gram Models and Their Limitations: A Deep Dive

1. What is an N-Gram Model?

An N-gram model is a probabilistic language model that predicts the next word in a sequence based
on the previous N−1 words. It relies on the Markov assumption, which simplifies language by
assuming that the probability of a word depends only on a fixed window of prior words.

Types of N-Gram Models

Model Context Window Example

Unigram N=1 (no context) P("cat")

Bigram N=2 (1 previous word) P("sat" ∣ "cat")

Trigram N=3 (2 previous words) P("on" ∣ "cat sat")

Higher N N>3 (more context) Rarely used due to sparsity.

Probability Calculation

For a bigram model:

P(w_n \mid w_{n-1}) = \frac{\text{Count}(w_{n-1}, w_n)}{\text{Count}(w_{n-1})}

 Example: If "cat sat" appears 100 times and "cat" appears 200
times, P("sat"∣"cat")=0.5P("sat"∣"cat")=0.5.
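A minimal sketch of this maximum-likelihood bigram estimate in plain Python (the toy corpus is an assumption for illustration):

```python
from collections import Counter

# Toy corpus; in practice counts come from a large training corpus.
tokens = "the cat sat on the mat the cat sat on the hat".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_prob(prev_word, word):
    """Maximum-likelihood estimate P(word | prev_word) = Count(prev, word) / Count(prev)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # unseen history; smoothing (discussed later) addresses this
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_prob("cat", "sat"))  # 1.0 in this toy corpus: "cat" is always followed by "sat"
```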

2. Key Limitations of N-Gram Models

(1) Data Sparsity


 Problem: Many possible word combinations (n-grams) never appear in training data, leading to zero probabilities.

o Example: The phrase "purple unicorn dances" may never occur, making P("dances" ∣ "purple unicorn") = 0.

 Solutions:

o Smoothing (e.g., Laplace, Good-Turing): Assign small probabilities to unseen n-grams.

o Backoff: Use lower-order n-grams (e.g., bigrams if trigrams are missing).

(2) Computational and Memory Constraints

 Problem: Storing all possible n-grams requires exponential memory (e.g., a 5-gram model over a 50k-word vocabulary has 50,000^5 possible entries).

 Solutions:

o Pruning: Discard rare n-grams.

o Hashing: Compress n-gram tables.

(3) No Long-Range Dependencies

 Problem: N-grams only consider local context (e.g., bigrams ignore words beyond the
immediate predecessor).

o Example: In "The cat that I adopted yesterday sat on the mat," the relationship
between "cat" and "sat" is lost in a bigram model.

 Solution: Use RNNs/LSTMs or Transformers (e.g., BERT) to capture long-range context.

(4) Out-of-Vocabulary (OOV) Words

 Problem: New/unseen words break the model (e.g., slang, rare terms).

 Solutions:

o Subword tokenization (e.g., Byte-Pair Encoding in BERT).

o Unknown word tokens (e.g., <UNK>).
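As an illustrative sketch of subword tokenization (assumes the HuggingFace transformers library and the bert-base-uncased vocabulary; the exact subword pieces depend on the learned vocabulary):

```python
# Sketch: WordPiece subword tokenization (assumes `pip install transformers`
# and network access to fetch the bert-base-uncased vocabulary).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A rare word is split into known subword pieces instead of becoming a single
# unknown-word token; the exact pieces depend on the vocabulary.
print(tokenizer.tokenize("unbelievability"))
print(tokenizer.tokenize("ChatGPT"))
```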

(5) Lack of Semantic Understanding

 Problem: N-grams treat words as discrete symbols, ignoring meaning.

o Example: "Bank" (financial vs. river) cannot be disambiguated without deeper


context.

 Solutions:

o Word embeddings (Word2Vec, GloVe) for semantic similarity.

o Contextual embeddings (BERT, GPT) for dynamic word sense disambiguation.


3. Why Modern NLP Moved Beyond N-Grams

Aspect N-Gram Models Neural Models (e.g., BERT)

Context Handling Fixed window (local) Full-sentence attention (global)

Sparsity Prone to zero probabilities Generalizes via embeddings

Memory Grows exponentially with N Compact (shared embeddings)

Semantics No word meaning Captures polysemy, synonyms, and syntax

4. When Are N-Grams Still Useful?

Despite limitations, n-grams are still used for:

1. Simple tasks (e.g., spell-checking, autocomplete).

2. Baseline models in NLP research.

3. Feature extraction in hybrid systems (e.g., n-grams + ML classifiers).

5. Example: N-Gram vs. BERT for "Bank" Disambiguation

 N-Gram: Fails to distinguish "bank" in:

o "Deposit money at the bank." (financial)

o "Fish near the river bank." (geographical)

 BERT: Uses contextual attention to infer the correct sense.
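A hedged sketch of the contrast (assumes PyTorch and HuggingFace transformers are installed; this only compares contextual vectors of "bank" and is not a full WSD system):

```python
# Sketch: compare contextual embeddings of "bank" in different sentences
# (assumes `pip install torch transformers` and network access for the model).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence):
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_money = bank_vector("deposit money at the bank .")
v_river = bank_vector("fish near the river bank .")
v_loan = bank_vector("the bank approved my loan .")

cos = torch.nn.functional.cosine_similarity
# Expect the two financial uses to be more similar to each other than to the river use.
print("money vs loan :", cos(v_money, v_loan, dim=0).item())
print("money vs river:", cos(v_money, v_river, dim=0).item())
```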

Conclusion

N-gram models are simple but limited by sparsity, fixed context, and lack of semantics. Modern NLP
leverages neural networks (Transformers, embeddings) to overcome these issues, though n-grams
remain useful for lightweight applications.

Next Steps: Explore Word2Vec, BERT, or smoothing techniques in detail? Let me know!

reference resolution techniques

Reference Resolution Techniques in Natural Language Processing (NLP)

Reference resolution is the task of identifying what entities or concepts pronouns, demonstratives,
or other referring expressions point to in a discourse. It is crucial for tasks like coreference
resolution, anaphora resolution, and discourse understanding. Below are key techniques and their
applications.

1. Types of Reference Resolution


(A) Anaphora Resolution

 Resolves pronouns (e.g., he, she, it) to their antecedents.

o "John left early because he was tired." → "he" = John

(B) Cataphora Resolution

 Resolves references where the pronoun appears before the antecedent.

o "When he arrived, John sat down." → "he" = John

(C) Coreference Resolution

 Links all mentions of the same entity in a text.

o "Mary called her friend. She was happy." → "her", "she" = Mary

(D) Bridging References

 Resolves references where an entity is indirectly related to another.

o "I entered the room. The ceiling was high." → "the ceiling" belongs to "the room"

2. Key Techniques for Reference Resolution

(1) Rule-Based & Syntax-Driven Methods

 Morphological & Syntactic Constraints

o Pronouns must agree in gender, number, and grammatical role.

 "The doctor finished her shift." → "her" must refer to a female doctor.

o Uses parse trees to check grammatical relationships.

 Hobbs' Algorithm (1978)

o A classic rule-based method that traverses a syntactic tree to find the closest
compatible antecedent.

(2) Machine Learning & Statistical Approaches

 Feature-Based Models

o Uses supervised learning (e.g., SVM, decision trees) with features like:

 Distance (how far the pronoun is from the candidate antecedent).

 Grammatical role (subject/object preference).

 Semantic compatibility (e.g., "The car… it" vs. "The car… she").

 Mention-Pair Models

o Treats coreference as a binary classification problem:

 Does "he" refer to "John"? (Yes/No)


(3) Neural & Deep Learning Approaches

 End-to-End Neural Coreference Models

o Lee et al. (2017) – Uses span representations to predict coreference clusters.

o BERT-based models (e.g., CorefBERT) leverage contextual embeddings for better disambiguation.

 Transformer-Based Models (e.g., SpanBERT)

o Improves performance by masking contiguous word spans during pretraining.

(4) Hybrid & Knowledge-Based Methods

 Combines rules + ML for robustness.

 Uses ontologies (WordNet, FrameNet) to resolve semantic ambiguities.

o "The bank collapsed. It was near the river." → WordNet helps distinguish financial
vs. river bank.

3. Challenges in Reference Resolution

Challenge | Example | Solution

Ambiguous Antecedents | "John told Bill he won." (Who won?) | BERT-style contextual disambiguation

Zero Anaphora | "She left and Ø closed the door." (Implied subject) | Discourse modeling

Split Antecedents | "John and Mary met. They left." (Plural reference) | Entity clustering

World Knowledge Needed | "The iPhone crashed. It needed a reboot." (What is "it"?) | Knowledge graphs (e.g., Wikidata)

4. Evaluation Metrics

 MUC (1995) – Measures link-based coreference chains.

 B³ (1998) – Evaluates mention-based clustering.

 CEAF (2005) – Uses entity alignment metrics.

 BLANC (2012) – Focuses on link prediction.

5. Applications

1. Machine Translation (Resolving pronouns across languages).

2. Summarization (Tracking entities in long documents).


3. Dialogue Systems (Understanding user references in chatbots).

4. Question Answering (Resolving "Who does 'he' refer to?").

6. Tools & Libraries

Tool Description

Stanford CoreNLP Rule-based + statistical coreference.

spaCy (neuralcoref) Neural coreference resolution.

AllenNLP (CorefModel) State-of-the-art BERT-based coreference.

HuggingFace (CorefBERT) Pretrained transformer models.
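As a usage sketch for one of these tools, here is the (now archived) neuralcoref extension for spaCy 2.x; the model name and pipeline setup are assumptions and may not work with newer spaCy versions:

```python
# Sketch: coreference resolution with spaCy 2.x + neuralcoref
# (assumes `pip install spacy==2.3.* neuralcoref` and the en_core_web_sm model;
# neuralcoref is archived and does not support spaCy 3.x).
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)  # registers the 'neuralcoref' pipeline component

doc = nlp("Mary called her friend. She was happy.")
if doc._.has_coref:
    for cluster in doc._.coref_clusters:
        # Each cluster groups mentions that the model thinks co-refer.
        print(cluster.main.text, "<-", [m.text for m in cluster.mentions])
```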

7. Future Directions

 Multimodal Coreference (Resolving references in text + images/video).

 Cross-Lingual Coreference (Handling references in multilingual texts).

 Explainable AI (Why did the model choose an antecedent?).

Conclusion

Reference resolution is essential for coherent NLP systems, evolving from rule-based to neural
methods. While transformers (BERT, GPT) now dominate, challenges like ambiguity and world
knowledge integration remain open problems.

Would you like a deeper dive into a specific technique (e.g., Hobbs' algorithm or SpanBERT)? Let
me know!

cohesion vs coherence in discourse

Cohesion vs. Coherence in Discourse: Key Differences

In discourse analysis, cohesion and coherence are fundamental concepts that determine how well a
text holds together. While they are related, they address different aspects of textual unity.

1. Cohesion: Surface-Level Connections

Definition:
Cohesion refers to the grammatical and lexical devices that link sentences and clauses explicitly,
making the text structurally connected.

Types of Cohesive Devices


Type Example Function

Reference "John arrived. He was late." Pronouns (he, she, it) refer back to nouns.

Substitution "I prefer tea. Do you want some?" Replaces a word (some = some tea).

Ellipsis "Who took the book? Mary [did]." Omits redundant words.

Conjunctions "It rained, so we stayed home." Links clauses (so, because, however).

Lexical Cohesion "The cat purred. The animal slept." Synonyms/repetition tie ideas together.

Key Point:
Cohesion is explicit—it relies on visible linguistic markers that create formal connections between
sentences.

2. Coherence: Meaningful Logical Flow

Definition:
Coherence refers to the underlying logical consistency and meaningful organization of ideas in a
text, making it understandable to the reader.

Elements of Coherence

Aspect | Example | Function

Topic Continuity | "Global warming is a crisis. It threatens ecosystems." | Maintains a central theme.

Logical Order | "First, boil water. Next, add tea leaves." | Sequences ideas chronologically.

Cause-Effect | "She forgot her keys, so she was locked out." | Explains relationships between events.

World Knowledge | "The CEO resigned. Stocks plummeted." | Readers infer causation via context.

Key Point:
Coherence is implicit—it depends on the reader’s ability to infer connections, even if explicit
markers (like conjunctions) are missing.

3. Key Differences

Feature Cohesion Coherence

Nature Surface-level (linguistic links) Deep-level (logical meaning)

Visibility Explicit (grammar/lexical ties) Implicit (reader inference)

Dependence Needs cohesive markers Can exist without cohesive devices

Example "She left. Then, he cried." "She left. He was heartbroken."


Analogy:

 Cohesion = The bricks and mortar holding a wall together.

 Coherence = The architectural design ensuring the wall stands logically.

4. Why Both Matter

 A text can be cohesive but not coherent:

o "The sun is hot. Coffee is bitter. She runs fast." (Grammatically linked but illogical).

 A text can be coherent but not cohesive:

o "Rain. Floods. Evacuations." (No conjunctions/pronouns, but the meaning is clear).

Best Practices:

 Use cohesive devices to guide readers (e.g., "However, in contrast").

 Ensure coherence through logical flow (e.g., problem → solution).

5. Applications in NLP

 Machine Translation: Must preserve cohesion (pronoun resolution) and coherence (logical
order).

 Text Summarization: Cohesive chains (entity repetition) help retain key ideas.

 Chatbots: Need coherence to maintain conversational logic.

Tools for Analysis:

 Cohesion: Use Coreference Resolution (e.g., spaCy’s neuralcoref).

 Coherence: Measure topic modeling (e.g., LDA) or entity grid models.

Conclusion

 Cohesion = "How" ideas are linked (grammar/lexis).

 Coherence = "Why" ideas make sense (logic/meaning).


Strong discourse requires both.

Want to explore how to measure coherence computationally? Let me know!

what is perplexity in language modelling how is it used to evaluate language models

Perplexity in Language Modeling: Definition and Evaluation

Perplexity is a key metric used to evaluate how well a language model predicts a sequence of
words. It measures the uncertainty or "confusion" of the model when assigning probabilities to
unseen text. Lower perplexity indicates better performance.
1. Mathematical Definition

Perplexity (PP) is defined as the exponential of the cross-entropy loss:

\text{Perplexity} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_{<i})\right)

where:

 N = number of words in the test corpus,

 P(w_i ∣ w_{<i}) = probability assigned by the model to the i-th word given its context.

Intuition Behind the Formula

 If the model assigns high probability to the correct next word, perplexity decreases.

 If the model is uncertain (low probability for correct words), perplexity increases.

Example:

 A perplexity of 20 means the model was as "confused" as if it had to choose uniformly among 20 possible words at each step.

2. How Perplexity Evaluates Language Models

(A) Lower Perplexity = Better Model

 A lower perplexity means the model is more confident in its predictions.

 Example:

o Model A (Perplexity = 50) is worse than Model B (Perplexity = 30).

(B) Comparing Different Models

 Used to compare:

o N-gram models vs. neural models (e.g., LSTM vs. Transformer).

o Different training datasets (e.g., Wikipedia vs. Twitter data).

o Hyperparameter tuning (e.g., model size, learning rate).

(C) Limitations of Perplexity

1. Does Not Measure Fluency or Coherence

o A model may achieve low perplexity but generate nonsensical text (e.g., GPT-2
memorizing training data).

2. Corpus-Dependent

o Perplexity values are only meaningful when comparing models on the same test set.

3. Biased Towards Frequent Words


o Models may optimize for predicting common words (e.g., "the," "a") rather than
rare but meaningful terms.

3. Practical Example

Suppose a bigram language model is tested on the sentence:

"The cat sat on the mat."

Word (w_i) Previous Word (w_{i−1}) P(w_i ∣ w_{i−1}) −log P(w_i ∣ w_{i−1})

"cat" "The" 0.4 0.916

"sat" "cat" 0.3 1.204

"on" "sat" 0.6 0.511

"the" "on" 0.5 0.693

"mat" "the" 0.2 1.609

Perplexity Calculation:

Cross-entropy = (0.916 + 1.204 + 0.511 + 0.693 + 1.609) / 5 = 0.9866

Perplexity = e^0.9866 ≈ 2.68

This means the model is roughly as uncertain as choosing between ~2.7 words at each step.
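A minimal sketch reproducing the numbers above in plain Python (probabilities taken from the table):

```python
import math

# Bigram probabilities from the worked example above.
probs = [0.4, 0.3, 0.6, 0.5, 0.2]  # P("cat"|"The"), P("sat"|"cat"), ...

cross_entropy = -sum(math.log(p) for p in probs) / len(probs)
perplexity = math.exp(cross_entropy)

print(round(cross_entropy, 4))  # ~0.9866
print(round(perplexity, 2))     # ~2.68
```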

4. Perplexity in Modern NLP

 Transformers (GPT, BERT): Achieve far lower perplexity than N-gram models on benchmark datasets (e.g., WikiText-2, Penn Treebank).

 N-gram Models: Typically have much higher perplexity (e.g., 100+ for trigram models).

 Human Baseline: Estimated to be ~10-20 (humans also face ambiguity in language).

5. When to Use Perplexity?

Use Case Recommended Metric(s)

Pre-training evaluation Perplexity

Fine-tuning tasks BLEU, ROUGE (for translation/summarization)

Human-like text generation Human evaluation (e.g., fluency, coherence)

6. Key Takeaways

1. Perplexity measures how "surprised" a model is by unseen text.


2. Lower values = better predictive performance.

3. Useful for comparing models but doesn’t guarantee good text generation.

4. Works best alongside other metrics (e.g., BLEU, human judgment).

Smoothing Techniques in N-Gram Models

Smoothing is a crucial technique in N-gram language modeling to handle unseen word sequences (zero-probability issues) and improve generalization. Below are the key methods, ranked from simplest to most advanced:

1. Why Smoothing is Needed?

Problem:

 N-gram models assign zero probability to unseen N-grams (due to data sparsity).

 Example: If "cat sat" never appeared in training, P("sat"∣"cat")=0P("sat"∣"cat")=0.

Solution:
Smoothing redistributes probability mass to unseen events while discounting seen ones.

2. Types of Smoothing Techniques

(1) Laplace (Add-One) Smoothing

 Idea: Add 1 to all N-gram counts (including unseen ones).

 Formula:

P(w_i \mid w_{i-1}) = \frac{\text{Count}(w_{i-1}, w_i) + 1}{\text{Count}(w_{i-1}) + V}

o V = Vocabulary size (total unique words).

 Pros: Simple.

 Cons: Over-smoothes (assigns too much mass to rare/unseen N-grams).

Example:

 If "cat sat" occurred 15 times and "cat" occurred 50 times, with V=1000V=1000:

PLaplace("sat"∣"cat")=15+150+1000≈0.015PLaplace("sat"∣"cat")=50+100015+1≈0.015

(2) Add-K Smoothing (Lidstone Smoothing)

 Improvement: Add a fractional k (e.g., 0.5) instead of 1.

 Formula:

P(w_i \mid w_{i-1}) = \frac{\text{Count}(w_{i-1}, w_i) + k}{\text{Count}(w_{i-1}) + k \cdot V}

 Pros: More flexible than Laplace.

 Cons: Requires tuning k.

(3) Good-Turing Smoothing

 Idea: Estimate the probability of unseen N-grams using the frequency of N-grams seen
once ("singletons").

 Formula:

P_{GT} = \frac{N_1}{N} \quad \text{(for unseen N-grams)}

o N_1 = Number of N-grams seen once.

o N = Total N-grams.

 Pros: Theoretically grounded.

 Cons: Fails for high-frequency N-grams.

Example:

 If 10 bigrams appeared once in a corpus of 1000 bigrams:

P_GT(unseen) = 10 / 1000 = 0.01
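A tiny sketch of the basic Good-Turing estimate of the total unseen mass, checked against the toy numbers above:

```python
from collections import Counter

def good_turing_unseen_mass(ngram_counts):
    """Basic Good-Turing: total probability mass reserved for unseen n-grams
    is N1 / N, where N1 = number of n-grams seen exactly once."""
    n_total = sum(ngram_counts.values())
    n_once = sum(1 for c in ngram_counts.values() if c == 1)
    return n_once / n_total

# Toy check matching the example: 10 singleton bigrams out of 1000 bigram tokens.
counts = Counter({("a", "b"): 990})
counts.update({("x", str(i)): 1 for i in range(10)})
print(good_turing_unseen_mass(counts))  # 10 / 1000 = 0.01
```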

(4) Jelinek-Mercer (Linear Interpolation)

 Idea: Combine higher-order and lower-order N-grams (e.g., trigram + bigram + unigram).

 Formula:

P(w_i \mid w_{i-2}, w_{i-1}) = \lambda_1 P_{\text{trigram}} + \lambda_2 P_{\text{bigram}} + \lambda_3 P_{\text{unigram}}

o The λ's are interpolation weights with ∑λ = 1.

 Pros: Balances context specificity and data sparsity.

 Cons: Requires tuning the λ's (often via the EM algorithm).
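A minimal sketch of linear interpolation (the λ weights are illustrative; in practice they are tuned on held-out data, e.g., via EM):

```python
def interpolated_prob(p_trigram, p_bigram, p_unigram, lambdas=(0.6, 0.3, 0.1)):
    """Jelinek-Mercer linear interpolation: weighted mix of n-gram orders.
    The weights must sum to 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_trigram + l2 * p_bigram + l3 * p_unigram

# Toy example: the trigram estimate is zero (unseen), but lower orders rescue it.
print(interpolated_prob(p_trigram=0.0, p_bigram=0.3, p_unigram=0.01))  # 0.091
```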

(5) Kneser-Ney Smoothing

 State-of-the-art: Used in modern N-gram models (e.g., Google N-gram Corpus).

 Key Idea:

o Absolute Discounting: Subtract a fixed discount D from seen N-gram counts.

o Continuation Probability: Estimate P(w_i) based on how many contexts it appears in (not raw frequency).

 Formula:

P_{KN}(w_i \mid w_{i-1}) = \frac{\max(\text{Count}(w_{i-1}, w_i) - D, 0)}{\text{Count}(w_{i-1})} + \lambda(w_{i-1}) \, P_{\text{continuation}}(w_i)

o P_{\text{continuation}}(w_i) = \frac{|\{w_{i-1} : \text{Count}(w_{i-1}, w_i) > 0\}|}{|\text{All bigrams}|}

 Pros: Handles rare words better (e.g., "San Francisco" vs. "Francisco" alone).

3. Comparison of Smoothing Methods

Method Handles Unseen Handles Rare Complexity Used In

Laplace Yes Poorly Low Basic NLP

Good-Turing Yes Moderately Medium Small corpora

Jelinek-Mercer Yes Well High Speech recognition

Kneser-Ney Yes Best Highest SOTA N-gram models

4. Practical Example

Task: Compute P("sat" ∣ "cat") with Kneser-Ney smoothing.

 Training Corpus:

o Bigram "cat sat" appeared 15 times.

o Word "cat" appeared 50 times.

o "sat" appeared in 3 distinct contexts (e.g., "cat sat", "dog sat", "bird sat").

o Total unique bigrams = 1000.

o Discount D = 0.75.

Calculation:

1. Discount seen bigrams:

max(15 − 0.75, 0) / 50 = 14.25 / 50 = 0.285

2. Continuation probability:

P_continuation("sat") = 3 / 1000 = 0.003

3. Normalization factor λ:

λ("cat") = (0.75 × |{w : Count("cat", w) > 0}|) / 50

If "cat" precedes 5 unique words, λ = (0.75 × 5) / 50 = 0.075.

4. Final probability:

P_KN("sat" ∣ "cat") = 0.285 + 0.075 × 0.003 ≈ 0.285
5. Which One to Use?

 Small datasets: Good-Turing or Laplace.

 Medium datasets: Jelinek-Mercer.

 Large datasets: Kneser-Ney (industry standard).

Modern NLP Note:


While smoothing is vital for N-grams, neural LMs (e.g., BERT, GPT) use subword
tokenization and embedding dropout to handle sparsity implicitly.

Summary

Smoothing ensures N-gram models generalize to unseen data. Kneser-Ney is the gold standard, but
simpler methods (Laplace, Good-Turing) are useful for understanding the fundamentals.

Need code implementations? Ask for Python examples!

Explain the architecture of NLG systems with a block diagram.

Architecture of Natural Language Generation (NLG) Systems

Natural Language Generation (NLG) converts structured data into human-readable text. Below is
a block diagram and explanation of its key components:

Block Diagram of NLG System

+------------------+     +------------------+     +------------------+     +------------------+
|    Data Input    | --> |     Content      | --> |       Text       | --> |      Output      |
| (Structured Data)|     |     Planning     |     |   Realization    |     |(Natural Language |
|                  |     |  (What to Say)   |     | (How to Say It)  |     |   Text/Speech)   |
+------------------+     +------------------+     +------------------+     +------------------+
                                  |                        |
                                  v                        v
                         +------------------+     +------------------+
                         |    Discourse     |     |    Lexical &     |
                         |   Structuring    |     |Syntactic Choices |
                         +------------------+     +------------------+
Key Components Explained

1. Data Input (Structured Data)

 Input: Tables, databases, JSON, or knowledge graphs.

 Example: Weather data ({temp: 25°C, humidity: 80%}).

2. Content Planning (What to Say)

 Task: Decides what information to include.

 Methods:

o Rule-based: Templates (If rain → mention umbrella).

o ML-based: Entity importance ranking.

 Output: Message plan (["temp=25", "humidity=high"]).

3. Discourse Structuring (Logical Flow)

 Task: Organizes content into a coherent structure.

 Methods:

o Rhetorical Structure Theory (RST): Creates logical relations (cause-effect, contrast).

o Sequencing: Time/importance order ("High humidity (80%) with 25°C").

4. Text Realization (How to Say It)

 Task: Converts structured messages into grammatical sentences.

 Sub-components:

o Lexical Choice: Picks words ("high" vs "elevated" humidity).

o Syntax Generation: Applies grammar rules ("The temperature is 25°C").

o Referring Expressions: Handles pronouns ("It will rain").

5. Output (Natural Language Text/Speech)

 Final Output:

o Text: "Today’s temperature is 25°C with high humidity (80%)."

o Speech: Converted via TTS (e.g., Alexa, Siri).

Types of NLG Systems

Type Approach Example

Template-Based Fixed placeholders "The {temp} is {value}°C."


Pipeline Sequential (as above) Classic NLG (e.g., WEATHERGEN)

End-to-End Neural networks (Seq2Seq) GPT-3, T5

Example: Weather Report Generation

1. Input: {temp: 25, humidity: 80, rain: false}

2. Content Planning: Include temp & humidity; skip rain.

3. Discourse: Order by importance → Temp → Humidity.

4. Realization:

o Lexical: "high" for humidity.

o Syntax: "The temperature is 25°C with high humidity (80%)."

5. Output: Final text.
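As a minimal sketch of the template-based pipeline just described (the input keys, the humidity threshold, and the wording are assumptions for illustration, not a production NLG system):

```python
def generate_weather_report(data):
    """Tiny template-based NLG pipeline: content selection -> ordering -> realization."""
    # 1. Content planning: decide what to mention.
    messages = [("temp", data["temp"])]
    if data["humidity"] >= 70:          # assumed threshold for calling humidity "high"
        messages.append(("humidity", data["humidity"]))
    if data.get("rain"):
        messages.append(("rain", True))

    # 2. Discourse structuring: fixed order of importance (temp before humidity).
    order = {"temp": 0, "humidity": 1, "rain": 2}
    messages.sort(key=lambda m: order[m[0]])

    # 3. Surface realization: fill lexicalized templates.
    parts = []
    for kind, value in messages:
        if kind == "temp":
            parts.append(f"The temperature is {value}°C")
        elif kind == "humidity":
            parts.append(f"with high humidity ({value}%)")
        elif kind == "rain":
            parts.append("and rain is expected")
    return " ".join(parts) + "."

print(generate_weather_report({"temp": 25, "humidity": 80, "rain": False}))
# -> "The temperature is 25°C with high humidity (80%)."
```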

Modern NLG (Neural Approaches)

 Architecture:

+-------------------+     +-------------------+     +-------------------+
|      Encoder      | --> |      Decoder      | --> |       Text        |
| (Data → Vectors)  |     | (Vectors → Text)  |     |      Output       |
+-------------------+     +-------------------+     +-------------------+

 Models: GPT-3, BART, T5 (no explicit planning; end-to-end learning).

Key Challenges

1. Coherence: Maintaining logical flow in long texts.

2. Diversity: Avoiding repetitive outputs.

3. Control: Ensuring factual correctness (e.g., in medical reports).

Summary

NLG systems transform structured data → natural language through:

1. Content Planning (What to say),


2. Discourse Structuring (Ordering),

3. Text Realization (Grammar + word choice).

Neural models (GPT, T5) merge these steps into one end-to-end system.

Would you like a code example for a simple NLG pipeline? Let me know!

Compare Statistical Machine Translation (SMT) vs. Neural Machine Translation (NMT)

Here’s a detailed comparison between Statistical Machine Translation (SMT) and Neural Machine
Translation (NMT):

Statistical Machine Translation (SMT) vs. Neural Machine Translation (NMT)

Feature | Statistical Machine Translation (SMT) | Neural Machine Translation (NMT)

Approach | Based on probabilistic models (e.g., phrase/word alignments). | Uses deep neural networks (e.g., Seq2Seq, Transformers).

Core Idea | Breaks sentences into phrases, translates them, and reorders. | Learns an end-to-end mapping from source to target language.

Key Components | Phrase tables, language model, reordering model | Encoder-decoder architecture, attention mechanism

Training Data | Relies on parallel corpora (source ↔ target sentences). | Requires large parallel corpora but learns features automatically.

Translation Process | 1. Segment source text into phrases. 2. Look up translations. 3. Reorder and glue phrases. | 1. Encoder converts the source into a vector. 2. Decoder generates the target sequentially.

Handling Ambiguity | Uses context from n-gram language models (limited). | Captures long-range dependencies via attention (better context).

Fluency & Coherence | Often produces rigid, less fluent output due to phrase stitching. | Generates more natural, human-like translations.

Out-of-Vocabulary (OOV) | Struggles with rare/unknown words (requires backoff methods). | Handles OOV better via subword units (BPE, WordPiece).

Domain Adaptation | Needs manual feature engineering for new domains. | Adapts better with fine-tuning (transfer learning).

Performance | Good for short, simple sentences; struggles with long sentences. | Superior for complex sentences and long-range dependencies.

Speed | Faster at inference (pre-computed phrase tables). | Slower due to sequential decoding, but optimized with GPUs.

Examples | Moses (phrase-based SMT), IBM Models (word-based) | Google Translate (GNMT), OpenAI GPT-3, Transformer models

Key Differences Explained

1. Translation Approach

 SMT:

o Uses statistical models (e.g., noisy channel model).

o Splits sentences into phrases, translates them independently, and reorders.

o Relies on handcrafted features (e.g., phrase alignment, distortion penalties).

 NMT:

o Treats translation as a sequence-to-sequence (Seq2Seq) problem.

o Encoder converts source sentence into a dense vector.

o Decoder generates target words one-by-one (autoregressive).

o Uses attention mechanisms (e.g., Transformer) to focus on relevant words.

2. Context Handling

 SMT: Limited to local context (n-grams).

o Example: "bank" might be mistranslated if the sentence is long.

 NMT: Captures global context via attention.

o Example: Correctly translates "bank" as financial or river based on full sentence.

3. Output Quality

 SMT:

o Pros: Fast, interpretable (phrase tables can be inspected).

o Cons: Less fluent, struggles with idiomatic expressions.

 NMT:

o Pros: More fluent, handles idioms and long sentences better.

o Cons: Black-box nature (hard to debug errors).

4. Training & Scalability

 SMT: Requires pipeline tuning (language model, reordering model).

 NMT: End-to-end training (single model optimizes all steps).


Example Translation

Source: "The cat sat on the mat."

 SMT: Looks up "the cat" → "le chat", "sat on" → "assis sur", etc., and combines them.

 NMT: Encodes the entire sentence, then generates French words sequentially: "Le chat s'est
assis sur le tapis."

When to Use Which?

 SMT:

o Low-resource languages (small parallel corpora).

o Applications needing speed (e.g., real-time chat).

 NMT:

o High-resource languages (e.g., EN↔FR).

o Quality-critical tasks (e.g., publishing, customer support).

Conclusion

 SMT was dominant pre-2014 (e.g., Google Translate used it until 2016).

 NMT (post-2016) now dominates due to better fluency and context handling.

 Hybrid models (SMT + NMT) are sometimes used for low-resource languages.

Would you like a deep dive into Transformer-based NMT? Let me know!

What are the challenges in Machine Translation and how can they be solved?

Challenges in Machine Translation (MT) & Their Solutions

Machine Translation (MT) faces several challenges due to linguistic complexity, cultural differences,
and technical limitations. Below are the key challenges and potential solutions:

1. Ambiguity in Language

Problem:

 Words/phrases can have multiple meanings (lexical/syntactic ambiguity).

o Example: "He saw the bank" → Financial bank or river bank?

Solutions:

 Contextual Embeddings (BERT, GPT) to capture word sense from surrounding text.

 Disambiguation Rules (e.g., "bank" + "money" → financial institution).


2. Idioms & Cultural Nuances

Problem:

 Literal translation fails for idioms (e.g., "Kick the bucket" → "Die").

Solutions:

 Parallel Corpora with Idioms (train on phrase pairs like "It’s raining cats and dogs" → "Il
pleut des cordes").

 Post-Editing by Humans for culturally sensitive content.

3. Low-Resource Languages

Problem:

 Limited training data for rare languages (e.g., Swahili ↔ Navajo).

Solutions:

 Transfer Learning: Pretrain on high-resource languages (e.g., English), then fine-tune.

 Backtranslation: Generate synthetic parallel data (e.g., translate monolingual text).

 Multilingual Models (e.g., mBERT, NLLB) that share parameters across languages.

4. Word Order Differences

Problem:

 Languages have different syntax (e.g., English: Subject-Verb-Object vs. Japanese: Subject-
Object-Verb).

Solutions:

 Attention Mechanisms (Transformers) to align words dynamically.

 Reordering Rules (SMT) or syntactic preprocessing.

5. Rare/Unknown Words (OOV)

Problem:

 Proper nouns (e.g., "ChatGPT"), technical terms, or slang may be missing.

Solutions:

 Subword Tokenization (Byte-Pair Encoding, WordPiece) to split rare words.

 Copy Mechanism: Directly copy OOV words from source to target.

6. Domain Adaptation
Problem:

 Medical/legal MT models perform poorly on casual text (and vice versa).

Solutions:

 Fine-Tuning: Train base models on domain-specific data.

 Hybrid Models: Combine general MT with domain-specific rules.

7. Gender & Bias

Problem:

 Models reinforce stereotypes (e.g., "doctor → he, nurse → she").

Solutions:

 Debiasing Datasets (balance gender references).

 Neutral Rewriting: Use gender-neutral terms (e.g., "they").

8. Evaluation Challenges

Problem:

 BLEU/ROUGE scores don’t always match human judgment.

Solutions:

 Human Evaluation + automated metrics (e.g., COMET, BERTScore).

 Adversarial Testing: Check robustness to perturbations.

Key Takeaways

Challenge Solution Example Tools/Models

Ambiguity Contextual embeddings BERT, GPT-3

Idioms Curated parallel corpora OPUS, ParaCrawl

Low-Resource Languages Transfer learning, backtranslation mBERT, NLLB-200

Word Order Attention mechanisms Transformer, MarianMT

OOV Words Subword tokenization SentencePiece, WordPiece

Domain Adaptation Fine-tuning BioBERT, Legal-BERT

Bias Debiasing datasets HuggingFace’s unbiased models


Future Directions

 Zero-Shot Translation: Translate between language pairs never seen during training.

 Explainable AI: Debug why MT models make certain errors.

 Real-Time Adaptive MT: Learn user preferences on the fly.

Conclusion: While NMT (e.g., Google Translate, DeepL) has improved translation quality, challenges
like ambiguity, bias, and low-resource languages persist. Combining neural models with linguistic
rules and human-in-the-loop validation offers the best path forward.

Would you like a case study on how Google Translate handles these challenges? Let me know!

Discuss the applications of NLG in real-world scenarios.

Applications of Natural Language Generation (NLG) in Real-World Scenarios

Natural Language Generation (NLG) transforms structured data into human-like text, enabling
automation and enhanced communication across industries. Below are key real-world applications:

1. Business Intelligence & Analytics

Use Case: Automating reports from data (sales, finance, operations).

 Examples:

o Financial Reports: Generate earnings summaries from spreadsheets.

o Dashboards: Convert KPIs into narratives (e.g., "Q3 sales grew by 12% due to
demand in Europe").

 Tools:

o Narrative Science’s Quill: Creates data-driven business reports.

o Power BI + NLG: Adds textual insights to visualizations.

2. E-Commerce & Retail

Use Case: Product descriptions, personalized recommendations.

 Examples:

o Amazon: Auto-generates product descriptions for millions of items.

o Chatbots: Suggest products in natural language (e.g., "You might like this red dress
based on your browsing history").

 Tools:

o GPT-3 for Shopify: Writes SEO-friendly product descriptions.

3. Healthcare
Use Case: Clinical documentation, patient summaries.

 Examples:

o Radiology Reports: Convert imaging data into text (e.g., "The X-ray shows a 2cm
nodule in the left lung").

o Discharge Summaries: Automate post-visit instructions for patients.

 Tools:

o IBM Watson Health: Generates oncology reports.

4. Journalism & Content Creation

Use Case: News generation, sports recaps, weather forecasts.

 Examples:

o Associated Press (AP): Uses NLG to publish thousands of earnings reports quarterly.

o Weather.com: Converts data into forecasts (e.g., "Expect rain after 3 PM with 80%
humidity").

 Tools:

o Automated Insights’ Wordsmith: Powers AP’s automated stories.

5. Customer Service & Chatbots

Use Case: Dynamic responses in chatbots, email automation.

 Examples:

o Banking: Explain transaction disputes in plain language.

o Zendesk: Auto-generates ticket responses.

 Tools:

o Google’s Dialogflow: Creates context-aware chatbot replies.

6. Education & E-Learning

Use Case: Personalized feedback, quiz generation.

 Examples:

o Duolingo: Generates language exercises dynamically.

o Automated Grading: Provides essay feedback (e.g., "Your argument needs more
evidence").

 Tools:
o OpenAI’s Codex: Explains programming errors in natural language.

7. Legal & Compliance

Use Case: Drafting contracts, summarizing case law.

 Examples:

o Legal Documents: Auto-generate clauses for NDAs.

o LexisNexis: Summarizes court rulings.

 Tools:

o ROSS Intelligence: Uses AI for legal research.

8. Gaming & Interactive Stories

Use Case: Dynamic NPC dialogues, procedural storytelling.

 Examples:

o AI Dungeon: Generates interactive fantasy stories.

o ChatGPT for RPGs: Creates quest narratives on the fly.

9. Social Media & Marketing

Use Case: Ad copy, personalized emails.

 Examples:

o Phrasee: Optimizes email subject lines using NLG.

o ChatGPT: Writes LinkedIn posts or tweet threads.

Key Benefits of NLG

 Scalability: Generate millions of texts instantly.

 Consistency: Avoids human errors in repetitive tasks.

 Personalization: Tailors content to user data.

Challenges

 Quality Control: May produce biased or inaccurate text.

 Context Limits: Struggles with nuanced humor/sarcasm.

Future Trends
 Multimodal NLG: Combine text with images/video (e.g., AI-generated video scripts).

 Real-Time NLG: Live sports commentary, stock market alerts.

Conclusion: NLG is revolutionizing industries by automating text generation, but human oversight
remains critical for quality. From healthcare to gaming, its applications are expanding with advances
in AI like GPT-4 and Google’s PaLM.

Want a deep dive into NLG for a specific industry? Ask away!

What are the problems in translating Indian languages (including Hindi and Telugu)?

Challenges in Translating Indian Languages (Hindi, Telugu, and Others)

Translating between Indian languages (e.g., Hindi to Telugu or vice versa) poses unique difficulties
due to linguistic diversity, script variations, and cultural nuances. Below are the key challenges and
potential solutions:

1. Linguistic Diversity & Grammar Differences

Problems:

 Word Order Variations:

o Hindi follows Subject-Object-Verb (SOV) order ("मैं सेब खाता हूँ " → "I apple eat").

o Telugu also uses SOV ("నేను ఆపిల్ తింటాను"), but other Dravidian languages like
Tamil may differ.

 Gender & Verb Agreement:

o Hindi verbs change based on gender ("खाता हूँ " for male vs. "खाती हूँ " for female).

o Telugu verbs mark gender in the third person ("తింటాడు" for "he eats" vs. "తింటుంది" for "she eats"), though not in the first person.

 Lack of Standardization:

o Dialectal variations (e.g., Hyderabadi Hindi vs. Delhi Hindi).

Solutions:

 Rule-based reordering in Statistical Machine Translation (SMT).

 Neural Machine Translation (NMT) models with attention mechanisms (e.g., Google’s
IndicTrans).

2. Script & Orthography Challenges

Problems:

 Different Scripts:
o Hindi: Devanagari (अक्षर)

o Telugu: Telugu script (అక్షరం)

o Romanized Hindi/Telugu (e.g., "Akshar") adds complexity.

 Schwa Deletion in Hindi:

o Hindi often drops the schwa sound (e.g., "राम" is pronounced "Ram" but written
with an inherent 'a').

Solutions:

 Unicode normalization for script conversion.

 Transliteration tools (e.g., Google’s Indic Transliteration API).

3. Lack of Parallel Corpora

Problems:

 Scarce Training Data:

o Limited Hindi-Telugu parallel datasets compared to English-Hindi.

 Domain-Specific Gaps:

o Medical/legal translations are rare.

Solutions:

 Backtranslation (generate synthetic data).

 Multilingual NMT models (e.g., Facebook’s NLLB-200).

4. Morphological Richness

Problems:

 Agglutination in Telugu:

o Words combine multiple morphemes (e.g., "చదివించింది" = "made someone read").

 Compound Words in Hindi:

o "राष्ट्रपतत" (Rashtrapati = President) is a single word.

Solutions:

 Morphological analyzers (e.g., IIIT-Hyderabad’s tools for Telugu).

 Subword tokenization (Byte-Pair Encoding).


5. Idioms & Cultural Nuances

Problems:

 Untranslatable Phrases:

o Hindi: "जान छु ड़ाना" (to save one’s life) has no direct Telugu equivalent.

o Telugu: "గు డ్డి గుర్రిం" (blind horse = stubborn person).

 Religious/Cultural Terms:

o "Prasadam" (Hindi/Telugu) has no English equivalent.

Solutions:

 Bilingual dictionaries with explanations.

 Human-in-the-loop post-editing.

6. Code-Mixing & Loanwords

Problems:

 English Words in Hindi/Telugu:

o "मैं ने उसको call तकया" (Hindi)

o "నేను అతనికి call చేశాను" (Telugu)

 Persian/Arabic Influence in Hindi:

o "मु श्किल" (difficulty) vs. Telugu’s native "క ష్ిం


ట ".
Solutions:

 Code-mixed language models (e.g., Hinglish/Telugu-English corpora).

7. Low-Resource Language Pairs

Problems:

 Hindi-Telugu translation tools are less developed than Hindi-English.

 Fewer NLP Researchers focus on Dravidian languages.

Solutions:

 Government initiatives (e.g., AI4Bharat’s IndicNLP Suite).

 Community-driven datasets (e.g., Anuvaad for Hindi-Telugu).

Comparison: Hindi vs. Telugu Translation Challenges


Challenge Hindi Telugu

Script Devanagari Telugu script

Verb Agreement Gender-based (M/F) Gender & plurality-based

Word Order SOV SOV

Loanwords Persian/Arabic/English Sanskrit/English

Existing Tools for Hindi-Telugu Translation

1. Google Translate (Neural MT, decent but error-prone).

2. Microsoft Translator (Supports code-mixed input).

3. IndicTrans (IIIT-Hyderabad’s open-source NMT model).

4. Shakti MT (IIT-led initiative for Indian languages).

Future Directions

 Better Pretrained Models (e.g., IndicBERT).

 Speech-to-Speech Translation (e.g., Hindi-Telugu real-time apps).

Conclusion: While Hindi-Telugu translation is improving with NMT, challenges like data scarcity,
script differences, and idioms persist. Collaborative efforts between linguists and AI researchers are
key to progress.

Need examples of Hindi-Telugu translation errors? Ask for a case study!

Explain Rule-Based vs. Corpus-Based Machine Translation.

Rule-Based vs. Corpus-Based Machine Translation: Key Differences

Machine Translation (MT) systems can be broadly categorized into Rule-Based (RBMT) and Corpus-
Based (CBMT) approaches. Here's a detailed comparison:

1. Rule-Based Machine Translation (RBMT)

Definition:
Uses linguistic rules (grammar, syntax, semantics) and bilingual dictionaries to translate text.

How It Works:

1. Analysis: Parses source text into grammatical structures.

2. Transfer: Applies language-pair rules to convert structures.

3. Generation: Produces target-language text using dictionaries.

Types of RBMT:
 Direct MT: Word-by-word substitution (e.g., early systems like SYSTRAN).

 Transfer-Based MT: Uses intermediate structural representations.

 Interlingua MT: Translates via a universal abstract language.

Pros:

✅ Transparent: Rules are human-readable and editable.
✅ No training data needed: Works for low-resource languages.
✅ Consistent outputs: Follows predefined rules strictly.

Cons:

❌ Labor-intensive: Requires manual rule creation by linguists.
❌ Inflexible: Fails with idioms, slang, or complex sentences.
❌ Poor scalability: Adding new languages requires new rules.

Example:

 Input (Hindi): "वह तकताब पढ़ रहा है ।"

 RBMT Steps:

1. Parse: [वह] [किताब] [पढ़ रहा है] → Pronoun + Noun + Verb.

2. Transfer: Hindi SOV → English SVO.

3. Generate: "He is reading the book."

2. Corpus-Based Machine Translation (CBMT)

Definition:
Relies on statistical patterns or neural networks learned from large bilingual text corpora.

Subtypes:

1. Statistical MT (SMT):

o Uses probability models (e.g., IBM Models, phrase-based SMT).

o Example: Google Translate (pre-2016).

2. Neural MT (NMT):

o Uses deep learning (e.g., Seq2Seq, Transformers).

o Example: Modern Google Translate, DeepL.

How It Works:

 SMT:

1. Aligns parallel sentences (e.g., Hindi-English).

2. Calculates translation probabilities (e.g., P("book" ∣ "किताब")).


 NMT:

1. Encodes source text into vectors.

2. Decodes vectors into target language.

Pros:

✅ Adaptive: Learns from data (handles idioms, slang).
✅ Scalable: Improves with more data.
✅ Fluency: NMT produces more natural outputs.

Cons:

❌ Data-hungry: Requires massive parallel corpora.
❌ Black-box: Hard to debug errors.
❌ Biases: Inherits biases from training data.

Example:

 Input (Hindi): "वह तकताब पढ़ रहा है ।"

 SMT/NMT Steps:

1. Match with similar training examples.

2. Output: "He is reading the book."

Key Differences

Feature Rule-Based (RBMT) Corpus-Based (CBMT)

Basis Linguistic rules Statistical/neural patterns

Data Needed None (only dictionaries/rules) Large parallel corpora

Flexibility Rigid Adapts to context

Development Cost High (manual labor) Lower (automated training)

Output Quality Grammatical but unnatural Fluent but may hallucinate

Best For Low-resource languages High-resource languages

Hybrid Approaches

Modern systems (e.g., Google Translate) combine:

 RBMT: For rare languages (e.g., Sanskrit).

 NMT: For high-resource pairs (e.g., Hindi-English).


Conclusion

 Use RBMT when:

o You need control over translations (e.g., legal texts).

o Working with low-resource languages (e.g., Bhojpuri).

 Use CBMT (NMT) when:

o You have large datasets.

o Prioritize fluency (e.g., customer support chatbots).

NLG Architecture

Natural Language Generation (NLG) Architectures

Adapted from the RAGS reference architecture (Mellish et al., 2004) and other seminal works

1. Common Tasks in NLG Systems

All NLG systems perform three core tasks, with increasing specificity:

Task | Function | Examples

Discourse Planner | Plans overall structure (order, hierarchy); outputs a tree-like schema. | DRAFTER, DRAFTER-2

Text Planner | Decides sentence structure: aggregation, lexicalization, referring expressions. | HEALTHDOC

Surface Realizer | Generates grammatically correct sentences from the plan. | KUNET

Example of Specificity:
 Vague: "He danced with Jaya and she got angry."

 Specific: "He forced Jaya to dance, and she got angry."

o Choice of verb ("forced") changes the referent of "she."

2. NLG Architecture Types

(A) Pipelined Architecture

 Design: Strict sequential flow (Discourse → Text → Surface).

 Constraints: Each stage must complete before the next begins.

 Pros: Modular, easy to debug.

 Cons: Inflexible; errors propagate downstream.

 Example:

o Input: Data → Discourse Plan → Text Plan → "He danced with Jaya and she got
angry."

(B) Interleaved Architecture

 Design: Merges stages into two phases:

1. Planning (Discourse + Text).

2. Realization (Surface).

 Pros: More flexible than pipelined.

 Cons: Harder to maintain.

 Examples: PAULINE, MUMBLE, IPG, PROTEUS.

(C) Integrated (Blackboard) Architecture

 Design: All tasks interact dynamically on a shared "blackboard."

 Pros: Highly adaptable; handles complex constraints.

 Cons: Computationally intensive.

 Example: GLINDA.

3. Key Differences

Feature Pipelined Interleaved Integrated

Flow Linear Two-phase Dynamic

Flexibility Low Moderate High

Use Case Simple, deterministic Balanced complexity Highly context-dependent


4. Challenges & Trade-offs

 Pipelined: Struggles with backtracking (e.g., realizing a sentence may reveal planning errors).

 Interleaved/Integrated: Better for nuanced texts but require more computational power.

Example of Constraint Propagation:

 Pipelined: Surface realizer cannot revise the text planner’s choice of "danced" vs. "forced."

 Integrated: Blackboard allows real-time negotiation between modules.

5. Practical Applications

 Pipelined: Weather reports, templated medical summaries.

 Interleaved: Personalized narratives (e.g., patient education).

 Integrated: Creative storytelling, dialogue systems.

Summary

 Pipelined: "Assembly line" approach.

 Interleaved: Balances structure and flexibility.

 Integrated: Mimics human-like adaptability.

Next Steps: Explore how modern neural NLG (GPT-3, T5) blends these architectures.
