Notes
Your discussion highlights key challenges in compositional semantics, where the meaning of a
sentence is built from the meanings of its parts (words). A major hurdle is word sense ambiguity:
the same word (nail) can evoke different meanings depending on context. Below is a structured
breakdown of the issues and approaches you mentioned.
o Some theories (e.g., Prototype Theory) argue that word meanings are fuzzy, with
central and peripheral senses.
Clustering (e.g., k-means) groups similar word usages into "induced senses."
Classic ML (e.g., SVM, Decision Trees) trained on sense-annotated corpora (e.g., SemCor).
Neural WSD:
o Uses contextual embeddings (e.g., BERT) to predict the most likely sense in context (a minimal sketch follows below).
Here, selectional preferences (e.g., the verb hammer selects the metal-spike sense of nail) help resolve ambiguity.
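As a rough illustration of the contextual-embedding approach, the sketch below disambiguates "nail" by comparing its BERT embedding against two hand-picked sense exemplars. The model name, the exemplar sentences, and the assumption that "nail" is a single WordPiece token are illustrative choices, not part of any specific system described above.

```python
# Minimal nearest-sense WSD sketch with contextual embeddings.
# Assumes the transformers and torch packages are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]              # (seq_len, 768)
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]                           # assumes one wordpiece

# Hand-picked exemplars stand in for a real sense inventory such as WordNet glosses.
senses = {
    "metal_spike": embed_word("he hammered the nail into the wall", "nail"),
    "body_part":   embed_word("she painted her nail with red polish", "nail"),
}
query = embed_word("the carpenter bent the nail", "nail")
best = max(senses, key=lambda s: torch.cosine_similarity(query, senses[s], dim=0))
print(best)   # expected: metal_spike
```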
4. Challenges in WSD
Granularity of senses: Should nail (body part) and nail (metal spike) be separate senses, or
sub-senses?
3. Discourse: Prior mentions (e.g., "He bought nails" primes the hardware sense).
Key Takeaways
Modern NLP leans toward distributional semantics (embeddings) over rigid sense
inventories.
1. Homonymy
Definition:
Two or more unrelated meanings share the same spelling and pronunciation.
Examples:
Key Properties:
Requires strong contextual cues for disambiguation (e.g., "withdraw money from the
bank" vs. "swim near the river bank").
2. Polysemy
Definition:
A word has multiple related senses, often derived from a core meaning.
Examples:
Key Properties:
Graded meaning shift: Some senses are closer to the core meaning than others.
Often requires world knowledge (e.g., "bank of clouds" implies a stacked formation, similar
to how money is stored in a bank).
More challenging for machines than homonymy because boundaries between senses are
fuzzy.
3. Categorial Ambiguity
Definition:
A word belongs to different grammatical categories (e.g., noun vs. verb) with distinct
meanings.
Examples:
Key Properties:
Resolvable via part-of-speech (POS) tagging (e.g., if book is tagged as a verb, it cannot mean
a physical book).
Easier for NLP systems because syntax provides clear disambiguation cues (see the sketch below).
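A small, hedged sketch of this idea using NLTK's off-the-shelf tagger; it assumes the punkt and averaged_perceptron_tagger resources are downloaded, and the sentences are illustrative.

```python
# Categorial ambiguity of "book": the POS tag alone separates the two readings.
import nltk

for sentence in ["I will book a flight", "I read a good book"]:
    tags = nltk.pos_tag(nltk.word_tokenize(sentence))
    print(tags)
# "book" after the modal "will" is tagged as a verb (VB), while "book" after
# the determiner "a" is tagged as a noun (NN), resolving the ambiguity.
```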
1. Disambiguation Strategies
Categorial Ambiguity: resolved by syntactic parsing (POS tags, dependency trees), e.g., "Book a flight" (verb) vs. "Read a book" (noun).
2. Challenges in NLP
Categorial Ambiguity:
o Solved early in the NLP pipeline (POS tagging), but errors propagate if misclassified.
3. Theoretical Implications
Cognitive Linguistics: Do humans store all senses separately, or derive them dynamically?
Conclusion
Modern NLP systems (e.g., transformer models) blend distributional semantics (embeddings)
with symbolic knowledge (WordNet) to handle these ambiguities. However, polysemy remains the
hardest due to its fluid, context-dependent nature.
Would you like a deeper dive into how BERT or FrameNet handles these ambiguities?
An N-gram model is a probabilistic language model that predicts the next word in a sequence based
on the previous N−1 words. It relies on the Markov assumption, which simplifies language by
assuming that the probability of a word depends only on a fixed window of prior words.
Probability Calculation
P(w_n | w_{n-1}) = Count(w_{n-1}, w_n) / Count(w_{n-1})
Example: If "cat sat" appears 100 times and "cat" appears 200 times, then P("sat" | "cat") = 100 / 200 = 0.5.
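A minimal sketch of this maximum-likelihood estimate over a toy corpus (the corpus and the resulting counts are made up for illustration):

```python
# Maximum-likelihood bigram estimation from raw counts.
from collections import Counter

corpus = "the cat sat on the mat the cat sat on the hat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w_prev, w):
    """P(w | w_prev) = Count(w_prev, w) / Count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(p_bigram("cat", "sat"))   # 2/2 = 1.0 in this toy corpus
```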
Solutions:
Problem: Storing all possible n-grams requires memory that grows exponentially with N (e.g., a 5-gram model over a 50k-word vocabulary has 50,000^5 possible entries).
Solutions:
Problem: N-grams only consider local context (e.g., bigrams ignore words beyond the
immediate predecessor).
o Example: In "The cat that I adopted yesterday sat on the mat," the relationship
between "cat" and "sat" is lost in a bigram model.
Problem: New/unseen words break the model (e.g., slang, rare terms).
Solutions:
Conclusion
N-gram models are simple but limited by sparsity, fixed context, and lack of semantics. Modern NLP
leverages neural networks (Transformers, embeddings) to overcome these issues, though n-grams
remain useful for lightweight applications.
Next Steps: Explore Word2Vec, BERT, or smoothing techniques in detail? Let me know!
Reference resolution is the task of identifying what entities or concepts pronouns, demonstratives,
or other referring expressions point to in a discourse. It is crucial for tasks like coreference
resolution, anaphora resolution, and discourse understanding. Below are key techniques and their
applications.
o "Mary called her friend. She was happy." → "her", "she" = Mary
o "I entered the room. The ceiling was high." → "the ceiling" belongs to "the room"
"The doctor finished her shift." → "her" must refer to a female doctor.
o A classic rule-based method that traverses a syntactic tree to find the closest
compatible antecedent.
Feature-Based Models
o Uses supervised learning (e.g., SVM, decision trees) with features like:
Semantic compatibility (e.g., "The car… it" vs. "The car… she").
Mention-Pair Models
o "The bank collapsed. It was near the river." → WordNet helps distinguish financial
vs. river bank.
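For intuition, here is a toy sketch of the recency-plus-agreement heuristic that the feature-based models above learn automatically; the pronoun and mention feature tables are hand-made assumptions, not a real lexicon.

```python
# Naive pronoun resolution: pick the most recent preceding mention whose
# gender/number features agree with the pronoun.
PRONOUN_FEATURES = {"she": ("fem", "sg"), "he": ("masc", "sg"),
                    "it": ("neut", "sg"), "they": (None, "pl")}
MENTION_FEATURES = {"Mary": ("fem", "sg"), "John": ("masc", "sg"),
                    "the car": ("neut", "sg"), "the doctors": (None, "pl")}

def resolve(pronoun, mentions):
    """mentions: earlier noun phrases, oldest first."""
    p_gender, p_number = PRONOUN_FEATURES[pronoun]
    for m in reversed(mentions):                      # recency: newest first
        m_gender, m_number = MENTION_FEATURES[m]
        if m_number == p_number and (p_gender is None or m_gender is None
                                     or m_gender == p_gender):
            return m
    return None

print(resolve("she", ["John", "Mary", "the car"]))    # -> Mary
print(resolve("it", ["Mary", "the car"]))             # -> the car
```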
4. Evaluation Metrics
5. Applications
6. Tools (Tool / Description)
7. Future Directions
Conclusion
Reference resolution is essential for coherent NLP systems, evolving from rule-based to neural
methods. While transformers (BERT, GPT) now dominate, challenges like ambiguity and world
knowledge integration remain open problems.
Would you like a deeper dive into a specific technique (e.g., Hobbs' algorithm or SpanBERT)? Let
me know!
In discourse analysis, cohesion and coherence are fundamental concepts that determine how well a
text holds together. While they are related, they address different aspects of textual unity.
Definition:
Cohesion refers to the grammatical and lexical devices that link sentences and clauses explicitly,
making the text structurally connected.
Reference "John arrived. He was late." Pronouns (he, she, it) refer back to nouns.
Substitution "I prefer tea. Do you want some?" Replaces a word (some = some tea).
Ellipsis "Who took the book? Mary [did]." Omits redundant words.
Conjunctions "It rained, so we stayed home." Links clauses (so, because, however).
Lexical Cohesion "The cat purred. The animal slept." Synonyms/repetition tie ideas together.
Key Point:
Cohesion is explicit—it relies on visible linguistic markers that create formal connections between
sentences.
Definition:
Coherence refers to the underlying logical consistency and meaningful organization of ideas in a
text, making it understandable to the reader.
Elements of Coherence
Logical Order: "First, boil water. Next, add tea leaves." (ideas sequenced chronologically)
Key Point:
Coherence is implicit—it depends on the reader’s ability to infer connections, even if explicit
markers (like conjunctions) are missing.
3. Key Differences
o "The sun is hot. Coffee is bitter. She runs fast." (Grammatically linked but illogical).
Best Practices:
5. Applications in NLP
Machine Translation: Must preserve cohesion (pronoun resolution) and coherence (logical
order).
Text Summarization: Cohesive chains (entity repetition) help retain key ideas.
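As a rough illustration of cohesive chains for summarization, the toy sketch below groups sentences that repeat the same content word; the stopword list and example text are assumptions, and real systems also exploit synonymy (e.g., cat/animal via WordNet).

```python
# Naive lexical-cohesion sketch: content words repeated across sentences
# form simple "cohesive chains".
from collections import defaultdict

text = ["The cat purred.", "The animal slept.", "The cat woke at dawn."]
stopwords = {"the", "a", "an", "at"}

chains = defaultdict(list)
for i, sentence in enumerate(text):
    for word in sentence.lower().rstrip(".").split():
        if word not in stopwords:
            chains[word].append(i)

# Words appearing in more than one sentence form a chain.
print({w: s for w, s in chains.items() if len(s) > 1})   # {'cat': [0, 2]}
```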
Conclusion
Perplexity is a key metric used to evaluate how well a language model predicts a sequence of
words. It measures the uncertainty or "confusion" of the model when assigning probabilities to
unseen text. Lower perplexity indicates better performance.
1. Mathematical Definition
Perplexity = exp( −(1/N) Σ_{i=1..N} log P(w_i | w_{<i}) )
where:
P(w_i | w_{<i}) = probability assigned by the model to the i-th word given its context.
If the model assigns high probability to the correct next word, perplexity decreases.
If the model is uncertain (low probability for correct words), perplexity increases.
Example:
Used to compare:
o A model may achieve low perplexity but generate nonsensical text (e.g., GPT-2
memorizing training data).
2. Corpus-Dependent
o Perplexity values are only meaningful when comparing models on the same test set.
3. Practical Example
Perplexity Calculation:
Cross-entropy = (0.916 + 1.204 + 0.511 + 0.693 + 1.609) / 5 = 0.9866
Perplexity = e^0.9866 ≈ 2.68
This means the model is roughly as uncertain as choosing between ~2.7 words at each step.
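The arithmetic above can be checked with a few lines of Python; the five per-token negative log-probabilities are taken directly from the worked example.

```python
# Perplexity = exp(mean negative log-probability per token).
import math

neg_log_probs = [0.916, 1.204, 0.511, 0.693, 1.609]
cross_entropy = sum(neg_log_probs) / len(neg_log_probs)
perplexity = math.exp(cross_entropy)
print(round(cross_entropy, 4), round(perplexity, 2))   # 0.9866 2.68
```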
N-gram Models: Typically have much higher perplexity (e.g., 100+ for trigram models).
6. Key Takeaways
3. Useful for comparing models but doesn’t guarantee good text generation.
Problem:
N-gram models assign zero probability to unseen N-grams (due to data sparsity).
Solution:
Smoothing redistributes probability mass to unseen events while discounting seen ones.
Formula:
P(w_i | w_{i-1}) = (Count(w_{i-1}, w_i) + 1) / (Count(w_{i-1}) + V)
Pros: Simple.
Example:
If "cat sat" occurred 15 times and "cat" occurred 50 times, with V=1000V=1000:
PLaplace("sat"∣"cat")=15+150+1000≈0.015PLaplace("sat"∣"cat")=50+100015+1≈0.015
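A minimal sketch of add-one smoothing using the same counts (15, 50, V = 1000):

```python
# Add-one (Laplace) smoothed bigram probability.
def laplace_bigram(count_bigram, count_prev, vocab_size):
    return (count_bigram + 1) / (count_prev + vocab_size)

print(laplace_bigram(15, 50, 1000))   # (15 + 1) / (50 + 1000) ≈ 0.0152
print(laplace_bigram(0, 50, 1000))    # an unseen bigram still gets 1/1050 ≈ 0.00095
```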
Formula:
P(w_i | w_{i-1}) = (Count(w_{i-1}, w_i) + k) / (Count(w_{i-1}) + k·V)
Pros: More flexible than Laplace.
Idea: Estimate the probability of unseen N-grams using the frequency of N-grams seen
once ("singletons").
Formula:
P_GT(unseen) = N_1 / N
o N_1 = number of N-grams seen exactly once (singletons).
o N = total number of N-grams.
Example:
P_GT(unseen) = 10 / 1000 = 0.01 (with N_1 = 10 singletons out of N = 1000 N-grams).
Idea: Combine higher-order and lower-order N-grams (e.g., trigram + bigram + unigram).
Formula:
P(w_i | w_{i-2}, w_{i-1}) = λ_1·P_trigram + λ_2·P_bigram + λ_3·P_unigram
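A small sketch of linear interpolation; the lambda weights below are illustrative assumptions (in practice they are tuned on held-out data and must sum to 1).

```python
# Linear interpolation of trigram, bigram, and unigram estimates.
def interpolated(p_trigram, p_bigram, p_unigram, lambdas=(0.6, 0.3, 0.1)):
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9    # weights must sum to 1
    return l1 * p_trigram + l2 * p_bigram + l3 * p_unigram

print(interpolated(p_trigram=0.0, p_bigram=0.2, p_unigram=0.01))
# 0.6*0 + 0.3*0.2 + 0.1*0.01 = 0.061  (an unseen trigram still gets probability mass)
```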
Key Idea:
Formula:
P_KN(w_i | w_{i-1}) = max(Count(w_{i-1}, w_i) − D, 0) / Count(w_{i-1}) + λ(w_{i-1}) · P_continuation(w_i)
o P_continuation(w_i) = |{w_{i-1} : Count(w_{i-1}, w_i) > 0}| / |all bigram types|
Pros: Handles rare words better (e.g., "San Francisco" vs. "Francisco" alone).
4. Practical Example
Training Corpus:
o "sat" appeared in 3 distinct contexts (e.g., "cat sat", "dog sat", "bird sat").
o Discount D = 0.75.
Calculation:
1. Discounted bigram estimate:
max(15 − 0.75, 0) / 50 = 14.25 / 50 = 0.285
2. Continuation probability:
P_continuation("sat") = 3 / 1000 = 0.003
3. Interpolation weight:
λ("cat") = 0.75 × |{w : Count("cat", w) > 0}| / 50
4. Final probability:
P_KN("sat" | "cat") = 0.285 + 0.075 × 0.003 ≈ 0.285
5. Which One to Use?
Summary
Smoothing ensures N-gram models generalize to unseen data. Kneser-Ney is the gold standard, but
simpler methods (Laplace, Good-Turing) are useful for understanding the fundamentals.
Natural Language Generation (NLG) converts structured data into human-readable text. Below is
a block diagram and explanation of its key components:
[Block diagram of the NLG pipeline: structured data flows through the components explained below to produce text.]
Key Components Explained
Methods:
Methods:
Sub-components:
Final Output:
4. Realization:
Architecture:
Key Challenges
Summary
Neural models (GPT, T5) merge these steps into one end-to-end system.
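For concreteness, here is a minimal, hand-rolled pipeline sketch (content selection, then document planning, then surface realization). The record fields and templates are illustrative assumptions, not the output of any particular NLG toolkit.

```python
# Toy three-stage NLG pipeline: content selection -> planning -> realization.
record = {"region": "Europe", "metric": "sales", "quarter": "Q3", "change": +12}

def select_content(rec):
    # Content determination: keep only the facts worth reporting.
    return {k: rec[k] for k in ("metric", "quarter", "change", "region")}

def plan_document(facts):
    # Document planning: order the messages.
    return [("trend", facts), ("cause", facts)]

def realize(plan):
    # Surface realization: map each message to a sentence template.
    templates = {
        "trend": "{quarter} {metric} {direction} by {amount}%.",
        "cause": "The change was driven by demand in {region}.",
    }
    sentences = []
    for msg_type, f in plan:
        direction = "grew" if f["change"] >= 0 else "fell"
        sentences.append(templates[msg_type].format(
            direction=direction, amount=abs(f["change"]), **f))
    return " ".join(sentences)

print(realize(plan_document(select_content(record))))
# Q3 sales grew by 12%. The change was driven by demand in Europe.
```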
Would you like a code example for a simple NLG pipeline? Let me know!
Compare Statistical Machine Translation (SMT) vs. Neural Machine Translation (NMT)
Here’s a detailed comparison between Statistical Machine Translation (SMT) and Neural Machine
Translation (NMT):
Key Components: SMT uses phrase tables, a language model, and a reordering model; NMT uses an encoder-decoder architecture with an attention mechanism.
Handling Ambiguity: SMT uses context from n-gram language models (limited); NMT captures long-range dependencies via attention (better context).
Fluency & Coherence: SMT often produces rigid, less fluent output due to phrase stitching; NMT generates more natural, human-like translations.
Out-of-Vocabulary (OOV): SMT struggles with rare/unknown words (requires backoff methods); NMT handles OOV better via subword units (BPE, WordPiece).
Domain Adaptation: SMT needs manual feature engineering for new domains; NMT adapts better with fine-tuning (transfer learning).
Performance: SMT is good for short, simple sentences but struggles with long ones; NMT is superior for complex sentences and long-range dependencies.
1. Translation Approach
SMT:
NMT:
2. Context Handling
3. Output Quality
SMT:
NMT:
SMT: Looks up "the cat" → "le chat", "sat on" → "assis sur", etc., and combines them.
NMT: Encodes the entire sentence, then generates French words sequentially: "Le chat s'est
assis sur le tapis."
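A toy sketch of the SMT-style phrase stitching described above; the tiny phrase table is a hand-made assumption, whereas real SMT systems combine millions of scored phrase pairs with reordering and language models.

```python
# SMT-style phrase stitching: greedy lookup of 2-word phrases, no global context.
phrase_table = {
    "the cat": "le chat",
    "sat on": "s'est assis sur",
    "the mat": "le tapis",
}

def smt_translate(sentence):
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        chunk = " ".join(words[i:i + 2])          # greedy 2-word phrase lookup
        if chunk in phrase_table:
            output.append(phrase_table[chunk])
            i += 2
        else:
            output.append(words[i])               # fall back to copying the word
            i += 1
    return " ".join(output)

print(smt_translate("The cat sat on the mat"))
# le chat s'est assis sur le tapis  (stitched phrase by phrase)
```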
SMT:
NMT:
Conclusion
SMT was dominant pre-2014 (e.g., Google Translate used it until 2016).
NMT (post-2016) now dominates due to better fluency and context handling.
Hybrid models (SMT + NMT) are sometimes used for low-resource languages.
Would you like a deep dive into Transformer-based NMT? Let me know!
What are the challenges in Machine Translation and how can they be solved?
Machine Translation (MT) faces several challenges due to linguistic complexity, cultural differences,
and technical limitations. Below are the key challenges and potential solutions:
1. Ambiguity in Language
Problem:
Solutions:
Contextual Embeddings (BERT, GPT) to capture word sense from surrounding text.
Problem:
Literal translation fails for idioms (e.g., "Kick the bucket" → "Die").
Solutions:
Parallel Corpora with Idioms (train on phrase pairs like "It’s raining cats and dogs" → "Il
pleut des cordes").
3. Low-Resource Languages
Problem:
Solutions:
Multilingual Models (e.g., mBERT, NLLB) that share parameters across languages.
Problem:
Languages have different syntax (e.g., English: Subject-Verb-Object vs. Japanese: Subject-
Object-Verb).
Solutions:
Problem:
Solutions:
6. Domain Adaptation
Problem:
Solutions:
Problem:
Solutions:
8. Evaluation Challenges
Problem:
Solutions:
Key Takeaways
Zero-Shot Translation: Translate between language pairs never seen during training.
Conclusion: While NMT (e.g., Google Translate, DeepL) has improved translation quality, challenges
like ambiguity, bias, and low-resource languages persist. Combining neural models with linguistic
rules and human-in-the-loop validation offers the best path forward.
Would you like a case study on how Google Translate handles these challenges? Let me know!
Natural Language Generation (NLG) transforms structured data into human-like text, enabling
automation and enhanced communication across industries. Below are key real-world applications:
Examples:
o Dashboards: Convert KPIs into narratives (e.g., "Q3 sales grew by 12% due to
demand in Europe").
Tools:
Examples:
o Chatbots: Suggest products in natural language (e.g., "You might like this red dress
based on your browsing history").
Tools:
3. Healthcare
Use Case: Clinical documentation, patient summaries.
Examples:
o Radiology Reports: Convert imaging data into text (e.g., "The X-ray shows a 2cm
nodule in the left lung").
Tools:
Examples:
o Associated Press (AP): Uses NLG to publish thousands of earnings reports quarterly.
o Weather.com: Converts data into forecasts (e.g., "Expect rain after 3 PM with 80%
humidity").
Tools:
Examples:
Tools:
Examples:
o Automated Grading: Provides essay feedback (e.g., "Your argument needs more
evidence").
Tools:
o OpenAI’s Codex: Explains programming errors in natural language.
Examples:
Tools:
Examples:
Examples:
Challenges
Future Trends
Multimodal NLG: Combine text with images/video (e.g., AI-generated video scripts).
Conclusion: NLG is revolutionizing industries by automating text generation, but human oversight
remains critical for quality. From healthcare to gaming, its applications are expanding with advances
in AI like GPT-4 and Google’s PaLM.
Want a deep dive into NLG for a specific industry? Ask away!
What are the problems in translating Indian languages? Include Hindi and Telugu.
Translating between Indian languages (e.g., Hindi to Telugu or vice versa) poses unique difficulties
due to linguistic diversity, script variations, and cultural nuances. Below are the key challenges and
potential solutions:
Problems:
o Hindi follows Subject-Object-Verb (SOV) order ("मैं सेब खाता हूँ " → "I apple eat").
o Telugu also uses SOV ("నేను ఆపిల్ తింటాను"), but other Dravidian languages like
Tamil may differ.
o Hindi verbs change based on gender ("खाता हूँ " for male vs. "खाती हूँ " for female).
o Telugu, by contrast, does not mark gender on first-person verbs ("తింటాను" is used by any speaker), so gender distinctions made in Hindi can be lost or must be recovered from context.
Lack of Standardization:
Solutions:
Neural Machine Translation (NMT) models with attention mechanisms (e.g., Google’s
IndicTrans).
Problems:
Different Scripts:
o Hindi: Devanagari (अक्षर)
o Telugu: Telugu script
o Hindi often drops the schwa sound (e.g., "राम" is pronounced "Ram" but written
with an inherent 'a').
Solutions:
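One candidate solution is script conversion (transliteration). The rough sketch below exploits the parallel ISCII-based layout of the Indic Unicode blocks to map Devanagari to Telugu by a fixed codepoint offset; this is only approximate, and real transliteration tools handle many exceptions (schwa deletion, nukta characters, script-specific signs).

```python
# Approximate Devanagari -> Telugu script conversion via codepoint offset.
# Unicode lays out Indic blocks in parallel, so most letters line up.
DEVANAGARI_START, DEVANAGARI_END, TELUGU_START = 0x0900, 0x097F, 0x0C00
OFFSET = TELUGU_START - DEVANAGARI_START

def devanagari_to_telugu(text):
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            out.append(chr(cp + OFFSET))
        else:
            out.append(ch)            # leave non-Devanagari characters alone
    return "".join(out)

print(devanagari_to_telugu("राम"))    # -> రామ (approximate; no schwa handling)
```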
Problems:
Domain-Specific Gaps:
Solutions:
4. Morphological Richness
Problems:
Agglutination in Telugu:
Solutions:
Problems:
Untranslatable Phrases:
o Hindi: "जान छु ड़ाना" (to save one’s life) has no direct Telugu equivalent.
Religious/Cultural Terms:
Solutions:
Human-in-the-loop post-editing.
Problems:
Problems:
Solutions:
Future Directions
Conclusion: While Hindi-Telugu translation is improving with NMT, challenges like data scarcity,
script differences, and idioms persist. Collaborative efforts between linguists and AI researchers are
key to progress.
Machine Translation (MT) systems can be broadly categorized into Rule-Based (RBMT) and Corpus-
Based (CBMT) approaches. Here's a detailed comparison:
Definition:
Uses linguistic rules (grammar, syntax, semantics) and bilingual dictionaries to translate text.
How It Works:
Types of RBMT:
Direct MT: Word-by-word substitution (e.g., early systems like SYSTRAN).
Pros:
Cons:
Example:
RBMT Steps:
Definition:
Relies on statistical patterns or neural networks learned from large bilingual text corpora.
Subtypes:
1. Statistical MT (SMT):
2. Neural MT (NMT):
How It Works:
SMT:
Pros:
Cons:
Example:
SMT/NMT Steps:
Key Differences
Hybrid Approaches
NLG Architecture
Adapted from the RAGS reference architecture (Mellish et al., 2004) and other seminal works
All NLG systems perform three core tasks, with increasing specificity:
Surface Realizer: generates grammatically correct sentences from the plan (e.g., KUNET).
Example of Specificity:
Vague: "He danced with Jaya and she got angry."
Example:
o Input: Data → Discourse Plan → Text Plan → "He danced with Jaya and she got
angry."
2. Realization (Surface).
Example: GLINDA.
3. Key Differences
Pipelined: Struggles with backtracking (e.g., realizing a sentence may reveal planning errors).
Interleaved/Integrated: Better for nuanced texts but require more computational power.
Pipelined: Surface realizer cannot revise the text planner’s choice of "danced" vs. "forced."
5. Practical Applications
Summary
Next Steps: Explore how modern neural NLG (GPT-3, T5) blends these architectures.