NLP Mod 1 SEE
Q1) Differences Between NLU (Natural Language Understanding) and NLG (Natural Language
Generation). (IAT)
Error Impact: In NLU, misunderstanding user input leads to incorrect responses; in NLG, poorly generated language affects user trust and readability.
Q2) List the generations of NLP and the advantages and disadvantages of NLP. (IAT)
Advantages of NLP:
1. Enhanced Communication - Bridges human-computer interaction using natural
language.
2. Automation - Reduces manual effort by automating tasks like sentiment analysis and
translation.
3. Speed and Efficiency - Processes vast amounts of text data quickly and accurately.
4. Real-time Assistance - Enables applications like chatbots for immediate query
resolution.
5. Personalization - Customizes recommendations based on user behavior and language.
6. Multi-language Support - Supports diverse languages for global applicability.
7. Data Extraction - Extracts relevant information from unstructured text.
8. Cost-Effective - Lowers operational costs with automation and data insights.
Disadvantages of NLP:
1. Complexity - Requires sophisticated algorithms and extensive training data.
2. Ambiguity - Struggles with uncertainty and contextual understanding in sentences.
3. Bias Risks - Inherits biases from training datasets.
4. Dependency on Data - Performance depends heavily on the quality of training data.
5. Error Propagation - Mistakes in early stages can affect subsequent analysis.
6. Cultural Sensitivity - May misinterpret culturally specific references and expressions.
7. Computational Cost - High processing power requirements for large-scale models.
2. Sarcasm and Irony: Statements often mean the opposite of their literal wording.
Example: "Oh great, another traffic jam!" implies frustration, not happiness.
5. Synonyms: Different words can share a similar meaning yet differ in precise usage.
Example: "Big" and "large" both mean sizable but are used differently in some contexts.
6. Grammar Variability: Different sentence structures can convey the same idea.
Example: "She read the book" vs. "The book was read by her."
7. Idiomatic Expressions: Phrases whose meanings cannot be derived from the literal meanings of their words.
Example: "Spill the tea" means to reveal a secret, not literally spilling tea.
9. Out-of-Vocabulary Words: New or rare words may not exist in training data.
Example: Words like "NFT" or "Metaverse" might not be recognized initially.
11. Domain-Specific Knowledge: Jargon varies greatly across fields like medicine or law.
Example: "BP" in medicine refers to blood pressure, but in finance, it means basis points.
12. Handling Multilingual Texts: Different languages have unique syntax and semantics.
Example: In Hindi, verbs change forms based on gender, making translation complex.
1. Sentence Segmentation:
a. Breaks the text into individual sentences.
b. Example:
i. Input: "Independence Day is one of the important festivals for every Indian
citizen. It is celebrated on the 15th of August each year."
ii. Output:
1. "Independence Day is one of the important festivals for every Indian
citizen."
2. "It is celebrated on the 15th of August each year."
2. Word Tokenization:
a. Divides sentences into individual words or tokens.
b. Example:
i. Input: "JavaTpoint offers Corporate Training, Summer Training, Online
Training, and Winter Training."
ii. Output: ["JavaTpoint", "offers", "Corporate", "Training", "Summer",
"Training", "Online", "Training", "and", "Winter", "Training", "."]
3. Stemming:
a. Reduces words to their root form, though the root may not be a meaningful word.
b. Example:
i. Input: "celebrates", "celebrated", "celebrating"
ii. Output: "celebr", "celebr", "celebr"
4. Lemmatization:
a. Converts words to their base form (lemma), which is a meaningful word.
b. Example:
i. Input: "intelligence", "intelligent", "intelligently"
ii. Output: "intelligent", "intelligent", "intelligent"
5. Identifying Stop Words:
a. Filters out common words that add little value to the analysis (e.g., "is", "and",
"the").
b. Example:
i. Input: "He is a good boy."
ii. Output (after removing stop words): "good", "boy"
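Stop-word removal is a simple set-membership filter; the stop-word set below is a tiny illustrative sample of the lists that libraries ship:

```python
# Illustrative subset; real stop-word lists contain a few hundred entries.
STOP_WORDS = {"he", "she", "is", "a", "an", "the", "and", "it", "of"}

def remove_stop_words(sentence):
    tokens = sentence.lower().rstrip(".").split()
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("He is a good boy."))  # -> ['good', 'boy']
```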
6. Dependency Parsing:
a. Determines how words in a sentence are related to each other grammatically.
b. Example:
i. Sentence: "She eats an apple."
ii. Output: "eats" is the root verb, with "She" as its subject (nsubj) and "apple" as its object (obj); "an" is the determiner of "apple".
7. POS Tagging:
a. Assigns parts of speech (noun, verb, adjective, etc.) to each word in a sentence.
b. Example:
i. Sentence: "Google is a tech company."
ii. Output: Google (NNP), is (VBZ), a (DT), tech (JJ), company (NN)
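A toy lexicon-based tagger sketches the input/output shape; real taggers (HMM or neural) use sentence context rather than a fixed word list, and the lexicon here is illustrative:

```python
# Toy word-to-tag lexicon; real taggers disambiguate using context.
LEXICON = {
    "google": "NNP", "is": "VBZ", "a": "DT",
    "tech": "JJ", "company": "NN",
}

def pos_tag(tokens):
    # Unknown words default to NN (noun), a common fallback heuristic.
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]

print(pos_tag(["Google", "is", "a", "tech", "company"]))
```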
8. Named Entity Recognition (NER):
a. Identifies and classifies named entities like people, organizations, or locations.
b. Example:
i. Sentence: "Steve Jobs introduced iPhone at the Macworld Conference in San
Francisco, California."
ii. Output: Steve Jobs (Person), iPhone (Product), Macworld Conference
(Event), San Francisco (Location), California (Location)
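A gazetteer (lookup-list) sketch shows the simplest form of NER; statistical models are needed to detect entities that are not on a known list, and the entity table below is illustrative:

```python
# Toy gazetteer mapping known names to entity types.
GAZETTEER = {
    "Steve Jobs": "Person", "iPhone": "Product",
    "Macworld Conference": "Event",
    "San Francisco": "Location", "California": "Location",
}

def find_entities(text):
    # Scan for known names, longest first, so "Macworld Conference"
    # is not shadowed by any shorter overlapping name.
    found = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        if name in text:
            found.append((name, GAZETTEER[name]))
    return found

sentence = ("Steve Jobs introduced iPhone at the Macworld Conference "
            "in San Francisco, California.")
print(find_entities(sentence))
```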
9. Chunking:
a. Groups tokens into chunks based on their syntactic roles, such as noun phrases or
verb phrases.
b. Example:
i. Sentence: "The quick brown fox jumps over the lazy dog."
ii. Output: [The quick brown fox] (NP), [jumps over] (VP), [the lazy dog] (NP)
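A minimal noun-phrase chunker can group maximal runs of determiner/adjective/noun tags over POS-tagged input; this toy grammar is a sketch of what tools like NLTK's RegexpParser generalize:

```python
def chunk_noun_phrases(tagged):
    # Collect maximal runs of DT/JJ/NN* tags into NP chunks;
    # any other tag ends the current chunk.
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("DT", "JJ") or tag.startswith("NN"):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"),
          ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
print(chunk_noun_phrases(tagged))  # -> ['The quick brown fox', 'the lazy dog']
```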