Chapter - 1
Applications of NLP :-
1. Text Processing: Tokenization, stopword removal, stemming,
lemmatization.
2. Speech Recognition: Converts spoken language into text (e.g., Siri,
Google Assistant).
3. Machine Translation: Automated language translation (e.g., Google
Translate).
4. Sentiment Analysis: Determines the emotional tone of text (e.g.,
product reviews, social media monitoring).
5. Chatbots and Virtual Assistants: AI-driven conversational agents
(e.g., ChatGPT, Alexa).
6. Information Retrieval: Search engines and document indexing
(e.g., Google Search).
Challenges of NLP :-
1. Ambiguity: Words and sentences may have multiple
interpretations.
2. Context Understanding: Difficulty in grasping contextual and
idiomatic expressions.
3. Resource Scarcity: Limited labeled datasets for
underrepresented languages.
4. Scalability: Handling large-scale real-time processing
efficiently.
Key Processes:
3. Semantic Analysis
Objective: Extract meaning from text by understanding relationships
between words and phrases.
Key Processes:
4. Discourse Integration
Key Processes:
5. Pragmatic Analysis
Key Processes:
1. Tokenization
Types of Tokenization:
o Example: "I love NLP. It's amazing!" → ["I love NLP.", "It's amazing!"]
Importance:
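To make the sentence tokenization example above concrete, here is a minimal sketch using only Python's standard `re` module. The function names `sentence_tokenize` and `word_tokenize` are illustrative, not taken from any library, and the rules are deliberately simple: a split after ".", "!" or "?" will mis-handle abbreviations such as "Dr.".

```python
import re

def sentence_tokenize(text):
    # Split wherever whitespace follows sentence-ending punctuation.
    return re.split(r"(?<=[.!?])\s+", text.strip())

def word_tokenize(text):
    # Words (keeping simple contractions like "It's" together) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

text = "I love NLP. It's amazing!"
sentences = sentence_tokenize(text)
words = word_tokenize(text)
print(sentences)
print(words)
```

Production tokenizers handle many more cases (abbreviations, URLs, numbers), so this should be read as an illustration of the idea rather than a complete tokenizer.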
Stemming:
• Example:
o "running" → "run"
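The "running" → "run" example can be reproduced with a toy suffix-stripping stemmer. This is a rough sketch, not the Porter algorithm or any library's stemmer; the function name `toy_stem` and its rules (strip a common suffix, then undouble a final consonant) are assumptions chosen only for illustration.

```python
def toy_stem(word):
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undouble a final consonant: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

for w in ["running", "jumped", "cats"]:
    print(w, "->", toy_stem(w))
```

Because stemming works by rule rather than by dictionary lookup, the result is not guaranteed to be a real word, and simple rules like these will produce wrong stems for some inputs.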
Lemmatization:
• Example:
Importance:
Example:
Importance:
Example:
PERSON
Category Example
"$100", "₹5000"
Importance:
Types of Checks:
Techniques Used:
Importance:
Example in NLP:
Class        Probability
Sports       0.2
Politics     0.7
Technology   0.1
argmax([0.2,0.7,0.1]) = 1
Since Politics (index 1) has the highest probability (0.7), the model
predicts this category.
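The argmax step above can be sketched in a few lines of plain Python. The helper `argmax` is written here for illustration (libraries such as NumPy offer an equivalent), and the class list and probabilities are the ones from the table.

```python
classes = ["Sports", "Politics", "Technology"]
probs = [0.2, 0.7, 0.1]

def argmax(values):
    # Index of the largest value (the first one in case of ties).
    return max(range(len(values)), key=lambda i: values[i])

best = argmax(probs)
predicted = classes[best]
print(best, predicted)
```

Note that argmax returns the position of the highest probability, not the probability itself; the class label is then looked up by that index.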
• Part-of-Speech (POS) Tagging: Selecting the most likely POS tag for a
word.
Examples in NLP:
Neutral.
2. Topic Classification:
Example:
Class        Probability
Sports       0.3
Politics     0.5
Technology   0.2
Applying argmax:
argmax([0.3,0.5,0.2]) = 1
Since Politics (index 1) has the highest probability (0.5), the model
predicts this category.
Word      Doc 1   Doc 2
NLP       1       1
is        1       0
amazing   1       0
and       0       1
AI        0       1
are       0       1
related   0       1
• Doc 1 → [1, 1, 1, 0, 0, 0, 0]
• Doc 2 → [1, 0, 0, 1, 1, 1, 1]
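The two vectors above can be reproduced with a short bag-of-words sketch. The document texts are an assumption: they are inferred from the word counts ("NLP is amazing" and "NLP and AI are related") and are not quoted from the source. The vocabulary is built in order of first appearance so that the positions match the vectors shown.

```python
# Assumed documents, inferred from the word counts above.
docs = ["NLP is amazing", "NLP and AI are related"]

# Build the vocabulary in order of first appearance.
vocab = []
for doc in docs:
    for token in doc.split():
        if token not in vocab:
            vocab.append(token)

# One count per vocabulary word for each document.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]

print(vocab)
for i, vec in enumerate(vectors, start=1):
    print(f"Doc {i} ->", vec)
```

Each vector has one slot per vocabulary word, so word order within a document is discarded.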
Advantages of BoW:
Term   TF (Doc 1)   TF (Doc 2)   TF (Doc 3)
AI     0            1/5          1/5
Term   Documents containing term   Calculation   IDF Score
NLP    3                           log(3/3)      0
is     2                           log(3/2)      0.18
AI     2                           log(3/2)      0.18
Step 3: Compute TF-IDF Score
For Doc 1:
Term   TF    IDF   TF-IDF
NLP    1/3   0     0
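The TF, IDF and TF-IDF figures above can be checked with a small sketch. Several things here are assumptions: Doc 1 and Doc 2 are inferred from the bag-of-words example, and Doc 3 is a hypothetical five-word document chosen only so that the document frequencies match the IDF table (NLP: 3, is: 2, AI: 2). The logarithm is taken as base 10, since that is what gives log(3/2) ≈ 0.18.

```python
import math

# Doc 1 and Doc 2 are inferred; Doc 3 is hypothetical (see the note above).
docs = [
    "NLP is amazing",
    "NLP and AI are related",
    "AI is changing NLP research",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

def tf(term, tokens):
    # Term frequency: occurrences of the term divided by document length.
    return tokens.count(term) / len(tokens)

def idf(term):
    # Inverse document frequency with a base-10 logarithm.
    df = sum(1 for tokens in tokenized if term in tokens)
    return math.log10(n_docs / df)

def tf_idf(term, tokens):
    return tf(term, tokens) * idf(term)

for term in ["NLP", "is", "AI"]:
    print(term, round(idf(term), 2), round(tf_idf(term, tokenized[0]), 3))
```

Because "NLP" occurs in every document, its IDF is 0 and so its TF-IDF score is 0 in all of them, which is how the weighting suppresses terms that do not distinguish one document from another.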
Advantages of TF-IDF:
✅ Reduces the importance of common words
✅ Highlights rare but meaningful words
✅ Improves information retrieval (e.g., search engines such as Google)
Limitations of TF-IDF:
❌ Does not capture word meaning (e.g., synonyms "big" and "large"
are treated separately)
❌ Fails to consider word order and semantics
For example:
1. Non-compositionality:
2. Non-substitutability:
3. Non-modifiability: