
NLP – Module 3

Naïve Bayes, Text Classification and Sentiment
Prof. Mohammed Siraj B
YIT, Moodbidri
Naïve Bayes Classifier
What is Naive Bayes?
• Naive Bayes is a probabilistic machine learning algorithm used for
classification tasks.
• It is based on Bayes' Theorem, which calculates the probability of an
event based on prior knowledge.
Why "Naive"?
• It makes a "naive" assumption that all features (e.g., words in a
document) are independent of each other.
• This simplifies calculations but may not always reflect reality.
Applications:
• Spam detection (spam vs. not spam).
• Sentiment analysis (positive vs. negative reviews).
• Document categorization (e.g., sports, politics, technology).
Intuition Behind Naive Bayes
• Text as a Bag of Words:
• Imagine a document as a "bag" filled with words.
• The order of words doesn’t matter; only their frequency does.
• Example:
• "I love this movie" → {I:1, love:1, this:1, movie:1}.
• "I would recommend it" → {I:1, would:1, recommend:1, it:1}.
• Visual Example:
• Consider a movie review:
• "I love this movie! It's sweet, but with satirical humor..."
• The bag-of-words representation counts word frequencies:
• "I": 5 times, "it": 6 times, "love": 1 time, etc.
• This representation is simple but effective for many tasks.
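As a quick illustration, here is a minimal bag-of-words construction in Python (a sketch; the slides do not specify a tokenizer, so simple lowercasing and splitting on non-letter characters is assumed):

```python
from collections import Counter
import re

def bag_of_words(text):
    # Lowercase and split on anything that is not a letter or apostrophe;
    # word order is discarded, only per-word counts are kept.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

print(bag_of_words("I love this movie"))
# Counter({'i': 1, 'love': 1, 'this': 1, 'movie': 1})
```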
Bayes' Theorem
What is Bayes' Theorem?
• It describes the probability of an event based on prior knowledge.
• Formula: P(A∣B) = P(B∣A) P(A) / P(B), where P(A∣B) is the posterior, P(A) the prior, P(B∣A) the likelihood, and P(B) the evidence.
Naive Bayes Classifier
Goal:
• For a given document d, find the class c that maximizes P(c∣d).
• Formula: ĉ = argmax_c P(c∣d) = argmax_c P(d∣c) P(c) / P(d) = argmax_c P(d∣c) P(c), since the denominator P(d) is the same for every class and can be dropped.
Naive Bayes Assumptions
1.Bag-of-Words Assumption:
• Ignores word order and position.
• Only considers word frequencies.
• Example:
1. "I love this movie" and "Movie love I this" are treated the same.
2.Conditional Independence Assumption:
• Assumes words are independent given the class.
• Formula: P(w1, w2, …, wn ∣ c) = P(w1∣c) × P(w2∣c) × … × P(wn∣c).
Example: Sentiment Analysis
Why Use Naive Bayes?
• Advantages:
• Simple and Fast: Easy to implement and computationally efficient.
• Works Well with Small Datasets: Performs well even with limited training
data.
• Handles High Dimensions: Can handle large vocabularies.
• Example Use Case:
• Spam detection: Classify emails as spam or not spam based on word
frequencies.
Limitations of Naive Bayes
• Strong Independence Assumption:
• Words are not always independent in real-world text.
• Example: "New York" has a different meaning than "new" and "York"
separately.
• Zero Probability Problem:
• If a word in the test set was not seen in training, P(wi∣c)=0.
• Solved using smoothing techniques (e.g., Laplace smoothing).
• Not Ideal for Complex Relationships:
• Struggles with capturing context or word order.
Real-World Applications
• Spam Detection:
• Classify emails as spam or not spam based on word frequencies.
• Sentiment Analysis:
• Analyze product reviews to determine if they are positive or negative.
• Document Categorization:
• Organize news articles into categories like sports, politics, or technology.
Example: Spam Detection
Training the Naive Bayes Classifier
• Goal of Training:
• Learn two key probabilities:
• Class Prior (P(c)): Probability of each class.
• Word Likelihood (P(wi∣c)): Probability of a word given a class.
• Why is Training Important?
• These probabilities are used to classify new documents.
Estimating Class Prior P(c)
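• Assuming the standard maximum-likelihood estimate: P(c) = Nc / Ndoc, where Nc is the number of training documents labelled with class c and Ndoc is the total number of training documents.
• Example: with 3 negative and 2 positive reviews, P(negative) = 3/5 and P(positive) = 2/5.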
Estimating Word Likelihood P(wi∣c)
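• Assuming the usual maximum-likelihood estimate: P(wi∣c) = count(wi, c) / Σw count(w, c), where count(wi, c) is the number of times wi appears in the concatenation of all training documents of class c, and the denominator sums the counts of every vocabulary word in that class.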
The Zero Probability Problem
What is the Solution for Zero Probability
Issues?
Laplace Smoothing
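• With add-one (Laplace) smoothing, the estimate becomes: P(wi∣c) = (count(wi, c) + 1) / (Σw count(w, c) + ∣V∣), where ∣V∣ is the vocabulary size.
• Every word now receives a small non-zero probability, so a single unseen word no longer drives the whole product to zero.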
Handling Unknown Words
• Problem:
• Words in the test set that are not in the training vocabulary.
• Solution:
• Ignore unknown words in the test document.
• Do not include any probability for them.
• Example:
• Test document: "predictable with no fun."
• "with" is not in the vocabulary → remove it.
Training Algorithm
• Steps:
• Calculate P(c) for each class.
• Concatenate all documents in each class.
• Compute P(wi∣c) using Laplace smoothing.
• Return P(c) and P(wi∣c).
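The steps above can be sketched in Python as follows (a minimal illustration, assuming whitespace-tokenized documents and add-one smoothing; the function and variable names are illustrative, not from the slides):

```python
import math
from collections import Counter

def train_naive_bayes(documents, labels):
    """documents: list of token lists; labels: a parallel list of class names."""
    classes = set(labels)
    vocab = {w for doc in documents for w in doc}
    log_prior, log_likelihood = {}, {}
    for c in classes:
        docs_c = [doc for doc, y in zip(documents, labels) if y == c]
        # Class prior P(c): fraction of training documents with label c.
        log_prior[c] = math.log(len(docs_c) / len(documents))
        # Concatenate all documents of class c and count word occurrences.
        counts = Counter(w for doc in docs_c for w in doc)
        total = sum(counts.values())
        # Word likelihoods P(wi|c) with add-one (Laplace) smoothing.
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood, vocab
```

Log probabilities are used so that multiplying many small probabilities does not underflow; sums of logs rank the classes in the same order as products of probabilities.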
Worked Example of Naive Bayes
Classification
• Objective:
• Walk through a practical example of training and testing a Naive Bayes
classifier.
• Domain:
• Sentiment analysis (positive vs. negative movie reviews).
• Training Data:
• 3 negative reviews, 2 positive reviews.
• Test Document:
• "predictable with no fun."
Training Data
• Negative Reviews:
• "just plain boring"
• "entirely predictable and lacks energy"
• "no surprises and very few laughs"
• Positive Reviews:
• "very powerful"
• "the most fun film of the summer“

• Test Document:
• "predictable with no fun."
How to Solve It Step by Step
Step 1: Understand the Training Data
• Negative Reviews:
• "just plain boring"
• "entirely predictable and lacks energy"
• "no surprises and very few laughs"
• Positive Reviews:
• "very powerful"
• "the most fun film of the summer"
• Test Document:
• "predictable with no fun."
Step 2: Preprocess the Training Data
• Tokenize the Reviews:
• Negative Reviews:
• ["just", "plain", "boring"]
• ["entirely", "predictable", "and", "lacks", "energy"]
• ["no", "surprises", "and", "very", "few", "laughs"]
• Positive Reviews:
• ["very", "powerful"]
• ["the", "most", "fun", "film", "of", "the", "summer"]
• Build Vocabulary V:
• Union of all unique words in the training data:
• ["just", "plain", "boring", "entirely", "predictable", "and", "lacks", "energy", "no",
"surprises", "very", "few", "laughs", "powerful", "the", "most", "fun", "film", "of",
"summer"]
• Vocabulary size ∣V∣=20.
Step 3: Compute Class Priors P(c)
Step 4: Compute Word Likelihoods P(wi∣c)
Step 5: Preprocess the Test Document
• Test Document:
• "predictable with no fun."
• Remove Unknown Words:
• "with" is not in the vocabulary → remove it.
• Test Document After Removal:
• "predictable no fun."
Step 6: Calculate Class Probabilities
Step 7: Make the Classification Decision
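Putting Steps 1–7 together, here is a self-contained sketch of the whole calculation in Python (names are illustrative; the numbers in the comments follow from add-one smoothing with ∣V∣ = 20):

```python
import math
from collections import Counter

neg_docs = ["just plain boring",
            "entirely predictable and lacks energy",
            "no surprises and very few laughs"]
pos_docs = ["very powerful",
            "the most fun film of the summer"]

neg_words = [w for d in neg_docs for w in d.split()]   # 14 tokens
pos_words = [w for d in pos_docs for w in d.split()]   # 9 tokens
vocab = set(neg_words) | set(pos_words)                # |V| = 20

def log_score(test_tokens, class_words, prior):
    counts, total = Counter(class_words), len(class_words)
    score = math.log(prior)
    for w in test_tokens:
        if w not in vocab:          # Step 5: drop unknown words ("with")
            continue
        score += math.log((counts[w] + 1) / (total + len(vocab)))  # add-one smoothing
    return score

test = "predictable with no fun".split()
neg = log_score(test, neg_words, 3 / 5)   # 0.6 * 2/34 * 2/34 * 1/34 ≈ 6.1e-5
pos = log_score(test, pos_words, 2 / 5)   # 0.4 * 1/29 * 1/29 * 2/29 ≈ 3.3e-5
print("negative" if neg > pos else "positive")   # -> negative
```

The negative class wins, so the test review "predictable with no fun." is classified as negative.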
Handling Unknown Words
Tasks:
1.Identify the unknown words in the test document (if any) and remove
them.
2.Compute the class priors P(c) for spam and ham.
3.Compute the word likelihoods P(wi∣c) for the remaining words using
Laplace smoothing.
4.Calculate P(c∣d) for both classes and classify the test document.
Step 1: Preprocess the Training Data
Total Words in Each Class:
• Spam: 7 words.
• Ham: 5 words.
Step 2: Preprocess the Test Document
• Test Document:
• "free lunch and prize"
• Tokenize:
• ["free", "lunch", "and", "prize"]
• Identify Unknown Words:
• "and" is not in the vocabulary → remove it.
• Test Document After Removal:
• ["free", "lunch", "prize"]
Step 3: Compute Class Priors P(c)
Step 4: Compute Word Likelihoods P(wi∣c)
Step 5: Calculate Class Probabilities P(c∣d)
Step 6: Classification Decision
Worked Examples
Multinomial Naive Bayes Classifier
• The Multinomial Naive Bayes (MNB) classifier is a probabilistic
machine learning algorithm.
• Commonly used for text classification tasks, such as spam detection,
sentiment analysis, and document categorization.
• It is based on Bayes' Theorem and makes the "naive" assumption that
the features (words) are conditionally independent given the class.
Optimizing for Sentiment Analysis
• Sentiment analysis is the task of determining the sentiment
(positive, negative, or neutral) expressed in a piece of text,
such as a review, tweet, or comment.
• While the standard Multinomial Naive Bayes
(MNB) classifier works well for sentiment analysis, there
are some optimizations that can improve its performance.
• These optimizations include:
1.Binary Naive Bayes
2.Handling Negation
3.Using Sentiment Lexicons
Why Optimization is Required in Sentiment
Analysis
• While the standard Multinomial Naive Bayes (MNB) classifier
works well for many text classification tasks, sentiment analysis
has some unique challenges that require optimizations to
improve performance.
• Eg:
• Document 1 : “I like that movie”
• Document 2 : “I didn’t like that movie”
• A standard Naïve Bayes model will treat both documents as positive, because they share almost all of their words.
1. Binary Naive Bayes
What is Binary Naive Bayes?
• Binary Naive Bayes is a variant of the standard Multinomial Naive
Bayes classifier.
• Instead of using word frequencies (how many times a word appears
in a document), it focuses on word presence (whether a word
appears or not).
• This means that even if a word appears multiple times in a document,
it is treated as if it appeared only once.
Why Use Binary Naive Bayes?
• In sentiment analysis, the presence of a word (e.g., "love" or "hate")
is often more important than its frequency.
• For example, saying "I love this movie" once or multiple times still
conveys the same sentiment.
• By ignoring word frequency, binary Naive Bayes reduces noise and
improves performance for sentiment tasks.
How It Works:
• During training and testing, duplicate words in a document are
removed.
• Example:
• Original document: "great great film" → Binary representation: "great film".
Example:
Training Data:
• Positive: "great film"
• Negative: "boring movie"
Test Document:
• "great great movie"
Binary Representation:
• "great movie"
Classification:
• The classifier predicts the sentiment based on the presence of "great" and
"movie" rather than their frequency.
2. Handling Negation
What is Negation?
• Negation words (e.g., "not", "didn't", "never") can flip the sentiment
of a sentence.
• For example:
• "I like this movie" (Positive)
• "I didn't like this movie" (Negative)
Why Handle Negation?
• Negation can completely change the meaning of a sentence, so
it’s important to capture its effect in sentiment analysis.
How to Handle Negation:
• A simple approach is to prepend "NOT_" to every word after a
negation token (e.g., "not", "didn't", "never") until the next
punctuation mark.
• Example:
• Original: "I didn't like this movie, but I enjoyed the acting."
• After negation handling: "I didn't NOT_like NOT_this NOT_movie, but I
enjoyed the acting."
Effect:
• Words like "NOT_like" and "NOT_movie" act as cues for negative
sentiment.
• Words like "NOT_bored" and "NOT_dismiss" act as cues for
positive sentiment.
Example:
• Training Data:
• Positive: "I enjoyed the movie"
• Negative: "I didn't enjoy the movie"
• Test Document:
• "I didn't like the acting"
• After Negation Handling:
• "I didn't NOT_like NOT_the NOT_acting"
• Classification:
• The classifier recognizes "NOT_like" and "NOT_acting" as negative cues.
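A rough sketch of this negation rule in Python (the list of negation tokens here is a small illustrative subset, and ',', '.', '!' and '?' are assumed to end the negated span):

```python
import re

NEGATION_TOKENS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't"}

def mark_negation(text):
    out, negating = [], False
    # Split the text into word tokens and punctuation marks.
    for tok in re.findall(r"[\w']+|[.,!?]", text):
        if tok in ".,!?":
            negating = False              # punctuation ends the negated span
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)      # mark words that follow a negation
        else:
            out.append(tok)
            if tok.lower() in NEGATION_TOKENS:
                negating = True
    return " ".join(out)

print(mark_negation("I didn't like this movie, but I enjoyed the acting."))
# I didn't NOT_like NOT_this NOT_movie , but I enjoyed the acting .
```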
3. Using Sentiment Lexicons
What are Sentiment Lexicons?
• Sentiment lexicons are pre-annotated lists of
words marked with positive or negative sentiment.
• Examples of popular lexicons:
• MPQA Subjectivity Lexicon: Contains 6,885 words marked as
strongly/weakly positive or negative.
• General Inquirer, LIWC, Hu and Liu Opinion Lexicon.
Why Use Sentiment Lexicons?
• When labeled training data is limited, sentiment lexicons
provide a reliable way to identify positive and negative
words.
• They help generalize better when the test data differs from
the training data.
How to Use Sentiment Lexicons:
• Add features like:
• "This word occurs in the positive lexicon."
• "This word occurs in the negative lexicon."
• Instead of counting individual words, count occurrences of
lexicon-based features.
Example:
Training Data:
• Positive: "great film"
• Negative: "awful movie"
Lexicon Features:
• Positive Lexicon: "great" → Count for positive feature.
• Negative Lexicon: "awful" → Count for negative feature.
Test Document:
• "great acting"
Classification:
• The classifier uses the lexicon features to predict sentiment.
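A minimal sketch of lexicon-based feature counting (the two word sets below are tiny illustrative stand-ins, not the real MPQA or Hu-and-Liu lexicons):

```python
POSITIVE_LEXICON = {"great", "powerful", "fun", "enjoyed"}   # illustrative only
NEGATIVE_LEXICON = {"awful", "boring", "predictable"}        # illustrative only

def lexicon_features(tokens):
    # Instead of one feature per word, count how many tokens hit each lexicon.
    return {
        "pos_lexicon_count": sum(t in POSITIVE_LEXICON for t in tokens),
        "neg_lexicon_count": sum(t in NEGATIVE_LEXICON for t in tokens),
    }

print(lexicon_features("great acting".split()))
# {'pos_lexicon_count': 1, 'neg_lexicon_count': 0}
```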
Summary of Optimizations
1.Binary Naive Bayes:
1. Focuses on word presence rather than frequency.
2. Reduces noise and improves performance for sentiment analysis.
2.Handling Negation:
1. Modifies words after negation tokens to capture sentiment changes.
2. Ensures that negation flips are properly represented.
3.Sentiment Lexicons:
1. Uses pre-annotated word lists to identify positive and negative words.
2. Provides robust features when training data is limited.
Why These Optimizations Matter
• Improved Accuracy:
• Binary Naive Bayes reduces noise from word frequency.
• Negation handling captures subtle sentiment changes.
• Sentiment lexicons provide reliable features for sparse data.
• Real-World Applications:
• Sentiment analysis for product reviews, social media, and customer
feedback.
Naive Bayes for Other Text Classification
Tasks
• Naive Bayes is a powerful and flexible algorithm that can be used for
many text classification tasks beyond sentiment analysis.
• In this section, we’ll explore how Naive Bayes can be adapted for two
important tasks:
• Spam detection and
• Language identification (language ID).
• We’ll also discuss how custom features can make the classifier more
effective for these tasks.
1. Spam Detection
• What is Spam Detection?
• Spam detection is about identifying unwanted emails (spam)
from legitimate ones (ham).
• It’s like teaching a computer to recognize junk mail so it can filter it
out of your inbox.
• Why Custom Features are Needed:
• Spam emails often use tricky language or specific patterns to
trick people.
• For example, spam emails might say things like “You’ve won
$1,000,000!” or “Click here for a free prize!”
• Using all words as features might not work well because spam
emails can use normal-sounding words to hide their true nature.
Custom Features for Spam Detection:
• Phrases and Patterns:
• Look for specific phrases like “one hundred percent guaranteed” or
“urgent reply.”
• Use regular expressions to match patterns like “mentions millions of
dollars” or “online pharmaceutical.”
• Non-Linguistic Features:
• Check if the email subject is in ALL CAPS.
• Look for suspicious HTML code, like unbalanced “head” tags.
• Analyze the email’s metadata (e.g., where it came from).
Example: SpamAssassin Features
• Phrases:
• “one hundred percent guaranteed”
• “urgent reply”
• Patterns:
• Matches large sums of money (e.g., “$1,000,000”).
• Non-Linguistic Features:
• Email subject line is all capital letters.
• HTML has unbalanced “head” tags.
• Claims you can be removed from the list.
How Naive Bayes Works for Spam
Detection:
• Training:
• The classifier learns from labeled emails (spam vs. ham) using custom
features like phrases, patterns, and non-linguistic features.
• Testing:
• For a new email, the classifier checks for these custom features.
• Classification:
• If the email has many spam-like features, it’s classified as spam.
Otherwise, it’s classified as ham.
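A sketch of how a few such hand-built features might be computed (the patterns below are simplified illustrations in the spirit of the SpamAssassin-style features listed earlier, not its actual rules):

```python
import re

def spam_features(subject, body):
    text = subject + " " + body
    return {
        "subject_all_caps": subject.isupper(),
        "mentions_large_sum": bool(re.search(r"\$\d{1,3}(,\d{3})+", text)),
        "urgent_reply_phrase": "urgent reply" in text.lower(),
        "hundred_percent_guaranteed": "one hundred percent guaranteed" in text.lower(),
    }

print(spam_features("URGENT REPLY NEEDED!!!",
                    "You have won $1,000,000! Click here to claim your prize."))
# {'subject_all_caps': True, 'mentions_large_sum': True, 'urgent_reply_phrase': True,
#  'hundred_percent_guaranteed': False}
```

Each of these boolean features is then treated like any other Naive Bayes feature during training and classification.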
2. Language Identification (Language ID)
• What is Language ID?
• Language ID is about figuring out what language a piece of text is
written in.
• For example, is the text in English, Spanish, or French?
• Why Custom Features are Needed:
• Words alone might not be enough because many languages share
common words (e.g., “the” in English and Dutch).
• Instead, we use character n-grams (small sequences of
characters) to capture language-specific patterns.
Custom Features for Language ID:
• Character n-grams:
• 2-grams: ‘th’, ‘er’, ‘in’
• 3-grams: ‘the’, ‘ing’, ‘and’
• 4-grams: ‘tion’, ‘ment’
• Byte n-grams:
• Treat text as a sequence of raw bytes (useful for handling different
character encodings).
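A minimal character n-gram extractor in Python (a sketch; whether to lowercase the text or strip spaces is a design choice, and here the raw characters are simply kept as-is):

```python
def char_ngrams(text, n):
    # Slide a window of length n over the raw character sequence (spaces included).
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("the table", 3))
# ['the', 'he ', 'e t', ' ta', 'tab', 'abl', 'ble']
```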
Example: langid.py System
• Features:
• Uses all possible n-grams of lengths 1-4.
• Selects the most informative 7,000 features.
• Training Data:
• Multilingual text from sources like Wikipedia, Twitter, and Bible
translations.
• Includes regional dialects (e.g., Nigerian English, African American
Vernacular English).
How Naive Bayes Works for Language ID
• Training:
• The classifier learns from multilingual text using character or byte n-
grams as features.
• Testing:
• For a new text, the classifier checks for the presence of these n-grams.
• Classification:
• The text is classified as the language with the highest probability.
Why Custom Features Matter
1.Spam Detection:
1. Spam emails use tricky language and specific patterns, so custom features
like phrases and non-linguistic properties are necessary to catch them.
2.Language ID:
1. Words alone might not distinguish between languages, so character n-
grams are used to capture language-specific patterns.
Real-World Examples
• Spam Detection Example:
• Email:
• Subject: “URGENT REPLY NEEDED!!!”
• Body: “You have won $1,000,000! Click here to claim your prize.”
• Custom Features:
• Subject line is all capital letters.
• Contains the phrase “URGENT REPLY.”
• Mentions a large sum of money (“$1,000,000”).
• Classification:
• The classifier identifies these features and classifies the email as spam.
Language ID Example:
• Text:
• “El perro está en la casa.”
• Character n-grams:
• 2-grams: 'El', 'l ', ' p', 'pe', 'er', 'rr', 'ro', 'o ', ' e', 'es', 'st', 'tá', 'en', ' l', 'la', ' c', 'ca', 'as', 'sa', …
• Classification:
• The classifier recognizes the n-grams as Spanish and classifies the text
as Spanish.
Interactive Activity
• Activity 1: Spam Detection
• Task:
• Look at the following email and identify spam-like features:
• Subject: “Congratulations! You’ve won a free iPhone!”
• Body: “Click here to claim your prize now!”
• Questions:
• What phrases or patterns make this email suspicious?
• Would you classify it as spam or ham?
Activity 2: Language ID
• Task:
• Look at the following text and guess the language:
• “Le chat est sur la table.”
• Questions:
• What character n-grams can you identify?
• What language do you think this is?
Summary
• Naive Bayes can be adapted for spam detection and language ID by
using custom features.
• For spam detection, features like phrases, patterns, and non-linguistic
properties are effective.
• For language ID, character or byte n-grams are used to capture
language-specific patterns.
• These custom features make the classifier more accurate and robust
for real-world applications.
Naive Bayes as a Language Model
• In this section, we’ll explore how Naive Bayes can be viewed as
a language model.
• Specifically, we’ll see how Naive Bayes, when using individual
word features, behaves like a set of class-specific unigram
language models.
• This means that for each class (e.g., positive or negative
sentiment), Naive Bayes creates a separate language model that
assigns probabilities to words and sentences.
1. Naive Bayes as a Language Model
• When Naive Bayes uses individual word features (and all words in the
text), it can be seen as a set of class-specific unigram language
models.
• A unigram language model assigns probabilities to individual words,
assuming each word is independent of the others.
• For each class (e.g., positive or negative), Naive Bayes creates a
separate unigram language model.
2. Sentence Probability
• The Naive Bayes model assigns a probability to a sentence by
multiplying the probabilities of the words in the sentence, given the
class.
• Formula: P(sentence∣c) = ∏i P(wi∣c), i.e. P(w1∣c) × P(w2∣c) × … × P(wn∣c) for the words w1 … wn of the sentence.
