
Multimedia Application

By
Minhaz Uddin Ahmed, PhD
Department of Computer Engineering
Inha University Tashkent
Email: [email protected]
Content
 Naive Bayes Classifiers
 Training the Naive Bayes Classifier
 Worked example
 Optimizing for Sentiment Analysis
 Naive Bayes for other text classification tasks
 Naive Bayes as a Language Model
Bayes theorem

 Bayes' theorem (also known as Bayes' Rule or Bayes' law) is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability:

    P(A|B) = (P(B|A) * P(A)) / P(B)

 P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
 P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.
 P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
 P(B) is the Marginal probability: the probability of the evidence.
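
As a quick illustration (my own, not from the slides), a minimal Python sketch of applying the rule with made-up but consistent numbers:

```python
# Hypothetical numbers: 20% of emails are spam, the word "prize" appears in
# 40% of spam emails and in 10% of all emails.
p_word_given_spam = 0.40   # P(B|A), likelihood
p_spam = 0.20              # P(A), prior
p_word = 0.10              # P(B), marginal probability of the evidence

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)   # 0.8 -> an email containing "prize" is spam with probability 0.8
```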
Types of Naive Bayes:

 There are three types of Naive Bayes model under the scikit-learn library (a short usage sketch follows below):
• Gaussian: Used for classification when the features are continuous and assumed to follow a normal distribution.
• Multinomial: Used for discrete counts, e.g. in a text classification problem. Instead of the Bernoulli question "did the word occur in the document?", we count how often the word occurs in the document; you can think of it as the "number of times outcome x_i is observed over the n trials".
• Bernoulli: The binomial model is useful if your feature vectors are binary (i.e. zeros and ones). One application is text classification with the 'bag of words' model, where the 1s and 0s are "word occurs in the document" and "word does not occur in the document" respectively.
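
A minimal sketch (assuming scikit-learn is installed; the tiny spam/ham corpus is hypothetical) showing the three variants side by side:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

docs = ["win money now", "meeting at noon", "win a free prize", "lunch at noon"]
labels = ["spam", "ham", "spam", "ham"]      # hypothetical labels

vec = CountVectorizer()
X = vec.fit_transform(docs)                  # word-count features
X_test = vec.transform(["free money now"])

# Multinomial NB works directly on the word counts.
print(MultinomialNB().fit(X, labels).predict(X_test))

# Bernoulli NB binarizes the counts internally (word present / absent).
print(BernoulliNB().fit(X, labels).predict(X_test))

# Gaussian NB expects dense, continuous-valued features.
print(GaussianNB().fit(X.toarray(), labels).predict(X_test.toarray()))
```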
Example
Example solution

 Solution:
 P(A|B) = (P(B|A) * P(A)) / P(B)
1. Mango:
 P(X | Mango) = P(Yellow | Mango) * P(Sweet | Mango) * P(Long | Mango)
a) P(Yellow | Mango) = (P(Mango | Yellow) * P(Yellow)) / P(Mango)
    = ((350/800) * (800/1200)) / (650/1200)
 P(Yellow | Mango) = 0.53
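
A one-line check of the arithmetic above in plain Python (no extra data assumed):

```python
# P(Yellow | Mango) = P(Mango | Yellow) * P(Yellow) / P(Mango)
print((350 / 800) * (800 / 1200) / (650 / 1200))   # ~0.538, reported as 0.53 on the slide
```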
Text Classification

 Assigning subject categories, topics, or genres


 Spam detection
 Authorship identification
 Age/gender identification
 Language Identification
 Sentiment analysis
Who wrote which Federalist papers?

 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton.
 Authorship of 12 of the letters is in dispute
 1963: solved by Mosteller and Wallace using Bayesian methods

[Portraits: James Madison, Alexander Hamilton]


Male or female author from a given text
By 1925 present-day Vietnam was divided into three parts under French
colonial rule. The southern region embracing Saigon and the Mekong
delta was the colony of Cochin-China; the central area with its imperial
capital at Hue was the protectorate of Annam …

Clara never failed to be astonished by the extraordinary felicity of her


own name. She found it hard to trust herself to the mercy of fate, which
had managed over the years to convert her greatest shame into one of
her greatest assets…
Text Classification: definition

 Input:
  a document d
  a fixed set of classes C = {c1, c2, …, cJ}

 Output: a predicted class c ∈ C


Classification Methods:
Hand-coded rules
 Rules based on combinations of words or other features
  spam: black-list-address OR ("dollars" AND "have been selected")
 Accuracy can be high
  If rules are carefully refined by an expert
 But building and maintaining these rules is expensive (a toy rule is sketched below)
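
A toy Python version of the rule above (illustrative only; the blacklist address and messages are hypothetical):

```python
BLACKLIST = {"[email protected]"}   # hypothetical blacklisted sender

def is_spam(sender: str, body: str) -> bool:
    # spam: black-list-address OR ("dollars" AND "have been selected")
    return sender in BLACKLIST or ("dollars" in body and "have been selected" in body)

print(is_spam("[email protected]", "You have been selected to receive dollars!"))  # True
print(is_spam("[email protected]", "Lunch tomorrow?"))                           # False
```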
Classification Methods:
Supervised Machine Learning
 Input:
 a document d

 a fixed set of classes C = {c1, c2,…, cJ}


 A training set of m hand-labeled documents (d1,c1), ..., (dm,cm)

 Output:
  a learned classifier γ: d → c
Classification Methods:
Supervised Machine Learning
 Any kind of classifier
 Naïve Bayes

 Logistic regression

 Support-vector machines

 k-Nearest Neighbors
Naive Bayes Intuition

 Simple ("naive") classification method based on Bayes rule


 Relies on very simple representation of document
 Bag of words
The Bag of Words Representation

 We preprocess the dataset by converting each email into a


bag-of-words representation, where each word is a feature and
its frequency in the email is its value. We also assign a label
(spam or not spam) to each email
The Bag of Words Representation

[Figure: a review document is reduced to an unordered bag of word counts, e.g. seen 2, sweet 1, whimsical 1, recommend 1, happy 1, ..., which the classifier γ maps to a class: γ(d) = c]
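
A minimal sketch of building such a bag-of-words count representation in Python (the example sentence is made up):

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Lowercase and split on whitespace; real systems would use a proper tokenizer.
    return Counter(text.lower().split())

print(bag_of_words("I have seen it and seen it again, a sweet whimsical happy film I recommend"))
# Counter({'i': 2, 'seen': 2, 'it': 2, 'have': 1, 'and': 1, ...})
```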
Training

 We train the Naïve Bayes classifier on the labeled dataset. During


training, the classifier calculates the probabilities of each word
occurring in spam and not spam emails, as well as the prior
probabilities of spam and not spam emails in the dataset.
Prediction

Step 1: Given a new email, we convert it into a bag-of-words


representation.
Step 2: For each word in the email, we calculate its
conditional probability of occurring in spam and not spam
emails based on the probabilities learned during training.
Step 3: We multiply the conditional probabilities of all words
in the email and multiply them by the prior probabilities of
spam and not spam emails.
Step 4: We compare the calculated probabilities for spam
and not spam, and classify the email as spam or not spam
based on the higher probability.
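
A minimal Python sketch of these four steps, assuming the per-class priors and word probabilities (priors, word_probs) were produced by training as described above; log probabilities are used to avoid numerical underflow:

```python
import math
from collections import Counter

def predict(email: str, priors: dict, word_probs: dict) -> str:
    # Step 1: convert the new email into a bag-of-words representation.
    counts = Counter(email.lower().split())
    scores = {}
    for c in priors:                       # e.g. c is "spam" or "not_spam"
        # Steps 2-3: multiply the prior by the conditional probability of each
        # word (done in log space, so the products become sums).
        score = math.log(priors[c])
        for word, n in counts.items():
            if word in word_probs[c]:      # unknown words are simply skipped
                score += n * math.log(word_probs[c][word])
        scores[c] = score
    # Step 4: classify as the class with the higher probability.
    return max(scores, key=scores.get)
```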
Bayes' Rule Applied to Documents and Classes
 For a document d and a class c:

    P(c|d) = (P(d|c) * P(c)) / P(d)
Naive Bayes Classifier (I)

MAP is "maximum a posteriori" = most likely class:

    c_MAP = argmax_{c ∈ C} P(c|d)

Applying Bayes' Rule:

    c_MAP = argmax_{c ∈ C} (P(d|c) * P(c)) / P(d)

Dropping the denominator (P(d) is the same for every class):

    c_MAP = argmax_{c ∈ C} P(d|c) * P(c)
Text Classification and Naïve Bayes
Learning the Multinomial Naive Bayes Model

 First attempt: maximum likelihood estimates


 simply use the frequencies in the data

    P̂(c_j) = N_{c_j} / N_total
Parameter estimation

    P̂(w_i | c_j) = count(w_i, c_j) / Σ_{w ∈ V} count(w, c_j)

 i.e. the fraction of times word w_i appears among all words in documents of topic c_j

 Create a mega-document for topic j by concatenating all docs in this topic
 Use the frequency of w_i in the mega-document (see the sketch below)
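
A minimal sketch of this mega-document estimate in Python (the two training documents are made up):

```python
from collections import Counter

def mle_word_likelihoods(docs_in_class):
    # Mega-document: concatenate all documents of the class, then count words.
    mega_doc = " ".join(docs_in_class).lower().split()
    counts = Counter(mega_doc)
    total = sum(counts.values())
    # Maximum likelihood estimate: P(w | c) = count(w, c) / total words in class c.
    return {w: n / total for w, n in counts.items()}

print(mle_word_likelihoods(["a fantastic fun film", "fantastic acting"]))
# {'a': 1/6, 'fantastic': 2/6, 'fun': 1/6, 'film': 1/6, 'acting': 1/6}
```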
Problem with Maximum Likelihood

 What if we have seen no training documents with the word fantastic that are classified in the topic positive (thumbs-up)?

    P̂("fantastic" | positive) = count("fantastic", positive) / Σ_{w ∈ V} count(w, positive) = 0

 Zero probabilities cannot be conditioned away, no matter the other evidence!
Laplace (add-1) smoothing for Naïve Bayes

    P̂(w_i | c) = (count(w_i, c) + 1) / (Σ_{w ∈ V} count(w, c) + |V|)
Multinomial Naïve Bayes: Learning

 From the training corpus, extract the Vocabulary
 Calculate the P(c_j) terms:
  For each c_j in C do
   docs_j ← all docs with class = c_j
   P(c_j) ← |docs_j| / |total # of documents|
 Calculate the P(w_k | c_j) terms:
  Text_j ← single doc containing all of docs_j
  For each word w_k in Vocabulary
   n_k ← # of occurrences of w_k in Text_j
   P(w_k | c_j) ← (n_k + 1) / (n + |Vocabulary|), with add-1 smoothing, where n is the total number of tokens in Text_j

(A Python sketch of this loop follows below.)
Unknown words

 What about unknown words


 that appear in our test data
 but not in our training data or vocabulary?
 We ignore them
 Remove them from the test document!
 Pretend they weren't there!
 Don't include any probability for them at all!
 Why don't we build an unknown word model?
 It doesn't help: knowing which class has more unknown words is
not generally helpful!
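
In code, ignoring unknown words at test time is just a filter before scoring (a tiny sketch; the vocabulary and test sentence are hypothetical):

```python
vocabulary = {"the", "film", "was", "fantastic", "and"}          # learned from training
test_words = "the film was fantastic and zygomorphic".split()    # "zygomorphic" is unseen
known_words = [w for w in test_words if w in vocabulary]         # unknown word is dropped
print(known_words)   # ['the', 'film', 'was', 'fantastic', 'and']
```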
Stop words

 Some systems ignore stop words


 Stop words: very frequent words like the and a.
 Sort the vocabulary by word frequency in training set
 Call the top 10 or 50 words the stopword list.
 Remove all stop words from both training and test sets, as if they were never there!
 But removing stop words doesn't usually help
• So in practice most NB algorithms use all words and don't
use stopword lists
Naive Bayes: Learning

Sentiment Example:
A worked sentiment example with add-1 smoothing

1. Prior from training:
    P̂(c_j) = N_{c_j} / N_total
    P(-) = 3/5
    P(+) = 2/5
2. Drop "with"
3. Likelihoods from training:
    P̂(w_i | c) = (count(w_i, c) + 1) / (Σ_{w ∈ V} count(w, c) + |V|)
4. Scoring the test set:
Optimizing for sentiment analysis

 For tasks like sentiment, word occurrence seems to be more important than word frequency.
  The occurrence of the word fantastic tells us a lot
  The fact that it occurs 5 times may not tell us much more.

 Binary multinomial naive Bayes, or binary NB
  Clip our word counts at 1
  Note: this is different than Bernoulli naive Bayes; see the textbook at the end of the chapter.
Binary Multinomial Naïve Bayes: Learning

 From the training corpus, extract the Vocabulary
 Calculate the P(c_j) terms:
  For each c_j in C do
   docs_j ← all docs with class = c_j
 Calculate the P(w_k | c_j) terms:
  Remove duplicates in each doc: for each word type w in doc_j, retain only a single instance of w
  Text_j ← single doc containing all of docs_j
  For each word w_k in Vocabulary
   n_k ← # of occurrences of w_k in Text_j
Binary Multinomial Naive Bayes
on a test document d
First remove all duplicate words from d
Then compute NB using the same equation
Binary multinomial naive Bayes

Counts can still be 2! Binarization is within-doc!
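
A minimal sketch of within-document binarization (clipping each document's counts at 1) in Python, using made-up documents:

```python
from collections import Counter

def binarize(doc: str):
    # Keep one instance of each word type per document (clip counts at 1).
    return sorted(set(doc.lower().split()))

docs = ["it was great great great", "it was great it really was"]   # hypothetical
binarized = [" ".join(binarize(d)) for d in docs]
print(binarized)
# The binarized docs are then concatenated per class as before, so a word's
# count across documents can still be 2 even though it is at most 1 per doc.
print(Counter(" ".join(binarized).split())["great"])   # 2
```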


More on Sentiment Classification

 I really like this movie

I really don't like this movie

Negation changes the meaning of "like" to negative.


Negation can also change negative to positive-ish
◦ Don't dismiss this film
◦ Doesn't let us get bored
Sentiment Classification: Lexicons

Sometimes we don't have enough labeled training data


In that case, we can make use of pre-built word lists
Called lexicons
There are various publicly available lexicons
MPQA Subjectivity Cues Lexicon

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in
Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

 Home page: https://mpqa.cs.pitt.edu/lexicons/subj_lexicon/


 6885 words from 8221 lemmas, annotated for intensity (strong/weak)
 2718 positive
 4912 negative
 + : admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great
 − : awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh,
hate
Using Lexicons in Sentiment
Classification
Add a feature that gets a count whenever a word from the lexicon
occurs
 E.g., a feature called "this word occurs in the positive lexicon" or
"this word occurs in the negative lexicon"
Now all positive words (good, great, beautiful, wonderful) or negative
words count for that feature.
Using 1-2 features isn't as good as using all the words.
• But when training data is sparse or not representative of the test set,
dense lexicon features can help
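
A minimal sketch of such dense lexicon-count features (the word sets below are tiny, hypothetical stand-ins for a real lexicon such as MPQA):

```python
POSITIVE = {"good", "great", "beautiful", "wonderful", "admirable"}   # toy positive lexicon
NEGATIVE = {"awful", "bad", "harsh", "hate", "catastrophe"}           # toy negative lexicon

def lexicon_features(doc: str) -> dict:
    words = doc.lower().split()
    # Two dense features (counts of lexicon hits) instead of one feature per word.
    return {"pos_lexicon_count": sum(w in POSITIVE for w in words),
            "neg_lexicon_count": sum(w in NEGATIVE for w in words)}

print(lexicon_features("a beautiful film with a harsh ending"))
# {'pos_lexicon_count': 1, 'neg_lexicon_count': 1}
```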
Naive Bayes in Other tasks: Spam
Filtering
 SpamAssassin features:
  Mentions millions of dollars (amounts of the form $NN,NNN,NNN.NN)
  From: starts with many numbers
  Subject is all capitals
  HTML has a low ratio of text to image area
  "One hundred percent guaranteed"
  Claims you can be removed from the list
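
A rough Python sketch of a few of these features as simple checks (my own approximations; not the actual SpamAssassin rules):

```python
import re

def spam_features(sender: str, subject: str, body: str) -> dict:
    return {
        # Mentions millions of dollars, e.g. "$42,000,000.00"
        "millions_of_dollars": bool(re.search(r"\$\d{1,3}(,\d{3}){2,}(\.\d{2})?", body)),
        # From: starts with many numbers
        "numeric_sender": bool(re.match(r"\d{4,}", sender)),
        # Subject is all capitals
        "all_caps_subject": subject.isupper(),
        # "One hundred percent guaranteed"
        "guaranteed_phrase": "one hundred percent guaranteed" in body.lower(),
    }

print(spam_features("12345@mail.example", "YOU HAVE WON",
                    "Claim your $10,000,000.00 prize, one hundred percent guaranteed"))
```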
Naive Bayes in Language ID

 Determining what language a piece of text is


written in.
Features based on character n-grams do very
well
 Important to train on lots of varieties of each
language
(e.g., American English varieties like African-American English,
or English varieties around the world like Indian English)
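
A minimal sketch of extracting character n-gram count features for language ID (feature extraction only; the example sentences are made up):

```python
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    # Pad with spaces so that word-boundary n-grams are captured too.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

print(char_ngrams("this is a short English sentence").most_common(5))
print(char_ngrams("ceci est une phrase en français").most_common(5))
```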
Summary: Naive Bayes is Not So Naive
 Very fast, low storage requirements
 Works well with very small amounts of training data
 Robust to irrelevant features
  Irrelevant features cancel each other out without affecting results
 Very good in domains with many equally important features
  Decision trees suffer from fragmentation in such cases, especially with little data
 Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
 A good, dependable baseline for text classification
Reference

Chapter 4
Question
Thank you
