
Text Classification

Besat Kassaie
2 Outline

 Introduction
 Text classification definition
 Naive Bayes
 Vector Space Classification
 Rocchio Classification
 kNN Classification
 New Applications
3 Spam Detection
4 Authorship identification

 Was Mark Twain the writer of “Quintus Curtius Snodgrass” (QCS) letters?
 Mark Twain's role in the Civil War has been a subject of dispute for years.
 The evidence for Twain's possible military connection in New Orleans came from the
content of ten letters published in the New Orleans Daily Crescent.
 In these letters, which have been credited to Twain and were signed
“Quintus Curtius Snodgrass” (QCS), the writer described his military
adventures.
 Brinegar (1963) applied statistical tests to the QCS letters.
 Brinegar used word-frequency distributions to characterize Mark Twain's writing
style
 Mark Twain was not the author of the disputed letters!
5 Gender Identification

 Determining if an author is male or female


 Men and women tend to use measurably different language styles.
 Samples:
 A large number of determiners (a, the, that, these) and quantifiers (one, two,
more, some) is a male indicator.
 Heavy use of pronouns (I, you, she, her, their, myself, yourself,
herself) is a strong female indicator.

 Applications:
 Marketing, personalization, legal investigation
6 Sentiment Analysis

 Is the attitude of this text positive or negative?


 Rank the attitude of this text from 1 to 5
 Sample Applications:
 Movie: is this review positive or negative?
 Products: what do people think about the new iPhone?
 Public sentiment: how is consumer confidence? Is despair
increasing?
 Politics: what do people think about this candidate or issue?
 Prediction: predict election outcomes or market trends from
sentiment
7 Subject Assignment

 What is the subject of this article:


 Antagonists and inhibitors
 Blood Supply
 Chemistry
 Drug Therapy
 Embryology
 ….
8 Text Classification Definition

Input:
a document d
a fixed set of classes C={c1,c2,…,cj}

Output: a predicted class c ∈ C


9 Approaches to Text Classification

 Manual
 Many classification tasks have traditionally been solved manually.
 e.g. Books in a library are assigned Library of Congress categories by a librarian
 manual classification is expensive to scale

 Hand-crafted rules
 A rule captures a certain combination of keywords that indicates a class.
 e.g. (multicore OR multi-core) AND (chip OR processor OR microprocessor)
 good scaling properties
 creating and maintaining them over time is labor intensive

 Machine learning based methods


10 Supervised Learning

 Input:
 a document d
 a fixed set of classes C={c1,c2,…,cj}
 a training set of m labeled documents (d1,c1), (d2,c2),…,(dm,cm)

 Output: a learned classifier γ :d → c


11 Supervised Learning

 We can use any kind of classifiers:


 Naïve bayes
 kNN
 Rocchio
 SVM
 Logistic Regression
 ….
12

Naïve Bayes Classification


13 Naïve Bayes Intuition

 It is a simple classification technique based on Bayes rule


 Documents are represented by the "bag of words" model
 The bag of words model:
 a document d is represented by its set of words, weighted by their TF weights
 TF (term frequency) is the number of occurrences of each term in d.
 The document “Mary is quicker than John” is, in this view, identical to
the document “John is quicker than Mary”.
14 The bag of words representation

γ( "I love this movie! It's sweet, but with satirical humor. The dialogue is great and the adventure scenes are fun… It manages to be whimsical and romantic while laughing at the conventions of the fairy tale genre. I would recommend it to just about anyone. I've seen it several times, and I'm always happy to see it again whenever I have a friend who hasn't seen it yet." ) = c
15 The bag of words representation

γ( word counts of the same review ) = c
16 Features

 Supervised learning classifiers can use any sort of feature


 URL, email address, punctuation, capitalization, social network features
and so on
 In the bag of words view of documents
 We use only word features
17 Bayes' Rule Applied to Documents and Classes

 For a document d and a class c:

P(c|d) = P(d|c) P(c) / P(d)
18 Naïve Bayes Classifier

c_MAP = argmax_{c∈C} P(c|d)                       MAP is "maximum a posteriori" = the most likely class

      = argmax_{c∈C} P(d|c) P(c) / P(d)           (Bayes rule)

      = argmax_{c∈C} P(d|c) P(c)                  (dropping the denominator)

      = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)    (document d represented as features x1 … xn)
19 Multinomial NB VS Bernoulli NB

 There are two different approaches for setting up an NB classifier.


 multinomial model: generates one term from the vocabulary in each
position of the document
 multivariate Bernoulli model: generates an indicator for each term of the
vocabulary, either 1 indicating presence of the term in the document or 0
indicating absence

 Difference in estimation strategies


20 Document Features

c_MAP = argmax_{c∈C} P(d|c) P(c)

 Multinomial:
 c_MAP = argmax_{c∈C} P(w1, w2, …, wn | c) P(c)
 w1, w2, …, wn is the sequence of terms as it occurs in d

 Bernoulli:
 c_MAP = argmax_{c∈C} P(e1, e2, …, eV | c) P(c)
 e1, e2, …, eV is a binary vector of dimensionality V that indicates, for each term of the vocabulary, whether it occurs in d or not
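To make the two document models concrete, here is a small sketch; the toy document and vocabulary are invented for illustration:

```python
doc = "john is quicker than mary"
vocabulary = ["is", "john", "mary", "quicker", "slower", "than"]   # toy vocabulary V

# Multinomial model: the sequence of terms w1..wn as they occur in d
w = doc.split()                                  # ['john', 'is', 'quicker', 'than', 'mary']

# Bernoulli model: a binary vector e1..eV, one presence indicator per vocabulary term
e = [1 if term in set(w) else 0 for term in vocabulary]   # [1, 1, 1, 1, 0, 1]
```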
21 ASSUMPTIONS: Conditional independence

 P(x1, x2, …, xn | c) = P(x1|c) · P(x2|c) · … · P(xn|c)

 Multinomial:
 c_MAP = argmax_{c∈C} P(w1, w2, …, wn | c) P(c)
 c_NB = argmax_{c∈C} P(c) ∏_{1≤k≤n} P(Xk = wk | c)

 Xk is the random variable for position k in the document and takes as values terms from the vocabulary.
 P(Xk = wk | c) is the probability that, in a document of class c, the term wk occurs in position k.
22 ASSUMPTIONS: Conditional independence

 Bernoulli:
 c_MAP = argmax_{c∈C} P(e1, e2, …, eV | c) P(c)
 c_NB = argmax_{c∈C} P(c) ∏_{1≤i≤V} P(Ui = ei | c)

 Ui is the random variable for vocabulary term i and takes as values 0 (absence) and 1 (presence).
 P(Ui = 1 | c) is the probability that the term wi occurs in a document of class c.
23 ASSUMPTIONS: Bag of words assumption

 If we assumed a different term distribution for each position k, we would have to
estimate a different set of parameters for each k.

 Positional independence: the conditional probabilities for a term are the same,
independent of its position in the document.
 E.g. P(Xk1 = t|c) = P(Xk2 = t|c)

 So we have a single distribution of terms that is valid for all positions k:

c_MAP = argmax_{c∈C} P(w1, w2, …, wn | c) P(c)

c_NB = argmax_{c∈C} P(c) ∏_{w∈d} P(w|c)      (the product runs over the tokens of d)
24 Learning Parameters for Multinomial Naïve Bayes

 Based on Maximum Likelihood Estimation

P(cj) = Count(C = cj) / N_doc
 the fraction of documents which are labeled as class cj

P(wi|cj) = Count(wi, cj) / Σ_{w∈V} Count(w, cj)
 the fraction of times word wi appears among all words of documents of class cj
 To compute Count(wi, cj), create a "mega-document" for class cj by concatenating all
of its documents and use the frequency of wi in that mega-document.
25 Laplace (add-1) smoothing for Naïve Bayes

 Solving the problem of zero probabilities

P(wi|cj) = (Count(wi, cj) + 1) / Σ_{w∈V} (Count(w, cj) + 1)
         = (Count(wi, cj) + 1) / ( Σ_{w∈V} Count(w, cj) + |V| )
26 Laplace (add-1) smoothing: unknown words

 Add one extra word to the vocabulary, the "unknown word" wu

P(wu|cj) = (Count(wu, cj) + 1) / ( Σ_{w∈V} Count(w, cj) + |V| + 1 )
         = 1 / ( Σ_{w∈V} Count(w, cj) + |V| + 1 )      since Count(wu, cj) = 0

 This assigns the same probability to all unknown words.
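Putting the parameter estimates, add-1 smoothing, and the unknown-word slot together, here is a minimal from-scratch sketch of multinomial Naïve Bayes. Sums of log probabilities are used to avoid numerical underflow; whitespace tokenization and the toy documents are simplifying assumptions for this example:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels):
    """Estimate class priors and add-1 smoothed term probabilities (stored as logs)."""
    vocab = {w for d in docs for w in d.split()}
    class_docs = defaultdict(list)
    for d, c in zip(docs, labels):
        class_docs[c].append(d)
    log_prior, log_likelihood = {}, {}
    for c, cdocs in class_docs.items():
        log_prior[c] = math.log(len(cdocs) / len(docs))
        counts = Counter(w for d in cdocs for w in d.split())   # the class "mega-document"
        denom = sum(counts.values()) + len(vocab) + 1           # |V| + 1: extra slot for unknown words
        log_likelihood[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
        log_likelihood[c][None] = math.log(1 / denom)           # probability of the unknown word wu
    return log_prior, log_likelihood

def classify_nb(doc, log_prior, log_likelihood):
    """Return argmax_c of log P(c) + sum_k log P(wk | c), mapping unseen words to wu."""
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w in doc.split():
            s += log_likelihood[c].get(w, log_likelihood[c][None])
        scores[c] = s
    return max(scores, key=scores.get)

# toy usage with invented documents and labels
docs = ["free prize now", "meeting at noon", "free meds now", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(classify_nb("free prize meds", log_prior, log_likelihood))   # -> spam
```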


27 Example
28

Vector Space Classification


29 Vector Space Representation

 The document representation in Naive Bayes is a sequence of terms or a binary


vector
 A different representation for text classification: the vector space model
 Each document represented as a vector with one real-valued component,
usually a tf-idf weight, for each term.

 Each vector has a component for every term that occurs in any document of the
collection.

 Terms are the axes of the space.


 High dimensionality
30 Vector Space Classification

 Vector space classification is based on the contiguity hypothesis:


 Objects in the same class form a contiguous region, and regions of different
classes do not overlap

 Classification is to compute the boundaries in the space that separate the


classes; the decision boundaries

 Two methods will be discussed:


Rocchio and kNN
31

Vector Space Classification


Rocchio
32 Rocchio Classification

 Basic idea: Rocchio forms a simple representative for each class by using
the centroids.
 To compute the centroid of each class:

μ(c) = (1/|Dc|) Σ_{d∈Dc} v(d)

Dc is the set of all documents that belong to class c

v(d) is the vector space representation of d.


33 Rocchio Classification

 The boundary between two classes in


Rocchio classification is the set of points
with equal distance from the two centroids

 To classify a new object, determine which centroid is closest to it and assign the object to the corresponding class ci.
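A minimal sketch of Rocchio classification under these definitions; the toy tf-idf vectors and the class names are invented for illustration:

```python
import numpy as np

def train_rocchio(X, y):
    """Compute one centroid per class from tf-idf row vectors X (n_docs x n_terms)."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify_rocchio(x, centroids):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# toy usage: 4 documents in a 3-term space, two classes
X = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.2], [0.1, 0.9, 0.3]])
y = np.array(["uk", "uk", "kenya", "kenya"])
centroids = train_rocchio(X, y)
print(classify_rocchio(np.array([0.8, 0.2, 0.1]), centroids))   # -> uk
```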
34 Problems with Rocchio Classification

 Implicitly assumes that classes are spheres with similar radius


 Ignores details of distribution of points within a class and only considers
centroid distances.

 Red cube is assigned to Kenya class


 But is a better fit for UK class
since UK is more scattered than Kenya

 Not requiring a similar radius → modified decision rule:

Assign d to class c iff |μ(c) − v(d)| ≤ |μ(c′) − v(d)| − b for the competing class c′, where b is a fixed margin.
35 Problems with Rocchio Classification

 Does not work well for classes that cannot be accurately represented by a
single "center"

 Class “a” consists of two different clusters.


 Rocchio classification will misclassify “o” as “a” because
it is closer to the centroid A of the “a” class than to the
centroid B of the “b” class.
36

Vector Space Classification


K Nearest Neighbor
37 K Nearest Neighbor Classification

 For kNN we assign each document to the majority class of its k closest
neighbors where k is a parameter.

 The rationale of kNN classification is that, based on the contiguity


hypothesis, we expect a test document d to have the same label as the
training documents located in the local region surrounding d.

 Learning: just store the labeled training examples D


38 K Nearest Neighbor Classification

 Determining the class of a new document:

 For k = 1: Assign each object to the class of its closest neighbor.


 For k > 1: Assign each object to the majority class among its k closest
neighbors.
 The parameter k must be specified in advance. k = 3 and k = 5 are
common choices, but much larger values between 50 and 100 are also
used
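A minimal sketch of the majority-vote rule; Euclidean distance over dense vectors is assumed here for simplicity (cosine similarity over length-normalized tf-idf vectors is the more common choice for text):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Majority vote among the k training documents closest to x (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                 # indices of the k closest neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```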
39 kNN decision boundary

 The decision boundary defined by the Voronoi tessellation


 Assuming k = 1: for a given set of objects in the space, each object defines
a cell consisting of all points that are closer to that object than to any other
object.
 Results in a set of convex polygons;
so-called Voronoi cells.

 Decomposing a space into such cells


gives us the so-called Voronoi tessellation.
 In the case of k > 1, the Voronoi cells
are given by the regions in the space for which
the set of k nearest neighbors is the same.
40

 Decision boundary for 1NN (double line): defined along the regions of
Voronoi cells for the objects in each class. Shows the non-linearity of kNN
41 kNN: Probabilistic version

 There is a probabilistic version of kNN classification algorithm.


 We can estimate the probability of membership in class c as the proportion of
the k nearest neighbors in c.

 P(circle class | star) = 1/3
 P(X class | star) = 2/3
 P(diamond class | star) = 0

 The 3NN estimate: P(circle class | star) = 1/3
 The 1NN estimate: P(circle class | star) = 1
 So 3NN prefers the X class, while 1NN prefers the circle class.
42 kNN: distance weighted version

 We can weight the “votes” of the k nearest neighbors by their cosine


similarity:

Score(c, d) = Σ_{d′∈Sk(d)} Ic(d′) · cos(v(d′), v(d))

 Sk(d) is the set of d’s k nearest neighbors


 Ic(d′) = 1 iff d′ is in class c and 0 otherwise.
 Assign the document to the class with the highest score.
 Weighting by similarities is often more accurate than simple voting.
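A minimal sketch of the cosine-weighted scoring rule above, assuming documents are already represented as dense tf-idf row vectors:

```python
import numpy as np

def knn_cosine_weighted(x, X_train, y_train, k=3):
    """Score each class by the summed cosine similarity of its members among x's k nearest neighbors."""
    sims = (X_train @ x) / (np.linalg.norm(X_train, axis=1) * np.linalg.norm(x) + 1e-12)
    nearest = np.argsort(-sims)[:k]            # the k most similar training documents
    scores = {}
    for i in nearest:
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + sims[i]   # Ic(d') * cos(...)
    return max(scores, key=scores.get)
```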
43 kNN Properties

 No training necessary
 Scales well with large number of classes
 Don’t need to train n classifiers for n classes
 May be expensive at test time
 In most cases it’s more accurate than NB or Rocchio
44 Some Recent Applications of Text
Classification
 Social networks such as Twitter are a rich source of text data
 Many new applications are emerging based on these sources
 We will discuss three recent works based on Twitter data
 "Learning Similarity Functions for Topic Detection in Online Reputation
Monitoring", Damiano Spina et al., SIGIR 2014
 "Target-dependent Churn Classification in Microblogs", Hadi Amiri et al.,
AAAI 2015
 "Fine-Grained Location Extraction from Tweets with Temporal Awareness",
Chenliang Li et al., SIGIR 2014
45 Learning Similarity Functions for Topic
Detection in Online Reputation Monitoring

 What are people saying about a given entity (company, brand,
organization, personality, …)?
 Are there any issues that may damage the reputation of the entity?
 To answer such questions, reputation experts have to monitor
social networks such as Twitter daily
 The paper aims to solve this problem automatically, as a topic detection task
 Reputation alerts must be detected early, before they explode, so only a
small number of related tweets is available
 Probabilistic generative approaches are less appropriate here because of
data sparsity
46 Learning Similarity Functions…

 Paper proposes a hybrid approach: supervised + unsupervised


 Tweets need to be clustered based on their topics
 Clusters change over time based on data stream
 Tweets with similar topics belong to the same cluster
 For clustering we need a similarity function.
 Classification (SVM) is used to learn the similarity function.
 Each pair of tweets is classified as either "similar" or "not similar"

Pipeline: SVM (learned similarity function) → Hierarchical Agglomerative Clustering (a minimal sketch follows below)
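A rough sketch of such a hybrid pipeline, not the paper's actual implementation: the pairwise features, the sigmoid squashing of the SVM margin into a similarity score, the clustering threshold, and the toy training pairs are all assumptions, and a recent scikit-learn (≥1.2, where AgglomerativeClustering takes metric='precomputed') is assumed:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.cluster import AgglomerativeClustering

def pair_features(t1, t2):
    """Toy pairwise features: term overlap (Jaccard) and number of shared hashtags."""
    w1, w2 = set(t1.lower().split()), set(t2.lower().split())
    jaccard = len(w1 & w2) / max(len(w1 | w2), 1)
    shared_tags = len({w for w in w1 if w.startswith("#")} &
                      {w for w in w2 if w.startswith("#")})
    return [jaccard, shared_tags]

# 1) Supervised step: learn the similarity function from labeled tweet pairs.
#    These toy feature rows stand in for annotated "same topic" / "different topic" pairs.
pairs_X = [[0.9, 2], [0.7, 1], [0.5, 0], [0.1, 0], [0.0, 1], [0.05, 0]]
pairs_y = [1, 1, 1, 0, 0, 0]
sim_clf = SVC().fit(pairs_X, pairs_y)

# 2) Unsupervised step: cluster new tweets using the learned similarity as the distance.
def cluster_tweets(tweets, sim_clf, threshold=0.5):
    n = len(tweets)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        score = sim_clf.decision_function([pair_features(tweets[i], tweets[j])])[0]
        similarity = 1.0 / (1.0 + np.exp(-score))    # squash the SVM margin into (0, 1)
        dist[i, j] = dist[j, i] = 1.0 - similarity
    hac = AgglomerativeClustering(n_clusters=None, distance_threshold=threshold,
                                  metric="precomputed", linkage="average")
    return hac.fit_predict(dist)                     # cluster label per tweet
```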


47 Learning Similarity Functions…

 Dataset is comprised of 142,527 manually annotated tweets.


 Features used for classification:
 Term Features:
 taking into account the similarity between the terms of the two tweets
 Tweets sharing a high percentage of their vocabulary are likely about the same topic

 Semantic features:
 Useful for tweets that do not have words in common.
 Wikipedia is used as a knowledge base to find semantically related words, e.g. mexico,
mexicanas

 Metadata features:
 Such as author, URLs, hashtags,…

 Time-aware features:
 Time stamp of tweets
48 Fine-Grained Location Extraction from
Tweets with Temporal Awareness
 In tweets, users often casually or implicitly reveal their current
locations and their short-term plans about where to visit next, at a fine granularity

 Such information enables tremendous opportunities for personalization and


location-based services/marketing

 This paper is about extracting fine-grained locations mentioned in tweets


with temporal awareness.
49 Fine-Grained Location Extraction…

 Dataset: a sample of 4000 manually annotated tweets


 Classifier: Conditional Random Fields
 Some features:
 Lexical Features: the word itself, its lowercase form, the bag of words of the context window, … (a per-token feature sketch follows at the end of this slide)
 Grammatical Features: POS tags (for verb tense), word groups (shudd, shuld,
shoud)
 …
 If a user mentions a point-of-interest (POI) (e.g., restaurant, shopping
mall,…) in her tweet:
 The name of the POI is extracted
 The tweet is assigned to one of these classes: the user has visited the POI, is currently at it, or
will soon visit it (temporal awareness).
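A small sketch of per-token features of this kind, as they might be fed to a sequence labeler such as a CRF; the feature names, the context-window size, and the "future marker" list are illustrative assumptions, not the paper's exact feature set:

```python
def token_features(tokens, pos_tags, i, window=2):
    """Build an illustrative feature dict for the token at position i."""
    word = tokens[i]
    context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
    return {
        "word": word,                        # lexical: the word itself
        "lower": word.lower(),               # lexical: its lowercase form
        "context_bow": sorted(set(w.lower() for w in context)),  # bag of words of the window
        "pos": pos_tags[i],                  # grammatical: POS tag (captures verb tense)
        "is_future_marker": word.lower() in {"will", "gonna", "shall"},  # toy temporal cue
    }

# toy usage with an invented tweet and POS tags
tokens = ["I", "will", "visit", "the", "Rex", "Cinema", "tonight"]
pos    = ["PRP", "MD", "VB", "DT", "NNP", "NNP", "NN"]
print(token_features(tokens, pos, i=2))
```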
50 Target-dependent Churn Classification
in Microblogs
 The problem of classifying micro-posts as churny or non-churny with respect
to a given brand (telecom companies) using Twitter data.
 i.e. whether a tweet indicates that a user is going to cancel a service
 Sample tweets:
 “One of my main goals for 2013 is to leave BrandName”
 “I cant take it anymore, the unlimited data isn’t even worth it”
 “My days with BrandName are numbered”
51 Target-dependent Churn Classification
in Microblogs
 Dataset : 4800 tweets about three telecommunication brands
 Classification method: linear regression, SVM, logistic regression
 Features:
 Demographic Churn Indicators:
 Activity ratio (if a user is not active with respect to any competitor, they are less likely
to post churny content about a target brand),
 # followers and friends, …

 Content Churn Indicators:


 Sentiment features, Tense of tweet,…


52

THANK YOU
53 References

https://web.stanford.edu/class/cs124/lec/sentiment.pptx
https://web.stanford.edu/class/cs124/lec/naivebayes.pdf
An Introduction to Information Retrieval, Christopher D. Manning,
Prabhakar Raghavan, Hinrich Schütze, 2009
Learning Similarity Functions for Topic Detection in Online
Reputation Monitoring, Damiano Spina et al., SIGIR 2014
Target-dependent Churn Classification in Microblogs, Hadi Amiri
et al., AAAI 2015
Fine-Grained Location Extraction from Tweets with Temporal
Awareness, Chenliang Li et al., SIGIR 2014
