Classification DMKD

Classification vs. Prediction

• Classification:
  • predicts categorical class labels
  • classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
• Prediction:
  • models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications:
  • credit approval
  • target marketing
  • medical diagnosis
  • treatment effectiveness analysis
• Large data sets: disk-resident rather than memory-resident data
Supervised vs. Unsupervised Learning

• Supervised learning (classification)
  • Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  • New data is classified based on the training set
• Unsupervised learning (clustering)
  • The class labels of the training data are unknown
  • Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Prediction Problems: Classification vs. Numeric Prediction

• Classification
  • predicts categorical class labels (discrete or nominal)
  • classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
• Numeric prediction
  • models continuous-valued functions, i.e., predicts unknown or missing values
• Typical applications
  • Credit/loan approval
  • Medical diagnosis: is a tumor cancerous or benign?
  • Fraud detection: is a transaction fraudulent?
  • Web page categorization: which category does a page belong to?
Classification—A Two-Step Process

• Model construction: describing a set of predetermined classes
  • Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  • The set of tuples used for model construction is the training set
  • The model is represented as classification rules, decision trees, or mathematical formulae
• Model usage: classifying future or unknown objects
  • Estimate the accuracy of the model
    • The known label of each test sample is compared with the classified result from the model
    • The accuracy rate is the percentage of test set samples that are correctly classified by the model
    • The test set is independent of the training set (otherwise overfitting occurs)
  • If the accuracy is acceptable, use the model to classify new data
Process (1): Model Construction

Training data is fed to a classification algorithm, which produces a classifier (model).

Training data:

Student name  Maths  Physics  Chemistry  Grade
Ram           90     80       70         A
Siva          70     75       80         B
Mani          99     68       98         A
Sanjay        76     79       74         B

Learned classifier (model):

IF maths > 80 OR physics > 80 OR chemistry > 80
THEN Grade = 'A'
ELSE Grade = 'B'
Process (2): Using the Model in Prediction

The classifier is applied to testing data (to estimate accuracy) and then to unseen data.

Testing data:

Student name  Maths  Physics  Chemistry
Gaurav        78     79       56
Ankith        90     91       80
Manoj         70     68       77
Rakesh        90     81       82

Unseen data: (Manoj, 70, 68, 77) -> Grade: B
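To make the two steps concrete, here is a minimal Python sketch (my own illustration, not from the slides) that encodes the learned rule as a function and applies it to the unseen record:

# Step 1 (model construction) produced this rule; we encode it directly.
def classify(maths, physics, chemistry):
    # IF maths > 80 OR physics > 80 OR chemistry > 80 THEN 'A' ELSE 'B'
    if maths > 80 or physics > 80 or chemistry > 80:
        return 'A'
    return 'B'

# Step 2 (model usage): classify the unseen record.
print(classify(70, 68, 77))  # Manoj -> B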
Entropy

• Entropy (information theory): a measure of the uncertainty associated with a random variable
• Interpretation:
  • Higher entropy -> higher uncertainty: the events being measured are less predictable
  • Lower entropy -> lower uncertainty: the events being measured are more predictable
Attribute Selection Measure: Information Gain

• Select the attribute with the highest information gain
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|
• Expected information (entropy) needed to classify a tuple in D:

  $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$

• Information needed (after using A to split D into v partitions) to classify D:

  $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

• Information gained by branching on attribute A:

  $Gain(A) = Info(D) - Info_A(D)$
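These formulas translate almost line for line into code. A minimal sketch in Python (the helper names info and info_gain are my own):

import math
from collections import Counter

def info(labels):
    # Info(D) = -sum over classes of p_i * log2(p_i)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    # Gain(A) = Info(D) - Info_A(D), where attribute index `attr` partitions the rows
    n = len(labels)
    info_a = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        info_a += (len(subset) / n) * info(subset)
    return info(labels) - info_a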
Attribute Selection: Information Gain

• Class P: buys_computer = "yes" (9 tuples)
• Class N: buys_computer = "no" (5 tuples)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

$Info(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$

$Info_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$

Here I(2,3) means "age <=30" has 5 out of 14 samples, with 2 yes's and 3 no's. Hence:

$I(2,3) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971$

$Gain(age) = Info(D) - Info_{age}(D) = 0.246$
$Gain(income) = 0.029$
$Gain(student) = 0.151$
$Gain(credit\_rating) = 0.048$
[Figure: the decision tree grown from the table above — root split on age (<=30, 31-40, >40), with lower-level splits on income, student, and credit rating (fair/excellent), and leaves predicting buys_computer = yes/no.]
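As a check on the worked example, this self-contained Python sketch recomputes the four gains from the table (it prints 0.247, 0.029, 0.152, 0.048, matching the slide up to rounding):

import math
from collections import Counter

def info(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

data = [  # (age, income, student, credit_rating, buys_computer)
    ('<=30','high','no','fair','no'), ('<=30','high','no','excellent','no'),
    ('31…40','high','no','fair','yes'), ('>40','medium','no','fair','yes'),
    ('>40','low','yes','fair','yes'), ('>40','low','yes','excellent','no'),
    ('31…40','low','yes','excellent','yes'), ('<=30','medium','no','fair','no'),
    ('<=30','low','yes','fair','yes'), ('>40','medium','yes','fair','yes'),
    ('<=30','medium','yes','excellent','yes'), ('31…40','medium','no','excellent','yes'),
    ('31…40','high','yes','fair','yes'), ('>40','medium','no','excellent','no'),
]
labels = [row[-1] for row in data]
for i, name in enumerate(['age', 'income', 'student', 'credit_rating']):
    # Info_A(D): weighted entropy of the partitions induced by attribute i
    info_a = sum((cnt / len(data)) * info([lab for row, lab in zip(data, labels) if row[i] == v])
                 for v, cnt in Counter(row[i] for row in data).items())
    print(name, round(info(labels) - info_a, 3))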
Weather  Temperature  Humidity  Wind  Play Golf
fine     hot          high      none  no
fine     hot          high      few   no
cloudy   hot          high      none  yes
rain     warm         high      none  yes
rain     cold         medium    none  yes
rain     cold         medium    few   no
cloudy   cold         medium    few   yes
fine     warm         high      none  no
fine     cold         medium    none  yes
rain     warm         medium    none  yes
fine     warm         medium    few   yes
cloudy   warm         high      few   yes
cloudy   hot          medium    none  yes
rain     warm         high      few   no
S1 (class 1, 120 tuples):

gender  major        birth_country  age_range  gpa        count
M       Science      Canada         20-25      Very_good  16
F       Science      Foreign        25-30      Excellent  22
M       Engineering  Foreign        25-30      Excellent  18
F       Science      Foreign        25-30      Excellent  25
M       Science      Canada         20-25      Excellent  21
F       Engineering  Canada         20-25      Excellent  18

S2 (class 2, 130 tuples):

gender  major        birth_country  age_range  gpa        count
M       Science      Foreign        <20        Very_good  18
F       Business     Canada         <20        Fair       20
M       Business     Canada         <20        Fair       22
F       Science      Canada         20-25      Fair       24
M       Engineering  Foreign        20-25      Very_good  22
F       Engineering  Canada         <20        Excellent  24

$I(s_1, s_2) = I(120, 130) = -\frac{120}{250}\log_2\frac{120}{250} - \frac{130}{250}\log_2\frac{130}{250} = 0.9988$

For major = "Science":     s11 = 84, s21 = 42, I(s11, s21) = 0.9183
For major = "Engineering": s12 = 36, s22 = 46, I(s12, s22) = 0.9892
For major = "Business":    s13 = 0,  s23 = 42, I(s13, s23) = 0

$E(major) = \frac{126}{250} I(s_{11}, s_{21}) + \frac{82}{250} I(s_{12}, s_{22}) + \frac{42}{250} I(s_{13}, s_{23}) = 0.7873$

$Gain(major) = I(s_1, s_2) - E(major) = 0.2115$

Bayes' Theorem: Basics

• Bayes' theorem:

  $P(H|X) = \frac{P(X|H) P(H)}{P(X)}$

• Let X be a data sample ("evidence") whose class label is unknown
• Let H be the hypothesis that X belongs to class C
• Classification is to determine P(H|X), the posterior probability: the probability that the hypothesis holds given the observed data sample X
• P(H) (prior probability): the initial probability
  • E.g., X will buy a computer, regardless of age, income, …
• P(X): the probability that the sample data is observed
• P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
  • E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
Prediction Based on Bayes' Theorem

• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

  $P(H|X) = \frac{P(X|H) P(H)}{P(X)}$

• Informally, this can be viewed as: posterior = likelihood × prior / evidence
• Predict that X belongs to Ci if the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
• Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost
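A one-line numeric illustration of posterior = likelihood × prior / evidence, with made-up probabilities (not from the slides):

p_x_given_h, p_h, p_x = 0.3, 0.5, 0.2   # hypothetical likelihood, prior, evidence
p_h_given_x = p_x_given_h * p_h / p_x   # Bayes' theorem
print(p_h_given_x)                      # 0.75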
Classification Is to Derive the Maximum Posteriori

• Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-D attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm
• Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
• This can be derived from Bayes' theorem:

  $P(C_i|X) = \frac{P(X|C_i) P(C_i)}{P(X)}$

• Since P(X) is constant for all classes, only

  $P(X|C_i) P(C_i)$

  needs to be maximized
Naïve Bayes Classifier

• A simplifying assumption: attributes are conditionally independent given the class (i.e., there is no dependence relation between attributes):

  $P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \cdots \times P(x_n|C_i)$

• This greatly reduces the computation cost: only the class distributions need to be counted
Naïve Bayes Classifier: Training Dataset

Classes:
  C1: buys_computer = 'yes'
  C2: buys_computer = 'no'

Data to be classified:
  X = (age <=30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Naïve Bayes Classifier: An Example

• P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
• X = (age <=30, income = medium, student = yes, credit_rating = fair)
• Compute P(X|Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
• P(X|Ci):
  P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
• P(X|Ci) × P(Ci):
  P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007
• Therefore, X belongs to class buys_computer = "yes"
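A minimal sketch, assuming plain Python over the 14-row table above, that reproduces these numbers by counting (no smoothing, exactly as in the worked example):

from collections import Counter

rows = [  # (age, income, student, credit_rating, buys_computer)
    ('<=30','high','no','fair','no'), ('<=30','high','no','excellent','no'),
    ('31…40','high','no','fair','yes'), ('>40','medium','no','fair','yes'),
    ('>40','low','yes','fair','yes'), ('>40','low','yes','excellent','no'),
    ('31…40','low','yes','excellent','yes'), ('<=30','medium','no','fair','no'),
    ('<=30','low','yes','fair','yes'), ('>40','medium','yes','fair','yes'),
    ('<=30','medium','yes','excellent','yes'), ('31…40','medium','no','excellent','yes'),
    ('31…40','high','yes','fair','yes'), ('>40','medium','no','excellent','no'),
]
X = ('<=30', 'medium', 'yes', 'fair')

class_counts = Counter(r[-1] for r in rows)
scores = {}
for c, n_c in class_counts.items():
    prior = n_c / len(rows)                  # P(Ci)
    likelihood = 1.0
    for k, value in enumerate(X):            # P(X|Ci) = product of P(xk|Ci)
        n_match = sum(1 for r in rows if r[-1] == c and r[k] == value)
        likelihood *= n_match / n_c
    scores[c] = prior * likelihood           # P(X|Ci) * P(Ci)

print(scores)                                # {'no': ~0.007, 'yes': ~0.028}
print('class:', max(scores, key=scores.get)) # class: yes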
Using IF-THEN Rules for Classification

• Represent the knowledge in the form of IF-THEN rules
  R: IF age = "<=30" AND student = yes THEN buys_computer = yes
• Assessment of a rule: coverage and accuracy
  • ncovers = number of tuples covered by R
  • ncorrect = number of tuples correctly classified by R
  • coverage(R) = ncovers / |D|   /* D: training data set */
  • accuracy(R) = ncorrect / ncovers
• Another example: IF age = "<=30" AND student = no AND credit_rating = fair THEN buys_computer = no
• If more than one rule is triggered, we need conflict resolution:
  • Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., with the most attribute tests)
  • Class-based ordering: decreasing order of prevalence or misclassification cost per class
  • Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
Example 1:

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Rule R: IF age = "<=30" AND student = no THEN buys_computer = no

Writing R as A -> B, with |D| = total number of records:
  Coverage(R) = |A| / |D|       (records that satisfy the rule antecedent)
  Accuracy(R) = |A ∩ B| / |A|   (records that satisfy both antecedent and consequent)

ncovers = number of tuples covered by R = 3
ncorrect = number of tuples correctly classified by R = 3
coverage(R) = ncovers / |D| = 3/14
accuracy(R) = ncorrect / ncovers = 3/3 (i.e., 100%)
Example 2 (the same table, but with the ninth tuple changed to student = no):

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     no       fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Rule R: IF age = "<=30" AND student = no THEN buys_computer = no

ncovers = number of tuples covered by R = 4
ncorrect = number of tuples correctly classified by R = 3
coverage(R) = ncovers / |D| = 4/14
accuracy(R) = ncorrect / ncovers = 3/4 (i.e., 75%)
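A small Python sketch (the helper name rule_stats is my own) that computes both measures for Example 1's rule over the original table:

rows = [  # (age, income, student, credit_rating, buys_computer)
    ('<=30','high','no','fair','no'), ('<=30','high','no','excellent','no'),
    ('31…40','high','no','fair','yes'), ('>40','medium','no','fair','yes'),
    ('>40','low','yes','fair','yes'), ('>40','low','yes','excellent','no'),
    ('31…40','low','yes','excellent','yes'), ('<=30','medium','no','fair','no'),
    ('<=30','low','yes','fair','yes'), ('>40','medium','yes','fair','yes'),
    ('<=30','medium','yes','excellent','yes'), ('31…40','medium','no','excellent','yes'),
    ('31…40','high','yes','fair','yes'), ('>40','medium','no','excellent','no'),
]

def rule_stats(rows, antecedent, consequent):
    # coverage(R) = ncovers / |D|; accuracy(R) = ncorrect / ncovers
    covered = [r for r in rows if antecedent(r)]
    n_correct = sum(1 for r in covered if consequent(r))
    return len(covered) / len(rows), n_correct / len(covered)

cov, acc = rule_stats(rows,
                      lambda r: r[0] == '<=30' and r[2] == 'no',  # A: age <=30 AND student = no
                      lambda r: r[4] == 'no')                     # B: buys_computer = no
print(cov, acc)  # 3/14 ≈ 0.214, 3/3 = 1.0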
Rule Extraction from a Decision Tree

• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
• Rules are mutually exclusive and exhaustive

[Figure: the buys_computer decision tree — root split on age? (<=30 -> student?, 31..40 -> yes, >40 -> credit rating?); student? branches to no -> no and yes -> yes; credit rating? branches to excellent -> no and fair -> yes.]

• Example: rule extraction from our buys_computer decision tree
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = no
  IF age = old AND credit_rating = fair THEN buys_computer = yes
Rule Induction: Sequential Covering Method

• Sequential covering algorithm: extracts rules directly from the training data
• Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
• Rules are learned sequentially; each rule for a given class Ci should cover many tuples of Ci but none (or few) of the tuples of other classes
• Steps:
  • Rules are learned one at a time
  • Each time a rule is learned, the tuples covered by the rule are removed
  • Repeat the process on the remaining tuples until a termination condition is met, e.g., when no training examples remain or when the quality of a rule returned is below a user-specified threshold
• Compare with decision-tree induction, which learns a set of rules simultaneously
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Sequential Covering Algorithm

while (enough target tuples left)
    generate a rule
    remove positive target tuples satisfying this rule

[Figure: positive examples being covered in turn by Rule 1, Rule 2, and Rule 3.]
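A toy Python sketch of this loop, under strong simplifying assumptions of mine: rules are single attribute = value tests, learn_one_rule greedily picks the test with the best accuracy, and min_acc is an invented quality threshold (a real learner grows conjunctions of tests, as described under Learn-One-Rule below):

def learn_one_rule(positives, examples, target, min_acc=0.9):
    # Greedy: pick the single attribute=value test with the best accuracy
    # over the examples it covers.
    best, best_acc = None, 0.0
    for e in positives:
        for attr, value in e.items():
            if attr == 'class':
                continue
            covered = [x for x in examples if x[attr] == value]
            acc = sum(x['class'] == target for x in covered) / len(covered)
            if acc > best_acc:
                best, best_acc = (attr, value), acc
    return best if best_acc >= min_acc else None

def sequential_covering(examples, target):
    rules = []
    positives = [e for e in examples if e['class'] == target]
    while positives:                                  # enough target tuples left
        rule = learn_one_rule(positives, examples, target)
        if rule is None:                              # rule quality below threshold
            break
        rules.append(rule)
        attr, value = rule
        positives = [e for e in positives if e[attr] != value]  # remove covered tuples
    return rules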
Rule Generation

To generate a rule:

while (true)
    find the best predicate p
    if coverage > threshold then add p to the current rule
    else break

[Figure: rule specialization — starting from A3=1, then A3=1 && A1=2, then A3=1 && A1=2 && A8=5, each step separating the positive from the negative examples more cleanly.]
How to Learn-One-Rule?

• Start with the most general rule possible: condition = empty
• Add new attribute tests by adopting a greedy depth-first strategy
  • Pick the test that most improves the rule quality
• Rule-quality measures consider both coverage and accuracy
Model Evaluation

• Metrics for performance evaluation
  • How to evaluate the performance of a model?
• Methods for performance evaluation
  • How to obtain reliable estimates?
• Methods for model comparison
  • How to compare the relative performance among competing models?
Metrics for Performance Evaluation

• Focus on the predictive capability of a model, rather than on how fast it classifies or builds models, scalability, etc.
• Confusion matrix:

                         PREDICTED CLASS
                         Class=Yes   Class=No
  ACTUAL    Class=Yes    a (TP)      b (FN)
  CLASS     Class=No     c (FP)      d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation (cont.)

                         PREDICTED CLASS
                         Class=Yes   Class=No
  ACTUAL    Class=Yes    a (TP)      b (FN)
  CLASS     Class=No     c (FP)      d (TN)

• The most widely used metric:

  $Accuracy = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$
Limitation of Accuracy

• Consider a 2-class problem
  • Number of Class 0 examples = 9990
  • Number of Class 1 examples = 10
• If the model predicts everything to be Class 0, accuracy is 9990/10000 = 99.9%
  • Accuracy is misleading because the model does not detect any Class 1 example
Cost Matrix

                         PREDICTED CLASS
  C(i|j)                 Class=Yes     Class=No
  ACTUAL    Class=Yes    C(Yes|Yes)    C(No|Yes)
  CLASS     Class=No     C(Yes|No)     C(No|No)

C(i|j): the cost of misclassifying a class j example as class i
Cost-Sensitive Measures

$Precision\ (p) = \frac{a}{a + c}$

$Recall\ (r) = \frac{a}{a + b}$

$F\text{-measure}\ (F) = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$

• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
• F-measure is biased towards all except C(No|No)
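Computing all four metrics from the confusion-matrix counts takes only a few lines of Python; the counts below are hypothetical, chosen only to exercise the formulas:

a, b, c, d = 40, 10, 5, 45   # hypothetical TP, FN, FP, TN counts

accuracy  = (a + d) / (a + b + c + d)
precision = a / (a + c)
recall    = a / (a + b)
f_measure = 2 * recall * precision / (recall + precision)   # = 2a / (2a + b + c)

print(accuracy, precision, recall, f_measure)   # 0.85 0.888... 0.8 0.842...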
K-Nearest Neighbor Algorithm

• K-nearest neighbor is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., Euclidean distance)
• A case is classified by a majority vote of its neighbors, the case being assigned to the class most common among its K nearest neighbors as measured by a distance function:

  $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$

• K is an integer; if K = 1, the case is simply assigned to the class of its nearest neighbor
K-Nearest Neighbor Algorithm: Example

Name    Acid Durability  Strength  Class
Type 1  7                7         Bad
Type 2  7                4         Bad
Type 3  3                4         Good
Type 4  1                4         Good

Test data: acid durability = 3 and strength = 7; class = ?

Using the Euclidean distance $d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$:

Name    Acid Durability  Strength  Class  Distance
Type 1  7                7         Bad    sqrt((7-3)² + (7-7)²) = 4
Type 2  7                4         Bad    5
Type 3  3                4         Good   3
Type 4  1                4         Good   3.6


Name Acid Strength Class Distance Rank
Durability
Type 1 7 7 Bad 4 3
Type 2 7 4 Bad 5 4
Type 3 3 4 Good 3 1
Type 4 1 4 Good 3.6 2
K=1

Name Acid Strength Class Distance Rank


Durability
Type 1 7 7 Bad 4 3
Type 2 7 4 Bad 5 4
Type 3 3 4 Good 3 1
Type 4 1 4 Good 3.6 2

Test Data  Acid durability = 3 and strength = 7 class = Good


K=2

Name Acid Strength Class Distance Rank


Durability
Type 1 7 7 Bad 4 3
Type 2 7 4 Bad 5 4
Type 3 3 4 Good 3 1
Type 4 1 4 Good 3.6 2

Test Data  Acid durability = 3 and strength = 7 class = Good


K=3

Name Acid Strength Class Distance Rank


Durability
Type 1 7 7 Bad 4 3
Type 2 7 4 Bad 5 4
Type 3 3 4 Good 3 1
Type 4 1 4 Good 3.6 2

Test Data  Acid durability = 3 and strength = 7 class = 2 Good and 1 Bad majority = Good
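A self-contained Python sketch of the whole example (the function name knn_classify is my own):

import math
from collections import Counter

train = [  # ((acid durability, strength), class) from the table above
    ((7, 7), 'Bad'), ((7, 4), 'Bad'), ((3, 4), 'Good'), ((1, 4), 'Good'),
]

def knn_classify(query, train, k):
    # Sort stored cases by Euclidean distance to the query, then take a
    # majority vote among the k nearest neighbors.
    dist = lambda p, q: math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    neighbors = sorted(train, key=lambda case: dist(case[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

for k in (1, 2, 3):
    print(k, knn_classify((3, 7), train, k))   # Good for all three values of K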
Practical Issues of Classification
• Underfitting and Overfitting

• Missing Values
Underfitting and Overfitting (Example)

[Figure: a synthetic two-class dataset of 500 circular and 500 triangular data points.]

Underfitting and Overfitting

[Figure: training and test error as a function of model complexity, illustrating underfitting and overfitting.]

• Underfitting: when the model is too simple, both training and test errors are large
Overfitting due to Noise

[Figure: a decision boundary distorted by a noise point.]

Overfitting due to Insufficient Examples

• A lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels of that region
• An insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task
Computing Impurity Measure

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   ?       Single          90K             Yes    (missing Refund value)

Before splitting:
  Entropy(Parent) = -0.3 log(0.3) - 0.7 log(0.7) = 0.8813

Class counts by Refund value:
               Class=Yes  Class=No
  Refund=Yes   0          3
  Refund=No    2          4
  Refund=?     1          0

Split on Refund:
  Entropy(Refund=Yes) = 0
  Entropy(Refund=No) = -(2/6) log(2/6) - (4/6) log(4/6) = 0.9183
  Entropy(Children) = 0.3 (0) + 0.6 (0.9183) = 0.551
  Gain = 0.9 × (0.8813 - 0.551) = 0.2973

The factor 0.9 is the fraction of records (9 out of 10) whose Refund value is known; the gain is scaled down accordingly.
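A sketch of this computation in Python, following the convention above (child weights taken over all 10 records, and the gain scaled by the fraction of records with a known Refund value):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

n = 10
parent = entropy([3, 7])                     # 3 Yes, 7 No over all records: 0.8813
children = [[0, 3], [2, 4]]                  # [Yes, No] counts for Refund=Yes, Refund=No
known = sum(map(sum, children))              # 9 records have a known Refund value

e_children = sum((sum(c) / n) * entropy(c) for c in children)   # 0.3*0 + 0.6*0.9183 = 0.551
gain = (known / n) * (parent - e_children)   # 0.9 * (0.8813 - 0.551)
print(round(gain, 4))                        # 0.2973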
Distribute Instances

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No

The nine records with a known Refund value are partitioned as usual:

  Refund=Yes:  Class=Yes 0, Class=No 3
  Refund=No:   Class=Yes 2, Class=No 4

The record with the missing value (Tid 10: Refund = ?, Single, 90K, Class=Yes) is then distributed to both children in proportion to these counts:

  Refund=Yes:  Class=Yes 0 + 3/9, Class=No 3
  Refund=No:   Class=Yes 2 + 6/9, Class=No 4

The probability that Refund=Yes is 3/9 and that Refund=No is 6/9, so the record is assigned to the left (Yes) child with weight 3/9 and to the right (No) child with weight 6/9.