Unit 3 Classification

Classification: Basic Concepts

Classification is a supervised learning task: the class labels of the training records are given, in contrast to unsupervised learning, where they are not.

General Approach to Solving a Classification Problem
• A classification technique (or classifier) is a
systematic approach to building classification models
from an input data set.
Classification Techniques
 Decision Tree based Methods
 Rule-based Methods
 Memory based reasoning
 Neural Networks
 Naïve Bayes and Bayesian Belief Networks
 Support Vector Machines
• Each technique employs a learning algorithm
to identify a model that best fits the
relationship between the attribute set and
class label of the input data.
• A key objective of the learning algorithm is to
build models with good generalization
capability, i.e., models that accurately predict
the class labels of previously unknown records.
• First, a training set
consisting of records whose
class labels are known must
be provided.
• The training set is used to
build a classification model,
which is subsequently
applied to the test set,
which consists of records
with unknown class labels.
• Evaluation of the performance of a classification
model is based on the counts of test records
correctly and incorrectly predicted by the
model. These counts are tabulated in a table
known as a confusion matrix.
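As a hedged illustration of this general workflow (not part of the original slides), the sketch below uses scikit-learn; the built-in dataset, the choice of a decision tree classifier, and the 70/30 split are arbitrary assumptions for demonstration only.

```python
# Minimal sketch of the train/test workflow described above, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)            # records (attribute sets) and known class labels

# Training set: records whose class labels are known; test set: held-out records.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier()             # the classifier (learning algorithm)
model.fit(X_train, y_train)                  # build the classification model from the training set

y_pred = model.predict(X_test)               # apply the model to the test set
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```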
Issues regarding classification and prediction

• Data cleaning
– Preprocess data in order to reduce noise and handle
missing values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize and/or normalize data

Evaluation of Classifiers
• Predictive accuracy
• Speed and scalability
– time to construct the model
– time to use the model
• Robustness
– handling noise and missing values
• Scalability
– efficiency in disk-resident databases
• Interpretability:
– understanding and insight provided by the model
• Goodness of rules
– decision tree size
– compactness of classification rules
Classification by Decision Tree Induction
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree

Training Dataset

This follows an example from Quinlan's ID3.

  age     income  student  credit_rating  buys_computer
  <=30    high    no       fair           no
  <=30    high    no       excellent      no
  31…40   high    no       fair           yes
  >40     medium  no       fair           yes
  >40     low     yes      fair           yes
  >40     low     yes      excellent      no
  31…40   low     yes      excellent      yes
  <=30    medium  no       fair           no
  <=30    low     yes      fair           yes
  >40     medium  yes      fair           yes
  <=30    medium  yes      excellent      yes
  31…40   medium  no       excellent      yes
  31…40   high    yes      fair           yes
  >40     medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

age?
├── <=30   → student?
│            ├── no  → no
│            └── yes → yes
├── 31…40  → yes
└── >40    → credit_rating?
             ├── excellent → no
             └── fair      → yes
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
– Tree is constructed in a top-down recursive divide-and-conquer manner
– At start, all the training examples are at the root
– Attributes are categorical (if continuous-valued, they are discretized in
advance)
– Examples are partitioned recursively based on selected attributes
– Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
• Conditions for stopping partitioning
– All samples for a given node belong to the same class
– There are no remaining attributes for further partitioning – majority
voting is employed for classifying the leaf
– There are no samples left

Attribute Selection Measure

• Information gain (ID3/C4.5)
  – All attributes are assumed to be categorical
  – Can be modified for continuous-valued attributes
• Gini index (IBM IntelligentMiner)
  – All attributes are assumed continuous-valued
  – Assume there exist several possible split values for each attribute
  – May need other tools, such as clustering, to get the possible split values
  – Can be modified for categorical attributes
Information Gain (ID3/C4.5)

• Select the attribute with the highest information gain
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

$$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$
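A small sketch (my own, not from the slides) of this two-class information measure in Python; the function name info is an arbitrary choice. It reproduces the values I(9, 5) ≈ 0.940 and I(2, 3) ≈ 0.971 used in the worked example that follows.

```python
import math

def info(p, n):
    """Information needed to classify an example as P or N,
    given p examples of class P and n examples of class N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:                      # 0 * log2(0) is taken to be 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(round(info(9, 5), 3))   # 0.94
print(round(info(2, 3), 3))   # 0.971
print(round(info(4, 0), 3))   # 0.0
```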
Information Gain in Decision Tree Induction

• Assume that using attribute A a set S will be partitioned into sets {S1, S2, …, Sv}
  – If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$

• The encoding information that would be gained by branching on A is

$$Gain(A) = I(p, n) - E(A)$$
Attribute Selection by Information Gain Computation

• Class P: buys_computer = “yes”
• Class N: buys_computer = “no”
• I(p, n) = I(9, 5) = 0.940
• Compute the entropy for age:

  age     pi  ni  I(pi, ni)
  <=30    2   3   0.971
  31…40   4   0   0
  >40     3   2   0.971

$$E(age) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$$

  Hence

$$Gain(age) = I(p, n) - E(age) = 0.940 - 0.694 = 0.246$$

• Similarly:
  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
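To make the computation above concrete, here is a short sketch of my own that reproduces Gain(age) from the per-branch class counts listed in the slide's table; the helper name info mirrors the earlier sketch and is not part of the original material.

```python
import math

def info(p, n):
    """Two-class information measure I(p, n)."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c > 0)

# Per-branch (pi, ni) counts for the attribute "age", taken from the slide's table.
age_branches = {"<=30": (2, 3), "31…40": (4, 0), ">40": (3, 2)}

p, n = 9, 5                                         # overall class counts (buys_computer = yes/no)
E_age = sum((pi + ni) / (p + n) * info(pi, ni)      # expected information after splitting on age
            for pi, ni in age_branches.values())
gain_age = info(p, n) - E_age

print(f"E(age)    = {E_age:.3f}")      # ~0.694
print(f"Gain(age) = {gain_age:.3f}")   # ~0.246
```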
Gini Index (IBM IntelligentMiner)

• If a data set T contains examples from n classes, the gini index gini(T) is defined as

$$gini(T) = 1 - \sum_{j=1}^{n} p_j^2$$

  where pj is the relative frequency of class j in T.

• If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data is defined as

$$gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2)$$

• The attribute that provides the smallest ginisplit(T) is chosen to split the node (need to enumerate all possible splitting points for each attribute).
Extracting Classification Rules from Trees

• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
  IF age = “<=30” AND student = “no” THEN buys_computer = “no”
  IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
  IF age = “31…40” THEN buys_computer = “yes”
  IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
  IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
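A toy sketch of my own, not from the slides: it walks a hand-coded decision tree, represented here as nested dicts (an arbitrary representation chosen for illustration), and prints one IF-THEN rule per root-to-leaf path. The tree mirrors the buys_computer tree shown earlier.

```python
# Hand-coded tree: internal nodes map an attribute to its branches; leaves hold a class label.
tree = {
    "age": {
        "<=30":  {"student": {"no": "no", "yes": "yes"}},
        "31…40": "yes",
        ">40":   {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def extract_rules(node, conditions=()):
    if not isinstance(node, dict):                        # leaf: holds the class prediction
        antecedent = " AND ".join(conditions) or "TRUE"
        print(f'IF {antecedent} THEN buys_computer = "{node}"')
        return
    (attribute, branches), = node.items()                 # internal node: a test on one attribute
    for value, child in branches.items():                 # each branch adds one conjunct
        extract_rules(child, conditions + (f'{attribute} = "{value}"',))

extract_rules(tree)
```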
Avoid Overfitting in Classification
• The generated tree may overfit the training data
– Too many branches, some may reflect anomalies due to
noise or outliers
– The result is poor accuracy for unseen samples
• Two approaches to avoid overfitting
– Prepruning: Halt tree construction early—do not split a
node if this would result in the goodness measure falling
below a threshold
• Difficult to choose an appropriate threshold
– Postpruning: Remove branches from a “fully grown” tree
—get a sequence of progressively pruned trees
• Use a set of data different from the training data to
decide which is the “best pruned tree”
Approaches to Determine the Final Tree Size

• Separate training (2/3) and testing (1/3) sets


• Use cross validation, e.g., 10-fold cross validation
• Use all the data for training
– but apply a statistical test (e.g., chi-square) to estimate
whether expanding or pruning a node may improve the
entire distribution
• Use minimum description length (MDL) principle:
– halting growth of the tree when the encoding is
minimized
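As a hedged illustration of the cross-validation approach to choosing a tree size (my own sketch using scikit-learn; the dataset and the candidate depths are arbitrary assumptions), one can grow trees of different depths and keep the depth with the best 10-fold cross-validated accuracy.

```python
# Sketch: use 10-fold cross validation to compare tree sizes (controlled here via max_depth).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (1, 2, 3, 5, 8, None):                      # None = fully grown tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10)          # 10-fold cross validation
    print(f"max_depth={depth}: mean accuracy = {scores.mean():.3f}")
```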
Tree Induction

 Greedy strategy.
– Split the records based on an attribute test that
optimizes certain criterion.

 Issues
– Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?

– Determine when to stop splitting



How to Specify Test Condition?

 Depends on attribute types


– Nominal
– Ordinal
– Continuous

 Depends on number of ways to split


– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes

 Multi-way split: Use as many partitions as distinct values.

    CarType → Family | Sports | Luxury

 Binary split: Divides values into two subsets. Need to find optimal partitioning.

    CarType → {Sports, Luxury} | {Family}    OR    CarType → {Family, Luxury} | {Sports}
Splitting Based on Ordinal Attributes

 Multi-way split: Use as many partitions as distinct values.

    Size → Small | Medium | Large

 Binary split: Divides values into two subsets. Need to find optimal partitioning.

    Size → {Small, Medium} | {Large}    OR    Size → {Medium, Large} | {Small}

 What about this split?    Size → {Small, Large} | {Medium}
Splitting Based on Continuous Attributes

 Different ways of handling


– Discretization to form an ordinal categorical attribute
   Static – discretize once at the beginning
   Dynamic – ranges can be found by equal interval bucketing, equal frequency
    bucketing (percentiles), or clustering (see the binning sketch below)

– Binary Decision: (A < v) or (A ≥ v)
   consider all possible splits and find the best cut
   can be more compute intensive
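A brief sketch of my own (using NumPy) of the two static binning strategies mentioned above; the income values and the number of bins are arbitrary illustrative choices.

```python
import numpy as np

income = np.array([60, 70, 75, 85, 90, 95, 100, 120, 125, 220])  # a continuous attribute (in $K)

# Equal-interval (equal-width) bucketing: bins of equal width over the value range.
width_edges = np.linspace(income.min(), income.max(), num=5)      # 5 edges -> 4 bins
width_bins = np.digitize(income, width_edges[1:-1])               # bin id for each record

# Equal-frequency bucketing (percentiles): each bin holds roughly the same number of records.
freq_edges = np.percentile(income, [25, 50, 75])
freq_bins = np.digitize(income, freq_edges)

print("equal-width bin ids:    ", width_bins)
print("equal-frequency bin ids:", freq_bins)
```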
Splitting Based on Continuous Attributes

[Figure: (i) a binary split, “Taxable Income > 80K?” with Yes/No branches;
 (ii) a multi-way split on “Taxable Income?” with ranges < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K]

Tree Induction

 Greedy strategy.
– Split the records based on an attribute test that
optimizes certain criterion.

 Issues
– Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?

– Determine when to stop splitting


How to determine the Best Split

Before Splitting: 10 records of class 0, 10 records of class 1

Candidate test conditions:

  Own Car?      Yes: C0: 6, C1: 4       No: C0: 4, C1: 6
  Car Type?     Family: C0: 1, C1: 3    Sports: C0: 8, C1: 0    Luxury: C0: 1, C1: 7
  Student ID?   c1 … c20: one record per ID (each child node is pure, e.g. C0: 1, C1: 0)

Which test condition is the best?
How to determine the Best Split

 Greedy approach:
  – Nodes with homogeneous class distribution are preferred
 Need a measure of node impurity:

  C0: 5, C1: 5   → Non-homogeneous, high degree of impurity
  C0: 9, C1: 1   → Homogeneous, low degree of impurity
Measures of Node Impurity

 Gini Index

 Entropy

 Misclassification error
How to Find the Best Split

[Figure: before splitting, the node has class counts (C0: N00, C1: N01) with impurity M0.
 Splitting on attribute A produces nodes N1 and N2 with impurities M1 and M2, whose weighted
 combination is M12; splitting on B produces nodes N3 and N4 with weighted impurity M34.]

Gain = M0 – M12  vs.  M0 – M34: choose the split with the larger gain (the lower weighted impurity).
Measure of Impurity: GINI

 Gini Index for a given node t:

$$GINI(t) = 1 - \sum_{j} [\, p(j \mid t)\,]^2$$

  (NOTE: p(j | t) is the relative frequency of class j at node t.)
  – Maximum (1 – 1/nc) when records are equally distributed among all classes, implying least interesting information
  – Minimum (0.0) when all records belong to one class, implying most interesting information

  C1: 0, C2: 6 → Gini = 0.000      C1: 1, C2: 5 → Gini = 0.278
  C1: 2, C2: 4 → Gini = 0.444      C1: 3, C2: 3 → Gini = 0.500
Examples for Computing GINI

$$GINI(t) = 1 - \sum_{j} [\, p(j \mid t)\,]^2$$

  C1: 0, C2: 6   P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                 Gini = 1 – P(C1)^2 – P(C2)^2 = 1 – 0 – 1 = 0

  C1: 1, C2: 5   P(C1) = 1/6,  P(C2) = 5/6
                 Gini = 1 – (1/6)^2 – (5/6)^2 = 0.278

  C1: 2, C2: 4   P(C1) = 2/6,  P(C2) = 4/6
                 Gini = 1 – (2/6)^2 – (4/6)^2 = 0.444
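A tiny sketch of my own of the node-impurity computation; the function name gini is arbitrary. It reproduces the example values above.

```python
def gini(class_counts):
    """Gini index of a node, given the count of records in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

for counts in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(counts, "->", round(gini(counts), 3))   # 0.0, 0.278, 0.444, 0.5
```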
Splitting Based on GINI

 Used in CART, SLIQ, SPRINT.
 When a node p is split into k partitions (children), the quality of the split is computed as

$$GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\, GINI(i)$$

  where ni = number of records at child i, and n = number of records at node p.
Binary Attributes: Computing GINI Index

 Splits into two partitions
 Effect of weighing partitions:
  – Larger and purer partitions are sought for.

  Parent: C1: 6, C2: 6, Gini = 0.500

  Split on B?  →  Node N1 (Yes): C1: 5, C2: 2      Node N2 (No): C1: 1, C2: 4

  Gini(N1) = 1 – (5/7)^2 – (2/7)^2 = 0.408
  Gini(N2) = 1 – (1/5)^2 – (4/5)^2 = 0.320
  Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371
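A short sketch of my own, reusing a gini helper like the one above, that reproduces the weighted Gini of this binary split from the child class counts.

```python
def gini(class_counts):
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_split(children):
    """Weighted Gini of a split, given per-child (C1, C2) count tuples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

parent = (6, 6)                        # C1, C2 counts at the parent node
children = [(5, 2), (1, 4)]            # counts at N1 (B? = Yes) and N2 (B? = No)

print(round(gini(parent), 3))          # 0.5
print(round(gini_split(children), 3))  # ~0.371
```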
Categorical Attributes: Computing Gini Index

 For each distinct value, gather counts for each class in the dataset
 Use the count matrix to make decisions

  Multi-way split:
    CarType   Family  Sports  Luxury
    C1        1       2       1
    C2        4       1       1
    Gini = 0.393

  Two-way split (find best partition of values):
    CarType   {Sports, Luxury}  {Family}
    C1        3                 1
    C2        2                 4
    Gini = 0.400

    CarType   {Sports}  {Family, Luxury}
    C1        2         2
    C2        1         5
    Gini = 0.419
Continuous Attributes: Computing Gini Index

 Use Binary Decisions based on one value


 Several Choices for the splitting value
– Number of possible splitting values
= Number of distinct values
 Each splitting value has a count matrix
associated with it
– Class counts in each of the partitions,
A < v and A ≥ v
 Simple method to choose best v
– For each v, scan the database to
gather count matrix and compute its
Gini index
– Computationally inefficient! Repetition of work.

  [Figure: candidate binary split “Taxable Income > 80K?” with Yes/No branches and its count matrix]
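A sketch of my own of the simple method described above: for each candidate value v, split the records into A < v and A ≥ v, build the count matrix, and keep the v with the lowest weighted Gini. The toy taxable-income values and class labels are taken from the 10-record example table used later in the naïve Bayes section.

```python
from collections import Counter

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_split(values, labels):
    """Scan every distinct value v and return the v that minimizes the
    weighted Gini of the partition A < v vs. A >= v."""
    best_v, best_g = None, float("inf")
    for v in sorted(set(values)):
        left = Counter(l for a, l in zip(values, labels) if a < v)
        right = Counter(l for a, l in zip(values, labels) if a >= v)
        n = len(values)
        g = sum(left.values()) / n * gini(left) + sum(right.values()) / n * gini(right)
        if g < best_g:
            best_v, best_g = v, g
    return best_v, best_g

income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]   # Taxable Income (in $K)
evade  = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_split(income, evade))
```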
Bayes Classifier

• A probabilistic framework for solving classification problems
• Conditional probability:

$$P(C \mid A) = \frac{P(A, C)}{P(A)}, \qquad P(A \mid C) = \frac{P(A, C)}{P(C)}$$

• Bayes theorem:

$$P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}$$
Example of Bayes Theorem
• Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20

• If a patient has stiff neck, what’s the probability


he/she has meningitis?

$$P(M \mid S) = \frac{P(S \mid M)\, P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
Bayesian Classifiers
• Consider each attribute and class label as random
variables

• Given a record with attributes (A1, A2,…,An)


– Goal is to predict class C
– Specifically, we want to find the value of C that maximizes
P(C| A1, A2,…,An )

• Can we estimate P(C| A1, A2,…,An ) directly from


data?
Bayesian Classifiers
• Approach:
– compute the posterior probability P(C | A1, A2, …, An) for all
values of C using the Bayes theorem
$$P(C \mid A_1 A_2 \cdots A_n) = \frac{P(A_1 A_2 \cdots A_n \mid C)\, P(C)}{P(A_1 A_2 \cdots A_n)}$$

– Choose value of C that maximizes


P(C | A1, A2, …, An)

– Equivalent to choosing value of C that maximizes


P(A1, A2, …, An|C) P(C)

• How to estimate P(A1, A2, …, An | C )?


Naïve Bayes Classifier
• Assume independence among attributes Ai when class is given:
  – P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

  – Can estimate P(Ai | Cj) for all Ai and Cj.

  – New point is classified to Cj if P(Cj) ∏ P(Ai | Cj) is maximal.

How to Estimate Probabilities from Data?

• Class prior: P(C) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

• For discrete attributes:
  P(Ai | Ck) = |Aik| / Nck
  – where |Aik| is the number of instances having attribute value Ai and belonging to class Ck,
    and Nck is the number of instances of class Ck
  – Examples:
    P(Status=Married | No) = 4/7
    P(Refund=Yes | Yes) = 0

  Tid  Refund  Marital Status  Taxable Income  Evade
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes
How to Estimate Probabilities from Data?

• For continuous attributes:


– Discretize the range into bins
• one ordinal attribute per bin
• violates independence assumption
– Two-way split: (A < v) or (A > v)
• choose only one of the two splits as new attribute
– Probability density estimation:
• Assume attribute follows a normal distribution
• Use data to estimate parameters of distribution
(e.g., mean and standard deviation)
• Once probability distribution is known, can use it to estimate
the conditional probability P(Ai|c)
Example of Naïve Bayes Classifier

Given a test record:  X = (Refund = No, Marital Status = Married, Income = 120K)

Naive Bayes classifier estimates:
  P(Refund=Yes | No) = 3/7            P(Refund=No | No) = 4/7
  P(Refund=Yes | Yes) = 0             P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7
  P(Marital Status=Divorced | No) = 1/7
  P(Marital Status=Married | No) = 4/7
  P(Marital Status=Single | Yes) = 2/7
  P(Marital Status=Divorced | Yes) = 1/7
  P(Marital Status=Married | Yes) = 0
  For Taxable Income:
    If class = No:  sample mean = 110, sample variance = 2975
    If class = Yes: sample mean = 90,  sample variance = 25

  P(X | Class=No)  = P(Refund=No | Class=No) × P(Married | Class=No) × P(Income=120K | Class=No)
                   = 4/7 × 4/7 × 0.0072 = 0.0024
  P(X | Class=Yes) = P(Refund=No | Class=Yes) × P(Married | Class=Yes) × P(Income=120K | Class=Yes)
                   = 1 × 0 × 1.2 × 10^-9 = 0

Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X)  =>  Class = No
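A sketch of my own that reproduces this calculation, treating Taxable Income as normally distributed with the class-conditional sample mean and variance given above; the helper and variable names are arbitrary.

```python
import math

def normal_pdf(x, mean, variance):
    """Class-conditional density for a continuous attribute under a normal assumption."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Conditional probabilities estimated from the 10-record table above.
p_refund_no   = {"No": 4 / 7, "Yes": 1.0}   # P(Refund=No | class)
p_married     = {"No": 4 / 7, "Yes": 0.0}   # P(Marital Status=Married | class)
income_params = {"No": (110, 2975), "Yes": (90, 25)}   # (sample mean, sample variance)
prior         = {"No": 7 / 10, "Yes": 3 / 10}

scores = {}
for c in ("No", "Yes"):
    likelihood = (p_refund_no[c]
                  * p_married[c]
                  * normal_pdf(120, *income_params[c]))   # P(X | Class=c)
    scores[c] = likelihood * prior[c]                     # proportional to P(Class=c | X)

print(scores)                                             # {'No': ~0.0016, 'Yes': 0.0}
print("Predicted class:", max(scores, key=scores.get))    # No
```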
Naïve Bayes Classifier
• If one of the conditional probabilities is zero, then the entire expression becomes zero
• Probability estimation:

  Original:    P(Ai | C) = Nic / Nc
  Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate:  P(Ai | C) = (Nic + m p) / (Nc + m)

  where c: number of classes, p: prior probability, m: parameter
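A minimal sketch of my own of the Laplace and m-estimate corrections for a single conditional probability; the counts are illustrative and the parameter values are arbitrary.

```python
def laplace_estimate(n_ic, n_c, c):
    """Laplace-corrected P(Ai | C): add 1 to the count and c to the total
    (the slide defines c as the number of classes)."""
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, p, m):
    """m-estimate of P(Ai | C) with prior probability p and weight m."""
    return (n_ic + m * p) / (n_c + m)

# Example: P(Refund=Yes | Yes) has a raw count of 0 out of 3, so the original estimate is 0.
print(laplace_estimate(0, 3, c=2))     # 0.2 instead of 0
print(m_estimate(0, 3, p=0.5, m=4))    # ~0.286 instead of 0
```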
Example of Naïve Bayes Classifier

A: attributes, M: mammals, N: non-mammals

  Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
  human          yes         no       no             yes        mammals
  python         no          no       no             no         non-mammals
  salmon         no          no       yes            no         non-mammals
  whale          yes         no       yes            no         mammals
  frog           no          no       sometimes      yes        non-mammals
  komodo         no          no       no             yes        non-mammals
  bat            yes         yes      no             yes        mammals
  pigeon         no          yes      no             yes        non-mammals
  cat            yes         no       no             yes        mammals
  leopard shark  yes         no       yes            no         non-mammals
  turtle         no          no       sometimes      yes        non-mammals
  penguin        no          no       sometimes      yes        non-mammals
  porcupine      yes         no       no             yes        mammals
  eel            no          no       yes            no         non-mammals
  salamander     no          no       sometimes      yes        non-mammals
  gila monster   no          no       no             yes        non-mammals
  platypus       no          no       no             yes        mammals
  owl            no          yes      no             yes        non-mammals
  dolphin        yes         no       yes            no         mammals
  eagle          no          yes      no             yes        non-mammals

Test record:  Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?

  P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
  P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

  P(A | M) P(M) = 0.06 × 7/20 = 0.021
  P(A | N) P(N) = 0.0042 × 13/20 ≈ 0.0027

  P(A | M) P(M) > P(A | N) P(N)  =>  Mammals
Naïve Bayes (Summary)
• Robust to isolated noise points

• Handle missing values by ignoring the instance during


probability estimate calculations

• Robust to irrelevant attributes

• Independence assumption may not hold for some


attributes
– Use other techniques such as Bayesian Belief Networks
(BBN)
Bayesian Theorem

• Given training data D, the posterior probability of a hypothesis h, P(h | D), follows Bayes theorem:

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$

• MAP (maximum a posteriori) hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)$$

• Practical difficulty: requires initial knowledge of many probabilities, significant computational cost
The independence hypothesis…
• … makes computation possible
• … yields optimal classifiers when satisfied
• … but is seldom satisfied in practice, as attributes (variables)
are often correlated.
• Attempts to overcome this limitation:
– Bayesian networks, that combine Bayesian reasoning with
causal relationships between attributes
– Decision trees, that reason on one attribute at the time,
considering most important attributes first
Bayesian Belief Networks (I)

[Network structure: FamilyHistory and Smoker are parents of LungCancer and Emphysema;
 LungCancer is a parent of PositiveXRay and Dyspnea.]

The conditional probability table for the variable LungCancer:

         (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
  LC     0.8      0.5       0.7       0.1
  ~LC    0.2      0.5       0.3       0.9
Bayesian Belief Networks (II)
• Bayesian belief network allows a subset of the variables
conditionally independent
• A graphical model of causal relationships
• Several cases of learning Bayesian belief networks
– Given both network structure and all the variables: easy
– Given network structure but only some variables
– When the network structure is not known in advance

Nearest Neighbor Classifiers
• Basic idea:
– If it walks like a duck, quacks like a duck, then it’s
probably a duck
[Figure: given a test record, compute its distance to all training records and choose the
 k “nearest” records.]
Nearest-Neighbor Classifiers

[Figure: an unknown record plotted among labeled training records]

 Requires three things
– The set of stored records
– Distance Metric to compute
distance between records
– The value of k, the number of
nearest neighbors to retrieve

 To classify an unknown record:


– Compute distance to other
training records
– Identify k nearest neighbors
– Use class labels of nearest
neighbors to determine the
class label of unknown record
(e.g., by taking majority vote)
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a test point X]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

[Figure: the decision boundary of a 1-nearest-neighbor classifier forms a Voronoi diagram.]
Nearest Neighbor Classification
• Compute distance between two points:
  – Euclidean distance

$$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$$

• Determine the class from the nearest neighbor list
  – take the majority vote of class labels among the k-nearest neighbors
  – Weigh the vote according to distance
    • weight factor, w = 1/d^2
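A compact sketch of my own of a k-nearest-neighbor classifier with Euclidean distance and an optional 1/d^2 distance weighting; the toy training records are invented for illustration.

```python
import math
from collections import defaultdict

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, test_point, k=3, weighted=False):
    """train: list of (attribute_vector, class_label) pairs."""
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], test_point))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, test_point)
        votes[label] += 1.0 / (d ** 2 + 1e-9) if weighted else 1.0   # w = 1/d^2 (small epsilon added)
    return max(votes, key=votes.get)                                 # majority / weighted vote

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B"), ((4.1, 3.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))                # A
print(knn_predict(train, (3.0, 3.0), k=3, weighted=True))
```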
Nearest Neighbor Classification…
• Choosing the value of k:
– If k is too small, sensitive to noise points
– If k is too large, neighborhood may include points from
other classes

[Figure: a test point x whose k-neighborhood grows to include points from other classes as k increases]
Nearest Neighbor Classification…
• Scaling issues
– Attributes may have to be scaled to prevent
distance measures from being dominated by one
of the attributes
– Example:
• height of a person may vary from 1.5m to 1.8m
• weight of a person may vary from 90lb to 300lb
• income of a person may vary from $10K to $1M
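A quick sketch of my own of min-max scaling, one way to address this; the example values mirror the ranges listed above and are illustrative only.

```python
def min_max_scale(values):
    """Rescale a list of attribute values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [1.5, 1.6, 1.7, 1.8]                   # metres
incomes = [10_000, 50_000, 400_000, 1_000_000]   # dollars

# After scaling, neither attribute dominates a Euclidean distance computation.
print(min_max_scale(heights))
print(min_max_scale(incomes))
```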
Nearest Neighbor Classification…
• Problem with Euclidean measure:
– High dimensional data
• curse of dimensionality
– Can produce counter-intuitive results
  111111111110 vs 011111111111 : d = 1.4142
  100000000000 vs 000000000001 : d = 1.4142

 Solution: Normalize the vectors to unit length


Nearest Neighbor Classification…
• k-NN classifiers are lazy learners
  – They do not build models explicitly
  – Unlike eager learners such as decision tree induction and rule-based systems
  – Classifying unknown records is relatively expensive
