Unit 5 Ch 8 - Data Mining Classification
❖Machine Learning is the field of study that gives computers the capability to learn without
being explicitly programmed
❖The primary aim is to allow computers to learn automatically, without human intervention or
assistance, and to adjust their actions accordingly
❖It is a branch of Artificial Intelligence
❖Machine Learning Techniques
❖Supervised learning
❖Unsupervised learning
❖Semi-supervised learning
❖Reinforcement learning
Definition - Classification
Classification is the problem of identifying to which of a set of categories (subpopulations)
a new observation belongs, on the basis of a training set of data containing
observations whose category membership is known.
Classification—A Two-Step Process
Learning phase or Model construction: describing a set of predetermined classes
◦ Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute
◦ The set of tuples used for model construction is the training set
◦ The model is represented as classification rules, decision trees, or mathematical formulae
Classification phase or Model usage: for classifying future or unknown objects
◦ Estimate accuracy of the model
◦ The known label of test sample is compared with the classified result from the model
◦ If the accuracy is acceptable, use the model to classify new data
Note: If the test set is used to select models, it is called a validation set (a sketch of both steps follows below)
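A minimal sketch of the two-step process with scikit-learn (the synthetic dataset, estimator choice, and split sizes below are illustrative assumptions, not from the slides):

```python
# Step 1: model construction on a training set;
# Step 2: accuracy estimation on a held-out test set, then use on new data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic class-labeled tuples (stand-in for a real training set)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: learn the model from the training set
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: compare known test labels with the model's predictions
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```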
Step (1): Model Construction
[Figure: training data is fed to a classification algorithm, which outputs the classifier (model), e.g. the rule:
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’]
Step (2): Using the Model in Prediction
[Figure: the trained classifier is applied first to test data and then to unseen data; e.g. the unseen tuple (Jeff, Professor, 4) is classified as Tenured = yes.]
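The learned rule above can be read as a simple predicate. A hypothetical Python rendering (the function name is illustrative), applied to the unseen tuple from the figure:

```python
# Hypothetical rendering of the learned rule:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict_tenured(rank: str, years: int) -> str:
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

print(predict_tenured("Professor", 4))  # -> yes (the rank condition matches)
```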
Decision Tree: Example
[Decision tree figure: internal nodes test Humidity and Wind; leaf nodes hold the class labels No / Yes.]
Decision Tree: Another Example
❑ Training data set: Buys_computer
❑ Resulting tree:
[Decision tree figure: root node tests age; leaf nodes hold the class labels no / yes.]
Decision Tree Induction
❑Decision tree induction is the learning of a decision tree from class-labeled training
tuples
❑In a decision tree, each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and each leaf node holds a class label
❑For an unknown tuple X, a path is traced from the root to a leaf node to predict its
class
❑A decision tree can be converted to classification rules (see the sketch below)
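A brief sketch of both ideas in scikit-learn: export_text prints every root-to-leaf path of a fitted tree, and each path reads directly as an IF-THEN classification rule (the built-in iris data and depth limit are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each printed root-to-leaf path corresponds to one classification rule
print(export_text(tree, feature_names=list(iris.feature_names)))

# Predicting an unknown tuple traces one root-to-leaf path
print(tree.predict([iris.data[0]]))
```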
▪Select the attribute with the highest information gain, i.e., the one reflecting the least impurity
▪Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D|
▪Let m = the number of distinct values of the class label attribute
▪Expected information (entropy of D) needed to classify a tuple in D (computed in the sketch below):
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
The ID3 algorithm is run recursively on the non-leaf branches, until all data is
classified.
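A small sketch of the entropy and information-gain computations in plain Python; the class counts below (9 yes / 5 no overall, and the age partition) are taken from the standard Buys_computer example, so treat them as assumed inputs:

```python
from math import log2

def info(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over the m classes."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def info_gain(total_counts, partitions):
    """Gain(A) = Info(D) - sum(|Dj|/|D| * Info(Dj)) over A's partitions."""
    n = sum(total_counts)
    return info(total_counts) - sum(sum(p) / n * info(p) for p in partitions)

# Buys_computer: 9 yes / 5 no; age partitions D into youth, middle_aged, senior
print(round(info([9, 5]), 3))                                 # 0.940
print(round(info_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.246
```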
Final Decision Tree
[Figure: the final decision tree, omitted here.]
Gini Index
❖The Gini index measures the impurity of a data partition D:
Gini(D) = 1 - \sum_{i=1}^{m} p_i^2
❖where pi is the probability that a tuple in D belongs to class Ci and is estimated by |Ci,D|/|D|. The sum
is computed over m classes.
❖The Gini index considers a binary split for each attribute.
❖For example, if income has three possible values, namely {low, medium, high}, then the possible subsets
are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {medium}, {high}, and {}.
❖We exclude the full set, {low, medium, high}, and the empty set from consideration since, conceptually,
they do not represent a split.
❖Therefore, if A has v distinct values, there are 2^v - 2 possible ways to form two partitions of the data, D,
based on a binary split on A (enumerated in the sketch below).
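A quick sketch enumerating those 2^v - 2 candidate binary splits (plain Python over the income values from the example):

```python
from itertools import combinations

def binary_splits(values):
    """Yield (subset, complement) pairs for the 2^v - 2 nonempty proper subsets."""
    values = list(values)
    for r in range(1, len(values)):
        for subset in combinations(values, r):
            yield set(subset), set(values) - set(subset)

# income: 2^3 - 2 = 6 candidate binary splits
for left, right in binary_splits(["low", "medium", "high"]):
    print(left, "vs", right)
```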
Gini Index
❖When considering a binary split, we compute a weighted sum of the impurity of each resulting
partition.
❖For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that
partitioning is
Gini_A(D) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)
❖For a continuous-valued attribute, each possible split-point of A must be considered; D1 is the set
of tuples in D satisfying A ≤ split_point, and D2 is the set of tuples in D satisfying A > split_point.
❖The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-
valued attribute A is
\Delta Gini(A) = Gini(D) - Gini_A(D)
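A minimal sketch of these computations from class counts; the partition counts below follow the standard Buys_computer example (income in {low, medium} vs {high}) and are assumptions, not given on this slide:

```python
def gini(counts):
    """Gini(D) = 1 - sum(p_i^2) over the m classes."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(d1, d2):
    """Gini_A(D) = |D1|/|D| * Gini(D1) + |D2|/|D| * Gini(D2)."""
    n1, n2 = sum(d1), sum(d2)
    n = n1 + n2
    return n1 / n * gini(d1) + n2 / n * gini(d2)

d = [9, 5]               # (yes, no) counts in D
d1, d2 = [7, 3], [2, 2]  # income in {low, medium} vs {high}
print(round(gini(d), 3))                       # 0.459
print(round(gini_split(d1, d2), 3))            # 0.443
print(round(gini(d) - gini_split(d1, d2), 3))  # delta Gini ~ 0.016
```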
Naïve Bayes Classification - Example
❖For the unseen tuple X = (age = youth, income = medium, student = yes, credit_rating = fair),
under the class-conditional independence assumption:
P(X|buys_computer=yes) = P(age=youth|buys_computer=yes) *
P(income=medium|buys_computer=yes) *
P(student=yes|buys_computer=yes) *
P(credit_rating=fair|buys_computer=yes)
= 0.222 * 0.444 * 0.667 * 0.667
= 0.044
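A quick arithmetic check of the product above (the per-attribute conditionals are copied from the slide; the 2/9, 4/9, 6/9 fractions behind them come from the standard Buys_computer counts):

```python
# P(X | buys_computer = yes) as a product of per-attribute conditionals
conditionals = [0.222, 0.444, 0.667, 0.667]  # age, income, student, credit_rating
p = 1.0
for c in conditionals:
    p *= c
print(round(p, 3))  # 0.044
```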
Confusion Matrix
                      Predicted Class
                      C1 (yes / +ve)          C2 (no / -ve)
Actual  C1 (yes/+ve)  True Positives (TP)     False Negatives (FN)
Class   C2 (no/-ve)   False Positives (FP)    True Negatives (TN)
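A small sketch computing the standard evaluation measures from the four confusion-matrix cells (the TP/FN/FP/TN counts are purely illustrative):

```python
# Confusion-matrix cells: rows = actual class, columns = predicted class
TP, FN = 90, 10  # actual C1 (yes / +ve)
FP, TN = 5, 95   # actual C2 (no / -ve)

accuracy = (TP + TN) / (TP + FN + FP + TN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)  # sensitivity / true positive rate
print(round(accuracy, 3), round(precision, 3), round(recall, 3))
```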