Unit 7
Supervised Learning:
Classification and Regression
Subject: Machine Learning (3170724)
Dr. Ami Tusharkant Choksi
Associate Professor, Computer Engineering Department,
C.K.Pithawala College of Engineering and Technology
Website: www.ckpcet.ac.in
Dr. Ami Tusharkant Choksi@CKPCET Machine Learning(3170724)
Supervised vs. Unsupervised Learning
■ Supervised learning (classification)
■ Supervision: The training data (observations, measurements, etc.)
are accompanied by labels indicating the class of the observations
■ New data is classified based on the training set
■ Unsupervised learning (clustering)
■ The class labels of the training data are unknown
■ Given a set of measurements, observations, etc. with the aim of
establishing the existence of classes or clusters in the data
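As a minimal sketch of the supervised setting, a 1-nearest-neighbour classifier uses the labels of the training data to classify new data; the feature values and labels below are hypothetical, not from the slides.

```python
import math

# Hypothetical labeled training set: (feature_1, feature_2) -> class label.
train = [((1.0, 1.0), "yes"), ((1.5, 2.0), "yes"),
         ((8.0, 8.0), "no"),  ((9.0, 7.5), "no")]

def classify_1nn(x):
    """Supervised: classify a new point by the label of its nearest training point."""
    nearest = min(train, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]

print(classify_1nn((2.0, 1.5)))  # near the "yes" group
```

In the unsupervised setting, the same points would arrive without the "yes"/"no" labels, and a clustering algorithm would have to discover the two groups from the distances alone.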
■ GainRatio(A) = Gain(A)/SplitInfo(A)
■ Reduction in Impurity:
■ The attribute that provides the smallest gini_split(D) (equivalently, the largest reduction in
impurity) is chosen to split the node (this requires enumerating all possible
splitting points for each attribute)
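The gain-ratio formula above, GainRatio(A) = Gain(A)/SplitInfo(A), can be sketched numerically. The dataset below reuses the 9 "yes" / 5 "no" class distribution from the Gini example in this unit, but the per-partition class counts for the attribute A are assumed for illustration.

```python
import math

def entropy(counts):
    """Info(D) for a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

parent = [9, 5]                  # 9 "yes", 5 "no" overall
partitions = [[6, 1], [3, 4]]    # assumed class counts inside each partition of A
sizes = [sum(p) for p in partitions]
n = sum(sizes)

info_D = entropy(parent)                                     # Info(D)
info_A = sum(s / n * entropy(p) for s, p in zip(sizes, partitions))
gain_A = info_D - info_A                                     # Gain(A)
split_A = entropy(sizes)                                     # SplitInfo(A)
gain_ratio = gain_A / split_A                                # Gain(A) / SplitInfo(A)
print(round(gain_ratio, 3))  # ≈ 0.152
```

SplitInfo penalises attributes that split the data into many small partitions, which plain information gain tends to favour.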
Computation of Gini Index
■ Ex. D has 9 tuples in buys_computer = “yes” and 5 in “no”
■ Suppose the attribute income partitions D into D1 with 10 tuples (income ∈ {low, medium})
and D2 with the remaining 4 tuples
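A quick check of this computation in Python. Gini(D) follows directly from the 9/5 class counts on the slide; the per-partition class counts inside D1 and D2 are assumed for illustration, since the slide gives only the partition sizes.

```python
def gini(counts):
    """Gini index for a list of class counts: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# From the slide: D has 9 tuples with buys_computer = "yes" and 5 with "no".
g_D = gini([9, 5])
print(round(g_D, 3))  # 1 - (9/14)^2 - (5/14)^2 = 0.459

# income splits D into D1 (10 tuples) and D2 (4 tuples); the class counts
# inside each partition below are assumed, not given on the slide.
d1, d2 = [7, 3], [2, 2]
g_split = 10 / 14 * gini(d1) + 4 / 14 * gini(d2)
print(round(g_split, 3))  # weighted gini_split for this partition
```

The reduction in impurity for this split is then Gini(D) − gini_split(D), and the attribute with the largest reduction is chosen.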
Computation of Gini Index
■ All attributes are assumed continuous-valued
■ May need other tools, e.g., clustering, to get the possible split
values
■ Can be modified for categorical attributes
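A common way to enumerate candidate split values for a continuous-valued attribute is to take the midpoints between adjacent sorted values; the attribute values below are illustrative.

```python
# Candidate split points for a continuous attribute: midpoints between
# adjacent sorted values (illustrative values, e.g. ages or incomes).
values = [60, 70, 75, 85, 90]

sorted_vals = sorted(values)
candidates = [(a + b) / 2 for a, b in zip(sorted_vals, sorted_vals[1:])]
print(candidates)  # [65.0, 72.5, 80.0, 87.5]
```

Each candidate v defines a binary split (attribute ≤ v vs. attribute > v), and gini_split is evaluated at every candidate to pick the best one.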
Disadvantages of CART
■ Attribute construction
■ Create new attributes based on existing ones that are sparsely
represented
■ This reduces fragmentation, repetition, and replication
Test data given here is:
Name = Josh, Aptitude = 5, Communication = 4.5, Class = ?
■ But it is often a tricky decision to decide the value of k. The reasons are as
follows:
■ If the value of k is very large (in the extreme case equal to the total number of
records in the training data), the class label of the majority class of the training
data set will be assigned to the test data regardless of the class labels of the
neighbours nearest to the test data.
■ If the value of k is very small (in the extreme case, equal to 1), the class value of a
noisy data point or outlier in the training data set that happens to be the nearest
neighbour of the test data will be assigned to the test data.
■ The best k value is somewhere between these two extremes.
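The effect of k can be illustrated with a small sketch in the (Aptitude, Communication) space of the Josh example above; the training points and labels are hypothetical, with one deliberately noisy point placed right next to the test record.

```python
import math
from collections import Counter

# Hypothetical training set: (Aptitude, Communication) -> class label.
train = [((2.0, 2.5), "fail"), ((3.0, 2.0), "fail"), ((2.5, 3.0), "fail"),
         ((6.0, 5.0), "pass"), ((5.5, 4.0), "pass"), ((4.8, 4.6), "pass"),
         ((5.0, 4.4), "fail")]  # noisy point sitting inside the "pass" region

def knn(x, k):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

josh = (5.0, 4.5)
# k = 1 picks up the noisy neighbour; a moderate k recovers "pass";
# k = all records simply returns the majority class of the training set.
print(knn(josh, 1), knn(josh, 3), knn(josh, len(train)))
```

With k = 1 the noisy point decides the class, and with k equal to the whole training set the majority class wins regardless of where Josh lies, matching the two extremes described above.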
■ This is to ensure that all the data instances that belong to one
class fall above one hyperplane and all the data instances
belonging to the other class fall below the other hyperplane.
■ According to vector geometry, the distance of each of these planes from the
maximum margin hyperplane is 1/||W||, so the margin between them is 2/||W||.
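As a numeric check of the vector-geometry step, the distance of a point X from a hyperplane W·X + b = 0 is |W·X + b| / ||W||; the coefficients and point below are assumed for illustration.

```python
import math

# Distance of point x from the hyperplane w.x + b = 0: |w.x + b| / ||w||.
w, b = (3.0, 4.0), -5.0   # assumed hyperplane coefficients, ||w|| = 5
x = (2.0, 1.0)            # assumed data point

dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.hypot(*w)
print(dist)  # |3*2 + 4*1 - 5| / 5 = 1.0
```

The same formula, applied to the two bounding hyperplanes W·X + b = ±1, is what gives the margin width used when maximising the MMH.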
Identifying the MMH for linearly separable data
LSVM