Unit 3: Techniques
KCS 055
Decision Tree
• A decision tree in machine learning is a flowchart-like structure in which each internal node represents a test on an attribute, and each branch represents an outcome of that test.
• Each end node, called a leaf node, represents a class label.
• Decision tree learning is a supervised learning method.
• It is used for both classification and regression tasks.
Decision Tree Learning
E(x) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)

• In other words, entropy is a measure of the "randomness" of the information in a variable.
• It is a measure of the uncertainty associated with a random variable x.
Calculate the entropy E of a single attribute for the "Playing Golf" problem:

E(S) = -\sum_{i=1}^{n} p_i \log_2 p_i

where S = current state and p_i = probability of event i in state S.

Playing Golf:  Yes = 9, No = 5

Entropy(Playing Golf) = Entropy(9, 5)
P(Play Golf = Yes) = 9/14 = 0.64
P(Play Golf = No)  = 5/14 = 0.36
E(S) = -(0.64) \log_2(0.64) - (0.36) \log_2(0.36) ≈ 0.94
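As a quick cross-check, here is a minimal Python sketch of this entropy computation (the function name and call are illustrative, not from the slides):

import math

def entropy(counts):
    """Entropy of a class distribution, given raw class counts."""
    total = sum(counts)
    e = 0.0
    for c in counts:
        if c > 0:                      # 0 * log(0) is treated as 0
            p = c / total
            e -= p * math.log2(p)
    return e

print(entropy([9, 5]))  # ~0.940, the entropy of the Playing Golf target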
Information gain of an attribute A with respect to set S:

IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

Attribute 1: Outlook

IG(S, Outlook) = 0.94 - \frac{5}{14} \cdot E(Sunny) - \frac{4}{14} \cdot E(Overcast) - \frac{5}{14} \cdot E(Rain)
Attribute 2: Temp.

Entropy(S) = E(Play) = 0.94

Temp.    Yes   No
Hot       2     2
Mild      4     2
Cool      3     1
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S, Temp) = 0.94 - P(Hot)·E(Hot) - P(Mild)·E(Mild) - P(Cool)·E(Cool)
Attribute 3: Humidity

Entropy(S) = E(Play) = 0.94

Humidity   Yes   No
High        3     4
Normal      6     1
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S, Humidity) = 0.94 - P(High)·E(High) - P(Normal)·E(Normal)
Attribute 4: Wind

Entropy(S) = E(Play) = 0.94

Wind     Yes   No
Weak      6     2
Strong    3     3
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S, Wind) = 0.94 - P(Weak)·E(Weak) - P(Strong)·E(Strong)
• IG(Outlook)  = 0.2464
• IG(Temp)     = 0.0289
• IG(Humidity) = 0.1516
• IG(Wind)     = 0.0478

Outlook has the highest information gain, so it is selected as the root node.
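The same comparison can be reproduced with a short Python sketch (a minimal illustration using the counts from the tables above; the function names are ours, not from the slides):

from math import log2

def entropy(pos, neg):
    """Entropy of a two-class set given its positive/negative counts."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c > 0)

def info_gain(parent, splits):
    """IG(S, A): parent entropy minus the weighted entropy of each subset."""
    n = sum(parent)
    return entropy(*parent) - sum(
        (p + q) / n * entropy(p, q) for p, q in splits
    )

S = (9, 5)  # Playing Golf: 9 Yes, 5 No
print(info_gain(S, [(2, 3), (4, 0), (3, 2)]))  # Outlook  ~0.247 (highest -> root)
print(info_gain(S, [(2, 2), (4, 2), (3, 1)]))  # Temp     ~0.029
print(info_gain(S, [(3, 4), (6, 1)]))          # Humidity ~0.152
print(info_gain(S, [(6, 2), (3, 3)]))          # Wind     ~0.048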
Information Gain of each attribute w.r.t. Sunny

S_sunny = [2+, 3-]

Entropy(S_sunny) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.97
Attribute 1: Temp.

Entropy(S_sunny) = 0.97

Temp.    Yes   No
Hot       0     2
Mild      1     1
Cool      1     0
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_sunny, Temp) = 0.97 - \frac{2}{5} E(Hot) - \frac{2}{5} E(Mild) - \frac{1}{5} E(Cool)
Attribute 2: Humidity

Entropy(S_sunny) = 0.97

Humidity   Yes   No
High        0     3
Normal      2     0
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_sunny, Humidity) = 0.97 - P(High)·E(High) - P(Normal)·E(Normal)
Attribute 3: Wind

Entropy(S_sunny) = 0.97

Wind     Yes   No
Weak      1     2
Strong    1     1
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_sunny, Wind) = 0.97 - P(Weak)·E(Weak) - P(Strong)·E(Strong)
• IG(S_sunny, Temp)     = 0.570
• IG(S_sunny, Humidity) = 0.97
• IG(S_sunny, Wind)     = 0.0192

Humidity has the highest gain, so it becomes the next decision node under the Sunny branch (Normal → Yes, High → No).
Information Gain of each attribute w.r.t. Rain
S_rain = [3+, 2-]

Entropy(S_rain) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.97
Attribute 1: Temp.

Entropy(S_rain) = 0.97

Temp.    Yes   No
Hot       0     0
Mild      2     1
Cool      1     1
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_rain, Temp) = 0.97 - \frac{0}{5} E(Hot) - \frac{3}{5} E(Mild) - \frac{2}{5} E(Cool)
Attribute 2: Humidity

Entropy(S_rain) = 0.97

Humidity   Yes   No
High        1     1
Normal      2     1
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_rain, Humidity) = 0.97 - P(High)·E(High) - P(Normal)·E(Normal)
Attribute 3: Wind

Entropy(S_rain) = 0.97

Wind     Yes   No
Weak      3     0
Strong    0     2
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

IG(S_rain, Wind) = 0.97 - P(Weak)·E(Weak) - P(Strong)·E(Strong)
• IG(S_rain, Temp)     = 0.0192
• IG(S_rain, Humidity) = 0.0192
• IG(S_rain, Wind)     = 0.97

Wind has the highest gain, so it becomes the decision node under the Rain branch.

Final decision tree:

Outlook
├── Sunny → Humidity
│     ├── High   → No
│     └── Normal → Yes
├── Overcast → Yes
└── Rain → Wind
      ├── Weak   → Yes
      └── Strong → No
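The finished tree can be read as nested if-then rules. A small illustrative sketch of classifying a new day with it (the dict encoding and function name are our own, not from the slides):

# final Playing Golf tree encoded as nested dicts
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
}}

def classify(tree, instance):
    """Walk the tree until a leaf (a class label string) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))               # attribute tested at this node
        tree = tree[attr][instance[attr]]     # follow the matching branch
    return tree

day = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}
print(classify(tree, day))  # Yes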
General Decision Tree Algorithm (ID3) Steps
1. Compute the entropy of the current set S.
2. Compute the information gain of every candidate attribute.
3. Select the attribute with the highest gain as the decision node.
4. Split S into subsets, one per value of the chosen attribute.
5. Recurse on each subset; stop when a subset is pure (all one class) or no attributes remain, and make that node a leaf.

A minimal recursive sketch of these steps is shown below.
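This sketch assumes categorical attributes; the helper names and structure are illustrative, not an official listing of the algorithm:

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    """rows: list of dicts mapping attribute -> value; labels: class list."""
    if len(set(labels)) == 1:              # step 5: pure subset -> leaf
        return labels[0]
    if not attributes:                     # step 5: no attributes -> majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                           # steps 1-2: entropy and information gain
        rem = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem

    best = max(attributes, key=gain)       # step 3: highest-gain attribute
    tree = {best: {}}
    for v in set(r[best] for r in rows):   # step 4: split and recurse
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels,
                            [a for a in attributes if a != best])
    return tree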
• Inductive Learning
• Deductive Learning

Inductive Bias:
• Restrictive Bias: based on conditions.
• Preference Bias: based on priorities.
• According to Occam's Razor, prefer the simplest hypothesis that fits the data.
3 Main Properties of Instance-Based Learning
• Lazy learners: generalization is deferred until a new query instance must be classified.
Example: Euclidean distance from a query point (3, 7) to two stored instances (7, 7) and (7, 4):

d_1 = \sqrt{(3-7)^2 + (7-7)^2} = \sqrt{16 + 0} = \sqrt{16} = 4
d_2 = \sqrt{(3-7)^2 + (7-4)^2} = \sqrt{16 + 9} = \sqrt{25} = 5
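These two distances can be verified with a few lines of Python (coordinates as in the example; a nearest-neighbour rule would then pick the closer stored instance):

import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

query = (3, 7)
print(euclidean(query, (7, 7)))  # d1 = 4.0
print(euclidean(query, (7, 4)))  # d2 = 5.0 -> (7, 7) is the nearer instance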
The target function can be approximated as a linear combination of the attributes of the instance:

\hat{f}(x) = w_0 + w_1 a_1(x) + \dots + w_n a_n(x)

where:
a_i(x) = value of the i-th attribute of instance x
w_i = weight coefficients
Locally Weighted Linear Regression
To find the weight coefficients, we use the gradient descent rule:

\Delta w_j = \eta \sum_{x} \big( f(x) - \hat{f}(x) \big) \, a_j(x)

where:
\eta = learning rate constant
f(x) = target function
\hat{f}(x) = approximation to the target function
a_j(x) = value of the j-th attribute of instance x
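A minimal sketch of this update rule in Python, assuming a Gaussian kernel so that training points near the query dominate the fit (the variable names, kernel choice, and toy data are ours):

import numpy as np

def lwr_fit(X, y, x_query, eta=0.001, tau=1.0, steps=10000):
    """Gradient-descent fit of local weights for f_hat(x) = w . a(x).

    X: (m, n) attribute matrix, y: (m,) targets. Each training point
    is weighted by a Gaussian kernel of its distance to x_query.
    """
    m, n = X.shape
    A = np.hstack([np.ones((m, 1)), X])              # a_0(x) = 1 gives the bias w_0
    k = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    w = np.zeros(n + 1)
    for _ in range(steps):
        err = y - A @ w                              # f(x) - f_hat(x)
        w += eta * (A.T @ (k * err))                 # kernel-weighted delta w_j
    return w

# toy usage: noisy line y = 2x + 1, queried near x = 5
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (50, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, 50)
print(lwr_fit(X, y, x_query=np.array([5.0])))       # roughly [1, 2]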
Radial Basis Function (RBF)
• Multiquadric:
  \Phi(r) = (r^2 + c^2)^{1/2}, where c > 0 is a constant

• Inverse Multiquadric:
  \Phi(r) = \frac{1}{(r^2 + c^2)^{1/2}}

• Gaussian Function:
  \Phi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)
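The three kernels in Python (a direct transcription; the parameter defaults are arbitrary):

import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

r = np.array([0.0, 1.0, 2.0, 3.0])
print(gaussian(r))  # decays from 1 toward 0 as the distance r grows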
Radial Basis Function (RBF)
• Used for approximation of multivariate target functions.

\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u \, K_u\big(d(x_u, x)\big)

where:
\hat{f}(x) = approximation of the multivariate target function
w_0 = initial (bias) weight
w_u = weight of unit u
K_u(d(x_u, x)) = kernel function of unit u
d(x_u, x) = distance between x_u and x
Radial Basis Function (RBF) Networks
• Used in Artificial Neural Networks (ANN).
• Used for classification tasks in ANN.
• Also commonly used in ANN for function approximation.
• RBF networks differ from simple ANNs in their universal approximation capability and faster training speed.
• Feed-forward neural network.
• Consists of 3 layers: an input layer, a hidden (middle) layer, and an output layer.
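A compact sketch of such a 3-layer network, assuming Gaussian hidden units centred on a few training points and a linear output layer fitted by least squares (all of these choices are illustrative, not prescribed by the slides):

import numpy as np

def rbf_network_fit(X, y, centers, sigma=1.0):
    """Fit the output weights of an RBF network by linear least squares.

    Hidden layer: one Gaussian unit per center.
    Output layer: f_hat(x) = w0 + sum_u w_u * K_u(d(x_u, x)).
    """
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-d ** 2 / (2 * sigma ** 2))        # hidden-layer activations
    H = np.hstack([np.ones((len(X), 1)), H])      # bias column for w0
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def rbf_network_predict(X, centers, w, sigma=1.0):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-d ** 2 / (2 * sigma ** 2))
    return np.hstack([np.ones((len(X), 1)), H]) @ w

# toy usage: approximate sin(x) on [0, 2*pi]
X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
y = np.sin(X[:, 0])
centers = X[::8]                                  # a few training points as centers
w = rbf_network_fit(X, y, centers)
print(np.max(np.abs(rbf_network_predict(X, centers, w) - y)))  # small error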
Case-Based Learning or Case-Based Reasoning (CBR)