CSC454 10
• Ensemble learning
o Bagging
o Boosting
Decision Trees
▪ The decision tree algorithm belongs to the family of supervised learning algorithms.
Graphical intuition on decision tree
[Figure: a toy decision tree that splits on age (< 30, 30−40, > 40), with Yes/No leaves]
We have (the example weather data from https://www.saedsayad.com/decision_tree.htm):
- 14 observations/examples
- 4 attributes/features: outlook, temperature, humidity, wind
- 2 classes (Yes, No)
Decision Tree Representation
1. Each node corresponds to an attribute
Entropy
The entropy of the class label is H(y) = −Σᵢ pᵢ log₂(pᵢ), taking 0·log₂(0) = 0.
✓ Suppose 0 yes and 14 no:  H(y) = −(0/14) log₂(0/14) − (14/14) log₂(14/14) = 0
✓ Suppose 14 yes and 0 no:  H(y) = −(14/14) log₂(14/14) − (0/14) log₂(0/14) = 0
✓ Suppose 7 yes and 7 no:   H(y) = −(7/14) log₂(7/14) − (7/14) log₂(7/14) = 1
For the actual data (9 yes, 5 no), H(y) = 0.94. Splitting on outlook gives sunny (5 examples, H = 0.97), overcast (4 examples, H = 0), and rain (5 examples, H = 0.97):
Weighted entropy = (5/14) × 0.97 + (4/14) × 0 + (5/14) × 0.97 = 0.692
IG(y, outlook) = 0.94 − 0.692 = 0.246
IG(y, temperature) = 0.029
IG(y, windy) = 0.94 − 0.892 = 0.048
IG(y, humidity) = 0.94 − 0.789 = 0.151
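The entropy and information-gain numbers above can be reproduced directly. The following is a small illustrative sketch (not code from the slides), using the per-value yes/no counts of the Outlook attribute:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a sequence of class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# (yes, no) counts per Outlook value in the 14-example weather data
outlook = {"sunny": (2, 3), "overcast": (4, 0), "rain": (3, 2)}
n = sum(sum(c) for c in outlook.values())                           # 14 examples

h_parent = entropy((9, 5))                                          # ~0.940
weighted = sum(sum(c) / n * entropy(c) for c in outlook.values())   # ~0.693
print(h_parent - weighted)   # IG(y, outlook) ~0.247 (0.246 with the slides' rounding)
```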
First step: which attribute to test at the root?
▪ Which attribute should be tested at the root?
▪ Gain(S, Outlook) = 0.246
▪ Gain(S, Humidity) = 0.151
▪ Gain(S, Wind) = 0.048
▪ Gain(S, Temperature) = 0.029
▪ Outlook provides the best prediction for the target
▪ Let's grow the tree:
▪ add to the tree a successor for each possible value of Outlook
▪ partition the training samples according to the value of Outlook
After first step
Constructing decision tree
You need to calculate the information gain for the Sunny branch to choose the best attribute among temperature, humidity, and windy.
Weighted entropy(Temp) = (2/5) × 0 + (2/5) × 1 + (1/5) × 0 = 0.4
IG(Sunny, temperature) = 0.97 − 0.4 = 0.57
Second step
▪ Working on Outlook=Sunny node:
Gain(S_Sunny, Humidity) = 0.970 − (3/5) × 0.0 − (2/5) × 0.0 = 0.970
Gain(S_Sunny, Wind) = 0.970 − (2/5) × 1.0 − (3/5) × 0.918 = 0.019
Gain(S_Sunny, Temp.) = 0.970 − (2/5) × 0.0 − (2/5) × 1.0 − (1/5) × 0.0 = 0.570
▪ Humidity provides the best prediction for the target
▪ Let's grow the tree:
▪ add to the tree a successor for each possible value of Humidity
▪ partition the training samples according to the value of Humidity
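As a quick check, the Sunny-branch gains above can be recomputed the same way; again an illustrative sketch, not the slides' code:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# Outlook = Sunny subset: 5 examples, 2 yes / 3 no
h_sunny = entropy((2, 3))                                              # ~0.970
print(h_sunny - (3/5) * entropy((0, 3)) - (2/5) * entropy((2, 0)))     # Humidity: ~0.970
print(h_sunny - (2/5) * entropy((1, 1)) - (3/5) * entropy((1, 2)))     # Wind: ~0.02 (0.019 on the slide)
print(h_sunny - (2/5) * entropy((0, 2)) - (2/5) * entropy((1, 1))
      - (1/5) * entropy((1, 0)))                                       # Temp.: ~0.570
```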
Second and third steps
[Figure: the tree after the Humidity and Wind splits; leaves {D1, D2, D8} → No, {D9, D11} → Yes, {D4, D5, D10} → Yes, {D6, D14} → No]
Constructing decision tree
[Figure: partially grown tree; Outlook at the root with branches sunny → humidity (2 yes, 3 no), overcast → all Yes, and rain → wind (3 yes, 2 no)]
The same process will be repeated to select the best attribute. For example:
IG(Rainy, temperature)
IG(Rainy, Humidity)
IG(Rainy, Windy)
Gini index/ impurity
• Gini Impurity is a method for
splitting the nodes when the target
variable is categorical.
G(y) = 1 − Σᵢ₌₁ᶜ (pᵢ)²
G(y) = 1 − (9/14)² − (5/14)² ≈ 0.459

Yes    No    Gini
0      14    0
1      13    0.1328
7      7     0.5
14     0     0

• Gini index is maximal if the classes are perfectly mixed
• Gini Impurity is preferred to Information Gain because it does not contain logarithms, which are computationally intensive.
• The lower the Gini index value, the more effective the attribute is at classifying the training data.
▪ What is the best split (between outlook, Temp, Humidity, Wind) according to the
Gini index?
▪ For attribute Outlook, the Gini index is:
▪ P(Y) × [1 − P(overcast|Y)² − P(sunny|Y)² − P(rainy|Y)²] + P(N) × [1 − P(overcast|N)² − P(sunny|N)² − P(rainy|N)²]
▪ = (9/14) × [1 − (4/9)² − (2/9)² − (3/9)²] + (5/14) × [1 − (0/5)² − (3/5)² − (2/5)²] = 0.584
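These Gini values can be checked with a few lines of Python; an illustrative sketch, not code from the slides:

```python
def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)

print(gini([9/14, 5/14]))             # whole data set: ~0.459

# Gini of Outlook as computed on the slide:
# P(Y)*[1 - sum_v P(v|Y)^2] + P(N)*[1 - sum_v P(v|N)^2]
g_yes = gini([4/9, 2/9, 3/9])         # overcast, sunny, rainy among the 9 "yes" examples
g_no  = gini([0/5, 3/5, 2/5])         # overcast, sunny, rainy among the 5 "no" examples
print(9/14 * g_yes + 5/14 * g_no)     # ~0.584
```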
3. Remove the feature assigned to the root node from the feature list, then again find the attribute with the maximum information gain for each branch. Assign that feature as the child node of the branch and remove it from the feature list for that branch.
The Sunny branch of the Outlook root node has Humidity as its child node.
4. Repeat step 3 until every branch ends in a pure leaf (in our example, all Yes or all No).
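Steps 1–4 describe a recursive procedure (essentially ID3). A compact sketch of that recursion, illustrative only and not the slides' code, where `data` is a list of (feature-dict, label) pairs:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(data, attr):
    subsets = {}
    for x, y in data:
        subsets.setdefault(x[attr], []).append(y)
    remainder = sum(len(s) / len(data) * entropy(s) for s in subsets.values())
    return entropy([y for _, y in data]) - remainder

def id3(data, attrs):
    labels = [y for _, y in data]
    if len(set(labels)) == 1:                # pure leaf: stop
        return labels[0]
    if not attrs:                            # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(data, a))   # maximum information gain
    tree = {best: {}}
    for value in {x[best] for x, _ in data}:               # one branch per value
        subset = [(x, y) for x, y in data if x[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best])
    return tree

# usage: id3([({"outlook": "sunny", "humidity": "high"}, "no"), ...], ["outlook", "humidity"])
```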
Decision tree in sklearn
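The slide's code is not reproduced in this text; below is a minimal sketch of fitting scikit-learn's DecisionTreeClassifier on the 14-example weather data, assuming pandas is used to one-hot encode the categorical attributes:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The classic 14-example weather ("play tennis") data set
data = pd.DataFrame({
    "outlook":     ["sunny","sunny","overcast","rain","rain","rain","overcast",
                    "sunny","sunny","rain","sunny","overcast","overcast","rain"],
    "temperature": ["hot","hot","hot","mild","cool","cool","cool",
                    "mild","cool","mild","mild","mild","hot","mild"],
    "humidity":    ["high","high","high","high","normal","normal","normal",
                    "high","normal","normal","normal","high","normal","high"],
    "windy":       [False,True,False,False,False,True,True,
                    False,False,False,True,True,False,True],
    "play":        ["no","no","yes","yes","yes","no","yes",
                    "no","yes","yes","yes","yes","yes","no"],
})

X = pd.get_dummies(data.drop(columns="play"))      # one-hot encode categorical features
y = data["play"]

clf = DecisionTreeClassifier(criterion="entropy")  # split on information gain
clf.fit(X, y)
print(clf.predict(X[:3]))
```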
Visualizing decision tree
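A minimal visualization sketch (assumes the `clf` and `X` fitted in the previous block; matplotlib is needed for plot_tree):

```python
from sklearn.tree import export_text, plot_tree
import matplotlib.pyplot as plt

# Text view of the learned splits
print(export_text(clf, feature_names=list(X.columns)))

# Graphical view of the tree
plot_tree(clf, feature_names=list(X.columns), class_names=list(clf.classes_), filled=True)
plt.show()
```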
Decision Tree Regression
• Decision tree regression predicts continuous output.
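A short illustrative sketch (not from the slides) of DecisionTreeRegressor on synthetic 1-D data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)      # 80 points on [0, 5)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)   # noisy sine target

reg = DecisionTreeRegressor(max_depth=3)      # shallow tree to limit overfitting
reg.fit(X, y)
print(reg.predict([[2.5]]))                   # piecewise-constant prediction
```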
Overfitting & Underfitting in Decision tree
• The greater the depth of the tree, the higher the chance of overfitting (high variance), whereas the smaller the depth of the tree, the higher the chance of a biased tree (underfitting).
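An illustrative sketch (synthetic data, not from the slides) of how limiting max_depth trades training accuracy for better generalization:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None: grow until all leaves are pure (prone to overfitting)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, test={tree.score(X_te, y_te):.2f}")
```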
Causes of overfitting
• Overfitting Due to Presence of Noise: Mislabeled
instances may contradict the class labels of other similar
records.
https://www.saedsayad.com/decision_tree_overfitting.htm
Ensemble learning
• Ensemble learning is a model that makes predictions
based on a number of different models.
• By combining individual models, the ensemble model
tends to be more flexible (less bias) and less data-
sensitive (less variance).
• For example, a single weak model such as a one-level tree that only asks "Age > 40?" might get above 60% accuracy, but you would still be misclassifying a lot of data points, which is why we combine many such models.
[Figure: a decision stump splitting on Age > 40 into Yes and No branches]
Bagging and boosting
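A minimal sketch (illustrative, not the slides' code) of bagging and boosting ensembles of decision trees with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)

# Bagging: many deep trees on bootstrap samples, predictions combined by voting (reduces variance)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Boosting: shallow trees (stumps) trained sequentially, each focusing on earlier errors (reduces bias)
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```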
The End
References