L3 - Decision Trees
Lương Thái Lê
Outline of the Lecture
1. Introduction to Decision Trees (DT)
2. DT Algorithms
3. Choosing the Best Feature
• Information Gain
• Example
Decision Tree (DT) Introduction
[Figure: a decision tree, with the root node, branches, and leaf nodes labeled]
• DT is a supervised learning method for classification
• A DT learns a classification function represented by a decision tree
• Can be represented as a set of IF–THEN rules
• Can perform well even with noisy data
• One of the most common inductive learning methods
• Successfully applied to many real-world problems
• Ex: spam email filtering…
A DT: Example
• A learned DT classifies an example by traversing the tree from the root node to a leaf node
=> The class label associated with that leaf node is assigned to the example being classified
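The traversal described above can be sketched in Python. The node layout here (internal nodes hold an attribute and a dict of children keyed by attribute value; leaves hold a class label) is an illustrative assumption, not part of the lecture:

```python
# Minimal sketch of classifying an example by root-to-leaf traversal.
# The Node layout is an assumption for illustration only.

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label (leaf nodes only)

def classify(node, example):
    """Walk from the root to a leaf; return the leaf's class label."""
    while node.label is None:
        node = node.children[example[node.attribute]]
    return node.label

# Tiny example tree: test Outlook; sunny -> "no", rain -> "yes"
tree = Node("Outlook", {
    "sunny": Node(label="no"),
    "rain": Node(label="yes"),
})
print(classify(tree, {"Outlook": "rain"}))  # -> yes
```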
Representing a DT (2)
• A DT represents a disjunction of conjunctions of constraints on the attribute values of the examples
• Each path from the root node to a leaf node corresponds to a conjunction of attribute tests
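This path-to-rule view can be made concrete with a short sketch that enumerates every root-to-leaf path as an IF–THEN rule. The nested-dict tree layout and the example tree are assumptions for illustration:

```python
# Sketch: enumerate the root-to-leaf paths of a DT as IF-THEN rules.
# Tree layout (nested dicts; leaves are label strings) is an assumption.

tree = {"Outlook": {
    "sunny": {"Humidity": {"high": "no", "normal": "yes"}},
    "overcast": "yes",
    "rain": "yes",
}}

def paths_to_rules(tree, conditions=()):
    if isinstance(tree, str):                 # leaf: emit one rule
        yield " AND ".join(f"{a}={v}" for a, v in conditions), tree
        return
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():   # one conjunction per path
        yield from paths_to_rules(subtree, conditions + ((attribute, value),))

for cond, label in paths_to_rules(tree):
    print(f"IF {cond} THEN {label}")
# e.g. IF Outlook=sunny AND Humidity=high THEN no
```

The whole tree is then the disjunction of these rules: an example is positive if it satisfies any path whose leaf is "yes".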
DT – Problem Setting
• Set of possible instances X:
• each instance x in X is a feature vector
• x = <x1, x2, …, xn>; Ex: <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
• Unknown target function f: X → Y
• y ∈ Y; y = 1 if we play tennis on this day, else y = 0
• Set of function hypotheses H = {h | h: X → Y}
• each hypothesis h is a decision tree
• Input:
• Training examples {<x(i), y(i)>} of the unknown target function f
• Output:
• Hypothesis h ∈ H that best approximates f
Top-down Induction of Decision Trees
[ID3, C4.5, Quinlan]
node = Root
Main loop:
1. A ← the best decision attribute (feature) for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over new leaf nodes
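The main loop above can be sketched as a recursive function in the ID3 style, using information gain to pick the best attribute at each node. The dataset layout (each example a dict of attribute → value, with labels in a parallel list) is an assumption for illustration:

```python
# Sketch of the top-down (ID3-style) induction loop, selecting the
# attribute with the highest information gain at each node.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    """Entropy reduction from splitting the examples on attr."""
    gain, total = entropy(labels), len(labels)
    for value in set(e[attr] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attr] == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    if len(set(labels)) == 1:            # perfectly classified: STOP
        return labels[0]
    if not attributes:                   # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, labels, a))
    node, rest = {best: {}}, [a for a in attributes if a != best]
    for value in set(e[best] for e in examples):
        idx = [i for i, e in enumerate(examples) if e[best] == value]
        node[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx], rest)
    return node
```

Each recursive call corresponds to one pass of steps 1–5: pick the best attribute, branch on its values, route the examples down, and stop on pure leaves.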
GainRatio(S, A) = IG(S, A) / SplitInformation(S, A)
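A minimal sketch of this ratio: SplitInformation(S, A) is the entropy of the partition sizes that attribute A induces on S, so dividing the information gain by it penalizes attributes with many values. The data layout (parallel lists of attribute values and class labels) is an assumption for illustration:

```python
# Sketch: GainRatio(S, A) = IG(S, A) / SplitInformation(S, A).
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(values, labels):
    """values: attribute value of each example; labels: its class."""
    total = len(labels)
    ig, split_info = entropy(labels), 0.0
    for v in set(values):
        subset = [l for val, l in zip(values, labels) if val == v]
        p = len(subset) / total
        ig -= p * entropy(subset)           # information gain term
        split_info -= p * math.log2(p)      # entropy of the split sizes
    return ig / split_info                  # undefined if A has one value

values = ["sunny", "sunny", "rain", "rain"]
labels = ["no", "no", "yes", "yes"]
print(gain_ratio(values, labels))  # IG = 1, SplitInfo = 1 -> 1.0
```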