AI & ML M-3
• Decision trees classify instances by sorting them down the tree from the root to some leaf
node, which provides the classification of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each branch
descending from that node corresponds to one of the possible values for this attribute.
• An instance is classified by starting at the root node of the tree, testing the attribute specified
by this node, then moving down the tree branch corresponding to the value of the attribute in
the given example. This process is then repeated for the subtree rooted at the new node.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the
tree itself to a disjunction of these conjunctions.
For example,
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
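As a small illustration, here is a minimal Python sketch of the classification process, assuming the standard PlayTennis tree that the expression above describes (the nested-tuple encoding is just one possible representation, chosen for illustration):

# The PlayTennis tree as a nested structure: an internal node is a
# (attribute, branches) pair, a branch maps an attribute value to a
# subtree, and a leaf is a class label.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, instance):
    # Sort the instance down the tree from the root to some leaf node.
    while isinstance(node, tuple):            # internal node: test an attribute
        attribute, branches = node
        node = branches[instance[attribute]]  # follow the branch for this value
    return node                               # the leaf gives the classification

print(classify(tree, {"Outlook": "Rain", "Wind": "Weak"}))  # -> Yes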
• For a boolean classification, the entropy of a collection S is Entropy(S) = −p₊ log₂(p₊) − p₋ log₂(p₋), where p₊ and p₋ are the proportions of positive and negative examples in S.
• The entropy is 0 if all members of S belong to the same class.
• The entropy is 1 when the collection contains an equal number of positive and negative
examples.
• If the collection contains unequal numbers of positive and negative examples, the entropy
is between 0 and 1.
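These properties can be checked directly; a minimal Python sketch of entropy for a boolean collection, written in terms of positive/negative counts (the function name and interface are assumptions for illustration):

import math

def entropy(pos, neg):
    # Entropy(S) = -p+ log2(p+) - p- log2(p-), taking 0 * log2(0) to be 0.
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

print(entropy(7, 7))   # 1.0     equal numbers of + and - examples
print(entropy(9, 5))   # ~0.940  unequal numbers: between 0 and 1
print(entropy(14, 0))  # 0.0     all examples in one class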
• Information gain is the expected reduction in entropy caused by partitioning the examples
according to a given attribute.
• The information gain, Gain(S, A), of an attribute A relative to a collection of examples S, is defined
as

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v.
Example: Information gain
S = [9+, 5−], so Entropy(S) = 0.940
Values(Wind) = {Weak, Strong}
S_Weak = [6+, 2−], Entropy(S_Weak) = 0.811
S_Strong = [3+, 3−], Entropy(S_Strong) = 1.000
Information gain of attribute Wind:
Gain(S, Wind) = Entropy(S) − (8/14) Entropy(S_Weak) − (6/14) Entropy(S_Strong)
= 0.940 − (8/14)(0.811) − (6/14)(1.000)
= 0.048
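The same arithmetic checked in Python (the entropy helper is repeated inline so the snippet stands alone):

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

# S = [9+, 5-] split by Wind into Weak = [6+, 2-] and Strong = [3+, 3-]
gain_wind = entropy(9, 5) - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(gain_wind, 3))  # 0.048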
ii) A ∨ (B ∧ C): an example boolean function representable as a decision tree (figure omitted)
ID3's search strategy (a) selects in favour of shorter trees over longer ones, and (b) selects trees that place the attributes with highest information gain closest to the root.
Approximate inductive bias of ID3: Shorter trees are preferred over larger trees
• Consider an algorithm that begins with the empty tree and searches breadth first through
progressively more complex trees.
• First considering all trees of depth 1, then all trees of depth 2, etc.
• Once it finds a decision tree consistent with the training data, it returns the smallest
consistent tree at that search depth (e.g., the tree with the fewest nodes).
• BFS-ID3 finds a shortest decision tree and thus exhibits the bias "shorter trees are preferred
over longer trees."
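A minimal sketch of this search, assuming examples are (attribute-dict, label) pairs. Depth limits 1, 2, ... are tried in turn, so the first tree found has minimal depth (the fewest-nodes tie-break mentioned above is omitted). This is an illustrative reconstruction, not Mitchell's pseudocode, and it does not terminate if no consistent tree exists:

def consistent_tree(examples, attributes, depth):
    # Try to build some tree of depth <= depth that fits every example.
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()                    # leaf: all examples agree
    if depth == 0 or not attributes:
        return None                            # no consistent tree this shallow
    for a in attributes:
        rest = [b for b in attributes if b != a]
        branches = {}
        for v in {x[a] for x, _ in examples}:
            subset = [(x, l) for x, l in examples if x[a] == v]
            subtree = consistent_tree(subset, rest, depth - 1)
            if subtree is None:
                break                          # this split fails; try next attribute
            branches[v] = subtree
        else:
            return (a, branches)               # every branch found a consistent subtree
    return None

def bfs_id3(examples, attributes):
    # First all trees of depth 1, then depth 2, ...:
    # the first consistent tree returned is therefore a shortest one.
    depth = 1
    while True:
        tree = consistent_tree(examples, attributes, depth)
        if tree is not None:
            return tree
        depth += 1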
A closer approximation to the inductive bias of ID3: Shorter trees are preferred over longer
trees. Trees that place high information gain attributes close to the root are preferred over those that
do not.
• ID3 can be viewed as an efficient approximation to BFS-ID3, using a greedy heuristic search to
attempt to find the shortest tree without conducting the entire breadth-first search through the
hypothesis space
• Because ID3 uses the information gain heuristic and a hill climbing strategy, it exhibits a more
complex bias than BFS-ID3.
• In particular, it does not always find the shortest consistent tree, and it is biased to favour trees
that place attributes with high information gain closest to the root.
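For contrast, a minimal sketch of that greedy strategy itself, a simplified ID3 over the same (attribute-dict, label) example format assumed above: at each node it commits to the attribute with the highest information gain and never backtracks:

import math
from collections import Counter

def entropy(examples):
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, a):
    # Expected reduction in entropy from partitioning the examples on attribute a.
    n = len(examples)
    remainder = 0.0
    for v in {x[a] for x, _ in examples}:
        subset = [(x, l) for x, l in examples if x[a] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]                             # pure node: return the class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority class
    # Greedy, hill-climbing step: pick the highest-gain attribute and recurse.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, l) for x, l in examples if x[best] == v]
        branches[v] = id3(subset, rest)
    return (best, branches)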