Decision Tree

Decision Tree Representation


• A decision tree is an arrangement of tests that provides an appropriate
classification at every step in an analysis.
• "In general, decision trees represent a disjunction of conjunctions of
constraints on the attribute-values of instances. Each path from the tree root
to a leaf corresponds to a conjunction of attribute tests, and the tree itself to
a disjunction of these conjunctions"
• More specifically, decision trees classify instances by sorting them down
the tree from the root node to some leaf node, which provides the
classification of the instance. Each node in the tree specifies a test of
some attribute of the instance, and each branch descending from that node
corresponds to one of the possible values for this attribute.
Decision Tree (Contd.)
• An instance is classified by starting at the root node of the decision
tree, testing the attribute specified by this node, then moving down the
tree branch corresponding to the value of the attribute. This process is
then repeated at the node on this branch and so on until a leaf node is
reached.
• Each node is connected to a set of possible answers.
• Each nonleaf node is connected to a test that splits its set of possible
answers into subsets corresponding to different test results.
• Each branch carries a particular test result's subset to another node.
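Below is a minimal Python sketch of the traversal just described. The nested-dict tree representation and the attribute names (from the classic PlayTennis example) are illustrative assumptions, not from these slides:

```python
# Minimal sketch: classify an instance by walking from the root to a leaf.
# The nested-dict tree format is a hypothetical representation.

def classify(tree, instance):
    """Follow the branch matching the instance's value at each tested node."""
    while isinstance(tree, dict):          # internal node: test an attribute
        attribute = tree["attribute"]      # attribute tested at this node
        value = instance[attribute]        # the instance's value for it
        tree = tree["branches"][value]     # descend the matching branch
    return tree                            # leaf: holds the classification

# A tiny hand-built tree for the classic PlayTennis data.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"strong": "no", "weak": "yes"}},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))  # -> no
```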
Appropriate Problems for Decision Tree Learning
• Decision tree learning is generally best suited to problems with the
following characteristics:
• Instances are represented by attribute-value pairs.
• There is a finite list of attributes (e.g., hair color), and each instance stores a
value for each attribute (e.g., blonde).
• When each attribute has a small number of distinct values (e.g., blonde, brown,
red), it is easier for the decision tree to reach a useful solution.
• The algorithm can be extended to handle real-valued attributes (e.g., a
floating-point temperature).
Contd.
• The target function has discrete output values.
• A decision tree classifies each example as one of the output values.
• The simplest case exists when there are only two possible classes (Boolean classification).
• However, it is easy to extend the decision tree to produce a target function with more than two
possible output values.
• Although it is less common, the algorithm can also be extended to produce a target
function with real-valued outputs.
• Disjunctive descriptions may be required.
• Decision trees naturally represent disjunctive expressions.
• The training data may contain errors.
• Errors in the classification of examples, or in the attribute values describing those
examples, are handled well by decision trees, making them a robust learning method.
• Decision trees are, however, prone to overfitting the data.
Contd.
• The training data may contain missing attribute values.
• Decision tree methods can be used even when some training examples have
unknown values (e.g., humidity is known for only a fraction of the examples).
• After a decision tree is learned, it can also be re-represented as a set of
if-then rules in order to improve readability.
Basic Decision Tree Learning
• A decision tree is constructed by looking for regularities in data.
• Quinlan's ID3 Algorithm for Constructing a Decision Tree
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm)
• Tree is constructed in a top-down recursive divide-and-conquer
manner
• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are
discretized in advance)
• Examples are partitioned recursively based on selected
attributes
• Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
• Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
• There are no samples left
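The loop above can be written compactly. The following is an illustrative Python sketch of ID3-style induction under the same stopping conditions; the data layout (a list of dicts plus a target key) and the helper names are assumptions for this sketch, not from the slides:

```python
# Greedy, top-down, recursive divide-and-conquer induction (ID3-style),
# selecting test attributes by information gain.
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the class distribution among `examples`."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def info_gain(examples, attribute, target):
    """Entropy reduction from partitioning `examples` on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    classes = [ex[target] for ex in examples]
    if len(set(classes)) == 1:           # all samples in one class: pure leaf
        return classes[0]
    if not attributes:                   # no attributes left: majority vote
        return Counter(classes).most_common(1)[0][0]
    # Greedy choice: the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    node = {"attribute": best, "branches": {}}
    # Branches are created only for values actually present in the data, so
    # the "no samples left" case does not arise in this sketch; with a fixed
    # global value list, an empty branch would take the parent's majority class.
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        node["branches"][value] = id3(
            subset, [a for a in attributes if a != best], target)
    return node
```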
Brief Review of Entropy

Entropy measures the impurity of a collection of tuples. For $m$ classes occurring with probabilities $p_1, \dots, p_m$:

$$H = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

For the binary case ($m = 2$), $H = -p \log_2(p) - (1 - p)\log_2(1 - p)$; it equals 0 for a pure collection ($p = 0$ or $p = 1$) and reaches its maximum of 1 bit at $p = 0.5$.

[Figure: entropy curve for $m = 2$]
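As a quick numeric check of the $m = 2$ case (an illustrative Python snippet, not from the slides):

```python
# Binary (m = 2) entropy at a few probabilities.
import math

def binary_entropy(p):
    if p in (0.0, 1.0):        # lim x->0 of x*log2(x) is 0
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))    # 1.0   (maximum uncertainty)
print(binary_entropy(0.9))    # ~0.469
print(binary_entropy(1.0))    # 0.0   (pure collection)
```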
Attribute Selection Measure: Information Gain (ID3/C4.5)

■ Select the attribute with the highest information gain.
■ Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to
class $C_i$, estimated by $|C_{i,D}|/|D|$.
■ Expected information (entropy) needed to classify a tuple in $D$:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

■ Information needed (after using $A$ to split $D$ into $v$ partitions) to
classify $D$:

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$

■ Information gained by branching on attribute $A$:

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$
Example: Attribute Selection by Information Gain

■ Class P: buys_computer = “yes” (9 tuples)
■ Class N: buys_computer = “no” (5 tuples)

$$\mathrm{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$

$$\mathrm{Info}_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$$

Here $\frac{5}{14} I(2,3)$ means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence

$$\mathrm{Gain}(age) = \mathrm{Info}(D) - \mathrm{Info}_{age}(D) = 0.246$$

Similarly, $\mathrm{Gain}(income) = 0.029$, $\mathrm{Gain}(student) = 0.151$, and $\mathrm{Gain}(credit\_rating) = 0.048$; age has the highest gain, so it is selected as the splitting attribute.
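These figures can be checked numerically; the short Python sketch below assumes the standard 14-example buys_computer dataset used on this slide:

```python
# Numeric check of the hand computation above.
import math

def I(*counts):
    """Expected information I(c1, c2, ...) for a class distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

info_D = I(9, 5)                                             # 0.940
info_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)  # 0.694
print(round(info_D, 3), round(info_age, 3), round(info_D - info_age, 3))
# -> 0.94 0.694 0.247  (0.246 on the slide, which rounds intermediate values)
```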
Resulting Decision Tree
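Written as nested conditionals, the tree induced from this example looks as follows (a sketch assuming the standard buys_computer dataset, where age has the highest gain and becomes the root):

```python
# The induced tree as nested conditionals: one test per internal node.
def buys_computer(age, student, credit_rating):
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    elif age == "31..40":
        return "yes"          # this partition is pure: all examples are "yes"
    else:  # age == ">40"
        return "yes" if credit_rating == "fair" else "no"
```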
Overfitting and Tree Pruning
• Overfitting: An induced tree may overfit the training data
• Too many branches, some may reflect anomalies due to
noise or outliers
• Poor accuracy for unseen samples
• Two approaches to avoid overfitting
• Prepruning: Halt tree construction early; do not split a node if
this would result in the goodness measure falling below a
threshold
• Difficult to choose an appropriate threshold
• Postpruning: Remove branches from a “fully grown”
tree—get a sequence of progressively pruned trees
• Use a set of data different from the training data to decide which is
the “best pruned tree”
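As an illustration of both approaches, here is a sketch using scikit-learn (the slides do not prescribe a library; the dataset and thresholds are arbitrary choices): pre-pruning via growth limits set up front, and post-pruning via cost-complexity pruning scored on held-out validation data.

```python
# Pre-pruning: stop growth early via thresholds. Post-pruning: grow a full
# tree, then pick the best pruned tree using a separate validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-pruning: halt splitting when a node is too deep or too small.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10)
pre.fit(X_train, y_train)

# Post-pruning: the cost-complexity path yields a sequence of progressively
# pruned trees (one per alpha); keep the one that scores best on validation.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val),
)
print(pre.score(X_val, y_val), best.score(X_val, y_val))
```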

If-Then Rules
• Extracting Classification Rules from Trees
• Goal: Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf;
• Each attribute-value pair along a path forms a conjunction;
• The leaf node holds the class prediction
• Rules are easier to understand
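A sketch of this extraction, reusing the hypothetical nested-dict tree format from the earlier classification example:

```python
# One IF-THEN rule per root-to-leaf path; the attribute tests along the
# path are conjoined, and the leaf supplies the class prediction.
def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):            # leaf: emit one finished rule
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        test = f"{tree['attribute']} = {value}"
        rules.extend(extract_rules(subtree, conditions + (test,)))
    return rules

# With the buys_computer tree this yields rules such as:
#   IF age = <=30 AND student = no THEN class = no
#   IF age = 31..40 THEN class = yes
```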
Exercise
