Decision Tree Using ID3 Algorithm

Compiled by Dr. Shashank Shetty
DECISION TREE REPRESENTATION

• Decision trees classify instances by sorting them down the tree from
the root to some leaf node, which provides the classification of the
instance.
• Each node in the tree specifies a test of some attribute of the
instance, and each branch descending from that node corresponds to
one of the possible values for this attribute.
• An instance is classified by starting at the root node of the tree,
testing the attribute specified by this node, then moving down the
tree branch corresponding to the value of the attribute in the given
example. This process is then repeated for the subtree rooted at the
new node.
• Decision trees represent a disjunction of conjunctions of constraints
on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of
attribute tests, and the tree itself to a disjunction of these
conjunctions. For example, a decision tree that tests Outlook at its root,
then Humidity on the Sunny branch and Wind on the Rain branch,
corresponds to the expression
(Outlook = Sunny ∧ Humidity = Normal) ∨
(Outlook = Overcast) ∨
(Outlook = Rain ∧ Wind = Weak)
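To make this concrete, here is a minimal sketch (not part of the original slides) that encodes a tree matching the expression above as nested dictionaries and classifies an instance by sorting it down from the root. The Yes/No class labels, the extra attribute values on the negative branches (High, Strong), and the dictionary representation are assumptions chosen for illustration.

```python
# Hypothetical encoding of the tree behind the expression above:
# an internal node is {attribute: {value: subtree-or-label}}, a leaf is a label.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"Normal": "Yes", "High": "No"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, instance):
    """Sort an instance down the tree from the root to a leaf label."""
    while isinstance(node, dict):          # stop when we reach a leaf label
        attribute = next(iter(node))       # the attribute tested at this node
        value = instance[attribute]        # the instance's value for that attribute
        node = node[attribute][value]      # follow the matching branch
    return node

print(classify(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Weak"}))  # Yes
```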
Appropriate Problems for Decision Tree Learning:
• Decision tree learning is generally best suited to problems with the
following characteristics:
1. Instances are represented by attribute-value pairs – Instances are
described by a fixed set of attributes and their values.
2. The target function has discrete output values – The decision tree assigns
a Boolean classification (e.g., yes or no) to each example. Decision tree
methods easily extend to learning functions with more than two possible
output values.
3. Disjunctive descriptions may be required.
4. The training data may contain errors – Decision tree learning methods are
robust to errors, both errors in classifications of the training examples and
errors in the attribute values that describe these examples.
5. The training data may contain missing attribute values – Decision tree
methods can be used even when some training examples have unknown
values.
What is ID3?
• A mathematical algorithm for building the decision tree.
• Invented by J. Ross Quinlan in 1979.
• Uses Information Theory invented by Shannon in 1948.
• Builds the tree from the top down, with no backtracking.
• Information Gain is used to select the most useful attribute for
classification.
Entropy
• A measure of the homogeneity of a sample.
• A completely homogeneous sample has entropy of 0.
• An equally divided two-class sample has entropy of 1.
• Entropy(S) = - p+ log2(p+) - p- log2(p-) for a sample S of positive and
negative examples, where p+ and p- are the proportions of positive and
negative examples in S.
• More generally, for a target attribute that takes on c different values:
Entropy(S) = Σ (i = 1 to c) of - pi log2(pi), where pi is the proportion of S
belonging to class i.
Entropy Example
For a sample S of 14 examples with 9 positive and 5 negative examples:
Entropy(S) = - (9/14) log2(9/14) - (5/14) log2(5/14)
= 0.940
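As a quick check of the arithmetic above, the short sketch below (not part of the original slides) computes the entropy of a sample from its per-class counts:

```python
import math

def entropy(class_counts):
    """Entropy of a sample, given the number of examples in each class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue                # treat 0 * log2(0) as 0
        p = count / total           # proportion of the sample in this class
        result -= p * math.log2(p)
    return result

print(round(entropy([9, 5]), 3))    # 0.94  (the example above)
print(entropy([7, 7]))              # 1.0   (equally divided sample)
print(entropy([14, 0]))             # 0.0   (completely homogeneous sample)
```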
Information Gain (IG)
• The information gain is based on the decrease in entropy after a dataset is split on an
attribute.
• Which attribute creates the most homogeneous branches?
• First the entropy of the total dataset is calculated.
• The dataset is then split on the different attributes.
• The entropy of each branch is calculated; the branch entropies are then
summed, each weighted by the proportion of examples reaching that branch,
to give the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the Information Gain, or decrease in entropy.
• The attribute that yields the largest IG is chosen for the decision node.
Information Gain (cont’d)
• A branch set with entropy of 0 is a leaf node.
• Otherwise, the branch needs further splitting to classify its dataset.
• The ID3 algorithm is run recursively on the non-leaf branches, until all data
is classified.
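The splitting procedure described in the last two slides can be written compactly. The sketch below is an illustration rather than code from the slides; it reuses the entropy helper defined above and assumes the dataset is a list of attribute-value dictionaries with a parallel list of class labels.

```python
from collections import Counter, defaultdict

def information_gain(examples, labels, attribute):
    """Decrease in entropy after splitting the examples on one attribute."""
    # Entropy of the whole dataset before the split.
    before = entropy(list(Counter(labels).values()))

    # Group the class labels by the value the attribute takes in each example.
    branches = defaultdict(list)
    for example, label in zip(examples, labels):
        branches[example[attribute]].append(label)

    # Entropy after the split: branch entropies weighted by branch size.
    after = sum(
        len(branch) / len(labels) * entropy(list(Counter(branch).values()))
        for branch in branches.values()
    )
    return before - after
```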
Input Parameters:

• Examples: the training examples with known attribute values and corresponding class labels.
• Target attribute: the attribute whose value we want to predict.
• Attributes: a list of attributes that may be used to make decisions.


Algorithm Flow:

a. Create a Root node for the tree.

b. If all Examples have the same class label:
• If all Examples are positive, return a single-node tree Root with label = "+".
• If all Examples are negative, return a single-node tree Root with label = "-".

c. If Attributes is empty:
• Return a single-node tree Root with the label set to the most common value of the Target attribute in Examples.

d. Otherwise, start the decision-making process:
i. Calculate the entropy of the current dataset (Examples) using the entropy formula given earlier.
ii. For each attribute A in Attributes, calculate the information gain
Gain(Examples, A) = Entropy(Examples) - Σ (v in Values(A)) (|Examples_v| / |Examples|) Entropy(Examples_v),
where Examples_v is the subset of Examples for which A has value v.
iii. Select the attribute with the highest information gain as the decision attribute for the Root node.
iv. For each possible value vi of the selected attribute:
- Create a new branch below the Root node corresponding to the test "Attribute = vi".
- Let Examples_vi be the subset of Examples that have value vi for the selected attribute.
- If Examples_vi is empty, add a leaf node below this branch with the label set to the most common value of the Target attribute in Examples.
- Otherwise, grow a subtree below this branch by recursively applying ID3 to Examples_vi with the Target attribute and Attributes minus the selected attribute.

e. Return the Root node of the decision tree.
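Under the same assumptions as the earlier sketches (list-of-dicts examples, a parallel label list, and the entropy and information_gain helpers defined above), a minimal recursive implementation of this flow might look as follows; it is an illustration, not the exact algorithm listing from the slides.

```python
from collections import Counter

def id3(examples, labels, attributes):
    """Grow a decision tree (nested dicts; leaves are class labels)."""
    # Step b: all examples share one label -> single-node tree with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # Step c: no attributes left to test -> most common label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step d: choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]

    # Step iv: one branch per value of the chosen attribute seen in the data.
    for value in {e[best] for e in examples}:
        sub_examples = [e for e in examples if e[best] == value]
        sub_labels = [l for e, l in zip(examples, labels) if e[best] == value]
        if not sub_labels:
            # Mirrors the empty-subset case on the slide; it cannot actually
            # occur here because values are taken from the observed examples.
            tree[best][value] = Counter(labels).most_common(1)[0][0]
        else:
            tree[best][value] = id3(sub_examples, sub_labels, remaining)
    return tree
```

A tree returned by this sketch can then be passed to the classify helper from the first sketch to label new instances, mirroring the top-down classification procedure described at the start of these notes.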
Hypothesis Space Search In Decision Tree
Learning
• In the process of decision tree learning, like with the ID3 algorithm,
we're essentially trying to find the best tree structure that accurately
classifies our training data. This involves exploring various hypotheses
or potential decision trees to find the one that fits our data the best.
• The hypothesis space searched by ID3 is the set of possible decision
trees. ID3 performs a simple-to-complex, hill-climbing search through
this hypothesis space, beginning with the empty tree, then
considering progressively more elaborate hypotheses in search of a
decision tree that correctly classifies the training data.
Hypothesis Space Search In Decision Tree
Learning
• The goal is to find the tree that maximizes the information gain, which
essentially means it helps to classify the training data better.
• One key advantage of ID3 is that its hypothesis space contains all possible
decision trees that can be constructed from the available attributes, so the
target function is guaranteed to be representable in that space. However, it only maintains a single
hypothesis at any given time, unlike some other methods that keep
track of multiple consistent hypotheses. This limitation means it can't
explore alternative trees or ask new questions to improve its
understanding.
Hypothesis Space Search In Decision Tree
Learning
• Another thing to note is that ID3 doesn't backtrack once it selects an
attribute to split the data at a certain level of the tree. This means it
might get stuck at locally optimal solutions, missing out on potentially
better trees along different paths. To address this, there are
extensions like post-pruning, which involves refining the tree
structure after it's been constructed.
• ID3 also differs from methods that make decisions based on individual
training examples; instead, it uses statistical properties of the entire
dataset to guide its decisions. This makes it less sensitive to errors in
individual examples and allows it to handle noisy data by accepting
hypotheses that might not perfectly fit the training data.
Inductive Bias in Decision Tree Learning
• The inductive bias in decision tree learning, specifically in the ID3
algorithm, refers to the set of assumptions and preferences guiding
how the algorithm generalizes from observed training examples to
classify unseen instances.
• In simpler terms, it's like the inherent tendencies or rules that ID3
follows when it's making decisions about how to classify things based
on the data it has seen.
Inductive Bias in Decision Tree Learning
• Preference for Shorter Trees: ID3 prefers simpler decision trees over
complex ones. This means it likes to keep the rules as concise as
possible. It does this by selecting the first acceptable tree it
encounters during its search, favoring shorter paths through the tree.
• Placing High Information Gain Attributes Close to the Root: ID3 also
tends to prioritize attributes that provide the most useful information
for classification. It tries to put these attributes closer to the top of
the decision tree, as they can quickly split the data into meaningful
subsets.
Hypothesis Space:
• Definition: The hypothesis space refers to the set of all possible hypotheses (models or rules) that a learning algorithm can consider to explain the data.
• Characteristics: It encompasses the range of potential solutions the algorithm can explore during the learning process.
• Example: In decision tree learning, the hypothesis space includes all possible decision trees that can be formed using different combinations of attributes and decision rules.
Inductive Bias:
• Definition: Inductive bias refers to the set of assumptions, preferences, or constraints that a learning algorithm incorporates into its decision-making process when generalizing from observed data to classify unseen instances.
• Characteristics: It guides the algorithm's learning process by favoring certain hypotheses over others based on predefined criteria or principles.
• Example: In decision tree learning, the inductive bias might include preferences for simpler trees (those with fewer branches or nodes), favoring attributes with higher information gain, or placing important attributes closer to the root of the tree.
• In summary, the hypothesis space defines the range of possible
solutions that a learning algorithm considers, while the inductive bias
influences the algorithm's decision-making process within that space
by favoring certain types of hypotheses over others based on
predefined principles or preferences.
