
Decision Tree Learning

Presented by
Don Baechtel
Decision tree learning
• Used in statistics, data mining and machine learning.
• Uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
• More descriptive names for such tree models are
classification trees or regression trees.
• In these tree structures, leaves represent
classifications and branches represent conjunctions
of features that lead to those classifications.
Decision Analysis
• A decision tree can be used to visually and explicitly represent decisions and decision making.
• In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.
Example Decision Tree
Decision tree learning
• Decision tree learning is a common method used in data mining.
• The goal is to create a model that predicts the value of a target
variable based on several input variables.
• Each interior node corresponds to one of the input variables;
• there are edges to children for each of the possible values of that
input variable.
• Each leaf represents a value of the target variable given the
values of the input variables represented by the path from the
root to the leaf.
• Trees can also be described as a combination of mathematical and computational techniques to aid the description, categorization and generalization of a given set of data.
Tree Learning
• A tree can be "learned" by splitting the source set
into subsets based on an attribute value test.
• This process is repeated on each derived subset in
a recursive manner called recursive partitioning.
• The recursion is complete when all items in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions, as sketched below.
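A minimal Python sketch of recursive partitioning, assuming toy data rows stored as dictionaries and a naive attribute ordering instead of a true "best split" criterion (the row format, attribute names and learn_tree function are illustrative, not part of the original slides):

def all_same_target(rows):
    return len({r["target"] for r in rows}) <= 1

def learn_tree(rows, attributes):
    # Stop when the subset is pure or there are no attributes left to split on.
    if all_same_target(rows) or not attributes:
        targets = [r["target"] for r in rows]
        return max(set(targets), key=targets.count)   # leaf: majority target value
    attribute = attributes[0]   # naive choice; real learners pick the "best" attribute
    children = {}
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        children[value] = learn_tree(subset, attributes[1:])
    return (attribute, children)   # interior node: attribute plus one child per value

data = [
    {"outlook": "sunny", "windy": "no",  "target": "play"},
    {"outlook": "sunny", "windy": "yes", "target": "stay"},
    {"outlook": "rain",  "windy": "no",  "target": "play"},
    {"outlook": "rain",  "windy": "yes", "target": "stay"},
]
print(learn_tree(data, ["windy", "outlook"]))   # ('windy', {'no': 'play', 'yes': 'stay'})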
Decision Tree Types
• Classification tree analysis is when the predicted outcome is the
class to which the data belongs.
• Regression tree analysis is when the predicted outcome can be
considered a real number (e.g. the price of a house, or a
patient’s length of stay in a hospital).
• Classification And Regression Tree (CART) analysis is used to
refer to both of the above procedures.
• CHi-squared Automatic Interaction Detection (CHAID) performs multi-level splits when computing classification trees.
• A Random Forest classifier uses a number of decision trees, in
order to improve the classification rate.
• Boosted Trees can be used for regression-type and classification-
type problems.
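A short sketch contrasting classification and regression trees, assuming the scikit-learn Python library (not one of the tools named in these slides) is installed; the tiny datasets are made up for illustration:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 0], [0, 1], [1, 1]]

# Classification tree: the predicted outcome is a class label.
clf = DecisionTreeClassifier().fit(X, ["no", "no", "yes", "yes"])
print(clf.predict([[0, 1]]))    # expected: ['yes']

# Regression tree: the predicted outcome is a real number (e.g. a price).
reg = DecisionTreeRegressor().fit(X, [100.0, 110.0, 180.0, 200.0])
print(reg.predict([[1, 1]]))    # expected: [200.]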
Formulae
• The algorithms that are used for constructing decision trees
usually work top-down by choosing a variable at each step that
is the next best variable to use in splitting the set of items.
• "Best" is defined by how well the variable splits the set into
homogeneous subsets that have the same value of the target
variable.
• Different algorithms use different formulae for measuring
"best".
• These formulae are applied to each candidate subset, and the
resulting values are combined (e.g., averaged) to provide a
measure of the quality of the split.
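A minimal sketch of scoring one candidate split, assuming a simple misclassification-rate impurity; the function names and the size-weighted average used to combine the per-subset values are illustrative choices, not prescribed by the slides:

def misclassification_impurity(labels):
    # Fraction of items that do not belong to the majority class of the subset.
    majority = max(set(labels), key=labels.count)
    return 1.0 - labels.count(majority) / len(labels)

def split_score(subsets):
    # Size-weighted average of the per-subset impurities; lower is better.
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * misclassification_impurity(s) for s in subsets)

print(split_score([["yes", "yes"], ["no", "no"]]))   # 0.0 - a perfect split
print(split_score([["yes", "no"], ["yes", "no"]]))   # 0.5 - an uninformative split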
Gini impurity
• Used by the CART algorithm, Gini impurity is a
measure of how often a randomly chosen element
from the set would be incorrectly labeled if it were
randomly labeled according to the distribution of
labels in the subset.
• Gini impurity can be computed by summing the
probability of each item being chosen times the
probability of a mistake in categorizing that item.
• It reaches its minimum (zero) when all cases in the
node fall into a single target category.
Gini impurity
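The original slide presumably showed the usual formula, Gini(p) = 1 - sum_i p_i^2 (equivalently sum_i p_i * (1 - p_i), where p_i is the fraction of items with label i). A minimal Python sketch of that definition:

from collections import Counter

def gini_impurity(labels):
    # One minus the sum of squared class proportions in the subset.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "a"]))        # 0.0 - pure node, the minimum
print(gini_impurity(["a", "b", "a", "b"]))   # 0.5 - evenly mixed two-class node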
Decision tree advantages
• Simple to understand and interpret. People are able to understand decision tree models
after a brief explanation.
• Requires little data preparation. Other techniques often require data normalization, dummy variables to be created, and blank values to be removed.
• Able to handle both numerical and categorical data. Other techniques are usually
specialized in analyzing datasets that have only one type of variable. Ex: relation rules can
be used only with nominal variables while neural networks can be used only with numerical
variables.
• Uses a white box model. If a given situation is observable in a model the explanation for
the condition is easily explained by Boolean logic. An example of a black box model is an
artificial neural network since the explanation for the results is difficult to understand.
• Possible to validate a model using statistical tests. That makes it possible to account for
the reliability of the model.
• Robust. Performs well even if its assumptions are somewhat violated by the true model
from which the data were generated.
• Performs well with large data sets in a short time. Large amounts of data can be analyzed using personal computers in a time short enough to enable stakeholders to make decisions based on the analysis.
Limitations
• The problem of learning an optimal decision tree is known to be NP-complete
under several aspects of optimality and even for simple concepts. Consequently,
practical decision-tree learning algorithms are based on heuristic algorithms such
as the greedy algorithm where locally optimal decisions are made at each node.
Such algorithms cannot guarantee to return the globally optimal decision tree.
Recent developments suggest the use of genetic algorithms to avoid locally optimal decisions and to search the decision tree space with little a priori bias.
• Decision-tree learners can create over-complex trees that do not generalize the
data well. This is called overfitting. Mechanisms such as pruning are necessary to
avoid this problem.
• There are concepts that are hard to learn because decision trees do not express
them easily, such as XOR, parity or multiplexer problems. In such cases, the
decision tree becomes prohibitively large. Approaches to solve the problem involve
either changing the representation of the problem domain (known as
propositionalization) or using learning algorithms based on more expressive
representations (such as statistical relational learning or
inductive logic programming).
Extending decision trees
with decision graphs
• In a decision tree, all paths from the root node to the leaf node
proceed by way of conjunction, or AND.
• In a decision graph, it is possible to use disjunctions (ORs) to join two or more paths together using Minimum Message Length (MML).
• Decision graphs have been further extended to allow for
previously unstated new attributes to be learnt dynamically and
used at different places within the graph.
• The more general coding scheme results in better predictive
accuracy and log-loss probabilistic scoring.
• In general, decision graphs infer models with fewer leaves than
decision trees.
Implementations
• Weka, a free and open-source data mining
suite, contains many decision tree algorithms.
• Orange, a free data mining software suite,
module orngTree.
• Sipina, a free decision tree software, including
an interactive tree builder.
Reference Materials
• Building Decision Trees in Python, from O'Reilly.
• An Addendum to "Building Decision Trees in Python", from O'Reilly.
• Decision Trees page at aaai.org, a page with
commented links.
• Decision tree implementation in Ruby (AI4R) at http://ai4r.rubyforge.org
