5. Decision Tree
[Hand-drawn sketch of a decision tree: the known data are split at the root node, internal nodes test a condition on an attribute (e.g. age), and each leaf node carries a class value.]
Overview
§ A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences
§ A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules
§ Tree-based learning algorithms are considered to be among the best and most widely used supervised learning methods
§ Tree-based methods empower predictive models with high accuracy, stability and ease of interpretation
§ Unlike linear models, they map non-linear relationships quite well
§ Decision Tree algorithms are referred to as CART (Classification and Regression Trees)
§ Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.
§ Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
§ Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
§ Pruning: When we remove sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting.
§ Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.
§ It is one of the more popular classification algorithms being used in Data Mining
§ Determination of likely buyers of a product using demographic data to enable targeting of a limited advertisement budget
§ Prediction of the likelihood of default for applicant borrowers using predictive models generated from historical data
§ Help with prioritization of emergency room patient treatment using a predictive model based on initial patient data and other measurements
§ Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal
§ Because of their simplicity, tree diagrams have been used in a broad range of industries and disciplines
§ Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems
§ It works for both categorical and continuous input and output variables
§ In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator in the input variables
§ Place the best attribute of the dataset at the root of the tree.
§ Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
§ Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree.
§ The order of placing attributes as the root or as internal nodes of the tree is decided using a statistical approach such as information gain or the Gini index (a minimal sketch of these steps follows)
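The following is a minimal, illustrative sketch of the splitting procedure described above, not an implementation from the source. It assumes categorical attributes and rows given as (attribute-dict, label) pairs, and it uses Gini impurity as the statistical criterion; the names build_tree, best_split and gini are made up for this example.

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, attributes):
    # Pick the attribute whose split gives the lowest weighted impurity
    labels = [y for _, y in rows]
    best_attr, best_score = None, gini(labels)
    for attr in attributes:
        groups = {}
        for x, y in rows:
            groups.setdefault(x[attr], []).append(y)
        score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr

def build_tree(rows, attributes):
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not attributes:       # stopping criteria: pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]   # leaf node holds the majority class
    attr = best_split(rows, attributes)               # step 1: place the best attribute at this node
    if attr is None:                                  # no attribute improves purity -> make a leaf
        return Counter(labels).most_common(1)[0][0]
    branches = {}
    for x, y in rows:                                 # step 2: split into subsets sharing an attribute value
        branches.setdefault(x[attr], []).append((x, y))
    remaining = [a for a in attributes if a != attr]
    return {"attribute": attr,
            "children": {value: build_tree(subset, remaining)   # step 3: repeat on each subset
                         for value, subset in branches.items()}}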
Regression
§ If a Decision Tree has a continuous target variable, it is called a Continuous Variable Decision Tree (a regression tree)
§ Easy to Understand
§ Decision tree output is very easy to understand, even for people from a non-analytical background
§ It does not require any statistical knowledge to read and interpret them
§ Its graphical representation is very intuitive, and users can easily relate it to their hypotheses
§ Useful in Data exploration
§ Decision tree is one of the fastest ways to identify the most significant variables and the relation between two or more variables
§ With the help of decision trees, we can create new variables / features that have better power to predict the target variable
§ It can also be used in data exploration stage
§ For example, when working on a problem where information is available in hundreds of variables, a decision tree will help to identify the most significant ones (see the example below)
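A hedged example of this kind of exploration, assuming scikit-learn is available (the dataset and variable names are illustrative, not from the source): a fitted tree's feature_importances_ attribute ranks the variables by how much impurity reduction each one contributes.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Rank the input variables by their contribution to the tree's splits
for name, importance in sorted(zip(data.feature_names, clf.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.3f}")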
§ Overfitting
§ Decision-tree learners can create over-complex trees that do not generalize the data well. This is called
overfitting.
§ Overfitting is one of the most practical difficulties for decision tree models
§ This problem can be addressed by setting constraints on the model parameters and by pruning (see the example below)
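A hedged sketch of both remedies using scikit-learn (assumed available); the parameter values below are illustrative only, not tuned recommendations from the source.

from sklearn.tree import DecisionTreeClassifier

# Constraining growth: cap the depth and require enough samples per split and per leaf
constrained_tree = DecisionTreeClassifier(
    max_depth=4,
    min_samples_split=50,
    min_samples_leaf=20,
)

# Pruning: cost-complexity (post-)pruning removes branches that add little value
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01)

# Both are then fitted as usual, e.g. constrained_tree.fit(X_train, y_train)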
§ Not fit for continuous variables
§ While working with continuous numerical variables, a decision tree loses information when it categorizes them into different categories.
§ Decision trees can be unstable because small variations in the data might result in a completely
different tree being generated. This is called variance, which needs to be lowered by methods
like bagging and boosting
§ Decision tree learners create biased trees if some classes dominate. It is therefore recommended to
balance the data set prior to fitting with the decision tree
§ Calculations can become complex when there are many class labels
§ Regression trees are used when the dependent variable is continuous; classification trees are used when the dependent variable is categorical.
§ In the case of a Regression Tree, the value obtained by a terminal node in the training data is the mean response of the observations falling in that region. Thus, if an unseen data observation falls in that region, we make its prediction with the mean value.
§ In the case of a Classification Tree, the value (class) obtained by a terminal node in the training data is the mode of the observations falling in that region. Thus, if an unseen data observation falls in that region, we make its prediction with the mode value (see the sketch after this list).
§ Both the trees divide the predictor space (independent variables) into distinct and non-overlapping regions.
§ Both the trees follow a top-down greedy approach known as recursive binary splitting.
§ This splitting process is continued until a user-defined stopping criterion is reached
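A tiny illustrative sketch of the two leaf predictions described above (the helper names are made up for this example): a regression leaf predicts the mean of the training responses in its region, a classification leaf predicts the mode.

from statistics import mean, mode

def regression_leaf_prediction(responses):
    # Mean response of the training observations in this region
    return mean(responses)              # e.g. mean([3.1, 2.9, 3.4]) -> about 3.13

def classification_leaf_prediction(labels):
    # Mode (most frequent class) of the training observations in this region
    return mode(labels)                 # e.g. mode(["yes", "yes", "no"]) -> "yes"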
§ The Gini index says that if we select two items from a population at random, then they must be of the same class; the probability of this is 1 if the population is pure
§ CART (Classification and Regression Tree) uses Gini method to create binary splits.
Example (Gini impurity of a node): suppose a node contains 5 observations, of which 2 are rainy, 2 are sunny and 1 is overcast. Then

Gini = 1 − [P(rainy)² + P(sunny)² + P(overcast)²]
     = 1 − [(2/5)² + (2/5)² + (1/5)²]
     = 1 − [0.16 + 0.16 + 0.04]
     = 0.64
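The same calculation in code, as a minimal check of the worked example above (the function name gini_impurity is made up for this example):

def gini_impurity(proportions):
    # Gini impurity = 1 - sum of squared class proportions in the node
    return 1.0 - sum(p ** 2 for p in proportions)

print(gini_impurity([2/5, 2/5, 1/5]))   # 1 - (0.16 + 0.16 + 0.04) ≈ 0.64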
§ It is an algorithm to find out the statistical significance of the differences between sub-nodes and the parent node
§ We measure it by the sum of squares of standardized differences between the observed and expected frequencies of the target variable
§ It works with categorical target variable “Success” or “Failure”.
§ It can perform two or more splits.
§ The higher the value of Chi-Square, the higher the statistical significance of the differences between the sub-node and the parent node
§ It generates a tree called CHAID (Chi-square Automatic Interaction Detector); a small worked sketch follows
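A hedged sketch of the chi-square measure for one candidate split on a Success/Failure target; the counts below are invented purely for illustration and are not from the source.

# Observed Success/Failure counts in the two sub-nodes of a candidate split (illustrative numbers)
observed = {
    "left":  {"Success": 30, "Failure": 10},
    "right": {"Success": 20, "Failure": 40},
}

total = sum(sum(node.values()) for node in observed.values())
class_totals = {c: sum(node[c] for node in observed.values()) for c in ("Success", "Failure")}

chi_square = 0.0
for node in observed.values():
    node_total = sum(node.values())
    for cls, actual in node.items():
        expected = node_total * class_totals[cls] / total      # expected frequency under the parent's class mix
        chi_square += (actual - expected) ** 2 / expected       # squared standardized difference

print(chi_square)   # a higher value means a more significant difference between sub-nodes and parent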
§ A less impure node requires less information to describe it, and a more impure node requires more information
§ Information theory provides a measure of this degree of disorganization in a system, known as entropy
§ If the sample is completely homogeneous, the entropy is zero, and if the sample is equally divided (50% / 50%), it has an entropy of one
§ Entropy can be calculated using the formula Entropy = −p·log₂(p) − q·log₂(q), where p and q are the probabilities of success and failure in that node (see the sketch below)
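A minimal check of this formula in code (the function name entropy is made up for this example):

from math import log2

def entropy(p):
    # Entropy of a node with success probability p and failure probability q = 1 - p
    q = 1.0 - p
    if p in (0.0, 1.0):
        return 0.0                      # completely homogeneous node
    return -p * log2(p) - q * log2(q)

print(entropy(0.5))   # 1.0 -> equally divided (50% / 50%) node, maximum disorder
print(entropy(1.0))   # 0.0 -> pure node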