Trees

Decision Trees are non-parametric models used for classification and regression, constructed using nodes and branches to evaluate features and make predictions. They are easy to interpret and require less data to train, but are prone to overfitting and can be biased towards features with many levels. Techniques like ID3 and CART utilize measures such as Information Gain and Gini Index to determine the best attributes for splitting data.


Decision Trees

• In the machine learning world, decision trees are a kind of non-parametric model that can be used for both classification and regression.

• Decision trees are flexible models that do not increase their number of parameters as we add more features (if we build them correctly), and they can output either
• a categorical prediction (e.g. whether a plant is of a certain kind or not), or
• a numerical prediction (e.g. the price of a house).
Decision Trees
• They are constructed using two kinds of elements: nodes and branches.
• At each node, one of the features of the data is evaluated, either to split the observations during training or to make a specific data point follow a certain path when making a prediction.
• Decision trees are built by recursively evaluating different features and using, at each node, the feature that best splits the data.
Decision Trees
• The root node: the node that starts the tree. In a normal decision tree it evaluates the variable that best splits the data.
• Intermediate nodes: nodes where variables are evaluated but which are not the final nodes where predictions are made.
• Leaf nodes: the final nodes of the tree, where the prediction of a category or a numerical value is made.
Decision Trees
• A decision tree uses an if-then rule set which is mutually exclusive and exhaustive for classification.
• The rules are learned sequentially from the training data, one rule at a time.
• Each time a rule is learned, the tuples covered by the rule are removed.
• This process continues on the training set until a termination condition is met.
Decision Trees
• Advantages:
• The main advantage of decision trees is how easy they are to interpret. While other machine learning models are close to black boxes, decision trees provide a graphical and intuitive way to understand what the algorithm does.
• Compared to other machine learning algorithms, decision trees require less data to train.
• They can be used for classification and regression.
• They are simple.

• Disadvantages:
• They are quite prone to overfitting the training data and can be sensitive to outliers.
• They are weak learners: a single decision tree normally does not make great predictions, so multiple trees are often combined into 'forests' to give birth to stronger ensemble models.
• Bias towards features with many levels: decision trees tend to favour features with many levels or categories because they can create more splits, potentially leading to biased predictions.
Example 1: Decision Tree
• We need to classify the fruits in the basket
Example 2: Decision Tree
Quantifying impurity
• To quantify the uncertainty at each node, we first calculate its impurity.
• Impurity ranges from 0 to 1:
• low impurity means low uncertainty,
• high impurity means high uncertainty.
Quantifying impurity

Impurity = 1 − p, where p is the fraction of observations at the node that belong to the majority class (a pure node therefore has impurity 0).
Decision Tree Induction Techniques
• Decision tree induction is a top-down, recursive, divide-and-conquer approach.
• Choose an attribute and split the larger training set into smaller training sets.
• Different algorithms have been proposed to control
1. the choice of the best attribute to split on – the attribute selection measure
2. the splitting criteria
• Some of the algorithms are
• ID3
• CART
• Issues
• Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
• Determine when to stop splitting
Attribute Selection Measures
• An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes.

• If we were to split D into smaller partitions according to the outcomes of the splitting criterion, ideally each partition would be pure (i.e., all the tuples that fall into a given partition would belong to the same class).

• Conceptually, the “best” splitting criterion is the one that most closely results in such a scenario.
Measures of Node Impurity
• Entropy

• Information Gain – used by ID3 algorithm

• Gini Index – used by CART


Entropy
Entropy is an information-theory metric that measures the impurity or uncertainty in a group of observations. It determines how a decision tree chooses to split data and is used to measure how informative a node is.

In physics and mathematics, entropy refers to the randomness or uncertainty of a random variable X; in information theory, it refers to the impurity in a group of examples. Splitting on any attribute has the property that the average entropy of the resulting training subsets is less than or equal to that of the previous training set.

Information gain is the decrease in entropy: it is the difference between the entropy before the split and the weighted average entropy after the dataset is split on the given attribute values.

The attribute with the largest information gain is chosen as the splitting attribute.
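To make these definitions concrete, here is a minimal Python sketch (the function names and the example labels are illustrative, not from the slides) that computes the entropy of a labelled set and the information gain of a candidate split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: -sum(p_i * log2(p_i))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy before the split minus the weighted average entropy after it."""
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - after

# Example: 9 'yes' and 5 'no' labels give entropy of about 0.940 bits.
labels = ['yes'] * 9 + ['no'] * 5
print(round(entropy(labels), 3))   # 0.94
```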
Attribute Selection Measures – Information Gain
• ID3 uses information gain as its attribute selection measure.
• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
• Let node N represent or hold the tuples of partition D. The attribute with the highest information gain is chosen as the splitting attribute for node N.
Entropy
• A = {1, 1, 1, 1, 1, 2, 2, 2, 3}
• There are 3 distinct classes.
• p1 = 5/9, p2 = 3/9, p3 = 1/9
• E = p1·log2(1/p1) + p2·log2(1/p2) + p3·log2(1/p3)

• Entropy of a training set: if there are k classes c1, …, ck, and pi denotes the number of occurrences of class ci divided by the total number of instances, then
E(D) = −Σ_{i=1}^{k} pi·log2(pi)
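As a quick numerical check of the example above (a small sketch; the rounding is ours):

```python
from math import log2

A = [1, 1, 1, 1, 1, 2, 2, 2, 3]
counts = {c: A.count(c) for c in set(A)}       # {1: 5, 2: 3, 3: 1}
probs = [n / len(A) for n in counts.values()]  # [5/9, 3/9, 1/9]
E = sum(p * log2(1 / p) for p in probs)
print(round(E, 3))   # about 1.352 bits
```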
Entropy
What happens if all the observations belong to the same class? The entropy would be 0:

E = −(1·log2 1) = 0

Such a dataset does not have any impurity, and it would not be useful for learning.

However, if we have a dataset with, say, two classes, half yellow and half purple, the entropy will be 1:

E = −(0.5·log2 0.5 + 0.5·log2 0.5) = 1

This kind of dataset is good for learning.

Attribute Selection Measures –
Information Gain
• This attribute minimizes the information needed to classify the tuples in the resulting partitions.
• Such an approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the simplest) tree is found.

Info(D) = −Σ_{i=1}^{m} pi·log2(pi) is known as the entropy of D,

where pi is the nonzero probability that an arbitrary tuple in D belongs to class Ci and is estimated by |Ci,D| / |D|.

Info(D) is just the average amount of information needed to identify the class label of a tuple in D.
Attribute Selection Measures – Information Gain

Info(D) is the entropy of D before splitting. The expected information required after splitting D on attribute A (with v distinct values) is the weighted average entropy of the resulting partitions:

Info_A(D) = Σ_{j=1}^{v} (|Dj| / |D|) × Info(Dj)

Gain(A) = Info(D) − Info_A(D)

The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N. This is equivalent to saying that we want to partition on the attribute A that would do the “best classification,” so that the amount of information still required to finish classifying the tuples is minimal.
Example

Training set, D, of class-labeled tuples randomly selected from the


AllElectronics customer database.
Example

• Each attribute is discrete valued.

• Continuous-valued attributes have been


generalized.)

• The class label attribute, buys computer, has


two distinct values (namely, {yes, no});
therefore, there are two distinct classes (i.e., m
=2).

• Let class C1 correspond to yes and class C2


correspond to no.

• There are nine tuples of class yes and five tuples


Training set, D, of class-labeled tuples randomly selected fromno.
of class the A (root) node N is created for the
AllElectronics customer database. tuples in D.
Example

1. To find the splitting criterion for these tuples, we must


compute the information gain of each attribute using below
formula.
Example

2. we need to compute the expected information requirement


for each attribute using formula:

Let us compute for age attribute- with values youth, middle


aged, senior

Info(buys_computer|age=youth) = -2/5 log2(2/5) – 3/5log2(3/5)


= 0.97095
Info(buys_computer|age=middle_aged)=-4/4log2(4/4)=0
Info(buys_computer|age=senior)=-3/5log2(3/5)-2/5log2(2/5)=
0.097095

Infoage(D) = (5/14)*(0.97095) + (4/14)*(0) + (5/14)*(0.97095)


=0.694

Gain(age) = Info(D) − Info_age(D) = 0.940 − 0.694 = 0.246 bits


Example

Calculate the gain for the income, student and credit_rating attributes.


Example

Gain(income)= 0.029 bits

Gain(student)=0.151 bits

Gain(credit_rating)= 0.048 bits

Gain (age)= 0.246 bits

As age has the highest information gain, it is selected as the splitting attribute.
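The same arithmetic can be verified in a few lines. The per-value class counts for age (youth: 2 yes / 3 no, middle_aged: 4 yes / 0 no, senior: 3 yes / 2 no) come from the slides; the helper function is an illustrative sketch:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list/tuple of counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

# Class counts (yes, no) in each partition induced by 'age'.
partitions = {"youth": (2, 3), "middle_aged": (4, 0), "senior": (3, 2)}

info_D = entropy((9, 5))                                  # 0.940 bits for D
total = sum(sum(c) for c in partitions.values())          # 14 tuples
info_age = sum(sum(c) / total * entropy(c) for c in partitions.values())

print(round(info_D, 3), round(info_age, 3))               # 0.94 0.694
print("Gain(age) =", round(info_D - info_age, 3), "bits") # 0.247 (0.246 with the rounded values used on the slides)
```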
Example: Decision Tree

1. Compute the entropy for the dataset (the weather/play dataset; the step-by-step calculations are shown on the slides).
Example: Decision Tree
• Selection of the root node: outlook is chosen (it has the highest information gain).
Example: Decision Tree (rainy branch)

The rainy subset is split further using the same entropy and information-gain calculations (worked out step by step on the slides; one of the resulting subsets has entropy E = 0.982).

Person will go to play?


sunny
rain
overcast Windy: Strong
Humidity: High
Outlook: Sunny

Person will not go to play


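Read as the mutually exclusive if-then rules mentioned earlier, the finished tree can be sketched as plain code. This is a minimal sketch assuming the classic play-tennis tree (outlook at the root, humidity under sunny, windy under rainy), which is what the slides appear to use:

```python
def will_play(outlook, humidity, windy):
    """Classic play-tennis tree read as if-then rules (assumed structure)."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    if outlook == "rainy":
        return "no" if windy == "strong" else "yes"
    raise ValueError("unknown outlook value")

# The instance from the slide: sunny outlook, high humidity, strong wind.
print(will_play("sunny", "high", "strong"))   # 'no' -> person will not go to play
```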
CART Algorithm
• The information gain measure used in ID3 is biased towards tests with many outcomes: ID3 prefers to select attributes (e.g. outlook, humidity, windy) having a large number of values.
• L. Breiman, J. Friedman, R. Olshen and C. Stone proposed an algorithm in 1984 to build a binary decision tree, called the CART decision tree.
• CART stands for Classification And Regression Tree.
• CART is a technique that generates a binary decision tree: unlike ID3, in CART only two branches are created at each node.
• ID3 uses information gain as the measure to select the best attribute to split on; CART uses another measure called the Gini index, also known as the Gini Index of Diversity.
Gini Index
• The Gini index (Gini impurity) measures the degree or probability of a particular element being wrongly classified when it is randomly chosen.

• If all the elements belong to a single class, the node can be called pure. The Gini index varies between 0 and 1:
• 0 denotes that all elements belong to a certain class or there exists only one class (pure), and
• 1 denotes that the elements are randomly distributed across various classes (impure).

• A Gini index of 0.5 denotes elements equally distributed between two classes.
Formula for Gini Index
• The formula for the Gini index is:

Gini(D) = 1 − Σ_{i=1}^{m} pi²

where pi is the probability of an object being classified to a particular class.

While building the decision tree, we prefer to choose the attribute/feature with the least Gini index as the root node.
Gini Index of Diversity

Suppose a binary partition on A splits D into D1 and D2. Then the weighted-average Gini index of the split, denoted Gini_A(D), is given by

Gini_A(D) = (|D1| / |D|)·Gini(D1) + (|D2| / |D|)·Gini(D2)

This binary partition of D reduces the impurity, and the reduction in impurity is measured by

ΔGini(A) = Gini(D) − Gini_A(D)
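These formulas translate directly into code; a minimal sketch with illustrative names:

```python
def gini(counts):
    """Gini(D) = 1 - sum(p_i^2) for a class distribution given as counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(left_counts, right_counts):
    """Weighted-average Gini index of a binary partition D1, D2."""
    n1, n2 = sum(left_counts), sum(right_counts)
    n = n1 + n2
    return n1 / n * gini(left_counts) + n2 / n * gini(right_counts)

def gini_reduction(parent_counts, left_counts, right_counts):
    """Reduction in impurity achieved by the binary split."""
    return gini(parent_counts) - gini_split(left_counts, right_counts)
```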
Example: CART Algorithm
Build a decision tree for the data set EMP given in the table below.

Legend: Age: Y = young, M = middle-aged, O = old; Salary: L = low, M = medium, H = high; Job: G = government, P = private; Performance: A = average, E = excellent; Class (Select): Y = yes, N = no.

Tuple#  Age  Salary  Job  Performance  Select
1       Y    H       P    A            N
2       Y    H       P    E            N
3       M    H       P    A            Y
4       O    M       P    A            Y
5       O    L       G    A            Y
6       O    L       G    E            N
7       M    L       G    E            Y
8       Y    M       P    A            N
9       Y    L       G    A            Y
10      O    M       G    A            Y
11      Y    M       G    E            Y
12      M    M       P    E            Y
13      M    H       G    A            Y
14      O    M       P    E            N
Example: CART Algorithm
For the EMP data set: total samples = 14, Y = 9, N = 5, so

Gini(EMP) = 1 − (9/14)² − (5/14)² ≈ 0.459

Consider the calculation of the weighted Gini index, Gini_A(EMP), for Age, Salary, Job and Performance.
Example: CART Algorithm
Attribute of splitting: Age
• Age has three values, namely Y, M and O, so there are 6 subsets that could be considered for splitting (three distinct binary partitions).
• For the first partition, {Y | M,O}, Age is divided into two branches, Y and (M, O).
• For Y there are 5 instances (2 Yes and 3 No); for (M, O) there are 9 instances (7 Yes and 2 No).

Gini_{Y|M,O}(EMP) = (5/14)·[1 − (2/5)² − (3/5)²] + (9/14)·[1 − (7/9)² − (2/9)²] ≈ 0.3936
Example: CART Algorithm
Attribute of splitting: Age
• For the partition {M | Y,O}: M has 4 instances (4 Yes); Y and O together have 10 instances (5 Yes and 5 No).

Gini_{M|Y,O}(EMP) = (4/14)·[1 − (4/4)²] + (10/14)·[1 − (5/10)² − (5/10)²] ≈ 0.3571
Example: CART Algorithm
Attribute of splitting: Age
• For the partition {O | Y,M}: O has 5 instances (3 Yes and 2 No); Y and M together have 9 instances (6 Yes and 3 No).

Gini_{O|Y,M}(EMP) = (5/14)·[1 − (3/5)² − (2/5)²] + (9/14)·[1 − (6/9)² − (3/9)²] ≈ 0.4571
Example: CART Algorithm
Attribute of splitting: Age
• Gini_{Y|M,O}(EMP) ≈ 0.3936
• Gini_{M|Y,O}(EMP) ≈ 0.3571
• Gini_{O|Y,M}(EMP) ≈ 0.4571

Note that {Y,M | O} is the same partition as {O | Y,M}, and {Y,O | M} is the same as {M | Y,O}, so only three distinct binary splits need to be evaluated.

The best split on attribute Age is the one with the minimum weighted Gini index, i.e. {M | Y,O} with ≈ 0.3571.
Example: CART Algorithm
Attribute of splitting: Salary
The attribute Salary has three values, namely L, M and H, so again there are 6 candidate subsets for splitting, e.g. {L | M,H}, {M | L,H} and {H | L,M}. The weighted Gini index of each candidate split is computed and compared with the overall Gini index, Gini(EMP).
Example: CART Algorithm
Attribute of splitting: Job
Job is a binary attribute ({G | P}), so only one split needs to be evaluated.
Example: CART Algorithm
Attribute of splitting: Performance
Performance is also a binary attribute ({A | E}), so only one split needs to be evaluated.

Out of all these attributes, the one whose best split gives the maximum reduction in impurity (equivalently, the minimum weighted Gini index) is chosen to split the current subset. A sketch of this search is given below.
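Using Gini helpers like the ones sketched earlier, the search over candidate binary splits of the EMP table can be written as a short loop. The rows are transcribed from the table above; everything else (names, output format) is illustrative:

```python
from itertools import combinations

def gini(counts):
    """Gini(D) = 1 - sum(p_i^2) for a class distribution given as counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(left, right):
    """Weighted-average Gini index of a binary partition."""
    n = sum(left) + sum(right)
    return sum(left) / n * gini(left) + sum(right) / n * gini(right)

# EMP table rows: (Age, Salary, Job, Performance, Select)
rows = [
    ("Y","H","P","A","N"), ("Y","H","P","E","N"), ("M","H","P","A","Y"),
    ("O","M","P","A","Y"), ("O","L","G","A","Y"), ("O","L","G","E","N"),
    ("M","L","G","E","Y"), ("Y","M","P","A","N"), ("Y","L","G","A","Y"),
    ("O","M","G","A","Y"), ("Y","M","G","E","Y"), ("M","M","P","E","Y"),
    ("M","H","G","A","Y"), ("O","M","P","E","N"),
]
attrs = {"Age": 0, "Salary": 1, "Job": 2, "Performance": 3}

def class_counts(subset):
    yes = sum(1 for r in subset if r[4] == "Y")
    return (yes, len(subset) - yes)

best = None
for name, col in attrs.items():
    values = sorted({r[col] for r in rows})
    # every non-empty proper subset of values defines one binary split
    for k in range(1, len(values)):
        for left_vals in combinations(values, k):
            left = [r for r in rows if r[col] in left_vals]
            right = [r for r in rows if r[col] not in left_vals]
            g = gini_split(class_counts(left), class_counts(right))
            if best is None or g < best[0]:
                best = (round(g, 4), name, left_vals)

print(best)   # (0.3571, 'Age', ('M',)) -> split {M} vs {Y, O}
```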
Trees Vs Linear Model
• Comparing tree-based models with linear models is essential in machine learning, as they represent two different approaches to modeling data. Here's a
comparison between the two:

• Model Complexity:
Tree-Based Models: Decision trees can capture complex nonlinear relationships between features and the target variable. Ensemble methods like random forests and gradient
boosting machines further increase the model complexity by combining multiple decision trees.
Linear Models: Linear models assume a linear relationship between the features and the target variable. They are relatively simple and may struggle to capture complex nonlinear
relationships.

• Interpretability:
Tree-Based Models: Decision trees offer interpretability as they represent a series of simple if-else conditions. It's easy to understand the decision-making process within a tree
model.
Linear Models: Linear models are also interpretable, as coefficients associated with each feature indicate the direction and strength of the relationship between the feature and the
target variable.

• Handling of Features:
Tree-Based Models: Tree models can handle both numerical and categorical features without the need for preprocessing such as one-hot encoding.
Linear Models: Linear models typically require preprocessing for categorical variables, such as one-hot encoding, to represent them effectively.

• Robustness to Outliers:
Tree-Based Models: Decision trees are generally robust to outliers since they partition the feature space based on thresholds.
Linear Models: Linear models can be sensitive to outliers, especially in models like linear regression, where outliers can disproportionately influence the estimated coefficients.

• Overfitting:
Tree-Based Models: Decision trees are prone to overfitting, especially when the tree depth is not limited. Ensemble methods like random forests and gradient boosting machines
mitigate overfitting to some extent.
Linear Models: Linear models are less prone to overfitting, especially when the number of features is small compared to the number of observations.

• Computational Complexity:
Tree-Based Models: Building decision trees can be computationally expensive, especially for large datasets, although predicting with a trained tree is typically fast.
Linear Models: Linear models are computationally efficient both during training and prediction, making them suitable for large datasets.
Ensembles
• An ensemble is simply a collection of models that are all trained to perform the same task.
• An ensemble can consist of many different versions of the same model, or of many different types of models.
• The final output of an ensemble of classifiers is typically obtained through a (weighted) average or vote of the predictions of the different models in the ensemble.
• An ensemble of different models that all achieve similar generalization performance often outperforms any of the individual models.
What is Ensemble Learning?
Ensemble learning is a powerful method to improve the performance of a model: it combines the predictions from multiple models, and the results improve as a consequence. To make the ensemble useful, the base models should differ:

• use different base models, or
• use the same model but provide it with different data, or
• a combination of the two.
Ensemble Prediction
For a classification problem, the ensemble counts the 1s and 0s predicted by the base models and outputs the majority class.
Example: Pred1 = 0, Pred2 = 1, Pred3 = 1, so count of ones = 2 and count of zeros = 1; the majority is 1, so the ensemble prediction = 1.

For a regression problem, the ensemble takes the mean of all predictions and outputs that mean value.
Example: Pred1 = 1.5, Pred2 = 2.5, Pred3 = 2.0, so the average = (1.5 + 2.5 + 2.0) / 3 and the ensemble prediction = 2.
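The two combination rules above can be sketched in a few lines of Python (illustrative names; the prediction values are the ones from the example):

```python
from statistics import mean, mode

# Classification: majority vote over the base-model predictions.
class_preds = [0, 1, 1]             # Pred1, Pred2, Pred3
ensemble_class = mode(class_preds)  # -> 1 (two ones vs one zero)

# Regression: mean of the base-model predictions.
reg_preds = [1.5, 2.5, 2.0]
ensemble_reg = mean(reg_preds)      # -> 2.0

print(ensemble_class, ensemble_reg)
```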
Types of Ensemble Learning
• Voting Ensemble

• Bagging or Bootstrap Aggregation — Random Forest

• Boosting — AdaBoost, Gradient Boost and XGBoost

• Stacking Ensemble
Some Simple Ensembles
• Voting or averaging of the predictions of multiple pre-trained models.
• "Stacking": use the predictions of multiple models as features to train a new model, and use the new model to make predictions on test data.
Ensemble : Another Approach
• Instead of training different models on same data, train same model
multiple times on different data sets, and combine these different
models.
• How do we get multiple training data sets (in practice, we have only
one data set at training time)?
Bagging
"Bagging" comes from Bootstrap AGGregating.
• Each base model is provided with a different sample of the full data set D.
• Sampling can be of three types:
a) row sampling
b) column sampling
c) a combination of row and column sampling
• The samples D1, D2 and D3 should always be smaller than D.
• The same machine learning model is used for each sample, and all of them run in parallel.
• Bootstrap sampling: in bagging, a random sample of the training set is drawn "with replacement", meaning that individual data points can be chosen more than once.
• Because all the machine learning models are the same but the input data differs, the predictions have variety.
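A minimal bagging sketch, assuming scikit-learn is available and using the same decision-tree model on each bootstrap sample; X, y and X_new are placeholder NumPy arrays:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_new, n_models=3, sample_frac=0.8, seed=0):
    """Train the same model on bootstrap samples (rows drawn with replacement)
    and combine the per-model predictions by majority vote (0/1 labels)."""
    rng = np.random.default_rng(seed)
    all_preds = []
    for _ in range(n_models):
        # row sampling with replacement (bootstrap sample smaller than D)
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=True)
        model = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        all_preds.append(model.predict(X_new))
    all_preds = np.array(all_preds)               # shape: (n_models, n_new)
    return (all_preds.sum(axis=0) > n_models / 2).astype(int)

# Usage (placeholders): preds = bagging_predict(X_train, y_train, X_test)
```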
Random Forest
Random forest is a commonly-used machine learning algorithm,
which combines the output of multiple decision trees to reach a single result.
Its ease of use and flexibility have fuelled its adoption, as it handles both classification and regression problems.
Random Forest Hyperparameter
• max_depth
• min_sample_split
• max_leaf_nodes
• min_samples_leaf
• n_estimators
• max_sample (bootstrap sample)
• max_features
Random Forest Hyperparameter
• max_depth: the depth of a tree in a random forest, defined as the longest path between the root node and a leaf node.
• min_sample_split: tells each decision tree in the random forest the minimum number of observations required in a node in order to split it. The default value of 2 means a tree often keeps splitting until the nodes are completely pure; as a result the tree grows large and overfits the data. Increasing min_sample_split reduces the number of splits and therefore helps prevent overfitting (the slide illustrates changing the value from 2 to 6).
• min_sample_leaf: the minimum number of samples that must be present in a leaf node after splitting a node.
Random Forest Hyperparameter
• n_estimators: a random forest is nothing but a group of trees, so this sets how many trees to build.
• max_samples: the fraction of the original dataset given to any individual tree. (In the example shown on the slide, model performance reaches its maximum when each tree receives less than a 0.2 fraction of the original dataset.)
• max_features: the maximum number of features provided to each tree in the random forest.
• max_terminal_nodes / max_leaf_nodes: sets a condition on the splitting of nodes in the tree and hence restricts the growth of the tree.
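The hyperparameters listed above correspond to scikit-learn's RandomForestClassifier arguments (with slightly different spellings, e.g. min_samples_split). The values below are purely illustrative, not recommendations:

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative settings for the hyperparameters discussed above.
rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=6,           # longest allowed path from root to leaf
    min_samples_split=6,   # minimum observations in a node to split it
    min_samples_leaf=2,    # minimum observations left in each leaf
    max_leaf_nodes=20,     # cap on the number of leaf nodes per tree
    max_features="sqrt",   # features offered to each split
    max_samples=0.8,       # fraction of the data bootstrapped for each tree
    bootstrap=True,
    random_state=0,
)
# Usage (placeholders): rf.fit(X_train, y_train); rf.predict(X_test)
```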
Boosting
• Boosting is a method used in machine learning to reduce errors in predictive data analysis.
• A single machine learning model might make prediction errors depending on the accuracy of the training dataset.
• Boosting tries to overcome this issue by training multiple models sequentially to improve the accuracy of the overall system.

How does boosting work? (A sketch follows the steps below.)
• Create the initial weak learner.
• Use the weak learner to make predictions on the entire dataset.
• Compute the prediction errors.
• Incorrect predictions are assigned more weight.
• Build another weak learner aimed at fixing the errors of the previous learner.
• Make predictions on the whole dataset using the new learner.
• Repeat this process until optimal results are obtained.
• The final model is obtained as a weighted combination of all the weak learners.
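The sequential procedure above is essentially what AdaBoost implements. A minimal scikit-learn sketch (dataset names are placeholders; by default the weak learners are depth-1 decision trees):

```python
from sklearn.ensemble import AdaBoostClassifier

# Each round gives more weight to the observations the previous round got wrong,
# and the final prediction is a weighted vote of all the weak learners.
boosted = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)

# Usage (placeholders): boosted.fit(X_train, y_train); boosted.predict(X_test)
```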
Boosting
Why is boosting important?
Boosting is used when you have high bias and low variance.

• Boosting improves machine learning models' predictive accuracy and performance by converting multiple weak learners into a single strong learning model. Machine learning models can be weak learners or strong learners:
• Weak learners
• Weak learners have low prediction accuracy, similar to random guessing. They are prone to overfitting—
that is, they can't classify data that varies too much from their original dataset. For example, if you train
the model to identify cats as animals with pointed ears, it might fail to recognize a cat whose ears are
curled.
• Strong learners
• Strong learners have higher prediction accuracy. Boosting converts a system of weak learners into a single
strong learning system. For example, to identify the cat image, it combines a weak learner that guesses for
pointy ears and another learner that guesses for cat-shaped eyes. After analysing the animal image for
pointy ears, the system analyzes it once again for cat-shaped eyes. This improves the system's overall
accuracy.
Boosting
• The same model is used at each stage, and the training data for each stage emphasizes the observations wrongly predicted by the prior model; the models are trained in sequential order.
• The data is given to the first model.
• During training, all data that is incorrectly predicted is identified.
• A prediction is made and the wrong predictions are passed on to the next predictor.
• At each iteration, more and more attention is paid to the observations with prediction errors.
• New predictions are made until the limit is reached or a higher accuracy is achieved.
• Models that provide more accuracy are assigned larger weights.
Bagging Vs Boosting
