Lecture 7 - Decision Tree Regression

The document provides an overview of decision tree regression, explaining its intuitive approach of segmenting predictor space into distinct regions for making predictions. It describes the process of building a decision tree using the ID3 algorithm, which involves calculating standard deviation reduction to ensure homogeneity in branches. Additionally, it outlines the implementation of decision tree regression in Python, emphasizing its suitability for datasets with multiple independent variables and the importance of evaluating model performance.

Decision Tree Regression
(Classification and regression trees)

Contents
• Decision tree intuition
• Example of decision tree regression
• Implementation using Python
• Class Exercise
Decision tree intuition

Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions.

Since the set of splitting rules used to segment the predictor space can be summarized in a tree, these types of machine learning approaches are known as decision tree methods.

The basic idea of these methods is to partition the space and identify some representative centroids.
• We are given some data, which consists of two independent variables, x1 and x2.
• The plot represents a scatter plot of the data.
• We want to predict a dependent variable y, which we cannot see on this scatter plot.
• We will work with the data points to build a regression tree and then consider y.
One way to make predictions in a regression problem is to divide the predictor space (i.e., all the possible values for X1, X2, …, Xp) into distinct regions, say R1, R2, …, Rk (terminal nodes or leaves).

Then, for every X that falls into a particular region (say Rj) we make the same prediction.

Decision trees are typically drawn upside down, in the sense that the leaves are at the bottom of the tree.
• How the data may be split:
• Split 1: x1>20 and x1<20
• Split 2: x2>170 and x2<170
• Split 3: x2>200 and x2<200
• Split 4: x1>40 and x1<40
• There needs to be a criterion that decides whether a split takes place or not (e.g., entropy).
• If a split adds more information to the data, then the split happens.
• Otherwise the split does not happen.

• The points along the tree where the predictor space is split are referred to as internal nodes.
• The regions created by the splits are referred to as leaves.
• When a region cannot be split any further, it is called a terminal leaf.
• Split 1: x1>20 and x1<20
• Split 2: x2>170 and x2<170 (for only those points where x1>20)
• Split 3: x2>200 and x2<200 (for only those points where x1<20)
• Split 4: x1>40 and 20<x1<40 (for only those points where x2<170)
• Now, what are we going to put in the empty boxes?
• This is where we need to consider y, i.e., the dependent variable that we need to predict or model.
• What we need to check is how we are going to predict the value of y for, let us assume, an observation that has x1=30 and x2=50.
• The observation x1=30 and x2=50 lies in the terminal leaf shown in green.
• How does knowing that the observation lies in the green terminal leaf help us predict the value of y?
• You take the average of all the y values in that terminal leaf.
• So let us assume the averages for each terminal leaf are as given in the figure.
• Then for the given point x1=30 and x2=50, the regression tree algorithm will give the output (predicted) value of y as -64.1.
• It is pretty straightforward; we need to remember that it works on averages.
• The goal of each split is to add information that better predicts y.
• The last step is to add the mean values into the decision tree. A minimal sketch of this prediction rule is given below.
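To make the averaging rule concrete, here is a minimal sketch that hard-codes the splits from the slides as nested if-statements and predicts the mean of the training points that land in the same leaf. The training values and exact region boundaries are illustrative assumptions, not the lecture's actual figure:

```python
import numpy as np

# Illustrative training data: each row is (x1, x2, y).
# These values are made up for the sketch; the lecture's figure uses its own data.
train = np.array([
    [10, 210, -70.0], [12, 180, -60.1], [50, 120,  30.0],
    [30,  60, -64.1], [35,  50, -64.1], [45, 100, 120.0],
])

def region(x1, x2):
    """Assign a point to a terminal leaf using the splits from the slides:
    first on x1=20, then on x2=200, x2=170 and x1=40 inside the branches."""
    if x1 < 20:
        return "A" if x2 < 200 else "B"
    if x2 >= 170:
        return "C"
    return "D" if 20 < x1 < 40 else "E"

def predict(x1, x2):
    """Prediction = average y of the training points in the same leaf."""
    leaf = region(x1, x2)
    ys = [y for a, b, y in train if region(a, b) == leaf]
    return sum(ys) / len(ys)

print(predict(30, 50))  # -64.1: the average of the green leaf's points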
Example of decision tree regression

The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches, with no backtracking.

The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.
• Let's assume we have the following dataset.
• We want to train a decision tree regression model to predict the hours played from the four independent variables.

Parameters for the dependent variable
• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
• We use standard deviation to calculate the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

• Standard Deviation (S) is used for tree building (branching).
• Coefficient of Variation (CV) is used to decide when to stop branching. We can use Count (n) as well.
• Average (Avg) is the value in the leaf nodes.
Computing the standard deviation for two attributes (target and predictor):
• Standard Deviation Reduction
• The standard deviation reduction (SDR) is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).
• Step 1: The standard deviation of the target is calculated.
  Standard deviation (Hours Played) = 9.32
• Step 2:
• The dataset is then split on the different attributes.
• The standard deviation for each branch is calculated.
• The resulting (weighted) standard deviation is subtracted from the standard deviation before the split.
• The result is the standard deviation reduction: SDR(T, X) = S(T) - S(T, X). A sketch of this computation is given below.
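As a concrete illustration, here is a short Python sketch of Steps 1 and 2. The fourteen rows are the well-known golf/weather example that these slides appear to follow (the target "Hours Played" has standard deviation 9.32, matching Step 1); if the lecture used different values, only the numbers change, not the procedure:

```python
import math
from collections import defaultdict

# Assumed dataset: the classic golf/weather example (SD of Hours Played = 9.32).
outlook = ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast",
           "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"]
hours   = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]

def std(values):
    """Population standard deviation, as used in the slides."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(attribute, target):
    """Standard Deviation Reduction: S(T) minus the weighted SD after the split."""
    branches = defaultdict(list)
    for a, t in zip(attribute, target):
        branches[a].append(t)
    weighted = sum(len(vals) / len(target) * std(vals) for vals in branches.values())
    return std(target) - weighted

print(round(std(hours), 2))           # Step 1: 9.32
print(round(sdr(outlook, hours), 2))  # Step 2: SDR for the Outlook attribute
```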
• Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.
• Step 4a: The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches until all data is processed.
• Step 4c: However, the "Sunny" branch has a CV (28%) greater than the threshold (10%), so it needs further splitting. We select "Windy" as the best node after "Outlook" because it has the largest SDR.
• Because the number of data points in both branches (FALSE and TRUE) is three or fewer, we stop further branching and assign the average of each branch to the related leaf node.
• Step 4d: Moreover, the "Rainy" branch has a CV (22%), which is more than the threshold (10%). This branch needs further splitting. We select "Temp" as the best node because it has the largest SDR.
• Because the number of data points in all three branches (Cool, Hot and Mild) is three or fewer, we stop further branching and assign the average of each branch to the related leaf node.

When the number of instances at a leaf node is more than one, we calculate the average as the final value for the target. A compact recursive sketch of the whole procedure is given below.
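Putting the steps together, here is a minimal recursive sketch of the ID3-style regression tree described above: split on the attribute with the largest SDR, and stop when a branch's CV falls below 10% or it has three or fewer instances (the stopping rules from Steps 4c/4d). It reuses the std and sdr helpers and the outlook/hours lists from the previous snippet; the Temp, Humidity and Windy columns are the standard values from that golf example, assumed rather than taken from the slides:

```python
def cv(values):
    """Coefficient of Variation: SD relative to the mean."""
    return std(values) / (sum(values) / len(values))

def build_tree(rows, target, attributes, cv_threshold=0.10, min_count=3):
    """rows: list of dicts mapping attribute name -> value; target: parallel y list."""
    # Stopping rules from the slides: low CV or few instances -> leaf = average.
    if len(target) <= min_count or cv(target) < cv_threshold or not attributes:
        return round(sum(target) / len(target), 1)
    # Pick the attribute with the largest standard deviation reduction.
    best = max(attributes, key=lambda a: sdr([r[a] for r in rows], target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        sub = [(r, t) for r, t in zip(rows, target) if r[best] == value]
        tree[best][value] = build_tree([r for r, _ in sub], [t for _, t in sub],
                                       remaining, cv_threshold, min_count)
    return tree

# Assumed attribute columns for the golf example.
temp     = ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool",
            "Mild", "Mild", "Mild", "Hot", "Mild"]
humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High",
            "Normal", "Normal", "Normal", "High", "Normal", "High"]
windy    = [False, True, False, False, False, True, True, False, False, False,
            True, True, False, True]

rows = [{"Outlook": o, "Temp": t, "Humidity": h, "Windy": w}
        for o, t, h, w in zip(outlook, temp, humidity, windy)]
print(build_tree(rows, hours, ["Outlook", "Temp", "Humidity", "Windy"]))
```

On these values the sketch reproduces the tree in the slides: "Outlook" at the root, "Windy" under "Sunny", "Temp" under "Rainy", and branch averages at the leaves.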
Implementation using Python

Contents
• Introduction
• Importing the libraries
• Importing the dataset
• Training the decision tree algorithm
• Predicting new values
• Visualizing the decision tree regression results
• Evaluating the model


Introduction

We are using the same dataset of employee positions and salaries.

The problem is the same: we need to predict the salary of a prospective new employee.

The decision tree regression algorithm is not very well adapted to such simple datasets; it is usually more useful for datasets involving multiple independent variables.

We do not need to apply feature scaling for the decision tree algorithm.
Importing the libraries and dataset

Once you have imported the dataset, check which columns you need to consider.

Also check whether you need to apply any data pre-processing techniques:
• Handling missing values
• Data cleaning
• Encoding categorical data
• Feature scaling

A short sketch of these steps follows.
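A minimal import sketch, assuming the positions-and-salaries file is named Position_Salaries.csv with the position level in the second column and the salary in the last (a common layout for this exercise; adjust the file name and column indices to your actual data):

```python
import numpy as np
import pandas as pd

# Assumed file name and layout: position level in column 1, salary in the last column.
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values   # independent variable(s): position level
y = dataset.iloc[:, -1].values     # dependent variable: salary

# No feature scaling is needed for decision tree regression.
```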
Training the decision tree regression model on the whole dataset

• Congratulations!!! You have trained your first decision tree regression model. Well done…

• Predicting the new value?
• Please do this yourself (a sketch of both steps is given below for reference).
• The predicted salary is lower than the requested salary.
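A minimal sketch of the training and prediction steps using scikit-learn; the query level 6.5 is an assumption (an in-between position level commonly used with this dataset), so substitute the level you were asked about:

```python
from sklearn.tree import DecisionTreeRegressor

# Train on the whole dataset (no train/test split for this small example).
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Predict the salary for an in-between position level, e.g. 6.5 (assumed query).
print(regressor.predict([[6.5]]))
```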
Plots for the results

• Decision tree regression is not very well adapted to two-dimensional datasets, i.e., one independent and one dependent variable.
• The staircase curve is due to the splitting of nodes.
• We can observe that 0.5 point before and after a position level gives the same salary.
• This is the average between the two levels.
• The decision tree plot is not continuous; rather, it is discrete in the sense that we jump from one value to the next at each step.
A low-resolution plot (drawn only at the training points) is not correct here; it hides the staircase shape. Use a higher-resolution grid instead, as sketched below.
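A sketch of the high-resolution visualization, assuming the X, y and regressor objects from the previous snippets; the 0.01 step size is a common choice that makes the staircase shape visible:

```python
import matplotlib.pyplot as plt

# Dense grid over the position levels so the step (staircase) shape is visible.
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)

plt.scatter(X, y, color='red', label='Training data')
plt.plot(X_grid, regressor.predict(X_grid), color='blue', label='Tree prediction')
plt.title('Decision Tree Regression (high-resolution grid)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.legend()
plt.show()
```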
Evaluate the model

• Wrong!!!
• We will not evaluate the model on the training set here, since we can see that all the datapoints pass through the curve.
• Therefore, for the training set:
• RMSE will be zero
• Adjusted R² score will be 1
• Since RMSE will be zero, the Durbin-Watson statistic will be 'nan'
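A short check of that claim, assuming the fitted regressor from above: because the unpruned tree memorizes every training point, the training RMSE comes out as 0 and R² as 1, and durbin_watson from statsmodels divides zero by zero and yields nan:

```python
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.stats.stattools import durbin_watson

y_pred = regressor.predict(X)
residuals = y - y_pred

print(mean_squared_error(y, y_pred) ** 0.5)  # RMSE: 0.0 on the training set
print(r2_score(y, y_pred))                   # R²: 1.0 on the training set
print(durbin_watson(residuals))              # nan, since all residuals are zero
```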
Class Exercise

• Please try decision tree regression on a higher-dimensional dataset, i.e., the one used in multiple linear regression.
• Once you have trained the model on the higher-dimensional dataset, use the given instance to predict the 'Profit':
• State = 'Florida' (after one-hot encoding the code is 010)
• R&D Spend = 10
• Administration = 9194
• Marketing = 2000

• [[0, 1, 0, 10, 9194, 2000]]
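A sketch of the prediction call for this exercise, assuming you have already one-hot encoded State into the first three columns (as in the multiple linear regression lecture) and trained a DecisionTreeRegressor called regressor on that data:

```python
# Assumed feature order: [State one-hot (Florida = 0,1,0), R&D Spend,
# Administration, Marketing] -- match it to your own encoding.
new_instance = [[0, 1, 0, 10, 9194, 2000]]
print(regressor.predict(new_instance))  # predicted 'Profit' for the instance
```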

You might also like