Lecture 7 - Decision Tree Regression

The document provides an overview of decision tree regression, explaining its intuitive approach of segmenting predictor space into distinct regions for making predictions. It describes the process of building a decision tree using the ID3 algorithm, which involves calculating standard deviation reduction to ensure homogeneity in branches. Additionally, it outlines the implementation of decision tree regression in Python, emphasizing its suitability for datasets with multiple independent variables and the importance of evaluating model performance.

Decision Tree Regression
(Classification and regression trees)

Contents
• Decision tree intuition
• Example of decision tree regression
• Implementation using Python
• Class Exercise
Decision tree intuition

Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions.

Since the set of splitting rules used to segment the predictor space can be summarized in a tree, these types of machine learning approaches are known as decision tree methods.

The basic idea of these methods is to partition the space and identify some representative centroids.
• We are given some data, which consists of two independent variables, x1 and x2.
• The plot represents a scatter plot of the data.
• We want to predict a dependent variable y, which we cannot see on this scatter plot.
• We will work with the data points to build a regression tree and then consider y.
One way to make predictions in a regression problem is to divide the predictor space (i.e., all the possible values for X1, X2, …, Xp) into distinct regions, say R1, R2, …, Rk (terminal nodes or leaves).

Then, for every X that falls into a particular region (say Rj) we make the same prediction.

Decision trees are typically drawn upside down, in the sense that the leaves are at the bottom of the tree.
• How the data may be split:
• Split 1: x1>20 and x1<20
• Split 2: x2>170 and x2<170
• Split 3: x2>200 and x2<200
• Split 4: x1>40 and x1<40
• There needs to be a criterion that decides whether a split takes place or not (e.g., entropy).
• If a split adds more information to the data, then the split happens.
• Otherwise the split does not happen.

• The points along the tree where the predictor space is split are referred to as internal nodes.
• The regions created by the splits are referred to as leaves.
• When a region cannot be split any further, it is called a terminal leaf.
• Split 1: x1>20 and x1<20
• Split 2: x2>170 and x2<170 (for only those points where x1>20)
• Split 3: x2>200 and x2<200 (for only those points where x1<20)
• Split 4: x1>40 and 20<x1<40 (for only those points where x2<170)
• Now, what are we going to put in the empty boxes?
• This is where we need to consider y, i.e., the dependent variable that we need to predict or model.
• What we need to check is how we are going to predict the value of y for, let us assume, an observation that has x1=30 and x2=50.
• The observation x1=30 and x2=50 lies in the terminal leaf shown in green.
• How does knowing that the observation lies in the green terminal leaf help us predict the value of y?
• You take the average of all the y values in that terminal leaf.
• So let us assume the averages for each terminal leaf are as given in the figure.
• Then for the given point x1=30 and x2=50, the regression tree algorithm will give the output (predicted) value of y as -64.1.
• It is pretty straightforward; we need to remember that it works on averages.
• The goal of each split is to add information that better predicts y.
• The last step is to add the mean values into the decision tree. A minimal sketch of this prediction rule is given below.
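To make the averaging rule concrete, here is a minimal sketch that hard-codes the splits from the slides as nested if-statements and predicts the mean of the training points that land in the same leaf. The training values and exact region boundaries are illustrative assumptions, not the lecture's actual figure:

```python
import numpy as np

# Illustrative training data: each row is (x1, x2, y).
# These values are made up for the sketch; the lecture's figure uses its own data.
train = np.array([
    [10, 210, -70.0], [12, 180, -60.1], [50, 120,  30.0],
    [30,  60, -64.1], [35,  50, -64.1], [45, 100, 120.0],
])

def region(x1, x2):
    """Assign a point to a terminal leaf using the splits from the slides:
    first on x1=20, then on x2=200, x2=170 and x1=40 inside the branches."""
    if x1 < 20:
        return "A" if x2 < 200 else "B"
    if x2 >= 170:
        return "C"
    return "D" if 20 < x1 < 40 else "E"

def predict(x1, x2):
    """Prediction = average y of the training points in the same leaf."""
    leaf = region(x1, x2)
    ys = [y for a, b, y in train if region(a, b) == leaf]
    return sum(ys) / len(ys)

print(predict(30, 50))  # -64.1: the average of the green leaf's points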
Example of decision tree regression

The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches, with no backtracking.

The ID3 algorithm can be used to construct a decision tree for regression by replacing Information Gain with Standard Deviation Reduction.
• Let's assume we have the following dataset.
• We want to train a decision tree regression model to predict the hours played from the four independent variables.

Parameters for the dependent variable
• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
• We use standard deviation to calculate the homogeneity of a numerical sample. If the numerical sample is completely homogeneous, its standard deviation is zero.

• Standard Deviation (S) is used for tree building (branching).
• Coefficient of Variation (CV) is used to decide when to stop branching. We can use Count (n) as well.
• Average (Avg) is the value in the leaf nodes.
Computing the standard deviation for two attributes (target and predictor):
• Standard Deviation Reduction
• The standard deviation reduction (SDR) is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches).
• Step 1: The standard deviation of the target is calculated.
  Standard deviation (Hours Played) = 9.32
• Step 2:
• The dataset is then split on the different attributes.
• The standard deviation for each branch is calculated.
• The resulting (weighted) standard deviation is subtracted from the standard deviation before the split.
• The result is the standard deviation reduction: SDR(T, X) = S(T) - S(T, X). A sketch of this computation is given below.
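As a concrete illustration, here is a short Python sketch of Steps 1 and 2. The fourteen rows are the well-known golf/weather example that these slides appear to follow (the target "Hours Played" has standard deviation 9.32, matching Step 1); if the lecture used different values, only the numbers change, not the procedure:

```python
import math
from collections import defaultdict

# Assumed dataset: the classic golf/weather example (SD of Hours Played = 9.32).
outlook = ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast",
           "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"]
hours   = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30]

def std(values):
    """Population standard deviation, as used in the slides."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def sdr(attribute, target):
    """Standard Deviation Reduction: S(T) minus the weighted SD after the split."""
    branches = defaultdict(list)
    for a, t in zip(attribute, target):
        branches[a].append(t)
    weighted = sum(len(vals) / len(target) * std(vals) for vals in branches.values())
    return std(target) - weighted

print(round(std(hours), 2))           # Step 1: 9.32
print(round(sdr(outlook, hours), 2))  # Step 2: SDR for the Outlook attribute
```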
• Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.
• Step 4a: The dataset is divided based on the values of the selected attribute. This process is run recursively on the non-leaf branches until all data is processed.
• Step 4c: However, the "Sunny" branch has a CV (28%) greater than the threshold (10%), so it needs further splitting. We select "Windy" as the best node after "Outlook" because it has the largest SDR.
• Because the number of data points in both branches (FALSE and TRUE) is three or fewer, we stop further branching and assign the average of each branch to the related leaf node.
• Step 4d: Moreover, the "Rainy" branch has a CV (22%), which is more than the threshold (10%). This branch needs further splitting. We select "Temp" as the best node because it has the largest SDR.
• Because the number of data points in all three branches (Cool, Hot and Mild) is three or fewer, we stop further branching and assign the average of each branch to the related leaf node.

When the number of instances at a leaf node is more than one, we calculate the average as the final value for the target. A compact recursive sketch of the whole procedure is given below.
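Putting the steps together, here is a minimal recursive sketch of the ID3-style regression tree described above: split on the attribute with the largest SDR, and stop when a branch's CV falls below 10% or it has three or fewer instances (the stopping rules from Steps 4c/4d). It reuses the std and sdr helpers and the outlook/hours lists from the previous snippet; the Temp, Humidity and Windy columns are the standard values from that golf example, assumed rather than taken from the slides:

```python
def cv(values):
    """Coefficient of Variation: SD relative to the mean."""
    return std(values) / (sum(values) / len(values))

def build_tree(rows, target, attributes, cv_threshold=0.10, min_count=3):
    """rows: list of dicts mapping attribute name -> value; target: parallel y list."""
    # Stopping rules from the slides: low CV or few instances -> leaf = average.
    if len(target) <= min_count or cv(target) < cv_threshold or not attributes:
        return round(sum(target) / len(target), 1)
    # Pick the attribute with the largest standard deviation reduction.
    best = max(attributes, key=lambda a: sdr([r[a] for r in rows], target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        sub = [(r, t) for r, t in zip(rows, target) if r[best] == value]
        tree[best][value] = build_tree([r for r, _ in sub], [t for _, t in sub],
                                       remaining, cv_threshold, min_count)
    return tree

# Assumed attribute columns for the golf example.
temp     = ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool",
            "Mild", "Mild", "Mild", "Hot", "Mild"]
humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High",
            "Normal", "Normal", "Normal", "High", "Normal", "High"]
windy    = [False, True, False, False, False, True, True, False, False, False,
            True, True, False, True]

rows = [{"Outlook": o, "Temp": t, "Humidity": h, "Windy": w}
        for o, t, h, w in zip(outlook, temp, humidity, windy)]
print(build_tree(rows, hours, ["Outlook", "Temp", "Humidity", "Windy"]))
```

On these values the sketch reproduces the tree in the slides: "Outlook" at the root, "Windy" under "Sunny", "Temp" under "Rainy", and branch averages at the leaves.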
Implementation using Python

Contents
• Introduction
• Importing the libraries
• Importing the dataset
• Training the decision tree algorithm
• Predicting new values
• Visualizing the decision tree regression results
• Evaluating the model


Introduction

We are using the same dataset of employee positions and salaries.

The problem is the same: we need to predict the salary of a prospective new employee.

The decision tree regression algorithm is not very well adapted to such simple datasets; it is usually more useful for datasets involving multiple independent variables.

We do not need to apply feature scaling for the decision tree algorithm.
Importing the libraries and dataset

Once you have imported the dataset, check which columns you need to consider.

Also check whether you need to apply any data pre-processing techniques:
• Handling missing values
• Data cleaning
• Encoding categorical data
• Feature scaling

A short sketch of these steps follows.
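A minimal import sketch, assuming the positions-and-salaries file is named Position_Salaries.csv with the position level in the second column and the salary in the last (a common layout for this exercise; adjust the file name and column indices to your actual data):

```python
import numpy as np
import pandas as pd

# Assumed file name and layout: position level in column 1, salary in the last column.
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values   # independent variable(s): position level
y = dataset.iloc[:, -1].values     # dependent variable: salary

# No feature scaling is needed for decision tree regression.
```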
Training the decision tree regression model on the whole dataset

• Congratulations!!! You have trained your first decision tree regression model. Well done…

• Predicting the new value?
• Please do this yourself (a sketch of both steps is given below for reference).
• The predicted salary is lower than the requested salary.
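A minimal sketch of the training and prediction steps using scikit-learn; the query level 6.5 is an assumption (an in-between position level commonly used with this dataset), so substitute the level you were asked about:

```python
from sklearn.tree import DecisionTreeRegressor

# Train on the whole dataset (no train/test split for this small example).
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Predict the salary for an in-between position level, e.g. 6.5 (assumed query).
print(regressor.predict([[6.5]]))
```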
Plots for the results

• Decision tree regression is not very well adapted to two-dimensional datasets, i.e., one independent and one dependent variable.
• The staircase curve is due to the splitting of nodes.
• We can observe that 0.5 point before and after a position level gives the same salary.
• This is the average between the two levels.
• The decision tree plot is not continuous; rather, it is discrete in the sense that we jump from one value to the next at each step.
A low-resolution plot (drawn only at the training points) is not correct here; it hides the staircase shape. Use a higher-resolution grid instead, as sketched below.
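A sketch of the high-resolution visualization, assuming the X, y and regressor objects from the previous snippets; the 0.01 step size is a common choice that makes the staircase shape visible:

```python
import matplotlib.pyplot as plt

# Dense grid over the position levels so the step (staircase) shape is visible.
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)

plt.scatter(X, y, color='red', label='Training data')
plt.plot(X_grid, regressor.predict(X_grid), color='blue', label='Tree prediction')
plt.title('Decision Tree Regression (high-resolution grid)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.legend()
plt.show()
```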
Evaluate the model

• Wrong!!!
• We will not evaluate the model on the training set here, since we can see that all the datapoints pass through the curve.
• Therefore, for the training set:
• RMSE will be zero
• Adjusted R² score will be 1
• Since RMSE will be zero, the Durbin-Watson statistic will be 'nan'
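A short check of that claim, assuming the fitted regressor from above: because the unpruned tree memorizes every training point, the training RMSE comes out as 0 and R² as 1, and durbin_watson from statsmodels divides zero by zero and yields nan:

```python
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.stats.stattools import durbin_watson

y_pred = regressor.predict(X)
residuals = y - y_pred

print(mean_squared_error(y, y_pred) ** 0.5)  # RMSE: 0.0 on the training set
print(r2_score(y, y_pred))                   # R²: 1.0 on the training set
print(durbin_watson(residuals))              # nan, since all residuals are zero
```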
Class Exercise

• Please try decision tree regression on a higher-dimensional dataset, i.e., the one used in multiple linear regression.
• Once you have trained the model on the higher-dimensional dataset, use the given instance to predict the 'Profit':
• State = 'Florida' (after one-hot encoding the code is 010)
• R&D Spend = 10
• Administration = 9194
• Marketing = 2000

• [[0, 1, 0, 10, 9194, 2000]]
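A sketch of the prediction call for this exercise, assuming you have already one-hot encoded State into the first three columns (as in the multiple linear regression lecture) and trained a DecisionTreeRegressor called regressor on that data:

```python
# Assumed feature order: [State one-hot (Florida = 0,1,0), R&D Spend,
# Administration, Marketing] -- match it to your own encoding.
new_instance = [[0, 1, 0, 10, 9194, 2000]]
print(regressor.predict(new_instance))  # predicted 'Profit' for the instance
```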

You might also like