trees_regression.ipynb
First, we load the penguins dataset specifically for solving a regression problem.
import pandas as pd

url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
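The cells below reference feature_name, target_name, data_train, and target_train, which are not defined anywhere in this excerpt. A minimal setup sketch, assuming the column names used in palmer_penguins.csv and dropping the rows with missing values that the raw file contains:

feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"
# Assumed setup: keep only the two columns of interest and drop missing rows.
penguins = penguins[[feature_name, target_name]].dropna()
data_train, target_train = penguins[[feature_name]], penguins[target_name]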
To illustrate how decision trees predict in a regression setting, we create a synthetic dataset containing some of the possible flipper length
values between the minimum and the maximum of the original data.
import numpy as np

data_test = pd.DataFrame(
    np.arange(data_train[feature_name].min(), data_train[feature_name].max()),
    columns=[feature_name],
)
The term "test" here refers to data that was not used for training. It should not be confused with data coming from a train-test split, as it was generated in equally-spaced intervals for the visual evaluation of the predictions.
Note that this is methodologically valid here because our objective is to get some intuitive understanding of the shape of the decision function of the learned decision trees.
However, computing an evaluation metric on such a synthetic test set would be meaningless, since the synthetic dataset does not follow the same distribution as the real-world data on which the model would be deployed.
import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
_ = plt.title("Illustration of the regression dataset used")
We first illustrate the difference between a linear model and a decision tree.
from sklearn.linear_model import LinearRegression

linear_model = LinearRegression()
linear_model.fit(data_train, target_train)
target_predicted = linear_model.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Linear regression")
plt.legend()
_ = plt.title("Prediction function using a LinearRegression")
On the plot above, we see that a non-regularized LinearRegression is able to fit the data. A feature of this model is that all new predictions
will lie on this line.
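We can verify this claim directly: the predictions are exactly the affine function defined by the fitted coefficient and intercept. A quick sanity check, not part of the original notebook:

# Every prediction lies on the line intercept + coef * flipper_length.
line = (
    linear_model.intercept_
    + linear_model.coef_[0] * data_test[feature_name].to_numpy()
)
print(np.allclose(target_predicted, line))  # True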
ax = sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(
    data_test[feature_name],
    target_predicted,
    label="Linear regression",
    linestyle="--",
)
plt.scatter(
    data_test[::3],
    target_predicted[::3],
    label="Predictions",
    color="tab:orange",
)
plt.legend()
_ = plt.title("Prediction function using a LinearRegression")
Contrary to linear models, decision trees are non-parametric models: they do not make assumptions about the way data is distributed. This
affects the prediction scheme. Repeating the above experiment highlights the differences.
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(max_depth=1)
tree.fit(data_train, target_train)
target_predicted = tree.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Decision tree")
plt.legend()
_ = plt.title("Prediction function using a DecisionTreeRegressor")
We see that the decision tree model does not assume any a priori distribution for the data, so we do not end up with a straight line regressing
flipper length against body mass.
Instead, we observe that the predictions of the tree are piecewise constant. Indeed, our feature space was split into two partitions. Let's
check the tree structure to see the threshold found during training.
from sklearn.tree import plot_tree

_, ax = plt.subplots(figsize=(8, 6))
_ = plot_tree(tree, feature_names=[feature_name], ax=ax)
The threshold for our feature (flipper length) is 206.5 mm. The predicted values on each side of the split are two constants: 3698.71 g and
5032.36 g. These values correspond to the mean values of the training samples in each partition.
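We can confirm this programmatically through the fitted tree's tree_ attribute. A quick check, not from the original notebook: the leaf constants should equal the mean training target on each side of the root split.

# The root node (index 0) holds the split threshold; sklearn sends samples
# with feature <= threshold to the left child.
threshold = tree.tree_.threshold[0]
left_mask = data_train[feature_name] <= threshold
print(threshold)                        # 206.5
print(target_train[left_mask].mean())   # ~3698.71
print(target_train[~left_mask].mean())  # ~5032.36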
In classification, we saw that increasing the depth of the tree allowed us to get more complex decision boundaries. Let's check the effect of
increasing the depth in a regression setting:
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(data_train, target_train)
target_predicted = tree.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Decision tree")
plt.legend()
_ = plt.title("Prediction function using a DecisionTreeRegressor")
Increasing the depth of the tree increases the number of partitions and thus the number of constant values that the tree is capable of
predicting.
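The number of leaves, i.e. the number of distinct constant values the tree can predict, can at most double with each extra level (at most 2**max_depth leaves). A small sketch illustrating this, not in the original notebook:

# Each additional level can at most double the number of leaves,
# i.e. the number of piecewise-constant prediction values.
for depth in [1, 2, 3, 5]:
    deep_tree = DecisionTreeRegressor(max_depth=depth)
    deep_tree.fit(data_train, target_train)
    print(f"max_depth={depth}: {deep_tree.get_n_leaves()} leaves")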
In this notebook, we highlighted how the behavior of a decision tree used in a regression problem differs from its behavior in a classification
problem.