Basis Approach
Recall that we observe pairs $(X_1, Y_1), \cdots, (X_n, Y_n)$ and we are interested in the regression function $m(x) = E(Y_1 | X_1 = x)$. In this section, we will make the following two assumptions:

• $X_i = \frac{i}{n}$. Namely, the covariates form a uniform grid over $[0, 1]$ and are non-random (a fixed design).

• $Y_i = m(X_i) + \sigma \epsilon_i$, where $\epsilon_1, \cdots, \epsilon_n$ are i.i.d. noise variables with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = 1$.
Similar to the basis approach for the density estimation problem, where we approximate the density function by a sum of coefficients times basis functions, we will approximate the regression function by a basis expansion:
\[
m(x) = \sum_{j=1}^{\infty} \theta_j \phi_j(x),
\]
where $\{\phi_j\}$ is an orthonormal basis of $L_2([0,1])$ and $\theta_j = \int_0^1 m(x)\phi_j(x)\, dx$.
As is done in density estimation, we will use only the first $M$ basis functions to form our estimator. Namely,
\[
\hat{m}_M(x) = \sum_{j=1}^{M} \hat{\theta}_j \phi_j(x),
\]
for some coefficient estimates $\hat{\theta}_1, \cdots, \hat{\theta}_M$. Again, $M$ is the tuning parameter of our estimator.
Here is a simple choice of the coefficient estimates that we will be using:
\[
\hat{\theta}_j = \frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j(X_i) = \frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j\!\left(\frac{i}{n}\right).
\]
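As a concrete illustration, here is a minimal sketch of this estimator. The notes do not fix a particular basis; the cosine basis $\phi_1(x) = 1$, $\phi_j(x) = \sqrt{2}\cos((j-1)\pi x)$ for $j \geq 2$ used below is one standard orthonormal basis on $[0,1]$, and the function true_m and the noise level sigma are arbitrary choices made only for the example.

```python
import numpy as np

def cosine_basis(j, x):
    """Orthonormal cosine basis on [0,1]: phi_1(x) = 1, phi_j(x) = sqrt(2) cos((j-1) pi x)."""
    return np.ones_like(x) if j == 1 else np.sqrt(2) * np.cos((j - 1) * np.pi * x)

def fit_basis_regression(Y, M):
    """Compute theta_hat_j = (1/n) sum_i Y_i phi_j(i/n) for j = 1, ..., M."""
    n = len(Y)
    X = np.arange(1, n + 1) / n                        # fixed design X_i = i/n
    return np.array([np.mean(Y * cosine_basis(j, X)) for j in range(1, M + 1)])

def predict(theta_hat, x):
    """Evaluate m_hat_M(x) = sum_j theta_hat_j phi_j(x)."""
    return sum(t * cosine_basis(j, x) for j, t in enumerate(theta_hat, start=1))

# Toy example (true_m and sigma are illustrative, not from the notes).
rng = np.random.default_rng(0)
n, M, sigma = 500, 10, 0.3
true_m = lambda x: np.sin(2 * np.pi * x) + x
X = np.arange(1, n + 1) / n
Y = true_m(X) + sigma * rng.standard_normal(n)
theta_hat = fit_basis_regression(Y, M)
m_hat = predict(theta_hat, np.linspace(0, 1, 200))     # fitted curve on a grid
```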
To determine the tuning parameter $M$, we analyze the MISE. We start by analyzing the bias and variance of $\hat{\theta}_j$.
Before doing so, observe that
\[
\hat{m}_M(x) = \sum_{j=1}^{M} \hat{\theta}_j \phi_j(x)
= \sum_{j=1}^{M} \frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j\!\left(\frac{i}{n}\right) \phi_j(x)
= \frac{1}{n} \sum_{i=1}^{n} Y_i \sum_{j=1}^{M} \phi_j\!\left(\frac{i}{n}\right) \phi_j(x),
\]
so $\hat{m}_M(x)$ is a weighted sum of independent random variables. Thus, under mild conditions, the central limit theorem gives
\[
\sqrt{n}\,\big(\hat{m}_M(x) - E(\hat{m}_M(x))\big) \overset{D}{\to} N(0, \sigma_M^2)
\]
for some $\sigma_M^2$. Note that later our analysis will demonstrate
\[
E(\hat{m}_M(x)) = \sum_{j=1}^{M} \theta_j \phi_j(x), \qquad
\sigma_M^2 = \sigma^2 \sum_{j=1}^{M} \phi_j^2(x).
\]
Bias.
\begin{align*}
\mathrm{bias}(\hat{\theta}_j) &= E(\hat{\theta}_j) - \theta_j \\
&= E\!\left(\frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j\!\left(\frac{i}{n}\right) \,\Big|\, X_i = \frac{i}{n}\right) - \theta_j \\
&= \frac{1}{n} \sum_{i=1}^{n} E\!\left(Y_i \,\Big|\, X_i = \frac{i}{n}\right) \phi_j\!\left(\frac{i}{n}\right) - \theta_j \\
&= \frac{1}{n} \sum_{i=1}^{n} m\!\left(\frac{i}{n}\right) \phi_j\!\left(\frac{i}{n}\right) - \theta_j \\
&= \frac{1}{n} \sum_{i=1}^{n} m\!\left(\frac{i}{n}\right) \phi_j\!\left(\frac{i}{n}\right) - \int_0^1 m(x) \phi_j(x)\, dx.
\end{align*}
Namely, the bias is the difference between the actual integral and a discretized (Riemann-sum) version of it. We know that when $n$ is large, the two are almost the same, so we can ignore this bias. Thus, we will write
\[
\mathrm{bias}(\hat{\theta}_j) = 0
\]
for simplicity.
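To see numerically why this discretization error is negligible, here is a quick check (an illustrative example, not from the notes, using the cosine basis and an arbitrary smooth $m$) comparing the Riemann-sum version of $\theta_j$ with the integral:

```python
import numpy as np
from scipy.integrate import quad

j = 3
phi = lambda x: np.sqrt(2) * np.cos((j - 1) * np.pi * x)   # cosine basis function phi_j
m = lambda x: np.sin(2 * np.pi * x) + x                     # arbitrary smooth regression function

theta_j, _ = quad(lambda x: m(x) * phi(x), 0, 1)            # theta_j = int_0^1 m(x) phi_j(x) dx
for n in (50, 500, 5000):
    grid = np.arange(1, n + 1) / n
    riemann = np.mean(m(grid) * phi(grid))                  # (1/n) sum_i m(i/n) phi_j(i/n)
    print(n, abs(riemann - theta_j))                        # discretization error shrinks with n
```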
Variance.
\begin{align*}
\mathrm{Var}(\hat{\theta}_j) &= \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} \underbrace{\left[m\!\left(\frac{i}{n}\right) + \sigma \epsilon_i\right]}_{=Y_i} \phi_j\!\left(\frac{i}{n}\right)\right) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \phi_j^2\!\left(\frac{i}{n}\right) \mathrm{Var}(\sigma \epsilon_i) \\
&= \frac{\sigma^2}{n^2} \sum_{i=1}^{n} \phi_j^2\!\left(\frac{i}{n}\right).
\end{align*}
Note that $\frac{1}{n} \sum_{i=1}^{n} \phi_j^2\!\left(\frac{i}{n}\right) \approx \int_0^1 \phi_j^2(x)\, dx = 1$. For simplicity, we just write
\[
\mathrm{Var}(\hat{\theta}_j) = \frac{\sigma^2}{n}.
\]
Again, if we assume that $m$ satisfies $\int_0^1 |m''(x)|^2\, dx < \infty$, we have
\[
\sum_{j=M+1}^{\infty} \theta_j^2 = O(M^{-4}),
\]
so the integrated squared bias of $\hat{m}_M$ is $\int_0^1 \mathrm{bias}^2(\hat{m}_M(x))\, dx = \sum_{j=M+1}^{\infty} \theta_j^2 = O(M^{-4})$. The integrated variance is
\[
\int_0^1 \mathrm{Var}(\hat{m}_M(x))\, dx = \frac{\sigma^2}{n} \sum_{j=1}^{M} \int_0^1 \phi_j^2(x)\, dx = \frac{\sigma^2 M}{n} = O\!\left(\frac{M}{n}\right).
\]
Recall that the MISE is just the sum of the integrated squared bias and the integrated variance, so we obtain
\[
\mathrm{MISE}(\hat{m}_M) = \int_0^1 \mathrm{bias}^2(\hat{m}_M(x))\, dx + \int_0^1 \mathrm{Var}(\hat{m}_M(x))\, dx = O(M^{-4}) + O\!\left(\frac{M}{n}\right).
\]
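Balancing the two terms determines the rate-optimal choice of $M$ (a standard bias and variance balancing step, with the constants hidden in the $O(\cdot)$ terms treated as fixed):
\[
M^{-4} \asymp \frac{M}{n} \;\Longrightarrow\; M^{*} \asymp n^{1/5}, \qquad \mathrm{MISE}(\hat{m}_{M^{*}}) = O\!\left(n^{-4/5}\right),
\]
the same $n^{-4/5}$ rate attained by other nonparametric regression estimators when $m$ has a square-integrable second derivative.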
The basis estimator is another linear smoother. To see this, we use the following expansion:
\begin{align*}
\hat{m}_M(x) &= \sum_{j=1}^{M} \hat{\theta}_j \phi_j(x) \\
&= \sum_{j=1}^{M} \frac{1}{n} \sum_{i=1}^{n} Y_i \phi_j(X_i) \phi_j(x) \\
&= \sum_{i=1}^{n} \left(\sum_{j=1}^{M} \frac{1}{n} \phi_j(X_i) \phi_j(x)\right) Y_i \\
&= \sum_{i=1}^{n} \ell_i(x) Y_i,
\end{align*}
where $\ell_i(x) = \sum_{j=1}^{M} \frac{1}{n} \phi_j(X_i) \phi_j(x)$.
Recall from the linear smoother theory that we can estimate $\sigma^2$ using the residuals and the degrees of freedom:
\[
\hat{\sigma}^2 = \frac{1}{n - 2\nu + \tilde{\nu}} \sum_{i=1}^{n} e_i^2,
\]
where $e_i = \hat{Y}_i - Y_i = \hat{m}_M(X_i) - Y_i$ and $\nu, \tilde{\nu}$ are the degrees of freedom (see the previous lecture note).
With this variance estimator, the fact that $\mathrm{Var}(\hat{m}_M(x)) = \frac{\sigma^2}{n} \sum_{j=1}^{M} \phi_j^2(x)$, and the asymptotic normality, we can construct a confidence interval (band) of $m$ using
\[
\hat{m}_M(x) \pm z_{1-\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{n} \sum_{j=1}^{M} \phi_j^2(x)}.
\]
Note that this confidence interval is valid for $E(\hat{m}_M(x)) = \sum_{j=1}^{M} \theta_j \phi_j(x)$, not the actual $m(x)$. The difference between them is the bias of our estimator.
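Here is a minimal sketch of this band, continuing the cosine-basis example from earlier (cosine_basis is the illustrative helper defined above, repeated here for completeness). The degrees of freedom are computed from the smoothing matrix $L_{ij} = \ell_j(X_i)$ as $\nu = \mathrm{tr}(L)$ and $\tilde{\nu} = \mathrm{tr}(L^{\top} L)$, which is one common convention for linear smoothers; the previous lecture note is the authoritative definition.

```python
import numpy as np
from scipy.stats import norm

def cosine_basis(j, x):
    # Repeated from the earlier sketch: phi_1 = 1, phi_j(x) = sqrt(2) cos((j-1) pi x).
    return np.ones_like(x) if j == 1 else np.sqrt(2) * np.cos((j - 1) * np.pi * x)

def confidence_band(Y, M, grid, alpha=0.05):
    """Pointwise 1-alpha band for E[m_hat_M(x)] under the fixed design X_i = i/n."""
    n = len(Y)
    X = np.arange(1, n + 1) / n
    Phi_X = np.column_stack([cosine_basis(j, X) for j in range(1, M + 1)])     # phi_j(X_i)
    Phi_g = np.column_stack([cosine_basis(j, grid) for j in range(1, M + 1)])  # phi_j(grid_k)
    L = Phi_X @ Phi_X.T / n                         # smoothing matrix, so that Y_hat = L @ Y
    nu, nu_tilde = np.trace(L), np.trace(L.T @ L)   # degrees of freedom (one common convention)
    resid = L @ Y - Y                               # e_i = m_hat_M(X_i) - Y_i
    sigma2_hat = np.sum(resid ** 2) / (n - 2 * nu + nu_tilde)
    m_hat = Phi_g @ (Phi_X.T @ Y / n)               # m_hat_M evaluated on the grid
    half = norm.ppf(1 - alpha / 2) * np.sqrt(sigma2_hat / n * np.sum(Phi_g ** 2, axis=1))
    return m_hat - half, m_hat + half
```

For example, confidence_band(Y, 10, np.linspace(0, 1, 200)) returns the lower and upper band evaluated on a grid of 200 points.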
Regression Tree
In this section, we assume that the covariate may have multiple dimensions, i.e., $x = (x_1, \cdots, x_d)$, and our data $(X_1, Y_1), \cdots, (X_n, Y_n) \sim P$ for some distribution $P$. Again, we are interested in the regression function $m(x) = E(Y_1 | X_1 = x)$.
A regression tree constructs an estimator of the form
\[
m(x) = \sum_{\ell=1}^{M} c_\ell\, I(x \in R_\ell),
\]
where $R_1, \cdots, R_M$ are rectangular regions that partition the covariate space.
[Figure: a toy regression tree that first splits on $x_1 < 10$ versus $x_1 \geq 10$; the branch with $x_1 < 10$ is further split on $x_2 < 5$ versus $x_2 \geq 5$, giving three regions $R_1, R_2, R_3$. The second panel shows the corresponding rectangular partition of the $(x_1, x_2)$ plane.]
A regression tree estimator predicts the same value of the response $Y$ within the same region of the covariate space. Namely, $m(x)$ takes the same value whenever $x$ lies in the same region.
To use a regression tree, there are $2M$ quantities to be determined: the regions $R_1, \cdots, R_M$ and the predicted values $c_1, \cdots, c_M$. When $R_1, \cdots, R_M$ are given, $c_1, \cdots, c_M$ can be simply estimated by the average of the responses within each region, i.e.,
\[
\hat{c}_\ell = \frac{\sum_{i=1}^{n} Y_i\, I(X_i \in R_\ell)}{\sum_{i=1}^{n} I(X_i \in R_\ell)}.
\]
The regions themselves are chosen by recursively splitting the covariate space, one coordinate at a time:
1. Pick a coordinate $j \in \{1, \cdots, d\}$ and a candidate split point $s$.
2. Partition the observations into $R_a = \{X_i : X_{ij} < s\}$ and $R_b = \{X_i : X_{ij} \geq s\}$, and compute the within-region averages $\hat{c}_a$ and $\hat{c}_b$, respectively.
3. Compute the score
\[
S(j, s) = \sum_{X_i \in R_a} (Y_i - \hat{c}_a)^2 + \sum_{X_i \in R_b} (Y_i - \hat{c}_b)^2.
\]
4. Change $s$ and repeat the same calculation until we find the minimizer of $S(j, s)$; denote the minimal score by $S^*(j)$.
5. Compute the score $S^*(j)$ for every $j = 1, \cdots, d$.
6. Pick the dimension (coordinate) $j$ and the corresponding split point $s$ with the minimal score $S^*(j)$. Partition the space into two parts according to this split.
7. Repeat the above procedure within each partition until a certain stopping criterion is satisfied.
Using the above procedure, we will eventually end up with a collection of rectangular regions $\hat{R}_1, \cdots, \hat{R}_M$. Then the final estimator is
\[
\hat{m}(x) = \sum_{\ell=1}^{M} \hat{c}_\ell\, I(x \in \hat{R}_\ell).
\]
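To make the splitting step concrete, here is a minimal sketch of one greedy split (an illustrative implementation, not code from the notes): for each coordinate $j$ it scans candidate split points and returns the pair $(j, s)$ minimizing $S(j, s)$. Using midpoints between consecutive observed values as candidate splits is a common convention, not something prescribed above.

```python
import numpy as np

def best_split(X, Y):
    """Find the coordinate j and split point s minimizing S(j, s).

    X: (n, d) array of covariates, Y: (n,) array of responses.
    Returns (j_best, s_best, S_best).
    """
    n, d = X.shape
    j_best, s_best, S_best = None, None, np.inf
    for j in range(d):
        # Candidate split points: midpoints between consecutive distinct values of X[:, j].
        vals = np.unique(X[:, j])
        for s in (vals[:-1] + vals[1:]) / 2:
            left = X[:, j] < s
            right = ~left
            c_a, c_b = Y[left].mean(), Y[right].mean()            # within-region averages
            score = np.sum((Y[left] - c_a) ** 2) + np.sum((Y[right] - c_b) ** 2)
            if score < S_best:
                j_best, s_best, S_best = j, s, score
    return j_best, s_best, S_best
```

Recursively applying best_split within each resulting half, until a stopping rule such as the penalized score below is met, yields the estimated regions $\hat{R}_1, \cdots, \hat{R}_M$.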
For the stopping criterion, sometimes people will fix the number $M$ in advance: as soon as we obtain $M$ regions, the splitting procedure stops. However, such a choice of $M$ is rather arbitrary. A popular alternative is to stop based on a score that balances the fitting quality and the complexity of the tree. For instance, we may stop splitting once the following score is no longer decreasing:
\[
C_{\lambda, n}(M) = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{m}(X_i))^2 + \lambda M,
\]
where λ > 0 is a tuning parameter that determines the ‘penalty’ for having a complex tree. In the next
lecture, we will talk more about this penalty type tuning parameter.
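As a small sketch of this rule (the names train_mse and lam are hypothetical: train_mse[M-1] is assumed to hold the training error $\frac{1}{n}\sum_{i}(Y_i - \hat{m}(X_i))^2$ of the tree with $M$ regions, recorded as the tree is grown one split at a time):

```python
def stop_at(train_mse, lam):
    """Return the number of regions M at which C_{lambda,n}(M) stops decreasing.

    train_mse[M-1]: training error of the tree with M regions (assumed precomputed).
    lam: the penalty parameter lambda.
    """
    M = 1
    while M < len(train_mse):
        # Accept the next split only if it decreases the penalized score.
        if train_mse[M] + lam * (M + 1) >= train_mse[M - 1] + lam * M:
            break
        M += 1
    return M
```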
Remark.
• Interpretation. A powerful feature of the regression tree is that it is easy to interpret. Even without much training, a practitioner can use the output of a regression tree very easily. A limitation of the regression tree is that it partitions the space of covariates into rectangular regions, which may be unrealistic for the actual regression model.
• Cross-validation. How do we choose the tuning parameter $\lambda$? There is a simple approach called cross-validation¹ that gives a good choice of this quantity. Not only $\lambda$ but also other tuning parameters, such as the number of basis functions $M$, the smoothing bandwidth $h$, and the bin size $b$, can be chosen using cross-validation.
1 https://en.wikipedia.org/wiki/Cross-validation_(statistics)
• MARS (multivariate adaptive regression splines). The regression tree has another limitation: it predicts the same value within the same region, which creates a jump at the boundary between two adjacent regions. There is a modified regression tree called MARS (multivariate adaptive regression splines) that allows continuous (and possibly smooth) changes across regions. See https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines.