Functional Trees for Regression
João Gama
1 Introduction
The generalization capacity of a learning algorithm depends on the appropriate-
ness of its representation language to express a generalization of the examples
for the given task. Different learning algorithms employ different representations,
search heuristics, evaluation functions, and search spaces. It is now commonly
accepted that each algorithm has its own selective superiority [3]; each is best
for some but not all tasks. The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal.
This paper presents one such algorithm.
In the context of supervised learning problems it is useful to distinguish
between classification problems and regression problems. In the former the target
variable takes values in a finite and pre-defined set of unordered values, and the
usual goal is to minimize a 0-1-loss function. In the latter the target variable
is ordered and takes values in a subset of ℝ. The usual goal is to minimize a squared error loss function. Mainly due to these differences in the type of the target variable, techniques that are successful for one type of problem are not directly applicable to the other.
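Stated as formulas, with $y$ the observed value and $\hat{y}$ the model's prediction, the two goals are to minimize, respectively,
$$L_{0/1}(y,\hat{y}) = \mathbf{1}[\hat{y} \neq y] \qquad \text{and} \qquad L_2(y,\hat{y}) = (y - \hat{y})^2.$$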
Gama [6] has presented a technique to combine classifiers that use distinct
representation languages using constructive induction. In this work we study
the applicability of a related technique for regression problems. In particular we
combine a linear model with a regression tree using constructive induction.
Generalized Linear Models. Generalized Linear Models (GLM) are the statistical technique most frequently applied to establish a relationship between several independent variables and a target variable. In the most general terms, a GLM has the form $w_0 + \sum_i w_i \times f_i(x_i)$. GLM estimation aims at minimizing the sum of squared deviations of the observed values of the dependent variable from those predicted by the model. One appealing characteristic is that this problem has an analytical solution: the coefficients that minimize the least squares error criterion are found by solving the equation $W = (X^T X)^{-1} X^T Y$. In this paper we assume that each $f_i$ is the identity function, which leads to multiple linear regression.
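As a concrete illustration, the following is a minimal sketch of this closed-form fit in Python with NumPy; the function name and the toy data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def least_squares_coefficients(X, y):
    """Closed-form least-squares fit, as described above.

    X : (n_examples, n_attributes) matrix of independent variables.
    y : (n_examples,) vector with the target variable.
    Returns the coefficient vector (w0, w1, ..., wd), with w0 the intercept.
    """
    # Prepend a column of ones so that w0 plays the role of the intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    # W = (X^T X)^{-1} X^T Y; lstsq solves the same problem but is
    # numerically safer than forming the inverse explicitly.
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

# Toy example: fit a line to a few noisy points around y = 1 + 2x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(least_squares_coefficients(X, y))  # approximately [1.1, 1.9]
```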
The standard algorithm to build regression trees consists of two phases. In the
first phase a large decision tree is constructed. In the second phase this tree
is pruned back. The algorithm to grow the tree follows the standard divide-
and-conquer approach. The most relevant aspects are: the splitting rule, the
termination criterion, and the leaf assignment criterion. With respect to the last
criterion, the usual rule consists of assigning a constant to a leaf node. This
constant is usually the mean of the y values taken from the examples that fall
at this node. With respect to the splitting rule, each attribute value defines a
the linear-regression function is used later in the pruning phase. In this way, all decision nodes are based on the original attributes. Leaf nodes may contain a constructor model: a leaf node contains a constructor model if and only if, in the pruning algorithm, the estimated mean-squared error of the constructor model is lower than both the backed-up error and the estimated error the node would have if a leaf replaced it.
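Read as pseudo-code, this rule compares three error estimates at each node. The sketch below is only an illustration under my own naming; the handling of the constant-leaf versus subtree choice follows the standard pruning convention and is an assumption, not a detail given in the text.

```python
def prune_decision(mse_constructor, mse_constant_leaf, backed_up_error):
    """Illustrative sketch of the leaf-assignment rule discussed above.

    mse_constructor   -- estimated MSE of the constructor (linear-regression) model
    mse_constant_leaf -- estimated MSE if the node were replaced by a constant leaf
    backed_up_error   -- error backed up from the subtree rooted at this node
    """
    # The constructor model is kept only if it beats both alternatives.
    if mse_constructor < backed_up_error and mse_constructor < mse_constant_leaf:
        return "leaf with constructor (linear-regression) model"
    # Otherwise prune to a constant leaf when that is no worse than the subtree
    # (assumed standard behaviour, not stated explicitly in the text).
    if mse_constant_leaf <= backed_up_error:
        return "constant leaf (mean of the y values)"
    return "keep the subtree"
```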
3 An Illustrative Example
Fig. 1. (a) The univariate regression tree and (b) the top-down functional regression tree for the Housing problem.
Figure 2(b) presents the full functional regression tree using both the top-down and bottom-up multivariate approaches. In this case, decision nodes may (but need not) contain tests based on a linear combination of the original attributes, and leaf nodes may (but need not) predict values obtained from a linear-regression function built from the examples that fall at the node.
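The following is a minimal sketch of how prediction could be routed through such a functional tree; the Node fields and the predict function are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class Node:
    """Illustrative functional-tree node (an assumption, not the paper's code).

    A decision node tests either a single attribute (weights is None) or a
    linear combination of the attributes; a leaf predicts either a constant
    or the value of a linear model built from its training examples.
    """
    def __init__(self, attribute=None, weights=None, threshold=None,
                 left=None, right=None, constant=None, linear_model=None):
        self.attribute, self.weights, self.threshold = attribute, weights, threshold
        self.left, self.right = left, right
        self.constant, self.linear_model = constant, linear_model

def predict(node, x):
    """Route the example x down the tree and return the leaf prediction."""
    if node.left is None and node.right is None:      # leaf node
        if node.linear_model is not None:             # linear model: w0 + w . x
            w0, w = node.linear_model
            return w0 + float(np.dot(w, x))
        return node.constant                          # constant leaf (mean of y)
    # decision node: univariate test or linear-combination test
    value = x[node.attribute] if node.weights is None else float(np.dot(node.weights, x))
    return predict(node.left if value <= node.threshold else node.right, x)
```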
Fig. 2. (a) The bottom-up functional regression tree and (b) the functional regression tree for the Housing problem.
4 Related Work
Karalic [7] shows that employing linear regression in the tree leaves leads to smaller models with an increase in performance. Torgo [13] has presented an experimental study of functional models for regression tree leaves. Later, the same author [14] presented the system RT. When used with linear models at the leaves, RT builds and prunes a regular univariate tree; then, at each leaf, a linear model is built using the examples that fall at that leaf.
5 Experimental Evaluation
It is commonly accepted that multivariate regression trees should be competitive with univariate models. In this section we evaluate the proposed algorithm, its lesioned variants, and its components on a set of benchmark datasets. For comparative purposes we also evaluate the M5 system². The main goal of this experimental evaluation is to study how the position of the linear models inside a regression tree influences performance. We evaluate three situations:
– Trees that could use linear combinations at each internal node.
– Trees that could use linear combinations at each leaf.
– Trees that could use linear combinations at both internal nodes and leaves.
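To make the three situations concrete, one way they could be encoded is as a pair of flags; the names below are purely illustrative and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionalTreeConfig:
    linear_tests_at_internal_nodes: bool  # allow linear-combination tests in decision nodes
    linear_models_at_leaves: bool         # allow linear-regression models in leaf nodes

TOP_DOWN  = FunctionalTreeConfig(True, False)   # first situation above
BOTTOM_UP = FunctionalTreeConfig(False, True)   # second situation above
FULL      = FunctionalTreeConfig(True, True)    # third situation above
```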
All evaluated models are based on the same tree growing and pruning algorithm.
That is, they use exactly the same splitting criteria, stopping criteria, and prun-
ing mechanism. Moreover, they share many minor heuristics that individually are too small to mention but collectively can make a difference. In this way, the differences in the evaluation statistics are due to the differences in the conceptual model. In this work we estimate the performance of a learned model using the mean squared error (MSE) statistic.
² We have used M5 from the latest version of the Weka environment. Among the several regression systems we tried, M5 was the most competitive.
The results in terms of MSE and standard deviation are presented in Table
1. The first two columns refer to the results of the components of the hybrid
algorithm. The following three columns refer to the simplified versions of our
algorithm and the full model. The last column refers to the M5 system. For each
dataset, the algorithms are compared against the full multivariate tree using the
Wilcoxon signed-rank test. The null hypothesis is that the difference between
error rates has median value zero. A − (+) sign indicates that for this dataset
the performance of the algorithm was worse (better) than the full model with a p
value less than 0.01. It is interesting to note that the full model (MT) significantly
improves over both components (LR and UT) in 9 datasets out of 16. Table 1
also presents a comparative summary of the results. The first line presents the
geometric mean of the MSE statistic across all datasets. The second line shows
the average rank of all models, computed for each dataset by assigning rank 1
to the best algorithm, 2 to the second best and so on. The third line shows the
average ratio of MSE. This is computed for each dataset as the ratio between the
MSE of one algorithm and the MSE of M5. The fourth line shows the number of
significant differences using the signed-rank test taking the multivariate tree MT
as reference. We use the Wilcoxon Matched-Pairs Signed-Ranks Test to compare
the error rate of pairs of algorithms across datasets. The last line shows the p values associated with this test for the MSE results on all datasets, taking MT as reference.
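For concreteness, such a paired comparison could be computed with SciPy as sketched below; the MSE values are invented placeholders, not results from the paper.

```python
from scipy.stats import wilcoxon

# Paired MSE results of the full model (MT) and a competitor on the same
# datasets. These numbers are illustrative placeholders only.
mse_mt    = [12.3, 18.7, 9.4, 25.1, 14.0, 30.2]
mse_other = [13.1, 20.2, 9.2, 27.8, 15.5, 29.9]

# Null hypothesis: the paired differences have median zero.
stat, p_value = wilcoxon(mse_mt, mse_other)
print(f"W = {stat:.1f}, p = {p_value:.3f}")  # flag a significant difference when p < 0.01
```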
All the functional trees have similar performance. Using the significance test as criterion, FT is the best-performing algorithm. It is interesting to note that the bottom-up version is the most competitive algorithm. Nevertheless, there is a computational cost associated with the observed increase in performance: to run all the experiments reported here, FT requires almost 1.8 times more time than the univariate regression tree.
³ http://www.ncc.up.pt/∼ltorgo/Datasets
⁴ The actual implementation ignores missing values at learning time. At application time, if the value of the test attribute is unknown, all descendant branches produce a prediction; the final prediction is a weighted average of these predictions.
6 Conclusions
Acknowledgments
We gratefully acknowledge the financial support given by the FEDER project, the projects Sol Eu-Net, Metal, and ALES, and the Plurianual support of LIACC. We would like to thank Luis Torgo and the referees for their useful comments.
References
1. L. Breiman. Arcing classifiers. The Annals of Statistics, 26(3):801–849, 1998.
2. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
3. Carla E. Brodley. Recursive automatic bias selection for classifier construction.
Machine Learning, 20:63–94, 1995.
4. Carla E. Brodley and Paul E. Utgoff. Multivariate decision trees. Machine Learn-
ing, 19:45–77, 1995.
5. João Gama. Probabilistic Linear Tree. In D. Fisher, editor, Machine Learning,
Proceedings of the 14th International Conference. Morgan Kaufmann, 1997.
6. João Gama and P. Brazdil. Cascade Generalization. Machine Learning, 41:315–
343, 2000.
7. Aram Karalic. Employing linear regression in regression tree leaves. In Bernard
Neumann, editor, European Conference on Artificial Intelligence, 1992.
8. S. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision
trees. Journal of Artificial Intelligence Research, 1994.
9. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd edition. Cambridge University Press, 1992.
10. R. Quinlan. Learning with continuous classes. In Adams and Sterling, editors,
Proceedings of AI’92. World Scientific, 1992.
11. R. Quinlan. Combining instance-based and model-based learning. In P. Utgoff,
editor, ML93, Machine Learning, Proceedings of the 10th International Conference.
Morgan Kaufmann, 1993.
12. R. Quinlan. Data mining tools See5 and C5.0. Technical report, RuleQuest Re-
search, 1998.
13. Luis Torgo. Functional models for regression tree leaves. In D. Fisher, editor, Ma-
chine Learning, Proceedings of the 14th International Conference. Morgan Kauf-
mann, 1997.
14. Luis Torgo. Inductive Learning of Tree-based Regression Models. PhD thesis,
University of Porto, 2000.
15. P. Utgoff and C. Brodley. Linear machine decision trees. COINS Technical Report 91-10, University of Massachusetts, 1991.
16. Ian Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.
17. D. Wolpert. Stacked generalization. In Neural Networks, volume 5, pages 241–260.
Pergamon Press, 1992.