
Functional Trees for Regression

João Gama

LIACC, FEP, University of Porto


[email protected]

Abstract. In this paper we present and evaluate a new algorithm for supervised learning regression problems. The algorithm combines a univariate regression tree with a linear regression function by means of constructive induction. When growing the tree, at each internal node, a linear-regression function creates one new attribute. This new attribute is the instantiation of the regression function for each example that falls at this node. This new instance space is propagated down through the tree. Tests based on those new attributes correspond to an oblique decision surface. Our approach can be seen as a hybrid model that combines a linear regression, known to have low variance, with a regression tree, known to have low bias. Our algorithm was compared against its components, two simplified versions, and M5 using 16 benchmark datasets. The experimental evaluation shows that our algorithm has clear advantages with respect to generalization ability when compared against its components, and competes well against the state-of-the-art in regression trees.

1 Introduction
The generalization capacity of a learning algorithm depends on the appropriate-
ness of its representation language to express a generalization of the examples
for the given task. Different learning algorithms employ different representations,
search heuristics, evaluation functions, and search spaces. It is now commonly accepted that each algorithm has its own selective superiority [3]; each is best for some but not all tasks. The design of algorithms that explore multiple representation languages and different search spaces has an intuitive appeal. This paper presents one such algorithm.
In the context of supervised learning problems it is useful to distinguish between classification problems and regression problems. In the former the target variable takes values in a finite and pre-defined set of un-ordered values, and the usual goal is to minimize a 0-1 loss function. In the latter the target variable is ordered and takes values in a subset of $\mathbb{R}$; the usual goal is to minimize a squared error loss function. Mainly due to the differences in the type of the target variable, successful techniques in one type of problem are not directly applicable to the other.
Gama [6] has presented a technique to combine classifiers that use distinct
representation languages using constructive induction. In this work we study
the applicability of a related technique for regression problems. In particular we
combine a linear model with a regression tree using constructive induction.


Generalized Linear Models. Generalized Linear Models (GLM) are the most frequently applied statistical technique for establishing a relationship between several independent variables and a target variable. In the most general terms, a GLM has the form $w_0 + \sum_i w_i \times f_i(x_i)$. GLM estimation aims at minimizing the sum of squared deviations of the observed values of the dependent variable from those predicted by the model. One appealing characteristic is that there is an analytical solution to this problem: the coefficients of the polynomial under the least squares error criterion are found by solving the equation $W = (X^T X)^{-1} X^T Y$. In this paper, we assume that $f_i$ is the identity function, leading to multiple linear regression.
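
As an illustration (not part of the original paper), the closed-form solution above can be written in a few lines of NumPy; the function name and the synthetic data below are ours, and in practice numpy.linalg.lstsq would be preferred to forming $X^T X$ explicitly:

import numpy as np

def linear_regression_coefficients(X, y):
    # Least-squares coefficients for y ~ w0 + sum_i wi * xi, obtained by
    # solving the normal equations W = (X^T X)^{-1} X^T y. A column of
    # ones is prepended so that the intercept w0 is estimated as well.
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Example on synthetic data: the recovered coefficients should be close
# to the generating values [2.0, 1.0, -0.5, 3.0].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 3.0]) + rng.normal(scale=0.1, size=200)
print(linear_regression_coefficients(X, y))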

Regression Trees. A regression tree uses a divide-and-conquer strategy that decomposes a complex problem into simpler problems and recursively applies the same strategy to the sub-problems. This is the basic idea behind well-known regression tree based algorithms [2,10,14]. The power of this approach comes from the ability to split the space of the attributes into subspaces, whereby each subspace is fitted with a different function. The main drawback of this method is its instability under small variations of the training set [2].

Constructive Induction. Constructive induction discovers new attributes from the training set and transforms the original instance space into a new, higher-dimensional space by applying attribute constructor operators. The difficulty is how to choose the appropriate operators for the problem in question. In this paper we argue that, in regression domains described at least partially by numerical attributes, techniques based on GLM are a useful tool for constructive induction.
The algorithm that we describe in this work lies at the confluence of these three areas. It explores the power of divide-and-conquer from regression trees and the ability to generate hyper-planes from linear regression, and it integrates both using constructive induction. In the next section of the paper we describe our proposal for functional regression trees. In Section 3 we discuss the different variants of regression models. In Section 4 we present related work in both the classification and the regression settings. In Section 5 we evaluate our algorithm on 16 benchmark datasets. The last section concludes the paper.

2 The Algorithm for Regression Trees

The standard algorithm to build regression trees consists of two phases. In the first phase a large tree is constructed; in the second phase this tree is pruned back. The algorithm to grow the tree follows the standard divide-and-conquer approach. The most relevant aspects are the splitting rule, the termination criterion, and the leaf assignment criterion. With respect to the last criterion, the usual rule consists of assigning a constant to a leaf node. This constant is usually the mean of the $y$ values of the examples that fall at this node.
With respect to the splitting rule, each attribute value defines a possible partition of the dataset. We distinguish between nominal attributes and continuous ones: in the former the number of partitions is equal to the number of values of the attribute, while in the latter a binary partition is obtained. To estimate the merit of the partition obtained by a given attribute we use the following heuristic: $\sum_i \frac{(\sum y_i)^2}{n_i}$, where $i$ runs over the partitions, $\sum y_i$ denotes the sum of the target values of the examples in partition $i$, and $n_i$ is the number of examples in partition $i$. The attribute that maximizes this expression is chosen as the test attribute at this node.
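
As a hedged illustration of the splitting rule (not code from the paper), the heuristic above can be computed as follows; the function names are ours, and the sketch considers only binary splits of a single continuous attribute:

import numpy as np

def split_merit(y, partition_ids):
    # Merit of a candidate split: for each partition i, add (sum of y)^2 / n_i.
    merit = 0.0
    for p in np.unique(partition_ids):
        y_p = y[partition_ids == p]
        merit += y_p.sum() ** 2 / len(y_p)
    return merit

def best_threshold(x, y):
    # Scan the candidate thresholds of one continuous attribute (binary splits)
    # and return the threshold with the highest merit.
    best_merit, best_t = -np.inf, None
    for t in np.unique(x)[:-1]:
        merit = split_merit(y, (x > t).astype(int))
        if merit > best_merit:
            best_merit, best_t = merit, t
    return best_t, best_merit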
The pruning phase consists of traversing the tree in a depth-first fashion. At each non-leaf node two measures are estimated: the error of the subtree below this node, computed as a weighted sum of the estimated errors of the leaves of the subtree, and the estimated error of the node if it were pruned to a leaf. If the latter is lower than the former, the entire subtree is replaced by a leaf. To estimate the error at each leaf we assume a $\chi^2$ distribution for the variance of the cases in it. Following [14], a pessimistic estimate of the MSE at each node $t$ is given by:

$$MSE \times \frac{n-1}{2} \times \left( \frac{1}{\chi^2_{\alpha/2,\,n-1}} + \frac{1}{\chi^2_{1-\alpha/2,\,n-1}} \right) \qquad (1)$$

where $MSE$ is the mean squared error at this node, $n$ denotes the number of examples at this node, and $\alpha$ is the confidence level.
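
As an illustration only (not the authors' code), Equation 1 can be computed with SciPy, under the assumption that $\chi^2_{p,\,n-1}$ denotes the $p$-quantile of a chi-squared distribution with $n-1$ degrees of freedom:

from scipy.stats import chi2

def pessimistic_mse(mse, n, alpha=0.05):
    # Pessimistic estimate of the node MSE, following Equation 1: the
    # resubstitution MSE is inflated according to a chi-squared model
    # of the variance of the cases at the node.
    df = n - 1
    lower = chi2.ppf(alpha / 2.0, df)        # chi^2_{alpha/2, n-1}
    upper = chi2.ppf(1.0 - alpha / 2.0, df)  # chi^2_{1-alpha/2, n-1}
    return mse * df / 2.0 * (1.0 / lower + 1.0 / upper)

# e.g. pessimistic_mse(4.2, n=30) returns a value larger than 4.2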
All of these aspects have several important variants; see for example [2,14]. Nevertheless, all decision nodes contain conditions based on the values of a single attribute. The first proposal for a multivariate regression tree was presented in 1992 by Quinlan [10]: the system M5 builds a tree-based model but can use multiple linear models at the leaves. The goal of this paper is to study when and where to use decisions based on a combination of attributes. Instead of considering multivariate models restricted to leaves, we analyze and evaluate multivariate models both at leaves and at internal nodes.

2.1 Functional Trees for Regression


In this section we present the general algorithm to construct a functional regression tree. Given a set of examples and an attribute constructor, the main algorithm used to build a decision tree is:
Function Tree(Dataset, Constructor)
1. If Stop Criterion(DataSet) Return a Leaf Node with a constant value.
2. Construct a model Φ using Constructor
3. For each example x ∈ DataSet
– Compute ŷ = Φ(x)
– Extend x with a new attribute ŷ.
4. Select the attribute that maximizes some merit-function
5. For each partition i of the DataSet using the selected attribute
– Treei = Tree(Dataseti , Constructor)
6. Return a Tree, as a decision node based on the selected attribute, containing the Φ model and descendents Treei.
End Function
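
To make the pseudocode concrete, the following Python sketch is one possible rendering under simplifying assumptions (continuous attributes only, binary splits, a minimum-size stopping criterion, no pruning). The dictionary-based tree, the helper names, and the use of scikit-learn's LinearRegression as the attribute constructor are choices of this sketch, not of the paper:

import numpy as np
from sklearn.linear_model import LinearRegression

def split_merit(y, left):
    # Merit of a binary partition: sum over the two partitions of (sum y)^2 / n.
    if left.sum() == 0 or left.sum() == len(y):
        return -np.inf
    return y[left].sum() ** 2 / left.sum() + y[~left].sum() ** 2 / (~left).sum()

def grow(X, y, min_examples=20):
    # Step 1: stopping criterion -> leaf with a constant value.
    if len(y) < min_examples or np.isclose(y.var(), 0.0):
        return {"leaf": True, "value": y.mean()}
    # Step 2: build the constructor model Phi on the examples at this node.
    phi = LinearRegression().fit(X, y)
    # Step 3: extend every example with the new attribute y_hat = Phi(x).
    X_ext = np.column_stack([X, phi.predict(X)])
    # Step 4: select the attribute/threshold maximizing the merit function;
    # the constructed attribute competes with the original attributes.
    best_merit, best_j, best_t = -np.inf, None, None
    for j in range(X_ext.shape[1]):
        for t in np.unique(X_ext[:, j])[:-1]:
            m = split_merit(y, X_ext[:, j] <= t)
            if m > best_merit:
                best_merit, best_j, best_t = m, j, t
    if best_j is None:
        return {"leaf": True, "value": y.mean()}
    left = X_ext[:, best_j] <= best_t
    # Steps 5-6: the extended instance space is propagated down the tree.
    return {"leaf": False, "model": phi, "attribute": best_j, "threshold": best_t,
            "left": grow(X_ext[left], y[left], min_examples),
            "right": grow(X_ext[~left], y[~left], min_examples)}

def predict(tree, x):
    # Descend the tree, extending the example with Phi(x) at each decision node.
    x = np.asarray(x, dtype=float)
    while not tree["leaf"]:
        x = np.append(x, tree["model"].predict(x.reshape(1, -1))[0])
        branch = "left" if x[tree["attribute"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]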

This algorithm is similar to many others, except in the constructive step (steps 2 and 3). Here a function is built and mapped to a new attribute. In this paper, we restricted the Constructor to the linear-regression (LR) function [9]. There are some aspects of this algorithm that should be made explicit. In step 2, a model is built using the Constructor function. This is done using only the examples that fall at this node. Later, in step 3, the model is mapped to one new attribute. The merit of the new attribute is evaluated using the merit-function of the decision tree, in competition with the original attributes (step 4). The models built by our algorithm have two types of decision nodes: those based on a test on one of the original attributes, and those based on the values of the constructor function. Once a tree has been constructed, it is pruned back. The general algorithm to prune the tree is:
Function Prune(Tree)
1. Estimate Leaf Error as the error at this node.
2. If Tree is a leaf Return Leaf Error.
3. Estimate Constructor Error as the error of Φ, the constructor model learned in the growing phase at this node.
4. For each descendent i
– Backed Up Error += Prune(Treei )
5. Let Best = argmin(Leaf Error, Constructor Error, Backed Up Error)
– If Best is Leaf Error
• Tree = Leaf
• Tree Error = Leaf Error
– If Best is Constructor Error
• Tree = Constructor Leaf
• Tree Error = Constructor Error
– If Best is Backed Up Error
• Tree Error = Backed Up Error
6. Return Tree Error
End Function
The error estimates needed in steps 1 and 3 use Equation 1. The pruning algorithm produces two different types of leaves: ordinary leaves, which predict the mean of the target variable observed in the examples that fall at the node, and constructor leaves, which predict the value of the constructor function learned (in the growing phase) at the node.
By simplifying our algorithm we obtain different conceptual models. Two interesting lesioned versions are described in the following sub-sections.

Bottom-Up Approach. We use the term bottom-up approach to functional regression trees when the multivariate models are used exclusively at the leaves. This is the strategy used, for example, in M5 [11,16], in Cubist [12], and in the RT system [14]. In our tree algorithm this is achieved by restricting the selection of the test attribute (step 4 in the growing algorithm) to the original attributes. Nevertheless, we still build the linear-regression function at each node.
The model built by the linear-regression function is used later in the pruning phase. In this way, all decision nodes are based on the original attributes, while leaf nodes may contain a constructor model. A leaf node contains a constructor model if and only if, in the pruning algorithm, the estimated mean squared error of the constructor model is lower than both the backed-up error and the estimated error the node would have if a leaf replaced it.

Top-Down Approach. We use the term top-down approach to functional regression trees when the multivariate models are used exclusively at decision (internal) nodes. In our algorithm, these models are obtained by restricting the pruning algorithm to choose only between the Backed Up Error and the Leaf Error. In this case all leaves predict a constant value.
Our algorithm can be seen as a hybrid model that performs a tight combi-
nation of a decision tree and a linear-regression. The components of the hybrid
algorithm use different representation languages and search strategies. While the
tree uses a divide-and-conquer method, the linear-regression performs a global
minimization approach. While the former performs feature selection, the latter
uses all (or almost all) the attributes to build a model. From the point of view of
the bias-variance decomposition of the error [1] a decision tree is known to have
low bias but high variance, while linear-regression is known to have low variance
but high bias. This is the desirable behavior for components of hybrid models.

3 An Illustrative Example

In this section we use the well-known regression dataset Housing to illustrate the different variants of regression models. The attribute constructor used is the linear regression model. Figure 1(a) presents a univariate tree for the Housing dataset. Decision nodes contain only tests based on the original attributes. Leaf nodes predict the average of the $y$ values of the examples that fall at the leaf.
In a top-down functional tree (Figure 1(b)) decision nodes may (but need not) contain tests based on a linear combination of the original attributes. The tree contains a mixture of built-in attributes, denoted as LR Node, and original attributes, e.g. AGE, DIS. Any of the linear-regression attributes can be used both at the node where it was created and at deeper nodes. For example, LR Node 19 was created at the second level of the tree. It is used as the test attribute at this node, and also (due to the constructive ability) as a test attribute at the third level of the tree. Leaf nodes predict the average of the $y$ values of the examples that fall at the leaf. In a bottom-up functional tree (Figure 2(a)) decision nodes contain only tests based on the original attributes. Leaf nodes may (but need not) predict values obtained using a linear-regression function built from the examples that fall at the node. This is the kind of model regression tree that usually appears in the literature; for example, systems M5 [11,16] and RT [14] generate this kind of model.

[Figure 1: tree diagrams omitted.]
Fig. 1. (a) The univariate regression tree and (b) the top-down functional regression tree for the Housing problem.

Figure 2(b) presents the full functional regression tree, which uses both the top-down and the bottom-up multivariate approaches. In this case, decision nodes may (but need not) contain tests based on a linear combination of the original attributes, and leaf nodes may (but need not) predict values obtained using a linear-regression function built from the examples that fall at the node.

4 Related Work

In the context of classification problems, several algorithms have been presented that use, at each decision node, tests based on linear combinations of the attributes [2,8,15,4,5]. Also, Gama [6] has presented Cascade Generalization, a method to combine classification algorithms by means of constructive induction. The work presented here closely follows this method, but in the context of regression domains. Another difference is related to the pruning algorithm: in this work we consider functional leaves.
Breiman et al. [2] presented the first extensive and in-depth study of the problem of constructing decision and regression trees. However, while in the case of decision trees they consider internal nodes with tests based on linear combinations of attributes, in the case of regression trees internal nodes are always based on a single attribute. Quinlan [10] presented the system M5, which builds multivariate trees using linear models at the leaves; in the pruning phase a linear model is built for each leaf. Recently, Witten and Frank [16] have extended M5: a linear model is built at each node of the initial regression tree, and all the models along a particular path from the root to a leaf node are then combined into one linear model in a smoothing step. Karalic [7] has also studied the influence of using linear regression in the leaves of a regression tree.

[Figure 2: tree diagrams omitted.]
Fig. 2. (a) The bottom-up functional regression tree and (b) the functional regression tree for the Housing problem.

As in the work of Quinlan, Karalic shows that this leads to smaller models with an increase in performance. Torgo [13] has presented an experimental study of functional models for regression tree leaves. Later, the same author [14] presented the system RT. When used with linear models at the leaves, RT builds and prunes a regular univariate tree and then builds, at each leaf, a linear model using the examples that fall at that leaf.

5 Experimental Evaluation
It is commonly accepted that multivariate regression trees should be competitive against univariate models. In this section we evaluate the proposed algorithm, its lesioned variants, and its components on a set of benchmark datasets. For comparative purposes we also evaluate the M5 system (we used the M5 implementation from the latest version of the Weka environment; of the several regression systems we tried, M5 was the most competitive). The main goal of this experimental evaluation is to study how the position of the linear models inside a regression tree influences performance. We evaluate three situations:
– Trees that could use linear combinations at each internal node.
– Trees that could use linear combinations at each leaf.
– Trees that could use linear combinations both at internal nodes and at leaves.
All evaluated models are based on the same tree growing and pruning algorithms. That is, they use exactly the same splitting criteria, stopping criteria, and pruning mechanism. Moreover, they share many minor heuristics that individually are too small to mention but that collectively can make a difference. In this way, the differences in the evaluation statistics are due to the differences in the conceptual model.

In this work we estimate the performance of a learned model using the mean squared error statistic.

5.1 Evaluation Design and Results

We have chosen 16 datasets from the Repository of Regression problems at LIACC (http://www.ncc.up.pt/~ltorgo/Datasets). The choice of datasets was restricted by the criterion that almost all attributes be ordered, with few missing values. (The actual implementation ignores missing values at learning time; at application time, if the value of the test attribute is unknown, all descendent branches produce a prediction and the final prediction is a weighted average of these predictions.) To estimate the mean squared error of an algorithm on a given dataset we use 10-fold cross validation. To apply pairwise comparisons we guarantee that, in all runs, all algorithms learn and test on the same partitions of the data. The following table summarizes the dataset characteristics:
Dataset           #Examples  #Attributes    Dataset           #Examples  #Attributes
Abalone (AB)        4177     8 (1 Nom.)     Auto-Mpg (AU)        398     6
Cart (Car)         40768     10             Computer (CA)       8192     21
Cpu (CP)             210     7              Diabetes (DI)         43     2
Elevators (EL)      8752     17             Fried (FR)         40768     10
House 16 (H16)     22784     16             House 8 (H8)       22784     8
Housing (HO)         506     13             Kinematics (KI)     8192     8
Machine (MA)         209     5              Pole (PO)           5000     48
Pyrimidines (PY)      74     28             Quake (QU)          2178     3

The results in terms of MSE and standard deviation are presented in Table 1. The first two columns refer to the results of the components of the hybrid algorithm. The following three columns refer to the simplified versions of our algorithm and to the full model. The last column refers to the M5 system. For each dataset, the algorithms are compared against the full multivariate tree using the Wilcoxon signed-rank test. The null hypothesis is that the difference between error rates has median value zero. A − (+) sign indicates that for this dataset the performance of the algorithm was worse (better) than that of the full model, with a p-value less than 0.01. It is interesting to note that the full model (MT) significantly improves over both of its components (LR and UT) in 9 datasets out of 16. Table 1 also presents a comparative summary of the results. The first line presents the geometric mean of the MSE statistic across all datasets. The second line shows the average rank of all models, computed for each dataset by assigning rank 1 to the best algorithm, 2 to the second best, and so on. The third line shows the average ratio of MSE, computed for each dataset as the ratio between the MSE of one algorithm and the MSE of M5. The fourth line shows the number of significant differences using the signed-rank test, taking the multivariate tree MT as reference. We use the Wilcoxon matched-pairs signed-ranks test to compare the error of pairs of algorithms across datasets. The last line shows the p-values associated with this test for the MSE results on all datasets, taking MT as reference.

Data   L.Regression (LR)   Univ. Tree (UT)    FT-Top            FT-Bottom         FT (MT)          M5
AB     − 10.586±1.1        − 6.992±0.5        4.659±0.3         − 4.995±0.3       4.674±0.3        4.524±0.3
AU     11.069±3.3          − 17.407±6.9       8.905±2.5         8.853±2.9         8.914±2.4        7.490±2.8
CA     − 94.277±17.0       − 10.133±0.8       + 6.268±0.7       6.844±0.5         6.343±0.7        − 6.961±1.3
CP     − 3201±3183         2114±2557          1131±2766         1062±2545         1960±2361        1063±1623
Car    − 5.685±0.1         + 0.996±0.0        − 1.012±0.0       0.995±0.0         1.007±0.0        + 0.994±0.0
DI     0.398±0.2           0.469±0.3          0.474±0.2         0.398±0.2         0.398±0.2        − 0.469±0.3
EL²    − 0.008±0.0         − 0.014±0.0        0.004±0.0         − 0.005±0.0       0.004±0.0        − 0.0048±0.0
FR     − 6.925±0.3         − 3.171±0.1        − 1.783±0.1       − 2.163±0.1       1.772±0.1        − 1.928±0.1
H16    − 2.074e9±2.5e8     − 1.608e9±2.2e8    1.19e9±1.4e8      1.17e9±1.8e8      1.21e9±1.4e8     1.26e9±1.2e8
H8     − 1.73e9±2.1e8      − 1.18e9±1.32e8    1.01e9±1.23e8     1.0e9±1.12e8      1.01e9±1.1e8     9.97e8±9.3e7
HO     23.683±10.5         15.687±4.9         15.785±7.1        11.666±3.3        16.576±8.6       12.875±4.9
KI     − 0.041±0.0         − 0.035±0.0        0.025±0.0         0.027±0.0         0.025±0.0        0.026±0.0
MA     4684±3657           3764±3798          3077±2247         2616±2289         3464±3080        3301±2462
PO     − 939.44±41.9       + 44.93±5.1        86.03±15.2        + 36.57±5.8       86.18±14.8       + 41.95±6.2
PY     0.015±0.0           0.017±0.0          0.012±0.0         0.013±0.0         0.013±0.0        0.012±0.0
QU     0.036±0.0           0.036±0.0          − 0.037±0.0       0.036±0.0         0.036±0.0        0.037±0.0
²) MSE × 1000

Summary of MSE Results
                      LR      UT      FT-T    FT-B    FT      M5
Geometric Mean        39.7    22.99   17.15   16.22   17.7    16.4
Average Rank          5.4     4.8     3.0     2.4     2.9     2.3
Average Ratio         4.0     1.47    1.06    0.99    1.1     1
Signif. Wins/Losses   0/10    2/8     1/3     1/3     –       2/4
Wilcoxon Test         0.0     0.02    0.21    0.1     –       0.23

Table 1. Results in mean squared error (MSE).

All the functional trees have a similar performance. Using the significance test as criterion, FT is the best performing algorithm. It is interesting to note that the bottom-up version is the most competitive algorithm. Nevertheless, there is a computational cost associated with the observed increase in performance: to run all the experiments referred to here, FT requires almost 1.8 times more time than the univariate regression tree.
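
To make the statistical comparison concrete, the sketch below shows how a Wilcoxon signed-rank test on paired per-fold MSE values could be computed with SciPy. The arrays are hypothetical, illustrative values, not the results of Table 1:

import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold MSEs of two algorithms evaluated on the same ten
# cross-validation partitions of one dataset (illustrative values only).
mse_full_model = np.array([4.61, 4.72, 4.55, 4.70, 4.68, 4.80, 4.59, 4.66, 4.74, 4.63])
mse_univariate = np.array([6.95, 7.10, 6.88, 7.02, 6.97, 7.15, 6.90, 6.99, 7.08, 6.94])

# Null hypothesis: the median of the paired differences is zero.
stat, p_value = wilcoxon(mse_full_model - mse_univariate)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.4f}")
# A p-value below 0.01 would be reported as a significant win or loss in Table 1.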

6 Conclusions

We have presented a hybrid regression tree that combines a linear-regression function with a univariate regression tree by means of constructive induction. At each decision node two inductive steps occur: the first consists of building the linear-regression function; the second consists of applying the regression tree criteria. In the first step the linear-regression function is not used to produce predictions. All decisions, such as stopping, choosing the splitting attribute, etc., are delayed to the second inductive step; the final decision is made by the regression tree criteria. Using Wolpert's terminology [17], the constructive step performed at each decision node represents a bi-stacked generalization.

From this point of view, the proposed methodology can be seen as a general architecture for combining algorithms by means of constructive induction, a kind of local bi-stacked generalization.
In this paper we have studied where to use decisions based on a combination of attributes. Instead of considering multivariate models restricted to leaves, we analyzed and evaluated multivariate models both at leaves and at internal nodes. Our experimental study suggests that the full model, that is, a functional model using linear regression both at decision nodes and at leaves, improves the performance of the algorithm. A natural extension of this work consists of applying the proposed methodology to classification problems. We have done this using a discriminant function as attribute constructor; it is the subject of another paper. The results on classification are consistent with the conclusions of this paper.

Acknowledgments
We gratefully acknowledge the financial support of the FEDER project, the projects Sol Eu-Net, Metal, and ALES, and the Plurianual support of LIACC. We would like to thank Luis Torgo and the referees for their useful comments.

References
1. L. Breiman. Arcing classifiers. The Annals of Statistics, 26(3):801–849, 1998.
2. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
3. Carla E. Brodley. Recursive automatic bias selection for classifier construction.
Machine Learning, 20:63–94, 1995.
4. Carla E. Brodley and Paul E. Utgoff. Multivariate decision trees. Machine Learn-
ing, 19:45–77, 1995.
5. João Gama. Probabilistic Linear Tree. In D. Fisher, editor, Machine Learning,
Proceedings of the 14th International Conference. Morgan Kaufmann, 1997.
6. João Gama and P. Brazdil. Cascade Generalization. Machine Learning, 41:315–
343, 2000.
7. Aram Karalic. Employing linear regression in regression tree leaves. In Bernard
Neumann, editor, European Conference on Artificial Intelligence, 1992.
8. S. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision
trees. Journal of Artificial Intelligence Research, 1994.
9. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C: The Art of Scientific Computing, 2nd Ed. Cambridge University Press, 1992.
10. R. Quinlan. Learning with continuous classes. In Adams and Sterling, editors,
Proceedings of AI’92. World Scientific, 1992.
11. R. Quinlan. Combining instance-based and model-based learning. In P.Utgoff,
editor, ML93, Machine Learning, Proceedings of the 10th International Conference.
Morgan Kaufmann, 1993.
12. R. Quinlan. Data mining tools See5 and C5.0. Technical report, RuleQuest Re-
search, 1998.
13. Luis Torgo. Functional models for regression tree leaves. In D. Fisher, editor, Ma-
chine Learning, Proceedings of the 14th International Conference. Morgan Kauf-
mann, 1997.
14. Luis Torgo. Inductive Learning of Tree-based Regression Models. PhD thesis,
University of Porto, 2000.
15. P. Utgoff and C. Brodley. Linear machine decision trees. Coins technical report,
91-10, University of Massachusetts, 1991.
16. Ian Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations. Morgan Kaufmann Publishers, 2000.
17. D. Wolpert. Stacked generalization. Neural Networks, 5:241–260. Pergamon Press, 1992.
